
dagster_cloud.yaml#

This reference is applicable to Dagster+.
Name: dagster_cloud.yaml
Status: Active
Required: Required for Dagster+
Description: Similar to workspace.yaml in Dagster Open Source; defines code locations for Dagster+.
Uses: Defines one or more code locations for Dagster+. For Hybrid deployments, this file can also be used to manage environment variables and secrets.

    File location#

    The dagster_cloud.yaml file should be placed in the root of your Dagster project. Below is an example of a file structure modified from the Dagster+ ETL quickstart.

    quickstart_etl
    ├── README.md
    ├── quickstart_etl
    │   ├── __init__.py
    │   ├── assets
    │   ├── docker_image
    ├── ml_project
    │   ├── quickstart_ml
    │     ├── __init__.py
    │     ├── ml_assets
    ├── random_assets.py
    ├── quickstart_etl_tests
    ├── dagster_cloud.yaml
    ├── pyproject.toml
    ├── setup.cfg
    └── setup.py
    

    If your repository contains multiple Dagster projects in subdirectories (also known as a monorepo), add the dagster_cloud.yaml file to the root directory that contains the Dagster projects.
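    For illustration, a dagster_cloud.yaml at the root of a monorepo might define one code location per project. This is a minimal sketch; the project names, directories, and registry below are hypothetical placeholders:

    # dagster_cloud.yaml, at the monorepo root (hypothetical project names and registry)
    
    locations:
      - location_name: project-a
        code_source:
          package_name: project_a
        build:
          directory: ./project_a
          registry: your-registry/project-a-image
      - location_name: project-b
        code_source:
          package_name: project_b
        build:
          directory: ./project_b
          registry: your-registry/project-b-image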


    File structure#

    Settings are formatted using YAML. Using the file structure above as an example:

    # dagster_cloud.yaml
    
    locations:
      - location_name: data-eng-pipeline
        code_source:
          package_name: quickstart_etl
        build:
          directory: ./quickstart_etl
          registry: localhost:5000/docker_image
      - location_name: ml-pipeline
        code_source:
          package_name: quickstart_ml
        working_directory: ./ml_project
        executable_path: venvs/path/to/ml_tensorflow/bin/python
      - location_name: my_random_assets
        code_source:
          python_file: random_assets.py
        container_context:
          k8s:
            env_vars:
              - database_name
              - database_username=hooli_testing
            env_secrets:
              - database_password
    

    Settings#

    The dagster_cloud.yaml file contains a single top-level key, locations. This key accepts a list of code locations; for each code location, you can configure the following:

    Location name#

    This key is required. The location_name key specifies the name of the code location. The location name will always be paired with a code source.

    # dagster_cloud.yaml
    
    locations:
      - location_name: data-eng-pipeline
        code_source:
          package_name: quickstart_etl
    
    Property | Description
    location_name | The name of your code location that will appear on the Code locations page in the Dagster UI (Format: string)

    Code source#

    This section is required. The code_source key defines how a code location is sourced.

    A code_source must contain either a module_name, package_name, or python_file parameter that specifies where to find the definitions in the code location.

    # dagster_cloud.yaml
    
    locations:
      - location_name: data-eng-pipeline
        code_source:
          package_name: quickstart_etl
    
    Property | Description
    code_source.package_name | The name of a package containing Dagster code (Format: string, folder name)
    code_source.python_file | The name of a Python file containing Dagster code, for example analytics_pipeline.py (Format: string, .py file name)
    code_source.module_name | The name of a Python module containing Dagster code, for example analytics_etl (Format: string, module name)
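    If your definitions live in a single Python file or an importable module rather than a local package, code_source can point at those instead. A minimal sketch using the example names from the table above (the location names are hypothetical):

    # dagster_cloud.yaml
    
    locations:
      - location_name: analytics-file-location
        code_source:
          python_file: analytics_pipeline.py
      - location_name: analytics-module-location
        code_source:
          module_name: analytics_etl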

    Working directory#

    Use the working_directory setting to load Dagster code from a directory other than the root of your code repository.

    Consider the following project:

    quickstart_etl
    ├── README.md
    ├── project_directory
    │   ├── quickstart_etl
    │     ├── __init__.py
    │     ├── assets
    │   ├── quickstart_etl_tests
    ├── dagster_cloud.yaml
    ├── pyproject.toml
    ├── setup.cfg
    └── setup.py
    

    To load code from ./project_directory, the code location in dagster_cloud.yaml would look like this:

    # dagster_cloud.yaml
    
    locations:
      - location_name: data-eng-pipeline
        code_source:
          package_name: quickstart_etl
        working_directory: ./project_directory
    
    Property | Description
    working_directory | The path of the directory that Dagster should load the code source from (Format: string, path)

    Build#

    The build section contains two parameters:

    • directory - Setting a build directory is useful if your setup.py or requirements.txt is in a subdirectory instead of the project root. This is common if you have multiple Python modules within a single Dagster project.
    • registry - Applicable only to Hybrid deployments. Specifies the Docker registry to push the code location to.

    In the example below, the build directory is the project root, and the Docker registry and image are defined:

    # dagster_cloud.yaml
    
    locations:
      - location_name: data-eng-pipeline
        code_source:
          package_name: quickstart_etl
        build:
          directory: ./
          registry: your-docker-image-registry/image-name # e.g. localhost:5000/myimage
    
    Property | Description
    build.directory | The path to the directory in your project that you want to deploy. If there are subdirectories, you can specify the path to deploy only a specific project directory (Format: string, path; Default: .)
    build.registry | Applicable only to Hybrid deployments. The Docker registry to push your code location to (Format: string, Docker registry)
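    For the multi-module case described above, the build directory can instead point at the subdirectory that holds the code location's setup.py. This is a sketch assuming a hypothetical subdirectory named data_eng_project:

    # dagster_cloud.yaml (hypothetical subdirectory and registry)
    
    locations:
      - location_name: data-eng-pipeline
        code_source:
          package_name: quickstart_etl
        build:
          directory: ./data_eng_project
          registry: your-docker-image-registry/image-name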

    Python executable#

    For Dagster+ Hybrid deployments, Dagster uses the Python executable installed globally in the image or, if you're using the local agent, the default Python executable on your local system. To use a different Python executable, specify it with the executable_path setting. This can be useful when different code locations need different Python environments.

    For Dagster+ Serverless deployments, you can specify a different Python version by following these instructions.

    # dagster_cloud.yaml
    
    locations:
      - location_name: data-eng-pipeline
        code_source:
          package_name: quickstart_etl
        executable_path: venvs/path/to/dataengineering_spark_team/bin/python
      - location_name: machine_learning
        code_source:
          python_file: ml_model.py
        executable_path: venvs/path/to/ml_tensorflow/bin/python
    
    Property | Description
    executable_path | The file path of the Python executable to use (Format: string, path)

    Container context#

    If you're using a Hybrid deployment, you can define additional configuration options for code locations using the container_context parameter. The settings available under container_context vary depending on the Hybrid agent you're using.
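    For example, with the Kubernetes agent, environment variables and secrets can be set under the k8s key, as in the file structure example near the top of this page:

    # dagster_cloud.yaml
    
    locations:
      - location_name: my_random_assets
        code_source:
          python_file: random_assets.py
        container_context:
          k8s:
            env_vars:
              - database_name
              - database_username=hooli_testing
            env_secrets:
              - database_password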

    Refer to the configuration reference for your agent for more info: