# Federating execution between Airflow instances with Dagster
This tutorial demonstrates how to use `dagster-airlift` to observe DAGs from multiple Airflow instances and to federate execution between them, using Dagster as a centralized control plane.

With `dagster-airlift`, we can:

- Observe Airflow DAGs and their execution history
- Directly trigger Airflow DAGs from Dagster
- Set up federated execution across Airflow instances

All of this can be done with no changes to Airflow code.
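As a rough sketch of what the observation setup can look like, the snippet below registers two Airflow instances with `dagster-airlift` and builds observed Dagster definitions from each. The webserver URLs and credentials are placeholder values for illustration, not part of this tutorial's setup.

```python
from dagster import Definitions
from dagster_airlift.core import (
    AirflowBasicAuthBackend,
    AirflowInstance,
    build_defs_from_airflow_instance,
)

# Placeholder connection details; substitute your own instances' URLs and credentials.
warehouse_airflow_instance = AirflowInstance(
    name="warehouse",
    auth_backend=AirflowBasicAuthBackend(
        webserver_url="http://localhost:8081",
        username="admin",
        password="admin",
    ),
)

metrics_airflow_instance = AirflowInstance(
    name="metrics",
    auth_backend=AirflowBasicAuthBackend(
        webserver_url="http://localhost:8082",
        username="admin",
        password="admin",
    ),
)

# Each call builds observed asset representations of that instance's DAGs,
# along with a sensor that polls the instance for completed runs.
defs = Definitions.merge(
    build_defs_from_airflow_instance(airflow_instance=warehouse_airflow_instance),
    build_defs_from_airflow_instance(airflow_instance=metrics_airflow_instance),
)
```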
## Overview
This tutorial follows an imaginary data platform team in the following scenario:

- An Airflow instance `warehouse`, run by another team, is responsible for loading data into a data warehouse.
- An Airflow instance `metrics`, run by the data platform team, deploys all the metrics that data scientists build on top of the data warehouse.
Two DAGs have been causing the team a lot of pain lately: `warehouse.load_customers` and `metrics.customer_metrics`. The `warehouse.load_customers` DAG is responsible for loading customer data into the data warehouse, and the `metrics.customer_metrics` DAG is responsible for computing metrics on top of that customer data. There's a cross-instance dependency between these two DAGs, but it's neither observable nor controllable. Ideally, the data platform team would like to rebuild the `metrics.customer_metrics` DAG only when the `warehouse.load_customers` DAG has loaded new data.

In this guide, we'll use `dagster-airlift` to observe the `warehouse` and `metrics` Airflow instances, and to set up federated execution, controlled by Dagster, that triggers the `metrics.customer_metrics` DAG only when the `warehouse.load_customers` DAG has new data. This process won't require any changes to the Airflow code.
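As a rough sketch of the federation piece, the snippet below builds on the `AirflowInstance` objects from the earlier example: it loads the observed asset specs for the two DAGs and declares the cross-instance dependency, attaching an eager automation condition so `customer_metrics` is only requested after `load_customers` produces new data. The DAG ids match this scenario, but the exact wiring shown here is an illustrative sketch rather than the tutorial's full code.

```python
from dagster import AutomationCondition
from dagster_airlift.core import load_airflow_dag_asset_specs

# Observed asset spec for the upstream DAG on the warehouse instance.
load_customers_spec = next(
    iter(
        load_airflow_dag_asset_specs(
            airflow_instance=warehouse_airflow_instance,
            dag_selector_fn=lambda dag: dag.dag_id == "load_customers",
        )
    )
)

# Observed asset spec for the downstream DAG on the metrics instance, with the
# cross-instance dependency declared and an eager automation condition so it is
# only requested once load_customers has produced new data.
customer_metrics_spec = next(
    iter(
        load_airflow_dag_asset_specs(
            airflow_instance=metrics_airflow_instance,
            dag_selector_fn=lambda dag: dag.dag_id == "customer_metrics",
        )
    )
).replace_attributes(
    deps=[load_customers_spec],
    automation_condition=AutomationCondition.eager(),
)
```

Once the dependency is expressed on the observed assets, Dagster's automation machinery can act as the control plane that decides when the downstream Airflow DAG should run; the rest of this tutorial walks through that setup step by step.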