Usage
To start, create a directory with the following structure, where manifest.json
is a file generated by dbt:
.
├── config
│ ├── base
│ │ ├── airflow.yml
│ │ ├── dbt.yml
│ │ └── k8s.yml
│ └── dev
│ └── dbt.yml
├── dag.py
└── manifest.json
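The files under config define the factory's settings: base holds the defaults and each sibling directory (here, dev) overrides them per environment. As a minimal sketch of their shape — dag and default_args in airflow.yml are implied by the Airflow 1.x snippet below (they are passed straight to Airflow's DAG constructor), and target in dbt.yml appears later in this guide; all other keys and values are assumptions:

# config/base/airflow.yml (sketch; keys under "dag" are DAG constructor arguments)
dag:
  dag_id: my-dbt-dag          # assumed name
  schedule_interval: "@daily"
default_args:
  owner: analytics            # assumed owner

# config/dev/dbt.yml (sketch; overrides base settings for the dev environment)
target: dev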
Then, put the following code into dag.py:
from os import path

from airflow.models import Variable

from dbt_airflow_factory.airflow_dag_factory import AirflowDagFactory

# Build the DAG from manifest.json and the config files located next to this file;
# the Airflow Variable "env" selects which config subdirectory to apply.
dag = AirflowDagFactory(path.dirname(path.abspath(__file__)), Variable.get("env")).create()
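The second constructor argument selects the environment: Variable.get("env") reads an Airflow Variable named env, whose value should match one of the directories under config (here, dev). A minimal sketch of setting it with Airflow's standard Variable API (it can equally be set through the Airflow UI or CLI):

# One-off setup, e.g. in a bootstrap script:
from airflow.models import Variable

Variable.set("env", "dev")  # "dev" matches the config/dev override directory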
For older versions of Airflow (before 2.0), the DAG file needs to be slightly longer:
from os import path

from airflow import DAG
from airflow.models import Variable

from dbt_airflow_factory.airflow_dag_factory import AirflowDagFactory

dag_factory = AirflowDagFactory(path.dirname(path.abspath(__file__)), Variable.get("env"))
config = dag_factory.read_config()
with DAG(default_args=config["default_args"], **config["dag"]) as dag:
    dag_factory.create_tasks(config)
When uploaded to the Airflow DAGs directory, the file will get picked up by Airflow, which will parse manifest.json
and prepare a DAG to run.
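Before uploading, you can check locally that the file parses into a valid DAG. A minimal sketch using Airflow's DagBag; the dags/ path is an assumption:

from airflow.models import DagBag

# Parse every DAG file in the folder, skipping Airflow's bundled examples.
dag_bag = DagBag(dag_folder="dags/", include_examples=False)
assert not dag_bag.import_errors, dag_bag.import_errors  # non-empty if dag.py failed to parse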
Configuration files
It is best to look at the example configuration files in the tests directory to get a glimpse of correct configs.
You can use Airflow template variables in your dbt.yml and k8s.yml files, as long as they are inside quotation marks:
target: "{{ var.value.env }}"
some_other_field: "{{ ds_nodash }}"
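At run time, Airflow renders these templates before the values are used. For example, for a run on 2023-01-01 with the env Variable set to dev, the file above would resolve roughly to:

target: "dev"
some_other_field: "20230101"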
Analogously, you can use "{{ var.value.VARIABLE_NAME }}" in airflow.yml, but only this Airflow variable getter is supported; any other Airflow template variables will not work in airflow.yml.
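For instance, a hedged airflow.yml fragment pulling the schedule from an Airflow Variable (the Variable name is an assumption; the dag section mirrors the DAG constructor arguments used above):

dag:
  schedule_interval: "{{ var.value.schedule_interval }}"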
Creation of the directory with data-pipelines-cli
DBT Airflow Factory works best in tandem with the data-pipelines-cli tool. dp not only prepares the directory for the library to digest, but also automates Docker image building and pushes the generated directory to the cloud storage of your choice.
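As a rough sketch of that workflow (see the data-pipelines-cli documentation for the authoritative commands and options):

dp compile   # prepare the dbt manifest and config directory, and build the Docker image
dp deploy    # push the generated directory to the configured cloud storage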