Airflow Made Easy | Local Setup Using Docker

Here, we guide you step by step through setting up Airflow locally with Docker. Simplify your workflow and follow our easy instructions for a seamless Airflow deployment on your machine.



This is my local Apache Airflow development setup using docker-compose. It also includes some sample DAGs and workflows.

Recent Updates:

03-Dec-2023

  • Upgraded to Airflow 2.7.3
  • Upgraded Superset to add a secret key
  • Added Superset database connection image
  • Works on M1 Mac

03-May-2022

  • Added Dockerfile to extend the Airflow image
  • Added additional PyPI package (td-client)
  • Upgraded to Airflow 2.3.0

29-Jun-2021

  • Updated image to Airflow 2.1.1
  • Leveraging _PIP_ADDITIONAL_REQUIREMENTS to install additional dependencies
  • Developing and testing operators for Treasure Data
  • Read more at Treasure Data

📝 Table of Contents

  • About
  • Getting Started
  • Running the tests
  • Usage
  • Built Using
  • Authors
  • Acknowledgements
  • Cleanup

🧐 About

Set up Apache Airflow 2.0 locally on Windows 10 (WSL2) via Docker Compose. The original docker-compose.yaml file was taken from the official GitHub repo.

It contains service definitions for:

  • airflow-scheduler
  • airflow-webserver
  • airflow-worker
  • airflow-init - to initialize the database and create a user
  • flower
  • redis
  • postgres - the backend for Airflow. I am also creating an additional database, userdata, as a backend for my data flows. This is not recommended; ideally Airflow and your data should use separate database instances.

I have added an additional command to create an Airflow DB connection as part of the docker-compose setup.

Directories I am mounting:

  • ./dags
  • ./logs
  • ./plugins
  • ./sql - for SQL files. We can leverage Jinja templating in our queries. Refer to the sample DAG in the repo; a minimal sketch follows this list.
  • ./test - has unit tests for Airflow DAGs.
  • ./pg-init-scripts - scripts to create an additional database in Postgres.
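
As a rough illustration of the Jinja templating idea, a DAG could reference a templated file from ./sql as sketched below. This is a hedged sketch, not the repo's actual sample DAG: the dag_id, the SQL file name, the /opt/airflow/sql mount path, and the postgres_new connection id are assumptions.

# dags/templated_sql_example.py - hypothetical sketch of using a templated SQL file from ./sql
from datetime import datetime

from airflow import DAG
from airflow.providers.postgres.operators.postgres import PostgresOperator

with DAG(
    dag_id="templated_sql_example",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
    # Assumes ./sql is mounted at /opt/airflow/sql inside the containers
    template_searchpath=["/opt/airflow/sql"],
) as dag:
    # sample_insert.sql (a placeholder name) can use Jinja macros such as {{ ds }}
    run_templated_query = PostgresOperator(
        task_id="run_templated_query",
        postgres_conn_id="postgres_new",
        sql="sample_insert.sql",
    )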

Data Engineering Projects

Here you will find some personal projects that I have worked on. These projects shed light on some of the Airflow features I have used and on learnings related to other technologies.

Data Visualization

To experiment with Apache Superset. Read more here

🏁 Getting Started

These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.

Clone this repo to your machine

docker-compose -f docker-compose.yaml up airflow-init
docker-compose -f docker-compose.yaml up

Prerequisites

What you need to install and how to install it.

You should have Docker and Docker Compose v1.27.0 or later installed on your machine.

  • Install and configure WSL2
  • I also had to reset my Ubuntu installation, and that's when it asked me to create a user.

Installing

A step-by-step series of examples that show you how to get a development environment running.

Clone the Repo

git clone

Start the Docker build

# To extend the Airflow image
docker-compose build

docker-compose -f docker-compose.yaml up airflow-init

docker-compose -f docker-compose.yaml up

Keep checking the Docker processes to make sure all containers are healthy

docker ps

Once all containers are healthy, add a connection to Postgres via the command line and then access the Airflow UI:

docker exec -it airflow-docker_airflow-worker airflow connections add 'postgres_new' --conn-uri 'postgres://airflow:airflow@postgres:5432/airflow'
http://localhost:8080


🔧 Running the tests

Unit tests for Airflow DAGs are defined in the test folder. This folder is also mapped into the Docker containers in the docker-compose.yaml file. Follow the steps below to execute the unit tests once the Docker containers are running:

./airflow.sh bash
python -m unittest discover -v
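
For reference, a DAG-integrity test in the test folder might look like the hedged sketch below; the file name, class name, and the /opt/airflow/dags path are assumptions, not the repo's actual tests.

# test/test_dag_integrity.py - hypothetical sketch of a DAG integrity test
import unittest

from airflow.models import DagBag


class TestDagIntegrity(unittest.TestCase):
    def setUp(self):
        # Load every DAG from the mounted dags folder; include_examples=False
        # skips Airflow's bundled example DAGs.
        self.dagbag = DagBag(dag_folder="/opt/airflow/dags", include_examples=False)

    def test_no_import_errors(self):
        # Any syntax error or missing dependency in a DAG file shows up here.
        self.assertEqual(
            len(self.dagbag.import_errors), 0,
            f"DAG import errors: {self.dagbag.import_errors}",
        )


if __name__ == "__main__":
    unittest.main()

python -m unittest discover -v (as above) picks up any test file whose name starts with test_.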

Github Workflow for running tests

I had to create another docker-compose file to be able to execute the unit tests whenever I push code to master. Please refer

Break down into end to end tests

Another #TODO

🎈 Usage

Now you can create new DAGs, place them in your local dags folder, and see them come up live in the web UI. Refer to the sample DAG in the repo; a minimal sketch is shown below.
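
As a minimal, hedged illustration (this is not the repo's sample DAG; the dag_id and task are placeholders), a new file dropped into ./dags could look like this:

# dags/hello_local_airflow.py - hypothetical minimal DAG for the mounted ./dags folder
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="hello_local_airflow",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
    tags=["example"],
) as dag:
    # A single task that echoes the logical date of the run.
    say_hello = BashOperator(
        task_id="say_hello",
        bash_command="echo 'Hello from {{ ds }}'",
    )

Within a minute or two the scheduler should pick up the new file and the DAG should appear at http://localhost:8080.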

Important: Edit the postgres_default connection from the UI or through the command line if you want to persist data in Postgres as part of the DAGs you create. Even better, you can always add a new connection.

Update: This is now taken care of in the updated Docker Compose file; the connection and the new database are created automatically. The manual steps, for reference, are:

  ./airflow.sh bash

  airflow connections add 'postgres_new' --conn-uri 'postgres://airflow:airflow@postgres:5432/airflow'

  Connect to Postgres and create a new database named 'userdata':

  docker exec -it airflowdocker_postgres_1 /bin/bash
  psql -U airflow
  CREATE DATABASE userdata;


Turn on the DAG: PostgreOperatorTest_Dag
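
For orientation only, a DAG along these lines might follow the pattern sketched below. This is a hedged stand-in, not the repo's actual PostgreOperatorTest_Dag; the dag_id, table name, and the use of the postgres_new connection with the userdata database are assumptions.

# dags/postgres_userdata_example.py - hypothetical sketch, not the repo's PostgreOperatorTest_Dag
from datetime import datetime

from airflow import DAG
from airflow.providers.postgres.operators.postgres import PostgresOperator

with DAG(
    dag_id="postgres_userdata_example",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    # Create a table in the userdata database if it does not exist yet.
    create_table = PostgresOperator(
        task_id="create_table",
        postgres_conn_id="postgres_new",
        database="userdata",  # overrides the database set on the connection
        sql="CREATE TABLE IF NOT EXISTS run_log (run_date DATE, note TEXT);",
    )

    # Insert one row per run, using the Jinja-templated logical date.
    insert_row = PostgresOperator(
        task_id="insert_row",
        postgres_conn_id="postgres_new",
        database="userdata",
        sql="INSERT INTO run_log VALUES ('{{ ds }}', 'hello from airflow');",
    )

    create_table >> insert_row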

⛏️ Built Using

✍️ Authors

🎉 Acknowledgements

Cleanup

docker-compose down --volumes --rmi all