Treasure Data Digdag visualization

Treasure Data is a leading customer data platform (CDP) that uses Digdag internally for its data pipelines. Digdag is an open-source workflow engine similar to Airflow, where tasks are defined inside a DAG. The CDP is powerful, but it offers no easy way to visualize pipelines, which makes onboarding hard when complex pipelines are already in place. I followed the steps below to solve this problem for myself.


Solution Approach

  1. Create Visualization
  2. Make it colorful and interactive
    1. Use Graphviz to create a DOT diagram
    2. Render the SVG and the cmapx image map
    3. Create an HTML file and embed the SVG in an `img` tag
    4. Insert the cmapx into the HTML after the `img` tag
  3. Have the ability to navigate across workflows.
  4. Update the graph every time there is a code change.
  5. Make it part of CI/CD.
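
The rendering steps in item 2 can be sketched roughly as follows. This is a minimal illustration, not the code in this repository: the function names (`build_dot`, `render`, `build_html`), the node colors, and the example workflow are all hypothetical. It assumes the Graphviz `dot` binary is on the `PATH` and that graph names are plain identifiers, so that the `<map>` element Graphviz emits carries the same name used in `usemap`.

```python
import subprocess


def build_dot(graph_name, edges, urls):
    """Return DOT source for a directed graph.

    edges: list of (source_task, target_task) tuples.
    urls:  dict mapping a task name to the page it should link to;
           only nodes with a URL get a clickable area in the cmapx output.
    """
    lines = [
        f'digraph "{graph_name}" {{',
        "  node [shape=box, style=filled, fillcolor=lightblue];",
    ]
    # Declare each node once, attaching a URL where one is known.
    for node in sorted({name for edge in edges for name in edge}):
        attrs = f' [URL="{urls[node]}"]' if node in urls else ""
        lines.append(f'  "{node}"{attrs};')
    for source, target in edges:
        lines.append(f'  "{source}" -> "{target}";')
    lines.append("}")
    return "\n".join(lines)


def render(dot_path, svg_path, cmapx_path):
    """Run Graphviz once to write both the SVG and the cmapx image map."""
    subprocess.run(
        ["dot", "-Tsvg", f"-o{svg_path}", "-Tcmapx", f"-o{cmapx_path}", str(dot_path)],
        check=True,
    )


def build_html(graph_name, svg_file, cmapx_markup):
    """Embed the SVG in an img tag and place the cmapx markup right after it."""
    return (
        "<!DOCTYPE html>\n<html>\n<body>\n"
        f'<img src="{svg_file}" usemap="#{graph_name}" alt="{graph_name} workflow">\n'
        f"{cmapx_markup}\n"
        "</body>\n</html>\n"
    )
```

A typical flow would write the result of `build_dot` to a `.dot` file, call `render` on it, read the generated cmapx file, and pass its contents to `build_html`. Pointing a node's URL at another workflow's HTML page is one way to get the cross-workflow navigation in item 3.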

Installation

pip install -r requirements.txt

The workflows are below and the graphs are interactive:

The scheduled workflows are

Usage example

TODO

Github Link

References