
Data Pipelines with Apache Airflow

When data analysts and data scientists work with data, they rely on it being accurate and diverse enough to build the models that drive business decisions. Data comes from many places and from many different times; a system could start collecting data tomorrow, or it may have started 20 years ago. Imagine that 😅




[Figure: the project's Airflow DAG]

This project showcases how to design and schedule a series of jobs/steps with Apache Airflow in order to:

  • Backfill data

  • Build a dimensional data model using Python

  • Load data from an AWS S3 bucket into an AWS Redshift data warehouse (see the first sketch after this list)

  • Run quality checks on the data (see the second sketch below)

  • Use or create custom operators and the available Airflow hooks to keep the code reusable
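
To give a feel for the scheduling and staging pieces, here is a minimal sketch, not the project's actual DAG: a past start_date combined with catchup=True makes Airflow backfill one run per missed interval, and a single task copies files from S3 into a Redshift staging table using the Amazon provider's S3ToRedshiftOperator (the project itself used custom operators for this step). The DAG id, bucket, key, table, schedule, and connection ids below are all placeholders.

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.providers.amazon.aws.transfers.s3_to_redshift import S3ToRedshiftOperator

default_args = {
    "owner": "data-eng",
    "retries": 3,                         # retry transient failures
    "retry_delay": timedelta(minutes=5),
    "depends_on_past": False,
}

with DAG(
    dag_id="s3_to_redshift_etl",          # hypothetical DAG name
    default_args=default_args,
    start_date=datetime(2018, 11, 1),     # a start date in the past...
    schedule_interval="@hourly",
    catchup=True,                         # ...plus catchup=True triggers backfill of every missed run
    max_active_runs=1,
) as dag:
    # Copy raw JSON files from S3 into a Redshift staging table.
    stage_events = S3ToRedshiftOperator(
        task_id="stage_events",
        schema="public",
        table="staging_events",           # hypothetical staging table
        s3_bucket="example-log-bucket",   # hypothetical bucket and prefix
        s3_key="log_data",
        redshift_conn_id="redshift",      # connection ids configured in the Airflow UI
        aws_conn_id="aws_credentials",
        copy_options=["FORMAT AS JSON 'auto'"],
    )
```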
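For the quality checks and the reusable-operator idea, here is a rough sketch of a custom operator built on BaseOperator and PostgresHook (Redshift speaks the Postgres protocol, so the same hook works); the class name, connection id, and table are illustrative, not the project's actual code.

```python
from airflow.models import BaseOperator
from airflow.providers.postgres.hooks.postgres import PostgresHook


class HasRowsOperator(BaseOperator):
    """Fail the task if the target Redshift table has no rows."""

    def __init__(self, *, redshift_conn_id: str, table: str, **kwargs):
        super().__init__(**kwargs)
        self.redshift_conn_id = redshift_conn_id
        self.table = table

    def execute(self, context):
        hook = PostgresHook(postgres_conn_id=self.redshift_conn_id)
        records = hook.get_records(f"SELECT COUNT(*) FROM {self.table}")
        if not records or records[0][0] < 1:
            raise ValueError(f"Quality check failed: {self.table} returned no rows")
        self.log.info("Quality check passed: %s has %d rows", self.table, records[0][0])
```

Once defined, the operator drops into the DAG like any built-in task, e.g. HasRowsOperator(task_id="check_users", redshift_conn_id="redshift", table="users"), chained after the load steps.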


[GitHub] This project was completed as part of the Udacity Data Engineer Nanodegree (link)


tech: AWS Redshift, Python, Apache Airflow, Docker
