Data analysts and data scientists rely on data that is accurate and diverse enough to build the models that drive business decisions. Data comes from many places and from different points in time; a system could start collecting data tomorrow, or it could have started 20 years ago. Imagine that 😅
[Figure: overview of the project's DAG]
This project showcases how to design and schedule a series of jobs/steps using Apache Airflow in order to:
- Backfill historical data (see the DAG sketch after this list)
- Build a dimensional data model using Python
- Load data from an AWS S3 bucket into an AWS Redshift data warehouse (a custom staging-operator sketch follows)
- Run quality checks on the data (see the quality-check sketch below)
- Use the available hooks and create custom operators to keep the code reusable
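Backfilling in Airflow is usually driven by the DAG's `start_date` and `catchup` settings: a start date in the past plus `catchup=True` makes the scheduler create one run per missed interval. A minimal sketch, assuming Airflow 1.x import paths; the DAG id, owner, and dates are illustrative, not this project's actual values:

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator  # Airflow 1.x import path

default_args = {
    "owner": "airflow",                  # hypothetical owner
    "depends_on_past": False,
    "retries": 3,
    "retry_delay": timedelta(minutes=5),
}

# start_date in the past + catchup=True: the scheduler backfills the
# history by creating one DAG run per missed hourly interval.
dag = DAG(
    "etl_pipeline",                      # hypothetical DAG id
    default_args=default_args,
    schedule_interval="@hourly",
    start_date=datetime(2019, 1, 12),
    catchup=True,
)

start = DummyOperator(task_id="begin_execution", dag=dag)
```

A date range can also be backfilled explicitly from the CLI, e.g. `airflow backfill <dag_id> -s <start_date> -e <end_date>` in Airflow 1.x.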
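Loading from S3 into Redshift is typically done with a SQL `COPY` statement issued through a custom operator built on the existing hooks. A sketch under those assumptions; the operator name `StageToRedshiftOperator`, the connection ids, and the Airflow 1.x import paths are illustrative:

```python
from airflow.contrib.hooks.aws_hook import AwsHook        # Airflow 1.x paths
from airflow.hooks.postgres_hook import PostgresHook
from airflow.models import BaseOperator
from airflow.utils.decorators import apply_defaults


class StageToRedshiftOperator(BaseOperator):
    """Copies JSON files from S3 into a Redshift staging table."""

    # s3_key is templated, so a key like "log_data/{{ ds }}" resolves to the
    # run's execution date -- this is what lets backfill runs load the right
    # partition of the source data.
    template_fields = ("s3_key",)

    copy_sql = """
        COPY {table}
        FROM '{s3_path}'
        ACCESS_KEY_ID '{access_key}'
        SECRET_ACCESS_KEY '{secret_key}'
        FORMAT AS JSON '{json_option}'
    """

    @apply_defaults
    def __init__(self, redshift_conn_id="redshift", aws_credentials_id="aws_credentials",
                 table="", s3_bucket="", s3_key="", json_option="auto", *args, **kwargs):
        super(StageToRedshiftOperator, self).__init__(*args, **kwargs)
        self.redshift_conn_id = redshift_conn_id
        self.aws_credentials_id = aws_credentials_id
        self.table = table
        self.s3_bucket = s3_bucket
        self.s3_key = s3_key
        self.json_option = json_option

    def execute(self, context):
        credentials = AwsHook(self.aws_credentials_id).get_credentials()
        redshift = PostgresHook(postgres_conn_id=self.redshift_conn_id)
        s3_path = "s3://{}/{}".format(self.s3_bucket, self.s3_key)
        self.log.info("Copying %s into %s", s3_path, self.table)
        redshift.run(StageToRedshiftOperator.copy_sql.format(
            table=self.table,
            s3_path=s3_path,
            access_key=credentials.access_key,
            secret_key=credentials.secret_key,
            json_option=self.json_option,
        ))
```

The same `PostgresHook` can then drive plain `INSERT ... SELECT` statements that populate the fact and dimension tables of the dimensional model from the staging tables.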
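Quality checks can be expressed as pairs of a SQL query and an expected result, run by another custom operator that fails the task (and so triggers Airflow's retry/alerting machinery) on any mismatch. Again a sketch with assumed names:

```python
from airflow.hooks.postgres_hook import PostgresHook  # Airflow 1.x path
from airflow.models import BaseOperator
from airflow.utils.decorators import apply_defaults


class DataQualityOperator(BaseOperator):
    """Runs SQL checks against Redshift and fails on unexpected results."""

    @apply_defaults
    def __init__(self, redshift_conn_id="redshift", checks=None, *args, **kwargs):
        super(DataQualityOperator, self).__init__(*args, **kwargs)
        self.redshift_conn_id = redshift_conn_id
        # each check is a dict: {"sql": "<query>", "expected": <value>}
        self.checks = checks or []

    def execute(self, context):
        redshift = PostgresHook(postgres_conn_id=self.redshift_conn_id)
        for check in self.checks:
            records = redshift.get_records(check["sql"])
            if not records or not records[0]:
                raise ValueError("Quality check returned no results: %s" % check["sql"])
            actual = records[0][0]
            if actual != check["expected"]:
                raise ValueError(
                    "Quality check failed: %s returned %s, expected %s"
                    % (check["sql"], actual, check["expected"])
                )
            self.log.info("Quality check passed: %s", check["sql"])
```

A typical check would be `{"sql": "SELECT COUNT(*) FROM users WHERE userid IS NULL", "expected": 0}`.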
Tech stack: AWS Redshift, Python, Apache Airflow, Docker