I built this pipeline to extract SatCat (satellite catalog) data from SpaceTrack and visualize it with Google Data Studio.
My goal was to further develop my data engineering skills and to learn more about ETL processes and technologies.
- Create AWS resources with Terraform
- Extract data using SpaceTrack API
- Load into AWS S3
- Copy into AWS Redshift
- Orchestrate with Airflow in Docker
- Transform using dbt
- Visualize with Google Data Studio Dashboard
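The extract, load, and copy steps above can be sketched roughly as follows. This is a minimal illustration and not necessarily how the repo implements them. It assumes the public Space-Track REST endpoints (`/ajaxauth/login` and the `satcat` query class) and uses `boto3` for the S3 upload. The function names, bucket, key, table, and IAM role are all hypothetical placeholders.

```python
import csv
import io
import json
import urllib.parse
import urllib.request
from http.cookiejar import CookieJar

# Assumed Space-Track endpoints; check the current API docs before relying on them.
BASE_URL = "https://www.space-track.org"
LOGIN_PATH = "/ajaxauth/login"
SATCAT_QUERY_PATH = "/basicspacedata/query/class/satcat/format/json"


def fetch_satcat(username, password):
    """Log in to Space-Track and return the SATCAT as a list of dicts."""
    # A cookie-aware opener keeps the session cookie set by the login call.
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(CookieJar())
    )
    credentials = urllib.parse.urlencode(
        {"identity": username, "password": password}
    ).encode()
    opener.open(BASE_URL + LOGIN_PATH, data=credentials).close()
    with opener.open(BASE_URL + SATCAT_QUERY_PATH) as response:
        return json.load(response)


def records_to_csv(records):
    """Serialize a list of dicts to CSV text; the first record's keys form the header."""
    if not records:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(records[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def upload_to_s3(csv_text, bucket, key):
    """Upload CSV text to S3 (bucket and key are hypothetical placeholders)."""
    import boto3  # imported lazily so the helpers above work without boto3 installed

    boto3.client("s3").put_object(
        Bucket=bucket, Key=key, Body=csv_text.encode("utf-8")
    )


def build_copy_statement(table, bucket, key, iam_role_arn):
    """Build a Redshift COPY statement that loads the uploaded CSV from S3."""
    return (
        f"COPY {table} FROM 's3://{bucket}/{key}' "
        f"IAM_ROLE '{iam_role_arn}' CSV IGNOREHEADER 1;"
    )
```

In an Airflow DAG, each of these functions would typically become its own task, so a failed upload or copy can be retried without querying SpaceTrack again.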
To run this pipeline, first clone the repo, then follow the instructions linked below.
git clone https://github.com/wbarakat/SatCat_ETL.git
cd SatCat_ETL
- Add unit tests for extraction functions
- Add more tests for data quality/validation: missing values, duplicates, etc.
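
As a possible starting point for the data quality checks listed above, here is a minimal sketch of missing-value and duplicate detection over extracted records. The helper names are hypothetical, and the field names `NORAD_CAT_ID` and `OBJECT_NAME` are assumed to match the SatCat columns in use.

```python
from collections import Counter


def find_missing(records, required_fields):
    """Return (row_index, field) pairs where a required field is absent or empty."""
    return [
        (index, field)
        for index, record in enumerate(records)
        for field in required_fields
        if record.get(field) in (None, "")
    ]


def find_duplicates(records, key_field):
    """Return the sorted key values that appear in more than one record."""
    counts = Counter(record.get(key_field) for record in records)
    return sorted(
        value for value, count in counts.items()
        if count > 1 and value is not None
    )
```

Both helpers take plain lists of dicts, so they can be called from unit tests with small hand-written samples and no SpaceTrack or AWS access.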