Satellite Catalog ETL Pipeline

I built this pipeline to extract satellite catalog (SatCat) data from SpaceTrack and visualize it with Google Data Studio.

Motivation

I started this project to further develop my data engineering skills and to learn more about ETL processes and technologies.

Architecture

  1. Create AWS resources with Terraform
  2. Extract data using SpaceTrack API
  3. Load into AWS S3
  4. Copy into AWS Redshift
  5. Orchestrate with Airflow in Docker
  6. Transform using dbt
  7. Visualize with Google Data Studio Dashboard
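
As a rough illustration of steps 2 and 3, here is a minimal sketch of the extract stage using only the Python standard library. It is not the code from this repo: the function names `fetch_satcat` and `records_to_csv` are hypothetical, and the login and query paths are assumptions based on the public SpaceTrack REST API, so check them against the SpaceTrack documentation before relying on them.

```python
import csv
import io
import json
import urllib.parse
import urllib.request
from http.cookiejar import CookieJar

# Assumed endpoints (not taken from this repo); verify against the SpaceTrack docs.
BASE_URL = "https://www.space-track.org"
LOGIN_PATH = "/ajaxauth/login"
SATCAT_QUERY_PATH = "/basicspacedata/query/class/satcat/format/json"


def fetch_satcat(username: str, password: str) -> list[dict]:
    """Log in, then download the satellite catalog as a list of records."""
    # The session cookie set at login must be sent with the query request,
    # so both requests go through one opener that shares a cookie jar.
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(CookieJar())
    )
    credentials = urllib.parse.urlencode(
        {"identity": username, "password": password}
    ).encode()
    opener.open(BASE_URL + LOGIN_PATH, data=credentials).close()
    with opener.open(BASE_URL + SATCAT_QUERY_PATH) as response:
        return json.load(response)


def records_to_csv(records: list[dict]) -> str:
    """Serialize records to CSV text; columns come from the first record."""
    if not records:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(records[0].keys()))
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()
```

The CSV text returned by `records_to_csv` could then be uploaded to the S3 bucket created by Terraform and copied into Redshift. Keeping the serialization separate from the network call makes it easy to test without credentials.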

Dashboard

Setup

To run this pipeline, first clone the repo, then follow the instructions linked below.

git clone https://github.com/wbarakat/SatCat_ETL.git
cd SatCat_ETL

Instructions

Future Steps

  • Add unit tests for extraction functions
  • Add more tests for data quality/validation: missing values, duplicates, etc.
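
The data quality checks listed above could start out as something like the sketch below. This is only one possible shape, not something that exists in the repo yet: the helper names `find_missing` and `find_duplicates` are hypothetical, and the field names `NORAD_CAT_ID` and `SATNAME` are assumed to match the SatCat columns.

```python
from collections import Counter


def find_missing(records, required_fields):
    """Return indexes of records where any required field is absent or empty."""
    return [
        index
        for index, record in enumerate(records)
        if any(record.get(field) in (None, "") for field in required_fields)
    ]


def find_duplicates(records, key):
    """Return the key values that appear in more than one record."""
    counts = Counter(record.get(key) for record in records)
    return sorted(
        value for value, count in counts.items()
        if count > 1 and value is not None
    )


# Made-up sample rows for illustration only, not real catalog data.
rows = [
    {"NORAD_CAT_ID": "1", "SATNAME": "TEST SAT A"},
    {"NORAD_CAT_ID": "1", "SATNAME": ""},
    {"NORAD_CAT_ID": "2", "SATNAME": "TEST SAT B"},
]
missing = find_missing(rows, ["NORAD_CAT_ID", "SATNAME"])
duplicates = find_duplicates(rows, "NORAD_CAT_ID")
```

Checks like these could run as a task in the Airflow DAG before the Redshift copy, or be expressed as dbt tests on the transformed models instead.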