Satellite Catalog ETL Pipeline

I built this pipeline to extract SatCat (satellite catalog) data from SpaceTrack and visualize it using Google Data Studio.

Motivation

I built this project to further develop my data engineering skills and to learn more about ETL processes and technologies.

Architecture

  1. Create AWS resources with Terraform
  2. Extract data using SpaceTrack API
  3. Load into AWS S3
  4. Copy into AWS Redshift
  5. Orchestrate with Airflow in Docker
  6. Transform using dbt
  7. Visualize with Google Data Studio Dashboard
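
As a rough illustration of steps 2 through 4, the sketch below shows how the extract, load, and copy stages might fit together. It is not the code in this repo. The query URL follows the Space-Track REST pattern as I understand it and should be checked against the Space-Track API documentation; the S3 key layout and every table, bucket, and IAM role name are hypothetical. Network and AWS calls are deliberately left out so the helpers stay self-contained.

```python
import csv
import io
from datetime import date

# Assumption: Space-Track REST base URL and query pattern for the SATCAT class.
SPACETRACK_BASE = "https://www.space-track.org"


def satcat_query_url(fmt: str = "json") -> str:
    """Build the query URL for the full satellite catalog (step 2)."""
    return f"{SPACETRACK_BASE}/basicspacedata/query/class/satcat/format/{fmt}"


def s3_key_for(run_date: date) -> str:
    """Return a date-partitioned S3 object key (hypothetical layout, step 3)."""
    return f"satcat/{run_date:%Y/%m/%d}/satcat.csv"


def records_to_csv(records: list[dict]) -> str:
    """Serialize extracted records to CSV text with a header row."""
    if not records:
        return ""
    buf = io.StringIO()
    # Column order is taken from the first record.
    writer = csv.DictWriter(
        buf, fieldnames=list(records[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


def redshift_copy_sql(table: str, bucket: str, key: str, iam_role: str) -> str:
    """Build a Redshift COPY statement that loads the CSV from S3 (step 4)."""
    return (
        f"COPY {table} FROM 's3://{bucket}/{key}' "
        f"IAM_ROLE '{iam_role}' CSV IGNOREHEADER 1;"
    )


if __name__ == "__main__":
    # All names below are placeholders, not values from this repo.
    key = s3_key_for(date(2023, 1, 5))
    print(satcat_query_url())
    print(redshift_copy_sql("satcat_raw", "my-satcat-bucket", key,
                            "arn:aws:iam::123456789012:role/my-redshift-role"))
```

In an Airflow DAG, each helper would typically sit behind its own task so that a failed load or copy can be retried without repeating the extraction.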

Dashboard

Setup

To run this pipeline, first clone the repo, then follow the instructions linked below.

git clone https://github.com/wbarakat/SatCat_ETL.git
cd SatCat_ETL

Instructions

Future Steps

  • Add unit tests for extraction functions
  • Add more tests for data quality/validation: missing values, duplicates, etc.
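
The data quality checks mentioned above could start out as small pure functions like the ones below. This is only a sketch of one possible approach, not something implemented in this repo. The field names NORAD_CAT_ID and SATNAME are assumed to match the SatCat columns returned by SpaceTrack, and the function names are hypothetical.

```python
from collections import Counter


def find_duplicates(records: list[dict], key: str = "NORAD_CAT_ID") -> list:
    """Return the key values that appear in more than one record."""
    counts = Counter(r.get(key) for r in records)
    return sorted(v for v, n in counts.items() if v is not None and n > 1)


def find_missing(records: list[dict],
                 required: tuple = ("NORAD_CAT_ID", "SATNAME")) -> list[int]:
    """Return indexes of records where any required field is absent or empty."""
    return [
        i for i, r in enumerate(records)
        if any(r.get(field) in (None, "") for field in required)
    ]


if __name__ == "__main__":
    # Made-up sample rows for illustration only.
    sample = [
        {"NORAD_CAT_ID": "5", "SATNAME": "VANGUARD 1"},
        {"NORAD_CAT_ID": "5", "SATNAME": "VANGUARD 1"},
        {"NORAD_CAT_ID": "11", "SATNAME": ""},
    ]
    print(find_duplicates(sample))  # ['5']
    print(find_missing(sample))     # [2]
```

Checks like these could run as a task before the Redshift copy, or be expressed as dbt tests (for example `unique` and `not_null`) once the data is in the warehouse.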
