This project streams change data capture (CDC) events through AWS Kinesis and pushes them into a historical Data Warehouse (one that stores the full history of the data).
From the transactional database, we can build a Data Warehouse that stores historical data using SCD (Slowly Changing Dimension) Type 2.
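SCD Type 2 never overwrites a dimension row. When a tracked attribute changes, it closes the current version (sets its end date and clears its "current" flag) and appends a new version. The minimal in-memory sketch below shows that rule. The `UserVersion` fields (`email`, `valid_from`, `valid_to`, `is_current`) are hypothetical and are not taken from this project's schema.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class UserVersion:
    """One row of a hypothetical SCD Type 2 user dimension."""
    user_id: str
    email: str
    valid_from: datetime
    valid_to: Optional[datetime] = None  # None means "still valid"
    is_current: bool = True


def apply_scd2(history: List[UserVersion], user_id: str, email: str,
               changed_at: datetime) -> List[UserVersion]:
    """Record a change: expire the current version and append a new one."""
    current = next(
        (row for row in history if row.user_id == user_id and row.is_current),
        None,
    )
    if current is not None:
        if current.email == email:
            return history  # nothing tracked changed, so history stays as-is
        # Expire the old version instead of overwriting it.
        current.valid_to = changed_at
        current.is_current = False
    history.append(UserVersion(user_id, email, changed_at))
    return history
```

After two different emails for the same `user_id`, the history holds two rows. The first row is closed at the time of the change, and the second row is the current one.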
Prerequisites:
- An AWS account to set up the infrastructure
- Docker, to build the image that is deployed to Lambda as a container
The data will be generated by a data generation script at
The script generates data and writes it to AWS DynamoDB using Boto3, but you must first grant the credentials it runs with permission to access DynamoDB.
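The project's generator script is not shown here, so the following is only a sketch of what such a script might look like. It builds fake user records and writes each one with the Boto3 `Table.put_item` call. The field names, the `FIRST_NAMES` list, and the helper names `generate_users` / `put_users` are hypothetical.

```python
import random
import uuid
from datetime import datetime, timezone
from typing import Dict, List

# Hypothetical sample values; the real script's fields may differ.
FIRST_NAMES = ["alice", "bob", "carol", "dave", "erin"]


def generate_users(n: int, seed: int = 0) -> List[Dict[str, str]]:
    """Build n fake user records shaped as DynamoDB items (all string attributes)."""
    rng = random.Random(seed)
    now = datetime.now(timezone.utc).isoformat()
    users = []
    for _ in range(n):
        name = rng.choice(FIRST_NAMES)
        users.append({
            "user_id": str(uuid.uuid4()),  # assumed partition key
            "name": name,
            "email": f"{name}{rng.randint(1, 9999)}@example.com",
            "updated_at": now,
        })
    return users


def put_users(table, users: List[Dict[str, str]]) -> int:
    """Write each user via Table.put_item; `table` is a boto3 DynamoDB Table resource."""
    for user in users:
        table.put_item(Item=user)
    return len(users)
```

With Boto3 installed and credentials configured, usage might be `put_users(boto3.resource("dynamodb").Table("user_dim"), generate_users(10))`.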
Set up an AWS Kinesis data stream
- Go to the Amazon Kinesis console
- Choose Create data stream, enter a name for the data stream (for example: user_stream), and create it
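The console steps above can also be scripted with the Boto3 Kinesis client, as in this sketch. Choosing on-demand capacity mode is an assumption; provisioned mode with a `ShardCount` also works. The helper name `create_user_stream` is hypothetical.

```python
def create_user_stream(kinesis_client, stream_name: str = "user_stream") -> str:
    """Create an on-demand Kinesis data stream and return its ARN.

    `kinesis_client` is expected to be boto3.client("kinesis").
    """
    # On-demand mode lets Kinesis manage shard capacity, so no ShardCount is passed.
    kinesis_client.create_stream(
        StreamName=stream_name,
        StreamModeDetails={"StreamMode": "ON_DEMAND"},
    )
    # Block until the stream has finished creating.
    kinesis_client.get_waiter("stream_exists").wait(StreamName=stream_name)
    summary = kinesis_client.describe_stream_summary(StreamName=stream_name)
    return summary["StreamDescriptionSummary"]["StreamARN"]
```

The returned ARN is the value DynamoDB needs when you attach the stream to the table.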
Set up AWS DynamoDB
- Go to the AWS DynamoDB console
- Create the user_dim table
- Open user_dim -> Exports and Streams
- Under Amazon Kinesis data stream details, choose Enable, then choose the stream you created above
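Here is a Boto3 sketch of the same DynamoDB steps. It assumes a `user_id` string partition key and on-demand billing; neither is stated in this project. The helper name `create_user_table` is hypothetical.

```python
def create_user_table(dynamodb_client, stream_arn: str,
                      table_name: str = "user_dim") -> None:
    """Create the table and stream its item-level changes into Kinesis.

    `dynamodb_client` is expected to be boto3.client("dynamodb");
    the `user_id` partition key is an assumption.
    """
    dynamodb_client.create_table(
        TableName=table_name,
        KeySchema=[{"AttributeName": "user_id", "KeyType": "HASH"}],
        AttributeDefinitions=[{"AttributeName": "user_id", "AttributeType": "S"}],
        BillingMode="PAY_PER_REQUEST",
    )
    # Wait for the table to become ACTIVE before attaching a streaming destination.
    dynamodb_client.get_waiter("table_exists").wait(TableName=table_name)
    # Scripted equivalent of Exports and Streams -> Amazon Kinesis data stream details -> Enable.
    dynamodb_client.enable_kinesis_streaming_destination(
        TableName=table_name,
        StreamArn=stream_arn,
    )
```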
Build your image and push it to Amazon Elastic Container Registry (ECR)
- Go to the Amazon Elastic Container Registry console -> Repositories, then create a repository to store the image
- Choose View push commands and follow the instructions
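The push commands follow a fixed pattern. The image is tagged with the registry host `<account>.dkr.ecr.<region>.amazonaws.com`, followed by the repository name and tag. The helper below reproduces that pattern so you can see how the pieces fit together. The console's View push commands output remains the authoritative version. The helper name and the example repository name `cdc-lambda` are hypothetical.

```python
from typing import List


def ecr_push_commands(account_id: str, region: str, repo: str,
                      tag: str = "latest") -> List[str]:
    """Return shell commands that log in to ECR, then build, tag and push an image."""
    registry = f"{account_id}.dkr.ecr.{region}.amazonaws.com"
    image_uri = f"{registry}/{repo}:{tag}"
    return [
        # Docker authenticates to the private registry with a short-lived token.
        f"aws ecr get-login-password --region {region} | "
        f"docker login --username AWS --password-stdin {registry}",
        f"docker build -t {repo} .",  # produces the local image {repo}:latest
        f"docker tag {repo}:latest {image_uri}",
        f"docker push {image_uri}",
    ]
```

The last element is the image URI that the Lambda "Container image" option lets you pick from the repository.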
Set up AWS Lambda
- Go to AWS Lambda -> Create function -> choose Container image, then enter a name for the Lambda function and choose the image in your repository
- Add permission for Lambda to read from Kinesis: go to Configuration -> Permissions -> Role name, then add the AmazonKinesisReadOnlyAccess policy to the Lambda function's IAM role (do this before adding the trigger, because creating the Kinesis trigger can fail if the role cannot read the stream)
- Enable the Kinesis trigger: choose your Lambda function -> Add trigger -> choose Kinesis as the source and select the Kinesis data stream you created above
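Lambda receives a batch of Kinesis records. Each record's `kinesis.data` field is base64-encoded, and for a DynamoDB-fed stream it decodes to a JSON change event with `eventName` and a `dynamodb` section holding `Keys` and `NewImage` in DynamoDB's typed JSON. The sketch below decodes those records. The handler name `lambda_handler` and the helper names are hypothetical; they must match the handler configured in your image. The step that writes to Postgres is deliberately left out.

```python
import base64
import json
from decimal import Decimal
from typing import Any, Dict, List, Optional


def from_dynamodb_json(attr: Dict[str, Any]) -> Any:
    """Convert one DynamoDB-typed value such as {"S": "x"} into a Python value."""
    type_tag, value = next(iter(attr.items()))
    if type_tag == "S":
        return value
    if type_tag == "N":
        return Decimal(value)  # numbers arrive as strings; Decimal keeps precision
    if type_tag == "BOOL":
        return value
    if type_tag == "NULL":
        return None
    if type_tag == "M":
        return {k: from_dynamodb_json(v) for k, v in value.items()}
    if type_tag == "L":
        return [from_dynamodb_json(v) for v in value]
    raise ValueError(f"unsupported DynamoDB type: {type_tag}")


def parse_change(record: Dict[str, Any]) -> Dict[str, Any]:
    """Decode one Kinesis record that carries a DynamoDB change event."""
    payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
    change = payload["dynamodb"]
    new_image: Optional[Dict[str, Any]] = change.get("NewImage")
    return {
        "event_name": payload["eventName"],  # INSERT, MODIFY or REMOVE
        "keys": {k: from_dynamodb_json(v) for k, v in change["Keys"].items()},
        "new_image": (
            {k: from_dynamodb_json(v) for k, v in new_image.items()}
            if new_image else None  # REMOVE events carry no NewImage
        ),
    }


def lambda_handler(event: Dict[str, Any], context: Any) -> Dict[str, int]:
    changes: List[Dict[str, Any]] = [parse_change(r) for r in event["Records"]]
    # Hypothetical next step: apply `changes` to the Postgres warehouse as SCD Type 2.
    return {"processed": len(changes)}
```

Boto3 ships `boto3.dynamodb.types.TypeDeserializer` for the same conversion. The hand-written version keeps the sketch free of dependencies.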
Set up AWS RDS using Postgres
- You can watch this video to create the Data Warehouse (a Postgres database) and review the data using DBeaver:
- After you create the database, run the create_user_dim.sql script to create the user_dim table
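The contents of create_user_dim.sql are not shown here. The sketch below therefore assumes a common SCD Type 2 layout: a surrogate key, the business key, validity timestamps, and a current flag. It also shows the point-in-time query this layout enables. SQLite stands in for Postgres only so the example runs locally. In Postgres you would more likely use an identity column, `TIMESTAMP` columns, and a `BOOLEAN` flag. All column names are hypothetical.

```python
import sqlite3

# Hypothetical SCD Type 2 layout; the real create_user_dim.sql may differ.
DDL = """
CREATE TABLE user_dim (
    user_key   INTEGER PRIMARY KEY,  -- surrogate key, one per version
    user_id    TEXT NOT NULL,        -- business key from DynamoDB
    email      TEXT,
    valid_from TEXT NOT NULL,        -- ISO-8601 text sorts chronologically
    valid_to   TEXT,                 -- NULL while the version is current
    is_current INTEGER NOT NULL
)
"""

# Point-in-time lookup: which version of each user was valid at :ts?
AS_OF = """
SELECT user_id, email FROM user_dim
WHERE valid_from <= :ts AND (valid_to IS NULL OR valid_to > :ts)
ORDER BY user_id
"""


def build_demo() -> sqlite3.Connection:
    """Create the table in memory and load two versions of one user."""
    conn = sqlite3.connect(":memory:")
    conn.execute(DDL)
    conn.executemany(
        "INSERT INTO user_dim (user_id, email, valid_from, valid_to, is_current) "
        "VALUES (?, ?, ?, ?, ?)",
        [
            ("u1", "old@example.com", "2024-01-01T00:00:00", "2024-03-01T00:00:00", 0),
            ("u1", "new@example.com", "2024-03-01T00:00:00", None, 1),
        ],
    )
    return conn


def as_of(conn: sqlite3.Connection, ts: str):
    """Return (user_id, email) rows that were valid at timestamp `ts`."""
    return conn.execute(AS_OF, {"ts": ts}).fetchall()
```

Querying a date in February returns the old email, and a date in April returns the new one. This history is exactly what an overwrite-in-place table would lose.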
All of these AWS services can also be set up using the AWS CLI or AWS CDK.
- First, I run the
and it creates 10 users and inserts them into the database
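After the generator runs, one way to confirm that the changes flowed end to end is to count the current rows in the warehouse. The function below accepts any DB-API 2.0 connection, for example one from `psycopg2.connect(...)` pointed at the RDS instance. The `is_current` column is an assumption; adjust it to match create_user_dim.sql.

```python
from typing import Any


def count_current_users(conn: Any) -> int:
    """Count rows flagged as current in user_dim using a DB-API 2.0 connection."""
    cur = conn.cursor()
    try:
        # Works for a BOOLEAN column in Postgres and a 0/1 INTEGER column in SQLite.
        cur.execute("SELECT COUNT(*) FROM user_dim WHERE is_current")
        return cur.fetchone()[0]
    finally:
        cur.close()
```

For the 10 generated users, you would expect a count of 10 once Lambda has processed the stream.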
Contact me if you have any questions about this project.