This repository contains code for producing data to an AWS Kinesis data stream and then reading it from Kinesis using Databricks Delta Live Tables (DLT).
- AWS credentials and sufficient rights for creating a Kinesis data stream
- Databricks Premium Tier account
- Databricks permissions for running Delta Live Tables
- YouTube data from Kaggle (https://www.kaggle.com/datasets/datasnaek/youtube-new)
- Clone https://github.com/MUmarAmanat/dlt_aws.git into Databricks Repos
- First, execute the StreamDataProducer notebook
- Then execute the StreamDataProducer notebook from Workflow/Delta Live Tables as a DLT pipeline
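
For orientation, the producer step can be sketched roughly as follows. This is a minimal illustration and not the code in the StreamDataProducer notebook: the function names, the `video_id` partition key, the file name, the stream name, and the region are all assumptions chosen for the example. It reads CSV rows, turns each row into a JSON record, and sends the records to Kinesis with boto3's `put_records` in batches of at most 500, the per-call limit.

```python
import csv
import io
import json


def build_records(csv_text, partition_key_field="video_id"):
    """Convert CSV text into a list of Kinesis PutRecords entries.

    `partition_key_field` is an assumption; any column with reasonably
    varied values can serve as the partition key.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    records = []
    for row in reader:
        records.append({
            "Data": json.dumps(row).encode("utf-8"),
            "PartitionKey": row.get(partition_key_field) or "unknown",
        })
    return records


def batches(items, size=500):
    """Yield consecutive chunks of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def send_to_kinesis(records, stream_name, region_name):
    """Send records to a Kinesis data stream; returns the number of failed records."""
    import boto3  # imported here so the helpers above work without AWS access

    client = boto3.client("kinesis", region_name=region_name)
    failed = 0
    for chunk in batches(records):
        response = client.put_records(StreamName=stream_name, Records=chunk)
        failed += response["FailedRecordCount"]
    return failed


if __name__ == "__main__":
    # Hypothetical file, stream, and region names; replace with your own.
    with open("USvideos.csv", encoding="utf-8") as f:
        recs = build_records(f.read())
    print(send_to_kinesis(recs, "youtube-stream", "us-east-1"))
```

Records reported in `FailedRecordCount` are not retried in this sketch; a real producer would typically resend them.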