Reference Architectures for Datalakes on AWS (HTML; updated May 13, 2020)
Jupyter notebooks and AWS CloudFormation template to show how Hudi, Iceberg, and Delta Lake work
⛳️ PASS: Amazon Web Services Certified (AWS Certified) Machine Learning Specialty (MLS-C01) by learning based on our Questions & Answers (Q&A) Practice Tests Exams.
Project files for the post: Running PySpark Applications on Amazon EMR using Apache Airflow: Using the new Amazon Managed Workflows for Apache Airflow (MWAA) on AWS.
A command-line interface for packaging, deploying, and running your EMR Serverless Spark jobs
Project files for the post: Running PySpark Applications on Amazon EMR: Methods for Interacting with PySpark on Amazon Elastic MapReduce.
A VS Code Extension to make it easier to manage and develop Spark jobs on EMR
Bits of code I use during live demos
Run templatable playbooks of Hadoop, Spark, and similar jobs on Amazon EMR
Amazon EMR Notebook to show how to read from and write to Delta tables with Amazon EMR
3NF-normalize Yelp data on S3 with Spark and load it into Redshift - automate the whole thing with Apache Airflow
⛳️ PASS: Amazon Web Services Certified (AWS Certified) Data Analytics Specialty (DAS-C01) by learning based on our Questions & Answers (Q&A) Practice Tests Exams.
This repo provides cross-account integration code samples using Amazon S3 Access points
Sample CI/CD pipeline for using GitHub Actions with Amazon EMR Serverless Spark.
📓 Repository/Tutorial for initializing Jupyter Notebook and a Spark cluster on Amazon EMR
Orchestrate an Amazon EMR on Amazon EKS Spark job with AWS Step Functions
Configure Hadoop YARN CapacityScheduler on Amazon EMR on Amazon EC2 for multi-tenant heterogeneous workloads
PageRank implementation in Spark to rank authors and venues based on their publications in the DBLP dataset.
Samples related to data engineering, e.g. Spark, Embulk, and Airflow.