Project 3 - STEDI Human Balance Analytics

Project Introduction

Spark and AWS Glue allow you to process data from multiple sources, categorize the data, and curate it to be queried in the future for multiple purposes.

In this project, as a data engineer on the STEDI Step Trainer team, you'll extract the data produced by the STEDI Step Trainer sensors and the mobile app, and curate it into a data lakehouse solution on AWS so that data scientists can train the machine learning model.

Project Details

The STEDI Team has been hard at work developing a hardware STEDI Step Trainer that:

  • trains the user to do a STEDI balance exercise;
  • has sensors on the device that collect data to train a machine learning algorithm to detect steps;
  • has a companion mobile app that collects customer data and interacts with the device sensors.

STEDI has heard from millions of early adopters who are willing to purchase the STEDI Step Trainers and use them.

Several customers have already received their Step Trainers, installed the mobile application, and begun using them together to test their balance. The Step Trainer is just a motion sensor that records the distance to the object it detects. The app uses the mobile phone's accelerometer to detect motion in the X, Y, and Z directions.

The STEDI team wants to use the motion sensor data to train a machine learning model to detect steps accurately in real time. Privacy will be a primary consideration in deciding what data can be used.

Some of the early adopters have agreed to share their data for research purposes. Only these customers’ Step Trainer and accelerometer data should be used in the training data for the machine learning model.
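The consent rule above can be sketched in plain Python. This is only an illustration of the filtering logic, not the project's Glue job: the field name `shareWithResearchAsOf`, the `email` field, and the sample records are assumptions, since the schema is not given here.

```python
from typing import Iterable

# Hypothetical field name: the customer schema is not specified in this README.
CONSENT_FIELD = "shareWithResearchAsOf"


def filter_consenting(customers: Iterable[dict]) -> list[dict]:
    """Keep only customer records that carry a consent value."""
    return [c for c in customers if c.get(CONSENT_FIELD) is not None]


# Made-up sample records for illustration only.
customers = [
    {"email": "a@example.com", "shareWithResearchAsOf": 1655564444103},
    {"email": "b@example.com", "shareWithResearchAsOf": None},
    {"email": "c@example.com"},  # consent field missing entirely
]

consenting = filter_consenting(customers)
print([c["email"] for c in consenting])
```

Records with a missing or null consent value are dropped, so only customers who explicitly opted in reach the downstream zones.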

Project Summary

Landing Zone

Glue Tables created (SQL DDL scripts):

Screenshots of SELECT statements from Athena showing the customer landing data and accelerometer landing data:

  • customer_landing table:

  • accelerometer_landing table:

  • record count check for each landing zone table:

Trusted Zone

Glue Job Scripts:

Use Athena to query Trusted Glue Tables

  • customer_trusted table:

Curated Zone

Glue Job Scripts:

  • customer_trusted_to_curated.py
  • trainer_trusted_to_curated: populates the machine_learning_curated Glue table, an aggregated table that pairs each Step Trainer reading with the accelerometer reading for the same timestamp, but only for customers who have agreed to share their data
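
The aggregation described for machine_learning_curated can be sketched in plain Python as a timestamp join restricted to consenting customers. This is a minimal illustration under stated assumptions, not the actual Glue job script: the function `build_ml_curated` and every field name (`serialNumber`, `sensorReadingTime`, `distanceFromObject`, `user`, `timestamp`, `x`, `y`, `z`, `email`) are hypothetical, as is the sample data.

```python
from collections import defaultdict


def build_ml_curated(step_readings, accel_readings, curated_customers):
    """Join step trainer and accelerometer readings on timestamp.

    Only readings that belong to customers in `curated_customers`
    (those who agreed to share their data) are kept.
    All field names are assumptions made for this sketch.
    """
    serials = {c["serialNumber"] for c in curated_customers}
    emails = {c["email"] for c in curated_customers}

    # Index the consenting customers' accelerometer readings by timestamp.
    accel_by_ts = defaultdict(list)
    for reading in accel_readings:
        if reading["user"] in emails:
            accel_by_ts[reading["timestamp"]].append(reading)

    rows = []
    for step in step_readings:
        if step["serialNumber"] not in serials:
            continue  # device does not belong to a consenting customer
        for accel in accel_by_ts.get(step["sensorReadingTime"], []):
            rows.append({**step, **accel})
    return rows


# Made-up sample data for illustration only.
curated_customers = [{"email": "a@example.com", "serialNumber": "SN-1"}]
step_readings = [
    {"serialNumber": "SN-1", "sensorReadingTime": 100, "distanceFromObject": 250},
    {"serialNumber": "SN-2", "sensorReadingTime": 100, "distanceFromObject": 300},
    {"serialNumber": "SN-1", "sensorReadingTime": 200, "distanceFromObject": 260},
]
accel_readings = [
    {"user": "a@example.com", "timestamp": 100, "x": 1.0, "y": 0.0, "z": -1.0},
    {"user": "b@example.com", "timestamp": 200, "x": 0.5, "y": 0.5, "z": 0.5},
]

ml_curated = build_ml_curated(step_readings, accel_readings, curated_customers)
print(len(ml_curated))
```

The sketch joins on timestamp alone, as the description above states. With many customers, readings from different devices could share a timestamp, so a production job might also match the device to its owner.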

Screenshot of all Glue Tables