Learn how to deploy a live, real-time dashboard in a Python web app powered by end to end infrastructure as code!
The repo sets up simulated real-time IoT data, 3 data pipelines, CI/CD trigger automation, and 40 GCP resources in total through terraform, all in under 10 minutes!
Main Goal: Get something living and breathing on YOUR screen. Take away what's USEFUL FOR YOU, even if it's simply some code snippets!
Questions Explored:
- How do I make future Sung's life easier with this repo?
  - Using this for code snippets and conversation starters
- Is terraform as easy to pick up as random internet dwellers say?
  - Yes
- Can I build a frontend application in Python?
  - Yes, easier than expected
- What's it like working with Bigtable and are read/writes as fast as advertised?
  - It's cumbersome in general setting up and reading/writing to a NoSQL database, but it is pretty fast. Reads/writes usually take less than 500ms per query
- Is cicd worth the setup effort?
  - Yes if you do it gitops style. I did not do so. However, it serves as a good, automated paper trail for what's deployed/destroyed in your environments
- What's it like working with KMS encryption/decryption?
  - Functionality is easier than expected. Curating IAM rules is hard
- How do I make your life easier exploring the above with this repo?
  - You tell me
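The "less than 500ms per query" figure for Bigtable is easy to check for yourself with a small timing harness. Below is a minimal sketch using only the standard library. `fake_bigtable_read` is a hypothetical stand-in (it just sleeps), not code from this repo; swap in whatever read or write call you want to measure.

```python
import statistics
import time
from typing import Callable, Dict, List


def time_calls(operation: Callable[[], object], repeats: int = 20) -> List[float]:
    """Run `operation` `repeats` times and return each latency in milliseconds."""
    latencies_ms = []
    for _ in range(repeats):
        start = time.perf_counter()
        operation()
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    return latencies_ms


def summarize(latencies_ms: List[float]) -> Dict[str, float]:
    """Report mean, median, and worst-case latency in milliseconds."""
    return {
        "mean_ms": statistics.mean(latencies_ms),
        "median_ms": statistics.median(latencies_ms),
        "max_ms": max(latencies_ms),
    }


def fake_bigtable_read() -> None:
    """Hypothetical stand-in: replace with a real Bigtable read or write."""
    time.sleep(0.005)


if __name__ == "__main__":
    print(summarize(time_calls(fake_bigtable_read)))
```

Looking at the max alongside the mean matters here, since a dashboard that polls every second feels the slow outliers more than the average.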
What you'll be making!
- Use this as a demo for yourself and your team to launch an end to end data application
- Gain familiarity with terraform infrastructure as code to make launching future data pipelines easier: terraform modules
- Explore building a frontend in Python using the open source Dash Framework
- Reference the build yaml files as starter cicd templates for yourself: first_build.yaml, cloudbuild.yaml, destroy_build.yaml
- See how easy it is to build and push a docker image to a container registry: Dockerfile
- Get a look and feel for encrypting and decrypting credentials using KMS: secrets creator
- Explore how to read and write to a bigtable database with Python: cloud function src
- Show me how to do it better and how YOU are using it! ;)
What you'll ALSO be making!
- Buckets and core infrastructure to kick off the data pipelines. Decrypts and re-encrypts private service account keys each build
- Streaming real-time data with Dataflow-provided templates written in Java and a custom Python cloud function to write to Bigtable
- Store and consume data for multiple audiences
- The frontend webapp as demonstrated by the gif above! Decrypts once to access IoT Core registered devices
- Most important part: people worth sharing all this juicy data with
- Logging and monitoring automatically happen in the background. Some IAM access is created in terraform
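To make the "custom Python cloud function" piece more concrete: a Pub/Sub-triggered background function receives an event dict whose `data` field is base64-encoded. The sketch below shows only the decoding step. The payload field names (`device_id`, `temperature`, `timestamp`) and the function name `handle_reading` are assumptions for illustration, and the Bigtable write is left as a comment; see the cloud function src in the repo for the actual logic.

```python
import base64
import json


def parse_pubsub_event(event: dict) -> dict:
    """Decode the base64 `data` field of a Pub/Sub-triggered event into a dict."""
    raw = base64.b64decode(event["data"]).decode("utf-8")
    return json.loads(raw)


def handle_reading(event: dict, context=None) -> dict:
    """Hypothetical entry point: decode one device reading and return it."""
    reading = parse_pubsub_event(event)
    # A real function would write `reading` to Bigtable at this point.
    return reading


if __name__ == "__main__":
    # Assumed payload shape, for illustration only.
    sample = {"device_id": "temp-sensor-1", "temperature": 23.5, "timestamp": 1570000000}
    event = {"data": base64.b64encode(json.dumps(sample).encode("utf-8")).decode("ascii")}
    print(handle_reading(event))
```

Keeping the decoding in its own function means it can be tested locally without deploying anything.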
Component | Product Overview | Purpose | Azure Equivalents | AWS Equivalents |
---|---|---|---|---|
Cloud Storage | Object store for all kinds of file types | Store sensitive files such as tfstate and the private service account key. Raw data for ad hoc usage | File/Blob Storage | S3 |
Cloud Build | Build workflows for testing and deployment across multiple environments | Deploy, CICD, and destroy terraform-managed infrastructure | Pipelines | CodePipeline |
Compute Engine | Scalable virtual machines | Simulate devices registering to IoT Core | Virtual Machines | EC2 |
Cloud IoT Core | Manage, deploy, and ingest data from dispersed devices | Manages the simulated devices and ingests their data to Pub/Sub | IoT Hub | IoT Core |
Cloud Pub/Sub | Message queue for ingesting and delivering data to other services | Middleware that serves as a shock-absorber and funnels data for further transformation | Service Bus, Storage Queues | Kinesis |
Key Management Service | Managed encryption keys for secrets protection | Encrypts and decrypts the private service account key for each deployment | Key Vault | KMS |
Cloud Dataflow | Serverless stream and batch data processing | Loads data into parquet files and into a BigQuery table | Stream Analytics | Glue |
Cloud Functions | Event-driven serverless compute | Writes simulated temperature device data to Bigtable | Functions | Lambda Functions |
Cloud Bigtable | NoSQL database for large workloads | Stores simulated device data and configured for time series read operations | Table Storage | DynamoDB |
BigQuery | Serverless analytics data warehouse | Stores simulated device data for aggregate reporting metrics using standard SQL | Data Lake Analytics, Data Lake Store | Redshift/Athena depending on who you ask ;) |
Cloud Run | Run Docker containers in a fully-managed, serverless app | Hosts the dash app that visualizes simulated device data in real-time by querying Bigtable every second | Container Instances | Fargate |
Cloud IAM | Access control for managing cloud resources | Gives cloud build and terraform access to deploy and edit the services in scope | IAM | IAM |
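The Bigtable row in the table mentions being "configured for time series read operations". Bigtable stores rows sorted lexicographically by row key, so one common way to get there is a key made of the device ID plus a reversed timestamp, which puts the newest reading for a device first. The sketch below illustrates that pattern only; the key layout, the `#` separator, and `MAX_TIMESTAMP_MS` are assumptions, not necessarily the schema this repo uses.

```python
import time

# Assumption: larger than any epoch-millisecond timestamp the devices will send.
MAX_TIMESTAMP_MS = 10**13


def make_row_key(device_id: str, timestamp_ms: int) -> bytes:
    """Build `device_id#reversed_timestamp` so newer readings sort first."""
    reverse_ts = MAX_TIMESTAMP_MS - timestamp_ms
    # Zero-pad so that lexicographic order matches numeric order.
    return f"{device_id}#{reverse_ts:013d}".encode("utf-8")


def device_prefix(device_id: str) -> bytes:
    """Row key prefix that selects every reading for one device."""
    return f"{device_id}#".encode("utf-8")


if __name__ == "__main__":
    now_ms = int(time.time() * 1000)
    older = make_row_key("temp-sensor-1", now_ms - 1000)
    newer = make_row_key("temp-sensor-1", now_ms)
    # The newer reading comes first when the keys are sorted.
    print(sorted([older, newer])[0] == newer)
```

With keys like this, "latest N readings for a device" becomes a prefix scan with a row limit, which suits a dashboard that refreshes every second.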
- Sign up for a free trial OR use an existing GCP account
- Manually fork the repo through the github interface
- Create a new Google cloud project
- Manually connect the github app to cloud build through the github/GCP interfaces or follow these instructions (note: the link is for a different tutorial)
Note: you may be prompted to manually enable the Cloud Build API
Note: The rest of these instructions are written for cloud shell
- Clone the repo and get into starting position for deployment
# set the project ID within cloud shell
gcloud config set project <PROJECT_ID>
git clone https://github.com/<your-github-username>/iot-python-webapp.git
# change directory into the repo
cd iot-python-webapp/
What your terminal should look like
- Run the initial setup shell script that performs one-time tasks
# Example: bash ./initial_setup.sh -e <your-github-email> -u user_123 -p ferrous-weaver-256122 -s demo-service-account -g gcp_signup_name_3 -b master
# Notes: leave the GITHUB_BRANCH_NAME as "master" for this demo
# You can find the GCP_USERNAME for your project in the cloud shell terminal before the "@" in "realsww123@cloudshell"
# I recommend you investigate the script which showcases actions to NOT be managed by terraform
# Creates secret encryptions, terraform service accounts, and buckets as pre-requisites to the terraform deployment
# append this syntax to the end of the bash command
# if you want to save your terminal output to a text file
####
2>&1 | tee SomeFile.txt
####
# template
bash ./initial_setup.sh [-e GITHUB_EMAIL] [-u GITHUB_USERNAME] [-p PROJECT_ID] [-s SERVICE_ACCOUNT_NAME] [-g GCP_USERNAME] [-b GITHUB_BRANCH_NAME]
Double-check that the secrets file is uploaded to the bucket and that the terraform files reflect the command line arguments you set
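One quick way to do the terraform half of that double check is to search the repo for the project ID you passed in. This is a rough sketch, not part of the setup script: the function name `files_mentioning` is made up, and the `.tf`/`.tfvars` suffixes are an assumption about where the values end up.

```python
from pathlib import Path
from typing import List, Tuple


def files_mentioning(
    root: str, needle: str, suffixes: Tuple[str, ...] = (".tf", ".tfvars")
) -> List[str]:
    """Return paths (relative to `root`) of terraform files containing `needle`."""
    matches = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            text = path.read_text(encoding="utf-8", errors="ignore")
            if needle in text:
                matches.append(str(path.relative_to(root)))
    return matches


if __name__ == "__main__":
    # Run from the repo root with the project ID you passed to -p.
    print(files_mentioning(".", "ferrous-weaver-256122"))
```

An empty list suggests the setup script did not rewrite the config with your arguments, which is worth sorting out before kicking off the first build.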
- Run the first cloud build job that sets up everything in your project
# note: enabling apis may lag behind other services
# it is accounted for in the initial setup script above
gcloud builds submit --config=first_build.yaml
- Check to see if the webapp exists at the URL listed in the terminal after first_build.yaml completes successfully
gcloud beta run services list --platform managed
Click on the link to launch the web app
Instead of manually clicking ever-changing interfaces in the console for any changes, this step looks through the code and robustly applies those changes in a transparent way. This allows for an easy to follow paper trail and rollbacks by simply rerunning a previous build.
Commit and push changes to your github repo. This will automatically trigger a build using the logic in cloudbuild.yaml
# This will create a new commit to the master branch in github
# Note: MUST be the first commit to trigger build properly
# Any other commit will not reference the appropriate terraform config
# you recently created above
git status
git add --all
git commit -m "Update terraform config files"
git push origin master
Explore the cloud build history to verify a successful build
Check to see if the app exists after the cloudbuild history updates.
You should see an updated timestamp to the web app
gcloud beta run services list --platform managed
# deletes devices in IoT registry
# destroys terraform deployed resources
gcloud builds submit --config=destroy_build.yaml
Note: if you want to destroy everything, you can delete everything via the console OR simply delete the project you ran the deployment instructions in for a clean slate!
- I store the tfstate in a remote storage bucket to prevent multiple deployments overriding each other
- Bigtable was used to taste and see how fast read/writes were for time series data. Turns out each read/write takes less than 500ms on average, which is pretty fast for Python
- Terraform has yet to create an official module dependency framework. They currently have resource dependency, but it's an incredible amount of code overhead to implement for enabling google apis: click here. Thankfully, module dependency is on the official roadmap, so I'm leaving this in the backlog to enhance after this feature is released: click here
- Cloud function writes to Bigtable because it's more than enough to handle 3 devices sending concurrent invocations. Dataflow is an alternative
- Dataflow Java templates are used to write to BigQuery and GCS because it was easy as pie to implement
- KMS is used to launch terraform services with specific role access AND for cloud run to access the IoT device registry. In a real-world context, it'd follow least-privilege access principles
- There is no formal testing of this demo outside of multiple walkthroughs of the deployment instructions. My goal was to explore, not to create the most robust app for production on day one
- KMS key rings can NOT be deleted, because GCP keeps a record of key ring names that can't be used anymore. If you're going to redeploy, you must rename the key ring or it'll error out
- An IoT registry cannot be force deleted if devices are tied to it
- Terraform support for Cloud Run still needs further development. Work outside terraform is needed to expose the app to the public internet
- For google apis, if it's the first time enabling them, terraform may error out and force you to manually enable them or rerun the terraform build
- Managing secrets and setting up IAM at a granular level is a project of its own. You'll notice most of the roles grant wide permissions for demo purposes
- Setting up good parameters for interoperability across modules requires robust, upfront repo planning
- Dataflow jobs have to be replaced every time you redeploy infrastructure with terraform, even if you don't make any changes! This will disrupt the live data flow, so be mindful when redeploying
- Terraform features lag a couple of months behind the release of a new GCP service
- Next time, I would create a distinct pub/sub push subscription for the cloud function and pull subscriptions for the dataflow jobs for the same topic to employ the proper throughput mechanisms
- PLEASE EXPLICITLY VERSION ALL YOUR DEPENDENCIES SUCH AS TERRAFORM PROVIDER VERSIONS AND PYTHON PACKAGES OR THEY WILL BITE YOU IN THE BUTT
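On the Python side, the versioning advice above can be enforced with a tiny check that flags any requirements line not pinned with `==`. This is a minimal sketch, not something shipped in the repo: the regex covers only simple `name==version` lines, and the package versions in the demo are made-up examples.

```python
import re
from typing import List

# Matches simple pins such as `dash==1.4.0` or `package[extra]==2.0`.
PINNED = re.compile(r"^[A-Za-z0-9._\-\[\]]+==[^=\s]+")


def unpinned_requirements(requirements_text: str) -> List[str]:
    """Return requirement lines that are not pinned to an exact version."""
    unpinned = []
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        if not PINNED.match(line):
            unpinned.append(line)
    return unpinned


if __name__ == "__main__":
    # Hypothetical requirements content; versions are illustrative only.
    example = "dash==1.4.0\npandas>=0.25\ngunicorn\n"
    print(unpinned_requirements(example))
```

Running something like this in a build step fails fast on a loose dependency instead of letting it surprise you weeks later.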
- My stackshare decision!: Think twitter for developers
- IoT Reference Example: The java equivalent of what this repo does
- Another IoT Reference Example: Official GCP documentation for reference architecture
- Terraform Cloud Build Example: If you want to focus on cloudbuild setup
- IoT Pipeline Qwiklab: Where I got the device simulator scripts and general starting point
All feedback is welcome! You can use the issue tracker to submit bugs, ideas, etc. Pull requests are splendid!
My master branch will be protected, so no changes will come through without my formal approval.