This repository contains the specification for the third deliverable of the 'Projects' class. Currently, the services are organized as a Docker Swarm stack in compose.yml, and the infrastructure is defined in Terraform files under terraform/.
To install the development prerequisites, please follow the instructions in the links below:
First, change your current working directory to the project's root directory and bootstrap the project:
# change current working directory
$ cd <path/to/cs-data-ingestion>
# bootstraps development and project dependencies
$ make bootstrap
NOTE: By default, poetry creates and manages virtual environments to install project dependencies -- meaning it works in isolation from your global Python installation. This avoids conflicts with other packages installed on your system.
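If you need to run project commands inside that virtual environment, the snippet below is a minimal sketch using standard Poetry commands (assuming make bootstrap installed the dependencies with Poetry, as noted above):
# run a command inside the Poetry-managed virtual environment
$ poetry run python --version
# inspect the virtual environment managed by Poetry
$ poetry env info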
If you wish to deploy the stack locally, jump to the Docker section. If you wish to deploy the services to AWS, on the other hand, continue to the Terraform section.
Terraform is a tool for building, changing, and versioning infrastructure safely and efficiently. Terraform can manage existing and popular service providers by generating an execution plan describing what it will do to reach the desired state (described in the project's Terraform files) and then executing that plan to build the described infrastructure. As the configuration changes, Terraform is able to determine what changed and create incremental execution plans that can be applied. For this project, the infrastructure is deployed to AWS.
Follow the instructions in the AWS CLI documentation to configure your AWS account locally. After that, update the profile variable to point to your account profile.
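For reference, a named profile can be configured with the AWS CLI as sketched below (my-profile is just an example name; use whichever profile your profile variable points to):
# configures a named AWS profile (prompts for access key, secret key, region, and output format)
$ aws configure --profile my-profile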
After you're done configuring your AWS profiles, change your current working directory to where the Terraform files are located and initialize it:
# change current working directory
$ cd terraform
# prepares the current working directory for use
$ terraform init
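Optionally, you can preview the execution plan before applying it. The sketch below uses the same example variables as the apply command in the next step:
# previews the changes Terraform will make, without applying them
$ terraform plan -var 'key_name=key' -var 'public_key_path=~/.ssh/key.pub'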
Now, apply the changes required to reach the desired state of the configuration described in the Terraform files. Make sure to correctly reference your SSH Key Pair or else Terraform won't be able to deploy the project's services:
# applies required changes and passes the SSH key pair as parameters
$ terraform apply -var 'key_name=key' -var 'public_key_path=~/.ssh/key.pub'
Note: Make sure the output for SSH Agent is true (SSH Agent: true). In case it isn't, please run $ eval "$(ssh-agent -s)" and $ ssh-add ~/.ssh/key and try again. Also, key should be the private key file that matches public_key_path (that is, the same file without the .pub extension).
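To double-check the agent setup, you can list the keys it currently holds (a quick sanity check, assuming the example key path above):
# lists the keys currently loaded into the SSH agent
$ ssh-add -l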
At this point, if the project was deployed correctly, you should be able to access the following resources:
- Airflow Webserver UI at http://<aws_instance.web.public_ip>:8080
- Flask frontend at http://<aws_instance.web.public_ip>:5000
Note: <aws_instance.web.public_ip> is the final output of the $ terraform apply ... command.
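If your Terraform configuration also exposes the instance's public IP as a named output (the output name below is hypothetical), you can retrieve it at any time instead of scrolling back through the apply log:
# prints the value of a Terraform output (adjust the output name to your configuration)
$ terraform output public_ip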
Besides the available resources, you may also SSH into the deployed machine at any time:
# connect to provisioned instance via SSH
$ ssh -i ~/.ssh/key.pub ubuntu@<aws_instance.web.public_ip>
In case you are having problems, you may want to look at Hashicorp's Terraform AWS Provider Documentation.
Once you're done, you may remove what was created by terraform apply:
# change current working directory
$ cd terraform
# destroys the Terraform-managed infrastructure
$ terraform destroy
Since the services are organized as a Docker Swarm stack, the following dependencies must be installed:
NOTE: If you're using a Linux system, please take a look at Docker's post-installation steps for Linux!
Once you have Docker installed, pull the Docker images of the services used by the stack:
# fetches services' docker images
$ make docker-pull
Now, build the missing Docker images with the following command:
# builds services' docker images
$ make docker-build
NOTE: In order to build development images, use the $ make docker-build-dev command instead!
Finally, update the env.d files for each service with the appropriate configurations, credentials, and any other necessary information.
NOTE: In order to generate a Fernet key for Airflow, please take a look here.
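As a quick alternative, a Fernet key can also be generated locally with the cryptography package (assuming it is installed, e.g. alongside Airflow):
# generates a Fernet key suitable for Airflow
$ python -c "from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())"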
In your deployment machine, initialize Docker Swarm mode:
# joins the swarm
$ docker swarm init
Note: For more information on what Swarm is and its key concepts, please refer to Docker's documentation.
Now that the deployment machine is in swarm mode, deploy the stack:
# deploys/updates the stack from the specified file
$ docker stack deploy -c compose.yml cs-data-ingestion
Check if all the services are running and have exactly one replica:
# list the services in the cs-data-ingestion stack
$ docker stack services cs-data-ingestion
You should see something like this:
ID            NAME                                 MODE        REPLICAS  IMAGE                              PORTS
9n8ldih68jnk  cs-data-ingestion_redis              replicated  1/1       bitnami/redis:6.0
f49nmgkv3v9i  cs-data-ingestion_airflow            replicated  1/1       bitnami/airflow:1.10.13            *:8080->8080/tcp
fxe80mcl98si  cs-data-ingestion_postgresql         replicated  1/1       bitnami/postgresql:13.1.0
ii6ak931z3so  cs-data-ingestion_airflow-scheduler  replicated  1/1       bitnami/airflow-scheduler:1.10.13
vaa3lkoq133d  cs-data-ingestion_airflow-worker     replicated  1/1       bitnami/airflow-worker:1.10.13
ipsdstxfvnpl  cs-data-ingestion_frontend           replicated  1/1       cs-data-ingestion:frontend         *:5000->5000/tcp
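If any service is stuck at 0/1 replicas, inspecting its tasks usually reveals why (for example, an image that failed to pull or a container that keeps exiting):
# lists the tasks of the stack, including error messages for failed ones
$ docker stack ps cs-data-ingestion --no-trunc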
At this point, the following resources will be available to you:
- Airflow Webserver UI is available at http://localhost:8080
- Flask frontend is available at http://localhost:5000/v1/render/images
NOTE: In case localhost doesn't work, you may try http://0.0.0.0:<port> instead.
In order to check a service's logs, use the following command:
# fetch the logs of a service
$ docker service logs <service_name>
NOTE: You may also follow the log output in real time with the --follow option (e.g. $ docker service logs --follow cs-data-ingestion_airflow). For more information on service logs, refer to Docker's documentation.
Once you're done, you may remove what was created by docker stack deploy and docker swarm init:
# removes the cs-data-ingestion stack from swarm
$ docker stack rm cs-data-ingestion
# leaves the swarm
$ docker swarm leave
NOTE: All the data created by the stack services will be lost. For more information on swarm commands, refer to Docker's documentation.
We are always looking for contributors of all skill levels! If you're looking to ease your way into the project, try out a good first issue.
If you are interested in contributing to the project, please take a look at our Contributing Guide. Also, feel free to drop by our community chat and say hi.
Also, thank you to all the people who already contributed to the project!
Copyright Β© 2020-present, CS Data Ingestion Contributors. This project is ISC licensed.