This demo is a homework assignment for a job application at Aiven. It demonstrates recording system metrics to a Postgres database via Apache Kafka: the Kafka producer and consumer are implemented with the Monix-kafka library, and Slick is used to access Postgres.
All the required services are deployed to the Aiven cloud by Terraform-based automation.
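
For orientation, below is a minimal sketch of what the producer side of such a pipeline looks like with Monix-kafka. The broker address, topic name, and payload shape are illustrative assumptions (and Aiven's required SSL settings are omitted), not the actual code of this repo.

```scala
import monix.execution.Scheduler
import monix.kafka.{KafkaProducer, KafkaProducerConfig}

object ProducerSketch extends App {
  implicit val scheduler: Scheduler = Scheduler.global

  // Placeholder broker address; the real one comes from the Aiven service
  // created by ./init-aiven. Aiven's required SSL settings are omitted here.
  val config = KafkaProducerConfig.default.copy(
    bootstrapServers = List("my-kafka.aivencloud.com:12345")
  )

  val producer = KafkaProducer[String, String](config, scheduler)

  // A hypothetical system-metrics payload, serialized as a plain JSON string.
  val payload = s"""{"host":"storm","cpu":0.42,"ts":${System.currentTimeMillis}}"""

  // send returns a Task; runSyncUnsafe blocks until the broker acknowledges.
  producer.send("system-metrics", payload).runSyncUnsafe()
  producer.close().runSyncUnsafe()
}
```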
Prerequisites:

- An empty project in the Aiven console with enough credits.
- All `init-*` scripts require Python (either 2 or 3) to be available on `$PATH` as `python`.
- All `init-*` scripts require the Aiven CLI to be installed and available on `$PATH`.
- `init-aiven` additionally requires Terraform.
- The main code is written in Scala 2.13, which requires JDK 11 or later.
- Build and run rely on sbt. A minimal sbt launcher is provided, so only a JDK is a prerequisite, but an existing sbt installation can be used as well.
- All the shell commands below are to be run from the root of this repo clone.
WARNING: this works on actual cloud infrastructure! Never run it against a production project!
Run `./init-aiven` and confirm each step.

After this is done, further updates to the Terraform config can be applied with `terraform apply` from the same clone of this repo. To run it from a different copy, the file `terraform.tfstate` needs to be moved there.
`./init-kafka` will set up everything needed to connect to Kafka. This can be done independently on different copies of this repo, setting up as many instances as desired.
`./init-postgres` will set up everything needed to run the Postgres readers. This can be done independently on different copies of this repo as well.
Integration tests run against real servers in the Aiven cloud, deployed as described above. They can be started with `bin/sbt it:test`, or individually with `bin/sbt "it:testOnly ..."`.
Several main classes are defined in Mains.scala. The sbt command `run` will show a menu to select one; alternatively, use `bin/sbt "runMain ..."` with a fully qualified name to start one directly:
- `aiven.kafkapg.ToKafkaConnectEvery3s` sends messages to the topic read by the Kafka Connect service. In this case no consumer is needed: the data ends up in Postgres automatically.
- `aiven.kafkapg.ToKafkaEvery3s` sends messages to the topic read by one of the consumers below.
- `aiven.kafkapg.NoiseToKafkaEvery5s` is the same as above, but sends noisy messages to test the consumers' error tolerance.
- `aiven.kafkapg.FromKafkaToConsole` consumes messages and prints them to the console, ignoring errors (a sketch of this kind of consumer follows the list).
- `aiven.kafkapg.FromKafkaToPg` consumes messages and stores them to Postgres; it currently fails on errors ("dead-letters" behavior is still TBD).
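
For comparison with the mains above, here is a minimal sketch of a console consumer in the style of FromKafkaToConsole. As before, the broker address, topic name, and group id are assumptions, and Aiven's SSL settings are omitted.

```scala
import monix.execution.Scheduler.Implicits.global
import monix.kafka.{KafkaConsumerConfig, KafkaConsumerObservable}

object ConsumerSketch extends App {
  // Placeholder connection settings; Aiven's SSL options are omitted.
  val config = KafkaConsumerConfig.default.copy(
    bootstrapServers = List("my-kafka.aivencloud.com:12345"),
    groupId = "console-readers"
  )

  // Stream records from the topic and print each value as it arrives.
  KafkaConsumerObservable[String, String](config, List("system-metrics"))
    .map(_.value())
    .foreachL(msg => println(s"received: $msg"))
    .runSyncUnsafe()
}
```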
The remaining two mains read data from Postgres. Both accept a host name as an argument, defaulting to all hosts. E.g.:

- `bin/sbt "runMain aiven.kafkapg.FromPgLast10Records storm"` will show the last 10 records from host "storm".
- `bin/sbt "runMain aiven.kafkapg.FromPgAvgCPULastHour"` will show the average CPU load across all hosts during the last hour, or None if there are no records (a sketch of such a query follows the list).
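
As an illustration of the Slick side, a query like the one behind FromPgAvgCPULastHour might be sketched as follows. The table and column names, connection URL, and credentials are assumptions, not the actual schema of this repo.

```scala
import java.sql.Timestamp
import scala.concurrent.Await
import scala.concurrent.duration._
import slick.jdbc.PostgresProfile.api._

object AvgCpuSketch extends App {
  // Hypothetical table mapping: host name, CPU load, and record timestamp.
  class Metrics(tag: Tag) extends Table[(String, Double, Timestamp)](tag, "metrics") {
    def host = column[String]("host")
    def cpu  = column[Double]("cpu")
    def ts   = column[Timestamp]("ts")
    def *    = (host, cpu, ts)
  }
  val metrics = TableQuery[Metrics]

  // Placeholder connection details; the real ones come from ./init-postgres.
  val db = Database.forURL(
    url      = "jdbc:postgresql://my-pg.aivencloud.com:23456/defaultdb",
    user     = "avnadmin",
    password = "secret",
    driver   = "org.postgresql.Driver")

  val hourAgo = new Timestamp(System.currentTimeMillis - 1.hour.toMillis)
  // `avg` yields an Option, so the result is None when no rows match,
  // matching the behavior described above.
  val avgCpu = metrics.filter(_.ts >= hourAgo).map(_.cpu).avg

  println(Await.result(db.run(avgCpu.result), 30.seconds))
}
```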
All the mains above can be started in parallel in different sbt instances to show the actual message flow. (The warning "..sbt server already running.." can be ignored.)