Use sparkMeasure in flight recorder mode to instrument Spark applications without touching their code. Flight recorder mode attaches a Spark Listener that collects the metrics while the application runs. This page describes how to send Spark metrics to a Prometheus Pushgateway.
PushGatewaySink is a class that extends the SparkListener infrastructure.
It collects Spark metrics and application info and writes them in near real time to a Prometheus Pushgateway instance
provided by the user. Use this mode to monitor Spark execution workloads.
Note: the amount of data generated is relatively small in most applications: O(number_of_stages).
How to use: attach the PushGatewaySink to a Spark Context using the listener infrastructure. Example:
--conf spark.extraListeners=ch.cern.sparkmeasure.PushGatewaySink
Configuration for the sink is handled with Spark configuration parameters.
Note: you can add configuration using the --conf option when using spark-submit
(or use the .config method when allocating the Spark Session in Scala/Python/Java).
Configurations:
Option 1 (recommended) Start the listener for PushGatewaySink:
--conf spark.extraListeners=ch.cern.sparkmeasure.PushGatewaySink
Configuration - PushGatewaySink parameters:
--conf spark.sparkmeasure.pushgateway=SERVER:PORT
Example: --conf spark.sparkmeasure.pushgateway=localhost:9091
--conf spark.sparkmeasure.pushgateway.jobname=JOBNAME // default value is pushgateway
Example: --conf spark.sparkmeasure.pushgateway.jobname=myjob1
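The same parameters can be set programmatically with the .config method mentioned above. The following is a minimal PySpark sketch, not part of sparkMeasure: the helper names pushgateway_conf and build_session are hypothetical, the package coordinates are the ones used in the spark-shell example on this page, and it assumes no Spark Context is already running, since spark.extraListeners is only read when the context starts.

```python
def pushgateway_conf(server, jobname="pushgateway"):
    """Return the Spark configuration entries that attach PushGatewaySink.

    `server` is a SERVER:PORT string, for example "localhost:9091".
    """
    return {
        # Assumption: Scala 2.12 build, version 0.24, as in the spark-shell example.
        "spark.jars.packages": "ch.cern.sparkmeasure:spark-measure_2.12:0.24",
        "spark.extraListeners": "ch.cern.sparkmeasure.PushGatewaySink",
        "spark.sparkmeasure.pushgateway": server,
        "spark.sparkmeasure.pushgateway.jobname": jobname,
    }


def build_session(app_name, server, jobname="pushgateway"):
    """Create a Spark Session with the PushGatewaySink listener attached."""
    # Imported lazily so pushgateway_conf can be used without PySpark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    for key, value in pushgateway_conf(server, jobname).items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

For example, build_session("myapp", "localhost:9091", jobname="myjob1") would correspond to the two --conf examples above.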
- The use case for this sink is to extend Spark monitoring by writing execution metrics into Prometheus via the Pushgateway, since Prometheus has a pull-based architecture and does not accept pushed metrics directly. You'll need to configure Prometheus to pull (scrape) metrics from the Pushgateway. You'll also need to set up a performance dashboard from the metrics collected by Prometheus.
- Start the Prometheus Pushgateway
  - Download and start the Pushgateway from the Prometheus download page
- Start Spark with sparkMeasure and attach the PushGatewaySink listener
  - Note: make sure there is no firewall blocking connectivity between the driver and the Pushgateway
Examples:
bin/spark-shell \
--conf spark.extraListeners=ch.cern.sparkmeasure.PushGatewaySink \
--conf spark.sparkmeasure.pushgateway=localhost:9091 \
--packages ch.cern.sparkmeasure:spark-measure_2.12:0.24
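To rule out the firewall issue mentioned in the note above, it can help to test the connection from the driver host before starting Spark. The following is a small sketch using only the Python standard library; the function names are hypothetical, and it only checks that a TCP connection can be opened, not that the Pushgateway is healthy.

```python
import socket


def parse_endpoint(endpoint):
    """Split a SERVER:PORT string into a (host, port) tuple."""
    host, sep, port = endpoint.rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError(f"expected SERVER:PORT, got {endpoint!r}")
    return host, int(port)


def can_connect(endpoint, timeout=3.0):
    """Return True if a TCP connection to SERVER:PORT can be opened."""
    host, port = parse_endpoint(endpoint)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

Run can_connect("localhost:9091") on the machine where the driver will run, using the same value passed to spark.sparkmeasure.pushgateway.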
- Look at the metrics being written to the Pushgateway
  - Use the Web UI to look at the metrics being written to the Pushgateway
  - Open a web browser and go to the Web UI, for example: http://localhost:9091/metrics
  - You should see the metrics being written to the Pushgateway as jobs are run in Spark
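The check in the last step can also be scripted instead of using a browser. The sketch below uses only the Python standard library to download the /metrics page and parse the Prometheus text format. It is a simplified parser for illustration (label blocks are kept as raw strings), the function names are hypothetical, and it assumes the Pushgateway runs at localhost:9091 and exposes the configured job name as a job label on the pushed metrics.

```python
from urllib.request import urlopen


def fetch_metrics(base_url="http://localhost:9091", timeout=5.0):
    """Download the text served at the Pushgateway /metrics endpoint."""
    with urlopen(f"{base_url}/metrics", timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def parse_samples(text):
    """Parse Prometheus text format into (name, labels, value) tuples.

    Comment lines (# HELP, # TYPE) and blank lines are skipped.
    """
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if "{" in line:
            name, _, rest = line.partition("{")
            # The value (and optional timestamp) follow the last closing brace.
            labels, _, tail = rest.rpartition("}")
        else:
            name, tail = line.split(None, 1)
            labels = ""
        value = float(tail.split()[0])
        samples.append((name.strip(), labels, value))
    return samples


def samples_for_job(samples, jobname):
    """Keep only the samples whose label block names the given job."""
    needle = f'job="{jobname}"'
    return [s for s in samples if needle in s[1]]
```

With a Spark application running, samples_for_job(parse_samples(fetch_metrics()), "myjob1") should return a growing list of samples as jobs complete.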