Minor update to the README
LucaCanali committed Dec 20, 2021
1 parent 804dd70 commit c1a7858
Showing 1 changed file with 13 additions and 14 deletions.
27 changes: 13 additions & 14 deletions README.md
@@ -2,40 +2,39 @@
![SparkPlugins CI](https://github.com/cerndb/SparkPlugins/workflows/SparkPlugins%20CI/badge.svg?branch=master&event=push)
[![Maven Central](https://maven-badges.herokuapp.com/maven-central/ch.cern.sparkmeasure/spark-plugins_2.12/badge.svg)](https://maven-badges.herokuapp.com/maven-central/ch.cern.sparkmeasure/spark-plugins_2.12)

This repo contains code and examples of Apache Spark Plugins.
Apache Spark plugins are an interface and configuration that allows to inject code on executor start-up and, among others, provide a hook to the Spark metrics system. This provides a way to extend metrics collection beyond what is available in Apache Spark.

This repo contains code and examples of Apache Spark Plugins for Spark 3.x.
Apache Spark plugins are an interface and configuration that allow custom code to be injected at executor
start-up, with a hook to the Spark metrics system.
This provides a way to extend metrics collection beyond what is normally available in Apache Spark.

### Motivations
Instrumenting some parts of the Spark workload with plugins provides additional flexibility compared
to instrumentation that is committed in the Apache Spark code, as only users who want to activate
- Instrumenting parts of the Spark workload with plugins provides additional flexibility compared
to extending instrumentation in the Apache Spark code, as only users who want to activate
it can do so. They can also use a configuration customized for their environment,
which is not necessarily suitable for all possible uses of Apache Spark code.
One important use case is extending Spark instrumentation with custom metrics:
- One important use case is extending Spark instrumentation with custom metrics.
- This repo provides code and examples of plugins applied to measuring Spark on K8S,
Spark I/O from cloud Filesystems, OS metrics, and custom application metrics.
- Note: The code in this repo is for Spark 3.x.
For Spark 2.x, see instead [Executor Plugins for Spark 2.4](https://github.com/cerndb/SparkExecutorPlugins2.4)

### Deployment Notes:
### Implementation Notes:
- Spark plugins implement the `org.apache.spark.api.plugin.SparkPlugin` interface. They can be written in Scala or Java
and can be used to run custom code at the startup of Spark executors and driver.
- Basic plugin configuration: `--conf spark.plugins=<list of plugin classes>`
- Plugin JARs need to be made available to Spark executors
- you can distribute the plugin code to the executors using `--jars` and `--packages`.
- for K8S you can also consider making it available directly in the container.
- Link to [Spark monitoring documentation](https://spark.apache.org/docs/latest/monitoring.html#advanced-instrumentation)
- See also: [SPARK-29397](https://issues.apache.org/jira/browse/SPARK-29397), [SPARK-28091](https://issues.apache.org/jira/browse/SPARK-28091), [SPARK-32119](https://issues.apache.org/jira/browse/SPARK-32119).

### Related Work: Spark Metrics System and Spark Performance Dashboard

- for K8S you can also consider making the jars available directly in the container image.
- Most of the Plugins described in this repo are intended to extend the Spark Metrics System.
- See the details on the Spark metrics system at [Spark Monitoring documentation](https://spark.apache.org/docs/latest/monitoring.html#metrics).
- You can find the metrics generated by the plugins in the Spark metrics system stream under the
namespace `namespace=plugin.<Plugin Class Name>`
- See also: [SPARK-29397](https://issues.apache.org/jira/browse/SPARK-29397), [SPARK-28091](https://issues.apache.org/jira/browse/SPARK-28091), [SPARK-32119](https://issues.apache.org/jira/browse/SPARK-32119).
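
As a rough illustration of the configuration notes above, the following Python sketch assembles the `spark-submit` options that enable a list of plugins and derives the metrics namespace each plugin would report under. Only `spark.plugins`, `--jars`, `--packages`, and the `plugin.<Plugin Class Name>` namespace come from the notes. The helper names, the class `com.example.MyPlugin`, and the JAR path are hypothetical placeholders, and the sketch assumes the class name is used in the namespace exactly as given in `spark.plugins`.

```python
from typing import List, Optional


def plugin_submit_args(plugin_classes: List[str],
                       jars: Optional[List[str]] = None,
                       packages: Optional[List[str]] = None) -> List[str]:
    """Build spark-submit arguments that enable the given plugin classes.

    Hypothetical helper: it only formats the options named in the notes
    (spark.plugins, --jars, --packages) and does not call Spark itself.
    """
    if not plugin_classes:
        raise ValueError("at least one plugin class is required")
    # spark.plugins takes a comma-separated list of plugin class names.
    args = ["--conf", "spark.plugins=" + ",".join(plugin_classes)]
    if jars:
        # Ship local plugin JARs to the executors.
        args += ["--jars", ",".join(jars)]
    if packages:
        # Or resolve the plugin from Maven coordinates.
        args += ["--packages", ",".join(packages)]
    return args


def metrics_namespace(plugin_class: str) -> str:
    """Namespace under which a plugin's metrics are assumed to appear."""
    return "plugin." + plugin_class


if __name__ == "__main__":
    # com.example.MyPlugin and the JAR path are placeholders.
    args = plugin_submit_args(["com.example.MyPlugin"],
                              jars=["/path/to/my-plugin.jar"])
    print("spark-submit " + " ".join(args))
    print(metrics_namespace("com.example.MyPlugin"))
```

On Kubernetes, where the JARs may already be baked into the container image, the `jars` and `packages` arguments could simply be left out.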

### Related Work and Spark Performance Dashboard

- Spark Performance Dashboard - a solution to ingest and visualize Spark metrics
- Link to the repo on [how to deploy a Spark Performance Dashboard using Spark metrics](https://github.com/cerndb/spark-dashboard)

- DATA+AI summit 2020 talk [What is New with Apache Spark Performance Monitoring in Spark 3.0](https://databricks.com/session_eu20/what-is-new-with-apache-spark-performance-monitoring-in-spark-3-0)
- DATA+AI summit 2021 talk [Monitor Apache Spark 3 on Kubernetes using Metrics and Plugins](https://databricks.com/session_na21/monitor-apache-spark-3-on-kubernetes-using-metrics-and-plugins)
