Commit 3b2c06a: Update doc.
LucaCanali committed Oct 14, 2020 (parent commit: 8a60cdb)
Showing 1 changed file: README.md, 38 additions and 31 deletions.

# SparkPlugins
![SparkPlugins CI](https://github.com/cerndb/SparkPlugins/workflows/SparkPlugins%20CI/badge.svg)


A repo with code and examples of Apache Spark Plugin extensions to the metrics system,
applied to measuring I/O from cloud filesystems, OS/system metrics, and custom metrics.

---
Apache Spark 3.0 comes with an improved plugin framework. Plugins allow extending Spark monitoring functionality.
- One important use case is extending Spark instrumentation with custom metrics:
OS metrics, I/O metrics, external applications monitoring, etc.
- Note: The code in this repo is for Spark 3.x.
Expand All @@ -13,45 +14,25 @@ Apache Spark 3.0 comes with a new plugin framework. Plugins allow extending Spar
Plugin notes:
- Spark plugins implement the `org.apache.spark.api.plugin.SparkPlugin` interface; they can be written in Scala or Java
and can be used to run custom code at the startup of Spark executors and driver.
- Plugins basic configuration: `--conf spark.plugins=<list of plugin classes>` (see the example after this list)
- Plugin JARs need to be made available to Spark executors
- YARN: you can distribute the plugin code to the executors using `--jars`.
- K8S: when using Spark 3.0.1 on Kubernetes, `--jars` or `--packages` distribution will **not work**; you will need to make the JAR available in the Spark container when you build it.
- [SPARK-32119](https://issues.apache.org/jira/browse/SPARK-32119) fixes this issue and allows using `--jars` to distribute plugin code.
- Link to [Spark monitoring documentation](https://spark.apache.org/docs/latest/monitoring.html#advanced-instrumentation)
- See also [SPARK-29397](https://issues.apache.org/jira/browse/SPARK-29397), [SPARK-28091](https://issues.apache.org/jira/browse/SPARK-28091).
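
For illustration, a minimal sketch of enabling two of the plugins described later in this README in a single session; the jar path is a placeholder and `yarn` is just an example master:
```
bin/spark-shell --master yarn \
  --jars <path>/sparkplugins_2.12-0.1.jar \
  --conf spark.plugins=ch.cern.DemoPlugin,ch.cern.RunOSCommandPlugin
```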

Author and contact: [email protected]

## Getting Started
- Build or download the SparkPlugin `jar`. For example:
- Build from source with:
```
sbt package
```
- Or download the jar from the automatic build in [github actions](https://github.com/cerndb/SparkPlugins/actions)

This repo contains code and Spark Plugin examples to extend [Spark metrics instrumentation](https://spark.apache.org/docs/3.0.0/monitoring.html#metrics)
and use an extended Grafana dashboard with the Plugin metrics of interest.
## Plugins in this Repository
### Demo and Basic Plugins
- [DemoPlugin](src/main/scala/ch/cern/DemoPlugin.scala)
- `--conf spark.plugins=ch.cern.DemoPlugin`
- Basic plugin, demonstrates how to write Spark plugins in Scala, for demo and testing.
- [RunOSCommandPlugin](src/main/scala/ch/cern/RunOSCommandPlugin.scala)
- `--conf spark.plugins=ch.cern.RunOSCommandPlugin`
- Example illustrating how to use plugins to run actions on the OS.
- Action implemented: runs an OS command on the executors, by default it runs: `/usr/bin/touch /tmp/plugin.txt`
- Configurable action: `--conf spark.cernSparkPlugin.command="command or script you want to run"`
- Example:
```
bin/spark-shell --master yarn \
--jars <path>/sparkplugins_2.12-0.1.jar \
--conf spark.plugins=ch.cern.RunOSCommandPlugin
```
- You can see if the plugin has run by checking that the file `/tmp/plugin.txt` has been
created on the executor machines.
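
As a variation, a sketch of overriding the default action using the configuration property shown above; the file name touched here is a hypothetical example:
```
bin/spark-shell --master yarn \
  --jars <path>/sparkplugins_2.12-0.1.jar \
  --conf spark.plugins=ch.cern.RunOSCommandPlugin \
  --conf spark.cernSparkPlugin.command="/usr/bin/touch /tmp/plugin_test_run.txt"
```
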
## Spark Metrics System and Spark Performance Dashboard
- Most of the Plugins described in this repo are intended to extend the Spark Metrics System.
- See the details on the Spark metrics system at [Spark Monitoring documentation](https://spark.apache.org/docs/latest/monitoring.html#metrics).
- You can find the metrics generated by the plugins in the Spark metrics system stream,
under `namespace=plugin.<Plugin Class Name>`.
- Metrics need to be stored and visualized.
- This repo refers to the work on [how to deploy a Spark Performance Dashboard using Spark metrics](https://github.com/cerndb/spark-dashboard)
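
For illustration, a sketch of shipping plugin metrics to a Graphite endpoint with Spark's standard metrics sink configuration; the host, port, and choice of plugin are placeholders for your setup:
```
bin/spark-shell --master yarn \
  --jars <path>/sparkplugins_2.12-0.1.jar \
  --conf spark.plugins=ch.cern.CgroupMetrics \
  --conf "spark.metrics.conf.*.sink.graphite.class=org.apache.spark.metrics.sink.GraphiteSink" \
  --conf "spark.metrics.conf.*.sink.graphite.host=graphite.example.com" \
  --conf "spark.metrics.conf.*.sink.graphite.port=2003" \
  --conf "spark.metrics.conf.*.sink.graphite.period=10" \
  --conf "spark.metrics.conf.*.sink.graphite.unit=seconds"
```
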
---
### OS metrics instrumentation with cgroups, for Spark on Kubernetes
- [CgroupMetrics](src/main/scala/ch/cern/CgroupMetrics.scala)
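
A usage sketch of this plugin on Kubernetes, assuming the plugin jar has been made available in the Spark container image (see the Plugin notes above); the master URL and image name are placeholders:
```
bin/spark-shell --master k8s://https://<k8s-api-server>:6443 \
  --conf spark.kubernetes.container.image=<registry>/spark:v3.0.1-with-plugins \
  --conf spark.plugins=ch.cern.CgroupMetrics
```
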
#### HDFS extended storage statistics
This Plugin measures HDFS extended statistics.
In particular, it provides information on read locality and erasure coding usage (for HDFS 3.x).
- [HDFSMetrics](src/main/scala/ch/cern/HDFSMetrics.scala)
- Configure with: `--conf spark.plugins=ch.cern.HDFSMetrics`
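- Example (the jar path is a placeholder):
```
bin/spark-shell --master yarn \
  --jars <path>/sparkplugins_2.12-0.1.jar \
  --conf spark.plugins=ch.cern.HDFSMetrics
```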
- [CloudFSMetrics27](src/main/scala/ch/cern/CloudFSMetrics27.scala)
- Collects I/O metrics for Hadoop-compatible filesystems using the Hadoop 2.7 API; use with Spark built with Hadoop 2.7.
- Metrics are the same as for CloudFSMetrics; they have the prefix `ch.cern.CloudFSMetrics27`.
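- A minimal usage sketch (the jar path is a placeholder; any filesystem-name configuration used by CloudFSMetrics is omitted here):
```
bin/spark-shell --master yarn \
  --jars <path>/sparkplugins_2.12-0.1.jar \
  --conf spark.plugins=ch.cern.CloudFSMetrics27
```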
---
## Experimental Plugins for I/O Time Instrumentation
This section details a few experimental Spark plugins used to expose metrics for I/O-time
instrumentation of Spark workloads using Hadoop-compliant filesystems.
These plugins use instrumented experimental/custom versions of the Hadoop client.
- Note: this requires a custom S3A client implementation; see the experimental code at: [HDFS and S3A custom instrumentation](https://github.com/LucaCanali/hadoop/tree/s3aAndHDFSTimeInstrumentation)
- Spark config:
- `--conf spark.plugins=ch.cern.experimental.S3ATimeInstrumentation`
- Custom jar needed: `--jars hadoop-aws-3.2.0.jar`
- build [from this fork](https://github.com/LucaCanali/hadoop/tree/s3aAndHDFSTimeInstrumentation)
- or download a pre-built copy of the [hadoop-aws-3.2.0.jar at this link](https://cern.ch/canali/res/hadoop-aws-3.2.0.jar)
- Metrics implemented (gauges), with prefix `ch.cern.experimental.S3ATimeInstrumentation`:
- `S3AReadTimeMuSec`
- `S3ASeekTimeMuSec`
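
Putting the pieces together, a sketch of a session with the S3A time instrumentation enabled; jar paths are placeholders, and the custom `hadoop-aws` jar is the one built or downloaded as described above:
```
bin/spark-shell --master yarn \
  --jars <path>/sparkplugins_2.12-0.1.jar,<path>/hadoop-aws-3.2.0.jar \
  --conf spark.plugins=ch.cern.experimental.S3ATimeInstrumentation
```
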
#### HDFS time instrumentation
- Spark config:
- `--conf spark.plugins=ch.cern.experimental.HDFSTimeInstrumentation`
- `--jars <PATH>/sparkplugins_2.12-0.1.jar`
- Non-standard configuration required for using this instrumentation:
- replace `$SPARK_HOME/jars/hadoop-hdfs-client-3.2.0.jar` with the jar built [from this fork](https://github.com/LucaCanali/hadoop/tree/s3aAndHDFSTimeInstrumentation)
- for convenience you can download a pre-built copy of the [hadoop-hdfs-client-3.2.0.jar at this link](https://cern.ch/canali/res/hadoop-hdfs-client-3.2.0.jar)
- Metrics implemented (gauges), with prefix `ch.cern.experimental.HDFSTimeInstrumentation`:
- `HDFSReadTimeMuSec`
- `HDFSCPUTimeDuringReadMuSec`
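
A sketch of a session with this instrumentation, assuming the plugin class name matches the metrics prefix and that `$SPARK_HOME/jars/hadoop-hdfs-client-3.2.0.jar` has already been replaced as described above; the plugin jar path is a placeholder:
```
bin/spark-shell --master yarn \
  --jars <path>/sparkplugins_2.12-0.1.jar \
  --conf spark.plugins=ch.cern.experimental.HDFSTimeInstrumentation
```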
