
Riposte

The purpose of this module is to perform simple queries.

How To Use

You can write any query supported by Spark SQL; for this purpose, a table called dataSet is created from the configured source. You can use Kafka, S3, or any other kind of source supported by Spark.
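
For instance, a minimal query against that table could look like the one below (the type column is only an illustration; the available columns depend on the source's schema):

SELECT type, COUNT(*) AS total
FROM dataSet
GROUP BY type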

Configurations

  • nightfall.riposte.reader.format: source format; any format supported by Spark can be used, for example kafka, json, etc.
  • nightfall.riposte.reader.path: source path, required by some formats.
  • nightfall.riposte.reader.options.: any configuration with this prefix will be set as an option on the DataFrameReader (see the sketch after this list).
  • nightfall.riposte.sql: the query string to execute.
  • nightfall.riposte.writer.format: output format; any format supported by Spark can be used, for example kafka, json, etc. The console option writes the output to stdout.
  • nightfall.riposte.writer.path: output path, required by some formats.
  • nightfall.riposte.writer.options.: any configuration with this prefix will be set as an option on the DataFrameWriter.
  • nightfall.riposte.print_schema: set this option to true to print schema information.
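
As a sketch of how these keys fit together for a plain file source (the paths, formats, and options below are hypothetical and only illustrate the path keys and the options. prefix):

# read CSV files (hypothetical path)
nightfall.riposte.reader.format=csv
nightfall.riposte.reader.path=s3://my-bucket/events/
nightfall.riposte.reader.options.header=true

# query the dataSet table created from the source
nightfall.riposte.sql=SELECT * FROM dataSet

# write the result as Parquet (hypothetical path)
nightfall.riposte.writer.format=parquet
nightfall.riposte.writer.path=s3://my-bucket/output/

# print the schema of the result
nightfall.riposte.print_schema=true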

Example

Below is a nightfall.properties example:

# Kafka Reader
nightfall.riposte.reader.format=kafka
nightfall.riposte.reader.options.subscribe=events
nightfall.riposte.reader.options.kafka.bootstrap.servers=localhost:9092
nightfall.riposte.reader.options.startingOffsets=earliest

# Kafka Operations
nightfall.riposte.sql=SELECT dataPoint.type AS type, payload.userId AS userId FROM dataset LATERAL VIEW json_tuple(CAST(dataset.value AS STRING), 'type', 'payload') dataPoint AS type, payload LATERAL VIEW json_tuple(dataPoint.payload, 'userId') payload AS userId WHERE type = 'Order'

# Writer
nightfall.riposte.writer.format=console

The above example uses the following JSON as the data source; a sketch of the expected console output follows it:

{
	"type": "Order",
	"date": 548931349806,
	"payload": {
		"userId": 1,
		"total": 5500
	}
}
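
With that input, the query extracts type and userId from the Kafka record value, so the console writer should print something roughly like the table below (the exact layout depends on Spark's console sink):

+-----+------+
| type|userId|
+-----+------+
|Order|     1|
+-----+------+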

Running

Riposte has two running modes: Stream and Batch. They are kept separate because the dataset is created differently for batch and streaming sources.

That said, to run a Riposte Stream:

./bin/spark-submit \
  --class com.elo7.nightfall.riposte.NightfallRiposte\$Stream \
  --master local[2] \
  /path/to/nightfall-riposte-2.1.0-shaded.jar \
  -e file://path_to_nightfall.properties

And to run a Batch:

./bin/spark-submit \
  --class com.elo7.nightfall.riposte.NightfallRiposte\$Batch \
  --master local[2] \
  /path/to/nightfall-riposte-2.1.0-shaded.jar \
  -e file://path_to_nightfall.properties