Riposte
Fabiano V. Santos edited this page Nov 28, 2017 · 2 revisions
The purpose of this module is to run simple queries.
You can write any kind of query supported by Spark SQL; a table called dataSet
is created for this purpose. You can use Kafka, S3, or any kind of source supported by Spark.
- nightfall.riposte.reader.format: source format. Any format supported by Spark can be used, for example kafka, json, etc.
- nightfall.riposte.reader.path: source path, required by some formats.
- nightfall.riposte.reader.options.: any configuration with this prefix is set as an option on the DataFrameReader.
- nightfall.riposte.sql: the query string to execute.
- nightfall.riposte.writer.format: output format. Any format supported by Spark can be used, for example kafka, json, etc. The console option writes the output to stdout.
- nightfall.riposte.writer.path: output path, required by some formats.
- nightfall.riposte.writer.options.: any configuration with this prefix is set as an option on the DataFrameWriter.
- nightfall.riposte.print_schema: set this option to true to see schema information.
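The prefix convention for reader and writer options can be illustrated with a minimal Python sketch. It is not Nightfall's actual implementation; the helper names parse_properties and options_with_prefix are hypothetical and only show how keys under a prefix could be turned into an options map.

```python
# Hypothetical helpers: they illustrate the prefix convention only,
# not how Nightfall itself loads its configuration.
READER_PREFIX = "nightfall.riposte.reader.options."


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first '=' only, so values may contain '='.
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def options_with_prefix(props, prefix):
    """Return the entries whose key starts with prefix, with the prefix stripped."""
    return {k[len(prefix):]: v for k, v in props.items() if k.startswith(prefix)}


example = """
# Kafka Reader
nightfall.riposte.reader.format=kafka
nightfall.riposte.reader.options.subscribe=events
nightfall.riposte.reader.options.kafka.bootstrap.servers=localhost:9092
nightfall.riposte.reader.options.startingOffsets=earliest
"""

reader_options = options_with_prefix(parse_properties(example), READER_PREFIX)
print(reader_options)
```

Under this reading, everything after the prefix (for instance kafka.bootstrap.servers) is the option name that would be handed to Spark unchanged.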
Below is an example nightfall.properties:
# Kafka Reader
nightfall.riposte.reader.format=kafka
nightfall.riposte.reader.options.subscribe=events
nightfall.riposte.reader.options.kafka.bootstrap.servers=localhost:9092
nightfall.riposte.reader.options.startingOffsets=earliest
# Kafka Operations
nightfall.riposte.sql=SELECT dataPoint.type AS type, payload.userId AS userId FROM dataset LATERAL VIEW json_tuple(CAST(dataset.value AS STRING), 'type', 'payload') dataPoint AS type, payload LATERAL VIEW json_tuple(dataPoint.payload, 'userId') payload AS userId WHERE type = 'Order'
# Writer
nightfall.riposte.writer.format=console
The above example uses the following JSON as its data source:
{
  "type": "Order",
  "date": 548931349806,
  "payload": {
    "userId": 1,
    "total": 5500
  }
}
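To make clear what the example query selects from this message, here is a rough emulation in plain Python. It is only a sketch of the query's logic for a single record, not how Spark executes it; the function name select_order is hypothetical. Since json_tuple yields string columns, userId is returned as a string here.

```python
import json

# The sample message above, as it would arrive in the Kafka record value.
message = (
    '{"type": "Order", "date": 548931349806, '
    '"payload": {"userId": 1, "total": 5500}}'
)


def select_order(value):
    """Emulate the example query for one message.

    Returns a dict with type and userId, or None when the row is filtered out.
    """
    record = json.loads(value)
    # WHERE type = 'Order'
    if record.get("type") != "Order":
        return None
    payload = record.get("payload") or {}
    user_id = payload.get("userId")
    return {
        "type": record["type"],
        # json_tuple returns strings, so mirror that here.
        "userId": None if user_id is None else str(user_id),
    }


print(select_order(message))
```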
Riposte has two running modes: Stream and Batch. They are kept separate because a dataset is created differently for batches and for streams.
To run Riposte as a stream:
./bin/spark-submit \
--class com.elo7.nightfall.riposte.NightfallRiposte\$Stream \
--master local[2] \
/path/to/nightfall-riposte-2.1.0-shaded.jar \
-e file://path_to_nightfall.properties
And to run it as a batch:
./bin/spark-submit \
--class com.elo7.nightfall.riposte.NightfallRiposte\$Batch \
--master local[2] \
/path/to/nightfall-riposte-2.1.0-shaded.jar \
-e file://path_to_nightfall.properties
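The two commands differ only in the main class. As a small sketch, assuming you launch spark-submit from a Python script, the argument list could be built as follows; the function riposte_command is hypothetical, and the jar path and properties URI are the placeholders used above.

```python
import shlex


def riposte_command(mode, jar, properties_uri, master="local[2]"):
    """Build the spark-submit argument list for the given mode (Stream or Batch)."""
    if mode not in ("Stream", "Batch"):
        raise ValueError("mode must be 'Stream' or 'Batch'")
    return [
        "./bin/spark-submit",
        # No backslash is needed before '$' here: the list is not parsed by a shell.
        "--class", f"com.elo7.nightfall.riposte.NightfallRiposte${mode}",
        "--master", master,
        jar,
        "-e", properties_uri,
    ]


cmd = riposte_command(
    "Stream",
    "/path/to/nightfall-riposte-2.1.0-shaded.jar",
    "file://path_to_nightfall.properties",
)
# shlex.join quotes the arguments so the line can be pasted into a shell.
print(shlex.join(cmd))
```

Passing the list to subprocess.run(cmd) from the Spark home directory would start the job without going through a shell.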