Pass Command Line Arguments
Jan Ehmueller edited this page Oct 15, 2017 · 2 revisions
The command line arguments are parsed using Scallop. To see the available options, use `--help`. Each option can be set with either a short flag (e.g., `-o`) or a long flag (e.g., `--option`).
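Scallop handles the mapping between short and long flags internally. The hand-rolled sketch below is *not* Scallop's API or implementation. It is a stdlib-only illustration of how both spellings can resolve to one option name, with the values after a flag collected into a list. `FlagSketch`, `optionName`, `parse` and the `-o` → `--option` alias table are hypothetical names chosen to mirror the example above.

```scala
// Hand-rolled illustration only; this is NOT how Scallop is implemented.
// It shows that a short flag and a long flag can resolve to the same option.
object FlagSketch {
  // Hypothetical alias table mirroring the page's example: -o is short for --option.
  val shortToLong: Map[Char, String] = Map('o' -> "option")

  // Maps a flag token to its long option name, or None if the token is a value.
  def optionName(token: String): Option[String] =
    if (token.startsWith("--")) Some(token.drop(2))
    else if (token.length == 2 && token.startsWith("-"))
      Some(shortToLong.getOrElse(token(1), token.drop(1)))
    else None

  // Collects the values that follow each flag, keyed by long option name.
  def parse(args: List[String]): Map[String, List[String]] = {
    val (result, _) =
      args.foldLeft((Map.empty[String, List[String]], Option.empty[String])) {
        case ((acc, current), token) =>
          optionName(token) match {
            // A flag opens a (possibly empty) value list and becomes the current option.
            case Some(name) => (acc.updated(name, acc.getOrElse(name, Nil)), Some(name))
            case None =>
              current match {
                // A value is appended to the most recent flag.
                case Some(name) => (acc.updated(name, acc(name) :+ token), current)
                // Values before any flag are simply ignored in this sketch.
                case None => (acc, None)
              }
          }
      }
    result
  }
}
```

With this sketch, `FlagSketch.parse(List("-o", "value"))` and `FlagSketch.parse(List("--option", "value"))` both yield `Map("option" -> List("value"))`. Collecting every following token into a list is also what lets an option such as `--tokenizer` accept several values.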
```shell
# example call starting a Deduplication with a specific config file
spark.sh -m yarn -c de.hpi.ingestion.deduplication.Deduplication ingestion_master.jar --config deduplication_wikidata.xml
```
The `SparkJob` trait defines the value `conf: CommandLineConf`. The method `execute()` writes the parsed command line options to this value. The option values can then be accessed in two ways:
```scala
// return an Option of the value (is None when the option was not set)
conf.configOpt
// return the value or throw an error if it is not set
conf.config
```
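Here is a minimal, stdlib-only sketch that makes the difference between the two accessors concrete. `ConfSketch` and `AccessSketch` are hypothetical stand-ins that model only `configOpt` and `config`. They are not the real `CommandLineConf`. The `NoSuchElementException` and the `default.xml` fallback are assumptions for illustration, because this page does not say which exception the real class throws.

```scala
// Hypothetical stand-in for CommandLineConf that models only the two access styles;
// the real class is backed by Scallop and defines many more options.
final case class ConfSketch(values: Map[String, String]) {
  // Like conf.configOpt: Some(file) if --config was passed, otherwise None.
  def configOpt: Option[String] = values.get("config")

  // Like conf.config: the value itself, or an exception if --config was not passed.
  // NoSuchElementException is an assumption made for this sketch.
  def config: String =
    configOpt.getOrElse(throw new NoSuchElementException("--config was not set"))
}

object AccessSketch {
  // The Option form suits optional settings; "default.xml" is a made-up fallback.
  def configOrDefault(conf: ConfSketch): String =
    conf.configOpt.getOrElse("default.xml")
}
```

The `Option` accessor fits settings that have a sensible fallback. The strict accessor fits settings a job cannot run without, where failing early with an exception is preferable to continuing with a missing value.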
The following options are defined:

- `config`: sets the config file used by the job
- `importConfig`: sets the import config file used by the job (only used by DataLakeImports)
- `commitJson`: sets the input for the Commit Job (created by the Curation Interface)
- `comment`: sets the comment used by a Blocking Job
- `tokenizer`: sets the tokenizer used by the TermFrequencyCounter. Takes up to three values (tokenizer, stop words, stemming). An example call would be: `--tokenizer CleanCoreNLPTokenizer true true`
- `toReduced`: sets whether the LinkAnalysis writes to the reduced columns. This option is used by the ReducedLinkAnalysis.
- `restoreVersion`: sets the version to which the subject table is restored (used by the VersionRestore Job)
- `diffVersions`: sets the versions to diff in the VersionDiff Job. Exactly two versions must be given. An example call would be: `--diffVersions 7b410340-243e-11e7-937a-ad9adce5e136 f44df8b0-2425-11e7-aec2-2d07f82c7921`
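The two multi-value options, `tokenizer` and `diffVersions`, carry constraints that can be checked before a job starts. The sketch below is a hand-rolled validation, not the project's code. It assumes the second and third `--tokenizer` values are booleans for the stop-word and stemming switches, as in the example call. It defaults missing switches to `false`, which is also an assumption. It also assumes the `--diffVersions` values are UUIDs, like the ones in the example. `OptionChecks`, `TokenizerSetting`, `parseTokenizer` and `parseDiffVersions` are hypothetical names.

```scala
import java.util.UUID
import scala.util.Try

// Hand-rolled validation sketch, not the project's CommandLineConf.
object OptionChecks {
  // Hypothetical result type. Reading the two booleans as "stop words" and
  // "stemming" switches follows the page; defaulting them to false is an assumption.
  final case class TokenizerSetting(name: String, stopWords: Boolean, stemming: Boolean)

  // --tokenizer: a tokenizer name followed by at most two boolean flags.
  def parseTokenizer(values: List[String]): Either[String, TokenizerSetting] = values match {
    case Nil => Left("--tokenizer needs at least a tokenizer name")
    case name :: flags if flags.size <= 2 =>
      // Each flag must parse as true/false; a failed parse becomes None.
      val parsed = flags.map(f => Try(f.toBoolean).toOption)
      if (parsed.contains(None))
        Left(s"expected true/false after the tokenizer name, got: ${flags.mkString(" ")}")
      else {
        val bools = parsed.flatten
        Right(TokenizerSetting(name, bools.lift(0).getOrElse(false), bools.lift(1).getOrElse(false)))
      }
    case _ => Left("--tokenizer takes at most three values")
  }

  // --diffVersions: exactly two versions, assumed to be UUIDs as in the example call.
  def parseDiffVersions(values: List[String]): Either[String, (UUID, UUID)] = values match {
    case List(a, b) =>
      (Try(UUID.fromString(a)).toOption, Try(UUID.fromString(b)).toOption) match {
        case (Some(x), Some(y)) => Right((x, y))
        case _                  => Left("both --diffVersions values must be UUIDs")
      }
    case other => Left(s"--diffVersions needs exactly two versions, got ${other.size}")
  }
}
```

For example, `OptionChecks.parseTokenizer(List("CleanCoreNLPTokenizer", "true", "true"))` returns `Right(TokenizerSetting("CleanCoreNLPTokenizer", true, true))`. Passing a single version to `parseDiffVersions` returns a `Left` with an explanatory message. Returning `Either` instead of throwing keeps the error message available for a usage hint.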