Using ade to analyse Spark logs
Along with RFC3164/RFC5424-format Linux syslogs, ADE can also be run on Spark logs. However, we need to explicitly tell ade that we're using Spark logs: it assumes syslog input otherwise. As a starting point, we need to edit the setup file to add this parameter.
Here's what the relevant section of the setup.props file looks like with Spark support enabled:
# --------------------------------------------------------------------
# AdeExt properties
# --------------------------------------------------------------------
adeext.msgRateReportFreq=5
adeext.msgRateMsgToKeep=1000
adeext.parseErrorToKeep=100
adeext.parseErrorDaysTolerate=2
adeext.parseErrorTrackNullComponent=false
adeext.runtimeModelDataStoreAtSource=true
adeext.useSparkLogs=true
adeext.msgRate10MinSlotsToKeep=24
adeext.msgRate10MinSubIntervalList=1,2,3,6,12,24
adeext.msgRateMergeSource=true
# --------------------------------------------------------------------
# Paths
# (ade.flowLayoutFileSpark and ade.analysisGroupToFlowNameMapperClassSpark
# are only used when ade.useSparkLogs=true)
# --------------------------------------------------------------------
ade.useSparkLogs=true
ade.flowLayoutFile=conf/xml/FlowLayout.xml
ade.flowLayoutFileSpark=conf/xml/FlowLayoutSpark.xml
ade.outputPath=output/
ade.analysisOutputPath=output/continuous
ade.xml.xsltDir=conf/xml
ade.criticalWords.file=conf/criticalWords.txt
ade.analysisGroupToFlowNameMapperClass=org.openmainframe.ade.ext.os.LinuxAnalysisGroupToFlowNameConstantMapper
ade.analysisGroupToFlowNameMapperClassSpark=org.openmainframe.ade.ext.os.SparkAnalysisGroupToFlowNameConstantMapper
ade.outputFilenameGenerator=org.openmainframe.ade.ext.output.ExtOutputFilenameGenerator
ade.inputTimeZone=GMT+00:00
ade.outputTimeZone=GMT
The ade.useSparkLogs parameter can be toggled to indicate whether we're analyzing Spark logs (true) or syslogs (false).
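As a minimal sketch (not ADE's actual property loader), the following shows how a boolean toggle like ade.useSparkLogs can steer which flow-layout file is picked up; the property names match setup.props above, but the loading logic here is purely illustrative:

```java
import java.io.StringReader;
import java.util.Properties;

public class SparkToggleDemo {
    public static void main(String[] args) throws Exception {
        // A fragment of setup.props, inlined here so the example is self-contained.
        String props = "ade.useSparkLogs=true\n"
                     + "ade.flowLayoutFile=conf/xml/FlowLayout.xml\n"
                     + "ade.flowLayoutFileSpark=conf/xml/FlowLayoutSpark.xml\n";
        Properties p = new Properties();
        p.load(new StringReader(props));

        // Default to syslog behavior when the property is absent.
        boolean useSpark = Boolean.parseBoolean(p.getProperty("ade.useSparkLogs", "false"));
        String layout = useSpark
                ? p.getProperty("ade.flowLayoutFileSpark")
                : p.getProperty("ade.flowLayoutFile");
        System.out.println(layout);
    }
}
```

With ade.useSparkLogs=true, this selects conf/xml/FlowLayoutSpark.xml; flipping the property to false would fall back to the ordinary syslog layout.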
Internally, Spark log analysis is very similar to syslog analysis, with a few subtle changes. At the heart of it is a SparklogLineParser that parses a single Spark message from the log file, using regular-expression matching to extract the timestamp, text, source, component, and other relevant fields. This information is used by SparkLogParser to send data to SparklogMessageReader, which processes it and sends it directly to the output stream.
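To make the parsing step concrete, here is an illustrative regex in the spirit of SparklogLineParser. It matches the default Spark log4j layout ("yy/MM/dd HH:mm:ss LEVEL Component: message"); ADE's real pattern and field handling may differ:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SparkLineDemo {
    // Groups: 1 = timestamp, 2 = severity, 3 = component, 4 = message text.
    private static final Pattern SPARK_LINE = Pattern.compile(
            "^(\\d{2}/\\d{2}/\\d{2} \\d{2}:\\d{2}:\\d{2}) (\\w+) ([^:]+): (.*)$");

    public static void main(String[] args) {
        String line = "17/06/09 20:10:57 INFO SparkContext: Running Spark version 2.1.0";
        Matcher m = SPARK_LINE.matcher(line);
        if (m.matches()) {
            System.out.println("timestamp=" + m.group(1));
            System.out.println("severity=" + m.group(2));
            System.out.println("component=" + m.group(3));
            System.out.println("text=" + m.group(4));
        }
    }
}
```

Each extracted field corresponds to what the message reader downstream would consume; a line that does not match the pattern would be counted as a parse error (see the adeext.parseError* properties above).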
The easiest way to run ade on Spark data is to run the spark_analysis_comp_test.sh script (similar to running ade on syslogs) with suitable data to train on and analyze. Prerequisites: Java 8 and Apache Derby.
Suppose we have Derby and ade-1.0.4 installed in the home (~) directory. To run the test, execute the following statements:
>>> cd ~
>>> ./db-derby-10.11.1.1-bin/bin/startNetworkServer # start the derby database server
>>> cd ade-1.0.4
>>> ./bin/test/spark_analysis_comp_test.sh
The training and analysis data are stored in ade-1.0.4/baseline/spark/upload/ and ade-1.0.4/baseline/spark/analyze/, respectively. The script performs the following steps:
- Create a temporary database
- Upload the training data to the database
- Train the model groups
- Use the trained model groups to analyze the analysis data
If you're interested in invoking the individual functions rather than calling the script, you'd need to use the controldb create, upload, train all, and analyze commands. You can read more about the ade command summary here.