SPL Conqueror is a library to identify and visualize the influence of configuration options on non-functional properties, such as performance or footprint, of configurable software systems. It consists of a set of sub-projects, which we briefly explain in the following. For further details, we refer to the dedicated sections.
- The core project ("SPLConqueror_Core") provides basic functionality to model the variability of a software system, including its configuration options and constraints among them.
- The MachineLearning sub-project provides an algorithm to learn a performance-influence model describing the influence of configuration options on non-functional properties. It also specifies interfaces for satisfiability checking of configurations with respect to the variability model and for finding an optimal configuration for a given objective function and non-functional property.
- The PyML sub-project provides a set of interfaces to scikit-learn, a machine-learning framework implemented in Python. Using this interface, different regression techniques of scikit-learn can be used.
- The CommandLine sub-project offers an interface to automatically execute experiments using different sampling strategies and different machine-learning techniques. To specify the experiments, SPL Conqueror offers a set of commands, which we explain in the dedicated section.
- The PerformancePrediction_GUI provides a graphical user interface to learn performance-influence models based on a learning set of configurations. Using this GUI, specific sampling strategies can be selected.
- The SPLConqueror_GUI provides a set of different visualizations that can be used to further understand a learned performance-influence model.
- The ScriptGenerator provides an interface to generate script files that can be used in the CommandLine sub-project.
- The VariabilityModel_GUI offers the possibility of defining a variability model of the configurable system being considered.
- The Persistence sub-project offers the possibility of writing objects to the storage device. It can be used to continue the execution of script files that were aborted during their execution.
On Ubuntu 16.04
- Clone the git repository.
- Install Mono and MonoDevelop:
  sudo apt install mono-complete monodevelop
- Start MonoDevelop and open the root project:
  <SPLConquerer-GitRoot>/SPLConqueror/SPLConqueror.sln
- Perform a right-click on every project of the solution and select the preferred target framework (e.g., .NET 4.5) in Options -> Build -> General.
- Perform a right-click on the solution and select "Restore NuGet packages". Be aware that an internet connection is required to perform this step.
- Build the root project.
On a Mac (OS X 10.11.6)
- Clone the git repository.
- Download and install the latest Xamarin IDE from https://www.xamarin.com
- Start the Xamarin IDE and open the root project:
  <SPLConquerer-GitRoot>/SPLConqueror/SPLConqueror.sln
- Build the root project.
On Windows 10
- Clone the git repository.
- Install Visual Studio.
- Open Visual Studio and open the solution.
- Perform a right-click on the solution and select "Restore NuGet Packages".
- Build the root project.
Troubleshooting
1. NuGet

If NuGet is not able to restore the packages, the following packages have to be added to the projects:
- MachineLearning:
- Accord
- Accord.Math
- SPLConqueror_GUI:
- ILNumerics (v3.3.3.0)
- SolverFoundationWrapper:
- Microsoft.Solver.Foundation (>= v3.0.0)
Additionally, if the package Microsoft.Solver.Foundation is needed, the following steps should be performed:
- Create a directory for the dll:
  mkdir "<SPLConquerer-GitRoot>/SPLConqueror/dll"
- Copy Microsoft.Solver.Foundation.dll (>= v3.0.0) to "<SPLConquerer-GitRoot>/SPLConqueror/dll"
SPL Conqueror provides four different graphical user interfaces.
VariabilityModel_GUI
The VariabilityModel_GUI can be used to define the variability model of a configurable system or to modify existing models. To create a new variability model for a system, first use File>New Model. Then, an empty model containing only a root configuration option is created. New options can be added to the model by a right click on an existing option that should be the parent option of the new one. In the "Create new Feature" dialogue, it is possible to define whether the new option is a binary or a numeric one. For numeric options, a minimal and a maximal value of the value domain also have to be defined. Besides, if only a subset of all values between the minimal and the maximal value of the domain is allowed, a specific step function can be defined. In this function, it is possible to use an alias for the numeric option (n). In the following, we give two examples of step functions:
- n + 2 (using this function, only even or odd values, depending on the minimal value of the value domain, are allowed)
- n * 2 (using this function, the minimal value is repeatedly multiplied by two until the maximal value is reached)
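The effect of such a step function can be sketched as follows. This is an illustrative re-implementation in Python, not SPL Conqueror's actual code; it assumes a value domain of [1, 10]:

```python
def expand_step_function(min_value, max_value, step):
    """Enumerate the allowed values of a numeric option by applying the
    step function to the minimal value until the maximal value is exceeded."""
    values = []
    n = min_value
    while n <= max_value:
        values.append(n)
        n = step(n)
    return values

# "n + 2" on the domain [1, 10]: only odd values, because the minimum is odd
print(expand_step_function(1, 10, lambda n: n + 2))  # [1, 3, 5, 7, 9]

# "n * 2" on the domain [1, 10]: the minimal value is doubled until 10 is exceeded
print(expand_step_function(1, 10, lambda n: n * 2))  # [1, 2, 4, 8]
```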
Additionally, constraints between different configuration options can be defined using Edit>Edit Constraints. Last, an alternative group of options can be created using Edit>Edit Alternative Groups.
An example for a variability model is given below:
<vm name="exampleVM">
<binaryOptions>
<configurationOption>
<name>xorOption1</name>
<outputString/>
<prefix/>
<postfix/>
<parent/>
<children/>
<impliedOptions/>
<excludedOptions>
<option>xorOption2</option>
</excludedOptions>
<defaultValue>Selected</defaultValue>
<optional>False</optional>
</configurationOption>
<configurationOption>
<name>xorOption2</name>
<outputString/>
<prefix/>
<postfix/>
<parent/>
<children/>
<impliedOptions/>
<excludedOptions>
<option>xorOption1</option>
</excludedOptions>
<defaultValue>Selected</defaultValue>
<optional>False</optional>
</configurationOption>
</binaryOptions>
<numericOptions>
<configurationOption>
<name>numericExample</name>
<outputString/>
<prefix/>
<postfix/>
<parent/>
<children/>
<impliedOptions/>
<minValue>1</minValue>
<maxValue>10</maxValue>
<stepFunction>numericExample + 2</stepFunction>
<defaultValue>10</defaultValue>
</configurationOption>
</numericOptions>
</vm>
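A model in this format can be read with any XML library. The following Python sketch is based only on the element names from the example above (it is not SPL Conqueror's own parser) and extracts the option names and the numeric value domain:

```python
import xml.etree.ElementTree as ET

# Variability model trimmed to the elements accessed below (same structure as above).
vm_xml = """<vm name="exampleVM">
  <binaryOptions>
    <configurationOption><name>xorOption1</name></configurationOption>
    <configurationOption><name>xorOption2</name></configurationOption>
  </binaryOptions>
  <numericOptions>
    <configurationOption>
      <name>numericExample</name>
      <minValue>1</minValue>
      <maxValue>10</maxValue>
      <stepFunction>numericExample + 2</stepFunction>
    </configurationOption>
  </numericOptions>
</vm>"""

root = ET.fromstring(vm_xml)
binary_options = [o.findtext("name") for o in root.find("binaryOptions")]
numeric_domains = {
    o.findtext("name"): (int(o.findtext("minValue")), int(o.findtext("maxValue")))
    for o in root.find("numericOptions")
}
print(binary_options)   # ['xorOption1', 'xorOption2']
print(numeric_domains)  # {'numericExample': (1, 10)}
```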
PerformancePrediction_GUI
The PerformancePrediction_GUI provides an interface to learn performance-influence models. To use this GUI, first a variability model and dedicated measurements of the system have to be provided. Afterwards, in the middle area of the GUI, a binary and a numeric sampling strategy have to be selected to define the set of configurations used in the learning process. To customize the machine-learning algorithm, all of its parameters can be modified. To start the learning process, press the "Start learning" button.
Note: Please make sure that bagging is set to false when using this GUI. If bagging is selected, a set of models is learned and all of them are presented in the GUI, which makes understanding the model hard.
After the learning is started, the models, which are learned in an iterative manner, are displayed in the lower part of the GUI. Here, the model is split into its different terms, where each term describes the identified influence of an individual option or of an interaction between options.
SPLConqueror_GUI
This GUI can be used to visualize a learned performance-influence model.
Script generator
The Script generator can be used to define .a-script files that are needed in the CommandLine project.
The CommandLine sub-project provides the possibility to automatically execute experiments using different sampling strategies on different case study systems. To this end, a .a-script file has to be defined. In the following, we explain the different commands in detail.
Basic command-line commands
SPL Conqueror provides a lot of commands, some of which are vital for an execution of SPL Conqueror. Unless one of the GUIs is used, knowing the basic command-line commands is crucial.
log <path_to_a_target_file>
Using this command, the output of SPL Conqueror is redirected to the given file.
SPL Conqueror will automatically create this file if it does not exist; otherwise, the file will be overwritten. Additionally, a .log_error file is created, which includes the errors that occur during the execution.
Note: If the log command is missing, the output will be printed directly to the console.
For example:
log C:\exampleLog.log
or
log /home/username/exampleLog.log
vm <path_to_model.xml>
To actually perform experiments on a given system, a variability model that covers the variability domain of the system being considered has to be defined. This can be done using the VariabilityModel_GUI.
For example:
vm C:\exampleModel.xml
or
vm /home/username/exampleModel.xml
Such a variability model generally consists of binary and numeric options with their properties and, optionally, boolean and nonBoolean constraints between configuration options; it has to be provided in a .xml file.
For instance, a variability model with the name exampleVM is defined as follows:
<vm name="exampleVM">
<binaryOptions>
<configurationOption>
<name>xorOption1</name>
<outputString/>
<prefix/>
<postfix/>
<parent/>
<children/>
<impliedOptions/>
<excludedOptions>
<option>xorOption2</option>
</excludedOptions>
<defaultValue>Selected</defaultValue>
<optional>False</optional>
</configurationOption>
<configurationOption>
<name>xorOption2</name>
<outputString/>
<prefix/>
<postfix/>
<parent/>
<children/>
<impliedOptions/>
<excludedOptions>
<option>xorOption1</option>
</excludedOptions>
<defaultValue>Selected</defaultValue>
<optional>False</optional>
</configurationOption>
</binaryOptions>
<numericOptions>
<configurationOption>
<name>numericExample</name>
<outputString/>
<prefix/>
<postfix/>
<parent/>
<children/>
<impliedOptions/>
<minValue>1</minValue>
<maxValue>10</maxValue>
<stepFunction>numericExample + 1</stepFunction>
<defaultValue>10</defaultValue>
</configurationOption>
</numericOptions>
</vm>
The nodes outputString, prefix, and postfix can be ignored for now. The parent node can either be empty or have an option node as child with the name of the option that is the parent of the current option (similar to excludedOptions). The children, impliedOptions, and excludedOptions nodes are analogous, with the exception that they can contain several options; they define the children and implied options of the current option, and the options that are excluded if this option is selected. stepFunction defines the function that decides which values the numeric option can have. For further real-world examples, we refer to the Supplemental Material.
all <path_to_a_measurement_file>
This command defines the file containing all measurements of a given system. Examples for this command are:
all C:\exampleMeasurements.xml
or
all /home/username/exampleMeasurements.xml
For this kind of file, two different formats are supported. The first one is a .csv format. Here, each line of the file contains the measurements for one configuration of the system. This file should contain a header that defines the names of the configuration options as well as the non-functional properties being considered. The second format is a .xml format. A short example using this format is provided in the following:
<results>
<row>
<data column="Configuration">xorOption1,</data>
<data column="Variable Features">numericExample;1</data>
<data column="nfp1">1234</data>
<data column="nfp2">2345</data>
</row>
<row>
<data column="Configuration">xorOption2,</data>
<data column="Variable Features">numericExample;10</data>
<data column="nfp1">4321</data>
<data column="nfp2">5432</data>
</row>
</results>
Further real-world examples of measurements in the .xml format are provided in the Supplemental Material.
Alternatively, the measurements can be provided in a .csv format. Here, the first row has to be a header with the names of the binary and numeric options and the names of the non-functional properties. The column of a binary option contains either true or false, indicating whether the option was selected in the respective configuration, and the column of a numeric option contains the value that was selected in this configuration. The remaining columns contain the values of the non-functional properties that were measured for this configuration. The above example formatted as csv:
xorOption1; | xorOption2; | numericExample; | nfp1; | nfp2; |
---|---|---|---|---|
true; | false; | 1; | 1234; | 2345; |
false; | true; | 10; | 4321; | 5432; |
Note: The element separator is ;, whereas the line separator is \n.
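To illustrate the format, here is a minimal Python sketch that reads such a ;-separated measurement file; the header and rows mirror the example above, and the snippet is not SPL Conqueror code:

```python
import csv
import io

# Measurement file content as in the example above (";" as element separator).
csv_text = (
    "xorOption1;xorOption2;numericExample;nfp1;nfp2\n"
    "true;false;1;1234;2345\n"
    "false;true;10;4321;5432\n"
)

# DictReader maps each row to {column name: cell value}.
reader = csv.DictReader(io.StringIO(csv_text), delimiter=";")
configurations = list(reader)
print(configurations[0]["nfp1"])            # 1234
print(configurations[1]["numericExample"])  # 10
```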
Before starting the learning process on the loaded data, one can adjust the settings used for machine learning. SPL Conqueror supports multiple settings to refine the learning. A list of all currently supported settings is presented in the following:
Name | Description | Default Value | Value Range |
---|---|---|---|
lossFunction | The loss function on whose basis options and interactions are added to the influence model. | RELATIVE | RELATIVE, LEASTSQUARES, ABSOLUTE |
parallelization | Turns the parallel execution of model candidates on/off. | true | true, false |
bagging | Turns the bagging functionality (ensemble learning) on. This functionality relies on parallelization (may require a larger amount of memory). | false | true, false |
baggingNumbers | Specifies how often an influence model is learned based on a subset of the measurement data. | 100 | int |
baggingTestDataFraction | Specifies the percentage of data taken from the test set to be used in one learning run. | 50 | int |
useBackward | Terms existing in the model can be removed during the learning procedure if their removal leads to a better model. | false | true, false |
abortError | The threshold at which the learning process stops. | 1 | double |
limitFeatureSize | If true, terms considered during the learning procedure cannot become arbitrarily complex. | false | true, false |
featureSizeThreshold | The maximal number of options participating in one interaction. | 4 | int |
quadraticFunctionSupport | If true, the learner can learn quadratic functions of one numeric option without learning the linear function a priori. | true | true, false |
crossValidation | If true, cross-validation is used during the learning process. | false | true, false |
learn_logFunction | If true, the learn algorithm can learn logarithmic functions such as log(soption1). | false | true, false |
learn_accumulatedLogFunction | Allows the creation of logarithmic functions with multiple features such as log(soption1 * soption2). | false | true, false |
learn_asymFunction | Allows the creation of functions with the form 1/soptions. | false | true, false |
learn_ratioFunction | Allows the creation of functions with the form soptions1/soptions2. | false | true, false |
learn_mirrowedFunction | Allows the creation of functions with the form (numericOption.maxValue - soptions). | false | true, false |
numberOfRounds | Defines the number of rounds the learning process has to perform. | 70 | int |
backwardErrorDelta | Defines the maximum increase of the error when removing a feature from the model. | 1 | double |
minImprovementPerRound | Defines the minimum improvement in error a round must reach; otherwise, the learning is aborted or the hierarchy is increased for hierarchy learning. | 0.1 | double |
withHierarchy | Defines whether we learn our model in hierarchical steps. | true | true, false |
bruteForceCandidates | Defines how candidate features are generated. | false | true, false |
ignoreBadFeatures | Enables an optimization: we do not want to consider candidates in the next X rounds that showed no or only a slight improvement in accuracy relative to all other candidates. | false | true, false |
stopOnLongRound | If true, stop learning if the whole process has been running longer than 1 hour and the current round runs longer than 30 minutes. | true | true, false |
candidateSizePenalty | If true, the candidate score (which is an average reduction of the prediction error the candidate induces) is made dependent on its size. | true | true, false |
learnTimeLimit | Defines the time limit for the learning process. If 0, no time limit. Format: HH:MM:SS | 0 | TimeSpan |
scoreMeasure | Defines which measure is used to select the best candidate and to compute the score of a candidate. | RELERROR | RELERROR, INFLUENCE |
outputRoundsToStdout | If true, the info about the rounds is output not only to the log file at the end of the learning, but also to the stdout during the learning after each round completion. | false | true, false |
Generally, to change the default settings, there are two options:

- The first is to add the settings in the format SETTING_NAME:VALUE to the mlsettings command. For instance, if the number of learning rounds should be reduced to 25, logarithmic functions should be allowed, and learning should not stop on long rounds, the associated command would be:
  mlsettings numberOfRounds:25 learn_logFunction:true stopOnLongRound:false
- The second option is to define the settings in a separate text file, with each line containing a single setting and its value in the format SETTING_NAME VALUE. This is useful to reuse the same machine-learning settings across several different runs. The content of the text file for the example above should then look like this:
numberOfRounds 25
learn_logFunction true
stopOnLongRound false
To load these settings, the command load_mlsettings can be used, with the path to the settings file as argument. For example:
load_mlsettings C:\exampleSettings.txt
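Parsing such a settings file amounts to splitting each line at the first whitespace, as in this sketch (an illustrative assumption about the format, not SPL Conqueror's parser):

```python
# Parse a load_mlsettings file: one "SETTING_NAME VALUE" pair per line.
settings_text = """numberOfRounds 25
learn_logFunction true
stopOnLongRound false"""

settings = {}
for line in settings_text.splitlines():
    if line.strip():  # skip empty lines
        name, value = line.split(maxsplit=1)
        settings[name] = value

print(settings["numberOfRounds"])  # 25
```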
Please note that all settings that are not stated will automatically be set to their default values. So, if the command is used to change the settings several times during the same run, the previous settings have no impact on the new settings.
To learn from the data, the non-functional property that will be used by the learning algorithm has to be set first. Any property that was previously defined in the measurement file can be used. If we use the previous example, we can either use nfp1 or nfp2. To set nfp1 or nfp2, use the nfp command with the appropriate argument:
nfp nfp1
or
nfp nfp2
Now, we have everything needed to learn with all measurements. For this, just use the learnwithallmeasurements command. A .a-script for learning with all measurements at this point, using the examples from above, is as follows:
log C:\exampleLog.log
vm C:\exampleModel.xml
all C:\exampleMeasurements.xml
mlsettings numberOfRounds:25 learn_logFunction:true stopOnLongRound:false
nfp nfp1
learnwithallmeasurements
The only thing missing for a very basic usage of SPL Conqueror is displaying the learning results. For this, use the analyze-learning command. This will print the current learning history with the learning error into the specified .log file. Note that each learning command overwrites the previous learning history, so analyze-learning should always be the first command after a learning command.
Finally, a complete basic .a-script file looks like this:
log C:\exampleLog.log
vm C:\exampleModel.xml
all C:\exampleMeasurements.xml
mlsettings numberOfRounds:25 learn_logFunction:true stopOnLongRound:false
nfp nfp1
learnwithallmeasurements
analyze-learning
SPL Conqueror also supports learning on a subset of the data. To this end, one first has to set at least one sampling strategy for the binary options and at least one for the numeric options. In the following, we list all sampling strategies:
Binary/Numeric | Name | Description | Command | Example |
---|---|---|---|---|
Binary | allbinary | Uses all available binary options to create configurations. | allbinary | allbinary |
Binary | featurewise | Determines all required binary options and then adds options until a valid configuration is reached. | featurewise | featurewise |
Binary | pairwise | Generates a configuration for each pair of configuration options. Exceptions: parent-child relationships, implication relationships. | pairwise | pairwise |
Binary | negfw | Get one variant per feature multiplied with alternative combinations; the variant tries to maximize the number of selected features, but without the feature in question. | negfw | negfw |
Binary | random | Get a certain number of random valid configurations. The binaryThreshold sets the maximum number of configurations. The randomness is simulated by the modulu value. | random <binaryThreshold> <modulu> | random 50 3 |
Numeric | plackettburman | A description of the Plackett-Burman design is provided here. | expdesign plackettburman measurements:<measurements> level:<level> | expdesign plackettburman measurements:125 level:5 |
Numeric | centralcomposite | The central composite inscribed design. This design is defined for numeric options that have at least five different values. | expdesign centralcomposite | expdesign centralcomposite |
Numeric | random | This design selects a specified number of value combinations for a set of numeric options. The value combinations are created using a random selection of values of the numeric options. | expdesign random sampleSize:<size> seed:<seed> | expdesign random sampleSize:50 seed:2 |
Numeric | fullfactorial | This design selects all possible combinations of numeric options and their values. | expdesign fullfactorial | expdesign fullfactorial |
Numeric | boxbehnken | An implementation of the Box-Behnken design as proposed in "Some New Three Level Designs for the Study of Quantitative Variables". | expdesign boxbehnken | expdesign boxbehnken |
Numeric | hypersampling | | expdesign hypersampling precision:<precisionValue> | expdesign hypersampling precision:25 |
Numeric | onefactoratatime | | expdesign onefactoratatime distinctValuesPerOption:<values> | expdesign onefactoratatime distinctValuesPerOption:5 |
Numeric | kexchange | | expdesign kexchange sampleSize:<size> k:<kvalue> | expdesign kexchange sampleSize:10 k:3 |
For instance, if all binary options and random numeric options with a sample size of 50 and a seed of 3 should be used for learning, the following lines have to be appended to the .a-script:
allbinary
expdesign random sampleSize:50 seed:3
Note: allbinary in combination with fullfactorial results in all measurements being taken into the sample set.
start

To learn only with a subset of the measurements, the command start can be used. This command requires that a binary and a numeric sampling strategy have been set before executing it.
Note: A numeric sampling strategy is only needed if the variability model contains numeric options.
If, for instance, only a subset of the data should be used for learning, the result looks as follows:
log C:\exampleLog.log
vm C:\exampleModel.xml
all C:\exampleMeasurements.xml
mlsettings numberOfRounds:25 learn_logFunction:true stopOnLongRound:false
nfp nfp1
allbinary
expdesign random sampleSize:50 seed:3
start
analyze-learning
clean-sampling
Due to the different results of the sampling strategies, it is reasonable to try different sampling strategies and different parameters for these strategies. To avoid having to start a new run for each combination of sampling strategies, SPL Conqueror also supports clearing all strategies.
For this just use the command: clean-sampling
Of course, to learn with a subset of the data after clearing the sampling, one has to set new sampling strategies before learning again.
clean-learning
Under normal circumstances, SPL Conqueror cleans up the learning data itself, so handling this is usually not required. However, to forcefully clear all machine-learning settings and the learned functions, the command clean-learning can be used.
script <path_to_script>
Sometimes it makes sense to split up the current .a-script into smaller scripts or to run a batch of scripts. For this, SPL Conqueror has the script command.
An example would be as follows:
script C:\subScript.a
Additional command-line commands
printconfigs <file> <prefix> <postfix>
With the command printconfigs, all sampled configurations are printed to a file. The command requires a target file as first argument and, optionally, a prefix, or a prefix and a postfix, which will be printed at the start and end of each configuration, respectively. A special usage of this command is printing all valid configurations of a variability model, using the allbinary and fullfactorial sampling strategies.
A short example using printconfigs to print all valid configurations into a text file:
vm C:\exampleVM.xml
allbinary
expdesign fullfactorial
printconfigs C:\allConfigurations.txt prefix postfix
Until now, the elements outputString, prefix, and postfix of the variability model were ignored. These attributes are used by the printconfigs command and are printed if the option in question is selected.
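How these elements could compose one printed configuration line can be sketched as follows; the composition rule and the option strings are assumptions for illustration, not taken from SPL Conqueror's implementation:

```python
def render_configuration(selected_options, file_prefix, file_postfix):
    """Compose one output line: the file-level prefix, then for every selected
    option its prefix + outputString + postfix, then the file-level postfix."""
    parts = [file_prefix]
    for prefix, output_string, postfix in selected_options:
        parts.append(prefix + output_string + postfix)
    parts.append(file_postfix)
    return " ".join(parts)

# Hypothetical option strings: "--opt1" for a binary option, "-n 5" for a numeric one.
line = render_configuration([("--", "opt1", ""), ("-n ", "5", "")], "prefix", "postfix")
print(line)  # prefix --opt1 -n 5 postfix
```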
optionorder <firstOption> <secondOption> ...
In case the options of a configuration should be printed in a certain order, e.g., to use the output as argument for the tested application, the optionorder command should be used, which sorts all options in the specified order and prints them.
For example:
optionorder optionC optionB optionA
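The effect of optionorder corresponds to reordering the selected options by the given list, as in this illustrative sketch:

```python
# Reorder the selected options of a configuration by a user-defined option order.
option_order = ["optionC", "optionB", "optionA"]
selected = ["optionA", "optionC"]

# Options appearing earlier in option_order are printed first.
ordered = sorted(selected, key=option_order.index)
print(ordered)  # ['optionC', 'optionA']
```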
<sampling strategy> validation
SPL Conqueror offers the possibility to use a validation set. This validation set is then used to validate the learning results. In case no validation set is specified, the learning set will also be used to validate the results. To use a validation set, the keyword validation has to be added after the sampling strategies.
For example:
allbinary validation
expdesign random sampleSize:50 seed:3 validation
printsettings
Using the printsettings command, the current machine-learning settings are printed into the .log file or, in case no .log file is set, to the console.
measurementstocsv <file>
In case the measurements should be printed to a .csv file, the command measurementstocsv can be used.
For example:
measurementstocsv C:\measurementsAsCSV.csv
Note: The element separator is ;, whereas the line separator is \n.
evaluationset <file>
By default, SPL Conqueror uses all measurements from the measurements file for the computation of the error rate. To change the evaluation set, the command evaluationset can be used. The file can be either a .csv file or a .xml file.
For example:
evaluationset C:\evaluationMeasurements.xml
Note: The format specified in the evaluation-file is the same as in the measurements-file.
resume-log <abortedAFile>
In case SPL Conqueror aborts unexpectedly, for instance because of a system crash, the learning process can often be resumed. To do so, a new .a-script has to be created, which contains the resume-log command with the aborted .a-script as argument.
For example:
resume-log C:\abortedScript.a