Skip to content
Todd Schiller edited this page Apr 20, 2015 · 7 revisions

The Celeriac tool produces program traces for .NET programs for use with the Daikon dynamic invariant detector. Daikon uses machine learning to infer likely program invariants and properties. The types of properties inferred by Daikon include “.field > abs(y)”; “y = 2*x+3”; “array a is sorted”; “for all list objects lst, lst.next.prev = lst”; “for all treenode objects n, n.left.value < n.right.value”; “p != null ⇒ p.content in myArray”; and many more.

Celeriac currently supports the C#, F#, and Visual Basic .NET languages.

Downloading Celeriac

Snapshot binaries are available from AppVeyor.

To build Celeriac from source, check out the source and build the Celeriac solution using Visual Studio. The repository includes DLL's for all referenced dependencies.

To build/run Celeriac on Mono, refer to the Mono Support wiki page.

Setup

The following setup ensures that Celeriac can properly instrument your run your program:

  1. Set the CELERIAC_HOME environment variable to the directory containing Celeriac.dll.

  2. Add Celeriac.dll to the Global Assembly Cache. In administrator mode, enter the following command:

    gacutil /i %CELERIAC_HOME%\Celeriac.dll
    

As an alternative to adding Celeriac to the Global Assembly Cache (GAC), you can copy the Celeriac DLLs to the directory of the assembly you are instrumenting.

NOTE: currently, if the assembly you are instrumenting refers other assemblies that aren't in the GAC, you must copy the files the directory containing the assemblies. (Resolved in Issue #101?)

Generating a Trace with Celeriac

Celeriac supports two modes: online and offline mode. In online mode, Celeriac instruments an executable when the executable is run. In offline mode, Celeriac creates an instrumented copy of the assembly on disk; the instrumented assembly outputs a trace when executed. Offline mode is used when generating traces for class libraries and WPF programs.

While Celeriac is compatible with both debug and release builds, a PDB file is necessary to guarantee the correctness of the program, and to properly name the program points and variables. If a PDB file is not present, Celeriac will attempt to proceed (debugging information is required for generating comparability information, however).

Online Tracing

To produce a trace for an executable, run the program using the Celeriac Launcher. For example, if you normally run the program using following command:

MyProgram.exe arg1 arg2 arg3

, instead use the following command:

CeleriacLauncher.exe celeriacArg1 celeriacArg2 MyProgram.exe arg1 arg3 arg3

A full listing of command line options is located on the Command Line Options wiki page.

Offline Tracing

To instrument an assembly to produce a trace when it is run, use Celeriac's --save-program option.

CeleriacLauncher.exe --save-program=InstrumentedClassLibrary.dll ClassLibrary.dll

A full listing of command line options is located on the Command Line Options wiki page.

This command will create an instrumented copy of the ClassLibrary.dll named InstrumentedClassLibrary.dll. If you are instrumented a class library, you will need to replace the original assembly with the instrumented assembly so that it is used by the executable. For example:

mv ClassLibrary.dll ClassLibrary.dll.orig
mv InstrumentedClassLibrary.dll ClassLibrary.dll

Optional: Verifying the Rewritten Assembly

To verify that Celeriac produced a valid .NET assembly, you can use Microsoft's PEVerify tool. On Windows 8, the tool is located at C:\Program Files\Microsoft SDKs\Windows\v8.0A\bin\NETFX 4.0 Tools\PEVerify.exe:

PEVerify.exe <assembly>

Celeriac Output

When run, Celeriac creates a daikon-output directory in the current working directory. Celeriac creates a file (with the extension .dtrace) in the directory that contains two pieces of information:

  • Metadata describing the Daikon program points (method entrances and exits) and all expressions at each program point (arguments, fields, properties, etc.). Note: Celeriac does not track local variables.
  • The trace of the program's execution

The metadata is produced during instrumentation; the trace is produced when the program is run. In online mode, a single command performs both instrumentation and tracing. In offline mode, a metadata file (with the extension .decls) is produced when the assembly is instrumented, and the trace (with the extension .dtrace) is generated when the program is run. A new trace file is generated for each run unless the --append-dtrace option is supplied. A separate trace is generated for each Application Domain (to avoid synchronization across domains).

The declaration file and trace file formats are specified in the Daikon Developer's Manual: Declarations, Trace Records.

Inferring Likely Invariants with Daikon

To generate Code Contracts, but not insert them into the source code, use the csharpcontract format:

java -cp "C:\path\to\daikon.jar" daikon.Daikon --format csharpcontract AssemblyName.decls AssemblyName.dtrace

Note that the metadata file (the .decls file) must be provided before any trace files (.dtrace files).

See the Daikon User Manual for more information about generating invariants from a trace.

Detecting Contracts Involving Getters and Pure Method

In order to safely generate information about a method (or properties) for an expression, Celeriac must know that the method or property is pure -- that the method does not modify the state of the program in a visible way. Additionally, the method must not depend on external resources (e.g., Date.Now).

Properties are methods that behave like both methods and fields. Like fields, they get/set a value for the object. However, like methods, they can throw exceptions, modify other program state (e.g., logging), or return a different value each time they are called. Since auto-generated property getters are pure, Celeriac can detect auto-generated property getters. However, in general, you will have to indicate which methods and property getters are pure.

Creating a Purity File

A purity file is a list of assembly qualified types and method names that should be called when outputting information about an expression of the type.

To have Celeriac output the list of nullary methods and properties that are reachable from the assembly (up to a maximum nesting depth), enter the following command:

CeleriacLauncher.exe --emit-nullary-info --purity-blacklist=prefix-blacklist.txt <assembly>

where the optional blacklist file contains method name prefixes that likely indicate that a method is not pure (e.g., Write, Add, etc). If none is provided a sensible default list is used.

The output additionally includes unary static methods that act on the same type as the method’s declaring type (for example, the string.IsNullOrEmpty method).

See the Command Line Options wiki page for a complete list of options.

Tracing Pure Methods

To have Celeriac output trace information for the pure methods, use the --purity-file command line option:

CeleriacLauncher.exe --purity-file=purity-file.txt ... <assembly>

Excluding Information from Traces and Invariants

By default, Celeriac will instrument every method in the assembly and provide information about every field. In some cases, this poses two problems: (1) the instrumented program runs prohibitively slowly, and (2) the generated invariants are uninteresting.

Celeriac provides many command line options for controlling which information it tracks; descriptions of these options are listed on the Command Line Options wiki page. This section describes some common usage to get you started.

Good Things to Ignore

Here are some things we've learned to exclude from analysis. In the future, we hope to provide heuristics to handle these automatically:

  • Auto-generated Parsers: the comparability analysis, trace generation, and Daikon runs very slowly on them due to their structure. If they're auto-generated, there's generally no need to generate invariants for them anyway.
  • Static system variables: unless you know your program uses them in a meaningful way, your best bet is to ignore them using the --omit-var= option (see below). (Fixing Issue #99 should eliminate the need to do this.)
  • Methods for common system types: unless you know your program uses a specific fact (e.g., the Trim method of string), don't include the method in the purity file, as it will be calculated for every use of the type.

Ignoring Comparisons of Unrelated Variables

By default, Daikon attempts to infer invariants over all combinations of variables of the same type, even if they are unrelated. Running Celeriac with the --comparability flag produces information about which variables might be related because they are used together. Comparability information helps reduce the number of false invariants that Daikon guesses.

Note: debugging symbols (i.e., a PDB file) are required to run the comparability analysis on a DLL

Celeriac performs comparability analysis using a static analysis during instrumentation. This approach is different than that taken by DynComp, the comparability analysis for Daikon's Java front-end. DynComp performs a dynamic analysis, observing interactions between variables when the program is run.

For example, running the command:

CeleriacLauncher.exe --comparability <assembly.dll> 

will produce a comparability file assembly.dll.cmp. Celeriac uses the comparability information when generating the declaration metadata during instrumentation. Because the comparability analysis can be slow for some assemblies, it's often beneficial to reuse the same comparability file when experimenting with other Celeriac parameters. To do so, supply a file name with the comparability argument, e.g.,:

CeleriacLauncher.exe --comparability=assembly.dll.cmp <assembly.dll> 

The Command Line Options page provides additional information about comparability analysis options:

  • --comparability=filename: augment declarations with comparability information from the specified file. If no file is specified, runs the comparability analysis
  • --generate-comparability --generate-comparability: runs the comparability analysis, saving the results for later use

Ignoring Static Fields, Constants, and Enumerations

Static fields (and enumeration values) include the fully qualified name of they type that declares the variable. Therefore, these variables can be excluded from a trace using the --omit-var option and a regular expression based on the qualified name of the type. For example, to exclude system fields and constants, the following regular expression can be supplied to Celeriac:

CeleriacLauncher.exe --omit-var="(System\..*?\..*)" ... <assembly> 

Method Call Sampling

Sampling can reduce the size of the produced trace with little effect on the quality of the produced invariants. To use sampling, use the --sample-start=_n_ option when instrumenting the assembly. The first n calls to each method will be traced. After the first n calls, sampling occurs with exponential back-off: first, every n th call will be traced, then every 10n calls, 100n calls etc. If --sample-start=10 then the samples are 1, 2, 3, ..., 10, 20, ... 100, 200, ..., 1000, 2000, ... , and so on. n must be greater than 0 for this option to have any effect.

Troubleshooting

Windows Anti-Malware Service CPU Load

When running a binary instrumented with Celeriac, the Windows Anti-Malware Service may begin consuming a large amount of CPU resources. We don't know what causes this, however the problem can be avoided by temporarily disabling the real-time protection feature of the Windows Anti-Malware Service.

Clone this wiki locally