IWHO is a Python library for representing machine instructions. Its focus is on making it easy to instantiate instructions with parameters.
The representation of an ISA in IWHO provides a set of instruction schemes, which are, effectively, instructions with holes instead of operands. These holes are annotated with operands that can be used there in an instantiation of the instruction scheme. A key feature of IWHO (which distinguishes it, e.g., from a similarly used combination of capstone and keystone) is that it provides a closed (dis)assembly loop: You can instantiate a list of IWHO instruction schemes with allowed operands, assemble it to a hex string, and disassemble this hex string to an equal list of instruction scheme instantiations.
Other use-cases include:
- Disassemble a hex string into a sequence of instructions, query their instruction schemes, instantiate them with (partially) changed operands, and dump the result as assembly or hex string.
- Do the same with an assembly string.
- Generate (test) inputs for instruction throughput predictors/microarchitectural code analyzers.
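The idea of instruction schemes as "instructions with holes" and the closed loop can be illustrated with a toy model. The following is a minimal sketch, not IWHO's API: the classes `Hole`, `Scheme`, and `Instance`, their methods, and the register set are all hypothetical names chosen for illustration, and the "assembler" is just string formatting.

```python
from dataclasses import dataclass

# Toy model only: these classes are hypothetical and are NOT IWHO's API.


@dataclass(frozen=True)
class Hole:
    name: str           # operand slot name, e.g. "dst"
    allowed: frozenset  # operands that may fill this slot


@dataclass(frozen=True)
class Scheme:
    mnemonic: str
    holes: tuple        # ordered operand slots

    def instantiate(self, **operands):
        """Fill every hole, rejecting operands the hole does not allow."""
        chosen = []
        for hole in self.holes:
            op = operands[hole.name]
            if op not in hole.allowed:
                raise ValueError(f"{op!r} is not allowed for {hole.name!r}")
            chosen.append(op)
        return Instance(self, tuple(chosen))

    def match(self, asm):
        """Parse 'mnemonic op1, op2' back into an instance, or return None."""
        mnemonic, _, rest = asm.strip().partition(" ")
        ops = tuple(o.strip() for o in rest.split(",")) if rest else ()
        if mnemonic != self.mnemonic or len(ops) != len(self.holes):
            return None
        if any(op not in hole.allowed for op, hole in zip(ops, self.holes)):
            return None
        return Instance(self, ops)


@dataclass(frozen=True)
class Instance:
    scheme: Scheme
    operands: tuple

    def to_asm(self):
        return " ".join([self.scheme.mnemonic, ", ".join(self.operands)]).strip()


GPR64 = frozenset({"rax", "rbx", "rcx", "rdx"})
ADD_R64_R64 = Scheme("add", (Hole("dst", GPR64), Hole("src", GPR64)))

insn = ADD_R64_R64.instantiate(dst="rax", src="rbx")
roundtrip = ADD_R64_R64.match(insn.to_asm())
print(insn.to_asm(), roundtrip == insn)
```

In this toy, instantiating a scheme, rendering it as text, and matching the text again yields an equal instance, which mirrors the closed-loop property on a very small scale.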
Under the hood, IWHO (dis)assembles instructions using LLVM's llvm-mc tool. Disassembled instructions are matched to registered instruction schemes via a simple parser.
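For orientation, here is a minimal sketch of disassembling a hex string with llvm-mc from Python. It only uses standard llvm-mc options (`--disassemble`, `--triple`, `--output-asm-variant`); the function names are hypothetical, and whether IWHO invokes llvm-mc with these exact options is an assumption.

```python
import shutil
import subprocess


def hex_to_mc_bytes(hex_str):
    """Turn '4801d8' into '0x48 0x01 0xd8', the byte list llvm-mc --disassemble reads."""
    data = bytes.fromhex(hex_str)
    return " ".join(f"0x{b:02x}" for b in data)


def disassemble(hex_str, llvm_mc="llvm-mc"):
    """Disassemble x86-64 machine code with llvm-mc and return its raw stdout.

    Hypothetical helper, not part of IWHO.
    """
    if shutil.which(llvm_mc) is None:
        raise FileNotFoundError(f"{llvm_mc} not found on PATH")
    cmd = [
        llvm_mc,
        "--disassemble",
        "--triple=x86_64",
        "--output-asm-variant=1",  # Intel syntax output
    ]
    result = subprocess.run(
        cmd,
        input=hex_to_mc_bytes(hex_str) + "\n",
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__" and shutil.which("llvm-mc"):
    # 48 01 d8 is the x86-64 encoding of "add rax, rbx".
    print(disassemble("4801d8"))
```

The raw output still contains assembler directives and needs to be parsed before it can be matched against instruction schemes.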
IWHO is not suitable for use as a full (dis)assembler framework since, e.g., relocations as memory operands are not represented in a semantically meaningful way.
Currently, only the x86-64 ISA is implemented, with instruction schemes extracted from uops.info.
This repo is a part of the AnICA project. Here are more related resources, for some context:
- The project page provides general information on the project.
- AnICA, the repo for the implementation of the core AnICA algorithm. Start there if you want to work with AnICA without the artifact VM and don't want to use the browser-based user interface.
- AnICA-UI, the repo for the accompanying browser-based user interface for inspecting discovered inconsistencies. Start there if you want to work with AnICA without the artifact VM and you want to use the UI.
- iwho (this repository), a subcomponent of AnICA that provides a convenient abstraction around instructions, which in this project greatly eases the task of randomly sampling valid basic blocks. Start here if you only want to use the instruction schemes abstraction, independent of AnICA.
- AnICA-Artifact, which provides the scripts used to generate the AnICA research artifact.
- The pre-built artifact on Zenodo, including a Vagrant VM and a guide to reproduce our results.
This is a research prototype; expect things to break!
These steps are for installing IWHO on its own. If you install IWHO as part of AnICA, follow the steps there instead of the ones here. In particular, AnICA and IWHO should use the same Python virtual environment.
Make sure that you have llvm-mc on your path (most likely by installing LLVM). It is used to handle basic instruction (dis)assembly tasks.
Furthermore, you need a python3 setup with the venv standard module available.
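The prerequisites above can be checked up front. This is a small sketch using only the Python standard library; the function name is hypothetical, and the extra check for `ensurepip` is an assumption (some distributions ship it separately from `venv`).

```python
import importlib.util
import shutil
import sys


def check_prerequisites():
    """Report which of the prerequisites appear to be available (hypothetical helper)."""
    return {
        "llvm-mc on PATH": shutil.which("llvm-mc") is not None,
        "python3": sys.version_info.major == 3,
        "venv module": importlib.util.find_spec("venv") is not None,
        # Assumption: creating a venv with pip also needs ensurepip.
        "ensurepip module": importlib.util.find_spec("ensurepip") is not None,
    }


for name, ok in check_prerequisites().items():
    print(f"{name}: {'ok' if ok else 'MISSING'}")
```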
- Get the repository and its submodule(s):
git clone <repo> iwho
cd iwho
- Set up the virtual environment for IWHO and install python dependencies and the IWHO package itself there:
./setup_venv.sh
Whenever you run IWHO commands from a shell, you need to have activated the virtual environment in this shell before:
source ./env/iwho/bin/activate
- Download the uops.info xml file:
./inputs/uops_info/fetch_xml.sh
Extract the InsnSchemes from the xml file:
./build_schemes.sh
- Run the tests:
./tests/test_iwho.py
You can directly interact with the IWHO instruction representations with the interactive scripts/playground.py script.
Run it with an optional IWHO config (see below) and, optionally, an input basic block in hex encoding or as assembly, for example:
./scripts/playground
./scripts/playground -c configs/iwho/default.json
./scripts/playground -x "c0ffee"
./scripts/playground -a "add rax, rbx; sub rcx, rdx"
./scripts/playground -a ".att_syntax; addq %rbx, %rax; subq %rdx, %rcx"
If available, IPython will be used for improved interactivity (run pip install ipython in the virtual environment to install it).
If instructions are provided via arguments, they are loaded, displayed, and inserted into the insns list before the interactive session starts.
The interactive session prints the available variables and functions, which can be used to encode and decode instructions.
You can check the human-readable IWHO representation of the loaded instructions by just typing insns.
Once you have set up a predictor registry (see below), you can also use tool/iwho-predict to run registered basic block throughput predictors on given basic blocks:
iwho-predict -c ./configs/predictors/default.json -a bb.s <pred_key_pattern1> ...
iwho-predict -c ./configs/predictors/default.json -x 'c0ffee' <pred_key_pattern1> ...
The positional arguments can be keys from the predictor registry in use (see below) or Python regex patterns matching one or more of these keys.
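As an illustration of how such key patterns can select predictors, here is a minimal sketch. The function name and the registry keys are hypothetical, and matching the whole key with `re.fullmatch` is an assumption; IWHO's own matching may be more permissive.

```python
import re


def select_predictors(registry_keys, patterns):
    """Return the registry keys matched by at least one pattern, in registry order.

    A pattern matches if it equals the key literally or if it matches the
    whole key as a regular expression (assumed semantics).
    """
    selected = []
    for key in registry_keys:
        if any(p == key or re.fullmatch(p, key) for p in patterns):
            selected.append(key)
    return selected


# Hypothetical registry keys, for illustration only.
keys = ["toolA.skl", "toolA.zen", "toolB.skl"]
print(select_predictors(keys, [r"toolA\..*"]))
print(select_predictors(keys, ["toolB.skl", r".*\.zen"]))
```

Comparing literally first keeps keys usable even if they contain characters that have a special meaning in regular expressions.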
The API documentation can be built with pdoc3.
After installing pdoc3 (pip install pdoc3) in the virtual environment, run the following command to generate html documentation in the html directory:
pdoc --html iwho --force
The subdirectories of the predictors directory contain convenience scripts (get.sh) to obtain and install several basic block throughput predictors for use with IWHO.
For Ithemal, there is a submodule instead of a getter script since we use a modified version of the original docker container. We added an RPyC-based interface to query into the container for throughputs and fixed some of the build scripts in the container that were affected by bit rot.
If you want to use the Ithemal docker container for throughput predictions with IWHO, you therefore need to fetch the corresponding submodule:
git submodule update --init --recursive
Then build and start the docker container (this requires sudo (the scripts query where necessary), a running docker service, and some time):
cd ./predictors/ithemal/
./build.sh
./start.sh
You can stop the Ithemal docker container again with the following script:
./stop.sh
IWHO is configured with a json file. For an example configuration, see configs/iwho/default.json, the default.
The context_specifier is a string identifier of the ISA declaration to use.
Currently, only the x86 instruction schemes extracted from uops.info are supported.
If you specify only a prefix of an available context specifier (like x86), one of the matching ISA declarations is chosen.
The filters entry is a list of filters used to control which instruction schemes from the selected ISA declaration are used.
Possible entries are (duplicates are allowed):
{"kind": "no_cf"}
: only instruction schemes that do not affect control flow{"kind": "with_measurements", "archs": ["SKL", ...]}
: only instruction schemes for which measurements are available for all of the given microarchitectures{"kind": "only_mnemonics", "mnemonics": ["add", ...]}
: only instruction schemes with one of the specified mnemonics{"kind": "blacklist", "file_path": "./path/to/schemes.csv"}
: only instruction schemes that are not in the specified file{"kind": "whitelist", "file_path": "./path/to/schemes.csv"}
: only instructions that are in the specified file
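A configuration combining several of these filters could be assembled and sanity-checked as sketched below. The keys `context_specifier` and `filters` and the filter kinds are taken from the description above; that they are the only top-level keys needed is an assumption (compare with configs/iwho/default.json), and `check_filters` is a hypothetical helper, not part of IWHO.

```python
import json

# Fields each filter kind needs besides "kind", as listed above.
REQUIRED_FIELDS = {
    "no_cf": set(),
    "with_measurements": {"archs"},
    "only_mnemonics": {"mnemonics"},
    "blacklist": {"file_path"},
    "whitelist": {"file_path"},
}


def check_filters(filters):
    """Return a list of human-readable problems; empty if none were found."""
    problems = []
    for i, entry in enumerate(filters):
        kind = entry.get("kind")
        if kind not in REQUIRED_FIELDS:
            problems.append(f"filter {i}: unknown kind {kind!r}")
            continue
        missing = REQUIRED_FIELDS[kind] - set(entry)
        if missing:
            problems.append(f"filter {i} ({kind}): missing {sorted(missing)}")
    return problems


# Assumed layout: both entries at the top level of the config file.
config = {
    "context_specifier": "x86",
    "filters": [
        {"kind": "no_cf"},
        {"kind": "with_measurements", "archs": ["SKL"]},
        {"kind": "only_mnemonics", "mnemonics": ["add", "sub"]},
    ],
}

print(check_filters(config["filters"]))
print(json.dumps(config, indent=2))
```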
IWHO also provides an interface to many basic block throughput predictors.
This component is configured with additional json config files; for an example, see configs/predictors/default.json.
It specifies how many prediction tasks may be performed concurrently (0 for as many as there are processor cores) and a path to a predictor registry file.
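The "0 means one task per processor core" convention can be resolved as in the following sketch. The function name is hypothetical, and rejecting negative values is an assumption; the actual key name in the config file is not shown here on purpose.

```python
import os


def resolve_num_tasks(configured):
    """Map the configured concurrency to an actual task count (hypothetical helper)."""
    if configured < 0:
        raise ValueError("number of concurrent tasks must not be negative")
    if configured == 0:
        # os.cpu_count() may return None on unusual platforms; fall back to 1.
        return os.cpu_count() or 1
    return configured


print(resolve_num_tasks(0), resolve_num_tasks(4))
```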
To run predictors, you need to create such a registry file, for example as configs/predictors/pred_registry.json to work with the default config.
This predictor registry needs to contain a dictionary that assigns configurations to unique predictor identifiers.
You may copy and adjust entries from configs/predictors/pred_registry_template.json to this file, while inserting install-dependent paths to the tools where necessary.