A toy superoptimizer for a made-up assembly language. See the blog post: My first superoptimizer.
It works by generating every possible permutation of code instructions and operands, then tests each generated program for equivalence to the original program. I broke the project up into these steps:
- Design a simple assembly language.
- Write an emulator that executes an assembly program and returns the final state.
- Write a function that tests the equivalence of two programs based on their before/after state.
- Write an assembler that can translate to and from human-readable assembly and the internal assembly representation.
- Write an optimizer that generates every program of length 1 to n instructions with every possible operand from 0 to k with x memory cells.
To focus on the superoptimizer and not making a comprehensive, realistic assembly language, I limited it to a boring set of instructions:
- LOAD val which loads the immediate value into memory location 0.
- SWAP mem, mem which swaps the values of the two memory locations.
- XOR mem, mem which performs a bitwise XOR operation on the values of the memory locations and stores the result in the first's location.
- INC mem which increments the value at the memory location.
There are many possible improvements:
- Start state. Right now it assumes the start state is always the same, which means there is no concept of program input.
- Program equivalence. A set of inputs and outputs should be specified such that two programs can actually be tested for equivalence.
- Pruning. Many nonsensical programs are generated, which significantly slows it down.
- More instructions. There need to be more instructions, especially a conditional instruction, to give the superoptimizer more opportunities to make improvements.
The blog post expands on all of these details.