Code optimization

Code optimization runs on compiled (ML) code. The compiled code is inspected for sequences of instructions which can be removed or replaced by a functionally equivalent, but shorter and/or faster, sequence of instructions. The new sequence might even be longer than the original one if it executes faster (see the goal option).

The information on compiler optimizations is a bit technical. It might be useful if you're trying to better understand how Mindcode generates the ML code.

Temporary Variables Elimination

The compiler sometimes creates temporary variables whose only function is to store the output value of an instruction before passing it somewhere else. This optimization removes all assignments from temporary variables that merely carry over the output value of the preceding instruction. The set instruction is removed, while the preceding instruction is updated to replace the temporary variable with the target variable used in the set statement.

The optimization is performed only when the following conditions are met:

  1. The set instruction assigns from a temporary variable.
  2. The temporary variable is used in exactly one other instruction. This other instruction immediately precedes the set instruction.
  3. All arguments of the other instruction referencing the temporary variable are output ones.

push and pop instructions are ignored by the above algorithm. push/pop instructions of any eliminated variables are removed by the Stack Optimization down the line.
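For example, reading a block property into a user variable initially compiles into a sensor instruction writing to a temporary variable, followed by a set instruction (a minimal sketch; the variable and block names are illustrative):

sensor __tmp0 vault1 @copper
set amount __tmp0

After the optimization, the set instruction is removed and the sensor instruction writes to the user variable directly:

sensor amount vault1 @copper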

Case Expression Optimization

Case expressions allocate a temporary variable to hold the value of the input expression, even when the input expression is actually a user defined variable. This optimization detects these instances, removes the temporary variable, and replaces it with the original variable containing the value of the input expression. The set instruction is removed, while the other instructions are updated to replace the temporary variable with the one used in the set statement.

The optimization is performed only when the following conditions are met:

  1. The set instruction assigns to a case-expression temporary variable.
  2. The set instruction is the first of all those using the temporary variable (the check is based on absolute instruction sequence in the program, not on the actual program flow).
  3. Each subsequent instruction using the temporary variable conforms to the code generated by the compiler (i.e. has the form of jump target <condition> __astX testValue)

push and pop instructions are ignored by the above algorithm. push/pop instructions of any eliminated variables are removed by the Stack Optimization down the line.
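For example, a case expression over a user variable level might initially compile into this sketch (addresses and names are illustrative):

set __ast0 level
jump 5 equal __ast0 1
jump 7 equal __ast0 2

After the optimization, the set instruction is removed and the jumps test the original variable directly:

jump 5 equal level 1
jump 7 equal level 2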

Dead Code Elimination

This optimization inspects the entire code and removes all instructions that write to variables, if none of the variables written to are actually read anywhere in the code.

This optimization supports the basic and aggressive optimization levels. On the aggressive level, the optimization removes all dead assignments, even assignments to unused global and main variables.

Dead Code Elimination also inspects your code and prints out suspicious variables:

  • Unused variables: those are the variables that were, or could be, eliminated. On basic level, some unused variables might not be reported.
  • Uninitialized variables: those are global variables that are read by the program, but never written to. (The Data Flow Optimization detects uninitialized local and function variables.)

Both cases deserve a closer inspection, as they might be a result of a typo in a variable name.
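For example, in the following code the value assigned to unused is never read (a minimal sketch):

#set optimization = aggressive
unused = rand(10)
print("Hello")

The assignment is removed, the variable is reported as unused, and only this code remains:

print "Hello"
end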

Jump Normalization

This optimization handles conditional jumps whose condition can be fully evaluated:

  • always false conditional jumps are removed,
  • always true conditional jumps are converted to unconditional ones.

A condition can be fully evaluated if both of its operands are literals, or if they're variables whose values were determined to be constant by the Data Flow Optimization.

The first case reduces the code size and speeds up execution. The second one in itself improves neither size nor speed, but allows those jumps to be handled by other optimizations aimed at unconditional jumps.
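For example, assuming the operands of both jumps below are known constants (a minimal sketch):

jump 10 lessThan 4 8
jump 12 greaterThan 4 8

the first condition is always true, so the jump becomes unconditional, while the second condition is always false, so the jump is removed:

jump 10 always 0 0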

Jump Optimization

Conditional jumps are sometimes compiled into an op instruction evaluating a boolean expression, and a conditional jump acting on the value of the expression.

This optimization turns the following sequence of instructions:

op <comparison> var A B
jump label equal/notEqual var false

into

jump label <inverse of comparison>/<comparison> A B

Prerequisites:

  1. jump is an equal/notEqual comparison to false,
  2. var is a temporary variable,
  3. var is not used anywhere in the program except these two instructions,
  4. for an equal jump, <comparison> has an inverse; for a notEqual jump, <comparison> itself exists as a jump condition.
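For example, the expression x < 10 used as a jump condition might compile into this sequence (a minimal sketch; the temporary variable name and addresses are illustrative):

op lessThan __tmp0 x 10
jump 8 equal __tmp0 false

which this optimization turns into a single jump with the inverse comparison:

jump 8 greaterThanEq x 10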

Single Step Elimination

This optimizer simplifies the following sequences of jumps that result from certain combinations of optimizations:

  1. A conditional or unconditional jump targeting the next instruction (such a jump is obviously unnecessary; see the sketch after this list).
  2. A conditional or unconditional jump identical to the next instruction (unconditional jumps would be also eliminated by Unreachable Code Elimination, but that isn't true for conditional jumps).
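A minimal sketch of the first case: other optimizations may leave behind a jump targeting the instruction immediately following it:

jump 1 always 0 0
print x

Such a jump is removed, leaving just the print instruction.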

Expression Optimization

This optimization looks for certain expressions that can be performed more efficiently. Currently, the following optimizations are available:

  • floor function called on a multiplication by a constant or on a division. Combines the two operations into one integer division (idiv) operation. In the case of multiplication, the constant operand is inverted to become the divisor in the idiv operation (see the sketch after this list).
  • sensor var @this @x and sensor var @this @y are replaced by set var @thisx and set var @thisy respectively. Data Flow Optimization can then apply constant propagation to the @thisx/@thisy built-in constants.
  • Multiplication by a literal zero is replaced by a set instruction setting the target variable to zero.
  • Multiplication/division by one and addition/subtraction of zero are replaced by a set instruction setting the target variable to the other operand.
  • All set instructions assigning a variable to itself (e.g. set x x) are removed.
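For example, the floor optimization combines each of these expressions into a single idiv instruction (a minimal sketch; in the second case, the constant 0.25 is inverted into the divisor 4):

a = floor(x / 3)
b = floor(x * 0.25)

produces

op idiv a x 3
op idiv b x 4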

If the optimization level is aggressive, the following additional expressions are handled:

  • If the @constant in a sensor var @constant @id instruction is a known item, liquid, block or unit constant, the object's Mindustry ID is looked up and the instruction is replaced by set var <id>, where <id> is a numeric literal. Mindcode contains a list of known Mindustry objects and their IDs, obtained using the Mindustry Metadata Extraction extension on the latest Mindustry version. These IDs are expected to be immutable, but if the actual IDs turn out to differ from the IDs known to Mindcode, the optimization can be turned off by setting the optimization level to basic.

If Expression Optimization

This optimization consists of three types of modifications performed on blocks of code created by if/ternary expressions. All possible optimizations are done independently.

Value propagation

The value of ternary expressions and if expressions is sometimes assigned to a user-defined variable. In these situations, the true and false branches of the if/ternary expression assign the value to a temporary variable, which is then assigned to the user variable. This optimization detects these situations and, when possible, assigns the final value to the user variable directly in the true/false branches:

abs = if x < 0
    negative += 1;  // The semicolon is needed to separate the two statements
    -x
else
    positive += 1
    x
end

produces this code:

jump 4 greaterThanEq x 0
op add negative negative 1
op mul abs x -1
jump 6 always 0 0
op add positive positive 1
set abs x
end

As the example demonstrates, value propagation works on more than just the set instruction. All instructions producing a single value are handled, specifically:

  • set
  • op
  • read
  • sensor
  • packcolor

Forward assignment

Some conditional expressions can be rearranged to save instructions while keeping execution time unchanged:

print(x < 0 ? "negative" : "positive")

Without If Expression Optimization, the produced code is

jump 3 greaterThanEq x 0
set __tmp1 "negative"
jump 4 always 0 0
set __tmp1 "positive"
print __tmp1

Execution speed:

  • x is negative: 4 instructions (0, 1, 2, 4) are executed,
  • x is positive: 3 instructions (0, 3, 4) are executed.

The if expression optimization turns the code into this:

set __tmp1 "positive"
jump 3 greaterThanEq x 0
set __tmp1 "negative"
print __tmp1

Execution speed:

  • x is negative: 4 instructions (0, 1, 2, 3) are executed,
  • x is positive: 3 instructions (0, 1, 3) are executed.

The execution time is the same. However, one less instruction is generated.

The forward assignment optimization can be done if at least one of the branches consists of just one instruction, and both branches produce a value which is then used. Depending on the type of condition and the branch sizes, either the true branch or the false branch can get eliminated this way. The average execution time remains the same, although in some cases the number of executed instructions per branch can change by one (the total number of instructions executed by both branches remains the same).

Compound condition elimination

The instruction generator always generates the true branch first. In some cases, the jump condition cannot be expressed as a single jump and requires an additional instruction (this only happens with the strict equality operator ===, which doesn't have an opposite operator in Mindustry Logic).

The additional instruction can be avoided when the true and false branches in the code are swapped. When this optimizer detects such a situation, it does exactly that:

if @unit.dead === 0
    print("alive")
else
    print("dead")
end

Notice the print("dead") occurs before print("alive") now:

sensor __tmp0 @unit @dead
jump 4 strictEqual __tmp0 0
print "dead"
jump 0 always 0 0
print "alive"
end

Chained if-else statements

The elsif statements are equivalent to nesting the elsif part in the else branch of the outer expression. Optimizations of these nested statements work as expected:

y = if x < 0
    "negative"
elsif x > 0
    "positive"
else
    "zero"
end

produces

set y "negative"
jump 5 lessThan x 0
set y "zero"
jump 5 lessThanEq x 0
set y "positive"
end

saving four instructions over the code produced without the If Expression Optimization:

jump 3 greaterThanEq x 0
set __tmp1 "negative"
jump 8 always 0 0
jump 6 lessThanEq x 0
set __tmp3 "positive"
jump 7 always 0 0
set __tmp3 "zero"
set __tmp1 __tmp3
set y __tmp1
end

Data Flow Optimization

This optimization inspects the actual data flow in the program and removes instructions and variables (both user defined and temporary) that are dispensable or have no effect on the program execution. Each individual optimization performed is described separately below.

Data Flow Optimization can have a profound effect on the resulting code. User-defined variables can get completely eliminated, and variables in expressions can get replaced by various other variables that were determined to hold the same value. The goal of these replacements is to allow elimination of some instructions. The optimizer doesn't try to avoid variable replacements that do not lead to instruction elimination - avoiding them would make the resulting code more understandable, but the optimizer would have to be more complex and therefore more prone to errors.

Note

One of Mindcode's goals is to facilitate making small changes to the compiled code, allowing users to change crucial parameters without needing to recompile the entire program. To this end, assignments to global variables are never removed. Any changes to set instructions assigning a value to a global variable in the compiled code are fully supported, and the resulting program is fully functional, as if the value was assigned to the global variable in the source code itself.

All other variables, including main variables and local variables, can be completely removed by this optimization. Even if they stay in the compiled code, changing the value assigned to a main variable may not produce the same effect as compiling the program with the other value. In other words, changing a value assigned to a main variable in the compiled code may break the compiled program.

If you wish to create a parametrized code, follow these rules for best results:

  • Use global variables as placeholders for the parametrized values.
  • Assign the parameter values at the very beginning of the program (this way the parameters will be easy to find in the compiled code).
  • If you sanitize or otherwise modify the parameter value before being used by the program, store the results of these operations in non-global variables, unless you need them to be global (accessible from multiple functions).
  • Do not modify instructions other than set instructions assigning values to global variables in the compiled code.

Optimization levels

On the basic optimization level, the Data Flow Optimization preserves the last values assigned to main variables (except main variables that served as a loop control variable in at least one unrolled loop). This protection is employed to allow trying out snippets of code in the web application (which runs on the basic optimization level by default) without the optimizer eliminating most or all of the compiled code due to it not having any effect on the Mindustry world.

#set optimization = basic
x0 = 0.001
y0 = 0.002
x1 = x0 * x0 + y0 * y0
y1 = 2 * x0 * y0

produces

set x0 0.001
set y0 0.002
op add x1 0.000001 0.000004
op mul y1 0.002 0.002
end

On the aggressive optimization level, no special protection is given to main variables, and they can be completely removed from the resulting code:

#set optimization = aggressive
x0 = 0.001
y0 = 0.002
x1 = x0 * x0 + y0 * y0
y1 = 2 * x0 * y0

produces

end

All the assignments were removed, as they wouldn't have any effect on the Mindustry world when run in an actual logic processor.

Handling of uninitialized variables

The data flow analysis reveals cases where variables might not be properly initialized, i.e. situations where a value of a variable is read before it is known that some value has been written to the variable. Warnings are generated for each uninitialized variable found.

Since Mindustry Logic executes the code repeatedly while preserving variable values, not initializing a variable might be a valid choice, relying on the fact that all variables are assigned a value of null by Mindustry at the beginning. If you intentionally leave a variable uninitialized, you may just ignore the warning. To avoid the warning, move the entire code into an infinite loop and initialize the variables before that loop. For example:

count += 1
print(count)
printflush(message1)

can be rewritten as

count = 0
while true
    count += 1
    print(count)
    printflush(message1)
end

Data Flow Optimization assumes that values assigned to uninitialized variables might be reused on the next program execution. Assignments to uninitialized variables before calling the end() function are therefore protected, while assignments to initialized variables aren't - they will be overwritten on the next program execution anyway:

foo = rand(10)
if initialized == 0
    print("Initializing...")
    // Do some initialization
    initialized = 1
    foo = 1
    end()
end
print("Doing actual work")
print(initialized)
print(foo)

produces this code:

op rand foo 10 0
jump 5 notEqual initialized 0
print "Initializing..."
set initialized 1
end
print "Doing actual work"
print initialized
print foo
end

Notice the initialized = 1 statement is preserved, while foo = 1 is not.

This protection is also applied to assignments to uninitialized variables made before calling a user function which, directly or indirectly, calls the end() function:

print(foo)
foo = 5
bar()
foo = 6
bar()

def bar()
    end()
end

preserves both assignments to foo:

print foo
set foo 5
jump 6 always 0 0
set foo 6
jump 6 always 0 0
end
end
end

See also end() function.

Unnecessary assignment elimination

All assignments to variables (except global variables) are inspected and unnecessary assignments are removed. The assignment is unnecessary if the variable is not read after being assigned, or if it is not read before another assignment to the variable is made:

a = rand(10)
a = rand(20)
print(a)
a = rand(30)

compiles to:

op rand a 20 0
print a
op rand a 30 0
end

The first assignment to a is removed, because a is not read before another value is assigned to it. The last assignment to a is preserved on basic optimization level, but would be removed on aggressive level, because a is not read after that assignment at all.

An assignment can also become unnecessary due to other optimizations carried out by this optimizer.

Constant propagation

When a variable is used in an instruction and the value of the variable is known to be a constant value, the variable itself is replaced by the constant value. This can in turn make the original assignment unnecessary. See for example:

#set optimization = aggressive
a = 10
b = 20
c = @tick + b
printf("$a, $b, $c.")

produces

op add c @tick 20
print "10, 20, "
print c
print "."
end

Constant folding

Constant propagation described above ensures that constant values are used instead of variables where possible. When a deterministic operation is performed on constant values (such as addition by the op add instruction), constant folding evaluates the expression and replaces the operation with the resulting value, eliminating an instruction. For example:

#set optimization = aggressive
a = 10
b = 20
c = a + b
printf("$a, $b, $c.")

produces

print "10, 20, 30."
end

Looks quite spectacular, doesn't it? Here's what happened:

  • The optimizer figured out that variables a and b are not needed, because they only hold a constant value.
  • Then it found out the c = a + b expression has a constant value too.
  • What was left was a sequence of print statements, each printing a constant value. Print Merging optimization on aggressive level then merged it all together.

Not every opportunity for constant folding is detected at this moment. While x = 1 + y + 2 is optimized to op add x y 3, x = 1 + y + z + 2 is too complex to process, and the constant values of 1 and 2 won't be folded at compile time.

If the result of a constant expression doesn't have a valid mlog representation, the optimization is not performed. In other cases, loss of precision might occur.

Common subexpressions optimization

The Data Flow Optimizer keeps track of expressions that have been evaluated. When the same expression is encountered for a second (third, fourth, ...) time, the result of the last computation is reused instead of evaluating the expression again. In the following code:

a = rand(10)
b = a + 1
c = 2 * (a + 1)
d = 3 * (a + 1)
print(a, b, c, d)

the optimizer notices that the value a + 1 was assigned to b after it was computed for the first time and reuses it in the subsequent instructions:

op rand a 10 0
op add b a 1
op mul c 2 b
op mul d 3 b
print a
print b
print c
print d
end

Again, not every possible opportunity is used. Instructions are not rearranged, for example, even if doing so would allow more evaluations to be reused.

On the other hand, entire complex expressions are reused if they're identical. In the following code

a = rand(10)
b = rand(10)
x = 1 + sqrt(a * a + b * b)
y = 2 + sqrt(a * a + b * b)
print(x, y)

the entire square root is evaluated only once:

op rand a 10 0
op rand b 10 0
op mul __tmp2 a a
op mul __tmp3 b b
op add __tmp4 __tmp2 __tmp3
op sqrt __tmp5 __tmp4 0
op add x 1 __tmp5
op add y 2 __tmp5
print x
print y
end

Function call optimizations

Variables and expressions passed as arguments to inline functions, as well as return values of inline functions, are processed in the same way as other local variables. Using an inlined function therefore doesn't incur any overhead at all in Mindcode.

Data flow analysis, with some restrictions, is also applied to stackless and recursive function calls. Assignments to global variables inside stackless and recursive functions are tracked and properly handled. Optimizations are applied to function arguments and return values. (This optimization has completely replaced the earlier Function call optimization and Return value optimization: everything those optimizations could do, and some things they couldn't, is now performed by the Data Flow Optimization.)

Support for other optimizations

Data Flow Optimization gathers information about variables and the values they attain in the program. Apart from using this information for its own optimizations, it also provides it to the other optimizers. The other optimizers can then improve their own optimizations further (e.g. to determine that a conditional jump is always true or always false and act accordingly). Data Flow Optimization is therefore crucial for best performance of some other optimizers as well. Some optimizations, such as Loop Unrolling, might outright require the Data Flow Optimization to be active for their own work.

Loop Hoisting

Loop hoisting is an optimization which tries to identify loop invariant code (i.e. code inside loops which executes identically in each loop iteration) and moves it in front of the loop. This way the code is executed only once, instead of on each loop iteration.

For example, in the following code

A = 10
for i = 0; i < A; i += 1
    print(2 * A)
end

the evaluation of 2 * A is moved in front of the loop in the compiled code (changes highlighted):

    0  set A 10
    1  set i 0
+   2  op mul __tmp1 2 A
    3  jump 0 greaterThanEq 0 A
-      op mul __tmp1 2 A
    4  print __tmp1
    5  op add i i 1
-      jump 3 lessThan i A
+   6  jump 4 lessThan i A
    7  end

The loop condition is processed as well as the loop body, and invariant code in nested loops is hoisted all the way to the top when possible:

A = 10
for j in 0 ... A
    i = 0
    while i < A + 10
        i = i + 1
        print(i)
    end
end

compiles into

    0  set A 10
    1  set j 0
+   2  op add __tmp1 A 10
    3  jump 0 greaterThanEq 0 A
    4  set i 0
-      op add __tmp1 A 10
-      jump 10 greaterThanEq 0 __tmp1
+   5  jump 9 greaterThanEq 0 __tmp1
    6  op add i i 1
    7  print i
-      op add __tmp1 A 10
    8  jump 6 lessThan i __tmp1
    9  op add j j 1
-      jump 3 lessThan j A
+  10  jump 4 lessThan j A
   11  end

On aggressive optimization level, Loop Hoisting is capable of handling some if expressions as well:

#set loop-hoisting = aggressive
A = 10
for i in 1 ... A
    k = (A % 2 == 0) ? "Even" : "Odd"
    print(i, k)
end
print("end")

produces

    0  set A 10
    1  set i 1
-      jump 11 greaterThanEq 1 A
    2  set k "Odd"
    3  op mod __tmp1 A 2
-      jump 7 notEqual __tmp1 0
+   4  jump 6 notEqual __tmp1 0
    5  set k "Even"
+   6  jump 11 greaterThanEq 1 A
    7  print i
    8  print k
    9  op add i i 1
-      jump 3 lessThan i A
+  10  jump 7 lessThan i A
   11  print "end"
   12  end

At this moment, the following limitations apply:

  • If the loop contains a stackless or recursive function call, all global variables are marked as loop dependent and expressions based on them aren't hoisted, since the compiler must assume the value of the global variable could be changed inside a function.
  • if expressions are hoisted only when they're part of simple expressions. Specifically, when the if expression is nested in a function call (such as print(A < 0 ? "negative" : "positive")), it won't be optimized.

Loop Optimization

The loop optimization improves loops with the condition at the beginning by performing these modifications:

  • If the loop jump condition is invertible, the unconditional jump at the end of the loop to the loop condition is replaced by a conditional jump with the inverted loop condition targeting the first instruction of the loop body. This doesn't affect the number of instructions, but executes one fewer instruction per iteration.
    • If the loop condition isn't invertible (that is, the jump condition is ===), the optimization isn't done, since the saved jump would be spent on inverting the condition, and the code size would increase for no benefit at all.
  • If the previous optimization was done and the loop condition is known to be true before the first iteration of the loop, the optimizer removes the jump at the front of the loop. The Loop Optimizer utilizes information gathered by Data Flow Optimization to evaluate the initial loop condition.
  • Loop conditions that are complex expressions spanning several instructions can still be replicated at the end of the loop, if the code generation goal is set to speed (the default setting at the moment). As a result, the code size might actually increase after performing this kind of optimization. See optimization for speed for details.

The result of the first two optimizations in the list can be seen here:

LIMIT = 10
for i in 0 ... LIMIT
    cell1[i] = 1
end
print("Done.")

produces

set LIMIT 10
set i 0
jump 6 greaterThanEq 0 LIMIT
write 1 cell1 i
op add i i 1
jump 3 lessThan i LIMIT
print "Done."
end

Executing the entire loop (including the i variable initialization) takes 32 steps. Without optimization, the loop would require 43 steps. That's quite a significant difference, especially for small loops.

The third modification is demonstrated here:

while switch1.enabled and switch2.enabled
    print("Doing something.")
end
print("A switch has been reset.")

which produces:

sensor __tmp0 switch1 @enabled
sensor __tmp1 switch2 @enabled
op land __tmp2 __tmp0 __tmp1
jump 9 equal __tmp2 false
print "Doing something."
sensor __tmp0 switch1 @enabled
sensor __tmp1 switch2 @enabled
op land __tmp2 __tmp0 __tmp1
jump 4 notEqual __tmp2 false
print "A switch has been reset."
end

Loop Unrolling

Loop unrolling is a speed optimization, and as such is only active when the goal option is set to speed or auto. Furthermore, loop unrolling depends on the Data Flow optimization and isn't functional when Data Flow Optimization is not active.

The principle of loop unrolling

The loop unrolling optimization works by replacing loops whose number of iterations can be determined by the compiler with a linear sequence of instructions. This results in a speedup of program execution, since the jump instruction representing the condition which terminates the loop, and oftentimes also the instruction(s) that change the loop control variable, can be removed from the unrolled loop, so that only instructions actually performing the intended work of the loop remain. The optimization is most efficient on loops that are very "tight" - loops which contain very few instructions apart from the loop mechanics themselves. The most dramatic practical example is probably something like this:

#set loop-unrolling = off
for i in 0 ... 10
    cell1[i] = 0
end

This code clears the first ten slots of a memory cell. Without loop unrolling, the code would look like this:

set i 0
write 0 cell1 i
op add i i 1
jump 2 lessThan i 10

It takes 31 instruction executions to perform this code. With loop unrolling, though, the code is changed to this:

write 0 cell1 0
write 0 cell1 1
write 0 cell1 2
write 0 cell1 3
write 0 cell1 4
write 0 cell1 5
write 0 cell1 6
write 0 cell1 7
write 0 cell1 8
write 0 cell1 9

The size of the loop is now 10 instructions instead of 4, but it takes just these 10 instructions to execute, instead of the 31 in the previous case, executing three times as fast!

The price for this speedup is the increased number of instructions. Since there's a hard limit of 1000 instructions in a Mindustry Logic program, loops with a large number of iterations obviously cannot be unrolled. See speed optimization for an explanation of how Mindcode decides whether to unroll a loop.

Apart from removing the superfluous instructions, loop unrolling also replaces variables with constant values. This can make further optimization opportunities arise, especially for the Data Flow Optimizer and possibly for others. A not particularly practical, but nonetheless striking example is this program, which computes the sum of numbers from 0 to 100:

sum = 0
for i in 0 .. 100
    sum += i
end
print(sum)

which compiles to (drumroll, please):

print 5050
end

What happened here is this: the loop was unrolled to individual instructions in this basic form:

set sum 0
set i 0
op add sum sum i
op add i i 1 
op add sum sum i
op add i i 1 
...

Data Flow Optimization then evaluated all those expressions using the combination of constant propagation and constant folding, all the way into the final sum of 5050, which is then directly printed.

Loop unrolling preconditions

A list iteration loop can always be unrolled if there's enough instruction space left.

For other loops, unrolling can generally be performed when Mindcode can determine the loop has a certain fixed number of iterations and can infer other properties of the loop, such as the variable that controls the loop iterations. A loop is eligible for unrolling when the following conditions are met:

  • The loop is controlled by a single local or main variable: the loop condition must consist of a variable which is modified inside the loop, and a constant or an effectively constant variable. Loops based on global variables cannot be unrolled.
  • The loop control variable is modified inside the loop only by op instructions which have the loop control variable as a first argument and a constant or an effectively constant variable as a second argument; the op instructions must be deterministic. Any other instruction that sets the value of loop control variable precludes loop unrolling.
  • All modifications of the loop control variable happen directly in the loop body: the variable must not be modified in a nested loop or in an if statement, for example.
  • The loop has a nonzero number of iterations. The upper limit of the number of iterations depends on available instruction space, but generally can never exceed 1000 iterations.

Furthermore:

  • If the optimization level is basic, the loop control variable can only be modified by add and sub operations. Every value the loop control variable attains in individual iterations must be an integer (this means that the starting value, the ending value, and every change of the iteration variable must be integers as well). The loop condition must be expressed using one of these operators: >, <, >= or <=. In this mode, the total number of iterations is computed using the starting and ending value of the variable and the change in each iteration (resulting in a fast computation).
  • If the optimization level is aggressive, every deterministic update of loop control variable by a constant value and every form of loop condition is allowed. In this case, Mindcode determines the total number of iterations by emulating the entire execution of the loop, until the loop exits or the maximum possible number of iterations allowed by available instruction space is reached (meaning the loop cannot be unrolled).
  • break and continue statements are supported. However, when using a break statement, the entire loop is still unrolled. Possible superfluous code after a break statement might be removed by other optimizers.

Examples of loops that can be unrolled on basic optimization level:

// Basic case
for i in 0 .. 10
    cell1[i] = cell1[i + 1]
end

// Two separate increments of the loop control variable
j = 0
while j < 20
    println(j)
    j += 1
    println(j)
    j += 2
end

// The loop control variable can be used in further expressions
for k in 0 ... 10
    cell1[k] = cell1[2 * k]
end

for l in 0 ... 10
    println(l % 2 ? "Odd" : "Even")
end

// Loop inside an inline function: can be unrolled if a constant value is passed into the size parameter
inline def copy(src, dst, size)
    for i in 0 ... size
        dst[i] = src[i]
    end
end

// This will produce a loop that can be unrolled
copy(cell1, cell2, 10)

// This produces a loop that CANNOT be unrolled: SIZE is not a constant value
SIZE = 10
copy(cell1, cell2, SIZE)

// Some loops containing expressions in the condition can still be unrolled,
// but it strongly depends on the structure of the expression
i = 0
while i += 1 < 10
    print(i)
end 

Examples of loops that can be unrolled on aggressive optimization level:

// An operation different from add and sub is supported
for i = 1; i < 100000; i <<= 1
    println(i)
end

// This loop can be unrolled, because it terminates
// (if the condition was j > 0, it would never terminate)
j = 10
while j > 1
    j += 1
    println(j)
    j \= 2
    println(j)
end

// This loop is unrolled, but the number of iterations is 11!
// The code produces the same output as if it wasn't unrolled.
// This is because of rounding errors when evaluating floating-point expressions 
for k = 0; k < 1; k += 0.1
    println(k)
end

Examples of loops that cannot be unrolled:

// LIMIT is a global variable and as such the value assigned to it isn't considered constant
// (see Data Flow Optimization)
LIMIT = 10
for i in 0 ... LIMIT
    cell1[i] = 0
end

// The loop control variable is changed inside an if
i = 0
while i < 10
    i += 1
    print(i)
    if i % 5 == 0
        i += 1
        print("Five")
    end
end

// There isn't a single loop control variable - the loop condition depends on both i and j
for i = 0, j = 10; i < j; i += 1, j -= 1
    print(i)
end

// The expression changing the loop control variable is too complex.
// (Rewriting the assignment to i *= 2, i += 1 would allow unrolling) 
i = 0
while i < 1000
  i = 2 * i + 1
  print(i)
end

// This loop won't be unrolled. We know it ends after 5 iterations due to the break statement,
// but Mindcode assumes 2000 iterations, always reaching the instruction limit.  
for i in 0 ... 2000
    if i > 5 break end
    print(i)
end

Unrolling nested loops

Nested loops can also be unrolled, and the optimizer prefers unrolling the inner loop:

k = 0
for i in 0 ... 100
    for j in 0 ... 100
        k = k + rand(j) // Prevents collapsing the loop by Data Flow Optimization 
    end
end

Both loops are eligible for unrolling at the beginning, and the inner one is chosen. After that the outer loop can no longer be unrolled, because the instruction limit would be hit.

Sometimes unrolling an outer loop can make the inner loop eligible for unrolling too. In this case, the inner loop cannot be unrolled first, as its number of iterations is not constant:

#set optimization = aggressive
first = true
for i in 1 .. 5
    for j in i .. 5
        if first
            first = false
        else
            print(" ")
        end
        print(10 * i + j)
    end
end

In this example, the outer loop is unrolled first, after which each copy of the inner loop can be unrolled independently. Further optimizations (including Print Merging) then compact the entire computation into a single print instruction:

print "11 12 13 14 15 22 23 24 25 33 34 35 44 45 55"
end

Function Inlining

Function Inlining is a speed optimization, and as such is only active when the goal option is set to speed or auto.

The principles of function inlining

Function inlining converts out-of-line function calls into inline function calls. This conversion alone saves a few instructions: storing the return address, jumping to the function body and jumping back to the original address. Moreover, many additional optimizations might become available once a function is inlined, especially if the inlined function call uses constant argument values; in such a situation, other powerful optimizations, such as constant folding or loop unrolling, may apply.

A user defined, non-recursive function which is called just once in the entire program is automatically inlined, and this cannot be prevented: such code is always both faster and smaller. It is also possible to declare individual functions using the inline keyword, forcing all calls of such functions to be inlined.

Automatic function inlining

This optimization can inline additional functions that aren't recursive and also aren't declared inline in the source code. If there's enough instruction space, all function calls may be inlined and the original function body removed from the program.

When the optimization level is set to aggressive and there isn't enough instruction space, only one or several specific function calls may be inlined; in that case the original function body remains in the program and is used by the function calls that weren't inlined. If only the last two function calls remain, either both of them or neither will be inlined.

It is therefore no longer necessary to use the inline keyword, except in cases when Mindcode's automatic inlining chooses a function different from the one(s) you prefer to be inlined.
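As an illustration, consider a function called twice in the program (a minimal sketch; the function and variable names are illustrative):

def add3(n)
    n + 3
end

print(add3(1))
print(add3(2))

With enough instruction space, both calls may be inlined and the original function body removed. Since the arguments are constants, constant folding may then reduce the inlined calls to plain print instructions, along the lines of

print 4
print 5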

Case Switching

Case Switching is a speed optimization, and as such is only active when the goal option is set to speed or auto.

Case expressions are normally compiled to a sequence of conditional jumps: for each when branch, the entry condition(s) of that clause is evaluated; when it is false, control is transferred to the next when branch, and eventually to the else branch or the end of the expression. This means the case expression evaluates - on average - half of all existing conditions, assuming an even distribution of the input values of the case expression. (If some input values of the case expression are more frequent, it is possible to achieve better average execution times by placing those values first.)

The Case Switching optimization improves case expressions which branch on integer values of the expression.

Warning

It is assumed that a case statement branching exclusively on integer values always gets an integer value as its input. If the input value of the case expression may take on non-integer values, this optimization will produce wrong code. At this moment Mindcode isn't able to recognize such a situation; if this is the case, you need to disable the Case Switching optimization manually.

The sequence of conditional jumps in such statements is replaced by a jump table. A jump table facilitates direct jumps to the corresponding when branch. The actual instructions used to build a jump table are

jump <else branch address> lessThan value minimal_when_value
jump <else branch address> greaterThan value maximal_when_value
op add @counter <start_of_jump_table - minimal_when_value> value
start_of_jump_table:
jump <when branch for minimal when value address>
jump <when branch for minimal when value + 1 address>
jump <when branch for minimal when value + 2 address>
...
jump <when branch for maximal when value address>

The jump table is put in front of the when branches. Original conditions in front of each processed when branch are removed. Each when branch jumps to the end of the case expression as usual.

To build the jump table, the minimum and maximum value of existing when branches are determined first. Values outside this range are handled by the else branch (if there isn't an explicit else branch in the case statement, the else branch just jumps to the end of the case expression). Values inside this range are mapped to a particular when branch, or, if the value doesn't correspond to any of the when branches, to the else branch.

The first two instructions in the example above (jump lessThan, jump greaterThan) handle the cases where the input value lies outside the range supported by the jump table. The op add @counter instruction then transfers the control to the corresponding specific jump in the jump table and consequently to the proper when branch.

The jump table executes at most four instructions on each case expression execution (fewer if the input value lies outside the supported range). We've mentioned above that the original case statement executes half of its conditional jumps on average. This means that converting the case expression to a jump table only makes sense when there are more than 8 conditional jumps in the case expression (with n conditional jumps, the average of n / 2 executed jumps exceeds the four instructions of the jump table only when n > 8).

Notes:

  • If you put the more frequent values first in the case expression, and the value distribution is very skewed, converting the case expression to the jump table might actually worsen the average execution time. Mindcode has no way to figure this out on its own; if you encounter this situation, you might need to disable the Case Switching optimization for your program.
  • If the input value of the case expression could be determined by Mindcode to already lie in a specific range, it would be possible to avoid the jump lessThan and/or jump greaterThan instructions at the beginning of the jump table, as their function is to ensure the value lies in a proper range. This would provide a potentially significant additional speedup and is planned for some future version.
  • Currently, there's no limit on the size of the jump table. For a case expression handling values 1 to 10 and then a value of 100, the jump table would have 100 entries. This affects computation of the optimization's benefit, and might make the optimization less favorable compared to other optimizations; however if available space permits it, such a jump table would be created.

Preconditions

The following conditions must be met for a case expression to be processed by this optimization:

  • All values used in when clauses must be effectively constant.
  • All values used in when clauses must be integers.
  • Values used in when clauses must be unique.
  • There are no ranges in the when clauses.

Example

value = floor(rand(20))
text = case value
    when 0 then "None"
    when 1 then "One"
    when 2, 3, 4 then "A few"
    when 5, 6, 7, 8 then "Many"
    when 10 then "Ten"
    else "I don't known this number!"
end
print(text)

The above case expression is transformed to this:

op rand __tmp0 20 0
op floor value __tmp0 0
op min __tmp3 value 11
op max __tmp3 __tmp3 -1
op add @counter __tmp3 6
jump 28 always 0 0
jump 18 always 0 0
jump 20 always 0 0
jump 22 always 0 0
jump 22 always 0 0
jump 22 always 0 0
jump 24 always 0 0
jump 24 always 0 0
jump 24 always 0 0
jump 24 always 0 0
jump 28 always 0 0
jump 26 always 0 0
jump 28 always 0 0
set __tmp2 "None"
jump 29 always 0 0
set __tmp2 "One"
jump 29 always 0 0
set __tmp2 "A few"
jump 29 always 0 0
set __tmp2 "Several"
jump 29 always 0 0
set __tmp2 "Ten"
jump 29 always 0 0
set __tmp2 "I don't known this number!"
print __tmp2
end

Return Optimization

Return Optimization is a speed optimization, and as such is only active when the goal option is set to speed or auto.

The Return Optimization is very simple. Whenever there's an unconditional jump to the final sequence of instructions representing a return from the call (which is always three instructions long), the jump is replaced by the entire return sequence. The jump execution is avoided at the price of two additional instructions.

The impact of this optimization is probably marginal. Recursive functions are of limited use by themselves, and this optimization only applies in rather specific contexts.

Jump Straightening

This optimization detects situations where a conditional jump skips a following unconditional jump. Such a pair is replaced with a single conditional jump that has the inverted condition and the target of the second jump. Example:

jump __label0 equal __tmp9 false
jump __label1
label __label0

is turned into

jump __label1 notEqual __tmp9 false

The optimization won't be done if the condition does not have an inverse (strictEqual).

These sequences of instructions may arise when using the break or continue statements:

while true
    ...
    if not_alive
        break
    end
end

Jump Threading

If a jump (conditional or unconditional) targets an unconditional jump, the target of the first jump is redirected to the target of the second jump, repeating until the end of the jump chain is reached. Moreover:

  • on the aggressive level, the end instruction is handled identically to jump 0 always,
  • conditional jumps in the jump chain are followed if:
    • their condition is identical to the condition of the first jump, and
    • the condition arguments do not contain a volatile variable (@time, @tick, @counter etc.),
  • unconditional jumps targeting an indirect jump (a set @counter instruction, internally represented by a virtual goto instruction) are replaced with the indirect jump itself.

No instructions are removed or added, but the execution of the code is faster.
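For example (a minimal sketch with illustrative addresses), when the conditional jump at address 0 targets the unconditional jump at address 4:

    0  jump 4 equal x 0
       ...
    4  jump 7 always 0 0

the first jump is redirected to the final target of the chain:

    0  jump 7 equal x 0
       ...
    4  jump 7 always 0 0

If the jump at address 4 is no longer targeted by any other instruction, it can subsequently be removed by Unreachable Code Elimination.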

Unreachable Code Elimination

This optimizer removes instructions that are unreachable. There are several ways unreachable instructions might appear:

  1. Jump Threading can create unreachable jumps that are no longer targeted.
  2. User-created unreachable regions, such as while false ... end, or code following a while true loop.
  3. User defined functions which are called from an unreachable region.

Instruction removal is done by analyzing the control flow of the program and removing instructions that are never executed. When Jump Normalization is not active, some sections of unreachable code may not be recognized.
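For example, the code following the infinite loop here can never execute (a minimal sketch):

while true
    print("Working")
end
print("Never executed")

The final print instruction is removed from the compiled code:

print "Working"
jump 0 always 0 0
end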

The end instruction, even when not reachable, is not removed unless the optimization level is aggressive. The main program body and function definitions are each terminated by an end instruction, and removing it might make the produced code somewhat less readable.

Stack Optimization

This optimization improves the stack usage by eliminating push/pop instruction pairs determined to be unnecessary. The following optimizations are performed:

  • push/pop instruction elimination for function variables that are not used anywhere apart from the push/pop instructions. This happens when variables are eliminated by other optimizations. The optimization is done globally, in a single pass across the entire program.
  • push/pop instruction elimination for function variables that are read neither by any instruction between the call instruction and the end of the function, nor by any instruction which is part of the same loop as the call instruction. Implicit reads by recursive calls to the same function with the value of the parameter unchanged are also detected (see the sketch after this list).
  • push/pop instruction elimination for function parameters/variables that are never modified within the function.
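A minimal sketch of the second case, showing the virtual push/pop instructions before stack resolution (the names are illustrative): if __fn0_n is neither read after the recursive call returns nor implicitly needed by the recursion itself, the pair

push cell1 __fn0_n
...
pop cell1 __fn0_n

surrounding the recursive call is removed.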

Print Merging

This optimization merges together print instructions with string literal arguments. The print instructions will get merged even if they aren't consecutive, as long as there aren't instructions between them that could break the print sequence (jump, label or print <variable>).

Effectively, this optimization eliminates a print instruction by turning this

println("Items: ", items)
println("Time: " @time)

into this:

print("Items: ", items)
print("\nTime: ", @time "\n")

On aggressive level, all constant values - not just string constants - are merged. For example:

#set optimization = aggressive
const MAX_VALUE = 10
printf("Step $i of $MAX_VALUE\n")

produces

print "Step "
print i
print " of 10\n"

On basic optimization level, the output would be:

print "Step "
print i
print " of "
print 10
print "\n"

String length limit

On the basic level, the optimization won't merge print instructions if the merge would produce a string longer than 34 characters (36 when counting the double quotes). On the aggressive level, such instructions will be merged regardless. This can create long string constants, but according to our tests these can be pasted into Mindustry processors even if they're longer than what the Mindustry GUI allows you to enter.

Remarks

The print merging optimization will also merge print instructions generated by the remark() function. Instructions generated by remarks are merged differently from standard print instructions:

  • print instructions generated from different remark() function calls are never merged. Only instructions generated from a single remark() are merged together.
  • All constant values are merged together regardless of the resulting string length, even on the basic optimization level.

If the print merging optimization is not active, instructions from remark() functions aren't merged.

Optimization for speed

In some cases, it is possible to rearrange the code in such a way that, even though it consists of more instructions, it actually executes faster. An example of this kind of optimization is function call inlining: instead of having a single copy of the function called from various places, the entire code of the function can be replicated at the places it is called from. The speedup stems from the fact that passing arguments to the function, storing the return address and later returning from the function can be avoided this way, while the code size increase is a consequence of having two or more copies of the function instead of just one.

Mindustry processors limit the total number of instructions in a program to one thousand. If the compiled code exceeds this limit, it is unusable, but if the compiled code is smaller than the limit, the unused instruction space brings no benefit at all. The best approach therefore is to use as much of the available instruction space as possible, without exceeding the limit, to generate faster code where possible.

Mindcode provides the goal option to specify whether it should generate code that is smaller (the goal is size) or code that is faster (speed). There's a third option, auto, which is the default for the compiler and currently is identical to the speed option. When the goal is set to size, only optimizations with a zero or negative cost are performed, and no statistics about speed optimizations are displayed.

Another setting affecting the speed optimizations is the instruction-limit option, which allows changing the instruction limit considered by the optimizer.
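For example, assuming the same #set directive syntax used for other options in this document (the values shown are illustrative):

#set goal = speed
#set instruction-limit = 900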

When generating the code, there might be several opportunities for speed optimizations, and applying all of them might exceed the instruction limit. Mindcode uses the following process to choose the optimizations that will be realized:

  • At a certain point in the optimization process, all opportunities for speed optimizations are analyzed, and those that can be realized in the remaining instruction space are gathered.
  • For each such possible optimization, the following quantities are computed:
    • cost: the number of additional instructions that will be created by the optimization,
    • benefit: a measure of execution improvement achieved by the optimization,
    • efficiency: computed by dividing the benefit by the cost, obtaining a measure of program improvement per additional instruction of code.
  • The optimization with the highest efficiency is applied. If several optimizations have the same efficiency, the one with the smallest cost is applied.
  • The entire process is repeated from the start until there are no optimization possibilities left, or none of the remaining optimizations is permissible given the available instruction space.

In short, as many optimizations as possible are applied, in order from the one giving the best return to the one giving the worst. Different strategies for choosing optimizations might be possible (for example, instead of realizing one large optimization, it might be better to apply two smaller ones), but at this point these alternative arrangements aren't considered.

Oftentimes, an applied optimization might open up opportunities for further optimizations (for example, in some cases it might be possible to unroll an inner loop after unrolling the outer loop). Since all remaining optimizations are considered after applying the most efficient one, Mindcode will automatically discover and process these additional optimizations, in the order described above.

It's quite obvious that calculating the benefit is a key part of the optimization evaluation. Avoiding an instruction that is executed a lot provides more benefit than avoiding one that is executed just once (in Mindustry, each instruction takes the same time to execute, which simplifies things greatly). Mindcode therefore assigns each instruction a weight, a measure of how often the instruction is expected to be executed. In general, it is impossible to compute this number precisely. Mindcode uses a very simple algorithm to determine instruction weights:

  • At the beginning of code generation, current weight is established:
    • 1 for the main program,
    • number of places a function is called from for out-of-line or recursive functions.
  • Each generated instruction is assigned the current weight.
  • When entering a branch of an if statement, the current weight is halved. It is restored when exiting the branch. This corresponds to a very simplistic expectation that the condition will be evaluated to true 50% of the time, and therefore a branch in the if statement will be executed only half as often as the surrounding code.
  • When entering a when branch of a case expression, the weight is divided by the number of branches in the expression. The reasoning (and inaccuracy) is analogous to the if statement approximation described above.
  • When entering a loop body, the weight is multiplied by 25. The implicit expectation is that a loop will iterate 25 times on average (which is certainly wrong). Even when the number of iterations is known at compile time, the weight adjustment is the same for all loops. This prevents Mindcode from preferring optimizations of loops with a large, known number of iterations (such as setting all values of a memory bank to 0) over other loops whose number of iterations cannot be known.
  • Weight of stackless and recursive functions is adjusted:
    • Stackless function weights are iteratively recomputed as a total weight of the respective CALL instructions of the given function.
    • Recursive function weights are then computed as a total weight of the respective CALLREC instructions of the given function. No iterative updating is made for recursive functions.
  • When instructions are duplicated or moved during optimizations, their weights are adjusted according to the context into which they were duplicated or moved.

The benefit of an optimization is then computed as the sum of weights of instructions that would be avoided thanks to the optimization. The net result is that Mindcode strongly prefers optimizing code inside loops, and defers optimizations inside the branches of if and case statements.
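A worked example of these rules: an instruction in the main program starts with a weight of 1. Inside a loop body, the weight becomes 1 × 25 = 25; inside an if branch within that loop, it becomes 25 / 2 = 12.5. A speed optimization avoiding one execution of such an instruction therefore has a benefit of 12.5; if it adds five instructions to achieve this, its efficiency is 12.5 / 5 = 2.5.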


« Previous: Compiler directives   |   Next: Function reference for Mindustry Logic 6 »