Skip to content

Latest commit

 

History

History
310 lines (241 loc) · 10.9 KB

Programming-Guide.md

File metadata and controls

310 lines (241 loc) · 10.9 KB
title parent excerpt
Programming Guide
Thorin.md
This guide describes the most important classes and programming concepts in Thorin. Please also read Thorin's Language Reference Manual. This guide assumes that you are already familiar with Thorin.

A First Tutorial

Before getting into detail, let us discuss how to manually construct a Thorin representation of the following program:

fn main(i: int) -> int {
    let a: int;
    if i == 0 {
        a = 23;
    } else {
        a = 42;
    }
    a
}

As outlined in the Thorin [[Language Reference Manual]] this program is represented in Thorin as follows:

main(i: u32, return: fn(u32))
    u1 cond = cmp_eq i, u32 0
    fn() to = select cond, if_then, if_else
    to()

if_then()
    next(23)

if_else()
    next(42)

next(a: u32)
    return(a)

Now let us construct this program with the C++ API:

World world; // serves as container for our program

// create the four functions:
auto main = world.lambda(world.fn_type(
                    {world.type_u32(), world.fn_type({world.type_u32()})}));
auto if_then = world.lambda(); // defaults to 'fn()'
auto if_else = world.lambda(); // defaults to 'fn()'
auto next = world.lambda(world.fn_type({world.type_u32()}));

// create comparison:
auto cmp = world.relop_cmp_eq(main->param(0), world.literal_u32(0));

// wire lambdas
main->branch(cmp, if_then, if_else);
if_then->jump(next, {world.literal_u32(23)});
if_else->jump(next, {world.literal_u32(23)});
next->jump(main->param(1), {next->param(0)});

Here, we computed SSA form manually. However, if we look at the original example it is not directly visible where to add parameters to local functions and so forth. For this reason, AnyDSL offers a built-in SSA construction algorithm. Simply create a unique number which serves as handle for each variable.

World world; // serves as container for our program

// create the four functions:
auto main = world.lambda(world.fn_type(
                    {world.type_u32(), world.fn_type({world.type_u32()})}));
// same as world.lambda() but this time if_then/if_else is "unsealed"
auto if_then = world.basicblock(); 
auto if_else = world.basicblock();
auto next = world.lambda(world.fn_type({world.type_u32()}));

// create comparison:
auto cmp = world.relop_cmp_eq(main->param(0), world.literal_u32(0));

// seal "basic blocks" as soon as possible, 
// i.e., you promise to not add additional predecessors later on
main->branch(cmp, if_then, if_else);
if_then->seal();
if_else->seal();

// We use handle '0' to represent variable 'a' in the original program.
// Now let us create the assignments:
if_then->set_value(0, world.literal_u32(23));
if_else->set_value(0, world.literal_u32(42));

// wire next
if_then->jump(next, {});
if_else->jump(next, {});
next->seal();
next->jump(main->param(1), {next->get_value(0, world.type_u32()}));

Please refer to Braun et al. for details about this algorithm.

Basics

Casts

Use the as method as static cast. An assertion will be raised in the debug version if this cast is not possible during runtime. Moreover, it is checked at compile time whether the target type of the cast is a subtype of the casted type. Thus, prefer the as method over an ordinary cast.

Similarly, The isa method is a more convenient version of a dynamic_cast. It also checks subtyping like as:

// short hand for: const T* t = static_cast <const T*>(foo)
const T* a = foo->as<T>(); 
// short hand for:       T* t = static_cast <      T*>(foo)
      T* b = foo->as<T>(); 
// short hand for: const T* t = dynamic_cast<const T*>(foo)
const T* a = foo->isa<T>(); 
// short hand for:       T* t = dynamic_cast<      T*>(foo)
      T* b = foo->isa<T>(); 

Note that in C++ you can do the following to build a "type-if":

void foo(Def def) {
    if (auto primop = def->isa<PrimOp>()) {
        // do sth with primop
    else if (auto param = def->isa<Param>()) {
        // do sth with param
    } else {
        // must be a Lambda; there are only three kinds of definitions
        const Lambda* lambda = def->as<Lambda>(); 
        // you can also get rid of const in the case of a Lambda (see below)
        Lambda* lambda = def->as_lambda();
        // do sth with lambda
    }
}

Array and ArrayRef

An Array is just a fixed sized array allocated on the heap:

Array<int> a(10); // an array of 10 ints

An ArrayRef points to an Array, std::vector or a C-style array:

// points to 'a', 
// no additional dynamic memory operations (heap operations) are performed
ArrayRef<int> ref(a); 

This is the preferred class when passing a list of things to functions. AnyDSL makes heavy use of this. Remember that ArrayRef already is the reference. So pass this class as value to other functions. References to an ArrayRef are almost always superfluous.

Note that ArrayRef has an constructor for std::initializer_list. Thus, all functions/methods expecting ArrayRefs can also use C++11's brace syntax:

// fn(u32, u32, u64)
auto fn_type = world.fn_type(
                {world.type_u32(), world.type_u32(), world.type_u64()}); 

Ranged-based for

We highly encourage to use C++11's range-based for loop whenever possible. Note that the aforementioned Array and ArrayRef classes work just fine with range-based for.

Definitions

As outlined in the Thorin [[Language Reference Manual]] there exists three possibilities to define values:

  1. primops,
  2. lambdas and
  3. their parameters.

Each definition inherits from the Def class. A Primop class inheriting from Def is the base class for all primops. The Lambda class (also inheriting from Def) represents a function. Once created, a Lambda instance automatically creates its parameters (represented by the Param class which is the third and last class inheriting from Def):

World w;
// create a lambda of type fn(u32, f32)
auto lambda = w.lambda(w.fn_type({w.type_u32(), w.type_f32());
auto uparam = lambda->param(0);
auto fparam = lambda->param(1);

We can later on add additional parameters if required. Pay attention that any potential users of this lambda are likely to become broken. Thus, only do this when constructing a function but never afterwards:

lambda->append_param(w.type_u1)); // lambda is now of type fn(u32, f32, u1)
auto bparam = lambda->param(2);

Note that lambdas are always mutable. You can always modify their contents. In particular, you would like to fill a lambda by specifying a body. You do this by letting this function jump to a target:

// creates a jump: some_function(uparam, fparam)
lambda->jump(some_function, {uparam, fparam}));

An assertion is raised if a type error occurs, i.e., some_function's type must be fn(u32, f32). You can always update the body to your liking:

lambda->jump(some_other_function, {a, b, c});

Do not worry about leaking memory. This is all handled by Thorin. Any definitions which become dead and/or unreachable will be deleted after invoking cleanup_world:

// get rid of dead and unreachable code; also unused types will be destroyed.
cleanup_world(w);

The World

The most important class in AnyDSL is the World class. This class serves three main purposes:

  • It is a factory class to construct functions, primops and types. The constructor of the actual classes are private. Always use World's factory methods to retrieve an instance of your node.
  • When constructing primops several local optimizations are performed (see below).
  • The whole program you want to construct lives in a World instance. You can create several World instances. But these are completely independent of each other. If you mix things across world instances your program is broken.

The C++ const qualifier and mutating objects

Once created, primops, types and parameters are immutable. They are internally stored in the world; parameters are handled by their Lambda. For this reason you will always get const pointers to parameters and primops.

Mutating Lambdas

However as outlined above, lambdas are mutable. To be on the safe side you will always get immutable objects. If this object is in fact a Lambda you want to get rid of the const qualifier as lambdas are mutable. Use the as_lambda and isa_lambda for this purpose:

void foo(Def def) {
    if (auto lambda = def->isa_lambda()) { // dynamic_cast
        // you may now mutate 'lambda'
    }
    //...
    // static_cast (checked in the debug version)
    auto lambda = def->as_lambda(); 
    // you may now mutate 'lambda'
}

Mutating Primops and Parameters

Primops and Parameters cannot really be mutated. Instead, create a new thing and replace the old node with the new one:

const PrimOp* old_primop = ...;
const PrimOp* new_primop = ...;
old_primop->replace(new_primop);

From now on, old_primop still lives in the World but it is dead. Unless you create new users for old_primop it will be wiped out as dead code after the next invocation of World::cleanup(). Maybe you want to "update" the second operand of some primop. You can do this by "rebuilding" the primop in question:

Def add = ...;
Array<Def> new_ops = add.ops();
new_ops[1] = world.literal_u32(42);
Def updated = world.rebuild(add, new_ops);
add->replace(update);

Primops and Local Optimizations

When ever you construct a new primop, local optimizations are performed in order to simplify the program under construction. For this reason, World's factory methods always return Def:

auto def = world.arithop_add(world.literal_u32(1), world.literal_u32(2));
// def is NOT an ArithOp; it is a PrimLit of value 3

Thus, never assume anything about def. And there really is no reason to do this. It may even be the case that due to the built-in common subexpression elimination you get a pointer to a definition you have already built:

auto def1 = world.arithop_add(a, b);
// ...
auto def2 = world.arithop_add(a, b);
// def1 == def2 due to the built-in CSE