Commit

Use c3 linearization in the MOP (#1033)
Runtime improvements

MOP:
- Use c3 linearization for class precedence list.
- Implement and use the C3 linearization algorithm for multiple inheritance.
- Unify struct fields and class slots, using struct inheritance then class precedence list for slot order. This improves on previous semantics, but introduces a slight incompatibility and will require updating the optimizer.
- Accordingly update various modules in std.

Also:
- Define defmutable in prelude.
- Add a (dump-backtrace?) parameter to control backtrace printing.
- Support build-manifest in the runtime.
- Move DBG macro in the runtime for past and future debugging.

Regenerate bootstrap.
fare authored Jan 30, 2024
1 parent a24ed4e commit c47bc90
Showing 107 changed files with 35,465 additions and 32,436 deletions.
5 changes: 3 additions & 2 deletions doc/reference/README.md
@@ -3,10 +3,11 @@
This is the reference documentation for Gerbil. We aim to
exhaustively document the Scheme primitives, the Gambit primitives, the Gerbil [core prelude](gerbil/prelude/README.md), [runtime builtins](gerbil/runtime/README.md), the
[standard library](std/README.md), and the Meta-[expander context](gerbil/expander/README.md).
We also have a [guide for developers of Gerbil](dev/README.md).

The idea of extensive and easy to use documentation is at our forefront. This is still a WIP and there's more to come. If you need certain things now see the [R5RS](https://schemers.org/Documents/Standards/R5RS/HTML/) document for basic Scheme primitives used as a part of our prelude along with the [Gambit Manual](https://www.iro.umontreal.ca/~gambit/doc/gambit.html) for our underlying implementation internals.

If you're viewing this as a webpage online almost every page has a link whereby you can edit and request a commit. Even if it's just pointing out the issue every part helps.

When information is missing, out of date, or unclear, it's a bug! If you cannot edit try to contact us (by email, on Gitter or GitHub, etc.) and we'll get it done.

218 changes: 187 additions & 31 deletions doc/reference/dev/bootstrap.md
@@ -1,36 +1,139 @@
# The Gerbil Bootstrap

Gerbil is fully self-hosted, 100% written in itself. So how does it
all fit together?
Gerbil is fully self-hosted, with both its compiler and runtime 100% written in itself.
So how does it all fit together?

## Overview of Bootstrapping in General

At all times, a bootstrapped language keeps alongside the source code
of its implementation (runtime and compiler), written in the language itself,
a precompiled "bootstrap" version of its implementation,
sufficient to build the current version.
This "bootstrap" version is either source code in a *host* language,
or object code for a host environment or collection of host environments
(typically, either a single bytecode binary for a VM implemented in C, or
a series of executables for each of many supported platforms,
e.g. each of Linux, Windows, macOS on each of x86-64, aarch64, etc.).

When building the bootstrapped language,
we start from the host language or environment,
use it to build and run the bootstrap version of the implementation,
with which we build the current version of the implementation.
That current version can at times be blessed as a new bootstrap implementation.

A bootstrapped implementation is in contrast with a *cross-implementation*:
an implementation _manually_ written in a *host* language
that differs from the target language being implemented.
Confusingly, the host language is often called
the meta-language when talking about a compiler, or
the base language when talking about a runtime or an interpreter,
even though meta- and base- are used as mutual opposites in such context.

Every bootstrapped implementation started with a regular cross-implementation
as its bootstrap implementation, though that cross-implementation may have
long since been superseded by many subsequent compiled versions where
the manually written host code was replaced by code automatically generated
from source code in the bootstrapped language.

In the case of Gerbil, we use Gambit Scheme as a host language, and
we keep a precompiled bootstrap implementation in the directory
`GERBIL_SRCDIR/src/bootstrap/`.

## Pros and Cons of Bootstrapping

### Pros of Bootstrapping

There are many advantages to bootstrapping the implementation of
a programming language:

- You can use all the features of your language while developing it,
instead of being stuck with another language, necessarily unsatisfying
in enough ways that you're developing a different one.

- You don't have to rely on other people with respect to
the implementation language: bugs, tooling, standards, compatibility,
release cycles, etc. You can do it all yourself!

- If there is a language or compiler feature you wish you had
while developing your implementation, you can first implement it
then later use it to write further features or rewrite existing ones.

- You don't have to cultivate and simultaneously hold in your head
two different languages, their semantics, pragmatics, libraries, and
colloquial styles as you develop your implementation.

### Cons of Bootstrapping

There are also a few disadvantages to bootstrapping the implementation of
a programming language:

- Once you bootstrap, you forfeit any advanced feature or tooling of
your previous cross-implementation's host language that
you haven't yet reimplemented in your own language.

- You can't rely on other people with respect to the implementation language:
bugs, tooling, standards, compatibility, release cycles, etc.
You must do it all yourself!

- You must follow strict constraints (detailed below for Gerbil)
to ensure that at any time you have a working bootstrap implementation
capable of running all programs in your language
including your current implementation.

- In particular, when making changes to the implementation, you cannot make
incompatible changes to any feature used by the implementation itself:
renaming a function, deleting anything, modifying some encoding, etc.
Changes must be introduced in several steps, each generation maintaining
compatibility with both the immediate previous and next generations.

- The semantics of your language, the meaning of programs written in it,
become more difficult to assess by either humans or automated analyses,
each time you regenerate the bootstrap implementation.

- In some rare but egregious cases, unintended bugs introduced in one version
of the code can cause problems after several "generations" of bootstrapping
especially in the case of insufficient regression testing,
causing a lot of confusion and ultimately requiring developers
to "go back in time" and rerun a potentially long chain
of bootstrap versions each suitably fixed.

- In extreme cases, malicious behavior can be deliberately hidden
in the bootstrap implementation without visible trace in the source code.
See Ken Thompson's famous 1984 Turing Award Lecture
["Reflections on Trusting Trust"](http://genius.cat-v.org/ken-thompson/texts/trusting-trust/).

## The Chain of Trust

The key premise of Gerbil is simple: it is a meta-language, a
meta-dialect of Scheme that bootstraps from precompiled sources using
Gambit.

This has implications for the chain of trust regarding the security of
your software:
- There is _no precompiled binary involved_, nor will there ever be
one. The bootstrap is purely source based, which means you can
actually read and audit the bootstrap sources. It is not pretty, but
it is readable; so if you feel so inclined I recommend taking a look
at `GERBIL_SRCDIR/src/bootstrap`.
- Just like Gerbil bootstraps from precompiled Gambit code, Gambit
bootstraps from precompiled C code. This is also auditable, albeit
not an easy read.
- The bootstrap chain is anchored on the C compiler. Ultimately, if
you trust your C compiler, then you can _verifiably_ trust the
Gerbil bootstrap.

For the Gerbil core team, where we all use GCC, this can be
summarized in a quotable one liner:
The last point on "trusting trust" has implications, whereby
the security of your software against "supply chain attacks"
depends on a chain of trust that includes your host environment
and every part of your language implementation.
Gerbil's choice of bootstrapping with Gambit Scheme as a host language
has several implications:

- There is _no precompiled binary involved_, nor will there ever be one.
The bootstrap is purely source based, which means you can
actually read and audit the bootstrap sources in Gambit Scheme.
That code is not pretty, yet remains readable and amenable to audit;
if you feel so inclined you may take a look at
`GERBIL_SRCDIR/src/bootstrap`.

- Just like Gerbil bootstraps from precompiled Gambit code,
Gambit bootstraps from precompiled C code.
This is also auditable, albeit not an easy read.

- The bootstrap chain is anchored on the C compiler.
Ultimately, if you trust your C compiler,
then you can _verifiably_ trust the Gerbil bootstrap.

For the Gerbil core team, where we all use GCC,
this can be summarized in a quotable one liner:

> In GNU we trust; everyone else pays cash.
## The Long and Arduous History of Bootstrap

The first version of Gerbil, let's call that the proto-Gerbil, was
bootstrapped by vyzo a long time ago using a hand-written unhygienic
interpreter for the core language. Once that was done, vyzo wrote the
expander and the first version of the compiler, then the expander
@@ -41,10 +144,9 @@ Initially, the runtime was written in Gambit with a set of macros;
that was called `gx-gambc`. In the v0.18 release cycle, where Gerbil
became fully self hosted, all the traces have disappeared from the
source tree, as they are dead code. They still exist in the
repo's commit history if you want to do some historical research and
peek into the deep past to understand the evolution of Gerbil.


## How Gerbil Builds Itself

The build process can be summarized in the following steps:
@@ -59,7 +161,6 @@ The build process can be summarized in the following steps:
5. the Gerbil core system and universal binary is compiled using the bootstrap compiler with `boot-gxi` (stage 1).
6. the newly compiled `gerbil` binary compiles the rest of the system.


## Practical Matters

### Recompiling the Bootstrap
@@ -71,7 +172,7 @@ This can be accomplished with the following incantations in `$GERBIL_SRCDIR/src`

- To compile the bootstrap runtime:
```
gxc -d bootstrap -s -S -O gerbil/runtime/{gambit,system,util,loader,control,mop,error,thread,syntax,eval,repl,init}.ss gerbil/runtime.ss
gxc -d bootstrap -s -S -O gerbil/runtime/{gambit,util,system,loader,control,c3,mop,error,thread,syntax,eval,repl,init}.ss gerbil/runtime.ss
```

- To compile the bootstrap core prelude:
@@ -91,14 +192,69 @@ gxc -d bootstrap -s -S -O gerbil/expander/{common,stx,core,top,module,compile,ro

- To compile the bootstrap compiler:
```
gxc -d bootstrap -s -S -O gerbil/compiler/{base,compile,optimize-base,optimize-xform,optimize-top,optimize-spec,optimize-ann,optimize-call,optimize,driver,ssxi}.ss gerbil/compiler.ss
```

- Finally, if you've made changes to it, you should also copy the core.ssxi.ss optimizer prelude:
```
cp gerbil/prelude/core.ssxi.ss bootstrap/gerbil
```

### Strictures on Modifying Parts of the Gerbil Bootstrap

***Every change to the Gerbil Bootstrap
must be API-compatible from one version to the next***:
both the old and new versions of Gerbil
(before and after recompiling the bootstrap) must be able to use them.

You *can* make API-incompatible changes from one version to another version,
but this must necessarily involve *several steps*
each of which will be API-compatible:

- First, you cannot make any backward-incompatible API change, such as
changing the calling convention of a function or macro, e.g.
so that callers must pass a symbol instead of a string,
or a 1-based index instead of a 0-based index, etc.
- You *could* modify a function to temporarily accept either a symbol or string
and do a conversion inside; but you obviously cannot determine whether
a user-provided index should be interpreted as 1-based or 0-based.
- The solution is to create a *new* API with *new* names that
must absolutely not clash with the old names.
Add a suffix or prefix such as `*`, `/2` or `%`, or take the opportunity
to give functions better and/or more systematic names.
- The *old* API will temporarily coexist with the *new* API.
- When shared data structures are involved, the *old* API
as called by the previous bootstrap implementation may have to be
reimplemented in terms of the *new* API used by the next generation.
- The internal representations used by the new API may thus have to include
extra information needed by the old API that it doesn’t need,
or the new API may have to maintain two redundant representations together,
until after the old API is removed. This extra information
or redundant representation can be removed in a later phase.
- You can use the old bootstrap implementation to generate a next version
of the bootstrap implementation that uses the new API,
while the old API remains available to the old version.
- In one or many iterations, you can make sure the old API is not used anywhere
anymore in Gerbil and its libraries.
- Only after you have bootstrapped a version of Gerbil that does not use
the old API at all may you wholly remove that old API:
this is now a backward-compatible change.
- If for some reason you really like the old name or hate the new name,
and “just” want to make an incompatible API change,
the name is made available anew after the old API was wholly removed
and a version that doesn’t use it has been bootstrapped into existence.
You may therefore start a new cycle of API changes as above to modify the API
to use this now-available-again name.
- As a cultural requirement meant to facilitate semantic analysis and
a well-founded reproducible and debuggable bootstrapping history,
we ask you to commit a separate PR for each phase of such an API change,
such that each committed version of Gerbil can be compiled
by the immediate previous one (but usually not by arbitrary older ones,
which would be overconstraining and prevent refactoring and progress).

These strictures mean that you must stage your changes in multiple commits,
and regenerate the bootstrap compiler at each step.
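
As an illustration of the first phase described above, here is a minimal sketch in portable Scheme. The names `item-ref` (old, 0-based) and `item-ref/1` (new, 1-based) are hypothetical and do not exist in Gerbil; they only mirror the index example from the list. The new API gets a fresh name, and the old API is kept as a thin shim over it so that both the previous and the next bootstrap generation can call what they expect.

```scheme
;; Phase 1 (hypothetical names): add the new API under a fresh name.
;; `item-ref/1` takes a 1-based index.
(define (item-ref/1 lst i)
  (if (< i 1)
      (error "index must be at least 1" i)
      (list-ref lst (- i 1))))

;; The old API keeps its name and its 0-based contract, but is now
;; reimplemented in terms of the new API.
(define (item-ref lst i)
  (item-ref/1 lst (+ i 1)))

;; Both generations get the same answer through their own API.
(item-ref   '(a b c) 0)   ; => a
(item-ref/1 '(a b c) 1)   ; => a

;; Later phases (separate PRs): migrate every caller to `item-ref/1`,
;; regenerate the bootstrap, and only then delete `item-ref`.
```

Keeping the shim trivial makes it easy to check that the old contract is preserved until the old name is finally removed.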

### Debugging

If you have been making changes in the core system and building a new
@@ -124,7 +280,7 @@ will and supports several commands:
easily navigate code in emacs.
- `env` applies the arguments in the build environment.

So if you have made changes and want to rebuild gerbil, you don't have
to redo everything from scratch with `make`; you can simply build the
stage you want, and once you are satisfied you can move to the next
stage or push your branch so that CI does the job for you.
@@ -160,4 +316,4 @@ $ ./build.sh env gerbil test ./...
...
```

And that's it! Happy Hacking.
21 changes: 11 additions & 10 deletions doc/reference/gerbil/runtime/MOP.md
@@ -150,7 +150,7 @@ Same as `(direct-instance? klass obj)`.
super := type-descriptor or #f; the struct type to inherit from
fields := fixnum; number of (new) fields in the type
name := symbol; the (displayed) type name
plist := alist; type properties
properties := alist; type properties
ctor := symbol or #f; id of constructor method
field-names := list of symbols or #f; (displayed) field names
@@ -305,20 +305,21 @@ Converts *obj* to a list, which conses its type and to its fields.

## make-class-type
``` scheme
(make-class-type id super slots name plist ctor) -> type-descriptor
(make-class-type id name direct-supers direct-slots properties constructor) -> type-descriptor
id := symbol; the type id
super := list of type-descriptors or #f; super types
slots := list of symbols; class slot names
plist := alist; type properties
ctor := symbol or #f; id of constructor method
id := symbol; the unique type id
name := symbol; the possibly not unique source type name used when displaying the class
direct-supers := list of type-descriptors or #f; super types
direct-slots := list of symbols; class slot names
properties := alist; type properties (NB: not a plist)
constructor := symbol or #f; id of constructor method
plist elements:
alist elements:
(transparent: . boolean) ; controls whether the object is transparent
in equality and printing
(final: . boolean) ; controls whether the class is final
(print: slot ...) ; printable slots
(equal: slot ...) ; equality comparable slots
(print: slot ...) ; list of printable slots, or boolean
(equal: slot ...) ; list of equality comparable slots, or boolean
```

Creates a new class type descriptor.
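
Per the commit message, the class precedence list is now computed with C3 linearization. The following is a minimal, self-contained sketch of that algorithm in portable Scheme, not the code in the Gerbil runtime: the `hierarchy` table, `direct-supers`, `c3-merge` and `c3-linearize` are all hypothetical names chosen for illustration, and classes are represented as plain symbols rather than type descriptors.

```scheme
;; Hypothetical hierarchy table: (class direct-super ...).
(define hierarchy
  '((O) (A O) (B O) (C O) (D O) (E O)
    (K1 A B C) (K2 D B E) (K3 D A) (Z K1 K2 K3)))

(define (direct-supers class)
  (let ((entry (assq class hierarchy)))
    (if entry (cdr entry) (error "unknown class" class))))

;; Keep the elements of lst that satisfy pred.
(define (keep pred lst)
  (cond ((null? lst) '())
        ((pred (car lst)) (cons (car lst) (keep pred (cdr lst))))
        (else (keep pred (cdr lst)))))

;; True if x occurs after the head of at least one sequence.
(define (in-some-tail? x seqs)
  (cond ((null? seqs) #f)
        ((memq x (cdr (car seqs))) #t)
        (else (in-some-tail? x (cdr seqs)))))

;; First head that appears in no tail, or #f if there is none.
(define (next-candidate seqs)
  (let loop ((rest seqs))
    (cond ((null? rest) #f)
          ((in-some-tail? (car (car rest)) seqs) (loop (cdr rest)))
          (else (car (car rest))))))

;; C3 merge: repeatedly emit the next valid head and drop it everywhere.
(define (c3-merge seqs)
  (let ((seqs (keep pair? seqs)))       ; discard exhausted sequences
    (if (null? seqs)
        '()
        (let ((head (next-candidate seqs)))
          (if (not head)
              (error "inconsistent class hierarchy" seqs)
              (cons head
                    (c3-merge
                     (map (lambda (s) (if (eq? (car s) head) (cdr s) s))
                          seqs))))))))

;; Linearization = class, then the merge of the supers' linearizations
;; followed by the list of direct supers itself.
(define (c3-linearize class)
  (let ((supers (direct-supers class)))
    (cons class
          (c3-merge (append (map c3-linearize supers)
                            (list supers))))))

(c3-linearize 'K1)  ; => (K1 A B C O)
(c3-linearize 'Z)   ; => (Z K1 K2 K3 D A B C E O)
```

The merge step is what distinguishes C3 from a plain depth-first walk: a class is emitted only once no remaining sequence still lists it behind another class, which preserves the local order of each `direct-supers` list.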
7 changes: 5 additions & 2 deletions doc/reference/std/debug.md
@@ -399,6 +399,9 @@ Returns true if the `thread`'s message queue is empty.
If the `tag` doesn't evaluate to `#f`, print the tag, then on separate lines
the source of each expression `expr1` to `exprN` (as by `write`)
followed by its single or multiple return values (as by `prn`).
When an expression is preceded by a quoted form (as in `'form`)
then that form is printed instead of the following expression
(which can help when the expression is large and uninformative to print).
Finally, return the values of the last expression `exprN`.

You can easily wrap an expression in a `DBG` form so as to print its value,
@@ -409,11 +412,11 @@ in some part of your code.
Example:
```scheme
> (define-values (x y z) (values 1 2 3))
> (* 10 (DBG foo: x (values [(+ x y) z] #t) (+ x y z)))
> (* 10 (DBG foo: x (values [(+ x y) z] #t) 'result (+ x y z)))
foo
x => 1
(values (@list (+ x y) z) #t) => [3 3] #t
(+ x y z) => 6
result => 6
60
```
In the above example the tag `foo` and the indented lines are printed by `DBG`,
29 changes: 28 additions & 1 deletion doc/reference/std/errors.md
@@ -426,7 +426,30 @@ displaying the exception with `display-exception`).
```

Invokes `thunk` with an exception handler that dumps the exception
stack trace with `dump-stack-trace!`.
stack trace with `dump-stack-trace!`
if `(dump-stack-trace?)` is true (the default).

### dump-stack-trace?
```scheme
(define dump-stack-trace? (make-parameter #t))
```
A parameter that controls whether a stack trace will be dumped
when an exception is presented to the user.

This parameter is notably used by the `display-exception` method for
the `Error` type in the runtime, and by `with-exception-stack-trace`
(see above) that is also used in the default exception handler installed
in threads spawned with `spawn-actor`.
It is also heeded by `exit-with-error` (see below).

You can `(dump-stack-trace? #f)`
or locally `(parameterize ((dump-stack-trace? #f)) ...)`
to disable this stack trace dump,
in case you are building a program for end-users rather than for developers,
and want to control what limited error output they see.
Or you can re-enable it based on a debug flag at the CLI
in case you want users to provide you with extra debugging information,
or log bug reports directly to your servers, etc.
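
The difference between setting a parameter globally and rebinding it locally can be sketched with a stand-in parameter. In this sketch `verbose-errors?` and `report-error` are hypothetical names that only mimic how a reporting function might consult `dump-stack-trace?`; only `make-parameter` and `parameterize` are standard.

```scheme
;; Stand-in for `dump-stack-trace?` (hypothetical name), defaulting to #t.
(define verbose-errors? (make-parameter #t))

;; Hypothetical reporter: consults the parameter each time it is called.
(define (report-error msg)
  (if (verbose-errors?)
      (string-append msg " [stack trace would follow]")
      msg))

(report-error "boom")                 ; => "boom [stack trace would follow]"

;; Local rebinding: only code running inside the body sees #f.
(parameterize ((verbose-errors? #f))
  (report-error "boom"))              ; => "boom"

;; Outside the body the previous value is back in effect.
(report-error "boom")                 ; => "boom [stack trace would follow]"
```

The `parameterize` form is usually the safer choice in library code, since the previous value is restored automatically when the body exits.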

### dump-stack-trace!
```scheme
@@ -451,6 +474,10 @@ This parameter controls whether `call-with-exit-on-error`, `with-exit-on-error`,
will exit if an error is caught, rather than pass on the error
and return to the REPL (or let a more fundamental function exit).

If `dump-stack-trace?` is true, the exception is printed
a second time with `dump-stack-trace?` bound to false, so that
the text of the exception appears again after the stack dump.

### call-with-exit-on-error
```scheme
(call-with-exit-on-error thunk)