Use c3 linearization in the MOP (#1033)

Runtime improvements MOP:
- Use c3 linearization for class precedence list.
- Implement and use the C3 linearization algorithm for multiple inheritance.
- Unify struct fields and class slots, using struct inheritance then class precedence list for slot order. This improves on previous semantics, but introduces a slight incompatibility and will require updating the optimizer.
- Accordingly update various modules in std.

Also:
- Define defmutable in prelude.
- Add a (dump-backtrace?) parameter to control backtrace printing.
- Support build-manifest in the runtime.
- Move DBG macro in the runtime for past and future debugging.

Regenerate bootstrap.
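
For readers unfamiliar with the C3 linearization named in the commit message, here is a minimal sketch of the algorithm in portable Scheme. It is an illustration only and is not the code added by this commit: the `hierarchy` table, the class names, and the helpers `direct-supers`, `in-some-tail?`, `remove-empty`, `c3-merge` and `c3-linearize` are all hypothetical names chosen for this example.

```scheme
;; Hypothetical hierarchy: each entry is (class direct-super ...).
(define hierarchy
  '((O)
    (A O) (B O) (C O) (D O) (E O)
    (K1 A B C)
    (K2 D B E)
    (K3 D A)
    (Z K1 K2 K3)))

(define (direct-supers class)
  (cdr (assq class hierarchy)))

;; True if x occurs after the head of any of the lists.
(define (in-some-tail? x lists)
  (let loop ((ls lists))
    (cond ((null? ls) #f)
          ((and (pair? (car ls)) (memq x (cdar ls))) #t)
          (else (loop (cdr ls))))))

;; Drop the lists that have been fully consumed.
(define (remove-empty lists)
  (let loop ((ls lists) (acc '()))
    (cond ((null? ls) (reverse acc))
          ((null? (car ls)) (loop (cdr ls) acc))
          (else (loop (cdr ls) (cons (car ls) acc))))))

;; C3 merge: repeatedly pick the first list head that appears
;; in no tail, emit it, and remove it from the front of every list.
(define (c3-merge lists)
  (let ((lists (remove-empty lists)))
    (if (null? lists)
        '()
        (let scan ((candidates lists))
          (cond ((null? candidates)
                 (error "inconsistent precedence graph" lists))
                ((in-some-tail? (caar candidates) lists)
                 (scan (cdr candidates)))
                (else
                 (let ((next (caar candidates)))
                   (cons next
                         (c3-merge
                          (map (lambda (l)
                                 (if (eq? (car l) next) (cdr l) l))
                               lists))))))))))

;; Precedence list = the class itself, then the merge of the
;; linearizations of its direct supers and the list of direct supers.
(define (c3-linearize class)
  (let ((supers (direct-supers class)))
    (cons class
          (c3-merge (append (map c3-linearize supers)
                            (list supers))))))

(c3-linearize 'Z) ; => (Z K1 K2 K3 D A B C E O)
```

The merge keeps each class ahead of its own supers and preserves the order in which direct supers are listed, which is why `D` comes before `A` in the result even though `K1` lists `A` first.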
mighty-gerbils · Jan 30, 2024 · c47bc90
1 parent a24ed4e
Showing 107 changed files with 35,465 additions and 32,436 deletions.
diff --git a/doc/reference/README.md b/doc/reference/README.md
@@ -3,10 +3,11 @@
 This is the reference documentation for Gerbil.  We aim to
 exhaustively document the Scheme primitives, the Gambit primitives, the Gerbil [core prelude](gerbil/prelude/README.md), [runtime builtins](gerbil/runtime/README.md), the
 [standard library](std/README.md), and the Meta-[expander context](gerbil/expander/README.md).
+We also have a [guide for developers of Gerbil](dev/README.md).
 
-The idea of extensive and easy to use documentation is at our forefront. This is still a WIP and there's more to come. If you need certain things now see the [R5RS](https://schemers.org/Documents/Standards/R5RS/HTML/) document for basic Scheme primitives used as a part of our prelude along with the [Gambit Manual](https://www.iro.umontreal.ca/~gambit/doc/gambit.html) for our underlying implementation internals. 
+The idea of extensive and easy to use documentation is at our forefront. This is still a WIP and there's more to come. If you need certain things now see the [R5RS](https://schemers.org/Documents/Standards/R5RS/HTML/) document for basic Scheme primitives used as a part of our prelude along with the [Gambit Manual](https://www.iro.umontreal.ca/~gambit/doc/gambit.html) for our underlying implementation internals.
 
 If you're viewing this as a webpage online almost every page has a link whereby you can edit and request a commit. Even if it's just pointing out the issue every part helps.
 
-When information is missing, out of date, or unclear, it's a bug! If you cannot edit try to contact us (by email, on Gitter or GitHub, etc.) and we'll get it done. 
+When information is missing, out of date, or unclear, it's a bug! If you cannot edit try to contact us (by email, on Gitter or GitHub, etc.) and we'll get it done.
 
diff --git a/doc/reference/dev/bootstrap.md b/doc/reference/dev/bootstrap.md
@@ -1,36 +1,139 @@
 # The Gerbil Bootstrap
 
-Gerbil is fully self-hosted, 100% written in itself. So how does it
-all fit together?
+Gerbil is fully self-hosted, with both its compiler and runtime 100% written in itself.
+So how does it all fit together?
+
+## Overview of Bootstrapping in General
+
+At all times, a bootstrapped language keeps alongside the source code
+of its implementation (runtime and compiler), written in the language itself,
+a precompiled "bootstrap" version of its implementation,
+sufficient to build the current version.
+This "bootstrap" version is either source code in a *host* language,
+or object code for a host environment or collection of host environments
+(typically, either a single bytecode binary for a VM implemented in C, or
+a series of executables for each of many supported platforms,
+e.g. each of Linux, Windows, macOS on each of x86-64, aa64, etc.).
+
+When building the bootstrapped language,
+we start from the host language or environment,
+use it to build and run the bootstrap version of the implementation,
+with which we build the current version of the implementation.
+That current version can at times be blessed as a new bootstrap implementation.
+
+A bootstrapped implementation is in contrast with a *cross-implementation*:
+an implementation _manually_ written in a *host* language
+that differs from the target language being implemented.
+Confusingly, the host language is often called
+the meta-language when talking about a compiler, or
+the base language when talking about a runtime or an interpreter,
+even though meta- and base- are used as mutual opposites in such context.
+
+Every bootstrapped implementation started with a regular cross-implementation
+as its bootstrap implementation, though that cross-implementation may have
+long since been superseded by many subsequent compiled versions where
+the manually written host code was replaced by code automatically generated
+from source code in the bootstrapped language.
+
+In the case of Gerbil, we use Gambit Scheme as a host language, and
+we keep a precompiled bootstrap implementation in the directory
+`GERBIL_SRCDIR/src/bootstrap/`.
+
+## Pros and Cons of Bootstrapping
+
+### Pros of Bootstrapping
+
+There are many advantages to bootstrapping the implementation of
+a programming language:
+
+  - You can use all the features of your language while developing it,
+    instead of being stuck with another language, necessarily unsatisfying
+    in enough ways that you're developing a different one.
+
+  - You don't have to rely on other people with respect to
+    the implementation language: bugs, tooling, standards, compatibility,
+    release cycles, etc. You can do it all yourself!
+
+  - If there is a language or compiler feature you wish you had
+    while developing your implementation, you can first implement it
+    then later use it to write further features or rewrite existing ones.
+
+  - You don't have to cultivate and simultaneously hold in your head
+    two different languages, their semantics, pragmatics, libraries, and
+    colloquial styles as you develop your implementation.
+
+### Cons of Bootstrapping
+
+There are also a few disadvantages to bootstrapping the implementation of
+a programming language:
+
+  - Once you bootstrap, you forfeit any advanced feature or tooling of
+    your previous cross-implementation's host language that
+    you haven't yet reimplemented in your own language.
+
+  - You can't rely on other people with respect to the implementation language:
+    bugs, tooling, standards, compatibility, release cycles, etc.
+    You must do it all yourself!
+
+  - You must follow strict constraints (detailed below for Gerbil)
+    to ensure that at any time you have a working bootstrap implementation
+    capable of running all programs in your language
+    including your current implementation.
+
+  - In particular, when making changes to the implementation, you cannot make
+    incompatible changes to any feature used by the implementation itself:
+    renaming a function, deleting anything, modifying some encoding, etc.
+    Changes must be introduced in several steps, each generation maintaining
+    compatibility with both the immediate previous and next generations.
+
+  - The semantics of your language, the meaning of programs written in it,
+    become more difficult to assess by either humans or automated analyses,
+    each time you regenerate the bootstrap implementation.
+
+  - In some rare but egregious cases, unintended bugs introduced in one version
+    of the code can cause problems after several "generations" of bootstrapping
+    especially in the case of insufficient regression testing,
+    causing a lot of confusion and ultimately requiring developers
+    to "go back in time" and run again a potentially long chain
+    of bootstrap versions each suitably fixed.
+
+  - In extreme cases, malicious behavior can be deliberately hidden
+    in the bootstrap implementation without visible trace in the source code.
+    See Ken Thompson's famous 1984 Turing Award Lecture
+    ["Reflections on Trusting Trust"](http://genius.cat-v.org/ken-thompson/texts/trusting-trust/).
 
 ## The Chain of Trust
 
-The key premise of Gerbil is simple: it is a meta-language, a
-meta-dialect of Scheme that bootstraps from precompiled sources using
-Gambit.
-
-This has implications for the chain of trust regarding the security of
-your software:
-- There is _no precompiled binary involved_, nor will there ever be
-  one. The bootstrap is purely source based, which means you can
-  actually read and audit the bootstrap sources. It is not pretty, but
-  it is readable; so if you feel so inclined I recommend taking a look
-  at `GERBIL_SRCDIR/src/bootstrap`.
-- Just like Gerbil bootstraps from precompiled Gambit code, Gambit
-  bootstraps from precompiled C code. This is also auditable, albeit
-  not an easy read.
-- The bootstrap chain is anchored on the C compiler. Ultimately, If
-  you trust your C compiler, then you can _verifiably_ trust the
-  Gerbil bootstrap.
-
-For the Gerbil core team, where we all use GCC, this can be
-summarized in a quotable one liner:
+The last point on "trusting trust" has implications, whereby
+the security of your software against "supply chain attacks"
+depends on a chain of trust that includes your host environment
+and every part of your language implementation.
+Gerbil's choice of bootstrapping with Gambit Scheme as a host language
+has several implications:
+
+  - There is _no precompiled binary involved_, nor will there ever be one.
+    The bootstrap is purely source based, which means you can
+    actually read and audit the bootstrap sources in Gambit Scheme.
+    That code is not pretty, yet remains readable and amenable to audit;
+    if you feel so inclined you may take a look at
+    `GERBIL_SRCDIR/src/bootstrap`.
+
+  - Just like Gerbil bootstraps from precompiled Gambit code,
+    Gambit bootstraps from precompiled C code.
+    This is also auditable, albeit not an easy read.
+
+  - The bootstrap chain is anchored on the C compiler.
+    Ultimately, if you trust your C compiler,
+    then you can _verifiably_ trust the Gerbil bootstrap.
+
+For the Gerbil core team, where we all use GCC,
+this can be summarized in a quotable one liner:
 
 > In GNU we trust; everyone else pays cash.
 
 ## The Long and Arduous History of Bootstrap
 
-The first version of the Gerbil, let's call that the proto-Gerbil, was
+The first version of the Gerbil, let’s call that the proto-Gerbil, was
 bootstrapped by vyzo a long time ago using a hand-written unhygienic
 interpreter for the core language.  Once that was done, vyzo wrote the
 expander and the first version of the compiler, then the expander
@@ -41,10 +144,9 @@ Initially, the runtime was written in Gambit with a set of macros;
 that was called `gx-gambc`.  In the v0.18 release cycle, where Gerbil
 became fully self hosted, all the traces have disappeared from the
 source tree, as they are dead code. They still exist in the
-repo's commit history if you want to do some historical research and
+repo’s commit history if you want to do some historical research and
 peek into the deep past to understand the evolution of Gerbil.
 
-
 ## How Gerbil Builds Itself
 
 The build process can be summarized in the following steps:
@@ -59,7 +161,6 @@ The build process can be summarized in the following steps:
    5. the Gerbil core system and universal binary is compiled using the bootstrap compiler with `boot-gxi` (stage 1).
    6. the newly compiled `gerbil` binary compiles the rest of the system.
 
-
 ## Practical Matters
 
 ### Recompiling the Bootstrap
@@ -71,7 +172,7 @@ This can be accomplished with the following incantations in `$GERBIL_SRCDIR/src`
 
 - To compile the bootstrap runtime:
 ```
-gxc -d bootstrap -s -S -O gerbil/runtime/{gambit,system,util,loader,control,mop,error,thread,syntax,eval,repl,init}.ss gerbil/runtime.ss
+gxc -d bootstrap -s -S -O gerbil/runtime/{gambit,util,system,loader,control,c3,mop,error,thread,syntax,eval,repl,init}.ss gerbil/runtime.ss
 ```
 
 - To compile the bootstrap core prelude:
@@ -91,14 +192,69 @@ gxc -d bootstrap -s -S -O gerbil/expander/{common,stx,core,top,module,compile,ro
 
 - To compile the bootstrap compiler:
 ```
-gxc  -d bootstrap -s -S -O gerbil/compiler/{base,compile,optimize-base,optimize-xform,optimize-top,optimize-spec,optimize-ann,optimize-call,optimize,driver,ssxi}.ss gerbil/compiler.ss
+gxc -d bootstrap -s -S -O gerbil/compiler/{base,compile,optimize-base,optimize-xform,optimize-top,optimize-spec,optimize-ann,optimize-call,optimize,driver,ssxi}.ss gerbil/compiler.ss
 ```
 
-- Finally, if you've made changes to it, you should also copy the core.ssxi.ss optimizer prelude:
+- Finally, if you’ve made changes to it, you should also copy the core.ssxi.ss optimizer prelude:
 ```
 cp gerbil/prelude/core.ssxi.ss bootstrap/gerbil
 ```
 
+### Strictures on Modifying Parts of the Gerbil Bootstrap
+
+***Every change to the Gerbil Bootstrap
+must be API-compatible from one version to the next***:
+both the old and new versions of Gerbil
+(before and after recompiling the bootstrap) must be able to use them.
+
+You *can* make API-incompatible changes from one version to another version,
+but this must necessarily involve *several steps*
+each of which will be API-compatible:
+
+- First, you cannot make any backward-incompatible API change, such as
+  changing the calling convention of a function or macro, e.g.
+  requiring a symbol instead of a string,
+  or a 1-based index instead of a 0-based index, etc.
+- You *could* modify a function to temporarily accept either a symbol or string
+  and do a conversion inside; but you obviously cannot determine whether
+  a user-provided index should be interpreted as 1-based or 0-based.
+- The solution is to create a *new* API with *new* names that
+  must absolutely not clash with the old names.
+  Add a suffix or prefix such as `*`, `/2` or `%`, or take the opportunity
+  to give functions better and/or more systematic names.
+- The *old* API will temporarily coexist with the *new* API.
+- When shared data structures are involved, the *old* API
+  as called by the previous bootstrap implementation may have to be
+  reimplemented in terms of the *new* API used by the next generation.
+- The internal representations used by the new API may thus have to include
+  extra information needed by the old API that it doesn’t need,
+  or the new API may have to maintain two redundant representations together,
+  until after the old API is removed. This extra information
+  or redundant representation can be removed in a later phase.
+- You can use the old bootstrap implementation to generate a next version
+  of the bootstrap implementation that uses the new API,
+  while the old API remains available to the old version.
+- In one or many iterations, you can make sure the old API is not used anywhere
+  anymore in Gerbil and its libraries.
+- Only after you have bootstrapped a version of Gerbil that does not at all
+  use the old API, you may wholly remove that old API:
+  this is now a backward-compatible change.
+- If for some reason you really like the old name or hate the new name,
+  and “just” want to make an incompatible API change,
+  the name is made available anew after the old API was wholly removed
+  and a version that doesn’t use it has been bootstrapped into existence.
+  You may therefore start a new cycle of API changes as above to modify the API
+  to use this now-available-again name.
+- As a cultural requirement meant to facilitate semantic analysis and
+  a well-founded reproducible and debuggable bootstrapping history,
+  we ask you to commit a separate PR for each phase of such an API change,
+  such that each committed version of Gerbil can be compiled
+  by the immediate previous one (but usually not by arbitrary older ones,
+  which would be overconstraining and prevent refactoring and progress).
+
+These strictures mean that you must stage your changes in multiple commits,
+and regenerate the bootstrap compiler at each step.
+
 ### Debugging
 
 If you have been making changes in the core system and building a new
@@ -124,7 +280,7 @@ will and supports serveral commands:
   easily navigate code in emacs.
 - `env` applies the arguments in the build environment.
 
-So if you have made changes and want to rebuild gerbil, you don't have
+So if you have made changes and want to rebuild gerbil, you don’t have
 to redo everything from scratch with `make`; you can simply build the
 stage you want, and once you are satisfied you can move to the next
 stage or push your branch so that CI does the job for you.
@@ -160,4 +316,4 @@ $ ./build.sh env gerbil test ./...
 ...
 ```
 
-And that's it! Happy Hacking.
+And that’s it! Happy Hacking.
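
As an illustration of the staged API change described under "Strictures on Modifying Parts of the Gerbil Bootstrap" in the diff above, here is a minimal sketch in portable Scheme of the coexistence phase. The functions `item-ref` (old, 1-based) and `item-ref/0` (new, 0-based) are hypothetical names invented for this example and do not exist in Gerbil.

```scheme
;; Phase 1: introduce the new API under a NEW name (0-based index).
;; The name must not clash with the old one.
(define (item-ref/0 items i)
  (list-ref items i))

;; The old API (1-based index) keeps its name and its behavior, so the
;; previous bootstrap implementation can still call it; here it is
;; reimplemented in terms of the new API.
(define (item-ref items i)
  (item-ref/0 items (- i 1)))

;; Phase 2 (a later commit): port every caller to item-ref/0 and
;; regenerate the bootstrap.
;; Phase 3 (a later commit still): once no bootstrapped code calls
;; item-ref, remove it.

(item-ref   '(a b c) 1) ; old callers still get a
(item-ref/0 '(a b c) 0) ; new callers get a
```

Each phase would land as its own commit, so that every committed version can be compiled by the one immediately before it.
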
diff --git a/doc/reference/gerbil/runtime/MOP.md b/doc/reference/gerbil/runtime/MOP.md
@@ -150,7 +150,7 @@ Same as `(direct-instance? klass obj)`.
   super       := type-descriptor or #f; the struct type to inherit from
   fields      := fixnum; number of (new) fields in the type
   name        := symbol; the (displayed) type name
-  plist       := alist; type properties
+  properties  := alist; type properties
   ctor        := symbol or #f; id of constructor method
   field-names := list of symbols or #f; (displayed) field names
 
@@ -305,20 +305,21 @@ Converts *obj* to a list, which conses its type and to its fields.
 
 ## make-class-type
 ``` scheme
-(make-class-type id super slots name plist ctor) -> type-descriptor
+(make-class-type id name direct-supers direct-slots properties constructor) -> type-descriptor
 
-  id     := symbol; the type id
-  super  := list of type-descriptors or #f; super types
-  slots  := list of symbols; class slot names
-  plist  := alist; type properties
-  ctor   := symbol or #f; id of constructor method
+  id             := symbol; the unique type id
+  name           := symbol; the possibly not unique source type name used when displaying the class
+  direct-supers  := list of type-descriptors or #f; super types
+  direct-slots   := list of symbols; class slot names
+  properties     := alist; type properties (NB: not a plist)
+  constructor    := symbol or #f; id of constructor method
 
-plist elements:
+alist elements:
  (transparent: . boolean) ; controls whether the object is transparent
                             in equality and printing
  (final: . boolean)       ; controls whether the class is final
- (print: slot ...)        ; printable slots
- (equal: slot ...)        ; equality comparable slots
+ (print: slot ...)        ; list of printable slots, or boolean
+ (equal: slot ...)        ; list of equality comparable slots, or boolean
 ```
 
 Creates a new class type descriptor.
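
To make the new argument order concrete, here is a hedged sketch of a call using the signature documented in the hunk above. It assumes a Gerbil build that includes this change; the type id `example/point::t`, the name `point`, and the slots `x` and `y` are hypothetical.

```scheme
;; Sketch only: argument order follows the updated documentation,
;; (make-class-type id name direct-supers direct-slots properties constructor)
(define point::t
  (make-class-type 'example/point::t        ; id: unique type id (hypothetical)
                   'point                   ; name: used when displaying the class
                   '()                      ; direct-supers: none
                   '(x y)                   ; direct-slots
                   '((transparent: . #t))   ; properties: an alist, not a plist
                   #f))                     ; constructor: no constructor method
```

Note that `name` now comes second and the properties are the fifth argument, so calls written against the old `(id super slots name plist ctor)` order need their arguments reordered.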

diff --git a/doc/reference/std/debug.md b/doc/reference/std/debug.md
@@ -399,6 +399,9 @@ Returns true if the `thread`'s message queue is empty.
 If the `tag` doesn't evaluate to `#f`, print the tag, then on separate lines
 the source of each expression `expr1` to `exprN` (as by `write`)
 followed by its single or multiple return values (as by `prn`).
+When an expression is preceded by a quoted form (as in `'form`)
+then that form is printed instead of the following expression
+(which can help when the expression is large and uninformative to print).
 Finally, return the values of the last expression `exprN`.
 
 You can easily wrap an expression in a `DBG` form so as to print its value,
@@ -409,11 +412,11 @@ in some part of your code.
 Example:
 ```scheme
 > (define-values (x y z) (values 1 2 3))
-> (* 10 (DBG foo: x (values [(+ x y) z] #t) (+ x y z)))
+> (* 10 (DBG foo: x (values [(+ x y) z] #t) 'result (+ x y z)))
 foo
   x => 1
   (values (@list (+ x y) z) #t) => [3 3] #t
-  (+ x y z) => 6
+  result => 6
 60
 ```
 In the above example the tag `foo` and the indented lines are printed by `DBG`,

diff --git a/doc/reference/std/errors.md b/doc/reference/std/errors.md
@@ -426,7 +426,30 @@ displaying the exception with `display-exception`).
 ```
 
 Invokes `thunk` with an exception handler that dumps the exception
-stack trace with `dump-stack-trace!`.
+stack trace with `dump-stack-trace!`
+if `(dump-stack-trace?)` is true (the default).
+
+### dump-stack-trace?
+```scheme
+(define dump-stack-trace? (make-parameter #t))
+```
+A parameter that controls whether a stack trace will be dumped
+when an exception is presented to the user.
+
+This parameter is notably used by the `display-exception` method for
+the `Error` type in the runtime, and by `with-exception-stack-trace`
+(see above) that is also used in the default exception handler installed
+in threads spawned with `spawn-actor`.
+It is also heeded by `exit-with-error` (see below).
+
+You can `(dump-stack-trace? #f)`
+or locally `(parameterize ((dump-stack-trace? #f)) ...)`
+to disable this stack trace dump,
+in case you are building a program for end-users rather than for developers,
+and want to control what limited error output they see.
+Or you can re-enable the dump based on a debug flag at the CLI
+in case you want end-users to provide you with extra debugging information,
+or log bug reports directly to your servers, etc.
 
 ### dump-stack-trace!
 ```scheme
@@ -451,6 +474,10 @@ This parameter controls whether `call-with-exit-on-error`, `with-exit-on-error`,
 will exit if an error is caught, rather than pass on the error
 and return to the REPL (or let a more fundamental function exit).
 
+If `dump-stack-trace?` is true, then the exception is printed
+a second time with `dump-stack-trace?` bound to false, so that
+the text of the exception appears again after the stack dump.
+
 ### call-with-exit-on-error
 ```scheme
 (call-with-exit-on-error thunk)