Skip to content
Andriy Makukha edited this page Nov 12, 2020 · 11 revisions

How is TMG implemented?

Internally, TMG uses the threaded code technique and consists of both interpreter and a compiler.

Compiler translates TMGL programs into an intermediate representation (also known as the driving table). The TMGL compiler is implemented in TMGL itself.

The interpreter executes the intermediate representation during the runtime. In this port it is implemented in C.

The interpreter and the intermediate representation are both contained in a single executable.

What is this driving table?

The driving table is an intermediate representation of a TMGL program. Basically, it is an array which consists of labels, literals and addresses of builtins.

In the original implementation, the driving table was just a subset of assembly code. In this implementation, the driving table consists of two C arrays: one for the actual code and another for labels within the first one.

Can a compiler developed in TMG produce machine code directly?

No, TMG-based compiler cannot directly produce machine code, as it can only output printable ASCII characters.

TMG was always meant to translate one programming language to another. For example, you can translate your code in a high-level programming language into assembly code. Then a third-party tool can be used to translate that to the machine code.

How does TMG produce an executable?

When you are using TMG, your TMGL program gets translated (compiled) into an intermediate representation first. Then the intermediate representation gets compiled together with the interpreter code into a single executable by a third-party compiler (C compiler in our case).

Can TMG compile itself?

Yes, in the above terms. But it still relies on third-party tools to produce executables, because the interpreter part is implemented in another language (C or assembly).

Strictly speaking, TMG can only compile itself into an intermediate representation. To produce an actual executable, it requires assembler (in case of the original TMG implementation) or C compiler (in case of this port). So TMG is not an entirely self-hosting compiler.

What languages can be parsed by TMG?

In 1970, Alexander Birman showed that TMG can parse all deterministic context-free languages (CFL), as well as some context-sensitive languages (non-CFL), in linear time.

Can TMG parse ambiguous grammars?

This version of TMG has some builtins for LR-parsing which can help you parse ambiguous grammars. However, these builtins are badly documented and untested. If you achieve parsing of an ambiguous grammar using TMG, please, let us know!

Why was TMG abandoned after Version 6 Unix?

First of all, TMG is notoriously hard to program in. Secondly, it is bad in parsing ambiguous grammars. Thirdly, because TMG was implemented using threaded code technique, it was too slow on the old hardware.

For these reasons, Stephen Johnson was prompted to develop Yacc, a bottom-up parser generator which quickly became the primary compiler-compiler at Bell Labs.

Another possible reason is that it was hard to port to C at that time. TMG relied on deep recursion and jumps between routines using simple GOTO logic. This port relies on tail call optimization of modern C compilers to achieve the same. But in 1970s, when Unix was being reimplemented in C before the release of Version 7, this might have been not possible yet using the available C compilers. Therefore, a deeper reimplementation of TMG would have been necessary were it decided to be ported to C back then.