Double-pass assembler, written in ANSI C90 for an imaginary 14-bit computer.
The goal of the project is to write an assembler for an assembly language.
The assembler needs to convert the assembly language, which is defined below, into machine code, which is also defined below.
First, the assembler expands the macros in the code. This stage is called the preProcessor.
Then, once the macros are expanded, the assembler converts the assembly language into machine code.
It does so by going through the code twice:
- First Pass - The assembler goes through the code and builds a symbol table, which contains every label in the code and its address. It also converts the instructions into binary words, except for words that refer to a label, since the label addresses are not yet known.
- Second Pass - The assembler goes through the code again, fills in the missing label addresses, and converts the binary code into machine code.
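The two passes can be sketched on a toy program representation. This is a minimal illustration, not the project's actual code: the `ToyLine` and `Symbol` types and all function names are hypothetical, and it assumes that every line occupies exactly one word and that addresses start at 0.

```c
#include <string.h>

#define MAX_SYMBOLS 32

/* Hypothetical toy line: an optional label defined on this line and an
   optional label that this line refers to (NULL when absent). */
typedef struct {
    const char *label;
    const char *operand_label;
} ToyLine;

typedef struct {
    const char *name;
    int address;
} Symbol;

/* First pass: record the address of every label definition.
   Assumption: one word per line, addresses starting at 0. */
static int first_pass(const ToyLine *lines, int n, Symbol *table)
{
    int i, count = 0;
    for (i = 0; i < n && count < MAX_SYMBOLS; i++) {
        if (lines[i].label != NULL) {
            table[count].name = lines[i].label;
            table[count].address = i;
            count++;
        }
    }
    return count;
}

/* Looks a name up in the symbol table; returns -1 when it is missing. */
static int lookup(const Symbol *table, int count, const char *name)
{
    int i;
    for (i = 0; i < count; i++) {
        if (strcmp(table[i].name, name) == 0) {
            return table[i].address;
        }
    }
    return -1;
}

/* Second pass: fill in the address of every referenced label.
   resolved[i] stays -1 for lines without a label operand.
   Returns 0 on success, -1 if some label was never defined. */
static int second_pass(const ToyLine *lines, int n,
                       const Symbol *table, int count, int *resolved)
{
    int i, status = 0;
    for (i = 0; i < n; i++) {
        resolved[i] = -1;
        if (lines[i].operand_label != NULL) {
            resolved[i] = lookup(table, count, lines[i].operand_label);
            if (resolved[i] < 0) {
                status = -1;
            }
        }
    }
    return status;
}
```

The point of the split is forward references: a line may use a label that is only defined further down, so its address cannot be known until the whole file has been scanned once.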
The computer consists of a CPU, Registers, and RAM.
The CPU has 8 registers: r0, r1, r2, r3, r4, r5, r6, r7. The size of each register is 14 bits.
The RAM has 256 memory cells (addresses 0 - 255), and the size of each memory cell is also 14 bits.
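As a rough model of this machine in C, one could write something like the following. The `Machine` struct and the helper names are hypothetical and only illustrate the sizes given above; a plain `unsigned int` is used for each 14-bit cell and masked on every store.

```c
#define NUM_REGISTERS 8
#define MEMORY_CELLS  256
#define WORD_MASK     0x3FFFu /* keeps the low 14 bits */

/* Hypothetical model of the imaginary computer's state. */
typedef struct {
    unsigned int registers[NUM_REGISTERS];
    unsigned int memory[MEMORY_CELLS];
} Machine;

/* Returns 0-7 for the names "r0".."r7", or -1 for anything else. */
static int register_index(const char *name)
{
    if (name[0] == 'r' && name[1] >= '0' && name[1] <= '7'
            && name[2] == '\0') {
        return name[1] - '0';
    }
    return -1;
}

/* Stores a value in a memory cell, truncated to 14 bits.
   Returns 0 on success, -1 if the address is outside 0-255. */
static int store_word(Machine *m, int address, unsigned int value)
{
    if (address < 0 || address >= MEMORY_CELLS) {
        return -1;
    }
    m->memory[address] = value & WORD_MASK;
    return 0;
}
```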
A cell in memory is also called a word. Each machine instruction is encoded into a number of memory words.
A word is a 14-bit number, which is divided into 6 parts, described in the following table:
Bit(s) | 13 12 | 11 10 | 9 8 7 6 | 5 4 | 3 2 | 1 0 |
---|---|---|---|---|---|---|
Meaning | param1 | param2 | opcode | source operand addressing | target operand addressing | ERA |
⚠️ I'm not going to explain the meaning of each part.
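The bit layout in the table can be expressed with shifts and masks. The sketch below only packs and unpacks the fields at the positions listed above; the function names are hypothetical, and it says nothing about which field values are valid.

```c
#define WORD_MASK 0x3FFFu /* 14 bits */

/* Packs the six fields into one 14-bit word.
   Bit 0 is the least significant bit. */
static unsigned int pack_word(unsigned int param1, unsigned int param2,
                              unsigned int opcode, unsigned int src_mode,
                              unsigned int dst_mode, unsigned int era)
{
    unsigned int word = 0;
    word |= (param1   & 0x3u) << 12; /* bits 13-12 */
    word |= (param2   & 0x3u) << 10; /* bits 11-10 */
    word |= (opcode   & 0xFu) << 6;  /* bits 9-6   */
    word |= (src_mode & 0x3u) << 4;  /* bits 5-4   */
    word |= (dst_mode & 0x3u) << 2;  /* bits 3-2   */
    word |= (era      & 0x3u);       /* bits 1-0   */
    return word & WORD_MASK;
}

/* Extracts the opcode field (bits 9-6) from a word. */
static unsigned int word_opcode(unsigned int word)
{
    return (word >> 6) & 0xFu;
}
```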
The assembly language consists of 16 different instructions:
Instruction | Opcode |
---|---|
mov | 00 |
cmp | 01 |
add | 02 |
sub | 03 |
not | 04 |
clr | 05 |
lea | 06 |
inc | 07 |
dec | 08 |
jmp | 09 |
bne | 10 |
red | 11 |
prn | 12 |
jsr | 13 |
rts | 14 |
stop | 15 |
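Because the opcodes run from 0 to 15 without gaps, one simple way to look them up is an array whose index is the opcode. This is a sketch with hypothetical names (`MNEMONICS`, `opcode_of`), and it assumes mnemonics are matched case-sensitively.

```c
#include <string.h>

/* Mnemonics ordered so that the array index equals the opcode
   listed in the table above. */
static const char *const MNEMONICS[16] = {
    "mov", "cmp", "add", "sub", "not", "clr", "lea", "inc",
    "dec", "jmp", "bne", "red", "prn", "jsr", "rts", "stop"
};

/* Returns the opcode (0-15) for a mnemonic, or -1 if it is unknown. */
static int opcode_of(const char *mnemonic)
{
    int i;
    for (i = 0; i < 16; i++) {
        if (strcmp(MNEMONICS[i], mnemonic) == 0) {
            return i;
        }
    }
    return -1;
}
```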
It also consists of 4 different directives:
- .data - Defines a sequence of integers.
- .string - Defines a sequence of characters.
- .entry - Defines a label as an entry point, so it can be used in other assembly files (the counterpart of .extern).
- .extern - Defines a label as an external label. It tells the assembler that this label is defined in another assembly file (the counterpart of .entry).
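As an illustration of how the operands of a .data directive might be turned into words, here is a minimal sketch. It assumes the integers are comma-separated decimal values and that negative numbers are stored in 14-bit two's complement; neither detail is stated above, and the function name is hypothetical. It does not check that a value fits in 14 bits.

```c
#include <stdlib.h>

/* Parses text such as "7, -5, 3" into 14-bit words.
   Returns the number of words stored, or -1 on malformed input
   or when more than max_words values are given. */
static int parse_data_operands(const char *text, unsigned int *words,
                               int max_words)
{
    int count = 0;
    const char *p = text;
    char *end;
    long value;

    for (;;) {
        value = strtol(p, &end, 10);   /* skips leading whitespace */
        if (end == p || count >= max_words) {
            return -1;                 /* no number found, or no room */
        }
        words[count++] = (unsigned int)value & 0x3FFFu;
        p = end;
        while (*p == ' ' || *p == '\t') {
            p++;
        }
        if (*p == '\0') {
            return count;              /* clean end of the list */
        }
        if (*p != ',') {
            return -1;                 /* unexpected character */
        }
        p++;                           /* step over the comma */
    }
}
```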
The machine code consists of only two characters: . and /, where . represents 0 and / represents 1.
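Converting a word to this notation is a bit-by-bit loop. The sketch below assumes the most significant bit is written first, which is not stated above; the function name is hypothetical.

```c
#define WORD_BITS 14

/* Writes the 14 bits of word into out, most significant bit first
   (an assumption), using '.' for 0 and '/' for 1.
   out must have room for WORD_BITS + 1 characters. */
static void word_to_symbols(unsigned int word, char *out)
{
    int i;
    for (i = 0; i < WORD_BITS; i++) {
        unsigned int bit = (word >> (WORD_BITS - 1 - i)) & 1u;
        out[i] = bit ? '/' : '.';
    }
    out[WORD_BITS] = '\0';
}
```

For example, the value 188 (binary 00000010111100 in 14 bits) becomes `.....././///..`.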
There are 4 types of lines in the assembly language:
- Empty Line - A line that contains nothing but whitespace characters (\t, spaces, or \n). The assembler ignores these lines.
- Comment Line - A line that starts with a semicolon (;). The assembler ignores these lines.
- Instruction Line - A line that contains an instruction. The assembler converts these lines into machine code.
- Directive Line - A line that contains a directive. The assembler converts these lines into machine code.
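The four line types can be told apart with a small classifier. This is a sketch under two assumptions that go beyond the text above: an optional label is written as `NAME:` at the start of the line, and a directive is recognized by its leading dot. The enum and function names are hypothetical.

```c
#include <ctype.h>

typedef enum {
    LINE_EMPTY,
    LINE_COMMENT,
    LINE_DIRECTIVE,
    LINE_INSTRUCTION
} LineType;

static LineType classify_line(const char *line)
{
    const char *p = line;
    const char *q;

    /* A comment line starts with ';'. */
    if (*p == ';') {
        return LINE_COMMENT;
    }

    /* Skip leading whitespace; nothing left means an empty line. */
    while (*p != '\0' && isspace((unsigned char)*p)) {
        p++;
    }
    if (*p == '\0') {
        return LINE_EMPTY;
    }

    /* Skip an optional leading "LABEL:" (assumed syntax). */
    q = p;
    while (*q != '\0' && !isspace((unsigned char)*q) && *q != ':') {
        q++;
    }
    if (*q == ':') {
        p = q + 1;
        while (*p != '\0' && isspace((unsigned char)*p)) {
            p++;
        }
    }

    /* Directives start with a dot; everything else is treated
       as an instruction. */
    return (*p == '.') ? LINE_DIRECTIVE : LINE_INSTRUCTION;
}
```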
Each line can begin with a label. A label is like a variable name, which can be used to reference a memory cell.
Example of using a label:

    XYZ: mov r0, r1
    bne LOOP(XYZ, r3)

The assembly language supports macros.
A macro is a named sequence of instructions, which can be invoked by a single line.
A macro is defined by the following syntax:

    mcr MACRO_NAME
    MACRO_CODE
    endmcr

It can then be called by just writing the macro's name:

    MACRO_NAME

Therefore, the end result is the same as if the macro's code had been written in place of the macro's name.
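The expansion step can be sketched as a single scan over the source lines. This is an illustration rather than the project's preProcessor: it assumes lines are already trimmed, a macro is defined before it is called, macros are not nested, and a call is a line consisting only of the macro's name. The `Macro` type, the limits, and the function names are hypothetical.

```c
#include <string.h>

#define MAX_MACROS 8
#define MAX_BODY   16

typedef struct {
    const char *name;
    const char *body[MAX_BODY];
    int body_len;
} Macro;

/* Returns the macro whose name equals line, or NULL if there is none. */
static const Macro *find_macro(const Macro *macros, int count,
                               const char *line)
{
    int i;
    for (i = 0; i < count; i++) {
        if (strcmp(macros[i].name, line) == 0) {
            return &macros[i];
        }
    }
    return NULL;
}

/* Copies in[] to out[], dropping macro definitions and replacing each
   macro call with the macro's body. Returns the number of output
   lines, or -1 if one of the fixed limits is exceeded. */
static int expand_macros(const char **in, int n,
                         const char **out, int max_out)
{
    Macro macros[MAX_MACROS];
    Macro *current = NULL;   /* macro being defined, if any */
    const Macro *called;
    int macro_count = 0, out_count = 0, i, j;

    for (i = 0; i < n; i++) {
        if (current != NULL) {
            /* Inside a definition: collect lines until endmcr. */
            if (strcmp(in[i], "endmcr") == 0) {
                current = NULL;
            } else {
                if (current->body_len >= MAX_BODY) return -1;
                current->body[current->body_len++] = in[i];
            }
        } else if (strncmp(in[i], "mcr ", 4) == 0) {
            /* Start of a definition: remember the name. */
            if (macro_count >= MAX_MACROS) return -1;
            current = &macros[macro_count++];
            current->name = in[i] + 4;
            current->body_len = 0;
        } else if ((called = find_macro(macros, macro_count, in[i])) != NULL) {
            /* A call: emit the recorded body instead of the name. */
            for (j = 0; j < called->body_len; j++) {
                if (out_count >= max_out) return -1;
                out[out_count++] = called->body[j];
            }
        } else {
            /* Any other line is copied through unchanged. */
            if (out_count >= max_out) return -1;
            out[out_count++] = in[i];
        }
    }
    return out_count;
}
```

The output holds pointers into the original lines, so no text is copied; a real preprocessor writing a new file would print each output line instead.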