========================
GNU Poke - Hacking Notes
========================
Welcome, adventurous poker! This file contains useful information for
you.
Copyright (C) 2019 Jose E. Marchesi
This file is part of GNU poke.
GNU poke is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
GNU poke is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with GNU poke. If not, see <https://www.gnu.org/licenses/>.
.. Please be so good as to update the table of contents below if you
   modify the sectioning of the document.  If in Emacs,
   M-x rst-toc-update should take care of it automatically.

.. contents::

..
    1  Nomenclature
    2  Maintainers
      2.1  GNU Maintainer
      2.2  Global Reviewers
      2.3  Write After Approval
      2.4  Personal Branches
      2.5  Installing Obvious Changes
    3  Development Environment
      3.1  Autotools
      3.2  Dejagnu
      3.3  Flex
      3.4  readline
      3.5  Boehm GC
      3.6  Jitter
      3.7  libtextstyle
      3.8  Building
      3.9  Gettext
      3.10  Running an Uninstalled Poke
      3.11  Continuous Integration
    4  Coding Style and Conventions
      4.1  Writing C
      4.2  Writing Poke
      4.3  Writing RAS
    5  Writing poke Tests
      5.1  Naming Tests
      5.2  Always set obase
      5.3  Put each test in its own file
      5.4  dg-output may require a newline
    6  Deciding on What to Work on
    7  Submitting a Patch
    8  Maintenance
    9  Poke Architecture
    10  The Poke Compiler
      10.1  Compiler Overview
      10.2  The bison Parser in pkl-tab.y
      10.3  The AST
      10.4  Compiler Passes and Phases
        10.4.1  Naming Conventions for Phases
        10.4.2  Naming Conventions for Handlers
        10.4.3  Transformation Phases
        10.4.4  Analysis Phases
        10.4.5  Type System Phases
        10.4.6  Middle End Handlers should be Re-executable
      10.5  The Type System
        10.5.1  Type Expressions
      10.6  Adding Compiler Built-Ins
    11  The Poke Virtual Machine
      11.1  Exception Handling
      11.2  Offsets and bit-offsets in the PVM
    12  Memory Management
      12.1  Using ASTREF
      12.2  Using ASTDEREF
      12.3  PVM values in PVM routines
      12.4  PVM values in AST nodes
    13  Terminal Handling
      13.1  pk-term
      13.2  Styling Classes
      13.3  Debugging Styling
    14  Debugging Poke
      14.1  Building with Debugging support
      14.2  Using GDB extensions
      14.3  Valgrind and Poke
      14.4  Debugging PVM Assembly Code
    15  Future Developments
    16  Appendix: The Source Tree
      16.1  The Compiler
      16.2  The Poke Virtual Machine
      16.3  The IO Subsystem
      16.4  Poke Program
      16.5  Pickles and Libraries
      16.6  Test Suite
      16.7  Documentation
      16.8  Other Stuff

Nomenclature
------------
We call ``poke`` the program. When the context may lead to confusion
(since ``poke`` is a pretty common word) we use ``GNU poke``.
``Poke`` (with an upper case ``P``) is the name of the domain-specific
language implemented by ``poke``.

A ``pickle`` is a Poke source file containing definitions of types,
variables, functions, etc., that conceptually apply to some definite
domain. For example, ``elf.pk`` is a pickle that provides facilities
to poke ELF object files. Pickles are not necessarily related to file
formats: a set of functions to work with bit patterns, for example,
could be implemented in a pickle ``bitpatterns.pk``.
Maintainers
-----------
GNU Maintainer
~~~~~~~~~~~~~~
Jose E. Marchesi <[email protected]>
Global Reviewers
~~~~~~~~~~~~~~~~
Jose E. Marchesi <[email protected]>
Write After Approval
~~~~~~~~~~~~~~~~~~~~
The people below have write access to the git repository, and can
install their changes after getting explicit approval from a global
reviewer.
Egeyar Bagcioglu <[email protected]>
Luca Saiu <[email protected]>
Darshit Shah <[email protected]>
Personal Branches
~~~~~~~~~~~~~~~~~
Anyone having write access to the git repository is allowed to push
and maintain personal branches. These branches should be called
``WHO/WHAT``, where ``WHO`` is the nick identifying the owner of the
branch and ``WHAT`` a description of what it contains.
Example::

  jemarch/hyperlinks-server

Personal branches are intended to ease the interaction between
developers, and to provide a convenient basis for testing large
changes.

Personal branches can be rebased and deleted. Please do not write
into a personal branch unless you have the explicit approval of the
branch owner.
Installing Obvious Changes
~~~~~~~~~~~~~~~~~~~~~~~~~~
Anyone having write access to the git repository is allowed to push
obvious changes to non-personal branches. The "obvious" category
includes typos in comments, renaming of variables, etc.
If you commit and push a non-obvious change, you are still required to
send an email to the mailing list stating you installed the change.
Please include a descriptive tag in your email's subject, such as
``[COMMITTED]``. Also, make sure to include the patch itself.
Development Environment
-----------------------
Autotools
~~~~~~~~~
This distribution uses whatever versions of Automake, Autoconf, and
Gettext are listed in NEWS; usually the latest ones released. If you
are getting the sources from git (or change configure.ac), you'll need
to have these tools installed to (re)build. You'll also need
help2man. All of these programs are available from
ftp://ftp.gnu.org/gnu.
Dejagnu
~~~~~~~
The poke testsuite uses DejaGnu. Please install it if you intend to
run the tests. If you want to hack on poke, you definitely want to run
the tests :)
Flex
~~~~
You will need a recent version of flex, since we are using some recent
options like "reentrant" or "bison-bridge". flex version 2.6.1 works
fine.
readline
~~~~~~~~
poke uses libreadline in order to provide a nice line editor at the
``(poke)`` prompt. The preferred implementation is GNU readline; any
recent version will suffice.

However, in case you don't have libreadline installed, a minimal
version from gnulib is used instead.
Boehm GC
~~~~~~~~
poke uses the Boehm conservative garbage collector for managing the
memory of some of its subsystems. Therefore, you must have it
installed.
Note that if you have the Boehm GC installed in a prefix different
from the one that contains pkg-config, you need to set PKG_CONFIG_PATH
so that pkg-config finds it::

  export PKG_CONFIG_PATH=${INSTALL_PREFIX_OF_LIBGC}/lib/pkgconfig
Jitter
~~~~~~
In order to build and run poke, you need Luca Saiu's jitter. Jitter
is available at http://ageinghacker.net/git/cgit.cgi/jitter.
The appropriate version of Jitter is now downloaded and bootstrapped
automatically by Poke's ``bootstrap`` script, which frees the user
from the annoyance of installing Jitter as a dependency.
Configuring and compiling Poke will also configure and compile
Jitter in a subdirectory. Jitter, when configured in ``sub-package
mode`` as Poke does, only generates static libraries and requires
no installation.
libtextstyle
~~~~~~~~~~~~
GNU poke uses libtextstyle in order to provide styled output. If the
library is not found, then a dummy version of it from gnulib is used
instead... one that does not do any styling!
At the moment libtextstyle lives in a subdirectory of GNU gettext.
See http://www.gnu.org/s/gettext for more information.
Building
~~~~~~~~
After getting the git sources, and installing the tools above, you can
run::

  $ ./bootstrap --skip-po

Then, you can run ``configure``::

  $ mkdir build/ && cd build
  $ ../configure

Finally::

  $ make
  $ make check
Gettext
~~~~~~~
When updating gettext, besides the normal installation on the system,
it is necessary to run gettextize -f in this hierarchy to update the
po/ infrastructure. After doing so, rerun gnulib-tool --import since
otherwise older files will have been imported. See the Gnulib manual
for more information.
Running an Uninstalled Poke
~~~~~~~~~~~~~~~~~~~~~~~~~~~
Once poke is compiled, you can run it before installing by defining
the ``POKEDATADIR`` environment variable to point to the ``src/``
directory in the sources tree. You should also set ``POKESTYLESDIR``
to point to the ``etc/`` directory.

For example::

  $ pwd
  /home/jemarch/gnu/hacks/poke/build/
  $ export POKEDATADIR=../src POKESTYLESDIR=../etc
  $ ./src/poke
Continuous Integration
~~~~~~~~~~~~~~~~~~~~~~
The package is built automatically, at regular intervals. You can
find the latest build results here::

  https://gitlab.com/gnu-poke/ci-distcheck/pipelines
  https://gitlab.com/gnu-poke/ci-distcheck/-/jobs?scope=finished
Coding Style and Conventions
----------------------------
Writing C
~~~~~~~~~
In Poke we follow the GNU Coding Standards. Please see
http://www.gnu.org/prep/standards/.
Writing Poke
~~~~~~~~~~~~
- Do not separate magnitudes and units when writing offsets. Do it
  like this::

    16#B

  instead of::

    16 #B

- Use Camel_Case for type names, but do not use Camel_Case for
  variable/function names!

- Surround pretty-printed values with ``#<`` and ``>``. This is to
  notify the reader that the value has been pretty-printed.
Writing RAS
~~~~~~~~~~~
We recommend using the Emacs mode in ``etc/poke-ras-mode.el`` to
write ``.pks`` files.
Writing poke Tests
------------------
The poke testsuites live in the ``testsuite/`` subdirectory. This
section contains useful hints for adding tests there.
Naming Tests
~~~~~~~~~~~~
For testing a functionality ``foo``, name your test ``foo.pk`` or
``foo-N.pk`` where ``N`` is a number.
If the test is a ``dg-do compile`` test whose compilation is expected
to fail, name it ``foo-diag.pk`` or ``foo-diag-N.pk``. Here "diag"
means diagnostic.
Always set obase
~~~~~~~~~~~~~~~~
If your test relies on printing integer values in the REPL (or using
the ``%v`` formatting tag in a ``printf``) please make sure to set an
explicit output numerical base, like in::

  /* { dg-command {.set obase 10} } */

This way, we won't have to change the tests if at some point we change
the default obase.
Put each test in its own file
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
If you are writing tests for a specific functionality, like for
example a standard function ``foo``, it may seem logical to put all
the tests in a single file ``foo.pk`` like::

  /* { dg-do run } */

  /* { dg-command {foo (1)} } */
  /* { dg-output "expected result" } */

  /* { dg-command {foo (1)} } */
  /* { dg-output "\nexpected result" } */

  [... and so on ...]

However, this is not a good idea. If some of the "subtests" fail, it
becomes difficult to determine which one is the culprit by looking at
the test log file.

It is better to put each test in its own file: ``foo-1.pk``,
``foo-2.pk`` and so on.
dg-output may require a newline
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
If, despite the advice above, you really need to put more than one
``dg-output`` in a ``dg-do run`` test file, please be aware that you
need to prefix all of them (except the first one) with a newline, like
in::

  /* { dg-output "foo" } */
  /* { dg-output "\nbar" } */
  /* { dg-output "\n baz" } */
Deciding on What to Work on
---------------------------
We maintain a detailed task list in the ``TODO`` file at the root of
the source tree. Please take a look.
Submitting a Patch
------------------
If you hack a feature/improvement/bugfix for poke and want to get it
integrated upstream, please keep the following points in mind:
- If your patch changes the user-visible characteristics of poke,
  please include an update for the user manual.

- If your patch adds or changes the way poke works internally, in a
  significant way, please consider including an update for the
  ``HACKING`` file.

- Please include a GNU-style ChangeLog in the patch description, but
  do not include it in the hunks. This makes it easier for reviewers
  to apply your patch for testing. Of course, include the ChangeLog
  hunks in the final push! (We will get rid of manual ChangeLog
  entries soon.)

- Make sure to run ``make syntax-check`` before submitting the patch,
  and fix any reported problem. Note that the maintainer reviewing
  your patch will also do this, so this is a great time to save an
  iteration ;)

- Let's keep poke.git master linear... no merges please. Pull with
  ``--ff-only``.

- Send the patch to the ``poke-devel`` mailing list.

- Use text email only. No HTML please.

- Inline the patch in the body of your email, or alternatively attach
  it as ``text/x-diff`` or ``text/x-patch``. This makes it easier for
  reviewers to quote parts of the patch.
Maintenance
-----------
This section describes ``make`` targets that perform several
maintenance tasks.

syntax-check
  Run several syntax-related checks in the source files. It is useful
  to run this target before submitting code to be reviewed, and while
  reviewing other people's code.

  Note that sometimes the results have to be taken with a pinch of
  salt. This happens, for example, when a rule oriented to C is
  applied to, say, an AWK file. In these cases, consider adding a
  ``.x-sc_*`` fine-tuning file. But please ask in poke-devel first.

coverage
  This target builds *poke* with code coverage support, runs the
  testsuite, and generates a nice HTML report under
  ``$(top_builddir)/doc/coverage/``. It is necessary to have the
  ``lcov`` program for this to work.

refresh-po
  This target downloads the latest available translations from the
  Translation Project and installs them in the source tree.

update-copyright
  Run this rule once per year (usually early in January) to update all
  the copyright years in the project. By default this excludes all
  variants of COPYING. Exceptions to this procedure (such as
  ``ChangeLog..*`` for rotated change logs) can be added in the file
  ``.x-update-copyright``.
Poke Architecture
-----------------
This figure depicts the overall architecture of Poke::

  +----------+
  | compiler |
  +----------+     +------+
       |           |      |
       v           |      |
  +----------+     |      |
  |   PVM    |<--->|  IO  |
  +----------+     |      |
       ^           |      |
       |           |      |
       v           +------+
  +----------+
  | command  |
  +----------+
The Poke Compiler
-----------------
Compiler Overview
~~~~~~~~~~~~~~~~~
This figure depicts the architecture of the compiler::

          /--------\
          | source |
          \---+----/
              |
              v
     +-----------------+
     |     Parser      |
     +-----------------+
     |  analysis and   |
     | transformation  |
     |     phases      |
     +-----------------+
     | code generation |
     |      phase      |
     +-----------------+
     | Macro assembler |
     +-----------------+
              |
              v
         /---------\
         | program |
         \---------/
The bison Parser in pkl-tab.y
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The only purpose of the bison parser in pkl-tab.y is to do the
syntactic analysis, build the initial AST, and set the locations of
the AST nodes.

Unfortunately, it currently also does some extra work, due to
limitations in the LALR parser:

- It builds the compile-time environment and registers type, variable
  and function declarations.
- It annotates variables with their lexical addresses.
- It links return statements with their containing functions.
- It annotates return statements with the number of lexical frames
  they should pop before exiting the function.

As we shall see below, any further analysis and transformations on the
AST are performed by the compiler phases, which are implemented
elsewhere. This greatly helps to keep the parser code clean and easy
to read, and also eases changing the syntactic structure of Poke
programs.
The AST
~~~~~~~
Compiler Passes and Phases
~~~~~~~~~~~~~~~~~~~~~~~~~~
These are the phases currently implemented in the poke compiler (the
phases marked with a * are optional)::

            [parser]
  --- Front-end pass
            trans1       Transformation phase 1.
            anal1        Analysis phase 1.
            typify1      Type analysis and transformation 1.
            promo        Operand promotion phase.
            trans2       Transformation phase 2.
          * fold         Constant folding.
            typify2      Type analysis and transformation 2.
            trans3       Transformation phase 3.
            anal2        Analysis phase 2.
  --- Middle-end pass
            trans4       Transformation phase 4.
  --- Back-end pass
            analf        Analysis final phase.
            gen          Code generation.

The phases above are organized in several passes:

Pass1
  trans1 anal1 typify1 promo trans2 fold typify2 trans3 anal2

Pass2
  trans4

Pass3
  analf gen
Naming Conventions for Phases
.............................
We use the following convention to name phases::

  {NAME}{SUFFIX}

where ``NAME`` reflects a phase category (see below) and ``SUFFIX`` is
usually an integer that specifies the order in which the phases are
applied. Thus, for example, ``name4`` is performed after ``name1``.
Sometimes, ``SUFFIX`` is ``f`` (meaning "final").

The suffix is not used if there is only one phase in the given
category.
We use the following phase categories:

anal
  For phases whose main purpose is to perform checks on the AST,
  and/or the contents of the AST nodes, and emit errors/warnings.

trans
  For phases whose main purpose is to alter the structure of the AST,
  and/or the contents of the AST nodes.

typify
  For phases whose main purpose is to perform type checks, and
  otherwise do work on types.

promo
  For phases whose main purpose is to perform coercions wherever
  appropriate. Currently there is only one phase in this category.

fold
  For phases whose main purpose is to pre-compute areas of the AST
  whenever it is possible to do so at compile-time. Currently there
  is only one phase in this category, that performs constant folding.

gen
  For phases whose main purpose is to generate PVM code. Currently
  there is only one phase in this category.

The phases in category ``NAME`` are implemented in the source files
``src/pkl-NAME.[ch]``.
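
To make the naming scheme concrete, here is a minimal sketch in C
that splits a phase name into its category and suffix and derives the
implementing source file. The helpers ``phase_category`` and
``phase_source_file`` are hypothetical and do not exist in the poke
sources. The category list is copied from the one above::

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helpers, not part of poke: they apply the
   {NAME}{SUFFIX} convention using the categories listed above.  */
static const char *const phase_categories[] =
  { "anal", "trans", "typify", "promo", "fold", "gen" };

/* A valid suffix is empty, all digits, or "f" (meaning "final").  */
static int
valid_suffix (const char *s)
{
  if (strcmp (s, "f") == 0)
    return 1;
  for (; *s != '\0'; s++)
    if (!isdigit ((unsigned char) *s))
      return 0;
  return 1;
}

/* Return the category of PHASE (e.g. "typify" for "typify2") and
   point *SUFFIX at its suffix, or return NULL if PHASE does not
   follow the convention.  */
const char *
phase_category (const char *phase, const char **suffix)
{
  size_t n = sizeof phase_categories / sizeof phase_categories[0];
  for (size_t i = 0; i < n; i++)
    {
      size_t len = strlen (phase_categories[i]);
      if (strncmp (phase, phase_categories[i], len) == 0
          && valid_suffix (phase + len))
        {
          *suffix = phase + len;
          return phase_categories[i];
        }
    }
  return NULL;
}

/* Store "src/pkl-NAME.c" for the category of PHASE in BUF.  Returns
   0 on success, -1 on an unknown phase or a too-small buffer.  */
int
phase_source_file (const char *phase, char *buf, size_t size)
{
  const char *suffix;
  const char *category = phase_category (phase, &suffix);
  if (category == NULL)
    return -1;
  int n = snprintf (buf, size, "src/pkl-%s.c", category);
  return (n < 0 || (size_t) n >= size) ? -1 : 0;
}
```

For instance, ``trans4`` maps to ``src/pkl-trans.c`` and ``analf`` to
category ``anal`` with suffix ``f``.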
Naming Conventions for Handlers
...............................
We use the following convention to name phase handlers::

  pkl_PHASE_{ps,pr}_NODE

where ``PHASE`` can be a complete phase name (like ``typify1``) if the
handler is to be installed in that phase only, or a phase category
name (like ``typify``) if the handler is to be installed in several
phases in that category. If the handler is to be executed in
pre-order, ``pr`` follows; otherwise, ``ps``. Finally, ``NODE`` is the
name of the AST node.

For example, the handler::

  pkl_anal1_ps_comp_stmt

is installed in the phase ``anal1``, executes in post-order, and
serves the AST nodes with code ``PKL_AST_COMP_STMT``.
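
The handler convention can be expressed as a small formatting
function. This sketch assumes a hypothetical helper,
``handler_name``, that is not in the poke sources. It takes the node
code without its ``PKL_AST_`` prefix and lowercases it, as in the
``pkl_anal1_ps_comp_stmt`` example::

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: build "pkl_PHASE_{pr,ps}_node" into BUF.
   PHASE is a phase name ("anal1") or category ("typify"), PREORDER
   selects "pr" (pre-order) or "ps" (post-order), and NODE is the AST
   node code without the PKL_AST_ prefix (e.g. "COMP_STMT").
   Returns 0 on success, -1 if BUF is too small.  */
int
handler_name (char *buf, size_t size, const char *phase, int preorder,
              const char *node)
{
  int n = snprintf (buf, size, "pkl_%s_%s_", phase,
                    preorder ? "pr" : "ps");
  if (n < 0 || (size_t) n + strlen (node) >= size)
    return -1;

  /* Handler names use the lowercase spelling of the node code.  */
  char *p = buf + n;
  while (*node != '\0')
    *p++ = (char) tolower ((unsigned char) *node++);
  *p = '\0';
  return 0;
}
```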
Transformation Phases
.....................
trans1
  - Finishes strings by expanding \-sequences, emitting diagnostics if
    an invalid \-sequence is found.

trans4
  - Reverses the list of actual arguments in function calls, so that
    the code generator processes them in the reversed order expected
    by the callee.
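
The argument reversal performed in trans4 is an in-place linked-list
reversal. The sketch below uses a simplified, hypothetical
``struct arg`` in place of real AST nodes, which are linked through
the AST's own chaining fields::

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for a chain of actual arguments.  */
struct arg
{
  int value;
  struct arg *chain;            /* Next argument, or NULL.  */
};

/* Reverse the chain in place and return its new head, so that a code
   generator walking it from the head emits the last argument first.  */
struct arg *
reverse_args (struct arg *head)
{
  struct arg *prev = NULL;
  while (head != NULL)
    {
      struct arg *next = head->chain;
      head->chain = prev;       /* Point the current node backwards.  */
      prev = head;
      head = next;
    }
  return prev;
}
```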
Analysis Phases
...............
anal1
  - Checks that every return statement is linked to a function.
  - Checks that no return statement is linked to a void function.

Type System Phases
..................

typify1
  - Checks that the expression to which a funcall is applied is a
    function, and that the types of the formal parameters match the
    types of the funcall arguments.
  - Checks that void functions are not called in contexts where a
    value is expected.

typify2
  - Checks that the type of the expression in a return statement
    matches the return type of its containing function.
Middle End Handlers should be Re-executable
...........................................
When a type is referenced by name, for example in a map::

  Foo @ 0#B

The AST associated with the type is processed again through the
compiler middle-end phases. This means that if a handler modifies an
AST subtree, it should do so in a way that keeps the new structure
valid if it is submitted to the same handler again.

An example of this is the ``pkl_trans1_ps_print_stmt`` handler.
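
One common way to make a transforming handler re-executable is to
check first whether the transformation has already been applied. The
sketch below shows the idea on a hypothetical one-child AST. The node
kinds and ``wrap_in_cast`` are illustrative only; this is not how
``pkl_trans1_ps_print_stmt`` is written::

```c
#include <stdlib.h>

/* Hypothetical, simplified AST: each node has a kind and one child.  */
enum kind { K_EXP, K_CAST, K_MAP };

struct node
{
  enum kind kind;
  struct node *child;
};

/* Wrap the child of MAP in a cast node.  Because the middle-end may
   process the same subtree again, first check whether the work was
   already done, so that running the handler twice leaves the tree
   unchanged.  Returns 0 on success, -1 on allocation failure.  */
int
wrap_in_cast (struct node *map)
{
  if (map->child != NULL && map->child->kind == K_CAST)
    return 0;                   /* Already transformed: do nothing.  */

  struct node *cast = malloc (sizeof *cast);
  if (cast == NULL)
    return -1;
  cast->kind = K_CAST;
  cast->child = map->child;
  map->child = cast;
  return 0;
}
```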
The Type System
~~~~~~~~~~~~~~~
This section describes the type system implemented in the Poke
language.

Type Expressions
................

A *type expression* denotes some particular type. Type expressions
can be one of:

A *simple type*
  Simple types are types that are not composed of other types. In
  this discussion we use the following sexp-like notations for them:

  (int N)
    Signed integer of N bits, where 0 < N <= 64.

  (uint N)
    Likewise, but the integer is unsigned.

  string
    NUL-terminated C-like string.

  void
    This is the null type. Used for several purposes.

A *product*
  Products of two type expressions are used to aggregate types in more
  complex structures, such as lists. We denote them by using the
  following sexp-like notation:

  (T1 . T2)
    Product of the type expressions T1 and T2.

  In order to simplify, we use a list abbreviation similar to the one
  used by Lisp in order to denote aggregations of types built with
  products::

    (T1 . (T2 . (T3 . T4))) -> (T1 T2 T3 T4)

  Note that type products are not really valid types by themselves.

An *array type*

A *struct type*
  Type expressions for structs are characterized by many attributes.
  We denote these expressions by using the following sexp-like
  notation::

    (struct PINNED ((L1 N1 T1 C1)...))

  where L1 is the label of the first element: a Poke expression
  evaluating to an offset. N1 is the name of the element, which is
  optional. T1 is a type expression denoting the type of the
  element. C1 is a Poke expression evaluating to a boolean; it is
  the constraint associated with the element. Of all these
  attributes, only T1 is mandatory. PINNED is a boolean indicating
  whether structs having this type are pinned.

A *function type*
  Type expressions for functions are characterized by a type
  expression denoting the types of its arguments and the type of the
  value returned by the function. We denote them using the following
  sexp-like notation:

  (fun T1 T2)
    where T1 is the type of the arguments to the function, and T2 is
    the type of the value returned by the function.

  Usually T1 will be an aggregation of types built as nested
  products. For example, the type expression for a function that
  takes three 32-bit signed integers and returns a string is::

    (fun ((int 32) (int 32) (int 32)) string)

  If a type expression denotes the type of a function which doesn't
  take any argument, T1 should be ``void``. Likewise, if the function
  doesn't return a value, T2 should be ``void``.
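
The notation above can be produced mechanically from a tree of type
expressions. The sketch below uses a hypothetical ``struct texp``,
not the compiler's AST. Atoms such as ``(int 32)`` are stored as
ready-made strings. Right-nested products are flattened into the
abbreviated list form used in this section::

```c
#include <string.h>

/* Hypothetical representation of the type expressions above.  */
enum texp_kind { TEXP_ATOM, TEXP_PRODUCT, TEXP_FUN };

struct texp
{
  enum texp_kind kind;
  const char *atom;             /* TEXP_ATOM: printed verbatim.  */
  const struct texp *a, *b;     /* (a . b) or (fun a b).  */
};

struct out { char *buf; size_t size, len; int overflow; };

/* Append S to O, flagging overflow instead of truncating.  */
static void
put (struct out *o, const char *s)
{
  size_t n = strlen (s);
  if (o->overflow || o->len + n >= o->size)
    {
      o->overflow = 1;
      return;
    }
  memcpy (o->buf + o->len, s, n + 1);
  o->len += n;
}

static void
print_texp (struct out *o, const struct texp *t)
{
  switch (t->kind)
    {
    case TEXP_ATOM:
      put (o, t->atom);
      break;
    case TEXP_FUN:
      put (o, "(fun ");
      print_texp (o, t->a);
      put (o, " ");
      print_texp (o, t->b);
      put (o, ")");
      break;
    case TEXP_PRODUCT:
      /* Flatten (T1 . (T2 . T3)) into (T1 T2 T3).  */
      put (o, "(");
      while (t->kind == TEXP_PRODUCT)
        {
          print_texp (o, t->a);
          put (o, " ");
          t = t->b;
        }
      print_texp (o, t);
      put (o, ")");
      break;
    }
}

/* Write T into BUF.  Returns its length, or -1 if BUF is too small.  */
int
texp_to_string (const struct texp *t, char *buf, size_t size)
{
  struct out o = { buf, size, 0, 0 };
  if (size > 0)
    buf[0] = '\0';
  print_texp (&o, t);
  return o.overflow ? -1 : (int) o.len;
}
```

Building ``(int 32) . ((int 32) . (int 32))`` as the argument type
and ``string`` as the return type yields the function type shown
above.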
Adding Compiler Built-Ins
~~~~~~~~~~~~~~~~~~~~~~~~~
Compiler built-ins are predefined functions, provided by the compiler,
that generate particular assembler instructions.

The first step in defining a new built-in is to make the lexer
recognize tokens of the form ``__PKL_BUILTIN_NAME__`` where ``NAME``
is some meaningful name, like for example ``RAND``::

  "__PKL_BUILTIN_RAND__" { return BUILTIN_RAND; }

Then, add a new rule to the rule ``comp_stmt`` in the bison parser.
Built-ins are equivalent to compound statements. For example, this is
the rule for the rand built-in::

  | pushlevel BUILTIN_RAND
    {
      $$ = pkl_ast_make_builtin (pkl_parser->ast,
                                 PKL_AST_BUILTIN_RAND);
      PKL_AST_LOC ($$) = @$;

      /* Pop the frame pushed by the `pushlevel' above.  */
      pkl_parser->env = pkl_env_pop_frame (pkl_parser->env);
    }

The next step is to generate the code for the built-in. This is done
by expanding the ``pkl_gen_ps_comp_stmt`` handler in the code
generator. Keep in mind that the generated code should form a valid
function body. For example, this is the code generation part for
rand::

  case PKL_AST_BUILTIN_RAND:
    pkl_asm_insn (PKL_GEN_ASM, PKL_INSN_RAND);
    pkl_asm_insn (PKL_GEN_ASM, PKL_INSN_RETURN);
    break;

The final step is to define the built-in function proper, in the
compiler run-time, in ``pkl-rt.pk``::

  defun rand = int<32>: __PKL_BUILTIN_RAND__;
The Poke Virtual Machine
------------------------
Exception Handling
~~~~~~~~~~~~~~~~~~
Exception types are signed 32-bit integers, and are defined in
``src/pkl-rt.pkl``.

There are two ways an exception can be raised in the PVM:

- Explicitly, when the instruction ``raise`` is executed.
- Implicitly, when some instruction needs to fail. For example, when
  an integer division instruction divides by zero.

In either case, the treatment of a raised exception is the same:

1. Pop an exception handler from the exception handler stack.
2. If the exception handler matches the raised exception type, then:

   i. Restore the heights of the main and return stacks.
   ii. Restore the dynamic environment.
   iii. Push the cached exception type to the stack.
   iv. Branch to the exception handler.

3. Otherwise, repeat from step 1.
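
The steps above can be modelled as a loop over a handler stack.
Everything in this sketch is a hypothetical simplification: the
structures, the integer stand-ins for environments and code
addresses, and ``vm_raise`` itself. The real PVM keeps this state in
its own stacks and registers::

```c
#include <stddef.h>

/* Hypothetical, simplified model of an installed exception handler.  */
struct handler
{
  int exception;          /* Exception type this handler catches.  */
  size_t main_height;     /* Main stack height when installed.  */
  size_t return_height;   /* Return stack height when installed.  */
  int env;                /* Stand-in for the dynamic environment.  */
  int target;             /* Stand-in for the handler's address.  */
};

struct vm
{
  struct handler handlers[16];
  size_t nhandlers;
  int stack[64];
  size_t main_height;
  size_t return_height;
  int env;
  int pc;
};

/* Raise EXCEPTION.  Returns 1 if a matching handler took control, 0
   if the handler stack was exhausted.  Assumes the saved main stack
   height leaves room for the pushed exception.  */
int
vm_raise (struct vm *vm, int exception)
{
  while (vm->nhandlers > 0)
    {
      /* 1. Pop an exception handler.  */
      struct handler h = vm->handlers[--vm->nhandlers];

      /* 2. Does it match the raised exception type?  */
      if (h.exception == exception)
        {
          vm->main_height = h.main_height;          /* i.  */
          vm->return_height = h.return_height;
          vm->env = h.env;                          /* ii.  */
          vm->stack[vm->main_height++] = exception; /* iii.  */
          vm->pc = h.target;                        /* iv.  */
          return 1;
        }
      /* 3. Otherwise, try the next handler.  */
    }
  return 0;
}
```

Note that handlers that do not match are discarded as the stack
unwinds.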
The default exception handler, which catches "unhandled" exceptions,
is installed by the macro-assembler in ``src/pkl-asm.c:pkl_asm_new``
and ``src/pkl-asm.c:pkl_asm_finish``. It calls the function
``_pkl_exception_handler``, which is defined in the compiler runtime
in ``src/pkl-rt.pkl``.
Offsets and bit-offsets in the PVM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The PVM supports a ``pvm_off`` boxed value, to denote pairs of
magnitudes and units. Both accessor macros (in ``pvm-val.h``) and PVM
instructions (``ogetm``, ``ogetu``) are provided to access their
components.

Many other PVM entities need to denote offsets in one way or another.
For example, struct fields in ``pvm_struct`` values need to record
their relative offset with respect to the beginning of the struct.

It may come to mind, quite naturally, to use ``pvm_off`` values to
denote these offsets. It is very elegant. However, we decided to use
"bit offsets" instead, stored in 64-bit ``pvm_long`` values.

There are two reasons for this:

- First of all, performance. It is fairly common to operate with the
  absolute value of these offsets, in bits. In fact, in most cases
  that is the only purpose of maintaining them. Having them stored in
  ``pvm_off`` values means we have to multiply every time we want to
  get their value in bits. This is a waste, for no good reason.

- To avoid code coupling. PVM offsets are very cool, but they are
  also complex: the unit is arbitrary. This means that in many cases
  we would have to assume the nature of the unit, mainly bits. This
  is very fragile.

So, the take-home message is: in the PVM, restrict the presence of
``pvm_off`` values to the ones generated by the code generator.
Whenever an offset is needed in some internal PVM structure, use
bit-offsets instead, encoded as ``ulong<64>`` values.
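
The multiplication mentioned in the performance argument is simply
magnitude times unit, with the unit measured in bits. The
hypothetical ``struct offset`` below is not ``pvm_off``, only a plain
pair, and ``offset_to_bits`` is illustrative. An offset like
``16#B`` (unit of 8 bits) corresponds to a bit-offset of 128::

```c
#include <stdint.h>

/* Hypothetical stand-in for an offset: a magnitude and a unit, with
   the unit expressed as a number of bits (so #B is 8).  */
struct offset
{
  uint64_t magnitude;
  uint64_t unit;
};

/* Convert an offset into a bit-offset.  Consumers of a magnitude/unit
   pair pay for this multiplication on every access; storing the
   bit-offset directly avoids it.  */
uint64_t
offset_to_bits (struct offset off)
{
  return off.magnitude * off.unit;
}
```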
Memory Management
-----------------
Different parts of poke use different strategies for memory
management:
- The compiler front-end uses reference counting to handle AST nodes.
- The PVM uses the Boehm GC for values and the run-time
  environment.
- Everything else uses ``malloc``/``free``.
This sometimes leads to tricky situations, some of which are
documented in the subsections below.
Using ASTREF
~~~~~~~~~~~~
The AST uses reference counting in order to manage the memory used by
the nodes. Every time you store a pointer to an AST node, you should
use the macro ``ASTREF`` in order to increase its counter::

  pkl_ast_node foo = ASTREF (node);

Note that the ``pkl_ast_make_*`` constructors do ``ASTREF``
internally, so you don't need to use it in calls like::

  pkl_ast_node new = pkl_ast_make_struct (ast, 5, elems_node);

There is a little caveat: the way ``ASTREF`` is defined, it requires
an l-value to work properly. Therefore, this doesn't work::

  pkl_ast_node foo = ASTREF (PKL_AST_TYPE (node));

Instead, write::

  pkl_ast_node type = PKL_AST_TYPE (node);
  pkl_ast_node foo = ASTREF (type);
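
The reference-counting discipline can be illustrated with a
self-contained toy. ``NODE_REF`` and ``node_make`` are hypothetical
stand-ins, not the real ``ASTREF`` and ``pkl_ast_make_*``, whose
definitions are not reproduced here. As with the real constructors,
``node_make`` takes the reference to its child itself. Note that this
macro evaluates its argument more than once, so it should also be
given a plain variable::

```c
#include <stdlib.h>

/* Hypothetical, minimal reference-counted node.  */
struct node
{
  int refcount;
  struct node *child;
};

/* Sketch of an ASTREF-like macro: bump the counter and yield N.  */
#define NODE_REF(N) ((N) ? (++(N)->refcount, (N)) : NULL)

/* Constructor that references CHILD internally, so callers must not
   NODE_REF the child themselves.  The new node starts at zero; whoever
   stores it takes a reference.  */
struct node *
node_make (struct node *child)
{
  struct node *n = calloc (1, sizeof *n);
  if (n != NULL)
    n->child = NODE_REF (child);
  return n;
}
```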
Using ASTDEREF
~~~~~~~~~~~~~~
``ASTDEREF`` decreases the reference counter of the provided AST
node. The passed value should be an l-value.

In practice you will seldom find yourself in the need to use
``ASTDEREF``. Just make sure that every ``ASTREF`` is paired with a
``pkl_ast_node_free``.

However, there are situations where ``ASTDEREF`` is necessary in order
to avoid a memory leak. For example, consider transformations like
``a -> b`` to ``a -> x -> b``. In that case, you should use something
like::

  b = PKL_AST_KIND_WHAT (node);
  x = pkl_ast_make_xxx (ast, ASTDEREF (b));
  PKL_AST_KIND_WHAT (node) = ASTREF (x);

This works because ``pkl_ast_make_xxx`` does an ``ASTREF`` to ``b``
internally. The final result is that the reference counter of ``b``
doesn't change at all.
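
Here is the same ``a -> b`` to ``a -> x -> b`` bookkeeping in a
self-contained toy. ``NODE_REF``, ``NODE_DEREF``, ``node_make`` and
``interpose`` are hypothetical names that only mimic the roles of
``ASTREF``, ``ASTDEREF`` and ``pkl_ast_make_xxx``. The check shows
that ``b``'s counter ends where it started::

```c
#include <stdlib.h>

/* Hypothetical, minimal reference-counted node.  */
struct node
{
  int refcount;
  struct node *child;
};

#define NODE_REF(N)   ((N) ? (++(N)->refcount, (N)) : NULL)
#define NODE_DEREF(N) ((N) ? (--(N)->refcount, (N)) : NULL)

/* Constructor that, like pkl_ast_make_*, references its child.  */
static struct node *
node_make (struct node *child)
{
  struct node *n = calloc (1, sizeof *n);
  if (n != NULL)
    n->child = NODE_REF (child);
  return n;
}

/* Turn A -> B into A -> X -> B.  B drops the reference held by A and
   gains the one taken by X's constructor, so its counter is
   unchanged; X ends up referenced once, by A.  */
int
interpose (struct node *a)
{
  struct node *b = a->child;
  struct node *x = node_make (NODE_DEREF (b));
  if (x == NULL)
    {
      (void) NODE_REF (b);      /* Restore A's reference on failure.  */
      return -1;
    }
  a->child = NODE_REF (x);
  return 0;
}
```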
PVM values in PVM routines
~~~~~~~~~~~~~~~~~~~~~~~~~~
PVM routines (data structures of type ``pvm_routine``) are
allocated by Jitter in complicated data structures, internally relying
on ``malloc``. Their content is therefore not automatically visible to
the GC.

Now, the instructions in a routine can contain literal PVM values, and
some of these values will be boxed. For example, the following
routine contains a pointer to a ``pvm_val_box``::

  ;; Initialize the element index to 0UL, and put it
  ;; in a local.
  push ulong<64>0
  regvar $eidx

There are two places where PVM routines are stored in other data
structures: in closures, and in the compiler.

A closure is a kind of PVM value itself, and therefore allocated by
the GC. It is composed of a PVM routine, a run-time environment and
an entry point into the routine (as a Jittery VM ``program point``)::

  struct pvm_cls
  {
    pvm_routine routine;
    pvm_program_point entry_point;
    struct pvm_env *env;
  };

However, since ``routine`` is malloc-allocated, the GC can't traverse
it. Consequently, the references to contained boxed values won't be
accounted for, and these values will be collected if there are no
other references to them!

The solution, recommended by Luca Saiu, is to keep an array of
pointers in the closure structure, containing the pointers to every
boxed value used in ``routine``::

  struct pvm_cls
  {
    pvm_routine routine;
    void **pointers;
    const void *entry_point;
    struct pvm_env *env;
  };

The subsystem responsible for collecting the pointers is the
macro-assembler in ``pkl-asm.c``. ``pkl_asm_finish`` will return the
array of pointers, and it is up to the caller (in the code generator)
to install it in the corresponding closure structure.
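
Collecting the pointers amounts to appending each embedded boxed value
to a growable array and handing the array over at the end. This
sketch is hypothetical: ``struct ptr_set``, ``ptr_set_add`` and
``ptr_set_finish`` are not the ``pkl-asm.c`` API. The allocator is a
parameter because the array only protects the values if it lives in
memory the collector scans. With Boehm GC that means something like
``GC_REALLOC``, whereas plain ``realloc`` is used in the test only to
keep it self-contained::

```c
#include <stdlib.h>

/* Hypothetical collector of pointers to boxed values.  */
struct ptr_set
{
  void **pointers;
  size_t count;
  size_t capacity;
  /* Must allocate GC-scanned memory in poke (e.g. Boehm's GC_REALLOC);
     with plain realloc the GC would NOT see the pointers.  */
  void *(*realloc_fn) (void *, size_t);
};

/* Record P.  Returns 0 on success, -1 on allocation failure.  */
int
ptr_set_add (struct ptr_set *s, void *p)
{
  if (s->count == s->capacity)
    {
      size_t cap = s->capacity ? 2 * s->capacity : 8;
      void **grown = s->realloc_fn (s->pointers, cap * sizeof *grown);
      if (grown == NULL)
        return -1;
      s->pointers = grown;
      s->capacity = cap;
    }
  s->pointers[s->count++] = p;
  return 0;
}

/* Hand the NULL-terminated array over to the caller (e.g. to be
   installed in a closure) and reset the set.  */
void **
ptr_set_finish (struct ptr_set *s)
{
  if (ptr_set_add (s, NULL) != 0)
    return NULL;
  void **result = s->pointers;
  s->pointers = NULL;
  s->count = s->capacity = 0;
  return result;
}
```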
The second place where a PVM routine is stored in other data
structures is the compiler functions ``pkl_compile_file`` and
``pkl_compile_buffer``. In both functions, the compiled PVM routine
is executed and then discarded. However, it is still required to have
the ``pointers`` array linked from the C stack, to prevent the GC from
freeing values used in the routine. That's the purpose of the weird
local variables, which are set but never used::