forked from databricks/tpch-dbgen
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathPORTING.NOTES
220 lines (181 loc) · 8.96 KB
/
PORTING.NOTES
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
# @(#)PORTING.NOTES 2.1.8.1
Table of Contents
==================
1. General Program Structure
2. Naming Conventions and Variable Usage
3. Porting Procedures
4. Compilation Options
5. Customizing QGEN
6. Further Enhancements
7. Known Porting Problems
8. Reporting Problems
1. General Program Structure
The code provided with TPC-H and TPC-R benchmarks includes a database
population generator (DBGEN) and a query template translator(QGEN). It
is written in ANSI-C, and is meant to be easily portable to a broad variety
of platforms. The program is composed of five source files and some
support and header files. The main modules are:
build.c: each table in the database schema is represented by a
routine mk_XXXX, which populates a structure
representing one row in table XXXX.
See Also: dss_types.h, bm_utils.c, rnd.*
print.c: each table in the database schema is represented by a
routine pr_XXXX, which prints the contents of a
structure representing one row in table XXX.
See Also: dss_types.h, dss.h
driver.c: this module contains the main control functions for
DBGEN, including command line parsing, distribution
management, database scaling and the calls to mk_XXXX
and pr_XXXX for each table generated.
qgen.c: this module contains the main control functions for
QGEN, including query template parsing.
varsub.c: each query template includes one or more parameter
substitution points; this routine handles the
parameter generation for the TPC-H/TPC-R benchmark.
The support utilities provide a generalized set of functions for data
generation and include:
bm_utils.c: data type generators, string management and
portability routines.
rnd.*: a general purpose random number generator used
throughout the code.
dss.h:
shared.h: a set of '#defines' for limits, formats and fixed
values
dsstypes.h: structure definitions for each table definition
2. Naming Conventions and Variable Usage
Since DBGEN will be maintained by a large number of people, it is
particularly important to observe the coding, variable naming and usage
conventions detailed here.
#define
--------
All #define directives are found in header files (*.h). In general,
the header files segregate variables and macros as follows:
rnd.h -- anything exclusively referenced by rnd.c
dss.h -- general defines for the benchmark, including *all*
extern declarations (see below).
shared.h -- defines related to the tuple definitions in
dsstypes.h. Isolated to ease automatic processing needed by many
direct load routines (see below).
dsstypes.h -- structure definitons and typedef directives to
detail the contents of each table's tuples.
config.h -- any porting and configuration related defines should
go here, to localize the changes necessary to move the suite
from one machine to another.
tpcd.h -- defines related to QGEN, rather than DBGEN
extern
------
DBGEN and QGEN make extensive use of extern declarations. This could
probably stand to be changed at some point, but has made the rapid
turnaround of prototypes easier. In order to be sure that each
declaration was matched by exactly one definition per executatble,
they are all declared as EXTERN, a macro dependent on DECLARER. In
any module that defines DECLARER, all variables declared EXTERN will
be defined as globals. DECLARER should be declared only in modules
containing a main() routine.
Naming Conventions
------------------
defines
o All defines use upper case
o All defines use a table prefix, if appropriate:
O_* relates to orders table
L_* realtes to lineitem table
P_* realtes to part table
PS_* relates to partsupplier table
C_* realtes to customer table
S_* relates to supplier table
N_* relates to nation table
R_* realtes to region table
T_* relates to time table
o All defines have a usage prefix, if appropriate:
*_TAG environment variable name
*_DFLT environment variable default
*_MAX upper bound
*_MIN lower bound
*_LEN average length
*_SD random number seed (see rnd.*)
*_FMT printf format string
*_SCL divisor (for scaled arithmetic)
*_SIZE tuple length
3. Porting Procedures
The code provided should be easily portable to any machine providing an
ANSI C compiler.
-- Copy makefile.suite to makefile
-- Edit the makefile to match the name of your C compiler
and to include appropriate compilation options in the CFLAGS
definition
-- make.
Special care should be taken in modifying any of the monetary calcu-
lations in DBGEN. These have proven to be particularly sensitive to
portability problems. If you decide to create the routines for inline
data load (see below), be sure to compare the resulting data to that
generated by a flat file data generation to be sure that all numeric
conversions have been correct.
If the compile generates errors, refer to "Compilation Options", below.
The problem you are encountering may already have been addressed in the
code.
If the compile is successful, but QGEN is not generating the appropriate
query syntax for your environment, refer to "Customizing QGEN", below.
For other problems, refer to "Reporting Problems" at the end of this
document.
4. Compilation Options
config.h and makefile.suite contain a number of compile time options intended
to make the process of porting the code provided with TPC-H/TPC-R as easy as
possible on a broad range of platforms. Most ports should consist of reviewing
the possible settings described in config.h and modifying the makefile
to employ them appropriately.
5. Customizing QGEN
QGEN relies on a number of vendor-specific conventions to generate
appropriate query syntax. These are controlled by #defines in tpcd.h,
and enabled by a #define in config.h. If you find that the syntax
generated by QGEN is not sufficient for your environment you will need
to modify these to files. It is strongly recomended that you not change
the general organization of the files.
Currently defined options are:
VTAG -- marks a variable substitution point [:]
QDIR_TAG -- environent variable which points to query templates
[DSS_QUERY]
GEN_QUERY_PLAN -- syntax to generate a query plan ["Set Explain On;"]
START_TRAN -- syntax to begin a transaction ["Begin Work;"]
END_TRAN -- syntax to end a transaction ["Commit Work;"]
SET_OUTPUT -- syntax to redirect query output ["Output to"]
SET_ROWCOUNT -- syntax to set the number of rows returned
["{return %d rows}"]
SET_DBASE -- syntax to connect to a database
6. Further Enhancements
load_stub.c provides entry points for two likely enhancements.
The ld_XXXX routines make it possible to load the
database directly from DBGEN without first writing the database
population out to the filesystem. This may prove particularly useful
when loading larger database populations. Be particularly careful about
monetary amounts. To assure portability, all monetary calcualtion are
done using long integers (which hold money amounts as a number of
pennies). These will need to be scaled to dollars and cents (by dividing
by 100), before the values are presented to the DBMS.
The hd_XXXX routines allow header information to be written before the
creation of the flat files. This should allow system which require
formatting information in database load files to use DBGEN with only
a small amount of custom code.
qgen.c defines the translation table for query templates in the
routine qsub().
varsub.c defines the parameter substitutions in the routine varsub().
If you are porting DBGEN to a machine that is not supports a native word
size larger that 32 bits, you may wish to modify the default values for
BITS_PER_LONG and MAX_LONG. These values are used in the generation of
the sparse primary keys in the order and lineitem tables. The code has
been structured to run on any machine supporting a 32 bit long, but
may be slightly more efficient on machines that are able to make use of
a larger native type.
7. Known Porting Problems
The current codeline will not compile under SunOS 4.1. Solaris 2.4 and later
are supported, and anyone wishing to use DBGEN on a Sun platform is
encouraged to use one of these OS releases.
8. Reporting Problems
The code provided with TPC-H/TPC-R has been written to be easily portable,
and has been tested on a wide variety of platforms, If you have any
trouble porting the code to your platform, please help us to correct
the problem in a later release by sending the following information
to the TPC D subcommittee:
Computer Make and Model
Compiler Type and Revision Number
Brief Description of the problem
Suggested modification to correct the problem