Skip to content

Commit

Permalink
Initial version of the benchmark
Browse files Browse the repository at this point in the history
  • Loading branch information
tvondra committed Feb 18, 2012
1 parent 7c09977 commit 752a739
Show file tree
Hide file tree
Showing 33 changed files with 1,692 additions and 0 deletions.
24 changes: 24 additions & 0 deletions LICENSE
Original file line number Diff line number Diff line change
@@ -0,0 +1,24 @@
Copyright (c) 2011, Tomas Vondra <[email protected]>
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
* Neither the name of the <organization> nor the
names of its contributors may be used to endorse or promote products
derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL <COPYRIGHT HOLDER> BE LIABLE FOR ANY
DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
Empty file removed README
Empty file.
107 changes: 107 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,107 @@
TPC-H PostgreSQL benchmark
==========================
This repository contains a simple implementation that runs a TPC-H-like
benchmark with a PostgreSQL database. It builds on the official TPC-H
benchmark available at http://tpc.org/tpch/default.asp (uses just the
dbgen a qgen parts).


Preparing dbgen and qgen
------------------------
The first thing you need to do is to prepare the tool that generates
data and queries. This step is more thoroughly explained at my blog at

http://www.fuzzy.cz/en/articles/dss-tpc-h-benchmark-with-postgresql/

but let's briefly repeat what needs to be done.

First, download the TPC-H benchmark from http://tpc.org/tpch/default.asp
and extract it to a directory

$ wget http://tpc.org/tpch/spec/tpch_2_14_3.tgz
$ mkdir tpch
$ tar -xzf tpch_2_14_3.tgz -C tpch

and then prepare the Makefile - create a copy from makefile.suite

$ cd tpch/dbgen
$ cp makefile.suite Makefile
$ nano Makefile

and modify it so that it contains this (around line 110)

CC=gcc
DATABASE=ORACLE
MACHINE=LINUX
WORKLOAD=TPCH

and compile it using `make` as usual. Now you should have `dbgen` and
`qgen` tools that generate data and queries.


Generating data
---------------
Right, so let's generate the data using the `dbgen` tool - there's one
important parameter 'scale' that influences the amount of data. It's
roughly equal to number of GB of raw data, so to generate 10GB of data
just do

$ ./dbgen -s 10

which creates a bunch of .tbl files in Oracle-like CSV format

$ ls *.tbl

and to convert them to a CSV format compatible with PostgreSQL, do this

$ for i in `ls *.tbl`; do sed 's/|$//' $i > ${i/tbl/csv}; echo $i; done;

Finally, move these data to the 'dss/data' directory or somewhere else,
and create a symlink to /tmp/dss-data (that's where tpch-load.sql is
looking for for the data from).

It's a good idea to place this directory on a ramdrive so that it does not
influence the benchmark (e.g. it's a very bad idea to place the data on the
same drive as PostgreSQL data directory).


Generating queries
------------------
Now we have to generate queries from templates specified in TPC-H benchmark.
The templates provided at tpch.org are not suitable for PostgreSQL. So
I have provided slightly modified queries in the 'dss/templates' directory
and you should place the queries in 'dss/queries' dir.

for q in `seq 1 22`
do
DSS_QUERY=dss/templates ./qgen $q >> dss/queries/$q.sql
sed 's/^select/explain select/' dss/queries/$q.sql > dss/queries/$q.explain.sql
cat dss/queries/$i.sql >> dss/queries/$i.explain.sql;
done

Now you should have 44 files in the dss/queries directory. 22 of them will
actually run the queries and the other 22 will generate EXPLAIN plan of
the query (without actually running it).


Running the benchmark
---------------------
The actual benchmark is implemented in the 'tpch.sh' script. It expects
an already prepared database and four parameters - directory where to place
the results, database and user name. So to run it, do this:

$ ./tpch.sh ./results tpch-db tpch-user

and wait until the benchmark.


Processing the results
----------------------
All the results are written into the output directory (first parameter). To get
useful results (timing of each query, various statistics), you can use script
process.php. It expects two parameters - input dir (with data collected by the
tpch.sh script) and output file (in CSV format). For example like this:

$ php process.php ./results output.csv

This should give you nicely formatted CSV file.
3 changes: 3 additions & 0 deletions drop-caches.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
#!/bin/sh

echo "3" > /proc/sys/vm/drop_caches
28 changes: 28 additions & 0 deletions dss/templates/1.sql
Original file line number Diff line number Diff line change
@@ -0,0 +1,28 @@
-- $ID$
-- TPC-H/TPC-R Pricing Summary Report Query (Q1)
-- Functional Query Definition
-- Approved February 1998
:x
:o
select
l_returnflag,
l_linestatus,
sum(l_quantity) as sum_qty,
sum(l_extendedprice) as sum_base_price,
sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
avg(l_quantity) as avg_qty,
avg(l_extendedprice) as avg_price,
avg(l_discount) as avg_disc,
count(*) as count_order
from
lineitem
where
l_shipdate <= date '1998-12-01' - interval ':1' day
group by
l_returnflag,
l_linestatus
order by
l_returnflag,
l_linestatus
LIMIT 1;
38 changes: 38 additions & 0 deletions dss/templates/10.sql
Original file line number Diff line number Diff line change
@@ -0,0 +1,38 @@
-- $ID$
-- TPC-H/TPC-R Returned Item Reporting Query (Q10)
-- Functional Query Definition
-- Approved February 1998
:x
:o
select
c_custkey,
c_name,
sum(l_extendedprice * (1 - l_discount)) as revenue,
c_acctbal,
n_name,
c_address,
c_phone,
c_comment
from
customer,
orders,
lineitem,
nation
where
c_custkey = o_custkey
and l_orderkey = o_orderkey
and o_orderdate >= date ':1'
and o_orderdate < date ':1' + interval '3' month
and l_returnflag = 'R'
and c_nationkey = n_nationkey
group by
c_custkey,
c_name,
c_acctbal,
c_phone,
n_name,
c_address,
c_comment
order by
revenue desc
LIMIT 20;
34 changes: 34 additions & 0 deletions dss/templates/11.sql
Original file line number Diff line number Diff line change
@@ -0,0 +1,34 @@
-- $ID$
-- TPC-H/TPC-R Important Stock Identification Query (Q11)
-- Functional Query Definition
-- Approved February 1998
:x
:o
select
ps_partkey,
sum(ps_supplycost * ps_availqty) as value
from
partsupp,
supplier,
nation
where
ps_suppkey = s_suppkey
and s_nationkey = n_nationkey
and n_name = ':1'
group by
ps_partkey having
sum(ps_supplycost * ps_availqty) > (
select
sum(ps_supplycost * ps_availqty) * :2
from
partsupp,
supplier,
nation
where
ps_suppkey = s_suppkey
and s_nationkey = n_nationkey
and n_name = ':1'
)
order by
value desc
LIMIT 1;
35 changes: 35 additions & 0 deletions dss/templates/12.sql
Original file line number Diff line number Diff line change
@@ -0,0 +1,35 @@
-- $ID$
-- TPC-H/TPC-R Shipping Modes and Order Priority Query (Q12)
-- Functional Query Definition
-- Approved February 1998
:x
:o
select
l_shipmode,
sum(case
when o_orderpriority = '1-URGENT'
or o_orderpriority = '2-HIGH'
then 1
else 0
end) as high_line_count,
sum(case
when o_orderpriority <> '1-URGENT'
and o_orderpriority <> '2-HIGH'
then 1
else 0
end) as low_line_count
from
orders,
lineitem
where
o_orderkey = l_orderkey
and l_shipmode in (':1', ':2')
and l_commitdate < l_receiptdate
and l_shipdate < l_commitdate
and l_receiptdate >= date ':3'
and l_receiptdate < date ':3' + interval '1' year
group by
l_shipmode
order by
l_shipmode
LIMIT 1;
27 changes: 27 additions & 0 deletions dss/templates/13.sql
Original file line number Diff line number Diff line change
@@ -0,0 +1,27 @@
-- $ID$
-- TPC-H/TPC-R Customer Distribution Query (Q13)
-- Functional Query Definition
-- Approved February 1998
:x
:o
select
c_count,
count(*) as custdist
from
(
select
c_custkey,
count(o_orderkey)
from
customer left outer join orders on
c_custkey = o_custkey
and o_comment not like '%:1%:2%'
group by
c_custkey
) as c_orders (c_custkey, c_count)
group by
c_count
order by
custdist desc,
c_count desc
LIMIT 1;
20 changes: 20 additions & 0 deletions dss/templates/14.sql
Original file line number Diff line number Diff line change
@@ -0,0 +1,20 @@
-- $ID$
-- TPC-H/TPC-R Promotion Effect Query (Q14)
-- Functional Query Definition
-- Approved February 1998
:x
:o
select
100.00 * sum(case
when p_type like 'PROMO%'
then l_extendedprice * (1 - l_discount)
else 0
end) / sum(l_extendedprice * (1 - l_discount)) as promo_revenue
from
lineitem,
part
where
l_partkey = p_partkey
and l_shipdate >= date ':1'
and l_shipdate < date ':1' + interval '1' month
LIMIT 1;
40 changes: 40 additions & 0 deletions dss/templates/15.sql
Original file line number Diff line number Diff line change
@@ -0,0 +1,40 @@
-- $ID$
-- TPC-H/TPC-R Top Supplier Query (Q15)
-- Functional Query Definition
-- Approved February 1998
:x
create view revenue:s (supplier_no, total_revenue) as
select
l_suppkey,
sum(l_extendedprice * (1 - l_discount))
from
lineitem
where
l_shipdate >= date ':1'
and l_shipdate < date ':1' + interval '3' month
group by
l_suppkey;

:o
select
s_suppkey,
s_name,
s_address,
s_phone,
total_revenue
from
supplier,
revenue:s
where
s_suppkey = supplier_no
and total_revenue = (
select
max(total_revenue)
from
revenue:s
)
order by
s_suppkey
LIMIT 1;

drop view revenue:s;
Loading

0 comments on commit 752a739

Please sign in to comment.