Merge pull request #285 from mabel-dev/opteryx-0.19.1
rschu1ze authored Jan 13, 2025
2 parents b5438d0 + 87b812d commit 6c877ba
Showing 3 changed files with 101 additions and 72 deletions.
85 changes: 57 additions & 28 deletions opteryx/README.md
@@ -1,30 +1,59 @@
# Opteryx

Opteryx is an in-process query engine written in Python/Cython, that uses Apache Arrow as its in-memory format. For more information, please check <https://opteryx.dev/>

We use the split parquet files for benchmarking and collect them into a folder. Opteryx is an ad hoc engine so there is no loading or preprocessing of the files before querying.

## Generate benchmark results

The steps are broadly to:
- create the environment
- install python
- download the benchmark
- run the benchmark

1. manually start a AWS EC2 instance
- Ubuntu 24
- 64-bit Architecture
- `c6a.4xlarge`
- Root Storage: 500GB gp2 SSD
- Advanced: EBS-optimized, disabled
1. wait for status check passed, then ssh to EC2 `ssh ec2-user@{ip}`
1. `sudo apt-get update -y`
1. `sudo apt-get install git -y`
1. `git clone https://github.com/ClickHouse/ClickBench`
1. `cd ClickBench/opteryx`
1. `sudo ./benchmark.sh`

### Know Issues:

1. Not all functions used in queries are supported
Opteryx is an in-process SQL query engine written in Python/Cython that leverages Apache Arrow as its in-memory format. Designed for ad hoc queries, Opteryx directly queries data from storage without requiring any preloading or preprocessing.

For more information, visit:

- [Opteryx Documentation](https://opteryx.dev/)
- [Opteryx GitHub Repository](https://github.com/mabel-dev/opteryx)

This page provides instructions for benchmarking Opteryx using the split Parquet files provided by ClickBench.

---

## Generating Benchmark Results

To generate benchmark results, follow these steps:

### **High-level Steps**
1. Set up the environment.
2. Install Python and required dependencies.
3. Download the benchmark dataset.
4. Run the benchmark script.

### **Detailed Instructions**

1. **Start an AWS EC2 instance**
- OS: Ubuntu 24
- Architecture: 64-bit
- Instance Type: `c6a.4xlarge`
- Root Storage: 500 GB gp2 SSD
- Advanced Settings: Ensure EBS-optimization is **disabled**.

2. **SSH into the instance** (after the status checks are complete):
~~~bash
# Ubuntu AMIs log in as "ubuntu"; "ec2-user" is the Amazon Linux default
ssh ubuntu@{ip}
~~~

3. **Update the package list and install Git**
~~~bash
sudo apt-get update -y
sudo apt-get install git -y
~~~

4. **Clone the ClickBench repository**
~~~bash
git clone https://github.com/ClickHouse/ClickBench
cd ClickBench/opteryx
~~~

5. **Run the benchmark script**
~~~bash
sudo ./benchmark.sh
~~~

### Known Issues

- `COUNT(DISTINCT a)` does not deduplicate values and instead behaves like `COUNT(a)`.
- Queries 28 and 29 fail due to errors with result handling.
- Queries 33 and 34 fail due to Out of Memory (OOM) errors.
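
To make the first issue concrete, the sketch below shows the result standard SQL expects from the two aggregates. It uses Python's built-in `sqlite3` module purely as a reference for SQL semantics; the table `t` and its values are made up for illustration, and nothing here calls Opteryx.

```python
import sqlite3

# In-memory SQLite database, used only as a reference for standard SQL
# semantics (this is not Opteryx).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER)")

# Five rows: the value 1 appears twice, plus 2, 3 and one NULL.
conn.executemany("INSERT INTO t (a) VALUES (?)", [(1,), (1,), (2,), (3,), (None,)])

count_all, count_distinct = conn.execute(
    "SELECT COUNT(a), COUNT(DISTINCT a) FROM t"
).fetchone()
conn.close()

print(count_all)       # non-NULL values, duplicates included
print(count_distinct)  # unique non-NULL values
```

On this table the two counts differ (4 versus 3), so an engine that evaluates `COUNT(DISTINCT a)` as `COUNT(a)` would report the larger number. Timings for affected queries may therefore not reflect the cost of a true distinct count.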
2 changes: 1 addition & 1 deletion opteryx/benchmark.sh
@@ -15,7 +15,7 @@ source ~/opteryx_venv/bin/activate

# Upgrade pip in the virtual environment
~/opteryx_venv/bin/python -m pip install --upgrade pip
-~/opteryx_venv/bin/python -m pip install --upgrade opteryx==0.19.0
+~/opteryx_venv/bin/python -m pip install --upgrade opteryx==0.19.1

# Download benchmark target data, partitioned
mkdir -p hits
86 changes: 43 additions & 43 deletions opteryx/results/c6a.4xlarge.json
@@ -1,57 +1,57 @@
{
"system": "Opteryx",
-"date": "2025-01-03",
+"date": "2025-01-12",
"machine": "c6a.4xlarge, 500gb gp2",
"cluster_size": 1,

-"tags": ["stateless", "column-oriented", "serverless"],
+"tags": ["stateless", "column-oriented", "serverless", "embedded"],

-"load_time": null,
-"data_size": null,
+"load_time": 0,
+"data_size": 14737666736,

"result": [
[77.68286019,29.084952828,23.55688197],
[77.810634329,26.0867356,24.86595307],
[79.230139546,26.686569177,25.762528345],
[78.450207475,26.440336982,25.289398001],
[77.991279847,26.273618918,24.844396223],
[83.918753482,32.00380127,30.428968437],
[79.125507787,27.020479631,25.59874012],
[77.767367135,25.913671621,24.453950072],
[78.92868145,28.000046786,24.995282663],
[79.940016765,30.305665005,26.720482348],
[77.681855849,27.025015144,24.132380214],
[77.97244604,27.349726129,24.829382675],
[81.067719941,33.869129645,27.722787227],
[81.215835941,33.269824954,29.266522612],
[82.033178767,34.644121029,29.201580797],
[82.878232208,30.349711897,27.770567938],
[88.171956465,43.084057134,40.66026422],
[86.764721785,45.994963951,42.136986603],
[99.861500416,74.165919709,77.014719177],
[77.770708577,25.8834046,24.761696048],
[101.896347422,101.921734572,101.743395915],
[76.097265519,46.621316244,45.688334477],
[72.403114574,64.190767252,63.094558033],
[160.218033383,155.974184601,155.741726312],
[77.901767908,32.446312498,28.024561571],
[77.721654432,33.410950191,32.842640114],
[77.204705381,33.824132809,33.485804124],
[1.251006235,0.080118607,0.079875281],
[2.645037241,0.327796665,0.322614463],
[6.387398368,1.504896835,1.499059588],
[7.460576192,0.987895168,0.966523026],
[4.401359451,0.912877849,0.893981607],
[12.320326644,7.069771109,7.092357523],
[3.935113921,1.243809051,1.243042595],
[1.896092628,0.332665942,0.333926139],
[5.798907738,1.688745059,1.660498791],
[8.716103374,2.802368571,2.76843743],
[4.906623416,0.858288399,0.864481993],
[4.643145917,0.91338155,0.917756443],
[8.638826312,5.027937472,5.024799922],
[11.103621078,5.122953438,5.059397517],
[9.971599601,5.994739178,6.056814905],
[8.363821252,5.146455632,5.095904078],
[17.399167033,11.560112941,11.425701327],
[16.370925705,10.964158442,10.830549828],
[47.285541349,37.693475389,37.785730663],
[3.050276403,0.450876149,0.448603179],
[88.568256935,75.190905253,75.117089773],
[31.465927223,15.890914096,15.977203933],
[57.700829953,29.783009247,29.692388078],
[189.13288871,105.996291875,106.223819633],
[8.940774668,2.822498171,2.811170418],
[9.213474468,5.615692754,5.627027588],
[10.768470076,4.447919066,4.454657129],
[65.428467698,48.413063046,48.393783439],
[null,null,null],
[null,null,null],
[10.685209962,4.279293566,4.232359597],
[17.715912453,6.756813817,6.639794835],
[41.617862558,32.592246341,32.431919028],
[null,null,null],
[80.331376591,34.774830422,29.366794209],
[82.104625252,36.307472442,31.302443854],
[112.26673808,89.699623603,95.464025202],
[null,null,null],
[null,null,null],
[83.955414574,37.929031861,31.710591612],
[77.749915126,26.084233865,23.806121945],
[77.918746245,26.841146726,24.742239257],
[77.689923627,26.506452677,23.814573353],
[77.955791328,26.489759285,24.764873077],
[77.715559694,26.623824908,24.599919408],
[77.857314939,25.607481945,24.885317652],
[78.133630836,25.635509722,22.651223805]
[9.469920016,6.516282466,6.50819504],
[2.398682936,0.799101635,0.795153701],
[1.824587198,0.570538801,0.535241727],
[2.176879841,0.575095047,0.555826225],
[3.528212778,1.696285802,1.66057755],
[1.711232974,0.347184312,0.347677032],
[1.645870038,0.331091157,0.326748534],
[4.054081036,2.734678917,2.710241577]
]
}
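
A results file in this shape can be summarized with a few lines of Python. The sketch below is illustrative only: it assumes each row of `result` holds the timings, in seconds, of three consecutive runs of one query, and that `null` marks a run that did not complete. The helper name `summarize` is hypothetical, and the sample embeds just two timing rows and one `null` row copied from the results above.

```python
import json

# Small excerpt in the same shape as c6a.4xlarge.json. A real file has one
# row per benchmark query; only three rows are embedded here.
SAMPLE = """
{
  "system": "Opteryx",
  "data_size": 14737666736,
  "result": [
    [1.251006235, 0.080118607, 0.079875281],
    [null, null, null],
    [2.645037241, 0.327796665, 0.322614463]
  ]
}
"""


def summarize(doc: dict) -> dict:
    """Count failed queries and compare the first run with the best rerun.

    Assumption: a row is [run 1, run 2, run 3] in seconds for one query.
    """
    rows = doc["result"]
    # A row counts as failed if any of its timings is null (None in Python).
    failed = [i for i, row in enumerate(rows) if any(t is None for t in row)]
    ok = [row for row in rows if all(t is not None for t in row)]
    return {
        "queries": len(rows),
        "failed_indexes": failed,  # zero-based positions in "result"
        "first_run_total": sum(row[0] for row in ok),
        "best_rerun_total": sum(min(row[1:]) for row in ok),
        "data_size_gib": doc["data_size"] / 2**30,  # bytes to GiB
    }


summary = summarize(json.loads(SAMPLE))
print(summary)
```

Reporting zero-based indexes keeps the output aligned with positions in the `result` array. Whether the query numbers quoted in the README's Known Issues are zero- or one-based is not stated in this commit, so check that before mapping one onto the other.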
