
Authors: Saeed Fathollahzadeh, Kia Teymourian, Chris Jermaine

Paper Name: How Object Model and Its Implementation Impacts Performance of Big Data Systems: A Controlled Experimental Study

Paper Links:

Link
PDF

Source Code Info:

Repository: serialization (https://github.com/fathollahzadeh/serialization)
Programming Language: C++, Java, Rust
Packages/Libraries Needed: OpenJDK, Clang++, Rust

Dataset Used:

Twitter: https://www.cs.utexas.edu/~kiat/datasets/tweets_1M_rows.txt.bz2

Hardware and Software Info: The experiments use both a scale-up, one-socket AMD server, and a scale-out cluster of 7 single-socket Intel servers:

Scale-up node: AMD EPYC 7302 CPU @ 3.0-3.3 GHz (16 physical/32 virtual cores) with 512 KB, 8 MB, and 128 MB L1, L2, and L3 caches, and 128 GB DDR4 RAM (peak performance is 768 GFLOP/s, 183.2 GB/s). Java-based implementations run on OpenJDK 11 with the maximum and initial JVM heap sizes both set to 120 GB.
Scale-out cluster: 7 nodes, each a single-socket N2-series server on Google Cloud Platform with 4 vCPU cores (Intel Cascade Lake), 32 GB RAM, and 750 GB (2× 375 GB) local SSD (sustained random IOPS limit: read = 340,000, write = 180,000). Java-based implementations run on OpenJDK 11 with the maximum and initial JVM heap sizes both set to 28 GB.
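
As an illustration of the heap settings above, the sketch below builds the standard JVM flags that set the initial (`-Xms`) and maximum (`-Xmx`) heap to the same size. The helper name `jvm_heap_flags` and the jar name `benchmark.jar` are hypothetical; the repository's own scripts may configure the heap differently.

```shell
#!/usr/bin/env bash
# Print JVM flags that set the initial (-Xms) and maximum (-Xmx) heap
# to the same size, given in gigabytes.
# NOTE: jvm_heap_flags is a hypothetical helper, not part of the repository.
jvm_heap_flags() {
    local size_gb="$1"
    printf -- '-Xms%sg -Xmx%sg\n' "$size_gb" "$size_gb"
}

# Heap sizes taken from the hardware description:
# 120 GB on the scale-up node, 28 GB on each scale-out node.
SCALE_UP_FLAGS="$(jvm_heap_flags 120)"
SCALE_OUT_FLAGS="$(jvm_heap_flags 28)"

# benchmark.jar is a placeholder name used only for illustration.
echo "scale-up:  java ${SCALE_UP_FLAGS} -jar benchmark.jar"
echo "scale-out: java ${SCALE_OUT_FLAGS} -jar benchmark.jar"
```

Setting both flags to the same value keeps the JVM from resizing the heap during a run, which is one plausible reason for fixing the maximum and initial sizes together.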

Setup and Experiments: The entire experimental evaluation can be run via ./runAll.sh, which deletes the results and plots and performs setup, dataset download and preparation, local and distributed experiments, and plotting. However, for a more controlled evaluation, we recommend running the individual steps separately:

    ./run1SetupDependencies.sh
    ./run2SetupSerialization.sh
    ./run3DownloadData.sh
    ./run4PrepareData.sh
    ./run5LocalExperiments.sh # on scale-up node
    ./run5LocalExperimentsExternalSort.sh # on scale-up GC node
    ./run6NetworkExperiments.sh # on scale-out GC cluster
    ./run7LocalMemoryExperiments.sh # on scale-up node
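
When running the steps individually, it can help to stop at the first failure instead of continuing with a broken setup. The following is a minimal sketch of such a wrapper, not part of the repository: the function `run_steps` and the `DRY_RUN` variable are assumptions introduced here, and only the script names come from the list above.

```shell
#!/usr/bin/env bash
# Run each given script from the current directory, in order,
# and stop at the first one that fails.
# With DRY_RUN=1 the step names are only printed, nothing is executed.
# NOTE: run_steps and DRY_RUN are hypothetical, not part of the repository.
run_steps() {
    local step
    for step in "$@"; do
        echo "==> ${step}"
        if [ "${DRY_RUN:-0}" = "1" ]; then
            continue
        fi
        if ! "./${step}"; then
            echo "step failed: ${step}" >&2
            return 1
        fi
    done
}

# Setup and data steps listed above without a machine-specific comment.
COMMON_STEPS="run1SetupDependencies.sh run2SetupSerialization.sh run3DownloadData.sh run4PrepareData.sh"

# Preview the common steps without executing them.
# COMMON_STEPS is left unquoted on purpose so it splits into one argument per script.
DRY_RUN=1 run_steps $COMMON_STEPS
```

On a real machine, calling `run_steps $COMMON_STEPS` without `DRY_RUN` from the repository root would execute the scripts; the experiment scripts (steps 5 to 7) can then be started on the scale-up node or the scale-out cluster as noted in the comments above.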

Object Relationships and Ratio Frequency of Tweet Objects (for one million tweet dataset):

Last Update: Apr 26, 2023 (first draft)
