This project compares data formats by serialized size and compression time.
It generates N Library data structures with N Book entries, then serializes them in three formats:
- Avro (AvroLibraryCompressionTest)
  - We write a single Avro record (with nested "Book" array, "Author") to different files:
    - No compression
    - Snappy
    - Deflate
  - Compare file sizes and time taken.
- JSON (JsonLibraryCompressionTest)
  - We serialize the same structure to JSON (using Jackson).
  - Then write:
    - no-compression .json
    - gzip-compressed .json.gz
    - snappy-compressed .json.snappy
  - Compare sizes and performance.
- Protobuf (ProtobufLibraryCompressionTest)
  - Using the same concept (library, books, author).
  - We generate a Library message, serialize to bytes, then store:
    - raw .bin
    - gzip
    - snappy
  - Compare file sizes and serialization speed.
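The size and timing comparison described above can be sketched with the JDK alone. This is a minimal illustration, not the project's actual test code: the class name `CompressionSketch`, the JSON field names (`name`, `books`, `title`, `author`) and the hand-built JSON are assumptions made to keep the example free of dependencies (the real tests use Jackson, Avro and Protobuf). Snappy is left out because it is not part of the JDK.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper class; not part of the project.
class CompressionSketch {

    // Builds a JSON document for one library with bookCount books.
    // Field names are assumed for illustration only.
    static String buildLibraryJson(int bookCount) {
        StringBuilder sb = new StringBuilder("{\"name\":\"Library\",\"books\":[");
        for (int i = 0; i < bookCount; i++) {
            if (i > 0) sb.append(',');
            sb.append("{\"title\":\"Book ").append(i)
              .append("\",\"author\":{\"name\":\"Author ").append(i).append("\"}}");
        }
        return sb.append("]}").toString();
    }

    // Compresses the input with gzip; closing the stream flushes the trailer.
    static byte[] gzip(byte[] input) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (OutputStream out = new GZIPOutputStream(buffer)) {
            out.write(input);
        }
        return buffer.toByteArray();
    }

    // Compresses the input with the JDK's default (zlib-wrapped) deflate.
    static byte[] deflate(byte[] input) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (OutputStream out = new DeflaterOutputStream(buffer)) {
            out.write(input);
        }
        return buffer.toByteArray();
    }

    // Decompresses gzip data, used here only to check the round trip.
    static byte[] gunzip(byte[] compressed) throws IOException {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return in.readAllBytes();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] raw = buildLibraryJson(1000).getBytes(StandardCharsets.UTF_8);

        long start = System.nanoTime();
        byte[] gz = gzip(raw);
        long gzipMicros = (System.nanoTime() - start) / 1_000;

        start = System.nanoTime();
        byte[] df = deflate(raw);
        long deflateMicros = (System.nanoTime() - start) / 1_000;

        System.out.printf("raw:     %d bytes%n", raw.length);
        System.out.printf("gzip:    %d bytes in %d us%n", gz.length, gzipMicros);
        System.out.printf("deflate: %d bytes in %d us%n", df.length, deflateMicros);
    }
}
```

A single timed call like this includes JIT warm-up, so the numbers are only a rough indication; repeating each measurement several times and discarding the first runs gives steadier results.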
- Java 11+
- Maven 3.6+
You may need to generate LibraryOuterClass from library.proto with protoc first.
On Windows:
protoc.exe --java_out=. .\library.proto
On Linux:
protoc --java_out=. library.proto
Then compile and run:
mvn clean compile
# Windows
mvn exec:java -D"exec.mainClass=com.github.paf91.compressiontest.JsonLibraryCompressionTest"
mvn exec:java -D"exec.mainClass=com.github.paf91.compressiontest.AvroLibraryCompressionTest"
mvn exec:java -D"exec.mainClass=com.github.paf91.compressiontest.ProtobufLibraryCompressionTest"
# Linux
mvn exec:java -Dexec.mainClass="com.github.paf91.compressiontest.AvroLibraryCompressionTest"
mvn exec:java -Dexec.mainClass="com.github.paf91.compressiontest.JsonLibraryCompressionTest"
mvn exec:java -Dexec.mainClass="com.github.paf91.compressiontest.ProtobufLibraryCompressionTest"
# Line-by-line comparison:
# Windows
mvn exec:java -D"exec.mainClass=com.github.paf91.compressiontest.JsonLibraryCompressionTestLineByLine"
mvn exec:java -D"exec.mainClass=com.github.paf91.compressiontest.AvroLibraryCompressionTestLineByLine"
mvn exec:java -D"exec.mainClass=com.github.paf91.compressiontest.ProtobufLibraryCompressionTestLineByLine"
# Linux
mvn exec:java -Dexec.mainClass="com.github.paf91.compressiontest.AvroLibraryCompressionTestLineByLine"
mvn exec:java -Dexec.mainClass="com.github.paf91.compressiontest.JsonLibraryCompressionTestLineByLine"
mvn exec:java -Dexec.mainClass="com.github.paf91.compressiontest.ProtobufLibraryCompressionTestLineByLine"