Rewrite Huffman Coding Implementation #196

Draft
wants to merge 58 commits into
base: master
c63dbfd
Move Huffman tree functions to separate header.
ben-e-whitney May 10, 2022
f43b69f
Delete unused comparison function.
ben-e-whitney May 10, 2022
258bec8
Replace `size_t` with `std::size_t`.
ben-e-whitney May 10, 2022
bf1d545
Replace `malloc` calls with `new` expressions.
ben-e-whitney May 11, 2022
444541d
Replace `new_htree_node` with a constructor.
ben-e-whitney May 11, 2022
362a5d5
Add `const` to Huffman tree variable types.
ben-e-whitney May 11, 2022
a8d3119
Use `nullptr` instead of `0` for pointer values.
ben-e-whitney May 11, 2022
4125d93
Pass `huffman_encoding` parameters by reference.
ben-e-whitney May 11, 2022
846e8c7
Use `std::vector` for Huffman codec array.
ben-e-whitney May 11, 2022
9ff8b96
Gather codecs and frequency table into struct.
ben-e-whitney May 11, 2022
6a249be
Add `Bits` to allow iteration over bits of array.
ben-e-whitney May 16, 2022
24f3052
Allow nonzero end bit offsets in `Bits`.
ben-e-whitney May 18, 2022
8888b56
Add Huffman encoding regression tests.
ben-e-whitney May 28, 2022
d513127
Reimplement Huffman encoding with `HuffmanCode`.
ben-e-whitney May 31, 2022
44cdc12
Return struct from rewritten Huffman encoder.
ben-e-whitney May 31, 2022
38a4d96
Return struct from original Huffman encoder.
ben-e-whitney Jun 1, 2022
83f31e0
Avoid buffer copies in `huffman_encoding`.
ben-e-whitney Jun 1, 2022
c52d44e
Separately copy hit buffer, trailing zero bytes.
ben-e-whitney Jun 2, 2022
7a757a6
Add Huffman compression regression tests.
ben-e-whitney Jun 2, 2022
58c13e3
Reimplement Huffman compression with constituents.
ben-e-whitney Jun 2, 2022
69d0e72
Remove timing statements.
ben-e-whitney Jun 3, 2022
f049be1
Return struct from original Huffman decoder.
ben-e-whitney Jun 3, 2022
81cc00b
Add Huffman decoding regression tests.
ben-e-whitney Jun 3, 2022
0da8b2c
Reimplement Huffman decoding with `Bits`.
ben-e-whitney Jun 6, 2022
55dc22d
Use `HuffmanCode` in decoding reimplementation.
ben-e-whitney Jun 6, 2022
e50685d
Add `sizeof` checks to Huffman reimplementations.
ben-e-whitney Jun 7, 2022
8be8f87
Remove `compress_memory_huffman` from library.
ben-e-whitney Jun 7, 2022
e5eb83f
Add Huffman decompression regression tests.
ben-e-whitney Jun 7, 2022
a59ebeb
Add Huffman decompression reimplementation.
ben-e-whitney Jun 7, 2022
cc78644
Remove `huffman_{en,de}coding` from library.
ben-e-whitney Jun 8, 2022
2a46192
Rename reimplemented Huffman functions.
ben-e-whitney Jun 8, 2022
7d2825b
Copy input buffer in legacy Huffman encoder.
ben-e-whitney Jun 8, 2022
4fd5399
Directly set `HuffmanCode` endpoints.
ben-e-whitney Jun 8, 2022
9f112da
Fix calculation of `HuffmanCode::ncodewords`.
ben-e-whitney Jun 9, 2022
6d61980
Generalize function to parse header from buffer.
ben-e-whitney Jun 9, 2022
501183c
Add Huffman encoding with protocol buffer header.
ben-e-whitney Jun 9, 2022
fec4c94
Add static data member for default symbol range.
ben-e-whitney Jun 9, 2022
ebf3c34
Separate codeword decoding, missed buffer lookup.
ben-e-whitney Jun 9, 2022
1c44399
Pass index–frequency pair range as iterator pair.
ben-e-whitney Jun 9, 2022
1532c66
Add function to check quantization buffer size.
ben-e-whitney Jun 10, 2022
56004d2
Automatically calculate Huffman hit buffer size.
ben-e-whitney Jun 15, 2022
25e639d
Add `HuffmanEncodedStream` {,de}serializer.
ben-e-whitney Jun 15, 2022
a05703d
Select serialization compressor at runtime.
ben-e-whitney Jun 15, 2022
3e53ce4
Enable `RFMH` in `{,de}compress`.
ben-e-whitney Jun 14, 2022
d0fe35f
Add tests for `RFMH` in `{,de}compress`.
ben-e-whitney Jun 16, 2022
309823d
Rename `compressors.hpp` to `lossless.hpp`.
ben-e-whitney Jun 20, 2022
0e6c563
Separate lossless compressor implementations.
ben-e-whitney Jun 20, 2022
b697831
Contain `z_const` casts to `lossless_zlib.cpp`.
ben-e-whitney Jun 20, 2022
0848131
Rename lossless compression functions.
ben-e-whitney Jun 20, 2022
48529f9
Change argument order in periodic data tests.
ben-e-whitney Jun 20, 2022
547d4e0
Remove unused `NOOP_COMPRESSOR` decompressor.
ben-e-whitney Jun 20, 2022
021e330
Add `Chain` to allow iterator range concatenation.
ben-e-whitney Jun 21, 2022
1d07149
Limit sizes of frequency and 'missed' subtables.
ben-e-whitney Jun 28, 2022
f887c7d
Add comments motivating `Supertable`, `Chain` use.
ben-e-whitney Jun 30, 2022
6e281a6
Add member functions for common size computations.
ben-e-whitney Jun 30, 2022
aeb89a9
Fix CPU lossless in MGARD-X
JieyangChen7 Jul 14, 2022
eed1eb1
Put CartesianProduct and CartesianProduct::iterator in #ifndef __NVCC…
JieyangChen7 Jul 18, 2022
4ef023d
Add quantization type function template.
ben-e-whitney Jul 19, 2022
12 changes: 9 additions & 3 deletions CMakeLists.txt
@@ -11,11 +11,11 @@ endif()
 list(INSERT CMAKE_MODULE_PATH 0 "${CMAKE_CURRENT_LIST_DIR}/cmake")

 set(MGARD_VERSION_MAJOR "1")
-set(MGARD_VERSION_MINOR "2")
+set(MGARD_VERSION_MINOR "3")
 set(MGARD_VERSION_PATCH "0")

 set(MGARD_FILE_VERSION_MAJOR "1")
-set(MGARD_FILE_VERSION_MINOR "0")
+set(MGARD_FILE_VERSION_MINOR "1")
 set(MGARD_FILE_VERSION_PATCH "0")

 project(
@@ -201,9 +201,15 @@ set(
 MGARD_LIBRARY_CPP
 src/compress.cpp
 src/compress_internal.cpp
-src/compressors.cpp
+src/utilities.cpp
+src/huffman.cpp
+src/lossless_zlib.cpp
+src/lossless_dispatcher.cpp
 src/format.cpp
 )
+if(zstd_FOUND)
+list(APPEND MGARD_LIBRARY_CPP src/lossless_zstd.cpp)
+endif()

 set(MAXIMUM_DIMENSION 4 CACHE STRING "Maximum supported dimension for self-describing decompression.")

4 changes: 1 addition & 3 deletions include/compress.tpp
@@ -20,16 +20,14 @@
 #include "MGARDConfig.hpp"
 #include "TensorMultilevelCoefficientQuantizer.hpp"
 #include "TensorNorms.hpp"
-#include "compressors.hpp"
 #include "decompose.hpp"
 #include "format.hpp"
+#include "lossless.hpp"
 #include "quantize.hpp"
 #include "shuffle.hpp"

 namespace mgard {

-using DEFAULT_INT_T = std::int64_t;
-
 template <std::size_t N, typename Real>
 CompressedDataset<N, Real>
 compress(const TensorMeshHierarchy<N, Real> &hierarchy, Real *const v,
2 changes: 1 addition & 1 deletion include/compress_internal.tpp
@@ -1,8 +1,8 @@
 #include <cstdlib>

 #include "compress.hpp"
-#include "compressors.hpp"
 #include "decompose.hpp"
+#include "lossless.hpp"
 #include "quantize.hpp"
 #include "shuffle.hpp"

30 changes: 23 additions & 7 deletions include/format.hpp
@@ -66,6 +66,14 @@ serialize_header_crc32(std::uint_least64_t crc32);
 //!\param p Pointer whose alignment will be checked.
 template <typename T> void check_alignment(void const *const p);

+//! Check that a quantization buffer has the right alignment and a valid size.
+//!
+//!\param header Self-describing dataset header.
+//!\param p Quantization buffer.
+//!\param n Size in bytes of quantization buffer.
+void check_quantization_buffer(const pb::Header &header, void const *const p,
+                               const std::size_t n);
+
 //! Determine whether an integral type is big endian.
 template <typename Int> bool big_endian();

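The `check_quantization_buffer` declaration above describes two checks, alignment and size, without showing their bodies. The following is a minimal sketch of what such checks could look like, not the PR's implementation. `check_buffer` and `check_buffer_example` are hypothetical names, the integer type is passed as a template parameter instead of being read from a `pb::Header`, and the rule that a valid size is a whole multiple of `sizeof(Int)` is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>

// Hypothetical stand-in for `check_quantization_buffer`. The real function
// reads the integer type from a `pb::Header`; here it is a template parameter.
template <typename Int>
void check_buffer(void const *const p, const std::size_t n) {
  // Alignment: the address must be a multiple of `alignof(Int)`.
  if (reinterpret_cast<std::uintptr_t>(p) % alignof(Int) != 0) {
    throw std::invalid_argument("buffer is misaligned");
  }
  // Size (assumed rule): the buffer must hold a whole number of `Int`s.
  if (n % sizeof(Int) != 0) {
    throw std::invalid_argument("buffer size is not a multiple of element size");
  }
}

// Usage: an aligned 16-byte buffer passes; a 12-byte size is rejected.
inline void check_buffer_example() {
  alignas(std::int64_t) unsigned char storage[16] = {};
  check_buffer<std::int64_t>(storage, sizeof(storage));
  bool rejected = false;
  try {
    check_buffer<std::int64_t>(storage, 12);
  } catch (const std::invalid_argument &) {
    rejected = true;
  }
  assert(rejected);
}
```

Throwing on failure matches the style of the other checks in this diff, which report errors through exceptions.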
@@ -74,6 +82,11 @@ template <typename Int> bool big_endian();
 //!\return `Dataset::Type` corresponding to `Real`.
 template <typename Real> pb::Dataset::Type type_to_dataset_type();

+//! Return the `Quantization::Type` value corresponding to an integral type.
+//!
+//!\return `Quantization::Type` corresponding to `Int`.
+template <typename Int> pb::Quantization::Type type_to_quantization_type();
+
 //! Allocate a quantization buffer of the proper alignment and size.
 //!
 //!\param header Self-describing dataset header.
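A function template like `type_to_quantization_type`, which takes no arguments and maps a type to an enumerator, is commonly implemented with explicit specializations. The sketch below illustrates that pattern under stated assumptions: `QuantizationType` and its enumerators are hypothetical stand-ins for `pb::Quantization::Type`, and the set of supported integer types is a guess, not taken from the PR.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for `pb::Quantization::Type`.
enum class QuantizationType { Int8, Int16, Int32, Int64 };

// Primary template is deleted, so an unsupported type is a compile-time error.
template <typename Int> QuantizationType to_quantization_type() = delete;

// One explicit specialization per supported integer type.
template <> inline QuantizationType to_quantization_type<std::int8_t>() {
  return QuantizationType::Int8;
}
template <> inline QuantizationType to_quantization_type<std::int16_t>() {
  return QuantizationType::Int16;
}
template <> inline QuantizationType to_quantization_type<std::int32_t>() {
  return QuantizationType::Int32;
}
template <> inline QuantizationType to_quantization_type<std::int64_t>() {
  return QuantizationType::Int64;
}

// Usage: the mapping is selected entirely by the template argument.
inline void to_quantization_type_example() {
  assert(to_quantization_type<std::int64_t>() == QuantizationType::Int64);
}
```

Deleting the primary template, rather than leaving it undefined, moves the failure for an unsupported type from link time to compile time.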
@@ -165,16 +178,19 @@ pb::Header read_metadata(BufferWindow &window);
 //!\param header Header of the self-describing buffer.
 void write_metadata(std::ostream &ostream, const pb::Header &header);

-//! Parse the header of a self-describing buffer.
+template <typename T>
+//! Parse a message from a buffer window.
 //!
 //! The buffer pointer will be advanced past the header.
 //!
-//!\param window Window into the self-describing buffer. The current position
-//! should be the start of the header.
-//!\param header_size Size in bytes of the header.
-//!\return Header of the self-describing buffer.
-pb::Header read_header(BufferWindow &window,
-                       const std::uint_least64_t header_size);
+//! This function was originally written to parse the header from a
+//! self-describing buffer.
+//
+//!\param window Buffer window containing the serialized message. The current
+//! position should be the start of the message.
+//!\param nmessage Size in bytes of the message.
+//!\return Parsed message.
+T read_message(BufferWindow &window, const std::uint_least64_t nmessage);

 //! Check that a dataset was compressed with a compatible version of MGARD.
 //!
22 changes: 22 additions & 0 deletions include/format.tpp
@@ -61,4 +61,26 @@ template <typename Int> bool big_endian() {
   return not*reinterpret_cast<unsigned char const *>(&n);
 }

+template <typename T>
+T read_message(BufferWindow &window, const std::uint_least64_t nmessage) {
+  // The `CodedInputStream` constructor takes an `int`.
+  if (nmessage > std::numeric_limits<int>::max()) {
+    throw std::runtime_error("message is too large (size would overflow)");
+  }
+  // Check that the read will stay in the buffer.
+  unsigned char const *const next = window.next(nmessage);
+  T message;
+  google::protobuf::io::CodedInputStream stream(
+      static_cast<google::protobuf::uint8 const *>(window.current), nmessage);
+  if (not message.ParseFromCodedStream(&stream)) {
+    throw std::runtime_error(
+        "message parsing encountered read or format error");
+  }
+  if (not stream.ConsumedEntireMessage()) {
+    throw std::runtime_error("part of message left unparsed");
+  }
+  window.current = next;
+  return message;
+}
+
 } // namespace mgard
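The `read_message` template above follows a check-then-commit pattern: validate the size, confirm the read stays inside the buffer, parse, and only then advance the window. The sketch below reproduces that control flow without the protobuf dependency so it can be compiled on its own. `Window`, `read_checked`, and `parse_u32_be` are hypothetical names; `Window` only imitates the two members of `BufferWindow` that the diff uses, and the real class may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <stdexcept>

// Hypothetical stand-in for `mgard::BufferWindow`.
struct Window {
  unsigned char const *current;
  unsigned char const *end;

  // Position `n` bytes past `current`; throws if that leaves the buffer.
  unsigned char const *next(const std::size_t n) const {
    if (n > static_cast<std::size_t>(end - current)) {
      throw std::runtime_error("read would run past end of buffer");
    }
    return current + n;
  }
};

// Same steps as `read_message`, with the parser passed in as a callable
// `bool parse(unsigned char const *, int, T &)`.
template <typename T, typename Parser>
T read_checked(Window &window, const std::uint_least64_t nmessage,
               Parser parse) {
  // The parser takes an `int` size, so reject anything larger.
  if (nmessage >
      static_cast<std::uint_least64_t>(std::numeric_limits<int>::max())) {
    throw std::runtime_error("message is too large (size would overflow)");
  }
  // Bounds check happens before any byte is read.
  unsigned char const *const next =
      window.next(static_cast<std::size_t>(nmessage));
  T message{};
  if (!parse(window.current, static_cast<int>(nmessage), message)) {
    throw std::runtime_error("message parsing encountered read or format error");
  }
  // Commit: advance the window only after a successful parse.
  window.current = next;
  return message;
}

// Toy parser: a big-endian 32-bit integer occupying exactly four bytes.
inline bool parse_u32_be(unsigned char const *p, int n, std::uint32_t &out) {
  if (n != 4) {
    return false;
  }
  out = (static_cast<std::uint32_t>(p[0]) << 24) |
        (static_cast<std::uint32_t>(p[1]) << 16) |
        (static_cast<std::uint32_t>(p[2]) << 8) |
        static_cast<std::uint32_t>(p[3]);
  return true;
}

// Usage: read one value and confirm the window advanced past it.
inline void read_checked_example() {
  const unsigned char buffer[] = {0x00, 0x00, 0x01, 0x02};
  Window window{buffer, buffer + sizeof(buffer)};
  const std::uint32_t value =
      read_checked<std::uint32_t>(window, 4, parse_u32_be);
  assert(value == 258u);
  assert(window.current == window.end);
}
```

Because the window is advanced last, a failed size check, bounds check, or parse leaves `window.current` where it was, so a caller that catches the exception still sees a consistent position.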