Decode into an uninitialized byte slice #63

Merged 4 commits on Dec 30, 2024
Changes from 3 commits
36 changes: 29 additions & 7 deletions src/lib.rs
@@ -10,6 +10,7 @@ macro_rules! assert_sizeof {

 use lossy_pht::LossyPHT;
 use std::fmt::{Debug, Formatter};
+use std::mem::MaybeUninit;

 mod builder;
 mod lossy_pht;
@@ -250,18 +251,33 @@ impl<'a> Decompressor<'a> {
         Self { symbols, lengths }
     }

+    /// Returns the capacity required for decompression.
+    pub fn decompressed_capacity(&self, compressed: &[u8]) -> usize {
+        size_of::<Symbol>() * (compressed.len() + 1)
+    }
Contributor

We should call this max_decompression_capacity. It's an upper bound, since symbols need not be 8 bytes (and most of the time they'll be <= 3 bytes).

Contributor Author

Yep, fair, although the check is that the slice length must be == this capacity. We could make it <=, but that would require a bounds check inside each iteration, which feels... slow.
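To make the upper bound discussed in this thread concrete, here is a minimal sketch of the arithmetic. It assumes, as the comment above states, that a symbol occupies at most 8 bytes; `SYMBOL_WIDTH` and `max_decompressed_capacity` are hypothetical names used only for illustration and are not part of the crate's API.

```rust
/// Assumed maximum symbol width in bytes (the thread says symbols are at most 8 bytes).
const SYMBOL_WIDTH: usize = 8;

/// Hypothetical helper mirroring the formula in the diff:
/// each input byte may expand to one full-width symbol, plus one extra
/// symbol's worth of space, exactly as `size_of::<Symbol>() * (len + 1)` does.
fn max_decompressed_capacity(compressed_len: usize) -> usize {
    SYMBOL_WIDTH * (compressed_len + 1)
}
```

Under these assumptions, 10 compressed bytes call for an 88-byte buffer and empty input still reserves 8 bytes. Sizing the buffer for the worst case once, up front, is what lets the decode loop skip a per-iteration bounds check, at the cost of over-allocating when most symbols are short.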


     /// Decompress a byte slice that was previously returned by a compressor using
-    /// the same symbol table.
-    pub fn decompress(&self, compressed: &[u8]) -> Vec<u8> {
-        let mut decoded: Vec<u8> = Vec::with_capacity(size_of::<Symbol>() * (compressed.len() + 1));
-        let ptr = decoded.as_mut_ptr();
+    /// the same symbol table into an uninitialized slice of bytes.
+    ///
+    /// Returns the length of the decoded bytes.
+    ///
+    /// ## Panics
+    ///
+    /// If the decoded slice is not the same length as the `decompressed_capacity`.
+    pub fn decompress_into(&self, compressed: &[u8], decoded: &mut [MaybeUninit<u8>]) -> usize {
+        assert_eq!(
+            decoded.len(),
+            self.decompressed_capacity(compressed),
+            "decoded slice must have the same length as the decompressed capacity"
+        );
+        let ptr: *mut u8 = decoded.as_mut_ptr().cast();

         let mut in_pos = 0;
         let mut out_pos = 0;

         while in_pos < compressed.len() {
             // out_pos can grow at most 8 bytes per iteration, and we start at 0
-            debug_assert!(out_pos <= decoded.capacity() - size_of::<Symbol>());
+            debug_assert!(out_pos <= decoded.len() - size_of::<Symbol>());
             // SAFETY: in_pos is always in range 0..compressed.len()
             let code = unsafe { *compressed.get_unchecked(in_pos) };
             if code == ESCAPE_CODE {
@@ -296,9 +312,15 @@ impl<'a> Decompressor<'a> {
             "decompression should exhaust input before output"
         );

-        // SAFETY: we enforce in the loop condition that out_pos <= decoded.capacity()
-        unsafe { decoded.set_len(out_pos) };
+        out_pos
+    }
+
+    /// Decompress a byte slice that was previously returned by a compressor using the same symbol
+    /// table into a new vector of bytes.
+    pub fn decompress(&self, compressed: &[u8]) -> Vec<u8> {
+        let mut decoded = Vec::with_capacity(self.decompressed_capacity(compressed));
+        let len = self.decompress_into(compressed, decoded.spare_capacity_mut());
+        unsafe { decoded.set_len(len) };
         decoded
     }
 }
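The pattern this diff introduces (fill a caller-provided `&mut [MaybeUninit<u8>]`, return the number of bytes written, then wrap that in a `Vec`-returning convenience method) can be sketched in isolation. The example below is not the crate's decoder: `copy_into` and `copy_to_vec` are hypothetical stand-ins that merely copy bytes, and the 8 bytes of slack are an arbitrary assumption chosen to echo the diff.

```rust
use std::mem::MaybeUninit;

/// Hypothetical stand-in for a decoder: copies `src` into `out` and returns
/// the number of bytes written. A real decoder would expand symbols instead.
fn copy_into(src: &[u8], out: &mut [MaybeUninit<u8>]) -> usize {
    assert!(out.len() >= src.len(), "output buffer too small");
    for (slot, &byte) in out.iter_mut().zip(src) {
        // `MaybeUninit::write` initializes the slot without reading its old contents.
        slot.write(byte);
    }
    src.len()
}

/// Allocates once, lets `copy_into` fill the spare capacity, then marks only
/// the written prefix as initialized.
fn copy_to_vec(src: &[u8]) -> Vec<u8> {
    let cap = src.len() + 8; // arbitrary slack, assumed for illustration
    let mut out: Vec<u8> = Vec::with_capacity(cap);
    // `with_capacity` guarantees at least `cap`, possibly more, so slice the
    // spare capacity down to exactly `cap` before handing it to the callee.
    let written = copy_into(src, &mut out.spare_capacity_mut()[..cap]);
    // SAFETY: `copy_into` initialized the first `written` bytes, and
    // `written <= cap <= out.capacity()`.
    unsafe { out.set_len(written) };
    out
}
```

One design point worth noting: `Vec::with_capacity` is documented to reserve at least the requested capacity, so a callee that asserts an exact slice length may want the caller to slice `spare_capacity_mut()` to that length, as the sketch does.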