Hashtable documentation #93

Merged
merged 11 commits into from
Jun 17, 2024
190 changes: 156 additions & 34 deletions src/data_structures/hashtable.rs
@@ -1,24 +1,85 @@
use std::collections::LinkedList;

/// The growth factor of the hash table when resizing.
const GROWTH_FACTOR: usize = 2;

/// The load factor bound of the hash table. The hash table resizes itself when the ratio of
/// stored elements to buckets reaches this bound.
const LOAD_FACTOR_BOUND: f64 = 0.75;

/// The initial capacity of the hash table.
const INITIAL_CAPACITY: usize = 3000;

/// A hash table implementation with separate chaining. It uses a linked list to store elements
/// with the same hash.
///
/// # Notes:
///
/// The hash table resizes itself when the ratio of elements to buckets reaches the load factor bound.
/// The hash table will grow by a factor of 2 when resizing.
/// The hash table uses a default initial capacity of 3000.
///
/// # Examples:
///
/// ```rust
/// use rust_algorithms::data_structures::HashTable;
///
/// let mut hash_table = HashTable::new();
///
/// hash_table.insert(1usize, 10);
/// let result = hash_table.search(1);
///
/// assert_eq!(result, Some(&10));
/// ```
#[derive(Debug, PartialEq, Eq)]
pub struct HashTable<K, V> {
elements: Vec<LinkedList<(K, V)>>,
count: usize,
}

/// Implement Default for HashTable
impl<K: Hashable + std::cmp::PartialEq, V> Default for HashTable<K, V> {
/// Create a new HashTable with the default initial capacity.
///
/// # Examples:
///
/// ```rust
/// use rust_algorithms::data_structures::HashTable;
///
/// let hash_table: HashTable<usize, usize> = HashTable::default();
///
/// assert!(hash_table.is_empty());
/// ```
fn default() -> Self {
Self::new()
}
}

/// A trait for types that can be hashed.
pub trait Hashable {
fn hash(&self) -> usize;
}

/// Implement Hashable for usize
/// This is useful for testing purposes but doesn't provide a meaningful hash function.
impl Hashable for usize {
fn hash(&self) -> usize {
*self
}
}

impl<K: Hashable + std::cmp::PartialEq, V> HashTable<K, V> {
/// Create a new HashTable with the default initial capacity.
///
/// # Examples:
///
/// ```rust
/// use rust_algorithms::data_structures::HashTable;
///
/// let hash_table = HashTable::<usize, usize>::new();
///
/// assert!(hash_table.is_empty());
/// ```
pub fn new() -> HashTable<K, V> {
let initial_capacity = INITIAL_CAPACITY;
let mut elements = Vec::with_capacity(initial_capacity);
@@ -30,6 +91,49 @@ impl<K: Hashable + std::cmp::PartialEq, V> HashTable<K, V> {
HashTable { elements, count: 0 }
}

/// Returns `true` if the hash table contains no elements.
///
/// # Examples:
///
/// ```rust
/// use rust_algorithms::data_structures::HashTable;
///
/// let mut hash_table = HashTable::<usize, usize>::new();
///
/// assert_eq!(hash_table.is_empty(), true);
///
/// hash_table.insert(1usize, 10);
///
/// assert_eq!(hash_table.is_empty(), false);
/// ```
pub fn is_empty(&self) -> bool {
self.count == 0
}

/// Insert a key-value pair into the hash table.
///
/// # Arguments:
///
/// * `key` - The key to insert.
/// * `value` - The value to insert.
///
/// # Notes:
///
/// If the key already exists in the hash table, the value will not be overwritten.
/// This is different from the behavior of the standard library's HashMap.
///
/// # Examples:
///
/// ```rust
/// use rust_algorithms::data_structures::HashTable;
///
/// let mut hash_table = HashTable::new();
///
/// hash_table.insert(1usize, 10);
/// let result = hash_table.search(1);
///
/// assert_eq!(result, Some(&10));
/// ```
pub fn insert(&mut self, key: K, value: V) {
if self.count >= self.elements.len() * LOAD_FACTOR_BOUND as usize {
self.resize();
@@ -39,6 +143,45 @@ impl<K: Hashable + std::cmp::PartialEq, V> HashTable<K, V> {
self.count += 1;
}

/// Determines the capacity of the hash table, which is the number of buckets available
/// for storing elements. The capacity is not the same as the number of elements
/// in the `HashTable`.
///
/// # Returns:
///
/// The capacity of the hash table.
///
/// # Examples:
///
/// ```rust
/// use rust_algorithms::data_structures::HashTable;
///
/// let hash_table = HashTable::<usize, usize>::new();
///
/// // The default initial capacity is 3000 buckets.
/// assert!(hash_table.capacity() >= 3000);
/// ```
pub fn capacity(&self) -> usize {
self.elements.capacity()
}

/// Search for a key in the hash table.
///
/// # Arguments:
///
/// * `key` - The key to search for.
///
/// # Returns:
///
/// An Option containing a reference to the value if the key is found, or None if the key is not
/// found.
///
/// # Examples:
///
/// ```rust
/// use rust_algorithms::data_structures::HashTable;
///
/// let mut hash_table = HashTable::new();
///
/// hash_table.insert(1usize, 10);
/// let result = hash_table.search(1);
///
/// assert_eq!(result, Some(&10));
/// ```
pub fn search(&self, key: K) -> Option<&V> {
let index = key.hash() % self.elements.len();
self.elements[index]
@@ -70,34 +213,13 @@ impl<K: Hashable + std::cmp::PartialEq, V> HashTable<K, V> {
mod tests {
use super::*;

- #[derive(Debug, PartialEq, Eq)]
- struct TestKey(usize);
-
- impl Hashable for TestKey {
- fn hash(&self) -> usize {
- self.0
- }
- }
Comment on lines -73 to -80
Owner

Ah, so this is valuable, because this way we test that the hash table works with hashable elements.

What's interesting here is that no tests broke, which means that we aren't testing the actual placement of keys in the table (TestKey(i) would be placed in the i-th bucket, but i would be placed in the hash(i) % length bucket).

I think this is a bit involved, and I'm not sure how educational it is, so I think it's fine to remove this. wdyt?
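
The placement gap described above could be closed by a test that pins down which bucket a key lands in. A minimal sketch, assuming the index is computed as `key.hash() % bucket_count` (which is what `search` does in this diff); `bucket_index` is a hypothetical helper and not part of the PR, and the `Hashable` trait is restated only so the snippet stands alone:

```rust
/// Restated from the diff so the sketch compiles on its own.
pub trait Hashable {
    fn hash(&self) -> usize;
}

/// Identity hash for `usize`, as in the PR.
impl Hashable for usize {
    fn hash(&self) -> usize {
        *self
    }
}

/// Hypothetical helper: the bucket a key is expected to land in,
/// assuming the table indexes buckets with `hash % bucket_count`.
pub fn bucket_index<K: Hashable>(key: &K, bucket_count: usize) -> usize {
    key.hash() % bucket_count
}
```

With 3000 buckets, the keys 1 and 3001 map to the same bucket, which is the kind of collision a placement test would want to exercise.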

Contributor Author

I agree. What was actually upsetting about this was that the key educational bit of a hash table, the bit that makes it 'different' from other data structures, is the hash function, which we are just setting to the item itself! That means it might as well just be an array.

But wait! There is more!

We are using a vector and a linked list internally! Which is...just. no.

We definitely should have a linked list as a structure as an example, but we shouldn't be using them internally. The cache thrashing they cause is just not worth their use in modern code. The cost vs benefit just fails. Arenas are significantly better choices, and that goes doubly so given Rust's hate of multi-borrow mechanics.

A Vec of Option for storage: K is the index into the Vec, V is the value in the Vec, and we are off and running. The same could be done with the B-tree, but I just couldn't be bothered since I've been focusing on the docs instead of the core implementation.
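
To illustrate the point about the identity hash, here is a sketch of a `Hashable` implementation that actually mixes its input. It uses 64-bit FNV-1a, which is a swapped-in choice for illustration and not something the PR contains; `fnv1a_64` is a hypothetical helper name, and the trait is restated so the snippet stands alone:

```rust
/// Restated from the diff so the sketch compiles on its own.
pub trait Hashable {
    fn hash(&self) -> usize;
}

const FNV_OFFSET_BASIS: u64 = 0xcbf2_9ce4_8422_2325;
const FNV_PRIME: u64 = 0x0000_0100_0000_01b3;

/// Hypothetical helper: 64-bit FNV-1a over a byte slice.
pub fn fnv1a_64(bytes: &[u8]) -> u64 {
    let mut hash = FNV_OFFSET_BASIS;
    for &byte in bytes {
        // Fold the next byte into the low bits of the state...
        hash ^= u64::from(byte);
        // ...then multiply (wrapping on overflow) to spread it across the word.
        hash = hash.wrapping_mul(FNV_PRIME);
    }
    hash
}

impl Hashable for String {
    fn hash(&self) -> usize {
        // The cast truncates on 32-bit targets, which is fine for picking a bucket.
        fnv1a_64(self.as_bytes()) as usize
    }
}
```

Unlike the `usize` implementation, nearby inputs such as "a" and "b" no longer land in neighbouring buckets by construction, so the modulo step does real work.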

Owner

If you'd like to implement that, I'm more than happy to review it!

I still think it's valuable to show suboptimal implementations though: if someone is starting, they should have access to both the dumb, easy implementation and the nicer, optimized, linear-probing implementation.
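
For readers wondering what the linear-probing variant mentioned here might look like, the following is a rough sketch backed by a `Vec<Option<(K, V)>>`. Everything in it (`ProbingTable`, `with_slots`, the fixed slot count, the lack of resizing and deletion) is an assumption made for illustration and does not come from this PR; it keeps the PR's rule that an existing key is not overwritten:

```rust
/// Restated from the diff so the sketch compiles on its own.
pub trait Hashable {
    fn hash(&self) -> usize;
}

impl Hashable for usize {
    fn hash(&self) -> usize {
        *self
    }
}

/// Hypothetical open-addressing table: fixed size, no resizing, no deletion.
pub struct ProbingTable<K, V> {
    slots: Vec<Option<(K, V)>>,
    count: usize,
}

impl<K: Hashable + PartialEq, V> ProbingTable<K, V> {
    /// Creates a table with `slot_count` empty slots.
    pub fn with_slots(slot_count: usize) -> Self {
        assert!(slot_count > 0, "need at least one slot");
        let mut slots = Vec::with_capacity(slot_count);
        slots.resize_with(slot_count, || None);
        ProbingTable { slots, count: 0 }
    }

    /// Number of stored key-value pairs.
    pub fn len(&self) -> usize {
        self.count
    }

    /// Inserts the pair and returns `true`. Returns `false` if the key is
    /// already present (the old value is kept) or if every slot is taken.
    pub fn insert(&mut self, key: K, value: V) -> bool {
        let len = self.slots.len();
        let start = key.hash() % len;
        // Walk forward from the home slot, wrapping around at the end.
        for step in 0..len {
            let index = (start + step) % len;
            match &self.slots[index] {
                Some((existing, _)) if *existing == key => return false,
                Some(_) => continue,
                None => {
                    self.slots[index] = Some((key, value));
                    self.count += 1;
                    return true;
                }
            }
        }
        false
    }

    /// Looks the key up by following the same probe sequence as `insert`.
    pub fn search(&self, key: &K) -> Option<&V> {
        let len = self.slots.len();
        let start = key.hash() % len;
        for step in 0..len {
            match &self.slots[(start + step) % len] {
                Some((existing, value)) if existing == key => return Some(value),
                Some(_) => continue,
                // An empty slot ends the probe sequence: the key is absent.
                None => return None,
            }
        }
        None
    }
}
```

All entries live in one contiguous allocation, which is the cache-friendliness argument made earlier in the thread; the trade-off is that deletion needs tombstones or re-insertion, which this sketch leaves out.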

Owner

I'm merging this, feel free to open a separate PR!


- #[test]
- fn test_insert_and_search() {
- let mut hash_table = HashTable::new();
- let key = TestKey(1);
- let value = TestKey(10);
-
- hash_table.insert(key, value);
- let result = hash_table.search(TestKey(1));
-
- assert_eq!(result, Some(&TestKey(10)));
- }
-
#[test]
fn test_resize() {
let mut hash_table = HashTable::new();
let initial_capacity = hash_table.elements.capacity();

for i in 0..initial_capacity * LOAD_FACTOR_BOUND as usize + 1 {
- hash_table.insert(TestKey(i), TestKey(i + 10));
+ hash_table.insert(i, i + 10);
}

assert!(hash_table.elements.capacity() > initial_capacity);
@@ -106,11 +228,11 @@ mod tests {
#[test]
fn test_search_nonexistent() {
let mut hash_table = HashTable::new();
- let key = TestKey(1);
- let value = TestKey(10);
+ let key = 1;
+ let value = 10;

hash_table.insert(key, value);
- let result = hash_table.search(TestKey(2));
+ let result = hash_table.search(2);

assert_eq!(result, None);
}
@@ -119,29 +241,29 @@ mod tests {
fn test_multiple_inserts_and_searches() {
let mut hash_table = HashTable::new();
for i in 0..10 {
- hash_table.insert(TestKey(i), TestKey(i + 100));
+ hash_table.insert(i, i + 100);
}

for i in 0..10 {
- let result = hash_table.search(TestKey(i));
- assert_eq!(result, Some(&TestKey(i + 100)));
+ let result = hash_table.search(i);
+ assert_eq!(result, Some(&(i + 100)));
}
}

#[test]
fn test_not_overwrite_existing_key() {
let mut hash_table = HashTable::new();
- hash_table.insert(TestKey(1), TestKey(100));
- hash_table.insert(TestKey(1), TestKey(200));
+ hash_table.insert(1, 100);
+ hash_table.insert(1, 200);

- let result = hash_table.search(TestKey(1));
- assert_eq!(result, Some(&TestKey(100)));
+ let result = hash_table.search(1);
+ assert_eq!(result, Some(&100));
}

#[test]
fn test_empty_search() {
- let hash_table: HashTable<TestKey, TestKey> = HashTable::new();
- let result = hash_table.search(TestKey(1));
+ let hash_table: HashTable<usize, usize> = HashTable::new();
+ let result = hash_table.search(1);

assert_eq!(result, None);
}
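
One detail in the diff that a reader may trip over: both `insert` and `test_resize` compute the threshold as `len * LOAD_FACTOR_BOUND as usize`. In Rust, `as` binds more tightly than `*`, so `0.75` is cast to `0` before the multiplication and the product is always zero. The sketch below contrasts that expression with one that scales in floating point first; the function names are hypothetical and only the constant is taken from the diff:

```rust
const LOAD_FACTOR_BOUND: f64 = 0.75;

/// The expression as written in the diff: the cast applies to the
/// constant alone, so this is `buckets * 0`.
pub fn threshold_as_written(buckets: usize) -> usize {
    buckets * LOAD_FACTOR_BOUND as usize
}

/// One possible alternative: multiply as `f64`, then truncate the result.
pub fn threshold_scaled(buckets: usize) -> usize {
    (buckets as f64 * LOAD_FACTOR_BOUND) as usize
}
```

For 3000 buckets the first function yields 0 and the second yields 2250. Changing the expression in `insert` would also require the loop bound in `test_resize` to change, since it relies on the same arithmetic.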