Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use of memory after it is freed #8

Closed
rsmarples opened this issue May 21, 2024 · 8 comments · May be fixed by #9
Closed

Use of memory after it is freed #8

rsmarples opened this issue May 21, 2024 · 8 comments · May be fixed by #9

Comments

@rsmarples
Copy link

Reproduer:

// Instantiating a map template.
#define NAME int_int_map
#define KEY_TY int
#define VAL_TY int
#include "verstable.h"

int main(void)
{
    int_int_map our_map;
    int_int_map_itr itr;

    int_int_map_init(&our_map);
    itr = int_int_map_insert(&our_map, 1, 1);
    if (int_int_map_is_end(itr))
    {
        exit(1);
    }
    int_int_map_cleanup(&our_map);
    exit(0);
}
$ clang-tidy -checks=cert-* repro.c
verstable.h:1299:11: note: Use of memory after it is freed
 1299 |           FREE_FN(
      |           ^
 1300 |             new_table.buckets,
rsmarples added a commit to rsmarples/Verstable that referenced this issue May 22, 2024
NOLINT some code as I cannot easily make it pass clang-tidy checks.
Fixes JacksonAllan#8 but will likely need a good review.
@JacksonAllan
Copy link
Owner

JacksonAllan commented May 22, 2024

Hi! Thanks for working on this. I was able to spend a few hours investigating it today. I hadn’t used clang-tidy before, so I hope I’m understanding its output correctly and apologize in advance if I seem uninformed.

The tool seems to report a lot of issues related to the buckets array, namely NULL-pointer dereferences and uses-after-free. I suspect all these warnings are false positives. Consider the simplified example below, in which I’ve replaced _insert with _insert_raw. _insert_raw attempts to insert a new key and, if it can’t do so without violating the maximum load factor, returns an end iterator (_insert, on the other hand, simply wraps _insert_raw in a loop that grows the table if the key cannot be inserted).

// Instantiating a map template.
#define NAME int_int_map
#define KEY_TY int
#define VAL_TY int
#include <https://raw.githubusercontent.com/JacksonAllan/Verstable/main/verstable.h>

int main(void)
{
    int_int_map our_map;
    int_int_map_init( &our_map );
    int_int_map_insert_raw(&our_map, 1, &(int){1}, false, true);
    int_int_map_cleanup(&our_map);
}

Godbolt link with clang-tidy window.

Here’s the full clang-tidy output:

/app/raw.githubusercontent.com/JacksonAllan/Verstable/main/verstable.h:1051:33: warning: Dereference of null pointer [clang-analyzer-core.NullDereference]
 1051 |   size_t home_bucket = HASH_FN( table->buckets[ bucket ].key ) & table->buckets_mask;
      |                                 ^
[<source>:10:5: note: Calling 'int_int_map_init'](javascript:;)
   10 |     int_int_map_init( &our_map );
      |     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
/app/raw.githubusercontent.com/JacksonAllan/Verstable/main/verstable.h:901:3: note: Null pointer value stored to 'our_map.buckets'
  901 |   table->buckets = NULL;
      |   ^~~~~~~~~~~~~~~~~~~~~
[<source>:10:5: note: Returning from 'int_int_map_init'](javascript:;)
   10 |     int_int_map_init( &our_map );
      |     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
[<source>:11:5: note: Calling 'int_int_map_insert_raw'](javascript:;)
   11 |     int_int_map_insert_raw(&our_map, 1, &(int){1}, false, true);
      |     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/app/raw.githubusercontent.com/JacksonAllan/Verstable/main/verstable.h:1129:7: note: Assuming the condition is true
 1129 |   if( !( table->metadata[ home_bucket ] & VT_IN_HOME_BUCKET_MASK ) )
      |       ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/app/raw.githubusercontent.com/JacksonAllan/Verstable/main/verstable.h:1129:3: note: Taking true branch
 1129 |   if( !( table->metadata[ home_bucket ] & VT_IN_HOME_BUCKET_MASK ) )
      |   ^
/app/raw.githubusercontent.com/JacksonAllan/Verstable/main/verstable.h:1133:43: note: Calling 'int_int_map_bucket_count'
 1133 |       VT_UNLIKELY( table->key_count + 1 > VT_CAT( NAME, _bucket_count )( table ) * MAX_LOAD ) ||
      |                                           ^
/app/raw.githubusercontent.com/JacksonAllan/Verstable/main/verstable.h:429:24: note: expanded from macro 'VT_CAT'
  429 | #define VT_CAT( a, b ) VT_CAT_( a, b )
      |                        ^
/app/raw.githubusercontent.com/JacksonAllan/Verstable/main/verstable.h:428:25: note: expanded from macro 'VT_CAT_'
  428 | #define VT_CAT_( a, b ) a##b
      |                         ^
note: expanded from here
/app/raw.githubusercontent.com/JacksonAllan/Verstable/main/verstable.h:434:61: note: expanded from macro 'VT_UNLIKELY'
  434 | #define VT_UNLIKELY( expression ) __builtin_expect( (bool)( expression ), false )
      |                                                             ^~~~~~~~~~
/app/raw.githubusercontent.com/JacksonAllan/Verstable/main/verstable.h:979:3: note: Returning without writing to 'table->buckets'
  979 |   return table->buckets_mask + (bool)table->buckets_mask;
      |   ^
/app/raw.githubusercontent.com/JacksonAllan/Verstable/main/verstable.h:1133:43: note: Returning from 'int_int_map_bucket_count'
 1133 |       VT_UNLIKELY( table->key_count + 1 > VT_CAT( NAME, _bucket_count )( table ) * MAX_LOAD ) ||
      |                                           ^
/app/raw.githubusercontent.com/JacksonAllan/Verstable/main/verstable.h:429:24: note: expanded from macro 'VT_CAT'
  429 | #define VT_CAT( a, b ) VT_CAT_( a, b )
      |                        ^
/app/raw.githubusercontent.com/JacksonAllan/Verstable/main/verstable.h:428:25: note: expanded from macro 'VT_CAT_'
  428 | #define VT_CAT_( a, b ) a##b
      |                         ^
note: expanded from here
/app/raw.githubusercontent.com/JacksonAllan/Verstable/main/verstable.h:434:61: note: expanded from macro 'VT_UNLIKELY'
  434 | #define VT_UNLIKELY( expression ) __builtin_expect( (bool)( expression ), false )
      |                                                             ^~~~~~~~~~
/app/raw.githubusercontent.com/JacksonAllan/Verstable/main/verstable.h:1133:7: note: Assuming the condition is false
 1133 |       VT_UNLIKELY( table->key_count + 1 > VT_CAT( NAME, _bucket_count )( table ) * MAX_LOAD ) ||
      |       ^
/app/raw.githubusercontent.com/JacksonAllan/Verstable/main/verstable.h:434:35: note: expanded from macro 'VT_UNLIKELY'
  434 | #define VT_UNLIKELY( expression ) __builtin_expect( (bool)( expression ), false )
      |                                   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/app/raw.githubusercontent.com/JacksonAllan/Verstable/main/verstable.h:1133:7: note: Left side of '||' is false
 1133 |       VT_UNLIKELY( table->key_count + 1 > VT_CAT( NAME, _bucket_count )( table ) * MAX_LOAD ) ||
      |       ^
/app/raw.githubusercontent.com/JacksonAllan/Verstable/main/verstable.h:434:35: note: expanded from macro 'VT_UNLIKELY'
  434 | #define VT_UNLIKELY( expression ) __builtin_expect( (bool)( expression ), false )
      |                                   ^
/app/raw.githubusercontent.com/JacksonAllan/Verstable/main/verstable.h:1135:9: note: Assuming the condition is true
 1135 |       ( table->metadata[ home_bucket ] != VT_EMPTY && VT_UNLIKELY( !VT_CAT( NAME, _evict )( table, home_bucket ) ) )
      |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/app/raw.githubusercontent.com/JacksonAllan/Verstable/main/verstable.h:1135:9: note: Left side of '&&' is true
/app/raw.githubusercontent.com/JacksonAllan/Verstable/main/verstable.h:1135:69: note: Calling 'int_int_map_evict'
 1135 |       ( table->metadata[ home_bucket ] != VT_EMPTY && VT_UNLIKELY( !VT_CAT( NAME, _evict )( table, home_bucket ) ) )
      |                                                                     ^
/app/raw.githubusercontent.com/JacksonAllan/Verstable/main/verstable.h:429:24: note: expanded from macro 'VT_CAT'
  429 | #define VT_CAT( a, b ) VT_CAT_( a, b )
      |                        ^
/app/raw.githubusercontent.com/JacksonAllan/Verstable/main/verstable.h:428:25: note: expanded from macro 'VT_CAT_'
  428 | #define VT_CAT_( a, b ) a##b
      |                         ^
note: expanded from here
/app/raw.githubusercontent.com/JacksonAllan/Verstable/main/verstable.h:434:61: note: expanded from macro 'VT_UNLIKELY'
  434 | #define VT_UNLIKELY( expression ) __builtin_expect( (bool)( expression ), false )
      |                                                             ^~~~~~~~~~
/app/raw.githubusercontent.com/JacksonAllan/Verstable/main/verstable.h:1051:33: note: Dereference of null pointer
 1051 |   size_t home_bucket = HASH_FN( table->buckets[ bucket ].key ) & table->buckets_mask;
      |                                 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
/app/raw.githubusercontent.com/JacksonAllan/Verstable/main/verstable.h:1139:39: warning: Dereference of null pointer [clang-analyzer-core.NullDereference]
 1139 |     table->buckets[ home_bucket ].key = key;
      |                                       ^
[<source>:10:5: note: Calling 'int_int_map_init'](javascript:;)
   10 |     int_int_map_init( &our_map );
      |     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
/app/raw.githubusercontent.com/JacksonAllan/Verstable/main/verstable.h:901:3: note: Null pointer value stored to 'our_map.buckets'
  901 |   table->buckets = NULL;
      |   ^~~~~~~~~~~~~~~~~~~~~
[<source>:10:5: note: Returning from 'int_int_map_init'](javascript:;)
   10 |     int_int_map_init( &our_map );
      |     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
[<source>:11:5: note: Calling 'int_int_map_insert_raw'](javascript:;)
   11 |     int_int_map_insert_raw(&our_map, 1, &(int){1}, false, true);
      |     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/app/raw.githubusercontent.com/JacksonAllan/Verstable/main/verstable.h:1129:7: note: Assuming the condition is true
 1129 |   if( !( table->metadata[ home_bucket ] & VT_IN_HOME_BUCKET_MASK ) )
      |       ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/app/raw.githubusercontent.com/JacksonAllan/Verstable/main/verstable.h:1129:3: note: Taking true branch
 1129 |   if( !( table->metadata[ home_bucket ] & VT_IN_HOME_BUCKET_MASK ) )
      |   ^
/app/raw.githubusercontent.com/JacksonAllan/Verstable/main/verstable.h:1133:43: note: Calling 'int_int_map_bucket_count'
 1133 |       VT_UNLIKELY( table->key_count + 1 > VT_CAT( NAME, _bucket_count )( table ) * MAX_LOAD ) ||
      |                                           ^
/app/raw.githubusercontent.com/JacksonAllan/Verstable/main/verstable.h:429:24: note: expanded from macro 'VT_CAT'
  429 | #define VT_CAT( a, b ) VT_CAT_( a, b )
      |                        ^
/app/raw.githubusercontent.com/JacksonAllan/Verstable/main/verstable.h:428:25: note: expanded from macro 'VT_CAT_'
  428 | #define VT_CAT_( a, b ) a##b
      |                         ^
note: expanded from here
/app/raw.githubusercontent.com/JacksonAllan/Verstable/main/verstable.h:434:61: note: expanded from macro 'VT_UNLIKELY'
  434 | #define VT_UNLIKELY( expression ) __builtin_expect( (bool)( expression ), false )
      |                                                             ^~~~~~~~~~
/app/raw.githubusercontent.com/JacksonAllan/Verstable/main/verstable.h:979:3: note: Returning without writing to 'table->buckets'
  979 |   return table->buckets_mask + (bool)table->buckets_mask;
      |   ^
/app/raw.githubusercontent.com/JacksonAllan/Verstable/main/verstable.h:1133:43: note: Returning from 'int_int_map_bucket_count'
 1133 |       VT_UNLIKELY( table->key_count + 1 > VT_CAT( NAME, _bucket_count )( table ) * MAX_LOAD ) ||
      |                                           ^
/app/raw.githubusercontent.com/JacksonAllan/Verstable/main/verstable.h:429:24: note: expanded from macro 'VT_CAT'
  429 | #define VT_CAT( a, b ) VT_CAT_( a, b )
      |                        ^
/app/raw.githubusercontent.com/JacksonAllan/Verstable/main/verstable.h:428:25: note: expanded from macro 'VT_CAT_'
  428 | #define VT_CAT_( a, b ) a##b
      |                         ^
note: expanded from here
/app/raw.githubusercontent.com/JacksonAllan/Verstable/main/verstable.h:434:61: note: expanded from macro 'VT_UNLIKELY'
  434 | #define VT_UNLIKELY( expression ) __builtin_expect( (bool)( expression ), false )
      |                                                             ^~~~~~~~~~
/app/raw.githubusercontent.com/JacksonAllan/Verstable/main/verstable.h:1133:7: note: Assuming the condition is false
 1133 |       VT_UNLIKELY( table->key_count + 1 > VT_CAT( NAME, _bucket_count )( table ) * MAX_LOAD ) ||
      |       ^
/app/raw.githubusercontent.com/JacksonAllan/Verstable/main/verstable.h:434:35: note: expanded from macro 'VT_UNLIKELY'
  434 | #define VT_UNLIKELY( expression ) __builtin_expect( (bool)( expression ), false )
      |                                   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/app/raw.githubusercontent.com/JacksonAllan/Verstable/main/verstable.h:1133:7: note: Left side of '||' is false
 1133 |       VT_UNLIKELY( table->key_count + 1 > VT_CAT( NAME, _bucket_count )( table ) * MAX_LOAD ) ||
      |       ^
/app/raw.githubusercontent.com/JacksonAllan/Verstable/main/verstable.h:434:35: note: expanded from macro 'VT_UNLIKELY'
  434 | #define VT_UNLIKELY( expression ) __builtin_expect( (bool)( expression ), false )
      |                                   ^
/app/raw.githubusercontent.com/JacksonAllan/Verstable/main/verstable.h:1135:9: note: Assuming the condition is false
 1135 |       ( table->metadata[ home_bucket ] != VT_EMPTY && VT_UNLIKELY( !VT_CAT( NAME, _evict )( table, home_bucket ) ) )
      |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/app/raw.githubusercontent.com/JacksonAllan/Verstable/main/verstable.h:1135:52: note: Left side of '&&' is false
 1135 |       ( table->metadata[ home_bucket ] != VT_EMPTY && VT_UNLIKELY( !VT_CAT( NAME, _evict )( table, home_bucket ) ) )
      |                                                    ^
/app/raw.githubusercontent.com/JacksonAllan/Verstable/main/verstable.h:1139:39: note: Dereference of null pointer
 1139 |     table->buckets[ home_bucket ].key = key;
      |     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~
/app/raw.githubusercontent.com/JacksonAllan/Verstable/main/verstable.h:1166:29: warning: Dereference of null pointer [clang-analyzer-core.NullDereference]
 1166 |         VT_LIKELY( CMPR_FN( table->buckets[ bucket ].key, key ) )
      |                             ^
[<source>:10:5: note: Calling 'int_int_map_init'](javascript:;)
   10 |     int_int_map_init( &our_map );
      |     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
/app/raw.githubusercontent.com/JacksonAllan/Verstable/main/verstable.h:901:3: note: Null pointer value stored to 'our_map.buckets'
  901 |   table->buckets = NULL;
      |   ^~~~~~~~~~~~~~~~~~~~~
[<source>:10:5: note: Returning from 'int_int_map_init'](javascript:;)
   10 |     int_int_map_init( &our_map );
      |     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
[<source>:11:5: note: Calling 'int_int_map_insert_raw'](javascript:;)
   11 |     int_int_map_insert_raw(&our_map, 1, &(int){1}, false, true);
      |     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/app/raw.githubusercontent.com/JacksonAllan/Verstable/main/verstable.h:1129:7: note: Assuming the condition is false
 1129 |   if( !( table->metadata[ home_bucket ] & VT_IN_HOME_BUCKET_MASK ) )
      |       ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/app/raw.githubusercontent.com/JacksonAllan/Verstable/main/verstable.h:1129:3: note: Taking false branch
 1129 |   if( !( table->metadata[ home_bucket ] & VT_IN_HOME_BUCKET_MASK ) )
      |   ^
/app/raw.githubusercontent.com/JacksonAllan/Verstable/main/verstable.h:1159:8: note: 'unique' is false
 1159 |   if( !unique )
      |        ^~~~~~
/app/raw.githubusercontent.com/JacksonAllan/Verstable/main/verstable.h:1159:3: note: Taking true branch
 1159 |   if( !unique )
      |   ^
/app/raw.githubusercontent.com/JacksonAllan/Verstable/main/verstable.h:1162:5: note: Loop condition is true.  Entering loop body
 1162 |     while( true )
      |     ^
/app/raw.githubusercontent.com/JacksonAllan/Verstable/main/verstable.h:1165:9: note: Assuming the condition is true
 1165 |         ( table->metadata[ bucket ] & VT_HASH_FRAG_MASK ) == hashfrag &&
      |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/app/raw.githubusercontent.com/JacksonAllan/Verstable/main/verstable.h:1165:9: note: Left side of '&&' is true
/app/raw.githubusercontent.com/JacksonAllan/Verstable/main/verstable.h:1166:29: note: Dereference of null pointer
 1166 |         VT_LIKELY( CMPR_FN( table->buckets[ bucket ].key, key ) )
      |                             ^
/app/raw.githubusercontent.com/JacksonAllan/Verstable/main/verstable.h:433:61: note: expanded from macro 'VT_LIKELY'
  433 | #define VT_LIKELY( expression )   __builtin_expect( (bool)( expression ), true )
      |                                                             ^~~~~~~~~~
6 warnings generated.
Suppressed 3 warnings (3 in non-user code).

So clang-tidy is reporting, for example, a (potential?) NULL-pointer dereference on line 1051, which is in the _evict function. _insert_raw calls this function to evacuate the new key’s home bucket if the bucket is occupied by a key that doesn’t belong there.

Given that _init sets key_count to 0, buckets_mask to 0x0000000000000000ull, buckets to NULL, and metadata to (uint16_t *)&vt_empty_placeholder_metadatum (where vt_empty_placeholder_metadatum is defined as static const uint16_t vt_empty_placeholder_metadatum = VT_EMPTY;), we can follow clang-tidy’s output and see where it goes wrong:

/app/raw.githubusercontent.com/JacksonAllan/Verstable/main/verstable.h:1133:7: note: Assuming the condition is false
 1133 |       VT_UNLIKELY( table->key_count + 1 > VT_CAT( NAME, _bucket_count )( table ) * MAX_LOAD ) ||
      |       ^
/app/raw.githubusercontent.com/JacksonAllan/Verstable/main/verstable.h:434:35: note: expanded from macro 'VT_UNLIKELY'
  434 | #define VT_UNLIKELY( expression ) __builtin_expect( (bool)( expression ), false )
      |                                   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/app/raw.githubusercontent.com/JacksonAllan/Verstable/main/verstable.h:1133:7: note: Left side of '||' is false
 1133 |       VT_UNLIKELY( table->key_count + 1 > VT_CAT( NAME, _bucket_count )( table ) * MAX_LOAD ) ||

table->key_count + 1 > VT_CAT( NAME, _bucket_count )( table ) * MAX_LOAD - i.e. the left side of || - is true, not false as clang-tidy reports (_bucket_count in this instance returns zero, so the equation becomes 0 + 1 > 0 * MAX_LOAD). So the if statement short circuits here and _evict (which comes later in the same if statement) is never called. But even if that wasn’t the case, _evict still wouldn’t be called:

/app/raw.githubusercontent.com/JacksonAllan/Verstable/main/verstable.h:1135:9: note: Assuming the condition is true
 1135 |       ( table->metadata[ home_bucket ] != VT_EMPTY && VT_UNLIKELY( !VT_CAT( NAME, _evict )( table, home_bucket ) ) )
      |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/app/raw.githubusercontent.com/JacksonAllan/Verstable/main/verstable.h:1135:9: note: Left side of '&&' is true
/app/raw.githubusercontent.com/JacksonAllan/Verstable/main/verstable.h:1135:69: note: Calling 'int_int_map_evict'
 1135 |       ( table->metadata[ home_bucket ] != VT_EMPTY && VT_UNLIKELY( !VT_CAT( NAME, _evict )( table, home_bucket ) ) )

home_bucket here is zero (size_t home_bucket = hash & table->buckets_mask;), and metadata points to vt_empty_placeholder_metadatum, which is VT_EMPTY. In other words, the left side of && is false, not true, so _evict is still never called.

It looks like clang-tidy is getting confused by the logic of setting metadata to point to a global placeholder (whose value is VT_EMPTY) rather than e.g. NULL. This approach is an optimization that allows us to skip NULL or zero-bucket-count checks in most places that they would otherwise be needed, but the very absence of those checks seems to be making clang-tidy complain.

I’m not sure what the best solution here is. How important is it to support clang-tidy if we know that it is repeatedly generating false positives? I suspect that the outcome will be // NOLINT scattered throughout the code.

@rsmarples
Copy link
Author

Hi! Thanks for a very clean API for this implementation of a hashmap. Compared to others I've looked at it's a joy to work with. Thank you.

I agree with your assessment about the false positives, except for the use of memory after it's freed. On line 1299, if we failed to insert an entry we then free newtable.buckets, double bucket_count and then continue back to line 1286 and attempt to insert a new entry into a bucket we just freed.

Firstly, the continue statement is odd because it's at the end of the for loop already - it's a non op.
Secondly, bucket_count changing inside the for loop does nothing. It's only used at the top of the while loop creating new_table.

I think the intent of the continue was to goto table creation again and start over with an increased bucket count. Am I wrong here? Is this another false positive and if so can you please explain why?

I’m not sure what the best solution here is. How important is it to support clang-tidy if we know that it is repeatedly generating false positives? I suspect that the outcome will be // NOLINT scattered throughout the code.

I don't think supporting it directly is fair, other analysis tools can and do emit other false positives based on personal experience in other projects. However, if others use analysis tools on their code (which is more and more common place) at least having a comment somewhere about it would be best maybe including a light technical reason why it's a false positive.

Specifically for clang-tidy, there is no way of saying "ignore all analysis from a header file". They have recently committed an option to exclude a header, but I think it will still analyse macros which won't help here so the only way silencing clang-tidy is to add // NOLINT to verstable.h. I also like to push any changes I need to make to 3rd party code upstream to help others :)

I think adding // NOLINT is fine as it's quite short and doesn't affect the readability to any detriment and is not clang-tidy specific.

@JacksonAllan
Copy link
Owner

JacksonAllan commented May 23, 2024

Thanks, glad you're finding it useful! There’s a work-in-progress draft of an article benchmarking the library against other hash table libraries here. The actual data shown currently shown there is old, but newer data is available here.

I agree with your assessment about the false positives, except for the use of memory after it's freed. On line 1299, if we failed to insert an entry we then free newtable.buckets, double bucket_count and then continue back to line 1286 and attempt to insert a new entry into a bucket we just freed.
Firstly, the continue statement is odd because it's at the end of the for loop already - it's a non op.
Secondly, bucket_count changing inside the for loop does nothing. It's only used at the top of the while loop creating new_table.
I think the intent of the continue was to goto table creation again and start over with an increased bucket count. Am I wrong here? Is this another false positive and if so can you please explain why?

You’re absolutely right! Sorry, I should have examined your PR more closely before jumping to conclusions.

It seems that this bug was able to go undetected because of its rarity. This code handles the case that upon doubling the table's bucket count, one of the existing keys cannot be reinserted within 211 quadratic jumps (i.e. what I call “quadratic displacement” in the in-code comments) of its home bucket. Under normal usage, I think this should basically never happen (Edit: _shrink, combined with a bad hash function, could probably trigger it). Even if the hash function is bad, if a key was insertable when the bucket count was x then it is almost certainly insertable when the bucket count is 2x and all the other keys are the same. However, I should perhaps write a unit test that tests this code path, e.g. by using a hash function that somehow checks the bucket count and begins hashing all keys to the same bucket once a certain threshold is reached (the expected behavior is that the table continues trying to grow until the target bucket count exceeds the available memory and _insert returns failure, assuming that malloc actually returns NULL once memory is exhausted).

By the way, this bug also affects CC, whose hash-table implementation is based on Verstable.

So what I’d like to do now is release a patch within the next day or two fixing this bug and the unnecessary cast-away-const in the endian-checking functions (#6).

As for the // NOLINT comments, I’m not sure about introducing them right now because I fear that properly conforming with clang-tidy could require many more similar additions and changes, and I'd rather not introduce a partial solution. In other words, unless you’re confident that adding // NOLINT in just these places is enough, I need to investigate this issue further, e.g. by running clang-tidy on the unit tests and randomized tests against STL (for example, in simple examples clang-tidy was also giving me warnings about using memcpy instead of memcpy_s).

Edit:

Specifically for clang-tidy, there is no way of saying "ignore all analysis from a header file".

There is // BEGINNOLINT and // ENDNOLINT, but I'm not sure nuking clang-tidy across the entire file is a "solution" that users would appreciate. As we saw above, some of the warnings are indeed helpful.

@rsmarples
Copy link
Author

OK, that's fair.

I have to disable the clang-tidy checks for memcpy_s and friends because our build target is Alpine Linux which uses musl libc which does not support those.

Sadly, I cannot use // BEGINNOLINT and '// ENDNOLINT' around including the header. I'm happy using // NOLINT locallly for the time being, but maybe a comment somewhere upstream would at least be useful.

JacksonAllan added a commit that referenced this issue May 25, 2024
@JacksonAllan
Copy link
Owner

I've uploaded a preliminary fix to the dev branch. It's slightly different to the one you proposed. I think there should either be a while loop or a goto, but not both as the goto makes the while redundant. I opted to preserve the while loop on the basis that it's a bit more idiomatic.

bool VT_CAT( NAME, _rehash )( NAME *table, size_t bucket_count )
{
  // The attempt to resize the bucket array and rehash the keys must occur inside a loop that incrementally doubles the
  // target bucket count because a failure could theoretically occur at any load factor due to the displacement limit.
  while( true )
  {
    NAME new_table =  {
      0,
      bucket_count - 1,
      NULL,
      NULL
      #ifdef CTX_TY
      , table->ctx
      #endif
    };

    void *allocation = MALLOC_FN(
      VT_CAT( NAME, _total_alloc_size )( &new_table )
      #ifdef CTX_TY
      , &new_table.ctx
      #endif
    );

    if( VT_UNLIKELY( !allocation ) )
      return false;

    new_table.buckets = (VT_CAT( NAME, _bucket ) *)allocation;
    new_table.metadata = (uint16_t *)( (unsigned char *)allocation + VT_CAT( NAME, _metadata_offset )( &new_table ) );

    memset( new_table.metadata, 0x00, ( bucket_count + 4 ) * sizeof( uint16_t ) );

    // Iteration stopper at the end of the actual metadata array (i.e. the first of the four excess metadata).
    new_table.metadata[ bucket_count ] = 0x01;

    for( size_t bucket = 0; bucket < VT_CAT( NAME, _bucket_count )( table ); ++bucket )
      if( table->metadata[ bucket ] != VT_EMPTY )
      {
        VT_CAT( NAME, _itr ) itr = VT_CAT( NAME, _insert_raw )(
          &new_table,
          table->buckets[ bucket ].key,
          #ifdef VAL_TY
          &table->buckets[ bucket ].val,
          #endif
          true,
          false
        );

        if( VT_UNLIKELY( VT_CAT( NAME, _is_end )( itr ) ) )
-       {
-         FREE_FN(
-           new_table.buckets,
-           VT_CAT( NAME, _total_alloc_size )( &new_table )
-           #ifdef CTX_TY
-           , &new_table.ctx
-           #endif
-         );
-
-         bucket_count *= 2;
-         continue;
-       }
+         break;
      }

+   // If a key could not be reinserted due to the displacement limit, double the bucket count and retry.
+   if( new_table.key_count < table->key_count )
+   {
+     FREE_FN(
+       new_table.buckets,
+       VT_CAT( NAME, _total_alloc_size )( &new_table )
+       #ifdef CTX_TY
+       , &new_table.ctx
+       #endif
+     );
+
+     bucket_count *= 2;
+     continue;
+   }
+
    if( table->buckets_mask )
      FREE_FN(
        table->buckets,
        VT_CAT( NAME, _total_alloc_size )( table )
        #ifdef CTX_TY
        , &table->ctx
        #endif
      );

    *table = new_table;
    return true;
  }
}

The main issue that I'm having at the moment is that I haven't yet managed to craft a dataset and/or hash function that tests this code path (and therefore triggers the bug). What is needed is a dataset/hash function that causes a displacement limit violation when the bucket count is x but not when it is 2x (in which case the bug could presumably be triggered via _shrink). This is hard to craft because the displacement limit is, by design, very high (2048 quadratic jumps from the key's home bucket), but someone who is better than me at math might be able to figure it out.

Also note that the dev branch now contains some other changes that have been pending for a while. Most are insignificant (e.g. corrections to typos in the code comments), but the default integer hash function (was murmur3) has changed to a slightly weaker one (fast-hash) that proved to perform better in my benchmarks.

@rsmarples
Copy link
Author

Change looks good to me! I can't run it through my linter to verify until next week though as I'm on holiday.

@JacksonAllan
Copy link
Owner

JacksonAllan commented May 27, 2024

Hope you're enjoying your break!

I’ve just released v2.1.0, which includes the above fix. The new version is no longer generating the use-after-free warning in Clang-Tidy, as far as I can tell. I also finally managed to trigger the bug using _shrink and an (illegal) maximum load factor of 2.0 (setting MAX_LOAD to anything significantly above 1.0 forces the table to only rehash when a key cannot be inserted due to the displacement limit). The old version crashes, whereas the new version displays the correct behavior.

For the sake of organization, I think I will close this issue soon and open a new one regarding proper Clang-Tidy support.

@JacksonAllan
Copy link
Owner

New issue regarding proper Clang-Tidy support: #11.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants