-
Notifications
You must be signed in to change notification settings - Fork 2
Windows implementation is racy #2
Comments
You can look at msvcrt source code for the correct Windows implementation. Specifically, _aligned_malloc that calls _aligned_offset_malloc_base, both are in the source file align.c. Here’s a comment from there that pretty much explains how it works:
This way _aligned_malloc() only calls malloc() once, therefore it’s thread safe. Besides the thread safety, your current Windows implementation is very inefficient. It wastes lots of RAM. VirtualAlloc can only allocate large chunks of memory, the granularity varies between systems but it’s typically 64kb. So you’re spending at least 64kb per allocation. |
Doesn't VirtualAlloc have page granularity (4 KB on most systems)? Anyway, this crate was originally intended only for large allocations with large alignments, both in the order of hundreds of KB to a few MBs. I didn't want to have an overhead of up to 100% for the alignment padding (when size==alignment), so I opted for a solution that can potentially free up the memory only used for alignment (since this is always some whole number of pages). Seeing how literally nobody uses this crate like that, I'd be fine with changing the implementation so that it handles smaller alignment values better, which the msvcrt implementation seems to do. Alternatively, we could copy Rust's own implementation as was suggested in #1. |
The size granularity is page so you aren’t wasting that much RAM, but the allocation granularity is more, 64kb on my system, so you’re wasting quite a lot of address space. Probably OK on a 64-bit OS but there’re still 32 bit systems out there, especially on mobile and embedded platforms (well, Windows no longer has a mobile one, but it still has embedded, e.g. Win10 IoT).
That’s a valid concern. But I think that it’s not a good practice anyway allocate that many small objects. And for larger buffers, even for just 10-20 aligned elements, that overhead becomes very small relatively speaking.
I’ve looked at the code you’ve linked. At the first sight, it looks very similar to what MS is doing in their CRT. |
The Windows implementation currently works like this:
Between the last 2 steps, a different thread could reserve the freed address space and make the allocation fail. To avoid wasting memory, the reserved space somehow has to be freed. Maybe there's a way to free a part of the chunk of address space allocated with
VirtualAlloc
? That way, we could keep the space for the allocation and free the rest.The text was updated successfully, but these errors were encountered: