Pooling large blocks #54
Comments
I like FastMM a lot, but we simply cannot use it for the production version of our ISAPI module. We also tried TBBMalloc in the past, but we ended up using Windows' native
Are those the ones from MSVCrt.dll?
Yes
I am not allowed to do that as we do have some extra stuff there but basically it is something like this:
You then call these internal functions
Thanks, that makes sense, so I came up with this:
While it works, it is horrendously slow: the test program above runs in 71 seconds!
That's interesting. We do have high contention, and from what I remember our module with FastMM was spending too much time in routines with It definitely depends on the type of work the program does.
Did any of you try the FastMM4AVX fork? It seems that one of the improvements is the removal of
Yes, the time reported above is actually the same with the "AVX" version. This is not surprising because I'm convinced the delays come from the absence of pools for large blocks.
We are using the latest version of FastMM in our application and are very pleased with its capabilities, especially the full debug mode support.
However, in memory-intensive cases its performance is well below what we can achieve with TBBMalloc, which has serious disadvantages of its own on machines with many cores (>16).
Digging into our own source code, we came up with a simple example that illustrates the issue:
On my Core i7 computer, this takes around 25 s, while the same program with TBBMalloc takes only 5 s.
Looking at the FastMM source code, I discovered that this is because our TStringList quickly grows above the maximum medium block size, which is 264768 bytes, and thus leads to lots of calls to VirtualAlloc inside AllocateLargeBlock. In the program above there are 448 such calls; if this is the only difference, the 20 s gap works out to roughly 45 ms per call to VirtualAlloc (that sounds quite realistic).
I tried adjusting MediumBlockBinGroupCount so that I get a larger value for MaximumMediumBlockSize, but all I achieved was to get access violations very quickly.
In the end, I believe it would be much nicer if the large blocks were also pooled like small and medium blocks. That would help us a lot, as we are manipulating lots of objects in lists in our x64 applications.
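To make the pooling idea concrete, here is a minimal, language-neutral sketch in Python of what a cache for freed large blocks could look like. It is not FastMM code and does not reflect how FastMM is implemented: the class name LargeBlockPool, the 64 KB rounding granularity, and the 64 MB cache limit are all assumptions chosen for illustration, and a bytearray stands in for memory obtained from VirtualAlloc.

```python
from collections import defaultdict

# Hypothetical rounding unit, chosen for illustration only.
GRANULARITY = 64 * 1024


class LargeBlockPool:
    """Toy model of a cache for freed large blocks (not FastMM code)."""

    def __init__(self, max_cached_bytes=64 * 1024 * 1024):
        self.max_cached_bytes = max_cached_bytes  # assumed cache limit
        self.cached_bytes = 0
        self.free_blocks = defaultdict(list)  # rounded size -> reusable blocks
        self.os_allocations = 0  # counts simulated VirtualAlloc calls

    @staticmethod
    def round_up(size):
        # Round up to a multiple of GRANULARITY so nearby sizes share a bucket.
        return -(-size // GRANULARITY) * GRANULARITY

    def allocate(self, size):
        rounded = self.round_up(size)
        bucket = self.free_blocks[rounded]
        if bucket:
            # Cache hit: reuse a previously freed block of the same rounded size.
            self.cached_bytes -= rounded
            return bucket.pop()
        # Cache miss: this stands in for the expensive operating system call.
        self.os_allocations += 1
        return bytearray(rounded)

    def free(self, block):
        rounded = len(block)
        if self.cached_bytes + rounded <= self.max_cached_bytes:
            # Keep the block for reuse instead of returning it to the OS.
            self.free_blocks[rounded].append(block)
            self.cached_bytes += rounded
        # Otherwise the block is simply dropped (stands in for VirtualFree).


# Allocate and free a block just above the medium block limit 448 times.
pool = LargeBlockPool()
for _ in range(448):
    block = pool.allocate(300_000)
    pool.free(block)
print(pool.os_allocations)  # prints 1
```

In this toy run only the first request reaches the simulated operating system; the other 447 are served from the cache. A list that keeps growing requests a different size each time, so a real pool would benefit less unless sizes are rounded generously or a cached block can be reused for a smaller request, at the cost of holding on to memory that is not currently in use.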
Would anyone have any suggestion on this subject?