Timothée Leclaire-Fournier
7f58269866
A benchmark with 50 objects, with and without false sharing mitigations, gives these values (for the initArray() function).
On Windows:
- With mitigations: ~5500 microseconds
- Without mitigations: ~9000 microseconds
On Linux:
- With mitigations: ~600 microseconds
- Without mitigations: ~700 microseconds
allocPool - A simple & high performance object pool using modern C++
This is allocPool, a pool of objects in a single header file that allows you to
avoid expensive allocations at runtime. It preallocates objects in the
constructor (using threads), then offers two functions: getPtr() and returnPtr(ptr).
Using C++ concepts, the template requires the given class to have a
default constructor and a .reset() function. reset() is used to clean
objects before they are given to another caller.
We avoid false sharing by giving each thread a large amount of work, which should
keep cache lines from being shared between threads. While this pool uses a hashmap
and a pivot to make returnPtr(ptr) extremely fast, the main bottleneck during
construction is locking and unlocking the hashmap's mutex. This is necessary because
a std::unordered_map cannot be written to concurrently, even at different hashes.
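One way the hashmap-and-pivot idea could look is sketched below. It is not the actual allocPool code: `SketchPool`, `Item`, `inUse()` and all member names are assumptions made for illustration, growth is left out, and `getPtr()`/`returnPtr()` are not thread-safe here. Each construction thread fills one contiguous chunk of slots and only takes the mutex for the map insertion.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <thread>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical pooled type with the required reset().
struct Item {
    int value = 0;
    void reset() { value = 0; }
};

// Illustrative pool: slots [0, pivot_) are in use, slots [pivot_, size) are free.
template <class T>
class SketchPool {
public:
    SketchPool(std::size_t count, std::size_t threadCount) : slots_(count) {
        threadCount = std::max<std::size_t>(1, std::min(threadCount, count));
        // One large contiguous chunk per thread keeps threads on separate slots.
        const std::size_t chunk = (count + threadCount - 1) / threadCount;
        std::vector<std::thread> workers;
        for (std::size_t t = 0; t < threadCount; ++t) {
            const std::size_t begin = std::min(count, t * chunk);
            const std::size_t end = std::min(count, begin + chunk);
            workers.emplace_back([this, begin, end] {
                for (std::size_t i = begin; i < end; ++i) {
                    // Distinct vector elements can be written without a lock.
                    slots_[i] = std::make_unique<T>();
                    // The map is shared, so every insertion takes the mutex.
                    std::lock_guard<std::mutex> lock(mapMutex_);
                    indexOf_[slots_[i].get()] = i;
                }
            });
        }
        for (auto& worker : workers) worker.join();
    }

    // Hands out the first free slot; returns nullptr when full (no growth here).
    T* getPtr() {
        if (pivot_ == slots_.size()) return nullptr;
        return slots_[pivot_++].get();
    }

    // One map lookup plus one swap: the returned slot moves to the free side.
    bool returnPtr(T* ptr) {
        const auto it = indexOf_.find(ptr);
        if (it == indexOf_.end() || it->second >= pivot_) return false;
        const std::size_t i = it->second;
        const std::size_t last = pivot_ - 1;
        ptr->reset();
        std::swap(slots_[i], slots_[last]);
        indexOf_[slots_[i].get()] = i;
        indexOf_[slots_[last].get()] = last;
        --pivot_;
        return true;
    }

    std::size_t inUse() const { return pivot_; }

private:
    std::vector<std::unique_ptr<T>> slots_;
    std::unordered_map<T*, std::size_t> indexOf_;
    std::mutex mapMutex_;
    std::size_t pivot_ = 0;
};
```

In this sketch the cost of `returnPtr()` does not depend on the pool size, while the constructor serializes on `mapMutex_` once per object, which mirrors the construction bottleneck described above.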
The pool grows automatically when its maximum capacity is reached, though with a performance penalty.
Performance
With a simple stub class and a pool of 10000 objects, using the pool to take a pointer and give it back for each element is significantly faster than allocating and freeing each object by hand.
    class stub {
    public:
        // The loop simulates a non-trivial construction cost.
        stub() {
            for (int j{}; j < 1000; j++) { i++; }
        }
        void reset() {}
    private:
        int i = 15;
    };
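A comparison of this kind could be timed roughly as follows. This is a sketch under assumptions, not the repository's benchmark: `elapsedMicros`, `runWithoutPool` and `runWithPreallocated` are hypothetical helpers, a plain vector of preallocated objects stands in for the pool, and a `value()` accessor is added to the stub only so the loops have an observable effect.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <memory>
#include <vector>

// Same shape as the stub above; value() is an addition made for this sketch.
class stub {
public:
    stub() {
        for (int j{}; j < 1000; j++) { i++; }
    }
    void reset() {}
    int value() const { return i; }
private:
    int i = 15;
};

// Runs fn once and returns the elapsed time in microseconds.
template <class Fn>
long long elapsedMicros(Fn&& fn) {
    const auto start = std::chrono::steady_clock::now();
    fn();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
}

// Baseline: allocate, construct and free every object by hand.
inline std::size_t runWithoutPool(std::size_t count) {
    std::size_t touched = 0;
    for (std::size_t k = 0; k < count; ++k) {
        auto obj = std::make_unique<stub>();
        if (obj->value() > 0) ++touched;
    }
    return touched;
}

// Stand-in for the pool: objects already exist and are only reset on return.
inline std::size_t runWithPreallocated(std::vector<std::unique_ptr<stub>>& objects) {
    std::size_t touched = 0;
    for (auto& obj : objects) {
        stub* ptr = obj.get();  // "take a pointer"
        if (ptr->value() > 0) ++touched;
        ptr->reset();           // "give it back"
    }
    return touched;
}
```

Wrapping each helper in `elapsedMicros` gives two figures to compare. The numbers depend heavily on the allocator, compiler flags and machine, so they should be measured rather than assumed.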
On Linux:
Time (milliseconds) required for allocations without pool: 21
Time (milliseconds) required for allocations with pool: 3
Time (milliseconds) required for real allocations when constructing pool: 9
On Windows:
Time (milliseconds) required for allocations without pool: 62
Time (milliseconds) required for allocations with pool: 6
Time (milliseconds) required for real allocations when constructing pool: 51
This trivial example shows a performance improvement that would be much larger if allocating, constructing and destroying the objects were more expensive.
Where the allocator is very fast (such as glibc's on Linux), this approach may not be necessary, but with slow allocators (such as Windows' default), it is worth considering.
Safety
AddressSanitizer, LeakSanitizer and ThreadSanitizer have been used to ensure the safety of the class. Tests have been added to ensure the correct behavior in all cases.