[Chapel Merge] Allow jemalloc to merge/split chunks to reduce fix

Branch: refs/heads/main
Revision: 77e9e75
Author: ronawho
Log Message:

Merge pull request #18299 from ronawho/reduce-fixed-heap-fragmentation

Allow jemalloc to merge/split chunks to reduce fixed heap fragmentation

[reviewed by @gbtitus]

Jemalloc supports a set of chunk hooks that control the lifetime of
memory chunks (the slabs of memory it gets from the "system"). For
comm=none or configurations that don't require a fixed or dynamic heap
we let jemalloc use its default chunk hooks. When using a fixed or
dynamic heap we provide our own chunk hooks, but historically we only
supported the alloc hook. We don't support the dealloc hook, and we did
not support the merge or split hooks. This meant that once jemalloc got
a chunk from our heap it was never returned, and jemalloc was free to
use that chunk again. However, since we didn't support the merge or
split hooks, jemalloc couldn't split chunks into smaller ones, so while
small allocations could reuse old chunks there would be a lot of
unusable memory at the end of the chunk. And since jemalloc couldn't
merge chunks, large allocations couldn't reuse/extend smaller previous
chunks.
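
As a rough illustration of what "supporting" a hook means, here is a
minimal sketch. It is not the code in mem-jemalloc.c. The typedefs are
local stand-ins that mirror the jemalloc 4.x chunk hook signatures as
documented in its man page (check them against the jemalloc version you
build with), and every `sketch_*` name is hypothetical. The convention
assumed here is jemalloc's: a hook returns false on success and true to
opt out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Local stand-ins for jemalloc 4.x hook types (assumption: real code would
 * get chunk_dalloc_t, chunk_split_t and chunk_merge_t from jemalloc.h). */
typedef bool (sketch_dalloc_t)(void *chunk, size_t size, bool committed,
                               unsigned arena_ind);
typedef bool (sketch_split_t)(void *chunk, size_t size, size_t size_a,
                              size_t size_b, bool committed,
                              unsigned arena_ind);
typedef bool (sketch_merge_t)(void *chunk_a, size_t size_a, void *chunk_b,
                              size_t size_b, bool committed,
                              unsigned arena_ind);

/* Opt out of deallocation: the chunk stays inside the fixed heap and
 * jemalloc retains it for later reuse. */
bool sketch_dalloc(void *chunk, size_t size, bool committed,
                   unsigned arena_ind) {
  (void)chunk; (void)size; (void)committed; (void)arena_ind;
  return true;  /* true == "not deallocated" */
}

/* Refusing to split: a retained chunk can only be reused whole. */
bool sketch_refuse_split(void *chunk, size_t size, size_t size_a,
                         size_t size_b, bool committed, unsigned arena_ind) {
  (void)chunk; (void)size; (void)size_a; (void)size_b;
  (void)committed; (void)arena_ind;
  return true;  /* true == "chunk remains unsplit" */
}

/* Allowing a split: the heap is one contiguous region, so this sketch
 * needs no bookkeeping and simply reports success. */
bool sketch_allow_split(void *chunk, size_t size, size_t size_a,
                        size_t size_b, bool committed, unsigned arena_ind) {
  (void)chunk; (void)size; (void)size_a; (void)size_b;
  (void)committed; (void)arena_ind;
  return false; /* false == success */
}

/* Allowing a merge. The adjacency test is a defensive check added for
 * this sketch only. */
bool sketch_allow_merge(void *chunk_a, size_t size_a, void *chunk_b,
                        size_t size_b, bool committed, unsigned arena_ind) {
  (void)size_b; (void)committed; (void)arena_ind;
  return (char *)chunk_a + size_a != (char *)chunk_b;
}
```

In this sketch, switching from `sketch_refuse_split` to
`sketch_allow_split` (and likewise for merge) is the whole difference
between "unsupported" and "supported"; the allocator does the actual
splitting and merging of its own metadata.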

This led to pretty severe fragmentation when using a fixed heap (e.g.
gasnet configurations that use segment large/fast). For instance, if you
allocated and freed a 100G array and then allocated a 101G array, that
first 100G region couldn't be used to help satisfy the 101G one.
Similarly, if you allocated and freed a 100G array and then allocated 2
50G arrays, we couldn't split that first 100G, so 50G was wasted. This
PR enables the split/merge hooks so jemalloc is now able to merge for
larger allocations, split for smaller ones, or do a combination of
splitting and merging to satisfy new allocations. There was nothing
preventing us from enabling these before; we just didn't realize their
importance until now. This reduces fragmentation and allows 2 new tests
that have these types of allocation patterns to pass instead of running
out of memory.
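
To make the arithmetic concrete, here is a toy model of the two
patterns. It is a sketch under stated assumptions, not jemalloc's
algorithm: chunks come from a bump-only heap, freed chunks are reused
first-fit, a reused chunk is consumed whole unless splitting is allowed,
and adjacent free chunks coalesce only when merging is allowed. Sizes
are in arbitrary units (think G) and all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CHUNKS 64

typedef struct { size_t off, size; bool free; } chunk_t;

typedef struct {
  size_t cap, top;            /* capacity and bump offset of the heap */
  bool can_split, can_merge;  /* stand-ins for the split/merge hooks */
  chunk_t chunks[MAX_CHUNKS]; /* sorted by offset, covering [0, top) */
  int n;
} heap_t;

static void remove_at(heap_t *h, int i) {
  for (int j = i; j + 1 < h->n; j++) h->chunks[j] = h->chunks[j + 1];
  h->n--;
}

static void insert_at(heap_t *h, int i, chunk_t c) {
  for (int j = h->n; j > i; j--) h->chunks[j] = h->chunks[j - 1];
  h->chunks[i] = c;
  h->n++;
}

/* Returns the offset of the allocation, or -1 if the heap is exhausted. */
long heap_alloc(heap_t *h, size_t req) {
  for (int i = 0; i < h->n; i++) {
    chunk_t *c = &h->chunks[i];
    if (!c->free || c->size < req) continue;
    if (h->can_split && c->size > req && h->n < MAX_CHUNKS) {
      chunk_t rest = { c->off + req, c->size - req, true };
      c->size = req;               /* keep the front, free the remainder */
      insert_at(h, i + 1, rest);
    }
    h->chunks[i].free = false;     /* without split: whole chunk is used */
    return (long)h->chunks[i].off;
  }
  if (req > h->cap - h->top || h->n == MAX_CHUNKS) return -1;
  chunk_t fresh = { h->top, req, false };
  h->chunks[h->n++] = fresh;       /* nothing reusable: bump the offset */
  h->top += req;
  return (long)fresh.off;
}

void heap_free(heap_t *h, long off) {
  for (int i = 0; i < h->n; i++) {
    if ((long)h->chunks[i].off != off) continue;
    h->chunks[i].free = true;
    if (!h->can_merge) return;
    if (i + 1 < h->n && h->chunks[i + 1].free) {  /* merge with next */
      h->chunks[i].size += h->chunks[i + 1].size;
      remove_at(h, i + 1);
    }
    if (i > 0 && h->chunks[i - 1].free) {         /* merge with previous */
      h->chunks[i - 1].size += h->chunks[i].size;
      remove_at(h, i);
    }
    return;
  }
}

/* Allocate and free 100, then allocate 50 twice; returns heap consumed. */
size_t run_split_case(bool hooks) {
  heap_t h = { .cap = 1000, .can_split = hooks, .can_merge = hooks };
  heap_free(&h, heap_alloc(&h, 100));
  heap_alloc(&h, 50);
  heap_alloc(&h, 50);
  return h.top;
}

/* Allocate and free 100, 101, 102 in turn; returns heap consumed. */
size_t run_growing_case(bool hooks) {
  heap_t h = { .cap = 1000, .can_split = hooks, .can_merge = hooks };
  for (size_t s = 100; s <= 102; s++) heap_free(&h, heap_alloc(&h, s));
  return h.top;
}
```

In this model the decreasing pattern consumes 150 units without the
hooks and 100 with them. The increasing pattern consumes 303 units
without the hooks; with them it levels off at 201, because the freed 100
and 101 chunks coalesce and the 102 request fits in the merged region.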

Our fixed heap effectively provides an sbrk-like interface where you
can only bump the offset into a contiguous blob of memory. This has
inherent fragmentation limitations (and is why most modern allocators,
including jemalloc, use mmap for large allocations), but enabling
merging and splitting should significantly improve the current
situation. A further improvement might be to satisfy large allocations
from the bottom of the heap and smaller ones from the top to avoid
having small allocations interfere with merging/splitting larger ones.
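
A minimal sketch of that idea, assuming a hypothetical `fixed_heap_t`
with two bump offsets: large requests grow up from the bottom and small
ones grow down from the top. None of these names exist in the runtime,
the large/small decision is left to the caller, and `align` is assumed
to be a power of two.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
  char  *base;  /* start of the contiguous fixed heap */
  size_t size;  /* total bytes in the heap */
  size_t low;   /* bytes handed out from the bottom (large allocations) */
  size_t high;  /* offset where the top (small) region begins */
} fixed_heap_t;

void fixed_heap_init(fixed_heap_t *h, void *base, size_t size) {
  h->base = base;
  h->size = size;
  h->low = 0;
  h->high = size;  /* free space is always [low, high) */
}

/* sbrk-like: bump the low offset upward; there is no way to give it back. */
void *alloc_large(fixed_heap_t *h, size_t size, size_t align) {
  uintptr_t start = (uintptr_t)h->base + h->low;
  uintptr_t aligned = (start + align - 1) & ~(uintptr_t)(align - 1);
  size_t pad = (size_t)(aligned - start);
  size_t avail = h->high - h->low;
  if (pad > avail || size > avail - pad) return NULL;
  h->low += pad + size;
  return (void *)aligned;
}

/* Mirror image: bump the high offset downward for small requests. */
void *alloc_small(fixed_heap_t *h, size_t size, size_t align) {
  if (size > h->high - h->low) return NULL;
  uintptr_t start = ((uintptr_t)h->base + h->high - size)
                    & ~(uintptr_t)(align - 1);
  if (start < (uintptr_t)h->base + h->low) return NULL;
  h->high = (size_t)(start - (uintptr_t)h->base);
  return (void *)start;
}
```

With this layout, two large chunks allocated around a small one still
end up adjacent at the bottom of the heap, so they remain candidates for
merging once freed.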

Longer term, avoiding a fixed heap and separately allocating large
arrays like we do for ugni is probably the right strategy, but that will
require a significant amount of work. For this we'll either want a mode
that just uses jemalloc in a default configuration and does fully
on-demand memory registration, or we'll want dynamic registration like
we have under ugni. There are varying performance and implementation
tradeoffs for both of these strategies, though.

Part of chapel-lang/chapel issue #18286 (Using a fixed heap can lead to fragmentation)

Modified Files:
A test/runtime/configMatters/mem/fragmentation/decreasingAlloc.chpl
A test/runtime/configMatters/mem/fragmentation/decreasingAlloc.good
A test/runtime/configMatters/mem/fragmentation/increasingAlloc.chpl
A test/runtime/configMatters/mem/fragmentation/increasingAlloc.good
M runtime/src/mem/jemalloc/mem-jemalloc.c

Compare: https://github.com/chapel-lang/chapel/compare/46af15aefa9b...77e9e7528986