[Chapel Merge] Optimize memory tracking with a `memThreshold`

Branch: refs/heads/main
Revision: a115645
Author: ronawho
Log Message:

Merge pull request #18465 from ronawho/opt-mem-track-with-threshold

Optimize memory tracking with a memThreshold

[reviewed by @gbtitus]

Memory tracking uses a global mutex to serialize access to a hash table,
which makes concurrent allocations very slow. Previously, even when a
memory threshold was set, we still grabbed the table lock when freeing
because we didn't know the size of the allocation behind the pointer.
This change adds chpl_mem_real_alloc_size, which uses the jemalloc API
to ask for the real size of the allocation before acquiring the lock.
That lets us avoid taking the lock when freeing allocations below the
threshold, which saves a lot of time.
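
The free path described above can be sketched roughly as follows. This
is a minimal illustration and not the runtime code: every name in it
(tracked_malloc, tracked_free, real_alloc_size, mem_threshold,
lock_count) is hypothetical, a byte counter stands in for the hash
table, and glibc's malloc_usable_size stands in for the jemalloc size
query. For simplicity it compares the real size against the threshold
on both the allocation and the free side.

```c
#include <malloc.h>   /* malloc_usable_size (glibc); stand-in for the jemalloc query */
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical names throughout; this is not the Chapel runtime code. */
static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;
static size_t mem_threshold = 1024;   /* only track allocations >= this size */
static size_t tracked_bytes = 0;      /* stands in for the tracking hash table */
static unsigned long lock_count = 0;  /* number of times the lock was taken */

/* Ask the allocator how large the allocation really is. */
static size_t real_alloc_size(void *p) { return malloc_usable_size(p); }

void *tracked_malloc(size_t size) {
  void *p = malloc(size);
  if (p != NULL && real_alloc_size(p) >= mem_threshold) {
    pthread_mutex_lock(&table_lock);
    lock_count++;
    tracked_bytes += real_alloc_size(p);
    pthread_mutex_unlock(&table_lock);
  }
  return p;
}

void tracked_free(void *p) {
  if (p == NULL) return;
  /* Query the size before touching the lock, so that small
     allocations skip the global mutex entirely. */
  size_t size = real_alloc_size(p);
  if (size >= mem_threshold) {
    pthread_mutex_lock(&table_lock);
    lock_count++;
    tracked_bytes -= size;
    pthread_mutex_unlock(&table_lock);
  }
  free(p);
}
```

In this sketch, allocating and freeing a 16-byte block never touches
table_lock, while a 4096-byte block takes it once on each side.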

Note that chpl_mem_real_alloc_size returns the real allocation size,
not the requested size, e.g. chpl_mem_real_alloc_size(chpl_malloc(7))
returns 8. This means that we'll still do some unnecessary locking when
memThreshold falls between the requested size and the real size of an
allocation. If we wanted to avoid that we could silently adjust
memThreshold up to the next allocation size class.
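
A rough sketch of that possible adjustment, under stated assumptions:
the size-class table below is illustrative only (a real allocator
defines its own classes), all function names are hypothetical, and the
"track only when size > threshold" comparison is assumed for the sake
of the example rather than taken from the runtime.

```c
#include <stddef.h>

/* Illustrative size classes only; a real allocator defines its own. */
static const size_t size_classes[] = {8, 16, 32, 48, 64, 80, 96, 112, 128};
static const size_t num_classes = sizeof(size_classes) / sizeof(size_classes[0]);

/* Smallest class that can hold `size`; sizes past the table are
   returned unchanged in this sketch. */
size_t round_up_to_size_class(size_t size) {
  for (size_t i = 0; i < num_classes; i++) {
    if (size <= size_classes[i]) return size_classes[i];
  }
  return size;
}

/* Hypothetical adjustment: move the threshold up to a class boundary. */
size_t adjust_threshold(size_t requested_threshold) {
  return round_up_to_size_class(requested_threshold);
}

/* Allocation side sees the requested size (assumed strict comparison). */
int tracked_at_alloc(size_t requested, size_t threshold) {
  return requested > threshold;
}

/* Free side only sees the real (rounded-up) size. */
int locks_at_free(size_t requested, size_t threshold) {
  return round_up_to_size_class(requested) > threshold;
}
```

With a threshold of 7, a 6-byte request is not tracked at allocation
but its real size of 8 still triggers the lock at free. After adjusting
the threshold to 8, both sides agree and the lock is skipped.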

Here's a concurrent allocation micro-benchmark that demonstrates the
overhead. Results are from a 128-core Rome CPU:

use Time;
config const trials = 1_000_000;

var t: Timer; t.start();
coforall 1..here.maxTaskPar do
  for i in 1..trials do
    var s = i:string;
writeln(t.elapsed());

config               Time
w/o memTrack         0.19s
w/ memTrack          144.50s
w/ threshold before  33.06s
w/ threshold now     0.22s

This is motivated by Arkouda, which uses memTrack as a means to detect
if an operation will exceed memory. We recently noticed concurrent
allocations were slower than expected and tracked it down to this.

Related to chapel-lang/chapel issue #10415 (spike: improve worst-case memory tracking performance)
Resolves https://github.com/Cray/chapel-private/issues/1330

Modified Files:
M runtime/include/chpl-mem.h
M runtime/include/mem/cstdlib/chpl-mem-impl.h
M runtime/include/mem/jemalloc/chpl-mem-impl.h
M runtime/src/chplmemtrack.c

Compare: https://github.com/chapel-lang/chapel/compare/4137c72ef7f1...a1156456bac4