[Chapel Merge] Add support for GASNet PSHM and send progress thread

Branch: refs/heads/main
Revision: 8ac3e6e4376b2b110b72b6c2e0af9bbf01c65693
Author: jhh67
Link: Add support for GASNet PSHM and send progress thread by jhh67 · Pull Request #25140 · chapel-lang/chapel · GitHub
Log Message:
Add support for GASNet PSHM and send progress thread (#25140)

This PR makes several changes to add support for GASNet PSHM
(shared-memory bypass) for co-locales, and to improve RDMA performance
via a send progress thread. These are related because they both require
changes to how GASNet progress threads are implemented. The changes:

  • Modify the communication task functionality in tasks-qthreads.c to
    support more than one progress thread.
  • When there are co-locales, create an external GASNet progress thread
    to progress PSHM.
  • When there is an external GASNet progress thread, do not create an
    internal RCV thread because it is redundant.
  • By default, set GASNET_SND_THREAD to create an internal GASNet
    thread that progresses send operations. This can be overridden by
    setting GASNET_SND_THREAD=false in the environment.
  • Set GASNET_SND_THREAD_POLL_MODE=exclusive, which gives the GASNet
    send progress thread exclusive access to the NIC.
  • [...] core for internal and external GASNet progress threads and bind
    those threads to the core.
  • [...] internal and external GASNet progress threads to the locale’s
    accessible cores.
  • In the gasnet shim, set GASNET_SUPERNODE_MAXSIZE=1 if there are
    no co-locales. This prevents the use of PSHM, which cannot be
    used because the gasnet shim does not create an external GASNet
    progress thread.
  • Interpose comm_task_trampoline between chpl_task_createCommTask and
    chpl_task_wrapper. chpl_task_createCommTask creates a thread that
    invokes comm_task_trampoline, which binds itself to the proper
    core(s) before creating another thread that invokes
    chpl_task_wrapper. This ensures that the second thread is bound to
    the proper core(s) and its stack has the proper NUMA affinity. The
    thread created by chpl_task_createCommTask exits.
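The GASNET_SND_THREAD default described above can be pictured with a
small sketch. This is not the runtime's actual code: the helper names
are hypothetical, the value "1" is an assumed spelling of "enabled", and
treating GASNET_SND_THREAD_POLL_MODE as overridable is also an
assumption. The point is only that setenv() with overwrite set to 0
leaves a value the user already exported untouched.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not a Chapel runtime function.
 * Sets defaults for the GASNet send progress thread without clobbering
 * values already present in the environment.
 * Returns 0 on success, -1 if setenv() fails. */
static int set_gasnet_send_thread_defaults(void) {
    /* overwrite == 0: an existing GASNET_SND_THREAD=false is preserved.
     * The value "1" is an assumption made for this sketch. */
    if (setenv("GASNET_SND_THREAD", "1", 0) != 0) return -1;
    if (setenv("GASNET_SND_THREAD_POLL_MODE", "exclusive", 0) != 0) return -1;
    return 0;
}

/* Hypothetical helper: returns 1 if the variable is set to exactly
 * `value`, 0 otherwise. */
static int env_equals(const char *name, const char *value) {
    const char *cur = getenv(name);
    return cur != NULL && strcmp(cur, value) == 0;
}
```

With GASNET_SND_THREAD=false already in the environment, calling
set_gasnet_send_thread_defaults() leaves that setting alone and only
fills in the variables that are still unset.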
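The trampoline idea in the last item can be sketched with plain
pthreads on Linux. This is an illustration under stated assumptions,
not the code in tasks-qthreads.c: trampoline_args, create_bound_thread,
and count_allowed_cores are hypothetical names, and binding to a single
core stands in for "the proper core(s)". It relies on the Linux
behavior that a thread created with pthread_create() inherits the
creating thread's CPU affinity mask.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stddef.h>

/* Hypothetical argument bundle for this sketch. */
typedef struct {
    int core;               /* core the worker should be bound to */
    void *(*fn)(void *);    /* body of the long-lived worker thread */
    void *arg;              /* argument passed to fn */
    pthread_t worker;       /* filled in by the trampoline */
    int rc;                 /* 0 on success, otherwise an error number */
} trampoline_args;

/* Short-lived thread: binds itself, creates the worker, then exits. */
static void *trampoline(void *p) {
    trampoline_args *t = p;
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(t->core, &set);
    /* Bind this thread first ... */
    t->rc = pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
    if (t->rc == 0) {
        /* ... so the worker created here inherits the same mask. */
        t->rc = pthread_create(&t->worker, NULL, t->fn, t->arg);
    }
    return NULL;
}

/* Starts fn(arg) on a thread bound to `core`. Returns 0 on success and
 * stores the worker's handle in *out; the caller joins the worker. */
static int create_bound_thread(int core, void *(*fn)(void *), void *arg,
                               pthread_t *out) {
    trampoline_args t = { .core = core, .fn = fn, .arg = arg };
    pthread_t tramp;
    int rc = pthread_create(&tramp, NULL, trampoline, &t);
    if (rc != 0) return rc;
    pthread_join(tramp, NULL);   /* trampoline has exited by now */
    if (t.rc == 0) *out = t.worker;
    return t.rc;
}

/* Example worker body: stores the number of cores in its own affinity
 * mask into the int that arg points to. */
static void *count_allowed_cores(void *arg) {
    cpu_set_t set;
    CPU_ZERO(&set);
    if (pthread_getaffinity_np(pthread_self(), sizeof(set), &set) == 0)
        *(int *)arg = CPU_COUNT(&set);
    return NULL;
}
```

A caller would pick a core from its own allowed set, call
create_bound_thread(core, count_allowed_cores, &n, &worker), and join
the worker; the worker never calls an affinity-setting function itself,
yet it runs with the mask the trampoline installed.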

[Reviewed by @bonachea, @PHHargrove, and @jabraham17. Thank you.]

Compare: Comparing 38d3894f650b3f0b84301174286d23d9379baad5...e1270aa8b647c79cb41fcc9b42c1772dbbc56f25 · chapel-lang/chapel · GitHub

M runtime/include/tasks/qthreads/chpl-tasks-impl-fns.h
M runtime/src/comm/gasnet/comm-gasnet-ex.c
M runtime/src/comm/gasnet/comm-gasnet.c
M runtime/src/tasks/qthreads/tasks-qthreads.c
M runtime/src/topo/hwloc/topo-hwloc.c
M third-party/gasnet/Makefile