[Chapel Merge] Add support for GASNet PSHM and send progress thread

Branch: refs/heads/main
Revision: 8ac3e6e4376b2b110b72b6c2e0af9bbf01c65693
Author: jhh67
Link: Add support for GASNet PSHM and send progress thread by jhh67 · Pull Request #25140 · chapel-lang/chapel · GitHub
Log Message:
Add support for GASNet PSHM and send progress thread (#25140)

This PR makes several changes to add support for GASNet PSHM
(shared-memory bypass) for co-locales, and to improve RDMA performance
via a send progress thread. These are related because they both require
changes to how GASNet progress threads are implemented. The changes
include:

  • Modify the communication task functionality in tasks-qthreads.c to
    support more than one progress thread.
  • When there are co-locales, create an external GASNet progress thread
    to progress PSHM.
  • When there is an external GASNet progress thread, do not create an
    internal RCV thread because it is redundant.
  • By default, set GASNET_SND_THREAD to create an internal GASNet
    thread that progresses send operations. This can be overridden by
    setting GASNET_SND_THREAD=false in the environment.
  • Set GASNET_SND_THREAD_POLL_MODE=exclusive, which gives the GASNet
    send progress thread exclusive access to the NIC.
  • If CHPL_RT_COMM_GASNET_DEDICATED_PROGRESS_CORE=true, reserve one
    core for internal and external GASNet progress threads and bind
    those threads to the core.
  • If CHPL_RT_COMM_GASNET_DEDICATED_PROGRESS_CORE=false, bind the
    internal and external GASNet progress threads to the locale’s
    accessible cores.
  • In the gasnet shim, set GASNET_SUPERNODE_MAXSIZE=1 if there are
    no co-locales. This will prevent the use of PSHM, which cannot be
    used because the gasnet shim does not create an external GASNet
    progress thread.
  • Interpose comm_task_trampoline between chpl_task_createCommTask and
    chpl_task_wrapper. chpl_task_createCommTask creates a thread that
    invokes comm_task_trampoline, which binds itself to the proper
    core(s) before creating another thread that invokes
    chpl_task_wrapper. This ensures that the second thread is bound to
    the proper core(s) and its stack has the proper NUMA affinity. The
    thread created by chpl_task_createCommTask exits.
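
The GASNET_SND_THREAD item above describes a default that the user's environment can override. A minimal sketch of that pattern follows; it is not the code from this PR. The helper names set_env_default and apply_snd_thread_defaults are hypothetical, the value "true" is assumed by symmetry with the documented "false", and treating GASNET_SND_THREAD_POLL_MODE as overridable is also an assumption.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not a Chapel runtime function): set name=value
 * only if the user has not already set name in the environment. */
static int set_env_default(const char *name, const char *value) {
    assert(name != NULL && value != NULL);
    /* overwrite == 0 keeps a user-supplied value such as
     * GASNET_SND_THREAD=false. */
    return setenv(name, value, 0);
}

/* Hypothetical init hook: would have to run before GASNet reads its
 * environment. Returns 0 on success, -1 on failure. */
static int apply_snd_thread_defaults(void) {
    /* "true" is an assumed spelling; the log message only documents
     * "false" as the override. */
    if (set_env_default("GASNET_SND_THREAD", "true") != 0) return -1;
    /* Assumption: the poll mode is treated as a default here too. */
    return set_env_default("GASNET_SND_THREAD_POLL_MODE", "exclusive");
}
```

With this shape, running a program with GASNET_SND_THREAD=false exported leaves that value untouched, while an unset variable picks up the default.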
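
The trampoline item above can be illustrated with plain pthreads on Linux. This is a sketch of the general pattern only, not the runtime's implementation: comm_task_req, trampoline, create_bound_task and record_affinity are hypothetical names, the worker function stands in for chpl_task_wrapper, and the caller joins the short-lived first thread purely to keep the example simple. It relies on the documented Linux behavior that a thread created with pthread_create inherits its creator's CPU affinity mask.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stddef.h>

/* Hypothetical request record; none of these names come from Chapel. */
typedef struct {
    cpu_set_t cpus;        /* core(s) the worker should be bound to */
    void *(*fn)(void *);   /* worker body (stands in for chpl_task_wrapper) */
    void *arg;             /* argument passed to the worker */
    pthread_t worker;      /* filled in by the trampoline */
    int err;               /* 0 on success, else an error number */
} comm_task_req;

/* Runs on the short-lived first thread. */
static void *trampoline(void *p) {
    comm_task_req *req = p;
    /* Bind this thread first, so the thread it creates inherits the mask. */
    req->err = pthread_setaffinity_np(pthread_self(),
                                      sizeof(req->cpus), &req->cpus);
    if (req->err == 0) {
        /* The worker starts out already bound to req->cpus. */
        req->err = pthread_create(&req->worker, NULL, req->fn, req->arg);
    }
    return NULL;           /* the trampoline thread exits here */
}

/* Launch fn(arg) on a thread bound to req->cpus. Returns 0 on success. */
static int create_bound_task(comm_task_req *req) {
    pthread_t t;
    int rc;
    assert(req != NULL && req->fn != NULL);
    rc = pthread_create(&t, NULL, trampoline, req);
    if (rc != 0) return rc;
    rc = pthread_join(t, NULL);   /* wait until the worker is launched */
    return rc != 0 ? rc : req->err;
}

/* Example worker: copies its own CPU mask into *out for inspection. */
static void *record_affinity(void *out) {
    pthread_getaffinity_np(pthread_self(), sizeof(cpu_set_t),
                           (cpu_set_t *)out);
    return NULL;
}
```

A caller would fill in cpus with CPU_ZERO and CPU_SET, set fn and arg, call create_bound_task, and later join req.worker. Binding before creation, rather than having the worker bind itself after it starts, is what lets the second thread begin life on the intended core(s).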

[Reviewed by @bonachea, @PHHargrove, and @jabraham17. Thank you.]

Compare: Comparing 38d3894f650b3f0b84301174286d23d9379baad5...e1270aa8b647c79cb41fcc9b42c1772dbbc56f25 · chapel-lang/chapel · GitHub

Diff:
M runtime/include/tasks/qthreads/chpl-tasks-impl-fns.h
M runtime/src/comm/gasnet/comm-gasnet-ex.c
M runtime/src/comm/gasnet/comm-gasnet.c
M runtime/src/tasks/qthreads/tasks-qthreads.c
M runtime/src/topo/hwloc/topo-hwloc.c
M third-party/gasnet/Makefile
https://github.com/chapel-lang/chapel/pull/25140.diff