20183, "ronawho", "Status of Chapel on ARM macs", "2022-07-08T20:20:05Z"
opened 08:20PM - 08 Jul 22 UTC
TL;DR -- Chapel works on ARM-based macs, but requires gasnet-udp for simulating multiple locales and may have suboptimal performance since we can't currently use our preferred codegen and runtime components.
---
As of https://github.com/chapel-lang/chapel/pull/20160, Chapel is fully functional on ARM-based macs. We have nightly testing that passes all tests under `CHPL_COMM=none` and all multi-locale tests under `CHPL_COMM=gasnet` with the [portable UDP conduit](https://chapel-lang.org/docs/platforms/udp.html) and local spawning. However, there are some configurations that don't work and some suboptimal configuration choices that limit performance.
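For readers who want to reproduce the multi-locale setup described above, here is a minimal sketch of the environment it implies. The variable names are the standard Chapel and GASNet settings as I understand them from the UDP conduit documentation; treating these exact values as what nightly testing uses is an assumption, not something stated in this issue.

```shell
# Sketch only: environment for simulating multiple locales on a single Mac.
# The values below are assumptions based on the Chapel UDP conduit docs.

# Use GASNet for communication instead of the single-locale default (none).
export CHPL_COMM=gasnet

# Select the portable UDP conduit.
export CHPL_COMM_SUBSTRATE=udp

# Spawn all locales as local processes on this machine.
export GASNET_SPAWNFN=L
```

With these set, the Chapel runtime needs to be rebuilt before a program compiled with `chpl` can be run on more than one locale, for example with `-nl 2`.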
We're using the C backend instead of our preferred LLVM backend for code generation due to issues discovered in https://github.com/chapel-lang/chapel/issues/19218. We're working on this and tracking the status internally in https://github.com/Cray/chapel-private/issues/3533. There likely aren't any major performance downsides to this, but we'd still like to default to the LLVM backend like we do everywhere else for consistency.
We're also using fifo tasking instead of our preferred qthreads implementation. fifo maps each task to a pthread, whereas qthreads multiplexes tasks over system threads in user space. Normally multiplexing results in substantial performance benefits, but qthreads currently lacks native assembly for task creation and switching, and the system-based `ucontext` has turned out to be slower than expected. Details are in https://github.com/chapel-lang/chapel/pull/20115 and we're tracking this internally in https://github.com/Cray/chapel-private/issues/3531. I'm hopeful that qthreads will add native asm support for their 1.18 release sometime later this year and we can then switch back to qthreads.
For memory allocation, we're using cstdlib instead of our preferred jemalloc. jemalloc tends to provide much faster concurrent allocations, and it's required to support configurations that use a fixed heap (gasnet with segment large/fast). We ran into undiagnosed issues when using jemalloc for multi-locale configurations in https://github.com/chapel-lang/chapel/issues/17825. For the moment we've decided to fall back to cstdlib, which limits performance and also means gasnet segment fast/large will not work. As a result, gasnet-udp with segment everything is the supported mechanism for testing multi-locale Chapel on ARM macs at the moment. This is tracked internally at https://github.com/Cray/chapel-private/issues/3532. I'm hoping to dig into this issue before the Chapel 1.28 release, but do not have an exact timeline for now.
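The codegen, tasking, and memory fallbacks described above correspond to the settings sketched below. These should already be the defaults on ARM macs, so setting them explicitly is only for illustration. Using `CHPL_LLVM=none` as the way to select the C backend is my assumption; the issue does not say which variable is used.

```shell
# Sketch only: the fallback configuration described in this issue, spelled out.
# These are expected to be the platform defaults; values are assumptions.

# Build without LLVM support, so code generation goes through the C backend.
export CHPL_LLVM=none

# One pthread per task, instead of qthreads' user-space multiplexing.
export CHPL_TASKS=fifo

# System allocator instead of jemalloc.
export CHPL_MEM=cstdlib

# No fixed heap: segment fast/large need jemalloc, so use segment everything.
export CHPL_GASNET_SEGMENT=everything
```

Running `$CHPL_HOME/util/printchplenv` shows the values Chapel actually picked, which is a quick way to confirm whether a given install matches this configuration.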
Separately, `mason external` does not currently work because we're using an older spack version. This is tracked internally at https://github.com/Cray/chapel-private/issues/3518.