Chapel on an open-science Nvidia GPU + ARM system

Hi Chapel User Community —

At the OpenSHMEM 2021 workshop today, there were some questions from the audience about running Chapel and Arkouda on a new 'open-science' system being procured that features Nvidia GPUs and ARM-based CPUs. It was proposed that we kick off a thread in the public community to continue the discussion, so this post is intended to be a place for that, for those who are interested in chiming in or contributing.

Next week's Chapel 1.25 release will feature a nice step forward in terms of code generation from Chapel to Nvidia GPUs, though it's definitely still early days in that effort. Jim Dinan and Oscar Hernandez from Nvidia have offered to serve as friendly faces (time permitting) when we have Nvidia questions or challenges around best practices as we continue to wrestle with that effort going forward [tagging @e-kayrakli, @stonea, @diten, @mppf, and @mstrout who are heading up that effort].

Chapel support for ARM is generally in good shape, though after the Q&A wrapped up, @ronawho reminded me that Qthreads currently lacks native assembly context switches on ARM, which can hurt performance for applications that rely heavily on task-switching in performance-critical sections. For the Chapel aggregators we were describing today, this would most likely show up in processing the 'on-clauses' (active messages) that implement the aggregators' flushBuffer() operations. If that turns out to be a bottleneck, we should check with the Sandia team to see where this is on their roadmap.

Also mentioned in the discussion was a possible opportunity to rewrite a new Astrophysics code from scratch. That will likely ultimately deserve a thread of its own, but I thought I'd mention it here while we're spinning this conversation up, and tag @npadmana (Yale Astrophysics Professor and Chapel enthusiast) since it may catch his interest (though I recognize that Astrophysics is a big and diverse field).

That's my opening summary, where I've tried to err on the side of vagueness to not call anyone or anything out prematurely. Others who are interested in this topic and effort should feel encouraged to introduce themselves and provide additional detail, as appropriate.



Happy to help! :wave:

LANL has several Arm CPU-only systems. We also have at least one larger system which is AMD + Nvidia GPUs. If we can help improve GPU support in Chapel/Arkouda, that would be very useful.

For Qthreads, work is in progress on native (assembly-based) context switching on arm64. The armv8-fastcontext-switch branch of the Qthreads/qthreads repository on GitHub includes the assembly code that we are working on.

We're getting closer. Some but not all of the tests are passing, so there are still some wrinkles to work out. This past month we have focused on getting the latest Qthreads release out, which we had to do for an ECP milestone, so that diverted attention for a time.

At SNL we have lots of Arm nodes in several systems on various networks, so the limiting factor for us is not systems to test on but developer effort and eyes on code -- debugging this stuff is hairy, of course.

If the new LANL system is a point of collaboration for accelerating this work, perhaps we should see if George Stelle (Qthreads team alum and now LANL staff member) has any interest and available cycles?

I will ping him and see what can be done. Thanks!!

I contacted George. He is more than willing to continue to engage. So he is still interested; he just changed his zip code, not his area code. :wink:

Just joined! Looking forward to the conversations... Oscar