Chapel deployment

Are there any standard ways to deploy Chapel applications?

We have discussed it very briefly with Brad (@bradcray), and it seems that distributing sources is the most common approach.

It so happens that scientists who use my code are not too experienced with various toolchains/compilers/etc. If I tell them to compile a pure-Chapel program and provide detailed instructions, they just might agree to do it. However, the moment external libraries get involved and users need to compile or module load them, they'll give up...

Potential solutions include

  • Static executables: one just has to download a file, give it exec permissions, and run. Then, as long as the application bundles a bunch of microkernels and dispatches between them automatically, one can nicely benefit from SIMD when it's supported. This is the approach that I've used in the past for single-node applications. It doesn't seem to work for the multi-locale case though, since now we have at least two executables (launcher & worker). AppImage and similar tools can be used to bundle those files together, but what about communication?
    We have to recompile our application if we change the value of CHPL_COMM and bundling different communication protocols together inside one executable doesn't seem to be supported. This brings us to
  • Containers (Singularity): it would be great if we could bundle all external dependencies and have pre-compiled binaries for various CHPL_COMM values. The container would then automatically pick the right one depending on the underlying hardware. And compared to Docker, Singularity doesn't put a lot of stuff between network libraries and the actual application, so very little (if any) performance is lost. That's all wishful thinking though as I have no idea how to achieve this in practice.
    I know machine learning folks do this by, e.g., bundling the CUDA toolkit but using the system CUDA driver. That way, one only needs to compile and test the code for a specific CUDA toolkit version rather than trying to locate it on every platform. And no performance is lost since the right driver is used in the end.
    Could we perhaps do something similar with Chapel? I.e. compiling GASNet once and then somehow letting the dynamic linker choose the proper libraries?

These are just some thoughts and I'd be very curious to hear what other people with more experience in Chapel think and have come up with in the past.

Hi Tom —

Welcome to Discourse! With respect to Singularity, @ronawho mentioned that one of our users has done a bit with Chapel and Singularity, and shared his notes, but it's not immediately obvious to me whether it was a Chapel executable within Singularity or Chapel itself (at a glance, it seems to at least be the second). I was hoping to tag him on this thread, but am not finding his Discourse ID quickly, which is surprising to me. I'll mail and see if he can tag on or if it's OK to share his notes.

One reminder I'm hoping the development team can help me with is how well or poorly Chapel binaries relying on GASNet can be moved between systems, where my memory is that GASNet ties itself pretty closely to a system's capabilities, paths, and network capabilities. For a binary, the "paths" part probably doesn't matter, but the others may? This may be a question to take to the GASNet team themselves, but maybe Elliot or others have a quick opinion.

-Brad

I've been thinking about this a bit and I'm curious how many or which configurations you would want to support in a single Chapel executable bundle. Is it just different CHPL_COMM_SUBSTRATEs for GASNet? Or also supporting CHPL_COMM=ofi? And what about different launchers? The configuration space gets large very quickly!

I'm less familiar with the portability of libfabric, which relies on providers.

GASNet is linked to the Chapel executable statically, so except for host vs. target compatibility I wouldn't expect major issues. Other bundled third-party libraries are linked dynamically (GMP comes to mind), and perhaps whatever else your specific program is bringing in from the outside. As it is today, I don't think we would be able to use AppImage bundles with our bundled shared third-party libraries because, e.g., their RUNPATH is not relative (a requirement for AppImage).
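To make the RUNPATH point concrete, here is a rough sketch of a check for whether a library's search path is relocatable. It assumes that "relative" means every RUNPATH entry starts with $ORIGIN; the helper name runpath_is_relative is made up for illustration and is not part of Chapel or AppImage. On a real library the string could be obtained with something like `readelf -d libgmp.so | grep -E 'RUNPATH|RPATH'`.

```shell
# Succeeds if every colon-separated entry of a RUNPATH string starts with
# $ORIGIN, i.e. is resolved relative to the library's own location.
# Hypothetical helper for illustration only.
runpath_is_relative() (
  [ -n "$1" ] || exit 1     # treat an empty RUNPATH as "not relative"
  IFS=:                     # split on colons (local to this subshell)
  set -f                    # do not glob-expand the entries
  for entry in $1; do
    case $entry in
      '$ORIGIN'*|'${ORIGIN}'*) ;;   # relocatable entry, keep checking
      *) exit 1 ;;                  # absolute or empty entry
    esac
  done
  exit 0
)
```

For example, `runpath_is_relative '$ORIGIN/../lib'` succeeds, while a string containing an absolute directory such as `/opt/chapel/lib` fails.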

That points to using a container that would bundle these third party libraries and the various compiled chapel executables (and their _real counterparts) along with an entrypoint script that would dispatch appropriately depending on the end user's system. I'm not sure we could do this dispatching 100% accurately but perhaps the entrypoint script could at least list all the options. And then the only kicker is it also has to support running the *_real when it gets scheduled.

I think we could set something up to build these various configurations in a container to produce a container (i.e. the build process happens in a container and the output is the distributable container). If the build and run environments have the same base container, I'd feel more confident something wouldn't get crossed when running the final product.

Andrew's comment here makes me realize: It could make sense to disable third-party library capabilities that you're not relying on, like GMP (assuming you're not using bigint) or RE2 (assuming you're not using regular expressions). Tougher might be qthreads or jemalloc (if they are dynamically linked... I'm not sure offhand) because we like those options over the fallbacks of CHPL_TASKS=fifo/pthreads or CHPL_MEM=cstdlib due to the performance benefits. But if you found that for your application these options were a wash, that would reduce the dependencies (and size of the binary you were shipping).
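As a sketch, disabling the two optional libraries mentioned above would typically be done through Chapel's build-time environment variables. The variable names below are the ones used by Chapel releases around 1.25 as far as this sketch assumes (older releases used a different name for the regular expression setting), so they should be checked against the documentation of the release being built. They need to be set before building the runtime and compiling the program.

```shell
# Build-time settings that drop optional third-party libraries.
# Only appropriate if the program uses neither bigint nor regular expressions.
export CHPL_GMP=none   # no GMP, so no bigint support
export CHPL_RE2=none   # no RE2, so no regular expression support
```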

-Brad

FWIW I'd expect Singularity containers to neatly handle the static vs dynamic linking issues. But I myself don't have any experience trying to run one of these across different Infiniband systems.

@aconsroe-hpe I think different CHPL_COMM_SUBSTRATEs should suffice (ibv, udp, and smp). That way, testing on a laptop can be done with smp, running on local clusters can be done with udp, and people who need larger-scale jobs and have access to proper machines can use ibv. We don't often encounter Cray XC systems, hence no aries in the above list.
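A sketch of what building for these three substrates might look like inside the container: the helper below only prints one chpl invocation per substrate (a dry run). The source file name foo.chpl and the foo_<substrate> output naming are assumptions for illustration, and the sketch presumes the Chapel runtime has already been built for each CHPL_COMM_SUBSTRATE value.

```shell
# Print the compile command for each GASNet substrate (dry run).
# Pipe the output to `sh` to actually run the commands.
# Hypothetical helper; $1 is the source file, $2 the base name of the output.
print_build_commands() (
  src=$1
  name=$2
  for substrate in smp udp ibv; do
    echo "CHPL_COMM=gasnet CHPL_COMM_SUBSTRATE=$substrate chpl -o ${name}_$substrate $src"
  done
)
```

Calling `print_build_commands foo.chpl foo` prints three lines, one each for smp, udp and ibv.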

I think it's perfectly fine if the dispatching cannot be done automatically and the user has to pass in a command-line option to choose a specific substrate. Could you explain why running *_real could be problematic?

Using the same base container is definitely not a problem.

@bradcray dynamic libraries such as GMP require no special care when used inside Singularity containers. In my experience, they just work (or maybe I just got lucky :smiley: )

I am just thinking that if you have a single container executable, the runscript (docs) would be a script that does the dispatch to the right binary inside the container. That exe is just the launcher, so it is then doing something like srun ... /absolute/path/to/foo_real, but foo_real is inside the container. So the launcher would actually need to launch something like srun ... /absolute/path/to/foo_container --real --substrate=ibv (or maybe those go in env vars, but same idea). The runscript would then do the same dispatch logic as before, but call the foo_real binary.
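The dispatch described above could be sketched roughly as follows. This is only an illustration: the install directory /opt/app, the foo_<substrate> naming, and the helper name select_binary are assumptions, and the --real and --substrate flags are the hypothetical ones from the paragraph above, not existing Chapel options.

```shell
# Decide which binary inside the container should be run.
# Prints the path; a runscript would then exec it with the user's arguments.
APP_DIR=${APP_DIR:-/opt/app}   # assumed install location inside the container

select_binary() (
  substrate=smp                # default: single-node testing
  suffix=""
  for arg in "$@"; do
    case $arg in
      --real) suffix=_real ;;                        # run the worker, not the launcher
      --substrate=*) substrate=${arg#--substrate=} ;;
    esac
  done
  case $substrate in
    smp|udp|ibv) echo "$APP_DIR/foo_$substrate$suffix" ;;
    *) echo "unknown substrate: $substrate (expected smp, udp or ibv)" >&2
       exit 1 ;;
  esac
)
```

A real runscript would additionally have to strip these two flags from the argument list before handing the remaining arguments to the selected binary.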

Does that clarify? Were you imagining a different setup?

I would need some help from @ronawho perhaps to shed some light on what it looks like to customize a launcher script today and/or how hard it would be to add support for that.

Thanks, Andrew! Yes, that makes total sense. I'm still getting used to the launcher-worker architecture you guys have (launcher calling srun seems extremely confusing to me...).
Indeed it'd be great to learn a bit more about how launchers work under the hood.

Also, launchers seem to have a minimal footprint in terms of the dynamic libraries they need to run. Could we come up with a hack which would first build the foo launcher inside the container, copy it to the outside world, and then create a script named foo_real which would simply call the container (without srun and such)? Would something like this fool foo into thinking foo_real is a normal Chapel executable?

Ooh that sounds like a great hack!

I initially thought an even bigger hack would be to symlink foo_real to the container itself and let the runscript take care of everything, but I think for maximal portability we don't want to rely on the underlying launcher to preserve env variables which would be necessary to know whether runscript should act as foo or foo_real and which substrate to use.

I think generating a script like

#!/usr/bin/env bash
CHPL_COMM_SUBSTRATE=ibv CHPL_REAL=1 exec ./container "$@"

might do the trick. (note that CHPL_REAL is not a real chapel env var and would be checked by the runscript)
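Generating such a wrapper could be sketched like this. The helper name write_real_wrapper and its argument order are made up for illustration, and CHPL_REAL remains the hypothetical variable from above that only the container's runscript would interpret.

```shell
# Write an executable wrapper script that forwards to the container.
# $1: substrate, $2: path to the container, $3: path of the wrapper to create.
write_real_wrapper() (
  substrate=$1
  container=$2
  out=$3
  # \$@ is escaped so that it is expanded when the wrapper runs, not now.
  cat > "$out" <<EOF
#!/usr/bin/env bash
CHPL_COMM_SUBSTRATE=$substrate CHPL_REAL=1 exec "$container" "\$@"
EOF
  chmod +x "$out"
)
```

For example, `write_real_wrapper ibv ./container ./foo_real` would create a foo_real next to the launcher copied out of the container.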

In case you're not aware, running foo -v will at least print out the launcher command like srun ... foo_real that gives a bit of insight into what it's doing.

@aconsroe-hpe that -v option is really nice, thanks!

Now I wonder, do we actually need a launcher if we're using a container? I didn't know that CHPL_LAUNCHER=none was an option, but now I'm kind of liking it. It would allow us to generate just one executable (thus avoiding the "container calling container"-type problems), and we can ask the user to write a simple Slurm script, e.g.

#!/bin/bash
#SBATCH -N 2 --exclusive -n 2 -c 128 -t 1-00:00:00 ...
/path/to/container "$@"

The documentation says that CHPL_LAUNCHER=none doesn't work with Infiniband. What is the reason for it and is there a way to overcome it?

Hi Tom —

Not directly related to your latest response, but rather to this earlier mention of mine:

The following notes are from @npadmana about his experience running singularity with Chapel: My notes on running singularity with Chapel · GitHub in hopes that they may be of some benefit.

W.r.t. your latest post:

The documentation says that CHPL_LAUNCHER=none doesn't work with Infiniband. What is the reason for it and is there a way to overcome it?

My guess is that the mechanisms that we rely on to launch GASNet/ibv programs rely on some hidden arguments to be threaded from the GASNet launcher script to the executable (which would suggest that the launcher could be avoided, just at additional complexity for the user?). Let's check with @ronawho who has the most experience with Chapel on IB-based systems to see what he can tell us.

-Brad

Can you send a pointer to this documentation? Disclaimer that I don't have any experience with containers or running gasnet-ibv without a launcher, but at a high level GASNet needs something to tell it what nodes are participating and how it should exchange early set up information.

The 3 main supported ways of doing this today are with ssh, mpi, or pmi. With ssh gasnetrun_ibv will launch onto the compute nodes with ssh and then exchange set up information over sockets. With mpi, mpirun is used to launch and MPI is used to exchange set up information. With pmi something like slurm is used to launch and PMI (process management interface) is used to exchange set up information. Today our recommended approach is to use ssh because mpi can add overhead and pmi isn't always available.

Looking at the ssh code for an example, it builds up the list of nodes that will participate and stuffs those into an environment variable. It also builds up GASNET_SPAWN_ARGS, which determines which is the root node and other settings that can be different for different nodes. Maybe we could figure out what all environment variables need to be set and set those ourselves, but we'd have to reach out to the gasnet team to figure out how feasible this is.
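For reference, a sketch of how the choice between the three spawners is usually expressed through environment variables when the gasnetrun_ibv launcher is used. GASNET_IBV_SPAWNER and GASNET_SSH_SERVERS are assumed here to be the variables GASNet reads for this; the helper name use_spawner and the host names are made up, and the details should be verified against the GASNet ibv-conduit documentation. This does not cover the internal variables such as GASNET_SPAWN_ARGS that the launcher sets itself.

```shell
# Export the settings for one of the three spawners discussed above.
# Usage: use_spawner ssh "node01 node02"   (the host list only matters for ssh)
use_spawner() {
  case $1 in
    ssh)
      export GASNET_IBV_SPAWNER=ssh
      export GASNET_SSH_SERVERS="${2:-}"   # space-separated compute nodes
      ;;
    mpi|pmi)
      export GASNET_IBV_SPAWNER=$1
      ;;
    *)
      echo "unknown spawner: $1 (expected ssh, mpi or pmi)" >&2
      return 1
      ;;
  esac
}
```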

Do you know how you're expecting to launch the executable onto the compute nodes? If you're imagining a manual srun or something, maybe PMI support could be added to the container and gasnet could use that to exchange set up information.

@ronawho I got this impression from the last bullet point here Chapel Launchers — Chapel Documentation 1.25

Regarding PMI support in a container (to be honest, I've never done or even tried to do it before) and the link which Brad shared, let me play with it for a few days and I'll be back with an update/further questions.

So I've played around with Singularity and different launchers for a while, and here's what I came up with so far: GitHub - twesterhout/chapel-deployment-playground: Experiments with using Singularity to deploy multi-locale Chapel executables.

The README in the repository contains more information, but the gist is that I managed to create a Singularity container which compiles the hello6-taskpar-dist example and which can be run on a laptop, on a cluster with Ethernet between nodes, and on a cluster with Infiniband between nodes.

The container is heavily inspired by the work of @npadmana, but there are a few differences. OpenMPI is used instead of Intel MPI because a) I don't have a license to use Intel MPI and b) the clusters which I'm targeting have AMD Epyc CPUs. I also really like the fact that the apt package manager does the right thing, and I didn't have to compile MPI & PMIX from source. Finally, this configuration doesn't require setting any environment variables or creating binds.

However, when running on an Infiniband system, I get some weird error messages related to PMIX which I have no idea how to fix. They aren't fatal though, and the jobs run fine.

I'd really appreciate any feedback or suggestions on this.

The final step which is currently missing is creating a "release" container with just the necessary libraries to actually run the code. I expect this to greatly reduce the size of the container.

The repo now also contains a release.def file which can build a minimal container with just enough libraries to run the executables. This reduces the container size from ~635MB to just 16MB.

This is very cool, thanks for the update and notes Tom. The PMIX message you mention isn't at all familiar to me, nor is the source file it's referring to. On our team, @ronawho has the most experience with IB, but I don't know whether he would've seen it either. Googling, it looks like it is probably OpenMPI related? PMIX ERROR: ERROR in file gds_ds12_lock_pthread.c at line 206 · Issue #7516 · open-mpi/ompi · GitHub

-Brad

I'm not familiar with this offhand, but I found the same link Brad did and it seems like the warning is harmless and can be quieted with export PMIX_MCA_gds=hash

Elliot