This behavior is now fixed: after recompiling Chapel with the settings below, I can submit with a Slurm job script and it no longer launches additional jobs.
export CHPL_HOME="/freya/ptmp/mpa/adutt/chapel-multi_locale/chapel-2.3.0"
export CHPL_COMM="gasnet"
export CHPL_COMM_SUBSTRATE="ofi"
export FI_PROVIDER="psm2"
export CHPL_LAUNCHER="slurm-gasnetrun_ofi"
export GASNET_OFI_SPAWNER="mpi"
export HFI_NO_CPUAFFINITY=1
export CHPL_LLVM="bundled"
export CHPL_TARGET_CPU="skylake"
export OMPI_MCA_btl="tcp,self"
export CHPL_GASNET_CFG_OPTIONS="--with-mpi-cc=mpicc"
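For completeness, the rebuild itself was roughly the following (assuming the standard from-source make in $CHPL_HOME; the -j value and module set are just what my environment uses):
# With the CHPL_* variables above exported:
module load gcc/11 openmpi/4.1 cmake/3.28 doxygen/1.10.0
cd $CHPL_HOME
make -j 8   # rebuilds the compiler and a runtime matching the exported configuration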
Here is my job script:
#!/bin/bash
#SBATCH -t 0:10:0
#SBATCH --nodes=4
#SBATCH --exclusive
#SBATCH --partition=p.test
#SBATCH --output=output.chapel
module purge && module load gcc/11 openmpi/4.1 cmake/3.28 doxygen/1.10.0
export CHPL_HOME="/freya/ptmp/mpa/adutt/chapel-multi_locale/chapel-2.3.0"
export CHPL_COMM="gasnet"
export CHPL_COMM_SUBSTRATE="ofi"
export FI_PROVIDER="psm2"
export CHPL_LAUNCHER="slurm-gasnetrun_ofi"
export GASNET_OFI_SPAWNER="mpi"
export HFI_NO_CPUAFFINITY=1
export CHPL_LLVM="bundled"
export CHPL_TARGET_CPU="skylake"
export OMPI_MCA_btl="tcp,self"
export CHPL_GASNET_CFG_OPTIONS="--with-mpi-cc=mpicc"
$CHPL_HOME/bin/linux64-x86_64/chpl test-locales.chpl --fast -o test-locales
$CHPL_HOME/util/chplenv/printchplbuilds.py
# Set the Chapel program and dynamic number of locales
export PROG="./test-locales"
export ARGS="-nl $SLURM_NNODES"   # number of locales from the allocation; add --verbose for the verbose run shown below
# Run the Chapel program; the slurm-gasnetrun_ofi launcher built into the binary handles the actual spawn
echo "Running Chapel program with $SLURM_NNODES locales..."
echo $CHPL_HOME
echo $CHPL_LAUNCHER
echo $GASNET_OFI_SPAWNER
$PROG $ARGS
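I submit the script with sbatch in the usual way (the file name below is only illustrative):
sbatch run-test-locales.slurm   # illustrative name for the job script above
squeue -u $USER                 # with the fixed build, only this single job shows up; no extra launcher-spawned jobs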
In the output dump I get a warning/error, "error: The runtime has not been built for this configuration. Run $CHPL_HOME/util/chplenv/printchplbuilds.py for information on available runtimes.", but everything else seems to be okay!
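For anyone comparing configurations, these are, I believe, the relevant checks: printchplbuilds.py is what the error message itself points to, and printchplenv (with --all, if I remember the flag correctly) shows the configuration chpl will compile for:
$CHPL_HOME/util/printchplenv --all           # configuration the compiler targets
$CHPL_HOME/util/chplenv/printchplbuilds.py   # runtime builds present under $CHPL_HOME; its output is pasted below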
Any tips would be absolutely helpful, and I appreciate all the help from this Discourse forum, without which this wouldn't have been feasible.
Output without --verbose:
error: The runtime has not been built for this configuration. Run $CHPL_HOME/util/chplenv/printchplbuilds.py for information on available runtimes.
Running Chapel program with 4 locales...
/freya/ptmp/mpa/adutt/chapel-multi_locale/chapel-2.3.0
slurm-gasnetrun_ofi
mpi
(freyag01, 40)
(freyag02, 40)
(freyag03, 40)
(freyag04, 40)
Output with --verbose on:
error: The runtime has not been built for this configuration. Run $CHPL_HOME/util/chplenv/printchplbuilds.py for information on available runtimes.
<Current> 0
CHPL_TARGET_PLATFORM: linux64 linux64
CHPL_TARGET_COMPILER: llvm llvm
CHPL_TARGET_ARCH: x86_64 x86_64
CHPL_TARGET_CPU: skylake skylake
CHPL_LOCALE_MODEL: flat flat
CHPL_COMM: gasnet gasnet
CHPL_COMM_DEBUG: - +*
CHPL_COMM_SUBSTRATE: ofi ofi
CHPL_GASNET_SEGMENT: everything everything
CHPL_TASKS: qthreads qthreads
CHPL_TASKS_DEBUG: - -
CHPL_TIMERS: generic generic
CHPL_UNWIND: none none
CHPL_MEM: jemalloc jemalloc
CHPL_ATOMICS: cstdlib cstdlib
CHPL_HWLOC: bundled bundled
CHPL_HWLOC_DEBUG: - -
CHPL_HWLOC_PCI: enable enable
CHPL_RE2: bundled bundled
CHPL_AUX_FILESYS: none none
CHPL_LIB_PIC: none none
CHPL_SANITIZE_EXE: none none
MTIME: NA Feb 06 15:13
Running Chapel program with 4 locales...
/freya/ptmp/mpa/adutt/chapel-multi_locale/chapel-2.3.0
slurm-gasnetrun_ofi
mpi
/freya/ptmp/mpa/adutt/chapel-multi_locale/chapel-2.3.0/third-party/gasnet/install/linux64-x86_64-skylake-llvm-none-debug/substrate-ofi/seg-everything/bin/gasnetrun_ofi -n 4 -N 4 -c 0 -E SLURM_MPI_TYPE,CONDA_SHLVL,LS_COLORS,LD_LIBRARY_PATH,CONDA_EXE,HOSTTYPE,SLURM_NODEID,SLURM_TASK_PID,SSH_CONNECTION,SPACK_PYTHON,LESSCLOSE,SLURM_PRIO_PROCESS,XKEYSYMDB,LANG,SLURM_SUBMIT_DIR,WINDOWMANAGER,LESS,OMPI_MCA_io,HOSTNAME,CHPL_TARGET_CPU,OLDPWD,__MODULES_SHARE_MODULEPATH,CSHEDIT,ENVIRONMENT,PROG,GPG_TTY,OPENMPI_HOME,LESS_ADVANCED_PREPROCESSOR,GASNET_OFI_SPAWNER,MPI_PATH,COLORTERM,CHOLLA_DIR,SLURM_CELL,ROCR_VISIBLE_DEVICES,SLURM_PROCID,CHPL_LAUNCHER,SLURM_JOB_GID,MACHTYPE,SLURMD_NODENAME,JOB_TMPDIR,MINICOM,SLURM_TASKS_PER_NODE,_CE_M,QT_SYSTEM_DIR,OSTYPE,XDG_SESSION_ID,MODULES_CMD,HFI_NO_CPUAFFINITY,SLURM_NNODES,USER,PAGER,DOMAIN,PLUTO_DIR,MORE,CHPL_COMM_SUBSTRATE,PWD,SLURM_JOB_NODELIST,HOME,SLURM_CLUSTER_NAME,CONDA_PYTHON_EXE,SLURM_NODELIST,SLURM_GPUS_ON_NODE,HOST,SSH_CLIENT,CHPL_COMM,XNLSPATH,CPATH,XDG_SESSION_TYPE,KRB5CCNAME,SLURM_JOB_CPUS_PER_NODE,INTERACTIVE,XDG_DATA_DIRS,MPCDF_SUBMODULE_COMBINATIONS,SLURM_TOPOLOGY_ADDR,_CE_CONDA,LIBGL_DEBUG,SLURM_WORKING_CLUSTER,__MODULES_LMALTNAME,GCC_HOME,SLURM_JOB_NAME,PROFILEREAD,TMPDIR,LIBRARY_PATH,SLURM_JOB_GPUS,SLURM_JOBID,SLURM_CONF,LOADEDMODULES,FI_PROVIDER,SLURM_NODE_ALIASES,SLURM_JOB_QOS,SLURM_TOPOLOGY_ADDR_PATTERN,SSH_TTY,FROM_HEADER,MAIL,SLURM_CPUS_ON_NODE,SLURM_JOB_NUM_NODES,SLURM_MEM_PER_NODE,LESSKEY,SPACK_ROOT,SHELL,TERM,XDG_SESSION_CLASS,CMAKE_HOME,SLURM_JOB_UID,ARGS,__MODULES_LMCONFLICT,XCURSOR_THEME,LS_OPTIONS,SLURM_JOB_PARTITION,SLURM_JOB_USER,CUDA_VISIBLE_DEVICES,CHPL_LLVM,SHLVL,SLURM_SUBMIT_HOST,G_FILENAME_ENCODING,SLURM_JOB_ACCOUNT,MANPATH,AFS,CELL,MODULEPATH,CHPL_HOME,SLURM_GTIDS,LOGNAME,DBUS_SESSION_BUS_ADDRESS,CLUSTER,XDG_RUNTIME_DIR,SYS,CHPL_GASNET_CFG_OPTIONS,XDG_CONFIG_DIRS,PATH,SLURM_JOB_ID,_LMFILES_,MODULESHOME,PKG_CONFIG_PATH,INFOPATH,JOB_SHMTMPDIR,G_BROKEN_FILENAMES,HISTSIZE,CPU,DOXYGEN_HOME,SLURM_LOCALID,CVS_RSH,GPU_DEVICE_ORDINAL,LESSOPEN,OMPI_MCA_btl,BASH_FUNC_module%%,BASH_FUNC_spack%%,BASH_FUNC__module_raw%%,BASH_FUNC__spack_shell_wrapper%%,BASH_FUNC_mc%%,BASH_FUNC_ml%%,_, /freya/ptmp/mpa/adutt/chapel-multi_locale/test-locales_real -nl 4 --verbose
0: using core(s) 0-39
oversubscribed = False
1: using core(s) 0-39
2: using core(s) 0-39
3: using core(s) 0-39
QTHREADS: Using 40 Shepherds
QTHREADS: Using 1 Workers per Shepherd
QTHREADS: Using 8384512 byte stack size.
QTHREADS: Using 40 Shepherds
QTHREADS: Using 1 Workers per Shepherd
QTHREADS: Using 8384512 byte stack size.
QTHREADS: Using 40 Shepherds
QTHREADS: Using 1 Workers per Shepherd
QTHREADS: Using 40 Shepherds
QTHREADS: Using 1 Workers per Shepherd
QTHREADS: Using 8384512 byte stack size.
QTHREADS: Using 8384512 byte stack size.
comm task bound to accessible PUs
PSHM is disabled.
executing locale 0 of 4 on node 'freyag01'
0: enter barrier for 'barrier before main'
executing locale 3 of 4 on node 'freyag04'
3: enter barrier for 'barrier before main'
executing locale 2 of 4 on node 'freyag03'
2: enter barrier for 'barrier before main'
executing locale 1 of 4 on node 'freyag02'
1: enter barrier for 'barrier before main'
3: enter barrier for 'fill node 0 globals buf'
0: enter barrier for 'fill node 0 globals buf'
2: enter barrier for 'fill node 0 globals buf'
1: enter barrier for 'fill node 0 globals buf'
0: enter barrier for 'broadcast global vars'
2: enter barrier for 'broadcast global vars'
3: enter barrier for 'broadcast global vars'
1: enter barrier for 'broadcast global vars'
1: enter barrier for 'pre-user-code hook: init done'
2: enter barrier for 'pre-user-code hook: init done'
3: enter barrier for 'pre-user-code hook: init done'
0: enter barrier for 'pre-user-code hook: init done'
0: enter barrier for 'pre-user-code hook: task counts stable'
1: enter barrier for 'pre-user-code hook: task counts stable'
3: enter barrier for 'pre-user-code hook: task counts stable'
0: enter barrier for 'pre-user-code hook: mem tracking inited'
2: enter barrier for 'pre-user-code hook: task counts stable'
1: enter barrier for 'pre-user-code hook: mem tracking inited'
3: enter barrier for 'pre-user-code hook: mem tracking inited'
(freyag01, 40)
2: enter barrier for 'pre-user-code hook: mem tracking inited'
(freyag02, 40)
(freyag03, 40)
(freyag04, 40)
0: enter barrier for 'stop polling'
1: enter barrier for 'stop polling'
2: enter barrier for 'stop polling'
3: enter barrier for 'stop polling'