20293, "ShreyasKhandekar", "Creating tuples in functions called from the body of forall loops makes it ineligible for the GPU", "2022-07-22T23:37:58Z"
Calling a function that creates a tuple from inside the body of a forall
loop makes the loop ineligible for GPU execution: no kernel is launched for it.
I observed this while working with the Sort benchmark of the SHOC suite.
Steps to Reproduce
Source Code:
use GPUDiagnostics;

on here.gpus[0] {
  startGPUDiagnostics();
  forall i in 1..<20 {
    createTuple();
  }
  stopGPUDiagnostics();
  writeln(getGPUDiagnostics());

  proc createTuple() {
    var t : (int,);
  }
}
Running the above code produces the following output:
(kernel_launch = 0)
Commenting out the line that creates the tuple inside the function instead gives:
(kernel_launch = 1)
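For comparison, here is a minimal sketch of that control variant. It uses only the calls already present in the reproducer, with the tuple declaration commented out. It is assumed to be built with the same configuration listed below (CHPL_LOCALE_MODEL=gpu); the output noted in the comment is the one reported above, not an independently verified result.

```chapel
use GPUDiagnostics;

on here.gpus[0] {
  startGPUDiagnostics();
  // Same loop as the reproducer: the body only calls createTuple().
  forall i in 1..<20 {
    createTuple();
  }
  stopGPUDiagnostics();
  // Reported above to print (kernel_launch = 1) for this variant.
  writeln(getGPUDiagnostics());

  proc createTuple() {
    // Tuple creation disabled; this is the only difference from the reproducer.
    // var t : (int,);
  }
}
```

The two programs differ only in the tuple declaration, so comparing their kernel_launch counts isolates that line as the cause.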
Configuration Information
- Output of chpl --version:
chpl version 1.28.0 pre-release (a2c7053587)
built with LLVM version 13.0.0
Copyright 2020-2022 Hewlett Packard Enterprise Development LP
Copyright 2004-2019 Cray Inc.
(See LICENSE file for more details)
- Output of $CHPL_HOME/util/printchplenv --anonymize:
CHPL_TARGET_PLATFORM: cray-xc
CHPL_TARGET_COMPILER: llvm *
CHPL_TARGET_ARCH: x86_64
CHPL_TARGET_CPU: native *
CHPL_LOCALE_MODEL: gpu *
CHPL_COMM: none *
CHPL_TASKS: qthreads
CHPL_LAUNCHER: slurm-srun *
CHPL_TIMERS: generic
CHPL_UNWIND: none
CHPL_MEM: jemalloc
CHPL_ATOMICS: cstdlib
CHPL_GMP: bundled
CHPL_HWLOC: bundled
CHPL_RE2: bundled
CHPL_LLVM: system *
CHPL_AUX_FILESYS: none
- Back-end compiler and version, e.g. gcc --version or clang --version:
gcc (GCC) 11.2.0 20210728 (Cray Inc.)
- (For Cray systems only) Output of module list:
Currently Loaded Modulefiles:
1) modules/3.2.11.4
2) craype-network-aries
3) nodestat/2.3.89-7.0.4.0_34.8__g8645157.ari
4) sdb/3.3.821-7.0.4.0_28.13__g8c59c9d.ari
5) udreg/2.3.2-7.0.4.0_37.11__g5f0d670.ari
6) ugni/6.0.14.0-7.0.4.0_28.11__ge0d449e.ari
7) gni-headers/5.0.12.0-7.0.4.0_38.14__gd0d73fe.ari
8) dmapp/7.1.1-7.0.4.0_40.13__gcec52bc.ari
9) xpmem/2.2.29-7.0.4.0_50.10__g35859a4.ari
10) llm/21.4.635-7.0.4.0_46.8__g33a55bc.ari
11) nodehealth/5.6.32-7.0.4.0_81.14__g66010cb.ari
12) system-config/3.6.3214-7.0.4.0_58.2__gcc05884c.ari
13) slurm/20.11.5-1
14) Base-opts/2.4.142-7.0.4.0_43.5__g8f27585.ari
15) cray-mpich/7.7.20
16) dws/3.0.38-7.0.4.0_69.9__gd993441.ari
17) cudatoolkit/10.2.89_3.28-7.0.3.0_2.66__g52c0314
18) gcc/11.2.0
19) craype/2.7.17.1
20) cray-libsci/20.09.1
21) pmi/5.0.17
22) atp/3.14.9
23) rca/2.2.22-7.0.4.0_27.13__ged51428.ari
24) perftools-base/22.04.0
25) PrgEnv-gnu/6.0.11