New Issue: Creating tuples in functions called from the body of forall loops makes it ineligible for the GPU

20293, "ShreyasKhandekar", "Creating tuples in functions called from the body of forall loops makes it ineligible for the GPU", "2022-07-22T23:37:58Z"

Calling a function that creates a tuple from inside the body of a forall loop makes the loop ineligible for GPU execution.

I observed this while working with the Sort benchmark of the SHOC suite.

Steps to Reproduce

Source Code:

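use GPUDiagnostics;

// Run on the first GPU sublocale so the forall is a candidate for a GPU kernel.
on here.gpus[0] {
    startGPUDiagnostics();
    forall i in 1..<20 {
        createTuple();
    }
    stopGPUDiagnostics();
    // Prints the kernel launches recorded between start and stop.
    writeln(getGPUDiagnostics());

    proc createTuple() {
        var t : (int,);  // declaring this tuple is what prevents the kernel launch
    }
}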

Running the above code produces the following output, showing that no GPU kernel was launched:

(kernel_launch = 0)

Commenting out the line that creates the tuple inside the function (var t : (int,);) produces this output instead:

(kernel_launch = 1)

Configuration Information

  • Output of chpl --version:
chpl version 1.28.0 pre-release (a2c7053587)
  built with LLVM version 13.0.0
Copyright 2020-2022 Hewlett Packard Enterprise Development LP
Copyright 2004-2019 Cray Inc.
(See LICENSE file for more details)
  • Output of $CHPL_HOME/util/printchplenv --anonymize:
CHPL_TARGET_PLATFORM: cray-xc
CHPL_TARGET_COMPILER: llvm *
CHPL_TARGET_ARCH: x86_64
CHPL_TARGET_CPU: native *
CHPL_LOCALE_MODEL: gpu *
CHPL_COMM: none *
CHPL_TASKS: qthreads
CHPL_LAUNCHER: slurm-srun *
CHPL_TIMERS: generic
CHPL_UNWIND: none
CHPL_MEM: jemalloc
CHPL_ATOMICS: cstdlib
CHPL_GMP: bundled
CHPL_HWLOC: bundled
CHPL_RE2: bundled
CHPL_LLVM: system *
CHPL_AUX_FILESYS: none
  • Back-end compiler and version, e.g. gcc --version or clang --version:
gcc (GCC) 11.2.0 20210728 (Cray Inc.)
  • (For Cray systems only) Output of module list:
Currently Loaded Modulefiles:
  1) modules/3.2.11.4
  2) craype-network-aries
  3) nodestat/2.3.89-7.0.4.0_34.8__g8645157.ari
  4) sdb/3.3.821-7.0.4.0_28.13__g8c59c9d.ari
  5) udreg/2.3.2-7.0.4.0_37.11__g5f0d670.ari
  6) ugni/6.0.14.0-7.0.4.0_28.11__ge0d449e.ari
  7) gni-headers/5.0.12.0-7.0.4.0_38.14__gd0d73fe.ari
  8) dmapp/7.1.1-7.0.4.0_40.13__gcec52bc.ari
  9) xpmem/2.2.29-7.0.4.0_50.10__g35859a4.ari
 10) llm/21.4.635-7.0.4.0_46.8__g33a55bc.ari
 11) nodehealth/5.6.32-7.0.4.0_81.14__g66010cb.ari
 12) system-config/3.6.3214-7.0.4.0_58.2__gcc05884c.ari
 13) slurm/20.11.5-1
 14) Base-opts/2.4.142-7.0.4.0_43.5__g8f27585.ari
 15) cray-mpich/7.7.20
 16) dws/3.0.38-7.0.4.0_69.9__gd993441.ari
 17) cudatoolkit/10.2.89_3.28-7.0.3.0_2.66__g52c0314
 18) gcc/11.2.0
 19) craype/2.7.17.1
 20) cray-libsci/20.09.1
 21) pmi/5.0.17
 22) atp/3.14.9
 23) rca/2.2.22-7.0.4.0_27.13__ged51428.ari
 24) perftools-base/22.04.0
 25) PrgEnv-gnu/6.0.11