22343, "psath", "GPU references to members of CPU-initialized class instances cause uninitialized read with 'local{}' wrapper and runtime fault without", "2023-05-18T19:50:14Z"
opened 07:50PM - 18 May 23 UTC
type: Bug
user issue
area: GPU Support
### Summary of Problem
Expected behavior: the class instance is copied to the GPU via a temp/argument and its members can be referenced, as already works for records. (EDIT: records are spotty; they seem to work with or without `local { ... }` *only if* there are no class instance references in the kernel.) Alternatively, the compiler should emit an error about 'CPU class instance member access'.
The member reference should not silently yield zeros when the loop is wrapped with `local { ... }`.
The program should not fault at runtime when the loop is **not** wrapped with `local { ... }`.
### Steps to Reproduce
Create an instance of a class on the CPU, refer to one of its scalar members within a GPUized `forall` loop.
(I added the `local`-wrapped loop in the middle because it has helped when dealing with pointers on `CHPL_LOCALE_MODEL=gpu` before: https://chapel.discourse.group/t/1-29-0-cannot-deref-a-c-ptrto-unmanaged-class-instance-comm-none-assertion-node-0-failed/19696/3?u=psath)
The value of `CHPL_GPU_MEM_STRATEGY` does not seem to matter.
**Source Code:**
[gpuUnmanagedMemberRef.chpl.txt](https://github.com/chapel-lang/chapel/files/11511319/gpuUnmanagedMemberRef.chpl.txt)
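
For readers without the attachment, a minimal sketch of the kind of program described in the steps above might look like the following. It is an illustrative reconstruction, not the attached file: the class name `Params`, its field names, the array names, and the `i % 3` selection are all assumptions. Only the three values, the 64-element length, and the "arrays differ" message are taken from the reported output.

```chapel
// Hypothetical reconstruction; the attached gpuUnmanagedMemberRef.chpl.txt may differ.
class Params {
  var a: int = 1234;
  var b: int = 5678;
  var c: int = 8675309;
}

proc main() {
  const n = 64;
  var p = new unmanaged Params();   // instance allocated in CPU memory

  var cpuArr, gpuLocalArr, gpuArr: [0..#n] int;

  // CPU reference loop: cycle through the three scalar members.
  forall i in 0..#n do
    cpuArr[i] = if i % 3 == 0 then p.a else if i % 3 == 1 then p.b else p.c;

  on here.gpus[0] {
    var devArr: [0..#n] int;

    // Variant 1: GPU-eligible loop wrapped in local { }.
    // The issue reports that the member reads come back as zeros here.
    local {
      forall i in 0..#n do
        devArr[i] = if i % 3 == 0 then p.a else if i % 3 == 1 then p.b else p.c;
    }
    gpuLocalArr = devArr;

    // Variant 2: the same loop without the local wrapper.
    // The issue reports an illegal memory access at runtime here.
    forall i in 0..#n do
      devArr[i] = if i % 3 == 0 then p.a else if i % 3 == 1 then p.b else p.c;
    gpuArr = devArr;
  }

  // Compare the CPU result against the local-wrapped GPU result.
  const differ = || reduce (cpuArr != gpuLocalArr);
  if differ {
    writeln("CPU and GPU_local arrays differ!");
    writeln("CPU: ", cpuArr);
    writeln("GPU_local: ", gpuLocalArr);
  }

  delete p;
}
```

The sketch keeps `n` and `p` local to `main` so that the only CPU-resident object referenced inside the kernels is the class instance itself.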
**Actual output:**
```
CPU and GPU_local arrays differ!
CPU: 1234 5678 8675309 1234 5678 8675309 1234 5678 8675309 1234 5678 8675309 1234 5678 8675309 1234 5678 8675309 1234 5678 8675309 1234 5678 8675309 1234 5678 8675309 1234 5678 8675309 1234 5678 8675309 1234 5678 8675309 1234 5678 8675309 1234 5678 8675309 1234 5678 8675309 1234 5678 8675309 1234 5678 8675309 1234 5678 8675309 1234 5678 8675309 1234 5678 8675309 1234 5678 8675309 1234
GPU_local: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
internal error: gpu-cuda.c:284: Error calling CUDA function: an illegal memory access was encountered (Code: 700)
```
**Expected output:** (None)
```
```
**Compile command:**
`chpl -g --devel --verify gpuUnmanagedMemberRef.chpl -o unmanagedMemberGPU`
**Execution command:**
`./unmanagedMemberGPU`
### Configuration Information
- Output of `chpl --version`:
```
chpl version 1.31.0 pre-release (d7664c9d81)
built with LLVM version 14.0.0
available LLVM targets: m68k, xcore, x86-64, x86, wasm64, wasm32, ve, systemz, sparcel, sparcv9, sparc, riscv64, riscv32, ppc64le, ppc64, ppc32le, ppc32, nvptx64, nvptx, msp430, mips64el, mips64, mipsel, mips, lanai, hexagon, bpfeb, bpfel, bpf, avr, thumbeb, thumb, armeb, arm, amdgcn, r600, aarch64_32, aarch64_be, aarch64, arm64_32, arm64
Copyright 2020-2023 Hewlett Packard Enterprise Development LP
Copyright 2004-2019 Cray Inc.
(See LICENSE file for more details)
```
- Output of `$CHPL_HOME/util/printchplenv --anonymize`:
```
CHPL_TARGET_PLATFORM: linux64
CHPL_TARGET_COMPILER: llvm
CHPL_TARGET_ARCH: x86_64
CHPL_TARGET_CPU: native
CHPL_LOCALE_MODEL: flat
CHPL_COMM: none
CHPL_TASKS: qthreads
CHPL_LAUNCHER: none
CHPL_TIMERS: generic
CHPL_UNWIND: none
CHPL_MEM: jemalloc
CHPL_ATOMICS: cstdlib
CHPL_GMP: bundled
CHPL_HWLOC: bundled
CHPL_RE2: bundled
CHPL_LLVM: system
CHPL_AUX_FILESYS: none
```
My build is not in `$CHPL_HOME`, but I set the following when building the compiler/runtime:
```
export CHPL_LOCALE_MODEL=gpu
export CHPL_LLVM=system
export CHPL_HOST_COMPILER=clang
export CHPL_TARGET_COMPILER=llvm
```
- Back-end compiler and version, e.g. `gcc --version` or `clang --version`:
```
$ clang --version
Ubuntu clang version 14.0.0-1ubuntu1
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
```
- (For Cray systems only) Output of `module list`: