Hello!
This is Akihiro@GATech.
I'm very excited to see the prototype GPU codegen feature released in 1.24! Congratulations! I'd like to run the gpuAddNums.chpl example on one of my GPU machines, but it looks like the compiler does not generate a fatbin file. Any suggestions?
For more details, please see below:
- I'm trying to compile this program: chapel/gpuAddNums.chpl at master · chapel-lang/chapel · GitHub
- I gave the compiler the options in gpuAddNums.compopts.
- The only change I made was to change the compute capability option from sm_60 to sm_61. However, regardless of this, the compiler gives me this warning:
  warning: argument unused during compilation: '--cuda-gpu-arch=sm_61'
- To build Chapel, I used the tar.gz file, and here is the output of my printchplenv:
CHPL_TARGET_PLATFORM: linux64
CHPL_TARGET_COMPILER: gnu
CHPL_TARGET_ARCH: x86_64
CHPL_TARGET_CPU: native
CHPL_LOCALE_MODEL: flat
CHPL_COMM: none
CHPL_TASKS: qthreads
CHPL_LAUNCHER: none
CHPL_TIMERS: generic
CHPL_UNWIND: none
CHPL_MEM: jemalloc
CHPL_ATOMICS: cstdlib
CHPL_GMP: bundled
CHPL_HWLOC: bundled
CHPL_REGEXP: re2
CHPL_LLVM: bundled
CHPL_AUX_FILESYS: none
- Looks like the NVPTX backend is enabled in the bundled LLVM:
$CHPL_HOME/third-party/llvm/install/linux64-x86_64-gnu/bin/llc --version
LLVM (http://llvm.org/):
LLVM version 11.0.1
Optimized build.
Default target: x86_64-unknown-linux-gnu
Host CPU: skylake
Registered Targets:
aarch64 - AArch64 (little endian)
aarch64_32 - AArch64 (little endian ILP32)
aarch64_be - AArch64 (big endian)
arm64 - ARM64 (little endian)
arm64_32 - ARM64 (little endian ILP32)
nvptx - NVIDIA PTX 32-bit
nvptx64 - NVIDIA PTX 64-bit
x86 - 32-bit X86: Pentium-Pro and above
x86-64 - 64-bit X86: EM64T and AMD64
Please let me know if you need more information!
Thanks,
Akihiro
Hi Akihiro,
Thanks for trying out our GPU codegen feature! Could you try setting the environment variable CHPL_LOCALE_MODEL to gpu instead of flat? Let me know if the fatbin file is still not being generated.
Thanks,
Sarah
Hi Sarah,
Thank you for your prompt reply. With CHPL_LOCALE_MODEL=gpu, I was able to generate the fatbin file and run it on my GPU!
Just checking, is gpuAddNums.chpl the only GPU example so far?
Thanks,
Akihiro
Great! Yes, that's the only GPU example we have so far. We're currently working on more complicated GPU examples for the near future.
Sarah
Hi Sarah,
Sorry, I've run into another problem when linking my Chapel module that has an external C function with CHPL_LOCALE_MODEL=gpu.
Could you take a look at it when you get a chance?
I created a simple program (w/o the GPU code generation) that reproduces the problem:
// CPrint.chpl
module CPrint {
extern proc printHello();
}
// test.chpl
use CPrint;
printHello();
// test.h
void printHello();
// test.c
#include <stdio.h>
void printHello() { printf("Hello from C\n"); }
With CHPL_LOCALE_MODEL=flat, I was able to compile and run it:
$ CHPL_LOCALE_MODEL=flat chpl --llvm --ccflags --cuda-gpu-arch=sm_60 --savec=tmp test.h test.o test.chpl -L/usr/local/cuda/lib64
warning: argument unused during compilation: '--cuda-gpu-arch=sm_60'
$ ./test
Hello from C
However, with CHPL_LOCALE_MODEL=gpu, I got a linker error saying that the external C function is not found:
$ CHPL_LOCALE_MODEL=gpu chpl --llvm --ccflags --cuda-gpu-arch=sm_60 --savec=tmp test.h test.o test.chpl -L/usr/local/cuda/lib64
warning: Unknown CUDA version. version.txt: 10.2.89. Assuming the latest supported version 10.1
warning: Unknown CUDA version. version.txt: 10.2.89. Assuming the latest supported version 10.1
tmp/chpl__module.o: In function `chpl__init_test':
root:(.text+0x128d10): undefined reference to `printHello()'
clang-11: error: linker command failed with exit code 1 (use -v to see invocation)
error: Make Binary - Linking
I tried to inspect the linker command by adding --ldflags -v, but apparently 1) test.o, which should include the body of printHello(), is linked, and 2) there is no significant difference between the flat and gpu versions (it looks like -lnuma is added to the gpu version, though).
Please let me know if you have any suggestions.
Best regards,
Akihiro
Hi Akihiro,
I'm able to reproduce the issue and will work on fixing it. Thanks for bringing it up. In the meantime, if it helps, you should be able to run the following:
extern {
#include <stdio.h>
void printHello() { printf("Hello from C\n"); }
}
printHello();
with both CHPL_LOCALE_MODEL=flat and CHPL_LOCALE_MODEL=gpu.
Hi Sarah,
Thank you very much for reproducing the issue and working on fixing it! Sure, for now, I'll do the workaround. Thank you!
Best regards,
Akihiro
Hi Akihiro,
I'm still working on a fix, but I have another workaround. If you modify your test.h file to
// test.h
#ifdef __cplusplus
extern "C" {
#endif
void printHello();
#ifdef __cplusplus
}
#endif
you should be able to compile test.chpl with CHPL_LOCALE_MODEL=gpu, i.e.:
CHPL_LOCALE_MODEL=gpu chpl --llvm --ccflags --cuda-gpu-arch=sm_60 --savec=tmp test.h test.o test.chpl -L/usr/local/cuda/lib64
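A note on why this guard likely helps (an interpretation, not something confirmed in this thread): the linker message earlier names the missing symbol as printHello(), with a parameter list, which is how a C++-mangled symbol is usually displayed. That would be consistent with test.h being parsed in CUDA/C++ mode under CHPL_LOCALE_MODEL=gpu, while test.o, built by a C compiler, exports the plain symbol printHello. The sketch below combines test.h and test.c into a single C file to show the pattern. helloMessage is a hypothetical helper, not part of the original files, added only so the text can be checked without capturing stdout.

```c
/* Sketch: test.h and test.c combined into one translation unit. */
#include <stdio.h>

/* --- roughly what test.h would contain --- */
#ifdef __cplusplus
extern "C" {   /* only visible to a compiler running in C++ (or CUDA) mode */
#endif

void printHello(void);
const char *helloMessage(void);   /* hypothetical helper, not in the thread */

#ifdef __cplusplus
}
#endif

/* --- roughly what test.c would contain --- */
const char *helloMessage(void) {
    return "Hello from C\n";
}

void printHello(void) {
    /* Print the same text the original printf call produced. */
    fputs(helloMessage(), stdout);
}
```

Compiled as C, __cplusplus is undefined and the guard disappears. Compiled as C++ or CUDA, both declarations get C linkage, so a call to printHello refers to the unmangled symbol that test.o provides.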
Hi Sarah,
Thank you very much! It works! I think that's the solution I was looking for.
Best regards,
Akihiro
Hi Akihiro,
I recently implemented a fix for this issue on Chapel's master branch. You should now be able to compile test.chpl with CHPL_LOCALE_MODEL=gpu without using a workaround. Let me know if you run into any problems.
Thanks,
Sarah
Hi Sarah,
That's great to hear! Thank you very much.
Best regards,
Akihiro