How to use the prototype GPU codegen feature in 1.24?

Hello!

This is Akihiro@GATech.

I'm very excited to see the prototype GPU codegen feature released in 1.24! Congratulations! I'd like to run the gpuAddNums.chpl example on one of my GPU machines, but it looks like the compiler does not generate a fatbin file. Any suggestions?

For more details, please see below:

  1. I'm trying to compile this program: gpuAddNums.chpl from the chapel-lang/chapel repository on GitHub (master branch)

  2. I used the compiler options listed in gpuAddNums.compopts.

  3. The only change I made was updating the compute capability option from sm_60 to sm_61. However, regardless of which value I use, the compiler gives me this warning:
    warning: argument unused during compilation: '--cuda-gpu-arch=sm_61'

  4. To build Chapel, I used the tar.gz release; here is the output of my printchplenv:
    CHPL_TARGET_PLATFORM: linux64
    CHPL_TARGET_COMPILER: gnu
    CHPL_TARGET_ARCH: x86_64
    CHPL_TARGET_CPU: native
    CHPL_LOCALE_MODEL: flat
    CHPL_COMM: none
    CHPL_TASKS: qthreads
    CHPL_LAUNCHER: none
    CHPL_TIMERS: generic
    CHPL_UNWIND: none
    CHPL_MEM: jemalloc
    CHPL_ATOMICS: cstdlib
    CHPL_GMP: bundled
    CHPL_HWLOC: bundled
    CHPL_REGEXP: re2
    CHPL_LLVM: bundled
    CHPL_AUX_FILESYS: none

  5. It looks like the NVPTX backend is enabled in the bundled LLVM:
    $CHPL_HOME/third-party/llvm/install/linux64-x86_64-gnu/bin/llc --version
    LLVM (http://llvm.org/):
    LLVM version 11.0.1
    Optimized build.
    Default target: x86_64-unknown-linux-gnu
    Host CPU: skylake

    Registered Targets:
      aarch64 - AArch64 (little endian)
      aarch64_32 - AArch64 (little endian ILP32)
      aarch64_be - AArch64 (big endian)
      arm64 - ARM64 (little endian)
      arm64_32 - ARM64 (little endian ILP32)
      nvptx - NVIDIA PTX 32-bit
      nvptx64 - NVIDIA PTX 64-bit
      x86 - 32-bit X86: Pentium-Pro and above
      x86-64 - 64-bit X86: EM64T and AMD64

Please let me know if you need more information!

Thanks,

Akihiro

Hi Akihiro,

Thanks for trying out our GPU codegen feature! Could you try setting the environment variable CHPL_LOCALE_MODEL to gpu instead of flat? Let me know if the fatbin file is still not being generated.
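
In case it's useful, the full sequence would look roughly like this (a sketch; it assumes a source build under $CHPL_HOME and reuses the flags you've been using, since the runtime needs to be rebuilt after changing the locale model):

```shell
# Switch locale models and rebuild the runtime for the new setting.
export CHPL_LOCALE_MODEL=gpu
cd $CHPL_HOME
make

# Then recompile the example with the same flags as before.
chpl --llvm --ccflags --cuda-gpu-arch=sm_61 \
     gpuAddNums.chpl -L/usr/local/cuda/lib64
```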

Thanks,
Sarah

Hi Sarah,

Thank you for your prompt reply. With CHPL_LOCALE_MODEL=gpu, I was able to generate the fatbin file and run it on my GPU!

Just checking, is gpuAddNums.chpl the only GPU example so far?

Thanks,

Akihiro

Great! Yes, that's the only GPU example we have so far. We're currently working on more complex GPU examples and plan to release them in the near future.

Sarah

Great, thank you!

Best regards,

Akihiro

Hi Sarah,

Sorry, I've run into another problem when linking a Chapel module that has an external C function with CHPL_LOCALE_MODEL=gpu. Could you take a look when you get a chance?

I created a simple program (w/o the GPU code generation) that reproduces the problem:

// CPrint.chpl
module CPrint {
    extern proc printHello();
}
// test.chpl
use CPrint;
printHello();
// test.h
void printHello();
// test.c
#include <stdio.h>
void printHello() { printf("Hello from C\n"); }

With CHPL_LOCALE_MODEL=flat, I was able to compile and run it:

$ CHPL_LOCALE_MODEL=flat chpl --llvm --ccflags --cuda-gpu-arch=sm_60 --savec=tmp test.h test.o test.chpl -L/usr/local/cuda/lib64
warning: argument unused during compilation: '--cuda-gpu-arch=sm_60'
$ ./test
Hello from C

However, with CHPL_LOCALE_MODEL=gpu, I got a linker error saying that the external C function is not found:

CHPL_LOCALE_MODEL=gpu chpl --llvm --ccflags --cuda-gpu-arch=sm_60 --savec=tmp test.h test.o test.chpl -L/usr/local/cuda/lib64
warning: Unknown CUDA version. version.txt: 10.2.89. Assuming the latest supported version 10.1
warning: Unknown CUDA version. version.txt: 10.2.89. Assuming the latest supported version 10.1
tmp/chpl__module.o: In function `chpl__init_test':
root:(.text+0x128d10): undefined reference to `printHello()'
clang-11: error: linker command failed with exit code 1 (use -v to see invocation)
error: Make Binary - Linking

I added --ldflags -v to inspect the linker command, but apparently 1) test.o, which should contain the body of printHello(), is being linked, and 2) there is no significant difference between the flat and gpu invocations (although -lnuma is added in the gpu version).

Please let me know if you have any suggestions.

Best regards,

Akihiro

Hi Akihiro,

I'm able to reproduce the issue and will work on fixing it. Thanks for bringing it up. In the meantime, if it helps, you should be able to run the following:

extern {
  #include <stdio.h>
  void printHello() { printf("Hello from C\n"); }
}
printHello();

with both CHPL_LOCALE_MODEL=flat and CHPL_LOCALE_MODEL=gpu.

Hi Sarah,

Thank you very much for reproducing the issue and working on a fix! I'll use the workaround for now. Thanks again!

Best regards,

Akihiro

Hi Akihiro,

I'm still working on a fix, but I have another workaround: if you modify your test.h file to

// test.h
#ifdef __cplusplus
extern "C" {
#endif

void printHello();

#ifdef __cplusplus
}
#endif

you should be able to compile test.chpl with CHPL_LOCALE_MODEL=gpu (i.e. CHPL_LOCALE_MODEL=gpu chpl --llvm --ccflags --cuda-gpu-arch=sm_60 --savec=tmp test.h test.o test.chpl -L/usr/local/cuda/lib64)

Hi Sarah,

Thank you very much! It works! I think that's the solution I was looking for.

Best regards,

Akihiro

Hi Akihiro,

I recently implemented a fix for this issue on Chapel's master branch. You should now be able to compile test.chpl with CHPL_LOCALE_MODEL=gpu without using a workaround. Let me know if you run into any problems.

Thanks,
Sarah

Hi Sarah,

That's great to hear! Thank you very much.

Best regards,

Akihiro