25610, "e-kayrakli", "Some complex math functions don't compile with GPU support enabled", "2024-07-22T21:19:52Z"
opened 09:19PM - 22 Jul 24 UTC
type: Bug
area: Compiler
user issue
area: GPU Support
Reported in "Complex sine broken with gpu-enabled chapel": https://chapel.discourse.group/t/complex-sine-broken-with-gpu-enable… d-chapel/35562
GPU kernels don't support `complex` today, but this issue has nothing to do with GPU kernels. With GPU support enabled, the following:
```chpl
use Math only sin;
writeln(sin(0.0+1.0i));
```
fails to compile with NVIDIA, reporting `CHPL_HOME/modules/internal/ChapelStandard.chpl:24: error: Could not find C function for csin; perhaps it is missing or is a macro?`.
I can't tell exactly why `csin` is missing. CUDA ships complex headers, which should provide `csin`. I can reproduce the issue with:
```
chpl foo.chpl --print-commands
# record the last `clang` call
clang <args> foo.c # foo.c just calls `csin`
```
With the same `args` that `chpl` passes to `clang`, you get the same error. Drop `-x cuda` and it compiles fine. I don't see any noticeable difference in the `-v` output from clang. I feel like some `#ifdef`s get thrown off, either in the Chapel runtime or in the clang/CUDA headers. I suspect that `clang` invocation is missing a flag, but I don't know which one.
A potential workaround is to call the `builtin` versions of the missing complex functions, but that doesn't feel as satisfying as fixing the compilation itself. This patch:
```diff
diff --git a/modules/standard/Math.chpl b/modules/standard/Math.chpl
index 920b56b14b..3a67618681 100644
--- a/modules/standard/Math.chpl
+++ b/modules/standard/Math.chpl
@@ -1215,8 +1215,8 @@ module Math {
inline proc sin(x: complex(128)): complex(128) {
pragma "fn synchronization free"
pragma "codegen for CPU and GPU"
- extern proc csin(z: complex(128)): complex(128);
- return csin(x);
+ extern proc chpl_csin(z: complex(128)): complex(128);
+ return chpl_csin(x);
}
/*
diff --git a/runtime/include/chplmath.h b/runtime/include/chplmath.h
index 93be0bb0dd..8945638205 100644
--- a/runtime/include/chplmath.h
+++ b/runtime/include/chplmath.h
@@ -53,6 +53,8 @@ MAYBE_GPU static inline float chpl_sqrt32(float x) { return sqrtf(x); }
MAYBE_GPU static inline double chpl_fabs64(double x) { return fabs(x); }
MAYBE_GPU static inline float chpl_fabs32(float x) { return fabsf(x); }
+MAYBE_GPU static inline _complex128 chpl_csin(_complex128 x) { return __builtin_csin(x); }
+
// 32-bit Bessel functions aren't available on all platforms. For cases where
// we know they're available use them since they should be faster, but in other
// cases default to using the 64-bit versions and casting.
```
With the patch above applied, the snippet compiles successfully.