[1.30] Configuration and link errors when building on a system where 1.29 previously worked

I'm encountering some obscure issues getting up and running with the new 1.30 release. I'm using the exact same hand-written installer script (via configure) as I was with 1.29 (script and full log attached at bottom) and haven't changed any other software on the machine to my knowledge. I read the patch notes and didn't see any clues either.

psath@talos:~/chapelWorkspace/tool-installs$ uname -a
Linux talos 5.15.0-67-generic #74-Ubuntu SMP Wed Feb 22 14:14:39 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
psath@talos:~/chapelWorkspace/tool-installs$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 22.04.2 LTS
Release:        22.04
Codename:       jammy

This is the reported configuration:

  Currently selected Chapel configuration:

CHPL_TARGET_PLATFORM: linux64
CHPL_TARGET_COMPILER: llvm
CHPL_TARGET_ARCH: x86_64
CHPL_TARGET_CPU: native
CHPL_LOCALE_MODEL: gpu *
CHPL_COMM: none
CHPL_TASKS: qthreads
CHPL_LAUNCHER: none
CHPL_TIMERS: generic
CHPL_UNWIND: none
CHPL_MEM: jemalloc
CHPL_ATOMICS: cstdlib
CHPL_GMP: none
CHPL_HWLOC: bundled
CHPL_RE2: none
CHPL_LLVM: system *
CHPL_AUX_FILESYS: none

I've also exported CC=clang and CXX=clang++
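Concretely, the script does the equivalent of (a sketch; the full script is attached below):

  export CC=clang
  export CXX=clang++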

  1. There are some Python errors that seem to indicate an unset environment variable, CHPL_LLVM_GCC_PREFIX. Is that new with 1.30?
CHPL_DEVELOPER is not set, using OFF
-- No CHPL_LLVM_GCC_PREFIX env var or value given from command line.
Traceback (most recent call last):

  File "/home/psath/chapelWorkspace/tool-installs/chapel-1.30.0-src-talos/util/chplenv/printchplenv.py", line 522, in <module>
    main()
  File "/home/psath/chapelWorkspace/tool-installs/chapel-1.30.0-src-talos/util/chplenv/printchplenv.py", line 511, in main
    compute_all_values()
  File "/home/psath/chapelWorkspace/tool-installs/chapel-1.30.0-src-talos/util/chplenv/printchplenv.py", line 237, in compute_all_values
    chpl_compiler.validate_compiler_settings()
  File "/home/psath/chapelWorkspace/tool-installs/chapel-1.30.0-src-talos/util/chplenv/utils.py", line 43, in memoize_wrapper
    cache[args] = func(*args)
  File "/home/psath/chapelWorkspace/tool-installs/chapel-1.30.0-src-talos/util/chplenv/chpl_compiler.py", line 485, in validate_compiler_settings
    validate_inference_matches('host', 'c')
  File "/home/psath/chapelWorkspace/tool-installs/chapel-1.30.0-src-talos/util/chplenv/chpl_compiler.py", line 473, in validate_inference_matches
    error("Conflicting compiler families: "
  File "/home/psath/chapelWorkspace/tool-installs/chapel-1.30.0-src-talos/util/chplenv/utils.py", line 27, in error
    raise exception(msg)
  2. It seems to be both (a) trying to mix GCC and clang, and (b) grabbing an ancient version of libclang (version 8) from inside a cmake directory rather than the systemwide clang-14 install. Doesn't CMake typically use CC and CXX to infer the host compiler family?
Exception: Conflicting compiler families: CHPL_HOST_COMPILER=gnu but CHPL_HOST_CC=['/usr/bin/clang'] but has family clang
-- Using Python: python3
Traceback (most recent call last):
  File "/home/psath/chapelWorkspace/tool-installs/chapel-1.30.0-src-talos/util/config/write-git-sha", line 57, in <module>
    raise FileNotFoundError(errno.ENOENT, os.strerror(errno.ENOENT), args.chpl_home)
FileNotFoundError: [Errno 2] No such file or directory: ''
-- Could NOT find Doxygen (missing: DOXYGEN_EXECUTABLE) (Required is at least version "1.8.17")
-- Using libclang from /usr/lib/cmake/clang-8
psath@talos:~/chapelWorkspace/tool-installs$ which clang
/usr/bin/clang
psath@talos:~/chapelWorkspace/tool-installs$ clang --version
Ubuntu clang version 14.0.0-1ubuntu1
Target: x86_64-pc-linux-gnu
Thread model: posix
psath@talos:~/chapelWorkspace/tool-installs$ dpkg --list | grep libclang
ii  libclang-14-dev                            1:14.0.0-1ubuntu1                       amd64        Clang library - Development package
...
psath@talos:~/chapelWorkspace/tool-installs$ locate libclang-14
/usr/lib/llvm-14/lib/libclang-14.0.0.so
/usr/lib/llvm-14/lib/libclang-14.so
/usr/lib/llvm-14/lib/libclang-14.so.1
/usr/lib/x86_64-linux-gnu/libclang-14.so
...
  3. There are a ton of unresolved LLVM symbols when linking chpl (the pieces of which compile without issue), probably a side effect of cmake grabbing the old libclang-8. Here's a small subset, covering some pretty fundamental LLVM building blocks:
undefined reference to `llvm::DisableABIBreakingChecks
undefined reference to `llvm::SmallVectorBase<unsigned int>::grow_pod(void*, unsigned long, unsigned long)
undefined reference to `llvm::BasicBlock::BasicBlock(llvm::LLVMContext&, llvm::Twine const&, llvm::Function*, llvm::BasicBlock*)
undefined reference to `llvm::BranchInst::BranchInst(llvm::BasicBlock*, llvm::Instruction*)
undefined reference to `llvm::Module::getDataLayout()
undefined reference to `llvm::Type::getInt64Ty(llvm::LLVMContext&)
...
undefined reference to `llvm::sys::fs::create_directories(llvm::Twine const&, bool, llvm::sys::fs::perms)'
undefined reference to `llvm::sys::fs::equivalent(llvm::Twine const&, llvm::Twine const&, bool&)
llvm::sys::fs::access(llvm::Twine const&, llvm::sys::fs::AccessMode)

Any suggestions or workarounds, or if I missed a bit of new documentation, would be much appreciated. Thanks in advance!
1.30-link-errors.txt (1.5 MB)
install_chapel.sh.txt (1.4 KB)

Thanks for raising this, Paul. This is outside of my strengths, so I've asked others if they can take a look and help out. As you may have seen, we switched to building with CMake in this release, and I expect that's a major factor in why 1.29 worked fine and 1.30 doesn't.

-Brad

Thanks Brad,

I have no particular dedication to the ./configure way of doing things. I can poke at retooling my script for cmake as I have time. I'm by no means an expert, but I've used cmake sparingly for some in-house clangTools, etc. We'll see.

I'm still cruising on 1.29 in the meantime, but I'm excited to try out the GPU performance improvements and AMD backend in 1.30. I'll also want to see whether some atomic-bearing and deeply-nested kernels that weren't supported in 1.29 become GPU-izable with it.

If I may jump in from the GPU side of the story: we'd be excited to hear about your experiences with it. In our initial tests we have observed that exporting CHPL_GPU_MEM_STRATEGY=array_on_device is crucial for good AMD performance. We did make a lot of performance improvements in this release, but we are just starting to scratch the surface.
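For example, a minimal sketch of using it (myGpuApp.chpl is just a placeholder name, and note that the runtime may need to be rebuilt with the setting in effect):

  export CHPL_GPU_MEM_STRATEGY=array_on_device
  cd $CHPL_HOME && make          # rebuild the runtime under this memory strategy
  chpl --fast myGpuApp.chpl      # then compile your program as usual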

On the two topics you mentioned: atomics is something we recently started discussing internally. If that's important to your use case, we can try to prioritize some sort of atomics support. Please let us know.

For the nested kernels: if you mean nested foreach loops, they are supported, but the inner ones will not turn into kernels as you'd expect with CUDA's dynamic parallelism. This is also one of the things we are considering working on soon. We've come across user code that can benefit from such deep nesting because no single nesting level has enough parallelism. It can also help with recursive algorithms.

Sorry about the tangent.

I'm still working on getting kernels implemented before we can get any baseline numbers. The first target is the RTX 3090 we already have; then we have some older AMD Vega 10s we'd just like to get correctness on, and hopefully we'll have access to a 7900 XTX later in the year for more current performance numbers.

I will start a separate post about atomics once I get some MWEs together. But that does explain why two of my kernels are not GPU-izable in 1.29.

The kernel is nested across the 3 CUDA block dimensions (which in turn are bounded by some runtime values: if blockSize < runtimeSize they loop, and if blockSize > runtimeSize they no-op), with a fourth intra-thread loop. Right now I've got it written as 3 foralls (with the grid/block loop bounds), each paired with a non-parallel for that handles the runtime bounding, plus the seventh inner-most loop. It's all in one kernel though, not dynamic parallelism. The innermost CUDA dimension collaborates over a binary-search-based intersection, which is where we need the atomics to support the collaboration.

I've been able to reproduce this, but note the following:

I've also exported CC=clang and CXX=clang++

This seems to be needed in order to trigger the problematic behavior. I can reproduce the issue on an Ubuntu Jammy container, but if I remove these, the problem goes away.

Why are you setting these variables? Are you trying to build the compiler itself with clang? Or do you want the compiler to work with clang when using the C backend?

You can use printchplenv --all to see some of the hidden settings that seem to be getting confused.

  • If you want to build Chapel itself with clang, just use export CHPL_HOST_COMPILER=clang instead of setting CC / CXX. This appears to work.
  • If you want Chapel to support the C backend with clang, export CHPL_TARGET_COMPILER=clang. This also appears to work. (See the sketch after this list.)
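Putting those together, a sketch (run from the Chapel source directory, i.e. $CHPL_HOME):

  export CHPL_HOST_COMPILER=clang    # build the chpl compiler itself with clang
  export CHPL_TARGET_COMPILER=clang  # optionally, use clang for the C backend too
  ./util/printchplenv --all          # check what the configuration machinery inferred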

My understanding of the situation here is that there are multiple problems:

  1. The LLVM 8 cmake thing (but I think this is unrelated to your linking issue). As an aside, how is it even finding LLVM 8 on your system? Ubuntu Jammy does not have an LLVM 8 package.
  2. Something about setting CC / CXX in this way is confusing the configuration, but only when printchplenv is run within make. This probably has to do with changes we made to the compiler build process (it now uses cmake). In my experiments, running printchplenv --all on its own does not show the error.
  3. I think the cmake infrastructure for building the compiler (which is new in this release) is not properly failing when there is an error running printchplenv, and it tries to go further but that only obscures the main issue.

For now, I would recommend using export CHPL_HOST_COMPILER=clang instead of setting CC / CXX.

We upgraded from 20.04 in-place recently; it could still be lingering from that. It's odd that it's found inside a cmake subdir, but I agree, that's more our environment issue than anything to do with y'all.
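In case it helps anyone else, a quick sketch for spotting such leftovers (using the same tools as earlier in the thread):

  dpkg --list | grep -E 'clang-8|llvm-8'   # any version-8 packages still installed?
  ls /usr/lib/cmake/                       # any stale clang-8 cmake directories?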

The intent was to build chpl using clang. I tend to have better luck with clang supporting newer C++ features earlier and more stably than the gcc toolchain, so I throw those exports into most scripts. I hadn't considered the backend though; thanks for that additional variable tip.

Replacing the CC and CXX exports with CHPL_HOST_COMPILER=clang and CHPL_TARGET_COMPILER=llvm (I get an error about the GPU backend needing LLVM if I set the target to clang) gets me built and linked successfully!
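For anyone following along, the working recipe boils down to something like (a sketch, assuming a fresh shell in $CHPL_HOME):

  unset CC CXX                      # drop the exports that confused the configuration
  export CHPL_HOST_COMPILER=clang   # build chpl itself with clang
  export CHPL_TARGET_COMPILER=llvm  # the GPU locale model requires the LLVM backend
  make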

It still mentions -- Using libclang from /usr/lib/cmake/clang-8, though the cmake feature-checking is clearly using the llvm-14 toolchain, and ldd chpl points to the -14 versions of libLLVM and libclang-cpp. The Python errors are gone too. (I'll purge -8 once I check with my users, for all our sanity.)

Thanks for your help getting us moved over to 1.30!

I have created a PR to solve problems 2 and 3: Fix problem running 'make' with CC and CXX set by mppf · Pull Request #21985 · chapel-lang/chapel · GitHub
