Branch: refs/heads/main
Revision: fb3b9b9285daa8c029de2ad9959f0b641965efde
Author: dlongnecke-cray
Link: dlopen: Fix a bug when code-generating some constant arrays under LLVM by dlongnecke-cray · Pull Request #28706 · chapel-lang/chapel · GitHub
Log Message:
dlopen: Fix a bug when code-generating some constant arrays under LLVM (#28706)
Fix a bug in nighty testing by refactoring how some constant arrays are
code-generated. The bug was exposed by #28677.
Future dynamic loading work will need to write the following sort of
pattern for some constant arrays that are defined only during codegen:
// This is actually a constant array. But this "fudging" of the type is valid.
extern const chpl_globals_registry: c_ptr(void);
extern proc useTheSymbolAddressInC(name: c_ptrConst(c_char), addr: c_ptr(void));
useTheSymbolAddressInC('chpl_globals_registry', chpl_globals_registry);
// The actual type of the symbol as it appears in C...
extern void* chpl_globals_registry[];
void useTheSymbolAddressInC(const char* name, void* addr) {
if (!strcmp(name, "chpl_globals_registry")) {
void* ptr1 = (void*) chpl_globals_registry;
void* ptr2 = addr;
assert(ptr1 == ptr2);
}
}
That is, Chapel code which fetches the array symbol and passes off its
address to C. In order for future dynamic loading code to function, it
is critical that these "array-like symbols" behave the same in both C
code and Chapel code, particularly when being copied around. If the
symbol address doesn't appear the same in C as it does in Chapel, then
it means something is going wrong with Chapel code generation.
In a previous PR #28677, I achieved compatibility by setting the arrays
to be GEN_VAL instead of GEN_PTR in the "layered value table" (a
symbol table) that we maintain over LLVM+Clang symbols.
However, the way the test (introduced in that PR) was written exposed a
bug. For the C code portion of the test, I relied on an extern block
of C code. It turns out that we end up bundling all the extern C code
and Chapel code into the same LLVM module.
This is a problem because the Chapel compiler encounters the declaration
and uses of the array symbol within the extern block first in
compilation. The code that defines the array (which in LLVM amounts to
setting its initializer using setInitializer), does not account for
the possibility that it is being used and simply did the following:
if(llvm::GlobalVariable *GVar = llvm::cast_or_null<llvm::GlobalVariable>(
info->module->getNamedGlobal("chpl_globals_registry"))) {
GVar->eraseFromParent();
}
Erasing the existing global variable from the LLVM module and dropping
all uses of it on the floor. It then built a new global with an
identical name.
This bug is very tricky to catch without compiling and using a debug
build of LLVM (by setting CHPL_LLVM_DEVELOPER=1). In release builds,
the LLVM bitcode verifier does not seem to catch the fact that we are
dropping the existing global on the floor while uses remain. This is
only caught when an assert in the debug build fires.
The fix for this issue is to be a bit more principled:
- Make sure to copy the old global's name into a type that will persist,
e.g.,std::string. - Create a new, seemingly identical global variable and insert it into
the module. Because this is the definition, the global may have a
different type (see the code above: the code in Chapel uses
c_ptr(void)while the code in C usesvoid*[]- in this case the
actual type is a fixed length constant array). - Replace all uses of the old global variable with the new one.
- Then call
eraseFromParentto remove the old global. - Finally, set the name of the new global again because any name
conflict has been resolved (by default LLVM will append a number to any
symbols that name conflict with existing ones, and we must avoid that
because this is a globalexternvariable).
Note that it is possible for existing declarations of the symbol to have
one type and the definition to have another. This is OK and completely
expected. Much like in C, we expect the developer not to shoot
themselves in the foot when using this behavior. We do not adjust prior
uses of the symbol in LLVM (and this is even more of a non-issue today
since LLVM now uses only opaque pointers), so it is "undefined behavior"
if the existing type does not make sense (i.e., it is not compatible
with a (decayed) array type in C).
While here, refactor the creation of these array constants to call a
helper method in order to remove a bunch of redundancy.
A fun bugfix-for-a-bugfix was to make sure to copy the old global's name
into a std::string so that it would persist after
global->eraseFromParent() was called. This leak was only exposed on my
Mac machine for some reason!
Note that there is also a helper codegenGlobalConstArray which
performs similar generation - I may well need to adjust its
implementation to call the code introduced here, but I'll leave that as
future work.
TESTING
-
standard -
COMM=gasnet
Reviewed by @jabraham17. Thanks much!
Diff:
M compiler/codegen/codegen.cpp
https://github.com/chapel-lang/chapel/pull/28706.diff