Hello again. I downloaded and installed Chapel 1.30 at work, and everything was fine this afternoon. Now, at home on a newer computer, running make check after the compiler is built gives the following (three trials, all with the same result):
chapel-1.30.0$ make check
[Info] Running minimal test script: $CHPL_HOME/util/test/checkChplInstall
[Info] Found executable chpl in /home/nldias/chapel-1.30.0/bin/linux64-x86_64/chpl.
[Info] Found $CHPL_HOME directory: /home/nldias/chapel-1.30.0
[Info] /home/nldias/.chpl does not exist. Creating it.
[Info] Temporary test job directory: /home/nldias/.chpl/chapel-test-2HhGVw
[Info] Compiling $CHPL_HOME/examples/hello6-taskpar-dist.chpl
[Info] Compiling with CHPL_TARGET_COMPILER=llvm
[Info] Test job compiled into /home/nldias/.chpl/chapel-test-2HhGVw/hello6-taskpar-dist
[Info] $CHPL_LAUNCHER=none is compatible with test script.
[Info] Running test job.
[Info] Test job complete.
[Fail] There was an issue with the installation, test job output incorrect.
[Info] To see the test outputs, export CHPL_CHECK_DEBUG=1 and re-run "make check"
== Actual Test Output (raw, with verbose) ==
internal error: topo-hwloc.c: 333: !((pusPerCore == 0) || (pusPerCore == numPus))
make: *** [Makefile:212: check] Error 20
It appears to be something about the number of PUs, but I can't get any further from here.
Any help will be greatly appreciated. Meanwhile, I am stuck with 1.29.
Hi Brad: thanks a lot for the tip: setting CHPL_HWLOC=none appears to make the compiler work! However, I will erase everything and start from scratch to make sure (so far I only ran "make clean" and rebuilt). I will keep you posted.
OK, thanks for reporting back. I expect the clean build will work equally well. Note that our preferred mode is generally not to use CHPL_HWLOC=none, so I think of this as a short-term workaround rather than a satisfying solution. We hope to come up with one next week.
The problem is that this CPU has a variable number of threads per core: it has 8 "performance" cores with 2 threads per core and 8 "efficient" cores with 1 thread per core. The runtime assumes that all cores have the same number of threads and checks for that. I'll have to think about how to work around this.
Nelson, I wanted to get back to you with some details on the problem you encountered with the Chapel 1.30 release. We don’t officially support machines with heterogeneous cores, which is what the failing assert in the hwloc topology layer is checking. In the short term, the best solution is to set CHPL_HWLOC=none, which shouldn't have a noticeable performance impact when running on a desktop or laptop. In the longer run I will remove the assertion and the assumption that it is protecting, but then the number of cores returned by locale.numPUs(logical=false, accessible=false) may not be correct on machines with heterogeneous cores, because the runtime can’t get information about cores it can’t access. I plan to make this change in the next week or so, and it would be great if you could test it on your machine, as I don't have access to one with heterogeneous cores.
In the much longer term we need to come up with a strategy for handling heterogeneous cores. For example, on a machine with some performance cores and some efficient cores, should we run on only the former, or only the latter, or all of them? How would a user effectively express what they want to happen?
Hi John: many thanks for the quick response. I have already tried
CHPL_HWLOC=none per Brad's suggestion, and it works. It conflicts with
CHPL_RT_NUM_THREADS_PER_LOCALE=MAX_LOGICAL, which I used before (I don't
know why :-)), so I commented out the latter. By all means, I will be very
glad to test your changes on my machine and report back to you.
The best scenario for me (though I understand this may be asking too
much) would be to choose which strategy to use at compile time.
In the meantime, I will run a few tests here with the current version to
discover how many cores it is able to access and whether I can specify (by
hand, as it were) which PUs to use. Please bear with me, as this is all well
beyond my knowledge.
I have come up with a workaround for the problem of having two different kinds of cores ("power" and "efficient") in one computer. I rebuilt the compiler (make clean; make) without setting CHPL_HWLOC at all. Then I compiled "prog.chpl", also without setting CHPL_HWLOC, and with CHPL_RT_NUM_THREADS_PER_LOCALE=MAX_LOGICAL. Finally, and here is the trick, I ran it with
$ taskset -c 0-15 ./prog
This (I think!) uses only the first 16 "power" logical cores, and the program runs fine. Hope this is useful!
Hi John: sorry for the late reply. For a single-locale system, I wonder whether it makes sense to provide a run-time flag such as ./prog --pus=0-15, which would simply emulate taskset. My humble opinion is that it does not; it is better just to make users of machines with heterogeneous cores aware that taskset effectively solves the problem. It appears that, right now, heterogeneous cores means that some cores are considerably slower, less powerful, and more energy-efficient. Those are not the cores I want to use if the point is to run things as fast as possible and to balance the load among the cores. So here are my two cents. Best, Nelson.
Nelson, it would be helpful if you could send me the output of "lstopo --of xml" on your machine. I hope to have better support for heterogeneous cores in Chapel 1.31.0, although it might just be a better error message, time permitting.
Hi Nelson: we're seeing an "attachment is missing" message here, potentially due to the file extension? Can you see if a .txt file extension works better, or mail it to me directly if not, and I'll get it on here by hook or by crook?
Nelson, the next release of Chapel will include support for heterogeneous processing units. By default, the runtime will only use the performance (P) cores, although this behavior can be changed by setting CHPL_RT_USE_PU_KIND to one of "performance", "efficiency", or "all".