Help needed: 1.30 make check fails on one of my machines

Hello again. I have downloaded and installed 1.30 at work and everything was fine this afternoon. Now at home I have a newer computer, and running make check after the compiler is built I get (3 trials all the same result):

chapel-1.30.0$ make check
[Info] Running minimal test script: $CHPL_HOME/util/test/checkChplInstall
[Info] Found executable chpl in /home/nldias/chapel-1.30.0/bin/linux64-x86_64/chpl.
[Info] Found $CHPL_HOME directory: /home/nldias/chapel-1.30.0
[Info] /home/nldias/.chpl does not exist. Creating it.
[Info] Temporary test job directory: /home/nldias/.chpl/chapel-test-2HhGVw
[Info] Compiling $CHPL_HOME/examples/hello6-taskpar-dist.chpl
[Info] Compiling with CHPL_TARGET_COMPILER=llvm
[Info] Test job compiled into /home/nldias/.chpl/chapel-test-2HhGVw/hello6-taskpar-dist
[Info] $CHPL_LAUNCHER=none is compatible with test script.
[Info] Running test job.
[Info] Test job complete.
[Fail] There was an issue with the installation, test job output incorrect.
[Info] To see the test outputs, export CHPL_CHECK_DEBUG=1 and re-run "make check"

== Actual Test Output (raw, with verbose) ==
internal error: topo-hwloc.c: 333: !((pusPerCore == 0) || (pusPerCore == numPus))
make: *** [Makefile:212: check] Error 20

It appears to be something about the number of Pus, but I can't get anywhere from here.
Any help will be greatly appreciated. Meanwhile, I am stuck with 1.29 :slight_smile:

For the record, my computer stats are:

System:
Kernel: 5.15.0-67-generic x86_64 bits: 64 compiler: gcc v: 11.3.0 Desktop: Cinnamon 5.6.8
tk: GTK 3.24.33 wm: muffin dm: LightDM Distro: Linux Mint 21.1 Vera base: Ubuntu 22.04 jammy
Machine:
Type: Desktop Mobo: Gigabyte model: B660M GAMING X v: x.x serial:
UEFI: American Megatrends LLC. v: F2 date: 01/25/2022
CPU:
Info: 16-core (8-mt/8-st) model: 12th Gen Intel Core i9-12900K bits: 64 type: MST AMCP
arch: Alder Lake rev: 2 cache: L1: 1.4 MiB L2: 14 MiB L3: 30 MiB
Speed (MHz): avg: 4545 high: 4908 min/max: 800/5100:5200:3900 cores: 1: 4868 2: 4903 3: 4875
4: 4900 5: 4900 6: 4884 7: 4820 8: 4901 9: 4901 10: 4908 11: 4736 12: 4900 13: 4819 14: 4900
15: 4879 16: 4900 17: 3871 18: 3900 19: 3898 20: 3831 21: 3902 22: 3900 23: 3900 24: 3899
bogomips: 152985
Flags: avx avx2 ht lm nx pae sse sse2 sse3 sse4_1 sse4_2 ssse3 vmx
Graphics:
Device-1: Intel AlderLake-S GT1 vendor: Gigabyte driver: i915 v: kernel ports:
active: HDMI-A-1,HDMI-A-2 empty: DP-1 bus-ID: 00:02.0 chip-ID: 8086:4680
Device-2: Logitech Webcam C270 type: USB driver: snd-usb-audio,uvcvideo bus-ID: 1-7.4:8
chip-ID: 046d:0825
Display: x11 server: X.Org v: 1.21.1.3 driver: X: loaded: modesetting unloaded: fbdev,vesa
gpu: i915 display-ID: :0 screens: 1
Screen-1: 0 s-res: 2820x1600 s-dpi: 96
Monitor-1: HDMI-1 mapped: HDMI-A-1 pos: primary,top-left model: LG (GoldStar) 20M35
res: 900x1600 dpi: 97 diag: 506mm (19.9")
Monitor-2: HDMI-2 mapped: HDMI-A-2 pos: primary,bottom-r model: LG (GoldStar) FULL HD
res: 1920x1080 dpi: 102 diag: 551mm (21.7")
OpenGL: renderer: Mesa Intel UHD Graphics 770 (ADL-S GT1) v: 4.6 Mesa 22.2.5
direct render: Yes
Audio:
Device-1: Intel vendor: Gigabyte driver: snd_hda_intel v: kernel bus-ID: 00:1f.3
chip-ID: 8086:7ad0
Device-2: Logitech Webcam C270 type: USB driver: snd-usb-audio,uvcvideo bus-ID: 1-7.4:8
chip-ID: 046d:0825
Sound Server-1: ALSA v: k5.15.0-67-generic running: yes
Sound Server-2: PulseAudio v: 15.99.1 running: yes
Sound Server-3: PipeWire v: 0.3.48 running: yes
Network:
Device-1: Realtek RTL8125 2.5GbE vendor: Gigabyte driver: r8169 v: kernel pcie: speed: 5 GT/s
lanes: 1 port: 3000 bus-ID: 03:00.0 chip-ID: 10ec:8125
IF: enp3s0 state: up speed: 1000 Mbps duplex: full mac:
Drives:
Local Storage: total: 2.26 TiB used: 657.59 GiB (28.5%)
ID-1: /dev/nvme0n1 vendor: Western Digital model: WDS480G2G0C-00AJM0 size: 447.13 GiB
speed: 31.6 Gb/s lanes: 4 serial: temp: 43.9 C
ID-2: /dev/sda vendor: Seagate model: ST2000DM008-2FR102 size: 1.82 TiB speed: 6.0 Gb/s
serial:
Partition:
ID-1: / size: 318.04 GiB used: 25.5 GiB (8.0%) fs: ext4 dev: /dev/nvme0n1p3
ID-2: /boot/efi size: 3.72 GiB used: 6.1 MiB (0.2%) fs: vfat dev: /dev/nvme0n1p2
ID-3: /home size: 1.79 TiB used: 632.08 GiB (34.5%) fs: ext4 dev: /dev/sda1
Swap:
ID-1: swap-1 type: partition size: 119.21 GiB used: 0 KiB (0.0%) priority: -2
dev: /dev/nvme0n1p1
USB:
Hub-1: 1-0:1 info: Hi-speed hub with single TT ports: 16 rev: 2.0 speed: 480 Mb/s
chip-ID: 1d6b:0002
Device-1: 1-5:2 info: Integrated Express ITE Device type: HID driver: hid-generic,usbhid
rev: 2.0 speed: 12 Mb/s chip-ID: 048d:5702
Hub-2: 1-7:3 info: Genesys Logic Hub ports: 4 rev: 2.0 speed: 480 Mb/s chip-ID: 05e3:0608
Device-1: 1-7.1:5 info: Logitech M105 Optical Mouse type: Mouse driver: hid-generic,usbhid
rev: 2.0 speed: 1.5 Mb/s chip-ID: 046d:c077
Device-2: 1-7.2:6 info: Logitech Keyboard K120 type: Keyboard,HID driver: hid-generic,usbhid
rev: 1.1 speed: 1.5 Mb/s chip-ID: 046d:c31c
Device-3: 1-7.4:8 info: Logitech Webcam C270 type: Video,Audio driver: snd-usb-audio,uvcvideo
rev: 2.0 speed: 480 Mb/s chip-ID: 046d:0825
Hub-3: 1-11:4 info: Genesys Logic Hub ports: 4 rev: 2.0 speed: 480 Mb/s chip-ID: 05e3:0608
Device-1: 1-11.4:7 info: Motorola PCS XT1541 [Moto G 3rd Gen] type:
driver: usbfs rev: 2.0 speed: 480 Mb/s chip-ID: 22b8:2e82
Hub-4: 2-0:1 info: Super-speed hub ports: 10 rev: 3.1 speed: 20 Gb/s chip-ID: 1d6b:0003
Sensors:
System Temperatures: cpu: 55.0 C mobo: 16.8 C
Fan Speeds (RPM): N/A
Repos:
Packages: apt: 2718
No active apt repos in: /etc/apt/sources.list
Active apt repos in: /etc/apt/sources.list.d/google-chrome.list
1: deb [arch=amd64] https: //dl.google.com/linux/chrome/deb/ stable main
Active apt repos in: /etc/apt/sources.list.d/insync.list
1: deb http: //apt.insync.io/mint vera non-free contrib
Active apt repos in: /etc/apt/sources.list.d/official-package-repositories.list
1: deb http: //packages.linuxmint.com vera main upstream import backport
2: deb http: //archive.ubuntu.com/ubuntu jammy main restricted universe multiverse
3: deb http: //archive.ubuntu.com/ubuntu jammy-updates main restricted universe multiverse
4: deb http: //archive.ubuntu.com/ubuntu jammy-backports main restricted universe multiverse
5: deb http: //security.ubuntu.com/ubuntu/ jammy-security main restricted universe multiverse
Info:
Processes: 432 Uptime: 11m Memory: 62.58 GiB used: 4.23 GiB (6.8%) Init: systemd v: 249
runlevel: 5 Compilers: gcc: 11.3.0 alt: 11/12 Client: Unknown python3.10 client inxi: 3.3.13

Hi Nelson —

Thanks for reporting this. Our developer who is most likely to be able to help here is out today, but I've asked him to take a look next week.

Two potential workarounds in the meantime, if you're interested:

  • try setting CHPL_HWLOC=none and rebuilding to disable hwloc
  • or, try commenting out the line in question and re-building to see what happens :slight_smile: (it's in runtime/src/topo/hwloc/topo-hwloc.c)

Thanks again, and have a good weekend,
-Brad
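
For anyone following along, here is a minimal sketch of the first workaround as a shell function. The function name `rebuild_without_hwloc` and the `MAKE` override are hypothetical conveniences, not part of Chapel; the only Chapel-specific pieces taken from this thread are `CHPL_HWLOC=none` and the `make` / `make check` steps.

```shell
# Hypothetical helper: rebuild Chapel with hwloc disabled, then re-run the check.
# $1   : top of the Chapel source tree (e.g. ~/chapel-1.30.0)
# MAKE : optional override of the make command (MAKE=echo previews the steps)
rebuild_without_hwloc() (
    cd "$1" || exit 1
    # Disable hwloc for the runtime build, as suggested above.
    export CHPL_HWLOC=none
    "${MAKE:-make}" clean &&
    "${MAKE:-make}" &&
    "${MAKE:-make}" check
)
```

Calling `rebuild_without_hwloc ~/chapel-1.30.0` would run the three steps in a subshell, so the exported variable does not leak into the calling shell; it still has to be set again when compiling and running programs afterwards.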

Hi Brad: thanks a lot for the tip: setting CHPL_HWLOC=none appears to make the compiler work! I will however erase everything (I just said "make clean" and rebuilt) and start from scratch to make sure. I will keep you posted.

regards

Nelson

Hi Nelson —

OK, thanks for the report back. I expect the clean build will work equally well. Note that our preferred mode is generally to use CHPL_HWLOC=bundled, so I think of this as a short-term workaround and not a satisfying solution. We'll hope to come up with that next week.

Best wishes,
-Brad

Yes, a clean build works just as well :slight_smile:

Many thanks again

Nelson

The problem is this CPU has a variable number of threads per core. It has 8 "performance cores" with 2 threads per core, and 8 "efficient" cores with 1 thread per core. The runtime assumes that all cores have the same number of threads and is checking for that. I'll have to think about how to work around this.

John
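
A rough way to see this property on a Linux machine, independent of Chapel, is to count the hardware threads per core from sysfs. The sketch below is not how the Chapel runtime performs its check (that goes through hwloc); the function names are hypothetical, and it assumes the kernel exposes `topology/thread_siblings_list` under each `cpuN` directory.

```shell
# Count the CPUs named in a kernel CPU list such as "0-1", "0,4" or "16".
count_cpus() {
    echo "$1" | tr ',' '\n' |
        awk -F- '{ n += (NF == 2 ? $2 - $1 + 1 : 1) } END { print n + 0 }'
}

# Print the distinct threads-per-core counts found under a sysfs CPU tree.
# $1: sysfs CPU directory (defaults to /sys/devices/system/cpu)
threads_per_core_counts() {
    root=${1:-/sys/devices/system/cpu}
    for f in "$root"/cpu[0-9]*/topology/thread_siblings_list; do
        [ -r "$f" ] || continue
        count_cpus "$(cat "$f")"
    done | sort -n | uniq
}

# Succeed if cores differ in their number of hardware threads.
has_mixed_thread_counts() {
    threads_per_core_counts "$1" | awk 'END { exit (NR > 1) ? 0 : 1 }'
}
```

On a CPU like the one described above, `threads_per_core_counts` would be expected to print both 1 and 2, and `has_mixed_thread_counts && echo heterogeneous` would report the mix.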

Nelson, I wanted to get back to you with some details on the problem you encountered with the Chapel 1.30 release. We don’t officially support machines with heterogeneous cores, which is what the failing assert in the hwloc topology layer is checking. In the short term the best solution is to set CHPL_HWLOC=none which shouldn't have a noticeable performance impact when running on a desktop or laptop. In the longer run I will remove the assertion and the assumption that it is protecting, but then the number of cores returned by locale.numPUs(logical=false, accessible=false) may not be correct on machines with heterogeneous cores because the runtime can’t get information about cores it can’t access. I plan to make this change in the next week or so, and it would be great if you could test it on your machine as I don't have access to one with heterogeneous cores.

In the much longer term we need to come up with a strategy for handling heterogeneous cores. For example, on a machine with some performance cores and some efficient cores, should we run on only the former, or only the latter, or all of them? How would a user effectively express what they want to happen?

John

Hi John: many thanks for the quick response. I have already tried
CHPL_HWLOC=none per Brad's suggestion and it works. It conflicts with
CHPL_RT_NUM_THREADS_PER_LOCALE=MAX_LOGICAL, which I used before, but I don't
know why :-), so I commented out the latter. By all means, I will be very
glad to test the changes you make on my machine and report back to you.

The best scenario for me (but I understand that this may be asking for too
much) would be to be able to choose which strategy to use at compile time.
In the meantime, I will run a few tests here with the current version to
discover how many cores it is able to access and if I can specify (by hand,
as it were) which PUs to use. Please bear with me as this is all well beyond
my knowledge base.

Best

Nelson

I have come up with a workaround for the problem of having two different kinds of cores ("power" and "efficient") in a computer. I recompiled the compiler (make clean; make) without setting CHPL_HWLOC at all. Then I compiled "prog.chpl", also without setting CHPL_HWLOC, and using CHPL_RT_NUM_THREADS_PER_LOCALE=MAX_LOGICAL. Finally, and here is the trick, run with

$ taskset -c 0-15 ./prog

This (I think!) only uses the first 16 "power" logical cores and the program runs fine. Hope this is useful!

Cheers

Nelson
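
The CPU range in the workaround above is specific to this machine. A hedged sketch of how the list could be derived instead of typed by hand: the hypothetical function `smt_cpu_list` below prints the logical CPUs whose core has more than one hardware thread. It assumes those are the "power" cores, which matches the CPU described in this thread but need not hold elsewhere, and it assumes a Linux sysfs layout with `topology/thread_siblings_list`.

```shell
# Print a comma-separated list of logical CPUs whose core has >1 hardware thread.
# $1: sysfs CPU directory (defaults to /sys/devices/system/cpu)
smt_cpu_list() {
    root=${1:-/sys/devices/system/cpu}
    for dir in "$root"/cpu[0-9]*; do
        f=$dir/topology/thread_siblings_list
        [ -r "$f" ] || continue
        # A sibling list containing "-" or "," names more than one CPU.
        case $(cat "$f") in
            *[-,]*) echo "${dir##*/cpu}" ;;
        esac
    done | sort -n | paste -sd, -
}
```

The result can be passed straight to taskset, for example `taskset -c "$(smt_cpu_list)" ./prog`, since `taskset -c` accepts comma-separated CPU lists as well as ranges.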

I created an issue for this bug: "runtime support for heterogeneous cores" (Issue #22109 in chapel-lang/chapel on GitHub). Nelson, thank you for suggesting the taskset solution. I'd like to get feedback on the best way for the user to specify how the runtime should handle heterogeneous cores.

John

Hi John: sorry for the late reply. For a single-locale system, I wonder if it makes sense to offer a flag at run time such as ./prog --pus=0-15, which simply emulates taskset. My humble opinion is that it does not; better just to make users of machines with heterogeneous cores aware that taskset effectively solves the problem. It appears that right now heterogeneous cores means that some cores are considerably slower, less powerful, and more energy-efficient. Those are not the cores that I want to use if the point is to run things as fast as possible and to balance the load among the cores. So here are my two cents. Best, Nelson.

Nelson, it would be helpful if you could send me the output of "lstopo --of xml" on your machine. I hope to have better support for heterogeneous cores in Chapel 1.31.0, although it might just be a better error message, time permitting.

John

sure thing. here it is:

regards

Nelson

lstopo.txt (26.8 KB)

Hi Nelson — We're seeing an "attachment is missing" message here, potentially due to the file extension? Can you see if a .txt file extension works better, or mail it to me directly if not and I'll get it on here by hook or by crook?

-Brad

@jhh67 and @nelsonluisdias : I've attached the output to Nelson's previous message by renaming it to a .txt file.

-Brad

Nelson, the next release of Chapel will include support for heterogeneous processing units. By default, the runtime will only use the performance (P) cores, although this behavior can be changed by setting CHPL_RT_USE_PU_KIND to one of "performance", "efficiency", or "all".

John
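
As a small illustration of the setting described above, here is a sketch of a wrapper that sets CHPL_RT_USE_PU_KIND for a single run. The wrapper `run_with_pu_kind` is hypothetical and not part of Chapel; the variable name and its three values are taken from the post above, and the behavior of the runtime itself is not shown here.

```shell
# Hypothetical wrapper: run a command with CHPL_RT_USE_PU_KIND set,
# rejecting values other than the three named in the post above.
# usage: run_with_pu_kind <performance|efficiency|all> <command> [args...]
run_with_pu_kind() {
    kind=$1
    shift
    case $kind in
        performance|efficiency|all) ;;
        *) echo "unknown PU kind: $kind" >&2; return 2 ;;
    esac
    # The variable is set only for this one command.
    CHPL_RT_USE_PU_KIND=$kind "$@"
}
```

For example, `run_with_pu_kind all ./prog` would ask the runtime to use every core, while leaving the variable unset keeps the default of performance cores only.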

Hi John. Great news. It certainly is very convenient for the general user.
Thanks a lot for taking care of this!

Cheers

Nelson