Multi-Locale run errors

Hi everyone,

I was working on a Chapel program. After it ran an executable file through Spawnshell, it returned the following error.

After that, my remote nodes are no longer recognizable, as shown below (note: YangHost is my local machine).

Any help would be appreciated,
Marjan

Hi Marjan —

The root cause looks similar to the previous error you posted (which I don't think we ever solved) related to Congestion at the destination endpoint. Did you have CHPL_COMM_DEBUG=1 set in your environment before compiling this program?

The mention of Microsoft.NETCore.App makes me realize: Are you running this from a Windows system? Are the nodes themselves Windows systems as well? And in either case, are you running this from Cygwin or using Microsoft's Bash Shell / Linux Subsystem?

Thanks,
-Brad

Hi Brad,
Yes, I have set that variable, but I did not get any information about the actual problem. I am now thinking about migrating to the Compute Canada clusters, which I have learned support Chapel. If you have any information about these clusters, do I need any specific configuration?
Here is the only documentation I have found:
https://docs.computecanada.ca/wiki/Chapel
Regarding the .NET app: no, I am working on Linux systems. I call Spawnshell to run dotnet commands whenever I need to access the SQLite database from my Chapel program.

Thank you,
Marjan

Hi Marjan —

I have not run on Compute Canada, but some of our users and team members have, so I know that it is possible. I don't know whether there are specific settings or invocations to know about, but I am tagging @e-kayrakli and @simonBourgaultCote, who may be able to provide more information.

Thanks for the clarification about the Microsoft text; that makes sense. Offhand, in a UNIX setting, I'm not aware of anything Chapel or GASNet could do to mess up a system. I'll check whether the GASNet team has any further thoughts about the errors you're getting.

-Brad

I really appreciate all of this. Thank you, Brad.

@MarjanAsgari — Before we give up on your local system, would you:

  • make sure CHPL_COMM_DEBUG=1 is still set
  • compile and run one of our example programs:
    $ chpl examples/hello6-taskpar-dist.chpl  # if you're working from a git clone, this is test/release/examples/hello6-taskpar-dist.chpl
    $ ./hello6-taskpar-dist -nl 3   # or however many locales you've been trying to run on
    
  • send me the full output (preferably as cut-and-pasted text rather than a screenshot, if possible)
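
If it helps, here is a minimal sketch of one way to capture the full output as text rather than a screenshot. The helper name `capture_run` and the log file names are hypothetical (they are not part of Chapel), and the sketch assumes a POSIX shell with `tee` available:

```shell
# Keep CHPL_COMM_DEBUG=1 set in the environment for both compiling and running.
export CHPL_COMM_DEBUG=1

# Hypothetical helper: run a command, show its output on screen, and save
# a copy (stdout and stderr combined) to the given log file.
capture_run() {
    log_file=$1
    shift
    # 2>&1 merges error messages into the same stream that tee records.
    "$@" 2>&1 | tee "$log_file"
}
```

With this in place, `capture_run build.log chpl examples/hello6-taskpar-dist.chpl` followed by `capture_run run.log ./hello6-taskpar-dist -nl 3` would leave two text files whose contents can be pasted directly into a reply. Note that piping through `tee` hides the command's own exit status, which is acceptable here because only the text output matters.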

Also, is there any chance that these nodes you're using are running any sort of filters protecting against (D)DoS attacks?

Thanks,
-Brad

Hi Brad, sure, I'll try this suggestion tomorrow and send you the full output. Thank you.