File I/O from C code

Is file I/O from C code legal?

#include <fcntl.h>  /* needed for open(), O_CREAT, O_WRONLY */

if ((open(err_file_name, O_CREAT | O_WRONLY, 0664)) < 0) {
}

This fails with a "No such file or directory" error, but works when not embedded in Chapel. So I'm guessing this is not legal.

Thanks,

Gene

Hi Gene,

It looks as though you are using the OS.POSIX module; is that correct?

We do have some tests of that functionality that run nightly
(https://github.com/chapel-lang/chapel/blob/main/test/modules/OS/POSIX/open.chpl),
so I'd be surprised if it didn't work at all, but it's possible that
you're using it in a way we weren't expecting and so didn't test as
thoroughly. If you could provide more of your code, or experiment with
matching that particular test and let us know what does or doesn't work,
that'd help us better diagnose what is going on.

Note that you should also be able to open the file in C code and pass
the result back to Chapel in such a way that it can be used through our
normal file strategy, by relying on the file initializers
(IO — Chapel Documentation 1.32)
that take a c_FILE or a file descriptor. Keep in mind that initializers
are called using the new keyword, as opposed to normal method calls (see
Records — Chapel Documentation 1.32 for some
examples, and the classes and records chapters of the spec if you'd like
to know more).
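
For instance, here is a rough, untested sketch of that approach. The exact signatures involved (OS.POSIX's open(), string's c_str(), and the IO module's file initializer) have shifted a bit across recent Chapel versions, so please treat this as a starting point rather than a definitive recipe:

    use IO, OS.POSIX, CTypes;

    // Open the file with the POSIX open() from OS.POSIX; the S_*
    // constants spell out 0664-style permissions.
    const path = "err.log";
    var fd = open(path.c_str(), O_CREAT | O_WRONLY,
                  S_IRUSR | S_IWUSR | S_IRGRP | S_IWGRP | S_IROTH);
    if fd < 0 then
      halt("open() failed on ", path);

    // Wrap the descriptor in a Chapel file via `new` and use the
    // normal IO machinery on it.
    var f = new file(fd: int);
    var w = f.writer();
    w.writeln("hello from Chapel");
    w.close();
    f.close();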

Thanks, and hope that helps,
Lydia

Thanks Lydia,

Now that I know it should work, I'll spend some time experimenting and let you know what I find.

Gene

This is being run using -nl 2. I assumed that the node I launched on would be locale 0, and that the locale 0 file system would be the default unless somehow otherwise specified. I'm not using a shared file system; I copy the _real file to the Chapel program directory on the "other node" at the tail end of the makefile. The problem is that the open() specifies a full path, and the open() is executing on the "other node", which does not have the specified directory.

@geenweb: I was halfway into writing a response wondering whether asymmetry in the file system between the login node and compute nodes could be the issue here. I'm relieved to find that that was the case.

The behavior you're seeing is expected: in most configurations, anyway, the node used for interactive logins, compiling, etc. is not also used for computation by any of the locales (by design, in order not to bog down a potentially interactive, shared resource). It may be possible to cause this to happen in some configurations, if desired. For example, if using CHPL_COMM=gasnet and its ssh-based launcher, I believe that putting the interactive node first in GASNET_SSH_SERVERS would cause it to be used for locale 0.

Otherwise, you'd need to do something to copy/store the file in question to/on the compute node(s) that want to access it, as you're now attempting to do.

I'm not quite following the problem you mention:

The problem is that the open() specifies a full path. The open() is executing on the "other node" which does not have specified directory.

Can you not pick a directory on the compute node(s) that is accessible to both the Chapel source code and your copy of the file? Can you say more about what's going wrong?

Thanks,
-Brad

Thanks Brad,

I'm using logrotate to store log and error files from the program I'm writing. So in the open() I specified /var/log/my_program_name/filename_with_pid.log. I envision /var/log/my_program_name existing only on the "master node".

As soon as I realized what the issue was, I put the interactive node first in GASNET_SSH_SERVERS, but that didn't fix it. I thought it would, so maybe I missed something in haste. I'll try it again in the morning and let you know.

Best,

Gene

Hi Gene —

If you launch your program with --verbose (e.g., ./myChapelProgram -nl 2 --verbose) you should get output indicating which node each locale is being run on, which may help you reason about the impact of your GASNET_SSH_SERVERS setting. As alluded to above, this variable will only have an impact if using GASNet's ssh-based launching mechanism; otherwise, it'll be ignored.

If you get stuck and could provide the output of printchplenv --all --anonymize, that could be helpful.

-Brad

Hi Brad,

Interactive node is 15.212.63.555

If GASNET_SSH_SERVERS=15.212.63.555 15.212.63.111 15.212.63.555 15.212.63.111 15.212.63.555 15.212.63.111 15.212.63.555 15.212.63.111

executing locale 0 of 2 on node ip-15-212-63-111
executing locale 1 of 2 on node ip-15-212-63-555

If GASNET_SSH_SERVERS=15.212.63.111 15.212.63.555 15.212.63.111 15.212.63.555 15.212.63.111 15.212.63.555 15.212.63.111 15.212.63.555

executing locale 0 of 2 on node ip-15-212-63-111
executing locale 1 of 2 on node ip-15-212-63-555

If GASNET_SSH_SERVERS=15.212.63.555 15.212.63.555 15.212.63.111 15.212.63.555 15.212.63.111 15.212.63.555 15.212.63.111 15.212.63.111

executing locale 0 of 2 on node ip-15-212-63-555
executing locale 1 of 2 on node ip-15-212-63-555

Not sure what else to try.

Here is the output of printchplenv --all --anonymize

CHPL_HOST_PLATFORM: linux64 +
CHPL_HOST_COMPILER: gnu +
  CHPL_HOST_CC: gcc +
  CHPL_HOST_CXX: g++ +
CHPL_HOST_ARCH: x86_64 +
CHPL_TARGET_PLATFORM: linux64 +
CHPL_TARGET_COMPILER: llvm +
  CHPL_TARGET_CC: /usr/lib/llvm-14/bin/clang +
  CHPL_TARGET_CXX: /usr/lib/llvm-14/bin/clang++ +
  CHPL_TARGET_LD: /usr/lib/llvm-14/bin/clang++ +
CHPL_TARGET_ARCH: x86_64 +
CHPL_TARGET_CPU: native +
CHPL_LOCALE_MODEL: flat +
CHPL_COMM: gasnet +
  CHPL_COMM_SUBSTRATE: udp +
  CHPL_GASNET_SEGMENT: fast +
  CHPL_GASNET_VERSION: 1
CHPL_TASKS: qthreads +
CHPL_LAUNCHER: amudprun +
CHPL_TIMERS: generic +
CHPL_UNWIND: none +
CHPL_HOST_MEM: jemalloc +
CHPL_MEM: jemalloc +
CHPL_ATOMICS: cstdlib +
  CHPL_NETWORK_ATOMICS: none
CHPL_GMP: none +
CHPL_HWLOC: bundled +
CHPL_RE2: none +
CHPL_LLVM: system +
  CHPL_LLVM_SUPPORT: system +
  CHPL_LLVM_CONFIG: llvm-config-14
  CHPL_LLVM_VERSION: 14
CHPL_AUX_FILESYS: none +
CHPL_LIB_PIC: pic +
CHPL_SANITIZE: none +
CHPL_SANITIZE_EXE: none +

Thanks,

Gene

Gene —

Thanks for that additional data. I thought for sure that GASNet just assigned processes to servers using round-robin based on the order of GASNET_SSH_SERVERS, but your experiments definitely suggest otherwise (as does the documentation below, now that I've read it).

As a sanity check: Are you finding your behavior to be deterministic or non-deterministic when performing multiple runs? And is GASNET_SPAWNFN set to 'S' (or to anything at all)?

Though I know it's not how you'd want to set your GASNET_SSH_SERVERS, did the third setting work properly, since locale 0 was mapped to the node you wanted?

Reading the documentation, it seems that the implementation is a bit more complicated than I was thinking. The following quote is from third-party/gasnet/gasnet-src/other/ssh-spawner/README:

The ssh-spawner will layout processes in a "balanced" distribution and
"blocked" order on a list of hosts (such as obtained from the
GASNET_SSH_SERVERS environment variable).

For P processes and N hosts, "balanced" distribution places ceil(P/N)
processes on the first (P%N) hosts and floor(P/N) on the remainder.

For P divisible by N, this yields P/N processes on every host, while
for all other cases the last (N-P%N) hosts each have one fewer than
the others.

The "blocked" order means the processes on each host are numbered
consecutively, with the first host holding processes starting from rank 0,
the second holding processes starting from rank ceil(P/N), etc.
By default the GASNET_SSH_SERVERS environment variable (or equivalent) is
subject to de-duplication. However, by disabling this behavior (see the
GASNET_SSH_KEEPDUP environment variable below) one can exercise additional
control over the placement of processes through duplication of hostnames. For
instance, with P=8, GASNET_SSH_SERVERS="node1 node2 node1 node2" and
GASNET_SSH_KEEPDUP=1, the host "node1" will hold processes with ranks 0, 1, 4
and 5, rather than 0, 1, 2 and 3 as would be the case without setting
GASNET_SSH_KEEPDUP. In the extreme case, populating GASNET_SSH_SERVERS with P
entries allows for precise control over placement of every process, when
de-duplication is disabled via GASNET_SSH_KEEPDUP=1.

That said, my reading of this suggests that your two locales should map to the first two entries. To remove some math and room for uncertainty, if you set your GASNET_SSH_SERVERS to just numLocales entries:

GASNET_SSH_SERVERS=15.212.63.555 15.212.63.111

do things work as expected?

Thanks,
-Brad

Hi Brad,

Thanks for that information.

The behavior is deterministic, albeit a bit odd. Here are the current Chapel/GASNet settings I have in my .bashrc:

export CHPL_HOME=~/software/chapel/chapel-1.32.0
export PATH=$PATH:~/software/chapel/chapel-1.32.0/bin/linux64-x86_64:~/software/chapel/chapel-1.32.0/util
export MANPATH=$MANPATH:~/software/chapel/chapel-1.32.0/man
export CHPL_CONFIG=~/software/chapel/chapel-1.32.0
export GASNET_SPAWNFN=S
export GASNET_SSH_CMD="ssh -q"
export GASNET_SSH_OPTIONS=-x
export GASNET_SSH_SERVERS="15.212.63.555 15.212.63.111"
export GASNET_WORKERIP=15.212.63.192

Changing to just the two entries does not make it work as expected. Locale 0 is still 111, regardless of the IP address order in GASNET_SSH_SERVERS. I also tried setting GASNET_SSH_KEEPDUP to 0 and to 1; that had no effect. I also tried setting GASNET_MASTERIP=15.212.63.555; that had no noticeable effect.

Best,

Gene

A couple more follow-ups here:

  1. I was able to reproduce this on my system:

    $ export GASNET_SSH_SERVERS="node11 node10"
    $ ./hello6-taskpar-dist -nl 2
    Hello, world! (from locale 0 of 2 named node10)
    Hello, world! (from locale 1 of 2 named node11)
    $ export GASNET_SSH_SERVERS="node10 node11"
    $ ./hello6-taskpar-dist -nl 2
    Hello, world! (from locale 0 of 2 named node10)
    Hello, world! (from locale 1 of 2 named node11)
    

    (where I'm running this from node11).

  2. I think the documentation I was quoting earlier doesn't actually apply to the amudprun launcher, but just to other launchers. I fear that the amudprun launcher doesn't get nearly as much love, since the udp substrate is intended more for portability and experimentation than for production runs.

  3. That said, I did post a question to the GASNet team to see whether this behavior was surprising to them, but it was late in the day, so we may not get a reply until tomorrow.

  4. I also started poking through the source code to amudprun to see if I could determine what method it was using to map processes to nodes, but haven't gotten very far yet.

-Brad

Thanks Brad,

I'm glad to hear you were able to reproduce it.

Best,

Gene

Hi Gene —

Just as a quick update, the GASNet team validated my suspicion that there is some sorting going on. Specifically, the binding of Chapel locales / GASNet processes to nodes used to be non-deterministic, so around 2020 they made it deterministic by sorting on IP addresses and using that as the order of assignment. This explains why your locale 0 tends to bind to the -111 node rather than the -555 node whenever both appear among the entries. So if you can arrange for your login/interactive node to always have the lowest IP address of the set you want to run on, things should work.

They sent some notes about other ways in which we could consider addressing this, but I need to duck into a few meetings, so I'll summarize those a bit later, when time permits.

-Brad

Hi Brad,

I changed the IP addresses before sending. But fortunately, the one I changed to 111 was the lower number. If not, I may have confused everyone. :wink:

I'm actually writing some workaround code by setting
export HOSTNAME=`hostname`
in the non-interactive portion of the .bashrc of the locale I want to interact with. Then
std::getenv("HOSTNAME")
can be checked against the locale hostname.
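
In Chapel terms, the check I have in mind looks roughly like this (untested; the extern declarations and the c_str()/pointer-cast details are my guesses at the cleanest way to reach the C library, and may need adjusting for your Chapel version):

    use CTypes;

    // HOSTNAME is assumed to be exported in the .bashrc of the node
    // I want to single out; getenv()/strcmp() are the usual C library
    // routines, declared extern here.
    extern proc getenv(name: c_ptrConst(c_char)): c_ptr(c_char);
    extern proc strcmp(a: c_ptrConst(c_char), b: c_ptrConst(c_char)): c_int;

    for loc in Locales do on loc {
      const p = getenv("HOSTNAME".c_str());
      if p != nil && strcmp(p: c_ptrConst(c_char), here.hostname.c_str()) == 0 then
        writeln("locale ", here.id, " is on the node I want to interact with");
    }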

Best,

Gene

Hi Gene —

Glad you've been able to come up with a potential workaround. Here are some other notes from the GASNet team that may be useful to you or others referring back to this conversation in the future:

They pointed out the GASNET_WORKER_RANK feature:

* GASNET_WORKER_RANK
  May be set to force a particular rank assignment for worker processes.
  If set by any process, then it must be set by all worker processes before
  init to a disjoint set of integers in the range [0..numprocs).
  It may alternatively be set to the name of another environment variable in the
  worker environment from which to retrieve the assignment (e.g. "SLURM_PROCID",
  "PMIX_RANK", "OMPI_COMM_WORLD_RANK", etc).
  Default behavior is arbitrary rank assignment that groups co-located processes.

and went on to suggest:

The simplest solution would be to set and export GASNET_WORKER_RANK=i to the appropriate integer rank number i for each compute node in .bashrc (possibly after branching on host name).
If they wanted the flexibility of several different layouts on the same nodes, they could even use .bashrc to set a family of envvars (MAPPING1=i, MAPPING2=j, etc.) and then at job launch set GASNET_WORKER_RANK=MAPPING1 to select a particular mapping.

This approach is admittedly rather hokey and probably cannot handle multiple locales per compute node, but sometimes simple is sufficient. If they are looking for a more general solution, they could install SLURM or some other job management system and set GASNET_CSPAWN_CMD to the job spawning command that gives the job layout they want. Alternatively, GASNET_CSPAWN_CMD could be a fully customized job launch script that also sets GASNET_WORKER_RANK. More details are in the conduit docs.

Hope this is helpful,
-Brad