Parent/Child thread

Is locale[0] always the parent thread/process and locales[1..] always children of the parent?

I went fishing in the docs but maybe my searching is woeful.

Thanks

I don't know the answer to that question, but I wouldn't expect that to
be something a user should rely on one way or the other, given that we
try to abstract away threads instead of tracking them separately from
tasks from the user's perspective. For instance, there was a point in
Chapel's history where we had both a CHPL_TASKS and a CHPL_THREADS
environment variable, but we removed the latter back in 2015
(Remove the chplenv variable CHPL_THREADS by ronawho · Pull Request #2932 · chapel-lang/chapel · GitHub).

Lydia

@damianmoz — Do you mean conceptually or practically (e.g., at the system / OS level)?

Thanks for clarifying,
-Brad

I need a global variable and a mirror of that in every logical task.

I need to be able to read/write that variable at some point within each such task and then later in that same task read/write it again.

Is that the question you just asked?

Basically, code anywhere within a thread needs access to a thread-local variable, with
a) read access at close to no overhead (used often)
b) write access with minimal overhead (used very infrequently)

At thread instantiation, including the parent, the thread local copy is initialised with a run-time parameter, e.g.

    config const NuCoMa = 0x0;

A change in a child thread does not affect a parent or sibling thread. If you want that level of control, you need to build that in yourself.

If it is in a module called NuCo, the variable may be reset within a thread with, say,

    NuCo.set(1234)

and any code within that thread, potentially several calls deeper, that subsequently wants to access it can grab a copy with

    const ncm = NuCo.get()

knowing that this call will have near zero overhead and not cause any register spill across that 'get()' call.

Also, the value needs to reside within that thread. Communication back to the parent is a No-No.

My apologies for my earlier lack of clarity.

Hi Damian —

If I am understanding correctly, I don't think we have the ability to do exactly what you want today. Specifically, there is currently no way to create task- or thread-local storage that doesn't follow normal lexical scoping rules, where naming a variable is resolved by searching up the program's lexical scopes until its definition is found.

I think the pattern that would get you closest to this in Chapel today would be to:

  1. Declare a module-scope variable that will serve as the initial copy of the variable's value:

    module NuCo {
      config const NuCoMa = 0x0;
    }
    

    Since this is at module scope and const, it will be replicated across all locales, with each having its own local copy of the value.

  2. Then, anytime you create a new parallel task using begin, cobegin, coforall, or forall, add a with (in NuCoMa) to give the tasks implementing that parallel statement their own, copied-in, modifiable copy of NuCoMa to read and write.

Here's a simple example demonstrating this in practice:

    module NuCo {
      config const NuCoMa = 42.0;
    }

    module Main {
      use NuCo;

      proc main() {
        coforall i in 1..10 with (in NuCoMa) {  // give each task its own, modifiable copy of NuCoMa
          NuCoMa += (i / 10.0);                 // modify it
          writeln(NuCoMa);                      // show that it was modified
        }

        coforall i in 1..10 with (in NuCoMa) {  // do it again, making new copies from the module-scope variable
          NuCoMa += i*1000;
          writeln(NuCoMa);
        }
      }
    }

There are a few reasons I think this isn't quite what you're asking for:

The first is that, because Chapel is lexically-scoped, if you were to call a procedure within your coforall and it were to refer to NuCoMa, it would refer to the global const rather than some sort of implicit task- or thread-local storage:

    coforall i in 1..10 with (in NuCoMa) {
      foo(i);
    }

    proc foo(i) {
      NuCoMa += (i / 10.0);  // here, foo() doesn't have a local variable or argument
                             // named `NuCoMa`, so the module-scope one would be accessed instead,
                             // but can't be modified since it's 'const'
    }

Of course, you could deal with this by manually threading the variable through such procedures:

    coforall i in 1..10 with (in NuCoMa) {
      foo(i, NuCoMa);        // pass this task's own copy explicitly
    }

    proc foo(i, ref NuCoMa) {
      NuCoMa += (i / 10.0);  // here, `NuCoMa` is foo()'s 'ref' argument, which aliases
                             // the calling task's copy, so the update is legal and
                             // stays local to that task
    }

The second is that if you were to use implicit parallelism, like promotion or whole-array operations, those tasks would not get their own copies of NuCoMa because there are no parallel constructs to which to apply the with clause. For example:

    sin(A*NuCoMa);  // This will create a number of tasks, but each will only reference
                    // the module-scope NuCoMa because there's no way to attach a with-
                    // clause to this expression to give them each their own copy

    A = NuCoMa;     // Ditto
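If per-task copies are needed in such cases, one possible workaround is to write the promoted expression out as an explicit forall loop, which gives the with-clause somewhere to attach. The following is only a sketch: the arrays `A` and `B`, their sizes, and the default value of `NuCoMa` are made up for illustration. Note that an `in` intent on a forall makes one copy per task, not one per iteration.

```chapel
use Math;                      // for sin()

config const NuCoMa = 2.0;     // module-scope default (value chosen for illustration)

proc main() {
  var A: [1..8] real = 0.5;    // hypothetical input array
  var B: [1..8] real;          // hypothetical result array

  // Explicit-loop counterpart of `B = sin(A*NuCoMa);`
  forall (b, a) in zip(B, A) with (in NuCoMa) {
    b = sin(a * NuCoMa);       // reads this task's private copy of NuCoMa
  }

  // Explicit-loop counterpart of `A = NuCoMa;`
  forall a in A with (in NuCoMa) {
    a = NuCoMa;
  }

  writeln(B);
  writeln(A);
}
```

As with the coforall example, any procedure called from inside these loops would still see the module-scope `NuCoMa` unless the task's copy is passed to it explicitly.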

Basically, we don't have a way to create some sort of implicit/invisible task-local storage or registry in the language today, so such task-local variables must be implemented through creating and copying normal variables. It might be reasonable to request some sort of per-task registry as a new feature if the approaches suggested above are insufficient.

-Brad

Thanks. I can fake my testing with just the global copy for now.

At least I know that I do not need to continue looking for something that does not exist. I will explain the context at some later date. Thanks again.
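For reference, the interim "global copy only" approach behind the `NuCo.set()` / `NuCo.get()` interface described earlier might look roughly like the sketch below. The private variable `current` and the `int` argument type are assumptions made for illustration. This is explicitly not task- or thread-local: every task shares the one copy, concurrent calls to `set()` would race, and in a multi-locale run the variable lives on the locale where the module was initialized, so remote reads would involve communication.

```chapel
module NuCo {
  config const NuCoMa = 0x0;       // run-time default, as in the thread

  // Hypothetical single shared copy standing in for the desired
  // thread-local one; NOT private to any task.
  private var current = NuCoMa;

  inline proc get() { return current; }
  inline proc set(v: int) { current = v; }
}

module Main {
  use NuCo;

  proc main() {
    writeln(NuCo.get());           // the configured default
    NuCo.set(1234);
    writeln(NuCo.get());           // the value just set, visible to all tasks
  }
}
```

Keeping callers behind `get()` and `set()` means the storage strategy could be changed later without touching the call sites.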

Basically, I want to test a slightly different twist on the current concept of the floating point control and status register, MXCSR (SSE instruction set), a copy of which exists in every program thread.

I want a new type of floating point control register, one which only controls the status and whose assignment rules follow those of a variable with thread-global scope. I think GCC uses __thread for that and Clang uses thread_local.

I might be able to fake it by writing a C routine to achieve

    get floating point control word
    set floating point control word

as that control word exists on a per-thread basis. But that approach has high overhead, as a call to an external C routine would cause registers to be saved across the subroutine call. It is also not portable, and would really need to be done as an in-line assembler instruction such as, on an X86-64,

    stmxcsr uint
    ldmxcsr uint

Not quite what I am trying to simulate. Even doing it as in-line embedded assembler is likely to be problematic, as the instruction is not your average load/store to a register. On an X86-64, it is slower than a square root instruction and involves memory access. On the old 386, it was slower still.

I will think about it a bit more. Once I have a working prototype using a global variable, it might be sufficient for the project, which embeds within a NaN result the information about any floating point exception which has occurred. The status feature of these side-effect registers, which currently contain such information, will be made redundant. Once I have a working prototype, I might have learned enough to intelligently ask for a Chapel feature. But more than likely, I will know what I need to know for the project and the need for the feature will disappear. As I noted, time for some more thinking. Maybe down the beach with the sun on my brain for some solar powered boost.