Reindexing to same array

Hello! I tried to “cheat” the compiler into redefining an array’s domain with the code

var x : [0…4] real = {1…5};
writeln(x.domain); // prints {0…4}
x = x.reindex(1…5); // executes silently…does nothing?
writeln(x.domain); // prints {0…4}
for i in 1…5 do writeln(x[i]); // runtime error on x[5]

The compiler did not complain. Apparently the x = x.reindex(1…5) did nothing, and then
there was a runtime error when x[5] was reached which is logical since the domain had
not changed. Is the “reindex doing nothing” the expected compiler behavior?

Thanks in advance, Nelson

Hi Nelson —

Good question. x = x.reindex(1..5); is doing something, just nothing interesting. Specifically, x.reindex(1..5) creates a new array that’s a view of x's elements that’s indexed using 1..5 rather than 0..4. E.g., if you wrote (x.reindex(1..5))[0] (i.e., you tried to access element 0 of this array), you’d get an out-of-bounds error. Similarly, (x.reindex(1..5))[5] would work.

Then, that view you’ve created is assigned to x which still has the original indices 0..4. This is legal because Chapel permits assignment between rectangular arrays with different indices if they have the same size and shape. As a clearer example, the following code is legal Chapel:

var A: [0..4] real;
var B: [1..5] real;
var C: [1..10 by 2] real;

A = B;  // copies B's five elements to A; doesn't change A's indices
A = C;  // copies C's five elements to A; doesn't change A's indices

So your assignment is effectively like the A=B case above, copying the elements (which were already x’s elements, so that’s not very interesting), but not copying/changing/modifying the indices.

A more typical way to use a reindexed expression is using a ref declaration, like:

ref x1 = x.reindex(1..5);

at which point your loop would work as expected (replacing x with x1):

for i in 1…5 do writeln(x1[i]); // runtime error on x[5]

Another common pattern is to pass the reindex expression into a function that takes an array argument, where the function will treat it like any other array.

One final note: reindex views of arrays result in overhead, both to set up the view, and on every access (because the logical reindexing is computed on every access). So think of this feature as a convenience feature rather than one you’d want to lean on heavily in performance-sensitive code.

The “real” way to redefine an array’s domain is to declare the domain as a variable:

var D = {1..5};
var A: [D] real = [i in 1..5] i;

and then to re-assign the domain:

D = {0..4};

however, it’s important to note that for a given index i in both the old and new domain value, A[i] is logically preserved across this assignment. I.e., this isn’t a memory-focused re-alloc command, but a logical change of the array’s indices, preserving any elements that are in both the old and new index set. Thus, in the code above, the resulting values of A are:

0 1 2 3 4

where A[1..4] preserved their original values, A[5] was dropped on the floor, and A[0] just happens to be 0 since that’s the default value for integers (i.e., changing the domain to -1..6 would similarly set elements A[-1] and A[6] to 0).

Hope this is helpful,
-Brad

Hi Brad,

Many thanks for the answer. Your answer is crystal clear, and I should have guessed it from the rule that assignment between arrays with the same shape is permitted! The question arose when I was testing a procedure for matrix-vector multiplication (serial, very old-fashioned fortran-77 like) that would work for any square matrix A shape (n,n) and any vector x shape (n,) regardless of their domain indices. After your reply, I tested 3 versions: v1 only allows domains {1…n,1…n} and {1…n}; v2 uses ref rx = x.reindex{1…n}, etc. locally, and v3 finds the first indices and incoporates them into the code, ie a[ia+i,ja+j]*x[jx+j], etc., where ia,ja,jx are obtained from calls to the a.first, x.first (and y.first). I timed them: n == 1000, and 100 or 1000 repetitions. With all default switches (meaning bounds checking turned on), v1 ~ v3 in speed, while v2 (which uses reindex) is 6.8% slower. Interestingly, when all 3 versions are compiled with --fast, then v2 is slightly faster, so in this setting reindex does not seem to hurt performance… Note that this is overall runtime, and includes calls to fillRandom. Thanks again,

Nelson

Huh, that is interesting, and surprising to me. But I won’t complain.

Assuming this is a classic “triply nested loop” style of mat-mult algorithm, my head goes to doing the index-neutral approach by querying the domains’ dimensions using something like:

const (rows, cols) = Mat.domain.dims();
var Result: [rows] real;
forall i in rows do
  for (jm, jv) in zip(cols, Vect.domain.dim(0)) do
    Result[i] += Mat[i, jm] * Vect[jv];  // or we could play the standard trick of accumulating into a temp

Or, if it were reasonable to assume that the matrix and vector shared the same indices in the second dimension, you could simply assert that to avoid the use of the zippered inner loop and just re-use a single j index for both Mat and Vect.

Hi again Brad. Thanks for showing the totally index-agnostic version
of matrix multiplication :slight_smile: . I included your code as version 4 (but
using for, instead of forall), and timed the 4 versions with
A(1000x1000) x,y(1000), and 1000 repetitions, with --fast. Only the
matmul operations are being timed. The sources are in a Dropbox folder,
that can be downloaded here:

Here is the result, in seconds, in a Dell Vostro ~4 years old, with
8 i7 cores and 8 Gb of RAM:

v1 = all ranges are 1…n
v2 = reindex
v3 = sum index offsets
v4 = zip using dims(), dim(0)

 v1     &     v2      &       v3    &         v4 \\

11.6832 & 11.6516 & 11.6582 & 11.6691

The optimization seems to be producing code that runs equally fast
regardless of the version.

Cheers

Nelson

Cool, thanks for sharing your code and results! Are all four versions amenable to having the outer loop made into a forall? (I haven’t had the chance to open them up to look myself). If so, you should try it to make better use of the other 7 of your 8 cores!

Yes, it is trivial to change the outer loop into a forall. Leads to a ~4x speed gain,

 v1 &     v2 &       v3 &      v4

for --> 11.6832 & 11.6516 & 11.6582 & 11.6691
forall --> 0.367176 & 0.372347 & 0.362295 & 0.376119

because for some reason only 4 cores (0–3) are active. I can see the cores’ loads by looking at a Linux Mint Desklet called CPULoad. But really easy and simple to do :slight_smile:

… and putting export CHPL_RT_NUM_THREADS_PER_LOCALE=8
in .bashrc activates all 8 cores :slight_smile:

I do not know for certain this is the case but I suspect that the reason it was using 4 cores is that Chapel defaults to not using hyperthreads. I suspect your system has 4 cores with 2 hyperthreads each. In fact you can set CHPL_RT_NUM_THREADS_PER_LOCALE to MAX_LOGICAL or MAX_PHYSICAL to toggle between considering the hyperthreads or not without having to specify the number itself (although using the number is fine if it works for you!). See also https://chapel-lang.org/docs/master/usingchapel/tasks.html#controlling-the-number-of-threads .

1 Like

yep, MAX_LOGICAL also activates the 8 “cores” :slight_smile: thanks.

1 Like