Range Issues

Why are range and range(stridable = true) totally different types?

In fact, I cannot see how 1 .. 10 by 1 and 1 .. 10 are any different. Both
have unit stride, i.e. a stride of 1. Why one is considered stridable and the
other not is totally beyond me.

This is important when you want to specify a parameter to a routine which
can be either a stridable or non-stridable range. How is this done now?
As far as I can see, it cannot be done. For example

proc test(r : range)

will not accept

test(1 .. 10 by 1)

If I change the type of 'r' such that it reads as

r : range(stridable = true)

the compiler is happy, but then my routine test will not accept

test(1..10)

What am I missing, or has type strictness gone overboard?

Regards - Damian


Hi Damian -

To write a procedure accepting ranges of any type, you can use

proc test(r : range(?)) { }

This is a frequently encountered issue for ranges, so I think it should be added as an example to Ranges — Chapel Documentation 1.24.

The (?) syntax for a formal argument type is described here: Generics — Chapel Documentation 1.24. It is needed in particular for generic types with defaults (and range is one such type).
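As an illustration (the procedure name and body here are just placeholders), the generic form accepts both flavours:

proc test(r : range(?)) {
  // '(?)' leaves the range's generic arguments (idxType, boundedType,
  // stridable) unconstrained, so any range type can be passed in.
  writeln("got ", r, " : ", r.type : string);
}

test(1 .. 10);        // non-stridable range
test(1 .. 10 by 1);   // stridable range whose stride happens to be 1
test(1 .. 10 by 2);   // stridable range with stride 2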


Hi Damian —

Though these ranges happen to have the same value, a range(stridable=true) can hold 1..10 or 1..10 by 2, whereas a range(stridable=false) can only hold 1..10, not 1..10 by 2. range is a convenience alias for the range(stridable=false) case, which we believe to be the important/common case to optimize for.
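For example (a small sketch; the exact string printed for a type may vary by Chapel version):

const a = 1 .. 10;       // plain 'range', i.e. range(int, bounded, stridable=false)
const b = 1 .. 10 by 1;  // applying 'by' yields range(int, bounded, stridable=true)

writeln(a.type : string);    // something like range(int(64),bounded,false)
writeln(b.type : string);    // something like range(int(64),bounded,true)
writeln(a.type == b.type);   // false: stridability is part of the type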

The reason for this distinction in the type is one of efficiency: a non-stridable (bounded) range can be represented by two integers representing the low and high bounds; whereas a stridable range has to store four integers, representing the bounds, stride, and alignment.

But more important than the space overhead is the code generated for the different types. Stridable ranges result in for-loops that step through their integers using the stride integer which isn't generally known at compile-time. Moreover, since stridable ranges support positive or negative strides, and negative strides result in running through ranges backwards, the generated loops also often have to be general w.r.t. iteration order. In contrast, non-stridable ranges step through their integers by 1 in low->high order, resulting in code that is simpler and typically easier for a back-end compiler to reason about and optimize.
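To make that concrete, here is a small sketch (the expected output is in the comments):

for i in 1..5 do write(i, " ");          // 1 2 3 4 5 : fixed unit stride, low -> high
writeln();

for i in 1..10 by 2 do write(i, " ");    // 1 3 5 7 9 : stride is a runtime value
writeln();

for i in 1..10 by -2 do write(i, " ");   // 10 8 6 4 2 : negative stride runs backwards
writeln();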

This answers the bit of your mail about why we distinguish between these cases in the range's type. I think Michael addressed the part about how to indicate a generic range type, but let us know if you have follow-up questions.

-Brad

Thanks for the reply.

So it seems that I should treat the range as a generic type. That works. Thanks.

But aren't there serious restrictions on a 'proc' with generic types?

Stay safe - Damian


There are some restrictions, but I don't know if I'd call them serious ones... A procedure taking a generic type can't be exported outside of Chapel because we resolve the generic instantiations by looking at the callsites (so for an exported procedure, we can't guess what the calls will be). But I don't think we support exporting routines that take ranges or other complex types currently anyway, so that probably doesn't apply. I can't think of other significant restrictions offhand, but maybe am just not thinking of something that you are...

You could also create stridable and non-stridable overloads of the routine, which would make them non-generic (and could implement the non-stridable version by converting it to a stridable range and calling the stridable version to avoid the code duplication; though you'd also lose the performance benefits afforded to non-stridable ranges...).
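A minimal sketch of that overloading approach might look like this (the name test and the bodies are just placeholders):

proc test(r : range(stridable = true)) {
  // the "real" implementation, written once for the general case
  writeln("working on ", r);
}

proc test(r : range(stridable = false)) {
  // forward to the stridable version; 'by 1' produces a stridable range
  test(r by 1);
}

test(1 .. 10);       // resolves to the non-stridable overload
test(1 .. 10 by 2);  // resolves to the stridable overload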

-Brad

Given

proc range.expandHi() return this.low .. this.high + 1;
proc range.shrinkLo() return this.low + 1 .. this.high;

What is the overhead?

In the context of index neutral programming, I am finding myself writing

const (rows, columns) = a.domain.dims();
var rL = rows;

for c in columns do
{
      ....
      rL = rL.shrinkLo(); // i.e. rL.low += 1
}

I need to increment the low limit.

The need to call a whole new method seems like overkill. Is there a simpler way?

There are other times when all I need to do is grow the high limit.

Hi Damian —

For a pattern like yours, I might be inclined to write it as:

var (rL,rH) = (rows.low, rows.high);

for c in columns do
{
      ....
      rL += 1;
      ... rL..rH ...
}

for a few reasons:

  • makes the increment of the low bound a bit more explicit / in the reader's face
  • if the thing you're doing with the rL..rH range is looping over it, I suspect you may get a simpler for loop out of it and avoid creating a new range object (which is what the creation and modification of rL will do)

Or, if columns.size == rows.size, you might do:

for (c,rL) in zip(columns, rows) {
  ... rL..rH ...
}

to more directly tie the value of rL to the loop header rather than the body. Or, I suppose, even if they're different sizes, you could do:

for (c,rL) in zip(columns, rows.low..) {
  ... rL..rH ...
}

OTOH, if you're not using the shrinking row range within the loop body to drive an inner loop and are using the row sub-range expression multiple times within the routine, calling your helper routine may be preferable in terms of creating a new range once and using it multiple times.

To your original "What is the overhead?" question, in general, creating a new range is a matter of setting up a small struct with a few integers in it, and using it is a matter of plucking those values out of the range. Using a helper routine to create the range ought to result in fairly minimal overhead, and you may be able to reduce it further using inline proc. But I don't have enough experience with benchmarking range creation idioms to be able to say very definitively which will work best.
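For instance, making the helpers from earlier in the thread inline is a one-word change (a sketch; I haven't measured the difference):

inline proc range.shrinkLo() return this.low + 1 .. this.high;
inline proc range.expandHi() return this.low .. this.high + 1;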

The thing that makes me suggest using the rL..rH range literal if you're writing a loop is that the compiler transforms loops of the form:

for i in lo..hi do

into something resembling a more straightforward C loop without introducing a range object.
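In other words, the two idioms being compared are roughly (a trivial sketch):

const lo = 1, hi = 10;

// range literal in the loop header: lowered to a simple counted loop
for i in lo..hi do
  write(i, " ");
writeln();

// named range variable: a range value is constructed first, then iterated over
const r = lo..hi;
for i in r do
  write(i, " ");
writeln();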

In most heavy-duty computations, I'd like to think that the differences between all of the above, including yours, are a wash. And if that were not the case, it may just suggest we have more optimization work to do here.

-Brad

Here is the final attempt; your feedback is appreciated.

// LU decomposition by Crout's method - reducing the column 'c'. Note
// that 'r' matches 'c' if the row indices match those of the columns,
// but in an index neutral scenario, this cannot be guaranteed

// Decompose that part of the column above the diagonal, i.e. which 
// is effectively the matrix 'U'. This must be done serially because
// each element depends on all the elements above it in the column.

for (i, re, ce) in zip(rlo + 1 .. r - 1, rlo .., clo ..) do
{
    ac[i] = -inner(a[i, clo..ce], ac[rlo..re], -ac[i]);
}

const rL = r .. rhi;
const cL = clo .. c - 1;
const ref uc = ac[rlo .. r - 1];

// Decompose that part of the column on and below the diagonal
// which is effectively the matrix 'L'. It can be done in parallel.

forall i in rL do
{
    ac[i] = -inner(a[i, cL], uc, -ac[i]);
}

Should I avoid the use of zippering in the first loop and just assign

                var re = rlo, ce = clo;

prior to the loop, and then increment 're' and 'ce' each by 1 within the loop?

Would anything in the second loop be enhanced by not using ranges, i.e. something like

forall i in r .. rhi do
{
      ac[i] = -inner(a[i, clo .. c - 1], uc, -ac[i]);
}

Besides avoiding the subtraction c - 1 that is.

In case you are wondering what that weird zipped for loop is about:

If I was reducing a square matrix with conventional indexing, i.e. a[1..n, 1..n], then that loop would read as the far simpler

for i in 2 .. r - 1 do
{
   ac[i] = -inner(a[i, 1..i-1], ac[1..i-1], -ac[i]);
}

This is the reduction of an element of a column, the reducee, and it is just the inner product of the elements prior to the reducee in that column and the elements of the row to the left of the reducee.

But, as soon as you have to write routines that cater for index neutrality, where the row indices might not be the same as the column indices and where they both have an arbitrary starting index, you end up with a whole new ball-game. Note that in my original code, I can

assert(clo - rlo == ce - re);
assert(i == re + 1);

because the rows are offset by rlo and the columns by clo. Given that, I could have written my original code as

for i in rlo + 1 .. r - 1 do
{
   const re = i - 1, ce = re - (rlo - clo);

   ac[i] = -inner(a[i, clo .. ce], ac[rlo .. re], -ac[i]);
}

Will that result in more optimal code than zippering?

The background is that I am doing an index-neutral Crout LU decomposition routine with which to test the programming techniques of such an approach.

I will then time it and compare those times with the alternative approach of simply reindexing an array passed to a routine to something which makes more sense.

But that will take time. Besides, I keep getting deleteme errors from the Chapel compiler when I use reindex, so I obviously still have some way to go. But other things have priority now.

Hi Damian —

Sorry for the slow response here.

I think the choice of whether to express the first loop using zippering, increments by 1 in the body, or a closed-form calculation of re, ce is likely a wash performance-wise (or rather, if one of these did perform very differently from the others, that would be surprising to me and not something I'm able to anticipate). So I'd suggest choosing the one whose style you prefer.

Would anything in the second loop be enhanced by not using ranges

No, I don't believe so. More specifically, while I believe we sometimes generate more efficient serial loops for literal ranges lo..hi relative to named range variables, I think once the loop is parallel, it's likely a wash (because the code to implement a parallel loop is so much more general by nature).
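If you do want to measure it yourself, a rough timing sketch along these lines would do (this uses the Timer from the Time module in 1.24-era Chapel; later versions rename it stopwatch, and the numbers will of course depend on your machine):

use Time;

config const n = 10_000_000;
var A : [1..n] real;
var t : Timer;

// parallel loop over a named range variable
const r = 1..n;
t.start();
forall i in r do A[i] += 1;
t.stop();
writeln("named range:   ", t.elapsed(), " s");
t.clear();

// parallel loop over a range literal in the loop header
t.start();
forall i in 1..n do A[i] += 1;
t.stop();
writeln("range literal: ", t.elapsed(), " s");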

-Brad