Slicing with rank-reduction performance

damianmoz · May 2, 2022, 2:01am

Too often, Fortran programs implementing variational calculus, i.e. Finite Element, Finite Volume and Finite Difference, have declarations like

real ustar(Nxp1,Ny), un(Nxp1,Ny)
real vstar(Nx,Nyp1), vn(Nx,Nyp1)
real CONVx(Nx,Ny), VISCx(Nx,Ny)
real CONVy(Nx,Ny), VISCy(Nx,Ny)

In a refactoring to Chapel, these might naively map directly to declarations like

var ustar, un : [1..Nx+1,1..Ny] real(32);
var vstar, vn : [1..Nx,1..Ny+1] real(32);
var CONVx, VISCx, CONVy, VISCy : [1..Nx,1..Ny] real(32);

The problem with the above is that you have to carry around all these variables throughout the program and your Chapel program inherits those ugly long Fortran parameter lists.

Realizing that these visc[osity] term and conv[ection] terms are just two pieces of data associated with a single point (i, j), i.e. point data, each of which exists in two dimensions, one could declare a single variable, say vc, that contains all of the above point data as:

const D = 1..2; // 1 => X-direction, 2 => Y-direction
const V = 1..2; // 1 => viscosity, 2 => convection

var vc : [V, D, 1..Nx+1, 1..Ny+1] real(32); // point data

In this way, you needs to carry around one variable, vc, instead of four, albeit with some wasted space in the final row and column in two scenarios.

Within one's code, when one wants to do operations on individual component of that point data, you just slice it up, always with rank reduction. That is, within a routine, one might see

ref VISCx = vc[1, 1, .., ..];
ref VISCy = vc[1, 2, .., ..];
ref CONVx = vc[2, 1, .., ..];
ref CONVy = vc[2, 2, .., ..];

This results in such slicing, i.e. rank reducing slices, being the predominant slicing operation in
a Chapel refactorization of these types of Fortran programs.

Looking at another scenario, slicing of some array with a reduction down to a rank of 1 where the slice contains memory-contiguous floating point data, is the operation that many of us want to maps to a hardware vector, a feature that many of us want to exploit heavily in our programs.

I looked at all the occurrences of slicing inside old Fortran programs that my colleagues, clients and I, use, and hope to, have or want to port to Chapel. Therein, rank-reducing slicing accounts for 99+% of the slicing operations. I might have missed a few, but not enough to make a huge difference to that number.

This makes the performance of such rank-reduced slicing really critical. Chapel needs as a goal, the ability to slice with rank reduction being as performant as that without rank reduction.

My 2c.

With thanks to @annacfs for her Finite Volume Computational Fluid Dynamics example seen in GitHub Issue #19636.

bradcray · May 2, 2022, 11:42pm

Hi Damian —

If I wanted to combine those four arrays into one in Chapel, rather than promoting it to a 4D array, I would suggest making it into an array of arrays, as follows:

var vc: [1..2, 1..2] [1..Nx+1, 1..Ny+1] real(32);
ref VISCx = vc[1,1],
    VISCy = vc[1, 2],
    CONVx = vc[2, 1],
    CONVy = vc[2, 2];

In my opinion, this is a more natural way in Chapel to say "here's a 2x2 collection of ~Nx x ~Ny arrays", while also avoiding the need for any rank-change slicing.

None of this is to say that we rank-change slices shouldn't receive further optimization and tuning. Like many optimization problems, there's something of a priority challenge here—we haven't had key user codes relying on this and suffering from it, so it's been easy to put off. If this becomes a limiting factor for someone, they should open an issue on our repository on GitHub.

-Brad

damianmoz · May 6, 2022, 3:44am

Yes. But then all the X-components are then

var xOnly = vc[..][1];

at which point xOnly appears to be

xOnly[1..2][1..Nx+1, 1..Ny+1];

Is this correct?

It still looks like I have reduced the rank in some sense.

Hmmm. I wonder about the X components within a single column of the last 2 dimensions. I need to think about it a bit more.

bradcray · May 10, 2022, 9:01pm

That's true that if you only wanted to access the x or y components, you'd need to do another slice, like vc[.., 1] or vc[.., 1..1]. Alternatively, you might consider using an array of arrays to eliminate the need for a rank-change slice:

var vc: [V] [D] [1..Nx+1, 1..Ny+1] real(32);

-Brad

PS — I want to note that in your last response, you're in danger of butting up against an aspect of the language that always makes me feel a little squeamish, which I'll illustrate in the following example:

var X: [1..3] [1..4, 1..5] real;

Chapel today does not support the ability to write something like:

...X[2..3][2..4, 2..5]...

as a way of saying "slice the first array first and the second array second." The reason for this is that a slice expression like X[2..3] results in a new 1D virtual array that can itself be indexed, sliced, etc. So then the second slice expression [2..4, 2..5] is applied to this new virtual array, but is the wrong rank (2D rather than the 1D that the virtual array was expecting to support). This is perhaps clearer to see by breaking the expression up into two portions:

ref X23 = x[2..3];   // X23 is logically a 1D array
X23[2..4, 2..5];    // ... so it can't be accessed with a 2D slice

I'm always interested in more opinions and thoughts on this topic, where putting them at
How should accessing a slice be interpreted? · Issue #5568 · chapel-lang/chapel · GitHub would be a good place to do so.

damianmoz · May 10, 2022, 10:46pm

Thanks for the extra insight. I will use that issue in future. I had not realised it existed although that was probably because my search terms were inadequate.

Topic		Replies	Views
Werid performance slow-down Users	3	230	January 12, 2023
Partial reductions (operate on 2D-array and produce 1-D array) Users	19	441	December 31, 2022
Performance of assignments involving slices Users	12	277	December 13, 2021
Performance Considerations Users	19	411	April 21, 2021
Code optimization Users	28	524	December 28, 2022

Slicing with rank-reduction performance

Related topics