Chapel also now has support for Sparse storage although I have no experience with it and I believe that this is defined as being in the early prototype stage. I will need to install 1.29 to get experience with this because it seems quite new.
One can also implement other schemes that one can implement oneself such as the use Illiffe vectors, packed, banded, or skyline. What should one consider if doing this to ensure that self-implemented schemes parallelize well in a forall.
I am particularly interested in what I need to consider in the layout along anti-diagonal bands or waves, i.,e. where say the 16 elements of a 4*4 2D 1-offset array would be stored in order of
The name of this ordering, I do not know. I will use it in a Jacobi SVD algorithm shortly and it can be used to help parallelize an SOR algorithm and anything else which demands that the reduction of the (i,j)'th elements demands that (i-1,j) and (i,j-1) are already reduced.
I can see trying out a wavefront parallelization where the loop computes the SOR values using anti-diagonal waves, but my intuition is that the overhead of putting the data into that organization wouldn’t be worth it. Do you have some preliminary results that seem to indicate changing the storage to anti-diagonals would improve the performance? Does/Would it better enable using vector loads?
What I saw with row-major storage was that a 36-core job ran slower than a 24 core job so I assumed I was getting lots of cache misses.
Hence I was reaching out for suggestions and your work suggests I may be wasting my time. I will read them in more detail over the break. Loop chain abstraction. Got to wrap my head around that one.
Thanks - Damian
P.S. A 52-core job takes even longer.
P.P.S. Thanks for those references. I read both. The second was seriously heavy. My brain hurts.
I will revisit them later and see if the more details manage to permeate the little grey cells in my head.
Hi Damian: I did not know that Chapel allowed column-major: could you post how to specify this for an array, or point me to where it is in the documentation ? (I could not find it)
use LayoutCS;
var D = {0..#n, 0..#m}; // a default-distributed domain
var CSC_Domain : sparse subdomain(D) dmapped CS(compressRows=false);
var CSC_Array: [CSC_Domain] real;
I find Chapel's current documentation pretty well frustrating impossible to search so I am not much use. Maybe Vass or Brad can point you at it. It is related to Domain Maps and Index Sets - I think.
@nelsonluisdias sorry we have not implemented column-major layout for dense matrices. I can think of some workarounds, however it depends on your application.
@damianmoz I agree, much of our documentation leaves much to be desired. Thank you for working to contribute improvements for documentation of C types.
Domain Maps already provide the syntax. You do not need to invent something new.. I apologize that U cannot find the example which was supposed to show how to do column-major ordering. But, I am really bad at being able to search the current documentation. Give me a PDF document any day.
I was also amazed but I could not even really find where it is stipulated that the Chapel default is row-major. Hmmm.
Somebody far more knowledgable than me needs to reply here.
@vass, it was curiousity, not necessity. I am perfectly happy to use row-major order, which in my opinion is actually more natural. I do not think that Chapel needs to have both ways.
That LAPACK documentation might have confused me that Chapel supported column major ordering. Or maybe I was reading too much into that Domain Map presentation which dates back to 2014.
Enjoy the Festive Season and the New Year celebrations. Drive safely