For the limited case where you have an array
v[1..m, 1..n]
which you want as a copy of the transpose of
u[1..n, 1..m]
where the ranges can be anything as long as they match, you could use:
// v is written to, so it is passed with ref intent
proc cotranspose(u : [?uD] ?R, ref v : [?vD] R)
{
    const (rows, columns) = uD.dims();
    const r = rows.size, c = columns.size;
    param B = 16; // blocks no larger than B x B are copied directly

    if r <= B && c <= B then
    {
        // base case: element-wise transposed copy of a small block
        for (i, j) in uD do
        {
            v[j, i] = u[i, j];
        }
    }
    else if r < c then
    {
        // more columns than rows: split the column range in two
        const (cb, ce) = (columns.low, columns.high);
        const s = cb + c / 2;
        const lower = cb .. s;
        const upper = s + 1 .. ce;

        cotranspose(u[rows, lower], v[lower, rows]);
        cotranspose(u[rows, upper], v[upper, rows]);
    }
    else
    {
        // at least as many rows as columns: split the row range in two
        const (rb, re) = (rows.low, rows.high);
        const s = rb + r / 2;
        const lower = rb .. s;
        const upper = s + 1 .. re;

        cotranspose(u[lower, columns], v[columns, lower]);
        cotranspose(u[upper, columns], v[columns, upper]);
    }
}
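The proc assumes, but never checks, that the domain of v is the domain of u with its two dimensions swapped (vD is queried but not used). A minimal sketch of such a check follows; `dimsMatch` is a hypothetical helper name that is not part of the code above, and it assumes unstrided rank-2 domains.

```chapel
// Hypothetical helper (not in the original code): true when vD is uD
// with its two dimensions swapped, which is what a transposed copy needs.
proc dimsMatch(uD : domain(2), vD : domain(2)) : bool
{
    const (rows, columns) = uD.dims();
    const (vRows, vColumns) = vD.dims();

    return vRows == columns && vColumns == rows;
}

// The ranges need not start at 1; they only have to correspond.
assert(dimsMatch({0..2, 10..14}, {10..14, 0..2}));

// Same shape as u rather than the transposed shape: rejected.
assert(!dimsMatch({0..2, 10..14}, {0..2, 10..14}));
```

If wanted, a call such as `assert(dimsMatch(uD, vD));` could go at the top of the proc, at the cost of repeating the check at every level of the recursion.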
It also suffers from some of the same problems mentioned in #20803, which we will address soon.
Do we really need to handle strided arrays? In decades of using linear algebra, I have never needed to. But maybe some of the newer uses in ML and similar areas need it. Any other comments before I spend some time on it?
In this era of vectorization, the forall over both indices would probably be better done as a for over the rows, with each row then processed by a foreach. This needs some testing, preferably across several architectures such as x86-64 and ARM (as on current Macs).
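As a rough sketch of that loop structure, the transposed copy of a single block could be written with a serial for over the rows and a foreach along each row. `transposeBlock` is a hypothetical name used only for illustration, and nothing here has been benchmarked; whether the foreach actually vectorizes will depend on the compiler, the back end and the target.

```chapel
// Hypothetical helper: transposed copy of one block, written as a
// serial loop over rows with an order-independent foreach along each row.
proc transposeBlock(u : [?uD] ?R, ref v : [?vD] R)
{
    const (rows, columns) = uD.dims();

    for i in rows do
    {
        // iterations are independent, so the compiler may vectorize them
        foreach j in columns do
        {
            v[j, i] = u[i, j];
        }
    }
}

// Small check on a 3 x 5 array with distinct entries.
var u : [1..3, 1..5] real;
var v : [1..5, 1..3] real;

for (i, j) in u.domain do
{
    u[i, j] = 10.0 * i + j;
}

transposeBlock(u, v);

for (i, j) in u.domain do
{
    assert(v[j, i] == u[i, j]);
}
```

Note that the reads of u are contiguous along a row in this arrangement while the writes to v are strided, so it may be worth timing the opposite nesting as well.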