All red points in one range and all back in another. Here is my attempt
const D = {1..5,1..5};
var red : sparse subdomain(D);
var black : sparse subdomain(D);
for (i,j) in D
{
if (i+j) % 2 == 0
{
red += (i,j);
}
else
{
black += (i,j);
}
}
for (i,j) in red do
writeln(i," ",j);
for (i,j) in black do
writeln(i," ",j);
This seems to give me the correct partitioning of the range. Is this correct way to do it, or is there a better way to do this, a more compact way.
Following does not compile
const red : domain(2) = [ (i, j) in D && (i + j) % 2 == 0 ] (i,j);
const black: domain(2) = [ (i, j) in D && (i + j) % 2 == 1 ] (i,j);
I would generally avoid using sparse domains to represent a pattern as
regular as a red-black pattern on a dense array. Specifically, the
pattern in a red-black computation is very regular and algebraic whereas
sparse domains (at least, in their current implementations) are very
general-purpose, requiring O(nnz) storage and not likely to result in loops
that are as tight and efficient in the implementation.
Instead, I would consider writing explicit loops like:
forall i in 1..n {
forall j in 1..n by 2 align i { // or maybe foreach?
... do the computation in one color …
}
}
forall i in 1..n {
forall j in 1..n by 2 align i+1 { // or maybe foreach?
... do the other ...
}
}
(check my work) to express red-black. If you wanted to do this
with domains, I think the two approaches would be:
To use two rectangular, strided-by-two domains per red/black color (for
a total of four domains) to capture the points of interest. The
downside of this is that it would require two loops per color.
To create a custom sparse domain map that uses O(1) storage and
represents the red or black colors, whose implementation in terms of
parallel iterators and the like would likely be similar to the loops
above. In many senses, this would be a really clean, lovely
implementation, but writing a new domain map is a nontrivial (and not
particularly well-documented) effort. That said, I find myself
intrigued whether it could be done relatively easily.
A stepping stone to bullet two above would be to create some standalone
parallel iterators for red/black iteration, whose bodies looked something
like the loops above, and that's both a simpler effort than writing a complete
domain map and also a step along the way to creating the domain map.
In any event, I would start with an approach that used explicit loops as
above to get up and running; and then consider moving them into parallel
iterators (initially) and domain maps (ultimately) if the approach seemed
sound, performance was reasonable, and better abstraction was desired.
I tested Brad’s idea on a 500x500 grid which gives timing
Initial residual = 0.501004
Final residual = 4.61574e-07
No. of iterations= 1152
./test_poisson --n 500 --itmax 2000 0.73s user 0.02s system 278% cpu 0.267 total
If I use a condition test
forall (i,j) in inner do
if (i+j)%2 == 0
{
const tmp = 0.25 * (u[i-1,j] + u[i+1,j] + u[i,j-1] + u[i,j+1]
+ h * h * rhs[i,j]);
u[i,j] = (1.0 - r) * u[i,j] + r * tmp;
}
forall (i,j) in inner do
if (i+j)%2 == 1
{
const tmp = 0.25 * (u[i-1,j] + u[i+1,j] + u[i,j-1] + u[i,j+1]
+ h * h * rhs[i,j]);
u[i,j] = (1.0 - r) * u[i,j] + r * tmp;
}
it is bit slower
Initial residual = 0.501004
Final residual = 4.61574e-07
No. of iterations= 1152
./test_poisson --n 500 --itmax 2000 0.90s user 0.02s system 293% cpu 0.314 total
It doesn't necessarily surprise me that the version with the conditional would be a bit slower since control flow within loops can be bad for performance; though it'd be interesting to see whether the behavior flips as the array continues to get bigger since my approach would require two passes over the array, which could be bad if it doesn't fit into cache.
Be sure to compile with --fast when doing performance comparisons if you're not already. Also, if taking timings with time, you may want to time a no-op program to see how long that takes (all Chapel programs have some overhead to get up and running, even if they do nothing).