Red-black Gauss-Seidel

For red-black Gauss-Seidel, you need to partition the range into red and black sets, see e.g.,


All red points in one range and all back in another. Here is my attempt

   const D = {1..5,1..5};

   var red   : sparse subdomain(D);
   var black : sparse subdomain(D);

   for (i,j) in D
   {
      if (i+j) % 2 == 0
      {
         red += (i,j);
      }
      else
      {
         black += (i,j);
      }
   }

   for (i,j) in red do
      writeln(i," ",j);

   for (i,j) in black do
      writeln(i," ",j);

This seems to give me the correct partitioning of the range. Is this correct way to do it, or is there a better way to do this, a more compact way.

Following does not compile

   const red  : domain(2) = [ (i, j) in D && (i + j) % 2 == 0 ] (i,j);
   const black: domain(2) = [ (i, j) in D && (i + j) % 2 == 1 ] (i,j);

Hi Praveen —

I would generally avoid using sparse domains to represent a pattern as
regular as a red-black pattern on a dense array. Specifically, the
pattern in a red-black computation is very regular and algebraic whereas
sparse domains (at least, in their current implementations) are very
general-purpose, requiring O(nnz) storage and not likely to result in loops
that are as tight and efficient in the implementation.

Instead, I would consider writing explicit loops like:

    forall i in 1..n {
      forall j in 1..n by 2 align i {  // or maybe foreach?
        ... do the computation in one color …
      }
    }

    forall i in 1..n {
      forall j in 1..n by 2 align i+1 {  // or maybe foreach?
        ... do the other ...
      }
    }

(check my work) to express red-black. If you wanted to do this
with domains, I think the two approaches would be:

  • To use two rectangular, strided-by-two domains per red/black color (for
    a total of four domains) to capture the points of interest. The
    downside of this is that it would require two loops per color.

  • To create a custom sparse domain map that uses O(1) storage and
    represents the red or black colors, whose implementation in terms of
    parallel iterators and the like would likely be similar to the loops
    above. In many senses, this would be a really clean, lovely
    implementation, but writing a new domain map is a nontrivial (and not
    particularly well-documented) effort. That said, I find myself
    intrigued whether it could be done relatively easily.

A stepping stone to bullet two above would be to create some standalone
parallel iterators for red/black iteration, whose bodies looked something
like the loops above, and that's both a simpler effort than writing a complete
domain map and also a step along the way to creating the domain map.

In any event, I would start with an approach that used explicit loops as
above to get up and running; and then consider moving them into parallel
iterators (initially) and domain maps (ultimately) if the approach seemed
sound, performance was reasonable, and better abstraction was desired.

Those are my quick thoughts anyway,
-Brad

I tested Brad’s idea on a 500x500 grid which gives timing

Initial residual = 0.501004
Final   residual = 4.61574e-07
No. of iterations= 1152
./test_poisson --n 500 --itmax 2000  0.73s user 0.02s system 278% cpu 0.267 total

If I use a condition test

      forall (i,j) in inner do
      if (i+j)%2 == 0
      {
         const tmp = 0.25 * (u[i-1,j] + u[i+1,j] + u[i,j-1] + u[i,j+1] 
                             + h * h * rhs[i,j]);
         u[i,j] = (1.0 - r) * u[i,j] + r * tmp;
      }

      forall (i,j) in inner do
      if (i+j)%2 == 1
      {
         const tmp = 0.25 * (u[i-1,j] + u[i+1,j] + u[i,j-1] + u[i,j+1] 
                             + h * h * rhs[i,j]);
         u[i,j] = (1.0 - r) * u[i,j] + r * tmp;
      }

it is bit slower

Initial residual = 0.501004
Final   residual = 4.61574e-07
No. of iterations= 1152
./test_poisson --n 500 --itmax 2000  0.90s user 0.02s system 293% cpu 0.314 total

Hi @cpraveen

It doesn't necessarily surprise me that the version with the conditional would be a bit slower since control flow within loops can be bad for performance; though it'd be interesting to see whether the behavior flips as the array continues to get bigger since my approach would require two passes over the array, which could be bad if it doesn't fit into cache.

Be sure to compile with --fast when doing performance comparisons if you're not already. Also, if taking timings with time, you may want to time a no-op program to see how long that takes (all Chapel programs have some overhead to get up and running, even if they do nothing).

-Brad

Anna Felix did some work with this a few years ago? She might have some advice.

I think Engin or Vass provided some input at the time but I cannot find the details in my email.

There is discussion of red-black Gauss-Seidel for 3D Poisson here