Red-black Gauss-Seidel

cpraveen · October 22, 2025, 4:05pm

For red-black Gauss-Seidel, you need to partition the range into red and black sets, see e.g.,

All red points in one range and all back in another. Here is my attempt

   const D = {1..5,1..5};

   var red   : sparse subdomain(D);
   var black : sparse subdomain(D);

   for (i,j) in D
   {
      if (i+j) % 2 == 0
      {
         red += (i,j);
      }
      else
      {
         black += (i,j);
      }
   }

   for (i,j) in red do
      writeln(i," ",j);

   for (i,j) in black do
      writeln(i," ",j);

This seems to give me the correct partitioning of the range. Is this correct way to do it, or is there a better way to do this, a more compact way.

Following does not compile

   const red  : domain(2) = [ (i, j) in D && (i + j) % 2 == 0 ] (i,j);
   const black: domain(2) = [ (i, j) in D && (i + j) % 2 == 1 ] (i,j);

bradcray · October 22, 2025, 5:22pm

Hi Praveen —

I would generally avoid using sparse domains to represent a pattern as
regular as a red-black pattern on a dense array. Specifically, the
pattern in a red-black computation is very regular and algebraic whereas
sparse domains (at least, in their current implementations) are very
general-purpose, requiring O(nnz) storage and not likely to result in loops
that are as tight and efficient in the implementation.

Instead, I would consider writing explicit loops like:

    forall i in 1..n {
      forall j in 1..n by 2 align i {  // or maybe foreach?
        ... do the computation in one color …
      }
    }

    forall i in 1..n {
      forall j in 1..n by 2 align i+1 {  // or maybe foreach?
        ... do the other ...
      }
    }

(check my work) to express red-black. If you wanted to do this
with domains, I think the two approaches would be:

To use two rectangular, strided-by-two domains per red/black color (for
a total of four domains) to capture the points of interest. The
downside of this is that it would require two loops per color.
To create a custom sparse domain map that uses O(1) storage and
represents the red or black colors, whose implementation in terms of
parallel iterators and the like would likely be similar to the loops
above. In many senses, this would be a really clean, lovely
implementation, but writing a new domain map is a nontrivial (and not
particularly well-documented) effort. That said, I find myself
intrigued whether it could be done relatively easily.

A stepping stone to bullet two above would be to create some standalone
parallel iterators for red/black iteration, whose bodies looked something
like the loops above, and that's both a simpler effort than writing a complete
domain map and also a step along the way to creating the domain map.

In any event, I would start with an approach that used explicit loops as
above to get up and running; and then consider moving them into parallel
iterators (initially) and domain maps (ultimately) if the approach seemed
sound, performance was reasonable, and better abstraction was desired.

Those are my quick thoughts anyway,
-Brad

cpraveen · October 23, 2025, 2:48am

I tested Brad’s idea on a 500x500 grid which gives timing

Initial residual = 0.501004
Final   residual = 4.61574e-07
No. of iterations= 1152
./test_poisson --n 500 --itmax 2000  0.73s user 0.02s system 278% cpu 0.267 total

If I use a condition test

      forall (i,j) in inner do
      if (i+j)%2 == 0
      {
         const tmp = 0.25 * (u[i-1,j] + u[i+1,j] + u[i,j-1] + u[i,j+1] 
                             + h * h * rhs[i,j]);
         u[i,j] = (1.0 - r) * u[i,j] + r * tmp;
      }

      forall (i,j) in inner do
      if (i+j)%2 == 1
      {
         const tmp = 0.25 * (u[i-1,j] + u[i+1,j] + u[i,j-1] + u[i,j+1] 
                             + h * h * rhs[i,j]);
         u[i,j] = (1.0 - r) * u[i,j] + r * tmp;
      }

it is bit slower

Initial residual = 0.501004
Final   residual = 4.61574e-07
No. of iterations= 1152
./test_poisson --n 500 --itmax 2000  0.90s user 0.02s system 293% cpu 0.314 total

bradcray · October 23, 2025, 6:03am

Hi @cpraveen —

It doesn't necessarily surprise me that the version with the conditional would be a bit slower since control flow within loops can be bad for performance; though it'd be interesting to see whether the behavior flips as the array continues to get bigger since my approach would require two passes over the array, which could be bad if it doesn't fit into cache.

Be sure to compile with --fast when doing performance comparisons if you're not already. Also, if taking timings with time, you may want to time a no-op program to see how long that takes (all Chapel programs have some overhead to get up and running, even if they do nothing).

-Brad

damianmoz · October 27, 2025, 8:05am

Anna Felix did some work with this a few years ago? She might have some advice.

I think Engin or Vass provided some input at the time but I cannot find the details in my email.

cpraveen · October 27, 2025, 9:11am

There is discussion of red-black Gauss-Seidel for 3D Poisson here

annacfs · October 27, 2025, 10:31am

Topic		Replies	Views
Sparse Data Distribution Support over Multiple Locales Users	13	653	December 8, 2020
Code optimization Users	28	592	December 28, 2022
Sparse Matrices - CSR and RSB Users	3	176	December 22, 2022
Reindexing to same array Users	12	501	April 21, 2021
Partial reductions (operate on 2D-array and produce 1-D array) Users	19	496	December 31, 2022

Red-black Gauss-Seidel

Related topics