Allocating a nested array of arrays

I have an FFT (Fast Fourier Transform) data structure along the lines of

record
{
    np : int(w); // number of data samples
    nuf : int(w); // number of unique prime factors of 'np'
    pf : [1..nuf] uint(w); // prime factors of np
    rpf : [1..nuf] uint(w); // repetitiveness of pf[i]
    nf : int(w); // number of factors of 'np' where np = + reduce rpf
    tw : array [1..nf] of record // twiddle factors of the FFT
    {
        f : uint(w); // factor of concern
        m : uint(w); // m is a function of f, pf, rpf, and np
        v : [1..f, 1..m] real(u); // twiddle factor values
    };
}

I have factorized np in advanced. Also, courtesy of Michael, we know nuf <= 16 but that is not relevant here.

So, I know in advance, each of np, nf, nuf, and the contents of pf[..], rpf[..] and the dimensions of each 2D array v within the 1D array tw, i.e. f and m.

For n = 280 those values of f are (technically) 2, 2, 2, 5, 7.

What is the best way to allocate something like this within one routine and then return it as the result of that routine to the calling routine which itself will return the newly allocated routine to its caller. Do I know the class structure at this stage - No sorry.

Thanks

Hi @damianmoz,

It looks trivial to me to do what you are asking. Please clarify what I am missing and also your question "Do I know the class structure at this stage?"

First, if your data structure is a record, call it FFTstruct:

proc createFFTstruct(....) {
  /* prep work here */
  return new FFTstruct(....);
}

proc createWrapper(....) {
  var [or const] ffts = createFFTstruct(....);
  .....
  return ffts;
}

Due to copy elision the ownership of the record allocated by the new statement will transfer from createFFTstruct into createWrapper then to createWrapper's caller. Are you concerned about the record being copied? A shallow/bitwise copy may or may not occur. Even if it does, I expect the arrays to be passed around by (internal) reference.

If your FFTstruct is a class, the same code will still work, except the functions will be returning an owned class class pointer. Of course this guarantees that there will be no copying of the data.

Vass

Now that I have written the above, maybe these are the things you are actually asking:

Q: How to make tw contains instances of FFTstruct recursively?

A: the simplest is to make FFTstruct a class and declare:

tw : array [1..nf] of owned /*or shared*/ FFTstruct?;

I am sure we can get rid of the ? there, i.e. make the array elements non-nilable. This will require recursion in the FFTstruct constructor. Unsure if FFTstruct can be a struct in this case.

An alternative fun idea is to store the FFTstructs in a map(int, owned FFTstruct /* or even record FFTstruct */). That way tw is simply an array of ints. If so, v needs to be stored elsewhere, if I understand it.

Q: How to put (f, m, v) into a separate class?

A: Yes. In this case that class will be pointed at by a field in FFTstruct. If this does not "just work," please let us know. The decision to be made is whether the field should be owned, shared, or borrowed. This depends on the context.

I had lots of questions in my head and I did not know where to start.

I have two levels in that record. And my head was doing recursions. Let's address the simple bit.

I can create a record where given two dimensions f and m, I can create the record including the array space tw[1..f][1..m]; But doing several of them and storing them in an array is, it seems beyond me.

This program blows up.

record chunk
{
        var f : uint(64);
        var m : uint(64);
        var v : [1..f, 1..m] real(32);
};
var z : [1..16] chunk;
var sum1 = 0.0;
var sum2 = 0.0;
var kk = 0;

for f in 11..14 do
{
        for m in 5..8 do
        {
                kk += 1;
                z[kk] = new chunk(f, m);

                for i in 1..f do
                {
                        for j in 1..m do
                        {
                                z[kk].v[i, j] = (i + j):real(32);
                                sum1 += (i + j):real(32);
                        }
                }
        }
}

for k in 1..16 do
{
        const f = z[k].f;
        const m = z[k].m;

        for i in 1..f do
        {
                for j in 1..m do
                {
                for j in 1..m do
                {
                        sum2 += z[k].v[i, j];
                }
        }
}
writeln(sum1);
writeln(sum2);

m.chpl:17: error: halt reached - assigning between arrays of different shapes in dimension 0: 0 vs. 11

Apologies if my question was too vague and jumbled. Hopefully, I have at least pinpointed my first hurdle.

I am guessing I need to have a special init() routine but that is outside my skill set right now.

Thanks for your help.

Hello Damian,

Sorry I did not pick up on that aspect. Here is an adjustment to your code that works:

record chunk
{
        var f : uint(64);
        var m : uint(64);
        var v : [1..f, 1..m] real(32);
};
var sum1 = 0.0;
var sum2 = 0.0;

var z = for kk in 1..16 do createChunk(kk-1);

proc createChunk(k1) {
  const f = k1 / 4 + 11;
  const m = k1 % 4 + 5;
  var zkk = new chunk(f:uint, m:uint);

  // your code here, replacing 'z[kk]' with 'zkk'
  for i in 1..f do
  {
          for j in 1..m do
          {
                  zkk.v[i, j] = (i + j):real(32);
                  sum1 += (i + j):real(32);
          }
  }

  return zkk;
}

// your code with no changes
for k in 1..16 do
{
        const f = z[k].f;
        const m = z[k].m;

        for i in 1..f do
        {
                for j in 1..m do
                {
                        sum2 += z[k].v[i, j];
                }
        }
}
writeln(sum1);
writeln(sum2);

To explain, once you have created an instance of chunk, the dimensions of its v are baked in and cannot be changed, even when its f and m change. [We could make the dimensions changeable, however it is not necessary here.]

var z : [1..16] chunk; creates 16 instances of chunk with f==m==0, so their v are "stuck" as arrays with 0*0 elements. By contrast, new chunk(11,5) creates a v that is stuck at 11*5 elements. Then, the loop expression for kk in a..b do new chunk(...) converts to an array [a..b] chunk with each v sized according to what is passed to the new.

Stylistically, if you adopt the above approach, I recommend declaring chunk's fields f and m as const, to make it obvious that v is not resizable. I'd also make f and m ints, however it's more a matter of taste.

Please let us know your next hurdles.

Vass

Woo-hoo! That got me over the hurdle, or more like brain-blockage.

record chunk
{
        var f : uint(32);
        var m : uint(32);
        var v : [1..f, 1..m] real(32);
};
record plan
{
        const n : uint(32);
        const f : [1..n] uint(32);
        const m : [1..n] uint(32);
        var z : [1..n] chunk;
};
proc createChunk(f : uint(32), m : uint(32))
{
        return new chunk(f, m);
}
const R = 1..5:uint(32);
var f : [R] uint(32) = [ 4, 4, 5, 7, 11 ] : uint(32);
var m : [R] uint(32) = [ 10, 13, 20, 15, 12 ] : uint(32);
var p = new plan(5:uint(32), f, m, for i in R do createChunk(f[i], m[i]));

All working.

My question is that in that last line with new, how many times is the array space for that array of _chunk_s that form z within the record p allocated. Only once? Or once during the implicit call to init() and a second time when init() allocates the space for the array within p to which it then copies the space passed to in it the argument list of the init() routine?

Or am I totally confused?

Somebody far more knowledgeable than I should probably take my case here, or something like it, and insert it as an example to at least the primer if not the primary documentation. The context is the computation plan for a Discrete Fast Fourier Transform for N points, where N is an arbitrary number which hopefully N factorises nicely. Note that N will never exceed the 2^30 so I use a uint(32) to make sure nobody is doing something silly.

Thanks

When p, the plan, goes out of scope, will all those chunk s get deallocated too of do they have be deallocated/deleted?

Thanks.

When p, the plan, goes out of scope, will all those chunk s get deallocated too of do they have be deallocated/deleted?

They will be deallocated automatically. The only case in Chapel where you need to delete something manually is if it is an unmanaged class.

take my case here, or something like it, and insert it as an example to at least the primer if not the primary documentation

Are you thinking about the size of the array that is frozen upon allocation, as in your initial attempt? Or about how many times the array space is allocated / copying is performed?

I am looking for an answer to this last question meanwhile.

Vass

Thanks for that feedback.

Armed with the wisdom you have imparted to me, I can think about correct nomenclature. I would probably define these global scope data structures as

record fftplan // computation plan for a Fast Fourier Transform
{
    record TWFS // tw[iddle] f[actor]s - blame Cooley and Tukey for those words
    {
        const f : uint(32); // f[actor]
        const m : uint(32); // m[ultiplicity] (for want of a better word
        var v : [1..f, 1..m] real(32); // values of the twiddle factors
    };
    const n : uint(32);
    const uf : [1..n] uint(32);
    const ruf : [1..n] uint(32);
    var tw : [1..n] TWFS;
}

I cannot refer to the type of those Twiddle Factors as fftplan.TWFS (there is a dot in there) as in

new fftplan.TWFS(f, m); // was new chunk(f, m)

as I get an error message

invalid use of 'new'

Is TW not a genuine type or is there some different syntax or can I not refer to a type defined in a record or do I need a custom init() method for TW or ....

Here is the documentation on nested classes: https://chapel-lang.org/docs/main/language/spec/classes.html#nested-classes and the same for nested records: Records — Chapel Documentation 1.33

Here is how I interpret it:

record fftplan
{
    record TWFS { .... }
    .....

    proc      createTWFS_1(f:uint(32),m:uint(32)) do return new TWFS(f,m);
    proc type createTWFS_2(f:uint(32),m:uint(32)) do return new TWFS(f,m);
}

// createTWFS_1() is an instance method
var fp = new fftplan();
writeln(fp.createTWFS_1(5,5));

// createTWFS_2() is a type method
writeln(fftplan.createTWFS_2(6,6));

Vass

Cool. This works

record cfftPlan
{
    record TWFS
    {
        var f : uint(32);
        var m : uint(32);
        var v : [1..f, 1..m] real(32);
    };
    const n : uint(32);
    var tw : [1..n] TWFS;
    proc type newTWFS(_f : uint(32), _m : uint(32)) do return new TWFS(_f, _m);
};
const R = 1..5:uint(32);
var f : [R] uint(32) = [ 4, 4, 5, 7, 11 ] : uint(32);
var m : [R] uint(32) = [ 10, 13, 20, 15, 12 ] : uint(32);
var p = new cfftPlan(5, for i in R do cfftPlan.newTWFS(f[i], m[i]));

It looks like any type defined in the cfftPlan cannot be exported outside of it. Am I correct? I need to go and read that documentation more thoroughly.

Thanks heaps.

In this line

var p = new cfftPlan(5, for i in R do cfftPlan.newTWFS(f[i], m[i]));

I assume that the storage is allocated at invocation time, I assume an array of 5 pointer to the arrays created within newTWFS().

Does that array this get copied internally to the array tw[..] defined within the cfftPlan itself. I am curious about the duplication of space and for how long that duplication, if any, exists?

Thanks.

In that last line with new, how many times is the array space for that array of chunks that form z within the record p allocated?
var p = new plan(5:uint(32), f, m, for i in R do createChunk(f[i], m[i]));

Currently, just once. The for-expression is converted to an array, then the array is passed into the initializer for plan, which transfers the ownership to that array into the z field of the newly-created record instance.

Unfortunately this behavior is due to a bug [bug: array size mismatch with loop expression · Issue #23544 · chapel-lang/chapel · GitHub]. When we fix the bug, the initializer will copy its arg into the field instead of transferring the ownership. So there will be two arrays instead of one. To keep it to just one array, make z a generic field like var z;. Alternatively, put the for-expression into the initializer. For example:

record chunk
{
        var f : uint(32);
        var m : uint(32);
        var v : [1..f, 1..m] real(32);
};
record plan
{
        const n : uint(32);
        const f : [1..n] uint(32);
        const m : [1..n] uint(32);
        var z : [1..n] chunk;

        proc init(n: uint(32), in f, in m) {
          this.n = n;
          this.f = f;  // this is a copy
          this.m = m;
          this.z = for i in 1..n do createChunk(f[i], m[i]);
          // Also works: 'new chunk(...)' instead of 'createChunk(...)'
        }
};
proc createChunk(f : uint(32), m : uint(32))
{
        return new chunk(f, m);
}
const R = 1..5:uint(32);
var f : [R] uint(32) = [ 4, 4, 5, 7, 11 ] : uint(32);
var m : [R] uint(32) = [ 10, 13, 20, 15, 12 ] : uint(32);
var p = new plan(5:uint(32), f, m);

The same problem and solution apply to the arrays f and m.

Vass

In this particular case the documentation is scarce so I would rely on empirical evidence instead.

You could export a nested type, for example: type t = cfftPlan.newTWFS(f[i], m[i])).type; (untested). But then, it is easier to make it not nested to begin with.

Vass

Our wires just got crossed, see my reply a few minutes ago, which I wrote in the context of your earlier revision in this thread.

If that duplication exists, I expect the duplicate instances of TWFS and their respective arrays are deallocated once the statement var p = ...; is done executing.

Chapel supports tracing memory allocations and deallocations using the module MemDiagnostics, specifically startVerboseMem() and stopVerboseMem(), if you wanted to drill into it.

Vass

I deliberately do not want a non-nested type from the perspective of (data) hiding the type from those that do not need it. There really only needs to be a single type from a user perspective, what I now call cfftPlan, the plan for how the FFT of some given size will be run.

Thanks for the details about the generic field and the init() example. I had messed with the latter but had bugs. With your above guidance, I will see what I can do to make it a cleaner interface.

Thanks again for all your help.

I will say at this stage that this sort of data structure is far easier to describe, once I learn how to do it than it is in C/C++. And the code looks infinitely more robust and is heaps more readable.

I just did a version with a custom init() and it looks really clean. But then, my knowledge about all this has grown exponentially in the last few days.

I am constantly reminded about how much easier it is to read Chapel code compared to C++ or even straight C. And having a compiler that does array bounds checking is a massive bonus during development.

1 Like

By way of explanation using real world data, if one has 8800 data points to be FFT'd, it factors as

8800 = 2 * 4**2 * 5**2 * 11

i.e. mostly primes except for 4, for which the individual factors (and the correspond m vector)

const f = [ 2, 4, 4, 5, 5, 11 ]:uint(32);
const m = [ 4400, 1100, 275, 55, 1 ]:uint(32);

So, you get an idea of the size of those twiddle factor matrices.

Or if one has 8846 points, it factors (quite poorly) as

8846 =  2 * 4423

i.e. 4423 is prime, for which

const f = [2, 4423]:uint(32);
const m = [4423, 1]:uint(32);

When n=8800. it can exploit tiny computation kernels which exist for factors of 2, 3, 4, and 5.

The n=8846 case can only use the kernel for 2, and its FFT will take a lot longer than the first.

The final version of that cfftPlan will have a custom init() method which will accept a single argument, the number of points.

Hello Damian,

It is very nice to hear your impressions from using Chapel.

Also, your real-world example finally makes me understand your code. Here is a thought: if you are given a poorly-factorable number of points, how can you pad the data to the nearest well-factorable size so that the result of the fft remains the same within an acceptable error? Maybe by wrapping around the missing points? The goal is to make the fft go faster by exploiting tiny kernels.

We can improve readability of your code by adding a domain to TWFS. This way its v can be resized post-initialization. The cost is a constant memory and runtime overhead. You be the judge whether this is an improvement.

record cfftPlan
{
    record TWFS
    {
        var f : uint(32);
        var m : uint(32);
        var d : domain(2, uint(32));
        var v : [d] real(32);
        proc ref setup(f_, m_) {
          f = f_;
          m = m_;
          d = {1..f, 1..m};  // this resizes 'v'
          // fill up 'v'
        }
    };
    const n : uint(32);
    var tw : [1..n] TWFS;

    proc init(np) {
      const (n, fs, ms) = factorNumPoints(np);
      this.n = n;
      this.complete();
      for i in 1..n { tw[i].setup(fs[i], ms[i]); }
    }
};

proc factorNumPoints(np) {
  const n = 5: uint(32);
  var fs, ms: [1..n] uint(32);
  // fill in fs, ms
  return (n, fs, ms);
}

var p = new cfftPlan(8800);

Also as I have been playing with your code I noticed that I have to cast to uint(32) in a few places unless I am very careful. It would simplify coding if n, f, m are straight ints. The additional overhead from this change looks negligible to me. However, uint(32)s work fine, too.

Vass