Allocating a nested array of arrays

Very naughty. Certainly for geophysical processing of airborne surveys, that would not fly. There might be other areas of application where that works, but not for those of the people with whom I work. An existing C++ FFT application has worked perfectly well for 25+ years without any need to pad the data to be FFT'd. There is a more recent algorithm by Bluestein, about 20 years old, which does something like what you are suggesting, but that only works for real data. So we won't be going down the padding road.

I use uint(32) to make sure things are robust right up to the uint(32) limit, which is the absolute maximum. For most practical purposes today, I should probably have made it uint(16), but I am guessing that at some stage over the next 25+ years it will need to handle slightly more than that number of points, maybe as many as 2^24. That said, I have done all my testing of the factorization stuff ensuring that it works up to 2^32 (if I declare things as uint(64)). I like strong typing; yes, it gets a bit painful in Chapel, but it does help show up potential issues.

I will probably change to ints because the major data storage overhead is the twiddle factors. However, my use of uint(32) has highlighted other Chapel issues/bugs like #22533 and #23302, as you might have seen on GitHub, so I might keep them for the moment.

I like the idea of that domain, but I do not quite understand it yet. I need to read up on it a bit more.

Note that I would not normally populate the twiddle factors, i.e. the v elements of TWFS, until after I have created the entire tw array of TWFS records. There are lots of computations there, and the ideal way to compute them is to use an FMA instruction - see #23251.
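
To make that concrete, here is a stripped-down sketch of the kind of two-phase approach I mean. The record layout, the sizes, and the textbook twiddle formula below are simplified stand-ins, not the real code (which would use the FMA-based computation):

use Math;

record TWFS {
  var d: domain(1);     // per-record index set for the twiddles
  var v: [d] complex;   // the nested array of twiddle factors
}

config const np = 8925; // number of points (made up for this sketch)
const nBlocks = 4;      // number of TWFS records (also made up)

// Phase 1: create the entire tw array of (still empty) TWFS records.
var tw: [1..nBlocks] TWFS;
for b in 1..nBlocks do
  tw[b].d = {1..np};    // sizing each record's domain resizes its v array

// Phase 2: populate the v elements afterwards, in parallel.
forall b in 1..nBlocks do
  forall j in tw[b].d {
    const theta = 2.0 * pi * (j - 1) * (b - 1) / np;
    tw[b].v[j] = cos(theta) + sin(theta) * 1.0i;  // placeholder for the FMA version
  }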

Interesting! Your explanations make sense. Re domains, in the following code:

var d: domain(2);   // a 2-D domain, initially empty
var a: [d] real;    // an array declared over that domain
d = ....;           // assign a new index set to d

assigning to d resizes a (padding it with zeros and/or dropping existing elements, as needed) so its index set always matches d.
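
For example, here is a small self-contained version of that pattern (the sizes are arbitrary):

var d: domain(2) = {1..2, 1..2};
var a: [d] real;
a = 1.0;           // a is 2 x 2, all ones
writeln(a);

d = {1..3, 1..3};  // grow: existing values are kept, new elements default to 0.0
writeln(a);

d = {1..1, 1..2};  // shrink: elements outside the new index set are dropped
writeln(a);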

Just for fun, how about dropping some points instead of padding? Empirically, for np in 8800..9400, I'd need to drop at most 0.4% of the points, often much less, to get to a number with a max prime factor of <= 23 (for example, 8959 --> 8925). Instead of one very slow FFT on a full survey with np=8837, I'd run several super-fast FFTs on its subsets with np=8835, dropping different pairs of points, then average the results. I can even fantasize about an adaptive approach that starts with an aggressive drop size for the smallest factors / fastest FFTs, then backs off to dropping fewer points if the answers that the individual FFTs return differ too much.
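
As a rough sketch of the search for such a length (the proc names and the <= 23 cutoff are just my choices for the example; with the defaults below it reproduces the 8959 --> 8925 case):

proc maxPrimeFactor(in n: int): int {
  var largest = 1;
  var p = 2;
  while p * p <= n {
    while n % p == 0 {
      largest = p;
      n /= p;
    }
    p += 1;
  }
  if n > 1 then largest = n;  // whatever remains is itself prime
  return largest;
}

// largest length m <= np whose biggest prime factor is <= maxFactor
proc bestLength(np: int, maxFactor = 23): int {
  var m = np;
  while maxPrimeFactor(m) > maxFactor do
    m -= 1;
  return m;
}

config const np = 8959;
const m = bestLength(np);
writeln("np = ", np, ", nearest smooth length = ", m,
        " (drop ", np - m, " points, ", 100.0 * (np - m) / np, "%)");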

Of course if you are happy with the speed of even your slowest FFT, then all this is irrelevant.

Everybody likes speed.

But my job is to have a robust working FFT. At this stage that means lobbying for a working FMA, getting the data structures right, and debugging things. I also need to make sure that we work with arrays indexed from 1 so that the implementation agrees with the way the mathematics is written.
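
In Chapel that is as simple as declaring the arrays over a 1-based range, e.g. (N is just a placeholder size here):

config const N = 1024;      // placeholder problem size
var x: [1..N] complex;      // x[1] .. x[N], matching x_1 .. x_N in the maths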

Trying to come up with an adaptive FFT is not on my radar.

Messing around with changing the number of sample points to get a better factorization is somebody else's job. But with Chapel, we know that we can run those samples in parallel easily.
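
Something along the lines of the sketch below is all it would take. Here runFFT is only a placeholder for whatever FFT ends up being used, and the contiguous subsets are just to keep the example short:

proc runFFT(x: [] complex) {
  return x;                      // stand-in for the actual transform
}

config const np = 8837,          // full survey length, from the example above
             m  = 8835;          // the better-factorizing subset length
const nSubsets = np - m + 1;     // one subset per starting offset, say

var samples: [1..np] complex;    // the survey data would be read into this
var spectra: [1..nSubsets][1..m] complex;

// each subset's FFT is independent, so the outer loop parallelizes trivially
forall s in 1..nSubsets do
  spectra[s] = runFFT(samples[s..#m]);

// combine the individual results, e.g. by averaging
var avg: [1..m] complex;
for s in 1..nSubsets do
  avg += spectra[s];
avg /= nSubsets: complex;        // cast keeps the element-wise division complex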

Thanks for all your help.

Fair enough.

Keep those questions coming!