16888, “ahysing”, “Efficient sorting of arrays over sparse domains”, “2020-12-20T16:34:52Z”
I am looking for an alternative for getting the bottom-most value in a large dataset. Usually a min-heap would be the perfect alternative. Chapel has one of those in module heap.
However Heap seems to be living in a single locale. When the data set is big enough the heap data structure will outgrow the size of memory in one locale, and eventually crash.
Sparse domains and arrays seems to overcome the data set size issue.
When arrays are large, copying them into a single locale is not feasable. That is why I want to sort
over large distributed arrays. When sorting sparse arrays sort enumrates the dense domain, and not the sparse domain. (See example below).
One suggests to sort only the sparse indecies when sorting a sparce domain. This will result in siginificantly better performance. Implicitly Represented Values for sparse arrays are equal, and can all be all be left in the same position when sorting. If implemented this way wort should run in ** N x log(N) ** time where N is the number of explicit array values with Implicitly Represented Values left out.
Concider this a feature request.
Summary of Problem
sort over an array with with 2**40 possible indices, and two actual values, should finish in less than a second.
When running this use case through chapel the resulting code spends hours , or days, to finish.
Steps to Reproduce
use Sort;
proc main() {
var start : int(64) = 0;
var end : int(64) = 1 << 40;
var ALL : domain(1) = {start..end};
var D : sparse subdomain(ALL);
D.add(0);
D.add(10);
var values : [D] real;
values.IRV = 5.0;
values[0] = 0.0;
values[10] = 10.0;
sort(values);
}