Comparing Standard Library Sorts: The Impact of Parallelism

This topic is a place to discuss the blog post on sorting available here: Comparing Standard Library Sorts: The Impact of Parallelism

1 Like

mo8it offered up an implementation of the sorting benchmark in Rust with Rayon. This is an interesting comparison point but doesn't fit into the blog post itself because it's not using the standard library sort.

I used the benchmark provided and ran it on the benchmark machine. I observed performance of 319-358 million elements sorted per second, depending on how many threads are used (where the best performance I saw was using 16 threads, but it seems to be necessary to include that in the source code?).

The Rust+Rayon benchmark is here:


Should we tweak the name of Sort.QuickSort.quickSort to reflect that the routine is not Hoare's original QuickSort. Should the documentation say

This function currently either uses a parallel radix sort
or a vastly improved serial quick sort. 

and quote Bentley and McIlroy as you mention in the Sort.chpl source code.

Should it also mention the term ascending or descending in its description?

I stopped using Hoare's QuickSort decades ago and am always suspicious of the name. I used Scowen's QuickerSort for years.

FWIW, the quickSort isn't serial either. It's just not as effective at using parallel hardware as the radix sort. In other words, it has limited parallelism. I believe it is an error in the documentation to say proc sort will use a serial quick sort.

I wouldn't change the name of it from quickSort since I wouldn't assume a "quick sort" is Hoare's original sorting code. But either way, I'm expecting that when we stabilize the Sort module, proc sort will be the main interface, and things like quickSort might be deprecated or remain unstable.

But, regarding the proc sort documentation, adjusting it to call it an "improved quick sort" or something like that seems reasonable.