Linear Time Order Statistics

Posted on Dec 5, 2023

(You can watch the video version of this post on YouTube.)


Now, sorting takes $\Omega(n\log n)$ time, right? I mean, it’s more subtle than that, but even if we stick to a compare-and-swap model, you can do what’s called “select” or “order statistics” in linear time. That’s the problem of finding the element that, if the array were sorted, would be at a given position – the element of rank $i$. As it turns out, you can do that in worst-case linear time, on an unsorted array, without any preprocessing, and without leaving the compare-and-swap model.

The super quick version goes something like this. Consider quicksort: it starts by partitioning the array on a pivot – you put everything-that’s-smaller before everything-that’s-larger. Now quicksort recurses on both parts to sort them, but we’re only looking for the element that’s going to end up at a certain position, so wherever the border of the partition ends up, we know which side to recurse on.
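To make that concrete, here’s a minimal sketch in Python – the name `quickselect` and the random-pivot-plus-Lomuto-partition details are my own choices, not anything canonical:

```python
import random

def quickselect(a, i):
    """Return the element of rank i (0-indexed) of a, i.e. sorted(a)[i]."""
    a = list(a)                      # work on a copy
    lo, hi = 0, len(a) - 1
    while True:
        if lo == hi:
            return a[lo]
        # pick a random pivot and partition: smaller elements go to its left
        p = random.randint(lo, hi)
        a[p], a[hi] = a[hi], a[p]
        pivot = a[hi]
        border = lo
        for j in range(lo, hi):
            if a[j] < pivot:
                a[border], a[j] = a[j], a[border]
                border += 1
        a[border], a[hi] = a[hi], a[border]
        # the pivot now sits at its final sorted position; only one side can
        # contain the element of rank i, so "recurse" on that side alone
        if i == border:
            return a[border]
        elif i < border:
            hi = border - 1
        else:
            lo = border + 1
```

So `quickselect([9, 1, 7, 3, 5], 2)` gives `5`, the element that would land at index 2 if you sorted the array.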

If you take random pivots you expect this to take linear time in total – just like you expect $O(n\log n)$ for quicksort – but you may have noticed that I said worst-case linear time. You can actually guarantee that with a somewhat delicate procedure – the “median of medians” trick – for quickly getting a pivot that’s good enough for the recursion to work out to linear. Look it up if you want to, but I should say that as far as I can tell, it’s not practical.1 If you actually want a particular order statistic, but not the sorted list, in practice you should probably look up “introselect.”
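That said, purely to show the shape of it, here’s a rough sketch of the median-of-medians idea in Python. It builds new lists with comprehensions instead of swapping in place, so it’s illustrative rather than the actual compare-and-swap procedure, and the name `select` is just mine:

```python
def select(a, i):
    """Worst-case O(n) selection: the element of rank i (0-indexed) of a."""
    a = list(a)
    while len(a) > 5:
        # split into groups of 5 and take the median of each group
        medians = [sorted(a[j:j + 5])[len(a[j:j + 5]) // 2]
                   for j in range(0, len(a), 5)]
        # the pivot is the median of those medians, found by recursion;
        # it is guaranteed to have a constant fraction of elements on each side
        pivot = select(medians, len(medians) // 2)
        # partition around the pivot and keep only the side that holds rank i
        less = [x for x in a if x < pivot]
        equal = [x for x in a if x == pivot]
        if i < len(less):
            a = less
        elif i < len(less) + len(equal):
            return pivot
        else:
            i -= len(less) + len(equal)
            a = [x for x in a if x > pivot]
    return sorted(a)[i]
```

The reason this is worst-case linear: the pivot has a constant fraction of the elements on each side, so the running time satisfies roughly $T(n) \le T(n/5) + T(7n/10) + O(n)$, and since $1/5 + 7/10 < 1$ the work shrinks geometrically.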

As for whose algorithm this is: I seemed to remember it was Blum, but I wasn’t sure, so I looked it up, and uh… yeah, it turns out there’s an “et al” that’s doing a lot of heavy lifting – it’s a bit of an all-star band here: Blum, Floyd, Pratt, Rivest and Tarjan.


  1. Also: don’t use this to find a pivot for quicksort. I mean, it would work in terms of big Oh, but it’d be an utterly grotesque way to go about it. ↩︎