Sorting in Practice 2: The Sorting Lower Bound
Last time I asked how to sort one million 32-bit integers. That sounds like a fairly concrete question that we should be able to answer, but even in our quick experiment it turned out the contents of the list matter quite a bit. And here we see that there’s a bit of a difference between what you learn in an undergrad algorithms class and how things work out in practice. You may have heard that insertion sort is efficient on nearly-sorted lists. (We did cover it in Würzburg.) But if there are only two things people remember about sorting, it’s probably these:
- Quicksort and mergesort (I’ll count that as one “thing”), and
- the $\Omega( n \log n )$ lower bound.
But wait! Actually there was also bucket sort – wasn’t there? – and radix sort. Weren’t they linear time? Well yes, so what’s up with that? Why do we always say that sorting takes $n \log n$ time when … it doesn’t?
We are going to say $\Theta( n \log n )$ a whole lot, so let’s start calling it linearithmic. Why do we say there is a linearithmic lower bound for sorting if even first-year algorithms classes typically cover the counterexample? This is where we run into machine models – the thing that we usually don’t really talk about when we use pseudocode. The linearithmic lower bound (which is technically correct) is for so-called comparison sort algorithms, where the only thing you can do with list items is copy them around and compare two of them to see which one is larger. And the bound is actually for the number of comparisons, so if you want to talk about time, you need the comparisons to be constant-time – which may not actually be the most reasonable assumption if the length of your list is going to infinity: a list of $n$ distinct items needs items of at least $\log_2 n$ bits each, and bigger items take longer to compare. In that sense, the theorem is optimistic, but I digress.
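For reference, the counting argument behind the bound is short. A comparison sort has to distinguish all $n!$ possible orderings of the input, and each comparison has only two outcomes, so the algorithm is a binary decision tree with at least $n!$ leaves. Its height – the worst-case number of comparisons – is then at least

$$\log_2( n! ) \;=\; \sum_{i=1}^{n} \log_2 i \;\ge\; \frac{n}{2} \log_2 \frac{n}{2} \;=\; \Omega( n \log n ).$$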
But the theorem is pessimistic in the sense that it does not consider the possibility of doing clever things with the items themselves, such as in the case of bucket sort. That is not a comparison-based algorithm, and therefore the lower bound doesn’t apply.
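To make that concrete, here is a minimal sketch (my own illustration, not the version from class) of a least-significant-digit radix sort on non-negative 32-bit integers, where each pass distributes the items into byte-valued buckets. No item is ever compared against another item, and the total work is linear in the length of the list.

```python
def radix_sort_u32(values):
    """LSD radix sort for non-negative 32-bit integers (illustration only).

    Four passes, one per byte; each pass distributes the items into
    256 buckets and concatenates the buckets in order, which keeps
    the sort stable. No item is ever compared against another item.
    """
    for shift in (0, 8, 16, 24):
        buckets = [[] for _ in range(256)]
        for v in values:
            buckets[(v >> shift) & 0xFF].append(v)
        values = [v for bucket in buckets for v in bucket]
    return values


print(radix_sort_u32([3_000_000_000, 7, 42, 7, 2**31]))
# [7, 7, 42, 2147483648, 3000000000]
```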
We have a theorem, but it’s based on a model that’s not quite realistic. This can still be very useful in its own way – a way in which somewhat-unrealistic but “technically correct” theorems can often be useful: they give a list of conditions from which the conclusion follows. So if you want to escape the conclusion (in this case: if you want to sort with fewer than linearithmically many comparisons) you now have a list of ways to cheat. You must cheat on at least one of the conditions.
Conversely, if you prove a theorem, your proof should use all conditions of the theorem. When you are doing homework, and you do not use all the conditions, it probably means that your proof is wrong. If you’re doing research you might actually be able to generalise your theorem! If you don’t use a certain condition, but your proof is correct, then the condition shouldn’t be in the theorem.
So is the sorting lower bound relevant in practice? Maybe not technically as a lower bound, but most practical sorting algorithms are comparison-based and have linearithmic runtime, so I’ll call it good to know about.
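If you want to see that in action, here is a small experiment sketch (the setup is mine, not from the lecture): wrap Python’s built-in comparison-based sort around a comparator that counts its own calls, and compare the count against $\log_2( n! )$.

```python
# Illustration: count the comparisons made by a comparison-based sort.
import functools
import math
import random

comparisons = 0

def counting_cmp(a, b):
    """Three-way comparator that counts how often it is called."""
    global comparisons
    comparisons += 1
    return (a > b) - (a < b)

n = 100_000
data = [random.getrandbits(32) for _ in range(n)]
sorted(data, key=functools.cmp_to_key(counting_cmp))

print(f"comparisons used:     {comparisons:,}")
print(f"lower bound log2(n!): {math.lgamma(n + 1) / math.log(2):,.0f}")
```

On random input, the count should land within a small constant factor of the bound, which is the sense in which the lower bound is good to know about.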