Sorting in Practice 1: Trying the standard library
At Uni Würzburg I was sometimes involved with the first-year algorithms and data structures exam. We usually had a question where we’d make you pick the appropriate sorting algorithm for a particular situation. And it was always kind of hard to come up with the exact scenarios, because the theory just isn’t very granular. Here: let’s try it. This is from an interview that the future president of the United States of America, Barack Obama, gave at Google.
Eric Schmidt: What is the most efficient way to sort one million 32-bit integers?
Barack Obama: I think the bubble sort would be the wrong way to go.
Fair enough, that’s decent advice in general, I suppose – but how would you do it? How do you sort this data?
Let’s think about this. We’ll assume we have a modern desktop PC, the data is already in an array in main memory and we’re only going to do this once. A million 32-bit integers. First of all, a million is not that much – at 4 bytes apiece, that’s what, 4 megabytes of data? Depending a little bit on the context, this might not be challenging at all on modern hardware. Let’s actually try this real quick and see how a standard library does, because in reality the answer is usually “just use whatever is built in.” We want to know how that works, of course, but let’s get a baseline.
Here’s Python. Let’s get a list of a million random 32-bit integers and see how long it takes to sort this; we’ll run it just once.
from random import getrandbits
xs = [ getrandbits(32) for _ in range(1000000) ]
from timeit import timeit
timeit( "sorted(xs)", globals=globals(), number=1 )
Place your bets now. How long does this take? I’ll try twice.
0.4695933839999995
0.4615071949999958
That actually took some time: half a second! But this way I’m getting a whole new list back, since sorted builds a new one. Let’s sort in place instead. Maybe that’s faster.
timeit( "xs.sort()", globals=globals(), number=1 )
0.4582117430000068
No, it isn’t really faster. But now the list is sorted. Do you think that will matter? If we try this again, do you think the timing will be different? Think about it. Should it be different?
0.0602326850000256
Yes! This is actually much faster! So the actual data matters quite a bit, as we see. A completely random list takes half a second to sort here. If that’s the only thing you’re doing, that’s not so bad – I can wait half a second – but if you need to do this all the time, as part of some other algorithm, or for every request a server handles, that could be bad.
You know what, let’s get a second opinion. Here’s C++. I’m going to want a million numbers, I’ll put them in the vector xs, and I know how many numbers I’m going to put in there, so let’s actually say that in advance – there is a bunch more typing involved in C++ compared to Python.
#include <random>
#include <vector>
#include <algorithm>
#include <limits>
using namespace std;
using T = uint32_t;
int main() {
    random_device rd;
    mt19937 gen(rd());                 // Mersenne Twister, seeded from the random device
    uniform_int_distribution<T> rand(  // the distribution needs <T>; the default int can't hold the 32-bit maximum
        numeric_limits<T>::min(),
        numeric_limits<T>::max() );
    int n = 1000000;
    vector<T> xs;
    xs.reserve(n);                     // we know the size in advance
    for( int i=0; i<n; ++i ) {
        xs.push_back( rand(gen) );
    }
}
We didn’t actually measure how long it took to generate the list in Python, but it looked like it was pretty quick. Let’s confirm that. Compile with modern C++, optimisations on.
> clang++ -std=c++2a -O2 sort.cpp
> time ./a.out
./a.out 0.01s user 0.00s system 4% cpu 0.399 total
Just making the list takes pretty much no time: that’s good. Now let’s sort it.
sort( xs.begin(), xs.end() );  // added at the end of main()
> time ./a.out
0.10s user 0.00s system 27% cpu 0.374 total
OK, that’s faster than Python, but not completely different: a tenth of a second. That’s still a measurable amount of time; it’s something that could add up. Oh, one more thing, let’s see if this also gets faster if the list is already sorted. Let’s sort it … eh, a hundred times!
for( int i=0; i<100; ++i ) {
    sort( xs.begin(), xs.end() );
}
> time ./a.out
./a.out 0.17s user 0.00s system 49% cpu 0.355 total
Wow, that only takes 0.17 seconds. In less than twice the time, we can sort this list a hundred times!
I mean, that’s not really what’s going on, right? We call sort a hundred times. The list will already be sorted after the first call, so this is a bit of a weird experiment. In less than twice the time of sorting the list, we can sort the list once, and then – while the memory is probably fairly hot – “sort” that sorted list 99 more times.
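If we really wanted to time a hundred sorts of random data, a fairer version of the experiment – just a sketch, not what the snippet above does – could keep an unsorted copy around and restore it before each sort:
// Sketch of a fairer experiment: restore the unsorted data before every
// sort, so each iteration starts from the same random input.
vector<T> original = xs;            // take this copy while xs is still unsorted
for( int i=0; i<100; ++i ) {
    xs = original;                  // copy the random data back
    sort( xs.begin(), xs.end() );   // now every iteration does real sorting work
}
Of course, now the copying itself is part of what we measure, so it still isn’t a perfectly clean experiment – benchmarking is like that.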
You have to be a bit careful that you’re actually measuring what you’re trying to measure, and that the measurement makes sense. And this entire thing wasn’t to say that C++ is a better programming language or something: the Python code was certainly much easier to write in this case. But your language choice can make a big difference!
You might worry a little bit that maybe the compiler knew that sorting a list makes it sorted, and that sorting a sorted list doesn’t do anything – but I checked the generated code, and that doesn’t seem to be the case. It looks like it really is calling sort a hundred times. This sort function is just really good at not wasting time on sorted lists.
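If you want to check this yourself, one way to do it – with clang, at least – is to ask the compiler for the assembly and read through it:
> clang++ -std=c++2a -O2 -S sort.cpp
That writes the generated code to sort.s.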
So we did some ad hoc experiments to get a feel for the situation. And we already ran into some complications! It made a huge difference whether the data was already sorted. Indeed, bubble sort is efficient on nearly-sorted lists! And so is, for example, insertion sort. If you look at the code generated in our small example, you’ll see heap sort, something called introsort – more on that in a second – and hey! Insertion sort. Well, would you look at that. That could actually explain why sorting the sorted list was so fast. This disassembly is a bit much to go into here, but it looks like it uses various sorting algorithms in combination!
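To get a feeling for why an already-sorted input can be nearly free, here’s a plain textbook insertion sort – purely an illustration, not the standard library’s actual code, and assuming the same using T = uint32_t as above. On sorted data the inner loop never runs, so every element costs just one comparison:
void insertion_sort( vector<T>& v ) {
    for( size_t i = 1; i < v.size(); ++i ) {
        T x = v[i];
        size_t j = i;
        // Shift larger elements one slot to the right. On already-sorted
        // data this condition is false immediately, every single time.
        while( j > 0 && v[j-1] > x ) {
            v[j] = v[j-1];
            --j;
        }
        v[j] = x;
    }
}
Implementations along the lines of introsort typically fall back to insertion sort for small or nearly-sorted subranges, which fits with what shows up in the disassembly.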