Algorithms in Practice 1: Theory & Practice

Posted on Nov 10, 2023

(You can watch the video version of this post on YouTube.)


I would like to take you on a bit of a journey. We will start out by looking at some very high-level stuff – philosophy of science, if you want to call it that. We don’t usually talk about that in algorithms classes, because we take it for granted. But if we want to talk about algorithms in practice, we kind of need to talk about what we mean when we talk about algorithms in theory. We will spend most of the time down in the dirt, looking at things we usually ignore; things we sweep under the carpet. Well, now I’m mixing metaphors, but you see what I mean. And that’s what makes algorithms so interesting to me. There is beautiful maths and elegant abstractions, but if you care to look, it also has a real nitty-gritty hands-on engineering aspect to it. Let’s do this.

“In theory, there is no difference between theory and practice. But, in practice, there is.” 1

To kick us off, I’m going to say some things that are quite critical of the way we usually analyse algorithms academically. I will question, a little bit, how we think about algorithms: what we’re even trying to do when we analyse them. That’s not because I think what we usually do is bad: it’s not. I’d just like us all to think a little bit about our basic assumptions – about our theoretical framework – because it’s probably been a while since you’ve really thought about this. Why we do the things we do, and why we do them the way that we do.


When you first learned about algorithms and data structures, you probably learned about pseudocode, and you started out counting “basic operations” – though it was probably never made entirely clear what those are, exactly, and why. Don’t worry about it: big-Oh notation shows up soon enough, and then we don’t have to think about it. These abstractions are good. They allow us to get things done – and useful things have most certainly been done! But we didn’t have to do it this way.

Nothing about algorithms, data structures, computation – nothing made us use asymptotic, big-Oh analysis. Well, your teachers made you do it, and they were right to, but I think you probably weren’t really in a position at the time to appreciate exactly what was going on, scientifically speaking – to see what an enormous decision it is to do things this way: to care about these things specifically, and not others. It was all new, and you were kind of along for the ride. So I thought it might be good to revisit some of the basic assumptions. Not because they’re bad and you need different ones, but because it’s good to be aware of them and how they influence the relation between theory and practice.

The legendary Donald Knuth, for example, in his book series The Art of Computer Programming (note that he calls it the art of computer programming, not algorithms, even though it is definitely a book about algorithms) has an explicit machine model. He gives actual bounds on the number of instructions performed. Of merge sort he says, for example:

“Thus, about $12.5$ units are spent on each record in each pass, and the total running time will be asymptotically $12.5\,N \log N$, for both the average and the worst case. This is slower than quicksort’s average time, and it may not be enough better than heapsort to justify taking twice as much memory space, since the asymptotic running time of Program such-and-so is never more than $18\,N \log N$.”

That’s an amount of detail that you don’t usually see – we would just call all of that $O(n \log n)$. We will have a quick look at sorting later, and if you’re going to improve sorting, you’re probably going to have to look at more detail than is usual.
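If you want to poke at this kind of bookkeeping yourself, here is a minimal Python sketch – my own, just for illustration; Knuth counts actual machine instructions, not comparisons – of a merge sort that tallies element comparisons:

```python
import random


def merge_sort(a, counter):
    """Sort a list and count element comparisons – a crude stand-in
    for Knuth-style instruction counting."""
    if len(a) <= 1:
        return a
    mid = len(a) // 2
    left = merge_sort(a[:mid], counter)
    right = merge_sort(a[mid:], counter)
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        counter["comparisons"] += 1
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


# Roughly n * log2(n) comparisons for random input:
counter = {"comparisons": 0}
data = [random.random() for _ in range(1024)]
merge_sort(data, counter)
print(counter["comparisons"])  # a bit under 1024 * 10 = 10240
```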


Interpreting theorems about algorithms can actually be a little bit tricky. These are mathematical claims. They are technically correct – which is, as they say, the best kind of correct. But: they are therefore correct only in a very specific way and sometimes that way is not – technically – what we actually care about.

Sometimes we don’t even really think about the theorem quite in the right way. For example: asymptotic runtime bounds with big-Oh notation. I think about those as saying something about how fast an algorithm is. That’s a common interpretation and it generally holds up pretty well. But technically, these bounds are existential: they state only that there exist some threshold $n_0$ and some constant factor $c$ – and nothing stops either of them from being enormous. That’s what we actually mean by “fast” when we use big-Oh analysis (which is all the time). There is no claim being made about any finite instance. None whatsoever: just make $n_0$ larger or make the hidden constant larger. Yet your computer is finite. Asymptotic big-Oh runtime bounds – as stated, technically – have no bearing on reality. Calling an algorithm “fast” on this basis … well, as it turns out, that usually works out quite well. But it didn’t have to. That’s really quite interesting!
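Just to spell out what that existential claim looks like, here is the definition we are implicitly leaning on – note that it only asserts that some $c$ and some $n_0$ exist, and puts no limit on how large either can be:

$$ f(n) \in O(g(n)) \quad\Longleftrightarrow\quad \exists\, c > 0 \;\; \exists\, n_0 \;\; \forall\, n \ge n_0: \;\; f(n) \le c \cdot g(n). $$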

This can actually be a pretty useful trick when you want to prove bounds. For the asymptotic analysis, you can just ignore all instances up to a constant size, since you can do them in constant time: $f(k)$ is a constant for constant $k$. Does your elegant algorithm have difficulty handling a couple of annoying small exceptions? Doesn’t matter: there exists an algorithm that’s just your elegant one, except it has all the special cases hardcoded. For the same reason, you seldom have to care about the base case of a divide-and-conquer algorithm – as long as you’re not actually going to implement it. Just make sure you divide down into subproblems of size bounded by a constant, and you’re good.
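As a small illustration (the cutoff $n_0$ below is whatever constant size your hardcoded special cases need), the usual mergesort-style recurrence does not care at all what happens below the cutoff:

$$ T(n) = \begin{cases} O(1) & \text{if } n \le n_0,\\ 2\,T(n/2) + O(n) & \text{otherwise,} \end{cases} \qquad\text{which gives}\qquad T(n) = O(n \log n), $$

no matter how large the constant hiding in that base case happens to be.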

OK, so big-Oh asymptotic runtime bounds are … meaningless? Well, no: they’re just not about your actual computer. Not technically.

I could claim that anything that I can ever run on my actual machine has runtime $O(1)$, because – and I’ll ignore streaming algorithms here – it can only handle instances of a constant size. That’s not really a useful claim though. Yeah, it’s all $O(1)$. Congratulations, you broke all of the good ideas. Because here’s the thing: we have lots of good algorithms! Saying “ha, everything is $O(1)$” is not useful. It doesn’t get you anywhere; I don’t recommend it.

You should definitely keep using asymptotic analysis and big-Oh notation. We have lots of great theory! But now your theory is about these infinite things, and that’s what I wanted to remind you of, because it’s probably not something that you think about a lot: how far away these theorems are from saying anything about what anybody does on a real machine. Technically. But it works. That’s amazing!


This may be apocryphal, but the story goes that Dijkstra had a hard time getting his shortest paths algorithm published. The paper in which he gives the algorithm you know – the one that efficiently calculates shortest paths in a graph with nonnegative edge weights – is called A Note on Two Problems in Connexion with Graphs. (The first one is minimum spanning tree, though he doesn’t use the word; and he adds: “a tree is a graph with one and only one path between every two nodes.”) This is a super important early paper, but as I said: apparently it was hard to get published. What argument could the reviewers possibly have had for rejecting it? Well, quite simply, the argument was: “so what?!” The shortest path problem is clearly computable. Here, I’ll prove it for you.

Proof. There exists a shortest path that does not contain loops: those would just make the path longer. (Here, we must assume that the graph does not contain negative cycles, but so does Dijkstra.) So there exists a shortest path that is simple. The set of simple paths in a finite graph is of course finite, so you can just try them all and see which one is the shortest. Q.E.D.
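Just to make the “so what?!” concrete, here is a literal, exponential-time transcription of that proof as code – a minimal sketch with a graph format and names of my own choosing, not anything from Dijkstra’s paper:

```python
def brute_force_shortest_path(graph, source, target):
    """The 'proof' as an algorithm: enumerate every simple path from
    source to target and keep the cheapest one.  Exponential time.
    `graph` maps a node to a dict {neighbour: nonnegative weight}."""
    best_cost, best_path = float("inf"), None

    def extend(node, visited, cost, path):
        nonlocal best_cost, best_path
        if node == target:
            if cost < best_cost:
                best_cost, best_path = cost, path
            return
        for neighbour, weight in graph.get(node, {}).items():
            if neighbour not in visited:
                extend(neighbour, visited | {neighbour},
                       cost + weight, path + [neighbour])

    extend(source, {source}, 0, [source])
    return best_cost, best_path


# Tiny example: the detour a -> b -> c beats the direct edge a -> c.
g = {"a": {"b": 1, "c": 5}, "b": {"c": 1}, "c": {}}
print(brute_force_shortest_path(g, "a", "c"))  # (2, ['a', 'b', 'c'])
```

It is obviously correct, and obviously hopeless on anything but tiny graphs – which is exactly what Dijkstra’s algorithm improves on.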

Who needs a complicated way to do it? And remember, this is 1959 when it gets published. This is Numerische Mathematik, issue one. This is before P versus NP, by quite a bit. (Actually, it is before polynomial time had even been established as the notion of “efficient”.) Do you know what runtime bound he gives in the paper? This is too much fun:

“The solution given above is to be preferred to the solution by L. R. Ford as described by C. Berge, for, irrespective of the number of branches” – he means edges – “we need not store the data for all branches simultaneously but only those for the branches in sets I and II, and this number is always less than $n$.”

What he’s saying here is that the priority queue, as we would implement it today, only ever contains fewer than $n$ elements. “Furthermore, the amount of work to be done seems to be considerably less.” We’ve come a long way since then.
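For contrast, here is roughly what “as we would implement it today” means – a minimal, textbook heap-based sketch of my own, not Dijkstra’s original formulation. (Note that this lazy-deletion variant can temporarily hold more than $n$ heap entries, whereas Dijkstra’s bookkeeping with sets I and II stays below $n$.)

```python
import heapq


def dijkstra(graph, source):
    """Textbook heap-based Dijkstra for nonnegative edge weights.
    `graph` maps a node to a dict {neighbour: weight}."""
    dist = {source: 0}
    done = set()
    heap = [(0, source)]  # (tentative distance, node)
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue  # stale entry left behind by lazy deletion
        done.add(u)
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if v not in done and nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


g = {"a": {"b": 1, "c": 5}, "b": {"c": 1}, "c": {}}
print(dijkstra(g, "a"))  # {'a': 0, 'b': 1, 'c': 2}
```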


  1. Variously attributed to Albert Einstein, Jan van de Snepscheut, Yogi Berra, and others. ↩︎