Earlier this week, a friend and I had lunch at a restaurant, mere minutes after they'd opened. As their first customers of the day, my order number was #1, and my friend's was #2. My friend, upset at coming in second, then made the entertaining argument that neither of us had come in first, because the restaurant probably used 0-indexing for their orders. Spoiler alert: they didn't.

The conversation brought up the age-old conflict: which is better – 0-indexing or 1-indexing?

Some languages, like Julia and Matlab, are popular in scientific communities, so they probably use 1-indexing to match matrix notation. Other languages, like C and Python, use 0-indexing for the convenience or elegance. I found those backstories particularly fascinating, so today, I'm sharing them.

C

C uses 0-indexing, and for a language that has pointers, this design choice makes sense. The index is meant to be an offset, i.e. array[i] accesses an element that is i elements away from the start of array. Framed this way, array[0] accesses the element that is 0 elements away from the start of array, which is logically sound.

But C wasn't the first language to introduce this concept. C took its inspiration from BCPL, a high-level* language created in 1967 that also allowed programmers access to low-level machine code. BCPL had only one data type – the "word" – and it had a fixed length, usually 16 bits for machines from the 1960s. Since BCPL code could reference memory addresses, programmers would find the i-th word in an array via the calculation start_addr + (i * word_size). And while the math for memory addresses is slightly different from the pointer math used in C, the point remains that 0-indexing was more convenient for both languages because they allowed access to low-level functions.

This was very different from older languages, like Fortran, which used 1-indexing or variable indexing. And after the 1960s, 0-indexing only grew in popularity.

Python

I'd assumed Python's choice to 0-index was inspired by C, but it turns out that creator Guido van Rossum found the pointer argument irrelevant. Instead, he wanted 0-indexing because it enabled elegant list splicing. According to him, a developer should be able to split their array cleanly into two by using the notation array[:k] to access the first k elements of the list and array[k:] to access the remaining elements. He's not wrong. The notation is definitively cleaner without the offset required by 1-indexing, which would use array[:k+1] and array[k+1:] to perform the same splice.

Dijkstra made a similar argument in the 1980s. To represent ranges of natural numbers, Dijkstra argued that the lower bound should be inclusive to avoid awkward references to non-natural numbers – i.e. 0 <= x < 10 is better than -1 < x < 10 – and that the upper bound should be exclusive to avoid awkward representations of empty ranges – i.e. 1 <= x < 1 is better than 1 <= x <= 0. Of course, his argument rests on the assumption that 0 is a natural-ish number, which not all mathematicians agree on, but who's nitpicking here?

Conclusion

The debate over which style of indexing is best will probably never be resolved, but it's interesting to learn about how it all started. Plus, I'm reminded of an old joke that lists the two hard things in computer science: 1) cache invalidation, 2) naming things, and 3) off-by-one errors. Would it still be considered an off-by-one error if the list had been 0-indexed?

*The definition of "high-level" has changed over the years, so both C and BCPL are usually considered low-level languages today.