How can you check if an integer is prime? Here's a naive function that does the job:

def is_prime(x):
    factor = 2
    while factor ** 2 <= x:
        if x % factor == 0:
            return False
        else:
            factor += 1
    return True

This algorithm is slow when testing the primality of very large numbers, but at least it's easy to comprehend. With minor tweaking, it can easily be optimized to have a faster runtime by reducing how many numbers between 2 and sqrt(x) are tested as potential factors of x.

But who needs easy comprehension? Turns out, a regex can do the job just as (un)efficiently:

import re

def is_prime(x):
    pattern = r"^1?$|^(11+?)\1+$"
    return re.match(pattern, "1" * x) is None

What's this black magic? Let's break it down. There's three main parts to making this regex work:

Part 1: Input format

First, the integer being tested is converted from an integer to a string of 1s of that length ("1" * x). The goal here is to enable regular expressions to be applied the way they were intended -- by counting how often certain expressions show up in a string -- rather than by using any true mathematical operations to test for primality.

As an example, an input of 7 gets converted to 1111111.

Part 2: Base case

The full regex pattern (^1?$|^(11+?)\1+$) contains a pipe character (|) -- a logical OR -- that separates the pattern into two possible expressions to match. I tend to think of the left half (^1?$) as the base case. In human lingo, this expression finds a match that starts at the beginning of the string (^), has zero or one 1s (1?), and ends at the end of the string ($). This means that it only finds a match when x is 0 or 1.

Part 3: Backtracking

The second half of the regex (^(11+?)\1+$) is where the magic happens. The ^ and $ tokens once again force any match to cover the entire string from beginning to end. The capture group ((11+?)) non-greedily finds the smallest sequence of 2 or more 1s and assigns it to \1. Then, the \1+ looks for this captured group to appear at least 1 more time.

For example, for the string with seven 1s (1111111), the first 11 is matched by the capture group and the next 1111 are matched by the \1+, but because the expression can't match the last 1 and the $ requires the regex engine to match to the end of the string, this string isn't a match for the pattern.

Said another way, the regex engine couldn't break the string 1111111 into even groups of 11 without any remainders. That's right, the capture group of 11 was the regex version of choosing a factor of 2 and testing if the input was divisible by that factor! Because the regex engine couldn't find a match, it effectively determined that the input was not divisible by 2.

So what about other factors?

This is where backtracking comes into play. When the currently chosen capture group doesn't result in a match, the regex engine then minimally increases the capture group to the next matching value and repeats the "divisibility" check with this new capture group. That means, for any inputs indivisible by 2 (capture group 11), the same process is repeated for 3 (capture group 111). And if that doesn't work, it tries again with 4 (capture group 1111) and 5 (capture group 11111) and so on until all possible factors have been exhausted. Only then, after an input fails to return any matches for any capture groups between 2 and itself, can we be confident that the input is prime.

And that, ladies, gentlemen, and variations thereof, is how a regex can be a primality oracle.