I recently needed to mask sensitive customer data to improve a company's security posture. We'd settled on revealing just the last 4 digits of important account numbers so that the customer could easily identify the different entities that they owned, and I quickly put together this code snippet to do the trick:

def mask(string)
  string.length > 4 ? "*" * (string.length - 4) + string[-4..] : string
end

This snippet works for strings of any length, but I thought its behaviour was awkward for strings with 4 or fewer characters. In these cases, it returns the string as is, and that didn't seem to fit the definition of "masked" for sensitive data at all. I didn't fret much over it since the company didn't have any strings that were short and sensitive, but I was still curious. How did others handle this?
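
To make the short-string behaviour concrete, here is a small sketch that runs the snippet on a few made-up inputs. The sample values are purely illustrative, and the endless range string[-4..] assumes Ruby 2.6 or newer:

```ruby
# Same implementation as in the post, repeated so the example is self-contained.
# Note: the endless range `string[-4..]` needs Ruby 2.6 or newer.
def mask(string)
  string.length > 4 ? "*" * (string.length - 4) + string[-4..] : string
end

# Illustrative inputs only, not real account numbers.
puts mask("1234567890") # ******7890
puts mask("12345")      # *2345
puts mask("1234")       # 1234 (returned unmasked)
puts mask("12")         # 12   (returned unmasked)
```

Anything with four or fewer characters comes back untouched, which is the awkward case described above.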

I didn't end up finding the answer to my question, but I did learn how to perform the same mask using Ruby's gsub method (global substitution using regex):

def mask(string)
  string.gsub(/.(?=.{4})/, "*")
end
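
The pattern works because of the lookahead: the `.` matches a single character, and `(?=.{4})` only lets that match succeed when at least four more characters follow, without consuming them. The last four characters can never satisfy that condition, so they are left alone. A minimal sketch comparing the two versions side by side (the names mask_slice and mask_regex are made up for this example; the post calls both of them mask):

```ruby
# Hypothetical names so both versions can live in one file.
def mask_slice(string)
  string.length > 4 ? "*" * (string.length - 4) + string[-4..] : string
end

def mask_regex(string)
  # `.` consumes one character; `(?=.{4})` is a lookahead that requires four
  # more characters to follow without consuming them.
  string.gsub(/.(?=.{4})/, "*")
end

# Illustrative inputs only.
["", "12", "1234", "12345", "1234567890"].each do |sample|
  agree = mask_slice(sample) == mask_regex(sample)
  puts "#{sample.inspect}: #{mask_regex(sample).inspect} (agree: #{agree})"
end
```

One difference to be aware of: `.` does not match a newline by default in Ruby, so the two versions can disagree on multi-line input. That is unlikely to matter for account numbers, but it is worth knowing before swapping one for the other.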

I tend to think of regex solutions as generally slower and less readable, so I don't usually come up with them. But this time, I felt like exercising my new Ruby chops to confirm that this regex implementation was in fact slower than my initial one. I wrote a rudimentary benchmarking script to run each mask method 10,000 times on various string lengths and got these results:

String length     Initial mask    Regex mask
3 characters      0.0289s         0.0915s
10 characters     0.0850s         0.3249s
100 characters    0.1064s         2.1240s
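
For anyone who wants to try a similar comparison, here is a rough sketch using the Benchmark module from Ruby's standard library. It is not the script behind the figures above: the helper time_mask, the method names mask_slice and mask_regex, and the all-digit sample strings are assumptions made for illustration, and the timings will vary by machine and Ruby version.

```ruby
require "benchmark"

# Hypothetical names for the two implementations from the post.
def mask_slice(string)
  string.length > 4 ? "*" * (string.length - 4) + string[-4..] : string
end

def mask_regex(string)
  string.gsub(/.(?=.{4})/, "*")
end

# Returns the wall-clock seconds taken to call the given block
# `iterations` times with `sample` as its argument.
def time_mask(sample, iterations = 10_000, &masker)
  Benchmark.realtime { iterations.times { masker.call(sample) } }
end

[3, 10, 100].each do |length|
  sample = "1" * length # made-up input: a run of identical digits
  slice_time = time_mask(sample) { |s| mask_slice(s) }
  regex_time = time_mask(sample) { |s| mask_regex(s) }
  puts format("%3d characters  slice: %.4fs  regex: %.4fs",
              length, slice_time, regex_time)
end
```

Benchmark.realtime measures elapsed time only, so a single run is noisy; repeating the whole loop a few times gives a better feel for the spread.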

No surprise here: the regex solution is slower, and the gap widens as the string grows. Besides that, I also noticed that the initial mask implementation grew slower as the string length increased. Part of that is simply the cost of building a longer run of asterisks, but I suspected that the string.length method was also counting the characters in the string instead of retrieving a pre-computed value for length. I double-checked with my new benchmarking script and confirmed that this was the case.
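
A check along these lines can be sketched as follows. This is an assumption about how one might test it, not the original script: time_length is a made-up helper, and the outcome depends on the Ruby implementation, the string's encoding, and its contents. In particular, a string of single-byte characters and a string of multi-byte UTF-8 characters may behave quite differently, so the sketch times both.

```ruby
require "benchmark"

# Hypothetical helper: wall-clock seconds for `calls` invocations of
# String#length on the same string.
def time_length(string, calls = 10_000)
  Benchmark.realtime { calls.times { string.length } }
end

[10, 1_000, 100_000].each do |size|
  ascii = "a" * size          # single-byte characters only
  multibyte = "\u00E9" * size # "é", two bytes per character in UTF-8
  puts format("%7d chars  ascii: %.4fs  multibyte: %.4fs",
              size, time_length(ascii), time_length(multibyte))
end
```

If the time per call stays flat as the size grows, the length is effectively being read rather than counted for that kind of string; if it grows with the size, it is being counted.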

Given that I call string.length twice in my one-liner implementation, I could thus avoid the second count, and trim the computation time a little, by storing the length in a variable like this:

def mask(string)
  length = string.length
  length > 4 ? "*" * (length - 4) + string[-4..] : string
end

In the end, I didn't use this because the initial implementation was fast enough for my use cases, and the branch had already been merged. Still, it was nice to explore. It's been a while since I've explored ideas for the sake of exploring rather than solving. I'll have to do this more often.