Stuart Thomson

Don’t use this regex. Solve it in code instead.

One day a friend pinged me in Discord and pasted in a regular expression.

```ts
/^(?:(?:(?=.*[0-9])(?=.*[a-z])(?=.*[A-Z]))|(?:(?=.*[a-z])(?=.*[A-Z])(?=.*[*.!@$%^&(){}[]:;<>,.?/~_+-=|\]))|(?:(?=.*[0-9])(?=.*[A-Z])(?=.*[*.!@$%^&(){}[]:;<>,.?/~_+-=|\]))|(?:(?=.*[0-9])(?=.*[a-z])(?=.*[*.!@$%^&(){}[]:;<>,.?/~_+-=|\]))).{8,32}$/
```

The regex is built for use in a “new password” field, where the user’s new password must meet a set of criteria before the system accepts it.

I later searched for it and found it on a tutorial site. I’m just glad my friend is sensible enough to not use this regex.

What you should be doing instead

Solve this problem in code. First, check the length of the string. If it’s not the right length, you don’t need to run the rest of the validation. Then test the string against each of the character requirements, incrementing a counter for each one that passes. At the end, if the counter is at least the threshold (in this case 3), the password is acceptable.
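The steps above could be sketched roughly as follows. This is a minimal illustration, not a drop-in implementation: the names `CHARACTER_TESTS` and `isAcceptablePassword` are made up for this example, and the symbol list is assumed to be the one the regex intends to allow.

```ts
// Hypothetical names; the limits and symbol list are taken from the regex in this article.
const MIN_LENGTH = 8;
const MAX_LENGTH = 32;
const REQUIRED_MATCHES = 3;

const CHARACTER_TESTS: RegExp[] = [
  /[0-9]/, // digit
  /[a-z]/, // lower case letter
  /[A-Z]/, // upper case letter
  /[*.!@$%^&(){}[\]:;<>,.?\/~_+\-=|]/, // symbol
];

function isAcceptablePassword(password: string): boolean {
  // Length first: if it is wrong, none of the other checks need to run.
  if (password.length < MIN_LENGTH || password.length > MAX_LENGTH) {
    return false;
  }

  // Count how many of the character requirements are satisfied.
  let matched = 0;
  for (const test of CHARACTER_TESTS) {
    if (test.test(password)) {
      matched += 1;
    }
  }

  return matched >= REQUIRED_MATCHES;
}
```

With this sketch, `isAcceptablePassword("Abcdefg1")` is accepted (digit, lower case, upper case), while `isAcceptablePassword("abcdefg1")` is rejected because only two requirements are met.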

The method described above can be modified to return the reasons why the password is not valid. That information can be shown to the user as a list of the criteria that haven’t been met, giving them immediate feedback on how to fix the issue. The result is a better user experience and more maintainable code.
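One possible shape for that variation is sketched below. The `PasswordResult` type, the `checkPassword` function, and the wording of the messages are all assumptions made for illustration; a real form would pick its own messages and data structure.

```ts
// Hypothetical result type: a validity flag plus human-readable problems.
type PasswordResult = { valid: boolean; problems: string[] };

const CHARACTER_CRITERIA: { label: string; pattern: RegExp }[] = [
  { label: "a digit", pattern: /[0-9]/ },
  { label: "a lower case letter", pattern: /[a-z]/ },
  { label: "an upper case letter", pattern: /[A-Z]/ },
  { label: "a symbol", pattern: /[*.!@$%^&(){}[\]:;<>,.?\/~_+\-=|]/ },
];

function checkPassword(password: string): PasswordResult {
  if (password.length < 8 || password.length > 32) {
    return { valid: false, problems: ["must be between 8 and 32 characters long"] };
  }

  // Collect the labels of the criteria that the password does not meet.
  const missing = CHARACTER_CRITERIA
    .filter((criterion) => !criterion.pattern.test(password))
    .map((criterion) => criterion.label);

  // Three of the four criteria are enough, so up to one may be missing.
  const met = CHARACTER_CRITERIA.length - missing.length;
  if (met >= 3) {
    return { valid: true, problems: [] };
  }

  return { valid: false, problems: missing.map((label) => `could include ${label}`) };
}
```

For a rejected password such as `"abcdefg1"`, the `problems` array lists the two unmet criteria (an upper case letter and a symbol), which a form could render next to the field.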

Regex Explanation

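The set of criteria for a string to match the regex is:

  1. It is between 8 and 32 characters long.

  2. It contains at least three of the following four: a digit, a lower case letter, an upper case letter, and a symbol.

Each of these on their own is a simple regex:

  1. Length: `.{8,32}`

  2. Digit: `[0-9]`

  3. Lower case letter: `[a-z]`

  4. Upper case letter: `[A-Z]`

  5. Symbol: `[*.!@$%^&(){}[\]:;<>,.?/~_+\-=|]`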

So if each requirement can be expressed so easily on its own, why is the password regex so long? The reason is that it accepts strings that meet only three of the four character requirements. To support this, the regex has to spell out every possible combination of three criteria. Since regexes use a regular grammar (that’s the “regular” in “regular expressions”), they can’t use variables or similar constructs, so each group must be written out again every time it’s used.

```
(?:
(?:(?=.*[0-9])(?=.*[a-z])(?=.*[A-Z]))| // number, lower case, upper case
(?:(?=.*[a-z])(?=.*[A-Z])(?=.*[*.!@$%^&(){}[]:;<>,.?/~_+-=|\]))| // lower case, upper case, symbol
(?:(?=.*[0-9])(?=.*[A-Z])(?=.*[*.!@$%^&(){}[]:;<>,.?/~_+-=|\]))| // number, upper case, symbol
(?:(?=.*[0-9])(?=.*[a-z])(?=.*[*.!@$%^&(){}[]:;<>,.?/~_+-=|\])) // number, lower case, symbol
)
.{8,32} // length between 8 and 32 characters
```
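To make the duplication concrete, here is a rough sketch that generates the same structure from the four single-character classes. It is only meant to show where the four alternatives come from, not as a recommendation to build the regex this way. The names `SINGLE_CLASSES`, `combinations`, `alternatives`, and `passwordRegex` are invented for this example, and the symbol class is assumed to be the one the regex intends to allow.

```ts
// The four single-requirement classes (symbol class assumed to be the intended one).
const SINGLE_CLASSES: RegExp[] = [
  /[0-9]/,
  /[a-z]/,
  /[A-Z]/,
  /[*.!@$%^&(){}[\]:;<>,.?\/~_+\-=|]/,
];

// Every way of choosing `size` items from `items`, preserving order.
function combinations<T>(items: T[], size: number): T[][] {
  if (size === 0) {
    return [[]];
  }
  const result: T[][] = [];
  items.forEach((item, index) => {
    for (const tail of combinations(items.slice(index + 1), size - 1)) {
      result.push([item, ...tail]);
    }
  });
  return result;
}

// One group of three lookaheads per combination: 4 choose 3 = 4 groups.
const alternatives = combinations(SINGLE_CLASSES, 3).map(
  (combo) => `(?:${combo.map((cls) => `(?=.*${cls.source})`).join("")})`
);

const passwordRegex = new RegExp(`^(?:${alternatives.join("|")}).{8,32}$`);
```

Each class ends up written into the pattern three times, because it appears in three of the four combinations. If the rule changed to "two of four", the generated pattern would have six alternatives.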


Oh, and did you notice a bug in the regex? There are actually two, both of them in the symbol matching:

  1. The character classes being used for symbol matching are terminated too early.

  2. There is a range in the character class that is including numbers as symbols.

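The first issue means the character class ends at the first `]`, so it contains only `*.!@$%^&(){}[`. The rest of the intended class is parsed as ordinary pattern syntax, which results in a situation where the symbol requirement must be either:

  1. One of the characters in the shortened class, followed immediately by `:;<>,`, then optionally any one character, then `/~`, one or more underscores, and `-=`.

  2. A `]` as the very first character of the password, because the `|` becomes an alternation inside the lookahead.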

This issue can be solved by escaping the correct closing square bracket.

```diff
- [*.!@$%^&(){}[]:;<>,.?/~_+-=|\]
+ [*.!@$%^&(){}[\]:;<>,.?/~_+-=|]
```

The second issue comes from the fact that a hyphen (-) has special meaning in a character class: it specifies a range of characters, starting from the code point of the character before the hyphen to the code point of the character after it (inclusive). In the case of this regex, there’s a range from + (hex 2B) to = (hex 3D). This range includes the code points for the digits (hex 30 to 39). This means that a password with even a single digit in it counts as having both a number and a symbol, even if none of the characters you’d usually think of as symbols are present. The lookaheads are checked independently, so the same digit can satisfy both of them.
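The range can be checked directly. The following sketch isolates just the `+-=` part of the class; the variable names are made up for this example.

```ts
// Unescaped hyphen: a range from "+" (0x2B) to "=" (0x3D).
const rangeClass = /[+-=]/;
// Escaped hyphen: exactly the three characters "+", "-" and "=".
const literalClass = /[+\-=]/;

// Collect every character whose code point falls inside the range.
const charactersInRange: string[] = [];
for (let code = 0x2b; code <= 0x3d; code++) {
  charactersInRange.push(String.fromCharCode(code));
}

const rangeText = charactersInRange.join(""); // "+,-./0123456789:;<="

const digitMatchesRange = rangeClass.test("5"); // true: "5" is inside the range
const digitMatchesLiteral = literalClass.test("5"); // false: "5" is not listed
```

The range covers nineteen characters, ten of which are the digits. That is why a digit alone is enough to satisfy the buggy symbol class.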

This can be solved in two ways: shifting the hyphen to the end of the character class, or escaping the hyphen with a backslash.

```diff
- [*.!@$%^&(){}[\]:;<>,.?/~_+-=|]
+ [*.!@$%^&(){}[\]:;<>,.?/~_+=|-]
or
+ [*.!@$%^&(){}[\]:;<>,.?/~_+\-=|]
```

Here’s the final fixed version of the password regex:

```tsx
/^(?:(?:(?=.*[0-9])(?=.*[a-z])(?=.*[A-Z]))|(?:(?=.*[a-z])(?=.*[A-Z])(?=.*[*.!@$%^&(){}[\]:;<>,.?/~_+\-=|]))|(?:(?=.*[0-9])(?=.*[A-Z])(?=.*[*.!@$%^&(){}[\]:;<>,.?/~_+\-=|]))|(?:(?=.*[0-9])(?=.*[a-z])(?=.*[*.!@$%^&(){}[\]:;<>,.?/~_+\-=|]))).{8,32}$/
```

Don’t use it.