Comment by codetrotter 4 days ago
I use regexes a lot. The main thing that always trips me up is escaping, because the different tools I use – vim, sed, rg, and so on – sometimes have different rules about when to escape and when not to.
In one tool you’ll use + to match one or more times, and \+ to mean literal plus sign.
In another tool you’ll use \+ to match one or more times, and + to mean literal plus sign.
In one tool you’ll use ( and ) to create a match group, and \( and \) to mean literal open and close parentheses.
In another tool you’ll use \( and \) to create a match group, and ( and ) to mean literal open and close parentheses.
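Here is a minimal Python sketch of the swap described above. It assumes GNU sed's default BRE syntax on one side (where \+, \?, \| are GNU extensions) and ERE-style syntax (sed -E, rg, Python's re) on the other. The helper name bre_to_ere is hypothetical. It only flips the operators whose escaping differs (+ ? ( ) { } |), copies simple bracket expressions through unchanged, and does not understand POSIX classes such as [[:digit:]]. vim's default "magic" mode is close to GNU BRE but has its own differences, so treat this as an illustration rather than a complete translator.

```python
import re

# Characters whose escaped/unescaped meanings swap between GNU BRE
# (sed's default) and ERE-style syntax (sed -E, rg, Python's re).
SWAPPED = set("+?(){}|")


def bre_to_ere(pattern: str) -> str:
    """Hypothetical helper: translate a GNU BRE pattern to ERE syntax (simplified)."""
    out = []
    i, n = 0, len(pattern)
    while i < n:
        c = pattern[i]
        if c == "\\" and i + 1 < n:
            nxt = pattern[i + 1]
            if nxt in SWAPPED:
                out.append(nxt)        # \+ in BRE is an operator -> + in ERE
            else:
                out.append(c + nxt)    # other escapes such as \. pass through
            i += 2
        elif c == "[":
            # Copy a simple bracket expression verbatim; its contents are literal.
            # Limitation: does not handle POSIX classes like [[:digit:]].
            j = i + 1
            if j < n and pattern[j] == "^":
                j += 1
            if j < n and pattern[j] == "]":
                j += 1                 # a leading ] is a literal set member
            while j < n and pattern[j] != "]":
                j += 1
            out.append(pattern[i:j + 1])
            i = j + 1
        elif c in SWAPPED:
            out.append("\\" + c)       # bare + in BRE is literal -> \+ in ERE
            i += 1
        else:
            out.append(c)
            i += 1
    return "".join(out)


print(bre_to_ere(r"\(ab\)\+"))                                   # (ab)+
print(re.fullmatch(bre_to_ere(r"\(ab\)\+"), "abab") is not None)  # True
```

The design choice here is to treat the problem as a per-character swap, which is exactly the confusion described: the same character means "operator" in one dialect and "literal" in the other.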
This is basically the only problem I have when writing regexes, for the kinds of regexes I write.
Also, one thing that’s not a problem per se, but that leads me to write my regexes with more characters than strictly necessary, is that I rarely use shorthand character classes. For example, the tool might have a shorthand for digits, but I always write [0-9] when I need to match a digit. That’s probably also because the shorthand may or may not differ between tools.
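Shorthands can also differ in what they match, not just in how they are spelled. A small Python illustration: for str patterns, Python's \d matches any Unicode decimal digit, so it is not always the same as [0-9]. The sample character (U+0663, ARABIC-INDIC DIGIT THREE) is just an example input chosen for this sketch.

```python
import re

arabic_indic_three = "\u0663"  # example input: ARABIC-INDIC DIGIT THREE

# For str patterns, \d matches any Unicode decimal digit, not only 0-9.
print(bool(re.fullmatch(r"\d", arabic_indic_three)))            # True
# An explicit class matches only the ASCII characters it lists.
print(bool(re.fullmatch(r"[0-9]", arabic_indic_three)))         # False
# The re.ASCII flag restricts \d to [0-9].
print(bool(re.fullmatch(r"\d", arabic_indic_three, re.ASCII)))  # False
```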
Regexes are also known to be “write once read never”, in that writing a regex is relatively easy, but revisiting a semi-complicated regex you or someone else wrote in the past takes a little bit of extra effort to figure out what it’s matching and what edits one should make to it. In this case, tools like https://regex101.com/ or https://www.debuggex.com/ help a lot.
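Besides external tools, some engines let you annotate the regex itself. A sketch using Python's re.VERBOSE flag (PCRE and Rust's regex crate have a similar x flag), where unescaped whitespace is ignored and # starts a comment. The ISO-date pattern is only an example:

```python
import re

# re.VERBOSE: unescaped whitespace in the pattern is ignored and # starts a comment.
ISO_DATE = re.compile(r"""
    ^
    (?P<year>[0-9]{4})    # four-digit year
    -
    (?P<month>[0-9]{2})   # two-digit month
    -
    (?P<day>[0-9]{2})     # two-digit day
    $
""", re.VERBOSE)

m = ISO_DATE.match("2024-05-17")
print(m.group("year"), m.group("month"), m.group("day"))  # 2024 05 17
```

Named groups and per-line comments make a later edit (say, allowing one-digit months) easier to locate than in the one-line form.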
The problem with escaping (as with quoting) is often that you need to know how many parsers the string passes through. The shell or editor, the language you are programming in, and the regexp engine can each strip off an escape character or a set of outer quotes. That, and of course the different dialects of regexp, make things complicated.
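A small Python sketch of those parser layers, using a literal backslash as the example. The Python string literal is one layer and the regex engine is another; a shell in front would add a third.

```python
import re

text = r"C:\temp"  # contains exactly one literal backslash

# The regex engine needs \\ to match one literal backslash.
# A raw string passes both backslashes through to the engine unchanged.
print(re.search(r"\\", text) is not None)    # True

# A normal string literal is parsed by Python first, which consumes one
# level of escaping: four backslashes in source become two for the engine.
print(re.search("\\\\", text) is not None)   # True
print("\\\\" == r"\\")                       # True

# re.escape adds the regex-level escaping for a literal string.
print(re.escape("a+b(c)"))                   # a\+b\(c\)
```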