keropwhich.blogg.se - Regular expression not greedy

Greediness is often only an issue with .*, not when you quantify something more specific than the . wildcard.

A regular expression, as the term is used here, is a mathematical construct; it is the formalism used to define regular languages. In that context, the Kleene star operator is neither greedy nor abstemious. Rather, it is non-deterministic: it will match as many times as necessary. It's worth pointing out here that regular expressions define a "language", not a substring. In that sense, the regular expression matches (or recognises) the entirety of a sentence.

Using regexes to find substrings was a (very useful) adaptation of a theoretical concept to practical programming problems. But it introduces an ambiguity which was not present in the original: if the regex can match more than one different substring, which one should it choose? The original regex libraries all chose the first matching substring (in a left-to-right scan), and of those matches which all start at the same position, they chose the longest one.
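A small Python sketch may make the leftmost-then-longest rule and the point about .* more concrete. The sample strings are made up for illustration. Note also that Python's re module is a backtracking engine, not one of the original libraries: it agrees with the rule for a single greedy quantifier, but with alternation it takes the first alternative that works rather than the longest.

```python
import re

# Leftmost beats longest: a* can match the empty string at index 0,
# so the scan never reaches the longer run of a's further right.
leftmost = re.search(r"a*", "baaa")
print(leftmost.span())  # (0, 0)

# Among matches that start at the same position, a greedy quantifier
# takes the longest one, but only for the FIRST run it finds.
first_run = re.search(r"a+", "caaat aaaaa").group()
print(first_run)  # aaa

# Greediness mostly bites with .*; a more specific class stays well behaved.
quoted = 'say "hi" and "bye"'
any_char = re.findall(r'".*"', quoted)     # . also matches the inner quotes
specific = re.findall(r'"[^"]*"', quoted)  # [^"] cannot cross a closing quote
print(any_char)  # ['"hi" and "bye"']
print(specific)  # ['"hi"', '"bye"']
```

The last two patterns are both greedy; only the one built on the . wildcard overshoots.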

Greedy matching seems less intuitive than non-greedy matching. If someone writes a regexp like BEGIN(.*)END, they intuitively expect the capture group to grab everything between adjacent BEGIN and END brackets. But if there are multiple BEGIN...END pairs, it will span all of them. And I think non-greedy quantifiers like this didn't even exist until PCRE - matching this was even more complicated then. Non-greedy quantifiers are not a magic bullet solution for all such problems, but it seems like there would be fewer problems if they were the default.

If there's nothing after the quantifier, you usually want it to be greedy. My original answer to the Python regex question had a bug: the regexp ended with Answer:\s*(.*?). By making this non-greedy, the last capture group matched an empty string rather than everything after the final Answer: prefix. And also consider the \s* there - we want that to capture all the whitespace after the prefix, before grabbing the text in the capture group. So the appropriate default really depends on what (if anything) is after the quantifier.
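Both situations can be reproduced with Python's re module. This is only a sketch: the input strings are invented for illustration, and the patterns are the ones quoted in the text plus their greedy or non-greedy counterparts.

```python
import re

text = "BEGIN one END filler BEGIN two END"

# Greedy: .* runs to the end of the string, then backs up to the LAST "END".
greedy = re.findall(r"BEGIN(.*)END", text)
# Non-greedy: .*? stops at the FIRST "END" after each "BEGIN".
lazy = re.findall(r"BEGIN(.*?)END", text)
print(greedy)  # [' one END filler BEGIN two ']
print(lazy)    # [' one ', ' two ']

reply = "Question: why? Answer:   because it is greedy"

# Non-greedy with nothing after it: matching zero characters already succeeds.
lazy_tail = re.search(r"Answer:\s*(.*?)", reply).group(1)
# Greedy at the end of the pattern: takes the rest of the line.
greedy_tail = re.search(r"Answer:\s*(.*)", reply).group(1)
# A non-greedy \s*? would leave the leading whitespace to the capture group.
lazy_space = re.search(r"Answer:\s*?(.*)", reply).group(1)
print(repr(lazy_tail))    # ''
print(repr(greedy_tail))  # 'because it is greedy'
print(repr(lazy_space))   # '   because it is greedy'
```

In the first pair the non-greedy form gives the expected result; in the second the greedy form does, because nothing follows the quantifier.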

I can't count the number of times I've seen questions on StackOverflow related to regular expressions where the problem was due to the programmer not understanding the impact of greedy quantifiers. Most recently, Python regex issue - * vs \d*. I wonder why regular expression quantifiers were defined to be greedy.