Lazy Repetition Matching
The * and + specifiers define regex patterns that contain repeating subpatterns. However, they must be used with care to avoid unintentional overmatching. Consider the pattern 2 .* 2\:00 applied to the phrase "From 2 to 2:00 to 2 to 2:00" with the intention of matching 2 to 2:00. The match that is in fact returned is 2 to 2:00 to 2 to 2:00.
The .* is an instruction to match any character that does not force a line break, 0 or more times, with the constraint that this pattern should occur within the character sequence "2...2:00". The problem is that, by default, .* is greedy: it is allowed to match all characters between the first occurrence of 2 and the last occurrence of 2:00. Put another way, the regex engine keeps matching until it can match no more. But what if we want to match only up to the first occurrence of 2:00? In other words, what if we want to stop matching as soon as a match has been found?
We do this by making the * operator lazy, which means appending a ? immediately after it. The modified pattern is 2 .*? 2\:00, which correctly returns the match 2 to 2:00.
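The difference between the greedy and lazy forms can be checked with a short script. This is a minimal sketch that assumes Python's built-in re module rather than the tester described on this page, so escaping rules may differ slightly. The variable names are illustrative only.

```python
import re

# The example phrase and patterns from the text above.
text = "From 2 to 2:00 to 2 to 2:00"

# Greedy: .* extends as far as it can, up to the last " 2:00".
greedy = re.search(r"2 .* 2\:00", text)

# Lazy: .*? stops at the first point where " 2:00" can follow.
lazy = re.search(r"2 .*? 2\:00", text)

print(greedy.group(0))  # 2 to 2:00 to 2 to 2:00
print(lazy.group(0))    # 2 to 2:00
```

Both searches start at the same position, the first 2 in the phrase. Only the amount of text consumed by the repeated subpattern changes.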
A final note: remember that * and + are the shorthand equivalents of {0,} and {1,} respectively. Consequently, the ? instruction can quite legitimately be used with the long versions as well, giving {0,}? and {1,}?.
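As a quick check of this equivalence, the sketch below compares the shorthand and long lazy forms on the same phrase. It again assumes Python's re module, and the helper name first_match is hypothetical, introduced only for this illustration.

```python
import re

text = "From 2 to 2:00 to 2 to 2:00"

def first_match(pattern: str) -> str:
    """Return the text of the first match of pattern in the example phrase."""
    found = re.search(pattern, text)
    return found.group(0) if found else ""

# Each pair holds a lazy shorthand quantifier and its long-form equivalent.
pairs = [
    (r"2 .*? 2\:00", r"2 .{0,}? 2\:00"),
    (r"2 .+? 2\:00", r"2 .{1,}? 2\:00"),
]

results = [(first_match(short), first_match(long_form)) for short, long_form in pairs]

for short_result, long_result in results:
    print(short_result, "|", long_result)
```

On this phrase all four patterns return the same text, because at least one character always sits between the first "2 " and the first " 2:00".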
Do not wrap the model expression in a /.../ pair. The characters ^$.?*!+:=()[]{}|\\ must be escaped, except when they occur inside a character class. Invalid characters will be grayed out.