Backreferences in Regular Expressions

As the name implies, a regex backreference refers to a substring previously encountered in the target text. Backreferences are denoted by the code \n, where n takes values in the range 1..9. The regex engine has to be told which substrings it should keep track of for backreferencing. This is done by placing the relevant part of the pattern in parentheses. For instance, the pattern Java(script) is a \1ing language uses the backreference \1 to refer to the substring "script". This pattern would match the phrase "Javascript is a scripting language" but not the phrase "Javascript is a programming language".
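
The example above can be sketched in JavaScript as follows. This is a minimal illustration that assumes a standard ECMAScript regex engine; the variable names are hypothetical and chosen only for this sketch.

```javascript
// Group 1 captures "script"; \1 then requires the same text to appear again.
const scriptingPattern = /Java(script) is a \1ing language/;

// "scripting" begins with the captured text "script", so this matches.
const matchesScripting = scriptingPattern.test("Javascript is a scripting language");

// "programming" does not begin with "script", so this does not match.
const matchesProgramming = scriptingPattern.test("Javascript is a programming language");

console.log(matchesScripting, matchesProgramming);
```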

You will have noted that the instruction to index a backreference is exactly the same as the one used for a subexpression: characters wrapped in parentheses. The regex engine has to do extra work to perform backreference indexing, which can measurably slow down its execution. We can avoid this by explicitly instructing the engine not to do backreference indexing, using the format (?:subexpr). For instance, the pattern (?:(Spider)|(?:Super)?)?man spins a \1web would match "Spiderman spins a Spiderweb" and "Superman spins a web" but not "Superman spins a Superweb". There are several points worthy of note here:

  • We use a non-backreferenced subexpression: (?:(Spider)|(?:Super)?)?
  • This subexpression describes an alternation which in turn is composed of two subexpressions
    • (Spider), the first subexpression, is backreferenced
    • (?:Super)?, the second subexpression, is not backreferenced
  • "man spins a web" is matched. Why? Because in the absence of the word "Spider" the backreference \1 points to an empty string.
  • Likewise, "Superman spins a web" is matched, since the word "Super" was, under our instructions, never indexed for backreferencing.
  • "man spins a Spiderweb" is not matched. No surprises here.
  • If we supply the text "Spiderman spins a web", IE and Firefox incorrectly match the substring "man spins a web". Safari for Windows correctly fails to find a match.
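
The points above can be checked with a short JavaScript sketch. Note one assumption that is not in the original pattern: the version below is anchored with ^ and $ so that only whole-phrase matches count, since an unanchored search can also succeed on a substring of the phrase. The helper name matchesWhole is hypothetical.

```javascript
// Anchors (^ and $) are an assumption added for this sketch so that only
// whole-phrase matches are reported; the pattern in the text is unanchored.
const webPattern = /^(?:(Spider)|(?:Super)?)?man spins a \1web$/;

// Hypothetical helper: true when the entire phrase matches the pattern.
function matchesWhole(phrase) {
  return webPattern.test(phrase);
}

const results = {
  spiderSpider: matchesWhole("Spiderman spins a Spiderweb"), // group 1 = "Spider"
  superWeb: matchesWhole("Superman spins a web"),            // group 1 never set
  plainWeb: matchesWhole("man spins a web"),                 // group 1 never set
  superSuper: matchesWhole("Superman spins a Superweb"),     // "Super" is not captured
  plainSpider: matchesWhole("man spins a Spiderweb"),        // \1 is empty here
  spiderWeb: matchesWhole("Spiderman spins a web"),          // \1 would need "Spider"
};

// "Super" sits in a non-capturing group, so group 1 stays undefined.
const superCapture = webPattern.exec("Superman spins a web")[1];

console.log(results, superCapture);
```

In ECMAScript engines a backreference to a group that took no part in the match behaves as an empty string, which is why the two phrases without "Spider" are accepted.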

It is possible to build extremely complex search patterns by using nested subexpressions, alternative patterns and backreferences that are themselves used to define other backreferences. For instance, wrapping the backreference in parentheses, as in (\1web), would have indexed "Spiderweb" as a further backreference. However, the failure of IE and Firefox outlined above should bring home to the reader the importance of thoroughly testing such patterns in all target browsers.
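
As a rough illustration of a backreference defined in terms of another, the JavaScript sketch below wraps \1web in a second capturing group. The trailing phrase "and rests on the ..." and the variable names are hypothetical additions made only to show \2 being reused; the pattern is anchored for the same reason.

```javascript
// Group 1 captures "Spider"; group 2 wraps \1web and so captures "Spiderweb".
// \2 then requires that captured "Spiderweb" to appear a second time.
const nestedPattern = /^(Spider)man spins a (\1web) and rests on the \2$/;

const nestedMatch = nestedPattern.exec(
  "Spiderman spins a Spiderweb and rests on the Spiderweb"
);

// A phrase whose final word differs from group 2 yields no match (null).
const nestedMiss = nestedPattern.exec(
  "Spiderman spins a Spiderweb and rests on the web"
);

console.log(nestedMatch[1], nestedMatch[2], nestedMiss);
```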
