//   Extended regular expression matching and search.
//   Copyright (C) 1985 Richard M. Stallman

** Regular expressions.

GNU Emacs has regular expression facilities like those of most
Unix editors, but more powerful:

***             -- + --

+ specifies repetition of the preceding expression 1 or more
times.  In other respects it is like *, which specifies repetition
0 or more times.

***             -- ? --

? is like *, but matches at most one repetition of the preceding
expression: either zero occurrences or one.
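
As a rough illustration, here is a sketch in Python, whose "re" module
happens to spell +, ? and * the same way as described above.  Python is
an assumption of this example, not something GNU Emacs uses, and the
names one_or_more, zero_or_more, zero_or_one and whole are made up for
the demonstration.

```python
import re

# Each pattern is "b" followed by some number of "a"s.
one_or_more = re.compile(r"ba+")   # + : 1 or more repetitions
zero_or_more = re.compile(r"ba*")  # * : 0 or more repetitions
zero_or_one = re.compile(r"ba?")   # ? : 0 or 1 repetition


def whole(pattern, text):
    """Return True if the pattern matches the entire text."""
    return pattern.fullmatch(text) is not None


print(whole(one_or_more, "b"))     # False: + needs at least one "a"
print(whole(one_or_more, "baaa"))  # True
print(whole(zero_or_more, "b"))    # True: * accepts zero repetitions
print(whole(zero_or_one, "ba"))    # True
print(whole(zero_or_one, "baa"))   # False: ? allows at most one "a"
```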

***             -- \| --

\| specifies an alternative.  Two regular expressions A and B with \| in
between form an expression that matches anything that either A or B will
match.  Thus, "foo\|bar" matches either "foo" or "bar" but no other
string.

\| applies to the largest possible surrounding expressions.  Only a
surrounding \( ... \) grouping can limit the grouping power of \|.

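Full backtracking capability exists when multiple \|'s are used: if the
rest of the expression fails to match after one alternative has been
chosen, the matcher goes back and tries the remaining alternatives.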
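
The scope of \| can be tried out with a small sketch in Python.  Note
the assumption: Python's "re" module writes | and ( ) where Emacs writes
\| and \( \), so the patterns below are translations, not Emacs syntax.
The helper name whole is hypothetical.

```python
import re


def whole(pattern, text):
    """Return True if the pattern matches the entire text."""
    return re.fullmatch(pattern, text) is not None


# Emacs "foo\|bar" corresponds to Python "foo|bar".
print(whole(r"foo|bar", "foo"))     # True
print(whole(r"foo|bar", "bar"))     # True
print(whole(r"foo|bar", "foobar"))  # False: no other string matches

# The alternation takes the largest surrounding expressions, so
# "foo|barx" means "foo" or "barx", not "foox" or "barx".
print(whole(r"foo|barx", "barx"))   # True
print(whole(r"foo|barx", "foox"))   # False

# Backtracking: "a" is tried first, "c" then fails against "b",
# so the matcher goes back and tries "ab" instead.
print(whole(r"(a|ab)c", "abc"))     # True
```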

***             -- \( ... \) --

\( ... \) is a grouping construct that serves three purposes:

1.  To enclose a set of \| alternatives for other operations.
    Thus, "\(foo\|bar\)x" matches either "foox" or "barx".
2.  To enclose a complicated expression for * to operate on.
    Thus, "ba\(na\)*" matches "bananana", etc., with any number
    of na's (zero or more).
3.  To mark a matched substring for future reference.
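
Purposes 1 and 2 can be checked with the following sketch.  It is
written in Python as an assumption of this example; Python spells the
grouping construct ( ... ) and the alternative | rather than \( ... \)
and \|.  The helper name whole is hypothetical.

```python
import re


def whole(pattern, text):
    """Return True if the pattern matches the entire text."""
    return re.fullmatch(pattern, text) is not None


# Purpose 1: Emacs "\(foo\|bar\)x" is Python "(foo|bar)x".
print(whole(r"(foo|bar)x", "foox"))    # True
print(whole(r"(foo|bar)x", "barx"))    # True
print(whole(r"(foo|bar)x", "foo"))     # False: the "x" is required

# Purpose 2: Emacs "ba\(na\)*" is Python "ba(na)*".
print(whole(r"ba(na)*", "ba"))         # True: zero na's
print(whole(r"ba(na)*", "bananana"))   # True: three na's
print(whole(r"ba(na)*", "ban"))        # False: "n" alone is not "na"
```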

Application 3 is not a consequence of the idea of a parenthetical
grouping; it is a separate feature which happens to be assigned as a
second meaning to the same \( ... \) construct because there is no
conflict in practice between the two meanings.  Here is an explanation
of this feature.

***             -- \digit --

After the end of a \( ... \) construct, the matcher remembers the
beginning and end of the text matched by that construct.  Then, later on
in the regular expression, you can use \ followed by a digit to mean,
``match the same text matched this time by the \( ... \) construct.''
The first nine \( ... \) constructs that appear in a regular expression
are assigned numbers 1 through 9 in order of their beginnings.  \1
through \9 can be used to refer to the text matched by the corresponding
\( ... \) construct.

For example, "\(.*\)\1" matches any string that is composed of two
identical halves and contains no newline (since . does not match a
newline).  The "\(.*\)" matches the first half, which can be anything,
but the \1 that follows must match exactly the same text.
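
A back-reference of this kind can be sketched in Python, which is an
assumption of this example: its "re" module also writes \1 for the
first group, but uses ( ... ) in place of \( ... \).  The function name
first_half is made up for the illustration.

```python
import re

# Python translation of the Emacs pattern "\(.*\)\1".
TWO_HALVES = re.compile(r"(.*)\1")


def first_half(text):
    """Return the repeated half if text is two identical halves, else None."""
    m = TWO_HALVES.fullmatch(text)
    return m.group(1) if m else None


print(first_half("abcabc"))  # abc
print(first_half("abcabd"))  # None: the second half differs
print(first_half("xyxy"))    # xy
```

The matcher first lets the group take the whole string, finds that \1
cannot then match, and backs off one character at a time until the two
halves agree.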

***             -- \` --

Matches the empty string, but only if it is at the beginning of the buffer.

***             -- \' --

Matches the empty string, but only if it is at the end of the buffer.
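
Python's "re" module has no \` or \' constructs, but its \A and \Z
anchors play an analogous role for a string instead of a buffer.  The
sketch below rests on that assumption and is only meant to show what
"beginning" and "end" of the whole text mean; the variable names are
hypothetical.

```python
import re

text = "foo\nfoo"   # two lines, each containing "foo"

# \A plays the role of \` : only the very beginning qualifies.
at_start = [m.start() for m in re.finditer(r"\Afoo", text)]

# \Z plays the role of \' : only the very end qualifies.
at_end = [m.start() for m in re.finditer(r"foo\Z", text)]

print(at_start)  # [0]
print(at_end)    # [4]
```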

***             -- \b --

Matches the empty string, but only if it is at the beginning or end of
a word.  Thus, "\bfoo\b" matches any occurrence of "foo" as a separate word.
"\bball\(s\|\)\b" matches "ball" or "balls" as a separate word.

***             -- \B --

Matches the empty string, provided it is NOT at the beginning or end of
a word.
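
The \b and \B constructs can be tried in Python, which happens to use
the same spelling for both.  This is a sketch under the assumption that
Python's notion of a word is close enough for the examples; Emacs
decides what a word is from its syntax table.  Python writes ( | )
where Emacs writes \( \| \), and the helper name found is hypothetical.

```python
import re


def found(pattern, text):
    """Return the text of every match of pattern in text."""
    return [m.group(0) for m in re.finditer(pattern, text)]


# "\bfoo\b": "foo" only as a separate word.
print(found(r"\bfoo\b", "foo food a foo."))           # ['foo', 'foo']

# Emacs "\bball\(s\|\)\b": "ball" or "balls" as a separate word.
print(found(r"\bball(s|)\b", "ball balls ballroom"))  # ['ball', 'balls']

# "\Boo\B": "oo" with no word edge on either side.
print(found(r"\Boo\B", "foo food"))                   # ['oo']
```

In the last line only the "oo" inside "food" qualifies, because the
"oo" of "foo" is followed by the end of a word.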

***             -- \< --

Matches the empty string, provided it is at the beginning of a word.

***             -- \> --

Matches the empty string, provided it is at the end of a word.

***             -- \w --

Matches any word-constituent character.  The editor syntax table determines
which characters these are.

***             -- \W --

Matches any character that is not a word-constituent.

***             -- \s<code> --

Matches any character whose syntax is <code>.  <code> is a character
that represents a syntax code: for example, "w" for word constituent,
"-" for whitespace, "(" for open-parenthesis, etc.  Thus, "\s(" matches
any character with open-parenthesis syntax.

***             -- \S<code> --

Matches any character whose syntax is not <code>.
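
The idea behind \s<code> and \S<code> can be modelled with a toy sketch
in Python.  Everything here is an assumption made for illustration: the
function syntax_code is a hypothetical stand-in for the editor's syntax
table, and its assignments (including the ")" and "." codes) are not
taken from the text above or from any real table.

```python
def syntax_code(ch):
    """Toy stand-in for a syntax table lookup (hypothetical)."""
    if ch.isalnum():
        return "w"   # word constituent
    if ch.isspace():
        return "-"   # whitespace
    if ch in "([{":
        return "("   # open-parenthesis
    if ch in ")]}":
        return ")"   # close-parenthesis (assumed code)
    return "."       # anything else, treated as punctuation here


def chars_with_syntax(code, text):
    """Characters that \\s<code> would accept under the toy table."""
    return [ch for ch in text if syntax_code(ch) == code]


def chars_without_syntax(code, text):
    """Characters that \\S<code> would accept under the toy table."""
    return [ch for ch in text if syntax_code(ch) != code]


print(chars_with_syntax("(", "f(a[0])"))   # ['(', '[']
print(chars_with_syntax("w", "f(a[0])"))   # ['f', 'a', '0']
print(chars_without_syntax("w", "a b"))    # [' ']
```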