// Extended regular expression matching and search.
// Copyright (C) 1985 Richard M. Stallman

** Regular expressions.

GNU Emacs has regular expression facilities like those of most
Unix editors, but more powerful:

*** -- + --
+ specifies repetition of the preceding expression 1 or more
times. It is in other respects like *, which specifies repetition
0 or more times.

*** -- ? --
? is like * but matches at most one repetition of the preceding
expression.

*** -- \| --
\| specifies an alternative. Two regular expressions A and B with
\| in between form an expression that matches anything that either
A or B will match. Thus, "foo\|bar" matches either "foo" or "bar"
but no other string.

\| applies to the largest possible surrounding expressions. Only a
surrounding \( ... \) grouping can limit the grouping power of \|.

Full backtracking capability exists when multiple \|'s are used.

*** -- \( ... \) --
\( ... \) are a grouping construct that serves three purposes:

1. To enclose a set of \| alternatives for other operations.
   Thus, "\(foo\|bar\)x" matches either "foox" or "barx".

2. To enclose a complicated expression for * to operate on.
   Thus, "ba\(na\)*" matches "bananana", etc., with any number of
   na's (zero or more).

3. To mark a matched substring for future reference.

Application 3 is not a consequence of the idea of a parenthetical
grouping; it is a separate feature which happens to be assigned as
a second meaning to the same \( ... \) construct because there is
no conflict in practice between the two meanings. Here is an
explanation of this feature.

-- \digit --
After the end of a \( ... \) construct, the matcher remembers the
beginning and end of the text matched by that construct. Then,
later on in the regular expression, you can use \ followed by a
digit to mean, ``match the same text matched this time by the
\( ... \) construct.'' The first nine \( ... \) constructs that
appear in a regular expression are assigned numbers 1 through 9 in
order of their beginnings.
\1 through \9 can be used to refer to the text matched by the
corresponding \( ... \) construct.

For example, "\(.*\)\1" matches any string that is composed of two
identical halves. The "\(.*\)" matches the first half, which can
be anything, but the \1 that follows must match the same exact
text.

*** -- \` --
Matches the empty string, but only if it is at the beginning of
the buffer.

*** -- \' --
Matches the empty string, but only if it is at the end of the
buffer.

*** -- \b --
Matches the empty string, but only if it is at the beginning or
end of a word. Thus, "\bfoo\b" matches any occurrence of "foo" as
a separate word. "\bball\(s\|\)\b" matches "ball" or "balls" as a
separate word.

*** -- \B --
Matches the empty string, provided it is NOT at the beginning or
end of a word.

*** -- \< --
Matches the empty string, provided it is at the beginning of a
word.

*** -- \> --
Matches the empty string, provided it is at the end of a word.

*** -- \w --
Matches any word-constituent character. The editor syntax table
determines which characters these are.

*** -- \W --
Matches any character that is not a word-constituent.

*** -- \sCODE --
Matches any character whose syntax is CODE. CODE is a letter that
represents a syntax code: thus, "w" for word constituent, "-" for
whitespace, "(" for open-parenthesis, etc. Thus, "\s(" matches any
character with open-parenthesis syntax.

*** -- \SCODE --
Matches any character whose syntax is not CODE.
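
The quoted examples above can be tried outside the editor. What
follows is only a rough sketch in Python, not the Emacs matcher
itself. It assumes that Python's `re` module is an acceptable
stand-in, and that \( \) \| in the patterns above correspond to
Python's unescaped ( ) |. The names EXAMPLES, whole_match and found
are hypothetical helpers introduced for illustration. Python decides
word boundaries by its own rules rather than by an editor syntax
table, so \b may behave differently on unusual characters.

```python
import re

# Keys: patterns exactly as quoted in the text above.
# Values: an assumed Python `re` equivalent. Python writes grouping and
# alternation without backslashes, so \( \) \| become ( ) | here.
EXAMPLES = {
    r"foo\|bar": r"foo|bar",
    r"\(foo\|bar\)x": r"(foo|bar)x",
    r"ba\(na\)*": r"ba(na)*",
    r"\(.*\)\1": r"(.*)\1",
    r"\bfoo\b": r"\bfoo\b",
    r"\bball\(s\|\)\b": r"\bball(s|)\b",
}


def whole_match(text_pattern, s):
    """True if the entire string s matches the translated pattern."""
    return re.fullmatch(EXAMPLES[text_pattern], s) is not None


def found(text_pattern, s):
    """True if the translated pattern matches somewhere inside s."""
    return re.search(EXAMPLES[text_pattern], s) is not None


if __name__ == "__main__":
    # Alternation: either branch matches, nothing else does.
    print(whole_match(r"foo\|bar", "foo"), whole_match(r"foo\|bar", "foobar"))
    # Grouping limits the reach of the alternation.
    print(whole_match(r"\(foo\|bar\)x", "barx"))
    # Grouping gives * a multi-character operand.
    print(whole_match(r"ba\(na\)*", "bananana"))
    # Back-reference: the second half must repeat the first half.
    print(whole_match(r"\(.*\)\1", "abcabc"), whole_match(r"\(.*\)\1", "abcabd"))
    # Word boundaries: "foo" only as a separate word.
    print(found(r"\bfoo\b", "a foo here"), found(r"\bfoo\b", "seafood"))
```

The table keeps each original pattern next to its translation so
that the correspondence stays visible; a general translator would
also have to handle cases such as \` and \' and \sCODE, which have
no direct one-to-one spelling in Python and are left out here.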