Sun Release 4.1        Last change: 2 October 1989        ED(1)

Regular Expressions
     ed supports a limited form of regular-expression notation,
     which can be used in a line address to specify lines by
     content.  A regular expression (RE) specifies a set of
     character strings to match against, such as "any string
     containing digits 5 through 9" or "only lines containing
     uppercase letters."  A member of this set of strings is said
     to be matched by the regular expression.

     Regular expressions, or patterns, are used to address lines
     in the buffer (see Addresses, above), and also to select
     strings to be replaced by the s (substitute) command.  Where
     multiple matches are present in a line, a regular expression
     matches the longest of the leftmost matching strings.

     Regular expressions can be built up from the following
     "single-character" RE's:

     c         Any ordinary character not listed below.  An
               ordinary character matches itself.

     \         Backslash.  When followed by a special character,
               the RE matches the "quoted" character.  A backslash
               followed by one of <, >, (, ), {, or } represents an
               operator in a regular expression, as described
               below.

     .         Dot.  Matches any single character except NEWLINE.

     ^         As the leftmost character, a caret (or circumflex)
               constrains the RE to match the leftmost portion of a
               line.  A match of this type is called an "anchored
               match" because it is "anchored" to a specific place
               in the line.  The ^ character loses its special
               meaning if it appears in any position other than the
               start of the RE.

     $         As the rightmost character, a dollar sign constrains
               the RE to match the rightmost portion of a line.  The
               $ character loses its special meaning if it appears
               in any position other than the end of the RE.

     ^RE$      The construction ^RE$ constrains the RE to match the
               entire line.
     \<        The sequence \< in an RE constrains the one-character
               RE immediately following it to match only at the
               beginning of a "word"; that is, either at the
               beginning of a line, or just before a letter, digit,
               or underscore and after a character that is not one
               of these.

     \>        The sequence \> in an RE constrains the one-character
               RE immediately preceding it to match only at the end
               of a "word."

     [c...]    A nonempty string of characters enclosed in square
               brackets matches any single character in the string.
               For example, [abcxyz] matches any single character
               from the set `abcxyz'.  When the first character of
               the string is a caret (^), the RE matches any
               character except NEWLINE and those in the remainder
               of the string.  For example, `[^45678]' matches any
               character except 4, 5, 6, 7, or 8.  A caret in any
               other position is interpreted as an ordinary
               character.

     []c...]   The right square bracket does not terminate the
               enclosed string if it is the first character (after
               an initial `^', if any) in the bracketed string.  In
               this position it is treated as an ordinary character.

     [l-r]     The minus sign, between two characters, indicates a
               range of consecutive ASCII characters to match.  For
               example, the range `[0-9]' is equivalent to the
               string `[0123456789]'.  Such a bracketed string of
               characters is known as a character class.  The `-'
               is treated as an ordinary character if it occurs
               first (or first after an initial ^) or last in the
               string.

     d         Delimiter character.  The character used to delimit
               an RE within a command is special for that command
               (for example, see how / is used in the g command,
               below).

     The following rules and special characters allow for
     constructing RE's from single-character RE's:

               A concatenation of RE's matches a concatenation of
               text strings, each of which is a match for a
               successive RE in the search pattern.

     *         A single-character RE followed by an asterisk (*)
               matches zero or more occurrences of the
               single-character RE.  Such a pattern is called a
               closure.
               For example, [a-z][a-z]* matches any string of one or
               more lowercase letters.

     \{m\}
     \{m,\}
     \{m,n\}   A one-character RE followed by \{m\}, \{m,\}, or
               \{m,n\} is an RE that matches a range of occurrences
               of the one-character RE.  The values of m and n must
               be nonnegative integers less than 256; \{m\} matches
               exactly m occurrences; \{m,\} matches at least m
               occurrences; \{m,n\} matches any number of
               occurrences between m and n, inclusive.  Whenever a
               choice exists, the RE matches as many occurrences as
               possible.

     \(...\)   An RE enclosed between the character sequences \(
               and \) matches whatever the unadorned RE matches, but
               saves the string matched by the enclosed RE in a
               numbered substring register.  There can be up to nine
               such substrings in an RE, and parenthesis operators
               can be nested.

     \n        Matches the contents of the nth substring register
               from the current RE.  This provides a mechanism for
               extracting matched substrings.  For example, the
               expression ^\(..*\)\1$ matches a line consisting
               entirely of two adjacent non-null appearances of the
               same string.  When nested parenthesized substrings
               are present, n is determined by counting occurrences
               of \( starting from the left.

     //        The null RE (//) is equivalent to the last RE
               encountered.
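     As a rough illustration only (this is not how ed itself is
     implemented), the sketch below translates the RE notation
     described above into Python's re syntax and searches a single
     line with it.  The names bre_to_python, _bracket, and Matcher
     are hypothetical.  It assumes lines contain no NEWLINE,
     approximates \< and \> with Python word-boundary assertions
     over ASCII letters, digits, and underscore, and does NOT
     implement ed's rule of choosing the longest of the leftmost
     matches: Python's backtracking matcher returns the first match
     it finds, which can be shorter.  Malformed REs simply raise an
     exception.

```python
import re
from typing import Optional

# Hypothetical helpers: an approximation of ed's RE rules using Python's re.


def _bracket(body: str) -> str:
    """Translate the inside of an ed [...] class into a Python class."""
    negate = body.startswith("^")
    if negate:
        body = body[1:]
    parts = []
    k = 0
    while k < len(body):
        if k + 2 < len(body) and body[k + 1] == "-":
            # l-r: a range of consecutive characters
            parts.append(re.escape(body[k]) + "-" + re.escape(body[k + 2]))
            k += 3
        else:
            # Anything else (a leading ']', a first or last '-') is ordinary.
            parts.append(re.escape(body[k]))
            k += 1
    # A negated class in ed never matches NEWLINE.
    return ("[^\\n" if negate else "[") + "".join(parts) + "]"


def bre_to_python(bre: str) -> str:
    """Translate an ed-style RE into an approximately equivalent Python pattern."""
    out = []
    i, n = 0, len(bre)
    while i < n:
        c = bre[i]
        if c == "\\" and i + 1 < n:
            nxt = bre[i + 1]
            i += 2
            if nxt in "()":
                out.append(nxt)                   # \( \) -> numbered group
            elif nxt == "{":
                end = bre.index("\\}", i)         # \{m,n\} -> {m,n}
                out.append("{" + bre[i:end] + "}")
                i = end + 2
            elif nxt == "<":
                out.append(r"\b(?=\w)")           # beginning of a word
            elif nxt == ">":
                out.append(r"\b(?<=\w)")          # end of a word
            elif nxt in "123456789":
                # Wrap so a following literal digit is not read as \10 etc.
                out.append("(?:\\" + nxt + ")")
            else:
                out.append(re.escape(nxt))        # quoted character
        elif c == "[":
            j = i + 1
            if j < n and bre[j] == "^":
                j += 1
            if j < n and bre[j] == "]":
                j += 1                            # leading ']' is ordinary
            end = bre.index("]", j)               # ValueError if unterminated
            out.append(_bracket(bre[i + 1:end]))
            i = end + 1
        else:
            if c == "^" and i == 0:
                out.append("^")                   # anchor: start of line
            elif c == "$" and i == n - 1:
                out.append("$")                   # anchor: end of line
            elif c == ".":
                out.append(".")                   # any character but NEWLINE
            elif c == "*" and out and out[-1] not in ("^", "("):
                out.append("*")                   # closure
            else:
                out.append(re.escape(c))          # ordinary character
            i += 1
    return "".join(out)


class Matcher:
    """Remembers the last RE so that the null RE ("", i.e. //) can reuse it."""

    def __init__(self) -> None:
        self.last: Optional[str] = None

    def search(self, bre: str, line: str) -> Optional[str]:
        if bre == "":
            if self.last is None:
                raise ValueError("no previous regular expression")
            bre = self.last
        self.last = bre
        m = re.search(bre_to_python(bre), line, re.ASCII)
        return m.group(0) if m else None
```

     For example, Matcher().search(r"^\(..*\)\1$", "abcabc")
     returns "abcabc", while the same RE finds no match in "abcab".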