Diffstat (limited to 'winsup/cygwin/regex/regex.7')
-rw-r--r-- | winsup/cygwin/regex/regex.7 | 480 |
1 files changed, 0 insertions, 480 deletions
diff --git a/winsup/cygwin/regex/regex.7 b/winsup/cygwin/regex/regex.7
deleted file mode 100644
index 79fecc197..000000000
--- a/winsup/cygwin/regex/regex.7
+++ /dev/null
@@ -1,480 +0,0 @@
-.\" Copyright (c) 1992, 1993, 1994 Henry Spencer.
-.\" Copyright (c) 1992, 1993, 1994
-.\" The Regents of the University of California. All rights reserved.
-.\"
-.\" This code is derived from software contributed to Berkeley by
-.\" Henry Spencer.
-.\"
-.\" Redistribution and use in source and binary forms, with or without
-.\" modification, are permitted provided that the following conditions
-.\" are met:
-.\" 1. Redistributions of source code must retain the above copyright
-.\" notice, this list of conditions and the following disclaimer.
-.\" 2. Redistributions in binary form must reproduce the above copyright
-.\" notice, this list of conditions and the following disclaimer in the
-.\" documentation and/or other materials provided with the distribution.
-.\" 4. Neither the name of the University nor the names of its contributors
-.\" may be used to endorse or promote products derived from this software
-.\" without specific prior written permission.
-.\"
-.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
-.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
-.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
-.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
-.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
-.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
-.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
-.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
-.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
-.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
-.\" SUCH DAMAGE.
-.\"
-.\" @(#)re_format.7 8.3 (Berkeley) 3/20/94
-.\" $FreeBSD: src/lib/libc/regex/re_format.7,v 1.12 2008/09/05 17:41:20 keramida Exp $
-.\"
-.Dd March 20, 1994
-.Dt RE_FORMAT 7
-.Os
-.Sh NAME
-.Nm re_format
-.Nd POSIX 1003.2 regular expressions
-.Sh DESCRIPTION
-Regular expressions
-.Pq Dq RE Ns s ,
-as defined in
-.St -p1003.2 ,
-come in two forms:
-modern REs (roughly those of
-.Xr egrep 1 ;
-1003.2 calls these
-.Dq extended
-REs)
-and obsolete REs (roughly those of
-.Xr ed 1 ;
-1003.2
-.Dq basic
-REs).
-Obsolete REs mostly exist for backward compatibility in some old programs;
-they will be discussed at the end.
-.St -p1003.2
-leaves some aspects of RE syntax and semantics open;
-`\(dd' marks decisions on these aspects that
-may not be fully portable to other
-.St -p1003.2
-implementations.
-.Pp
-A (modern) RE is one\(dd or more non-empty\(dd
-.Em branches ,
-separated by
-.Ql \&| .
-It matches anything that matches one of the branches.
-.Pp
-A branch is one\(dd or more
-.Em pieces ,
-concatenated.
-It matches a match for the first, followed by a match for the second, etc.
-.Pp
-A piece is an
-.Em atom
-possibly followed
-by a single\(dd
-.Ql \&* ,
-.Ql \&+ ,
-.Ql \&? ,
-or
-.Em bound .
-An atom followed by
-.Ql \&*
-matches a sequence of 0 or more matches of the atom.
-An atom followed by
-.Ql \&+
-matches a sequence of 1 or more matches of the atom.
-An atom followed by
-.Ql ?\&
-matches a sequence of 0 or 1 matches of the atom.
-.Pp
-A
-.Em bound
-is
-.Ql \&{
-followed by an unsigned decimal integer,
-possibly followed by
-.Ql \&,
-possibly followed by another unsigned decimal integer,
-always followed by
-.Ql \&} .
-The integers must lie between 0 and
-.Dv RE_DUP_MAX
-(255\(dd) inclusive,
-and if there are two of them, the first may not exceed the second.
-An atom followed by a bound containing one integer
-.Em i
-and no comma matches
-a sequence of exactly
-.Em i
-matches of the atom.
-An atom followed by a bound
-containing one integer
-.Em i
-and a comma matches
-a sequence of
-.Em i
-or more matches of the atom.
-An atom followed by a bound
-containing two integers
-.Em i
-and
-.Em j
-matches
-a sequence of
-.Em i
-through
-.Em j
-(inclusive) matches of the atom.
-.Pp
-An atom is a regular expression enclosed in
-.Ql ()
-(matching a match for the
-regular expression),
-an empty set of
-.Ql ()
-(matching the null string)\(dd,
-a
-.Em bracket expression
-(see below),
-.Ql .\&
-(matching any single character),
-.Ql \&^
-(matching the null string at the beginning of a line),
-.Ql \&$
-(matching the null string at the end of a line), a
-.Ql \e
-followed by one of the characters
-.Ql ^.[$()|*+?{\e
-(matching that character taken as an ordinary character),
-a
-.Ql \e
-followed by any other character\(dd
-(matching that character taken as an ordinary character,
-as if the
-.Ql \e
-had not been present\(dd),
-or a single character with no other significance (matching that character).
-A
-.Ql \&{
-followed by a character other than a digit is an ordinary
-character, not the beginning of a bound\(dd.
-It is illegal to end an RE with
-.Ql \e .
-.Pp
-A
-.Em bracket expression
-is a list of characters enclosed in
-.Ql [] .
-It normally matches any single character from the list (but see below).
-If the list begins with
-.Ql \&^ ,
-it matches any single character
-(but see below)
-.Em not
-from the rest of the list.
-If two characters in the list are separated by
-.Ql \&- ,
-this is shorthand
-for the full
-.Em range
-of characters between those two (inclusive) in the
-collating sequence,
-.No e.g. Ql [0-9]
-in ASCII matches any decimal digit.
-It is illegal\(dd for two ranges to share an
-endpoint,
-.No e.g. Ql a-c-e .
-Ranges are very collating-sequence-dependent,
-and portable programs should avoid relying on them.
-.Pp
-To include a literal
-.Ql \&]
-in the list, make it the first character
-(following a possible
-.Ql \&^ ) .
-To include a literal
-.Ql \&- ,
-make it the first or last character,
-or the second endpoint of a range.
-To use a literal
-.Ql \&-
-as the first endpoint of a range,
-enclose it in
-.Ql [.\&
-and
-.Ql .]\&
-to make it a collating element (see below).
-With the exception of these and some combinations using
-.Ql \&[
-(see next paragraphs), all other special characters, including
-.Ql \e ,
-lose their special significance within a bracket expression.
-.Pp
-Within a bracket expression, a collating element (a character,
-a multi-character sequence that collates as if it were a single character,
-or a collating-sequence name for either)
-enclosed in
-.Ql [.\&
-and
-.Ql .]\&
-stands for the
-sequence of characters of that collating element.
-The sequence is a single element of the bracket expression's list.
-A bracket expression containing a multi-character collating element
-can thus match more than one character,
-e.g.\& if the collating sequence includes a
-.Ql ch
-collating element,
-then the RE
-.Ql [[.ch.]]*c
-matches the first five characters
-of
-.Ql chchcc .
-.Pp
-Within a bracket expression, a collating element enclosed in
-.Ql [=
-and
-.Ql =]
-is an equivalence class, standing for the sequences of characters
-of all collating elements equivalent to that one, including itself.
-(If there are no other equivalent collating elements,
-the treatment is as if the enclosing delimiters were
-.Ql [.\&
-and
-.Ql .] . )
-For example, if
-.Ql x
-and
-.Ql y
-are the members of an equivalence class,
-then
-.Ql [[=x=]] ,
-.Ql [[=y=]] ,
-and
-.Ql [xy]
-are all synonymous.
-An equivalence class may not\(dd be an endpoint
-of a range.
-.Pp
-Within a bracket expression, the name of a
-.Em character class
-enclosed in
-.Ql [:
-and
-.Ql :]
-stands for the list of all characters belonging to that
-class.
-Standard character class names are:
-.Pp
-.Bl -column "alnum" "digit" "xdigit" -offset indent
-.It Em "alnum digit punct"
-.It Em "alpha graph space"
-.It Em "blank lower upper"
-.It Em "cntrl print xdigit"
-.El
-.Pp
-These stand for the character classes defined in
-.Xr ctype 3 .
-A locale may provide others.
-A character class may not be used as an endpoint of a range.
-.Pp
-A bracketed expression like
-.Ql [[:class:]]
-can be used to match a single character that belongs to a character
-class.
-The reverse, matching any character that does not belong to a specific
-class, the negation operator of bracket expressions may be used:
-.Ql [^[:class:]] .
-.Pp
-There are two special cases\(dd of bracket expressions:
-the bracket expressions
-.Ql [[:<:]]
-and
-.Ql [[:>:]]
-match the null string at the beginning and end of a word respectively.
-A word is defined as a sequence of word characters
-which is neither preceded nor followed by
-word characters.
-A word character is an
-.Em alnum
-character (as defined by
-.Xr ctype 3 )
-or an underscore.
-This is an extension,
-compatible with but not specified by
-.St -p1003.2 ,
-and should be used with
-caution in software intended to be portable to other systems.
-.Pp
-In the event that an RE could match more than one substring of a given
-string,
-the RE matches the one starting earliest in the string.
-If the RE could match more than one substring starting at that point,
-it matches the longest.
-Subexpressions also match the longest possible substrings, subject to
-the constraint that the whole match be as long as possible,
-with subexpressions starting earlier in the RE taking priority over
-ones starting later.
-Note that higher-level subexpressions thus take priority over
-their lower-level component subexpressions.
-.Pp
-Match lengths are measured in characters, not collating elements.
-A null string is considered longer than no match at all.
-For example,
-.Ql bb*
-matches the three middle characters of
-.Ql abbbc ,
-.Ql (wee|week)(knights|nights)
-matches all ten characters of
-.Ql weeknights ,
-when
-.Ql (.*).*\&
-is matched against
-.Ql abc
-the parenthesized subexpression
-matches all three characters, and
-when
-.Ql (a*)*
-is matched against
-.Ql bc
-both the whole RE and the parenthesized
-subexpression match the null string.
-.Pp
-If case-independent matching is specified,
-the effect is much as if all case distinctions had vanished from the
-alphabet.
-When an alphabetic that exists in multiple cases appears as an
-ordinary character outside a bracket expression, it is effectively
-transformed into a bracket expression containing both cases,
-.No e.g. Ql x
-becomes
-.Ql [xX] .
-When it appears inside a bracket expression, all case counterparts
-of it are added to the bracket expression, so that (e.g.)
-.Ql [x]
-becomes
-.Ql [xX]
-and
-.Ql [^x]
-becomes
-.Ql [^xX] .
-.Pp
-No particular limit is imposed on the length of REs\(dd.
-Programs intended to be portable should not employ REs longer
-than 256 bytes,
-as an implementation can refuse to accept such REs and remain
-POSIX-compliant.
-.Pp
-Obsolete
-.Pq Dq basic
-regular expressions differ in several respects.
-.Ql \&|
-is an ordinary character and there is no equivalent
-for its functionality.
-.Ql \&+
-and
-.Ql ?\&
-are ordinary characters, and their functionality
-can be expressed using bounds
-.No ( Ql {1,}
-or
-.Ql {0,1}
-respectively).
-Also note that
-.Ql x+
-in modern REs is equivalent to
-.Ql xx* .
-The delimiters for bounds are
-.Ql \e{
-and
-.Ql \e} ,
-with
-.Ql \&{
-and
-.Ql \&}
-by themselves ordinary characters.
-The parentheses for nested subexpressions are
-.Ql \e(
-and
-.Ql \e) ,
-with
-.Ql \&(
-and
-.Ql \&)
-by themselves ordinary characters.
-.Ql \&^
-is an ordinary character except at the beginning of the
-RE or\(dd the beginning of a parenthesized subexpression,
-.Ql \&$
-is an ordinary character except at the end of the
-RE or\(dd the end of a parenthesized subexpression,
-and
-.Ql \&*
-is an ordinary character if it appears at the beginning of the
-RE or the beginning of a parenthesized subexpression
-(after a possible leading
-.Ql \&^ ) .
-Finally, there is one new type of atom, a
-.Em back reference :
-.Ql \e
-followed by a non-zero decimal digit
-.Em d
-matches the same sequence of characters
-matched by the
-.Em d Ns th
-parenthesized subexpression
-(numbering subexpressions by the positions of their opening parentheses,
-left to right),
-so that (e.g.)
-.Ql \e([bc]\e)\e1
-matches
-.Ql bb
-or
-.Ql cc
-but not
-.Ql bc .
-.Sh SEE ALSO
-.Xr regex 3
-.Rs
-.%T Regular Expression Notation
-.%R IEEE Std
-.%N 1003.2
-.%P section 2.8
-.Re
-.Sh BUGS
-Having two kinds of REs is a botch.
-.Pp
-The current
-.St -p1003.2
-spec says that
-.Ql \&)
-is an ordinary character in
-the absence of an unmatched
-.Ql \&( ;
-this was an unintentional result of a wording error,
-and change is likely.
-Avoid relying on it.
-.Pp
-Back references are a dreadful botch,
-posing major problems for efficient implementations.
-They are also somewhat vaguely defined
-(does
-.Ql a\e(\e(b\e)*\e2\e)*d
-match
-.Ql abbbd ? ) .
-Avoid using them.
-.Pp
-.St -p1003.2
-specification of case-independent matching is vague.
-The
-.Dq one case implies all cases
-definition given above
-is current consensus among implementors as to the right interpretation.
-.Pp
-The syntax for word boundaries is incredibly ugly.
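The rules for pieces, bounds and branches in the deleted page can be exercised through the POSIX `regcomp`/`regexec` interface that its SEE ALSO section points to (regex(3)). This is a minimal sketch, assuming a POSIX-conforming `<regex.h>`; `re_search`, `ere_cases` and `check_ere_cases` are hypothetical names, not library functions. Every pattern is anchored with `^...$` so each row tests the whole subject string, and no row relies on a decision the page marks with `\(dd`, since those may differ between implementations.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper (not a library function): compiles `pattern` with
 * `cflags` and reports whether it matches anywhere in `subject`.
 * Returns 1 on a match, 0 on no match, -1 if regcomp() rejects the RE. */
static int re_search(const char *pattern, const char *subject, int cflags)
{
    regex_t re;

    if (regcomp(&re, pattern, cflags) != 0)
        return -1;
    /* nmatch = 0: only a yes/no answer is needed, not match offsets. */
    int rc = regexec(&re, subject, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}

/* Modern (extended) REs; expected is 1 = match, 0 = no match,
 * -1 = regcomp() is expected to reject the pattern. */
static const struct {
    const char *pattern, *subject;
    int expected;
} ere_cases[] = {
    { "^ab*c$",     "ac",      1 },  /* '*': 0 or more matches of the atom */
    { "^ab+c$",     "ac",      0 },  /* '+': 1 or more matches */
    { "^ab?c$",     "abbc",    0 },  /* '?': 0 or 1 matches */
    { "^a{2}$",     "aa",      1 },  /* {i}: exactly i matches */
    { "^a{2,}$",    "aaaaa",   1 },  /* {i,}: i or more matches */
    { "^a{2,3}$",   "aaaa",    0 },  /* {i,j}: i through j matches */
    { "^a{3,2}$",   "aaa",    -1 },  /* first integer exceeds the second */
    { "^(ab|cd)+$", "abcdab",  1 },  /* branches separated by '|' */
    { "^a\\.b$",    "axb",     0 },  /* "\." is an ordinary period */
};

/* Returns the number of rows whose result differs from `expected`. */
static int check_ere_cases(void)
{
    int failures = 0;
    for (size_t i = 0; i < sizeof ere_cases / sizeof ere_cases[0]; i++)
        if (re_search(ere_cases[i].pattern, ere_cases[i].subject,
                      REG_EXTENDED) != ere_cases[i].expected)
            failures++;
    return failures;
}
```

A return value of 0 from `check_ere_cases()` means the local library agrees with the page on these rows. The `{3,2}` row expects `regcomp` to fail because the page requires that the first integer of a bound not exceed the second.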
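The bracket-expression rules in the deleted page can be checked in the same way with POSIX `regcomp`/`regexec`. This sketch assumes a POSIX-conforming `<regex.h>` and the default "C" locale (no `setlocale` call), where a range such as `0-9` follows ASCII order; `re_search`, `bracket_cases` and `check_bracket_cases` are hypothetical names. It covers the placement rules for a literal `]` and `-`, the rule that `\` is ordinary inside brackets, a single-character equivalence class, and the standard character classes. The word-boundary forms `[[:<:]]` and `[[:>:]]` are left out because the page describes them as an extension not specified by 1003.2.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: 1 = match, 0 = no match, -1 = regcomp() failed. */
static int re_search(const char *pattern, const char *subject, int cflags)
{
    regex_t re;

    if (regcomp(&re, pattern, cflags) != 0)
        return -1;
    /* nmatch = 0: only a yes/no answer is needed. */
    int rc = regexec(&re, subject, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}

/* Extended REs exercising bracket expressions, anchored with ^...$. */
static const struct {
    const char *pattern, *subject;
    int expected;
} bracket_cases[] = {
    { "^[]a]+$",        "]a]",  1 },  /* ']' first in the list is literal */
    { "^[a-]+$",        "-a-",  1 },  /* '-' last in the list is literal */
    { "^[^0-9]+$",      "a1",   0 },  /* leading '^' negates the list */
    { "^[\\]$",         "\\",   1 },  /* RE ^[\]$: backslash is ordinary here */
    { "^[[=a=]]$",      "a",    1 },  /* equivalence class with one member */
    { "^[[:digit:]]+$", "2024", 1 },  /* character class */
    { "^[^[:space:]]+$", "a b", 0 },  /* negated character class */
    { "^[[:alpha:]_][[:alnum:]_]*$", "_x9", 1 },
    { "^[[:alpha:]_][[:alnum:]_]*$", "9x",  0 },
};

/* Returns the number of rows whose result differs from `expected`. */
static int check_bracket_cases(void)
{
    int failures = 0;
    for (size_t i = 0; i < sizeof bracket_cases / sizeof bracket_cases[0]; i++)
        if (re_search(bracket_cases[i].pattern, bracket_cases[i].subject,
                      REG_EXTENDED) != bracket_cases[i].expected)
            failures++;
    return failures;
}
```

The last two rows combine a class with a literal `_` in one list, the usual way to describe an identifier-like token without relying on ranges.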
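The leftmost-longest rule in the deleted page becomes visible once match offsets are requested through `regmatch_t`. This sketch assumes a POSIX-conforming `<regex.h>`; `ere_span`, `span_cases` and `check_span_cases` are hypothetical names. It reruns the page's own examples and adds `a|ab` against `abc`, where the longest match at the earliest start selects `ab` rather than the first alternative.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: finds the leftmost match of the extended RE
 * `pattern` in `subject` and stores the byte offsets of the whole match
 * in *so and *eo. Returns 1 on a match, 0 on no match, -1 if regcomp()
 * rejects the pattern. */
static int ere_span(const char *pattern, const char *subject,
                    regoff_t *so, regoff_t *eo)
{
    regex_t re;
    regmatch_t m[1];

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    int rc = regexec(&re, subject, 1, m, 0);
    regfree(&re);
    if (rc != 0)
        return 0;
    *so = m[0].rm_so;  /* offset of the first matched character */
    *eo = m[0].rm_eo;  /* offset one past the last matched character */
    return 1;
}

static const struct {
    const char *pattern, *subject;
    regoff_t so, eo;
} span_cases[] = {
    { "bb*",                        "abbbc",      1, 4  },  /* three middle b's */
    { "(wee|week)(knights|nights)", "weeknights", 0, 10 },  /* all ten chars */
    { "a|ab",                       "abc",        0, 2  },  /* longest wins */
    { "(a*)*",                      "bc",         0, 0  },  /* null string */
};

/* Returns the number of rows whose whole-match span differs. */
static int check_span_cases(void)
{
    int failures = 0;
    for (size_t i = 0; i < sizeof span_cases / sizeof span_cases[0]; i++) {
        regoff_t so = -1, eo = -1;
        if (ere_span(span_cases[i].pattern, span_cases[i].subject,
                     &so, &eo) != 1
            || so != span_cases[i].so || eo != span_cases[i].eo)
            failures++;
    }
    return failures;
}
```

Only the whole-match offsets (`m[0]`) are checked. The page also states how subexpressions divide a match, but implementations may diverge on those finer rules, so a portable check of `m[1]` and later entries is better written separately against a specific library.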
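Case-independent matching, as described in the deleted page, is requested with the `REG_ICASE` flag to `regcomp`. This sketch assumes a POSIX-conforming `<regex.h>` and the default "C" locale; `re_search`, `icase_cases` and `check_icase_cases` are hypothetical names. It checks the "one case implies all cases" interpretation: an ordinary `x` behaves like `[xX]`, and `[^x]` behaves like `[^xX]`, so it no longer matches `X`.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: 1 = match, 0 = no match, -1 = regcomp() failed. */
static int re_search(const char *pattern, const char *subject, int cflags)
{
    regex_t re;

    if (regcomp(&re, pattern, cflags) != 0)
        return -1;
    /* nmatch = 0: only a yes/no answer is needed. */
    int rc = regexec(&re, subject, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}

static const struct {
    const char *pattern, *subject;
    int cflags, expected;
} icase_cases[] = {
    { "^x$",     "X",   REG_EXTENDED | REG_ICASE, 1 },  /* x acts like [xX] */
    { "^[x]$",   "X",   REG_EXTENDED | REG_ICASE, 1 },  /* [x] acts like [xX] */
    { "^[^x]$",  "X",   REG_EXTENDED | REG_ICASE, 0 },  /* [^x] acts like [^xX] */
    { "^[^x]$",  "X",   REG_EXTENDED,             1 },  /* case-sensitive baseline */
    { "^[xy]+$", "XyX", REG_EXTENDED | REG_ICASE, 1 },
};

/* Returns the number of rows whose result differs from `expected`. */
static int check_icase_cases(void)
{
    int failures = 0;
    for (size_t i = 0; i < sizeof icase_cases / sizeof icase_cases[0]; i++)
        if (re_search(icase_cases[i].pattern, icase_cases[i].subject,
                      icase_cases[i].cflags) != icase_cases[i].expected)
            failures++;
    return failures;
}
```

The fourth row is the same RE and subject as the third without `REG_ICASE`, which isolates the flag as the only difference.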
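Obsolete (basic) REs, as described at the end of the deleted page, are what `regcomp` compiles when `REG_EXTENDED` is not set. This sketch assumes a POSIX-conforming `<regex.h>`; `re_search`, `bre_cases` and `check_bre_cases` are hypothetical names. It contrasts the two forms and checks the page's back-reference example. Inside C string literals every regex backslash is doubled, so `"^\\([bc]\\)\\1$"` is the RE `^\([bc]\)\1$`.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: 1 = match, 0 = no match, -1 = regcomp() failed. */
static int re_search(const char *pattern, const char *subject, int cflags)
{
    regex_t re;

    if (regcomp(&re, pattern, cflags) != 0)
        return -1;
    /* nmatch = 0: only a yes/no answer is needed. */
    int rc = regexec(&re, subject, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}

/* cflags 0 selects basic REs; REG_EXTENDED selects modern REs. */
static const struct {
    const char *pattern, *subject;
    int cflags, expected;
} bre_cases[] = {
    { "^a+$",             "a+",   0,            1 },  /* '+' is ordinary */
    { "^a+$",             "aa",   0,            0 },
    { "^a+$",             "aa",   REG_EXTENDED, 1 },  /* ERE: '+' is a piece */
    { "^a|b$",            "a|b",  0,            1 },  /* '|' is ordinary */
    { "^a{2}$",           "a{2}", 0,            1 },  /* bare braces are ordinary */
    { "^a\\{2\\}$",       "aa",   0,            1 },  /* bound uses \{ \} */
    { "^\\(ab\\)*$",      "abab", 0,            1 },  /* group uses \( \) */
    { "^*a$",             "*a",   0,            1 },  /* leading '*' is ordinary */
    { "^\\([bc]\\)\\1$",  "bb",   0,            1 },  /* back reference */
    { "^\\([bc]\\)\\1$",  "cc",   0,            1 },
    { "^\\([bc]\\)\\1$",  "bc",   0,            0 },
};

/* Returns the number of rows whose result differs from `expected`. */
static int check_bre_cases(void)
{
    int failures = 0;
    for (size_t i = 0; i < sizeof bre_cases / sizeof bre_cases[0]; i++)
        if (re_search(bre_cases[i].pattern, bre_cases[i].subject,
                      bre_cases[i].cflags) != bre_cases[i].expected)
            failures++;
    return failures;
}
```

The `^*a$` row relies on the rule that `*` is ordinary after a leading `^`; the back-reference rows follow the page's own `bb`/`cc`/`bc` example.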