Welcome to mirror list, hosted at ThFree Co, Russian Federation.

cygwin.com/git/newlib-cygwin.git - Unnamed repository; edit this file 'description' to name the repository.
summaryrefslogtreecommitdiff
diff options
context:
space:
mode:
Diffstat (limited to 'newlib/libc/sys/linux/stdlib/regex.3')
-rw-r--r--newlib/libc/sys/linux/stdlib/regex.3701
1 files changed, 701 insertions, 0 deletions
diff --git a/newlib/libc/sys/linux/stdlib/regex.3 b/newlib/libc/sys/linux/stdlib/regex.3
new file mode 100644
index 000000000..d87164177
--- /dev/null
+++ b/newlib/libc/sys/linux/stdlib/regex.3
@@ -0,0 +1,701 @@
+.\" Copyright (c) 1992, 1993, 1994 Henry Spencer.
+.\" Copyright (c) 1992, 1993, 1994
+.\" The Regents of the University of California. All rights reserved.
+.\"
+.\" This code is derived from software contributed to Berkeley by
+.\" Henry Spencer.
+.\"
+.\" Redistribution and use in source and binary forms, with or without
+.\" modification, are permitted provided that the following conditions
+.\" are met:
+.\" 1. Redistributions of source code must retain the above copyright
+.\" notice, this list of conditions and the following disclaimer.
+.\" 2. Redistributions in binary form must reproduce the above copyright
+.\" notice, this list of conditions and the following disclaimer in the
+.\" documentation and/or other materials provided with the distribution.
+.\" 3. All advertising materials mentioning features or use of this software
+.\" must display the following acknowledgement:
+.\" This product includes software developed by the University of
+.\" California, Berkeley and its contributors.
+.\" 4. Neither the name of the University nor the names of its contributors
+.\" may be used to endorse or promote products derived from this software
+.\" without specific prior written permission.
+.\"
+.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
+.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
+.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+.\" SUCH DAMAGE.
+.\"
+.\" @(#)regex.3 8.4 (Berkeley) 3/20/94
+.\" $FreeBSD: src/lib/libc/regex/regex.3,v 1.9 2001/10/01 16:08:58 ru Exp $
+.\"
+.Dd March 20, 1994
+.Dt REGEX 3
+.Os
+.Sh NAME
+.Nm regcomp ,
+.Nm regexec ,
+.Nm regerror ,
+.Nm regfree
+.Nd regular-expression library
+.Sh LIBRARY
+.Lb libc
+.Sh SYNOPSIS
+.In sys/types.h
+.In regex.h
+.Ft int
+.Fn regcomp "regex_t *preg" "const char *pattern" "int cflags"
+.Ft int
+.Fo regexec
+.Fa "const regex_t *preg" "const char *string"
+.Fa "size_t nmatch" "regmatch_t pmatch[]" "int eflags"
+.Fc
+.Ft size_t
+.Fo regerror
+.Fa "int errcode" "const regex_t *preg"
+.Fa "char *errbuf" "size_t errbuf_size"
+.Fc
+.Ft void
+.Fn regfree "regex_t *preg"
+.Sh DESCRIPTION
+These routines implement
+.St -p1003.2
+regular expressions
+.Pq Do RE Dc Ns s ;
+see
+.Xr re_format 7 .
+.Fn Regcomp
+compiles an RE written as a string into an internal form,
+.Fn regexec
+matches that internal form against a string and reports results,
+.Fn regerror
+transforms error codes from either into human-readable messages,
+and
+.Fn regfree
+frees any dynamically-allocated storage used by the internal form
+of an RE.
+.Pp
+The header
+.Aq Pa regex.h
+declares two structure types,
+.Ft regex_t
+and
+.Ft regmatch_t ,
+the former for compiled internal forms and the latter for match reporting.
+It also declares the four functions,
+a type
+.Ft regoff_t ,
+and a number of constants with names starting with
+.Dq Dv REG_ .
+.Pp
+.Fn Regcomp
+compiles the regular expression contained in the
+.Fa pattern
+string,
+subject to the flags in
+.Fa cflags ,
+and places the results in the
+.Ft regex_t
+structure pointed to by
+.Fa preg .
+.Fa Cflags
+is the bitwise OR of zero or more of the following flags:
+.Bl -tag -width REG_EXTENDED
+.It Dv REG_EXTENDED
+Compile modern
+.Pq Dq extended
+REs,
+rather than the obsolete
+.Pq Dq basic
+REs that
+are the default.
+.It Dv REG_BASIC
+This is a synonym for 0,
+provided as a counterpart to
+.Dv REG_EXTENDED
+to improve readability.
+.It Dv REG_NOSPEC
+Compile with recognition of all special characters turned off.
+All characters are thus considered ordinary,
+so the
+.Dq RE
+is a literal string.
+This is an extension,
+compatible with but not specified by
+.St -p1003.2 ,
+and should be used with
+caution in software intended to be portable to other systems.
+.Dv REG_EXTENDED
+and
+.Dv REG_NOSPEC
+may not be used
+in the same call to
+.Fn regcomp .
+.It Dv REG_ICASE
+Compile for matching that ignores upper/lower case distinctions.
+See
+.Xr re_format 7 .
+.It Dv REG_NOSUB
+Compile for matching that need only report success or failure,
+not what was matched.
+.It Dv REG_NEWLINE
+Compile for newline-sensitive matching.
+By default, newline is a completely ordinary character with no special
+meaning in either REs or strings.
+With this flag,
+.Ql [^
+bracket expressions and
+.Ql .\&
+never match newline,
+a
+.Ql ^\&
+anchor matches the null string after any newline in the string
+in addition to its normal function,
+and the
+.Ql $\&
+anchor matches the null string before any newline in the
+string in addition to its normal function.
+.It Dv REG_PEND
+The regular expression ends,
+not at the first NUL,
+but just before the character pointed to by the
+.Va re_endp
+member of the structure pointed to by
+.Fa preg .
+The
+.Va re_endp
+member is of type
+.Ft "const char *" .
+This flag permits inclusion of NULs in the RE;
+they are considered ordinary characters.
+This is an extension,
+compatible with but not specified by
+.St -p1003.2 ,
+and should be used with
+caution in software intended to be portable to other systems.
+.El
+.Pp
+When successful,
+.Fn regcomp
+returns 0 and fills in the structure pointed to by
+.Fa preg .
+One member of that structure
+(other than
+.Va re_endp )
+is publicized:
+.Va re_nsub ,
+of type
+.Ft size_t ,
+contains the number of parenthesized subexpressions within the RE
+(except that the value of this member is undefined if the
+.Dv REG_NOSUB
+flag was used).
+If
+.Fn regcomp
+fails, it returns a non-zero error code;
+see
+.Sx DIAGNOSTICS .
+.Pp
+.Fn Regexec
+matches the compiled RE pointed to by
+.Fa preg
+against the
+.Fa string ,
+subject to the flags in
+.Fa eflags ,
+and reports results using
+.Fa nmatch ,
+.Fa pmatch ,
+and the returned value.
+The RE must have been compiled by a previous invocation of
+.Fn regcomp .
+The compiled form is not altered during execution of
+.Fn regexec ,
+so a single compiled RE can be used simultaneously by multiple threads.
+.Pp
+By default,
+the NUL-terminated string pointed to by
+.Fa string
+is considered to be the text of an entire line, minus any terminating
+newline.
+The
+.Fa eflags
+argument is the bitwise OR of zero or more of the following flags:
+.Bl -tag -width REG_STARTEND
+.It Dv REG_NOTBOL
+The first character of
+the string
+is not the beginning of a line, so the
+.Ql ^\&
+anchor should not match before it.
+This does not affect the behavior of newlines under
+.Dv REG_NEWLINE .
+.It Dv REG_NOTEOL
+The NUL terminating
+the string
+does not end a line, so the
+.Ql $\&
+anchor should not match before it.
+This does not affect the behavior of newlines under
+.Dv REG_NEWLINE .
+.It Dv REG_STARTEND
+The string is considered to start at
+.Fa string
++
+.Fa pmatch Ns [0]. Ns Va rm_so
+and to have a terminating NUL located at
+.Fa string
++
+.Fa pmatch Ns [0]. Ns Va rm_eo
+(there need not actually be a NUL at that location),
+regardless of the value of
+.Fa nmatch .
+See below for the definition of
+.Fa pmatch
+and
+.Fa nmatch .
+This is an extension,
+compatible with but not specified by
+.St -p1003.2 ,
+and should be used with
+caution in software intended to be portable to other systems.
+Note that a non-zero
+.Va rm_so
+does not imply
+.Dv REG_NOTBOL ;
+.Dv REG_STARTEND
+affects only the location of the string,
+not how it is matched.
+.El
+.Pp
+See
+.Xr re_format 7
+for a discussion of what is matched in situations where an RE or a
+portion thereof could match any of several substrings of
+.Fa string .
+.Pp
+Normally,
+.Fn regexec
+returns 0 for success and the non-zero code
+.Dv REG_NOMATCH
+for failure.
+Other non-zero error codes may be returned in exceptional situations;
+see
+.Sx DIAGNOSTICS .
+.Pp
+If
+.Dv REG_NOSUB
+was specified in the compilation of the RE,
+or if
+.Fa nmatch
+is 0,
+.Fn regexec
+ignores the
+.Fa pmatch
+argument (but see below for the case where
+.Dv REG_STARTEND
+is specified).
+Otherwise,
+.Fa pmatch
+points to an array of
+.Fa nmatch
+structures of type
+.Ft regmatch_t .
+Such a structure has at least the members
+.Va rm_so
+and
+.Va rm_eo ,
+both of type
+.Ft regoff_t
+(a signed arithmetic type at least as large as an
+.Ft off_t
+and a
+.Ft ssize_t ) ,
+containing respectively the offset of the first character of a substring
+and the offset of the first character after the end of the substring.
+Offsets are measured from the beginning of the
+.Fa string
+argument given to
+.Fn regexec .
+An empty substring is denoted by equal offsets,
+both indicating the character following the empty substring.
+.Pp
+The 0th member of the
+.Fa pmatch
+array is filled in to indicate what substring of
+.Fa string
+was matched by the entire RE.
+Remaining members report what substring was matched by parenthesized
+subexpressions within the RE;
+member
+.Va i
+reports subexpression
+.Va i ,
+with subexpressions counted (starting at 1) by the order of their opening
+parentheses in the RE, left to right.
+Unused entries in the array (corresponding either to subexpressions that
+did not participate in the match at all, or to subexpressions that do not
+exist in the RE (that is,
+.Va i
+>
+.Fa preg Ns -> Ns Va re_nsub ) )
+have both
+.Va rm_so
+and
+.Va rm_eo
+set to -1.
+If a subexpression participated in the match several times,
+the reported substring is the last one it matched.
+(Note, as an example in particular, that when the RE
+.Ql "(b*)+"
+matches
+.Ql bbb ,
+the parenthesized subexpression matches each of the three
+.So Li b Sc Ns s
+and then
+an infinite number of empty strings following the last
+.Ql b ,
+so the reported substring is one of the empties.)
+.Pp
+If
+.Dv REG_STARTEND
+is specified,
+.Fa pmatch
+must point to at least one
+.Ft regmatch_t
+(even if
+.Fa nmatch
+is 0 or
+.Dv REG_NOSUB
+was specified),
+to hold the input offsets for
+.Dv REG_STARTEND .
+Use for output is still entirely controlled by
+.Fa nmatch ;
+if
+.Fa nmatch
+is 0 or
+.Dv REG_NOSUB
+was specified,
+the value of
+.Fa pmatch Ns [0]
+will not be changed by a successful
+.Fn regexec .
+.Pp
+.Fn Regerror
+maps a non-zero
+.Fa errcode
+from either
+.Fn regcomp
+or
+.Fn regexec
+to a human-readable, printable message.
+If
+.Fa preg
+is
+.No non\- Ns Dv NULL ,
+the error code should have arisen from use of
+the
+.Ft regex_t
+pointed to by
+.Fa preg ,
+and if the error code came from
+.Fn regcomp ,
+it should have been the result from the most recent
+.Fn regcomp
+using that
+.Ft regex_t .
+.No ( Fn Regerror
+may be able to supply a more detailed message using information
+from the
+.Ft regex_t . )
+.Fn Regerror
+places the NUL-terminated message into the buffer pointed to by
+.Fa errbuf ,
+limiting the length (including the NUL) to at most
+.Fa errbuf_size
+bytes.
+If the whole message won't fit,
+as much of it as will fit before the terminating NUL is supplied.
+In any case,
+the returned value is the size of buffer needed to hold the whole
+message (including terminating NUL).
+If
+.Fa errbuf_size
+is 0,
+.Fa errbuf
+is ignored but the return value is still correct.
+.Pp
+If the
+.Fa errcode
+given to
+.Fn regerror
+is first ORed with
+.Dv REG_ITOA ,
+the
+.Dq message
+that results is the printable name of the error code,
+e.g.\&
+.Dq Dv REG_NOMATCH ,
+rather than an explanation thereof.
+If
+.Fa errcode
+is
+.Dv REG_ATOI ,
+then
+.Fa preg
+shall be
+.No non\- Ns Dv NULL
+and the
+.Va re_endp
+member of the structure it points to
+must point to the printable name of an error code;
+in this case, the result in
+.Fa errbuf
+is the decimal digits of
+the numeric value of the error code
+(0 if the name is not recognized).
+.Dv REG_ITOA
+and
+.Dv REG_ATOI
+are intended primarily as debugging facilities;
+they are extensions,
+compatible with but not specified by
+.St -p1003.2 ,
+and should be used with
+caution in software intended to be portable to other systems.
+Be warned also that they are considered experimental and changes are possible.
+.Pp
+.Fn Regfree
+frees any dynamically-allocated storage associated with the compiled RE
+pointed to by
+.Fa preg .
+The remaining
+.Ft regex_t
+is no longer a valid compiled RE
+and the effect of supplying it to
+.Fn regexec
+or
+.Fn regerror
+is undefined.
+.Pp
+None of these functions references global variables except for tables
+of constants;
+all are safe for use from multiple threads if the arguments are safe.
+.Sh IMPLEMENTATION CHOICES
+There are a number of decisions that
+.St -p1003.2
+leaves up to the implementor,
+either by explicitly saying
+.Dq undefined
+or by virtue of them being
+forbidden by the RE grammar.
+This implementation treats them as follows.
+.Pp
+See
+.Xr re_format 7
+for a discussion of the definition of case-independent matching.
+.Pp
+There is no particular limit on the length of REs,
+except insofar as memory is limited.
+Memory usage is approximately linear in RE size, and largely insensitive
+to RE complexity, except for bounded repetitions.
+See
+.Sx BUGS
+for one short RE using them
+that will run almost any system out of memory.
+.Pp
+A backslashed character other than one specifically given a magic meaning
+by
+.St -p1003.2
+(such magic meanings occur only in obsolete
+.Bq Dq basic
+REs)
+is taken as an ordinary character.
+.Pp
+Any unmatched
+.Ql [\&
+is a
+.Dv REG_EBRACK
+error.
+.Pp
+Equivalence classes cannot begin or end bracket-expression ranges.
+The endpoint of one range cannot begin another.
+.Pp
+.Dv RE_DUP_MAX ,
+the limit on repetition counts in bounded repetitions, is 255.
+.Pp
+A repetition operator
+.Ql ( ?\& ,
+.Ql *\& ,
+.Ql +\& ,
+or bounds)
+cannot follow another
+repetition operator.
+A repetition operator cannot begin an expression or subexpression
+or follow
+.Ql ^\&
+or
+.Ql |\& .
+.Pp
+.Ql |\&
+cannot appear first or last in a (sub)expression or after another
+.Ql |\& ,
+i.e. an operand of
+.Ql |\&
+cannot be an empty subexpression.
+An empty parenthesized subexpression,
+.Ql "()" ,
+is legal and matches an
+empty (sub)string.
+An empty string is not a legal RE.
+.Pp
+A
+.Ql {\&
+followed by a digit is considered the beginning of bounds for a
+bounded repetition, which must then follow the syntax for bounds.
+A
+.Ql {\&
+.Em not
+followed by a digit is considered an ordinary character.
+.Pp
+.Ql ^\&
+and
+.Ql $\&
+beginning and ending subexpressions in obsolete
+.Pq Dq basic
+REs are anchors, not ordinary characters.
+.Sh SEE ALSO
+.Xr grep 1 ,
+.Xr re_format 7
+.Pp
+.St -p1003.2 ,
+sections 2.8 (Regular Expression Notation)
+and
+B.5 (C Binding for Regular Expression Matching).
+.Sh DIAGNOSTICS
+Non-zero error codes from
+.Fn regcomp
+and
+.Fn regexec
+include the following:
+.Pp
+.Bl -tag -width REG_ECOLLATE -compact
+.It Dv REG_NOMATCH
+.Fn regexec
+failed to match
+.It Dv REG_BADPAT
+invalid regular expression
+.It Dv REG_ECOLLATE
+invalid collating element
+.It Dv REG_ECTYPE
+invalid character class
+.It Dv REG_EESCAPE
+.Ql \e
+applied to unescapable character
+.It Dv REG_ESUBREG
+invalid backreference number
+.It Dv REG_EBRACK
+brackets
+.Ql "[ ]"
+not balanced
+.It Dv REG_EPAREN
+parentheses
+.Ql "( )"
+not balanced
+.It Dv REG_EBRACE
+braces
+.Ql "{ }"
+not balanced
+.It Dv REG_BADBR
+invalid repetition count(s) in
+.Ql "{ }"
+.It Dv REG_ERANGE
+invalid character range in
+.Ql "[ ]"
+.It Dv REG_ESPACE
+ran out of memory
+.It Dv REG_BADRPT
+.Ql ?\& ,
+.Ql *\& ,
+or
+.Ql +\&
+operand invalid
+.It Dv REG_EMPTY
+empty (sub)expression
+.It Dv REG_ASSERT
+can't happen - you found a bug
+.It Dv REG_INVARG
+invalid argument, e.g. negative-length string
+.El
+.Sh HISTORY
+Originally written by
+.An Henry Spencer .
+Altered for inclusion in the
+.Bx 4.4
+distribution.
+.Sh BUGS
+This is an alpha release with known defects.
+Please report problems.
+.Pp
+The back-reference code is subtle and doubts linger about its correctness
+in complex cases.
+.Pp
+.Fn Regexec
+performance is poor.
+This will improve with later releases.
+.Fa Nmatch
+exceeding 0 is expensive;
+.Fa nmatch
+exceeding 1 is worse.
+.Fn Regexec
+is largely insensitive to RE complexity
+.Em except
+that back
+references are massively expensive.
+RE length does matter; in particular, there is a strong speed bonus
+for keeping RE length under about 30 characters,
+with most special characters counting roughly double.
+.Pp
+.Fn Regcomp
+implements bounded repetitions by macro expansion,
+which is costly in time and space if counts are large
+or bounded repetitions are nested.
+An RE like, say,
+.Ql "((((a{1,100}){1,100}){1,100}){1,100}){1,100}"
+will (eventually) run almost any existing machine out of swap space.
+.Pp
+There are suspected problems with response to obscure error conditions.
+Notably,
+certain kinds of internal overflow,
+produced only by truly enormous REs or by multiply nested bounded repetitions,
+are probably not handled well.
+.Pp
+Due to a mistake in
+.St -p1003.2 ,
+things like
+.Ql "a)b"
+are legal REs because
+.Ql )\&
+is
+a special character only in the presence of a previous unmatched
+.Ql (\& .
+This can't be fixed until the spec is fixed.
+.Pp
+The standard's definition of back references is vague.
+For example, does
+.Ql "a\e(\e(b\e)*\e2\e)*d"
+match
+.Ql "abbbd" ?
+Until the standard is clarified,
+behavior in such cases should not be relied on.
+.Pp
+The implementation of word-boundary matching is a bit of a kludge,
+and bugs may lurk in combinations of word-boundary matching and anchoring.