Welcome to mirror list, hosted at ThFree Co, Russian Federation.

cygwin.com/git/newlib-cygwin.git - Unnamed repository; edit this file 'description' to name the repository.
summaryrefslogtreecommitdiff
diff options
context:
space:
mode:
Diffstat (limited to 'newlib/libc/sys/linux/stdlib/regex.3')
-rw-r--r--newlib/libc/sys/linux/stdlib/regex.3701
1 files changed, 0 insertions, 701 deletions
diff --git a/newlib/libc/sys/linux/stdlib/regex.3 b/newlib/libc/sys/linux/stdlib/regex.3
deleted file mode 100644
index d87164177..000000000
--- a/newlib/libc/sys/linux/stdlib/regex.3
+++ /dev/null
@@ -1,701 +0,0 @@
-.\" Copyright (c) 1992, 1993, 1994 Henry Spencer.
-.\" Copyright (c) 1992, 1993, 1994
-.\" The Regents of the University of California. All rights reserved.
-.\"
-.\" This code is derived from software contributed to Berkeley by
-.\" Henry Spencer.
-.\"
-.\" Redistribution and use in source and binary forms, with or without
-.\" modification, are permitted provided that the following conditions
-.\" are met:
-.\" 1. Redistributions of source code must retain the above copyright
-.\" notice, this list of conditions and the following disclaimer.
-.\" 2. Redistributions in binary form must reproduce the above copyright
-.\" notice, this list of conditions and the following disclaimer in the
-.\" documentation and/or other materials provided with the distribution.
-.\" 3. All advertising materials mentioning features or use of this software
-.\" must display the following acknowledgement:
-.\" This product includes software developed by the University of
-.\" California, Berkeley and its contributors.
-.\" 4. Neither the name of the University nor the names of its contributors
-.\" may be used to endorse or promote products derived from this software
-.\" without specific prior written permission.
-.\"
-.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
-.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
-.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
-.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
-.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
-.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
-.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
-.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
-.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
-.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
-.\" SUCH DAMAGE.
-.\"
-.\" @(#)regex.3 8.4 (Berkeley) 3/20/94
-.\" $FreeBSD: src/lib/libc/regex/regex.3,v 1.9 2001/10/01 16:08:58 ru Exp $
-.\"
-.Dd March 20, 1994
-.Dt REGEX 3
-.Os
-.Sh NAME
-.Nm regcomp ,
-.Nm regexec ,
-.Nm regerror ,
-.Nm regfree
-.Nd regular-expression library
-.Sh LIBRARY
-.Lb libc
-.Sh SYNOPSIS
-.In sys/types.h
-.In regex.h
-.Ft int
-.Fn regcomp "regex_t *preg" "const char *pattern" "int cflags"
-.Ft int
-.Fo regexec
-.Fa "const regex_t *preg" "const char *string"
-.Fa "size_t nmatch" "regmatch_t pmatch[]" "int eflags"
-.Fc
-.Ft size_t
-.Fo regerror
-.Fa "int errcode" "const regex_t *preg"
-.Fa "char *errbuf" "size_t errbuf_size"
-.Fc
-.Ft void
-.Fn regfree "regex_t *preg"
-.Sh DESCRIPTION
-These routines implement
-.St -p1003.2
-regular expressions
-.Pq Do RE Dc Ns s ;
-see
-.Xr re_format 7 .
-.Fn Regcomp
-compiles an RE written as a string into an internal form,
-.Fn regexec
-matches that internal form against a string and reports results,
-.Fn regerror
-transforms error codes from either into human-readable messages,
-and
-.Fn regfree
-frees any dynamically-allocated storage used by the internal form
-of an RE.
-.Pp
-The header
-.Aq Pa regex.h
-declares two structure types,
-.Ft regex_t
-and
-.Ft regmatch_t ,
-the former for compiled internal forms and the latter for match reporting.
-It also declares the four functions,
-a type
-.Ft regoff_t ,
-and a number of constants with names starting with
-.Dq Dv REG_ .
-.Pp
-.Fn Regcomp
-compiles the regular expression contained in the
-.Fa pattern
-string,
-subject to the flags in
-.Fa cflags ,
-and places the results in the
-.Ft regex_t
-structure pointed to by
-.Fa preg .
-.Fa Cflags
-is the bitwise OR of zero or more of the following flags:
-.Bl -tag -width REG_EXTENDED
-.It Dv REG_EXTENDED
-Compile modern
-.Pq Dq extended
-REs,
-rather than the obsolete
-.Pq Dq basic
-REs that
-are the default.
-.It Dv REG_BASIC
-This is a synonym for 0,
-provided as a counterpart to
-.Dv REG_EXTENDED
-to improve readability.
-.It Dv REG_NOSPEC
-Compile with recognition of all special characters turned off.
-All characters are thus considered ordinary,
-so the
-.Dq RE
-is a literal string.
-This is an extension,
-compatible with but not specified by
-.St -p1003.2 ,
-and should be used with
-caution in software intended to be portable to other systems.
-.Dv REG_EXTENDED
-and
-.Dv REG_NOSPEC
-may not be used
-in the same call to
-.Fn regcomp .
-.It Dv REG_ICASE
-Compile for matching that ignores upper/lower case distinctions.
-See
-.Xr re_format 7 .
-.It Dv REG_NOSUB
-Compile for matching that need only report success or failure,
-not what was matched.
-.It Dv REG_NEWLINE
-Compile for newline-sensitive matching.
-By default, newline is a completely ordinary character with no special
-meaning in either REs or strings.
-With this flag,
-.Ql [^
-bracket expressions and
-.Ql .\&
-never match newline,
-a
-.Ql ^\&
-anchor matches the null string after any newline in the string
-in addition to its normal function,
-and the
-.Ql $\&
-anchor matches the null string before any newline in the
-string in addition to its normal function.
-.It Dv REG_PEND
-The regular expression ends,
-not at the first NUL,
-but just before the character pointed to by the
-.Va re_endp
-member of the structure pointed to by
-.Fa preg .
-The
-.Va re_endp
-member is of type
-.Ft "const char *" .
-This flag permits inclusion of NULs in the RE;
-they are considered ordinary characters.
-This is an extension,
-compatible with but not specified by
-.St -p1003.2 ,
-and should be used with
-caution in software intended to be portable to other systems.
-.El
-.Pp
-When successful,
-.Fn regcomp
-returns 0 and fills in the structure pointed to by
-.Fa preg .
-One member of that structure
-(other than
-.Va re_endp )
-is publicized:
-.Va re_nsub ,
-of type
-.Ft size_t ,
-contains the number of parenthesized subexpressions within the RE
-(except that the value of this member is undefined if the
-.Dv REG_NOSUB
-flag was used).
-If
-.Fn regcomp
-fails, it returns a non-zero error code;
-see
-.Sx DIAGNOSTICS .
-.Pp
-.Fn Regexec
-matches the compiled RE pointed to by
-.Fa preg
-against the
-.Fa string ,
-subject to the flags in
-.Fa eflags ,
-and reports results using
-.Fa nmatch ,
-.Fa pmatch ,
-and the returned value.
-The RE must have been compiled by a previous invocation of
-.Fn regcomp .
-The compiled form is not altered during execution of
-.Fn regexec ,
-so a single compiled RE can be used simultaneously by multiple threads.
-.Pp
-By default,
-the NUL-terminated string pointed to by
-.Fa string
-is considered to be the text of an entire line, minus any terminating
-newline.
-The
-.Fa eflags
-argument is the bitwise OR of zero or more of the following flags:
-.Bl -tag -width REG_STARTEND
-.It Dv REG_NOTBOL
-The first character of
-the string
-is not the beginning of a line, so the
-.Ql ^\&
-anchor should not match before it.
-This does not affect the behavior of newlines under
-.Dv REG_NEWLINE .
-.It Dv REG_NOTEOL
-The NUL terminating
-the string
-does not end a line, so the
-.Ql $\&
-anchor should not match before it.
-This does not affect the behavior of newlines under
-.Dv REG_NEWLINE .
-.It Dv REG_STARTEND
-The string is considered to start at
-.Fa string
-+
-.Fa pmatch Ns [0]. Ns Va rm_so
-and to have a terminating NUL located at
-.Fa string
-+
-.Fa pmatch Ns [0]. Ns Va rm_eo
-(there need not actually be a NUL at that location),
-regardless of the value of
-.Fa nmatch .
-See below for the definition of
-.Fa pmatch
-and
-.Fa nmatch .
-This is an extension,
-compatible with but not specified by
-.St -p1003.2 ,
-and should be used with
-caution in software intended to be portable to other systems.
-Note that a non-zero
-.Va rm_so
-does not imply
-.Dv REG_NOTBOL ;
-.Dv REG_STARTEND
-affects only the location of the string,
-not how it is matched.
-.El
-.Pp
-See
-.Xr re_format 7
-for a discussion of what is matched in situations where an RE or a
-portion thereof could match any of several substrings of
-.Fa string .
-.Pp
-Normally,
-.Fn regexec
-returns 0 for success and the non-zero code
-.Dv REG_NOMATCH
-for failure.
-Other non-zero error codes may be returned in exceptional situations;
-see
-.Sx DIAGNOSTICS .
-.Pp
-If
-.Dv REG_NOSUB
-was specified in the compilation of the RE,
-or if
-.Fa nmatch
-is 0,
-.Fn regexec
-ignores the
-.Fa pmatch
-argument (but see below for the case where
-.Dv REG_STARTEND
-is specified).
-Otherwise,
-.Fa pmatch
-points to an array of
-.Fa nmatch
-structures of type
-.Ft regmatch_t .
-Such a structure has at least the members
-.Va rm_so
-and
-.Va rm_eo ,
-both of type
-.Ft regoff_t
-(a signed arithmetic type at least as large as an
-.Ft off_t
-and a
-.Ft ssize_t ) ,
-containing respectively the offset of the first character of a substring
-and the offset of the first character after the end of the substring.
-Offsets are measured from the beginning of the
-.Fa string
-argument given to
-.Fn regexec .
-An empty substring is denoted by equal offsets,
-both indicating the character following the empty substring.
-.Pp
-The 0th member of the
-.Fa pmatch
-array is filled in to indicate what substring of
-.Fa string
-was matched by the entire RE.
-Remaining members report what substring was matched by parenthesized
-subexpressions within the RE;
-member
-.Va i
-reports subexpression
-.Va i ,
-with subexpressions counted (starting at 1) by the order of their opening
-parentheses in the RE, left to right.
-Unused entries in the array (corresponding either to subexpressions that
-did not participate in the match at all, or to subexpressions that do not
-exist in the RE (that is,
-.Va i
->
-.Fa preg Ns -> Ns Va re_nsub ) )
-have both
-.Va rm_so
-and
-.Va rm_eo
-set to -1.
-If a subexpression participated in the match several times,
-the reported substring is the last one it matched.
-(Note, as an example in particular, that when the RE
-.Ql "(b*)+"
-matches
-.Ql bbb ,
-the parenthesized subexpression matches each of the three
-.So Li b Sc Ns s
-and then
-an infinite number of empty strings following the last
-.Ql b ,
-so the reported substring is one of the empties.)
-.Pp
-If
-.Dv REG_STARTEND
-is specified,
-.Fa pmatch
-must point to at least one
-.Ft regmatch_t
-(even if
-.Fa nmatch
-is 0 or
-.Dv REG_NOSUB
-was specified),
-to hold the input offsets for
-.Dv REG_STARTEND .
-Use for output is still entirely controlled by
-.Fa nmatch ;
-if
-.Fa nmatch
-is 0 or
-.Dv REG_NOSUB
-was specified,
-the value of
-.Fa pmatch Ns [0]
-will not be changed by a successful
-.Fn regexec .
-.Pp
-.Fn Regerror
-maps a non-zero
-.Fa errcode
-from either
-.Fn regcomp
-or
-.Fn regexec
-to a human-readable, printable message.
-If
-.Fa preg
-is
-.No non\- Ns Dv NULL ,
-the error code should have arisen from use of
-the
-.Ft regex_t
-pointed to by
-.Fa preg ,
-and if the error code came from
-.Fn regcomp ,
-it should have been the result from the most recent
-.Fn regcomp
-using that
-.Ft regex_t .
-.No ( Fn Regerror
-may be able to supply a more detailed message using information
-from the
-.Ft regex_t . )
-.Fn Regerror
-places the NUL-terminated message into the buffer pointed to by
-.Fa errbuf ,
-limiting the length (including the NUL) to at most
-.Fa errbuf_size
-bytes.
-If the whole message won't fit,
-as much of it as will fit before the terminating NUL is supplied.
-In any case,
-the returned value is the size of buffer needed to hold the whole
-message (including terminating NUL).
-If
-.Fa errbuf_size
-is 0,
-.Fa errbuf
-is ignored but the return value is still correct.
-.Pp
-If the
-.Fa errcode
-given to
-.Fn regerror
-is first ORed with
-.Dv REG_ITOA ,
-the
-.Dq message
-that results is the printable name of the error code,
-e.g.\&
-.Dq Dv REG_NOMATCH ,
-rather than an explanation thereof.
-If
-.Fa errcode
-is
-.Dv REG_ATOI ,
-then
-.Fa preg
-shall be
-.No non\- Ns Dv NULL
-and the
-.Va re_endp
-member of the structure it points to
-must point to the printable name of an error code;
-in this case, the result in
-.Fa errbuf
-is the decimal digits of
-the numeric value of the error code
-(0 if the name is not recognized).
-.Dv REG_ITOA
-and
-.Dv REG_ATOI
-are intended primarily as debugging facilities;
-they are extensions,
-compatible with but not specified by
-.St -p1003.2 ,
-and should be used with
-caution in software intended to be portable to other systems.
-Be warned also that they are considered experimental and changes are possible.
-.Pp
-.Fn Regfree
-frees any dynamically-allocated storage associated with the compiled RE
-pointed to by
-.Fa preg .
-The remaining
-.Ft regex_t
-is no longer a valid compiled RE
-and the effect of supplying it to
-.Fn regexec
-or
-.Fn regerror
-is undefined.
-.Pp
-None of these functions references global variables except for tables
-of constants;
-all are safe for use from multiple threads if the arguments are safe.
-.Sh IMPLEMENTATION CHOICES
-There are a number of decisions that
-.St -p1003.2
-leaves up to the implementor,
-either by explicitly saying
-.Dq undefined
-or by virtue of them being
-forbidden by the RE grammar.
-This implementation treats them as follows.
-.Pp
-See
-.Xr re_format 7
-for a discussion of the definition of case-independent matching.
-.Pp
-There is no particular limit on the length of REs,
-except insofar as memory is limited.
-Memory usage is approximately linear in RE size, and largely insensitive
-to RE complexity, except for bounded repetitions.
-See
-.Sx BUGS
-for one short RE using them
-that will run almost any system out of memory.
-.Pp
-A backslashed character other than one specifically given a magic meaning
-by
-.St -p1003.2
-(such magic meanings occur only in obsolete
-.Bq Dq basic
-REs)
-is taken as an ordinary character.
-.Pp
-Any unmatched
-.Ql [\&
-is a
-.Dv REG_EBRACK
-error.
-.Pp
-Equivalence classes cannot begin or end bracket-expression ranges.
-The endpoint of one range cannot begin another.
-.Pp
-.Dv RE_DUP_MAX ,
-the limit on repetition counts in bounded repetitions, is 255.
-.Pp
-A repetition operator
-.Ql ( ?\& ,
-.Ql *\& ,
-.Ql +\& ,
-or bounds)
-cannot follow another
-repetition operator.
-A repetition operator cannot begin an expression or subexpression
-or follow
-.Ql ^\&
-or
-.Ql |\& .
-.Pp
-.Ql |\&
-cannot appear first or last in a (sub)expression or after another
-.Ql |\& ,
-i.e. an operand of
-.Ql |\&
-cannot be an empty subexpression.
-An empty parenthesized subexpression,
-.Ql "()" ,
-is legal and matches an
-empty (sub)string.
-An empty string is not a legal RE.
-.Pp
-A
-.Ql {\&
-followed by a digit is considered the beginning of bounds for a
-bounded repetition, which must then follow the syntax for bounds.
-A
-.Ql {\&
-.Em not
-followed by a digit is considered an ordinary character.
-.Pp
-.Ql ^\&
-and
-.Ql $\&
-beginning and ending subexpressions in obsolete
-.Pq Dq basic
-REs are anchors, not ordinary characters.
-.Sh SEE ALSO
-.Xr grep 1 ,
-.Xr re_format 7
-.Pp
-.St -p1003.2 ,
-sections 2.8 (Regular Expression Notation)
-and
-B.5 (C Binding for Regular Expression Matching).
-.Sh DIAGNOSTICS
-Non-zero error codes from
-.Fn regcomp
-and
-.Fn regexec
-include the following:
-.Pp
-.Bl -tag -width REG_ECOLLATE -compact
-.It Dv REG_NOMATCH
-.Fn regexec
-failed to match
-.It Dv REG_BADPAT
-invalid regular expression
-.It Dv REG_ECOLLATE
-invalid collating element
-.It Dv REG_ECTYPE
-invalid character class
-.It Dv REG_EESCAPE
-.Ql \e
-applied to unescapable character
-.It Dv REG_ESUBREG
-invalid backreference number
-.It Dv REG_EBRACK
-brackets
-.Ql "[ ]"
-not balanced
-.It Dv REG_EPAREN
-parentheses
-.Ql "( )"
-not balanced
-.It Dv REG_EBRACE
-braces
-.Ql "{ }"
-not balanced
-.It Dv REG_BADBR
-invalid repetition count(s) in
-.Ql "{ }"
-.It Dv REG_ERANGE
-invalid character range in
-.Ql "[ ]"
-.It Dv REG_ESPACE
-ran out of memory
-.It Dv REG_BADRPT
-.Ql ?\& ,
-.Ql *\& ,
-or
-.Ql +\&
-operand invalid
-.It Dv REG_EMPTY
-empty (sub)expression
-.It Dv REG_ASSERT
-can't happen - you found a bug
-.It Dv REG_INVARG
-invalid argument, e.g. negative-length string
-.El
-.Sh HISTORY
-Originally written by
-.An Henry Spencer .
-Altered for inclusion in the
-.Bx 4.4
-distribution.
-.Sh BUGS
-This is an alpha release with known defects.
-Please report problems.
-.Pp
-The back-reference code is subtle and doubts linger about its correctness
-in complex cases.
-.Pp
-.Fn Regexec
-performance is poor.
-This will improve with later releases.
-.Fa Nmatch
-exceeding 0 is expensive;
-.Fa nmatch
-exceeding 1 is worse.
-.Fn Regexec
-is largely insensitive to RE complexity
-.Em except
-that back
-references are massively expensive.
-RE length does matter; in particular, there is a strong speed bonus
-for keeping RE length under about 30 characters,
-with most special characters counting roughly double.
-.Pp
-.Fn Regcomp
-implements bounded repetitions by macro expansion,
-which is costly in time and space if counts are large
-or bounded repetitions are nested.
-An RE like, say,
-.Ql "((((a{1,100}){1,100}){1,100}){1,100}){1,100}"
-will (eventually) run almost any existing machine out of swap space.
-.Pp
-There are suspected problems with response to obscure error conditions.
-Notably,
-certain kinds of internal overflow,
-produced only by truly enormous REs or by multiply nested bounded repetitions,
-are probably not handled well.
-.Pp
-Due to a mistake in
-.St -p1003.2 ,
-things like
-.Ql "a)b"
-are legal REs because
-.Ql )\&
-is
-a special character only in the presence of a previous unmatched
-.Ql (\& .
-This can't be fixed until the spec is fixed.
-.Pp
-The standard's definition of back references is vague.
-For example, does
-.Ql "a\e(\e(b\e)*\e2\e)*d"
-match
-.Ql "abbbd" ?
-Until the standard is clarified,
-behavior in such cases should not be relied on.
-.Pp
-The implementation of word-boundary matching is a bit of a kludge,
-and bugs may lurk in combinations of word-boundary matching and anchoring.