.\" Copyright (c) 2011 Gabor Kovesdan . .\" Copyright (c) 1992, 1993, 1994 Henry Spencer. .\" Copyright (c) 1992, 1993, 1994 .\" The Regents of the University of California. All rights reserved. .\" .\" This code is derived from software contributed to Berkeley by .\" Henry Spencer. .\" .\" Redistribution and use in source and binary forms, with or without .\" modification, are permitted provided that the following conditions .\" are met: .\" 1. Redistributions of source code must retain the above copyright .\" notice, this list of conditions and the following disclaimer. .\" 2. Redistributions in binary form must reproduce the above copyright .\" notice, this list of conditions and the following disclaimer in the .\" documentation and/or other materials provided with the distribution. .\" 4. Neither the name of the University nor the names of its contributors .\" may be used to endorse or promote products derived from this software .\" without specific prior written permission. .\" .\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE .\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF .\" SUCH DAMAGE. 
.\" .\" @(#)regex.3 8.4 (Berkeley) 3/20/94 .\" $FreeBSD$ .\" .Dd December 23, 2011 .Dt REGEX 3 .Os .Sh NAME .Nm regcomp , .Nm regncomp , .Nm regwcomp , .Nm regwncomp , .Nm regexec , .Nm regnexec , .Nm regwexec , .Nm regwnexec , .Nm regerror , .Nm regfree .Nd regular-expression library .Sh LIBRARY .Lb libc .Sh SYNOPSIS .In regex.h .Ft int .Fo regcomp .Fa "regex_t * preg" "const char * pattern" "int cflags" .Fc .Ft int .Fo regncomp .Fa "regex_t * preg" "const char * pattern" "size_t len" "int cflags" .Fc .Ft int .Fo regwcomp .Fa "regex_t * preg" "const wchar_t * pattern" "int cflags" .Fc .Ft int .Fo regwncomp .Fa "regex_t * preg" "const wchar_t * pattern" "size_t len" "int cflags" .Fc .Ft int .Fo regexec .Fa "const regex_t * preg" "const char * string" .Fa "size_t nmatch" "regmatch_t pmatch[]" "int eflags" .Fc .Ft int .Fo regnexec .Fa "const regex_t * preg" "const char * string" "size_t len" .Fa "size_t nmatch" "regmatch_t pmatch[]" "int eflags" .Fc .Ft int .Fo regwexec .Fa "const regex_t * preg" "const wchar_t * string" .Fa "size_t nmatch" "regmatch_t pmatch[]" "int eflags" .Fc .Ft int .Fo regwnexec .Fa "const regex_t * preg" "const wchar_t * string" "size_t len" .Fa "size_t nmatch" "regmatch_t pmatch[]" "int eflags" .Fc .Ft size_t .Fo regerror .Fa "int errcode" "const regex_t * restrict preg" .Fa "char * restrict errbuf" "size_t errbuf_size" .Fc .Ft void .Fn regfree "regex_t *preg" .Sh DESCRIPTION These routines implement pattern matching of .St -p1003.2 regular expressions. The .Xr re_format 7 manual can be consulted for their syntax and usage. .Pp The .Fn regcomp function compiles a regular expression from a string into an internal form. The .Fn regncomp function works like the former but takes another argument to specify the length of the pattern. This function can accept patterns containing NUL bytes. The .Fn regwcomp and .Fn regwncomp functions work like the two former ones but take the pattern in the wide string form. 
.Pp The .Fn regexec function matches the compiled regular expression against a string and reports results. The .Fn regnexec function works like the former but takes another argument to specify the length of the input string, which allows the input to contain NUL bytes. Additionally, for long input strings it is more efficient to call this function if the length is already known, because the matcher then does not need to compute the length by reading the input bytes one by one. The .Fn regwexec and .Fn regwnexec functions work like the two former ones but take the input as a wide string. .Pp The .Fn regerror function transforms error codes from the above functions into human-readable messages. .Pp The .Fn regfree function frees any dynamically-allocated storage used by the internal form of a regular expression. .Pp The header .In regex.h declares two structure types, .Ft regex_t and .Ft regmatch_t , the former for compiled internal forms and the latter for submatch reporting. It also declares the functions mentioned above, a type .Ft regoff_t , and a number of constants with names starting with .Dq Dv REG_ . .Pp The .Fn regcomp family of functions compiles the regular expression contained in the .Fa pattern string, subject to the flags in .Fa cflags , and places the results in the .Ft regex_t structure pointed to by .Fa preg . Some variants of the function also take the length of the pattern in .Fa len . The .Fa cflags argument is the bitwise OR of zero or more of the following flags: .Bl -tag -width REG_EXTENDED .It Dv REG_EXTENDED Compile extended regular expressions .Pq Dq EREs , rather than the obsolete basic regular expressions .Pq Dq BREs that are the default. It may not be used together with .Dv REG_NOSPEC or .Dv REG_LITERAL in the same call to .Fn regcomp . .It Dv REG_BASIC This is a synonym for 0, provided as a counterpart to .Dv REG_EXTENDED to improve readability. .It Dv REG_NOSPEC Compile with recognition of all special characters turned off.
All characters are thus considered ordinary, so the regular expression is a literal string. .It Dv REG_LITERAL Synonym for .Dv REG_NOSPEC . .It Dv REG_ICASE Compile for case-insensitive matching. .It Dv REG_NOSUB Compile so that only match or mismatch is reported, without the matching offsets. .It Dv REG_NEWLINE Compile for newline-sensitive matching. By default, newline is a completely ordinary character with no special meaning in either regular expressions or strings. With this flag, .Ql [^ bracket expressions and .Ql .\& never match newline, a .Ql ^\& anchor matches the null string after any newline in the string in addition to its normal function, and the .Ql $\& anchor matches the null string before any newline in the string in addition to its normal function. .It Dv REG_PEND The regular expression ends, not at the first NUL, but just before the character pointed to by the .Va re_endp or .Va re_wendp member of the structure pointed to by .Fa preg . The former is used for the functions that take a single- or multi-byte string, while the latter is used for those taking a wide string. The .Va re_endp member is of type .Ft "const char *" and the .Va re_wendp member is of type .Ft "const wchar_t *" . This flag permits inclusion of NULs in the regular expression; they are considered ordinary characters. .El .Pp When successful, the .Fn regcomp family of functions returns .Dv REG_OK and fills in the structure pointed to by .Fa preg . The .Va re_nsub member of the structure, of type .Ft size_t , contains the number of parenthesized subexpressions within the regular expression (except when the .Dv REG_NOSUB flag was used for the compilation of the pattern). If .Fn regcomp fails, it returns a non-zero error code; see .Sx RETURN VALUES .
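.Pp
The following is a minimal sketch, not a definitive implementation, of compiling a pattern with
.Dv REG_EXTENDED
and
.Dv REG_ICASE
combined by bitwise OR and then reading
.Va re_nsub .
The helper name
.Fn count_subexpressions
is hypothetical and not part of the library. The sketch compares the return value with 0 rather than
.Dv REG_OK
so that it should also build on systems that lack that extension.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/*
 * count_subexpressions() is a hypothetical helper for illustration only.
 * It compiles pattern as a case-insensitive ERE and returns the number of
 * parenthesized subexpressions, or -1 if compilation fails.
 */
long
count_subexpressions(const char *pattern)
{
	regex_t re;
	long nsub;

	assert(pattern != NULL);

	/* Flags are combined with bitwise OR; 0 is what this manual calls REG_OK. */
	if (regcomp(&re, pattern, REG_EXTENDED | REG_ICASE) != 0)
		return (-1);

	/* re_nsub is valid here because REG_NOSUB was not requested. */
	nsub = (long)re.re_nsub;

	/* Release the storage that regcomp() allocated. */
	regfree(&re);
	return (nsub);
}
```

Under these assumptions, a pattern such as
.Ql "(a)(b(c))"
yields 3, and an unbalanced pattern such as
.Ql "(a"
yields -1.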
.Pp The .Fn regexec family of functions matches the compiled regular expression pointed to by .Fa preg against the .Fa string (possibly having a length of .Fa len when using the variants that take the input length), subject to the flags in .Fa eflags , and reports through its return value whether the input matches. The .Fa pmatch argument is also filled in to hold submatches unless the pattern was compiled using the .Dv REG_NOSUB flag or the .Fa nmatch argument was set to 0. The regular expression must have been compiled by a previous invocation of .Fn regcomp or any of its alternative forms. The compiled form is not altered during execution of .Fn regexec or its alternatives, so a single compiled regular expression can be used simultaneously by multiple threads, and it can be used with any variant of the .Fn regexec functions. (That is, a multi-byte pattern can be matched against wide-string input and vice versa.) .Pp The .Fa eflags argument is the bitwise OR of zero or more of the following flags: .Bl -tag -width REG_STARTEND .It Dv REG_NOTBOL The first character of the string is not the beginning of a line, so the .Ql ^\& anchor should not match before it. This does not affect the behavior of newlines under .Dv REG_NEWLINE . .It Dv REG_NOTEOL The NUL character terminating the string does not end a line, so the .Ql $\& anchor should not match before it. This does not affect the behavior of newlines under .Dv REG_NEWLINE . .It Dv REG_STARTEND The string is considered to start at .Fa string + .Fa pmatch Ns [0]. Ns Va rm_so (inclusive) and to end at .Fa string + .Fa pmatch Ns [0]. Ns Va rm_eo (exclusive), regardless of the value of .Fa nmatch . See below for the definition of .Fa pmatch and .Fa nmatch . Note that a non-zero .Va rm_so does not imply .Dv REG_NOTBOL ; .Dv REG_STARTEND affects only the location of the string, not how it is matched.
.El .Pp The function indicates a match by returning .Dv REG_OK and no match by returning .Dv REG_NOMATCH ; it returns an error code different from these two values if an error occurred during execution. See .Sx RETURN VALUES for a detailed description of error codes. .Pp If .Dv REG_NOSUB was specified in the compilation of the regular expression, or if .Fa nmatch is 0, .Fn regexec ignores the .Fa pmatch argument (but see below for the case where .Dv REG_STARTEND is specified). Otherwise, .Fa pmatch points to an array of .Fa nmatch structures of type .Ft regmatch_t . Such a structure has at least the members .Va rm_so and .Va rm_eo , both of type .Ft regoff_t (a signed arithmetic type at least as large as an .Ft off_t and a .Ft ssize_t ) , containing respectively the offset of the first character of a substring and the offset of the first character after the end of the substring. Offsets are measured from the beginning of the .Fa string argument given to .Fn regexec . An empty substring is denoted by equal offsets, both indicating the character following the empty substring. .Pp The 0th member of the .Fa pmatch array is filled in to indicate what substring of .Fa string was matched by the entire regular expression. Remaining members report what substring was matched by parenthesized subexpressions within the regular expression; member .Va i reports subexpression .Va i , with subexpressions counted (starting at 1) by the order of their opening parentheses in the regular expression, left to right. Unused entries in the array (corresponding either to subexpressions that did not participate in the match at all, or to subexpressions that do not exist in the regular expression (that is, .Va i > .Fa preg Ns -> Ns Va re_nsub ) ) have both .Va rm_so and .Va rm_eo set to -1. If a subexpression participated in the match several times, the reported substring is the last one it matched.
(Note, as an example in particular, that when the regular expression .Ql "(b*)+" matches .Ql bbb , the parenthesized subexpression matches each of the three .So Li b Sc Ns s and then an infinite number of empty strings following the last .Ql b , so the reported substring is one of the empties.) .Pp If .Dv REG_STARTEND is specified, .Fa pmatch must point to at least one .Ft regmatch_t (even if .Fa nmatch is 0 or .Dv REG_NOSUB was specified), to hold the input offsets for .Dv REG_STARTEND . Use for output is still entirely controlled by .Fa nmatch ; if .Fa nmatch is 0 or .Dv REG_NOSUB was specified, the value of .Fa pmatch Ns [0] will not be changed by a successful .Fn regexec . .Pp The .Fn regerror function maps a non-zero .Fa errcode from either .Fn regcomp or .Fn regexec to a human-readable, printable message. If .Fa preg is .No non\- Ns Dv NULL , the error code should have arisen from use of the .Ft regex_t pointed to by .Fa preg , and if the error code came from .Fn regcomp , it should have been the result from the most recent .Fn regcomp using that .Ft regex_t . (The .Fn regerror function may be able to supply a more detailed message using information from the .Ft regex_t . ) The .Fn regerror function places the NUL-terminated message into the buffer pointed to by .Fa errbuf , limiting the length (including the NUL) to at most .Fa errbuf_size bytes. If the whole message will not fit, as much of it as will fit before the terminating NUL is supplied. In any case, the returned value is the size of buffer needed to hold the whole message (including terminating NUL). If .Fa errbuf_size is 0, .Fa errbuf is ignored but the return value is still correct. .Pp The .Fn regfree function frees any dynamically-allocated storage associated with the compiled regular expression pointed to by .Fa preg . The remaining .Ft regex_t is no longer a valid compiled regular expression and the effect of supplying it to .Fn regexec or .Fn regerror is undefined. 
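.Pp
The matching, error-reporting, and cleanup steps described above can be sketched as follows. This is an illustration under stated assumptions rather than a definitive implementation: the helper names
.Fn regex_message ,
.Fn report_error
and
.Fn first_group
are hypothetical and not part of the library, and only the standard
.Fn regcomp ,
.Fn regexec ,
.Fn regerror
and
.Fn regfree
calls are used, with 0 standing in for
.Dv REG_OK .

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: return a malloc'ed copy of the message for errcode,
 * or NULL if memory runs out. The caller frees the result.
 */
char *
regex_message(int errcode, const regex_t *preg)
{
	/* With errbuf_size 0 the buffer is ignored; only the size is returned. */
	size_t need = regerror(errcode, preg, NULL, 0);
	char *buf = malloc(need);

	if (buf != NULL)
		(void)regerror(errcode, preg, buf, need);
	return (buf);
}

/* Hypothetical helper: print the message for errcode on stderr. */
void
report_error(int errcode, const regex_t *preg)
{
	char *msg = regex_message(errcode, preg);

	fprintf(stderr, "regex: %s\n", msg != NULL ? msg : "out of memory");
	free(msg);
}

/*
 * Hypothetical helper: match the ERE pattern against input and copy the
 * text of subexpression 1 into out, truncating it to fit outsz.
 * Returns 0 on success, 1 if nothing matched or subexpression 1 did not
 * participate in the match, and -1 on any other error.
 */
int
first_group(const char *pattern, const char *input, char *out, size_t outsz)
{
	regex_t re;
	regmatch_t pm[2];	/* pm[0]: whole match, pm[1]: subexpression 1 */
	int rc, result;

	assert(pattern != NULL && input != NULL && out != NULL && outsz > 0);

	rc = regcomp(&re, pattern, REG_EXTENDED);
	if (rc != 0) {
		report_error(rc, &re);
		return (-1);	/* nothing to free after a failed regcomp() */
	}

	rc = regexec(&re, input, 2, pm, 0);
	if (rc == 0 && pm[1].rm_so != -1) {
		/* rm_so is inclusive and rm_eo is exclusive. */
		size_t n = (size_t)(pm[1].rm_eo - pm[1].rm_so);

		if (n >= outsz)
			n = outsz - 1;
		memcpy(out, input + pm[1].rm_so, n);
		out[n] = '\0';
		result = 0;
	} else if (rc == 0 || rc == REG_NOMATCH) {
		result = 1;
	} else {
		report_error(rc, &re);
		result = -1;
	}

	regfree(&re);
	return (result);
}
```

Calling
.Fn regerror
once with a zero
.Fa errbuf_size
and then again with a buffer of the returned size avoids guessing a buffer length, at the cost of a second call.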
.Pp All of the functions described above are thread-safe. .Sh RETURN VALUES Return values from the .Fn regcomp and .Fn regexec families of functions include the following: .Pp .Bl -tag -width REG_ECOLLATE -compact .It Dv REG_OK Operation successfully executed. Synonym for 0, to provide better code readability. .It Dv REG_NOMATCH The .Fn regexec function or its variants failed to match. .It Dv REG_BADPAT Invalid regular expression. This implementation only returns this code when the regular expression passed to .Fn regcomp contains an illegal multibyte sequence. .It Dv REG_ECOLLATE Invalid collating element. Returned whenever equivalence classes or multicharacter collating elements are used in a bracket expression. .Pq They are not supported yet. .It Dv REG_ECTYPE Invalid character class name. .It Dv REG_EESCAPE The last character was a backslash. .It Dv REG_ESUBREG Invalid backreference number. .It Dv REG_EBRACK Brackets .Ql "[ ]" are not balanced. .It Dv REG_EPAREN Parentheses .Ql "( )" are not balanced. .It Dv REG_EBRACE Braces .Ql "{ }" are not balanced. .It Dv REG_BADBR Invalid repetition count(s) in .Ql "{ }" : not a number, more than two numbers, first larger than second, or number too large. .It Dv REG_ERANGE Invalid character range in .Ql "[ ]" , i.e., the ending point is earlier in the collating order than the starting point. .It Dv REG_ESPACE Out of memory. .It Dv REG_BADRPT Invalid use of repetition operators: two or more repetition operators have been chained in an undefined way. .El .Sh SEE ALSO .Xr grep 1 , .Xr re_format 7 .Pp .St -p1003.2 , sections 2.8 (Regular Expression Notation) and B.5 (C Binding for Regular Expression Matching).
.Sh STANDARDS The .Fn regcomp , .Fn regexec , .Fn regerror and .Fn regfree functions, the header file .In regex.h and the two structure types .Ft regex_t and .Ft regmatch_t (except the .Va re_endp and .Va re_wendp fields), the type .Ft regoff_t , the macros .Dv REG_EXTENDED , .Dv REG_ICASE , .Dv REG_NOSUB , .Dv REG_NEWLINE , .Dv REG_NOTBOL , .Dv REG_NOTEOL and all the error codes except .Dv REG_OK conform to the standard .St -p1003.2 . .Pp The alternative forms of the functions taking the length of the input and/or taking wide strings, the flags that are not listed above, the .Va re_endp and .Va re_wendp fields in .Ft regex_t and the .Dv REG_OK error code are extensions and thus are not expected to be portable. .Sh HISTORY This regex implementation comes from the TRE project and was first included in .Fx 10-CURRENT . This manual was originally written by .An Henry Spencer for an older implementation and later extended and tailored for TRE by .An Gabor Kovesdan .