Diffstat (limited to 'src/test/modules/test_regex/README')
-rw-r--r-- | src/test/modules/test_regex/README | 78
1 file changed, 78 insertions, 0 deletions
diff --git a/src/test/modules/test_regex/README b/src/test/modules/test_regex/README
new file mode 100644
index 0000000..3ef152d
--- /dev/null
+++ b/src/test/modules/test_regex/README
@@ -0,0 +1,78 @@
+test_regex is a module for testing the regular expression package.
+It is mostly meant to allow us to absorb Tcl's regex test suite.
+Therefore, there are provisions to exercise regex features that
+aren't currently exposed at the SQL level by PostgreSQL.
+
+Currently, one function is provided:
+
+test_regex(pattern text, string text, flags text) returns setof text[]
+
+Reports an error if the pattern is an invalid regex. Otherwise,
+the first row of output contains the number of subexpressions,
+followed by words reporting set bit(s) in the regex's re_info field.
+If the pattern doesn't match the string, that's all.
+If the pattern does match, the next row contains the whole match
+as the first array element. If there are parenthesized subexpression(s),
+following array elements contain the matches to those subexpressions.
+If the "g" (global) flag is set, then additional row(s) of output similarly
+report any additional matches.
+
+The "flags" argument is a string of zero or more single-character
+flags that modify the behavior of the regex package or the test
+function. As described in Tcl's reg.test file:
+
+The flag characters are complex and a bit eclectic. Generally speaking,
+lowercase letters are compile options, uppercase are expected re_info
+bits, and nonalphabetics are match options, controls for how the test is
+run, or testing options. The one small surprise is that AREs are the
+default, and you must explicitly request lesser flavors of RE. The flags
+are as follows. It is admitted that some are not very mnemonic.
+
+    -    no-op (placeholder)
+    0    report indices, not actual strings
+         (This substitutes for Tcl's -indices switch)
+    !    expect partial match, report start position anyway
+    %    force small state-set cache in matcher (to test cache replace)
+    ^    beginning of string is not beginning of line
+    $    end of string is not end of line
+    *    test is Unicode-specific, needs big character set
+    +    provide fake xy equivalence class and ch collating element
+         (Note: the equivalence class is implemented, the
+         collating element is not; so references to [.ch.] fail)
+    ,    set REG_PROGRESS (only useful in REG_DEBUG builds)
+    .    set REG_DUMP (only useful in REG_DEBUG builds)
+    :    set REG_MTRACE (only useful in REG_DEBUG builds)
+    ;    set REG_FTRACE (only useful in REG_DEBUG builds)
+
+    &    test as both ARE and BRE
+         (Not implemented in Postgres; we use separate tests)
+    b    BRE
+    e    ERE
+    a    turn advanced-features bit on (error unless ERE already)
+    q    literal string, no metacharacters at all
+
+    g    global match (find all matches)
+    i    case-independent matching
+    o    ("opaque") do not return match locations
+    p    newlines are half-magic, excluded from . and [^ only
+    w    newlines are half-magic, significant to ^ and $ only
+    n    newlines are fully magic, both effects
+    x    expanded RE syntax
+    t    incomplete-match reporting
+    c    canmatch (in the Postgres implementation, equivalent to "t0!")
+    s    match only at start (REG_BOSONLY)
+
+    A    backslash-_a_lphanumeric seen
+    B    ERE/ARE literal-_b_race heuristic used
+    E    backslash (_e_scape) seen within []
+    H    looka_h_ead constraint seen
+    I    _i_mpossible to match
+    L    _l_ocale-specific construct seen
+    M    unportable (_m_achine-specific) construct seen
+    N    RE can match empty (_n_ull) string
+    P    non-_P_OSIX construct seen
+    Q    {} _q_uantifier seen
+    R    back _r_eference seen
+    S    POSIX-un_s_pecified syntax seen
+    T    prefers shortest (_t_iny)
+    U    saw original-POSIX botch: unmatched right paren in ERE (_u_gh)
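The case-based convention quoted from Tcl's reg.test (lowercase letters are compile options, uppercase letters are expected re_info bits, everything else is a match option, test control, or testing option) can be sketched as a small classifier. This is a hypothetical helper, not part of test_regex: the names `classify_flags`, `unknown_flags`, and `KNOWN_FLAGS` are assumptions for illustration, and `KNOWN_FLAGS` is simply transcribed from the README's flag table. Because the rule is only "generally speaking", some letters land in a loosely fitting bucket; `g`, for example, is sorted with the compile options even though it controls how many matches are reported.

```python
# Hypothetical helpers illustrating the flag convention quoted from
# Tcl's reg.test; they are NOT part of the test_regex module.

# Every flag character listed in the README's table (transcribed by hand).
KNOWN_FLAGS = set("-0!%^$*+,.:;" "&beaq" "giopwnxtcs" "ABEHILMNPQRSTU")


def classify_flags(flags: str) -> dict[str, list[str]]:
    """Sort flag characters by the rough case-based convention:
    lowercase -> compile options, uppercase -> expected re_info bits,
    anything else -> match options, test controls, or testing options."""
    groups: dict[str, list[str]] = {"compile": [], "expected_info": [], "other": []}
    for ch in flags:
        if ch.isascii() and ch.islower():
            groups["compile"].append(ch)
        elif ch.isascii() and ch.isupper():
            groups["expected_info"].append(ch)
        else:
            groups["other"].append(ch)
    return groups


def unknown_flags(flags: str) -> list[str]:
    """Return the flag characters that do not appear in the README's table."""
    return [ch for ch in flags if ch not in KNOWN_FLAGS]


if __name__ == "__main__":
    print(classify_flags("bgN"))
    # {'compile': ['b', 'g'], 'expected_info': ['N'], 'other': []}
    print(unknown_flags("bZ"))  # ['Z']
```

A flags string such as `bgN` would thus request a BRE with global matching and assert that the compiled RE can match the empty string.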
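The row layout that test_regex is described as returning (a header row with the subexpression count, then one row per reported match holding the whole match followed by each subexpression's match) can be imitated in Python. This is only an approximation under stated assumptions: the standard `re` module stands in for PostgreSQL's regex package, and the two engines do not always agree (for instance, for an alternation such as `a|ab` against `ab`, `re` takes the first alternative that matches, while an ARE prefers the longer match). The re_info words are omitted from the header row, only the `i` and `g` flags are modeled, `None` marks a subexpression that did not participate (the README does not say how the real function represents that case), and `sketch_test_regex` is a hypothetical name.

```python
import re
from typing import Optional


def sketch_test_regex(pattern: str, string: str, flags: str = "") -> list[list[Optional[str]]]:
    """Imitate test_regex's described output rows using Python's re module.

    Assumption: re stands in for PostgreSQL's regex engine, so match
    results can differ. Only the "i" and "g" flags are modeled.
    """
    compiled = re.compile(pattern, re.IGNORECASE if "i" in flags else 0)
    # Header row: number of parenthesized subexpressions (re_info words omitted).
    rows: list[list[Optional[str]]] = [[str(compiled.groups)]]
    if "g" in flags:
        # "g": report every non-overlapping match, one row each.
        matches = list(compiled.finditer(string))
    else:
        # Otherwise report only the first match, if there is one.
        first = compiled.search(string)
        matches = [first] if first is not None else []
    for m in matches:
        # Whole match first, then each subexpression's match
        # (None where a subexpression did not participate).
        rows.append([m.group(0), *m.groups()])
    return rows


if __name__ == "__main__":
    print(sketch_test_regex("a(b)c", "abc"))   # [['1'], ['abc', 'b']]
    print(sketch_test_regex("x", "abc"))       # [['0']]
    print(sketch_test_regex("a", "aXa", "g"))  # [['0'], ['a'], ['a']]
```

When the pattern does not match, only the header row comes back, mirroring the README's "that's all" case.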