Diffstat (limited to 'src/test/modules/test_regex/README')
-rw-r--r-- src/test/modules/test_regex/README | 78
1 files changed, 78 insertions, 0 deletions
diff --git a/src/test/modules/test_regex/README b/src/test/modules/test_regex/README
new file mode 100644
index 0000000..3ef152d
--- /dev/null
+++ b/src/test/modules/test_regex/README
@@ -0,0 +1,78 @@
+test_regex is a module for testing the regular expression package.
+It is mostly meant to allow us to absorb Tcl's regex test suite.
+Therefore, there are provisions to exercise regex features that
+aren't currently exposed at the SQL level by PostgreSQL.
+
+Currently, one function is provided:
+
+test_regex(pattern text, string text, flags text) returns setof text[]
+
+Reports an error if the pattern is an invalid regex. Otherwise,
+the first row of output contains the number of subexpressions,
+followed by words reporting set bit(s) in the regex's re_info field.
+If the pattern doesn't match the string, no further rows are returned.
+If the pattern does match, the next row contains the whole match
+as the first array element. If there are parenthesized subexpression(s),
+the following array elements contain the matches to those subexpressions.
+If the "g" (global) flag is set, then additional row(s) of output similarly
+report any additional matches.
+
+The "flags" argument is a string of zero or more single-character
+flags that modify the behavior of the regex package or the test
+function. As described in Tcl's reg.test file:
+
+The flag characters are complex and a bit eclectic. Generally speaking,
+lowercase letters are compile options, uppercase are expected re_info
+bits, and nonalphabetics are match options, controls for how the test is
+run, or testing options. The one small surprise is that AREs are the
+default, and you must explicitly request lesser flavors of RE. The flags
+are as follows. It is admitted that some are not very mnemonic.
+
+ - no-op (placeholder)
+ 0 report indices not actual strings
+ (This substitutes for Tcl's -indices switch)
+ ! expect partial match, report start position anyway
+ % force small state-set cache in matcher (to test cache replace)
+ ^ beginning of string is not beginning of line
+ $ end of string is not end of line
+ * test is Unicode-specific, needs big character set
+ + provide fake xy equivalence class and ch collating element
+ (Note: the equivalence class is implemented, the
+ collating element is not; so references to [.ch.] fail)
+ , set REG_PROGRESS (only useful in REG_DEBUG builds)
+ . set REG_DUMP (only useful in REG_DEBUG builds)
+ : set REG_MTRACE (only useful in REG_DEBUG builds)
+ ; set REG_FTRACE (only useful in REG_DEBUG builds)
+
+ & test as both ARE and BRE
+ (Not implemented in Postgres; we use separate tests)
+ b BRE
+ e ERE
+ a turn advanced-features bit on (error unless ERE already)
+ q literal string, no metacharacters at all
+
+ g global match (find all matches)
+ i case-independent matching
+ o ("opaque") do not return match locations
+ p newlines are half-magic, excluded from . and [^ only
+ w newlines are half-magic, significant to ^ and $ only
+ n newlines are fully magic, both effects
+ x expanded RE syntax
+ t incomplete-match reporting
+ c canmatch (equivalent to "t0!", in Postgres implementation)
+ s match only at start (REG_BOSONLY)
+
+ A backslash-_a_lphanumeric seen
+ B ERE/ARE literal-_b_race heuristic used
+ E backslash (_e_scape) seen within []
+ H looka_h_ead constraint seen
+ I _i_mpossible to match
+ L _l_ocale-specific construct seen
+ M unportable (_m_achine-specific) construct seen
+ N RE can match empty (_n_ull) string
+ P non-_P_OSIX construct seen
+ Q {} _q_uantifier seen
+ R back _r_eference seen
+ S POSIX-un_s_pecified syntax seen
+ T prefers shortest (_t_iny)
+ U saw original-POSIX botch: unmatched right paren in ERE (_u_gh)
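
The rule of thumb quoted from reg.test (lowercase letters are compile
options, uppercase letters are expected re_info bits, and nonalphabetics
are match or test controls) can be illustrated with a small sketch. This
is not part of the commit and is not how test_regex parses its flags
argument; the function name classify_flags and the group names are
hypothetical, chosen only for illustration. Since the README says the
rule holds only "generally speaking", the grouping below should be read
as approximate.

```python
import string


def classify_flags(flags: str) -> dict:
    """Group flag characters by the README's rule of thumb.

    Hypothetical helper for illustration only; it does not validate
    that each character is a flag the module actually accepts.
    """
    groups = {"compile": [], "expected_info": [], "other": []}
    for ch in flags:
        if ch in string.ascii_lowercase:
            # Lowercase letters: generally compile options (e.g. "i", "x").
            groups["compile"].append(ch)
        elif ch in string.ascii_uppercase:
            # Uppercase letters: expected re_info bits (e.g. "P", "Q").
            groups["expected_info"].append(ch)
        else:
            # Nonalphabetics: match options or test controls (e.g. "0", "!").
            groups["other"].append(ch)
    return groups


if __name__ == "__main__":
    print(classify_flags("gi0PQ"))
```

For a flags string such as "gi0PQ", the sketch places "g" and "i" in the
compile group, "P" and "Q" in the expected re_info group, and "0" in the
remaining group.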