1 files changed, 78 insertions, 0 deletions
diff --git a/src/test/modules/test_regex/README b/src/test/modules/test_regex/README
new file mode 100644
index 0000000..3ef152d
--- /dev/null
+++ b/src/test/modules/test_regex/README
@@ -0,0 +1,78 @@
+test_regex is a module for testing the regular expression package.
+It is mostly meant to allow us to absorb Tcl's regex test suite.
+Therefore, there are provisions to exercise regex features that
+aren't currently exposed at the SQL level by PostgreSQL.
+
+Currently, one function is provided:
+
+test_regex(pattern text, string text, flags text) returns setof text[]
+
+Reports an error if the pattern is an invalid regex.  Otherwise,
+the first row of output contains the number of subexpressions,
+followed by words reporting set bit(s) in the regex's re_info field.
+If the pattern doesn't match the string, that's all.
+If the pattern does match, the next row contains the whole match
+as the first array element.  If there are parenthesized subexpression(s),
+following array elements contain the matches to those subexpressions.
+If the "g" (glob) flag is set, then additional row(s) of output similarly
+report any additional matches.
+
+The "flags" argument is a string of zero or more single-character
+flags that modify the behavior of the regex package or the test
+function.  As described in Tcl's reg.test file:
+
+The flag characters are complex and a bit eclectic.  Generally speaking,
+lowercase letters are compile options, uppercase are expected re_info
+bits, and nonalphabetics are match options, controls for how the test is
+run, or testing options.  The one small surprise is that AREs are the
+default, and you must explicitly request lesser flavors of RE.  The flags
+are as follows.  It is admitted that some are not very mnemonic.
+
+	-	no-op (placeholder)
+	0	report indices not actual strings
+		(This substitutes for Tcl's -indices switch)
+	!	expect partial match, report start position anyway
+	%	force small state-set cache in matcher (to test cache replace)
+	^	beginning of string is not beginning of line
+	$	end of string is not end of line
+	*	test is Unicode-specific, needs big character set
+	+	provide fake xy equivalence class and ch collating element
+		(Note: the equivalence class is implemented, the
+		collating element is not; so references to [.ch.] fail)
+	,	set REG_PROGRESS (only useful in REG_DEBUG builds)
+	.	set REG_DUMP (only useful in REG_DEBUG builds)
+	:	set REG_MTRACE (only useful in REG_DEBUG builds)
+	;	set REG_FTRACE (only useful in REG_DEBUG builds)
+
+	&	test as both ARE and BRE
+		(Not implemented in Postgres, we use separate tests)
+	b	BRE
+	e	ERE
+	a	turn advanced-features bit on (error unless ERE already)
+	q	literal string, no metacharacters at all
+
+	g	global match (find all matches)
+	i	case-independent matching
+	o	("opaque") do not return match locations
+	p	newlines are half-magic, excluded from . and [^ only
+	w	newlines are half-magic, significant to ^ and $ only
+	n	newlines are fully magic, both effects
+	x	expanded RE syntax
+	t	incomplete-match reporting
+	c	canmatch (equivalent to "t0!", in Postgres implementation)
+	s	match only at start (REG_BOSONLY)
+
+	A	backslash-_a_lphanumeric seen
+	B	ERE/ARE literal-_b_race heuristic used
+	E	backslash (_e_scape) seen within []
+	H	looka_h_ead constraint seen
+	I	_i_mpossible to match
+	L	_l_ocale-specific construct seen
+	M	unportable (_m_achine-specific) construct seen
+	N	RE can match empty (_n_ull) string
+	P	non-_P_OSIX construct seen
+	Q	{} _q_uantifier seen
+	R	back _r_eference seen
+	S	POSIX-un_s_pecified syntax seen
+	T	prefers shortest (_t_iny)
+	U	saw original-POSIX botch: unmatched right paren in ERE (_u_gh)