Diffstat (limited to 'upstream/debian-unstable/man3/Unicode::Collate.3perl')
-rw-r--r-- upstream/debian-unstable/man3/Unicode::Collate.3perl 1192
1 files changed, 1192 insertions, 0 deletions
diff --git a/upstream/debian-unstable/man3/Unicode::Collate.3perl b/upstream/debian-unstable/man3/Unicode::Collate.3perl
new file mode 100644
index 00000000..c507ba3c
--- /dev/null
+++ b/upstream/debian-unstable/man3/Unicode::Collate.3perl
@@ -0,0 +1,1192 @@
+.\" -*- mode: troff; coding: utf-8 -*-
+.\" Automatically generated by Pod::Man 5.01 (Pod::Simple 3.43)
+.\"
+.\" Standard preamble:
+.\" ========================================================================
+.de Sp \" Vertical space (when we can't use .PP)
+.if t .sp .5v
+.if n .sp
+..
+.de Vb \" Begin verbatim text
+.ft CW
+.nf
+.ne \\$1
+..
+.de Ve \" End verbatim text
+.ft R
+.fi
+..
+.\" \*(C` and \*(C' are quotes in nroff, nothing in troff, for use with C<>.
+.ie n \{\
+. ds C` ""
+. ds C' ""
+'br\}
+.el\{\
+. ds C`
+. ds C'
+'br\}
+.\"
+.\" Escape single quotes in literal strings from groff's Unicode transform.
+.ie \n(.g .ds Aq \(aq
+.el .ds Aq '
+.\"
+.\" If the F register is >0, we'll generate index entries on stderr for
+.\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index
+.\" entries marked with X<> in POD. Of course, you'll have to process the
+.\" output yourself in some meaningful fashion.
+.\"
+.\" Avoid warning from groff about undefined register 'F'.
+.de IX
+..
+.nr rF 0
+.if \n(.g .if rF .nr rF 1
+.if (\n(rF:(\n(.g==0)) \{\
+. if \nF \{\
+. de IX
+. tm Index:\\$1\t\\n%\t"\\$2"
+..
+. if !\nF==2 \{\
+. nr % 0
+. nr F 2
+. \}
+. \}
+.\}
+.rr rF
+.\" ========================================================================
+.\"
+.IX Title "Unicode::Collate 3perl"
+.TH Unicode::Collate 3perl 2024-01-12 "perl v5.38.2" "Perl Programmers Reference Guide"
+.\" For nroff, turn off justification. Always turn off hyphenation; it makes
+.\" way too many mistakes in technical documents.
+.if n .ad l
+.nh
+.SH NAME
+Unicode::Collate \- Unicode Collation Algorithm
+.SH SYNOPSIS
+.IX Header "SYNOPSIS"
+.Vb 1
+\& use Unicode::Collate;
+\&
+\& #construct
+\& $Collator = Unicode::Collate\->new(%tailoring);
+\&
+\& #sort
+\& @sorted = $Collator\->sort(@not_sorted);
+\&
+\& #compare
+\& $result = $Collator\->cmp($a, $b); # returns 1, 0, or \-1.
+.Ve
+.PP
+\&\fBNote:\fR Strings in \f(CW@not_sorted\fR, \f(CW$a\fR and \f(CW$b\fR are interpreted
+according to Perl's Unicode support. See perlunicode,
+perluniintro, perlunitut, perlunifaq, utf8.
+Otherwise, use \f(CW\*(C`preprocess\*(C'\fR or decode them beforehand.
+.SH DESCRIPTION
+.IX Header "DESCRIPTION"
+This module is an implementation of Unicode Technical Standard #10
+(a.k.a. UTS #10) \- Unicode Collation Algorithm (a.k.a. UCA).
+.SS "Constructor and Tailoring"
+.IX Subsection "Constructor and Tailoring"
+The \f(CW\*(C`new\*(C'\fR method returns a collator object. If \fBnew()\fR is called
+with no parameters, the collator performs the default collation.
+.PP
+.Vb 10
+\& $Collator = Unicode::Collate\->new(
+\& UCA_Version => $UCA_Version,
+\& alternate => $alternate, # alias for \*(Aqvariable\*(Aq
+\& backwards => $levelNumber, # or \e@levelNumbers
+\& entry => $element,
+\& hangul_terminator => $term_primary_weight,
+\& highestFFFF => $bool,
+\& identical => $bool,
+\& ignoreName => qr/$ignoreName/,
+\& ignoreChar => qr/$ignoreChar/,
+\& ignore_level2 => $bool,
+\& katakana_before_hiragana => $bool,
+\& level => $collationLevel,
+\& long_contraction => $bool,
+\& minimalFFFE => $bool,
+\& normalization => $normalization_form,
+\& overrideCJK => \e&overrideCJK,
+\& overrideHangul => \e&overrideHangul,
+\& preprocess => \e&preprocess,
+\& rearrange => \e@charList,
+\& rewrite => \e&rewrite,
+\& suppress => \e@charList,
+\& table => $filename,
+\& undefName => qr/$undefName/,
+\& undefChar => qr/$undefChar/,
+\& upper_before_lower => $bool,
+\& variable => $variable,
+\& );
+.Ve
+.IP UCA_Version 4
+.IX Item "UCA_Version"
+If the revision (previously "tracking version") number of UCA is given,
+the behavior of that revision is emulated when collating.
+If omitted, the return value of \f(CWUCA_Version()\fR is used.
+.Sp
+The following revisions are supported. The default is 43.
+.Sp
+.Vb 10
+\& UCA   Unicode Standard          DUCET (@version)
+\& \-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-
+\&   8   3.1                       3.0.1 (3.0.1d9)
+\&   9   3.1 with Corrigendum 3    3.1.1
+\&  11   4.0.0
+\&  14   4.1.0
+\&  16   5.0.0
+\&  18   5.1.0
+\&  20   5.2.0
+\&  22   6.0.0
+\&  24   6.1.0
+\&  26   6.2.0
+\&  28   6.3.0
+\&  30   7.0.0
+\&  32   8.0.0
+\&  34   9.0.0
+\&  36   10.0.0
+\&  38   11.0.0
+\&  40   12.0.0
+\&  41   12.1.0
+\&  43   13.0.0
+.Ve
+.Sp
+* See below for \f(CW\*(C`long_contraction\*(C'\fR with \f(CW\*(C`UCA_Version\*(C'\fR 22 and 24.
+.Sp
+* Noncharacters (e.g. U+FFFF) are not ignored, and can be overridden
+since \f(CW\*(C`UCA_Version\*(C'\fR 22.
+.Sp
+* Out-of-range codepoints (greater than U+10FFFF) are not ignored,
+and can be overridden since \f(CW\*(C`UCA_Version\*(C'\fR 22.
+.Sp
+* Fully ignorable characters were ignored, and would not interrupt
+contractions with \f(CW\*(C`UCA_Version\*(C'\fR 9 and 11.
+.Sp
+* Treatment of ignorables after variables and some behaviors
+were changed at \f(CW\*(C`UCA_Version\*(C'\fR 9.
+.Sp
+* Characters regarded as CJK unified ideographs (cf. \f(CW\*(C`overrideCJK\*(C'\fR)
+depend on \f(CW\*(C`UCA_Version\*(C'\fR.
+.Sp
+* Many hangul jamo are assigned at \f(CW\*(C`UCA_Version\*(C'\fR 20, which affects
+\&\f(CW\*(C`hangul_terminator\*(C'\fR.
+.IP alternate 4
+.IX Item "alternate"
+\&\-\- see 3.2.2 Alternate Weighting, version 8 of UTS #10
+.Sp
+For backward compatibility, \f(CW\*(C`alternate\*(C'\fR (old name) can be used
+as an alias for \f(CW\*(C`variable\*(C'\fR.
+.IP backwards 4
+.IX Item "backwards"
+\&\-\- see 3.4 Backward Accents, UTS #10.
+.Sp
+.Vb 1
+\& backwards => $levelNumber or \e@levelNumbers
+.Ve
+.Sp
+Weights are compared in reverse order at the given level(s);
+e.g. level 2 (diacritic ordering) in French.
+If omitted (or \f(CW$levelNumber\fR is \f(CW\*(C`undef\*(C'\fR or \f(CW\*(C`\e@levelNumbers\*(C'\fR is \f(CW\*(C`[]\*(C'\fR),
+weights are compared forwards at all levels.
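+.Sp
+A minimal sketch of a French-style collator follows. The variable names and
+the word list are illustrative assumptions, not taken from this manual.
+.Sp
+.Vb 7
+\& use Unicode::Collate;
+\&
+\& # level 2 (diacritic ordering) is compared in reverse order
+\& my $french = Unicode::Collate\->new(backwards => 2);
+\&
+\& # illustrative data: "cote" and its accented variants (U+00F4, U+00E9)
+\& my @sorted = $french\->sort("cote", "c\ex{F4}te", "cot\ex{E9}", "c\ex{F4}t\ex{E9}");
+.Ve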
+.IP entry 4
+.IX Item "entry"
+\&\-\- see 5 Tailoring; 9.1 Allkeys File Format, UTS #10.
+.Sp
+If the same character (or a sequence of characters) exists
+in the collation element table through \f(CW\*(C`table\*(C'\fR,
+mapping to collation elements is overridden.
+If it does not exist, the mapping is defined additionally.
+.Sp
+.Vb 12
+\& entry => <<\*(AqENTRY\*(Aq, # for DUCET v4.0.0 (allkeys\-4.0.0.txt)
+\&0063 0068 ; [.0E6A.0020.0002.0063] # ch
+\&0043 0068 ; [.0E6A.0020.0007.0043] # Ch
+\&0043 0048 ; [.0E6A.0020.0008.0043] # CH
+\&006C 006C ; [.0F4C.0020.0002.006C] # ll
+\&004C 006C ; [.0F4C.0020.0007.004C] # Ll
+\&004C 004C ; [.0F4C.0020.0008.004C] # LL
+\&00F1 ; [.0F7B.0020.0002.00F1] # n\-tilde
+\&006E 0303 ; [.0F7B.0020.0002.00F1] # n\-tilde
+\&00D1 ; [.0F7B.0020.0008.00D1] # N\-tilde
+\&004E 0303 ; [.0F7B.0020.0008.00D1] # N\-tilde
+\&ENTRY
+\&
+\& entry => <<\*(AqENTRY\*(Aq, # for DUCET v4.0.0 (allkeys\-4.0.0.txt)
+\&00E6 ; [.0E33.0020.0002.00E6][.0E8B.0020.0002.00E6] # ae ligature as <a><e>
+\&00C6 ; [.0E33.0020.0008.00C6][.0E8B.0020.0008.00C6] # AE ligature as <A><E>
+\&ENTRY
+.Ve
+.Sp
+\&\fBNOTE:\fR The code point in the UCA file format (before \f(CW\*(Aq;\*(Aq\fR)
+\&\fBmust\fR be a Unicode code point (defined as hexadecimal),
+but not a native code point.
+So \f(CW0063\fR must always denote \f(CW\*(C`U+0063\*(C'\fR,
+but not a character of \f(CW"\ex63"\fR.
+.Sp
+Weighting may vary depending on collation element table.
+So ensure the weights defined in \f(CW\*(C`entry\*(C'\fR will be consistent with
+those in the collation element table loaded via \f(CW\*(C`table\*(C'\fR.
+.Sp
+In DUCET v4.0.0, primary weight of \f(CW\*(C`C\*(C'\fR is \f(CW0E60\fR
+and that of \f(CW\*(C`D\*(C'\fR is \f(CW\*(C`0E6D\*(C'\fR. So setting primary weight of \f(CW\*(C`CH\*(C'\fR to \f(CW\*(C`0E6A\*(C'\fR
+(as a value between \f(CW0E60\fR and \f(CW\*(C`0E6D\*(C'\fR)
+makes the ordering \f(CW\*(C`C < CH < D\*(C'\fR.
+Strictly speaking, DUCET already has some characters between \f(CW\*(C`C\*(C'\fR and \f(CW\*(C`D\*(C'\fR:
+\&\f(CW\*(C`small capital C\*(C'\fR (\f(CW\*(C`U+1D04\*(C'\fR) with primary weight \f(CW0E64\fR,
+\&\f(CW\*(C`c\-hook/C\-hook\*(C'\fR (\f(CW\*(C`U+0188/U+0187\*(C'\fR) with \f(CW0E65\fR,
+and \f(CW\*(C`c\-curl\*(C'\fR (\f(CW\*(C`U+0255\*(C'\fR) with \f(CW0E69\fR.
+Then primary weight \f(CW\*(C`0E6A\*(C'\fR for \f(CW\*(C`CH\*(C'\fR makes \f(CW\*(C`CH\*(C'\fR
+ordered between \f(CW\*(C`c\-curl\*(C'\fR and \f(CW\*(C`D\*(C'\fR.
+.IP hangul_terminator 4
+.IX Item "hangul_terminator"
+\&\-\- see 7.1.4 Trailing Weights, UTS #10.
+.Sp
+If a true value is given (non-zero, and it should be positive),
+it will be added as a terminator primary weight to the end of
+every standard Hangul syllable. Secondary and any higher weights
+for the terminator are set to zero.
+If the value is false or the \f(CW\*(C`hangul_terminator\*(C'\fR key does not exist,
+terminator weights will not be inserted.
+.Sp
+Boundaries of Hangul syllables are determined
+according to conjoining Jamo behavior in \fIthe Unicode Standard\fR
+and \fIHangulSyllableType.txt\fR.
+.Sp
+\&\fBImplementation Note:\fR
+(1) For expansion mapping (Unicode character mapped
+to a sequence of collation elements), a terminator will not be added
+between collation elements, even if a Hangul syllable boundary exists there.
+A terminator is added only immediately after
+the last collation element.
+.Sp
+(2) Non-conjoining Hangul letters
+(Compatibility Jamo, halfwidth Jamo, and enclosed letters) are not
+automatically terminated with a terminator primary weight.
+These characters may need a terminator included in the collation element
+table beforehand.
+.IP highestFFFF 4
+.IX Item "highestFFFF"
+\&\-\- see 2.4 Tailored noncharacter weights, UTS #35 (LDML) Part 5: Collation.
+.Sp
+If the parameter is made true, \f(CW\*(C`U+FFFF\*(C'\fR has the highest primary weight.
+When both \f(CW\*(C`$coll\->ge($str, "abc")\*(C'\fR and
+\&\f(CW\*(C`$coll\->le($str, "abc\ex{FFFF}")\*(C'\fR are true, it is expected that \f(CW$str\fR
+begins with \f(CW"abc"\fR, or another primary equivalent.
+\&\f(CW$str\fR may be \f(CW"abcd"\fR, \f(CW"abc012"\fR, but should not include \f(CW\*(C`U+FFFF\*(C'\fR
+such as \f(CW"abc\ex{FFFF}xyz"\fR.
+.Sp
+\&\f(CW\*(C`$coll\->le($str, "abc\ex{FFFF}")\*(C'\fR works almost like \f(CW\*(C`$coll\->lt($str, "abd")\*(C'\fR,
+but the latter requires you to know which letter comes
+next after \f(CW\*(C`c\*(C'\fR. In a language where \f(CW\*(C`ch\*(C'\fR is the next letter,
+\&\f(CW"abch"\fR is greater than \f(CW"abc\ex{FFFF}"\fR, but less than \f(CW"abd"\fR.
+.Sp
+Note:
+This is equivalent to \f(CW\*(C`(entry => \*(AqFFFF ; [.FFFE.0020.0005.FFFF]\*(Aq)\*(C'\fR.
+Any other character than \f(CW\*(C`U+FFFF\*(C'\fR can be tailored by \f(CW\*(C`entry\*(C'\fR.
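+.Sp
+The range test described for this parameter might be wrapped as in the
+following sketch. \f(CW\*(C`has_prefix\*(C'\fR is a hypothetical helper name used only
+for illustration; it is not part of this module.
+.Sp
+.Vb 10
+\& use Unicode::Collate;
+\&
+\& my $coll = Unicode::Collate\->new(highestFFFF => 1);
+\&
+\& # hypothetical helper: true if $str collates at or after $prefix
+\& # and at or before $prefix followed by U+FFFF
+\& sub has_prefix {
+\&     my ($str, $prefix) = @_;
+\&     return $coll\->ge($str, $prefix)
+\&         && $coll\->le($str, "$prefix\ex{FFFF}");
+\& }
+.Ve
+.Sp
+Unlike a test against \f(CW"abd"\fR, this form does not need to know which
+letter follows the last letter of the prefix in the language concerned.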
+.IP identical 4
+.IX Item "identical"
+\&\-\- see A.3 Deterministic Comparison, UTS #10.
+.Sp
+By default, strings whose weights are equal should be equal,
+even though their code points are not equal.
+Completely ignorable characters are ignored.
+.Sp
+If the parameter is made true, a final, tie-breaking level is used.
+If no difference of weights is found after the comparison through
+all the levels specified by \f(CW\*(C`level\*(C'\fR, the comparison with code points
+will be performed.
+For the tie-breaking comparison, the sort key has code points
+of the original string appended.
+Completely ignorable characters are not ignored.
+.Sp
+If \f(CW\*(C`preprocess\*(C'\fR and/or \f(CW\*(C`normalization\*(C'\fR is applied, the code points
+of the string after them (in NFD by default) are used.
+.IP ignoreChar 4
+.IX Item "ignoreChar"
+.PD 0
+.IP ignoreName 4
+.IX Item "ignoreName"
+.PD
+\&\-\- see 3.6 Variable Weighting, UTS #10.
+.Sp
+Makes the entry in the table completely ignorable;
+i.e. as if the weights were zero at all levels.
+.Sp
+Through \f(CW\*(C`ignoreChar\*(C'\fR, any character matching \f(CW\*(C`qr/$ignoreChar/\*(C'\fR
+will be ignored. Through \f(CW\*(C`ignoreName\*(C'\fR, any character whose name
+(given in the \f(CW\*(C`table\*(C'\fR file as a comment) matches \f(CW\*(C`qr/$ignoreName/\*(C'\fR
+will be ignored.
+.Sp
+E.g. when 'a' and 'e' are ignorable,
+\&'element' is equal to 'lament' (or 'lmnt').
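+.Sp
+A sketch of the case just described, assuming the default table file is
+installed and readable (using \f(CW\*(C`ignoreChar\*(C'\fR makes the module read the table
+instead of the compiled DUCET). The pattern and variable names are
+illustrative only.
+.Sp
+.Vb 7
+\& use Unicode::Collate;
+\&
+\& # make the table entries for "a" and "e" completely ignorable
+\& my $coll = Unicode::Collate\->new(ignoreChar => qr/^[ae]$/);
+\&
+\& # both strings reduce to "lmnt" for collation purposes
+\& my $same = $coll\->eq("element", "lament");
+.Ve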
+.IP ignore_level2 4
+.IX Item "ignore_level2"
+\&\-\- see 5.1 Parametric Tailoring, UTS #10.
+.Sp
+By default, case-sensitive comparison (that is level 3 difference)
+won't ignore accents (that is level 2 difference).
+.Sp
+If the parameter is made true, accents (and other primary ignorable
+characters) are ignored, even though cases are taken into account.
+.Sp
+\&\fBNOTE\fR: \f(CW\*(C`level\*(C'\fR should be 3 or greater.
+.IP katakana_before_hiragana 4
+.IX Item "katakana_before_hiragana"
+\&\-\- see 7.2 Tertiary Weight Table, UTS #10.
+.Sp
+By default, hiragana is before katakana.
+If the parameter is made true, this is reversed.
+.Sp
+\&\fBNOTE\fR: This parameter simply assumes that any hiragana/katakana
+distinctions occur at level 3, and that their weights at level 3 are the
+same as those mentioned in 7.3.1, UTS #10.
+If you define collation elements that violate this requirement,
+this parameter does not work correctly.
+.IP level 4
+.IX Item "level"
+\&\-\- see 4.3 Form Sort Key, UTS #10.
+.Sp
+Set the maximum level.
+Any higher levels than the specified one are ignored.
+.Sp
+.Vb 4
+\& Level 1: alphabetic ordering
+\& Level 2: diacritic ordering
+\& Level 3: case ordering
+\& Level 4: tie\-breaking (e.g. in the case when variable is \*(Aqshifted\*(Aq)
+\&
+\& ex.level => 2,
+.Ve
+.Sp
+If omitted, the maximum is the 4th.
+.Sp
+\&\fBNOTE:\fR The DUCET includes weights over 0xFFFF at the 4th level.
+But this module only uses weights within 0xFFFF.
+When \f(CW\*(C`variable\*(C'\fR is 'blanked' or 'non\-ignorable' (i.e. other than 'shifted'
+and 'shift\-trimmed'), level 4 may be unreliable.
+.Sp
+See also \f(CW\*(C`identical\*(C'\fR.
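+.Sp
+As a small sketch of how the maximum level changes comparison results
+(the variable names are illustrative):
+.Sp
+.Vb 8
+\& use Unicode::Collate;
+\&
+\& my $primary = Unicode::Collate\->new(level => 1);
+\& my $default = Unicode::Collate\->new();
+\&
+\& # case is a level 3 difference, so it is not examined at level 1
+\& my $loose  = $primary\->eq("perl", "PERL");   # true
+\& my $strict = $default\->eq("perl", "PERL");   # false
+.Ve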
+.IP long_contraction 4
+.IX Item "long_contraction"
+\&\-\- see 3.8.2 Well-Formedness of the DUCET, 4.2 Produce Array, UTS #10.
+.Sp
+If the parameter is made true, for a contraction with three or more
+characters (here nicknamed "long contraction"), initial substrings
+will be handled.
+For example, a contraction ABC, where A is a starter, and B and C
+are non-starters (character with non-zero combining character class),
+will be detected even if there is not AB as a contraction.
+.Sp
+\&\fBDefault:\fR Usually false.
+If \f(CW\*(C`UCA_Version\*(C'\fR is 22 or 24, and the value of \f(CW\*(C`long_contraction\*(C'\fR
+is not specified in \f(CWnew()\fR, a true value is set implicitly.
+This is a workaround to pass Conformance Tests for Unicode 6.0.0 and 6.1.0.
+.Sp
+\&\f(CWchange()\fR handles \f(CW\*(C`long_contraction\*(C'\fR explicitly only.
+If \f(CW\*(C`long_contraction\*(C'\fR is not specified in \f(CWchange()\fR, even though
+\&\f(CW\*(C`UCA_Version\*(C'\fR is changed, \f(CW\*(C`long_contraction\*(C'\fR will not be changed.
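+.Sp
+The following sketch illustrates the point about \f(CWchange()\fR. The chosen
+revision numbers are only examples.
+.Sp
+.Vb 7
+\& use Unicode::Collate;
+\&
+\& # long_contraction is not given, so it is set to true implicitly
+\& my $coll = Unicode::Collate\->new(UCA_Version => 22);
+\&
+\& # change() does not adjust it implicitly; state it if it should change
+\& $coll\->change(UCA_Version => 43, long_contraction => 0);
+.Ve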
+.Sp
+\&\fBLimitation:\fR Scanning non-starters is one-way (no backtracking).
+If AB is found but ABC is not found, another long contraction where
+the first character is A and the second is not B may not be found.
+.Sp
+Under \f(CW\*(C`(normalization => undef)\*(C'\fR, the detection step for discontiguous
+contractions is skipped.
+.Sp
+\&\fBNote:\fR The following contractions in DUCET are not considered
+in steps S2.1.1 to S2.1.3, where they are discontiguous.
+.Sp
+.Vb 2
+\& 0FB2 0F71 0F80 (TIBETAN VOWEL SIGN VOCALIC RR)
+\& 0FB3 0F71 0F80 (TIBETAN VOWEL SIGN VOCALIC LL)
+.Ve
+.Sp
+For example \f(CW\*(C`TIBETAN VOWEL SIGN VOCALIC RR\*(C'\fR with \f(CW\*(C`COMBINING TILDE OVERLAY\*(C'\fR
+(\f(CW\*(C`U+0344\*(C'\fR) is \f(CW\*(C`0FB2 0344 0F71 0F80\*(C'\fR in NFD.
+In this case \f(CW\*(C`0FB2 0F80\*(C'\fR (\f(CW\*(C`TIBETAN VOWEL SIGN VOCALIC R\*(C'\fR) is detected,
+instead of \f(CW\*(C`0FB2 0F71 0F80\*(C'\fR.
+Inserted \f(CW0344\fR makes \f(CW\*(C`0FB2 0F71 0F80\*(C'\fR discontiguous and lack of
+contraction \f(CW\*(C`0FB2 0F71\*(C'\fR prohibits \f(CW\*(C`0FB2 0F71 0F80\*(C'\fR from being detected.
+.IP minimalFFFE 4
+.IX Item "minimalFFFE"
+\&\-\- see 1.1.1 U+FFFE, UTS #35 (LDML) Part 5: Collation.
+.Sp
+If the parameter is made true, \f(CW\*(C`U+FFFE\*(C'\fR has a minimal primary weight.
+The comparison between \f(CW"$a1\ex{FFFE}$a2"\fR and \f(CW"$b1\ex{FFFE}$b2"\fR
+first compares \f(CW$a1\fR and \f(CW$b1\fR at level 1, and
+then \f(CW$a2\fR and \f(CW$b2\fR at level 1, as follows.
+.Sp
+.Vb 12
+\& "ab\ex{FFFE}a"
+\& "Ab\ex{FFFE}a"
+\& "ab\ex{FFFE}c"
+\& "Ab\ex{FFFE}c"
+\& "ab\ex{FFFE}xyz"
+\& "abc\ex{FFFE}def"
+\& "abc\ex{FFFE}xYz"
+\& "aBc\ex{FFFE}xyz"
+\& "abcX\ex{FFFE}def"
+\& "abcx\ex{FFFE}xyz"
+\& "b\ex{FFFE}aaa"
+\& "bbb\ex{FFFE}a"
+.Ve
+.Sp
+Note:
+This is equivalent to \f(CW\*(C`(entry => \*(AqFFFE ; [.0001.0020.0005.FFFE]\*(Aq)\*(C'\fR.
+Any other character than \f(CW\*(C`U+FFFE\*(C'\fR can be tailored by \f(CW\*(C`entry\*(C'\fR.
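+.Sp
+One possible use is sorting records on several fields at once by joining
+the fields with \f(CW\*(C`U+FFFE\*(C'\fR. The sketch below assumes hypothetical
+two-field records; the data and variable names are made up for illustration.
+.Sp
+.Vb 11
+\& use Unicode::Collate;
+\&
+\& my $coll = Unicode::Collate\->new(minimalFFFE => 1);
+\&
+\& # hypothetical records: [family name, given name]
+\& my @people = (["Smith", "Bob"], ["Smithe", "Al"], ["Smith", "Ann"]);
+\&
+\& # build one merged key per record, sort on it, then recover the records
+\& my @sorted = map  { $_\->[1] }
+\&              sort { $coll\->cmp($a\->[0], $b\->[0]) }
+\&              map  { [ join("\ex{FFFE}", @$_), $_ ] } @people;
+.Ve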
+.IP normalization 4
+.IX Item "normalization"
+\&\-\- see 4.1 Normalize, UTS #10.
+.Sp
+If specified, strings are normalized before preparation of sort keys
+(the normalization is executed after preprocess).
+.Sp
+A form name \f(CWUnicode::Normalize::normalize()\fR accepts will be applied
+as \f(CW$normalization_form\fR.
+Acceptable names include \f(CW\*(AqNFD\*(Aq\fR, \f(CW\*(AqNFC\*(Aq\fR, \f(CW\*(AqNFKD\*(Aq\fR, and \f(CW\*(AqNFKC\*(Aq\fR.
+See \f(CWUnicode::Normalize::normalize()\fR for detail.
+If omitted, \f(CW\*(AqNFD\*(Aq\fR is used.
+.Sp
+\&\f(CW\*(C`normalization\*(C'\fR is performed after \f(CW\*(C`preprocess\*(C'\fR (if defined).
+.Sp
+Furthermore, special values, \f(CW\*(C`undef\*(C'\fR and \f(CW"prenormalized"\fR, can be used,
+though they are not concerned with \f(CWUnicode::Normalize::normalize()\fR.
+.Sp
+If \f(CW\*(C`undef\*(C'\fR (not a string \f(CW"undef"\fR) is passed explicitly
+as the value for this key,
+no normalization is carried out (this may make tailoring easier
+if normalization is not desired). Under \f(CW\*(C`(normalization => undef)\*(C'\fR,
+only contiguous contractions are resolved;
+e.g. even if \f(CW\*(C`A\-ring\*(C'\fR (and \f(CW\*(C`A\-ring\-cedilla\*(C'\fR) is ordered after \f(CW\*(C`Z\*(C'\fR,
+\&\f(CW\*(C`A\-cedilla\-ring\*(C'\fR would be primary equal to \f(CW\*(C`A\*(C'\fR.
+In this respect,
+\&\f(CW\*(C`(normalization => undef, preprocess => sub { NFD(shift) })\*(C'\fR
+\&\fBis not\fR equivalent to \f(CW\*(C`(normalization => \*(AqNFD\*(Aq)\*(C'\fR.
+.Sp
+In the case of \f(CW\*(C`(normalization => "prenormalized")\*(C'\fR,
+no normalization is performed, but
+discontiguous contractions with combining characters are still detected.
+Therefore
+\&\f(CW\*(C`(normalization => \*(Aqprenormalized\*(Aq, preprocess => sub { NFD(shift) })\*(C'\fR
+\&\fBis\fR equivalent to \f(CW\*(C`(normalization => \*(AqNFD\*(Aq)\*(C'\fR.
+If source strings are already properly normalized,
+\&\f(CW\*(C`(normalization => \*(Aqprenormalized\*(Aq)\*(C'\fR may save the time spent on normalization.
+.Sp
+Except under \f(CW\*(C`(normalization => undef)\*(C'\fR,
+\&\fBUnicode::Normalize\fR is required (see also \fBCAVEAT\fR).
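+.Sp
+The \f(CW"prenormalized"\fR combination mentioned above could be written out as
+in this sketch. Importing \f(CW\*(C`NFD\*(C'\fR from \fBUnicode::Normalize\fR is an
+assumption about how the function is made available in the calling code.
+.Sp
+.Vb 8
+\& use Unicode::Collate;
+\& use Unicode::Normalize qw(NFD);
+\&
+\& # normalize in preprocess, then declare the input as already normalized
+\& my $coll = Unicode::Collate\->new(
+\&     normalization => \*(Aqprenormalized\*(Aq,
+\&     preprocess    => sub { NFD(shift) },
+\& );
+.Ve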
+.IP overrideCJK 4
+.IX Item "overrideCJK"
+\&\-\- see 7.1 Derived Collation Elements, UTS #10.
+.Sp
+By default, CJK unified ideographs are ordered in Unicode codepoint
+order, but those in the CJK Unified Ideographs block are less than
+those in the CJK Unified Ideographs Extension A etc.
+.Sp
+.Vb 10
+\& In the CJK Unified Ideographs block:
+\& U+4E00..U+9FA5 if UCA_Version is 8, 9 or 11.
+\& U+4E00..U+9FBB if UCA_Version is 14 or 16.
+\& U+4E00..U+9FC3 if UCA_Version is 18.
+\& U+4E00..U+9FCB if UCA_Version is 20 or 22.
+\& U+4E00..U+9FCC if UCA_Version is 24 to 30.
+\& U+4E00..U+9FD5 if UCA_Version is 32 or 34.
+\& U+4E00..U+9FEA if UCA_Version is 36.
+\& U+4E00..U+9FEF if UCA_Version is 38, 40 or 41.
+\& U+4E00..U+9FFC if UCA_Version is 43.
+\&
+\& In the CJK Unified Ideographs Extension blocks:
+\& Ext.A (U+3400..U+4DB5) if UCA_Version is 8 to 41.
+\& Ext.A (U+3400..U+4DBF) if UCA_Version is 43.
+\& Ext.B (U+20000..U+2A6D6) if UCA_Version is 8 to 41.
+\& Ext.B (U+20000..U+2A6DD) if UCA_Version is 43.
+\& Ext.C (U+2A700..U+2B734) if UCA_Version is 20 or later.
+\& Ext.D (U+2B740..U+2B81D) if UCA_Version is 22 or later.
+\& Ext.E (U+2B820..U+2CEA1) if UCA_Version is 32 or later.
+\& Ext.F (U+2CEB0..U+2EBE0) if UCA_Version is 36 or later.
+\& Ext.G (U+30000..U+3134A) if UCA_Version is 43.
+.Ve
+.Sp
+Through \f(CW\*(C`overrideCJK\*(C'\fR, ordering of CJK unified ideographs (including
+extensions) can be overridden.
+.Sp
+ex. CJK unified ideographs in the JIS code point order.
+.Sp
+.Vb 7
+\& overrideCJK => sub {
+\&     my $u = shift;             # get a Unicode codepoint
+\&     my $b = pack(\*(Aqn\*(Aq, $u);   # to UTF\-16BE
+\&     my $s = your_unicode_to_sjis_converter($b); # convert
+\&     my $n = unpack(\*(Aqn\*(Aq, $s); # convert sjis to short
+\&     [ $n, 0x20, 0x2, $u ];     # return the collation element
+\& },
+.Ve
+.Sp
+The return value may be an arrayref of 1st to 4th weights as shown
+above. The return value may be an integer as the primary weight
+as shown below. If \f(CW\*(C`undef\*(C'\fR is returned, the default derived
+collation element will be used.
+.Sp
+.Vb 7
+\& overrideCJK => sub {
+\&     my $u = shift;             # get a Unicode codepoint
+\&     my $b = pack(\*(Aqn\*(Aq, $u);   # to UTF\-16BE
+\&     my $s = your_unicode_to_sjis_converter($b); # convert
+\&     my $n = unpack(\*(Aqn\*(Aq, $s); # convert sjis to short
+\&     return $n;                 # return the primary weight
+\& },
+.Ve
+.Sp
+The return value may be a list containing zero or more of
+an arrayref, an integer, or \f(CW\*(C`undef\*(C'\fR.
+.Sp
+ex. ignores all CJK unified ideographs.
+.Sp
+.Vb 1
+\& overrideCJK => sub {()}, # CODEREF returning empty list
+\&
+\& # where \->eq("Pe\ex{4E00}rl", "Perl") is true
+\& # as U+4E00 is a CJK unified ideograph and to be ignorable.
+.Ve
+.Sp
+If a false value (including \f(CW\*(C`undef\*(C'\fR) is passed, \f(CW\*(C`overrideCJK\*(C'\fR
+has no effect.
+\&\f(CW\*(C`$Collator\->change(overrideCJK => 0)\*(C'\fR resets the old one.
+.Sp
+But assignment of weight for CJK unified ideographs
+in \f(CW\*(C`table\*(C'\fR or \f(CW\*(C`entry\*(C'\fR is still valid.
+If \f(CW\*(C`undef\*(C'\fR is passed explicitly as the value for this key,
+weights for CJK unified ideographs are treated as undefined.
+However when \f(CW\*(C`UCA_Version\*(C'\fR > 8, \f(CW\*(C`(overrideCJK => undef)\*(C'\fR
+has no special meaning.
+.Sp
+\&\fBNote:\fR In addition to them, 12 CJK compatibility ideographs (\f(CW\*(C`U+FA0E\*(C'\fR,
+\&\f(CW\*(C`U+FA0F\*(C'\fR, \f(CW\*(C`U+FA11\*(C'\fR, \f(CW\*(C`U+FA13\*(C'\fR, \f(CW\*(C`U+FA14\*(C'\fR, \f(CW\*(C`U+FA1F\*(C'\fR, \f(CW\*(C`U+FA21\*(C'\fR, \f(CW\*(C`U+FA23\*(C'\fR,
+\&\f(CW\*(C`U+FA24\*(C'\fR, \f(CW\*(C`U+FA27\*(C'\fR, \f(CW\*(C`U+FA28\*(C'\fR, \f(CW\*(C`U+FA29\*(C'\fR) are also treated as CJK unified
+ideographs. But they can't be overridden via \f(CW\*(C`overrideCJK\*(C'\fR when you use
+DUCET, as the table includes weights for them. \f(CW\*(C`table\*(C'\fR or \f(CW\*(C`entry\*(C'\fR has
+priority over \f(CW\*(C`overrideCJK\*(C'\fR.
+.IP overrideHangul 4
+.IX Item "overrideHangul"
+\&\-\- see 7.1 Derived Collation Elements, UTS #10.
+.Sp
+By default, Hangul syllables are decomposed into Hangul Jamo,
+even if \f(CW\*(C`(normalization => undef)\*(C'\fR.
+But the mapping of Hangul syllables may be overridden.
+.Sp
+This parameter works like \f(CW\*(C`overrideCJK\*(C'\fR, so see there for examples.
+.Sp
+If you want to override the mapping of Hangul syllables,
+NFD and NFKD are not appropriate, since NFD and NFKD will decompose
+Hangul syllables before overriding. FCD may decompose Hangul syllables
+as the case may be.
+.Sp
+If a false value (but not \f(CW\*(C`undef\*(C'\fR) is passed, \f(CW\*(C`overrideHangul\*(C'\fR
+has no effect.
+\&\f(CW\*(C`$Collator\->change(overrideHangul => 0)\*(C'\fR resets the old one.
+.Sp
+If \f(CW\*(C`undef\*(C'\fR is passed explicitly as the value for this key,
+weight for Hangul syllables is treated as undefined
+without decomposition into Hangul Jamo.
+But definition of weight for Hangul syllables
+in \f(CW\*(C`table\*(C'\fR or \f(CW\*(C`entry\*(C'\fR is still valid.
+.IP overrideOut 4
+.IX Item "overrideOut"
+\&\-\- see 7.1.1 Handling Ill-Formed Code Unit Sequences, UTS #10.
+.Sp
+Perl seems to allow out-of-range values (greater than 0x10FFFF).
+By default, out-of-range values are replaced with \f(CW\*(C`U+FFFD\*(C'\fR
+(REPLACEMENT CHARACTER) when \f(CW\*(C`UCA_Version\*(C'\fR >= 22,
+or ignored when \f(CW\*(C`UCA_Version\*(C'\fR <= 20.
+.Sp
+When \f(CW\*(C`UCA_Version\*(C'\fR >= 22, the weights of out-of-range values
+can be overridden. Though \f(CW\*(C`table\*(C'\fR or \f(CW\*(C`entry\*(C'\fR could be used for them,
+there are too many out-of-range values to list individually.
+.Sp
+\&\f(CW\*(C`overrideOut\*(C'\fR can assign their weights algorithmically.
+This parameter works like \f(CW\*(C`overrideCJK\*(C'\fR, so see there for examples.
+.Sp
+ex. ignores all out-of-range values.
+.Sp
+.Vb 1
+\& overrideOut => sub {()}, # CODEREF returning empty list
+.Ve
+.Sp
+If a false value (including \f(CW\*(C`undef\*(C'\fR) is passed, \f(CW\*(C`overrideOut\*(C'\fR
+has no effect.
+\&\f(CW\*(C`$Collator\->change(overrideOut => 0)\*(C'\fR resets the old one.
+.Sp
+\&\fBNOTE ABOUT U+FFFD:\fR
+.Sp
+UCA recommends that out-of-range values should not be ignored for security
+reasons. Say, \f(CW"pe\ex{110000}rl"\fR should not be equal to \f(CW"perl"\fR.
+However, \f(CW\*(C`U+FFFD\*(C'\fR is wrongly mapped to a variable collation element
+in DUCET for Unicode 6.0.0 to 6.2.0, which means out-of-range values will be
+ignored when \f(CW\*(C`variable\*(C'\fR isn't \f(CW\*(C`Non\-ignorable\*(C'\fR.
+.Sp
+The mapping of \f(CW\*(C`U+FFFD\*(C'\fR is corrected in Unicode 6.3.0.
+See <http://www.unicode.org/reports/tr10/tr10\-28.html#Trailing_Weights>
+(7.1.4 Trailing Weights). This correction can be reproduced as follows.
+.Sp
+.Vb 1
+\& overrideOut => sub { 0xFFFD }, # CODEREF returning a very large integer
+.Ve
+.Sp
+This workaround has been unnecessary since Unicode 6.3.0.
+.IP preprocess 4
+.IX Item "preprocess"
+\&\-\- see 5.4 Preprocessing, UTS #10.
+.Sp
+If specified, the coderef is used to preprocess each string
+before the formation of sort keys.
+.Sp
+ex. dropping English articles, such as "a" or "the".
+Then, "the pen" is before "a pencil".
+.Sp
+.Vb 5
+\& preprocess => sub {
+\&     my $str = shift;
+\&     $str =~ s/\eb(?:an?|the)\es+//gi;
+\&     return $str;
+\& },
+.Ve
+.Sp
+\&\f(CW\*(C`preprocess\*(C'\fR is performed before \f(CW\*(C`normalization\*(C'\fR (if defined).
+.Sp
+ex. decoding strings in a legacy encoding such as shift-jis:
+.Sp
+.Vb 4
+\& $sjis_collator = Unicode::Collate\->new(
+\& preprocess => \e&your_shiftjis_to_unicode_decoder,
+\& );
+\& @result = $sjis_collator\->sort(@shiftjis_strings);
+.Ve
+.Sp
+\&\fBNote:\fR Strings returned from the coderef will be interpreted
+according to Perl's Unicode support. See perlunicode,
+perluniintro, perlunitut, perlunifaq, utf8.
+.IP rearrange 4
+.IX Item "rearrange"
+\&\-\- see 3.5 Rearrangement, UTS #10.
+.Sp
+Specifies characters that are not coded in logical order and are to be rearranged.
+If \f(CW\*(C`UCA_Version\*(C'\fR is equal to or less than 11, default is:
+.Sp
+.Vb 1
+\& rearrange => [ 0x0E40..0x0E44, 0x0EC0..0x0EC4 ],
+.Ve
+.Sp
+If you want to disallow any rearrangement, pass \f(CW\*(C`undef\*(C'\fR or \f(CW\*(C`[]\*(C'\fR
+(a reference to empty list) as the value for this key.
+.Sp
+If \f(CW\*(C`UCA_Version\*(C'\fR is equal to or greater than 14, default is \f(CW\*(C`[]\*(C'\fR
+(i.e. no rearrangement).
+.Sp
+\&\fBAccording to version 9 of UCA, this parameter shall not be used;
+but no warning is issued at present.\fR
+.IP rewrite 4
+.IX Item "rewrite"
+If specified, the coderef is used to rewrite lines in \f(CW\*(C`table\*(C'\fR or \f(CW\*(C`entry\*(C'\fR.
+The coderef will get each line, and then should return a rewritten line
+according to the UCA file format.
+If the coderef returns an empty line, the line will be skipped.
+.Sp
+e.g. turning any primary ignorable characters into tertiary ignorable ones:
+.Sp
+.Vb 5
+\& rewrite => sub {
+\&     my $line = shift;
+\&     $line =~ s/\e[\e.0000\e..{4}\e..{4}\e./[.0000.0000.0000./g;
+\&     return $line;
+\& },
+.Ve
+.Sp
+This example shows rewriting weights. \f(CW\*(C`rewrite\*(C'\fR is allowed to
+affect code points, weights, and the name.
+.Sp
+\&\fBNOTE\fR: \f(CW\*(C`table\*(C'\fR is available to use another table file;
+preparing a modified table once would be more efficient than
+rewriting lines on reading an unmodified table every time.
+.IP suppress 4
+.IX Item "suppress"
+\&\-\- see 3.12 Special-Purpose Commands, UTS #35 (LDML) Part 5: Collation.
+.Sp
+Contractions beginning with the specified characters are suppressed,
+even if those contractions are defined in \f(CW\*(C`table\*(C'\fR.
+.Sp
+An example for Russian and some languages using the Cyrillic script:
+.Sp
+.Vb 1
+\& suppress => [0x0400..0x0417, 0x041A..0x0437, 0x043A..0x045F],
+.Ve
+.Sp
+where 0x0400 stands for \f(CW\*(C`U+0400\*(C'\fR, CYRILLIC CAPITAL LETTER IE WITH GRAVE.
+.Sp
+\&\fBNOTE\fR: Contractions via \f(CW\*(C`entry\*(C'\fR will not be suppressed.
+.IP table 4
+.IX Item "table"
+\&\-\- see 3.8 Default Unicode Collation Element Table, UTS #10.
+.Sp
+You can use another collation element table if desired.
+.Sp
+The table file should be located in the \fIUnicode/Collate\fR directory
+on \f(CW@INC\fR. Say, if the filename is \fIFoo.txt\fR,
+the table file is searched as \fIUnicode/Collate/Foo.txt\fR in \f(CW@INC\fR.
+.Sp
+By default, \fIallkeys.txt\fR (as the filename of DUCET) is used.
+If you prepare your own table file, a name other than \fIallkeys.txt\fR
+is preferable, to avoid a namespace conflict.
+.Sp
+\&\fBNOTE\fR: When XSUB is used, the DUCET is compiled when this
+module is built, which may save time at run time.
+Explicitly specifying \f(CW\*(C`(table => \*(Aqallkeys.txt\*(Aq)\*(C'\fR, or using another table,
+or using \f(CW\*(C`ignoreChar\*(C'\fR, \f(CW\*(C`ignoreName\*(C'\fR, \f(CW\*(C`undefChar\*(C'\fR, \f(CW\*(C`undefName\*(C'\fR or
+\&\f(CW\*(C`rewrite\*(C'\fR will prevent this module from using the compiled DUCET.
+.Sp
+If \f(CW\*(C`undef\*(C'\fR is passed explicitly as the value for this key,
+no file is read (but you can define collation elements via \f(CW\*(C`entry\*(C'\fR).
+.Sp
+A typical way to define a collation element table
+without any table file:
+.Sp
+.Vb 11
+\& $onlyABC = Unicode::Collate\->new(
+\& table => undef,
+\& entry => << \*(AqENTRIES\*(Aq,
+\&0061 ; [.0101.0020.0002.0061] # LATIN SMALL LETTER A
+\&0041 ; [.0101.0020.0008.0041] # LATIN CAPITAL LETTER A
+\&0062 ; [.0102.0020.0002.0062] # LATIN SMALL LETTER B
+\&0042 ; [.0102.0020.0008.0042] # LATIN CAPITAL LETTER B
+\&0063 ; [.0103.0020.0002.0063] # LATIN SMALL LETTER C
+\&0043 ; [.0103.0020.0008.0043] # LATIN CAPITAL LETTER C
+\&ENTRIES
+\& );
+.Ve
+.Sp
+If \f(CW\*(C`ignoreName\*(C'\fR or \f(CW\*(C`undefName\*(C'\fR is used, character names should be
+specified as a comment (following \f(CW\*(C`#\*(C'\fR) on each line.
+.IP undefChar 4
+.IX Item "undefChar"
+.PD 0
+.IP undefName 4
+.IX Item "undefName"
+.PD
+\&\-\- see 6.3.3 Reducing the Repertoire, UTS #10.
+.Sp
+Undefines the collation element as if it were unassigned in the \f(CW\*(C`table\*(C'\fR.
+This reduces the size of the table.
+If an unassigned character appears in the string to be collated,
+the sort key is made from its codepoint
+as a single-character collation element,
+as it is greater than any other assigned collation elements
+(in the codepoint order among the unassigned characters).
+However, it may be better to ignore characters that are
+unfamiliar to you and perhaps never used.
+.Sp
+Through \f(CW\*(C`undefChar\*(C'\fR, any character matching \f(CW\*(C`qr/$undefChar/\*(C'\fR
+will be undefined. Through \f(CW\*(C`undefName\*(C'\fR, any character whose name
+(given in the \f(CW\*(C`table\*(C'\fR file as a comment) matches \f(CW\*(C`qr/$undefName/\*(C'\fR
+will be undefined.
+.Sp
+e.g. Collation weights for characters beyond the BMP are not stored in the object:
+.Sp
+.Vb 1
+\& undefChar => qr/[^\e0\-\ex{fffd}]/,
+.Ve
+.IP upper_before_lower 4
+.IX Item "upper_before_lower"
+\&\-\- see 6.6 Case Comparisons, UTS #10.
+.Sp
+By default, lowercase is before uppercase.
+If the parameter is made true, this is reversed.
+.Sp
+\&\fBNOTE\fR: This parameter simplemindedly assumes that any lowercase/uppercase
+distinctions must occur at level 3, and that their weights at level 3 must be
+the same as those mentioned in 7.3.1, UTS #10.
+If you define collation elements that differ from this requirement,
+this parameter does not work correctly.
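+.Sp
+As a minimal sketch (the names \f(CW$lower_first\fR and \f(CW$upper_first\fR are illustrative,
+and the bundled DUCET is assumed to be available), the effect on two strings
+that differ only in case can be checked as follows:
+.Sp
+.Vb 1
+\& use Unicode::Collate;
+\&
+\& my $lower_first = Unicode::Collate\->new();
+\& my $upper_first = Unicode::Collate\->new(upper_before_lower => 1);
+\&
+\& print join(" ", $lower_first\->sort("Perl", "perl")), "\en"; # perl Perl
+\& print join(" ", $upper_first\->sort("Perl", "perl")), "\en"; # Perl perl
+.Ve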
+.IP variable 4
+.IX Item "variable"
+\&\-\- see 3.6 Variable Weighting, UTS #10.
+.Sp
+This key allows for variable weighting of variable collation elements,
+which are marked with an ASTERISK in the table
+(NOTE: Many punctuation marks and symbols are variable in \fIallkeys.txt\fR).
+.Sp
+.Vb 1
+\& variable => \*(Aqblanked\*(Aq, \*(Aqnon\-ignorable\*(Aq, \*(Aqshifted\*(Aq, or \*(Aqshift\-trimmed\*(Aq.
+.Ve
+.Sp
+These names are case-insensitive.
+By default (if specification is omitted), 'shifted' is adopted.
+.Sp
+.Vb 2
+\& \*(AqBlanked\*(Aq Variable elements are made ignorable at levels 1 through 3;
+\& considered at the 4th level.
+\&
+\& \*(AqNon\-Ignorable\*(Aq Variable elements are not reset to ignorable.
+\&
+\& \*(AqShifted\*(Aq Variable elements are made ignorable at levels 1 through 3;
+\& their level 4 weight is replaced by the old level 1 weight.
+\& Level 4 weight for Non\-Variable elements is 0xFFFF.
+\&
+\& \*(AqShift\-Trimmed\*(Aq Same as \*(Aqshifted\*(Aq, but all FFFF\*(Aqs at the 4th level
+\& are trimmed.
+.Ve
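+.PP
+A minimal sketch of how two of these options differ, assuming the DUCET is in use
+(where HYPHEN-MINUS is a variable element); the names \f(CW$shifted\fR and \f(CW$non_ign\fR
+are illustrative:
+.PP
+.Vb 1
+\& use Unicode::Collate;
+\&
+\& # variable => \*(Aqshifted\*(Aq is the default.
+\& my $shifted = Unicode::Collate\->new(level => 3);
+\& my $non_ign = Unicode::Collate\->new(level => 3, variable => \*(Aqnon\-ignorable\*(Aq);
+\&
+\& # When shifted, hyphens are ignorable at levels 1 through 3.
+\& print $shifted\->eq("camel", "c\-a\-m\-e\-l") ? "equal" : "different", "\en"; # equal
+\& print $non_ign\->eq("camel", "c\-a\-m\-e\-l") ? "equal" : "different", "\en"; # different
+.Ve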
+.SS "Methods for Collation"
+.IX Subsection "Methods for Collation"
+.ie n .IP """@sorted = $Collator\->sort(@not_sorted)""" 4
+.el .IP "\f(CW@sorted = $Collator\->sort(@not_sorted)\fR" 4
+.IX Item "@sorted = $Collator->sort(@not_sorted)"
+Sorts a list of strings.
+.ie n .IP """$result = $Collator\->cmp($a, $b)""" 4
+.el .IP "\f(CW$result = $Collator\->cmp($a, $b)\fR" 4
+.IX Item "$result = $Collator->cmp($a, $b)"
+Returns 1 (when \f(CW$a\fR is greater than \f(CW$b\fR)
+or 0 (when \f(CW$a\fR is equal to \f(CW$b\fR)
+or \-1 (when \f(CW$a\fR is less than \f(CW$b\fR).
+.ie n .IP """$result = $Collator\->eq($a, $b)""" 4
+.el .IP "\f(CW$result = $Collator\->eq($a, $b)\fR" 4
+.IX Item "$result = $Collator->eq($a, $b)"
+.PD 0
+.ie n .IP """$result = $Collator\->ne($a, $b)""" 4
+.el .IP "\f(CW$result = $Collator\->ne($a, $b)\fR" 4
+.IX Item "$result = $Collator->ne($a, $b)"
+.ie n .IP """$result = $Collator\->lt($a, $b)""" 4
+.el .IP "\f(CW$result = $Collator\->lt($a, $b)\fR" 4
+.IX Item "$result = $Collator->lt($a, $b)"
+.ie n .IP """$result = $Collator\->le($a, $b)""" 4
+.el .IP "\f(CW$result = $Collator\->le($a, $b)\fR" 4
+.IX Item "$result = $Collator->le($a, $b)"
+.ie n .IP """$result = $Collator\->gt($a, $b)""" 4
+.el .IP "\f(CW$result = $Collator\->gt($a, $b)\fR" 4
+.IX Item "$result = $Collator->gt($a, $b)"
+.ie n .IP """$result = $Collator\->ge($a, $b)""" 4
+.el .IP "\f(CW$result = $Collator\->ge($a, $b)\fR" 4
+.IX Item "$result = $Collator->ge($a, $b)"
+.PD
+They work like the operators of the same names.
+.Sp
+.Vb 6
+\& eq : whether $a is equal to $b.
+\& ne : whether $a is not equal to $b.
+\& lt : whether $a is less than $b.
+\& le : whether $a is less than $b or equal to $b.
+\& gt : whether $a is greater than $b.
+\& ge : whether $a is greater than $b or equal to $b.
+.Ve
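+.Sp
+A short sketch of \f(CW\*(C`sort\*(C'\fR, \f(CW\*(C`cmp\*(C'\fR and \f(CW\*(C`eq\*(C'\fR used together
+(the strings are illustrative, and the bundled DUCET is assumed to be available):
+.Sp
+.Vb 1
+\& use Unicode::Collate;
+\&
+\& my $Collator = Unicode::Collate\->new();
+\&
+\& # The letters differ at the primary level, so case does not decide the order.
+\& my @sorted = $Collator\->sort("cherry", "banana", "Apple");
+\& print "@sorted\en"; # Apple banana cherry
+\&
+\& print $Collator\->cmp("banana", "cherry"), "\en"; # \-1
+\& print $Collator\->eq("perl", "PERL") ? "equal" : "different", "\en"; # different
+.Ve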
+.ie n .IP """$sortKey = $Collator\->getSortKey($string)""" 4
+.el .IP "\f(CW$sortKey = $Collator\->getSortKey($string)\fR" 4
+.IX Item "$sortKey = $Collator->getSortKey($string)"
+\&\-\- see 4.3 Form Sort Key, UTS #10.
+.Sp
+Returns a sort key.
+.Sp
+Comparing two sort keys with a binary comparison gives the same result
+as comparing the original strings using the UCA.
+.Sp
+.Vb 1
+\& $Collator\->getSortKey($a) cmp $Collator\->getSortKey($b)
+\&
+\& is equivalent to
+\&
+\& $Collator\->cmp($a, $b)
+.Ve
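+.Sp
+Because a sort key needs to be computed only once per string, a long list may be
+sorted by precomputing the keys and comparing them with the built-in \f(CW\*(C`cmp\*(C'\fR.
+A minimal sketch (the array \f(CW@words\fR is illustrative):
+.Sp
+.Vb 1
+\& use Unicode::Collate;
+\&
+\& my $Collator = Unicode::Collate\->new();
+\& my @words = ("cherry", "banana", "Apple");
+\&
+\& # Compute each sort key once, sort on the keys, then recover the strings.
+\& my @sorted = map { $_\->[1] }
+\&              sort { $a\->[0] cmp $b\->[0] }
+\&              map { [ $Collator\->getSortKey($_), $_ ] } @words;
+\&
+\& print "@sorted\en"; # Apple banana cherry
+.Ve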
+.ie n .IP """$sortKeyForm = $Collator\->viewSortKey($string)""" 4
+.el .IP "\f(CW$sortKeyForm = $Collator\->viewSortKey($string)\fR" 4
+.IX Item "$sortKeyForm = $Collator->viewSortKey($string)"
+Returns the sort key of \f(CW$string\fR in a human-readable form.
+If \f(CW\*(C`UCA_Version\*(C'\fR is 8, the output is slightly different.
+.Sp
+.Vb 3
+\& use Unicode::Collate;
+\& my $c = Unicode::Collate\->new();
+\& print $c\->viewSortKey("Perl"),"\en";
+\&
+\& # output:
+\& # [0B67 0A65 0B7F 0B03 | 0020 0020 0020 0020 | 0008 0002 0002 0002 | FFFF FFFF FFFF FFFF]
+\& # Level 1 Level 2 Level 3 Level 4
+.Ve
+.SS "Methods for Searching"
+.IX Subsection "Methods for Searching"
+The \f(CW\*(C`match\*(C'\fR, \f(CW\*(C`gmatch\*(C'\fR, \f(CW\*(C`subst\*(C'\fR, \f(CW\*(C`gsubst\*(C'\fR methods work
+like \f(CW\*(C`m//\*(C'\fR, \f(CW\*(C`m//g\*(C'\fR, \f(CW\*(C`s///\*(C'\fR, \f(CW\*(C`s///g\*(C'\fR, respectively,
+except that they match only a literal substring, not a pattern.
+.PP
+\&\fBDISCLAIMER:\fR If the \f(CW\*(C`preprocess\*(C'\fR or \f(CW\*(C`normalization\*(C'\fR parameter is true
+for \f(CW$Collator\fR, calling these methods (\f(CW\*(C`index\*(C'\fR, \f(CW\*(C`match\*(C'\fR, \f(CW\*(C`gmatch\*(C'\fR,
+\&\f(CW\*(C`subst\*(C'\fR, \f(CW\*(C`gsubst\*(C'\fR) croaks, since the position and the length might
+differ from those in the specified string.
+.PP
+The \f(CW\*(C`rearrange\*(C'\fR and \f(CW\*(C`hangul_terminator\*(C'\fR parameters are ignored.
+\&\f(CW\*(C`katakana_before_hiragana\*(C'\fR and \f(CW\*(C`upper_before_lower\*(C'\fR do not affect
+matching and searching, since only equality matters there, not which string is greater or less.
+.ie n .IP """$position = $Collator\->index($string, $substring[, $position])""" 4
+.el .IP "\f(CW$position = $Collator\->index($string, $substring[, $position])\fR" 4
+.IX Item "$position = $Collator->index($string, $substring[, $position])"
+.PD 0
+.ie n .IP """($position, $length) = $Collator\->index($string, $substring[, $position])""" 4
+.el .IP "\f(CW($position, $length) = $Collator\->index($string, $substring[, $position])\fR" 4
+.IX Item "($position, $length) = $Collator->index($string, $substring[, $position])"
+.PD
+If \f(CW$substring\fR matches a part of \f(CW$string\fR, returns
+the position of the first occurrence of the matching part in scalar context;
+in list context, returns a two-element list of
+the position and the length of the matching part.
+.Sp
+If \f(CW$substring\fR does not match any part of \f(CW$string\fR,
+returns \f(CW\-1\fR in scalar context and
+an empty list in list context.
+.Sp
+e.g. when the content of \f(CW$str\fR is \f(CW\*(C`"Ich mu\*(C'\fRß\f(CW\*(C` studieren Perl."\*(C'\fR,
+you say the following where \f(CW$sub\fR is \f(CW\*(C`"M\*(C'\fRü\f(CW\*(C`SS"\*(C'\fR,
+.Sp
+.Vb 6
+\& my $Collator = Unicode::Collate\->new( normalization => undef, level => 1 );
+\& # (normalization => undef) is REQUIRED.
+\& my $match;
+\& if (my($pos,$len) = $Collator\->index($str, $sub)) {
+\& $match = substr($str, $pos, $len);
+\& }
+.Ve
+.Sp
+and get \f(CW\*(C`"mu\*(C'\fRß\f(CW\*(C`"\*(C'\fR in \f(CW$match\fR, since \f(CW\*(C`"mu\*(C'\fRß\f(CW\*(C`"\*(C'\fR
+is equal to \f(CW\*(C`"M\*(C'\fRü\f(CW\*(C`SS"\*(C'\fR at the primary level.
+.ie n .IP """$match_ref = $Collator\->match($string, $substring)""" 4
+.el .IP "\f(CW$match_ref = $Collator\->match($string, $substring)\fR" 4
+.IX Item "$match_ref = $Collator->match($string, $substring)"
+.PD 0
+.ie n .IP """($match) = $Collator\->match($string, $substring)""" 4
+.el .IP "\f(CW($match) = $Collator\->match($string, $substring)\fR" 4
+.IX Item "($match) = $Collator->match($string, $substring)"
+.PD
+If \f(CW$substring\fR matches a part of \f(CW$string\fR, in scalar context, returns
+\&\fBa reference to\fR the first occurrence of the matching part
+(\f(CW$match_ref\fR is always true on a match,
+since every reference is \fBtrue\fR);
+in list context, returns the first occurrence of the matching part.
+.Sp
+If \f(CW$substring\fR does not match any part of \f(CW$string\fR,
+returns \f(CW\*(C`undef\*(C'\fR in scalar context and
+an empty list in list context.
+.Sp
+e.g.
+.Sp
+.Vb 5
+\& if ($match_ref = $Collator\->match($str, $sub)) { # scalar context
+\& print "matches [$$match_ref].\en";
+\& } else {
+\& print "doesn\*(Aqt match.\en";
+\& }
+\&
+\& or
+\&
+\& if (($match) = $Collator\->match($str, $sub)) { # list context
+\& print "matches [$match].\en";
+\& } else {
+\& print "doesn\*(Aqt match.\en";
+\& }
+.Ve
+.ie n .IP """@match = $Collator\->gmatch($string, $substring)""" 4
+.el .IP "\f(CW@match = $Collator\->gmatch($string, $substring)\fR" 4
+.IX Item "@match = $Collator->gmatch($string, $substring)"
+If \f(CW$substring\fR matches a part of \f(CW$string\fR, returns
+all the matching parts (or matching count in scalar context).
+.Sp
+If \f(CW$substring\fR does not match any part of \f(CW$string\fR,
+returns an empty list.
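+.Sp
+A minimal sketch of both contexts (the string and variable names are illustrative):
+.Sp
+.Vb 4
+\& use Unicode::Collate;
+\& my $Collator = Unicode::Collate\->new( normalization => undef, level => 1 );
+\& # (normalization => undef) is REQUIRED.
+\& my $str = "Camel donkey zebra CAMEL horse";
+\&
+\& my @match = $Collator\->gmatch($str, "camel"); # list context
+\& print "@match\en"; # Camel CAMEL
+\&
+\& my $count = $Collator\->gmatch($str, "camel"); # scalar context
+\& print "$count\en"; # 2
+.Ve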
+.ie n .IP """$count = $Collator\->subst($string, $substring, $replacement)""" 4
+.el .IP "\f(CW$count = $Collator\->subst($string, $substring, $replacement)\fR" 4
+.IX Item "$count = $Collator->subst($string, $substring, $replacement)"
+If \f(CW$substring\fR matches a part of \f(CW$string\fR,
+the first occurrence of the matching part is replaced by \f(CW$replacement\fR
+(\f(CW$string\fR is modified) and \f(CW$count\fR (always equal to \f(CW1\fR) is returned.
+.Sp
+\&\f(CW$replacement\fR can be a \f(CW\*(C`CODEREF\*(C'\fR,
+taking the matching part as an argument,
+and returning a string to replace the matching part
+(a bit similar to \f(CW\*(C`s/(..)/$coderef\->($1)/e\*(C'\fR).
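+.Sp
+A minimal sketch with a plain string and then a \f(CW\*(C`CODEREF\*(C'\fR as the replacement
+(the string is illustrative):
+.Sp
+.Vb 4
+\& use Unicode::Collate;
+\& my $Collator = Unicode::Collate\->new( normalization => undef, level => 1 );
+\& # (normalization => undef) is REQUIRED.
+\& my $str = "Camel donkey CAMEL";
+\&
+\& # Only the first occurrence is replaced; $str is modified in place.
+\& my $count = $Collator\->subst($str, "camel", "horse");
+\& print "$count: $str\en"; # 1: horse donkey CAMEL
+\&
+\& # The CODEREF receives the matching part and returns its replacement.
+\& $Collator\->subst($str, "camel", sub { lc $_[0] });
+\& print "$str\en"; # horse donkey camel
+.Ve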
+.ie n .IP """$count = $Collator\->gsubst($string, $substring, $replacement)""" 4
+.el .IP "\f(CW$count = $Collator\->gsubst($string, $substring, $replacement)\fR" 4
+.IX Item "$count = $Collator->gsubst($string, $substring, $replacement)"
+If \f(CW$substring\fR matches a part of \f(CW$string\fR,
+all the occurrences of the matching part are replaced by \f(CW$replacement\fR
+(\f(CW$string\fR is modified) and \f(CW$count\fR is returned.
+.Sp
+\&\f(CW$replacement\fR can be a \f(CW\*(C`CODEREF\*(C'\fR,
+taking the matching part as an argument,
+and returning a string to replace the matching part
+(a bit similar to \f(CW\*(C`s/(..)/$coderef\->($1)/eg\*(C'\fR).
+.Sp
+e.g.
+.Sp
+.Vb 4
+\& my $Collator = Unicode::Collate\->new( normalization => undef, level => 1 );
+\& # (normalization => undef) is REQUIRED.
+\& my $str = "Camel donkey zebra came\ex{301}l CAMEL horse cam\e0e\e0l...";
+\& $Collator\->gsubst($str, "camel", sub { "<b>$_[0]</b>" });
+\&
+\& # now $str is "<b>Camel</b> donkey zebra <b>came\ex{301}l</b> <b>CAMEL</b> horse <b>cam\e0e\e0l</b>...";
+\& # i.e., all the camels are made bold\-faced.
+\&
+\& Examples: levels and ignore_level2 \- what does camel match?
+\& \-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-
+\& level ignore_level2 | camel Camel came\ex{301}l c\-a\-m\-e\-l cam\e0e\e0l
+\& \-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-|\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-
+\& 1 false | yes yes yes yes yes
+\& 2 false | yes yes no yes yes
+\& 3 false | yes no no yes yes
+\& 4 false | yes no no no yes
+\& \-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-|\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-
+\& 1 true | yes yes yes yes yes
+\& 2 true | yes yes yes yes yes
+\& 3 true | yes no yes yes yes
+\& 4 true | yes no yes no yes
+\& \-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-
+\& note: if variable => non\-ignorable, camel doesn\*(Aqt match c\-a\-m\-e\-l
+\& at any level.
+.Ve
+.SS "Other Methods"
+.IX Subsection "Other Methods"
+.ie n .IP """%old_tailoring = $Collator\->change(%new_tailoring)""" 4
+.el .IP "\f(CW%old_tailoring = $Collator\->change(%new_tailoring)\fR" 4
+.IX Item "%old_tailoring = $Collator->change(%new_tailoring)"
+.PD 0
+.ie n .IP """$modified_collator = $Collator\->change(%new_tailoring)""" 4
+.el .IP "\f(CW$modified_collator = $Collator\->change(%new_tailoring)\fR" 4
+.IX Item "$modified_collator = $Collator->change(%new_tailoring)"
+.PD
+Changes the values of the specified keys and returns the previous values of the changed keys.
+.Sp
+.Vb 1
+\& $Collator = Unicode::Collate\->new(level => 4);
+\&
+\& $Collator\->eq("perl", "PERL"); # false
+\&
+\& %old = $Collator\->change(level => 2); # returns (level => 4).
+\&
+\& $Collator\->eq("perl", "PERL"); # true
+\&
+\& $Collator\->change(%old); # returns (level => 2).
+\&
+\& $Collator\->eq("perl", "PERL"); # false
+.Ve
+.Sp
+Not all \f(CW\*(C`(key,value)\*(C'\fRs are allowed to be changed.
+See also \f(CW@Unicode::Collate::ChangeOK\fR and \f(CW@Unicode::Collate::ChangeNG\fR.
+.Sp
+In scalar context, returns the modified collator
+(it is the same object, \fBnot\fR a clone of the original).
+.Sp
+.Vb 1
+\& $Collator\->change(level => 2)\->eq("perl", "PERL"); # true
+\&
+\& $Collator\->eq("perl", "PERL"); # true; now max level is 2nd.
+\&
+\& $Collator\->change(level => 4)\->eq("perl", "PERL"); # false
+.Ve
+.ie n .IP """$version = $Collator\->version()""" 4
+.el .IP "\f(CW$version = $Collator\->version()\fR" 4
+.IX Item "$version = $Collator->version()"
+Returns the version number (a string) of the Unicode Standard
+which the \f(CW\*(C`table\*(C'\fR file used by the collator object is based on.
+If the table does not include a version line (starting with \f(CW@version\fR),
+returns \f(CW"unknown"\fR.
+.ie n .IP UCA_Version() 4
+.el .IP \f(CWUCA_Version()\fR 4
+.IX Item "UCA_Version()"
+Returns the revision number of UTS #10 that this module consults,
+which should correspond to the incorporated DUCET.
+.ie n .IP Base_Unicode_Version() 4
+.el .IP \f(CWBase_Unicode_Version()\fR 4
+.IX Item "Base_Unicode_Version()"
+Returns the version number of UTS #10 that this module consults,
+which should correspond to the incorporated DUCET.
+.SH EXPORT
+.IX Header "EXPORT"
+No method will be exported.
+.SH INSTALL
+.IX Header "INSTALL"
+Though this module can be used without any \f(CW\*(C`table\*(C'\fR file,
+to use this module easily, it is recommended to install a table file
+in the UCA format, by copying it under the directory
+<a place in \f(CW@INC\fR>/Unicode/Collate.
+.PP
+The preferred table is "The Default Unicode Collation Element Table"
+(aka DUCET), available from the Unicode Consortium's website:
+.PP
+.Vb 1
+\& http://www.unicode.org/Public/UCA/
+\&
+\& http://www.unicode.org/Public/UCA/latest/allkeys.txt
+\& (latest version)
+.Ve
+.PP
+If DUCET is not installed, it is recommended to copy the file
+from http://www.unicode.org/Public/UCA/latest/allkeys.txt
+to <a place in \f(CW@INC\fR>/Unicode/Collate/allkeys.txt
+manually.
+.SH CAVEATS
+.IX Header "CAVEATS"
+.IP Normalization 4
+.IX Item "Normalization"
+Use of the \f(CW\*(C`normalization\*(C'\fR parameter requires the \fBUnicode::Normalize\fR
+module (see Unicode::Normalize).
+.Sp
+If you do not need it (say, when you need not
+handle any combining characters),
+specify \f(CW\*(C`(normalization => undef)\*(C'\fR explicitly.
+.Sp
+\&\-\- see 6.5 Avoiding Normalization, UTS #10.
+.IP "Conformance Test" 4
+.IX Item "Conformance Test"
+The Conformance Test for the UCA is available
+under <http://www.unicode.org/Public/UCA/>.
+.Sp
+For \fICollationTest_SHIFTED.txt\fR,
+a collator via \f(CW\*(C`Unicode::Collate\->new( )\*(C'\fR should be used;
+for \fICollationTest_NON_IGNORABLE.txt\fR, a collator via
+\&\f(CW\*(C`Unicode::Collate\->new(variable => "non\-ignorable", level => 3)\*(C'\fR.
+.Sp
+If \f(CW\*(C`UCA_Version\*(C'\fR is 26 or later, the \f(CW\*(C`identical\*(C'\fR level is preferred;
+\&\f(CW\*(C`Unicode::Collate\->new(identical => 1)\*(C'\fR and
+\&\f(CW\*(C`Unicode::Collate\->new(identical => 1,\*(C'\fR
+\&\f(CW\*(C`variable => "non\-ignorable", level => 3)\*(C'\fR should be used.
+.Sp
+\&\fBUnicode::Normalize is required to run the Conformance Test.\fR
+.Sp
+\&\fBEBCDIC-SUPPORT IS EXPERIMENTAL.\fR
+.SH "AUTHOR, COPYRIGHT AND LICENSE"
+.IX Header "AUTHOR, COPYRIGHT AND LICENSE"
+The Unicode::Collate module for perl was written by SADAHIRO Tomoyuki,
+<SADAHIRO@cpan.org>. This module is Copyright(C) 2001\-2021,
+SADAHIRO Tomoyuki. Japan. All rights reserved.
+.PP
+This module is free software; you can redistribute it and/or
+modify it under the same terms as Perl itself.
+.PP
+The file Unicode/Collate/allkeys.txt was copied verbatim
+from <http://www.unicode.org/Public/UCA/13.0.0/allkeys.txt>.
+For this file, Copyright (c) 2020 Unicode, Inc.; distributed
+under the Terms of Use in <http://www.unicode.org/terms_of_use.html>.
+.SH "SEE ALSO"
+.IX Header "SEE ALSO"
+.IP "Unicode Collation Algorithm \- UTS #10" 4
+.IX Item "Unicode Collation Algorithm - UTS #10"
+<http://www.unicode.org/reports/tr10/>
+.IP "The Default Unicode Collation Element Table (DUCET)" 4
+.IX Item "The Default Unicode Collation Element Table (DUCET)"
+<http://www.unicode.org/Public/UCA/latest/allkeys.txt>
+.IP "The conformance test for the UCA" 4
+.IX Item "The conformance test for the UCA"
+<http://www.unicode.org/Public/UCA/latest/CollationTest.html>
+.Sp
+<http://www.unicode.org/Public/UCA/latest/CollationTest.zip>
+.IP "Hangul Syllable Type" 4
+.IX Item "Hangul Syllable Type"
+<http://www.unicode.org/Public/UNIDATA/HangulSyllableType.txt>
+.IP "Unicode Normalization Forms \- UAX #15" 4
+.IX Item "Unicode Normalization Forms - UAX #15"
+<http://www.unicode.org/reports/tr15/>
+.IP "Unicode Locale Data Markup Language (LDML) \- UTS #35" 4
+.IX Item "Unicode Locale Data Markup Language (LDML) - UTS #35"
+<http://www.unicode.org/reports/tr35/>