Diffstat (limited to 'upstream/mageia-cauldron/man3pm/Unicode::Normalize.3pm')
-rw-r--r-- | upstream/mageia-cauldron/man3pm/Unicode::Normalize.3pm | 549 |
1 files changed, 549 insertions, 0 deletions
diff --git a/upstream/mageia-cauldron/man3pm/Unicode::Normalize.3pm b/upstream/mageia-cauldron/man3pm/Unicode::Normalize.3pm
new file mode 100644
index 00000000..40f37a73
--- /dev/null
+++ b/upstream/mageia-cauldron/man3pm/Unicode::Normalize.3pm
@@ -0,0 +1,549 @@
+.\" -*- mode: troff; coding: utf-8 -*-
+.\" Automatically generated by Pod::Man 5.01 (Pod::Simple 3.43)
+.\"
+.\" Standard preamble:
+.\" ========================================================================
+.de Sp \" Vertical space (when we can't use .PP)
+.if t .sp .5v
+.if n .sp
+..
+.de Vb \" Begin verbatim text
+.ft CW
+.nf
+.ne \\$1
+..
+.de Ve \" End verbatim text
+.ft R
+.fi
+..
+.\" \*(C` and \*(C' are quotes in nroff, nothing in troff, for use with C<>.
+.ie n \{\
+. ds C` ""
+. ds C' ""
+'br\}
+.el\{\
+. ds C`
+. ds C'
+'br\}
+.\"
+.\" Escape single quotes in literal strings from groff's Unicode transform.
+.ie \n(.g .ds Aq \(aq
+.el .ds Aq '
+.\"
+.\" If the F register is >0, we'll generate index entries on stderr for
+.\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index
+.\" entries marked with X<> in POD. Of course, you'll have to process the
+.\" output yourself in some meaningful fashion.
+.\"
+.\" Avoid warning from groff about undefined register 'F'.
+.de IX
+..
+.nr rF 0
+.if \n(.g .if rF .nr rF 1
+.if (\n(rF:(\n(.g==0)) \{\
+. if \nF \{\
+. de IX
+. tm Index:\\$1\t\\n%\t"\\$2"
+..
+. if !\nF==2 \{\
+. nr % 0
+. nr F 2
+. \}
+. \}
+.\}
+.rr rF
+.\" ========================================================================
+.\"
+.IX Title "Unicode::Normalize 3pm"
+.TH Unicode::Normalize 3pm 2023-11-28 "perl v5.38.2" "Perl Programmers Reference Guide"
+.\" For nroff, turn off justification. Always turn off hyphenation; it makes
+.\" way too many mistakes in technical documents.
+.if n .ad l
+.nh
+.SH NAME
+Unicode::Normalize \- Unicode Normalization Forms
+.SH SYNOPSIS
+.IX Header "SYNOPSIS"
+(1) using function names exported by default:
+.PP
+.Vb 1
+\& use Unicode::Normalize;
+\&
+\& $NFD_string = NFD($string); # Normalization Form D
+\& $NFC_string = NFC($string); # Normalization Form C
+\& $NFKD_string = NFKD($string); # Normalization Form KD
+\& $NFKC_string = NFKC($string); # Normalization Form KC
+.Ve
+.PP
+(2) using function names exported on request:
+.PP
+.Vb 1
+\& use Unicode::Normalize \*(Aqnormalize\*(Aq;
+\&
+\& $NFD_string = normalize(\*(AqD\*(Aq, $string); # Normalization Form D
+\& $NFC_string = normalize(\*(AqC\*(Aq, $string); # Normalization Form C
+\& $NFKD_string = normalize(\*(AqKD\*(Aq, $string); # Normalization Form KD
+\& $NFKC_string = normalize(\*(AqKC\*(Aq, $string); # Normalization Form KC
+.Ve
+.SH DESCRIPTION
+.IX Header "DESCRIPTION"
+Parameters:
+.PP
+\&\f(CW$string\fR is used as a string under character semantics (see perlunicode).
+.PP
+\&\f(CW$code_point\fR should be an unsigned integer representing a Unicode code point.
+.PP
+Note: Between XSUB and pure Perl, there is an incompatibility
+about the interpretation of \f(CW$code_point\fR as a decimal number.
+XSUB converts \f(CW$code_point\fR to an unsigned integer, but pure Perl does not.
+Do not use a floating point nor a negative sign in \f(CW$code_point\fR.
+.SS "Normalization Forms"
+.IX Subsection "Normalization Forms"
+.ie n .IP """$NFD_string = NFD($string)""" 4
+.el .IP "\f(CW$NFD_string = NFD($string)\fR" 4
+.IX Item "$NFD_string = NFD($string)"
+It returns the Normalization Form D (formed by canonical decomposition).
+.ie n .IP """$NFC_string = NFC($string)""" 4
+.el .IP "\f(CW$NFC_string = NFC($string)\fR" 4
+.IX Item "$NFC_string = NFC($string)"
+It returns the Normalization Form C (formed by canonical decomposition
+followed by canonical composition).
+.ie n .IP """$NFKD_string = NFKD($string)""" 4
+.el .IP "\f(CW$NFKD_string = NFKD($string)\fR" 4
+.IX Item "$NFKD_string = NFKD($string)"
+It returns the Normalization Form KD (formed by compatibility decomposition).
+.ie n .IP """$NFKC_string = NFKC($string)""" 4
+.el .IP "\f(CW$NFKC_string = NFKC($string)\fR" 4
+.IX Item "$NFKC_string = NFKC($string)"
+It returns the Normalization Form KC (formed by compatibility decomposition
+followed by \fBcanonical\fR composition).
+.ie n .IP """$FCD_string = FCD($string)""" 4
+.el .IP "\f(CW$FCD_string = FCD($string)\fR" 4
+.IX Item "$FCD_string = FCD($string)"
+If the given string is in FCD ("Fast C or D" form; cf. UTN #5),
+it returns the string without modification; otherwise it returns an FCD string.
+.Sp
+Note: FCD is not always unique, then plural forms may be equivalent
+each other. \f(CWFCD()\fR will return one of these equivalent forms.
+.ie n .IP """$FCC_string = FCC($string)""" 4
+.el .IP "\f(CW$FCC_string = FCC($string)\fR" 4
+.IX Item "$FCC_string = FCC($string)"
+It returns the FCC form ("Fast C Contiguous"; cf. UTN #5).
+.Sp
+Note: FCC is unique, as well as four normalization forms (NF*).
+.ie n .IP """$normalized_string = normalize($form_name, $string)""" 4
+.el .IP "\f(CW$normalized_string = normalize($form_name, $string)\fR" 4
+.IX Item "$normalized_string = normalize($form_name, $string)"
+It returns the normalization form of \f(CW$form_name\fR.
+.Sp
+As \f(CW$form_name\fR, one of the following names must be given.
+.Sp
+.Vb 4
+\& \*(AqC\*(Aq or \*(AqNFC\*(Aq for Normalization Form C (UAX #15)
+\& \*(AqD\*(Aq or \*(AqNFD\*(Aq for Normalization Form D (UAX #15)
+\& \*(AqKC\*(Aq or \*(AqNFKC\*(Aq for Normalization Form KC (UAX #15)
+\& \*(AqKD\*(Aq or \*(AqNFKD\*(Aq for Normalization Form KD (UAX #15)
+\&
+\& \*(AqFCD\*(Aq for "Fast C or D" Form (UTN #5)
+\& \*(AqFCC\*(Aq for "Fast C Contiguous" (UTN #5)
+.Ve
+.SS "Decomposition and Composition"
+.IX Subsection "Decomposition and Composition"
+.ie n .IP """$decomposed_string = decompose($string [, $useCompatMapping])""" 4
+.el .IP "\f(CW$decomposed_string = decompose($string [, $useCompatMapping])\fR" 4
+.IX Item "$decomposed_string = decompose($string [, $useCompatMapping])"
+It returns the concatenation of the decomposition of each character
+in the string.
+.Sp
+If the second parameter (a boolean) is omitted or false,
+the decomposition is canonical decomposition;
+if the second parameter (a boolean) is true,
+the decomposition is compatibility decomposition.
+.Sp
+The string returned is not always in NFD/NFKD. Reordering may be required.
+.Sp
+.Vb 2
+\& $NFD_string = reorder(decompose($string)); # eq. to NFD()
+\& $NFKD_string = reorder(decompose($string, TRUE)); # eq. to NFKD()
+.Ve
+.ie n .IP """$reordered_string = reorder($string)""" 4
+.el .IP "\f(CW$reordered_string = reorder($string)\fR" 4
+.IX Item "$reordered_string = reorder($string)"
+It returns the result of reordering the combining characters
+according to Canonical Ordering Behavior.
+.Sp
+For example, when you have a list of NFD/NFKD strings,
+you can get the concatenated NFD/NFKD string from them, by saying
+.Sp
+.Vb 2
+\& $concat_NFD = reorder(join \*(Aq\*(Aq, @NFD_strings);
+\& $concat_NFKD = reorder(join \*(Aq\*(Aq, @NFKD_strings);
+.Ve
+.ie n .IP """$composed_string = compose($string)""" 4
+.el .IP "\f(CW$composed_string = compose($string)\fR" 4
+.IX Item "$composed_string = compose($string)"
+It returns the result of canonical composition
+without applying any decomposition.
+.Sp
+For example, when you have a NFD/NFKD string,
+you can get its NFC/NFKC string, by saying
+.Sp
+.Vb 2
+\& $NFC_string = compose($NFD_string);
+\& $NFKC_string = compose($NFKD_string);
+.Ve
+.ie n .IP """($processed, $unprocessed) = splitOnLastStarter($normalized)""" 4
+.el .IP "\f(CW($processed, $unprocessed) = splitOnLastStarter($normalized)\fR" 4
+.IX Item "($processed, $unprocessed) = splitOnLastStarter($normalized)"
+It returns two strings: the first one, \f(CW$processed\fR, is a part
+before the last starter, and the second one, \f(CW$unprocessed\fR is
+another part after the first part. A starter is a character having
+a combining class of zero (see UAX #15).
+.Sp
+Note that \f(CW$processed\fR may be empty (when \f(CW$normalized\fR contains no
+starter or starts with the last starter), and then \f(CW$unprocessed\fR
+should be equal to the entire \f(CW$normalized\fR.
+.Sp
+When you have a \f(CW$normalized\fR string and an \f(CW$unnormalized\fR string
+following it, a simple concatenation is wrong:
+.Sp
+.Vb 1
+\& $concat = $normalized . normalize($form, $unnormalized); # wrong!
+.Ve
+.Sp
+Instead of it, do like this:
+.Sp
+.Vb 2
+\& ($processed, $unprocessed) = splitOnLastStarter($normalized);
+\& $concat = $processed . normalize($form,$unprocessed.$unnormalized);
+.Ve
+.Sp
+\&\f(CWsplitOnLastStarter()\fR should be called with a pre-normalized parameter
+\&\f(CW$normalized\fR, that is in the same form as \f(CW$form\fR you want.
+.Sp
+If you have an array of \f(CW@string\fR that should be concatenated and then
+normalized, you can do like this:
+.Sp
+.Vb 11
+\& my $result = "";
+\& my $unproc = "";
+\& foreach my $str (@string) {
+\&     $unproc .= $str;
+\&     my $n = normalize($form, $unproc);
+\&     my($p, $u) = splitOnLastStarter($n);
+\&     $result .= $p;
+\&     $unproc = $u;
+\& }
+\& $result .= $unproc;
+\& # instead of normalize($form, join(\*(Aq\*(Aq, @string))
+.Ve
+.ie n .IP """$processed = normalize_partial($form, $unprocessed)""" 4
+.el .IP "\f(CW$processed = normalize_partial($form, $unprocessed)\fR" 4
+.IX Item "$processed = normalize_partial($form, $unprocessed)"
+A wrapper for the combination of \f(CWnormalize()\fR and \f(CWsplitOnLastStarter()\fR.
+Note that \f(CW$unprocessed\fR will be modified as a side-effect.
+.Sp
+If you have an array of \f(CW@string\fR that should be concatenated and then
+normalized, you can do like this:
+.Sp
+.Vb 8
+\& my $result = "";
+\& my $unproc = "";
+\& foreach my $str (@string) {
+\&     $unproc .= $str;
+\&     $result .= normalize_partial($form, $unproc);
+\& }
+\& $result .= $unproc;
+\& # instead of normalize($form, join(\*(Aq\*(Aq, @string))
+.Ve
+.ie n .IP """$processed = NFD_partial($unprocessed)""" 4
+.el .IP "\f(CW$processed = NFD_partial($unprocessed)\fR" 4
+.IX Item "$processed = NFD_partial($unprocessed)"
+It does like \f(CW\*(C`normalize_partial(\*(AqNFD\*(Aq, $unprocessed)\*(C'\fR.
+Note that \f(CW$unprocessed\fR will be modified as a side-effect.
+.ie n .IP """$processed = NFC_partial($unprocessed)""" 4
+.el .IP "\f(CW$processed = NFC_partial($unprocessed)\fR" 4
+.IX Item "$processed = NFC_partial($unprocessed)"
+It does like \f(CW\*(C`normalize_partial(\*(AqNFC\*(Aq, $unprocessed)\*(C'\fR.
+Note that \f(CW$unprocessed\fR will be modified as a side-effect.
+.ie n .IP """$processed = NFKD_partial($unprocessed)""" 4
+.el .IP "\f(CW$processed = NFKD_partial($unprocessed)\fR" 4
+.IX Item "$processed = NFKD_partial($unprocessed)"
+It does like \f(CW\*(C`normalize_partial(\*(AqNFKD\*(Aq, $unprocessed)\*(C'\fR.
+Note that \f(CW$unprocessed\fR will be modified as a side-effect.
+.ie n .IP """$processed = NFKC_partial($unprocessed)""" 4
+.el .IP "\f(CW$processed = NFKC_partial($unprocessed)\fR" 4
+.IX Item "$processed = NFKC_partial($unprocessed)"
+It does like \f(CW\*(C`normalize_partial(\*(AqNFKC\*(Aq, $unprocessed)\*(C'\fR.
+Note that \f(CW$unprocessed\fR will be modified as a side-effect.
+.SS "Quick Check"
+.IX Subsection "Quick Check"
+(see Annex 8, UAX #15; and \fIlib/unicore/DerivedNormalizationProps.txt\fR)
+.PP
+The following functions check whether the string is in that normalization form.
+.PP
+The result returned will be one of the following:
+.PP
+.Vb 3
+\& YES The string is in that normalization form.
+\& NO The string is not in that normalization form.
+\& MAYBE Dubious. Maybe yes, maybe no.
+.Ve
+.ie n .IP """$result = checkNFD($string)""" 4
+.el .IP "\f(CW$result = checkNFD($string)\fR" 4
+.IX Item "$result = checkNFD($string)"
+It returns true (\f(CW1\fR) if \f(CW\*(C`YES\*(C'\fR; false (\f(CW\*(C`empty string\*(C'\fR) if \f(CW\*(C`NO\*(C'\fR.
+.ie n .IP """$result = checkNFC($string)""" 4
+.el .IP "\f(CW$result = checkNFC($string)\fR" 4
+.IX Item "$result = checkNFC($string)"
+It returns true (\f(CW1\fR) if \f(CW\*(C`YES\*(C'\fR; false (\f(CW\*(C`empty string\*(C'\fR) if \f(CW\*(C`NO\*(C'\fR;
+\&\f(CW\*(C`undef\*(C'\fR if \f(CW\*(C`MAYBE\*(C'\fR.
+.ie n .IP """$result = checkNFKD($string)""" 4
+.el .IP "\f(CW$result = checkNFKD($string)\fR" 4
+.IX Item "$result = checkNFKD($string)"
+It returns true (\f(CW1\fR) if \f(CW\*(C`YES\*(C'\fR; false (\f(CW\*(C`empty string\*(C'\fR) if \f(CW\*(C`NO\*(C'\fR.
+.ie n .IP """$result = checkNFKC($string)""" 4
+.el .IP "\f(CW$result = checkNFKC($string)\fR" 4
+.IX Item "$result = checkNFKC($string)"
+It returns true (\f(CW1\fR) if \f(CW\*(C`YES\*(C'\fR; false (\f(CW\*(C`empty string\*(C'\fR) if \f(CW\*(C`NO\*(C'\fR;
+\&\f(CW\*(C`undef\*(C'\fR if \f(CW\*(C`MAYBE\*(C'\fR.
+.ie n .IP """$result = checkFCD($string)""" 4
+.el .IP "\f(CW$result = checkFCD($string)\fR" 4
+.IX Item "$result = checkFCD($string)"
+It returns true (\f(CW1\fR) if \f(CW\*(C`YES\*(C'\fR; false (\f(CW\*(C`empty string\*(C'\fR) if \f(CW\*(C`NO\*(C'\fR.
+.ie n .IP """$result = checkFCC($string)""" 4
+.el .IP "\f(CW$result = checkFCC($string)\fR" 4
+.IX Item "$result = checkFCC($string)"
+It returns true (\f(CW1\fR) if \f(CW\*(C`YES\*(C'\fR; false (\f(CW\*(C`empty string\*(C'\fR) if \f(CW\*(C`NO\*(C'\fR;
+\&\f(CW\*(C`undef\*(C'\fR if \f(CW\*(C`MAYBE\*(C'\fR.
+.Sp
+Note: If a string is not in FCD, it must not be in FCC.
+So \f(CWcheckFCC($not_FCD_string)\fR should return \f(CW\*(C`NO\*(C'\fR.
+.ie n .IP """$result = check($form_name, $string)""" 4
+.el .IP "\f(CW$result = check($form_name, $string)\fR" 4
+.IX Item "$result = check($form_name, $string)"
+It returns true (\f(CW1\fR) if \f(CW\*(C`YES\*(C'\fR; false (\f(CW\*(C`empty string\*(C'\fR) if \f(CW\*(C`NO\*(C'\fR;
+\&\f(CW\*(C`undef\*(C'\fR if \f(CW\*(C`MAYBE\*(C'\fR.
+.Sp
+As \f(CW$form_name\fR, one of the following names must be given.
+.Sp
+.Vb 4
+\& \*(AqC\*(Aq or \*(AqNFC\*(Aq for Normalization Form C (UAX #15)
+\& \*(AqD\*(Aq or \*(AqNFD\*(Aq for Normalization Form D (UAX #15)
+\& \*(AqKC\*(Aq or \*(AqNFKC\*(Aq for Normalization Form KC (UAX #15)
+\& \*(AqKD\*(Aq or \*(AqNFKD\*(Aq for Normalization Form KD (UAX #15)
+\&
+\& \*(AqFCD\*(Aq for "Fast C or D" Form (UTN #5)
+\& \*(AqFCC\*(Aq for "Fast C Contiguous" (UTN #5)
+.Ve
+.PP
+\&\fBNote\fR
+.PP
+In the cases of NFD, NFKD, and FCD, the answer must be
+either \f(CW\*(C`YES\*(C'\fR or \f(CW\*(C`NO\*(C'\fR. The answer \f(CW\*(C`MAYBE\*(C'\fR may be returned
+in the cases of NFC, NFKC, and FCC.
+.PP
+A \f(CW\*(C`MAYBE\*(C'\fR string should contain at least one combining character
+or the like. For example, \f(CW\*(C`COMBINING ACUTE ACCENT\*(C'\fR has
+the MAYBE_NFC/MAYBE_NFKC property.
+.PP
+Both \f(CW\*(C`checkNFC("A\eN{COMBINING ACUTE ACCENT}")\*(C'\fR
+and \f(CW\*(C`checkNFC("B\eN{COMBINING ACUTE ACCENT}")\*(C'\fR will return \f(CW\*(C`MAYBE\*(C'\fR.
+\&\f(CW"A\eN{COMBINING ACUTE ACCENT}"\fR is not in NFC
+(its NFC is \f(CW"\eN{LATIN CAPITAL LETTER A WITH ACUTE}"\fR),
+while \f(CW"B\eN{COMBINING ACUTE ACCENT}"\fR is in NFC.
+.PP
+If you want to check exactly, compare the string with its NFC/NFKC/FCC.
+.PP
+.Vb 5
+\& if ($string eq NFC($string)) {
+\&     # $string is exactly normalized in NFC;
+\& } else {
+\&     # $string is not normalized in NFC;
+\& }
+\&
+\& if ($string eq NFKC($string)) {
+\&     # $string is exactly normalized in NFKC;
+\& } else {
+\&     # $string is not normalized in NFKC;
+\& }
+.Ve
+.SS "Character Data"
+.IX Subsection "Character Data"
+These functions are interface of character data used internally.
+If you want only to get Unicode normalization forms, you don't need
+call them yourself.
+.ie n .IP """$canonical_decomposition = getCanon($code_point)""" 4
+.el .IP "\f(CW$canonical_decomposition = getCanon($code_point)\fR" 4
+.IX Item "$canonical_decomposition = getCanon($code_point)"
+If the character is canonically decomposable (including Hangul Syllables),
+it returns the (full) canonical decomposition as a string.
+Otherwise it returns \f(CW\*(C`undef\*(C'\fR.
+.Sp
+\&\fBNote:\fR According to the Unicode standard, the canonical decomposition
+of the character that is not canonically decomposable is same as
+the character itself.
+.ie n .IP """$compatibility_decomposition = getCompat($code_point)""" 4
+.el .IP "\f(CW$compatibility_decomposition = getCompat($code_point)\fR" 4
+.IX Item "$compatibility_decomposition = getCompat($code_point)"
+If the character is compatibility decomposable (including Hangul Syllables),
+it returns the (full) compatibility decomposition as a string.
+Otherwise it returns \f(CW\*(C`undef\*(C'\fR.
+.Sp
+\&\fBNote:\fR According to the Unicode standard, the compatibility decomposition
+of the character that is not compatibility decomposable is same as
+the character itself.
+.ie n .IP """$code_point_composite = getComposite($code_point_here, $code_point_next)""" 4
+.el .IP "\f(CW$code_point_composite = getComposite($code_point_here, $code_point_next)\fR" 4
+.IX Item "$code_point_composite = getComposite($code_point_here, $code_point_next)"
+If two characters here and next (as code points) are composable
+(including Hangul Jamo/Syllables and Composition Exclusions),
+it returns the code point of the composite.
+.Sp
+If they are not composable, it returns \f(CW\*(C`undef\*(C'\fR.
+.ie n .IP """$combining_class = getCombinClass($code_point)""" 4
+.el .IP "\f(CW$combining_class = getCombinClass($code_point)\fR" 4
+.IX Item "$combining_class = getCombinClass($code_point)"
+It returns the combining class (as an integer) of the character.
+.ie n .IP """$may_be_composed_with_prev_char = isComp2nd($code_point)""" 4
+.el .IP "\f(CW$may_be_composed_with_prev_char = isComp2nd($code_point)\fR" 4
+.IX Item "$may_be_composed_with_prev_char = isComp2nd($code_point)"
+It returns a boolean whether the character of the specified codepoint
+may be composed with the previous one in a certain composition
+(including Hangul Compositions, but excluding
+Composition Exclusions and Non-Starter Decompositions).
+.ie n .IP """$is_exclusion = isExclusion($code_point)""" 4
+.el .IP "\f(CW$is_exclusion = isExclusion($code_point)\fR" 4
+.IX Item "$is_exclusion = isExclusion($code_point)"
+It returns a boolean whether the code point is a composition exclusion.
+.ie n .IP """$is_singleton = isSingleton($code_point)""" 4
+.el .IP "\f(CW$is_singleton = isSingleton($code_point)\fR" 4
+.IX Item "$is_singleton = isSingleton($code_point)"
+It returns a boolean whether the code point is a singleton
+.ie n .IP """$is_non_starter_decomposition = isNonStDecomp($code_point)""" 4
+.el .IP "\f(CW$is_non_starter_decomposition = isNonStDecomp($code_point)\fR" 4
+.IX Item "$is_non_starter_decomposition = isNonStDecomp($code_point)"
+It returns a boolean whether the code point has Non-Starter Decomposition.
+.ie n .IP """$is_Full_Composition_Exclusion = isComp_Ex($code_point)""" 4
+.el .IP "\f(CW$is_Full_Composition_Exclusion = isComp_Ex($code_point)\fR" 4
+.IX Item "$is_Full_Composition_Exclusion = isComp_Ex($code_point)"
+It returns a boolean of the derived property Comp_Ex
+(Full_Composition_Exclusion). This property is generated from
+Composition Exclusions + Singletons + Non-Starter Decompositions.
+.ie n .IP """$NFD_is_NO = isNFD_NO($code_point)""" 4
+.el .IP "\f(CW$NFD_is_NO = isNFD_NO($code_point)\fR" 4
+.IX Item "$NFD_is_NO = isNFD_NO($code_point)"
+It returns a boolean of the derived property NFD_NO
+(NFD_Quick_Check=No).
+.ie n .IP """$NFC_is_NO = isNFC_NO($code_point)""" 4
+.el .IP "\f(CW$NFC_is_NO = isNFC_NO($code_point)\fR" 4
+.IX Item "$NFC_is_NO = isNFC_NO($code_point)"
+It returns a boolean of the derived property NFC_NO
+(NFC_Quick_Check=No).
+.ie n .IP """$NFC_is_MAYBE = isNFC_MAYBE($code_point)""" 4
+.el .IP "\f(CW$NFC_is_MAYBE = isNFC_MAYBE($code_point)\fR" 4
+.IX Item "$NFC_is_MAYBE = isNFC_MAYBE($code_point)"
+It returns a boolean of the derived property NFC_MAYBE
+(NFC_Quick_Check=Maybe).
+.ie n .IP """$NFKD_is_NO = isNFKD_NO($code_point)""" 4
+.el .IP "\f(CW$NFKD_is_NO = isNFKD_NO($code_point)\fR" 4
+.IX Item "$NFKD_is_NO = isNFKD_NO($code_point)"
+It returns a boolean of the derived property NFKD_NO
+(NFKD_Quick_Check=No).
+.ie n .IP """$NFKC_is_NO = isNFKC_NO($code_point)""" 4
+.el .IP "\f(CW$NFKC_is_NO = isNFKC_NO($code_point)\fR" 4
+.IX Item "$NFKC_is_NO = isNFKC_NO($code_point)"
+It returns a boolean of the derived property NFKC_NO
+(NFKC_Quick_Check=No).
+.ie n .IP """$NFKC_is_MAYBE = isNFKC_MAYBE($code_point)""" 4
+.el .IP "\f(CW$NFKC_is_MAYBE = isNFKC_MAYBE($code_point)\fR" 4
+.IX Item "$NFKC_is_MAYBE = isNFKC_MAYBE($code_point)"
+It returns a boolean of the derived property NFKC_MAYBE
+(NFKC_Quick_Check=Maybe).
+.SH EXPORT
+.IX Header "EXPORT"
+\&\f(CW\*(C`NFC\*(C'\fR, \f(CW\*(C`NFD\*(C'\fR, \f(CW\*(C`NFKC\*(C'\fR, \f(CW\*(C`NFKD\*(C'\fR: by default.
+.PP
+\&\f(CW\*(C`normalize\*(C'\fR and other some functions: on request.
+.SH CAVEATS
+.IX Header "CAVEATS"
+.IP "Perl's version vs. Unicode version" 4
+.IX Item "Perl's version vs. Unicode version"
+Since this module refers to perl core's Unicode database in the directory
+\&\fI/lib/unicore\fR (or formerly \fI/lib/unicode\fR), the Unicode version of
+normalization implemented by this module depends on what has been
+compiled into your perl. The following table lists the default Unicode
+version that comes with various perl versions. (It is possible to change
+the Unicode version in any perl version to be any earlier Unicode version,
+so one could cause Unicode 3.2 to be used in any perl version starting with
+5.8.0. Read \fR\f(CI$Config{privlib}\fR\fI/unicore/README.perl\fR for details.
+.Sp
+.Vb 10
+\& perl\*(Aqs version     implemented Unicode version
+\& 5.6.1              3.0.1
+\& 5.7.2              3.1.0
+\& 5.7.3              3.1.1 (normalization is same as 3.1.0)
+\& 5.8.0              3.2.0
+\& 5.8.1\-5.8.3        4.0.0
+\& 5.8.4\-5.8.6        4.0.1 (normalization is same as 4.0.0)
+\& 5.8.7\-5.8.8        4.1.0
+\& 5.10.0             5.0.0
+\& 5.8.9, 5.10.1      5.1.0
+\& 5.12.x             5.2.0
+\& 5.14.x             6.0.0
+\& 5.16.x             6.1.0
+\& 5.18.x             6.2.0
+\& 5.20.x             6.3.0
+\& 5.22.x             7.0.0
+.Ve
+.IP "Correction of decomposition mapping" 4
+.IX Item "Correction of decomposition mapping"
+In older Unicode versions, a small number of characters (all of which are
+CJK compatibility ideographs as far as they have been found) may have
+an erroneous decomposition mapping (see
+\&\fIlib/unicore/NormalizationCorrections.txt\fR).
+Anyhow, this module will neither refer to
+\&\fIlib/unicore/NormalizationCorrections.txt\fR
+nor provide any specific version of normalization. Therefore this module
+running on an older perl with an older Unicode database may use
+the erroneous decomposition mapping blindly conforming to the Unicode database.
+.IP "Revised definition of canonical composition" 4
+.IX Item "Revised definition of canonical composition"
+In Unicode 4.1.0, the definition D2 of canonical composition (which
+affects NFC and NFKC) has been changed (see Public Review Issue #29
+and recent UAX #15). This module has used the newer definition
+since the version 0.07 (Oct 31, 2001).
+This module will not support the normalization according to the older
+definition, even if the Unicode version implemented by perl is
+lower than 4.1.0.
+.SH AUTHOR
+.IX Header "AUTHOR"
+SADAHIRO Tomoyuki <SADAHIRO@cpan.org>
+.PP
+Currently maintained by <perl5\-porters@perl.org>
+.PP
+Copyright(C) 2001\-2012, SADAHIRO Tomoyuki. Japan. All rights reserved.
+.SH LICENSE
+.IX Header "LICENSE"
+This module is free software; you can redistribute it
+and/or modify it under the same terms as Perl itself.
+.SH "SEE ALSO"
+.IX Header "SEE ALSO"
+.IP <http://www.unicode.org/reports/tr15/> 4
+.IX Item "<http://www.unicode.org/reports/tr15/>"
+Unicode Normalization Forms \- UAX #15
+.IP <http://www.unicode.org/Public/UNIDATA/CompositionExclusions.txt> 4
+.IX Item "<http://www.unicode.org/Public/UNIDATA/CompositionExclusions.txt>"
+Composition Exclusion Table
+.IP <http://www.unicode.org/Public/UNIDATA/DerivedNormalizationProps.txt> 4
+.IX Item "<http://www.unicode.org/Public/UNIDATA/DerivedNormalizationProps.txt>"
+Derived Normalization Properties
+.IP <http://www.unicode.org/Public/UNIDATA/NormalizationCorrections.txt> 4
+.IX Item "<http://www.unicode.org/Public/UNIDATA/NormalizationCorrections.txt>"
+Normalization Corrections
+.IP <http://www.unicode.org/review/pr\-29.html> 4
+.IX Item "<http://www.unicode.org/review/pr-29.html>"
+Public Review Issue #29: Normalization Issue
+.IP <http://www.unicode.org/notes/tn5/> 4
+.IX Item "<http://www.unicode.org/notes/tn5/>"
+Canonical Equivalence in Applications \- UTN #5
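
Reviewer's sketch, not part of the commit: the man page added above describes two behaviours that are easy to misread, namely that checkNFC() can answer MAYBE (returned as undef) and that normalized chunks cannot simply be concatenated. The following is a minimal illustration using only functions the page documents; the sample strings and the helper cps() are assumptions made up for this example.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFC checkNFC normalize normalize_partial);

# Hypothetical helper: show a string as code points so output stays ASCII.
sub cps { join ' ', map { sprintf 'U+%04X', ord } split //, $_[0] }

# 1. Quick check vs. exact check. Per the page, both strings give MAYBE
#    (undef), so only the comparison with NFC() settles the question.
for my $s ("A\x{0301}", "B\x{0301}") {
    my $quick = checkNFC($s);
    my $label = !defined $quick ? 'MAYBE' : $quick ? 'YES' : 'NO';
    my $exact = $s eq NFC($s) ? 'in NFC' : 'not in NFC';
    printf "%-14s quick=%-5s exact=%s\n", cps($s), $label, $exact;
}

# 2. Incremental normalization of chunks (sample data is an assumption).
#    The combining accent arrives in a later chunk than its base letter.
my @chunks = ("caf", "e", "\x{0301} au lait");

# Naive approach: normalize each chunk separately, then concatenate.
my $naive = join '', map { normalize('NFC', $_) } @chunks;

# Approach from the page: normalize_partial() returns the finished prefix
# and leaves the still-open tail in $unproc (modified as a side effect).
my ($result, $unproc) = ('', '');
for my $chunk (@chunks) {
    $unproc .= $chunk;
    $result .= normalize_partial('NFC', $unproc);
}
$result .= $unproc;

my $whole = normalize('NFC', join '', @chunks);
print "streamed eq whole: ", ($result eq $whole ? "yes" : "no"), "\n";
print "naive    eq whole: ", ($naive  eq $whole ? "yes" : "no"), "\n";
```

The loop holds back everything from the last starter onward, so a combining mark in the next chunk can still compose with it; the naive join has no such chance.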