.\" -*- mode: troff; coding: utf-8 -*-
.\" Automatically generated by Pod::Man 5.01 (Pod::Simple 3.43)
.\"
.\" Standard preamble:
.\" ========================================================================
.de Sp \" Vertical space (when we can't use .PP)
.if t .sp .5v
.if n .sp
..
.de Vb \" Begin verbatim text
.ft CW
.nf
.ne \\$1
..
.de Ve \" End verbatim text
.ft R
.fi
..
.\" \*(C` and \*(C' are quotes in nroff, nothing in troff, for use with C<>.
.ie n \{\
. ds C` ""
. ds C' ""
'br\}
.el\{\
. ds C`
. ds C'
'br\}
.\"
.\" Escape single quotes in literal strings from groff's Unicode transform.
.ie \n(.g .ds Aq \(aq
.el .ds Aq '
.\"
.\" If the F register is >0, we'll generate index entries on stderr for
.\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index
.\" entries marked with X<> in POD. Of course, you'll have to process the
.\" output yourself in some meaningful fashion.
.\"
.\" Avoid warning from groff about undefined register 'F'.
.de IX
..
.nr rF 0
.if \n(.g .if rF .nr rF 1
.if (\n(rF:(\n(.g==0)) \{\
. if \nF \{\
. de IX
. tm Index:\\$1\t\\n%\t"\\$2"
..
. if !\nF==2 \{\
. nr % 0
. nr F 2
. \}
. \}
.\}
.rr rF
.\" ========================================================================
.\"
.IX Title "Encode::JP 3perl"
.TH Encode::JP 3perl 2024-02-11 "perl v5.38.2" "Perl Programmers Reference Guide"
.\" For nroff, turn off justification. Always turn off hyphenation; it makes
.\" way too many mistakes in technical documents.
.if n .ad l
.nh
.SH NAME
Encode::JP \- Japanese Encodings
.SH SYNOPSIS
.IX Header "SYNOPSIS"
.Vb 3
\& use Encode qw/encode decode/;
\& $euc_jp = encode("euc\-jp", $utf8); # loads Encode::JP implicitly
\& $utf8 = decode("euc\-jp", $euc_jp); # ditto
.Ve
.SH ABSTRACT
.IX Header "ABSTRACT"
This module implements Japanese character set encodings. The following
encodings are supported.
.PP
.Vb 10
\&  Canonical     Alias              Description
\&  \-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-
\&  euc\-jp        /\ebeuc.*jp$/i      EUC (Extended Unix Code)
\&                /\ebjp.*euc/i
\&                /\ebujis$/i
\&  shiftjis      /\ebshift.*jis$/i   Shift JIS (aka MS Kanji)
\&                /\ebsjis$/i
\&  7bit\-jis      /\ebjis$/i          7bit JIS
\&  iso\-2022\-jp                      ISO\-2022\-JP                [RFC1468]
\&                                   = 7bit JIS with all Halfwidth Kana
\&                                     converted to Fullwidth
\&  iso\-2022\-jp\-1                    ISO\-2022\-JP\-1              [RFC2237]
\&                                   = ISO\-2022\-JP with JIS X 0212\-1990
\&                                     support. See below
\&  MacJapanese                      Shift JIS + Apple vendor mappings
\&  cp932         /\ebwindows\-31j$/i  Code Page 932
\&                                   = Shift JIS + MS/IBM vendor mappings
\&  jis0201\-raw                      JIS0201, raw format
\&  jis0208\-raw                      JIS0208, raw format
\&  jis0212\-raw                      JIS0212, raw format
\&  \-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-
.Ve
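.PP
The alias patterns in the table can be checked against a local installation. The following is a minimal sketch, assuming Encode is installed with its stock alias definitions; the helper name canonical_name and the sample spellings are illustrative choices and not part of this module.

```perl
use strict;
use warnings;
use feature "say";
use Encode qw(find_encoding);

# Hypothetical helper: resolve a name or alias to the canonical
# encoding name, or return undef if Encode does not recognize it.
sub canonical_name {
    my ($alias) = @_;
    my $enc = find_encoding($alias);
    return ref $enc ? $enc->name : undef;
}

# Sample spellings chosen to match the alias patterns in the table.
for my $alias (qw(ujis sjis jis windows-31j)) {
    my $name = canonical_name($alias);
    say sprintf("%-12s => %s", $alias, defined $name ? $name : "(unknown)");
}
```
.PP
Resolving through find_encoding also loads Encode::JP on demand, so no explicit use line for this module is needed.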
.SH DESCRIPTION
.IX Header "DESCRIPTION"
To find out how to use this module in detail, see Encode.
.SH "Note on ISO\-2022\-JP(\-1)?"
.IX Header "Note on ISO-2022-JP(-1)?"
ISO\-2022\-JP\-1 (RFC2237) is a superset of ISO\-2022\-JP (RFC1468) that
adds support for JIS X 0212\-1990. The two names are therefore
interchangeable when decoding to utf8, but not when encoding.
.PP
.Vb 1
\& $utf8 = decode(\*(Aqiso\-2022\-jp\-1\*(Aq, $stream);
.Ve
.PP
and
.PP
.Vb 1
\& $utf8 = decode(\*(Aqiso\-2022\-jp\*(Aq, $stream);
.Ve
.PP
yield the same result but
.PP
.Vb 1
\& $with_0212 = encode(\*(Aqiso\-2022\-jp\-1\*(Aq, $utf8);
.Ve
.PP
is different from
.PP
.Vb 1
\& $without_0212 = encode(\*(Aqiso\-2022\-jp\*(Aq, $utf8 );
.Ve
.PP
In the latter case, characters that map to JIS X 0212 are first
converted to U+3013 (0xA2AE in EUC-JP; a white square also known as
\&'Tofu' or 'geta mark') and then fed to the encoding engine. U+FFFD is
not used, in order to preserve text layout as much as possible.
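.PP
The difference between the two encoders can be observed directly. The sketch below is illustrative only: hex_dump is a hypothetical helper, and the choice of U+4E02 as a character covered by JIS X 0212 but not by JIS X 0208 is an assumption. The exact substitute bytes written by the plain ISO\-2022\-JP encoder may vary between Encode versions, so the code prints them rather than asserting a value.

```perl
use strict;
use warnings;
use feature "say";
use Encode qw(encode decode);

# Hypothetical helper: render a byte string as space-separated hex.
sub hex_dump {
    my ($bytes) = @_;
    return join " ", map { sprintf "%02X", $_ } unpack "C*", $bytes;
}

# U+65E5 U+672C are in JIS X 0208, so both encodings can carry them.
my $plain  = chr(0x65E5) . chr(0x672C);
my $stream = encode("iso-2022-jp", $plain);
say "decoders agree"
    if decode("iso-2022-jp", $stream) eq decode("iso-2022-jp-1", $stream);

# Assumption: U+4E02 is covered by JIS X 0212 but not by JIS X 0208.
my $mixed        = $plain . chr(0x4E02);
my $with_0212    = encode("iso-2022-jp-1", $mixed);
my $without_0212 = encode("iso-2022-jp",   $mixed);
say "iso-2022-jp-1: ", hex_dump($with_0212);
say "iso-2022-jp:   ", hex_dump($without_0212);
```
.PP
Only the ISO\-2022\-JP\-1 output is expected to decode back to the original string; the ISO\-2022\-JP output carries a substitute in place of the JIS X 0212 character.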
.SH BUGS
.IX Header "BUGS"
The ASCII region (0x00\-0x7f) is preserved for all encodings, even
though this conflicts with mappings by the Unicode Consortium.
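.PP
A commonly cited instance of this conflict is byte 0x5C, which some published Shift JIS mapping tables assign to U+00A5 YEN SIGN rather than U+005C. The following sketch checks the ASCII range for three of the encodings listed above; ascii_preserved is a hypothetical helper, and the escape\-sequence based encodings are left out because byte 0x1B has a special role there.

```perl
use strict;
use warnings;
use feature "say";
use Encode qw(encode decode);

# Hypothetical helper: true if every byte 0x00-0x7F decodes to the
# code point of the same value and encodes back to the same byte.
sub ascii_preserved {
    my ($encoding) = @_;
    my $bytes = pack "C*", 0x00 .. 0x7F;
    my $chars = join "", map { chr } 0x00 .. 0x7F;
    return decode($encoding, $bytes) eq $chars
        && encode($encoding, $chars) eq $bytes;
}

for my $encoding (qw(euc-jp shiftjis cp932)) {
    say sprintf("%-9s %s", $encoding,
        ascii_preserved($encoding) ? "preserved" : "not preserved");
}

# Per the note above, byte 0x5C should come back as U+005C here.
say sprintf("U+%04X", ord(decode("shiftjis", chr(0x5C))));
```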
.SH "SEE ALSO"
.IX Header "SEE ALSO"
Encode