diff options
Diffstat (limited to 'upstream/opensuse-leap-15-6/man1/preconv.1')
-rw-r--r-- | upstream/opensuse-leap-15-6/man1/preconv.1 | 374 |
1 files changed, 374 insertions, 0 deletions
diff --git a/upstream/opensuse-leap-15-6/man1/preconv.1 b/upstream/opensuse-leap-15-6/man1/preconv.1 new file mode 100644 index 00000000..94c9dbb8 --- /dev/null +++ b/upstream/opensuse-leap-15-6/man1/preconv.1 @@ -0,0 +1,374 @@ +.TH PRECONV 1 "7 February 2022" "groff 1.22.4" +.SH NAME +preconv \- convert encoding of input files to something GNU troff \ +understands +. +. +.\" Save and disable compatibility mode (for, e.g., Solaris 10/11). +.do nr preconv_C \n[.C] +.cp 0 +. +. +.\" ==================================================================== +.\" Legal Terms +.\" ==================================================================== +.\" +.\" Copyright (C) 2006-2018 Free Software Foundation, Inc. +.\" +.\" Permission is granted to make and distribute verbatim copies of this +.\" manual provided the copyright notice and this permission notice are +.\" preserved on all copies. +.\" +.\" Permission is granted to copy and distribute modified versions of +.\" this manual under the conditions for verbatim copying, provided that +.\" the entire resulting derived work is distributed under the terms of +.\" a permission notice identical to this one. +.\" +.\" Permission is granted to copy and distribute translations of this +.\" manual into another language, under the above conditions for +.\" modified versions, except that this permission notice may be +.\" included in translations approved by the Free Software Foundation +.\" instead of in the original English. +. +. +.\" ==================================================================== +.SH SYNOPSIS +.\" ==================================================================== +. +.SY preconv +.OP \-dr +.OP \-D default_encoding +.OP \-e encoding +.RI [ file +\&.\|.\|.\&] +. +.SY preconv +.B \-h +.SY preconv +.B \-\-help +.YS +. +.SY preconv +.B \-v +.SY preconv +.B \-\-version +.YS +. +. +.\" ==================================================================== +.SH DESCRIPTION +.\" ==================================================================== +. +.B preconv +reads +.I files +and converts its encoding(s) to a form GNU +.BR troff (1) +can process, sending the data to standard output. +. +Currently, this means ASCII characters and \[oq]\e[uXXXX]\[cq] +entities, where \[oq]XXXX\[cq] is a hexadecimal number with four to +six digits, representing a Unicode input code. +. +Normally, +.B preconv +should be invoked with the +.B \-k +and +.B \-K +options of +.BR groff . +. +. +.\" ==================================================================== +.SH OPTIONS +.\" ==================================================================== +. +Whitespace is permitted between a command-line option and its argument. +. +. +.TP +.B \-d +Emit debugging messages to standard error (mainly the used encoding). +. +.TP +.BI \-D encoding +Specify default encoding if everything fails (see below). +. +.TP +.BI \-e encoding +Specify input encoding explicitly, overriding all other methods. +. +This corresponds to +.BR groff 's +.BI \-K encoding +option. +. +Without this switch, +.B preconv +uses the algorithm described below to select the input encoding. +. +.TP +.B \-\-help +.TQ +.B \-h +Print a help message and exit. +. +.TP +.B \-r +Do not add \&.lf requests. +. +.TP +.B \-\-version +.TQ +.B \-v +Print the version number and exit. +. +. +.\" ==================================================================== +.SH USAGE +.\" ==================================================================== +. +.B preconv +tries to find the input encoding with the following algorithm. +. +.IP 1. +If the input encoding has been explicitly specified with option +.BR \-e , +use it. +. +.IP 2. +Otherwise, check whether the input starts with a +.I Byte Order Mark +(BOM, see below). +. +If found, use it. +. +.IP 3. +Otherwise, check whether there is a known +.I coding tag +(see below) in either the first or second input line. +. +If found, use it. +. +.IP 4 +Finally, if the +.B uchardet +library +(an encoding detector library available on most major distributions) +is available on the system, use it to try to detect the encoding of the file. +. +.IP 5. +If everything fails, use a default encoding as given with option +.BR \-D , +by the current locale, or \[oq]latin1\[cq] if the locale is set to +\[oq]C\[cq], \[oq]POSIX\[cq], or empty (in that order). +. +. +.PP +Note that the +.B groff +program supports a +.I \%GROFF_ENCODING +environment variable which is eventually expanded to option +.BR \-k . +. +. +.\" ==================================================================== +.SS "Byte Order Mark" +.\" ==================================================================== +. +The Unicode Standard defines character U+FEFF as the Byte Order Mark +(BOM). +. +On the other hand, value U+FFFE is guaranteed not be a Unicode character at +all. +. +This allows detection of the byte order within the data stream (either +big-endian or little-endian), and the MIME encodings \%\[oq]UTF-16\[cq] +and \%\[oq]UTF-32\[cq] mandate that the data stream starts with U+FEFF. +. +Similarly, the data stream encoded as \%\[oq]UTF-8\[cq] might start +with a BOM (to ease the conversion from and to \%UTF-16 and \%UTF-32). +. +In all cases, the byte order mark is +.I not +part of the data but part of the encoding protocol; in other words, +.BR preconv 's +output doesn't contain it. +. +. +.PP +Note that U+FEFF not at the start of the input data actually is +emitted; it has then the meaning of a \[oq]zero width no-break +space\[cq] character \[en] something not needed normally in +.BR groff . +. +. +.\" ==================================================================== +.SS "Coding Tags" +.\" ==================================================================== +. +Editors which support more than a single character encoding need tags +within the input files to mark the file's encoding. +. +While it is possible to guess the right input encoding with the help of +heuristic algorithms for data which represents a greater amount of a natural +language, it is still just a guess. +. +Additionally, all algorithms fail easily for input which is either too short +or doesn't represent a natural language. +. +. +.PP +For these reasons, +.B preconv +supports the coding tag convention (with some restrictions) as used by +.B "GNU Emacs" +and +.B XEmacs +(and probably other programs too). +. +. +.PP +Coding tags in +.B "GNU Emacs" +and +.B XEmacs +are stored in so-called +.IR "File Variables" . +. +.B preconv +recognizes the following syntax form which must be put into a troff comment +in the first or second line. +. +.RS +.PP +\-*\- +.IR tag1 : +.IR value1 ; +.IR tag2 : +.IR value2 ; +\&.\|.\|.\& \-*\- +.RE +. +. +.PP +The only relevant tag for +.B preconv +is \[oq]coding\[cq] which can take the values listed below. +. +Here an example line which tells +.B Emacs +to edit a file in troff mode, and to use \%latin2 as its encoding. +. +.RS +.PP +.EX +\&.\[rs]" \-*\- mode: troff; coding: latin-2 \-*\- +.EE +.RE +. +. +.PP +The following list gives all MIME coding tags (either lowercase or +uppercase) supported by +.BR preconv ; +this list is hard-coded in the source. +. +.RS +.PP +.ad l +\%big5, \%cp1047, \%euc-jp, \%euc-kr, \%gb2312, \%iso-8859-1, +\%iso-8859-2, \%iso-8859-5, \%iso-8859-7, \%iso-8859-9, \%iso-8859-13, +\%iso-8859-15, \%koi8-r, \%us-ascii, \%utf-8, \%utf-16, \%utf-16be, +\%utf-16le +.ad +.RE +. +. +.PP +In addition, the following hard-coded list of other tags is recognized +which eventually map to values from the list above. +. +.RS +.PP +.ad l +\%ascii, \%chinese-big5, \%chinese-euc, \%chinese-iso-8bit, \%cn-big5, +\%\%cn-gb, \%cn-gb-2312, \%cp878, \%csascii, \%csisolatin1, +\%cyrillic-iso-8bit, \%cyrillic-koi8, \%euc-china, \%euc-cn, +\%euc-japan, \%euc-japan-1990, \%euc-korea, \%greek-iso-8bit, +\%iso-10646/utf8, \%iso-10646/utf-8, \%iso-latin-1, \%iso-latin-2, +\%iso-latin-5, \%iso-latin-7, \%iso-latin-9, \%japanese-euc, +\%japanese-iso-8bit, \%jis8, \%koi8, \%korean-euc, \%korean-iso-8bit, +\%latin-0, \%latin1, \%latin-1, \%latin-2, \%latin-5, \%latin-7, +\%latin-9, \%mule-utf-8, \%mule-utf-16, \%mule-utf-16be, +\%mule-utf-16-be, \%mule-utf-16be-with-signature, \%mule-utf-16le, +\%mule-utf-16-le, \%mule-utf-16le-with-signature, \%utf8, \%utf-16-be, +\%utf-16-be-with-signature, \%utf-16be-with-signature, \%utf-16-le, +\%utf-16-le-with-signature, \%utf-16le-with-signature +.ad +.RE +. +. +.PP +Those tags are taken from +.B "GNU Emacs" +and +.BR XEmacs , +together with some aliases. +. +Trailing \%\[oq]-dos\[cq], \%\[oq]-unix\[cq], and \%\[oq]-mac\[cq] +suffixes of coding tags (which give the end-of-line convention used in +the file) are stripped off before the comparison with the above tags +happens. +. +.SS "Iconv Issues" +.B preconv +by itself only supports three encodings: \%latin-1, cp1047, and \%UTF-8; +all other encodings are passed to the +.B iconv +library functions. +. +At compile time it is searched and checked for a valid +.B iconv +implementation; a call to \[oq]preconv \-\-version\[cq] shows whether +.B iconv +is used. +. +. +.\" ==================================================================== +.SH BUGS +.\" ==================================================================== +. +.B preconv +doesn't support +.I "local variable lists" +yet. +. +This is a different syntax form to specify local variables at the end of a +file. +. +. +.\" ==================================================================== +.SH "SEE ALSO" +.\" ==================================================================== +. +.BR groff (1) +.br +the +.B "GNU Emacs" +and +.B XEmacs +info pages +. +. +.\" Restore compatibility mode (for, e.g., Solaris 10/11). +.cp \n[preconv_C] +. +. +.\" Emacs setting +.\" Local Variables: +.\" mode: nroff +.\" End: +.\" vim: set filetype=groff: |