summaryrefslogtreecommitdiffstats
path: root/src/preproc/preconv/preconv.1.man
diff options
context:
space:
mode:
Diffstat (limited to 'src/preproc/preconv/preconv.1.man')
-rw-r--r--src/preproc/preconv/preconv.1.man559
1 files changed, 559 insertions, 0 deletions
diff --git a/src/preproc/preconv/preconv.1.man b/src/preproc/preconv/preconv.1.man
new file mode 100644
index 0000000..1535bae
--- /dev/null
+++ b/src/preproc/preconv/preconv.1.man
@@ -0,0 +1,559 @@
+.TH preconv @MAN1EXT@ "@MDATE@" "groff @VERSION@"
+.SH Name
+preconv \- prepare files for typesetting with
+.I groff
+.
+.
+.\" ====================================================================
+.\" Legal Terms
+.\" ====================================================================
+.\"
+.\" Copyright (C) 2006-2020 Free Software Foundation, Inc.
+.\"
+.\" Permission is granted to make and distribute verbatim copies of this
+.\" manual provided the copyright notice and this permission notice are
+.\" preserved on all copies.
+.\"
+.\" Permission is granted to copy and distribute modified versions of
+.\" this manual under the conditions for verbatim copying, provided that
+.\" the entire resulting derived work is distributed under the terms of
+.\" a permission notice identical to this one.
+.\"
+.\" Permission is granted to copy and distribute translations of this
+.\" manual into another language, under the above conditions for
+.\" modified versions, except that this permission notice may be
+.\" included in translations approved by the Free Software Foundation
+.\" instead of in the original English.
+.
+.
+.\" Save and disable compatibility mode (for, e.g., Solaris 10/11).
+.do nr *groff_preconv_1_man_C \n[.cp]
+.cp 0
+.
+.\" Define fallback for groff 1.23's MR macro if the system lacks it.
+.nr do-fallback 0
+.if !\n(.f .nr do-fallback 1 \" mandoc
+.if \n(.g .if !d MR .nr do-fallback 1 \" older groff
+.if !\n(.g .nr do-fallback 1 \" non-groff *roff
+.if \n[do-fallback] \{\
+. de MR
+. ie \\n(.$=1 \
+. I \%\\$1
+. el \
+. IR \%\\$1 (\\$2)\\$3
+. .
+.\}
+.rr do-fallback
+.
+.
+.\" ====================================================================
+.SH Synopsis
+.\" ====================================================================
+.
+.SY preconv
+.RB [ \-dr ]
+.RB [ \-D\~\c
+.IR fallback-encoding ]
+.RB [ \-e\~\c
+.IR encoding ]
+.RI [ file\~ .\|.\|.]
+.YS
+.
+.
+.SY preconv
+.B \-h
+.
+.SY preconv
+.B \-\-help
+.YS
+.
+.
+.SY preconv
+.B \-v
+.
+.SY preconv
+.B \-\-version
+.YS
+.
+.
+.\" ====================================================================
+.SH Description
+.\" ====================================================================
+.
+.I preconv
+reads each
+.IR file ,
+converts its encoded characters to a form
+.MR @g@troff @MAN1EXT@
+can interpret,
+and sends the result to the standard output stream.
+.
+Currently,
+this means that code points in the range 0\[en]127
+(in US-ASCII,
+ISO\~8859,
+or Unicode)
+remain as-is and the remainder are converted to the
+.I groff
+special character form
+.RB \[lq] \[rs][\c
+.BI u XXXX ]\c
+\[rq],
+where
+.I XXXX
+is a hexadecimal number of four to six digits corresponding to a Unicode
+code point.
+.
+By default,
+.I preconv
+also inserts a
+.I roff
+.B .lf
+request at the beginning of each
+.IR file ,
+identifying it for the benefit of later processing
+(including diagnostic messages);
+the
+.B \-r
+option suppresses this behavior.
+.
+.
+.PP
+In typical usage scenarios,
+.I preconv
+need not be run directly;
+instead it should be invoked with the
+.B \-k
+or
+.B \-K
+options of
+.IR groff .
+.
+If no
+.I file
+operands are given on the command line,
+or if
+.I file
+is
+.RB \[lq] \- \[rq],
+the standard input stream is read.
+.
+.
+.PP
+.I preconv
+tries to find the input encoding with the following algorithm,
+stopping at the first success.
+.
+.
+.IP 1. 4n
+If the input encoding has been explicitly specified with option
+.BR \-e ,
+use it.
+.
+.
+.IP 2.
+If the input starts with a Unicode Byte Order Mark,
+determine the encoding as UTF-8,
+UTF-16,
+or UTF-32 accordingly.
+.
+.
+.IP 3.
+If the input stream is seekable,
+check the first and second input lines for a recognized GNU\~Emacs
+file-local variable identifying the character encoding,
+here referred to as the \[lq]coding tag\[rq] for brevity.
+.
+If found,
+use it.
+.
+.
+.IP 4.
+If the input stream is seekable,
+and if the
+.I uchardet
+library is available on the system,
+use it to try to infer the encoding of the file.
+.
+.
+.IP 5.
+If the
+.B \-D
+option specifies an encoding,
+use it.
+.
+.
+.IP 6.
+Use the encoding specified by the current locale
+.RI ( LC_CTYPE ),
+unless the locale is
+\[lq]C\[rq],
+\[lq]POSIX\[rq],
+or empty,
+in which case assume Latin-1
+(ISO\~8859-1).
+.
+.
+.PP
+The coding tag and
+.I uchardet
+methods in the above procedure rely upon a seekable input stream;
+when
+.I preconv
+reads from a pipe,
+the stream is not seekable,
+and these detection methods are skipped.
+.
+If character encoding detection of your input files is unreliable,
+arrange for one of the other methods to succeed by using
+.IR preconv 's
+.B \-D
+or
+.B \-e
+options,
+or by configuring your locale appropriately.
+.
+.I groff
+also supports a
+.I \%GROFF_ENCODING
+environment variable,
+which can be overridden by its
+.B \-K
+option.
+.
+Valid values for
+(or parameters to)
+all of these are enumerated in the lists of recognized coding tags in
+the next subsection,
+and are further influenced by
+.I iconv
+library support.
+.
+.
+.\" ====================================================================
+.SS "Coding tags"
+.\" ====================================================================
+.
+Text editors that support more than a single character encoding need
+tags within the input files to mark the file's encoding.
+.
+While it is possible to guess the right input encoding with the help of
+heuristics that are reliable for a preponderance of natural language
+texts,
+they are not absolutely reliable.
+.
+Heuristics can fail on inputs that are too short or don't represent a
+natural language.
+.
+.
+.PP
+Consequently,
+.I preconv
+supports the coding tag convention used by GNU\~Emacs
+(with some restrictions).
+.
+This notation appears in specially marked regions of an input file
+designated for \[lq]file-local variables\[rq].
+.
+.
+.PP
+.I preconv
+interprets the following syntax if it occurs in a
+.I roff
+comment
+in the first or second line of the input file.
+.
+Both \[lq]\[rs]"\[rq] and \[lq]\[rs]#\[rq] comment forms are recognized,
+but the control
+(or no-break control)
+character must be the default and must begin the line.
+.
+Similarly,
+the escape character must be the default.
+.
+.
+.RS
+.EX
+.B \-*\- \c
+.RB [.\|.\|. ; ]\~\c
+.B coding: \c
+.I encoding\c
+.RB [ ;\~ .\|.\|.\&]\~\c
+.B \-*\-
+.EE
+.RE
+.
+.
+.PP
+The only variable
+.I preconv
+interprets is \[lq]coding\[rq],
+which can take the values listed below.
+.
+.
+.PP
+The following list comprises all MIME \[lq]charset\[rq] parameter values
+recognized,
+case-insensitively,
+by
+.IR preconv .
+.
+.RS
+\%big5,
+\%cp1047,
+\%euc\-jp,
+\%euc\-kr,
+\%gb2312,
+\%iso\-8859\-1,
+\%iso\-8859\-2,
+\%iso\-8859\-5,
+\%iso\-8859\-7,
+\%iso\-8859\-9,
+\%iso\-8859\-13,
+\%iso\-8859\-15,
+\%koi8\-r,
+\%us\-ascii,
+\%utf\-8,
+\%utf\-16,
+\%utf\-16be,
+\%utf\-16le
+.RE
+.
+.
+.PP
+In addition,
+the following list of other coding tags is recognized,
+each of which is mapped to an appropriate value from the list above.
+.
+.RS
+\%ascii,
+\%chinese\-big5,
+\%chinese\-euc,
+\%chinese\-iso\-8bit,
+\%cn\-big5,
+\%cn\-gb,
+\%cn\-gb\-2312,
+\%cp878,
+\%csascii,
+\%csisolatin1,
+\%cyrillic\-iso\-8bit,
+\%cyrillic\-koi8,
+\%euc\-china,
+\%euc\-cn,
+\%euc\-japan,
+\%euc\-japan\-1990,
+\%euc\-korea,
+\%greek\-iso\-8bit,
+\%iso\-10646/utf8,
+\%iso\-10646/utf\-8,
+\%iso\-latin\-1,
+\%iso\-latin\-2,
+\%iso\-latin\-5,
+\%iso\-latin\-7,
+\%iso\-latin\-9,
+\%japanese\-euc,
+\%japanese\-iso\-8bit,
+\%jis8,
+\%koi8,
+\%korean\-euc,
+\%korean\-iso\-8bit,
+\%latin\-0,
+\%latin1,
+\%latin\-1,
+\%latin\-2,
+\%latin\-5,
+\%latin\-7,
+\%latin\-9,
+\%mule\-utf\-8,
+\%mule\-utf\-16,
+\%mule\-utf\-16be,
+\%mule\-utf\-16\-be,
+\%mule\-utf\-16be\-with\-signature,
+\%mule\-utf\-16le,
+\%mule\-utf\-16\-le,
+\%mule\-utf\-16le\-with\-signature,
+\%utf8,
+\%utf\-16\-be,
+\%utf\-16\-be\-with\-signature,
+\%utf\-16be\-with\-signature,
+\%utf\-16\-le,
+\%utf\-16\-le\-with\-signature,
+\%utf\-16le\-with\-signature
+.RE
+.
+.
+.PP
+Trailing
+\[lq]\-dos\[rq],
+\[lq]\-unix\[rq],
+and
+\[lq]\-mac\[rq]
+suffixes on coding tags
+(which indicate the end-of-line convention used in the file)
+are disregarded for the purpose of comparison with the above tags.
+.
+.
+.\" ====================================================================
+.SS "\f[I]iconv\f[] support"
+.\" ====================================================================
+.
+While
+.I preconv
+recognizes all of the coding tags listed above,
+it is capable on its own of interpreting only three encodings:
+Latin-1,
+code page 1047,
+and UTF-8.
+.
+If
+.I iconv
+support is configured at compile time and available at run time,
+all others are passed to
+.I iconv
+library functions,
+which may recognize many additional encoding strings.
+.
+The command
+.RB \[lq] preconv\~\-v \[rq]
+discloses whether
+.I iconv
+support is configured.
+.
+.
+.PP
+The use of
+.I iconv
+means that characters in the input that encode invalid code points for
+that encoding may be dropped from the output stream or mapped to the
+Unicode replacement character
+(U+FFFD).
+.
+Compare the following examples using the input \[lq]caf\['e]\[rq]
+(note the \[lq]e\[rq] with an acute accent),
+which due to its short length challenges inference of the encoding used.
+.
+.RS
+.EX
+printf \[aq]caf\[rs]351\[rs]n\[aq] | LC_ALL=en_US.UTF\-8 preconv
+printf \[aq]caf\[rs]351\[rs]n\[aq] | preconv \-e us\-ascii
+printf \[aq]caf\[rs]351\[rs]n\[aq] | preconv \-e latin\-1
+.EE
+.RE
+.
+The fate of the accented \[lq]e\[rq] differs in each case.
+.
+In the first,
+.I uchardet
+fails to detect an encoding
+(though the library on your system may behave differently)
+and
+.I preconv
+falls back to the locale settings,
+where octal 351 starts an incomplete UTF-8 sequence and results in the
+Unicode replacement character.
+.
+In the second,
+it is not a representable character in the declared input encoding of
+US-ASCII and is discarded by
+.IR iconv .
+.
+In the last,
+it is correctly detected and mapped.
+.
+.
+.\" ====================================================================
+.SS Limitations
+.\" ====================================================================
+.
+.I preconv
+cannot perform any transformation on input that it cannot see.
+.
+Examples include files that are interpolated by preprocessors that run
+subsequently,
+including
+.MR @g@soelim @MAN1EXT@ ;
+files included by
+.I @g@troff
+itself through
+.RB \[lq] so \[rq]
+and similar requests;
+and string definitions passed to
+.I @g@troff
+through its
+.B \-d
+command-line option.
+.
+.
+.P
+.I preconv
+assumes that its input uses the default escape character,
+a backslash
+.BR \[rs] ,
+and writes special character escape sequences accordingly.
+.
+.
+.\" ====================================================================
+.SH Options
+.\" ====================================================================
+.
+.B \-h
+and
+.B \-\-help
+display a usage message,
+while
+.B \-v
+and
+.B \-\-version
+show version information;
+all exit afterward.
+.
+.
+.TP
+.B \-d
+Emit debugging messages to the standard error stream.
+.
+.
+.TP
+.BI \-D\~ fallback-encoding
+Report
+.I fallback-encoding
+if all detection methods fail.
+.
+.
+.TP
+.BI \-e\~ encoding
+Skip detection and assume
+.IR encoding ;
+see
+.IR groff 's
+.B \-K
+option.
+.
+.
+.TP
+.B \-r
+Write files \[lq]raw\[rq];
+do not add
+.B .lf
+requests.
+.
+.
+.\" ====================================================================
+.SH "See also"
+.\" ====================================================================
+.
+.MR groff @MAN1EXT@ ,
+.MR iconv 3 ,
+.MR locale 7
+.
+.
+.\" Restore compatibility mode (for, e.g., Solaris 10/11).
+.cp \n[*groff_preconv_1_man_C]
+.do rr *groff_preconv_1_man_C
+.
+.
+.\" Local Variables:
+.\" fill-column: 72
+.\" mode: nroff
+.\" End:
+.\" vim: set filetype=groff textwidth=72: