Lynx CHARTRANS

 Features (in addition to those which Lynx 2.7.1 already has):

 - Can (attempt to) translate from any document charset to any display
   character set, *IF* the document charset is known by a translation
   table (compiled in at installation).

 - New method to define character sets: used for input charset as well
   as display character set, translation tables compiled in from
   separate files (one per charset).  One table is designated as default
   and can be used for fallback translation to 7-bit replacements for
   display.

 - New method for specifying translations of SGML entities.

 - Unicode (UTF-8) support: can (attempt to) decode and translate UTF-8 to
   display character set, or pass through UTF to display (if terminal
   or console understands UTF-8).  [Raw display of UTF-8 has only been
   tested with Slang so far; it does not always position everything
   correctly on screen.]

 - Support for CHARSET attribute on A tag (and sometimes LINK), as in HTML
   i18n RFC 2070 and W3C HTML 4.0 drafts.  A link can suggest the target's
   charset in this way.

 - Support for ACCEPT-CHARSET attribute of FORM tags.

 - EXPERIMENTAL, currently enabled only for Linux console:
   can (attempt to) automatically switch terminal mode and load new
   code pages on change of display character set.

 - Some minor changes: invalid characters were sometimes displayed in a
   hex notation Uxxxx (which helps debugging, and which I regard as at
   least no worse than showing the wrong char without warning); they are
   now not displayed at all, to reduce garbage.
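
The translation path described in the feature list above (document charset to Unicode, Unicode to display character set, with a default table supplying 7-bit replacements) can be sketched roughly as follows.  This is only an illustration, not Lynx's implementation: the table contents, the names TO_UNICODE, FROM_UNICODE, SEVEN_BIT_FALLBACK and translate(), and the handling of an unknown document charset are all assumptions made for the example.

```python
# Sketch only: tiny hypothetical stand-ins for the compiled-in tables.

# Document charset -> Unicode, for byte values >= 0x80.
TO_UNICODE = {
    "koi8-r": {0xC1: 0x0430, 0xC2: 0x0431},        # Cyrillic a, be
    "iso-8859-1": {0xE9: 0x00E9, 0xFC: 0x00FC},    # e acute, u diaeresis
}

# Unicode -> display character set.
FROM_UNICODE = {
    "iso-8859-1": {0x00E9: 0xE9, 0x00FC: 0xFC},
    "cp866": {0x0430: 0xA0, 0x0431: 0xA1},
}

# Default table: 7-bit replacements used when the display set lacks a char.
SEVEN_BIT_FALLBACK = {0x00E9: "e", 0x00FC: "ue", 0x0430: "a", 0x0431: "b"}


def translate(data: bytes, doc_charset: str, display_charset: str) -> bytes:
    """Translate document bytes into bytes for the display character set."""
    if doc_charset == "utf-8":
        # Decode UTF-8 to code points; bad sequences become U+FFFD.
        text = data.decode("utf-8", errors="replace")
        code_points = [ord(ch) for ch in text]
    else:
        table = TO_UNICODE.get(doc_charset)
        if table is None:
            # Assumption of this sketch: an unknown charset is passed through.
            return data
        code_points = [b if b < 0x80 else table.get(b, 0xFFFD) for b in data]

    display = FROM_UNICODE.get(display_charset, {})
    out = bytearray()
    for cp in code_points:
        if cp < 0x80:
            out.append(cp)                      # ASCII is kept as-is
        elif cp in display:
            out.append(display[cp])             # direct mapping exists
        else:
            out += SEVEN_BIT_FALLBACK.get(cp, "?").encode("ascii")
    return bytes(out)


print(translate(b"caf\xe9", "iso-8859-1", "cp866"))  # b'cafe'
```

In the sketch, a character missing from the display table degrades to a 7-bit replacement instead of being sent to the terminal as an arbitrary byte.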

Additions/changes to user interface:

 - many new Display Character Sets are available on O)ptions screen.
   (One can use arrow keys, HOME, END etc. for cycling through the list
   or use selection from popup box, as for other options.)

 - new command line flags:
   -assume_charset=...  assume this as charset for documents that don't
                        specify a charset parameter in HTTP headers
   -assume_local_charset=...  assume this as charset of local file
   -assume_unrec_charset=...  assume this as charset in case a charset
                        parameter is not recognized
   These are also available as ASSUME_CHARSET etc. in lynx.cfg.
   In "Advanced User" mode, ASSUME_CHARSET can be changed during a session
   from the Options Screen.

 - The "Raw" toggle (from -raw flag, '@' key, or Options screen)
   o  toggles the assumption "Default remote charset is same as Display
      Character Set" on or off.
      Toggling of the assumed charset is between Display Character Set and
      the specified ASSUME_CHARSET or, if they are the same, between the
      specified ASSUME_CHARSET and ISO-8859-1.
   o  The default for raw mode now depends on the Display Character Set as
      well as on the specified ASSUME_CHARSET value.
   o  should work as before for CJK charsets (turning CJK-mode on or off).
   o  If the effective ASSUME_CHARSET and the Display Character Set are
      unchanged from the ISO-8859-1 default, toggling "Raw" may have some
      additional effect for characters that can't be translated.
   (Try the "Transparent" Display Character Set for more "rawness".)
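
The interplay of the assume_* settings and the "Raw" toggle described above might be modelled like this.  It is a simplified sketch under stated assumptions, not the actual Lynx logic: the function names, the KNOWN_CHARSETS set, and the choice to fall back to ASSUME_CHARSET when no more specific setting is given are invented for illustration, and charset information from sources other than the HTTP header is ignored.

```python
from typing import Optional

# Hypothetical subset of charsets known through compiled-in tables.
KNOWN_CHARSETS = {"iso-8859-1", "iso-8859-2", "koi8-r", "utf-8"}


def effective_charset(http_charset: Optional[str],
                      is_local_file: bool,
                      assume_charset: str = "iso-8859-1",
                      assume_local_charset: Optional[str] = None,
                      assume_unrec_charset: Optional[str] = None) -> str:
    """Pick the charset a document is assumed to be in."""
    if http_charset is not None:
        name = http_charset.lower()
        if name in KNOWN_CHARSETS:
            return name
        # Charset parameter present but not recognized.
        # Falling back to assume_charset here is an assumption of the sketch.
        return assume_unrec_charset or assume_charset
    if is_local_file and assume_local_charset:
        return assume_local_charset
    return assume_charset


def toggle_raw(current: str, display_charset: str, assume_charset: str) -> str:
    """Return the assumed charset after one press of the Raw toggle.

    Toggles between the Display Character Set and ASSUME_CHARSET or,
    if those are the same, between ASSUME_CHARSET and ISO-8859-1.
    """
    if display_charset != assume_charset:
        other = display_charset
    else:
        other = "iso-8859-1"
    return other if current == assume_charset else assume_charset
```

With a KOI8-R display and ASSUME_CHARSET set to iso-8859-2, repeated toggling in this model alternates between "koi8-r" and "iso-8859-2".  The additional effect mentioned for the all-ISO-8859-1 case is not modelled.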


Requirements:  same as for Lynx in general :)

The chartrans code is now merged with Wayne Buttle's changes for
32-bit MS Windows and DOS/DJGPP, with Thomas Dickey's and Jim Spath's
emerging auto-configure mechanism, and with BUGFIXES from Foteos
Macrides.  See the accompanying file CHANGES for the current
status.


A warning:
In some cases undisplayable bytes may still get sent to the terminal
and then be interpreted as control chars; there is no protection
against this if strange things are defined in the table files.


HOW TO INSTALL:

(4) Before compiling:

    Check top level makefile or Makefile and userdefs.h as usual.

    NOTE that there is a new "#define" in userdefs.h for MAX_CHARSETS
    near the end (in "Section 3.").

(5) Building Lynx:

    Compiling the chartrans code is now integrated into the normal
    installation procedures for UNIX (configure script) and other
    platforms.

    What's supposed to happen (in addition to the usual things when
    building Lynx): in the new subdirectory src/chrtrans, make should
    first compile the auxiliary program `makeuctb', then invoke that
    program to create xxxxx_yyy.h files from the provided xxxxx_yyy.tab
    translation table files.  (See README.* files in src/chrtrans for
    more info.)
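
To give a feel for the table-to-header step, here is a rough sketch of what a converter of this kind does.  It is not makeuctb: the input line format (a byte value followed by a U+XXXX code point, with '#' comments) and the generated array layout are simplifying assumptions made for the example, and the real syntax is the one described in the README.* files in src/chrtrans.

```python
import re

# Hypothetical, simplified line format: "0xC1 U+0430  # optional comment"
LINE_RE = re.compile(r"^0x([0-9A-Fa-f]{2})\s+U\+([0-9A-Fa-f]{4})$")


def parse_table(text: str) -> dict:
    """Parse table text into a {byte value: Unicode code point} mapping."""
    mapping = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and whitespace
        if not line:
            continue
        match = LINE_RE.match(line)
        if match is None:
            raise ValueError(f"unrecognized table line: {raw!r}")
        mapping[int(match.group(1), 16)] = int(match.group(2), 16)
    return mapping


def emit_header(name: str, mapping: dict) -> str:
    """Render the mapping as a C array definition (layout is made up)."""
    rows = ",\n".join(
        f"    {{0x{byte:02X}, 0x{cp:04X}}}"
        for byte, cp in sorted(mapping.items())
    )
    return f"static const unsigned short {name}[][2] = {{\n{rows}\n}};\n"


sample = "# two KOI8-R letters\n0xC1 U+0430\n0xC2 U+0431\n"
print(emit_header("koi8r_sample", parse_table(sample)))
```

Generating the headers at build time keeps the tables editable as plain text while still compiling them into the binary.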

    If all goes well, just invoking make from the top-level Lynx dir
    as usual should do everything automatically.  If not, the makefiles
    may need some tweaking... or:

(6) Some things to look at if compilation fails:

    In src/chrtrans/UCkd.h there is a typedef for an unsigned 16bit
    numeric type which may need to be changed for your system.
    See comment near top there.
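
If it is unclear which C type is 16 bits wide on a given system, one quick way to check is from Python's ctypes module, which reports the sizes used by the C compiler that built the interpreter.  That this matches the compiler used to build Lynx is an assumption, and the helper name below is made up for the example.

```python
import ctypes

# Candidate C types for the unsigned 16-bit typedef.
CANDIDATES = {
    "unsigned char": ctypes.c_ubyte,
    "unsigned short": ctypes.c_ushort,
    "unsigned int": ctypes.c_uint,
    "unsigned long": ctypes.c_ulong,
}


def sixteen_bit_unsigned_types() -> list:
    """Return the names of candidate C types that are exactly 2 bytes wide."""
    return [name for name, ctype in CANDIDATES.items()
            if ctypes.sizeof(ctype) == 2]


print(sixteen_bit_unsigned_types())
```

Whichever name is printed is a plausible choice for the typedef; if the list is empty, none of these candidates is suitable on that system.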

    For recompiling Lynx, `make clean' should not be necessary if only
    files in src/chrtrans have been changed.  On the other hand, a
    top-level `make clean' may not propagate to the src/chrtrans
    directory (depending on how things are going with auto-config), so
    you may have to cd to that directory and run `make clean' there to
    really clean up.

(7) To customize (add/change translation tables etc.):

     See README.* files in src/chrtrans.
     Make the necessary changes there, then recompile.
     (A general `make clean' should not be necessary, but make sure
     the ...uni.h file in src/chrtrans gets regenerated.)

     Note that definitions of new character entities (e.g., if you want
     Lynx to recognize &Zcaron;) are not covered by these table files;
     they have to be listed in entities.h.

     _If you are on a Linux system_ and using Lynx on the console (i.e.
     not xterm, not a dialup *into* the Linux box), you can compile
     with -DEXP_CHARTRANS_AUTOSWITCH.  This is very useful for testing
     the various Display Character Sets: Lynx will try to automatically
     change the console state.  You need to have the Linux kbd package
     installed, with a working `setfont' command executable by the user,
     and the right font files - check the source in src/UCAuto.c for
     the files used and/or to change them!
     NOTE that with this enabled,
     - Lynx currently will not clean up the console state at exit;
       it will probably be left as set for the last Display Character
       Set you used.
     - Loading a font is global across _all_ virtual text consoles, so
       using Lynx (compiled with this flag) may change the appearance of
       text on other consoles (if that text contains characters
       beyond US-ASCII).

(8) Some suggested Web pages for testing:

    <URL:  http://www.tezcat.com/~kweide/lynx-chartrans/test/>

    <URL:  http://www.isoc.org:8080/>,
      especially
    <URL:  http://www.isoc.org:8080/liste_ml.htm>.

    <URL:  http://www.accentsoft.com/un/un-all.htm>

(9) Please report bugs, unexpected behavior, etc.
    to <lynx-dev@nongnu.org>.

    Suggestions for improvement would be welcome, as well as
    contributed translation tables (for stuff that is not available
    at ftp://dkuug.dk or ftp://ftp.unicode.org).

KW  1997-11-06