src/chrtrans/README.tables



The translation table files in this directory were collected from
several sources (among them ftp://ftp.unicode.org, Linux kbd package,
ftp://dkuug.dk/) and are believed to be correct in their mappings,
but have not been checked in detail.  The Unicode/UCS2 values
for some of the RFC 1345 Mnemonic codes are out of date;
a cleanup and update would be needed for serious use.
[See also http://czyborra.com/charsets/iso8859.html for a survey of codepages.]

These changes were made to all of the files used from ftp.unicode.org:

	a) add the MIME name of the charset
	b) add a name for the display charset (used on Options screen)
	c) add the codepage number
	d) remove lines for control characters 0x00 to 0x1f, 0x7f to 0x9f
	e) comment out ASCII lines 0x20 to 0x7f
	f) use idem to represent the commented-out lines
	g) change C-style 0xNNNN constants to Unicode-style U+NNNN

Other changes include:

	h) add code-points to several lines to provide Unicode equivalents
	i) add extra mappings at the end of the files
	j) comment out other one-to-one mappings in the 0xa0-0xff range
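
As an illustration only, changes d), e) and g) could be sketched in C
roughly as follows.  This is not the procedure that was actually used
on the files; the names convert_line and line_action are made up for
this example, the input is assumed to be "0xNN <whitespace> 0xNNNN"
mapping lines, and the lower-case hex output is an arbitrary choice.
The function only classifies lines; how comments and idem are written
is described in README.format.

```c
#include <assert.h>
#include <stdio.h>

/* What to do with one input line (names are illustrative only). */
enum line_action { LINE_DROP, LINE_COMMENT, LINE_KEEP };

/*
 * Classify one "0xNN<whitespace>0xNNNN" mapping line and, for
 * LINE_KEEP, write the rewritten mapping "0xNN<TAB>U+NNNN" into out.
 * For the other two results out is set to the empty string.
 */
static enum line_action convert_line(const char *in, char *out, size_t outsz)
{
    unsigned int code, ucs;

    assert(in != NULL && out != NULL && outsz > 0);
    out[0] = '\0';

    /* Not a mapping line (comment, blank, unmapped): drop it here. */
    if (sscanf(in, " 0x%x 0x%x", &code, &ucs) != 2)
        return LINE_DROP;

    /* d) control characters 0x00-0x1f and 0x7f-0x9f are removed. */
    if (code <= 0x1f || (code >= 0x7f && code <= 0x9f))
        return LINE_DROP;

    /* e) remaining ASCII lines are commented out (f: idem stands in). */
    if (code <= 0x7e)
        return LINE_COMMENT;

    /* g) C-style 0xNNNN becomes Unicode-style U+NNNN. */
    snprintf(out, outsz, "0x%02x\tU+%04x", code, ucs);
    return LINE_KEEP;
}
```

A caller would read a source file line by line, skip LINE_DROP lines,
write LINE_COMMENT lines back as comments, and write the converted
text for LINE_KEEP lines.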

More translation files can easily be provided (and new character entities
added to entities.h); this set is just to test whether the system works
in principle (and also how it behaves with incomplete data...)

See the file README.format for a brief explanation of what's in the
table files.

The examples have names *_uni or *_suni with a .tbl suffix, but the
naming is not significant.  The auxiliary program makeuctb (MAKE UniCode
TaBle) is used to "compile" them into C header files, which can be
included by UCdomap.c.

Ideally, this should be taken care of by the Makefiles.  On VMS, use
build-chrtrans.com to compile and link makeuctb.exe and create the
set of .h files from the current set of .tbl files.  Thereafter, use
build-header.com to update particular .h files.

To make a new chartrans table available to Lynx (and thereby make a new
charset known to Lynx) you currently have to manually edit UCdomap.c in
two places, a) and b) below, and update the makefiles as described in c):

a) Near the top, you will find a bunch of lines (some may be commented out)

  #include "<fn>.h"

Add or comment out as you wish.  But it is probably safest to leave the
commonly used ones, referring to "def7_uni.h" and "iso01_uni.h", in place.

b) At the bottom, you will find a bunch of lines (again, some may be
   commented out by default) of the form

    UC_CHARSET_SETUP_<something>;

which should correspond to the #include lines from a).  Again,
add or subtract as you wish (but preferably consistent with what you
did under a)).  [The <something> is derived from the charset's MIME name.
If in doubt, check the last lines of the corresponding ...uni.h file.]
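
As a rough aid for guessing the <something> part, here is a minimal C
sketch.  It ASSUMES that the suffix is the MIME name with every
non-alphanumeric character replaced by an underscore; that rule is
not stated in this file, and the function name setup_macro_name is
made up for the example.  The last lines of the generated header
remain the authoritative source.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

/*
 * Build "UC_CHARSET_SETUP_<something>" from a MIME charset name,
 * ASSUMING <something> is the MIME name with each non-alphanumeric
 * character replaced by '_'.  Returns 0 on success, -1 if the
 * output buffer is too small.
 */
static int setup_macro_name(const char *mime, char *out, size_t outsz)
{
    size_t pos, i;
    int n;

    assert(mime != NULL && out != NULL);

    /* Start with the fixed prefix taken from the text above. */
    n = snprintf(out, outsz, "%s", "UC_CHARSET_SETUP_");
    if (n < 0 || (size_t) n >= outsz)
        return -1;
    pos = (size_t) n;

    /* Append the MIME name, mapping e.g. '-' to '_'. */
    for (i = 0; mime[i] != '\0'; i++) {
        if (pos + 1 >= outsz)
            return -1;          /* no room for char plus terminator */
        out[pos++] = isalnum((unsigned char) mime[i]) ? mime[i] : '_';
    }
    out[pos] = '\0';
    return 0;
}
```

Under that assumption, the MIME name "iso-8859-2" would give
UC_CHARSET_SETUP_iso_8859_2.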

c) To let make automatically notice when you have changed one of the
   table files, and automatically regenerate the *uni.h file(s),
   you also have to add any new tables to both src/Makefile *and*
   src/chrtrans/Makefile.  For auto-config builds, edit the equivalent
   files instead: makefile.in before running ./configure, or makefile
   after running ./configure.  (That may be inconvenient, but I didn't
   want to depend on features that not all makes may have.)  Note that
   for recompiling Lynx, a `make clean' should not be necessary if you
   have *only* made changes to the files in src/chrtrans.  On VMS, add
   entries for new tables to build-chrtrans.com; you can then update
   the particular file with build-header.com, and use the top
   directory's build.com, answering 'n' to its prompts about whether
   to update the WWW library and chrtrans modules.