#
# $Id: api.txt,v 1.3 2001/01/02 18:46:20 mleisher Exp $
#
The MUTT UCData API
-------------------
####
NOTE: This library has been customized for use with OpenLDAP. The character
data tables are hardcoded into the library and the load/unload/reload
functions are no-ops. Also, the MUTT API claimed to be compatible with
John Cowan's library but its ucnumber behavior was broken. This has been
fixed in the OpenLDAP release.
By default, the implementation specific properties in MUTTUCData.txt are
not incorporated into the OpenLDAP build. You can supply them to ucgendat
and recreate uctable.h if you need them.
-- hyc@openldap.org
####
-----------------------------------------------------------------------------
Macros that combine to select data tables for ucdata_load(), ucdata_unload(),
and ucdata_reload().
#define UCDATA_CASE 0x01
#define UCDATA_CTYPE 0x02
#define UCDATA_DECOMP 0x04
#define UCDATA_CMBCL 0x08
#define UCDATA_NUM 0x10
#define UCDATA_COMP 0x20
#define UCDATA_ALL (UCDATA_CASE|UCDATA_CTYPE|UCDATA_DECOMP|\
                    UCDATA_CMBCL|UCDATA_NUM|UCDATA_COMP)
-----------------------------------------------------------------------------
void ucdata_load(char *paths, int masks)
This function initializes the UCData library by locating the data files in
one of the colon-separated directories in the `paths' parameter. The data
files to be loaded are specified in the `masks' parameter as a bitwise
combination of the macros listed above.
This should be called before using any of the other functions.
NOTE: the ucdata_setup(char *paths) function is now a macro that expands
into this function at compile time.
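
As a minimal sketch of how a `masks' value is put together, the fragment
below repeats the macro values from the list above so that it compiles
without ucdata.h. The helpers sketch_normalization_masks() and
sketch_mask_has() are hypothetical names used only for illustration and are
not part of the library. The choice of tables for normalization is likewise
an assumption.

```c
#include <assert.h>

/* Values copied from the macro list above so the sketch stands alone. */
#define UCDATA_CASE   0x01
#define UCDATA_CTYPE  0x02
#define UCDATA_DECOMP 0x04
#define UCDATA_CMBCL  0x08
#define UCDATA_NUM    0x10
#define UCDATA_COMP   0x20
#define UCDATA_ALL (UCDATA_CASE|UCDATA_CTYPE|UCDATA_DECOMP|\
                    UCDATA_CMBCL|UCDATA_NUM|UCDATA_COMP)

/* Hypothetical helper: tables a normalization pass would plausibly need
 * (decompositions, combining classes, compositions). */
int sketch_normalization_masks(void)
{
    return UCDATA_DECOMP | UCDATA_CMBCL | UCDATA_COMP;
}

/* Hypothetical helper: non-zero if every bit of `table' is set in `masks'. */
int sketch_mask_has(int masks, int table)
{
    assert(table != 0);
    return (masks & table) == table;
}
```

With the real header, the combined value would be passed as the second
argument, as in ucdata_load(paths, masks).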
-----------------------------------------------------------------------------
void ucdata_unload(int masks)
This function unloads the data tables specified in the `masks' parameter.
This function should be called when the application is done using the UCData
package.
NOTE: the ucdata_cleanup() function is now a macro that expands into this
function at compile time.
-----------------------------------------------------------------------------
void ucdata_reload(char *paths, int masks)
This function reloads the data files from one of the colon-separated
directories in the `paths' parameter. The data files to be reloaded are
specified in the `masks' parameter as a bitwise combination of the macros
listed above.
If the data files have already been loaded, they are unloaded before the
data files are loaded again.
-----------------------------------------------------------------------------
int ucdecomp(unsigned long code, unsigned long *num, unsigned long **decomp)
This function determines if a character has a decomposition and returns the
decomposition information if it exists.
If a zero is returned, there is no decomposition. If a non-zero is
returned, then the `num' and `decomp' variables are filled in with the
appropriate values.
Example call:
unsigned long i, num, *decomp;
if (ucdecomp(0x1d5, &num, &decomp) != 0) {
    for (i = 0; i < num; i++)
        printf("0x%08lX,", decomp[i]);
    putchar('\n');
}
int uccanondecomp(const unsigned long *in, int inlen, unsigned long **out,
int *outlen)
This function decomposes an input string and does canonical reordering of
the characters at the same time.
If a -1 is returned, memory allocation was not successful. If a zero is
returned, no decomposition occurred. Any other value means the output string
contains the fully decomposed string in canonical order.
If the "outlen" parameter comes back with a value > 0, then the string
returned in the "out" parameter needs to be deallocated by the caller.
-----------------------------------------------------------------------------
int ucdecomp_hangul(unsigned long code, unsigned long *num,
unsigned long decomp[])
This function determines if a Hangul syllable has a decomposition and
returns the decomposition information.
An array of at least size 3 should be passed to the function for the
decomposition of the syllable.
If a zero is returned, the character is not a Hangul syllable. If a
non-zero is returned, the `num' field will be 2 or 3 and the syllable will
be decomposed into the `decomp' array arithmetically.
Example call:
unsigned long i, num, decomp[3];
/* Pass the array itself; it decays to the unsigned long * expected. */
if (ucdecomp_hangul(0xb1ba, &num, decomp) != 0) {
    for (i = 0; i < num; i++)
        printf("0x%08lX,", decomp[i]);
    putchar('\n');
}
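
The "arithmetic" decomposition mentioned above can be sketched with the
Hangul syllable algorithm from the Unicode Standard. This is a stand-alone
illustration and not the library's source: sketch_decomp_hangul() and the
SKETCH_* constants are hypothetical names, and only the calling convention
is modeled on ucdecomp_hangul().

```c
#include <assert.h>
#include <stddef.h>

/* Constants from the Unicode Standard's Hangul syllable algorithm. */
#define SKETCH_SBASE  0xAC00UL   /* first precomposed syllable */
#define SKETCH_LBASE  0x1100UL   /* first leading consonant    */
#define SKETCH_VBASE  0x1161UL   /* first vowel                */
#define SKETCH_TBASE  0x11A7UL   /* one before first trailing  */
#define SKETCH_LCOUNT 19UL
#define SKETCH_VCOUNT 21UL
#define SKETCH_TCOUNT 28UL
#define SKETCH_NCOUNT (SKETCH_VCOUNT * SKETCH_TCOUNT)
#define SKETCH_SCOUNT (SKETCH_LCOUNT * SKETCH_NCOUNT)

/* Returns 0 if `code' is not a precomposed Hangul syllable. Otherwise
 * stores 2 or 3 Jamo in `decomp', sets `*num', and returns 1. */
int sketch_decomp_hangul(unsigned long code, unsigned long *num,
                         unsigned long decomp[])
{
    unsigned long sindex, t;

    assert(num != NULL && decomp != NULL);

    if (code < SKETCH_SBASE || code >= SKETCH_SBASE + SKETCH_SCOUNT)
        return 0;

    sindex = code - SKETCH_SBASE;
    decomp[0] = SKETCH_LBASE + sindex / SKETCH_NCOUNT;
    decomp[1] = SKETCH_VBASE + (sindex % SKETCH_NCOUNT) / SKETCH_TCOUNT;
    t = sindex % SKETCH_TCOUNT;
    if (t != 0) {
        /* A trailing consonant is present. */
        decomp[2] = SKETCH_TBASE + t;
        *num = 3;
    } else {
        *num = 2;
    }
    return 1;
}
```

For the code 0xb1ba used in the example call, this sketch produces the
three Jamo 0x1102, 0x116B and 0x11B1.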
-----------------------------------------------------------------------------
int uccomp(unsigned long ch1, unsigned long ch2, unsigned long *comp)
This function takes a pair of characters and determines if they combine to
form another character.
If a zero is returned, no composition is formed by the character pair. Any
other value indicates the "comp" parameter has a value.
int uccomp_hangul(unsigned long *str, int len)
This function composes the Hangul Jamo in the string. The composition is
done in-place.
The return value provides the new length of the string. This will be
smaller than "len" if compositions occurred.
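
The in-place composition described above can be sketched with the
composition half of the Unicode Standard's Hangul algorithm. The function
sketch_comp_hangul() and the SKETCH_* constants are hypothetical; the
library's own implementation may differ in detail.

```c
#include <assert.h>
#include <stddef.h>

/* Constants from the Unicode Standard's Hangul syllable algorithm. */
#define SKETCH_SBASE  0xAC00UL
#define SKETCH_LBASE  0x1100UL
#define SKETCH_VBASE  0x1161UL
#define SKETCH_TBASE  0x11A7UL
#define SKETCH_LCOUNT 19UL
#define SKETCH_VCOUNT 21UL
#define SKETCH_TCOUNT 28UL
#define SKETCH_NCOUNT (SKETCH_VCOUNT * SKETCH_TCOUNT)
#define SKETCH_SCOUNT (SKETCH_LCOUNT * SKETCH_NCOUNT)

/* Composes L+V and LV+T sequences in place and returns the new length. */
int sketch_comp_hangul(unsigned long *str, int len)
{
    unsigned long last, ch;
    int i, out;

    assert(str != NULL || len <= 0);
    if (len <= 0)
        return 0;

    last = str[0];
    out = 1;
    for (i = 1; i < len; i++) {
        ch = str[i];

        /* Leading consonant followed by a vowel: form an LV syllable. */
        if (last >= SKETCH_LBASE && last < SKETCH_LBASE + SKETCH_LCOUNT &&
            ch >= SKETCH_VBASE && ch < SKETCH_VBASE + SKETCH_VCOUNT) {
            last = SKETCH_SBASE + ((last - SKETCH_LBASE) * SKETCH_VCOUNT +
                                   (ch - SKETCH_VBASE)) * SKETCH_TCOUNT;
            str[out - 1] = last;
            continue;
        }

        /* LV syllable followed by a trailing consonant: form LVT. */
        if (last >= SKETCH_SBASE && last < SKETCH_SBASE + SKETCH_SCOUNT &&
            (last - SKETCH_SBASE) % SKETCH_TCOUNT == 0 &&
            ch > SKETCH_TBASE && ch < SKETCH_TBASE + SKETCH_TCOUNT) {
            last += ch - SKETCH_TBASE;
            str[out - 1] = last;
            continue;
        }

        /* No composition: copy the character through. */
        last = ch;
        str[out++] = ch;
    }
    return out;
}
```

Applied to the three Jamo 0x1102, 0x116B, 0x11B1, the sketch returns a
length of 1 and leaves 0xB1BA in the first element.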
int uccanoncomp(unsigned long *str, int len)
This function does a canonical composition of characters in the string.
The return value is the new length of the string.
-----------------------------------------------------------------------------
struct ucnumber {
int numerator;
int denominator;
};
int ucnumber_lookup(unsigned long code, struct ucnumber *num)
This function determines if the code is a number and fills in the `num'
field with the numerator and denominator. If the code happens to be a
single digit, the denominator field will be 1.
####
The original code would set numerator = denominator for regular digits.
However, the Readme also claimed to be compatible with John Cowan's uctype
library, but this behavior is both nonsensical and incompatible with the
Cowan library. As such, it has been fixed here as described above.
-- hyc@openldap.org
####
If the function returns 0, the code is not a number. Any other return
value means the code is a number.
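
As a small illustration of how the numerator/denominator pair might be
consumed, the sketch below converts a filled-in struct ucnumber to a double.
The struct is repeated so the fragment compiles on its own, and
sketch_ucnumber_value() is a hypothetical helper, not a library function.
Assuming the tables carry the Unicode numeric values, a fraction such as
U+00BD VULGAR FRACTION ONE HALF would be expected to yield 1 and 2.

```c
#include <assert.h>

/* Same layout as the struct shown above, repeated so the sketch
 * stands alone. */
struct ucnumber {
    int numerator;
    int denominator;
};

/* Hypothetical helper: numeric value of a ucnumber as a double.
 * With the OpenLDAP fix a plain digit has denominator 1, so the
 * result is simply the digit value. */
double sketch_ucnumber_value(struct ucnumber n)
{
    assert(n.denominator != 0);
    return (double) n.numerator / (double) n.denominator;
}
```

In real use the struct would be filled in by ucnumber_lookup(), and the
helper would be called only when that function returns non-zero.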
int ucdigit_lookup(unsigned long code, int *digit)
This function determines if the code is a digit and fills in the `digit'
field with the digit value.
If the function returns 0, the code is not a digit. Any other return
value means the code is a digit.
struct ucnumber ucgetnumber(unsigned long code)
This is a compatibility function with John Cowan's "uctype" package. It
uses ucnumber_lookup().
int ucgetdigit(unsigned long code)
This is a compatibility function with John Cowan's "uctype" package. It
uses ucdigit_lookup().
-----------------------------------------------------------------------------
unsigned long uctoupper(unsigned long code)
This function returns the code unchanged if it is already upper case or has
no upper case equivalent. Otherwise the upper case equivalent is returned.
-----------------------------------------------------------------------------
unsigned long uctolower(unsigned long code)
This function returns the code unchanged if it is already lower case or has
no lower case equivalent. Otherwise the lower case equivalent is returned.
-----------------------------------------------------------------------------
unsigned long uctotitle(unsigned long code)
This function returns the code unchanged if it is already title case or has
no title case equivalent. Otherwise the title case equivalent is returned.
-----------------------------------------------------------------------------
int ucisalpha(unsigned long code)
int ucisalnum(unsigned long code)
int ucisdigit(unsigned long code)
int uciscntrl(unsigned long code)
int ucisspace(unsigned long code)
int ucisblank(unsigned long code)
int ucispunct(unsigned long code)
int ucisgraph(unsigned long code)
int ucisprint(unsigned long code)
int ucisxdigit(unsigned long code)
int ucisupper(unsigned long code)
int ucislower(unsigned long code)
int ucistitle(unsigned long code)
These functions (actually macros) determine if a character has these
properties. These behave in a fashion very similar to the venerable ctype
package.
-----------------------------------------------------------------------------
int ucisisocntrl(unsigned long code)
Is the character a C0 control character (< 32)?
int ucisfmtcntrl(unsigned long code)
Is the character a format control character?
int ucissymbol(unsigned long code)
Is the character a symbol?
int ucisnumber(unsigned long code)
Is the character a number or digit?
int ucisnonspacing(unsigned long code)
Is the character non-spacing?
int ucisopenpunct(unsigned long code)
Is the character open/left punctuation (e.g. '[')?
int ucisclosepunct(unsigned long code)
Is the character close/right punctuation (e.g. ']')?
int ucisinitialpunct(unsigned long code)
Is the character initial punctuation (e.g. U+2018 LEFT SINGLE QUOTATION
MARK)?
int ucisfinalpunct(unsigned long code)
Is the character final punctuation (e.g. U+2019 RIGHT SINGLE QUOTATION
MARK)?
int uciscomposite(unsigned long code)
Can the character be decomposed into a set of other characters?
int ucisquote(unsigned long code)
Is the character one of the many quotation marks?
int ucissymmetric(unsigned long code)
Is the character one that has an opposite form (e.g. '<' and '>')?
int ucismirroring(unsigned long code)
Is the character mirroring (superset of symmetric)?
int ucisnonbreaking(unsigned long code)
Is the character non-breaking (e.g. non-breaking space)?
int ucisrtl(unsigned long code)
Does the character have strong right-to-left directionality (e.g. Arabic
letters)?
int ucisltr(unsigned long code)
Does the character have strong left-to-right directionality (e.g. Latin
letters)?
int ucisstrong(unsigned long code)
Does the character have strong directionality?
int ucisweak(unsigned long code)
Does the character have weak directionality (e.g. numbers)?
int ucisneutral(unsigned long code)
Does the character have neutral directionality (e.g. whitespace)?
int ucisseparator(unsigned long code)
Is the character a block or segment separator?
int ucislsep(unsigned long code)
Is the character a line separator?
int ucispsep(unsigned long code)
Is the character a paragraph separator?
int ucismark(unsigned long code)
Is the character a mark of some kind?
int ucisnsmark(unsigned long code)
Is the character a non-spacing mark?
int ucisspmark(unsigned long code)
Is the character a spacing mark?
int ucismodif(unsigned long code)
Is the character a modifier letter?
int ucismodifsymbol(unsigned long code)
Is the character a modifier symbol?
int ucisletnum(unsigned long code)
Is the character a number represented by a letter?
int ucisconnect(unsigned long code)
Is the character connecting punctuation?
int ucisdash(unsigned long code)
Is the character dash punctuation?
int ucismath(unsigned long code)
Is the character a math character?
int uciscurrency(unsigned long code)
Is the character a currency character?
int ucisenclosing(unsigned long code)
Is the character enclosing (e.g. enclosing box)?
int ucisprivate(unsigned long code)
Is the character from the Private Use Area?
int ucissurrogate(unsigned long code)
Is the character one of the surrogate codes?
int ucisdefined(unsigned long code)
Is the character defined (appeared in one of the data files)?
int ucisundefined(unsigned long code)
Is the character not defined (non-Unicode)?
int ucishan(unsigned long code)
Is the character a Han ideograph?
int ucishangul(unsigned long code)
Is the character a pre-composed Hangul syllable?