summaryrefslogtreecommitdiffstats
path: root/fluent-bit/lib/onigmo/README.md
blob: ca29c0b7ead4a530cf57433290891cb571581da8 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
[![Build Status](https://travis-ci.org/k-takata/Onigmo.svg?branch=master)](https://travis-ci.org/k-takata/Onigmo)
[![Build status](https://ci.appveyor.com/api/projects/status/kndb924qaw1hq72i/branch/master?svg=true)](https://ci.appveyor.com/project/k-takata/onigmo/branch/master)
[![Coverage Status](https://coveralls.io/repos/k-takata/Onigmo/badge.svg?branch=master&service=github)](https://coveralls.io/github/k-takata/Onigmo?branch=master)
[![Coverity Scan Build Status](https://scan.coverity.com/projects/2778/badge.svg)](https://scan.coverity.com/projects/k-takata-onigmo)
[![Code Quality: Cpp](https://img.shields.io/lgtm/grade/cpp/g/k-takata/Onigmo.svg?logo=lgtm&logoWidth=18)](https://lgtm.com/projects/g/k-takata/Onigmo/context:cpp)
[![Total Alerts](https://img.shields.io/lgtm/alerts/g/k-takata/Onigmo.svg?logo=lgtm&logoWidth=18)](https://lgtm.com/projects/g/k-takata/Onigmo/alerts)

Onigmo (Oniguruma-mod)
======================

https://github.com/k-takata/Onigmo

Onigmo is a regular expressions library forked from [Oniguruma](https://github.com/kkos/oniguruma).
It focuses to support new expressions like `\K`, `\R`, `(?(cond)yes|no)`
and etc. which are supported in Perl 5.10+.

Since Onigmo is used as the default regexp library of Ruby 2.0 or later,
many patches are backported from Ruby 2.x.

See also the Wiki page:
https://github.com/k-takata/Onigmo/wiki


License
-------

  BSD license.


Install
-------

### Case 1: Unix and Cygwin platform

   1. `./autogen.sh`  (If `configure` doesn't exist.)
   2. `./configure`
   3. `make`
   4. `make install`

   * test

        make test

   * uninstall

        make uninstall

   * configuration check

        onigmo-config --cflags
        onigmo-config --libs
        onigmo-config --prefix
        onigmo-config --exec-prefix


### Case 2: Windows 64/32bit platform (Visual C++)

   Execute `build_nmake.cmd`.
   `build_x64` or `build_x86` will be used as a working/output directory.

      onigmo_s.lib:  static link library
      onigmo.lib:    import library for dynamic link
      onigmo.dll:    dynamic link library

   * test (ASCII/Shift_JIS/EUC-JP/Unicode)

      Execute `build_nmake.cmd test`.
      Python (with the same bitness of Onigmo) is needed to run the tests.


### Case 3: Windows 64/32bit platform (MinGW)

   Execute `mingw32-make -f win32/Makefile.mingw`.
   `build_x86-64`, `build_i686` and etc. will be used as a working/output
   directory.

      libonigmo.a:     static link library
      libonigmo.dll.a: import library for dynamic link
      onigmo.dll:      dynamic link library

   * test (ASCII/Shift_JIS/EUC-JP/Unicode)

      Execute `mingw32-make -f win32/Makefile.mingw test`.
      Python (with the same bitness of Onigmo) is needed to run the tests.

   * If you use MinGW on MSYS2, you can also use `./configure` and `make`
     like Unix. In this case, DLL name will have API version number. E.g.:

        libonigmo-6.dll


Regular Expressions
-------------------

  See [doc/RE](doc/RE) or [doc/RE.ja](doc/RE.ja) for Japanese.


Usage
-----

  Include onigmo.h in your program. (Onigmo API)
  See [doc/API](doc/API) for Onigmo API.

  If you want to disable `UChar` type (== `unsigned char`) definition
  in onigmo.h, define `ONIG_ESCAPE_UCHAR_COLLISION` and then
  include onigmo.h.

  If you want to disable `regex_t` type definition in onigmo.h,
  define `ONIG_ESCAPE_REGEX_T_COLLISION` and then include onigmo.h.

  Example of the compiling/linking command line in Unix or Cygwin,
  (prefix == /usr/local case)

    cc sample.c -L/usr/local/lib -lonigmo


  If you want to use static link library (onigmo_s.lib) in Win32,
  add option `-DONIG_EXTERN=extern` to C compiler.



Sample Programs
---------------

|File                  |Description                               |
|:---------------------|:-----------------------------------------|
|sample/simple.c       |example of the minimum (Onigmo API)       |
|sample/names.c        |example of the named group callback.      |
|sample/encode.c       |example of some encodings.                |
|sample/listcap.c      |example of the capture history.           |
|sample/posix.c        |POSIX API sample.                         |
|sample/sql.c          |example of the variable meta characters.  |


Test Programs

|File               |Description                            |
|:------------------|:--------------------------------------|
|sample/syntax.c    |Perl, Java and ASIS syntax test.       |
|sample/crnl.c      |CRNL test                              |



Source Files
------------

|File                |Description                                            |
|:-------------------|:------------------------------------------------------|
|onigmo.h            |Onigmo API header file (public)                        |
|onigmo-config.in    |configuration check program template                   |
|onigmo.py           |Onigmo module for Python                               |
|regenc.h            |character encodings framework header file              |
|regint.h            |internal definitions                                   |
|regparse.h          |internal definitions for regparse.c and regcomp.c      |
|regcomp.c           |compiling and optimization functions                   |
|regenc.c            |character encodings framework                          |
|regerror.c          |error message function                                 |
|regext.c            |extended API functions (deluxe version API)            |
|regexec.c           |search and match functions                             |
|regparse.c          |parsing functions.                                     |
|regsyntax.c         |pattern syntax functions and built-in syntax definition|
|regtrav.c           |capture history tree data traverse functions           |
|regversion.c        |version info function                                  |
|st.h                |hash table functions header file                       |
|st.c                |hash table functions                                   |
|onigmognu.h         |GNU regex API header file (public)                     |
|reggnu.c            |GNU regex API functions                                |
|onigmoposix.h       |POSIX API header file (public)                         |
|regposerr.c         |POSIX error message function                           |
|regposix.c          |POSIX API functions                                    |
|enc/mktable.c       |character type table generator                         |
|enc/ascii.c         |ASCII-8BIT encoding                                    |
|enc/jis/            |JIS properties data                                    |
|enc/euc_jp.c        |EUC-JP encoding                                        |
|enc/euc_tw.c        |EUC-TW encoding                                        |
|enc/euc_kr.c        |EUC-KR, EUC-CN encoding                                |
|enc/shift_jis.c     |Shift_JIS encoding                                     |
|enc/shift_jis.h     |Common part of Shift_JIS and Windows-31J encoding      |
|enc/windows_31j.c   |Windows-31J (CP932) encoding                           |
|enc/big5.c          |Big5      encoding                                     |
|enc/gb18030.c       |GB18030   encoding                                     |
|enc/gbk.c           |GBK       encoding                                     |
|enc/koi8_r.c        |KOI8-R    encoding                                     |
|enc/koi8_u.c        |KOI8-U    encoding                                     |
|enc/iso_8859.h      |common definition of ISO-8859 encoding                 |
|enc/iso_8859_1.c    |ISO-8859-1 (Latin-1)                                   |
|enc/iso_8859_2.c    |ISO-8859-2 (Latin-2)                                   |
|enc/iso_8859_3.c    |ISO-8859-3 (Latin-3)                                   |
|enc/iso_8859_4.c    |ISO-8859-4 (Latin-4)                                   |
|enc/iso_8859_5.c    |ISO-8859-5 (Cyrillic)                                  |
|enc/iso_8859_6.c    |ISO-8859-6 (Arabic)                                    |
|enc/iso_8859_7.c    |ISO-8859-7 (Greek)                                     |
|enc/iso_8859_8.c    |ISO-8859-8 (Hebrew)                                    |
|enc/iso_8859_9.c    |ISO-8859-9 (Latin-5 or Turkish)                        |
|enc/iso_8859_10.c   |ISO-8859-10 (Latin-6 or Nordic)                        |
|enc/iso_8859_11.c   |ISO-8859-11 (Thai)                                     |
|enc/iso_8859_13.c   |ISO-8859-13 (Latin-7 or Baltic Rim)                    |
|enc/iso_8859_14.c   |ISO-8859-14 (Latin-8 or Celtic)                        |
|enc/iso_8859_15.c   |ISO-8859-15 (Latin-9 or West European with Euro)       |
|enc/iso_8859_16.c   |ISO-8859-16 (Latin-10)                                 |
|enc/utf_8.c         |UTF-8    encoding                                      |
|enc/utf_16be.c      |UTF-16BE encoding                                      |
|enc/utf_16le.c      |UTF-16LE encoding                                      |
|enc/utf_32be.c      |UTF-32BE encoding                                      |
|enc/utf_32le.c      |UTF-32LE encoding                                      |
|enc/unicode.c       |common codes of Unicode encoding                       |
|enc/unicode/        |Unicode case folding data and properties data          |
|enc/windows_1250.c  |Windows-1250 (CP1250) encoding (Central/Eastern Europe)|
|enc/windows_1251.c  |Windows-1251 (CP1251) encoding (Cyrillic)              |
|enc/windows_1252.c  |Windows-1252 (CP1252) encoding (Latin)                 |
|enc/windows_1253.c  |Windows-1253 (CP1253) encoding (Greek)                 |
|enc/windows_1254.c  |Windows-1254 (CP1254) encoding (Turkish)               |
|enc/windows_1257.c  |Windows-1257 (CP1257) encoding (Baltic Rim)            |
|enc/cp949.c         |CP949 encoding          (only used in Ruby)            |
|enc/emacs_mule.c    |Emacs internal encoding (only used in Ruby)            |
|enc/gb2312.c        |GB2312 encoding         (only used in Ruby)            |
|enc/us_ascii.c      |US-ASCII encoding       (only used in Ruby)            |
|win32/Makefile      |Makefile for Win32 (VC++)                              |
|win32/Makefile.mingw|Makefile for Win32 (MinGW)                             |
|win32/config.h      |config.h for Win32                                     |
|win32/onigmo.rc     |resource file for Win32                                |