1 files changed, 569 insertions, 0 deletions
diff --git a/src/fluent-bit/lib/onigmo/doc/RE b/src/fluent-bit/lib/onigmo/doc/RE
new file mode 100644
index 000000000..4d2cdaba3
--- /dev/null
+++ b/src/fluent-bit/lib/onigmo/doc/RE
@@ -0,0 +1,569 @@
+Onigmo (Oniguruma-mod) Regular Expressions Version 6.1.0    2016/12/25
+
+syntax: ONIG_SYNTAX_RUBY (default)
+
+
+1. Syntax elements
+
+  \       escape (enable or disable meta character)
+  |       alternation
+  (...)   group
+  [...]   character class
+
+
+2. Characters
+
+  \t           horizontal tab         (0x09)
+  \v           vertical tab           (0x0B)
+  \n           newline (line feed)    (0x0A)
+  \r           carriage return        (0x0D)
+  \b           backspace              (0x08)
+  \f           form feed              (0x0C)
+  \a           bell                   (0x07)
+  \e           escape                 (0x1B)
+  \nnn         octal char             (encoded byte value)
+  \xHH         hexadecimal char       (encoded byte value)
+  \x{7HHHHHHH} wide hexadecimal char  (character code point value)
+  \uHHHH       wide hexadecimal char  (character code point value)
+  \cx          control char           (character code point value)
+  \C-x         control char           (character code point value)
+  \M-x         meta  (x|0x80)         (character code point value)
+  \M-\C-x      meta control char      (character code point value)
+
+ (* \b as backspace is effective in character class only)
+
+  * ONIG_SYNTAX_PERL: \o{nnn} (octal char) can be also used.
+
+
+3. Character types
+
+  .        any character (except newline)
+
+  \w       word character
+
+           Not Unicode:
+             alphanumeric and "_".
+
+           Unicode:
+             General_Category -- (Letter|Mark|Number|Connector_Punctuation)
+
+           It depends on ONIG_OPTION_ASCII_RANGE option that non-ASCII char
+           includes or not.
+
+  \W       non-word char
+
+  \s       whitespace char
+
+           Not Unicode:
+             \t, \n, \v, \f, \r, \x20
+
+           Unicode:
+             0009, 000A, 000B, 000C, 000D, 0085(NEL),
+             General_Category -- Line_Separator
+                              -- Paragraph_Separator
+                              -- Space_Separator
+
+           It depends on ONIG_OPTION_ASCII_RANGE option that non-ASCII char
+           includes or not.
+
+  \S       non-whitespace char
+
+  \d       decimal digit char
+
+           Unicode: General_Category -- Decimal_Number
+
+           It depends on ONIG_OPTION_ASCII_RANGE option that non-ASCII char
+           includes or not.
+
+  \D       non-decimal-digit char
+
+  \h       hexadecimal-digit char   [0-9a-fA-F]
+
+  \H       non-hexadecimal-digit char
+
+
+  Character Property
+
+    * \p{property-name}
+    * \p{^property-name}    (negative)
+    * \P{property-name}     (negative)
+
+    property-name:
+
+     + works on all encodings
+       Alnum, Alpha, Blank, Cntrl, Digit, Graph, Lower,
+       Print, Punct, Space, Upper, XDigit, Word, ASCII
+
+     + works on EUC_JP, Shift_JIS, CP932
+       Hiragana, Katakana, Han, Latin, Greek, Cyrillic
+
+     + works on UTF-8, UTF-16, UTF-32
+       see UnicodeProps.txt
+
+    \p{Punct} works slightly different on Unicode encodings and the other
+    encodings. It matches the nine characters "$+<=>^`|~" on non-Unicode
+    encodings (which is the same as [[:punct:]]), but not on Unicode encodings.
+    \p{XPosixPunct} matches the nine characters on Unicode encodings.
+
+
+  \R       Linebreak
+
+           Unicode:
+             (?>\x0D\x0A|[\x0A-\x0D\x{85}\x{2028}\x{2029}])
+
+           Not Unicode:
+             (?>\x0D\x0A|[\x0A-\x0D])
+
+  \X       Extended Grapheme cluster
+
+           Unicode:
+             See: Unicode Standard Annex #29 UNICODE TEXT SEGMENTATION
+                  http://unicode.org/reports/tr29/
+
+           Not Unicode:
+             (?>\x0D\x0A|(?m:.))
+
+
+
+4. Quantifier
+
+  greedy
+
+    ?       1 or 0 times
+    *       0 or more times
+    +       1 or more times
+    {n,m}   at least n but no more than m times
+    {n,}    at least n times
+    {,n}    at least 0 but no more than n times ({0,n})
+    {n}     n times
+
+  reluctant
+
+    ??      1 or 0 times
+    *?      0 or more times
+    +?      1 or more times
+    {n,m}?  at least n but not more than m times
+    {n,}?   at least n times
+    {,n}?   at least 0 but not more than n times (== {0,n}?)
+
+  possessive (greedy and does not backtrack once match)
+
+    ?+      1 or 0 times
+    *+      0 or more times
+    ++      1 or more times
+
+    ({n,m}+, {n,}+, {n}+ are possessive op. in ONIG_SYNTAX_JAVA and
+    ONIG_SYNTAX_PERL only)
+
+    ex. /a*+/ === /(?>a*)/
+
+
+5. Anchors
+
+  ^       beginning of the line
+  $       end of the line
+  \b      word boundary
+  \B      non-word boundary
+  \A      beginning of string
+  \Z      end of string, or before newline at the end
+  \z      end of string
+  \G      where the current search attempt begins
+
+
+6. Character class
+
+  ^...    negative class (lowest precedence)
+  x-y     range from x to y
+  [...]   set (character class in character class)
+  ..&&..  intersection (low precedence, only higher than ^)
+
+    ex. [a-w&&[^c-g]z] ==> ([a-w] AND ([^c-g] OR z)) ==> [abh-w]
+
+  * If you want to use '[', '-', or ']' as a normal character
+    in character class, you should escape them with '\'.
+
+
+  POSIX bracket ([:xxxxx:], negate [:^xxxxx:])
+
+    Not Unicode Case:
+
+      alnum    alphabet or digit char
+      alpha    alphabet
+      ascii    code value: [0 - 127]
+      blank    \t, \x20
+      cntrl
+      digit    0-9
+      graph    \x21-\x7E and all of multibyte encoded characters
+      lower
+      print    \x20-\x7E and all of multibyte encoded characters
+      punct
+      space    \t, \n, \v, \f, \r, \x20
+      upper
+      xdigit   0-9, a-f, A-F
+      word     alphanumeric, "_" and multibyte characters
+
+
+    Unicode Case:
+
+      alnum    Letter | Mark | Decimal_Number
+      alpha    Letter | Mark
+      ascii    0000 - 007F
+      blank    Space_Separator | 0009
+      cntrl    Control | Format | Unassigned | Private_Use | Surrogate
+      digit    Decimal_Number
+      graph    [[:^space:]] && ^Control && ^Unassigned && ^Surrogate
+      lower    Lowercase_Letter
+      print    [[:graph:]] | Space_Separator
+      punct    Connector_Punctuation | Dash_Punctuation | Close_Punctuation |
+               Final_Punctuation | Initial_Punctuation | Other_Punctuation |
+               Open_Punctuation | 0024 | 002B | 003C | 003D | 003E | 005E |
+               0060 | 007C | 007E
+      space    Space_Separator | Line_Separator | Paragraph_Separator |
+               0009 | 000A | 000B | 000C | 000D | 0085
+      upper    Uppercase_Letter
+      xdigit   0030 - 0039 | 0041 - 0046 | 0061 - 0066
+               (0-9, a-f, A-F)
+      word     Letter | Mark | Decimal_Number | Connector_Punctuation
+
+
+    It depends on ONIG_OPTION_ASCII_RANGE option and
+    ONIG_OPTION_POSIX_BRACKET_ALL_RANGE option that POSIX brackets
+    match non-ASCII char or not.
+
+
+
+7. Extended groups
+
+  (?#...)            comment
+
+  (?imxdau-imx)      option on/off
+                         i: ignore case
+                         m: multi-line (dot (.) also matches newline)
+                         x: extended form
+
+                       character set option (character range option)
+                         d: Default (compatible with Ruby 1.9.3)
+                            \w, \d and \s doesn't match non-ASCII characters.
+                            \b, \B and POSIX brackets use the each encoding's
+                            rules.
+                         a: ASCII
+                            ONIG_OPTION_ASCII_RANGE option is turned on.
+                            \w, \d, \s and POSIX brackets doesn't match
+                            non-ASCII characters.
+                            \b and \B use the ASCII rules.
+                         u: Unicode
+                            ONIG_OPTION_ASCII_RANGE option is turned off.
+                            \w (\W), \d (\D), \s (\S), \b (\B) and POSIX
+                            brackets use the each encoding's rules.
+
+  (?imxdau-imx:subexp)
+                     option on/off for subexp
+
+  (?:subexp)         non-capturing group
+  (subexp)           capturing group
+
+  (?=subexp)         look-ahead
+  (?!subexp)         negative look-ahead
+  (?<=subexp)        look-behind
+  (?<!subexp)        negative look-behind
+
+                     Subexp of look-behind must be fixed-width.
+                     But top-level alternatives can be of various lengths.
+                     ex. (?<=a|bc) is OK. (?<=aaa(?:b|cd)) is not allowed.
+
+                     In negative look-behind, capturing group isn't allowed,
+                     but non-capturing group (?:) is allowed.
+
+  \K                 keep
+                     Another expression of look-behind. Keep the stuff left
+                     of the \K, don't include it in the result.
+
+  (?>subexp)         atomic group
+                     no backtracks in subexp.
+
+  (?<name>subexp), (?'name'subexp)
+                     define named group
+                     (Each character of the name must be a word character.)
+
+                     Not only a name but a number is assigned like a capturing
+                     group.
+
+                     Assigning the same name to two or more subexps is allowed.
+
+  (?(cond)yes-subexp), (?(cond)yes-subexp|no-subexp)
+                    conditional expression
+                    Matches yes-subexp if (cond) yields a true value, matches
+                    no-subexp otherwise.
+                    Following (cond) can be used:
+
+                    (n)  (n >= 1)
+                        Checks if the numbered capturing group has matched
+                        something.
+
+                    (<name>), ('name')
+                        Checks if a group with the given name has matched
+                        something.
+
+                        BUG: If the name is defined more than once, the
+                        left-most group is checked, but it should be the
+                        same as \k<name>.
+
+  (?~subexp)        absence operator (experimental)
+                    Matches any string which doesn't contain any string which
+                    matches subexp.
+                    More precisely, (?~subexp) matches the complement set of
+                    a set which .*subexp.* matches. This is regular in the
+                    meaning of formal language theory.
+                    Similar to (?:(?!subexp).)*, but easy to write.
+
+                    E.g.:
+                      (?~abc) matches: "", "ab", "aab", "ccdd", etc.
+                      It doesn't match: "abc", "aabc", "ccabcdd", etc.
+
+                      \/\*(?~\*\/)\*\/ matches C style comments:
+                        "/**/", "/* foobar */", etc.
+
+                      \A\/\*(?~\*\/)\*\/\z doesn't match "/**/ */".
+                      This is different from \A\/\*.*?\*\/\z which uses a
+                      reluctant quantifier (.*?).
+
+                      Unlike (?:(?!abc).)*c, (?~abc)c matches "abc", because
+                      (?~abc) matches "ab".
+
+                      (?~) never matches.
+
+                    Theoretical backgrounds are discussed in Tanaka Akira's
+                    paper and slide (both Japanese):
+
+                      * Absent Operator for Regular Expression
+                        https://staff.aist.go.jp/tanaka-akira/pub/prosym49-akr-paper.pdf
+                      * 正規表現における非包含オペレータの提案
+                        https://staff.aist.go.jp/tanaka-akira/pub/prosym49-akr-presen.pdf
+
+
+8. Backreferences
+
+  When we say "backreference a group," it actually means, "re-match the same
+  text matched by the subexp in that group."
+
+  \n  \k<n>     \k'n'     (n >= 1) backreference the nth group in the regexp
+      \k<-n>    \k'-n'    (n >= 1) backreference the nth group counting
+                          backwards from the referring position
+      \k<name>  \k'name'  backreference a group with the specified name
+
+  When backreferencing with a name that is assigned to more than one groups,
+  the last group with the name is checked first, if not matched then the
+  previous one with the name, and so on, until there is a match.
+
+  * Backreference by number is forbidden if any named group is defined and
+    ONIG_OPTION_CAPTURE_GROUP is not set.
+
+  * ONIG_SYNTAX_PERL: \g{n}, \g{-n} and \g{name} can also be used.
+    If a name is defined more than once in Perl syntax, only the left-most
+    group is checked.
+
+
+  backreference with recursion level
+
+    (n >= 1, level >= 0)
+
+    \k<n+level>  \k'n+level'
+    \k<n-level>  \k'n-level'
+    \k<-n+level> \k'-n+level'
+    \k<-n-level> \k'-n-level'
+
+    \k<name+level> \k'name+level'
+    \k<name-level> \k'name-level'
+
+    Refer a group on the recursion level relative to the referring position.
+
+    ex 1.
+
+      /\A(?<a>|.|(?:(?<b>.)\g<a>\k<b>))\z/.match("reee")
+      /\A(?<a>|.|(?:(?<b>.)\g<a>\k<b+0>))\z/.match("reer")
+
+      \k<b+0> refers to the (?<b>.) on the same recursion level with it.
+
+    ex 2.
+
+      r = Regexp.compile(<<'__REGEXP__'.strip, Regexp::EXTENDED)
+      (?<element> \g<stag> \g<content>* \g<etag> ){0}
+      (?<stag> < \g<name> \s* > ){0}
+      (?<name> [a-zA-Z_:]+ ){0}
+      (?<content> [^<&]+ (\g<element> | [^<&]+)* ){0}
+      (?<etag> </ \k<name+1> >){0}
+      \g<element>
+      __REGEXP__
+
+      p r.match("<foo>f<bar>bbb</bar>f</foo>").captures
+
+
+9. Subexp calls ("Tanaka Akira special")
+
+  When we say "call a group," it actually means, "re-execute the subexp in
+  that group."
+
+  \g<0>     \g'0'     call the whole pattern recursively
+  \g<n>     \g'n'     (n >= 1) call the nth group
+  \g<-n>    \g'-n'    (n >= 1) call the nth group counting backwards from
+                      the calling position
+  \g<+n>    \g'+n'    (n >= 1) call the nth group counting forwards from
+                      the calling position
+  \g<name>  \g'name'  call the group with the specified name
+
+  * Left-most recursive calls are not allowed.
+
+    ex. (?<name>a|\g<name>b)    => error
+        (?<name>a|b\g<name>c)   => OK
+
+  * Calls with a name that is assigned to more than one groups are not
+    allowed in ONIG_SYNTAX_RUBY.
+
+  * Call by number is forbidden if any named group is defined and
+    ONIG_OPTION_CAPTURE_GROUP is not set.
+
+  * The option status of the called group is always effective.
+
+    ex. /(?-i:\g<name>)(?i:(?<name>a)){0}/.match("A")
+
+  * ONIG_SYNTAX_PERL:
+    Use (?&name), (?n), (?-n), (?+n), (?R) or (?0) instead of \g<>.
+    Calls with a name that is assigned to more than one groups are allowed,
+    and the left-most subexp is used.
+
+
+10. Captured group
+
+  Behavior of an unnamed group (...) changes with the following conditions.
+  (But named group is not changed.)
+
+  case 1. /.../     (named group is not used, no option)
+
+     (...) is treated as a capturing group.
+
+  case 2. /.../g    (named group is not used, 'g' option)
+
+     (...) is treated as a non-capturing group (?:...).
+
+  case 3. /..(?<name>..)../   (named group is used, no option)
+
+     (...) is treated as a non-capturing group.
+     numbered-backref/call is not allowed.
+
+  case 4. /..(?<name>..)../G  (named group is used, 'G' option)
+
+     (...) is treated as a capturing group.
+     numbered-backref/call is allowed.
+
+  where
+    g: ONIG_OPTION_DONT_CAPTURE_GROUP
+    G: ONIG_OPTION_CAPTURE_GROUP
+
+  ('g' and 'G' options are argued in ruby-dev ML)
+
+
+
+-----------------------------
+A-1. Syntax-dependent options
+
+   + ONIG_SYNTAX_RUBY
+     (?m): dot (.) also matches newline
+
+   + ONIG_SYNTAX_PERL, ONIG_SYNTAX_JAVA and ONIG_SYNTAX_PYTHON
+     (?s): dot (.) also matches newline
+     (?m): ^ matches after newline, $ matches before newline
+
+   + ONIG_SYNTAX_PERL
+     (?d), (?l): same as (?u)
+
+
+A-2. Original extensions
+
+   + hexadecimal digit char type  \h, \H
+   + named group                  (?<name>...), (?'name'...)
+   + named backref                \k<name>
+   + subexp call                  \g<name>, \g<group-num>
+
+
+A-3. Missing features compared with perl 5.18.0
+
+   + \N{name}, \N{U+xxxx}, \N
+   + \l,\u,\L,\U, \C
+   + \v, \V, \h, \H
+   + (?{code})
+   + (??{code})
+   + (?|...)
+   + (?[])
+   + (*VERB:ARG)
+
+   * \Q...\E
+     This is effective on ONIG_SYNTAX_PERL and ONIG_SYNTAX_JAVA.
+
+
+A-4. Differences with Japanized GNU regex(version 0.12) of Ruby 1.8
+
+   + add character property (\p{property}, \P{property})
+   + add hexadecimal digit char type (\h, \H)
+   + add look-behind
+     (?<=fixed-width-pattern), (?<!fixed-width-pattern)
+   + add possessive quantifier. ?+, *+, ++
+   + add operations in character class. [], &&
+     ('[' must be escaped as an usual char in character class.)
+   + add named group and subexp call.
+   + octal or hexadecimal number sequence can be treated as
+     a multibyte code char in character class if multibyte encoding
+     is specified.
+     (ex. [\xa1\xa2], [\xa1\xa7-\xa4\xa1])
+   + allow the range of single byte char and multibyte char in character
+     class.
+     ex. /[a-<<any EUC-JP character>>]/ in EUC-JP encoding.
+   + effect range of isolated option is to next ')'.
+     ex. (?:(?i)a|b) is interpreted as (?:(?i:a|b)), not (?:(?i:a)|b).
+   + isolated option is not transparent to previous pattern.
+     ex. a(?i)* is a syntax error pattern.
+   + allowed unpaired left brace as a normal character.
+     ex. /{/, /({)/, /a{2,3/ etc...
+   + negative POSIX bracket [:^xxxx:] is supported.
+   + POSIX bracket [:ascii:] is added.
+   + repeat of look-ahead is not allowed.
+     ex. /(?=a)*/, /(?!b){5}/
+   + Ignore case option is effective to escape sequence.
+     ex. /\x61/i =~ "A"
+   + In the range quantifier, the number of the minimum is optional.
+     /a{,n}/ == /a{0,n}/
+     The omission of both minimum and maximum values is not allowed.
+     /a{,}/
+   + /{n}?/ is not a reluctant quantifier.
+     /a{n}?/ == /(?:a{n})?/
+   + invalid back reference is checked and raises error.
+     /\1/, /(a)\2/
+   + Zero-width match in an infinite loop stops the repeat,
+     then changes of the capture group status are checked as stop condition.
+     /(?:()|())*\1\2/ =~ ""
+     /(?:\1a|())*/ =~ "a"
+
+
+A-5. Features disabled in default syntax
+
+   + capture history
+
+     (?@...) and (?@<name>...)
+
+     ex. /(?@a)*/.match("aaa") ==> [<0-1>, <1-2>, <2-3>]
+
+     see sample/listcap.c file.
+
+
+A-6. Problems
+
+   + Invalid encoding byte sequence is not checked.
+
+     ex. UTF-8
+
+     * Invalid first byte is treated as a character.
+       /./u =~ "\xa3"
+
+     * Incomplete byte sequence is not checked.
+       /\w+/ =~ "a\xf3\x8ec"
+
+// END