Diffstat (limited to 'vendor/regex/UNICODE.md')
-rw-r--r-- vendor/regex/UNICODE.md | 13
1 files changed, 6 insertions, 7 deletions
diff --git a/vendor/regex/UNICODE.md b/vendor/regex/UNICODE.md
index df7d21ed9..60db0aad1 100644
--- a/vendor/regex/UNICODE.md
+++ b/vendor/regex/UNICODE.md
@@ -8,7 +8,8 @@ Full support for Level 1 ("Basic Unicode Support") is provided with two
exceptions:
1. Line boundaries are not Unicode aware. Namely, only the `\n`
- (`END OF LINE`) character is recognized as a line boundary.
+ (`END OF LINE`) character is recognized as a line boundary by default.
+ One can opt into `\r\n|\r|\n` being a line boundary via CRLF mode.
2. The compatibility properties specified by
[RL1.2a](https://unicode.org/reports/tr18/#RL1.2a)
are ASCII-only definitions.
@@ -229,12 +230,10 @@ then all characters classes are case folded as well.
[UTS#18 RL1.6](https://unicode.org/reports/tr18/#Line_Boundaries)
The regex crate only provides support for recognizing the `\n` (`END OF LINE`)
-character as a line boundary. This choice was made mostly for implementation
-convenience, and to avoid performance cliffs that Unicode word boundaries are
-subject to.
-
-Ideally, it would be nice to at least support `\r\n` as a line boundary as
-well, and in theory, this could be done efficiently.
+character as a line boundary by default. One can also opt into treating
+`\r\n|\r|\n` as a line boundary via CRLF mode. This choice was made mostly for
+implementation convenience, and to avoid performance cliffs that Unicode word
+boundaries are subject to.
## RL1.7 Code Points
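
Note on the change above: the patch documents two line-boundary behaviours, the default (only `\n`) and the opt-in CRLF mode (`\r\n|\r|\n`). The following is a minimal sketch of that distinction in plain Rust. It models where a multi-line `$` could match and does not reproduce the crate's implementation. The helper names `is_line_end` and `line_end_offsets` are hypothetical, and the rule that no boundary falls between `\r` and `\n` is assumed from the crate's documented CRLF semantics rather than from this diff.

```rust
/// Hypothetical helper (not part of the regex crate): reports whether byte
/// offset `at` is an end-of-line position, i.e. a place where a multi-line
/// `$` could match.
fn is_line_end(haystack: &[u8], at: usize, crlf: bool) -> bool {
    if at > haystack.len() {
        return false;
    }
    // The end of the haystack always counts as an end of line.
    if at == haystack.len() {
        return true;
    }
    if !crlf {
        // Default behaviour: only `\n` terminates a line.
        return haystack[at] == b'\n';
    }
    match haystack[at] {
        // A `\r` terminates a line on its own or as the start of `\r\n`.
        b'\r' => true,
        // A `\n` directly after `\r` is the second half of `\r\n`; the
        // boundary sits before the `\r`, never between the two bytes.
        b'\n' => at == 0 || haystack[at - 1] != b'\r',
        _ => false,
    }
}

/// Hypothetical helper: collects every end-of-line offset in `haystack`.
fn line_end_offsets(haystack: &str, crlf: bool) -> Vec<usize> {
    let bytes = haystack.as_bytes();
    (0..=bytes.len())
        .filter(|&at| is_line_end(bytes, at, crlf))
        .collect()
}
```

For the input `"a\r\nb\rc\nd"`, `line_end_offsets` yields offsets 2, 6 and 8 with `crlf` set to `false`, and 1, 4, 6 and 8 with it set to `true`. In the regex crate itself, CRLF mode is, to my knowledge, enabled with the `R` flag alongside multi-line mode (for example `(?mR)`) or with `RegexBuilder::crlf(true)`.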