author Daniel Baumann <daniel.baumann@progress-linux.org> 2024-06-19 09:26:03 +0000
committer Daniel Baumann <daniel.baumann@progress-linux.org> 2024-06-19 09:26:03 +0000
commit 9918693037dce8aa4bb6f08741b6812923486c18 (patch)
tree 21d2b40bec7e6a7ea664acee056eb3d08e15a1cf /vendor/regex/UNICODE.md
parent Releasing progress-linux version 1.75.0+dfsg1-5~progress7.99u1.
Merging upstream version 1.76.0+dfsg1.
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to 'vendor/regex/UNICODE.md')
-rw-r--r-- vendor/regex/UNICODE.md 13
1 files changed, 6 insertions, 7 deletions
diff --git a/vendor/regex/UNICODE.md b/vendor/regex/UNICODE.md
index df7d21ed9..60db0aad1 100644
--- a/vendor/regex/UNICODE.md
+++ b/vendor/regex/UNICODE.md
@@ -8,7 +8,8 @@ Full support for Level 1 ("Basic Unicode Support") is provided with two
exceptions:
1. Line boundaries are not Unicode aware. Namely, only the `\n`
- (`END OF LINE`) character is recognized as a line boundary.
+ (`END OF LINE`) character is recognized as a line boundary by default.
+ One can opt into `\r\n|\r|\n` being a line boundary via CRLF mode.
2. The compatibility properties specified by
[RL1.2a](https://unicode.org/reports/tr18/#RL1.2a)
are ASCII-only definitions.
@@ -229,12 +230,10 @@ then all characters classes are case folded as well.
[UTS#18 RL1.6](https://unicode.org/reports/tr18/#Line_Boundaries)
The regex crate only provides support for recognizing the `\n` (`END OF LINE`)
-character as a line boundary. This choice was made mostly for implementation
-convenience, and to avoid performance cliffs that Unicode word boundaries are
-subject to.
-
-Ideally, it would be nice to at least support `\r\n` as a line boundary as
-well, and in theory, this could be done efficiently.
+character as a line boundary by default. One can also opt into treating
+`\r\n|\r|\n` as a line boundary via CRLF mode. This choice was made mostly for
+implementation convenience, and to avoid performance cliffs that Unicode word
+boundaries are subject to.
## RL1.7 Code Points
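
The line-boundary behavior described in the RL1.6 hunk above can be illustrated with a small standard-library model. This is only a sketch of the semantics, not the regex crate's implementation: the function names `line_starts_lf` and `line_starts_crlf` are hypothetical, and the CRLF rule shown (never split a `\r\n` pair, but treat a lone `\r` or a lone `\n` as a terminator) is an assumption based on the `\r\n|\r|\n` wording in the patch. In the crate itself, CRLF mode is opted into with the `R` flag (for example `(?mR)`) or with `RegexBuilder::crlf(true)`.

```rust
/// Byte offsets at which a multi-line `^` would match when only `\n`
/// terminates a line (the default described in the patch).
fn line_starts_lf(hay: &str) -> Vec<usize> {
    let b = hay.as_bytes();
    (0..=b.len())
        .filter(|&at| at == 0 || b[at - 1] == b'\n')
        .collect()
}

/// Byte offsets at which a multi-line `^` would match in a CRLF-aware mode.
/// Assumption: a position directly after `\r` only counts as a line start
/// when the `\r` is not followed by `\n`, so a `\r\n` pair is never split.
fn line_starts_crlf(hay: &str) -> Vec<usize> {
    let b = hay.as_bytes();
    (0..=b.len())
        .filter(|&at| {
            at == 0
                || b[at - 1] == b'\n'
                || (b[at - 1] == b'\r' && (at == b.len() || b[at] != b'\n'))
        })
        .collect()
}
```

For the haystack `"a\r\nb\rc\nd"`, the `\n`-only model yields line starts at offsets 0, 3 and 7, while the CRLF model additionally reports offset 5 (after the lone `\r`) and still skips offset 2, which sits between `\r` and `\n`.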