diff options
author | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-05-04 17:44:55 +0000 |
---|---|---|
committer | Daniel Baumann <daniel.baumann@progress-linux.org> | 2024-05-04 17:44:55 +0000 |
commit | 5068d34c08f951a7ea6257d305a1627b09a95817 (patch) | |
tree | 08213e2be853396a3b07ce15dbe222644dcd9a89 /test/expected/test_text_file.sh_ac872aadda29b9a824361a2c711d62ec1c75d40f.out | |
parent | Initial commit. (diff) | |
download | lnav-upstream.tar.xz lnav-upstream.zip |
Adding upstream version 0.11.1.upstream/0.11.1upstream
Signed-off-by: Daniel Baumann <daniel.baumann@progress-linux.org>
Diffstat (limited to 'test/expected/test_text_file.sh_ac872aadda29b9a824361a2c711d62ec1c75d40f.out')
-rw-r--r-- | test/expected/test_text_file.sh_ac872aadda29b9a824361a2c711d62ec1c75d40f.out | 74 |
1 files changed, 74 insertions, 0 deletions
diff --git a/test/expected/test_text_file.sh_ac872aadda29b9a824361a2c711d62ec1c75d40f.out b/test/expected/test_text_file.sh_ac872aadda29b9a824361a2c711d62ec1c75d40f.out new file mode 100644 index 0000000..6c9b5ae --- /dev/null +++ b/test/expected/test_text_file.sh_ac872aadda29b9a824361a2c711d62ec1c75d40f.out @@ -0,0 +1,74 @@ +[1m[31m✘ error[0m: unable to parse markdown file + [1m[31mreason[0m: file has invalid UTF-8 at offset 4461: Expecting bytes in the following ranges: 00..7F C2..F4. + +UTF-8 decoder capability and stress test +---------------------------------------- + +Markus Kuhn <http://www.cl.cam.ac.uk/~mgk25/> - 2015-08-28 - CC BY 4.0 + +This test file can help you examine, how your UTF-8 decoder handles +various types of correct, malformed, or otherwise interesting UTF-8 +sequences. This file is not meant to be a conformance test. It does +not prescribe any particular outcome. Therefore, there is no way to +[35m"pass"[0m or [35m"fail"[0m this test file, even though the text does suggest a +preferable decoder behaviour at some places. Its aim is, instead, to +help you think about, and test, the behaviour of your UTF-8 decoder on a +systematic collection of unusual inputs. Experience so far suggests +that most first-time authors of UTF-8 decoders find at least one +serious problem in their decoder using this file. + +The test lines below cover boundary conditions, malformed UTF-8 +sequences, as well as correctly encoded UTF-8 sequences of Unicode code +points that should never occur in a correct UTF-8 file. + +According to ISO 10646-1:2000, sections D.7 and 2.3c, a device +receiving UTF-8 shall interpret a "malformed sequence in the same way +that it interprets a character that is outside the adopted subset" and +"characters that are not within the adopted subset shall be indicated +to the user" by a receiving device. One commonly used approach in +UTF-8 decoders is to replace any malformed UTF-8 sequence by a +replacement character (U+FFFD), which looks a bit like an inverted +question mark, or a similar symbol. It might be a good idea to +visually distinguish a malformed UTF-8 sequence from a correctly +encoded Unicode character that is just not available in the current +font but otherwise fully legal, even though ISO 10646-1 doesn't +mandate this. In any case, just ignoring malformed sequences or +unavailable characters does not conform to ISO 10646, will make +debugging more difficult, and can lead to user confusion. + +Please check, whether a malformed UTF-8 sequence is (1) represented at +all, (2) represented by exactly one single replacement character (or +equivalent signal), and (3) the following quotation mark after an +illegal UTF-8 sequence is correctly displayed, i.e. proper +resynchronization takes place immediately after any malformed +sequence. This file says [35m"THE END"[0m in the last line, so if you don't +see that, your decoder crashed somehow before, which should always be +cause for concern. + +All lines in this file are exactly 79 characters long (plus the line +feed). In addition, all lines end with [35m"|"[0m, except for the two test +lines 2.1.1 and 2.2.1, which contain non-printable ASCII controls +U+0000 and U+007F. If you display this file with a fixed-width font, +these [35m"|"[0m characters should all line up in column 79 (right margin). +This allows you to test quickly, whether your UTF-8 decoder finds the +correct number of characters in every line, that is whether each +malformed sequences is replaced by a single replacement character. + +Note that, as an alternative to the notion of malformed sequence used +here, it is also a perfectly acceptable (and in some situations even +preferable) solution to represent each individual byte of a malformed +sequence with a replacement character. If you follow this strategy in +your decoder, then please ignore the [35m"|"[0m column. + + +Here come the tests: | + | +1 Some correct UTF-8 text | + | +You should see the Greek word [35m'kosme'[0m: [35m"κόσμε"[0m | + | +2 Boundary condition test cases | + | +2.1 First possible sequence of a certain length | + | +2.1.1 1 byte (U-00000000): " |