<text>
The Canterbury Corpus
file         size  packed size       bpb  corruption
text:      152089        40695   2.14059  no
play:      125179        37421   2.39152  no
html:       24603         6859    2.2303  no
Csrc:       11150         2792   2.00323  no
list:        3721         1084   2.33056  no
Excl:     1029744       156897   1.21892  no
tech:      426754       102805    1.9272  no
poem:      481861       136664   2.26894  no
fax:       513216        51109  0.796686  no
SPRC:       38240        12590   2.63389  no
man:         4227         1530   2.89567  no
average:                         2.07614
time: 3.062 sec
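
The report does not define `bpb`. The figures are consistent with bits per byte, that is, 8 × packed size / size, with `average` being the unweighted mean over the files. The sketch below is a hedged check of that reading against the Canterbury numbers; the function name `bits_per_byte` and the dictionary are illustrative and not part of the tool that produced this report.

```python
def bits_per_byte(size: int, packed_size: int) -> float:
    """Compressed bits spent per byte of the original file (assumed meaning of bpb)."""
    if size <= 0:
        raise ValueError("size must be positive")
    return 8 * packed_size / size


# (size, packed size) pairs copied from the Canterbury table above
canterbury = {
    "text": (152089, 40695),
    "play": (125179, 37421),
    "html": (24603, 6859),
    "Csrc": (11150, 2792),
    "list": (3721, 1084),
    "Excl": (1029744, 156897),
    "tech": (426754, 102805),
    "poem": (481861, 136664),
    "fax": (513216, 51109),
    "SPRC": (38240, 12590),
    "man": (4227, 1530),
}

bpb = {name: bits_per_byte(size, packed) for name, (size, packed) in canterbury.items()}

# Unweighted mean: every file counts equally, regardless of its size.
average = sum(bpb.values()) / len(bpb)

for name, value in bpb.items():
    print(f"{name:<6}{value:.5f}")
print(f"average {average:.5f}")
```

Because the mean is unweighted, a single tiny file can dominate it: in the Artificial Corpus the 1-byte file `a` packs to 7 bytes (56 bpb), which accounts for almost all of that corpus's 15.6536 average.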
The Calgary Corpus
file         size  packed size       bpb  corruption
bib:       111261        26039   1.87228  no
book1:     768771       218772   2.27659  no
book2:     610856       151985   1.99045  no
geo:       102400        59371   4.63836  no
news:      377109       115334    2.4467  no
obj1:       21504         9832   3.65774  no
obj2:      246814        75065   2.43309  no
paper1:     53161        15263   2.29687  no
paper2:     82199        23368   2.27429  no
pic:       513216        51109  0.796686  no
progc:      39611        11549   2.33248  no
progl:      71646        15297   1.70806  no
progp:      49379        10447   1.69254  no
trans:      93695        17677   1.50932  no
average:                         2.28039
time: 4.172 sec
The Artificial Corpus
file         size  packed size       bpb  corruption
a:              1            7        56  no
aaa:       100000           17   0.00136  no
alphabet:  100000           65    0.0052  no
random:    100000        82599   6.60792  no
average:                         15.6536
time: 672 ms
The Large Corpus
file         size  packed size       bpb  corruption
E.coli:   4638690      1130095   1.94899  no
bible:    4047392       848956   1.67803  no
word:     2473400       542983   1.75623  no
average:                         1.79442
time: 6.516 sec
</text>