v2.29 Intel Intelligent Storage Acceleration Library Release Notes
==================================================================

RELEASE NOTE CONTENTS
1. KNOWN ISSUES
2. FIXED ISSUES
3. CHANGE LOG & FEATURES ADDED

1. KNOWN ISSUES
----------------

* Performance tests do not run in a Windows environment.

* The 32-bit library is not supported on Windows.

2. FIXED ISSUES
---------------
v2.28

* Fix documentation for gf_vect_mad().  The minimum length was listed as 32
  bytes instead of the required minimum of 64 bytes.

v2.27

* Fix missing installation of pkg-config files.

v2.26

* Fixes for sanitizer warnings.

v2.25

* Fix for nasm on Mac OS X/darwin.

v2.24

* Fix for crc32_iscsi(): potential read past the end of small buffers.  For an
  input buffer shorter than 8 bytes and aligned to an 8-byte boundary, the
  function could read past the given length.  This could previously cause a
  segmentation fault only when the length was 0 and an invalid buffer was
  passed.  The calculated CRC is unchanged.

* Fix for compression/decompression of files larger than 4GB.  For streaming
  compression of extremely large files, the total_out parameter would wrap and
  could flag an otherwise valid lookback distance as invalid.  The total_out
  counter remains 32-bit for zlib compatibility.  The issue did not cause any
  inconsistent compressed buffers to be generated.
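The crc32_iscsi() fix concerns reading beyond the end of short buffers.  The
following is a minimal, portable sketch (not ISA-L's optimized assembly) of a
CRC-32C routine, using the reflected Castagnoli polynomial 0x82F63B78, that
enters its 8-byte block loop only while at least 8 bytes remain and finishes
the tail byte by byte, so it never reads past len.  The names crc32c_sketch and
crc32c_byte and the init/final-XOR convention are assumptions for illustration
and are not the crc32_iscsi() signature.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: fold one byte into a reflected CRC-32C register,
 * one bit at a time (polynomial 0x82F63B78). */
static uint32_t crc32c_byte(uint32_t crc, uint8_t b)
{
    crc ^= b;
    for (int k = 0; k < 8; k++)
        crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    return crc;
}

/* Hypothetical CRC-32C with init and final XOR of 0xFFFFFFFF (assumed).
 * Reads at most len bytes from buf; when len == 0, buf is never touched. */
uint32_t crc32c_sketch(const unsigned char *buf, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    size_t i = 0;

    /* Block loop: entered only while a full 8 bytes remain in range.
     * An optimized version would load and fold these 8 bytes at once. */
    while (len - i >= 8) {
        for (int j = 0; j < 8; j++)
            crc = crc32c_byte(crc, buf[i + j]);
        i += 8;
    }

    /* Tail loop: the remaining 0..7 bytes, one at a time. */
    for (; i < len; i++)
        crc = crc32c_byte(crc, buf[i]);

    return crc ^ 0xFFFFFFFFu;
}
```

The 9-byte standard check string exercises both loops; a zero-length call with
a NULL pointer returns without dereferencing anything.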

v2.23

* Fix for histogram generation base function.
* Fix library build warnings on macOS.
* Fix igzip to use the bsf instruction when tzcnt is not available.

v2.22

* Fix ISA-L builds for other architectures.  Base functions and examples
  sanitized for non-IA builds.

* Fix fuzz test script to work with the LLVM 6.0 built-in libFuzzer.

v2.20

* Inflate total_out behavior corrected for in-progress decompression.
  Previously, total_out counted all bytes decompressed into either the output
  buffer or the internal temporary buffer.  It now counts only the bytes placed
  in the output buffer.

* Fixed issue with isal_create_hufftables_subset().  Affects the semi-dynamic
  compression use case when explicitly creating hufftables from a histogram.
  The function could fail to generate length symbols for lengths that were
  never seen.

v2.19

* Fix erasure code test that violated Reed-Solomon (RS) matrix bounds.

* Fix zero-length file and looping errors in igzip_inflate_test.

v2.18

* Mac OS X/darwin systems no longer require the --target=darwin config option.
  The autoconf canonical build should detect the system automatically.

v2.17

* Fix igzip when using a 32K window with a shared object.

* Fix igzip undefined instruction error on Nehalem.

* Fixed issue in CRC performance tests where OS optimizations turned cold-cache
  tests into warm-cache tests.

v2.15

* Fix for Windows register save in gf_6vect_mad_avx2.asm.  Only affects Windows
  versions of ec_encode_data_update() running with AVX2.  A GP register was not
  properly restored, resulting in corruption on return.

v2.14

* Building in unit directories is no longer supported.  This removes the issue
  of leftover object files causing the top-level make build to fail.

v2.10

* Fix for Windows register save overlap in gf_{3-6}vect_dot_prod_sse.asm.  Only
  affects Windows versions of erasure code.  GP register saves/restores were
  pushed to the same stack area as the XMM registers.

3. CHANGE LOG & FEATURES ADDED
------------------------------
v2.29

* CRC Improvements
  - New AVX512 vclmul versions of crc16_t10dif(), crc32_ieee() and
    crc32_gzip_refl().

* Erasure code improvements
  - Added AVX512 EC functions with 5 and 6 outputs.  These can improve
    performance for codes with 5 or more parity blocks by processing them in
    batches of up to 6 at a time.

v2.28

* New next-arch versions of 64-bit CRC.  All normal and reflected 64-bit
  polynomials are expanded to utilize vpclmulqdq.

v2.27

* New multi-threaded compression option for the igzip CLI tool.

v2.26

* Adler32 added to external API.
* Multi-arch improvements.
* Performance test improvements.
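Adler-32 is the checksum carried in the zlib trailer.  ISA-L provides an
optimized implementation; the sketch below is only a plain reference version of
the algorithm from RFC 1950, which can be handy for cross-checking results.  The
name adler32_sketch and its argument order are assumptions and are not meant to
mirror the ISA-L prototype.

```c
#include <stddef.h>
#include <stdint.h>

#define ADLER_MOD 65521u /* largest prime below 2^16 */

/* Hypothetical reference Adler-32 (RFC 1950).
 * s1 = 1 + sum of all bytes, s2 = sum of every intermediate s1, both mod
 * 65521; the result packs s2 in the high 16 bits and s1 in the low 16 bits.
 * Pass 1 as 'adler' to start a checksum, or a previous result to continue. */
uint32_t adler32_sketch(uint32_t adler, const unsigned char *buf, size_t len)
{
    uint32_t s1 = adler & 0xFFFFu;
    uint32_t s2 = adler >> 16;

    for (size_t i = 0; i < len; i++) {
        s1 = (s1 + buf[i]) % ADLER_MOD;
        s2 = (s2 + s1) % ADLER_MOD;
    }
    return (s2 << 16) | s1;
}
```

Because the whole running state fits in the 32-bit result, a checksum can be
continued across separate calls, which is what streaming compression needs.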

v2.25

* Igzip performance improvements and features.
  - Performance improvements for incompressible files.  Random or incompressible
    files can be up to 3x faster in level 1 or 2 compression.
  - Additional small-file performance improvements.
  - New options in the igzip CLI: choose whether to use the name from the
    header, and test a compressed file.

* Multi-arch autoconf script.
  - Autoconf should detect the architecture and, at minimum, run base functions.

v2.24

* Igzip small file performance improvements and new features.
  - Better performance on small files.
  - New gzip/zlib header and trailer handling.
  - New gzip/zlib header parsing helper functions.
  - New user-space compression/decompression tool igzip.

* New mem unit added, with isal_zero_detect() as its first function.
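isal_zero_detect() checks whether a memory region is entirely zero.  To
illustrate the technique only, the portable sketch below scans the buffer in
8-byte words and then checks any tail bytes; the library's own version is
vectorized.  The name zero_detect_sketch is hypothetical, and its return
convention (0 for an all-zero region, nonzero otherwise) is an assumption; check
the ISA-L headers for the library's exact contract.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical zero-detection helper: returns 0 if all len bytes are zero,
 * 1 otherwise (convention assumed for illustration). */
int zero_detect_sketch(const void *mem, size_t len)
{
    const unsigned char *p = mem;
    size_t i = 0;

    /* Test 8 bytes per step; memcpy avoids unaligned-access problems. */
    for (; len - i >= sizeof(uint64_t); i += sizeof(uint64_t)) {
        uint64_t w;
        memcpy(&w, p + i, sizeof w);
        if (w != 0)
            return 1;
    }

    /* Test any remaining tail bytes individually. */
    for (; i < len; i++)
        if (p[i] != 0)
            return 1;

    return 0;
}
```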

v2.23

* Igzip inflate (decompression) performance improvements.
  - Implemented multi-byte decode for inflate.  Decode can pack up to three
    symbols into the decode table, making some compressed streams decompress
    much faster, depending on the prevalence of short codes.

v2.22

* Igzip: AVX2 version of level 3 compression added.

* Erasure code examples
  - New examples for standard EC encode and decode.
  - Example of piggyback EC encode and decode.

v2.21

* Igzip improvements
  - New compression levels added.  ISA-L fast deflate now has more levels to
    balance speed vs. target compression level.  Levels 0 and 1 are as in
    previous generations.  New levels 2 and 3 target higher compression,
    roughly comparable to zlib levels 2-3.  Level 3 is currently optimized only
    for processors with AVX512 instructions.

* New T10dif & copy function - crc16_t10dif_copy()
  - CRC and copy was added to emulate T10dif operations such as DIF insert and
    strip.  This function stitches together CRC and memcpy operations,
    eliminating an extra data read.

* CRC32 iscsi performance improvements
  - Fixes issue under some distributions where warm cache performance was
    reduced.
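crc16_t10dif_copy() avoids reading the data twice by computing the CRC while
the bytes are copied.  The bitwise sketch below illustrates that fusion for the
T10-DIF CRC-16 (polynomial 0x8BB7, initial value 0, no reflection, no final
XOR).  It is not ISA-L's optimized code; the name crc16_t10dif_copy_sketch is
hypothetical and its argument order is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fused CRC-16/T10-DIF + copy.  Each source byte is read once,
 * written to dst, and folded into the CRC (MSB-first, polynomial 0x8BB7). */
uint16_t crc16_t10dif_copy_sketch(uint16_t crc, uint8_t *dst,
                                  const uint8_t *src, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        uint8_t b = src[i];
        dst[i] = b;                   /* the "copy" half */
        crc ^= (uint16_t)(b << 8);    /* the "CRC" half */
        for (int k = 0; k < 8; k++)
            crc = (crc & 0x8000u) ? (uint16_t)((crc << 1) ^ 0x8BB7u)
                                  : (uint16_t)(crc << 1);
    }
    return crc;
}
```

Since there is no final XOR, the returned value can be passed back in as crc to
continue over the next chunk.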

v2.20

* Igzip improvements
  - Optimized deflate_hash in compression functions.
    Improves performance when using a preset dictionary.
  - Removed alignment restrictions on input structure.

v2.19

* Igzip improvements

  - Add optimized Adler-32 checksum.

  - Implement zlib compression format.

  - Add stateful dictionary support.

  - Add struct reset functions for both deflate and inflate.

* Reflected IEEE-format CRC32 released.  The function interface is named
  crc32_gzip_refl().

* The exact working conditions of the erasure code Reed-Solomon matrix are
  determined by the newly added program gen_rs_matrix_limits.
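"Reflected" CRC-32 means data bits are processed least-significant bit first,
using the bit-reversed IEEE polynomial 0xEDB88320; this is the CRC that gzip
stores in its trailer.  The bitwise sketch below is a reference illustration,
not ISA-L's folded implementation.  The name crc32_refl_sketch and its
pre/post-inversion convention (which mirrors zlib's crc32()) are assumptions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bitwise reflected CRC-32 (gzip/IEEE 802.3).
 * The register is inverted on entry and exit, so passing a previous result
 * as 'crc' continues the checksum; pass 0 to start a new one. */
uint32_t crc32_refl_sketch(uint32_t crc, const unsigned char *buf, size_t len)
{
    crc = ~crc;
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];                 /* new byte enters at the low end */
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}
```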

v2.18

* New 2-pass fully-dynamic deflate compression (level 1).  ISA-L fast deflate
  now has two levels.  Level 0 (default) is the same as in previous
  generations.  Setting the level to 1 switches to fully-dynamic compression,
  which typically reaches higher compression ratios.

* RAID AVX512 functions.

v2.17

* New fast decompression (inflate)

* Compression improvements (deflate)
  - Speed and compression ratio improvements.
  - Fast custom Huffman code generation.
  - New features:
    * Run-time option for gzip CRC calculation and headers/trailer.
    * Choice of static header (BTYPE 01) blocks.
    * LARGE_WINDOW, 32K history, now default.
    * Stateless full flush mode.

* CRC64
  - Six new 64-bit polynomials supported. Normal and reflected versions of ECMA,
    ISO and Jones polynomials.
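The normal and reflected variants of a 64-bit CRC differ only in the bit order
in which the register is processed.  As a bitwise reference sketch (not ISA-L's
implementation; the function names are hypothetical), the two routines below
apply the ECMA-182 polynomial in each form.  The initial value and final XOR
are left to the caller, because conventions differ between standards.

```c
#include <stddef.h>
#include <stdint.h>

#define ECMA182_POLY      0x42F0E1EBA9EA3693ull /* normal (MSB-first) form */
#define ECMA182_POLY_REFL 0xC96C5795D7870F42ull /* bit-reversed form */

/* Hypothetical normal CRC-64: each byte enters at the top of the register
 * and bits are shifted out MSB-first. */
uint64_t crc64_norm_sketch(uint64_t crc, const unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint64_t)buf[i] << 56;
        for (int k = 0; k < 8; k++)
            crc = (crc & (1ull << 63)) ? (crc << 1) ^ ECMA182_POLY : crc << 1;
    }
    return crc;
}

/* Hypothetical reflected CRC-64: each byte enters at the bottom of the
 * register and bits are shifted out LSB-first with the reversed polynomial. */
uint64_t crc64_refl_sketch(uint64_t crc, const unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int k = 0; k < 8; k++)
            crc = (crc & 1u) ? (crc >> 1) ^ ECMA182_POLY_REFL : crc >> 1;
    }
    return crc;
}
```

For example, the widely used CRC-64/XZ parameters correspond to the reflected
routine with an initial value of all ones and a final inversion, while
CRC-64/ECMA-182 uses the normal routine with an initial value of 0 and no final
XOR.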

v2.16

* Units added: crc, raid, igzip (deflate compression).

v2.15

* Erasure code updates. New AVX512 versions.

* Nasm support.  ISA-L ported to build with nasm or yasm assembler.

* Windows DLL support.  Windows builds DLL by default.

v2.14

* Autoconf and autotools build allows easier porting to additional systems.
  Previous make system still available to embedded users with Makefile.unx.

* Includes update for building on Mac OS X/darwin systems.  Add --target=darwin
  to the ./configure step.

v2.13

* Erasure code improvements
  - 32-bit port of optimized gf_vect_dot_prod() functions.  This makes
    ec_encode_data() functions much faster on 32-bit processors.
  - Avoton performance improvements.  Performance on Avoton for
    gf_vect_dot_prod() and ec_encode_data() can improve by as much as 20%.

v2.11

* Incremental erasure code.  New functions added to erasure code to handle
  single-source update of code blocks.  The function ec_encode_data_update()
  takes parameters similar to ec_encode_data() but is called incrementally
  with each source block.  These versions are useful when source blocks are
  not all available at once.
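Incremental encoding works because each parity byte is a GF(2^8) dot product of
the source bytes with one row of the coding matrix, so each source block's
contribution can be XORed in independently and in any order.  The sketch below
illustrates that property with a slow bitwise field multiply.  It is not
ISA-L's table-driven code: the field polynomial 0x11D is an assumption about
the field in use, and all function names and parameter layouts are
hypothetical, not the signature of ec_encode_data_update().

```c
#include <stddef.h>
#include <string.h>

/* Bitwise GF(2^8) multiply, reducing by x^8 + x^4 + x^3 + x^2 + 1 (0x11D). */
static unsigned char gf_mul_sketch(unsigned char a, unsigned char b)
{
    unsigned char p = 0;
    while (b) {
        if (b & 1)
            p ^= a;
        /* Multiply a by x; reduce if the top bit is shifted out. */
        a = (unsigned char)((a << 1) ^ ((a & 0x80) ? 0x1D : 0));
        b >>= 1;
    }
    return p;
}

/* Hypothetical incremental update: fold source block vec_i into every parity
 * block.  coef is a rows x k matrix in row-major order; the parity buffers
 * must start zeroed. */
void ec_update_sketch(size_t len, int k, int rows, int vec_i,
                      const unsigned char *coef, const unsigned char *data,
                      unsigned char **parity)
{
    for (int r = 0; r < rows; r++) {
        unsigned char c = coef[r * k + vec_i];
        for (size_t j = 0; j < len; j++)
            parity[r][j] ^= gf_mul_sketch(c, data[j]);
    }
}

/* Hypothetical full encode for comparison: parity = coef x data. */
void ec_encode_sketch(size_t len, int k, int rows, const unsigned char *coef,
                      unsigned char **data, unsigned char **parity)
{
    for (int r = 0; r < rows; r++) {
        memset(parity[r], 0, len);
        for (int i = 0; i < k; i++)
            for (size_t j = 0; j < len; j++)
                parity[r][j] ^= gf_mul_sketch(coef[r * k + i], data[i][j]);
    }
}
```

Calling ec_update_sketch() once per source block, as each block becomes
available, produces the same parity as a single ec_encode_sketch() call.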

v2.10

* Erasure code updates
  - New AVX and AVX2 support functions.
  - Changes the minimum length requirement of gf_vect_dot_prod() from 16 to 32
    bytes.
  - Tests include both source and parity recovery with ec_encode_data().
  - New encoding examples with Vandermonde or Cauchy matrix.

v2.8

* First open release of erasure code unit that is part of ISA-L.