summaryrefslogtreecommitdiffstats
path: root/src/spdk/intel-ipsec-mb/ReleaseNotes.txt
diff options
context:
space:
mode:
Diffstat (limited to 'src/spdk/intel-ipsec-mb/ReleaseNotes.txt')
-rw-r--r--src/spdk/intel-ipsec-mb/ReleaseNotes.txt575
1 files changed, 575 insertions, 0 deletions
diff --git a/src/spdk/intel-ipsec-mb/ReleaseNotes.txt b/src/spdk/intel-ipsec-mb/ReleaseNotes.txt
new file mode 100644
index 000000000..950dfac21
--- /dev/null
+++ b/src/spdk/intel-ipsec-mb/ReleaseNotes.txt
@@ -0,0 +1,575 @@
+========================================================================
+Release Notes for Intel(R) Multi-Buffer Crypto for IPsec Library
+
+v0.53 October 2019
+========================================================================
+
+Library
+- AES-CCM performance optimizations done
+ - full assembly implementation
+ - authentication decoupled from cipher
+ - CCM chain order expected to be HASH_CIPHER for encryption and
+ CIPHER_HASH for decryption
+- AES-CTR implementation for VAES added
+- AES-CBC implementation for VAES added
+- Single buffer AES-GCM performance improvements added for VPCLMULQDQ + VAES
+- Multi-buffer AES-GCM implementation added for VPCLMULQDQ + VAES
+- Data transposition optimizations and unification across the library
+ implemented
+- Generation of make dependency files for Linux added
+- AES-ECB implementation added
+- PON specific stitched algorithm implementation added
+ - stitched AES-CTR-128 (optional) with CRC32 and BIP (running 32-bit XOR)
+- AES-CMAC-128 implementation for bit length messages added
+- ZUC-EEA3 and ZUC-EIA3 implementation added
+- FreeBSD experimental support added
+- KASUMI-F8 and KASUMI-F9 implementation added
+- SNOW3G-UEA2 and SNOW3G-UIA2 implementation added
+- AES-CTR implementation for bit length (128-NEA2/192-NEA2/256-NEA2) messages added
+- SAFE_PARAM, SAFE_DATA and SAFE_LOOKUP compile time options added.
+ Find more about these options in the README file or on-line at
+ https://github.com/intel/intel-ipsec-mb/blob/master/README.
+
+LibTestApp
+- New API tests added
+- CMAC test vectors extended
+- New chained operation tests added
+- Out-of-place chained operation tests added
+- AES-ECB tests added
+- PON algorithm tests added
+- Extra AES-CTR test vectors added
+- Extra AES-CBC test vectors added
+- AES-CMAC-128 bit length message tests added
+- CPU capability detection used to disable tests if instruction not present
+- ZUC-EEA3 and ZUC-EIA3 tests added
+- New cross architecture test application (ipsec_xvalid) added,
+ which mixes different implementations (based on different architectures),
+ to double check their correctness
+- SNOW3G-UEA2 and SNOW3G-UIA2 tests added
+- AES-CTR-128 bit length message tests added
+- Negative tests extended to cover all API's
+
+LibPerfApp
+- Job size and number of iterations options added
+- Single architecture test option added
+- AAD size option added
+- Allow zero length source buffer option added
+- Custom performance test combination added:
+ cipher-algo, hash-algo and aead-algo arguments.
+- Cipher direction option added
+- The maximum buffer size extended from 2K to 16K
+- Support for user defined range of job sizes added
+
+Fixes
+- Uninitialized memory reported by Valgrind fixed
+- Flush decryption job fixed (issue #33)
+- NULL_CIPHER order check removed (issue #30)
+- Save XMM registers when emulating AES fixed (issue #28)
+- SSE & AVX AES-CMAC fixed (issue #27)
+- Missing GCM pointers fixed for AES-NI emulation (issue #29)
+
+v0.52 December 2018
+========================================================================
+
+03 Dec, 2018
+
+General
+- Added AESNI emulation implementation
+- Added AES-GCM multi-buffer implementation for AVX512
+- Added flexible job chain order support
+- GCM submit and flush functions moved into architecture MB manager modules
+- AVX512/AVX2/AVX/SSE AAD GHASH computation performance improvement
+- GCM API's added to MB_MGR structure
+- Added plain SHA support in JOB API
+- Added architectural compiler optimizations for GCC/CC
+
+LibTestApp
+- Added option not to run GCM tests
+- Added AESNI emulation tests
+- Added plain SHA tests
+- Updated to take advantage of new GCM macros
+
+LibPerfApp
+- Buffer alignment update
+- Updated to take advantage of new GCM macros
+
+v0.51 September 2018
+========================================================================
+
+13 Sep, 2018
+
+General
+- AES-CMAC performance optimizations
+- Implemented store to load optimizations in
+ - AES-CMAC submit and flush jobs for SSE and AVX
+ - HMAC-MD5, HMAC-SHA submit jobs for AVX
+ - HMAC-MD5 submit job for AVX2
+- Added zero-sized message support in GCM
+- Stack execution flag disabled in new asm modules
+
+LibTestApp
+- Added AES vectors
+- Added DOCSIS AES vectors
+- Added CFB validation
+
+LibPerfApp
+- Smoke test option added
+
+v0.50 June 2018
+========================================================================
+
+13 Jun, 2018
+
+General
+- Added support for compile time and runtime library version checking
+- Added support for full MD5 digest size
+- Replaced defines for API with symbols for binary compatibility
+- Added HMAC-SHA & HMAC-MD5 vectors to LibTestApp
+- Added support for zero cipher length in AES-CCM
+- Added new API's to compute SHA1, SHA224, SHA256, SHA384 and SHA512 hashes
+ to support key reduction cases where key is longer than a block size
+- Extended support for HMAC full digest sizes for HMAC-SHA1, HMAC-SHA224,
+ HMAC-SHA256, HMAC-SHA384 and HMAC-SHA512. Previously only truncated sizes
+ were supported.
+- Added AES-CMAC support for output digest size between 4 and 16 bytes
+- Added GHASH support for output digest size up to 16 bytes
+- Optimized submit job API's with store to load optimization in SSE, AVX,
+ AVX2 (excluding MD5)
+- Improved performance application accuracy by increase number of
+ test iterations
+- Extended multi-thread features of LibPerfApp Windows version to match
+ Linux version of the application
+
+v0.49 March 2018
+========================================================================
+
+21 Mar, 2018
+
+General
+- AES-CMAC support added (AES-CMAC-128 and AES-CMAC-96)
+- 3DES support added
+- Library compiles to SO/DLL by default
+- Install/uninstall targets added to makefiles
+- Multiple API header files consolidated into one (intel-ipsec-mb.h)
+- Unhalted cycles support added to LibPerfApp (Linux at the moment)
+- ELF stack execute protection added for assembly files
+- VZEROUPPER instruction issued after AVX2/AVX512 code to avoid
+ expensive SSE<->AVX transitions
+- MAN page added
+- README documentation extensions and updates
+- AVX512 DES performance smoothed out
+- Multi-buffer manager instance allocate and free API's added
+- Core affinity support added in LibPerfApp
+
+v0.48 December 2017
+========================================================================
+
+12 Dec, 2017
+
+General
+- Linux SO compilation option added
+- Windows DLL compilation option added
+- AES CCM 128 support added
+- Multithread command line option added to LibPerfApp
+- Coding style fixes
+- Coding style target added to Makefile
+
+v0.47 October 2017
+========================================================================
+
+Oct 5, 2017
+
+Intel(R) AVX-512 Instructions
+- DES CBC AVX512 implementation
+- DOCSIS DES AVX512 implementation
+General
+- DES CBC cipher added (generic x86 implementation)
+- DOCSIS DES cipher added (generic x86 implementation)
+- DES and DOCSIS DES tests added
+- RPM SPEC file created
+
+v0.46 June 2017
+========================================================================
+
+Jun 27, 2017
+
+General
+- AES GCM optimizations for AVX2
+- Change of AES GCM API: renamed and expanded keys separated from the context
+- New AES GCM API via job structure and API's
+ - use of the interface may simplify application design at the expense of
+ slightly lower performance vs direct AES GCM API's
+- AES GCM IV automatically padded with block counter (no need for application to do it)
+- IV in AES CTR mode can be 12 bytes (no block counter); 16 byte format still allowed
+- Macros added to ease access to job API for specific architecture
+ - use of these macros can simplify application design but it may produce worse
+ performance than calling architecture job API's directly
+- Submit_job_nocheck() API added to gain some cycles by not validating job structure
+- Result stability improvements in LibPerfApp
+
+v0.45 March 2017
+========================================================================
+
+Mar 29, 2017
+
+Intel(R) AVX-512 Instructions
+- Added optimized HMAC-SHA224 and HMAC-SHA256
+- Added optimized HMAC-SHA384 and HMAC-SHA512
+General
+- Windows x64 compilation target
+- New DOCSIS SEC BPI V3.1 cipher
+- GCM128 and GCM256 updates (with new API that is scatter gather list friendly)
+- GCM192 added
+- Added library API benchmark tool 'ipsec_perf' and
+ script to compare results 'ipsec_diff_tool.py'
+Bug Fixes (vs v0.44)
+- AES CTR mode fix to allow message size not to be multiple of AES block size
+- RSI and RDI registers clobbered when running HMAC-SHA224 or HMAC-SHA256
+ on Windows using SHA extensions
+
+v0.44 November 2016
+========================================================================
+
+Nov 21, 2016
+
+Intel(R) AVX-512 Instructions
+- AVX512 multi buffer manager added (uses AVX2 implementations by default)
+- Optimized SHA1 implementation added
+Intel(R) SHA Extensions
+- SHA1, SHA224 and SHA256 implementations added for Intel(R) SSE
+General
+- NULL cipher added
+- NULL hash added
+- NASM tool chain compilation added (default)
+
+=======================================
+Feb 11, 2015
+
+Fixed, so that the job auth_tag_output_len_in_bytes takes a different
+value for different MAC types. In particular, the valid values are(in bytes):
+SHA1 - 12
+sha224 - 14
+SHA256 - 16
+sha384 - 24
+SHA512 - 32
+XCBC - 12
+MD5 - 12
+
+=======================================
+Oct 24, 2011
+
+SHA_256 added to multibuffer
+------------------------
+12 Aug 2011
+
+API
+
+ The GCM API is distinct from the Multi-buffer API. This is because
+ the GCM code is an optimized single-buffer implementation. By
+ packaging them separately, the application has the option of where,
+ when, and how to call the GCM code, independent of how it is calling
+ the multi-buffer code.
+
+ For example, the application might be enqueing multi-buffer requests
+ for a separate thread to process. In this scenario, if a particular
+ packet used GCM, then the application could choose whether to call
+ the GCM routines directly, or whether to enqueue those requests and
+ have the compute thread call the GCM routines.
+
+GCM API
+
+ The GCM functions are defined as described the the header
+ files. They are simple computational routines, with no state
+ associated with them.
+
+Multi-Buffer API: Two Sets of Functions
+
+ There are two parallel interfaces, one suffixed with "_sse" and one
+ suffixed with "_avx". These are functionally equivalent. The "_sse"
+ functions work on WSM and later processors. The "_avx" functions
+ offer better performance, but they only run on processors after WSM.
+
+ The same interface object structures are used for both sets of
+ interfaces, although one cannot mix the two interfaces on the same
+ initialized object (e.g. it would be wrong to initialize with
+ init_mb_mgr_sse() and then to pass that to submit_job_avx() ). After
+ the MB_MGR structure has been initialized with one of the two
+ initialization functions (init_mb_mgr_sse() or init_mb_mgr_avx()),
+ only the corresponding functions should be used on it.
+
+ There are several ways in which an application could use these
+ interfaces.
+
+ 1) Direct
+ If an application is only going to be run on a post-WSM machine,
+ it can just call the "_avx" functions directly. Conversely, if it
+ is just going to be run on WSM machines, it can call the "_sse"
+ functions directly.
+
+ 2) Via Branches
+ If an application can run on both WSM and SNB and wants the
+ improved performance on SNB, then it can use some method to
+ determine if it is on SNB, and then use a conditional branch to
+ determine which function to call. E.g. this could be wrapped in a
+ macro along the lines of:
+ #define submit_job(mb_mgr) \
+ if (_use_avx) submit_job_avx(mb_mgr); \
+ else submit_job_sse(mb_mgr)
+
+ 3) Via a Function Table
+ One can embed the function addresses into a structure, call them
+ through this structure, and change the structure based on which
+ set of functions one wishes to use, e.g.
+
+ struct funcs_t {
+ init_mb_mgr_t init_mb_mgr;
+ get_next_job_t get_next_job;
+ submit_job_t submit_job;
+ get_completed_job_t get_completed_job;
+ flush_job_t flush_job;
+ };
+
+ funcs_t funcs_sse = {
+ init_mb_mgr_sse,
+ get_next_job_sse,
+ submit_job_sse,
+ get_completed_job_sse,
+ flush_job_sse
+ };
+ funcs_t funcs_avx = {
+ init_mb_mgr_avx,
+ get_next_job_avx,
+ submit_job_avx,
+ get_completed_job_avx,
+ flush_job_avx
+ };
+ funcs_t *funcs = &funcs_sse;
+ ...
+ if (do_avx)
+ funcs = &funcs_avx;
+ ...
+ funcs->init_mb_mgr(&mb_mgr);
+
+ For simplicity in the rest of this document, the functions will be
+ refered to no suffix.
+
+API: Overview
+
+ The basic unit of work is a "job". It is represented by a
+ JOB_AES_HMAC structure. It contains all of the information needed to
+ perform encryption/decryption and SHA1/HMAC authentication on one
+ buffer for IPSec processing.
+
+ The basic paradigm is that the application needs to be able to
+ provide new jobs before old jobs have completed processing. One
+ might call this an "asynchronous" interface.
+
+ The basic interface is that the application "submits" a job to the
+ multi-buffer manager (MB_MGR), and it may receive a completed job
+ back, or it may receive NULL. The returned job, if there is one,
+ will not be the same as the submitted job, but the jobs will be
+ returned in the same order in which they are submitted.
+
+ Since there can be a semi-arbitrary number of outstanding jobs,
+ management of the job object is handled by the MB_MGR. The
+ application gets a pointer to a new job object by calling
+ get_next_job(). It then fills in the data fields and submits it by
+ calling submit_job(). If a job is returned, then that job has been
+ completed, and the application should do whatever it needs to do in
+ order to further process that buffer.
+
+ The job object is not explicitly returned to the MB_MGR. Rather it
+ is implicitly returned by the next call to get_next_job(). Another
+ way to put this is that the data within the job object is
+ guaranteed to be valid until the next call to get_next_job().
+
+ In order to reduce latency, there is an optional function that may
+ be called, get_completed_job(). This returns the next job if that
+ job has previously been completed. But if that job has not been
+ completed, no processing is done, and the function returns
+ NULL. This may be used to reduce the number of outstanding jobs
+ within the MB_MGR.
+
+ At times, it may be necessary to process the jobs currently within
+ the MB_MGR without providing new jobs as input. This process is
+ called "flushing", and it is invoked by calling flush_job(). If
+ there are any jobs within the MB_MGR, this will complete processing
+ on the earliest job and return it. It will only return NULL if there
+ are no jobs within the MB_MGR.
+
+ Flushing will be described in more detail below.
+
+ The presumption is that the same AES key will apply to a number of
+ buffers. For increased efficiency, it requires that the AES key
+ expansion happens as a distinct step apart from buffer
+ encryption/decryption. The expanded keys are stored in a data
+ structure (array), and this expanded key structure is used by the
+ job object.
+
+ There are two variants provided, MB_MGR and MB_MGR2. They are
+ functionally equivalent. The reason that two are provided is that
+ they differ slightly in their implementation, and so they may have
+ slightly different characteristics in terms of latency and overhead.
+
+API: Usage Skeleton
+ The basic usage is illustrated in the following pseudo_code:
+
+ init_mb_mgr(&mb_mgr);
+ ...
+ aes_keyexp_128(key, enc_exp_keys, dec_exp_keys);
+ ...
+ while (work_to_be_done) {
+ job = get_next_job(&mb_mgr);
+ // TODO: Fill in job fields
+ job = submit_job(&mb_mgr);
+ while (job) {
+ // TODO: Complete processing on job
+ job = get_completed_job(&mb_mgr);
+ }
+ }
+
+API: Job Fields
+ The mode is determined by the fields "cipher_direction" and
+ "chain_order". The first specifies encrypt or decrypt, and the
+ second specifies whether whether the hash should be done before or
+ after the cipher operation.
+ In the current implementation, only two combinations of these are
+ supported. For encryption, these should be set to "ENCRYPT" and
+ "CIPHER_HASH", and for decryption, these should be set to "DECRYPT"
+ and "HASH_CIPHER".
+
+ The expanded keys are pointed to by "aes_enc_key_expanded" and
+ "aes_dec_key_expanded". These arrays must be aligned on a 16-byte
+ boundary. Only one of these is necessary (as determined by
+ "cipher_direction").
+
+ One selects AES128 vs AES256 by using the "aes_key_len_in_bytes"
+ field. The only valid values are 16 (AES128) and 32 (AES256).
+
+ One selects the AES mode (CBC versus counter-mode) using
+ "cipher_mode".
+
+ One selects the hash algorith (SHA1-HMAC, AES-XCBC, or MD5-HMAC)
+ using "hash_alg".
+
+ The data to be encrypted/decrypted is defined by
+ "src + cipher_start_src_offset_in_bytes". The length of data is
+ given by "msg_len_to_cipher_in_bytes". It must be a multiple of
+ 16 bytes.
+
+ The destination for the cipher operation is given by "dst" (NOT by
+ "dst + cipher_start_src_offset_in_bytes". In many/most applications,
+ the destination pointer may overlap the source pointer. That is,
+ "dst" may be equal to "src + cipher_start_src_offset_in_bytes".
+
+ The IV for the cipher operation is given by "iv". The
+ "iv_len_in_bytes" should be 16. This pointer does not need to be
+ aligned.
+
+ The data to be hashed is defined by
+ "src + hash_start_src_offset_in_bytes". The length of data is
+ given by "msg_len_to_hash_in_bytes".
+
+ The output of the hash operation is defined by
+ "auth_tag_output". The number of bytes written is given by
+ "auth_tag_output_len_in_bytes". Currently the only valid value for
+ this parameter is 12.
+
+ The ipad and opad are given as the result of hashing the HMAC key
+ xor'ed with the appropriate value. That is, rather than passing in
+ the HMAC key and rehashing the initial block for every buffer, the
+ hashing of the initial block is done separately, and the results of
+ this hash are used as input in the job structure.
+
+ Similar to the expanded AES keys, the premise here is that one HMAC
+ key will apply to many buffers, so we want to do that hashing once
+ and not for each buffer.
+
+ The "status" reflects the status of the returned job. It should be
+ "STS_COMPLETED".
+
+ The "user_data" field is ignored. It can be used to attach
+ application data to the job object.
+
+Flushing Concerns
+ As long as jobs are coming in at a reasonable rate, jobs should be
+ returned at a reasonable rate. However, if there is a lull in the
+ arrival of new jobs, the last few jobs that were submitted tend to
+ stay in the MB_MGR until new jobs arrive. This might result in there
+ being an unreasonable latency for these jobs.
+
+ In this case, flush_job() should be used to complete processing on
+ these outstanding jobs and prevent them from having excessive
+ latency.
+
+ Exactly when and how to use flush_job() is up to the application,
+ and is a balancing act. The processing of flush_job() is less
+ efficient than that of submit_job(), so calling flush_job() too
+ often will lower the system efficiency. Conversely, calling
+ flush_job() too rarely may result in some jobs seeing excessive
+ latency.
+
+ There are several strategies that the application may employ for
+ flushing. One usage model is that there is a (thread-safe) queue
+ containing work items. One or more threads puts work onto this
+ queue, and one or more processing threads removes items from this
+ queue and processes them through the MB_MGR. In this usage, a simple
+ flushing strategy is that when the processing thread wants to do
+ more work, but the queue is empty, it then proceeds to flush jobs
+ until either the queue contains more work, or the MB_MGR no longer
+ contains jobs (i.e. that flush_job() returns NULL). A variation on
+ this is that when the work queue is empty, the processing thread
+ might pause for a short time to see if any new work appears, before
+ it starts flushing.
+
+ In other usage models, there may be no such queue. An alternate
+ flushing strategy is that have a separate "flush thread" hanging
+ around. It wakes up periodically and checks to see if any work has
+ been requested since the last time it woke up. If some period of
+ time has gone by with no new work appearing, it would proceed to
+ flush the MB_MGR.
+
+AES Key Usage
+ If the AES mode is CBC, then the fields aes_enc_key_expanded or
+ aes_dec_key_expanded are using depending on whether the data is
+ being encrypted or decrypted. However, if the AES mode is CNTR
+ (counter mode), then only aes_enc_key_expanded is used, even for a
+ decrypt operation.
+
+ The application can handle this dichotomy, or it might choose to
+ simply set both fields in all cases.
+
+Thread Safety
+ The MB_MGR and the associated functions ARE NOT thread safe. If
+ there are multiple threads that may be calling these functions
+ (e.g. a processing thread and a flushing thread), it is the
+ responsibility of the application to put in place sufficient locking
+ so that no two threads will make calls to the same MB_MGR object at
+ the same time.
+
+XMM Register Usage
+ The current implementation is designed for integration in the Linux
+ Kernel. All of the functions satisfy the Linux ABI with respect to
+ general purpose registers. However, the submit_job() and flush_job()
+ functions use XMM registers without saving/restoring any of them. It
+ is up to the application to manage the saving/restoring of XMM
+ registers itself.
+
+Auxiliary Functions
+ There are several auxiliary functions packed with MB_MGR. These may
+ be used, or the application may choose to use their own version. Two
+ of these, aes_keyexp_128() and aes_keyexp_256() expand AES keys into
+ a form that is acceptable for reference in the job structure.
+
+ In the case of AES128, the expanded key structure should be an array
+ of 11 128-bit words, aligned on a 16-byte boundary. In the case of
+ AES256, it should be an array of 15 128-bit words, aligned on a
+ 16-byte boundary.
+
+ There is also a function, sha1(), which will compute the SHA1 digest
+ of a single 64-byte block. It can be used to compute the ipad and
+ opad digests. There is a similar function, md5(), which can be used
+ when using MD5-HMAC.
+
+ For further details on the usage of these functions, see the sample
+ test application.