1 files changed, 575 insertions, 0 deletions
diff --git a/src/spdk/intel-ipsec-mb/ReleaseNotes.txt b/src/spdk/intel-ipsec-mb/ReleaseNotes.txt
new file mode 100644
index 000000000..950dfac21
--- /dev/null
+++ b/src/spdk/intel-ipsec-mb/ReleaseNotes.txt
@@ -0,0 +1,575 @@
+========================================================================
+Release Notes for Intel(R) Multi-Buffer Crypto for IPsec Library
+
+v0.53 October 2019
+========================================================================
+
+Library
+- AES-CCM performance optimizations done
+  - full assembly implementation
+  - authentication decoupled from cipher
+  - CCM chain order expected to be HASH_CIPHER for encryption and
+    CIPHER_HASH for decryption
+- AES-CTR implementation for VAES added
+- AES-CBC implementation for VAES added
+- Single buffer AES-GCM performance improvements added for VPCLMULQDQ + VAES
+- Multi-buffer AES-GCM implementation added for VPCLMULQDQ + VAES
+- Data transposition optimizations and unification across the library
+  implemented
+- Generation of make dependency files for Linux added
+- AES-ECB implementation added
+- PON specific stitched algorithm implementation added
+  - stitched AES-CTR-128 (optional) with CRC32 and BIP (running 32-bit XOR)
+- AES-CMAC-128 implementation for bit length messages added
+- ZUC-EEA3 and ZUC-EIA3 implementation added
+- FreeBSD experimental support added
+- KASUMI-F8 and KASUMI-F9 implementation added
+- SNOW3G-UEA2 and SNOW3G-UIA2 implementation added
+- AES-CTR implementation for bit length (128-NEA2/192-NEA2/256-NEA2) messages added
+- SAFE_PARAM, SAFE_DATA and SAFE_LOOKUP compile time options added.
+  Find more about these options in the README file or on-line at
+  https://github.com/intel/intel-ipsec-mb/blob/master/README.
+
+LibTestApp
+- New API tests added
+- CMAC test vectors extended
+- New chained operation tests added
+- Out-of-place chained operation tests added
+- AES-ECB tests added
+- PON algorithm tests added
+- Extra AES-CTR test vectors added
+- Extra AES-CBC test vectors added
+- AES-CMAC-128 bit length message tests added
+- CPU capability detection used to disable tests if instruction not present
+- ZUC-EEA3 and ZUC-EIA3 tests added
+- New cross architecture test application (ipsec_xvalid) added,
+  which mixes different implementations (based on different architectures),
+  to double check their correctness
+- SNOW3G-UEA2 and SNOW3G-UIA2 tests added
+- AES-CTR-128 bit length message tests added
+- Negative tests extended to cover all API's
+
+LibPerfApp
+- Job size and number of iterations options added
+- Single architecture test option added
+- AAD size option added
+- Allow zero length source buffer option added
+- Custom performance test combination added:
+  cipher-algo, hash-algo and aead-algo arguments.
+- Cipher direction option added
+- The maximum buffer size extended from 2K to 16K
+- Support for user defined range of job sizes added
+
+Fixes
+- Uninitialized memory reported by Valgrind fixed
+- Flush decryption job fixed (issue #33)
+- NULL_CIPHER order check removed (issue #30)
+- Save XMM registers when emulating AES fixed (issue #28)
+- SSE & AVX AES-CMAC fixed (issue #27)
+- Missing GCM pointers fixed for AES-NI emulation (issue #29)
+
+v0.52 December 2018
+========================================================================
+
+03 Dec, 2018
+
+General
+- Added AESNI emulation implementation
+- Added AES-GCM multi-buffer implementation for AVX512
+- Added flexible job chain order support
+- GCM submit and flush functions moved into architecture MB manager modules
+- AVX512/AVX2/AVX/SSE AAD GHASH computation performance improvement
+- GCM API's added to MB_MGR structure
+- Added plain SHA support in JOB API
+- Added architectural compiler optimizations for GCC/CC
+
+LibTestApp
+- Added option not to run GCM tests
+- Added AESNI emulation tests
+- Added plain SHA tests
+- Updated to take advantage of new GCM macros
+
+LibPerfApp
+- Buffer alignment update
+- Updated to take advantage of new GCM macros
+
+v0.51 September 2018
+========================================================================
+
+13 Sep, 2018
+
+General
+- AES-CMAC performance optimizations
+- Implemented store to load optimizations in
+  - AES-CMAC submit and flush jobs for SSE and AVX
+  - HMAC-MD5, HMAC-SHA submit jobs for AVX
+  - HMAC-MD5 submit job for AVX2
+- Added zero-sized message support in GCM
+- Stack execution flag disabled in new asm modules
+
+LibTestApp
+- Added AES vectors
+- Added DOCSIS AES vectors
+- Added CFB validation
+
+LibPerfApp
+- Smoke test option added
+
+v0.50 June 2018
+========================================================================
+
+13 Jun, 2018
+
+General
+- Added support for compile time and runtime library version checking
+- Added support for full MD5 digest size
+- Replaced defines for API with symbols for binary compatibility
+- Added HMAC-SHA & HMAC-MD5 vectors to LibTestApp
+- Added support for zero cipher length in AES-CCM
+- Added new API's to compute SHA1, SHA224, SHA256, SHA384 and SHA512 hashes
+  to support key reduction cases where key is longer than a block size
+- Extended support for HMAC full digest sizes for HMAC-SHA1, HMAC-SHA224,
+  HMAC-SHA256, HMAC-SHA384 and HMAC-SHA512. Previously only truncated sizes
+  were supported.
+- Added AES-CMAC support for output digest size between 4 and 16 bytes
+- Added GHASH support for output digest size up to 16 bytes
+- Optimized submit job API's with store to load optimization in SSE, AVX,
+  AVX2 (excluding MD5)
+- Improved performance application accuracy by increasing the number of
+  test iterations
+- Extended multi-thread features of LibPerfApp Windows version to match
+  Linux version of the application
+
+v0.49 March 2018
+========================================================================
+
+21 Mar, 2018
+
+General
+- AES-CMAC support added (AES-CMAC-128 and AES-CMAC-96)
+- 3DES support added
+- Library compiles to SO/DLL by default
+- Install/uninstall targets added to makefiles
+- Multiple API header files consolidated into one (intel-ipsec-mb.h)
+- Unhalted cycles support added to LibPerfApp (Linux at the moment)
+- ELF stack execute protection added for assembly files
+- VZEROUPPER instruction issued after AVX2/AVX512 code to avoid
+  expensive SSE<->AVX transitions
+- MAN page added
+- README documentation extensions and updates
+- AVX512 DES performance smoothed out
+- Multi-buffer manager instance allocate and free API's added
+- Core affinity support added in LibPerfApp
+
+v0.48 December 2017
+========================================================================
+
+12 Dec, 2017
+
+General
+- Linux SO compilation option added
+- Windows DLL compilation option added
+- AES CCM 128 support added
+- Multithread command line option added to LibPerfApp
+- Coding style fixes
+- Coding style target added to Makefile
+
+v0.47 October 2017
+========================================================================
+
+Oct 5, 2017
+
+Intel(R) AVX-512 Instructions
+- DES CBC AVX512 implementation
+- DOCSIS DES AVX512 implementation
+General
+- DES CBC cipher added (generic x86 implementation)
+- DOCSIS DES cipher added (generic x86 implementation)
+- DES and DOCSIS DES tests added
+- RPM SPEC file created
+
+v0.46 June 2017
+========================================================================
+
+Jun 27, 2017
+
+General
+- AES GCM optimizations for AVX2
+- Change of AES GCM API: renamed and expanded keys separated from the context
+- New AES GCM API via job structure and API's
+  -  use of the interface may simplify application design at the expense of
+     slightly lower performance vs direct AES GCM API's
+- AES GCM IV automatically padded with block counter (no need for application to do it)
+- IV in AES CTR mode can be 12 bytes (no block counter); 16 byte format still allowed
+- Macros added to ease access to job API for specific architecture
+  - use of these macros can simplify application design but it may produce worse
+    performance than calling architecture job API's directly
+- Submit_job_nocheck() API added to gain some cycles by not validating job structure
+- Result stability improvements in LibPerfApp
+
+v0.45 March 2017
+========================================================================
+
+Mar 29, 2017
+
+Intel(R) AVX-512 Instructions
+- Added optimized HMAC-SHA224 and HMAC-SHA256
+- Added optimized HMAC-SHA384 and HMAC-SHA512
+General
+- Windows x64 compilation target
+- New DOCSIS SEC BPI V3.1 cipher
+- GCM128 and GCM256 updates (with new API that is scatter gather list friendly)
+- GCM192 added
+- Added library API benchmark tool 'ipsec_perf' and
+  script to compare results 'ipsec_diff_tool.py'
+Bug Fixes (vs v0.44)
+- AES CTR mode fix to allow message size not to be multiple of AES block size
+- RSI and RDI registers clobbered when running HMAC-SHA224 or HMAC-SHA256
+  on Windows using SHA extensions
+
+v0.44 November 2016
+========================================================================
+
+Nov 21, 2016
+
+Intel(R) AVX-512 Instructions
+- AVX512 multi buffer manager added (uses AVX2 implementations by default)
+- Optimized SHA1 implementation added
+Intel(R) SHA Extensions
+- SHA1, SHA224 and SHA256 implementations added for Intel(R) SSE
+General
+- NULL cipher added
+- NULL hash added
+- NASM tool chain compilation added (default)
+
+=======================================
+Feb 11, 2015
+
+Fixed, so that the job auth_tag_output_len_in_bytes takes a different
+value for different MAC types. In particular, the valid values are (in bytes):
+SHA1 - 12
+SHA224 - 14
+SHA256 - 16
+SHA384 - 24
+SHA512 - 32
+XCBC - 12
+MD5 - 12
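+
+A minimal sketch of the table above as a C lookup. The enum and function
+names (toy_mac_type, toy_auth_tag_len) are hypothetical and are not taken
+from the library headers; only the byte counts come from the list above.
+
+    #include <stddef.h>
+
+    /* Hypothetical MAC identifiers, for illustration only. */
+    enum toy_mac_type {
+        TOY_MAC_SHA1, TOY_MAC_SHA224, TOY_MAC_SHA256, TOY_MAC_SHA384,
+        TOY_MAC_SHA512, TOY_MAC_XCBC, TOY_MAC_MD5, TOY_MAC_COUNT
+    };
+
+    /* Valid auth_tag_output_len_in_bytes for each MAC type listed above.
+     * Returns 0 for an unknown MAC type. */
+    static size_t toy_auth_tag_len(enum toy_mac_type mac)
+    {
+        static const size_t len[TOY_MAC_COUNT] = {
+            [TOY_MAC_SHA1]   = 12,
+            [TOY_MAC_SHA224] = 14,
+            [TOY_MAC_SHA256] = 16,
+            [TOY_MAC_SHA384] = 24,
+            [TOY_MAC_SHA512] = 32,
+            [TOY_MAC_XCBC]   = 12,
+            [TOY_MAC_MD5]    = 12
+        };
+
+        return ((unsigned)mac < TOY_MAC_COUNT) ? len[mac] : 0;
+    }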
+
+=======================================
+Oct 24, 2011
+
+SHA_256 added to multibuffer
+------------------------
+12 Aug 2011
+
+API
+
+  The GCM API is distinct from the Multi-buffer API. This is because
+  the GCM code is an optimized single-buffer implementation. By
+  packaging them separately, the application has the option of where,
+  when, and how to call the GCM code, independent of how it is calling
+  the multi-buffer code.
+
+  For example, the application might be enqueuing multi-buffer requests
+  for a separate thread to process. In this scenario, if a particular
+  packet used GCM, then the application could choose whether to call
+  the GCM routines directly, or whether to enqueue those requests and
+  have the compute thread call the GCM routines.
+
+GCM API
+
+  The GCM functions are defined as described in the header
+  files. They are simple computational routines, with no state
+  associated with them.
+
+Multi-Buffer API: Two Sets of Functions
+
+  There are two parallel interfaces, one suffixed with "_sse" and one
+  suffixed with "_avx". These are functionally equivalent. The "_sse"
+  functions work on Westmere (WSM) and later processors. The "_avx"
+  functions offer better performance, but only run on processors after WSM.
+
+  The same interface object structures are used for both sets of
+  interfaces, although one cannot mix the two interfaces on the same
+  initialized object (e.g. it would be wrong to initialize with
+  init_mb_mgr_sse() and then to pass that to submit_job_avx() ). After
+  the MB_MGR structure has been initialized with one of the two
+  initialization functions (init_mb_mgr_sse() or init_mb_mgr_avx()),
+  only the corresponding functions should be used on it.
+
+  There are several ways in which an application could use these
+  interfaces.
+
+  1) Direct
+     If an application is only going to be run on a post-WSM machine,
+     it can just call the "_avx" functions directly. Conversely, if it
+     is just going to be run on WSM machines, it can call the "_sse"
+     functions directly.
+
+  2) Via Branches
+     If an application can run on both WSM and Sandy Bridge (SNB) and
+     wants the improved performance on SNB, then it can use some method
+     to determine if it is on SNB, and then use a conditional branch to
+     determine which function to call. E.g. this could be wrapped in an
+     expression macro (so the returned job is still usable) such as:
+     #define submit_job(mb_mgr) \
+        (_use_avx ? submit_job_avx(mb_mgr) \
+                  : submit_job_sse(mb_mgr))
+
+  3) Via a Function Table
+     One can embed the function addresses into a structure, call them
+     through this structure, and change the structure based on which
+     set of functions one wishes to use, e.g.
+
+        struct funcs_t {
+            init_mb_mgr_t       init_mb_mgr;
+            get_next_job_t      get_next_job;
+            submit_job_t        submit_job;
+            get_completed_job_t get_completed_job;
+            flush_job_t         flush_job;
+        };
+        
+        struct funcs_t funcs_sse = {
+            init_mb_mgr_sse,
+            get_next_job_sse,
+            submit_job_sse,
+            get_completed_job_sse,
+            flush_job_sse
+        };
+        struct funcs_t funcs_avx = {
+            init_mb_mgr_avx,
+            get_next_job_avx,
+            submit_job_avx,
+            get_completed_job_avx,
+            flush_job_avx
+        };
+        struct funcs_t *funcs = &funcs_sse;
+        ...
+        if (do_avx)
+            funcs = &funcs_avx;
+        ...
+        funcs->init_mb_mgr(&mb_mgr);
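+
+  A minimal sketch of choosing a table at run time. It assumes a GCC or
+  Clang compiler on x86, where the __builtin_cpu_supports() builtin can
+  report AVX availability; on any other target the sketch falls back to
+  SSE. The toy_* names are stand-ins for illustration and are not the
+  library's types.
+
+        #include <stddef.h>
+
+        /* Stand-in for the function table; one entry is enough here. */
+        struct toy_funcs {
+            const char *(*name)(void);
+        };
+
+        static const char *toy_name_sse(void) { return "sse"; }
+        static const char *toy_name_avx(void) { return "avx"; }
+
+        static const struct toy_funcs toy_funcs_sse = { toy_name_sse };
+        static const struct toy_funcs toy_funcs_avx = { toy_name_avx };
+
+        /* Return 1 when the compiler builtin reports AVX, else 0. */
+        static int toy_cpu_has_avx(void)
+        {
+        #if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
+            __builtin_cpu_init();
+            return __builtin_cpu_supports("avx") != 0;
+        #else
+            return 0;   /* unknown compiler or CPU: stay on SSE */
+        #endif
+        }
+
+        /* Pick the AVX table only when AVX was detected. */
+        static const struct toy_funcs *toy_select_funcs(void)
+        {
+            return toy_cpu_has_avx() ? &toy_funcs_avx : &toy_funcs_sse;
+        }
+
+  The selection is done once, before the MB_MGR is initialized, so that
+  the same table is used for every later call on that object.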
+
+  For simplicity in the rest of this document, the functions will be
+  referred to without a suffix.
+
+API: Overview
+
+  The basic unit of work is a "job". It is represented by a
+  JOB_AES_HMAC structure. It contains all of the information needed to
+  perform encryption/decryption and SHA1/HMAC authentication on one
+  buffer for IPSec processing.
+
+  The basic paradigm is that the application needs to be able to
+  provide new jobs before old jobs have completed processing. One
+  might call this an "asynchronous" interface. 
+
+  The basic interface is that the application "submits" a job to the
+  multi-buffer manager (MB_MGR), and it may receive a completed job
+  back, or it may receive NULL. The returned job, if there is one,
+  will not be the same as the submitted job, but the jobs will be
+  returned in the same order in which they are submitted.
+
+  Since there can be a semi-arbitrary number of outstanding jobs,
+  management of the job object is handled by the MB_MGR. The
+  application gets a pointer to a new job object by calling
+  get_next_job(). It then fills in the data fields and submits it by
+  calling submit_job(). If a job is returned, then that job has been
+  completed, and the application should do whatever it needs to do in
+  order to further process that buffer. 
+
+  The job object is not explicitly returned to the MB_MGR. Rather it
+  is implicitly returned by the next call to get_next_job(). Another
+  way to put this is that the data within the job object is
+  guaranteed to be valid until the next call to get_next_job().
+
+  In order to reduce latency, there is an optional function that may
+  be called, get_completed_job(). This returns the next job if that
+  job has previously been completed. But if that job has not been
+  completed, no processing is done, and the function returns
+  NULL. This may be used to reduce the number of outstanding jobs
+  within the MB_MGR.
+
+  At times, it may be necessary to process the jobs currently within
+  the MB_MGR without providing new jobs as input. This process is
+  called "flushing", and it is invoked by calling flush_job(). If
+  there are any jobs within the MB_MGR, this will complete processing
+  on the earliest job and return it. It will only return NULL if there
+  are no jobs within the MB_MGR.
+
+  Flushing will be described in more detail below.
+
+  The presumption is that the same AES key will apply to a number of
+  buffers. For increased efficiency, it requires that the AES key
+  expansion happens as a distinct step apart from buffer
+  encryption/decryption. The expanded keys are stored in a data
+  structure (array), and this expanded key structure is used by the
+  job object.
+
+  There are two variants provided, MB_MGR and MB_MGR2. They are
+  functionally equivalent. The reason that two are provided is that
+  they differ slightly in their implementation, and so they may have
+  slightly different characteristics in terms of latency and overhead.
+
+API: Usage Skeleton
+  The basic usage is illustrated in the following pseudo-code:
+
+    init_mb_mgr(&mb_mgr);
+    ...
+    aes_keyexp_128(key, enc_exp_keys, dec_exp_keys);
+    ...
+    while (work_to_be_done) {
+        job = get_next_job(&mb_mgr);
+        // TODO: Fill in job fields
+        job = submit_job(&mb_mgr);
+        while (job) {
+            // TODO: Complete processing on job
+            job = get_completed_job(&mb_mgr);
+        }
+    }
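+
+  The submit/return behaviour described above can be illustrated with a
+  toy model. This is a sketch only and is not how the library is
+  implemented: it assumes 4 lanes, an 8-entry job ring, and jobs of equal
+  length, so that a full set of lanes completes together. All toy_* names
+  are hypothetical.
+
+    #include <stddef.h>
+
+    #define TOY_LANES    4   /* assumed number of buffers processed together */
+    #define TOY_MAX_JOBS 8   /* ring size, larger than TOY_LANES */
+
+    struct toy_job { int id; int done; };
+
+    struct toy_mgr {
+        struct toy_job jobs[TOY_MAX_JOBS];
+        unsigned next;       /* slot handed out by toy_get_next_job() */
+        unsigned earliest;   /* oldest job not yet returned */
+    };
+
+    /* Hand out the next free job object; the manager owns the storage. */
+    static struct toy_job *toy_get_next_job(struct toy_mgr *m)
+    {
+        return &m->jobs[m->next % TOY_MAX_JOBS];
+    }
+
+    /* Return the oldest job only if it is already complete; do no work. */
+    static struct toy_job *toy_get_completed_job(struct toy_mgr *m)
+    {
+        struct toy_job *j = &m->jobs[m->earliest % TOY_MAX_JOBS];
+
+        if (m->earliest == m->next || !j->done)
+            return NULL;
+        m->earliest++;
+        return j;
+    }
+
+    /* Queue the job last handed out. Once all lanes are busy, complete
+     * them together, then return the oldest job if it is complete. */
+    static struct toy_job *toy_submit_job(struct toy_mgr *m)
+    {
+        unsigned i, busy = 0;
+
+        m->jobs[m->next % TOY_MAX_JOBS].done = 0;
+        m->next++;
+        for (i = m->earliest; i != m->next; i++)
+            busy += !m->jobs[i % TOY_MAX_JOBS].done;
+        if (busy == TOY_LANES)
+            for (i = m->earliest; i != m->next; i++)
+                m->jobs[i % TOY_MAX_JOBS].done = 1;
+        return toy_get_completed_job(m);
+    }
+
+    /* Force completion of the earliest job; NULL only when none queued. */
+    static struct toy_job *toy_flush_job(struct toy_mgr *m)
+    {
+        if (m->earliest != m->next)
+            m->jobs[m->earliest % TOY_MAX_JOBS].done = 1;
+        return toy_get_completed_job(m);
+    }
+
+  In this model the first three submissions return NULL, the fourth
+  returns the first job, and the remaining completed jobs are collected
+  in submission order through toy_get_completed_job().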
+
+API: Job Fields
+  The mode is determined by the fields "cipher_direction" and
+  "chain_order". The first specifies encrypt or decrypt, and the
+  second specifies whether the hash should be done before or
+  after the cipher operation.
+  In the current implementation, only two combinations of these are
+  supported. For encryption, these should be set to "ENCRYPT" and
+  "CIPHER_HASH", and for decryption, these should be set to "DECRYPT"
+  and "HASH_CIPHER".
+
+  The expanded keys are pointed to by "aes_enc_key_expanded" and
+  "aes_dec_key_expanded". These arrays must be aligned on a 16-byte
+  boundary. Only one of these is necessary (as determined by
+  "cipher_direction"). 
+
+  One selects AES128 vs AES256 by using the "aes_key_len_in_bytes"
+  field. The only valid values are 16 (AES128) and 32 (AES256).
+
+  One selects the AES mode (CBC versus counter-mode) using
+  "cipher_mode".
+
+  One selects the hash algorithm (SHA1-HMAC, AES-XCBC, or MD5-HMAC)
+  using "hash_alg".
+
+  The data to be encrypted/decrypted is defined by
+  "src + cipher_start_src_offset_in_bytes". The length of data is
+  given by "msg_len_to_cipher_in_bytes". It must be a multiple of
+  16 bytes.
+
+  The destination for the cipher operation is given by "dst" (NOT by
+  "dst + cipher_start_src_offset_in_bytes"). In many/most applications,
+  the destination pointer may overlap the source pointer. That is,
+  "dst" may be equal to "src + cipher_start_src_offset_in_bytes".
+
+  The IV for the cipher operation is given by "iv". The
+  "iv_len_in_bytes" should be 16. This pointer does not need to be
+  aligned. 
+
+  The data to be hashed is defined by
+  "src + hash_start_src_offset_in_bytes". The length of data is
+  given by "msg_len_to_hash_in_bytes".
+
+  The output of the hash operation is defined by
+  "auth_tag_output". The number of bytes written is given by
+  "auth_tag_output_len_in_bytes". Currently the only valid value for
+  this parameter is 12.
+
+  The ipad and opad are given as the result of hashing the HMAC key
+  xor'ed with the appropriate value. That is, rather than passing in
+  the HMAC key and rehashing the initial block for every buffer, the
+  hashing of the initial block is done separately, and the results of
+  this hash are used as input in the job structure.
+
+  Similar to the expanded AES keys, the premise here is that one HMAC
+  key will apply to many buffers, so we want to do that hashing once
+  and not for each buffer.
+
+  The "status" reflects the status of the returned job. It should be
+  "STS_COMPLETED". 
+
+  The "user_data" field is ignored by the library. It can be used to attach
+  application data to the job object.
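+
+  The numeric constraints stated in this section can be gathered into a
+  single check. The sketch below is not the library's own validation: the
+  structure toy_job_fields is a hypothetical subset of the job object, and
+  the field types are assumed. Only the field names and the limits come
+  from the text above.
+
+    #include <stddef.h>
+
+    /* Hypothetical subset of the job fields described above. */
+    struct toy_job_fields {
+        size_t aes_key_len_in_bytes;
+        size_t msg_len_to_cipher_in_bytes;
+        size_t iv_len_in_bytes;
+        size_t auth_tag_output_len_in_bytes;
+    };
+
+    /* Return 1 when the fields satisfy the constraints stated above. */
+    static int toy_job_fields_ok(const struct toy_job_fields *f)
+    {
+        if (f->aes_key_len_in_bytes != 16 && f->aes_key_len_in_bytes != 32)
+            return 0;   /* only AES128 and AES256 are valid */
+        if (f->msg_len_to_cipher_in_bytes % 16 != 0)
+            return 0;   /* cipher length must be a multiple of 16 */
+        if (f->iv_len_in_bytes != 16)
+            return 0;   /* IV length should be 16 */
+        if (f->auth_tag_output_len_in_bytes != 12)
+            return 0;   /* only 12-byte tags are valid here */
+        return 1;
+    }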
+
+Flushing Concerns
+  As long as jobs are coming in at a reasonable rate, jobs should be
+  returned at a reasonable rate. However, if there is a lull in the
+  arrival of new jobs, the last few jobs that were submitted tend to
+  stay in the MB_MGR until new jobs arrive. This might result in there
+  being an unreasonable latency for these jobs.
+
+  In this case, flush_job() should be used to complete processing on
+  these outstanding jobs and prevent them from having excessive
+  latency.
+
+  Exactly when and how to use flush_job() is up to the application,
+  and is a balancing act. The processing of flush_job() is less
+  efficient than that of submit_job(), so calling flush_job() too
+  often will lower the system efficiency. Conversely, calling
+  flush_job() too rarely may result in some jobs seeing excessive
+  latency. 
+
+  There are several strategies that the application may employ for
+  flushing. One usage model is that there is a (thread-safe) queue
+  containing work items. One or more threads put work onto this
+  queue, and one or more processing threads remove items from this
+  queue and processes them through the MB_MGR. In this usage, a simple
+  flushing strategy is that when the processing thread wants to do
+  more work, but the queue is empty, it then proceeds to flush jobs
+  until either the queue contains more work, or the MB_MGR no longer
+  contains jobs (i.e. flush_job() returns NULL). A variation on
+  this is that when the work queue is empty, the processing thread
+  might pause for a short time to see if any new work appears, before
+  it starts flushing.
+
+  In other usage models, there may be no such queue. An alternate
+  flushing strategy is to have a separate "flush thread" hanging
+  around. It wakes up periodically and checks to see if any work has
+  been requested since the last time it woke up. If some period of
+  time has gone by with no new work appearing, it would proceed to
+  flush the MB_MGR.
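+
+  The timer-based strategy can be reduced to a small decision function.
+  This is a sketch under assumptions: the bookkeeping structure, its
+  field names and the millisecond clock are all hypothetical and are kept
+  by the application, not by the MB_MGR.
+
+    #include <stdint.h>
+
+    /* Hypothetical bookkeeping kept by the application. */
+    struct toy_flush_state {
+        uint64_t last_submit_ms;   /* time of the most recent submit_job() */
+        uint64_t idle_limit_ms;    /* idle period tolerated before flushing */
+        unsigned outstanding;      /* jobs submitted but not yet returned */
+    };
+
+    /* Return 1 when the flush thread should start calling flush_job(). */
+    static int toy_should_flush(const struct toy_flush_state *s,
+                                uint64_t now_ms)
+    {
+        if (s->outstanding == 0)
+            return 0;   /* nothing is waiting inside the manager */
+        return now_ms - s->last_submit_ms >= s->idle_limit_ms;
+    }
+
+  A larger idle limit favours throughput, a smaller one favours latency.
+  Any thread that acts on the result still needs the locking described
+  under "Thread Safety".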
+
+AES Key Usage
+  If the AES mode is CBC, then the fields aes_enc_key_expanded or
+  aes_dec_key_expanded are used depending on whether the data is
+  being encrypted or decrypted. However, if the AES mode is CNTR
+  (counter mode), then only aes_enc_key_expanded is used, even for a
+  decrypt operation. 
+
+  The application can handle this dichotomy, or it might choose to
+  simply set both fields in all cases.
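+
+  The rule above can be written as a small helper. The enum and function
+  names are hypothetical and do not come from the library headers; the
+  sketch only encodes which expanded key each mode and direction reads.
+
+    /* Hypothetical identifiers, for illustration only. */
+    enum toy_cipher_mode { TOY_CBC, TOY_CNTR };
+    enum toy_direction   { TOY_ENCRYPT, TOY_DECRYPT };
+
+    /* Return the expanded key that the given mode and direction use:
+     * counter mode always uses the encryption keys, CBC uses the
+     * decryption keys only when decrypting. */
+    static const void *toy_select_key(enum toy_cipher_mode mode,
+                                      enum toy_direction dir,
+                                      const void *enc_keys,
+                                      const void *dec_keys)
+    {
+        if (mode == TOY_CNTR || dir == TOY_ENCRYPT)
+            return enc_keys;
+        return dec_keys;
+    }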
+
+Thread Safety
+  The MB_MGR and the associated functions ARE NOT thread safe. If
+  there are multiple threads that may be calling these functions
+  (e.g. a processing thread and a flushing thread), it is the
+  responsibility of the application to put in place sufficient locking
+  so that no two threads will make calls to the same MB_MGR object at
+  the same time.
+
+XMM Register Usage
+  The current implementation is designed for integration in the Linux
+  Kernel. All of the functions satisfy the Linux ABI with respect to
+  general purpose registers. However, the submit_job() and flush_job()
+  functions use XMM registers without saving/restoring any of them. It
+  is up to the application to manage the saving/restoring of XMM
+  registers itself.
+
+Auxiliary Functions
+  There are several auxiliary functions packed with MB_MGR. These may
+  be used, or the application may choose to use its own versions. Two
+  of these, aes_keyexp_128() and aes_keyexp_256() expand AES keys into
+  a form that is acceptable for reference in the job structure. 
+
+  In the case of AES128, the expanded key structure should be an array
+  of 11 128-bit words, aligned on a 16-byte boundary. In the case of
+  AES256, it should be an array of 15 128-bit words, aligned on a
+  16-byte boundary. 
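+
+  One possible way to declare storage with these sizes and alignment,
+  assuming a C11 compiler that provides <stdalign.h>. The structure and
+  helper names are hypothetical and are not the library's types.
+
+    #include <stdalign.h>
+    #include <stdint.h>
+
+    #define TOY_AES_WORD 16   /* one 128-bit word, in bytes */
+
+    /* AES128: 11 round keys per direction. */
+    struct toy_aes128_keys {
+        alignas(16) uint8_t enc[11 * TOY_AES_WORD];
+        alignas(16) uint8_t dec[11 * TOY_AES_WORD];
+    };
+
+    /* AES256: 15 round keys per direction. */
+    struct toy_aes256_keys {
+        alignas(16) uint8_t enc[15 * TOY_AES_WORD];
+        alignas(16) uint8_t dec[15 * TOY_AES_WORD];
+    };
+
+    /* Return 1 when p sits on a 16-byte boundary. */
+    static int toy_is_aligned16(const void *p)
+    {
+        return ((uintptr_t)p % 16) == 0;
+    }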
+
+  There is also a function, sha1(), which will compute the SHA1 digest
+  of a single 64-byte block. It can be used to compute the ipad and
+  opad digests. There is a similar function, md5(), which can be used
+  when using MD5-HMAC.
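+
+  A sketch of preparing the two 64-byte blocks whose single-block digests
+  serve as the ipad and opad values. It follows the standard HMAC
+  construction (key XOR 0x36 and key XOR 0x5c) and assumes a key no longer
+  than one block. The function name is hypothetical, and the call to the
+  digest function is left out because its signature is not given here.
+
+    #include <stddef.h>
+    #include <stdint.h>
+    #include <string.h>
+
+    #define TOY_BLOCK 64   /* SHA1 and MD5 block size in bytes */
+
+    /* Fill ipad_block and opad_block with the zero-padded key XOR'ed
+     * with 0x36 and 0x5c. Returns 0 on success, or -1 when the key is
+     * longer than one block and would need to be hashed first. */
+    static int toy_hmac_pad_blocks(const uint8_t *key, size_t key_len,
+                                   uint8_t ipad_block[TOY_BLOCK],
+                                   uint8_t opad_block[TOY_BLOCK])
+    {
+        size_t i;
+
+        if (key_len > TOY_BLOCK)
+            return -1;
+        memset(ipad_block, 0x36, TOY_BLOCK);
+        memset(opad_block, 0x5c, TOY_BLOCK);
+        for (i = 0; i < key_len; i++) {
+            ipad_block[i] ^= key[i];
+            opad_block[i] ^= key[i];
+        }
+        return 0;
+    }
+
+  Each block would then be passed once to the single-block digest
+  function, and the two results reused for every buffer that shares the
+  HMAC key.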
+
+  For further details on the usage of these functions, see the sample
+  test application.