Integer Flag ADX Arithmetic Add unsigned 32-bit integers "a" and "b" with unsigned 8-bit carry-in "c_in" (carry or overflow flag), and store the unsigned 32-bit result in "out", and the carry-out in "dst" (carry or overflow flag). tmp[32:0] := a[31:0] + b[31:0] + (c_in > 0 ? 1 : 0) MEM[out+31:out] := tmp[31:0] dst[0] := tmp[32] dst[7:1] := 0
Integer Flag ADX Arithmetic Add unsigned 64-bit integers "a" and "b" with unsigned 8-bit carry-in "c_in" (carry or overflow flag), and store the unsigned 64-bit result in "out", and the carry-out in "dst" (carry or overflow flag). tmp[64:0] := a[63:0] + b[63:0] + (c_in > 0 ? 1 : 0) MEM[out+63:out] := tmp[63:0] dst[0] := tmp[64] dst[7:1] := 0
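The two add-with-carry records above can be modelled in a few lines. The following is a minimal sketch, not the hardware implementation: the function names `addcarry` and `add_multiword` are hypothetical, registers are represented as plain Python integers, and the multi-limb helper is an illustrative use (chaining the carry-out of one limb into the carry-in of the next) that the records themselves do not describe.

```python
def addcarry(a, b, c_in, bits=32):
    """Model of the pseudocode: returns (carry_out, result) for a given width."""
    mask = (1 << bits) - 1
    # tmp holds bits+1 bits; any nonzero c_in counts as a carry of 1.
    tmp = (a & mask) + (b & mask) + (1 if c_in > 0 else 0)
    return (tmp >> bits) & 1, tmp & mask


def add_multiword(x, y, bits=32):
    """Add two equal-length little-endian limb lists by chaining the carry."""
    carry, out = 0, []
    for limb_x, limb_y in zip(x, y):
        carry, limb = addcarry(limb_x, limb_y, carry, bits)
        out.append(limb)
    return carry, out
```

Passing `bits=64` gives the 64-bit variant; the only difference between the two records is the operand width.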
Integer AES Cryptography Perform one round of an AES encryption flow on data (state) in "a" using the round key in "RoundKey", and store the result in "dst". a[127:0] := ShiftRows(a[127:0]) a[127:0] := SubBytes(a[127:0]) a[127:0] := MixColumns(a[127:0]) dst[127:0] := a[127:0] XOR RoundKey[127:0]
Integer AES Cryptography Perform the last round of an AES encryption flow on data (state) in "a" using the round key in "RoundKey", and store the result in "dst". a[127:0] := ShiftRows(a[127:0]) a[127:0] := SubBytes(a[127:0]) dst[127:0] := a[127:0] XOR RoundKey[127:0]
Integer AES Cryptography Perform one round of an AES decryption flow on data (state) in "a" using the round key in "RoundKey", and store the result in "dst". a[127:0] := InvShiftRows(a[127:0]) a[127:0] := InvSubBytes(a[127:0]) a[127:0] := InvMixColumns(a[127:0]) dst[127:0] := a[127:0] XOR RoundKey[127:0]
Integer AES Cryptography Perform the last round of an AES decryption flow on data (state) in "a" using the round key in "RoundKey", and store the result in "dst". a[127:0] := InvShiftRows(a[127:0]) a[127:0] := InvSubBytes(a[127:0]) dst[127:0] := a[127:0] XOR RoundKey[127:0]
Integer AES Cryptography Perform the InvMixColumns transformation on "a" and store the result in "dst". dst[127:0] := InvMixColumns(a[127:0])
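The MixColumns and InvMixColumns steps named in the AES records above are column-wise matrix multiplications over GF(2^8). The sketch below is a plain software model under stated assumptions: the 128-bit state is a 16-byte sequence with byte 0 in bits 7:0, each column is four consecutive bytes, and the helper names (`gf_mul`, `mix_column`, `inv_mix_columns`) are hypothetical. The coefficients are the standard AES ones; nothing here is taken from the hardware.

```python
MIX = (0x02, 0x03, 0x01, 0x01)      # first row of the MixColumns matrix
INV_MIX = (0x0E, 0x0B, 0x0D, 0x09)  # first row of the InvMixColumns matrix


def gf_mul(x, y):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    r = 0
    for _ in range(8):
        if y & 1:
            r ^= x
        x <<= 1
        if x & 0x100:
            x ^= 0x11B
        y >>= 1
    return r


def mix_column(col, coeffs):
    """Multiply a 4-byte column by the circulant matrix with first row coeffs."""
    out = []
    for r in range(4):
        acc = 0
        for c in range(4):
            acc ^= gf_mul(coeffs[(c - r) % 4], col[c])
        out.append(acc)
    return out


def inv_mix_columns(state):
    """Apply InvMixColumns to a 16-byte state, one 4-byte column at a time."""
    out = []
    for c in range(0, 16, 4):
        out += mix_column(list(state[c:c + 4]), INV_MIX)
    return bytes(out)
```

Applying `mix_column(..., MIX)` to each column and then `inv_mix_columns` to the result returns the original state, which is a convenient self-check.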
Integer AES Cryptography Assist in expanding the AES cipher key by computing steps towards generating a round key for the encryption cipher using data from "a" and an 8-bit round constant specified in "imm8", and store the result in "dst". X3[31:0] := a[127:96] X2[31:0] := a[95:64] X1[31:0] := a[63:32] X0[31:0] := a[31:0] RCON[31:0] := ZeroExtend32(imm8[7:0]) dst[31:0] := SubWord(X1) dst[63:32] := RotWord(SubWord(X1)) XOR RCON dst[95:64] := SubWord(X3) dst[127:96] := RotWord(SubWord(X3)) XOR RCON
Tile Floating Point AMXBF16 Application-Targeted Compute dot-product of BF16 (16-bit) floating-point pairs in tiles "a" and "b", accumulating the intermediate single-precision (32-bit) floating-point elements with elements in "dst", and store the 32-bit result back to tile "dst". FOR m := 0 TO dst.rows - 1 tmp := dst.row[m] FOR k := 0 TO (a.colsb / 4) - 1 FOR n := 0 TO (dst.colsb / 4) - 1 tmp.fp32[n] += FP32(a.row[m].bf16[2*k+0]) * FP32(b.row[k].bf16[2*n+0]) tmp.fp32[n] += FP32(a.row[m].bf16[2*k+1]) * FP32(b.row[k].bf16[2*n+1]) ENDFOR ENDFOR write_row_and_zero(dst, m, tmp, dst.colsb) ENDFOR zero_upper_rows(dst, dst.rows) zero_tileconfig_start()
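The BF16 tile dot-product loop above can be traced with a small software model. This is a sketch under assumptions: tiles are nested Python lists, BF16 elements are given as raw 16-bit patterns, the helper names are hypothetical, and arithmetic uses Python's double-precision floats, so rounding can differ from the single-precision accumulation the record describes. The conversion relies on BF16 being the upper 16 bits of an IEEE 754 binary32 value.

```python
import struct


def bf16_to_float(bits):
    """Widen a raw BF16 bit pattern by placing it in the top half of a binary32."""
    return struct.unpack("<f", struct.pack("<I", (bits & 0xFFFF) << 16))[0]


def tile_dp_bf16(dst, a, b):
    """dst: M x N floats; a: M x 2K BF16 patterns; b: K x 2N BF16 patterns."""
    for m in range(len(dst)):
        for k in range(len(a[m]) // 2):
            for n in range(len(dst[m])):
                # Each step consumes one BF16 pair from a and one pair from b.
                dst[m][n] += bf16_to_float(a[m][2 * k]) * bf16_to_float(b[k][2 * n])
                dst[m][n] += bf16_to_float(a[m][2 * k + 1]) * bf16_to_float(b[k][2 * n + 1])
    return dst
```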
Tile AMXINT8 Application-Targeted Compute dot-product of bytes in tiles with a source/destination accumulator. Multiply groups of 4 adjacent pairs of signed 8-bit integers in "a" with corresponding unsigned 8-bit integers in "b", producing 4 intermediate 32-bit results. Sum these 4 results with the corresponding 32-bit integer in "dst", and store the 32-bit result back to tile "dst". DEFINE DPBD(c, x, y) { tmp1 := SignExtend32(x.byte[0]) * ZeroExtend32(y.byte[0]) tmp2 := SignExtend32(x.byte[1]) * ZeroExtend32(y.byte[1]) tmp3 := SignExtend32(x.byte[2]) * ZeroExtend32(y.byte[2]) tmp4 := SignExtend32(x.byte[3]) * ZeroExtend32(y.byte[3]) RETURN c + tmp1 + tmp2 + tmp3 + tmp4 } FOR m := 0 TO dst.rows - 1 tmp := dst.row[m] FOR k := 0 TO (a.colsb / 4) - 1 FOR n := 0 TO (dst.colsb / 4) - 1 tmp.dword[n] := DPBD(tmp.dword[n], a.row[m].dword[k], b.row[k].dword[n]) ENDFOR ENDFOR write_row_and_zero(dst, m, tmp, dst.colsb) ENDFOR zero_upper_rows(dst, dst.rows) zero_tileconfig_start()
Tile AMXINT8 Application-Targeted Compute dot-product of bytes in tiles with a source/destination accumulator. Multiply groups of 4 adjacent pairs of unsigned 8-bit integers in "a" with corresponding signed 8-bit integers in "b", producing 4 intermediate 32-bit results. Sum these 4 results with the corresponding 32-bit integer in "dst", and store the 32-bit result back to tile "dst". DEFINE DPBD(c, x, y) { tmp1 := ZeroExtend32(x.byte[0]) * SignExtend32(y.byte[0]) tmp2 := ZeroExtend32(x.byte[1]) * SignExtend32(y.byte[1]) tmp3 := ZeroExtend32(x.byte[2]) * SignExtend32(y.byte[2]) tmp4 := ZeroExtend32(x.byte[3]) * SignExtend32(y.byte[3]) RETURN c + tmp1 + tmp2 + tmp3 + tmp4 } FOR m := 0 TO dst.rows - 1 tmp := dst.row[m] FOR k := 0 TO (a.colsb / 4) - 1 FOR n := 0 TO (dst.colsb / 4) - 1 tmp.dword[n] := DPBD(tmp.dword[n], a.row[m].dword[k], b.row[k].dword[n]) ENDFOR ENDFOR write_row_and_zero(dst, m, tmp, dst.colsb) ENDFOR zero_upper_rows(dst, dst.rows) zero_tileconfig_start()
Tile AMXINT8 Application-Targeted Compute dot-product of bytes in tiles with a source/destination accumulator. Multiply groups of 4 adjacent pairs of unsigned 8-bit integers in "a" with corresponding unsigned 8-bit integers in "b", producing 4 intermediate 32-bit results. Sum these 4 results with the corresponding 32-bit integer in "dst", and store the 32-bit result back to tile "dst". DEFINE DPBD(c, x, y) { tmp1 := ZeroExtend32(x.byte[0]) * ZeroExtend32(y.byte[0]) tmp2 := ZeroExtend32(x.byte[1]) * ZeroExtend32(y.byte[1]) tmp3 := ZeroExtend32(x.byte[2]) * ZeroExtend32(y.byte[2]) tmp4 := ZeroExtend32(x.byte[3]) * ZeroExtend32(y.byte[3]) RETURN c + tmp1 + tmp2 + tmp3 + tmp4 } FOR m := 0 TO dst.rows - 1 tmp := dst.row[m] FOR k := 0 TO (a.colsb / 4) - 1 FOR n := 0 TO (dst.colsb / 4) - 1 tmp.dword[n] := DPBD(tmp.dword[n], a.row[m].dword[k], b.row[k].dword[n]) ENDFOR ENDFOR write_row_and_zero(dst, m, tmp, dst.colsb) ENDFOR zero_upper_rows(dst, dst.rows) zero_tileconfig_start()
Tile AMXINT8 Application-Targeted Compute dot-product of bytes in tiles with a source/destination accumulator. Multiply groups of 4 adjacent pairs of signed 8-bit integers in "a" with corresponding signed 8-bit integers in "b", producing 4 intermediate 32-bit results. Sum these 4 results with the corresponding 32-bit integer in "dst", and store the 32-bit result back to tile "dst". DEFINE DPBD(c, x, y) { tmp1 := SignExtend32(x.byte[0]) * SignExtend32(y.byte[0]) tmp2 := SignExtend32(x.byte[1]) * SignExtend32(y.byte[1]) tmp3 := SignExtend32(x.byte[2]) * SignExtend32(y.byte[2]) tmp4 := SignExtend32(x.byte[3]) * SignExtend32(y.byte[3]) RETURN c + tmp1 + tmp2 + tmp3 + tmp4 } FOR m := 0 TO dst.rows - 1 tmp := dst.row[m] FOR k := 0 TO (a.colsb / 4) - 1 FOR n := 0 TO (dst.colsb / 4) - 1 tmp.dword[n] := DPBD(tmp.dword[n], a.row[m].dword[k], b.row[k].dword[n]) ENDFOR ENDFOR write_row_and_zero(dst, m, tmp, dst.colsb) ENDFOR zero_upper_rows(dst, dst.rows) zero_tileconfig_start()
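The four byte dot-product records above differ only in whether the bytes of "a" and "b" are sign- or zero-extended. A minimal sketch that parameterises that choice follows. Assumptions: tiles are nested lists of byte values (0..255), accumulators are Python integers, the function names are hypothetical, and the 32-bit accumulator is assumed to wrap on overflow, which the pseudocode does not state explicitly.

```python
def sign8(v):
    """Interpret a byte value 0..255 as a signed 8-bit integer."""
    return v - 256 if v & 0x80 else v


def wrap32(v):
    """Wrap an integer to the signed 32-bit range (assumed overflow behaviour)."""
    v &= 0xFFFFFFFF
    return v - (1 << 32) if v & 0x80000000 else v


def dpbd(c, x, y, a_signed, b_signed):
    """Accumulator c plus the four byte products of the 4-byte groups x and y."""
    for xb, yb in zip(x, y):
        xv = sign8(xb) if a_signed else xb
        yv = sign8(yb) if b_signed else yb
        c += xv * yv
    return wrap32(c)


def tile_dp(dst, a, b, a_signed=True, b_signed=True):
    """dst: M x N int32; a: M x 4K bytes; b: K x 4N bytes. Updates dst in place."""
    for m in range(len(dst)):
        for k in range(len(a[m]) // 4):
            for n in range(len(dst[m])):
                dst[m][n] = dpbd(dst[m][n], a[m][4 * k:4 * k + 4],
                                 b[k][4 * n:4 * n + 4], a_signed, b_signed)
    return dst
```

The byte 0xFF contributes -1 when treated as signed and 255 when treated as unsigned, which is the whole difference between the four variants.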
Tile AMXTILE Application-Targeted Load tile configuration from a 64-byte memory location specified by "mem_addr". The tile configuration format is specified below, and includes the tile type palette, the number of bytes per row, and the number of rows. If the specified palette_id is zero, that signifies the init state for both the tile config and the tile data, and the tiles are zeroed. Any invalid configuration results in a #GP fault. // format of memory payload. each number is a byte offset. // 0: palette_id // 1: startRow (8b) // 2-15: reserved (must be zero) // 16-17: tile0.colsb -- bytes_per_row // 18-19: tile1.colsb // 20-21: tile2.colsb // ... // 46-47: tile15.colsb // 48: tile0.rows // 49: tile1.rows // 50: tile2.rows // ... // 63: tile15.rows
Tile AMXTILE Application-Targeted Store the current tile configuration to a 64-byte memory location specified by "mem_addr". The tile configuration format is specified below, and includes the tile type palette, the number of bytes per row, and the number of rows. If tiles are not configured, all zeroes will be stored to memory. // format of memory payload. each number is a byte offset. // 0: palette_id // 1: startRow (8b) // 2-15: reserved (must be zero) // 16-17: tile0.colsb -- bytes_per_row // 18-19: tile1.colsb // 20-21: tile2.colsb // ... // 46-47: tile15.colsb // 48: tile0.rows // 49: tile1.rows // 50: tile2.rows // ... // 63: tile15.rows
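The 64-byte payload layout listed in the two configuration records above can be built and checked in software. The following is a sketch, not a driver: `make_tilecfg` is a hypothetical helper, and the two-byte colsb fields are assumed to be little-endian, as is usual on x86. It does not validate the values against any palette limits.

```python
import struct


def make_tilecfg(palette_id, tiles, start_row=0):
    """Pack a 64-byte tile configuration; tiles is a list of (rows, colsb)."""
    if len(tiles) > 16:
        raise ValueError("the layout has room for at most 16 tiles")
    cfg = bytearray(64)                 # bytes 2-15 stay zero (reserved)
    cfg[0] = palette_id
    cfg[1] = start_row
    for t, (rows, colsb) in enumerate(tiles):
        struct.pack_into("<H", cfg, 16 + 2 * t, colsb)  # bytes 16-47: colsb
        cfg[48 + t] = rows                              # bytes 48-63: rows
    return bytes(cfg)
```

For example, `make_tilecfg(1, [(16, 64), (16, 64)])` describes two tiles of 16 rows by 64 bytes and leaves the remaining tile entries zeroed.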
Tile AMXTILE Application-Targeted Load tile rows from memory specified by "base" address and "stride" into destination tile "dst" using the tile configuration previously configured via "_tile_loadconfig". start := tileconfig.startRow IF start == 0 // not restarting, zero incoming state tilezero(dst) FI nbytes := dst.colsb DO WHILE start < dst.rows memptr := base + start * stride write_row_and_zero(dst, start, read_memory(memptr, nbytes), nbytes) start := start + 1 OD zero_upper_rows(dst, dst.rows) zero_tileconfig_start()
Tile AMXTILE Application-Targeted Load tile rows from memory specified by "base" address and "stride" into destination tile "dst" using the tile configuration previously configured via "_tile_loadconfig". This intrinsic provides a hint to the implementation that the data will likely not be reused in the near future and the data caching can be optimized accordingly. start := tileconfig.startRow IF start == 0 // not restarting, zero incoming state tilezero(dst) FI nbytes := dst.colsb DO WHILE start < dst.rows memptr := base + start * stride write_row_and_zero(dst, start, read_memory(memptr, nbytes), nbytes) start := start + 1 OD zero_upper_rows(dst, dst.rows) zero_tileconfig_start()
Tile AMXTILE Application-Targeted Release the tile configuration to return to the init state, which releases all storage it currently holds.
Tile AMXTILE Application-Targeted Store the tile specified by "src" to memory specified by "base" address and "stride" using the tile configuration previously configured via "_tile_loadconfig". start := tileconfig.startRow DO WHILE start < src.rows memptr := base + start * stride write_memory(memptr, src.colsb, src.row[start]) start := start + 1 OD zero_tileconfig_start()
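The strided addressing used by the tile load and store records above (row r lives at base + r * stride) can be illustrated with a flat byte buffer. This sketch assumes startRow is 0, so the restart path in the pseudocode is not modelled; memory is a `bytearray`, and the function names are hypothetical.

```python
def tile_load(mem, base, stride, rows, colsb):
    """Read `rows` rows of `colsb` bytes each, `stride` bytes apart."""
    tile = []
    for r in range(rows):
        ptr = base + r * stride
        tile.append(bytes(mem[ptr:ptr + colsb]))
    return tile


def tile_store(mem, base, stride, tile):
    """Write each tile row back to memory, `stride` bytes apart."""
    for r, row in enumerate(tile):
        ptr = base + r * stride
        mem[ptr:ptr + len(row)] = row
```

A stride larger than colsb picks a sub-block out of a wider row-major matrix; a stride equal to colsb reads or writes the rows back to back.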
Tile AMXTILE Application-Targeted Zero the tile specified by "tdest". nbytes := palette_table[tileconfig.palette_id].bytes_per_row FOR i := 0 TO palette_table[tileconfig.palette_id].max_rows-1 FOR j := 0 TO nbytes-1 tdest.row[i].byte[j] := 0 ENDFOR ENDFOR
Floating Point AVX Arithmetic Add packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst". FOR j := 0 to 3 i := j*64 dst[i+63:i] := a[i+63:i] + b[i+63:i] ENDFOR dst[MAX:256] := 0
Floating Point AVX Arithmetic Add packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst". FOR j := 0 to 7 i := j*32 dst[i+31:i] := a[i+31:i] + b[i+31:i] ENDFOR dst[MAX:256] := 0
Floating Point AVX Arithmetic Alternately add and subtract packed double-precision (64-bit) floating-point elements in "a" to/from packed elements in "b", and store the results in "dst". FOR j := 0 to 3 i := j*64 IF ((j & 1) == 0) dst[i+63:i] := a[i+63:i] - b[i+63:i] ELSE dst[i+63:i] := a[i+63:i] + b[i+63:i] FI ENDFOR dst[MAX:256] := 0
Floating Point AVX Arithmetic Alternately add and subtract packed single-precision (32-bit) floating-point elements in "a" to/from packed elements in "b", and store the results in "dst". FOR j := 0 to 7 i := j*32 IF ((j & 1) == 0) dst[i+31:i] := a[i+31:i] - b[i+31:i] ELSE dst[i+31:i] := a[i+31:i] + b[i+31:i] FI ENDFOR dst[MAX:256] := 0
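The add/subtract pattern in the two records above subtracts in even-indexed elements and adds in odd-indexed ones. A minimal list-based sketch follows; `addsub` and `complex_mul` are hypothetical names, and the complex-multiplication use is an illustration that is not part of the records.

```python
def addsub(a, b):
    """Even-indexed elements: a - b. Odd-indexed elements: a + b."""
    return [x - y if j % 2 == 0 else x + y
            for j, (x, y) in enumerate(zip(a, b))]


def complex_mul(re1, im1, re2, im2):
    """(re1 + i*im1) * (re2 + i*im2), returned as [real, imag] via addsub."""
    return addsub([re1 * re2, re1 * im2], [im1 * im2, im1 * re2])
```

The subtract-then-add ordering is what makes the pattern line up with the real part (a difference of products) and the imaginary part (a sum of products).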
Floating Point AVX Logical Compute the bitwise AND of packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst". FOR j := 0 to 3 i := j*64 dst[i+63:i] := (a[i+63:i] AND b[i+63:i]) ENDFOR dst[MAX:256] := 0
Floating Point AVX Logical Compute the bitwise AND of packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst". FOR j := 0 to 7 i := j*32 dst[i+31:i] := (a[i+31:i] AND b[i+31:i]) ENDFOR dst[MAX:256] := 0
Floating Point AVX Logical Compute the bitwise NOT of packed double-precision (64-bit) floating-point elements in "a" and then AND with "b", and store the results in "dst". FOR j := 0 to 3 i := j*64 dst[i+63:i] := ((NOT a[i+63:i]) AND b[i+63:i]) ENDFOR dst[MAX:256] := 0
Floating Point AVX Logical Compute the bitwise NOT of packed single-precision (32-bit) floating-point elements in "a" and then AND with "b", and store the results in "dst". FOR j := 0 to 7 i := j*32 dst[i+31:i] := ((NOT a[i+31:i]) AND b[i+31:i]) ENDFOR dst[MAX:256] := 0
Floating Point AVX Swizzle Blend packed double-precision (64-bit) floating-point elements from "a" and "b" using control mask "imm8", and store the results in "dst". FOR j := 0 to 3 i := j*64 IF imm8[j] dst[i+63:i] := b[i+63:i] ELSE dst[i+63:i] := a[i+63:i] FI ENDFOR dst[MAX:256] := 0
Floating Point AVX Swizzle Blend packed single-precision (32-bit) floating-point elements from "a" and "b" using control mask "imm8", and store the results in "dst". FOR j := 0 to 7 i := j*32 IF imm8[j] dst[i+31:i] := b[i+31:i] ELSE dst[i+31:i] := a[i+31:i] FI ENDFOR dst[MAX:256] := 0
Floating Point AVX Swizzle Blend packed double-precision (64-bit) floating-point elements from "a" and "b" using "mask", and store the results in "dst". FOR j := 0 to 3 i := j*64 IF mask[i+63] dst[i+63:i] := b[i+63:i] ELSE dst[i+63:i] := a[i+63:i] FI ENDFOR dst[MAX:256] := 0
Floating Point AVX Swizzle Blend packed single-precision (32-bit) floating-point elements from "a" and "b" using "mask", and store the results in "dst". FOR j := 0 to 7 i := j*32 IF mask[i+31] dst[i+31:i] := b[i+31:i] ELSE dst[i+31:i] := a[i+31:i] FI ENDFOR dst[MAX:256] := 0
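The four blend records above select per element between "a" and "b", either by one bit of "imm8" per element or by the top bit of each "mask" element. The sketch below models both on Python lists; the function names are hypothetical, and the variable-mask version is written for double-precision elements (the single-precision record reads bit 31 of each 32-bit element in the same way).

```python
import struct


def blend_imm(a, b, imm8):
    """Element j comes from b when bit j of imm8 is set, otherwise from a."""
    return [b[j] if (imm8 >> j) & 1 else a[j] for j in range(len(a))]


def sign_bit(x):
    """Top bit (bit 63) of the binary64 encoding of x."""
    return struct.unpack("<Q", struct.pack("<d", x))[0] >> 63


def blendv(a, b, mask):
    """Element j comes from b when the top bit of mask[j] is set."""
    return [b[j] if sign_bit(mask[j]) else a[j] for j in range(len(a))]
```

Because only the top bit is inspected, a mask element of -0.0 selects "b" even though it compares equal to 0.0.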
Floating Point AVX Arithmetic Divide packed double-precision (64-bit) floating-point elements in "a" by packed elements in "b", and store the results in "dst". FOR j := 0 to 3 i := 64*j dst[i+63:i] := a[i+63:i] / b[i+63:i] ENDFOR dst[MAX:256] := 0
Floating Point AVX Arithmetic Divide packed single-precision (32-bit) floating-point elements in "a" by packed elements in "b", and store the results in "dst". FOR j := 0 to 7 i := 32*j dst[i+31:i] := a[i+31:i] / b[i+31:i] ENDFOR dst[MAX:256] := 0
Floating Point AVX Arithmetic Conditionally multiply the packed single-precision (32-bit) floating-point elements in "a" and "b" using the high 4 bits in "imm8", sum the four products, and conditionally store the sum in "dst" using the low 4 bits of "imm8". DEFINE DP(a[127:0], b[127:0], imm8[7:0]) { FOR j := 0 to 3 i := j*32 IF imm8[(4+j)%8] temp[i+31:i] := a[i+31:i] * b[i+31:i] ELSE temp[i+31:i] := FP32(0.0) FI ENDFOR sum[31:0] := (temp[127:96] + temp[95:64]) + (temp[63:32] + temp[31:0]) FOR j := 0 to 3 i := j*32 IF imm8[j%8] tmpdst[i+31:i] := sum[31:0] ELSE tmpdst[i+31:i] := FP32(0.0) FI ENDFOR RETURN tmpdst[127:0] } dst[127:0] := DP(a[127:0], b[127:0], imm8[7:0]) dst[255:128] := DP(a[255:128], b[255:128], imm8[7:0]) dst[MAX:256] := 0
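The two halves of "imm8" in the dot-product record above play different roles: the high four bits choose which products enter the sum, and the low four bits choose which destination elements receive it. A sketch of the DP helper on Python lists follows. The names are hypothetical, and Python floats are double precision, so results can differ in rounding from the single-precision operation.

```python
def dp_lane(a, b, imm8):
    """One 128-bit lane: a and b each hold four floats."""
    # High nibble: include product j only if bit 4+j is set.
    temp = [a[j] * b[j] if (imm8 >> (4 + j)) & 1 else 0.0 for j in range(4)]
    total = (temp[3] + temp[2]) + (temp[1] + temp[0])
    # Low nibble: broadcast the sum to element j only if bit j is set.
    return [total if (imm8 >> j) & 1 else 0.0 for j in range(4)]


def dp_256(a, b, imm8):
    """Both lanes use the same imm8 and are processed independently."""
    return dp_lane(a[:4], b[:4], imm8) + dp_lane(a[4:], b[4:], imm8)
```

With `imm8 = 0xF1` all four products are summed and the result lands only in element 0 of each lane.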
Floating Point AVX Arithmetic Horizontally add adjacent pairs of double-precision (64-bit) floating-point elements in "a" and "b", and pack the results in "dst". dst[63:0] := a[127:64] + a[63:0] dst[127:64] := b[127:64] + b[63:0] dst[191:128] := a[255:192] + a[191:128] dst[255:192] := b[255:192] + b[191:128] dst[MAX:256] := 0
Floating Point AVX Arithmetic Horizontally add adjacent pairs of single-precision (32-bit) floating-point elements in "a" and "b", and pack the results in "dst". dst[31:0] := a[63:32] + a[31:0] dst[63:32] := a[127:96] + a[95:64] dst[95:64] := b[63:32] + b[31:0] dst[127:96] := b[127:96] + b[95:64] dst[159:128] := a[191:160] + a[159:128] dst[191:160] := a[255:224] + a[223:192] dst[223:192] := b[191:160] + b[159:128] dst[255:224] := b[255:224] + b[223:192] dst[MAX:256] := 0
Floating Point AVX Arithmetic Horizontally subtract adjacent pairs of double-precision (64-bit) floating-point elements in "a" and "b", and pack the results in "dst". dst[63:0] := a[63:0] - a[127:64] dst[127:64] := b[63:0] - b[127:64] dst[191:128] := a[191:128] - a[255:192] dst[255:192] := b[191:128] - b[255:192] dst[MAX:256] := 0
Floating Point AVX Arithmetic Horizontally subtract adjacent pairs of single-precision (32-bit) floating-point elements in "a" and "b", and pack the results in "dst". dst[31:0] := a[31:0] - a[63:32] dst[63:32] := a[95:64] - a[127:96] dst[95:64] := b[31:0] - b[63:32] dst[127:96] := b[95:64] - b[127:96] dst[159:128] := a[159:128] - a[191:160] dst[191:160] := a[223:192] - a[255:224] dst[223:192] := b[159:128] - b[191:160] dst[255:224] := b[223:192] - b[255:224] dst[MAX:256] := 0
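The horizontal records above interleave results from "a" and "b" within each 128-bit lane, which is easy to misread from the bit ranges alone. The sketch below restates three of them on Python lists (element 0 is the lowest-order element); the function names are hypothetical.

```python
def hadd_pd(a, b):
    """Four doubles per input: pairwise sums, alternating a and b per lane."""
    return [a[1] + a[0], b[1] + b[0], a[3] + a[2], b[3] + b[2]]


def hsub_pd(a, b):
    """Pairwise differences: lower element minus upper element of each pair."""
    return [a[0] - a[1], b[0] - b[1], a[2] - a[3], b[2] - b[3]]


def hadd_ps(a, b):
    """Eight floats per input: per lane, two sums from a, then two from b."""
    out = []
    for lane in (0, 4):
        out += [a[lane + 1] + a[lane], a[lane + 3] + a[lane + 2],
                b[lane + 1] + b[lane], b[lane + 3] + b[lane + 2]]
    return out
```

Note that the results are not a simple concatenation of "a" sums followed by "b" sums; the interleaving happens separately in each 128-bit lane.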
Floating Point AVX Special Math Functions Compare packed double-precision (64-bit) floating-point elements in "a" and "b", and store packed maximum values in "dst". FOR j := 0 to 3 i := j*64 dst[i+63:i] := MAX(a[i+63:i], b[i+63:i]) ENDFOR dst[MAX:256] := 0
Floating Point AVX Special Math Functions Compare packed single-precision (32-bit) floating-point elements in "a" and "b", and store packed maximum values in "dst". FOR j := 0 to 7 i := j*32 dst[i+31:i] := MAX(a[i+31:i], b[i+31:i]) ENDFOR dst[MAX:256] := 0
Floating Point AVX Special Math Functions Compare packed double-precision (64-bit) floating-point elements in "a" and "b", and store packed minimum values in "dst". FOR j := 0 to 3 i := j*64 dst[i+63:i] := MIN(a[i+63:i], b[i+63:i]) ENDFOR dst[MAX:256] := 0
Floating Point AVX Special Math Functions Compare packed single-precision (32-bit) floating-point elements in "a" and "b", and store packed minimum values in "dst". FOR j := 0 to 7 i := j*32 dst[i+31:i] := MIN(a[i+31:i], b[i+31:i]) ENDFOR dst[MAX:256] := 0
Floating Point AVX Arithmetic Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst". FOR j := 0 to 3 i := j*64 dst[i+63:i] := a[i+63:i] * b[i+63:i] ENDFOR dst[MAX:256] := 0
Floating Point AVX Arithmetic Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst". FOR j := 0 to 7 i := j*32 dst[i+31:i] := a[i+31:i] * b[i+31:i] ENDFOR dst[MAX:256] := 0
Floating Point AVX Logical Compute the bitwise OR of packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst". FOR j := 0 to 3 i := j*64 dst[i+63:i] := a[i+63:i] OR b[i+63:i] ENDFOR dst[MAX:256] := 0
Floating Point AVX Logical Compute the bitwise OR of packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst". FOR j := 0 to 7 i := j*32 dst[i+31:i] := a[i+31:i] OR b[i+31:i] ENDFOR dst[MAX:256] := 0
Floating Point AVX Swizzle Shuffle double-precision (64-bit) floating-point elements within 128-bit lanes using the control in "imm8", and store the results in "dst". dst[63:0] := (imm8[0] == 0) ? a[63:0] : a[127:64] dst[127:64] := (imm8[1] == 0) ? b[63:0] : b[127:64] dst[191:128] := (imm8[2] == 0) ? a[191:128] : a[255:192] dst[255:192] := (imm8[3] == 0) ? b[191:128] : b[255:192] dst[MAX:256] := 0
Floating Point AVX Swizzle Shuffle single-precision (32-bit) floating-point elements in "a" and "b" within 128-bit lanes using the control in "imm8", and store the results in "dst". DEFINE SELECT4(src, control) { CASE(control[1:0]) OF 0: tmp[31:0] := src[31:0] 1: tmp[31:0] := src[63:32] 2: tmp[31:0] := src[95:64] 3: tmp[31:0] := src[127:96] ESAC RETURN tmp[31:0] } dst[31:0] := SELECT4(a[127:0], imm8[1:0]) dst[63:32] := SELECT4(a[127:0], imm8[3:2]) dst[95:64] := SELECT4(b[127:0], imm8[5:4]) dst[127:96] := SELECT4(b[127:0], imm8[7:6]) dst[159:128] := SELECT4(a[255:128], imm8[1:0]) dst[191:160] := SELECT4(a[255:128], imm8[3:2]) dst[223:192] := SELECT4(b[255:128], imm8[5:4]) dst[255:224] := SELECT4(b[255:128], imm8[7:6]) dst[MAX:256] := 0
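The two-source shuffles above take the low half of each output lane from "a" and the high half from "b", with the same "imm8" fields reused in both lanes (single precision) or one bit per output element (double precision). A list-based sketch follows; the function names are hypothetical and element 0 is the lowest-order element.

```python
def select4(src, control):
    """Pick one of four elements using the low two bits of control."""
    return src[control & 3]


def shuffle_ps256(a, b, imm8):
    """Per 128-bit lane: two elements chosen from a, then two from b."""
    out = []
    for lane in (0, 4):
        la, lb = a[lane:lane + 4], b[lane:lane + 4]
        out += [select4(la, imm8), select4(la, imm8 >> 2),
                select4(lb, imm8 >> 4), select4(lb, imm8 >> 6)]
    return out


def shuffle_pd256(a, b, imm8):
    """Each output element picks the low or high double of its own lane."""
    return [a[(imm8 >> 0) & 1], b[(imm8 >> 1) & 1],
            a[2 + ((imm8 >> 2) & 1)], b[2 + ((imm8 >> 3) & 1)]]
```

Elements never cross the 128-bit lane boundary: the upper lane of the output only ever sees the upper lanes of "a" and "b".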
Floating Point AVX Arithmetic Subtract packed double-precision (64-bit) floating-point elements in "b" from packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst". FOR j := 0 to 3 i := j*64 dst[i+63:i] := a[i+63:i] - b[i+63:i] ENDFOR dst[MAX:256] := 0
Floating Point AVX Arithmetic Subtract packed single-precision (32-bit) floating-point elements in "b" from packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst". FOR j := 0 to 7 i := j*32 dst[i+31:i] := a[i+31:i] - b[i+31:i] ENDFOR dst[MAX:256] := 0
Floating Point AVX Logical Compute the bitwise XOR of packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst". FOR j := 0 to 3 i := j*64 dst[i+63:i] := a[i+63:i] XOR b[i+63:i] ENDFOR dst[MAX:256] := 0
Floating Point AVX Logical Compute the bitwise XOR of packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst". FOR j := 0 to 7 i := j*32 dst[i+31:i] := a[i+31:i] XOR b[i+31:i] ENDFOR dst[MAX:256] := 0
Floating Point AVX Compare Compare packed double-precision (64-bit) floating-point elements in "a" and "b" based on the comparison operand specified by "imm8", and store the results in "dst". CASE (imm8[4:0]) OF 0: OP := _CMP_EQ_OQ 1: OP := _CMP_LT_OS 2: OP := _CMP_LE_OS 3: OP := _CMP_UNORD_Q 4: OP := _CMP_NEQ_UQ 5: OP := _CMP_NLT_US 6: OP := _CMP_NLE_US 7: OP := _CMP_ORD_Q 8: OP := _CMP_EQ_UQ 9: OP := _CMP_NGE_US 10: OP := _CMP_NGT_US 11: OP := _CMP_FALSE_OQ 12: OP := _CMP_NEQ_OQ 13: OP := _CMP_GE_OS 14: OP := _CMP_GT_OS 15: OP := _CMP_TRUE_UQ 16: OP := _CMP_EQ_OS 17: OP := _CMP_LT_OQ 18: OP := _CMP_LE_OQ 19: OP := _CMP_UNORD_S 20: OP := _CMP_NEQ_US 21: OP := _CMP_NLT_UQ 22: OP := _CMP_NLE_UQ 23: OP := _CMP_ORD_S 24: OP := _CMP_EQ_US 25: OP := _CMP_NGE_UQ 26: OP := _CMP_NGT_UQ 27: OP := _CMP_FALSE_OS 28: OP := _CMP_NEQ_OS 29: OP := _CMP_GE_OQ 30: OP := _CMP_GT_OQ 31: OP := _CMP_TRUE_US ESAC FOR j := 0 to 1 i := j*64 dst[i+63:i] := ( a[i+63:i] OP b[i+63:i] ) ? 0xFFFFFFFFFFFFFFFF : 0 ENDFOR dst[MAX:128] := 0
Floating Point AVX Compare Compare packed double-precision (64-bit) floating-point elements in "a" and "b" based on the comparison operand specified by "imm8", and store the results in "dst". CASE (imm8[4:0]) OF 0: OP := _CMP_EQ_OQ 1: OP := _CMP_LT_OS 2: OP := _CMP_LE_OS 3: OP := _CMP_UNORD_Q 4: OP := _CMP_NEQ_UQ 5: OP := _CMP_NLT_US 6: OP := _CMP_NLE_US 7: OP := _CMP_ORD_Q 8: OP := _CMP_EQ_UQ 9: OP := _CMP_NGE_US 10: OP := _CMP_NGT_US 11: OP := _CMP_FALSE_OQ 12: OP := _CMP_NEQ_OQ 13: OP := _CMP_GE_OS 14: OP := _CMP_GT_OS 15: OP := _CMP_TRUE_UQ 16: OP := _CMP_EQ_OS 17: OP := _CMP_LT_OQ 18: OP := _CMP_LE_OQ 19: OP := _CMP_UNORD_S 20: OP := _CMP_NEQ_US 21: OP := _CMP_NLT_UQ 22: OP := _CMP_NLE_UQ 23: OP := _CMP_ORD_S 24: OP := _CMP_EQ_US 25: OP := _CMP_NGE_UQ 26: OP := _CMP_NGT_UQ 27: OP := _CMP_FALSE_OS 28: OP := _CMP_NEQ_OS 29: OP := _CMP_GE_OQ 30: OP := _CMP_GT_OQ 31: OP := _CMP_TRUE_US ESAC FOR j := 0 to 3 i := j*64 dst[i+63:i] := ( a[i+63:i] OP b[i+63:i] ) ? 0xFFFFFFFFFFFFFFFF : 0 ENDFOR dst[MAX:256] := 0
Floating Point AVX Compare Compare packed single-precision (32-bit) floating-point elements in "a" and "b" based on the comparison operand specified by "imm8", and store the results in "dst". CASE (imm8[4:0]) OF 0: OP := _CMP_EQ_OQ 1: OP := _CMP_LT_OS 2: OP := _CMP_LE_OS 3: OP := _CMP_UNORD_Q 4: OP := _CMP_NEQ_UQ 5: OP := _CMP_NLT_US 6: OP := _CMP_NLE_US 7: OP := _CMP_ORD_Q 8: OP := _CMP_EQ_UQ 9: OP := _CMP_NGE_US 10: OP := _CMP_NGT_US 11: OP := _CMP_FALSE_OQ 12: OP := _CMP_NEQ_OQ 13: OP := _CMP_GE_OS 14: OP := _CMP_GT_OS 15: OP := _CMP_TRUE_UQ 16: OP := _CMP_EQ_OS 17: OP := _CMP_LT_OQ 18: OP := _CMP_LE_OQ 19: OP := _CMP_UNORD_S 20: OP := _CMP_NEQ_US 21: OP := _CMP_NLT_UQ 22: OP := _CMP_NLE_UQ 23: OP := _CMP_ORD_S 24: OP := _CMP_EQ_US 25: OP := _CMP_NGE_UQ 26: OP := _CMP_NGT_UQ 27: OP := _CMP_FALSE_OS 28: OP := _CMP_NEQ_OS 29: OP := _CMP_GE_OQ 30: OP := _CMP_GT_OQ 31: OP := _CMP_TRUE_US ESAC FOR j := 0 to 3 i := j*32 dst[i+31:i] := ( a[i+31:i] OP b[i+31:i] ) ? 0xFFFFFFFF : 0 ENDFOR dst[MAX:128] := 0
Floating Point AVX Compare Compare packed single-precision (32-bit) floating-point elements in "a" and "b" based on the comparison operand specified by "imm8", and store the results in "dst". CASE (imm8[4:0]) OF 0: OP := _CMP_EQ_OQ 1: OP := _CMP_LT_OS 2: OP := _CMP_LE_OS 3: OP := _CMP_UNORD_Q 4: OP := _CMP_NEQ_UQ 5: OP := _CMP_NLT_US 6: OP := _CMP_NLE_US 7: OP := _CMP_ORD_Q 8: OP := _CMP_EQ_UQ 9: OP := _CMP_NGE_US 10: OP := _CMP_NGT_US 11: OP := _CMP_FALSE_OQ 12: OP := _CMP_NEQ_OQ 13: OP := _CMP_GE_OS 14: OP := _CMP_GT_OS 15: OP := _CMP_TRUE_UQ 16: OP := _CMP_EQ_OS 17: OP := _CMP_LT_OQ 18: OP := _CMP_LE_OQ 19: OP := _CMP_UNORD_S 20: OP := _CMP_NEQ_US 21: OP := _CMP_NLT_UQ 22: OP := _CMP_NLE_UQ 23: OP := _CMP_ORD_S 24: OP := _CMP_EQ_US 25: OP := _CMP_NGE_UQ 26: OP := _CMP_NGT_UQ 27: OP := _CMP_FALSE_OS 28: OP := _CMP_NEQ_OS 29: OP := _CMP_GE_OQ 30: OP := _CMP_GT_OQ 31: OP := _CMP_TRUE_US ESAC FOR j := 0 to 7 i := j*32 dst[i+31:i] := ( a[i+31:i] OP b[i+31:i] ) ? 0xFFFFFFFF : 0 ENDFOR dst[MAX:256] := 0
Floating Point AVX Compare Compare the lower double-precision (64-bit) floating-point element in "a" and "b" based on the comparison operand specified by "imm8", store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst". CASE (imm8[4:0]) OF 0: OP := _CMP_EQ_OQ 1: OP := _CMP_LT_OS 2: OP := _CMP_LE_OS 3: OP := _CMP_UNORD_Q 4: OP := _CMP_NEQ_UQ 5: OP := _CMP_NLT_US 6: OP := _CMP_NLE_US 7: OP := _CMP_ORD_Q 8: OP := _CMP_EQ_UQ 9: OP := _CMP_NGE_US 10: OP := _CMP_NGT_US 11: OP := _CMP_FALSE_OQ 12: OP := _CMP_NEQ_OQ 13: OP := _CMP_GE_OS 14: OP := _CMP_GT_OS 15: OP := _CMP_TRUE_UQ 16: OP := _CMP_EQ_OS 17: OP := _CMP_LT_OQ 18: OP := _CMP_LE_OQ 19: OP := _CMP_UNORD_S 20: OP := _CMP_NEQ_US 21: OP := _CMP_NLT_UQ 22: OP := _CMP_NLE_UQ 23: OP := _CMP_ORD_S 24: OP := _CMP_EQ_US 25: OP := _CMP_NGE_UQ 26: OP := _CMP_NGT_UQ 27: OP := _CMP_FALSE_OS 28: OP := _CMP_NEQ_OS 29: OP := _CMP_GE_OQ 30: OP := _CMP_GT_OQ 31: OP := _CMP_TRUE_US ESAC dst[63:0] := ( a[63:0] OP b[63:0] ) ? 0xFFFFFFFFFFFFFFFF : 0 dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point AVX Compare Compare the lower single-precision (32-bit) floating-point element in "a" and "b" based on the comparison operand specified by "imm8", store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". CASE (imm8[4:0]) OF 0: OP := _CMP_EQ_OQ 1: OP := _CMP_LT_OS 2: OP := _CMP_LE_OS 3: OP := _CMP_UNORD_Q 4: OP := _CMP_NEQ_UQ 5: OP := _CMP_NLT_US 6: OP := _CMP_NLE_US 7: OP := _CMP_ORD_Q 8: OP := _CMP_EQ_UQ 9: OP := _CMP_NGE_US 10: OP := _CMP_NGT_US 11: OP := _CMP_FALSE_OQ 12: OP := _CMP_NEQ_OQ 13: OP := _CMP_GE_OS 14: OP := _CMP_GT_OS 15: OP := _CMP_TRUE_UQ 16: OP := _CMP_EQ_OS 17: OP := _CMP_LT_OQ 18: OP := _CMP_LE_OQ 19: OP := _CMP_UNORD_S 20: OP := _CMP_NEQ_US 21: OP := _CMP_NLT_UQ 22: OP := _CMP_NLE_UQ 23: OP := _CMP_ORD_S 24: OP := _CMP_EQ_US 25: OP := _CMP_NGE_UQ 26: OP := _CMP_NGT_UQ 27: OP := _CMP_FALSE_OS 28: OP := _CMP_NEQ_OS 29: OP := _CMP_GE_OQ 30: OP := _CMP_GT_OQ 31: OP := _CMP_TRUE_US ESAC dst[31:0] := ( a[31:0] OP b[31:0] ) ? 0xFFFFFFFF : 0 dst[127:32] := a[127:32] dst[MAX:128] := 0
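The 32 predicates in the compare records above can be summarised by which of the four possible relations (less than, equal, greater than, unordered) make each one true. The sketch below encodes that reading for imm8[3:0]; in the tables above, predicate i and predicate i + 16 share a base name and differ only in the S/Q suffix, so this model treats bit 4 as affecting exception signaling only, which it does not simulate. The table, the helper names, and the use of Python floats are all part of this illustration rather than the records.

```python
import math

# Relations for which each predicate (imm8[3:0]) is true.
TRUE_ON = {
    0: {"eq"},                    1: {"lt"},
    2: {"lt", "eq"},              3: {"un"},
    4: {"lt", "gt", "un"},        5: {"eq", "gt", "un"},
    6: {"gt", "un"},              7: {"lt", "eq", "gt"},
    8: {"eq", "un"},              9: {"lt", "un"},
    10: {"lt", "eq", "un"},       11: set(),
    12: {"lt", "gt"},             13: {"eq", "gt"},
    14: {"gt"},                   15: {"lt", "eq", "gt", "un"},
}


def relation(x, y):
    """Classify a pair as 'lt', 'eq', 'gt', or 'un' (unordered: a NaN is present)."""
    if math.isnan(x) or math.isnan(y):
        return "un"
    if x < y:
        return "lt"
    if x > y:
        return "gt"
    return "eq"


def cmp_mask(x, y, imm8, bits=64):
    """All-ones mask of the given width if the predicate holds, else 0."""
    holds = relation(x, y) in TRUE_ON[imm8 & 0xF]
    return (1 << bits) - 1 if holds else 0


def cmp_packed(a, b, imm8, bits=64):
    """Apply the predicate element-wise, as in the packed records."""
    return [cmp_mask(x, y, imm8, bits) for x, y in zip(a, b)]
```

The "ordered" predicates are false whenever either input is NaN, while the "unordered" ones are true in that case; this is why NEQ_UQ reports a NaN pair as not equal.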
Floating Point Integer AVX Convert Convert packed signed 32-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst". FOR j := 0 to 3 i := j*32 m := j*64 dst[m+63:m] := Convert_Int32_To_FP64(a[i+31:i]) ENDFOR dst[MAX:256] := 0
Floating Point Integer AVX Convert Convert packed signed 32-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst". FOR j := 0 to 7 i := 32*j dst[i+31:i] := Convert_Int32_To_FP32(a[i+31:i]) ENDFOR dst[MAX:256] := 0
Floating Point AVX Convert Convert packed double-precision (64-bit) floating-point elements in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst". FOR j := 0 to 3 i := 32*j k := 64*j dst[i+31:i] := Convert_FP64_To_FP32(a[k+63:k]) ENDFOR dst[MAX:128] := 0
Floating Point Integer AVX Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed 32-bit integers, and store the results in "dst". FOR j := 0 to 7 i := 32*j dst[i+31:i] := Convert_FP32_To_Int32(a[i+31:i]) ENDFOR dst[MAX:256] := 0
Floating Point AVX Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst". FOR j := 0 to 3 i := 64*j k := 32*j dst[i+63:i] := Convert_FP32_To_FP64(a[k+31:k]) ENDFOR dst[MAX:256] := 0
Floating Point Integer AVX Convert Convert packed double-precision (64-bit) floating-point elements in "a" to packed 32-bit integers with truncation, and store the results in "dst". FOR j := 0 to 3 i := 32*j k := 64*j dst[i+31:i] := Convert_FP64_To_Int32_Truncate(a[k+63:k]) ENDFOR dst[MAX:128] := 0
Floating Point Integer AVX Convert Convert packed double-precision (64-bit) floating-point elements in "a" to packed 32-bit integers, and store the results in "dst". FOR j := 0 to 3 i := 32*j k := 64*j dst[i+31:i] := Convert_FP64_To_Int32(a[k+63:k]) ENDFOR dst[MAX:128] := 0
Floating Point Integer AVX Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed 32-bit integers with truncation, and store the results in "dst". FOR j := 0 to 7 i := 32*j dst[i+31:i] := Convert_FP32_To_Int32_Truncate(a[i+31:i]) ENDFOR dst[MAX:256] := 0
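The float-to-integer records above come in two flavours, with and without truncation. The sketch below contrasts them for a single element. Two assumptions are made that the records do not state: the non-truncating form is modelled with round-half-to-even (the usual default rounding mode on x86), and out-of-range or NaN inputs yield the value 0x80000000, interpreted here as -2**31. The function names are hypothetical.

```python
import math

INT32_INDEFINITE = -(1 << 31)  # assumed result for NaN or out-of-range inputs


def _fit_int32(v):
    """Return v if it fits in a signed 32-bit integer, else the indefinite value."""
    return v if -(1 << 31) <= v <= (1 << 31) - 1 else INT32_INDEFINITE


def cvt_nearest(x):
    """Model of Convert_FP*_To_Int32 under round-half-to-even (assumed mode)."""
    if math.isnan(x) or math.isinf(x):
        return INT32_INDEFINITE
    return _fit_int32(round(x))  # Python's round() rounds halves to even


def cvt_truncate(x):
    """Model of Convert_FP*_To_Int32_Truncate: round toward zero."""
    if math.isnan(x) or math.isinf(x):
        return INT32_INDEFINITE
    return _fit_int32(math.trunc(x))
```

The two forms agree on exact integers and differ on fractional inputs: 2.7 becomes 3 when rounded to nearest but 2 when truncated.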
Floating Point AVX Swizzle Extract 128 bits (composed of 4 packed single-precision (32-bit) floating-point elements) from "a", selected with "imm8", and store the result in "dst". CASE imm8[0] OF 0: dst[127:0] := a[127:0] 1: dst[127:0] := a[255:128] ESAC dst[MAX:128] := 0
Floating Point AVX Swizzle Extract 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from "a", selected with "imm8", and store the result in "dst". CASE imm8[0] OF 0: dst[127:0] := a[127:0] 1: dst[127:0] := a[255:128] ESAC dst[MAX:128] := 0
Integer AVX Swizzle Extract 128 bits (composed of integer data) from "a", selected with "imm8", and store the result in "dst". CASE imm8[0] OF 0: dst[127:0] := a[127:0] 1: dst[127:0] := a[255:128] ESAC dst[MAX:128] := 0
Integer AVX Swizzle Extract a 32-bit integer from "a", selected with "index", and store the result in "dst". dst[31:0] := (a[255:0] >> (index[2:0] * 32))[31:0]
Integer AVX Swizzle Extract a 64-bit integer from "a", selected with "index", and store the result in "dst". dst[63:0] := (a[255:0] >> (index[1:0] * 64))[63:0]
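The extract records above are all shift-and-mask operations on the 256-bit source. Treating the register as one Python integer makes this explicit; the function names below are hypothetical.

```python
def extract_lane128(a, imm8):
    """Select the low (imm8[0] == 0) or high (imm8[0] == 1) 128 bits of a."""
    return (a >> (128 * (imm8 & 1))) & ((1 << 128) - 1)


def extract_int32(a, index):
    """Shift by index[2:0] * 32 bits and keep the low 32 bits."""
    return (a >> ((index & 7) * 32)) & 0xFFFFFFFF


def extract_int64(a, index):
    """Shift by index[1:0] * 64 bits and keep the low 64 bits."""
    return (a >> ((index & 3) * 64)) & 0xFFFFFFFFFFFFFFFF
```

Only the low bits of the selector are read, so in this model an out-of-range index such as 9 wraps to element 1 for the 32-bit form.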
AVX General Support Zero the contents of all XMM or YMM registers. YMM0[MAX:0] := 0 YMM1[MAX:0] := 0 YMM2[MAX:0] := 0 YMM3[MAX:0] := 0 YMM4[MAX:0] := 0 YMM5[MAX:0] := 0 YMM6[MAX:0] := 0 YMM7[MAX:0] := 0 IF _64_BIT_MODE YMM8[MAX:0] := 0 YMM9[MAX:0] := 0 YMM10[MAX:0] := 0 YMM11[MAX:0] := 0 YMM12[MAX:0] := 0 YMM13[MAX:0] := 0 YMM14[MAX:0] := 0 YMM15[MAX:0] := 0 FI
AVX General Support Zero the upper 128 bits of all YMM registers; the lower 128 bits of the registers are unmodified. YMM0[MAX:128] := 0 YMM1[MAX:128] := 0 YMM2[MAX:128] := 0 YMM3[MAX:128] := 0 YMM4[MAX:128] := 0 YMM5[MAX:128] := 0 YMM6[MAX:128] := 0 YMM7[MAX:128] := 0 IF _64_BIT_MODE YMM8[MAX:128] := 0 YMM9[MAX:128] := 0 YMM10[MAX:128] := 0 YMM11[MAX:128] := 0 YMM12[MAX:128] := 0 YMM13[MAX:128] := 0 YMM14[MAX:128] := 0 YMM15[MAX:128] := 0 FI
Floating Point AVX Swizzle Shuffle single-precision (32-bit) floating-point elements in "a" within 128-bit lanes using the control in "b", and store the results in "dst". DEFINE SELECT4(src, control) { CASE(control[1:0]) OF 0: tmp[31:0] := src[31:0] 1: tmp[31:0] := src[63:32] 2: tmp[31:0] := src[95:64] 3: tmp[31:0] := src[127:96] ESAC RETURN tmp[31:0] } dst[31:0] := SELECT4(a[127:0], b[1:0]) dst[63:32] := SELECT4(a[127:0], b[33:32]) dst[95:64] := SELECT4(a[127:0], b[65:64]) dst[127:96] := SELECT4(a[127:0], b[97:96]) dst[159:128] := SELECT4(a[255:128], b[129:128]) dst[191:160] := SELECT4(a[255:128], b[161:160]) dst[223:192] := SELECT4(a[255:128], b[193:192]) dst[255:224] := SELECT4(a[255:128], b[225:224]) dst[MAX:256] := 0
Floating Point AVX Swizzle Shuffle single-precision (32-bit) floating-point elements in "a" using the control in "b", and store the results in "dst". DEFINE SELECT4(src, control) { CASE(control[1:0]) OF 0: tmp[31:0] := src[31:0] 1: tmp[31:0] := src[63:32] 2: tmp[31:0] := src[95:64] 3: tmp[31:0] := src[127:96] ESAC RETURN tmp[31:0] } dst[31:0] := SELECT4(a[127:0], b[1:0]) dst[63:32] := SELECT4(a[127:0], b[33:32]) dst[95:64] := SELECT4(a[127:0], b[65:64]) dst[127:96] := SELECT4(a[127:0], b[97:96]) dst[MAX:128] := 0
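For the variable-control shuffles above, each 32-bit element of "b" supplies its own 2-bit selector, and the selector never reaches outside the 128-bit lane it sits in. A rough Python model, assuming vectors are lists of 4 or 8 elements with index 0 as the lowest element; `permutevar_ps` is a hypothetical name used only for this sketch:

```python
def permutevar_ps(a, b):
    """Shuffle `a` within each group of four elements using the low 2 bits of each `b` element."""
    dst = []
    for base in range(0, len(a), 4):       # one 128-bit lane at a time
        for k in range(4):
            control = b[base + k] & 0b11   # b[1:0], b[33:32], ... in the pseudocode
            dst.append(a[base + control])
    return dst


a = [10, 11, 12, 13, 20, 21, 22, 23]
b = [3, 3, 0, 0, 1, 5, 2, 6]               # 5 and 6 act as 1 and 2: upper bits are ignored
result = permutevar_ps(a, b)
```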
Floating Point AVX Swizzle Shuffle single-precision (32-bit) floating-point elements in "a" within 128-bit lanes using the control in "imm8", and store the results in "dst". DEFINE SELECT4(src, control) { CASE(control[1:0]) OF 0: tmp[31:0] := src[31:0] 1: tmp[31:0] := src[63:32] 2: tmp[31:0] := src[95:64] 3: tmp[31:0] := src[127:96] ESAC RETURN tmp[31:0] } dst[31:0] := SELECT4(a[127:0], imm8[1:0]) dst[63:32] := SELECT4(a[127:0], imm8[3:2]) dst[95:64] := SELECT4(a[127:0], imm8[5:4]) dst[127:96] := SELECT4(a[127:0], imm8[7:6]) dst[159:128] := SELECT4(a[255:128], imm8[1:0]) dst[191:160] := SELECT4(a[255:128], imm8[3:2]) dst[223:192] := SELECT4(a[255:128], imm8[5:4]) dst[255:224] := SELECT4(a[255:128], imm8[7:6]) dst[MAX:256] := 0
Floating Point AVX Swizzle Shuffle single-precision (32-bit) floating-point elements in "a" using the control in "imm8", and store the results in "dst". DEFINE SELECT4(src, control) { CASE(control[1:0]) OF 0: tmp[31:0] := src[31:0] 1: tmp[31:0] := src[63:32] 2: tmp[31:0] := src[95:64] 3: tmp[31:0] := src[127:96] ESAC RETURN tmp[31:0] } dst[31:0] := SELECT4(a[127:0], imm8[1:0]) dst[63:32] := SELECT4(a[127:0], imm8[3:2]) dst[95:64] := SELECT4(a[127:0], imm8[5:4]) dst[127:96] := SELECT4(a[127:0], imm8[7:6]) dst[MAX:128] := 0
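With an immediate control, the same four 2-bit selectors are reused for every 128-bit lane. The sketch below models that in Python, assuming list-of-elements vectors; `select4` follows the pseudocode helper, while `permute_ps` is a hypothetical name:

```python
def select4(src, control):
    """Pick one of four elements using the low 2 bits of `control`."""
    return src[control & 0b11]


def permute_ps(a, imm8):
    """Apply the imm8 shuffle to each group of four elements of `a`."""
    dst = []
    for base in range(0, len(a), 4):
        lane = a[base:base + 4]
        for k in range(4):
            # imm8[1:0], imm8[3:2], imm8[5:4], imm8[7:6]
            dst.append(select4(lane, imm8 >> (2 * k)))
    return dst


# 0b00011011 encodes selectors 3, 2, 1, 0 (low to high), which reverses each lane.
reversed_lanes = permute_ps([0, 1, 2, 3, 4, 5, 6, 7], 0b00011011)
```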
Floating Point AVX Swizzle Shuffle double-precision (64-bit) floating-point elements in "a" within 128-bit lanes using the control in "b", and store the results in "dst". IF (b[1] == 0) dst[63:0] := a[63:0]; FI IF (b[1] == 1) dst[63:0] := a[127:64]; FI IF (b[65] == 0) dst[127:64] := a[63:0]; FI IF (b[65] == 1) dst[127:64] := a[127:64]; FI IF (b[129] == 0) dst[191:128] := a[191:128]; FI IF (b[129] == 1) dst[191:128] := a[255:192]; FI IF (b[193] == 0) dst[255:192] := a[191:128]; FI IF (b[193] == 1) dst[255:192] := a[255:192]; FI dst[MAX:256] := 0
Floating Point AVX Swizzle Shuffle double-precision (64-bit) floating-point elements in "a" using the control in "b", and store the results in "dst". IF (b[1] == 0) dst[63:0] := a[63:0]; FI IF (b[1] == 1) dst[63:0] := a[127:64]; FI IF (b[65] == 0) dst[127:64] := a[63:0]; FI IF (b[65] == 1) dst[127:64] := a[127:64]; FI dst[MAX:128] := 0
Floating Point AVX Swizzle Shuffle double-precision (64-bit) floating-point elements in "a" within 128-bit lanes using the control in "imm8", and store the results in "dst". IF (imm8[0] == 0) dst[63:0] := a[63:0]; FI IF (imm8[0] == 1) dst[63:0] := a[127:64]; FI IF (imm8[1] == 0) dst[127:64] := a[63:0]; FI IF (imm8[1] == 1) dst[127:64] := a[127:64]; FI IF (imm8[2] == 0) dst[191:128] := a[191:128]; FI IF (imm8[2] == 1) dst[191:128] := a[255:192]; FI IF (imm8[3] == 0) dst[255:192] := a[191:128]; FI IF (imm8[3] == 1) dst[255:192] := a[255:192]; FI dst[MAX:256] := 0
Floating Point AVX Swizzle Shuffle double-precision (64-bit) floating-point elements in "a" using the control in "imm8", and store the results in "dst". IF (imm8[0] == 0) dst[63:0] := a[63:0]; FI IF (imm8[0] == 1) dst[63:0] := a[127:64]; FI IF (imm8[1] == 0) dst[127:64] := a[63:0]; FI IF (imm8[1] == 1) dst[127:64] := a[127:64]; FI dst[MAX:128] := 0
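One detail in the double-precision entries is easy to miss: the variable form reads bit 1 of each 64-bit control element (b[1], b[65], ...), whereas the immediate form reads consecutive bits of "imm8". A small Python sketch of both, assuming list-of-elements vectors and hypothetical function names:

```python
def permutevar_pd(a, b):
    """Variable control: bit 1 of each control element picks the low or high element of the lane."""
    dst = []
    for base in range(0, len(a), 2):
        for k in range(2):
            pick = (b[base + k] >> 1) & 1   # bit 1, not bit 0
            dst.append(a[base + pick])
    return dst


def permute_pd(a, imm8):
    """Immediate control: bit (base + k) of imm8 picks for destination element base + k."""
    dst = []
    for base in range(0, len(a), 2):
        for k in range(2):
            pick = (imm8 >> (base + k)) & 1
            dst.append(a[base + pick])
    return dst


a = [1.0, 2.0, 3.0, 4.0]
swapped_low_lane = permutevar_pd(a, [2, 0, 0, 2])
no_effect = permutevar_pd([1.0, 2.0], [1, 1])   # bit 0 alone selects nothing
swapped_both = permute_pd(a, 0b0101)
```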
Floating Point AVX Swizzle Shuffle 128-bits (composed of 4 packed single-precision (32-bit) floating-point elements) selected by "imm8" from "a" and "b", and store the results in "dst". DEFINE SELECT4(src1, src2, control) { CASE(control[1:0]) OF 0: tmp[127:0] := src1[127:0] 1: tmp[127:0] := src1[255:128] 2: tmp[127:0] := src2[127:0] 3: tmp[127:0] := src2[255:128] ESAC IF control[3] tmp[127:0] := 0 FI RETURN tmp[127:0] } dst[127:0] := SELECT4(a[255:0], b[255:0], imm8[3:0]) dst[255:128] := SELECT4(a[255:0], b[255:0], imm8[7:4]) dst[MAX:256] := 0
Floating Point AVX Swizzle Shuffle 128-bits (composed of 2 packed double-precision (64-bit) floating-point elements) selected by "imm8" from "a" and "b", and store the results in "dst". DEFINE SELECT4(src1, src2, control) { CASE(control[1:0]) OF 0: tmp[127:0] := src1[127:0] 1: tmp[127:0] := src1[255:128] 2: tmp[127:0] := src2[127:0] 3: tmp[127:0] := src2[255:128] ESAC IF control[3] tmp[127:0] := 0 FI RETURN tmp[127:0] } dst[127:0] := SELECT4(a[255:0], b[255:0], imm8[3:0]) dst[255:128] := SELECT4(a[255:0], b[255:0], imm8[7:4]) dst[MAX:256] := 0
Integer AVX Swizzle Shuffle 128-bits (composed of integer data) selected by "imm8" from "a" and "b", and store the results in "dst". DEFINE SELECT4(src1, src2, control) { CASE(control[1:0]) OF 0: tmp[127:0] := src1[127:0] 1: tmp[127:0] := src1[255:128] 2: tmp[127:0] := src2[127:0] 3: tmp[127:0] := src2[255:128] ESAC IF control[3] tmp[127:0] := 0 FI RETURN tmp[127:0] } dst[127:0] := SELECT4(a[255:0], b[255:0], imm8[3:0]) dst[255:128] := SELECT4(a[255:0], b[255:0], imm8[7:4]) dst[MAX:256] := 0
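The three two-source 128-bit shuffles above differ only in element type; the bit-level behaviour is identical. The following Python sketch models it on 256-bit integers, including the zeroing effect of control bit 3. `select4` mirrors the pseudocode helper; `permute2x128` is a hypothetical name:

```python
MASK128 = (1 << 128) - 1


def select4(src1, src2, control):
    """Choose one 128-bit half from two 256-bit sources; control bit 3 forces zero."""
    halves = [src1 & MASK128, (src1 >> 128) & MASK128,
              src2 & MASK128, (src2 >> 128) & MASK128]
    tmp = halves[control & 0b11]
    if control & 0b1000:
        tmp = 0
    return tmp


def permute2x128(a, b, imm8):
    lo = select4(a, b, imm8 & 0xF)          # imm8[3:0] builds dst[127:0]
    hi = select4(a, b, (imm8 >> 4) & 0xF)   # imm8[7:4] builds dst[255:128]
    return (hi << 128) | lo


a = (0xA1 << 128) | 0xA0
b = (0xB1 << 128) | 0xB0
mixed = permute2x128(a, b, 0x21)     # low half from a's high half, high half from b's low half
zero_low = permute2x128(a, b, 0x08)  # bit 3 zeroes the low half
```

Bit 2 of each 4-bit control field is ignored, as in the pseudocode.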
Floating Point AVX Load Broadcast a single-precision (32-bit) floating-point element from memory to all elements of "dst". tmp[31:0] := MEM[mem_addr+31:mem_addr] FOR j := 0 to 7 i := j*32 dst[i+31:i] := tmp[31:0] ENDFOR dst[MAX:256] := 0
Floating Point AVX Load Swizzle Broadcast a single-precision (32-bit) floating-point element from memory to all elements of "dst". tmp[31:0] := MEM[mem_addr+31:mem_addr] FOR j := 0 to 3 i := j*32 dst[i+31:i] := tmp[31:0] ENDFOR dst[MAX:128] := 0
Floating Point AVX Load Swizzle Broadcast a double-precision (64-bit) floating-point element from memory to all elements of "dst". tmp[63:0] := MEM[mem_addr+63:mem_addr] FOR j := 0 to 3 i := j*64 dst[i+63:i] := tmp[63:0] ENDFOR dst[MAX:256] := 0
Floating Point AVX Load Swizzle Broadcast 128 bits from memory (composed of 4 packed single-precision (32-bit) floating-point elements) to all elements of "dst". tmp[127:0] := MEM[mem_addr+127:mem_addr] dst[127:0] := tmp[127:0] dst[255:128] := tmp[127:0] dst[MAX:256] := 0
Floating Point AVX Load Swizzle Broadcast 128 bits from memory (composed of 2 packed double-precision (64-bit) floating-point elements) to all elements of "dst". tmp[127:0] := MEM[mem_addr+127:mem_addr] dst[127:0] := tmp[127:0] dst[255:128] := tmp[127:0] dst[MAX:256] := 0
Floating Point AVX Swizzle Copy "a" to "dst", then insert 128 bits (composed of 4 packed single-precision (32-bit) floating-point elements) from "b" into "dst" at the location specified by "imm8". dst[255:0] := a[255:0] CASE (imm8[0]) OF 0: dst[127:0] := b[127:0] 1: dst[255:128] := b[127:0] ESAC dst[MAX:256] := 0
Floating Point AVX Swizzle Copy "a" to "dst", then insert 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from "b" into "dst" at the location specified by "imm8". dst[255:0] := a[255:0] CASE imm8[0] OF 0: dst[127:0] := b[127:0] 1: dst[255:128] := b[127:0] ESAC dst[MAX:256] := 0
Integer AVX Swizzle Copy "a" to "dst", then insert 128 bits from "b" into "dst" at the location specified by "imm8". dst[255:0] := a[255:0] CASE (imm8[0]) OF 0: dst[127:0] := b[127:0] 1: dst[255:128] := b[127:0] ESAC dst[MAX:256] := 0
Integer AVX Swizzle Copy "a" to "dst", and insert the 8-bit integer "i" into "dst" at the location specified by "index". dst[255:0] := a[255:0] sel := index[4:0]*8 dst[sel+7:sel] := i[7:0]
Integer AVX Swizzle Copy "a" to "dst", and insert the 16-bit integer "i" into "dst" at the location specified by "index". dst[255:0] := a[255:0] sel := index[3:0]*16 dst[sel+15:sel] := i[15:0]
Integer AVX Swizzle Copy "a" to "dst", and insert the 32-bit integer "i" into "dst" at the location specified by "index". dst[255:0] := a[255:0] sel := index[2:0]*32 dst[sel+31:sel] := i[31:0]
Integer AVX Swizzle Copy "a" to "dst", and insert the 64-bit integer "i" into "dst" at the location specified by "index". dst[255:0] := a[255:0] sel := index[1:0]*64 dst[sel+63:sel] := i[63:0]
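The element-insert entries above all follow one recipe: copy "a", compute a bit offset from the low bits of "index", and overwrite one lane. A minimal Python sketch on 256-bit integers; `insert_lane` is a hypothetical helper, not a real intrinsic name:

```python
def insert_lane(a: int, i: int, index: int, width: int) -> int:
    """Return `a` with the `width`-bit lane chosen by `index` replaced by the low bits of `i`."""
    lanes = 256 // width
    sel = (index & (lanes - 1)) * width        # index[4:0]*8, index[3:0]*16, ...
    lane_mask = ((1 << width) - 1) << sel
    value = (i & ((1 << width) - 1)) << sel    # i is truncated to the lane width
    return (a & ~lane_mask) | value


all_ones = (1 << 256) - 1
cleared = insert_lane(all_ones, 0, 2, 64)      # zero 64-bit lane 2, keep the rest
wrapped = insert_lane(0, 0x1FF, 33, 8)         # index 33 wraps to 1, value truncates to 0xFF
```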
Floating Point AVX Load Load 256-bits (composed of 4 packed double-precision (64-bit) floating-point elements) from memory into "dst". "mem_addr" must be aligned on a 32-byte boundary or a general-protection exception may be generated. dst[255:0] := MEM[mem_addr+255:mem_addr] dst[MAX:256] := 0
Floating Point AVX Store Store 256-bits (composed of 4 packed double-precision (64-bit) floating-point elements) from "a" into memory. "mem_addr" must be aligned on a 32-byte boundary or a general-protection exception may be generated. MEM[mem_addr+255:mem_addr] := a[255:0]
Floating Point AVX Load Load 256-bits (composed of 8 packed single-precision (32-bit) floating-point elements) from memory into "dst". "mem_addr" must be aligned on a 32-byte boundary or a general-protection exception may be generated. dst[255:0] := MEM[mem_addr+255:mem_addr] dst[MAX:256] := 0
Floating Point AVX Store Store 256-bits (composed of 8 packed single-precision (32-bit) floating-point elements) from "a" into memory. "mem_addr" must be aligned on a 32-byte boundary or a general-protection exception may be generated. MEM[mem_addr+255:mem_addr] := a[255:0]
Floating Point AVX Load Load 256-bits (composed of 4 packed double-precision (64-bit) floating-point elements) from memory into "dst". "mem_addr" does not need to be aligned on any particular boundary. dst[255:0] := MEM[mem_addr+255:mem_addr] dst[MAX:256] := 0
Floating Point AVX Store Store 256-bits (composed of 4 packed double-precision (64-bit) floating-point elements) from "a" into memory. "mem_addr" does not need to be aligned on any particular boundary. MEM[mem_addr+255:mem_addr] := a[255:0]
Floating Point AVX Load Load 256-bits (composed of 8 packed single-precision (32-bit) floating-point elements) from memory into "dst". "mem_addr" does not need to be aligned on any particular boundary. dst[255:0] := MEM[mem_addr+255:mem_addr] dst[MAX:256] := 0
Floating Point AVX Store Store 256-bits (composed of 8 packed single-precision (32-bit) floating-point elements) from "a" into memory. "mem_addr" does not need to be aligned on any particular boundary. MEM[mem_addr+255:mem_addr] := a[255:0]
Integer AVX Load Load 256-bits of integer data from memory into "dst". "mem_addr" must be aligned on a 32-byte boundary or a general-protection exception may be generated. dst[255:0] := MEM[mem_addr+255:mem_addr] dst[MAX:256] := 0
Integer AVX Store Store 256-bits of integer data from "a" into memory. "mem_addr" must be aligned on a 32-byte boundary or a general-protection exception may be generated. MEM[mem_addr+255:mem_addr] := a[255:0]
Integer AVX Load Load 256-bits of integer data from memory into "dst". "mem_addr" does not need to be aligned on any particular boundary. dst[255:0] := MEM[mem_addr+255:mem_addr] dst[MAX:256] := 0
Integer AVX Store Store 256-bits of integer data from "a" into memory. "mem_addr" does not need to be aligned on any particular boundary. MEM[mem_addr+255:mem_addr] := a[255:0]
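The aligned and unaligned load/store entries above perform the same data movement and differ only in the 32-byte alignment requirement on "mem_addr". The sketch below shows the arithmetic behind that requirement on plain integer addresses; the helper names are hypothetical and the code does not touch real memory:

```python
ALIGNMENT = 32  # bytes, the boundary named in the aligned load/store entries


def is_aligned(addr: int, boundary: int = ALIGNMENT) -> bool:
    """True when `addr` is a multiple of `boundary`."""
    return addr % boundary == 0


def align_up(addr: int, boundary: int = ALIGNMENT) -> int:
    """Round `addr` up to the next multiple of `boundary` (a power of two)."""
    return (addr + boundary - 1) & ~(boundary - 1)


ok = is_aligned(64)         # 64 is a multiple of 32
not_ok = is_aligned(72)     # 72 is not, so only the unaligned forms are safe here
next_slot = align_up(72)    # first aligned address at or above 72
```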
Floating Point AVX Load Load packed double-precision (64-bit) floating-point elements from memory into "dst" using "mask" (elements are zeroed out when the high bit of the corresponding element is not set). FOR j := 0 to 3 i := j*64 IF mask[i+63] dst[i+63:i] := MEM[mem_addr+i+63:mem_addr+i] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:256] := 0
Floating Point AVX Store Store packed double-precision (64-bit) floating-point elements from "a" into memory using "mask". FOR j := 0 to 3 i := j*64 IF mask[i+63] MEM[mem_addr+i+63:mem_addr+i] := a[i+63:i] FI ENDFOR
Floating Point AVX Load Load packed double-precision (64-bit) floating-point elements from memory into "dst" using "mask" (elements are zeroed out when the high bit of the corresponding element is not set). FOR j := 0 to 1 i := j*64 IF mask[i+63] dst[i+63:i] := MEM[mem_addr+i+63:mem_addr+i] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:128] := 0
Floating Point AVX Store Store packed double-precision (64-bit) floating-point elements from "a" into memory using "mask". FOR j := 0 to 1 i := j*64 IF mask[i+63] MEM[mem_addr+i+63:mem_addr+i] := a[i+63:i] FI ENDFOR
Floating Point AVX Load Load packed single-precision (32-bit) floating-point elements from memory into "dst" using "mask" (elements are zeroed out when the high bit of the corresponding element is not set). FOR j := 0 to 7 i := j*32 IF mask[i+31] dst[i+31:i] := MEM[mem_addr+i+31:mem_addr+i] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:256] := 0
Floating Point AVX Store Store packed single-precision (32-bit) floating-point elements from "a" into memory using "mask". FOR j := 0 to 7 i := j*32 IF mask[i+31] MEM[mem_addr+i+31:mem_addr+i] := a[i+31:i] FI ENDFOR
Floating Point AVX Load Load packed single-precision (32-bit) floating-point elements from memory into "dst" using "mask" (elements are zeroed out when the high bit of the corresponding element is not set). FOR j := 0 to 3 i := j*32 IF mask[i+31] dst[i+31:i] := MEM[mem_addr+i+31:mem_addr+i] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:128] := 0
Floating Point AVX Store Store packed single-precision (32-bit) floating-point elements from "a" into memory using "mask". FOR j := 0 to 3 i := j*32 IF mask[i+31] MEM[mem_addr+i+31:mem_addr+i] := a[i+31:i] FI ENDFOR
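In the masked load and store entries above, only the most significant bit of each mask element matters: a masked-off load lane becomes zero, and a masked-off store lane leaves memory untouched. A small Python model, assuming memory and vectors are lists of elements and mask elements are unsigned integers; all function names are hypothetical:

```python
def high_bit_set(mask_elem: int, width: int) -> bool:
    """True when the most significant bit of a `width`-bit mask element is 1."""
    return (mask_elem >> (width - 1)) & 1 == 1


def maskload(mem, mask, width=32):
    """Load mem[j] where mask[j] has its high bit set, otherwise produce 0.0."""
    return [mem[j] if high_bit_set(mask[j], width) else 0.0
            for j in range(len(mask))]


def maskstore(mem, a, mask, width=32):
    """Write a[j] to mem[j] only where mask[j] has its high bit set."""
    for j in range(len(mask)):
        if high_bit_set(mask[j], width):
            mem[j] = a[j]


loaded = maskload([1.5, 2.5, 3.5, 4.5], [0x80000000, 0, 0xFFFFFFFF, 0x7FFFFFFF])
buffer = [0.0, 0.0, 0.0, 0.0]
maskstore(buffer, [9.0, 8.0, 7.0, 6.0], [0, 0x80000000, 0, 0x80000000])
```

Note that a mask element of 0x7FFFFFFF selects nothing: every bit except the high bit is ignored.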
Floating Point AVX Move Duplicate odd-indexed single-precision (32-bit) floating-point elements from "a", and store the results in "dst". dst[31:0] := a[63:32] dst[63:32] := a[63:32] dst[95:64] := a[127:96] dst[127:96] := a[127:96] dst[159:128] := a[191:160] dst[191:160] := a[191:160] dst[223:192] := a[255:224] dst[255:224] := a[255:224] dst[MAX:256] := 0
Floating Point AVX Move Duplicate even-indexed single-precision (32-bit) floating-point elements from "a", and store the results in "dst". dst[31:0] := a[31:0] dst[63:32] := a[31:0] dst[95:64] := a[95:64] dst[127:96] := a[95:64] dst[159:128] := a[159:128] dst[191:160] := a[159:128] dst[223:192] := a[223:192] dst[255:224] := a[223:192] dst[MAX:256] := 0
Floating Point AVX Move Duplicate even-indexed double-precision (64-bit) floating-point elements from "a", and store the results in "dst". dst[63:0] := a[63:0] dst[127:64] := a[63:0] dst[191:128] := a[191:128] dst[255:192] := a[191:128] dst[MAX:256] := 0
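The three duplicate entries above copy one element of each adjacent pair into both positions. In index terms, the odd-indexed form reads element `j | 1` and the even-indexed forms read element `j & ~1`. A sketch in Python with list-of-elements vectors; the function names are hypothetical:

```python
def dup_odd(a):
    """Each destination element j takes the odd-indexed element of its pair."""
    return [a[j | 1] for j in range(len(a))]


def dup_even(a):
    """Each destination element j takes the even-indexed element of its pair."""
    return [a[j & ~1] for j in range(len(a))]


odd = dup_odd(list(range(8)))          # eight single-precision slots
even = dup_even(list(range(8)))
even_pd = dup_even([1.0, 2.0, 3.0, 4.0])  # four double-precision slots
```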
Integer AVX Load Load 256-bits of integer data from unaligned memory into "dst". This intrinsic may perform better than "_mm256_loadu_si256" when the data crosses a cache line boundary. dst[255:0] := MEM[mem_addr+255:mem_addr] dst[MAX:256] := 0
Integer AVX Store Store 256-bits of integer data from "a" into memory using a non-temporal memory hint. "mem_addr" must be aligned on a 32-byte boundary or a general-protection exception may be generated. MEM[mem_addr+255:mem_addr] := a[255:0]
Floating Point AVX Store Store 256-bits (composed of 4 packed double-precision (64-bit) floating-point elements) from "a" into memory using a non-temporal memory hint. "mem_addr" must be aligned on a 32-byte boundary or a general-protection exception may be generated. MEM[mem_addr+255:mem_addr] := a[255:0]
Floating Point AVX Store Store 256-bits (composed of 8 packed single-precision (32-bit) floating-point elements) from "a" into memory using a non-temporal memory hint. "mem_addr" must be aligned on a 32-byte boundary or a general-protection exception may be generated. MEM[mem_addr+255:mem_addr] := a[255:0]
Floating Point AVX Elementary Math Functions Compute the approximate reciprocal of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst". The maximum relative error for this approximation is less than 1.5*2^-12. FOR j := 0 to 7 i := j*32 dst[i+31:i] := 1.0 / a[i+31:i] ENDFOR dst[MAX:256] := 0
Floating Point AVX Elementary Math Functions Compute the approximate reciprocal square root of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst". The maximum relative error for this approximation is less than 1.5*2^-12. FOR j := 0 to 7 i := j*32 dst[i+31:i] := (1.0 / SQRT(a[i+31:i])) ENDFOR dst[MAX:256] := 0
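The two approximation entries above state a bound of 1.5*2^-12 on the relative error, which works out to exactly 0.0003662109375. The sketch below only checks whether a candidate value satisfies that bound against a double-precision reference; it does not reproduce the hardware approximation, and the names are hypothetical:

```python
import math

MAX_REL_ERR = 1.5 * 2.0 ** -12   # 0.0003662109375


def within_bound(exact: float, approx: float) -> bool:
    """True when `approx` is within the documented relative error of `exact`."""
    return abs(approx - exact) / abs(exact) < MAX_REL_ERR


x = 3.0
rcp_exact = 1.0 / x
rsqrt_exact = 1.0 / math.sqrt(x)

good = within_bound(rcp_exact, rcp_exact * (1 + 3e-4))     # relative error 3e-4
bad = within_bound(rcp_exact, rcp_exact * (1 + 4e-4))      # relative error 4e-4
good_rsqrt = within_bound(rsqrt_exact, rsqrt_exact * (1 - 3e-4))
```

A practical consequence is that results are trustworthy to roughly 11 bits, so code that needs full single precision typically refines the approximation or uses the exact division and square-root operations instead.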
Floating Point AVX Elementary Math Functions Compute the square root of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst". FOR j := 0 to 3 i := j*64 dst[i+63:i] := SQRT(a[i+63:i]) ENDFOR dst[MAX:256] := 0
Floating Point AVX Elementary Math Functions Compute the square root of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst". FOR j := 0 to 7 i := j*32 dst[i+31:i] := SQRT(a[i+31:i]) ENDFOR dst[MAX:256] := 0
Floating Point AVX Special Math Functions Round the packed double-precision (64-bit) floating-point elements in "a" using the "rounding" parameter, and store the results as packed double-precision floating-point elements in "dst". [round_note] FOR j := 0 to 3 i := j*64 dst[i+63:i] := ROUND(a[i+63:i], rounding) ENDFOR dst[MAX:256] := 0
Floating Point AVX Special Math Functions Round the packed single-precision (32-bit) floating-point elements in "a" using the "rounding" parameter, and store the results as packed single-precision floating-point elements in "dst". [round_note] FOR j := 0 to 7 i := j*32 dst[i+31:i] := ROUND(a[i+31:i], rounding) ENDFOR dst[MAX:256] := 0
Floating Point AVX Swizzle Unpack and interleave double-precision (64-bit) floating-point elements from the high half of each 128-bit lane in "a" and "b", and store the results in "dst". DEFINE INTERLEAVE_HIGH_QWORDS(src1[127:0], src2[127:0]) { dst[63:0] := src1[127:64] dst[127:64] := src2[127:64] RETURN dst[127:0] } dst[127:0] := INTERLEAVE_HIGH_QWORDS(a[127:0], b[127:0]) dst[255:128] := INTERLEAVE_HIGH_QWORDS(a[255:128], b[255:128]) dst[MAX:256] := 0
Floating Point AVX Swizzle Unpack and interleave single-precision (32-bit) floating-point elements from the high half of each 128-bit lane in "a" and "b", and store the results in "dst". DEFINE INTERLEAVE_HIGH_DWORDS(src1[127:0], src2[127:0]) { dst[31:0] := src1[95:64] dst[63:32] := src2[95:64] dst[95:64] := src1[127:96] dst[127:96] := src2[127:96] RETURN dst[127:0] } dst[127:0] := INTERLEAVE_HIGH_DWORDS(a[127:0], b[127:0]) dst[255:128] := INTERLEAVE_HIGH_DWORDS(a[255:128], b[255:128]) dst[MAX:256] := 0
Floating Point AVX Swizzle Unpack and interleave double-precision (64-bit) floating-point elements from the low half of each 128-bit lane in "a" and "b", and store the results in "dst". DEFINE INTERLEAVE_QWORDS(src1[127:0], src2[127:0]) { dst[63:0] := src1[63:0] dst[127:64] := src2[63:0] RETURN dst[127:0] } dst[127:0] := INTERLEAVE_QWORDS(a[127:0], b[127:0]) dst[255:128] := INTERLEAVE_QWORDS(a[255:128], b[255:128]) dst[MAX:256] := 0
Floating Point AVX Swizzle Unpack and interleave single-precision (32-bit) floating-point elements from the low half of each 128-bit lane in "a" and "b", and store the results in "dst". DEFINE INTERLEAVE_DWORDS(src1[127:0], src2[127:0]) { dst[31:0] := src1[31:0] dst[63:32] := src2[31:0] dst[95:64] := src1[63:32] dst[127:96] := src2[63:32] RETURN dst[127:0] } dst[127:0] := INTERLEAVE_DWORDS(a[127:0], b[127:0]) dst[255:128] := INTERLEAVE_DWORDS(a[255:128], b[255:128]) dst[MAX:256] := 0
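The four unpack entries above interleave elements from "a" and "b" one 128-bit lane at a time, taking either the high or the low half of each lane. One parametrised Python sketch covers all four; vectors are lists, and `unpack` with its `lane_elems` and `high` parameters is a hypothetical interface:

```python
def unpack(a, b, lane_elems, high):
    """Interleave the high or low half of each lane of `a` and `b`.

    lane_elems is 4 for single precision and 2 for double precision.
    """
    dst = []
    half = lane_elems // 2
    for base in range(0, len(a), lane_elems):
        start = base + (half if high else 0)
        for k in range(start, start + half):
            dst += [a[k], b[k]]
    return dst


a = [0, 1, 2, 3, 4, 5, 6, 7]
b = [10, 11, 12, 13, 14, 15, 16, 17]
hi_ps = unpack(a, b, 4, True)
lo_ps = unpack(a, b, 4, False)
hi_pd = unpack([1, 2, 3, 4], [5, 6, 7, 8], 2, True)
```

Because the interleave never crosses a 128-bit lane, the 256-bit result is not the same as interleaving the two full vectors end to end.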
Integer Flag AVX Logical Compute the bitwise AND of 256 bits (representing integer data) in "a" and "b", and set "ZF" to 1 if the result is zero, otherwise set "ZF" to 0. Compute the bitwise NOT of "a" and then AND with "b", and set "CF" to 1 if the result is zero, otherwise set "CF" to 0. Return the "ZF" value. IF ((a[255:0] AND b[255:0]) == 0) ZF := 1 ELSE ZF := 0 FI IF (((NOT a[255:0]) AND b[255:0]) == 0) CF := 1 ELSE CF := 0 FI RETURN ZF
Integer Flag AVX Logical Compute the bitwise AND of 256 bits (representing integer data) in "a" and "b", and set "ZF" to 1 if the result is zero, otherwise set "ZF" to 0. Compute the bitwise NOT of "a" and then AND with "b", and set "CF" to 1 if the result is zero, otherwise set "CF" to 0. Return the "CF" value. IF ((a[255:0] AND b[255:0]) == 0) ZF := 1 ELSE ZF := 0 FI IF (((NOT a[255:0]) AND b[255:0]) == 0) CF := 1 ELSE CF := 0 FI RETURN CF
Integer Flag AVX Logical Compute the bitwise AND of 256 bits (representing integer data) in "a" and "b", and set "ZF" to 1 if the result is zero, otherwise set "ZF" to 0. Compute the bitwise NOT of "a" and then AND with "b", and set "CF" to 1 if the result is zero, otherwise set "CF" to 0. Return 1 if both the "ZF" and "CF" values are zero, otherwise return 0. IF ((a[255:0] AND b[255:0]) == 0) ZF := 1 ELSE ZF := 0 FI IF (((NOT a[255:0]) AND b[255:0]) == 0) CF := 1 ELSE CF := 0 FI IF (ZF == 0 && CF == 0) dst := 1 ELSE dst := 0 FI
Floating Point Flag AVX Logical Compute the bitwise AND of 256 bits (representing double-precision (64-bit) floating-point elements) in "a" and "b", producing an intermediate 256-bit value, and set "ZF" to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set "ZF" to 0. Compute the bitwise NOT of "a" and then AND with "b", producing an intermediate value, and set "CF" to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set "CF" to 0. Return the "ZF" value. tmp[255:0] := a[255:0] AND b[255:0] IF (tmp[63] == 0 && tmp[127] == 0 && tmp[191] == 0 && tmp[255] == 0) ZF := 1 ELSE ZF := 0 FI tmp[255:0] := (NOT a[255:0]) AND b[255:0] IF (tmp[63] == 0 && tmp[127] == 0 && tmp[191] == 0 && tmp[255] == 0) CF := 1 ELSE CF := 0 FI dst := ZF
Floating Point Flag AVX Logical Compute the bitwise AND of 256 bits (representing double-precision (64-bit) floating-point elements) in "a" and "b", producing an intermediate 256-bit value, and set "ZF" to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set "ZF" to 0. Compute the bitwise NOT of "a" and then AND with "b", producing an intermediate value, and set "CF" to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set "CF" to 0. Return the "CF" value. tmp[255:0] := a[255:0] AND b[255:0] IF (tmp[63] == 0 && tmp[127] == 0 && tmp[191] == 0 && tmp[255] == 0) ZF := 1 ELSE ZF := 0 FI tmp[255:0] := (NOT a[255:0]) AND b[255:0] IF (tmp[63] == 0 && tmp[127] == 0 && tmp[191] == 0 && tmp[255] == 0) CF := 1 ELSE CF := 0 FI dst := CF
Floating Point Flag AVX Logical Compute the bitwise AND of 256 bits (representing double-precision (64-bit) floating-point elements) in "a" and "b", producing an intermediate 256-bit value, and set "ZF" to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set "ZF" to 0. Compute the bitwise NOT of "a" and then AND with "b", producing an intermediate value, and set "CF" to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set "CF" to 0. Return 1 if both the "ZF" and "CF" values are zero, otherwise return 0. tmp[255:0] := a[255:0] AND b[255:0] IF (tmp[63] == 0 && tmp[127] == 0 && tmp[191] == 0 && tmp[255] == 0) ZF := 1 ELSE ZF := 0 FI tmp[255:0] := (NOT a[255:0]) AND b[255:0] IF (tmp[63] == 0 && tmp[127] == 0 && tmp[191] == 0 && tmp[255] == 0) CF := 1 ELSE CF := 0 FI IF (ZF == 0 && CF == 0) dst := 1 ELSE dst := 0 FI
Floating Point Flag AVX Logical Compute the bitwise AND of 128 bits (representing double-precision (64-bit) floating-point elements) in "a" and "b", producing an intermediate 128-bit value, and set "ZF" to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set "ZF" to 0. Compute the bitwise NOT of "a" and then AND with "b", producing an intermediate value, and set "CF" to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set "CF" to 0. Return the "ZF" value. tmp[127:0] := a[127:0] AND b[127:0] IF (tmp[63] == 0 && tmp[127] == 0) ZF := 1 ELSE ZF := 0 FI tmp[127:0] := (NOT a[127:0]) AND b[127:0] IF (tmp[63] == 0 && tmp[127] == 0) CF := 1 ELSE CF := 0 FI dst := ZF
Floating Point Flag AVX Logical Compute the bitwise AND of 128 bits (representing double-precision (64-bit) floating-point elements) in "a" and "b", producing an intermediate 128-bit value, and set "ZF" to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set "ZF" to 0. Compute the bitwise NOT of "a" and then AND with "b", producing an intermediate value, and set "CF" to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set "CF" to 0. Return the "CF" value. tmp[127:0] := a[127:0] AND b[127:0] IF (tmp[63] == 0 && tmp[127] == 0) ZF := 1 ELSE ZF := 0 FI tmp[127:0] := (NOT a[127:0]) AND b[127:0] IF (tmp[63] == 0 && tmp[127] == 0) CF := 1 ELSE CF := 0 FI dst := CF
Floating Point Flag AVX Logical Compute the bitwise AND of 128 bits (representing double-precision (64-bit) floating-point elements) in "a" and "b", producing an intermediate 128-bit value, and set "ZF" to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set "ZF" to 0. Compute the bitwise NOT of "a" and then AND with "b", producing an intermediate value, and set "CF" to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set "CF" to 0. Return 1 if both the "ZF" and "CF" values are zero, otherwise return 0. tmp[127:0] := a[127:0] AND b[127:0] IF (tmp[63] == 0 && tmp[127] == 0) ZF := 1 ELSE ZF := 0 FI tmp[127:0] := (NOT a[127:0]) AND b[127:0] IF (tmp[63] == 0 && tmp[127] == 0) CF := 1 ELSE CF := 0 FI IF (ZF == 0 && CF == 0) dst := 1 ELSE dst := 0 FI
Floating Point Flag AVX Logical Compute the bitwise AND of 256 bits (representing single-precision (32-bit) floating-point elements) in "a" and "b", producing an intermediate 256-bit value, and set "ZF" to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set "ZF" to 0. Compute the bitwise NOT of "a" and then AND with "b", producing an intermediate value, and set "CF" to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set "CF" to 0. Return the "ZF" value. tmp[255:0] := a[255:0] AND b[255:0] IF (tmp[31] == 0 && tmp[63] == 0 && tmp[95] == 0 && tmp[127] == 0 && tmp[159] == 0 && tmp[191] == 0 && tmp[223] == 0 && tmp[255] == 0) ZF := 1 ELSE ZF := 0 FI tmp[255:0] := (NOT a[255:0]) AND b[255:0] IF (tmp[31] == 0 && tmp[63] == 0 && tmp[95] == 0 && tmp[127] == 0 && tmp[159] == 0 && tmp[191] == 0 && tmp[223] == 0 && tmp[255] == 0) CF := 1 ELSE CF := 0 FI dst := ZF
Floating Point Flag AVX Logical Compute the bitwise AND of 256 bits (representing single-precision (32-bit) floating-point elements) in "a" and "b", producing an intermediate 256-bit value, and set "ZF" to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set "ZF" to 0. Compute the bitwise NOT of "a" and then AND with "b", producing an intermediate value, and set "CF" to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set "CF" to 0. Return the "CF" value. tmp[255:0] := a[255:0] AND b[255:0] IF (tmp[31] == 0 && tmp[63] == 0 && tmp[95] == 0 && tmp[127] == 0 && tmp[159] == 0 && tmp[191] == 0 && tmp[223] == 0 && tmp[255] == 0) ZF := 1 ELSE ZF := 0 FI tmp[255:0] := (NOT a[255:0]) AND b[255:0] IF (tmp[31] == 0 && tmp[63] == 0 && tmp[95] == 0 && tmp[127] == 0 && tmp[159] == 0 && tmp[191] == 0 && tmp[223] == 0 && tmp[255] == 0) CF := 1 ELSE CF := 0 FI dst := CF
Floating Point Flag AVX Logical Compute the bitwise AND of 256 bits (representing single-precision (32-bit) floating-point elements) in "a" and "b", producing an intermediate 256-bit value, and set "ZF" to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set "ZF" to 0. Compute the bitwise NOT of "a" and then AND with "b", producing an intermediate value, and set "CF" to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set "CF" to 0. Return 1 if both the "ZF" and "CF" values are zero, otherwise return 0. tmp[255:0] := a[255:0] AND b[255:0] IF (tmp[31] == 0 && tmp[63] == 0 && tmp[95] == 0 && tmp[127] == 0 && tmp[159] == 0 && tmp[191] == 0 && tmp[223] == 0 && tmp[255] == 0) ZF := 1 ELSE ZF := 0 FI tmp[255:0] := (NOT a[255:0]) AND b[255:0] IF (tmp[31] == 0 && tmp[63] == 0 && tmp[95] == 0 && tmp[127] == 0 && tmp[159] == 0 && tmp[191] == 0 && tmp[223] == 0 && tmp[255] == 0) CF := 1 ELSE CF := 0 FI IF (ZF == 0 && CF == 0) dst := 1 ELSE dst := 0 FI
Floating Point Flag AVX Logical Compute the bitwise AND of 128 bits (representing single-precision (32-bit) floating-point elements) in "a" and "b", producing an intermediate 128-bit value, and set "ZF" to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set "ZF" to 0. Compute the bitwise NOT of "a" and then AND with "b", producing an intermediate value, and set "CF" to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set "CF" to 0. Return the "ZF" value. tmp[127:0] := a[127:0] AND b[127:0] IF (tmp[31] == 0 && tmp[63] == 0 && tmp[95] == 0 && tmp[127] == 0) ZF := 1 ELSE ZF := 0 FI tmp[127:0] := (NOT a[127:0]) AND b[127:0] IF (tmp[31] == 0 && tmp[63] == 0 && tmp[95] == 0 && tmp[127] == 0) CF := 1 ELSE CF := 0 FI dst := ZF
Floating Point Flag AVX Logical Compute the bitwise AND of 128 bits (representing single-precision (32-bit) floating-point elements) in "a" and "b", producing an intermediate 128-bit value, and set "ZF" to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set "ZF" to 0. Compute the bitwise NOT of "a" and then AND with "b", producing an intermediate value, and set "CF" to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set "CF" to 0. Return the "CF" value. tmp[127:0] := a[127:0] AND b[127:0] IF (tmp[31] == 0 && tmp[63] == 0 && tmp[95] == 0 && tmp[127] == 0) ZF := 1 ELSE ZF := 0 FI tmp[127:0] := (NOT a[127:0]) AND b[127:0] IF (tmp[31] == 0 && tmp[63] == 0 && tmp[95] == 0 && tmp[127] == 0) CF := 1 ELSE CF := 0 FI dst := CF
Floating Point Flag AVX Logical Compute the bitwise AND of 128 bits (representing single-precision (32-bit) floating-point elements) in "a" and "b", producing an intermediate 128-bit value, and set "ZF" to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set "ZF" to 0. Compute the bitwise NOT of "a" and then AND with "b", producing an intermediate value, and set "CF" to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set "CF" to 0. Return 1 if both the "ZF" and "CF" values are zero, otherwise return 0. tmp[127:0] := a[127:0] AND b[127:0] IF (tmp[31] == 0 && tmp[63] == 0 && tmp[95] == 0 && tmp[127] == 0) ZF := 1 ELSE ZF := 0 FI tmp[127:0] := (NOT a[127:0]) AND b[127:0] IF (tmp[31] == 0 && tmp[63] == 0 && tmp[95] == 0 && tmp[127] == 0) CF := 1 ELSE CF := 0 FI IF (ZF == 0 && CF == 0) dst := 1 ELSE dst := 0 FI
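Unlike the integer tests, the floating-point test entries above inspect only the sign bit of each element in the intermediate value, so non-sign bits never influence ZF or CF. The Python sketch below models that on raw bit patterns, with the element width and count passed in; `sign_bits` and `sign_test_flags` are hypothetical names:

```python
def sign_bits(x: int, width: int, lanes: int):
    """Return the most significant bit of each `width`-bit element of `x`."""
    return [(x >> (width * (k + 1) - 1)) & 1 for k in range(lanes)]


def sign_test_flags(a: int, b: int, width: int, lanes: int):
    """Return (ZF, CF) computed from element sign bits only."""
    full = (1 << (width * lanes)) - 1
    zf = int(not any(sign_bits(a & b, width, lanes)))
    cf = int(not any(sign_bits(~a & b & full, width, lanes)))
    return zf, cf


SIGN = 0x80000000  # sign bit of one 32-bit element

different_lanes = sign_test_flags(SIGN, SIGN << 32, 32, 4)
same_lane = sign_test_flags(SIGN, SIGN, 32, 4)
no_signs = sign_test_flags(0x7FFFFFFF, 0x7FFFFFFF, 32, 4)
```

The last call has a nonzero AND result yet still yields ZF = 1, which is the behavioural difference from the 256-bit integer test.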
Floating Point AVX Miscellaneous Set each bit of mask "dst" based on the most significant bit of the corresponding packed double-precision (64-bit) floating-point element in "a". FOR j := 0 to 3 i := j*64 IF a[i+63] dst[j] := 1 ELSE dst[j] := 0 FI ENDFOR dst[MAX:4] := 0
Floating Point AVX Miscellaneous Set each bit of mask "dst" based on the most significant bit of the corresponding packed single-precision (32-bit) floating-point element in "a". FOR j := 0 to 7 i := j*32 IF a[i+31] dst[j] := 1 ELSE dst[j] := 0 FI ENDFOR dst[MAX:8] := 0
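The two mask entries above gather one sign bit per element into the low bits of an integer. A minimal Python sketch, assuming IEEE 754 encodings obtained through the standard `struct` module; the function names are hypothetical:

```python
import struct


def movemask_pd(a):
    """Collect the sign bit of each double in `a` into bit j of the result."""
    dst = 0
    for j, x in enumerate(a):
        bits = struct.unpack("<Q", struct.pack("<d", x))[0]
        dst |= (bits >> 63) << j
    return dst


def movemask_ps(a):
    """Collect the sign bit of each single-precision value in `a` into bit j of the result."""
    dst = 0
    for j, x in enumerate(a):
        bits = struct.unpack("<I", struct.pack("<f", x))[0]
        dst |= (bits >> 31) << j
    return dst


mask_pd = movemask_pd([-1.0, 2.0, -0.0, 0.0])   # negative zero also has its sign bit set
mask_ps = movemask_ps([-1.0] * 8)
```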
Floating Point AVX Set Return vector of type __m256d with all elements set to zero. dst[MAX:0] := 0
Floating Point AVX Set Return vector of type __m256 with all elements set to zero. dst[MAX:0] := 0
Integer AVX Set Return vector of type __m256i with all elements set to zero. dst[MAX:0] := 0
Floating Point AVX Set Set packed double-precision (64-bit) floating-point elements in "dst" with the supplied values. dst[63:0] := e0 dst[127:64] := e1 dst[191:128] := e2 dst[255:192] := e3 dst[MAX:256] := 0
Floating Point AVX Set Set packed single-precision (32-bit) floating-point elements in "dst" with the supplied values. dst[31:0] := e0 dst[63:32] := e1 dst[95:64] := e2 dst[127:96] := e3 dst[159:128] := e4 dst[191:160] := e5 dst[223:192] := e6 dst[255:224] := e7 dst[MAX:256] := 0
Integer AVX Set Set packed 8-bit integers in "dst" with the supplied values. dst[7:0] := e0 dst[15:8] := e1 dst[23:16] := e2 dst[31:24] := e3 dst[39:32] := e4 dst[47:40] := e5 dst[55:48] := e6 dst[63:56] := e7 dst[71:64] := e8 dst[79:72] := e9 dst[87:80] := e10 dst[95:88] := e11 dst[103:96] := e12 dst[111:104] := e13 dst[119:112] := e14 dst[127:120] := e15 dst[135:128] := e16 dst[143:136] := e17 dst[151:144] := e18 dst[159:152] := e19 dst[167:160] := e20 dst[175:168] := e21 dst[183:176] := e22 dst[191:184] := e23 dst[199:192] := e24 dst[207:200] := e25 dst[215:208] := e26 dst[223:216] := e27 dst[231:224] := e28 dst[239:232] := e29 dst[247:240] := e30 dst[255:248] := e31 dst[MAX:256] := 0
Integer AVX Set Set packed 16-bit integers in "dst" with the supplied values. dst[15:0] := e0 dst[31:16] := e1 dst[47:32] := e2 dst[63:48] := e3 dst[79:64] := e4 dst[95:80] := e5 dst[111:96] := e6 dst[127:112] := e7 dst[143:128] := e8 dst[159:144] := e9 dst[175:160] := e10 dst[191:176] := e11 dst[207:192] := e12 dst[223:208] := e13 dst[239:224] := e14 dst[255:240] := e15 dst[MAX:256] := 0
Integer AVX Set Set packed 32-bit integers in "dst" with the supplied values. dst[31:0] := e0 dst[63:32] := e1 dst[95:64] := e2 dst[127:96] := e3 dst[159:128] := e4 dst[191:160] := e5 dst[223:192] := e6 dst[255:224] := e7 dst[MAX:256] := 0
Integer AVX Set Set packed 64-bit integers in "dst" with the supplied values. dst[63:0] := e0 dst[127:64] := e1 dst[191:128] := e2 dst[255:192] := e3 dst[MAX:256] := 0
Floating Point AVX Set Set packed double-precision (64-bit) floating-point elements in "dst" with the supplied values in reverse order. dst[63:0] := e3 dst[127:64] := e2 dst[191:128] := e1 dst[255:192] := e0 dst[MAX:256] := 0
Floating Point AVX Set Set packed single-precision (32-bit) floating-point elements in "dst" with the supplied values in reverse order. dst[31:0] := e7 dst[63:32] := e6 dst[95:64] := e5 dst[127:96] := e4 dst[159:128] := e3 dst[191:160] := e2 dst[223:192] := e1 dst[255:224] := e0 dst[MAX:256] := 0
Integer AVX Set Set packed 8-bit integers in "dst" with the supplied values in reverse order. dst[7:0] := e31 dst[15:8] := e30 dst[23:16] := e29 dst[31:24] := e28 dst[39:32] := e27 dst[47:40] := e26 dst[55:48] := e25 dst[63:56] := e24 dst[71:64] := e23 dst[79:72] := e22 dst[87:80] := e21 dst[95:88] := e20 dst[103:96] := e19 dst[111:104] := e18 dst[119:112] := e17 dst[127:120] := e16 dst[135:128] := e15 dst[143:136] := e14 dst[151:144] := e13 dst[159:152] := e12 dst[167:160] := e11 dst[175:168] := e10 dst[183:176] := e9 dst[191:184] := e8 dst[199:192] := e7 dst[207:200] := e6 dst[215:208] := e5 dst[223:216] := e4 dst[231:224] := e3 dst[239:232] := e2 dst[247:240] := e1 dst[255:248] := e0 dst[MAX:256] := 0
Integer AVX Set Set packed 16-bit integers in "dst" with the supplied values in reverse order. dst[15:0] := e15 dst[31:16] := e14 dst[47:32] := e13 dst[63:48] := e12 dst[79:64] := e11 dst[95:80] := e10 dst[111:96] := e9 dst[127:112] := e8 dst[143:128] := e7 dst[159:144] := e6 dst[175:160] := e5 dst[191:176] := e4 dst[207:192] := e3 dst[223:208] := e2 dst[239:224] := e1 dst[255:240] := e0 dst[MAX:256] := 0
Integer AVX Set Set packed 32-bit integers in "dst" with the supplied values in reverse order. dst[31:0] := e7 dst[63:32] := e6 dst[95:64] := e5 dst[127:96] := e4 dst[159:128] := e3 dst[191:160] := e2 dst[223:192] := e1 dst[255:224] := e0 dst[MAX:256] := 0
Integer AVX Set Set packed 64-bit integers in "dst" with the supplied values in reverse order. dst[63:0] := e3 dst[127:64] := e2 dst[191:128] := e1 dst[255:192] := e0 dst[MAX:256] := 0
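The difference between the plain and the reverse-order "Set" entries above is easiest to see in a scalar model. The sketch below covers the four-lane 64-bit case; the function names are hypothetical, and the parameter order (e3 first, e0 last) is an assumption, since the entries list only the pseudocode and not the function signatures.

```c
#include <stdint.h>

/* out[0] models dst[63:0] (lowest lane), out[3] models dst[255:192]. */

/* Plain order: the last argument lands in the lowest lane. */
void set_epi64_model(int64_t out[4], int64_t e3, int64_t e2, int64_t e1, int64_t e0) {
    out[0] = e0;
    out[1] = e1;
    out[2] = e2;
    out[3] = e3;
}

/* Reverse order: the first argument lands in the lowest lane. */
void setr_epi64_model(int64_t out[4], int64_t e3, int64_t e2, int64_t e1, int64_t e0) {
    out[0] = e3;
    out[1] = e2;
    out[2] = e1;
    out[3] = e0;
}
```

Under that assumption, the reverse-order form places the arguments in memory order, which is usually the more natural choice when the values are written out like an array initializer.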
Floating Point AVX Set Broadcast double-precision (64-bit) floating-point value "a" to all elements of "dst". FOR j := 0 to 3 i := j*64 dst[i+63:i] := a[63:0] ENDFOR dst[MAX:256] := 0
Floating Point AVX Set Broadcast single-precision (32-bit) floating-point value "a" to all elements of "dst". FOR j := 0 to 7 i := j*32 dst[i+31:i] := a[31:0] ENDFOR dst[MAX:256] := 0
Integer AVX Set Broadcast 8-bit integer "a" to all elements of "dst". This intrinsic may generate the "vpbroadcastb" instruction. FOR j := 0 to 31 i := j*8 dst[i+7:i] := a[7:0] ENDFOR dst[MAX:256] := 0
Integer AVX Set Broadcast 16-bit integer "a" to all elements of "dst". This intrinsic may generate the "vpbroadcastw" instruction. FOR j := 0 to 15 i := j*16 dst[i+15:i] := a[15:0] ENDFOR dst[MAX:256] := 0
Integer AVX Set Broadcast 32-bit integer "a" to all elements of "dst". This intrinsic may generate the "vpbroadcastd" instruction. FOR j := 0 to 7 i := j*32 dst[i+31:i] := a[31:0] ENDFOR dst[MAX:256] := 0
Integer AVX Set Broadcast 64-bit integer "a" to all elements of "dst". This intrinsic may generate the "vpbroadcastq" instruction. FOR j := 0 to 3 i := j*64 dst[i+63:i] := a[63:0] ENDFOR dst[MAX:256] := 0
Floating Point AVX Cast Cast vector of type __m256d to type __m256. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
Floating Point AVX Cast Cast vector of type __m256 to type __m256d. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
Floating Point Integer AVX Cast Cast vector of type __m256 to type __m256i. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
Floating Point Integer AVX Cast Cast vector of type __m256d to type __m256i. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
Floating Point AVX Cast Cast vector of type __m256i to type __m256. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
Floating Point AVX Cast Cast vector of type __m256i to type __m256d. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
Floating Point AVX Cast Cast vector of type __m256 to type __m128. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
Floating Point AVX Cast Cast vector of type __m256d to type __m128d. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
Integer AVX Cast Cast vector of type __m256i to type __m128i. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
Floating Point AVX Cast Cast vector of type __m128 to type __m256; the upper 128 bits of the result are undefined. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
Floating Point AVX Cast Cast vector of type __m128d to type __m256d; the upper 128 bits of the result are undefined. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
Integer AVX Cast Cast vector of type __m128i to type __m256i; the upper 128 bits of the result are undefined. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
Floating Point AVX Cast Cast vector of type __m128 to type __m256; the upper 128 bits of the result are zeroed. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
Floating Point AVX Cast Cast vector of type __m128d to type __m256d; the upper 128 bits of the result are zeroed. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
Integer AVX Cast Cast vector of type __m128i to type __m256i; the upper 128 bits of the result are zeroed. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
Floating Point AVX Special Math Functions Round the packed single-precision (32-bit) floating-point elements in "a" down to an integer value, and store the results as packed single-precision floating-point elements in "dst". FOR j := 0 to 7 i := j*32 dst[i+31:i] := FLOOR(a[i+31:i]) ENDFOR dst[MAX:256] := 0
Floating Point AVX Special Math Functions Round the packed single-precision (32-bit) floating-point elements in "a" up to an integer value, and store the results as packed single-precision floating-point elements in "dst". FOR j := 0 to 7 i := j*32 dst[i+31:i] := CEIL(a[i+31:i]) ENDFOR dst[MAX:256] := 0
Floating Point AVX Special Math Functions Round the packed double-precision (64-bit) floating-point elements in "a" down to an integer value, and store the results as packed double-precision floating-point elements in "dst". FOR j := 0 to 3 i := j*64 dst[i+63:i] := FLOOR(a[i+63:i]) ENDFOR dst[MAX:256] := 0
Floating Point AVX Special Math Functions Round the packed double-precision (64-bit) floating-point elements in "a" up to an integer value, and store the results as packed double-precision floating-point elements in "dst". FOR j := 0 to 3 i := j*64 dst[i+63:i] := CEIL(a[i+63:i]) ENDFOR dst[MAX:256] := 0
Floating Point AVX General Support Return vector of type __m256 with undefined elements.
Floating Point AVX General Support Return vector of type __m256d with undefined elements.
Integer AVX General Support Return vector of type __m256i with undefined elements.
Floating Point AVX Set Set packed __m256 vector "dst" with the supplied values. dst[127:0] := lo[127:0] dst[255:128] := hi[127:0] dst[MAX:256] := 0
Floating Point AVX Set Set packed __m256d vector "dst" with the supplied values. dst[127:0] := lo[127:0] dst[255:128] := hi[127:0] dst[MAX:256] := 0
Integer AVX Set Set packed __m256i vector "dst" with the supplied values. dst[127:0] := lo[127:0] dst[255:128] := hi[127:0] dst[MAX:256] := 0
Floating Point AVX Set Set packed __m256 vector "dst" with the supplied values. dst[127:0] := lo[127:0] dst[255:128] := hi[127:0] dst[MAX:256] := 0
Floating Point AVX Set Set packed __m256d vector "dst" with the supplied values. dst[127:0] := lo[127:0] dst[255:128] := hi[127:0] dst[MAX:256] := 0
Integer AVX Set Set packed __m256i vector "dst" with the supplied values. dst[127:0] := lo[127:0] dst[255:128] := hi[127:0] dst[MAX:256] := 0
Floating Point AVX Load Load two 128-bit values (composed of 4 packed single-precision (32-bit) floating-point elements) from memory, and combine them into a 256-bit value in "dst". "hiaddr" and "loaddr" do not need to be aligned on any particular boundary. dst[127:0] := MEM[loaddr+127:loaddr] dst[255:128] := MEM[hiaddr+127:hiaddr] dst[MAX:256] := 0
Floating Point AVX Load Load two 128-bit values (composed of 2 packed double-precision (64-bit) floating-point elements) from memory, and combine them into a 256-bit value in "dst". "hiaddr" and "loaddr" do not need to be aligned on any particular boundary. dst[127:0] := MEM[loaddr+127:loaddr] dst[255:128] := MEM[hiaddr+127:hiaddr] dst[MAX:256] := 0
Integer AVX Load Load two 128-bit values (composed of integer data) from memory, and combine them into a 256-bit value in "dst". "hiaddr" and "loaddr" do not need to be aligned on any particular boundary. dst[127:0] := MEM[loaddr+127:loaddr] dst[255:128] := MEM[hiaddr+127:hiaddr] dst[MAX:256] := 0
Floating Point AVX Store Store the high and low 128-bit halves (each composed of 4 packed single-precision (32-bit) floating-point elements) from "a" into two different 128-bit memory locations. "hiaddr" and "loaddr" do not need to be aligned on any particular boundary. MEM[loaddr+127:loaddr] := a[127:0] MEM[hiaddr+127:hiaddr] := a[255:128]
Floating Point AVX Store Store the high and low 128-bit halves (each composed of 2 packed double-precision (64-bit) floating-point elements) from "a" into two different 128-bit memory locations. "hiaddr" and "loaddr" do not need to be aligned on any particular boundary. MEM[loaddr+127:loaddr] := a[127:0] MEM[hiaddr+127:hiaddr] := a[255:128]
Integer AVX Store Store the high and low 128-bit halves (each composed of integer data) from "a" into two different 128-bit memory locations. "hiaddr" and "loaddr" do not need to be aligned on any particular boundary. MEM[loaddr+127:loaddr] := a[127:0] MEM[hiaddr+127:hiaddr] := a[255:128]
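The split load and store entries above amount to two independent 16-byte copies. A minimal scalar sketch, assuming a hypothetical 32-byte container type `vec256_model` in place of the real vector types and hypothetical function names:

```c
#include <string.h>

/* Hypothetical stand-in for a 256-bit register: bytes[0..15] model bits 127:0. */
typedef struct {
    unsigned char bytes[32];
} vec256_model;

/* Load the low half from loaddr and the high half from hiaddr. */
vec256_model loadu2_model(const void *hiaddr, const void *loaddr) {
    vec256_model dst;
    memcpy(dst.bytes, loaddr, 16);        /* dst[127:0]   := MEM[loaddr] */
    memcpy(dst.bytes + 16, hiaddr, 16);   /* dst[255:128] := MEM[hiaddr] */
    return dst;
}

/* Store the low half to loaddr and the high half to hiaddr. */
void storeu2_model(void *hiaddr, void *loaddr, vec256_model a) {
    memcpy(loaddr, a.bytes, 16);          /* MEM[loaddr] := a[127:0]   */
    memcpy(hiaddr, a.bytes + 16, 16);     /* MEM[hiaddr] := a[255:128] */
}
```

Using `memcpy` mirrors the statement that neither address needs any particular alignment, and the two halves may come from unrelated buffers.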
Floating Point AVX Trigonometry Compute the inverse cosine of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst". FOR j := 0 to 3 i := j*64 dst[i+63:i] := ACOS(a[i+63:i]) ENDFOR dst[MAX:256] := 0
Floating Point AVX Trigonometry Compute the inverse cosine of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" expressed in radians. FOR j := 0 to 7 i := j*32 dst[i+31:i] := ACOS(a[i+31:i]) ENDFOR dst[MAX:256] := 0
Floating Point AVX Trigonometry Compute the inverse hyperbolic cosine of packed double-precision (64-bit) floating-point elements in "a" expressed in radians, and store the results in "dst". FOR j := 0 to 3 i := j*64 dst[i+63:i] := ACOSH(a[i+63:i]) ENDFOR dst[MAX:256] := 0
Floating Point AVX Trigonometry Compute the inverse hyperbolic cosine of packed single-precision (32-bit) floating-point elements in "a" expressed in radians, and store the results in "dst". FOR j := 0 to 7 i := j*32 dst[i+31:i] := ACOSH(a[i+31:i]) ENDFOR dst[MAX:256] := 0
Floating Point AVX Trigonometry Compute the inverse sine of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" expressed in radians. FOR j := 0 to 3 i := j*64 dst[i+63:i] := ASIN(a[i+63:i]) ENDFOR dst[MAX:256] := 0
Floating Point AVX Trigonometry Compute the inverse sine of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" expressed in radians. FOR j := 0 to 7 i := j*32 dst[i+31:i] := ASIN(a[i+31:i]) ENDFOR dst[MAX:256] := 0
Floating Point AVX Trigonometry Compute the inverse hyperbolic sine of packed double-precision (64-bit) floating-point elements in "a" expressed in radians, and store the results in "dst". FOR j := 0 to 3 i := j*64 dst[i+63:i] := ASINH(a[i+63:i]) ENDFOR dst[MAX:256] := 0
Floating Point AVX Trigonometry Compute the inverse hyperbolic sine of packed single-precision (32-bit) floating-point elements in "a" expressed in radians, and store the results in "dst". FOR j := 0 to 7 i := j*32 dst[i+31:i] := ASINH(a[i+31:i]) ENDFOR dst[MAX:256] := 0
Floating Point AVX Trigonometry Compute the inverse tangent of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" expressed in radians. FOR j := 0 to 3 i := j*64 dst[i+63:i] := ATAN(a[i+63:i]) ENDFOR dst[MAX:256] := 0
Floating Point AVX Trigonometry Compute the inverse tangent of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" expressed in radians. FOR j := 0 to 7 i := j*32 dst[i+31:i] := ATAN(a[i+31:i]) ENDFOR dst[MAX:256] := 0
Floating Point AVX Trigonometry Compute the inverse tangent of packed double-precision (64-bit) floating-point elements in "a" divided by packed elements in "b", and store the results in "dst" expressed in radians. FOR j := 0 to 3 i := j*64 dst[i+63:i] := ATAN2(a[i+63:i], b[i+63:i]) ENDFOR dst[MAX:256] := 0
Floating Point AVX Trigonometry Compute the inverse tangent of packed single-precision (32-bit) floating-point elements in "a" divided by packed elements in "b", and store the results in "dst" expressed in radians. FOR j := 0 to 7 i := j*32 dst[i+31:i] := ATAN2(a[i+31:i], b[i+31:i]) ENDFOR dst[MAX:256] := 0
Floating Point AVX Trigonometry Compute the inverse hyperbolic tangent of packed double-precision (64-bit) floating-point elements in "a" expressed in radians, and store the results in "dst". FOR j := 0 to 3 i := j*64 dst[i+63:i] := ATANH(a[i+63:i]) ENDFOR dst[MAX:256] := 0
Floating Point AVX Trigonometry Compute the inverse hyperbolic tangent of packed single-precision (32-bit) floating-point elements in "a" expressed in radians, and store the results in "dst". FOR j := 0 to 7 i := j*32 dst[i+31:i] := ATANH(a[i+31:i]) ENDFOR dst[MAX:256] := 0
Floating Point AVX Elementary Math Functions Compute the cube root of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst". FOR j := 0 to 3 i := j*64 dst[i+63:i] := CubeRoot(a[i+63:i]) ENDFOR dst[MAX:256] := 0
Floating Point AVX Elementary Math Functions Compute the cube root of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst". FOR j := 0 to 7 i := j*32 dst[i+31:i] := CubeRoot(a[i+31:i]) ENDFOR dst[MAX:256] := 0
Floating Point AVX Probability/Statistics Compute the cumulative distribution function of packed double-precision (64-bit) floating-point elements in "a" using the normal distribution, and store the results in "dst". FOR j := 0 to 3 i := j*64 dst[i+63:i] := CDFNormal(a[i+63:i]) ENDFOR dst[MAX:256] := 0
Floating Point AVX Probability/Statistics Compute the cumulative distribution function of packed single-precision (32-bit) floating-point elements in "a" using the normal distribution, and store the results in "dst". FOR j := 0 to 7 i := j*32 dst[i+31:i] := CDFNormal(a[i+31:i]) ENDFOR dst[MAX:256] := 0
Floating Point AVX Probability/Statistics Compute the inverse cumulative distribution function of packed double-precision (64-bit) floating-point elements in "a" using the normal distribution, and store the results in "dst". FOR j := 0 to 3 i := j*64 dst[i+63:i] := InverseCDFNormal(a[i+63:i]) ENDFOR dst[MAX:256] := 0
Floating Point AVX Probability/Statistics Compute the inverse cumulative distribution function of packed single-precision (32-bit) floating-point elements in "a" using the normal distribution, and store the results in "dst". FOR j := 0 to 7 i := j*32 dst[i+31:i] := InverseCDFNormal(a[i+31:i]) ENDFOR dst[MAX:256] := 0
Floating Point AVX Elementary Math Functions Compute the exponential value of "e" raised to the power of packed complex numbers in "a", and store the complex results in "dst". Each complex number is composed of two adjacent single-precision (32-bit) floating-point elements, which defines the complex number "complex = vec.fp32[0] + i * vec.fp32[1]". DEFINE CEXP(a[31:0], b[31:0]) { result[31:0] := POW(FP32(e), a[31:0]) * COS(b[31:0]) result[63:32] := POW(FP32(e), a[31:0]) * SIN(b[31:0]) RETURN result } FOR j := 0 to 3 i := j*64 dst[i+63:i] := CEXP(a[i+31:i], a[i+63:i+32]) ENDFOR dst[MAX:256] := 0
Floating Point AVX Elementary Math Functions Compute the natural logarithm of packed complex numbers in "a", and store the complex results in "dst". Each complex number is composed of two adjacent single-precision (32-bit) floating-point elements, which defines the complex number "complex = vec.fp32[0] + i * vec.fp32[1]". DEFINE CLOG(a[31:0], b[31:0]) { result[31:0] := LOG(SQRT(POW(a, 2.0) + POW(b, 2.0))) result[63:32] := ATAN2(b, a) RETURN result } FOR j := 0 to 3 i := j*64 dst[i+63:i] := CLOG(a[i+31:i], a[i+63:i+32]) ENDFOR dst[MAX:256] := 0
Floating Point AVX Trigonometry Compute the cosine of packed double-precision (64-bit) floating-point elements in "a" expressed in radians, and store the results in "dst". FOR j := 0 to 3 i := j*64 dst[i+63:i] := COS(a[i+63:i]) ENDFOR dst[MAX:256] := 0
Floating Point AVX Trigonometry Compute the cosine of packed single-precision (32-bit) floating-point elements in "a" expressed in radians, and store the results in "dst". FOR j := 0 to 7 i := j*32 dst[i+31:i] := COS(a[i+31:i]) ENDFOR dst[MAX:256] := 0
Floating Point AVX Trigonometry Compute the cosine of packed double-precision (64-bit) floating-point elements in "a" expressed in degrees, and store the results in "dst". FOR j := 0 to 3 i := j*64 dst[i+63:i] := COSD(a[i+63:i]) ENDFOR dst[MAX:256] := 0
Floating Point AVX Trigonometry Compute the cosine of packed single-precision (32-bit) floating-point elements in "a" expressed in degrees, and store the results in "dst". FOR j := 0 to 7 i := j*32 dst[i+31:i] := COSD(a[i+31:i]) ENDFOR dst[MAX:256] := 0
Floating Point AVX Trigonometry Compute the hyperbolic cosine of packed double-precision (64-bit) floating-point elements in "a" expressed in radians, and store the results in "dst". FOR j := 0 to 3 i := j*64 dst[i+63:i] := COSH(a[i+63:i]) ENDFOR dst[MAX:256] := 0
Floating Point AVX Trigonometry Compute the hyperbolic cosine of packed single-precision (32-bit) floating-point elements in "a" expressed in radians, and store the results in "dst". FOR j := 0 to 7 i := j*32 dst[i+31:i] := COSH(a[i+31:i]) ENDFOR dst[MAX:256] := 0
Floating Point AVX Elementary Math Functions Compute the square root of packed complex numbers in "a", and store the complex results in "dst". Each complex number is composed of two adjacent single-precision (32-bit) floating-point elements, which defines the complex number "complex = vec.fp32[0] + i * vec.fp32[1]". DEFINE CSQRT(a[31:0], b[31:0]) { sign[31:0] := (b < 0.0) ? -FP32(1.0) : FP32(1.0) result[31:0] := SQRT((a + SQRT(POW(a, 2.0) + POW(b, 2.0))) / 2.0) result[63:32] := sign * SQRT((-a + SQRT(POW(a, 2.0) + POW(b, 2.0))) / 2.0) RETURN result } FOR j := 0 to 3 i := j*64 dst[i+63:i] := CSQRT(a[i+31:i], a[i+63:i+32]) ENDFOR dst[MAX:256] := 0
Integer AVX Arithmetic Divide packed signed 8-bit integers in "a" by packed elements in "b", and store the truncated results in "dst". FOR j := 0 to 31 i := 8*j IF b[i+7:i] == 0 #DE FI dst[i+7:i] := Truncate8(a[i+7:i] / b[i+7:i]) ENDFOR dst[MAX:256] := 0
Integer AVX Arithmetic Divide packed signed 16-bit integers in "a" by packed elements in "b", and store the truncated results in "dst". FOR j := 0 to 15 i := 16*j IF b[i+15:i] == 0 #DE FI dst[i+15:i] := Truncate16(a[i+15:i] / b[i+15:i]) ENDFOR dst[MAX:256] := 0
Integer AVX Arithmetic Divide packed signed 32-bit integers in "a" by packed elements in "b", and store the truncated results in "dst". FOR j := 0 to 7 i := 32*j IF b[i+31:i] == 0 #DE FI dst[i+31:i] := Truncate32(a[i+31:i] / b[i+31:i]) ENDFOR dst[MAX:256] := 0
Integer AVX Arithmetic Divide packed signed 64-bit integers in "a" by packed elements in "b", and store the truncated results in "dst". FOR j := 0 to 3 i := 64*j IF b[i+63:i] == 0 #DE FI dst[i+63:i] := Truncate64(a[i+63:i] / b[i+63:i]) ENDFOR dst[MAX:256] := 0
Integer AVX Arithmetic Divide packed unsigned 8-bit integers in "a" by packed elements in "b", and store the truncated results in "dst". FOR j := 0 to 31 i := 8*j IF b[i+7:i] == 0 #DE FI dst[i+7:i] := Truncate8(a[i+7:i] / b[i+7:i]) ENDFOR dst[MAX:256] := 0
Integer AVX Arithmetic Divide packed unsigned 16-bit integers in "a" by packed elements in "b", and store the truncated results in "dst". FOR j := 0 to 15 i := 16*j IF b[i+15:i] == 0 #DE FI dst[i+15:i] := Truncate16(a[i+15:i] / b[i+15:i]) ENDFOR dst[MAX:256] := 0
Integer AVX Arithmetic Divide packed unsigned 32-bit integers in "a" by packed elements in "b", and store the truncated results in "dst". FOR j := 0 to 7 i := 32*j IF b[i+31:i] == 0 #DE FI dst[i+31:i] := Truncate32(a[i+31:i] / b[i+31:i]) ENDFOR dst[MAX:256] := 0
Integer AVX Arithmetic Divide packed unsigned 64-bit integers in "a" by packed elements in "b", and store the truncated results in "dst". FOR j := 0 to 3 i := 64*j IF b[i+63:i] == 0 #DE FI dst[i+63:i] := Truncate64(a[i+63:i] / b[i+63:i]) ENDFOR dst[MAX:256] := 0
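The signed and unsigned division entries above share the same pseudocode and differ only in how each lane's bits are interpreted. The scalar sketch below shows this for the 8-bit case. The function names are hypothetical, and returning -1 for a zero divisor is an assumption standing in for the #DE fault, which portable C cannot raise.

```c
#include <stdint.h>

/* Hypothetical model of the signed 8-bit entry: 32 lanes, quotient truncated toward zero. */
int div_epi8_model(int8_t dst[32], const int8_t a[32], const int8_t b[32]) {
    for (int j = 0; j < 32; j++) {
        if (b[j] == 0) return -1;          /* stands in for #DE */
    }
    for (int j = 0; j < 32; j++) {
        /* The division is done in int; the cast keeps the low 8 bits on common
           compilers (implementation-defined when the quotient is out of range,
           as for -128 / -1). */
        dst[j] = (int8_t)(a[j] / b[j]);
    }
    return 0;
}

/* Hypothetical model of the unsigned 8-bit entry. */
int div_epu8_model(uint8_t dst[32], const uint8_t a[32], const uint8_t b[32]) {
    for (int j = 0; j < 32; j++) {
        if (b[j] == 0) return -1;          /* stands in for #DE */
    }
    for (int j = 0; j < 32; j++) {
        dst[j] = (uint8_t)(a[j] / b[j]);
    }
    return 0;
}
```

For example, the lane bit pattern 0xF8 divided by 2 yields -4 when treated as signed (-8 / 2) and 124 when treated as unsigned (248 / 2).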
Floating Point AVX Probability/Statistics Compute the error function of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst". FOR j := 0 to 3 i := j*64 dst[i+63:i] := ERF(a[i+63:i]) ENDFOR dst[MAX:256] := 0
Floating Point AVX Probability/Statistics Compute the error function of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst". FOR j := 0 to 7 i := j*32 dst[i+31:i] := ERF(a[i+31:i]) ENDFOR dst[MAX:256] := 0
Floating Point AVX Probability/Statistics Compute the complementary error function of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst". FOR j := 0 to 3 i := j*64 dst[i+63:i] := 1.0 - ERF(a[i+63:i]) ENDFOR dst[MAX:256] := 0
Floating Point AVX Probability/Statistics Compute the complementary error function of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst". FOR j := 0 to 7 i := j*32 dst[i+31:i] := 1.0 - ERF(a[i+31:i]) ENDFOR dst[MAX:256] := 0
Floating Point AVX Probability/Statistics Compute the inverse complementary error function of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst". FOR j := 0 to 3 i := j*64 dst[i+63:i] := 1.0 / (1.0 - ERF(a[i+63:i])) ENDFOR dst[MAX:256] := 0
Floating Point AVX Probability/Statistics Compute the inverse complementary error function of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst". FOR j := 0 to 7 i := j*32 dst[i+31:i] := 1.0 / (1.0 - ERF(a[i+31:i])) ENDFOR dst[MAX:256] := 0
Floating Point AVX Probability/Statistics Compute the inverse error function of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst". FOR j := 0 to 3 i := j*64 dst[i+63:i] := 1.0 / ERF(a[i+63:i]) ENDFOR dst[MAX:256] := 0
Floating Point AVX Probability/Statistics Compute the inverse error function of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst". FOR j := 0 to 7 i := j*32 dst[i+31:i] := 1.0 / ERF(a[i+31:i]) ENDFOR dst[MAX:256] := 0
Floating Point AVX Elementary Math Functions Compute the exponential value of "e" raised to the power of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst". FOR j := 0 to 3 i := j*64 dst[i+63:i] := POW(e, a[i+63:i]) ENDFOR dst[MAX:256] := 0
Floating Point AVX Elementary Math Functions Compute the exponential value of "e" raised to the power of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst". FOR j := 0 to 7 i := j*32 dst[i+31:i] := POW(FP32(e), a[i+31:i]) ENDFOR dst[MAX:256] := 0
Floating Point AVX Elementary Math Functions Compute the exponential value of 10 raised to the power of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst". FOR j := 0 to 3 i := j*64 dst[i+63:i] := POW(10.0, a[i+63:i]) ENDFOR dst[MAX:256] := 0
Floating Point AVX Elementary Math Functions Compute the exponential value of 10 raised to the power of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst". FOR j := 0 to 7 i := j*32 dst[i+31:i] := POW(FP32(10.0), a[i+31:i]) ENDFOR dst[MAX:256] := 0
Floating Point AVX Elementary Math Functions Compute the exponential value of 2 raised to the power of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst". FOR j := 0 to 3 i := j*64 dst[i+63:i] := POW(2.0, a[i+63:i]) ENDFOR dst[MAX:256] := 0
Floating Point AVX Elementary Math Functions Compute the exponential value of 2 raised to the power of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst". FOR j := 0 to 7 i := j*32 dst[i+31:i] := POW(FP32(2.0), a[i+31:i]) ENDFOR dst[MAX:256] := 0
Floating Point AVX Elementary Math Functions Compute the exponential value of "e" raised to the power of packed double-precision (64-bit) floating-point elements in "a", subtract one from each element, and store the results in "dst". FOR j := 0 to 3 i := j*64 dst[i+63:i] := POW(e, a[i+63:i]) - 1.0 ENDFOR dst[MAX:256] := 0
Floating Point AVX Elementary Math Functions Compute the exponential value of "e" raised to the power of packed single-precision (32-bit) floating-point elements in "a", subtract one from each element, and store the results in "dst". FOR j := 0 to 7 i := j*32 dst[i+31:i] := POW(FP32(e), a[i+31:i]) - 1.0 ENDFOR dst[MAX:256] := 0
Floating Point AVX Trigonometry Compute the length of the hypotenuse of a right triangle, with the lengths of the other two sides of the triangle stored as packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst". FOR j := 0 to 3 i := j*64 dst[i+63:i] := SQRT(POW(a[i+63:i], 2.0) + POW(b[i+63:i], 2.0)) ENDFOR dst[MAX:256] := 0
Floating Point AVX Trigonometry Compute the length of the hypotenuse of a right triangle, with the lengths of the other two sides of the triangle stored as packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst". FOR j := 0 to 7 i := j*32 dst[i+31:i] := SQRT(POW(a[i+31:i], 2.0) + POW(b[i+31:i], 2.0)) ENDFOR dst[MAX:256] := 0
Integer AVX Arithmetic Divide packed 32-bit integers in "a" by packed elements in "b", and store the truncated results in "dst". FOR j := 0 to 7 i := 32*j dst[i+31:i] := TRUNCATE(a[i+31:i] / b[i+31:i]) ENDFOR dst[MAX:256] := 0
Integer AVX Arithmetic Divide packed 32-bit integers in "a" by packed elements in "b", store the truncated results in "dst", and store the remainders as packed 32-bit integers into memory at "mem_addr". FOR j := 0 to 7 i := 32*j dst[i+31:i] := TRUNCATE(a[i+31:i] / b[i+31:i]) MEM[mem_addr+i+31:mem_addr+i] := REMAINDER(a[i+31:i] / b[i+31:i]) ENDFOR dst[MAX:256] := 0
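The combined quotient-and-remainder entry above can be sketched per lane with C's `/` and `%` operators, which truncate toward zero and give the remainder the sign of the dividend. The sketch assumes signed lanes, which the entry does not state. The function name and the -1 error return are hypothetical, and the guard exists because a zero divisor or INT32_MIN / -1 is undefined behaviour in C.

```c
#include <stdint.h>

/* Hypothetical scalar model: 8 lanes of signed 32-bit division with remainder.
   quot models "dst"; rem models the memory at "mem_addr". */
int idivrem_epi32_model(int32_t quot[8], int32_t rem[8],
                        const int32_t a[8], const int32_t b[8]) {
    for (int j = 0; j < 8; j++) {
        /* Reject inputs that C cannot divide without undefined behaviour. */
        if (b[j] == 0 || (a[j] == INT32_MIN && b[j] == -1)) return -1;
    }
    for (int j = 0; j < 8; j++) {
        quot[j] = a[j] / b[j];   /* TRUNCATE(a / b)  */
        rem[j]  = a[j] % b[j];   /* REMAINDER(a / b) */
    }
    return 0;
}
```

Every lane then satisfies a = quot * b + rem, which is a convenient check when comparing against a vectorized implementation.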
Floating Point AVX Elementary Math Functions Compute the inverse cube root of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst". FOR j := 0 to 3 i := j*64 dst[i+63:i] := InvCubeRoot(a[i+63:i]) ENDFOR dst[MAX:256] := 0
Floating Point AVX Elementary Math Functions Compute the inverse cube root of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst". FOR j := 0 to 7 i := j*32 dst[i+31:i] := InvCubeRoot(a[i+31:i]) ENDFOR dst[MAX:256] := 0
Floating Point AVX Elementary Math Functions Compute the inverse square root of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst". FOR j := 0 to 3 i := j*64 dst[i+63:i] := InvSQRT(a[i+63:i]) ENDFOR dst[MAX:256] := 0
Floating Point AVX Elementary Math Functions Compute the inverse square root of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst". FOR j := 0 to 7 i := j*32 dst[i+31:i] := InvSQRT(a[i+31:i]) ENDFOR dst[MAX:256] := 0
Integer AVX Arithmetic Divide packed 32-bit integers in "a" by packed elements in "b", and store the remainders as packed 32-bit integers in "dst". FOR j := 0 to 7 i := 32*j dst[i+31:i] := REMAINDER(a[i+31:i] / b[i+31:i]) ENDFOR dst[MAX:256] := 0
Floating Point AVX Elementary Math Functions Compute the natural logarithm of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst". FOR j := 0 to 3 i := j*64 dst[i+63:i] := LOG(a[i+63:i]) ENDFOR dst[MAX:256] := 0
Floating Point AVX Elementary Math Functions Compute the natural logarithm of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst". FOR j := 0 to 7 i := j*32 dst[i+31:i] := LOG(a[i+31:i]) ENDFOR dst[MAX:256] := 0
Floating Point AVX Elementary Math Functions Compute the base-10 logarithm of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst". FOR j := 0 to 3 i := j*64 dst[i+63:i] := LOG(a[i+63:i]) / LOG(10.0) ENDFOR dst[MAX:256] := 0
Floating Point AVX Elementary Math Functions Compute the base-10 logarithm of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst". FOR j := 0 to 7 i := j*32 dst[i+31:i] := LOG(a[i+31:i]) / LOG(10.0) ENDFOR dst[MAX:256] := 0
Floating Point AVX Elementary Math Functions Compute the natural logarithm of one plus packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst". FOR j := 0 to 3 i := j*64 dst[i+63:i] := LOG(1.0 + a[i+63:i]) ENDFOR dst[MAX:256] := 0
Floating Point AVX Elementary Math Functions Compute the natural logarithm of one plus packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst". FOR j := 0 to 7 i := j*32 dst[i+31:i] := LOG(1.0 + a[i+31:i]) ENDFOR dst[MAX:256] := 0
Floating Point AVX Elementary Math Functions Compute the base-2 logarithm of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst". FOR j := 0 to 3 i := j*64 dst[i+63:i] := LOG(a[i+63:i]) / LOG(2.0) ENDFOR dst[MAX:256] := 0
Floating Point AVX Elementary Math Functions Compute the base-2 logarithm of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst". FOR j := 0 to 7 i := j*32 dst[i+31:i] := LOG(a[i+31:i]) / LOG(2.0) ENDFOR dst[MAX:256] := 0
Floating Point AVX Elementary Math Functions Convert the exponent of each packed double-precision (64-bit) floating-point element in "a" to a double-precision floating-point number representing the integer exponent, and store the results in "dst". This intrinsic essentially calculates "floor(log2(x))" for each element. FOR j := 0 to 3 i := j*64 dst[i+63:i] := ConvertExpFP64(a[i+63:i]) ENDFOR dst[MAX:256] := 0
Floating Point AVX Elementary Math Functions Convert the exponent of each packed single-precision (32-bit) floating-point element in "a" to a single-precision floating-point number representing the integer exponent, and store the results in "dst". This intrinsic essentially calculates "floor(log2(x))" for each element. FOR j := 0 to 7 i := j*32 dst[i+31:i] := ConvertExpFP32(a[i+31:i]) ENDFOR dst[MAX:256] := 0
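The two exponent-extraction entries above describe the result as essentially "floor(log2(x))". A minimal Python sketch of that reading follows, assuming lanes are modelled as a plain list of floats and covering only finite, positive inputs, since the listing does not spell out the other cases. The name `getexp` is illustrative, not taken from this listing.

```python
import math

def getexp(lanes):
    """Return floor(log2(x)) as a float for each finite, positive lane."""
    out = []
    for x in lanes:
        if not (math.isfinite(x) and x > 0.0):
            raise ValueError("sketch covers finite positive inputs only")
        _, e = math.frexp(x)  # x == m * 2**e with 0.5 <= m < 1
        out.append(float(e - 1))
    return out

exps = getexp([1.0, 8.0, 0.75, 10.0])
```

Using `math.frexp` reads the binary exponent directly, which avoids the rounding that a floating-point `log2` call could introduce near powers of two.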
Floating Point AVX Elementary Math Functions Raise packed double-precision (64-bit) floating-point elements in "a" to the power of the corresponding packed elements in "b", and store the results in "dst". FOR j := 0 to 3 i := j*64 dst[i+63:i] := POW(a[i+63:i], b[i+63:i]) ENDFOR dst[MAX:256] := 0
Floating Point AVX Elementary Math Functions Raise packed single-precision (32-bit) floating-point elements in "a" to the power of the corresponding packed elements in "b", and store the results in "dst". FOR j := 0 to 7 i := j*32 dst[i+31:i] := POW(a[i+31:i], b[i+31:i]) ENDFOR dst[MAX:256] := 0
Integer AVX Arithmetic Divide packed 8-bit integers in "a" by packed elements in "b", and store the remainders as packed 8-bit integers in "dst". FOR j := 0 to 31 i := 8*j dst[i+7:i] := REMAINDER(a[i+7:i] / b[i+7:i]) ENDFOR dst[MAX:256] := 0
Integer AVX Arithmetic Divide packed 16-bit integers in "a" by packed elements in "b", and store the remainders as packed 16-bit integers in "dst". FOR j := 0 to 15 i := 16*j dst[i+15:i] := REMAINDER(a[i+15:i] / b[i+15:i]) ENDFOR dst[MAX:256] := 0
Integer AVX Arithmetic Divide packed 32-bit integers in "a" by packed elements in "b", and store the remainders as packed 32-bit integers in "dst". FOR j := 0 to 7 i := 32*j dst[i+31:i] := REMAINDER(a[i+31:i] / b[i+31:i]) ENDFOR dst[MAX:256] := 0
Integer AVX Arithmetic Divide packed 64-bit integers in "a" by packed elements in "b", and store the remainders as packed 64-bit integers in "dst". FOR j := 0 to 3 i := 64*j dst[i+63:i] := REMAINDER(a[i+63:i] / b[i+63:i]) ENDFOR dst[MAX:256] := 0
Integer AVX Arithmetic Divide packed unsigned 8-bit integers in "a" by packed elements in "b", and store the remainders as packed unsigned 8-bit integers in "dst". FOR j := 0 to 31 i := 8*j dst[i+7:i] := REMAINDER(a[i+7:i] / b[i+7:i]) ENDFOR dst[MAX:256] := 0
Integer AVX Arithmetic Divide packed unsigned 16-bit integers in "a" by packed elements in "b", and store the remainders as packed unsigned 16-bit integers in "dst". FOR j := 0 to 15 i := 16*j dst[i+15:i] := REMAINDER(a[i+15:i] / b[i+15:i]) ENDFOR dst[MAX:256] := 0
Integer AVX Arithmetic Divide packed unsigned 32-bit integers in "a" by packed elements in "b", and store the remainders as packed unsigned 32-bit integers in "dst". FOR j := 0 to 7 i := 32*j dst[i+31:i] := REMAINDER(a[i+31:i] / b[i+31:i]) ENDFOR dst[MAX:256] := 0
Integer AVX Arithmetic Divide packed unsigned 64-bit integers in "a" by packed elements in "b", and store the remainders as packed unsigned 64-bit integers in "dst". FOR j := 0 to 3 i := 64*j dst[i+63:i] := REMAINDER(a[i+63:i] / b[i+63:i]) ENDFOR dst[MAX:256] := 0
Floating Point AVX Trigonometry Compute the sine of packed double-precision (64-bit) floating-point elements in "a" expressed in radians, and store the results in "dst". FOR j := 0 to 3 i := j*64 dst[i+63:i] := SIN(a[i+63:i]) ENDFOR dst[MAX:256] := 0
Floating Point AVX Trigonometry Compute the sine of packed single-precision (32-bit) floating-point elements in "a" expressed in radians, and store the results in "dst". FOR j := 0 to 7 i := j*32 dst[i+31:i] := SIN(a[i+31:i]) ENDFOR dst[MAX:256] := 0
Floating Point AVX Trigonometry Compute the sine and cosine of packed double-precision (64-bit) floating-point elements in "a" expressed in radians, store the sine in "dst", and store the cosine into memory at "mem_addr". FOR j := 0 to 3 i := j*64 dst[i+63:i] := SIN(a[i+63:i]) MEM[mem_addr+i+63:mem_addr+i] := COS(a[i+63:i]) ENDFOR dst[MAX:256] := 0
Floating Point AVX Trigonometry Compute the sine and cosine of packed single-precision (32-bit) floating-point elements in "a" expressed in radians, store the sine in "dst", and store the cosine into memory at "mem_addr". FOR j := 0 to 7 i := j*32 dst[i+31:i] := SIN(a[i+31:i]) MEM[mem_addr+i+31:mem_addr+i] := COS(a[i+31:i]) ENDFOR dst[MAX:256] := 0
Floating Point AVX Trigonometry Compute the sine of packed double-precision (64-bit) floating-point elements in "a" expressed in degrees, and store the results in "dst". FOR j := 0 to 3 i := j*64 dst[i+63:i] := SIND(a[i+63:i]) ENDFOR dst[MAX:256] := 0
Floating Point AVX Trigonometry Compute the sine of packed single-precision (32-bit) floating-point elements in "a" expressed in degrees, and store the results in "dst". FOR j := 0 to 7 i := j*32 dst[i+31:i] := SIND(a[i+31:i]) ENDFOR dst[MAX:256] := 0
Floating Point AVX Trigonometry Compute the hyperbolic sine of packed double-precision (64-bit) floating-point elements in "a" expressed in radians, and store the results in "dst". FOR j := 0 to 3 i := j*64 dst[i+63:i] := SINH(a[i+63:i]) ENDFOR dst[MAX:256] := 0
Floating Point AVX Trigonometry Compute the hyperbolic sine of packed single-precision (32-bit) floating-point elements in "a" expressed in radians, and store the results in "dst". FOR j := 0 to 7 i := j*32 dst[i+31:i] := SINH(a[i+31:i]) ENDFOR dst[MAX:256] := 0
Floating Point AVX Special Math Functions Round the packed double-precision (64-bit) floating-point elements in "a" up to an integer value, and store the results as packed double-precision floating-point elements in "dst". This intrinsic may generate the "roundpd"/"vroundpd" instruction. FOR j := 0 to 3 i := j*64 dst[i+63:i] := CEIL(a[i+63:i]) ENDFOR dst[MAX:256] := 0
Floating Point AVX Special Math Functions Round the packed single-precision (32-bit) floating-point elements in "a" up to an integer value, and store the results as packed single-precision floating-point elements in "dst". This intrinsic may generate the "roundps"/"vroundps" instruction. FOR j := 0 to 7 i := j*32 dst[i+31:i] := CEIL(a[i+31:i]) ENDFOR dst[MAX:256] := 0
Floating Point AVX Special Math Functions Round the packed double-precision (64-bit) floating-point elements in "a" down to an integer value, and store the results as packed double-precision floating-point elements in "dst". This intrinsic may generate the "roundpd"/"vroundpd" instruction. FOR j := 0 to 3 i := j*64 dst[i+63:i] := FLOOR(a[i+63:i]) ENDFOR dst[MAX:256] := 0
Floating Point AVX Special Math Functions Round the packed single-precision (32-bit) floating-point elements in "a" down to an integer value, and store the results as packed single-precision floating-point elements in "dst". This intrinsic may generate the "roundps"/"vroundps" instruction. FOR j := 0 to 7 i := j*32 dst[i+31:i] := FLOOR(a[i+31:i]) ENDFOR dst[MAX:256] := 0
Floating Point AVX Special Math Functions Round the packed double-precision (64-bit) floating-point elements in "a" to the nearest integer value, and store the results as packed double-precision floating-point elements in "dst". This intrinsic may generate the "roundpd"/"vroundpd" instruction. FOR j := 0 to 3 i := j*64 dst[i+63:i] := ROUND(a[i+63:i]) ENDFOR dst[MAX:256] := 0
Floating Point AVX Special Math Functions Round the packed single-precision (32-bit) floating-point elements in "a" to the nearest integer value, and store the results as packed single-precision floating-point elements in "dst". This intrinsic may generate the "roundps"/"vroundps" instruction. FOR j := 0 to 7 i := j*32 dst[i+31:i] := ROUND(a[i+31:i]) ENDFOR dst[MAX:256] := 0
Floating Point AVX Elementary Math Functions Compute the square root of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst". Note that this intrinsic is less efficient than "_mm_sqrt_pd". FOR j := 0 to 3 i := j*64 dst[i+63:i] := SQRT(a[i+63:i]) ENDFOR dst[MAX:256] := 0
Floating Point AVX Elementary Math Functions Compute the square root of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst". Note that this intrinsic is less efficient than "_mm_sqrt_ps". FOR j := 0 to 7 i := j*32 dst[i+31:i] := SQRT(a[i+31:i]) ENDFOR dst[MAX:256] := 0
Floating Point AVX Trigonometry Compute the tangent of packed double-precision (64-bit) floating-point elements in "a" expressed in radians, and store the results in "dst". FOR j := 0 to 3 i := j*64 dst[i+63:i] := TAN(a[i+63:i]) ENDFOR dst[MAX:256] := 0
Floating Point AVX Trigonometry Compute the tangent of packed single-precision (32-bit) floating-point elements in "a" expressed in radians, and store the results in "dst". FOR j := 0 to 7 i := j*32 dst[i+31:i] := TAN(a[i+31:i]) ENDFOR dst[MAX:256] := 0
Floating Point AVX Trigonometry Compute the tangent of packed double-precision (64-bit) floating-point elements in "a" expressed in degrees, and store the results in "dst". FOR j := 0 to 3 i := j*64 dst[i+63:i] := TAND(a[i+63:i]) ENDFOR dst[MAX:256] := 0
Floating Point AVX Trigonometry Compute the tangent of packed single-precision (32-bit) floating-point elements in "a" expressed in degrees, and store the results in "dst". FOR j := 0 to 7 i := j*32 dst[i+31:i] := TAND(a[i+31:i]) ENDFOR dst[MAX:256] := 0
Floating Point AVX Trigonometry Compute the hyperbolic tangent of packed double-precision (64-bit) floating-point elements in "a" expressed in radians, and store the results in "dst". FOR j := 0 to 3 i := j*64 dst[i+63:i] := TANH(a[i+63:i]) ENDFOR dst[MAX:256] := 0
Floating Point AVX Trigonometry Compute the hyperbolic tangent of packed single-precision (32-bit) floating-point elements in "a" expressed in radians, and store the results in "dst". FOR j := 0 to 7 i := j*32 dst[i+31:i] := TANH(a[i+31:i]) ENDFOR dst[MAX:256] := 0
Floating Point AVX Miscellaneous Truncate the packed double-precision (64-bit) floating-point elements in "a", and store the results as packed double-precision floating-point elements in "dst". This intrinsic may generate the "roundpd"/"vroundpd" instruction. FOR j := 0 to 3 i := j*64 dst[i+63:i] := TRUNCATE(a[i+63:i]) ENDFOR dst[MAX:256] := 0
Floating Point AVX Miscellaneous Truncate the packed single-precision (32-bit) floating-point elements in "a", and store the results as packed single-precision floating-point elements in "dst". This intrinsic may generate the "roundps"/"vroundps" instruction. FOR j := 0 to 7 i := j*32 dst[i+31:i] := TRUNCATE(a[i+31:i]) ENDFOR dst[MAX:256] := 0
Integer AVX Arithmetic Divide packed unsigned 32-bit integers in "a" by packed elements in "b", and store the truncated results in "dst". FOR j := 0 to 7 i := 32*j dst[i+31:i] := TRUNCATE(a[i+31:i] / b[i+31:i]) ENDFOR dst[MAX:256] := 0
Integer AVX Arithmetic Divide packed unsigned 32-bit integers in "a" by packed elements in "b", store the truncated results in "dst", and store the remainders as packed unsigned 32-bit integers into memory at "mem_addr". FOR j := 0 to 7 i := 32*j dst[i+31:i] := TRUNCATE(a[i+31:i] / b[i+31:i]) MEM[mem_addr+i+31:mem_addr+i] := REMAINDER(a[i+31:i] / b[i+31:i]) ENDFOR dst[MAX:256] := 0
Integer AVX Arithmetic Divide packed unsigned 32-bit integers in "a" by packed elements in "b", and store the remainders as packed unsigned 32-bit integers in "dst". FOR j := 0 to 7 i := 32*j dst[i+31:i] := REMAINDER(a[i+31:i] / b[i+31:i]) ENDFOR dst[MAX:256] := 0
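The three unsigned 32-bit entries above pair a truncated quotient with its remainder. A minimal scalar sketch in Python, assuming the eight lanes are modelled as a plain list of integers. The helper name `udivrem_epi32` is illustrative, and the behaviour for a zero divisor is not stated in the listing, so it is left to Python's own `ZeroDivisionError` here.

```python
MASK32 = 0xFFFFFFFF

def udivrem_epi32(a, b):
    """Per-lane unsigned 32-bit quotient and remainder for eight lanes."""
    quot, rem = [], []
    for x, y in zip(a, b):
        x &= MASK32  # treat each lane as an unsigned 32-bit value
        y &= MASK32
        q, r = divmod(x, y)  # for non-negative ints, floor equals truncation
        quot.append(q)
        rem.append(r)
    return quot, rem

a = [7, 100, 0xFFFFFFFF, 9, 1, 0, 12345, 8]
b = [2, 7, 16, 3, 5, 4, 100, 8]
quot, rem = udivrem_epi32(a, b)
```

For every lane the identity `quot * b + rem == a` holds, which is a convenient check when comparing against a vectorised implementation.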
Floating Point AVX Convert Copy the lower single-precision (32-bit) floating-point element of "a" to "dst". dst[31:0] := a[31:0]
Floating Point AVX Convert Copy the lower double-precision (64-bit) floating-point element of "a" to "dst". dst[63:0] := a[63:0]
Integer AVX Convert Copy the lower 32-bit integer in "a" to "dst". dst[31:0] := a[31:0]
Integer AVX2 Swizzle Extract an 8-bit integer from "a", selected with "index", and store the result in "dst". dst[7:0] := (a[255:0] >> (index[4:0] * 8))[7:0]
Integer AVX2 Swizzle Extract a 16-bit integer from "a", selected with "index", and store the result in "dst". dst[15:0] := (a[255:0] >> (index[3:0] * 16))[15:0]
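The two extract entries above shift the whole 256-bit value right and keep the low element, using only the low bits of "index". A short Python sketch of that pseudocode, assuming the vector is modelled as a single 256-bit Python integer (the function names are illustrative):

```python
def extract_epi8(a, index):
    """Return byte number index[4:0] of a 256-bit value."""
    return (a >> ((index & 0x1F) * 8)) & 0xFF

def extract_epi16(a, index):
    """Return 16-bit word number index[3:0] of a 256-bit value."""
    return (a >> ((index & 0x0F) * 16)) & 0xFFFF

# Bytes 0x00..0x1F in little-endian order, so byte k holds the value k.
vec = int.from_bytes(bytes(range(32)), "little")
```

Because only `index[4:0]` (or `index[3:0]`) is used, an out-of-range index wraps around rather than failing.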
Integer AVX2 Special Math Functions Compute the absolute value of packed signed 8-bit integers in "a", and store the unsigned results in "dst". FOR j := 0 to 31 i := j*8 dst[i+7:i] := ABS(a[i+7:i]) ENDFOR dst[MAX:256] := 0
Integer AVX2 Special Math Functions Compute the absolute value of packed signed 16-bit integers in "a", and store the unsigned results in "dst". FOR j := 0 to 15 i := j*16 dst[i+15:i] := ABS(a[i+15:i]) ENDFOR dst[MAX:256] := 0
Integer AVX2 Special Math Functions Compute the absolute value of packed signed 32-bit integers in "a", and store the unsigned results in "dst". FOR j := 0 to 7 i := j*32 dst[i+31:i] := ABS(a[i+31:i]) ENDFOR dst[MAX:256] := 0
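The absolute-value entries above store "unsigned results" because the magnitude of the most negative input does not fit in the signed range of the same width. A small Python sketch for the 8-bit case, assuming lanes are given as raw byte values in a list (the name `abs_epi8` is illustrative):

```python
def abs_epi8(lanes):
    """Absolute value of signed 8-bit lanes, returned as unsigned bytes."""
    out = []
    for v in lanes:
        v &= 0xFF
        signed = v - 0x100 if v & 0x80 else v  # reinterpret as signed
        out.append(abs(signed) & 0xFF)
    return out

result = abs_epi8([0x01, 0xFF, 0x80, 0x7F])
```

Here 0x80 (-128 as a signed byte) maps to 0x80 again, which reads as 128 only when the result is treated as unsigned.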
Integer AVX2 Arithmetic Add packed 8-bit integers in "a" and "b", and store the results in "dst". FOR j := 0 to 31 i := j*8 dst[i+7:i] := a[i+7:i] + b[i+7:i] ENDFOR dst[MAX:256] := 0
Integer AVX2 Arithmetic Add packed 16-bit integers in "a" and "b", and store the results in "dst". FOR j := 0 to 15 i := j*16 dst[i+15:i] := a[i+15:i] + b[i+15:i] ENDFOR dst[MAX:256] := 0
Integer AVX2 Arithmetic Add packed 32-bit integers in "a" and "b", and store the results in "dst". FOR j := 0 to 7 i := j*32 dst[i+31:i] := a[i+31:i] + b[i+31:i] ENDFOR dst[MAX:256] := 0
Integer AVX2 Arithmetic Add packed 64-bit integers in "a" and "b", and store the results in "dst". FOR j := 0 to 3 i := j*64 dst[i+63:i] := a[i+63:i] + b[i+63:i] ENDFOR dst[MAX:256] := 0
Integer AVX2 Arithmetic Add packed 8-bit integers in "a" and "b" using saturation, and store the results in "dst". FOR j := 0 to 31 i := j*8 dst[i+7:i] := Saturate8( a[i+7:i] + b[i+7:i] ) ENDFOR dst[MAX:256] := 0
Integer AVX2 Arithmetic Add packed 16-bit integers in "a" and "b" using saturation, and store the results in "dst". FOR j := 0 to 15 i := j*16 dst[i+15:i] := Saturate16( a[i+15:i] + b[i+15:i] ) ENDFOR dst[MAX:256] := 0
Integer AVX2 Arithmetic Add packed unsigned 8-bit integers in "a" and "b" using saturation, and store the results in "dst". FOR j := 0 to 31 i := j*8 dst[i+7:i] := SaturateU8( a[i+7:i] + b[i+7:i] ) ENDFOR dst[MAX:256] := 0
Integer AVX2 Arithmetic Add packed unsigned 16-bit integers in "a" and "b" using saturation, and store the results in "dst". FOR j := 0 to 15 i := j*16 dst[i+15:i] := SaturateU16( a[i+15:i] + b[i+15:i] ) ENDFOR dst[MAX:256] := 0
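The saturating-add entries above clamp each sum to the representable range instead of wrapping. A minimal Python sketch of the signed and unsigned variants, assuming lanes are lists of integers and a caller-supplied element width. The helper names are illustrative stand-ins for the `Saturate8`/`SaturateU8` style functions in the pseudocode.

```python
def to_signed(v, bits):
    """Reinterpret the low `bits` bits of v as a two's-complement value."""
    v &= (1 << bits) - 1
    return v - (1 << bits) if v >> (bits - 1) else v

def adds_signed(a, b, bits):
    """Per-lane signed add, clamped to [-2**(bits-1), 2**(bits-1) - 1]."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return [max(lo, min(hi, to_signed(x, bits) + to_signed(y, bits)))
            for x, y in zip(a, b)]

def adds_unsigned(a, b, bits):
    """Per-lane unsigned add, clamped to [0, 2**bits - 1]."""
    mask = (1 << bits) - 1
    return [min(mask, (x & mask) + (y & mask)) for x, y in zip(a, b)]
```

For example, `adds_signed([100, -100], [100, -100], 8)` clamps to the 8-bit limits rather than wrapping around.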
Integer AVX2 Miscellaneous Concatenate pairs of 16-byte blocks in "a" and "b" into a 32-byte temporary result, shift the result right by "imm8" bytes, and store the low 16 bytes in "dst". FOR j := 0 to 1 i := j*128 tmp[255:0] := ((a[i+127:i] << 128)[255:0] OR b[i+127:i]) >> (imm8*8) dst[i+127:i] := tmp[127:0] ENDFOR dst[MAX:256] := 0
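The concatenate-and-shift entry above works independently on each 128-bit lane, which is easy to miss in the prose. A Python sketch that follows the pseudocode step by step, assuming "a" and "b" are modelled as 256-bit Python integers (the name `alignr_epi8` is illustrative):

```python
MASK128 = (1 << 128) - 1

def alignr_epi8(a, b, imm8):
    """Per 128-bit lane: take (a_lane:b_lane) >> imm8 bytes, keep low 16 bytes."""
    dst = 0
    for j in range(2):
        i = j * 128
        a_lane = (a >> i) & MASK128
        b_lane = (b >> i) & MASK128
        tmp = ((a_lane << 128) | b_lane) >> (imm8 * 8)
        dst |= (tmp & MASK128) << i
    return dst

a = int.from_bytes(bytes(range(0x40, 0x60)), "little")
b = int.from_bytes(bytes(range(0x00, 0x20)), "little")
shifted = alignr_epi8(a, b, 4).to_bytes(32, "little")
```

With a shift of 4, each output lane holds bytes 4..15 of the matching "b" lane followed by bytes 0..3 of the matching "a" lane; bytes never cross between the two lanes.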
Integer AVX2 Logical Compute the bitwise AND of 256 bits (representing integer data) in "a" and "b", and store the result in "dst". dst[255:0] := (a[255:0] AND b[255:0]) dst[MAX:256] := 0
Integer AVX2 Logical Compute the bitwise NOT of 256 bits (representing integer data) in "a" and then AND with "b", and store the result in "dst". dst[255:0] := ((NOT a[255:0]) AND b[255:0]) dst[MAX:256] := 0
Integer AVX2 Probability/Statistics Average packed unsigned 8-bit integers in "a" and "b", and store the results in "dst". FOR j := 0 to 31 i := j*8 dst[i+7:i] := (a[i+7:i] + b[i+7:i] + 1) >> 1 ENDFOR dst[MAX:256] := 0
Integer AVX2 Probability/Statistics Average packed unsigned 16-bit integers in "a" and "b", and store the results in "dst". FOR j := 0 to 15 i := j*16 dst[i+15:i] := (a[i+15:i] + b[i+15:i] + 1) >> 1 ENDFOR dst[MAX:256] := 0
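The averaging entries above add 1 before the shift, so halves round up, and the intermediate sum is wider than the element, so it cannot wrap. A one-function Python sketch for the 8-bit case, assuming lanes are lists of integers (the name `avg_epu8` is illustrative):

```python
def avg_epu8(a, b):
    """Per-lane rounded average of unsigned 8-bit values: (a + b + 1) >> 1."""
    return [((x & 0xFF) + (y & 0xFF) + 1) >> 1 for x, y in zip(a, b)]

averages = avg_epu8([0, 1, 255, 10], [0, 2, 255, 13])
```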
Integer AVX2 Swizzle Blend packed 16-bit integers from "a" and "b" within 128-bit lanes using control mask "imm8", and store the results in "dst". FOR j := 0 to 15 i := j*16 IF imm8[j%8] dst[i+15:i] := b[i+15:i] ELSE dst[i+15:i] := a[i+15:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX2 Swizzle Blend packed 32-bit integers from "a" and "b" using control mask "imm8", and store the results in "dst". FOR j := 0 to 3 i := j*32 IF imm8[j] dst[i+31:i] := b[i+31:i] ELSE dst[i+31:i] := a[i+31:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX2 Swizzle Blend packed 32-bit integers from "a" and "b" using control mask "imm8", and store the results in "dst". FOR j := 0 to 7 i := j*32 IF imm8[j] dst[i+31:i] := b[i+31:i] ELSE dst[i+31:i] := a[i+31:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX2 Swizzle Blend packed 8-bit integers from "a" and "b" using "mask", and store the results in "dst". FOR j := 0 to 31 i := j*8 IF mask[i+7] dst[i+7:i] := b[i+7:i] ELSE dst[i+7:i] := a[i+7:i] FI ENDFOR dst[MAX:256] := 0
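The blend entries above differ in where the selector comes from: the 16-bit form reuses the same eight immediate bits for both 128-bit lanes (`imm8[j%8]`), while the 8-bit form tests the top bit of each mask byte. A Python sketch of both, assuming lanes are lists of integers (function names are illustrative):

```python
def blend_epi16(a, b, imm8):
    """Pick b[j] where bit (j % 8) of imm8 is set, else a[j], for 16 lanes."""
    return [b[j] if (imm8 >> (j % 8)) & 1 else a[j] for j in range(16)]

def blendv_epi8(a, b, mask):
    """Pick b where the mask byte's top bit is set, else a."""
    return [y if m & 0x80 else x for x, y, m in zip(a, b, mask)]

picked = blend_epi16([0] * 16, [1] * 16, 0b00000101)
```

In `picked`, lanes 0 and 2 are taken from "b", and so are lanes 8 and 10, because the immediate repeats in the upper 128-bit lane.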
Integer AVX2 Swizzle Broadcast the low packed 8-bit integer from "a" to all elements of "dst". FOR j := 0 to 15 i := j*8 dst[i+7:i] := a[7:0] ENDFOR dst[MAX:128] := 0
Integer AVX2 Swizzle Broadcast the low packed 8-bit integer from "a" to all elements of "dst". FOR j := 0 to 31 i := j*8 dst[i+7:i] := a[7:0] ENDFOR dst[MAX:256] := 0
Integer AVX2 Swizzle Broadcast the low packed 32-bit integer from "a" to all elements of "dst". FOR j := 0 to 3 i := j*32 dst[i+31:i] := a[31:0] ENDFOR dst[MAX:128] := 0
Integer AVX2 Swizzle Broadcast the low packed 32-bit integer from "a" to all elements of "dst". FOR j := 0 to 7 i := j*32 dst[i+31:i] := a[31:0] ENDFOR dst[MAX:256] := 0
Integer AVX2 Swizzle Broadcast the low packed 64-bit integer from "a" to all elements of "dst". FOR j := 0 to 1 i := j*64 dst[i+63:i] := a[63:0] ENDFOR dst[MAX:128] := 0
Integer AVX2 Swizzle Broadcast the low packed 64-bit integer from "a" to all elements of "dst". FOR j := 0 to 3 i := j*64 dst[i+63:i] := a[63:0] ENDFOR dst[MAX:256] := 0
Floating Point AVX2 Swizzle Broadcast the low double-precision (64-bit) floating-point element from "a" to all elements of "dst". FOR j := 0 to 1 i := j*64 dst[i+63:i] := a[63:0] ENDFOR dst[MAX:128] := 0
Floating Point AVX2 Swizzle Broadcast the low double-precision (64-bit) floating-point element from "a" to all elements of "dst". FOR j := 0 to 3 i := j*64 dst[i+63:i] := a[63:0] ENDFOR dst[MAX:256] := 0
Integer AVX2 Swizzle Broadcast 128 bits of integer data from "a" to all 128-bit lanes in "dst". dst[127:0] := a[127:0] dst[255:128] := a[127:0] dst[MAX:256] := 0
Integer AVX2 Swizzle Broadcast 128 bits of integer data from "a" to all 128-bit lanes in "dst". dst[127:0] := a[127:0] dst[255:128] := a[127:0] dst[MAX:256] := 0
Floating Point AVX2 Swizzle Broadcast the low single-precision (32-bit) floating-point element from "a" to all elements of "dst". FOR j := 0 to 3 i := j*32 dst[i+31:i] := a[31:0] ENDFOR dst[MAX:128] := 0
Floating Point AVX2 Swizzle Broadcast the low single-precision (32-bit) floating-point element from "a" to all elements of "dst". FOR j := 0 to 7 i := j*32 dst[i+31:i] := a[31:0] ENDFOR dst[MAX:256] := 0
Integer AVX2 Swizzle Broadcast the low packed 16-bit integer from "a" to all elements of "dst". FOR j := 0 to 7 i := j*16 dst[i+15:i] := a[15:0] ENDFOR dst[MAX:128] := 0
Integer AVX2 Swizzle Broadcast the low packed 16-bit integer from "a" to all elements of "dst". FOR j := 0 to 15 i := j*16 dst[i+15:i] := a[15:0] ENDFOR dst[MAX:256] := 0
Integer AVX2 Compare Compare packed 8-bit integers in "a" and "b" for equality, and store the results in "dst". FOR j := 0 to 31 i := j*8 dst[i+7:i] := ( a[i+7:i] == b[i+7:i] ) ? 0xFF : 0 ENDFOR dst[MAX:256] := 0
Integer AVX2 Compare Compare packed 16-bit integers in "a" and "b" for equality, and store the results in "dst". FOR j := 0 to 15 i := j*16 dst[i+15:i] := ( a[i+15:i] == b[i+15:i] ) ? 0xFFFF : 0 ENDFOR dst[MAX:256] := 0
Integer AVX2 Compare Compare packed 32-bit integers in "a" and "b" for equality, and store the results in "dst". FOR j := 0 to 7 i := j*32 dst[i+31:i] := ( a[i+31:i] == b[i+31:i] ) ? 0xFFFFFFFF : 0 ENDFOR dst[MAX:256] := 0
Integer AVX2 Compare Compare packed 64-bit integers in "a" and "b" for equality, and store the results in "dst". FOR j := 0 to 3 i := j*64 dst[i+63:i] := ( a[i+63:i] == b[i+63:i] ) ? 0xFFFFFFFFFFFFFFFF : 0 ENDFOR dst[MAX:256] := 0
Integer AVX2 Compare Compare packed signed 8-bit integers in "a" and "b" for greater-than, and store the results in "dst". FOR j := 0 to 31 i := j*8 dst[i+7:i] := ( a[i+7:i] > b[i+7:i] ) ? 0xFF : 0 ENDFOR dst[MAX:256] := 0
Integer AVX2 Compare Compare packed signed 16-bit integers in "a" and "b" for greater-than, and store the results in "dst". FOR j := 0 to 15 i := j*16 dst[i+15:i] := ( a[i+15:i] > b[i+15:i] ) ? 0xFFFF : 0 ENDFOR dst[MAX:256] := 0
Integer AVX2 Compare Compare packed signed 32-bit integers in "a" and "b" for greater-than, and store the results in "dst". FOR j := 0 to 7 i := j*32 dst[i+31:i] := ( a[i+31:i] > b[i+31:i] ) ? 0xFFFFFFFF : 0 ENDFOR dst[MAX:256] := 0
Integer AVX2 Compare Compare packed signed 64-bit integers in "a" and "b" for greater-than, and store the results in "dst". FOR j := 0 to 3 i := j*64 dst[i+63:i] := ( a[i+63:i] > b[i+63:i] ) ? 0xFFFFFFFFFFFFFFFF : 0 ENDFOR dst[MAX:256] := 0
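The compare entries above produce an all-ones or all-zeros element rather than a boolean, and the greater-than forms interpret the inputs as signed. A Python sketch for the 8-bit case, assuming lanes are raw byte values in lists (function names are illustrative):

```python
def to_signed8(v):
    """Reinterpret a byte as a two's-complement signed value."""
    v &= 0xFF
    return v - 0x100 if v & 0x80 else v

def cmpeq_epi8(a, b):
    """0xFF where lanes are equal, 0 elsewhere."""
    return [0xFF if (x & 0xFF) == (y & 0xFF) else 0 for x, y in zip(a, b)]

def cmpgt_epi8(a, b):
    """0xFF where a > b as signed bytes, 0 elsewhere."""
    return [0xFF if to_signed8(x) > to_signed8(y) else 0 for x, y in zip(a, b)]
```

Because the comparison is signed, the byte 0xFF (which is -1) compares as less than 0x01, which is the opposite of what an unsigned comparison would give.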
Integer AVX2 Convert Sign extend packed 16-bit integers in "a" to packed 32-bit integers, and store the results in "dst". FOR j := 0 to 7 i := 32*j k := 16*j dst[i+31:i] := SignExtend32(a[k+15:k]) ENDFOR dst[MAX:256] := 0
Integer AVX2 Convert Sign extend packed 16-bit integers in "a" to packed 64-bit integers, and store the results in "dst". FOR j := 0 to 3 i := 64*j k := 16*j dst[i+63:i] := SignExtend64(a[k+15:k]) ENDFOR dst[MAX:256] := 0
Integer AVX2 Convert Sign extend packed 32-bit integers in "a" to packed 64-bit integers, and store the results in "dst". FOR j := 0 to 3 i := 64*j k := 32*j dst[i+63:i] := SignExtend64(a[k+31:k]) ENDFOR dst[MAX:256] := 0
Integer AVX2 Convert Sign extend packed 8-bit integers in "a" to packed 16-bit integers, and store the results in "dst". FOR j := 0 to 15 i := j*8 l := j*16 dst[l+15:l] := SignExtend16(a[i+7:i]) ENDFOR dst[MAX:256] := 0
Integer AVX2 Convert Sign extend packed 8-bit integers in "a" to packed 32-bit integers, and store the results in "dst". FOR j := 0 to 7 i := 32*j k := 8*j dst[i+31:i] := SignExtend32(a[k+7:k]) ENDFOR dst[MAX:256] := 0
Integer AVX2 Convert Sign extend packed 8-bit integers in the low 4 bytes of "a" to packed 64-bit integers, and store the results in "dst". FOR j := 0 to 3 i := 64*j k := 8*j dst[i+63:i] := SignExtend64(a[k+7:k]) ENDFOR dst[MAX:256] := 0
Integer AVX2 Convert Zero extend packed unsigned 16-bit integers in "a" to packed 32-bit integers, and store the results in "dst". FOR j := 0 to 7 i := 32*j k := 16*j dst[i+31:i] := ZeroExtend32(a[k+15:k]) ENDFOR dst[MAX:256] := 0
Integer AVX2 Convert Zero extend packed unsigned 16-bit integers in "a" to packed 64-bit integers, and store the results in "dst". FOR j := 0 to 3 i := 64*j k := 16*j dst[i+63:i] := ZeroExtend64(a[k+15:k]) ENDFOR dst[MAX:256] := 0
Integer AVX2 Convert Zero extend packed unsigned 32-bit integers in "a" to packed 64-bit integers, and store the results in "dst". FOR j := 0 to 3 i := 64*j k := 32*j dst[i+63:i] := ZeroExtend64(a[k+31:k]) ENDFOR dst[MAX:256] := 0
Integer AVX2 Convert Zero extend packed unsigned 8-bit integers in "a" to packed 16-bit integers, and store the results in "dst". FOR j := 0 to 15 i := j*8 l := j*16 dst[l+15:l] := ZeroExtend16(a[i+7:i]) ENDFOR dst[MAX:256] := 0
Integer AVX2 Convert Zero extend packed unsigned 8-bit integers in "a" to packed 32-bit integers, and store the results in "dst". FOR j := 0 to 7 i := 32*j k := 8*j dst[i+31:i] := ZeroExtend32(a[k+7:k]) ENDFOR dst[MAX:256] := 0
Integer AVX2 Convert Zero extend packed unsigned 8-bit integers in the low 4 bytes of "a" to packed 64-bit integers, and store the results in "dst". FOR j := 0 to 3 i := 64*j k := 8*j dst[i+63:i] := ZeroExtend64(a[k+7:k]) ENDFOR dst[MAX:256] := 0
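The sign-extension and zero-extension entries above read only as many low source elements as fit in the wider destination. A Python sketch of the 8-bit to 32-bit case, assuming the source is a list of 16 byte values and the result is shown as unsigned 32-bit bit patterns (all names are illustrative):

```python
def sign_extend(v, from_bits, to_bits):
    """Sign extend the low from_bits of v and return the to_bits bit pattern."""
    v &= (1 << from_bits) - 1
    if v >> (from_bits - 1):
        v -= 1 << from_bits
    return v & ((1 << to_bits) - 1)

def cvtepi8_epi32(src_bytes):
    """Sign extend the low 8 bytes to eight 32-bit lanes."""
    return [sign_extend(src_bytes[j], 8, 32) for j in range(8)]

def cvtepu8_epi32(src_bytes):
    """Zero extend the low 8 bytes to eight 32-bit lanes."""
    return [src_bytes[j] & 0xFF for j in range(8)]

src = [0x7F, 0x80, 0xFF, 0x01] + [0] * 12
```

The two variants agree for bytes below 0x80 and differ only in how the upper bits are filled for bytes with the top bit set.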
Integer AVX2 Swizzle Extract 128 bits (composed of integer data) from "a", selected with "imm8", and store the result in "dst". CASE imm8[0] OF 0: dst[127:0] := a[127:0] 1: dst[127:0] := a[255:128] ESAC dst[MAX:128] := 0
Integer AVX2 Arithmetic Horizontally add adjacent pairs of 16-bit integers in "a" and "b", and pack the signed 16-bit results in "dst". dst[15:0] := a[31:16] + a[15:0] dst[31:16] := a[63:48] + a[47:32] dst[47:32] := a[95:80] + a[79:64] dst[63:48] := a[127:112] + a[111:96] dst[79:64] := b[31:16] + b[15:0] dst[95:80] := b[63:48] + b[47:32] dst[111:96] := b[95:80] + b[79:64] dst[127:112] := b[127:112] + b[111:96] dst[143:128] := a[159:144] + a[143:128] dst[159:144] := a[191:176] + a[175:160] dst[175:160] := a[223:208] + a[207:192] dst[191:176] := a[255:240] + a[239:224] dst[207:192] := b[159:144] + b[143:128] dst[223:208] := b[191:176] + b[175:160] dst[239:224] := b[223:208] + b[207:192] dst[255:240] := b[255:240] + b[239:224] dst[MAX:256] := 0
Integer AVX2 Arithmetic Horizontally add adjacent pairs of 32-bit integers in "a" and "b", and pack the signed 32-bit results in "dst". dst[31:0] := a[63:32] + a[31:0] dst[63:32] := a[127:96] + a[95:64] dst[95:64] := b[63:32] + b[31:0] dst[127:96] := b[127:96] + b[95:64] dst[159:128] := a[191:160] + a[159:128] dst[191:160] := a[255:224] + a[223:192] dst[223:192] := b[191:160] + b[159:128] dst[255:224] := b[255:224] + b[223:192] dst[MAX:256] := 0
Integer AVX2 Arithmetic Horizontally add adjacent pairs of signed 16-bit integers in "a" and "b" using saturation, and pack the signed 16-bit results in "dst". dst[15:0] := Saturate16(a[31:16] + a[15:0]) dst[31:16] := Saturate16(a[63:48] + a[47:32]) dst[47:32] := Saturate16(a[95:80] + a[79:64]) dst[63:48] := Saturate16(a[127:112] + a[111:96]) dst[79:64] := Saturate16(b[31:16] + b[15:0]) dst[95:80] := Saturate16(b[63:48] + b[47:32]) dst[111:96] := Saturate16(b[95:80] + b[79:64]) dst[127:112] := Saturate16(b[127:112] + b[111:96]) dst[143:128] := Saturate16(a[159:144] + a[143:128]) dst[159:144] := Saturate16(a[191:176] + a[175:160]) dst[175:160] := Saturate16(a[223:208] + a[207:192]) dst[191:176] := Saturate16(a[255:240] + a[239:224]) dst[207:192] := Saturate16(b[159:144] + b[143:128]) dst[223:208] := Saturate16(b[191:176] + b[175:160]) dst[239:224] := Saturate16(b[223:208] + b[207:192]) dst[255:240] := Saturate16(b[255:240] + b[239:224]) dst[MAX:256] := 0
Integer AVX2 Arithmetic Horizontally subtract adjacent pairs of 16-bit integers in "a" and "b", and pack the signed 16-bit results in "dst". dst[15:0] := a[15:0] - a[31:16] dst[31:16] := a[47:32] - a[63:48] dst[47:32] := a[79:64] - a[95:80] dst[63:48] := a[111:96] - a[127:112] dst[79:64] := b[15:0] - b[31:16] dst[95:80] := b[47:32] - b[63:48] dst[111:96] := b[79:64] - b[95:80] dst[127:112] := b[111:96] - b[127:112] dst[143:128] := a[143:128] - a[159:144] dst[159:144] := a[175:160] - a[191:176] dst[175:160] := a[207:192] - a[223:208] dst[191:176] := a[239:224] - a[255:240] dst[207:192] := b[143:128] - b[159:144] dst[223:208] := b[175:160] - b[191:176] dst[239:224] := b[207:192] - b[223:208] dst[255:240] := b[239:224] - b[255:240] dst[MAX:256] := 0
Integer AVX2 Arithmetic Horizontally subtract adjacent pairs of 32-bit integers in "a" and "b", and pack the signed 32-bit results in "dst". dst[31:0] := a[31:0] - a[63:32] dst[63:32] := a[95:64] - a[127:96] dst[95:64] := b[31:0] - b[63:32] dst[127:96] := b[95:64] - b[127:96] dst[159:128] := a[159:128] - a[191:160] dst[191:160] := a[223:192] - a[255:224] dst[223:192] := b[159:128] - b[191:160] dst[255:224] := b[223:192] - b[255:224] dst[MAX:256] := 0
Integer AVX2 Arithmetic Horizontally subtract adjacent pairs of signed 16-bit integers in "a" and "b" using saturation, and pack the signed 16-bit results in "dst". dst[15:0] := Saturate16(a[15:0] - a[31:16]) dst[31:16] := Saturate16(a[47:32] - a[63:48]) dst[47:32] := Saturate16(a[79:64] - a[95:80]) dst[63:48] := Saturate16(a[111:96] - a[127:112]) dst[79:64] := Saturate16(b[15:0] - b[31:16]) dst[95:80] := Saturate16(b[47:32] - b[63:48]) dst[111:96] := Saturate16(b[79:64] - b[95:80]) dst[127:112] := Saturate16(b[111:96] - b[127:112]) dst[143:128] := Saturate16(a[143:128] - a[159:144]) dst[159:144] := Saturate16(a[175:160] - a[191:176]) dst[175:160] := Saturate16(a[207:192] - a[223:208]) dst[191:176] := Saturate16(a[239:224] - a[255:240]) dst[207:192] := Saturate16(b[143:128] - b[159:144]) dst[223:208] := Saturate16(b[175:160] - b[191:176]) dst[239:224] := Saturate16(b[207:192] - b[223:208]) dst[255:240] := Saturate16(b[239:224] - b[255:240]) dst[MAX:256] := 0
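The horizontal add and subtract entries above interleave results from "a" and "b" within each 128-bit lane, and the subtract forms compute the lower element minus the upper one. A Python sketch of the 32-bit variants, assuming each input is a list of eight lane values and results wrap modulo 2^32 (function names are illustrative):

```python
MASK32 = 0xFFFFFFFF

def _horizontal(a, b, op):
    """Apply op(lower, upper) to adjacent pairs, lane by 128-bit lane."""
    dst = []
    for lane in range(2):
        base = lane * 4  # four 32-bit elements per 128-bit lane
        for src in (a, b):
            dst.append(op(src[base], src[base + 1]) & MASK32)
            dst.append(op(src[base + 2], src[base + 3]) & MASK32)
    return dst

def hadd_epi32(a, b):
    return _horizontal(a, b, lambda lo, hi: lo + hi)

def hsub_epi32(a, b):
    return _horizontal(a, b, lambda lo, hi: lo - hi)

a = [1, 2, 3, 4, 5, 6, 7, 8]
b = [10, 20, 30, 40, 50, 60, 70, 80]
```

The output order is two sums from the low half of "a", two from the low half of "b", then the same pattern for the high halves, so the result is not simply "all of a, then all of b".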
Floating Point AVX2 Load Gather double-precision (64-bit) floating-point elements from memory using 32-bit indices. 64-bit elements are loaded from addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst". "scale" should be 1, 2, 4 or 8. FOR j := 0 to 1 i := j*64 m := j*32 addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8 dst[i+63:i] := MEM[addr+63:addr] ENDFOR dst[MAX:128] := 0
Floating Point AVX2 Load Gather double-precision (64-bit) floating-point elements from memory using 32-bit indices. 64-bit elements are loaded from addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst". "scale" should be 1, 2, 4 or 8. FOR j := 0 to 3 i := j*64 m := j*32 addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8 dst[i+63:i] := MEM[addr+63:addr] ENDFOR dst[MAX:256] := 0
Floating Point AVX2 Load Gather single-precision (32-bit) floating-point elements from memory using 32-bit indices. 32-bit elements are loaded from addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst". "scale" should be 1, 2, 4 or 8. FOR j := 0 to 3 i := j*32 m := j*32 addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8 dst[i+31:i] := MEM[addr+31:addr] ENDFOR dst[MAX:128] := 0
Floating Point AVX2 Load Gather single-precision (32-bit) floating-point elements from memory using 32-bit indices. 32-bit elements are loaded from addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst". "scale" should be 1, 2, 4 or 8. FOR j := 0 to 7 i := j*32 m := j*32 addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8 dst[i+31:i] := MEM[addr+31:addr] ENDFOR dst[MAX:256] := 0
Integer AVX2 Load Gather 32-bit integers from memory using 32-bit indices. 32-bit elements are loaded from addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst". "scale" should be 1, 2, 4 or 8. FOR j := 0 to 3 i := j*32 m := j*32 addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8 dst[i+31:i] := MEM[addr+31:addr] ENDFOR dst[MAX:128] := 0
Integer AVX2 Load Gather 32-bit integers from memory using 32-bit indices. 32-bit elements are loaded from addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst". "scale" should be 1, 2, 4 or 8. FOR j := 0 to 7 i := j*32 m := j*32 addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8 dst[i+31:i] := MEM[addr+31:addr] ENDFOR dst[MAX:256] := 0
Integer AVX2 Load Gather 64-bit integers from memory using 32-bit indices. 64-bit elements are loaded from addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst". "scale" should be 1, 2, 4 or 8. FOR j := 0 to 1 i := j*64 m := j*32 addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8 dst[i+63:i] := MEM[addr+63:addr] ENDFOR dst[MAX:128] := 0
Integer AVX2 Load Gather 64-bit integers from memory using 32-bit indices. 64-bit elements are loaded from addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst". "scale" should be 1, 2, 4 or 8. FOR j := 0 to 3 i := j*64 m := j*32 addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8 dst[i+63:i] := MEM[addr+63:addr] ENDFOR dst[MAX:256] := 0
Floating Point AVX2 Load Gather double-precision (64-bit) floating-point elements from memory using 64-bit indices. 64-bit elements are loaded from addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst". "scale" should be 1, 2, 4 or 8. FOR j := 0 to 1 i := j*64 m := j*64 addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8 dst[i+63:i] := MEM[addr+63:addr] ENDFOR dst[MAX:128] := 0
Floating Point AVX2 Load Gather double-precision (64-bit) floating-point elements from memory using 64-bit indices. 64-bit elements are loaded from addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst". "scale" should be 1, 2, 4 or 8. FOR j := 0 to 3 i := j*64 m := j*64 addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8 dst[i+63:i] := MEM[addr+63:addr] ENDFOR dst[MAX:256] := 0
Floating Point AVX2 Load Gather single-precision (32-bit) floating-point elements from memory using 64-bit indices. 32-bit elements are loaded from addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst". "scale" should be 1, 2, 4 or 8. FOR j := 0 to 1 i := j*32 m := j*64 addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8 dst[i+31:i] := MEM[addr+31:addr] ENDFOR dst[MAX:64] := 0
Floating Point AVX2 Load Gather single-precision (32-bit) floating-point elements from memory using 64-bit indices. 32-bit elements are loaded from addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst". "scale" should be 1, 2, 4 or 8. FOR j := 0 to 3 i := j*32 m := j*64 addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8 dst[i+31:i] := MEM[addr+31:addr] ENDFOR dst[MAX:128] := 0
Integer AVX2 Load Gather 32-bit integers from memory using 64-bit indices. 32-bit elements are loaded from addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst". "scale" should be 1, 2, 4 or 8. FOR j := 0 to 1 i := j*32 m := j*64 addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8 dst[i+31:i] := MEM[addr+31:addr] ENDFOR dst[MAX:64] := 0
Integer AVX2 Load Gather 32-bit integers from memory using 64-bit indices. 32-bit elements are loaded from addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst". "scale" should be 1, 2, 4 or 8. FOR j := 0 to 3 i := j*32 m := j*64 addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8 dst[i+31:i] := MEM[addr+31:addr] ENDFOR dst[MAX:128] := 0
Integer AVX2 Load Gather 64-bit integers from memory using 64-bit indices. 64-bit elements are loaded from addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst". "scale" should be 1, 2, 4 or 8. FOR j := 0 to 1 i := j*64 m := j*64 addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8 dst[i+63:i] := MEM[addr+63:addr] ENDFOR dst[MAX:128] := 0
Integer AVX2 Load Gather 64-bit integers from memory using 64-bit indices. 64-bit elements are loaded from addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst". "scale" should be 1, 2, 4 or 8. FOR j := 0 to 3 i := j*64 m := j*64 addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8 dst[i+63:i] := MEM[addr+63:addr] ENDFOR dst[MAX:256] := 0
Integer AVX2 Swizzle Copy "a" to "dst", then insert 128 bits (composed of integer data) from "b" into "dst" at the location specified by "imm8". dst[255:0] := a[255:0] CASE (imm8[0]) OF 0: dst[127:0] := b[127:0] 1: dst[255:128] := b[127:0] ESAC dst[MAX:256] := 0
Integer AVX2 Arithmetic Multiply packed signed 16-bit integers in "a" and "b", producing intermediate signed 32-bit integers. Horizontally add adjacent pairs of intermediate 32-bit integers, and pack the results in "dst". FOR j := 0 to 7 i := j*32 dst[i+31:i] := SignExtend32(a[i+31:i+16]*b[i+31:i+16]) + SignExtend32(a[i+15:i]*b[i+15:i]) ENDFOR dst[MAX:256] := 0
Integer AVX2 Arithmetic Vertically multiply each unsigned 8-bit integer from "a" with the corresponding signed 8-bit integer from "b", producing intermediate signed 16-bit integers. Horizontally add adjacent pairs of intermediate signed 16-bit integers, and pack the saturated results in "dst". FOR j := 0 to 15 i := j*16 dst[i+15:i] := Saturate16( a[i+15:i+8]*b[i+15:i+8] + a[i+7:i]*b[i+7:i] ) ENDFOR dst[MAX:256] := 0
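The unsigned-by-signed multiply-add above is easy to misread, because the two operands are interpreted differently and the pairwise sum saturates rather than wraps. The following is a minimal scalar sketch of that pseudocode, not the hardware implementation; the function names `saturate16` and `maddubs_epi16` are illustrative, and the inputs are assumed to be plain Python lists of already-decoded element values.

```python
def saturate16(x):
    """Clamp an integer to the signed 16-bit range."""
    return max(-32768, min(32767, x))


def maddubs_epi16(a, b):
    """Scalar model: a holds unsigned bytes (0..255), b holds signed bytes (-128..127).

    Adjacent products are summed pairwise and the sum is saturated to 16 bits.
    """
    if len(a) != len(b) or len(a) % 2:
        raise ValueError("a and b must have the same even length")
    return [
        saturate16(a[k] * b[k] + a[k + 1] * b[k + 1])
        for k in range(0, len(a), 2)
    ]


# 255*127 + 255*127 = 64770 exceeds the signed 16-bit range, so it saturates.
print(maddubs_epi16([255, 255, 1, 2], [127, 127, -3, 4]))  # [32767, 5]
```

The saturation only matters for inputs near the extremes; for typical data the result is the exact pairwise sum.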
Floating Point AVX2 Load Gather double-precision (64-bit) floating-point elements from memory using 32-bit indices. 64-bit elements are loaded from addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst" using "mask" (elements are copied from "src" when the highest bit is not set in the corresponding element). "scale" should be 1, 2, 4 or 8. FOR j := 0 to 1 i := j*64 m := j*32 IF mask[i+63] addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8 dst[i+63:i] := MEM[addr+63:addr] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR mask[MAX:128] := 0 dst[MAX:128] := 0
Floating Point AVX2 Load Gather double-precision (64-bit) floating-point elements from memory using 32-bit indices. 64-bit elements are loaded from addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst" using "mask" (elements are copied from "src" when the highest bit is not set in the corresponding element). "scale" should be 1, 2, 4 or 8. FOR j := 0 to 3 i := j*64 m := j*32 IF mask[i+63] addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8 dst[i+63:i] := MEM[addr+63:addr] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR mask[MAX:256] := 0 dst[MAX:256] := 0
Floating Point AVX2 Load Gather single-precision (32-bit) floating-point elements from memory using 32-bit indices. 32-bit elements are loaded from addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst" using "mask" (elements are copied from "src" when the highest bit is not set in the corresponding element). "scale" should be 1, 2, 4 or 8. FOR j := 0 to 3 i := j*32 m := j*32 IF mask[i+31] addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8 dst[i+31:i] := MEM[addr+31:addr] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR mask[MAX:128] := 0 dst[MAX:128] := 0
Floating Point AVX2 Load Gather single-precision (32-bit) floating-point elements from memory using 32-bit indices. 32-bit elements are loaded from addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst" using "mask" (elements are copied from "src" when the highest bit is not set in the corresponding element). "scale" should be 1, 2, 4 or 8. FOR j := 0 to 7 i := j*32 m := j*32 IF mask[i+31] addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8 dst[i+31:i] := MEM[addr+31:addr] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR mask[MAX:256] := 0 dst[MAX:256] := 0
Integer AVX2 Load Gather 32-bit integers from memory using 32-bit indices. 32-bit elements are loaded from addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst" using "mask" (elements are copied from "src" when the highest bit is not set in the corresponding element). "scale" should be 1, 2, 4 or 8. FOR j := 0 to 3 i := j*32 m := j*32 IF mask[i+31] addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8 dst[i+31:i] := MEM[addr+31:addr] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR mask[MAX:128] := 0 dst[MAX:128] := 0
Integer AVX2 Load Gather 32-bit integers from memory using 32-bit indices. 32-bit elements are loaded from addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst" using "mask" (elements are copied from "src" when the highest bit is not set in the corresponding element). "scale" should be 1, 2, 4 or 8. FOR j := 0 to 7 i := j*32 m := j*32 IF mask[i+31] addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8 dst[i+31:i] := MEM[addr+31:addr] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR mask[MAX:256] := 0 dst[MAX:256] := 0
Integer AVX2 Load Gather 64-bit integers from memory using 32-bit indices. 64-bit elements are loaded from addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst" using "mask" (elements are copied from "src" when the highest bit is not set in the corresponding element). "scale" should be 1, 2, 4 or 8. FOR j := 0 to 1 i := j*64 m := j*32 IF mask[i+63] addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8 dst[i+63:i] := MEM[addr+63:addr] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR mask[MAX:128] := 0 dst[MAX:128] := 0
Integer AVX2 Load Gather 64-bit integers from memory using 32-bit indices. 64-bit elements are loaded from addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst" using "mask" (elements are copied from "src" when the highest bit is not set in the corresponding element). "scale" should be 1, 2, 4 or 8. FOR j := 0 to 3 i := j*64 m := j*32 IF mask[i+63] addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8 dst[i+63:i] := MEM[addr+63:addr] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR mask[MAX:256] := 0 dst[MAX:256] := 0
Floating Point AVX2 Load Gather double-precision (64-bit) floating-point elements from memory using 64-bit indices. 64-bit elements are loaded from addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst" using "mask" (elements are copied from "src" when the highest bit is not set in the corresponding element). "scale" should be 1, 2, 4 or 8. FOR j := 0 to 1 i := j*64 m := j*64 IF mask[i+63] addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8 dst[i+63:i] := MEM[addr+63:addr] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR mask[MAX:128] := 0 dst[MAX:128] := 0
Floating Point AVX2 Load Gather double-precision (64-bit) floating-point elements from memory using 64-bit indices. 64-bit elements are loaded from addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst" using "mask" (elements are copied from "src" when the highest bit is not set in the corresponding element). "scale" should be 1, 2, 4 or 8. FOR j := 0 to 3 i := j*64 m := j*64 IF mask[i+63] addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8 dst[i+63:i] := MEM[addr+63:addr] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR mask[MAX:256] := 0 dst[MAX:256] := 0
Floating Point AVX2 Load Gather single-precision (32-bit) floating-point elements from memory using 64-bit indices. 32-bit elements are loaded from addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst" using "mask" (elements are copied from "src" when the highest bit is not set in the corresponding element). "scale" should be 1, 2, 4 or 8. FOR j := 0 to 1 i := j*32 m := j*64 IF mask[i+31] addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8 dst[i+31:i] := MEM[addr+31:addr] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR mask[MAX:64] := 0 dst[MAX:64] := 0
Floating Point AVX2 Load Gather single-precision (32-bit) floating-point elements from memory using 64-bit indices. 32-bit elements are loaded from addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst" using "mask" (elements are copied from "src" when the highest bit is not set in the corresponding element). "scale" should be 1, 2, 4 or 8. FOR j := 0 to 3 i := j*32 m := j*64 IF mask[i+31] addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8 dst[i+31:i] := MEM[addr+31:addr] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR mask[MAX:128] := 0 dst[MAX:128] := 0
Integer AVX2 Load Gather 32-bit integers from memory using 64-bit indices. 32-bit elements are loaded from addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst" using "mask" (elements are copied from "src" when the highest bit is not set in the corresponding element). "scale" should be 1, 2, 4 or 8. FOR j := 0 to 1 i := j*32 m := j*64 IF mask[i+31] addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8 dst[i+31:i] := MEM[addr+31:addr] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR mask[MAX:64] := 0 dst[MAX:64] := 0
Integer AVX2 Load Gather 32-bit integers from memory using 64-bit indices. 32-bit elements are loaded from addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst" using "mask" (elements are copied from "src" when the highest bit is not set in the corresponding element). "scale" should be 1, 2, 4 or 8. FOR j := 0 to 3 i := j*32 m := j*64 IF mask[i+31] addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8 dst[i+31:i] := MEM[addr+31:addr] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR mask[MAX:128] := 0 dst[MAX:128] := 0
Integer AVX2 Load Gather 64-bit integers from memory using 64-bit indices. 64-bit elements are loaded from addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst" using "mask" (elements are copied from "src" when the highest bit is not set in the corresponding element). "scale" should be 1, 2, 4 or 8. FOR j := 0 to 1 i := j*64 m := j*64 IF mask[i+63] addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8 dst[i+63:i] := MEM[addr+63:addr] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR mask[MAX:128] := 0 dst[MAX:128] := 0
Integer AVX2 Load Gather 64-bit integers from memory using 64-bit indices. 64-bit elements are loaded from addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst" using "mask" (elements are copied from "src" when the highest bit is not set in the corresponding element). "scale" should be 1, 2, 4 or 8. FOR j := 0 to 3 i := j*64 m := j*64 IF mask[i+63] addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8 dst[i+63:i] := MEM[addr+63:addr] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR mask[MAX:256] := 0 dst[MAX:256] := 0
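The masked gather entries above all follow one pattern: for each element, the highest bit of the mask element decides whether the value comes from memory or from "src". The sketch below models the 32-bit-index, 32-bit-element case in scalar Python. It is an assumption-laden illustration: memory is modelled as a little-endian `bytes` object, addresses are byte offsets (the trailing `* 8` in the pseudocode is read here as a byte-to-bit conversion), and the name `mask_gather_epi32` is hypothetical.

```python
import struct


def mask_gather_epi32(src, memory, base, vindex, mask, scale):
    """Scalar model of a masked gather of 32-bit integers using 32-bit indices."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale should be 1, 2, 4 or 8")
    dst = []
    for j, index in enumerate(vindex):
        if mask[j] & 0x80000000:  # only the highest bit of the mask element is tested
            addr = base + index * scale  # byte address in this model
            if addr < 0:
                raise ValueError("this model does not support negative addresses")
            (value,) = struct.unpack_from("<i", memory, addr)
            dst.append(value)
        else:
            dst.append(src[j])  # element is copied from src
    return dst


memory = struct.pack("<8i", 10, 11, 12, 13, 14, 15, 16, 17)
mask = [0x80000000, 0, 0xFFFFFFFF, 0x7FFFFFFF]
print(mask_gather_epi32([-1, -1, -1, -1], memory, 0, [7, 0, 3, 5], mask, 4))
# [17, -1, 13, -1]
```

Dropping the mask test (always taking the memory branch) gives the unmasked gather variants listed earlier.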
Integer AVX2 Load Load packed 32-bit integers from memory into "dst" using "mask" (elements are zeroed out when the highest bit is not set in the corresponding element). FOR j := 0 to 3 i := j*32 IF mask[i+31] dst[i+31:i] := MEM[mem_addr+i+31:mem_addr+i] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX2 Load Load packed 32-bit integers from memory into "dst" using "mask" (elements are zeroed out when the highest bit is not set in the corresponding element). FOR j := 0 to 7 i := j*32 IF mask[i+31] dst[i+31:i] := MEM[mem_addr+i+31:mem_addr+i] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX2 Load Load packed 64-bit integers from memory into "dst" using "mask" (elements are zeroed out when the highest bit is not set in the corresponding element). FOR j := 0 to 1 i := j*64 IF mask[i+63] dst[i+63:i] := MEM[mem_addr+i+63:mem_addr+i] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX2 Load Load packed 64-bit integers from memory into "dst" using "mask" (elements are zeroed out when the highest bit is not set in the corresponding element). FOR j := 0 to 3 i := j*64 IF mask[i+63] dst[i+63:i] := MEM[mem_addr+i+63:mem_addr+i] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX2 Store Store packed 32-bit integers from "a" into memory using "mask" (elements are not stored when the highest bit is not set in the corresponding element). FOR j := 0 to 3 i := j*32 IF mask[i+31] MEM[mem_addr+i+31:mem_addr+i] := a[i+31:i] FI ENDFOR
Integer AVX2 Store Store packed 32-bit integers from "a" into memory using "mask" (elements are not stored when the highest bit is not set in the corresponding element). FOR j := 0 to 7 i := j*32 IF mask[i+31] MEM[mem_addr+i+31:mem_addr+i] := a[i+31:i] FI ENDFOR
Integer AVX2 Store Store packed 64-bit integers from "a" into memory using "mask" (elements are not stored when the highest bit is not set in the corresponding element). FOR j := 0 to 1 i := j*64 IF mask[i+63] MEM[mem_addr+i+63:mem_addr+i] := a[i+63:i] FI ENDFOR
Integer AVX2 Store Store packed 64-bit integers from "a" into memory using "mask" (elements are not stored when the highest bit is not set in the corresponding element). FOR j := 0 to 3 i := j*64 IF mask[i+63] MEM[mem_addr+i+63:mem_addr+i] := a[i+63:i] FI ENDFOR
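The masked load and store entries above differ in what happens to unselected elements: a masked load zeroes them in "dst", while a masked store leaves the corresponding memory untouched. A minimal scalar sketch of the 32-bit variants follows; memory is modelled as a Python list of element values and the function names are illustrative.

```python
HIGH_BIT = 0x80000000  # highest bit of a 32-bit mask element


def maskload_epi32(memory, mask):
    """Return memory[j] where the mask's highest bit is set, else 0."""
    return [value if m & HIGH_BIT else 0 for value, m in zip(memory, mask)]


def maskstore_epi32(memory, mask, a):
    """Write a[j] to memory[j] only where the mask's highest bit is set."""
    for j, (m, value) in enumerate(zip(mask, a)):
        if m & HIGH_BIT:
            memory[j] = value


buffer = [1, 2, 3, 4]
mask = [HIGH_BIT, 0, HIGH_BIT, 0]
print(maskload_epi32(buffer, mask))  # [1, 0, 3, 0]
maskstore_epi32(buffer, mask, [9, 9, 9, 9])
print(buffer)  # [9, 2, 9, 4]
```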
Integer AVX2 Special Math Functions Compare packed signed 8-bit integers in "a" and "b", and store packed maximum values in "dst". FOR j := 0 to 31 i := j*8 dst[i+7:i] := MAX(a[i+7:i], b[i+7:i]) ENDFOR dst[MAX:256] := 0
Integer AVX2 Special Math Functions Compare packed signed 16-bit integers in "a" and "b", and store packed maximum values in "dst". FOR j := 0 to 15 i := j*16 dst[i+15:i] := MAX(a[i+15:i], b[i+15:i]) ENDFOR dst[MAX:256] := 0
Integer AVX2 Special Math Functions Compare packed signed 32-bit integers in "a" and "b", and store packed maximum values in "dst". FOR j := 0 to 7 i := j*32 dst[i+31:i] := MAX(a[i+31:i], b[i+31:i]) ENDFOR dst[MAX:256] := 0
Integer AVX2 Special Math Functions Compare packed unsigned 8-bit integers in "a" and "b", and store packed maximum values in "dst". FOR j := 0 to 31 i := j*8 dst[i+7:i] := MAX(a[i+7:i], b[i+7:i]) ENDFOR dst[MAX:256] := 0
Integer AVX2 Special Math Functions Compare packed unsigned 16-bit integers in "a" and "b", and store packed maximum values in "dst". FOR j := 0 to 15 i := j*16 dst[i+15:i] := MAX(a[i+15:i], b[i+15:i]) ENDFOR dst[MAX:256] := 0
Integer AVX2 Special Math Functions Compare packed unsigned 32-bit integers in "a" and "b", and store packed maximum values in "dst". FOR j := 0 to 7 i := j*32 dst[i+31:i] := MAX(a[i+31:i], b[i+31:i]) ENDFOR dst[MAX:256] := 0
Integer AVX2 Special Math Functions Compare packed signed 8-bit integers in "a" and "b", and store packed minimum values in "dst". FOR j := 0 to 31 i := j*8 dst[i+7:i] := MIN(a[i+7:i], b[i+7:i]) ENDFOR dst[MAX:256] := 0
Integer AVX2 Special Math Functions Compare packed signed 16-bit integers in "a" and "b", and store packed minimum values in "dst". FOR j := 0 to 15 i := j*16 dst[i+15:i] := MIN(a[i+15:i], b[i+15:i]) ENDFOR dst[MAX:256] := 0
Integer AVX2 Special Math Functions Compare packed signed 32-bit integers in "a" and "b", and store packed minimum values in "dst". FOR j := 0 to 7 i := j*32 dst[i+31:i] := MIN(a[i+31:i], b[i+31:i]) ENDFOR dst[MAX:256] := 0
Integer AVX2 Special Math Functions Compare packed unsigned 8-bit integers in "a" and "b", and store packed minimum values in "dst". FOR j := 0 to 31 i := j*8 dst[i+7:i] := MIN(a[i+7:i], b[i+7:i]) ENDFOR dst[MAX:256] := 0
Integer AVX2 Special Math Functions Compare packed unsigned 16-bit integers in "a" and "b", and store packed minimum values in "dst". FOR j := 0 to 15 i := j*16 dst[i+15:i] := MIN(a[i+15:i], b[i+15:i]) ENDFOR dst[MAX:256] := 0
Integer AVX2 Special Math Functions Compare packed unsigned 32-bit integers in "a" and "b", and store packed minimum values in "dst". FOR j := 0 to 7 i := j*32 dst[i+31:i] := MIN(a[i+31:i], b[i+31:i]) ENDFOR dst[MAX:256] := 0
Integer AVX2 Miscellaneous Create mask from the most significant bit of each 8-bit element in "a", and store the result in "dst". FOR j := 0 to 31 i := j*8 dst[j] := a[i+7] ENDFOR
Integer AVX2 Miscellaneous Compute the sum of absolute differences (SADs) of quadruplets of unsigned 8-bit integers in "a" compared to those in "b", and store the 16-bit results in "dst". Eight SADs are performed for each 128-bit lane using one quadruplet from "b" and eight quadruplets from "a". One quadruplet is selected from "b" starting at the offset specified in "imm8". Eight quadruplets are formed from sequential 8-bit integers selected from "a" starting at the offset specified in "imm8". DEFINE MPSADBW(a[127:0], b[127:0], imm8[2:0]) { a_offset := imm8[2]*32 b_offset := imm8[1:0]*32 FOR j := 0 to 7 i := j*8 k := a_offset+i l := b_offset tmp[i*2+15:i*2] := ABS(Signed(a[k+7:k] - b[l+7:l])) + ABS(Signed(a[k+15:k+8] - b[l+15:l+8])) + \ ABS(Signed(a[k+23:k+16] - b[l+23:l+16])) + ABS(Signed(a[k+31:k+24] - b[l+31:l+24])) ENDFOR RETURN tmp[127:0] } dst[127:0] := MPSADBW(a[127:0], b[127:0], imm8[2:0]) dst[255:128] := MPSADBW(a[255:128], b[255:128], imm8[5:3]) dst[MAX:256] := 0
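The MPSADBW helper above slides an 8-position window over "a" while holding one quadruplet of "b" fixed. A scalar sketch of a single 128-bit lane may make the offsets easier to follow; it works in byte units (so the `*32` bit offsets become `*4`), takes lists of 16 unsigned byte values, and uses the hypothetical name `mpsadbw_lane`.

```python
def mpsadbw_lane(a, b, imm3):
    """Scalar model of MPSADBW for one 128-bit lane.

    a, b: lists of 16 unsigned bytes; imm3: the 3 control bits for this lane.
    Returns eight 16-bit sums of absolute differences.
    """
    a_off = ((imm3 >> 2) & 1) * 4  # imm8[2] selects a start of byte 0 or byte 4
    b_off = (imm3 & 3) * 4         # imm8[1:0] selects one of four quadruplets
    quad = b[b_off:b_off + 4]
    return [
        sum(abs(a[a_off + j + t] - quad[t]) for t in range(4))
        for j in range(8)
    ]


a = list(range(16))
print(mpsadbw_lane(a, [0] * 16, 0))  # [6, 10, 14, 18, 22, 26, 30, 34]
```

For the 256-bit operation, the same function would be applied to the low lane with `imm8[2:0]` and to the high lane with `imm8[5:3]`.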
Integer AVX2 Arithmetic Multiply the low signed 32-bit integers from each packed 64-bit element in "a" and "b", and store the signed 64-bit results in "dst". FOR j := 0 to 3 i := j*64 dst[i+63:i] := SignExtend64(a[i+31:i]) * SignExtend64(b[i+31:i]) ENDFOR dst[MAX:256] := 0
Integer AVX2 Arithmetic Multiply the low unsigned 32-bit integers from each packed 64-bit element in "a" and "b", and store the unsigned 64-bit results in "dst". FOR j := 0 to 3 i := j*64 dst[i+63:i] := a[i+31:i] * b[i+31:i] ENDFOR dst[MAX:256] := 0
Integer AVX2 Arithmetic Multiply the packed signed 16-bit integers in "a" and "b", producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in "dst". FOR j := 0 to 15 i := j*16 tmp[31:0] := SignExtend32(a[i+15:i]) * SignExtend32(b[i+15:i]) dst[i+15:i] := tmp[31:16] ENDFOR dst[MAX:256] := 0
Integer AVX2 Arithmetic Multiply the packed unsigned 16-bit integers in "a" and "b", producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in "dst". FOR j := 0 to 15 i := j*16 tmp[31:0] := a[i+15:i] * b[i+15:i] dst[i+15:i] := tmp[31:16] ENDFOR dst[MAX:256] := 0
Integer AVX2 Arithmetic Multiply packed signed 16-bit integers in "a" and "b", producing intermediate signed 32-bit integers. Truncate each intermediate integer to the 18 most significant bits, round by adding 1, and store bits [16:1] to "dst". FOR j := 0 to 15 i := j*16 tmp[31:0] := ((SignExtend32(a[i+15:i]) * SignExtend32(b[i+15:i])) >> 14) + 1 dst[i+15:i] := tmp[16:1] ENDFOR dst[MAX:256] := 0
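The rounding multiply above is commonly used for Q15 fixed-point arithmetic, and the shift-add-shift sequence is easier to check with concrete numbers. Below is a scalar sketch of one element of that pseudocode; the name `mulhrs_epi16` is illustrative and the inputs are assumed to be Python integers already in the signed 16-bit range.

```python
def mulhrs_epi16(a, b):
    """Scalar model of one element: ((a*b) >> 14) + 1, then keep bits [16:1]."""
    tmp = ((a * b) >> 14) + 1   # Python's >> is arithmetic for negative values
    result = (tmp >> 1) & 0xFFFF  # bits [16:1] of tmp
    return result - 0x10000 if result & 0x8000 else result  # reinterpret as signed


# 0.5 * 0.5 in Q15: 16384 * 16384 gives 8192, i.e. 0.25
print(mulhrs_epi16(16384, 16384))    # 8192
# The one overflowing case: -1.0 * -1.0 wraps back to -32768
print(mulhrs_epi16(-32768, -32768))  # -32768
```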
Integer AVX2 Arithmetic Multiply the packed signed 16-bit integers in "a" and "b", producing intermediate 32-bit integers, and store the low 16 bits of the intermediate integers in "dst". FOR j := 0 to 15 i := j*16 tmp[31:0] := SignExtend32(a[i+15:i]) * SignExtend32(b[i+15:i]) dst[i+15:i] := tmp[15:0] ENDFOR dst[MAX:256] := 0
Integer AVX2 Arithmetic Multiply the packed signed 32-bit integers in "a" and "b", producing intermediate 64-bit integers, and store the low 32 bits of the intermediate integers in "dst". FOR j := 0 to 7 i := j*32 tmp[63:0] := a[i+31:i] * b[i+31:i] dst[i+31:i] := tmp[31:0] ENDFOR dst[MAX:256] := 0
Integer AVX2 Logical Compute the bitwise OR of 256 bits (representing integer data) in "a" and "b", and store the result in "dst". dst[255:0] := (a[255:0] OR b[255:0]) dst[MAX:256] := 0
Integer AVX2 Miscellaneous Convert packed signed 16-bit integers from "a" and "b" to packed 8-bit integers using signed saturation, and store the results in "dst". dst[7:0] := Saturate8(a[15:0]) dst[15:8] := Saturate8(a[31:16]) dst[23:16] := Saturate8(a[47:32]) dst[31:24] := Saturate8(a[63:48]) dst[39:32] := Saturate8(a[79:64]) dst[47:40] := Saturate8(a[95:80]) dst[55:48] := Saturate8(a[111:96]) dst[63:56] := Saturate8(a[127:112]) dst[71:64] := Saturate8(b[15:0]) dst[79:72] := Saturate8(b[31:16]) dst[87:80] := Saturate8(b[47:32]) dst[95:88] := Saturate8(b[63:48]) dst[103:96] := Saturate8(b[79:64]) dst[111:104] := Saturate8(b[95:80]) dst[119:112] := Saturate8(b[111:96]) dst[127:120] := Saturate8(b[127:112]) dst[135:128] := Saturate8(a[143:128]) dst[143:136] := Saturate8(a[159:144]) dst[151:144] := Saturate8(a[175:160]) dst[159:152] := Saturate8(a[191:176]) dst[167:160] := Saturate8(a[207:192]) dst[175:168] := Saturate8(a[223:208]) dst[183:176] := Saturate8(a[239:224]) dst[191:184] := Saturate8(a[255:240]) dst[199:192] := Saturate8(b[143:128]) dst[207:200] := Saturate8(b[159:144]) dst[215:208] := Saturate8(b[175:160]) dst[223:216] := Saturate8(b[191:176]) dst[231:224] := Saturate8(b[207:192]) dst[239:232] := Saturate8(b[223:208]) dst[247:240] := Saturate8(b[239:224]) dst[255:248] := Saturate8(b[255:240]) dst[MAX:256] := 0
Integer AVX2 Miscellaneous Convert packed signed 32-bit integers from "a" and "b" to packed 16-bit integers using signed saturation, and store the results in "dst". dst[15:0] := Saturate16(a[31:0]) dst[31:16] := Saturate16(a[63:32]) dst[47:32] := Saturate16(a[95:64]) dst[63:48] := Saturate16(a[127:96]) dst[79:64] := Saturate16(b[31:0]) dst[95:80] := Saturate16(b[63:32]) dst[111:96] := Saturate16(b[95:64]) dst[127:112] := Saturate16(b[127:96]) dst[143:128] := Saturate16(a[159:128]) dst[159:144] := Saturate16(a[191:160]) dst[175:160] := Saturate16(a[223:192]) dst[191:176] := Saturate16(a[255:224]) dst[207:192] := Saturate16(b[159:128]) dst[223:208] := Saturate16(b[191:160]) dst[239:224] := Saturate16(b[223:192]) dst[255:240] := Saturate16(b[255:224]) dst[MAX:256] := 0
Integer AVX2 Miscellaneous Convert packed signed 16-bit integers from "a" and "b" to packed 8-bit integers using unsigned saturation, and store the results in "dst". dst[7:0] := SaturateU8(a[15:0]) dst[15:8] := SaturateU8(a[31:16]) dst[23:16] := SaturateU8(a[47:32]) dst[31:24] := SaturateU8(a[63:48]) dst[39:32] := SaturateU8(a[79:64]) dst[47:40] := SaturateU8(a[95:80]) dst[55:48] := SaturateU8(a[111:96]) dst[63:56] := SaturateU8(a[127:112]) dst[71:64] := SaturateU8(b[15:0]) dst[79:72] := SaturateU8(b[31:16]) dst[87:80] := SaturateU8(b[47:32]) dst[95:88] := SaturateU8(b[63:48]) dst[103:96] := SaturateU8(b[79:64]) dst[111:104] := SaturateU8(b[95:80]) dst[119:112] := SaturateU8(b[111:96]) dst[127:120] := SaturateU8(b[127:112]) dst[135:128] := SaturateU8(a[143:128]) dst[143:136] := SaturateU8(a[159:144]) dst[151:144] := SaturateU8(a[175:160]) dst[159:152] := SaturateU8(a[191:176]) dst[167:160] := SaturateU8(a[207:192]) dst[175:168] := SaturateU8(a[223:208]) dst[183:176] := SaturateU8(a[239:224]) dst[191:184] := SaturateU8(a[255:240]) dst[199:192] := SaturateU8(b[143:128]) dst[207:200] := SaturateU8(b[159:144]) dst[215:208] := SaturateU8(b[175:160]) dst[223:216] := SaturateU8(b[191:176]) dst[231:224] := SaturateU8(b[207:192]) dst[239:232] := SaturateU8(b[223:208]) dst[247:240] := SaturateU8(b[239:224]) dst[255:248] := SaturateU8(b[255:240]) dst[MAX:256] := 0
Integer AVX2 Miscellaneous Convert packed signed 32-bit integers from "a" and "b" to packed 16-bit integers using unsigned saturation, and store the results in "dst". dst[15:0] := SaturateU16(a[31:0]) dst[31:16] := SaturateU16(a[63:32]) dst[47:32] := SaturateU16(a[95:64]) dst[63:48] := SaturateU16(a[127:96]) dst[79:64] := SaturateU16(b[31:0]) dst[95:80] := SaturateU16(b[63:32]) dst[111:96] := SaturateU16(b[95:64]) dst[127:112] := SaturateU16(b[127:96]) dst[143:128] := SaturateU16(a[159:128]) dst[159:144] := SaturateU16(a[191:160]) dst[175:160] := SaturateU16(a[223:192]) dst[191:176] := SaturateU16(a[255:224]) dst[207:192] := SaturateU16(b[159:128]) dst[223:208] := SaturateU16(b[191:160]) dst[239:224] := SaturateU16(b[223:192]) dst[255:240] := SaturateU16(b[255:224]) dst[MAX:256] := 0
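The four pack entries above share a layout that is easy to overlook in the long assignment lists: within each 128-bit lane, the narrowed elements of "a" come first, followed by those of "b", so the 256-bit result interleaves the inputs lane by lane. The sketch below models the signed 16-bit to 8-bit case; `packs_epi16` is an illustrative name and the inputs are assumed to be lists of 16 signed values.

```python
def saturate8(x):
    """Clamp an integer to the signed 8-bit range."""
    return max(-128, min(127, x))


def packs_epi16(a, b):
    """Scalar model: narrow two vectors of signed 16-bit values, lane by lane."""
    out = []
    for lane_start in range(0, len(a), 8):  # eight 16-bit elements per 128-bit lane
        out += [saturate8(x) for x in a[lane_start:lane_start + 8]]
        out += [saturate8(x) for x in b[lane_start:lane_start + 8]]
    return out


a = [300] * 8 + [-300] * 8
b = [1] * 8 + [2] * 8
print(packs_epi16(a, b))
# [127]*8 + [1]*8 + [-128]*8 + [2]*8
```

Replacing `saturate8` with a clamp to 0..255 would give the unsigned-saturation variant.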
Integer AVX2 Swizzle Shuffle 128 bits (composed of integer data) selected by "imm8" from "a" and "b", and store the results in "dst". DEFINE SELECT4(src1, src2, control) { CASE(control[1:0]) OF 0: tmp[127:0] := src1[127:0] 1: tmp[127:0] := src1[255:128] 2: tmp[127:0] := src2[127:0] 3: tmp[127:0] := src2[255:128] ESAC IF control[3] tmp[127:0] := 0 FI RETURN tmp[127:0] } dst[127:0] := SELECT4(a[255:0], b[255:0], imm8[3:0]) dst[255:128] := SELECT4(a[255:0], b[255:0], imm8[7:4]) dst[MAX:256] := 0
Integer AVX2 Swizzle Shuffle 64-bit integers in "a" across lanes using the control in "imm8", and store the results in "dst". DEFINE SELECT4(src, control) { CASE(control[1:0]) OF 0: tmp[63:0] := src[63:0] 1: tmp[63:0] := src[127:64] 2: tmp[63:0] := src[191:128] 3: tmp[63:0] := src[255:192] ESAC RETURN tmp[63:0] } dst[63:0] := SELECT4(a[255:0], imm8[1:0]) dst[127:64] := SELECT4(a[255:0], imm8[3:2]) dst[191:128] := SELECT4(a[255:0], imm8[5:4]) dst[255:192] := SELECT4(a[255:0], imm8[7:6]) dst[MAX:256] := 0
Floating Point AVX2 Swizzle Shuffle double-precision (64-bit) floating-point elements in "a" across lanes using the control in "imm8", and store the results in "dst". DEFINE SELECT4(src, control) { CASE(control[1:0]) OF 0: tmp[63:0] := src[63:0] 1: tmp[63:0] := src[127:64] 2: tmp[63:0] := src[191:128] 3: tmp[63:0] := src[255:192] ESAC RETURN tmp[63:0] } dst[63:0] := SELECT4(a[255:0], imm8[1:0]) dst[127:64] := SELECT4(a[255:0], imm8[3:2]) dst[191:128] := SELECT4(a[255:0], imm8[5:4]) dst[255:192] := SELECT4(a[255:0], imm8[7:6]) dst[MAX:256] := 0
Integer AVX2 Swizzle Shuffle 32-bit integers in "a" across lanes using the corresponding index in "idx", and store the results in "dst". FOR j := 0 to 7 i := j*32 id := idx[i+2:i]*32 dst[i+31:i] := a[id+31:id] ENDFOR dst[MAX:256] := 0
Floating Point AVX2 Swizzle Shuffle single-precision (32-bit) floating-point elements in "a" across lanes using the corresponding index in "idx", and store the results in "dst". FOR j := 0 to 7 i := j*32 id := idx[i+2:i]*32 dst[i+31:i] := a[id+31:id] ENDFOR dst[MAX:256] := 0
Integer AVX2 Arithmetic Compute the absolute differences of packed unsigned 8-bit integers in "a" and "b", then horizontally sum each consecutive 8 differences to produce four unsigned 16-bit integers, and pack these unsigned 16-bit integers in the low 16 bits of 64-bit elements in "dst". FOR j := 0 to 31 i := j*8 tmp[i+7:i] := ABS(a[i+7:i] - b[i+7:i]) ENDFOR FOR j := 0 to 3 i := j*64 dst[i+15:i] := tmp[i+7:i] + tmp[i+15:i+8] + tmp[i+23:i+16] + tmp[i+31:i+24] + \ tmp[i+39:i+32] + tmp[i+47:i+40] + tmp[i+55:i+48] + tmp[i+63:i+56] dst[i+63:i+16] := 0 ENDFOR dst[MAX:256] := 0
Integer AVX2 Swizzle Shuffle 32-bit integers in "a" within 128-bit lanes using the control in "imm8", and store the results in "dst". DEFINE SELECT4(src, control) { CASE(control[1:0]) OF 0: tmp[31:0] := src[31:0] 1: tmp[31:0] := src[63:32] 2: tmp[31:0] := src[95:64] 3: tmp[31:0] := src[127:96] ESAC RETURN tmp[31:0] } dst[31:0] := SELECT4(a[127:0], imm8[1:0]) dst[63:32] := SELECT4(a[127:0], imm8[3:2]) dst[95:64] := SELECT4(a[127:0], imm8[5:4]) dst[127:96] := SELECT4(a[127:0], imm8[7:6]) dst[159:128] := SELECT4(a[255:128], imm8[1:0]) dst[191:160] := SELECT4(a[255:128], imm8[3:2]) dst[223:192] := SELECT4(a[255:128], imm8[5:4]) dst[255:224] := SELECT4(a[255:128], imm8[7:6]) dst[MAX:256] := 0
Integer AVX2 Swizzle Shuffle 8-bit integers in "a" within 128-bit lanes according to shuffle control mask in the corresponding 8-bit element of "b", and store the results in "dst". FOR j := 0 to 15 i := j*8 IF b[i+7] == 1 dst[i+7:i] := 0 ELSE index[3:0] := b[i+3:i] dst[i+7:i] := a[index*8+7:index*8] FI IF b[128+i+7] == 1 dst[128+i+7:128+i] := 0 ELSE index[3:0] := b[128+i+3:128+i] dst[128+i+7:128+i] := a[128+index*8+7:128+index*8] FI ENDFOR dst[MAX:256] := 0
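The in-lane byte shuffle above is easy to misread as a full 32-byte permute. A minimal scalar sketch, assuming byte lists of length 32 and a hypothetical function name `shuffle_epi8_256`, makes the lane restriction explicit:

```python
def shuffle_epi8_256(a, b):
    """Scalar model of the byte shuffle within each 128-bit (16-byte) lane."""
    assert len(a) == 32 and len(b) == 32
    dst = []
    for j in range(32):
        lane_base = (j // 16) * 16      # 0 for the low lane, 16 for the high lane
        if b[j] & 0x80:                 # control bit 7 set: zero the byte
            dst.append(0)
        else:                           # low 4 bits index within the same lane
            dst.append(a[lane_base + (b[j] & 0x0F)])
    return dst
```

Control bits 4 to 6 are ignored, so a control byte of 0x1F selects byte 15 of the current lane, never a byte from the other lane.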
Integer AVX2 Swizzle Shuffle 16-bit integers in the high 64 bits of 128-bit lanes of "a" using the control in "imm8". Store the results in the high 64 bits of 128-bit lanes of "dst", with the low 64 bits of 128-bit lanes being copied from "a" to "dst". dst[63:0] := a[63:0] dst[79:64] := (a >> (imm8[1:0] * 16))[79:64] dst[95:80] := (a >> (imm8[3:2] * 16))[79:64] dst[111:96] := (a >> (imm8[5:4] * 16))[79:64] dst[127:112] := (a >> (imm8[7:6] * 16))[79:64] dst[191:128] := a[191:128] dst[207:192] := (a >> (imm8[1:0] * 16))[207:192] dst[223:208] := (a >> (imm8[3:2] * 16))[207:192] dst[239:224] := (a >> (imm8[5:4] * 16))[207:192] dst[255:240] := (a >> (imm8[7:6] * 16))[207:192] dst[MAX:256] := 0
Integer AVX2 Swizzle Shuffle 16-bit integers in the low 64 bits of 128-bit lanes of "a" using the control in "imm8". Store the results in the low 64 bits of 128-bit lanes of "dst", with the high 64 bits of 128-bit lanes being copied from "a" to "dst". dst[15:0] := (a >> (imm8[1:0] * 16))[15:0] dst[31:16] := (a >> (imm8[3:2] * 16))[15:0] dst[47:32] := (a >> (imm8[5:4] * 16))[15:0] dst[63:48] := (a >> (imm8[7:6] * 16))[15:0] dst[127:64] := a[127:64] dst[143:128] := (a >> (imm8[1:0] * 16))[143:128] dst[159:144] := (a >> (imm8[3:2] * 16))[143:128] dst[175:160] := (a >> (imm8[5:4] * 16))[143:128] dst[191:176] := (a >> (imm8[7:6] * 16))[143:128] dst[255:192] := a[255:192] dst[MAX:256] := 0
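The shift-and-extract notation in the two half-lane shuffles above amounts to picking one of four 16-bit words per output slot. A sketch under the assumption that the 256-bit value is a list of sixteen 16-bit words (the helper name `shuffle_half_epi16` and its `high` flag are hypothetical):

```python
def shuffle_half_epi16(a, imm8, high):
    """Scalar model: shuffle four words in one half of each 128-bit lane."""
    assert len(a) == 16
    off = 4 if high else 0              # word offset of the half being shuffled
    dst = list(a)                       # the other half is copied unchanged
    for lane in (0, 8):                 # first word of each 128-bit lane
        for k in range(4):
            sel = (imm8 >> (2 * k)) & 3 # two control bits per output word
            dst[lane + off + k] = a[lane + off + sel]
    return dst
```

The same "imm8" control is applied to both lanes, so the two lanes cannot be shuffled differently with a single call.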
Integer AVX2 Arithmetic Negate packed signed 8-bit integers in "a" when the corresponding signed 8-bit integer in "b" is negative, and store the results in "dst". Elements in "dst" are zeroed out when the corresponding element in "b" is zero. FOR j := 0 to 31 i := j*8 IF b[i+7:i] < 0 dst[i+7:i] := -(a[i+7:i]) ELSE IF b[i+7:i] == 0 dst[i+7:i] := 0 ELSE dst[i+7:i] := a[i+7:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX2 Arithmetic Negate packed signed 16-bit integers in "a" when the corresponding signed 16-bit integer in "b" is negative, and store the results in "dst". Elements in "dst" are zeroed out when the corresponding element in "b" is zero. FOR j := 0 to 15 i := j*16 IF b[i+15:i] < 0 dst[i+15:i] := -(a[i+15:i]) ELSE IF b[i+15:i] == 0 dst[i+15:i] := 0 ELSE dst[i+15:i] := a[i+15:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX2 Arithmetic Negate packed signed 32-bit integers in "a" when the corresponding signed 32-bit integer in "b" is negative, and store the results in "dst". Elements in "dst" are zeroed out when the corresponding element in "b" is zero. FOR j := 0 to 7 i := j*32 IF b[i+31:i] < 0 dst[i+31:i] := -(a[i+31:i]) ELSE IF b[i+31:i] == 0 dst[i+31:i] := 0 ELSE dst[i+31:i] := a[i+31:i] FI ENDFOR dst[MAX:256] := 0
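The three conditional-negate entries above share one rule, which a scalar sketch can state directly. The function name `sign_epi` and the `bits` parameter are assumptions for illustration; elements are Python ints holding signed values of the given width.

```python
def sign_epi(a, b, bits):
    """Scalar model: negate a[j] if b[j] < 0, zero it if b[j] == 0, else keep it."""
    mask = (1 << bits) - 1

    def wrap(v):
        # Reduce to a signed two's-complement value of the given width.
        v &= mask
        return v - (1 << bits) if v >> (bits - 1) else v

    out = []
    for x, y in zip(a, b):
        if y < 0:
            out.append(wrap(-x))
        elif y == 0:
            out.append(0)
        else:
            out.append(x)
    return out
```

Negation wraps rather than saturates, so negating the most negative value (for example -128 with 8-bit elements) yields the same value again.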
Integer AVX2 Shift Shift 128-bit lanes in "a" left by "imm8" bytes while shifting in zeros, and store the results in "dst". tmp := imm8[7:0] IF tmp > 15 tmp := 16 FI dst[127:0] := a[127:0] << (tmp*8) dst[255:128] := a[255:128] << (tmp*8) dst[MAX:256] := 0
Integer AVX2 Shift Shift 128-bit lanes in "a" left by "imm8" bytes while shifting in zeros, and store the results in "dst". tmp := imm8[7:0] IF tmp > 15 tmp := 16 FI dst[127:0] := a[127:0] << (tmp*8) dst[255:128] := a[255:128] << (tmp*8) dst[MAX:256] := 0
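The key point of the lane-wise byte shift above is that bytes never cross the 128-bit lane boundary. A small sketch, assuming the 256-bit vector is held in a single Python int (the name `bslli_lanes` is hypothetical):

```python
LANE_MASK = (1 << 128) - 1


def bslli_lanes(a, imm8):
    """Scalar model: shift each 128-bit lane left by imm8 bytes, filling with zeros."""
    shift = min(imm8 & 0xFF, 16)        # counts above 15 clear the lane entirely
    lo = ((a & LANE_MASK) << (shift * 8)) & LANE_MASK
    hi = (((a >> 128) & LANE_MASK) << (shift * 8)) & LANE_MASK
    return (hi << 128) | lo
```

Masking each lane after the shift is what discards bytes that would otherwise spill from the low lane into the high lane.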
Integer AVX2 Shift Shift packed 16-bit integers in "a" left by "count" while shifting in zeros, and store the results in "dst". FOR j := 0 to 15 i := j*16 IF count[63:0] > 15 dst[i+15:i] := 0 ELSE dst[i+15:i] := ZeroExtend16(a[i+15:i] << count[63:0]) FI ENDFOR dst[MAX:256] := 0
Integer AVX2 Shift Shift packed 16-bit integers in "a" left by "imm8" while shifting in zeros, and store the results in "dst". FOR j := 0 to 15 i := j*16 IF imm8[7:0] > 15 dst[i+15:i] := 0 ELSE dst[i+15:i] := ZeroExtend16(a[i+15:i] << imm8[7:0]) FI ENDFOR dst[MAX:256] := 0
Integer AVX2 Shift Shift packed 32-bit integers in "a" left by "count" while shifting in zeros, and store the results in "dst". FOR j := 0 to 7 i := j*32 IF count[63:0] > 31 dst[i+31:i] := 0 ELSE dst[i+31:i] := ZeroExtend32(a[i+31:i] << count[63:0]) FI ENDFOR dst[MAX:256] := 0
Integer AVX2 Shift Shift packed 32-bit integers in "a" left by "imm8" while shifting in zeros, and store the results in "dst". FOR j := 0 to 7 i := j*32 IF imm8[7:0] > 31 dst[i+31:i] := 0 ELSE dst[i+31:i] := ZeroExtend32(a[i+31:i] << imm8[7:0]) FI ENDFOR dst[MAX:256] := 0
Integer AVX2 Shift Shift packed 64-bit integers in "a" left by "count" while shifting in zeros, and store the results in "dst". FOR j := 0 to 3 i := j*64 IF count[63:0] > 63 dst[i+63:i] := 0 ELSE dst[i+63:i] := ZeroExtend64(a[i+63:i] << count[63:0]) FI ENDFOR dst[MAX:256] := 0
Integer AVX2 Shift Shift packed 64-bit integers in "a" left by "imm8" while shifting in zeros, and store the results in "dst". FOR j := 0 to 3 i := j*64 IF imm8[7:0] > 63 dst[i+63:i] := 0 ELSE dst[i+63:i] := ZeroExtend64(a[i+63:i] << imm8[7:0]) FI ENDFOR dst[MAX:256] := 0
Integer AVX2 Shift Shift packed 32-bit integers in "a" left by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst". FOR j := 0 to 3 i := j*32 IF count[i+31:i] < 32 dst[i+31:i] := ZeroExtend32(a[i+31:i] << count[i+31:i]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX2 Shift Shift packed 32-bit integers in "a" left by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst". FOR j := 0 to 7 i := j*32 IF count[i+31:i] < 32 dst[i+31:i] := ZeroExtend32(a[i+31:i] << count[i+31:i]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX2 Shift Shift packed 64-bit integers in "a" left by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst". FOR j := 0 to 1 i := j*64 IF count[i+63:i] < 64 dst[i+63:i] := ZeroExtend64(a[i+63:i] << count[i+63:i]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX2 Shift Shift packed 64-bit integers in "a" left by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst". FOR j := 0 to 3 i := j*64 IF count[i+63:i] < 64 dst[i+63:i] := ZeroExtend64(a[i+63:i] << count[i+63:i]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:256] := 0
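The per-element left shifts above differ from the uniform-count shifts in that every element carries its own count. A scalar sketch, with the assumed helper name `sllv` and elements given as unsigned Python ints:

```python
def sllv(a, count, bits):
    """Scalar model: shift each element left by its own count, zero-filling."""
    mask = (1 << bits) - 1
    # A count of `bits` or more produces 0 rather than wrapping modulo the width.
    return [(x << c) & mask if c < bits else 0 for x, c in zip(a, count)]
```

This differs from scalar x86 shift instructions, which mask the count to the operand width instead of producing zero.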
Integer AVX2 Shift Shift packed 16-bit integers in "a" right by "count" while shifting in sign bits, and store the results in "dst". FOR j := 0 to 15 i := j*16 IF count[63:0] > 15 dst[i+15:i] := (a[i+15] ? 0xFFFF : 0x0) ELSE dst[i+15:i] := SignExtend16(a[i+15:i] >> count[63:0]) FI ENDFOR dst[MAX:256] := 0
Integer AVX2 Shift Shift packed 16-bit integers in "a" right by "imm8" while shifting in sign bits, and store the results in "dst". FOR j := 0 to 15 i := j*16 IF imm8[7:0] > 15 dst[i+15:i] := (a[i+15] ? 0xFFFF : 0x0) ELSE dst[i+15:i] := SignExtend16(a[i+15:i] >> imm8[7:0]) FI ENDFOR dst[MAX:256] := 0
Integer AVX2 Shift Shift packed 32-bit integers in "a" right by "count" while shifting in sign bits, and store the results in "dst". FOR j := 0 to 7 i := j*32 IF count[63:0] > 31 dst[i+31:i] := (a[i+31] ? 0xFFFFFFFF : 0x0) ELSE dst[i+31:i] := SignExtend32(a[i+31:i] >> count[63:0]) FI ENDFOR dst[MAX:256] := 0
Integer AVX2 Shift Shift packed 32-bit integers in "a" right by "imm8" while shifting in sign bits, and store the results in "dst". FOR j := 0 to 7 i := j*32 IF imm8[7:0] > 31 dst[i+31:i] := (a[i+31] ? 0xFFFFFFFF : 0x0) ELSE dst[i+31:i] := SignExtend32(a[i+31:i] >> imm8[7:0]) FI ENDFOR dst[MAX:256] := 0
Integer AVX2 Shift Shift packed 32-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in sign bits, and store the results in "dst". FOR j := 0 to 3 i := j*32 IF count[i+31:i] < 32 dst[i+31:i] := SignExtend32(a[i+31:i] >> count[i+31:i]) ELSE dst[i+31:i] := (a[i+31] ? 0xFFFFFFFF : 0) FI ENDFOR dst[MAX:128] := 0
Integer AVX2 Shift Shift packed 32-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in sign bits, and store the results in "dst". FOR j := 0 to 7 i := j*32 IF count[i+31:i] < 32 dst[i+31:i] := SignExtend32(a[i+31:i] >> count[i+31:i]) ELSE dst[i+31:i] := (a[i+31] ? 0xFFFFFFFF : 0) FI ENDFOR dst[MAX:256] := 0
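For the arithmetic right shifts above, an oversized count does not give zero; it fills the element with copies of the sign bit. A sketch assuming signed Python ints for "a" and unsigned counts (the name `srav_epi32` is hypothetical):

```python
def srav_epi32(a, count):
    """Scalar model: per-element arithmetic right shift of signed 32-bit values."""
    out = []
    for x, c in zip(a, count):
        if c < 32:
            out.append(x >> c)          # Python's >> on ints is already arithmetic
        else:
            out.append(-1 if x < 0 else 0)  # all sign bits: 0xFFFFFFFF or 0
    return out
```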
Integer AVX2 Shift Shift 128-bit lanes in "a" right by "imm8" bytes while shifting in zeros, and store the results in "dst". tmp := imm8[7:0] IF tmp > 15 tmp := 16 FI dst[127:0] := a[127:0] >> (tmp*8) dst[255:128] := a[255:128] >> (tmp*8) dst[MAX:256] := 0
Integer AVX2 Shift Shift 128-bit lanes in "a" right by "imm8" bytes while shifting in zeros, and store the results in "dst". tmp := imm8[7:0] IF tmp > 15 tmp := 16 FI dst[127:0] := a[127:0] >> (tmp*8) dst[255:128] := a[255:128] >> (tmp*8) dst[MAX:256] := 0
Integer AVX2 Shift Shift packed 16-bit integers in "a" right by "count" while shifting in zeros, and store the results in "dst". FOR j := 0 to 15 i := j*16 IF count[63:0] > 15 dst[i+15:i] := 0 ELSE dst[i+15:i] := ZeroExtend16(a[i+15:i] >> count[63:0]) FI ENDFOR dst[MAX:256] := 0
Integer AVX2 Shift Shift packed 16-bit integers in "a" right by "imm8" while shifting in zeros, and store the results in "dst". FOR j := 0 to 15 i := j*16 IF imm8[7:0] > 15 dst[i+15:i] := 0 ELSE dst[i+15:i] := ZeroExtend16(a[i+15:i] >> imm8[7:0]) FI ENDFOR dst[MAX:256] := 0
Integer AVX2 Shift Shift packed 32-bit integers in "a" right by "count" while shifting in zeros, and store the results in "dst". FOR j := 0 to 7 i := j*32 IF count[63:0] > 31 dst[i+31:i] := 0 ELSE dst[i+31:i] := ZeroExtend32(a[i+31:i] >> count[63:0]) FI ENDFOR dst[MAX:256] := 0
Integer AVX2 Shift Shift packed 32-bit integers in "a" right by "imm8" while shifting in zeros, and store the results in "dst". FOR j := 0 to 7 i := j*32 IF imm8[7:0] > 31 dst[i+31:i] := 0 ELSE dst[i+31:i] := ZeroExtend32(a[i+31:i] >> imm8[7:0]) FI ENDFOR dst[MAX:256] := 0
Integer AVX2 Shift Shift packed 64-bit integers in "a" right by "count" while shifting in zeros, and store the results in "dst". FOR j := 0 to 3 i := j*64 IF count[63:0] > 63 dst[i+63:i] := 0 ELSE dst[i+63:i] := ZeroExtend64(a[i+63:i] >> count[63:0]) FI ENDFOR dst[MAX:256] := 0
Integer AVX2 Shift Shift packed 64-bit integers in "a" right by "imm8" while shifting in zeros, and store the results in "dst". FOR j := 0 to 3 i := j*64 IF imm8[7:0] > 63 dst[i+63:i] := 0 ELSE dst[i+63:i] := ZeroExtend64(a[i+63:i] >> imm8[7:0]) FI ENDFOR dst[MAX:256] := 0
Integer AVX2 Shift Shift packed 32-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst". FOR j := 0 to 3 i := j*32 IF count[i+31:i] < 32 dst[i+31:i] := ZeroExtend32(a[i+31:i] >> count[i+31:i]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX2 Shift Shift packed 32-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst". FOR j := 0 to 7 i := j*32 IF count[i+31:i] < 32 dst[i+31:i] := ZeroExtend32(a[i+31:i] >> count[i+31:i]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX2 Shift Shift packed 64-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst". FOR j := 0 to 1 i := j*64 IF count[i+63:i] < 64 dst[i+63:i] := ZeroExtend64(a[i+63:i] >> count[i+63:i]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX2 Shift Shift packed 64-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst". FOR j := 0 to 3 i := j*64 IF count[i+63:i] < 64 dst[i+63:i] := ZeroExtend64(a[i+63:i] >> count[i+63:i]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX2 Load Load 256 bits of integer data from memory into "dst" using a non-temporal memory hint. "mem_addr" must be aligned on a 32-byte boundary or a general-protection exception may be generated. dst[255:0] := MEM[mem_addr+255:mem_addr] dst[MAX:256] := 0
Integer AVX2 Arithmetic Subtract packed 8-bit integers in "b" from packed 8-bit integers in "a", and store the results in "dst". FOR j := 0 to 31 i := j*8 dst[i+7:i] := a[i+7:i] - b[i+7:i] ENDFOR dst[MAX:256] := 0
Integer AVX2 Arithmetic Subtract packed 16-bit integers in "b" from packed 16-bit integers in "a", and store the results in "dst". FOR j := 0 to 15 i := j*16 dst[i+15:i] := a[i+15:i] - b[i+15:i] ENDFOR dst[MAX:256] := 0
Integer AVX2 Arithmetic Subtract packed 32-bit integers in "b" from packed 32-bit integers in "a", and store the results in "dst". FOR j := 0 to 7 i := j*32 dst[i+31:i] := a[i+31:i] - b[i+31:i] ENDFOR dst[MAX:256] := 0
Integer AVX2 Arithmetic Subtract packed 64-bit integers in "b" from packed 64-bit integers in "a", and store the results in "dst". FOR j := 0 to 3 i := j*64 dst[i+63:i] := a[i+63:i] - b[i+63:i] ENDFOR dst[MAX:256] := 0
Integer AVX2 Arithmetic Subtract packed signed 8-bit integers in "b" from packed signed 8-bit integers in "a" using saturation, and store the results in "dst". FOR j := 0 to 31 i := j*8 dst[i+7:i] := Saturate8(a[i+7:i] - b[i+7:i]) ENDFOR dst[MAX:256] := 0
Integer AVX2 Arithmetic Subtract packed signed 16-bit integers in "b" from packed signed 16-bit integers in "a" using saturation, and store the results in "dst". FOR j := 0 to 15 i := j*16 dst[i+15:i] := Saturate16(a[i+15:i] - b[i+15:i]) ENDFOR dst[MAX:256] := 0
Integer AVX2 Arithmetic Subtract packed unsigned 8-bit integers in "b" from packed unsigned 8-bit integers in "a" using saturation, and store the results in "dst". FOR j := 0 to 31 i := j*8 dst[i+7:i] := SaturateU8(a[i+7:i] - b[i+7:i]) ENDFOR dst[MAX:256] := 0
Integer AVX2 Arithmetic Subtract packed unsigned 16-bit integers in "b" from packed unsigned 16-bit integers in "a" using saturation, and store the results in "dst". FOR j := 0 to 15 i := j*16 dst[i+15:i] := SaturateU16(a[i+15:i] - b[i+15:i]) ENDFOR dst[MAX:256] := 0
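The saturating subtractions above clamp the exact difference to the representable range instead of wrapping. A minimal scalar sketch, assuming a hypothetical helper `subs` that covers both the signed (Saturate8/16) and unsigned (SaturateU8/U16) cases:

```python
def subs(a, b, bits, signed):
    """Scalar model: element-wise a - b, clamped to the element's value range."""
    if signed:
        lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    else:
        lo, hi = 0, (1 << bits) - 1
    return [max(lo, min(hi, x - y)) for x, y in zip(a, b)]
```

With unsigned elements this means any subtraction that would go negative yields 0, which is a common way to compute max(a - b, 0).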
Integer AVX2 Logical Compute the bitwise XOR of 256 bits (representing integer data) in "a" and "b", and store the result in "dst". dst[255:0] := (a[255:0] XOR b[255:0]) dst[MAX:256] := 0
Integer AVX2 Swizzle Unpack and interleave 8-bit integers from the high half of each 128-bit lane in "a" and "b", and store the results in "dst". DEFINE INTERLEAVE_HIGH_BYTES(src1[127:0], src2[127:0]) { dst[7:0] := src1[71:64] dst[15:8] := src2[71:64] dst[23:16] := src1[79:72] dst[31:24] := src2[79:72] dst[39:32] := src1[87:80] dst[47:40] := src2[87:80] dst[55:48] := src1[95:88] dst[63:56] := src2[95:88] dst[71:64] := src1[103:96] dst[79:72] := src2[103:96] dst[87:80] := src1[111:104] dst[95:88] := src2[111:104] dst[103:96] := src1[119:112] dst[111:104] := src2[119:112] dst[119:112] := src1[127:120] dst[127:120] := src2[127:120] RETURN dst[127:0] } dst[127:0] := INTERLEAVE_HIGH_BYTES(a[127:0], b[127:0]) dst[255:128] := INTERLEAVE_HIGH_BYTES(a[255:128], b[255:128]) dst[MAX:256] := 0
Integer AVX2 Swizzle Unpack and interleave 16-bit integers from the high half of each 128-bit lane in "a" and "b", and store the results in "dst". DEFINE INTERLEAVE_HIGH_WORDS(src1[127:0], src2[127:0]) { dst[15:0] := src1[79:64] dst[31:16] := src2[79:64] dst[47:32] := src1[95:80] dst[63:48] := src2[95:80] dst[79:64] := src1[111:96] dst[95:80] := src2[111:96] dst[111:96] := src1[127:112] dst[127:112] := src2[127:112] RETURN dst[127:0] } dst[127:0] := INTERLEAVE_HIGH_WORDS(a[127:0], b[127:0]) dst[255:128] := INTERLEAVE_HIGH_WORDS(a[255:128], b[255:128]) dst[MAX:256] := 0
Integer AVX2 Swizzle Unpack and interleave 32-bit integers from the high half of each 128-bit lane in "a" and "b", and store the results in "dst". DEFINE INTERLEAVE_HIGH_DWORDS(src1[127:0], src2[127:0]) { dst[31:0] := src1[95:64] dst[63:32] := src2[95:64] dst[95:64] := src1[127:96] dst[127:96] := src2[127:96] RETURN dst[127:0] } dst[127:0] := INTERLEAVE_HIGH_DWORDS(a[127:0], b[127:0]) dst[255:128] := INTERLEAVE_HIGH_DWORDS(a[255:128], b[255:128]) dst[MAX:256] := 0
Integer AVX2 Swizzle Unpack and interleave 64-bit integers from the high half of each 128-bit lane in "a" and "b", and store the results in "dst". DEFINE INTERLEAVE_HIGH_QWORDS(src1[127:0], src2[127:0]) { dst[63:0] := src1[127:64] dst[127:64] := src2[127:64] RETURN dst[127:0] } dst[127:0] := INTERLEAVE_HIGH_QWORDS(a[127:0], b[127:0]) dst[255:128] := INTERLEAVE_HIGH_QWORDS(a[255:128], b[255:128]) dst[MAX:256] := 0
Integer AVX2 Swizzle Unpack and interleave 8-bit integers from the low half of each 128-bit lane in "a" and "b", and store the results in "dst". DEFINE INTERLEAVE_BYTES(src1[127:0], src2[127:0]) { dst[7:0] := src1[7:0] dst[15:8] := src2[7:0] dst[23:16] := src1[15:8] dst[31:24] := src2[15:8] dst[39:32] := src1[23:16] dst[47:40] := src2[23:16] dst[55:48] := src1[31:24] dst[63:56] := src2[31:24] dst[71:64] := src1[39:32] dst[79:72] := src2[39:32] dst[87:80] := src1[47:40] dst[95:88] := src2[47:40] dst[103:96] := src1[55:48] dst[111:104] := src2[55:48] dst[119:112] := src1[63:56] dst[127:120] := src2[63:56] RETURN dst[127:0] } dst[127:0] := INTERLEAVE_BYTES(a[127:0], b[127:0]) dst[255:128] := INTERLEAVE_BYTES(a[255:128], b[255:128]) dst[MAX:256] := 0
Integer AVX2 Swizzle Unpack and interleave 16-bit integers from the low half of each 128-bit lane in "a" and "b", and store the results in "dst". DEFINE INTERLEAVE_WORDS(src1[127:0], src2[127:0]) { dst[15:0] := src1[15:0] dst[31:16] := src2[15:0] dst[47:32] := src1[31:16] dst[63:48] := src2[31:16] dst[79:64] := src1[47:32] dst[95:80] := src2[47:32] dst[111:96] := src1[63:48] dst[127:112] := src2[63:48] RETURN dst[127:0] } dst[127:0] := INTERLEAVE_WORDS(a[127:0], b[127:0]) dst[255:128] := INTERLEAVE_WORDS(a[255:128], b[255:128]) dst[MAX:256] := 0
Integer AVX2 Swizzle Unpack and interleave 32-bit integers from the low half of each 128-bit lane in "a" and "b", and store the results in "dst". DEFINE INTERLEAVE_DWORDS(src1[127:0], src2[127:0]) { dst[31:0] := src1[31:0] dst[63:32] := src2[31:0] dst[95:64] := src1[63:32] dst[127:96] := src2[63:32] RETURN dst[127:0] } dst[127:0] := INTERLEAVE_DWORDS(a[127:0], b[127:0]) dst[255:128] := INTERLEAVE_DWORDS(a[255:128], b[255:128]) dst[MAX:256] := 0
Integer AVX2 Swizzle Unpack and interleave 64-bit integers from the low half of each 128-bit lane in "a" and "b", and store the results in "dst". DEFINE INTERLEAVE_QWORDS(src1[127:0], src2[127:0]) { dst[63:0] := src1[63:0] dst[127:64] := src2[63:0] RETURN dst[127:0] } dst[127:0] := INTERLEAVE_QWORDS(a[127:0], b[127:0]) dst[255:128] := INTERLEAVE_QWORDS(a[255:128], b[255:128]) dst[MAX:256] := 0
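All eight unpack-and-interleave entries above follow one pattern: within each 128-bit lane, take either the low or the high half of "a" and "b" and alternate their elements. A sketch under the assumption that vectors are lists of elements (the name `unpack` and its parameters are hypothetical):

```python
def unpack(a, b, elems_per_lane, high):
    """Scalar model: interleave one half of each 128-bit lane of a and b."""
    assert len(a) == len(b) and len(a) % elems_per_lane == 0
    half = elems_per_lane // 2
    dst = []
    for base in range(0, len(a), elems_per_lane):   # one iteration per lane
        start = base + (half if high else 0)
        for k in range(start, start + half):
            dst += [a[k], b[k]]                     # a's element goes first
    return dst
```

For 32-bit elements, `elems_per_lane` is 4; for bytes it is 16. The result never mixes elements from different lanes.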
Mask AVX512BW Miscellaneous Unpack and interleave 32 bits from masks "a" and "b", and store the 64-bit result in "dst". dst[31:0] := b[31:0] dst[63:32] := a[31:0] dst[MAX:64] := 0
Mask AVX512BW Miscellaneous Unpack and interleave 16 bits from masks "a" and "b", and store the 32-bit result in "dst". dst[15:0] := b[15:0] dst[31:16] := a[15:0] dst[MAX:32] := 0
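In the two mask-unpack entries above, note the operand order: "b" supplies the low half of the result and "a" the high half. A sketch with masks held as Python ints (the name `kunpack` and the `half_bits` parameter are assumptions):

```python
def kunpack(a, b, half_bits):
    """Scalar model: concatenate the low half_bits of a (high) and b (low)."""
    m = (1 << half_bits) - 1
    return ((a & m) << half_bits) | (b & m)
```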
Integer AVX512VL AVX512BW Miscellaneous Compute the sum of absolute differences (SADs) of quadruplets of unsigned 8-bit integers in "a" compared to those in "b", and store the 16-bit results in "dst". Four SADs are performed on four 8-bit quadruplets for each 64-bit lane. The first two SADs use the lower 8-bit quadruplet of the lane from "a", and the last two SADs use the upper 8-bit quadruplet of the lane from "a". Quadruplets from "b" are selected from within 128-bit lanes according to the control in "imm8", and each SAD in each 64-bit lane uses the selected quadruplet at 8-bit offsets. FOR i := 0 to 1 tmp.m128[i].dword[0] := b.m128[i].dword[ imm8[1:0] ] tmp.m128[i].dword[1] := b.m128[i].dword[ imm8[3:2] ] tmp.m128[i].dword[2] := b.m128[i].dword[ imm8[5:4] ] tmp.m128[i].dword[3] := b.m128[i].dword[ imm8[7:6] ] ENDFOR FOR j := 0 to 3 i := j*64 dst[i+15:i] := ABS(a[i+7:i] - tmp[i+7:i]) + ABS(a[i+15:i+8] - tmp[i+15:i+8]) +\ ABS(a[i+23:i+16] - tmp[i+23:i+16]) + ABS(a[i+31:i+24] - tmp[i+31:i+24]) dst[i+31:i+16] := ABS(a[i+7:i] - tmp[i+15:i+8]) + ABS(a[i+15:i+8] - tmp[i+23:i+16]) +\ ABS(a[i+23:i+16] - tmp[i+31:i+24]) + ABS(a[i+31:i+24] - tmp[i+39:i+32]) dst[i+47:i+32] := ABS(a[i+39:i+32] - tmp[i+23:i+16]) + ABS(a[i+47:i+40] - tmp[i+31:i+24]) +\ ABS(a[i+55:i+48] - tmp[i+39:i+32]) + ABS(a[i+63:i+56] - tmp[i+47:i+40]) dst[i+63:i+48] := ABS(a[i+39:i+32] - tmp[i+31:i+24]) + ABS(a[i+47:i+40] - tmp[i+39:i+32]) +\ ABS(a[i+55:i+48] - tmp[i+47:i+40]) + ABS(a[i+63:i+56] - tmp[i+55:i+48]) ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512BW Miscellaneous Compute the sum of absolute differences (SADs) of quadruplets of unsigned 8-bit integers in "a" compared to those in "b", and store the 16-bit results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). Four SADs are performed on four 8-bit quadruplets for each 64-bit lane. The first two SADs use the lower 8-bit quadruplet of the lane from "a", and the last two SADs use the upper 8-bit quadruplet of the lane from "a". Quadruplets from "b" are selected from within 128-bit lanes according to the control in "imm8", and each SAD in each 64-bit lane uses the selected quadruplet at 8-bit offsets. FOR i := 0 to 1 tmp.m128[i].dword[0] := b.m128[i].dword[ imm8[1:0] ] tmp.m128[i].dword[1] := b.m128[i].dword[ imm8[3:2] ] tmp.m128[i].dword[2] := b.m128[i].dword[ imm8[5:4] ] tmp.m128[i].dword[3] := b.m128[i].dword[ imm8[7:6] ] ENDFOR FOR j := 0 to 3 i := j*64 tmp_dst[i+15:i] := ABS(a[i+7:i] - tmp[i+7:i]) + ABS(a[i+15:i+8] - tmp[i+15:i+8]) +\ ABS(a[i+23:i+16] - tmp[i+23:i+16]) + ABS(a[i+31:i+24] - tmp[i+31:i+24]) tmp_dst[i+31:i+16] := ABS(a[i+7:i] - tmp[i+15:i+8]) + ABS(a[i+15:i+8] - tmp[i+23:i+16]) +\ ABS(a[i+23:i+16] - tmp[i+31:i+24]) + ABS(a[i+31:i+24] - tmp[i+39:i+32]) tmp_dst[i+47:i+32] := ABS(a[i+39:i+32] - tmp[i+23:i+16]) + ABS(a[i+47:i+40] - tmp[i+31:i+24]) +\ ABS(a[i+55:i+48] - tmp[i+39:i+32]) + ABS(a[i+63:i+56] - tmp[i+47:i+40]) tmp_dst[i+63:i+48] := ABS(a[i+39:i+32] - tmp[i+31:i+24]) + ABS(a[i+47:i+40] - tmp[i+39:i+32]) +\ ABS(a[i+55:i+48] - tmp[i+47:i+40]) + ABS(a[i+63:i+56] - tmp[i+55:i+48]) ENDFOR FOR j := 0 to 15 i := j*16 IF k[j] dst[i+15:i] := tmp_dst[i+15:i] ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512BW Miscellaneous Compute the sum of absolute differences (SADs) of quadruplets of unsigned 8-bit integers in "a" compared to those in "b", and store the 16-bit results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). Four SADs are performed on four 8-bit quadruplets for each 64-bit lane. The first two SADs use the lower 8-bit quadruplet of the lane from "a", and the last two SADs use the upper 8-bit quadruplet of the lane from "a". Quadruplets from "b" are selected from within 128-bit lanes according to the control in "imm8", and each SAD in each 64-bit lane uses the selected quadruplet at 8-bit offsets. FOR i := 0 to 1 tmp.m128[i].dword[0] := b.m128[i].dword[ imm8[1:0] ] tmp.m128[i].dword[1] := b.m128[i].dword[ imm8[3:2] ] tmp.m128[i].dword[2] := b.m128[i].dword[ imm8[5:4] ] tmp.m128[i].dword[3] := b.m128[i].dword[ imm8[7:6] ] ENDFOR FOR j := 0 to 3 i := j*64 tmp_dst[i+15:i] := ABS(a[i+7:i] - tmp[i+7:i]) + ABS(a[i+15:i+8] - tmp[i+15:i+8]) +\ ABS(a[i+23:i+16] - tmp[i+23:i+16]) + ABS(a[i+31:i+24] - tmp[i+31:i+24]) tmp_dst[i+31:i+16] := ABS(a[i+7:i] - tmp[i+15:i+8]) + ABS(a[i+15:i+8] - tmp[i+23:i+16]) +\ ABS(a[i+23:i+16] - tmp[i+31:i+24]) + ABS(a[i+31:i+24] - tmp[i+39:i+32]) tmp_dst[i+47:i+32] := ABS(a[i+39:i+32] - tmp[i+23:i+16]) + ABS(a[i+47:i+40] - tmp[i+31:i+24]) +\ ABS(a[i+55:i+48] - tmp[i+39:i+32]) + ABS(a[i+63:i+56] - tmp[i+47:i+40]) tmp_dst[i+63:i+48] := ABS(a[i+39:i+32] - tmp[i+31:i+24]) + ABS(a[i+47:i+40] - tmp[i+39:i+32]) +\ ABS(a[i+55:i+48] - tmp[i+47:i+40]) + ABS(a[i+63:i+56] - tmp[i+55:i+48]) ENDFOR FOR j := 0 to 15 i := j*16 IF k[j] dst[i+15:i] := tmp_dst[i+15:i] ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512BW Miscellaneous Compute the sum of absolute differences (SADs) of quadruplets of unsigned 8-bit integers in "a" compared to those in "b", and store the 16-bit results in "dst". Four SADs are performed on four 8-bit quadruplets for each 64-bit lane. The first two SADs use the lower 8-bit quadruplet of the lane from "a", and the last two SADs use the upper 8-bit quadruplet of the lane from "a". Quadruplets from "b" are selected from within 128-bit lanes according to the control in "imm8", and each SAD in each 64-bit lane uses the selected quadruplet at 8-bit offsets. FOR i := 0 to 3 tmp.m128[i].dword[0] := b.m128[i].dword[ imm8[1:0] ] tmp.m128[i].dword[1] := b.m128[i].dword[ imm8[3:2] ] tmp.m128[i].dword[2] := b.m128[i].dword[ imm8[5:4] ] tmp.m128[i].dword[3] := b.m128[i].dword[ imm8[7:6] ] ENDFOR FOR j := 0 to 7 i := j*64 dst[i+15:i] := ABS(a[i+7:i] - tmp[i+7:i]) + ABS(a[i+15:i+8] - tmp[i+15:i+8]) +\ ABS(a[i+23:i+16] - tmp[i+23:i+16]) + ABS(a[i+31:i+24] - tmp[i+31:i+24]) dst[i+31:i+16] := ABS(a[i+7:i] - tmp[i+15:i+8]) + ABS(a[i+15:i+8] - tmp[i+23:i+16]) +\ ABS(a[i+23:i+16] - tmp[i+31:i+24]) + ABS(a[i+31:i+24] - tmp[i+39:i+32]) dst[i+47:i+32] := ABS(a[i+39:i+32] - tmp[i+23:i+16]) + ABS(a[i+47:i+40] - tmp[i+31:i+24]) +\ ABS(a[i+55:i+48] - tmp[i+39:i+32]) + ABS(a[i+63:i+56] - tmp[i+47:i+40]) dst[i+63:i+48] := ABS(a[i+39:i+32] - tmp[i+31:i+24]) + ABS(a[i+47:i+40] - tmp[i+39:i+32]) +\ ABS(a[i+55:i+48] - tmp[i+47:i+40]) + ABS(a[i+63:i+56] - tmp[i+55:i+48]) ENDFOR dst[MAX:512] := 0
Integer AVX512BW Miscellaneous Compute the sum of absolute differences (SADs) of quadruplets of unsigned 8-bit integers in "a" compared to those in "b", and store the 16-bit results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). Four SADs are performed on four 8-bit quadruplets for each 64-bit lane. The first two SADs use the lower 8-bit quadruplet of the lane from "a", and the last two SADs use the upper 8-bit quadruplet of the lane from "a". Quadruplets from "b" are selected from within 128-bit lanes according to the control in "imm8", and each SAD in each 64-bit lane uses the selected quadruplet at 8-bit offsets. FOR i := 0 to 3 tmp.m128[i].dword[0] := b.m128[i].dword[ imm8[1:0] ] tmp.m128[i].dword[1] := b.m128[i].dword[ imm8[3:2] ] tmp.m128[i].dword[2] := b.m128[i].dword[ imm8[5:4] ] tmp.m128[i].dword[3] := b.m128[i].dword[ imm8[7:6] ] ENDFOR FOR j := 0 to 7 i := j*64 tmp_dst[i+15:i] := ABS(a[i+7:i] - tmp[i+7:i]) + ABS(a[i+15:i+8] - tmp[i+15:i+8]) +\ ABS(a[i+23:i+16] - tmp[i+23:i+16]) + ABS(a[i+31:i+24] - tmp[i+31:i+24]) tmp_dst[i+31:i+16] := ABS(a[i+7:i] - tmp[i+15:i+8]) + ABS(a[i+15:i+8] - tmp[i+23:i+16]) +\ ABS(a[i+23:i+16] - tmp[i+31:i+24]) + ABS(a[i+31:i+24] - tmp[i+39:i+32]) tmp_dst[i+47:i+32] := ABS(a[i+39:i+32] - tmp[i+23:i+16]) + ABS(a[i+47:i+40] - tmp[i+31:i+24]) +\ ABS(a[i+55:i+48] - tmp[i+39:i+32]) + ABS(a[i+63:i+56] - tmp[i+47:i+40]) tmp_dst[i+63:i+48] := ABS(a[i+39:i+32] - tmp[i+31:i+24]) + ABS(a[i+47:i+40] - tmp[i+39:i+32]) +\ ABS(a[i+55:i+48] - tmp[i+47:i+40]) + ABS(a[i+63:i+56] - tmp[i+55:i+48]) ENDFOR FOR j := 0 to 31 i := j*16 IF k[j] dst[i+15:i] := tmp_dst[i+15:i] ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512BW Miscellaneous Compute the sum of absolute differences (SADs) of quadruplets of unsigned 8-bit integers in "a" compared to those in "b", and store the 16-bit results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). Four SADs are performed on four 8-bit quadruplets for each 64-bit lane. The first two SADs use the lower 8-bit quadruplet of the lane from "a", and the last two SADs use the upper 8-bit quadruplet of the lane from "a". Quadruplets from "b" are selected from within 128-bit lanes according to the control in "imm8", and each SAD in each 64-bit lane uses the selected quadruplet at 8-bit offsets. FOR i := 0 to 3 tmp.m128[i].dword[0] := b.m128[i].dword[ imm8[1:0] ] tmp.m128[i].dword[1] := b.m128[i].dword[ imm8[3:2] ] tmp.m128[i].dword[2] := b.m128[i].dword[ imm8[5:4] ] tmp.m128[i].dword[3] := b.m128[i].dword[ imm8[7:6] ] ENDFOR FOR j := 0 to 7 i := j*64 tmp_dst[i+15:i] := ABS(a[i+7:i] - tmp[i+7:i]) + ABS(a[i+15:i+8] - tmp[i+15:i+8]) +\ ABS(a[i+23:i+16] - tmp[i+23:i+16]) + ABS(a[i+31:i+24] - tmp[i+31:i+24]) tmp_dst[i+31:i+16] := ABS(a[i+7:i] - tmp[i+15:i+8]) + ABS(a[i+15:i+8] - tmp[i+23:i+16]) +\ ABS(a[i+23:i+16] - tmp[i+31:i+24]) + ABS(a[i+31:i+24] - tmp[i+39:i+32]) tmp_dst[i+47:i+32] := ABS(a[i+39:i+32] - tmp[i+23:i+16]) + ABS(a[i+47:i+40] - tmp[i+31:i+24]) +\ ABS(a[i+55:i+48] - tmp[i+39:i+32]) + ABS(a[i+63:i+56] - tmp[i+47:i+40]) tmp_dst[i+63:i+48] := ABS(a[i+39:i+32] - tmp[i+31:i+24]) + ABS(a[i+47:i+40] - tmp[i+39:i+32]) +\ ABS(a[i+55:i+48] - tmp[i+47:i+40]) + ABS(a[i+63:i+56] - tmp[i+55:i+48]) ENDFOR FOR j := 0 to 31 i := j*16 IF k[j] dst[i+15:i] := tmp_dst[i+15:i] ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512VL AVX512BW Miscellaneous Compute the sum of absolute differences (SADs) of quadruplets of unsigned 8-bit integers in "a" compared to those in "b", and store the 16-bit results in "dst". Four SADs are performed on four 8-bit quadruplets for each 64-bit lane. The first two SADs use the lower 8-bit quadruplet of the lane from "a", and the last two SADs use the uppper 8-bit quadruplet of the lane from "a". Quadruplets from "b" are selected according to the control in "imm8", and each SAD in each 64-bit lane uses the selected quadruplet at 8-bit offsets. tmp.dword[0] := b.dword[ imm8[1:0] ] tmp.dword[1] := b.dword[ imm8[3:2] ] tmp.dword[2] := b.dword[ imm8[5:4] ] tmp.dword[3] := b.dword[ imm8[7:6] ] FOR j := 0 to 1 i := j*64 dst[i+15:i] := ABS(a[i+7:i] - tmp[i+7:i]) + ABS(a[i+15:i+8] - tmp[i+15:i+8]) +\ ABS(a[i+23:i+16] - tmp[i+23:i+16]) + ABS(a[i+31:i+24] - tmp[i+31:i+24]) dst[i+31:i+16] := ABS(a[i+7:i] - tmp[i+15:i+8]) + ABS(a[i+15:i+8] - tmp[i+23:i+16]) +\ ABS(a[i+23:i+16] - tmp[i+31:i+24]) + ABS(a[i+31:i+24] - tmp[i+39:i+32]) dst[i+47:i+32] := ABS(a[i+39:i+32] - tmp[i+23:i+16]) + ABS(a[i+47:i+40] - tmp[i+31:i+24]) +\ ABS(a[i+55:i+48] - tmp[i+39:i+32]) + ABS(a[i+63:i+56] - tmp[i+47:i+40]) dst[i+63:i+48] := ABS(a[i+39:i+32] - tmp[i+31:i+24]) + ABS(a[i+47:i+40] - tmp[i+39:i+32]) +\ ABS(a[i+55:i+48] - tmp[i+47:i+40]) + ABS(a[i+63:i+56] - tmp[i+55:i+48]) ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512BW Miscellaneous Compute the sum of absolute differences (SADs) of quadruplets of unsigned 8-bit integers in "a" compared to those in "b", and store the 16-bit results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). Four SADs are performed on four 8-bit quadruplets for each 64-bit lane. The first two SADs use the lower 8-bit quadruplet of the lane from "a", and the last two SADs use the uppper 8-bit quadruplet of the lane from "a". Quadruplets from "b" are selected according to the control in "imm8", and each SAD in each 64-bit lane uses the selected quadruplet at 8-bit offsets. tmp.dword[0] := b.dword[ imm8[1:0] ] tmp.dword[1] := b.dword[ imm8[3:2] ] tmp.dword[2] := b.dword[ imm8[5:4] ] tmp.dword[3] := b.dword[ imm8[7:6] ] FOR j := 0 to 1 i := j*64 tmp_dst[i+15:i] := ABS(a[i+7:i] - tmp[i+7:i]) + ABS(a[i+15:i+8] - tmp[i+15:i+8]) +\ ABS(a[i+23:i+16] - tmp[i+23:i+16]) + ABS(a[i+31:i+24] - tmp[i+31:i+24]) tmp_dst[i+31:i+16] := ABS(a[i+7:i] - tmp[i+15:i+8]) + ABS(a[i+15:i+8] - tmp[i+23:i+16]) +\ ABS(a[i+23:i+16] - tmp[i+31:i+24]) + ABS(a[i+31:i+24] - tmp[i+39:i+32]) tmp_dst[i+47:i+32] := ABS(a[i+39:i+32] - tmp[i+23:i+16]) + ABS(a[i+47:i+40] - tmp[i+31:i+24]) +\ ABS(a[i+55:i+48] - tmp[i+39:i+32]) + ABS(a[i+63:i+56] - tmp[i+47:i+40]) tmp_dst[i+63:i+48] := ABS(a[i+39:i+32] - tmp[i+31:i+24]) + ABS(a[i+47:i+40] - tmp[i+39:i+32]) +\ ABS(a[i+55:i+48] - tmp[i+47:i+40]) + ABS(a[i+63:i+56] - tmp[i+55:i+48]) ENDFOR FOR j := 0 to 7 i := j*16 IF k[j] dst[i+15:i] := tmp_dst[i+15:i] ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512BW Miscellaneous Compute the sum of absolute differences (SADs) of quadruplets of unsigned 8-bit integers in "a" compared to those in "b", and store the 16-bit results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). Four SADs are performed on four 8-bit quadruplets for each 64-bit lane. The first two SADs use the lower 8-bit quadruplet of the lane from "a", and the last two SADs use the upper 8-bit quadruplet of the lane from "a". Quadruplets from "b" are selected according to the control in "imm8", and each SAD in each 64-bit lane uses the selected quadruplet at 8-bit offsets. tmp.dword[0] := b.dword[ imm8[1:0] ] tmp.dword[1] := b.dword[ imm8[3:2] ] tmp.dword[2] := b.dword[ imm8[5:4] ] tmp.dword[3] := b.dword[ imm8[7:6] ] FOR j := 0 to 1 i := j*64 tmp_dst[i+15:i] := ABS(a[i+7:i] - tmp[i+7:i]) + ABS(a[i+15:i+8] - tmp[i+15:i+8]) + ABS(a[i+23:i+16] - tmp[i+23:i+16]) + ABS(a[i+31:i+24] - tmp[i+31:i+24]) tmp_dst[i+31:i+16] := ABS(a[i+7:i] - tmp[i+15:i+8]) + ABS(a[i+15:i+8] - tmp[i+23:i+16]) + ABS(a[i+23:i+16] - tmp[i+31:i+24]) + ABS(a[i+31:i+24] - tmp[i+39:i+32]) tmp_dst[i+47:i+32] := ABS(a[i+39:i+32] - tmp[i+23:i+16]) + ABS(a[i+47:i+40] - tmp[i+31:i+24]) + ABS(a[i+55:i+48] - tmp[i+39:i+32]) + ABS(a[i+63:i+56] - tmp[i+47:i+40]) tmp_dst[i+63:i+48] := ABS(a[i+39:i+32] - tmp[i+31:i+24]) + ABS(a[i+47:i+40] - tmp[i+39:i+32]) + ABS(a[i+55:i+48] - tmp[i+47:i+40]) + ABS(a[i+63:i+56] - tmp[i+55:i+48]) ENDFOR FOR j := 0 to 7 i := j*16 IF k[j] dst[i+15:i] := tmp_dst[i+15:i] ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:128] := 0
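The three SAD records above share one computation: a dword shuffle of "b" controlled by "imm8", followed by four sliding 4-byte SADs per 64-bit lane. A minimal scalar sketch of the unmasked 128-bit form follows, assuming a little-endian byte layout. The function name `dbsad_128_model` is hypothetical and is not taken from the records. It illustrates the pseudocode and is not a replacement for the hardware instruction.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical scalar model of the unmasked 128-bit record: a and b hold
   16 unsigned bytes each, dst receives 8 unsigned 16-bit sums. */
static void dbsad_128_model(uint16_t dst[8], const uint8_t a[16],
                            const uint8_t b[16], uint8_t imm8)
{
    uint8_t tmp[16];
    assert(dst != NULL && a != NULL && b != NULL);

    /* tmp.dword[n] := b.dword[ imm8[2n+1:2n] ] */
    for (int n = 0; n < 4; n++) {
        int sel = (imm8 >> (2 * n)) & 3;
        memcpy(&tmp[4 * n], &b[4 * sel], 4);
    }

    for (int j = 0; j < 2; j++) {                 /* two 64-bit lanes */
        int base = 8 * j;                         /* byte offset of the lane */
        for (int r = 0; r < 4; r++) {             /* four SADs per lane */
            int a_off = base + (r < 2 ? 0 : 4);   /* lower or upper quadruplet of a */
            int t_off = base + r;                 /* tmp window slides by one byte */
            unsigned sum = 0;
            for (int q = 0; q < 4; q++)
                sum += (unsigned)abs((int)a[a_off + q] - (int)tmp[t_off + q]);
            dst[4 * j + r] = (uint16_t)sum;
        }
    }
}
```

The writemask and zeromask records apply the same per-lane sums and then select each 16-bit result against "k", falling back to "src" or to zero respectively.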
Integer AVX512VL AVX512BW Load Load packed 16-bit integers from memory into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "mem_addr" does not need to be aligned on any particular boundary. FOR j := 0 to 15 i := j*16 IF k[j] dst[i+15:i] := MEM[mem_addr+i+15:mem_addr+i] ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512BW Move Move packed 16-bit integers from "a" into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*16 IF k[j] dst[i+15:i] := a[i+15:i] ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512BW Store Store packed 16-bit integers from "a" into memory using writemask "k". "mem_addr" does not need to be aligned on any particular boundary. FOR j := 0 to 15 i := j*16 IF k[j] MEM[mem_addr+i+15:mem_addr+i] := a[i+15:i] FI ENDFOR
Integer AVX512VL AVX512BW Load Load packed 16-bit integers from memory into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). "mem_addr" does not need to be aligned on any particular boundary. FOR j := 0 to 15 i := j*16 IF k[j] dst[i+15:i] := MEM[mem_addr+i+15:mem_addr+i] ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512BW Move Move packed 16-bit integers from "a" into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*16 IF k[j] dst[i+15:i] := a[i+15:i] ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512BW Load Load packed 16-bit integers from memory into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "mem_addr" does not need to be aligned on any particular boundary. FOR j := 0 to 31 i := j*16 IF k[j] dst[i+15:i] := MEM[mem_addr+i+15:mem_addr+i] ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512BW Move Move packed 16-bit integers from "a" into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*16 IF k[j] dst[i+15:i] := a[i+15:i] ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512BW Store Store packed 16-bit integers from "a" into memory using writemask "k". "mem_addr" does not need to be aligned on any particular boundary. FOR j := 0 to 31 i := j*16 IF k[j] MEM[mem_addr+i+15:mem_addr+i] := a[i+15:i] FI ENDFOR
Integer AVX512BW Load Load packed 16-bit integers from memory into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). "mem_addr" does not need to be aligned on any particular boundary. FOR j := 0 to 31 i := j*16 IF k[j] dst[i+15:i] := MEM[mem_addr+i+15:mem_addr+i] ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512BW Move Move packed 16-bit integers from "a" into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*16 IF k[j] dst[i+15:i] := a[i+15:i] ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512VL AVX512BW Load Load packed 16-bit integers from memory into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "mem_addr" does not need to be aligned on any particular boundary. FOR j := 0 to 7 i := j*16 IF k[j] dst[i+15:i] := MEM[mem_addr+i+15:mem_addr+i] ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512BW Move Move packed 16-bit integers from "a" into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*16 IF k[j] dst[i+15:i] := a[i+15:i] ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512BW Store Store packed 16-bit integers from "a" into memory using writemask "k". "mem_addr" does not need to be aligned on any particular boundary. FOR j := 0 to 7 i := j*16 IF k[j] MEM[mem_addr+i+15:mem_addr+i] := a[i+15:i] FI ENDFOR
Integer AVX512VL AVX512BW Load Load packed 16-bit integers from memory into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). "mem_addr" does not need to be aligned on any particular boundary. FOR j := 0 to 7 i := j*16 IF k[j] dst[i+15:i] := MEM[mem_addr+i+15:mem_addr+i] ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512BW Move Move packed 16-bit integers from "a" into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*16 IF k[j] dst[i+15:i] := a[i+15:i] ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:128] := 0
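The 16-bit move records above differ only in vector width and in what happens to lanes whose mask bit is clear. The sketch below models both behaviours on plain arrays. The names `mask_mov16_model` and `maskz_mov16_model` are hypothetical, and the lane count `n` (8, 16 or 32 for 128-, 256- or 512-bit vectors) is passed explicitly as an assumption of this model.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Writemask (merge) form: lanes whose mask bit is clear keep src[j]. */
static void mask_mov16_model(uint16_t *dst, const uint16_t *src,
                             uint32_t k, const uint16_t *a, size_t n)
{
    assert(n <= 32);                       /* at most 32 lanes of 16 bits */
    for (size_t j = 0; j < n; j++)
        dst[j] = ((k >> j) & 1u) ? a[j] : src[j];
}

/* Zeromask form: lanes whose mask bit is clear become 0. */
static void maskz_mov16_model(uint16_t *dst, uint32_t k,
                              const uint16_t *a, size_t n)
{
    assert(n <= 32);
    for (size_t j = 0; j < n; j++)
        dst[j] = ((k >> j) & 1u) ? a[j] : 0;
}
```

With `k = 0xA5`, lanes 0, 2, 5 and 7 take their value from "a". The remaining lanes come from "src" in the merge form and are cleared in the zeroing form.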
Integer AVX512VL AVX512BW Load Load packed 8-bit integers from memory into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "mem_addr" does not need to be aligned on any particular boundary. FOR j := 0 to 31 i := j*8 IF k[j] dst[i+7:i] := MEM[mem_addr+i+7:mem_addr+i] ELSE dst[i+7:i] := src[i+7:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512BW Move Move packed 8-bit integers from "a" into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*8 IF k[j] dst[i+7:i] := a[i+7:i] ELSE dst[i+7:i] := src[i+7:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512BW Store Store packed 8-bit integers from "a" into memory using writemask "k". "mem_addr" does not need to be aligned on any particular boundary. FOR j := 0 to 31 i := j*8 IF k[j] MEM[mem_addr+i+7:mem_addr+i] := a[i+7:i] FI ENDFOR
Integer AVX512VL AVX512BW Load Load packed 8-bit integers from memory into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). "mem_addr" does not need to be aligned on any particular boundary. FOR j := 0 to 31 i := j*8 IF k[j] dst[i+7:i] := MEM[mem_addr+i+7:mem_addr+i] ELSE dst[i+7:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512BW Move Move packed 8-bit integers from "a" into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*8 IF k[j] dst[i+7:i] := a[i+7:i] ELSE dst[i+7:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512BW Load Load packed 8-bit integers from memory into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "mem_addr" does not need to be aligned on any particular boundary. FOR j := 0 to 63 i := j*8 IF k[j] dst[i+7:i] := MEM[mem_addr+i+7:mem_addr+i] ELSE dst[i+7:i] := src[i+7:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512BW Move Move packed 8-bit integers from "a" into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 63 i := j*8 IF k[j] dst[i+7:i] := a[i+7:i] ELSE dst[i+7:i] := src[i+7:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512BW Store Store packed 8-bit integers from "a" into memory using writemask "k". "mem_addr" does not need to be aligned on any particular boundary. FOR j := 0 to 63 i := j*8 IF k[j] MEM[mem_addr+i+7:mem_addr+i] := a[i+7:i] FI ENDFOR
Integer AVX512BW Load Load packed 8-bit integers from memory into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). "mem_addr" does not need to be aligned on any particular boundary. FOR j := 0 to 63 i := j*8 IF k[j] dst[i+7:i] := MEM[mem_addr+i+7:mem_addr+i] ELSE dst[i+7:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512BW Move Move packed 8-bit integers from "a" into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 63 i := j*8 IF k[j] dst[i+7:i] := a[i+7:i] ELSE dst[i+7:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512VL AVX512BW Load Load packed 8-bit integers from memory into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "mem_addr" does not need to be aligned on any particular boundary. FOR j := 0 to 15 i := j*8 IF k[j] dst[i+7:i] := MEM[mem_addr+i+7:mem_addr+i] ELSE dst[i+7:i] := src[i+7:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512BW Move Move packed 8-bit integers from "a" into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*8 IF k[j] dst[i+7:i] := a[i+7:i] ELSE dst[i+7:i] := src[i+7:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512BW Store Store packed 8-bit integers from "a" into memory using writemask "k". "mem_addr" does not need to be aligned on any particular boundary. FOR j := 0 to 15 i := j*8 IF k[j] MEM[mem_addr+i+7:mem_addr+i] := a[i+7:i] FI ENDFOR
Integer AVX512VL AVX512BW Load Load packed 8-bit integers from memory into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). "mem_addr" does not need to be aligned on any particular boundary. FOR j := 0 to 15 i := j*8 IF k[j] dst[i+7:i] := MEM[mem_addr+i+7:mem_addr+i] ELSE dst[i+7:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512BW Move Move packed 8-bit integers from "a" into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*8 IF k[j] dst[i+7:i] := a[i+7:i] ELSE dst[i+7:i] := 0 FI ENDFOR dst[MAX:128] := 0
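For the 8-bit load and store records above, the mask can be up to 64 bits wide, one bit per byte. The following sketch models a masked store, which leaves unselected memory bytes untouched, and a zero-masked load. The helper names and the explicit byte count `n` are assumptions made for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Masked store: only bytes whose mask bit is set are written to mem. */
static void mask_storeu8_model(uint8_t *mem, uint64_t k,
                               const uint8_t *a, size_t n)
{
    assert(n <= 64);                       /* at most 64 byte lanes */
    for (size_t j = 0; j < n; j++)
        if ((k >> j) & 1u)
            mem[j] = a[j];
}

/* Zero-masked load: unselected bytes of dst are cleared. */
static void maskz_loadu8_model(uint8_t *dst, uint64_t k,
                               const uint8_t *mem, size_t n)
{
    assert(n <= 64);
    for (size_t j = 0; j < n; j++)
        dst[j] = ((k >> j) & 1u) ? mem[j] : 0;
}
```

Neither helper requires any alignment of the memory pointer, matching the note in the records that "mem_addr" does not need to be aligned on any particular boundary.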
Integer AVX512VL AVX512BW Arithmetic Compute the absolute value of packed signed 8-bit integers in "a", and store the unsigned results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*8 IF k[j] dst[i+7:i] := ABS(a[i+7:i]) ELSE dst[i+7:i] := src[i+7:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512BW Arithmetic Compute the absolute value of packed signed 8-bit integers in "a", and store the unsigned results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*8 IF k[j] dst[i+7:i] := ABS(a[i+7:i]) ELSE dst[i+7:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512BW Arithmetic Compute the absolute value of packed signed 8-bit integers in "a", and store the unsigned results in "dst". FOR j := 0 to 63 i := j*8 dst[i+7:i] := ABS(a[i+7:i]) ENDFOR dst[MAX:512] := 0
Integer AVX512BW Arithmetic Compute the absolute value of packed signed 8-bit integers in "a", and store the unsigned results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 63 i := j*8 IF k[j] dst[i+7:i] := ABS(a[i+7:i]) ELSE dst[i+7:i] := src[i+7:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512BW Arithmetic Compute the absolute value of packed signed 8-bit integers in "a", and store the unsigned results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 63 i := j*8 IF k[j] dst[i+7:i] := ABS(a[i+7:i]) ELSE dst[i+7:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512VL AVX512BW Arithmetic Compute the absolute value of packed signed 8-bit integers in "a", and store the unsigned results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*8 IF k[j] dst[i+7:i] := ABS(a[i+7:i]) ELSE dst[i+7:i] := src[i+7:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512BW Arithmetic Compute the absolute value of packed signed 8-bit integers in "a", and store the unsigned results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*8 IF k[j] dst[i+7:i] := ABS(a[i+7:i]) ELSE dst[i+7:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512BW Arithmetic Compute the absolute value of packed signed 16-bit integers in "a", and store the unsigned results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*16 IF k[j] dst[i+15:i] := ABS(a[i+15:i]) ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512BW Arithmetic Compute the absolute value of packed signed 16-bit integers in "a", and store the unsigned results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*16 IF k[j] dst[i+15:i] := ABS(a[i+15:i]) ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512BW Arithmetic Compute the absolute value of packed signed 16-bit integers in "a", and store the unsigned results in "dst". FOR j := 0 to 31 i := j*16 dst[i+15:i] := ABS(a[i+15:i]) ENDFOR dst[MAX:512] := 0
Integer AVX512BW Arithmetic Compute the absolute value of packed signed 16-bit integers in "a", and store the unsigned results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*16 IF k[j] dst[i+15:i] := ABS(a[i+15:i]) ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512BW Arithmetic Compute the absolute value of packed signed 16-bit integers in "a", and store the unsigned results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*16 IF k[j] dst[i+15:i] := ABS(a[i+15:i]) ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512VL AVX512BW Arithmetic Compute the absolute value of packed signed 16-bit integers in "a", and store the unsigned results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*16 IF k[j] dst[i+15:i] := ABS(a[i+15:i]) ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512BW Arithmetic Compute the absolute value of packed signed 16-bit integers in "a", and store the unsigned results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*16 IF k[j] dst[i+15:i] := ABS(a[i+15:i]) ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:128] := 0
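The absolute-value records above store unsigned results, which matters for the most negative input: ABS(-128) is 128 in the 8-bit case and ABS(-32768) is 32768 in the 16-bit case, and neither fits the signed type. A small scalar sketch under that reading follows. All function names are hypothetical, and the lane count `n` is an explicit parameter of the model.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Widen to int before negating so the most negative input is well defined. */
static uint8_t abs8(int8_t x)    { return (uint8_t)(x < 0 ? -(int)x : (int)x); }
static uint16_t abs16(int16_t x) { return (uint16_t)(x < 0 ? -(int)x : (int)x); }

/* Writemask form for 8-bit lanes (n = 16, 32 or 64). */
static void mask_abs8_model(uint8_t *dst, const uint8_t *src, uint64_t k,
                            const int8_t *a, size_t n)
{
    assert(n <= 64);
    for (size_t j = 0; j < n; j++)
        dst[j] = ((k >> j) & 1u) ? abs8(a[j]) : src[j];
}

/* Zeromask form for 16-bit lanes (n = 8, 16 or 32). */
static void maskz_abs16_model(uint16_t *dst, uint32_t k,
                              const int16_t *a, size_t n)
{
    assert(n <= 32);
    for (size_t j = 0; j < n; j++)
        dst[j] = ((k >> j) & 1u) ? abs16(a[j]) : 0;
}
```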
Integer AVX512VL AVX512BW Convert Miscellaneous Convert packed signed 32-bit integers from "a" and "b" to packed 16-bit integers using signed saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). tmp_dst[15:0] := Saturate16(a[31:0]) tmp_dst[31:16] := Saturate16(a[63:32]) tmp_dst[47:32] := Saturate16(a[95:64]) tmp_dst[63:48] := Saturate16(a[127:96]) tmp_dst[79:64] := Saturate16(b[31:0]) tmp_dst[95:80] := Saturate16(b[63:32]) tmp_dst[111:96] := Saturate16(b[95:64]) tmp_dst[127:112] := Saturate16(b[127:96]) tmp_dst[143:128] := Saturate16(a[159:128]) tmp_dst[159:144] := Saturate16(a[191:160]) tmp_dst[175:160] := Saturate16(a[223:192]) tmp_dst[191:176] := Saturate16(a[255:224]) tmp_dst[207:192] := Saturate16(b[159:128]) tmp_dst[223:208] := Saturate16(b[191:160]) tmp_dst[239:224] := Saturate16(b[223:192]) tmp_dst[255:240] := Saturate16(b[255:224]) FOR j := 0 to 15 i := j*16 IF k[j] dst[i+15:i] := tmp_dst[i+15:i] ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512BW Convert Miscellaneous Convert packed signed 32-bit integers from "a" and "b" to packed 16-bit integers using signed saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). tmp_dst[15:0] := Saturate16(a[31:0]) tmp_dst[31:16] := Saturate16(a[63:32]) tmp_dst[47:32] := Saturate16(a[95:64]) tmp_dst[63:48] := Saturate16(a[127:96]) tmp_dst[79:64] := Saturate16(b[31:0]) tmp_dst[95:80] := Saturate16(b[63:32]) tmp_dst[111:96] := Saturate16(b[95:64]) tmp_dst[127:112] := Saturate16(b[127:96]) tmp_dst[143:128] := Saturate16(a[159:128]) tmp_dst[159:144] := Saturate16(a[191:160]) tmp_dst[175:160] := Saturate16(a[223:192]) tmp_dst[191:176] := Saturate16(a[255:224]) tmp_dst[207:192] := Saturate16(b[159:128]) tmp_dst[223:208] := Saturate16(b[191:160]) tmp_dst[239:224] := Saturate16(b[223:192]) tmp_dst[255:240] := Saturate16(b[255:224]) FOR j := 0 to 15 i := j*16 IF k[j] dst[i+15:i] := tmp_dst[i+15:i] ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512BW Convert Miscellaneous Convert packed signed 32-bit integers from "a" and "b" to packed 16-bit integers using signed saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). tmp_dst[15:0] := Saturate16(a[31:0]) tmp_dst[31:16] := Saturate16(a[63:32]) tmp_dst[47:32] := Saturate16(a[95:64]) tmp_dst[63:48] := Saturate16(a[127:96]) tmp_dst[79:64] := Saturate16(b[31:0]) tmp_dst[95:80] := Saturate16(b[63:32]) tmp_dst[111:96] := Saturate16(b[95:64]) tmp_dst[127:112] := Saturate16(b[127:96]) tmp_dst[143:128] := Saturate16(a[159:128]) tmp_dst[159:144] := Saturate16(a[191:160]) tmp_dst[175:160] := Saturate16(a[223:192]) tmp_dst[191:176] := Saturate16(a[255:224]) tmp_dst[207:192] := Saturate16(b[159:128]) tmp_dst[223:208] := Saturate16(b[191:160]) tmp_dst[239:224] := Saturate16(b[223:192]) tmp_dst[255:240] := Saturate16(b[255:224]) tmp_dst[271:256] := Saturate16(a[287:256]) tmp_dst[287:272] := Saturate16(a[319:288]) tmp_dst[303:288] := Saturate16(a[351:320]) tmp_dst[319:304] := Saturate16(a[383:352]) tmp_dst[335:320] := Saturate16(b[287:256]) tmp_dst[351:336] := Saturate16(b[319:288]) tmp_dst[367:352] := Saturate16(b[351:320]) tmp_dst[383:368] := Saturate16(b[383:352]) tmp_dst[399:384] := Saturate16(a[415:384]) tmp_dst[415:400] := Saturate16(a[447:416]) tmp_dst[431:416] := Saturate16(a[479:448]) tmp_dst[447:432] := Saturate16(a[511:480]) tmp_dst[463:448] := Saturate16(b[415:384]) tmp_dst[479:464] := Saturate16(b[447:416]) tmp_dst[495:480] := Saturate16(b[479:448]) tmp_dst[511:496] := Saturate16(b[511:480]) FOR j := 0 to 31 i := j*16 IF k[j] dst[i+15:i] := tmp_dst[i+15:i] ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512BW Convert Miscellaneous Convert packed signed 32-bit integers from "a" and "b" to packed 16-bit integers using signed saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). tmp_dst[15:0] := Saturate16(a[31:0]) tmp_dst[31:16] := Saturate16(a[63:32]) tmp_dst[47:32] := Saturate16(a[95:64]) tmp_dst[63:48] := Saturate16(a[127:96]) tmp_dst[79:64] := Saturate16(b[31:0]) tmp_dst[95:80] := Saturate16(b[63:32]) tmp_dst[111:96] := Saturate16(b[95:64]) tmp_dst[127:112] := Saturate16(b[127:96]) tmp_dst[143:128] := Saturate16(a[159:128]) tmp_dst[159:144] := Saturate16(a[191:160]) tmp_dst[175:160] := Saturate16(a[223:192]) tmp_dst[191:176] := Saturate16(a[255:224]) tmp_dst[207:192] := Saturate16(b[159:128]) tmp_dst[223:208] := Saturate16(b[191:160]) tmp_dst[239:224] := Saturate16(b[223:192]) tmp_dst[255:240] := Saturate16(b[255:224]) tmp_dst[271:256] := Saturate16(a[287:256]) tmp_dst[287:272] := Saturate16(a[319:288]) tmp_dst[303:288] := Saturate16(a[351:320]) tmp_dst[319:304] := Saturate16(a[383:352]) tmp_dst[335:320] := Saturate16(b[287:256]) tmp_dst[351:336] := Saturate16(b[319:288]) tmp_dst[367:352] := Saturate16(b[351:320]) tmp_dst[383:368] := Saturate16(b[383:352]) tmp_dst[399:384] := Saturate16(a[415:384]) tmp_dst[415:400] := Saturate16(a[447:416]) tmp_dst[431:416] := Saturate16(a[479:448]) tmp_dst[447:432] := Saturate16(a[511:480]) tmp_dst[463:448] := Saturate16(b[415:384]) tmp_dst[479:464] := Saturate16(b[447:416]) tmp_dst[495:480] := Saturate16(b[479:448]) tmp_dst[511:496] := Saturate16(b[511:480]) FOR j := 0 to 31 i := j*16 IF k[j] dst[i+15:i] := tmp_dst[i+15:i] ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512BW Convert Miscellaneous Convert packed signed 32-bit integers from "a" and "b" to packed 16-bit integers using signed saturation, and store the results in "dst". dst[15:0] := Saturate16(a[31:0]) dst[31:16] := Saturate16(a[63:32]) dst[47:32] := Saturate16(a[95:64]) dst[63:48] := Saturate16(a[127:96]) dst[79:64] := Saturate16(b[31:0]) dst[95:80] := Saturate16(b[63:32]) dst[111:96] := Saturate16(b[95:64]) dst[127:112] := Saturate16(b[127:96]) dst[143:128] := Saturate16(a[159:128]) dst[159:144] := Saturate16(a[191:160]) dst[175:160] := Saturate16(a[223:192]) dst[191:176] := Saturate16(a[255:224]) dst[207:192] := Saturate16(b[159:128]) dst[223:208] := Saturate16(b[191:160]) dst[239:224] := Saturate16(b[223:192]) dst[255:240] := Saturate16(b[255:224]) dst[271:256] := Saturate16(a[287:256]) dst[287:272] := Saturate16(a[319:288]) dst[303:288] := Saturate16(a[351:320]) dst[319:304] := Saturate16(a[383:352]) dst[335:320] := Saturate16(b[287:256]) dst[351:336] := Saturate16(b[319:288]) dst[367:352] := Saturate16(b[351:320]) dst[383:368] := Saturate16(b[383:352]) dst[399:384] := Saturate16(a[415:384]) dst[415:400] := Saturate16(a[447:416]) dst[431:416] := Saturate16(a[479:448]) dst[447:432] := Saturate16(a[511:480]) dst[463:448] := Saturate16(b[415:384]) dst[479:464] := Saturate16(b[447:416]) dst[495:480] := Saturate16(b[479:448]) dst[511:496] := Saturate16(b[511:480]) dst[MAX:512] := 0
Integer AVX512VL AVX512BW Convert Miscellaneous Convert packed signed 32-bit integers from "a" and "b" to packed 16-bit integers using signed saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). tmp_dst[15:0] := Saturate16(a[31:0]) tmp_dst[31:16] := Saturate16(a[63:32]) tmp_dst[47:32] := Saturate16(a[95:64]) tmp_dst[63:48] := Saturate16(a[127:96]) tmp_dst[79:64] := Saturate16(b[31:0]) tmp_dst[95:80] := Saturate16(b[63:32]) tmp_dst[111:96] := Saturate16(b[95:64]) tmp_dst[127:112] := Saturate16(b[127:96]) FOR j := 0 to 7 i := j*16 IF k[j] dst[i+15:i] := tmp_dst[i+15:i] ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512BW Convert Miscellaneous Convert packed signed 32-bit integers from "a" and "b" to packed 16-bit integers using signed saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). tmp_dst[15:0] := Saturate16(a[31:0]) tmp_dst[31:16] := Saturate16(a[63:32]) tmp_dst[47:32] := Saturate16(a[95:64]) tmp_dst[63:48] := Saturate16(a[127:96]) tmp_dst[79:64] := Saturate16(b[31:0]) tmp_dst[95:80] := Saturate16(b[63:32]) tmp_dst[111:96] := Saturate16(b[95:64]) tmp_dst[127:112] := Saturate16(b[127:96]) FOR j := 0 to 7 i := j*16 IF k[j] dst[i+15:i] := tmp_dst[i+15:i] ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:128] := 0
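The 32-bit to 16-bit pack records above interleave their inputs per 128-bit lane: four saturated values from "a", then four from "b", repeated for each lane. The sketch below reproduces that ordering on arrays. `saturate16` and `packs32_model` are hypothetical names, and `lanes` (1, 2 or 4 for 128-, 256- or 512-bit vectors) is a parameter introduced for this model.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Clamp a signed 32-bit value to the signed 16-bit range. */
static int16_t saturate16(int32_t x)
{
    if (x > INT16_MAX) return INT16_MAX;
    if (x < INT16_MIN) return INT16_MIN;
    return (int16_t)x;
}

/* Unmasked pack: per 128-bit lane, 4 words from a followed by 4 words from b. */
static void packs32_model(int16_t *dst, const int32_t *a, const int32_t *b,
                          size_t lanes)
{
    assert(lanes >= 1 && lanes <= 4);
    for (size_t l = 0; l < lanes; l++) {
        for (size_t q = 0; q < 4; q++) {
            dst[8 * l + q]     = saturate16(a[4 * l + q]);
            dst[8 * l + 4 + q] = saturate16(b[4 * l + q]);
        }
    }
}
```

Because of the per-lane interleave, the output of a 256- or 512-bit pack is not simply all of "a" followed by all of "b". The masked records apply "k" to this interleaved result, one bit per 16-bit element.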
Integer AVX512VL AVX512BW Convert Miscellaneous Convert packed signed 16-bit integers from "a" and "b" to packed 8-bit integers using signed saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). tmp_dst[7:0] := Saturate8(a[15:0]) tmp_dst[15:8] := Saturate8(a[31:16]) tmp_dst[23:16] := Saturate8(a[47:32]) tmp_dst[31:24] := Saturate8(a[63:48]) tmp_dst[39:32] := Saturate8(a[79:64]) tmp_dst[47:40] := Saturate8(a[95:80]) tmp_dst[55:48] := Saturate8(a[111:96]) tmp_dst[63:56] := Saturate8(a[127:112]) tmp_dst[71:64] := Saturate8(b[15:0]) tmp_dst[79:72] := Saturate8(b[31:16]) tmp_dst[87:80] := Saturate8(b[47:32]) tmp_dst[95:88] := Saturate8(b[63:48]) tmp_dst[103:96] := Saturate8(b[79:64]) tmp_dst[111:104] := Saturate8(b[95:80]) tmp_dst[119:112] := Saturate8(b[111:96]) tmp_dst[127:120] := Saturate8(b[127:112]) tmp_dst[135:128] := Saturate8(a[143:128]) tmp_dst[143:136] := Saturate8(a[159:144]) tmp_dst[151:144] := Saturate8(a[175:160]) tmp_dst[159:152] := Saturate8(a[191:176]) tmp_dst[167:160] := Saturate8(a[207:192]) tmp_dst[175:168] := Saturate8(a[223:208]) tmp_dst[183:176] := Saturate8(a[239:224]) tmp_dst[191:184] := Saturate8(a[255:240]) tmp_dst[199:192] := Saturate8(b[143:128]) tmp_dst[207:200] := Saturate8(b[159:144]) tmp_dst[215:208] := Saturate8(b[175:160]) tmp_dst[223:216] := Saturate8(b[191:176]) tmp_dst[231:224] := Saturate8(b[207:192]) tmp_dst[239:232] := Saturate8(b[223:208]) tmp_dst[247:240] := Saturate8(b[239:224]) tmp_dst[255:248] := Saturate8(b[255:240]) FOR j := 0 to 31 i := j*8 IF k[j] dst[i+7:i] := tmp_dst[i+7:i] ELSE dst[i+7:i] := src[i+7:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512BW Convert Miscellaneous Convert packed signed 16-bit integers from "a" and "b" to packed 8-bit integers using signed saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). tmp_dst[7:0] := Saturate8(a[15:0]) tmp_dst[15:8] := Saturate8(a[31:16]) tmp_dst[23:16] := Saturate8(a[47:32]) tmp_dst[31:24] := Saturate8(a[63:48]) tmp_dst[39:32] := Saturate8(a[79:64]) tmp_dst[47:40] := Saturate8(a[95:80]) tmp_dst[55:48] := Saturate8(a[111:96]) tmp_dst[63:56] := Saturate8(a[127:112]) tmp_dst[71:64] := Saturate8(b[15:0]) tmp_dst[79:72] := Saturate8(b[31:16]) tmp_dst[87:80] := Saturate8(b[47:32]) tmp_dst[95:88] := Saturate8(b[63:48]) tmp_dst[103:96] := Saturate8(b[79:64]) tmp_dst[111:104] := Saturate8(b[95:80]) tmp_dst[119:112] := Saturate8(b[111:96]) tmp_dst[127:120] := Saturate8(b[127:112]) tmp_dst[135:128] := Saturate8(a[143:128]) tmp_dst[143:136] := Saturate8(a[159:144]) tmp_dst[151:144] := Saturate8(a[175:160]) tmp_dst[159:152] := Saturate8(a[191:176]) tmp_dst[167:160] := Saturate8(a[207:192]) tmp_dst[175:168] := Saturate8(a[223:208]) tmp_dst[183:176] := Saturate8(a[239:224]) tmp_dst[191:184] := Saturate8(a[255:240]) tmp_dst[199:192] := Saturate8(b[143:128]) tmp_dst[207:200] := Saturate8(b[159:144]) tmp_dst[215:208] := Saturate8(b[175:160]) tmp_dst[223:216] := Saturate8(b[191:176]) tmp_dst[231:224] := Saturate8(b[207:192]) tmp_dst[239:232] := Saturate8(b[223:208]) tmp_dst[247:240] := Saturate8(b[239:224]) tmp_dst[255:248] := Saturate8(b[255:240]) FOR j := 0 to 31 i := j*8 IF k[j] dst[i+7:i] := tmp_dst[i+7:i] ELSE dst[i+7:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512BW Convert Miscellaneous Convert packed signed 16-bit integers from "a" and "b" to packed 8-bit integers using signed saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). tmp_dst[7:0] := Saturate8(a[15:0]) tmp_dst[15:8] := Saturate8(a[31:16]) tmp_dst[23:16] := Saturate8(a[47:32]) tmp_dst[31:24] := Saturate8(a[63:48]) tmp_dst[39:32] := Saturate8(a[79:64]) tmp_dst[47:40] := Saturate8(a[95:80]) tmp_dst[55:48] := Saturate8(a[111:96]) tmp_dst[63:56] := Saturate8(a[127:112]) tmp_dst[71:64] := Saturate8(b[15:0]) tmp_dst[79:72] := Saturate8(b[31:16]) tmp_dst[87:80] := Saturate8(b[47:32]) tmp_dst[95:88] := Saturate8(b[63:48]) tmp_dst[103:96] := Saturate8(b[79:64]) tmp_dst[111:104] := Saturate8(b[95:80]) tmp_dst[119:112] := Saturate8(b[111:96]) tmp_dst[127:120] := Saturate8(b[127:112]) tmp_dst[135:128] := Saturate8(a[143:128]) tmp_dst[143:136] := Saturate8(a[159:144]) tmp_dst[151:144] := Saturate8(a[175:160]) tmp_dst[159:152] := Saturate8(a[191:176]) tmp_dst[167:160] := Saturate8(a[207:192]) tmp_dst[175:168] := Saturate8(a[223:208]) tmp_dst[183:176] := Saturate8(a[239:224]) tmp_dst[191:184] := Saturate8(a[255:240]) tmp_dst[199:192] := Saturate8(b[143:128]) tmp_dst[207:200] := Saturate8(b[159:144]) tmp_dst[215:208] := Saturate8(b[175:160]) tmp_dst[223:216] := Saturate8(b[191:176]) tmp_dst[231:224] := Saturate8(b[207:192]) tmp_dst[239:232] := Saturate8(b[223:208]) tmp_dst[247:240] := Saturate8(b[239:224]) tmp_dst[255:248] := Saturate8(b[255:240]) tmp_dst[263:256] := Saturate8(a[271:256]) tmp_dst[271:264] := Saturate8(a[287:272]) tmp_dst[279:272] := Saturate8(a[303:288]) tmp_dst[287:280] := Saturate8(a[319:304]) tmp_dst[295:288] := Saturate8(a[335:320]) tmp_dst[303:296] := Saturate8(a[351:336]) tmp_dst[311:304] := Saturate8(a[367:352]) tmp_dst[319:312] := Saturate8(a[383:368]) tmp_dst[327:320] := Saturate8(b[271:256]) tmp_dst[335:328] := Saturate8(b[287:272]) tmp_dst[343:336] := Saturate8(b[303:288]) tmp_dst[351:344] := Saturate8(b[319:304]) tmp_dst[359:352] := Saturate8(b[335:320]) tmp_dst[367:360] := Saturate8(b[351:336]) tmp_dst[375:368] := Saturate8(b[367:352]) tmp_dst[383:376] := Saturate8(b[383:368]) tmp_dst[391:384] := Saturate8(a[399:384]) tmp_dst[399:392] := Saturate8(a[415:400]) tmp_dst[407:400] := Saturate8(a[431:416]) tmp_dst[415:408] := Saturate8(a[447:432]) tmp_dst[423:416] := Saturate8(a[463:448]) tmp_dst[431:424] := Saturate8(a[479:464]) tmp_dst[439:432] := Saturate8(a[495:480]) tmp_dst[447:440] := Saturate8(a[511:496]) tmp_dst[455:448] := Saturate8(b[399:384]) tmp_dst[463:456] := Saturate8(b[415:400]) tmp_dst[471:464] := Saturate8(b[431:416]) tmp_dst[479:472] := Saturate8(b[447:432]) tmp_dst[487:480] := Saturate8(b[463:448]) tmp_dst[495:488] := Saturate8(b[479:464]) tmp_dst[503:496] := Saturate8(b[495:480]) tmp_dst[511:504] := Saturate8(b[511:496]) FOR j := 0 to 63 i := j*8 IF k[j] dst[i+7:i] := tmp_dst[i+7:i] ELSE dst[i+7:i] := src[i+7:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512BW Convert Miscellaneous Convert packed signed 16-bit integers from "a" and "b" to packed 8-bit integers using signed saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). tmp_dst[7:0] := Saturate8(a[15:0]) tmp_dst[15:8] := Saturate8(a[31:16]) tmp_dst[23:16] := Saturate8(a[47:32]) tmp_dst[31:24] := Saturate8(a[63:48]) tmp_dst[39:32] := Saturate8(a[79:64]) tmp_dst[47:40] := Saturate8(a[95:80]) tmp_dst[55:48] := Saturate8(a[111:96]) tmp_dst[63:56] := Saturate8(a[127:112]) tmp_dst[71:64] := Saturate8(b[15:0]) tmp_dst[79:72] := Saturate8(b[31:16]) tmp_dst[87:80] := Saturate8(b[47:32]) tmp_dst[95:88] := Saturate8(b[63:48]) tmp_dst[103:96] := Saturate8(b[79:64]) tmp_dst[111:104] := Saturate8(b[95:80]) tmp_dst[119:112] := Saturate8(b[111:96]) tmp_dst[127:120] := Saturate8(b[127:112]) tmp_dst[135:128] := Saturate8(a[143:128]) tmp_dst[143:136] := Saturate8(a[159:144]) tmp_dst[151:144] := Saturate8(a[175:160]) tmp_dst[159:152] := Saturate8(a[191:176]) tmp_dst[167:160] := Saturate8(a[207:192]) tmp_dst[175:168] := Saturate8(a[223:208]) tmp_dst[183:176] := Saturate8(a[239:224]) tmp_dst[191:184] := Saturate8(a[255:240]) tmp_dst[199:192] := Saturate8(b[143:128]) tmp_dst[207:200] := Saturate8(b[159:144]) tmp_dst[215:208] := Saturate8(b[175:160]) tmp_dst[223:216] := Saturate8(b[191:176]) tmp_dst[231:224] := Saturate8(b[207:192]) tmp_dst[239:232] := Saturate8(b[223:208]) tmp_dst[247:240] := Saturate8(b[239:224]) tmp_dst[255:248] := Saturate8(b[255:240]) tmp_dst[263:256] := Saturate8(a[271:256]) tmp_dst[271:264] := Saturate8(a[287:272]) tmp_dst[279:272] := Saturate8(a[303:288]) tmp_dst[287:280] := Saturate8(a[319:304]) tmp_dst[295:288] := Saturate8(a[335:320]) tmp_dst[303:296] := Saturate8(a[351:336]) tmp_dst[311:304] := Saturate8(a[367:352]) tmp_dst[319:312] := Saturate8(a[383:368]) tmp_dst[327:320] := Saturate8(b[271:256]) tmp_dst[335:328] := Saturate8(b[287:272]) tmp_dst[343:336] := Saturate8(b[303:288]) tmp_dst[351:344] := Saturate8(b[319:304]) tmp_dst[359:352] := Saturate8(b[335:320]) tmp_dst[367:360] := Saturate8(b[351:336]) tmp_dst[375:368] := Saturate8(b[367:352]) tmp_dst[383:376] := Saturate8(b[383:368]) tmp_dst[391:384] := Saturate8(a[399:384]) tmp_dst[399:392] := Saturate8(a[415:400]) tmp_dst[407:400] := Saturate8(a[431:416]) tmp_dst[415:408] := Saturate8(a[447:432]) tmp_dst[423:416] := Saturate8(a[463:448]) tmp_dst[431:424] := Saturate8(a[479:464]) tmp_dst[439:432] := Saturate8(a[495:480]) tmp_dst[447:440] := Saturate8(a[511:496]) tmp_dst[455:448] := Saturate8(b[399:384]) tmp_dst[463:456] := Saturate8(b[415:400]) tmp_dst[471:464] := Saturate8(b[431:416]) tmp_dst[479:472] := Saturate8(b[447:432]) tmp_dst[487:480] := Saturate8(b[463:448]) tmp_dst[495:488] := Saturate8(b[479:464]) tmp_dst[503:496] := Saturate8(b[495:480]) tmp_dst[511:504] := Saturate8(b[511:496]) FOR j := 0 to 63 i := j*8 IF k[j] dst[i+7:i] := tmp_dst[i+7:i] ELSE dst[i+7:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512BW Convert Miscellaneous Convert packed signed 16-bit integers from "a" and "b" to packed 8-bit integers using signed saturation, and store the results in "dst". dst[7:0] := Saturate8(a[15:0]) dst[15:8] := Saturate8(a[31:16]) dst[23:16] := Saturate8(a[47:32]) dst[31:24] := Saturate8(a[63:48]) dst[39:32] := Saturate8(a[79:64]) dst[47:40] := Saturate8(a[95:80]) dst[55:48] := Saturate8(a[111:96]) dst[63:56] := Saturate8(a[127:112]) dst[71:64] := Saturate8(b[15:0]) dst[79:72] := Saturate8(b[31:16]) dst[87:80] := Saturate8(b[47:32]) dst[95:88] := Saturate8(b[63:48]) dst[103:96] := Saturate8(b[79:64]) dst[111:104] := Saturate8(b[95:80]) dst[119:112] := Saturate8(b[111:96]) dst[127:120] := Saturate8(b[127:112]) dst[135:128] := Saturate8(a[143:128]) dst[143:136] := Saturate8(a[159:144]) dst[151:144] := Saturate8(a[175:160]) dst[159:152] := Saturate8(a[191:176]) dst[167:160] := Saturate8(a[207:192]) dst[175:168] := Saturate8(a[223:208]) dst[183:176] := Saturate8(a[239:224]) dst[191:184] := Saturate8(a[255:240]) dst[199:192] := Saturate8(b[143:128]) dst[207:200] := Saturate8(b[159:144]) dst[215:208] := Saturate8(b[175:160]) dst[223:216] := Saturate8(b[191:176]) dst[231:224] := Saturate8(b[207:192]) dst[239:232] := Saturate8(b[223:208]) dst[247:240] := Saturate8(b[239:224]) dst[255:248] := Saturate8(b[255:240]) dst[263:256] := Saturate8(a[271:256]) dst[271:264] := Saturate8(a[287:272]) dst[279:272] := Saturate8(a[303:288]) dst[287:280] := Saturate8(a[319:304]) dst[295:288] := Saturate8(a[335:320]) dst[303:296] := Saturate8(a[351:336]) dst[311:304] := Saturate8(a[367:352]) dst[319:312] := Saturate8(a[383:368]) dst[327:320] := Saturate8(b[271:256]) dst[335:328] := Saturate8(b[287:272]) dst[343:336] := Saturate8(b[303:288]) dst[351:344] := Saturate8(b[319:304]) dst[359:352] := Saturate8(b[335:320]) dst[367:360] := Saturate8(b[351:336]) dst[375:368] := Saturate8(b[367:352]) dst[383:376] := Saturate8(b[383:368]) dst[391:384] := Saturate8(a[399:384]) dst[399:392] := 
Saturate8(a[415:400]) dst[407:400] := Saturate8(a[431:416]) dst[415:408] := Saturate8(a[447:432]) dst[423:416] := Saturate8(a[463:448]) dst[431:424] := Saturate8(a[479:464]) dst[439:432] := Saturate8(a[495:480]) dst[447:440] := Saturate8(a[511:496]) dst[455:448] := Saturate8(b[399:384]) dst[463:456] := Saturate8(b[415:400]) dst[471:464] := Saturate8(b[431:416]) dst[479:472] := Saturate8(b[447:432]) dst[487:480] := Saturate8(b[463:448]) dst[495:488] := Saturate8(b[479:464]) dst[503:496] := Saturate8(b[495:480]) dst[511:504] := Saturate8(b[511:496]) dst[MAX:512] := 0
Integer AVX512VL AVX512BW Convert Miscellaneous Convert packed signed 16-bit integers from "a" and "b" to packed 8-bit integers using signed saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). tmp_dst[7:0] := Saturate8(a[15:0]) tmp_dst[15:8] := Saturate8(a[31:16]) tmp_dst[23:16] := Saturate8(a[47:32]) tmp_dst[31:24] := Saturate8(a[63:48]) tmp_dst[39:32] := Saturate8(a[79:64]) tmp_dst[47:40] := Saturate8(a[95:80]) tmp_dst[55:48] := Saturate8(a[111:96]) tmp_dst[63:56] := Saturate8(a[127:112]) tmp_dst[71:64] := Saturate8(b[15:0]) tmp_dst[79:72] := Saturate8(b[31:16]) tmp_dst[87:80] := Saturate8(b[47:32]) tmp_dst[95:88] := Saturate8(b[63:48]) tmp_dst[103:96] := Saturate8(b[79:64]) tmp_dst[111:104] := Saturate8(b[95:80]) tmp_dst[119:112] := Saturate8(b[111:96]) tmp_dst[127:120] := Saturate8(b[127:112]) FOR j := 0 to 15 i := j*8 IF k[j] dst[i+7:i] := tmp_dst[i+7:i] ELSE dst[i+7:i] := src[i+7:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512BW Convert Miscellaneous Convert packed signed 16-bit integers from "a" and "b" to packed 8-bit integers using signed saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). tmp_dst[7:0] := Saturate8(a[15:0]) tmp_dst[15:8] := Saturate8(a[31:16]) tmp_dst[23:16] := Saturate8(a[47:32]) tmp_dst[31:24] := Saturate8(a[63:48]) tmp_dst[39:32] := Saturate8(a[79:64]) tmp_dst[47:40] := Saturate8(a[95:80]) tmp_dst[55:48] := Saturate8(a[111:96]) tmp_dst[63:56] := Saturate8(a[127:112]) tmp_dst[71:64] := Saturate8(b[15:0]) tmp_dst[79:72] := Saturate8(b[31:16]) tmp_dst[87:80] := Saturate8(b[47:32]) tmp_dst[95:88] := Saturate8(b[63:48]) tmp_dst[103:96] := Saturate8(b[79:64]) tmp_dst[111:104] := Saturate8(b[95:80]) tmp_dst[119:112] := Saturate8(b[111:96]) tmp_dst[127:120] := Saturate8(b[127:112]) FOR j := 0 to 15 i := j*8 IF k[j] dst[i+7:i] := tmp_dst[i+7:i] ELSE dst[i+7:i] := 0 FI ENDFOR dst[MAX:128] := 0
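The 128-bit signed-saturation pack and its two masking modes can be modelled in scalar code. The sketch below is illustrative only: the function names (`saturate8`, `packs_epi16_128`, `apply_mask`) are hypothetical helpers rather than intrinsic names, and vectors are represented as plain Python lists with element 0 as the lowest-order element.

```python
def saturate8(x):
    """Clamp a signed integer to the signed 8-bit range [-128, 127]."""
    return max(-128, min(127, x))


def packs_epi16_128(a, b):
    """Pack eight signed 16-bit values from a, then eight from b, into 16 signed 8-bit values."""
    assert len(a) == 8 and len(b) == 8
    return [saturate8(x) for x in a] + [saturate8(x) for x in b]


def apply_mask(tmp, k, src=None):
    """Element j keeps tmp[j] when bit j of k is set.

    Otherwise it is copied from src[j] (writemask) or set to 0 (zeromask, src=None).
    """
    return [tmp[j] if (k >> j) & 1 else (0 if src is None else src[j])
            for j in range(len(tmp))]


a = [0, 1, -1, 127, 128, -128, -129, 32767]
b = [-32768, 300, -300, 5, 6, 7, 8, 9]
tmp = packs_epi16_128(a, b)

zeromasked = apply_mask(tmp, 0x00FF)              # upper eight elements become 0
writemasked = apply_mask(tmp, 0x00FF, [99] * 16)  # upper eight elements come from src
```

With mask `0x00FF`, only the eight elements derived from "a" survive; the remaining eight are either zeroed or taken from "src", matching the `IF k[j] ... ELSE ... FI` branch in the pseudocode.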
Integer AVX512VL AVX512BW Convert Miscellaneous Convert packed signed 32-bit integers from "a" and "b" to packed 16-bit integers using unsigned saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). tmp_dst[15:0] := SaturateU16(a[31:0]) tmp_dst[31:16] := SaturateU16(a[63:32]) tmp_dst[47:32] := SaturateU16(a[95:64]) tmp_dst[63:48] := SaturateU16(a[127:96]) tmp_dst[79:64] := SaturateU16(b[31:0]) tmp_dst[95:80] := SaturateU16(b[63:32]) tmp_dst[111:96] := SaturateU16(b[95:64]) tmp_dst[127:112] := SaturateU16(b[127:96]) tmp_dst[143:128] := SaturateU16(a[159:128]) tmp_dst[159:144] := SaturateU16(a[191:160]) tmp_dst[175:160] := SaturateU16(a[223:192]) tmp_dst[191:176] := SaturateU16(a[255:224]) tmp_dst[207:192] := SaturateU16(b[159:128]) tmp_dst[223:208] := SaturateU16(b[191:160]) tmp_dst[239:224] := SaturateU16(b[223:192]) tmp_dst[255:240] := SaturateU16(b[255:224]) FOR j := 0 to 15 i := j*16 IF k[j] dst[i+15:i] := tmp_dst[i+15:i] ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512BW Convert Miscellaneous Convert packed signed 32-bit integers from "a" and "b" to packed 16-bit integers using unsigned saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). tmp_dst[15:0] := SaturateU16(a[31:0]) tmp_dst[31:16] := SaturateU16(a[63:32]) tmp_dst[47:32] := SaturateU16(a[95:64]) tmp_dst[63:48] := SaturateU16(a[127:96]) tmp_dst[79:64] := SaturateU16(b[31:0]) tmp_dst[95:80] := SaturateU16(b[63:32]) tmp_dst[111:96] := SaturateU16(b[95:64]) tmp_dst[127:112] := SaturateU16(b[127:96]) tmp_dst[143:128] := SaturateU16(a[159:128]) tmp_dst[159:144] := SaturateU16(a[191:160]) tmp_dst[175:160] := SaturateU16(a[223:192]) tmp_dst[191:176] := SaturateU16(a[255:224]) tmp_dst[207:192] := SaturateU16(b[159:128]) tmp_dst[223:208] := SaturateU16(b[191:160]) tmp_dst[239:224] := SaturateU16(b[223:192]) tmp_dst[255:240] := SaturateU16(b[255:224]) FOR j := 0 to 15 i := j*16 IF k[j] dst[i+15:i] := tmp_dst[i+15:i] ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:256] := 0
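In the 256-bit and 512-bit pack variants, the results are interleaved per 128-bit lane, not concatenated as "all of a, then all of b". A minimal scalar sketch of that ordering follows; `saturate_u16` and `packus_epi32` are hypothetical names, and `lanes` (1, 2, or 4) is an assumed parameter standing in for the 128-, 256-, and 512-bit forms.

```python
def saturate_u16(x):
    """Clamp a signed integer to the unsigned 16-bit range [0, 65535]."""
    return max(0, min(0xFFFF, x))


def packus_epi32(a, b, lanes):
    """Pack signed 32-bit values to unsigned 16-bit values, lane by lane.

    a and b each hold 4 * lanes elements; the result holds 8 * lanes elements.
    Within each 128-bit lane the four values from a come first, then the four from b.
    """
    assert len(a) == len(b) == 4 * lanes
    out = []
    for lane in range(lanes):
        lo = 4 * lane
        out += [saturate_u16(x) for x in a[lo:lo + 4]]
        out += [saturate_u16(x) for x in b[lo:lo + 4]]
    return out


# 256-bit example: two lanes, so the output alternates a-lane0, b-lane0, a-lane1, b-lane1.
a = [0, 1, 2, 3, 4, 5, 6, 7]
b = [10, 11, 12, 13, 14, 15, 16, 17]
packed = packus_epi32(a, b, lanes=2)
# packed == [0, 1, 2, 3, 10, 11, 12, 13, 4, 5, 6, 7, 14, 15, 16, 17]
```

This mirrors the pseudocode, where `tmp_dst[127:64]` is filled from `b[127:0]` before `tmp_dst[191:128]` is filled from `a[255:128]`.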
Integer AVX512BW Convert Miscellaneous Convert packed signed 32-bit integers from "a" and "b" to packed 16-bit integers using unsigned saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). tmp_dst[15:0] := SaturateU16(a[31:0]) tmp_dst[31:16] := SaturateU16(a[63:32]) tmp_dst[47:32] := SaturateU16(a[95:64]) tmp_dst[63:48] := SaturateU16(a[127:96]) tmp_dst[79:64] := SaturateU16(b[31:0]) tmp_dst[95:80] := SaturateU16(b[63:32]) tmp_dst[111:96] := SaturateU16(b[95:64]) tmp_dst[127:112] := SaturateU16(b[127:96]) tmp_dst[143:128] := SaturateU16(a[159:128]) tmp_dst[159:144] := SaturateU16(a[191:160]) tmp_dst[175:160] := SaturateU16(a[223:192]) tmp_dst[191:176] := SaturateU16(a[255:224]) tmp_dst[207:192] := SaturateU16(b[159:128]) tmp_dst[223:208] := SaturateU16(b[191:160]) tmp_dst[239:224] := SaturateU16(b[223:192]) tmp_dst[255:240] := SaturateU16(b[255:224]) tmp_dst[271:256] := SaturateU16(a[287:256]) tmp_dst[287:272] := SaturateU16(a[319:288]) tmp_dst[303:288] := SaturateU16(a[351:320]) tmp_dst[319:304] := SaturateU16(a[383:352]) tmp_dst[335:320] := SaturateU16(b[287:256]) tmp_dst[351:336] := SaturateU16(b[319:288]) tmp_dst[367:352] := SaturateU16(b[351:320]) tmp_dst[383:368] := SaturateU16(b[383:352]) tmp_dst[399:384] := SaturateU16(a[415:384]) tmp_dst[415:400] := SaturateU16(a[447:416]) tmp_dst[431:416] := SaturateU16(a[479:448]) tmp_dst[447:432] := SaturateU16(a[511:480]) tmp_dst[463:448] := SaturateU16(b[415:384]) tmp_dst[479:464] := SaturateU16(b[447:416]) tmp_dst[495:480] := SaturateU16(b[479:448]) tmp_dst[511:496] := SaturateU16(b[511:480]) FOR j := 0 to 31 i := j*16 IF k[j] dst[i+15:i] := tmp_dst[i+15:i] ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512BW Convert Miscellaneous Convert packed signed 32-bit integers from "a" and "b" to packed 16-bit integers using unsigned saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). tmp_dst[15:0] := SaturateU16(a[31:0]) tmp_dst[31:16] := SaturateU16(a[63:32]) tmp_dst[47:32] := SaturateU16(a[95:64]) tmp_dst[63:48] := SaturateU16(a[127:96]) tmp_dst[79:64] := SaturateU16(b[31:0]) tmp_dst[95:80] := SaturateU16(b[63:32]) tmp_dst[111:96] := SaturateU16(b[95:64]) tmp_dst[127:112] := SaturateU16(b[127:96]) tmp_dst[143:128] := SaturateU16(a[159:128]) tmp_dst[159:144] := SaturateU16(a[191:160]) tmp_dst[175:160] := SaturateU16(a[223:192]) tmp_dst[191:176] := SaturateU16(a[255:224]) tmp_dst[207:192] := SaturateU16(b[159:128]) tmp_dst[223:208] := SaturateU16(b[191:160]) tmp_dst[239:224] := SaturateU16(b[223:192]) tmp_dst[255:240] := SaturateU16(b[255:224]) tmp_dst[271:256] := SaturateU16(a[287:256]) tmp_dst[287:272] := SaturateU16(a[319:288]) tmp_dst[303:288] := SaturateU16(a[351:320]) tmp_dst[319:304] := SaturateU16(a[383:352]) tmp_dst[335:320] := SaturateU16(b[287:256]) tmp_dst[351:336] := SaturateU16(b[319:288]) tmp_dst[367:352] := SaturateU16(b[351:320]) tmp_dst[383:368] := SaturateU16(b[383:352]) tmp_dst[399:384] := SaturateU16(a[415:384]) tmp_dst[415:400] := SaturateU16(a[447:416]) tmp_dst[431:416] := SaturateU16(a[479:448]) tmp_dst[447:432] := SaturateU16(a[511:480]) tmp_dst[463:448] := SaturateU16(b[415:384]) tmp_dst[479:464] := SaturateU16(b[447:416]) tmp_dst[495:480] := SaturateU16(b[479:448]) tmp_dst[511:496] := SaturateU16(b[511:480]) FOR j := 0 to 31 i := j*16 IF k[j] dst[i+15:i] := tmp_dst[i+15:i] ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512BW Convert Miscellaneous Convert packed signed 32-bit integers from "a" and "b" to packed 16-bit integers using unsigned saturation, and store the results in "dst". dst[15:0] := SaturateU16(a[31:0]) dst[31:16] := SaturateU16(a[63:32]) dst[47:32] := SaturateU16(a[95:64]) dst[63:48] := SaturateU16(a[127:96]) dst[79:64] := SaturateU16(b[31:0]) dst[95:80] := SaturateU16(b[63:32]) dst[111:96] := SaturateU16(b[95:64]) dst[127:112] := SaturateU16(b[127:96]) dst[143:128] := SaturateU16(a[159:128]) dst[159:144] := SaturateU16(a[191:160]) dst[175:160] := SaturateU16(a[223:192]) dst[191:176] := SaturateU16(a[255:224]) dst[207:192] := SaturateU16(b[159:128]) dst[223:208] := SaturateU16(b[191:160]) dst[239:224] := SaturateU16(b[223:192]) dst[255:240] := SaturateU16(b[255:224]) dst[271:256] := SaturateU16(a[287:256]) dst[287:272] := SaturateU16(a[319:288]) dst[303:288] := SaturateU16(a[351:320]) dst[319:304] := SaturateU16(a[383:352]) dst[335:320] := SaturateU16(b[287:256]) dst[351:336] := SaturateU16(b[319:288]) dst[367:352] := SaturateU16(b[351:320]) dst[383:368] := SaturateU16(b[383:352]) dst[399:384] := SaturateU16(a[415:384]) dst[415:400] := SaturateU16(a[447:416]) dst[431:416] := SaturateU16(a[479:448]) dst[447:432] := SaturateU16(a[511:480]) dst[463:448] := SaturateU16(b[415:384]) dst[479:464] := SaturateU16(b[447:416]) dst[495:480] := SaturateU16(b[479:448]) dst[511:496] := SaturateU16(b[511:480]) dst[MAX:512] := 0
Integer AVX512VL AVX512BW Convert Miscellaneous Convert packed signed 32-bit integers from "a" and "b" to packed 16-bit integers using unsigned saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). tmp_dst[15:0] := SaturateU16(a[31:0]) tmp_dst[31:16] := SaturateU16(a[63:32]) tmp_dst[47:32] := SaturateU16(a[95:64]) tmp_dst[63:48] := SaturateU16(a[127:96]) tmp_dst[79:64] := SaturateU16(b[31:0]) tmp_dst[95:80] := SaturateU16(b[63:32]) tmp_dst[111:96] := SaturateU16(b[95:64]) tmp_dst[127:112] := SaturateU16(b[127:96]) FOR j := 0 to 7 i := j*16 IF k[j] dst[i+15:i] := tmp_dst[i+15:i] ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512BW Convert Miscellaneous Convert packed signed 32-bit integers from "a" and "b" to packed 16-bit integers using unsigned saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). tmp_dst[15:0] := SaturateU16(a[31:0]) tmp_dst[31:16] := SaturateU16(a[63:32]) tmp_dst[47:32] := SaturateU16(a[95:64]) tmp_dst[63:48] := SaturateU16(a[127:96]) tmp_dst[79:64] := SaturateU16(b[31:0]) tmp_dst[95:80] := SaturateU16(b[63:32]) tmp_dst[111:96] := SaturateU16(b[95:64]) tmp_dst[127:112] := SaturateU16(b[127:96]) FOR j := 0 to 7 i := j*16 IF k[j] dst[i+15:i] := tmp_dst[i+15:i] ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512BW Convert Miscellaneous Convert packed signed 16-bit integers from "a" and "b" to packed 8-bit integers using unsigned saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). tmp_dst[7:0] := SaturateU8(a[15:0]) tmp_dst[15:8] := SaturateU8(a[31:16]) tmp_dst[23:16] := SaturateU8(a[47:32]) tmp_dst[31:24] := SaturateU8(a[63:48]) tmp_dst[39:32] := SaturateU8(a[79:64]) tmp_dst[47:40] := SaturateU8(a[95:80]) tmp_dst[55:48] := SaturateU8(a[111:96]) tmp_dst[63:56] := SaturateU8(a[127:112]) tmp_dst[71:64] := SaturateU8(b[15:0]) tmp_dst[79:72] := SaturateU8(b[31:16]) tmp_dst[87:80] := SaturateU8(b[47:32]) tmp_dst[95:88] := SaturateU8(b[63:48]) tmp_dst[103:96] := SaturateU8(b[79:64]) tmp_dst[111:104] := SaturateU8(b[95:80]) tmp_dst[119:112] := SaturateU8(b[111:96]) tmp_dst[127:120] := SaturateU8(b[127:112]) tmp_dst[135:128] := SaturateU8(a[143:128]) tmp_dst[143:136] := SaturateU8(a[159:144]) tmp_dst[151:144] := SaturateU8(a[175:160]) tmp_dst[159:152] := SaturateU8(a[191:176]) tmp_dst[167:160] := SaturateU8(a[207:192]) tmp_dst[175:168] := SaturateU8(a[223:208]) tmp_dst[183:176] := SaturateU8(a[239:224]) tmp_dst[191:184] := SaturateU8(a[255:240]) tmp_dst[199:192] := SaturateU8(b[143:128]) tmp_dst[207:200] := SaturateU8(b[159:144]) tmp_dst[215:208] := SaturateU8(b[175:160]) tmp_dst[223:216] := SaturateU8(b[191:176]) tmp_dst[231:224] := SaturateU8(b[207:192]) tmp_dst[239:232] := SaturateU8(b[223:208]) tmp_dst[247:240] := SaturateU8(b[239:224]) tmp_dst[255:248] := SaturateU8(b[255:240]) FOR j := 0 to 31 i := j*8 IF k[j] dst[i+7:i] := tmp_dst[i+7:i] ELSE dst[i+7:i] := src[i+7:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512BW Convert Miscellaneous Convert packed signed 16-bit integers from "a" and "b" to packed 8-bit integers using unsigned saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). tmp_dst[7:0] := SaturateU8(a[15:0]) tmp_dst[15:8] := SaturateU8(a[31:16]) tmp_dst[23:16] := SaturateU8(a[47:32]) tmp_dst[31:24] := SaturateU8(a[63:48]) tmp_dst[39:32] := SaturateU8(a[79:64]) tmp_dst[47:40] := SaturateU8(a[95:80]) tmp_dst[55:48] := SaturateU8(a[111:96]) tmp_dst[63:56] := SaturateU8(a[127:112]) tmp_dst[71:64] := SaturateU8(b[15:0]) tmp_dst[79:72] := SaturateU8(b[31:16]) tmp_dst[87:80] := SaturateU8(b[47:32]) tmp_dst[95:88] := SaturateU8(b[63:48]) tmp_dst[103:96] := SaturateU8(b[79:64]) tmp_dst[111:104] := SaturateU8(b[95:80]) tmp_dst[119:112] := SaturateU8(b[111:96]) tmp_dst[127:120] := SaturateU8(b[127:112]) tmp_dst[135:128] := SaturateU8(a[143:128]) tmp_dst[143:136] := SaturateU8(a[159:144]) tmp_dst[151:144] := SaturateU8(a[175:160]) tmp_dst[159:152] := SaturateU8(a[191:176]) tmp_dst[167:160] := SaturateU8(a[207:192]) tmp_dst[175:168] := SaturateU8(a[223:208]) tmp_dst[183:176] := SaturateU8(a[239:224]) tmp_dst[191:184] := SaturateU8(a[255:240]) tmp_dst[199:192] := SaturateU8(b[143:128]) tmp_dst[207:200] := SaturateU8(b[159:144]) tmp_dst[215:208] := SaturateU8(b[175:160]) tmp_dst[223:216] := SaturateU8(b[191:176]) tmp_dst[231:224] := SaturateU8(b[207:192]) tmp_dst[239:232] := SaturateU8(b[223:208]) tmp_dst[247:240] := SaturateU8(b[239:224]) tmp_dst[255:248] := SaturateU8(b[255:240]) FOR j := 0 to 31 i := j*8 IF k[j] dst[i+7:i] := tmp_dst[i+7:i] ELSE dst[i+7:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512BW Convert Miscellaneous Convert packed signed 16-bit integers from "a" and "b" to packed 8-bit integers using unsigned saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). tmp_dst[7:0] := SaturateU8(a[15:0]) tmp_dst[15:8] := SaturateU8(a[31:16]) tmp_dst[23:16] := SaturateU8(a[47:32]) tmp_dst[31:24] := SaturateU8(a[63:48]) tmp_dst[39:32] := SaturateU8(a[79:64]) tmp_dst[47:40] := SaturateU8(a[95:80]) tmp_dst[55:48] := SaturateU8(a[111:96]) tmp_dst[63:56] := SaturateU8(a[127:112]) tmp_dst[71:64] := SaturateU8(b[15:0]) tmp_dst[79:72] := SaturateU8(b[31:16]) tmp_dst[87:80] := SaturateU8(b[47:32]) tmp_dst[95:88] := SaturateU8(b[63:48]) tmp_dst[103:96] := SaturateU8(b[79:64]) tmp_dst[111:104] := SaturateU8(b[95:80]) tmp_dst[119:112] := SaturateU8(b[111:96]) tmp_dst[127:120] := SaturateU8(b[127:112]) tmp_dst[135:128] := SaturateU8(a[143:128]) tmp_dst[143:136] := SaturateU8(a[159:144]) tmp_dst[151:144] := SaturateU8(a[175:160]) tmp_dst[159:152] := SaturateU8(a[191:176]) tmp_dst[167:160] := SaturateU8(a[207:192]) tmp_dst[175:168] := SaturateU8(a[223:208]) tmp_dst[183:176] := SaturateU8(a[239:224]) tmp_dst[191:184] := SaturateU8(a[255:240]) tmp_dst[199:192] := SaturateU8(b[143:128]) tmp_dst[207:200] := SaturateU8(b[159:144]) tmp_dst[215:208] := SaturateU8(b[175:160]) tmp_dst[223:216] := SaturateU8(b[191:176]) tmp_dst[231:224] := SaturateU8(b[207:192]) tmp_dst[239:232] := SaturateU8(b[223:208]) tmp_dst[247:240] := SaturateU8(b[239:224]) tmp_dst[255:248] := SaturateU8(b[255:240]) tmp_dst[263:256] := SaturateU8(a[271:256]) tmp_dst[271:264] := SaturateU8(a[287:272]) tmp_dst[279:272] := SaturateU8(a[303:288]) tmp_dst[287:280] := SaturateU8(a[319:304]) tmp_dst[295:288] := SaturateU8(a[335:320]) tmp_dst[303:296] := SaturateU8(a[351:336]) tmp_dst[311:304] := SaturateU8(a[367:352]) tmp_dst[319:312] := SaturateU8(a[383:368]) tmp_dst[327:320] := SaturateU8(b[271:256]) tmp_dst[335:328] 
:= SaturateU8(b[287:272]) tmp_dst[343:336] := SaturateU8(b[303:288]) tmp_dst[351:344] := SaturateU8(b[319:304]) tmp_dst[359:352] := SaturateU8(b[335:320]) tmp_dst[367:360] := SaturateU8(b[351:336]) tmp_dst[375:368] := SaturateU8(b[367:352]) tmp_dst[383:376] := SaturateU8(b[383:368]) tmp_dst[391:384] := SaturateU8(a[399:384]) tmp_dst[399:392] := SaturateU8(a[415:400]) tmp_dst[407:400] := SaturateU8(a[431:416]) tmp_dst[415:408] := SaturateU8(a[447:432]) tmp_dst[423:416] := SaturateU8(a[463:448]) tmp_dst[431:424] := SaturateU8(a[479:464]) tmp_dst[439:432] := SaturateU8(a[495:480]) tmp_dst[447:440] := SaturateU8(a[511:496]) tmp_dst[455:448] := SaturateU8(b[399:384]) tmp_dst[463:456] := SaturateU8(b[415:400]) tmp_dst[471:464] := SaturateU8(b[431:416]) tmp_dst[479:472] := SaturateU8(b[447:432]) tmp_dst[487:480] := SaturateU8(b[463:448]) tmp_dst[495:488] := SaturateU8(b[479:464]) tmp_dst[503:496] := SaturateU8(b[495:480]) tmp_dst[511:504] := SaturateU8(b[511:496]) FOR j := 0 to 63 i := j*8 IF k[j] dst[i+7:i] := tmp_dst[i+7:i] ELSE dst[i+7:i] := src[i+7:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512BW Convert Miscellaneous Convert packed signed 16-bit integers from "a" and "b" to packed 8-bit integers using unsigned saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). tmp_dst[7:0] := SaturateU8(a[15:0]) tmp_dst[15:8] := SaturateU8(a[31:16]) tmp_dst[23:16] := SaturateU8(a[47:32]) tmp_dst[31:24] := SaturateU8(a[63:48]) tmp_dst[39:32] := SaturateU8(a[79:64]) tmp_dst[47:40] := SaturateU8(a[95:80]) tmp_dst[55:48] := SaturateU8(a[111:96]) tmp_dst[63:56] := SaturateU8(a[127:112]) tmp_dst[71:64] := SaturateU8(b[15:0]) tmp_dst[79:72] := SaturateU8(b[31:16]) tmp_dst[87:80] := SaturateU8(b[47:32]) tmp_dst[95:88] := SaturateU8(b[63:48]) tmp_dst[103:96] := SaturateU8(b[79:64]) tmp_dst[111:104] := SaturateU8(b[95:80]) tmp_dst[119:112] := SaturateU8(b[111:96]) tmp_dst[127:120] := SaturateU8(b[127:112]) tmp_dst[135:128] := SaturateU8(a[143:128]) tmp_dst[143:136] := SaturateU8(a[159:144]) tmp_dst[151:144] := SaturateU8(a[175:160]) tmp_dst[159:152] := SaturateU8(a[191:176]) tmp_dst[167:160] := SaturateU8(a[207:192]) tmp_dst[175:168] := SaturateU8(a[223:208]) tmp_dst[183:176] := SaturateU8(a[239:224]) tmp_dst[191:184] := SaturateU8(a[255:240]) tmp_dst[199:192] := SaturateU8(b[143:128]) tmp_dst[207:200] := SaturateU8(b[159:144]) tmp_dst[215:208] := SaturateU8(b[175:160]) tmp_dst[223:216] := SaturateU8(b[191:176]) tmp_dst[231:224] := SaturateU8(b[207:192]) tmp_dst[239:232] := SaturateU8(b[223:208]) tmp_dst[247:240] := SaturateU8(b[239:224]) tmp_dst[255:248] := SaturateU8(b[255:240]) tmp_dst[263:256] := SaturateU8(a[271:256]) tmp_dst[271:264] := SaturateU8(a[287:272]) tmp_dst[279:272] := SaturateU8(a[303:288]) tmp_dst[287:280] := SaturateU8(a[319:304]) tmp_dst[295:288] := SaturateU8(a[335:320]) tmp_dst[303:296] := SaturateU8(a[351:336]) tmp_dst[311:304] := SaturateU8(a[367:352]) tmp_dst[319:312] := SaturateU8(a[383:368]) tmp_dst[327:320] := SaturateU8(b[271:256]) tmp_dst[335:328] := 
SaturateU8(b[287:272]) tmp_dst[343:336] := SaturateU8(b[303:288]) tmp_dst[351:344] := SaturateU8(b[319:304]) tmp_dst[359:352] := SaturateU8(b[335:320]) tmp_dst[367:360] := SaturateU8(b[351:336]) tmp_dst[375:368] := SaturateU8(b[367:352]) tmp_dst[383:376] := SaturateU8(b[383:368]) tmp_dst[391:384] := SaturateU8(a[399:384]) tmp_dst[399:392] := SaturateU8(a[415:400]) tmp_dst[407:400] := SaturateU8(a[431:416]) tmp_dst[415:408] := SaturateU8(a[447:432]) tmp_dst[423:416] := SaturateU8(a[463:448]) tmp_dst[431:424] := SaturateU8(a[479:464]) tmp_dst[439:432] := SaturateU8(a[495:480]) tmp_dst[447:440] := SaturateU8(a[511:496]) tmp_dst[455:448] := SaturateU8(b[399:384]) tmp_dst[463:456] := SaturateU8(b[415:400]) tmp_dst[471:464] := SaturateU8(b[431:416]) tmp_dst[479:472] := SaturateU8(b[447:432]) tmp_dst[487:480] := SaturateU8(b[463:448]) tmp_dst[495:488] := SaturateU8(b[479:464]) tmp_dst[503:496] := SaturateU8(b[495:480]) tmp_dst[511:504] := SaturateU8(b[511:496]) FOR j := 0 to 63 i := j*8 IF k[j] dst[i+7:i] := tmp_dst[i+7:i] ELSE dst[i+7:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512BW Convert Miscellaneous Convert packed signed 16-bit integers from "a" and "b" to packed 8-bit integers using unsigned saturation, and store the results in "dst". dst[7:0] := SaturateU8(a[15:0]) dst[15:8] := SaturateU8(a[31:16]) dst[23:16] := SaturateU8(a[47:32]) dst[31:24] := SaturateU8(a[63:48]) dst[39:32] := SaturateU8(a[79:64]) dst[47:40] := SaturateU8(a[95:80]) dst[55:48] := SaturateU8(a[111:96]) dst[63:56] := SaturateU8(a[127:112]) dst[71:64] := SaturateU8(b[15:0]) dst[79:72] := SaturateU8(b[31:16]) dst[87:80] := SaturateU8(b[47:32]) dst[95:88] := SaturateU8(b[63:48]) dst[103:96] := SaturateU8(b[79:64]) dst[111:104] := SaturateU8(b[95:80]) dst[119:112] := SaturateU8(b[111:96]) dst[127:120] := SaturateU8(b[127:112]) dst[135:128] := SaturateU8(a[143:128]) dst[143:136] := SaturateU8(a[159:144]) dst[151:144] := SaturateU8(a[175:160]) dst[159:152] := SaturateU8(a[191:176]) dst[167:160] := SaturateU8(a[207:192]) dst[175:168] := SaturateU8(a[223:208]) dst[183:176] := SaturateU8(a[239:224]) dst[191:184] := SaturateU8(a[255:240]) dst[199:192] := SaturateU8(b[143:128]) dst[207:200] := SaturateU8(b[159:144]) dst[215:208] := SaturateU8(b[175:160]) dst[223:216] := SaturateU8(b[191:176]) dst[231:224] := SaturateU8(b[207:192]) dst[239:232] := SaturateU8(b[223:208]) dst[247:240] := SaturateU8(b[239:224]) dst[255:248] := SaturateU8(b[255:240]) dst[263:256] := SaturateU8(a[271:256]) dst[271:264] := SaturateU8(a[287:272]) dst[279:272] := SaturateU8(a[303:288]) dst[287:280] := SaturateU8(a[319:304]) dst[295:288] := SaturateU8(a[335:320]) dst[303:296] := SaturateU8(a[351:336]) dst[311:304] := SaturateU8(a[367:352]) dst[319:312] := SaturateU8(a[383:368]) dst[327:320] := SaturateU8(b[271:256]) dst[335:328] := SaturateU8(b[287:272]) dst[343:336] := SaturateU8(b[303:288]) dst[351:344] := SaturateU8(b[319:304]) dst[359:352] := SaturateU8(b[335:320]) dst[367:360] := SaturateU8(b[351:336]) dst[375:368] := SaturateU8(b[367:352]) dst[383:376] := SaturateU8(b[383:368]) 
dst[391:384] := SaturateU8(a[399:384]) dst[399:392] := SaturateU8(a[415:400]) dst[407:400] := SaturateU8(a[431:416]) dst[415:408] := SaturateU8(a[447:432]) dst[423:416] := SaturateU8(a[463:448]) dst[431:424] := SaturateU8(a[479:464]) dst[439:432] := SaturateU8(a[495:480]) dst[447:440] := SaturateU8(a[511:496]) dst[455:448] := SaturateU8(b[399:384]) dst[463:456] := SaturateU8(b[415:400]) dst[471:464] := SaturateU8(b[431:416]) dst[479:472] := SaturateU8(b[447:432]) dst[487:480] := SaturateU8(b[463:448]) dst[495:488] := SaturateU8(b[479:464]) dst[503:496] := SaturateU8(b[495:480]) dst[511:504] := SaturateU8(b[511:496]) dst[MAX:512] := 0
Integer AVX512VL AVX512BW Convert Miscellaneous Convert packed signed 16-bit integers from "a" and "b" to packed 8-bit integers using unsigned saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). tmp_dst[7:0] := SaturateU8(a[15:0]) tmp_dst[15:8] := SaturateU8(a[31:16]) tmp_dst[23:16] := SaturateU8(a[47:32]) tmp_dst[31:24] := SaturateU8(a[63:48]) tmp_dst[39:32] := SaturateU8(a[79:64]) tmp_dst[47:40] := SaturateU8(a[95:80]) tmp_dst[55:48] := SaturateU8(a[111:96]) tmp_dst[63:56] := SaturateU8(a[127:112]) tmp_dst[71:64] := SaturateU8(b[15:0]) tmp_dst[79:72] := SaturateU8(b[31:16]) tmp_dst[87:80] := SaturateU8(b[47:32]) tmp_dst[95:88] := SaturateU8(b[63:48]) tmp_dst[103:96] := SaturateU8(b[79:64]) tmp_dst[111:104] := SaturateU8(b[95:80]) tmp_dst[119:112] := SaturateU8(b[111:96]) tmp_dst[127:120] := SaturateU8(b[127:112]) FOR j := 0 to 15 i := j*8 IF k[j] dst[i+7:i] := tmp_dst[i+7:i] ELSE dst[i+7:i] := src[i+7:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512BW Convert Miscellaneous Convert packed signed 16-bit integers from "a" and "b" to packed 8-bit integers using unsigned saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). tmp_dst[7:0] := SaturateU8(a[15:0]) tmp_dst[15:8] := SaturateU8(a[31:16]) tmp_dst[23:16] := SaturateU8(a[47:32]) tmp_dst[31:24] := SaturateU8(a[63:48]) tmp_dst[39:32] := SaturateU8(a[79:64]) tmp_dst[47:40] := SaturateU8(a[95:80]) tmp_dst[55:48] := SaturateU8(a[111:96]) tmp_dst[63:56] := SaturateU8(a[127:112]) tmp_dst[71:64] := SaturateU8(b[15:0]) tmp_dst[79:72] := SaturateU8(b[31:16]) tmp_dst[87:80] := SaturateU8(b[47:32]) tmp_dst[95:88] := SaturateU8(b[63:48]) tmp_dst[103:96] := SaturateU8(b[79:64]) tmp_dst[111:104] := SaturateU8(b[95:80]) tmp_dst[119:112] := SaturateU8(b[111:96]) tmp_dst[127:120] := SaturateU8(b[127:112]) FOR j := 0 to 15 i := j*8 IF k[j] dst[i+7:i] := tmp_dst[i+7:i] ELSE dst[i+7:i] := 0 FI ENDFOR dst[MAX:128] := 0
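The signed (`Saturate8`) and unsigned (`SaturateU8`) packs both read their inputs as signed 16-bit integers; they differ only in the range the result is clamped to. The comparison below is a sketch using hypothetical helper names, applied to the same inputs so the difference is visible.

```python
def saturate8(x):
    """Clamp a signed 16-bit input to the signed 8-bit range [-128, 127]."""
    return max(-128, min(127, x))


def saturate_u8(x):
    """Clamp a signed 16-bit input to the unsigned 8-bit range [0, 255]."""
    return max(0, min(255, x))


inputs = [-300, -1, 0, 127, 128, 255, 256, 32767]
signed_result = [saturate8(x) for x in inputs]
unsigned_result = [saturate_u8(x) for x in inputs]
# signed_result   == [-128, -1, 0, 127, 127, 127, 127, 127]
# unsigned_result == [0, 0, 0, 127, 128, 255, 255, 255]
```

Negative inputs clamp to 0 under unsigned saturation, while inputs from 128 to 255 pass through unchanged there but clamp to 127 under signed saturation.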
Integer AVX512VL AVX512BW Arithmetic Add packed 8-bit integers in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*8 IF k[j] dst[i+7:i] := a[i+7:i] + b[i+7:i] ELSE dst[i+7:i] := src[i+7:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512BW Arithmetic Add packed 8-bit integers in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*8 IF k[j] dst[i+7:i] := a[i+7:i] + b[i+7:i] ELSE dst[i+7:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512BW Arithmetic Add packed 8-bit integers in "a" and "b", and store the results in "dst". FOR j := 0 to 63 i := j*8 dst[i+7:i] := a[i+7:i] + b[i+7:i] ENDFOR dst[MAX:512] := 0
Integer AVX512BW Arithmetic Add packed 8-bit integers in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 63 i := j*8 IF k[j] dst[i+7:i] := a[i+7:i] + b[i+7:i] ELSE dst[i+7:i] := src[i+7:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512BW Arithmetic Add packed 8-bit integers in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 63 i := j*8 IF k[j] dst[i+7:i] := a[i+7:i] + b[i+7:i] ELSE dst[i+7:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512VL AVX512BW Arithmetic Add packed 8-bit integers in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*8 IF k[j] dst[i+7:i] := a[i+7:i] + b[i+7:i] ELSE dst[i+7:i] := src[i+7:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512BW Arithmetic Add packed 8-bit integers in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*8 IF k[j] dst[i+7:i] := a[i+7:i] + b[i+7:i] ELSE dst[i+7:i] := 0 FI ENDFOR dst[MAX:128] := 0
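The non-saturating adds above keep only the low 8 bits of each sum, so results wrap modulo 256. A scalar sketch of the writemask and zeromask forms is shown below; the function names are hypothetical, bytes are modelled as unsigned values in 0..255, and the four-element vectors are shortened for readability (the real forms operate on 16, 32, or 64 elements).

```python
def mask_add_epi8(src, k, a, b):
    """Writemask form: element j is (a[j] + b[j]) mod 256 if bit j of k is set, else src[j]."""
    return [(a[j] + b[j]) & 0xFF if (k >> j) & 1 else src[j]
            for j in range(len(a))]


def maskz_add_epi8(k, a, b):
    """Zeromask form: same as the writemask form with an all-zero src."""
    return mask_add_epi8([0] * len(a), k, a, b)


a = [250, 1, 2, 3]
b = [10, 1, 1, 1]
merged = mask_add_epi8([9, 9, 9, 9], 0b0101, a, b)  # [4, 9, 3, 9]
zeroed = maskz_add_epi8(0b0101, a, b)               # [4, 0, 3, 0]
```

Element 0 shows the wraparound: 250 + 10 = 260, which is stored as 260 mod 256 = 4.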
Integer AVX512VL AVX512BW Arithmetic Add packed signed 8-bit integers in "a" and "b" using saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*8 IF k[j] dst[i+7:i] := Saturate8( a[i+7:i] + b[i+7:i] ) ELSE dst[i+7:i] := src[i+7:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512BW Arithmetic Add packed signed 8-bit integers in "a" and "b" using saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*8 IF k[j] dst[i+7:i] := Saturate8( a[i+7:i] + b[i+7:i] ) ELSE dst[i+7:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512BW Arithmetic Add packed signed 8-bit integers in "a" and "b" using saturation, and store the results in "dst". FOR j := 0 to 63 i := j*8 dst[i+7:i] := Saturate8( a[i+7:i] + b[i+7:i] ) ENDFOR dst[MAX:512] := 0
Integer AVX512BW Arithmetic Add packed signed 8-bit integers in "a" and "b" using saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 63 i := j*8 IF k[j] dst[i+7:i] := Saturate8( a[i+7:i] + b[i+7:i] ) ELSE dst[i+7:i] := src[i+7:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512BW Arithmetic Add packed signed 8-bit integers in "a" and "b" using saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 63 i := j*8 IF k[j] dst[i+7:i] := Saturate8( a[i+7:i] + b[i+7:i] ) ELSE dst[i+7:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512VL AVX512BW Arithmetic Add packed signed 8-bit integers in "a" and "b" using saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*8 IF k[j] dst[i+7:i] := Saturate8( a[i+7:i] + b[i+7:i] ) ELSE dst[i+7:i] := src[i+7:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512BW Arithmetic Add packed signed 8-bit integers in "a" and "b" using saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*8 IF k[j] dst[i+7:i] := Saturate8( a[i+7:i] + b[i+7:i] ) ELSE dst[i+7:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512BW Arithmetic Add packed signed 16-bit integers in "a" and "b" using saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*16 IF k[j] dst[i+15:i] := Saturate16( a[i+15:i] + b[i+15:i] ) ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512BW Arithmetic Add packed signed 16-bit integers in "a" and "b" using saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*16 IF k[j] dst[i+15:i] := Saturate16( a[i+15:i] + b[i+15:i] ) ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512BW Arithmetic Add packed signed 16-bit integers in "a" and "b" using saturation, and store the results in "dst". FOR j := 0 to 31 i := j*16 dst[i+15:i] := Saturate16( a[i+15:i] + b[i+15:i] ) ENDFOR dst[MAX:512] := 0
Integer AVX512BW Arithmetic Add packed signed 16-bit integers in "a" and "b" using saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*16 IF k[j] dst[i+15:i] := Saturate16( a[i+15:i] + b[i+15:i] ) ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512BW Arithmetic Add packed signed 16-bit integers in "a" and "b" using saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*16 IF k[j] dst[i+15:i] := Saturate16( a[i+15:i] + b[i+15:i] ) ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512VL AVX512BW Arithmetic Add packed signed 16-bit integers in "a" and "b" using saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*16 IF k[j] dst[i+15:i] := Saturate16( a[i+15:i] + b[i+15:i] ) ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512BW Arithmetic Add packed signed 16-bit integers in "a" and "b" using saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*16 IF k[j] dst[i+15:i] := Saturate16( a[i+15:i] + b[i+15:i] ) ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:128] := 0
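Saturate8 and Saturate16 in the entries above clamp the exact sum to the signed range of the lane instead of letting it wrap. The following is a per-lane sketch in C, assuming the sum is first formed in a wider type. All function names are illustrative, not taken from any header.

```c
#include <stdint.h>

/* Clamp a wide intermediate to the signed 8-bit range (models Saturate8). */
int8_t saturate8(int32_t v)
{
    if (v > INT8_MAX) return INT8_MAX;
    if (v < INT8_MIN) return INT8_MIN;
    return (int8_t)v;
}

/* Clamp a wide intermediate to the signed 16-bit range (models Saturate16). */
int16_t saturate16(int32_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* One lane of the signed saturating add; the sum is computed in 32 bits
   so it cannot overflow before clamping. */
int8_t adds_epi8_lane(int8_t a, int8_t b)
{
    return saturate8((int32_t)a + (int32_t)b);
}

int16_t adds_epi16_lane(int16_t a, int16_t b)
{
    return saturate16((int32_t)a + (int32_t)b);
}
```

For example, 100 + 100 in a signed 8-bit lane yields 127 rather than the wrapped value -56.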
Integer AVX512VL AVX512BW Arithmetic Add packed unsigned 8-bit integers in "a" and "b" using saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*8 IF k[j] dst[i+7:i] := SaturateU8( a[i+7:i] + b[i+7:i] ) ELSE dst[i+7:i] := src[i+7:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512BW Arithmetic Add packed unsigned 8-bit integers in "a" and "b" using saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*8 IF k[j] dst[i+7:i] := SaturateU8( a[i+7:i] + b[i+7:i] ) ELSE dst[i+7:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512BW Arithmetic Add packed unsigned 8-bit integers in "a" and "b" using saturation, and store the results in "dst". FOR j := 0 to 63 i := j*8 dst[i+7:i] := SaturateU8( a[i+7:i] + b[i+7:i] ) ENDFOR dst[MAX:512] := 0
Integer AVX512BW Arithmetic Add packed unsigned 8-bit integers in "a" and "b" using saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 63 i := j*8 IF k[j] dst[i+7:i] := SaturateU8( a[i+7:i] + b[i+7:i] ) ELSE dst[i+7:i] := src[i+7:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512BW Arithmetic Add packed unsigned 8-bit integers in "a" and "b" using saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 63 i := j*8 IF k[j] dst[i+7:i] := SaturateU8( a[i+7:i] + b[i+7:i] ) ELSE dst[i+7:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512VL AVX512BW Arithmetic Add packed unsigned 8-bit integers in "a" and "b" using saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*8 IF k[j] dst[i+7:i] := SaturateU8( a[i+7:i] + b[i+7:i] ) ELSE dst[i+7:i] := src[i+7:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512BW Arithmetic Add packed unsigned 8-bit integers in "a" and "b" using saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*8 IF k[j] dst[i+7:i] := SaturateU8( a[i+7:i] + b[i+7:i] ) ELSE dst[i+7:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512BW Arithmetic Add packed unsigned 16-bit integers in "a" and "b" using saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*16 IF k[j] dst[i+15:i] := SaturateU16( a[i+15:i] + b[i+15:i] ) ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512BW Arithmetic Add packed unsigned 16-bit integers in "a" and "b" using saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*16 IF k[j] dst[i+15:i] := SaturateU16( a[i+15:i] + b[i+15:i] ) ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512BW Arithmetic Add packed unsigned 16-bit integers in "a" and "b" using saturation, and store the results in "dst". FOR j := 0 to 31 i := j*16 dst[i+15:i] := SaturateU16( a[i+15:i] + b[i+15:i] ) ENDFOR dst[MAX:512] := 0
Integer AVX512BW Arithmetic Add packed unsigned 16-bit integers in "a" and "b" using saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*16 IF k[j] dst[i+15:i] := SaturateU16( a[i+15:i] + b[i+15:i] ) ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512BW Arithmetic Add packed unsigned 16-bit integers in "a" and "b" using saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*16 IF k[j] dst[i+15:i] := SaturateU16( a[i+15:i] + b[i+15:i] ) ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512VL AVX512BW Arithmetic Add packed unsigned 16-bit integers in "a" and "b" using saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*16 IF k[j] dst[i+15:i] := SaturateU16( a[i+15:i] + b[i+15:i] ) ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512BW Arithmetic Add packed unsigned 16-bit integers in "a" and "b" using saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*16 IF k[j] dst[i+15:i] := SaturateU16( a[i+15:i] + b[i+15:i] ) ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:128] := 0
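The unsigned variants (SaturateU8, SaturateU16) only need an upper clamp, since a sum of two unsigned values cannot fall below zero. This is a per-lane sketch in C under the same assumption as before: the sum is formed in a wider type, and the helper names are hypothetical.

```c
#include <stdint.h>

/* One lane of the unsigned saturating 8-bit add (models SaturateU8). */
uint8_t adds_epu8_lane(uint8_t a, uint8_t b)
{
    uint32_t sum = (uint32_t)a + (uint32_t)b;   /* exact sum, at most 510 */
    return (uint8_t)(sum > UINT8_MAX ? UINT8_MAX : sum);
}

/* One lane of the unsigned saturating 16-bit add (models SaturateU16). */
uint16_t adds_epu16_lane(uint16_t a, uint16_t b)
{
    uint32_t sum = (uint32_t)a + (uint32_t)b;   /* exact sum, at most 131070 */
    return (uint16_t)(sum > UINT16_MAX ? UINT16_MAX : sum);
}
```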
Integer AVX512VL AVX512BW Arithmetic Add packed 16-bit integers in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*16 IF k[j] dst[i+15:i] := a[i+15:i] + b[i+15:i] ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512BW Arithmetic Add packed 16-bit integers in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*16 IF k[j] dst[i+15:i] := a[i+15:i] + b[i+15:i] ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512BW Arithmetic Add packed 16-bit integers in "a" and "b", and store the results in "dst". FOR j := 0 to 31 i := j*16 dst[i+15:i] := a[i+15:i] + b[i+15:i] ENDFOR dst[MAX:512] := 0
Integer AVX512BW Arithmetic Add packed 16-bit integers in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*16 IF k[j] dst[i+15:i] := a[i+15:i] + b[i+15:i] ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512BW Arithmetic Add packed 16-bit integers in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*16 IF k[j] dst[i+15:i] := a[i+15:i] + b[i+15:i] ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512VL AVX512BW Arithmetic Add packed 16-bit integers in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*16 IF k[j] dst[i+15:i] := a[i+15:i] + b[i+15:i] ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512BW Arithmetic Add packed 16-bit integers in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*16 IF k[j] dst[i+15:i] := a[i+15:i] + b[i+15:i] ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512BW Miscellaneous Concatenate pairs of 16-byte blocks in "a" and "b" into a 32-byte temporary result, shift the result right by "imm8" bytes, and store the low 16 bytes in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*128 tmp[255:0] := ((a[i+127:i] << 128)[255:0] OR b[i+127:i]) >> (imm8*8) tmp_dst[i+127:i] := tmp[127:0] ENDFOR FOR j := 0 to 31 i := j*8 IF k[j] dst[i+7:i] := tmp_dst[i+7:i] ELSE dst[i+7:i] := src[i+7:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512BW Miscellaneous Concatenate pairs of 16-byte blocks in "a" and "b" into a 32-byte temporary result, shift the result right by "imm8" bytes, and store the low 16 bytes in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*128 tmp[255:0] := ((a[i+127:i] << 128)[255:0] OR b[i+127:i]) >> (imm8*8) tmp_dst[i+127:i] := tmp[127:0] ENDFOR FOR j := 0 to 31 i := j*8 IF k[j] dst[i+7:i] := tmp_dst[i+7:i] ELSE dst[i+7:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512BW Miscellaneous Concatenate pairs of 16-byte blocks in "a" and "b" into a 32-byte temporary result, shift the result right by "imm8" bytes, and store the low 16 bytes in "dst". FOR j := 0 to 3 i := j*128 tmp[255:0] := ((a[i+127:i] << 128)[255:0] OR b[i+127:i]) >> (imm8*8) dst[i+127:i] := tmp[127:0] ENDFOR dst[MAX:512] := 0
Integer AVX512BW Miscellaneous Concatenate pairs of 16-byte blocks in "a" and "b" into a 32-byte temporary result, shift the result right by "imm8" bytes, and store the low 16 bytes in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*128 tmp[255:0] := ((a[i+127:i] << 128)[255:0] OR b[i+127:i]) >> (imm8*8) tmp_dst[i+127:i] := tmp[127:0] ENDFOR FOR j := 0 to 63 i := j*8 IF k[j] dst[i+7:i] := tmp_dst[i+7:i] ELSE dst[i+7:i] := src[i+7:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512BW Miscellaneous Concatenate pairs of 16-byte blocks in "a" and "b" into a 32-byte temporary result, shift the result right by "imm8" bytes, and store the low 16 bytes in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*128 tmp[255:0] := ((a[i+127:i] << 128)[255:0] OR b[i+127:i]) >> (imm8*8) tmp_dst[i+127:i] := tmp[127:0] ENDFOR FOR j := 0 to 63 i := j*8 IF k[j] dst[i+7:i] := tmp_dst[i+7:i] ELSE dst[i+7:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512VL AVX512BW Miscellaneous Concatenate pairs of 16-byte blocks in "a" and "b" into a 32-byte temporary result, shift the result right by "imm8" bytes, and store the low 16 bytes in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). tmp_dst[255:0] := ((a[127:0] << 128)[255:0] OR b[127:0]) >> (imm8*8) FOR j := 0 to 15 i := j*8 IF k[j] dst[i+7:i] := tmp_dst[i+7:i] ELSE dst[i+7:i] := src[i+7:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512BW Miscellaneous Concatenate pairs of 16-byte blocks in "a" and "b" into a 32-byte temporary result, shift the result right by "imm8" bytes, and store the low 16 bytes in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). tmp_dst[255:0] := ((a[127:0] << 128)[255:0] OR b[127:0]) >> (imm8*8) FOR j := 0 to 15 i := j*8 IF k[j] dst[i+7:i] := tmp_dst[i+7:i] ELSE dst[i+7:i] := 0 FI ENDFOR dst[MAX:128] := 0
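The concatenate-and-shift operation above works independently on each 128-bit block. The following C sketch models a single block with byte arrays, placing "b" in the low half and "a" in the high half of the 32-byte temporary, as the pseudocode does. The function name is hypothetical, and masking is left out to keep the shift itself visible.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of one 128-bit block: build the 32-byte
   value a:b, shift it right by imm8 bytes, keep the low 16 bytes. */
void alignr_block(uint8_t dst[16], const uint8_t a[16],
                  const uint8_t b[16], unsigned imm8)
{
    uint8_t tmp[32];
    memcpy(tmp, b, 16);        /* bytes 0..15  come from b (low half)  */
    memcpy(tmp + 16, a, 16);   /* bytes 16..31 come from a (high half) */

    for (unsigned j = 0; j < 16; j++) {
        unsigned idx = j + imm8;             /* source byte after the shift */
        dst[j] = (idx < 32) ? tmp[idx] : 0;  /* zeros are shifted in on top */
    }
}
```

With imm8 = 4, the result is the top 12 bytes of "b" followed by the bottom 4 bytes of "a"; with imm8 of 32 or more, every byte is zero.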
Integer AVX512VL AVX512BW Arithmetic Average packed unsigned 8-bit integers in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*8 IF k[j] dst[i+7:i] := (a[i+7:i] + b[i+7:i] + 1) >> 1 ELSE dst[i+7:i] := src[i+7:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512BW Arithmetic Average packed unsigned 8-bit integers in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*8 IF k[j] dst[i+7:i] := (a[i+7:i] + b[i+7:i] + 1) >> 1 ELSE dst[i+7:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512BW Arithmetic Average packed unsigned 8-bit integers in "a" and "b", and store the results in "dst". FOR j := 0 to 63 i := j*8 dst[i+7:i] := (a[i+7:i] + b[i+7:i] + 1) >> 1 ENDFOR dst[MAX:512] := 0
Integer AVX512BW Arithmetic Average packed unsigned 8-bit integers in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 63 i := j*8 IF k[j] dst[i+7:i] := (a[i+7:i] + b[i+7:i] + 1) >> 1 ELSE dst[i+7:i] := src[i+7:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512BW Arithmetic Average packed unsigned 8-bit integers in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 63 i := j*8 IF k[j] dst[i+7:i] := (a[i+7:i] + b[i+7:i] + 1) >> 1 ELSE dst[i+7:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512VL AVX512BW Arithmetic Average packed unsigned 8-bit integers in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*8 IF k[j] dst[i+7:i] := (a[i+7:i] + b[i+7:i] + 1) >> 1 ELSE dst[i+7:i] := src[i+7:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512BW Arithmetic Average packed unsigned 8-bit integers in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*8 IF k[j] dst[i+7:i] := (a[i+7:i] + b[i+7:i] + 1) >> 1 ELSE dst[i+7:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512BW Arithmetic Average packed unsigned 16-bit integers in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*16 IF k[j] dst[i+15:i] := (a[i+15:i] + b[i+15:i] + 1) >> 1 ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512BW Arithmetic Average packed unsigned 16-bit integers in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*16 IF k[j] dst[i+15:i] := (a[i+15:i] + b[i+15:i] + 1) >> 1 ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512BW Arithmetic Average packed unsigned 16-bit integers in "a" and "b", and store the results in "dst". FOR j := 0 to 31 i := j*16 dst[i+15:i] := (a[i+15:i] + b[i+15:i] + 1) >> 1 ENDFOR dst[MAX:512] := 0
Integer AVX512BW Arithmetic Average packed unsigned 16-bit integers in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*16 IF k[j] dst[i+15:i] := (a[i+15:i] + b[i+15:i] + 1) >> 1 ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512BW Arithmetic Average packed unsigned 16-bit integers in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*16 IF k[j] dst[i+15:i] := (a[i+15:i] + b[i+15:i] + 1) >> 1 ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512VL AVX512BW Arithmetic Average packed unsigned 16-bit integers in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*16 IF k[j] dst[i+15:i] := (a[i+15:i] + b[i+15:i] + 1) >> 1 ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512BW Arithmetic Average packed unsigned 16-bit integers in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*16 IF k[j] dst[i+15:i] := (a[i+15:i] + b[i+15:i] + 1) >> 1 ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:128] := 0
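The average entries above compute (a + b + 1) >> 1, which rounds halves upward and must not lose the carry out of the lane. This per-lane sketch in C widens the operands before adding so the carry survives. The function names are illustrative only.

```c
#include <stdint.h>

/* One lane of the rounding unsigned 8-bit average. */
uint8_t avg_epu8_lane(uint8_t a, uint8_t b)
{
    /* Sum is formed in 32 bits; 255 + 255 + 1 = 511 still fits. */
    return (uint8_t)(((uint32_t)a + (uint32_t)b + 1u) >> 1);
}

/* One lane of the rounding unsigned 16-bit average. */
uint16_t avg_epu16_lane(uint16_t a, uint16_t b)
{
    return (uint16_t)(((uint32_t)a + (uint32_t)b + 1u) >> 1);
}
```

Because of the added 1, the average of 1 and 2 is 2, not 1.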
Integer AVX512VL AVX512BW Miscellaneous Blend packed 8-bit integers from "a" and "b" using control mask "k", and store the results in "dst". FOR j := 0 to 31 i := j*8 IF k[j] dst[i+7:i] := b[i+7:i] ELSE dst[i+7:i] := a[i+7:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512BW Miscellaneous Blend packed 8-bit integers from "a" and "b" using control mask "k", and store the results in "dst". FOR j := 0 to 63 i := j*8 IF k[j] dst[i+7:i] := b[i+7:i] ELSE dst[i+7:i] := a[i+7:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512VL AVX512BW Miscellaneous Blend packed 8-bit integers from "a" and "b" using control mask "k", and store the results in "dst". FOR j := 0 to 15 i := j*8 IF k[j] dst[i+7:i] := b[i+7:i] ELSE dst[i+7:i] := a[i+7:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512BW Miscellaneous Blend packed 16-bit integers from "a" and "b" using control mask "k", and store the results in "dst". FOR j := 0 to 15 i := j*16 IF k[j] dst[i+15:i] := b[i+15:i] ELSE dst[i+15:i] := a[i+15:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512BW Miscellaneous Blend packed 16-bit integers from "a" and "b" using control mask "k", and store the results in "dst". FOR j := 0 to 31 i := j*16 IF k[j] dst[i+15:i] := b[i+15:i] ELSE dst[i+15:i] := a[i+15:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512VL AVX512BW Miscellaneous Blend packed 16-bit integers from "a" and "b" using control mask "k", and store the results in "dst". FOR j := 0 to 7 i := j*16 IF k[j] dst[i+15:i] := b[i+15:i] ELSE dst[i+15:i] := a[i+15:i] FI ENDFOR dst[MAX:128] := 0
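In the blend entries above, a set mask bit selects the lane from "b" and a clear bit selects it from "a"; there is no "src" operand. The following C sketch shows the 8-lane, 16-bit case, with a hypothetical function name and arrays standing in for vector registers.

```c
#include <stdint.h>

/* Hypothetical scalar model: blend eight 16-bit lanes under mask k. */
void mask_blend_epi16_128(uint16_t dst[8], uint8_t k,
                          const uint16_t a[8], const uint16_t b[8])
{
    for (unsigned j = 0; j < 8; j++)
        dst[j] = ((k >> j) & 1u) ? b[j] : a[j]; /* set bit picks b, clear picks a */
}
```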
Integer AVX512VL AVX512BW Miscellaneous Broadcast the low packed 8-bit integer from "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*8 IF k[j] dst[i+7:i] := a[7:0] ELSE dst[i+7:i] := src[i+7:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512BW Set Broadcast 8-bit integer "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*8 IF k[j] dst[i+7:i] := a[7:0] ELSE dst[i+7:i] := src[i+7:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512BW Miscellaneous Broadcast the low packed 8-bit integer from "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*8 IF k[j] dst[i+7:i] := a[7:0] ELSE dst[i+7:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512BW Set Broadcast 8-bit integer "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*8 IF k[j] dst[i+7:i] := a[7:0] ELSE dst[i+7:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512BW Miscellaneous Broadcast the low packed 8-bit integer from "a" to all elements of "dst". FOR j := 0 to 63 i := j*8 dst[i+7:i] := a[7:0] ENDFOR dst[MAX:512] := 0
Integer AVX512BW Miscellaneous Broadcast the low packed 8-bit integer from "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 63 i := j*8 IF k[j] dst[i+7:i] := a[7:0] ELSE dst[i+7:i] := src[i+7:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512BW Set Broadcast 8-bit integer "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 63 i := j*8 IF k[j] dst[i+7:i] := a[7:0] ELSE dst[i+7:i] := src[i+7:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512BW Miscellaneous Broadcast the low packed 8-bit integer from "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 63 i := j*8 IF k[j] dst[i+7:i] := a[7:0] ELSE dst[i+7:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512BW Set Broadcast 8-bit integer "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 63 i := j*8 IF k[j] dst[i+7:i] := a[7:0] ELSE dst[i+7:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512VL AVX512BW Miscellaneous Broadcast the low packed 8-bit integer from "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*8 IF k[j] dst[i+7:i] := a[7:0] ELSE dst[i+7:i] := src[i+7:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512BW Set Broadcast 8-bit integer "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*8 IF k[j] dst[i+7:i] := a[7:0] ELSE dst[i+7:i] := src[i+7:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512BW Miscellaneous Broadcast the low packed 8-bit integer from "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*8 IF k[j] dst[i+7:i] := a[7:0] ELSE dst[i+7:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512BW Set Broadcast 8-bit integer "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*8 IF k[j] dst[i+7:i] := a[7:0] ELSE dst[i+7:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512BW Miscellaneous Broadcast the low packed 16-bit integer from "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*16 IF k[j] dst[i+15:i] := a[15:0] ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512BW Set Broadcast 16-bit integer "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*16 IF k[j] dst[i+15:i] := a[15:0] ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512BW Miscellaneous Broadcast the low packed 16-bit integer from "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*16 IF k[j] dst[i+15:i] := a[15:0] ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512BW Set Broadcast 16-bit integer "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*16 IF k[j] dst[i+15:i] := a[15:0] ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512BW Miscellaneous Broadcast the low packed 16-bit integer from "a" to all elements of "dst". FOR j := 0 to 31 i := j*16 dst[i+15:i] := a[15:0] ENDFOR dst[MAX:512] := 0
Integer AVX512BW Miscellaneous Broadcast the low packed 16-bit integer from "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*16 IF k[j] dst[i+15:i] := a[15:0] ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512BW Set Broadcast 16-bit integer "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*16 IF k[j] dst[i+15:i] := a[15:0] ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512BW Miscellaneous Broadcast the low packed 16-bit integer from "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*16 IF k[j] dst[i+15:i] := a[15:0] ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512BW Set Broadcast 16-bit integer "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*16 IF k[j] dst[i+15:i] := a[15:0] ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512VL AVX512BW Miscellaneous Broadcast the low packed 16-bit integer from "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*16 IF k[j] dst[i+15:i] := a[15:0] ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512BW Set Broadcast 16-bit integer "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*16 IF k[j] dst[i+15:i] := a[15:0] ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512BW Miscellaneous Broadcast the low packed 16-bit integer from "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*16 IF k[j] dst[i+15:i] := a[15:0] ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512BW Set Broadcast 16-bit integer "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*16 IF k[j] dst[i+15:i] := a[15:0] ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:128] := 0
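Going by the pseudocode in the broadcast entries above, a zeromask form gives the same result as the corresponding writemask form called with an all-zero "src". The C sketch below makes that relationship explicit for the 8-lane, 16-bit scalar broadcast. Both function names are hypothetical, and arrays stand in for vector registers.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model: masked broadcast of a 16-bit value, 8 lanes. */
void mask_set1_epi16_128(uint16_t dst[8], const uint16_t src[8],
                         uint8_t k, uint16_t a)
{
    for (unsigned j = 0; j < 8; j++)
        dst[j] = ((k >> j) & 1u) ? a : src[j]; /* clear bit: copy from src */
}

/* Zeromask form expressed as the writemask form over a zeroed src. */
void maskz_set1_epi16_128(uint16_t dst[8], uint8_t k, uint16_t a)
{
    uint16_t zero[8];
    memset(zero, 0, sizeof zero);
    mask_set1_epi16_128(dst, zero, k, a);
}
```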
Integer Mask AVX512VL AVX512BW Compare Compare packed signed 8-bit integers in "a" and "b" based on the comparison operator specified by "imm8", and store the results in mask vector "k". CASE (imm8[2:0]) OF 0: OP := _MM_CMPINT_EQ 1: OP := _MM_CMPINT_LT 2: OP := _MM_CMPINT_LE 3: OP := _MM_CMPINT_FALSE 4: OP := _MM_CMPINT_NE 5: OP := _MM_CMPINT_NLT 6: OP := _MM_CMPINT_NLE 7: OP := _MM_CMPINT_TRUE ESAC FOR j := 0 to 31 i := j*8 k[j] := ( a[i+7:i] OP b[i+7:i] ) ? 1 : 0 ENDFOR k[MAX:32] := 0
Integer Mask AVX512VL AVX512BW Compare Compare packed signed 8-bit integers in "a" and "b" for equality, and store the results in mask vector "k". FOR j := 0 to 31 i := j*8 k[j] := ( a[i+7:i] == b[i+7:i] ) ? 1 : 0 ENDFOR k[MAX:32] := 0
Integer Mask AVX512VL AVX512BW Compare Compare packed signed 8-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k". FOR j := 0 to 31 i := j*8 k[j] := ( a[i+7:i] >= b[i+7:i] ) ? 1 : 0 ENDFOR k[MAX:32] := 0
Integer Mask AVX512VL AVX512BW Compare Compare packed signed 8-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k". FOR j := 0 to 31 i := j*8 k[j] := ( a[i+7:i] > b[i+7:i] ) ? 1 : 0 ENDFOR k[MAX:32] := 0
Integer Mask AVX512VL AVX512BW Compare Compare packed signed 8-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k". FOR j := 0 to 31 i := j*8 k[j] := ( a[i+7:i] <= b[i+7:i] ) ? 1 : 0 ENDFOR k[MAX:32] := 0
Integer Mask AVX512VL AVX512BW Compare Compare packed signed 8-bit integers in "a" and "b" for less-than, and store the results in mask vector "k". FOR j := 0 to 31 i := j*8 k[j] := ( a[i+7:i] < b[i+7:i] ) ? 1 : 0 ENDFOR k[MAX:32] := 0
Integer Mask AVX512VL AVX512BW Compare Compare packed signed 8-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k". FOR j := 0 to 31 i := j*8 k[j] := ( a[i+7:i] != b[i+7:i] ) ? 1 : 0 ENDFOR k[MAX:32] := 0
Integer Mask AVX512VL AVX512BW Compare Compare packed signed 8-bit integers in "a" and "b" based on the comparison operator specified by "imm8", and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). CASE (imm8[2:0]) OF 0: OP := _MM_CMPINT_EQ 1: OP := _MM_CMPINT_LT 2: OP := _MM_CMPINT_LE 3: OP := _MM_CMPINT_FALSE 4: OP := _MM_CMPINT_NE 5: OP := _MM_CMPINT_NLT 6: OP := _MM_CMPINT_NLE 7: OP := _MM_CMPINT_TRUE ESAC FOR j := 0 to 31 i := j*8 IF k1[j] k[j] := ( a[i+7:i] OP b[i+7:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:32] := 0
Integer Mask AVX512VL AVX512BW Compare Compare packed signed 8-bit integers in "a" and "b" for equality, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*8 IF k1[j] k[j] := ( a[i+7:i] == b[i+7:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:32] := 0
Integer Mask AVX512VL AVX512BW Compare Compare packed signed 8-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*8 IF k1[j] k[j] := ( a[i+7:i] >= b[i+7:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:32] := 0
Integer Mask AVX512VL AVX512BW Compare Compare packed signed 8-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*8 IF k1[j] k[j] := ( a[i+7:i] > b[i+7:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:32] := 0
Integer Mask AVX512VL AVX512BW Compare Compare packed signed 8-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*8 IF k1[j] k[j] := ( a[i+7:i] <= b[i+7:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:32] := 0
Integer Mask AVX512VL AVX512BW Compare Compare packed signed 8-bit integers in "a" and "b" for less-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*8 IF k1[j] k[j] := ( a[i+7:i] < b[i+7:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:32] := 0
Integer Mask AVX512VL AVX512BW Compare Compare packed signed 8-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*8 IF k1[j] k[j] := ( a[i+7:i] != b[i+7:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:32] := 0
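The imm8-driven compare above selects one of eight predicates from imm8[2:0], and the "k1" zeromask then clears result bits for unselected lanes. The following scalar C sketch covers the 32-lane (256-bit) case. The function names are hypothetical; the predicate numbering follows the CASE table in the pseudocode.

```c
#include <stdbool.h>
#include <stdint.h>

/* Evaluate the predicate chosen by the low three bits of imm8
   on one pair of signed 8-bit lanes. */
bool cmpint_signed8(int8_t a, int8_t b, unsigned imm8)
{
    switch (imm8 & 7u) {
    case 0:  return a == b;  /* _MM_CMPINT_EQ    */
    case 1:  return a <  b;  /* _MM_CMPINT_LT    */
    case 2:  return a <= b;  /* _MM_CMPINT_LE    */
    case 3:  return false;   /* _MM_CMPINT_FALSE */
    case 4:  return a != b;  /* _MM_CMPINT_NE    */
    case 5:  return a >= b;  /* _MM_CMPINT_NLT   */
    case 6:  return a >  b;  /* _MM_CMPINT_NLE   */
    default: return true;    /* _MM_CMPINT_TRUE (7) */
    }
}

/* Hypothetical scalar model: 32 lanes, result bit j is set only when
   k1[j] is set and the predicate holds for lane j. */
uint32_t mask_cmp_epi8_mask_256(uint32_t k1, const int8_t a[32],
                                const int8_t b[32], unsigned imm8)
{
    uint32_t k = 0;
    for (unsigned j = 0; j < 32; j++)
        if (((k1 >> j) & 1u) && cmpint_signed8(a[j], b[j], imm8))
            k |= (uint32_t)1 << j;
    return k;
}
```

Passing an all-ones "k1" reproduces the unmasked compare, and the named forms (equality, less-than, and so on) correspond to fixed predicate choices in the same table.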
Integer Mask AVX512BW Compare Compare packed signed 8-bit integers in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k". CASE (imm8[2:0]) OF 0: OP := _MM_CMPINT_EQ 1: OP := _MM_CMPINT_LT 2: OP := _MM_CMPINT_LE 3: OP := _MM_CMPINT_FALSE 4: OP := _MM_CMPINT_NE 5: OP := _MM_CMPINT_NLT 6: OP := _MM_CMPINT_NLE 7: OP := _MM_CMPINT_TRUE ESAC FOR j := 0 to 63 i := j*8 k[j] := ( a[i+7:i] OP b[i+7:i] ) ? 1 : 0 ENDFOR k[MAX:64] := 0
Integer Mask AVX512BW Compare Compare packed signed 8-bit integers in "a" and "b" for equality, and store the results in mask vector "k". FOR j := 0 to 63 i := j*8 k[j] := ( a[i+7:i] == b[i+7:i] ) ? 1 : 0 ENDFOR k[MAX:64] := 0
Integer Mask AVX512BW Compare Compare packed signed 8-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k". FOR j := 0 to 63 i := j*8 k[j] := ( a[i+7:i] >= b[i+7:i] ) ? 1 : 0 ENDFOR k[MAX:64] := 0
Integer Mask AVX512BW Compare Compare packed signed 8-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k". FOR j := 0 to 63 i := j*8 k[j] := ( a[i+7:i] > b[i+7:i] ) ? 1 : 0 ENDFOR k[MAX:64] := 0
Integer Mask AVX512BW Compare Compare packed signed 8-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k". FOR j := 0 to 63 i := j*8 k[j] := ( a[i+7:i] <= b[i+7:i] ) ? 1 : 0 ENDFOR k[MAX:64] := 0
Integer Mask AVX512BW Compare Compare packed signed 8-bit integers in "a" and "b" for less-than, and store the results in mask vector "k". FOR j := 0 to 63 i := j*8 k[j] := ( a[i+7:i] < b[i+7:i] ) ? 1 : 0 ENDFOR k[MAX:64] := 0
Integer Mask AVX512BW Compare Compare packed signed 8-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k". FOR j := 0 to 63 i := j*8 k[j] := ( a[i+7:i] != b[i+7:i] ) ? 1 : 0 ENDFOR k[MAX:64] := 0
Integer Mask AVX512BW Compare Compare packed signed 8-bit integers in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). CASE (imm8[2:0]) OF 0: OP := _MM_CMPINT_EQ 1: OP := _MM_CMPINT_LT 2: OP := _MM_CMPINT_LE 3: OP := _MM_CMPINT_FALSE 4: OP := _MM_CMPINT_NE 5: OP := _MM_CMPINT_NLT 6: OP := _MM_CMPINT_NLE 7: OP := _MM_CMPINT_TRUE ESAC FOR j := 0 to 63 i := j*8 IF k1[j] k[j] := ( a[i+7:i] OP b[i+7:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:64] := 0
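The zeromasked "imm8" compare in the entry above can be modelled lane by lane in scalar C. The following is only an illustrative sketch, not the intrinsic itself: the function name mask_cmp_epi8_ref and the CMP_* constants are hypothetical stand-ins for the _MM_CMPINT_* encodings in the CASE table, and each 8-bit lane is passed as one array element.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for the predicate encodings selected by imm8[2:0]. */
enum { CMP_EQ = 0, CMP_LT = 1, CMP_LE = 2, CMP_FALSE = 3,
       CMP_NE = 4, CMP_NLT = 5, CMP_NLE = 6, CMP_TRUE = 7 };

/* Evaluate one pair of lanes under the selected predicate. */
int apply_pred(int x, int y, int imm8)
{
    switch (imm8 & 7) {            /* only imm8[2:0] is consulted */
    case CMP_EQ:    return x == y;
    case CMP_LT:    return x <  y;
    case CMP_LE:    return x <= y;
    case CMP_FALSE: return 0;
    case CMP_NE:    return x != y;
    case CMP_NLT:   return x >= y; /* not-less-than */
    case CMP_NLE:   return x >  y; /* not-less-than-or-equal */
    default:        return 1;      /* CMP_TRUE */
    }
}

/* Scalar model of the pseudocode: 64 signed 8-bit lanes, zeromask k1.
   Bit j of the result is set only if k1[j] is set and the predicate holds. */
uint64_t mask_cmp_epi8_ref(uint64_t k1, const int8_t a[64],
                           const int8_t b[64], int imm8)
{
    assert(a != 0 && b != 0);
    uint64_t k = 0;
    for (int j = 0; j < 64; j++) {
        if (((k1 >> j) & 1) && apply_pred(a[j], b[j], imm8))
            k |= (uint64_t)1 << j;
    }
    return k;
}
```

Passing an all-ones "k1" reproduces the unmasked form, since no lane is then forced to zero.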
Integer Mask AVX512BW Compare Compare packed signed 8-bit integers in "a" and "b" for equality, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 63 i := j*8 IF k1[j] k[j] := ( a[i+7:i] == b[i+7:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:64] := 0
Integer Mask AVX512BW Compare Compare packed signed 8-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 63 i := j*8 IF k1[j] k[j] := ( a[i+7:i] >= b[i+7:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:64] := 0
Integer Mask AVX512BW Compare Compare packed signed 8-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 63 i := j*8 IF k1[j] k[j] := ( a[i+7:i] > b[i+7:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:64] := 0
Integer Mask AVX512BW Compare Compare packed signed 8-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 63 i := j*8 IF k1[j] k[j] := ( a[i+7:i] <= b[i+7:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:64] := 0
Integer Mask AVX512BW Compare Compare packed signed 8-bit integers in "a" and "b" for less-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 63 i := j*8 IF k1[j] k[j] := ( a[i+7:i] < b[i+7:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:64] := 0
Integer Mask AVX512BW Compare Compare packed signed 8-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 63 i := j*8 IF k1[j] k[j] := ( a[i+7:i] != b[i+7:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:64] := 0
Integer Mask AVX512VL AVX512BW Compare Compare packed signed 8-bit integers in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k". CASE (imm8[2:0]) OF 0: OP := _MM_CMPINT_EQ 1: OP := _MM_CMPINT_LT 2: OP := _MM_CMPINT_LE 3: OP := _MM_CMPINT_FALSE 4: OP := _MM_CMPINT_NE 5: OP := _MM_CMPINT_NLT 6: OP := _MM_CMPINT_NLE 7: OP := _MM_CMPINT_TRUE ESAC FOR j := 0 to 15 i := j*8 k[j] := ( a[i+7:i] OP b[i+7:i] ) ? 1 : 0 ENDFOR k[MAX:16] := 0
Integer Mask AVX512VL AVX512BW Compare Compare packed signed 8-bit integers in "a" and "b" for equality, and store the results in mask vector "k". FOR j := 0 to 15 i := j*8 k[j] := ( a[i+7:i] == b[i+7:i] ) ? 1 : 0 ENDFOR k[MAX:16] := 0
Integer Mask AVX512VL AVX512BW Compare Compare packed signed 8-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k". FOR j := 0 to 15 i := j*8 k[j] := ( a[i+7:i] >= b[i+7:i] ) ? 1 : 0 ENDFOR k[MAX:16] := 0
Integer Mask AVX512VL AVX512BW Compare Compare packed signed 8-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k". FOR j := 0 to 15 i := j*8 k[j] := ( a[i+7:i] > b[i+7:i] ) ? 1 : 0 ENDFOR k[MAX:16] := 0
Integer Mask AVX512VL AVX512BW Compare Compare packed signed 8-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k". FOR j := 0 to 15 i := j*8 k[j] := ( a[i+7:i] <= b[i+7:i] ) ? 1 : 0 ENDFOR k[MAX:16] := 0
Integer Mask AVX512VL AVX512BW Compare Compare packed signed 8-bit integers in "a" and "b" for less-than, and store the results in mask vector "k". FOR j := 0 to 15 i := j*8 k[j] := ( a[i+7:i] < b[i+7:i] ) ? 1 : 0 ENDFOR k[MAX:16] := 0
Integer Mask AVX512VL AVX512BW Compare Compare packed signed 8-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k". FOR j := 0 to 15 i := j*8 k[j] := ( a[i+7:i] != b[i+7:i] ) ? 1 : 0 ENDFOR k[MAX:16] := 0
Integer Mask AVX512VL AVX512BW Compare Compare packed signed 8-bit integers in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). CASE (imm8[2:0]) OF 0: OP := _MM_CMPINT_EQ 1: OP := _MM_CMPINT_LT 2: OP := _MM_CMPINT_LE 3: OP := _MM_CMPINT_FALSE 4: OP := _MM_CMPINT_NE 5: OP := _MM_CMPINT_NLT 6: OP := _MM_CMPINT_NLE 7: OP := _MM_CMPINT_TRUE ESAC FOR j := 0 to 15 i := j*8 IF k1[j] k[j] := ( a[i+7:i] OP b[i+7:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:16] := 0
Integer Mask AVX512VL AVX512BW Compare Compare packed signed 8-bit integers in "a" and "b" for equality, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*8 IF k1[j] k[j] := ( a[i+7:i] == b[i+7:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:16] := 0
Integer Mask AVX512VL AVX512BW Compare Compare packed signed 8-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*8 IF k1[j] k[j] := ( a[i+7:i] >= b[i+7:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:16] := 0
Integer Mask AVX512VL AVX512BW Compare Compare packed signed 8-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*8 IF k1[j] k[j] := ( a[i+7:i] > b[i+7:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:16] := 0
Integer Mask AVX512VL AVX512BW Compare Compare packed signed 8-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*8 IF k1[j] k[j] := ( a[i+7:i] <= b[i+7:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:16] := 0
Integer Mask AVX512VL AVX512BW Compare Compare packed signed 8-bit integers in "a" and "b" for less-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*8 IF k1[j] k[j] := ( a[i+7:i] < b[i+7:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:16] := 0
Integer Mask AVX512VL AVX512BW Compare Compare packed signed 8-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*8 IF k1[j] k[j] := ( a[i+7:i] != b[i+7:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:16] := 0
Integer Mask AVX512VL AVX512BW Compare Compare packed unsigned 8-bit integers in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k". CASE (imm8[2:0]) OF 0: OP := _MM_CMPINT_EQ 1: OP := _MM_CMPINT_LT 2: OP := _MM_CMPINT_LE 3: OP := _MM_CMPINT_FALSE 4: OP := _MM_CMPINT_NE 5: OP := _MM_CMPINT_NLT 6: OP := _MM_CMPINT_NLE 7: OP := _MM_CMPINT_TRUE ESAC FOR j := 0 to 31 i := j*8 k[j] := ( a[i+7:i] OP b[i+7:i] ) ? 1 : 0 ENDFOR k[MAX:32] := 0
Integer Mask AVX512VL AVX512BW Compare Compare packed unsigned 8-bit integers in "a" and "b" for equality, and store the results in mask vector "k". FOR j := 0 to 31 i := j*8 k[j] := ( a[i+7:i] == b[i+7:i] ) ? 1 : 0 ENDFOR k[MAX:32] := 0
Integer Mask AVX512VL AVX512BW Compare Compare packed unsigned 8-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k". FOR j := 0 to 31 i := j*8 k[j] := ( a[i+7:i] >= b[i+7:i] ) ? 1 : 0 ENDFOR k[MAX:32] := 0
Integer Mask AVX512VL AVX512BW Compare Compare packed unsigned 8-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k". FOR j := 0 to 31 i := j*8 k[j] := ( a[i+7:i] > b[i+7:i] ) ? 1 : 0 ENDFOR k[MAX:32] := 0
Integer Mask AVX512VL AVX512BW Compare Compare packed unsigned 8-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k". FOR j := 0 to 31 i := j*8 k[j] := ( a[i+7:i] <= b[i+7:i] ) ? 1 : 0 ENDFOR k[MAX:32] := 0
Integer Mask AVX512VL AVX512BW Compare Compare packed unsigned 8-bit integers in "a" and "b" for less-than, and store the results in mask vector "k". FOR j := 0 to 31 i := j*8 k[j] := ( a[i+7:i] < b[i+7:i] ) ? 1 : 0 ENDFOR k[MAX:32] := 0
Integer Mask AVX512VL AVX512BW Compare Compare packed unsigned 8-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k". FOR j := 0 to 31 i := j*8 k[j] := ( a[i+7:i] != b[i+7:i] ) ? 1 : 0 ENDFOR k[MAX:32] := 0
Integer Mask AVX512VL AVX512BW Compare Compare packed unsigned 8-bit integers in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). CASE (imm8[2:0]) OF 0: OP := _MM_CMPINT_EQ 1: OP := _MM_CMPINT_LT 2: OP := _MM_CMPINT_LE 3: OP := _MM_CMPINT_FALSE 4: OP := _MM_CMPINT_NE 5: OP := _MM_CMPINT_NLT 6: OP := _MM_CMPINT_NLE 7: OP := _MM_CMPINT_TRUE ESAC FOR j := 0 to 31 i := j*8 IF k1[j] k[j] := ( a[i+7:i] OP b[i+7:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:32] := 0
Integer Mask AVX512VL AVX512BW Compare Compare packed unsigned 8-bit integers in "a" and "b" for equality, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*8 IF k1[j] k[j] := ( a[i+7:i] == b[i+7:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:32] := 0
Integer Mask AVX512VL AVX512BW Compare Compare packed unsigned 8-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*8 IF k1[j] k[j] := ( a[i+7:i] >= b[i+7:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:32] := 0
Integer Mask AVX512VL AVX512BW Compare Compare packed unsigned 8-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*8 IF k1[j] k[j] := ( a[i+7:i] > b[i+7:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:32] := 0
Integer Mask AVX512VL AVX512BW Compare Compare packed unsigned 8-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*8 IF k1[j] k[j] := ( a[i+7:i] <= b[i+7:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:32] := 0
Integer Mask AVX512VL AVX512BW Compare Compare packed unsigned 8-bit integers in "a" and "b" for less-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*8 IF k1[j] k[j] := ( a[i+7:i] < b[i+7:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:32] := 0
Integer Mask AVX512VL AVX512BW Compare Compare packed unsigned 8-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*8 IF k1[j] k[j] := ( a[i+7:i] != b[i+7:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:32] := 0
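The signed and unsigned 8-bit entries share the same pseudocode and differ only in how each byte is interpreted before the ordering comparison. A minimal scalar sketch of that difference follows, assuming lanes are supplied as raw bytes; the function names lt_bytes_signed and lt_bytes_unsigned are hypothetical and do not correspond to any intrinsic.

```c
#include <assert.h>
#include <stdint.h>

/* Less-than over n raw bytes (n <= 32), each read as a signed
   two's-complement value. Returns one result bit per lane. */
uint32_t lt_bytes_signed(const uint8_t *a, const uint8_t *b, int n)
{
    assert(n >= 0 && n <= 32);
    uint32_t k = 0;
    for (int j = 0; j < n; j++) {
        /* Map bit patterns 0x80..0xFF to -128..-1 explicitly. */
        int x = a[j] >= 128 ? (int)a[j] - 256 : (int)a[j];
        int y = b[j] >= 128 ? (int)b[j] - 256 : (int)b[j];
        if (x < y)
            k |= (uint32_t)1 << j;
    }
    return k;
}

/* Less-than over the same raw bytes, each read as an unsigned value. */
uint32_t lt_bytes_unsigned(const uint8_t *a, const uint8_t *b, int n)
{
    assert(n >= 0 && n <= 32);
    uint32_t k = 0;
    for (int j = 0; j < n; j++) {
        if (a[j] < b[j])
            k |= (uint32_t)1 << j;
    }
    return k;
}
```

For example, the byte 0xFF compared with 0x01 is less-than when read as signed (-1 < 1) but not when read as unsigned (255 > 1). Equality and not-equal give the same result under either interpretation.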
Integer Mask AVX512BW Compare Compare packed unsigned 8-bit integers in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k". CASE (imm8[2:0]) OF 0: OP := _MM_CMPINT_EQ 1: OP := _MM_CMPINT_LT 2: OP := _MM_CMPINT_LE 3: OP := _MM_CMPINT_FALSE 4: OP := _MM_CMPINT_NE 5: OP := _MM_CMPINT_NLT 6: OP := _MM_CMPINT_NLE 7: OP := _MM_CMPINT_TRUE ESAC FOR j := 0 to 63 i := j*8 k[j] := ( a[i+7:i] OP b[i+7:i] ) ? 1 : 0 ENDFOR k[MAX:64] := 0
Integer Mask AVX512BW Compare Compare packed unsigned 8-bit integers in "a" and "b" for equality, and store the results in mask vector "k". FOR j := 0 to 63 i := j*8 k[j] := ( a[i+7:i] == b[i+7:i] ) ? 1 : 0 ENDFOR k[MAX:64] := 0
Integer Mask AVX512BW Compare Compare packed unsigned 8-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k". FOR j := 0 to 63 i := j*8 k[j] := ( a[i+7:i] >= b[i+7:i] ) ? 1 : 0 ENDFOR k[MAX:64] := 0
Integer Mask AVX512BW Compare Compare packed unsigned 8-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k". FOR j := 0 to 63 i := j*8 k[j] := ( a[i+7:i] > b[i+7:i] ) ? 1 : 0 ENDFOR k[MAX:64] := 0
Integer Mask AVX512BW Compare Compare packed unsigned 8-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k". FOR j := 0 to 63 i := j*8 k[j] := ( a[i+7:i] <= b[i+7:i] ) ? 1 : 0 ENDFOR k[MAX:64] := 0
Integer Mask AVX512BW Compare Compare packed unsigned 8-bit integers in "a" and "b" for less-than, and store the results in mask vector "k". FOR j := 0 to 63 i := j*8 k[j] := ( a[i+7:i] < b[i+7:i] ) ? 1 : 0 ENDFOR k[MAX:64] := 0
Integer Mask AVX512BW Compare Compare packed unsigned 8-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k". FOR j := 0 to 63 i := j*8 k[j] := ( a[i+7:i] != b[i+7:i] ) ? 1 : 0 ENDFOR k[MAX:64] := 0
Integer Mask AVX512BW Compare Compare packed unsigned 8-bit integers in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). CASE (imm8[2:0]) OF 0: OP := _MM_CMPINT_EQ 1: OP := _MM_CMPINT_LT 2: OP := _MM_CMPINT_LE 3: OP := _MM_CMPINT_FALSE 4: OP := _MM_CMPINT_NE 5: OP := _MM_CMPINT_NLT 6: OP := _MM_CMPINT_NLE 7: OP := _MM_CMPINT_TRUE ESAC FOR j := 0 to 63 i := j*8 IF k1[j] k[j] := ( a[i+7:i] OP b[i+7:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:64] := 0
Integer Mask AVX512BW Compare Compare packed unsigned 8-bit integers in "a" and "b" for equality, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 63 i := j*8 IF k1[j] k[j] := ( a[i+7:i] == b[i+7:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:64] := 0
Integer Mask AVX512BW Compare Compare packed unsigned 8-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 63 i := j*8 IF k1[j] k[j] := ( a[i+7:i] >= b[i+7:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:64] := 0
Integer Mask AVX512BW Compare Compare packed unsigned 8-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 63 i := j*8 IF k1[j] k[j] := ( a[i+7:i] > b[i+7:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:64] := 0
Integer Mask AVX512BW Compare Compare packed unsigned 8-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 63 i := j*8 IF k1[j] k[j] := ( a[i+7:i] <= b[i+7:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:64] := 0
Integer Mask AVX512BW Compare Compare packed unsigned 8-bit integers in "a" and "b" for less-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 63 i := j*8 IF k1[j] k[j] := ( a[i+7:i] < b[i+7:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:64] := 0
Integer Mask AVX512BW Compare Compare packed unsigned 8-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 63 i := j*8 IF k1[j] k[j] := ( a[i+7:i] != b[i+7:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:64] := 0
Integer Mask AVX512VL AVX512BW Compare Compare packed unsigned 8-bit integers in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k". CASE (imm8[2:0]) OF 0: OP := _MM_CMPINT_EQ 1: OP := _MM_CMPINT_LT 2: OP := _MM_CMPINT_LE 3: OP := _MM_CMPINT_FALSE 4: OP := _MM_CMPINT_NE 5: OP := _MM_CMPINT_NLT 6: OP := _MM_CMPINT_NLE 7: OP := _MM_CMPINT_TRUE ESAC FOR j := 0 to 15 i := j*8 k[j] := ( a[i+7:i] OP b[i+7:i] ) ? 1 : 0 ENDFOR k[MAX:16] := 0
Integer Mask AVX512VL AVX512BW Compare Compare packed unsigned 8-bit integers in "a" and "b" for equality, and store the results in mask vector "k". FOR j := 0 to 15 i := j*8 k[j] := ( a[i+7:i] == b[i+7:i] ) ? 1 : 0 ENDFOR k[MAX:16] := 0
Integer Mask AVX512VL AVX512BW Compare Compare packed unsigned 8-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k". FOR j := 0 to 15 i := j*8 k[j] := ( a[i+7:i] >= b[i+7:i] ) ? 1 : 0 ENDFOR k[MAX:16] := 0
Integer Mask AVX512VL AVX512BW Compare Compare packed unsigned 8-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k". FOR j := 0 to 15 i := j*8 k[j] := ( a[i+7:i] > b[i+7:i] ) ? 1 : 0 ENDFOR k[MAX:16] := 0
Integer Mask AVX512VL AVX512BW Compare Compare packed unsigned 8-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k". FOR j := 0 to 15 i := j*8 k[j] := ( a[i+7:i] <= b[i+7:i] ) ? 1 : 0 ENDFOR k[MAX:16] := 0
Integer Mask AVX512VL AVX512BW Compare Compare packed unsigned 8-bit integers in "a" and "b" for less-than, and store the results in mask vector "k". FOR j := 0 to 15 i := j*8 k[j] := ( a[i+7:i] < b[i+7:i] ) ? 1 : 0 ENDFOR k[MAX:16] := 0
Integer Mask AVX512VL AVX512BW Compare Compare packed unsigned 8-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k". FOR j := 0 to 15 i := j*8 k[j] := ( a[i+7:i] != b[i+7:i] ) ? 1 : 0 ENDFOR k[MAX:16] := 0
Integer Mask AVX512VL AVX512BW Compare Compare packed unsigned 8-bit integers in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). CASE (imm8[2:0]) OF 0: OP := _MM_CMPINT_EQ 1: OP := _MM_CMPINT_LT 2: OP := _MM_CMPINT_LE 3: OP := _MM_CMPINT_FALSE 4: OP := _MM_CMPINT_NE 5: OP := _MM_CMPINT_NLT 6: OP := _MM_CMPINT_NLE 7: OP := _MM_CMPINT_TRUE ESAC FOR j := 0 to 15 i := j*8 IF k1[j] k[j] := ( a[i+7:i] OP b[i+7:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:16] := 0
Integer Mask AVX512VL AVX512BW Compare Compare packed unsigned 8-bit integers in "a" and "b" for equality, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*8 IF k1[j] k[j] := ( a[i+7:i] == b[i+7:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:16] := 0
Integer Mask AVX512VL AVX512BW Compare Compare packed unsigned 8-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*8 IF k1[j] k[j] := ( a[i+7:i] >= b[i+7:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:16] := 0
Integer Mask AVX512VL AVX512BW Compare Compare packed unsigned 8-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*8 IF k1[j] k[j] := ( a[i+7:i] > b[i+7:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:16] := 0
Integer Mask AVX512VL AVX512BW Compare Compare packed unsigned 8-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*8 IF k1[j] k[j] := ( a[i+7:i] <= b[i+7:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:16] := 0
Integer Mask AVX512VL AVX512BW Compare Compare packed unsigned 8-bit integers in "a" and "b" for less-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*8 IF k1[j] k[j] := ( a[i+7:i] < b[i+7:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:16] := 0
Integer Mask AVX512VL AVX512BW Compare Compare packed unsigned 8-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*8 IF k1[j] k[j] := ( a[i+7:i] != b[i+7:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:16] := 0
Integer Mask AVX512VL AVX512BW Compare Compare packed unsigned 16-bit integers in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k". CASE (imm8[2:0]) OF 0: OP := _MM_CMPINT_EQ 1: OP := _MM_CMPINT_LT 2: OP := _MM_CMPINT_LE 3: OP := _MM_CMPINT_FALSE 4: OP := _MM_CMPINT_NE 5: OP := _MM_CMPINT_NLT 6: OP := _MM_CMPINT_NLE 7: OP := _MM_CMPINT_TRUE ESAC FOR j := 0 to 15 i := j*16 k[j] := ( a[i+15:i] OP b[i+15:i] ) ? 1 : 0 ENDFOR k[MAX:16] := 0
Integer Mask AVX512VL AVX512BW Compare Compare packed unsigned 16-bit integers in "a" and "b" for equality, and store the results in mask vector "k". FOR j := 0 to 15 i := j*16 k[j] := ( a[i+15:i] == b[i+15:i] ) ? 1 : 0 ENDFOR k[MAX:16] := 0
Integer Mask AVX512VL AVX512BW Compare Compare packed unsigned 16-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k". FOR j := 0 to 15 i := j*16 k[j] := ( a[i+15:i] >= b[i+15:i] ) ? 1 : 0 ENDFOR k[MAX:16] := 0
Integer Mask AVX512VL AVX512BW Compare Compare packed unsigned 16-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k". FOR j := 0 to 15 i := j*16 k[j] := ( a[i+15:i] > b[i+15:i] ) ? 1 : 0 ENDFOR k[MAX:16] := 0
Integer Mask AVX512VL AVX512BW Compare Compare packed unsigned 16-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k". FOR j := 0 to 15 i := j*16 k[j] := ( a[i+15:i] <= b[i+15:i] ) ? 1 : 0 ENDFOR k[MAX:16] := 0
Integer Mask AVX512VL AVX512BW Compare Compare packed unsigned 16-bit integers in "a" and "b" for less-than, and store the results in mask vector "k". FOR j := 0 to 15 i := j*16 k[j] := ( a[i+15:i] < b[i+15:i] ) ? 1 : 0 ENDFOR k[MAX:16] := 0
Integer Mask AVX512VL AVX512BW Compare Compare packed unsigned 16-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k". FOR j := 0 to 15 i := j*16 k[j] := ( a[i+15:i] != b[i+15:i] ) ? 1 : 0 ENDFOR k[MAX:16] := 0
Integer Mask AVX512VL AVX512BW Compare Compare packed unsigned 16-bit integers in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). CASE (imm8[2:0]) OF 0: OP := _MM_CMPINT_EQ 1: OP := _MM_CMPINT_LT 2: OP := _MM_CMPINT_LE 3: OP := _MM_CMPINT_FALSE 4: OP := _MM_CMPINT_NE 5: OP := _MM_CMPINT_NLT 6: OP := _MM_CMPINT_NLE 7: OP := _MM_CMPINT_TRUE ESAC FOR j := 0 to 15 i := j*16 IF k1[j] k[j] := ( a[i+15:i] OP b[i+15:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:16] := 0
Integer Mask AVX512VL AVX512BW Compare Compare packed unsigned 16-bit integers in "a" and "b" for equality, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*16 IF k1[j] k[j] := ( a[i+15:i] == b[i+15:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:16] := 0
Integer Mask AVX512VL AVX512BW Compare Compare packed unsigned 16-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*16 IF k1[j] k[j] := ( a[i+15:i] >= b[i+15:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:16] := 0
Integer Mask AVX512VL AVX512BW Compare Compare packed unsigned 16-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*16 IF k1[j] k[j] := ( a[i+15:i] > b[i+15:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:16] := 0
Integer Mask AVX512VL AVX512BW Compare Compare packed unsigned 16-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*16 IF k1[j] k[j] := ( a[i+15:i] <= b[i+15:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:16] := 0
Integer Mask AVX512VL AVX512BW Compare Compare packed unsigned 16-bit integers in "a" and "b" for less-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*16 IF k1[j] k[j] := ( a[i+15:i] < b[i+15:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:16] := 0
Integer Mask AVX512VL AVX512BW Compare Compare packed unsigned 16-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*16 IF k1[j] k[j] := ( a[i+15:i] != b[i+15:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:16] := 0
Integer Mask AVX512BW Compare Compare packed unsigned 16-bit integers in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k". CASE (imm8[2:0]) OF 0: OP := _MM_CMPINT_EQ 1: OP := _MM_CMPINT_LT 2: OP := _MM_CMPINT_LE 3: OP := _MM_CMPINT_FALSE 4: OP := _MM_CMPINT_NE 5: OP := _MM_CMPINT_NLT 6: OP := _MM_CMPINT_NLE 7: OP := _MM_CMPINT_TRUE ESAC FOR j := 0 to 31 i := j*16 k[j] := ( a[i+15:i] OP b[i+15:i] ) ? 1 : 0 ENDFOR k[MAX:32] := 0
Integer Mask AVX512BW Compare Compare packed unsigned 16-bit integers in "a" and "b" for equality, and store the results in mask vector "k". FOR j := 0 to 31 i := j*16 k[j] := ( a[i+15:i] == b[i+15:i] ) ? 1 : 0 ENDFOR k[MAX:32] := 0
Integer Mask AVX512BW Compare Compare packed unsigned 16-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k". FOR j := 0 to 31 i := j*16 k[j] := ( a[i+15:i] >= b[i+15:i] ) ? 1 : 0 ENDFOR k[MAX:32] := 0
Integer Mask AVX512BW Compare Compare packed unsigned 16-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k". FOR j := 0 to 31 i := j*16 k[j] := ( a[i+15:i] > b[i+15:i] ) ? 1 : 0 ENDFOR k[MAX:32] := 0
Integer Mask AVX512BW Compare Compare packed unsigned 16-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k". FOR j := 0 to 31 i := j*16 k[j] := ( a[i+15:i] <= b[i+15:i] ) ? 1 : 0 ENDFOR k[MAX:32] := 0
Integer Mask AVX512BW Compare Compare packed unsigned 16-bit integers in "a" and "b" for less-than, and store the results in mask vector "k". FOR j := 0 to 31 i := j*16 k[j] := ( a[i+15:i] < b[i+15:i] ) ? 1 : 0 ENDFOR k[MAX:32] := 0
Integer Mask AVX512BW Compare Compare packed unsigned 16-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k". FOR j := 0 to 31 i := j*16 k[j] := ( a[i+15:i] != b[i+15:i] ) ? 1 : 0 ENDFOR k[MAX:32] := 0
Integer Mask AVX512BW Compare Compare packed unsigned 16-bit integers in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). CASE (imm8[2:0]) OF 0: OP := _MM_CMPINT_EQ 1: OP := _MM_CMPINT_LT 2: OP := _MM_CMPINT_LE 3: OP := _MM_CMPINT_FALSE 4: OP := _MM_CMPINT_NE 5: OP := _MM_CMPINT_NLT 6: OP := _MM_CMPINT_NLE 7: OP := _MM_CMPINT_TRUE ESAC FOR j := 0 to 31 i := j*16 IF k1[j] k[j] := ( a[i+15:i] OP b[i+15:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:32] := 0
Integer Mask AVX512BW Compare Compare packed unsigned 16-bit integers in "a" and "b" for equality, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*16 IF k1[j] k[j] := ( a[i+15:i] == b[i+15:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:32] := 0
Integer Mask AVX512BW Compare Compare packed unsigned 16-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*16 IF k1[j] k[j] := ( a[i+15:i] >= b[i+15:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:32] := 0
Integer Mask AVX512BW Compare Compare packed unsigned 16-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*16 IF k1[j] k[j] := ( a[i+15:i] > b[i+15:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:32] := 0
Integer Mask AVX512BW Compare Compare packed unsigned 16-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*16 IF k1[j] k[j] := ( a[i+15:i] <= b[i+15:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:32] := 0
Integer Mask AVX512BW Compare Compare packed unsigned 16-bit integers in "a" and "b" for less-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*16 IF k1[j] k[j] := ( a[i+15:i] < b[i+15:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:32] := 0
Integer Mask AVX512BW Compare Compare packed unsigned 16-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*16 IF k1[j] k[j] := ( a[i+15:i] != b[i+15:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:32] := 0
Integer Mask AVX512VL AVX512BW Compare Compare packed unsigned 16-bit integers in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k". CASE (imm8[2:0]) OF 0: OP := _MM_CMPINT_EQ 1: OP := _MM_CMPINT_LT 2: OP := _MM_CMPINT_LE 3: OP := _MM_CMPINT_FALSE 4: OP := _MM_CMPINT_NE 5: OP := _MM_CMPINT_NLT 6: OP := _MM_CMPINT_NLE 7: OP := _MM_CMPINT_TRUE ESAC FOR j := 0 to 7 i := j*16 k[j] := ( a[i+15:i] OP b[i+15:i] ) ? 1 : 0 ENDFOR k[MAX:8] := 0
Integer Mask AVX512VL AVX512BW Compare Compare packed unsigned 16-bit integers in "a" and "b" for equality, and store the results in mask vector "k". FOR j := 0 to 7 i := j*16 k[j] := ( a[i+15:i] == b[i+15:i] ) ? 1 : 0 ENDFOR k[MAX:8] := 0
Integer Mask AVX512VL AVX512BW Compare Compare packed unsigned 16-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k". FOR j := 0 to 7 i := j*16 k[j] := ( a[i+15:i] >= b[i+15:i] ) ? 1 : 0 ENDFOR k[MAX:8] := 0
Integer Mask AVX512VL AVX512BW Compare Compare packed unsigned 16-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k". FOR j := 0 to 7 i := j*16 k[j] := ( a[i+15:i] > b[i+15:i] ) ? 1 : 0 ENDFOR k[MAX:8] := 0
Integer Mask AVX512VL AVX512BW Compare Compare packed unsigned 16-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k". FOR j := 0 to 7 i := j*16 k[j] := ( a[i+15:i] <= b[i+15:i] ) ? 1 : 0 ENDFOR k[MAX:8] := 0
Integer Mask AVX512VL AVX512BW Compare Compare packed unsigned 16-bit integers in "a" and "b" for less-than, and store the results in mask vector "k". FOR j := 0 to 7 i := j*16 k[j] := ( a[i+15:i] < b[i+15:i] ) ? 1 : 0 ENDFOR k[MAX:8] := 0
Integer Mask AVX512VL AVX512BW Compare Compare packed unsigned 16-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k". FOR j := 0 to 7 i := j*16 k[j] := ( a[i+15:i] != b[i+15:i] ) ? 1 : 0 ENDFOR k[MAX:8] := 0
Integer Mask AVX512VL AVX512BW Compare Compare packed unsigned 16-bit integers in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). CASE (imm8[2:0]) OF 0: OP := _MM_CMPINT_EQ 1: OP := _MM_CMPINT_LT 2: OP := _MM_CMPINT_LE 3: OP := _MM_CMPINT_FALSE 4: OP := _MM_CMPINT_NE 5: OP := _MM_CMPINT_NLT 6: OP := _MM_CMPINT_NLE 7: OP := _MM_CMPINT_TRUE ESAC FOR j := 0 to 7 i := j*16 IF k1[j] k[j] := ( a[i+15:i] OP b[i+15:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:8] := 0
Integer Mask AVX512VL AVX512BW Compare Compare packed unsigned 16-bit integers in "a" and "b" for equality, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*16 IF k1[j] k[j] := ( a[i+15:i] == b[i+15:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:8] := 0
Integer Mask AVX512VL AVX512BW Compare Compare packed unsigned 16-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*16 IF k1[j] k[j] := ( a[i+15:i] >= b[i+15:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:8] := 0
Integer Mask AVX512VL AVX512BW Compare Compare packed unsigned 16-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*16 IF k1[j] k[j] := ( a[i+15:i] > b[i+15:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:8] := 0
Integer Mask AVX512VL AVX512BW Compare Compare packed unsigned 16-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*16 IF k1[j] k[j] := ( a[i+15:i] <= b[i+15:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:8] := 0
Integer Mask AVX512VL AVX512BW Compare Compare packed unsigned 16-bit integers in "a" and "b" for less-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*16 IF k1[j] k[j] := ( a[i+15:i] < b[i+15:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:8] := 0
Integer Mask AVX512VL AVX512BW Compare Compare packed unsigned 16-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*16 IF k1[j] k[j] := ( a[i+15:i] != b[i+15:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:8] := 0
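The imm8 predicate table and the zeromask rule in the unsigned 16-bit compare entries above can be sketched as a portable scalar model. This is an illustration under stated assumptions, not an intrinsic: the function name `cmp_epu16_mask_model` and the `lanes` parameter (32, 16, or 8 for 512-, 256-, or 128-bit vectors) are hypothetical, and the vectors are represented as plain arrays of lanes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of the masked unsigned 16-bit compare.
   Bit j of the result is set only if k1[j] is set and the predicate
   selected by imm8[2:0] holds for lane j. Bits at or above `lanes`
   stay 0, matching k[MAX:lanes] := 0 in the pseudocode. */
static uint32_t cmp_epu16_mask_model(uint32_t k1, const uint16_t *a,
                                     const uint16_t *b, int lanes, int imm8)
{
    assert(lanes > 0 && lanes <= 32);
    uint32_t k = 0;
    for (int j = 0; j < lanes; j++) {
        if (!((k1 >> j) & 1u))
            continue;                      /* zeromask: k[j] stays 0 */
        int r;
        switch (imm8 & 7) {
        case 0:  r = a[j] == b[j]; break;  /* _MM_CMPINT_EQ    */
        case 1:  r = a[j] <  b[j]; break;  /* _MM_CMPINT_LT    */
        case 2:  r = a[j] <= b[j]; break;  /* _MM_CMPINT_LE    */
        case 3:  r = 0;            break;  /* _MM_CMPINT_FALSE */
        case 4:  r = a[j] != b[j]; break;  /* _MM_CMPINT_NE    */
        case 5:  r = a[j] >= b[j]; break;  /* _MM_CMPINT_NLT   */
        case 6:  r = a[j] >  b[j]; break;  /* _MM_CMPINT_NLE   */
        default: r = 1;            break;  /* _MM_CMPINT_TRUE  */
        }
        k |= (uint32_t)r << j;
    }
    return k;
}
```

Passing an all-ones `k1` reproduces the unmasked entries, and the fixed-predicate entries (equality, less-than, and so on) correspond to calling the model with the matching imm8 value.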
Integer Mask AVX512VL AVX512BW Compare Compare packed signed 16-bit integers in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k". CASE (imm8[2:0]) OF 0: OP := _MM_CMPINT_EQ 1: OP := _MM_CMPINT_LT 2: OP := _MM_CMPINT_LE 3: OP := _MM_CMPINT_FALSE 4: OP := _MM_CMPINT_NE 5: OP := _MM_CMPINT_NLT 6: OP := _MM_CMPINT_NLE 7: OP := _MM_CMPINT_TRUE ESAC FOR j := 0 to 15 i := j*16 k[j] := ( a[i+15:i] OP b[i+15:i] ) ? 1 : 0 ENDFOR k[MAX:16] := 0
Integer Mask AVX512VL AVX512BW Compare Compare packed signed 16-bit integers in "a" and "b" for equality, and store the results in mask vector "k". FOR j := 0 to 15 i := j*16 k[j] := ( a[i+15:i] == b[i+15:i] ) ? 1 : 0 ENDFOR k[MAX:16] := 0
Integer Mask AVX512VL AVX512BW Compare Compare packed signed 16-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k". FOR j := 0 to 15 i := j*16 k[j] := ( a[i+15:i] >= b[i+15:i] ) ? 1 : 0 ENDFOR k[MAX:16] := 0
Integer Mask AVX512VL AVX512BW Compare Compare packed signed 16-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k". FOR j := 0 to 15 i := j*16 k[j] := ( a[i+15:i] > b[i+15:i] ) ? 1 : 0 ENDFOR k[MAX:16] := 0
Integer Mask AVX512VL AVX512BW Compare Compare packed signed 16-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k". FOR j := 0 to 15 i := j*16 k[j] := ( a[i+15:i] <= b[i+15:i] ) ? 1 : 0 ENDFOR k[MAX:16] := 0
Integer Mask AVX512VL AVX512BW Compare Compare packed signed 16-bit integers in "a" and "b" for less-than, and store the results in mask vector "k". FOR j := 0 to 15 i := j*16 k[j] := ( a[i+15:i] < b[i+15:i] ) ? 1 : 0 ENDFOR k[MAX:16] := 0
Integer Mask AVX512VL AVX512BW Compare Compare packed signed 16-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k". FOR j := 0 to 15 i := j*16 k[j] := ( a[i+15:i] != b[i+15:i] ) ? 1 : 0 ENDFOR k[MAX:16] := 0
Integer Mask AVX512VL AVX512BW Compare Compare packed signed 16-bit integers in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). CASE (imm8[2:0]) OF 0: OP := _MM_CMPINT_EQ 1: OP := _MM_CMPINT_LT 2: OP := _MM_CMPINT_LE 3: OP := _MM_CMPINT_FALSE 4: OP := _MM_CMPINT_NE 5: OP := _MM_CMPINT_NLT 6: OP := _MM_CMPINT_NLE 7: OP := _MM_CMPINT_TRUE ESAC FOR j := 0 to 15 i := j*16 IF k1[j] k[j] := ( a[i+15:i] OP b[i+15:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:16] := 0
Integer Mask AVX512VL AVX512BW Compare Compare packed signed 16-bit integers in "a" and "b" for equality, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*16 IF k1[j] k[j] := ( a[i+15:i] == b[i+15:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:16] := 0
Integer Mask AVX512VL AVX512BW Compare Compare packed signed 16-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*16 IF k1[j] k[j] := ( a[i+15:i] >= b[i+15:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:16] := 0
Integer Mask AVX512VL AVX512BW Compare Compare packed signed 16-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*16 IF k1[j] k[j] := ( a[i+15:i] > b[i+15:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:16] := 0
Integer Mask AVX512VL AVX512BW Compare Compare packed signed 16-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*16 IF k1[j] k[j] := ( a[i+15:i] <= b[i+15:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:16] := 0
Integer Mask AVX512VL AVX512BW Compare Compare packed signed 16-bit integers in "a" and "b" for less-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*16 IF k1[j] k[j] := ( a[i+15:i] < b[i+15:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:16] := 0
Integer Mask AVX512VL AVX512BW Compare Compare packed signed 16-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*16 IF k1[j] k[j] := ( a[i+15:i] != b[i+15:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:16] := 0
Integer Mask AVX512BW Compare Compare packed signed 16-bit integers in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k". CASE (imm8[2:0]) OF 0: OP := _MM_CMPINT_EQ 1: OP := _MM_CMPINT_LT 2: OP := _MM_CMPINT_LE 3: OP := _MM_CMPINT_FALSE 4: OP := _MM_CMPINT_NE 5: OP := _MM_CMPINT_NLT 6: OP := _MM_CMPINT_NLE 7: OP := _MM_CMPINT_TRUE ESAC FOR j := 0 to 31 i := j*16 k[j] := ( a[i+15:i] OP b[i+15:i] ) ? 1 : 0 ENDFOR k[MAX:32] := 0
Integer Mask AVX512BW Compare Compare packed signed 16-bit integers in "a" and "b" for equality, and store the results in mask vector "k". FOR j := 0 to 31 i := j*16 k[j] := ( a[i+15:i] == b[i+15:i] ) ? 1 : 0 ENDFOR k[MAX:32] := 0
Integer Mask AVX512BW Compare Compare packed signed 16-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k". FOR j := 0 to 31 i := j*16 k[j] := ( a[i+15:i] >= b[i+15:i] ) ? 1 : 0 ENDFOR k[MAX:32] := 0
Integer Mask AVX512BW Compare Compare packed signed 16-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k". FOR j := 0 to 31 i := j*16 k[j] := ( a[i+15:i] > b[i+15:i] ) ? 1 : 0 ENDFOR k[MAX:32] := 0
Integer Mask AVX512BW Compare Compare packed signed 16-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k". FOR j := 0 to 31 i := j*16 k[j] := ( a[i+15:i] <= b[i+15:i] ) ? 1 : 0 ENDFOR k[MAX:32] := 0
Integer Mask AVX512BW Compare Compare packed signed 16-bit integers in "a" and "b" for less-than, and store the results in mask vector "k". FOR j := 0 to 31 i := j*16 k[j] := ( a[i+15:i] < b[i+15:i] ) ? 1 : 0 ENDFOR k[MAX:32] := 0
Integer Mask AVX512BW Compare Compare packed signed 16-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k". FOR j := 0 to 31 i := j*16 k[j] := ( a[i+15:i] != b[i+15:i] ) ? 1 : 0 ENDFOR k[MAX:32] := 0
Integer Mask AVX512BW Compare Compare packed signed 16-bit integers in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). CASE (imm8[2:0]) OF 0: OP := _MM_CMPINT_EQ 1: OP := _MM_CMPINT_LT 2: OP := _MM_CMPINT_LE 3: OP := _MM_CMPINT_FALSE 4: OP := _MM_CMPINT_NE 5: OP := _MM_CMPINT_NLT 6: OP := _MM_CMPINT_NLE 7: OP := _MM_CMPINT_TRUE ESAC FOR j := 0 to 31 i := j*16 IF k1[j] k[j] := ( a[i+15:i] OP b[i+15:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:32] := 0
Integer Mask AVX512BW Compare Compare packed signed 16-bit integers in "a" and "b" for equality, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*16 IF k1[j] k[j] := ( a[i+15:i] == b[i+15:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:32] := 0
Integer Mask AVX512BW Compare Compare packed signed 16-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*16 IF k1[j] k[j] := ( a[i+15:i] >= b[i+15:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:32] := 0
Integer Mask AVX512BW Compare Compare packed signed 16-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*16 IF k1[j] k[j] := ( a[i+15:i] > b[i+15:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:32] := 0
Integer Mask AVX512BW Compare Compare packed signed 16-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*16 IF k1[j] k[j] := ( a[i+15:i] <= b[i+15:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:32] := 0
Integer Mask AVX512BW Compare Compare packed signed 16-bit integers in "a" and "b" for less-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*16 IF k1[j] k[j] := ( a[i+15:i] < b[i+15:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:32] := 0
Integer Mask AVX512BW Compare Compare packed signed 16-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*16 IF k1[j] k[j] := ( a[i+15:i] != b[i+15:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:32] := 0
Integer Mask AVX512VL AVX512BW Compare Compare packed signed 16-bit integers in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k". CASE (imm8[2:0]) OF 0: OP := _MM_CMPINT_EQ 1: OP := _MM_CMPINT_LT 2: OP := _MM_CMPINT_LE 3: OP := _MM_CMPINT_FALSE 4: OP := _MM_CMPINT_NE 5: OP := _MM_CMPINT_NLT 6: OP := _MM_CMPINT_NLE 7: OP := _MM_CMPINT_TRUE ESAC FOR j := 0 to 7 i := j*16 k[j] := ( a[i+15:i] OP b[i+15:i] ) ? 1 : 0 ENDFOR k[MAX:8] := 0
Integer Mask AVX512VL AVX512BW Compare Compare packed signed 16-bit integers in "a" and "b" for equality, and store the results in mask vector "k". FOR j := 0 to 7 i := j*16 k[j] := ( a[i+15:i] == b[i+15:i] ) ? 1 : 0 ENDFOR k[MAX:8] := 0
Integer Mask AVX512VL AVX512BW Compare Compare packed signed 16-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k". FOR j := 0 to 7 i := j*16 k[j] := ( a[i+15:i] >= b[i+15:i] ) ? 1 : 0 ENDFOR k[MAX:8] := 0
Integer Mask AVX512VL AVX512BW Compare Compare packed signed 16-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k". FOR j := 0 to 7 i := j*16 k[j] := ( a[i+15:i] > b[i+15:i] ) ? 1 : 0 ENDFOR k[MAX:8] := 0
Integer Mask AVX512VL AVX512BW Compare Compare packed signed 16-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k". FOR j := 0 to 7 i := j*16 k[j] := ( a[i+15:i] <= b[i+15:i] ) ? 1 : 0 ENDFOR k[MAX:8] := 0
Integer Mask AVX512VL AVX512BW Compare Compare packed signed 16-bit integers in "a" and "b" for less-than, and store the results in mask vector "k". FOR j := 0 to 7 i := j*16 k[j] := ( a[i+15:i] < b[i+15:i] ) ? 1 : 0 ENDFOR k[MAX:8] := 0
Integer Mask AVX512VL AVX512BW Compare Compare packed signed 16-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k". FOR j := 0 to 7 i := j*16 k[j] := ( a[i+15:i] != b[i+15:i] ) ? 1 : 0 ENDFOR k[MAX:8] := 0
Integer Mask AVX512VL AVX512BW Compare Compare packed signed 16-bit integers in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). CASE (imm8[2:0]) OF 0: OP := _MM_CMPINT_EQ 1: OP := _MM_CMPINT_LT 2: OP := _MM_CMPINT_LE 3: OP := _MM_CMPINT_FALSE 4: OP := _MM_CMPINT_NE 5: OP := _MM_CMPINT_NLT 6: OP := _MM_CMPINT_NLE 7: OP := _MM_CMPINT_TRUE ESAC FOR j := 0 to 7 i := j*16 IF k1[j] k[j] := ( a[i+15:i] OP b[i+15:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:8] := 0
Integer Mask AVX512VL AVX512BW Compare Compare packed signed 16-bit integers in "a" and "b" for equality, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*16 IF k1[j] k[j] := ( a[i+15:i] == b[i+15:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:8] := 0
Integer Mask AVX512VL AVX512BW Compare Compare packed signed 16-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*16 IF k1[j] k[j] := ( a[i+15:i] >= b[i+15:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:8] := 0
Integer Mask AVX512VL AVX512BW Compare Compare packed signed 16-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*16 IF k1[j] k[j] := ( a[i+15:i] > b[i+15:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:8] := 0
Integer Mask AVX512VL AVX512BW Compare Compare packed signed 16-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*16 IF k1[j] k[j] := ( a[i+15:i] <= b[i+15:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:8] := 0
Integer Mask AVX512VL AVX512BW Compare Compare packed signed 16-bit integers in "a" and "b" for less-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*16 IF k1[j] k[j] := ( a[i+15:i] < b[i+15:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:8] := 0
Integer Mask AVX512VL AVX512BW Compare Compare packed signed 16-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*16 IF k1[j] k[j] := ( a[i+15:i] != b[i+15:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:8] := 0
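The signed and unsigned compare entries share identical pseudocode and differ only in how each 16-bit lane is interpreted. The sketch below contrasts the two interpretations for a less-than compare on the same bit patterns. Both function names are hypothetical, and lanes are modeled as plain array elements rather than vector registers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: less-than with lanes read as signed 16-bit values. */
static uint32_t cmplt_epi16_mask_model(const int16_t *a, const int16_t *b,
                                       int lanes)
{
    assert(lanes > 0 && lanes <= 32);
    uint32_t k = 0;
    for (int j = 0; j < lanes; j++)
        k |= (uint32_t)(a[j] < b[j]) << j;
    return k;
}

/* Hypothetical model: the same bit patterns read as unsigned 16-bit values.
   Converting int16_t to uint16_t is well defined (modulo 2^16). */
static uint32_t cmplt_epu16_mask_model(const int16_t *a, const int16_t *b,
                                       int lanes)
{
    assert(lanes > 0 && lanes <= 32);
    uint32_t k = 0;
    for (int j = 0; j < lanes; j++)
        k |= (uint32_t)((uint16_t)a[j] < (uint16_t)b[j]) << j;
    return k;
}
```

For example, a lane holding -1 (bit pattern 0xFFFF) is less than 1 under the signed reading but greater than 1 under the unsigned reading, so the two models return different masks for the same inputs.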
Integer AVX512VL AVX512BW Miscellaneous Shuffle 16-bit integers in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "idx" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*16 IF k[j] off := 16*idx[i+3:i] dst[i+15:i] := idx[i+4] ? b[off+15:off] : a[off+15:off] ELSE dst[i+15:i] := idx[i+15:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512BW Miscellaneous Shuffle 16-bit integers in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*16 IF k[j] off := 16*idx[i+3:i] dst[i+15:i] := idx[i+4] ? b[off+15:off] : a[off+15:off] ELSE dst[i+15:i] := a[i+15:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512BW Miscellaneous Shuffle 16-bit integers in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*16 IF k[j] off := 16*idx[i+3:i] dst[i+15:i] := idx[i+4] ? b[off+15:off] : a[off+15:off] ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512BW Miscellaneous Shuffle 16-bit integers in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst". FOR j := 0 to 15 i := j*16 off := 16*idx[i+3:i] dst[i+15:i] := idx[i+4] ? b[off+15:off] : a[off+15:off] ENDFOR dst[MAX:256] := 0
Integer AVX512BW Miscellaneous Shuffle 16-bit integers in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "idx" when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*16 IF k[j] off := 16*idx[i+4:i] dst[i+15:i] := idx[i+5] ? b[off+15:off] : a[off+15:off] ELSE dst[i+15:i] := idx[i+15:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512BW Miscellaneous Shuffle 16-bit integers in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*16 IF k[j] off := 16*idx[i+4:i] dst[i+15:i] := idx[i+5] ? b[off+15:off] : a[off+15:off] ELSE dst[i+15:i] := a[i+15:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512BW Miscellaneous Shuffle 16-bit integers in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*16 IF k[j] off := 16*idx[i+4:i] dst[i+15:i] := idx[i+5] ? b[off+15:off] : a[off+15:off] ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512BW Miscellaneous Shuffle 16-bit integers in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst". FOR j := 0 to 31 i := j*16 off := 16*idx[i+4:i] dst[i+15:i] := idx[i+5] ? b[off+15:off] : a[off+15:off] ENDFOR dst[MAX:512] := 0
Integer AVX512VL AVX512BW Miscellaneous Shuffle 16-bit integers in "a" and "b" using the corresponding selector and index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "idx" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*16 IF k[j] off := 16*idx[i+2:i] dst[i+15:i] := idx[i+3] ? b[off+15:off] : a[off+15:off] ELSE dst[i+15:i] := idx[i+15:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512BW Miscellaneous Shuffle 16-bit integers in "a" and "b" using the corresponding selector and index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*16 IF k[j] off := 16*idx[i+2:i] dst[i+15:i] := idx[i+3] ? b[off+15:off] : a[off+15:off] ELSE dst[i+15:i] := a[i+15:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512BW Miscellaneous Shuffle 16-bit integers in "a" and "b" using the corresponding selector and index in "idx", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*16 IF k[j] off := 16*idx[i+2:i] dst[i+15:i] := idx[i+3] ? b[off+15:off] : a[off+15:off] ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512BW Miscellaneous Shuffle 16-bit integers in "a" and "b" using the corresponding selector and index in "idx", and store the results in "dst". FOR j := 0 to 7 i := j*16 off := 16*idx[i+2:i] dst[i+15:i] := idx[i+3] ? b[off+15:off] : a[off+15:off] ENDFOR dst[MAX:128] := 0
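The two-source shuffle entries above all follow one pattern: the low bits of each index lane pick an element, and the next bit selects between "a" and "b". A minimal scalar sketch of that pattern follows. The function name and the `fallback` parameter are hypothetical conveniences: passing `idx`, `a`, or `NULL` as `fallback` models the copy-from-"idx", copy-from-"a", and zeromask variants respectively, and an all-ones `k` models the unmasked variant. The model assumes `dst` does not alias any input.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of the two-source 16-bit shuffle.
   lanes = 8, 16, or 32 for 128-, 256-, or 512-bit vectors. */
static void permutex2var_epi16_model(uint16_t *dst, uint32_t k,
                                     const uint16_t *fallback,
                                     const uint16_t *a, const uint16_t *idx,
                                     const uint16_t *b, int lanes)
{
    assert(lanes == 8 || lanes == 16 || lanes == 32);
    for (int j = 0; j < lanes; j++) {
        if ((k >> j) & 1u) {
            /* Low log2(lanes) bits: element index within the source. */
            unsigned off = idx[j] & (unsigned)(lanes - 1);
            /* Next bit up: 0 selects a, 1 selects b. */
            int from_b = (idx[j] & (unsigned)lanes) != 0;
            dst[j] = from_b ? b[off] : a[off];
        } else {
            /* Mask bit clear: copy from fallback, or zero if none. */
            dst[j] = fallback ? fallback[j] : 0;
        }
    }
}
```

Index bits above the selector bit are ignored, which matches the pseudocode reading only idx[i+3:i] and idx[i+4] (256-bit), idx[i+4:i] and idx[i+5] (512-bit), or idx[i+2:i] and idx[i+3] (128-bit).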
Integer AVX512VL AVX512BW Miscellaneous Shuffle 16-bit integers in "a" across lanes using the corresponding index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*16 id := idx[i+3:i]*16 IF k[j] dst[i+15:i] := a[id+15:id] ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512BW Miscellaneous Shuffle 16-bit integers in "a" across lanes using the corresponding index in "idx", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*16 id := idx[i+3:i]*16 IF k[j] dst[i+15:i] := a[id+15:id] ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512BW Miscellaneous Shuffle 16-bit integers in "a" across lanes using the corresponding index in "idx", and store the results in "dst". FOR j := 0 to 15 i := j*16 id := idx[i+3:i]*16 dst[i+15:i] := a[id+15:id] ENDFOR dst[MAX:256] := 0
Integer AVX512BW Miscellaneous Shuffle 16-bit integers in "a" across lanes using the corresponding index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*16 id := idx[i+4:i]*16 IF k[j] dst[i+15:i] := a[id+15:id] ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512BW Miscellaneous Shuffle 16-bit integers in "a" across lanes using the corresponding index in "idx", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*16 id := idx[i+4:i]*16 IF k[j] dst[i+15:i] := a[id+15:id] ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512BW Miscellaneous Shuffle 16-bit integers in "a" across lanes using the corresponding index in "idx", and store the results in "dst". FOR j := 0 to 31 i := j*16 id := idx[i+4:i]*16 dst[i+15:i] := a[id+15:id] ENDFOR dst[MAX:512] := 0
Integer AVX512VL AVX512BW Miscellaneous Shuffle 16-bit integers in "a" using the corresponding index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*16 id := idx[i+2:i]*16 IF k[j] dst[i+15:i] := a[id+15:id] ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512BW Miscellaneous Shuffle 16-bit integers in "a" using the corresponding index in "idx", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*16 id := idx[i+2:i]*16 IF k[j] dst[i+15:i] := a[id+15:id] ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512BW Miscellaneous Shuffle 16-bit integers in "a" using the corresponding index in "idx", and store the results in "dst". FOR j := 0 to 7 i := j*16 id := idx[i+2:i]*16 dst[i+15:i] := a[id+15:id] ENDFOR dst[MAX:128] := 0
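The single-source shuffle entries above can be modeled the same way, without the selector bit. The sketch below is an assumption-labeled illustration: the function name is hypothetical, `src` may be `NULL` to model the zeromask variant, and an all-ones `k` models the unmasked variant. It assumes `dst` does not alias `a`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of the single-source 16-bit shuffle.
   lanes = 8, 16, or 32 for 128-, 256-, or 512-bit vectors. */
static void permutexvar_epi16_model(uint16_t *dst, const uint16_t *src,
                                    uint32_t k, const uint16_t *idx,
                                    const uint16_t *a, int lanes)
{
    assert(lanes == 8 || lanes == 16 || lanes == 32);
    for (int j = 0; j < lanes; j++) {
        /* Only the low log2(lanes) index bits are used. */
        unsigned id = idx[j] & (unsigned)(lanes - 1);
        if ((k >> j) & 1u)
            dst[j] = a[id];
        else
            dst[j] = src ? src[j] : 0;  /* writemask copy or zeromask */
    }
}
```

An index vector counting down from `lanes - 1` to 0 reverses the lanes of "a", which is a common use of this kind of shuffle.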
Integer AVX512VL AVX512BW Arithmetic Multiply packed unsigned 8-bit integers in "a" by packed signed 8-bit integers in "b", producing intermediate signed 16-bit integers. Horizontally add adjacent pairs of intermediate signed 16-bit integers, and pack the saturated results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*16 IF k[j] dst[i+15:i] := Saturate16( a[i+15:i+8]*b[i+15:i+8] + a[i+7:i]*b[i+7:i] ) ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512BW Arithmetic Multiply packed unsigned 8-bit integers in "a" by packed signed 8-bit integers in "b", producing intermediate signed 16-bit integers. Horizontally add adjacent pairs of intermediate signed 16-bit integers, and pack the saturated results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*16 IF k[j] dst[i+15:i] := Saturate16( a[i+15:i+8]*b[i+15:i+8] + a[i+7:i]*b[i+7:i] ) ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512BW Arithmetic Vertically multiply each unsigned 8-bit integer from "a" with the corresponding signed 8-bit integer from "b", producing intermediate signed 16-bit integers. Horizontally add adjacent pairs of intermediate signed 16-bit integers, and pack the saturated results in "dst". FOR j := 0 to 31 i := j*16 dst[i+15:i] := Saturate16( a[i+15:i+8]*b[i+15:i+8] + a[i+7:i]*b[i+7:i] ) ENDFOR dst[MAX:512] := 0
Integer AVX512BW Arithmetic Multiply packed unsigned 8-bit integers in "a" by packed signed 8-bit integers in "b", producing intermediate signed 16-bit integers. Horizontally add adjacent pairs of intermediate signed 16-bit integers, and pack the saturated results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*16 IF k[j] dst[i+15:i] := Saturate16( a[i+15:i+8]*b[i+15:i+8] + a[i+7:i]*b[i+7:i] ) ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512BW Arithmetic Multiply packed unsigned 8-bit integers in "a" by packed signed 8-bit integers in "b", producing intermediate signed 16-bit integers. Horizontally add adjacent pairs of intermediate signed 16-bit integers, and pack the saturated results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*16 IF k[j] dst[i+15:i] := Saturate16( a[i+15:i+8]*b[i+15:i+8] + a[i+7:i]*b[i+7:i] ) ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512VL AVX512BW Arithmetic Multiply packed unsigned 8-bit integers in "a" by packed signed 8-bit integers in "b", producing intermediate signed 16-bit integers. Horizontally add adjacent pairs of intermediate signed 16-bit integers, and pack the saturated results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*16 IF k[j] dst[i+15:i] := Saturate16( a[i+15:i+8]*b[i+15:i+8] + a[i+7:i]*b[i+7:i] ) ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512BW Arithmetic Multiply packed unsigned 8-bit integers in "a" by packed signed 8-bit integers in "b", producing intermediate signed 16-bit integers. Horizontally add adjacent pairs of intermediate signed 16-bit integers, and pack the saturated results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*16 IF k[j] dst[i+15:i] := Saturate16( a[i+15:i+8]*b[i+15:i+8] + a[i+7:i]*b[i+7:i] ) ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:128] := 0
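The multiply-add entries above compress three steps into one pseudocode expression: widen the unsigned byte from "a" and the signed byte from "b", add the two adjacent products, then saturate to signed 16 bits. The scalar sketch below makes the widening and saturation explicit. The function names are hypothetical, masking is omitted for brevity, and `out_lanes` counts 16-bit results (8, 16, or 32).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: one output lane of the unsigned-by-signed
   byte multiply with horizontal add and signed 16-bit saturation. */
static int16_t maddubs_pair_model(uint8_t a_lo, int8_t b_lo,
                                  uint8_t a_hi, int8_t b_hi)
{
    /* a is zero-extended, b is sign-extended before multiplying. */
    int32_t sum = (int32_t)a_lo * b_lo + (int32_t)a_hi * b_hi;
    if (sum > INT16_MAX) return INT16_MAX;
    if (sum < INT16_MIN) return INT16_MIN;
    return (int16_t)sum;
}

/* Hypothetical scalar model over a whole vector; the inputs hold
   2 * out_lanes bytes each. */
static void maddubs_epi16_model(int16_t *dst, const uint8_t *a,
                                const int8_t *b, int out_lanes)
{
    assert(out_lanes > 0 && out_lanes <= 32);
    for (int j = 0; j < out_lanes; j++)
        dst[j] = maddubs_pair_model(a[2 * j], b[2 * j],
                                    a[2 * j + 1], b[2 * j + 1]);
}
```

Saturation matters in practice: two products of 255 * 127 sum to 64770, which exceeds the signed 16-bit range and clamps to 32767.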
Integer AVX512VL AVX512BW Arithmetic Multiply packed signed 16-bit integers in "a" and "b", producing intermediate signed 32-bit integers. Horizontally add adjacent pairs of intermediate 32-bit integers, and pack the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := SignExtend32(a[i+31:i+16]*b[i+31:i+16]) + SignExtend32(a[i+15:i]*b[i+15:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512BW Arithmetic Multiply packed signed 16-bit integers in "a" and "b", producing intermediate signed 32-bit integers. Horizontally add adjacent pairs of intermediate 32-bit integers, and pack the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := SignExtend32(a[i+31:i+16]*b[i+31:i+16]) + SignExtend32(a[i+15:i]*b[i+15:i]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512BW Arithmetic Multiply packed signed 16-bit integers in "a" and "b", producing intermediate signed 32-bit integers. Horizontally add adjacent pairs of intermediate 32-bit integers, and pack the results in "dst". FOR j := 0 to 15 i := j*32 dst[i+31:i] := SignExtend32(a[i+31:i+16]*b[i+31:i+16]) + SignExtend32(a[i+15:i]*b[i+15:i]) ENDFOR dst[MAX:512] := 0
Integer AVX512BW Arithmetic Multiply packed signed 16-bit integers in "a" and "b", producing intermediate signed 32-bit integers. Horizontally add adjacent pairs of intermediate 32-bit integers, and pack the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := SignExtend32(a[i+31:i+16]*b[i+31:i+16]) + SignExtend32(a[i+15:i]*b[i+15:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512BW Arithmetic Multiply packed signed 16-bit integers in "a" and "b", producing intermediate signed 32-bit integers. Horizontally add adjacent pairs of intermediate 32-bit integers, and pack the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := SignExtend32(a[i+31:i+16]*b[i+31:i+16]) + SignExtend32(a[i+15:i]*b[i+15:i]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512VL AVX512BW Arithmetic Multiply packed signed 16-bit integers in "a" and "b", producing intermediate signed 32-bit integers. Horizontally add adjacent pairs of intermediate 32-bit integers, and pack the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := SignExtend32(a[i+31:i+16]*b[i+31:i+16]) + SignExtend32(a[i+15:i]*b[i+15:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512BW Arithmetic Multiply packed signed 16-bit integers in "a" and "b", producing intermediate signed 32-bit integers. Horizontally add adjacent pairs of intermediate 32-bit integers, and pack the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := SignExtend32(a[i+31:i+16]*b[i+31:i+16]) + SignExtend32(a[i+15:i]*b[i+15:i]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:128] := 0
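The signed 16-bit multiply-add entries above differ only in lane count and mask policy. A minimal scalar sketch of the writemask form follows. The names `wrap32` and `mask_madd_ref` are hypothetical, inputs are assumed to be Python lists of signed 16-bit values with lane 0 first, and the lane count is taken from the length of "src".

```python
def wrap32(x):
    """Wrap an integer to signed 32-bit two's complement."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


def mask_madd_ref(src, k, a, b):
    """Hypothetical scalar model of the writemask multiply-add.

    src: signed 32-bit fallback values, one per result lane.
    a, b: signed 16-bit inputs, two per result lane.
    k: integer mask, bit j controls 32-bit result lane j.
    """
    dst = []
    for j in range(len(src)):
        if (k >> j) & 1:
            total = a[2 * j + 1] * b[2 * j + 1] + a[2 * j] * b[2 * j]
            dst.append(wrap32(total))  # no saturation: the pair sum wraps
        else:
            dst.append(src[j])  # writemask: inactive lanes copy src
    return dst
```

Unlike the unsigned-by-signed 8-bit variant, the pair sum here is not saturated. The only overflowing case is when all four inputs of a pair are -32768, where the sum 2^31 wraps to -2^31.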
Integer AVX512VL AVX512BW Arithmetic Compare packed signed 8-bit integers in "a" and "b", and store packed maximum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*8 IF k[j] dst[i+7:i] := MAX(a[i+7:i], b[i+7:i]) ELSE dst[i+7:i] := src[i+7:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512BW Arithmetic Compare packed signed 8-bit integers in "a" and "b", and store packed maximum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*8 IF k[j] dst[i+7:i] := MAX(a[i+7:i], b[i+7:i]) ELSE dst[i+7:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512BW Arithmetic Compare packed signed 8-bit integers in "a" and "b", and store packed maximum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 63 i := j*8 IF k[j] dst[i+7:i] := MAX(a[i+7:i], b[i+7:i]) ELSE dst[i+7:i] := src[i+7:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512BW Arithmetic Compare packed signed 8-bit integers in "a" and "b", and store packed maximum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 63 i := j*8 IF k[j] dst[i+7:i] := MAX(a[i+7:i], b[i+7:i]) ELSE dst[i+7:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512BW Arithmetic Compare packed signed 8-bit integers in "a" and "b", and store packed maximum values in "dst". FOR j := 0 to 63 i := j*8 dst[i+7:i] := MAX(a[i+7:i], b[i+7:i]) ENDFOR dst[MAX:512] := 0
Integer AVX512VL AVX512BW Arithmetic Compare packed signed 8-bit integers in "a" and "b", and store packed maximum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*8 IF k[j] dst[i+7:i] := MAX(a[i+7:i], b[i+7:i]) ELSE dst[i+7:i] := src[i+7:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512BW Arithmetic Compare packed signed 8-bit integers in "a" and "b", and store packed maximum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*8 IF k[j] dst[i+7:i] := MAX(a[i+7:i], b[i+7:i]) ELSE dst[i+7:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512BW Arithmetic Compare packed signed 16-bit integers in "a" and "b", and store packed maximum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*16 IF k[j] dst[i+15:i] := MAX(a[i+15:i], b[i+15:i]) ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512BW Arithmetic Compare packed signed 16-bit integers in "a" and "b", and store packed maximum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*16 IF k[j] dst[i+15:i] := MAX(a[i+15:i], b[i+15:i]) ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512BW Arithmetic Compare packed signed 16-bit integers in "a" and "b", and store packed maximum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*16 IF k[j] dst[i+15:i] := MAX(a[i+15:i], b[i+15:i]) ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512BW Arithmetic Compare packed signed 16-bit integers in "a" and "b", and store packed maximum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*16 IF k[j] dst[i+15:i] := MAX(a[i+15:i], b[i+15:i]) ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512BW Arithmetic Compare packed signed 16-bit integers in "a" and "b", and store packed maximum values in "dst". FOR j := 0 to 31 i := j*16 dst[i+15:i] := MAX(a[i+15:i], b[i+15:i]) ENDFOR dst[MAX:512] := 0
Integer AVX512VL AVX512BW Arithmetic Compare packed signed 16-bit integers in "a" and "b", and store packed maximum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*16 IF k[j] dst[i+15:i] := MAX(a[i+15:i], b[i+15:i]) ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512BW Arithmetic Compare packed signed 16-bit integers in "a" and "b", and store packed maximum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*16 IF k[j] dst[i+15:i] := MAX(a[i+15:i], b[i+15:i]) ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512BW Arithmetic Compare packed unsigned 8-bit integers in "a" and "b", and store packed maximum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*8 IF k[j] dst[i+7:i] := MAX(a[i+7:i], b[i+7:i]) ELSE dst[i+7:i] := src[i+7:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512BW Arithmetic Compare packed unsigned 8-bit integers in "a" and "b", and store packed maximum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*8 IF k[j] dst[i+7:i] := MAX(a[i+7:i], b[i+7:i]) ELSE dst[i+7:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512BW Arithmetic Compare packed unsigned 8-bit integers in "a" and "b", and store packed maximum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 63 i := j*8 IF k[j] dst[i+7:i] := MAX(a[i+7:i], b[i+7:i]) ELSE dst[i+7:i] := src[i+7:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512BW Arithmetic Compare packed unsigned 8-bit integers in "a" and "b", and store packed maximum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 63 i := j*8 IF k[j] dst[i+7:i] := MAX(a[i+7:i], b[i+7:i]) ELSE dst[i+7:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512BW Arithmetic Compare packed unsigned 8-bit integers in "a" and "b", and store packed maximum values in "dst". FOR j := 0 to 63 i := j*8 dst[i+7:i] := MAX(a[i+7:i], b[i+7:i]) ENDFOR dst[MAX:512] := 0
Integer AVX512VL AVX512BW Arithmetic Compare packed unsigned 8-bit integers in "a" and "b", and store packed maximum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*8 IF k[j] dst[i+7:i] := MAX(a[i+7:i], b[i+7:i]) ELSE dst[i+7:i] := src[i+7:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512BW Arithmetic Compare packed unsigned 8-bit integers in "a" and "b", and store packed maximum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*8 IF k[j] dst[i+7:i] := MAX(a[i+7:i], b[i+7:i]) ELSE dst[i+7:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512BW Arithmetic Compare packed unsigned 16-bit integers in "a" and "b", and store packed maximum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*16 IF k[j] dst[i+15:i] := MAX(a[i+15:i], b[i+15:i]) ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512BW Arithmetic Compare packed unsigned 16-bit integers in "a" and "b", and store packed maximum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*16 IF k[j] dst[i+15:i] := MAX(a[i+15:i], b[i+15:i]) ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512BW Arithmetic Compare packed unsigned 16-bit integers in "a" and "b", and store packed maximum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*16 IF k[j] dst[i+15:i] := MAX(a[i+15:i], b[i+15:i]) ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512BW Arithmetic Compare packed unsigned 16-bit integers in "a" and "b", and store packed maximum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*16 IF k[j] dst[i+15:i] := MAX(a[i+15:i], b[i+15:i]) ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512BW Arithmetic Compare packed unsigned 16-bit integers in "a" and "b", and store packed maximum values in "dst". FOR j := 0 to 31 i := j*16 dst[i+15:i] := MAX(a[i+15:i], b[i+15:i]) ENDFOR dst[MAX:512] := 0
Integer AVX512VL AVX512BW Arithmetic Compare packed unsigned 16-bit integers in "a" and "b", and store packed maximum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*16 IF k[j] dst[i+15:i] := MAX(a[i+15:i], b[i+15:i]) ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512BW Arithmetic Compare packed unsigned 16-bit integers in "a" and "b", and store packed maximum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*16 IF k[j] dst[i+15:i] := MAX(a[i+15:i], b[i+15:i]) ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:128] := 0
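The signed and unsigned maximum entries above share the same pseudocode; only the interpretation of each lane's bits differs. The sketch below makes that difference explicit. It is illustrative only: `as_signed` and `masked_max_ref` are hypothetical names, lanes are assumed to be raw bit patterns in Python lists, and passing `src=None` is an assumed convention for selecting zeromask behavior.

```python
def as_signed(x, bits):
    """Reinterpret a raw bit pattern as a signed value of the given width."""
    return x - (1 << bits) if x >> (bits - 1) else x


def masked_max_ref(a, b, k, bits, signed, src=None):
    """Hypothetical scalar model of masked packed maximum.

    a, b: raw lane bit patterns (0 .. 2**bits - 1).
    k: integer mask, bit j controls lane j.
    src: fallback lanes for writemask; None means zeromask.
    """
    dst = []
    for j, (x, y) in enumerate(zip(a, b)):
        if (k >> j) & 1:
            if signed:
                dst.append(x if as_signed(x, bits) >= as_signed(y, bits) else y)
            else:
                dst.append(max(x, y))
        else:
            dst.append(src[j] if src is not None else 0)
    return dst
```

For the byte patterns 0x80 and 0x7F, the signed form selects 0x7F (127 beats -128) while the unsigned form selects 0x80 (128 beats 127).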
Integer AVX512VL AVX512BW Arithmetic Compare packed signed 8-bit integers in "a" and "b", and store packed minimum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*8 IF k[j] dst[i+7:i] := MIN(a[i+7:i], b[i+7:i]) ELSE dst[i+7:i] := src[i+7:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512BW Arithmetic Compare packed signed 8-bit integers in "a" and "b", and store packed minimum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*8 IF k[j] dst[i+7:i] := MIN(a[i+7:i], b[i+7:i]) ELSE dst[i+7:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512BW Arithmetic Compare packed signed 8-bit integers in "a" and "b", and store packed minimum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 63 i := j*8 IF k[j] dst[i+7:i] := MIN(a[i+7:i], b[i+7:i]) ELSE dst[i+7:i] := src[i+7:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512BW Arithmetic Compare packed signed 8-bit integers in "a" and "b", and store packed minimum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 63 i := j*8 IF k[j] dst[i+7:i] := MIN(a[i+7:i], b[i+7:i]) ELSE dst[i+7:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512BW Arithmetic Compare packed signed 8-bit integers in "a" and "b", and store packed minimum values in "dst". FOR j := 0 to 63 i := j*8 dst[i+7:i] := MIN(a[i+7:i], b[i+7:i]) ENDFOR dst[MAX:512] := 0
Integer AVX512VL AVX512BW Arithmetic Compare packed signed 8-bit integers in "a" and "b", and store packed minimum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*8 IF k[j] dst[i+7:i] := MIN(a[i+7:i], b[i+7:i]) ELSE dst[i+7:i] := src[i+7:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512BW Arithmetic Compare packed signed 8-bit integers in "a" and "b", and store packed minimum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*8 IF k[j] dst[i+7:i] := MIN(a[i+7:i], b[i+7:i]) ELSE dst[i+7:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512BW Arithmetic Compare packed signed 16-bit integers in "a" and "b", and store packed minimum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*16 IF k[j] dst[i+15:i] := MIN(a[i+15:i], b[i+15:i]) ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512BW Arithmetic Compare packed signed 16-bit integers in "a" and "b", and store packed minimum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*16 IF k[j] dst[i+15:i] := MIN(a[i+15:i], b[i+15:i]) ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512BW Arithmetic Compare packed signed 16-bit integers in "a" and "b", and store packed minimum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*16 IF k[j] dst[i+15:i] := MIN(a[i+15:i], b[i+15:i]) ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512BW Arithmetic Compare packed signed 16-bit integers in "a" and "b", and store packed minimum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*16 IF k[j] dst[i+15:i] := MIN(a[i+15:i], b[i+15:i]) ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512BW Arithmetic Compare packed signed 16-bit integers in "a" and "b", and store packed minimum values in "dst". FOR j := 0 to 31 i := j*16 dst[i+15:i] := MIN(a[i+15:i], b[i+15:i]) ENDFOR dst[MAX:512] := 0
Integer AVX512VL AVX512BW Arithmetic Compare packed signed 16-bit integers in "a" and "b", and store packed minimum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*16 IF k[j] dst[i+15:i] := MIN(a[i+15:i], b[i+15:i]) ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512BW Arithmetic Compare packed signed 16-bit integers in "a" and "b", and store packed minimum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*16 IF k[j] dst[i+15:i] := MIN(a[i+15:i], b[i+15:i]) ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512BW Arithmetic Compare packed unsigned 8-bit integers in "a" and "b", and store packed minimum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*8 IF k[j] dst[i+7:i] := MIN(a[i+7:i], b[i+7:i]) ELSE dst[i+7:i] := src[i+7:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512BW Arithmetic Compare packed unsigned 8-bit integers in "a" and "b", and store packed minimum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*8 IF k[j] dst[i+7:i] := MIN(a[i+7:i], b[i+7:i]) ELSE dst[i+7:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512BW Arithmetic Compare packed unsigned 8-bit integers in "a" and "b", and store packed minimum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 63 i := j*8 IF k[j] dst[i+7:i] := MIN(a[i+7:i], b[i+7:i]) ELSE dst[i+7:i] := src[i+7:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512BW Arithmetic Compare packed unsigned 8-bit integers in "a" and "b", and store packed minimum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 63 i := j*8 IF k[j] dst[i+7:i] := MIN(a[i+7:i], b[i+7:i]) ELSE dst[i+7:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512BW Arithmetic Compare packed unsigned 8-bit integers in "a" and "b", and store packed minimum values in "dst". FOR j := 0 to 63 i := j*8 dst[i+7:i] := MIN(a[i+7:i], b[i+7:i]) ENDFOR dst[MAX:512] := 0
Integer AVX512VL AVX512BW Arithmetic Compare packed unsigned 8-bit integers in "a" and "b", and store packed minimum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*8 IF k[j] dst[i+7:i] := MIN(a[i+7:i], b[i+7:i]) ELSE dst[i+7:i] := src[i+7:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512BW Arithmetic Compare packed unsigned 8-bit integers in "a" and "b", and store packed minimum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*8 IF k[j] dst[i+7:i] := MIN(a[i+7:i], b[i+7:i]) ELSE dst[i+7:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512BW Arithmetic Compare packed unsigned 16-bit integers in "a" and "b", and store packed minimum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*16 IF k[j] dst[i+15:i] := MIN(a[i+15:i], b[i+15:i]) ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512BW Arithmetic Compare packed unsigned 16-bit integers in "a" and "b", and store packed minimum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*16 IF k[j] dst[i+15:i] := MIN(a[i+15:i], b[i+15:i]) ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512BW Arithmetic Compare packed unsigned 16-bit integers in "a" and "b", and store packed minimum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*16 IF k[j] dst[i+15:i] := MIN(a[i+15:i], b[i+15:i]) ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512BW Arithmetic Compare packed unsigned 16-bit integers in "a" and "b", and store packed minimum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*16 IF k[j] dst[i+15:i] := MIN(a[i+15:i], b[i+15:i]) ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512BW Arithmetic Compare packed unsigned 16-bit integers in "a" and "b", and store packed minimum values in "dst". FOR j := 0 to 31 i := j*16 dst[i+15:i] := MIN(a[i+15:i], b[i+15:i]) ENDFOR dst[MAX:512] := 0
Integer AVX512VL AVX512BW Arithmetic Compare packed unsigned 16-bit integers in "a" and "b", and store packed minimum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*16 IF k[j] dst[i+15:i] := MIN(a[i+15:i], b[i+15:i]) ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512BW Arithmetic Compare packed unsigned 16-bit integers in "a" and "b", and store packed minimum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*16 IF k[j] dst[i+15:i] := MIN(a[i+15:i], b[i+15:i]) ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer Mask AVX512VL AVX512BW Miscellaneous Set each bit of mask register "k" based on the most significant bit of the corresponding packed 8-bit integer in "a". FOR j := 0 to 31 i := j*8 IF a[i+7] k[j] := 1 ELSE k[j] := 0 FI ENDFOR k[MAX:32] := 0
Integer Mask AVX512BW Miscellaneous Set each bit of mask register "k" based on the most significant bit of the corresponding packed 8-bit integer in "a". FOR j := 0 to 63 i := j*8 IF a[i+7] k[j] := 1 ELSE k[j] := 0 FI ENDFOR k[MAX:64] := 0
Integer Mask AVX512VL AVX512BW Miscellaneous Set each bit of mask register "k" based on the most significant bit of the corresponding packed 8-bit integer in "a". FOR j := 0 to 15 i := j*8 IF a[i+7] k[j] := 1 ELSE k[j] := 0 FI ENDFOR k[MAX:16] := 0
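The vector-to-mask entries above gather one bit per lane. A minimal sketch, assuming lanes are raw bit patterns in a Python list with lane 0 first; `movepi_mask_ref` is a hypothetical name:

```python
def movepi_mask_ref(lanes, bits):
    """Hypothetical scalar model: bit j of the result is the MSB of lane j."""
    k = 0
    for j, x in enumerate(lanes):
        if (x >> (bits - 1)) & 1:
            k |= 1 << j
    return k
```

Mask bits above the lane count are never set, which matches the `k[MAX:n] := 0` line in the pseudocode.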
Integer AVX512VL AVX512BW Miscellaneous Set each packed 8-bit integer in "dst" to all ones or all zeros based on the value of the corresponding bit in "k". FOR j := 0 to 31 i := j*8 IF k[j] dst[i+7:i] := 0xFF ELSE dst[i+7:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512BW Miscellaneous Set each packed 8-bit integer in "dst" to all ones or all zeros based on the value of the corresponding bit in "k". FOR j := 0 to 63 i := j*8 IF k[j] dst[i+7:i] := 0xFF ELSE dst[i+7:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512VL AVX512BW Miscellaneous Set each packed 8-bit integer in "dst" to all ones or all zeros based on the value of the corresponding bit in "k". FOR j := 0 to 15 i := j*8 IF k[j] dst[i+7:i] := 0xFF ELSE dst[i+7:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512BW Miscellaneous Set each packed 16-bit integer in "dst" to all ones or all zeros based on the value of the corresponding bit in "k". FOR j := 0 to 15 i := j*16 IF k[j] dst[i+15:i] := 0xFFFF ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512BW Miscellaneous Set each packed 16-bit integer in "dst" to all ones or all zeros based on the value of the corresponding bit in "k". FOR j := 0 to 31 i := j*16 IF k[j] dst[i+15:i] := 0xFFFF ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512VL AVX512BW Miscellaneous Set each packed 16-bit integer in "dst" to all ones or all zeros based on the value of the corresponding bit in "k". FOR j := 0 to 7 i := j*16 IF k[j] dst[i+15:i] := 0xFFFF ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:128] := 0
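The mask-to-vector entries above expand each mask bit into a full lane. A minimal sketch, with `movm_ref` as a hypothetical name and the result returned as a Python list of raw lane patterns, lane 0 first:

```python
def movm_ref(k, lanes, bits):
    """Hypothetical scalar model: lane j is all ones if mask bit j is set, else 0."""
    ones = (1 << bits) - 1
    return [ones if (k >> j) & 1 else 0 for j in range(lanes)]
```

For example, `movm_ref(0b0101, 4, 8)` yields the byte lanes 0xFF, 0x00, 0xFF, 0x00.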
Integer AVX512VL AVX512BW Convert Convert packed signed 16-bit integers in "a" to packed 8-bit integers with signed saturation, and store the results in "dst". FOR j := 0 to 15 i := 16*j l := 8*j dst[l+7:l] := Saturate8(a[i+15:i]) ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512BW Convert Convert packed signed 16-bit integers in "a" to packed 8-bit integers with signed saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := 16*j l := 8*j IF k[j] dst[l+7:l] := Saturate8(a[i+15:i]) ELSE dst[l+7:l] := src[l+7:l] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512BW Convert Store Convert packed signed 16-bit integers in "a" to packed 8-bit integers with signed saturation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr". FOR j := 0 to 15 i := 16*j l := 8*j IF k[j] MEM[base_addr+l+7:base_addr+l] := Saturate8(a[i+15:i]) FI ENDFOR
Integer AVX512VL AVX512BW Convert Convert packed signed 16-bit integers in "a" to packed 8-bit integers with signed saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := 16*j l := 8*j IF k[j] dst[l+7:l] := Saturate8(a[i+15:i]) ELSE dst[l+7:l] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512BW Convert Convert packed signed 16-bit integers in "a" to packed 8-bit integers with signed saturation, and store the results in "dst". FOR j := 0 to 31 i := 16*j l := 8*j dst[l+7:l] := Saturate8(a[i+15:i]) ENDFOR dst[MAX:256] := 0
Integer AVX512BW Convert Convert packed signed 16-bit integers in "a" to packed 8-bit integers with signed saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 31 i := 16*j l := 8*j IF k[j] dst[l+7:l] := Saturate8(a[i+15:i]) ELSE dst[l+7:l] := src[l+7:l] FI ENDFOR dst[MAX:256] := 0
Integer AVX512BW Convert Store Convert packed signed 16-bit integers in "a" to packed 8-bit integers with signed saturation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr". FOR j := 0 to 31 i := 16*j l := 8*j IF k[j] MEM[base_addr+l+7:base_addr+l] := Saturate8(a[i+15:i]) FI ENDFOR
Integer AVX512BW Convert Convert packed signed 16-bit integers in "a" to packed 8-bit integers with signed saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 31 i := 16*j l := 8*j IF k[j] dst[l+7:l] := Saturate8(a[i+15:i]) ELSE dst[l+7:l] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512BW Convert Convert packed signed 16-bit integers in "a" to packed 8-bit integers with signed saturation, and store the results in "dst". FOR j := 0 to 7 i := 16*j l := 8*j dst[l+7:l] := Saturate8(a[i+15:i]) ENDFOR dst[MAX:64] := 0
Integer AVX512VL AVX512BW Convert Convert packed signed 16-bit integers in "a" to packed 8-bit integers with signed saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := 16*j l := 8*j IF k[j] dst[l+7:l] := Saturate8(a[i+15:i]) ELSE dst[l+7:l] := src[l+7:l] FI ENDFOR dst[MAX:64] := 0
Integer AVX512VL AVX512BW Convert Store Convert packed signed 16-bit integers in "a" to packed 8-bit integers with signed saturation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr". FOR j := 0 to 7 i := 16*j l := 8*j IF k[j] MEM[base_addr+l+7:base_addr+l] := Saturate8(a[i+15:i]) FI ENDFOR
Integer AVX512VL AVX512BW Convert Convert packed signed 16-bit integers in "a" to packed 8-bit integers with signed saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := 16*j l := 8*j IF k[j] dst[l+7:l] := Saturate8(a[i+15:i]) ELSE dst[l+7:l] := 0 FI ENDFOR dst[MAX:64] := 0
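The masked-store variants of the signed saturating conversion write only the active bytes and leave the rest of memory untouched. The sketch below models that on a `bytearray`. It is a sketch under assumptions: `saturate8` and `mask_cvtsepi16_storeu_ref` are hypothetical names, "a" is a list of signed 16-bit values, and `base` is a byte offset into the buffer standing in for "base_addr".

```python
def saturate8(x):
    """Clamp an integer to the signed 8-bit range."""
    return max(-128, min(127, x))


def mask_cvtsepi16_storeu_ref(mem, base, k, a):
    """Hypothetical scalar model of the masked saturating narrow-and-store.

    mem: bytearray standing in for memory (modified in place).
    base: byte offset of the first result.
    k: integer mask, bit j controls result byte j.
    a: signed 16-bit source lanes.
    """
    for j, x in enumerate(a):
        if (k >> j) & 1:
            mem[base + j] = saturate8(x) & 0xFF  # store the two's complement byte
        # inactive lanes: memory is left unchanged
```

Note the contrast with the register forms, where an inactive lane is either copied from "src" or zeroed; here an inactive lane is simply not written.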
Integer AVX512VL AVX512BW Convert Sign extend packed 8-bit integers in "a" to packed 16-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*8 l := j*16 IF k[j] dst[l+15:l] := SignExtend16(a[i+7:i]) ELSE dst[l+15:l] := src[l+15:l] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512BW Convert Sign extend packed 8-bit integers in "a" to packed 16-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*8 l := j*16 IF k[j] dst[l+15:l] := SignExtend16(a[i+7:i]) ELSE dst[l+15:l] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512BW Convert Sign extend packed 8-bit integers in "a" to packed 16-bit integers, and store the results in "dst". FOR j := 0 to 31 i := j*8 l := j*16 dst[l+15:l] := SignExtend16(a[i+7:i]) ENDFOR dst[MAX:512] := 0
Integer AVX512BW Convert Sign extend packed 8-bit integers in "a" to packed 16-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*8 l := j*16 IF k[j] dst[l+15:l] := SignExtend16(a[i+7:i]) ELSE dst[l+15:l] := src[l+15:l] FI ENDFOR dst[MAX:512] := 0
Integer AVX512BW Convert Sign extend packed 8-bit integers in "a" to packed 16-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*8 l := j*16 IF k[j] dst[l+15:l] := SignExtend16(a[i+7:i]) ELSE dst[l+15:l] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512VL AVX512BW Convert Sign extend packed 8-bit integers in "a" to packed 16-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*8 l := j*16 IF k[j] dst[l+15:l] := SignExtend16(a[i+7:i]) ELSE dst[l+15:l] := src[l+15:l] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512BW Convert Sign extend packed 8-bit integers in "a" to packed 16-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*8 l := j*16 IF k[j] dst[l+15:l] := SignExtend16(a[i+7:i]) ELSE dst[l+15:l] := 0 FI ENDFOR dst[MAX:128] := 0
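Sign extension copies bit 7 of each source byte into the upper half of the widened lane. A minimal sketch covering the unmasked, writemask, and zeromask forms; `sign_extend16` and `cvtepi8_epi16_ref` are hypothetical names, and the `k=None` and `src=None` defaults are an assumed convention for selecting the form:

```python
def sign_extend16(byte):
    """Widen a raw byte (0..255) to a raw 16-bit pattern, replicating bit 7."""
    return (byte | 0xFF00) if (byte & 0x80) else byte


def cvtepi8_epi16_ref(a, k=None, src=None):
    """Hypothetical scalar model of packed 8-bit to 16-bit sign extension.

    k=None: unmasked. k given and src given: writemask. k given, src=None: zeromask.
    """
    dst = []
    for j, byte in enumerate(a):
        if k is None or (k >> j) & 1:
            dst.append(sign_extend16(byte))
        elif src is not None:
            dst.append(src[j])
        else:
            dst.append(0)
    return dst
```

For example, the byte 0x80 widens to 0xFF80 and the byte 0x7F widens to 0x007F.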
Integer AVX512VL AVX512BW Convert Convert packed unsigned 16-bit integers in "a" to packed unsigned 8-bit integers with unsigned saturation, and store the results in "dst". FOR j := 0 to 15 i := 16*j l := 8*j dst[l+7:l] := SaturateU8(a[i+15:i]) ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512BW Convert Convert packed unsigned 16-bit integers in "a" to packed unsigned 8-bit integers with unsigned saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := 16*j l := 8*j IF k[j] dst[l+7:l] := SaturateU8(a[i+15:i]) ELSE dst[l+7:l] := src[l+7:l] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512BW Convert Store Convert packed unsigned 16-bit integers in "a" to packed unsigned 8-bit integers with unsigned saturation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr". FOR j := 0 to 15 i := 16*j l := 8*j IF k[j] MEM[base_addr+l+7:base_addr+l] := SaturateU8(a[i+15:i]) FI ENDFOR
Integer AVX512VL AVX512BW Convert Convert packed unsigned 16-bit integers in "a" to packed unsigned 8-bit integers with unsigned saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := 16*j l := 8*j IF k[j] dst[l+7:l] := SaturateU8(a[i+15:i]) ELSE dst[l+7:l] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512BW Convert Convert packed unsigned 16-bit integers in "a" to packed unsigned 8-bit integers with unsigned saturation, and store the results in "dst". FOR j := 0 to 31 i := 16*j l := 8*j dst[l+7:l] := SaturateU8(a[i+15:i]) ENDFOR dst[MAX:256] := 0
Integer AVX512BW Convert Convert packed unsigned 16-bit integers in "a" to packed unsigned 8-bit integers with unsigned saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 31 i := 16*j l := 8*j IF k[j] dst[l+7:l] := SaturateU8(a[i+15:i]) ELSE dst[l+7:l] := src[l+7:l] FI ENDFOR dst[MAX:256] := 0
Integer AVX512BW Convert Store Convert packed unsigned 16-bit integers in "a" to packed unsigned 8-bit integers with unsigned saturation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr". FOR j := 0 to 31 i := 16*j l := 8*j IF k[j] MEM[base_addr+l+7:base_addr+l] := SaturateU8(a[i+15:i]) FI ENDFOR
Integer AVX512BW Convert Convert packed unsigned 16-bit integers in "a" to packed unsigned 8-bit integers with unsigned saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 31 i := 16*j l := 8*j IF k[j] dst[l+7:l] := SaturateU8(a[i+15:i]) ELSE dst[l+7:l] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512BW Convert Convert packed unsigned 16-bit integers in "a" to packed unsigned 8-bit integers with unsigned saturation, and store the results in "dst". FOR j := 0 to 7 i := 16*j l := 8*j dst[l+7:l] := SaturateU8(a[i+15:i]) ENDFOR dst[MAX:64] := 0
Integer AVX512VL AVX512BW Convert Convert packed unsigned 16-bit integers in "a" to packed unsigned 8-bit integers with unsigned saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := 16*j l := 8*j IF k[j] dst[l+7:l] := SaturateU8(a[i+15:i]) ELSE dst[l+7:l] := src[l+7:l] FI ENDFOR dst[MAX:64] := 0
Integer AVX512VL AVX512BW Convert Store Convert packed unsigned 16-bit integers in "a" to packed unsigned 8-bit integers with unsigned saturation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr". FOR j := 0 to 7 i := 16*j l := 8*j IF k[j] MEM[base_addr+l+7:base_addr+l] := SaturateU8(a[i+15:i]) FI ENDFOR
Integer AVX512VL AVX512BW Convert Convert packed unsigned 16-bit integers in "a" to packed unsigned 8-bit integers with unsigned saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := 16*j l := 8*j IF k[j] dst[l+7:l] := SaturateU8(a[i+15:i]) ELSE dst[l+7:l] := 0 FI ENDFOR dst[MAX:64] := 0
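The saturating conversions above share one per-lane pattern: clamp to the unsigned 8-bit range, then either keep the result, merge from "src" (writemask), or write zero (zeromask). A minimal scalar sketch of that pattern follows, assuming vectors are modeled as Python lists of lane values and the mask as an integer; the function names are hypothetical and are not the intrinsic names.

```python
def saturate_u8(x: int) -> int:
    """Clamp an unsigned 16-bit value to the unsigned 8-bit range [0, 255]."""
    return min(x & 0xFFFF, 0xFF)


def cvt_u16_to_u8_saturate(a, k=None, src=None):
    """Scalar model of the saturating narrowing records above.

    a   : list of unsigned 16-bit lanes
    k   : None for the unmasked form, else an integer whose bit j gates lane j
    src : None for the zeromask form, else a list of bytes merged in when k[j] is 0
    """
    dst = []
    for j, lane in enumerate(a):
        if k is None or (k >> j) & 1:
            dst.append(saturate_u8(lane))   # active lane: saturate
        elif src is not None:
            dst.append(src[j])              # writemask: copy from src
        else:
            dst.append(0)                   # zeromask: zero the lane
    return dst


print(cvt_u16_to_u8_saturate([0, 255, 256, 65535]))
print(cvt_u16_to_u8_saturate([300, 5], k=0b10, src=[9, 9]))
print(cvt_u16_to_u8_saturate([300, 5], k=0b01))
```

The bits above the narrowed result (dst[MAX:256] or dst[MAX:64] in the pseudocode) are not modeled; the list simply ends at the last lane.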
Integer Mask AVX512VL AVX512BW Miscellaneous Set each bit of mask register "k" based on the most significant bit of the corresponding packed 16-bit integer in "a". FOR j := 0 to 15 i := j*16 IF a[i+15] k[j] := 1 ELSE k[j] := 0 FI ENDFOR k[MAX:16] := 0
Integer Mask AVX512BW Miscellaneous Set each bit of mask register "k" based on the most significant bit of the corresponding packed 16-bit integer in "a". FOR j := 0 to 31 i := j*16 IF a[i+15] k[j] := 1 ELSE k[j] := 0 FI ENDFOR k[MAX:32] := 0
Integer Mask AVX512VL AVX512BW Miscellaneous Set each bit of mask register "k" based on the most significant bit of the corresponding packed 16-bit integer in "a". FOR j := 0 to 7 i := j*16 IF a[i+15] k[j] := 1 ELSE k[j] := 0 FI ENDFOR k[MAX:8] := 0
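The three mask-extraction records above differ only in lane count. As a rough illustration, the sketch below collects the most significant bit of each 16-bit lane into an integer mask; the list-of-lanes representation and the function name are assumptions made for this example.

```python
def msb_mask_epi16(a):
    """Return an integer whose bit j equals bit 15 of 16-bit lane j."""
    k = 0
    for j, lane in enumerate(a):
        if (lane >> 15) & 1:
            k |= 1 << j
    return k


# Lanes 0 and 2 have their sign bit set, so bits 0 and 2 of the mask are set.
print(bin(msb_mask_epi16([0x8000, 0x7FFF, 0xFFFF, 0x0000])))
```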
Integer AVX512VL AVX512BW Convert Convert packed 16-bit integers in "a" to packed 8-bit integers with truncation, and store the results in "dst". FOR j := 0 to 15 i := 16*j l := 8*j dst[l+7:l] := Truncate8(a[i+15:i]) ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512BW Convert Convert packed 16-bit integers in "a" to packed 8-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := 16*j l := 8*j IF k[j] dst[l+7:l] := Truncate8(a[i+15:i]) ELSE dst[l+7:l] := src[l+7:l] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512BW Convert Store Convert packed 16-bit integers in "a" to packed 8-bit integers with truncation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr". FOR j := 0 to 15 i := 16*j l := 8*j IF k[j] MEM[base_addr+l+7:base_addr+l] := Truncate8(a[i+15:i]) FI ENDFOR
Integer AVX512VL AVX512BW Convert Convert packed 16-bit integers in "a" to packed 8-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := 16*j l := 8*j IF k[j] dst[l+7:l] := Truncate8(a[i+15:i]) ELSE dst[l+7:l] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512BW Convert Convert packed 16-bit integers in "a" to packed 8-bit integers with truncation, and store the results in "dst". FOR j := 0 to 31 i := 16*j l := 8*j dst[l+7:l] := Truncate8(a[i+15:i]) ENDFOR dst[MAX:256] := 0
Integer AVX512BW Convert Convert packed 16-bit integers in "a" to packed 8-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 31 i := 16*j l := 8*j IF k[j] dst[l+7:l] := Truncate8(a[i+15:i]) ELSE dst[l+7:l] := src[l+7:l] FI ENDFOR dst[MAX:256] := 0
Integer AVX512BW Convert Store Convert packed 16-bit integers in "a" to packed 8-bit integers with truncation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr". FOR j := 0 to 31 i := 16*j l := 8*j IF k[j] MEM[base_addr+l+7:base_addr+l] := Truncate8(a[i+15:i]) FI ENDFOR
Integer AVX512BW Convert Convert packed 16-bit integers in "a" to packed 8-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 31 i := 16*j l := 8*j IF k[j] dst[l+7:l] := Truncate8(a[i+15:i]) ELSE dst[l+7:l] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512BW Convert Convert packed 16-bit integers in "a" to packed 8-bit integers with truncation, and store the results in "dst". FOR j := 0 to 7 i := 16*j l := 8*j dst[l+7:l] := Truncate8(a[i+15:i]) ENDFOR dst[MAX:64] := 0
Integer AVX512VL AVX512BW Convert Convert packed 16-bit integers in "a" to packed 8-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := 16*j l := 8*j IF k[j] dst[l+7:l] := Truncate8(a[i+15:i]) ELSE dst[l+7:l] := src[l+7:l] FI ENDFOR dst[MAX:64] := 0
Integer AVX512VL AVX512BW Convert Store Convert packed 16-bit integers in "a" to packed 8-bit integers with truncation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr". FOR j := 0 to 7 i := 16*j l := 8*j IF k[j] MEM[base_addr+l+7:base_addr+l] := Truncate8(a[i+15:i]) FI ENDFOR
Integer AVX512VL AVX512BW Convert Convert packed 16-bit integers in "a" to packed 8-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := 16*j l := 8*j IF k[j] dst[l+7:l] := Truncate8(a[i+15:i]) ELSE dst[l+7:l] := 0 FI ENDFOR dst[MAX:64] := 0
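Truncation keeps only the low byte of each lane, so unlike the saturating forms it can map a large value to a small one (0x0100 becomes 0x00). The sketch below models Truncate8 and the masked store form, assuming memory is a Python bytearray and "base_addr" is a byte offset into it; both function names are hypothetical.

```python
def truncate8(x: int) -> int:
    """Keep the low 8 bits of a 16-bit lane."""
    return x & 0xFF


def cvt_epi16_to_epi8_truncate(a):
    """Unmasked form: truncate every 16-bit lane to 8 bits."""
    return [truncate8(x) for x in a]


def mask_store_truncate(mem: bytearray, base_addr: int, k: int, a) -> None:
    """Masked store form: only lanes whose mask bit is set touch memory."""
    for j, lane in enumerate(a):
        if (k >> j) & 1:
            mem[base_addr + j] = truncate8(lane)


print(cvt_epi16_to_epi8_truncate([0x1234, 0x0100, 0x00FF]))

mem = bytearray([0xAA] * 4)
mask_store_truncate(mem, 0, 0b0101, [0x1234, 0x5678, 0x01FF, 0x0000])
print(mem.hex())
```

Bytes at inactive positions keep their previous contents, which is the point of the store variants: they do not read or overwrite memory for masked-off lanes in this model.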
Integer AVX512VL AVX512BW Convert Zero extend packed unsigned 8-bit integers in "a" to packed 16-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*8 l := j*16 IF k[j] dst[l+15:l] := ZeroExtend16(a[i+7:i]) ELSE dst[l+15:l] := src[l+15:l] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512BW Convert Zero extend packed unsigned 8-bit integers in "a" to packed 16-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*8 l := j*16 IF k[j] dst[l+15:l] := ZeroExtend16(a[i+7:i]) ELSE dst[l+15:l] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512BW Convert Zero extend packed unsigned 8-bit integers in "a" to packed 16-bit integers, and store the results in "dst". FOR j := 0 to 31 i := j*8 l := j*16 dst[l+15:l] := ZeroExtend16(a[i+7:i]) ENDFOR dst[MAX:512] := 0
Integer AVX512BW Convert Zero extend packed unsigned 8-bit integers in "a" to packed 16-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*8 l := j*16 IF k[j] dst[l+15:l] := ZeroExtend16(a[i+7:i]) ELSE dst[l+15:l] := src[l+15:l] FI ENDFOR dst[MAX:512] := 0
Integer AVX512BW Convert Zero extend packed unsigned 8-bit integers in "a" to packed 16-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*8 l := j*16 IF k[j] dst[l+15:l] := ZeroExtend16(a[i+7:i]) ELSE dst[l+15:l] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512VL AVX512BW Convert Zero extend packed unsigned 8-bit integers in "a" to packed 16-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*8 l := j*16 IF k[j] dst[l+15:l] := ZeroExtend16(a[i+7:i]) ELSE dst[l+15:l] := src[l+15:l] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512BW Convert Zero extend packed unsigned 8-bit integers in "a" to packed 16-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*8 l := j*16 IF k[j] dst[l+15:l] := ZeroExtend16(a[i+7:i]) ELSE dst[l+15:l] := 0 FI ENDFOR dst[MAX:128] := 0
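The widening records above index the source with i = j*8 and the destination with l = j*16. To make that bit mapping concrete, the sketch below models a vector as a single Python integer and follows the zeromask pseudocode literally; the helper "bits" and the "lanes" parameter are assumptions introduced for the example.

```python
def bits(x: int, hi: int, lo: int) -> int:
    """Extract the bit field x[hi:lo], inclusive, as in the pseudocode notation."""
    return (x >> lo) & ((1 << (hi - lo + 1)) - 1)


def maskz_zero_extend_u8_to_u16(k: int, a: int, lanes: int = 8) -> int:
    """Zeromask form: widen byte j of a into 16-bit lane j of the result."""
    dst = 0
    for j in range(lanes):
        i, l = j * 8, j * 16
        if (k >> j) & 1:
            dst |= bits(a, i + 7, i) << l   # upper 8 bits of the lane stay 0
        # inactive lanes stay zero
    return dst


# Bytes 0x80 and 0xFF widen to 0x0080 and 0x00FF; no sign bit is propagated.
print(hex(maskz_zero_extend_u8_to_u16(0b11, 0xFF80, lanes=2)))
print(hex(maskz_zero_extend_u8_to_u16(0b10, 0xFF80, lanes=2)))
```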
Integer AVX512VL AVX512BW Arithmetic Multiply packed signed 16-bit integers in "a" and "b", producing intermediate signed 32-bit integers. Truncate each intermediate integer to the 18 most significant bits, round by adding 1, and store bits [16:1] to "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*16 IF k[j] tmp[31:0] := ((SignExtend32(a[i+15:i]) * SignExtend32(b[i+15:i])) >> 14) + 1 dst[i+15:i] := tmp[16:1] ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512BW Arithmetic Multiply packed signed 16-bit integers in "a" and "b", producing intermediate signed 32-bit integers. Truncate each intermediate integer to the 18 most significant bits, round by adding 1, and store bits [16:1] to "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*16 IF k[j] tmp[31:0] := ((SignExtend32(a[i+15:i]) * SignExtend32(b[i+15:i])) >> 14) + 1 dst[i+15:i] := tmp[16:1] ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512BW Arithmetic Multiply packed signed 16-bit integers in "a" and "b", producing intermediate signed 32-bit integers. Truncate each intermediate integer to the 18 most significant bits, round by adding 1, and store bits [16:1] to "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*16 IF k[j] tmp[31:0] := ((SignExtend32(a[i+15:i]) * SignExtend32(b[i+15:i])) >> 14) + 1 dst[i+15:i] := tmp[16:1] ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512BW Arithmetic Multiply packed signed 16-bit integers in "a" and "b", producing intermediate signed 32-bit integers. Truncate each intermediate integer to the 18 most significant bits, round by adding 1, and store bits [16:1] to "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*16 IF k[j] tmp[31:0] := ((SignExtend32(a[i+15:i]) * SignExtend32(b[i+15:i])) >> 14) + 1 dst[i+15:i] := tmp[16:1] ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512BW Arithmetic Multiply packed signed 16-bit integers in "a" and "b", producing intermediate signed 32-bit integers. Truncate each intermediate integer to the 18 most significant bits, round by adding 1, and store bits [16:1] to "dst". FOR j := 0 to 31 i := j*16 tmp[31:0] := ((SignExtend32(a[i+15:i]) * SignExtend32(b[i+15:i])) >> 14) + 1 dst[i+15:i] := tmp[16:1] ENDFOR dst[MAX:512] := 0
Integer AVX512VL AVX512BW Arithmetic Multiply packed signed 16-bit integers in "a" and "b", producing intermediate signed 32-bit integers. Truncate each intermediate integer to the 18 most significant bits, round by adding 1, and store bits [16:1] to "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*16 IF k[j] tmp[31:0] := ((SignExtend32(a[i+15:i]) * SignExtend32(b[i+15:i])) >> 14) + 1 dst[i+15:i] := tmp[16:1] ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512BW Arithmetic Multiply packed signed 16-bit integers in "a" and "b", producing intermediate signed 32-bit integers. Truncate each intermediate integer to the 18 most significant bits, round by adding 1, and store bits [16:1] to "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*16 IF k[j] tmp[31:0] := ((SignExtend32(a[i+15:i]) * SignExtend32(b[i+15:i])) >> 14) + 1 dst[i+15:i] := tmp[16:1] ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:128] := 0
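The rounding multiply above is easiest to read as a Q15 fixed-point product: shift the 32-bit product right by 14, add 1, then take bits [16:1]. The per-lane sketch below follows that pseudocode directly; the function names are hypothetical, and Python's arithmetic right shift on negative integers stands in for the sign-extended 32-bit intermediate.

```python
def to_signed16(x: int) -> int:
    """Interpret the low 16 bits of x as a two's-complement value."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def mulhrs_lane(a: int, b: int) -> int:
    """One lane of the rounding high multiply, returned as a raw 16-bit pattern."""
    prod = to_signed16(a) * to_signed16(b)
    tmp = (prod >> 14) + 1        # keep the 18 most significant bits, then round
    return (tmp >> 1) & 0xFFFF    # bits [16:1] of tmp


# 0.5 * 0.5 in Q15 is 0.25 (0x2000).
print(hex(mulhrs_lane(0x4000, 0x4000)))
# -1.0 * -1.0 cannot be represented in Q15 and wraps to 0x8000.
print(hex(mulhrs_lane(0x8000, 0x8000)))
```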
Integer AVX512VL AVX512BW Arithmetic Multiply the packed unsigned 16-bit integers in "a" and "b", producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*16 IF k[j] tmp[31:0] := a[i+15:i] * b[i+15:i] dst[i+15:i] := tmp[31:16] ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512BW Arithmetic Multiply the packed unsigned 16-bit integers in "a" and "b", producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*16 IF k[j] tmp[31:0] := a[i+15:i] * b[i+15:i] dst[i+15:i] := tmp[31:16] ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512BW Arithmetic Multiply the packed unsigned 16-bit integers in "a" and "b", producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*16 IF k[j] tmp[31:0] := a[i+15:i] * b[i+15:i] dst[i+15:i] := tmp[31:16] ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512BW Arithmetic Multiply the packed unsigned 16-bit integers in "a" and "b", producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*16 IF k[j] tmp[31:0] := a[i+15:i] * b[i+15:i] dst[i+15:i] := tmp[31:16] ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512BW Arithmetic Multiply the packed unsigned 16-bit integers in "a" and "b", producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in "dst". FOR j := 0 to 31 i := j*16 tmp[31:0] := a[i+15:i] * b[i+15:i] dst[i+15:i] := tmp[31:16] ENDFOR dst[MAX:512] := 0
Integer AVX512VL AVX512BW Arithmetic Multiply the packed unsigned 16-bit integers in "a" and "b", producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*16 IF k[j] tmp[31:0] := a[i+15:i] * b[i+15:i] dst[i+15:i] := tmp[31:16] ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512BW Arithmetic Multiply the packed unsigned 16-bit integers in "a" and "b", producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*16 IF k[j] tmp[31:0] := a[i+15:i] * b[i+15:i] dst[i+15:i] := tmp[31:16] ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512BW Arithmetic Multiply the packed signed 16-bit integers in "a" and "b", producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*16 IF k[j] tmp[31:0] := SignExtend32(a[i+15:i]) * SignExtend32(b[i+15:i]) dst[i+15:i] := tmp[31:16] ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512BW Arithmetic Multiply the packed signed 16-bit integers in "a" and "b", producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*16 IF k[j] tmp[31:0] := SignExtend32(a[i+15:i]) * SignExtend32(b[i+15:i]) dst[i+15:i] := tmp[31:16] ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512BW Arithmetic Multiply the packed signed 16-bit integers in "a" and "b", producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*16 IF k[j] tmp[31:0] := SignExtend32(a[i+15:i]) * SignExtend32(b[i+15:i]) dst[i+15:i] := tmp[31:16] ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512BW Arithmetic Multiply the packed signed 16-bit integers in "a" and "b", producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*16 IF k[j] tmp[31:0] := SignExtend32(a[i+15:i]) * SignExtend32(b[i+15:i]) dst[i+15:i] := tmp[31:16] ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512BW Arithmetic Multiply the packed signed 16-bit integers in "a" and "b", producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in "dst". FOR j := 0 to 31 i := j*16 tmp[31:0] := SignExtend32(a[i+15:i]) * SignExtend32(b[i+15:i]) dst[i+15:i] := tmp[31:16] ENDFOR dst[MAX:512] := 0
Integer AVX512VL AVX512BW Arithmetic Multiply the packed signed 16-bit integers in "a" and "b", producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*16 IF k[j] tmp[31:0] := SignExtend32(a[i+15:i]) * SignExtend32(b[i+15:i]) dst[i+15:i] := tmp[31:16] ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512BW Arithmetic Multiply the packed signed 16-bit integers in "a" and "b", producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*16 IF k[j] tmp[31:0] := SignExtend32(a[i+15:i]) * SignExtend32(b[i+15:i]) dst[i+15:i] := tmp[31:16] ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:128] := 0
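The unsigned and signed high-half multiplies above differ only in how the 16-bit inputs are interpreted before multiplying. A small per-lane sketch of the difference, with hypothetical function names, assuming lanes are passed as raw 16-bit patterns:

```python
def to_signed16(x: int) -> int:
    """Interpret the low 16 bits of x as a two's-complement value."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def mulhi_unsigned_lane(a: int, b: int) -> int:
    """High 16 bits of the unsigned 32-bit product."""
    return ((a & 0xFFFF) * (b & 0xFFFF)) >> 16


def mulhi_signed_lane(a: int, b: int) -> int:
    """High 16 bits of the signed 32-bit product, as a raw 16-bit pattern."""
    return ((to_signed16(a) * to_signed16(b)) >> 16) & 0xFFFF


# 0xFFFF is 65535 when unsigned but -1 when signed, so the high halves differ.
print(hex(mulhi_unsigned_lane(0xFFFF, 0x0002)))
print(hex(mulhi_signed_lane(0xFFFF, 0x0002)))
```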
Integer AVX512VL AVX512BW Arithmetic Multiply the packed 16-bit integers in "a" and "b", producing intermediate 32-bit integers, and store the low 16 bits of the intermediate integers in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*16 IF k[j] tmp[31:0] := SignExtend32(a[i+15:i]) * SignExtend32(b[i+15:i]) dst[i+15:i] := tmp[15:0] ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512BW Arithmetic Multiply the packed 16-bit integers in "a" and "b", producing intermediate 32-bit integers, and store the low 16 bits of the intermediate integers in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*16 IF k[j] tmp[31:0] := SignExtend32(a[i+15:i]) * SignExtend32(b[i+15:i]) dst[i+15:i] := tmp[15:0] ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512BW Arithmetic Multiply the packed 16-bit integers in "a" and "b", producing intermediate 32-bit integers, and store the low 16 bits of the intermediate integers in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*16 IF k[j] tmp[31:0] := SignExtend32(a[i+15:i]) * SignExtend32(b[i+15:i]) dst[i+15:i] := tmp[15:0] ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512BW Arithmetic Multiply the packed 16-bit integers in "a" and "b", producing intermediate 32-bit integers, and store the low 16 bits of the intermediate integers in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*16 IF k[j] tmp[31:0] := SignExtend32(a[i+15:i]) * SignExtend32(b[i+15:i]) dst[i+15:i] := tmp[15:0] ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512BW Arithmetic Multiply the packed 16-bit integers in "a" and "b", producing intermediate 32-bit integers, and store the low 16 bits of the intermediate integers in "dst". FOR j := 0 to 31 i := j*16 tmp[31:0] := SignExtend32(a[i+15:i]) * SignExtend32(b[i+15:i]) dst[i+15:i] := tmp[15:0] ENDFOR dst[MAX:512] := 0
Integer AVX512VL AVX512BW Arithmetic Multiply the packed 16-bit integers in "a" and "b", producing intermediate 32-bit integers, and store the low 16 bits of the intermediate integers in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*16 IF k[j] tmp[31:0] := SignExtend32(a[i+15:i]) * SignExtend32(b[i+15:i]) dst[i+15:i] := tmp[15:0] ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512BW Arithmetic Multiply the packed 16-bit integers in "a" and "b", producing intermediate 32-bit integers, and store the low 16 bits of the intermediate integers in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*16 IF k[j] tmp[31:0] := SignExtend32(a[i+15:i]) * SignExtend32(b[i+15:i]) dst[i+15:i] := tmp[15:0] ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:128] := 0
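Although the low-half pseudocode sign-extends its inputs, the low 16 bits of a product are the same under signed and unsigned interpretation, which is presumably why the descriptions above say only "16-bit integers". The sketch below checks that property for a few values and shows the low half combining with a signed high half into the full 32-bit product; all names are hypothetical.

```python
def to_signed16(x: int) -> int:
    """Interpret the low 16 bits of x as a two's-complement value."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def mullo_lane(a: int, b: int) -> int:
    """Low 16 bits of the sign-extended product, as in the pseudocode."""
    return (to_signed16(a) * to_signed16(b)) & 0xFFFF


def mulhi_signed_lane(a: int, b: int) -> int:
    """High 16 bits of the signed product, as a raw 16-bit pattern."""
    return ((to_signed16(a) * to_signed16(b)) >> 16) & 0xFFFF


for a, b in [(0xFFFF, 0x0003), (0x8000, 0x8000), (0x1234, 0xFEDC)]:
    unsigned_low = ((a & 0xFFFF) * (b & 0xFFFF)) & 0xFFFF
    full = (mulhi_signed_lane(a, b) << 16) | mullo_lane(a, b)
    expected = (to_signed16(a) * to_signed16(b)) & 0xFFFFFFFF
    print(hex(a), hex(b), mullo_lane(a, b) == unsigned_low, full == expected)
```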
Integer AVX512BW Miscellaneous Compute the absolute differences of packed unsigned 8-bit integers in "a" and "b", then horizontally sum each consecutive 8 differences to produce eight unsigned 16-bit integers, and pack these unsigned 16-bit integers in the low 16 bits of 64-bit elements in "dst". FOR j := 0 to 63 i := j*8 tmp[i+7:i] := ABS(a[i+7:i] - b[i+7:i]) ENDFOR FOR j := 0 to 7 i := j*64 dst[i+15:i] := tmp[i+7:i] + tmp[i+15:i+8] + tmp[i+23:i+16] + tmp[i+31:i+24] + tmp[i+39:i+32] + tmp[i+47:i+40] + tmp[i+55:i+48] + tmp[i+63:i+56] dst[i+63:i+16] := 0 ENDFOR dst[MAX:512] := 0
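The sum-of-absolute-differences record above works in two passes: per-byte absolute differences, then one sum per group of eight bytes. A minimal scalar sketch, assuming the inputs are Python bytes objects whose length is a multiple of 8 (64 bytes for the 512-bit form); the function name is hypothetical.

```python
def sad_u8(a: bytes, b: bytes):
    """Return one sum of absolute byte differences per 8-byte group.

    Each sum is at most 8 * 255 = 2040, so it always fits in the low
    16 bits of its 64-bit destination element.
    """
    if len(a) != len(b) or len(a) % 8 != 0:
        raise ValueError("inputs must have equal length, a multiple of 8")
    sums = []
    for g in range(0, len(a), 8):
        sums.append(sum(abs(x - y) for x, y in zip(a[g:g + 8], b[g:g + 8])))
    return sums


a = bytes(range(8)) + bytes([255] * 8)
b = bytes(16)
print(sad_u8(a, b))
```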
Integer AVX512VL AVX512BW Miscellaneous Shuffle packed 8-bit integers in "a" according to shuffle control mask in the corresponding 8-bit element of "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*8 IF k[j] IF b[i+7] == 1 dst[i+7:i] := 0 ELSE index[4:0] := b[i+3:i] + (j & 0x10) dst[i+7:i] := a[index*8+7:index*8] FI ELSE dst[i+7:i] := src[i+7:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512BW Miscellaneous Shuffle packed 8-bit integers in "a" according to shuffle control mask in the corresponding 8-bit element of "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*8 IF k[j] IF b[i+7] == 1 dst[i+7:i] := 0 ELSE index[4:0] := b[i+3:i] + (j & 0x10) dst[i+7:i] := a[index*8+7:index*8] FI ELSE dst[i+7:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512BW Miscellaneous Shuffle 8-bit integers in "a" within 128-bit lanes using the control in the corresponding 8-bit element of "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 63 i := j*8 IF k[j] IF b[i+7] == 1 dst[i+7:i] := 0 ELSE index[5:0] := b[i+3:i] + (j & 0x30) dst[i+7:i] := a[index*8+7:index*8] FI ELSE dst[i+7:i] := src[i+7:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512BW Miscellaneous Shuffle packed 8-bit integers in "a" according to shuffle control mask in the corresponding 8-bit element of "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 63 i := j*8 IF k[j] IF b[i+7] == 1 dst[i+7:i] := 0 ELSE index[5:0] := b[i+3:i] + (j & 0x30) dst[i+7:i] := a[index*8+7:index*8] FI ELSE dst[i+7:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512BW Miscellaneous Shuffle packed 8-bit integers in "a" according to shuffle control mask in the corresponding 8-bit element of "b", and store the results in "dst". FOR j := 0 to 63 i := j*8 IF b[i+7] == 1 dst[i+7:i] := 0 ELSE index[5:0] := b[i+3:i] + (j & 0x30) dst[i+7:i] := a[index*8+7:index*8] FI ENDFOR dst[MAX:512] := 0
Integer AVX512VL AVX512BW Miscellaneous Shuffle packed 8-bit integers in "a" according to shuffle control mask in the corresponding 8-bit element of "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*8 IF k[j] IF b[i+7] == 1 dst[i+7:i] := 0 ELSE index[3:0] := b[i+3:i] dst[i+7:i] := a[index*8+7:index*8] FI ELSE dst[i+7:i] := src[i+7:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512BW Miscellaneous Shuffle packed 8-bit integers in "a" according to shuffle control mask in the corresponding 8-bit element of "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*8 IF k[j] IF b[i+7] == 1 dst[i+7:i] := 0 ELSE index[3:0] := b[i+3:i] dst[i+7:i] := a[index*8+7:index*8] FI ELSE dst[i+7:i] := 0 FI ENDFOR dst[MAX:128] := 0
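In the byte-shuffle pseudocode above, the terms (j & 0x10) and (j & 0x30) add the base of the current 128-bit lane, so a control byte can only select from its own lane. The sketch below generalizes that as j & ~0x0F, which equals those expressions for the 256-bit and 512-bit widths; the bytes-based representation and the function name are assumptions for illustration.

```python
def shuffle_bytes_in_lanes(a: bytes, b: bytes) -> bytes:
    """Unmasked byte shuffle within 16-byte lanes, following the pseudocode."""
    if len(a) != len(b):
        raise ValueError("a and b must have the same length")
    dst = bytearray(len(a))
    for j in range(len(a)):
        if b[j] & 0x80:
            dst[j] = 0                       # control bit 7 set: zero the byte
        else:
            lane_base = j & ~0x0F            # first byte index of this lane
            dst[j] = a[lane_base + (b[j] & 0x0F)]
    return bytes(dst)


a = bytes(range(32))
reverse_in_lane = bytes(15 - (j % 16) for j in range(32))
print(list(shuffle_bytes_in_lanes(a, reverse_in_lane)))
```

With this control vector each lane is reversed independently; byte 0 of the result comes from byte 15 of the input, and byte 16 comes from byte 31.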
Integer AVX512VL AVX512BW Miscellaneous Shuffle 16-bit integers in the high 64 bits of 128-bit lanes of "a" using the control in "imm8". Store the results in the high 64 bits of 128-bit lanes of "dst", with the low 64 bits of 128-bit lanes being copied from "a" to "dst", using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). tmp_dst[63:0] := a[63:0] tmp_dst[79:64] := (a >> (imm8[1:0] * 16))[79:64] tmp_dst[95:80] := (a >> (imm8[3:2] * 16))[79:64] tmp_dst[111:96] := (a >> (imm8[5:4] * 16))[79:64] tmp_dst[127:112] := (a >> (imm8[7:6] * 16))[79:64] tmp_dst[191:128] := a[191:128] tmp_dst[207:192] := (a >> (imm8[1:0] * 16))[207:192] tmp_dst[223:208] := (a >> (imm8[3:2] * 16))[207:192] tmp_dst[239:224] := (a >> (imm8[5:4] * 16))[207:192] tmp_dst[255:240] := (a >> (imm8[7:6] * 16))[207:192] FOR j := 0 to 15 i := j*16 IF k[j] dst[i+15:i] := tmp_dst[i+15:i] ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512BW Miscellaneous Shuffle 16-bit integers in the high 64 bits of 128-bit lanes of "a" using the control in "imm8". Store the results in the high 64 bits of 128-bit lanes of "dst", with the low 64 bits of 128-bit lanes being copied from "a" to "dst", using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). tmp_dst[63:0] := a[63:0] tmp_dst[79:64] := (a >> (imm8[1:0] * 16))[79:64] tmp_dst[95:80] := (a >> (imm8[3:2] * 16))[79:64] tmp_dst[111:96] := (a >> (imm8[5:4] * 16))[79:64] tmp_dst[127:112] := (a >> (imm8[7:6] * 16))[79:64] tmp_dst[191:128] := a[191:128] tmp_dst[207:192] := (a >> (imm8[1:0] * 16))[207:192] tmp_dst[223:208] := (a >> (imm8[3:2] * 16))[207:192] tmp_dst[239:224] := (a >> (imm8[5:4] * 16))[207:192] tmp_dst[255:240] := (a >> (imm8[7:6] * 16))[207:192] FOR j := 0 to 15 i := j*16 IF k[j] dst[i+15:i] := tmp_dst[i+15:i] ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512BW Miscellaneous Shuffle 16-bit integers in the high 64 bits of 128-bit lanes of "a" using the control in "imm8". Store the results in the high 64 bits of 128-bit lanes of "dst", with the low 64 bits of 128-bit lanes being copied from "a" to "dst", using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). tmp_dst[63:0] := a[63:0] tmp_dst[79:64] := (a >> (imm8[1:0] * 16))[79:64] tmp_dst[95:80] := (a >> (imm8[3:2] * 16))[79:64] tmp_dst[111:96] := (a >> (imm8[5:4] * 16))[79:64] tmp_dst[127:112] := (a >> (imm8[7:6] * 16))[79:64] tmp_dst[191:128] := a[191:128] tmp_dst[207:192] := (a >> (imm8[1:0] * 16))[207:192] tmp_dst[223:208] := (a >> (imm8[3:2] * 16))[207:192] tmp_dst[239:224] := (a >> (imm8[5:4] * 16))[207:192] tmp_dst[255:240] := (a >> (imm8[7:6] * 16))[207:192] tmp_dst[319:256] := a[319:256] tmp_dst[335:320] := (a >> (imm8[1:0] * 16))[335:320] tmp_dst[351:336] := (a >> (imm8[3:2] * 16))[335:320] tmp_dst[367:352] := (a >> (imm8[5:4] * 16))[335:320] tmp_dst[383:368] := (a >> (imm8[7:6] * 16))[335:320] tmp_dst[447:384] := a[447:384] tmp_dst[463:448] := (a >> (imm8[1:0] * 16))[463:448] tmp_dst[479:464] := (a >> (imm8[3:2] * 16))[463:448] tmp_dst[495:480] := (a >> (imm8[5:4] * 16))[463:448] tmp_dst[511:496] := (a >> (imm8[7:6] * 16))[463:448] FOR j := 0 to 31 i := j*16 IF k[j] dst[i+15:i] := tmp_dst[i+15:i] ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512BW Miscellaneous Shuffle 16-bit integers in the high 64 bits of 128-bit lanes of "a" using the control in "imm8". Store the results in the high 64 bits of 128-bit lanes of "dst", with the low 64 bits of 128-bit lanes being copied from "a" to "dst", using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). tmp_dst[63:0] := a[63:0] tmp_dst[79:64] := (a >> (imm8[1:0] * 16))[79:64] tmp_dst[95:80] := (a >> (imm8[3:2] * 16))[79:64] tmp_dst[111:96] := (a >> (imm8[5:4] * 16))[79:64] tmp_dst[127:112] := (a >> (imm8[7:6] * 16))[79:64] tmp_dst[191:128] := a[191:128] tmp_dst[207:192] := (a >> (imm8[1:0] * 16))[207:192] tmp_dst[223:208] := (a >> (imm8[3:2] * 16))[207:192] tmp_dst[239:224] := (a >> (imm8[5:4] * 16))[207:192] tmp_dst[255:240] := (a >> (imm8[7:6] * 16))[207:192] tmp_dst[319:256] := a[319:256] tmp_dst[335:320] := (a >> (imm8[1:0] * 16))[335:320] tmp_dst[351:336] := (a >> (imm8[3:2] * 16))[335:320] tmp_dst[367:352] := (a >> (imm8[5:4] * 16))[335:320] tmp_dst[383:368] := (a >> (imm8[7:6] * 16))[335:320] tmp_dst[447:384] := a[447:384] tmp_dst[463:448] := (a >> (imm8[1:0] * 16))[463:448] tmp_dst[479:464] := (a >> (imm8[3:2] * 16))[463:448] tmp_dst[495:480] := (a >> (imm8[5:4] * 16))[463:448] tmp_dst[511:496] := (a >> (imm8[7:6] * 16))[463:448] FOR j := 0 to 31 i := j*16 IF k[j] dst[i+15:i] := tmp_dst[i+15:i] ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512BW Miscellaneous Shuffle 16-bit integers in the high 64 bits of 128-bit lanes of "a" using the control in "imm8". Store the results in the high 64 bits of 128-bit lanes of "dst", with the low 64 bits of 128-bit lanes being copied from "a" to "dst". dst[63:0] := a[63:0] dst[79:64] := (a >> (imm8[1:0] * 16))[79:64] dst[95:80] := (a >> (imm8[3:2] * 16))[79:64] dst[111:96] := (a >> (imm8[5:4] * 16))[79:64] dst[127:112] := (a >> (imm8[7:6] * 16))[79:64] dst[191:128] := a[191:128] dst[207:192] := (a >> (imm8[1:0] * 16))[207:192] dst[223:208] := (a >> (imm8[3:2] * 16))[207:192] dst[239:224] := (a >> (imm8[5:4] * 16))[207:192] dst[255:240] := (a >> (imm8[7:6] * 16))[207:192] dst[319:256] := a[319:256] dst[335:320] := (a >> (imm8[1:0] * 16))[335:320] dst[351:336] := (a >> (imm8[3:2] * 16))[335:320] dst[367:352] := (a >> (imm8[5:4] * 16))[335:320] dst[383:368] := (a >> (imm8[7:6] * 16))[335:320] dst[447:384] := a[447:384] dst[463:448] := (a >> (imm8[1:0] * 16))[463:448] dst[479:464] := (a >> (imm8[3:2] * 16))[463:448] dst[495:480] := (a >> (imm8[5:4] * 16))[463:448] dst[511:496] := (a >> (imm8[7:6] * 16))[463:448] dst[MAX:512] := 0
Integer AVX512VL AVX512BW Miscellaneous Shuffle 16-bit integers in the high 64 bits of "a" using the control in "imm8". Store the results in the high 64 bits of "dst", with the low 64 bits being copied from "a" to "dst", using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). tmp_dst[63:0] := a[63:0] tmp_dst[79:64] := (a >> (imm8[1:0] * 16))[79:64] tmp_dst[95:80] := (a >> (imm8[3:2] * 16))[79:64] tmp_dst[111:96] := (a >> (imm8[5:4] * 16))[79:64] tmp_dst[127:112] := (a >> (imm8[7:6] * 16))[79:64] FOR j := 0 to 7 i := j*16 IF k[j] dst[i+15:i] := tmp_dst[i+15:i] ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512BW Miscellaneous Shuffle 16-bit integers in the high 64 bits of "a" using the control in "imm8". Store the results in the high 64 bits of "dst", with the low 64 bits being copied from "a" to "dst", using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). tmp_dst[63:0] := a[63:0] tmp_dst[79:64] := (a >> (imm8[1:0] * 16))[79:64] tmp_dst[95:80] := (a >> (imm8[3:2] * 16))[79:64] tmp_dst[111:96] := (a >> (imm8[5:4] * 16))[79:64] tmp_dst[127:112] := (a >> (imm8[7:6] * 16))[79:64] FOR j := 0 to 7 i := j*16 IF k[j] dst[i+15:i] := tmp_dst[i+15:i] ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:128] := 0
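The long shift-and-slice expressions in the high-half shuffle records above reduce to a simple rule: within each 128-bit lane, words 0 to 3 are copied unchanged, and word 4+n is taken from word 4+sel, where sel is the n-th 2-bit field of "imm8". A scalar sketch of the unmasked rule, assuming the vector is a Python list of 16-bit words with eight words per lane; the function name is hypothetical.

```python
def shuffle_high_words(a, imm8: int):
    """Shuffle the upper four 16-bit words of every 8-word (128-bit) lane."""
    if len(a) % 8 != 0:
        raise ValueError("length must be a multiple of 8 words")
    dst = list(a)                             # low four words of each lane pass through
    for lane in range(0, len(a), 8):
        for n in range(4):
            sel = (imm8 >> (2 * n)) & 0x3     # imm8[2n+1:2n]
            dst[lane + 4 + n] = a[lane + 4 + sel]
    return dst


# 0x1B = 0b00_01_10_11 reverses the high four words of each lane.
print(shuffle_high_words(list(range(8)), 0x1B))
# 0xE4 = 0b11_10_01_00 is the identity selection.
print(shuffle_high_words(list(range(8)), 0xE4))
```

The writemask and zeromask forms then apply the usual per-word merge or zeroing to this intermediate result, which the pseudocode calls tmp_dst.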
Integer AVX512VL AVX512BW Miscellaneous Shuffle 16-bit integers in the low 64 bits of 128-bit lanes of "a" using the control in "imm8". Store the results in the low 64 bits of 128-bit lanes of "dst", with the high 64 bits of 128-bit lanes being copied from "a" to "dst", using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). tmp_dst[15:0] := (a >> (imm8[1:0] * 16))[15:0] tmp_dst[31:16] := (a >> (imm8[3:2] * 16))[15:0] tmp_dst[47:32] := (a >> (imm8[5:4] * 16))[15:0] tmp_dst[63:48] := (a >> (imm8[7:6] * 16))[15:0] tmp_dst[127:64] := a[127:64] tmp_dst[143:128] := (a >> (imm8[1:0] * 16))[143:128] tmp_dst[159:144] := (a >> (imm8[3:2] * 16))[143:128] tmp_dst[175:160] := (a >> (imm8[5:4] * 16))[143:128] tmp_dst[191:176] := (a >> (imm8[7:6] * 16))[143:128] tmp_dst[255:192] := a[255:192] FOR j := 0 to 15 i := j*16 IF k[j] dst[i+15:i] := tmp_dst[i+15:i] ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512BW Miscellaneous Shuffle 16-bit integers in the low 64 bits of 128-bit lanes of "a" using the control in "imm8". Store the results in the low 64 bits of 128-bit lanes of "dst", with the high 64 bits of 128-bit lanes being copied from "a" to "dst", using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). tmp_dst[15:0] := (a >> (imm8[1:0] * 16))[15:0] tmp_dst[31:16] := (a >> (imm8[3:2] * 16))[15:0] tmp_dst[47:32] := (a >> (imm8[5:4] * 16))[15:0] tmp_dst[63:48] := (a >> (imm8[7:6] * 16))[15:0] tmp_dst[127:64] := a[127:64] tmp_dst[143:128] := (a >> (imm8[1:0] * 16))[143:128] tmp_dst[159:144] := (a >> (imm8[3:2] * 16))[143:128] tmp_dst[175:160] := (a >> (imm8[5:4] * 16))[143:128] tmp_dst[191:176] := (a >> (imm8[7:6] * 16))[143:128] tmp_dst[255:192] := a[255:192] FOR j := 0 to 15 i := j*16 IF k[j] dst[i+15:i] := tmp_dst[i+15:i] ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512BW Miscellaneous Shuffle 16-bit integers in the low 64 bits of 128-bit lanes of "a" using the control in "imm8". Store the results in the low 64 bits of 128-bit lanes of "dst", with the high 64 bits of 128-bit lanes being copied from "a" to "dst", using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). tmp_dst[15:0] := (a >> (imm8[1:0] * 16))[15:0] tmp_dst[31:16] := (a >> (imm8[3:2] * 16))[15:0] tmp_dst[47:32] := (a >> (imm8[5:4] * 16))[15:0] tmp_dst[63:48] := (a >> (imm8[7:6] * 16))[15:0] tmp_dst[127:64] := a[127:64] tmp_dst[143:128] := (a >> (imm8[1:0] * 16))[143:128] tmp_dst[159:144] := (a >> (imm8[3:2] * 16))[143:128] tmp_dst[175:160] := (a >> (imm8[5:4] * 16))[143:128] tmp_dst[191:176] := (a >> (imm8[7:6] * 16))[143:128] tmp_dst[255:192] := a[255:192] tmp_dst[271:256] := (a >> (imm8[1:0] * 16))[271:256] tmp_dst[287:272] := (a >> (imm8[3:2] * 16))[271:256] tmp_dst[303:288] := (a >> (imm8[5:4] * 16))[271:256] tmp_dst[319:304] := (a >> (imm8[7:6] * 16))[271:256] tmp_dst[383:320] := a[383:320] tmp_dst[399:384] := (a >> (imm8[1:0] * 16))[399:384] tmp_dst[415:400] := (a >> (imm8[3:2] * 16))[399:384] tmp_dst[431:416] := (a >> (imm8[5:4] * 16))[399:384] tmp_dst[447:432] := (a >> (imm8[7:6] * 16))[399:384] tmp_dst[511:448] := a[511:448] FOR j := 0 to 31 i := j*16 IF k[j] dst[i+15:i] := tmp_dst[i+15:i] ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512BW Miscellaneous Shuffle 16-bit integers in the low 64 bits of 128-bit lanes of "a" using the control in "imm8". Store the results in the low 64 bits of 128-bit lanes of "dst", with the high 64 bits of 128-bit lanes being copied from "a" to "dst", using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). tmp_dst[15:0] := (a >> (imm8[1:0] * 16))[15:0] tmp_dst[31:16] := (a >> (imm8[3:2] * 16))[15:0] tmp_dst[47:32] := (a >> (imm8[5:4] * 16))[15:0] tmp_dst[63:48] := (a >> (imm8[7:6] * 16))[15:0] tmp_dst[127:64] := a[127:64] tmp_dst[143:128] := (a >> (imm8[1:0] * 16))[143:128] tmp_dst[159:144] := (a >> (imm8[3:2] * 16))[143:128] tmp_dst[175:160] := (a >> (imm8[5:4] * 16))[143:128] tmp_dst[191:176] := (a >> (imm8[7:6] * 16))[143:128] tmp_dst[255:192] := a[255:192] tmp_dst[271:256] := (a >> (imm8[1:0] * 16))[271:256] tmp_dst[287:272] := (a >> (imm8[3:2] * 16))[271:256] tmp_dst[303:288] := (a >> (imm8[5:4] * 16))[271:256] tmp_dst[319:304] := (a >> (imm8[7:6] * 16))[271:256] tmp_dst[383:320] := a[383:320] tmp_dst[399:384] := (a >> (imm8[1:0] * 16))[399:384] tmp_dst[415:400] := (a >> (imm8[3:2] * 16))[399:384] tmp_dst[431:416] := (a >> (imm8[5:4] * 16))[399:384] tmp_dst[447:432] := (a >> (imm8[7:6] * 16))[399:384] tmp_dst[511:448] := a[511:448] FOR j := 0 to 31 i := j*16 IF k[j] dst[i+15:i] := tmp_dst[i+15:i] ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512BW Miscellaneous Shuffle 16-bit integers in the low 64 bits of 128-bit lanes of "a" using the control in "imm8". Store the results in the low 64 bits of 128-bit lanes of "dst", with the high 64 bits of 128-bit lanes being copied from "a" to "dst". dst[15:0] := (a >> (imm8[1:0] * 16))[15:0] dst[31:16] := (a >> (imm8[3:2] * 16))[15:0] dst[47:32] := (a >> (imm8[5:4] * 16))[15:0] dst[63:48] := (a >> (imm8[7:6] * 16))[15:0] dst[127:64] := a[127:64] dst[143:128] := (a >> (imm8[1:0] * 16))[143:128] dst[159:144] := (a >> (imm8[3:2] * 16))[143:128] dst[175:160] := (a >> (imm8[5:4] * 16))[143:128] dst[191:176] := (a >> (imm8[7:6] * 16))[143:128] dst[255:192] := a[255:192] dst[271:256] := (a >> (imm8[1:0] * 16))[271:256] dst[287:272] := (a >> (imm8[3:2] * 16))[271:256] dst[303:288] := (a >> (imm8[5:4] * 16))[271:256] dst[319:304] := (a >> (imm8[7:6] * 16))[271:256] dst[383:320] := a[383:320] dst[399:384] := (a >> (imm8[1:0] * 16))[399:384] dst[415:400] := (a >> (imm8[3:2] * 16))[399:384] dst[431:416] := (a >> (imm8[5:4] * 16))[399:384] dst[447:432] := (a >> (imm8[7:6] * 16))[399:384] dst[511:448] := a[511:448] dst[MAX:512] := 0
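The unrolled pseudocode for the 256-bit and 512-bit low-half shuffles repeats one pattern per 128-bit lane, and the same "imm8" selectors are reused in every lane. A compact scalar sketch of that pattern follows; `shufflelo_lanes` is a hypothetical name, and the input is assumed to be a list of 16-bit words whose length is a multiple of 8 (8 words per lane).

```python
def shufflelo_lanes(a, imm8):
    """Shuffle the low four words of every 8-word (128-bit) lane of a."""
    out = []
    for base in range(0, len(a), 8):
        lane = a[base:base + 8]
        # Each 2-bit field of imm8 picks one of the lane's own low four words.
        low = [lane[(imm8 >> (2 * n)) & 0b11] for n in range(4)]
        out.extend(low + lane[4:])         # high four words are copied as-is
    return out
```

Because selection never crosses a lane boundary, a 32-word (512-bit) input behaves like four independent 8-word shuffles that share one control byte.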
Integer AVX512VL AVX512BW Miscellaneous Shuffle 16-bit integers in the low 64 bits of "a" using the control in "imm8". Store the results in the low 64 bits of "dst", with the high 64 bits being copied from "a" to "dst", using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). tmp_dst[15:0] := (a >> (imm8[1:0] * 16))[15:0] tmp_dst[31:16] := (a >> (imm8[3:2] * 16))[15:0] tmp_dst[47:32] := (a >> (imm8[5:4] * 16))[15:0] tmp_dst[63:48] := (a >> (imm8[7:6] * 16))[15:0] tmp_dst[127:64] := a[127:64] FOR j := 0 to 7 i := j*16 IF k[j] dst[i+15:i] := tmp_dst[i+15:i] ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512BW Miscellaneous Shuffle 16-bit integers in the low 64 bits of "a" using the control in "imm8". Store the results in the low 64 bits of "dst", with the high 64 bits being copied from "a" to "dst", using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). tmp_dst[15:0] := (a >> (imm8[1:0] * 16))[15:0] tmp_dst[31:16] := (a >> (imm8[3:2] * 16))[15:0] tmp_dst[47:32] := (a >> (imm8[5:4] * 16))[15:0] tmp_dst[63:48] := (a >> (imm8[7:6] * 16))[15:0] tmp_dst[127:64] := a[127:64] FOR j := 0 to 7 i := j*16 IF k[j] dst[i+15:i] := tmp_dst[i+15:i] ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512BW Shift Shift 128-bit lanes in "a" left by "imm8" bytes while shifting in zeros, and store the results in "dst". tmp := imm8[7:0] IF tmp > 15 tmp := 16 FI dst[127:0] := a[127:0] << (tmp*8) dst[255:128] := a[255:128] << (tmp*8) dst[383:256] := a[383:256] << (tmp*8) dst[511:384] := a[511:384] << (tmp*8) dst[MAX:512] := 0
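The lane-wise byte shift differs from a whole-register shift in that bytes pushed out of one 128-bit lane are discarded rather than carried into the next lane. The sketch below models this with Python integers; `byte_shift_left_lanes` is a hypothetical name, and the register is assumed to be passed as one unsigned integer of `lanes * 128` bits.

```python
LANE_BITS = 128
LANE_MASK = (1 << LANE_BITS) - 1


def byte_shift_left_lanes(a, imm8, lanes=4):
    """Shift each 128-bit lane of a left by imm8 bytes, shifting in zeros."""
    shift_bits = min(imm8 & 0xFF, 16) * 8  # counts above 15 clear the lane
    dst = 0
    for n in range(lanes):
        lane = (a >> (LANE_BITS * n)) & LANE_MASK
        shifted = (lane << shift_bits) & LANE_MASK   # drop bits leaving the lane
        dst |= shifted << (LANE_BITS * n)
    return dst
```

For example, a value whose only nonzero byte is the top byte of lane 0 becomes zero after a one-byte shift; nothing appears in lane 1.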
Integer AVX512VL AVX512BW Shift Shift packed 16-bit integers in "a" left by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*16 IF k[j] IF count[i+15:i] < 16 dst[i+15:i] := ZeroExtend16(a[i+15:i] << count[i+15:i]) ELSE dst[i+15:i] := 0 FI ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512BW Shift Shift packed 16-bit integers in "a" left by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*16 IF k[j] IF count[i+15:i] < 16 dst[i+15:i] := ZeroExtend16(a[i+15:i] << count[i+15:i]) ELSE dst[i+15:i] := 0 FI ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512BW Shift Shift packed 16-bit integers in "a" left by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst". FOR j := 0 to 15 i := j*16 IF count[i+15:i] < 16 dst[i+15:i] := ZeroExtend16(a[i+15:i] << count[i+15:i]) ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512BW Shift Shift packed 16-bit integers in "a" left by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*16 IF k[j] IF count[i+15:i] < 16 dst[i+15:i] := ZeroExtend16(a[i+15:i] << count[i+15:i]) ELSE dst[i+15:i] := 0 FI ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512BW Shift Shift packed 16-bit integers in "a" left by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*16 IF k[j] IF count[i+15:i] < 16 dst[i+15:i] := ZeroExtend16(a[i+15:i] << count[i+15:i]) ELSE dst[i+15:i] := 0 FI ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512BW Shift Shift packed 16-bit integers in "a" left by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst". FOR j := 0 to 31 i := j*16 IF count[i+15:i] < 16 dst[i+15:i] := ZeroExtend16(a[i+15:i] << count[i+15:i]) ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512VL AVX512BW Shift Shift packed 16-bit integers in "a" left by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*16 IF k[j] IF count[i+15:i] < 16 dst[i+15:i] := ZeroExtend16(a[i+15:i] << count[i+15:i]) ELSE dst[i+15:i] := 0 FI ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512BW Shift Shift packed 16-bit integers in "a" left by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*16 IF k[j] IF count[i+15:i] < 16 dst[i+15:i] := ZeroExtend16(a[i+15:i] << count[i+15:i]) ELSE dst[i+15:i] := 0 FI ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512BW Shift Shift packed 16-bit integers in "a" left by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst". FOR j := 0 to 7 i := j*16 IF count[i+15:i] < 16 dst[i+15:i] := ZeroExtend16(a[i+15:i] << count[i+15:i]) ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:128] := 0
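In the variable-count left shifts above, each element carries its own shift amount, and any amount of 16 or more yields zero instead of wrapping. A scalar sketch of the unmasked form, assuming both operands are lists of 16-bit unsigned values (the name `shift_left_variable16` is hypothetical):

```python
def shift_left_variable16(a, count):
    """Per-element logical left shift of 16-bit words; counts >= 16 give 0."""
    return [
        (x << c) & 0xFFFF if c < 16 else 0   # keep only the low 16 bits
        for x, c in zip(a, count)
    ]
```

Note that the count is compared as a full 16-bit value, so a count such as 0x0100 also produces zero; it is not reduced modulo 16.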
Integer AVX512VL AVX512BW Shift Shift packed 16-bit integers in "a" left by "count" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*16 IF k[j] IF count[63:0] > 15 dst[i+15:i] := 0 ELSE dst[i+15:i] := ZeroExtend16(a[i+15:i] << count[63:0]) FI ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512BW Shift Shift packed 16-bit integers in "a" left by "imm8" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*16 IF k[j] IF imm8[7:0] > 15 dst[i+15:i] := 0 ELSE dst[i+15:i] := ZeroExtend16(a[i+15:i] << imm8[7:0]) FI ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512BW Shift Shift packed 16-bit integers in "a" left by "count" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*16 IF k[j] IF count[63:0] > 15 dst[i+15:i] := 0 ELSE dst[i+15:i] := ZeroExtend16(a[i+15:i] << count[63:0]) FI ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512BW Shift Shift packed 16-bit integers in "a" left by "imm8" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*16 IF k[j] IF imm8[7:0] > 15 dst[i+15:i] := 0 ELSE dst[i+15:i] := ZeroExtend16(a[i+15:i] << imm8[7:0]) FI ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512BW Shift Shift packed 16-bit integers in "a" left by "count" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*16 IF k[j] IF count[63:0] > 15 dst[i+15:i] := 0 ELSE dst[i+15:i] := ZeroExtend16(a[i+15:i] << count[63:0]) FI ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512BW Shift Shift packed 16-bit integers in "a" left by "imm8" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*16 IF k[j] IF imm8[7:0] > 15 dst[i+15:i] := 0 ELSE dst[i+15:i] := ZeroExtend16(a[i+15:i] << imm8[7:0]) FI ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512BW Shift Shift packed 16-bit integers in "a" left by "count" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*16 IF k[j] IF count[63:0] > 15 dst[i+15:i] := 0 ELSE dst[i+15:i] := ZeroExtend16(a[i+15:i] << count[63:0]) FI ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512BW Shift Shift packed 16-bit integers in "a" left by "imm8" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*16 IF k[j] IF imm8[7:0] > 15 dst[i+15:i] := 0 ELSE dst[i+15:i] := ZeroExtend16(a[i+15:i] << imm8[7:0]) FI ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512BW Shift Shift packed 16-bit integers in "a" left by "count" while shifting in zeros, and store the results in "dst". FOR j := 0 to 31 i := j*16 IF count[63:0] > 15 dst[i+15:i] := 0 ELSE dst[i+15:i] := ZeroExtend16(a[i+15:i] << count[63:0]) FI ENDFOR dst[MAX:512] := 0
Integer AVX512BW Shift Shift packed 16-bit integers in "a" left by "imm8" while shifting in zeros, and store the results in "dst". FOR j := 0 to 31 i := j*16 IF imm8[7:0] > 15 dst[i+15:i] := 0 ELSE dst[i+15:i] := ZeroExtend16(a[i+15:i] << imm8[7:0]) FI ENDFOR dst[MAX:512] := 0
Integer AVX512VL AVX512BW Shift Shift packed 16-bit integers in "a" left by "count" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*16 IF k[j] IF count[63:0] > 15 dst[i+15:i] := 0 ELSE dst[i+15:i] := ZeroExtend16(a[i+15:i] << count[63:0]) FI ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512BW Shift Shift packed 16-bit integers in "a" left by "imm8" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*16 IF k[j] IF imm8[7:0] > 15 dst[i+15:i] := 0 ELSE dst[i+15:i] := ZeroExtend16(a[i+15:i] << imm8[7:0]) FI ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512BW Shift Shift packed 16-bit integers in "a" left by "count" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*16 IF k[j] IF count[63:0] > 15 dst[i+15:i] := 0 ELSE dst[i+15:i] := ZeroExtend16(a[i+15:i] << count[63:0]) FI ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512BW Shift Shift packed 16-bit integers in "a" left by "imm8" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*16 IF k[j] IF imm8[7:0] > 15 dst[i+15:i] := 0 ELSE dst[i+15:i] := ZeroExtend16(a[i+15:i] << imm8[7:0]) FI ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:128] := 0
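The uniform left shifts come in two forms that differ only in where the shift amount comes from: the low 64 bits of "count", or the 8-bit immediate "imm8". The sketch below models both unmasked forms under the assumption that `a` is a list of 16-bit unsigned values and that `count` is passed as a plain integer standing in for the count register; both function names are hypothetical.

```python
def shift_left_by_count16(a, count):
    """Shift every word left by the low 64 bits of count."""
    n = count & 0xFFFFFFFFFFFFFFFF         # count[63:0]
    return [0 if n > 15 else (x << n) & 0xFFFF for x in a]


def shift_left_by_imm16(a, imm8):
    """Shift every word left by imm8[7:0]."""
    n = imm8 & 0xFF
    return [0 if n > 15 else (x << n) & 0xFFFF for x in a]
```

The full 64-bit comparison matters: a count of `1 << 32` clears every element, even though its low bits are zero.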
Integer AVX512VL AVX512BW Shift Shift packed 16-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in sign bits, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*16 IF k[j] IF count[i+15:i] < 16 dst[i+15:i] := SignExtend16(a[i+15:i] >> count[i+15:i]) ELSE dst[i+15:i] := (a[i+15] ? 0xFFFF : 0) FI ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512BW Shift Shift packed 16-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in sign bits, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*16 IF k[j] IF count[i+15:i] < 16 dst[i+15:i] := SignExtend16(a[i+15:i] >> count[i+15:i]) ELSE dst[i+15:i] := (a[i+15] ? 0xFFFF : 0) FI ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512BW Shift Shift packed 16-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in sign bits, and store the results in "dst". FOR j := 0 to 15 i := j*16 IF count[i+15:i] < 16 dst[i+15:i] := SignExtend16(a[i+15:i] >> count[i+15:i]) ELSE dst[i+15:i] := (a[i+15] ? 0xFFFF : 0) FI ENDFOR dst[MAX:256] := 0
Integer AVX512BW Shift Shift packed 16-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in sign bits, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*16 IF k[j] IF count[i+15:i] < 16 dst[i+15:i] := SignExtend16(a[i+15:i] >> count[i+15:i]) ELSE dst[i+15:i] := (a[i+15] ? 0xFFFF : 0) FI ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512BW Shift Shift packed 16-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in sign bits, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*16 IF k[j] IF count[i+15:i] < 16 dst[i+15:i] := SignExtend16(a[i+15:i] >> count[i+15:i]) ELSE dst[i+15:i] := (a[i+15] ? 0xFFFF : 0) FI ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512BW Shift Shift packed 16-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in sign bits, and store the results in "dst". FOR j := 0 to 31 i := j*16 IF count[i+15:i] < 16 dst[i+15:i] := SignExtend16(a[i+15:i] >> count[i+15:i]) ELSE dst[i+15:i] := (a[i+15] ? 0xFFFF : 0) FI ENDFOR dst[MAX:512] := 0
Integer AVX512VL AVX512BW Shift Shift packed 16-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in sign bits, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*16 IF k[j] IF count[i+15:i] < 16 dst[i+15:i] := SignExtend16(a[i+15:i] >> count[i+15:i]) ELSE dst[i+15:i] := (a[i+15] ? 0xFFFF : 0) FI ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512BW Shift Shift packed 16-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in sign bits, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*16 IF k[j] IF count[i+15:i] < 16 dst[i+15:i] := SignExtend16(a[i+15:i] >> count[i+15:i]) ELSE dst[i+15:i] := (a[i+15] ? 0xFFFF : 0) FI ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512BW Shift Shift packed 16-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in sign bits, and store the results in "dst". FOR j := 0 to 7 i := j*16 IF count[i+15:i] < 16 dst[i+15:i] := SignExtend16(a[i+15:i] >> count[i+15:i]) ELSE dst[i+15:i] := (a[i+15] ? 0xFFFF : 0) FI ENDFOR dst[MAX:128] := 0
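For the arithmetic variable right shifts above, vacated bits are filled with the sign bit, and a count of 16 or more leaves the element filled entirely with sign bits. A scalar sketch of the unmasked form, assuming elements are stored as unsigned 16-bit patterns (the helper and function names are hypothetical):

```python
def to_signed16(x):
    """Interpret a 16-bit pattern as a two's-complement integer."""
    return x - 0x10000 if x & 0x8000 else x


def shift_right_arith_variable16(a, count):
    """Per-element arithmetic right shift of 16-bit words."""
    out = []
    for x, c in zip(a, count):
        if c < 16:
            # Python's >> on negative ints is arithmetic; re-wrap to 16 bits.
            out.append((to_signed16(x) >> c) & 0xFFFF)
        else:
            out.append(0xFFFF if x & 0x8000 else 0)
    return out
```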
Integer AVX512VL AVX512BW Shift Shift packed 16-bit integers in "a" right by "count" while shifting in sign bits, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*16 IF k[j] IF count[63:0] > 15 dst[i+15:i] := (a[i+15] ? 0xFFFF : 0x0) ELSE dst[i+15:i] := SignExtend16(a[i+15:i] >> count[63:0]) FI ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512BW Shift Shift packed 16-bit integers in "a" right by "imm8" while shifting in sign bits, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*16 IF k[j] IF imm8[7:0] > 15 dst[i+15:i] := (a[i+15] ? 0xFFFF : 0x0) ELSE dst[i+15:i] := SignExtend16(a[i+15:i] >> imm8[7:0]) FI ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512BW Shift Shift packed 16-bit integers in "a" right by "count" while shifting in sign bits, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*16 IF k[j] IF count[63:0] > 15 dst[i+15:i] := (a[i+15] ? 0xFFFF : 0x0) ELSE dst[i+15:i] := SignExtend16(a[i+15:i] >> count[63:0]) FI ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512BW Shift Shift packed 16-bit integers in "a" right by "imm8" while shifting in sign bits, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*16 IF k[j] IF imm8[7:0] > 15 dst[i+15:i] := (a[i+15] ? 0xFFFF : 0x0) ELSE dst[i+15:i] := SignExtend16(a[i+15:i] >> imm8[7:0]) FI ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512BW Shift Shift packed 16-bit integers in "a" right by "count" while shifting in sign bits, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*16 IF k[j] IF count[63:0] > 15 dst[i+15:i] := (a[i+15] ? 0xFFFF : 0x0) ELSE dst[i+15:i] := SignExtend16(a[i+15:i] >> count[63:0]) FI ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512BW Shift Shift packed 16-bit integers in "a" right by "imm8" while shifting in sign bits, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*16 IF k[j] IF imm8[7:0] > 15 dst[i+15:i] := (a[i+15] ? 0xFFFF : 0x0) ELSE dst[i+15:i] := SignExtend16(a[i+15:i] >> imm8[7:0]) FI ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512BW Shift Shift packed 16-bit integers in "a" right by "count" while shifting in sign bits, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*16 IF k[j] IF count[63:0] > 15 dst[i+15:i] := (a[i+15] ? 0xFFFF : 0x0) ELSE dst[i+15:i] := SignExtend16(a[i+15:i] >> count[63:0]) FI ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512BW Shift Shift packed 16-bit integers in "a" right by "imm8" while shifting in sign bits, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*16 IF k[j] IF imm8[7:0] > 15 dst[i+15:i] := (a[i+15] ? 0xFFFF : 0x0) ELSE dst[i+15:i] := SignExtend16(a[i+15:i] >> imm8[7:0]) FI ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512BW Shift Shift packed 16-bit integers in "a" right by "count" while shifting in sign bits, and store the results in "dst". FOR j := 0 to 31 i := j*16 IF count[63:0] > 15 dst[i+15:i] := (a[i+15] ? 0xFFFF : 0x0) ELSE dst[i+15:i] := SignExtend16(a[i+15:i] >> count[63:0]) FI ENDFOR dst[MAX:512] := 0
Integer AVX512BW Shift Shift packed 16-bit integers in "a" right by "imm8" while shifting in sign bits, and store the results in "dst". FOR j := 0 to 31 i := j*16 IF imm8[7:0] > 15 dst[i+15:i] := (a[i+15] ? 0xFFFF : 0x0) ELSE dst[i+15:i] := SignExtend16(a[i+15:i] >> imm8[7:0]) FI ENDFOR dst[MAX:512] := 0
Integer AVX512VL AVX512BW Shift Shift packed 16-bit integers in "a" right by "count" while shifting in sign bits, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*16 IF k[j] IF count[63:0] > 15 dst[i+15:i] := (a[i+15] ? 0xFFFF : 0x0) ELSE dst[i+15:i] := SignExtend16(a[i+15:i] >> count[63:0]) FI ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512BW Shift Shift packed 16-bit integers in "a" right by "imm8" while shifting in sign bits, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*16 IF k[j] IF imm8[7:0] > 15 dst[i+15:i] := (a[i+15] ? 0xFFFF : 0x0) ELSE dst[i+15:i] := SignExtend16(a[i+15:i] >> imm8[7:0]) FI ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512BW Shift Shift packed 16-bit integers in "a" right by "count" while shifting in sign bits, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*16 IF k[j] IF count[63:0] > 15 dst[i+15:i] := (a[i+15] ? 0xFFFF : 0x0) ELSE dst[i+15:i] := SignExtend16(a[i+15:i] >> count[63:0]) FI ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512BW Shift Shift packed 16-bit integers in "a" right by "imm8" while shifting in sign bits, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*16 IF k[j] IF imm8[7:0] > 15 dst[i+15:i] := (a[i+15] ? 0xFFFF : 0x0) ELSE dst[i+15:i] := SignExtend16(a[i+15:i] >> imm8[7:0]) FI ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:128] := 0
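One consequence of the uniform arithmetic right-shift pseudocode is that any shift amount above 15 gives the same result as a shift by exactly 15, since 15 positions already replicate the sign bit across the whole element. The sketch below writes the immediate form both ways so the equivalence can be checked; the names are hypothetical and elements are assumed to be unsigned 16-bit patterns.

```python
def _signed16(x):
    return x - 0x10000 if x & 0x8000 else x


def shift_right_arith_imm16_literal(a, imm8):
    """Direct transcription of the pseudocode branches."""
    n = imm8 & 0xFF
    out = []
    for x in a:
        if n > 15:
            out.append(0xFFFF if x & 0x8000 else 0)
        else:
            out.append((_signed16(x) >> n) & 0xFFFF)
    return out


def shift_right_arith_imm16_clamped(a, imm8):
    """Same result, expressed by clamping the shift amount to 15."""
    n = min(imm8 & 0xFF, 15)
    return [(_signed16(x) >> n) & 0xFFFF for x in a]
```

This clamping shortcut does not apply to the logical shifts, where an out-of-range count must produce zero.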
Integer AVX512BW Shift Shift 128-bit lanes in "a" right by "imm8" bytes while shifting in zeros, and store the results in "dst". tmp := imm8[7:0] IF tmp > 15 tmp := 16 FI dst[127:0] := a[127:0] >> (tmp*8) dst[255:128] := a[255:128] >> (tmp*8) dst[383:256] := a[383:256] >> (tmp*8) dst[511:384] := a[511:384] >> (tmp*8) dst[MAX:512] := 0
Integer AVX512VL AVX512BW Shift Shift packed 16-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*16 IF k[j] IF count[i+15:i] < 16 dst[i+15:i] := ZeroExtend16(a[i+15:i] >> count[i+15:i]) ELSE dst[i+15:i] := 0 FI ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512BW Shift Shift packed 16-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*16 IF k[j] IF count[i+15:i] < 16 dst[i+15:i] := ZeroExtend16(a[i+15:i] >> count[i+15:i]) ELSE dst[i+15:i] := 0 FI ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512BW Shift Shift packed 16-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst". FOR j := 0 to 15 i := j*16 IF count[i+15:i] < 16 dst[i+15:i] := ZeroExtend16(a[i+15:i] >> count[i+15:i]) ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512BW Shift Shift packed 16-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*16 IF k[j] IF count[i+15:i] < 16 dst[i+15:i] := ZeroExtend16(a[i+15:i] >> count[i+15:i]) ELSE dst[i+15:i] := 0 FI ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512BW Shift Shift packed 16-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*16 IF k[j] IF count[i+15:i] < 16 dst[i+15:i] := ZeroExtend16(a[i+15:i] >> count[i+15:i]) ELSE dst[i+15:i] := 0 FI ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512BW Shift Shift packed 16-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst". FOR j := 0 to 31 i := j*16 IF count[i+15:i] < 16 dst[i+15:i] := ZeroExtend16(a[i+15:i] >> count[i+15:i]) ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512VL AVX512BW Shift Shift packed 16-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*16 IF k[j] IF count[i+15:i] < 16 dst[i+15:i] := ZeroExtend16(a[i+15:i] >> count[i+15:i]) ELSE dst[i+15:i] := 0 FI ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512BW Shift Shift packed 16-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*16 IF k[j] IF count[i+15:i] < 16 dst[i+15:i] := ZeroExtend16(a[i+15:i] >> count[i+15:i]) ELSE dst[i+15:i] := 0 FI ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512BW Shift Shift packed 16-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst". FOR j := 0 to 7 i := j*16 IF count[i+15:i] < 16 dst[i+15:i] := ZeroExtend16(a[i+15:i] >> count[i+15:i]) ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:128] := 0
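The masked logical right shifts nest two decisions: the mask bit is tested first, and only selected elements go on to the count check. A scalar sketch of that nesting follows, with hypothetical function names and an argument order chosen for this illustration; `k` is assumed to be an integer bitmask and the vectors lists of unsigned 16-bit values.

```python
def shift_right_logical_variable16_mask(src, k, a, count):
    """Writemask form: unselected elements are taken from src."""
    dst = []
    for j, (x, c) in enumerate(zip(a, count)):
        if (k >> j) & 1:
            dst.append(x >> c if c < 16 else 0)   # zeros are shifted in
        else:
            dst.append(src[j])
    return dst


def shift_right_logical_variable16_maskz(k, a, count):
    """Zeromask form: unselected elements become 0."""
    return shift_right_logical_variable16_mask([0] * len(a), k, a, count)
```

Unlike the arithmetic variant, shifting 0x8000 right by 15 here gives 1, because the sign bit is not replicated.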
Integer AVX512VL AVX512BW Shift Shift packed 16-bit integers in "a" right by "count" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*16 IF k[j] IF count[63:0] > 15 dst[i+15:i] := 0 ELSE dst[i+15:i] := ZeroExtend16(a[i+15:i] >> count[63:0]) FI ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512BW Shift Shift packed 16-bit integers in "a" right by "imm8" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*16 IF k[j] IF imm8[7:0] > 15 dst[i+15:i] := 0 ELSE dst[i+15:i] := ZeroExtend16(a[i+15:i] >> imm8[7:0]) FI ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512BW Shift Shift packed 16-bit integers in "a" right by "count" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*16 IF k[j] IF count[63:0] > 15 dst[i+15:i] := 0 ELSE dst[i+15:i] := ZeroExtend16(a[i+15:i] >> count[63:0]) FI ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512BW Shift Shift packed 16-bit integers in "a" right by "imm8" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*16 IF k[j] IF imm8[7:0] > 15 dst[i+15:i] := 0 ELSE dst[i+15:i] := ZeroExtend16(a[i+15:i] >> imm8[7:0]) FI ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512BW Shift Shift packed 16-bit integers in "a" right by "count" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*16 IF k[j] IF count[63:0] > 15 dst[i+15:i] := 0 ELSE dst[i+15:i] := ZeroExtend16(a[i+15:i] >> count[63:0]) FI ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512BW Shift Shift packed 16-bit integers in "a" right by "imm8" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*16 IF k[j] IF imm8[7:0] > 15 dst[i+15:i] := 0 ELSE dst[i+15:i] := ZeroExtend16(a[i+15:i] >> imm8[7:0]) FI ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512BW Shift Shift packed 16-bit integers in "a" right by "count" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*16 IF k[j] IF count[63:0] > 15 dst[i+15:i] := 0 ELSE dst[i+15:i] := ZeroExtend16(a[i+15:i] >> count[63:0]) FI ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512BW Shift Shift packed 16-bit integers in "a" right by "imm8" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*16 IF k[j] IF imm8[7:0] > 15 dst[i+15:i] := 0 ELSE dst[i+15:i] := ZeroExtend16(a[i+15:i] >> imm8[7:0]) FI ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512BW Shift Shift packed 16-bit integers in "a" right by "count" while shifting in zeros, and store the results in "dst". FOR j := 0 to 31 i := j*16 IF count[63:0] > 15 dst[i+15:i] := 0 ELSE dst[i+15:i] := ZeroExtend16(a[i+15:i] >> count[63:0]) FI ENDFOR dst[MAX:512] := 0
Integer AVX512BW Shift Shift packed 16-bit integers in "a" right by "imm8" while shifting in zeros, and store the results in "dst". FOR j := 0 to 31 i := j*16 IF imm8[7:0] > 15 dst[i+15:i] := 0 ELSE dst[i+15:i] := ZeroExtend16(a[i+15:i] >> imm8[7:0]) FI ENDFOR dst[MAX:512] := 0
Integer AVX512VL AVX512BW Shift Shift packed 16-bit integers in "a" right by "count" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*16 IF k[j] IF count[63:0] > 15 dst[i+15:i] := 0 ELSE dst[i+15:i] := ZeroExtend16(a[i+15:i] >> count[63:0]) FI ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512BW Shift Shift packed 16-bit integers in "a" right by "imm8" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*16 IF k[j] IF imm8[7:0] > 15 dst[i+15:i] := 0 ELSE dst[i+15:i] := ZeroExtend16(a[i+15:i] >> imm8[7:0]) FI ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512BW Shift Shift packed 16-bit integers in "a" right by "count" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*16 IF k[j] IF count[63:0] > 15 dst[i+15:i] := 0 ELSE dst[i+15:i] := ZeroExtend16(a[i+15:i] >> count[63:0]) FI ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512BW Shift Shift packed 16-bit integers in "a" right by "imm8" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*16 IF k[j] IF imm8[7:0] > 15 dst[i+15:i] := 0 ELSE dst[i+15:i] := ZeroExtend16(a[i+15:i] >> imm8[7:0]) FI ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:128] := 0
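The uniform-count shift records above differ only in lane count and in how masked-off lanes are filled. A minimal scalar sketch of that shared behavior follows, assuming vectors are lists of 16-bit lanes and the mask is a Python integer whose bit j controls lane j. The function name and the convention that `src=None` selects zeromask behavior are assumptions made for illustration.

```python
def srl_epi16_model(a, count, k=None, src=None):
    """Scalar model of a uniform logical right shift on 16-bit lanes.

    a     : list of 16-bit lane values (8, 16, or 32 lanes)
    count : shift amount shared by all lanes (count[63:0] or imm8)
    k     : optional integer mask, bit j controls lane j
    src   : optional fallback lanes (writemask); None means zeromask
    """
    dst = []
    for j, x in enumerate(a):
        if k is not None and not (k >> j) & 1:
            # Mask bit clear: copy from src (writemask) or zero (zeromask).
            dst.append(src[j] & 0xFFFF if src is not None else 0)
        elif count > 15:
            dst.append(0)  # over-wide shift clears the lane
        else:
            dst.append((x & 0xFFFF) >> count)
    return dst
```

For example, `srl_epi16_model(a, 8, k=0b1, src=old)` shifts only lane 0 and leaves every other lane equal to the corresponding lane of `old`.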
Integer AVX512VL AVX512BW Arithmetic Subtract packed 8-bit integers in "b" from packed 8-bit integers in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*8 IF k[j] dst[i+7:i] := a[i+7:i] - b[i+7:i] ELSE dst[i+7:i] := src[i+7:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512BW Arithmetic Subtract packed 8-bit integers in "b" from packed 8-bit integers in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*8 IF k[j] dst[i+7:i] := a[i+7:i] - b[i+7:i] ELSE dst[i+7:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512BW Arithmetic Subtract packed 8-bit integers in "b" from packed 8-bit integers in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 63 i := j*8 IF k[j] dst[i+7:i] := a[i+7:i] - b[i+7:i] ELSE dst[i+7:i] := src[i+7:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512BW Arithmetic Subtract packed 8-bit integers in "b" from packed 8-bit integers in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 63 i := j*8 IF k[j] dst[i+7:i] := a[i+7:i] - b[i+7:i] ELSE dst[i+7:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512BW Arithmetic Subtract packed 8-bit integers in "b" from packed 8-bit integers in "a", and store the results in "dst". FOR j := 0 to 63 i := j*8 dst[i+7:i] := a[i+7:i] - b[i+7:i] ENDFOR dst[MAX:512] := 0
Integer AVX512VL AVX512BW Arithmetic Subtract packed 8-bit integers in "b" from packed 8-bit integers in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*8 IF k[j] dst[i+7:i] := a[i+7:i] - b[i+7:i] ELSE dst[i+7:i] := src[i+7:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512BW Arithmetic Subtract packed 8-bit integers in "b" from packed 8-bit integers in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*8 IF k[j] dst[i+7:i] := a[i+7:i] - b[i+7:i] ELSE dst[i+7:i] := 0 FI ENDFOR dst[MAX:128] := 0
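The non-saturating 8-bit subtraction above wraps modulo 2^8, which the pseudocode leaves implicit in the 8-bit destination width. A short sketch of the masked forms, under the assumption that lanes are held in Python lists and `k` is an integer bitmask (the name `mask_sub_epi8_model` is hypothetical):

```python
def mask_sub_epi8_model(a, b, k, src=None):
    """Scalar model of masked 8-bit subtraction with wraparound.

    src=None models the zeromask form; otherwise masked-off lanes copy src.
    """
    dst = []
    for j, (x, y) in enumerate(zip(a, b)):
        if (k >> j) & 1:
            dst.append((x - y) & 0xFF)  # keep only the low 8 bits (wraps)
        else:
            dst.append(src[j] & 0xFF if src is not None else 0)
    return dst
```

Because only the low 8 bits are kept, the same model serves signed and unsigned interpretations of the lanes.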
Integer AVX512VL AVX512BW Arithmetic Subtract packed signed 8-bit integers in "b" from packed signed 8-bit integers in "a" using saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*8 IF k[j] dst[i+7:i] := Saturate8(a[i+7:i] - b[i+7:i]) ELSE dst[i+7:i] := src[i+7:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512BW Arithmetic Subtract packed signed 8-bit integers in "b" from packed signed 8-bit integers in "a" using saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*8 IF k[j] dst[i+7:i] := Saturate8(a[i+7:i] - b[i+7:i]) ELSE dst[i+7:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512BW Arithmetic Subtract packed signed 8-bit integers in "b" from packed signed 8-bit integers in "a" using saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 63 i := j*8 IF k[j] dst[i+7:i] := Saturate8(a[i+7:i] - b[i+7:i]) ELSE dst[i+7:i] := src[i+7:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512BW Arithmetic Subtract packed signed 8-bit integers in "b" from packed signed 8-bit integers in "a" using saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 63 i := j*8 IF k[j] dst[i+7:i] := Saturate8(a[i+7:i] - b[i+7:i]) ELSE dst[i+7:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512BW Arithmetic Subtract packed signed 8-bit integers in "b" from packed signed 8-bit integers in "a" using saturation, and store the results in "dst". FOR j := 0 to 63 i := j*8 dst[i+7:i] := Saturate8(a[i+7:i] - b[i+7:i]) ENDFOR dst[MAX:512] := 0
Integer AVX512VL AVX512BW Arithmetic Subtract packed signed 8-bit integers in "b" from packed signed 8-bit integers in "a" using saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*8 IF k[j] dst[i+7:i] := Saturate8(a[i+7:i] - b[i+7:i]) ELSE dst[i+7:i] := src[i+7:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512BW Arithmetic Subtract packed signed 8-bit integers in "b" from packed signed 8-bit integers in "a" using saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*8 IF k[j] dst[i+7:i] := Saturate8(a[i+7:i] - b[i+7:i]) ELSE dst[i+7:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512BW Arithmetic Subtract packed signed 16-bit integers in "b" from packed signed 16-bit integers in "a" using saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*16 IF k[j] dst[i+15:i] := Saturate16(a[i+15:i] - b[i+15:i]) ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512BW Arithmetic Subtract packed signed 16-bit integers in "b" from packed signed 16-bit integers in "a" using saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*16 IF k[j] dst[i+15:i] := Saturate16(a[i+15:i] - b[i+15:i]) ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512BW Arithmetic Subtract packed signed 16-bit integers in "b" from packed signed 16-bit integers in "a" using saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*16 IF k[j] dst[i+15:i] := Saturate16(a[i+15:i] - b[i+15:i]) ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512BW Arithmetic Subtract packed signed 16-bit integers in "b" from packed signed 16-bit integers in "a" using saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*16 IF k[j] dst[i+15:i] := Saturate16(a[i+15:i] - b[i+15:i]) ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512BW Arithmetic Subtract packed signed 16-bit integers in "b" from packed signed 16-bit integers in "a" using saturation, and store the results in "dst". FOR j := 0 to 31 i := j*16 dst[i+15:i] := Saturate16(a[i+15:i] - b[i+15:i]) ENDFOR dst[MAX:512] := 0
Integer AVX512VL AVX512BW Arithmetic Subtract packed signed 16-bit integers in "b" from packed signed 16-bit integers in "a" using saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*16 IF k[j] dst[i+15:i] := Saturate16(a[i+15:i] - b[i+15:i]) ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512BW Arithmetic Subtract packed signed 16-bit integers in "b" from packed signed 16-bit integers in "a" using saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*16 IF k[j] dst[i+15:i] := Saturate16(a[i+15:i] - b[i+15:i]) ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:128] := 0
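Signed saturation, written as Saturate8 and Saturate16 in the pseudocode above, clamps the exact difference to the representable signed range instead of wrapping. The sketch below models that for any lane width. It assumes lanes are passed and returned as raw bit patterns in Python lists, and all three function names are hypothetical helpers.

```python
def to_signed(x, bits):
    """Interpret the low `bits` bits of x as a two's-complement value."""
    x &= (1 << bits) - 1
    return x - (1 << bits) if x >> (bits - 1) else x

def saturate_signed(value, bits):
    """Clamp an exact integer to the signed range of `bits` bits."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(lo, min(hi, value))

def subs_signed_model(a, b, bits):
    """Lane-wise a - b with signed saturation; returns raw bit patterns."""
    mask = (1 << bits) - 1
    return [
        saturate_signed(to_signed(x, bits) - to_signed(y, bits), bits) & mask
        for x, y in zip(a, b)
    ]
```

With `bits=8`, subtracting 1 from 0x80 (that is, -128) stays at 0x80 rather than wrapping to 0x7F, and subtracting 0xFF (that is, -1) from 0x7F stays at 0x7F.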
Integer AVX512VL AVX512BW Arithmetic Subtract packed unsigned 8-bit integers in "b" from packed unsigned 8-bit integers in "a" using saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*8 IF k[j] dst[i+7:i] := SaturateU8(a[i+7:i] - b[i+7:i]) ELSE dst[i+7:i] := src[i+7:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512BW Arithmetic Subtract packed unsigned 8-bit integers in "b" from packed unsigned 8-bit integers in "a" using saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*8 IF k[j] dst[i+7:i] := SaturateU8(a[i+7:i] - b[i+7:i]) ELSE dst[i+7:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512BW Arithmetic Subtract packed unsigned 8-bit integers in "b" from packed unsigned 8-bit integers in "a" using saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 63 i := j*8 IF k[j] dst[i+7:i] := SaturateU8(a[i+7:i] - b[i+7:i]) ELSE dst[i+7:i] := src[i+7:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512BW Arithmetic Subtract packed unsigned 8-bit integers in "b" from packed unsigned 8-bit integers in "a" using saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 63 i := j*8 IF k[j] dst[i+7:i] := SaturateU8(a[i+7:i] - b[i+7:i]) ELSE dst[i+7:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512BW Arithmetic Subtract packed unsigned 8-bit integers in "b" from packed unsigned 8-bit integers in "a" using saturation, and store the results in "dst". FOR j := 0 to 63 i := j*8 dst[i+7:i] := SaturateU8(a[i+7:i] - b[i+7:i]) ENDFOR dst[MAX:512] := 0
Integer AVX512VL AVX512BW Arithmetic Subtract packed unsigned 8-bit integers in "b" from packed unsigned 8-bit integers in "a" using saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*8 IF k[j] dst[i+7:i] := SaturateU8(a[i+7:i] - b[i+7:i]) ELSE dst[i+7:i] := src[i+7:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512BW Arithmetic Subtract packed unsigned 8-bit integers in "b" from packed unsigned 8-bit integers in "a" using saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*8 IF k[j] dst[i+7:i] := SaturateU8(a[i+7:i] - b[i+7:i]) ELSE dst[i+7:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512BW Arithmetic Subtract packed unsigned 16-bit integers in "b" from packed unsigned 16-bit integers in "a" using saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*16 IF k[j] dst[i+15:i] := SaturateU16(a[i+15:i] - b[i+15:i]) ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512BW Arithmetic Subtract packed unsigned 16-bit integers in "b" from packed unsigned 16-bit integers in "a" using saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*16 IF k[j] dst[i+15:i] := SaturateU16(a[i+15:i] - b[i+15:i]) ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512BW Arithmetic Subtract packed unsigned 16-bit integers in "b" from packed unsigned 16-bit integers in "a" using saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*16 IF k[j] dst[i+15:i] := SaturateU16(a[i+15:i] - b[i+15:i]) ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512BW Arithmetic Subtract packed unsigned 16-bit integers in "b" from packed unsigned 16-bit integers in "a" using saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*16 IF k[j] dst[i+15:i] := SaturateU16(a[i+15:i] - b[i+15:i]) ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512BW Arithmetic Subtract packed unsigned 16-bit integers in "b" from packed unsigned 16-bit integers in "a" using saturation, and store the results in "dst". FOR j := 0 to 31 i := j*16 dst[i+15:i] := SaturateU16(a[i+15:i] - b[i+15:i]) ENDFOR dst[MAX:512] := 0
Integer AVX512VL AVX512BW Arithmetic Subtract packed unsigned 16-bit integers in "b" from packed unsigned 16-bit integers in "a" using saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*16 IF k[j] dst[i+15:i] := SaturateU16(a[i+15:i] - b[i+15:i]) ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512BW Arithmetic Subtract packed unsigned 16-bit integers in "b" from packed unsigned 16-bit integers in "a" using saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*16 IF k[j] dst[i+15:i] := SaturateU16(a[i+15:i] - b[i+15:i]) ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:128] := 0
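For the unsigned saturating forms above, the difference of two unsigned lanes can never exceed the lane maximum, so SaturateU8 and SaturateU16 reduce in practice to clamping negative differences to zero. The following sketch shows this, along with one common use of the operation. The list representation and both function names are assumptions for illustration.

```python
def subs_unsigned_model(a, b, bits):
    """Lane-wise a - b with unsigned saturation (negative results become 0)."""
    mask = (1 << bits) - 1
    return [max(0, (x & mask) - (y & mask)) for x, y in zip(a, b)]

def abs_diff_model(a, b, bits):
    """Absolute difference built from two saturating subtractions."""
    # One of the two saturating differences is always 0, so OR merges them.
    fwd = subs_unsigned_model(a, b, bits)
    rev = subs_unsigned_model(b, a, bits)
    return [f | r for f, r in zip(fwd, rev)]
```

The absolute-difference helper is a well-known idiom for unsigned lanes and is included only to show why saturation at zero is useful; it is not part of the records above.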
Integer AVX512VL AVX512BW Arithmetic Subtract packed 16-bit integers in "b" from packed 16-bit integers in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*16 IF k[j] dst[i+15:i] := a[i+15:i] - b[i+15:i] ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512BW Arithmetic Subtract packed 16-bit integers in "b" from packed 16-bit integers in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*16 IF k[j] dst[i+15:i] := a[i+15:i] - b[i+15:i] ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512BW Arithmetic Subtract packed 16-bit integers in "b" from packed 16-bit integers in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*16 IF k[j] dst[i+15:i] := a[i+15:i] - b[i+15:i] ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512BW Arithmetic Subtract packed 16-bit integers in "b" from packed 16-bit integers in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*16 IF k[j] dst[i+15:i] := a[i+15:i] - b[i+15:i] ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512BW Arithmetic Subtract packed 16-bit integers in "b" from packed 16-bit integers in "a", and store the results in "dst". FOR j := 0 to 31 i := j*16 dst[i+15:i] := a[i+15:i] - b[i+15:i] ENDFOR dst[MAX:512] := 0
Integer AVX512VL AVX512BW Arithmetic Subtract packed 16-bit integers in "b" from packed 16-bit integers in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*16 IF k[j] dst[i+15:i] := a[i+15:i] - b[i+15:i] ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512BW Arithmetic Subtract packed 16-bit integers in "b" from packed 16-bit integers in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*16 IF k[j] dst[i+15:i] := a[i+15:i] - b[i+15:i] ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer Mask AVX512VL AVX512BW Compare Compute the bitwise AND of packed 8-bit integers in "a" and "b", producing intermediate 8-bit values, and set the corresponding bit in result mask "k" (subject to writemask "k1") if the intermediate value is non-zero. FOR j := 0 to 31 i := j*8 IF k1[j] k[j] := ((a[i+7:i] AND b[i+7:i]) != 0) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:32] := 0
Integer Mask AVX512VL AVX512BW Compare Compute the bitwise AND of packed 8-bit integers in "a" and "b", producing intermediate 8-bit values, and set the corresponding bit in result mask "k" if the intermediate value is non-zero. FOR j := 0 to 31 i := j*8 k[j] := ((a[i+7:i] AND b[i+7:i]) != 0) ? 1 : 0 ENDFOR k[MAX:32] := 0
Integer Mask AVX512BW Compare Compute the bitwise AND of packed 8-bit integers in "a" and "b", producing intermediate 8-bit values, and set the corresponding bit in result mask "k" (subject to writemask "k1") if the intermediate value is non-zero. FOR j := 0 to 63 i := j*8 IF k1[j] k[j] := ((a[i+7:i] AND b[i+7:i]) != 0) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:64] := 0
Integer Mask AVX512BW Compare Compute the bitwise AND of packed 8-bit integers in "a" and "b", producing intermediate 8-bit values, and set the corresponding bit in result mask "k" if the intermediate value is non-zero. FOR j := 0 to 63 i := j*8 k[j] := ((a[i+7:i] AND b[i+7:i]) != 0) ? 1 : 0 ENDFOR k[MAX:64] := 0
Integer Mask AVX512VL AVX512BW Compare Compute the bitwise AND of packed 8-bit integers in "a" and "b", producing intermediate 8-bit values, and set the corresponding bit in result mask "k" (subject to writemask "k1") if the intermediate value is non-zero. FOR j := 0 to 15 i := j*8 IF k1[j] k[j] := ((a[i+7:i] AND b[i+7:i]) != 0) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:16] := 0
Integer Mask AVX512VL AVX512BW Compare Compute the bitwise AND of packed 8-bit integers in "a" and "b", producing intermediate 8-bit values, and set the corresponding bit in result mask "k" if the intermediate value is non-zero. FOR j := 0 to 15 i := j*8 k[j] := ((a[i+7:i] AND b[i+7:i]) != 0) ? 1 : 0 ENDFOR k[MAX:16] := 0
Integer Mask AVX512VL AVX512BW Compare Compute the bitwise AND of packed 16-bit integers in "a" and "b", producing intermediate 16-bit values, and set the corresponding bit in result mask "k" (subject to writemask "k1") if the intermediate value is non-zero. FOR j := 0 to 15 i := j*16 IF k1[j] k[j] := ((a[i+15:i] AND b[i+15:i]) != 0) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:16] := 0
Integer Mask AVX512VL AVX512BW Compare Compute the bitwise AND of packed 16-bit integers in "a" and "b", producing intermediate 16-bit values, and set the corresponding bit in result mask "k" if the intermediate value is non-zero. FOR j := 0 to 15 i := j*16 k[j] := ((a[i+15:i] AND b[i+15:i]) != 0) ? 1 : 0 ENDFOR k[MAX:16] := 0
Integer Mask AVX512BW Compare Compute the bitwise AND of packed 16-bit integers in "a" and "b", producing intermediate 16-bit values, and set the corresponding bit in result mask "k" (subject to writemask "k1") if the intermediate value is non-zero. FOR j := 0 to 31 i := j*16 IF k1[j] k[j] := ((a[i+15:i] AND b[i+15:i]) != 0) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:32] := 0
Integer Mask AVX512BW Compare Compute the bitwise AND of packed 16-bit integers in "a" and "b", producing intermediate 16-bit values, and set the corresponding bit in result mask "k" if the intermediate value is non-zero. FOR j := 0 to 31 i := j*16 k[j] := ((a[i+15:i] AND b[i+15:i]) != 0) ? 1 : 0 ENDFOR k[MAX:32] := 0
Integer Mask AVX512VL AVX512BW Compare Compute the bitwise AND of packed 16-bit integers in "a" and "b", producing intermediate 16-bit values, and set the corresponding bit in result mask "k" (subject to writemask "k1") if the intermediate value is non-zero. FOR j := 0 to 7 i := j*16 IF k1[j] k[j] := ((a[i+15:i] AND b[i+15:i]) != 0) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:8] := 0
Integer Mask AVX512VL AVX512BW Compare Compute the bitwise AND of packed 16-bit integers in "a" and "b", producing intermediate 16-bit values, and set the corresponding bit in result mask "k" if the intermediate value is non-zero. FOR j := 0 to 7 i := j*16 k[j] := ((a[i+15:i] AND b[i+15:i]) != 0) ? 1 : 0 ENDFOR k[MAX:8] := 0
Integer Mask AVX512VL AVX512BW Compare Compute the bitwise AND of packed 8-bit integers in "a" and "b", producing intermediate 8-bit values, and set the corresponding bit in result mask "k" (subject to writemask "k1") if the intermediate value is zero. FOR j := 0 to 31 i := j*8 IF k1[j] k[j] := ((a[i+7:i] AND b[i+7:i]) == 0) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:32] := 0
Integer Mask AVX512VL AVX512BW Compare Compute the bitwise AND of packed 8-bit integers in "a" and "b", producing intermediate 8-bit values, and set the corresponding bit in result mask "k" if the intermediate value is zero. FOR j := 0 to 31 i := j*8 k[j] := ((a[i+7:i] AND b[i+7:i]) == 0) ? 1 : 0 ENDFOR k[MAX:32] := 0
Integer Mask AVX512BW Compare Compute the bitwise AND of packed 8-bit integers in "a" and "b", producing intermediate 8-bit values, and set the corresponding bit in result mask "k" (subject to writemask "k1") if the intermediate value is zero. FOR j := 0 to 63 i := j*8 IF k1[j] k[j] := ((a[i+7:i] AND b[i+7:i]) == 0) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:64] := 0
Integer Mask AVX512BW Compare Compute the bitwise AND of packed 8-bit integers in "a" and "b", producing intermediate 8-bit values, and set the corresponding bit in result mask "k" if the intermediate value is zero. FOR j := 0 to 63 i := j*8 k[j] := ((a[i+7:i] AND b[i+7:i]) == 0) ? 1 : 0 ENDFOR k[MAX:64] := 0
Integer Mask AVX512VL AVX512BW Compare Compute the bitwise AND of packed 8-bit integers in "a" and "b", producing intermediate 8-bit values, and set the corresponding bit in result mask "k" (subject to writemask "k1") if the intermediate value is zero. FOR j := 0 to 15 i := j*8 IF k1[j] k[j] := ((a[i+7:i] AND b[i+7:i]) == 0) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:16] := 0
Integer Mask AVX512VL AVX512BW Compare Compute the bitwise AND of packed 8-bit integers in "a" and "b", producing intermediate 8-bit values, and set the corresponding bit in result mask "k" if the intermediate value is zero. FOR j := 0 to 15 i := j*8 k[j] := ((a[i+7:i] AND b[i+7:i]) == 0) ? 1 : 0 ENDFOR k[MAX:16] := 0
Integer Mask AVX512VL AVX512BW Compare Compute the bitwise AND of packed 16-bit integers in "a" and "b", producing intermediate 16-bit values, and set the corresponding bit in result mask "k" (subject to writemask "k1") if the intermediate value is zero. FOR j := 0 to 15 i := j*16 IF k1[j] k[j] := ((a[i+15:i] AND b[i+15:i]) == 0) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:16] := 0
Integer Mask AVX512VL AVX512BW Compare Compute the bitwise AND of packed 16-bit integers in "a" and "b", producing intermediate 16-bit values, and set the corresponding bit in result mask "k" if the intermediate value is zero. FOR j := 0 to 15 i := j*16 k[j] := ((a[i+15:i] AND b[i+15:i]) == 0) ? 1 : 0 ENDFOR k[MAX:16] := 0
Integer Mask AVX512BW Compare Compute the bitwise AND of packed 16-bit integers in "a" and "b", producing intermediate 16-bit values, and set the corresponding bit in result mask "k" (subject to writemask "k1") if the intermediate value is zero. FOR j := 0 to 31 i := j*16 IF k1[j] k[j] := ((a[i+15:i] AND b[i+15:i]) == 0) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:32] := 0
Integer Mask AVX512BW Compare Compute the bitwise AND of packed 16-bit integers in "a" and "b", producing intermediate 16-bit values, and set the corresponding bit in result mask "k" if the intermediate value is zero. FOR j := 0 to 31 i := j*16 k[j] := ((a[i+15:i] AND b[i+15:i]) == 0) ? 1 : 0 ENDFOR k[MAX:32] := 0
Integer Mask AVX512VL AVX512BW Compare Compute the bitwise AND of packed 16-bit integers in "a" and "b", producing intermediate 16-bit values, and set the corresponding bit in result mask "k" (subject to writemask "k1") if the intermediate value is zero. FOR j := 0 to 7 i := j*16 IF k1[j] k[j] := ((a[i+15:i] AND b[i+15:i]) == 0) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:8] := 0
Integer Mask AVX512VL AVX512BW Compare Compute the bitwise AND of packed 16-bit integers in "a" and "b", producing intermediate 16-bit values, and set the corresponding bit in result mask "k" if the intermediate value is zero. FOR j := 0 to 7 i := j*16 k[j] := ((a[i+15:i] AND b[i+15:i]) == 0) ? 1 : 0 ENDFOR k[MAX:8] := 0
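The zero-status records above use the same lane-wise AND as the non-zero test and differ only in the comparison. One way to see the relationship is a single scalar model with a flag, sketched here under the assumption that lanes are Python lists and masks are integers. The function name and the `want_zero` parameter are hypothetical.

```python
def and_zero_status(a, b, k1=None, want_zero=False):
    """Build a result mask from the zero/non-zero status of a[j] & b[j].

    a, b      : equal-length lists of lane values
    k1        : optional writemask; lanes with a clear bit yield 0
    want_zero : False sets bits for non-zero ANDs, True for zero ANDs
    """
    k = 0
    for j, (x, y) in enumerate(zip(a, b)):
        if k1 is not None and not (k1 >> j) & 1:
            continue  # masked-off lane: result bit stays 0
        if ((x & y) == 0) == want_zero:
            k |= 1 << j
    return k
```

Under the same writemask the two forms partition the enabled lanes: their results never overlap, and their union equals the writemask restricted to the lanes present.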
Integer AVX512VL AVX512BW Miscellaneous Unpack and interleave 8-bit integers from the high half of each 128-bit lane in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). DEFINE INTERLEAVE_HIGH_BYTES(src1[127:0], src2[127:0]) { dst[7:0] := src1[71:64] dst[15:8] := src2[71:64] dst[23:16] := src1[79:72] dst[31:24] := src2[79:72] dst[39:32] := src1[87:80] dst[47:40] := src2[87:80] dst[55:48] := src1[95:88] dst[63:56] := src2[95:88] dst[71:64] := src1[103:96] dst[79:72] := src2[103:96] dst[87:80] := src1[111:104] dst[95:88] := src2[111:104] dst[103:96] := src1[119:112] dst[111:104] := src2[119:112] dst[119:112] := src1[127:120] dst[127:120] := src2[127:120] RETURN dst[127:0] } tmp_dst[127:0] := INTERLEAVE_HIGH_BYTES(a[127:0], b[127:0]) tmp_dst[255:128] := INTERLEAVE_HIGH_BYTES(a[255:128], b[255:128]) FOR j := 0 to 31 i := j*8 IF k[j] dst[i+7:i] := tmp_dst[i+7:i] ELSE dst[i+7:i] := src[i+7:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512BW Miscellaneous Unpack and interleave 8-bit integers from the high half of each 128-bit lane in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). DEFINE INTERLEAVE_HIGH_BYTES(src1[127:0], src2[127:0]) { dst[7:0] := src1[71:64] dst[15:8] := src2[71:64] dst[23:16] := src1[79:72] dst[31:24] := src2[79:72] dst[39:32] := src1[87:80] dst[47:40] := src2[87:80] dst[55:48] := src1[95:88] dst[63:56] := src2[95:88] dst[71:64] := src1[103:96] dst[79:72] := src2[103:96] dst[87:80] := src1[111:104] dst[95:88] := src2[111:104] dst[103:96] := src1[119:112] dst[111:104] := src2[119:112] dst[119:112] := src1[127:120] dst[127:120] := src2[127:120] RETURN dst[127:0] } tmp_dst[127:0] := INTERLEAVE_HIGH_BYTES(a[127:0], b[127:0]) tmp_dst[255:128] := INTERLEAVE_HIGH_BYTES(a[255:128], b[255:128]) FOR j := 0 to 31 i := j*8 IF k[j] dst[i+7:i] := tmp_dst[i+7:i] ELSE dst[i+7:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512BW Miscellaneous Unpack and interleave 8-bit integers from the high half of each 128-bit lane in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). DEFINE INTERLEAVE_HIGH_BYTES(src1[127:0], src2[127:0]) { dst[7:0] := src1[71:64] dst[15:8] := src2[71:64] dst[23:16] := src1[79:72] dst[31:24] := src2[79:72] dst[39:32] := src1[87:80] dst[47:40] := src2[87:80] dst[55:48] := src1[95:88] dst[63:56] := src2[95:88] dst[71:64] := src1[103:96] dst[79:72] := src2[103:96] dst[87:80] := src1[111:104] dst[95:88] := src2[111:104] dst[103:96] := src1[119:112] dst[111:104] := src2[119:112] dst[119:112] := src1[127:120] dst[127:120] := src2[127:120] RETURN dst[127:0] } tmp_dst[127:0] := INTERLEAVE_HIGH_BYTES(a[127:0], b[127:0]) tmp_dst[255:128] := INTERLEAVE_HIGH_BYTES(a[255:128], b[255:128]) tmp_dst[383:256] := INTERLEAVE_HIGH_BYTES(a[383:256], b[383:256]) tmp_dst[511:384] := INTERLEAVE_HIGH_BYTES(a[511:384], b[511:384]) FOR j := 0 to 63 i := j*8 IF k[j] dst[i+7:i] := tmp_dst[i+7:i] ELSE dst[i+7:i] := src[i+7:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512BW Miscellaneous Unpack and interleave 8-bit integers from the high half of each 128-bit lane in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). DEFINE INTERLEAVE_HIGH_BYTES(src1[127:0], src2[127:0]) { dst[7:0] := src1[71:64] dst[15:8] := src2[71:64] dst[23:16] := src1[79:72] dst[31:24] := src2[79:72] dst[39:32] := src1[87:80] dst[47:40] := src2[87:80] dst[55:48] := src1[95:88] dst[63:56] := src2[95:88] dst[71:64] := src1[103:96] dst[79:72] := src2[103:96] dst[87:80] := src1[111:104] dst[95:88] := src2[111:104] dst[103:96] := src1[119:112] dst[111:104] := src2[119:112] dst[119:112] := src1[127:120] dst[127:120] := src2[127:120] RETURN dst[127:0] } tmp_dst[127:0] := INTERLEAVE_HIGH_BYTES(a[127:0], b[127:0]) tmp_dst[255:128] := INTERLEAVE_HIGH_BYTES(a[255:128], b[255:128]) tmp_dst[383:256] := INTERLEAVE_HIGH_BYTES(a[383:256], b[383:256]) tmp_dst[511:384] := INTERLEAVE_HIGH_BYTES(a[511:384], b[511:384]) FOR j := 0 to 63 i := j*8 IF k[j] dst[i+7:i] := tmp_dst[i+7:i] ELSE dst[i+7:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512BW Miscellaneous Unpack and interleave 8-bit integers from the high half of each 128-bit lane in "a" and "b", and store the results in "dst". DEFINE INTERLEAVE_HIGH_BYTES(src1[127:0], src2[127:0]) { dst[7:0] := src1[71:64] dst[15:8] := src2[71:64] dst[23:16] := src1[79:72] dst[31:24] := src2[79:72] dst[39:32] := src1[87:80] dst[47:40] := src2[87:80] dst[55:48] := src1[95:88] dst[63:56] := src2[95:88] dst[71:64] := src1[103:96] dst[79:72] := src2[103:96] dst[87:80] := src1[111:104] dst[95:88] := src2[111:104] dst[103:96] := src1[119:112] dst[111:104] := src2[119:112] dst[119:112] := src1[127:120] dst[127:120] := src2[127:120] RETURN dst[127:0] } dst[127:0] := INTERLEAVE_HIGH_BYTES(a[127:0], b[127:0]) dst[255:128] := INTERLEAVE_HIGH_BYTES(a[255:128], b[255:128]) dst[383:256] := INTERLEAVE_HIGH_BYTES(a[383:256], b[383:256]) dst[511:384] := INTERLEAVE_HIGH_BYTES(a[511:384], b[511:384]) dst[MAX:512] := 0
Integer AVX512VL AVX512BW Miscellaneous Unpack and interleave 8-bit integers from the high half of "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). DEFINE INTERLEAVE_HIGH_BYTES(src1[127:0], src2[127:0]) { dst[7:0] := src1[71:64] dst[15:8] := src2[71:64] dst[23:16] := src1[79:72] dst[31:24] := src2[79:72] dst[39:32] := src1[87:80] dst[47:40] := src2[87:80] dst[55:48] := src1[95:88] dst[63:56] := src2[95:88] dst[71:64] := src1[103:96] dst[79:72] := src2[103:96] dst[87:80] := src1[111:104] dst[95:88] := src2[111:104] dst[103:96] := src1[119:112] dst[111:104] := src2[119:112] dst[119:112] := src1[127:120] dst[127:120] := src2[127:120] RETURN dst[127:0] } tmp_dst[127:0] := INTERLEAVE_HIGH_BYTES(a[127:0], b[127:0]) FOR j := 0 to 15 i := j*8 IF k[j] dst[i+7:i] := tmp_dst[i+7:i] ELSE dst[i+7:i] := src[i+7:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512BW Miscellaneous Unpack and interleave 8-bit integers from the high half of "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). DEFINE INTERLEAVE_HIGH_BYTES(src1[127:0], src2[127:0]) { dst[7:0] := src1[71:64] dst[15:8] := src2[71:64] dst[23:16] := src1[79:72] dst[31:24] := src2[79:72] dst[39:32] := src1[87:80] dst[47:40] := src2[87:80] dst[55:48] := src1[95:88] dst[63:56] := src2[95:88] dst[71:64] := src1[103:96] dst[79:72] := src2[103:96] dst[87:80] := src1[111:104] dst[95:88] := src2[111:104] dst[103:96] := src1[119:112] dst[111:104] := src2[119:112] dst[119:112] := src1[127:120] dst[127:120] := src2[127:120] RETURN dst[127:0] } tmp_dst[127:0] := INTERLEAVE_HIGH_BYTES(a[127:0], b[127:0]) FOR j := 0 to 15 i := j*8 IF k[j] dst[i+7:i] := tmp_dst[i+7:i] ELSE dst[i+7:i] := 0 FI ENDFOR dst[MAX:128] := 0
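The high-half byte interleave and its two masking rules can be checked against a small scalar model. The sketch below is only an illustration of the pseudocode above, not an implementation of any intrinsic: the function names `interleave_high_bytes` and `unpackhi_bytes` are hypothetical, vectors are represented as Python lists of byte values (index n holds bits [8n+7:8n]), and the mask is a plain integer.

```python
# Hypothetical reference model; names are illustrative, not intrinsic names.

def interleave_high_bytes(src1, src2):
    """Model INTERLEAVE_HIGH_BYTES for one 128-bit lane (16 byte values)."""
    assert len(src1) == 16 and len(src2) == 16
    dst = []
    for n in range(8, 16):      # bytes 8..15 form the high half of the lane
        dst.append(src1[n])     # even result bytes come from src1
        dst.append(src2[n])     # odd result bytes come from src2
    return dst


def unpackhi_bytes(a, b, k=None, src=None):
    """Apply the lane model to each 128-bit lane, then an optional mask.

    k is an int whose bit j controls result byte j. With k given and src
    omitted, the zeromask rule applies; with src given, the writemask rule.
    """
    assert len(a) == len(b) and len(a) % 16 == 0
    tmp = []
    for lane in range(0, len(a), 16):
        tmp += interleave_high_bytes(a[lane:lane + 16], b[lane:lane + 16])
    if k is None:
        return tmp
    fallback = [0] * len(tmp) if src is None else src
    return [tmp[j] if (k >> j) & 1 else fallback[j] for j in range(len(tmp))]


a = list(range(16))            # bytes 0..15
b = list(range(100, 116))      # bytes 100..115
print(unpackhi_bytes(a, b))                 # unmasked
print(unpackhi_bytes(a, b, k=0b0101))       # zeromask: only bytes 0 and 2 kept
```

Passing 32 or 64 bytes models the 256-bit and 512-bit forms, since each 128-bit lane is processed independently.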
Integer AVX512VL AVX512BW Miscellaneous Unpack and interleave 16-bit integers from the high half of each 128-bit lane in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). DEFINE INTERLEAVE_HIGH_WORDS(src1[127:0], src2[127:0]) { dst[15:0] := src1[79:64] dst[31:16] := src2[79:64] dst[47:32] := src1[95:80] dst[63:48] := src2[95:80] dst[79:64] := src1[111:96] dst[95:80] := src2[111:96] dst[111:96] := src1[127:112] dst[127:112] := src2[127:112] RETURN dst[127:0] } tmp_dst[127:0] := INTERLEAVE_HIGH_WORDS(a[127:0], b[127:0]) tmp_dst[255:128] := INTERLEAVE_HIGH_WORDS(a[255:128], b[255:128]) FOR j := 0 to 15 i := j*16 IF k[j] dst[i+15:i] := tmp_dst[i+15:i] ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512BW Miscellaneous Unpack and interleave 16-bit integers from the high half of each 128-bit lane in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). DEFINE INTERLEAVE_HIGH_WORDS(src1[127:0], src2[127:0]) { dst[15:0] := src1[79:64] dst[31:16] := src2[79:64] dst[47:32] := src1[95:80] dst[63:48] := src2[95:80] dst[79:64] := src1[111:96] dst[95:80] := src2[111:96] dst[111:96] := src1[127:112] dst[127:112] := src2[127:112] RETURN dst[127:0] } tmp_dst[127:0] := INTERLEAVE_HIGH_WORDS(a[127:0], b[127:0]) tmp_dst[255:128] := INTERLEAVE_HIGH_WORDS(a[255:128], b[255:128]) FOR j := 0 to 15 i := j*16 IF k[j] dst[i+15:i] := tmp_dst[i+15:i] ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512BW Miscellaneous Unpack and interleave 16-bit integers from the high half of each 128-bit lane in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). DEFINE INTERLEAVE_HIGH_WORDS(src1[127:0], src2[127:0]) { dst[15:0] := src1[79:64] dst[31:16] := src2[79:64] dst[47:32] := src1[95:80] dst[63:48] := src2[95:80] dst[79:64] := src1[111:96] dst[95:80] := src2[111:96] dst[111:96] := src1[127:112] dst[127:112] := src2[127:112] RETURN dst[127:0] } tmp_dst[127:0] := INTERLEAVE_HIGH_WORDS(a[127:0], b[127:0]) tmp_dst[255:128] := INTERLEAVE_HIGH_WORDS(a[255:128], b[255:128]) tmp_dst[383:256] := INTERLEAVE_HIGH_WORDS(a[383:256], b[383:256]) tmp_dst[511:384] := INTERLEAVE_HIGH_WORDS(a[511:384], b[511:384]) FOR j := 0 to 31 i := j*16 IF k[j] dst[i+15:i] := tmp_dst[i+15:i] ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512BW Miscellaneous Unpack and interleave 16-bit integers from the high half of each 128-bit lane in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). DEFINE INTERLEAVE_HIGH_WORDS(src1[127:0], src2[127:0]) { dst[15:0] := src1[79:64] dst[31:16] := src2[79:64] dst[47:32] := src1[95:80] dst[63:48] := src2[95:80] dst[79:64] := src1[111:96] dst[95:80] := src2[111:96] dst[111:96] := src1[127:112] dst[127:112] := src2[127:112] RETURN dst[127:0] } tmp_dst[127:0] := INTERLEAVE_HIGH_WORDS(a[127:0], b[127:0]) tmp_dst[255:128] := INTERLEAVE_HIGH_WORDS(a[255:128], b[255:128]) tmp_dst[383:256] := INTERLEAVE_HIGH_WORDS(a[383:256], b[383:256]) tmp_dst[511:384] := INTERLEAVE_HIGH_WORDS(a[511:384], b[511:384]) FOR j := 0 to 31 i := j*16 IF k[j] dst[i+15:i] := tmp_dst[i+15:i] ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512BW Miscellaneous Unpack and interleave 16-bit integers from the high half of each 128-bit lane in "a" and "b", and store the results in "dst". DEFINE INTERLEAVE_HIGH_WORDS(src1[127:0], src2[127:0]) { dst[15:0] := src1[79:64] dst[31:16] := src2[79:64] dst[47:32] := src1[95:80] dst[63:48] := src2[95:80] dst[79:64] := src1[111:96] dst[95:80] := src2[111:96] dst[111:96] := src1[127:112] dst[127:112] := src2[127:112] RETURN dst[127:0] } dst[127:0] := INTERLEAVE_HIGH_WORDS(a[127:0], b[127:0]) dst[255:128] := INTERLEAVE_HIGH_WORDS(a[255:128], b[255:128]) dst[383:256] := INTERLEAVE_HIGH_WORDS(a[383:256], b[383:256]) dst[511:384] := INTERLEAVE_HIGH_WORDS(a[511:384], b[511:384]) dst[MAX:512] := 0
Integer AVX512VL AVX512BW Miscellaneous Unpack and interleave 16-bit integers from the high half of "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). DEFINE INTERLEAVE_HIGH_WORDS(src1[127:0], src2[127:0]) { dst[15:0] := src1[79:64] dst[31:16] := src2[79:64] dst[47:32] := src1[95:80] dst[63:48] := src2[95:80] dst[79:64] := src1[111:96] dst[95:80] := src2[111:96] dst[111:96] := src1[127:112] dst[127:112] := src2[127:112] RETURN dst[127:0] } tmp_dst[127:0] := INTERLEAVE_HIGH_WORDS(a[127:0], b[127:0]) FOR j := 0 to 7 i := j*16 IF k[j] dst[i+15:i] := tmp_dst[i+15:i] ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512BW Miscellaneous Unpack and interleave 16-bit integers from the high half of "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). DEFINE INTERLEAVE_HIGH_WORDS(src1[127:0], src2[127:0]) { dst[15:0] := src1[79:64] dst[31:16] := src2[79:64] dst[47:32] := src1[95:80] dst[63:48] := src2[95:80] dst[79:64] := src1[111:96] dst[95:80] := src2[111:96] dst[111:96] := src1[127:112] dst[127:112] := src2[127:112] RETURN dst[127:0] } tmp_dst[127:0] := INTERLEAVE_HIGH_WORDS(a[127:0], b[127:0]) FOR j := 0 to 7 i := j*16 IF k[j] dst[i+15:i] := tmp_dst[i+15:i] ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512BW Miscellaneous Unpack and interleave 8-bit integers from the low half of each 128-bit lane in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). DEFINE INTERLEAVE_BYTES(src1[127:0], src2[127:0]) { dst[7:0] := src1[7:0] dst[15:8] := src2[7:0] dst[23:16] := src1[15:8] dst[31:24] := src2[15:8] dst[39:32] := src1[23:16] dst[47:40] := src2[23:16] dst[55:48] := src1[31:24] dst[63:56] := src2[31:24] dst[71:64] := src1[39:32] dst[79:72] := src2[39:32] dst[87:80] := src1[47:40] dst[95:88] := src2[47:40] dst[103:96] := src1[55:48] dst[111:104] := src2[55:48] dst[119:112] := src1[63:56] dst[127:120] := src2[63:56] RETURN dst[127:0] } tmp_dst[127:0] := INTERLEAVE_BYTES(a[127:0], b[127:0]) tmp_dst[255:128] := INTERLEAVE_BYTES(a[255:128], b[255:128]) FOR j := 0 to 31 i := j*8 IF k[j] dst[i+7:i] := tmp_dst[i+7:i] ELSE dst[i+7:i] := src[i+7:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512BW Miscellaneous Unpack and interleave 8-bit integers from the low half of each 128-bit lane in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). DEFINE INTERLEAVE_BYTES(src1[127:0], src2[127:0]) { dst[7:0] := src1[7:0] dst[15:8] := src2[7:0] dst[23:16] := src1[15:8] dst[31:24] := src2[15:8] dst[39:32] := src1[23:16] dst[47:40] := src2[23:16] dst[55:48] := src1[31:24] dst[63:56] := src2[31:24] dst[71:64] := src1[39:32] dst[79:72] := src2[39:32] dst[87:80] := src1[47:40] dst[95:88] := src2[47:40] dst[103:96] := src1[55:48] dst[111:104] := src2[55:48] dst[119:112] := src1[63:56] dst[127:120] := src2[63:56] RETURN dst[127:0] } tmp_dst[127:0] := INTERLEAVE_BYTES(a[127:0], b[127:0]) tmp_dst[255:128] := INTERLEAVE_BYTES(a[255:128], b[255:128]) FOR j := 0 to 31 i := j*8 IF k[j] dst[i+7:i] := tmp_dst[i+7:i] ELSE dst[i+7:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512BW Miscellaneous Unpack and interleave 8-bit integers from the low half of each 128-bit lane in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). DEFINE INTERLEAVE_BYTES(src1[127:0], src2[127:0]) { dst[7:0] := src1[7:0] dst[15:8] := src2[7:0] dst[23:16] := src1[15:8] dst[31:24] := src2[15:8] dst[39:32] := src1[23:16] dst[47:40] := src2[23:16] dst[55:48] := src1[31:24] dst[63:56] := src2[31:24] dst[71:64] := src1[39:32] dst[79:72] := src2[39:32] dst[87:80] := src1[47:40] dst[95:88] := src2[47:40] dst[103:96] := src1[55:48] dst[111:104] := src2[55:48] dst[119:112] := src1[63:56] dst[127:120] := src2[63:56] RETURN dst[127:0] } tmp_dst[127:0] := INTERLEAVE_BYTES(a[127:0], b[127:0]) tmp_dst[255:128] := INTERLEAVE_BYTES(a[255:128], b[255:128]) tmp_dst[383:256] := INTERLEAVE_BYTES(a[383:256], b[383:256]) tmp_dst[511:384] := INTERLEAVE_BYTES(a[511:384], b[511:384]) FOR j := 0 to 63 i := j*8 IF k[j] dst[i+7:i] := tmp_dst[i+7:i] ELSE dst[i+7:i] := src[i+7:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512BW Miscellaneous Unpack and interleave 8-bit integers from the low half of each 128-bit lane in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). DEFINE INTERLEAVE_BYTES(src1[127:0], src2[127:0]) { dst[7:0] := src1[7:0] dst[15:8] := src2[7:0] dst[23:16] := src1[15:8] dst[31:24] := src2[15:8] dst[39:32] := src1[23:16] dst[47:40] := src2[23:16] dst[55:48] := src1[31:24] dst[63:56] := src2[31:24] dst[71:64] := src1[39:32] dst[79:72] := src2[39:32] dst[87:80] := src1[47:40] dst[95:88] := src2[47:40] dst[103:96] := src1[55:48] dst[111:104] := src2[55:48] dst[119:112] := src1[63:56] dst[127:120] := src2[63:56] RETURN dst[127:0] } tmp_dst[127:0] := INTERLEAVE_BYTES(a[127:0], b[127:0]) tmp_dst[255:128] := INTERLEAVE_BYTES(a[255:128], b[255:128]) tmp_dst[383:256] := INTERLEAVE_BYTES(a[383:256], b[383:256]) tmp_dst[511:384] := INTERLEAVE_BYTES(a[511:384], b[511:384]) FOR j := 0 to 63 i := j*8 IF k[j] dst[i+7:i] := tmp_dst[i+7:i] ELSE dst[i+7:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512BW Miscellaneous Unpack and interleave 8-bit integers from the low half of each 128-bit lane in "a" and "b", and store the results in "dst". DEFINE INTERLEAVE_BYTES(src1[127:0], src2[127:0]) { dst[7:0] := src1[7:0] dst[15:8] := src2[7:0] dst[23:16] := src1[15:8] dst[31:24] := src2[15:8] dst[39:32] := src1[23:16] dst[47:40] := src2[23:16] dst[55:48] := src1[31:24] dst[63:56] := src2[31:24] dst[71:64] := src1[39:32] dst[79:72] := src2[39:32] dst[87:80] := src1[47:40] dst[95:88] := src2[47:40] dst[103:96] := src1[55:48] dst[111:104] := src2[55:48] dst[119:112] := src1[63:56] dst[127:120] := src2[63:56] RETURN dst[127:0] } dst[127:0] := INTERLEAVE_BYTES(a[127:0], b[127:0]) dst[255:128] := INTERLEAVE_BYTES(a[255:128], b[255:128]) dst[383:256] := INTERLEAVE_BYTES(a[383:256], b[383:256]) dst[511:384] := INTERLEAVE_BYTES(a[511:384], b[511:384]) dst[MAX:512] := 0
Integer AVX512VL AVX512BW Miscellaneous Unpack and interleave 8-bit integers from the low half of "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). DEFINE INTERLEAVE_BYTES(src1[127:0], src2[127:0]) { dst[7:0] := src1[7:0] dst[15:8] := src2[7:0] dst[23:16] := src1[15:8] dst[31:24] := src2[15:8] dst[39:32] := src1[23:16] dst[47:40] := src2[23:16] dst[55:48] := src1[31:24] dst[63:56] := src2[31:24] dst[71:64] := src1[39:32] dst[79:72] := src2[39:32] dst[87:80] := src1[47:40] dst[95:88] := src2[47:40] dst[103:96] := src1[55:48] dst[111:104] := src2[55:48] dst[119:112] := src1[63:56] dst[127:120] := src2[63:56] RETURN dst[127:0] } tmp_dst[127:0] := INTERLEAVE_BYTES(a[127:0], b[127:0]) FOR j := 0 to 15 i := j*8 IF k[j] dst[i+7:i] := tmp_dst[i+7:i] ELSE dst[i+7:i] := src[i+7:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512BW Miscellaneous Unpack and interleave 8-bit integers from the low half of "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). DEFINE INTERLEAVE_BYTES(src1[127:0], src2[127:0]) { dst[7:0] := src1[7:0] dst[15:8] := src2[7:0] dst[23:16] := src1[15:8] dst[31:24] := src2[15:8] dst[39:32] := src1[23:16] dst[47:40] := src2[23:16] dst[55:48] := src1[31:24] dst[63:56] := src2[31:24] dst[71:64] := src1[39:32] dst[79:72] := src2[39:32] dst[87:80] := src1[47:40] dst[95:88] := src2[47:40] dst[103:96] := src1[55:48] dst[111:104] := src2[55:48] dst[119:112] := src1[63:56] dst[127:120] := src2[63:56] RETURN dst[127:0] } tmp_dst[127:0] := INTERLEAVE_BYTES(a[127:0], b[127:0]) FOR j := 0 to 15 i := j*8 IF k[j] dst[i+7:i] := tmp_dst[i+7:i] ELSE dst[i+7:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512BW Miscellaneous Unpack and interleave 16-bit integers from the low half of each 128-bit lane in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). DEFINE INTERLEAVE_WORDS(src1[127:0], src2[127:0]) { dst[15:0] := src1[15:0] dst[31:16] := src2[15:0] dst[47:32] := src1[31:16] dst[63:48] := src2[31:16] dst[79:64] := src1[47:32] dst[95:80] := src2[47:32] dst[111:96] := src1[63:48] dst[127:112] := src2[63:48] RETURN dst[127:0] } tmp_dst[127:0] := INTERLEAVE_WORDS(a[127:0], b[127:0]) tmp_dst[255:128] := INTERLEAVE_WORDS(a[255:128], b[255:128]) FOR j := 0 to 15 i := j*16 IF k[j] dst[i+15:i] := tmp_dst[i+15:i] ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512BW Miscellaneous Unpack and interleave 16-bit integers from the low half of each 128-bit lane in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). DEFINE INTERLEAVE_WORDS(src1[127:0], src2[127:0]) { dst[15:0] := src1[15:0] dst[31:16] := src2[15:0] dst[47:32] := src1[31:16] dst[63:48] := src2[31:16] dst[79:64] := src1[47:32] dst[95:80] := src2[47:32] dst[111:96] := src1[63:48] dst[127:112] := src2[63:48] RETURN dst[127:0] } tmp_dst[127:0] := INTERLEAVE_WORDS(a[127:0], b[127:0]) tmp_dst[255:128] := INTERLEAVE_WORDS(a[255:128], b[255:128]) FOR j := 0 to 15 i := j*16 IF k[j] dst[i+15:i] := tmp_dst[i+15:i] ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512BW Miscellaneous Unpack and interleave 16-bit integers from the low half of each 128-bit lane in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). DEFINE INTERLEAVE_WORDS(src1[127:0], src2[127:0]) { dst[15:0] := src1[15:0] dst[31:16] := src2[15:0] dst[47:32] := src1[31:16] dst[63:48] := src2[31:16] dst[79:64] := src1[47:32] dst[95:80] := src2[47:32] dst[111:96] := src1[63:48] dst[127:112] := src2[63:48] RETURN dst[127:0] } tmp_dst[127:0] := INTERLEAVE_WORDS(a[127:0], b[127:0]) tmp_dst[255:128] := INTERLEAVE_WORDS(a[255:128], b[255:128]) tmp_dst[383:256] := INTERLEAVE_WORDS(a[383:256], b[383:256]) tmp_dst[511:384] := INTERLEAVE_WORDS(a[511:384], b[511:384]) FOR j := 0 to 31 i := j*16 IF k[j] dst[i+15:i] := tmp_dst[i+15:i] ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512BW Miscellaneous Unpack and interleave 16-bit integers from the low half of each 128-bit lane in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). DEFINE INTERLEAVE_WORDS(src1[127:0], src2[127:0]) { dst[15:0] := src1[15:0] dst[31:16] := src2[15:0] dst[47:32] := src1[31:16] dst[63:48] := src2[31:16] dst[79:64] := src1[47:32] dst[95:80] := src2[47:32] dst[111:96] := src1[63:48] dst[127:112] := src2[63:48] RETURN dst[127:0] } tmp_dst[127:0] := INTERLEAVE_WORDS(a[127:0], b[127:0]) tmp_dst[255:128] := INTERLEAVE_WORDS(a[255:128], b[255:128]) tmp_dst[383:256] := INTERLEAVE_WORDS(a[383:256], b[383:256]) tmp_dst[511:384] := INTERLEAVE_WORDS(a[511:384], b[511:384]) FOR j := 0 to 31 i := j*16 IF k[j] dst[i+15:i] := tmp_dst[i+15:i] ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512BW Miscellaneous Unpack and interleave 16-bit integers from the low half of each 128-bit lane in "a" and "b", and store the results in "dst". DEFINE INTERLEAVE_WORDS(src1[127:0], src2[127:0]) { dst[15:0] := src1[15:0] dst[31:16] := src2[15:0] dst[47:32] := src1[31:16] dst[63:48] := src2[31:16] dst[79:64] := src1[47:32] dst[95:80] := src2[47:32] dst[111:96] := src1[63:48] dst[127:112] := src2[63:48] RETURN dst[127:0] } dst[127:0] := INTERLEAVE_WORDS(a[127:0], b[127:0]) dst[255:128] := INTERLEAVE_WORDS(a[255:128], b[255:128]) dst[383:256] := INTERLEAVE_WORDS(a[383:256], b[383:256]) dst[511:384] := INTERLEAVE_WORDS(a[511:384], b[511:384]) dst[MAX:512] := 0
Integer AVX512VL AVX512BW Miscellaneous Unpack and interleave 16-bit integers from the low half of "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). DEFINE INTERLEAVE_WORDS(src1[127:0], src2[127:0]) { dst[15:0] := src1[15:0] dst[31:16] := src2[15:0] dst[47:32] := src1[31:16] dst[63:48] := src2[31:16] dst[79:64] := src1[47:32] dst[95:80] := src2[47:32] dst[111:96] := src1[63:48] dst[127:112] := src2[63:48] RETURN dst[127:0] } tmp_dst[127:0] := INTERLEAVE_WORDS(a[127:0], b[127:0]) FOR j := 0 to 7 i := j*16 IF k[j] dst[i+15:i] := tmp_dst[i+15:i] ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512BW Miscellaneous Unpack and interleave 16-bit integers from the low half of "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). DEFINE INTERLEAVE_WORDS(src1[127:0], src2[127:0]) { dst[15:0] := src1[15:0] dst[31:16] := src2[15:0] dst[47:32] := src1[31:16] dst[63:48] := src2[31:16] dst[79:64] := src1[47:32] dst[95:80] := src2[47:32] dst[111:96] := src1[63:48] dst[127:112] := src2[63:48] RETURN dst[127:0] } tmp_dst[127:0] := INTERLEAVE_WORDS(a[127:0], b[127:0]) FOR j := 0 to 7 i := j*16 IF k[j] dst[i+15:i] := tmp_dst[i+15:i] ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:128] := 0
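The bit-range notation used in the low-half word interleave can also be modeled directly on integers treated as bit vectors. The following is a minimal sketch under that assumption; `bits`, `pack16`, `interleave_words` and `mask_unpacklo_words` are hypothetical helper names chosen for this illustration, and `lanes` selects 1, 2 or 4 lanes for the 128-, 256- and 512-bit forms.

```python
# Hypothetical bit-vector model of INTERLEAVE_WORDS with a writemask.

def bits(x, hi, lo):
    """Return x[hi:lo] as an integer, mirroring the pseudocode notation."""
    return (x >> lo) & ((1 << (hi - lo + 1)) - 1)


def pack16(words):
    """Pack a list of 16-bit values into one integer, element 0 lowest."""
    value = 0
    for n, w in enumerate(words):
        value |= (w & 0xFFFF) << (16 * n)
    return value


def interleave_words(src1, src2):
    """Interleave the four low words of two 128-bit lanes."""
    dst = 0
    for n in range(4):
        dst |= bits(src1, 16 * n + 15, 16 * n) << (32 * n)
        dst |= bits(src2, 16 * n + 15, 16 * n) << (32 * n + 16)
    return dst


def mask_unpacklo_words(src, k, a, b, lanes):
    """Writemask form: word j comes from the interleave if k[j], else src."""
    tmp = 0
    for lane in range(lanes):
        lo = 128 * lane
        tmp |= interleave_words(bits(a, lo + 127, lo), bits(b, lo + 127, lo)) << lo
    dst = 0
    for j in range(8 * lanes):
        i = 16 * j
        chosen = tmp if (k >> j) & 1 else src
        dst |= bits(chosen, i + 15, i) << i
    return dst


a = pack16(range(8))
b = pack16(range(0x10, 0x18))
print(hex(interleave_words(a, b)))
```

The zeromask forms correspond to calling the same function with `src` set to 0.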
Integer AVX512BW Store Store 512-bits (composed of 32 packed 16-bit integers) from "a" into memory. "mem_addr" does not need to be aligned on any particular boundary. MEM[mem_addr+511:mem_addr] := a[511:0]
Integer AVX512BW Store Store 512-bits (composed of 64 packed 8-bit integers) from "a" into memory. "mem_addr" does not need to be aligned on any particular boundary. MEM[mem_addr+511:mem_addr] := a[511:0]
Integer AVX512VL AVX512BW Store Store 256-bits (composed of 16 packed 16-bit integers) from "a" into memory. "mem_addr" does not need to be aligned on any particular boundary. MEM[mem_addr+255:mem_addr] := a[255:0]
Integer AVX512VL AVX512BW Store Store 256-bits (composed of 32 packed 8-bit integers) from "a" into memory. "mem_addr" does not need to be aligned on any particular boundary. MEM[mem_addr+255:mem_addr] := a[255:0]
Integer AVX512VL AVX512BW Store Store 128-bits (composed of 8 packed 16-bit integers) from "a" into memory. "mem_addr" does not need to be aligned on any particular boundary. MEM[mem_addr+127:mem_addr] := a[127:0]
Integer AVX512VL AVX512BW Store Store 128-bits (composed of 16 packed 8-bit integers) from "a" into memory. "mem_addr" does not need to be aligned on any particular boundary. MEM[mem_addr+127:mem_addr] := a[127:0]
Integer AVX512BW Load Load 512-bits (composed of 32 packed 16-bit integers) from memory into "dst". "mem_addr" does not need to be aligned on any particular boundary. dst[511:0] := MEM[mem_addr+511:mem_addr] dst[MAX:512] := 0
Integer AVX512BW Load Load 512-bits (composed of 64 packed 8-bit integers) from memory into "dst". "mem_addr" does not need to be aligned on any particular boundary. dst[511:0] := MEM[mem_addr+511:mem_addr] dst[MAX:512] := 0
Integer AVX512VL AVX512BW Load Load 256-bits (composed of 16 packed 16-bit integers) from memory into "dst". "mem_addr" does not need to be aligned on any particular boundary. dst[255:0] := MEM[mem_addr+255:mem_addr] dst[MAX:256] := 0
Integer AVX512VL AVX512BW Load Load 256-bits (composed of 32 packed 8-bit integers) from memory into "dst". "mem_addr" does not need to be aligned on any particular boundary. dst[255:0] := MEM[mem_addr+255:mem_addr] dst[MAX:256] := 0
Integer AVX512VL AVX512BW Load Load 128-bits (composed of 8 packed 16-bit integers) from memory into "dst". "mem_addr" does not need to be aligned on any particular boundary. dst[127:0] := MEM[mem_addr+127:mem_addr] dst[MAX:128] := 0
Integer AVX512VL AVX512BW Load Load 128-bits (composed of 16 packed 8-bit integers) from memory into "dst". "mem_addr" does not need to be aligned on any particular boundary. dst[127:0] := MEM[mem_addr+127:mem_addr] dst[MAX:128] := 0
Mask AVX512BW Mask Add 32-bit masks in "a" and "b", and store the result in "k". k[31:0] := a[31:0] + b[31:0] k[MAX:32] := 0
Mask AVX512BW Mask Add 64-bit masks in "a" and "b", and store the result in "k". k[63:0] := a[63:0] + b[63:0] k[MAX:64] := 0
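Because the result is written only to k[31:0] or k[63:0], the mask addition discards any carry out of the top bit. A minimal sketch of that reading of the pseudocode follows; `kadd` is a hypothetical name and `width` (32 or 64) is a parameter introduced for this example.

```python
def kadd(a, b, width):
    """Model mask addition truncated to `width` bits (hypothetical helper)."""
    full = (1 << width) - 1
    # Only the low `width` bits of each operand take part; the carry is lost.
    return ((a & full) + (b & full)) & full


print(hex(kadd(0xFFFFFFFF, 1, 32)))   # wraps around to zero
print(hex(kadd(0xFFFFFFFF, 1, 64)))   # fits in 64 bits, no wrap
```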
Mask AVX512BW Mask Compute the bitwise AND of 32-bit masks "a" and "b", and store the result in "k". k[31:0] := a[31:0] AND b[31:0] k[MAX:32] := 0
Mask AVX512BW Mask Compute the bitwise AND of 64-bit masks "a" and "b", and store the result in "k". k[63:0] := a[63:0] AND b[63:0] k[MAX:64] := 0
Mask AVX512BW Mask Compute the bitwise NOT of 32-bit mask "a" and then AND with "b", and store the result in "k". k[31:0] := (NOT a[31:0]) AND b[31:0] k[MAX:32] := 0
Mask AVX512BW Mask Compute the bitwise NOT of 64-bit mask "a" and then AND with "b", and store the result in "k". k[63:0] := (NOT a[63:0]) AND b[63:0] k[MAX:64] := 0
Mask AVX512BW Mask Compute the bitwise NOT of 32-bit mask "a", and store the result in "k". k[31:0] := NOT a[31:0] k[MAX:32] := 0
Mask AVX512BW Mask Compute the bitwise NOT of 64-bit mask "a", and store the result in "k". k[63:0] := NOT a[63:0] k[MAX:64] := 0
Mask AVX512BW Mask Compute the bitwise OR of 32-bit masks "a" and "b", and store the result in "k". k[31:0] := a[31:0] OR b[31:0] k[MAX:32] := 0
Mask AVX512BW Mask Compute the bitwise OR of 64-bit masks "a" and "b", and store the result in "k". k[63:0] := a[63:0] OR b[63:0] k[MAX:64] := 0
Mask AVX512BW Mask Compute the bitwise XNOR of 32-bit masks "a" and "b", and store the result in "k". k[31:0] := NOT (a[31:0] XOR b[31:0]) k[MAX:32] := 0
Mask AVX512BW Mask Compute the bitwise XNOR of 64-bit masks "a" and "b", and store the result in "k". k[63:0] := NOT (a[63:0] XOR b[63:0]) k[MAX:64] := 0
Mask AVX512BW Mask Compute the bitwise XOR of 32-bit masks "a" and "b", and store the result in "k". k[31:0] := a[31:0] XOR b[31:0] k[MAX:32] := 0
Mask AVX512BW Mask Compute the bitwise XOR of 64-bit masks "a" and "b", and store the result in "k". k[63:0] := a[63:0] XOR b[63:0] k[MAX:64] := 0
Mask AVX512BW Mask Shift the bits of 32-bit mask "a" left by "count" while shifting in zeros, and store the least significant 32 bits of the result in "k". k[MAX:0] := 0 IF count[7:0] <= 31 k[31:0] := a[31:0] << count[7:0] FI
Mask AVX512BW Mask Shift the bits of 64-bit mask "a" left by "count" while shifting in zeros, and store the least significant 64 bits of the result in "k". k[MAX:0] := 0 IF count[7:0] <= 63 k[63:0] := a[63:0] << count[7:0] FI
Mask AVX512BW Mask Shift the bits of 32-bit mask "a" right by "count" while shifting in zeros, and store the least significant 32 bits of the result in "k". k[MAX:0] := 0 IF count[7:0] <= 31 k[31:0] := a[31:0] >> count[7:0] FI
Mask AVX512BW Mask Shift the bits of 64-bit mask "a" right by "count" while shifting in zeros, and store the least significant 64 bits of the result in "k". k[MAX:0] := 0 IF count[7:0] <= 63 k[63:0] := a[63:0] >> count[7:0] FI
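The mask shifts first clear "k" and only then shift when the 8-bit count is within range, so an out-of-range count yields zero rather than a count taken modulo the width. The sketch below models that behavior; `kshiftli` and `kshiftri` are used here as hypothetical Python function names, and `width` is an assumed parameter selecting the 32-bit or 64-bit form.

```python
def kshiftli(a, count, width):
    """Left shift of a `width`-bit mask; counts above width-1 give 0."""
    full = (1 << width) - 1
    count &= 0xFF                     # only count[7:0] is examined
    if count > width - 1:
        return 0
    return ((a & full) << count) & full


def kshiftri(a, count, width):
    """Logical right shift of a `width`-bit mask; counts above width-1 give 0."""
    full = (1 << width) - 1
    count &= 0xFF
    if count > width - 1:
        return 0
    return (a & full) >> count


print(hex(kshiftli(1, 31, 32)))   # highest bit of a 32-bit mask
print(hex(kshiftli(1, 32, 32)))   # out of range, result is zero
```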
Mask AVX512BW Load Load 32-bit mask from memory into "k". k[31:0] := MEM[mem_addr+31:mem_addr]
Mask AVX512BW Load Load 64-bit mask from memory into "k". k[63:0] := MEM[mem_addr+63:mem_addr]
Mask AVX512BW Store Store 32-bit mask from "a" into memory. MEM[mem_addr+31:mem_addr] := a[31:0]
Mask AVX512BW Store Store 64-bit mask from "a" into memory. MEM[mem_addr+63:mem_addr] := a[63:0]
Mask AVX512BW Mask Compute the bitwise OR of 32-bit masks "a" and "b". If the result is all zeros, store 1 in "dst", otherwise store 0 in "dst". If the result is all ones, store 1 in "all_ones", otherwise store 0 in "all_ones". tmp[31:0] := a[31:0] OR b[31:0] IF tmp[31:0] == 0x0 dst := 1 ELSE dst := 0 FI IF tmp[31:0] == 0xFFFFFFFF MEM[all_ones+7:all_ones] := 1 ELSE MEM[all_ones+7:all_ones] := 0 FI
Mask AVX512BW Mask Compute the bitwise OR of 32-bit masks "a" and "b". If the result is all zeros, store 1 in "dst", otherwise store 0 in "dst". tmp[31:0] := a[31:0] OR b[31:0] IF tmp[31:0] == 0x0 dst := 1 ELSE dst := 0 FI
Mask AVX512BW Mask Compute the bitwise OR of 32-bit masks "a" and "b". If the result is all ones, store 1 in "dst", otherwise store 0 in "dst". tmp[31:0] := a[31:0] OR b[31:0] IF tmp[31:0] == 0xFFFFFFFF dst := 1 ELSE dst := 0 FI
Mask AVX512BW Mask Compute the bitwise OR of 64-bit masks "a" and "b". If the result is all zeros, store 1 in "dst", otherwise store 0 in "dst". If the result is all ones, store 1 in "all_ones", otherwise store 0 in "all_ones". tmp[63:0] := a[63:0] OR b[63:0] IF tmp[63:0] == 0x0 dst := 1 ELSE dst := 0 FI IF tmp[63:0] == 0xFFFFFFFFFFFFFFFF MEM[all_ones+7:all_ones] := 1 ELSE MEM[all_ones+7:all_ones] := 0 FI
Mask AVX512BW Mask Compute the bitwise OR of 64-bit masks "a" and "b". If the result is all zeros, store 1 in "dst", otherwise store 0 in "dst". tmp[63:0] := a[63:0] OR b[63:0] IF tmp[63:0] == 0x0 dst := 1 ELSE dst := 0 FI
Mask AVX512BW Mask Compute the bitwise OR of 64-bit masks "a" and "b". If the result is all ones, store 1 in "dst", otherwise store 0 in "dst". tmp[63:0] := a[63:0] OR b[63:0] IF tmp[63:0] == 0xFFFFFFFFFFFFFFFF dst := 1 ELSE dst := 0 FI
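The OR-test entries report two independent conditions on the same OR result: whether it is all zeros and whether it is all ones across the full mask width. A minimal sketch, assuming a hypothetical helper `kortest` that returns both flags as a tuple instead of writing one of them through a pointer:

```python
def kortest(a, b, width):
    """Return (dst, all_ones) for the OR of two `width`-bit masks."""
    full = (1 << width) - 1
    tmp = (a | b) & full
    dst = 1 if tmp == 0 else 0            # set when every bit is zero
    all_ones = 1 if tmp == full else 0    # set when every bit is one
    return dst, all_ones


print(kortest(0, 0, 32))
print(kortest(0xFFFF0000, 0x0000FFFF, 32))
print(kortest(0xFF, 0, 64))
```

The single-result variants correspond to taking one element of the returned pair.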
Mask AVX512BW Mask Compute the bitwise AND of 32-bit masks "a" and "b", and if the result is all zeros, store 1 in "dst", otherwise store 0 in "dst". Compute the bitwise NOT of "a" and then AND with "b"; if the result is all zeros, store 1 in "and_not", otherwise store 0 in "and_not". tmp1[31:0] := a[31:0] AND b[31:0] IF tmp1[31:0] == 0x0 dst := 1 ELSE dst := 0 FI tmp2[31:0] := (NOT a[31:0]) AND b[31:0] IF tmp2[31:0] == 0x0 MEM[and_not+7:and_not] := 1 ELSE MEM[and_not+7:and_not] := 0 FI
Mask AVX512BW Mask Compute the bitwise AND of 32-bit masks "a" and "b", and if the result is all zeros, store 1 in "dst", otherwise store 0 in "dst". tmp[31:0] := a[31:0] AND b[31:0] IF tmp[31:0] == 0x0 dst := 1 ELSE dst := 0 FI
Mask AVX512BW Mask Compute the bitwise NOT of 32-bit mask "a" and then AND with "b"; if the result is all zeros, store 1 in "dst", otherwise store 0 in "dst". tmp[31:0] := (NOT a[31:0]) AND b[31:0] IF tmp[31:0] == 0x0 dst := 1 ELSE dst := 0 FI
Mask AVX512BW Mask Compute the bitwise AND of 64-bit masks "a" and "b", and if the result is all zeros, store 1 in "dst", otherwise store 0 in "dst". Compute the bitwise NOT of "a" and then AND with "b"; if the result is all zeros, store 1 in "and_not", otherwise store 0 in "and_not". tmp1[63:0] := a[63:0] AND b[63:0] IF tmp1[63:0] == 0x0 dst := 1 ELSE dst := 0 FI tmp2[63:0] := (NOT a[63:0]) AND b[63:0] IF tmp2[63:0] == 0x0 MEM[and_not+7:and_not] := 1 ELSE MEM[and_not+7:and_not] := 0 FI
Mask AVX512BW Mask Compute the bitwise AND of 64-bit masks "a" and "b", and if the result is all zeros, store 1 in "dst", otherwise store 0 in "dst". tmp[63:0] := a[63:0] AND b[63:0] IF tmp[63:0] == 0x0 dst := 1 ELSE dst := 0 FI
Mask AVX512BW Mask Compute the bitwise NOT of 64-bit mask "a" and then AND with "b"; if the result is all zeros, store 1 in "dst", otherwise store 0 in "dst". tmp[63:0] := (NOT a[63:0]) AND b[63:0] IF tmp[63:0] == 0x0 dst := 1 ELSE dst := 0 FI
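The AND-test entries likewise produce two flags: one for "a AND b" being zero and one for "(NOT a) AND b" being zero. The sketch below models both; `ktest` is a hypothetical helper name, `width` is an assumed parameter, and the tuple return stands in for the "dst" result and the "and_not" output location.

```python
def ktest(a, b, width):
    """Return (dst, and_not) for two `width`-bit masks."""
    full = (1 << width) - 1
    a &= full
    b &= full
    dst = 1 if (a & b) == 0 else 0                # no bit set in both masks
    and_not = 1 if ((~a & full) & b) == 0 else 0  # every bit of b is also in a
    return dst, and_not


print(ktest(0b1100, 0b0011, 32))   # disjoint masks
print(ktest(0b1111, 0b0011, 32))   # b is a subset of a
```

Read this way, "dst" signals that the masks are disjoint and "and_not" signals that "b" is a subset of "a".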
AVX512BW Mask Convert 32-bit mask "a" into an integer value, and store the result in "dst". dst := ZeroExtend32(a[31:0])
AVX512BW Mask Convert 64-bit mask "a" into an integer value, and store the result in "dst". dst := ZeroExtend64(a[63:0])
AVX512BW Mask Convert integer value "a" into a 32-bit mask, and store the result in "k". k := ZeroExtend32(a[31:0])
AVX512BW Mask Convert integer value "a" into a 64-bit mask, and store the result in "k". k := ZeroExtend64(a[63:0])
Integer AVX512VL AVX512CD Miscellaneous Broadcast the low 8-bits from input mask "k" to all 64-bit elements of "dst". FOR j := 0 to 3 i := j*64 dst[i+63:i] := ZeroExtend64(k[7:0]) ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512CD Miscellaneous Broadcast the low 8-bits from input mask "k" to all 64-bit elements of "dst". FOR j := 0 to 1 i := j*64 dst[i+63:i] := ZeroExtend64(k[7:0]) ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512CD Miscellaneous Broadcast the low 16-bits from input mask "k" to all 32-bit elements of "dst". FOR j := 0 to 7 i := j*32 dst[i+31:i] := ZeroExtend32(k[15:0]) ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512CD Miscellaneous Broadcast the low 16-bits from input mask "k" to all 32-bit elements of "dst". FOR j := 0 to 3 i := j*32 dst[i+31:i] := ZeroExtend32(k[15:0]) ENDFOR dst[MAX:128] := 0
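The mask broadcast entries copy the zero-extended low bits of "k" into every element of the destination. As a rough sketch on integers treated as bit vectors, with `broadcast_mask` and its parameters being hypothetical names introduced for this example:

```python
def broadcast_mask(k, mask_bits, elem_bits, count):
    """Place the low `mask_bits` of k, zero-extended, in `count` elements."""
    value = k & ((1 << mask_bits) - 1)    # k[mask_bits-1:0]
    dst = 0
    for j in range(count):
        dst |= value << (j * elem_bits)   # upper bits of each element stay 0
    return dst


# Low 8 bits into two 64-bit elements, then low 16 bits into four 32-bit elements.
print(hex(broadcast_mask(0x1A5, 8, 64, 2)))
print(hex(broadcast_mask(0x12345, 16, 32, 4)))
```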
Integer AVX512VL AVX512CD Compare Test each 32-bit element of "a" for equality with all other elements in "a" closer to the least significant bit. Each element's comparison forms a zero extended bit vector in "dst". FOR j := 0 to 7 i := j*32 FOR k := 0 to j-1 m := k*32 dst[i+k] := (a[i+31:i] == a[m+31:m]) ? 1 : 0 ENDFOR dst[i+31:i+j] := 0 ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512CD Compare Test each 32-bit element of "a" for equality with all other elements in "a" closer to the least significant bit using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). Each element's comparison forms a zero extended bit vector in "dst". FOR j := 0 to 7 i := j*32 IF k[j] FOR l := 0 to j-1 m := l*32 dst[i+l] := (a[i+31:i] == a[m+31:m]) ? 1 : 0 ENDFOR dst[i+31:i+j] := 0 ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512CD Compare Test each 32-bit element of "a" for equality with all other elements in "a" closer to the least significant bit using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). Each element's comparison forms a zero extended bit vector in "dst". FOR j := 0 to 7 i := j*32 IF k[j] FOR l := 0 to j-1 m := l*32 dst[i+l] := (a[i+31:i] == a[m+31:m]) ? 1 : 0 ENDFOR dst[i+31:i+j] := 0 ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512CD Compare Test each 32-bit element of "a" for equality with all other elements in "a" closer to the least significant bit. Each element's comparison forms a zero extended bit vector in "dst". FOR j := 0 to 3 i := j*32 FOR k := 0 to j-1 m := k*32 dst[i+k] := (a[i+31:i] == a[m+31:m]) ? 1 : 0 ENDFOR dst[i+31:i+j] := 0 ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512CD Compare Test each 32-bit element of "a" for equality with all other elements in "a" closer to the least significant bit using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). Each element's comparison forms a zero extended bit vector in "dst". FOR j := 0 to 3 i := j*32 IF k[j] FOR l := 0 to j-1 m := l*32 dst[i+l] := (a[i+31:i] == a[m+31:m]) ? 1 : 0 ENDFOR dst[i+31:i+j] := 0 ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512CD Compare Test each 32-bit element of "a" for equality with all other elements in "a" closer to the least significant bit using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). Each element's comparison forms a zero extended bit vector in "dst". FOR j := 0 to 3 i := j*32 IF k[j] FOR l := 0 to j-1 m := l*32 dst[i+l] := (a[i+31:i] == a[m+31:m]) ? 1 : 0 ENDFOR dst[i+31:i+j] := 0 ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512CD Compare Test each 64-bit element of "a" for equality with all other elements in "a" closer to the least significant bit. Each element's comparison forms a zero extended bit vector in "dst". FOR j := 0 to 3 i := j*64 FOR k := 0 to j-1 m := k*64 dst[i+k] := (a[i+63:i] == a[m+63:m]) ? 1 : 0 ENDFOR dst[i+63:i+j] := 0 ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512CD Compare Test each 64-bit element of "a" for equality with all other elements in "a" closer to the least significant bit using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). Each element's comparison forms a zero extended bit vector in "dst". FOR j := 0 to 3 i := j*64 IF k[j] FOR l := 0 to j-1 m := l*64 dst[i+l] := (a[i+63:i] == a[m+63:m]) ? 1 : 0 ENDFOR dst[i+63:i+j] := 0 ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512CD Compare Test each 64-bit element of "a" for equality with all other elements in "a" closer to the least significant bit using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). Each element's comparison forms a zero extended bit vector in "dst". FOR j := 0 to 3 i := j*64 IF k[j] FOR l := 0 to j-1 m := l*64 dst[i+l] := (a[i+63:i] == a[m+63:m]) ? 1 : 0 ENDFOR dst[i+63:i+j] := 0 ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512CD Compare Test each 64-bit element of "a" for equality with all other elements in "a" closer to the least significant bit. Each element's comparison forms a zero extended bit vector in "dst". FOR j := 0 to 1 i := j*64 FOR k := 0 to j-1 m := k*64 dst[i+k] := (a[i+63:i] == a[m+63:m]) ? 1 : 0 ENDFOR dst[i+63:i+j] := 0 ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512CD Compare Test each 64-bit element of "a" for equality with all other elements in "a" closer to the least significant bit using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). Each element's comparison forms a zero extended bit vector in "dst". FOR j := 0 to 1 i := j*64 IF k[j] FOR l := 0 to j-1 m := l*64 dst[i+l] := (a[i+63:i] == a[m+63:m]) ? 1 : 0 ENDFOR dst[i+63:i+j] := 0 ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512CD Compare Test each 64-bit element of "a" for equality with all other elements in "a" closer to the least significant bit using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). Each element's comparison forms a zero extended bit vector in "dst". FOR j := 0 to 1 i := j*64 IF k[j] FOR l := 0 to j-1 m := l*64 dst[i+l] := (a[i+63:i] == a[m+63:m]) ? 1 : 0 ENDFOR dst[i+63:i+j] := 0 ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:128] := 0
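The unmasked conflict-detection pseudocode above can be mirrored in a few lines of scalar code. This is a sketch under the assumption that a vector is modelled as a Python list with element 0 as the least significant lane; the name `conflict` is hypothetical.

```python
def conflict(a, width=32):
    """For each element j, set bit k (for every k < j) when a[j] == a[k].

    Mirrors: dst[i+k] := (a[j] == a[k]) ? 1 : 0, with the remaining bits zero.
    """
    lane_mask = (1 << width) - 1
    dst = []
    for j in range(len(a)):
        bits = 0
        for k in range(j):  # only elements closer to the least significant bit
            if (a[j] & lane_mask) == (a[k] & lane_mask):
                bits |= 1 << k
        dst.append(bits)
    return dst


# Element 2 matches element 0; element 3 matches elements 0 and 2.
print(conflict([5, 7, 5, 5]))  # prints [0, 0, 1, 5]
```

A zero result in lane j means that element has no earlier duplicate, which is why this operation is commonly used to detect index collisions before a scatter.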
Integer AVX512VL AVX512CD Bit Manipulation Count the number of leading zero bits in each packed 32-bit integer in "a", and store the results in "dst". FOR j := 0 to 7 i := j*32 tmp := 31 dst[i+31:i] := 0 DO WHILE (tmp >= 0 AND a[i+tmp] == 0) tmp := tmp - 1 dst[i+31:i] := dst[i+31:i] + 1 OD ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512CD Bit Manipulation Count the number of leading zero bits in each packed 32-bit integer in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 IF k[j] tmp := 31 dst[i+31:i] := 0 DO WHILE (tmp >= 0 AND a[i+tmp] == 0) tmp := tmp - 1 dst[i+31:i] := dst[i+31:i] + 1 OD ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512CD Bit Manipulation Count the number of leading zero bits in each packed 32-bit integer in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 IF k[j] tmp := 31 dst[i+31:i] := 0 DO WHILE (tmp >= 0 AND a[i+tmp] == 0) tmp := tmp - 1 dst[i+31:i] := dst[i+31:i] + 1 OD ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512CD Bit Manipulation Count the number of leading zero bits in each packed 32-bit integer in "a", and store the results in "dst". FOR j := 0 to 3 i := j*32 tmp := 31 dst[i+31:i] := 0 DO WHILE (tmp >= 0 AND a[i+tmp] == 0) tmp := tmp - 1 dst[i+31:i] := dst[i+31:i] + 1 OD ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512CD Bit Manipulation Count the number of leading zero bits in each packed 32-bit integer in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 IF k[j] tmp := 31 dst[i+31:i] := 0 DO WHILE (tmp >= 0 AND a[i+tmp] == 0) tmp := tmp - 1 dst[i+31:i] := dst[i+31:i] + 1 OD ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512CD Bit Manipulation Count the number of leading zero bits in each packed 32-bit integer in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 IF k[j] tmp := 31 dst[i+31:i] := 0 DO WHILE (tmp >= 0 AND a[i+tmp] == 0) tmp := tmp - 1 dst[i+31:i] := dst[i+31:i] + 1 OD ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512CD Bit Manipulation Count the number of leading zero bits in each packed 64-bit integer in "a", and store the results in "dst". FOR j := 0 to 3 i := j*64 tmp := 63 dst[i+63:i] := 0 DO WHILE (tmp >= 0 AND a[i+tmp] == 0) tmp := tmp - 1 dst[i+63:i] := dst[i+63:i] + 1 OD ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512CD Bit Manipulation Count the number of leading zero bits in each packed 64-bit integer in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k[j] tmp := 63 dst[i+63:i] := 0 DO WHILE (tmp >= 0 AND a[i+tmp] == 0) tmp := tmp - 1 dst[i+63:i] := dst[i+63:i] + 1 OD ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512CD Bit Manipulation Count the number of leading zero bits in each packed 64-bit integer in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k[j] tmp := 63 dst[i+63:i] := 0 DO WHILE (tmp >= 0 AND a[i+tmp] == 0) tmp := tmp - 1 dst[i+63:i] := dst[i+63:i] + 1 OD ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512CD Bit Manipulation Count the number of leading zero bits in each packed 64-bit integer in "a", and store the results in "dst". FOR j := 0 to 1 i := j*64 tmp := 63 dst[i+63:i] := 0 DO WHILE (tmp >= 0 AND a[i+tmp] == 0) tmp := tmp - 1 dst[i+63:i] := dst[i+63:i] + 1 OD ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512CD Bit Manipulation Count the number of leading zero bits in each packed 64-bit integer in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k[j] tmp := 63 dst[i+63:i] := 0 DO WHILE (tmp >= 0 AND a[i+tmp] == 0) tmp := tmp - 1 dst[i+63:i] := dst[i+63:i] + 1 OD ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512CD Bit Manipulation Count the number of leading zero bits in each packed 64-bit integer in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k[j] tmp := 63 dst[i+63:i] := 0 DO WHILE (tmp >= 0 AND a[i+tmp] == 0) tmp := tmp - 1 dst[i+63:i] := dst[i+63:i] + 1 OD ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:128] := 0
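The leading-zero-count loop above translates almost line for line into scalar code. The sketch below assumes one lane is passed as a non-negative Python int; the name `lzcnt` and the `width` parameter are choices made for this example.

```python
def lzcnt(x, width=32):
    """Count leading zero bits of one lane, following the DO WHILE loop.

    tmp walks down from the most significant bit until it finds a set bit
    or runs past bit 0, in which case the result equals the lane width.
    """
    x &= (1 << width) - 1
    tmp = width - 1
    count = 0
    while tmp >= 0 and ((x >> tmp) & 1) == 0:
        tmp -= 1
        count += 1
    return count


print([lzcnt(v) for v in (0, 1, 0x80000000)])  # prints [32, 31, 0]
print(lzcnt(1, width=64))                      # prints 63
```

For a lane value `x` already reduced to `width` bits, the result is the same as `width - x.bit_length()`, which is a convenient cross-check.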
Integer AVX512CD Swizzle Broadcast the low 8 bits from input mask "k" to all 64-bit elements of "dst". FOR j := 0 to 7 i := j*64 dst[i+63:i] := ZeroExtend64(k[7:0]) ENDFOR dst[MAX:512] := 0
Integer AVX512CD Swizzle Broadcast the low 16 bits from input mask "k" to all 32-bit elements of "dst". FOR j := 0 to 15 i := j*32 dst[i+31:i] := ZeroExtend32(k[15:0]) ENDFOR dst[MAX:512] := 0
Integer AVX512CD Compare Test each 32-bit element of "a" for equality with all other elements in "a" closer to the least significant bit. Each element's comparison forms a zero extended bit vector in "dst". FOR j := 0 to 15 i := j*32 FOR k := 0 to j-1 m := k*32 dst[i+k] := (a[i+31:i] == a[m+31:m]) ? 1 : 0 ENDFOR dst[i+31:i+j] := 0 ENDFOR dst[MAX:512] := 0
Integer AVX512CD Compare Test each 32-bit element of "a" for equality with all other elements in "a" closer to the least significant bit using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). Each element's comparison forms a zero extended bit vector in "dst". FOR j := 0 to 15 i := j*32 IF k[j] FOR l := 0 to j-1 m := l*32 dst[i+l] := (a[i+31:i] == a[m+31:m]) ? 1 : 0 ENDFOR dst[i+31:i+j] := 0 ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512CD Compare Test each 32-bit element of "a" for equality with all other elements in "a" closer to the least significant bit using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). Each element's comparison forms a zero extended bit vector in "dst". FOR j := 0 to 15 i := j*32 IF k[j] FOR l := 0 to j-1 m := l*32 dst[i+l] := (a[i+31:i] == a[m+31:m]) ? 1 : 0 ENDFOR dst[i+31:i+j] := 0 ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512CD Compare Test each 64-bit element of "a" for equality with all other elements in "a" closer to the least significant bit. Each element's comparison forms a zero extended bit vector in "dst". FOR j := 0 to 7 i := j*64 FOR k := 0 to j-1 m := k*64 dst[i+k] := (a[i+63:i] == a[m+63:m]) ? 1 : 0 ENDFOR dst[i+63:i+j] := 0 ENDFOR dst[MAX:512] := 0
Integer AVX512CD Compare Test each 64-bit element of "a" for equality with all other elements in "a" closer to the least significant bit using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). Each element's comparison forms a zero extended bit vector in "dst". FOR j := 0 to 7 i := j*64 IF k[j] FOR l := 0 to j-1 m := l*64 dst[i+l] := (a[i+63:i] == a[m+63:m]) ? 1 : 0 ENDFOR dst[i+63:i+j] := 0 ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512CD Compare Test each 64-bit element of "a" for equality with all other elements in "a" closer to the least significant bit using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). Each element's comparison forms a zero extended bit vector in "dst". FOR j := 0 to 7 i := j*64 IF k[j] FOR l := 0 to j-1 m := l*64 dst[i+l] := (a[i+63:i] == a[m+63:m]) ? 1 : 0 ENDFOR dst[i+63:i+j] := 0 ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512CD Bit Manipulation Count the number of leading zero bits in each packed 32-bit integer in "a", and store the results in "dst". FOR j := 0 to 15 i := j*32 tmp := 31 dst[i+31:i] := 0 DO WHILE (tmp >= 0 AND a[i+tmp] == 0) tmp := tmp - 1 dst[i+31:i] := dst[i+31:i] + 1 OD ENDFOR dst[MAX:512] := 0
Integer AVX512CD Bit Manipulation Count the number of leading zero bits in each packed 32-bit integer in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] tmp := 31 dst[i+31:i] := 0 DO WHILE (tmp >= 0 AND a[i+tmp] == 0) tmp := tmp - 1 dst[i+31:i] := dst[i+31:i] + 1 OD ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512CD Bit Manipulation Count the number of leading zero bits in each packed 32-bit integer in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] tmp := 31 dst[i+31:i] := 0 DO WHILE (tmp >= 0 AND a[i+tmp] == 0) tmp := tmp - 1 dst[i+31:i] := dst[i+31:i] + 1 OD ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512CD Bit Manipulation Count the number of leading zero bits in each packed 64-bit integer in "a", and store the results in "dst". FOR j := 0 to 7 i := j*64 tmp := 63 dst[i+63:i] := 0 DO WHILE (tmp >= 0 AND a[i+tmp] == 0) tmp := tmp - 1 dst[i+63:i] := dst[i+63:i] + 1 OD ENDFOR dst[MAX:512] := 0
Integer AVX512CD Bit Manipulation Count the number of leading zero bits in each packed 64-bit integer in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] tmp := 63 dst[i+63:i] := 0 DO WHILE (tmp >= 0 AND a[i+tmp] == 0) tmp := tmp - 1 dst[i+63:i] := dst[i+63:i] + 1 OD ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512CD Bit Manipulation Count the number of leading zero bits in each packed 64-bit integer in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] tmp := 63 dst[i+63:i] := 0 DO WHILE (tmp >= 0 AND a[i+tmp] == 0) tmp := tmp - 1 dst[i+63:i] := dst[i+63:i] + 1 OD ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
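The writemask and zeromask variants above differ only in what an inactive lane receives. A small sketch makes the distinction concrete; `apply_mask` and `lzcnt64` are hypothetical helper names, and vectors are modelled as Python lists with lane 0 first.

```python
def lzcnt64(x):
    """Leading zero count of one 64-bit lane (64 when the lane is zero)."""
    return 64 - (x & ((1 << 64) - 1)).bit_length()


def apply_mask(op, a, k, src=None):
    """Apply op per lane where k[j] is set.

    With src given this models a writemask (inactive lanes copy src[j]);
    with src omitted it models a zeromask (inactive lanes become 0).
    """
    out = []
    for j, x in enumerate(a):
        if (k >> j) & 1:
            out.append(op(x))
        elif src is not None:
            out.append(src[j])
        else:
            out.append(0)
    return out


a = [1, 0, 1 << 63, 255, 0, 0, 0, 0]
k = 0b00001011  # lanes 0, 1 and 3 are active
print(apply_mask(lzcnt64, a, k, src=[99] * 8))  # prints [63, 64, 99, 56, 99, 99, 99, 99]
print(apply_mask(lzcnt64, a, k))                # prints [63, 64, 0, 56, 0, 0, 0, 0]
```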
Floating Point AVX512VL AVX512DQ Logical Compute the bitwise NOT of packed double-precision (64-bit) floating-point elements in "a" and then AND with "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := ((NOT a[i+63:i]) AND b[i+63:i]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512DQ Logical Compute the bitwise NOT of packed double-precision (64-bit) floating-point elements in "a" and then AND with "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := ((NOT a[i+63:i]) AND b[i+63:i]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512DQ Logical Compute the bitwise NOT of packed double-precision (64-bit) floating-point elements in "a" and then AND with "b", and store the results in "dst". FOR j := 0 to 7 i := j*64 dst[i+63:i] := ((NOT a[i+63:i]) AND b[i+63:i]) ENDFOR dst[MAX:512] := 0
Floating Point AVX512DQ Logical Compute the bitwise NOT of packed double-precision (64-bit) floating-point elements in "a" and then AND with "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := ((NOT a[i+63:i]) AND b[i+63:i]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512DQ Logical Compute the bitwise NOT of packed double-precision (64-bit) floating-point elements in "a" and then AND with "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := ((NOT a[i+63:i]) AND b[i+63:i]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512VL AVX512DQ Logical Compute the bitwise NOT of packed double-precision (64-bit) floating-point elements in "a" and then AND with "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := ((NOT a[i+63:i]) AND b[i+63:i]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512DQ Logical Compute the bitwise NOT of packed double-precision (64-bit) floating-point elements in "a" and then AND with "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := ((NOT a[i+63:i]) AND b[i+63:i]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512DQ Logical Compute the bitwise NOT of packed single-precision (32-bit) floating-point elements in "a" and then AND with "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := ((NOT a[i+31:i]) AND b[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512DQ Logical Compute the bitwise NOT of packed single-precision (32-bit) floating-point elements in "a" and then AND with "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := ((NOT a[i+31:i]) AND b[i+31:i]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512DQ Logical Compute the bitwise NOT of packed single-precision (32-bit) floating-point elements in "a" and then AND with "b", and store the results in "dst". FOR j := 0 to 15 i := j*32 dst[i+31:i] := ((NOT a[i+31:i]) AND b[i+31:i]) ENDFOR dst[MAX:512] := 0
Floating Point AVX512DQ Logical Compute the bitwise NOT of packed single-precision (32-bit) floating-point elements in "a" and then AND with "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := ((NOT a[i+31:i]) AND b[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512DQ Logical Compute the bitwise NOT of packed single-precision (32-bit) floating-point elements in "a" and then AND with "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := ((NOT a[i+31:i]) AND b[i+31:i]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512VL AVX512DQ Logical Compute the bitwise NOT of packed single-precision (32-bit) floating-point elements in "a" and then AND with "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := ((NOT a[i+31:i]) AND b[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512DQ Logical Compute the bitwise NOT of packed single-precision (32-bit) floating-point elements in "a" and then AND with "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := ((NOT a[i+31:i]) AND b[i+31:i]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:128] := 0
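The NOT-then-AND operation above works on the raw bit patterns of the floating-point lanes, not on their numeric values. The sketch below illustrates this for double-precision lanes using the standard `struct` module; the helper names are assumptions for the example, and the single-precision case is analogous with 32-bit patterns.

```python
import struct


def f64_bits(x):
    """Reinterpret a Python float as its 64-bit IEEE 754 pattern."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def bits_f64(b):
    """Reinterpret a 64-bit pattern as a Python float."""
    return struct.unpack("<d", struct.pack("<Q", b))[0]


def andnot_pd(a, b):
    """Per lane: dst := (NOT a) AND b, computed on bit patterns."""
    lane_mask = (1 << 64) - 1
    return [bits_f64((~f64_bits(x) & lane_mask) & f64_bits(y)) for x, y in zip(a, b)]


# -0.0 has only the sign bit set, so (NOT -0.0) AND b clears the sign of b.
print(andnot_pd([-0.0, -0.0], [-3.5, 2.0]))  # prints [3.5, 2.0]
```

Note the operand order: the first operand is the one that gets inverted.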
Floating Point AVX512VL AVX512DQ Logical Compute the bitwise AND of packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := (a[i+63:i] AND b[i+63:i]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512DQ Logical Compute the bitwise AND of packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := (a[i+63:i] AND b[i+63:i]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512DQ Logical Compute the bitwise AND of packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst". FOR j := 0 to 7 i := j*64 dst[i+63:i] := (a[i+63:i] AND b[i+63:i]) ENDFOR dst[MAX:512] := 0
Floating Point AVX512DQ Logical Compute the bitwise AND of packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := (a[i+63:i] AND b[i+63:i]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512DQ Logical Compute the bitwise AND of packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := (a[i+63:i] AND b[i+63:i]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512VL AVX512DQ Logical Compute the bitwise AND of packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := (a[i+63:i] AND b[i+63:i]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512DQ Logical Compute the bitwise AND of packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := (a[i+63:i] AND b[i+63:i]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512DQ Logical Compute the bitwise AND of packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := (a[i+31:i] AND b[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512DQ Logical Compute the bitwise AND of packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := (a[i+31:i] AND b[i+31:i]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512DQ Logical Compute the bitwise AND of packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst". FOR j := 0 to 15 i := j*32 dst[i+31:i] := (a[i+31:i] AND b[i+31:i]) ENDFOR dst[MAX:512] := 0
Floating Point AVX512DQ Logical Compute the bitwise AND of packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := (a[i+31:i] AND b[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512DQ Logical Compute the bitwise AND of packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := (a[i+31:i] AND b[i+31:i]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512VL AVX512DQ Logical Compute the bitwise AND of packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := (a[i+31:i] AND b[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512DQ Logical Compute the bitwise AND of packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := (a[i+31:i] AND b[i+31:i]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:128] := 0
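As an illustration of the zeromask AND above, the following sketch isolates the exponent field of single-precision lanes. Lanes are kept as raw 32-bit patterns so that mask constants need not be valid floats; `f32_bits` and `maskz_and_ps` are hypothetical names for this example.

```python
import struct


def f32_bits(x):
    """Reinterpret a value as its 32-bit IEEE 754 single-precision pattern."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def maskz_and_ps(k, a_bits, b_bits):
    """Per lane: (a AND b) where k[j] is set, 0 otherwise (zeromask)."""
    return [(x & y) if (k >> j) & 1 else 0
            for j, (x, y) in enumerate(zip(a_bits, b_bits))]


a = [f32_bits(v) for v in (8.0, 0.15625, -1.0, 2.0)]
exponent_mask = [0x7F800000] * 4           # bits 30:23 of a single-precision value
fields = maskz_and_ps(0b0111, a, exponent_mask)
print([f >> 23 for f in fields])           # prints [130, 124, 127, 0]
```

The first three results are biased exponents (subtract 127 to get 3, -3 and 0); the last lane is zero because its mask bit is clear.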
Floating Point AVX512VL AVX512DQ Miscellaneous Broadcast the lower 2 packed single-precision (32-bit) floating-point elements from "a" to all elements of "dst". FOR j := 0 to 7 i := j*32 n := (j % 2)*32 dst[i+31:i] := a[n+31:n] ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512DQ Miscellaneous Broadcast the lower 2 packed single-precision (32-bit) floating-point elements from "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 n := (j % 2)*32 IF k[j] dst[i+31:i] := a[n+31:n] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512DQ Miscellaneous Broadcast the lower 2 packed single-precision (32-bit) floating-point elements from "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 n := (j % 2)*32 IF k[j] dst[i+31:i] := a[n+31:n] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512DQ Miscellaneous Broadcast the lower 2 packed single-precision (32-bit) floating-point elements from "a" to all elements of "dst". FOR j := 0 to 15 i := j*32 n := (j % 2)*32 dst[i+31:i] := a[n+31:n] ENDFOR dst[MAX:512] := 0
Floating Point AVX512DQ Miscellaneous Broadcast the lower 2 packed single-precision (32-bit) floating-point elements from "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 n := (j % 2)*32 IF k[j] dst[i+31:i] := a[n+31:n] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512DQ Miscellaneous Broadcast the lower 2 packed single-precision (32-bit) floating-point elements from "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 n := (j % 2)*32 IF k[j] dst[i+31:i] := a[n+31:n] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512DQ Miscellaneous Broadcast the 8 packed single-precision (32-bit) floating-point elements from "a" to all elements of "dst". FOR j := 0 to 15 i := j*32 n := (j % 8)*32 dst[i+31:i] := a[n+31:n] ENDFOR dst[MAX:512] := 0
Floating Point AVX512DQ Miscellaneous Broadcast the 8 packed single-precision (32-bit) floating-point elements from "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 n := (j % 8)*32 IF k[j] dst[i+31:i] := a[n+31:n] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512DQ Miscellaneous Broadcast the 8 packed single-precision (32-bit) floating-point elements from "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 n := (j % 8)*32 IF k[j] dst[i+31:i] := a[n+31:n] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512VL AVX512DQ Miscellaneous Broadcast the 2 packed double-precision (64-bit) floating-point elements from "a" to all elements of "dst". FOR j := 0 to 3 i := j*64 n := (j % 2)*64 dst[i+63:i] := a[n+63:n] ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512DQ Miscellaneous Broadcast the 2 packed double-precision (64-bit) floating-point elements from "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 n := (j % 2)*64 IF k[j] dst[i+63:i] := a[n+63:n] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512DQ Miscellaneous Broadcast the 2 packed double-precision (64-bit) floating-point elements from "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 n := (j % 2)*64 IF k[j] dst[i+63:i] := a[n+63:n] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512DQ Miscellaneous Broadcast the 2 packed double-precision (64-bit) floating-point elements from "a" to all elements of "dst". FOR j := 0 to 7 i := j*64 n := (j % 2)*64 dst[i+63:i] := a[n+63:n] ENDFOR dst[MAX:512] := 0
Floating Point AVX512DQ Miscellaneous Broadcast the 2 packed double-precision (64-bit) floating-point elements from "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 n := (j % 2)*64 IF k[j] dst[i+63:i] := a[n+63:n] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512DQ Miscellaneous Broadcast the 2 packed double-precision (64-bit) floating-point elements from "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 n := (j % 2)*64 IF k[j] dst[i+63:i] := a[n+63:n] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
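All of the tuple broadcasts above share one indexing rule: destination lane j reads source lane j modulo the tuple length. A minimal sketch, assuming vectors are Python lists and using the hypothetical name `broadcast_tuple`:

```python
def broadcast_tuple(a, tuple_len, lanes):
    """dst[j] := a[j % tuple_len], i.e. repeat the low tuple_len lanes of a."""
    return [a[j % tuple_len] for j in range(lanes)]


# Lower 2 single-precision elements of a 4-element source, 8 destination lanes.
print(broadcast_tuple([1.0, 2.0, 3.0, 4.0], 2, 8))
# prints [1.0, 2.0, 1.0, 2.0, 1.0, 2.0, 1.0, 2.0]

# 2 double-precision elements repeated into 4 lanes.
print(broadcast_tuple([1.5, 2.5], 2, 4))  # prints [1.5, 2.5, 1.5, 2.5]
```

With `tuple_len=8` and `lanes=16` the same function models the 8-element broadcast, where the whole 256-bit source is copied into both halves of the 512-bit destination.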
AVX512VL AVX512DQ Miscellaneous Broadcast the lower 2 packed 32-bit integers from "a" to all elements of "dst". FOR j := 0 to 7 i := j*32 n := (j % 2)*32 dst[i+31:i] := a[n+31:n] ENDFOR dst[MAX:256] := 0
AVX512VL AVX512DQ Miscellaneous Broadcast the lower 2 packed 32-bit integers from "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 n := (j % 2)*32 IF k[j] dst[i+31:i] := a[n+31:n] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:256] := 0
AVX512VL AVX512DQ Miscellaneous Broadcast the lower 2 packed 32-bit integers from "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 n := (j % 2)*32 IF k[j] dst[i+31:i] := a[n+31:n] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:256] := 0
AVX512DQ Miscellaneous Broadcast the lower 2 packed 32-bit integers from "a" to all elements of "dst". FOR j := 0 to 15 i := j*32 n := (j % 2)*32 dst[i+31:i] := a[n+31:n] ENDFOR dst[MAX:512] := 0
AVX512DQ Miscellaneous Broadcast the lower 2 packed 32-bit integers from "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 n := (j % 2)*32 IF k[j] dst[i+31:i] := a[n+31:n] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
AVX512DQ Miscellaneous Broadcast the lower 2 packed 32-bit integers from "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 n := (j % 2)*32 IF k[j] dst[i+31:i] := a[n+31:n] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
AVX512VL AVX512DQ Miscellaneous Broadcast the lower 2 packed 32-bit integers from "a" to all elements of "dst". FOR j := 0 to 3 i := j*32 n := (j % 2)*32 dst[i+31:i] := a[n+31:n] ENDFOR dst[MAX:128] := 0
AVX512VL AVX512DQ Miscellaneous Broadcast the lower 2 packed 32-bit integers from "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 n := (j % 2)*32 IF k[j] dst[i+31:i] := a[n+31:n] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:128] := 0
AVX512VL AVX512DQ Miscellaneous Broadcast the lower 2 packed 32-bit integers from "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 n := (j % 2)*32 IF k[j] dst[i+31:i] := a[n+31:n] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:128] := 0
AVX512DQ Miscellaneous Broadcast the 8 packed 32-bit integers from "a" to all elements of "dst". FOR j := 0 to 15 i := j*32 n := (j % 8)*32 dst[i+31:i] := a[n+31:n] ENDFOR dst[MAX:512] := 0
AVX512DQ Miscellaneous Broadcast the 8 packed 32-bit integers from "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 n := (j % 8)*32 IF k[j] dst[i+31:i] := a[n+31:n] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
AVX512DQ Miscellaneous Broadcast the 8 packed 32-bit integers from "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 n := (j % 8)*32 IF k[j] dst[i+31:i] := a[n+31:n] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
AVX512VL AVX512DQ Miscellaneous Broadcast the 2 packed 64-bit integers from "a" to all elements of "dst". FOR j := 0 to 3 i := j*64 n := (j % 2)*64 dst[i+63:i] := a[n+63:n] ENDFOR dst[MAX:256] := 0
AVX512VL AVX512DQ Miscellaneous Broadcast the 2 packed 64-bit integers from "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 n := (j % 2)*64 IF k[j] dst[i+63:i] := a[n+63:n] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:256] := 0
AVX512VL AVX512DQ Miscellaneous Broadcast the 2 packed 64-bit integers from "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 n := (j % 2)*64 IF k[j] dst[i+63:i] := a[n+63:n] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:256] := 0
AVX512DQ Miscellaneous Broadcast the 2 packed 64-bit integers from "a" to all elements of "dst". FOR j := 0 to 7 i := j*64 n := (j % 2)*64 dst[i+63:i] := a[n+63:n] ENDFOR dst[MAX:512] := 0
AVX512DQ Miscellaneous Broadcast the 2 packed 64-bit integers from "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 n := (j % 2)*64 IF k[j] dst[i+63:i] := a[n+63:n] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
AVX512DQ Miscellaneous Broadcast the 2 packed 64-bit integers from "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 n := (j % 2)*64 IF k[j] dst[i+63:i] := a[n+63:n] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point Integer AVX512VL AVX512DQ Convert Convert packed double-precision (64-bit) floating-point elements in "a" to packed 64-bit integers, and store the results in "dst". FOR j := 0 to 3 i := j*64 dst[i+63:i] := Convert_FP64_To_Int64(a[i+63:i]) ENDFOR dst[MAX:256] := 0
Floating Point Integer AVX512VL AVX512DQ Convert Convert packed double-precision (64-bit) floating-point elements in "a" to packed 64-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := Convert_FP64_To_Int64(a[i+63:i]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:256] := 0
Floating Point Integer AVX512VL AVX512DQ Convert Convert packed double-precision (64-bit) floating-point elements in "a" to packed 64-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := Convert_FP64_To_Int64(a[i+63:i]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:256] := 0
Floating Point Integer AVX512DQ Convert Convert packed double-precision (64-bit) floating-point elements in "a" to packed 64-bit integers, and store the results in "dst". [round_note] FOR j := 0 to 7 i := j*64 dst[i+63:i] := Convert_FP64_To_Int64(a[i+63:i]) ENDFOR dst[MAX:512] := 0
Floating Point Integer AVX512DQ Convert Convert packed double-precision (64-bit) floating-point elements in "a" to packed 64-bit integers, and store the results in "dst". FOR j := 0 to 7 i := j*64 dst[i+63:i] := Convert_FP64_To_Int64(a[i+63:i]) ENDFOR dst[MAX:512] := 0
Floating Point Integer AVX512DQ Convert Convert packed double-precision (64-bit) floating-point elements in "a" to packed 64-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [round_note] FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := Convert_FP64_To_Int64(a[i+63:i]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point Integer AVX512DQ Convert Convert packed double-precision (64-bit) floating-point elements in "a" to packed 64-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := Convert_FP64_To_Int64(a[i+63:i]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point Integer AVX512DQ Convert Convert packed double-precision (64-bit) floating-point elements in "a" to packed 64-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [round_note] FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := Convert_FP64_To_Int64(a[i+63:i]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point Integer AVX512DQ Convert Convert packed double-precision (64-bit) floating-point elements in "a" to packed 64-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := Convert_FP64_To_Int64(a[i+63:i]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point Integer AVX512VL AVX512DQ Convert Convert packed double-precision (64-bit) floating-point elements in "a" to packed 64-bit integers, and store the results in "dst". FOR j := 0 to 1 i := j*64 dst[i+63:i] := Convert_FP64_To_Int64(a[i+63:i]) ENDFOR dst[MAX:128] := 0
Floating Point Integer AVX512VL AVX512DQ Convert Convert packed double-precision (64-bit) floating-point elements in "a" to packed 64-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := Convert_FP64_To_Int64(a[i+63:i]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:128] := 0
Floating Point Integer AVX512VL AVX512DQ Convert Convert packed double-precision (64-bit) floating-point elements in "a" to packed 64-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := Convert_FP64_To_Int64(a[i+63:i]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:128] := 0
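The double-to-signed-64-bit entries above can be sketched in scalar C as follows. This sketch assumes the default round-to-nearest-even mode and inputs that fit in a signed 64-bit integer; NaN and out-of-range handling, and the explicit rounding override of the [round_note] forms, are not modelled. The names `round_half_even_i64` and `cvtpd_epi64_model` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round to nearest, ties to even. Assumes x is finite and fits in int64_t. */
static int64_t round_half_even_i64(double x)
{
    int64_t t = (int64_t)x;        /* C cast truncates toward zero */
    double frac = x - (double)t;   /* remaining fraction, |frac| < 1 */
    if (frac > 0.5 || (frac == 0.5 && (t & 1)))
        return t + 1;
    if (frac < -0.5 || (frac == -0.5 && (t & 1)))
        return t - 1;
    return t;
}

/* Hypothetical scalar model of the masked conversion pseudocode.
 * lanes is 2, 4 or 8 for the 128-, 256- and 512-bit forms.
 * src == NULL models the zeromask form; otherwise the writemask form.
 * Pass k = 0xFF for the unmasked form. */
static void cvtpd_epi64_model(int64_t *dst, const int64_t *src,
                              uint8_t k, const double *a, size_t lanes)
{
    assert(lanes == 2 || lanes == 4 || lanes == 8);
    for (size_t j = 0; j < lanes; j++) {
        if ((k >> j) & 1u)
            dst[j] = round_half_even_i64(a[j]);
        else
            dst[j] = src ? src[j] : 0;
    }
}
```

Under these assumptions 2.5 converts to 2 and 3.5 converts to 4, since ties go to the even neighbour.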
Floating Point Integer AVX512VL AVX512DQ Convert Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 64-bit integers, and store the results in "dst". FOR j := 0 to 3 i := j*64 dst[i+63:i] := Convert_FP64_To_UInt64(a[i+63:i]) ENDFOR dst[MAX:256] := 0
Floating Point Integer AVX512VL AVX512DQ Convert Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 64-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := Convert_FP64_To_UInt64(a[i+63:i]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:256] := 0
Floating Point Integer AVX512VL AVX512DQ Convert Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 64-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := Convert_FP64_To_UInt64(a[i+63:i]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:256] := 0
Floating Point Integer AVX512DQ Convert Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 64-bit integers, and store the results in "dst". [round_note] FOR j := 0 to 7 i := j*64 dst[i+63:i] := Convert_FP64_To_UInt64(a[i+63:i]) ENDFOR dst[MAX:512] := 0
Floating Point Integer AVX512DQ Convert Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 64-bit integers, and store the results in "dst". FOR j := 0 to 7 i := j*64 dst[i+63:i] := Convert_FP64_To_UInt64(a[i+63:i]) ENDFOR dst[MAX:512] := 0
Floating Point Integer AVX512DQ Convert Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 64-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [round_note] FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := Convert_FP64_To_UInt64(a[i+63:i]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point Integer AVX512DQ Convert Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 64-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := Convert_FP64_To_UInt64(a[i+63:i]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point Integer AVX512DQ Convert Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 64-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [round_note] FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := Convert_FP64_To_UInt64(a[i+63:i]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point Integer AVX512DQ Convert Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 64-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := Convert_FP64_To_UInt64(a[i+63:i]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point Integer AVX512VL AVX512DQ Convert Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 64-bit integers, and store the results in "dst". FOR j := 0 to 1 i := j*64 dst[i+63:i] := Convert_FP64_To_UInt64(a[i+63:i]) ENDFOR dst[MAX:128] := 0
Floating Point Integer AVX512VL AVX512DQ Convert Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 64-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := Convert_FP64_To_UInt64(a[i+63:i]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:128] := 0
Floating Point Integer AVX512VL AVX512DQ Convert Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 64-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := Convert_FP64_To_UInt64(a[i+63:i]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:128] := 0
Floating Point Integer AVX512VL AVX512DQ Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed 64-bit integers, and store the results in "dst". FOR j := 0 to 3 i := j*64 l := j*32 dst[i+63:i] := Convert_FP32_To_Int64(a[l+31:l]) ENDFOR dst[MAX:256] := 0
Floating Point Integer AVX512VL AVX512DQ Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed 64-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 l := j*32 IF k[j] dst[i+63:i] := Convert_FP32_To_Int64(a[l+31:l]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:256] := 0
Floating Point Integer AVX512VL AVX512DQ Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed 64-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 l := j*32 IF k[j] dst[i+63:i] := Convert_FP32_To_Int64(a[l+31:l]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:256] := 0
Floating Point Integer AVX512DQ Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed 64-bit integers, and store the results in "dst". [round_note] FOR j := 0 to 7 i := j*64 l := j*32 dst[i+63:i] := Convert_FP32_To_Int64(a[l+31:l]) ENDFOR dst[MAX:512] := 0
Floating Point Integer AVX512DQ Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed 64-bit integers, and store the results in "dst". FOR j := 0 to 7 i := j*64 l := j*32 dst[i+63:i] := Convert_FP32_To_Int64(a[l+31:l]) ENDFOR dst[MAX:512] := 0
Floating Point Integer AVX512DQ Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed 64-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [round_note] FOR j := 0 to 7 i := j*64 l := j*32 IF k[j] dst[i+63:i] := Convert_FP32_To_Int64(a[l+31:l]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point Integer AVX512DQ Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed 64-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 l := j*32 IF k[j] dst[i+63:i] := Convert_FP32_To_Int64(a[l+31:l]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point Integer AVX512DQ Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed 64-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [round_note] FOR j := 0 to 7 i := j*64 l := j*32 IF k[j] dst[i+63:i] := Convert_FP32_To_Int64(a[l+31:l]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point Integer AVX512DQ Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed 64-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 l := j*32 IF k[j] dst[i+63:i] := Convert_FP32_To_Int64(a[l+31:l]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point Integer AVX512VL AVX512DQ Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed 64-bit integers, and store the results in "dst". FOR j := 0 to 1 i := j*64 l := j*32 dst[i+63:i] := Convert_FP32_To_Int64(a[l+31:l]) ENDFOR dst[MAX:128] := 0
Floating Point Integer AVX512VL AVX512DQ Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed 64-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 l := j*32 IF k[j] dst[i+63:i] := Convert_FP32_To_Int64(a[l+31:l]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:128] := 0
Floating Point Integer AVX512VL AVX512DQ Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed 64-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 l := j*32 IF k[j] dst[i+63:i] := Convert_FP32_To_Int64(a[l+31:l]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:128] := 0
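In the single-precision entries above, `i := j*64` indexes the destination and `l := j*32` indexes the source, so result lane j always reads source lane j and only the low lanes of "a" are consumed. The sketch below models that lane layout only. It assumes integral, in-range inputs, for which the C cast is exact, and does not model rounding of fractional values. The name `cvtps_epi64_layout_model` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout-only model of the widening conversion.
 * lanes is 2, 4 or 8 for the 128-, 256- and 512-bit destination forms.
 * Only a[0..lanes-1] is read, whatever the size of the source register. */
static void cvtps_epi64_layout_model(int64_t *dst, const float *a,
                                     size_t lanes)
{
    assert(lanes == 2 || lanes == 4 || lanes == 8);
    for (size_t j = 0; j < lanes; j++) {
        /* dst bits [j*64+63 : j*64] come from a bits [j*32+31 : j*32].
         * The cast is exact only because inputs are assumed integral. */
        dst[j] = (int64_t)a[j];
    }
}
```

For the 128-bit form, a four-float source contributes only its two lowest elements; the upper two are ignored.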
Floating Point Integer AVX512VL AVX512DQ Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 64-bit integers, and store the results in "dst". FOR j := 0 to 3 i := j*64 l := j*32 dst[i+63:i] := Convert_FP32_To_UInt64(a[l+31:l]) ENDFOR dst[MAX:256] := 0
Floating Point Integer AVX512VL AVX512DQ Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 64-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 l := j*32 IF k[j] dst[i+63:i] := Convert_FP32_To_UInt64(a[l+31:l]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:256] := 0
Floating Point Integer AVX512VL AVX512DQ Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 64-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 l := j*32 IF k[j] dst[i+63:i] := Convert_FP32_To_UInt64(a[l+31:l]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:256] := 0
Floating Point Integer AVX512DQ Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 64-bit integers, and store the results in "dst". [round_note] FOR j := 0 to 7 i := j*64 l := j*32 dst[i+63:i] := Convert_FP32_To_UInt64(a[l+31:l]) ENDFOR dst[MAX:512] := 0
Floating Point Integer AVX512DQ Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 64-bit integers, and store the results in "dst". FOR j := 0 to 7 i := j*64 l := j*32 dst[i+63:i] := Convert_FP32_To_UInt64(a[l+31:l]) ENDFOR dst[MAX:512] := 0
Floating Point Integer AVX512DQ Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 64-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [round_note] FOR j := 0 to 7 i := j*64 l := j*32 IF k[j] dst[i+63:i] := Convert_FP32_To_UInt64(a[l+31:l]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point Integer AVX512DQ Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 64-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 l := j*32 IF k[j] dst[i+63:i] := Convert_FP32_To_UInt64(a[l+31:l]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point Integer AVX512DQ Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 64-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [round_note] FOR j := 0 to 7 i := j*64 l := j*32 IF k[j] dst[i+63:i] := Convert_FP32_To_UInt64(a[l+31:l]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point Integer AVX512DQ Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 64-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 l := j*32 IF k[j] dst[i+63:i] := Convert_FP32_To_UInt64(a[l+31:l]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point Integer AVX512VL AVX512DQ Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 64-bit integers, and store the results in "dst". FOR j := 0 to 1 i := j*64 l := j*32 dst[i+63:i] := Convert_FP32_To_UInt64(a[l+31:l]) ENDFOR dst[MAX:128] := 0
Floating Point Integer AVX512VL AVX512DQ Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 64-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 l := j*32 IF k[j] dst[i+63:i] := Convert_FP32_To_UInt64(a[l+31:l]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:128] := 0
Floating Point Integer AVX512VL AVX512DQ Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 64-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 l := j*32 IF k[j] dst[i+63:i] := Convert_FP32_To_UInt64(a[l+31:l]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:128] := 0
Floating Point Integer AVX512VL AVX512DQ Convert Convert packed signed 64-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst". FOR j := 0 to 3 i := j*64 dst[i+63:i] := Convert_Int64_To_FP64(a[i+63:i]) ENDFOR dst[MAX:256] := 0
Floating Point Integer AVX512VL AVX512DQ Convert Convert packed signed 64-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := Convert_Int64_To_FP64(a[i+63:i]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:256] := 0
Floating Point Integer AVX512VL AVX512DQ Convert Convert packed signed 64-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := Convert_Int64_To_FP64(a[i+63:i]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:256] := 0
Floating Point Integer AVX512DQ Convert Convert packed signed 64-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst". [round_note] FOR j := 0 to 7 i := j*64 dst[i+63:i] := Convert_Int64_To_FP64(a[i+63:i]) ENDFOR dst[MAX:512] := 0
Floating Point Integer AVX512DQ Convert Convert packed signed 64-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst". FOR j := 0 to 7 i := j*64 dst[i+63:i] := Convert_Int64_To_FP64(a[i+63:i]) ENDFOR dst[MAX:512] := 0
Floating Point Integer AVX512DQ Convert Convert packed signed 64-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [round_note] FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := Convert_Int64_To_FP64(a[i+63:i]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point Integer AVX512DQ Convert Convert packed signed 64-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := Convert_Int64_To_FP64(a[i+63:i]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point Integer AVX512DQ Convert Convert packed signed 64-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [round_note] FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := Convert_Int64_To_FP64(a[i+63:i]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point Integer AVX512DQ Convert Convert packed signed 64-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := Convert_Int64_To_FP64(a[i+63:i]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point Integer AVX512VL AVX512DQ Convert Convert packed signed 64-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst". FOR j := 0 to 1 i := j*64 dst[i+63:i] := Convert_Int64_To_FP64(a[i+63:i]) ENDFOR dst[MAX:128] := 0
Floating Point Integer AVX512VL AVX512DQ Convert Convert packed signed 64-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := Convert_Int64_To_FP64(a[i+63:i]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:128] := 0
Floating Point Integer AVX512VL AVX512DQ Convert Convert packed signed 64-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := Convert_Int64_To_FP64(a[i+63:i]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:128] := 0
Floating Point Integer AVX512VL AVX512DQ Convert Convert packed signed 64-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst". FOR j := 0 to 3 i := j*64 l := j*32 dst[l+31:l] := Convert_Int64_To_FP32(a[i+63:i]) ENDFOR dst[MAX:128] := 0
Floating Point Integer AVX512VL AVX512DQ Convert Convert packed signed 64-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 l := j*32 IF k[j] dst[l+31:l] := Convert_Int64_To_FP32(a[i+63:i]) ELSE dst[l+31:l] := src[l+31:l] FI ENDFOR dst[MAX:128] := 0
Floating Point Integer AVX512VL AVX512DQ Convert Convert packed signed 64-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 l := j*32 IF k[j] dst[l+31:l] := Convert_Int64_To_FP32(a[i+63:i]) ELSE dst[l+31:l] := 0 FI ENDFOR dst[MAX:128] := 0
Floating Point Integer AVX512DQ Convert Convert packed signed 64-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst". [round_note] FOR j := 0 to 7 i := j*64 l := j*32 dst[l+31:l] := Convert_Int64_To_FP32(a[i+63:i]) ENDFOR dst[MAX:256] := 0
Floating Point Integer AVX512DQ Convert Convert packed signed 64-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst". FOR j := 0 to 7 i := j*64 l := j*32 dst[l+31:l] := Convert_Int64_To_FP32(a[i+63:i]) ENDFOR dst[MAX:256] := 0
Floating Point Integer AVX512DQ Convert Convert packed signed 64-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [round_note] FOR j := 0 to 7 i := j*64 l := j*32 IF k[j] dst[l+31:l] := Convert_Int64_To_FP32(a[i+63:i]) ELSE dst[l+31:l] := src[l+31:l] FI ENDFOR dst[MAX:256] := 0
Floating Point Integer AVX512DQ Convert Convert packed signed 64-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 l := j*32 IF k[j] dst[l+31:l] := Convert_Int64_To_FP32(a[i+63:i]) ELSE dst[l+31:l] := src[l+31:l] FI ENDFOR dst[MAX:256] := 0
Floating Point Integer AVX512DQ Convert Convert packed signed 64-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [round_note] FOR j := 0 to 7 i := j*64 l := j*32 IF k[j] dst[l+31:l] := Convert_Int64_To_FP32(a[i+63:i]) ELSE dst[l+31:l] := 0 FI ENDFOR dst[MAX:256] := 0
Floating Point Integer AVX512DQ Convert Convert packed signed 64-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 l := j*32 IF k[j] dst[l+31:l] := Convert_Int64_To_FP32(a[i+63:i]) ELSE dst[l+31:l] := 0 FI ENDFOR dst[MAX:256] := 0
Floating Point Integer AVX512VL AVX512DQ Convert Convert packed signed 64-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst". FOR j := 0 to 1 i := j*64 l := j*32 dst[l+31:l] := Convert_Int64_To_FP32(a[i+63:i]) ENDFOR dst[MAX:64] := 0
Floating Point Integer AVX512VL AVX512DQ Convert Convert packed signed 64-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 l := j*32 IF k[j] dst[l+31:l] := Convert_Int64_To_FP32(a[i+63:i]) ELSE dst[l+31:l] := src[l+31:l] FI ENDFOR dst[MAX:64] := 0
Floating Point Integer AVX512VL AVX512DQ Convert Convert packed signed 64-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 l := j*32 IF k[j] dst[l+31:l] := Convert_Int64_To_FP32(a[i+63:i]) ELSE dst[l+31:l] := 0 FI ENDFOR dst[MAX:64] := 0
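The narrowing entries above produce results half as wide as their inputs, so the 128-bit form fills only the low 64 bits of the destination and zeroes the rest (`dst[MAX:64] := 0`). A minimal scalar sketch of that 128-bit form follows. The name `cvtepi64_ps_128_model` is hypothetical, and the rounding of inexact values is whatever the C implementation applies to an integer-to-float cast, which may not match every rounding mode of the instruction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of the 128-bit form: two int64 inputs give two
 * floats in lanes 0 and 1; float lanes 2 and 3 are always zeroed.
 * src == NULL models the zeromask form; otherwise the writemask form.
 * Pass k = 0x3 for the unmasked form. */
static void cvtepi64_ps_128_model(float dst[4], const float *src,
                                  uint8_t k, const int64_t a[2])
{
    assert(dst != NULL && a != NULL);
    for (size_t j = 0; j < 2; j++) {
        if ((k >> j) & 1u)
            dst[j] = (float)a[j];         /* exact for small magnitudes */
        else
            dst[j] = src ? src[j] : 0.0f;
    }
    dst[2] = 0.0f;                        /* dst[MAX:64] := 0 */
    dst[3] = 0.0f;
}
```

Note that the upper lanes are zeroed even in the writemask form; the mask selects only among the converted lanes.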
Floating Point Integer AVX512VL AVX512DQ Convert Convert packed double-precision (64-bit) floating-point elements in "a" to packed 64-bit integers with truncation, and store the results in "dst". FOR j := 0 to 3 i := j*64 dst[i+63:i] := Convert_FP64_To_Int64_Truncate(a[i+63:i]) ENDFOR dst[MAX:256] := 0
Floating Point Integer AVX512VL AVX512DQ Convert Convert packed double-precision (64-bit) floating-point elements in "a" to packed 64-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := Convert_FP64_To_Int64_Truncate(a[i+63:i]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:256] := 0
Floating Point Integer AVX512VL AVX512DQ Convert Convert packed double-precision (64-bit) floating-point elements in "a" to packed 64-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := Convert_FP64_To_Int64_Truncate(a[i+63:i]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:256] := 0
Floating Point Integer AVX512DQ Convert Convert packed double-precision (64-bit) floating-point elements in "a" to packed 64-bit integers with truncation, and store the results in "dst". [sae_note] FOR j := 0 to 7 i := j*64 dst[i+63:i] := Convert_FP64_To_Int64_Truncate(a[i+63:i]) ENDFOR dst[MAX:512] := 0
Floating Point Integer AVX512DQ Convert Convert packed double-precision (64-bit) floating-point elements in "a" to packed 64-bit integers with truncation, and store the results in "dst". FOR j := 0 to 7 i := j*64 dst[i+63:i] := Convert_FP64_To_Int64_Truncate(a[i+63:i]) ENDFOR dst[MAX:512] := 0
Floating Point Integer AVX512DQ Convert Convert packed double-precision (64-bit) floating-point elements in "a" to packed 64-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [sae_note] FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := Convert_FP64_To_Int64_Truncate(a[i+63:i]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point Integer AVX512DQ Convert Convert packed double-precision (64-bit) floating-point elements in "a" to packed 64-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := Convert_FP64_To_Int64_Truncate(a[i+63:i]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point Integer AVX512DQ Convert Convert packed double-precision (64-bit) floating-point elements in "a" to packed 64-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [sae_note] FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := Convert_FP64_To_Int64_Truncate(a[i+63:i]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point Integer AVX512DQ Convert Convert packed double-precision (64-bit) floating-point elements in "a" to packed 64-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := Convert_FP64_To_Int64_Truncate(a[i+63:i]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point Integer AVX512VL AVX512DQ Convert Convert packed double-precision (64-bit) floating-point elements in "a" to packed 64-bit integers with truncation, and store the results in "dst". FOR j := 0 to 1 i := j*64 dst[i+63:i] := Convert_FP64_To_Int64_Truncate(a[i+63:i]) ENDFOR dst[MAX:128] := 0
Floating Point Integer AVX512VL AVX512DQ Convert Convert packed double-precision (64-bit) floating-point elements in "a" to packed 64-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := Convert_FP64_To_Int64_Truncate(a[i+63:i]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:128] := 0
Floating Point Integer AVX512VL AVX512DQ Convert Convert packed double-precision (64-bit) floating-point elements in "a" to packed 64-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := Convert_FP64_To_Int64_Truncate(a[i+63:i]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:128] := 0
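The masked truncating conversions above differ only in what happens to lanes whose mask bit is clear. As a minimal sketch (not Intel's implementation), the following Python model works on a list of lane values. The function name and the `k`/`src` keyword arguments are hypothetical, and the model assumes every input is finite and fits in a signed 64-bit integer, since the pseudocode does not say what out-of-range inputs produce.

```python
import math


def cvttpd_epi64(a, k=None, src=None):
    """Scalar model of the FP64 -> Int64 truncating conversion.

    a   : list of Python floats, one per 64-bit lane
    k   : optional integer bitmask; bit j controls lane j
    src : optional list of fallback lanes (writemask form);
          when k is given and src is None, the zeromask form is modelled
    """
    dst = []
    for j, x in enumerate(a):
        if k is None or (k >> j) & 1:
            dst.append(math.trunc(x))  # truncation rounds toward zero
        elif src is not None:
            dst.append(src[j])         # writemask: copy the lane from src
        else:
            dst.append(0)              # zeromask: clear the lane
    return dst
```

Calling it with `k=0b0101` converts lanes 0 and 2 only; the remaining lanes come from `src` if it is supplied and are zero otherwise.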
Floating Point Integer AVX512VL AVX512DQ Convert Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 64-bit integers with truncation, and store the results in "dst". FOR j := 0 to 3 i := j*64 dst[i+63:i] := Convert_FP64_To_UInt64_Truncate(a[i+63:i]) ENDFOR dst[MAX:256] := 0
Floating Point Integer AVX512VL AVX512DQ Convert Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 64-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := Convert_FP64_To_UInt64_Truncate(a[i+63:i]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:256] := 0
Floating Point Integer AVX512VL AVX512DQ Convert Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 64-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := Convert_FP64_To_UInt64_Truncate(a[i+63:i]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:256] := 0
Floating Point Integer AVX512DQ Convert Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 64-bit integers with truncation, and store the results in "dst". [sae_note] FOR j := 0 to 7 i := j*64 dst[i+63:i] := Convert_FP64_To_UInt64_Truncate(a[i+63:i]) ENDFOR dst[MAX:512] := 0
Floating Point Integer AVX512DQ Convert Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 64-bit integers with truncation, and store the results in "dst". FOR j := 0 to 7 i := j*64 dst[i+63:i] := Convert_FP64_To_UInt64_Truncate(a[i+63:i]) ENDFOR dst[MAX:512] := 0
Floating Point Integer AVX512DQ Convert Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 64-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [sae_note] FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := Convert_FP64_To_UInt64_Truncate(a[i+63:i]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point Integer AVX512DQ Convert Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 64-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := Convert_FP64_To_UInt64_Truncate(a[i+63:i]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point Integer AVX512DQ Convert Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 64-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [sae_note] FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := Convert_FP64_To_UInt64_Truncate(a[i+63:i]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point Integer AVX512DQ Convert Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 64-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := Convert_FP64_To_UInt64_Truncate(a[i+63:i]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point Integer AVX512VL AVX512DQ Convert Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 64-bit integers with truncation, and store the results in "dst". FOR j := 0 to 1 i := j*64 dst[i+63:i] := Convert_FP64_To_UInt64_Truncate(a[i+63:i]) ENDFOR dst[MAX:128] := 0
Floating Point Integer AVX512VL AVX512DQ Convert Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 64-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := Convert_FP64_To_UInt64_Truncate(a[i+63:i]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:128] := 0
Floating Point Integer AVX512VL AVX512DQ Convert Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 64-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := Convert_FP64_To_UInt64_Truncate(a[i+63:i]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:128] := 0
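For the unsigned variants the valid input range shifts: values at or above 2^63 become representable, while anything that truncates below zero is not. The sketch below models a single-lane Convert_FP64_To_UInt64_Truncate in Python. The function name is hypothetical, and raising an exception for out-of-range inputs is a choice made for this illustration only; the pseudocode above does not specify the result in that case.

```python
import math

UINT64_MAX = 2**64 - 1


def fp64_to_uint64_truncate(x):
    """Truncate x toward zero and check that it fits in 64 unsigned bits."""
    # math.trunc raises ValueError for NaN and OverflowError for infinities.
    t = math.trunc(x)
    if not 0 <= t <= UINT64_MAX:
        raise OverflowError(f"{x!r} does not fit in an unsigned 64-bit integer")
    return t
```

Note that a small negative input such as -0.75 is still valid here, because truncation brings it to zero before the range check.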
Floating Point Integer AVX512VL AVX512DQ Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed 64-bit integers with truncation, and store the results in "dst". FOR j := 0 to 3 i := j*64 l := j*32 dst[i+63:i] := Convert_FP32_To_Int64_Truncate(a[l+31:l]) ENDFOR dst[MAX:256] := 0
Floating Point Integer AVX512VL AVX512DQ Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed 64-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 l := j*32 IF k[j] dst[i+63:i] := Convert_FP32_To_Int64_Truncate(a[l+31:l]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:256] := 0
Floating Point Integer AVX512VL AVX512DQ Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed 64-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 l := j*32 IF k[j] dst[i+63:i] := Convert_FP32_To_Int64_Truncate(a[l+31:l]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:256] := 0
Floating Point Integer AVX512DQ Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed 64-bit integers with truncation, and store the results in "dst". [sae_note] FOR j := 0 to 7 i := j*64 l := j*32 dst[i+63:i] := Convert_FP32_To_Int64_Truncate(a[l+31:l]) ENDFOR dst[MAX:512] := 0
Floating Point Integer AVX512DQ Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed 64-bit integers with truncation, and store the results in "dst". FOR j := 0 to 7 i := j*64 l := j*32 dst[i+63:i] := Convert_FP32_To_Int64_Truncate(a[l+31:l]) ENDFOR dst[MAX:512] := 0
Floating Point Integer AVX512DQ Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed 64-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [sae_note] FOR j := 0 to 7 i := j*64 l := j*32 IF k[j] dst[i+63:i] := Convert_FP32_To_Int64_Truncate(a[l+31:l]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point Integer AVX512DQ Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed 64-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 l := j*32 IF k[j] dst[i+63:i] := Convert_FP32_To_Int64_Truncate(a[l+31:l]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point Integer AVX512DQ Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed 64-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [sae_note] FOR j := 0 to 7 i := j*64 l := j*32 IF k[j] dst[i+63:i] := Convert_FP32_To_Int64_Truncate(a[l+31:l]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point Integer AVX512DQ Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed 64-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 l := j*32 IF k[j] dst[i+63:i] := Convert_FP32_To_Int64_Truncate(a[l+31:l]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point Integer AVX512VL AVX512DQ Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed 64-bit integers with truncation, and store the results in "dst". FOR j := 0 to 1 i := j*64 l := j*32 dst[i+63:i] := Convert_FP32_To_Int64_Truncate(a[l+31:l]) ENDFOR dst[MAX:128] := 0
Floating Point Integer AVX512VL AVX512DQ Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed 64-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 l := j*32 IF k[j] dst[i+63:i] := Convert_FP32_To_Int64_Truncate(a[l+31:l]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:128] := 0
Floating Point Integer AVX512VL AVX512DQ Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed 64-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 l := j*32 IF k[j] dst[i+63:i] := Convert_FP32_To_Int64_Truncate(a[l+31:l]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:128] := 0
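The single-precision forms widen each element, which is why the pseudocode keeps two indices: `l := j*32` for the source lane and `i := j*64` for the destination lane. A rough bit-level sketch in Python follows, with registers modelled as arbitrary-precision integers. The helper names `pack_ps` and `cvttps_epi64` are hypothetical, and inputs are assumed to be finite and in range for a signed 64-bit integer.

```python
import math
import struct

MASK64 = 2**64 - 1


def pack_ps(values):
    """Pack floats into an integer bit vector, lane 0 in the low 32 bits."""
    return int.from_bytes(struct.pack(f"<{len(values)}f", *values), "little")


def cvttps_epi64(a_bits, lanes):
    """Model of the unmasked FP32 -> Int64 truncating conversion."""
    dst = 0
    for j in range(lanes):
        i, l = j * 64, j * 32                      # destination / source offsets
        raw = (a_bits >> l) & 0xFFFFFFFF           # a[l+31:l]
        (x,) = struct.unpack("<f", raw.to_bytes(4, "little"))
        dst |= (math.trunc(x) & MASK64) << i       # two's-complement dst[i+63:i]
    return dst
```

With `lanes=2` (the 128-bit form) only the low 64 bits of the source are read, so the upper two floats of a 128-bit input do not affect the result.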
Floating Point Integer AVX512VL AVX512DQ Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 64-bit integers with truncation, and store the results in "dst". FOR j := 0 to 3 i := j*64 l := j*32 dst[i+63:i] := Convert_FP32_To_UInt64_Truncate(a[l+31:l]) ENDFOR dst[MAX:256] := 0
Floating Point Integer AVX512VL AVX512DQ Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 64-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 l := j*32 IF k[j] dst[i+63:i] := Convert_FP32_To_UInt64_Truncate(a[l+31:l]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:256] := 0
Floating Point Integer AVX512VL AVX512DQ Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 64-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 l := j*32 IF k[j] dst[i+63:i] := Convert_FP32_To_UInt64_Truncate(a[l+31:l]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:256] := 0
Floating Point Integer AVX512DQ Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 64-bit integers with truncation, and store the results in "dst". [sae_note] FOR j := 0 to 7 i := j*64 l := j*32 dst[i+63:i] := Convert_FP32_To_UInt64_Truncate(a[l+31:l]) ENDFOR dst[MAX:512] := 0
Floating Point Integer AVX512DQ Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 64-bit integers with truncation, and store the results in "dst". FOR j := 0 to 7 i := j*64 l := j*32 dst[i+63:i] := Convert_FP32_To_UInt64_Truncate(a[l+31:l]) ENDFOR dst[MAX:512] := 0
Floating Point Integer AVX512DQ Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 64-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [sae_note] FOR j := 0 to 7 i := j*64 l := j*32 IF k[j] dst[i+63:i] := Convert_FP32_To_UInt64_Truncate(a[l+31:l]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point Integer AVX512DQ Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 64-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 l := j*32 IF k[j] dst[i+63:i] := Convert_FP32_To_UInt64_Truncate(a[l+31:l]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point Integer AVX512DQ Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 64-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [sae_note] FOR j := 0 to 7 i := j*64 l := j*32 IF k[j] dst[i+63:i] := Convert_FP32_To_UInt64_Truncate(a[l+31:l]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point Integer AVX512DQ Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 64-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 l := j*32 IF k[j] dst[i+63:i] := Convert_FP32_To_UInt64_Truncate(a[l+31:l]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point Integer AVX512VL AVX512DQ Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 64-bit integers with truncation, and store the results in "dst". FOR j := 0 to 1 i := j*64 l := j*32 dst[i+63:i] := Convert_FP32_To_UInt64_Truncate(a[l+31:l]) ENDFOR dst[MAX:128] := 0
Floating Point Integer AVX512VL AVX512DQ Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 64-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 l := j*32 IF k[j] dst[i+63:i] := Convert_FP32_To_UInt64_Truncate(a[l+31:l]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:128] := 0
Floating Point Integer AVX512VL AVX512DQ Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 64-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 l := j*32 IF k[j] dst[i+63:i] := Convert_FP32_To_UInt64_Truncate(a[l+31:l]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:128] := 0
Floating Point Integer AVX512VL AVX512DQ Convert Convert packed unsigned 64-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst". FOR j := 0 to 3 i := j*64 dst[i+63:i] := Convert_Int64_To_FP64(a[i+63:i]) ENDFOR dst[MAX:256] := 0
Floating Point Integer AVX512VL AVX512DQ Convert Convert packed unsigned 64-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := Convert_Int64_To_FP64(a[i+63:i]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:256] := 0
Floating Point Integer AVX512VL AVX512DQ Convert Convert packed unsigned 64-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := Convert_Int64_To_FP64(a[i+63:i]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:256] := 0
Floating Point Integer AVX512DQ Convert Convert packed unsigned 64-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst". [round_note] FOR j := 0 to 7 i := j*64 dst[i+63:i] := Convert_Int64_To_FP64(a[i+63:i]) ENDFOR dst[MAX:512] := 0
Floating Point Integer AVX512DQ Convert Convert packed unsigned 64-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst". FOR j := 0 to 7 i := j*64 dst[i+63:i] := Convert_Int64_To_FP64(a[i+63:i]) ENDFOR dst[MAX:512] := 0
Floating Point Integer AVX512DQ Convert Convert packed unsigned 64-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [round_note] FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := Convert_Int64_To_FP64(a[i+63:i]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point Integer AVX512DQ Convert Convert packed unsigned 64-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := Convert_Int64_To_FP64(a[i+63:i]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point Integer AVX512DQ Convert Convert packed unsigned 64-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [round_note] FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := Convert_Int64_To_FP64(a[i+63:i]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point Integer AVX512DQ Convert Convert packed unsigned 64-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := Convert_Int64_To_FP64(a[i+63:i]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point Integer AVX512VL AVX512DQ Convert Convert packed unsigned 64-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst". FOR j := 0 to 1 i := j*64 dst[i+63:i] := Convert_Int64_To_FP64(a[i+63:i]) ENDFOR dst[MAX:128] := 0
Floating Point Integer AVX512VL AVX512DQ Convert Convert packed unsigned 64-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := Convert_Int64_To_FP64(a[i+63:i]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:128] := 0
Floating Point Integer AVX512VL AVX512DQ Convert Convert packed unsigned 64-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := Convert_Int64_To_FP64(a[i+63:i]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:128] := 0
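The descriptions above say the inputs are unsigned, although the pseudocode helper is named Convert_Int64_To_FP64. The sketch below follows the description and reinterprets each lane as unsigned before converting. It is only an illustration: the function name and arguments are hypothetical, and it assumes round-to-nearest-even, which is what Python's own integer-to-float conversion applies.

```python
MASK64 = 2**64 - 1


def cvtepu64_pd(a, k=None, src=None):
    """Scalar model of the unsigned 64-bit integer -> FP64 conversion.

    a   : list of Python ints, one per lane (signed views are accepted)
    k   : optional integer bitmask; bit j controls lane j
    src : optional fallback lanes (writemask form); zeromask if omitted
    """
    dst = []
    for j, bits in enumerate(a):
        if k is None or (k >> j) & 1:
            # Mask to 64 bits so that e.g. -1 is read as 2**64 - 1.
            dst.append(float(bits & MASK64))
        elif src is not None:
            dst.append(src[j])
        else:
            dst.append(0.0)
    return dst
```

Values above 2^53 cannot all be represented in double precision, so the conversion may round; an all-ones lane, for instance, becomes 2^64 rather than -1.0.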
Floating Point Integer AVX512VL AVX512DQ Convert Convert packed unsigned 64-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst". FOR j := 0 to 3 i := j*64 l := j*32 dst[l+31:l] := Convert_Int64_To_FP32(a[i+63:i]) ENDFOR dst[MAX:128] := 0
Floating Point Integer AVX512VL AVX512DQ Convert Convert packed unsigned 64-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 l := j*32 IF k[j] dst[l+31:l] := Convert_Int64_To_FP32(a[i+63:i]) ELSE dst[l+31:l] := src[l+31:l] FI ENDFOR dst[MAX:128] := 0
Floating Point Integer AVX512VL AVX512DQ Convert Convert packed unsigned 64-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 l := j*32 IF k[j] dst[l+31:l] := Convert_Int64_To_FP32(a[i+63:i]) ELSE dst[l+31:l] := 0 FI ENDFOR dst[MAX:128] := 0
Floating Point Integer AVX512DQ Convert Convert packed unsigned 64-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst". [round_note] FOR j := 0 to 7 i := j*64 l := j*32 dst[l+31:l] := Convert_Int64_To_FP32(a[i+63:i]) ENDFOR dst[MAX:256] := 0
Floating Point Integer AVX512DQ Convert Convert packed unsigned 64-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst". FOR j := 0 to 7 i := j*64 l := j*32 dst[l+31:l] := Convert_Int64_To_FP32(a[i+63:i]) ENDFOR dst[MAX:256] := 0
Floating Point Integer AVX512DQ Convert Convert packed unsigned 64-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [round_note] FOR j := 0 to 7 i := j*64 l := j*32 IF k[j] dst[l+31:l] := Convert_Int64_To_FP32(a[i+63:i]) ELSE dst[l+31:l] := src[l+31:l] FI ENDFOR dst[MAX:256] := 0
Floating Point Integer AVX512DQ Convert Convert packed unsigned 64-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 l := j*32 IF k[j] dst[l+31:l] := Convert_Int64_To_FP32(a[i+63:i]) ELSE dst[l+31:l] := src[l+31:l] FI ENDFOR dst[MAX:256] := 0
Floating Point Integer AVX512DQ Convert Convert packed unsigned 64-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [round_note] FOR j := 0 to 7 i := j*64 l := j*32 IF k[j] dst[l+31:l] := Convert_Int64_To_FP32(a[i+63:i]) ELSE dst[l+31:l] := 0 FI ENDFOR dst[MAX:256] := 0
Floating Point Integer AVX512DQ Convert Convert packed unsigned 64-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 l := j*32 IF k[j] dst[l+31:l] := Convert_Int64_To_FP32(a[i+63:i]) ELSE dst[l+31:l] := 0 FI ENDFOR dst[MAX:256] := 0
Floating Point Integer AVX512VL AVX512DQ Convert Convert packed unsigned 64-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst". FOR j := 0 to 1 i := j*64 l := j*32 dst[l+31:l] := Convert_Int64_To_FP32(a[i+63:i]) ENDFOR dst[MAX:64] := 0
Floating Point Integer AVX512VL AVX512DQ Convert Convert packed unsigned 64-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 l := j*32 IF k[j] dst[l+31:l] := Convert_Int64_To_FP32(a[i+63:i]) ELSE dst[l+31:l] := src[l+31:l] FI ENDFOR dst[MAX:64] := 0
Floating Point Integer AVX512VL AVX512DQ Convert Convert packed unsigned 64-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 l := j*32 IF k[j] dst[l+31:l] := Convert_Int64_To_FP32(a[i+63:i]) ELSE dst[l+31:l] := 0 FI ENDFOR dst[MAX:64] := 0
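Converting to single precision narrows each lane, so the results occupy only half the width of the source (note `dst[l+31:l]` and the smaller upper bound in the final zeroing statement). Python has no native 32-bit float, so the sketch below rounds to a 24-bit significand using integer arithmetic. It assumes round-to-nearest-even; the function names are hypothetical and this is not a statement about how the hardware is implemented.

```python
import struct


def uint64_to_fp32(x):
    """Round an unsigned 64-bit integer to the nearest FP32 value (ties to even)."""
    n = x.bit_length()
    if n <= 24:
        return float(x)                    # fits in the significand exactly
    shift = n - 24
    q = x >> shift                         # top 24 bits
    r = x & ((1 << shift) - 1)             # discarded low bits
    half = 1 << (shift - 1)
    if r > half or (r == half and q & 1):
        q += 1                             # round up, ties go to the even significand
    return float(q << shift)


def cvtepu64_ps(a):
    """Pack the converted lanes into a bit vector, 32 bits per result lane."""
    dst = 0
    for j, x in enumerate(a):
        l = j * 32
        dst |= int.from_bytes(struct.pack("<f", uint64_to_fp32(x)), "little") << l
    return dst
```

Two 64-bit inputs therefore produce only 64 bits of output, which matches the `dst[MAX:64] := 0` line in the 128-bit form.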
Floating Point AVX512DQ Miscellaneous Extract 256 bits (composed of 8 packed single-precision (32-bit) floating-point elements) from "a", selected with "imm8", and store the result in "dst". CASE imm8[0] OF 0: dst[255:0] := a[255:0] 1: dst[255:0] := a[511:256] ESAC dst[MAX:256] := 0
Floating Point AVX512DQ Miscellaneous Extract 256 bits (composed of 8 packed single-precision (32-bit) floating-point elements) from "a", selected with "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). CASE imm8[0] OF 0: tmp[255:0] := a[255:0] 1: tmp[255:0] := a[511:256] ESAC FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := tmp[i+31:i] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512DQ Miscellaneous Extract 256 bits (composed of 8 packed single-precision (32-bit) floating-point elements) from "a", selected with "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). CASE imm8[0] OF 0: tmp[255:0] := a[255:0] 1: tmp[255:0] := a[511:256] ESAC FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := tmp[i+31:i] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512DQ Miscellaneous Extract 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from "a", selected with "imm8", and store the result in "dst". CASE imm8[0] OF 0: dst[127:0] := a[127:0] 1: dst[127:0] := a[255:128] ESAC dst[MAX:128] := 0
Floating Point AVX512VL AVX512DQ Miscellaneous Extract 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from "a", selected with "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). CASE imm8[0] OF 0: tmp[127:0] := a[127:0] 1: tmp[127:0] := a[255:128] ESAC FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := tmp[i+63:i] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512DQ Miscellaneous Extract 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from "a", selected with "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). CASE imm8[0] OF 0: tmp[127:0] := a[127:0] 1: tmp[127:0] := a[255:128] ESAC FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := tmp[i+63:i] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512DQ Miscellaneous Extract 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from "a", selected with "imm8", and store the result in "dst". CASE imm8[1:0] OF 0: dst[127:0] := a[127:0] 1: dst[127:0] := a[255:128] 2: dst[127:0] := a[383:256] 3: dst[127:0] := a[511:384] ESAC dst[MAX:128] := 0
Floating Point AVX512DQ Miscellaneous Extract 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from "a", selected with "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). CASE imm8[1:0] OF 0: tmp[127:0] := a[127:0] 1: tmp[127:0] := a[255:128] 2: tmp[127:0] := a[383:256] 3: tmp[127:0] := a[511:384] ESAC FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := tmp[i+63:i] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512DQ Miscellaneous Extract 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from "a", selected with "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). CASE imm8[1:0] OF 0: tmp[127:0] := a[127:0] 1: tmp[127:0] := a[255:128] 2: tmp[127:0] := a[383:256] 3: tmp[127:0] := a[511:384] ESAC FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := tmp[i+63:i] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512DQ Miscellaneous Extract 256 bits (composed of 8 packed 32-bit integers) from "a", selected with "imm8", and store the result in "dst". CASE imm8[0] OF 0: dst[255:0] := a[255:0] 1: dst[255:0] := a[511:256] ESAC dst[MAX:256] := 0
Integer AVX512DQ Miscellaneous Extract 256 bits (composed of 8 packed 32-bit integers) from "a", selected with "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). CASE imm8[0] OF 0: tmp[255:0] := a[255:0] 1: tmp[255:0] := a[511:256] ESAC FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := tmp[i+31:i] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512DQ Miscellaneous Extract 256 bits (composed of 8 packed 32-bit integers) from "a", selected with "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). CASE imm8[0] OF 0: tmp[255:0] := a[255:0] 1: tmp[255:0] := a[511:256] ESAC FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := tmp[i+31:i] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512DQ Miscellaneous Extract 128 bits (composed of 2 packed 64-bit integers) from "a", selected with "imm8", and store the result in "dst". CASE imm8[0] OF 0: dst[127:0] := a[127:0] 1: dst[127:0] := a[255:128] ESAC dst[MAX:128] := 0
Integer AVX512VL AVX512DQ Miscellaneous Extract 128 bits (composed of 2 packed 64-bit integers) from "a", selected with "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). CASE imm8[0] OF 0: tmp[127:0] := a[127:0] 1: tmp[127:0] := a[255:128] ESAC FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := tmp[i+63:i] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512DQ Miscellaneous Extract 128 bits (composed of 2 packed 64-bit integers) from "a", selected with "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). CASE imm8[0] OF 0: tmp[127:0] := a[127:0] 1: tmp[127:0] := a[255:128] ESAC FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := tmp[i+63:i] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512DQ Miscellaneous Extract 128 bits (composed of 2 packed 64-bit integers) from "a", selected with "imm8", and store the result in "dst". CASE imm8[1:0] OF 0: dst[127:0] := a[127:0] 1: dst[127:0] := a[255:128] 2: dst[127:0] := a[383:256] 3: dst[127:0] := a[511:384] ESAC dst[MAX:128] := 0
Integer AVX512DQ Miscellaneous Extract 128 bits (composed of 2 packed 64-bit integers) from "a", selected with "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). CASE imm8[1:0] OF 0: tmp[127:0] := a[127:0] 1: tmp[127:0] := a[255:128] 2: tmp[127:0] := a[383:256] 3: tmp[127:0] := a[511:384] ESAC FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := tmp[i+63:i] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512DQ Miscellaneous Extract 128 bits (composed of 2 packed 64-bit integers) from "a", selected with "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). CASE imm8[1:0] OF 0: tmp[127:0] := a[127:0] 1: tmp[127:0] := a[255:128] 2: tmp[127:0] := a[383:256] 3: tmp[127:0] := a[511:384] ESAC FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := tmp[i+63:i] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:128] := 0
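The masked 128-bit extracts above can be sketched as a scalar model in portable C. This is an illustration of the pseudocode only, not the intrinsic itself: the function names `extract_i64x2_mask_model` and `extract_i64x2_maskz_model` and the array-based vector layout (element 0 is the lowest 64 bits) are assumptions made for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of the writemask form: pick the 128-bit lane named by
   imm8[1:0] from a 512-bit source (8 x 64-bit elements), then keep each
   extracted element only where the matching bit of k is set. */
static void extract_i64x2_mask_model(uint64_t dst[2], const uint64_t src[2],
                                     uint8_t k, const uint64_t a[8], int imm8)
{
    assert(imm8 >= 0 && imm8 <= 255);   /* imm8 is an 8-bit immediate */
    int lane = imm8 & 3;                /* CASE imm8[1:0] */
    for (int j = 0; j < 2; j++) {
        uint64_t tmp = a[lane * 2 + j];
        dst[j] = ((k >> j) & 1) ? tmp : src[j];   /* else copy from src */
    }
}

/* Scalar model of the zeromask form: unselected elements become 0. */
static void extract_i64x2_maskz_model(uint64_t dst[2], uint8_t k,
                                      const uint64_t a[8], int imm8)
{
    assert(imm8 >= 0 && imm8 <= 255);
    int lane = imm8 & 3;
    for (int j = 0; j < 2; j++)
        dst[j] = ((k >> j) & 1) ? a[lane * 2 + j] : 0;
}
```

Note that the mask bits index elements of the 128-bit result, not elements of the 512-bit source.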
Floating Point Mask AVX512VL AVX512DQ Miscellaneous Test packed double-precision (64-bit) floating-point elements in "a" for special categories specified by "imm8", and store the results in mask vector "k". [fpclass_note] FOR j := 0 to 3 i := j*64 k[j] := CheckFPClass_FP64(a[i+63:i], imm8[7:0]) ENDFOR k[MAX:4] := 0
Floating Point Mask AVX512VL AVX512DQ Miscellaneous Test packed double-precision (64-bit) floating-point elements in "a" for special categories specified by "imm8", and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). [fpclass_note] FOR j := 0 to 3 i := j*64 IF k1[j] k[j] := CheckFPClass_FP64(a[i+63:i], imm8[7:0]) ELSE k[j] := 0 FI ENDFOR k[MAX:4] := 0
Floating Point Mask AVX512DQ Miscellaneous Test packed double-precision (64-bit) floating-point elements in "a" for special categories specified by "imm8", and store the results in mask vector "k". [fpclass_note] FOR j := 0 to 7 i := j*64 k[j] := CheckFPClass_FP64(a[i+63:i], imm8[7:0]) ENDFOR k[MAX:8] := 0
Floating Point Mask AVX512DQ Miscellaneous Test packed double-precision (64-bit) floating-point elements in "a" for special categories specified by "imm8", and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). [fpclass_note] FOR j := 0 to 7 i := j*64 IF k1[j] k[j] := CheckFPClass_FP64(a[i+63:i], imm8[7:0]) ELSE k[j] := 0 FI ENDFOR k[MAX:8] := 0
Floating Point Mask AVX512VL AVX512DQ Miscellaneous Test packed double-precision (64-bit) floating-point elements in "a" for special categories specified by "imm8", and store the results in mask vector "k". [fpclass_note] FOR j := 0 to 1 i := j*64 k[j] := CheckFPClass_FP64(a[i+63:i], imm8[7:0]) ENDFOR k[MAX:2] := 0
Floating Point Mask AVX512VL AVX512DQ Miscellaneous Test packed double-precision (64-bit) floating-point elements in "a" for special categories specified by "imm8", and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). [fpclass_note] FOR j := 0 to 1 i := j*64 IF k1[j] k[j] := CheckFPClass_FP64(a[i+63:i], imm8[7:0]) ELSE k[j] := 0 FI ENDFOR k[MAX:2] := 0
Floating Point Mask AVX512VL AVX512DQ Miscellaneous Test packed single-precision (32-bit) floating-point elements in "a" for special categories specified by "imm8", and store the results in mask vector "k". [fpclass_note] FOR j := 0 to 7 i := j*32 k[j] := CheckFPClass_FP32(a[i+31:i], imm8[7:0]) ENDFOR k[MAX:8] := 0
Floating Point Mask AVX512VL AVX512DQ Miscellaneous Test packed single-precision (32-bit) floating-point elements in "a" for special categories specified by "imm8", and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). [fpclass_note] FOR j := 0 to 7 i := j*32 IF k1[j] k[j] := CheckFPClass_FP32(a[i+31:i], imm8[7:0]) ELSE k[j] := 0 FI ENDFOR k[MAX:8] := 0
Floating Point Mask AVX512DQ Miscellaneous Test packed single-precision (32-bit) floating-point elements in "a" for special categories specified by "imm8", and store the results in mask vector "k". [fpclass_note] FOR j := 0 to 15 i := j*32 k[j] := CheckFPClass_FP32(a[i+31:i], imm8[7:0]) ENDFOR k[MAX:16] := 0
Floating Point Mask AVX512DQ Miscellaneous Test packed single-precision (32-bit) floating-point elements in "a" for special categories specified by "imm8", and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). [fpclass_note] FOR j := 0 to 15 i := j*32 IF k1[j] k[j] := CheckFPClass_FP32(a[i+31:i], imm8[7:0]) ELSE k[j] := 0 FI ENDFOR k[MAX:16] := 0
Floating Point Mask AVX512VL AVX512DQ Miscellaneous Test packed single-precision (32-bit) floating-point elements in "a" for special categories specified by "imm8", and store the results in mask vector "k". [fpclass_note] FOR j := 0 to 3 i := j*32 k[j] := CheckFPClass_FP32(a[i+31:i], imm8[7:0]) ENDFOR k[MAX:4] := 0
Floating Point Mask AVX512VL AVX512DQ Miscellaneous Test packed single-precision (32-bit) floating-point elements in "a" for special categories specified by "imm8", and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). [fpclass_note] FOR j := 0 to 3 i := j*32 IF k1[j] k[j] := CheckFPClass_FP32(a[i+31:i], imm8[7:0]) ELSE k[j] := 0 FI ENDFOR k[MAX:4] := 0
Floating Point Mask AVX512DQ Miscellaneous Test the lower double-precision (64-bit) floating-point element in "a" for special categories specified by "imm8", and store the result in mask vector "k". [fpclass_note] k[0] := CheckFPClass_FP64(a[63:0], imm8[7:0]) k[MAX:1] := 0
Floating Point Mask AVX512DQ Miscellaneous Test the lower double-precision (64-bit) floating-point element in "a" for special categories specified by "imm8", and store the result in mask vector "k" using zeromask "k1" (the element is zeroed out when mask bit 0 is not set). [fpclass_note] IF k1[0] k[0] := CheckFPClass_FP64(a[63:0], imm8[7:0]) ELSE k[0] := 0 FI k[MAX:1] := 0
Floating Point Mask AVX512DQ Miscellaneous Test the lower single-precision (32-bit) floating-point element in "a" for special categories specified by "imm8", and store the result in mask vector "k". [fpclass_note] k[0] := CheckFPClass_FP32(a[31:0], imm8[7:0]) k[MAX:1] := 0
Floating Point Mask AVX512DQ Miscellaneous Test the lower single-precision (32-bit) floating-point element in "a" for special categories specified by "imm8", and store the result in mask vector "k" using zeromask "k1" (the element is zeroed out when mask bit 0 is not set). [fpclass_note] IF k1[0] k[0] := CheckFPClass_FP32(a[31:0], imm8[7:0]) ELSE k[0] := 0 FI k[MAX:1] := 0
Floating Point AVX512DQ Miscellaneous Copy "a" to "dst", then insert 256 bits (composed of 8 packed single-precision (32-bit) floating-point elements) from "b" into "dst" at the location specified by "imm8". dst[511:0] := a[511:0] CASE (imm8[0]) OF 0: dst[255:0] := b[255:0] 1: dst[511:256] := b[255:0] ESAC dst[MAX:512] := 0
Floating Point AVX512DQ Miscellaneous Copy "a" to "tmp", then insert 256 bits (composed of 8 packed single-precision (32-bit) floating-point elements) from "b" into "tmp" at the location specified by "imm8". Store "tmp" to "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). tmp[511:0] := a[511:0] CASE (imm8[0]) OF 0: tmp[255:0] := b[255:0] 1: tmp[511:256] := b[255:0] ESAC FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := tmp[i+31:i] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512DQ Miscellaneous Copy "a" to "tmp", then insert 256 bits (composed of 8 packed single-precision (32-bit) floating-point elements) from "b" into "tmp" at the location specified by "imm8". Store "tmp" to "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). tmp[511:0] := a[511:0] CASE (imm8[0]) OF 0: tmp[255:0] := b[255:0] 1: tmp[511:256] := b[255:0] ESAC FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := tmp[i+31:i] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512VL AVX512DQ Miscellaneous Copy "a" to "dst", then insert 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from "b" into "dst" at the location specified by "imm8". dst[255:0] := a[255:0] CASE imm8[0] OF 0: dst[127:0] := b[127:0] 1: dst[255:128] := b[127:0] ESAC dst[MAX:256] := 0
Floating Point AVX512VL AVX512DQ Miscellaneous Copy "a" to "tmp", then insert 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from "b" into "tmp" at the location specified by "imm8". Store "tmp" to "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). tmp[255:0] := a[255:0] CASE (imm8[0]) OF 0: tmp[127:0] := b[127:0] 1: tmp[255:128] := b[127:0] ESAC FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := tmp[i+63:i] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512DQ Miscellaneous Copy "a" to "tmp", then insert 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from "b" into "tmp" at the location specified by "imm8". Store "tmp" to "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). tmp[255:0] := a[255:0] CASE (imm8[0]) OF 0: tmp[127:0] := b[127:0] 1: tmp[255:128] := b[127:0] ESAC FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := tmp[i+63:i] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512DQ Miscellaneous Copy "a" to "dst", then insert 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from "b" into "dst" at the location specified by "imm8". dst[511:0] := a[511:0] CASE imm8[1:0] OF 0: dst[127:0] := b[127:0] 1: dst[255:128] := b[127:0] 2: dst[383:256] := b[127:0] 3: dst[511:384] := b[127:0] ESAC dst[MAX:512] := 0
Floating Point AVX512DQ Miscellaneous Copy "a" to "tmp", then insert 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from "b" into "tmp" at the location specified by "imm8". Store "tmp" to "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). tmp[511:0] := a[511:0] CASE (imm8[1:0]) OF 0: tmp[127:0] := b[127:0] 1: tmp[255:128] := b[127:0] 2: tmp[383:256] := b[127:0] 3: tmp[511:384] := b[127:0] ESAC FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := tmp[i+63:i] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512DQ Miscellaneous Copy "a" to "tmp", then insert 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from "b" into "tmp" at the location specified by "imm8". Store "tmp" to "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). tmp[511:0] := a[511:0] CASE (imm8[1:0]) OF 0: tmp[127:0] := b[127:0] 1: tmp[255:128] := b[127:0] 2: tmp[383:256] := b[127:0] 3: tmp[511:384] := b[127:0] ESAC FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := tmp[i+63:i] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
AVX512DQ Miscellaneous Copy "a" to "dst", then insert 256 bits (composed of 8 packed 32-bit integers) from "b" into "dst" at the location specified by "imm8". dst[511:0] := a[511:0] CASE imm8[0] OF 0: dst[255:0] := b[255:0] 1: dst[511:256] := b[255:0] ESAC dst[MAX:512] := 0
AVX512DQ Miscellaneous Copy "a" to "tmp", then insert 256 bits (composed of 8 packed 32-bit integers) from "b" into "tmp" at the location specified by "imm8". Store "tmp" to "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). tmp[511:0] := a[511:0] CASE (imm8[0]) OF 0: tmp[255:0] := b[255:0] 1: tmp[511:256] := b[255:0] ESAC FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := tmp[i+31:i] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
AVX512DQ Miscellaneous Copy "a" to "tmp", then insert 256 bits (composed of 8 packed 32-bit integers) from "b" into "tmp" at the location specified by "imm8". Store "tmp" to "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). tmp[511:0] := a[511:0] CASE (imm8[0]) OF 0: tmp[255:0] := b[255:0] 1: tmp[511:256] := b[255:0] ESAC FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := tmp[i+31:i] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
AVX512VL AVX512DQ Miscellaneous Copy "a" to "dst", then insert 128 bits (composed of 2 packed 64-bit integers) from "b" into "dst" at the location specified by "imm8". dst[255:0] := a[255:0] CASE imm8[0] OF 0: dst[127:0] := b[127:0] 1: dst[255:128] := b[127:0] ESAC dst[MAX:256] := 0
AVX512VL AVX512DQ Miscellaneous Copy "a" to "tmp", then insert 128 bits (composed of 2 packed 64-bit integers) from "b" into "tmp" at the location specified by "imm8". Store "tmp" to "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). tmp[255:0] := a[255:0] CASE (imm8[0]) OF 0: tmp[127:0] := b[127:0] 1: tmp[255:128] := b[127:0] ESAC FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := tmp[i+63:i] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:256] := 0
AVX512VL AVX512DQ Miscellaneous Copy "a" to "tmp", then insert 128 bits (composed of 2 packed 64-bit integers) from "b" into "tmp" at the location specified by "imm8". Store "tmp" to "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). tmp[255:0] := a[255:0] CASE (imm8[0]) OF 0: tmp[127:0] := b[127:0] 1: tmp[255:128] := b[127:0] ESAC FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := tmp[i+63:i] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:256] := 0
AVX512DQ Miscellaneous Copy "a" to "dst", then insert 128 bits (composed of 2 packed 64-bit integers) from "b" into "dst" at the location specified by "imm8". dst[511:0] := a[511:0] CASE imm8[1:0] OF 0: dst[127:0] := b[127:0] 1: dst[255:128] := b[127:0] 2: dst[383:256] := b[127:0] 3: dst[511:384] := b[127:0] ESAC dst[MAX:512] := 0
AVX512DQ Miscellaneous Copy "a" to "tmp", then insert 128 bits (composed of 2 packed 64-bit integers) from "b" into "tmp" at the location specified by "imm8". Store "tmp" to "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). tmp[511:0] := a[511:0] CASE (imm8[1:0]) OF 0: tmp[127:0] := b[127:0] 1: tmp[255:128] := b[127:0] 2: tmp[383:256] := b[127:0] 3: tmp[511:384] := b[127:0] ESAC FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := tmp[i+63:i] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
AVX512DQ Miscellaneous Copy "a" to "tmp", then insert 128 bits (composed of 2 packed 64-bit integers) from "b" into "tmp" at the location specified by "imm8". Store "tmp" to "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). tmp[511:0] := a[511:0] CASE (imm8[1:0]) OF 0: tmp[127:0] := b[127:0] 1: tmp[255:128] := b[127:0] 2: tmp[383:256] := b[127:0] 3: tmp[511:384] := b[127:0] ESAC FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := tmp[i+63:i] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
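A scalar sketch of the zeromasked 128-bit insert may help show the order of operations: the lane is written into a temporary copy of "a" first, and the mask is applied to the whole 512-bit result afterwards. The function name `insert_i64x2_maskz_model` and the array layout (element 0 is the lowest 64 bits) are assumptions made for this example, not part of the listing.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Scalar model: tmp := a, overwrite the 128-bit lane chosen by imm8[1:0]
   with b, then zero every 64-bit element whose mask bit is clear. */
static void insert_i64x2_maskz_model(uint64_t dst[8], uint8_t k,
                                     const uint64_t a[8], const uint64_t b[2],
                                     int imm8)
{
    assert(imm8 >= 0 && imm8 <= 255);   /* imm8 is an 8-bit immediate */
    uint64_t tmp[8];
    memcpy(tmp, a, sizeof tmp);         /* tmp[511:0] := a[511:0] */
    int lane = imm8 & 3;                /* CASE imm8[1:0] */
    tmp[lane * 2]     = b[0];
    tmp[lane * 2 + 1] = b[1];
    for (int j = 0; j < 8; j++)
        dst[j] = ((k >> j) & 1) ? tmp[j] : 0;
}
```

A clear mask bit therefore zeroes an element even when it was just inserted from "b".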
Floating Point AVX512VL AVX512DQ Logical Compute the bitwise OR of packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := a[i+63:i] OR b[i+63:i] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512DQ Logical Compute the bitwise OR of packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := a[i+63:i] OR b[i+63:i] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512DQ Logical Compute the bitwise OR of packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := a[i+63:i] OR b[i+63:i] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512DQ Logical Compute the bitwise OR of packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := a[i+63:i] OR b[i+63:i] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512DQ Logical Compute the bitwise OR of packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst". FOR j := 0 to 7 i := j*64 dst[i+63:i] := a[i+63:i] OR b[i+63:i] ENDFOR dst[MAX:512] := 0
Floating Point AVX512VL AVX512DQ Logical Compute the bitwise OR of packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := a[i+63:i] OR b[i+63:i] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512DQ Logical Compute the bitwise OR of packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := a[i+63:i] OR b[i+63:i] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512DQ Logical Compute the bitwise OR of packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := a[i+31:i] OR b[i+31:i] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512DQ Logical Compute the bitwise OR of packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := a[i+31:i] OR b[i+31:i] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512DQ Logical Compute the bitwise OR of packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := a[i+31:i] OR b[i+31:i] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512DQ Logical Compute the bitwise OR of packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := a[i+31:i] OR b[i+31:i] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512DQ Logical Compute the bitwise OR of packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst". FOR j := 0 to 15 i := j*32 dst[i+31:i] := a[i+31:i] OR b[i+31:i] ENDFOR dst[MAX:512] := 0
Floating Point AVX512VL AVX512DQ Logical Compute the bitwise OR of packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := a[i+31:i] OR b[i+31:i] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512DQ Logical Compute the bitwise OR of packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := a[i+31:i] OR b[i+31:i] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:128] := 0
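The OR entries above operate on the raw bit patterns of the floating-point elements, not on their numeric values. A minimal scalar sketch in portable C, assuming IEEE 754 binary32 `float` and using the hypothetical helper names `or_ps_lane` and `mask_or_ps_model`:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Bitwise OR of two single-precision values, done on their bit patterns. */
static float or_ps_lane(float a, float b)
{
    assert(sizeof(float) == sizeof(uint32_t));  /* binary32 assumed */
    uint32_t ua, ub, ur;
    float r;
    memcpy(&ua, &a, sizeof ua);
    memcpy(&ub, &b, sizeof ub);
    ur = ua | ub;
    memcpy(&r, &ur, sizeof r);
    return r;
}

/* Scalar model of the 128-bit writemask form (4 x 32-bit elements). */
static void mask_or_ps_model(float dst[4], const float src[4], uint8_t k,
                             const float a[4], const float b[4])
{
    for (int j = 0; j < 4; j++)
        dst[j] = ((k >> j) & 1) ? or_ps_lane(a[j], b[j]) : src[j];
}
```

One common use of this pattern is OR-ing with -0.0 (only the sign bit set), which forces the sign of each selected element to negative without touching the magnitude.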
Integer Mask AVX512VL AVX512DQ Miscellaneous Set each bit of mask register "k" based on the most significant bit of the corresponding packed 32-bit integer in "a". FOR j := 0 to 7 i := j*32 IF a[i+31] k[j] := 1 ELSE k[j] := 0 FI ENDFOR k[MAX:8] := 0
Integer Mask AVX512DQ Miscellaneous Set each bit of mask register "k" based on the most significant bit of the corresponding packed 32-bit integer in "a". FOR j := 0 to 15 i := j*32 IF a[i+31] k[j] := 1 ELSE k[j] := 0 FI ENDFOR k[MAX:16] := 0
Integer Mask AVX512VL AVX512DQ Miscellaneous Set each bit of mask register "k" based on the most significant bit of the corresponding packed 32-bit integer in "a". FOR j := 0 to 3 i := j*32 IF a[i+31] k[j] := 1 ELSE k[j] := 0 FI ENDFOR k[MAX:4] := 0
Integer AVX512VL AVX512DQ Miscellaneous Set each packed 32-bit integer in "dst" to all ones or all zeros based on the value of the corresponding bit in "k". FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := 0xFFFFFFFF ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512DQ Miscellaneous Set each packed 32-bit integer in "dst" to all ones or all zeros based on the value of the corresponding bit in "k". FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := 0xFFFFFFFF ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512VL AVX512DQ Miscellaneous Set each packed 32-bit integer in "dst" to all ones or all zeros based on the value of the corresponding bit in "k". FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := 0xFFFFFFFF ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512DQ Miscellaneous Set each packed 64-bit integer in "dst" to all ones or all zeros based on the value of the corresponding bit in "k". FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := 0xFFFFFFFFFFFFFFFF ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512DQ Miscellaneous Set each packed 64-bit integer in "dst" to all ones or all zeros based on the value of the corresponding bit in "k". FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := 0xFFFFFFFFFFFFFFFF ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512VL AVX512DQ Miscellaneous Set each packed 64-bit integer in "dst" to all ones or all zeros based on the value of the corresponding bit in "k". FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := 0xFFFFFFFFFFFFFFFF ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer Mask AVX512VL AVX512DQ Miscellaneous Set each bit of mask register "k" based on the most significant bit of the corresponding packed 64-bit integer in "a". FOR j := 0 to 3 i := j*64 IF a[i+63] k[j] := 1 ELSE k[j] := 0 FI ENDFOR k[MAX:4] := 0
Integer Mask AVX512DQ Miscellaneous Set each bit of mask register "k" based on the most significant bit of the corresponding packed 64-bit integer in "a". FOR j := 0 to 7 i := j*64 IF a[i+63] k[j] := 1 ELSE k[j] := 0 FI ENDFOR k[MAX:8] := 0
Integer Mask AVX512VL AVX512DQ Miscellaneous Set each bit of mask register "k" based on the most significant bit of the corresponding packed 64-bit integer in "a". FOR j := 0 to 1 i := j*64 IF a[i+63] k[j] := 1 ELSE k[j] := 0 FI ENDFOR k[MAX:2] := 0
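The vector-to-mask and mask-to-vector entries above are inverses of each other on sign bits. The following scalar sketch models the 32-bit forms; the names `movepi32_mask_model` and `movm_epi32_model` and the element-count parameter `n` are assumptions made for this example.

```c
#include <assert.h>
#include <stdint.h>

/* k[j] := most significant bit of the j-th 32-bit element (n <= 16). */
static uint16_t movepi32_mask_model(const uint32_t *a, int n)
{
    assert(n >= 0 && n <= 16);
    uint16_t k = 0;
    for (int j = 0; j < n; j++)
        k |= (uint16_t)((a[j] >> 31) << j);
    return k;                       /* bits n and above stay 0 */
}

/* dst[j] := all ones if k[j] is set, otherwise all zeros (n <= 16). */
static void movm_epi32_model(uint32_t *dst, uint16_t k, int n)
{
    assert(n >= 0 && n <= 16);
    for (int j = 0; j < n; j++)
        dst[j] = ((k >> j) & 1) ? 0xFFFFFFFFu : 0u;
}
```

Expanding a mask with `movm_epi32_model` and collapsing it again with `movepi32_mask_model` returns the original low `n` mask bits.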
Integer AVX512VL AVX512DQ Arithmetic Multiply the packed 64-bit integers in "a" and "b", producing intermediate 128-bit integers, and store the low 64 bits of the intermediate integers in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k[j] tmp[127:0] := a[i+63:i] * b[i+63:i] dst[i+63:i] := tmp[63:0] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512DQ Arithmetic Multiply the packed 64-bit integers in "a" and "b", producing intermediate 128-bit integers, and store the low 64 bits of the intermediate integers in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k[j] tmp[127:0] := a[i+63:i] * b[i+63:i] dst[i+63:i] := tmp[63:0] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512DQ Arithmetic Multiply the packed 64-bit integers in "a" and "b", producing intermediate 128-bit integers, and store the low 64 bits of the intermediate integers in "dst". FOR j := 0 to 3 i := j*64 tmp[127:0] := a[i+63:i] * b[i+63:i] dst[i+63:i] := tmp[63:0] ENDFOR dst[MAX:256] := 0
Integer AVX512DQ Arithmetic Multiply the packed 64-bit integers in "a" and "b", producing intermediate 128-bit integers, and store the low 64 bits of the intermediate integers in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] tmp[127:0] := a[i+63:i] * b[i+63:i] dst[i+63:i] := tmp[63:0] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512DQ Arithmetic Multiply the packed 64-bit integers in "a" and "b", producing intermediate 128-bit integers, and store the low 64 bits of the intermediate integers in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] tmp[127:0] := a[i+63:i] * b[i+63:i] dst[i+63:i] := tmp[63:0] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512DQ Arithmetic Multiply the packed 64-bit integers in "a" and "b", producing intermediate 128-bit integers, and store the low 64 bits of the intermediate integers in "dst". FOR j := 0 to 7 i := j*64 tmp[127:0] := a[i+63:i] * b[i+63:i] dst[i+63:i] := tmp[63:0] ENDFOR dst[MAX:512] := 0
Integer AVX512VL AVX512DQ Arithmetic Multiply the packed 64-bit integers in "a" and "b", producing intermediate 128-bit integers, and store the low 64 bits of the intermediate integers in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k[j] tmp[127:0] := a[i+63:i] * b[i+63:i] dst[i+63:i] := tmp[63:0] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512DQ Arithmetic Multiply the packed 64-bit integers in "a" and "b", producing intermediate 128-bit integers, and store the low 64 bits of the intermediate integers in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k[j] tmp[127:0] := a[i+63:i] * b[i+63:i] dst[i+63:i] := tmp[63:0] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512DQ Arithmetic Multiply the packed 64-bit integers in "a" and "b", producing intermediate 128-bit integers, and store the low 64 bits of the intermediate integers in "dst". FOR j := 0 to 1 i := j*64 tmp[127:0] := a[i+63:i] * b[i+63:i] dst[i+63:i] := tmp[63:0] ENDFOR dst[MAX:128] := 0
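Keeping only the low 64 bits of the 128-bit product is the same as multiplying modulo 2^64, which is what unsigned 64-bit multiplication in C already does. A scalar sketch of the writemask form, with the hypothetical name `mask_mullo_epi64_model` and an element-count parameter `n` added for illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model: for each selected element, dst := low 64 bits of a*b;
   unselected elements are copied from src. n is 2, 4, or 8 in the listing. */
static void mask_mullo_epi64_model(uint64_t *dst, const uint64_t *src,
                                   uint8_t k, const uint64_t *a,
                                   const uint64_t *b, int n)
{
    assert(n >= 0 && n <= 8);
    for (int j = 0; j < n; j++) {
        /* uint64_t multiplication wraps modulo 2^64, i.e. tmp[63:0]. */
        dst[j] = ((k >> j) & 1) ? a[j] * b[j] : src[j];
    }
}
```

Because only the low half is kept, the result bits are identical whether the inputs are read as signed (two's complement) or unsigned.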
Floating Point AVX512VL AVX512DQ Miscellaneous Calculate the max, min, absolute max, or absolute min (depending on control in "imm8") for packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). imm8[1:0] specifies the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max. imm8[3:2] specifies the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit. DEFINE RANGE(src1[63:0], src2[63:0], opCtl[1:0], signSelCtl[1:0]) { CASE opCtl[1:0] OF 0: tmp[63:0] := (src1[63:0] <= src2[63:0]) ? src1[63:0] : src2[63:0] 1: tmp[63:0] := (src1[63:0] <= src2[63:0]) ? src2[63:0] : src1[63:0] 2: tmp[63:0] := (ABS(src1[63:0]) <= ABS(src2[63:0])) ? src1[63:0] : src2[63:0] 3: tmp[63:0] := (ABS(src1[63:0]) <= ABS(src2[63:0])) ? src2[63:0] : src1[63:0] ESAC CASE signSelCtl[1:0] OF 0: dst[63:0] := (src1[63] << 63) OR (tmp[62:0]) 1: dst[63:0] := tmp[63:0] 2: dst[63:0] := (0 << 63) OR (tmp[62:0]) 3: dst[63:0] := (1 << 63) OR (tmp[62:0]) ESAC RETURN dst } FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := RANGE(a[i+63:i], b[i+63:i], imm8[1:0], imm8[3:2]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512DQ Miscellaneous Calculate the max, min, absolute max, or absolute min (depending on control in "imm8") for packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). imm8[1:0] specifies the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max. imm8[3:2] specifies the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit. DEFINE RANGE(src1[63:0], src2[63:0], opCtl[1:0], signSelCtl[1:0]) { CASE opCtl[1:0] OF 0: tmp[63:0] := (src1[63:0] <= src2[63:0]) ? src1[63:0] : src2[63:0] 1: tmp[63:0] := (src1[63:0] <= src2[63:0]) ? src2[63:0] : src1[63:0] 2: tmp[63:0] := (ABS(src1[63:0]) <= ABS(src2[63:0])) ? src1[63:0] : src2[63:0] 3: tmp[63:0] := (ABS(src1[63:0]) <= ABS(src2[63:0])) ? src2[63:0] : src1[63:0] ESAC CASE signSelCtl[1:0] OF 0: dst[63:0] := (src1[63] << 63) OR (tmp[62:0]) 1: dst[63:0] := tmp[63:0] 2: dst[63:0] := (0 << 63) OR (tmp[62:0]) 3: dst[63:0] := (1 << 63) OR (tmp[62:0]) ESAC RETURN dst } FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := RANGE(a[i+63:i], b[i+63:i], imm8[1:0], imm8[3:2]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512DQ Miscellaneous Calculate the max, min, absolute max, or absolute min (depending on control in "imm8") for packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst". imm8[1:0] specifies the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max. imm8[3:2] specifies the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit. DEFINE RANGE(src1[63:0], src2[63:0], opCtl[1:0], signSelCtl[1:0]) { CASE opCtl[1:0] OF 0: tmp[63:0] := (src1[63:0] <= src2[63:0]) ? src1[63:0] : src2[63:0] 1: tmp[63:0] := (src1[63:0] <= src2[63:0]) ? src2[63:0] : src1[63:0] 2: tmp[63:0] := (ABS(src1[63:0]) <= ABS(src2[63:0])) ? src1[63:0] : src2[63:0] 3: tmp[63:0] := (ABS(src1[63:0]) <= ABS(src2[63:0])) ? src2[63:0] : src1[63:0] ESAC CASE signSelCtl[1:0] OF 0: dst[63:0] := (src1[63] << 63) OR (tmp[62:0]) 1: dst[63:0] := tmp[63:0] 2: dst[63:0] := (0 << 63) OR (tmp[62:0]) 3: dst[63:0] := (1 << 63) OR (tmp[62:0]) ESAC RETURN dst } FOR j := 0 to 3 i := j*64 dst[i+63:i] := RANGE(a[i+63:i], b[i+63:i], imm8[1:0], imm8[3:2]) ENDFOR dst[MAX:256] := 0
Floating Point AVX512DQ Miscellaneous Calculate the max, min, absolute max, or absolute min (depending on control in "imm8") for packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). imm8[1:0] specifies the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max. imm8[3:2] specifies the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit. DEFINE RANGE(src1[63:0], src2[63:0], opCtl[1:0], signSelCtl[1:0]) { CASE opCtl[1:0] OF 0: tmp[63:0] := (src1[63:0] <= src2[63:0]) ? src1[63:0] : src2[63:0] 1: tmp[63:0] := (src1[63:0] <= src2[63:0]) ? src2[63:0] : src1[63:0] 2: tmp[63:0] := (ABS(src1[63:0]) <= ABS(src2[63:0])) ? src1[63:0] : src2[63:0] 3: tmp[63:0] := (ABS(src1[63:0]) <= ABS(src2[63:0])) ? src2[63:0] : src1[63:0] ESAC CASE signSelCtl[1:0] OF 0: dst[63:0] := (src1[63] << 63) OR (tmp[62:0]) 1: dst[63:0] := tmp[63:0] 2: dst[63:0] := (0 << 63) OR (tmp[62:0]) 3: dst[63:0] := (1 << 63) OR (tmp[62:0]) ESAC RETURN dst } FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := RANGE(a[i+63:i], b[i+63:i], imm8[1:0], imm8[3:2]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512DQ Miscellaneous Calculate the max, min, absolute max, or absolute min (depending on control in "imm8") for packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). imm8[1:0] specifies the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max. imm8[3:2] specifies the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit. [sae_note] DEFINE RANGE(src1[63:0], src2[63:0], opCtl[1:0], signSelCtl[1:0]) { CASE opCtl[1:0] OF 0: tmp[63:0] := (src1[63:0] <= src2[63:0]) ? src1[63:0] : src2[63:0] 1: tmp[63:0] := (src1[63:0] <= src2[63:0]) ? src2[63:0] : src1[63:0] 2: tmp[63:0] := (ABS(src1[63:0]) <= ABS(src2[63:0])) ? src1[63:0] : src2[63:0] 3: tmp[63:0] := (ABS(src1[63:0]) <= ABS(src2[63:0])) ? src2[63:0] : src1[63:0] ESAC CASE signSelCtl[1:0] OF 0: dst[63:0] := (src1[63] << 63) OR (tmp[62:0]) 1: dst[63:0] := tmp[63:0] 2: dst[63:0] := (0 << 63) OR (tmp[62:0]) 3: dst[63:0] := (1 << 63) OR (tmp[62:0]) ESAC RETURN dst } FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := RANGE(a[i+63:i], b[i+63:i], imm8[1:0], imm8[3:2]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512DQ Miscellaneous Calculate the max, min, absolute max, or absolute min (depending on control in "imm8") for packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). imm8[1:0] specifies the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max. imm8[3:2] specifies the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit. DEFINE RANGE(src1[63:0], src2[63:0], opCtl[1:0], signSelCtl[1:0]) { CASE opCtl[1:0] OF 0: tmp[63:0] := (src1[63:0] <= src2[63:0]) ? src1[63:0] : src2[63:0] 1: tmp[63:0] := (src1[63:0] <= src2[63:0]) ? src2[63:0] : src1[63:0] 2: tmp[63:0] := (ABS(src1[63:0]) <= ABS(src2[63:0])) ? src1[63:0] : src2[63:0] 3: tmp[63:0] := (ABS(src1[63:0]) <= ABS(src2[63:0])) ? src2[63:0] : src1[63:0] ESAC CASE signSelCtl[1:0] OF 0: dst[63:0] := (src1[63] << 63) OR (tmp[62:0]) 1: dst[63:0] := tmp[63:0] 2: dst[63:0] := (0 << 63) OR (tmp[62:0]) 3: dst[63:0] := (1 << 63) OR (tmp[62:0]) ESAC RETURN dst } FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := RANGE(a[i+63:i], b[i+63:i], imm8[1:0], imm8[3:2]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512DQ Miscellaneous Calculate the max, min, absolute max, or absolute min (depending on control in "imm8") for packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). imm8[1:0] specifies the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max. imm8[3:2] specifies the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit. [sae_note] DEFINE RANGE(src1[63:0], src2[63:0], opCtl[1:0], signSelCtl[1:0]) { CASE opCtl[1:0] OF 0: tmp[63:0] := (src1[63:0] <= src2[63:0]) ? src1[63:0] : src2[63:0] 1: tmp[63:0] := (src1[63:0] <= src2[63:0]) ? src2[63:0] : src1[63:0] 2: tmp[63:0] := (ABS(src1[63:0]) <= ABS(src2[63:0])) ? src1[63:0] : src2[63:0] 3: tmp[63:0] := (ABS(src1[63:0]) <= ABS(src2[63:0])) ? src2[63:0] : src1[63:0] ESAC CASE signSelCtl[1:0] OF 0: dst[63:0] := (src1[63] << 63) OR (tmp[62:0]) 1: dst[63:0] := tmp[63:0] 2: dst[63:0] := (0 << 63) OR (tmp[62:0]) 3: dst[63:0] := (1 << 63) OR (tmp[62:0]) ESAC RETURN dst } FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := RANGE(a[i+63:i], b[i+63:i], imm8[1:0], imm8[3:2]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512DQ Miscellaneous Calculate the max, min, absolute max, or absolute min (depending on control in "imm8") for packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst". imm8[1:0] specifies the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max. imm8[3:2] specifies the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit. DEFINE RANGE(src1[63:0], src2[63:0], opCtl[1:0], signSelCtl[1:0]) { CASE opCtl[1:0] OF 0: tmp[63:0] := (src1[63:0] <= src2[63:0]) ? src1[63:0] : src2[63:0] 1: tmp[63:0] := (src1[63:0] <= src2[63:0]) ? src2[63:0] : src1[63:0] 2: tmp[63:0] := (ABS(src1[63:0]) <= ABS(src2[63:0])) ? src1[63:0] : src2[63:0] 3: tmp[63:0] := (ABS(src1[63:0]) <= ABS(src2[63:0])) ? src2[63:0] : src1[63:0] ESAC CASE signSelCtl[1:0] OF 0: dst[63:0] := (src1[63] << 63) OR (tmp[62:0]) 1: dst[63:0] := tmp[63:0] 2: dst[63:0] := (0 << 63) OR (tmp[62:0]) 3: dst[63:0] := (1 << 63) OR (tmp[62:0]) ESAC RETURN dst } FOR j := 0 to 7 i := j*64 dst[i+63:i] := RANGE(a[i+63:i], b[i+63:i], imm8[1:0], imm8[3:2]) ENDFOR dst[MAX:512] := 0
Floating Point AVX512DQ Miscellaneous Calculate the max, min, absolute max, or absolute min (depending on control in "imm8") for packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst". imm8[1:0] specifies the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max. imm8[3:2] specifies the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit. [sae_note] DEFINE RANGE(src1[63:0], src2[63:0], opCtl[1:0], signSelCtl[1:0]) { CASE opCtl[1:0] OF 0: tmp[63:0] := (src1[63:0] <= src2[63:0]) ? src1[63:0] : src2[63:0] 1: tmp[63:0] := (src1[63:0] <= src2[63:0]) ? src2[63:0] : src1[63:0] 2: tmp[63:0] := (ABS(src1[63:0]) <= ABS(src2[63:0])) ? src1[63:0] : src2[63:0] 3: tmp[63:0] := (ABS(src1[63:0]) <= ABS(src2[63:0])) ? src2[63:0] : src1[63:0] ESAC CASE signSelCtl[1:0] OF 0: dst[63:0] := (src1[63] << 63) OR (tmp[62:0]) 1: dst[63:0] := tmp[63:0] 2: dst[63:0] := (0 << 63) OR (tmp[62:0]) 3: dst[63:0] := (1 << 63) OR (tmp[62:0]) ESAC RETURN dst } FOR j := 0 to 7 i := j*64 dst[i+63:i] := RANGE(a[i+63:i], b[i+63:i], imm8[1:0], imm8[3:2]) ENDFOR dst[MAX:512] := 0
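The RANGE pseudocode used by the double-precision entries can be modeled for a single lane in portable C. This is only a sketch, not the hardware definition: `range_f64`, `abs_f64`, and `range_imm8` are hypothetical helper names, and NaN, denormal, and signed-zero handling (which the pseudocode does not spell out) is not modeled. Per the CASE arms, opCtl 2 keeps the operand with the smaller magnitude and opCtl 3 the one with the larger magnitude.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: magnitude of x (NaN and -0.0 are not treated specially). */
static double abs_f64(double x) { return x < 0.0 ? -x : x; }

/* Hypothetical helper: scalar model of RANGE for one 64-bit lane.
   op_ctl corresponds to imm8[1:0], sign_ctl to imm8[3:2]. */
static double range_f64(double src1, double src2, unsigned op_ctl, unsigned sign_ctl)
{
    assert(op_ctl < 4 && sign_ctl < 4);

    double tmp;
    switch (op_ctl) {
    case 0:  tmp = (src1 <= src2) ? src1 : src2; break;                    /* min */
    case 1:  tmp = (src1 <= src2) ? src2 : src1; break;                    /* max */
    case 2:  tmp = (abs_f64(src1) <= abs_f64(src2)) ? src1 : src2; break;  /* smaller magnitude */
    default: tmp = (abs_f64(src1) <= abs_f64(src2)) ? src2 : src1; break;  /* larger magnitude */
    }

    /* Work on the raw bit patterns so the sign bit (bit 63) can be replaced. */
    uint64_t bits, a_bits;
    memcpy(&bits, &tmp, sizeof bits);
    memcpy(&a_bits, &src1, sizeof a_bits);
    const uint64_t sign_mask = UINT64_C(1) << 63;

    switch (sign_ctl) {
    case 0:  bits = (a_bits & sign_mask) | (bits & ~sign_mask); break;  /* sign from a */
    case 1:  break;                                                     /* sign from compare result */
    case 2:  bits &= ~sign_mask; break;                                 /* clear sign bit */
    default: bits |= sign_mask; break;                                  /* set sign bit */
    }

    double dst;
    memcpy(&dst, &bits, sizeof dst);
    return dst;
}

/* Hypothetical helper: pack the two 2-bit controls into an imm8 value. */
static unsigned range_imm8(unsigned op_ctl, unsigned sign_ctl)
{
    return ((sign_ctl & 3u) << 2) | (op_ctl & 3u);
}
```

For example, with a = -3.0 and b = 2.0, opCtl 3 with signSelCtl 2 selects -3.0 (larger magnitude) and then clears the sign, giving 3.0.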
Floating Point AVX512VL AVX512DQ Miscellaneous Calculate the max, min, absolute max, or absolute min (depending on control in "imm8") for packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). imm8[1:0] specifies the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max. imm8[3:2] specifies the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit. DEFINE RANGE(src1[63:0], src2[63:0], opCtl[1:0], signSelCtl[1:0]) { CASE opCtl[1:0] OF 0: tmp[63:0] := (src1[63:0] <= src2[63:0]) ? src1[63:0] : src2[63:0] 1: tmp[63:0] := (src1[63:0] <= src2[63:0]) ? src2[63:0] : src1[63:0] 2: tmp[63:0] := (ABS(src1[63:0]) <= ABS(src2[63:0])) ? src1[63:0] : src2[63:0] 3: tmp[63:0] := (ABS(src1[63:0]) <= ABS(src2[63:0])) ? src2[63:0] : src1[63:0] ESAC CASE signSelCtl[1:0] OF 0: dst[63:0] := (src1[63] << 63) OR (tmp[62:0]) 1: dst[63:0] := tmp[63:0] 2: dst[63:0] := (0 << 63) OR (tmp[62:0]) 3: dst[63:0] := (1 << 63) OR (tmp[62:0]) ESAC RETURN dst } FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := RANGE(a[i+63:i], b[i+63:i], imm8[1:0], imm8[3:2]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512DQ Miscellaneous Calculate the max, min, absolute max, or absolute min (depending on control in "imm8") for packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). imm8[1:0] specifies the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max. imm8[3:2] specifies the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit. DEFINE RANGE(src1[63:0], src2[63:0], opCtl[1:0], signSelCtl[1:0]) { CASE opCtl[1:0] OF 0: tmp[63:0] := (src1[63:0] <= src2[63:0]) ? src1[63:0] : src2[63:0] 1: tmp[63:0] := (src1[63:0] <= src2[63:0]) ? src2[63:0] : src1[63:0] 2: tmp[63:0] := (ABS(src1[63:0]) <= ABS(src2[63:0])) ? src1[63:0] : src2[63:0] 3: tmp[63:0] := (ABS(src1[63:0]) <= ABS(src2[63:0])) ? src2[63:0] : src1[63:0] ESAC CASE signSelCtl[1:0] OF 0: dst[63:0] := (src1[63] << 63) OR (tmp[62:0]) 1: dst[63:0] := tmp[63:0] 2: dst[63:0] := (0 << 63) OR (tmp[62:0]) 3: dst[63:0] := (1 << 63) OR (tmp[62:0]) ESAC RETURN dst } FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := RANGE(a[i+63:i], b[i+63:i], imm8[1:0], imm8[3:2]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512DQ Miscellaneous Calculate the max, min, absolute max, or absolute min (depending on control in "imm8") for packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst". imm8[1:0] specifies the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max. imm8[3:2] specifies the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit. DEFINE RANGE(src1[63:0], src2[63:0], opCtl[1:0], signSelCtl[1:0]) { CASE opCtl[1:0] OF 0: tmp[63:0] := (src1[63:0] <= src2[63:0]) ? src1[63:0] : src2[63:0] 1: tmp[63:0] := (src1[63:0] <= src2[63:0]) ? src2[63:0] : src1[63:0] 2: tmp[63:0] := (ABS(src1[63:0]) <= ABS(src2[63:0])) ? src1[63:0] : src2[63:0] 3: tmp[63:0] := (ABS(src1[63:0]) <= ABS(src2[63:0])) ? src2[63:0] : src1[63:0] ESAC CASE signSelCtl[1:0] OF 0: dst[63:0] := (src1[63] << 63) OR (tmp[62:0]) 1: dst[63:0] := tmp[63:0] 2: dst[63:0] := (0 << 63) OR (tmp[62:0]) 3: dst[63:0] := (1 << 63) OR (tmp[62:0]) ESAC RETURN dst } FOR j := 0 to 1 i := j*64 dst[i+63:i] := RANGE(a[i+63:i], b[i+63:i], imm8[1:0], imm8[3:2]) ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512DQ Miscellaneous Calculate the max, min, absolute max, or absolute min (depending on control in "imm8") for packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). imm8[1:0] specifies the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max. imm8[3:2] specifies the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit. DEFINE RANGE(src1[31:0], src2[31:0], opCtl[1:0], signSelCtl[1:0]) { CASE opCtl[1:0] OF 0: tmp[31:0] := (src1[31:0] <= src2[31:0]) ? src1[31:0] : src2[31:0] 1: tmp[31:0] := (src1[31:0] <= src2[31:0]) ? src2[31:0] : src1[31:0] 2: tmp[31:0] := (ABS(src1[31:0]) <= ABS(src2[31:0])) ? src1[31:0] : src2[31:0] 3: tmp[31:0] := (ABS(src1[31:0]) <= ABS(src2[31:0])) ? src2[31:0] : src1[31:0] ESAC CASE signSelCtl[1:0] OF 0: dst[31:0] := (src1[31] << 31) OR (tmp[30:0]) 1: dst[31:0] := tmp[31:0] 2: dst[31:0] := (0 << 31) OR (tmp[30:0]) 3: dst[31:0] := (1 << 31) OR (tmp[30:0]) ESAC RETURN dst } FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := RANGE(a[i+31:i], b[i+31:i], imm8[1:0], imm8[3:2]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512DQ Miscellaneous Calculate the max, min, absolute max, or absolute min (depending on control in "imm8") for packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). imm8[1:0] specifies the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max. imm8[3:2] specifies the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit. DEFINE RANGE(src1[31:0], src2[31:0], opCtl[1:0], signSelCtl[1:0]) { CASE opCtl[1:0] OF 0: tmp[31:0] := (src1[31:0] <= src2[31:0]) ? src1[31:0] : src2[31:0] 1: tmp[31:0] := (src1[31:0] <= src2[31:0]) ? src2[31:0] : src1[31:0] 2: tmp[31:0] := (ABS(src1[31:0]) <= ABS(src2[31:0])) ? src1[31:0] : src2[31:0] 3: tmp[31:0] := (ABS(src1[31:0]) <= ABS(src2[31:0])) ? src2[31:0] : src1[31:0] ESAC CASE signSelCtl[1:0] OF 0: dst[31:0] := (src1[31] << 31) OR (tmp[30:0]) 1: dst[31:0] := tmp[31:0] 2: dst[31:0] := (0 << 31) OR (tmp[30:0]) 3: dst[31:0] := (1 << 31) OR (tmp[30:0]) ESAC RETURN dst } FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := RANGE(a[i+31:i], b[i+31:i], imm8[1:0], imm8[3:2]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512DQ Miscellaneous Calculate the max, min, absolute max, or absolute min (depending on control in "imm8") for packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst". imm8[1:0] specifies the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max. imm8[3:2] specifies the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit. DEFINE RANGE(src1[31:0], src2[31:0], opCtl[1:0], signSelCtl[1:0]) { CASE opCtl[1:0] OF 0: tmp[31:0] := (src1[31:0] <= src2[31:0]) ? src1[31:0] : src2[31:0] 1: tmp[31:0] := (src1[31:0] <= src2[31:0]) ? src2[31:0] : src1[31:0] 2: tmp[31:0] := (ABS(src1[31:0]) <= ABS(src2[31:0])) ? src1[31:0] : src2[31:0] 3: tmp[31:0] := (ABS(src1[31:0]) <= ABS(src2[31:0])) ? src2[31:0] : src1[31:0] ESAC CASE signSelCtl[1:0] OF 0: dst[31:0] := (src1[31] << 31) OR (tmp[30:0]) 1: dst[31:0] := tmp[31:0] 2: dst[31:0] := (0 << 31) OR (tmp[30:0]) 3: dst[31:0] := (1 << 31) OR (tmp[30:0]) ESAC RETURN dst } FOR j := 0 to 7 i := j*32 dst[i+31:i] := RANGE(a[i+31:i], b[i+31:i], imm8[1:0], imm8[3:2]) ENDFOR dst[MAX:256] := 0
Floating Point AVX512DQ Miscellaneous Calculate the max, min, absolute max, or absolute min (depending on control in "imm8") for packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). imm8[1:0] specifies the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max. imm8[3:2] specifies the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit. DEFINE RANGE(src1[31:0], src2[31:0], opCtl[1:0], signSelCtl[1:0]) { CASE opCtl[1:0] OF 0: tmp[31:0] := (src1[31:0] <= src2[31:0]) ? src1[31:0] : src2[31:0] 1: tmp[31:0] := (src1[31:0] <= src2[31:0]) ? src2[31:0] : src1[31:0] 2: tmp[31:0] := (ABS(src1[31:0]) <= ABS(src2[31:0])) ? src1[31:0] : src2[31:0] 3: tmp[31:0] := (ABS(src1[31:0]) <= ABS(src2[31:0])) ? src2[31:0] : src1[31:0] ESAC CASE signSelCtl[1:0] OF 0: dst[31:0] := (src1[31] << 31) OR (tmp[30:0]) 1: dst[31:0] := tmp[31:0] 2: dst[31:0] := (0 << 31) OR (tmp[30:0]) 3: dst[31:0] := (1 << 31) OR (tmp[30:0]) ESAC RETURN dst } FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := RANGE(a[i+31:i], b[i+31:i], imm8[1:0], imm8[3:2]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512DQ Miscellaneous Calculate the max, min, absolute max, or absolute min (depending on control in "imm8") for packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). imm8[1:0] specifies the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max. imm8[3:2] specifies the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit. [sae_note] DEFINE RANGE(src1[31:0], src2[31:0], opCtl[1:0], signSelCtl[1:0]) { CASE opCtl[1:0] OF 0: tmp[31:0] := (src1[31:0] <= src2[31:0]) ? src1[31:0] : src2[31:0] 1: tmp[31:0] := (src1[31:0] <= src2[31:0]) ? src2[31:0] : src1[31:0] 2: tmp[31:0] := (ABS(src1[31:0]) <= ABS(src2[31:0])) ? src1[31:0] : src2[31:0] 3: tmp[31:0] := (ABS(src1[31:0]) <= ABS(src2[31:0])) ? src2[31:0] : src1[31:0] ESAC CASE signSelCtl[1:0] OF 0: dst[31:0] := (src1[31] << 31) OR (tmp[30:0]) 1: dst[31:0] := tmp[31:0] 2: dst[31:0] := (0 << 31) OR (tmp[30:0]) 3: dst[31:0] := (1 << 31) OR (tmp[30:0]) ESAC RETURN dst } FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := RANGE(a[i+31:i], b[i+31:i], imm8[1:0], imm8[3:2]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512DQ Miscellaneous Calculate the max, min, absolute max, or absolute min (depending on control in "imm8") for packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). imm8[1:0] specifies the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max. imm8[3:2] specifies the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit. DEFINE RANGE(src1[31:0], src2[31:0], opCtl[1:0], signSelCtl[1:0]) { CASE opCtl[1:0] OF 0: tmp[31:0] := (src1[31:0] <= src2[31:0]) ? src1[31:0] : src2[31:0] 1: tmp[31:0] := (src1[31:0] <= src2[31:0]) ? src2[31:0] : src1[31:0] 2: tmp[31:0] := (ABS(src1[31:0]) <= ABS(src2[31:0])) ? src1[31:0] : src2[31:0] 3: tmp[31:0] := (ABS(src1[31:0]) <= ABS(src2[31:0])) ? src2[31:0] : src1[31:0] ESAC CASE signSelCtl[1:0] OF 0: dst[31:0] := (src1[31] << 31) OR (tmp[30:0]) 1: dst[31:0] := tmp[31:0] 2: dst[31:0] := (0 << 31) OR (tmp[30:0]) 3: dst[31:0] := (1 << 31) OR (tmp[30:0]) ESAC RETURN dst } FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := RANGE(a[i+31:i], b[i+31:i], imm8[1:0], imm8[3:2]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512DQ Miscellaneous Calculate the max, min, absolute max, or absolute min (depending on control in "imm8") for packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). imm8[1:0] specifies the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max. imm8[3:2] specifies the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit. [sae_note] DEFINE RANGE(src1[31:0], src2[31:0], opCtl[1:0], signSelCtl[1:0]) { CASE opCtl[1:0] OF 0: tmp[31:0] := (src1[31:0] <= src2[31:0]) ? src1[31:0] : src2[31:0] 1: tmp[31:0] := (src1[31:0] <= src2[31:0]) ? src2[31:0] : src1[31:0] 2: tmp[31:0] := (ABS(src1[31:0]) <= ABS(src2[31:0])) ? src1[31:0] : src2[31:0] 3: tmp[31:0] := (ABS(src1[31:0]) <= ABS(src2[31:0])) ? src2[31:0] : src1[31:0] ESAC CASE signSelCtl[1:0] OF 0: dst[31:0] := (src1[31] << 31) OR (tmp[30:0]) 1: dst[31:0] := tmp[31:0] 2: dst[31:0] := (0 << 31) OR (tmp[30:0]) 3: dst[31:0] := (1 << 31) OR (tmp[30:0]) ESAC RETURN dst } FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := RANGE(a[i+31:i], b[i+31:i], imm8[1:0], imm8[3:2]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512DQ Miscellaneous Calculate the max, min, absolute max, or absolute min (depending on control in "imm8") for packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst". imm8[1:0] specifies the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max. imm8[3:2] specifies the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit. DEFINE RANGE(src1[31:0], src2[31:0], opCtl[1:0], signSelCtl[1:0]) { CASE opCtl[1:0] OF 0: tmp[31:0] := (src1[31:0] <= src2[31:0]) ? src1[31:0] : src2[31:0] 1: tmp[31:0] := (src1[31:0] <= src2[31:0]) ? src2[31:0] : src1[31:0] 2: tmp[31:0] := (ABS(src1[31:0]) <= ABS(src2[31:0])) ? src1[31:0] : src2[31:0] 3: tmp[31:0] := (ABS(src1[31:0]) <= ABS(src2[31:0])) ? src2[31:0] : src1[31:0] ESAC CASE signSelCtl[1:0] OF 0: dst[31:0] := (src1[31] << 31) OR (tmp[30:0]) 1: dst[31:0] := tmp[31:0] 2: dst[31:0] := (0 << 31) OR (tmp[30:0]) 3: dst[31:0] := (1 << 31) OR (tmp[30:0]) ESAC RETURN dst } FOR j := 0 to 15 i := j*32 dst[i+31:i] := RANGE(a[i+31:i], b[i+31:i], imm8[1:0], imm8[3:2]) ENDFOR dst[MAX:512] := 0
Floating Point AVX512DQ Miscellaneous Calculate the max, min, absolute max, or absolute min (depending on control in "imm8") for packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst". imm8[1:0] specifies the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max. imm8[3:2] specifies the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit. [sae_note] DEFINE RANGE(src1[31:0], src2[31:0], opCtl[1:0], signSelCtl[1:0]) { CASE opCtl[1:0] OF 0: tmp[31:0] := (src1[31:0] <= src2[31:0]) ? src1[31:0] : src2[31:0] 1: tmp[31:0] := (src1[31:0] <= src2[31:0]) ? src2[31:0] : src1[31:0] 2: tmp[31:0] := (ABS(src1[31:0]) <= ABS(src2[31:0])) ? src1[31:0] : src2[31:0] 3: tmp[31:0] := (ABS(src1[31:0]) <= ABS(src2[31:0])) ? src2[31:0] : src1[31:0] ESAC CASE signSelCtl[1:0] OF 0: dst[31:0] := (src1[31] << 31) OR (tmp[30:0]) 1: dst[31:0] := tmp[31:0] 2: dst[31:0] := (0 << 31) OR (tmp[30:0]) 3: dst[31:0] := (1 << 31) OR (tmp[30:0]) ESAC RETURN dst } FOR j := 0 to 15 i := j*32 dst[i+31:i] := RANGE(a[i+31:i], b[i+31:i], imm8[1:0], imm8[3:2]) ENDFOR dst[MAX:512] := 0
Floating Point AVX512VL AVX512DQ Miscellaneous Calculate the max, min, absolute max, or absolute min (depending on control in "imm8") for packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). imm8[1:0] specifies the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max. imm8[3:2] specifies the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit. DEFINE RANGE(src1[31:0], src2[31:0], opCtl[1:0], signSelCtl[1:0]) { CASE opCtl[1:0] OF 0: tmp[31:0] := (src1[31:0] <= src2[31:0]) ? src1[31:0] : src2[31:0] 1: tmp[31:0] := (src1[31:0] <= src2[31:0]) ? src2[31:0] : src1[31:0] 2: tmp[31:0] := (ABS(src1[31:0]) <= ABS(src2[31:0])) ? src1[31:0] : src2[31:0] 3: tmp[31:0] := (ABS(src1[31:0]) <= ABS(src2[31:0])) ? src2[31:0] : src1[31:0] ESAC CASE signSelCtl[1:0] OF 0: dst[31:0] := (src1[31] << 31) OR (tmp[30:0]) 1: dst[31:0] := tmp[31:0] 2: dst[31:0] := (0 << 31) OR (tmp[30:0]) 3: dst[31:0] := (1 << 31) OR (tmp[30:0]) ESAC RETURN dst } FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := RANGE(a[i+31:i], b[i+31:i], imm8[1:0], imm8[3:2]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512DQ Miscellaneous Calculate the max, min, absolute max, or absolute min (depending on control in "imm8") for packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). imm8[1:0] specifies the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max. imm8[3:2] specifies the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit. DEFINE RANGE(src1[31:0], src2[31:0], opCtl[1:0], signSelCtl[1:0]) { CASE opCtl[1:0] OF 0: tmp[31:0] := (src1[31:0] <= src2[31:0]) ? src1[31:0] : src2[31:0] 1: tmp[31:0] := (src1[31:0] <= src2[31:0]) ? src2[31:0] : src1[31:0] 2: tmp[31:0] := (ABS(src1[31:0]) <= ABS(src2[31:0])) ? src1[31:0] : src2[31:0] 3: tmp[31:0] := (ABS(src1[31:0]) <= ABS(src2[31:0])) ? src2[31:0] : src1[31:0] ESAC CASE signSelCtl[1:0] OF 0: dst[31:0] := (src1[31] << 31) OR (tmp[30:0]) 1: dst[31:0] := tmp[31:0] 2: dst[31:0] := (0 << 31) OR (tmp[30:0]) 3: dst[31:0] := (1 << 31) OR (tmp[30:0]) ESAC RETURN dst } FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := RANGE(a[i+31:i], b[i+31:i], imm8[1:0], imm8[3:2]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512DQ Miscellaneous Calculate the max, min, absolute max, or absolute min (depending on control in "imm8") for packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst". imm8[1:0] specifies the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max. imm8[3:2] specifies the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit. DEFINE RANGE(src1[31:0], src2[31:0], opCtl[1:0], signSelCtl[1:0]) { CASE opCtl[1:0] OF 0: tmp[31:0] := (src1[31:0] <= src2[31:0]) ? src1[31:0] : src2[31:0] 1: tmp[31:0] := (src1[31:0] <= src2[31:0]) ? src2[31:0] : src1[31:0] 2: tmp[31:0] := (ABS(src1[31:0]) <= ABS(src2[31:0])) ? src1[31:0] : src2[31:0] 3: tmp[31:0] := (ABS(src1[31:0]) <= ABS(src2[31:0])) ? src2[31:0] : src1[31:0] ESAC CASE signSelCtl[1:0] OF 0: dst[31:0] := (src1[31] << 31) OR (tmp[30:0]) 1: dst[31:0] := tmp[31:0] 2: dst[31:0] := (0 << 31) OR (tmp[30:0]) 3: dst[31:0] := (1 << 31) OR (tmp[30:0]) ESAC RETURN dst } FOR j := 0 to 3 i := j*32 dst[i+31:i] := RANGE(a[i+31:i], b[i+31:i], imm8[1:0], imm8[3:2]) ENDFOR dst[MAX:128] := 0
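The difference between the writemask and zeromask forms of the packed single-precision entries can be illustrated with a small lane-by-lane model. The sketch below assumes a 128-bit vector of four floats; `range_f32`, `abs_f32`, and `mask_range_ps` are hypothetical names introduced for illustration, and passing `src == NULL` to select zero-masking is a convention of this sketch only. NaN and signed-zero behavior is not modeled.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LANES 4  /* a 128-bit vector holds four 32-bit elements */

/* Hypothetical helper: magnitude of x (NaN and -0.0 are not treated specially). */
static float abs_f32(float x) { return x < 0.0f ? -x : x; }

/* Hypothetical helper: scalar model of RANGE for one 32-bit lane. */
static float range_f32(float src1, float src2, unsigned op_ctl, unsigned sign_ctl)
{
    float tmp;
    switch (op_ctl & 3u) {
    case 0:  tmp = (src1 <= src2) ? src1 : src2; break;
    case 1:  tmp = (src1 <= src2) ? src2 : src1; break;
    case 2:  tmp = (abs_f32(src1) <= abs_f32(src2)) ? src1 : src2; break;
    default: tmp = (abs_f32(src1) <= abs_f32(src2)) ? src2 : src1; break;
    }

    uint32_t bits, a_bits;
    memcpy(&bits, &tmp, sizeof bits);
    memcpy(&a_bits, &src1, sizeof a_bits);
    const uint32_t sign_mask = UINT32_C(1) << 31;

    switch (sign_ctl & 3u) {
    case 0:  bits = (a_bits & sign_mask) | (bits & ~sign_mask); break;
    case 1:  break;
    case 2:  bits &= ~sign_mask; break;
    default: bits |= sign_mask; break;
    }

    float dst;
    memcpy(&dst, &bits, sizeof dst);
    return dst;
}

/* Hypothetical helper: lanes whose bit in k is set receive RANGE(a, b);
   the other lanes are copied from src (writemask) or, when src is NULL,
   set to zero (zeromask). */
static void mask_range_ps(float dst[LANES], const float *src, uint8_t k,
                          const float a[LANES], const float b[LANES], unsigned imm8)
{
    assert(dst && a && b);
    for (int j = 0; j < LANES; j++) {
        if ((k >> j) & 1u)
            dst[j] = range_f32(a[j], b[j], imm8 & 3u, (imm8 >> 2) & 3u);
        else
            dst[j] = src ? src[j] : 0.0f;
    }
}
```

The 256-bit and 512-bit entries follow the same pattern with 8 and 16 lanes respectively.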
Floating Point AVX512DQ Miscellaneous Calculate the max, min, absolute max, or absolute min (depending on control in "imm8") for the lower double-precision (64-bit) floating-point element in "a" and "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". imm8[1:0] specifies the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max. imm8[3:2] specifies the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit. [sae_note] DEFINE RANGE(src1[63:0], src2[63:0], opCtl[1:0], signSelCtl[1:0]) { CASE opCtl[1:0] OF 0: tmp[63:0] := (src1[63:0] <= src2[63:0]) ? src1[63:0] : src2[63:0] 1: tmp[63:0] := (src1[63:0] <= src2[63:0]) ? src2[63:0] : src1[63:0] 2: tmp[63:0] := (ABS(src1[63:0]) <= ABS(src2[63:0])) ? src1[63:0] : src2[63:0] 3: tmp[63:0] := (ABS(src1[63:0]) <= ABS(src2[63:0])) ? src2[63:0] : src1[63:0] ESAC CASE signSelCtl[1:0] OF 0: dst[63:0] := (src1[63] << 63) OR (tmp[62:0]) 1: dst[63:0] := tmp[63:0] 2: dst[63:0] := (0 << 63) OR (tmp[62:0]) 3: dst[63:0] := (1 << 63) OR (tmp[62:0]) ESAC RETURN dst } IF k[0] dst[63:0] := RANGE(a[63:0], b[63:0], imm8[1:0], imm8[3:2]) ELSE dst[63:0] := src[63:0] FI dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point AVX512DQ Miscellaneous Calculate the max, min, absolute max, or absolute min (depending on control in "imm8") for the lower double-precision (64-bit) floating-point element in "a" and "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". imm8[1:0] specifies the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max. imm8[3:2] specifies the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit. DEFINE RANGE(src1[63:0], src2[63:0], opCtl[1:0], signSelCtl[1:0]) { CASE opCtl[1:0] OF 0: tmp[63:0] := (src1[63:0] <= src2[63:0]) ? src1[63:0] : src2[63:0] 1: tmp[63:0] := (src1[63:0] <= src2[63:0]) ? src2[63:0] : src1[63:0] 2: tmp[63:0] := (ABS(src1[63:0]) <= ABS(src2[63:0])) ? src1[63:0] : src2[63:0] 3: tmp[63:0] := (ABS(src1[63:0]) <= ABS(src2[63:0])) ? src2[63:0] : src1[63:0] ESAC CASE signSelCtl[1:0] OF 0: dst[63:0] := (src1[63] << 63) OR (tmp[62:0]) 1: dst[63:0] := tmp[63:0] 2: dst[63:0] := (0 << 63) OR (tmp[62:0]) 3: dst[63:0] := (1 << 63) OR (tmp[62:0]) ESAC RETURN dst } IF k[0] dst[63:0] := RANGE(a[63:0], b[63:0], imm8[1:0], imm8[3:2]) ELSE dst[63:0] := src[63:0] FI dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point AVX512DQ Miscellaneous Calculate the max, min, absolute max, or absolute min (depending on control in "imm8") for the lower double-precision (64-bit) floating-point element in "a" and "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". imm8[1:0] specifies the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max. imm8[3:2] specifies the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit. [sae_note] DEFINE RANGE(src1[63:0], src2[63:0], opCtl[1:0], signSelCtl[1:0]) { CASE opCtl[1:0] OF 0: tmp[63:0] := (src1[63:0] <= src2[63:0]) ? src1[63:0] : src2[63:0] 1: tmp[63:0] := (src1[63:0] <= src2[63:0]) ? src2[63:0] : src1[63:0] 2: tmp[63:0] := (ABS(src1[63:0]) <= ABS(src2[63:0])) ? src1[63:0] : src2[63:0] 3: tmp[63:0] := (ABS(src1[63:0]) <= ABS(src2[63:0])) ? src2[63:0] : src1[63:0] ESAC CASE signSelCtl[1:0] OF 0: dst[63:0] := (src1[63] << 63) OR (tmp[62:0]) 1: dst[63:0] := tmp[63:0] 2: dst[63:0] := (0 << 63) OR (tmp[62:0]) 3: dst[63:0] := (1 << 63) OR (tmp[62:0]) ESAC RETURN dst } IF k[0] dst[63:0] := RANGE(a[63:0], b[63:0], imm8[1:0], imm8[3:2]) ELSE dst[63:0] := 0 FI dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point AVX512DQ Miscellaneous Calculate the max, min, absolute max, or absolute min (depending on control in "imm8") for the lower double-precision (64-bit) floating-point element in "a" and "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". imm8[1:0] specifies the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max. imm8[3:2] specifies the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit. DEFINE RANGE(src1[63:0], src2[63:0], opCtl[1:0], signSelCtl[1:0]) { CASE opCtl[1:0] OF 0: tmp[63:0] := (src1[63:0] <= src2[63:0]) ? src1[63:0] : src2[63:0] 1: tmp[63:0] := (src1[63:0] <= src2[63:0]) ? src2[63:0] : src1[63:0] 2: tmp[63:0] := (ABS(src1[63:0]) <= ABS(src2[63:0])) ? src1[63:0] : src2[63:0] 3: tmp[63:0] := (ABS(src1[63:0]) <= ABS(src2[63:0])) ? src2[63:0] : src1[63:0] ESAC CASE signSelCtl[1:0] OF 0: dst[63:0] := (src1[63] << 63) OR (tmp[62:0]) 1: dst[63:0] := tmp[63:0] 2: dst[63:0] := (0 << 63) OR (tmp[62:0]) 3: dst[63:0] := (1 << 63) OR (tmp[62:0]) ESAC RETURN dst } IF k[0] dst[63:0] := RANGE(a[63:0], b[63:0], imm8[1:0], imm8[3:2]) ELSE dst[63:0] := 0 FI dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point AVX512DQ Miscellaneous Calculate the max, min, absolute max, or absolute min (depending on control in "imm8") for the lower double-precision (64-bit) floating-point element in "a" and "b", store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst". imm8[1:0] specifies the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max. imm8[3:2] specifies the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit. [sae_note] DEFINE RANGE(src1[63:0], src2[63:0], opCtl[1:0], signSelCtl[1:0]) { CASE opCtl[1:0] OF 0: tmp[63:0] := (src1[63:0] <= src2[63:0]) ? src1[63:0] : src2[63:0] 1: tmp[63:0] := (src1[63:0] <= src2[63:0]) ? src2[63:0] : src1[63:0] 2: tmp[63:0] := (ABS(src1[63:0]) <= ABS(src2[63:0])) ? src1[63:0] : src2[63:0] 3: tmp[63:0] := (ABS(src1[63:0]) <= ABS(src2[63:0])) ? src2[63:0] : src1[63:0] ESAC CASE signSelCtl[1:0] OF 0: dst[63:0] := (src1[63] << 63) OR (tmp[62:0]) 1: dst[63:0] := tmp[63:0] 2: dst[63:0] := (0 << 63) OR (tmp[62:0]) 3: dst[63:0] := (1 << 63) OR (tmp[62:0]) ESAC RETURN dst } dst[63:0] := RANGE(a[63:0], b[63:0], imm8[1:0], imm8[3:2]) dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point AVX512DQ Miscellaneous Calculate the max, min, absolute max, or absolute min (depending on control in "imm8") for the lower single-precision (32-bit) floating-point element in "a" and "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". imm8[1:0] specifies the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max. imm8[3:2] specifies the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit. [sae_note] DEFINE RANGE(src1[31:0], src2[31:0], opCtl[1:0], signSelCtl[1:0]) { CASE opCtl[1:0] OF 0: tmp[31:0] := (src1[31:0] <= src2[31:0]) ? src1[31:0] : src2[31:0] 1: tmp[31:0] := (src1[31:0] <= src2[31:0]) ? src2[31:0] : src1[31:0] 2: tmp[31:0] := (ABS(src1[31:0]) <= ABS(src2[31:0])) ? src1[31:0] : src2[31:0] 3: tmp[31:0] := (ABS(src1[31:0]) <= ABS(src2[31:0])) ? src2[31:0] : src1[31:0] ESAC CASE signSelCtl[1:0] OF 0: dst[31:0] := (src1[31] << 31) OR (tmp[30:0]) 1: dst[31:0] := tmp[31:0] 2: dst[31:0] := (0 << 31) OR (tmp[30:0]) 3: dst[31:0] := (1 << 31) OR (tmp[30:0]) ESAC RETURN dst } IF k[0] dst[31:0] := RANGE(a[31:0], b[31:0], imm8[1:0], imm8[3:2]) ELSE dst[31:0] := src[31:0] FI dst[127:32] := a[127:32] dst[MAX:128] := 0
Floating Point AVX512DQ Miscellaneous Calculate the max, min, absolute max, or absolute min (depending on control in "imm8") for the lower single-precision (32-bit) floating-point element in "a" and "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". imm8[1:0] specifies the operation control: 00 = min, 01 = max, 10 = absolute min, 11 = absolute max. imm8[3:2] specifies the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit. DEFINE RANGE(src1[31:0], src2[31:0], opCtl[1:0], signSelCtl[1:0]) { CASE opCtl[1:0] OF 0: tmp[31:0] := (src1[31:0] <= src2[31:0]) ? src1[31:0] : src2[31:0] 1: tmp[31:0] := (src1[31:0] <= src2[31:0]) ? src2[31:0] : src1[31:0] 2: tmp[31:0] := (ABS(src1[31:0]) <= ABS(src2[31:0])) ? src1[31:0] : src2[31:0] 3: tmp[31:0] := (ABS(src1[31:0]) <= ABS(src2[31:0])) ? src2[31:0] : src1[31:0] ESAC CASE signSelCtl[1:0] OF 0: dst[31:0] := (src1[31] << 31) OR (tmp[30:0]) 1: dst[31:0] := tmp[31:0] 2: dst[31:0] := (0 << 31) OR (tmp[30:0]) 3: dst[31:0] := (1 << 31) OR (tmp[30:0]) ESAC RETURN dst } IF k[0] dst[31:0] := RANGE(a[31:0], b[31:0], imm8[1:0], imm8[3:2]) ELSE dst[31:0] := src[31:0] FI dst[127:32] := a[127:32] dst[MAX:128] := 0
Floating Point AVX512DQ Miscellaneous Calculate the max, min, absolute max, or absolute min (depending on control in "imm8") for the lower single-precision (32-bit) floating-point element in "a" and "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". imm8[1:0] specifies the operation control: 00 = min, 01 = max, 10 = absolute max, 11 = absolute min. imm8[3:2] specifies the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit. [sae_note] DEFINE RANGE(src1[31:0], src2[31:0], opCtl[1:0], signSelCtl[1:0]) { CASE opCtl[1:0] OF 0: tmp[31:0] := (src1[31:0] <= src2[31:0]) ? src1[31:0] : src2[31:0] 1: tmp[31:0] := (src1[31:0] <= src2[31:0]) ? src2[31:0] : src1[31:0] 2: tmp[31:0] := (ABS(src1[31:0]) <= ABS(src2[31:0])) ? src1[31:0] : src2[31:0] 3: tmp[31:0] := (ABS(src1[31:0]) <= ABS(src2[31:0])) ? src2[31:0] : src1[31:0] ESAC CASE signSelCtl[1:0] OF 0: dst[31:0] := (src1[31] << 31) OR (tmp[30:0]) 1: dst[31:0] := tmp[31:0] 2: dst[31:0] := (0 << 31) OR (tmp[30:0]) 3: dst[31:0] := (1 << 31) OR (tmp[30:0]) ESAC RETURN dst } IF k[0] dst[31:0] := RANGE(a[31:0], b[31:0], imm8[1:0], imm8[3:2]) ELSE dst[31:0] := 0 FI dst[127:32] := a[127:32] dst[MAX:128] := 0
Floating Point AVX512DQ Miscellaneous Calculate the max, min, absolute max, or absolute min (depending on control in "imm8") for the lower single-precision (32-bit) floating-point element in "a" and "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". imm8[1:0] specifies the operation control: 00 = min, 01 = max, 10 = absolute max, 11 = absolute min. imm8[3:2] specifies the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit. DEFINE RANGE(src1[31:0], src2[31:0], opCtl[1:0], signSelCtl[1:0]) { CASE opCtl[1:0] OF 0: tmp[31:0] := (src1[31:0] <= src2[31:0]) ? src1[31:0] : src2[31:0] 1: tmp[31:0] := (src1[31:0] <= src2[31:0]) ? src2[31:0] : src1[31:0] 2: tmp[31:0] := (ABS(src1[31:0]) <= ABS(src2[31:0])) ? src1[31:0] : src2[31:0] 3: tmp[31:0] := (ABS(src1[31:0]) <= ABS(src2[31:0])) ? src2[31:0] : src1[31:0] ESAC CASE signSelCtl[1:0] OF 0: dst[31:0] := (src1[31] << 31) OR (tmp[30:0]) 1: dst[31:0] := tmp[31:0] 2: dst[31:0] := (0 << 31) OR (tmp[30:0]) 3: dst[31:0] := (1 << 31) OR (tmp[30:0]) ESAC RETURN dst } IF k[0] dst[31:0] := RANGE(a[31:0], b[31:0], imm8[1:0], imm8[3:2]) ELSE dst[31:0] := 0 FI dst[127:32] := a[127:32] dst[MAX:128] := 0
Floating Point AVX512DQ Miscellaneous Calculate the max, min, absolute max, or absolute min (depending on control in "imm8") for the lower single-precision (32-bit) floating-point element in "a" and "b", store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". imm8[1:0] specifies the operation control: 00 = min, 01 = max, 10 = absolute max, 11 = absolute min. imm8[3:2] specifies the sign control: 00 = sign from a, 01 = sign from compare result, 10 = clear sign bit, 11 = set sign bit. [sae_note] DEFINE RANGE(src1[31:0], src2[31:0], opCtl[1:0], signSelCtl[1:0]) { CASE opCtl[1:0] OF 0: tmp[31:0] := (src1[31:0] <= src2[31:0]) ? src1[31:0] : src2[31:0] 1: tmp[31:0] := (src1[31:0] <= src2[31:0]) ? src2[31:0] : src1[31:0] 2: tmp[31:0] := (ABS(src1[31:0]) <= ABS(src2[31:0])) ? src1[31:0] : src2[31:0] 3: tmp[31:0] := (ABS(src1[31:0]) <= ABS(src2[31:0])) ? src2[31:0] : src1[31:0] ESAC CASE signSelCtl[1:0] OF 0: dst[31:0] := (src1[31] << 31) OR (tmp[30:0]) 1: dst[31:0] := tmp[31:0] 2: dst[31:0] := (0 << 31) OR (tmp[30:0]) 3: dst[31:0] := (1 << 31) OR (tmp[30:0]) ESAC RETURN dst } dst[31:0] := RANGE(a[31:0], b[31:0], imm8[1:0], imm8[3:2]) dst[127:32] := a[127:32] dst[MAX:128] := 0
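The RANGE pseudocode in the scalar single-precision entries above can be sketched as a plain C model. This is only an illustration that follows the CASE branches as written: opCtl 2 keeps the operand with the smaller magnitude and opCtl 3 the larger. The names `range_model`, `f32_bits`, and `bits_f32` are hypothetical helpers, not part of any intrinsics header, and the NaN and signed-zero special cases of the real instruction are assumed away.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Bit-pattern helpers; memcpy avoids strict-aliasing problems. */
static uint32_t f32_bits(float f) { uint32_t u; memcpy(&u, &f, sizeof u); return u; }
static float bits_f32(uint32_t u) { float f; memcpy(&f, &u, sizeof f); return f; }

/* Hypothetical scalar model of RANGE(src1, src2, opCtl, signSelCtl).
   NaN and +0/-0 special handling is NOT modeled (assumption). */
static float range_model(float src1, float src2, unsigned op_ctl, unsigned sign_ctl)
{
    assert(op_ctl < 4 && sign_ctl < 4);

    /* ABS() in the pseudocode: clear the sign bit. */
    float abs1 = bits_f32(f32_bits(src1) & 0x7FFFFFFFu);
    float abs2 = bits_f32(f32_bits(src2) & 0x7FFFFFFFu);

    float tmp;
    switch (op_ctl) {
    case 0:  tmp = (src1 <= src2) ? src1 : src2; break;   /* min */
    case 1:  tmp = (src1 <= src2) ? src2 : src1; break;   /* max */
    case 2:  tmp = (abs1 <= abs2) ? src1 : src2; break;   /* smaller magnitude */
    default: tmp = (abs1 <= abs2) ? src2 : src1; break;   /* larger magnitude */
    }

    uint32_t t = f32_bits(tmp);
    uint32_t magnitude = t & 0x7FFFFFFFu;                 /* tmp[30:0] */
    uint32_t out;
    switch (sign_ctl) {
    case 0:  out = (f32_bits(src1) & 0x80000000u) | magnitude; break; /* sign from src1 */
    case 1:  out = t; break;                                          /* sign from result */
    case 2:  out = magnitude; break;                                  /* clear sign */
    default: out = 0x80000000u | magnitude; break;                    /* set sign */
    }
    return bits_f32(out);
}
```

For example, with `a = -3.0f` and `b = 2.0f`, operation control 1 (max) combined with sign control 0 (sign from "a") selects 2.0 and then applies the negative sign of "a".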
Floating Point AVX512VL AVX512DQ Miscellaneous Extract the reduced argument of packed double-precision (64-bit) floating-point elements in "a" by the number of bits specified by "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [round_imm_note] DEFINE ReduceArgumentPD(src1[63:0], imm8[7:0]) { m[63:0] := FP64(imm8[7:4]) // number of fraction bits after the binary point to be preserved tmp[63:0] := POW(2.0, -m) * ROUND(POW(2.0, m) * src1[63:0], imm8[3:0]) tmp[63:0] := src1[63:0] - tmp[63:0] IF IsInf(tmp[63:0]) tmp[63:0] := FP64(0.0) FI RETURN tmp[63:0] } FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := ReduceArgumentPD(a[i+63:i], imm8[7:0]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512DQ Miscellaneous Extract the reduced argument of packed double-precision (64-bit) floating-point elements in "a" by the number of bits specified by "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [round_imm_note] DEFINE ReduceArgumentPD(src1[63:0], imm8[7:0]) { m[63:0] := FP64(imm8[7:4]) // number of fraction bits after the binary point to be preserved tmp[63:0] := POW(2.0, -m) * ROUND(POW(2.0, m) * src1[63:0], imm8[3:0]) tmp[63:0] := src1[63:0] - tmp[63:0] IF IsInf(tmp[63:0]) tmp[63:0] := FP64(0.0) FI RETURN tmp[63:0] } FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := ReduceArgumentPD(a[i+63:i], imm8[7:0]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512DQ Miscellaneous Extract the reduced argument of packed double-precision (64-bit) floating-point elements in "a" by the number of bits specified by "imm8", and store the results in "dst". [round_imm_note] DEFINE ReduceArgumentPD(src1[63:0], imm8[7:0]) { m[63:0] := FP64(imm8[7:4]) // number of fraction bits after the binary point to be preserved tmp[63:0] := POW(2.0, -m) * ROUND(POW(2.0, m) * src1[63:0], imm8[3:0]) tmp[63:0] := src1[63:0] - tmp[63:0] IF IsInf(tmp[63:0]) tmp[63:0] := FP64(0.0) FI RETURN tmp[63:0] } FOR j := 0 to 3 i := j*64 dst[i+63:i] := ReduceArgumentPD(a[i+63:i], imm8[7:0]) ENDFOR dst[MAX:256] := 0
Floating Point AVX512DQ Miscellaneous Extract the reduced argument of packed double-precision (64-bit) floating-point elements in "a" by the number of bits specified by "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [round_imm_note] DEFINE ReduceArgumentPD(src1[63:0], imm8[7:0]) { m[63:0] := FP64(imm8[7:4]) // number of fraction bits after the binary point to be preserved tmp[63:0] := POW(2.0, -m) * ROUND(POW(2.0, m) * src1[63:0], imm8[3:0]) tmp[63:0] := src1[63:0] - tmp[63:0] IF IsInf(tmp[63:0]) tmp[63:0] := FP64(0.0) FI RETURN tmp[63:0] } FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := ReduceArgumentPD(a[i+63:i], imm8[7:0]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512DQ Miscellaneous Extract the reduced argument of packed double-precision (64-bit) floating-point elements in "a" by the number of bits specified by "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [round_imm_note][sae_note] DEFINE ReduceArgumentPD(src1[63:0], imm8[7:0]) { m[63:0] := FP64(imm8[7:4]) // number of fraction bits after the binary point to be preserved tmp[63:0] := POW(2.0, -m) * ROUND(POW(2.0, m) * src1[63:0], imm8[3:0]) tmp[63:0] := src1[63:0] - tmp[63:0] IF IsInf(tmp[63:0]) tmp[63:0] := FP64(0.0) FI RETURN tmp[63:0] } FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := ReduceArgumentPD(a[i+63:i], imm8[7:0]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512DQ Miscellaneous Extract the reduced argument of packed double-precision (64-bit) floating-point elements in "a" by the number of bits specified by "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [round_imm_note] DEFINE ReduceArgumentPD(src1[63:0], imm8[7:0]) { m[63:0] := FP64(imm8[7:4]) // number of fraction bits after the binary point to be preserved tmp[63:0] := POW(2.0, -m) * ROUND(POW(2.0, m) * src1[63:0], imm8[3:0]) tmp[63:0] := src1[63:0] - tmp[63:0] IF IsInf(tmp[63:0]) tmp[63:0] := FP64(0.0) FI RETURN tmp[63:0] } FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := ReduceArgumentPD(a[i+63:i], imm8[7:0]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512DQ Miscellaneous Extract the reduced argument of packed double-precision (64-bit) floating-point elements in "a" by the number of bits specified by "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [round_imm_note][sae_note] DEFINE ReduceArgumentPD(src1[63:0], imm8[7:0]) { m[63:0] := FP64(imm8[7:4]) // number of fraction bits after the binary point to be preserved tmp[63:0] := POW(2.0, -m) * ROUND(POW(2.0, m) * src1[63:0], imm8[3:0]) tmp[63:0] := src1[63:0] - tmp[63:0] IF IsInf(tmp[63:0]) tmp[63:0] := FP64(0.0) FI RETURN tmp[63:0] } FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := ReduceArgumentPD(a[i+63:i], imm8[7:0]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512DQ Miscellaneous Extract the reduced argument of packed double-precision (64-bit) floating-point elements in "a" by the number of bits specified by "imm8", and store the results in "dst". [round_imm_note] DEFINE ReduceArgumentPD(src1[63:0], imm8[7:0]) { m[63:0] := FP64(imm8[7:4]) // number of fraction bits after the binary point to be preserved tmp[63:0] := POW(2.0, -m) * ROUND(POW(2.0, m) * src1[63:0], imm8[3:0]) tmp[63:0] := src1[63:0] - tmp[63:0] IF IsInf(tmp[63:0]) tmp[63:0] := FP64(0.0) FI RETURN tmp[63:0] } FOR j := 0 to 7 i := j*64 dst[i+63:i] := ReduceArgumentPD(a[i+63:i], imm8[7:0]) ENDFOR dst[MAX:512] := 0
Floating Point AVX512DQ Miscellaneous Extract the reduced argument of packed double-precision (64-bit) floating-point elements in "a" by the number of bits specified by "imm8", and store the results in "dst". [round_imm_note][sae_note] DEFINE ReduceArgumentPD(src1[63:0], imm8[7:0]) { m[63:0] := FP64(imm8[7:4]) // number of fraction bits after the binary point to be preserved tmp[63:0] := POW(2.0, -m) * ROUND(POW(2.0, m) * src1[63:0], imm8[3:0]) tmp[63:0] := src1[63:0] - tmp[63:0] IF IsInf(tmp[63:0]) tmp[63:0] := FP64(0.0) FI RETURN tmp[63:0] } FOR j := 0 to 7 i := j*64 dst[i+63:i] := ReduceArgumentPD(a[i+63:i], imm8[7:0]) ENDFOR dst[MAX:512] := 0
Floating Point AVX512VL AVX512DQ Miscellaneous Extract the reduced argument of packed double-precision (64-bit) floating-point elements in "a" by the number of bits specified by "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [round_imm_note] DEFINE ReduceArgumentPD(src1[63:0], imm8[7:0]) { m[63:0] := FP64(imm8[7:4]) // number of fraction bits after the binary point to be preserved tmp[63:0] := POW(2.0, -m) * ROUND(POW(2.0, m) * src1[63:0], imm8[3:0]) tmp[63:0] := src1[63:0] - tmp[63:0] IF IsInf(tmp[63:0]) tmp[63:0] := FP64(0.0) FI RETURN tmp[63:0] } FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := ReduceArgumentPD(a[i+63:i], imm8[7:0]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512DQ Miscellaneous Extract the reduced argument of packed double-precision (64-bit) floating-point elements in "a" by the number of bits specified by "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [round_imm_note] DEFINE ReduceArgumentPD(src1[63:0], imm8[7:0]) { m[63:0] := FP64(imm8[7:4]) // number of fraction bits after the binary point to be preserved tmp[63:0] := POW(2.0, -m) * ROUND(POW(2.0, m) * src1[63:0], imm8[3:0]) tmp[63:0] := src1[63:0] - tmp[63:0] IF IsInf(tmp[63:0]) tmp[63:0] := FP64(0.0) FI RETURN tmp[63:0] } FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := ReduceArgumentPD(a[i+63:i], imm8[7:0]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512DQ Miscellaneous Extract the reduced argument of packed double-precision (64-bit) floating-point elements in "a" by the number of bits specified by "imm8", and store the results in "dst". [round_imm_note] DEFINE ReduceArgumentPD(src1[63:0], imm8[7:0]) { m[63:0] := FP64(imm8[7:4]) // number of fraction bits after the binary point to be preserved tmp[63:0] := POW(2.0, -m) * ROUND(POW(2.0, m) * src1[63:0], imm8[3:0]) tmp[63:0] := src1[63:0] - tmp[63:0] IF IsInf(tmp[63:0]) tmp[63:0] := FP64(0.0) FI RETURN tmp[63:0] } FOR j := 0 to 1 i := j*64 dst[i+63:i] := ReduceArgumentPD(a[i+63:i], imm8[7:0]) ENDFOR dst[MAX:128] := 0
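As a rough illustration of the ReduceArgumentPD pseudocode and the writemask loop used in the packed double-precision entries above, the following C model may help. It is a sketch under several assumptions: the function names are hypothetical, the rounding control in imm8[3:0] is not modeled (round-to-nearest-even is hard-coded), the default floating-point environment is in effect, and `double` arithmetic is evaluated in IEEE double precision.

```c
#include <assert.h>
#include <float.h>
#include <stdint.h>

/* Round to the nearest integer (ties to even) with the 2^52 trick.
   Assumes the default round-to-nearest FP mode; avoids a libm dependency. */
static double round_nearest_even(double x)
{
    const double big = 4503599627370496.0;   /* 2^52 */
    if (!(x > -big && x < big))
        return x;                            /* already integral, infinite, or NaN */
    volatile double t = (x < 0.0) ? x - big : x + big;
    return (x < 0.0) ? t + big : t - big;
}

/* Hypothetical model of ReduceArgumentPD with m = imm8[7:4] fraction bits
   preserved; imm8[3:0] rounding control is NOT modeled (assumption). */
static double reduce_argument_pd_model(double src1, unsigned m)
{
    assert(m < 16);
    double scale = (double)(1u << m);                       /* POW(2.0, m) */
    double tmp = round_nearest_even(src1 * scale) / scale;  /* 2^-m * ROUND(2^m * src1) */
    tmp = src1 - tmp;
    if (tmp > DBL_MAX || tmp < -DBL_MAX)                    /* IsInf(tmp) */
        tmp = 0.0;
    return tmp;
}

/* Writemask form: lanes whose mask bit is clear are copied from src.
   n is the lane count (2, 4, or 8 for 128/256/512-bit vectors). */
static void mask_reduce_pd_model(double *dst, const double *src, uint8_t k,
                                 const double *a, unsigned m, int n)
{
    assert(n >= 0 && n <= 8);
    for (int j = 0; j < n; j++)
        dst[j] = ((k >> j) & 1) ? reduce_argument_pd_model(a[j], m) : src[j];
}
```

The zeromask form described in the neighboring entries corresponds to passing a `src` array of zeros. With `m = 2`, the input 1.3125 is rounded to 1.25 (two fraction bits kept), so the reduced argument is 0.0625.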
Floating Point AVX512VL AVX512DQ Miscellaneous Extract the reduced argument of packed single-precision (32-bit) floating-point elements in "a" by the number of bits specified by "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [round_imm_note] DEFINE ReduceArgumentPS(src1[31:0], imm8[7:0]) { m[31:0] := FP32(imm8[7:4]) // number of fraction bits after the binary point to be preserved tmp[31:0] := POW(FP32(2.0), -m) * ROUND(POW(FP32(2.0), m) * src1[31:0], imm8[3:0]) tmp[31:0] := src1[31:0] - tmp[31:0] IF IsInf(tmp[31:0]) tmp[31:0] := FP32(0.0) FI RETURN tmp[31:0] } FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := ReduceArgumentPS(a[i+31:i], imm8[7:0]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512DQ Miscellaneous Extract the reduced argument of packed single-precision (32-bit) floating-point elements in "a" by the number of bits specified by "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [round_imm_note] DEFINE ReduceArgumentPS(src1[31:0], imm8[7:0]) { m[31:0] := FP32(imm8[7:4]) // number of fraction bits after the binary point to be preserved tmp[31:0] := POW(FP32(2.0), -m) * ROUND(POW(FP32(2.0), m) * src1[31:0], imm8[3:0]) tmp[31:0] := src1[31:0] - tmp[31:0] IF IsInf(tmp[31:0]) tmp[31:0] := FP32(0.0) FI RETURN tmp[31:0] } FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := ReduceArgumentPS(a[i+31:i], imm8[7:0]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512DQ Miscellaneous Extract the reduced argument of packed single-precision (32-bit) floating-point elements in "a" by the number of bits specified by "imm8", and store the results in "dst". [round_imm_note] DEFINE ReduceArgumentPS(src1[31:0], imm8[7:0]) { m[31:0] := FP32(imm8[7:4]) // number of fraction bits after the binary point to be preserved tmp[31:0] := POW(FP32(2.0), -m) * ROUND(POW(FP32(2.0), m) * src1[31:0], imm8[3:0]) tmp[31:0] := src1[31:0] - tmp[31:0] IF IsInf(tmp[31:0]) tmp[31:0] := FP32(0.0) FI RETURN tmp[31:0] } FOR j := 0 to 7 i := j*32 dst[i+31:i] := ReduceArgumentPS(a[i+31:i], imm8[7:0]) ENDFOR dst[MAX:256] := 0
Floating Point AVX512DQ Miscellaneous Extract the reduced argument of packed single-precision (32-bit) floating-point elements in "a" by the number of bits specified by "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [round_imm_note] DEFINE ReduceArgumentPS(src1[31:0], imm8[7:0]) { m[31:0] := FP32(imm8[7:4]) // number of fraction bits after the binary point to be preserved tmp[31:0] := POW(FP32(2.0), -m) * ROUND(POW(FP32(2.0), m) * src1[31:0], imm8[3:0]) tmp[31:0] := src1[31:0] - tmp[31:0] IF IsInf(tmp[31:0]) tmp[31:0] := FP32(0.0) FI RETURN tmp[31:0] } FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := ReduceArgumentPS(a[i+31:i], imm8[7:0]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512DQ Miscellaneous Extract the reduced argument of packed single-precision (32-bit) floating-point elements in "a" by the number of bits specified by "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [round_imm_note][sae_note] DEFINE ReduceArgumentPS(src1[31:0], imm8[7:0]) { m[31:0] := FP32(imm8[7:4]) // number of fraction bits after the binary point to be preserved tmp[31:0] := POW(FP32(2.0), -m) * ROUND(POW(FP32(2.0), m) * src1[31:0], imm8[3:0]) tmp[31:0] := src1[31:0] - tmp[31:0] IF IsInf(tmp[31:0]) tmp[31:0] := FP32(0.0) FI RETURN tmp[31:0] } FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := ReduceArgumentPS(a[i+31:i], imm8[7:0]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512DQ Miscellaneous Extract the reduced argument of packed single-precision (32-bit) floating-point elements in "a" by the number of bits specified by "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [round_imm_note] DEFINE ReduceArgumentPS(src1[31:0], imm8[7:0]) { m[31:0] := FP32(imm8[7:4]) // number of fraction bits after the binary point to be preserved tmp[31:0] := POW(FP32(2.0), -m) * ROUND(POW(FP32(2.0), m) * src1[31:0], imm8[3:0]) tmp[31:0] := src1[31:0] - tmp[31:0] IF IsInf(tmp[31:0]) tmp[31:0] := FP32(0.0) FI RETURN tmp[31:0] } FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := ReduceArgumentPS(a[i+31:i], imm8[7:0]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512DQ Miscellaneous Extract the reduced argument of packed single-precision (32-bit) floating-point elements in "a" by the number of bits specified by "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [round_imm_note][sae_note] DEFINE ReduceArgumentPS(src1[31:0], imm8[7:0]) { m[31:0] := FP32(imm8[7:4]) // number of fraction bits after the binary point to be preserved tmp[31:0] := POW(FP32(2.0), -m) * ROUND(POW(FP32(2.0), m) * src1[31:0], imm8[3:0]) tmp[31:0] := src1[31:0] - tmp[31:0] IF IsInf(tmp[31:0]) tmp[31:0] := FP32(0.0) FI RETURN tmp[31:0] } FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := ReduceArgumentPS(a[i+31:i], imm8[7:0]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512DQ Miscellaneous Extract the reduced argument of packed single-precision (32-bit) floating-point elements in "a" by the number of bits specified by "imm8", and store the results in "dst". [round_imm_note] DEFINE ReduceArgumentPS(src1[31:0], imm8[7:0]) { m[31:0] := FP32(imm8[7:4]) // number of fraction bits after the binary point to be preserved tmp[31:0] := POW(FP32(2.0), -m) * ROUND(POW(FP32(2.0), m) * src1[31:0], imm8[3:0]) tmp[31:0] := src1[31:0] - tmp[31:0] IF IsInf(tmp[31:0]) tmp[31:0] := FP32(0.0) FI RETURN tmp[31:0] } FOR j := 0 to 15 i := j*32 dst[i+31:i] := ReduceArgumentPS(a[i+31:i], imm8[7:0]) ENDFOR dst[MAX:512] := 0
Floating Point AVX512DQ Miscellaneous Extract the reduced argument of packed single-precision (32-bit) floating-point elements in "a" by the number of bits specified by "imm8", and store the results in "dst". [round_imm_note][sae_note] DEFINE ReduceArgumentPS(src1[31:0], imm8[7:0]) { m[31:0] := FP32(imm8[7:4]) // number of fraction bits after the binary point to be preserved tmp[31:0] := POW(FP32(2.0), -m) * ROUND(POW(FP32(2.0), m) * src1[31:0], imm8[3:0]) tmp[31:0] := src1[31:0] - tmp[31:0] IF IsInf(tmp[31:0]) tmp[31:0] := FP32(0.0) FI RETURN tmp[31:0] } FOR j := 0 to 15 i := j*32 dst[i+31:i] := ReduceArgumentPS(a[i+31:i], imm8[7:0]) ENDFOR dst[MAX:512] := 0
Floating Point AVX512VL AVX512DQ Miscellaneous Extract the reduced argument of packed single-precision (32-bit) floating-point elements in "a" by the number of bits specified by "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [round_imm_note] DEFINE ReduceArgumentPS(src1[31:0], imm8[7:0]) { m[31:0] := FP32(imm8[7:4]) // number of fraction bits after the binary point to be preserved tmp[31:0] := POW(FP32(2.0), -m) * ROUND(POW(FP32(2.0), m) * src1[31:0], imm8[3:0]) tmp[31:0] := src1[31:0] - tmp[31:0] IF IsInf(tmp[31:0]) tmp[31:0] := FP32(0.0) FI RETURN tmp[31:0] } FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := ReduceArgumentPS(a[i+31:i], imm8[7:0]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512DQ Miscellaneous Extract the reduced argument of packed single-precision (32-bit) floating-point elements in "a" by the number of bits specified by "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [round_imm_note] DEFINE ReduceArgumentPS(src1[31:0], imm8[7:0]) { m[31:0] := FP32(imm8[7:4]) // number of fraction bits after the binary point to be preserved tmp[31:0] := POW(FP32(2.0), -m) * ROUND(POW(FP32(2.0), m) * src1[31:0], imm8[3:0]) tmp[31:0] := src1[31:0] - tmp[31:0] IF IsInf(tmp[31:0]) tmp[31:0] := FP32(0.0) FI RETURN tmp[31:0] } FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := ReduceArgumentPS(a[i+31:i], imm8[7:0]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512DQ Miscellaneous Extract the reduced argument of packed single-precision (32-bit) floating-point elements in "a" by the number of bits specified by "imm8", and store the results in "dst". [round_imm_note] DEFINE ReduceArgumentPS(src1[31:0], imm8[7:0]) { m[31:0] := FP32(imm8[7:4]) // number of fraction bits after the binary point to be preserved tmp[31:0] := POW(FP32(2.0), -m) * ROUND(POW(FP32(2.0), m) * src1[31:0], imm8[3:0]) tmp[31:0] := src1[31:0] - tmp[31:0] IF IsInf(tmp[31:0]) tmp[31:0] := FP32(0.0) FI RETURN tmp[31:0] } FOR j := 0 to 3 i := j*32 dst[i+31:i] := ReduceArgumentPS(a[i+31:i], imm8[7:0]) ENDFOR dst[MAX:128] := 0
Floating Point AVX512DQ Miscellaneous Extract the reduced argument of the lower double-precision (64-bit) floating-point element in "b" by the number of bits specified by "imm8", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". [round_imm_note] DEFINE ReduceArgumentPD(src1[63:0], imm8[7:0]) { m[63:0] := FP64(imm8[7:4]) // number of fraction bits after the binary point to be preserved tmp[63:0] := POW(2.0, -m) * ROUND(POW(2.0, m) * src1[63:0], imm8[3:0]) tmp[63:0] := src1[63:0] - tmp[63:0] IF IsInf(tmp[63:0]) tmp[63:0] := FP64(0.0) FI RETURN tmp[63:0] } IF k[0] dst[63:0] := ReduceArgumentPD(b[63:0], imm8[7:0]) ELSE dst[63:0] := src[63:0] FI dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point AVX512DQ Miscellaneous Extract the reduced argument of the lower double-precision (64-bit) floating-point element in "b" by the number of bits specified by "imm8", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". [round_imm_note][sae_note] DEFINE ReduceArgumentPD(src1[63:0], imm8[7:0]) { m[63:0] := FP64(imm8[7:4]) // number of fraction bits after the binary point to be preserved tmp[63:0] := POW(2.0, -m) * ROUND(POW(2.0, m) * src1[63:0], imm8[3:0]) tmp[63:0] := src1[63:0] - tmp[63:0] IF IsInf(tmp[63:0]) tmp[63:0] := FP64(0.0) FI RETURN tmp[63:0] } IF k[0] dst[63:0] := ReduceArgumentPD(b[63:0], imm8[7:0]) ELSE dst[63:0] := src[63:0] FI dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point AVX512DQ Miscellaneous Extract the reduced argument of the lower double-precision (64-bit) floating-point element in "b" by the number of bits specified by "imm8", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". [round_imm_note] DEFINE ReduceArgumentPD(src1[63:0], imm8[7:0]) { m[63:0] := FP64(imm8[7:4]) // number of fraction bits after the binary point to be preserved tmp[63:0] := POW(2.0, -m) * ROUND(POW(2.0, m) * src1[63:0], imm8[3:0]) tmp[63:0] := src1[63:0] - tmp[63:0] IF IsInf(tmp[63:0]) tmp[63:0] := FP64(0.0) FI RETURN tmp[63:0] } IF k[0] dst[63:0] := ReduceArgumentPD(b[63:0], imm8[7:0]) ELSE dst[63:0] := 0 FI dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point AVX512DQ Miscellaneous Extract the reduced argument of the lower double-precision (64-bit) floating-point element in "b" by the number of bits specified by "imm8", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". [round_imm_note][sae_note] DEFINE ReduceArgumentPD(src1[63:0], imm8[7:0]) { m[63:0] := FP64(imm8[7:4]) // number of fraction bits after the binary point to be preserved tmp[63:0] := POW(2.0, -m) * ROUND(POW(2.0, m) * src1[63:0], imm8[3:0]) tmp[63:0] := src1[63:0] - tmp[63:0] IF IsInf(tmp[63:0]) tmp[63:0] := FP64(0.0) FI RETURN tmp[63:0] } IF k[0] dst[63:0] := ReduceArgumentPD(b[63:0], imm8[7:0]) ELSE dst[63:0] := 0 FI dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point AVX512DQ Miscellaneous Extract the reduced argument of the lower double-precision (64-bit) floating-point element in "b" by the number of bits specified by "imm8", store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst". [round_imm_note] DEFINE ReduceArgumentPD(src1[63:0], imm8[7:0]) { m[63:0] := FP64(imm8[7:4]) // number of fraction bits after the binary point to be preserved tmp[63:0] := POW(2.0, -m) * ROUND(POW(2.0, m) * src1[63:0], imm8[3:0]) tmp[63:0] := src1[63:0] - tmp[63:0] IF IsInf(tmp[63:0]) tmp[63:0] := FP64(0.0) FI RETURN tmp[63:0] } dst[63:0] := ReduceArgumentPD(b[63:0], imm8[7:0]) dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point AVX512DQ Miscellaneous Extract the reduced argument of the lower double-precision (64-bit) floating-point element in "b" by the number of bits specified by "imm8", store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst". [round_imm_note][sae_note] DEFINE ReduceArgumentPD(src1[63:0], imm8[7:0]) { m[63:0] := FP64(imm8[7:4]) // number of fraction bits after the binary point to be preserved tmp[63:0] := POW(2.0, -m) * ROUND(POW(2.0, m) * src1[63:0], imm8[3:0]) tmp[63:0] := src1[63:0] - tmp[63:0] IF IsInf(tmp[63:0]) tmp[63:0] := FP64(0.0) FI RETURN tmp[63:0] } dst[63:0] := ReduceArgumentPD(b[63:0], imm8[7:0]) dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point AVX512DQ Miscellaneous Extract the reduced argument of the lower single-precision (32-bit) floating-point element in "b" by the number of bits specified by "imm8", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". [round_imm_note] DEFINE ReduceArgumentPS(src1[31:0], imm8[7:0]) { m[31:0] := FP32(imm8[7:4]) // number of fraction bits after the binary point to be preserved tmp[31:0] := POW(FP32(2.0), -m) * ROUND(POW(FP32(2.0), m) * src1[31:0], imm8[3:0]) tmp[31:0] := src1[31:0] - tmp[31:0] IF IsInf(tmp[31:0]) tmp[31:0] := FP32(0.0) FI RETURN tmp[31:0] } IF k[0] dst[31:0] := ReduceArgumentPS(b[31:0], imm8[7:0]) ELSE dst[31:0] := src[31:0] FI dst[127:32] := a[127:32] dst[MAX:128] := 0
Floating Point AVX512DQ Miscellaneous Extract the reduced argument of the lower single-precision (32-bit) floating-point element in "b" by the number of bits specified by "imm8", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". [round_imm_note][sae_note] DEFINE ReduceArgumentPS(src1[31:0], imm8[7:0]) { m[31:0] := FP32(imm8[7:4]) // number of fraction bits after the binary point to be preserved tmp[31:0] := POW(FP32(2.0), -m) * ROUND(POW(FP32(2.0), m) * src1[31:0], imm8[3:0]) tmp[31:0] := src1[31:0] - tmp[31:0] IF IsInf(tmp[31:0]) tmp[31:0] := FP32(0.0) FI RETURN tmp[31:0] } IF k[0] dst[31:0] := ReduceArgumentPS(b[31:0], imm8[7:0]) ELSE dst[31:0] := src[31:0] FI dst[127:32] := a[127:32] dst[MAX:128] := 0
Floating Point AVX512DQ Miscellaneous Extract the reduced argument of the lower single-precision (32-bit) floating-point element in "b" by the number of bits specified by "imm8", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". [round_imm_note] DEFINE ReduceArgumentPS(src1[31:0], imm8[7:0]) { m[31:0] := FP32(imm8[7:4]) // number of fraction bits after the binary point to be preserved tmp[31:0] := POW(FP32(2.0), -m) * ROUND(POW(FP32(2.0), m) * src1[31:0], imm8[3:0]) tmp[31:0] := src1[31:0] - tmp[31:0] IF IsInf(tmp[31:0]) tmp[31:0] := FP32(0.0) FI RETURN tmp[31:0] } IF k[0] dst[31:0] := ReduceArgumentPS(b[31:0], imm8[7:0]) ELSE dst[31:0] := 0 FI dst[127:32] := a[127:32] dst[MAX:128] := 0
Floating Point AVX512DQ Miscellaneous Extract the reduced argument of the lower single-precision (32-bit) floating-point element in "b" by the number of bits specified by "imm8", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". [round_imm_note][sae_note] DEFINE ReduceArgumentPS(src1[31:0], imm8[7:0]) { m[31:0] := FP32(imm8[7:4]) // number of fraction bits after the binary point to be preserved tmp[31:0] := POW(FP32(2.0), -m) * ROUND(POW(FP32(2.0), m) * src1[31:0], imm8[3:0]) tmp[31:0] := src1[31:0] - tmp[31:0] IF IsInf(tmp[31:0]) tmp[31:0] := FP32(0.0) FI RETURN tmp[31:0] } IF k[0] dst[31:0] := ReduceArgumentPS(b[31:0], imm8[7:0]) ELSE dst[31:0] := 0 FI dst[127:32] := a[127:32] dst[MAX:128] := 0
Floating Point AVX512DQ Miscellaneous Extract the reduced argument of the lower single-precision (32-bit) floating-point element in "b" by the number of bits specified by "imm8", store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". [round_imm_note] DEFINE ReduceArgumentPS(src1[31:0], imm8[7:0]) { m[31:0] := FP32(imm8[7:4]) // number of fraction bits after the binary point to be preserved tmp[31:0] := POW(FP32(2.0), -m) * ROUND(POW(FP32(2.0), m) * src1[31:0], imm8[3:0]) tmp[31:0] := src1[31:0] - tmp[31:0] IF IsInf(tmp[31:0]) tmp[31:0] := FP32(0.0) FI RETURN tmp[31:0] } dst[31:0] := ReduceArgumentPS(b[31:0], imm8[7:0]) dst[127:32] := a[127:32] dst[MAX:128] := 0
Floating Point AVX512DQ Miscellaneous Extract the reduced argument of the lower single-precision (32-bit) floating-point element in "b" by the number of bits specified by "imm8", store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". [round_imm_note][sae_note] DEFINE ReduceArgumentPS(src1[31:0], imm8[7:0]) { m[31:0] := FP32(imm8[7:4]) // number of fraction bits after the binary point to be preserved tmp[31:0] := POW(FP32(2.0), -m) * ROUND(POW(FP32(2.0), m) * src1[31:0], imm8[3:0]) tmp[31:0] := src1[31:0] - tmp[31:0] IF IsInf(tmp[31:0]) tmp[31:0] := FP32(0.0) FI RETURN tmp[31:0] } dst[31:0] := ReduceArgumentPS(b[31:0], imm8[7:0]) dst[127:32] := a[127:32] dst[MAX:128] := 0
Floating Point AVX512VL AVX512DQ Logical Compute the bitwise XOR of packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := a[i+63:i] XOR b[i+63:i] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512DQ Logical Compute the bitwise XOR of packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := a[i+63:i] XOR b[i+63:i] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512DQ Logical Compute the bitwise XOR of packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := a[i+63:i] XOR b[i+63:i] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512DQ Logical Compute the bitwise XOR of packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := a[i+63:i] XOR b[i+63:i] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512DQ Logical Compute the bitwise XOR of packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst". FOR j := 0 to 7 i := j*64 dst[i+63:i] := a[i+63:i] XOR b[i+63:i] ENDFOR dst[MAX:512] := 0
Floating Point AVX512VL AVX512DQ Logical Compute the bitwise XOR of packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := a[i+63:i] XOR b[i+63:i] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512DQ Logical Compute the bitwise XOR of packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := a[i+63:i] XOR b[i+63:i] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512DQ Logical Compute the bitwise XOR of packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := a[i+31:i] XOR b[i+31:i] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512DQ Logical Compute the bitwise XOR of packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := a[i+31:i] XOR b[i+31:i] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512DQ Logical Compute the bitwise XOR of packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := a[i+31:i] XOR b[i+31:i] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512DQ Logical Compute the bitwise XOR of packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := a[i+31:i] XOR b[i+31:i] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512DQ Logical Compute the bitwise XOR of packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst". FOR j := 0 to 15 i := j*32 dst[i+31:i] := a[i+31:i] XOR b[i+31:i] ENDFOR dst[MAX:512] := 0
Floating Point AVX512VL AVX512DQ Logical Compute the bitwise XOR of packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := a[i+31:i] XOR b[i+31:i] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512DQ Logical Compute the bitwise XOR of packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := a[i+31:i] XOR b[i+31:i] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:128] := 0
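The writemask and zeromask XOR variants above differ only in what an unselected element receives. A minimal Python sketch, assuming vectors are modeled as lists of 32-bit integers holding the raw float bit patterns; the helper names `f32_bits`, `bits_f32`, `mask_xor`, and `maskz_xor` are hypothetical and are not the intrinsic names:

```python
import struct


def f32_bits(x: float) -> int:
    """Raw 32-bit pattern of a single-precision float."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_f32(u: int) -> float:
    """Single-precision float for a raw 32-bit pattern."""
    return struct.unpack("<f", struct.pack("<I", u))[0]


def mask_xor(src, k, a, b):
    """Writemask form: element j is a[j] XOR b[j] if bit j of k is set, else src[j]."""
    return [(x ^ y) if (k >> j) & 1 else s
            for j, (s, x, y) in enumerate(zip(src, a, b))]


def maskz_xor(k, a, b):
    """Zeromask form: unselected elements become zero."""
    return mask_xor([0] * len(a), k, a, b)
```

A common use of the XOR form is sign flipping: XORing each selected element with 0x80000000 negates it and leaves the other bits untouched.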
Mask AVX512DQ Mask Add 8-bit masks in "a" and "b", and store the result in "k". k[7:0] := a[7:0] + b[7:0] k[MAX:8] := 0
Mask AVX512DQ Mask Add 16-bit masks in "a" and "b", and store the result in "k". k[15:0] := a[15:0] + b[15:0] k[MAX:16] := 0
Mask AVX512DQ Mask Compute the bitwise AND of 8-bit masks "a" and "b", and store the result in "k". k[7:0] := a[7:0] AND b[7:0] k[MAX:8] := 0
Mask AVX512DQ Mask Compute the bitwise NOT of 8-bit mask "a" and then AND with "b", and store the result in "k". k[7:0] := (NOT a[7:0]) AND b[7:0] k[MAX:8] := 0
Mask AVX512DQ Mask Compute the bitwise NOT of 8-bit mask "a", and store the result in "k". k[7:0] := NOT a[7:0] k[MAX:8] := 0
Mask AVX512DQ Mask Compute the bitwise OR of 8-bit masks "a" and "b", and store the result in "k". k[7:0] := a[7:0] OR b[7:0] k[MAX:8] := 0
Mask AVX512DQ Mask Compute the bitwise XNOR of 8-bit masks "a" and "b", and store the result in "k". k[7:0] := NOT (a[7:0] XOR b[7:0]) k[MAX:8] := 0
Mask AVX512DQ Mask Compute the bitwise XOR of 8-bit masks "a" and "b", and store the result in "k". k[7:0] := a[7:0] XOR b[7:0] k[MAX:8] := 0
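The 8-bit mask operations above can be sketched in Python as plain integer arithmetic. The function names are hypothetical. Because Python integers are unbounded, every result is truncated to 8 bits explicitly; this matters for the add (which wraps) and for NOT and XNOR (which would otherwise produce negative numbers).

```python
MASK8 = 0xFF  # keep only k[7:0]; the upper bits k[MAX:8] are zero


def kadd8(a, b):
    return (a + b) & MASK8      # carry out of bit 7 is discarded


def kand8(a, b):
    return (a & b) & MASK8


def kandn8(a, b):
    return (~a & b) & MASK8     # (NOT a) AND b


def knot8(a):
    return ~a & MASK8


def kor8(a, b):
    return (a | b) & MASK8


def kxnor8(a, b):
    return ~(a ^ b) & MASK8


def kxor8(a, b):
    return (a ^ b) & MASK8
```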
Mask AVX512DQ Mask Shift the bits of 8-bit mask "a" left by "count" while shifting in zeros, and store the least significant 8 bits of the result in "k". k[MAX:0] := 0 IF count[7:0] <= 7 k[7:0] := a[7:0] << count[7:0] FI
Mask AVX512DQ Mask Shift the bits of 8-bit mask "a" right by "count" while shifting in zeros, and store the least significant 8 bits of the result in "k". k[MAX:0] := 0 IF count[7:0] <= 7 k[7:0] := a[7:0] >> count[7:0] FI
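The two shift operations above clear "k" first and only write a shifted value when the count is at most 7, so any larger count yields zero. A small Python sketch with hypothetical function names:

```python
def kshiftl8(a: int, count: int) -> int:
    """Shift an 8-bit mask left; counts above 7 give 0."""
    count &= 0xFF                       # only count[7:0] is examined
    if count <= 7:
        return (a << count) & 0xFF      # keep the least significant 8 bits
    return 0


def kshiftr8(a: int, count: int) -> int:
    """Shift an 8-bit mask right, shifting in zeros; counts above 7 give 0."""
    count &= 0xFF
    if count <= 7:
        return (a & 0xFF) >> count
    return 0
```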
Mask AVX512DQ Load Load 8-bit mask from memory into "k". k[7:0] := MEM[mem_addr+7:mem_addr]
Mask AVX512DQ Store Store 8-bit mask from "a" into memory. MEM[mem_addr+7:mem_addr] := a[7:0]
Mask AVX512DQ Mask Compute the bitwise OR of 8-bit masks "a" and "b". If the result is all zeros, store 1 in "dst", otherwise store 0 in "dst". If the result is all ones, store 1 in "all_ones", otherwise store 0 in "all_ones". tmp[7:0] := a[7:0] OR b[7:0] IF tmp[7:0] == 0x0 dst := 1 ELSE dst := 0 FI IF tmp[7:0] == 0xFF MEM[all_ones+7:all_ones] := 1 ELSE MEM[all_ones+7:all_ones] := 0 FI
Mask AVX512DQ Mask Compute the bitwise OR of 8-bit masks "a" and "b". If the result is all zeros, store 1 in "dst", otherwise store 0 in "dst". tmp[7:0] := a[7:0] OR b[7:0] IF tmp[7:0] == 0x0 dst := 1 ELSE dst := 0 FI
Mask AVX512DQ Mask Compute the bitwise OR of 8-bit masks "a" and "b". If the result is all ones, store 1 in "dst", otherwise store 0 in "dst". tmp[7:0] := a[7:0] OR b[7:0] IF tmp[7:0] == 0xFF dst := 1 ELSE dst := 0 FI
Mask AVX512DQ Mask Compute the bitwise AND of 8-bit masks "a" and "b", and if the result is all zeros, store 1 in "dst", otherwise store 0 in "dst". Compute the bitwise NOT of "a" and then AND with "b", and if the result is all zeros, store 1 in "and_not", otherwise store 0 in "and_not". tmp1[7:0] := a[7:0] AND b[7:0] IF tmp1[7:0] == 0x0 dst := 1 ELSE dst := 0 FI tmp2[7:0] := (NOT a[7:0]) AND b[7:0] IF tmp2[7:0] == 0x0 MEM[and_not+7:and_not] := 1 ELSE MEM[and_not+7:and_not] := 0 FI
Mask AVX512DQ Mask Compute the bitwise AND of 8-bit masks "a" and "b", and if the result is all zeros, store 1 in "dst", otherwise store 0 in "dst". tmp[7:0] := a[7:0] AND b[7:0] IF tmp[7:0] == 0x0 dst := 1 ELSE dst := 0 FI
Mask AVX512DQ Mask Compute the bitwise NOT of 8-bit mask "a" and then AND with "b", and if the result is all zeros, store 1 in "dst", otherwise store 0 in "dst". tmp[7:0] := (NOT a[7:0]) AND b[7:0] IF tmp[7:0] == 0x0 dst := 1 ELSE dst := 0 FI
Mask AVX512DQ Mask Compute the bitwise AND of 16-bit masks "a" and "b", and if the result is all zeros, store 1 in "dst", otherwise store 0 in "dst". Compute the bitwise NOT of "a" and then AND with "b", and if the result is all zeros, store 1 in "and_not", otherwise store 0 in "and_not". tmp1[15:0] := a[15:0] AND b[15:0] IF tmp1[15:0] == 0x0 dst := 1 ELSE dst := 0 FI tmp2[15:0] := (NOT a[15:0]) AND b[15:0] IF tmp2[15:0] == 0x0 MEM[and_not+7:and_not] := 1 ELSE MEM[and_not+7:and_not] := 0 FI
Mask AVX512DQ Mask Compute the bitwise AND of 16-bit masks "a" and "b", and if the result is all zeros, store 1 in "dst", otherwise store 0 in "dst". tmp[15:0] := a[15:0] AND b[15:0] IF tmp[15:0] == 0x0 dst := 1 ELSE dst := 0 FI
Mask AVX512DQ Mask Compute the bitwise NOT of 16-bit mask "a" and then AND with "b", and if the result is all zeros, store 1 in "dst", otherwise store 0 in "dst". tmp[15:0] := (NOT a[15:0]) AND b[15:0] IF tmp[15:0] == 0x0 dst := 1 ELSE dst := 0 FI
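The OR-test and AND-test operations above each produce two one-bit results from a pair of masks. A Python sketch with hypothetical function names; each returns both results as a tuple instead of writing the second one through a pointer such as "all_ones" or "and_not", and the mask width is a parameter so the same code covers the 8-bit and 16-bit forms:

```python
def kortest(a: int, b: int, width: int = 8):
    """Return (all_zeros, all_ones) for a OR b over the given mask width."""
    full = (1 << width) - 1
    tmp = (a | b) & full
    return (1 if tmp == 0 else 0, 1 if tmp == full else 0)


def ktest(a: int, b: int, width: int = 8):
    """Return (and_is_zero, andn_is_zero) for a AND b and (NOT a) AND b."""
    full = (1 << width) - 1
    tmp1 = (a & b) & full
    tmp2 = (~a & b) & full
    return (1 if tmp1 == 0 else 0, 1 if tmp2 == 0 else 0)
```

In these terms, `ktest(a, b)[1] == 1` means every bit set in "b" is also set in "a".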
AVX512DQ Mask Convert 8-bit mask "a" into an integer value, and store the result in "dst". dst := ZeroExtend32(a[7:0])
AVX512DQ Mask Convert integer value "a" into an 8-bit mask, and store the result in "k". k := a[7:0]
Floating Point AVX512ER Elementary Math Functions Compute the approximate exponential value of 2 raised to the power of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst". The maximum relative error for this approximation is less than 2^-23. [sae_note] FOR j := 0 to 15 i := j*32 dst[i+31:i] := POW(FP32(2.0), a[i+31:i]) ENDFOR dst[MAX:512] := 0
Floating Point AVX512ER Elementary Math Functions Compute the approximate exponential value of 2 raised to the power of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst". The maximum relative error for this approximation is less than 2^-23. FOR j := 0 to 15 i := j*32 dst[i+31:i] := POW(FP32(2.0), a[i+31:i]) ENDFOR dst[MAX:512] := 0
Floating Point AVX512ER Elementary Math Functions Compute the approximate exponential value of 2 raised to the power of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-23. [sae_note] FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := POW(FP32(2.0), a[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512ER Elementary Math Functions Compute the approximate exponential value of 2 raised to the power of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-23. FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := POW(FP32(2.0), a[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512ER Elementary Math Functions Compute the approximate exponential value of 2 raised to the power of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-23. [sae_note] FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := POW(FP32(2.0), a[i+31:i]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512ER Elementary Math Functions Compute the approximate exponential value of 2 raised to the power of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-23. FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := POW(FP32(2.0), a[i+31:i]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512ER Elementary Math Functions Compute the approximate exponential value of 2 raised to the power of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst". The maximum relative error for this approximation is less than 2^-23. [sae_note] FOR j := 0 to 7 i := j*64 dst[i+63:i] := POW(2.0, a[i+63:i]) ENDFOR dst[MAX:512] := 0
Floating Point AVX512ER Elementary Math Functions Compute the approximate exponential value of 2 raised to the power of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst". The maximum relative error for this approximation is less than 2^-23. FOR j := 0 to 7 i := j*64 dst[i+63:i] := POW(2.0, a[i+63:i]) ENDFOR dst[MAX:512] := 0
Floating Point AVX512ER Elementary Math Functions Compute the approximate exponential value of 2 raised to the power of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-23. [sae_note] FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := POW(2.0, a[i+63:i]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512ER Elementary Math Functions Compute the approximate exponential value of 2 raised to the power of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-23. FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := POW(2.0, a[i+63:i]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512ER Elementary Math Functions Compute the approximate exponential value of 2 raised to the power of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-23. [sae_note] FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := POW(2.0, a[i+63:i]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512ER Elementary Math Functions Compute the approximate exponential value of 2 raised to the power of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-23. FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := POW(2.0, a[i+63:i]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
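The masked 2^x operations above can be sketched in Python as follows. The names `mask_exp2`, `maskz_exp2`, and `within_relative_error` are hypothetical. Python's exact `2.0 ** x` stands in for the hardware approximation, so this models only the masking behaviour and the form of the stated error bound, not the approximation itself; very large exponents raise OverflowError in Python and are not handled.

```python
def mask_exp2(src, k, a):
    """Writemask form: element j is 2**a[j] if bit j of k is set, else src[j]."""
    return [2.0 ** x if (k >> j) & 1 else s
            for j, (s, x) in enumerate(zip(src, a))]


def maskz_exp2(k, a):
    """Zeromask form: unselected elements become 0.0."""
    return mask_exp2([0.0] * len(a), k, a)


def within_relative_error(approx, exact, bound=2.0 ** -23):
    """True if |approx - exact| / |exact| is below the bound (exact != 0)."""
    return abs(approx - exact) < bound * abs(exact)
```

The helper `within_relative_error` expresses the documented guarantee: a result `r` for input `x` should satisfy `within_relative_error(r, 2.0 ** x)`.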
Floating Point AVX512ER Elementary Math Functions Compute the approximate reciprocal of the lower double-precision (64-bit) floating-point element in "b", store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst". The maximum relative error for this approximation is less than 2^-28. [sae_note] dst[63:0] := (1.0 / b[63:0]) dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point AVX512ER Elementary Math Functions Compute the approximate reciprocal of the lower double-precision (64-bit) floating-point element in "b", store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst". The maximum relative error for this approximation is less than 2^-28. dst[63:0] := (1.0 / b[63:0]) dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point AVX512ER Elementary Math Functions Compute the approximate reciprocal of the lower double-precision (64-bit) floating-point element in "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". The maximum relative error for this approximation is less than 2^-28. [sae_note] IF k[0] dst[63:0] := (1.0 / b[63:0]) ELSE dst[63:0] := src[63:0] FI dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point AVX512ER Elementary Math Functions Compute the approximate reciprocal of the lower double-precision (64-bit) floating-point element in "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". The maximum relative error for this approximation is less than 2^-28. IF k[0] dst[63:0] := (1.0 / b[63:0]) ELSE dst[63:0] := src[63:0] FI dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point AVX512ER Elementary Math Functions Compute the approximate reciprocal of the lower double-precision (64-bit) floating-point element in "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". The maximum relative error for this approximation is less than 2^-28. [sae_note] IF k[0] dst[63:0] := (1.0 / b[63:0]) ELSE dst[63:0] := 0 FI dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point AVX512ER Elementary Math Functions Compute the approximate reciprocal of the lower double-precision (64-bit) floating-point element in "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". The maximum relative error for this approximation is less than 2^-28. IF k[0] dst[63:0] := (1.0 / b[63:0]) ELSE dst[63:0] := 0 FI dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point AVX512ER Elementary Math Functions Compute the approximate reciprocal of the lower single-precision (32-bit) floating-point element in "b", store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". The maximum relative error for this approximation is less than 2^-28. [sae_note] dst[31:0] := (1.0 / b[31:0]) dst[127:32] := a[127:32] dst[MAX:128] := 0
Floating Point AVX512ER Elementary Math Functions Compute the approximate reciprocal of the lower single-precision (32-bit) floating-point element in "b", store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". The maximum relative error for this approximation is less than 2^-28. dst[31:0] := (1.0 / b[31:0]) dst[127:32] := a[127:32] dst[MAX:128] := 0
Floating Point AVX512ER Elementary Math Functions Compute the approximate reciprocal of the lower single-precision (32-bit) floating-point element in "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". The maximum relative error for this approximation is less than 2^-28. [sae_note] IF k[0] dst[31:0] := (1.0 / b[31:0]) ELSE dst[31:0] := src[31:0] FI dst[127:32] := a[127:32] dst[MAX:128] := 0
Floating Point AVX512ER Elementary Math Functions Compute the approximate reciprocal of the lower single-precision (32-bit) floating-point element in "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". The maximum relative error for this approximation is less than 2^-28. IF k[0] dst[31:0] := (1.0 / b[31:0]) ELSE dst[31:0] := src[31:0] FI dst[127:32] := a[127:32] dst[MAX:128] := 0
Floating Point AVX512ER Elementary Math Functions Compute the approximate reciprocal of the lower single-precision (32-bit) floating-point element in "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". The maximum relative error for this approximation is less than 2^-28. [sae_note] IF k[0] dst[31:0] := (1.0 / b[31:0]) ELSE dst[31:0] := 0 FI dst[127:32] := a[127:32] dst[MAX:128] := 0
Floating Point AVX512ER Elementary Math Functions Compute the approximate reciprocal of the lower single-precision (32-bit) floating-point element in "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". The maximum relative error for this approximation is less than 2^-28. IF k[0] dst[31:0] := (1.0 / b[31:0]) ELSE dst[31:0] := 0 FI dst[127:32] := a[127:32] dst[MAX:128] := 0
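The scalar reciprocal forms above only compute the lowest element; mask bit 0 decides whether that element is the reciprocal, a copy from "src", or zero, and the upper elements always come from "a". A Python sketch, assuming vectors are lists of floats and using hypothetical function names; exact division stands in for the approximation, and a zero divisor raises ZeroDivisionError here rather than being modeled:

```python
def mask_rcp_lower(src, k, a, b):
    """Writemask form: lower element is 1/b[0] if k[0] is set, else src[0]."""
    lower = 1.0 / b[0] if k & 1 else src[0]
    return [lower] + list(a[1:])    # upper elements are copied from a


def maskz_rcp_lower(k, a, b):
    """Zeromask form: lower element is 1/b[0] if k[0] is set, else 0.0."""
    lower = 1.0 / b[0] if k & 1 else 0.0
    return [lower] + list(a[1:])
```

Only bit 0 of the mask is consulted, so `k = 0b10` behaves the same as `k = 0`.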
Floating Point AVX512ER Elementary Math Functions Compute the approximate reciprocal of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst". The maximum relative error for this approximation is less than 2^-28. [sae_note] FOR j := 0 to 15 i := j*32 dst[i+31:i] := (1.0 / a[i+31:i]) ENDFOR
Floating Point AVX512ER Elementary Math Functions Compute the approximate reciprocal of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst". The maximum relative error for this approximation is less than 2^-28. FOR j := 0 to 15 i := j*32 dst[i+31:i] := (1.0 / a[i+31:i]) ENDFOR
Floating Point AVX512ER Elementary Math Functions Compute the approximate reciprocal of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-28. [sae_note] FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := (1.0 / a[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR
Floating Point AVX512ER Elementary Math Functions Compute the approximate reciprocal of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-28. FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := (1.0 / a[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR
Floating Point AVX512ER Elementary Math Functions Compute the approximate reciprocal of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-28. [sae_note] FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := (1.0 / a[i+31:i]) ELSE dst[i+31:i] := 0 FI ENDFOR
Floating Point AVX512ER Elementary Math Functions Compute the approximate reciprocal of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-28. FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := (1.0 / a[i+31:i]) ELSE dst[i+31:i] := 0 FI ENDFOR
Floating Point AVX512ER Elementary Math Functions Compute the approximate reciprocal of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst". The maximum relative error for this approximation is less than 2^-28. [sae_note] FOR j := 0 to 7 i := j*64 dst[i+63:i] := (1.0 / a[i+63:i]) ENDFOR
Floating Point AVX512ER Elementary Math Functions Compute the approximate reciprocal of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst". The maximum relative error for this approximation is less than 2^-28. FOR j := 0 to 7 i := j*64 dst[i+63:i] := (1.0 / a[i+63:i]) ENDFOR
Floating Point AVX512ER Elementary Math Functions Compute the approximate reciprocal of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-28. [sae_note] FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := (1.0 / a[i+63:i]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR
Floating Point AVX512ER Elementary Math Functions Compute the approximate reciprocal of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-28. FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := (1.0 / a[i+63:i]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR
Floating Point AVX512ER Elementary Math Functions Compute the approximate reciprocal of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-28. [sae_note] FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := (1.0 / a[i+63:i]) ELSE dst[i+63:i] := 0 FI ENDFOR
Floating Point AVX512ER Elementary Math Functions Compute the approximate reciprocal of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-28. FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := (1.0 / a[i+63:i]) ELSE dst[i+63:i] := 0 FI ENDFOR
Floating Point AVX512ER Elementary Math Functions Compute the approximate reciprocal square root of the lower double-precision (64-bit) floating-point element in "b", store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst". The maximum relative error for this approximation is less than 2^-28. [sae_note] dst[63:0] := (1.0 / SQRT(b[63:0])) dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point AVX512ER Elementary Math Functions Compute the approximate reciprocal square root of the lower double-precision (64-bit) floating-point element in "b", store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst". The maximum relative error for this approximation is less than 2^-28. dst[63:0] := (1.0 / SQRT(b[63:0])) dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point AVX512ER Elementary Math Functions Compute the approximate reciprocal square root of the lower double-precision (64-bit) floating-point element in "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". The maximum relative error for this approximation is less than 2^-28. [sae_note] IF k[0] dst[63:0] := (1.0 / SQRT(b[63:0])) ELSE dst[63:0] := src[63:0] FI dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point AVX512ER Elementary Math Functions Compute the approximate reciprocal square root of the lower double-precision (64-bit) floating-point element in "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". The maximum relative error for this approximation is less than 2^-28. IF k[0] dst[63:0] := (1.0 / SQRT(b[63:0])) ELSE dst[63:0] := src[63:0] FI dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point AVX512ER Elementary Math Functions Compute the approximate reciprocal square root of the lower double-precision (64-bit) floating-point element in "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". The maximum relative error for this approximation is less than 2^-28. [sae_note] IF k[0] dst[63:0] := (1.0 / SQRT(b[63:0])) ELSE dst[63:0] := 0 FI dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point AVX512ER Elementary Math Functions Compute the approximate reciprocal square root of the lower double-precision (64-bit) floating-point element in "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". The maximum relative error for this approximation is less than 2^-28. IF k[0] dst[63:0] := (1.0 / SQRT(b[63:0])) ELSE dst[63:0] := 0 FI dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point AVX512ER Elementary Math Functions Compute the approximate reciprocal square root of the lower single-precision (32-bit) floating-point element in "b", store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". The maximum relative error for this approximation is less than 2^-28. [sae_note] dst[31:0] := (1.0 / SQRT(b[31:0])) dst[127:32] := a[127:32] dst[MAX:128] := 0
Floating Point AVX512ER Elementary Math Functions Compute the approximate reciprocal square root of the lower single-precision (32-bit) floating-point element in "b", store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". The maximum relative error for this approximation is less than 2^-28. dst[31:0] := (1.0 / SQRT(b[31:0])) dst[127:32] := a[127:32] dst[MAX:128] := 0
Floating Point AVX512ER Elementary Math Functions Compute the approximate reciprocal square root of the lower single-precision (32-bit) floating-point element in "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". The maximum relative error for this approximation is less than 2^-28. [sae_note] IF k[0] dst[31:0] := (1.0 / SQRT(b[31:0])) ELSE dst[31:0] := src[31:0] FI dst[127:32] := a[127:32] dst[MAX:128] := 0
Floating Point AVX512ER Elementary Math Functions Compute the approximate reciprocal square root of the lower single-precision (32-bit) floating-point element in "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". The maximum relative error for this approximation is less than 2^-28. IF k[0] dst[31:0] := (1.0 / SQRT(b[31:0])) ELSE dst[31:0] := src[31:0] FI dst[127:32] := a[127:32] dst[MAX:128] := 0
Floating Point AVX512ER Elementary Math Functions Compute the approximate reciprocal square root of the lower single-precision (32-bit) floating-point element in "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". The maximum relative error for this approximation is less than 2^-28. [sae_note] IF k[0] dst[31:0] := (1.0 / SQRT(b[31:0])) ELSE dst[31:0] := 0 FI dst[127:32] := a[127:32] dst[MAX:128] := 0
Floating Point AVX512ER Elementary Math Functions Compute the approximate reciprocal square root of the lower single-precision (32-bit) floating-point element in "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". The maximum relative error for this approximation is less than 2^-28. IF k[0] dst[31:0] := (1.0 / SQRT(b[31:0])) ELSE dst[31:0] := 0 FI dst[127:32] := a[127:32] dst[MAX:128] := 0
Floating Point AVX512ER Elementary Math Functions Compute the approximate reciprocal square root of packed single-precision (32-bit) floating-point elements in "a", store the results in "dst". The maximum relative error for this approximation is less than 2^-28. [sae_note] FOR j := 0 to 15 i := j*32 dst[i+31:i] := (1.0 / SQRT(a[i+31:i])) ENDFOR
Floating Point AVX512ER Elementary Math Functions Compute the approximate reciprocal square root of packed single-precision (32-bit) floating-point elements in "a", store the results in "dst". The maximum relative error for this approximation is less than 2^-28. FOR j := 0 to 15 i := j*32 dst[i+31:i] := (1.0 / SQRT(a[i+31:i])) ENDFOR
Floating Point AVX512ER Elementary Math Functions Compute the approximate reciprocal square root of packed single-precision (32-bit) floating-point elements in "a", store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-28. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the "sae" parameter. FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := (1.0 / SQRT(a[i+31:i])) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR
Floating Point AVX512ER Elementary Math Functions Compute the approximate reciprocal square root of packed single-precision (32-bit) floating-point elements in "a", store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-28. FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := (1.0 / SQRT(a[i+31:i])) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR
Floating Point AVX512ER Elementary Math Functions Compute the approximate reciprocal square root of packed single-precision (32-bit) floating-point elements in "a", store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-28. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the "sae" parameter. FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := (1.0 / SQRT(a[i+31:i])) ELSE dst[i+31:i] := 0 FI ENDFOR
Floating Point AVX512ER Elementary Math Functions Compute the approximate reciprocal square root of packed single-precision (32-bit) floating-point elements in "a", store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-28. FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := (1.0 / SQRT(a[i+31:i])) ELSE dst[i+31:i] := 0 FI ENDFOR
Floating Point AVX512ER Elementary Math Functions Compute the approximate reciprocal square root of packed double-precision (64-bit) floating-point elements in "a", store the results in "dst". The maximum relative error for this approximation is less than 2^-28. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the "sae" parameter. FOR j := 0 to 7 i := j*64 dst[i+63:i] := (1.0 / SQRT(a[i+63:i])) ENDFOR
Floating Point AVX512ER Elementary Math Functions Compute the approximate reciprocal square root of packed double-precision (64-bit) floating-point elements in "a", store the results in "dst". The maximum relative error for this approximation is less than 2^-28. FOR j := 0 to 7 i := j*64 dst[i+63:i] := (1.0 / SQRT(a[i+63:i])) ENDFOR
Floating Point AVX512ER Elementary Math Functions Compute the approximate reciprocal square root of packed double-precision (64-bit) floating-point elements in "a", store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-28. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the "sae" parameter. FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := (1.0 / SQRT(a[i+63:i])) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR
Floating Point AVX512ER Elementary Math Functions Compute the approximate reciprocal square root of packed double-precision (64-bit) floating-point elements in "a", store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-28. FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := (1.0 / SQRT(a[i+63:i])) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR
Floating Point AVX512ER Elementary Math Functions Compute the approximate reciprocal square root of packed double-precision (64-bit) floating-point elements in "a", store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-28. Exceptions can be suppressed by passing _MM_FROUND_NO_EXC in the "sae" parameter. FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := (1.0 / SQRT(a[i+63:i])) ELSE dst[i+63:i] := 0 FI ENDFOR
Floating Point AVX512ER Elementary Math Functions Compute the approximate reciprocal square root of packed double-precision (64-bit) floating-point elements in "a", store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-28. FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := (1.0 / SQRT(a[i+63:i])) ELSE dst[i+63:i] := 0 FI ENDFOR
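The packed reciprocal-square-root entries above share one per-element loop and differ only in how inactive elements are filled. The Python sketch below models that loop and also spells out what the stated bound (relative error below 2^-28) means as a check. Both function names are hypothetical, vectors are modelled as Python lists, and the exact `1.0 / sqrt` is only a stand-in for the approximate hardware result.

```python
import math

def rsqrt_packed_model(a, k=None, src=None):
    """Per-element 1/sqrt with optional masking.

    k=None   : unmasked form, every element is computed.
    src=None : zeromask form, inactive elements become 0.0.
    otherwise: writemask form, inactive elements are copied from src.
    """
    out = []
    for j, x in enumerate(a):
        active = True if k is None else bool((k >> j) & 1)
        if active:
            out.append(1.0 / math.sqrt(x))
        elif src is None:
            out.append(0.0)
        else:
            out.append(src[j])
    return out

def within_rsqrt28_bound(approx, x):
    """True if approx is within relative error 2^-28 of 1/sqrt(x)."""
    exact = 1.0 / math.sqrt(x)
    return abs(approx - exact) / abs(exact) < 2.0 ** -28
```

For example, `rsqrt_packed_model([4.0, 16.0, 0.25, 1.0], k=0b0101)` computes elements 0 and 2 and zeroes elements 1 and 3.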
Floating Point AVX512F AVX512VL Arithmetic Add packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := a[i+63:i] + b[i+63:i] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512F AVX512VL Arithmetic Add packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := a[i+63:i] + b[i+63:i] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512F AVX512VL Arithmetic Add packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := a[i+63:i] + b[i+63:i] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512F AVX512VL Arithmetic Add packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := a[i+63:i] + b[i+63:i] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512F AVX512VL Arithmetic Add packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := a[i+31:i] + b[i+31:i] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512F AVX512VL Arithmetic Add packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := a[i+31:i] + b[i+31:i] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512F AVX512VL Arithmetic Add packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := a[i+31:i] + b[i+31:i] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512F AVX512VL Arithmetic Add packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := a[i+31:i] + b[i+31:i] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:128] := 0
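The masked add entries above differ only in element count and in the fill value for inactive elements. A small Python sketch of the writemask and zeromask variants follows; `add_masked_model` is a hypothetical name, and lists of Python floats stand in for the 128-bit and 256-bit vectors.

```python
def add_masked_model(a, b, k, src=None):
    """Element-wise a + b under mask k.

    Bit j of k selects element j. With src=None the inactive elements
    are zeroed (zeromask); otherwise they are copied from src (writemask).
    """
    out = []
    for j, (x, y) in enumerate(zip(a, b)):
        if (k >> j) & 1:
            out.append(x + y)
        elif src is None:
            out.append(0.0)
        else:
            out.append(src[j])
    return out
```

The same function models the 2-, 4- and 8-element forms, since the element count is taken from the length of the inputs.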
Integer AVX512F AVX512VL Miscellaneous Concatenate "a" and "b" into a 64-byte intermediate result, shift the result right by "imm8" 32-bit elements, and store the low 32 bytes (8 elements) in "dst". temp[511:256] := a[255:0] temp[255:0] := b[255:0] temp[511:0] := temp[511:0] >> (32*imm8[2:0]) dst[255:0] := temp[255:0] dst[MAX:256] := 0
Integer AVX512F AVX512VL Miscellaneous Concatenate "a" and "b" into a 64-byte intermediate result, shift the result right by "imm8" 32-bit elements, and store the low 32 bytes (8 elements) in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). temp[511:256] := a[255:0] temp[255:0] := b[255:0] temp[511:0] := temp[511:0] >> (32*imm8[2:0]) FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := temp[i+31:i] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512F AVX512VL Miscellaneous Concatenate "a" and "b" into a 64-byte intermediate result, shift the result right by "imm8" 32-bit elements, and store the low 32 bytes (8 elements) in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). temp[511:256] := a[255:0] temp[255:0] := b[255:0] temp[511:0] := temp[511:0] >> (32*imm8[2:0]) FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := temp[i+31:i] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512F AVX512VL Miscellaneous Concatenate "a" and "b" into a 32-byte intermediate result, shift the result right by "imm8" 32-bit elements, and store the low 16 bytes (4 elements) in "dst". temp[255:128] := a[127:0] temp[127:0] := b[127:0] temp[255:0] := temp[255:0] >> (32*imm8[1:0]) dst[127:0] := temp[127:0] dst[MAX:128] := 0
Integer AVX512F AVX512VL Miscellaneous Concatenate "a" and "b" into a 32-byte intermediate result, shift the result right by "imm8" 32-bit elements, and store the low 16 bytes (4 elements) in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). temp[255:128] := a[127:0] temp[127:0] := b[127:0] temp[255:0] := temp[255:0] >> (32*imm8[1:0]) FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := temp[i+31:i] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512F AVX512VL Miscellaneous Concatenate "a" and "b" into a 32-byte intermediate result, shift the result right by "imm8" 32-bit elements, and store the low 16 bytes (4 elements) in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). temp[255:128] := a[127:0] temp[127:0] := b[127:0] temp[255:0] := temp[255:0] >> (32*imm8[1:0]) FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := temp[i+31:i] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512F AVX512VL Miscellaneous Concatenate "a" and "b" into a 64-byte intermediate result, shift the result right by "imm8" 64-bit elements, and store the low 32 bytes (4 elements) in "dst". temp[511:256] := a[255:0] temp[255:0] := b[255:0] temp[511:0] := temp[511:0] >> (64*imm8[1:0]) dst[255:0] := temp[255:0] dst[MAX:256] := 0
Integer AVX512F AVX512VL Miscellaneous Concatenate "a" and "b" into a 64-byte intermediate result, shift the result right by "imm8" 64-bit elements, and store the low 32 bytes (4 elements) in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). temp[511:256] := a[255:0] temp[255:0] := b[255:0] temp[511:0] := temp[511:0] >> (64*imm8[1:0]) FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := temp[i+63:i] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512F AVX512VL Miscellaneous Concatenate "a" and "b" into a 64-byte intermediate result, shift the result right by "imm8" 64-bit elements, and store the low 32 bytes (4 elements) in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). temp[511:256] := a[255:0] temp[255:0] := b[255:0] temp[511:0] := temp[511:0] >> (64*imm8[1:0]) FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := temp[i+63:i] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512F AVX512VL Miscellaneous Concatenate "a" and "b" into a 32-byte intermediate result, shift the result right by "imm8" 64-bit elements, and store the low 16 bytes (2 elements) in "dst". temp[255:128] := a[127:0] temp[127:0] := b[127:0] temp[255:0] := temp[255:0] >> (64*imm8[0]) dst[127:0] := temp[127:0] dst[MAX:128] := 0
Integer AVX512F AVX512VL Miscellaneous Concatenate "a" and "b" into a 32-byte intermediate result, shift the result right by "imm8" 64-bit elements, and store the low 16 bytes (2 elements) in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). temp[255:128] := a[127:0] temp[127:0] := b[127:0] temp[255:0] := temp[255:0] >> (64*imm8[0]) FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := temp[i+63:i] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512F AVX512VL Miscellaneous Concatenate "a" and "b" into a 32-byte intermediate result, shift the result right by "imm8" 64-bit elements, and store the low 16 bytes (2 elements) in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). temp[255:128] := a[127:0] temp[127:0] := b[127:0] temp[255:0] := temp[255:0] >> (64*imm8[0]) FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := temp[i+63:i] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:128] := 0
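The concatenate-and-shift entries above are easier to follow at element granularity: "b" supplies the low half of the temporary, "a" the high half, and only the low bits of "imm8" select the shift count. The Python sketch below models this under the assumption that vectors are lists with index 0 as the lowest element; `alignr_model` is a hypothetical name.

```python
def alignr_model(a, b, imm8, k=None, src=None):
    """Element-wise model of concatenate, shift right, keep low half.

    len(a) == len(b) must be 2, 4 or 8 (a power of two), so that
    imm8 & (n - 1) matches imm8[0], imm8[1:0] or imm8[2:0].
    """
    n = len(a)
    shift = imm8 & (n - 1)           # higher bits of imm8 are ignored
    temp = list(b) + list(a)         # b is the low half, a the high half
    shifted = temp[shift:shift + n]  # shift right by whole elements
    if k is None:
        return shifted               # unmasked form
    out = []
    for j in range(n):
        if (k >> j) & 1:
            out.append(shifted[j])
        elif src is None:
            out.append(0)            # zeromask
        else:
            out.append(src[j])       # writemask
    return out
```

With `a = [10, 11, 12, 13]` and `b = [0, 1, 2, 3]`, a shift of 1 drops the lowest element of "b" and pulls in the lowest element of "a", giving `[1, 2, 3, 10]`; an "imm8" of 5 gives the same result for 4-element vectors because only two bits are used.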
Floating Point AVX512VL AVX512F Miscellaneous Blend packed double-precision (64-bit) floating-point elements from "a" and "b" using control mask "k", and store the results in "dst". FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := b[i+63:i] ELSE dst[i+63:i] := a[i+63:i] FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Miscellaneous Blend packed double-precision (64-bit) floating-point elements from "a" and "b" using control mask "k", and store the results in "dst". FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := b[i+63:i] ELSE dst[i+63:i] := a[i+63:i] FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Miscellaneous Blend packed single-precision (32-bit) floating-point elements from "a" and "b" using control mask "k", and store the results in "dst". FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := b[i+31:i] ELSE dst[i+31:i] := a[i+31:i] FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Miscellaneous Blend packed single-precision (32-bit) floating-point elements from "a" and "b" using control mask "k", and store the results in "dst". FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := b[i+31:i] ELSE dst[i+31:i] := a[i+31:i] FI ENDFOR dst[MAX:128] := 0
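In the blend entries above, a set mask bit selects the element from "b" and a clear bit selects the element from "a". A minimal Python sketch, with the hypothetical name `mask_blend_model` and lists standing in for vectors:

```python
def mask_blend_model(k, a, b):
    """Pick b[j] where bit j of k is set, otherwise a[j]."""
    return [b[j] if (k >> j) & 1 else a[j] for j in range(len(a))]
```

For instance, `mask_blend_model(0b0110, [1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0])` takes elements 1 and 2 from "b" and the rest from "a".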
Floating Point AVX512VL AVX512F Miscellaneous Broadcast the 4 packed single-precision (32-bit) floating-point elements from "a" to all elements of "dst". FOR j := 0 to 7 i := j*32 n := (j % 4)*32 dst[i+31:i] := a[n+31:n] ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Miscellaneous Broadcast the 4 packed single-precision (32-bit) floating-point elements from "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 n := (j % 4)*32 IF k[j] dst[i+31:i] := a[n+31:n] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Miscellaneous Broadcast the 4 packed single-precision (32-bit) floating-point elements from "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 n := (j % 4)*32 IF k[j] dst[i+31:i] := a[n+31:n] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Miscellaneous Broadcast the 4 packed 32-bit integers from "a" to all elements of "dst". FOR j := 0 to 7 i := j*32 n := (j % 4)*32 dst[i+31:i] := a[n+31:n] ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Miscellaneous Broadcast the 4 packed 32-bit integers from "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 n := (j % 4)*32 IF k[j] dst[i+31:i] := a[n+31:n] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Miscellaneous Broadcast the 4 packed 32-bit integers from "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 n := (j % 4)*32 IF k[j] dst[i+31:i] := a[n+31:n] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:256] := 0
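The 4-element broadcast entries above repeat the source group across the destination using the index `j % 4`. The following Python sketch models that indexing together with the optional mask; `broadcast_x4_model` and its `lanes` parameter are hypothetical names introduced for illustration.

```python
def broadcast_x4_model(a, k=None, src=None, lanes=8):
    """Repeat the 4 elements of a across `lanes` destination elements.

    k=None is the unmasked form; with a mask, src=None zeroes inactive
    elements (zeromask) and any other src supplies them (writemask).
    """
    out = []
    for j in range(lanes):
        active = True if k is None else bool((k >> j) & 1)
        if active:
            out.append(a[j % 4])     # n := (j % 4) in the pseudocode
        elif src is None:
            out.append(0)
        else:
            out.append(src[j])
    return out
```

The same model applies to the single-precision and the 32-bit integer forms, since the operation only moves 32-bit elements without interpreting them.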
Floating Point AVX512VL AVX512F Miscellaneous Broadcast the low double-precision (64-bit) floating-point element from "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := a[63:0] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Miscellaneous Broadcast the low double-precision (64-bit) floating-point element from "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := a[63:0] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Miscellaneous Broadcast the low single-precision (32-bit) floating-point element from "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := a[31:0] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Miscellaneous Broadcast the low single-precision (32-bit) floating-point element from "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := a[31:0] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Miscellaneous Broadcast the low single-precision (32-bit) floating-point element from "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := a[31:0] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Miscellaneous Broadcast the low single-precision (32-bit) floating-point element from "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := a[31:0] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:128] := 0
Floating Point Mask AVX512VL AVX512F Compare Compare packed double-precision (64-bit) floating-point elements in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k". CASE (imm8[4:0]) OF 0: OP := _CMP_EQ_OQ 1: OP := _CMP_LT_OS 2: OP := _CMP_LE_OS 3: OP := _CMP_UNORD_Q 4: OP := _CMP_NEQ_UQ 5: OP := _CMP_NLT_US 6: OP := _CMP_NLE_US 7: OP := _CMP_ORD_Q 8: OP := _CMP_EQ_UQ 9: OP := _CMP_NGE_US 10: OP := _CMP_NGT_US 11: OP := _CMP_FALSE_OQ 12: OP := _CMP_NEQ_OQ 13: OP := _CMP_GE_OS 14: OP := _CMP_GT_OS 15: OP := _CMP_TRUE_UQ 16: OP := _CMP_EQ_OS 17: OP := _CMP_LT_OQ 18: OP := _CMP_LE_OQ 19: OP := _CMP_UNORD_S 20: OP := _CMP_NEQ_US 21: OP := _CMP_NLT_UQ 22: OP := _CMP_NLE_UQ 23: OP := _CMP_ORD_S 24: OP := _CMP_EQ_US 25: OP := _CMP_NGE_UQ 26: OP := _CMP_NGT_UQ 27: OP := _CMP_FALSE_OS 28: OP := _CMP_NEQ_OS 29: OP := _CMP_GE_OQ 30: OP := _CMP_GT_OQ 31: OP := _CMP_TRUE_US ESAC FOR j := 0 to 3 i := j*64 k[j] := (a[i+63:i] OP b[i+63:i]) ? 1 : 0 ENDFOR k[MAX:4] := 0
Floating Point Mask AVX512VL AVX512F Compare Compare packed double-precision (64-bit) floating-point elements in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). CASE (imm8[4:0]) OF 0: OP := _CMP_EQ_OQ 1: OP := _CMP_LT_OS 2: OP := _CMP_LE_OS 3: OP := _CMP_UNORD_Q 4: OP := _CMP_NEQ_UQ 5: OP := _CMP_NLT_US 6: OP := _CMP_NLE_US 7: OP := _CMP_ORD_Q 8: OP := _CMP_EQ_UQ 9: OP := _CMP_NGE_US 10: OP := _CMP_NGT_US 11: OP := _CMP_FALSE_OQ 12: OP := _CMP_NEQ_OQ 13: OP := _CMP_GE_OS 14: OP := _CMP_GT_OS 15: OP := _CMP_TRUE_UQ 16: OP := _CMP_EQ_OS 17: OP := _CMP_LT_OQ 18: OP := _CMP_LE_OQ 19: OP := _CMP_UNORD_S 20: OP := _CMP_NEQ_US 21: OP := _CMP_NLT_UQ 22: OP := _CMP_NLE_UQ 23: OP := _CMP_ORD_S 24: OP := _CMP_EQ_US 25: OP := _CMP_NGE_UQ 26: OP := _CMP_NGT_UQ 27: OP := _CMP_FALSE_OS 28: OP := _CMP_NEQ_OS 29: OP := _CMP_GE_OQ 30: OP := _CMP_GT_OQ 31: OP := _CMP_TRUE_US ESAC FOR j := 0 to 3 i := j*64 IF k1[j] k[j] := ( a[i+63:i] OP b[i+63:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:4] := 0
Floating Point Mask AVX512VL AVX512F Compare Compare packed double-precision (64-bit) floating-point elements in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k". CASE (imm8[4:0]) OF 0: OP := _CMP_EQ_OQ 1: OP := _CMP_LT_OS 2: OP := _CMP_LE_OS 3: OP := _CMP_UNORD_Q 4: OP := _CMP_NEQ_UQ 5: OP := _CMP_NLT_US 6: OP := _CMP_NLE_US 7: OP := _CMP_ORD_Q 8: OP := _CMP_EQ_UQ 9: OP := _CMP_NGE_US 10: OP := _CMP_NGT_US 11: OP := _CMP_FALSE_OQ 12: OP := _CMP_NEQ_OQ 13: OP := _CMP_GE_OS 14: OP := _CMP_GT_OS 15: OP := _CMP_TRUE_UQ 16: OP := _CMP_EQ_OS 17: OP := _CMP_LT_OQ 18: OP := _CMP_LE_OQ 19: OP := _CMP_UNORD_S 20: OP := _CMP_NEQ_US 21: OP := _CMP_NLT_UQ 22: OP := _CMP_NLE_UQ 23: OP := _CMP_ORD_S 24: OP := _CMP_EQ_US 25: OP := _CMP_NGE_UQ 26: OP := _CMP_NGT_UQ 27: OP := _CMP_FALSE_OS 28: OP := _CMP_NEQ_OS 29: OP := _CMP_GE_OQ 30: OP := _CMP_GT_OQ 31: OP := _CMP_TRUE_US ESAC FOR j := 0 to 1 i := j*64 k[j] := ( a[i+63:i] OP b[i+63:i] ) ? 1 : 0 ENDFOR k[MAX:2] := 0
Floating Point Mask AVX512VL AVX512F Compare Compare packed double-precision (64-bit) floating-point elements in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). CASE (imm8[4:0]) OF 0: OP := _CMP_EQ_OQ 1: OP := _CMP_LT_OS 2: OP := _CMP_LE_OS 3: OP := _CMP_UNORD_Q 4: OP := _CMP_NEQ_UQ 5: OP := _CMP_NLT_US 6: OP := _CMP_NLE_US 7: OP := _CMP_ORD_Q 8: OP := _CMP_EQ_UQ 9: OP := _CMP_NGE_US 10: OP := _CMP_NGT_US 11: OP := _CMP_FALSE_OQ 12: OP := _CMP_NEQ_OQ 13: OP := _CMP_GE_OS 14: OP := _CMP_GT_OS 15: OP := _CMP_TRUE_UQ 16: OP := _CMP_EQ_OS 17: OP := _CMP_LT_OQ 18: OP := _CMP_LE_OQ 19: OP := _CMP_UNORD_S 20: OP := _CMP_NEQ_US 21: OP := _CMP_NLT_UQ 22: OP := _CMP_NLE_UQ 23: OP := _CMP_ORD_S 24: OP := _CMP_EQ_US 25: OP := _CMP_NGE_UQ 26: OP := _CMP_NGT_UQ 27: OP := _CMP_FALSE_OS 28: OP := _CMP_NEQ_OS 29: OP := _CMP_GE_OQ 30: OP := _CMP_GT_OQ 31: OP := _CMP_TRUE_US ESAC FOR j := 0 to 1 i := j*64 IF k1[j] k[j] := ( a[i+63:i] OP b[i+63:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:2] := 0
Floating Point Mask AVX512VL AVX512F Compare Compare packed single-precision (32-bit) floating-point elements in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k". CASE (imm8[4:0]) OF 0: OP := _CMP_EQ_OQ 1: OP := _CMP_LT_OS 2: OP := _CMP_LE_OS 3: OP := _CMP_UNORD_Q 4: OP := _CMP_NEQ_UQ 5: OP := _CMP_NLT_US 6: OP := _CMP_NLE_US 7: OP := _CMP_ORD_Q 8: OP := _CMP_EQ_UQ 9: OP := _CMP_NGE_US 10: OP := _CMP_NGT_US 11: OP := _CMP_FALSE_OQ 12: OP := _CMP_NEQ_OQ 13: OP := _CMP_GE_OS 14: OP := _CMP_GT_OS 15: OP := _CMP_TRUE_UQ 16: OP := _CMP_EQ_OS 17: OP := _CMP_LT_OQ 18: OP := _CMP_LE_OQ 19: OP := _CMP_UNORD_S 20: OP := _CMP_NEQ_US 21: OP := _CMP_NLT_UQ 22: OP := _CMP_NLE_UQ 23: OP := _CMP_ORD_S 24: OP := _CMP_EQ_US 25: OP := _CMP_NGE_UQ 26: OP := _CMP_NGT_UQ 27: OP := _CMP_FALSE_OS 28: OP := _CMP_NEQ_OS 29: OP := _CMP_GE_OQ 30: OP := _CMP_GT_OQ 31: OP := _CMP_TRUE_US ESAC FOR j := 0 to 7 i := j*32 k[j] := (a[i+31:i] OP b[i+31:i]) ? 1 : 0 ENDFOR k[MAX:8] := 0
Floating Point Mask AVX512VL AVX512F Compare Compare packed single-precision (32-bit) floating-point elements in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). CASE (imm8[4:0]) OF 0: OP := _CMP_EQ_OQ 1: OP := _CMP_LT_OS 2: OP := _CMP_LE_OS 3: OP := _CMP_UNORD_Q 4: OP := _CMP_NEQ_UQ 5: OP := _CMP_NLT_US 6: OP := _CMP_NLE_US 7: OP := _CMP_ORD_Q 8: OP := _CMP_EQ_UQ 9: OP := _CMP_NGE_US 10: OP := _CMP_NGT_US 11: OP := _CMP_FALSE_OQ 12: OP := _CMP_NEQ_OQ 13: OP := _CMP_GE_OS 14: OP := _CMP_GT_OS 15: OP := _CMP_TRUE_UQ 16: OP := _CMP_EQ_OS 17: OP := _CMP_LT_OQ 18: OP := _CMP_LE_OQ 19: OP := _CMP_UNORD_S 20: OP := _CMP_NEQ_US 21: OP := _CMP_NLT_UQ 22: OP := _CMP_NLE_UQ 23: OP := _CMP_ORD_S 24: OP := _CMP_EQ_US 25: OP := _CMP_NGE_UQ 26: OP := _CMP_NGT_UQ 27: OP := _CMP_FALSE_OS 28: OP := _CMP_NEQ_OS 29: OP := _CMP_GE_OQ 30: OP := _CMP_GT_OQ 31: OP := _CMP_TRUE_US ESAC FOR j := 0 to 7 i := j*32 IF k1[j] k[j] := ( a[i+31:i] OP b[i+31:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:8] := 0
Floating Point Mask AVX512VL AVX512F Compare Compare packed single-precision (32-bit) floating-point elements in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k". CASE (imm8[4:0]) OF 0: OP := _CMP_EQ_OQ 1: OP := _CMP_LT_OS 2: OP := _CMP_LE_OS 3: OP := _CMP_UNORD_Q 4: OP := _CMP_NEQ_UQ 5: OP := _CMP_NLT_US 6: OP := _CMP_NLE_US 7: OP := _CMP_ORD_Q 8: OP := _CMP_EQ_UQ 9: OP := _CMP_NGE_US 10: OP := _CMP_NGT_US 11: OP := _CMP_FALSE_OQ 12: OP := _CMP_NEQ_OQ 13: OP := _CMP_GE_OS 14: OP := _CMP_GT_OS 15: OP := _CMP_TRUE_UQ 16: OP := _CMP_EQ_OS 17: OP := _CMP_LT_OQ 18: OP := _CMP_LE_OQ 19: OP := _CMP_UNORD_S 20: OP := _CMP_NEQ_US 21: OP := _CMP_NLT_UQ 22: OP := _CMP_NLE_UQ 23: OP := _CMP_ORD_S 24: OP := _CMP_EQ_US 25: OP := _CMP_NGE_UQ 26: OP := _CMP_NGT_UQ 27: OP := _CMP_FALSE_OS 28: OP := _CMP_NEQ_OS 29: OP := _CMP_GE_OQ 30: OP := _CMP_GT_OQ 31: OP := _CMP_TRUE_US ESAC FOR j := 0 to 3 i := j*32 k[j] := ( a[i+31:i] OP b[i+31:i] ) ? 1 : 0 ENDFOR k[MAX:4] := 0
Floating Point Mask AVX512VL AVX512F Compare Compare packed single-precision (32-bit) floating-point elements in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). CASE (imm8[4:0]) OF 0: OP := _CMP_EQ_OQ 1: OP := _CMP_LT_OS 2: OP := _CMP_LE_OS 3: OP := _CMP_UNORD_Q 4: OP := _CMP_NEQ_UQ 5: OP := _CMP_NLT_US 6: OP := _CMP_NLE_US 7: OP := _CMP_ORD_Q 8: OP := _CMP_EQ_UQ 9: OP := _CMP_NGE_US 10: OP := _CMP_NGT_US 11: OP := _CMP_FALSE_OQ 12: OP := _CMP_NEQ_OQ 13: OP := _CMP_GE_OS 14: OP := _CMP_GT_OS 15: OP := _CMP_TRUE_UQ 16: OP := _CMP_EQ_OS 17: OP := _CMP_LT_OQ 18: OP := _CMP_LE_OQ 19: OP := _CMP_UNORD_S 20: OP := _CMP_NEQ_US 21: OP := _CMP_NLT_UQ 22: OP := _CMP_NLE_UQ 23: OP := _CMP_ORD_S 24: OP := _CMP_EQ_US 25: OP := _CMP_NGE_UQ 26: OP := _CMP_NGT_UQ 27: OP := _CMP_FALSE_OS 28: OP := _CMP_NEQ_OS 29: OP := _CMP_GE_OQ 30: OP := _CMP_GT_OQ 31: OP := _CMP_TRUE_US ESAC FOR j := 0 to 3 i := j*32 IF k1[j] k[j] := ( a[i+31:i] OP b[i+31:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:4] := 0
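The 32 predicates in the compare entries above come in pairs that return the same result and differ only in whether a quiet NaN signals an exception, so a result-only model needs just `imm8[3:0]`. The Python sketch below relies on that observation and on Python's IEEE 754 comparison behaviour for NaN; the names `PREDICATES` and `cmp_mask_model` are hypothetical, and exception signalling is deliberately not modelled.

```python
import math

def _unordered(x, y):
    return math.isnan(x) or math.isnan(y)

# Indexed by imm8 & 0xF; entries 16..31 give the same results as 0..15.
PREDICATES = [
    lambda x, y: x == y,                           # 0  EQ_OQ
    lambda x, y: x < y,                            # 1  LT_OS
    lambda x, y: x <= y,                           # 2  LE_OS
    lambda x, y: _unordered(x, y),                 # 3  UNORD_Q
    lambda x, y: x != y,                           # 4  NEQ_UQ
    lambda x, y: not (x < y),                      # 5  NLT_US
    lambda x, y: not (x <= y),                     # 6  NLE_US
    lambda x, y: not _unordered(x, y),             # 7  ORD_Q
    lambda x, y: _unordered(x, y) or x == y,       # 8  EQ_UQ
    lambda x, y: not (x >= y),                     # 9  NGE_US
    lambda x, y: not (x > y),                      # 10 NGT_US
    lambda x, y: False,                            # 11 FALSE_OQ
    lambda x, y: not _unordered(x, y) and x != y,  # 12 NEQ_OQ
    lambda x, y: x >= y,                           # 13 GE_OS
    lambda x, y: x > y,                            # 14 GT_OS
    lambda x, y: True,                             # 15 TRUE_UQ
]

def cmp_mask_model(a, b, imm8, k1=None):
    """Return the result mask k as an int; k1 is the optional zeromask."""
    op = PREDICATES[imm8 & 0xF]
    k = 0
    for j, (x, y) in enumerate(zip(a, b)):
        active = True if k1 is None else bool((k1 >> j) & 1)
        if active and op(x, y):
            k |= 1 << j
    return k
```

The ordered predicates return 0 for any element pair that contains a NaN, while the unordered ones return 1, which is why `EQ_OQ` and `NEQ_UQ` are exact complements of each other.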
Floating Point AVX512VL AVX512F Miscellaneous Contiguously store the active double-precision (64-bit) floating-point elements in "a" (those with their respective bit set in writemask "k") to "dst", and pass through the remaining elements from "src". size := 64 m := 0 FOR j := 0 to 3 i := j*64 IF k[j] dst[m+size-1:m] := a[i+63:i] m := m + size FI ENDFOR dst[255:m] := src[255:m] dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Store Contiguously store the active double-precision (64-bit) floating-point elements in "a" (those with their respective bit set in writemask "k") to unaligned memory at "base_addr". size := 64 m := base_addr FOR j := 0 to 3 i := j*64 IF k[j] MEM[m+size-1:m] := a[i+63:i] m := m + size FI ENDFOR
Floating Point AVX512VL AVX512F Miscellaneous Contiguously store the active double-precision (64-bit) floating-point elements in "a" (those with their respective bit set in zeromask "k") to "dst", and set the remaining elements to zero. size := 64 m := 0 FOR j := 0 to 3 i := j*64 IF k[j] dst[m+size-1:m] := a[i+63:i] m := m + size FI ENDFOR dst[255:m] := 0 dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Miscellaneous Contiguously store the active double-precision (64-bit) floating-point elements in "a" (those with their respective bit set in writemask "k") to "dst", and pass through the remaining elements from "src". size := 64 m := 0 FOR j := 0 to 1 i := j*64 IF k[j] dst[m+size-1:m] := a[i+63:i] m := m + size FI ENDFOR dst[127:m] := src[127:m] dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Store Contiguously store the active double-precision (64-bit) floating-point elements in "a" (those with their respective bit set in writemask "k") to unaligned memory at "base_addr". size := 64 m := base_addr FOR j := 0 to 1 i := j*64 IF k[j] MEM[m+size-1:m] := a[i+63:i] m := m + size FI ENDFOR
Floating Point AVX512VL AVX512F Miscellaneous Contiguously store the active double-precision (64-bit) floating-point elements in "a" (those with their respective bit set in zeromask "k") to "dst", and set the remaining elements to zero. size := 64 m := 0 FOR j := 0 to 1 i := j*64 IF k[j] dst[m+size-1:m] := a[i+63:i] m := m + size FI ENDFOR dst[127:m] := 0 dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Miscellaneous Contiguously store the active single-precision (32-bit) floating-point elements in "a" (those with their respective bit set in writemask "k") to "dst", and pass through the remaining elements from "src". size := 32 m := 0 FOR j := 0 to 7 i := j*32 IF k[j] dst[m+size-1:m] := a[i+31:i] m := m + size FI ENDFOR dst[255:m] := src[255:m] dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Store Contiguously store the active single-precision (32-bit) floating-point elements in "a" (those with their respective bit set in writemask "k") to unaligned memory at "base_addr". size := 32 m := base_addr FOR j := 0 to 7 i := j*32 IF k[j] MEM[m+size-1:m] := a[i+31:i] m := m + size FI ENDFOR
Floating Point AVX512VL AVX512F Miscellaneous Contiguously store the active single-precision (32-bit) floating-point elements in "a" (those with their respective bit set in zeromask "k") to "dst", and set the remaining elements to zero. size := 32 m := 0 FOR j := 0 to 7 i := j*32 IF k[j] dst[m+size-1:m] := a[i+31:i] m := m + size FI ENDFOR dst[255:m] := 0 dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Miscellaneous Contiguously store the active single-precision (32-bit) floating-point elements in "a" (those with their respective bit set in writemask "k") to "dst", and pass through the remaining elements from "src". size := 32 m := 0 FOR j := 0 to 3 i := j*32 IF k[j] dst[m+size-1:m] := a[i+31:i] m := m + size FI ENDFOR dst[127:m] := src[127:m] dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Store Contiguously store the active single-precision (32-bit) floating-point elements in "a" (those with their respective bit set in writemask "k") to unaligned memory at "base_addr". size := 32 m := base_addr FOR j := 0 to 3 i := j*32 IF k[j] MEM[m+size-1:m] := a[i+31:i] m := m + size FI ENDFOR
Floating Point AVX512VL AVX512F Miscellaneous Contiguously store the active single-precision (32-bit) floating-point elements in "a" (those with their respective bit set in zeromask "k") to "dst", and set the remaining elements to zero. size := 32 m := 0 FOR j := 0 to 3 i := j*32 IF k[j] dst[m+size-1:m] := a[i+31:i] m := m + size FI ENDFOR dst[127:m] := 0 dst[MAX:128] := 0
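The compress entries above pack the active elements toward the low end; the register forms then fill the remaining upper positions from "src" or with zero, while the store forms write only the active elements to memory. A Python sketch of both shapes follows, with hypothetical names (`compress_model`, `compress_store_model`) and a Python list standing in for memory addressed in whole elements.

```python
def compress_model(a, k, src=None):
    """Register form: active elements first, then the tail.

    The tail comes from the same positions of src (writemask form),
    or is zero when src is None (zeromask form).
    """
    n = len(a)
    active = [a[j] for j in range(n) if (k >> j) & 1]
    m = len(active)
    tail = [0.0] * (n - m) if src is None else list(src[m:n])
    return active + tail

def compress_store_model(mem, base, a, k):
    """Store form: write active elements contiguously at mem[base:].

    Returns the number of elements written; other memory is untouched.
    """
    m = base
    for j, x in enumerate(a):
        if (k >> j) & 1:
            mem[m] = x
            m += 1
    return m - base
```

Note that the writemask tail is `src[m:]`, not the leftover elements of "a": with `k = 0b1010`, `a = [1.0, 2.0, 3.0, 4.0]` and `src = [9.0, 8.0, 7.0, 6.0]` the result is `[2.0, 4.0, 7.0, 6.0]`.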
Floating Point Integer AVX512VL AVX512F Convert Convert packed signed 32-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 m := j*64 IF k[j] dst[m+63:m] := Convert_Int32_To_FP64(a[i+31:i]) ELSE dst[m+63:m] := src[m+63:m] FI ENDFOR dst[MAX:256] := 0
Floating Point Integer AVX512VL AVX512F Convert Convert packed signed 32-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 m := j*64 IF k[j] dst[m+63:m] := Convert_Int32_To_FP64(a[i+31:i]) ELSE dst[m+63:m] := 0 FI ENDFOR dst[MAX:256] := 0
Floating Point Integer AVX512VL AVX512F Convert Convert packed signed 32-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*32 m := j*64 IF k[j] dst[m+63:m] := Convert_Int32_To_FP64(a[i+31:i]) ELSE dst[m+63:m] := src[m+63:m] FI ENDFOR dst[MAX:128] := 0
Floating Point Integer AVX512VL AVX512F Convert Convert packed signed 32-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*32 m := j*64 IF k[j] dst[m+63:m] := Convert_Int32_To_FP64(a[i+31:i]) ELSE dst[m+63:m] := 0 FI ENDFOR dst[MAX:128] := 0
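The writemask and zeromask forms of an element-wise conversion differ only in what an inactive lane receives. A minimal scalar sketch of the Int32 to FP64 records above follows; the helper name `masked_convert` and the list-of-lanes layout are illustrative assumptions, and Python's `float` stands in for FP64 (every 32-bit integer converts exactly).

```python
def masked_convert(a, k, src=None):
    """Convert each active lane of `a` to a double.

    Inactive lanes are copied from `src` (writemask form) or set to 0.0
    (zeromask form, selected by src=None).
    """
    out = []
    for j, x in enumerate(a):
        if (k >> j) & 1:
            out.append(float(x))      # exact for any 32-bit integer
        elif src is not None:
            out.append(src[j])        # writemask: keep the src lane
        else:
            out.append(0.0)           # zeromask: clear the lane
    return out


a = [1, -2, 3, -4]
print(masked_convert(a, 0b0101, [0.5, 0.5, 0.5, 0.5]))  # [1.0, 0.5, 3.0, 0.5]
print(masked_convert(a, 0b0101))                        # [1.0, 0.0, 3.0, 0.0]
```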
Floating Point Integer AVX512VL AVX512F Convert Convert packed signed 32-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := Convert_Int32_To_FP32(a[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:256] := 0
Floating Point Integer AVX512VL AVX512F Convert Convert packed signed 32-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := 32*j IF k[j] dst[i+31:i] := Convert_Int32_To_FP32(a[i+31:i]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:256] := 0
Floating Point Integer AVX512VL AVX512F Convert Convert packed signed 32-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := Convert_Int32_To_FP32(a[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:128] := 0
Floating Point Integer AVX512VL AVX512F Convert Convert packed signed 32-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := 32*j IF k[j] dst[i+31:i] := Convert_Int32_To_FP32(a[i+31:i]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:128] := 0
Floating Point Integer AVX512VL AVX512F Convert Convert packed double-precision (64-bit) floating-point elements in "a" to packed 32-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 l := j*64 IF k[j] dst[i+31:i] := Convert_FP64_To_Int32(a[l+63:l]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:128] := 0
Floating Point Integer AVX512VL AVX512F Convert Convert packed double-precision (64-bit) floating-point elements in "a" to packed 32-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := 32*j l := 64*j IF k[j] dst[i+31:i] := Convert_FP64_To_Int32(a[l+63:l]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:128] := 0
Floating Point Integer AVX512VL AVX512F Convert Convert packed double-precision (64-bit) floating-point elements in "a" to packed 32-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*32 l := j*64 IF k[j] dst[i+31:i] := Convert_FP64_To_Int32(a[l+63:l]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:64] := 0
Floating Point Integer AVX512VL AVX512F Convert Convert packed double-precision (64-bit) floating-point elements in "a" to packed 32-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 1 i := 32*j l := 64*j IF k[j] dst[i+31:i] := Convert_FP64_To_Int32(a[l+63:l]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:64] := 0
Floating Point AVX512VL AVX512F Convert Convert packed double-precision (64-bit) floating-point elements in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := 32*j l := 64*j IF k[j] dst[i+31:i] := Convert_FP64_To_FP32(a[l+63:l]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Convert Convert packed double-precision (64-bit) floating-point elements in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 l := j*64 IF k[j] dst[i+31:i] := Convert_FP64_To_FP32(a[l+63:l]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Convert Convert packed double-precision (64-bit) floating-point elements in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 1 i := 32*j l := 64*j IF k[j] dst[i+31:i] := Convert_FP64_To_FP32(a[l+63:l]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:64] := 0
Floating Point AVX512VL AVX512F Convert Convert packed double-precision (64-bit) floating-point elements in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*32 l := j*64 IF k[j] dst[i+31:i] := Convert_FP64_To_FP32(a[l+63:l]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:64] := 0
Floating Point Integer AVX512VL AVX512F Convert Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 32-bit integers, and store the results in "dst". FOR j := 0 to 3 i := 32*j k := 64*j dst[i+31:i] := Convert_FP64_To_UInt32(a[k+63:k]) ENDFOR dst[MAX:128] := 0
Floating Point Integer AVX512VL AVX512F Convert Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 32-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 l := j*64 IF k[j] dst[i+31:i] := Convert_FP64_To_UInt32(a[l+63:l]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:128] := 0
Floating Point Integer AVX512VL AVX512F Convert Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 32-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := 32*j l := 64*j IF k[j] dst[i+31:i] := Convert_FP64_To_UInt32(a[l+63:l]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:128] := 0
Floating Point Integer AVX512VL AVX512F Convert Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 32-bit integers, and store the results in "dst". FOR j := 0 to 1 i := 32*j k := 64*j dst[i+31:i] := Convert_FP64_To_UInt32(a[k+63:k]) ENDFOR dst[MAX:64] := 0
Floating Point Integer AVX512VL AVX512F Convert Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 32-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*32 l := j*64 IF k[j] dst[i+31:i] := Convert_FP64_To_UInt32(a[l+63:l]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:64] := 0
Floating Point Integer AVX512VL AVX512F Convert Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 32-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 1 i := 32*j l := 64*j IF k[j] dst[i+31:i] := Convert_FP64_To_UInt32(a[l+63:l]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:64] := 0
Floating Point AVX512VL AVX512F Convert Convert packed half-precision (16-bit) floating-point elements in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 m := j*16 IF k[j] dst[i+31:i] := Convert_FP16_To_FP32(a[m+15:m]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Convert Convert packed half-precision (16-bit) floating-point elements in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 m := j*16 IF k[j] dst[i+31:i] := Convert_FP16_To_FP32(a[m+15:m]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Convert Convert packed half-precision (16-bit) floating-point elements in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 m := j*16 IF k[j] dst[i+31:i] := Convert_FP16_To_FP32(a[m+15:m]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Convert Convert packed half-precision (16-bit) floating-point elements in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 m := j*16 IF k[j] dst[i+31:i] := Convert_FP16_To_FP32(a[m+15:m]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:128] := 0
Floating Point Integer AVX512VL AVX512F Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed 32-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := Convert_FP32_To_Int32(a[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:256] := 0
Floating Point Integer AVX512VL AVX512F Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed 32-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := 32*j IF k[j] dst[i+31:i] := Convert_FP32_To_Int32(a[i+31:i]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:256] := 0
Floating Point Integer AVX512VL AVX512F Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed 32-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := Convert_FP32_To_Int32(a[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:128] := 0
Floating Point Integer AVX512VL AVX512F Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed 32-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := 32*j IF k[j] dst[i+31:i] := Convert_FP32_To_Int32(a[i+31:i]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [round_imm_note] FOR j := 0 to 7 i := 16*j l := 32*j IF k[j] dst[i+15:i] := Convert_FP32_To_FP16(a[l+31:l]) ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [round_imm_note] FOR j := 0 to 7 i := 16*j l := 32*j IF k[j] dst[i+15:i] := Convert_FP32_To_FP16(a[l+31:l]) ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [round_imm_note] FOR j := 0 to 7 i := 16*j l := 32*j IF k[j] dst[i+15:i] := Convert_FP32_To_FP16(a[l+31:l]) ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [round_imm_note] FOR j := 0 to 7 i := 16*j l := 32*j IF k[j] dst[i+15:i] := Convert_FP32_To_FP16(a[l+31:l]) ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [round_imm_note] FOR j := 0 to 3 i := 16*j l := 32*j IF k[j] dst[i+15:i] := Convert_FP32_To_FP16(a[l+31:l]) ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:64] := 0
Floating Point AVX512VL AVX512F Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [round_imm_note] FOR j := 0 to 3 i := 16*j l := 32*j IF k[j] dst[i+15:i] := Convert_FP32_To_FP16(a[l+31:l]) ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:64] := 0
Floating Point AVX512VL AVX512F Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [round_imm_note] FOR j := 0 to 3 i := 16*j l := 32*j IF k[j] dst[i+15:i] := Convert_FP32_To_FP16(a[l+31:l]) ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:64] := 0
Floating Point AVX512VL AVX512F Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [round_imm_note] FOR j := 0 to 3 i := 16*j l := 32*j IF k[j] dst[i+15:i] := Convert_FP32_To_FP16(a[l+31:l]) ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:64] := 0
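The FP32 to FP16 records above can be sketched with Python's `struct` module, whose `"e"` format is IEEE 754 binary16. This sketch assumes round-to-nearest-even, which is what `struct` applies; the rounding control that the records refer to through `[round_imm_note]` is not modelled. The helper names are hypothetical, and lanes are held as raw 16-bit patterns.

```python
import struct


def fp32_to_fp16_bits(x):
    """Return the binary16 bit pattern of x (assumes round-to-nearest-even)."""
    # Round to float32 first so the input really is a single-precision value.
    (f32,) = struct.unpack("<f", struct.pack("<f", x))
    # Pack as binary16, then reread the same two bytes as an unsigned integer.
    (bits,) = struct.unpack("<H", struct.pack("<e", f32))
    return bits


def masked_fp32_to_fp16(a, k, src=None):
    """Masked conversion; src holds 16-bit patterns, None selects zeromask."""
    out = []
    for j, x in enumerate(a):
        if (k >> j) & 1:
            out.append(fp32_to_fp16_bits(x))
        elif src is not None:
            out.append(src[j])
        else:
            out.append(0)
    return out


a = [1.0, -2.0, 0.5, 0.25]
print([hex(v) for v in masked_fp32_to_fp16(a, 0b0111, [0xFFFF] * 4)])
# ['0x3c00', '0xc000', '0x3800', '0xffff']
```

Values outside the binary16 range make `struct.pack` raise `OverflowError`, so this model is only meaningful for inputs that fit.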
Floating Point Integer AVX512VL AVX512F Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 32-bit integers, and store the results in "dst". FOR j := 0 to 7 i := 32*j dst[i+31:i] := Convert_FP32_To_UInt32(a[i+31:i]) ENDFOR dst[MAX:256] := 0
Floating Point Integer AVX512VL AVX512F Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 32-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := 32*j IF k[j] dst[i+31:i] := Convert_FP32_To_UInt32(a[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:256] := 0
Floating Point Integer AVX512VL AVX512F Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 32-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := 32*j IF k[j] dst[i+31:i] := Convert_FP32_To_UInt32(a[i+31:i]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:256] := 0
Floating Point Integer AVX512VL AVX512F Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 32-bit integers, and store the results in "dst". FOR j := 0 to 3 i := 32*j dst[i+31:i] := Convert_FP32_To_UInt32(a[i+31:i]) ENDFOR dst[MAX:128] := 0
Floating Point Integer AVX512VL AVX512F Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 32-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := 32*j IF k[j] dst[i+31:i] := Convert_FP32_To_UInt32(a[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:128] := 0
Floating Point Integer AVX512VL AVX512F Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 32-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := 32*j IF k[j] dst[i+31:i] := Convert_FP32_To_UInt32(a[i+31:i]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:128] := 0
Floating Point Integer AVX512VL AVX512F Convert Convert packed double-precision (64-bit) floating-point elements in "a" to packed 32-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := 32*j l := 64*j IF k[j] dst[i+31:i] := Convert_FP64_To_Int32_Truncate(a[l+63:l]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:128] := 0
Floating Point Integer AVX512VL AVX512F Convert Convert packed double-precision (64-bit) floating-point elements in "a" to packed 32-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := 32*j l := 64*j IF k[j] dst[i+31:i] := Convert_FP64_To_Int32_Truncate(a[l+63:l]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:128] := 0
Floating Point Integer AVX512VL AVX512F Convert Convert packed double-precision (64-bit) floating-point elements in "a" to packed 32-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 1 i := 32*j l := 64*j IF k[j] dst[i+31:i] := Convert_FP64_To_Int32_Truncate(a[l+63:l]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:64] := 0
Floating Point Integer AVX512VL AVX512F Convert Convert packed double-precision (64-bit) floating-point elements in "a" to packed 32-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 1 i := 32*j l := 64*j IF k[j] dst[i+31:i] := Convert_FP64_To_Int32_Truncate(a[l+63:l]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:64] := 0
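The "with truncation" records differ from the plain FP64 to Int32 records only in rounding. The sketch below contrasts the two for in-range values. It assumes the plain conversion runs under round-to-nearest-even (the usual default rounding mode) and does not model out-of-range inputs; both function names are illustrative.

```python
def convert_nearest_even(x):
    """Plain conversion, assuming the rounding mode is round-to-nearest-even."""
    return round(x)   # Python's round() sends halfway cases to the even integer


def convert_truncate(x):
    """Truncating conversion: always rounds toward zero."""
    return int(x)     # int() discards the fractional part


for x in (2.5, 3.5, -2.5, -1.7):
    print(x, convert_nearest_even(x), convert_truncate(x))
# 2.5 2 2
# 3.5 4 3
# -2.5 -2 -2
# -1.7 -2 -1
```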
Floating Point Integer AVX512VL AVX512F Convert Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 32-bit integers with truncation, and store the results in "dst". FOR j := 0 to 3 i := 32*j k := 64*j dst[i+31:i] := Convert_FP64_To_UInt32_Truncate(a[k+63:k]) ENDFOR dst[MAX:128] := 0
Floating Point Integer AVX512VL AVX512F Convert Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 32-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := 32*j l := 64*j IF k[j] dst[i+31:i] := Convert_FP64_To_UInt32_Truncate(a[l+63:l]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:128] := 0
Floating Point Integer AVX512VL AVX512F Convert Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 32-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := 32*j l := 64*j IF k[j] dst[i+31:i] := Convert_FP64_To_UInt32_Truncate(a[l+63:l]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:128] := 0
Floating Point Integer AVX512VL AVX512F Convert Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 32-bit integers with truncation, and store the results in "dst". FOR j := 0 to 1 i := 32*j k := 64*j dst[i+31:i] := Convert_FP64_To_UInt32_Truncate(a[k+63:k]) ENDFOR dst[MAX:64] := 0
Floating Point Integer AVX512VL AVX512F Convert Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 32-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 1 i := 32*j l := 64*j IF k[j] dst[i+31:i] := Convert_FP64_To_UInt32_Truncate(a[l+63:l]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:64] := 0
Floating Point Integer AVX512VL AVX512F Convert Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 32-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 1 i := 32*j l := 64*j IF k[j] dst[i+31:i] := Convert_FP64_To_UInt32_Truncate(a[l+63:l]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:64] := 0
Floating Point Integer AVX512VL AVX512F Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed 32-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := 32*j IF k[j] dst[i+31:i] := Convert_FP32_To_Int32_Truncate(a[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:256] := 0
Floating Point Integer AVX512VL AVX512F Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed 32-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := 32*j IF k[j] dst[i+31:i] := Convert_FP32_To_Int32_Truncate(a[i+31:i]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:256] := 0
Floating Point Integer AVX512VL AVX512F Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed 32-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := 32*j IF k[j] dst[i+31:i] := Convert_FP32_To_Int32_Truncate(a[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:128] := 0
Floating Point Integer AVX512VL AVX512F Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed 32-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := 32*j IF k[j] dst[i+31:i] := Convert_FP32_To_Int32_Truncate(a[i+31:i]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:128] := 0
Floating Point Integer AVX512VL AVX512F Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 32-bit integers with truncation, and store the results in "dst". FOR j := 0 to 7 i := 32*j dst[i+31:i] := Convert_FP32_To_UInt32_Truncate(a[i+31:i]) ENDFOR dst[MAX:256] := 0
Floating Point Integer AVX512VL AVX512F Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 32-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := 32*j IF k[j] dst[i+31:i] := Convert_FP32_To_UInt32_Truncate(a[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:256] := 0
Floating Point Integer AVX512VL AVX512F Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 32-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := 32*j IF k[j] dst[i+31:i] := Convert_FP32_To_UInt32_Truncate(a[i+31:i]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:256] := 0
Floating Point Integer AVX512VL AVX512F Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 32-bit integers with truncation, and store the results in "dst". FOR j := 0 to 3 i := 32*j dst[i+31:i] := Convert_FP32_To_UInt32_Truncate(a[i+31:i]) ENDFOR dst[MAX:128] := 0
Floating Point Integer AVX512VL AVX512F Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 32-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := 32*j IF k[j] dst[i+31:i] := Convert_FP32_To_UInt32_Truncate(a[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:128] := 0
Floating Point Integer AVX512VL AVX512F Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 32-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := 32*j IF k[j] dst[i+31:i] := Convert_FP32_To_UInt32_Truncate(a[i+31:i]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:128] := 0
Floating Point Integer AVX512VL AVX512F Convert Convert packed unsigned 32-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst". FOR j := 0 to 3 i := j*64 l := j*32 dst[i+63:i] := Convert_Int64_To_FP64(a[l+31:l]) ENDFOR dst[MAX:256] := 0
Floating Point Integer AVX512VL AVX512F Convert Convert packed unsigned 32-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 l := j*32 IF k[j] dst[i+63:i] := Convert_Int64_To_FP64(a[l+31:l]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:256] := 0
Floating Point Integer AVX512VL AVX512F Convert Convert packed unsigned 32-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 l := j*32 IF k[j] dst[i+63:i] := Convert_Int64_To_FP64(a[l+31:l]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:256] := 0
Floating Point Integer AVX512VL AVX512F Convert Convert packed unsigned 32-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst". FOR j := 0 to 1 i := j*64 l := j*32 dst[i+63:i] := Convert_Int64_To_FP64(a[l+31:l]) ENDFOR dst[MAX:128] := 0
Floating Point Integer AVX512VL AVX512F Convert Convert packed unsigned 32-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 l := j*32 IF k[j] dst[i+63:i] := Convert_Int64_To_FP64(a[l+31:l]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:128] := 0
Floating Point Integer AVX512VL AVX512F Convert Convert packed unsigned 32-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 l := j*32 IF k[j] dst[i+63:i] := Convert_Int64_To_FP64(a[l+31:l]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:128] := 0
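For the unsigned records above, the same 32-bit lane pattern converts to a different double than it would under a signed interpretation. A minimal sketch of the difference follows; both helper names are hypothetical, and lanes are passed as raw 32-bit patterns.

```python
def u32_lane_to_fp64(bits):
    """Treat the 32-bit pattern as unsigned (zero-extended), then convert."""
    return float(bits & 0xFFFFFFFF)


def s32_lane_to_fp64(bits):
    """Treat the same 32-bit pattern as two's-complement signed, then convert."""
    bits &= 0xFFFFFFFF
    return float(bits - (1 << 32) if bits & 0x80000000 else bits)


for pattern in (0x00000001, 0x80000000, 0xFFFFFFFF):
    print(hex(pattern), u32_lane_to_fp64(pattern), s32_lane_to_fp64(pattern))
# 0x1 1.0 1.0
# 0x80000000 2147483648.0 -2147483648.0
# 0xffffffff 4294967295.0 -1.0
```

Both results are exact, since every 32-bit integer is representable in FP64.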
Floating Point AVX512VL AVX512F Arithmetic Divide packed double-precision (64-bit) floating-point elements in "a" by packed elements in "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := 64*j IF k[j] dst[i+63:i] := a[i+63:i] / b[i+63:i] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Arithmetic Divide packed double-precision (64-bit) floating-point elements in "a" by packed elements in "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := 64*j IF k[j] dst[i+63:i] := a[i+63:i] / b[i+63:i] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Arithmetic Divide packed double-precision (64-bit) floating-point elements in "a" by packed elements in "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 1 i := 64*j IF k[j] dst[i+63:i] := a[i+63:i] / b[i+63:i] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Arithmetic Divide packed double-precision (64-bit) floating-point elements in "a" by packed elements in "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 1 i := 64*j IF k[j] dst[i+63:i] := a[i+63:i] / b[i+63:i] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:128] := 0
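The masked division records above can be sketched as a per-lane loop. This is a scalar model under stated assumptions, not the instruction itself: `masked_div` is a hypothetical name, lanes are Python floats (FP64), and inactive lanes are simply skipped, so a zero divisor in an inactive lane never reaches the division in this model.

```python
def masked_div(a, b, k, src=None):
    """Divide a[j] by b[j] in every lane whose mask bit is set.

    Inactive lanes take src[j] (writemask form) or 0.0 (zeromask, src=None).
    """
    out = []
    for j in range(len(a)):
        if (k >> j) & 1:
            out.append(a[j] / b[j])   # only active lanes are divided
        elif src is not None:
            out.append(src[j])
        else:
            out.append(0.0)
    return out


a = [1.0, 6.0, 9.0, 8.0]
b = [2.0, 0.0, 3.0, 4.0]              # lane 1 has a zero divisor
k = 0b1101                            # lane 1 is inactive
print(masked_div(a, b, k, [-1.0] * 4))  # [0.5, -1.0, 3.0, 2.0]
print(masked_div(a, b, k))              # [0.5, 0.0, 3.0, 2.0]
```

An active lane with a zero divisor raises `ZeroDivisionError` in Python, which is a limitation of this model rather than a description of the hardware result.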
Floating Point AVX512VL AVX512F Arithmetic Divide packed single-precision (32-bit) floating-point elements in "a" by packed elements in "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := 32*j IF k[j] dst[i+31:i] := a[i+31:i] / b[i+31:i] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Arithmetic Divide packed single-precision (32-bit) floating-point elements in "a" by packed elements in "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := 32*j IF k[j] dst[i+31:i] := a[i+31:i] / b[i+31:i] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Arithmetic Divide packed single-precision (32-bit) floating-point elements in "a" by packed elements in "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := 32*j IF k[j] dst[i+31:i] := a[i+31:i] / b[i+31:i] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Arithmetic Divide packed single-precision (32-bit) floating-point elements in "a" by packed elements in "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := 32*j IF k[j] dst[i+31:i] := a[i+31:i] / b[i+31:i] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Miscellaneous Load contiguous active double-precision (64-bit) floating-point elements from "a" (those with their respective bit set in mask "k"), and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). m := 0 FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := a[m+63:m] m := m + 64 ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Load Load contiguous active double-precision (64-bit) floating-point elements from unaligned memory at "mem_addr" (those with their respective bit set in mask "k"), and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). m := 0 FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := MEM[mem_addr+m+63:mem_addr+m] m := m + 64 ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Miscellaneous Load contiguous active double-precision (64-bit) floating-point elements from "a" (those with their respective bit set in mask "k"), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). m := 0 FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := a[m+63:m] m := m + 64 ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Load Load contiguous active double-precision (64-bit) floating-point elements from unaligned memory at "mem_addr" (those with their respective bit set in mask "k"), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). m := 0 FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := MEM[mem_addr+m+63:mem_addr+m] m := m + 64 ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Miscellaneous Load contiguous active double-precision (64-bit) floating-point elements from "a" (those with their respective bit set in mask "k"), and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). m := 0 FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := a[m+63:m] m := m + 64 ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Load Load contiguous active double-precision (64-bit) floating-point elements from unaligned memory at "mem_addr" (those with their respective bit set in mask "k"), and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). m := 0 FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := MEM[mem_addr+m+63:mem_addr+m] m := m + 64 ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Miscellaneous Load contiguous active double-precision (64-bit) floating-point elements from "a" (those with their respective bit set in mask "k"), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). m := 0 FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := a[m+63:m] m := m + 64 ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Load Load contiguous active double-precision (64-bit) floating-point elements from unaligned memory at "mem_addr" (those with their respective bit set in mask "k"), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). m := 0 FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := MEM[mem_addr+m+63:mem_addr+m] m := m + 64 ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Miscellaneous Load contiguous active single-precision (32-bit) floating-point elements from "a" (those with their respective bit set in mask "k"), and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). m := 0 FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := a[m+31:m] m := m + 32 ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Load Load contiguous active single-precision (32-bit) floating-point elements from unaligned memory at "mem_addr" (those with their respective bit set in mask "k"), and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). m := 0 FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := MEM[mem_addr+m+31:mem_addr+m] m := m + 32 ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Miscellaneous Load contiguous active single-precision (32-bit) floating-point elements from "a" (those with their respective bit set in mask "k"), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). m := 0 FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := a[m+31:m] m := m + 32 ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Load Load contiguous active single-precision (32-bit) floating-point elements from unaligned memory at "mem_addr" (those with their respective bit set in mask "k"), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). m := 0 FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := MEM[mem_addr+m+31:mem_addr+m] m := m + 32 ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Miscellaneous Load contiguous active single-precision (32-bit) floating-point elements from "a" (those with their respective bit set in mask "k"), and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). m := 0 FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := a[m+31:m] m := m + 32 ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Load Load contiguous active single-precision (32-bit) floating-point elements from unaligned memory at "mem_addr" (those with their respective bit set in mask "k"), and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). m := 0 FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := MEM[mem_addr+m+31:mem_addr+m] m := m + 32 ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Miscellaneous Load contiguous active single-precision (32-bit) floating-point elements from "a" (those with their respective bit set in mask "k"), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). m := 0 FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := a[m+31:m] m := m + 32 ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Load Load contiguous active single-precision (32-bit) floating-point elements from unaligned memory at "mem_addr" (those with their respective bit set in mask "k"), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). m := 0 FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := MEM[mem_addr+m+31:mem_addr+m] m := m + 32 ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:128] := 0
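The expand records above all share one loop: a cursor `m` walks the contiguous source (register "a" or memory at "mem_addr") and advances only when a mask bit is set. The sketch below models that loop in plain C for single-precision lanes; `expand_ps`, `packed`, and `zero_unselected` are hypothetical names, and `m` counts elements rather than bits as the pseudocode does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model of the expand / expand-load pseudocode above.
 * Hypothetical helper for illustration; it is not an Intel intrinsic.
 * packed: the contiguous source elements (register "a" or memory).
 * Returns how many packed elements were consumed, which equals the
 * number of set mask bits among the first `lanes` bits of k. */
size_t expand_ps(float *dst, const float *src, uint8_t k,
                 const float *packed, int lanes, int zero_unselected)
{
    assert(lanes > 0 && lanes <= 8);
    size_t m = 0;                       /* next packed element to consume */
    for (int j = 0; j < lanes; j++) {
        if ((k >> j) & 1) {
            dst[j] = packed[m];
            m++;                        /* advance only on selected lanes */
        } else {
            dst[j] = zero_unselected ? 0.0f : src[j];
        }
    }
    return m;
}
```

In this model, a mask with three set bits reads only three source elements, however many lanes the destination has. The double-precision records follow the same pattern with 64-bit lanes.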
Floating Point AVX512VL AVX512F Miscellaneous Extract 128 bits (composed of 4 packed single-precision (32-bit) floating-point elements) from "a", selected with "imm8", and store the result in "dst". CASE imm8[0] OF 0: dst[127:0] := a[127:0] 1: dst[127:0] := a[255:128] ESAC dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Miscellaneous Extract 128 bits (composed of 4 packed single-precision (32-bit) floating-point elements) from "a", selected with "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). CASE imm8[0] OF 0: tmp[127:0] := a[127:0] 1: tmp[127:0] := a[255:128] ESAC FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := tmp[i+31:i] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Miscellaneous Extract 128 bits (composed of 4 packed single-precision (32-bit) floating-point elements) from "a", selected with "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). CASE imm8[0] OF 0: tmp[127:0] := a[127:0] 1: tmp[127:0] := a[255:128] ESAC FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := tmp[i+31:i] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Miscellaneous Extract 128 bits (composed of 4 packed 32-bit integers) from "a", selected with "imm8", and store the result in "dst". CASE imm8[0] OF 0: dst[127:0] := a[127:0] 1: dst[127:0] := a[255:128] ESAC dst[MAX:128] := 0
Integer AVX512VL AVX512F Miscellaneous Extract 128 bits (composed of 4 packed 32-bit integers) from "a", selected with "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). CASE imm8[0] OF 0: tmp[127:0] := a[127:0] 1: tmp[127:0] := a[255:128] ESAC FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := tmp[i+31:i] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Miscellaneous Extract 128 bits (composed of 4 packed 32-bit integers) from "a", selected with "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). CASE imm8[0] OF 0: tmp[127:0] := a[127:0] 1: tmp[127:0] := a[255:128] ESAC FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := tmp[i+31:i] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:128] := 0
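The extract records above select one 128-bit half of "a" using only bit 0 of "imm8", then optionally apply a per-element mask. A scalar sketch in plain C for the 32-bit integer form follows; `extract_i32x4` and `zero_unselected` are hypothetical names chosen for this illustration, and passing `k = 0xF` reproduces the unmasked record.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of the 128-bit extract pseudocode above.
 * Hypothetical helper for illustration; it is not an Intel intrinsic.
 * a holds the eight 32-bit elements of the 256-bit source. */
void extract_i32x4(uint32_t dst[4], const uint32_t src[4], uint8_t k,
                   const uint32_t a[8], int imm8, int zero_unselected)
{
    assert(dst && a);
    /* Only imm8[0] is examined: 0 selects a[127:0], 1 selects a[255:128]. */
    const uint32_t *tmp = a + 4 * (imm8 & 1);
    for (int j = 0; j < 4; j++) {
        if ((k >> j) & 1) {
            dst[j] = tmp[j];
        } else {
            dst[j] = zero_unselected ? 0u : src[j];
        }
    }
}
```

Because the pseudocode tests only `imm8[0]`, the values 1 and 3 select the same upper half in this model.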
Floating Point AVX512VL AVX512F Miscellaneous Fix up packed double-precision (64-bit) floating-point elements in "a" and "b" using packed 64-bit integers in "c", and store the results in "dst". "imm8" is used to set the required flags reporting. enum TOKEN_TYPE { QNAN_TOKEN := 0, \ SNAN_TOKEN := 1, \ ZERO_VALUE_TOKEN := 2, \ ONE_VALUE_TOKEN := 3, \ NEG_INF_TOKEN := 4, \ POS_INF_TOKEN := 5, \ NEG_VALUE_TOKEN := 6, \ POS_VALUE_TOKEN := 7 } DEFINE FIXUPIMMPD(src1[63:0], src2[63:0], src3[63:0], imm8[7:0]) { tsrc[63:0] := ((src2[62:52] == 0) AND (MXCSR.DAZ == 1)) ? 0.0 : src2[63:0] CASE(tsrc[63:0]) OF QNAN_TOKEN: j := 0 SNAN_TOKEN: j := 1 ZERO_VALUE_TOKEN: j := 2 ONE_VALUE_TOKEN: j := 3 NEG_INF_TOKEN: j := 4 POS_INF_TOKEN: j := 5 NEG_VALUE_TOKEN: j := 6 POS_VALUE_TOKEN: j := 7 ESAC token_response[3:0] := src3[3+4*j:4*j] CASE(token_response[3:0]) OF 0 : dest[63:0] := src1[63:0] 1 : dest[63:0] := tsrc[63:0] 2 : dest[63:0] := QNaN(tsrc[63:0]) 3 : dest[63:0] := QNAN_Indefinite 4 : dest[63:0] := -INF 5 : dest[63:0] := +INF 6 : dest[63:0] := tsrc.sign? -INF : +INF 7 : dest[63:0] := -0 8 : dest[63:0] := +0 9 : dest[63:0] := -1 10: dest[63:0] := +1 11: dest[63:0] := 1/2 12: dest[63:0] := 90.0 13: dest[63:0] := PI/2 14: dest[63:0] := MAX_FLOAT 15: dest[63:0] := -MAX_FLOAT ESAC CASE(tsrc[63:0]) OF ZERO_VALUE_TOKEN: IF (imm8[0]) #ZE; FI ZERO_VALUE_TOKEN: IF (imm8[1]) #IE; FI ONE_VALUE_TOKEN: IF (imm8[2]) #ZE; FI ONE_VALUE_TOKEN: IF (imm8[3]) #IE; FI SNAN_TOKEN: IF (imm8[4]) #IE; FI NEG_INF_TOKEN: IF (imm8[5]) #IE; FI NEG_VALUE_TOKEN: IF (imm8[6]) #IE; FI POS_INF_TOKEN: IF (imm8[7]) #IE; FI ESAC RETURN dest[63:0] } FOR j := 0 to 3 i := j*64 dst[i+63:i] := FIXUPIMMPD(a[i+63:i], b[i+63:i], c[i+63:i], imm8[7:0]) ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Miscellaneous Fix up packed double-precision (64-bit) floating-point elements in "a" and "b" using packed 64-bit integers in "c", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). "imm8" is used to set the required flags reporting. enum TOKEN_TYPE { QNAN_TOKEN := 0, \ SNAN_TOKEN := 1, \ ZERO_VALUE_TOKEN := 2, \ ONE_VALUE_TOKEN := 3, \ NEG_INF_TOKEN := 4, \ POS_INF_TOKEN := 5, \ NEG_VALUE_TOKEN := 6, \ POS_VALUE_TOKEN := 7 } DEFINE FIXUPIMMPD(src1[63:0], src2[63:0], src3[63:0], imm8[7:0]) { tsrc[63:0] := ((src2[62:52] == 0) AND (MXCSR.DAZ == 1)) ? 0.0 : src2[63:0] CASE(tsrc[63:0]) OF QNAN_TOKEN: j := 0 SNAN_TOKEN: j := 1 ZERO_VALUE_TOKEN: j := 2 ONE_VALUE_TOKEN: j := 3 NEG_INF_TOKEN: j := 4 POS_INF_TOKEN: j := 5 NEG_VALUE_TOKEN: j := 6 POS_VALUE_TOKEN: j := 7 ESAC token_response[3:0] := src3[3+4*j:4*j] CASE(token_response[3:0]) OF 0 : dest[63:0] := src1[63:0] 1 : dest[63:0] := tsrc[63:0] 2 : dest[63:0] := QNaN(tsrc[63:0]) 3 : dest[63:0] := QNAN_Indefinite 4 : dest[63:0] := -INF 5 : dest[63:0] := +INF 6 : dest[63:0] := tsrc.sign? -INF : +INF 7 : dest[63:0] := -0 8 : dest[63:0] := +0 9 : dest[63:0] := -1 10: dest[63:0] := +1 11: dest[63:0] := 1/2 12: dest[63:0] := 90.0 13: dest[63:0] := PI/2 14: dest[63:0] := MAX_FLOAT 15: dest[63:0] := -MAX_FLOAT ESAC CASE(tsrc[63:0]) OF ZERO_VALUE_TOKEN: IF (imm8[0]) #ZE; FI ZERO_VALUE_TOKEN: IF (imm8[1]) #IE; FI ONE_VALUE_TOKEN: IF (imm8[2]) #ZE; FI ONE_VALUE_TOKEN: IF (imm8[3]) #IE; FI SNAN_TOKEN: IF (imm8[4]) #IE; FI NEG_INF_TOKEN: IF (imm8[5]) #IE; FI NEG_VALUE_TOKEN: IF (imm8[6]) #IE; FI POS_INF_TOKEN: IF (imm8[7]) #IE; FI ESAC RETURN dest[63:0] } FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := FIXUPIMMPD(a[i+63:i], b[i+63:i], c[i+63:i], imm8[7:0]) ELSE dst[i+63:i] := a[i+63:i] FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Miscellaneous Fix up packed double-precision (64-bit) floating-point elements in "a" and "b" using packed 64-bit integers in "c", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). "imm8" is used to set the required flags reporting. enum TOKEN_TYPE { QNAN_TOKEN := 0, \ SNAN_TOKEN := 1, \ ZERO_VALUE_TOKEN := 2, \ ONE_VALUE_TOKEN := 3, \ NEG_INF_TOKEN := 4, \ POS_INF_TOKEN := 5, \ NEG_VALUE_TOKEN := 6, \ POS_VALUE_TOKEN := 7 } DEFINE FIXUPIMMPD(src1[63:0], src2[63:0], src3[63:0], imm8[7:0]) { tsrc[63:0] := ((src2[62:52] == 0) AND (MXCSR.DAZ == 1)) ? 0.0 : src2[63:0] CASE(tsrc[63:0]) OF QNAN_TOKEN: j := 0 SNAN_TOKEN: j := 1 ZERO_VALUE_TOKEN: j := 2 ONE_VALUE_TOKEN: j := 3 NEG_INF_TOKEN: j := 4 POS_INF_TOKEN: j := 5 NEG_VALUE_TOKEN: j := 6 POS_VALUE_TOKEN: j := 7 ESAC token_response[3:0] := src3[3+4*j:4*j] CASE(token_response[3:0]) OF 0 : dest[63:0] := src1[63:0] 1 : dest[63:0] := tsrc[63:0] 2 : dest[63:0] := QNaN(tsrc[63:0]) 3 : dest[63:0] := QNAN_Indefinite 4 : dest[63:0] := -INF 5 : dest[63:0] := +INF 6 : dest[63:0] := tsrc.sign? -INF : +INF 7 : dest[63:0] := -0 8 : dest[63:0] := +0 9 : dest[63:0] := -1 10: dest[63:0] := +1 11: dest[63:0] := 1/2 12: dest[63:0] := 90.0 13: dest[63:0] := PI/2 14: dest[63:0] := MAX_FLOAT 15: dest[63:0] := -MAX_FLOAT ESAC CASE(tsrc[63:0]) OF ZERO_VALUE_TOKEN: IF (imm8[0]) #ZE; FI ZERO_VALUE_TOKEN: IF (imm8[1]) #IE; FI ONE_VALUE_TOKEN: IF (imm8[2]) #ZE; FI ONE_VALUE_TOKEN: IF (imm8[3]) #IE; FI SNAN_TOKEN: IF (imm8[4]) #IE; FI NEG_INF_TOKEN: IF (imm8[5]) #IE; FI NEG_VALUE_TOKEN: IF (imm8[6]) #IE; FI POS_INF_TOKEN: IF (imm8[7]) #IE; FI ESAC RETURN dest[63:0] } FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := FIXUPIMMPD(a[i+63:i], b[i+63:i], c[i+63:i], imm8[7:0]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Miscellaneous Fix up packed double-precision (64-bit) floating-point elements in "a" and "b" using packed 64-bit integers in "c", and store the results in "dst". "imm8" is used to set the required flags reporting. enum TOKEN_TYPE { QNAN_TOKEN := 0, \ SNAN_TOKEN := 1, \ ZERO_VALUE_TOKEN := 2, \ ONE_VALUE_TOKEN := 3, \ NEG_INF_TOKEN := 4, \ POS_INF_TOKEN := 5, \ NEG_VALUE_TOKEN := 6, \ POS_VALUE_TOKEN := 7 } DEFINE FIXUPIMMPD(src1[63:0], src2[63:0], src3[63:0], imm8[7:0]) { tsrc[63:0] := ((src2[62:52] == 0) AND (MXCSR.DAZ == 1)) ? 0.0 : src2[63:0] CASE(tsrc[63:0]) OF QNAN_TOKEN: j := 0 SNAN_TOKEN: j := 1 ZERO_VALUE_TOKEN: j := 2 ONE_VALUE_TOKEN: j := 3 NEG_INF_TOKEN: j := 4 POS_INF_TOKEN: j := 5 NEG_VALUE_TOKEN: j := 6 POS_VALUE_TOKEN: j := 7 ESAC token_response[3:0] := src3[3+4*j:4*j] CASE(token_response[3:0]) OF 0 : dest[63:0] := src1[63:0] 1 : dest[63:0] := tsrc[63:0] 2 : dest[63:0] := QNaN(tsrc[63:0]) 3 : dest[63:0] := QNAN_Indefinite 4 : dest[63:0] := -INF 5 : dest[63:0] := +INF 6 : dest[63:0] := tsrc.sign? -INF : +INF 7 : dest[63:0] := -0 8 : dest[63:0] := +0 9 : dest[63:0] := -1 10: dest[63:0] := +1 11: dest[63:0] := 1/2 12: dest[63:0] := 90.0 13: dest[63:0] := PI/2 14: dest[63:0] := MAX_FLOAT 15: dest[63:0] := -MAX_FLOAT ESAC CASE(tsrc[63:0]) OF ZERO_VALUE_TOKEN: IF (imm8[0]) #ZE; FI ZERO_VALUE_TOKEN: IF (imm8[1]) #IE; FI ONE_VALUE_TOKEN: IF (imm8[2]) #ZE; FI ONE_VALUE_TOKEN: IF (imm8[3]) #IE; FI SNAN_TOKEN: IF (imm8[4]) #IE; FI NEG_INF_TOKEN: IF (imm8[5]) #IE; FI NEG_VALUE_TOKEN: IF (imm8[6]) #IE; FI POS_INF_TOKEN: IF (imm8[7]) #IE; FI ESAC RETURN dest[63:0] } FOR j := 0 to 1 i := j*64 dst[i+63:i] := FIXUPIMMPD(a[i+63:i], b[i+63:i], c[i+63:i], imm8[7:0]) ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Miscellaneous Fix up packed double-precision (64-bit) floating-point elements in "a" and "b" using packed 64-bit integers in "c", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). "imm8" is used to set the required flags reporting. enum TOKEN_TYPE { QNAN_TOKEN := 0, \ SNAN_TOKEN := 1, \ ZERO_VALUE_TOKEN := 2, \ ONE_VALUE_TOKEN := 3, \ NEG_INF_TOKEN := 4, \ POS_INF_TOKEN := 5, \ NEG_VALUE_TOKEN := 6, \ POS_VALUE_TOKEN := 7 } DEFINE FIXUPIMMPD(src1[63:0], src2[63:0], src3[63:0], imm8[7:0]) { tsrc[63:0] := ((src2[62:52] == 0) AND (MXCSR.DAZ == 1)) ? 0.0 : src2[63:0] CASE(tsrc[63:0]) OF QNAN_TOKEN: j := 0 SNAN_TOKEN: j := 1 ZERO_VALUE_TOKEN: j := 2 ONE_VALUE_TOKEN: j := 3 NEG_INF_TOKEN: j := 4 POS_INF_TOKEN: j := 5 NEG_VALUE_TOKEN: j := 6 POS_VALUE_TOKEN: j := 7 ESAC token_response[3:0] := src3[3+4*j:4*j] CASE(token_response[3:0]) OF 0 : dest[63:0] := src1[63:0] 1 : dest[63:0] := tsrc[63:0] 2 : dest[63:0] := QNaN(tsrc[63:0]) 3 : dest[63:0] := QNAN_Indefinite 4 : dest[63:0] := -INF 5 : dest[63:0] := +INF 6 : dest[63:0] := tsrc.sign? -INF : +INF 7 : dest[63:0] := -0 8 : dest[63:0] := +0 9 : dest[63:0] := -1 10: dest[63:0] := +1 11: dest[63:0] := 1/2 12: dest[63:0] := 90.0 13: dest[63:0] := PI/2 14: dest[63:0] := MAX_FLOAT 15: dest[63:0] := -MAX_FLOAT ESAC CASE(tsrc[63:0]) OF ZERO_VALUE_TOKEN: IF (imm8[0]) #ZE; FI ZERO_VALUE_TOKEN: IF (imm8[1]) #IE; FI ONE_VALUE_TOKEN: IF (imm8[2]) #ZE; FI ONE_VALUE_TOKEN: IF (imm8[3]) #IE; FI SNAN_TOKEN: IF (imm8[4]) #IE; FI NEG_INF_TOKEN: IF (imm8[5]) #IE; FI NEG_VALUE_TOKEN: IF (imm8[6]) #IE; FI POS_INF_TOKEN: IF (imm8[7]) #IE; FI ESAC RETURN dest[63:0] } FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := FIXUPIMMPD(a[i+63:i], b[i+63:i], c[i+63:i], imm8[7:0]) ELSE dst[i+63:i] := a[i+63:i] FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Miscellaneous Fix up packed double-precision (64-bit) floating-point elements in "a" and "b" using packed 64-bit integers in "c", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). "imm8" is used to set the required flags reporting. enum TOKEN_TYPE { QNAN_TOKEN := 0, \ SNAN_TOKEN := 1, \ ZERO_VALUE_TOKEN := 2, \ ONE_VALUE_TOKEN := 3, \ NEG_INF_TOKEN := 4, \ POS_INF_TOKEN := 5, \ NEG_VALUE_TOKEN := 6, \ POS_VALUE_TOKEN := 7 } DEFINE FIXUPIMMPD(src1[63:0], src2[63:0], src3[63:0], imm8[7:0]) { tsrc[63:0] := ((src2[62:52] == 0) AND (MXCSR.DAZ == 1)) ? 0.0 : src2[63:0] CASE(tsrc[63:0]) OF QNAN_TOKEN: j := 0 SNAN_TOKEN: j := 1 ZERO_VALUE_TOKEN: j := 2 ONE_VALUE_TOKEN: j := 3 NEG_INF_TOKEN: j := 4 POS_INF_TOKEN: j := 5 NEG_VALUE_TOKEN: j := 6 POS_VALUE_TOKEN: j := 7 ESAC token_response[3:0] := src3[3+4*j:4*j] CASE(token_response[3:0]) OF 0 : dest[63:0] := src1[63:0] 1 : dest[63:0] := tsrc[63:0] 2 : dest[63:0] := QNaN(tsrc[63:0]) 3 : dest[63:0] := QNAN_Indefinite 4 : dest[63:0] := -INF 5 : dest[63:0] := +INF 6 : dest[63:0] := tsrc.sign? -INF : +INF 7 : dest[63:0] := -0 8 : dest[63:0] := +0 9 : dest[63:0] := -1 10: dest[63:0] := +1 11: dest[63:0] := 1/2 12: dest[63:0] := 90.0 13: dest[63:0] := PI/2 14: dest[63:0] := MAX_FLOAT 15: dest[63:0] := -MAX_FLOAT ESAC CASE(tsrc[63:0]) OF ZERO_VALUE_TOKEN: IF (imm8[0]) #ZE; FI ZERO_VALUE_TOKEN: IF (imm8[1]) #IE; FI ONE_VALUE_TOKEN: IF (imm8[2]) #ZE; FI ONE_VALUE_TOKEN: IF (imm8[3]) #IE; FI SNAN_TOKEN: IF (imm8[4]) #IE; FI NEG_INF_TOKEN: IF (imm8[5]) #IE; FI NEG_VALUE_TOKEN: IF (imm8[6]) #IE; FI POS_INF_TOKEN: IF (imm8[7]) #IE; FI ESAC RETURN dest[63:0] } FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := FIXUPIMMPD(a[i+63:i], b[i+63:i], c[i+63:i], imm8[7:0]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:128] := 0
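The double-precision fix-up records above work per lane in three steps: classify the element of "b" into one of eight tokens, use the token to pick a 4-bit response from the table in "c", and produce the value that response names. The sketch below models one lane in plain C under stated assumptions. The function names are hypothetical; the classification rules are inferred from the token names because the pseudocode does not spell them out (in particular, ONE_VALUE_TOKEN is assumed to mean exactly +1.0); and the exception reporting controlled by "imm8" is left out.

```c
#include <assert.h>
#include <float.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

enum token { QNAN_TOKEN, SNAN_TOKEN, ZERO_VALUE_TOKEN, ONE_VALUE_TOKEN,
             NEG_INF_TOKEN, POS_INF_TOKEN, NEG_VALUE_TOKEN, POS_VALUE_TOKEN };

/* Bit-level views of a double (hypothetical helpers). */
uint64_t bits_of(double x) { uint64_t u; memcpy(&u, &x, sizeof u); return u; }
double double_of(uint64_t u) { double x; memcpy(&x, &u, sizeof x); return x; }

/* Assumed classification: the source names the tokens but not the rules. */
enum token classify(double x)
{
    uint64_t u = bits_of(x);
    uint64_t exp = (u >> 52) & 0x7FF;
    uint64_t frac = u & 0xFFFFFFFFFFFFFULL;   /* low 52 bits */
    int sign = (int)(u >> 63);
    if (exp == 0x7FF && frac != 0)            /* NaN: top fraction bit = quiet */
        return (frac >> 51) ? QNAN_TOKEN : SNAN_TOKEN;
    if (exp == 0x7FF) return sign ? NEG_INF_TOKEN : POS_INF_TOKEN;
    if (exp == 0 && frac == 0) return ZERO_VALUE_TOKEN;
    if (u == 0x3FF0000000000000ULL) return ONE_VALUE_TOKEN;  /* +1.0 (assumed) */
    return sign ? NEG_VALUE_TOKEN : POS_VALUE_TOKEN;
}

/* One lane of the fix-up: a = src1, b = src2, c = src3 (response table).
 * daz stands in for MXCSR.DAZ. Flag reporting via imm8 is not modelled. */
double fixupimm_pd_lane(double a, double b, uint64_t c, int daz)
{
    /* With DAZ set, an input whose exponent field is zero becomes 0.0. */
    double tsrc = (daz && ((bits_of(b) >> 52) & 0x7FF) == 0) ? 0.0 : b;
    int j = (int)classify(tsrc);
    assert(j >= 0 && j <= 7);
    unsigned response = (unsigned)(c >> (4 * j)) & 0xF;  /* src3[3+4*j:4*j] */
    switch (response) {
    case 0:  return a;
    case 1:  return tsrc;
    case 2:  return double_of(bits_of(tsrc) | (1ULL << 51)); /* for NaN input */
    case 3:  return double_of(0xFFF8000000000000ULL);  /* QNaN indefinite */
    case 4:  return -INFINITY;
    case 5:  return INFINITY;
    case 6:  return (bits_of(tsrc) >> 63) ? -INFINITY : INFINITY;
    case 7:  return -0.0;
    case 8:  return 0.0;
    case 9:  return -1.0;
    case 10: return 1.0;
    case 11: return 0.5;
    case 12: return 90.0;
    case 13: return 1.5707963267948966;  /* PI/2 */
    case 14: return DBL_MAX;
    default: return -DBL_MAX;            /* response 15 */
    }
}
```

In this model, a table value of `0x0800050A` maps QNaN inputs to +1, zero inputs to +INF, and negative finite inputs to +0, and leaves every other class at the value from "a". The single-precision records use the same table layout with 32-bit lanes.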
Floating Point AVX512VL AVX512F Miscellaneous Fix up packed single-precision (32-bit) floating-point elements in "a" and "b" using packed 32-bit integers in "c", and store the results in "dst". "imm8" is used to set the required flags reporting. enum TOKEN_TYPE { QNAN_TOKEN := 0, \ SNAN_TOKEN := 1, \ ZERO_VALUE_TOKEN := 2, \ ONE_VALUE_TOKEN := 3, \ NEG_INF_TOKEN := 4, \ POS_INF_TOKEN := 5, \ NEG_VALUE_TOKEN := 6, \ POS_VALUE_TOKEN := 7 } DEFINE FIXUPIMMPD(src1[31:0], src2[31:0], src3[31:0], imm8[7:0]) { tsrc[31:0] := ((src2[30:23] == 0) AND (MXCSR.DAZ == 1)) ? 0.0 : src2[31:0] CASE(tsrc[31:0]) OF QNAN_TOKEN:j := 0 SNAN_TOKEN:j := 1 ZERO_VALUE_TOKEN: j := 2 ONE_VALUE_TOKEN: j := 3 NEG_INF_TOKEN: j := 4 POS_INF_TOKEN: j := 5 NEG_VALUE_TOKEN: j := 6 POS_VALUE_TOKEN: j := 7 ESAC token_response[3:0] := src3[3+4*j:4*j] CASE(token_response[3:0]) OF 0 : dest[31:0] := src1[31:0] 1 : dest[31:0] := tsrc[31:0] 2 : dest[31:0] := QNaN(tsrc[31:0]) 3 : dest[31:0] := QNAN_Indefinite 4 : dest[31:0] := -INF 5 : dest[31:0] := +INF 6 : dest[31:0] := tsrc.sign? -INF : +INF 7 : dest[31:0] := -0 8 : dest[31:0] := +0 9 : dest[31:0] := -1 10: dest[31:0] := +1 11: dest[31:0] := 1/2 12: dest[31:0] := 90.0 13: dest[31:0] := PI/2 14: dest[31:0] := MAX_FLOAT 15: dest[31:0] := -MAX_FLOAT ESAC CASE(tsrc[31:0]) OF ZERO_VALUE_TOKEN: IF (imm8[0]) #ZE; FI ZERO_VALUE_TOKEN: IF (imm8[1]) #IE; FI ONE_VALUE_TOKEN: IF (imm8[2]) #ZE; FI ONE_VALUE_TOKEN: IF (imm8[3]) #IE; FI SNAN_TOKEN: IF (imm8[4]) #IE; FI NEG_INF_TOKEN: IF (imm8[5]) #IE; FI NEG_VALUE_TOKEN: IF (imm8[6]) #IE; FI POS_INF_TOKEN: IF (imm8[7]) #IE; FI ESAC RETURN dest[31:0] } FOR j := 0 to 7 i := j*32 dst[i+31:i] := FIXUPIMMPD(a[i+31:i], b[i+31:i], c[i+31:i], imm8[7:0]) ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Miscellaneous Fix up packed single-precision (32-bit) floating-point elements in "a" and "b" using packed 32-bit integers in "c", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). "imm8" is used to set the required flags reporting. enum TOKEN_TYPE { QNAN_TOKEN := 0, \ SNAN_TOKEN := 1, \ ZERO_VALUE_TOKEN := 2, \ ONE_VALUE_TOKEN := 3, \ NEG_INF_TOKEN := 4, \ POS_INF_TOKEN := 5, \ NEG_VALUE_TOKEN := 6, \ POS_VALUE_TOKEN := 7 } DEFINE FIXUPIMMPD(src1[31:0], src2[31:0], src3[31:0], imm8[7:0]) { tsrc[31:0] := ((src2[30:23] == 0) AND (MXCSR.DAZ == 1)) ? 0.0 : src2[31:0] CASE(tsrc[31:0]) OF QNAN_TOKEN:j := 0 SNAN_TOKEN:j := 1 ZERO_VALUE_TOKEN: j := 2 ONE_VALUE_TOKEN: j := 3 NEG_INF_TOKEN: j := 4 POS_INF_TOKEN: j := 5 NEG_VALUE_TOKEN: j := 6 POS_VALUE_TOKEN: j := 7 ESAC token_response[3:0] := src3[3+4*j:4*j] CASE(token_response[3:0]) OF 0 : dest[31:0] := src1[31:0] 1 : dest[31:0] := tsrc[31:0] 2 : dest[31:0] := QNaN(tsrc[31:0]) 3 : dest[31:0] := QNAN_Indefinite 4 : dest[31:0] := -INF 5 : dest[31:0] := +INF 6 : dest[31:0] := tsrc.sign? -INF : +INF 7 : dest[31:0] := -0 8 : dest[31:0] := +0 9 : dest[31:0] := -1 10: dest[31:0] := +1 11: dest[31:0] := 1/2 12: dest[31:0] := 90.0 13: dest[31:0] := PI/2 14: dest[31:0] := MAX_FLOAT 15: dest[31:0] := -MAX_FLOAT ESAC CASE(tsrc[31:0]) OF ZERO_VALUE_TOKEN: IF (imm8[0]) #ZE; FI ZERO_VALUE_TOKEN: IF (imm8[1]) #IE; FI ONE_VALUE_TOKEN: IF (imm8[2]) #ZE; FI ONE_VALUE_TOKEN: IF (imm8[3]) #IE; FI SNAN_TOKEN: IF (imm8[4]) #IE; FI NEG_INF_TOKEN: IF (imm8[5]) #IE; FI NEG_VALUE_TOKEN: IF (imm8[6]) #IE; FI POS_INF_TOKEN: IF (imm8[7]) #IE; FI ESAC RETURN dest[31:0] } FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := FIXUPIMMPD(a[i+31:i], b[i+31:i], c[i+31:i], imm8[7:0]) ELSE dst[i+31:i] := a[i+31:i] FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Miscellaneous Fix up packed single-precision (32-bit) floating-point elements in "a" and "b" using packed 32-bit integers in "c", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). "imm8" is used to set the required flags reporting. enum TOKEN_TYPE { QNAN_TOKEN := 0, \ SNAN_TOKEN := 1, \ ZERO_VALUE_TOKEN := 2, \ ONE_VALUE_TOKEN := 3, \ NEG_INF_TOKEN := 4, \ POS_INF_TOKEN := 5, \ NEG_VALUE_TOKEN := 6, \ POS_VALUE_TOKEN := 7 } DEFINE FIXUPIMMPD(src1[31:0], src2[31:0], src3[31:0], imm8[7:0]) { tsrc[31:0] := ((src2[30:23] == 0) AND (MXCSR.DAZ == 1)) ? 0.0 : src2[31:0] CASE(tsrc[31:0]) OF QNAN_TOKEN:j := 0 SNAN_TOKEN:j := 1 ZERO_VALUE_TOKEN: j := 2 ONE_VALUE_TOKEN: j := 3 NEG_INF_TOKEN: j := 4 POS_INF_TOKEN: j := 5 NEG_VALUE_TOKEN: j := 6 POS_VALUE_TOKEN: j := 7 ESAC token_response[3:0] := src3[3+4*j:4*j] CASE(token_response[3:0]) OF 0 : dest[31:0] := src1[31:0] 1 : dest[31:0] := tsrc[31:0] 2 : dest[31:0] := QNaN(tsrc[31:0]) 3 : dest[31:0] := QNAN_Indefinite 4 : dest[31:0] := -INF 5 : dest[31:0] := +INF 6 : dest[31:0] := tsrc.sign? -INF : +INF 7 : dest[31:0] := -0 8 : dest[31:0] := +0 9 : dest[31:0] := -1 10: dest[31:0] := +1 11: dest[31:0] := 1/2 12: dest[31:0] := 90.0 13: dest[31:0] := PI/2 14: dest[31:0] := MAX_FLOAT 15: dest[31:0] := -MAX_FLOAT ESAC CASE(tsrc[31:0]) OF ZERO_VALUE_TOKEN: IF (imm8[0]) #ZE; FI ZERO_VALUE_TOKEN: IF (imm8[1]) #IE; FI ONE_VALUE_TOKEN: IF (imm8[2]) #ZE; FI ONE_VALUE_TOKEN: IF (imm8[3]) #IE; FI SNAN_TOKEN: IF (imm8[4]) #IE; FI NEG_INF_TOKEN: IF (imm8[5]) #IE; FI NEG_VALUE_TOKEN: IF (imm8[6]) #IE; FI POS_INF_TOKEN: IF (imm8[7]) #IE; FI ESAC RETURN dest[31:0] } FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := FIXUPIMMPD(a[i+31:i], b[i+31:i], c[i+31:i], imm8[7:0]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Miscellaneous Fix up packed single-precision (32-bit) floating-point elements in "a" and "b" using packed 32-bit integers in "c", and store the results in "dst". "imm8" is used to set the required flags reporting. enum TOKEN_TYPE { QNAN_TOKEN := 0, \ SNAN_TOKEN := 1, \ ZERO_VALUE_TOKEN := 2, \ ONE_VALUE_TOKEN := 3, \ NEG_INF_TOKEN := 4, \ POS_INF_TOKEN := 5, \ NEG_VALUE_TOKEN := 6, \ POS_VALUE_TOKEN := 7 } DEFINE FIXUPIMMPD(src1[31:0], src2[31:0], src3[31:0], imm8[7:0]) { tsrc[31:0] := ((src2[30:23] == 0) AND (MXCSR.DAZ == 1)) ? 0.0 : src2[31:0] CASE(tsrc[31:0]) OF QNAN_TOKEN:j := 0 SNAN_TOKEN:j := 1 ZERO_VALUE_TOKEN: j := 2 ONE_VALUE_TOKEN: j := 3 NEG_INF_TOKEN: j := 4 POS_INF_TOKEN: j := 5 NEG_VALUE_TOKEN: j := 6 POS_VALUE_TOKEN: j := 7 ESAC token_response[3:0] := src3[3+4*j:4*j] CASE(token_response[3:0]) OF 0 : dest[31:0] := src1[31:0] 1 : dest[31:0] := tsrc[31:0] 2 : dest[31:0] := QNaN(tsrc[31:0]) 3 : dest[31:0] := QNAN_Indefinite 4 : dest[31:0] := -INF 5 : dest[31:0] := +INF 6 : dest[31:0] := tsrc.sign? -INF : +INF 7 : dest[31:0] := -0 8 : dest[31:0] := +0 9 : dest[31:0] := -1 10: dest[31:0] := +1 11: dest[31:0] := 1/2 12: dest[31:0] := 90.0 13: dest[31:0] := PI/2 14: dest[31:0] := MAX_FLOAT 15: dest[31:0] := -MAX_FLOAT ESAC CASE(tsrc[31:0]) OF ZERO_VALUE_TOKEN: IF (imm8[0]) #ZE; FI ZERO_VALUE_TOKEN: IF (imm8[1]) #IE; FI ONE_VALUE_TOKEN: IF (imm8[2]) #ZE; FI ONE_VALUE_TOKEN: IF (imm8[3]) #IE; FI SNAN_TOKEN: IF (imm8[4]) #IE; FI NEG_INF_TOKEN: IF (imm8[5]) #IE; FI NEG_VALUE_TOKEN: IF (imm8[6]) #IE; FI POS_INF_TOKEN: IF (imm8[7]) #IE; FI ESAC RETURN dest[31:0] } FOR j := 0 to 3 i := j*32 dst[i+31:i] := FIXUPIMMPD(a[i+31:i], b[i+31:i], c[i+31:i], imm8[7:0]) ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Miscellaneous Fix up packed single-precision (32-bit) floating-point elements in "a" and "b" using packed 32-bit integers in "c", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). "imm8" is used to set the required flags reporting. enum TOKEN_TYPE { QNAN_TOKEN := 0, \ SNAN_TOKEN := 1, \ ZERO_VALUE_TOKEN := 2, \ ONE_VALUE_TOKEN := 3, \ NEG_INF_TOKEN := 4, \ POS_INF_TOKEN := 5, \ NEG_VALUE_TOKEN := 6, \ POS_VALUE_TOKEN := 7 } DEFINE FIXUPIMMPD(src1[31:0], src2[31:0], src3[31:0], imm8[7:0]) { tsrc[31:0] := ((src2[30:23] == 0) AND (MXCSR.DAZ == 1)) ? 0.0 : src2[31:0] CASE(tsrc[31:0]) OF QNAN_TOKEN:j := 0 SNAN_TOKEN:j := 1 ZERO_VALUE_TOKEN: j := 2 ONE_VALUE_TOKEN: j := 3 NEG_INF_TOKEN: j := 4 POS_INF_TOKEN: j := 5 NEG_VALUE_TOKEN: j := 6 POS_VALUE_TOKEN: j := 7 ESAC token_response[3:0] := src3[3+4*j:4*j] CASE(token_response[3:0]) OF 0 : dest[31:0] := src1[31:0] 1 : dest[31:0] := tsrc[31:0] 2 : dest[31:0] := QNaN(tsrc[31:0]) 3 : dest[31:0] := QNAN_Indefinite 4 : dest[31:0] := -INF 5 : dest[31:0] := +INF 6 : dest[31:0] := tsrc.sign? -INF : +INF 7 : dest[31:0] := -0 8 : dest[31:0] := +0 9 : dest[31:0] := -1 10: dest[31:0] := +1 11: dest[31:0] := 1/2 12: dest[31:0] := 90.0 13: dest[31:0] := PI/2 14: dest[31:0] := MAX_FLOAT 15: dest[31:0] := -MAX_FLOAT ESAC CASE(tsrc[31:0]) OF ZERO_VALUE_TOKEN: IF (imm8[0]) #ZE; FI ZERO_VALUE_TOKEN: IF (imm8[1]) #IE; FI ONE_VALUE_TOKEN: IF (imm8[2]) #ZE; FI ONE_VALUE_TOKEN: IF (imm8[3]) #IE; FI SNAN_TOKEN: IF (imm8[4]) #IE; FI NEG_INF_TOKEN: IF (imm8[5]) #IE; FI NEG_VALUE_TOKEN: IF (imm8[6]) #IE; FI POS_INF_TOKEN: IF (imm8[7]) #IE; FI ESAC RETURN dest[31:0] } FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := FIXUPIMMPD(a[i+31:i], b[i+31:i], c[i+31:i], imm8[7:0]) ELSE dst[i+31:i] := a[i+31:i] FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Miscellaneous Fix up packed single-precision (32-bit) floating-point elements in "a" and "b" using packed 32-bit integers in "c", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). "imm8" is used to set the required flags reporting. enum TOKEN_TYPE { QNAN_TOKEN := 0, \ SNAN_TOKEN := 1, \ ZERO_VALUE_TOKEN := 2, \ ONE_VALUE_TOKEN := 3, \ NEG_INF_TOKEN := 4, \ POS_INF_TOKEN := 5, \ NEG_VALUE_TOKEN := 6, \ POS_VALUE_TOKEN := 7 } DEFINE FIXUPIMMPD(src1[31:0], src2[31:0], src3[31:0], imm8[7:0]) { tsrc[31:0] := ((src2[30:23] == 0) AND (MXCSR.DAZ == 1)) ? 0.0 : src2[31:0] CASE(tsrc[31:0]) OF QNAN_TOKEN:j := 0 SNAN_TOKEN:j := 1 ZERO_VALUE_TOKEN: j := 2 ONE_VALUE_TOKEN: j := 3 NEG_INF_TOKEN: j := 4 POS_INF_TOKEN: j := 5 NEG_VALUE_TOKEN: j := 6 POS_VALUE_TOKEN: j := 7 ESAC token_response[3:0] := src3[3+4*j:4*j] CASE(token_response[3:0]) OF 0 : dest[31:0] := src1[31:0] 1 : dest[31:0] := tsrc[31:0] 2 : dest[31:0] := QNaN(tsrc[31:0]) 3 : dest[31:0] := QNAN_Indefinite 4 : dest[31:0] := -INF 5 : dest[31:0] := +INF 6 : dest[31:0] := tsrc.sign? -INF : +INF 7 : dest[31:0] := -0 8 : dest[31:0] := +0 9 : dest[31:0] := -1 10: dest[31:0] := +1 11: dest[31:0] := 1/2 12: dest[31:0] := 90.0 13: dest[31:0] := PI/2 14: dest[31:0] := MAX_FLOAT 15: dest[31:0] := -MAX_FLOAT ESAC CASE(tsrc[31:0]) OF ZERO_VALUE_TOKEN: IF (imm8[0]) #ZE; FI ZERO_VALUE_TOKEN: IF (imm8[1]) #IE; FI ONE_VALUE_TOKEN: IF (imm8[2]) #ZE; FI ONE_VALUE_TOKEN: IF (imm8[3]) #IE; FI SNAN_TOKEN: IF (imm8[4]) #IE; FI NEG_INF_TOKEN: IF (imm8[5]) #IE; FI NEG_VALUE_TOKEN: IF (imm8[6]) #IE; FI POS_INF_TOKEN: IF (imm8[7]) #IE; FI ESAC RETURN dest[31:0] } FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := FIXUPIMMPD(a[i+31:i], b[i+31:i], c[i+31:i], imm8[7:0]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Arithmetic Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i] ELSE dst[i+63:i] := c[i+63:i] FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Arithmetic Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i] ELSE dst[i+63:i] := a[i+63:i] FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Arithmetic Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Arithmetic Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i] ELSE dst[i+63:i] := c[i+63:i] FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Arithmetic Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i] ELSE dst[i+63:i] := a[i+63:i] FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Arithmetic Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Arithmetic Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i] ELSE dst[i+31:i] := c[i+31:i] FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Arithmetic Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i] ELSE dst[i+31:i] := a[i+31:i] FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Arithmetic Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Arithmetic Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i] ELSE dst[i+31:i] := c[i+31:i] FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Arithmetic Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i] ELSE dst[i+31:i] := a[i+31:i] FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Arithmetic Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:128] := 0
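The multiply-add records above differ only in what an element receives when its mask bit is clear: a copy from "a", a copy from "c", or zero. A minimal Python sketch of that loop follows, assuming plain lists of floats stand in for the vector registers. The name `masked_fmadd` and its `fallback` parameter are hypothetical, and the sketch rounds twice (multiply, then add) where a fused operation would round once.

```python
def masked_fmadd(a, b, c, k, fallback):
    """Scalar model of the masked packed multiply-add loop.

    a, b, c:  equal-length lists of floats, one entry per element.
    k:        integer mask; bit j controls element j.
    fallback: list copied into inactive elements, or None to zero them.
    """
    out = []
    for j in range(len(a)):
        if (k >> j) & 1:
            # Not fused: Python rounds the product before the add.
            out.append(a[j] * b[j] + c[j])
        elif fallback is None:
            out.append(0.0)
        else:
            out.append(fallback[j])
    return out

a = [1.0, 2.0, 3.0, 4.0]
b = [10.0, 10.0, 10.0, 10.0]
c = [0.5, 0.5, 0.5, 0.5]
k = 0b0101  # elements 0 and 2 are active

merge_a = masked_fmadd(a, b, c, k, fallback=a)     # writemask, copy from "a"
merge_c = masked_fmadd(a, b, c, k, fallback=c)     # writemask, copy from "c"
zeroed = masked_fmadd(a, b, c, k, fallback=None)   # zeromask
```

With these inputs the active elements are 10.5 and 30.5 in all three results, while elements 1 and 3 come from "a", from "c", or are zero.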
Floating Point AVX512VL AVX512F Arithmetic Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", alternately add and subtract packed elements in "c" to/from the intermediate result (subtracting in even-indexed elements, adding in odd-indexed elements), and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k[j] IF ((j & 1) == 0) dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i] ELSE dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i] FI ELSE dst[i+63:i] := c[i+63:i] FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Arithmetic Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", alternately add and subtract packed elements in "c" to/from the intermediate result (subtracting in even-indexed elements, adding in odd-indexed elements), and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k[j] IF ((j & 1) == 0) dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i] ELSE dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i] FI ELSE dst[i+63:i] := a[i+63:i] FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Arithmetic Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", alternately add and subtract packed elements in "c" to/from the intermediate result (subtracting in even-indexed elements, adding in odd-indexed elements), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k[j] IF ((j & 1) == 0) dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i] ELSE dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i] FI ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Arithmetic Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", alternately add and subtract packed elements in "c" to/from the intermediate result (subtracting in even-indexed elements, adding in odd-indexed elements), and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k[j] IF ((j & 1) == 0) dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i] ELSE dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i] FI ELSE dst[i+63:i] := c[i+63:i] FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Arithmetic Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", alternately add and subtract packed elements in "c" to/from the intermediate result (subtracting in even-indexed elements, adding in odd-indexed elements), and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k[j] IF ((j & 1) == 0) dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i] ELSE dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i] FI ELSE dst[i+63:i] := a[i+63:i] FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Arithmetic Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", alternately add and subtract packed elements in "c" to/from the intermediate result (subtracting in even-indexed elements, adding in odd-indexed elements), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k[j] IF ((j & 1) == 0) dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i] ELSE dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i] FI ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Arithmetic Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", alternately add and subtract packed elements in "c" to/from the intermediate result (subtracting in even-indexed elements, adding in odd-indexed elements), and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 IF k[j] IF ((j & 1) == 0) dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i] ELSE dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i] FI ELSE dst[i+31:i] := c[i+31:i] FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Arithmetic Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", alternately add and subtract packed elements in "c" to/from the intermediate result (subtracting in even-indexed elements, adding in odd-indexed elements), and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 IF k[j] IF ((j & 1) == 0) dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i] ELSE dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i] FI ELSE dst[i+31:i] := a[i+31:i] FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Arithmetic Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", alternately add and subtract packed elements in "c" to/from the intermediate result (subtracting in even-indexed elements, adding in odd-indexed elements), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 IF k[j] IF ((j & 1) == 0) dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i] ELSE dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i] FI ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Arithmetic Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", alternately add and subtract packed elements in "c" to/from the intermediate result (subtracting in even-indexed elements, adding in odd-indexed elements), and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 IF k[j] IF ((j & 1) == 0) dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i] ELSE dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i] FI ELSE dst[i+31:i] := c[i+31:i] FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Arithmetic Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", alternately add and subtract packed elements in "c" to/from the intermediate result (subtracting in even-indexed elements, adding in odd-indexed elements), and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 IF k[j] IF ((j & 1) == 0) dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i] ELSE dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i] FI ELSE dst[i+31:i] := a[i+31:i] FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Arithmetic Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", alternately add and subtract packed elements in "c" to/from the intermediate result (subtracting in even-indexed elements, adding in odd-indexed elements), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 IF k[j] IF ((j & 1) == 0) dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i] ELSE dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i] FI ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:128] := 0
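In the add/subtract records above, the element index decides the sign applied to "c": the pseudocode subtracts when j is even and adds when j is odd. The Python sketch below models the zeromask form on plain lists. The function name `fmaddsub_maskz` is hypothetical, and the arithmetic is not fused, so it matches the hardware only when the product is exactly representable.

```python
def fmaddsub_maskz(a, b, c, k):
    """Scalar model: (a*b) - c in even elements, (a*b) + c in odd ones.

    Elements whose mask bit in k is clear are zeroed.
    """
    out = []
    for j in range(len(a)):
        if not (k >> j) & 1:
            out.append(0.0)
        elif (j & 1) == 0:
            out.append(a[j] * b[j] - c[j])
        else:
            out.append(a[j] * b[j] + c[j])
    return out

a = [1.0, 2.0, 3.0, 4.0]
b = [2.0, 2.0, 2.0, 2.0]
c = [1.0, 1.0, 1.0, 1.0]

full = fmaddsub_maskz(a, b, c, 0b1111)     # all elements active
partial = fmaddsub_maskz(a, b, c, 0b1110)  # element 0 zeroed
```

The products are 2, 4, 6 and 8, so `full` is `[1.0, 5.0, 5.0, 9.0]` and `partial` replaces the first entry with 0.0.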
Floating Point AVX512VL AVX512F Arithmetic Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i] ELSE dst[i+63:i] := c[i+63:i] FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Arithmetic Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i] ELSE dst[i+63:i] := a[i+63:i] FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Arithmetic Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Arithmetic Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i] ELSE dst[i+63:i] := c[i+63:i] FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Arithmetic Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i] ELSE dst[i+63:i] := a[i+63:i] FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Arithmetic Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Arithmetic Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i] ELSE dst[i+31:i] := c[i+31:i] FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Arithmetic Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i] ELSE dst[i+31:i] := a[i+31:i] FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Arithmetic Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Arithmetic Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i] ELSE dst[i+31:i] := c[i+31:i] FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Arithmetic Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i] ELSE dst[i+31:i] := a[i+31:i] FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Arithmetic Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Arithmetic Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", alternately subtract and add packed elements in "c" from/to the intermediate result (adding in even-indexed elements, subtracting in odd-indexed elements), and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k[j] IF ((j & 1) == 0) dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i] ELSE dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i] FI ELSE dst[i+63:i] := c[i+63:i] FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Arithmetic Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", alternately subtract and add packed elements in "c" from/to the intermediate result (adding in even-indexed elements, subtracting in odd-indexed elements), and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k[j] IF ((j & 1) == 0) dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i] ELSE dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i] FI ELSE dst[i+63:i] := a[i+63:i] FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Arithmetic Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", alternately subtract and add packed elements in "c" from/to the intermediate result (adding in even-indexed elements, subtracting in odd-indexed elements), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k[j] IF ((j & 1) == 0) dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i] ELSE dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i] FI ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Arithmetic Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", alternately subtract and add packed elements in "c" from/to the intermediate result (adding in even-indexed elements, subtracting in odd-indexed elements), and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k[j] IF ((j & 1) == 0) dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i] ELSE dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i] FI ELSE dst[i+63:i] := c[i+63:i] FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Arithmetic Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", alternately subtract and add packed elements in "c" from/to the intermediate result (adding in even-indexed elements, subtracting in odd-indexed elements), and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k[j] IF ((j & 1) == 0) dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i] ELSE dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i] FI ELSE dst[i+63:i] := a[i+63:i] FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Arithmetic Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", alternately subtract and add packed elements in "c" from/to the intermediate result (adding in even-indexed elements, subtracting in odd-indexed elements), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k[j] IF ((j & 1) == 0) dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i] ELSE dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i] FI ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Arithmetic Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", alternately subtract and add packed elements in "c" from/to the intermediate result (adding in even-indexed elements, subtracting in odd-indexed elements), and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 IF k[j] IF ((j & 1) == 0) dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i] ELSE dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i] FI ELSE dst[i+31:i] := c[i+31:i] FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Arithmetic Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", alternately subtract and add packed elements in "c" from/to the intermediate result (adding in even-indexed elements, subtracting in odd-indexed elements), and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 IF k[j] IF ((j & 1) == 0) dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i] ELSE dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i] FI ELSE dst[i+31:i] := a[i+31:i] FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Arithmetic Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", alternately subtract and add packed elements in "c" from/to the intermediate result (adding in even-indexed elements, subtracting in odd-indexed elements), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 IF k[j] IF ((j & 1) == 0) dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i] ELSE dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i] FI ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Arithmetic Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", alternately subtract and add packed elements in "c" from/to the intermediate result (adding in even-indexed elements, subtracting in odd-indexed elements), and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 IF k[j] IF ((j & 1) == 0) dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i] ELSE dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i] FI ELSE dst[i+31:i] := c[i+31:i] FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Arithmetic Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", alternately subtract and add packed elements in "c" from/to the intermediate result (adding in even-indexed elements, subtracting in odd-indexed elements), and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 IF k[j] IF ((j & 1) == 0) dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i] ELSE dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i] FI ELSE dst[i+31:i] := a[i+31:i] FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Arithmetic Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", alternately subtract and add packed elements in "c" from/to the intermediate result (adding in even-indexed elements, subtracting in odd-indexed elements), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 IF k[j] IF ((j & 1) == 0) dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i] ELSE dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i] FI ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:128] := 0
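The subtract/add records above mirror the add/subtract ones with the per-element signs swapped: "c" is added when j is even and subtracted when j is odd. Because IEEE 754 subtraction is addition of the negated operand, the subtract/add result should equal the add/subtract result computed with "c" negated. The Python sketch below checks this on unmasked lists; both function names are hypothetical and the arithmetic is not fused.

```python
def fmaddsub(a, b, c):
    """Unmasked model: subtract c in even elements, add c in odd ones."""
    return [a[j] * b[j] - c[j] if (j & 1) == 0 else a[j] * b[j] + c[j]
            for j in range(len(a))]

def fmsubadd(a, b, c):
    """Unmasked model: add c in even elements, subtract c in odd ones."""
    return [a[j] * b[j] + c[j] if (j & 1) == 0 else a[j] * b[j] - c[j]
            for j in range(len(a))]

a = [1.5, 2.0, 3.0, 4.0]
b = [2.0, 2.0, 2.0, 2.0]
c = [1.0, 1.0, 1.0, 1.0]

direct = fmsubadd(a, b, c)
via_negated_c = fmaddsub(a, b, [-x for x in c])
```

Both `direct` and `via_negated_c` come out as `[4.0, 3.0, 7.0, 7.0]` for these inputs.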
Floating Point AVX512VL AVX512F Arithmetic Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := -(a[i+63:i] * b[i+63:i]) + c[i+63:i] ELSE dst[i+63:i] := c[i+63:i] FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Arithmetic Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := -(a[i+63:i] * b[i+63:i]) + c[i+63:i] ELSE dst[i+63:i] := a[i+63:i] FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Arithmetic Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := -(a[i+63:i] * b[i+63:i]) + c[i+63:i] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Arithmetic Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := -(a[i+63:i] * b[i+63:i]) + c[i+63:i] ELSE dst[i+63:i] := c[i+63:i] FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Arithmetic Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := -(a[i+63:i] * b[i+63:i]) + c[i+63:i] ELSE dst[i+63:i] := a[i+63:i] FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Arithmetic Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := -(a[i+63:i] * b[i+63:i]) + c[i+63:i] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Arithmetic Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := -(a[i+31:i] * b[i+31:i]) + c[i+31:i] ELSE dst[i+31:i] := c[i+31:i] FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Arithmetic Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := -(a[i+31:i] * b[i+31:i]) + c[i+31:i] ELSE dst[i+31:i] := a[i+31:i] FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Arithmetic Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := -(a[i+31:i] * b[i+31:i]) + c[i+31:i] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Arithmetic Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := -(a[i+31:i] * b[i+31:i]) + c[i+31:i] ELSE dst[i+31:i] := c[i+31:i] FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Arithmetic Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := -(a[i+31:i] * b[i+31:i]) + c[i+31:i] ELSE dst[i+31:i] := a[i+31:i] FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Arithmetic Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := -(a[i+31:i] * b[i+31:i]) + c[i+31:i] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Arithmetic Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := -(a[i+63:i] * b[i+63:i]) - c[i+63:i] ELSE dst[i+63:i] := c[i+63:i] FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Arithmetic Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := -(a[i+63:i] * b[i+63:i]) - c[i+63:i] ELSE dst[i+63:i] := a[i+63:i] FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Arithmetic Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := -(a[i+63:i] * b[i+63:i]) - c[i+63:i] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Arithmetic Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := -(a[i+63:i] * b[i+63:i]) - c[i+63:i] ELSE dst[i+63:i] := c[i+63:i] FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Arithmetic Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := -(a[i+63:i] * b[i+63:i]) - c[i+63:i] ELSE dst[i+63:i] := a[i+63:i] FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Arithmetic Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := -(a[i+63:i] * b[i+63:i]) - c[i+63:i] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Arithmetic Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := -(a[i+31:i] * b[i+31:i]) - c[i+31:i] ELSE dst[i+31:i] := c[i+31:i] FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Arithmetic Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := -(a[i+31:i] * b[i+31:i]) - c[i+31:i] ELSE dst[i+31:i] := a[i+31:i] FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Arithmetic Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := -(a[i+31:i] * b[i+31:i]) - c[i+31:i] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Arithmetic Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := -(a[i+31:i] * b[i+31:i]) - c[i+31:i] ELSE dst[i+31:i] := c[i+31:i] FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Arithmetic Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := -(a[i+31:i] * b[i+31:i]) - c[i+31:i] ELSE dst[i+31:i] := a[i+31:i] FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Arithmetic Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := -(a[i+31:i] * b[i+31:i]) - c[i+31:i] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:128] := 0
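The masked negated multiply-subtract entries above differ only in lane count and in what an unselected lane receives ("a", "c", or zero). A minimal scalar sketch in C of that per-lane rule follows. The names fnmsub_masked and enum passthrough are hypothetical and not part of any intrinsics header. The model multiplies and subtracts as two separately rounded operations, so it may differ in the last bit from a fused hardware result.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical selector: which value fills a lane whose mask bit is clear. */
enum passthrough { FROM_A, FROM_C, ZERO };

/* Scalar model of the pseudocode: for each lane j selected by k,
 * dst[j] = -(a[j] * b[j]) - c[j]; otherwise the lane is passed through
 * from "a", from "c", or zeroed, depending on mode. */
static void fnmsub_masked(float *dst, const float *a, const float *b,
                          const float *c, uint8_t k, size_t lanes,
                          enum passthrough mode)
{
    assert(lanes <= 8); /* an 8-bit mask covers at most 8 lanes */
    for (size_t j = 0; j < lanes; j++) {
        if ((k >> j) & 1u) {
            dst[j] = -(a[j] * b[j]) - c[j];
        } else if (mode == FROM_A) {
            dst[j] = a[j];
        } else if (mode == FROM_C) {
            dst[j] = c[j];
        } else {
            dst[j] = 0.0f;
        }
    }
}
```

Calling it with lanes = 4 and mode = FROM_C corresponds to the 128-bit single-precision entry that copies from "c"; lanes = 8 corresponds to the 256-bit forms.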
Floating Point AVX512VL AVX512F Load Gather double-precision (64-bit) floating-point elements from memory using 32-bit indices. 64-bit elements are loaded from addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8. FOR j := 0 to 3 i := j*64 m := j*32 IF k[j] addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8 dst[i+63:i] := MEM[addr+63:addr] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Load Gather double-precision (64-bit) floating-point elements from memory using 32-bit indices. 64-bit elements are loaded from addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8. FOR j := 0 to 1 i := j*64 m := j*32 IF k[j] addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8 dst[i+63:i] := MEM[addr+63:addr] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Load Gather single-precision (32-bit) floating-point elements from memory using 32-bit indices. 32-bit elements are loaded from addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8. FOR j := 0 to 7 i := j*32 m := j*32 IF k[j] addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8 dst[i+31:i] := MEM[addr+31:addr] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Load Gather single-precision (32-bit) floating-point elements from memory using 32-bit indices. 32-bit elements are loaded from addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8. FOR j := 0 to 3 i := j*32 m := j*32 IF k[j] addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8 dst[i+31:i] := MEM[addr+31:addr] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Load Gather double-precision (64-bit) floating-point elements from memory using 64-bit indices. 64-bit elements are loaded from addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8. FOR j := 0 to 3 i := j*64 m := j*64 IF k[j] addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8 dst[i+63:i] := MEM[addr+63:addr] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Load Gather double-precision (64-bit) floating-point elements from memory using 64-bit indices. 64-bit elements are loaded from addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8. FOR j := 0 to 1 i := j*64 m := j*64 IF k[j] addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8 dst[i+63:i] := MEM[addr+63:addr] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Load Gather single-precision (32-bit) floating-point elements from memory using 64-bit indices. 32-bit elements are loaded from addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8. FOR j := 0 to 3 i := j*32 m := j*64 IF k[j] addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8 dst[i+31:i] := MEM[addr+31:addr] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Load Gather single-precision (32-bit) floating-point elements from memory using 64-bit indices. 32-bit elements are loaded from addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8. FOR j := 0 to 1 i := j*32 m := j*64 IF k[j] addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8 dst[i+31:i] := MEM[addr+31:addr] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:64] := 0
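The address computation in the gather entries above can be sketched in scalar C. This is an illustrative model only: gather_pd_i32 is a hypothetical name, and it covers the double-precision, 32-bit-index case. It assumes that the trailing "* 8" in the pseudocode converts a byte offset into the bit-granular MEM[] notation, so the model works in bytes and computes the offset as index times scale.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Scalar model of a masked gather of doubles with signed 32-bit indices.
 * Each selected lane loads 8 bytes from base_addr + vindex[j] * scale;
 * unselected lanes are copied from src. */
static void gather_pd_i32(double *dst, const double *src, uint8_t k,
                          const int32_t *vindex, const void *base_addr,
                          int scale, size_t lanes)
{
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    const unsigned char *base = (const unsigned char *)base_addr;
    for (size_t j = 0; j < lanes; j++) {
        if ((k >> j) & 1u) {
            /* Sign-extend the index to 64 bits before scaling. */
            int64_t offset = (int64_t)vindex[j] * scale;
            memcpy(&dst[j], base + offset, sizeof(double));
        } else {
            dst[j] = src[j];
        }
    }
}
```

With scale = 8 the indices count whole doubles. A smaller scale makes them count in units of 1, 2, or 4 bytes, and a negative index reaches memory below base_addr.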
Floating Point AVX512VL AVX512F Miscellaneous Convert the exponent of each packed double-precision (64-bit) floating-point element in "a" to a double-precision (64-bit) floating-point number representing the integer exponent, and store the results in "dst". This intrinsic essentially calculates "floor(log2(x))" for each element. FOR j := 0 to 3 i := j*64 dst[i+63:i] := ConvertExpFP64(a[i+63:i]) ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Miscellaneous Convert the exponent of each packed double-precision (64-bit) floating-point element in "a" to a double-precision (64-bit) floating-point number representing the integer exponent, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). This intrinsic essentially calculates "floor(log2(x))" for each element. FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := ConvertExpFP64(a[i+63:i]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Miscellaneous Convert the exponent of each packed double-precision (64-bit) floating-point element in "a" to a double-precision (64-bit) floating-point number representing the integer exponent, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates "floor(log2(x))" for each element. FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := ConvertExpFP64(a[i+63:i]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Miscellaneous Convert the exponent of each packed double-precision (64-bit) floating-point element in "a" to a double-precision (64-bit) floating-point number representing the integer exponent, and store the results in "dst". This intrinsic essentially calculates "floor(log2(x))" for each element. FOR j := 0 to 1 i := j*64 dst[i+63:i] := ConvertExpFP64(a[i+63:i]) ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Miscellaneous Convert the exponent of each packed double-precision (64-bit) floating-point element in "a" to a double-precision (64-bit) floating-point number representing the integer exponent, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). This intrinsic essentially calculates "floor(log2(x))" for each element. FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := ConvertExpFP64(a[i+63:i]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Miscellaneous Convert the exponent of each packed double-precision (64-bit) floating-point element in "a" to a double-precision (64-bit) floating-point number representing the integer exponent, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates "floor(log2(x))" for each element. FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := ConvertExpFP64(a[i+63:i]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Miscellaneous Convert the exponent of each packed single-precision (32-bit) floating-point element in "a" to a single-precision (32-bit) floating-point number representing the integer exponent, and store the results in "dst". This intrinsic essentially calculates "floor(log2(x))" for each element. FOR j := 0 to 7 i := j*32 dst[i+31:i] := ConvertExpFP32(a[i+31:i]) ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Miscellaneous Convert the exponent of each packed single-precision (32-bit) floating-point element in "a" to a single-precision (32-bit) floating-point number representing the integer exponent, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). This intrinsic essentially calculates "floor(log2(x))" for each element. FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := ConvertExpFP32(a[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Miscellaneous Convert the exponent of each packed single-precision (32-bit) floating-point element in "a" to a single-precision (32-bit) floating-point number representing the integer exponent, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates "floor(log2(x))" for each element. FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := ConvertExpFP32(a[i+31:i]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Miscellaneous Convert the exponent of each packed single-precision (32-bit) floating-point element in "a" to a single-precision (32-bit) floating-point number representing the integer exponent, and store the results in "dst". This intrinsic essentially calculates "floor(log2(x))" for each element. FOR j := 0 to 3 i := j*32 dst[i+31:i] := ConvertExpFP32(a[i+31:i]) ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Miscellaneous Convert the exponent of each packed single-precision (32-bit) floating-point element in "a" to a single-precision (32-bit) floating-point number representing the integer exponent, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). This intrinsic essentially calculates "floor(log2(x))" for each element. FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := ConvertExpFP32(a[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Miscellaneous Convert the exponent of each packed single-precision (32-bit) floating-point element in "a" to a single-precision (32-bit) floating-point number representing the integer exponent, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates "floor(log2(x))" for each element. FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := ConvertExpFP32(a[i+31:i]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:128] := 0
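The "floor(log2(x))" description in the exponent-conversion entries above can be illustrated with a bit-level scalar sketch in C. The helper names getexp_f32 and getexp_ps_masked are hypothetical. The sketch works on the magnitude of the input and handles only finite, nonzero values; zero, infinity, and NaN are deliberately left out because the entries above do not specify them. Passing a null src to the wrapper stands in for the zeromask forms.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns the unbiased binary exponent of a finite, nonzero float as a
 * float, i.e. floor(log2(|x|)). Denormal inputs are normalised first. */
static float getexp_f32(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    int32_t exp_field = (int32_t)((bits >> 23) & 0xFFu);
    uint32_t frac = bits & 0x7FFFFFu;
    assert(exp_field != 0xFF);            /* Inf/NaN are not modelled */
    assert(exp_field != 0 || frac != 0);  /* zero is not modelled */
    if (exp_field == 0) {
        /* Denormal: shift the fraction up until the implicit bit appears. */
        exp_field = 1;
        while ((frac & 0x800000u) == 0) {
            frac <<= 1;
            exp_field--;
        }
    }
    return (float)(exp_field - 127);
}

/* Writemask form when src is non-null, zeromask form when src is NULL. */
static void getexp_ps_masked(float *dst, const float *src, uint8_t k,
                             const float *a, size_t lanes)
{
    for (size_t j = 0; j < lanes; j++) {
        if ((k >> j) & 1u) {
            dst[j] = getexp_f32(a[j]);
        } else {
            dst[j] = src ? src[j] : 0.0f;
        }
    }
}
```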
Floating Point AVX512VL AVX512F Miscellaneous Normalize the mantissas of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst". This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "interv" and the sign depends on "sc" and the source sign. [getmant_note] FOR j := 0 to 3 i := j*64 dst[i+63:i] := GetNormalizedMantissa(a[i+63:i], sc, interv) ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Miscellaneous Normalize the mantissas of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "interv" and the sign depends on "sc" and the source sign. [getmant_note] FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := GetNormalizedMantissa(a[i+63:i], sc, interv) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Miscellaneous Normalize the mantissas of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "interv" and the sign depends on "sc" and the source sign. [getmant_note] FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := GetNormalizedMantissa(a[i+63:i], sc, interv) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Miscellaneous Normalize the mantissas of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst". This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "interv" and the sign depends on "sc" and the source sign. [getmant_note] FOR j := 0 to 1 i := j*64 dst[i+63:i] := GetNormalizedMantissa(a[i+63:i], sc, interv) ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Miscellaneous Normalize the mantissas of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "interv" and the sign depends on "sc" and the source sign. [getmant_note] FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := GetNormalizedMantissa(a[i+63:i], sc, interv) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Miscellaneous Normalize the mantissas of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "interv" and the sign depends on "sc" and the source sign. [getmant_note] FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := GetNormalizedMantissa(a[i+63:i], sc, interv) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Miscellaneous Normalize the mantissas of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst". This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "interv" and the sign depends on "sc" and the source sign. [getmant_note] FOR j := 0 to 7 i := j*32 dst[i+31:i] := GetNormalizedMantissa(a[i+31:i], sc, interv) ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Miscellaneous Normalize the mantissas of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "interv" and the sign depends on "sc" and the source sign. [getmant_note] FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := GetNormalizedMantissa(a[i+31:i], sc, interv) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Miscellaneous Normalize the mantissas of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "interv" and the sign depends on "sc" and the source sign. [getmant_note] FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := GetNormalizedMantissa(a[i+31:i], sc, interv) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Miscellaneous Normalize the mantissas of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst". This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "interv" and the sign depends on "sc" and the source sign. [getmant_note] FOR j := 0 to 3 i := j*32 dst[i+31:i] := GetNormalizedMantissa(a[i+31:i], sc, interv) ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Miscellaneous Normalize the mantissas of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "interv" and the sign depends on "sc" and the source sign. [getmant_note] FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := GetNormalizedMantissa(a[i+31:i], sc, interv) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Miscellaneous Normalize the mantissas of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "interv" and the sign depends on "sc" and the source sign. [getmant_note] FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := GetNormalizedMantissa(a[i+31:i], sc, interv) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:128] := 0
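As a rough illustration of the mantissa normalization described above, the following scalar C sketch covers one assumed configuration only: a target interval of [1, 2), with the sign either taken from the source or forced positive. The names getmant_1_2 and getmant_ps_masked and the keep_sign flag are hypothetical stand-ins for the "interv" and "sc" parameters, not their real encodings. Only normal finite inputs are handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Normalises a normal finite float into [1, 2) by replacing its exponent
 * field with the bias (127) and keeping the fraction bits unchanged.
 * keep_sign != 0 preserves the source sign; otherwise the result is
 * non-negative. */
static float getmant_1_2(float x, int keep_sign)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    uint32_t exp_field = (bits >> 23) & 0xFFu;
    assert(exp_field != 0 && exp_field != 0xFF); /* normal inputs only */
    uint32_t sign = keep_sign ? (bits & 0x80000000u) : 0u;
    uint32_t out = sign | (127u << 23) | (bits & 0x7FFFFFu);
    float r;
    memcpy(&r, &out, sizeof r);
    return r;
}

/* Writemask form when src is non-null, zeromask form when src is NULL. */
static void getmant_ps_masked(float *dst, const float *src, uint8_t k,
                              const float *a, size_t lanes, int keep_sign)
{
    for (size_t j = 0; j < lanes; j++) {
        if ((k >> j) & 1u) {
            dst[j] = getmant_1_2(a[j], keep_sign);
        } else {
            dst[j] = src ? src[j] : 0.0f;
        }
    }
}
```

For example, -10.0 is -1.25 * 2^3, so the sketch returns -1.25 when the sign is kept and 1.25 when it is dropped.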
Floating Point AVX512VL AVX512F Miscellaneous Copy "a" to "dst", then insert 128 bits (composed of 4 packed single-precision (32-bit) floating-point elements) from "b" into "dst" at the location specified by "imm8". dst[255:0] := a[255:0] CASE (imm8[0]) OF 0: dst[127:0] := b[127:0] 1: dst[255:128] := b[127:0] ESAC dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Miscellaneous Copy "a" to "tmp", then insert 128 bits (composed of 4 packed single-precision (32-bit) floating-point elements) from "b" into "tmp" at the location specified by "imm8". Store "tmp" to "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). tmp[255:0] := a[255:0] CASE (imm8[0]) OF 0: tmp[127:0] := b[127:0] 1: tmp[255:128] := b[127:0] ESAC FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := tmp[i+31:i] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Miscellaneous Copy "a" to "tmp", then insert 128 bits (composed of 4 packed single-precision (32-bit) floating-point elements) from "b" into "tmp" at the location specified by "imm8". Store "tmp" to "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). tmp[255:0] := a[255:0] CASE (imm8[0]) OF 0: tmp[127:0] := b[127:0] 1: tmp[255:128] := b[127:0] ESAC FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := tmp[i+31:i] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Miscellaneous Copy "a" to "dst", then insert 128 bits (composed of 4 packed 32-bit integers) from "b" into "dst" at the location specified by "imm8". dst[255:0] := a[255:0] CASE (imm8[0]) OF 0: dst[127:0] := b[127:0] 1: dst[255:128] := b[127:0] ESAC dst[MAX:256] := 0
Integer AVX512VL AVX512F Miscellaneous Copy "a" to "tmp", then insert 128 bits (composed of 4 packed 32-bit integers) from "b" into "tmp" at the location specified by "imm8". Store "tmp" to "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). tmp[255:0] := a[255:0] CASE (imm8[0]) OF 0: tmp[127:0] := b[127:0] 1: tmp[255:128] := b[127:0] ESAC FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := tmp[i+31:i] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Miscellaneous Copy "a" to "tmp", then insert 128 bits (composed of 4 packed 32-bit integers) from "b" into "tmp" at the location specified by "imm8". Store "tmp" to "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). tmp[255:0] := a[255:0] CASE (imm8[0]) OF 0: tmp[127:0] := b[127:0] 1: tmp[255:128] := b[127:0] ESAC FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := tmp[i+31:i] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:256] := 0
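The 128-bit insert entries above apply the mask after the insertion, so a lane can be masked off even if it was just overwritten from "b". A minimal scalar sketch in C of that two-step order follows, using 32-bit lanes. The function name insert128_masked and the zero_unselected flag are hypothetical. Only bit 0 of imm8 is consulted, matching the CASE in the pseudocode.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Scalar model: tmp = a, then the 4 lanes of b replace the lower half
 * (imm8 bit 0 clear) or upper half (imm8 bit 0 set) of tmp, and finally
 * tmp is written to dst under mask k. Unselected lanes come from src, or
 * are zeroed when zero_unselected is non-zero. */
static void insert128_masked(uint32_t dst[8], const uint32_t src[8],
                             uint8_t k, const uint32_t a[8],
                             const uint32_t b[4], int imm8,
                             int zero_unselected)
{
    assert(zero_unselected || src != NULL);
    uint32_t tmp[8];
    memcpy(tmp, a, sizeof tmp);
    memcpy(tmp + ((imm8 & 1) ? 4 : 0), b, 4 * sizeof(uint32_t));
    for (size_t j = 0; j < 8; j++) {
        if ((k >> j) & 1u) {
            dst[j] = tmp[j];
        } else {
            dst[j] = zero_unselected ? 0u : src[j];
        }
    }
}
```

Passing k = 0xFF reproduces the unmasked form, since every lane of tmp is then written to dst.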
Floating Point AVX512VL AVX512F Arithmetic Compare packed double-precision (64-bit) floating-point elements in "a" and "b", and store packed maximum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := MAX(a[i+63:i], b[i+63:i]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Arithmetic Compare packed double-precision (64-bit) floating-point elements in "a" and "b", and store packed maximum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := MAX(a[i+63:i], b[i+63:i]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Arithmetic Compare packed double-precision (64-bit) floating-point elements in "a" and "b", and store packed maximum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := MAX(a[i+63:i], b[i+63:i]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Arithmetic Compare packed double-precision (64-bit) floating-point elements in "a" and "b", and store packed maximum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := MAX(a[i+63:i], b[i+63:i]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Arithmetic Compare packed single-precision (32-bit) floating-point elements in "a" and "b", and store packed maximum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := MAX(a[i+31:i], b[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Arithmetic Compare packed single-precision (32-bit) floating-point elements in "a" and "b", and store packed maximum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := MAX(a[i+31:i], b[i+31:i]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Arithmetic Compare packed single-precision (32-bit) floating-point elements in "a" and "b", and store packed maximum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := MAX(a[i+31:i], b[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Arithmetic Compare packed single-precision (32-bit) floating-point elements in "a" and "b", and store packed maximum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := MAX(a[i+31:i], b[i+31:i]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Arithmetic Compare packed double-precision (64-bit) floating-point elements in "a" and "b", and store packed minimum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := MIN(a[i+63:i], b[i+63:i]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Arithmetic Compare packed double-precision (64-bit) floating-point elements in "a" and "b", and store packed minimum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := MIN(a[i+63:i], b[i+63:i]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Arithmetic Compare packed double-precision (64-bit) floating-point elements in "a" and "b", and store packed minimum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := MIN(a[i+63:i], b[i+63:i]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Arithmetic Compare packed double-precision (64-bit) floating-point elements in "a" and "b", and store packed minimum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := MIN(a[i+63:i], b[i+63:i]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Arithmetic Compare packed single-precision (32-bit) floating-point elements in "a" and "b", and store packed minimum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := MIN(a[i+31:i], b[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Arithmetic Compare packed single-precision (32-bit) floating-point elements in "a" and "b", and store packed minimum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := MIN(a[i+31:i], b[i+31:i]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Arithmetic Compare packed single-precision (32-bit) floating-point elements in "a" and "b", and store packed minimum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := MIN(a[i+31:i], b[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Arithmetic Compare packed single-precision (32-bit) floating-point elements in "a" and "b", and store packed minimum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := MIN(a[i+31:i], b[i+31:i]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:128] := 0
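The masked maximum and minimum entries above share one per-lane pattern, sketched here in scalar C for double-precision lanes. The name minmax_pd_masked and the want_max flag are hypothetical. The comparisons a > b and a < b are one plausible reading of MAX and MIN; the entries above do not spell out NaN or signed-zero behaviour, so the sketch makes no claim about those cases. A null src stands in for the zeromask forms.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model: for each lane selected by k, dst[j] is the larger
 * (want_max != 0) or smaller (want_max == 0) of a[j] and b[j].
 * Unselected lanes are copied from src, or zeroed when src is NULL. */
static void minmax_pd_masked(double *dst, const double *src, uint8_t k,
                             const double *a, const double *b,
                             size_t lanes, int want_max)
{
    assert(lanes <= 8); /* an 8-bit mask covers at most 8 lanes */
    for (size_t j = 0; j < lanes; j++) {
        if ((k >> j) & 1u) {
            int pick_a = want_max ? (a[j] > b[j]) : (a[j] < b[j]);
            dst[j] = pick_a ? a[j] : b[j];
        } else {
            dst[j] = src ? src[j] : 0.0;
        }
    }
}
```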
Floating Point AVX512VL AVX512F Load Load packed double-precision (64-bit) floating-point elements from memory into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "mem_addr" must be aligned on a 32-byte boundary or a general-protection exception may be generated. FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := MEM[mem_addr+i+63:mem_addr+i] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Move Move packed double-precision (64-bit) floating-point elements from "a" to "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := a[i+63:i] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Store Store packed double-precision (64-bit) floating-point elements from "a" into memory using writemask "k". "mem_addr" must be aligned on a 32-byte boundary or a general-protection exception may be generated. FOR j := 0 to 3 i := j*64 IF k[j] MEM[mem_addr+i+63:mem_addr+i] := a[i+63:i] FI ENDFOR
Floating Point AVX512VL AVX512F Load Load packed double-precision (64-bit) floating-point elements from memory into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). "mem_addr" must be aligned on a 32-byte boundary or a general-protection exception may be generated. FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := MEM[mem_addr+i+63:mem_addr+i] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Move Move packed double-precision (64-bit) floating-point elements from "a" into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := a[i+63:i] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Load Load packed double-precision (64-bit) floating-point elements from memory into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "mem_addr" must be aligned on a 16-byte boundary or a general-protection exception may be generated. FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := MEM[mem_addr+i+63:mem_addr+i] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Move Move packed double-precision (64-bit) floating-point elements from "a" to "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := a[i+63:i] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Store Store packed double-precision (64-bit) floating-point elements from "a" into memory using writemask "k". "mem_addr" must be aligned on a 16-byte boundary or a general-protection exception may be generated. FOR j := 0 to 1 i := j*64 IF k[j] MEM[mem_addr+i+63:mem_addr+i] := a[i+63:i] FI ENDFOR
Floating Point AVX512VL AVX512F Load Load packed double-precision (64-bit) floating-point elements from memory into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). "mem_addr" must be aligned on a 16-byte boundary or a general-protection exception may be generated. FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := MEM[mem_addr+i+63:mem_addr+i] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Move Move packed double-precision (64-bit) floating-point elements from "a" into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := a[i+63:i] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Load Load packed single-precision (32-bit) floating-point elements from memory into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "mem_addr" must be aligned on a 32-byte boundary or a general-protection exception may be generated. FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := MEM[mem_addr+i+31:mem_addr+i] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Move Move packed single-precision (32-bit) floating-point elements from "a" to "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := a[i+31:i] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Store Store packed single-precision (32-bit) floating-point elements from "a" into memory using writemask "k". "mem_addr" must be aligned on a 32-byte boundary or a general-protection exception may be generated. FOR j := 0 to 7 i := j*32 IF k[j] MEM[mem_addr+i+31:mem_addr+i] := a[i+31:i] FI ENDFOR
Floating Point AVX512VL AVX512F Load Load packed single-precision (32-bit) floating-point elements from memory into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). "mem_addr" must be aligned on a 32-byte boundary or a general-protection exception may be generated. FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := MEM[mem_addr+i+31:mem_addr+i] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Move Move packed single-precision (32-bit) floating-point elements from "a" into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := a[i+31:i] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Load Load packed single-precision (32-bit) floating-point elements from memory into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "mem_addr" must be aligned on a 16-byte boundary or a general-protection exception may be generated. FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := MEM[mem_addr+i+31:mem_addr+i] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Move Move packed single-precision (32-bit) floating-point elements from "a" to "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := a[i+31:i] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Store Store packed single-precision (32-bit) floating-point elements from "a" into memory using writemask "k". "mem_addr" must be aligned on a 16-byte boundary or a general-protection exception may be generated. FOR j := 0 to 3 i := j*32 IF k[j] MEM[mem_addr+i+31:mem_addr+i] := a[i+31:i] FI ENDFOR
Floating Point AVX512VL AVX512F Load Load packed single-precision (32-bit) floating-point elements from memory into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). "mem_addr" must be aligned on a 16-byte boundary or a general-protection exception may be generated. FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := MEM[mem_addr+i+31:mem_addr+i] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Move Move packed single-precision (32-bit) floating-point elements from "a" into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := a[i+31:i] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:128] := 0
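The writemask and zeromask entries above differ only in what an unselected lane receives. A minimal Python sketch of the two policies, assuming vectors are modeled as plain lists with lane 0 first and the mask as an integer whose bit j controls lane j (the names `mask_mov` and `maskz_mov` are illustrative and are not taken from this reference):

```python
def mask_mov(src, k, a):
    """Writemask model: lane j takes a[j] if bit j of k is set, else src[j]."""
    return [a[j] if (k >> j) & 1 else src[j] for j in range(len(a))]


def maskz_mov(k, a):
    """Zeromask model: lane j takes a[j] if bit j of k is set, else 0."""
    return [a[j] if (k >> j) & 1 else 0 for j in range(len(a))]


a = [1.0, 2.0, 3.0, 4.0]
src = [-1.0, -2.0, -3.0, -4.0]
k = 0b0101  # lanes 0 and 2 are selected

merged = mask_mov(src, k, a)  # [1.0, -2.0, 3.0, -4.0]
zeroed = maskz_mov(k, a)      # [1.0, 0, 3.0, 0]
```

The sketch does not model the final `dst[MAX:128] := 0` or `dst[MAX:256] := 0` step, which clears the upper bits of the wider physical register.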
Floating Point AVX512VL AVX512F Move Duplicate even-indexed double-precision (64-bit) floating-point elements from "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). tmp[63:0] := a[63:0] tmp[127:64] := a[63:0] tmp[191:128] := a[191:128] tmp[255:192] := a[191:128] FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := tmp[i+63:i] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Move Duplicate even-indexed double-precision (64-bit) floating-point elements from "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). tmp[63:0] := a[63:0] tmp[127:64] := a[63:0] tmp[191:128] := a[191:128] tmp[255:192] := a[191:128] FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := tmp[i+63:i] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Move Duplicate even-indexed double-precision (64-bit) floating-point elements from "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). tmp[63:0] := a[63:0] tmp[127:64] := a[63:0] FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := tmp[i+63:i] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Move Duplicate even-indexed double-precision (64-bit) floating-point elements from "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). tmp[63:0] := a[63:0] tmp[127:64] := a[63:0] FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := tmp[i+63:i] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:128] := 0
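The four entries above first build a temporary vector in which each even-indexed element is copied into the odd lane next to it, then apply the mask. A small Python sketch of that two-step shape, assuming list-based lanes and hypothetical helper names (`dup_even_f64`, `mask_dup_even_f64`):

```python
def dup_even_f64(a):
    """Build tmp: lanes 2n and 2n+1 both receive a[2n]."""
    return [a[j - (j % 2)] for j in range(len(a))]


def mask_dup_even_f64(src, k, a):
    """Apply the writemask to tmp; unselected lanes keep src."""
    tmp = dup_even_f64(a)
    return [tmp[j] if (k >> j) & 1 else src[j] for j in range(len(a))]


a = [10.0, 11.0, 12.0, 13.0]
src = [0.5, 0.5, 0.5, 0.5]

tmp = dup_even_f64(a)                      # [10.0, 10.0, 12.0, 12.0]
out = mask_dup_even_f64(src, 0b1010, a)    # [0.5, 10.0, 0.5, 12.0]
```

Note that the mask is applied to the duplicated vector, so a selected odd lane receives the value of the even element below it.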
Integer AVX512VL AVX512F Load Load packed 32-bit integers from memory into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "mem_addr" must be aligned on a 32-byte boundary or a general-protection exception may be generated. FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := MEM[mem_addr+i+31:mem_addr+i] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Move Move packed 32-bit integers from "a" to "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := a[i+31:i] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Store Store packed 32-bit integers from "a" into memory using writemask "k". "mem_addr" must be aligned on a 32-byte boundary or a general-protection exception may be generated. FOR j := 0 to 7 i := j*32 IF k[j] MEM[mem_addr+i+31:mem_addr+i] := a[i+31:i] FI ENDFOR
Integer AVX512VL AVX512F Load Load packed 32-bit integers from memory into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). "mem_addr" must be aligned on a 32-byte boundary or a general-protection exception may be generated. FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := MEM[mem_addr+i+31:mem_addr+i] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Move Move packed 32-bit integers from "a" into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := a[i+31:i] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Load Load packed 32-bit integers from memory into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "mem_addr" must be aligned on a 16-byte boundary or a general-protection exception may be generated. FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := MEM[mem_addr+i+31:mem_addr+i] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Move Move packed 32-bit integers from "a" to "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := a[i+31:i] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Store Store packed 32-bit integers from "a" into memory using writemask "k". "mem_addr" must be aligned on a 16-byte boundary or a general-protection exception may be generated. FOR j := 0 to 3 i := j*32 IF k[j] MEM[mem_addr+i+31:mem_addr+i] := a[i+31:i] FI ENDFOR
Integer AVX512VL AVX512F Load Load packed 32-bit integers from memory into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). "mem_addr" must be aligned on a 16-byte boundary or a general-protection exception may be generated. FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := MEM[mem_addr+i+31:mem_addr+i] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Move Move packed 32-bit integers from "a" into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := a[i+31:i] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Load Load packed 64-bit integers from memory into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "mem_addr" must be aligned on a 32-byte boundary or a general-protection exception may be generated. FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := MEM[mem_addr+i+63:mem_addr+i] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Move Move packed 64-bit integers from "a" to "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := a[i+63:i] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Store Store packed 64-bit integers from "a" into memory using writemask "k". "mem_addr" must be aligned on a 32-byte boundary or a general-protection exception may be generated. FOR j := 0 to 3 i := j*64 IF k[j] MEM[mem_addr+i+63:mem_addr+i] := a[i+63:i] FI ENDFOR
Integer AVX512VL AVX512F Load Load packed 64-bit integers from memory into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). "mem_addr" must be aligned on a 32-byte boundary or a general-protection exception may be generated. FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := MEM[mem_addr+i+63:mem_addr+i] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Move Move packed 64-bit integers from "a" into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := a[i+63:i] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Load Load packed 64-bit integers from memory into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "mem_addr" must be aligned on a 16-byte boundary or a general-protection exception may be generated. FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := MEM[mem_addr+i+63:mem_addr+i] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Move Move packed 64-bit integers from "a" to "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := a[i+63:i] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Store Store packed 64-bit integers from "a" into memory using writemask "k". "mem_addr" must be aligned on a 16-byte boundary or a general-protection exception may be generated. FOR j := 0 to 1 i := j*64 IF k[j] MEM[mem_addr+i+63:mem_addr+i] := a[i+63:i] FI ENDFOR
Integer AVX512VL AVX512F Load Load packed 64-bit integers from memory into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). "mem_addr" must be aligned on a 16-byte boundary or a general-protection exception may be generated. FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := MEM[mem_addr+i+63:mem_addr+i] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Move Move packed 64-bit integers from "a" into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := a[i+63:i] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:128] := 0
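The aligned load and store entries above state a 32-byte requirement for 256-bit vectors and a 16-byte requirement for 128-bit vectors. The check can be sketched as follows; `is_aligned` is a hypothetical helper, and treating the requirement as "vector width in bytes" is an assumption that happens to match both cases listed here:

```python
def required_alignment(vector_bits):
    """Byte alignment stated by the aligned entries: 16 for 128-bit, 32 for 256-bit."""
    return vector_bits // 8


def is_aligned(mem_addr, vector_bits):
    """True if mem_addr satisfies the alignment stated for this vector width."""
    return mem_addr % required_alignment(vector_bits) == 0


ok_256 = is_aligned(0x1000, 256)   # True
bad_256 = is_aligned(0x1010, 256)  # False: 0x1010 is only 16-byte aligned
ok_128 = is_aligned(0x1010, 128)   # True
```

An address that fails this check is the case in which the entries say a general-protection exception may be generated.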
Integer AVX512VL AVX512F Load Load packed 32-bit integers from memory into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "mem_addr" does not need to be aligned on any particular boundary. FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := MEM[mem_addr+i+31:mem_addr+i] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Store Store packed 32-bit integers from "a" into memory using writemask "k". "mem_addr" does not need to be aligned on any particular boundary. FOR j := 0 to 7 i := j*32 IF k[j] MEM[mem_addr+i+31:mem_addr+i] := a[i+31:i] FI ENDFOR
Integer AVX512VL AVX512F Load Load packed 32-bit integers from memory into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). "mem_addr" does not need to be aligned on any particular boundary. FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := MEM[mem_addr+i+31:mem_addr+i] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Load Load packed 32-bit integers from memory into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "mem_addr" does not need to be aligned on any particular boundary. FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := MEM[mem_addr+i+31:mem_addr+i] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Store Store packed 32-bit integers from "a" into memory using writemask "k". "mem_addr" does not need to be aligned on any particular boundary. FOR j := 0 to 3 i := j*32 IF k[j] MEM[mem_addr+i+31:mem_addr+i] := a[i+31:i] FI ENDFOR
Integer AVX512VL AVX512F Load Load packed 32-bit integers from memory into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). "mem_addr" does not need to be aligned on any particular boundary. FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := MEM[mem_addr+i+31:mem_addr+i] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Load Load packed 64-bit integers from memory into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "mem_addr" does not need to be aligned on any particular boundary. FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := MEM[mem_addr+i+63:mem_addr+i] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Store Store packed 64-bit integers from "a" into memory using writemask "k". "mem_addr" does not need to be aligned on any particular boundary. FOR j := 0 to 3 i := j*64 IF k[j] MEM[mem_addr+i+63:mem_addr+i] := a[i+63:i] FI ENDFOR
Integer AVX512VL AVX512F Load Load packed 64-bit integers from memory into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). "mem_addr" does not need to be aligned on any particular boundary. FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := MEM[mem_addr+i+63:mem_addr+i] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Load Load packed 64-bit integers from memory into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "mem_addr" does not need to be aligned on any particular boundary. FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := MEM[mem_addr+i+63:mem_addr+i] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Store Store packed 64-bit integers from "a" into memory using writemask "k". "mem_addr" does not need to be aligned on any particular boundary. FOR j := 0 to 1 i := j*64 IF k[j] MEM[mem_addr+i+63:mem_addr+i] := a[i+63:i] FI ENDFOR
Integer AVX512VL AVX512F Load Load packed 64-bit integers from memory into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). "mem_addr" does not need to be aligned on any particular boundary. FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := MEM[mem_addr+i+63:mem_addr+i] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:128] := 0
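The masked store entries above have no ELSE branch: memory behind an unselected lane is left as it was. A minimal Python sketch of that behaviour, assuming memory is modeled as a list indexed by element rather than by bit address (`mask_store` and `maskz_load` are illustrative names):

```python
def mask_store(mem, base, k, a):
    """Write a[j] to mem[base + j] only where bit j of k is set."""
    for j, value in enumerate(a):
        if (k >> j) & 1:
            mem[base + j] = value


def maskz_load(mem, base, k, lanes):
    """Read mem[base + j] where bit j of k is set; other lanes become 0."""
    return [mem[base + j] if (k >> j) & 1 else 0 for j in range(lanes)]


mem = [0, 0, 0, 0, 0, 0]
mask_store(mem, 1, 0b1001, [7, 8, 9, 10])
# mem is now [0, 7, 0, 0, 10, 0]

loaded = maskz_load(mem, 1, 0b1011, 4)  # [7, 0, 0, 10]
```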
Floating Point AVX512VL AVX512F Move Duplicate odd-indexed single-precision (32-bit) floating-point elements from "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). tmp[31:0] := a[63:32] tmp[63:32] := a[63:32] tmp[95:64] := a[127:96] tmp[127:96] := a[127:96] tmp[159:128] := a[191:160] tmp[191:160] := a[191:160] tmp[223:192] := a[255:224] tmp[255:224] := a[255:224] FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := tmp[i+31:i] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Move Duplicate odd-indexed single-precision (32-bit) floating-point elements from "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). tmp[31:0] := a[63:32] tmp[63:32] := a[63:32] tmp[95:64] := a[127:96] tmp[127:96] := a[127:96] tmp[159:128] := a[191:160] tmp[191:160] := a[191:160] tmp[223:192] := a[255:224] tmp[255:224] := a[255:224] FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := tmp[i+31:i] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Move Duplicate odd-indexed single-precision (32-bit) floating-point elements from "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). tmp[31:0] := a[63:32] tmp[63:32] := a[63:32] tmp[95:64] := a[127:96] tmp[127:96] := a[127:96] FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := tmp[i+31:i] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Move Duplicate odd-indexed single-precision (32-bit) floating-point elements from "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). tmp[31:0] := a[63:32] tmp[63:32] := a[63:32] tmp[95:64] := a[127:96] tmp[127:96] := a[127:96] FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := tmp[i+31:i] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Move Duplicate even-indexed single-precision (32-bit) floating-point elements from "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). tmp[31:0] := a[31:0] tmp[63:32] := a[31:0] tmp[95:64] := a[95:64] tmp[127:96] := a[95:64] tmp[159:128] := a[159:128] tmp[191:160] := a[159:128] tmp[223:192] := a[223:192] tmp[255:224] := a[223:192] FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := tmp[i+31:i] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Move Duplicate even-indexed single-precision (32-bit) floating-point elements from "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). tmp[31:0] := a[31:0] tmp[63:32] := a[31:0] tmp[95:64] := a[95:64] tmp[127:96] := a[95:64] tmp[159:128] := a[159:128] tmp[191:160] := a[159:128] tmp[223:192] := a[223:192] tmp[255:224] := a[223:192] FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := tmp[i+31:i] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Move Duplicate even-indexed single-precision (32-bit) floating-point elements from "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). tmp[31:0] := a[31:0] tmp[63:32] := a[31:0] tmp[95:64] := a[95:64] tmp[127:96] := a[95:64] FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := tmp[i+31:i] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Move Duplicate even-indexed single-precision (32-bit) floating-point elements from "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). tmp[31:0] := a[31:0] tmp[63:32] := a[31:0] tmp[95:64] := a[95:64] tmp[127:96] := a[95:64] FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := tmp[i+31:i] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:128] := 0
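The odd-indexed and even-indexed duplication entries above differ only in which element of each adjacent pair is copied. A short Python sketch of both, assuming list-based lanes and hypothetical helper names (`dup_odd`, `dup_even`, `maskz_dup_odd`):

```python
def dup_odd(a):
    """Lanes 2n and 2n+1 both receive a[2n+1]."""
    return [a[j | 1] for j in range(len(a))]


def dup_even(a):
    """Lanes 2n and 2n+1 both receive a[2n]."""
    return [a[j & ~1] for j in range(len(a))]


def maskz_dup_odd(k, a):
    """Zeromask applied to the odd-duplicated vector."""
    tmp = dup_odd(a)
    return [tmp[j] if (k >> j) & 1 else 0 for j in range(len(a))]


a = [0.0, 1.0, 2.0, 3.0]

odd = dup_odd(a)                 # [1.0, 1.0, 3.0, 3.0]
even = dup_even(a)               # [0.0, 0.0, 2.0, 2.0]
masked = maskz_dup_odd(0b0110, a)  # [0, 1.0, 3.0, 0]
```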
Floating Point AVX512VL AVX512F Load Load packed double-precision (64-bit) floating-point elements from memory into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "mem_addr" does not need to be aligned on any particular boundary. FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := MEM[mem_addr+i+63:mem_addr+i] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Store Store packed double-precision (64-bit) floating-point elements from "a" into memory using writemask "k". "mem_addr" does not need to be aligned on any particular boundary. FOR j := 0 to 3 i := j*64 IF k[j] MEM[mem_addr+i+63:mem_addr+i] := a[i+63:i] FI ENDFOR
Floating Point AVX512VL AVX512F Load Load packed double-precision (64-bit) floating-point elements from memory into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). "mem_addr" does not need to be aligned on any particular boundary. FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := MEM[mem_addr+i+63:mem_addr+i] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Load Load packed double-precision (64-bit) floating-point elements from memory into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "mem_addr" does not need to be aligned on any particular boundary. FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := MEM[mem_addr+i+63:mem_addr+i] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Store Store packed double-precision (64-bit) floating-point elements from "a" into memory using writemask "k". "mem_addr" does not need to be aligned on any particular boundary. FOR j := 0 to 1 i := j*64 IF k[j] MEM[mem_addr+i+63:mem_addr+i] := a[i+63:i] FI ENDFOR
Floating Point AVX512VL AVX512F Load Load packed double-precision (64-bit) floating-point elements from memory into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). "mem_addr" does not need to be aligned on any particular boundary. FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := MEM[mem_addr+i+63:mem_addr+i] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Load Load packed single-precision (32-bit) floating-point elements from memory into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "mem_addr" does not need to be aligned on any particular boundary. FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := MEM[mem_addr+i+31:mem_addr+i] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Store Store packed single-precision (32-bit) floating-point elements from "a" into memory using writemask "k". "mem_addr" does not need to be aligned on any particular boundary. FOR j := 0 to 7 i := j*32 IF k[j] MEM[mem_addr+i+31:mem_addr+i] := a[i+31:i] FI ENDFOR
Floating Point AVX512VL AVX512F Load Load packed single-precision (32-bit) floating-point elements from memory into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). "mem_addr" does not need to be aligned on any particular boundary. FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := MEM[mem_addr+i+31:mem_addr+i] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Load Load packed single-precision (32-bit) floating-point elements from memory into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "mem_addr" does not need to be aligned on any particular boundary. FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := MEM[mem_addr+i+31:mem_addr+i] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Store Store packed single-precision (32-bit) floating-point elements from "a" into memory using writemask "k". "mem_addr" does not need to be aligned on any particular boundary. FOR j := 0 to 3 i := j*32 IF k[j] MEM[mem_addr+i+31:mem_addr+i] := a[i+31:i] FI ENDFOR
Floating Point AVX512VL AVX512F Load Load packed single-precision (32-bit) floating-point elements from memory into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). "mem_addr" does not need to be aligned on any particular boundary. FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := MEM[mem_addr+i+31:mem_addr+i] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Arithmetic Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := a[i+63:i] * b[i+63:i] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Arithmetic Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := a[i+63:i] * b[i+63:i] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Arithmetic Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := a[i+63:i] * b[i+63:i] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Arithmetic Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := a[i+63:i] * b[i+63:i] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Arithmetic Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := a[i+31:i] * b[i+31:i] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Arithmetic Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := a[i+31:i] * b[i+31:i] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Arithmetic Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := a[i+31:i] * b[i+31:i] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Arithmetic Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := a[i+31:i] * b[i+31:i] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:128] := 0
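For the single-precision multiply entries above, each selected lane holds a 32-bit product, so a scalar model has to round the result back to single precision. A hedged Python sketch, assuming round-to-nearest-even (the pseudocode does not state a rounding mode) and using `struct` to round a Python float to 32 bits; `to_f32` and `mask_mul_f32` are illustrative names:

```python
import struct


def to_f32(x):
    """Round a Python float (double) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def mask_mul_f32(src, k, a, b):
    """Writemask multiply: selected lanes get the rounded product, others keep src."""
    return [
        to_f32(to_f32(a[j]) * to_f32(b[j])) if (k >> j) & 1 else src[j]
        for j in range(len(a))
    ]


one_plus_ulp = 1.0 + 2.0 ** -23  # smallest single-precision value above 1.0
a = [1.5, one_plus_ulp, 3.0, 4.0]
b = [2.0, one_plus_ulp, 3.0, 4.0]
src = [0.0, 0.0, 0.0, 0.0]

result = mask_mul_f32(src, 0b0011, a, b)
# result == [3.0, 1.0 + 2.0 ** -22, 0.0, 0.0]
```

In lane 1 the exact product is 1 + 2^-22 + 2^-46; the last term is lost when the result is rounded to single precision.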
Integer AVX512VL AVX512F Arithmetic Compute the absolute value of packed signed 32-bit integers in "a", and store the unsigned results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := ABS(a[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Arithmetic Compute the absolute value of packed signed 32-bit integers in "a", and store the unsigned results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := ABS(a[i+31:i]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Arithmetic Compute the absolute value of packed signed 32-bit integers in "a", and store the unsigned results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := ABS(a[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Arithmetic Compute the absolute value of packed signed 32-bit integers in "a", and store the unsigned results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := ABS(a[i+31:i]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Arithmetic Compute the absolute value of packed signed 64-bit integers in "a", and store the unsigned results in "dst". FOR j := 0 to 3 i := j*64 dst[i+63:i] := ABS(a[i+63:i]) ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Arithmetic Compute the absolute value of packed signed 64-bit integers in "a", and store the unsigned results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := ABS(a[i+63:i]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Arithmetic Compute the absolute value of packed signed 64-bit integers in "a", and store the unsigned results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := ABS(a[i+63:i]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Arithmetic Compute the absolute value of packed signed 64-bit integers in "a", and store the unsigned results in "dst". FOR j := 0 to 1 i := j*64 dst[i+63:i] := ABS(a[i+63:i]) ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Arithmetic Compute the absolute value of packed signed 64-bit integers in "a", and store the unsigned results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := ABS(a[i+63:i]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Arithmetic Compute the absolute value of packed signed 64-bit integers in "a", and store the unsigned results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := ABS(a[i+63:i]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:128] := 0
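The absolute-value entries above describe their results as unsigned. This matters for the most negative input, whose magnitude does not fit in a signed lane of the same width. A minimal Python sketch, assuming each input is a signed integer that fits in `bits` bits (`abs_lane` and `mask_abs` are hypothetical names):

```python
def abs_lane(x, bits):
    """Absolute value of a signed lane, returned as an unsigned bit pattern."""
    return abs(x) & ((1 << bits) - 1)


def mask_abs(src, k, a, bits):
    """Writemask model: selected lanes get ABS(a[j]), others keep src[j]."""
    return [abs_lane(a[j], bits) if (k >> j) & 1 else src[j] for j in range(len(a))]


a = [-5, 7, -2**31, 0]
src = [1, 1, 1, 1]

out = mask_abs(src, 0b0101, a, 32)
# out == [5, 1, 2147483648, 1]
```

The value 2147483648 (0x80000000) is correct when read as unsigned; read back as a signed 32-bit integer, the same bits are still -2^31.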
Integer AVX512VL AVX512F Arithmetic Add packed 32-bit integers in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := a[i+31:i] + b[i+31:i] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Arithmetic Add packed 32-bit integers in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := a[i+31:i] + b[i+31:i] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Arithmetic Add packed 32-bit integers in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := a[i+31:i] + b[i+31:i] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Arithmetic Add packed 32-bit integers in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := a[i+31:i] + b[i+31:i] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Arithmetic Add packed 64-bit integers in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := a[i+63:i] + b[i+63:i] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Arithmetic Add packed 64-bit integers in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := a[i+63:i] + b[i+63:i] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Arithmetic Add packed 64-bit integers in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := a[i+63:i] + b[i+63:i] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Arithmetic Add packed 64-bit integers in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := a[i+63:i] + b[i+63:i] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:128] := 0
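The difference between the writemask and zeromask add entries above is only what happens to lanes whose mask bit is clear. A minimal scalar sketch for the 32-bit lane forms follows. The function names are hypothetical, the mask is a plain `uint8_t`, and lanes are stored as `uint32_t` on the assumption that the pseudocode's fixed-width add wraps modulo 2^32.

```c
#include <assert.h>
#include <stdint.h>

/* Writemask form: lanes with a clear mask bit are copied from src.
   lanes = 4 models the 128-bit entry, lanes = 8 the 256-bit entry. */
void mask_add_epi32_model(uint32_t *dst, const uint32_t *src, uint8_t k,
                          const uint32_t *a, const uint32_t *b, int lanes)
{
    assert(lanes == 4 || lanes == 8);
    for (int j = 0; j < lanes; j++) {
        uint32_t sum = a[j] + b[j];   /* wraps modulo 2^32 */
        dst[j] = ((k >> j) & 1u) ? sum : src[j];
    }
}

/* Zeromask form: lanes with a clear mask bit are set to zero. */
void maskz_add_epi32_model(uint32_t *dst, uint8_t k,
                           const uint32_t *a, const uint32_t *b, int lanes)
{
    assert(lanes == 4 || lanes == 8);
    for (int j = 0; j < lanes; j++) {
        uint32_t sum = a[j] + b[j];   /* wraps modulo 2^32 */
        dst[j] = ((k >> j) & 1u) ? sum : 0;
    }
}
```

With `k = 0xA` on four lanes, only lanes 1 and 3 receive sums; the writemask form leaves lanes 0 and 2 equal to `src`, and the zeromask form clears them.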
Integer AVX512VL AVX512F Logical Compute the bitwise AND of packed 32-bit integers in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := a[i+31:i] AND b[i+31:i] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Logical Compute the bitwise AND of packed 32-bit integers in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := a[i+31:i] AND b[i+31:i] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Logical Compute the bitwise AND of packed 32-bit integers in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := a[i+31:i] AND b[i+31:i] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Logical Compute the bitwise AND of packed 32-bit integers in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := a[i+31:i] AND b[i+31:i] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Logical Compute the bitwise NOT of packed 32-bit integers in "a" and then AND with "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := ((NOT a[i+31:i]) AND b[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Logical Compute the bitwise NOT of packed 32-bit integers in "a" and then AND with "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := (NOT a[i+31:i]) AND b[i+31:i] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Logical Compute the bitwise NOT of packed 32-bit integers in "a" and then AND with "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := ((NOT a[i+31:i]) AND b[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Logical Compute the bitwise NOT of packed 32-bit integers in "a" and then AND with "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := (NOT a[i+31:i]) AND b[i+31:i] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Logical Compute the bitwise NOT of packed 64-bit integers in "a" and then AND with "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := ((NOT a[i+63:i]) AND b[i+63:i]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Logical Compute the bitwise NOT of packed 64-bit integers in "a" and then AND with "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := (NOT a[i+63:i]) AND b[i+63:i] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Logical Compute the bitwise NOT of packed 64-bit integers in "a" and then AND with "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := ((NOT a[i+63:i]) AND b[i+63:i]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Logical Compute the bitwise NOT of packed 64-bit integers in "a" and then AND with "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := (NOT a[i+63:i]) AND b[i+63:i] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:128] := 0
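In the AND-NOT entries above the operand order matters: it is "a" that is inverted, so "a" acts as the set of bits to clear in "b". The following scalar sketch of the 64-bit writemask form makes that explicit. The function name and the `uint8_t` mask are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of the writemask AND-NOT on 64-bit lanes.
   lanes = 2 models the 128-bit entry, lanes = 4 the 256-bit entry. */
void mask_andnot_epi64_model(uint64_t *dst, const uint64_t *src, uint8_t k,
                             const uint64_t *a, const uint64_t *b, int lanes)
{
    assert(lanes == 2 || lanes == 4);
    for (int j = 0; j < lanes; j++) {
        /* (NOT a) AND b: bits set in a are cleared from b. */
        uint64_t r = ~a[j] & b[j];
        dst[j] = ((k >> j) & 1u) ? r : src[j];
    }
}
```

Swapping `a` and `b` generally gives a different result, which is a common source of bugs when porting code that uses this operation.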
Integer AVX512VL AVX512F Logical Compute the bitwise AND of packed 64-bit integers in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := a[i+63:i] AND b[i+63:i] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Logical Compute the bitwise AND of packed 64-bit integers in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := a[i+63:i] AND b[i+63:i] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Logical Compute the bitwise AND of packed 64-bit integers in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := a[i+63:i] AND b[i+63:i] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Logical Compute the bitwise AND of packed 64-bit integers in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := a[i+63:i] AND b[i+63:i] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Miscellaneous Blend packed 32-bit integers from "a" and "b" using control mask "k", and store the results in "dst". FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := b[i+31:i] ELSE dst[i+31:i] := a[i+31:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Miscellaneous Blend packed 32-bit integers from "a" and "b" using control mask "k", and store the results in "dst". FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := b[i+31:i] ELSE dst[i+31:i] := a[i+31:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Miscellaneous Blend packed 64-bit integers from "a" and "b" using control mask "k", and store the results in "dst". FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := b[i+63:i] ELSE dst[i+63:i] := a[i+63:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Miscellaneous Blend packed 64-bit integers from "a" and "b" using control mask "k", and store the results in "dst". FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := b[i+63:i] ELSE dst[i+63:i] := a[i+63:i] FI ENDFOR dst[MAX:128] := 0
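The blend entries above have no "src" operand: a set mask bit selects the lane from "b", a clear bit selects it from "a". A short scalar sketch of the 32-bit form, with a hypothetical function name and a plain `uint8_t` mask:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of the mask blend on 32-bit lanes.
   lanes = 4 models the 128-bit entry, lanes = 8 the 256-bit entry. */
void mask_blend_epi32_model(uint32_t *dst, uint8_t k,
                            const uint32_t *a, const uint32_t *b, int lanes)
{
    assert(lanes == 4 || lanes == 8);
    for (int j = 0; j < lanes; j++) {
        /* Mask bit set: take b; mask bit clear: take a. */
        dst[j] = ((k >> j) & 1u) ? b[j] : a[j];
    }
}
```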
Integer AVX512VL AVX512F Miscellaneous Broadcast the low packed 32-bit integer from "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := a[31:0] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Set Broadcast 32-bit integer "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := a[31:0] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Miscellaneous Broadcast the low packed 32-bit integer from "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := a[31:0] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Set Broadcast 32-bit integer "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := a[31:0] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Miscellaneous Broadcast the low packed 32-bit integer from "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := a[31:0] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Set Broadcast 32-bit integer "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := a[31:0] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Miscellaneous Broadcast the low packed 32-bit integer from "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := a[31:0] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Set Broadcast 32-bit integer "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := a[31:0] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Miscellaneous Broadcast the low packed 64-bit integer from "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := a[63:0] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Set Broadcast 64-bit integer "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := a[63:0] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Miscellaneous Broadcast the low packed 64-bit integer from "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := a[63:0] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Set Broadcast 64-bit integer "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := a[63:0] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Miscellaneous Broadcast the low packed 64-bit integer from "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := a[63:0] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Set Broadcast 64-bit integer "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := a[63:0] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Miscellaneous Broadcast the low packed 64-bit integer from "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := a[63:0] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Set Broadcast 64-bit integer "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := a[63:0] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:128] := 0
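The "Miscellaneous" and "Set" broadcast entries above share the same per-lane pseudocode; they differ only in whether "a" is the low element of a vector or a scalar integer. The sketch below models the scalar ("Set") 64-bit forms. The function names and the `uint8_t` mask are illustrative assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Writemask broadcast: selected lanes receive a, the rest keep src.
   lanes = 2 models the 128-bit entry, lanes = 4 the 256-bit entry. */
void mask_set1_epi64_model(uint64_t *dst, const uint64_t *src, uint8_t k,
                           uint64_t a, int lanes)
{
    assert(lanes == 2 || lanes == 4);
    for (int j = 0; j < lanes; j++)
        dst[j] = ((k >> j) & 1u) ? a : src[j];
}

/* Zeromask broadcast: selected lanes receive a, the rest become zero. */
void maskz_set1_epi64_model(uint64_t *dst, uint8_t k, uint64_t a, int lanes)
{
    assert(lanes == 2 || lanes == 4);
    for (int j = 0; j < lanes; j++)
        dst[j] = ((k >> j) & 1u) ? a : 0;
}
```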
Integer Mask AVX512VL AVX512F Compare Compare packed signed 32-bit integers in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k". CASE (imm8[2:0]) OF 0: OP := _MM_CMPINT_EQ 1: OP := _MM_CMPINT_LT 2: OP := _MM_CMPINT_LE 3: OP := _MM_CMPINT_FALSE 4: OP := _MM_CMPINT_NE 5: OP := _MM_CMPINT_NLT 6: OP := _MM_CMPINT_NLE 7: OP := _MM_CMPINT_TRUE ESAC FOR j := 0 to 7 i := j*32 k[j] := ( a[i+31:i] OP b[i+31:i] ) ? 1 : 0 ENDFOR k[MAX:8] := 0
Integer Mask AVX512VL AVX512F Compare Compare packed signed 32-bit integers in "a" and "b" for equality, and store the results in mask vector "k". FOR j := 0 to 7 i := j*32 k[j] := ( a[i+31:i] == b[i+31:i] ) ? 1 : 0 ENDFOR k[MAX:8] := 0
Integer Mask AVX512VL AVX512F Compare Compare packed signed 32-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k". FOR j := 0 to 7 i := j*32 k[j] := ( a[i+31:i] >= b[i+31:i] ) ? 1 : 0 ENDFOR k[MAX:8] := 0
Integer Mask AVX512VL AVX512F Compare Compare packed signed 32-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k". FOR j := 0 to 7 i := j*32 k[j] := ( a[i+31:i] > b[i+31:i] ) ? 1 : 0 ENDFOR k[MAX:8] := 0
Integer Mask AVX512VL AVX512F Compare Compare packed signed 32-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k". FOR j := 0 to 7 i := j*32 k[j] := ( a[i+31:i] <= b[i+31:i] ) ? 1 : 0 ENDFOR k[MAX:8] := 0
Integer Mask AVX512VL AVX512F Compare Compare packed signed 32-bit integers in "a" and "b" for less-than, and store the results in mask vector "k". FOR j := 0 to 7 i := j*32 k[j] := ( a[i+31:i] < b[i+31:i] ) ? 1 : 0 ENDFOR k[MAX:8] := 0
Integer Mask AVX512VL AVX512F Compare Compare packed signed 32-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k". FOR j := 0 to 7 i := j*32 k[j] := ( a[i+31:i] != b[i+31:i] ) ? 1 : 0 ENDFOR k[MAX:8] := 0
Integer Mask AVX512VL AVX512F Compare Compare packed signed 32-bit integers in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). CASE (imm8[2:0]) OF 0: OP := _MM_CMPINT_EQ 1: OP := _MM_CMPINT_LT 2: OP := _MM_CMPINT_LE 3: OP := _MM_CMPINT_FALSE 4: OP := _MM_CMPINT_NE 5: OP := _MM_CMPINT_NLT 6: OP := _MM_CMPINT_NLE 7: OP := _MM_CMPINT_TRUE ESAC FOR j := 0 to 7 i := j*32 IF k1[j] k[j] := ( a[i+31:i] OP b[i+31:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:8] := 0
Integer Mask AVX512VL AVX512F Compare Compare packed signed 32-bit integers in "a" and "b" for equality, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 IF k1[j] k[j] := ( a[i+31:i] == b[i+31:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:8] := 0
Integer Mask AVX512VL AVX512F Compare Compare packed signed 32-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 IF k1[j] k[j] := ( a[i+31:i] >= b[i+31:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:8] := 0
Integer Mask AVX512VL AVX512F Compare Compare packed signed 32-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 IF k1[j] k[j] := ( a[i+31:i] > b[i+31:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:8] := 0
Integer Mask AVX512VL AVX512F Compare Compare packed signed 32-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 IF k1[j] k[j] := ( a[i+31:i] <= b[i+31:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:8] := 0
Integer Mask AVX512VL AVX512F Compare Compare packed signed 32-bit integers in "a" and "b" for less-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 IF k1[j] k[j] := ( a[i+31:i] < b[i+31:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:8] := 0
Integer Mask AVX512VL AVX512F Compare Compare packed signed 32-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 IF k1[j] k[j] := ( a[i+31:i] != b[i+31:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:8] := 0
Integer Mask AVX512VL AVX512F Compare Compare packed signed 32-bit integers in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k". CASE (imm8[2:0]) OF 0: OP := _MM_CMPINT_EQ 1: OP := _MM_CMPINT_LT 2: OP := _MM_CMPINT_LE 3: OP := _MM_CMPINT_FALSE 4: OP := _MM_CMPINT_NE 5: OP := _MM_CMPINT_NLT 6: OP := _MM_CMPINT_NLE 7: OP := _MM_CMPINT_TRUE ESAC FOR j := 0 to 3 i := j*32 k[j] := ( a[i+31:i] OP b[i+31:i] ) ? 1 : 0 ENDFOR k[MAX:4] := 0
Integer Mask AVX512VL AVX512F Compare Compare packed signed 32-bit integers in "a" and "b" for equality, and store the results in mask vector "k". FOR j := 0 to 3 i := j*32 k[j] := ( a[i+31:i] == b[i+31:i] ) ? 1 : 0 ENDFOR k[MAX:4] := 0
Integer Mask AVX512VL AVX512F Compare Compare packed signed 32-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k". FOR j := 0 to 3 i := j*32 k[j] := ( a[i+31:i] >= b[i+31:i] ) ? 1 : 0 ENDFOR k[MAX:4] := 0
Integer Mask AVX512VL AVX512F Compare Compare packed signed 32-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k". FOR j := 0 to 3 i := j*32 k[j] := ( a[i+31:i] > b[i+31:i] ) ? 1 : 0 ENDFOR k[MAX:4] := 0
Integer Mask AVX512VL AVX512F Compare Compare packed signed 32-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k". FOR j := 0 to 3 i := j*32 k[j] := ( a[i+31:i] <= b[i+31:i] ) ? 1 : 0 ENDFOR k[MAX:4] := 0
Integer Mask AVX512VL AVX512F Compare Compare packed signed 32-bit integers in "a" and "b" for less-than, and store the results in mask vector "k". FOR j := 0 to 3 i := j*32 k[j] := ( a[i+31:i] < b[i+31:i] ) ? 1 : 0 ENDFOR k[MAX:4] := 0
Integer Mask AVX512VL AVX512F Compare Compare packed signed 32-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k". FOR j := 0 to 3 i := j*32 k[j] := ( a[i+31:i] != b[i+31:i] ) ? 1 : 0 ENDFOR k[MAX:4] := 0
Integer Mask AVX512VL AVX512F Compare Compare packed signed 32-bit integers in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). CASE (imm8[2:0]) OF 0: OP := _MM_CMPINT_EQ 1: OP := _MM_CMPINT_LT 2: OP := _MM_CMPINT_LE 3: OP := _MM_CMPINT_FALSE 4: OP := _MM_CMPINT_NE 5: OP := _MM_CMPINT_NLT 6: OP := _MM_CMPINT_NLE 7: OP := _MM_CMPINT_TRUE ESAC FOR j := 0 to 3 i := j*32 IF k1[j] k[j] := ( a[i+31:i] OP b[i+31:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:4] := 0
Integer Mask AVX512VL AVX512F Compare Compare packed signed 32-bit integers in "a" and "b" for equality, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 IF k1[j] k[j] := ( a[i+31:i] == b[i+31:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:4] := 0
Integer Mask AVX512VL AVX512F Compare Compare packed signed 32-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 IF k1[j] k[j] := ( a[i+31:i] >= b[i+31:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:4] := 0
Integer Mask AVX512VL AVX512F Compare Compare packed signed 32-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 IF k1[j] k[j] := ( a[i+31:i] > b[i+31:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:4] := 0
Integer Mask AVX512VL AVX512F Compare Compare packed signed 32-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 IF k1[j] k[j] := ( a[i+31:i] <= b[i+31:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:4] := 0
Integer Mask AVX512VL AVX512F Compare Compare packed signed 32-bit integers in "a" and "b" for less-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 IF k1[j] k[j] := ( a[i+31:i] < b[i+31:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:4] := 0
Integer Mask AVX512VL AVX512F Compare Compare packed signed 32-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 IF k1[j] k[j] := ( a[i+31:i] != b[i+31:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:4] := 0
Integer Mask AVX512VL AVX512F Compare Compare packed signed 64-bit integers in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k". CASE (imm8[2:0]) OF 0: OP := _MM_CMPINT_EQ 1: OP := _MM_CMPINT_LT 2: OP := _MM_CMPINT_LE 3: OP := _MM_CMPINT_FALSE 4: OP := _MM_CMPINT_NE 5: OP := _MM_CMPINT_NLT 6: OP := _MM_CMPINT_NLE 7: OP := _MM_CMPINT_TRUE ESAC FOR j := 0 to 3 i := j*64 k[j] := ( a[i+63:i] OP b[i+63:i] ) ? 1 : 0 ENDFOR k[MAX:4] := 0
Integer Mask AVX512VL AVX512F Compare Compare packed signed 64-bit integers in "a" and "b" for equality, and store the results in mask vector "k". FOR j := 0 to 3 i := j*64 k[j] := ( a[i+63:i] == b[i+63:i] ) ? 1 : 0 ENDFOR k[MAX:4] := 0
Integer Mask AVX512VL AVX512F Compare Compare packed signed 64-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k". FOR j := 0 to 3 i := j*64 k[j] := ( a[i+63:i] >= b[i+63:i] ) ? 1 : 0 ENDFOR k[MAX:4] := 0
Integer Mask AVX512VL AVX512F Compare Compare packed signed 64-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k". FOR j := 0 to 3 i := j*64 k[j] := ( a[i+63:i] > b[i+63:i] ) ? 1 : 0 ENDFOR k[MAX:4] := 0
Integer Mask AVX512VL AVX512F Compare Compare packed signed 64-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k". FOR j := 0 to 3 i := j*64 k[j] := ( a[i+63:i] <= b[i+63:i] ) ? 1 : 0 ENDFOR k[MAX:4] := 0
Integer Mask AVX512VL AVX512F Compare Compare packed signed 64-bit integers in "a" and "b" for less-than, and store the results in mask vector "k". FOR j := 0 to 3 i := j*64 k[j] := ( a[i+63:i] < b[i+63:i] ) ? 1 : 0 ENDFOR k[MAX:4] := 0
Integer Mask AVX512VL AVX512F Compare Compare packed signed 64-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k". FOR j := 0 to 3 i := j*64 k[j] := ( a[i+63:i] != b[i+63:i] ) ? 1 : 0 ENDFOR k[MAX:4] := 0
Integer Mask AVX512VL AVX512F Compare Compare packed signed 64-bit integers in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). CASE (imm8[2:0]) OF 0: OP := _MM_CMPINT_EQ 1: OP := _MM_CMPINT_LT 2: OP := _MM_CMPINT_LE 3: OP := _MM_CMPINT_FALSE 4: OP := _MM_CMPINT_NE 5: OP := _MM_CMPINT_NLT 6: OP := _MM_CMPINT_NLE 7: OP := _MM_CMPINT_TRUE ESAC FOR j := 0 to 3 i := j*64 IF k1[j] k[j] := ( a[i+63:i] OP b[i+63:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:4] := 0
Integer Mask AVX512VL AVX512F Compare Compare packed signed 64-bit integers in "a" and "b" for equality, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k1[j] k[j] := ( a[i+63:i] == b[i+63:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:4] := 0
Integer Mask AVX512VL AVX512F Compare Compare packed signed 64-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k1[j] k[j] := ( a[i+63:i] >= b[i+63:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:4] := 0
Integer Mask AVX512VL AVX512F Compare Compare packed signed 64-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k1[j] k[j] := ( a[i+63:i] > b[i+63:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:4] := 0
Integer Mask AVX512VL AVX512F Compare Compare packed signed 64-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k1[j] k[j] := ( a[i+63:i] <= b[i+63:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:4] := 0
Integer Mask AVX512VL AVX512F Compare Compare packed signed 64-bit integers in "a" and "b" for less-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k1[j] k[j] := ( a[i+63:i] < b[i+63:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:4] := 0
Integer Mask AVX512VL AVX512F Compare Compare packed signed 64-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k1[j] k[j] := ( a[i+63:i] != b[i+63:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:4] := 0
Integer Mask AVX512VL AVX512F Compare Compare packed signed 64-bit integers in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k". CASE (imm8[2:0]) OF 0: OP := _MM_CMPINT_EQ 1: OP := _MM_CMPINT_LT 2: OP := _MM_CMPINT_LE 3: OP := _MM_CMPINT_FALSE 4: OP := _MM_CMPINT_NE 5: OP := _MM_CMPINT_NLT 6: OP := _MM_CMPINT_NLE 7: OP := _MM_CMPINT_TRUE ESAC FOR j := 0 to 1 i := j*64 k[j] := ( a[i+63:i] OP b[i+63:i] ) ? 1 : 0 ENDFOR k[MAX:2] := 0
Integer Mask AVX512VL AVX512F Compare Compare packed signed 64-bit integers in "a" and "b" for equality, and store the results in mask vector "k". FOR j := 0 to 1 i := j*64 k[j] := ( a[i+63:i] == b[i+63:i] ) ? 1 : 0 ENDFOR k[MAX:2] := 0
Integer Mask AVX512VL AVX512F Compare Compare packed signed 64-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k". FOR j := 0 to 1 i := j*64 k[j] := ( a[i+63:i] >= b[i+63:i] ) ? 1 : 0 ENDFOR k[MAX:2] := 0
Integer Mask AVX512VL AVX512F Compare Compare packed signed 64-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k". FOR j := 0 to 1 i := j*64 k[j] := ( a[i+63:i] > b[i+63:i] ) ? 1 : 0 ENDFOR k[MAX:2] := 0
Integer Mask AVX512VL AVX512F Compare Compare packed signed 64-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k". FOR j := 0 to 1 i := j*64 k[j] := ( a[i+63:i] <= b[i+63:i] ) ? 1 : 0 ENDFOR k[MAX:2] := 0
Integer Mask AVX512VL AVX512F Compare Compare packed signed 64-bit integers in "a" and "b" for less-than, and store the results in mask vector "k". FOR j := 0 to 1 i := j*64 k[j] := ( a[i+63:i] < b[i+63:i] ) ? 1 : 0 ENDFOR k[MAX:2] := 0
Integer Mask AVX512VL AVX512F Compare Compare packed signed 64-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k". FOR j := 0 to 1 i := j*64 k[j] := ( a[i+63:i] != b[i+63:i] ) ? 1 : 0 ENDFOR k[MAX:2] := 0
Integer Mask AVX512VL AVX512F Compare Compare packed signed 64-bit integers in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). CASE (imm8[2:0]) OF 0: OP := _MM_CMPINT_EQ 1: OP := _MM_CMPINT_LT 2: OP := _MM_CMPINT_LE 3: OP := _MM_CMPINT_FALSE 4: OP := _MM_CMPINT_NE 5: OP := _MM_CMPINT_NLT 6: OP := _MM_CMPINT_NLE 7: OP := _MM_CMPINT_TRUE ESAC FOR j := 0 to 1 i := j*64 IF k1[j] k[j] := ( a[i+63:i] OP b[i+63:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:2] := 0
Integer Mask AVX512VL AVX512F Compare Compare packed signed 64-bit integers in "a" and "b" for equality, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k1[j] k[j] := ( a[i+63:i] == b[i+63:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:2] := 0
Integer Mask AVX512VL AVX512F Compare Compare packed signed 64-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k1[j] k[j] := ( a[i+63:i] >= b[i+63:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:2] := 0
Integer Mask AVX512VL AVX512F Compare Compare packed signed 64-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k1[j] k[j] := ( a[i+63:i] > b[i+63:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:2] := 0
Integer Mask AVX512VL AVX512F Compare Compare packed signed 64-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k1[j] k[j] := ( a[i+63:i] <= b[i+63:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:2] := 0
Integer Mask AVX512VL AVX512F Compare Compare packed signed 64-bit integers in "a" and "b" for less-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k1[j] k[j] := ( a[i+63:i] < b[i+63:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:2] := 0
Integer Mask AVX512VL AVX512F Compare Compare packed signed 64-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k1[j] k[j] := ( a[i+63:i] != b[i+63:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:2] := 0
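The signed compare entries above all reduce to one pattern: pick a predicate from `imm8[2:0]`, evaluate it per lane, and AND the result with the zeromask "k1". The scalar sketch below follows the CASE table in the pseudocode for 64-bit lanes. The function names are hypothetical and the masks are plain `uint8_t` values. NLT and NLE are taken to mean "not less than" (>=) and "not less than or equal" (>), which matches the named greater-than-or-equal and greater-than entries.

```c
#include <assert.h>
#include <stdint.h>

/* Evaluate the predicate selected by imm8[2:0] on one signed lane pair. */
int cmpint_pred_model(int64_t x, int64_t y, unsigned imm8)
{
    switch (imm8 & 7u) {
    case 0:  return x == y;  /* _MM_CMPINT_EQ    */
    case 1:  return x <  y;  /* _MM_CMPINT_LT    */
    case 2:  return x <= y;  /* _MM_CMPINT_LE    */
    case 3:  return 0;       /* _MM_CMPINT_FALSE */
    case 4:  return x != y;  /* _MM_CMPINT_NE    */
    case 5:  return x >= y;  /* _MM_CMPINT_NLT   */
    case 6:  return x >  y;  /* _MM_CMPINT_NLE   */
    default: return 1;       /* _MM_CMPINT_TRUE  */
    }
}

/* Hypothetical scalar model of the zeromasked compare-to-mask.
   lanes = 2 models the 128-bit entry, lanes = 4 the 256-bit entry.
   Pass k1 with all lane bits set to model the unmasked form. */
uint8_t mask_cmp_epi64_mask_model(uint8_t k1, const int64_t *a,
                                  const int64_t *b, unsigned imm8, int lanes)
{
    assert(lanes == 2 || lanes == 4);
    uint8_t k = 0;
    for (int j = 0; j < lanes; j++) {
        if (((k1 >> j) & 1u) && cmpint_pred_model(a[j], b[j], imm8))
            k |= (uint8_t)(1u << j);
    }
    return k;  /* bits at and above `lanes` stay zero, as in k[MAX:lanes] := 0 */
}
```

Note that even the TRUE predicate is filtered by "k1": lanes whose "k1" bit is clear always produce a zero result bit.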
Integer Mask AVX512VL AVX512F Compare Compare packed unsigned 32-bit integers in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k". CASE (imm8[2:0]) OF 0: OP := _MM_CMPINT_EQ 1: OP := _MM_CMPINT_LT 2: OP := _MM_CMPINT_LE 3: OP := _MM_CMPINT_FALSE 4: OP := _MM_CMPINT_NE 5: OP := _MM_CMPINT_NLT 6: OP := _MM_CMPINT_NLE 7: OP := _MM_CMPINT_TRUE ESAC FOR j := 0 to 7 i := j*32 k[j] := ( a[i+31:i] OP b[i+31:i] ) ? 1 : 0 ENDFOR k[MAX:8] := 0
Integer Mask AVX512VL AVX512F Compare Compare packed unsigned 32-bit integers in "a" and "b" for equality, and store the results in mask vector "k". FOR j := 0 to 7 i := j*32 k[j] := ( a[i+31:i] == b[i+31:i] ) ? 1 : 0 ENDFOR k[MAX:8] := 0
Integer Mask AVX512VL AVX512F Compare Compare packed unsigned 32-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k". FOR j := 0 to 7 i := j*32 k[j] := ( a[i+31:i] >= b[i+31:i] ) ? 1 : 0 ENDFOR k[MAX:8] := 0
Integer Mask AVX512VL AVX512F Compare Compare packed unsigned 32-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k". FOR j := 0 to 7 i := j*32 k[j] := ( a[i+31:i] > b[i+31:i] ) ? 1 : 0 ENDFOR k[MAX:8] := 0
Integer Mask AVX512VL AVX512F Compare Compare packed unsigned 32-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k". FOR j := 0 to 7 i := j*32 k[j] := ( a[i+31:i] <= b[i+31:i] ) ? 1 : 0 ENDFOR k[MAX:8] := 0
Integer Mask AVX512VL AVX512F Compare Compare packed unsigned 32-bit integers in "a" and "b" for less-than, and store the results in mask vector "k". FOR j := 0 to 7 i := j*32 k[j] := ( a[i+31:i] < b[i+31:i] ) ? 1 : 0 ENDFOR k[MAX:8] := 0
Integer Mask AVX512VL AVX512F Compare Compare packed unsigned 32-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k". FOR j := 0 to 7 i := j*32 k[j] := ( a[i+31:i] != b[i+31:i] ) ? 1 : 0 ENDFOR k[MAX:8] := 0
Integer Mask AVX512VL AVX512F Compare Compare packed unsigned 32-bit integers in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). CASE (imm8[2:0]) OF 0: OP := _MM_CMPINT_EQ 1: OP := _MM_CMPINT_LT 2: OP := _MM_CMPINT_LE 3: OP := _MM_CMPINT_FALSE 4: OP := _MM_CMPINT_NE 5: OP := _MM_CMPINT_NLT 6: OP := _MM_CMPINT_NLE 7: OP := _MM_CMPINT_TRUE ESAC FOR j := 0 to 7 i := j*32 IF k1[j] k[j] := ( a[i+31:i] OP b[i+31:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:8] := 0
Integer Mask AVX512VL AVX512F Compare Compare packed unsigned 32-bit integers in "a" and "b" for equality, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 IF k1[j] k[j] := ( a[i+31:i] == b[i+31:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:8] := 0
Integer Mask AVX512VL AVX512F Compare Compare packed unsigned 32-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 IF k1[j] k[j] := ( a[i+31:i] >= b[i+31:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:8] := 0
Integer Mask AVX512VL AVX512F Compare Compare packed unsigned 32-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 IF k1[j] k[j] := ( a[i+31:i] > b[i+31:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:8] := 0
Integer Mask AVX512VL AVX512F Compare Compare packed unsigned 32-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 IF k1[j] k[j] := ( a[i+31:i] <= b[i+31:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:8] := 0
Integer Mask AVX512VL AVX512F Compare Compare packed unsigned 32-bit integers in "a" and "b" for less-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 IF k1[j] k[j] := ( a[i+31:i] < b[i+31:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:8] := 0
Integer Mask AVX512VL AVX512F Compare Compare packed unsigned 32-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 IF k1[j] k[j] := ( a[i+31:i] != b[i+31:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:8] := 0
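The "imm8"-driven entries above select one of eight predicates from imm8[2:0] and then apply it per lane under the zeromask. A minimal scalar sketch of that dispatch follows; the function name and the list-of-lanes representation are assumptions made for illustration, not part of the reference.

```python
import operator

# Predicate table indexed by imm8[2:0], mirroring the CASE block in the entries above.
CMPINT_OPS = [
    operator.eq,            # 0: _MM_CMPINT_EQ
    operator.lt,            # 1: _MM_CMPINT_LT
    operator.le,            # 2: _MM_CMPINT_LE
    lambda x, y: False,     # 3: _MM_CMPINT_FALSE
    operator.ne,            # 4: _MM_CMPINT_NE
    operator.ge,            # 5: _MM_CMPINT_NLT (not less-than)
    operator.gt,            # 6: _MM_CMPINT_NLE (not less-than-or-equal)
    lambda x, y: True,      # 7: _MM_CMPINT_TRUE
]

def mask_cmp_epu32_model(k1, a, b, imm8):
    """Hypothetical model of the zeromasked unsigned 32-bit compare (lane 0 first)."""
    op = CMPINT_OPS[imm8 & 0b111]   # only the low three bits of imm8 are used
    k = 0
    for j, (x, y) in enumerate(zip(a, b)):
        # Mask to 32 bits so the comparison is on unsigned lane values.
        if (k1 >> j) & 1 and op(x & 0xFFFFFFFF, y & 0xFFFFFFFF):
            k |= 1 << j
    return k

a = [1, 2, 3, 4, 5, 6, 7, 8]
b = [8, 7, 6, 5, 4, 3, 2, 1]
lt_all = mask_cmp_epu32_model(0xFF, a, b, 1)      # LT on every lane
lt_odd = mask_cmp_epu32_model(0b10101010, a, b, 1)  # LT on odd lanes only
```

Note that the TRUE predicate under a zeromask simply returns "k1", and the FALSE predicate always returns zero.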
Integer Mask AVX512VL AVX512F Compare Compare packed unsigned 32-bit integers in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k". CASE (imm8[2:0]) OF 0: OP := _MM_CMPINT_EQ 1: OP := _MM_CMPINT_LT 2: OP := _MM_CMPINT_LE 3: OP := _MM_CMPINT_FALSE 4: OP := _MM_CMPINT_NE 5: OP := _MM_CMPINT_NLT 6: OP := _MM_CMPINT_NLE 7: OP := _MM_CMPINT_TRUE ESAC FOR j := 0 to 3 i := j*32 k[j] := ( a[i+31:i] OP b[i+31:i] ) ? 1 : 0 ENDFOR k[MAX:4] := 0
Integer Mask AVX512VL AVX512F Compare Compare packed unsigned 32-bit integers in "a" and "b" for equality, and store the results in mask vector "k". FOR j := 0 to 3 i := j*32 k[j] := ( a[i+31:i] == b[i+31:i] ) ? 1 : 0 ENDFOR k[MAX:4] := 0
Integer Mask AVX512VL AVX512F Compare Compare packed unsigned 32-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k". FOR j := 0 to 3 i := j*32 k[j] := ( a[i+31:i] >= b[i+31:i] ) ? 1 : 0 ENDFOR k[MAX:4] := 0
Integer Mask AVX512VL AVX512F Compare Compare packed unsigned 32-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k". FOR j := 0 to 3 i := j*32 k[j] := ( a[i+31:i] > b[i+31:i] ) ? 1 : 0 ENDFOR k[MAX:4] := 0
Integer Mask AVX512VL AVX512F Compare Compare packed unsigned 32-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k". FOR j := 0 to 3 i := j*32 k[j] := ( a[i+31:i] <= b[i+31:i] ) ? 1 : 0 ENDFOR k[MAX:4] := 0
Integer Mask AVX512VL AVX512F Compare Compare packed unsigned 32-bit integers in "a" and "b" for less-than, and store the results in mask vector "k". FOR j := 0 to 3 i := j*32 k[j] := ( a[i+31:i] < b[i+31:i] ) ? 1 : 0 ENDFOR k[MAX:4] := 0
Integer Mask AVX512VL AVX512F Compare Compare packed unsigned 32-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k". FOR j := 0 to 3 i := j*32 k[j] := ( a[i+31:i] != b[i+31:i] ) ? 1 : 0 ENDFOR k[MAX:4] := 0
Integer Mask AVX512VL AVX512F Compare Compare packed unsigned 32-bit integers in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). CASE (imm8[2:0]) OF 0: OP := _MM_CMPINT_EQ 1: OP := _MM_CMPINT_LT 2: OP := _MM_CMPINT_LE 3: OP := _MM_CMPINT_FALSE 4: OP := _MM_CMPINT_NE 5: OP := _MM_CMPINT_NLT 6: OP := _MM_CMPINT_NLE 7: OP := _MM_CMPINT_TRUE ESAC FOR j := 0 to 3 i := j*32 IF k1[j] k[j] := ( a[i+31:i] OP b[i+31:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:4] := 0
Integer Mask AVX512VL AVX512F Compare Compare packed unsigned 32-bit integers in "a" and "b" for equality, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 IF k1[j] k[j] := ( a[i+31:i] == b[i+31:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:4] := 0
Integer Mask AVX512VL AVX512F Compare Compare packed unsigned 32-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 IF k1[j] k[j] := ( a[i+31:i] >= b[i+31:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:4] := 0
Integer Mask AVX512VL AVX512F Compare Compare packed unsigned 32-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 IF k1[j] k[j] := ( a[i+31:i] > b[i+31:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:4] := 0
Integer Mask AVX512VL AVX512F Compare Compare packed unsigned 32-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 IF k1[j] k[j] := ( a[i+31:i] <= b[i+31:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:4] := 0
Integer Mask AVX512VL AVX512F Compare Compare packed unsigned 32-bit integers in "a" and "b" for less-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 IF k1[j] k[j] := ( a[i+31:i] < b[i+31:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:4] := 0
Integer Mask AVX512VL AVX512F Compare Compare packed unsigned 32-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 IF k1[j] k[j] := ( a[i+31:i] != b[i+31:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:4] := 0
Integer Mask AVX512VL AVX512F Compare Compare packed unsigned 64-bit integers in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k". CASE (imm8[2:0]) OF 0: OP := _MM_CMPINT_EQ 1: OP := _MM_CMPINT_LT 2: OP := _MM_CMPINT_LE 3: OP := _MM_CMPINT_FALSE 4: OP := _MM_CMPINT_NE 5: OP := _MM_CMPINT_NLT 6: OP := _MM_CMPINT_NLE 7: OP := _MM_CMPINT_TRUE ESAC FOR j := 0 to 3 i := j*64 k[j] := ( a[i+63:i] OP b[i+63:i] ) ? 1 : 0 ENDFOR k[MAX:4] := 0
Integer Mask AVX512VL AVX512F Compare Compare packed unsigned 64-bit integers in "a" and "b" for equality, and store the results in mask vector "k". FOR j := 0 to 3 i := j*64 k[j] := ( a[i+63:i] == b[i+63:i] ) ? 1 : 0 ENDFOR k[MAX:4] := 0
Integer Mask AVX512VL AVX512F Compare Compare packed unsigned 64-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k". FOR j := 0 to 3 i := j*64 k[j] := ( a[i+63:i] >= b[i+63:i] ) ? 1 : 0 ENDFOR k[MAX:4] := 0
Integer Mask AVX512VL AVX512F Compare Compare packed unsigned 64-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k". FOR j := 0 to 3 i := j*64 k[j] := ( a[i+63:i] > b[i+63:i] ) ? 1 : 0 ENDFOR k[MAX:4] := 0
Integer Mask AVX512VL AVX512F Compare Compare packed unsigned 64-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k". FOR j := 0 to 3 i := j*64 k[j] := ( a[i+63:i] <= b[i+63:i] ) ? 1 : 0 ENDFOR k[MAX:4] := 0
Integer Mask AVX512VL AVX512F Compare Compare packed unsigned 64-bit integers in "a" and "b" for less-than, and store the results in mask vector "k". FOR j := 0 to 3 i := j*64 k[j] := ( a[i+63:i] < b[i+63:i] ) ? 1 : 0 ENDFOR k[MAX:4] := 0
Integer Mask AVX512VL AVX512F Compare Compare packed unsigned 64-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k". FOR j := 0 to 3 i := j*64 k[j] := ( a[i+63:i] != b[i+63:i] ) ? 1 : 0 ENDFOR k[MAX:4] := 0
Integer Mask AVX512VL AVX512F Compare Compare packed unsigned 64-bit integers in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). CASE (imm8[2:0]) OF 0: OP := _MM_CMPINT_EQ 1: OP := _MM_CMPINT_LT 2: OP := _MM_CMPINT_LE 3: OP := _MM_CMPINT_FALSE 4: OP := _MM_CMPINT_NE 5: OP := _MM_CMPINT_NLT 6: OP := _MM_CMPINT_NLE 7: OP := _MM_CMPINT_TRUE ESAC FOR j := 0 to 3 i := j*64 IF k1[j] k[j] := ( a[i+63:i] OP b[i+63:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:4] := 0
Integer Mask AVX512VL AVX512F Compare Compare packed unsigned 64-bit integers in "a" and "b" for equality, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k1[j] k[j] := ( a[i+63:i] == b[i+63:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:4] := 0
Integer Mask AVX512VL AVX512F Compare Compare packed unsigned 64-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k1[j] k[j] := ( a[i+63:i] >= b[i+63:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:4] := 0
Integer Mask AVX512VL AVX512F Compare Compare packed unsigned 64-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k1[j] k[j] := ( a[i+63:i] > b[i+63:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:4] := 0
Integer Mask AVX512VL AVX512F Compare Compare packed unsigned 64-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k1[j] k[j] := ( a[i+63:i] <= b[i+63:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:4] := 0
Integer Mask AVX512VL AVX512F Compare Compare packed unsigned 64-bit integers in "a" and "b" for less-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k1[j] k[j] := ( a[i+63:i] < b[i+63:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:4] := 0
Integer Mask AVX512VL AVX512F Compare Compare packed unsigned 64-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k1[j] k[j] := ( a[i+63:i] != b[i+63:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:4] := 0
Integer Mask AVX512VL AVX512F Compare Compare packed unsigned 64-bit integers in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k". CASE (imm8[2:0]) OF 0: OP := _MM_CMPINT_EQ 1: OP := _MM_CMPINT_LT 2: OP := _MM_CMPINT_LE 3: OP := _MM_CMPINT_FALSE 4: OP := _MM_CMPINT_NE 5: OP := _MM_CMPINT_NLT 6: OP := _MM_CMPINT_NLE 7: OP := _MM_CMPINT_TRUE ESAC FOR j := 0 to 1 i := j*64 k[j] := ( a[i+63:i] OP b[i+63:i] ) ? 1 : 0 ENDFOR k[MAX:2] := 0
Integer Mask AVX512VL AVX512F Compare Compare packed unsigned 64-bit integers in "a" and "b" for equality, and store the results in mask vector "k". FOR j := 0 to 1 i := j*64 k[j] := ( a[i+63:i] == b[i+63:i] ) ? 1 : 0 ENDFOR k[MAX:2] := 0
Integer Mask AVX512VL AVX512F Compare Compare packed unsigned 64-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k". FOR j := 0 to 1 i := j*64 k[j] := ( a[i+63:i] >= b[i+63:i] ) ? 1 : 0 ENDFOR k[MAX:2] := 0
Integer Mask AVX512VL AVX512F Compare Compare packed unsigned 64-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k". FOR j := 0 to 1 i := j*64 k[j] := ( a[i+63:i] > b[i+63:i] ) ? 1 : 0 ENDFOR k[MAX:2] := 0
Integer Mask AVX512VL AVX512F Compare Compare packed unsigned 64-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k". FOR j := 0 to 1 i := j*64 k[j] := ( a[i+63:i] <= b[i+63:i] ) ? 1 : 0 ENDFOR k[MAX:2] := 0
Integer Mask AVX512VL AVX512F Compare Compare packed unsigned 64-bit integers in "a" and "b" for less-than, and store the results in mask vector "k". FOR j := 0 to 1 i := j*64 k[j] := ( a[i+63:i] < b[i+63:i] ) ? 1 : 0 ENDFOR k[MAX:2] := 0
Integer Mask AVX512VL AVX512F Compare Compare packed unsigned 64-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k". FOR j := 0 to 1 i := j*64 k[j] := ( a[i+63:i] != b[i+63:i] ) ? 1 : 0 ENDFOR k[MAX:2] := 0
Integer Mask AVX512VL AVX512F Compare Compare packed unsigned 64-bit integers in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). CASE (imm8[2:0]) OF 0: OP := _MM_CMPINT_EQ 1: OP := _MM_CMPINT_LT 2: OP := _MM_CMPINT_LE 3: OP := _MM_CMPINT_FALSE 4: OP := _MM_CMPINT_NE 5: OP := _MM_CMPINT_NLT 6: OP := _MM_CMPINT_NLE 7: OP := _MM_CMPINT_TRUE ESAC FOR j := 0 to 1 i := j*64 IF k1[j] k[j] := ( a[i+63:i] OP b[i+63:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:2] := 0
Integer Mask AVX512VL AVX512F Compare Compare packed unsigned 64-bit integers in "a" and "b" for equality, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k1[j] k[j] := ( a[i+63:i] == b[i+63:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:2] := 0
Integer Mask AVX512VL AVX512F Compare Compare packed unsigned 64-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k1[j] k[j] := ( a[i+63:i] >= b[i+63:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:2] := 0
Integer Mask AVX512VL AVX512F Compare Compare packed unsigned 64-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k1[j] k[j] := ( a[i+63:i] > b[i+63:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:2] := 0
Integer Mask AVX512VL AVX512F Compare Compare packed unsigned 64-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k1[j] k[j] := ( a[i+63:i] <= b[i+63:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:2] := 0
Integer Mask AVX512VL AVX512F Compare Compare packed unsigned 64-bit integers in "a" and "b" for less-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k1[j] k[j] := ( a[i+63:i] < b[i+63:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:2] := 0
Integer Mask AVX512VL AVX512F Compare Compare packed unsigned 64-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k1[j] k[j] := ( a[i+63:i] != b[i+63:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:2] := 0
Integer AVX512VL AVX512F Miscellaneous Contiguously store the active 32-bit integers in "a" (those with their respective bit set in writemask "k") to "dst", and pass through the remaining elements from "src". size := 32 m := 0 FOR j := 0 to 7 i := j*32 IF k[j] dst[m+size-1:m] := a[i+31:i] m := m + size FI ENDFOR dst[255:m] := src[255:m] dst[MAX:256] := 0
Integer AVX512VL AVX512F Store Contiguously store the active 32-bit integers in "a" (those with their respective bit set in writemask "k") to unaligned memory at "base_addr". size := 32 m := base_addr FOR j := 0 to 7 i := j*32 IF k[j] MEM[m+size-1:m] := a[i+31:i] m := m + size FI ENDFOR
Integer AVX512VL AVX512F Miscellaneous Contiguously store the active 32-bit integers in "a" (those with their respective bit set in zeromask "k") to "dst", and set the remaining elements to zero. size := 32 m := 0 FOR j := 0 to 7 i := j*32 IF k[j] dst[m+size-1:m] := a[i+31:i] m := m + size FI ENDFOR dst[255:m] := 0 dst[MAX:256] := 0
Integer AVX512VL AVX512F Miscellaneous Contiguously store the active 32-bit integers in "a" (those with their respective bit set in writemask "k") to "dst", and pass through the remaining elements from "src". size := 32 m := 0 FOR j := 0 to 3 i := j*32 IF k[j] dst[m+size-1:m] := a[i+31:i] m := m + size FI ENDFOR dst[127:m] := src[127:m] dst[MAX:128] := 0
Integer AVX512VL AVX512F Store Contiguously store the active 32-bit integers in "a" (those with their respective bit set in writemask "k") to unaligned memory at "base_addr". size := 32 m := base_addr FOR j := 0 to 3 i := j*32 IF k[j] MEM[m+size-1:m] := a[i+31:i] m := m + size FI ENDFOR
Integer AVX512VL AVX512F Miscellaneous Contiguously store the active 32-bit integers in "a" (those with their respective bit set in zeromask "k") to "dst", and set the remaining elements to zero. size := 32 m := 0 FOR j := 0 to 3 i := j*32 IF k[j] dst[m+size-1:m] := a[i+31:i] m := m + size FI ENDFOR dst[127:m] := 0 dst[MAX:128] := 0
Integer AVX512VL AVX512F Miscellaneous Contiguously store the active 64-bit integers in "a" (those with their respective bit set in writemask "k") to "dst", and pass through the remaining elements from "src". size := 64 m := 0 FOR j := 0 to 3 i := j*64 IF k[j] dst[m+size-1:m] := a[i+63:i] m := m + size FI ENDFOR dst[255:m] := src[255:m] dst[MAX:256] := 0
Integer AVX512VL AVX512F Store Contiguously store the active 64-bit integers in "a" (those with their respective bit set in writemask "k") to unaligned memory at "base_addr". size := 64 m := base_addr FOR j := 0 to 3 i := j*64 IF k[j] MEM[m+size-1:m] := a[i+63:i] m := m + size FI ENDFOR
Integer AVX512VL AVX512F Miscellaneous Contiguously store the active 64-bit integers in "a" (those with their respective bit set in zeromask "k") to "dst", and set the remaining elements to zero. size := 64 m := 0 FOR j := 0 to 3 i := j*64 IF k[j] dst[m+size-1:m] := a[i+63:i] m := m + size FI ENDFOR dst[255:m] := 0 dst[MAX:256] := 0
Integer AVX512VL AVX512F Miscellaneous Contiguously store the active 64-bit integers in "a" (those with their respective bit set in writemask "k") to "dst", and pass through the remaining elements from "src". size := 64 m := 0 FOR j := 0 to 1 i := j*64 IF k[j] dst[m+size-1:m] := a[i+63:i] m := m + size FI ENDFOR dst[127:m] := src[127:m] dst[MAX:128] := 0
Integer AVX512VL AVX512F Store Contiguously store the active 64-bit integers in "a" (those with their respective bit set in writemask "k") to unaligned memory at "base_addr". size := 64 m := base_addr FOR j := 0 to 1 i := j*64 IF k[j] MEM[m+size-1:m] := a[i+63:i] m := m + size FI ENDFOR
Integer AVX512VL AVX512F Miscellaneous Contiguously store the active 64-bit integers in "a" (those with their respective bit set in zeromask "k") to "dst", and set the remaining elements to zero. size := 64 m := 0 FOR j := 0 to 1 i := j*64 IF k[j] dst[m+size-1:m] := a[i+63:i] m := m + size FI ENDFOR dst[127:m] := 0 dst[MAX:128] := 0
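The three compress forms above (writemask into "dst", zeromask into "dst", and store to memory) share one packing loop and differ only in what fills the tail. The sketch below models them at element granularity; the function names are hypothetical, and memory is represented as a Python list of elements rather than bytes, which is a simplification.

```python
def mask_compress_model(src, k, a):
    """Pack active lanes of a to the front; the tail passes through from src."""
    active = [x for j, x in enumerate(a) if (k >> j) & 1]
    # Pass-through is positional: dst[255:m] := src[255:m].
    return active + list(src[len(active):])

def maskz_compress_model(k, a):
    """Pack active lanes of a to the front; the tail is zeroed."""
    active = [x for j, x in enumerate(a) if (k >> j) & 1]
    return active + [0] * (len(a) - len(active))

def mask_compressstore_model(mem, base, k, a):
    """Write active lanes contiguously into mem starting at element index base.

    Returns the number of elements written; untouched memory keeps its contents.
    """
    m = base
    for j, x in enumerate(a):
        if (k >> j) & 1:
            mem[m] = x
            m += 1
    return m - base

a = [10, 11, 12, 13, 14, 15, 16, 17]
src = [100, 101, 102, 103, 104, 105, 106, 107]
k = 0b10100101  # lanes 0, 2, 5, 7 are active
merged = mask_compress_model(src, k, a)
zeroed = maskz_compress_model(k, a)
mem = [-1] * 10
written = mask_compressstore_model(mem, 2, k, a)
```

In the writemask form the pass-through elements come from the upper positions of "src", not from the inactive lanes of "a"; the store form writes only as many elements as there are set mask bits.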
Integer AVX512VL AVX512F Miscellaneous Shuffle 32-bit integers in "a" across lanes using the corresponding index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 id := idx[i+2:i]*32 IF k[j] dst[i+31:i] := a[id+31:id] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Miscellaneous Shuffle 32-bit integers in "a" across lanes using the corresponding index in "idx", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 id := idx[i+2:i]*32 IF k[j] dst[i+31:i] := a[id+31:id] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Miscellaneous Shuffle 32-bit integers in "a" across lanes using the corresponding index in "idx", and store the results in "dst". FOR j := 0 to 7 i := j*32 id := idx[i+2:i]*32 dst[i+31:i] := a[id+31:id] ENDFOR dst[MAX:256] := 0
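The single-source shuffle entries above read only the low three bits of each 32-bit index, so any index value wraps into the range 0 to 7. A small scalar sketch, with hypothetical function names and vectors given as eight-element lists (lane 0 first):

```python
def permutexvar_epi32_model(idx, a):
    """dst[j] = a[idx[j] & 7]; upper index bits are ignored."""
    return [a[i & 0b111] for i in idx]

def mask_permutexvar_epi32_model(src, k, idx, a):
    """Writemask form: inactive lanes are copied from src."""
    full = permutexvar_epi32_model(idx, a)
    return [full[j] if (k >> j) & 1 else src[j] for j in range(8)]

def maskz_permutexvar_epi32_model(k, idx, a):
    """Zeromask form: inactive lanes are set to zero."""
    full = permutexvar_epi32_model(idx, a)
    return [full[j] if (k >> j) & 1 else 0 for j in range(8)]

a = [10, 11, 12, 13, 14, 15, 16, 17]
reverse = [7, 6, 5, 4, 3, 2, 1, 0]
src = [-1] * 8
reversed_a = permutexvar_epi32_model(reverse, a)
merged = mask_permutexvar_epi32_model(src, 0b00001111, reverse, a)
```

An all-zero index vector broadcasts lane 0 of "a" to every lane, which is one common use of this operation.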
Integer AVX512VL AVX512F Miscellaneous Shuffle 32-bit integers in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "idx" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 off := idx[i+2:i]*32 IF k[j] dst[i+31:i] := idx[i+3] ? b[off+31:off] : a[off+31:off] ELSE dst[i+31:i] := idx[i+31:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Miscellaneous Shuffle 32-bit integers in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 off := idx[i+2:i]*32 IF k[j] dst[i+31:i] := idx[i+3] ? b[off+31:off] : a[off+31:off] ELSE dst[i+31:i] := a[i+31:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Miscellaneous Shuffle 32-bit integers in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 off := idx[i+2:i]*32 IF k[j] dst[i+31:i] := (idx[i+3]) ? b[off+31:off] : a[off+31:off] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Miscellaneous Shuffle 32-bit integers in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst". FOR j := 0 to 7 i := j*32 off := idx[i+2:i]*32 dst[i+31:i] := idx[i+3] ? b[off+31:off] : a[off+31:off] ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Miscellaneous Shuffle 32-bit integers in "a" and "b" using the corresponding selector and index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "idx" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 off := idx[i+1:i]*32 IF k[j] dst[i+31:i] := idx[i+2] ? b[off+31:off] : a[off+31:off] ELSE dst[i+31:i] := idx[i+31:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Miscellaneous Shuffle 32-bit integers in "a" and "b" using the corresponding selector and index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 off := idx[i+1:i]*32 IF k[j] dst[i+31:i] := idx[i+2] ? b[off+31:off] : a[off+31:off] ELSE dst[i+31:i] := a[i+31:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Miscellaneous Shuffle 32-bit integers in "a" and "b" using the corresponding selector and index in "idx", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 off := idx[i+1:i]*32 IF k[j] dst[i+31:i] := (idx[i+2]) ? b[off+31:off] : a[off+31:off] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Miscellaneous Shuffle 32-bit integers in "a" and "b" using the corresponding selector and index in "idx", and store the results in "dst". FOR j := 0 to 3 i := j*32 off := idx[i+1:i]*32 dst[i+31:i] := idx[i+2] ? b[off+31:off] : a[off+31:off] ENDFOR dst[MAX:128] := 0
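The four 128-bit two-source entries above compute the same shuffle and differ only in what an inactive lane receives: the index vector itself, "a", or zero. The following sketch makes that difference explicit; the function names are hypothetical and vectors are four-element lists (lane 0 first).

```python
def permutex2var_epi32_128_model(a, idx, b):
    """idx[1:0] picks the lane offset, idx[2] picks the source (0 = a, 1 = b)."""
    out = []
    for sel in idx:
        off = sel & 0b11
        out.append(b[off] if (sel >> 2) & 1 else a[off])
    return out

def masked_permutex2var_model(a, idx, b, k, fallback):
    """Apply the shuffle on lanes with k[j] set; take fallback[j] elsewhere."""
    full = permutex2var_epi32_128_model(a, idx, b)
    return [full[j] if (k >> j) & 1 else fallback[j] for j in range(4)]

a = [0, 1, 2, 3]
b = [10, 11, 12, 13]
idx = [0, 4, 3, 7]
k = 0b0101  # lanes 0 and 2 active

unmasked = permutex2var_epi32_128_model(a, idx, b)
from_idx = masked_permutex2var_model(a, idx, b, k, idx)      # inactive lanes copied from idx
from_a = masked_permutex2var_model(a, idx, b, k, a)          # inactive lanes copied from a
zeroed = masked_permutex2var_model(a, idx, b, k, [0] * 4)    # inactive lanes zeroed
```

Passing the fallback vector as a parameter is a design choice of this sketch only; it keeps the three masked behaviours visibly parallel.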
Floating Point AVX512VL AVX512F Miscellaneous Shuffle double-precision (64-bit) floating-point elements in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "idx" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 off := idx[i+1:i]*64 IF k[j] dst[i+63:i] := idx[i+2] ? b[off+63:off] : a[off+63:off] ELSE dst[i+63:i] := idx[i+63:i] FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Miscellaneous Shuffle double-precision (64-bit) floating-point elements in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 off := idx[i+1:i]*64 IF k[j] dst[i+63:i] := idx[i+2] ? b[off+63:off] : a[off+63:off] ELSE dst[i+63:i] := a[i+63:i] FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Miscellaneous Shuffle double-precision (64-bit) floating-point elements in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 off := idx[i+1:i]*64 IF k[j] dst[i+63:i] := (idx[i+2]) ? b[off+63:off] : a[off+63:off] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Miscellaneous Shuffle double-precision (64-bit) floating-point elements in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst". FOR j := 0 to 3 i := j*64 off := idx[i+1:i]*64 dst[i+63:i] := idx[i+2] ? b[off+63:off] : a[off+63:off] ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Miscellaneous Shuffle double-precision (64-bit) floating-point elements in "a" and "b" using the corresponding selector and index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "idx" when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 off := idx[i]*64 IF k[j] dst[i+63:i] := idx[i+1] ? b[off+63:off] : a[off+63:off] ELSE dst[i+63:i] := idx[i+63:i] FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Miscellaneous Shuffle double-precision (64-bit) floating-point elements in "a" and "b" using the corresponding selector and index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 off := idx[i]*64 IF k[j] dst[i+63:i] := idx[i+1] ? b[off+63:off] : a[off+63:off] ELSE dst[i+63:i] := a[i+63:i] FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Miscellaneous Shuffle double-precision (64-bit) floating-point elements in "a" and "b" using the corresponding selector and index in "idx", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 off := idx[i]*64 IF k[j] dst[i+63:i] := (idx[i+1]) ? b[off+63:off] : a[off+63:off] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Miscellaneous Shuffle double-precision (64-bit) floating-point elements in "a" and "b" using the corresponding selector and index in "idx", and store the results in "dst". FOR j := 0 to 1 i := j*64 off := idx[i]*64 dst[i+63:i] := idx[i+1] ? b[off+63:off] : a[off+63:off] ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Miscellaneous Shuffle single-precision (32-bit) floating-point elements in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "idx" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 off := idx[i+2:i]*32 IF k[j] dst[i+31:i] := idx[i+3] ? b[off+31:off] : a[off+31:off] ELSE dst[i+31:i] := idx[i+31:i] FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Miscellaneous Shuffle single-precision (32-bit) floating-point elements in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 off := idx[i+2:i]*32 IF k[j] dst[i+31:i] := idx[i+3] ? b[off+31:off] : a[off+31:off] ELSE dst[i+31:i] := a[i+31:i] FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Miscellaneous Shuffle single-precision (32-bit) floating-point elements in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 off := idx[i+2:i]*32 IF k[j] dst[i+31:i] := (idx[i+3]) ? b[off+31:off] : a[off+31:off] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Miscellaneous Shuffle single-precision (32-bit) floating-point elements in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst". FOR j := 0 to 7 i := j*32 off := idx[i+2:i]*32 dst[i+31:i] := idx[i+3] ? b[off+31:off] : a[off+31:off] ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Miscellaneous Shuffle single-precision (32-bit) floating-point elements in "a" and "b" using the corresponding selector and index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "idx" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 off := idx[i+1:i]*32 IF k[j] dst[i+31:i] := idx[i+2] ? b[off+31:off] : a[off+31:off] ELSE dst[i+31:i] := idx[i+31:i] FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Miscellaneous Shuffle single-precision (32-bit) floating-point elements in "a" and "b" using the corresponding selector and index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 off := idx[i+1:i]*32 IF k[j] dst[i+31:i] := idx[i+2] ? b[off+31:off] : a[off+31:off] ELSE dst[i+31:i] := a[i+31:i] FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Miscellaneous Shuffle single-precision (32-bit) floating-point elements in "a" and "b" using the corresponding selector and index in "idx", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 off := idx[i+1:i]*32 IF k[j] dst[i+31:i] := (idx[i+2]) ? b[off+31:off] : a[off+31:off] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Miscellaneous Shuffle single-precision (32-bit) floating-point elements in "a" and "b" using the corresponding selector and index in "idx", and store the results in "dst". FOR j := 0 to 3 i := j*32 off := idx[i+1:i]*32 dst[i+31:i] := idx[i+2] ? b[off+31:off] : a[off+31:off] ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Miscellaneous Shuffle 64-bit integers in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "idx" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 off := idx[i+1:i]*64 IF k[j] dst[i+63:i] := idx[i+2] ? b[off+63:off] : a[off+63:off] ELSE dst[i+63:i] := idx[i+63:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Miscellaneous Shuffle 64-bit integers in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 off := idx[i+1:i]*64 IF k[j] dst[i+63:i] := idx[i+2] ? b[off+63:off] : a[off+63:off] ELSE dst[i+63:i] := a[i+63:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Miscellaneous Shuffle 64-bit integers in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 off := idx[i+1:i]*64 IF k[j] dst[i+63:i] := (idx[i+2]) ? b[off+63:off] : a[off+63:off] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Miscellaneous Shuffle 64-bit integers in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst". FOR j := 0 to 3 i := j*64 off := idx[i+1:i]*64 dst[i+63:i] := idx[i+2] ? b[off+63:off] : a[off+63:off] ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Miscellaneous Shuffle 64-bit integers in "a" and "b" using the corresponding selector and index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "idx" when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 off := idx[i]*64 IF k[j] dst[i+63:i] := idx[i+1] ? b[off+63:off] : a[off+63:off] ELSE dst[i+63:i] := idx[i+63:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Miscellaneous Shuffle 64-bit integers in "a" and "b" using the corresponding selector and index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 off := idx[i]*64 IF k[j] dst[i+63:i] := idx[i+1] ? b[off+63:off] : a[off+63:off] ELSE dst[i+63:i] := a[i+63:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Miscellaneous Shuffle 64-bit integers in "a" and "b" using the corresponding selector and index in "idx", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 off := idx[i]*64 IF k[j] dst[i+63:i] := (idx[i+1]) ? b[off+63:off] : a[off+63:off] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Miscellaneous Shuffle 64-bit integers in "a" and "b" using the corresponding selector and index in "idx", and store the results in "dst". FOR j := 0 to 1 i := j*64 off := idx[i]*64 dst[i+63:i] := idx[i+1] ? b[off+63:off] : a[off+63:off] ENDFOR dst[MAX:128] := 0
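The two-source shuffle records above share one pattern: the low bits of each "idx" element pick an element position, and the next bit up picks the table ("a" or "b"). The following is a minimal scalar sketch of the 4 x 64-bit zeromask form, written in portable C so it runs without AVX-512 hardware. The function name `permutex2var64_maskz_model` and the array-based signature are illustrative assumptions, not part of any intrinsics header.

```c
#include <stdint.h>

/* Hypothetical scalar model of the 4 x 64-bit two-source shuffle with
   zeromask. For each element j:
     idx[j] bits 1:0 -> element position within the chosen table
     idx[j] bit  2   -> table select (0 = a, 1 = b)
   Elements whose bit in k is clear are written as zero.
   dst must not alias a or b, since all inputs are read per element. */
static void permutex2var64_maskz_model(uint64_t dst[4], uint8_t k,
                                       const uint64_t a[4],
                                       const uint64_t idx[4],
                                       const uint64_t b[4])
{
    for (int j = 0; j < 4; j++) {
        unsigned off = (unsigned)(idx[j] & 3);   /* idx[i+1:i] */
        int from_b = (int)((idx[j] >> 2) & 1);   /* idx[i+2]   */
        uint64_t picked = from_b ? b[off] : a[off];
        dst[j] = ((k >> j) & 1) ? picked : 0;
    }
}
```

The writemask forms differ only in the fallback value: the element is taken from "a" or from "idx" itself, as each record states, instead of being zeroed.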
Floating Point AVX512VL AVX512F Miscellaneous Shuffle double-precision (64-bit) floating-point elements in "a" within 128-bit lanes using the control in "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). IF (imm8[0] == 0) tmp_dst[63:0] := a[63:0]; FI IF (imm8[0] == 1) tmp_dst[63:0] := a[127:64]; FI IF (imm8[1] == 0) tmp_dst[127:64] := a[63:0]; FI IF (imm8[1] == 1) tmp_dst[127:64] := a[127:64]; FI IF (imm8[2] == 0) tmp_dst[191:128] := a[191:128]; FI IF (imm8[2] == 1) tmp_dst[191:128] := a[255:192]; FI IF (imm8[3] == 0) tmp_dst[255:192] := a[191:128]; FI IF (imm8[3] == 1) tmp_dst[255:192] := a[255:192]; FI FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := tmp_dst[i+63:i] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Miscellaneous Shuffle double-precision (64-bit) floating-point elements in "a" within 128-bit lanes using the control in "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). IF (b[1] == 0) tmp_dst[63:0] := a[63:0]; FI IF (b[1] == 1) tmp_dst[63:0] := a[127:64]; FI IF (b[65] == 0) tmp_dst[127:64] := a[63:0]; FI IF (b[65] == 1) tmp_dst[127:64] := a[127:64]; FI IF (b[129] == 0) tmp_dst[191:128] := a[191:128]; FI IF (b[129] == 1) tmp_dst[191:128] := a[255:192]; FI IF (b[193] == 0) tmp_dst[255:192] := a[191:128]; FI IF (b[193] == 1) tmp_dst[255:192] := a[255:192]; FI FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := tmp_dst[i+63:i] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Miscellaneous Shuffle double-precision (64-bit) floating-point elements in "a" within 128-bit lanes using the control in "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). IF (imm8[0] == 0) tmp_dst[63:0] := a[63:0]; FI IF (imm8[0] == 1) tmp_dst[63:0] := a[127:64]; FI IF (imm8[1] == 0) tmp_dst[127:64] := a[63:0]; FI IF (imm8[1] == 1) tmp_dst[127:64] := a[127:64]; FI IF (imm8[2] == 0) tmp_dst[191:128] := a[191:128]; FI IF (imm8[2] == 1) tmp_dst[191:128] := a[255:192]; FI IF (imm8[3] == 0) tmp_dst[255:192] := a[191:128]; FI IF (imm8[3] == 1) tmp_dst[255:192] := a[255:192]; FI FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := tmp_dst[i+63:i] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Miscellaneous Shuffle double-precision (64-bit) floating-point elements in "a" within 128-bit lanes using the control in "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). IF (b[1] == 0) tmp_dst[63:0] := a[63:0]; FI IF (b[1] == 1) tmp_dst[63:0] := a[127:64]; FI IF (b[65] == 0) tmp_dst[127:64] := a[63:0]; FI IF (b[65] == 1) tmp_dst[127:64] := a[127:64]; FI IF (b[129] == 0) tmp_dst[191:128] := a[191:128]; FI IF (b[129] == 1) tmp_dst[191:128] := a[255:192]; FI IF (b[193] == 0) tmp_dst[255:192] := a[191:128]; FI IF (b[193] == 1) tmp_dst[255:192] := a[255:192]; FI FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := tmp_dst[i+63:i] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Miscellaneous Shuffle double-precision (64-bit) floating-point elements in "a" using the control in "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). IF (imm8[0] == 0) tmp_dst[63:0] := a[63:0]; FI IF (imm8[0] == 1) tmp_dst[63:0] := a[127:64]; FI IF (imm8[1] == 0) tmp_dst[127:64] := a[63:0]; FI IF (imm8[1] == 1) tmp_dst[127:64] := a[127:64]; FI FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := tmp_dst[i+63:i] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Miscellaneous Shuffle double-precision (64-bit) floating-point elements in "a" using the control in "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). IF (b[1] == 0) tmp_dst[63:0] := a[63:0]; FI IF (b[1] == 1) tmp_dst[63:0] := a[127:64]; FI IF (b[65] == 0) tmp_dst[127:64] := a[63:0]; FI IF (b[65] == 1) tmp_dst[127:64] := a[127:64]; FI FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := tmp_dst[i+63:i] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Miscellaneous Shuffle double-precision (64-bit) floating-point elements in "a" using the control in "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). IF (imm8[0] == 0) tmp_dst[63:0] := a[63:0]; FI IF (imm8[0] == 1) tmp_dst[63:0] := a[127:64]; FI IF (imm8[1] == 0) tmp_dst[127:64] := a[63:0]; FI IF (imm8[1] == 1) tmp_dst[127:64] := a[127:64]; FI FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := tmp_dst[i+63:i] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Miscellaneous Shuffle double-precision (64-bit) floating-point elements in "a" using the control in "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). IF (b[1] == 0) tmp_dst[63:0] := a[63:0]; FI IF (b[1] == 1) tmp_dst[63:0] := a[127:64]; FI IF (b[65] == 0) tmp_dst[127:64] := a[63:0]; FI IF (b[65] == 1) tmp_dst[127:64] := a[127:64]; FI FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := tmp_dst[i+63:i] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:128] := 0
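The in-lane double-precision shuffles above never move data between 128-bit lanes: each destination element chooses the low or high element of its own lane. Below is a small scalar sketch of the 256-bit "imm8" writemask form in portable C. The name `permute_pd256_mask_model` and its signature are assumptions made for illustration only.

```c
#include <stdint.h>

/* Hypothetical scalar model of the 256-bit in-lane double-precision
   shuffle with an immediate control and a writemask.
   Bit j of imm8 selects, for destination element j, the low (0) or
   high (1) element of the 128-bit lane that element j belongs to. */
static void permute_pd256_mask_model(double dst[4], const double src[4],
                                     uint8_t k, const double a[4],
                                     uint8_t imm8)
{
    double tmp[4];
    for (int j = 0; j < 4; j++) {
        int lane_base = (j / 2) * 2;   /* first element of this lane */
        int sel = (imm8 >> j) & 1;     /* imm8[j] */
        tmp[j] = a[lane_base + sel];
    }
    /* Writemask: keep src where the mask bit is clear. */
    for (int j = 0; j < 4; j++)
        dst[j] = ((k >> j) & 1) ? tmp[j] : src[j];
}
```

Note that the vector-control records read bit 1 of each 64-bit element of "b" (b[1], b[65], b[129], b[193]) rather than bit 0, so a model of those forms would test `(b[j] >> 1) & 1` instead of `imm8[j]`.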
Floating Point AVX512VL AVX512F Miscellaneous Shuffle single-precision (32-bit) floating-point elements in "a" within 128-bit lanes using the control in "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). DEFINE SELECT4(src, control) { CASE(control[1:0]) OF 0: tmp[31:0] := src[31:0] 1: tmp[31:0] := src[63:32] 2: tmp[31:0] := src[95:64] 3: tmp[31:0] := src[127:96] ESAC RETURN tmp[31:0] } tmp_dst[31:0] := SELECT4(a[127:0], imm8[1:0]) tmp_dst[63:32] := SELECT4(a[127:0], imm8[3:2]) tmp_dst[95:64] := SELECT4(a[127:0], imm8[5:4]) tmp_dst[127:96] := SELECT4(a[127:0], imm8[7:6]) tmp_dst[159:128] := SELECT4(a[255:128], imm8[1:0]) tmp_dst[191:160] := SELECT4(a[255:128], imm8[3:2]) tmp_dst[223:192] := SELECT4(a[255:128], imm8[5:4]) tmp_dst[255:224] := SELECT4(a[255:128], imm8[7:6]) FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := tmp_dst[i+31:i] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Miscellaneous Shuffle single-precision (32-bit) floating-point elements in "a" within 128-bit lanes using the control in "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). DEFINE SELECT4(src, control) { CASE(control[1:0]) OF 0: tmp[31:0] := src[31:0] 1: tmp[31:0] := src[63:32] 2: tmp[31:0] := src[95:64] 3: tmp[31:0] := src[127:96] ESAC RETURN tmp[31:0] } tmp_dst[31:0] := SELECT4(a[127:0], b[1:0]) tmp_dst[63:32] := SELECT4(a[127:0], b[33:32]) tmp_dst[95:64] := SELECT4(a[127:0], b[65:64]) tmp_dst[127:96] := SELECT4(a[127:0], b[97:96]) tmp_dst[159:128] := SELECT4(a[255:128], b[129:128]) tmp_dst[191:160] := SELECT4(a[255:128], b[161:160]) tmp_dst[223:192] := SELECT4(a[255:128], b[193:192]) tmp_dst[255:224] := SELECT4(a[255:128], b[225:224]) FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := tmp_dst[i+31:i] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Miscellaneous Shuffle single-precision (32-bit) floating-point elements in "a" within 128-bit lanes using the control in "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). DEFINE SELECT4(src, control) { CASE(control[1:0]) OF 0: tmp[31:0] := src[31:0] 1: tmp[31:0] := src[63:32] 2: tmp[31:0] := src[95:64] 3: tmp[31:0] := src[127:96] ESAC RETURN tmp[31:0] } tmp_dst[31:0] := SELECT4(a[127:0], imm8[1:0]) tmp_dst[63:32] := SELECT4(a[127:0], imm8[3:2]) tmp_dst[95:64] := SELECT4(a[127:0], imm8[5:4]) tmp_dst[127:96] := SELECT4(a[127:0], imm8[7:6]) tmp_dst[159:128] := SELECT4(a[255:128], imm8[1:0]) tmp_dst[191:160] := SELECT4(a[255:128], imm8[3:2]) tmp_dst[223:192] := SELECT4(a[255:128], imm8[5:4]) tmp_dst[255:224] := SELECT4(a[255:128], imm8[7:6]) FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := tmp_dst[i+31:i] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Miscellaneous Shuffle single-precision (32-bit) floating-point elements in "a" within 128-bit lanes using the control in "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). DEFINE SELECT4(src, control) { CASE(control[1:0]) OF 0: tmp[31:0] := src[31:0] 1: tmp[31:0] := src[63:32] 2: tmp[31:0] := src[95:64] 3: tmp[31:0] := src[127:96] ESAC RETURN tmp[31:0] } tmp_dst[31:0] := SELECT4(a[127:0], b[1:0]) tmp_dst[63:32] := SELECT4(a[127:0], b[33:32]) tmp_dst[95:64] := SELECT4(a[127:0], b[65:64]) tmp_dst[127:96] := SELECT4(a[127:0], b[97:96]) tmp_dst[159:128] := SELECT4(a[255:128], b[129:128]) tmp_dst[191:160] := SELECT4(a[255:128], b[161:160]) tmp_dst[223:192] := SELECT4(a[255:128], b[193:192]) tmp_dst[255:224] := SELECT4(a[255:128], b[225:224]) FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := tmp_dst[i+31:i] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Miscellaneous Shuffle single-precision (32-bit) floating-point elements in "a" using the control in "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). DEFINE SELECT4(src, control) { CASE(control[1:0]) OF 0: tmp[31:0] := src[31:0] 1: tmp[31:0] := src[63:32] 2: tmp[31:0] := src[95:64] 3: tmp[31:0] := src[127:96] ESAC RETURN tmp[31:0] } tmp_dst[31:0] := SELECT4(a[127:0], imm8[1:0]) tmp_dst[63:32] := SELECT4(a[127:0], imm8[3:2]) tmp_dst[95:64] := SELECT4(a[127:0], imm8[5:4]) tmp_dst[127:96] := SELECT4(a[127:0], imm8[7:6]) FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := tmp_dst[i+31:i] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Miscellaneous Shuffle single-precision (32-bit) floating-point elements in "a" using the control in "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). DEFINE SELECT4(src, control) { CASE(control[1:0]) OF 0: tmp[31:0] := src[31:0] 1: tmp[31:0] := src[63:32] 2: tmp[31:0] := src[95:64] 3: tmp[31:0] := src[127:96] ESAC RETURN tmp[31:0] } tmp_dst[31:0] := SELECT4(a[127:0], b[1:0]) tmp_dst[63:32] := SELECT4(a[127:0], b[33:32]) tmp_dst[95:64] := SELECT4(a[127:0], b[65:64]) tmp_dst[127:96] := SELECT4(a[127:0], b[97:96]) FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := tmp_dst[i+31:i] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Miscellaneous Shuffle single-precision (32-bit) floating-point elements in "a" using the control in "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). DEFINE SELECT4(src, control) { CASE(control[1:0]) OF 0: tmp[31:0] := src[31:0] 1: tmp[31:0] := src[63:32] 2: tmp[31:0] := src[95:64] 3: tmp[31:0] := src[127:96] ESAC RETURN tmp[31:0] } tmp_dst[31:0] := SELECT4(a[127:0], imm8[1:0]) tmp_dst[63:32] := SELECT4(a[127:0], imm8[3:2]) tmp_dst[95:64] := SELECT4(a[127:0], imm8[5:4]) tmp_dst[127:96] := SELECT4(a[127:0], imm8[7:6]) FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := tmp_dst[i+31:i] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Miscellaneous Shuffle single-precision (32-bit) floating-point elements in "a" using the control in "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). DEFINE SELECT4(src, control) { CASE(control[1:0]) OF 0: tmp[31:0] := src[31:0] 1: tmp[31:0] := src[63:32] 2: tmp[31:0] := src[95:64] 3: tmp[31:0] := src[127:96] ESAC RETURN tmp[31:0] } tmp_dst[31:0] := SELECT4(a[127:0], b[1:0]) tmp_dst[63:32] := SELECT4(a[127:0], b[33:32]) tmp_dst[95:64] := SELECT4(a[127:0], b[65:64]) tmp_dst[127:96] := SELECT4(a[127:0], b[97:96]) FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := tmp_dst[i+31:i] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:128] := 0
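The SELECT4 helper in the records above is a 2-bit lookup into one 128-bit lane. As a rough illustration, the 128-bit "imm8" zeromask form can be modelled in scalar C as follows. The names `select4` and `permute_ps128_maskz_model` are hypothetical and chosen only to mirror the pseudocode.

```c
#include <stdint.h>

/* Mirrors SELECT4 from the pseudocode: the low two bits of control
   pick one of the four 32-bit elements of a 128-bit lane. */
static float select4(const float lane[4], unsigned control)
{
    return lane[control & 3];
}

/* Hypothetical scalar model of the 128-bit single-precision shuffle
   with an immediate control and a zeromask. Destination element j
   uses imm8 bits [2j+1 : 2j] as its selector. */
static void permute_ps128_maskz_model(float dst[4], uint8_t k,
                                      const float a[4], uint8_t imm8)
{
    float tmp[4];
    for (int j = 0; j < 4; j++)
        tmp[j] = select4(a, (unsigned)imm8 >> (2 * j));
    /* Zeromask: clear elements whose mask bit is not set. */
    for (int j = 0; j < 4; j++)
        dst[j] = ((k >> j) & 1) ? tmp[j] : 0.0f;
}
```

In the 256-bit "imm8" records the same four selectors are applied again to the upper lane (a[255:128]), so a model of those forms would call the per-lane loop twice with the same `imm8`.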
Floating Point AVX512VL AVX512F Miscellaneous Shuffle double-precision (64-bit) floating-point elements in "a" across lanes using the control in "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). DEFINE SELECT4(src, control) { CASE(control[1:0]) OF 0: tmp[63:0] := src[63:0] 1: tmp[63:0] := src[127:64] 2: tmp[63:0] := src[191:128] 3: tmp[63:0] := src[255:192] ESAC RETURN tmp[63:0] } tmp_dst[63:0] := SELECT4(a[255:0], imm8[1:0]) tmp_dst[127:64] := SELECT4(a[255:0], imm8[3:2]) tmp_dst[191:128] := SELECT4(a[255:0], imm8[5:4]) tmp_dst[255:192] := SELECT4(a[255:0], imm8[7:6]) FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := tmp_dst[i+63:i] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Miscellaneous Shuffle double-precision (64-bit) floating-point elements in "a" across lanes using the corresponding index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 id := idx[i+1:i]*64 IF k[j] dst[i+63:i] := a[id+63:id] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Miscellaneous Shuffle double-precision (64-bit) floating-point elements in "a" across lanes using the control in "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). DEFINE SELECT4(src, control) { CASE(control[1:0]) OF 0: tmp[63:0] := src[63:0] 1: tmp[63:0] := src[127:64] 2: tmp[63:0] := src[191:128] 3: tmp[63:0] := src[255:192] ESAC RETURN tmp[63:0] } tmp_dst[63:0] := SELECT4(a[255:0], imm8[1:0]) tmp_dst[127:64] := SELECT4(a[255:0], imm8[3:2]) tmp_dst[191:128] := SELECT4(a[255:0], imm8[5:4]) tmp_dst[255:192] := SELECT4(a[255:0], imm8[7:6]) FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := tmp_dst[i+63:i] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Miscellaneous Shuffle double-precision (64-bit) floating-point elements in "a" across lanes using the corresponding index in "idx", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 id := idx[i+1:i]*64 IF k[j] dst[i+63:i] := a[id+63:id] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Miscellaneous Shuffle double-precision (64-bit) floating-point elements in "a" across lanes using the control in "imm8", and store the results in "dst". DEFINE SELECT4(src, control) { CASE(control[1:0]) OF 0: tmp[63:0] := src[63:0] 1: tmp[63:0] := src[127:64] 2: tmp[63:0] := src[191:128] 3: tmp[63:0] := src[255:192] ESAC RETURN tmp[63:0] } dst[63:0] := SELECT4(a[255:0], imm8[1:0]) dst[127:64] := SELECT4(a[255:0], imm8[3:2]) dst[191:128] := SELECT4(a[255:0], imm8[5:4]) dst[255:192] := SELECT4(a[255:0], imm8[7:6]) dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Miscellaneous Shuffle double-precision (64-bit) floating-point elements in "a" across lanes using the corresponding index in "idx", and store the results in "dst". FOR j := 0 to 3 i := j*64 id := idx[i+1:i]*64 dst[i+63:i] := a[id+63:id] ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Miscellaneous Shuffle single-precision (32-bit) floating-point elements in "a" across lanes using the corresponding index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 id := idx[i+2:i]*32 IF k[j] dst[i+31:i] := a[id+31:id] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Miscellaneous Shuffle single-precision (32-bit) floating-point elements in "a" across lanes using the corresponding index in "idx", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 id := idx[i+2:i]*32 IF k[j] dst[i+31:i] := a[id+31:id] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Miscellaneous Shuffle single-precision (32-bit) floating-point elements in "a" across lanes using the corresponding index in "idx", and store the results in "dst". FOR j := 0 to 7 i := j*32 id := idx[i+2:i]*32 dst[i+31:i] := a[id+31:id] ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Miscellaneous Shuffle 64-bit integers in "a" across lanes using the control in "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). DEFINE SELECT4(src, control) { CASE(control[1:0]) OF 0: tmp[63:0] := src[63:0] 1: tmp[63:0] := src[127:64] 2: tmp[63:0] := src[191:128] 3: tmp[63:0] := src[255:192] ESAC RETURN tmp[63:0] } tmp_dst[63:0] := SELECT4(a[255:0], imm8[1:0]) tmp_dst[127:64] := SELECT4(a[255:0], imm8[3:2]) tmp_dst[191:128] := SELECT4(a[255:0], imm8[5:4]) tmp_dst[255:192] := SELECT4(a[255:0], imm8[7:6]) FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := tmp_dst[i+63:i] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Miscellaneous Shuffle 64-bit integers in "a" across lanes using the corresponding index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 id := idx[i+1:i]*64 IF k[j] dst[i+63:i] := a[id+63:id] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Miscellaneous Shuffle 64-bit integers in "a" across lanes using the control in "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). DEFINE SELECT4(src, control) { CASE(control[1:0]) OF 0: tmp[63:0] := src[63:0] 1: tmp[63:0] := src[127:64] 2: tmp[63:0] := src[191:128] 3: tmp[63:0] := src[255:192] ESAC RETURN tmp[63:0] } tmp_dst[63:0] := SELECT4(a[255:0], imm8[1:0]) tmp_dst[127:64] := SELECT4(a[255:0], imm8[3:2]) tmp_dst[191:128] := SELECT4(a[255:0], imm8[5:4]) tmp_dst[255:192] := SELECT4(a[255:0], imm8[7:6]) FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := tmp_dst[i+63:i] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Miscellaneous Shuffle 64-bit integers in "a" across lanes using the corresponding index in "idx", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 id := idx[i+1:i]*64 IF k[j] dst[i+63:i] := a[id+63:id] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Miscellaneous Shuffle 64-bit integers in "a" across lanes using the control in "imm8", and store the results in "dst". DEFINE SELECT4(src, control) { CASE(control[1:0]) OF 0: tmp[63:0] := src[63:0] 1: tmp[63:0] := src[127:64] 2: tmp[63:0] := src[191:128] 3: tmp[63:0] := src[255:192] ESAC RETURN tmp[63:0] } dst[63:0] := SELECT4(a[255:0], imm8[1:0]) dst[127:64] := SELECT4(a[255:0], imm8[3:2]) dst[191:128] := SELECT4(a[255:0], imm8[5:4]) dst[255:192] := SELECT4(a[255:0], imm8[7:6]) dst[MAX:256] := 0
Integer AVX512VL AVX512F Miscellaneous Shuffle 64-bit integers in "a" across lanes using the corresponding index in "idx", and store the results in "dst". FOR j := 0 to 3 i := j*64 id := idx[i+1:i]*64 dst[i+63:i] := a[id+63:id] ENDFOR dst[MAX:256] := 0
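The cross-lane, index-driven shuffles above reduce to a table lookup: each destination element reads the source element named by the low bits of its own index, and higher index bits are ignored. A minimal scalar sketch of the 8 x 32-bit writemask form follows, in portable C. The name `permutexvar32x8_mask_model` is an assumption for illustration, and elements are treated as raw 32-bit patterns, which matches the single-precision records because the shuffle moves bits unchanged.

```c
#include <stdint.h>

/* Hypothetical scalar model of the 8 x 32-bit cross-lane shuffle with
   a writemask. Only idx bits 2:0 are used (idx[i+2:i]); higher bits
   are ignored. dst must not alias a, because every destination
   element may read any source element. */
static void permutexvar32x8_mask_model(uint32_t dst[8],
                                       const uint32_t src[8], uint8_t k,
                                       const uint32_t idx[8],
                                       const uint32_t a[8])
{
    for (int j = 0; j < 8; j++) {
        unsigned id = idx[j] & 7;
        dst[j] = ((k >> j) & 1) ? a[id] : src[j];
    }
}
```

The 4 x 64-bit records follow the same shape with a 2-bit index (`idx[j] & 3`) and four elements.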
Integer AVX512VL AVX512F Miscellaneous Load contiguous active 32-bit integers from "a" (those with their respective bit set in mask "k"), and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). m := 0 FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := a[m+31:m] m := m + 32 ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Load Load contiguous active 32-bit integers from unaligned memory at "mem_addr" (those with their respective bit set in mask "k"), and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). m := 0 FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := MEM[mem_addr+m+31:mem_addr+m] m := m + 32 ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Miscellaneous Load contiguous active 32-bit integers from "a" (those with their respective bit set in mask "k"), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). m := 0 FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := a[m+31:m] m := m + 32 ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Load Load contiguous active 32-bit integers from unaligned memory at "mem_addr" (those with their respective bit set in mask "k"), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). m := 0 FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := MEM[mem_addr+m+31:mem_addr+m] m := m + 32 ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Miscellaneous Load contiguous active 32-bit integers from "a" (those with their respective bit set in mask "k"), and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). m := 0 FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := a[m+31:m] m := m + 32 ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Load Load contiguous active 32-bit integers from unaligned memory at "mem_addr" (those with their respective bit set in mask "k"), and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). m := 0 FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := MEM[mem_addr+m+31:mem_addr+m] m := m + 32 ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Miscellaneous Load contiguous active 32-bit integers from "a" (those with their respective bit set in mask "k"), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). m := 0 FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := a[m+31:m] m := m + 32 ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Load Load contiguous active 32-bit integers from unaligned memory at "mem_addr" (those with their respective bit set in mask "k"), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). m := 0 FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := MEM[mem_addr+m+31:mem_addr+m] m := m + 32 ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Miscellaneous Load contiguous active 64-bit integers from "a" (those with their respective bit set in mask "k"), and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). m := 0 FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := a[m+63:m] m := m + 64 ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Load Load contiguous active 64-bit integers from unaligned memory at "mem_addr" (those with their respective bit set in mask "k"), and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). m := 0 FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := MEM[mem_addr+m+63:mem_addr+m] m := m + 64 ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Miscellaneous Load contiguous active 64-bit integers from "a" (those with their respective bit set in mask "k"), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). m := 0 FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := a[m+63:m] m := m + 64 ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Load Load contiguous active 64-bit integers from unaligned memory at "mem_addr" (those with their respective bit set in mask "k"), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). m := 0 FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := MEM[mem_addr+m+63:mem_addr+m] m := m + 64 ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Miscellaneous Load contiguous active 64-bit integers from "a" (those with their respective bit set in mask "k"), and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). m := 0 FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := a[m+63:m] m := m + 64 ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Load Load contiguous active 64-bit integers from unaligned memory at "mem_addr" (those with their respective bit set in mask "k"), and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). m := 0 FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := MEM[mem_addr+m+63:mem_addr+m] m := m + 64 ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Miscellaneous Load contiguous active 64-bit integers from "a" (those with their respective bit set in mask "k"), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). m := 0 FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := a[m+63:m] m := m + 64 ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Load Load contiguous active 64-bit integers from unaligned memory at "mem_addr" (those with their respective bit set in mask "k"), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). m := 0 FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := MEM[mem_addr+m+63:mem_addr+m] m := m + 64 ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:128] := 0
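The "load contiguous active" records above describe an expand operation: packed input elements are consumed in order and scattered to the positions whose mask bit is set. The sketch below models the 8 x 32-bit writemask form in scalar C. The name `expand32x8_mask_model` is hypothetical, and a plain pointer stands in for both the register source "a" and the memory source "mem_addr".

```c
#include <stdint.h>

/* Hypothetical scalar model of the 8 x 32-bit expand with a writemask.
   packed holds the contiguous input; the counter m advances only when
   a mask bit is set, so per the pseudocode only as many input elements
   are read as there are set bits in k. */
static void expand32x8_mask_model(uint32_t dst[8], const uint32_t src[8],
                                  uint8_t k, const uint32_t *packed)
{
    int m = 0;
    for (int j = 0; j < 8; j++) {
        if ((k >> j) & 1)
            dst[j] = packed[m++];   /* next contiguous input element */
        else
            dst[j] = src[j];        /* writemask: keep src */
    }
}
```

For example, with `k = 0xA4` (bits 2, 5 and 7 set) the first three packed values land in elements 2, 5 and 7, and the remaining elements come from "src". The zeromask records write 0 in place of `src[j]`.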
Integer AVX512VL AVX512F Load Gather 32-bit integers from memory using 32-bit indices. 32-bit elements are loaded from addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8. FOR j := 0 to 7 i := j*32 m := j*32 IF k[j] addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8 dst[i+31:i] := MEM[addr+31:addr] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Load Gather 32-bit integers from memory using 32-bit indices. 32-bit elements are loaded from addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8. FOR j := 0 to 3 i := j*32 m := j*32 IF k[j] addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8 dst[i+31:i] := MEM[addr+31:addr] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Load Gather 64-bit integers from memory using 32-bit indices. 64-bit elements are loaded from addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8. FOR j := 0 to 3 i := j*64 m := j*32 IF k[j] addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8 dst[i+63:i] := MEM[addr+63:addr] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Load Gather 64-bit integers from memory using 32-bit indices. 64-bit elements are loaded from addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8. FOR j := 0 to 1 i := j*64 m := j*32 IF k[j] addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8 dst[i+63:i] := MEM[addr+63:addr] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Load Gather 32-bit integers from memory using 64-bit indices. 32-bit elements are loaded from addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8. FOR j := 0 to 3 i := j*32 m := j*64 IF k[j] addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8 dst[i+31:i] := MEM[addr+31:addr] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Load Gather 32-bit integers from memory using 64-bit indices. 32-bit elements are loaded from addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8. FOR j := 0 to 1 i := j*32 m := j*64 IF k[j] addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8 dst[i+31:i] := MEM[addr+31:addr] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:64] := 0
Integer AVX512VL AVX512F Load Gather 64-bit integers from memory using 64-bit indices. 64-bit elements are loaded from addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8. FOR j := 0 to 3 i := j*64 m := j*64 IF k[j] addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8 dst[i+63:i] := MEM[addr+63:addr] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Load Gather 64-bit integers from memory using 64-bit indices. 64-bit elements are loaded from addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8. FOR j := 0 to 1 i := j*64 m := j*64 IF k[j] addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8 dst[i+63:i] := MEM[addr+63:addr] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:128] := 0
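The masked gathers above can be sketched as a scalar model in C, here for four 32-bit elements with 32-bit indices. The trailing "* 8" in the pseudocode appears to convert a byte offset into a bit address, since MEM ranges are written in bits; in byte terms the offset is simply index times scale. The function name `model_mask_i32gather_epi32_x4` and its signature are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of a writemasked gather of four 32-bit elements
   using sign-extended 32-bit indices. Offsets are computed in bytes. */
void model_mask_i32gather_epi32_x4(int32_t dst[4], const int32_t src[4],
                                   uint8_t k, const int32_t vindex[4],
                                   const void *base_addr, int scale)
{
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    const unsigned char *base = (const unsigned char *)base_addr;
    for (int j = 0; j < 4; j++) {
        if ((k >> j) & 1) {
            /* Sign-extend the index to 64 bits before scaling. */
            int64_t byte_off = (int64_t)vindex[j] * scale;
            memcpy(&dst[j], base + byte_off, sizeof dst[j]);
        } else {
            dst[j] = src[j];        /* inactive lane: copied from src */
        }
    }
}
```

With `scale` set to 4, each index acts as an ordinary array subscript into a table of 32-bit integers; negative indices reach elements before `base_addr`.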
Integer AVX512VL AVX512F Arithmetic Compare packed signed 32-bit integers in "a" and "b", and store packed maximum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := MAX(a[i+31:i], b[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Arithmetic Compare packed signed 32-bit integers in "a" and "b", and store packed maximum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := MAX(a[i+31:i], b[i+31:i]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Arithmetic Compare packed signed 32-bit integers in "a" and "b", and store packed maximum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := MAX(a[i+31:i], b[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Arithmetic Compare packed signed 32-bit integers in "a" and "b", and store packed maximum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := MAX(a[i+31:i], b[i+31:i]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Arithmetic Compare packed signed 64-bit integers in "a" and "b", and store packed maximum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := MAX(a[i+63:i], b[i+63:i]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Arithmetic Compare packed signed 64-bit integers in "a" and "b", and store packed maximum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := MAX(a[i+63:i], b[i+63:i]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Arithmetic Compare packed signed 64-bit integers in "a" and "b", and store packed maximum values in "dst". FOR j := 0 to 3 i := j*64 dst[i+63:i] := MAX(a[i+63:i], b[i+63:i]) ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Arithmetic Compare packed signed 64-bit integers in "a" and "b", and store packed maximum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := MAX(a[i+63:i], b[i+63:i]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Arithmetic Compare packed signed 64-bit integers in "a" and "b", and store packed maximum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := MAX(a[i+63:i], b[i+63:i]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Arithmetic Compare packed signed 64-bit integers in "a" and "b", and store packed maximum values in "dst". FOR j := 0 to 1 i := j*64 dst[i+63:i] := MAX(a[i+63:i], b[i+63:i]) ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Arithmetic Compare packed unsigned 32-bit integers in "a" and "b", and store packed maximum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := MAX(a[i+31:i], b[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Arithmetic Compare packed unsigned 32-bit integers in "a" and "b", and store packed maximum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := MAX(a[i+31:i], b[i+31:i]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Arithmetic Compare packed unsigned 32-bit integers in "a" and "b", and store packed maximum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := MAX(a[i+31:i], b[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Arithmetic Compare packed unsigned 32-bit integers in "a" and "b", and store packed maximum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := MAX(a[i+31:i], b[i+31:i]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Arithmetic Compare packed unsigned 64-bit integers in "a" and "b", and store packed maximum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := MAX(a[i+63:i], b[i+63:i]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Arithmetic Compare packed unsigned 64-bit integers in "a" and "b", and store packed maximum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := MAX(a[i+63:i], b[i+63:i]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Arithmetic Compare packed unsigned 64-bit integers in "a" and "b", and store packed maximum values in "dst". FOR j := 0 to 3 i := j*64 dst[i+63:i] := MAX(a[i+63:i], b[i+63:i]) ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Arithmetic Compare packed unsigned 64-bit integers in "a" and "b", and store packed maximum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := MAX(a[i+63:i], b[i+63:i]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Arithmetic Compare packed unsigned 64-bit integers in "a" and "b", and store packed maximum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := MAX(a[i+63:i], b[i+63:i]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Arithmetic Compare packed unsigned 64-bit integers in "a" and "b", and store packed maximum values in "dst". FOR j := 0 to 1 i := j*64 dst[i+63:i] := MAX(a[i+63:i], b[i+63:i]) ENDFOR dst[MAX:128] := 0
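The signed and unsigned maximum entries share the same pseudocode; only the interpretation of each lane differs. A minimal scalar sketch in C makes the difference visible. The function names `model_max_epi64` and `model_max_epu64` are hypothetical, and vectors are represented as plain arrays.

```c
#include <assert.h>
#include <stdint.h>

/* Signed interpretation: lanes are compared as two's-complement values. */
void model_max_epi64(int64_t *dst, const int64_t *a, const int64_t *b, int lanes)
{
    assert(lanes == 2 || lanes == 4);   /* 128-bit or 256-bit vector */
    for (int j = 0; j < lanes; j++)
        dst[j] = a[j] > b[j] ? a[j] : b[j];
}

/* Unsigned interpretation: the same bit patterns compare as magnitudes. */
void model_max_epu64(uint64_t *dst, const uint64_t *a, const uint64_t *b, int lanes)
{
    assert(lanes == 2 || lanes == 4);
    for (int j = 0; j < lanes; j++)
        dst[j] = a[j] > b[j] ? a[j] : b[j];
}
```

For example, the all-ones bit pattern is -1 when signed, so it loses to 1, but it is the largest possible value when unsigned, so it wins.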
Integer AVX512VL AVX512F Arithmetic Compare packed signed 32-bit integers in "a" and "b", and store packed minimum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := MIN(a[i+31:i], b[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Arithmetic Compare packed signed 32-bit integers in "a" and "b", and store packed minimum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := MIN(a[i+31:i], b[i+31:i]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Arithmetic Compare packed signed 32-bit integers in "a" and "b", and store packed minimum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := MIN(a[i+31:i], b[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Arithmetic Compare packed signed 32-bit integers in "a" and "b", and store packed minimum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := MIN(a[i+31:i], b[i+31:i]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Arithmetic Compare packed signed 64-bit integers in "a" and "b", and store packed minimum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := MIN(a[i+63:i], b[i+63:i]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Arithmetic Compare packed signed 64-bit integers in "a" and "b", and store packed minimum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := MIN(a[i+63:i], b[i+63:i]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Arithmetic Compare packed signed 64-bit integers in "a" and "b", and store packed minimum values in "dst". FOR j := 0 to 3 i := j*64 dst[i+63:i] := MIN(a[i+63:i], b[i+63:i]) ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Arithmetic Compare packed signed 64-bit integers in "a" and "b", and store packed minimum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := MIN(a[i+63:i], b[i+63:i]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Arithmetic Compare packed signed 64-bit integers in "a" and "b", and store packed minimum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := MIN(a[i+63:i], b[i+63:i]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Arithmetic Compare packed signed 64-bit integers in "a" and "b", and store packed minimum values in "dst". FOR j := 0 to 1 i := j*64 dst[i+63:i] := MIN(a[i+63:i], b[i+63:i]) ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Arithmetic Compare packed unsigned 32-bit integers in "a" and "b", and store packed minimum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := MIN(a[i+31:i], b[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Arithmetic Compare packed unsigned 32-bit integers in "a" and "b", and store packed minimum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := MIN(a[i+31:i], b[i+31:i]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Arithmetic Compare packed unsigned 32-bit integers in "a" and "b", and store packed minimum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := MIN(a[i+31:i], b[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Arithmetic Compare packed unsigned 32-bit integers in "a" and "b", and store packed minimum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := MIN(a[i+31:i], b[i+31:i]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Arithmetic Compare packed unsigned 64-bit integers in "a" and "b", and store packed minimum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := MIN(a[i+63:i], b[i+63:i]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Arithmetic Compare packed unsigned 64-bit integers in "a" and "b", and store packed minimum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := MIN(a[i+63:i], b[i+63:i]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Arithmetic Compare packed unsigned 64-bit integers in "a" and "b", and store packed minimum values in "dst". FOR j := 0 to 3 i := j*64 dst[i+63:i] := MIN(a[i+63:i], b[i+63:i]) ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Arithmetic Compare packed unsigned 64-bit integers in "a" and "b", and store packed minimum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := MIN(a[i+63:i], b[i+63:i]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Arithmetic Compare packed unsigned 64-bit integers in "a" and "b", and store packed minimum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := MIN(a[i+63:i], b[i+63:i]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Arithmetic Compare packed unsigned 64-bit integers in "a" and "b", and store packed minimum values in "dst". FOR j := 0 to 1 i := j*64 dst[i+63:i] := MIN(a[i+63:i], b[i+63:i]) ENDFOR dst[MAX:128] := 0
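The writemask and zeromask variants above differ only in what an inactive lane receives. The following C sketch models both for the signed 32-bit minimum. The function name `model_masked_min_epi32` is hypothetical, as is the convention of passing a NULL `src` to select zeromask behavior.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of the masked signed 32-bit minimum.
   src != NULL: writemask (inactive lanes copied from src).
   src == NULL: zeromask  (inactive lanes set to 0). */
void model_masked_min_epi32(int32_t *dst, const int32_t *src, uint8_t k,
                            const int32_t *a, const int32_t *b, int lanes)
{
    assert(lanes == 4 || lanes == 8);   /* 128-bit or 256-bit vector */
    for (int j = 0; j < lanes; j++) {
        if ((k >> j) & 1)
            dst[j] = a[j] < b[j] ? a[j] : b[j];
        else
            dst[j] = src ? src[j] : 0;
    }
}
```

Calling it twice with the same mask, once with a `src` array and once with NULL, gives identical active lanes and differs only in the inactive ones.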
Integer AVX512VL AVX512F Convert Convert packed 32-bit integers in "a" to packed 8-bit integers with truncation, and store the results in "dst". FOR j := 0 to 7 i := 32*j k := 8*j dst[k+7:k] := Truncate8(a[i+31:i]) ENDFOR dst[MAX:64] := 0
Integer AVX512VL AVX512F Convert Convert packed 32-bit integers in "a" to packed 8-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := 32*j l := 8*j IF k[j] dst[l+7:l] := Truncate8(a[i+31:i]) ELSE dst[l+7:l] := src[l+7:l] FI ENDFOR dst[MAX:64] := 0
Integer AVX512VL AVX512F Convert Store Convert packed 32-bit integers in "a" to packed 8-bit integers with truncation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr". FOR j := 0 to 7 i := 32*j l := 8*j IF k[j] MEM[base_addr+l+7:base_addr+l] := Truncate8(a[i+31:i]) FI ENDFOR
Integer AVX512VL AVX512F Convert Convert packed 32-bit integers in "a" to packed 8-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := 32*j l := 8*j IF k[j] dst[l+7:l] := Truncate8(a[i+31:i]) ELSE dst[l+7:l] := 0 FI ENDFOR dst[MAX:64] := 0
Integer AVX512VL AVX512F Convert Convert packed 32-bit integers in "a" to packed 8-bit integers with truncation, and store the results in "dst". FOR j := 0 to 3 i := 32*j k := 8*j dst[k+7:k] := Truncate8(a[i+31:i]) ENDFOR dst[MAX:32] := 0
Integer AVX512VL AVX512F Convert Convert packed 32-bit integers in "a" to packed 8-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := 32*j l := 8*j IF k[j] dst[l+7:l] := Truncate8(a[i+31:i]) ELSE dst[l+7:l] := src[l+7:l] FI ENDFOR dst[MAX:32] := 0
Integer AVX512VL AVX512F Convert Store Convert packed 32-bit integers in "a" to packed 8-bit integers with truncation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr". FOR j := 0 to 3 i := 32*j l := 8*j IF k[j] MEM[base_addr+l+7:base_addr+l] := Truncate8(a[i+31:i]) FI ENDFOR
Integer AVX512VL AVX512F Convert Convert packed 32-bit integers in "a" to packed 8-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := 32*j l := 8*j IF k[j] dst[l+7:l] := Truncate8(a[i+31:i]) ELSE dst[l+7:l] := 0 FI ENDFOR dst[MAX:32] := 0
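The truncating 32-bit to 8-bit conversions above keep only the low byte of each element and pack the results into the low end of the destination, zeroing everything above. A scalar sketch in C follows; the function name `model_cvtepi32_epi8_trunc` is hypothetical, and the 128-bit destination is represented as 16 bytes.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of the unmasked truncating conversion.
   lanes is 4 (128-bit source) or 8 (256-bit source). */
void model_cvtepi32_epi8_trunc(uint8_t dst[16], const uint32_t *a, int lanes)
{
    assert(lanes == 4 || lanes == 8);
    memset(dst, 0, 16);                  /* models dst[MAX:32] or dst[MAX:64] := 0 */
    for (int j = 0; j < lanes; j++)
        dst[j] = (uint8_t)(a[j] & 0xFF); /* Truncate8: keep the low 8 bits */
}
```

Truncation discards the upper bits without regard to sign or magnitude, so 0x00000100 becomes 0x00 and 0x000001FF becomes 0xFF.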
Integer AVX512VL AVX512F Convert Convert packed 32-bit integers in "a" to packed 16-bit integers with truncation, and store the results in "dst". FOR j := 0 to 7 i := 32*j k := 16*j dst[k+15:k] := Truncate16(a[i+31:i]) ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Convert Convert packed 32-bit integers in "a" to packed 16-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := 32*j l := 16*j IF k[j] dst[l+15:l] := Truncate16(a[i+31:i]) ELSE dst[l+15:l] := src[l+15:l] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Convert Store Convert packed 32-bit integers in "a" to packed 16-bit integers with truncation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr". FOR j := 0 to 7 i := 32*j l := 16*j IF k[j] MEM[base_addr+l+15:base_addr+l] := Truncate16(a[i+31:i]) FI ENDFOR
Integer AVX512VL AVX512F Convert Convert packed 32-bit integers in "a" to packed 16-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := 32*j l := 16*j IF k[j] dst[l+15:l] := Truncate16(a[i+31:i]) ELSE dst[l+15:l] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Convert Convert packed 32-bit integers in "a" to packed 16-bit integers with truncation, and store the results in "dst". FOR j := 0 to 3 i := 32*j k := 16*j dst[k+15:k] := Truncate16(a[i+31:i]) ENDFOR dst[MAX:64] := 0
Integer AVX512VL AVX512F Convert Convert packed 32-bit integers in "a" to packed 16-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := 32*j l := 16*j IF k[j] dst[l+15:l] := Truncate16(a[i+31:i]) ELSE dst[l+15:l] := src[l+15:l] FI ENDFOR dst[MAX:64] := 0
Integer AVX512VL AVX512F Convert Store Convert packed 32-bit integers in "a" to packed 16-bit integers with truncation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr". FOR j := 0 to 3 i := 32*j l := 16*j IF k[j] MEM[base_addr+l+15:base_addr+l] := Truncate16(a[i+31:i]) FI ENDFOR
Integer AVX512VL AVX512F Convert Convert packed 32-bit integers in "a" to packed 16-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := 32*j l := 16*j IF k[j] dst[l+15:l] := Truncate16(a[i+31:i]) ELSE dst[l+15:l] := 0 FI ENDFOR dst[MAX:64] := 0
Integer AVX512VL AVX512F Convert Convert packed 64-bit integers in "a" to packed 8-bit integers with truncation, and store the results in "dst". FOR j := 0 to 3 i := 64*j k := 8*j dst[k+7:k] := Truncate8(a[i+63:i]) ENDFOR dst[MAX:32] := 0
Integer AVX512VL AVX512F Convert Convert packed 64-bit integers in "a" to packed 8-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := 64*j l := 8*j IF k[j] dst[l+7:l] := Truncate8(a[i+63:i]) ELSE dst[l+7:l] := src[l+7:l] FI ENDFOR dst[MAX:32] := 0
Integer AVX512VL AVX512F Convert Store Convert packed 64-bit integers in "a" to packed 8-bit integers with truncation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr". FOR j := 0 to 3 i := 64*j l := 8*j IF k[j] MEM[base_addr+l+7:base_addr+l] := Truncate8(a[i+63:i]) FI ENDFOR
Integer AVX512VL AVX512F Convert Convert packed 64-bit integers in "a" to packed 8-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := 64*j l := 8*j IF k[j] dst[l+7:l] := Truncate8(a[i+63:i]) ELSE dst[l+7:l] := 0 FI ENDFOR dst[MAX:32] := 0
Integer AVX512VL AVX512F Convert Convert packed 64-bit integers in "a" to packed 8-bit integers with truncation, and store the results in "dst". FOR j := 0 to 1 i := 64*j k := 8*j dst[k+7:k] := Truncate8(a[i+63:i]) ENDFOR dst[MAX:16] := 0
Integer AVX512VL AVX512F Convert Convert packed 64-bit integers in "a" to packed 8-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 1 i := 64*j l := 8*j IF k[j] dst[l+7:l] := Truncate8(a[i+63:i]) ELSE dst[l+7:l] := src[l+7:l] FI ENDFOR dst[MAX:16] := 0
Integer AVX512VL AVX512F Convert Store Convert packed 64-bit integers in "a" to packed 8-bit integers with truncation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr". FOR j := 0 to 1 i := 64*j l := 8*j IF k[j] MEM[base_addr+l+7:base_addr+l] := Truncate8(a[i+63:i]) FI ENDFOR
Integer AVX512VL AVX512F Convert Convert packed 64-bit integers in "a" to packed 8-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 1 i := 64*j l := 8*j IF k[j] dst[l+7:l] := Truncate8(a[i+63:i]) ELSE dst[l+7:l] := 0 FI ENDFOR dst[MAX:16] := 0
Integer AVX512VL AVX512F Convert Convert packed 64-bit integers in "a" to packed 32-bit integers with truncation, and store the results in "dst". FOR j := 0 to 3 i := 64*j k := 32*j dst[k+31:k] := Truncate32(a[i+63:i]) ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Convert Convert packed 64-bit integers in "a" to packed 32-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := 64*j l := 32*j IF k[j] dst[l+31:l] := Truncate32(a[i+63:i]) ELSE dst[l+31:l] := src[l+31:l] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Convert Store Convert packed 64-bit integers in "a" to packed 32-bit integers with truncation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr". FOR j := 0 to 3 i := 64*j l := 32*j IF k[j] MEM[base_addr+l+31:base_addr+l] := Truncate32(a[i+63:i]) FI ENDFOR
Integer AVX512VL AVX512F Convert Convert packed 64-bit integers in "a" to packed 32-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := 64*j l := 32*j IF k[j] dst[l+31:l] := Truncate32(a[i+63:i]) ELSE dst[l+31:l] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Convert Convert packed 64-bit integers in "a" to packed 32-bit integers with truncation, and store the results in "dst". FOR j := 0 to 1 i := 64*j k := 32*j dst[k+31:k] := Truncate32(a[i+63:i]) ENDFOR dst[MAX:64] := 0
Integer AVX512VL AVX512F Convert Convert packed 64-bit integers in "a" to packed 32-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 1 i := 64*j l := 32*j IF k[j] dst[l+31:l] := Truncate32(a[i+63:i]) ELSE dst[l+31:l] := src[l+31:l] FI ENDFOR dst[MAX:64] := 0
Integer AVX512VL AVX512F Convert Store Convert packed 64-bit integers in "a" to packed 32-bit integers with truncation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr". FOR j := 0 to 1 i := 64*j l := 32*j IF k[j] MEM[base_addr+l+31:base_addr+l] := Truncate32(a[i+63:i]) FI ENDFOR
Integer AVX512VL AVX512F Convert Convert packed 64-bit integers in "a" to packed 32-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 1 i := 64*j l := 32*j IF k[j] dst[l+31:l] := Truncate32(a[i+63:i]) ELSE dst[l+31:l] := 0 FI ENDFOR dst[MAX:64] := 0
Integer AVX512VL AVX512F Convert Convert packed 64-bit integers in "a" to packed 16-bit integers with truncation, and store the results in "dst". FOR j := 0 to 3 i := 64*j k := 16*j dst[k+15:k] := Truncate16(a[i+63:i]) ENDFOR dst[MAX:64] := 0
Integer AVX512VL AVX512F Convert Convert packed 64-bit integers in "a" to packed 16-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := 64*j l := 16*j IF k[j] dst[l+15:l] := Truncate16(a[i+63:i]) ELSE dst[l+15:l] := src[l+15:l] FI ENDFOR dst[MAX:64] := 0
Integer AVX512VL AVX512F Convert Store Convert packed 64-bit integers in "a" to packed 16-bit integers with truncation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr". FOR j := 0 to 3 i := 64*j l := 16*j IF k[j] MEM[base_addr+l+15:base_addr+l] := Truncate16(a[i+63:i]) FI ENDFOR
Integer AVX512VL AVX512F Convert Convert packed 64-bit integers in "a" to packed 16-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := 64*j l := 16*j IF k[j] dst[l+15:l] := Truncate16(a[i+63:i]) ELSE dst[l+15:l] := 0 FI ENDFOR dst[MAX:64] := 0
Integer AVX512VL AVX512F Convert Convert packed 64-bit integers in "a" to packed 16-bit integers with truncation, and store the results in "dst". FOR j := 0 to 1 i := 64*j k := 16*j dst[k+15:k] := Truncate16(a[i+63:i]) ENDFOR dst[MAX:32] := 0
Integer AVX512VL AVX512F Convert Convert packed 64-bit integers in "a" to packed 16-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 1 i := 64*j l := 16*j IF k[j] dst[l+15:l] := Truncate16(a[i+63:i]) ELSE dst[l+15:l] := src[l+15:l] FI ENDFOR dst[MAX:32] := 0
Integer AVX512VL AVX512F Convert Store Convert packed 64-bit integers in "a" to packed 16-bit integers with truncation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr". FOR j := 0 to 1 i := 64*j l := 16*j IF k[j] MEM[base_addr+l+15:base_addr+l] := Truncate16(a[i+63:i]) FI ENDFOR
Integer AVX512VL AVX512F Convert Convert packed 64-bit integers in "a" to packed 16-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 1 i := 64*j l := 16*j IF k[j] dst[l+15:l] := Truncate16(a[i+63:i]) ELSE dst[l+15:l] := 0 FI ENDFOR dst[MAX:32] := 0
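The masked truncating stores above write each active result at its own fixed slot and leave the memory of inactive slots untouched. A scalar sketch in C for the 64-bit to 16-bit case follows; the function name `model_mask_cvtepi64_storeu_epi16` and its signature are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of the masked truncating store.
   Lane j is written at byte offset 2*j only when bit j of k is set. */
void model_mask_cvtepi64_storeu_epi16(void *base_addr, uint8_t k,
                                      const uint64_t *a, int lanes)
{
    assert(lanes == 2 || lanes == 4);   /* 128-bit or 256-bit source */
    unsigned char *p = (unsigned char *)base_addr;
    for (int j = 0; j < lanes; j++) {
        if ((k >> j) & 1) {
            uint16_t t = (uint16_t)(a[j] & 0xFFFF);  /* Truncate16 */
            memcpy(p + 2 * j, &t, sizeof t);         /* unaligned-safe store */
        }
    }
}
```

Unlike a compress store, results are not packed together: a skipped lane leaves a gap whose previous memory contents are preserved.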
Integer AVX512VL AVX512F Convert Convert packed signed 32-bit integers in "a" to packed 8-bit integers with signed saturation, and store the results in "dst". FOR j := 0 to 7 i := 32*j k := 8*j dst[k+7:k] := Saturate8(a[i+31:i]) ENDFOR dst[MAX:64] := 0
Integer AVX512VL AVX512F Convert Convert packed signed 32-bit integers in "a" to packed 8-bit integers with signed saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := 32*j l := 8*j IF k[j] dst[l+7:l] := Saturate8(a[i+31:i]) ELSE dst[l+7:l] := src[l+7:l] FI ENDFOR dst[MAX:64] := 0
Integer AVX512VL AVX512F Convert Store Convert packed signed 32-bit integers in "a" to packed 8-bit integers with signed saturation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr". FOR j := 0 to 7 i := 32*j l := 8*j IF k[j] MEM[base_addr+l+7:base_addr+l] := Saturate8(a[i+31:i]) FI ENDFOR
Integer AVX512VL AVX512F Convert Convert packed signed 32-bit integers in "a" to packed 8-bit integers with signed saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := 32*j l := 8*j IF k[j] dst[l+7:l] := Saturate8(a[i+31:i]) ELSE dst[l+7:l] := 0 FI ENDFOR dst[MAX:64] := 0
Integer AVX512VL AVX512F Convert Convert packed signed 32-bit integers in "a" to packed 8-bit integers with signed saturation, and store the results in "dst". FOR j := 0 to 3 i := 32*j k := 8*j dst[k+7:k] := Saturate8(a[i+31:i]) ENDFOR dst[MAX:32] := 0
Integer AVX512VL AVX512F Convert Convert packed signed 32-bit integers in "a" to packed 8-bit integers with signed saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := 32*j l := 8*j IF k[j] dst[l+7:l] := Saturate8(a[i+31:i]) ELSE dst[l+7:l] := src[l+7:l] FI ENDFOR dst[MAX:32] := 0
Integer AVX512VL AVX512F Convert Store Convert packed signed 32-bit integers in "a" to packed 8-bit integers with signed saturation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr". FOR j := 0 to 3 i := 32*j l := 8*j IF k[j] MEM[base_addr+l+7:base_addr+l] := Saturate8(a[i+31:i]) FI ENDFOR
Integer AVX512VL AVX512F Convert Convert packed signed 32-bit integers in "a" to packed 8-bit integers with signed saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := 32*j l := 8*j IF k[j] dst[l+7:l] := Saturate8(a[i+31:i]) ELSE dst[l+7:l] := 0 FI ENDFOR dst[MAX:32] := 0
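The eight entries above differ only in lane count and in how inactive lanes are handled (unmasked, writemask, zeromask). A minimal scalar sketch of that behaviour follows; the names `saturate_signed` and `narrow_signed` are hypothetical helpers for illustration, not intrinsics, and lanes are modelled as Python integers rather than packed register bits.

```python
def saturate_signed(value, bits):
    """Clamp a signed integer to the range of a signed `bits`-wide integer."""
    lo = -(1 << (bits - 1))
    hi = (1 << (bits - 1)) - 1
    return max(lo, min(hi, value))


def narrow_signed(a, bits, k=None, src=None):
    """Model a saturating narrow over a list of signed lanes.

    k   -- mask as an int (bit j controls lane j); None means unmasked
    src -- fallback lanes (writemask behaviour); None means zeromask
    """
    out = []
    for j, lane in enumerate(a):
        if k is None or (k >> j) & 1:
            out.append(saturate_signed(lane, bits))  # active lane
        elif src is not None:
            out.append(src[j])                       # writemask: copy from src
        else:
            out.append(0)                            # zeromask: clear the lane
    return out
```

Passing `k=None` corresponds to the unmasked entries, `k` with `src` to the writemask entries, and `k` without `src` to the zeromask entries. The upper-bit clearing (`dst[MAX:n] := 0`) is not modelled.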
Integer AVX512VL AVX512F Convert Convert packed signed 32-bit integers in "a" to packed 16-bit integers with signed saturation, and store the results in "dst". FOR j := 0 to 7 i := 32*j k := 16*j dst[k+15:k] := Saturate16(a[i+31:i]) ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Convert Convert packed signed 32-bit integers in "a" to packed 16-bit integers with signed saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := 32*j l := 16*j IF k[j] dst[l+15:l] := Saturate16(a[i+31:i]) ELSE dst[l+15:l] := src[l+15:l] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Convert Store Convert packed signed 32-bit integers in "a" to packed 16-bit integers with signed saturation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr". FOR j := 0 to 7 i := 32*j l := 16*j IF k[j] MEM[base_addr+l+15:base_addr+l] := Saturate16(a[i+31:i]) FI ENDFOR
Integer AVX512VL AVX512F Convert Convert packed signed 32-bit integers in "a" to packed 16-bit integers with signed saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := 32*j l := 16*j IF k[j] dst[l+15:l] := Saturate16(a[i+31:i]) ELSE dst[l+15:l] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Convert Convert packed signed 32-bit integers in "a" to packed 16-bit integers with signed saturation, and store the results in "dst". FOR j := 0 to 3 i := 32*j k := 16*j dst[k+15:k] := Saturate16(a[i+31:i]) ENDFOR dst[MAX:64] := 0
Integer AVX512VL AVX512F Convert Convert packed signed 32-bit integers in "a" to packed 16-bit integers with signed saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := 32*j l := 16*j IF k[j] dst[l+15:l] := Saturate16(a[i+31:i]) ELSE dst[l+15:l] := src[l+15:l] FI ENDFOR dst[MAX:64] := 0
Integer AVX512VL AVX512F Convert Store Convert packed signed 32-bit integers in "a" to packed 16-bit integers with signed saturation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr". FOR j := 0 to 3 i := 32*j l := 16*j IF k[j] MEM[base_addr+l+15:base_addr+l] := Saturate16(a[i+31:i]) FI ENDFOR
Integer AVX512VL AVX512F Convert Convert packed signed 32-bit integers in "a" to packed 16-bit integers with signed saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := 32*j l := 16*j IF k[j] dst[l+15:l] := Saturate16(a[i+31:i]) ELSE dst[l+15:l] := 0 FI ENDFOR dst[MAX:64] := 0
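The "Convert Store" entries write only the active lanes to memory and leave the bytes of inactive lanes untouched. A rough sketch of the 32-bit to 16-bit store, assuming memory is modelled as a `bytearray` and using byte offsets where the pseudocode's `l` is a bit offset; `masked_store_sat16` is a hypothetical name for illustration.

```python
import struct


def masked_store_sat16(mem, base_addr, a, k):
    """Store saturated 16-bit results for active lanes only.

    mem       -- bytearray standing in for memory
    base_addr -- byte offset of lane 0 within mem
    a         -- list of signed source lanes
    k         -- mask as an int (bit j controls lane j)
    """
    for j, lane in enumerate(a):
        if (k >> j) & 1:
            clamped = max(-32768, min(32767, lane))
            # x86 is little-endian; each result occupies 2 bytes.
            struct.pack_into("<h", mem, base_addr + 2 * j, clamped)
        # Inactive lanes: the corresponding memory bytes are not written.
```

For example, with `mem` pre-filled with `0xFF` and mask `0b0101`, only bytes 0 to 1 and 4 to 5 change.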
Integer AVX512VL AVX512F Convert Convert packed signed 64-bit integers in "a" to packed 8-bit integers with signed saturation, and store the results in "dst". FOR j := 0 to 3 i := 64*j k := 8*j dst[k+7:k] := Saturate8(a[i+63:i]) ENDFOR dst[MAX:32] := 0
Integer AVX512VL AVX512F Convert Convert packed signed 64-bit integers in "a" to packed 8-bit integers with signed saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := 64*j l := 8*j IF k[j] dst[l+7:l] := Saturate8(a[i+63:i]) ELSE dst[l+7:l] := src[l+7:l] FI ENDFOR dst[MAX:32] := 0
Integer AVX512VL AVX512F Convert Store Convert packed signed 64-bit integers in "a" to packed 8-bit integers with signed saturation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr". FOR j := 0 to 3 i := 64*j l := 8*j IF k[j] MEM[base_addr+l+7:base_addr+l] := Saturate8(a[i+63:i]) FI ENDFOR
Integer AVX512VL AVX512F Convert Convert packed signed 64-bit integers in "a" to packed 8-bit integers with signed saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := 64*j l := 8*j IF k[j] dst[l+7:l] := Saturate8(a[i+63:i]) ELSE dst[l+7:l] := 0 FI ENDFOR dst[MAX:32] := 0
Integer AVX512VL AVX512F Convert Convert packed signed 64-bit integers in "a" to packed 8-bit integers with signed saturation, and store the results in "dst". FOR j := 0 to 1 i := 64*j k := 8*j dst[k+7:k] := Saturate8(a[i+63:i]) ENDFOR dst[MAX:16] := 0
Integer AVX512VL AVX512F Convert Convert packed signed 64-bit integers in "a" to packed 8-bit integers with signed saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 1 i := 64*j l := 8*j IF k[j] dst[l+7:l] := Saturate8(a[i+63:i]) ELSE dst[l+7:l] := src[l+7:l] FI ENDFOR dst[MAX:16] := 0
Integer AVX512VL AVX512F Convert Store Convert packed signed 64-bit integers in "a" to packed 8-bit integers with signed saturation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr". FOR j := 0 to 1 i := 64*j l := 8*j IF k[j] MEM[base_addr+l+7:base_addr+l] := Saturate8(a[i+63:i]) FI ENDFOR
Integer AVX512VL AVX512F Convert Convert packed signed 64-bit integers in "a" to packed 8-bit integers with signed saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 1 i := 64*j l := 8*j IF k[j] dst[l+7:l] := Saturate8(a[i+63:i]) ELSE dst[l+7:l] := 0 FI ENDFOR dst[MAX:16] := 0
Integer AVX512VL AVX512F Convert Convert packed signed 64-bit integers in "a" to packed 32-bit integers with signed saturation, and store the results in "dst". FOR j := 0 to 3 i := 64*j k := 32*j dst[k+31:k] := Saturate32(a[i+63:i]) ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Convert Convert packed signed 64-bit integers in "a" to packed 32-bit integers with signed saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := 64*j l := 32*j IF k[j] dst[l+31:l] := Saturate32(a[i+63:i]) ELSE dst[l+31:l] := src[l+31:l] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Convert Store Convert packed signed 64-bit integers in "a" to packed 32-bit integers with signed saturation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr". FOR j := 0 to 3 i := 64*j l := 32*j IF k[j] MEM[base_addr+l+31:base_addr+l] := Saturate32(a[i+63:i]) FI ENDFOR
Integer AVX512VL AVX512F Convert Convert packed signed 64-bit integers in "a" to packed 32-bit integers with signed saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := 64*j l := 32*j IF k[j] dst[l+31:l] := Saturate32(a[i+63:i]) ELSE dst[l+31:l] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Convert Convert packed signed 64-bit integers in "a" to packed 32-bit integers with signed saturation, and store the results in "dst". FOR j := 0 to 1 i := 64*j k := 32*j dst[k+31:k] := Saturate32(a[i+63:i]) ENDFOR dst[MAX:64] := 0
Integer AVX512VL AVX512F Convert Convert packed signed 64-bit integers in "a" to packed 32-bit integers with signed saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 1 i := 64*j l := 32*j IF k[j] dst[l+31:l] := Saturate32(a[i+63:i]) ELSE dst[l+31:l] := src[l+31:l] FI ENDFOR dst[MAX:64] := 0
Integer AVX512VL AVX512F Convert Store Convert packed signed 64-bit integers in "a" to packed 32-bit integers with signed saturation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr". FOR j := 0 to 1 i := 64*j l := 32*j IF k[j] MEM[base_addr+l+31:base_addr+l] := Saturate32(a[i+63:i]) FI ENDFOR
Integer AVX512VL AVX512F Convert Convert packed signed 64-bit integers in "a" to packed 32-bit integers with signed saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 1 i := 64*j l := 32*j IF k[j] dst[l+31:l] := Saturate32(a[i+63:i]) ELSE dst[l+31:l] := 0 FI ENDFOR dst[MAX:64] := 0
Integer AVX512VL AVX512F Convert Convert packed signed 64-bit integers in "a" to packed 16-bit integers with signed saturation, and store the results in "dst". FOR j := 0 to 3 i := 64*j k := 16*j dst[k+15:k] := Saturate16(a[i+63:i]) ENDFOR dst[MAX:64] := 0
Integer AVX512VL AVX512F Convert Convert packed signed 64-bit integers in "a" to packed 16-bit integers with signed saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := 64*j l := 16*j IF k[j] dst[l+15:l] := Saturate16(a[i+63:i]) ELSE dst[l+15:l] := src[l+15:l] FI ENDFOR dst[MAX:64] := 0
Integer AVX512VL AVX512F Convert Store Convert packed signed 64-bit integers in "a" to packed 16-bit integers with signed saturation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr". FOR j := 0 to 3 i := 64*j l := 16*j IF k[j] MEM[base_addr+l+15:base_addr+l] := Saturate16(a[i+63:i]) FI ENDFOR
Integer AVX512VL AVX512F Convert Convert packed signed 64-bit integers in "a" to packed 16-bit integers with signed saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := 64*j l := 16*j IF k[j] dst[l+15:l] := Saturate16(a[i+63:i]) ELSE dst[l+15:l] := 0 FI ENDFOR dst[MAX:64] := 0
Integer AVX512VL AVX512F Convert Convert packed signed 64-bit integers in "a" to packed 16-bit integers with signed saturation, and store the results in "dst". FOR j := 0 to 1 i := 64*j k := 16*j dst[k+15:k] := Saturate16(a[i+63:i]) ENDFOR dst[MAX:32] := 0
Integer AVX512VL AVX512F Convert Convert packed signed 64-bit integers in "a" to packed 16-bit integers with signed saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 1 i := 64*j l := 16*j IF k[j] dst[l+15:l] := Saturate16(a[i+63:i]) ELSE dst[l+15:l] := src[l+15:l] FI ENDFOR dst[MAX:32] := 0
Integer AVX512VL AVX512F Convert Store Convert packed signed 64-bit integers in "a" to packed 16-bit integers with signed saturation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr". FOR j := 0 to 1 i := 64*j l := 16*j IF k[j] MEM[base_addr+l+15:base_addr+l] := Saturate16(a[i+63:i]) FI ENDFOR
Integer AVX512VL AVX512F Convert Convert packed signed 64-bit integers in "a" to packed 16-bit integers with signed saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 1 i := 64*j l := 16*j IF k[j] dst[l+15:l] := Saturate16(a[i+63:i]) ELSE dst[l+15:l] := 0 FI ENDFOR dst[MAX:32] := 0
Integer AVX512VL AVX512F Convert Sign extend packed 8-bit integers in the low 8 bytes of "a" to packed 32-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := 32*j l := 8*j IF k[j] dst[i+31:i] := SignExtend32(a[l+7:l]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Convert Sign extend packed 8-bit integers in the low 8 bytes of "a" to packed 32-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := 32*j l := 8*j IF k[j] dst[i+31:i] := SignExtend32(a[l+7:l]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Convert Sign extend packed 8-bit integers in the low 4 bytes of "a" to packed 32-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := 32*j l := 8*j IF k[j] dst[i+31:i] := SignExtend32(a[l+7:l]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Convert Sign extend packed 8-bit integers in the low 4 bytes of "a" to packed 32-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := 32*j l := 8*j IF k[j] dst[i+31:i] := SignExtend32(a[l+7:l]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Convert Sign extend packed 8-bit integers in the low 4 bytes of "a" to packed 64-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := 64*j l := 8*j IF k[j] dst[i+63:i] := SignExtend64(a[l+7:l]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Convert Sign extend packed 8-bit integers in the low 4 bytes of "a" to packed 64-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := 64*j l := 8*j IF k[j] dst[i+63:i] := SignExtend64(a[l+7:l]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Convert Sign extend packed 8-bit integers in the low 2 bytes of "a" to packed 64-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 1 i := 64*j l := 8*j IF k[j] dst[i+63:i] := SignExtend64(a[l+7:l]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Convert Sign extend packed 8-bit integers in the low 2 bytes of "a" to packed 64-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 1 i := 64*j l := 8*j IF k[j] dst[i+63:i] := SignExtend64(a[l+7:l]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Convert Sign extend packed 32-bit integers in "a" to packed 64-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := 64*j l := 32*j IF k[j] dst[i+63:i] := SignExtend64(a[l+31:l]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Convert Sign extend packed 32-bit integers in "a" to packed 64-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := 64*j l := 32*j IF k[j] dst[i+63:i] := SignExtend64(a[l+31:l]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Convert Sign extend packed 32-bit integers in "a" to packed 64-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 1 i := 64*j l := 32*j IF k[j] dst[i+63:i] := SignExtend64(a[l+31:l]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Convert Sign extend packed 32-bit integers in "a" to packed 64-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 1 i := 64*j l := 32*j IF k[j] dst[i+63:i] := SignExtend64(a[l+31:l]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Convert Sign extend packed 16-bit integers in "a" to packed 32-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := 32*j l := 16*j IF k[j] dst[i+31:i] := SignExtend32(a[l+15:l]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Convert Sign extend packed 16-bit integers in "a" to packed 32-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := 32*j l := 16*j IF k[j] dst[i+31:i] := SignExtend32(a[l+15:l]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Convert Sign extend packed 16-bit integers in "a" to packed 32-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := 32*j l := 16*j IF k[j] dst[i+31:i] := SignExtend32(a[l+15:l]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Convert Sign extend packed 16-bit integers in "a" to packed 32-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := 32*j l := 16*j IF k[j] dst[i+31:i] := SignExtend32(a[l+15:l]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Convert Sign extend packed 16-bit integers in the low 8 bytes of "a" to packed 64-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := 64*j l := 16*j IF k[j] dst[i+63:i] := SignExtend64(a[l+15:l]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Convert Sign extend packed 16-bit integers in the low 8 bytes of "a" to packed 64-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := 64*j l := 16*j IF k[j] dst[i+63:i] := SignExtend64(a[l+15:l]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Convert Sign extend packed 16-bit integers in the low 4 bytes of "a" to packed 64-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 1 i := 64*j l := 16*j IF k[j] dst[i+63:i] := SignExtend64(a[l+15:l]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Convert Sign extend packed 16-bit integers in the low 4 bytes of "a" to packed 64-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 1 i := 64*j l := 16*j IF k[j] dst[i+63:i] := SignExtend64(a[l+15:l]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:128] := 0
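The sign-extension entries above read only the low bytes of "a" (for instance, the low 4 bytes when widening four 8-bit lanes). A small sketch of that lane selection, assuming the source register is given as little-endian bytes; `sign_extend_lanes` is a hypothetical helper, and the destination width is implicit because Python integers are unbounded.

```python
def sign_extend_lanes(a_bytes, src_bits, count, k, src=None):
    """Sign extend `count` lanes of `src_bits` each from the low bytes of a_bytes.

    k   -- mask as an int (bit j controls lane j)
    src -- fallback lanes (writemask behaviour); None means zeromask
    """
    width = src_bits // 8
    out = []
    for j in range(count):
        if (k >> j) & 1:
            chunk = a_bytes[j * width:(j + 1) * width]
            # signed=True reinterprets the top bit as the sign bit.
            out.append(int.from_bytes(chunk, "little", signed=True))
        elif src is not None:
            out.append(src[j])
        else:
            out.append(0)
    return out
```

Bytes of `a_bytes` beyond `count * width` are never read, which matches the "low N bytes" wording in the descriptions.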
Integer AVX512VL AVX512F Convert Convert packed unsigned 32-bit integers in "a" to packed unsigned 8-bit integers with unsigned saturation, and store the results in "dst". FOR j := 0 to 7 i := 32*j k := 8*j dst[k+7:k] := SaturateU8(a[i+31:i]) ENDFOR dst[MAX:64] := 0
Integer AVX512VL AVX512F Convert Convert packed unsigned 32-bit integers in "a" to packed unsigned 8-bit integers with unsigned saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := 32*j l := 8*j IF k[j] dst[l+7:l] := SaturateU8(a[i+31:i]) ELSE dst[l+7:l] := src[l+7:l] FI ENDFOR dst[MAX:64] := 0
Integer AVX512VL AVX512F Convert Store Convert packed unsigned 32-bit integers in "a" to packed unsigned 8-bit integers with unsigned saturation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr". FOR j := 0 to 7 i := 32*j l := 8*j IF k[j] MEM[base_addr+l+7:base_addr+l] := SaturateU8(a[i+31:i]) FI ENDFOR
Integer AVX512VL AVX512F Convert Convert packed unsigned 32-bit integers in "a" to packed unsigned 8-bit integers with unsigned saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := 32*j l := 8*j IF k[j] dst[l+7:l] := SaturateU8(a[i+31:i]) ELSE dst[l+7:l] := 0 FI ENDFOR dst[MAX:64] := 0
Integer AVX512VL AVX512F Convert Convert packed unsigned 32-bit integers in "a" to packed unsigned 8-bit integers with unsigned saturation, and store the results in "dst". FOR j := 0 to 3 i := 32*j k := 8*j dst[k+7:k] := SaturateU8(a[i+31:i]) ENDFOR dst[MAX:32] := 0
Integer AVX512VL AVX512F Convert Convert packed unsigned 32-bit integers in "a" to packed unsigned 8-bit integers with unsigned saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := 32*j l := 8*j IF k[j] dst[l+7:l] := SaturateU8(a[i+31:i]) ELSE dst[l+7:l] := src[l+7:l] FI ENDFOR dst[MAX:32] := 0
Integer AVX512VL AVX512F Convert Store Convert packed unsigned 32-bit integers in "a" to packed unsigned 8-bit integers with unsigned saturation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr". FOR j := 0 to 3 i := 32*j l := 8*j IF k[j] MEM[base_addr+l+7:base_addr+l] := SaturateU8(a[i+31:i]) FI ENDFOR
Integer AVX512VL AVX512F Convert Convert packed unsigned 32-bit integers in "a" to packed unsigned 8-bit integers with unsigned saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := 32*j l := 8*j IF k[j] dst[l+7:l] := SaturateU8(a[i+31:i]) ELSE dst[l+7:l] := 0 FI ENDFOR dst[MAX:32] := 0
Integer AVX512VL AVX512F Convert Convert packed unsigned 32-bit integers in "a" to packed unsigned 16-bit integers with unsigned saturation, and store the results in "dst". FOR j := 0 to 7 i := 32*j k := 16*j dst[k+15:k] := SaturateU16(a[i+31:i]) ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Convert Convert packed unsigned 32-bit integers in "a" to packed unsigned 16-bit integers with unsigned saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := 32*j l := 16*j IF k[j] dst[l+15:l] := SaturateU16(a[i+31:i]) ELSE dst[l+15:l] := src[l+15:l] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Convert Store Convert packed unsigned 32-bit integers in "a" to packed unsigned 16-bit integers with unsigned saturation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr". FOR j := 0 to 7 i := 32*j l := 16*j IF k[j] MEM[base_addr+l+15:base_addr+l] := SaturateU16(a[i+31:i]) FI ENDFOR
Integer AVX512VL AVX512F Convert Convert packed unsigned 32-bit integers in "a" to packed unsigned 16-bit integers with unsigned saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := 32*j l := 16*j IF k[j] dst[l+15:l] := SaturateU16(a[i+31:i]) ELSE dst[l+15:l] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Convert Convert packed unsigned 32-bit integers in "a" to packed unsigned 16-bit integers with unsigned saturation, and store the results in "dst". FOR j := 0 to 3 i := 32*j k := 16*j dst[k+15:k] := SaturateU16(a[i+31:i]) ENDFOR dst[MAX:64] := 0
Integer AVX512VL AVX512F Convert Convert packed unsigned 32-bit integers in "a" to packed unsigned 16-bit integers with unsigned saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := 32*j l := 16*j IF k[j] dst[l+15:l] := SaturateU16(a[i+31:i]) ELSE dst[l+15:l] := src[l+15:l] FI ENDFOR dst[MAX:64] := 0
Integer AVX512VL AVX512F Convert Store Convert packed unsigned 32-bit integers in "a" to packed unsigned 16-bit integers with unsigned saturation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr". FOR j := 0 to 3 i := 32*j l := 16*j IF k[j] MEM[base_addr+l+15:base_addr+l] := SaturateU16(a[i+31:i]) FI ENDFOR
Integer AVX512VL AVX512F Convert Convert packed unsigned 32-bit integers in "a" to packed unsigned 16-bit integers with unsigned saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := 32*j l := 16*j IF k[j] dst[l+15:l] := SaturateU16(a[i+31:i]) ELSE dst[l+15:l] := 0 FI ENDFOR dst[MAX:64] := 0
Integer AVX512VL AVX512F Convert Convert packed unsigned 64-bit integers in "a" to packed unsigned 8-bit integers with unsigned saturation, and store the results in "dst". FOR j := 0 to 3 i := 64*j k := 8*j dst[k+7:k] := SaturateU8(a[i+63:i]) ENDFOR dst[MAX:32] := 0
Integer AVX512VL AVX512F Convert Convert packed unsigned 64-bit integers in "a" to packed unsigned 8-bit integers with unsigned saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := 64*j l := 8*j IF k[j] dst[l+7:l] := SaturateU8(a[i+63:i]) ELSE dst[l+7:l] := src[l+7:l] FI ENDFOR dst[MAX:32] := 0
Integer AVX512VL AVX512F Convert Store Convert packed unsigned 64-bit integers in "a" to packed unsigned 8-bit integers with unsigned saturation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr". FOR j := 0 to 3 i := 64*j l := 8*j IF k[j] MEM[base_addr+l+7:base_addr+l] := SaturateU8(a[i+63:i]) FI ENDFOR
Integer AVX512VL AVX512F Convert Convert packed unsigned 64-bit integers in "a" to packed unsigned 8-bit integers with unsigned saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := 64*j l := 8*j IF k[j] dst[l+7:l] := SaturateU8(a[i+63:i]) ELSE dst[l+7:l] := 0 FI ENDFOR dst[MAX:32] := 0
Integer AVX512VL AVX512F Convert Convert packed unsigned 64-bit integers in "a" to packed unsigned 8-bit integers with unsigned saturation, and store the results in "dst". FOR j := 0 to 1 i := 64*j k := 8*j dst[k+7:k] := SaturateU8(a[i+63:i]) ENDFOR dst[MAX:16] := 0
Integer AVX512VL AVX512F Convert Convert packed unsigned 64-bit integers in "a" to packed unsigned 8-bit integers with unsigned saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 1 i := 64*j l := 8*j IF k[j] dst[l+7:l] := SaturateU8(a[i+63:i]) ELSE dst[l+7:l] := src[l+7:l] FI ENDFOR dst[MAX:16] := 0
Integer AVX512VL AVX512F Convert Store Convert packed unsigned 64-bit integers in "a" to packed unsigned 8-bit integers with unsigned saturation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr". FOR j := 0 to 1 i := 64*j l := 8*j IF k[j] MEM[base_addr+l+7:base_addr+l] := SaturateU8(a[i+63:i]) FI ENDFOR
Integer AVX512VL AVX512F Convert Convert packed unsigned 64-bit integers in "a" to packed unsigned 8-bit integers with unsigned saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 1 i := 64*j l := 8*j IF k[j] dst[l+7:l] := SaturateU8(a[i+63:i]) ELSE dst[l+7:l] := 0 FI ENDFOR dst[MAX:16] := 0
Integer AVX512VL AVX512F Convert Convert packed unsigned 64-bit integers in "a" to packed unsigned 32-bit integers with unsigned saturation, and store the results in "dst". FOR j := 0 to 3 i := 64*j k := 32*j dst[k+31:k] := SaturateU32(a[i+63:i]) ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Convert Convert packed unsigned 64-bit integers in "a" to packed unsigned 32-bit integers with unsigned saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := 64*j l := 32*j IF k[j] dst[l+31:l] := SaturateU32(a[i+63:i]) ELSE dst[l+31:l] := src[l+31:l] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Convert Store Convert packed unsigned 64-bit integers in "a" to packed unsigned 32-bit integers with unsigned saturation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr". FOR j := 0 to 3 i := 64*j l := 32*j IF k[j] MEM[base_addr+l+31:base_addr+l] := SaturateU32(a[i+63:i]) FI ENDFOR
Integer AVX512VL AVX512F Convert Convert packed unsigned 64-bit integers in "a" to packed unsigned 32-bit integers with unsigned saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := 64*j l := 32*j IF k[j] dst[l+31:l] := SaturateU32(a[i+63:i]) ELSE dst[l+31:l] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Convert Convert packed unsigned 64-bit integers in "a" to packed unsigned 32-bit integers with unsigned saturation, and store the results in "dst". FOR j := 0 to 1 i := 64*j k := 32*j dst[k+31:k] := SaturateU32(a[i+63:i]) ENDFOR dst[MAX:64] := 0
Integer AVX512VL AVX512F Convert Convert packed unsigned 64-bit integers in "a" to packed unsigned 32-bit integers with unsigned saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 1 i := 64*j l := 32*j IF k[j] dst[l+31:l] := SaturateU32(a[i+63:i]) ELSE dst[l+31:l] := src[l+31:l] FI ENDFOR dst[MAX:64] := 0
Integer AVX512VL AVX512F Convert Store Convert packed unsigned 64-bit integers in "a" to packed unsigned 32-bit integers with unsigned saturation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr". FOR j := 0 to 1 i := 64*j l := 32*j IF k[j] MEM[base_addr+l+31:base_addr+l] := SaturateU32(a[i+63:i]) FI ENDFOR
Integer AVX512VL AVX512F Convert Convert packed unsigned 64-bit integers in "a" to packed unsigned 32-bit integers with unsigned saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 1 i := 64*j l := 32*j IF k[j] dst[l+31:l] := SaturateU32(a[i+63:i]) ELSE dst[l+31:l] := 0 FI ENDFOR dst[MAX:64] := 0
Integer AVX512VL AVX512F Convert Convert packed unsigned 64-bit integers in "a" to packed unsigned 16-bit integers with unsigned saturation, and store the results in "dst". FOR j := 0 to 3 i := 64*j k := 16*j dst[k+15:k] := SaturateU16(a[i+63:i]) ENDFOR dst[MAX:64] := 0
Integer AVX512VL AVX512F Convert Convert packed unsigned 64-bit integers in "a" to packed unsigned 16-bit integers with unsigned saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := 64*j l := 16*j IF k[j] dst[l+15:l] := SaturateU16(a[i+63:i]) ELSE dst[l+15:l] := src[l+15:l] FI ENDFOR dst[MAX:64] := 0
Integer AVX512VL AVX512F Convert Store Convert packed unsigned 64-bit integers in "a" to packed unsigned 16-bit integers with unsigned saturation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr". FOR j := 0 to 3 i := 64*j l := 16*j IF k[j] MEM[base_addr+l+15:base_addr+l] := SaturateU16(a[i+63:i]) FI ENDFOR
Integer AVX512VL AVX512F Convert Convert packed unsigned 64-bit integers in "a" to packed unsigned 16-bit integers with unsigned saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := 64*j l := 16*j IF k[j] dst[l+15:l] := SaturateU16(a[i+63:i]) ELSE dst[l+15:l] := 0 FI ENDFOR dst[MAX:64] := 0
Integer AVX512VL AVX512F Convert Convert packed unsigned 64-bit integers in "a" to packed unsigned 16-bit integers with unsigned saturation, and store the results in "dst". FOR j := 0 to 1 i := 64*j k := 16*j dst[k+15:k] := SaturateU16(a[i+63:i]) ENDFOR dst[MAX:32] := 0
Integer AVX512VL AVX512F Convert Convert packed unsigned 64-bit integers in "a" to packed unsigned 16-bit integers with unsigned saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 1 i := 64*j l := 16*j IF k[j] dst[l+15:l] := SaturateU16(a[i+63:i]) ELSE dst[l+15:l] := src[l+15:l] FI ENDFOR dst[MAX:32] := 0
Integer AVX512VL AVX512F Convert Store Convert packed unsigned 64-bit integers in "a" to packed unsigned 16-bit integers with unsigned saturation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr". FOR j := 0 to 1 i := 64*j l := 16*j IF k[j] MEM[base_addr+l+15:base_addr+l] := SaturateU16(a[i+63:i]) FI ENDFOR
Integer AVX512VL AVX512F Convert Convert packed unsigned 64-bit integers in "a" to packed unsigned 16-bit integers with unsigned saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 1 i := 64*j l := 16*j IF k[j] dst[l+15:l] := SaturateU16(a[i+63:i]) ELSE dst[l+15:l] := 0 FI ENDFOR dst[MAX:32] := 0
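The saturating narrowing and masked-store behavior in the entries above can be sketched in portable scalar C. This is an illustrative model of the pseudocode only, not the intrinsics themselves: the names `model_saturate_u16` and `model_mask_store_sat_u16` are hypothetical, and vector lanes are represented as plain array elements.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical helper mirroring SaturateU16: clamp to the unsigned 16-bit range. */
static uint16_t model_saturate_u16(uint64_t x) {
    return x > UINT16_MAX ? (uint16_t)UINT16_MAX : (uint16_t)x;
}

/* Scalar model of the masked-store form: only lanes whose bit is set in k
   are written to memory; inactive lanes leave memory untouched. */
static void model_mask_store_sat_u16(uint16_t *base, uint8_t k,
                                     const uint64_t *a, size_t lanes) {
    for (size_t j = 0; j < lanes; j++) {
        if ((k >> j) & 1u) {
            base[j] = model_saturate_u16(a[j]);
        }
    }
}
```

With a mask of 0x0B (lanes 0, 1 and 3 active), lane 2 of the destination memory keeps whatever value it held before the call.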
Integer AVX512VL AVX512F Convert Zero extend packed unsigned 8-bit integers in the low 8 bytes of "a" to packed 32-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := 32*j l := 8*j IF k[j] dst[i+31:i] := ZeroExtend32(a[l+7:l]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Convert Zero extend packed unsigned 8-bit integers in the low 8 bytes of "a" to packed 32-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := 32*j l := 8*j IF k[j] dst[i+31:i] := ZeroExtend32(a[l+7:l]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Convert Zero extend packed unsigned 8-bit integers in the low 4 bytes of "a" to packed 32-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := 32*j l := 8*j IF k[j] dst[i+31:i] := ZeroExtend32(a[l+7:l]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Convert Zero extend packed unsigned 8-bit integers in the low 4 bytes of "a" to packed 32-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := 32*j l := 8*j IF k[j] dst[i+31:i] := ZeroExtend32(a[l+7:l]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Convert Zero extend packed unsigned 8-bit integers in the low 4 bytes of "a" to packed 64-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := 64*j l := 8*j IF k[j] dst[i+63:i] := ZeroExtend64(a[l+7:l]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Convert Zero extend packed unsigned 8-bit integers in the low 4 bytes of "a" to packed 64-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := 64*j l := 8*j IF k[j] dst[i+63:i] := ZeroExtend64(a[l+7:l]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Convert Zero extend packed unsigned 8-bit integers in the low 2 bytes of "a" to packed 64-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 1 i := 64*j l := 8*j IF k[j] dst[i+63:i] := ZeroExtend64(a[l+7:l]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Convert Zero extend packed unsigned 8-bit integers in the low 2 bytes of "a" to packed 64-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 1 i := 64*j l := 8*j IF k[j] dst[i+63:i] := ZeroExtend64(a[l+7:l]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Convert Zero extend packed unsigned 32-bit integers in "a" to packed 64-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := 64*j l := 32*j IF k[j] dst[i+63:i] := ZeroExtend64(a[l+31:l]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Convert Zero extend packed unsigned 32-bit integers in "a" to packed 64-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := 64*j l := 32*j IF k[j] dst[i+63:i] := ZeroExtend64(a[l+31:l]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Convert Zero extend packed unsigned 32-bit integers in "a" to packed 64-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 1 i := 64*j l := 32*j IF k[j] dst[i+63:i] := ZeroExtend64(a[l+31:l]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Convert Zero extend packed unsigned 32-bit integers in "a" to packed 64-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 1 i := 64*j l := 32*j IF k[j] dst[i+63:i] := ZeroExtend64(a[l+31:l]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Convert Zero extend packed unsigned 16-bit integers in "a" to packed 32-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := 32*j l := 16*j IF k[j] dst[i+31:i] := ZeroExtend32(a[l+15:l]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Convert Zero extend packed unsigned 16-bit integers in "a" to packed 32-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := 32*j l := 16*j IF k[j] dst[i+31:i] := ZeroExtend32(a[l+15:l]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Convert Zero extend packed unsigned 16-bit integers in "a" to packed 32-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := 32*j l := 16*j IF k[j] dst[i+31:i] := ZeroExtend32(a[l+15:l]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Convert Zero extend packed unsigned 16-bit integers in "a" to packed 32-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := 32*j l := 16*j IF k[j] dst[i+31:i] := ZeroExtend32(a[l+15:l]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Convert Zero extend packed unsigned 16-bit integers in the low 8 bytes of "a" to packed 64-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := 64*j l := 16*j IF k[j] dst[i+63:i] := ZeroExtend64(a[l+15:l]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Convert Zero extend packed unsigned 16-bit integers in the low 8 bytes of "a" to packed 64-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := 64*j l := 16*j IF k[j] dst[i+63:i] := ZeroExtend64(a[l+15:l]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Convert Zero extend packed unsigned 16-bit integers in the low 4 bytes of "a" to packed 64-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 1 i := 64*j l := 16*j IF k[j] dst[i+63:i] := ZeroExtend64(a[l+15:l]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Convert Zero extend packed unsigned 16-bit integers in the low 4 bytes of "a" to packed 64-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 1 i := 64*j l := 16*j IF k[j] dst[i+63:i] := ZeroExtend64(a[l+15:l]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:128] := 0
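The zero-extension entries above differ only in how inactive lanes are filled: the writemask form copies from "src", while the zeromask form writes zero. A minimal scalar sketch for the 8-bit to 32-bit case follows. The function names are hypothetical and lanes are modeled as array elements.

```c
#include <stdint.h>
#include <stddef.h>

/* Writemask form: inactive lanes are copied from src. */
static void model_mask_zext_u8_u32(uint32_t *dst, const uint32_t *src,
                                   uint8_t k, const uint8_t *a, size_t lanes) {
    for (size_t j = 0; j < lanes; j++) {
        dst[j] = ((k >> j) & 1u) ? (uint32_t)a[j] : src[j];
    }
}

/* Zeromask form: inactive lanes are set to zero. */
static void model_maskz_zext_u8_u32(uint32_t *dst, uint8_t k,
                                    const uint8_t *a, size_t lanes) {
    for (size_t j = 0; j < lanes; j++) {
        dst[j] = ((k >> j) & 1u) ? (uint32_t)a[j] : 0u;
    }
}
```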
Integer AVX512VL AVX512F Arithmetic Multiply the low signed 32-bit integers from each packed 64-bit element in "a" and "b", and store the signed 64-bit results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := SignExtend64(a[i+31:i]) * SignExtend64(b[i+31:i]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Arithmetic Multiply the low signed 32-bit integers from each packed 64-bit element in "a" and "b", and store the signed 64-bit results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := SignExtend64(a[i+31:i]) * SignExtend64(b[i+31:i]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Arithmetic Multiply the low signed 32-bit integers from each packed 64-bit element in "a" and "b", and store the signed 64-bit results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := SignExtend64(a[i+31:i]) * SignExtend64(b[i+31:i]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Arithmetic Multiply the low signed 32-bit integers from each packed 64-bit element in "a" and "b", and store the signed 64-bit results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := SignExtend64(a[i+31:i]) * SignExtend64(b[i+31:i]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:128] := 0
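For the signed multiply entries above, only the low 32 bits of each 64-bit element take part, and they are sign-extended before the multiplication. The following scalar sketch models one lane. The name `model_mul_low_i32` is hypothetical, and the conversion from `uint32_t` to `int32_t` assumes the usual two's-complement wrap (implementation-defined in C before C23).

```c
#include <stdint.h>

/* One lane of the signed low-32-bit multiply: the upper 32 bits of each
   input element are ignored, the low halves are sign-extended to 64 bits. */
static int64_t model_mul_low_i32(uint64_t a, uint64_t b) {
    int32_t lo_a = (int32_t)(uint32_t)a; /* assumes two's-complement wrap */
    int32_t lo_b = (int32_t)(uint32_t)b;
    return (int64_t)lo_a * (int64_t)lo_b;
}
```

Because both operands fit in 32 bits, the 64-bit product cannot overflow; the largest magnitude is (-2^31) * (-2^31) = 2^62.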
Integer AVX512VL AVX512F Arithmetic Multiply the packed 32-bit integers in "a" and "b", producing intermediate 64-bit integers, and store the low 32 bits of the intermediate integers in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 IF k[j] tmp[63:0] := a[i+31:i] * b[i+31:i] dst[i+31:i] := tmp[31:0] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Arithmetic Multiply the packed 32-bit integers in "a" and "b", producing intermediate 64-bit integers, and store the low 32 bits of the intermediate integers in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 IF k[j] tmp[63:0] := a[i+31:i] * b[i+31:i] dst[i+31:i] := tmp[31:0] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512F Arithmetic Multiply the packed 32-bit integers in "a" and "b", producing intermediate 64-bit integers, and store the low 32 bits of the intermediate integers in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] tmp[63:0] := a[i+31:i] * b[i+31:i] dst[i+31:i] := tmp[31:0] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512VL AVX512F Arithmetic Multiply the packed 32-bit integers in "a" and "b", producing intermediate 64-bit integers, and store the low 32 bits of the intermediate integers in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 IF k[j] tmp[63:0] := a[i+31:i] * b[i+31:i] dst[i+31:i] := tmp[31:0] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Arithmetic Multiply the packed 32-bit integers in "a" and "b", producing intermediate 64-bit integers, and store the low 32 bits of the intermediate integers in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 IF k[j] tmp[63:0] := a[i+31:i] * b[i+31:i] dst[i+31:i] := tmp[31:0] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:128] := 0
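The "low 32 bits of the intermediate 64-bit product" wording in the entries above can be illustrated with a one-lane scalar sketch. The name `model_mullo_u32` is hypothetical. Unsigned arithmetic is used here because the low 32 bits of the product are the same whether the inputs are treated as signed or unsigned.

```c
#include <stdint.h>

/* One lane: form the full 64-bit product, then keep only bits 31:0. */
static uint32_t model_mullo_u32(uint32_t a, uint32_t b) {
    uint64_t tmp = (uint64_t)a * (uint64_t)b;
    return (uint32_t)tmp;
}
```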
Integer AVX512VL AVX512F Arithmetic Multiply the low unsigned 32-bit integers from each packed 64-bit element in "a" and "b", and store the unsigned 64-bit results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := a[i+31:i] * b[i+31:i] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Arithmetic Multiply the low unsigned 32-bit integers from each packed 64-bit element in "a" and "b", and store the unsigned 64-bit results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := a[i+31:i] * b[i+31:i] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Arithmetic Multiply the low unsigned 32-bit integers from each packed 64-bit element in "a" and "b", and store the unsigned 64-bit results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := a[i+31:i] * b[i+31:i] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Arithmetic Multiply the low unsigned 32-bit integers from each packed 64-bit element in "a" and "b", and store the unsigned 64-bit results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := a[i+31:i] * b[i+31:i] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:128] := 0
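The unsigned entries above use the same low 32 bits of each 64-bit element but zero-extend them instead of sign-extending. A hedged one-lane sketch, with the hypothetical name `model_mul_low_u32`:

```c
#include <stdint.h>

/* One lane of the unsigned low-32-bit multiply: the upper 32 bits of each
   input element are ignored, the low halves are zero-extended to 64 bits. */
static uint64_t model_mul_low_u32(uint64_t a, uint64_t b) {
    uint64_t lo_a = a & 0xFFFFFFFFull;
    uint64_t lo_b = b & 0xFFFFFFFFull;
    return lo_a * lo_b;
}
```

For a low half of 0xFFFFFFFF multiplied by 2, this model yields 0x1FFFFFFFE, whereas a signed interpretation of the same bits would give -2.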
Integer AVX512VL AVX512F Logical Compute the bitwise OR of packed 32-bit integers in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := a[i+31:i] OR b[i+31:i] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Logical Compute the bitwise OR of packed 32-bit integers in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := a[i+31:i] OR b[i+31:i] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Logical Compute the bitwise OR of packed 32-bit integers in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := a[i+31:i] OR b[i+31:i] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Logical Compute the bitwise OR of packed 32-bit integers in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := a[i+31:i] OR b[i+31:i] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Logical Compute the bitwise OR of packed 64-bit integers in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := a[i+63:i] OR b[i+63:i] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Logical Compute the bitwise OR of packed 64-bit integers in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := a[i+63:i] OR b[i+63:i] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Logical Compute the bitwise OR of packed 64-bit integers in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := a[i+63:i] OR b[i+63:i] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Logical Compute the bitwise OR of packed 64-bit integers in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := a[i+63:i] OR b[i+63:i] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:128] := 0
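The masked bitwise OR entries above follow the same writemask and zeromask pattern. A short scalar sketch for 64-bit lanes is given below; the function names are hypothetical and each lane is one array element.

```c
#include <stdint.h>
#include <stddef.h>

/* Writemask form: active lanes get a OR b, inactive lanes are copied from src. */
static void model_mask_or_u64(uint64_t *dst, const uint64_t *src, uint8_t k,
                              const uint64_t *a, const uint64_t *b, size_t lanes) {
    for (size_t j = 0; j < lanes; j++) {
        dst[j] = ((k >> j) & 1u) ? (a[j] | b[j]) : src[j];
    }
}

/* Zeromask form: active lanes get a OR b, inactive lanes are zeroed. */
static void model_maskz_or_u64(uint64_t *dst, uint8_t k,
                               const uint64_t *a, const uint64_t *b, size_t lanes) {
    for (size_t j = 0; j < lanes; j++) {
        dst[j] = ((k >> j) & 1u) ? (a[j] | b[j]) : 0u;
    }
}
```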
Integer AVX512VL AVX512F Shift Rotate the bits in each packed 32-bit integer in "a" to the left by the number of bits specified in "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). DEFINE LEFT_ROTATE_DWORDS(src, count_src) { count := count_src % 32 RETURN (src << count) OR (src >> (32 - count)) } FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := LEFT_ROTATE_DWORDS(a[i+31:i], imm8[7:0]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Shift Rotate the bits in each packed 32-bit integer in "a" to the left by the number of bits specified in "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). DEFINE LEFT_ROTATE_DWORDS(src, count_src) { count := count_src % 32 RETURN (src << count) OR (src >> (32 - count)) } FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := LEFT_ROTATE_DWORDS(a[i+31:i], imm8[7:0]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Shift Rotate the bits in each packed 32-bit integer in "a" to the left by the number of bits specified in "imm8", and store the results in "dst". DEFINE LEFT_ROTATE_DWORDS(src, count_src) { count := count_src % 32 RETURN (src << count) OR (src >> (32 - count)) } FOR j := 0 to 7 i := j*32 dst[i+31:i] := LEFT_ROTATE_DWORDS(a[i+31:i], imm8[7:0]) ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Shift Rotate the bits in each packed 32-bit integer in "a" to the left by the number of bits specified in "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). DEFINE LEFT_ROTATE_DWORDS(src, count_src) { count := count_src % 32 RETURN (src << count) OR (src >> (32 - count)) } FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := LEFT_ROTATE_DWORDS(a[i+31:i], imm8[7:0]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Shift Rotate the bits in each packed 32-bit integer in "a" to the left by the number of bits specified in "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). DEFINE LEFT_ROTATE_DWORDS(src, count_src) { count := count_src % 32 RETURN (src << count) OR (src >> (32 - count)) } FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := LEFT_ROTATE_DWORDS(a[i+31:i], imm8[7:0]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Shift Rotate the bits in each packed 32-bit integer in "a" to the left by the number of bits specified in "imm8", and store the results in "dst". DEFINE LEFT_ROTATE_DWORDS(src, count_src) { count := count_src % 32 RETURN (src << count) OR (src >> (32 - count)) } FOR j := 0 to 3 i := j*32 dst[i+31:i] := LEFT_ROTATE_DWORDS(a[i+31:i], imm8[7:0]) ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Shift Rotate the bits in each packed 64-bit integer in "a" to the left by the number of bits specified in "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). DEFINE LEFT_ROTATE_QWORDS(src, count_src) { count := count_src % 64 RETURN (src << count) OR (src >> (64 - count)) } FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := LEFT_ROTATE_QWORDS(a[i+63:i], imm8[7:0]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Shift Rotate the bits in each packed 64-bit integer in "a" to the left by the number of bits specified in "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). DEFINE LEFT_ROTATE_QWORDS(src, count_src) { count := count_src % 64 RETURN (src << count) OR (src >> (64 - count)) } FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := LEFT_ROTATE_QWORDS(a[i+63:i], imm8[7:0]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Shift Rotate the bits in each packed 64-bit integer in "a" to the left by the number of bits specified in "imm8", and store the results in "dst". DEFINE LEFT_ROTATE_QWORDS(src, count_src) { count := count_src % 64 RETURN (src << count) OR (src >> (64 - count)) } FOR j := 0 to 3 i := j*64 dst[i+63:i] := LEFT_ROTATE_QWORDS(a[i+63:i], imm8[7:0]) ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Shift Rotate the bits in each packed 64-bit integer in "a" to the left by the number of bits specified in "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). DEFINE LEFT_ROTATE_QWORDS(src, count_src) { count := count_src % 64 RETURN (src << count) OR (src >> (64 - count)) } FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := LEFT_ROTATE_QWORDS(a[i+63:i], imm8[7:0]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Shift Rotate the bits in each packed 64-bit integer in "a" to the left by the number of bits specified in "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). DEFINE LEFT_ROTATE_QWORDS(src, count_src) { count := count_src % 64 RETURN (src << count) OR (src >> (64 - count)) } FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := LEFT_ROTATE_QWORDS(a[i+63:i], imm8[7:0]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Shift Rotate the bits in each packed 64-bit integer in "a" to the left by the number of bits specified in "imm8", and store the results in "dst". DEFINE LEFT_ROTATE_QWORDS(src, count_src) { count := count_src % 64 RETURN (src << count) OR (src >> (64 - count)) } FOR j := 0 to 1 i := j*64 dst[i+63:i] := LEFT_ROTATE_QWORDS(a[i+63:i], imm8[7:0]) ENDFOR dst[MAX:128] := 0
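The LEFT_ROTATE_DWORDS and LEFT_ROTATE_QWORDS helpers above reduce the count modulo the element width before rotating. A direct C transcription needs one extra precaution: when the reduced count is 0, the complementary shift would equal the full width, which is undefined behavior in C, so the sketch below masks it. The names `model_rol32` and `model_rol64` are hypothetical.

```c
#include <stdint.h>

/* Rotate a 32-bit lane left by imm8 % 32. The "& 31" keeps the
   complementary shift in range when the reduced count is 0. */
static uint32_t model_rol32(uint32_t x, uint8_t imm8) {
    uint32_t count = imm8 % 32u;
    return (x << count) | (x >> ((32u - count) & 31u));
}

/* Rotate a 64-bit lane left by imm8 % 64, with the same precaution. */
static uint64_t model_rol64(uint64_t x, uint8_t imm8) {
    uint32_t count = imm8 % 64u;
    return (x << count) | (x >> ((64u - count) & 63u));
}
```

A count of 36 on a 32-bit lane therefore behaves like a count of 4, and a count of 0 or 32 returns the input unchanged.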
Integer AVX512VL AVX512F Shift Rotate the bits in each packed 32-bit integer in "a" to the left by the number of bits specified in the corresponding element of "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). DEFINE LEFT_ROTATE_DWORDS(src, count_src) { count := count_src % 32 RETURN (src << count) OR (src >> (32 - count)) } FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := LEFT_ROTATE_DWORDS(a[i+31:i], b[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Shift Rotate the bits in each packed 32-bit integer in "a" to the left by the number of bits specified in the corresponding element of "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). DEFINE LEFT_ROTATE_DWORDS(src, count_src) { count := count_src % 32 RETURN (src << count) OR (src >> (32 - count)) } FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := LEFT_ROTATE_DWORDS(a[i+31:i], b[i+31:i]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Shift Rotate the bits in each packed 32-bit integer in "a" to the left by the number of bits specified in the corresponding element of "b", and store the results in "dst". DEFINE LEFT_ROTATE_DWORDS(src, count_src) { count := count_src % 32 RETURN (src << count) OR (src >> (32 - count)) } FOR j := 0 to 7 i := j*32 dst[i+31:i] := LEFT_ROTATE_DWORDS(a[i+31:i], b[i+31:i]) ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Shift Rotate the bits in each packed 32-bit integer in "a" to the left by the number of bits specified in the corresponding element of "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). DEFINE LEFT_ROTATE_DWORDS(src, count_src) { count := count_src % 32 RETURN (src << count) OR (src >> (32 - count)) } FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := LEFT_ROTATE_DWORDS(a[i+31:i], b[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Shift Rotate the bits in each packed 32-bit integer in "a" to the left by the number of bits specified in the corresponding element of "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). DEFINE LEFT_ROTATE_DWORDS(src, count_src) { count := count_src % 32 RETURN (src << count) OR (src >> (32 - count)) } FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := LEFT_ROTATE_DWORDS(a[i+31:i], b[i+31:i]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Shift Rotate the bits in each packed 32-bit integer in "a" to the left by the number of bits specified in the corresponding element of "b", and store the results in "dst". DEFINE LEFT_ROTATE_DWORDS(src, count_src) { count := count_src % 32 RETURN (src << count) OR (src >> (32 - count)) } FOR j := 0 to 3 i := j*32 dst[i+31:i] := LEFT_ROTATE_DWORDS(a[i+31:i], b[i+31:i]) ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Shift Rotate the bits in each packed 64-bit integer in "a" to the left by the number of bits specified in the corresponding element of "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). DEFINE LEFT_ROTATE_QWORDS(src, count_src) { count := count_src % 64 RETURN (src << count) OR (src >> (64 - count)) } FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := LEFT_ROTATE_QWORDS(a[i+63:i], b[i+63:i]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Shift Rotate the bits in each packed 64-bit integer in "a" to the left by the number of bits specified in the corresponding element of "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). DEFINE LEFT_ROTATE_QWORDS(src, count_src) { count := count_src % 64 RETURN (src << count) OR (src >> (64 - count)) } FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := LEFT_ROTATE_QWORDS(a[i+63:i], b[i+63:i]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Shift Rotate the bits in each packed 64-bit integer in "a" to the left by the number of bits specified in the corresponding element of "b", and store the results in "dst". DEFINE LEFT_ROTATE_QWORDS(src, count_src) { count := count_src % 64 RETURN (src << count) OR (src >> (64 - count)) } FOR j := 0 to 3 i := j*64 dst[i+63:i] := LEFT_ROTATE_QWORDS(a[i+63:i], b[i+63:i]) ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Shift Rotate the bits in each packed 64-bit integer in "a" to the left by the number of bits specified in the corresponding element of "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). DEFINE LEFT_ROTATE_QWORDS(src, count_src) { count := count_src % 64 RETURN (src << count) OR (src >> (64 - count)) } FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := LEFT_ROTATE_QWORDS(a[i+63:i], b[i+63:i]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Shift Rotate the bits in each packed 64-bit integer in "a" to the left by the number of bits specified in the corresponding element of "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). DEFINE LEFT_ROTATE_QWORDS(src, count_src) { count := count_src % 64 RETURN (src << count) OR (src >> (64 - count)) } FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := LEFT_ROTATE_QWORDS(a[i+63:i], b[i+63:i]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Shift Rotate the bits in each packed 64-bit integer in "a" to the left by the number of bits specified in the corresponding element of "b", and store the results in "dst". DEFINE LEFT_ROTATE_QWORDS(src, count_src) { count := count_src % 64 RETURN (src << count) OR (src >> (64 - count)) } FOR j := 0 to 1 i := j*64 dst[i+63:i] := LEFT_ROTATE_QWORDS(a[i+63:i], b[i+63:i]) ENDFOR dst[MAX:128] := 0
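In the variable-count entries above, each lane of "b" supplies the rotate count for the matching lane of "a". The scalar sketch below models the 32-bit writemask form; the function names are hypothetical, and the complementary shift is masked to avoid a full-width shift in C when the reduced count is 0.

```c
#include <stdint.h>
#include <stddef.h>

/* Rotate one 32-bit lane left by count % 32. */
static uint32_t model_rolv32_lane(uint32_t x, uint32_t count) {
    count %= 32u;
    return (x << count) | (x >> ((32u - count) & 31u));
}

/* Writemask form: active lanes are rotated by their own count from b,
   inactive lanes are copied from src. */
static void model_mask_rolv32(uint32_t *dst, const uint32_t *src, uint8_t k,
                              const uint32_t *a, const uint32_t *b, size_t lanes) {
    for (size_t j = 0; j < lanes; j++) {
        dst[j] = ((k >> j) & 1u) ? model_rolv32_lane(a[j], b[j]) : src[j];
    }
}
```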
Integer AVX512VL AVX512F Shift Rotate the bits in each packed 32-bit integer in "a" to the right by the number of bits specified in "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). DEFINE RIGHT_ROTATE_DWORDS(src, count_src) { count := count_src % 32 RETURN (src >> count) OR (src << (32 - count)) } FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := RIGHT_ROTATE_DWORDS(a[i+31:i], imm8[7:0]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Shift Rotate the bits in each packed 32-bit integer in "a" to the right by the number of bits specified in "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). DEFINE RIGHT_ROTATE_DWORDS(src, count_src) { count := count_src % 32 RETURN (src >> count) OR (src << (32 - count)) } FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := RIGHT_ROTATE_DWORDS(a[i+31:i], imm8[7:0]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Shift Rotate the bits in each packed 32-bit integer in "a" to the right by the number of bits specified in "imm8", and store the results in "dst". DEFINE RIGHT_ROTATE_DWORDS(src, count_src) { count := count_src % 32 RETURN (src >> count) OR (src << (32 - count)) } FOR j := 0 to 7 i := j*32 dst[i+31:i] := RIGHT_ROTATE_DWORDS(a[i+31:i], imm8[7:0]) ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Shift Rotate the bits in each packed 32-bit integer in "a" to the right by the number of bits specified in "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). DEFINE RIGHT_ROTATE_DWORDS(src, count_src) { count := count_src % 32 RETURN (src >> count) OR (src << (32 - count)) } FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := RIGHT_ROTATE_DWORDS(a[i+31:i], imm8[7:0]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Shift Rotate the bits in each packed 32-bit integer in "a" to the right by the number of bits specified in "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). DEFINE RIGHT_ROTATE_DWORDS(src, count_src) { count := count_src % 32 RETURN (src >> count) OR (src << (32 - count)) } FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := RIGHT_ROTATE_DWORDS(a[i+31:i], imm8[7:0]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Shift Rotate the bits in each packed 32-bit integer in "a" to the right by the number of bits specified in "imm8", and store the results in "dst". DEFINE RIGHT_ROTATE_DWORDS(src, count_src) { count := count_src % 32 RETURN (src >> count) OR (src << (32 - count)) } FOR j := 0 to 3 i := j*32 dst[i+31:i] := RIGHT_ROTATE_DWORDS(a[i+31:i], imm8[7:0]) ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Shift Rotate the bits in each packed 64-bit integer in "a" to the right by the number of bits specified in "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). DEFINE RIGHT_ROTATE_QWORDS(src, count_src) { count := count_src % 64 RETURN (src >> count) OR (src << (64 - count)) } FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := RIGHT_ROTATE_QWORDS(a[i+63:i], imm8[7:0]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Shift Rotate the bits in each packed 64-bit integer in "a" to the right by the number of bits specified in "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). DEFINE RIGHT_ROTATE_QWORDS(src, count_src) { count := count_src % 64 RETURN (src >> count) OR (src << (64 - count)) } FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := RIGHT_ROTATE_QWORDS(a[i+63:i], imm8[7:0]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Shift Rotate the bits in each packed 64-bit integer in "a" to the right by the number of bits specified in "imm8", and store the results in "dst". DEFINE RIGHT_ROTATE_QWORDS(src, count_src) { count := count_src % 64 RETURN (src >> count) OR (src << (64 - count)) } FOR j := 0 to 3 i := j*64 dst[i+63:i] := RIGHT_ROTATE_QWORDS(a[i+63:i], imm8[7:0]) ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Shift Rotate the bits in each packed 64-bit integer in "a" to the right by the number of bits specified in "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). DEFINE RIGHT_ROTATE_QWORDS(src, count_src) { count := count_src % 64 RETURN (src >> count) OR (src << (64 - count)) } FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := RIGHT_ROTATE_QWORDS(a[i+63:i], imm8[7:0]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Shift Rotate the bits in each packed 64-bit integer in "a" to the right by the number of bits specified in "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). DEFINE RIGHT_ROTATE_QWORDS(src, count_src) { count := count_src % 64 RETURN (src >> count) OR (src << (64 - count)) } FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := RIGHT_ROTATE_QWORDS(a[i+63:i], imm8[7:0]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Shift Rotate the bits in each packed 64-bit integer in "a" to the right by the number of bits specified in "imm8", and store the results in "dst". DEFINE RIGHT_ROTATE_QWORDS(src, count_src) { count := count_src % 64 RETURN (src >> count) OR (src << (64 - count)) } FOR j := 0 to 1 i := j*64 dst[i+63:i] := RIGHT_ROTATE_QWORDS(a[i+63:i], imm8[7:0]) ENDFOR dst[MAX:128] := 0
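The rotate-by-immediate entries above share one pattern: reduce the count modulo the lane width, rotate each lane, then apply the mask policy. The following is a minimal scalar sketch of that pattern in Python, not vendor code. The names `rotate_right` and `ror_lanes`, and the convention of passing lanes as a list with `k` as an integer bit mask, are assumptions made for illustration.

```python
def rotate_right(value, count, width):
    """Rotate a width-bit unsigned integer right by count modulo width."""
    mask = (1 << width) - 1
    count %= width                      # count := count_src % width
    value &= mask
    # For count == 0 the left shift pushes every bit past the lane; the
    # final mask discards them, so the value comes back unchanged.
    return ((value >> count) | (value << (width - count))) & mask


def ror_lanes(a, imm8, width, k=None, src=None):
    """Model the plain, writemask and zeromask forms (hypothetical helper).

    a     : list of lanes, lowest lane first
    k     : integer mask, bit j guards lane j; None means unmasked
    src   : pass-through lanes for the writemask form; None means zeromask
    """
    out = []
    for j, lane in enumerate(a):
        if k is None or (k >> j) & 1:
            out.append(rotate_right(lane, imm8 & 0xFF, width))
        elif src is not None:
            out.append(src[j])          # writemask: copy from src
        else:
            out.append(0)               # zeromask: clear the lane
    return out
```

Because the count is reduced modulo the lane width, an immediate of 40 on 32-bit lanes behaves like an immediate of 8 in this model.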
Integer AVX512VL AVX512F Shift Rotate the bits in each packed 32-bit integer in "a" to the right by the number of bits specified in the corresponding element of "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). DEFINE RIGHT_ROTATE_DWORDS(src, count_src) { count := count_src % 32 RETURN (src >> count) OR (src << (32 - count)) } FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := RIGHT_ROTATE_DWORDS(a[i+31:i], b[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Shift Rotate the bits in each packed 32-bit integer in "a" to the right by the number of bits specified in the corresponding element of "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). DEFINE RIGHT_ROTATE_DWORDS(src, count_src) { count := count_src % 32 RETURN (src >> count) OR (src << (32 - count)) } FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := RIGHT_ROTATE_DWORDS(a[i+31:i], b[i+31:i]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Shift Rotate the bits in each packed 32-bit integer in "a" to the right by the number of bits specified in the corresponding element of "b", and store the results in "dst". DEFINE RIGHT_ROTATE_DWORDS(src, count_src) { count := count_src % 32 RETURN (src >> count) OR (src << (32 - count)) } FOR j := 0 to 7 i := j*32 dst[i+31:i] := RIGHT_ROTATE_DWORDS(a[i+31:i], b[i+31:i]) ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Shift Rotate the bits in each packed 32-bit integer in "a" to the right by the number of bits specified in the corresponding element of "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). DEFINE RIGHT_ROTATE_DWORDS(src, count_src) { count := count_src % 32 RETURN (src >> count) OR (src << (32 - count)) } FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := RIGHT_ROTATE_DWORDS(a[i+31:i], b[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Shift Rotate the bits in each packed 32-bit integer in "a" to the right by the number of bits specified in the corresponding element of "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). DEFINE RIGHT_ROTATE_DWORDS(src, count_src) { count := count_src % 32 RETURN (src >> count) OR (src << (32 - count)) } FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := RIGHT_ROTATE_DWORDS(a[i+31:i], b[i+31:i]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Shift Rotate the bits in each packed 32-bit integer in "a" to the right by the number of bits specified in the corresponding element of "b", and store the results in "dst". DEFINE RIGHT_ROTATE_DWORDS(src, count_src) { count := count_src % 32 RETURN (src >> count) OR (src << (32 - count)) } FOR j := 0 to 3 i := j*32 dst[i+31:i] := RIGHT_ROTATE_DWORDS(a[i+31:i], b[i+31:i]) ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Shift Rotate the bits in each packed 64-bit integer in "a" to the right by the number of bits specified in the corresponding element of "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). DEFINE RIGHT_ROTATE_QWORDS(src, count_src) { count := count_src % 64 RETURN (src >> count) OR (src << (64 - count)) } FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := RIGHT_ROTATE_QWORDS(a[i+63:i], b[i+63:i]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Shift Rotate the bits in each packed 64-bit integer in "a" to the right by the number of bits specified in the corresponding element of "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). DEFINE RIGHT_ROTATE_QWORDS(src, count_src) { count := count_src % 64 RETURN (src >> count) OR (src << (64 - count)) } FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := RIGHT_ROTATE_QWORDS(a[i+63:i], b[i+63:i]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Shift Rotate the bits in each packed 64-bit integer in "a" to the right by the number of bits specified in the corresponding element of "b", and store the results in "dst". DEFINE RIGHT_ROTATE_QWORDS(src, count_src) { count := count_src % 64 RETURN (src >> count) OR (src << (64 - count)) } FOR j := 0 to 3 i := j*64 dst[i+63:i] := RIGHT_ROTATE_QWORDS(a[i+63:i], b[i+63:i]) ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Shift Rotate the bits in each packed 64-bit integer in "a" to the right by the number of bits specified in the corresponding element of "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). DEFINE RIGHT_ROTATE_QWORDS(src, count_src) { count := count_src % 64 RETURN (src >> count) OR (src << (64 - count)) } FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := RIGHT_ROTATE_QWORDS(a[i+63:i], b[i+63:i]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Shift Rotate the bits in each packed 64-bit integer in "a" to the right by the number of bits specified in the corresponding element of "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). DEFINE RIGHT_ROTATE_QWORDS(src, count_src) { count := count_src % 64 RETURN (src >> count) OR (src << (64 - count)) } FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := RIGHT_ROTATE_QWORDS(a[i+63:i], b[i+63:i]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Shift Rotate the bits in each packed 64-bit integer in "a" to the right by the number of bits specified in the corresponding element of "b", and store the results in "dst". DEFINE RIGHT_ROTATE_QWORDS(src, count_src) { count := count_src % 64 RETURN (src >> count) OR (src << (64 - count)) } FOR j := 0 to 1 i := j*64 dst[i+63:i] := RIGHT_ROTATE_QWORDS(a[i+63:i], b[i+63:i]) ENDFOR dst[MAX:128] := 0
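In the variable-count rotate entries above, each lane takes its own count from the matching lane of "b", and that count is again reduced modulo the lane width. A short Python sketch of the unmasked form follows; `rorv_lanes` is a hypothetical name and the list-of-lanes representation is an assumption.

```python
def rorv_lanes(a, b, width):
    """Rotate each lane of a right by the matching lane of b (modulo width)."""
    mask = (1 << width) - 1
    out = []
    for value, count in zip(a, b):
        count %= width                  # only the low log2(width) bits matter
        value &= mask
        out.append(((value >> count) | (value << (width - count))) & mask)
    return out
```

In this model a per-lane count of 32 on 32-bit lanes leaves the lane unchanged, and a count of 36 rotates by 4.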
Integer AVX512VL AVX512F Store Scatter 32-bit integers from "a" into memory using 32-bit indices. 32-bit elements are stored at addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). "scale" should be 1, 2, 4 or 8. FOR j := 0 to 7 i := j*32 m := j*32 addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8 MEM[addr+31:addr] := a[i+31:i] ENDFOR
Integer AVX512VL AVX512F Store Scatter 32-bit integers from "a" into memory using 32-bit indices. 32-bit elements are stored at addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale") subject to mask "k" (elements are not stored when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8. FOR j := 0 to 7 i := j*32 m := j*32 IF k[j] addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8 MEM[addr+31:addr] := a[i+31:i] FI ENDFOR
Integer AVX512VL AVX512F Store Scatter 32-bit integers from "a" into memory using 32-bit indices. 32-bit elements are stored at addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). "scale" should be 1, 2, 4 or 8. FOR j := 0 to 3 i := j*32 m := j*32 addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8 MEM[addr+31:addr] := a[i+31:i] ENDFOR
Integer AVX512VL AVX512F Store Scatter 32-bit integers from "a" into memory using 32-bit indices. 32-bit elements are stored at addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale") subject to mask "k" (elements are not stored when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8. FOR j := 0 to 3 i := j*32 m := j*32 IF k[j] addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8 MEM[addr+31:addr] := a[i+31:i] FI ENDFOR
Integer AVX512VL AVX512F Store Scatter 64-bit integers from "a" into memory using 32-bit indices. 64-bit elements are stored at addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). "scale" should be 1, 2, 4 or 8. FOR j := 0 to 3 i := j*64 m := j*32 addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8 MEM[addr+63:addr] := a[i+63:i] ENDFOR
Integer AVX512VL AVX512F Store Scatter 64-bit integers from "a" into memory using 32-bit indices. 64-bit elements are stored at addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale") subject to mask "k" (elements are not stored when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8. FOR j := 0 to 3 i := j*64 m := j*32 IF k[j] addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8 MEM[addr+63:addr] := a[i+63:i] FI ENDFOR
Integer AVX512VL AVX512F Store Scatter 64-bit integers from "a" into memory using 32-bit indices. 64-bit elements are stored at addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). "scale" should be 1, 2, 4 or 8. FOR j := 0 to 1 i := j*64 m := j*32 addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8 MEM[addr+63:addr] := a[i+63:i] ENDFOR
Integer AVX512VL AVX512F Store Scatter 64-bit integers from "a" into memory using 32-bit indices. 64-bit elements are stored at addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale") subject to mask "k" (elements are not stored when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8. FOR j := 0 to 1 i := j*64 m := j*32 IF k[j] addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8 MEM[addr+63:addr] := a[i+63:i] FI ENDFOR
Integer AVX512VL AVX512F Store Scatter 32-bit integers from "a" into memory using 64-bit indices. 32-bit elements are stored at addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). "scale" should be 1, 2, 4 or 8. FOR j := 0 to 3 i := j*32 m := j*64 addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8 MEM[addr+31:addr] := a[i+31:i] ENDFOR
Integer AVX512VL AVX512F Store Scatter 32-bit integers from "a" into memory using 64-bit indices. 32-bit elements are stored at addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale") subject to mask "k" (elements are not stored when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8. FOR j := 0 to 3 i := j*32 m := j*64 IF k[j] addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8 MEM[addr+31:addr] := a[i+31:i] FI ENDFOR
Integer AVX512VL AVX512F Store Scatter 32-bit integers from "a" into memory using 64-bit indices. 32-bit elements are stored at addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). "scale" should be 1, 2, 4 or 8. FOR j := 0 to 1 i := j*32 m := j*64 addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8 MEM[addr+31:addr] := a[i+31:i] ENDFOR
Integer AVX512VL AVX512F Store Scatter 32-bit integers from "a" into memory using 64-bit indices. 32-bit elements are stored at addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale") subject to mask "k" (elements are not stored when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8. FOR j := 0 to 1 i := j*32 m := j*64 IF k[j] addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8 MEM[addr+31:addr] := a[i+31:i] FI ENDFOR
Integer AVX512VL AVX512F Store Scatter 64-bit integers from "a" into memory using 64-bit indices. 64-bit elements are stored at addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). "scale" should be 1, 2, 4 or 8. FOR j := 0 to 3 i := j*64 m := j*64 addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8 MEM[addr+63:addr] := a[i+63:i] ENDFOR
Integer AVX512VL AVX512F Store Scatter 64-bit integers from "a" into memory using 64-bit indices. 64-bit elements are stored at addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale") subject to mask "k" (elements are not stored when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8. FOR j := 0 to 3 i := j*64 m := j*64 IF k[j] addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8 MEM[addr+63:addr] := a[i+63:i] FI ENDFOR
Integer AVX512VL AVX512F Store Scatter 64-bit integers from "a" into memory using 64-bit indices. 64-bit elements are stored at addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). "scale" should be 1, 2, 4 or 8. FOR j := 0 to 1 i := j*64 m := j*64 addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8 MEM[addr+63:addr] := a[i+63:i] ENDFOR
Integer AVX512VL AVX512F Store Scatter 64-bit integers from "a" into memory using 64-bit indices. 64-bit elements are stored at addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale") subject to mask "k" (elements are not stored when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8. FOR j := 0 to 1 i := j*64 m := j*64 IF k[j] addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8 MEM[addr+63:addr] := a[i+63:i] FI ENDFOR
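The scatter entries above all compute one address per lane from "base_addr", the lane's index and "scale", then store the lane unless its mask bit is clear. The sketch below models this on a Python `bytearray` using byte addresses. It assumes that the trailing `* 8` in the pseudocode converts a byte offset into the bit positions used by `MEM[...]`, and that 64-bit indices can be treated as signed (equivalent under 64-bit wraparound). The names `sign_extend` and `scatter` are hypothetical.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    if value >> (bits - 1):
        return value - (1 << bits)
    return value


def scatter(memory, base_addr, vindex, a, scale, elem_bytes, index_bits=32, k=None):
    """Store each lane of a at base_addr + index * scale (byte addresses).

    memory is a bytearray standing in for MEM; k is an optional integer
    mask in which bit j enables lane j.
    """
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale should be 1, 2, 4 or 8")
    lane_mask = (1 << (8 * elem_bytes)) - 1
    for j, (index, value) in enumerate(zip(vindex, a)):
        if k is not None and not (k >> j) & 1:
            continue                    # masked-off lanes are not stored
        addr = base_addr + sign_extend(index, index_bits) * scale
        if addr < 0 or addr + elem_bytes > len(memory):
            raise IndexError("scatter address outside the modelled memory")
        memory[addr:addr + elem_bytes] = (value & lane_mask).to_bytes(elem_bytes, "little")
```

The loop runs from lane 0 upward, as in the pseudocode, so when two enabled lanes resolve to the same address the higher lane's value is the one left in memory in this model.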
Integer AVX512VL AVX512F Miscellaneous Shuffle 32-bit integers in "a" within 128-bit lanes using the control in "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). DEFINE SELECT4(src, control) { CASE(control[1:0]) OF 0: tmp[31:0] := src[31:0] 1: tmp[31:0] := src[63:32] 2: tmp[31:0] := src[95:64] 3: tmp[31:0] := src[127:96] ESAC RETURN tmp[31:0] } tmp_dst[31:0] := SELECT4(a[127:0], imm8[1:0]) tmp_dst[63:32] := SELECT4(a[127:0], imm8[3:2]) tmp_dst[95:64] := SELECT4(a[127:0], imm8[5:4]) tmp_dst[127:96] := SELECT4(a[127:0], imm8[7:6]) tmp_dst[159:128] := SELECT4(a[255:128], imm8[1:0]) tmp_dst[191:160] := SELECT4(a[255:128], imm8[3:2]) tmp_dst[223:192] := SELECT4(a[255:128], imm8[5:4]) tmp_dst[255:224] := SELECT4(a[255:128], imm8[7:6]) FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := tmp_dst[i+31:i] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Miscellaneous Shuffle 32-bit integers in "a" within 128-bit lanes using the control in "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). DEFINE SELECT4(src, control) { CASE(control[1:0]) OF 0: tmp[31:0] := src[31:0] 1: tmp[31:0] := src[63:32] 2: tmp[31:0] := src[95:64] 3: tmp[31:0] := src[127:96] ESAC RETURN tmp[31:0] } tmp_dst[31:0] := SELECT4(a[127:0], imm8[1:0]) tmp_dst[63:32] := SELECT4(a[127:0], imm8[3:2]) tmp_dst[95:64] := SELECT4(a[127:0], imm8[5:4]) tmp_dst[127:96] := SELECT4(a[127:0], imm8[7:6]) tmp_dst[159:128] := SELECT4(a[255:128], imm8[1:0]) tmp_dst[191:160] := SELECT4(a[255:128], imm8[3:2]) tmp_dst[223:192] := SELECT4(a[255:128], imm8[5:4]) tmp_dst[255:224] := SELECT4(a[255:128], imm8[7:6]) FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := tmp_dst[i+31:i] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Miscellaneous Shuffle 32-bit integers in "a" using the control in "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). DEFINE SELECT4(src, control) { CASE(control[1:0]) OF 0: tmp[31:0] := src[31:0] 1: tmp[31:0] := src[63:32] 2: tmp[31:0] := src[95:64] 3: tmp[31:0] := src[127:96] ESAC RETURN tmp[31:0] } tmp_dst[31:0] := SELECT4(a[127:0], imm8[1:0]) tmp_dst[63:32] := SELECT4(a[127:0], imm8[3:2]) tmp_dst[95:64] := SELECT4(a[127:0], imm8[5:4]) tmp_dst[127:96] := SELECT4(a[127:0], imm8[7:6]) FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := tmp_dst[i+31:i] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Miscellaneous Shuffle 32-bit integers in "a" using the control in "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). DEFINE SELECT4(src, control) { CASE(control[1:0]) OF 0: tmp[31:0] := src[31:0] 1: tmp[31:0] := src[63:32] 2: tmp[31:0] := src[95:64] 3: tmp[31:0] := src[127:96] ESAC RETURN tmp[31:0] } tmp_dst[31:0] := SELECT4(a[127:0], imm8[1:0]) tmp_dst[63:32] := SELECT4(a[127:0], imm8[3:2]) tmp_dst[95:64] := SELECT4(a[127:0], imm8[5:4]) tmp_dst[127:96] := SELECT4(a[127:0], imm8[7:6]) FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := tmp_dst[i+31:i] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:128] := 0
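The SELECT4 pseudocode in the shuffle entries above picks, for each destination element, one of the four 32-bit elements of the same 128-bit lane, using two bits of "imm8" per destination position. A compact Python sketch of that selection plus the mask step follows; `shuffle_epi32` is a hypothetical helper name, and treating `k=None` as an unmasked call is an assumption added for convenience.

```python
def shuffle_epi32(a, imm8, k=None, src=None):
    """Shuffle 32-bit elements within each group of four (one 128-bit lane).

    a is a list of elements whose length is a multiple of 4. src=None with
    a mask models the zeromask form; otherwise inactive elements copy src.
    """
    out = []
    for j in range(len(a)):
        lane_base = j - (j % 4)                     # first element of this 128-bit lane
        selector = (imm8 >> (2 * (j % 4))) & 0b11   # imm8[1:0], [3:2], [5:4], [7:6]
        picked = a[lane_base + selector]
        if k is None or (k >> j) & 1:
            out.append(picked)
        elif src is not None:
            out.append(src[j])
        else:
            out.append(0)
    return out
```

With `imm8 = 0x1B` the selectors are 3, 2, 1, 0, so each group of four elements is reversed; the same control is reused for the upper 128-bit lane of a 256-bit input.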
Integer AVX512VL AVX512F Shift Shift packed 32-bit integers in "a" left by "count" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 IF k[j] IF count[63:0] > 31 dst[i+31:i] := 0 ELSE dst[i+31:i] := ZeroExtend32(a[i+31:i] << count[63:0]) FI ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Shift Shift packed 32-bit integers in "a" left by "imm8" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 IF k[j] IF imm8[7:0] > 31 dst[i+31:i] := 0 ELSE dst[i+31:i] := ZeroExtend32(a[i+31:i] << imm8[7:0]) FI ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Shift Shift packed 32-bit integers in "a" left by "count" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 IF k[j] IF count[63:0] > 31 dst[i+31:i] := 0 ELSE dst[i+31:i] := ZeroExtend32(a[i+31:i] << count[63:0]) FI ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Shift Shift packed 32-bit integers in "a" left by "imm8" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 IF k[j] IF imm8[7:0] > 31 dst[i+31:i] := 0 ELSE dst[i+31:i] := ZeroExtend32(a[i+31:i] << imm8[7:0]) FI ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Shift Shift packed 32-bit integers in "a" left by "count" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 IF k[j] IF count[63:0] > 31 dst[i+31:i] := 0 ELSE dst[i+31:i] := ZeroExtend32(a[i+31:i] << count[63:0]) FI ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Shift Shift packed 32-bit integers in "a" left by "imm8" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 IF k[j] IF imm8[7:0] > 31 dst[i+31:i] := 0 ELSE dst[i+31:i] := ZeroExtend32(a[i+31:i] << imm8[7:0]) FI ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Shift Shift packed 32-bit integers in "a" left by "count" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 IF k[j] IF count[63:0] > 31 dst[i+31:i] := 0 ELSE dst[i+31:i] := ZeroExtend32(a[i+31:i] << count[63:0]) FI ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Shift Shift packed 32-bit integers in "a" left by "imm8" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 IF k[j] IF imm8[7:0] > 31 dst[i+31:i] := 0 ELSE dst[i+31:i] := ZeroExtend32(a[i+31:i] << imm8[7:0]) FI ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Shift Shift packed 64-bit integers in "a" left by "count" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k[j] IF count[63:0] > 63 dst[i+63:i] := 0 ELSE dst[i+63:i] := ZeroExtend64(a[i+63:i] << count[63:0]) FI ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Shift Shift packed 64-bit integers in "a" left by "imm8" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k[j] IF imm8[7:0] > 63 dst[i+63:i] := 0 ELSE dst[i+63:i] := ZeroExtend64(a[i+63:i] << imm8[7:0]) FI ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Shift Shift packed 64-bit integers in "a" left by "count" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k[j] IF count[63:0] > 63 dst[i+63:i] := 0 ELSE dst[i+63:i] := ZeroExtend64(a[i+63:i] << count[63:0]) FI ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Shift Shift packed 64-bit integers in "a" left by "imm8" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k[j] IF imm8[7:0] > 63 dst[i+63:i] := 0 ELSE dst[i+63:i] := ZeroExtend64(a[i+63:i] << imm8[7:0]) FI ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Shift Shift packed 64-bit integers in "a" left by "count" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k[j] IF count[63:0] > 63 dst[i+63:i] := 0 ELSE dst[i+63:i] := ZeroExtend64(a[i+63:i] << count[63:0]) FI ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Shift Shift packed 64-bit integers in "a" left by "imm8" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k[j] IF imm8[7:0] > 63 dst[i+63:i] := 0 ELSE dst[i+63:i] := ZeroExtend64(a[i+63:i] << imm8[7:0]) FI ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Shift Shift packed 64-bit integers in "a" left by "count" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k[j] IF count[63:0] > 63 dst[i+63:i] := 0 ELSE dst[i+63:i] := ZeroExtend64(a[i+63:i] << count[63:0]) FI ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Shift Shift packed 64-bit integers in "a" left by "imm8" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k[j] IF imm8[7:0] > 63 dst[i+63:i] := 0 ELSE dst[i+63:i] := ZeroExtend64(a[i+63:i] << imm8[7:0]) FI ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:128] := 0
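Unlike the rotates, the logical left shifts above do not wrap the count: a single count applies to every lane, and any count larger than the lane width minus one clears the lane. The Python sketch below models that rule together with the two mask policies. `sll_lanes` is a hypothetical name, and `count` stands for either `count[63:0]` or `imm8[7:0]`.

```python
def sll_lanes(a, count, width, k=None, src=None):
    """Shift every lane left by one shared count, shifting in zeros."""
    mask = (1 << width) - 1
    out = []
    for j, lane in enumerate(a):
        if k is not None and not (k >> j) & 1:
            # Inactive lane: copy src (writemask) or clear (zeromask).
            out.append(src[j] if src is not None else 0)
        elif count > width - 1:
            out.append(0)               # oversized count clears the lane
        else:
            out.append((lane << count) & mask)
    return out
```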
Integer AVX512VL AVX512F Shift Shift packed 32-bit integers in "a" left by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 IF k[j] IF count[i+31:i] < 32 dst[i+31:i] := ZeroExtend32(a[i+31:i] << count[i+31:i]) ELSE dst[i+31:i] := 0 FI ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Shift Shift packed 32-bit integers in "a" left by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 IF k[j] IF count[i+31:i] < 32 dst[i+31:i] := ZeroExtend32(a[i+31:i] << count[i+31:i]) ELSE dst[i+31:i] := 0 FI ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Shift Shift packed 32-bit integers in "a" left by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 IF k[j] IF count[i+31:i] < 32 dst[i+31:i] := ZeroExtend32(a[i+31:i] << count[i+31:i]) ELSE dst[i+31:i] := 0 FI ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Shift Shift packed 32-bit integers in "a" left by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 IF k[j] IF count[i+31:i] < 32 dst[i+31:i] := ZeroExtend32(a[i+31:i] << count[i+31:i]) ELSE dst[i+31:i] := 0 FI ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Shift Shift packed 64-bit integers in "a" left by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k[j] IF count[i+63:i] < 64 dst[i+63:i] := ZeroExtend64(a[i+63:i] << count[i+63:i]) ELSE dst[i+63:i] := 0 FI ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Shift Shift packed 64-bit integers in "a" left by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k[j] IF count[i+63:i] < 64 dst[i+63:i] := ZeroExtend64(a[i+63:i] << count[i+63:i]) ELSE dst[i+63:i] := 0 FI ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Shift Shift packed 64-bit integers in "a" left by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k[j] IF count[i+63:i] < 64 dst[i+63:i] := ZeroExtend64(a[i+63:i] << count[i+63:i]) ELSE dst[i+63:i] := 0 FI ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Shift Shift packed 64-bit integers in "a" left by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k[j] IF count[i+63:i] < 64 dst[i+63:i] := ZeroExtend64(a[i+63:i] << count[i+63:i]) ELSE dst[i+63:i] := 0 FI ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:128] := 0
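The per-element shift entries above read the count from the matching lane of "count", and a lane is cleared whenever its own count is not below the lane width. A brief Python sketch of the unmasked form, with the hypothetical name `sllv_lanes`:

```python
def sllv_lanes(a, count, width):
    """Shift each lane of a left by the matching lane of count."""
    mask = (1 << width) - 1
    out = []
    for value, c in zip(a, count):
        if c < width:
            out.append(((value & mask) << c) & mask)
        else:
            out.append(0)               # count >= width clears the lane
    return out
```

The count is compared as an unsigned lane value, so a lane holding `0xFFFFFFFF` as its count is cleared rather than interpreted as a negative shift.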
Integer AVX512VL AVX512F Shift Shift packed 32-bit integers in "a" right by "count" while shifting in sign bits, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 IF k[j] IF count[63:0] > 31 dst[i+31:i] := (a[i+31] ? 0xFFFFFFFF : 0x0) ELSE dst[i+31:i] := SignExtend32(a[i+31:i] >> count[63:0]) FI ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Shift Shift packed 32-bit integers in "a" right by "imm8" while shifting in sign bits, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 IF k[j] IF imm8[7:0] > 31 dst[i+31:i] := (a[i+31] ? 0xFFFFFFFF : 0x0) ELSE dst[i+31:i] := SignExtend32(a[i+31:i] >> imm8[7:0]) FI ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Shift Shift packed 32-bit integers in "a" right by "count" while shifting in sign bits, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 IF k[j] IF count[63:0] > 31 dst[i+31:i] := (a[i+31] ? 0xFFFFFFFF : 0x0) ELSE dst[i+31:i] := SignExtend32(a[i+31:i] >> count[63:0]) FI ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Shift Shift packed 32-bit integers in "a" right by "imm8" while shifting in sign bits, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 IF k[j] IF imm8[7:0] > 31 dst[i+31:i] := (a[i+31] ? 0xFFFFFFFF : 0x0) ELSE dst[i+31:i] := SignExtend32(a[i+31:i] >> imm8[7:0]) FI ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Shift Shift packed 32-bit integers in "a" right by "count" while shifting in sign bits, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 IF k[j] IF count[63:0] > 31 dst[i+31:i] := (a[i+31] ? 0xFFFFFFFF : 0x0) ELSE dst[i+31:i] := SignExtend32(a[i+31:i] >> count[63:0]) FI ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Shift Shift packed 32-bit integers in "a" right by "imm8" while shifting in sign bits, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 IF k[j] IF imm8[7:0] > 31 dst[i+31:i] := (a[i+31] ? 0xFFFFFFFF : 0x0) ELSE dst[i+31:i] := SignExtend32(a[i+31:i] >> imm8[7:0]) FI ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Shift Shift packed 32-bit integers in "a" right by "count" while shifting in sign bits, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 IF k[j] IF count[63:0] > 31 dst[i+31:i] := (a[i+31] ? 0xFFFFFFFF : 0x0) ELSE dst[i+31:i] := SignExtend32(a[i+31:i] >> count[63:0]) FI ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Shift Shift packed 32-bit integers in "a" right by "imm8" while shifting in sign bits, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 IF k[j] IF imm8[7:0] > 31 dst[i+31:i] := (a[i+31] ? 0xFFFFFFFF : 0x0) ELSE dst[i+31:i] := SignExtend32(a[i+31:i] >> imm8[7:0]) FI ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Shift Shift packed 64-bit integers in "a" right by "count" while shifting in sign bits, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k[j] IF count[63:0] > 63 dst[i+63:i] := (a[i+63] ? 0xFFFFFFFFFFFFFFFF : 0x0) ELSE dst[i+63:i] := SignExtend64(a[i+63:i] >> count[63:0]) FI ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Shift Shift packed 64-bit integers in "a" right by "imm8" while shifting in sign bits, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k[j] IF imm8[7:0] > 63 dst[i+63:i] := (a[i+63] ? 0xFFFFFFFFFFFFFFFF : 0x0) ELSE dst[i+63:i] := SignExtend64(a[i+63:i] >> imm8[7:0]) FI ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Shift Shift packed 64-bit integers in "a" right by "count" while shifting in sign bits, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k[j] IF count[63:0] > 63 dst[i+63:i] := (a[i+63] ? 0xFFFFFFFFFFFFFFFF : 0x0) ELSE dst[i+63:i] := SignExtend64(a[i+63:i] >> count[63:0]) FI ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Shift Shift packed 64-bit integers in "a" right by "imm8" while shifting in sign bits, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k[j] IF imm8[7:0] > 63 dst[i+63:i] := (a[i+63] ? 0xFFFFFFFFFFFFFFFF : 0x0) ELSE dst[i+63:i] := SignExtend64(a[i+63:i] >> imm8[7:0]) FI ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Shift Shift packed 64-bit integers in "a" right by "count" while shifting in sign bits, and store the results in "dst". FOR j := 0 to 3 i := j*64 IF count[63:0] > 63 dst[i+63:i] := (a[i+63] ? 0xFFFFFFFFFFFFFFFF : 0x0) ELSE dst[i+63:i] := SignExtend64(a[i+63:i] >> count[63:0]) FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Shift Shift packed 64-bit integers in "a" right by "imm8" while shifting in sign bits, and store the results in "dst". FOR j := 0 to 3 i := j*64 IF imm8[7:0] > 63 dst[i+63:i] := (a[i+63] ? 0xFFFFFFFFFFFFFFFF : 0x0) ELSE dst[i+63:i] := SignExtend64(a[i+63:i] >> imm8[7:0]) FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Shift Shift packed 64-bit integers in "a" right by "count" while shifting in sign bits, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k[j] IF count[63:0] > 63 dst[i+63:i] := (a[i+63] ? 0xFFFFFFFFFFFFFFFF : 0x0) ELSE dst[i+63:i] := SignExtend64(a[i+63:i] >> count[63:0]) FI ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Shift Shift packed 64-bit integers in "a" right by "imm8" while shifting in sign bits, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k[j] IF imm8[7:0] > 63 dst[i+63:i] := (a[i+63] ? 0xFFFFFFFFFFFFFFFF : 0x0) ELSE dst[i+63:i] := SignExtend64(a[i+63:i] >> imm8[7:0]) FI ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Shift Shift packed 64-bit integers in "a" right by "count" while shifting in sign bits, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k[j] IF count[63:0] > 63 dst[i+63:i] := (a[i+63] ? 0xFFFFFFFFFFFFFFFF : 0x0) ELSE dst[i+63:i] := SignExtend64(a[i+63:i] >> count[63:0]) FI ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Shift Shift packed 64-bit integers in "a" right by "imm8" while shifting in sign bits, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k[j] IF imm8[7:0] > 63 dst[i+63:i] := (a[i+63] ? 0xFFFFFFFFFFFFFFFF : 0x0) ELSE dst[i+63:i] := SignExtend64(a[i+63:i] >> imm8[7:0]) FI ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Shift Shift packed 64-bit integers in "a" right by "count" while shifting in sign bits, and store the results in "dst". FOR j := 0 to 1 i := j*64 IF count[63:0] > 63 dst[i+63:i] := (a[i+63] ? 0xFFFFFFFFFFFFFFFF : 0x0) ELSE dst[i+63:i] := SignExtend64(a[i+63:i] >> count[63:0]) FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Shift Shift packed 64-bit integers in "a" right by "imm8" while shifting in sign bits, and store the results in "dst". FOR j := 0 to 1 i := j*64 IF imm8[7:0] > 63 dst[i+63:i] := (a[i+63] ? 0xFFFFFFFFFFFFFFFF : 0x0) ELSE dst[i+63:i] := SignExtend64(a[i+63:i] >> imm8[7:0]) FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Shift Shift packed 32-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in sign bits, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 IF k[j] IF count[i+31:i] < 32 dst[i+31:i] := SignExtend32(a[i+31:i] >> count[i+31:i]) ELSE dst[i+31:i] := (a[i+31] ? 0xFFFFFFFF : 0) FI ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Shift Shift packed 32-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in sign bits, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 IF k[j] IF count[i+31:i] < 32 dst[i+31:i] := SignExtend32(a[i+31:i] >> count[i+31:i]) ELSE dst[i+31:i] := (a[i+31] ? 0xFFFFFFFF : 0) FI ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Shift Shift packed 32-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in sign bits, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 IF k[j] IF count[i+31:i] < 32 dst[i+31:i] := SignExtend32(a[i+31:i] >> count[i+31:i]) ELSE dst[i+31:i] := (a[i+31] ? 0xFFFFFFFF : 0) FI ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Shift Shift packed 32-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in sign bits, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 IF k[j] IF count[i+31:i] < 32 dst[i+31:i] := SignExtend32(a[i+31:i] >> count[i+31:i]) ELSE dst[i+31:i] := (a[i+31] ? 0xFFFFFFFF : 0) FI ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Shift Shift packed 64-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in sign bits, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k[j] IF count[i+63:i] < 64 dst[i+63:i] := SignExtend64(a[i+63:i] >> count[i+63:i]) ELSE dst[i+63:i] := (a[i+63] ? 0xFFFFFFFFFFFFFFFF : 0) FI ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Shift Shift packed 64-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in sign bits, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k[j] IF count[i+63:i] < 64 dst[i+63:i] := SignExtend64(a[i+63:i] >> count[i+63:i]) ELSE dst[i+63:i] := (a[i+63] ? 0xFFFFFFFFFFFFFFFF : 0) FI ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Shift Shift packed 64-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in sign bits, and store the results in "dst". FOR j := 0 to 3 i := j*64 IF count[i+63:i] < 64 dst[i+63:i] := SignExtend64(a[i+63:i] >> count[i+63:i]) ELSE dst[i+63:i] := (a[i+63] ? 0xFFFFFFFFFFFFFFFF : 0) FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Shift Shift packed 64-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in sign bits, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k[j] IF count[i+63:i] < 64 dst[i+63:i] := SignExtend64(a[i+63:i] >> count[i+63:i]) ELSE dst[i+63:i] := (a[i+63] ? 0xFFFFFFFFFFFFFFFF : 0) FI ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Shift Shift packed 64-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in sign bits, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k[j] IF count[i+63:i] < 64 dst[i+63:i] := SignExtend64(a[i+63:i] >> count[i+63:i]) ELSE dst[i+63:i] := (a[i+63] ? 0xFFFFFFFFFFFFFFFF : 0) FI ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Shift Shift packed 64-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in sign bits, and store the results in "dst". FOR j := 0 to 1 i := j*64 IF count[i+63:i] < 64 dst[i+63:i] := SignExtend64(a[i+63:i] >> count[i+63:i]) ELSE dst[i+63:i] := (a[i+63] ? 0xFFFFFFFFFFFFFFFF : 0) FI ENDFOR dst[MAX:128] := 0
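The per-element arithmetic right shifts above can be sketched as a scalar reference model in C. This is a minimal sketch, not the hardware implementation: the names `sra64` and `mask_srav_epi64_model` are hypothetical, lanes are passed as plain arrays, and the mask is assumed to fit in 8 bits. The sign fill is built from unsigned operations because right-shifting a negative signed value is implementation-defined in C.

```c
#include <assert.h>
#include <stdint.h>

/* Arithmetic right shift of one 64-bit lane, following the pseudocode:
   counts above 63 yield a lane filled with the sign bit. */
static uint64_t sra64(uint64_t a, uint64_t count)
{
    uint64_t sign = (a >> 63) ? UINT64_MAX : 0;
    if (count > 63)
        return sign;
    if (count == 0)
        return a;                 /* avoids an undefined shift by 64 below */
    return (a >> count) | (sign << (64 - count));
}

/* Writemask form (hypothetical name): lane j is shifted by count[j] when
   bit j of k is set, otherwise it is copied from src[j]. */
void mask_srav_epi64_model(uint64_t *dst, const uint64_t *src, uint8_t k,
                           const uint64_t *a, const uint64_t *count, int lanes)
{
    assert(lanes >= 0 && lanes <= 8);   /* k carries at most 8 mask bits here */
    for (int j = 0; j < lanes; j++)
        dst[j] = ((k >> j) & 1u) ? sra64(a[j], count[j]) : src[j];
}
```

The zeromask form differs only in writing 0 instead of `src[j]` for lanes whose mask bit is clear, and the unmasked form behaves as if every mask bit were set.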
Integer AVX512VL AVX512F Shift Shift packed 32-bit integers in "a" right by "count" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 IF k[j] IF count[63:0] > 31 dst[i+31:i] := 0 ELSE dst[i+31:i] := ZeroExtend32(a[i+31:i] >> count[63:0]) FI ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Shift Shift packed 32-bit integers in "a" right by "imm8" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 IF k[j] IF imm8[7:0] > 31 dst[i+31:i] := 0 ELSE dst[i+31:i] := ZeroExtend32(a[i+31:i] >> imm8[7:0]) FI ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Shift Shift packed 32-bit integers in "a" right by "count" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 IF k[j] IF count[63:0] > 31 dst[i+31:i] := 0 ELSE dst[i+31:i] := ZeroExtend32(a[i+31:i] >> count[63:0]) FI ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Shift Shift packed 32-bit integers in "a" right by "imm8" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 IF k[j] IF imm8[7:0] > 31 dst[i+31:i] := 0 ELSE dst[i+31:i] := ZeroExtend32(a[i+31:i] >> imm8[7:0]) FI ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Shift Shift packed 32-bit integers in "a" right by "count" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 IF k[j] IF count[63:0] > 31 dst[i+31:i] := 0 ELSE dst[i+31:i] := ZeroExtend32(a[i+31:i] >> count[63:0]) FI ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Shift Shift packed 32-bit integers in "a" right by "imm8" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 IF k[j] IF imm8[7:0] > 31 dst[i+31:i] := 0 ELSE dst[i+31:i] := ZeroExtend32(a[i+31:i] >> imm8[7:0]) FI ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Shift Shift packed 32-bit integers in "a" right by "count" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 IF k[j] IF count[63:0] > 31 dst[i+31:i] := 0 ELSE dst[i+31:i] := ZeroExtend32(a[i+31:i] >> count[63:0]) FI ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Shift Shift packed 32-bit integers in "a" right by "imm8" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 IF k[j] IF imm8[7:0] > 31 dst[i+31:i] := 0 ELSE dst[i+31:i] := ZeroExtend32(a[i+31:i] >> imm8[7:0]) FI ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Shift Shift packed 64-bit integers in "a" right by "count" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k[j] IF count[63:0] > 63 dst[i+63:i] := 0 ELSE dst[i+63:i] := ZeroExtend64(a[i+63:i] >> count[63:0]) FI ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Shift Shift packed 64-bit integers in "a" right by "imm8" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k[j] IF imm8[7:0] > 63 dst[i+63:i] := 0 ELSE dst[i+63:i] := ZeroExtend64(a[i+63:i] >> imm8[7:0]) FI ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Shift Shift packed 64-bit integers in "a" right by "count" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k[j] IF count[63:0] > 63 dst[i+63:i] := 0 ELSE dst[i+63:i] := ZeroExtend64(a[i+63:i] >> count[63:0]) FI ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Shift Shift packed 64-bit integers in "a" right by "imm8" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k[j] IF imm8[7:0] > 63 dst[i+63:i] := 0 ELSE dst[i+63:i] := ZeroExtend64(a[i+63:i] >> imm8[7:0]) FI ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Shift Shift packed 64-bit integers in "a" right by "count" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k[j] IF count[63:0] > 63 dst[i+63:i] := 0 ELSE dst[i+63:i] := ZeroExtend64(a[i+63:i] >> count[63:0]) FI ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Shift Shift packed 64-bit integers in "a" right by "imm8" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k[j] IF imm8[7:0] > 63 dst[i+63:i] := 0 ELSE dst[i+63:i] := ZeroExtend64(a[i+63:i] >> imm8[7:0]) FI ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Shift Shift packed 64-bit integers in "a" right by "count" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k[j] IF count[63:0] > 63 dst[i+63:i] := 0 ELSE dst[i+63:i] := ZeroExtend64(a[i+63:i] >> count[63:0]) FI ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Shift Shift packed 64-bit integers in "a" right by "imm8" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k[j] IF imm8[7:0] > 63 dst[i+63:i] := 0 ELSE dst[i+63:i] := ZeroExtend64(a[i+63:i] >> imm8[7:0]) FI ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:128] := 0
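For the uniform-count logical shifts above, one detail is easy to miss: the full 64-bit "count" is compared against the element width, so a large count clears the lane rather than wrapping modulo 32 or 64. The scalar C model below illustrates this for the 32-bit zeromask form. It is a sketch under assumptions: `maskz_srl_epi32_model` is a hypothetical name, and the low 64 bits of "count" are passed as a plain integer.

```c
#include <assert.h>
#include <stdint.h>

/* Zeromask logical right shift of 32-bit lanes by one shared count.
   Lanes whose mask bit is clear, and every lane when count > 31, become 0. */
void maskz_srl_epi32_model(uint32_t *dst, uint8_t k, const uint32_t *a,
                           uint64_t count, int lanes)
{
    assert(lanes >= 0 && lanes <= 8);   /* k carries at most 8 mask bits here */
    for (int j = 0; j < lanes; j++) {
        if (!((k >> j) & 1u))
            dst[j] = 0;                 /* zeromask: inactive lane is cleared */
        else if (count > 31)
            dst[j] = 0;                 /* out-of-range count shifts everything out */
        else
            dst[j] = a[j] >> count;     /* unsigned shift fills with zeros */
    }
}
```

A count such as 0x100000001 therefore produces all zeros, whereas a C expression like `a >> (count & 31)` would shift by one.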
Integer AVX512VL AVX512F Shift Shift packed 32-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 IF k[j] IF count[i+31:i] < 32 dst[i+31:i] := ZeroExtend32(a[i+31:i] >> count[i+31:i]) ELSE dst[i+31:i] := 0 FI ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Shift Shift packed 32-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 IF k[j] IF count[i+31:i] < 32 dst[i+31:i] := ZeroExtend32(a[i+31:i] >> count[i+31:i]) ELSE dst[i+31:i] := 0 FI ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Shift Shift packed 32-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 IF k[j] IF count[i+31:i] < 32 dst[i+31:i] := ZeroExtend32(a[i+31:i] >> count[i+31:i]) ELSE dst[i+31:i] := 0 FI ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Shift Shift packed 32-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 IF k[j] IF count[i+31:i] < 32 dst[i+31:i] := ZeroExtend32(a[i+31:i] >> count[i+31:i]) ELSE dst[i+31:i] := 0 FI ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Shift Shift packed 64-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k[j] IF count[i+63:i] < 64 dst[i+63:i] := ZeroExtend64(a[i+63:i] >> count[i+63:i]) ELSE dst[i+63:i] := 0 FI ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Shift Shift packed 64-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k[j] IF count[i+63:i] < 64 dst[i+63:i] := ZeroExtend64(a[i+63:i] >> count[i+63:i]) ELSE dst[i+63:i] := 0 FI ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Shift Shift packed 64-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k[j] IF count[i+63:i] < 64 dst[i+63:i] := ZeroExtend64(a[i+63:i] >> count[i+63:i]) ELSE dst[i+63:i] := 0 FI ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Shift Shift packed 64-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k[j] IF count[i+63:i] < 64 dst[i+63:i] := ZeroExtend64(a[i+63:i] >> count[i+63:i]) ELSE dst[i+63:i] := 0 FI ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Arithmetic Subtract packed 32-bit integers in "b" from packed 32-bit integers in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := a[i+31:i] - b[i+31:i] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Arithmetic Subtract packed 32-bit integers in "b" from packed 32-bit integers in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := a[i+31:i] - b[i+31:i] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Arithmetic Subtract packed 32-bit integers in "b" from packed 32-bit integers in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := a[i+31:i] - b[i+31:i] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Arithmetic Subtract packed 32-bit integers in "b" from packed 32-bit integers in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := a[i+31:i] - b[i+31:i] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Arithmetic Subtract packed 64-bit integers in "b" from packed 64-bit integers in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := a[i+63:i] - b[i+63:i] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Arithmetic Subtract packed 64-bit integers in "b" from packed 64-bit integers in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := a[i+63:i] - b[i+63:i] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Arithmetic Subtract packed 64-bit integers in "b" from packed 64-bit integers in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := a[i+63:i] - b[i+63:i] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Arithmetic Subtract packed 64-bit integers in "b" from packed 64-bit integers in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := a[i+63:i] - b[i+63:i] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Logical Bitwise ternary logic that provides the capability to implement any three-operand binary function; the specific binary function is specified by value in "imm8". For each bit in each packed 32-bit integer, the corresponding bit from "src", "a", and "b" are used to form a 3 bit index into "imm8", and the value at that bit in "imm8" is written to the corresponding bit in "dst" using writemask "k" at 32-bit granularity (32-bit elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 IF k[j] FOR h := 0 to 31 index[2:0] := (src[i+h] << 2) OR (a[i+h] << 1) OR b[i+h] dst[i+h] := imm8[index[2:0]] ENDFOR ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Logical Bitwise ternary logic that provides the capability to implement any three-operand binary function; the specific binary function is specified by value in "imm8". For each bit in each packed 32-bit integer, the corresponding bit from "a", "b", and "c" are used to form a 3 bit index into "imm8", and the value at that bit in "imm8" is written to the corresponding bit in "dst" using zeromask "k" at 32-bit granularity (32-bit elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 IF k[j] FOR h := 0 to 31 index[2:0] := (a[i+h] << 2) OR (b[i+h] << 1) OR c[i+h] dst[i+h] := imm8[index[2:0]] ENDFOR ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Logical Bitwise ternary logic that provides the capability to implement any three-operand binary function; the specific binary function is specified by value in "imm8". For each bit in each packed 32-bit integer, the corresponding bit from "a", "b", and "c" are used to form a 3 bit index into "imm8", and the value at that bit in "imm8" is written to the corresponding bit in "dst". FOR j := 0 to 7 i := j*32 FOR h := 0 to 31 index[2:0] := (a[i+h] << 2) OR (b[i+h] << 1) OR c[i+h] dst[i+h] := imm8[index[2:0]] ENDFOR ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Logical Bitwise ternary logic that provides the capability to implement any three-operand binary function; the specific binary function is specified by value in "imm8". For each bit in each packed 32-bit integer, the corresponding bit from "src", "a", and "b" are used to form a 3 bit index into "imm8", and the value at that bit in "imm8" is written to the corresponding bit in "dst" using writemask "k" at 32-bit granularity (32-bit elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 IF k[j] FOR h := 0 to 31 index[2:0] := (src[i+h] << 2) OR (a[i+h] << 1) OR b[i+h] dst[i+h] := imm8[index[2:0]] ENDFOR ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Logical Bitwise ternary logic that provides the capability to implement any three-operand binary function; the specific binary function is specified by value in "imm8". For each bit in each packed 32-bit integer, the corresponding bit from "a", "b", and "c" are used to form a 3 bit index into "imm8", and the value at that bit in "imm8" is written to the corresponding bit in "dst" using zeromask "k" at 32-bit granularity (32-bit elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 IF k[j] FOR h := 0 to 31 index[2:0] := (a[i+h] << 2) OR (b[i+h] << 1) OR c[i+h] dst[i+h] := imm8[index[2:0]] ENDFOR ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Logical Bitwise ternary logic that provides the capability to implement any three-operand binary function; the specific binary function is specified by value in "imm8". For each bit in each packed 32-bit integer, the corresponding bit from "a", "b", and "c" are used to form a 3 bit index into "imm8", and the value at that bit in "imm8" is written to the corresponding bit in "dst". FOR j := 0 to 3 i := j*32 FOR h := 0 to 31 index[2:0] := (a[i+h] << 2) OR (b[i+h] << 1) OR c[i+h] dst[i+h] := imm8[index[2:0]] ENDFOR ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Logical Bitwise ternary logic that provides the capability to implement any three-operand binary function; the specific binary function is specified by value in "imm8". For each bit in each packed 64-bit integer, the corresponding bit from "src", "a", and "b" are used to form a 3 bit index into "imm8", and the value at that bit in "imm8" is written to the corresponding bit in "dst" using writemask "k" at 64-bit granularity (64-bit elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k[j] FOR h := 0 to 63 index[2:0] := (src[i+h] << 2) OR (a[i+h] << 1) OR b[i+h] dst[i+h] := imm8[index[2:0]] ENDFOR ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Logical Bitwise ternary logic that provides the capability to implement any three-operand binary function; the specific binary function is specified by value in "imm8". For each bit in each packed 64-bit integer, the corresponding bit from "a", "b", and "c" are used to form a 3 bit index into "imm8", and the value at that bit in "imm8" is written to the corresponding bit in "dst" using zeromask "k" at 64-bit granularity (64-bit elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k[j] FOR h := 0 to 63 index[2:0] := (a[i+h] << 2) OR (b[i+h] << 1) OR c[i+h] dst[i+h] := imm8[index[2:0]] ENDFOR ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Logical Bitwise ternary logic that provides the capability to implement any three-operand binary function; the specific binary function is specified by the value in "imm8". For each bit in each packed 64-bit integer, the corresponding bits from "a", "b", and "c" are used to form a 3-bit index into "imm8", and the value at that bit in "imm8" is written to the corresponding bit in "dst". FOR j := 0 to 3 i := j*64 FOR h := 0 to 63 index[2:0] := (a[i+h] << 2) OR (b[i+h] << 1) OR c[i+h] dst[i+h] := imm8[index[2:0]] ENDFOR ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Logical Bitwise ternary logic that provides the capability to implement any three-operand binary function; the specific binary function is specified by the value in "imm8". For each bit in each packed 64-bit integer, the corresponding bits from "src", "a", and "b" are used to form a 3-bit index into "imm8", and the value at that bit in "imm8" is written to the corresponding bit in "dst" using writemask "k" at 64-bit granularity (64-bit elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k[j] FOR h := 0 to 63 index[2:0] := (src[i+h] << 2) OR (a[i+h] << 1) OR b[i+h] dst[i+h] := imm8[index[2:0]] ENDFOR ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Logical Bitwise ternary logic that provides the capability to implement any three-operand binary function; the specific binary function is specified by the value in "imm8". For each bit in each packed 64-bit integer, the corresponding bits from "a", "b", and "c" are used to form a 3-bit index into "imm8", and the value at that bit in "imm8" is written to the corresponding bit in "dst" using zeromask "k" at 64-bit granularity (64-bit elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k[j] FOR h := 0 to 63 index[2:0] := (a[i+h] << 2) OR (b[i+h] << 1) OR c[i+h] dst[i+h] := imm8[index[2:0]] ENDFOR ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Logical Bitwise ternary logic that provides the capability to implement any three-operand binary function; the specific binary function is specified by the value in "imm8". For each bit in each packed 64-bit integer, the corresponding bits from "a", "b", and "c" are used to form a 3-bit index into "imm8", and the value at that bit in "imm8" is written to the corresponding bit in "dst". FOR j := 0 to 1 i := j*64 FOR h := 0 to 63 index[2:0] := (a[i+h] << 2) OR (b[i+h] << 1) OR c[i+h] dst[i+h] := imm8[index[2:0]] ENDFOR ENDFOR dst[MAX:128] := 0
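The ternary-logic pseudocode above treats "imm8" as an 8-entry truth table indexed by one bit from each of the three operands. A minimal scalar sketch of that lookup for a single element follows; the function name `ternary_logic` and the `width` parameter are illustrative assumptions, not part of any intrinsic API.

```python
def ternary_logic(a: int, b: int, c: int, imm8: int, width: int = 64) -> int:
    """Scalar model of the ternary-logic pseudocode for one packed element."""
    result = 0
    for h in range(width):
        # Bit h of a is the index MSB, bit h of b the middle bit, bit h of c the LSB.
        index = (((a >> h) & 1) << 2) | (((b >> h) & 1) << 1) | ((c >> h) & 1)
        # Bit `index` of imm8 becomes bit h of the result.
        result |= ((imm8 >> index) & 1) << h
    return result
```

Under this model, imm8 = 0x96 yields a three-way XOR, 0x80 a three-way AND, and 0xCA a bitwise select of "b" or "c" controlled by "a", since those are the truth tables the constants encode.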
Integer Mask AVX512VL AVX512F Compare Compute the bitwise AND of packed 32-bit integers in "a" and "b", producing intermediate 32-bit values, and set the corresponding bit in result mask "k" (subject to writemask "k1") if the intermediate value is non-zero. FOR j := 0 to 7 i := j*32 IF k1[j] k[j] := ((a[i+31:i] AND b[i+31:i]) != 0) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:8] := 0
Integer Mask AVX512VL AVX512F Compare Compute the bitwise AND of packed 32-bit integers in "a" and "b", producing intermediate 32-bit values, and set the corresponding bit in result mask "k" if the intermediate value is non-zero. FOR j := 0 to 7 i := j*32 k[j] := ((a[i+31:i] AND b[i+31:i]) != 0) ? 1 : 0 ENDFOR k[MAX:8] := 0
Integer Mask AVX512VL AVX512F Compare Compute the bitwise AND of packed 32-bit integers in "a" and "b", producing intermediate 32-bit values, and set the corresponding bit in result mask "k" (subject to writemask "k1") if the intermediate value is non-zero. FOR j := 0 to 3 i := j*32 IF k1[j] k[j] := ((a[i+31:i] AND b[i+31:i]) != 0) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:4] := 0
Integer Mask AVX512VL AVX512F Compare Compute the bitwise AND of packed 32-bit integers in "a" and "b", producing intermediate 32-bit values, and set the corresponding bit in result mask "k" if the intermediate value is non-zero. FOR j := 0 to 3 i := j*32 k[j] := ((a[i+31:i] AND b[i+31:i]) != 0) ? 1 : 0 ENDFOR k[MAX:4] := 0
Integer Mask AVX512VL AVX512F Compare Compute the bitwise AND of packed 64-bit integers in "a" and "b", producing intermediate 64-bit values, and set the corresponding bit in result mask "k" (subject to writemask "k1") if the intermediate value is non-zero. FOR j := 0 to 3 i := j*64 IF k1[j] k[j] := ((a[i+63:i] AND b[i+63:i]) != 0) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:4] := 0
Integer Mask AVX512VL AVX512F Compare Compute the bitwise AND of packed 64-bit integers in "a" and "b", producing intermediate 64-bit values, and set the corresponding bit in result mask "k" if the intermediate value is non-zero. FOR j := 0 to 3 i := j*64 k[j] := ((a[i+63:i] AND b[i+63:i]) != 0) ? 1 : 0 ENDFOR k[MAX:4] := 0
Integer Mask AVX512VL AVX512F Compare Compute the bitwise AND of packed 64-bit integers in "a" and "b", producing intermediate 64-bit values, and set the corresponding bit in result mask "k" (subject to writemask "k1") if the intermediate value is non-zero. FOR j := 0 to 1 i := j*64 IF k1[j] k[j] := ((a[i+63:i] AND b[i+63:i]) != 0) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:2] := 0
Integer Mask AVX512VL AVX512F Compare Compute the bitwise AND of packed 64-bit integers in "a" and "b", producing intermediate 64-bit values, and set the corresponding bit in result mask "k" if the intermediate value is non-zero. FOR j := 0 to 1 i := j*64 k[j] := ((a[i+63:i] AND b[i+63:i]) != 0) ? 1 : 0 ENDFOR k[MAX:2] := 0
Integer Mask AVX512VL AVX512F Compare Compute the bitwise AND of packed 32-bit integers in "a" and "b", producing intermediate 32-bit values, and set the corresponding bit in result mask "k" (subject to writemask "k1") if the intermediate value is zero. FOR j := 0 to 7 i := j*32 IF k1[j] k[j] := ((a[i+31:i] AND b[i+31:i]) == 0) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:8] := 0
Integer Mask AVX512VL AVX512F Compare Compute the bitwise AND of packed 32-bit integers in "a" and "b", producing intermediate 32-bit values, and set the corresponding bit in result mask "k" if the intermediate value is zero. FOR j := 0 to 7 i := j*32 k[j] := ((a[i+31:i] AND b[i+31:i]) == 0) ? 1 : 0 ENDFOR k[MAX:8] := 0
Integer Mask AVX512VL AVX512F Compare Compute the bitwise AND of packed 32-bit integers in "a" and "b", producing intermediate 32-bit values, and set the corresponding bit in result mask "k" (subject to writemask "k1") if the intermediate value is zero. FOR j := 0 to 3 i := j*32 IF k1[j] k[j] := ((a[i+31:i] AND b[i+31:i]) == 0) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:4] := 0
Integer Mask AVX512VL AVX512F Compare Compute the bitwise AND of packed 32-bit integers in "a" and "b", producing intermediate 32-bit values, and set the corresponding bit in result mask "k" if the intermediate value is zero. FOR j := 0 to 3 i := j*32 k[j] := ((a[i+31:i] AND b[i+31:i]) == 0) ? 1 : 0 ENDFOR k[MAX:4] := 0
Integer Mask AVX512VL AVX512F Compare Compute the bitwise AND of packed 64-bit integers in "a" and "b", producing intermediate 64-bit values, and set the corresponding bit in result mask "k" (subject to writemask "k1") if the intermediate value is zero. FOR j := 0 to 3 i := j*64 IF k1[j] k[j] := ((a[i+63:i] AND b[i+63:i]) == 0) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:4] := 0
Integer Mask AVX512VL AVX512F Compare Compute the bitwise AND of packed 64-bit integers in "a" and "b", producing intermediate 64-bit values, and set the corresponding bit in result mask "k" if the intermediate value is zero. FOR j := 0 to 3 i := j*64 k[j] := ((a[i+63:i] AND b[i+63:i]) == 0) ? 1 : 0 ENDFOR k[MAX:4] := 0
Integer Mask AVX512VL AVX512F Compare Compute the bitwise AND of packed 64-bit integers in "a" and "b", producing intermediate 64-bit values, and set the corresponding bit in result mask "k" (subject to writemask "k1") if the intermediate value is zero. FOR j := 0 to 1 i := j*64 IF k1[j] k[j] := ((a[i+63:i] AND b[i+63:i]) == 0) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:2] := 0
Integer Mask AVX512VL AVX512F Compare Compute the bitwise AND of packed 64-bit integers in "a" and "b", producing intermediate 64-bit values, and set the corresponding bit in result mask "k" if the intermediate value is zero. FOR j := 0 to 1 i := j*64 k[j] := ((a[i+63:i] AND b[i+63:i]) == 0) ? 1 : 0 ENDFOR k[MAX:2] := 0
Integer AVX512VL AVX512F Miscellaneous Unpack and interleave 32-bit integers from the high half of each 128-bit lane in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). DEFINE INTERLEAVE_HIGH_DWORDS(src1[127:0], src2[127:0]) { dst[31:0] := src1[95:64] dst[63:32] := src2[95:64] dst[95:64] := src1[127:96] dst[127:96] := src2[127:96] RETURN dst[127:0] } tmp_dst[127:0] := INTERLEAVE_HIGH_DWORDS(a[127:0], b[127:0]) tmp_dst[255:128] := INTERLEAVE_HIGH_DWORDS(a[255:128], b[255:128]) FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := tmp_dst[i+31:i] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Miscellaneous Unpack and interleave 32-bit integers from the high half of each 128-bit lane in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). DEFINE INTERLEAVE_HIGH_DWORDS(src1[127:0], src2[127:0]) { dst[31:0] := src1[95:64] dst[63:32] := src2[95:64] dst[95:64] := src1[127:96] dst[127:96] := src2[127:96] RETURN dst[127:0] } tmp_dst[127:0] := INTERLEAVE_HIGH_DWORDS(a[127:0], b[127:0]) tmp_dst[255:128] := INTERLEAVE_HIGH_DWORDS(a[255:128], b[255:128]) FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := tmp_dst[i+31:i] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Miscellaneous Unpack and interleave 32-bit integers from the high half of "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). DEFINE INTERLEAVE_HIGH_DWORDS(src1[127:0], src2[127:0]) { dst[31:0] := src1[95:64] dst[63:32] := src2[95:64] dst[95:64] := src1[127:96] dst[127:96] := src2[127:96] RETURN dst[127:0] } tmp_dst[127:0] := INTERLEAVE_HIGH_DWORDS(a[127:0], b[127:0]) FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := tmp_dst[i+31:i] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Miscellaneous Unpack and interleave 32-bit integers from the high half of "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). DEFINE INTERLEAVE_HIGH_DWORDS(src1[127:0], src2[127:0]) { dst[31:0] := src1[95:64] dst[63:32] := src2[95:64] dst[95:64] := src1[127:96] dst[127:96] := src2[127:96] RETURN dst[127:0] } tmp_dst[127:0] := INTERLEAVE_HIGH_DWORDS(a[127:0], b[127:0]) FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := tmp_dst[i+31:i] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Miscellaneous Unpack and interleave 64-bit integers from the high half of each 128-bit lane in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). DEFINE INTERLEAVE_HIGH_QWORDS(src1[127:0], src2[127:0]) { dst[63:0] := src1[127:64] dst[127:64] := src2[127:64] RETURN dst[127:0] } tmp_dst[127:0] := INTERLEAVE_HIGH_QWORDS(a[127:0], b[127:0]) tmp_dst[255:128] := INTERLEAVE_HIGH_QWORDS(a[255:128], b[255:128]) FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := tmp_dst[i+63:i] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Miscellaneous Unpack and interleave 64-bit integers from the high half of each 128-bit lane in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). DEFINE INTERLEAVE_HIGH_QWORDS(src1[127:0], src2[127:0]) { dst[63:0] := src1[127:64] dst[127:64] := src2[127:64] RETURN dst[127:0] } tmp_dst[127:0] := INTERLEAVE_HIGH_QWORDS(a[127:0], b[127:0]) tmp_dst[255:128] := INTERLEAVE_HIGH_QWORDS(a[255:128], b[255:128]) FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := tmp_dst[i+63:i] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Miscellaneous Unpack and interleave 64-bit integers from the high half of "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). DEFINE INTERLEAVE_HIGH_QWORDS(src1[127:0], src2[127:0]) { dst[63:0] := src1[127:64] dst[127:64] := src2[127:64] RETURN dst[127:0] } tmp_dst[127:0] := INTERLEAVE_HIGH_QWORDS(a[127:0], b[127:0]) FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := tmp_dst[i+63:i] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Miscellaneous Unpack and interleave 64-bit integers from the high half of "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). DEFINE INTERLEAVE_HIGH_QWORDS(src1[127:0], src2[127:0]) { dst[63:0] := src1[127:64] dst[127:64] := src2[127:64] RETURN dst[127:0] } tmp_dst[127:0] := INTERLEAVE_HIGH_QWORDS(a[127:0], b[127:0]) FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := tmp_dst[i+63:i] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Miscellaneous Unpack and interleave 32-bit integers from the low half of each 128-bit lane in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). DEFINE INTERLEAVE_DWORDS(src1[127:0], src2[127:0]) { dst[31:0] := src1[31:0] dst[63:32] := src2[31:0] dst[95:64] := src1[63:32] dst[127:96] := src2[63:32] RETURN dst[127:0] } tmp_dst[127:0] := INTERLEAVE_DWORDS(a[127:0], b[127:0]) tmp_dst[255:128] := INTERLEAVE_DWORDS(a[255:128], b[255:128]) FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := tmp_dst[i+31:i] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Miscellaneous Unpack and interleave 32-bit integers from the low half of each 128-bit lane in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). DEFINE INTERLEAVE_DWORDS(src1[127:0], src2[127:0]) { dst[31:0] := src1[31:0] dst[63:32] := src2[31:0] dst[95:64] := src1[63:32] dst[127:96] := src2[63:32] RETURN dst[127:0] } tmp_dst[127:0] := INTERLEAVE_DWORDS(a[127:0], b[127:0]) tmp_dst[255:128] := INTERLEAVE_DWORDS(a[255:128], b[255:128]) FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := tmp_dst[i+31:i] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Miscellaneous Unpack and interleave 32-bit integers from the low half of "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). DEFINE INTERLEAVE_DWORDS(src1[127:0], src2[127:0]) { dst[31:0] := src1[31:0] dst[63:32] := src2[31:0] dst[95:64] := src1[63:32] dst[127:96] := src2[63:32] RETURN dst[127:0] } tmp_dst[127:0] := INTERLEAVE_DWORDS(a[127:0], b[127:0]) FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := tmp_dst[i+31:i] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Miscellaneous Unpack and interleave 32-bit integers from the low half of "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). DEFINE INTERLEAVE_DWORDS(src1[127:0], src2[127:0]) { dst[31:0] := src1[31:0] dst[63:32] := src2[31:0] dst[95:64] := src1[63:32] dst[127:96] := src2[63:32] RETURN dst[127:0] } tmp_dst[127:0] := INTERLEAVE_DWORDS(a[127:0], b[127:0]) FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := tmp_dst[i+31:i] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Miscellaneous Unpack and interleave 64-bit integers from the low half of each 128-bit lane in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). DEFINE INTERLEAVE_QWORDS(src1[127:0], src2[127:0]) { dst[63:0] := src1[63:0] dst[127:64] := src2[63:0] RETURN dst[127:0] } tmp_dst[127:0] := INTERLEAVE_QWORDS(a[127:0], b[127:0]) tmp_dst[255:128] := INTERLEAVE_QWORDS(a[255:128], b[255:128]) FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := tmp_dst[i+63:i] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Miscellaneous Unpack and interleave 64-bit integers from the low half of each 128-bit lane in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). DEFINE INTERLEAVE_QWORDS(src1[127:0], src2[127:0]) { dst[63:0] := src1[63:0] dst[127:64] := src2[63:0] RETURN dst[127:0] } tmp_dst[127:0] := INTERLEAVE_QWORDS(a[127:0], b[127:0]) tmp_dst[255:128] := INTERLEAVE_QWORDS(a[255:128], b[255:128]) FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := tmp_dst[i+63:i] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Miscellaneous Unpack and interleave 64-bit integers from the low half of "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). DEFINE INTERLEAVE_QWORDS(src1[127:0], src2[127:0]) { dst[63:0] := src1[63:0] dst[127:64] := src2[63:0] RETURN dst[127:0] } tmp_dst[127:0] := INTERLEAVE_QWORDS(a[127:0], b[127:0]) FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := tmp_dst[i+63:i] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Miscellaneous Unpack and interleave 64-bit integers from the low half of "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). DEFINE INTERLEAVE_QWORDS(src1[127:0], src2[127:0]) { dst[63:0] := src1[63:0] dst[127:64] := src2[63:0] RETURN dst[127:0] } tmp_dst[127:0] := INTERLEAVE_QWORDS(a[127:0], b[127:0]) FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := tmp_dst[i+63:i] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:128] := 0
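The INTERLEAVE_* helpers in the unpack entries above all follow one pattern: within each 128-bit lane, take either the low or the high half of "a" and "b" and alternate their elements. A minimal sketch of that pattern before any mask is applied, assuming vectors are Python lists with the lowest element first; `unpack` and `lane_elems` are illustrative names.

```python
def unpack(a, b, high, lane_elems):
    """Interleave elements of a and b per 128-bit lane.

    lane_elems is the number of elements in one 128-bit lane
    (4 for 32-bit elements, 2 for 64-bit elements).
    """
    out = []
    half = lane_elems // 2
    for base in range(0, len(a), lane_elems):
        # Start at the high or low half of this lane.
        start = base + (half if high else 0)
        for n in range(start, start + half):
            out.extend([a[n], b[n]])
    return out
```

For a 256-bit vector of 32-bit elements, `unpack(a, b, high=True, lane_elems=4)` picks elements 2 and 3 of each lane, which matches INTERLEAVE_HIGH_DWORDS applied to bits 127:0 and 255:128.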
Integer AVX512VL AVX512F Logical Compute the bitwise XOR of packed 32-bit integers in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := a[i+31:i] XOR b[i+31:i] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Logical Compute the bitwise XOR of packed 32-bit integers in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := a[i+31:i] XOR b[i+31:i] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Logical Compute the bitwise XOR of packed 32-bit integers in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := a[i+31:i] XOR b[i+31:i] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Logical Compute the bitwise XOR of packed 32-bit integers in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := a[i+31:i] XOR b[i+31:i] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Logical Compute the bitwise XOR of packed 64-bit integers in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := a[i+63:i] XOR b[i+63:i] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Logical Compute the bitwise XOR of packed 64-bit integers in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := a[i+63:i] XOR b[i+63:i] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Logical Compute the bitwise XOR of packed 64-bit integers in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := a[i+63:i] XOR b[i+63:i] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Logical Compute the bitwise XOR of packed 64-bit integers in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := a[i+63:i] XOR b[i+63:i] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:128] := 0
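The writemask and zeromask forms of the XOR entries above share the same computation and differ only in what fills the elements whose mask bit is clear. A short sketch of that difference, assuming list-based vectors with the lowest element first; `masked_xor` is a hypothetical helper.

```python
def masked_xor(a, b, k, src=None):
    """Per-element XOR under mask k.

    src given -> writemask form (inactive elements copied from src).
    src None  -> zeromask form (inactive elements set to 0).
    """
    dst = []
    for j, (x, y) in enumerate(zip(a, b)):
        if (k >> j) & 1:
            dst.append(x ^ y)
        elif src is not None:
            dst.append(src[j])
        else:
            dst.append(0)
    return dst
```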
Floating Point AVX512VL AVX512F Arithmetic Compute the approximate reciprocal of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14. FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := (1.0 / a[i+63:i]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Arithmetic Compute the approximate reciprocal of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14. FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := (1.0 / a[i+63:i]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Arithmetic Compute the approximate reciprocal of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst". The maximum relative error for this approximation is less than 2^-14. FOR j := 0 to 3 i := j*64 dst[i+63:i] := (1.0 / a[i+63:i]) ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Arithmetic Compute the approximate reciprocal of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14. FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := (1.0 / a[i+63:i]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Arithmetic Compute the approximate reciprocal of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14. FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := (1.0 / a[i+63:i]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Arithmetic Compute the approximate reciprocal of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst". The maximum relative error for this approximation is less than 2^-14. FOR j := 0 to 1 i := j*64 dst[i+63:i] := (1.0 / a[i+63:i]) ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Arithmetic Compute the approximate reciprocal of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14. FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := (1.0 / a[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Arithmetic Compute the approximate reciprocal of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14. FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := (1.0 / a[i+31:i]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Arithmetic Compute the approximate reciprocal of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst". The maximum relative error for this approximation is less than 2^-14. FOR j := 0 to 7 i := j*32 dst[i+31:i] := (1.0 / a[i+31:i]) ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Arithmetic Compute the approximate reciprocal of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14. FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := (1.0 / a[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Arithmetic Compute the approximate reciprocal of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14. FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := (1.0 / a[i+31:i]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Arithmetic Compute the approximate reciprocal of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst". The maximum relative error for this approximation is less than 2^-14. FOR j := 0 to 3 i := j*32 dst[i+31:i] := (1.0 / a[i+31:i]) ENDFOR dst[MAX:128] := 0
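The reciprocal entries above state a bound on relative error rather than an exact result, so the pseudocode's `1.0 / a` is the value being approximated. A minimal sketch of what that bound means for one element; `within_rcp14_bound` is a hypothetical checker and does not model how the hardware produces its approximation.

```python
def within_rcp14_bound(approx: float, a: float) -> bool:
    """True if approx is within relative error 2^-14 of the exact reciprocal of a."""
    exact = 1.0 / a
    return abs(approx - exact) / abs(exact) < 2.0 ** -14
```

A helper like this could be used to validate results against a double-precision reference when testing code that relies on the approximate instructions.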
Floating Point AVX512VL AVX512F Miscellaneous Round packed double-precision (64-bit) floating-point elements in "a" to the number of fraction bits specified by "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [round_imm_note] DEFINE RoundScaleFP64(src1[63:0], imm8[7:0]) { m[63:0] := FP64(imm8[7:4]) // number of fraction bits after the binary point to be preserved tmp[63:0] := POW(2.0, -m) * ROUND(POW(2.0, m) * src1[63:0], imm8[3:0]) IF IsInf(tmp[63:0]) tmp[63:0] := src1[63:0] FI RETURN tmp[63:0] } FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := RoundScaleFP64(a[i+63:i], imm8[7:0]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Miscellaneous Round packed double-precision (64-bit) floating-point elements in "a" to the number of fraction bits specified by "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [round_imm_note] DEFINE RoundScaleFP64(src1[63:0], imm8[7:0]) { m[63:0] := FP64(imm8[7:4]) // number of fraction bits after the binary point to be preserved tmp[63:0] := POW(2.0, -m) * ROUND(POW(2.0, m) * src1[63:0], imm8[3:0]) IF IsInf(tmp[63:0]) tmp[63:0] := src1[63:0] FI RETURN tmp[63:0] } FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := RoundScaleFP64(a[i+63:i], imm8[7:0]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Miscellaneous Round packed double-precision (64-bit) floating-point elements in "a" to the number of fraction bits specified by "imm8", and store the results in "dst". [round_imm_note] DEFINE RoundScaleFP64(src1[63:0], imm8[7:0]) { m[63:0] := FP64(imm8[7:4]) // number of fraction bits after the binary point to be preserved tmp[63:0] := POW(2.0, -m) * ROUND(POW(2.0, m) * src1[63:0], imm8[3:0]) IF IsInf(tmp[63:0]) tmp[63:0] := src1[63:0] FI RETURN tmp[63:0] } FOR j := 0 to 3 i := j*64 dst[i+63:i] := RoundScaleFP64(a[i+63:i], imm8[7:0]) ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Miscellaneous Round packed double-precision (64-bit) floating-point elements in "a" to the number of fraction bits specified by "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [round_imm_note] DEFINE RoundScaleFP64(src1[63:0], imm8[7:0]) { m[63:0] := FP64(imm8[7:4]) // number of fraction bits after the binary point to be preserved tmp[63:0] := POW(2.0, -m) * ROUND(POW(2.0, m) * src1[63:0], imm8[3:0]) IF IsInf(tmp[63:0]) tmp[63:0] := src1[63:0] FI RETURN tmp[63:0] } FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := RoundScaleFP64(a[i+63:i], imm8[7:0]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Miscellaneous Round packed double-precision (64-bit) floating-point elements in "a" to the number of fraction bits specified by "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [round_imm_note] DEFINE RoundScaleFP64(src1[63:0], imm8[7:0]) { m[63:0] := FP64(imm8[7:4]) // number of fraction bits after the binary point to be preserved tmp[63:0] := POW(2.0, -m) * ROUND(POW(2.0, m) * src1[63:0], imm8[3:0]) IF IsInf(tmp[63:0]) tmp[63:0] := src1[63:0] FI RETURN tmp[63:0] } FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := RoundScaleFP64(a[i+63:i], imm8[7:0]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Miscellaneous Round packed double-precision (64-bit) floating-point elements in "a" to the number of fraction bits specified by "imm8", and store the results in "dst". [round_imm_note] DEFINE RoundScaleFP64(src1[63:0], imm8[7:0]) { m[63:0] := FP64(imm8[7:4]) // number of fraction bits after the binary point to be preserved tmp[63:0] := POW(2.0, -m) * ROUND(POW(2.0, m) * src1[63:0], imm8[3:0]) IF IsInf(tmp[63:0]) tmp[63:0] := src1[63:0] FI RETURN tmp[63:0] } FOR j := 0 to 1 i := j*64 dst[i+63:i] := RoundScaleFP64(a[i+63:i], imm8[7:0]) ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Miscellaneous Round packed single-precision (32-bit) floating-point elements in "a" to the number of fraction bits specified by "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [round_imm_note] DEFINE RoundScaleFP32(src1[31:0], imm8[7:0]) { m[31:0] := FP32(imm8[7:4]) // number of fraction bits after the binary point to be preserved tmp[31:0] := POW(FP32(2.0), -m) * ROUND(POW(FP32(2.0), m) * src1[31:0], imm8[3:0]) IF IsInf(tmp[31:0]) tmp[31:0] := src1[31:0] FI RETURN tmp[31:0] } FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := RoundScaleFP32(a[i+31:i], imm8[7:0]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Miscellaneous Round packed single-precision (32-bit) floating-point elements in "a" to the number of fraction bits specified by "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [round_imm_note] DEFINE RoundScaleFP32(src1[31:0], imm8[7:0]) { m[31:0] := FP32(imm8[7:4]) // number of fraction bits after the binary point to be preserved tmp[31:0] := POW(FP32(2.0), -m) * ROUND(POW(FP32(2.0), m) * src1[31:0], imm8[3:0]) IF IsInf(tmp[31:0]) tmp[31:0] := src1[31:0] FI RETURN tmp[31:0] } FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := RoundScaleFP32(a[i+31:i], imm8[7:0]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Miscellaneous Round packed single-precision (32-bit) floating-point elements in "a" to the number of fraction bits specified by "imm8", and store the results in "dst". [round_imm_note] DEFINE RoundScaleFP32(src1[31:0], imm8[7:0]) { m[31:0] := FP32(imm8[7:4]) // number of fraction bits after the binary point to be preserved tmp[31:0] := POW(FP32(2.0), -m) * ROUND(POW(FP32(2.0), m) * src1[31:0], imm8[3:0]) IF IsInf(tmp[31:0]) tmp[31:0] := src1[31:0] FI RETURN tmp[31:0] } FOR j := 0 to 7 i := j*32 dst[i+31:i] := RoundScaleFP32(a[i+31:i], imm8[7:0]) ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Miscellaneous Round packed single-precision (32-bit) floating-point elements in "a" to the number of fraction bits specified by "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [round_imm_note] DEFINE RoundScaleFP32(src1[31:0], imm8[7:0]) { m[31:0] := FP32(imm8[7:4]) // number of fraction bits after the binary point to be preserved tmp[31:0] := POW(FP32(2.0), -m) * ROUND(POW(FP32(2.0), m) * src1[31:0], imm8[3:0]) IF IsInf(tmp[31:0]) tmp[31:0] := src1[31:0] FI RETURN tmp[31:0] } FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := RoundScaleFP32(a[i+31:i], imm8[7:0]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Miscellaneous Round packed single-precision (32-bit) floating-point elements in "a" to the number of fraction bits specified by "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [round_imm_note] DEFINE RoundScaleFP32(src1[31:0], imm8[7:0]) { m[31:0] := FP32(imm8[7:4]) // number of fraction bits after the binary point to be preserved tmp[31:0] := POW(FP32(2.0), -m) * ROUND(POW(FP32(2.0), m) * src1[31:0], imm8[3:0]) IF IsInf(tmp[31:0]) tmp[31:0] := src1[31:0] FI RETURN tmp[31:0] } FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := RoundScaleFP32(a[i+31:i], imm8[7:0]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Miscellaneous Round packed single-precision (32-bit) floating-point elements in "a" to the number of fraction bits specified by "imm8", and store the results in "dst". [round_imm_note] DEFINE RoundScaleFP32(src1[31:0], imm8[7:0]) { m[31:0] := FP32(imm8[7:4]) // number of fraction bits after the binary point to be preserved tmp[31:0] := POW(FP32(2.0), -m) * ROUND(POW(FP32(2.0), m) * src1[31:0], imm8[3:0]) IF IsInf(tmp[31:0]) tmp[31:0] := src1[31:0] FI RETURN tmp[31:0] } FOR j := 0 to 3 i := j*32 dst[i+31:i] := RoundScaleFP32(a[i+31:i], imm8[7:0]) ENDFOR dst[MAX:128] := 0
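The RoundScale pseudocode above scales by 2^m, rounds to an integer, and scales back, where m comes from imm8[7:4]. A scalar sketch for the FP64 case follows. It assumes imm8[1:0] selects the rounding mode as 0 = nearest-even, 1 = down, 2 = up, 3 = truncate, and it ignores the remaining control bits of imm8[3:0]; that encoding is an assumption here because the source only references it through [round_imm_note]. The name `round_scale` is illustrative.

```python
import math

def round_scale(x: float, imm8: int) -> float:
    """Scalar model of RoundScaleFP64 for one element (Python floats are FP64)."""
    m = (imm8 >> 4) & 0xF   # imm8[7:4]: number of fraction bits to keep
    mode = imm8 & 0x3       # imm8[1:0]: assumed rounding-mode encoding
    scaled = x * 2.0 ** m
    if not math.isfinite(scaled):
        # Covers the IsInf(tmp) branch of the pseudocode; NaN is passed through too.
        return x
    if mode == 0:
        r = round(scaled)       # Python's round() is round-half-to-even
    elif mode == 1:
        r = math.floor(scaled)
    elif mode == 2:
        r = math.ceil(scaled)
    else:
        r = math.trunc(scaled)
    tmp = r * 2.0 ** -m
    # Keep the sign of a zero result, which the integer round-trip would lose.
    return math.copysign(0.0, x) if tmp == 0.0 else tmp
```

With imm8 = 0x10 (one fraction bit, nearest-even), 1.26 becomes 1.5; with imm8 = 0x11 (one fraction bit, round down) it becomes 1.0. The FP32 entries follow the same steps at single precision, which this sketch does not reproduce.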
Floating Point AVX512VL AVX512F Arithmetic Compute the approximate reciprocal square root of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14. FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := (1.0 / SQRT(a[i+63:i])) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Arithmetic Compute the approximate reciprocal square root of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14. FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := (1.0 / SQRT(a[i+63:i])) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Arithmetic Compute the approximate reciprocal square root of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14. FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := (1.0 / SQRT(a[i+63:i])) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Arithmetic Compute the approximate reciprocal square root of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14. FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := (1.0 / SQRT(a[i+63:i])) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Arithmetic Compute the approximate reciprocal square root of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14. FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := (1.0 / SQRT(a[i+31:i])) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Arithmetic Compute the approximate reciprocal square root of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14. FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := (1.0 / SQRT(a[i+31:i])) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Arithmetic Compute the approximate reciprocal square root of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14. FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := (1.0 / SQRT(a[i+31:i])) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Arithmetic Compute the approximate reciprocal square root of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14. FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := (1.0 / SQRT(a[i+31:i])) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:128] := 0
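The writemask and zeromask variants above differ only in what an inactive lane receives. The following sketch models that on Python lists; it computes an exact 1/sqrt rather than the 2^-14 approximation, and the name `mask_rsqrt` and the list-based vector representation are assumptions made for illustration.

```python
import math

def mask_rsqrt(a, k, src=None):
    """Per-lane 1/sqrt(a[j]) where bit j of k is set.

    Inactive lanes are copied from src (writemask) or set to 0.0
    when src is None (zeromask).
    """
    out = []
    for j, x in enumerate(a):
        if (k >> j) & 1:
            out.append(1.0 / math.sqrt(x))
        else:
            out.append(src[j] if src is not None else 0.0)
    return out

a = [4.0, 16.0, 1.0, 0.25]
print(mask_rsqrt(a, 0b0101, src=[9.0] * 4))  # [0.5, 9.0, 1.0, 9.0]
print(mask_rsqrt(a, 0b0101))                 # [0.5, 0.0, 1.0, 0.0]
```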
Floating Point AVX512VL AVX512F Miscellaneous Scale the packed double-precision (64-bit) floating-point elements in "a" using values from "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). DEFINE SCALE(src1, src2) { IF (src2 == NaN) IF (src2 == SNaN) RETURN QNAN(src2) FI ELSE IF (src1 == NaN) IF (src1 == SNaN) RETURN QNAN(src1) FI IF (src2 != INF) RETURN QNAN(src1) FI ELSE tmp_src2 := src2 tmp_src1 := src1 IF (IS_DENORMAL(src2) AND MXCSR.DAZ) tmp_src2 := 0 FI IF (IS_DENORMAL(src1) AND MXCSR.DAZ) tmp_src1 := 0 FI FI dst[63:0] := tmp_src1[63:0] * POW(2.0, FLOOR(tmp_src2[63:0])) RETURN dst[63:0] } FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := SCALE(a[i+63:i], b[i+63:i]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Miscellaneous Scale the packed double-precision (64-bit) floating-point elements in "a" using values from "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). DEFINE SCALE(src1, src2) { IF (src2 == NaN) IF (src2 == SNaN) RETURN QNAN(src2) FI ELSE IF (src1 == NaN) IF (src1 == SNaN) RETURN QNAN(src1) FI IF (src2 != INF) RETURN QNAN(src1) FI ELSE tmp_src2 := src2 tmp_src1 := src1 IF (IS_DENORMAL(src2) AND MXCSR.DAZ) tmp_src2 := 0 FI IF (IS_DENORMAL(src1) AND MXCSR.DAZ) tmp_src1 := 0 FI FI dst[63:0] := tmp_src1[63:0] * POW(2.0, FLOOR(tmp_src2[63:0])) RETURN dst[63:0] } FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := SCALE(a[i+63:i], b[i+63:i]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Miscellaneous Scale the packed double-precision (64-bit) floating-point elements in "a" using values from "b", and store the results in "dst". DEFINE SCALE(src1, src2) { IF (src2 == NaN) IF (src2 == SNaN) RETURN QNAN(src2) FI ELSE IF (src1 == NaN) IF (src1 == SNaN) RETURN QNAN(src1) FI IF (src2 != INF) RETURN QNAN(src1) FI ELSE tmp_src2 := src2 tmp_src1 := src1 IF (IS_DENORMAL(src2) AND MXCSR.DAZ) tmp_src2 := 0 FI IF (IS_DENORMAL(src1) AND MXCSR.DAZ) tmp_src1 := 0 FI FI dst[63:0] := tmp_src1[63:0] * POW(2.0, FLOOR(tmp_src2[63:0])) RETURN dst[63:0] } FOR j := 0 to 3 i := j*64 dst[i+63:i] := SCALE(a[i+63:i], b[i+63:i]) ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Miscellaneous Scale the packed double-precision (64-bit) floating-point elements in "a" using values from "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). DEFINE SCALE(src1, src2) { IF (src2 == NaN) IF (src2 == SNaN) RETURN QNAN(src2) FI ELSE IF (src1 == NaN) IF (src1 == SNaN) RETURN QNAN(src1) FI IF (src2 != INF) RETURN QNAN(src1) FI ELSE tmp_src2 := src2 tmp_src1 := src1 IF (IS_DENORMAL(src2) AND MXCSR.DAZ) tmp_src2 := 0 FI IF (IS_DENORMAL(src1) AND MXCSR.DAZ) tmp_src1 := 0 FI FI dst[63:0] := tmp_src1[63:0] * POW(2.0, FLOOR(tmp_src2[63:0])) RETURN dst[63:0] } FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := SCALE(a[i+63:i], b[i+63:i]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Miscellaneous Scale the packed double-precision (64-bit) floating-point elements in "a" using values from "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). DEFINE SCALE(src1, src2) { IF (src2 == NaN) IF (src2 == SNaN) RETURN QNAN(src2) FI ELSE IF (src1 == NaN) IF (src1 == SNaN) RETURN QNAN(src1) FI IF (src2 != INF) RETURN QNAN(src1) FI ELSE tmp_src2 := src2 tmp_src1 := src1 IF (IS_DENORMAL(src2) AND MXCSR.DAZ) tmp_src2 := 0 FI IF (IS_DENORMAL(src1) AND MXCSR.DAZ) tmp_src1 := 0 FI FI dst[63:0] := tmp_src1[63:0] * POW(2.0, FLOOR(tmp_src2[63:0])) RETURN dst[63:0] } FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := SCALE(a[i+63:i], b[i+63:i]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Miscellaneous Scale the packed double-precision (64-bit) floating-point elements in "a" using values from "b", and store the results in "dst". DEFINE SCALE(src1, src2) { IF (src2 == NaN) IF (src2 == SNaN) RETURN QNAN(src2) FI ELSE IF (src1 == NaN) IF (src1 == SNaN) RETURN QNAN(src1) FI IF (src2 != INF) RETURN QNAN(src1) FI ELSE tmp_src2 := src2 tmp_src1 := src1 IF (IS_DENORMAL(src2) AND MXCSR.DAZ) tmp_src2 := 0 FI IF (IS_DENORMAL(src1) AND MXCSR.DAZ) tmp_src1 := 0 FI FI dst[63:0] := tmp_src1[63:0] * POW(2.0, FLOOR(tmp_src2[63:0])) RETURN dst[63:0] } FOR j := 0 to 1 i := j*64 dst[i+63:i] := SCALE(a[i+63:i], b[i+63:i]) ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Miscellaneous Scale the packed single-precision (32-bit) floating-point elements in "a" using values from "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). DEFINE SCALE(src1, src2) { IF (src2 == NaN) IF (src2 == SNaN) RETURN QNAN(src2) FI ELSE IF (src1 == NaN) IF (src1 == SNaN) RETURN QNAN(src1) FI IF (src2 != INF) RETURN QNAN(src1) FI ELSE tmp_src2 := src2 tmp_src1 := src1 IF (IS_DENORMAL(src2) AND MXCSR.DAZ) tmp_src2 := 0 FI IF (IS_DENORMAL(src1) AND MXCSR.DAZ) tmp_src1 := 0 FI FI dst[31:0] := tmp_src1[31:0] * POW(2.0, FLOOR(tmp_src2[31:0])) RETURN dst[31:0] } FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := SCALE(a[i+31:i], b[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Miscellaneous Scale the packed single-precision (32-bit) floating-point elements in "a" using values from "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). DEFINE SCALE(src1, src2) { IF (src2 == NaN) IF (src2 == SNaN) RETURN QNAN(src2) FI ELSE IF (src1 == NaN) IF (src1 == SNaN) RETURN QNAN(src1) FI IF (src2 != INF) RETURN QNAN(src1) FI ELSE tmp_src2 := src2 tmp_src1 := src1 IF (IS_DENORMAL(src2) AND MXCSR.DAZ) tmp_src2 := 0 FI IF (IS_DENORMAL(src1) AND MXCSR.DAZ) tmp_src1 := 0 FI FI dst[31:0] := tmp_src1[31:0] * POW(2.0, FLOOR(tmp_src2[31:0])) RETURN dst[31:0] } FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := SCALE(a[i+31:i], b[i+31:i]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Miscellaneous Scale the packed single-precision (32-bit) floating-point elements in "a" using values from "b", and store the results in "dst". DEFINE SCALE(src1, src2) { IF (src2 == NaN) IF (src2 == SNaN) RETURN QNAN(src2) FI ELSE IF (src1 == NaN) IF (src1 == SNaN) RETURN QNAN(src1) FI IF (src2 != INF) RETURN QNAN(src1) FI ELSE tmp_src2 := src2 tmp_src1 := src1 IF (IS_DENORMAL(src2) AND MXCSR.DAZ) tmp_src2 := 0 FI IF (IS_DENORMAL(src1) AND MXCSR.DAZ) tmp_src1 := 0 FI FI dst[31:0] := tmp_src1[31:0] * POW(2.0, FLOOR(tmp_src2[31:0])) RETURN dst[31:0] } FOR j := 0 to 7 i := j*32 dst[i+31:i] := SCALE(a[i+31:i], b[i+31:i]) ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Miscellaneous Scale the packed single-precision (32-bit) floating-point elements in "a" using values from "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). DEFINE SCALE(src1, src2) { IF (src2 == NaN) IF (src2 == SNaN) RETURN QNAN(src2) FI ELSE IF (src1 == NaN) IF (src1 == SNaN) RETURN QNAN(src1) FI IF (src2 != INF) RETURN QNAN(src1) FI ELSE tmp_src2 := src2 tmp_src1 := src1 IF (IS_DENORMAL(src2) AND MXCSR.DAZ) tmp_src2 := 0 FI IF (IS_DENORMAL(src1) AND MXCSR.DAZ) tmp_src1 := 0 FI FI dst[31:0] := tmp_src1[31:0] * POW(2.0, FLOOR(tmp_src2[31:0])) RETURN dst[31:0] } FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := SCALE(a[i+31:i], b[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Miscellaneous Scale the packed single-precision (32-bit) floating-point elements in "a" using values from "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). DEFINE SCALE(src1, src2) { IF (src2 == NaN) IF (src2 == SNaN) RETURN QNAN(src2) FI ELSE IF (src1 == NaN) IF (src1 == SNaN) RETURN QNAN(src1) FI IF (src2 != INF) RETURN QNAN(src1) FI ELSE tmp_src2 := src2 tmp_src1 := src1 IF (IS_DENORMAL(src2) AND MXCSR.DAZ) tmp_src2 := 0 FI IF (IS_DENORMAL(src1) AND MXCSR.DAZ) tmp_src1 := 0 FI FI dst[31:0] := tmp_src1[31:0] * POW(2.0, FLOOR(tmp_src2[31:0])) RETURN dst[31:0] } FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := SCALE(a[i+31:i], b[i+31:i]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Miscellaneous Scale the packed single-precision (32-bit) floating-point elements in "a" using values from "b", and store the results in "dst". DEFINE SCALE(src1, src2) { IF (src2 == NaN) IF (src2 == SNaN) RETURN QNAN(src2) FI ELSE IF (src1 == NaN) IF (src1 == SNaN) RETURN QNAN(src1) FI IF (src2 != INF) RETURN QNAN(src1) FI ELSE tmp_src2 := src2 tmp_src1 := src1 IF (IS_DENORMAL(src2) AND MXCSR.DAZ) tmp_src2 := 0 FI IF (IS_DENORMAL(src1) AND MXCSR.DAZ) tmp_src1 := 0 FI FI dst[31:0] := tmp_src1[31:0] * POW(2.0, FLOOR(tmp_src2[31:0])) RETURN dst[31:0] } FOR j := 0 to 3 i := j*32 dst[i+31:i] := SCALE(a[i+31:i], b[i+31:i]) ENDFOR dst[MAX:128] := 0
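The core of the SCALE operation, multiplying each element of "a" by 2 raised to the floor of the matching element of "b", can be sketched as follows. This is a simplified model under stated assumptions: it covers finite operands only, collapses the SNaN/QNaN distinction into a plain NaN result, ignores MXCSR.DAZ, and the names `scalef_lane` and `scalef` are hypothetical.

```python
import math

def scalef_lane(a: float, b: float) -> float:
    """Return a * 2**floor(b) for finite a and b."""
    if math.isnan(a) or math.isnan(b):
        return math.nan               # NaN propagation, simplified
    if math.isinf(a) or math.isinf(b):
        raise ValueError("infinite operands are outside this sketch")
    return math.ldexp(a, math.floor(b))

def scalef(a, b):
    """Lane j of a is always paired with lane j of b."""
    return [scalef_lane(x, y) for x, y in zip(a, b)]

print(scalef([3.0, 3.0, 1.0], [2.7, -1.5, -0.5]))  # [12.0, 0.75, 0.5]
```

Because the exponent goes through FLOOR, a scale value of -1.5 multiplies by 2^-2, not 2^-1.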
Floating Point AVX512VL AVX512F Store Scatter double-precision (64-bit) floating-point elements from "a" into memory using 32-bit indices. 64-bit elements are stored at addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). "scale" should be 1, 2, 4 or 8. FOR j := 0 to 3 i := j*64 m := j*32 addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8 MEM[addr+63:addr] := a[i+63:i] ENDFOR
Floating Point AVX512VL AVX512F Store Scatter double-precision (64-bit) floating-point elements from "a" into memory using 32-bit indices. 64-bit elements are stored at addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale") subject to mask "k" (elements are not stored when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8. FOR j := 0 to 3 i := j*64 m := j*32 IF k[j] addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8 MEM[addr+63:addr] := a[i+63:i] FI ENDFOR
Floating Point AVX512VL AVX512F Store Scatter double-precision (64-bit) floating-point elements from "a" into memory using 32-bit indices. 64-bit elements are stored at addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). "scale" should be 1, 2, 4 or 8. FOR j := 0 to 1 i := j*64 m := j*32 addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8 MEM[addr+63:addr] := a[i+63:i] ENDFOR
Floating Point AVX512VL AVX512F Store Scatter double-precision (64-bit) floating-point elements from "a" into memory using 32-bit indices. 64-bit elements are stored at addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale") subject to mask "k" (elements are not stored when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8. FOR j := 0 to 1 i := j*64 m := j*32 IF k[j] addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8 MEM[addr+63:addr] := a[i+63:i] FI ENDFOR
Floating Point AVX512VL AVX512F Store Scatter single-precision (32-bit) floating-point elements from "a" into memory using 32-bit indices. 32-bit elements are stored at addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). "scale" should be 1, 2, 4 or 8. FOR j := 0 to 7 i := j*32 m := j*32 addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8 MEM[addr+31:addr] := a[i+31:i] ENDFOR
Floating Point AVX512VL AVX512F Store Scatter single-precision (32-bit) floating-point elements from "a" into memory using 32-bit indices. 32-bit elements are stored at addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale") subject to mask "k" (elements are not stored when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8. FOR j := 0 to 7 i := j*32 m := j*32 IF k[j] addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8 MEM[addr+31:addr] := a[i+31:i] FI ENDFOR
Floating Point AVX512VL AVX512F Store Scatter single-precision (32-bit) floating-point elements from "a" into memory using 32-bit indices. 32-bit elements are stored at addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). "scale" should be 1, 2, 4 or 8. FOR j := 0 to 3 i := j*32 m := j*32 addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8 MEM[addr+31:addr] := a[i+31:i] ENDFOR
Floating Point AVX512VL AVX512F Store Scatter single-precision (32-bit) floating-point elements from "a" into memory using 32-bit indices. 32-bit elements are stored at addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale") subject to mask "k" (elements are not stored when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8. FOR j := 0 to 3 i := j*32 m := j*32 IF k[j] addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8 MEM[addr+31:addr] := a[i+31:i] FI ENDFOR
Floating Point AVX512VL AVX512F Store Scatter double-precision (64-bit) floating-point elements from "a" into memory using 64-bit indices. 64-bit elements are stored at addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). "scale" should be 1, 2, 4 or 8. FOR j := 0 to 3 i := j*64 m := j*64 addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8 MEM[addr+63:addr] := a[i+63:i] ENDFOR
Floating Point AVX512VL AVX512F Store Scatter double-precision (64-bit) floating-point elements from "a" into memory using 64-bit indices. 64-bit elements are stored at addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale") subject to mask "k" (elements are not stored when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8. FOR j := 0 to 3 i := j*64 m := j*64 IF k[j] addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8 MEM[addr+63:addr] := a[i+63:i] FI ENDFOR
Floating Point AVX512VL AVX512F Store Scatter double-precision (64-bit) floating-point elements from "a" into memory using 64-bit indices. 64-bit elements are stored at addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). "scale" should be 1, 2, 4 or 8. FOR j := 0 to 1 i := j*64 m := j*64 addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8 MEM[addr+63:addr] := a[i+63:i] ENDFOR
Floating Point AVX512VL AVX512F Store Scatter double-precision (64-bit) floating-point elements from "a" into memory using 64-bit indices. 64-bit elements are stored at addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale") subject to mask "k" (elements are not stored when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8. FOR j := 0 to 1 i := j*64 m := j*64 IF k[j] addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8 MEM[addr+63:addr] := a[i+63:i] FI ENDFOR
Floating Point AVX512VL AVX512F Store Scatter single-precision (32-bit) floating-point elements from "a" into memory using 64-bit indices. 32-bit elements are stored at addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). "scale" should be 1, 2, 4 or 8. FOR j := 0 to 3 i := j*32 m := j*64 addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8 MEM[addr+31:addr] := a[i+31:i] ENDFOR
Floating Point AVX512VL AVX512F Store Scatter single-precision (32-bit) floating-point elements from "a" into memory using 64-bit indices. 32-bit elements are stored at addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale") subject to mask "k" (elements are not stored when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8. FOR j := 0 to 3 i := j*32 m := j*64 IF k[j] addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8 MEM[addr+31:addr] := a[i+31:i] FI ENDFOR
Floating Point AVX512VL AVX512F Store Scatter single-precision (32-bit) floating-point elements from "a" into memory using 64-bit indices. 32-bit elements are stored at addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). "scale" should be 1, 2, 4 or 8. FOR j := 0 to 1 i := j*32 m := j*64 addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8 MEM[addr+31:addr] := a[i+31:i] ENDFOR
Floating Point AVX512VL AVX512F Store Scatter single-precision (32-bit) floating-point elements from "a" into memory using 64-bit indices. 32-bit elements are stored at addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale") subject to mask "k" (elements are not stored when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8. FOR j := 0 to 1 i := j*32 m := j*64 IF k[j] addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8 MEM[addr+31:addr] := a[i+31:i] FI ENDFOR
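The scatter entries above share one addressing rule: lane j is written at base_addr + vindex[j] * scale, and a masked-off lane is skipped. A minimal sketch on a `bytearray` standing in for memory is shown below. The function name `scatter_f64` is hypothetical, addresses are byte offsets, and the trailing `* 8` in the pseudocode is read here as the conversion from a byte offset to the bit offset used by MEM[addr+63:addr], which is an interpretation rather than something the entries state.

```python
import struct

def scatter_f64(mem: bytearray, base: int, vindex, scale: int, a, k=None):
    """Store each a[j] as a little-endian double at base + vindex[j] * scale.

    k is an optional integer mask; lanes whose bit is clear are not stored.
    """
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale should be 1, 2, 4 or 8")
    for j, (idx, val) in enumerate(zip(vindex, a)):
        if k is not None and not (k >> j) & 1:
            continue                          # masked-off lane: memory untouched
        addr = base + idx * scale             # byte address of this lane
        if addr < 0:
            raise IndexError("address below start of buffer")
        struct.pack_into("<d", mem, addr, val)

mem = bytearray(64)
scatter_f64(mem, 0, [3, 0, 5, 1], 8, [1.0, 2.0, 3.0, 4.0], k=0b1011)
print(struct.unpack_from("<8d", mem))
# (2.0, 4.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0)
```

Lane 2 (index 5) is never written because bit 2 of the mask is clear.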
Floating Point AVX512VL AVX512F Miscellaneous Shuffle 128-bits (composed of 4 single-precision (32-bit) floating-point elements) selected by "imm8" from "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). tmp_dst.m128[0] := a.m128[imm8[0]] tmp_dst.m128[1] := b.m128[imm8[1]] FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := tmp_dst[i+31:i] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Miscellaneous Shuffle 128-bits (composed of 4 single-precision (32-bit) floating-point elements) selected by "imm8" from "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). tmp_dst.m128[0] := a.m128[imm8[0]] tmp_dst.m128[1] := b.m128[imm8[1]] FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := tmp_dst[i+31:i] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Miscellaneous Shuffle 128-bits (composed of 4 single-precision (32-bit) floating-point elements) selected by "imm8" from "a" and "b", and store the results in "dst". dst.m128[0] := a.m128[imm8[0]] dst.m128[1] := b.m128[imm8[1]] dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Miscellaneous Shuffle 128-bits (composed of 2 double-precision (64-bit) floating-point elements) selected by "imm8" from "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). tmp_dst.m128[0] := a.m128[imm8[0]] tmp_dst.m128[1] := b.m128[imm8[1]] FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := tmp_dst[i+63:i] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Miscellaneous Shuffle 128-bits (composed of 2 double-precision (64-bit) floating-point elements) selected by "imm8" from "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). tmp_dst.m128[0] := a.m128[imm8[0]] tmp_dst.m128[1] := b.m128[imm8[1]] FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := tmp_dst[i+63:i] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Miscellaneous Shuffle 128-bits (composed of 2 double-precision (64-bit) floating-point elements) selected by "imm8" from "a" and "b", and store the results in "dst". dst.m128[0] := a.m128[imm8[0]] dst.m128[1] := b.m128[imm8[1]] dst[MAX:256] := 0
AVX512VL AVX512F Miscellaneous Shuffle 128-bits (composed of 4 32-bit integers) selected by "imm8" from "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). tmp_dst.m128[0] := a.m128[imm8[0]] tmp_dst.m128[1] := b.m128[imm8[1]] FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := tmp_dst[i+31:i] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:256] := 0
AVX512VL AVX512F Miscellaneous Shuffle 128-bits (composed of 4 32-bit integers) selected by "imm8" from "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). tmp_dst.m128[0] := a.m128[imm8[0]] tmp_dst.m128[1] := b.m128[imm8[1]] FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := tmp_dst[i+31:i] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:256] := 0
AVX512VL AVX512F Miscellaneous Shuffle 128-bits (composed of 4 32-bit integers) selected by "imm8" from "a" and "b", and store the results in "dst". dst.m128[0] := a.m128[imm8[0]] dst.m128[1] := b.m128[imm8[1]] dst[MAX:256] := 0
AVX512VL AVX512F Miscellaneous Shuffle 128-bits (composed of 2 64-bit integers) selected by "imm8" from "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). tmp_dst.m128[0] := a.m128[imm8[0]] tmp_dst.m128[1] := b.m128[imm8[1]] FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := tmp_dst[i+63:i] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:256] := 0
AVX512VL AVX512F Miscellaneous Shuffle 128-bits (composed of 2 64-bit integers) selected by "imm8" from "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). tmp_dst.m128[0] := a.m128[imm8[0]] tmp_dst.m128[1] := b.m128[imm8[1]] FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := tmp_dst[i+63:i] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:256] := 0
AVX512VL AVX512F Miscellaneous Shuffle 128-bits (composed of 2 64-bit integers) selected by "imm8" from "a" and "b", and store the results in "dst". dst.m128[0] := a.m128[imm8[0]] dst.m128[1] := b.m128[imm8[1]] dst[MAX:256] := 0
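For the 256-bit 128-bit-lane shuffles above, the low half of the result is a whole lane of "a" chosen by imm8[0] and the high half is a whole lane of "b" chosen by imm8[1]. The sketch below models vectors as Python lists; `shuffle_128` and its `lane_elems` parameter (4 for 32-bit elements, 2 for 64-bit elements) are hypothetical names used for illustration.

```python
def shuffle_128(a, b, imm8: int, lane_elems: int):
    """Return [a.m128[imm8[0]], b.m128[imm8[1]]] flattened into one list."""
    def lane(v, n):
        return v[n * lane_elems:(n + 1) * lane_elems]
    return lane(a, imm8 & 1) + lane(b, (imm8 >> 1) & 1)

a = list(range(8))            # lanes [0,1,2,3] and [4,5,6,7]
b = list(range(10, 18))       # lanes [10..13] and [14..17]
print(shuffle_128(a, b, 0b01, 4))  # [4, 5, 6, 7, 10, 11, 12, 13]
print(shuffle_128(a, b, 0b10, 4))  # [0, 1, 2, 3, 14, 15, 16, 17]
```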
Floating Point AVX512VL AVX512F Miscellaneous Shuffle double-precision (64-bit) floating-point elements within 128-bit lanes using the control in "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). tmp_dst[63:0] := (imm8[0] == 0) ? a[63:0] : a[127:64] tmp_dst[127:64] := (imm8[1] == 0) ? b[63:0] : b[127:64] tmp_dst[191:128] := (imm8[2] == 0) ? a[191:128] : a[255:192] tmp_dst[255:192] := (imm8[3] == 0) ? b[191:128] : b[255:192] FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := tmp_dst[i+63:i] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Miscellaneous Shuffle double-precision (64-bit) floating-point elements within 128-bit lanes using the control in "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). tmp_dst[63:0] := (imm8[0] == 0) ? a[63:0] : a[127:64] tmp_dst[127:64] := (imm8[1] == 0) ? b[63:0] : b[127:64] tmp_dst[191:128] := (imm8[2] == 0) ? a[191:128] : a[255:192] tmp_dst[255:192] := (imm8[3] == 0) ? b[191:128] : b[255:192] FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := tmp_dst[i+63:i] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Miscellaneous Shuffle double-precision (64-bit) floating-point elements using the control in "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). tmp_dst[63:0] := (imm8[0] == 0) ? a[63:0] : a[127:64] tmp_dst[127:64] := (imm8[1] == 0) ? b[63:0] : b[127:64] FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := tmp_dst[i+63:i] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Miscellaneous Shuffle double-precision (64-bit) floating-point elements using the control in "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). tmp_dst[63:0] := (imm8[0] == 0) ? a[63:0] : a[127:64] tmp_dst[127:64] := (imm8[1] == 0) ? b[63:0] : b[127:64] FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := tmp_dst[i+63:i] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:128] := 0
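The double-precision shuffle uses one control bit per output element: even result positions pick from "a", odd positions pick from "b", and each 128-bit lane only selects from its own lane. A small list-based sketch of that selection follows; the function name `shuffle_pd` is an assumption, and masking is left out.

```python
def shuffle_pd(a, b, imm8: int):
    """Model the tmp_dst computation for 2-element (128-bit) or 4-element (256-bit) inputs."""
    out = []
    for lane in range(len(a) // 2):
        lo = 2 * lane                                    # first element of this lane
        out.append(a[lo + ((imm8 >> (2 * lane)) & 1)])      # even slot: from a
        out.append(b[lo + ((imm8 >> (2 * lane + 1)) & 1)])  # odd slot: from b
    return out

a = [0.0, 1.0, 2.0, 3.0]
b = [10.0, 11.0, 12.0, 13.0]
print(shuffle_pd(a, b, 0b0110))  # [0.0, 11.0, 3.0, 12.0]
```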
Floating Point AVX512VL AVX512F Miscellaneous Shuffle single-precision (32-bit) floating-point elements from "a" and "b" within 128-bit lanes using the control in "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). DEFINE SELECT4(src, control) { CASE(control[1:0]) OF 0: tmp[31:0] := src[31:0] 1: tmp[31:0] := src[63:32] 2: tmp[31:0] := src[95:64] 3: tmp[31:0] := src[127:96] ESAC RETURN tmp[31:0] } tmp_dst[31:0] := SELECT4(a[127:0], imm8[1:0]) tmp_dst[63:32] := SELECT4(a[127:0], imm8[3:2]) tmp_dst[95:64] := SELECT4(b[127:0], imm8[5:4]) tmp_dst[127:96] := SELECT4(b[127:0], imm8[7:6]) tmp_dst[159:128] := SELECT4(a[255:128], imm8[1:0]) tmp_dst[191:160] := SELECT4(a[255:128], imm8[3:2]) tmp_dst[223:192] := SELECT4(b[255:128], imm8[5:4]) tmp_dst[255:224] := SELECT4(b[255:128], imm8[7:6]) FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := tmp_dst[i+31:i] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Miscellaneous Shuffle single-precision (32-bit) floating-point elements from "a" and "b" within 128-bit lanes using the control in "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). DEFINE SELECT4(src, control) { CASE(control[1:0]) OF 0: tmp[31:0] := src[31:0] 1: tmp[31:0] := src[63:32] 2: tmp[31:0] := src[95:64] 3: tmp[31:0] := src[127:96] ESAC RETURN tmp[31:0] } tmp_dst[31:0] := SELECT4(a[127:0], imm8[1:0]) tmp_dst[63:32] := SELECT4(a[127:0], imm8[3:2]) tmp_dst[95:64] := SELECT4(b[127:0], imm8[5:4]) tmp_dst[127:96] := SELECT4(b[127:0], imm8[7:6]) tmp_dst[159:128] := SELECT4(a[255:128], imm8[1:0]) tmp_dst[191:160] := SELECT4(a[255:128], imm8[3:2]) tmp_dst[223:192] := SELECT4(b[255:128], imm8[5:4]) tmp_dst[255:224] := SELECT4(b[255:128], imm8[7:6]) FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := tmp_dst[i+31:i] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Miscellaneous Shuffle single-precision (32-bit) floating-point elements from "a" and "b" using the control in "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). DEFINE SELECT4(src, control) { CASE(control[1:0]) OF 0: tmp[31:0] := src[31:0] 1: tmp[31:0] := src[63:32] 2: tmp[31:0] := src[95:64] 3: tmp[31:0] := src[127:96] ESAC RETURN tmp[31:0] } tmp_dst[31:0] := SELECT4(a[127:0], imm8[1:0]) tmp_dst[63:32] := SELECT4(a[127:0], imm8[3:2]) tmp_dst[95:64] := SELECT4(b[127:0], imm8[5:4]) tmp_dst[127:96] := SELECT4(b[127:0], imm8[7:6]) FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := tmp_dst[i+31:i] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Miscellaneous Shuffle single-precision (32-bit) floating-point elements from "a" and "b" using the control in "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). DEFINE SELECT4(src, control) { CASE(control[1:0]) OF 0: tmp[31:0] := src[31:0] 1: tmp[31:0] := src[63:32] 2: tmp[31:0] := src[95:64] 3: tmp[31:0] := src[127:96] ESAC RETURN tmp[31:0] } tmp_dst[31:0] := SELECT4(a[127:0], imm8[1:0]) tmp_dst[63:32] := SELECT4(a[127:0], imm8[3:2]) tmp_dst[95:64] := SELECT4(b[127:0], imm8[5:4]) tmp_dst[127:96] := SELECT4(b[127:0], imm8[7:6]) FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := tmp_dst[i+31:i] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:128] := 0
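The SELECT4 pattern above decodes "imm8" as four 2-bit selectors: the two low selectors index into a lane of the first operand and the two high selectors index into the same lane of the second operand. The following sketch models the unmasked tmp_dst computation on Python lists; `select4` and `shuffle_ps` are hypothetical names, and the same "imm8" is reused for every 128-bit lane as in the pseudocode.

```python
def select4(lane, control: int):
    """Pick one of four elements using the low two bits of control."""
    return lane[control & 3]

def shuffle_ps(a, b, imm8: int):
    """Model tmp_dst for 4-element (128-bit) or 8-element (256-bit) inputs."""
    out = []
    for base in range(0, len(a), 4):          # one iteration per 128-bit lane
        la, lb = a[base:base + 4], b[base:base + 4]
        out += [
            select4(la, imm8),                # imm8[1:0] -> from first operand
            select4(la, imm8 >> 2),           # imm8[3:2] -> from first operand
            select4(lb, imm8 >> 4),           # imm8[5:4] -> from second operand
            select4(lb, imm8 >> 6),           # imm8[7:6] -> from second operand
        ]
    return out

print(shuffle_ps([0, 1, 2, 3], [10, 11, 12, 13], 0x4E))  # [2, 3, 10, 11]
```

With imm8 = 0x4E (binary 01 00 11 10) the result takes the upper pair of the first operand followed by the lower pair of the second.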
Floating Point AVX512VL AVX512F Elementary Math Functions Compute the square root of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := SQRT(a[i+63:i]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Elementary Math Functions Compute the square root of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := SQRT(a[i+63:i]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Elementary Math Functions Compute the square root of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := SQRT(a[i+63:i]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Elementary Math Functions Compute the square root of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := SQRT(a[i+63:i]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Elementary Math Functions Compute the square root of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := SQRT(a[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Elementary Math Functions Compute the square root of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := SQRT(a[i+31:i]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Elementary Math Functions Compute the square root of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := SQRT(a[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Elementary Math Functions Compute the square root of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := SQRT(a[i+31:i]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Arithmetic Subtract packed double-precision (64-bit) floating-point elements in "b" from packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := a[i+63:i] - b[i+63:i] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Arithmetic Subtract packed double-precision (64-bit) floating-point elements in "b" from packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := a[i+63:i] - b[i+63:i] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Arithmetic Subtract packed double-precision (64-bit) floating-point elements in "b" from packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := a[i+63:i] - b[i+63:i] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Arithmetic Subtract packed double-precision (64-bit) floating-point elements in "b" from packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := a[i+63:i] - b[i+63:i] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Arithmetic Subtract packed single-precision (32-bit) floating-point elements in "b" from packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := a[i+31:i] - b[i+31:i] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Arithmetic Subtract packed single-precision (32-bit) floating-point elements in "b" from packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := a[i+31:i] - b[i+31:i] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Arithmetic Subtract packed single-precision (32-bit) floating-point elements in "b" from packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := a[i+31:i] - b[i+31:i] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Arithmetic Subtract packed single-precision (32-bit) floating-point elements in "b" from packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := a[i+31:i] - b[i+31:i] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:128] := 0
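The writemask and zeromask subtract entries above differ only in what is written to lanes whose mask bit is clear. A minimal scalar sketch of that difference follows, assuming vectors are Python lists and "k" is an integer; the helper names are hypothetical, and Python floats are double precision, so single-precision rounding is not modelled.

```python
def mask_sub(src, k, a, b):
    """Writemask subtract: lane j is a[j] - b[j] if bit j of k is set, else src[j]."""
    return [
        a[j] - b[j] if (k >> j) & 1 else src[j]
        for j in range(len(a))
    ]


def maskz_sub(k, a, b):
    """Zeromask subtract: lane j is a[j] - b[j] if bit j of k is set, else 0."""
    return [
        a[j] - b[j] if (k >> j) & 1 else 0.0
        for j in range(len(a))
    ]
```

The same pattern applies to the masked square-root entries earlier in the listing, with the subtraction replaced by a unary operation on "a".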
Floating Point AVX512VL AVX512F Miscellaneous Unpack and interleave double-precision (64-bit) floating-point elements from the high half of each 128-bit lane in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). DEFINE INTERLEAVE_HIGH_QWORDS(src1[127:0], src2[127:0]) { dst[63:0] := src1[127:64] dst[127:64] := src2[127:64] RETURN dst[127:0] } tmp_dst[127:0] := INTERLEAVE_HIGH_QWORDS(a[127:0], b[127:0]) tmp_dst[255:128] := INTERLEAVE_HIGH_QWORDS(a[255:128], b[255:128]) FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := tmp_dst[i+63:i] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Miscellaneous Unpack and interleave double-precision (64-bit) floating-point elements from the high half of each 128-bit lane in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). DEFINE INTERLEAVE_HIGH_QWORDS(src1[127:0], src2[127:0]) { dst[63:0] := src1[127:64] dst[127:64] := src2[127:64] RETURN dst[127:0] } tmp_dst[127:0] := INTERLEAVE_HIGH_QWORDS(a[127:0], b[127:0]) tmp_dst[255:128] := INTERLEAVE_HIGH_QWORDS(a[255:128], b[255:128]) FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := tmp_dst[i+63:i] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Miscellaneous Unpack and interleave double-precision (64-bit) floating-point elements from the high half of "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). DEFINE INTERLEAVE_HIGH_QWORDS(src1[127:0], src2[127:0]) { dst[63:0] := src1[127:64] dst[127:64] := src2[127:64] RETURN dst[127:0] } tmp_dst[127:0] := INTERLEAVE_HIGH_QWORDS(a[127:0], b[127:0]) FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := tmp_dst[i+63:i] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Miscellaneous Unpack and interleave double-precision (64-bit) floating-point elements from the high half of "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). DEFINE INTERLEAVE_HIGH_QWORDS(src1[127:0], src2[127:0]) { dst[63:0] := src1[127:64] dst[127:64] := src2[127:64] RETURN dst[127:0] } tmp_dst[127:0] := INTERLEAVE_HIGH_QWORDS(a[127:0], b[127:0]) FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := tmp_dst[i+63:i] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Miscellaneous Unpack and interleave single-precision (32-bit) floating-point elements from the high half of each 128-bit lane in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). DEFINE INTERLEAVE_HIGH_DWORDS(src1[127:0], src2[127:0]) { dst[31:0] := src1[95:64] dst[63:32] := src2[95:64] dst[95:64] := src1[127:96] dst[127:96] := src2[127:96] RETURN dst[127:0] } tmp_dst[127:0] := INTERLEAVE_HIGH_DWORDS(a[127:0], b[127:0]) tmp_dst[255:128] := INTERLEAVE_HIGH_DWORDS(a[255:128], b[255:128]) FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := tmp_dst[i+31:i] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Miscellaneous Unpack and interleave single-precision (32-bit) floating-point elements from the high half of each 128-bit lane in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). DEFINE INTERLEAVE_HIGH_DWORDS(src1[127:0], src2[127:0]) { dst[31:0] := src1[95:64] dst[63:32] := src2[95:64] dst[95:64] := src1[127:96] dst[127:96] := src2[127:96] RETURN dst[127:0] } tmp_dst[127:0] := INTERLEAVE_HIGH_DWORDS(a[127:0], b[127:0]) tmp_dst[255:128] := INTERLEAVE_HIGH_DWORDS(a[255:128], b[255:128]) FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := tmp_dst[i+31:i] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Miscellaneous Unpack and interleave single-precision (32-bit) floating-point elements from the high half of "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). DEFINE INTERLEAVE_HIGH_DWORDS(src1[127:0], src2[127:0]) { dst[31:0] := src1[95:64] dst[63:32] := src2[95:64] dst[95:64] := src1[127:96] dst[127:96] := src2[127:96] RETURN dst[127:0] } tmp_dst[127:0] := INTERLEAVE_HIGH_DWORDS(a[127:0], b[127:0]) FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := tmp_dst[i+31:i] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Miscellaneous Unpack and interleave single-precision (32-bit) floating-point elements from the high half of "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). DEFINE INTERLEAVE_HIGH_DWORDS(src1[127:0], src2[127:0]) { dst[31:0] := src1[95:64] dst[63:32] := src2[95:64] dst[95:64] := src1[127:96] dst[127:96] := src2[127:96] RETURN dst[127:0] } tmp_dst[127:0] := INTERLEAVE_HIGH_DWORDS(a[127:0], b[127:0]) FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := tmp_dst[i+31:i] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Miscellaneous Unpack and interleave double-precision (64-bit) floating-point elements from the low half of each 128-bit lane in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). DEFINE INTERLEAVE_QWORDS(src1[127:0], src2[127:0]) { dst[63:0] := src1[63:0] dst[127:64] := src2[63:0] RETURN dst[127:0] } tmp_dst[127:0] := INTERLEAVE_QWORDS(a[127:0], b[127:0]) tmp_dst[255:128] := INTERLEAVE_QWORDS(a[255:128], b[255:128]) FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := tmp_dst[i+63:i] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Miscellaneous Unpack and interleave double-precision (64-bit) floating-point elements from the low half of each 128-bit lane in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). DEFINE INTERLEAVE_QWORDS(src1[127:0], src2[127:0]) { dst[63:0] := src1[63:0] dst[127:64] := src2[63:0] RETURN dst[127:0] } tmp_dst[127:0] := INTERLEAVE_QWORDS(a[127:0], b[127:0]) tmp_dst[255:128] := INTERLEAVE_QWORDS(a[255:128], b[255:128]) FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := tmp_dst[i+63:i] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Miscellaneous Unpack and interleave double-precision (64-bit) floating-point elements from the low half of "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). DEFINE INTERLEAVE_QWORDS(src1[127:0], src2[127:0]) { dst[63:0] := src1[63:0] dst[127:64] := src2[63:0] RETURN dst[127:0] } tmp_dst[127:0] := INTERLEAVE_QWORDS(a[127:0], b[127:0]) FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := tmp_dst[i+63:i] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Miscellaneous Unpack and interleave double-precision (64-bit) floating-point elements from the low half of "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). DEFINE INTERLEAVE_QWORDS(src1[127:0], src2[127:0]) { dst[63:0] := src1[63:0] dst[127:64] := src2[63:0] RETURN dst[127:0] } tmp_dst[127:0] := INTERLEAVE_QWORDS(a[127:0], b[127:0]) FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := tmp_dst[i+63:i] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Miscellaneous Unpack and interleave single-precision (32-bit) floating-point elements from the low half of each 128-bit lane in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). DEFINE INTERLEAVE_DWORDS(src1[127:0], src2[127:0]) { dst[31:0] := src1[31:0] dst[63:32] := src2[31:0] dst[95:64] := src1[63:32] dst[127:96] := src2[63:32] RETURN dst[127:0] } tmp_dst[127:0] := INTERLEAVE_DWORDS(a[127:0], b[127:0]) tmp_dst[255:128] := INTERLEAVE_DWORDS(a[255:128], b[255:128]) FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := tmp_dst[i+31:i] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Miscellaneous Unpack and interleave single-precision (32-bit) floating-point elements from the low half of each 128-bit lane in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). DEFINE INTERLEAVE_DWORDS(src1[127:0], src2[127:0]) { dst[31:0] := src1[31:0] dst[63:32] := src2[31:0] dst[95:64] := src1[63:32] dst[127:96] := src2[63:32] RETURN dst[127:0] } tmp_dst[127:0] := INTERLEAVE_DWORDS(a[127:0], b[127:0]) tmp_dst[255:128] := INTERLEAVE_DWORDS(a[255:128], b[255:128]) FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := tmp_dst[i+31:i] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512VL AVX512F Miscellaneous Unpack and interleave single-precision (32-bit) floating-point elements from the low half of "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). DEFINE INTERLEAVE_DWORDS(src1[127:0], src2[127:0]) { dst[31:0] := src1[31:0] dst[63:32] := src2[31:0] dst[95:64] := src1[63:32] dst[127:96] := src2[63:32] RETURN dst[127:0] } tmp_dst[127:0] := INTERLEAVE_DWORDS(a[127:0], b[127:0]) FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := tmp_dst[i+31:i] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512VL AVX512F Miscellaneous Unpack and interleave single-precision (32-bit) floating-point elements from the low half of "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). DEFINE INTERLEAVE_DWORDS(src1[127:0], src2[127:0]) { dst[31:0] := src1[31:0] dst[63:32] := src2[31:0] dst[95:64] := src1[63:32] dst[127:96] := src2[63:32] RETURN dst[127:0] } tmp_dst[127:0] := INTERLEAVE_DWORDS(a[127:0], b[127:0]) FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := tmp_dst[i+31:i] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:128] := 0
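The INTERLEAVE_HIGH_DWORDS and INTERLEAVE_DWORDS helpers used in the unpack entries above operate on one 128-bit lane at a time, and the 256-bit forms apply them to each lane independently. Here is a minimal sketch of the unmasked single-precision behaviour, assuming vectors are Python lists of 32-bit elements with index 0 lowest; the wrapper names are hypothetical.

```python
def interleave_high_dwords(src1, src2):
    """Interleave the two upper 32-bit elements of one 128-bit lane."""
    return [src1[2], src2[2], src1[3], src2[3]]


def interleave_dwords(src1, src2):
    """Interleave the two lower 32-bit elements of one 128-bit lane."""
    return [src1[0], src2[0], src1[1], src2[1]]


def per_lane(helper, a, b):
    """Apply a 128-bit helper to every 4-element lane of a and b."""
    out = []
    for lane in range(0, len(a), 4):
        out += helper(a[lane:lane + 4], b[lane:lane + 4])
    return out


def unpackhi_ps(a, b):
    return per_lane(interleave_high_dwords, a, b)


def unpacklo_ps(a, b):
    return per_lane(interleave_dwords, a, b)
```

Passing 4-element lists models the 128-bit entries and 8-element lists the 256-bit entries. The masked variants then blend this result with "src" or zero per element, exactly as in the FOR loops of the pseudocode.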
Integer AVX512F Store Store 512-bits (composed of 8 packed 64-bit integers) from "a" into memory. "mem_addr" does not need to be aligned on any particular boundary. MEM[mem_addr+511:mem_addr] := a[511:0]
Integer AVX512F Store Store 512-bits (composed of 16 packed 32-bit integers) from "a" into memory. "mem_addr" does not need to be aligned on any particular boundary. MEM[mem_addr+511:mem_addr] := a[511:0]
Integer AVX512VL AVX512F Store Store 256-bits (composed of 4 packed 64-bit integers) from "a" into memory. "mem_addr" does not need to be aligned on any particular boundary. MEM[mem_addr+255:mem_addr] := a[255:0]
Integer AVX512VL AVX512F Store Store 256-bits (composed of 8 packed 32-bit integers) from "a" into memory. "mem_addr" does not need to be aligned on any particular boundary. MEM[mem_addr+255:mem_addr] := a[255:0]
Integer AVX512VL AVX512F Store Store 128-bits (composed of 2 packed 64-bit integers) from "a" into memory. "mem_addr" does not need to be aligned on any particular boundary. MEM[mem_addr+127:mem_addr] := a[127:0]
Integer AVX512VL AVX512F Store Store 128-bits (composed of 4 packed 32-bit integers) from "a" into memory. "mem_addr" does not need to be aligned on any particular boundary. MEM[mem_addr+127:mem_addr] := a[127:0]
Integer AVX512VL AVX512F Store Store 256-bits (composed of 4 packed 64-bit integers) from "a" into memory. "mem_addr" must be aligned on a 32-byte boundary or a general-protection exception may be generated. MEM[mem_addr+255:mem_addr] := a[255:0]
Integer AVX512VL AVX512F Store Store 256-bits (composed of 8 packed 32-bit integers) from "a" into memory. "mem_addr" must be aligned on a 32-byte boundary or a general-protection exception may be generated. MEM[mem_addr+255:mem_addr] := a[255:0]
Integer AVX512VL AVX512F Store Store 128-bits (composed of 2 packed 64-bit integers) from "a" into memory. "mem_addr" must be aligned on a 16-byte boundary or a general-protection exception may be generated. MEM[mem_addr+127:mem_addr] := a[127:0]
Integer AVX512VL AVX512F Store Store 128-bits (composed of 4 packed 32-bit integers) from "a" into memory. "mem_addr" must be aligned on a 16-byte boundary or a general-protection exception may be generated. MEM[mem_addr+127:mem_addr] := a[127:0]
Integer AVX512F Load Load 512-bits (composed of 8 packed 64-bit integers) from memory into "dst". "mem_addr" does not need to be aligned on any particular boundary. dst[511:0] := MEM[mem_addr+511:mem_addr] dst[MAX:512] := 0
Integer AVX512F Load Load 512-bits (composed of 16 packed 32-bit integers) from memory into "dst". "mem_addr" does not need to be aligned on any particular boundary. dst[511:0] := MEM[mem_addr+511:mem_addr] dst[MAX:512] := 0
Integer AVX512VL AVX512F Load Load 256-bits (composed of 4 packed 64-bit integers) from memory into "dst". "mem_addr" does not need to be aligned on any particular boundary. dst[255:0] := MEM[mem_addr+255:mem_addr] dst[MAX:256] := 0
Integer AVX512VL AVX512F Load Load 256-bits (composed of 8 packed 32-bit integers) from memory into "dst". "mem_addr" does not need to be aligned on any particular boundary. dst[255:0] := MEM[mem_addr+255:mem_addr] dst[MAX:256] := 0
Integer AVX512VL AVX512F Load Load 128-bits (composed of 2 packed 64-bit integers) from memory into "dst". "mem_addr" does not need to be aligned on any particular boundary. dst[127:0] := MEM[mem_addr+127:mem_addr] dst[MAX:128] := 0
Integer AVX512VL AVX512F Load Load 128-bits (composed of 4 packed 32-bit integers) from memory into "dst". "mem_addr" does not need to be aligned on any particular boundary. dst[127:0] := MEM[mem_addr+127:mem_addr] dst[MAX:128] := 0
Integer AVX512VL AVX512F Load Load 256-bits (composed of 4 packed 64-bit integers) from memory into "dst". "mem_addr" must be aligned on a 32-byte boundary or a general-protection exception may be generated. dst[255:0] := MEM[mem_addr+255:mem_addr] dst[MAX:256] := 0
Integer AVX512VL AVX512F Load Load 256-bits (composed of 8 packed 32-bit integers) from memory into "dst". "mem_addr" must be aligned on a 32-byte boundary or a general-protection exception may be generated. dst[255:0] := MEM[mem_addr+255:mem_addr] dst[MAX:256] := 0
Integer AVX512VL AVX512F Load Load 128-bits (composed of 2 packed 64-bit integers) from memory into "dst". "mem_addr" must be aligned on a 16-byte boundary or a general-protection exception may be generated. dst[127:0] := MEM[mem_addr+127:mem_addr] dst[MAX:128] := 0
Integer AVX512VL AVX512F Load Load 128-bits (composed of 4 packed 32-bit integers) from memory into "dst". "mem_addr" must be aligned on a 16-byte boundary or a general-protection exception may be generated. dst[127:0] := MEM[mem_addr+127:mem_addr] dst[MAX:128] := 0
Integer AVX512VL AVX512F Logical Compute the bitwise XOR of packed 64-bit integers in "a" and "b", and store the results in "dst". FOR j := 0 to 3 i := j*64 dst[i+63:i] := a[i+63:i] XOR b[i+63:i] ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Logical Compute the bitwise XOR of packed 32-bit integers in "a" and "b", and store the results in "dst". FOR j := 0 to 7 i := j*32 dst[i+31:i] := a[i+31:i] XOR b[i+31:i] ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Logical Compute the bitwise XOR of packed 64-bit integers in "a" and "b", and store the results in "dst". FOR j := 0 to 1 i := j*64 dst[i+63:i] := a[i+63:i] XOR b[i+63:i] ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Logical Compute the bitwise XOR of packed 32-bit integers in "a" and "b", and store the results in "dst". FOR j := 0 to 3 i := j*32 dst[i+31:i] := a[i+31:i] XOR b[i+31:i] ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Logical Compute the bitwise OR of packed 64-bit integers in "a" and "b", and store the results in "dst". FOR j := 0 to 3 i := j*64 dst[i+63:i] := a[i+63:i] OR b[i+63:i] ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Logical Compute the bitwise OR of packed 32-bit integers in "a" and "b", and store the results in "dst". FOR j := 0 to 7 i := j*32 dst[i+31:i] := a[i+31:i] OR b[i+31:i] ENDFOR dst[MAX:256] := 0
Integer AVX512VL AVX512F Logical Compute the bitwise OR of packed 64-bit integers in "a" and "b", and store the results in "dst". FOR j := 0 to 1 i := j*64 dst[i+63:i] := a[i+63:i] OR b[i+63:i] ENDFOR dst[MAX:128] := 0
Integer AVX512VL AVX512F Logical Compute the bitwise OR of packed 32-bit integers in "a" and "b", and store the results in "dst". FOR j := 0 to 3 i := j*32 dst[i+31:i] := a[i+31:i] OR b[i+31:i] ENDFOR dst[MAX:128] := 0
Integer AVX512F VAES Cryptography Perform the last round of an AES encryption flow on data (state) in "a" using the round key in "RoundKey", and store the results in "dst". FOR j := 0 to 3 i := j*128 a[i+127:i] := ShiftRows(a[i+127:i]) a[i+127:i] := SubBytes(a[i+127:i]) dst[i+127:i] := a[i+127:i] XOR RoundKey[i+127:i] ENDFOR dst[MAX:512] := 0
Integer AVX512F VAES Cryptography Perform one round of an AES encryption flow on data (state) in "a" using the round key in "RoundKey", and store the results in "dst". FOR j := 0 to 3 i := j*128 a[i+127:i] := ShiftRows(a[i+127:i]) a[i+127:i] := SubBytes(a[i+127:i]) a[i+127:i] := MixColumns(a[i+127:i]) dst[i+127:i] := a[i+127:i] XOR RoundKey[i+127:i] ENDFOR dst[MAX:512] := 0
Integer AVX512F VAES Cryptography Perform the last round of an AES decryption flow on data (state) in "a" using the round key in "RoundKey", and store the results in "dst". FOR j := 0 to 3 i := j*128 a[i+127:i] := InvShiftRows(a[i+127:i]) a[i+127:i] := InvSubBytes(a[i+127:i]) dst[i+127:i] := a[i+127:i] XOR RoundKey[i+127:i] ENDFOR dst[MAX:512] := 0
Integer AVX512F VAES Cryptography Perform one round of an AES decryption flow on data (state) in "a" using the round key in "RoundKey", and store the results in "dst". FOR j := 0 to 3 i := j*128 a[i+127:i] := InvShiftRows(a[i+127:i]) a[i+127:i] := InvSubBytes(a[i+127:i]) a[i+127:i] := InvMixColumns(a[i+127:i]) dst[i+127:i] := a[i+127:i] XOR RoundKey[i+127:i] ENDFOR dst[MAX:512] := 0
Mask AVX512F Mask Compute the bitwise AND of 16-bit masks "a" and "b", and store the result in "k". k[15:0] := a[15:0] AND b[15:0] k[MAX:16] := 0
Mask AVX512F Mask Compute the bitwise NOT of 16-bit mask "a" and then AND with 16-bit mask "b", and store the result in "k". k[15:0] := (NOT a[15:0]) AND b[15:0] k[MAX:16] := 0
Mask AVX512F Mask Compute the bitwise NOT of 16-bit mask "a", and store the result in "k". k[15:0] := NOT a[15:0] k[MAX:16] := 0
Mask AVX512F Mask Compute the bitwise OR of 16-bit masks "a" and "b", and store the result in "k". k[15:0] := a[15:0] OR b[15:0] k[MAX:16] := 0
Mask AVX512F Mask Compute the bitwise XNOR of 16-bit masks "a" and "b", and store the result in "k". k[15:0] := NOT (a[15:0] XOR b[15:0]) k[MAX:16] := 0
Mask AVX512F Mask Compute the bitwise XOR of 16-bit masks "a" and "b", and store the result in "k". k[15:0] := a[15:0] XOR b[15:0] k[MAX:16] := 0
Mask AVX512F Mask Shift the bits of 16-bit mask "a" left by "count" while shifting in zeros, and store the least significant 16 bits of the result in "k". k[MAX:0] := 0 IF count[7:0] <= 15 k[15:0] := a[15:0] << count[7:0] FI
Mask AVX512F Mask Shift the bits of 16-bit mask "a" right by "count" while shifting in zeros, and store the least significant 16 bits of the result in "k". k[MAX:0] := 0 IF count[7:0] <= 15 k[15:0] := a[15:0] >> count[7:0] FI
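The two mask-shift entries above clear "k" first and only perform the shift when the count is at most 15, so any larger count yields zero. A minimal sketch of that behaviour, assuming masks are Python integers and using hypothetical function names:

```python
MASK16 = 0xFFFF


def kshift_left_16(a, count):
    """Shift a 16-bit mask left, shifting in zeros; counts above 15 give 0."""
    count &= 0xFF              # only count[7:0] is examined
    if count > 15:
        return 0
    return ((a & MASK16) << count) & MASK16


def kshift_right_16(a, count):
    """Shift a 16-bit mask right, shifting in zeros; counts above 15 give 0."""
    count &= 0xFF
    if count > 15:
        return 0
    return (a & MASK16) >> count
```

The explicit `& MASK16` after the left shift is needed because Python integers are unbounded. In the pseudocode the truncation is implied by the assignment to k[15:0].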
Mask AVX512F Load Load 16-bit mask from memory into "k". k[15:0] := MEM[mem_addr+15:mem_addr]
Mask AVX512F Store Store 16-bit mask from "a" into memory. MEM[mem_addr+15:mem_addr] := a[15:0]
Mask AVX512F Mask Compute the bitwise OR of 16-bit masks "a" and "b". If the result is all zeros, store 1 in "dst", otherwise store 0 in "dst". If the result is all ones, store 1 in "all_ones", otherwise store 0 in "all_ones". tmp[15:0] := a[15:0] OR b[15:0] IF tmp[15:0] == 0x0 dst := 1 ELSE dst := 0 FI IF tmp[15:0] == 0xFFFF MEM[all_ones+7:all_ones] := 1 ELSE MEM[all_ones+7:all_ones] := 0 FI
Mask AVX512F Mask Compute the bitwise OR of 16-bit masks "a" and "b". If the result is all zeros, store 1 in "dst", otherwise store 0 in "dst". tmp[15:0] := a[15:0] OR b[15:0] IF tmp[15:0] == 0x0 dst := 1 ELSE dst := 0 FI
Mask AVX512F Mask Compute the bitwise OR of 16-bit masks "a" and "b". If the result is all ones, store 1 in "dst", otherwise store 0 in "dst". tmp[15:0] := a[15:0] OR b[15:0] IF tmp[15:0] == 0xFFFF dst := 1 ELSE dst := 0 FI
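The three OR-test entries above all derive their results from the same temporary, a[15:0] OR b[15:0]. A minimal sketch that returns both flags at once, assuming masks are Python integers; the function name and the tuple return are hypothetical conveniences, not the shape of the actual interface:

```python
def kortest_16(a, b):
    """Return (is_all_zeros, is_all_ones) for the OR of two 16-bit masks."""
    tmp = (a | b) & 0xFFFF
    all_zeros = 1 if tmp == 0x0 else 0
    all_ones = 1 if tmp == 0xFFFF else 0
    return all_zeros, all_ones
```

The first element corresponds to the "dst" of the all-zeros entry and the second to the "dst" of the all-ones entry. The combined entry reports both, with the second delivered through "all_ones".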
AVX512F Mask Convert 16-bit mask "a" into an integer value, and store the result in "dst". dst := ZeroExtend32(a[15:0])
AVX512F Mask Convert integer value "a" into a 16-bit mask, and store the result in "k". k := ZeroExtend16(a[15:0])
Mask AVX512F Mask Compute the bitwise NOT of 16-bit mask "a" and then AND with 16-bit mask "b", and store the result in "k". k[15:0] := (NOT a[15:0]) AND b[15:0] k[MAX:16] := 0
Mask AVX512F Mask Compute the bitwise AND of 16-bit masks "a" and "b", and store the result in "k". k[15:0] := a[15:0] AND b[15:0] k[MAX:16] := 0
Mask AVX512F Mask Copy 16-bit mask "a" to "k". k[15:0] := a[15:0] k[MAX:16] := 0
Mask AVX512F Mask Compute the bitwise NOT of 16-bit mask "a", and store the result in "k". k[15:0] := NOT a[15:0] k[MAX:16] := 0
Mask AVX512F Mask Compute the bitwise OR of 16-bit masks "a" and "b", and store the result in "k". k[15:0] := a[15:0] OR b[15:0] k[MAX:16] := 0
Mask AVX512F Mask Unpack and interleave 8 bits from masks "a" and "b", and store the 16-bit result in "k". k[7:0] := b[7:0] k[15:8] := a[7:0] k[MAX:16] := 0
Mask AVX512F Mask Compute the bitwise XNOR of 16-bit masks "a" and "b", and store the result in "k". k[15:0] := NOT (a[15:0] XOR b[15:0]) k[MAX:16] := 0
Mask AVX512F Mask Compute the bitwise XOR of 16-bit masks "a" and "b", and store the result in "k". k[15:0] := a[15:0] XOR b[15:0] k[MAX:16] := 0
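Among the 16-bit mask operations listed above, the AND-NOT, XNOR and unpack entries are the easiest to get wrong by hand, because the first inverts only "a", the second needs truncation to 16 bits, and the third places "a" in the upper byte. A minimal sketch, assuming masks are Python integers and using hypothetical function names:

```python
MASK16 = 0xFFFF


def kandn_16(a, b):
    """(NOT a) AND b, truncated to 16 bits."""
    return (~a & b) & MASK16


def kxnor_16(a, b):
    """NOT (a XOR b), truncated to 16 bits."""
    return ~(a ^ b) & MASK16


def kunpackb_16(a, b):
    """Low byte of b goes to k[7:0], low byte of a goes to k[15:8]."""
    return ((a & 0xFF) << 8) | (b & 0xFF)
```

Truncation with `MASK16` matters in Python because `~` on an integer produces a negative number. The pseudocode gets the same effect from the k[15:0] assignment followed by k[MAX:16] := 0.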
Floating Point AVX512F Arithmetic Add packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := a[i+63:i] + b[i+63:i] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Arithmetic Add packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [round_note] FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := a[i+63:i] + b[i+63:i] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Arithmetic Add packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := a[i+31:i] + b[i+31:i] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Arithmetic Add packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [round_note] FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := a[i+31:i] + b[i+31:i] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Arithmetic Add the lower double-precision (64-bit) floating-point element in "a" and "b", store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst". [round_note] dst[63:0] := a[63:0] + b[63:0] dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point AVX512F Arithmetic Add the lower double-precision (64-bit) floating-point element in "a" and "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". [round_note] IF k[0] dst[63:0] := a[63:0] + b[63:0] ELSE dst[63:0] := src[63:0] FI dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point AVX512F Arithmetic Add the lower double-precision (64-bit) floating-point element in "a" and "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". IF k[0] dst[63:0] := a[63:0] + b[63:0] ELSE dst[63:0] := src[63:0] FI dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point AVX512F Arithmetic Add the lower double-precision (64-bit) floating-point element in "a" and "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". [round_note] IF k[0] dst[63:0] := a[63:0] + b[63:0] ELSE dst[63:0] := 0 FI dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point AVX512F Arithmetic Add the lower double-precision (64-bit) floating-point element in "a" and "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". IF k[0] dst[63:0] := a[63:0] + b[63:0] ELSE dst[63:0] := 0 FI dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point AVX512F Arithmetic Add the lower single-precision (32-bit) floating-point element in "a" and "b", store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". [round_note] dst[31:0] := a[31:0] + b[31:0] dst[127:32] := a[127:32] dst[MAX:128] := 0
Floating Point AVX512F Arithmetic Add the lower single-precision (32-bit) floating-point element in "a" and "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". [round_note] IF k[0] dst[31:0] := a[31:0] + b[31:0] ELSE dst[31:0] := src[31:0] FI dst[127:32] := a[127:32] dst[MAX:128] := 0
Floating Point AVX512F Arithmetic Add the lower single-precision (32-bit) floating-point element in "a" and "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". IF k[0] dst[31:0] := a[31:0] + b[31:0] ELSE dst[31:0] := src[31:0] FI dst[127:32] := a[127:32] dst[MAX:128] := 0
Floating Point AVX512F Arithmetic Add the lower single-precision (32-bit) floating-point element in "a" and "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". [round_note] IF k[0] dst[31:0] := a[31:0] + b[31:0] ELSE dst[31:0] := 0 FI dst[127:32] := a[127:32] dst[MAX:128] := 0
Floating Point AVX512F Arithmetic Add the lower single-precision (32-bit) floating-point element in "a" and "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". IF k[0] dst[31:0] := a[31:0] + b[31:0] ELSE dst[31:0] := 0 FI dst[127:32] := a[127:32] dst[MAX:128] := 0
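The writemask and zeromask forms of the scalar add differ only in what lands in the low lane when mask bit 0 is clear. The following is a minimal Python sketch of that selection, assuming a vector is modeled as a list of lane values with lane 0 first. The names `mask_add_low` and `maskz_add_low` are illustrative, not intrinsic names, and Python floats are double precision, so the single-precision rounding of the real instruction is not modeled.

```python
def mask_add_low(src, k, a, b):
    """Writemask form: low lane is a[0] + b[0] if bit 0 of k is set, else src[0].

    The upper lanes are always copied from a.
    """
    low = a[0] + b[0] if k & 1 else src[0]
    return [low] + list(a[1:])


def maskz_add_low(k, a, b):
    """Zeromask form: low lane is a[0] + b[0] if bit 0 of k is set, else 0.0."""
    low = a[0] + b[0] if k & 1 else 0.0
    return [low] + list(a[1:])
```

Only bit 0 of `k` is consulted, so a mask such as `0b10` behaves the same as a mask of zero.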
Integer AVX512F Miscellaneous Concatenate "a" and "b" into a 128-byte intermediate result, shift the result right by "imm8" 32-bit elements, and store the low 64 bytes (16 elements) in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). temp[1023:512] := a[511:0] temp[511:0] := b[511:0] temp[1023:0] := temp[1023:0] >> (32*imm8[3:0]) FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := temp[i+31:i] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Miscellaneous Concatenate "a" and "b" into a 128-byte intermediate result, shift the result right by "imm8" 64-bit elements, and store the low 64 bytes (8 elements) in "dst". temp[1023:512] := a[511:0] temp[511:0] := b[511:0] temp[1023:0] := temp[1023:0] >> (64*imm8[2:0]) dst[511:0] := temp[511:0] dst[MAX:512] := 0
Integer AVX512F Miscellaneous Concatenate "a" and "b" into a 128-byte intermediate result, shift the result right by "imm8" 64-bit elements, and store the low 64 bytes (8 elements) in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). temp[1023:512] := a[511:0] temp[511:0] := b[511:0] temp[1023:0] := temp[1023:0] >> (64*imm8[2:0]) FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := temp[i+63:i] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Miscellaneous Concatenate "a" and "b" into a 128-byte intermediate result, shift the result right by "imm8" 64-bit elements, and store the low 64 bytes (8 elements) in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). temp[1023:512] := a[511:0] temp[511:0] := b[511:0] temp[1023:0] := temp[1023:0] >> (64*imm8[2:0]) FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := temp[i+63:i] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
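The concatenate-and-shift operation is easier to follow on lane lists than on bit ranges: "b" supplies the low half of the temporary, "a" the high half, and the shift drops the lowest lanes. The sketch below is a Python model under the assumption that each vector is a list of lane values with lane 0 first; `alignr` and `maskz_alignr` are hypothetical helper names, and the `lanes` parameter stands in for the 16-lane (32-bit) and 8-lane (64-bit) variants.

```python
def alignr(a, b, imm8, lanes=16):
    """Concatenate a (high) and b (low), shift right by whole lanes, keep the low half."""
    # Only the low bits of imm8 are used: imm8[3:0] for 16 lanes, imm8[2:0] for 8.
    count = imm8 & (lanes - 1)
    temp = list(b) + list(a)  # lane 0 of b is the lowest lane of the temporary
    return temp[count:count + lanes]


def maskz_alignr(k, a, b, imm8, lanes=16):
    """Zeromask form: lanes whose mask bit is clear are set to zero."""
    shifted = alignr(a, b, imm8, lanes)
    return [shifted[j] if (k >> j) & 1 else 0 for j in range(lanes)]
```

With `a = [100, ..., 115]`, `b = [0, ..., 15]` and a shift count of 3, the result starts at lane 3 of "b" and ends with the first three lanes of "a".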
Floating Point AVX512F Swizzle Broadcast the 4 packed single-precision (32-bit) floating-point elements from "a" to all elements of "dst". FOR j := 0 to 15 i := j*32 n := (j % 4)*32 dst[i+31:i] := a[n+31:n] ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Swizzle Broadcast the 4 packed single-precision (32-bit) floating-point elements from "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 n := (j % 4)*32 IF k[j] dst[i+31:i] := a[n+31:n] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Swizzle Broadcast the 4 packed single-precision (32-bit) floating-point elements from "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 n := (j % 4)*32 IF k[j] dst[i+31:i] := a[n+31:n] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Swizzle Broadcast the 4 packed double-precision (64-bit) floating-point elements from "a" to all elements of "dst". FOR j := 0 to 7 i := j*64 n := (j % 4)*64 dst[i+63:i] := a[n+63:n] ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Swizzle Broadcast the 4 packed double-precision (64-bit) floating-point elements from "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 n := (j % 4)*64 IF k[j] dst[i+63:i] := a[n+63:n] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Swizzle Broadcast the 4 packed double-precision (64-bit) floating-point elements from "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 n := (j % 4)*64 IF k[j] dst[i+63:i] := a[n+63:n] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Swizzle Broadcast the 4 packed 32-bit integers from "a" to all elements of "dst". FOR j := 0 to 15 i := j*32 n := (j % 4)*32 dst[i+31:i] := a[n+31:n] ENDFOR dst[MAX:512] := 0
Integer AVX512F Swizzle Broadcast the 4 packed 32-bit integers from "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 n := (j % 4)*32 IF k[j] dst[i+31:i] := a[n+31:n] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Swizzle Broadcast the 4 packed 32-bit integers from "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 n := (j % 4)*32 IF k[j] dst[i+31:i] := a[n+31:n] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Swizzle Broadcast the 4 packed 64-bit integers from "a" to all elements of "dst". FOR j := 0 to 7 i := j*64 n := (j % 4)*64 dst[i+63:i] := a[n+63:n] ENDFOR dst[MAX:512] := 0
Integer AVX512F Swizzle Broadcast the 4 packed 64-bit integers from "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 n := (j % 4)*64 IF k[j] dst[i+63:i] := a[n+63:n] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Swizzle Broadcast the 4 packed 64-bit integers from "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 n := (j % 4)*64 IF k[j] dst[i+63:i] := a[n+63:n] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
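All of the 4-element broadcasts above share one indexing rule: destination lane j takes source lane j % 4. A short Python sketch of that rule follows, assuming vectors are lists of lane values with lane 0 first. The function names are illustrative only, and the lane count (16 for 32-bit elements, 8 for 64-bit elements) is passed in or taken from "src".

```python
def broadcast_x4(a, lanes):
    """Repeat the 4 source lanes across the destination: dst[j] = a[j % 4]."""
    return [a[j % 4] for j in range(lanes)]


def mask_broadcast_x4(src, k, a):
    """Writemask form: lanes whose mask bit is clear keep the value from src."""
    return [a[j % 4] if (k >> j) & 1 else src[j] for j in range(len(src))]


def maskz_broadcast_x4(k, a, lanes):
    """Zeromask form: lanes whose mask bit is clear are set to zero."""
    return [a[j % 4] if (k >> j) & 1 else 0 for j in range(lanes)]
```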
Floating Point AVX512F Swizzle Broadcast the low double-precision (64-bit) floating-point element from "a" to all elements of "dst". FOR j := 0 to 7 i := j*64 dst[i+63:i] := a[63:0] ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Swizzle Broadcast the low double-precision (64-bit) floating-point element from "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := a[63:0] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Swizzle Broadcast the low double-precision (64-bit) floating-point element from "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := a[63:0] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Swizzle Broadcast the low single-precision (32-bit) floating-point element from "a" to all elements of "dst". FOR j := 0 to 15 i := j*32 dst[i+31:i] := a[31:0] ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Swizzle Broadcast the low single-precision (32-bit) floating-point element from "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := a[31:0] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Swizzle Broadcast the low single-precision (32-bit) floating-point element from "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := a[31:0] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point Mask AVX512F Compare Compare the lower double-precision (64-bit) floating-point element in "a" and "b" based on the comparison operand specified by "imm8", and store the result in mask vector "k". [sae_note] CASE (imm8[4:0]) OF 0: OP := _CMP_EQ_OQ 1: OP := _CMP_LT_OS 2: OP := _CMP_LE_OS 3: OP := _CMP_UNORD_Q 4: OP := _CMP_NEQ_UQ 5: OP := _CMP_NLT_US 6: OP := _CMP_NLE_US 7: OP := _CMP_ORD_Q 8: OP := _CMP_EQ_UQ 9: OP := _CMP_NGE_US 10: OP := _CMP_NGT_US 11: OP := _CMP_FALSE_OQ 12: OP := _CMP_NEQ_OQ 13: OP := _CMP_GE_OS 14: OP := _CMP_GT_OS 15: OP := _CMP_TRUE_UQ 16: OP := _CMP_EQ_OS 17: OP := _CMP_LT_OQ 18: OP := _CMP_LE_OQ 19: OP := _CMP_UNORD_S 20: OP := _CMP_NEQ_US 21: OP := _CMP_NLT_UQ 22: OP := _CMP_NLE_UQ 23: OP := _CMP_ORD_S 24: OP := _CMP_EQ_US 25: OP := _CMP_NGE_UQ 26: OP := _CMP_NGT_UQ 27: OP := _CMP_FALSE_OS 28: OP := _CMP_NEQ_OS 29: OP := _CMP_GE_OQ 30: OP := _CMP_GT_OQ 31: OP := _CMP_TRUE_US ESAC k[0] := ( a[63:0] OP b[63:0] ) ? 1 : 0 k[MAX:1] := 0
Floating Point Mask AVX512F Compare Compare the lower double-precision (64-bit) floating-point element in "a" and "b" based on the comparison operand specified by "imm8", and store the result in mask vector "k". CASE (imm8[4:0]) OF 0: OP := _CMP_EQ_OQ 1: OP := _CMP_LT_OS 2: OP := _CMP_LE_OS 3: OP := _CMP_UNORD_Q 4: OP := _CMP_NEQ_UQ 5: OP := _CMP_NLT_US 6: OP := _CMP_NLE_US 7: OP := _CMP_ORD_Q 8: OP := _CMP_EQ_UQ 9: OP := _CMP_NGE_US 10: OP := _CMP_NGT_US 11: OP := _CMP_FALSE_OQ 12: OP := _CMP_NEQ_OQ 13: OP := _CMP_GE_OS 14: OP := _CMP_GT_OS 15: OP := _CMP_TRUE_UQ 16: OP := _CMP_EQ_OS 17: OP := _CMP_LT_OQ 18: OP := _CMP_LE_OQ 19: OP := _CMP_UNORD_S 20: OP := _CMP_NEQ_US 21: OP := _CMP_NLT_UQ 22: OP := _CMP_NLE_UQ 23: OP := _CMP_ORD_S 24: OP := _CMP_EQ_US 25: OP := _CMP_NGE_UQ 26: OP := _CMP_NGT_UQ 27: OP := _CMP_FALSE_OS 28: OP := _CMP_NEQ_OS 29: OP := _CMP_GE_OQ 30: OP := _CMP_GT_OQ 31: OP := _CMP_TRUE_US ESAC k[0] := ( a[63:0] OP b[63:0] ) ? 1 : 0 k[MAX:1] := 0
Floating Point Mask AVX512F Compare Compare the lower double-precision (64-bit) floating-point element in "a" and "b" based on the comparison operand specified by "imm8", and store the result in mask vector "k" using zeromask "k1" (the element is zeroed out when mask bit 0 is not set). [sae_note] CASE (imm8[4:0]) OF 0: OP := _CMP_EQ_OQ 1: OP := _CMP_LT_OS 2: OP := _CMP_LE_OS 3: OP := _CMP_UNORD_Q 4: OP := _CMP_NEQ_UQ 5: OP := _CMP_NLT_US 6: OP := _CMP_NLE_US 7: OP := _CMP_ORD_Q 8: OP := _CMP_EQ_UQ 9: OP := _CMP_NGE_US 10: OP := _CMP_NGT_US 11: OP := _CMP_FALSE_OQ 12: OP := _CMP_NEQ_OQ 13: OP := _CMP_GE_OS 14: OP := _CMP_GT_OS 15: OP := _CMP_TRUE_UQ 16: OP := _CMP_EQ_OS 17: OP := _CMP_LT_OQ 18: OP := _CMP_LE_OQ 19: OP := _CMP_UNORD_S 20: OP := _CMP_NEQ_US 21: OP := _CMP_NLT_UQ 22: OP := _CMP_NLE_UQ 23: OP := _CMP_ORD_S 24: OP := _CMP_EQ_US 25: OP := _CMP_NGE_UQ 26: OP := _CMP_NGT_UQ 27: OP := _CMP_FALSE_OS 28: OP := _CMP_NEQ_OS 29: OP := _CMP_GE_OQ 30: OP := _CMP_GT_OQ 31: OP := _CMP_TRUE_US ESAC IF k1[0] k[0] := ( a[63:0] OP b[63:0] ) ? 1 : 0 ELSE k[0] := 0 FI k[MAX:1] := 0
Floating Point Mask AVX512F Compare Compare the lower double-precision (64-bit) floating-point element in "a" and "b" based on the comparison operand specified by "imm8", and store the result in mask vector "k" using zeromask "k1" (the element is zeroed out when mask bit 0 is not set). CASE (imm8[4:0]) OF 0: OP := _CMP_EQ_OQ 1: OP := _CMP_LT_OS 2: OP := _CMP_LE_OS 3: OP := _CMP_UNORD_Q 4: OP := _CMP_NEQ_UQ 5: OP := _CMP_NLT_US 6: OP := _CMP_NLE_US 7: OP := _CMP_ORD_Q 8: OP := _CMP_EQ_UQ 9: OP := _CMP_NGE_US 10: OP := _CMP_NGT_US 11: OP := _CMP_FALSE_OQ 12: OP := _CMP_NEQ_OQ 13: OP := _CMP_GE_OS 14: OP := _CMP_GT_OS 15: OP := _CMP_TRUE_UQ 16: OP := _CMP_EQ_OS 17: OP := _CMP_LT_OQ 18: OP := _CMP_LE_OQ 19: OP := _CMP_UNORD_S 20: OP := _CMP_NEQ_US 21: OP := _CMP_NLT_UQ 22: OP := _CMP_NLE_UQ 23: OP := _CMP_ORD_S 24: OP := _CMP_EQ_US 25: OP := _CMP_NGE_UQ 26: OP := _CMP_NGT_UQ 27: OP := _CMP_FALSE_OS 28: OP := _CMP_NEQ_OS 29: OP := _CMP_GE_OQ 30: OP := _CMP_GT_OQ 31: OP := _CMP_TRUE_US ESAC IF k1[0] k[0] := ( a[63:0] OP b[63:0] ) ? 1 : 0 ELSE k[0] := 0 FI k[MAX:1] := 0
Floating Point Mask AVX512F Compare Compare the lower single-precision (32-bit) floating-point element in "a" and "b" based on the comparison operand specified by "imm8", and store the result in mask vector "k". [sae_note] CASE (imm8[4:0]) OF 0: OP := _CMP_EQ_OQ 1: OP := _CMP_LT_OS 2: OP := _CMP_LE_OS 3: OP := _CMP_UNORD_Q 4: OP := _CMP_NEQ_UQ 5: OP := _CMP_NLT_US 6: OP := _CMP_NLE_US 7: OP := _CMP_ORD_Q 8: OP := _CMP_EQ_UQ 9: OP := _CMP_NGE_US 10: OP := _CMP_NGT_US 11: OP := _CMP_FALSE_OQ 12: OP := _CMP_NEQ_OQ 13: OP := _CMP_GE_OS 14: OP := _CMP_GT_OS 15: OP := _CMP_TRUE_UQ 16: OP := _CMP_EQ_OS 17: OP := _CMP_LT_OQ 18: OP := _CMP_LE_OQ 19: OP := _CMP_UNORD_S 20: OP := _CMP_NEQ_US 21: OP := _CMP_NLT_UQ 22: OP := _CMP_NLE_UQ 23: OP := _CMP_ORD_S 24: OP := _CMP_EQ_US 25: OP := _CMP_NGE_UQ 26: OP := _CMP_NGT_UQ 27: OP := _CMP_FALSE_OS 28: OP := _CMP_NEQ_OS 29: OP := _CMP_GE_OQ 30: OP := _CMP_GT_OQ 31: OP := _CMP_TRUE_US ESAC k[0] := ( a[31:0] OP b[31:0] ) ? 1 : 0 k[MAX:1] := 0
Floating Point Mask AVX512F Compare Compare the lower single-precision (32-bit) floating-point element in "a" and "b" based on the comparison operand specified by "imm8", and store the result in mask vector "k". CASE (imm8[4:0]) OF 0: OP := _CMP_EQ_OQ 1: OP := _CMP_LT_OS 2: OP := _CMP_LE_OS 3: OP := _CMP_UNORD_Q 4: OP := _CMP_NEQ_UQ 5: OP := _CMP_NLT_US 6: OP := _CMP_NLE_US 7: OP := _CMP_ORD_Q 8: OP := _CMP_EQ_UQ 9: OP := _CMP_NGE_US 10: OP := _CMP_NGT_US 11: OP := _CMP_FALSE_OQ 12: OP := _CMP_NEQ_OQ 13: OP := _CMP_GE_OS 14: OP := _CMP_GT_OS 15: OP := _CMP_TRUE_UQ 16: OP := _CMP_EQ_OS 17: OP := _CMP_LT_OQ 18: OP := _CMP_LE_OQ 19: OP := _CMP_UNORD_S 20: OP := _CMP_NEQ_US 21: OP := _CMP_NLT_UQ 22: OP := _CMP_NLE_UQ 23: OP := _CMP_ORD_S 24: OP := _CMP_EQ_US 25: OP := _CMP_NGE_UQ 26: OP := _CMP_NGT_UQ 27: OP := _CMP_FALSE_OS 28: OP := _CMP_NEQ_OS 29: OP := _CMP_GE_OQ 30: OP := _CMP_GT_OQ 31: OP := _CMP_TRUE_US ESAC k[0] := ( a[31:0] OP b[31:0] ) ? 1 : 0 k[MAX:1] := 0
Floating Point Mask AVX512F Compare Compare the lower single-precision (32-bit) floating-point element in "a" and "b" based on the comparison operand specified by "imm8", and store the result in mask vector "k" using zeromask "k1" (the element is zeroed out when mask bit 0 is not set). [sae_note] CASE (imm8[4:0]) OF 0: OP := _CMP_EQ_OQ 1: OP := _CMP_LT_OS 2: OP := _CMP_LE_OS 3: OP := _CMP_UNORD_Q 4: OP := _CMP_NEQ_UQ 5: OP := _CMP_NLT_US 6: OP := _CMP_NLE_US 7: OP := _CMP_ORD_Q 8: OP := _CMP_EQ_UQ 9: OP := _CMP_NGE_US 10: OP := _CMP_NGT_US 11: OP := _CMP_FALSE_OQ 12: OP := _CMP_NEQ_OQ 13: OP := _CMP_GE_OS 14: OP := _CMP_GT_OS 15: OP := _CMP_TRUE_UQ 16: OP := _CMP_EQ_OS 17: OP := _CMP_LT_OQ 18: OP := _CMP_LE_OQ 19: OP := _CMP_UNORD_S 20: OP := _CMP_NEQ_US 21: OP := _CMP_NLT_UQ 22: OP := _CMP_NLE_UQ 23: OP := _CMP_ORD_S 24: OP := _CMP_EQ_US 25: OP := _CMP_NGE_UQ 26: OP := _CMP_NGT_UQ 27: OP := _CMP_FALSE_OS 28: OP := _CMP_NEQ_OS 29: OP := _CMP_GE_OQ 30: OP := _CMP_GT_OQ 31: OP := _CMP_TRUE_US ESAC IF k1[0] k[0] := ( a[31:0] OP b[31:0] ) ? 1 : 0 ELSE k[0] := 0 FI k[MAX:1] := 0
Floating Point Mask AVX512F Compare Compare the lower single-precision (32-bit) floating-point element in "a" and "b" based on the comparison operand specified by "imm8", and store the result in mask vector "k" using zeromask "k1" (the element is zeroed out when mask bit 0 is not set). CASE (imm8[4:0]) OF 0: OP := _CMP_EQ_OQ 1: OP := _CMP_LT_OS 2: OP := _CMP_LE_OS 3: OP := _CMP_UNORD_Q 4: OP := _CMP_NEQ_UQ 5: OP := _CMP_NLT_US 6: OP := _CMP_NLE_US 7: OP := _CMP_ORD_Q 8: OP := _CMP_EQ_UQ 9: OP := _CMP_NGE_US 10: OP := _CMP_NGT_US 11: OP := _CMP_FALSE_OQ 12: OP := _CMP_NEQ_OQ 13: OP := _CMP_GE_OS 14: OP := _CMP_GT_OS 15: OP := _CMP_TRUE_UQ 16: OP := _CMP_EQ_OS 17: OP := _CMP_LT_OQ 18: OP := _CMP_LE_OQ 19: OP := _CMP_UNORD_S 20: OP := _CMP_NEQ_US 21: OP := _CMP_NLT_UQ 22: OP := _CMP_NLE_UQ 23: OP := _CMP_ORD_S 24: OP := _CMP_EQ_US 25: OP := _CMP_NGE_UQ 26: OP := _CMP_NGT_UQ 27: OP := _CMP_FALSE_OS 28: OP := _CMP_NEQ_OS 29: OP := _CMP_GE_OQ 30: OP := _CMP_GT_OQ 31: OP := _CMP_TRUE_US ESAC IF k1[0] k[0] := ( a[31:0] OP b[31:0] ) ? 1 : 0 ELSE k[0] := 0 FI k[MAX:1] := 0
Floating Point Flag AVX512F Compare Compare the lower double-precision (64-bit) floating-point element in "a" and "b" based on the comparison operand specified by "imm8", and return the boolean result (0 or 1). [sae_note] CASE (imm8[4:0]) OF 0: OP := _CMP_EQ_OQ 1: OP := _CMP_LT_OS 2: OP := _CMP_LE_OS 3: OP := _CMP_UNORD_Q 4: OP := _CMP_NEQ_UQ 5: OP := _CMP_NLT_US 6: OP := _CMP_NLE_US 7: OP := _CMP_ORD_Q 8: OP := _CMP_EQ_UQ 9: OP := _CMP_NGE_US 10: OP := _CMP_NGT_US 11: OP := _CMP_FALSE_OQ 12: OP := _CMP_NEQ_OQ 13: OP := _CMP_GE_OS 14: OP := _CMP_GT_OS 15: OP := _CMP_TRUE_UQ 16: OP := _CMP_EQ_OS 17: OP := _CMP_LT_OQ 18: OP := _CMP_LE_OQ 19: OP := _CMP_UNORD_S 20: OP := _CMP_NEQ_US 21: OP := _CMP_NLT_UQ 22: OP := _CMP_NLE_UQ 23: OP := _CMP_ORD_S 24: OP := _CMP_EQ_US 25: OP := _CMP_NGE_UQ 26: OP := _CMP_NGT_UQ 27: OP := _CMP_FALSE_OS 28: OP := _CMP_NEQ_OS 29: OP := _CMP_GE_OQ 30: OP := _CMP_GT_OQ 31: OP := _CMP_TRUE_US ESAC RETURN ( a[63:0] OP b[63:0] ) ? 1 : 0
Floating Point Flag AVX512F Compare Compare the lower single-precision (32-bit) floating-point element in "a" and "b" based on the comparison operand specified by "imm8", and return the boolean result (0 or 1). [sae_note] CASE (imm8[4:0]) OF 0: OP := _CMP_EQ_OQ 1: OP := _CMP_LT_OS 2: OP := _CMP_LE_OS 3: OP := _CMP_UNORD_Q 4: OP := _CMP_NEQ_UQ 5: OP := _CMP_NLT_US 6: OP := _CMP_NLE_US 7: OP := _CMP_ORD_Q 8: OP := _CMP_EQ_UQ 9: OP := _CMP_NGE_US 10: OP := _CMP_NGT_US 11: OP := _CMP_FALSE_OQ 12: OP := _CMP_NEQ_OQ 13: OP := _CMP_GE_OS 14: OP := _CMP_GT_OS 15: OP := _CMP_TRUE_UQ 16: OP := _CMP_EQ_OS 17: OP := _CMP_LT_OQ 18: OP := _CMP_LE_OQ 19: OP := _CMP_UNORD_S 20: OP := _CMP_NEQ_US 21: OP := _CMP_NLT_UQ 22: OP := _CMP_NLE_UQ 23: OP := _CMP_ORD_S 24: OP := _CMP_EQ_US 25: OP := _CMP_NGE_UQ 26: OP := _CMP_NGT_UQ 27: OP := _CMP_FALSE_OS 28: OP := _CMP_NEQ_OS 29: OP := _CMP_GE_OQ 30: OP := _CMP_GT_OQ 31: OP := _CMP_TRUE_US ESAC RETURN ( a[31:0] OP b[31:0] ) ? 1 : 0
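The 32 predicate names in the CASE table encode two things: the relation (EQ, LT, NLT, ...) and, in the first letter of the suffix, whether the result is true (U, unordered) or false (O, ordered) when either input is NaN. The Python sketch below decodes imm8[4:0] using that naming convention. It is a model of the returned boolean only: the signaling versus quiet distinction (the S or Q in the suffix) affects exception behavior, which is not modeled here, and the names `compare` and `mask_compare` are hypothetical.

```python
import math

# Predicate names in imm8 order, with the leading "_CMP_" dropped.
PREDICATES = [
    "EQ_OQ", "LT_OS", "LE_OS", "UNORD_Q", "NEQ_UQ", "NLT_US", "NLE_US", "ORD_Q",
    "EQ_UQ", "NGE_US", "NGT_US", "FALSE_OQ", "NEQ_OQ", "GE_OS", "GT_OS", "TRUE_UQ",
    "EQ_OS", "LT_OQ", "LE_OQ", "UNORD_S", "NEQ_US", "NLT_UQ", "NLE_UQ", "ORD_S",
    "EQ_US", "NGE_UQ", "NGT_UQ", "FALSE_OS", "NEQ_OS", "GE_OQ", "GT_OQ", "TRUE_US",
]

# Result of each relation when neither input is NaN.
RELATIONS = {
    "EQ": lambda a, b: a == b,
    "NEQ": lambda a, b: a != b,
    "LT": lambda a, b: a < b,
    "LE": lambda a, b: a <= b,
    "GT": lambda a, b: a > b,
    "GE": lambda a, b: a >= b,
    "NLT": lambda a, b: a >= b,
    "NLE": lambda a, b: a > b,
    "NGT": lambda a, b: a <= b,
    "NGE": lambda a, b: a < b,
    "TRUE": lambda a, b: True,
    "FALSE": lambda a, b: False,
}


def compare(a, b, imm8):
    """Return 1 or 0 for the predicate selected by the low 5 bits of imm8."""
    relation, _, flags = PREDICATES[imm8 & 0x1F].partition("_")
    unordered = math.isnan(a) or math.isnan(b)
    if relation == "UNORD":
        return int(unordered)
    if relation == "ORD":
        return int(not unordered)
    if unordered:
        return int(flags[0] == "U")  # U predicates are true on NaN, O are false
    return int(RELATIONS[relation](a, b))


def mask_compare(k1, a, b, imm8):
    """Zeromask form: the result bit is forced to 0 when bit 0 of k1 is clear."""
    return compare(a, b, imm8) if k1 & 1 else 0
```

For example, `compare(float("nan"), 2.0, 0)` (EQ_OQ) yields 0, while `compare(float("nan"), 2.0, 4)` (NEQ_UQ) yields 1.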
Floating Point AVX512F Swizzle Contiguously store the active double-precision (64-bit) floating-point elements in "a" (those with their respective bit set in writemask "k") to "dst", and pass through the remaining elements from "src". size := 64 m := 0 FOR j := 0 to 7 i := j*64 IF k[j] dst[m+size-1:m] := a[i+63:i] m := m + size FI ENDFOR dst[511:m] := src[511:m] dst[MAX:512] := 0
Floating Point AVX512F Store Swizzle Contiguously store the active double-precision (64-bit) floating-point elements in "a" (those with their respective bit set in writemask "k") to unaligned memory at "base_addr". size := 64 m := base_addr FOR j := 0 to 7 i := j*64 IF k[j] MEM[m+size-1:m] := a[i+63:i] m := m + size FI ENDFOR
Floating Point AVX512F Swizzle Contiguously store the active double-precision (64-bit) floating-point elements in "a" (those with their respective bit set in zeromask "k") to "dst", and set the remaining elements to zero. size := 64 m := 0 FOR j := 0 to 7 i := j*64 IF k[j] dst[m+size-1:m] := a[i+63:i] m := m + size FI ENDFOR dst[511:m] := 0 dst[MAX:512] := 0
Floating Point AVX512F Swizzle Contiguously store the active single-precision (32-bit) floating-point elements in "a" (those with their respective bit set in writemask "k") to "dst", and pass through the remaining elements from "src". size := 32 m := 0 FOR j := 0 to 15 i := j*32 IF k[j] dst[m+size-1:m] := a[i+31:i] m := m + size FI ENDFOR dst[511:m] := src[511:m] dst[MAX:512] := 0
Floating Point AVX512F Store Swizzle Contiguously store the active single-precision (32-bit) floating-point elements in "a" (those with their respective bit set in writemask "k") to unaligned memory at "base_addr". size := 32 m := base_addr FOR j := 0 to 15 i := j*32 IF k[j] MEM[m+size-1:m] := a[i+31:i] m := m + size FI ENDFOR
Floating Point AVX512F Swizzle Contiguously store the active single-precision (32-bit) floating-point elements in "a" (those with their respective bit set in zeromask "k") to "dst", and set the remaining elements to zero. size := 32 m := 0 FOR j := 0 to 15 i := j*32 IF k[j] dst[m+size-1:m] := a[i+31:i] m := m + size FI ENDFOR dst[511:m] := 0 dst[MAX:512] := 0
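The compress operations pack the lanes selected by the mask into the lowest positions and then fill the rest from "src", from zero, or (for the store form) leave memory beyond the packed run untouched. A Python sketch of the three variants follows, assuming vectors are lists of lane values with lane 0 first and memory is a plain list indexed in elements rather than bytes. All function names are illustrative.

```python
def _active(k, a):
    """Lanes of a whose mask bit is set, in ascending lane order."""
    return [a[j] for j in range(len(a)) if (k >> j) & 1]


def mask_compress(src, k, a):
    """Pack active lanes low; remaining upper lanes pass through from src."""
    packed = _active(k, a)
    return packed + list(src[len(packed):])


def maskz_compress(k, a):
    """Pack active lanes low; remaining upper lanes are zero."""
    packed = _active(k, a)
    return packed + [0] * (len(a) - len(packed))


def compress_store(memory, base, k, a):
    """Write active lanes contiguously into memory starting at index base.

    Returns the number of elements written; other memory slots are untouched.
    """
    packed = _active(k, a)
    memory[base:base + len(packed)] = packed
    return len(packed)
```

Note that the pass-through lanes come from the same positions in "src", not from its low end: with four active lanes, the result's lanes 4 and up equal `src[4:]`.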
Floating Point Integer AVX512F Convert Convert packed signed 32-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst". FOR j := 0 to 7 i := j*32 m := j*64 dst[m+63:m] := Convert_Int32_To_FP64(a[i+31:i]) ENDFOR dst[MAX:512] := 0
Floating Point Integer AVX512F Convert Convert packed signed 32-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 m := j*64 IF k[j] dst[m+63:m] := Convert_Int32_To_FP64(a[i+31:i]) ELSE dst[m+63:m] := src[m+63:m] FI ENDFOR dst[MAX:512] := 0
Floating Point Integer AVX512F Convert Convert packed signed 32-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 m := j*64 IF k[j] dst[m+63:m] := Convert_Int32_To_FP64(a[i+31:i]) ELSE dst[m+63:m] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point Integer AVX512F Convert Convert packed signed 32-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst". [round_note] FOR j := 0 to 15 i := 32*j dst[i+31:i] := Convert_Int32_To_FP32(a[i+31:i]) ENDFOR dst[MAX:512] := 0
Floating Point Integer AVX512F Convert Convert packed signed 32-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst". FOR j := 0 to 15 i := 32*j dst[i+31:i] := Convert_Int32_To_FP32(a[i+31:i]) ENDFOR dst[MAX:512] := 0
Floating Point Integer AVX512F Convert Convert packed signed 32-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [round_note] FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := Convert_Int32_To_FP32(a[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point Integer AVX512F Convert Convert packed signed 32-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := Convert_Int32_To_FP32(a[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point Integer AVX512F Convert Convert packed signed 32-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [round_note] FOR j := 0 to 15 i := 32*j IF k[j] dst[i+31:i] := Convert_Int32_To_FP32(a[i+31:i]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point Integer AVX512F Convert Convert packed signed 32-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := 32*j IF k[j] dst[i+31:i] := Convert_Int32_To_FP32(a[i+31:i]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
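Unlike the conversion to double precision, the conversion from a 32-bit integer to single precision can lose information, because a single-precision value carries only 24 significant bits. The Python sketch below illustrates this, assuming the default round-to-nearest-even mode (the variants carrying a rounding note may use a different mode, which is not modeled). It uses the standard `struct` module to round a value to single precision; `int32_to_fp32` and `mask_cvt_int32_to_fp32` are hypothetical names.

```python
import struct


def int32_to_fp32(x):
    """Round a signed 32-bit integer to the nearest single-precision value."""
    # Packing as a 4-byte float rounds to single precision; unpacking returns
    # the rounded value as a Python float.
    return struct.unpack("<f", struct.pack("<f", float(x)))[0]


def mask_cvt_int32_to_fp32(src, k, a):
    """Writemask form: lanes whose mask bit is clear keep the value from src."""
    return [int32_to_fp32(a[j]) if (k >> j) & 1 else src[j] for j in range(len(a))]
```

Integers up to 2^24 in magnitude convert exactly; 16777217 (2^24 + 1) is the first positive value that does not.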
Floating Point Integer AVX512F Convert Convert packed double-precision (64-bit) floating-point elements in "a" to packed 32-bit integers, and store the results in "dst". [round_note] FOR j := 0 to 7 i := 32*j l := 64*j dst[i+31:i] := Convert_FP64_To_Int32(a[l+63:l]) ENDFOR dst[MAX:256] := 0
Floating Point Integer AVX512F Convert Convert packed double-precision (64-bit) floating-point elements in "a" to packed 32-bit integers, and store the results in "dst". FOR j := 0 to 7 i := 32*j l := 64*j dst[i+31:i] := Convert_FP64_To_Int32(a[l+63:l]) ENDFOR dst[MAX:256] := 0
Floating Point Integer AVX512F Convert Convert packed double-precision (64-bit) floating-point elements in "a" to packed 32-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [round_note] FOR j := 0 to 7 i := j*32 l := j*64 IF k[j] dst[i+31:i] := Convert_FP64_To_Int32(a[l+63:l]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:256] := 0
Floating Point Integer AVX512F Convert Convert packed double-precision (64-bit) floating-point elements in "a" to packed 32-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 l := j*64 IF k[j] dst[i+31:i] := Convert_FP64_To_Int32(a[l+63:l]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:256] := 0
Floating Point Integer AVX512F Convert Convert packed double-precision (64-bit) floating-point elements in "a" to packed 32-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [round_note] FOR j := 0 to 7 i := 32*j l := 64*j IF k[j] dst[i+31:i] := Convert_FP64_To_Int32(a[l+63:l]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:256] := 0
Floating Point Integer AVX512F Convert Convert packed double-precision (64-bit) floating-point elements in "a" to packed 32-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := 32*j l := 64*j IF k[j] dst[i+31:i] := Convert_FP64_To_Int32(a[l+63:l]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:256] := 0
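The double-to-integer conversions depend on a rounding mode, which is why each form appears twice, once with a rounding note and once without. The Python sketch below models the per-lane conversion under four common rounding modes. Two things here are assumptions not stated in the text above: the mode names (`"nearest"`, `"down"`, `"up"`, `"zero"`) are hypothetical labels, and NaN or out-of-range inputs are assumed to produce the x86 "integer indefinite" value 0x80000000 (that is, -2^31 as a signed integer).

```python
import math

INT32_INDEFINITE = -2**31  # assumed result for NaN and out-of-range inputs

ROUNDERS = {
    "nearest": round,    # Python's round() rounds ties to even
    "down": math.floor,  # toward negative infinity
    "up": math.ceil,     # toward positive infinity
    "zero": math.trunc,  # toward zero
}


def fp64_to_int32(x, mode="nearest"):
    """Convert one double to a signed 32-bit integer under the given mode."""
    if math.isnan(x) or math.isinf(x):
        return INT32_INDEFINITE
    r = ROUNDERS[mode](x)
    return r if -2**31 <= r <= 2**31 - 1 else INT32_INDEFINITE


def maskz_cvt_fp64_to_int32(k, a, mode="nearest"):
    """Zeromask form: lanes whose mask bit is clear are set to zero."""
    return [fp64_to_int32(a[j], mode) if (k >> j) & 1 else 0 for j in range(len(a))]
```

Under the `"nearest"` mode, 2.5 converts to 2 and 3.5 converts to 4, since ties go to the even integer.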
Floating Point AVX512F Convert Convert packed double-precision (64-bit) floating-point elements in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst". [round_note] FOR j := 0 to 7 i := 32*j l := 64*j dst[i+31:i] := Convert_FP64_To_FP32(a[l+63:l]) ENDFOR dst[MAX:256] := 0
Floating Point AVX512F Convert Convert packed double-precision (64-bit) floating-point elements in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst". FOR j := 0 to 7 i := 32*j l := 64*j dst[i+31:i] := Convert_FP64_To_FP32(a[l+63:l]) ENDFOR dst[MAX:256] := 0
Floating Point AVX512F Convert Convert packed double-precision (64-bit) floating-point elements in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [round_note] FOR j := 0 to 7 i := j*32 l := j*64 IF k[j] dst[i+31:i] := Convert_FP64_To_FP32(a[l+63:l]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512F Convert Convert packed double-precision (64-bit) floating-point elements in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := 32*j l := 64*j IF k[j] dst[i+31:i] := Convert_FP64_To_FP32(a[l+63:l]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512F Convert Convert packed double-precision (64-bit) floating-point elements in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [round_note] FOR j := 0 to 7 i := j*32 l := j*64 IF k[j] dst[i+31:i] := Convert_FP64_To_FP32(a[l+63:l]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512F Convert Convert packed double-precision (64-bit) floating-point elements in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 l := j*64 IF k[j] dst[i+31:i] := Convert_FP64_To_FP32(a[l+63:l]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:256] := 0
Floating Point Integer AVX512F Convert Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 32-bit integers, and store the results in "dst". [round_note] FOR j := 0 to 7 i := 32*j l := 64*j dst[i+31:i] := Convert_FP64_To_UInt32(a[l+63:l]) ENDFOR dst[MAX:256] := 0
Floating Point Integer AVX512F Convert Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 32-bit integers, and store the results in "dst". FOR j := 0 to 7 i := 32*j l := 64*j dst[i+31:i] := Convert_FP64_To_UInt32(a[l+63:l]) ENDFOR dst[MAX:256] := 0
Floating Point Integer AVX512F Convert Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 32-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [round_note] FOR j := 0 to 7 i := j*32 l := j*64 IF k[j] dst[i+31:i] := Convert_FP64_To_UInt32(a[l+63:l]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:256] := 0
Floating Point Integer AVX512F Convert Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 32-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 l := j*64 IF k[j] dst[i+31:i] := Convert_FP64_To_UInt32(a[l+63:l]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:256] := 0
Floating Point Integer AVX512F Convert Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 32-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [round_note] FOR j := 0 to 7 i := 32*j l := 64*j IF k[j] dst[i+31:i] := Convert_FP64_To_UInt32(a[l+63:l]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:256] := 0
Floating Point Integer AVX512F Convert Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 32-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := 32*j l := 64*j IF k[j] dst[i+31:i] := Convert_FP64_To_UInt32(a[l+63:l]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512F Convert Convert packed half-precision (16-bit) floating-point elements in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst". [sae_note] FOR j := 0 to 15 i := j*32 m := j*16 dst[i+31:i] := Convert_FP16_To_FP32(a[m+15:m]) ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Convert Convert packed half-precision (16-bit) floating-point elements in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst". FOR j := 0 to 15 i := j*32 m := j*16 dst[i+31:i] := Convert_FP16_To_FP32(a[m+15:m]) ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Convert Convert packed half-precision (16-bit) floating-point elements in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [sae_note] FOR j := 0 to 15 i := j*32 m := j*16 IF k[j] dst[i+31:i] := Convert_FP16_To_FP32(a[m+15:m]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Convert Convert packed half-precision (16-bit) floating-point elements in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 m := j*16 IF k[j] dst[i+31:i] := Convert_FP16_To_FP32(a[m+15:m]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Convert Convert packed half-precision (16-bit) floating-point elements in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [sae_note] FOR j := 0 to 15 i := j*32 m := j*16 IF k[j] dst[i+31:i] := Convert_FP16_To_FP32(a[m+15:m]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Convert Convert packed half-precision (16-bit) floating-point elements in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 m := j*16 IF k[j] dst[i+31:i] := Convert_FP16_To_FP32(a[m+15:m]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
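As a rough illustration of the half-to-single widening described above, the sketch below decodes 16 half-precision bit patterns with Python's `struct` module (format code `e` is IEEE 754 binary16). Every binary16 value is exactly representable in binary32, so no rounding is involved. The function names are hypothetical, and the lanes are modelled as a plain list of 16-bit integers.

```python
import struct


def fp16_bits_to_float(h: int) -> float:
    """Decode one IEEE 754 binary16 bit pattern into a Python float."""
    return struct.unpack("<e", struct.pack("<H", h & 0xFFFF))[0]


def cvt_fp16_to_fp32_zeromask(k, a):
    """a holds 16 half-precision bit patterns; lanes with a clear mask bit become 0.0."""
    return [fp16_bits_to_float(a[j]) if (k >> j) & 1 else 0.0
            for j in range(16)]


halves = [0x3C00, 0xC000, 0x7BFF, 0x0001] + [0x0000] * 12
print(cvt_fp16_to_fp32_zeromask(0b0111, halves)[:4])  # [1.0, -2.0, 65504.0, 0.0]
```

Lane 3 holds the smallest positive subnormal (`0x0001`), but its mask bit is clear, so the zeromask form writes 0.0 there.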
Floating Point Integer AVX512F Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed 32-bit integers, and store the results in "dst". [round_note] FOR j := 0 to 15 i := 32*j dst[i+31:i] := Convert_FP32_To_Int32(a[i+31:i]) ENDFOR dst[MAX:512] := 0
Floating Point Integer AVX512F Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed 32-bit integers, and store the results in "dst". FOR j := 0 to 15 i := 32*j dst[i+31:i] := Convert_FP32_To_Int32(a[i+31:i]) ENDFOR dst[MAX:512] := 0
Floating Point Integer AVX512F Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed 32-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [round_note] FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := Convert_FP32_To_Int32(a[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point Integer AVX512F Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed 32-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := Convert_FP32_To_Int32(a[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point Integer AVX512F Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed 32-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [round_note] FOR j := 0 to 15 i := 32*j IF k[j] dst[i+31:i] := Convert_FP32_To_Int32(a[i+31:i]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point Integer AVX512F Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed 32-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := 32*j IF k[j] dst[i+31:i] := Convert_FP32_To_Int32(a[i+31:i]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst". [sae_note] FOR j := 0 to 7 i := 64*j k := 32*j dst[i+63:i] := Convert_FP32_To_FP64(a[k+31:k]) ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst". FOR j := 0 to 7 i := 64*j k := 32*j dst[i+63:i] := Convert_FP32_To_FP64(a[k+31:k]) ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [sae_note] FOR j := 0 to 7 i := 64*j l := 32*j IF k[j] dst[i+63:i] := Convert_FP32_To_FP64(a[l+31:l]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := 64*j l := 32*j IF k[j] dst[i+63:i] := Convert_FP32_To_FP64(a[l+31:l]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [sae_note] FOR j := 0 to 7 i := 64*j l := 32*j IF k[j] dst[i+63:i] := Convert_FP32_To_FP64(a[l+31:l]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := 64*j l := 32*j IF k[j] dst[i+63:i] := Convert_FP32_To_FP64(a[l+31:l]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst". [sae_note] FOR j := 0 to 15 i := 16*j l := 32*j dst[i+15:i] := Convert_FP32_To_FP16(a[l+31:l]) ENDFOR dst[MAX:256] := 0
Floating Point AVX512F Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst". [sae_note] FOR j := 0 to 15 i := 16*j l := 32*j dst[i+15:i] := Convert_FP32_To_FP16(a[l+31:l]) ENDFOR dst[MAX:256] := 0
Floating Point AVX512F Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [sae_note] FOR j := 0 to 15 i := 16*j l := 32*j IF k[j] dst[i+15:i] := Convert_FP32_To_FP16(a[l+31:l]) ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512F Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [sae_note] FOR j := 0 to 15 i := 16*j l := 32*j IF k[j] dst[i+15:i] := Convert_FP32_To_FP16(a[l+31:l]) ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512F Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [sae_note] FOR j := 0 to 15 i := 16*j l := 32*j IF k[j] dst[i+15:i] := Convert_FP32_To_FP16(a[l+31:l]) ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512F Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [sae_note] FOR j := 0 to 15 i := 16*j l := 32*j IF k[j] dst[i+15:i] := Convert_FP32_To_FP16(a[l+31:l]) ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:256] := 0
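The single-to-half narrowing above is lossy, unlike the widening direction. A minimal sketch, assuming round-to-nearest-even (the actual instruction takes a rounding control that this model ignores) and inputs that fit in half precision; `struct.pack` raises `OverflowError` for values too large for binary16, which is not how the hardware behaves. The function names are hypothetical.

```python
import struct


def float_to_fp16_bits(x: float) -> int:
    """Encode x as an IEEE 754 binary16 bit pattern (round-to-nearest-even)."""
    return struct.unpack("<H", struct.pack("<e", x))[0]


def cvt_fp32_to_fp16_writemask(src, k, a):
    """Lane j takes the narrowed a[j] if bit j of k is set, else src[j]."""
    return [float_to_fp16_bits(a[j]) if (k >> j) & 1 else src[j]
            for j in range(16)]


print(hex(float_to_fp16_bits(1.0)))               # 0x3c00
print(hex(float_to_fp16_bits(1.0 + 2.0 ** -11)))  # 0x3c00 (tie rounds to even)
print(hex(float_to_fp16_bits(1.0 + 3 * 2.0 ** -11)))  # 0x3c02 (tie rounds to even)
```

Sixteen 32-bit inputs produce sixteen 16-bit outputs, matching the `dst[MAX:256] := 0` line in the pseudocode.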
Floating Point Integer AVX512F Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 32-bit integers, and store the results in "dst". [round_note] FOR j := 0 to 15 i := 32*j dst[i+31:i] := Convert_FP32_To_UInt32(a[i+31:i]) ENDFOR dst[MAX:512] := 0
Floating Point Integer AVX512F Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 32-bit integers, and store the results in "dst". FOR j := 0 to 15 i := 32*j dst[i+31:i] := Convert_FP32_To_UInt32(a[i+31:i]) ENDFOR dst[MAX:512] := 0
Floating Point Integer AVX512F Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 32-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [round_note] FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := Convert_FP32_To_UInt32(a[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point Integer AVX512F Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 32-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := 32*j IF k[j] dst[i+31:i] := Convert_FP32_To_UInt32(a[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point Integer AVX512F Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 32-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [round_note] FOR j := 0 to 15 i := 32*j IF k[j] dst[i+31:i] := Convert_FP32_To_UInt32(a[i+31:i]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point Integer AVX512F Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 32-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := 32*j IF k[j] dst[i+31:i] := Convert_FP32_To_UInt32(a[i+31:i]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Convert Convert the lower double-precision (64-bit) floating-point element in "a" to a 32-bit integer, and store the result in "dst". [round_note] dst[31:0] := Convert_FP64_To_Int32(a[63:0])
Floating Point AVX512F Convert Convert the lower double-precision (64-bit) floating-point element in "a" to a 64-bit integer, and store the result in "dst". [round_note] dst[63:0] := Convert_FP64_To_Int64(a[63:0])
Floating Point Integer AVX512F Convert Convert the lower double-precision (64-bit) floating-point element in "a" to a 32-bit integer, and store the result in "dst". [round_note] dst[31:0] := Convert_FP64_To_Int32(a[63:0])
Floating Point Integer AVX512F Convert Convert the lower double-precision (64-bit) floating-point element in "a" to a 64-bit integer, and store the result in "dst". [round_note] dst[63:0] := Convert_FP64_To_Int64(a[63:0])
Floating Point AVX512F Convert Convert the lower double-precision (64-bit) floating-point element in "a" to a 32-bit integer, and store the result in "dst". dst[31:0] := Convert_FP64_To_Int32(a[63:0])
Floating Point AVX512F Convert Convert the lower double-precision (64-bit) floating-point element in "a" to a 64-bit integer, and store the result in "dst". dst[63:0] := Convert_FP64_To_Int64(a[63:0])
Floating Point AVX512F Convert Convert the lower double-precision (64-bit) floating-point element in "b" to a single-precision (32-bit) floating-point element, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". [round_note] dst[31:0] := Convert_FP64_To_FP32(b[63:0]) dst[127:32] := a[127:32] dst[MAX:128] := 0
Floating Point AVX512F Convert Convert the lower double-precision (64-bit) floating-point element in "b" to a single-precision (32-bit) floating-point element, store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". [round_note] IF k[0] dst[31:0] := Convert_FP64_To_FP32(b[63:0]) ELSE dst[31:0] := src[31:0] FI dst[127:32] := a[127:32] dst[MAX:128] := 0
Floating Point AVX512F Convert Convert the lower double-precision (64-bit) floating-point element in "b" to a single-precision (32-bit) floating-point element, store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". IF k[0] dst[31:0] := Convert_FP64_To_FP32(b[63:0]) ELSE dst[31:0] := src[31:0] FI dst[127:32] := a[127:32] dst[MAX:128] := 0
Floating Point AVX512F Convert Convert the lower double-precision (64-bit) floating-point element in "b" to a single-precision (32-bit) floating-point element, store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". [round_note] IF k[0] dst[31:0] := Convert_FP64_To_FP32(b[63:0]) ELSE dst[31:0] := 0 FI dst[127:32] := a[127:32] dst[MAX:128] := 0
Floating Point AVX512F Convert Convert the lower double-precision (64-bit) floating-point element in "b" to a single-precision (32-bit) floating-point element, store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". IF k[0] dst[31:0] := Convert_FP64_To_FP32(b[63:0]) ELSE dst[31:0] := 0 FI dst[127:32] := a[127:32] dst[MAX:128] := 0
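The scalar entries above touch only the lowest element and pass the upper three single-precision elements of "a" through unchanged. A small sketch of that data flow, assuming round-to-nearest-even for the narrowing and modelling each 128-bit vector as a Python list; the helper names are hypothetical.

```python
import struct


def to_fp32(x: float) -> float:
    """Round a binary64 value to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def scalar_cvt_writemask(src, k, a, b):
    """dst[0] is the narrowed b[0] if bit 0 of k is set, else src[0]; dst[1:4] come from a."""
    low = to_fp32(b[0]) if k & 1 else src[0]
    return [low] + list(a[1:4])


def scalar_cvt_zeromask(k, a, b):
    """dst[0] is the narrowed b[0] if bit 0 of k is set, else 0.0; dst[1:4] come from a."""
    low = to_fp32(b[0]) if k & 1 else 0.0
    return [low] + list(a[1:4])


a = [1.0, 2.0, 3.0, 4.0]    # four single-precision lanes
b = [0.5, 9.0]              # two double-precision lanes; only b[0] is read
src = [7.0, 0.0, 0.0, 0.0]
print(scalar_cvt_writemask(src, 1, a, b))  # [0.5, 2.0, 3.0, 4.0]
print(scalar_cvt_writemask(src, 0, a, b))  # [7.0, 2.0, 3.0, 4.0]
print(scalar_cvt_zeromask(0, a, b))        # [0.0, 2.0, 3.0, 4.0]
```

Only bit 0 of the mask matters here; the upper elements are copied from "a" regardless of "k".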
Floating Point Integer AVX512F Convert Convert the lower double-precision (64-bit) floating-point element in "a" to an unsigned 32-bit integer, and store the result in "dst". [round_note] dst[31:0] := Convert_FP64_To_UInt32(a[63:0])
Floating Point Integer AVX512F Convert Convert the lower double-precision (64-bit) floating-point element in "a" to an unsigned 64-bit integer, and store the result in "dst". [round_note] dst[63:0] := Convert_FP64_To_UInt64(a[63:0])
Floating Point Integer AVX512F Convert Convert the lower double-precision (64-bit) floating-point element in "a" to an unsigned 32-bit integer, and store the result in "dst". dst[31:0] := Convert_FP64_To_UInt32(a[63:0])
Floating Point Integer AVX512F Convert Convert the lower double-precision (64-bit) floating-point element in "a" to an unsigned 64-bit integer, and store the result in "dst". dst[63:0] := Convert_FP64_To_UInt64(a[63:0])
Floating Point AVX512F Convert Convert the signed 64-bit integer "b" to a double-precision (64-bit) floating-point element, store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst". [round_note] dst[63:0] := Convert_Int64_To_FP64(b[63:0]) dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point Integer AVX512F Convert Convert the signed 64-bit integer "b" to a double-precision (64-bit) floating-point element, store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst". [round_note] dst[63:0] := Convert_Int64_To_FP64(b[63:0]) dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point AVX512F Convert Convert the signed 32-bit integer "b" to a double-precision (64-bit) floating-point element, store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst". dst[63:0] := Convert_Int32_To_FP64(b[31:0]) dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point AVX512F Convert Convert the signed 64-bit integer "b" to a double-precision (64-bit) floating-point element, store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst". dst[63:0] := Convert_Int64_To_FP64(b[63:0]) dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point AVX512F Convert Convert the signed 32-bit integer "b" to a single-precision (32-bit) floating-point element, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". [round_note] dst[31:0] := Convert_Int32_To_FP32(b[31:0]) dst[127:32] := a[127:32] dst[MAX:128] := 0
Floating Point AVX512F Convert Convert the signed 64-bit integer "b" to a single-precision (32-bit) floating-point element, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". [round_note] dst[31:0] := Convert_Int64_To_FP32(b[63:0]) dst[127:32] := a[127:32] dst[MAX:128] := 0
Floating Point AVX512F Convert Convert the signed 32-bit integer "b" to a single-precision (32-bit) floating-point element, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". [round_note] dst[31:0] := Convert_Int32_To_FP32(b[31:0]) dst[127:32] := a[127:32] dst[MAX:128] := 0
Floating Point Integer AVX512F Convert Convert the signed 64-bit integer "b" to a single-precision (32-bit) floating-point element, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". [round_note] dst[31:0] := Convert_Int64_To_FP32(b[63:0]) dst[127:32] := a[127:32] dst[MAX:128] := 0
Floating Point AVX512F Convert Convert the signed 32-bit integer "b" to a single-precision (32-bit) floating-point element, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". dst[31:0] := Convert_Int32_To_FP32(b[31:0]) dst[127:32] := a[127:32] dst[MAX:128] := 0
Floating Point AVX512F Convert Convert the signed 64-bit integer "b" to a single-precision (32-bit) floating-point element, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". dst[31:0] := Convert_Int64_To_FP32(b[63:0]) dst[127:32] := a[127:32] dst[MAX:128] := 0
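Integer-to-single conversion can be inexact, which is why several of the entries above carry [round_note]: single precision has a 24-bit significand, so integers above 2^24 are not all representable. The sketch below assumes round-to-nearest-even and restricts itself to |b| <= 2^53, so that the intermediate conversion to a Python float is exact and no double rounding occurs. The names are hypothetical.

```python
import struct


def convert_int_to_fp32(b: int) -> float:
    """Round an integer to the nearest binary32 value (ties to even)."""
    if abs(b) > 2 ** 53:
        raise ValueError("this sketch only covers |b| <= 2**53")
    return struct.unpack("<f", struct.pack("<f", float(b)))[0]


def scalar_cvt_int_to_fp32(a, b):
    """dst[0] is the converted integer b; dst[1:4] are copied from a."""
    return [convert_int_to_fp32(b)] + list(a[1:4])


print(convert_int_to_fp32(16777216))  # 16777216.0 (exactly representable)
print(convert_int_to_fp32(16777217))  # 16777216.0 (tie rounds to even)
print(convert_int_to_fp32(16777219))  # 16777220.0 (tie rounds to even)
```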
Floating Point AVX512F Convert Convert the lower single-precision (32-bit) floating-point element in "b" to a double-precision (64-bit) floating-point element, store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst". [sae_note] dst[63:0] := Convert_FP32_To_FP64(b[31:0]) dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point AVX512F Convert Convert the lower single-precision (32-bit) floating-point element in "b" to a double-precision (64-bit) floating-point element, store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". [sae_note] IF k[0] dst[63:0] := Convert_FP32_To_FP64(b[31:0]) ELSE dst[63:0] := src[63:0] FI dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point AVX512F Convert Convert the lower single-precision (32-bit) floating-point element in "b" to a double-precision (64-bit) floating-point element, store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". IF k[0] dst[63:0] := Convert_FP32_To_FP64(b[31:0]) ELSE dst[63:0] := src[63:0] FI dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point AVX512F Convert Convert the lower single-precision (32-bit) floating-point element in "b" to a double-precision (64-bit) floating-point element, store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". [sae_note] IF k[0] dst[63:0] := Convert_FP32_To_FP64(b[31:0]) ELSE dst[63:0] := 0 FI dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point AVX512F Convert Convert the lower single-precision (32-bit) floating-point element in "b" to a double-precision (64-bit) floating-point element, store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". IF k[0] dst[63:0] := Convert_FP32_To_FP64(b[31:0]) ELSE dst[63:0] := 0 FI dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point AVX512F Convert Convert the lower single-precision (32-bit) floating-point element in "a" to a 32-bit integer, and store the result in "dst". [round_note] dst[31:0] := Convert_FP32_To_Int32(a[31:0])
Floating Point AVX512F Convert Convert the lower single-precision (32-bit) floating-point element in "a" to a 64-bit integer, and store the result in "dst". [round_note] dst[63:0] := Convert_FP32_To_Int64(a[31:0])
Floating Point Integer AVX512F Convert Convert the lower single-precision (32-bit) floating-point element in "a" to a 32-bit integer, and store the result in "dst". [round_note] dst[31:0] := Convert_FP32_To_Int32(a[31:0])
Floating Point Integer AVX512F Convert Convert the lower single-precision (32-bit) floating-point element in "a" to a 64-bit integer, and store the result in "dst". [round_note] dst[63:0] := Convert_FP32_To_Int64(a[31:0])
Floating Point AVX512F Convert Convert the lower single-precision (32-bit) floating-point element in "a" to a 32-bit integer, and store the result in "dst". dst[31:0] := Convert_FP32_To_Int32(a[31:0])
Floating Point AVX512F Convert Convert the lower single-precision (32-bit) floating-point element in "a" to a 64-bit integer, and store the result in "dst". dst[63:0] := Convert_FP32_To_Int64(a[31:0])
Floating Point Integer AVX512F Convert Convert the lower single-precision (32-bit) floating-point element in "a" to an unsigned 32-bit integer, and store the result in "dst". [round_note] dst[31:0] := Convert_FP32_To_UInt32(a[31:0])
Floating Point Integer AVX512F Convert Convert the lower single-precision (32-bit) floating-point element in "a" to an unsigned 64-bit integer, and store the result in "dst". [round_note] dst[63:0] := Convert_FP32_To_UInt64(a[31:0])
Floating Point Integer AVX512F Convert Convert the lower single-precision (32-bit) floating-point element in "a" to an unsigned 32-bit integer, and store the result in "dst". dst[31:0] := Convert_FP32_To_UInt32(a[31:0])
Floating Point Integer AVX512F Convert Convert the lower single-precision (32-bit) floating-point element in "a" to an unsigned 64-bit integer, and store the result in "dst". dst[63:0] := Convert_FP32_To_UInt64(a[31:0])
Floating Point Integer AVX512F Convert Convert packed double-precision (64-bit) floating-point elements in "a" to packed 32-bit integers with truncation, and store the results in "dst". [sae_note] FOR j := 0 to 7 i := 32*j k := 64*j dst[i+31:i] := Convert_FP64_To_Int32_Truncate(a[k+63:k]) ENDFOR dst[MAX:256] := 0
Floating Point Integer AVX512F Convert Convert packed double-precision (64-bit) floating-point elements in "a" to packed 32-bit integers with truncation, and store the results in "dst". FOR j := 0 to 7 i := 32*j k := 64*j dst[i+31:i] := Convert_FP64_To_Int32_Truncate(a[k+63:k]) ENDFOR dst[MAX:256] := 0
Floating Point Integer AVX512F Convert Convert packed double-precision (64-bit) floating-point elements in "a" to packed 32-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [sae_note] FOR j := 0 to 7 i := 32*j l := 64*j IF k[j] dst[i+31:i] := Convert_FP64_To_Int32_Truncate(a[l+63:l]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:256] := 0
Floating Point Integer AVX512F Convert Convert packed double-precision (64-bit) floating-point elements in "a" to packed 32-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := 32*j l := 64*j IF k[j] dst[i+31:i] := Convert_FP64_To_Int32_Truncate(a[l+63:l]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:256] := 0
Floating Point Integer AVX512F Convert Convert packed double-precision (64-bit) floating-point elements in "a" to packed 32-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [sae_note] FOR j := 0 to 7 i := 32*j l := 64*j IF k[j] dst[i+31:i] := Convert_FP64_To_Int32_Truncate(a[l+63:l]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:256] := 0
Floating Point Integer AVX512F Convert Convert packed double-precision (64-bit) floating-point elements in "a" to packed 32-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := 32*j l := 64*j IF k[j] dst[i+31:i] := Convert_FP64_To_Int32_Truncate(a[l+63:l]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:256] := 0
Floating Point Integer AVX512F Convert Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 32-bit integers with truncation, and store the results in "dst". [sae_note] FOR j := 0 to 7 i := 32*j k := 64*j dst[i+31:i] := Convert_FP64_To_UInt32_Truncate(a[k+63:k]) ENDFOR dst[MAX:256] := 0
Floating Point Integer AVX512F Convert Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 32-bit integers with truncation, and store the results in "dst". FOR j := 0 to 7 i := 32*j k := 64*j dst[i+31:i] := Convert_FP64_To_UInt32_Truncate(a[k+63:k]) ENDFOR dst[MAX:256] := 0
Floating Point Integer AVX512F Convert Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 32-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [sae_note] FOR j := 0 to 7 i := 32*j l := 64*j IF k[j] dst[i+31:i] := Convert_FP64_To_UInt32_Truncate(a[l+63:l]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:256] := 0
Floating Point Integer AVX512F Convert Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 32-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := 32*j l := 64*j IF k[j] dst[i+31:i] := Convert_FP64_To_UInt32_Truncate(a[l+63:l]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:256] := 0
Floating Point Integer AVX512F Convert Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 32-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [sae_note] FOR j := 0 to 7 i := 32*j l := 64*j IF k[j] dst[i+31:i] := Convert_FP64_To_UInt32_Truncate(a[l+63:l]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:256] := 0
Floating Point Integer AVX512F Convert Convert packed double-precision (64-bit) floating-point elements in "a" to packed unsigned 32-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := 32*j l := 64*j IF k[j] dst[i+31:i] := Convert_FP64_To_UInt32_Truncate(a[l+63:l]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:256] := 0
Floating Point Integer AVX512F Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed 32-bit integers with truncation, and store the results in "dst". [sae_note] FOR j := 0 to 15 i := 32*j dst[i+31:i] := Convert_FP32_To_Int32_Truncate(a[i+31:i]) ENDFOR dst[MAX:512] := 0
Floating Point Integer AVX512F Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed 32-bit integers with truncation, and store the results in "dst". FOR j := 0 to 15 i := 32*j dst[i+31:i] := Convert_FP32_To_Int32_Truncate(a[i+31:i]) ENDFOR dst[MAX:512] := 0
Floating Point Integer AVX512F Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed 32-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [sae_note] FOR j := 0 to 15 i := 32*j IF k[j] dst[i+31:i] := Convert_FP32_To_Int32_Truncate(a[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point Integer AVX512F Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed 32-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := 32*j IF k[j] dst[i+31:i] := Convert_FP32_To_Int32_Truncate(a[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point Integer AVX512F Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed 32-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [sae_note] FOR j := 0 to 15 i := 32*j IF k[j] dst[i+31:i] := Convert_FP32_To_Int32_Truncate(a[i+31:i]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point Integer AVX512F Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed 32-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := 32*j IF k[j] dst[i+31:i] := Convert_FP32_To_Int32_Truncate(a[i+31:i]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
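The "with truncation" entries above always round toward zero, whereas the non-truncating conversions earlier in this list round according to the current rounding mode. A minimal comparison, assuming the default round-to-nearest-even mode for the non-truncating case and in-range inputs only; the function names mirror the pseudocode helpers but are otherwise hypothetical.

```python
import math

INT32_MIN, INT32_MAX = -2 ** 31, 2 ** 31 - 1


def _check_range(r: int) -> int:
    if not INT32_MIN <= r <= INT32_MAX:
        raise ValueError("out-of-range input is not modelled in this sketch")
    return r


def convert_fp32_to_int32(x: float) -> int:
    """Round to nearest, ties to even (assumed default rounding mode)."""
    return _check_range(round(x))


def convert_fp32_to_int32_truncate(x: float) -> int:
    """Round toward zero, independent of the rounding mode."""
    return _check_range(math.trunc(x))


values = [2.5, 3.5, -1.5, -0.75]
print([convert_fp32_to_int32(v) for v in values])           # [2, 4, -2, -1]
print([convert_fp32_to_int32_truncate(v) for v in values])  # [2, 3, -1, 0]
```

Truncation matches the behaviour of a C cast from `float` to `int` for in-range values, which is the usual reason to pick the truncating forms.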
Floating Point Integer AVX512F Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 32-bit integers with truncation, and store the results in "dst". [sae_note] FOR j := 0 to 15 i := 32*j dst[i+31:i] := Convert_FP32_To_UInt32_Truncate(a[i+31:i]) ENDFOR dst[MAX:512] := 0
Floating Point Integer AVX512F Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 32-bit integers with truncation, and store the results in "dst". FOR j := 0 to 15 i := 32*j dst[i+31:i] := Convert_FP32_To_UInt32_Truncate(a[i+31:i]) ENDFOR dst[MAX:512] := 0
Floating Point Integer AVX512F Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 32-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [sae_note] FOR j := 0 to 15 i := 32*j IF k[j] dst[i+31:i] := Convert_FP32_To_UInt32_Truncate(a[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point Integer AVX512F Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 32-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := 32*j IF k[j] dst[i+31:i] := Convert_FP32_To_UInt32_Truncate(a[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point Integer AVX512F Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 32-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [sae_note] FOR j := 0 to 15 i := 32*j IF k[j] dst[i+31:i] := Convert_FP32_To_UInt32_Truncate(a[i+31:i]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point Integer AVX512F Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed unsigned 32-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := 32*j IF k[j] dst[i+31:i] := Convert_FP32_To_UInt32_Truncate(a[i+31:i]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Convert Convert the lower double-precision (64-bit) floating-point element in "a" to a 32-bit integer with truncation, and store the result in "dst". [sae_note] dst[31:0] := Convert_FP64_To_Int32_Truncate(a[63:0])
Floating Point AVX512F Convert Convert the lower double-precision (64-bit) floating-point element in "a" to a 64-bit integer with truncation, and store the result in "dst". [sae_note] dst[63:0] := Convert_FP64_To_Int64_Truncate(a[63:0])
Floating Point Integer AVX512F Convert Convert the lower double-precision (64-bit) floating-point element in "a" to a 32-bit integer with truncation, and store the result in "dst". [sae_note] dst[31:0] := Convert_FP64_To_Int32_Truncate(a[63:0])
Floating Point Integer AVX512F Convert Convert the lower double-precision (64-bit) floating-point element in "a" to a 64-bit integer with truncation, and store the result in "dst". [sae_note] dst[63:0] := Convert_FP64_To_Int64_Truncate(a[63:0])
Floating Point AVX512F Convert Convert the lower double-precision (64-bit) floating-point element in "a" to a 32-bit integer with truncation, and store the result in "dst". dst[31:0] := Convert_FP64_To_Int32_Truncate(a[63:0])
Floating Point AVX512F Convert Convert the lower double-precision (64-bit) floating-point element in "a" to a 64-bit integer with truncation, and store the result in "dst". dst[63:0] := Convert_FP64_To_Int64_Truncate(a[63:0])
Floating Point Integer AVX512F Convert Convert the lower double-precision (64-bit) floating-point element in "a" to an unsigned 32-bit integer with truncation, and store the result in "dst". [sae_note] dst[31:0] := Convert_FP64_To_UInt32_Truncate(a[63:0])
Floating Point Integer AVX512F Convert Convert the lower double-precision (64-bit) floating-point element in "a" to an unsigned 64-bit integer with truncation, and store the result in "dst". [sae_note] dst[63:0] := Convert_FP64_To_UInt64_Truncate(a[63:0])
Floating Point Integer AVX512F Convert Convert the lower double-precision (64-bit) floating-point element in "a" to an unsigned 32-bit integer with truncation, and store the result in "dst". dst[31:0] := Convert_FP64_To_UInt32_Truncate(a[63:0])
Floating Point Integer AVX512F Convert Convert the lower double-precision (64-bit) floating-point element in "a" to an unsigned 64-bit integer with truncation, and store the result in "dst". dst[63:0] := Convert_FP64_To_UInt64_Truncate(a[63:0])
Floating Point AVX512F Convert Convert the lower single-precision (32-bit) floating-point element in "a" to a 32-bit integer with truncation, and store the result in "dst". [sae_note] dst[31:0] := Convert_FP32_To_Int32_Truncate(a[31:0])
Floating Point AVX512F Convert Convert the lower single-precision (32-bit) floating-point element in "a" to a 64-bit integer with truncation, and store the result in "dst". [sae_note] dst[63:0] := Convert_FP32_To_Int64_Truncate(a[31:0])
Floating Point Integer AVX512F Convert Convert the lower single-precision (32-bit) floating-point element in "a" to a 32-bit integer with truncation, and store the result in "dst". [sae_note] dst[31:0] := Convert_FP32_To_Int32_Truncate(a[31:0])
Floating Point Integer AVX512F Convert Convert the lower single-precision (32-bit) floating-point element in "a" to a 64-bit integer with truncation, and store the result in "dst". [sae_note] dst[63:0] := Convert_FP32_To_Int64_Truncate(a[31:0])
Floating Point AVX512F Convert Convert the lower single-precision (32-bit) floating-point element in "a" to a 32-bit integer with truncation, and store the result in "dst". dst[31:0] := Convert_FP32_To_Int32_Truncate(a[31:0])
Floating Point AVX512F Convert Convert the lower single-precision (32-bit) floating-point element in "a" to a 64-bit integer with truncation, and store the result in "dst". dst[63:0] := Convert_FP32_To_Int64_Truncate(a[31:0])
Floating Point Integer AVX512F Convert Convert the lower single-precision (32-bit) floating-point element in "a" to an unsigned 32-bit integer with truncation, and store the result in "dst". [sae_note] dst[31:0] := Convert_FP32_To_UInt32_Truncate(a[31:0])
Floating Point Integer AVX512F Convert Convert the lower single-precision (32-bit) floating-point element in "a" to an unsigned 64-bit integer with truncation, and store the result in "dst". [sae_note] dst[63:0] := Convert_FP32_To_UInt64_Truncate(a[31:0])
Floating Point Integer AVX512F Convert Convert the lower single-precision (32-bit) floating-point element in "a" to an unsigned 32-bit integer with truncation, and store the result in "dst". dst[31:0] := Convert_FP32_To_UInt32_Truncate(a[31:0])
Floating Point Integer AVX512F Convert Convert the lower single-precision (32-bit) floating-point element in "a" to an unsigned 64-bit integer with truncation, and store the result in "dst". dst[63:0] := Convert_FP32_To_UInt64_Truncate(a[31:0])
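The masked truncating conversions above share one pattern: each lane is converted only when its bit in "k" is set, and otherwise is either copied from "src" (writemask) or zeroed (zeromask). A minimal scalar sketch in Python, not the hardware behavior: the function names are hypothetical, lanes are plain Python lists, inputs are Python floats rather than true single-precision values, and out-of-range or non-finite inputs are rejected because the records above do not specify them.

```python
import math

def cvtt_to_u32(x):
    # Truncate toward zero. Out-of-range and non-finite inputs are not
    # modelled in this sketch, so they are rejected explicitly.
    if not math.isfinite(x):
        raise ValueError("non-finite input is outside this sketch")
    t = math.trunc(x)
    if not 0 <= t <= 0xFFFFFFFF:
        raise ValueError("result does not fit in an unsigned 32-bit integer")
    return t

def mask_cvtt_epu32(src, k, a):
    # Writemask: lane j is converted when bit j of k is set,
    # otherwise it keeps the value from src.
    return [cvtt_to_u32(a[j]) if (k >> j) & 1 else src[j]
            for j in range(len(a))]

def maskz_cvtt_epu32(k, a):
    # Zeromask: lane j is converted when bit j of k is set, otherwise zeroed.
    return [cvtt_to_u32(a[j]) if (k >> j) & 1 else 0
            for j in range(len(a))]
```

For example, with mask 0b0101 only lanes 0 and 2 are converted; the other lanes come from "src" or become zero.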
Floating Point Integer AVX512F Convert Convert packed unsigned 32-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst". FOR j := 0 to 7 i := j*64 l := j*32 dst[i+63:i] := Convert_Int32_To_FP64(a[l+31:l]) ENDFOR dst[MAX:512] := 0
Floating Point Integer AVX512F Convert Convert packed unsigned 32-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 l := j*32 IF k[j] dst[i+63:i] := Convert_Int32_To_FP64(a[l+31:l]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point Integer AVX512F Convert Convert packed unsigned 32-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 l := j*32 IF k[j] dst[i+63:i] := Convert_Int32_To_FP64(a[l+31:l]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point Integer AVX512F Convert Convert packed unsigned 32-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst". [round_note] FOR j := 0 to 15 i := 32*j dst[i+31:i] := Convert_Int32_To_FP32(a[i+31:i]) ENDFOR dst[MAX:512] := 0
Floating Point Integer AVX512F Convert Convert packed unsigned 32-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst". FOR j := 0 to 15 i := 32*j dst[i+31:i] := Convert_Int32_To_FP32(a[i+31:i]) ENDFOR dst[MAX:512] := 0
Floating Point Integer AVX512F Convert Convert packed unsigned 32-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [round_note] FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := Convert_Int32_To_FP32(a[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point Integer AVX512F Convert Convert packed unsigned 32-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := Convert_Int32_To_FP32(a[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point Integer AVX512F Convert Convert packed unsigned 32-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [round_note] FOR j := 0 to 15 i := 32*j IF k[j] dst[i+31:i] := Convert_Int32_To_FP32(a[i+31:i]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point Integer AVX512F Convert Convert packed unsigned 32-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := 32*j IF k[j] dst[i+31:i] := Convert_Int32_To_FP32(a[i+31:i]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point Integer AVX512F Convert Convert the unsigned 64-bit integer "b" to a double-precision (64-bit) floating-point element, store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst". [round_note] dst[63:0] := Convert_Int64_To_FP64(b[63:0]) dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point Integer AVX512F Convert Convert the unsigned 32-bit integer "b" to a double-precision (64-bit) floating-point element, store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst". dst[63:0] := Convert_Int32_To_FP64(b[31:0]) dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point Integer AVX512F Convert Convert the unsigned 64-bit integer "b" to a double-precision (64-bit) floating-point element, store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst". dst[63:0] := Convert_Int64_To_FP64(b[63:0]) dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point Integer AVX512F Convert Convert the unsigned 32-bit integer "b" to a single-precision (32-bit) floating-point element, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". [round_note] dst[31:0] := Convert_Int32_To_FP32(b[31:0]) dst[127:32] := a[127:32] dst[MAX:128] := 0
Floating Point Integer AVX512F Convert Convert the unsigned 64-bit integer "b" to a single-precision (32-bit) floating-point element, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". [round_note] dst[31:0] := Convert_Int64_To_FP32(b[63:0]) dst[127:32] := a[127:32] dst[MAX:128] := 0
Floating Point Integer AVX512F Convert Convert the unsigned 32-bit integer "b" to a single-precision (32-bit) floating-point element, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". dst[31:0] := Convert_Int32_To_FP32(b[31:0]) dst[127:32] := a[127:32] dst[MAX:128] := 0
Floating Point Integer AVX512F Convert Convert the unsigned 64-bit integer "b" to a single-precision (32-bit) floating-point element, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". dst[31:0] := Convert_Int64_To_FP32(b[63:0]) dst[127:32] := a[127:32] dst[MAX:128] := 0
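The unsigned conversions above reuse pseudocode helper names such as Convert_Int32_To_FP32, so the unsigned interpretation of the input bits is carried only by the description. A small Python sketch of why that matters, under stated assumptions: the helper names are hypothetical, rounding is the default round-to-nearest-even (the [round_note] variants can select another mode), and only the 32-bit case is modelled.

```python
import struct

def _round_to_f32(value):
    # Round a Python float (double precision) to single precision and
    # widen it back, so the rounding becomes visible.
    return struct.unpack("<f", struct.pack("<f", value))[0]

def u32_to_f32(bits):
    # Interpret the 32-bit pattern as an unsigned integer, then convert.
    return _round_to_f32(float(bits & 0xFFFFFFFF))

def i32_to_f32(bits):
    # Interpret the same pattern as a signed two's-complement integer.
    value = bits & 0xFFFFFFFF
    if value >= 0x80000000:
        value -= 1 << 32
    return _round_to_f32(float(value))
```

The all-ones pattern 0xFFFFFFFF converts to 4294967296.0 when treated as unsigned (the nearest single-precision value) and to -1.0 when treated as signed.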
Floating Point AVX512F Arithmetic Divide packed double-precision (64-bit) floating-point elements in "a" by packed elements in "b", and store the results in "dst". FOR j := 0 to 7 i := 64*j dst[i+63:i] := a[i+63:i] / b[i+63:i] ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Arithmetic Divide packed double-precision (64-bit) floating-point elements in "a" by packed elements in "b", and store the results in "dst". [round_note] FOR j := 0 to 7 i := 64*j dst[i+63:i] := a[i+63:i] / b[i+63:i] ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Arithmetic Divide packed double-precision (64-bit) floating-point elements in "a" by packed elements in "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := 64*j IF k[j] dst[i+63:i] := a[i+63:i] / b[i+63:i] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Arithmetic Divide packed double-precision (64-bit) floating-point elements in "a" by packed elements in "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [round_note] FOR j := 0 to 7 i := 64*j IF k[j] dst[i+63:i] := a[i+63:i] / b[i+63:i] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Arithmetic Divide packed double-precision (64-bit) floating-point elements in "a" by packed elements in "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := 64*j IF k[j] dst[i+63:i] := a[i+63:i] / b[i+63:i] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Arithmetic Divide packed double-precision (64-bit) floating-point elements in "a" by packed elements in "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [round_note] FOR j := 0 to 7 i := 64*j IF k[j] dst[i+63:i] := a[i+63:i] / b[i+63:i] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Arithmetic Divide packed single-precision (32-bit) floating-point elements in "a" by packed elements in "b", and store the results in "dst". FOR j := 0 to 15 i := 32*j dst[i+31:i] := a[i+31:i] / b[i+31:i] ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Arithmetic Divide packed single-precision (32-bit) floating-point elements in "a" by packed elements in "b", and store the results in "dst". [round_note] FOR j := 0 to 15 i := 32*j dst[i+31:i] := a[i+31:i] / b[i+31:i] ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Arithmetic Divide packed single-precision (32-bit) floating-point elements in "a" by packed elements in "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := 32*j IF k[j] dst[i+31:i] := a[i+31:i] / b[i+31:i] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Arithmetic Divide packed single-precision (32-bit) floating-point elements in "a" by packed elements in "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [round_note] FOR j := 0 to 15 i := 32*j IF k[j] dst[i+31:i] := a[i+31:i] / b[i+31:i] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Arithmetic Divide packed single-precision (32-bit) floating-point elements in "a" by packed elements in "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := 32*j IF k[j] dst[i+31:i] := a[i+31:i] / b[i+31:i] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Arithmetic Divide packed single-precision (32-bit) floating-point elements in "a" by packed elements in "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [round_note] FOR j := 0 to 15 i := 32*j IF k[j] dst[i+31:i] := a[i+31:i] / b[i+31:i] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Arithmetic Divide the lower double-precision (64-bit) floating-point element in "a" by the lower double-precision (64-bit) floating-point element in "b", store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst". [round_note] dst[63:0] := a[63:0] / b[63:0] dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point AVX512F Arithmetic Divide the lower double-precision (64-bit) floating-point element in "a" by the lower double-precision (64-bit) floating-point element in "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". [round_note] IF k[0] dst[63:0] := a[63:0] / b[63:0] ELSE dst[63:0] := src[63:0] FI dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point AVX512F Arithmetic Divide the lower double-precision (64-bit) floating-point element in "a" by the lower double-precision (64-bit) floating-point element in "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". IF k[0] dst[63:0] := a[63:0] / b[63:0] ELSE dst[63:0] := src[63:0] FI dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point AVX512F Arithmetic Divide the lower double-precision (64-bit) floating-point element in "a" by the lower double-precision (64-bit) floating-point element in "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". [round_note] IF k[0] dst[63:0] := a[63:0] / b[63:0] ELSE dst[63:0] := 0 FI dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point AVX512F Arithmetic Divide the lower double-precision (64-bit) floating-point element in "a" by the lower double-precision (64-bit) floating-point element in "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". IF k[0] dst[63:0] := a[63:0] / b[63:0] ELSE dst[63:0] := 0 FI dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point AVX512F Arithmetic Divide the lower single-precision (32-bit) floating-point element in "a" by the lower single-precision (32-bit) floating-point element in "b", store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". [round_note] dst[31:0] := a[31:0] / b[31:0] dst[127:32] := a[127:32] dst[MAX:128] := 0
Floating Point AVX512F Arithmetic Divide the lower single-precision (32-bit) floating-point element in "a" by the lower single-precision (32-bit) floating-point element in "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". [round_note] IF k[0] dst[31:0] := a[31:0] / b[31:0] ELSE dst[31:0] := src[31:0] FI dst[127:32] := a[127:32] dst[MAX:128] := 0
Floating Point AVX512F Arithmetic Divide the lower single-precision (32-bit) floating-point element in "a" by the lower single-precision (32-bit) floating-point element in "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". IF k[0] dst[31:0] := a[31:0] / b[31:0] ELSE dst[31:0] := src[31:0] FI dst[127:32] := a[127:32] dst[MAX:128] := 0
Floating Point AVX512F Arithmetic Divide the lower single-precision (32-bit) floating-point element in "a" by the lower single-precision (32-bit) floating-point element in "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". [round_note] IF k[0] dst[31:0] := a[31:0] / b[31:0] ELSE dst[31:0] := 0 FI dst[127:32] := a[127:32] dst[MAX:128] := 0
Floating Point AVX512F Arithmetic Divide the lower single-precision (32-bit) floating-point element in "a" by the lower single-precision (32-bit) floating-point element in "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". IF k[0] dst[31:0] := a[31:0] / b[31:0] ELSE dst[31:0] := 0 FI dst[127:32] := a[127:32] dst[MAX:128] := 0
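The scalar divide records above differ from the packed ones in two ways: only mask bit 0 is consulted, and the upper elements always come from "a". A minimal Python sketch of both shapes, with hypothetical function names and lanes held in plain lists (index 0 is the lower element). Python raises ZeroDivisionError on division by zero, which is not what the hardware does, so this sketch is only meaningful for nonzero divisors.

```python
def mask_div_pd(src, k, a, b):
    # Packed form: lane j is a[j] / b[j] when bit j of k is set,
    # otherwise it is copied from src.
    return [a[j] / b[j] if (k >> j) & 1 else src[j] for j in range(len(a))]

def mask_div_sd(src, k, a, b):
    # Scalar form with writemask: only bit 0 of k matters for the lower
    # element; the upper element is always taken from a.
    lower = a[0] / b[0] if k & 1 else src[0]
    return [lower, a[1]]

def maskz_div_sd(k, a, b):
    # Scalar form with zeromask: the lower element is zeroed when bit 0
    # of k is clear; the upper element is still taken from a.
    lower = a[0] / b[0] if k & 1 else 0.0
    return [lower, a[1]]
```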
Floating Point AVX512F Swizzle Load contiguous active double-precision (64-bit) floating-point elements from "a" (those with their respective bit set in mask "k"), and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). m := 0 FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := a[m+63:m] m := m + 64 ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Load Swizzle Load contiguous active double-precision (64-bit) floating-point elements from unaligned memory at "mem_addr" (those with their respective bit set in mask "k"), and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). m := 0 FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := MEM[mem_addr+m+63:mem_addr+m] m := m + 64 ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Swizzle Load contiguous active double-precision (64-bit) floating-point elements from "a" (those with their respective bit set in mask "k"), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). m := 0 FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := a[m+63:m] m := m + 64 ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Load Swizzle Load contiguous active double-precision (64-bit) floating-point elements from unaligned memory at "mem_addr" (those with their respective bit set in mask "k"), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). m := 0 FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := MEM[mem_addr+m+63:mem_addr+m] m := m + 64 ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Swizzle Load contiguous active single-precision (32-bit) floating-point elements from "a" (those with their respective bit set in mask "k"), and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). m := 0 FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := a[m+31:m] m := m + 32 ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Load Swizzle Load contiguous active single-precision (32-bit) floating-point elements from unaligned memory at "mem_addr" (those with their respective bit set in mask "k"), and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). m := 0 FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := MEM[mem_addr+m+31:mem_addr+m] m := m + 32 ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Swizzle Load contiguous active single-precision (32-bit) floating-point elements from "a" (those with their respective bit set in mask "k"), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). m := 0 FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := a[m+31:m] m := m + 32 ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Load Swizzle Load contiguous active single-precision (32-bit) floating-point elements from unaligned memory at "mem_addr" (those with their respective bit set in mask "k"), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). m := 0 FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := MEM[mem_addr+m+31:mem_addr+m] m := m + 32 ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
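The expand records above read source elements contiguously (the counter "m") but write them only to the lanes whose mask bit is set. A scalar Python sketch of that loop, using hypothetical function names and element indices instead of bit offsets:

```python
def mask_expand(src, k, a):
    # m counts how many source elements have been consumed so far.
    dst = []
    m = 0
    for j in range(len(src)):
        if (k >> j) & 1:
            dst.append(a[m])   # next contiguous element from a
            m += 1
        else:
            dst.append(src[j])  # writemask: keep the lane from src
    return dst

def maskz_expand(k, a, lanes):
    # Zeromask variant: inactive lanes become zero.
    return mask_expand([0] * lanes, k, a)
```

With mask 0b10100110 over 8 lanes, the first four elements of "a" land in lanes 1, 2, 5 and 7; the remaining elements of "a" are never read.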
Floating Point AVX512F Swizzle Extract 128 bits (composed of 4 packed single-precision (32-bit) floating-point elements) from "a", selected with "imm8", and store the result in "dst". CASE imm8[1:0] OF 0: dst[127:0] := a[127:0] 1: dst[127:0] := a[255:128] 2: dst[127:0] := a[383:256] 3: dst[127:0] := a[511:384] ESAC dst[MAX:128] := 0
Floating Point AVX512F Swizzle Extract 128 bits (composed of 4 packed single-precision (32-bit) floating-point elements) from "a", selected with "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). CASE imm8[1:0] OF 0: tmp[127:0] := a[127:0] 1: tmp[127:0] := a[255:128] 2: tmp[127:0] := a[383:256] 3: tmp[127:0] := a[511:384] ESAC FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := tmp[i+31:i] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512F Swizzle Extract 128 bits (composed of 4 packed single-precision (32-bit) floating-point elements) from "a", selected with "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). CASE imm8[1:0] OF 0: tmp[127:0] := a[127:0] 1: tmp[127:0] := a[255:128] 2: tmp[127:0] := a[383:256] 3: tmp[127:0] := a[511:384] ESAC FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := tmp[i+31:i] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512F Swizzle Extract 256 bits (composed of 4 packed double-precision (64-bit) floating-point elements) from "a", selected with "imm8", and store the result in "dst". CASE imm8[0] OF 0: dst[255:0] := a[255:0] 1: dst[255:0] := a[511:256] ESAC dst[MAX:256] := 0
Floating Point AVX512F Swizzle Extract 256 bits (composed of 4 packed double-precision (64-bit) floating-point elements) from "a", selected with "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). CASE imm8[0] OF 0: tmp[255:0] := a[255:0] 1: tmp[255:0] := a[511:256] ESAC FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := tmp[i+63:i] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512F Swizzle Extract 256 bits (composed of 4 packed double-precision (64-bit) floating-point elements) from "a", selected with "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). CASE imm8[0] OF 0: tmp[255:0] := a[255:0] 1: tmp[255:0] := a[511:256] ESAC FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := tmp[i+63:i] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512F Swizzle Extract 128 bits (composed of 4 packed 32-bit integers) from "a", selected with "imm8", and store the result in "dst". CASE imm8[1:0] OF 0: dst[127:0] := a[127:0] 1: dst[127:0] := a[255:128] 2: dst[127:0] := a[383:256] 3: dst[127:0] := a[511:384] ESAC dst[MAX:128] := 0
Integer AVX512F Swizzle Extract 128 bits (composed of 4 packed 32-bit integers) from "a", selected with "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). CASE imm8[1:0] OF 0: tmp[127:0] := a[127:0] 1: tmp[127:0] := a[255:128] 2: tmp[127:0] := a[383:256] 3: tmp[127:0] := a[511:384] ESAC FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := tmp[i+31:i] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512F Swizzle Extract 128 bits (composed of 4 packed 32-bit integers) from "a", selected with "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). CASE imm8[1:0] OF 0: tmp[127:0] := a[127:0] 1: tmp[127:0] := a[255:128] 2: tmp[127:0] := a[383:256] 3: tmp[127:0] := a[511:384] ESAC FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := tmp[i+31:i] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512F Swizzle Extract 256 bits (composed of 4 packed 64-bit integers) from "a", selected with "imm8", and store the result in "dst". CASE imm8[0] OF 0: dst[255:0] := a[255:0] 1: dst[255:0] := a[511:256] ESAC dst[MAX:256] := 0
Integer AVX512F Swizzle Extract 256 bits (composed of 4 packed 64-bit integers) from "a", selected with "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). CASE imm8[0] OF 0: tmp[255:0] := a[255:0] 1: tmp[255:0] := a[511:256] ESAC FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := tmp[i+63:i] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512F Swizzle Extract 256 bits (composed of 4 packed 64-bit integers) from "a", selected with "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). CASE imm8[0] OF 0: tmp[255:0] := a[255:0] 1: tmp[255:0] := a[511:256] ESAC FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := tmp[i+63:i] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:256] := 0
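In the extract records above, "imm8" selects one aligned block of the 512-bit source: bits [1:0] pick one of four 128-bit blocks, and bit [0] picks one of two 256-bit blocks. A Python sketch of the selection and of the masked variants, with hypothetical function names; the source is modelled as a list of lanes, and the number of blocks is assumed to be a power of two.

```python
def extract_block(a, imm8, block_len):
    # Only the low bits of imm8 are used: 2 bits for four blocks,
    # 1 bit for two blocks.
    n_blocks = len(a) // block_len
    sel = imm8 & (n_blocks - 1)
    return a[sel * block_len:(sel + 1) * block_len]

def mask_extract(src, k, a, imm8, block_len):
    # Writemask is applied per lane of the extracted block.
    tmp = extract_block(a, imm8, block_len)
    return [tmp[j] if (k >> j) & 1 else src[j] for j in range(block_len)]

def maskz_extract(k, a, imm8, block_len):
    # Zeromask variant: inactive lanes become zero.
    return mask_extract([0] * block_len, k, a, imm8, block_len)
```

For a 16-lane source of 32-bit elements, block_len is 4; for an 8-lane source of 64-bit elements, block_len is also 4, but only bit 0 of "imm8" is significant.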
Floating Point AVX512F Miscellaneous Fix up packed double-precision (64-bit) floating-point elements in "a" and "b" using packed 64-bit integers in "c", and store the results in "dst". "imm8" is used to set the required flags reporting. enum TOKEN_TYPE { QNAN_TOKEN := 0, \ SNAN_TOKEN := 1, \ ZERO_VALUE_TOKEN := 2, \ ONE_VALUE_TOKEN := 3, \ NEG_INF_TOKEN := 4, \ POS_INF_TOKEN := 5, \ NEG_VALUE_TOKEN := 6, \ POS_VALUE_TOKEN := 7 } DEFINE FIXUPIMMPD(src1[63:0], src2[63:0], src3[63:0], imm8[7:0]) { tsrc[63:0] := ((src2[62:52] == 0) AND (MXCSR.DAZ == 1)) ? 0.0 : src2[63:0] CASE(tsrc[63:0]) OF QNAN_TOKEN: j := 0 SNAN_TOKEN: j := 1 ZERO_VALUE_TOKEN: j := 2 ONE_VALUE_TOKEN: j := 3 NEG_INF_TOKEN: j := 4 POS_INF_TOKEN: j := 5 NEG_VALUE_TOKEN: j := 6 POS_VALUE_TOKEN: j := 7 ESAC token_response[3:0] := src3[3+4*j:4*j] CASE(token_response[3:0]) OF 0 : dest[63:0] := src1[63:0] 1 : dest[63:0] := tsrc[63:0] 2 : dest[63:0] := QNaN(tsrc[63:0]) 3 : dest[63:0] := QNAN_Indefinite 4 : dest[63:0] := -INF 5 : dest[63:0] := +INF 6 : dest[63:0] := tsrc.sign ? -INF : +INF 7 : dest[63:0] := -0 8 : dest[63:0] := +0 9 : dest[63:0] := -1 10: dest[63:0] := +1 11: dest[63:0] := 1/2 12: dest[63:0] := 90.0 13: dest[63:0] := PI/2 14: dest[63:0] := MAX_FLOAT 15: dest[63:0] := -MAX_FLOAT ESAC CASE(tsrc[63:0]) OF ZERO_VALUE_TOKEN: IF (imm8[0]) #ZE; FI ZERO_VALUE_TOKEN: IF (imm8[1]) #IE; FI ONE_VALUE_TOKEN: IF (imm8[2]) #ZE; FI ONE_VALUE_TOKEN: IF (imm8[3]) #IE; FI SNAN_TOKEN: IF (imm8[4]) #IE; FI NEG_INF_TOKEN: IF (imm8[5]) #IE; FI NEG_VALUE_TOKEN: IF (imm8[6]) #IE; FI POS_INF_TOKEN: IF (imm8[7]) #IE; FI ESAC RETURN dest[63:0] } FOR j := 0 to 7 i := j*64 dst[i+63:i] := FIXUPIMMPD(a[i+63:i], b[i+63:i], c[i+63:i], imm8[7:0]) ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Miscellaneous Fix up packed double-precision (64-bit) floating-point elements in "a" and "b" using packed 64-bit integers in "c", and store the results in "dst". "imm8" is used to set the required flags reporting. [sae_note] enum TOKEN_TYPE { QNAN_TOKEN := 0, \ SNAN_TOKEN := 1, \ ZERO_VALUE_TOKEN := 2, \ ONE_VALUE_TOKEN := 3, \ NEG_INF_TOKEN := 4, \ POS_INF_TOKEN := 5, \ NEG_VALUE_TOKEN := 6, \ POS_VALUE_TOKEN := 7 } DEFINE FIXUPIMMPD(src1[63:0], src2[63:0], src3[63:0], imm8[7:0]) { tsrc[63:0] := ((src2[62:52] == 0) AND (MXCSR.DAZ == 1)) ? 0.0 : src2[63:0] CASE(tsrc[63:0]) OF QNAN_TOKEN: j := 0 SNAN_TOKEN: j := 1 ZERO_VALUE_TOKEN: j := 2 ONE_VALUE_TOKEN: j := 3 NEG_INF_TOKEN: j := 4 POS_INF_TOKEN: j := 5 NEG_VALUE_TOKEN: j := 6 POS_VALUE_TOKEN: j := 7 ESAC token_response[3:0] := src3[3+4*j:4*j] CASE(token_response[3:0]) OF 0 : dest[63:0] := src1[63:0] 1 : dest[63:0] := tsrc[63:0] 2 : dest[63:0] := QNaN(tsrc[63:0]) 3 : dest[63:0] := QNAN_Indefinite 4 : dest[63:0] := -INF 5 : dest[63:0] := +INF 6 : dest[63:0] := tsrc.sign ? -INF : +INF 7 : dest[63:0] := -0 8 : dest[63:0] := +0 9 : dest[63:0] := -1 10: dest[63:0] := +1 11: dest[63:0] := 1/2 12: dest[63:0] := 90.0 13: dest[63:0] := PI/2 14: dest[63:0] := MAX_FLOAT 15: dest[63:0] := -MAX_FLOAT ESAC CASE(tsrc[63:0]) OF ZERO_VALUE_TOKEN: IF (imm8[0]) #ZE; FI ZERO_VALUE_TOKEN: IF (imm8[1]) #IE; FI ONE_VALUE_TOKEN: IF (imm8[2]) #ZE; FI ONE_VALUE_TOKEN: IF (imm8[3]) #IE; FI SNAN_TOKEN: IF (imm8[4]) #IE; FI NEG_INF_TOKEN: IF (imm8[5]) #IE; FI NEG_VALUE_TOKEN: IF (imm8[6]) #IE; FI POS_INF_TOKEN: IF (imm8[7]) #IE; FI ESAC RETURN dest[63:0] } FOR j := 0 to 7 i := j*64 dst[i+63:i] := FIXUPIMMPD(a[i+63:i], b[i+63:i], c[i+63:i], imm8[7:0]) ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Miscellaneous Fix up packed double-precision (64-bit) floating-point elements in "a" and "b" using packed 64-bit integers in "c", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). "imm8" is used to set the required flags reporting. enum TOKEN_TYPE { QNAN_TOKEN := 0, \ SNAN_TOKEN := 1, \ ZERO_VALUE_TOKEN := 2, \ ONE_VALUE_TOKEN := 3, \ NEG_INF_TOKEN := 4, \ POS_INF_TOKEN := 5, \ NEG_VALUE_TOKEN := 6, \ POS_VALUE_TOKEN := 7 } DEFINE FIXUPIMMPD(src1[63:0], src2[63:0], src3[63:0], imm8[7:0]) { tsrc[63:0] := ((src2[62:52] == 0) AND (MXCSR.DAZ == 1)) ? 0.0 : src2[63:0] CASE(tsrc[63:0]) OF QNAN_TOKEN: j := 0 SNAN_TOKEN: j := 1 ZERO_VALUE_TOKEN: j := 2 ONE_VALUE_TOKEN: j := 3 NEG_INF_TOKEN: j := 4 POS_INF_TOKEN: j := 5 NEG_VALUE_TOKEN: j := 6 POS_VALUE_TOKEN: j := 7 ESAC token_response[3:0] := src3[3+4*j:4*j] CASE(token_response[3:0]) OF 0 : dest[63:0] := src1[63:0] 1 : dest[63:0] := tsrc[63:0] 2 : dest[63:0] := QNaN(tsrc[63:0]) 3 : dest[63:0] := QNAN_Indefinite 4 : dest[63:0] := -INF 5 : dest[63:0] := +INF 6 : dest[63:0] := tsrc.sign ? -INF : +INF 7 : dest[63:0] := -0 8 : dest[63:0] := +0 9 : dest[63:0] := -1 10: dest[63:0] := +1 11: dest[63:0] := 1/2 12: dest[63:0] := 90.0 13: dest[63:0] := PI/2 14: dest[63:0] := MAX_FLOAT 15: dest[63:0] := -MAX_FLOAT ESAC CASE(tsrc[63:0]) OF ZERO_VALUE_TOKEN: IF (imm8[0]) #ZE; FI ZERO_VALUE_TOKEN: IF (imm8[1]) #IE; FI ONE_VALUE_TOKEN: IF (imm8[2]) #ZE; FI ONE_VALUE_TOKEN: IF (imm8[3]) #IE; FI SNAN_TOKEN: IF (imm8[4]) #IE; FI NEG_INF_TOKEN: IF (imm8[5]) #IE; FI NEG_VALUE_TOKEN: IF (imm8[6]) #IE; FI POS_INF_TOKEN: IF (imm8[7]) #IE; FI ESAC RETURN dest[63:0] } FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := FIXUPIMMPD(a[i+63:i], b[i+63:i], c[i+63:i], imm8[7:0]) ELSE dst[i+63:i] := a[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Miscellaneous Fix up packed double-precision (64-bit) floating-point elements in "a" and "b" using packed 64-bit integers in "c", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). "imm8" is used to set the required flags reporting. [sae_note] enum TOKEN_TYPE { QNAN_TOKEN := 0, \ SNAN_TOKEN := 1, \ ZERO_VALUE_TOKEN := 2, \ ONE_VALUE_TOKEN := 3, \ NEG_INF_TOKEN := 4, \ POS_INF_TOKEN := 5, \ NEG_VALUE_TOKEN := 6, \ POS_VALUE_TOKEN := 7 } DEFINE FIXUPIMMPD(src1[63:0], src2[63:0], src3[63:0], imm8[7:0]) { tsrc[63:0] := ((src2[62:52] == 0) AND (MXCSR.DAZ == 1)) ? 0.0 : src2[63:0] CASE(tsrc[63:0]) OF QNAN_TOKEN: j := 0 SNAN_TOKEN: j := 1 ZERO_VALUE_TOKEN: j := 2 ONE_VALUE_TOKEN: j := 3 NEG_INF_TOKEN: j := 4 POS_INF_TOKEN: j := 5 NEG_VALUE_TOKEN: j := 6 POS_VALUE_TOKEN: j := 7 ESAC token_response[3:0] := src3[3+4*j:4*j] CASE(token_response[3:0]) OF 0 : dest[63:0] := src1[63:0] 1 : dest[63:0] := tsrc[63:0] 2 : dest[63:0] := QNaN(tsrc[63:0]) 3 : dest[63:0] := QNAN_Indefinite 4 : dest[63:0] := -INF 5 : dest[63:0] := +INF 6 : dest[63:0] := tsrc.sign ? -INF : +INF 7 : dest[63:0] := -0 8 : dest[63:0] := +0 9 : dest[63:0] := -1 10: dest[63:0] := +1 11: dest[63:0] := 1/2 12: dest[63:0] := 90.0 13: dest[63:0] := PI/2 14: dest[63:0] := MAX_FLOAT 15: dest[63:0] := -MAX_FLOAT ESAC CASE(tsrc[63:0]) OF ZERO_VALUE_TOKEN: IF (imm8[0]) #ZE; FI ZERO_VALUE_TOKEN: IF (imm8[1]) #IE; FI ONE_VALUE_TOKEN: IF (imm8[2]) #ZE; FI ONE_VALUE_TOKEN: IF (imm8[3]) #IE; FI SNAN_TOKEN: IF (imm8[4]) #IE; FI NEG_INF_TOKEN: IF (imm8[5]) #IE; FI NEG_VALUE_TOKEN: IF (imm8[6]) #IE; FI POS_INF_TOKEN: IF (imm8[7]) #IE; FI ESAC RETURN dest[63:0] } FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := FIXUPIMMPD(a[i+63:i], b[i+63:i], c[i+63:i], imm8[7:0]) ELSE dst[i+63:i] := a[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Miscellaneous Fix up packed double-precision (64-bit) floating-point elements in "a" and "b" using packed 64-bit integers in "c", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). "imm8" is used to set the required flags reporting. enum TOKEN_TYPE { QNAN_TOKEN := 0, \ SNAN_TOKEN := 1, \ ZERO_VALUE_TOKEN := 2, \ ONE_VALUE_TOKEN := 3, \ NEG_INF_TOKEN := 4, \ POS_INF_TOKEN := 5, \ NEG_VALUE_TOKEN := 6, \ POS_VALUE_TOKEN := 7 } DEFINE FIXUPIMMPD(src1[63:0], src2[63:0], src3[63:0], imm8[7:0]) { tsrc[63:0] := ((src2[62:52] == 0) AND (MXCSR.DAZ == 1)) ? 0.0 : src2[63:0] CASE(tsrc[63:0]) OF QNAN_TOKEN:j := 0 SNAN_TOKEN:j := 1 ZERO_VALUE_TOKEN: j := 2 ONE_VALUE_TOKEN: j := 3 NEG_INF_TOKEN: j := 4 POS_INF_TOKEN: j := 5 NEG_VALUE_TOKEN: j := 6 POS_VALUE_TOKEN: j := 7 ESAC token_response[3:0] := src3[3+4*j:4*j] CASE(token_response[3:0]) OF 0 : dest[63:0] := src1[63:0] 1 : dest[63:0] := tsrc[63:0] 2 : dest[63:0] := QNaN(tsrc[63:0]) 3 : dest[63:0] := QNAN_Indefinite 4 : dest[63:0] := -INF 5 : dest[63:0] := +INF 6 : dest[63:0] := tsrc.sign? -INF : +INF 7 : dest[63:0] := -0 8 : dest[63:0] := +0 9 : dest[63:0] := -1 10: dest[63:0] := +1 11: dest[63:0] := 1/2 12: dest[63:0] := 90.0 13: dest[63:0] := PI/2 14: dest[63:0] := MAX_FLOAT 15: dest[63:0] := -MAX_FLOAT ESAC CASE(tsrc[31:0]) OF ZERO_VALUE_TOKEN: IF (imm8[0]) #ZE; FI ZERO_VALUE_TOKEN: IF (imm8[1]) #IE; FI ONE_VALUE_TOKEN: IF (imm8[2]) #ZE; FI ONE_VALUE_TOKEN: IF (imm8[3]) #IE; FI SNAN_TOKEN: IF (imm8[4]) #IE; FI NEG_INF_TOKEN: IF (imm8[5]) #IE; FI NEG_VALUE_TOKEN: IF (imm8[6]) #IE; FI POS_INF_TOKEN: IF (imm8[7]) #IE; FI ESAC RETURN dest[63:0] } FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := FIXUPIMMPD(a[i+63:i], b[i+63:i], c[i+63:i], imm8[7:0]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Miscellaneous Fix up packed double-precision (64-bit) floating-point elements in "a" and "b" using packed 64-bit integers in "c", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). "imm8" is used to set the required flags reporting. [sae_note] enum TOKEN_TYPE { QNAN_TOKEN := 0, \ SNAN_TOKEN := 1, \ ZERO_VALUE_TOKEN := 2, \ ONE_VALUE_TOKEN := 3, \ NEG_INF_TOKEN := 4, \ POS_INF_TOKEN := 5, \ NEG_VALUE_TOKEN := 6, \ POS_VALUE_TOKEN := 7 } DEFINE FIXUPIMMPD(src1[63:0], src2[63:0], src3[63:0], imm8[7:0]) { tsrc[63:0] := ((src2[62:52] == 0) AND (MXCSR.DAZ == 1)) ? 0.0 : src2[63:0] CASE(tsrc[63:0]) OF QNAN_TOKEN:j := 0 SNAN_TOKEN:j := 1 ZERO_VALUE_TOKEN: j := 2 ONE_VALUE_TOKEN: j := 3 NEG_INF_TOKEN: j := 4 POS_INF_TOKEN: j := 5 NEG_VALUE_TOKEN: j := 6 POS_VALUE_TOKEN: j := 7 ESAC token_response[3:0] := src3[3+4*j:4*j] CASE(token_response[3:0]) OF 0 : dest[63:0] := src1[63:0] 1 : dest[63:0] := tsrc[63:0] 2 : dest[63:0] := QNaN(tsrc[63:0]) 3 : dest[63:0] := QNAN_Indefinite 4 : dest[63:0] := -INF 5 : dest[63:0] := +INF 6 : dest[63:0] := tsrc.sign? -INF : +INF 7 : dest[63:0] := -0 8 : dest[63:0] := +0 9 : dest[63:0] := -1 10: dest[63:0] := +1 11: dest[63:0] := 1/2 12: dest[63:0] := 90.0 13: dest[63:0] := PI/2 14: dest[63:0] := MAX_FLOAT 15: dest[63:0] := -MAX_FLOAT ESAC CASE(tsrc[31:0]) OF ZERO_VALUE_TOKEN: IF (imm8[0]) #ZE; FI ZERO_VALUE_TOKEN: IF (imm8[1]) #IE; FI ONE_VALUE_TOKEN: IF (imm8[2]) #ZE; FI ONE_VALUE_TOKEN: IF (imm8[3]) #IE; FI SNAN_TOKEN: IF (imm8[4]) #IE; FI NEG_INF_TOKEN: IF (imm8[5]) #IE; FI NEG_VALUE_TOKEN: IF (imm8[6]) #IE; FI POS_INF_TOKEN: IF (imm8[7]) #IE; FI ESAC RETURN dest[63:0] } FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := FIXUPIMMPD(a[i+63:i], b[i+63:i], c[i+63:i], imm8[7:0]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
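The per-element fix-up described in the double-precision records above can be modelled on a single lane. The following is a minimal scalar sketch in Python, not the intrinsic itself. It works on raw 64-bit patterns so that signaling NaNs are never altered by the host. The names `bits_of`, `classify`, `CONSTANTS`, and `fixup_pd` are hypothetical. The sketch assumes that ONE_VALUE_TOKEN means exactly +1.0, that QNaN(x) simply sets the quiet bit, and it ignores MXCSR.DAZ and all exception flags.

```python
import math
import struct
import sys

# Token numbering follows the TOKEN_TYPE enum in the pseudocode.
QNAN, SNAN, ZERO, ONE, NEG_INF, POS_INF, NEG_VALUE, POS_VALUE = range(8)

SIGN = 1 << 63                      # sign bit of a binary64 value
EXP_MASK = 0x7FF0_0000_0000_0000    # exponent field
MAN_MASK = 0x000F_FFFF_FFFF_FFFF    # mantissa field
QUIET = 1 << 51                     # top mantissa bit: set for quiet NaNs


def bits_of(x):
    """Raw binary64 bit pattern of a Python float (used for non-NaN values only)."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def classify(b):
    """Map a 64-bit pattern to its token (assumption: DAZ is off)."""
    exp, man, neg = b & EXP_MASK, b & MAN_MASK, bool(b & SIGN)
    if exp == EXP_MASK:
        if man == 0:
            return NEG_INF if neg else POS_INF
        return QNAN if man & QUIET else SNAN
    if exp == 0 and man == 0:
        return ZERO                 # both +0 and -0
    if b == bits_of(1.0):
        return ONE                  # assumption: only +1.0 counts as ONE
    return NEG_VALUE if neg else POS_VALUE


# Responses that do not depend on the inputs.
CONSTANTS = {
    3: 0xFFF8_0000_0000_0000,       # x86 QNaN indefinite
    4: bits_of(-math.inf),
    5: bits_of(math.inf),
    7: SIGN,                        # -0
    8: 0,                           # +0
    9: bits_of(-1.0),
    10: bits_of(1.0),
    11: bits_of(0.5),
    12: bits_of(90.0),
    13: bits_of(math.pi / 2),
    14: bits_of(sys.float_info.max),
    15: bits_of(-sys.float_info.max),
}


def fixup_pd(a, b, table):
    """Fix up one lane: a, b are 64-bit patterns, table is the integer from 'c'."""
    j = classify(b)
    response = (table >> (4 * j)) & 0xF     # src3[3+4*j:4*j]
    if response == 0:
        return a
    if response == 1:
        return b
    if response == 2:
        return b | QUIET                    # assumed model of QNaN(tsrc)
    if response == 6:
        return CONSTANTS[4] if b & SIGN else CONSTANTS[5]
    return CONSTANTS[response]
```

With a table of `0x8`, only quiet NaNs in "b" are replaced (by +0), and every other lane keeps the value from "a", because all other nibbles select response 0.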
Floating Point AVX512F Miscellaneous Fix up packed single-precision (32-bit) floating-point elements in "a" and "b" using packed 32-bit integers in "c", and store the results in "dst". "imm8" is used to set the required flags reporting. enum TOKEN_TYPE { QNAN_TOKEN := 0, \ SNAN_TOKEN := 1, \ ZERO_VALUE_TOKEN := 2, \ ONE_VALUE_TOKEN := 3, \ NEG_INF_TOKEN := 4, \ POS_INF_TOKEN := 5, \ NEG_VALUE_TOKEN := 6, \ POS_VALUE_TOKEN := 7 } DEFINE FIXUPIMMPD(src1[31:0], src2[31:0], src3[31:0], imm8[7:0]) { tsrc[31:0] := ((src2[30:23] == 0) AND (MXCSR.DAZ == 1)) ? 0.0 : src2[31:0] CASE(tsrc[31:0]) OF QNAN_TOKEN:j := 0 SNAN_TOKEN:j := 1 ZERO_VALUE_TOKEN: j := 2 ONE_VALUE_TOKEN: j := 3 NEG_INF_TOKEN: j := 4 POS_INF_TOKEN: j := 5 NEG_VALUE_TOKEN: j := 6 POS_VALUE_TOKEN: j := 7 ESAC token_response[3:0] := src3[3+4*j:4*j] CASE(token_response[3:0]) OF 0 : dest[31:0] := src1[31:0] 1 : dest[31:0] := tsrc[31:0] 2 : dest[31:0] := QNaN(tsrc[31:0]) 3 : dest[31:0] := QNAN_Indefinite 4 : dest[31:0] := -INF 5 : dest[31:0] := +INF 6 : dest[31:0] := tsrc.sign? -INF : +INF 7 : dest[31:0] := -0 8 : dest[31:0] := +0 9 : dest[31:0] := -1 10: dest[31:0] := +1 11: dest[31:0] := 1/2 12: dest[31:0] := 90.0 13: dest[31:0] := PI/2 14: dest[31:0] := MAX_FLOAT 15: dest[31:0] := -MAX_FLOAT ESAC CASE(tsrc[31:0]) OF ZERO_VALUE_TOKEN: IF (imm8[0]) #ZE; FI ZERO_VALUE_TOKEN: IF (imm8[1]) #IE; FI ONE_VALUE_TOKEN: IF (imm8[2]) #ZE; FI ONE_VALUE_TOKEN: IF (imm8[3]) #IE; FI SNAN_TOKEN: IF (imm8[4]) #IE; FI NEG_INF_TOKEN: IF (imm8[5]) #IE; FI NEG_VALUE_TOKEN: IF (imm8[6]) #IE; FI POS_INF_TOKEN: IF (imm8[7]) #IE; FI ESAC RETURN dest[31:0] } FOR j := 0 to 15 i := j*32 dst[i+31:i] := FIXUPIMMPD(a[i+31:i], b[i+31:i], c[i+31:i], imm8[7:0]) ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Miscellaneous Fix up packed single-precision (32-bit) floating-point elements in "a" and "b" using packed 32-bit integers in "c", and store the results in "dst". "imm8" is used to set the required flags reporting. [sae_note] enum TOKEN_TYPE { QNAN_TOKEN := 0, \ SNAN_TOKEN := 1, \ ZERO_VALUE_TOKEN := 2, \ ONE_VALUE_TOKEN := 3, \ NEG_INF_TOKEN := 4, \ POS_INF_TOKEN := 5, \ NEG_VALUE_TOKEN := 6, \ POS_VALUE_TOKEN := 7 } DEFINE FIXUPIMMPD(src1[31:0], src2[31:0], src3[31:0], imm8[7:0]) { tsrc[31:0] := ((src2[30:23] == 0) AND (MXCSR.DAZ == 1)) ? 0.0 : src2[31:0] CASE(tsrc[31:0]) OF QNAN_TOKEN:j := 0 SNAN_TOKEN:j := 1 ZERO_VALUE_TOKEN: j := 2 ONE_VALUE_TOKEN: j := 3 NEG_INF_TOKEN: j := 4 POS_INF_TOKEN: j := 5 NEG_VALUE_TOKEN: j := 6 POS_VALUE_TOKEN: j := 7 ESAC token_response[3:0] := src3[3+4*j:4*j] CASE(token_response[3:0]) OF 0 : dest[31:0] := src1[31:0] 1 : dest[31:0] := tsrc[31:0] 2 : dest[31:0] := QNaN(tsrc[31:0]) 3 : dest[31:0] := QNAN_Indefinite 4 : dest[31:0] := -INF 5 : dest[31:0] := +INF 6 : dest[31:0] := tsrc.sign? -INF : +INF 7 : dest[31:0] := -0 8 : dest[31:0] := +0 9 : dest[31:0] := -1 10: dest[31:0] := +1 11: dest[31:0] := 1/2 12: dest[31:0] := 90.0 13: dest[31:0] := PI/2 14: dest[31:0] := MAX_FLOAT 15: dest[31:0] := -MAX_FLOAT ESAC CASE(tsrc[31:0]) OF ZERO_VALUE_TOKEN: IF (imm8[0]) #ZE; FI ZERO_VALUE_TOKEN: IF (imm8[1]) #IE; FI ONE_VALUE_TOKEN: IF (imm8[2]) #ZE; FI ONE_VALUE_TOKEN: IF (imm8[3]) #IE; FI SNAN_TOKEN: IF (imm8[4]) #IE; FI NEG_INF_TOKEN: IF (imm8[5]) #IE; FI NEG_VALUE_TOKEN: IF (imm8[6]) #IE; FI POS_INF_TOKEN: IF (imm8[7]) #IE; FI ESAC RETURN dest[31:0] } FOR j := 0 to 15 i := j*32 dst[i+31:i] := FIXUPIMMPD(a[i+31:i], b[i+31:i], c[i+31:i], imm8[7:0]) ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Miscellaneous Fix up packed single-precision (32-bit) floating-point elements in "a" and "b" using packed 32-bit integers in "c", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). "imm8" is used to set the required flags reporting. enum TOKEN_TYPE { QNAN_TOKEN := 0, \ SNAN_TOKEN := 1, \ ZERO_VALUE_TOKEN := 2, \ ONE_VALUE_TOKEN := 3, \ NEG_INF_TOKEN := 4, \ POS_INF_TOKEN := 5, \ NEG_VALUE_TOKEN := 6, \ POS_VALUE_TOKEN := 7 } DEFINE FIXUPIMMPD(src1[31:0], src2[31:0], src3[31:0], imm8[7:0]) { tsrc[31:0] := ((src2[30:23] == 0) AND (MXCSR.DAZ == 1)) ? 0.0 : src2[31:0] CASE(tsrc[31:0]) OF QNAN_TOKEN:j := 0 SNAN_TOKEN:j := 1 ZERO_VALUE_TOKEN: j := 2 ONE_VALUE_TOKEN: j := 3 NEG_INF_TOKEN: j := 4 POS_INF_TOKEN: j := 5 NEG_VALUE_TOKEN: j := 6 POS_VALUE_TOKEN: j := 7 ESAC token_response[3:0] := src3[3+4*j:4*j] CASE(token_response[3:0]) OF 0 : dest[31:0] := src1[31:0] 1 : dest[31:0] := tsrc[31:0] 2 : dest[31:0] := QNaN(tsrc[31:0]) 3 : dest[31:0] := QNAN_Indefinite 4 : dest[31:0] := -INF 5 : dest[31:0] := +INF 6 : dest[31:0] := tsrc.sign? -INF : +INF 7 : dest[31:0] := -0 8 : dest[31:0] := +0 9 : dest[31:0] := -1 10: dest[31:0] := +1 11: dest[31:0] := 1/2 12: dest[31:0] := 90.0 13: dest[31:0] := PI/2 14: dest[31:0] := MAX_FLOAT 15: dest[31:0] := -MAX_FLOAT ESAC CASE(tsrc[31:0]) OF ZERO_VALUE_TOKEN: IF (imm8[0]) #ZE; FI ZERO_VALUE_TOKEN: IF (imm8[1]) #IE; FI ONE_VALUE_TOKEN: IF (imm8[2]) #ZE; FI ONE_VALUE_TOKEN: IF (imm8[3]) #IE; FI SNAN_TOKEN: IF (imm8[4]) #IE; FI NEG_INF_TOKEN: IF (imm8[5]) #IE; FI NEG_VALUE_TOKEN: IF (imm8[6]) #IE; FI POS_INF_TOKEN: IF (imm8[7]) #IE; FI ESAC RETURN dest[31:0] } FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := FIXUPIMMPD(a[i+31:i], b[i+31:i], c[i+31:i], imm8[7:0]) ELSE dst[i+31:i] := a[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Miscellaneous Fix up packed single-precision (32-bit) floating-point elements in "a" and "b" using packed 32-bit integers in "c", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). "imm8" is used to set the required flags reporting. [sae_note] enum TOKEN_TYPE { QNAN_TOKEN := 0, \ SNAN_TOKEN := 1, \ ZERO_VALUE_TOKEN := 2, \ ONE_VALUE_TOKEN := 3, \ NEG_INF_TOKEN := 4, \ POS_INF_TOKEN := 5, \ NEG_VALUE_TOKEN := 6, \ POS_VALUE_TOKEN := 7 } DEFINE FIXUPIMMPD(src1[31:0], src2[31:0], src3[31:0], imm8[7:0]) { tsrc[31:0] := ((src2[30:23] == 0) AND (MXCSR.DAZ == 1)) ? 0.0 : src2[31:0] CASE(tsrc[31:0]) OF QNAN_TOKEN:j := 0 SNAN_TOKEN:j := 1 ZERO_VALUE_TOKEN: j := 2 ONE_VALUE_TOKEN: j := 3 NEG_INF_TOKEN: j := 4 POS_INF_TOKEN: j := 5 NEG_VALUE_TOKEN: j := 6 POS_VALUE_TOKEN: j := 7 ESAC token_response[3:0] := src3[3+4*j:4*j] CASE(token_response[3:0]) OF 0 : dest[31:0] := src1[31:0] 1 : dest[31:0] := tsrc[31:0] 2 : dest[31:0] := QNaN(tsrc[31:0]) 3 : dest[31:0] := QNAN_Indefinite 4 : dest[31:0] := -INF 5 : dest[31:0] := +INF 6 : dest[31:0] := tsrc.sign? -INF : +INF 7 : dest[31:0] := -0 8 : dest[31:0] := +0 9 : dest[31:0] := -1 10: dest[31:0] := +1 11: dest[31:0] := 1/2 12: dest[31:0] := 90.0 13: dest[31:0] := PI/2 14: dest[31:0] := MAX_FLOAT 15: dest[31:0] := -MAX_FLOAT ESAC CASE(tsrc[31:0]) OF ZERO_VALUE_TOKEN: IF (imm8[0]) #ZE; FI ZERO_VALUE_TOKEN: IF (imm8[1]) #IE; FI ONE_VALUE_TOKEN: IF (imm8[2]) #ZE; FI ONE_VALUE_TOKEN: IF (imm8[3]) #IE; FI SNAN_TOKEN: IF (imm8[4]) #IE; FI NEG_INF_TOKEN: IF (imm8[5]) #IE; FI NEG_VALUE_TOKEN: IF (imm8[6]) #IE; FI POS_INF_TOKEN: IF (imm8[7]) #IE; FI ESAC RETURN dest[31:0] } FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := FIXUPIMMPD(a[i+31:i], b[i+31:i], c[i+31:i], imm8[7:0]) ELSE dst[i+31:i] := a[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Miscellaneous Fix up packed single-precision (32-bit) floating-point elements in "a" and "b" using packed 32-bit integers in "c", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). "imm8" is used to set the required flags reporting. enum TOKEN_TYPE { QNAN_TOKEN := 0, \ SNAN_TOKEN := 1, \ ZERO_VALUE_TOKEN := 2, \ ONE_VALUE_TOKEN := 3, \ NEG_INF_TOKEN := 4, \ POS_INF_TOKEN := 5, \ NEG_VALUE_TOKEN := 6, \ POS_VALUE_TOKEN := 7 } DEFINE FIXUPIMMPD(src1[31:0], src2[31:0], src3[31:0], imm8[7:0]) { tsrc[31:0] := ((src2[30:23] == 0) AND (MXCSR.DAZ == 1)) ? 0.0 : src2[31:0] CASE(tsrc[31:0]) OF QNAN_TOKEN:j := 0 SNAN_TOKEN:j := 1 ZERO_VALUE_TOKEN: j := 2 ONE_VALUE_TOKEN: j := 3 NEG_INF_TOKEN: j := 4 POS_INF_TOKEN: j := 5 NEG_VALUE_TOKEN: j := 6 POS_VALUE_TOKEN: j := 7 ESAC token_response[3:0] := src3[3+4*j:4*j] CASE(token_response[3:0]) OF 0 : dest[31:0] := src1[31:0] 1 : dest[31:0] := tsrc[31:0] 2 : dest[31:0] := QNaN(tsrc[31:0]) 3 : dest[31:0] := QNAN_Indefinite 4 : dest[31:0] := -INF 5 : dest[31:0] := +INF 6 : dest[31:0] := tsrc.sign? -INF : +INF 7 : dest[31:0] := -0 8 : dest[31:0] := +0 9 : dest[31:0] := -1 10: dest[31:0] := +1 11: dest[31:0] := 1/2 12: dest[31:0] := 90.0 13: dest[31:0] := PI/2 14: dest[31:0] := MAX_FLOAT 15: dest[31:0] := -MAX_FLOAT ESAC CASE(tsrc[31:0]) OF ZERO_VALUE_TOKEN: IF (imm8[0]) #ZE; FI ZERO_VALUE_TOKEN: IF (imm8[1]) #IE; FI ONE_VALUE_TOKEN: IF (imm8[2]) #ZE; FI ONE_VALUE_TOKEN: IF (imm8[3]) #IE; FI SNAN_TOKEN: IF (imm8[4]) #IE; FI NEG_INF_TOKEN: IF (imm8[5]) #IE; FI NEG_VALUE_TOKEN: IF (imm8[6]) #IE; FI POS_INF_TOKEN: IF (imm8[7]) #IE; FI ESAC RETURN dest[31:0] } FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := FIXUPIMMPD(a[i+31:i], b[i+31:i], c[i+31:i], imm8[7:0]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Miscellaneous Fix up packed single-precision (32-bit) floating-point elements in "a" and "b" using packed 32-bit integers in "c", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). "imm8" is used to set the required flags reporting. [sae_note] enum TOKEN_TYPE { QNAN_TOKEN := 0, \ SNAN_TOKEN := 1, \ ZERO_VALUE_TOKEN := 2, \ ONE_VALUE_TOKEN := 3, \ NEG_INF_TOKEN := 4, \ POS_INF_TOKEN := 5, \ NEG_VALUE_TOKEN := 6, \ POS_VALUE_TOKEN := 7 } DEFINE FIXUPIMMPD(src1[31:0], src2[31:0], src3[31:0], imm8[7:0]) { tsrc[31:0] := ((src2[30:23] == 0) AND (MXCSR.DAZ == 1)) ? 0.0 : src2[31:0] CASE(tsrc[31:0]) OF QNAN_TOKEN:j := 0 SNAN_TOKEN:j := 1 ZERO_VALUE_TOKEN: j := 2 ONE_VALUE_TOKEN: j := 3 NEG_INF_TOKEN: j := 4 POS_INF_TOKEN: j := 5 NEG_VALUE_TOKEN: j := 6 POS_VALUE_TOKEN: j := 7 ESAC token_response[3:0] := src3[3+4*j:4*j] CASE(token_response[3:0]) OF 0 : dest[31:0] := src1[31:0] 1 : dest[31:0] := tsrc[31:0] 2 : dest[31:0] := QNaN(tsrc[31:0]) 3 : dest[31:0] := QNAN_Indefinite 4 : dest[31:0] := -INF 5 : dest[31:0] := +INF 6 : dest[31:0] := tsrc.sign? -INF : +INF 7 : dest[31:0] := -0 8 : dest[31:0] := +0 9 : dest[31:0] := -1 10: dest[31:0] := +1 11: dest[31:0] := 1/2 12: dest[31:0] := 90.0 13: dest[31:0] := PI/2 14: dest[31:0] := MAX_FLOAT 15: dest[31:0] := -MAX_FLOAT ESAC CASE(tsrc[31:0]) OF ZERO_VALUE_TOKEN: IF (imm8[0]) #ZE; FI ZERO_VALUE_TOKEN: IF (imm8[1]) #IE; FI ONE_VALUE_TOKEN: IF (imm8[2]) #ZE; FI ONE_VALUE_TOKEN: IF (imm8[3]) #IE; FI SNAN_TOKEN: IF (imm8[4]) #IE; FI NEG_INF_TOKEN: IF (imm8[5]) #IE; FI NEG_VALUE_TOKEN: IF (imm8[6]) #IE; FI POS_INF_TOKEN: IF (imm8[7]) #IE; FI ESAC RETURN dest[31:0] } FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := FIXUPIMMPD(a[i+31:i], b[i+31:i], c[i+31:i], imm8[7:0]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
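The writemask and zeromask variants above differ only in what an unselected lane receives. A small Python sketch of that selection step follows. The function name `apply_mask` and its list-based lanes are illustrative assumptions, and the per-lane fix-up results are taken as already computed.

```python
def apply_mask(fixed, a, k, zero):
    """Combine per-lane fix-up results with mask k.

    fixed: fix-up result for every lane
    a:     first source operand, lane by lane
    k:     integer mask, bit j controls lane j
    zero:  True models the zeromask form, False the writemask form
    """
    out = []
    for j, (result, fallback) in enumerate(zip(fixed, a)):
        if (k >> j) & 1:
            out.append(result)          # mask bit set: take the fixed-up value
        elif zero:
            out.append(0.0)             # zeromask: lane is zeroed
        else:
            out.append(fallback)        # writemask: lane is copied from "a"
    return out
```

For example, with `k = 0b0101` lanes 0 and 2 take the fixed-up values, while lanes 1 and 3 are either copied from "a" or zeroed.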
Floating Point AVX512F Miscellaneous Fix up the lower double-precision (64-bit) floating-point elements in "a" and "b" using the lower 64-bit integer in "c", store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst". "imm8" is used to set the required flags reporting. [sae_note] enum TOKEN_TYPE { QNAN_TOKEN := 0, \ SNAN_TOKEN := 1, \ ZERO_VALUE_TOKEN := 2, \ ONE_VALUE_TOKEN := 3, \ NEG_INF_TOKEN := 4, \ POS_INF_TOKEN := 5, \ NEG_VALUE_TOKEN := 6, \ POS_VALUE_TOKEN := 7 } DEFINE FIXUPIMMPD(src1[63:0], src2[63:0], src3[63:0], imm8[7:0]) { tsrc[63:0] := ((src2[62:52] == 0) AND (MXCSR.DAZ == 1)) ? 0.0 : src2[63:0] CASE(tsrc[63:0]) OF QNAN_TOKEN:j := 0 SNAN_TOKEN:j := 1 ZERO_VALUE_TOKEN: j := 2 ONE_VALUE_TOKEN: j := 3 NEG_INF_TOKEN: j := 4 POS_INF_TOKEN: j := 5 NEG_VALUE_TOKEN: j := 6 POS_VALUE_TOKEN: j := 7 ESAC token_response[3:0] := src3[3+4*j:4*j] CASE(token_response[3:0]) OF 0 : dest[63:0] := src1[63:0] 1 : dest[63:0] := tsrc[63:0] 2 : dest[63:0] := QNaN(tsrc[63:0]) 3 : dest[63:0] := QNAN_Indefinite 4 : dest[63:0] := -INF 5 : dest[63:0] := +INF 6 : dest[63:0] := tsrc.sign? -INF : +INF 7 : dest[63:0] := -0 8 : dest[63:0] := +0 9 : dest[63:0] := -1 10: dest[63:0] := +1 11: dest[63:0] := 1/2 12: dest[63:0] := 90.0 13: dest[63:0] := PI/2 14: dest[63:0] := MAX_FLOAT 15: dest[63:0] := -MAX_FLOAT ESAC CASE(tsrc[63:0]) OF ZERO_VALUE_TOKEN: IF (imm8[0]) #ZE; FI ZERO_VALUE_TOKEN: IF (imm8[1]) #IE; FI ONE_VALUE_TOKEN: IF (imm8[2]) #ZE; FI ONE_VALUE_TOKEN: IF (imm8[3]) #IE; FI SNAN_TOKEN: IF (imm8[4]) #IE; FI NEG_INF_TOKEN: IF (imm8[5]) #IE; FI NEG_VALUE_TOKEN: IF (imm8[6]) #IE; FI POS_INF_TOKEN: IF (imm8[7]) #IE; FI ESAC RETURN dest[63:0] } dst[63:0] := FIXUPIMMPD(a[63:0], b[63:0], c[63:0], imm8[7:0]) dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point AVX512F Miscellaneous Fix up the lower double-precision (64-bit) floating-point elements in "a" and "b" using the lower 64-bit integer in "c", store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst". "imm8" is used to set the required flags reporting. enum TOKEN_TYPE { QNAN_TOKEN := 0, \ SNAN_TOKEN := 1, \ ZERO_VALUE_TOKEN := 2, \ ONE_VALUE_TOKEN := 3, \ NEG_INF_TOKEN := 4, \ POS_INF_TOKEN := 5, \ NEG_VALUE_TOKEN := 6, \ POS_VALUE_TOKEN := 7 } DEFINE FIXUPIMMPD(src1[63:0], src2[63:0], src3[63:0], imm8[7:0]) { tsrc[63:0] := ((src2[62:52] == 0) AND (MXCSR.DAZ == 1)) ? 0.0 : src2[63:0] CASE(tsrc[63:0]) OF QNAN_TOKEN:j := 0 SNAN_TOKEN:j := 1 ZERO_VALUE_TOKEN: j := 2 ONE_VALUE_TOKEN: j := 3 NEG_INF_TOKEN: j := 4 POS_INF_TOKEN: j := 5 NEG_VALUE_TOKEN: j := 6 POS_VALUE_TOKEN: j := 7 ESAC token_response[3:0] := src3[3+4*j:4*j] CASE(token_response[3:0]) OF 0 : dest[63:0] := src1[63:0] 1 : dest[63:0] := tsrc[63:0] 2 : dest[63:0] := QNaN(tsrc[63:0]) 3 : dest[63:0] := QNAN_Indefinite 4 : dest[63:0] := -INF 5 : dest[63:0] := +INF 6 : dest[63:0] := tsrc.sign? -INF : +INF 7 : dest[63:0] := -0 8 : dest[63:0] := +0 9 : dest[63:0] := -1 10: dest[63:0] := +1 11: dest[63:0] := 1/2 12: dest[63:0] := 90.0 13: dest[63:0] := PI/2 14: dest[63:0] := MAX_FLOAT 15: dest[63:0] := -MAX_FLOAT ESAC CASE(tsrc[63:0]) OF ZERO_VALUE_TOKEN: IF (imm8[0]) #ZE; FI ZERO_VALUE_TOKEN: IF (imm8[1]) #IE; FI ONE_VALUE_TOKEN: IF (imm8[2]) #ZE; FI ONE_VALUE_TOKEN: IF (imm8[3]) #IE; FI SNAN_TOKEN: IF (imm8[4]) #IE; FI NEG_INF_TOKEN: IF (imm8[5]) #IE; FI NEG_VALUE_TOKEN: IF (imm8[6]) #IE; FI POS_INF_TOKEN: IF (imm8[7]) #IE; FI ESAC RETURN dest[63:0] } dst[63:0] := FIXUPIMMPD(a[63:0], b[63:0], c[63:0], imm8[7:0]) dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point AVX512F Miscellaneous Fix up the lower double-precision (64-bit) floating-point elements in "a" and "b" using the lower 64-bit integer in "c", store the result in the lower element of "dst" using writemask "k" (the element is copied from "a" when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". "imm8" is used to set the required flags reporting. [sae_note] enum TOKEN_TYPE { QNAN_TOKEN := 0, \ SNAN_TOKEN := 1, \ ZERO_VALUE_TOKEN := 2, \ ONE_VALUE_TOKEN := 3, \ NEG_INF_TOKEN := 4, \ POS_INF_TOKEN := 5, \ NEG_VALUE_TOKEN := 6, \ POS_VALUE_TOKEN := 7 } DEFINE FIXUPIMMPD(src1[63:0], src2[63:0], src3[63:0], imm8[7:0]) { tsrc[63:0] := ((src2[62:52] == 0) AND (MXCSR.DAZ == 1)) ? 0.0 : src2[63:0] CASE(tsrc[63:0]) OF QNAN_TOKEN:j := 0 SNAN_TOKEN:j := 1 ZERO_VALUE_TOKEN: j := 2 ONE_VALUE_TOKEN: j := 3 NEG_INF_TOKEN: j := 4 POS_INF_TOKEN: j := 5 NEG_VALUE_TOKEN: j := 6 POS_VALUE_TOKEN: j := 7 ESAC token_response[3:0] := src3[3+4*j:4*j] CASE(token_response[3:0]) OF 0 : dest[63:0] := src1[63:0] 1 : dest[63:0] := tsrc[63:0] 2 : dest[63:0] := QNaN(tsrc[63:0]) 3 : dest[63:0] := QNAN_Indefinite 4 : dest[63:0] := -INF 5 : dest[63:0] := +INF 6 : dest[63:0] := tsrc.sign? -INF : +INF 7 : dest[63:0] := -0 8 : dest[63:0] := +0 9 : dest[63:0] := -1 10: dest[63:0] := +1 11: dest[63:0] := 1/2 12: dest[63:0] := 90.0 13: dest[63:0] := PI/2 14: dest[63:0] := MAX_FLOAT 15: dest[63:0] := -MAX_FLOAT ESAC CASE(tsrc[63:0]) OF ZERO_VALUE_TOKEN: IF (imm8[0]) #ZE; FI ZERO_VALUE_TOKEN: IF (imm8[1]) #IE; FI ONE_VALUE_TOKEN: IF (imm8[2]) #ZE; FI ONE_VALUE_TOKEN: IF (imm8[3]) #IE; FI SNAN_TOKEN: IF (imm8[4]) #IE; FI NEG_INF_TOKEN: IF (imm8[5]) #IE; FI NEG_VALUE_TOKEN: IF (imm8[6]) #IE; FI POS_INF_TOKEN: IF (imm8[7]) #IE; FI ESAC RETURN dest[63:0] } IF k[0] dst[63:0] := FIXUPIMMPD(a[63:0], b[63:0], c[63:0], imm8[7:0]) ELSE dst[63:0] := a[63:0] FI dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point AVX512F Miscellaneous Fix up the lower double-precision (64-bit) floating-point elements in "a" and "b" using the lower 64-bit integer in "c", store the result in the lower element of "dst" using writemask "k" (the element is copied from "a" when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". "imm8" is used to set the required flags reporting. enum TOKEN_TYPE { QNAN_TOKEN := 0, \ SNAN_TOKEN := 1, \ ZERO_VALUE_TOKEN := 2, \ ONE_VALUE_TOKEN := 3, \ NEG_INF_TOKEN := 4, \ POS_INF_TOKEN := 5, \ NEG_VALUE_TOKEN := 6, \ POS_VALUE_TOKEN := 7 } DEFINE FIXUPIMMPD(src1[63:0], src2[63:0], src3[63:0], imm8[7:0]) { tsrc[63:0] := ((src2[62:52] == 0) AND (MXCSR.DAZ == 1)) ? 0.0 : src2[63:0] CASE(tsrc[63:0]) OF QNAN_TOKEN:j := 0 SNAN_TOKEN:j := 1 ZERO_VALUE_TOKEN: j := 2 ONE_VALUE_TOKEN: j := 3 NEG_INF_TOKEN: j := 4 POS_INF_TOKEN: j := 5 NEG_VALUE_TOKEN: j := 6 POS_VALUE_TOKEN: j := 7 ESAC token_response[3:0] := src3[3+4*j:4*j] CASE(token_response[3:0]) OF 0 : dest[63:0] := src1[63:0] 1 : dest[63:0] := tsrc[63:0] 2 : dest[63:0] := QNaN(tsrc[63:0]) 3 : dest[63:0] := QNAN_Indefinite 4 : dest[63:0] := -INF 5 : dest[63:0] := +INF 6 : dest[63:0] := tsrc.sign? -INF : +INF 7 : dest[63:0] := -0 8 : dest[63:0] := +0 9 : dest[63:0] := -1 10: dest[63:0] := +1 11: dest[63:0] := 1/2 12: dest[63:0] := 90.0 13: dest[63:0] := PI/2 14: dest[63:0] := MAX_FLOAT 15: dest[63:0] := -MAX_FLOAT ESAC CASE(tsrc[63:0]) OF ZERO_VALUE_TOKEN: IF (imm8[0]) #ZE; FI ZERO_VALUE_TOKEN: IF (imm8[1]) #IE; FI ONE_VALUE_TOKEN: IF (imm8[2]) #ZE; FI ONE_VALUE_TOKEN: IF (imm8[3]) #IE; FI SNAN_TOKEN: IF (imm8[4]) #IE; FI NEG_INF_TOKEN: IF (imm8[5]) #IE; FI NEG_VALUE_TOKEN: IF (imm8[6]) #IE; FI POS_INF_TOKEN: IF (imm8[7]) #IE; FI ESAC RETURN dest[63:0] } IF k[0] dst[63:0] := FIXUPIMMPD(a[63:0], b[63:0], c[63:0], imm8[7:0]) ELSE dst[63:0] := a[63:0] FI dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point AVX512F Miscellaneous Fix up the lower double-precision (64-bit) floating-point elements in "a" and "b" using the lower 64-bit integer in "c", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". "imm8" is used to set the required flags reporting. [sae_note] enum TOKEN_TYPE { QNAN_TOKEN := 0, \ SNAN_TOKEN := 1, \ ZERO_VALUE_TOKEN := 2, \ ONE_VALUE_TOKEN := 3, \ NEG_INF_TOKEN := 4, \ POS_INF_TOKEN := 5, \ NEG_VALUE_TOKEN := 6, \ POS_VALUE_TOKEN := 7 } DEFINE FIXUPIMMPD(src1[63:0], src2[63:0], src3[63:0], imm8[7:0]) { tsrc[63:0] := ((src2[62:52] == 0) AND (MXCSR.DAZ == 1)) ? 0.0 : src2[63:0] CASE(tsrc[63:0]) OF QNAN_TOKEN:j := 0 SNAN_TOKEN:j := 1 ZERO_VALUE_TOKEN: j := 2 ONE_VALUE_TOKEN: j := 3 NEG_INF_TOKEN: j := 4 POS_INF_TOKEN: j := 5 NEG_VALUE_TOKEN: j := 6 POS_VALUE_TOKEN: j := 7 ESAC token_response[3:0] := src3[3+4*j:4*j] CASE(token_response[3:0]) OF 0 : dest[63:0] := src1[63:0] 1 : dest[63:0] := tsrc[63:0] 2 : dest[63:0] := QNaN(tsrc[63:0]) 3 : dest[63:0] := QNAN_Indefinite 4 : dest[63:0] := -INF 5 : dest[63:0] := +INF 6 : dest[63:0] := tsrc.sign? -INF : +INF 7 : dest[63:0] := -0 8 : dest[63:0] := +0 9 : dest[63:0] := -1 10: dest[63:0] := +1 11: dest[63:0] := 1/2 12: dest[63:0] := 90.0 13: dest[63:0] := PI/2 14: dest[63:0] := MAX_FLOAT 15: dest[63:0] := -MAX_FLOAT ESAC CASE(tsrc[63:0]) OF ZERO_VALUE_TOKEN: IF (imm8[0]) #ZE; FI ZERO_VALUE_TOKEN: IF (imm8[1]) #IE; FI ONE_VALUE_TOKEN: IF (imm8[2]) #ZE; FI ONE_VALUE_TOKEN: IF (imm8[3]) #IE; FI SNAN_TOKEN: IF (imm8[4]) #IE; FI NEG_INF_TOKEN: IF (imm8[5]) #IE; FI NEG_VALUE_TOKEN: IF (imm8[6]) #IE; FI POS_INF_TOKEN: IF (imm8[7]) #IE; FI ESAC RETURN dest[63:0] } IF k[0] dst[63:0] := FIXUPIMMPD(a[63:0], b[63:0], c[63:0], imm8[7:0]) ELSE dst[63:0] := 0 FI dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point AVX512F Miscellaneous Fix up the lower double-precision (64-bit) floating-point elements in "a" and "b" using the lower 64-bit integer in "c", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". "imm8" is used to set the required flags reporting. enum TOKEN_TYPE { QNAN_TOKEN := 0, \ SNAN_TOKEN := 1, \ ZERO_VALUE_TOKEN := 2, \ ONE_VALUE_TOKEN := 3, \ NEG_INF_TOKEN := 4, \ POS_INF_TOKEN := 5, \ NEG_VALUE_TOKEN := 6, \ POS_VALUE_TOKEN := 7 } DEFINE FIXUPIMMPD(src1[63:0], src2[63:0], src3[63:0], imm8[7:0]) { tsrc[63:0] := ((src2[62:52] == 0) AND (MXCSR.DAZ == 1)) ? 0.0 : src2[63:0] CASE(tsrc[63:0]) OF QNAN_TOKEN:j := 0 SNAN_TOKEN:j := 1 ZERO_VALUE_TOKEN: j := 2 ONE_VALUE_TOKEN: j := 3 NEG_INF_TOKEN: j := 4 POS_INF_TOKEN: j := 5 NEG_VALUE_TOKEN: j := 6 POS_VALUE_TOKEN: j := 7 ESAC token_response[3:0] := src3[3+4*j:4*j] CASE(token_response[3:0]) OF 0 : dest[63:0] := src1[63:0] 1 : dest[63:0] := tsrc[63:0] 2 : dest[63:0] := QNaN(tsrc[63:0]) 3 : dest[63:0] := QNAN_Indefinite 4 : dest[63:0] := -INF 5 : dest[63:0] := +INF 6 : dest[63:0] := tsrc.sign? -INF : +INF 7 : dest[63:0] := -0 8 : dest[63:0] := +0 9 : dest[63:0] := -1 10: dest[63:0] := +1 11: dest[63:0] := 1/2 12: dest[63:0] := 90.0 13: dest[63:0] := PI/2 14: dest[63:0] := MAX_FLOAT 15: dest[63:0] := -MAX_FLOAT ESAC CASE(tsrc[63:0]) OF ZERO_VALUE_TOKEN: IF (imm8[0]) #ZE; FI ZERO_VALUE_TOKEN: IF (imm8[1]) #IE; FI ONE_VALUE_TOKEN: IF (imm8[2]) #ZE; FI ONE_VALUE_TOKEN: IF (imm8[3]) #IE; FI SNAN_TOKEN: IF (imm8[4]) #IE; FI NEG_INF_TOKEN: IF (imm8[5]) #IE; FI NEG_VALUE_TOKEN: IF (imm8[6]) #IE; FI POS_INF_TOKEN: IF (imm8[7]) #IE; FI ESAC RETURN dest[63:0] } IF k[0] dst[63:0] := FIXUPIMMPD(a[63:0], b[63:0], c[63:0], imm8[7:0]) ELSE dst[63:0] := 0 FI dst[127:64] := a[127:64] dst[MAX:128] := 0
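The integer in "c" acts as a lookup table of eight 4-bit responses, one per token, read as `src3[3+4*j:4*j]`. A short Python sketch of how such a table value could be assembled is shown below. The helper `build_table` and the string token names are hypothetical conveniences, not part of any Intel API.

```python
# Token order follows the TOKEN_TYPE enum: index j selects nibble j of the table.
TOKENS = ["QNAN", "SNAN", "ZERO", "ONE",
          "NEG_INF", "POS_INF", "NEG_VALUE", "POS_VALUE"]


def build_table(responses):
    """Pack {token name: response code 0..15} into the table integer.

    Tokens that are not listed keep response 0, which passes the
    element from "a" through unchanged.
    """
    table = 0
    for name, response in responses.items():
        if not 0 <= response <= 15:
            raise ValueError(f"response for {name} must fit in 4 bits")
        j = TOKENS.index(name)
        table |= response << (4 * j)
    return table


# Example policy: NaNs become +0 (response 8), infinities are clamped to
# -MAX_FLOAT (response 15) and +MAX_FLOAT (response 14).
CLAMP_TABLE = build_table({"QNAN": 8, "SNAN": 8, "NEG_INF": 15, "POS_INF": 14})
```

Only the low 32 bits of the table are ever consulted, since eight tokens times four bits covers bits 31:0.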
Floating Point AVX512F Miscellaneous Fix up the lower single-precision (32-bit) floating-point elements in "a" and "b" using the lower 32-bit integer in "c", store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". "imm8" is used to set the required flags reporting. [sae_note] enum TOKEN_TYPE { QNAN_TOKEN := 0, \ SNAN_TOKEN := 1, \ ZERO_VALUE_TOKEN := 2, \ ONE_VALUE_TOKEN := 3, \ NEG_INF_TOKEN := 4, \ POS_INF_TOKEN := 5, \ NEG_VALUE_TOKEN := 6, \ POS_VALUE_TOKEN := 7 } DEFINE FIXUPIMMPD(src1[31:0], src2[31:0], src3[31:0], imm8[7:0]) { tsrc[31:0] := ((src2[30:23] == 0) AND (MXCSR.DAZ == 1)) ? 0.0 : src2[31:0] CASE(tsrc[31:0]) OF QNAN_TOKEN:j := 0 SNAN_TOKEN:j := 1 ZERO_VALUE_TOKEN: j := 2 ONE_VALUE_TOKEN: j := 3 NEG_INF_TOKEN: j := 4 POS_INF_TOKEN: j := 5 NEG_VALUE_TOKEN: j := 6 POS_VALUE_TOKEN: j := 7 ESAC token_response[3:0] := src3[3+4*j:4*j] CASE(token_response[3:0]) OF 0 : dest[31:0] := src1[31:0] 1 : dest[31:0] := tsrc[31:0] 2 : dest[31:0] := QNaN(tsrc[31:0]) 3 : dest[31:0] := QNAN_Indefinite 4 : dest[31:0] := -INF 5 : dest[31:0] := +INF 6 : dest[31:0] := tsrc.sign? -INF : +INF 7 : dest[31:0] := -0 8 : dest[31:0] := +0 9 : dest[31:0] := -1 10: dest[31:0] := +1 11: dest[31:0] := 1/2 12: dest[31:0] := 90.0 13: dest[31:0] := PI/2 14: dest[31:0] := MAX_FLOAT 15: dest[31:0] := -MAX_FLOAT ESAC CASE(tsrc[31:0]) OF ZERO_VALUE_TOKEN: IF (imm8[0]) #ZE; FI ZERO_VALUE_TOKEN: IF (imm8[1]) #IE; FI ONE_VALUE_TOKEN: IF (imm8[2]) #ZE; FI ONE_VALUE_TOKEN: IF (imm8[3]) #IE; FI SNAN_TOKEN: IF (imm8[4]) #IE; FI NEG_INF_TOKEN: IF (imm8[5]) #IE; FI NEG_VALUE_TOKEN: IF (imm8[6]) #IE; FI POS_INF_TOKEN: IF (imm8[7]) #IE; FI ESAC RETURN dest[31:0] } dst[31:0] := FIXUPIMMPD(a[31:0], b[31:0], c[31:0], imm8[7:0]) dst[127:32] := a[127:32] dst[MAX:128] := 0
Floating Point AVX512F Miscellaneous Fix up the lower single-precision (32-bit) floating-point elements in "a" and "b" using the lower 32-bit integer in "c", store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". "imm8" is used to set the required flags reporting. enum TOKEN_TYPE { QNAN_TOKEN := 0, \ SNAN_TOKEN := 1, \ ZERO_VALUE_TOKEN := 2, \ ONE_VALUE_TOKEN := 3, \ NEG_INF_TOKEN := 4, \ POS_INF_TOKEN := 5, \ NEG_VALUE_TOKEN := 6, \ POS_VALUE_TOKEN := 7 } DEFINE FIXUPIMMPD(src1[31:0], src2[31:0], src3[31:0], imm8[7:0]) { tsrc[31:0] := ((src2[30:23] == 0) AND (MXCSR.DAZ == 1)) ? 0.0 : src2[31:0] CASE(tsrc[31:0]) OF QNAN_TOKEN:j := 0 SNAN_TOKEN:j := 1 ZERO_VALUE_TOKEN: j := 2 ONE_VALUE_TOKEN: j := 3 NEG_INF_TOKEN: j := 4 POS_INF_TOKEN: j := 5 NEG_VALUE_TOKEN: j := 6 POS_VALUE_TOKEN: j := 7 ESAC token_response[3:0] := src3[3+4*j:4*j] CASE(token_response[3:0]) OF 0 : dest[31:0] := src1[31:0] 1 : dest[31:0] := tsrc[31:0] 2 : dest[31:0] := QNaN(tsrc[31:0]) 3 : dest[31:0] := QNAN_Indefinite 4 : dest[31:0] := -INF 5 : dest[31:0] := +INF 6 : dest[31:0] := tsrc.sign? -INF : +INF 7 : dest[31:0] := -0 8 : dest[31:0] := +0 9 : dest[31:0] := -1 10: dest[31:0] := +1 11: dest[31:0] := 1/2 12: dest[31:0] := 90.0 13: dest[31:0] := PI/2 14: dest[31:0] := MAX_FLOAT 15: dest[31:0] := -MAX_FLOAT ESAC CASE(tsrc[31:0]) OF ZERO_VALUE_TOKEN: IF (imm8[0]) #ZE; FI ZERO_VALUE_TOKEN: IF (imm8[1]) #IE; FI ONE_VALUE_TOKEN: IF (imm8[2]) #ZE; FI ONE_VALUE_TOKEN: IF (imm8[3]) #IE; FI SNAN_TOKEN: IF (imm8[4]) #IE; FI NEG_INF_TOKEN: IF (imm8[5]) #IE; FI NEG_VALUE_TOKEN: IF (imm8[6]) #IE; FI POS_INF_TOKEN: IF (imm8[7]) #IE; FI ESAC RETURN dest[31:0] } dst[31:0] := FIXUPIMMPD(a[31:0], b[31:0], c[31:0], imm8[7:0]) dst[127:32] := a[127:32] dst[MAX:128] := 0
Floating Point AVX512F Miscellaneous Fix up the lower single-precision (32-bit) floating-point elements in "a" and "b" using the lower 32-bit integer in "c", store the result in the lower element of "dst" using writemask "k" (the element is copied from "a" when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". "imm8" is used to set the required flags reporting. [sae_note] enum TOKEN_TYPE { QNAN_TOKEN := 0, \ SNAN_TOKEN := 1, \ ZERO_VALUE_TOKEN := 2, \ ONE_VALUE_TOKEN := 3, \ NEG_INF_TOKEN := 4, \ POS_INF_TOKEN := 5, \ NEG_VALUE_TOKEN := 6, \ POS_VALUE_TOKEN := 7 } DEFINE FIXUPIMMPD(src1[31:0], src2[31:0], src3[31:0], imm8[7:0]) { tsrc[31:0] := ((src2[30:23] == 0) AND (MXCSR.DAZ == 1)) ? 0.0 : src2[31:0] CASE(tsrc[31:0]) OF QNAN_TOKEN:j := 0 SNAN_TOKEN:j := 1 ZERO_VALUE_TOKEN: j := 2 ONE_VALUE_TOKEN: j := 3 NEG_INF_TOKEN: j := 4 POS_INF_TOKEN: j := 5 NEG_VALUE_TOKEN: j := 6 POS_VALUE_TOKEN: j := 7 ESAC token_response[3:0] := src3[3+4*j:4*j] CASE(token_response[3:0]) OF 0 : dest[31:0] := src1[31:0] 1 : dest[31:0] := tsrc[31:0] 2 : dest[31:0] := QNaN(tsrc[31:0]) 3 : dest[31:0] := QNAN_Indefinite 4 : dest[31:0] := -INF 5 : dest[31:0] := +INF 6 : dest[31:0] := tsrc.sign? -INF : +INF 7 : dest[31:0] := -0 8 : dest[31:0] := +0 9 : dest[31:0] := -1 10: dest[31:0] := +1 11: dest[31:0] := 1/2 12: dest[31:0] := 90.0 13: dest[31:0] := PI/2 14: dest[31:0] := MAX_FLOAT 15: dest[31:0] := -MAX_FLOAT ESAC CASE(tsrc[31:0]) OF ZERO_VALUE_TOKEN: IF (imm8[0]) #ZE; FI ZERO_VALUE_TOKEN: IF (imm8[1]) #IE; FI ONE_VALUE_TOKEN: IF (imm8[2]) #ZE; FI ONE_VALUE_TOKEN: IF (imm8[3]) #IE; FI SNAN_TOKEN: IF (imm8[4]) #IE; FI NEG_INF_TOKEN: IF (imm8[5]) #IE; FI NEG_VALUE_TOKEN: IF (imm8[6]) #IE; FI POS_INF_TOKEN: IF (imm8[7]) #IE; FI ESAC RETURN dest[31:0] } IF k[0] dst[31:0] := FIXUPIMMPD(a[31:0], b[31:0], c[31:0], imm8[7:0]) ELSE dst[31:0] := a[31:0] FI dst[127:32] := a[127:32] dst[MAX:128] := 0
Floating Point AVX512F Miscellaneous Fix up the lower single-precision (32-bit) floating-point elements in "a" and "b" using the lower 32-bit integer in "c", store the result in the lower element of "dst" using writemask "k" (the element is copied from "a" when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". "imm8" is used to set the required flags reporting. enum TOKEN_TYPE { QNAN_TOKEN := 0, \ SNAN_TOKEN := 1, \ ZERO_VALUE_TOKEN := 2, \ ONE_VALUE_TOKEN := 3, \ NEG_INF_TOKEN := 4, \ POS_INF_TOKEN := 5, \ NEG_VALUE_TOKEN := 6, \ POS_VALUE_TOKEN := 7 } DEFINE FIXUPIMMPD(src1[31:0], src2[31:0], src3[31:0], imm8[7:0]) { tsrc[31:0] := ((src2[30:23] == 0) AND (MXCSR.DAZ == 1)) ? 0.0 : src2[31:0] CASE(tsrc[31:0]) OF QNAN_TOKEN:j := 0 SNAN_TOKEN:j := 1 ZERO_VALUE_TOKEN: j := 2 ONE_VALUE_TOKEN: j := 3 NEG_INF_TOKEN: j := 4 POS_INF_TOKEN: j := 5 NEG_VALUE_TOKEN: j := 6 POS_VALUE_TOKEN: j := 7 ESAC token_response[3:0] := src3[3+4*j:4*j] CASE(token_response[3:0]) OF 0 : dest[31:0] := src1[31:0] 1 : dest[31:0] := tsrc[31:0] 2 : dest[31:0] := QNaN(tsrc[31:0]) 3 : dest[31:0] := QNAN_Indefinite 4 : dest[31:0] := -INF 5 : dest[31:0] := +INF 6 : dest[31:0] := tsrc.sign? -INF : +INF 7 : dest[31:0] := -0 8 : dest[31:0] := +0 9 : dest[31:0] := -1 10: dest[31:0] := +1 11: dest[31:0] := 1/2 12: dest[31:0] := 90.0 13: dest[31:0] := PI/2 14: dest[31:0] := MAX_FLOAT 15: dest[31:0] := -MAX_FLOAT ESAC CASE(tsrc[31:0]) OF ZERO_VALUE_TOKEN: IF (imm8[0]) #ZE; FI ZERO_VALUE_TOKEN: IF (imm8[1]) #IE; FI ONE_VALUE_TOKEN: IF (imm8[2]) #ZE; FI ONE_VALUE_TOKEN: IF (imm8[3]) #IE; FI SNAN_TOKEN: IF (imm8[4]) #IE; FI NEG_INF_TOKEN: IF (imm8[5]) #IE; FI NEG_VALUE_TOKEN: IF (imm8[6]) #IE; FI POS_INF_TOKEN: IF (imm8[7]) #IE; FI ESAC RETURN dest[31:0] } IF k[0] dst[31:0] := FIXUPIMMPD(a[31:0], b[31:0], c[31:0], imm8[7:0]) ELSE dst[31:0] := a[31:0] FI dst[127:32] := a[127:32] dst[MAX:128] := 0
Floating Point AVX512F Miscellaneous Fix up the lower single-precision (32-bit) floating-point elements in "a" and "b" using the lower 32-bit integer in "c", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". "imm8" is used to set the required flags reporting. [sae_note] enum TOKEN_TYPE { QNAN_TOKEN := 0, \ SNAN_TOKEN := 1, \ ZERO_VALUE_TOKEN := 2, \ ONE_VALUE_TOKEN := 3, \ NEG_INF_TOKEN := 4, \ POS_INF_TOKEN := 5, \ NEG_VALUE_TOKEN := 6, \ POS_VALUE_TOKEN := 7 } DEFINE FIXUPIMMPD(src1[31:0], src2[31:0], src3[31:0], imm8[7:0]) { tsrc[31:0] := ((src2[30:23] == 0) AND (MXCSR.DAZ == 1)) ? 0.0 : src2[31:0] CASE(tsrc[31:0]) OF QNAN_TOKEN:j := 0 SNAN_TOKEN:j := 1 ZERO_VALUE_TOKEN: j := 2 ONE_VALUE_TOKEN: j := 3 NEG_INF_TOKEN: j := 4 POS_INF_TOKEN: j := 5 NEG_VALUE_TOKEN: j := 6 POS_VALUE_TOKEN: j := 7 ESAC token_response[3:0] := src3[3+4*j:4*j] CASE(token_response[3:0]) OF 0 : dest[31:0] := src1[31:0] 1 : dest[31:0] := tsrc[31:0] 2 : dest[31:0] := QNaN(tsrc[31:0]) 3 : dest[31:0] := QNAN_Indefinite 4 : dest[31:0] := -INF 5 : dest[31:0] := +INF 6 : dest[31:0] := tsrc.sign? -INF : +INF 7 : dest[31:0] := -0 8 : dest[31:0] := +0 9 : dest[31:0] := -1 10: dest[31:0] := +1 11: dest[31:0] := 1/2 12: dest[31:0] := 90.0 13: dest[31:0] := PI/2 14: dest[31:0] := MAX_FLOAT 15: dest[31:0] := -MAX_FLOAT ESAC CASE(tsrc[31:0]) OF ZERO_VALUE_TOKEN: IF (imm8[0]) #ZE; FI ZERO_VALUE_TOKEN: IF (imm8[1]) #IE; FI ONE_VALUE_TOKEN: IF (imm8[2]) #ZE; FI ONE_VALUE_TOKEN: IF (imm8[3]) #IE; FI SNAN_TOKEN: IF (imm8[4]) #IE; FI NEG_INF_TOKEN: IF (imm8[5]) #IE; FI NEG_VALUE_TOKEN: IF (imm8[6]) #IE; FI POS_INF_TOKEN: IF (imm8[7]) #IE; FI ESAC RETURN dest[31:0] } IF k[0] dst[31:0] := FIXUPIMMPD(a[31:0], b[31:0], c[31:0], imm8[7:0]) ELSE dst[31:0] := 0 FI dst[127:32] := a[127:32] dst[MAX:128] := 0
Floating Point AVX512F Miscellaneous Fix up the lower single-precision (32-bit) floating-point elements in "a" and "b" using the lower 32-bit integer in "c", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". "imm8" is used to set the required flags reporting. enum TOKEN_TYPE { QNAN_TOKEN := 0, \ SNAN_TOKEN := 1, \ ZERO_VALUE_TOKEN := 2, \ ONE_VALUE_TOKEN := 3, \ NEG_INF_TOKEN := 4, \ POS_INF_TOKEN := 5, \ NEG_VALUE_TOKEN := 6, \ POS_VALUE_TOKEN := 7 } DEFINE FIXUPIMMPD(src1[31:0], src2[31:0], src3[31:0], imm8[7:0]) { tsrc[31:0] := ((src2[30:23] == 0) AND (MXCSR.DAZ == 1)) ? 0.0 : src2[31:0] CASE(tsrc[31:0]) OF QNAN_TOKEN:j := 0 SNAN_TOKEN:j := 1 ZERO_VALUE_TOKEN: j := 2 ONE_VALUE_TOKEN: j := 3 NEG_INF_TOKEN: j := 4 POS_INF_TOKEN: j := 5 NEG_VALUE_TOKEN: j := 6 POS_VALUE_TOKEN: j := 7 ESAC token_response[3:0] := src3[3+4*j:4*j] CASE(token_response[3:0]) OF 0 : dest[31:0] := src1[31:0] 1 : dest[31:0] := tsrc[31:0] 2 : dest[31:0] := QNaN(tsrc[31:0]) 3 : dest[31:0] := QNAN_Indefinite 4 : dest[31:0] := -INF 5 : dest[31:0] := +INF 6 : dest[31:0] := tsrc.sign? -INF : +INF 7 : dest[31:0] := -0 8 : dest[31:0] := +0 9 : dest[31:0] := -1 10: dest[31:0] := +1 11: dest[31:0] := 1/2 12: dest[31:0] := 90.0 13: dest[31:0] := PI/2 14: dest[31:0] := MAX_FLOAT 15: dest[31:0] := -MAX_FLOAT ESAC CASE(tsrc[31:0]) OF ZERO_VALUE_TOKEN: IF (imm8[0]) #ZE; FI ZERO_VALUE_TOKEN: IF (imm8[1]) #IE; FI ONE_VALUE_TOKEN: IF (imm8[2]) #ZE; FI ONE_VALUE_TOKEN: IF (imm8[3]) #IE; FI SNAN_TOKEN: IF (imm8[4]) #IE; FI NEG_INF_TOKEN: IF (imm8[5]) #IE; FI NEG_VALUE_TOKEN: IF (imm8[6]) #IE; FI POS_INF_TOKEN: IF (imm8[7]) #IE; FI ESAC RETURN dest[31:0] } IF k[0] dst[31:0] := FIXUPIMMPD(a[31:0], b[31:0], c[31:0], imm8[7:0]) ELSE dst[31:0] := 0 FI dst[127:32] := a[127:32] dst[MAX:128] := 0
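The FIXUPIMMPD routine above amounts to a two-step table lookup: classify the (possibly DAZ-flushed) lower element of "b" into one of eight tokens, then use the token to pick a 4-bit response out of the 32-bit table in "c". The sketch below models that on raw 32-bit patterns. The names `classify`, `fixupimm_ss_bits` and `FIXED` are hypothetical, the classification rules (quiet bit is fraction bit 22, only +1.0 counts as ONE_VALUE_TOKEN) and the reading of QNaN(x) as "set the quiet bit" are assumptions, and the imm8 flag reporting is left out.

```python
import math
import struct

# Token numbering follows the TOKEN_TYPE enum in the pseudocode.
QNAN, SNAN, ZERO, ONE, NEG_INF, POS_INF, NEG_VALUE, POS_VALUE = range(8)


def f32_bits(x):
    """Bit pattern of x after rounding to single precision."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def classify(bits):
    """Map a 32-bit pattern to a token (assumed classification rules)."""
    sign, exp, frac = bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF
    if exp == 0xFF and frac:
        return QNAN if frac & 0x400000 else SNAN
    if exp == 0xFF:
        return NEG_INF if sign else POS_INF
    if exp == 0 and frac == 0:
        return ZERO
    if bits == 0x3F800000:  # +1.0
        return ONE
    return NEG_VALUE if sign else POS_VALUE


# Responses that do not depend on the inputs, keyed by token_response.
FIXED = {
    3: 0xFFC00000,                 # QNaN indefinite
    4: f32_bits(float("-inf")),
    5: f32_bits(float("inf")),
    7: f32_bits(-0.0),
    8: f32_bits(0.0),
    9: f32_bits(-1.0),
    10: f32_bits(1.0),
    11: f32_bits(0.5),
    12: f32_bits(90.0),
    13: f32_bits(math.pi / 2),
    14: 0x7F7FFFFF,                # MAX_FLOAT
    15: 0xFF7FFFFF,                # -MAX_FLOAT
}


def fixupimm_ss_bits(a_bits, b_bits, table, daz=False):
    """Lower-element result as a 32-bit pattern; imm8 flags are not modeled."""
    tsrc = b_bits & 0xFFFFFFFF
    if daz and ((tsrc >> 23) & 0xFF) == 0:
        tsrc = 0  # denormal input treated as 0.0
    j = classify(tsrc)
    response = (table >> (4 * j)) & 0xF
    if response == 0:
        return a_bits & 0xFFFFFFFF
    if response == 1:
        return tsrc
    if response == 2:
        return tsrc | 0x00400000  # assumed meaning of QNaN(tsrc)
    if response == 6:
        return FIXED[4] if tsrc >> 31 else FIXED[5]
    return FIXED[response]
```

With the table 0x11111118, for example, a quiet NaN in "b" is replaced by +0 while every other input class passes "b" through unchanged. A table of 0 always returns the lower element of "a".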
Floating Point AVX512F Arithmetic Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Arithmetic Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [round_note] FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Arithmetic Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Arithmetic Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [round_note] FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
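The zeromask loop in the packed multiply-add entries above can be sketched as a scalar model over Python lists. The names `fma` and `maskz_fmadd` are hypothetical. The helper uses exact rational arithmetic so that the product and sum are rounded once, which is one reading of "intermediate result". It handles finite double-precision inputs only and does not model signed zeros, infinities, NaNs, or the rounding-mode override of the [round_note] forms.

```python
from fractions import Fraction


def fma(a, b, c):
    """a*b + c rounded once to double precision (finite inputs only)."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))


def maskz_fmadd(k, a, b, c):
    """Lane j gets a[j]*b[j] + c[j] when bit j of k is set, otherwise 0."""
    return [
        fma(x, y, z) if (k >> j) & 1 else 0.0
        for j, (x, y, z) in enumerate(zip(a, b, c))
    ]
```

For eight double-precision lanes, `maskz_fmadd(0b00000101, a, b, c)` computes lanes 0 and 2 and zeroes the other six.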
Floating Point AVX512F Arithmetic Multiply the lower double-precision (64-bit) floating-point elements in "a" and "b", and add the intermediate result to the lower element in "c". Store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst". [round_note] dst[63:0] := (a[63:0] * b[63:0]) + c[63:0] dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point AVX512F Arithmetic Multiply the lower double-precision (64-bit) floating-point elements in "a" and "b", and add the intermediate result to the lower element in "c". Store the result in the lower element of "dst" using writemask "k" (the element is copied from "c" when mask bit 0 is not set), and copy the upper element from "c" to the upper element of "dst". [round_note] IF k[0] dst[63:0] := (a[63:0] * b[63:0]) + c[63:0] ELSE dst[63:0] := c[63:0] FI dst[127:64] := c[127:64] dst[MAX:128] := 0
Floating Point AVX512F Arithmetic Multiply the lower double-precision (64-bit) floating-point elements in "a" and "b", and add the intermediate result to the lower element in "c". Store the result in the lower element of "dst" using writemask "k" (the element is copied from "c" when mask bit 0 is not set), and copy the upper element from "c" to the upper element of "dst". IF k[0] dst[63:0] := (a[63:0] * b[63:0]) + c[63:0] ELSE dst[63:0] := c[63:0] FI dst[127:64] := c[127:64] dst[MAX:128] := 0
Floating Point AVX512F Arithmetic Multiply the lower double-precision (64-bit) floating-point elements in "a" and "b", and add the intermediate result to the lower element in "c". Store the result in the lower element of "dst" using writemask "k" (the element is copied from "a" when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". [round_note] IF k[0] dst[63:0] := (a[63:0] * b[63:0]) + c[63:0] ELSE dst[63:0] := a[63:0] FI dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point AVX512F Arithmetic Multiply the lower double-precision (64-bit) floating-point elements in "a" and "b", and add the intermediate result to the lower element in "c". Store the result in the lower element of "dst" using writemask "k" (the element is copied from "a" when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". IF k[0] dst[63:0] := (a[63:0] * b[63:0]) + c[63:0] ELSE dst[63:0] := a[63:0] FI dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point AVX512F Arithmetic Multiply the lower double-precision (64-bit) floating-point elements in "a" and "b", and add the intermediate result to the lower element in "c". Store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". [round_note] IF k[0] dst[63:0] := (a[63:0] * b[63:0]) + c[63:0] ELSE dst[63:0] := 0 FI dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point AVX512F Arithmetic Multiply the lower double-precision (64-bit) floating-point elements in "a" and "b", and add the intermediate result to the lower element in "c". Store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". IF k[0] dst[63:0] := (a[63:0] * b[63:0]) + c[63:0] ELSE dst[63:0] := 0 FI dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point AVX512F Arithmetic Multiply the lower single-precision (32-bit) floating-point elements in "a" and "b", and add the intermediate result to the lower element in "c". Store the result in the lower element of "dst" using writemask "k" (the element is copied from "c" when mask bit 0 is not set), and copy the upper 3 packed elements from "c" to the upper elements of "dst". [round_note] IF k[0] dst[31:0] := (a[31:0] * b[31:0]) + c[31:0] ELSE dst[31:0] := c[31:0] FI dst[127:32] := c[127:32] dst[MAX:128] := 0
Floating Point AVX512F Arithmetic Multiply the lower single-precision (32-bit) floating-point elements in "a" and "b", and add the intermediate result to the lower element in "c". Store the result in the lower element of "dst" using writemask "k" (the element is copied from "c" when mask bit 0 is not set), and copy the upper 3 packed elements from "c" to the upper elements of "dst". IF k[0] dst[31:0] := (a[31:0] * b[31:0]) + c[31:0] ELSE dst[31:0] := c[31:0] FI dst[127:32] := c[127:32] dst[MAX:128] := 0
Floating Point AVX512F Arithmetic Multiply the lower single-precision (32-bit) floating-point elements in "a" and "b", and add the intermediate result to the lower element in "c". Store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". [round_note] dst[31:0] := (a[31:0] * b[31:0]) + c[31:0] dst[127:32] := a[127:32] dst[MAX:128] := 0
Floating Point AVX512F Arithmetic Multiply the lower single-precision (32-bit) floating-point elements in "a" and "b", and add the intermediate result to the lower element in "c". Store the result in the lower element of "dst" using writemask "k" (the element is copied from "a" when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". [round_note] IF k[0] dst[31:0] := (a[31:0] * b[31:0]) + c[31:0] ELSE dst[31:0] := a[31:0] FI dst[127:32] := a[127:32] dst[MAX:128] := 0
Floating Point AVX512F Arithmetic Multiply the lower single-precision (32-bit) floating-point elements in "a" and "b", and add the intermediate result to the lower element in "c". Store the result in the lower element of "dst" using writemask "k" (the element is copied from "a" when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". IF k[0] dst[31:0] := (a[31:0] * b[31:0]) + c[31:0] ELSE dst[31:0] := a[31:0] FI dst[127:32] := a[127:32] dst[MAX:128] := 0
Floating Point AVX512F Arithmetic Multiply the lower single-precision (32-bit) floating-point elements in "a" and "b", and add the intermediate result to the lower element in "c". Store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". [round_note] IF k[0] dst[31:0] := (a[31:0] * b[31:0]) + c[31:0] ELSE dst[31:0] := 0 FI dst[127:32] := a[127:32] dst[MAX:128] := 0
Floating Point AVX512F Arithmetic Multiply the lower single-precision (32-bit) floating-point elements in "a" and "b", and add the intermediate result to the lower element in "c". Store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". IF k[0] dst[31:0] := (a[31:0] * b[31:0]) + c[31:0] ELSE dst[31:0] := 0 FI dst[127:32] := a[127:32] dst[MAX:128] := 0
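The scalar multiply-add entries above differ only in where the lower element comes from when mask bit 0 is clear, and in which operand supplies the upper elements. A minimal sketch of the three masked forms, with vectors as Python lists and the lower element at index 0; `scalar_fmadd` and its `fallback` parameter are hypothetical names, and the plain `a*b + c` here rounds twice, so it illustrates only the selection logic, not fused rounding.

```python
def scalar_fmadd(a, b, c, k=1, fallback="a"):
    """Model of the masked scalar multiply-add forms.

    fallback selects the masking flavor:
      "a"    writemask, lower element and upper elements come from a
      "c"    writemask, lower element and upper elements come from c
      "zero" zeromask, lower element is zeroed, upper elements come from a
    """
    upper_src = c if fallback == "c" else a
    if k & 1:
        low = a[0] * b[0] + c[0]
    elif fallback == "zero":
        low = 0.0
    else:
        low = upper_src[0]
    return [low] + list(upper_src[1:])
```

Calling it with `k=1` gives the same lower element in all three flavors. Only the upper elements and the masked-off case change.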
Floating Point AVX512F Arithmetic Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", alternately add and subtract packed elements in "c" to/from the intermediate result, and store the results in "dst". FOR j := 0 to 7 i := j*64 IF ((j & 1) == 0) dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i] ELSE dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Arithmetic Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", alternately add and subtract packed elements in "c" to/from the intermediate result, and store the results in "dst". [round_note] FOR j := 0 to 7 i := j*64 IF ((j & 1) == 0) dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i] ELSE dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Arithmetic Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", alternately add and subtract packed elements in "c" to/from the intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] IF ((j & 1) == 0) dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i] ELSE dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i] FI ELSE dst[i+63:i] := c[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Arithmetic Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", alternately add and subtract packed elements in "c" to/from the intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set). [round_note] FOR j := 0 to 7 i := j*64 IF k[j] IF ((j & 1) == 0) dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i] ELSE dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i] FI ELSE dst[i+63:i] := c[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Arithmetic Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", alternately add and subtract packed elements in "c" to/from the intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] IF ((j & 1) == 0) dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i] ELSE dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i] FI ELSE dst[i+63:i] := a[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Arithmetic Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", alternately add and subtract packed elements in "c" to/from the intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). [round_note] FOR j := 0 to 7 i := j*64 IF k[j] IF ((j & 1) == 0) dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i] ELSE dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i] FI ELSE dst[i+63:i] := a[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Arithmetic Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", alternately add and subtract packed elements in "c" to/from the intermediate result, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] IF ((j & 1) == 0) dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i] ELSE dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i] FI ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Arithmetic Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", alternately add and subtract packed elements in "c" to/from the intermediate result, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [round_note] FOR j := 0 to 7 i := j*64 IF k[j] IF ((j & 1) == 0) dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i] ELSE dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i] FI ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Arithmetic Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", alternately add and subtract packed elements in "c" to/from the intermediate result, and store the results in "dst". FOR j := 0 to 15 i := j*32 IF ((j & 1) == 0) dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i] ELSE dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Arithmetic Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", alternately add and subtract packed elements in "c" to/from the intermediate result, and store the results in "dst". [round_note] FOR j := 0 to 15 i := j*32 IF ((j & 1) == 0) dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i] ELSE dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Arithmetic Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", alternately add and subtract packed elements in "c" to/from the intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] IF ((j & 1) == 0) dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i] ELSE dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i] FI ELSE dst[i+31:i] := c[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Arithmetic Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", alternately add and subtract packed elements in "c" to/from the intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set). [round_note] FOR j := 0 to 15 i := j*32 IF k[j] IF ((j & 1) == 0) dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i] ELSE dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i] FI ELSE dst[i+31:i] := c[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Arithmetic Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", alternately add and subtract packed elements in "c" to/from the intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] IF ((j & 1) == 0) dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i] ELSE dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i] FI ELSE dst[i+31:i] := a[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Arithmetic Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", alternately add and subtract packed elements in "c" to/from the intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). [round_note] FOR j := 0 to 15 i := j*32 IF k[j] IF ((j & 1) == 0) dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i] ELSE dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i] FI ELSE dst[i+31:i] := a[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Arithmetic Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", alternately add and subtract packed elements in "c" to/from the intermediate result, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] IF ((j & 1) == 0) dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i] ELSE dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i] FI ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Arithmetic Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", alternately add and subtract packed elements in "c" to/from the intermediate result, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [round_note] FOR j := 0 to 15 i := j*32 IF k[j] IF ((j & 1) == 0) dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i] ELSE dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i] FI ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
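In the add/subtract entries above, the lane parity decides the sign: even-indexed lanes compute a*b - c and odd-indexed lanes compute a*b + c. A scalar sketch of that loop, including the three masking flavors; `fmaddsub`, `k` and `fallback` are hypothetical names, and the plain Python arithmetic rounds the product before adding, so fused rounding is not modeled.

```python
def fmaddsub(a, b, c, k=None, fallback=None):
    """Even lanes: a*b - c. Odd lanes: a*b + c.

    k is an optional bit mask. For lanes whose mask bit is clear, the value
    is copied from fallback (pass a or c for the writemask forms) or set to
    0.0 when fallback is None (the zeromask form).
    """
    out = []
    for j, (x, y, z) in enumerate(zip(a, b, c)):
        if k is not None and not (k >> j) & 1:
            out.append(fallback[j] if fallback is not None else 0.0)
        elif (j & 1) == 0:
            out.append(x * y - z)
        else:
            out.append(x * y + z)
    return out
```

With a = [1, 2, 3, 4], b = [10, 10, 10, 10] and c = [1, 1, 1, 1], the unmasked call yields [9, 21, 29, 41].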
Floating Point AVX512F Arithmetic Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Arithmetic Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [round_note] FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Arithmetic Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Arithmetic Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [round_note] FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Arithmetic Multiply the lower double-precision (64-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the intermediate result. Store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst". [round_note] dst[63:0] := (a[63:0] * b[63:0]) - c[63:0] dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point AVX512F Arithmetic Multiply the lower double-precision (64-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the intermediate result. Store the result in the lower element of "dst" using writemask "k" (the element is copied from "c" when mask bit 0 is not set), and copy the upper element from "c" to the upper element of "dst". [round_note] IF k[0] dst[63:0] := (a[63:0] * b[63:0]) - c[63:0] ELSE dst[63:0] := c[63:0] FI dst[127:64] := c[127:64] dst[MAX:128] := 0
Floating Point AVX512F Arithmetic Multiply the lower double-precision (64-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the intermediate result. Store the result in the lower element of "dst" using writemask "k" (the element is copied from "c" when mask bit 0 is not set), and copy the upper element from "c" to the upper element of "dst". IF k[0] dst[63:0] := (a[63:0] * b[63:0]) - c[63:0] ELSE dst[63:0] := c[63:0] FI dst[127:64] := c[127:64] dst[MAX:128] := 0
Floating Point AVX512F Arithmetic Multiply the lower double-precision (64-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the intermediate result. Store the result in the lower element of "dst" using writemask "k" (the element is copied from "a" when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". [round_note] IF k[0] dst[63:0] := (a[63:0] * b[63:0]) - c[63:0] ELSE dst[63:0] := a[63:0] FI dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point AVX512F Arithmetic Multiply the lower double-precision (64-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the intermediate result. Store the result in the lower element of "dst" using writemask "k" (the element is copied from "a" when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". IF k[0] dst[63:0] := (a[63:0] * b[63:0]) - c[63:0] ELSE dst[63:0] := a[63:0] FI dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point AVX512F Arithmetic Multiply the lower double-precision (64-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the intermediate result. Store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". [round_note] IF k[0] dst[63:0] := (a[63:0] * b[63:0]) - c[63:0] ELSE dst[63:0] := 0 FI dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point AVX512F Arithmetic Multiply the lower double-precision (64-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the intermediate result. Store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". IF k[0] dst[63:0] := (a[63:0] * b[63:0]) - c[63:0] ELSE dst[63:0] := 0 FI dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point AVX512F Arithmetic Multiply the lower single-precision (32-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the intermediate result. Store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". [round_note] dst[31:0] := (a[31:0] * b[31:0]) - c[31:0] dst[127:32] := a[127:32] dst[MAX:128] := 0
Floating Point AVX512F Arithmetic Multiply the lower single-precision (32-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the intermediate result. Store the result in the lower element of "dst" using writemask "k" (the element is copied from "c" when mask bit 0 is not set), and copy the upper 3 packed elements from "c" to the upper elements of "dst". [round_note] IF k[0] dst[31:0] := (a[31:0] * b[31:0]) - c[31:0] ELSE dst[31:0] := c[31:0] FI dst[127:32] := c[127:32] dst[MAX:128] := 0
Floating Point AVX512F Arithmetic Multiply the lower single-precision (32-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the intermediate result. Store the result in the lower element of "dst" using writemask "k" (the element is copied from "c" when mask bit 0 is not set), and copy the upper 3 packed elements from "c" to the upper elements of "dst". IF k[0] dst[31:0] := (a[31:0] * b[31:0]) - c[31:0] ELSE dst[31:0] := c[31:0] FI dst[127:32] := c[127:32] dst[MAX:128] := 0
Floating Point AVX512F Arithmetic Multiply the lower single-precision (32-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the intermediate result. Store the result in the lower element of "dst" using writemask "k" (the element is copied from "a" when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". [round_note] IF k[0] dst[31:0] := (a[31:0] * b[31:0]) - c[31:0] ELSE dst[31:0] := a[31:0] FI dst[127:32] := a[127:32] dst[MAX:128] := 0
Floating Point AVX512F Arithmetic Multiply the lower single-precision (32-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the intermediate result. Store the result in the lower element of "dst" using writemask "k" (the element is copied from "a" when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". IF k[0] dst[31:0] := (a[31:0] * b[31:0]) - c[31:0] ELSE dst[31:0] := a[31:0] FI dst[127:32] := a[127:32] dst[MAX:128] := 0
Floating Point AVX512F Arithmetic Multiply the lower single-precision (32-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the intermediate result. Store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". [round_note] IF k[0] dst[31:0] := (a[31:0] * b[31:0]) - c[31:0] ELSE dst[31:0] := 0 FI dst[127:32] := a[127:32] dst[MAX:128] := 0
Floating Point AVX512F Arithmetic Multiply the lower single-precision (32-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the intermediate result. Store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". IF k[0] dst[31:0] := (a[31:0] * b[31:0]) - c[31:0] ELSE dst[31:0] := 0 FI dst[127:32] := a[127:32] dst[MAX:128] := 0
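The six lower-element multiply-subtract entries above differ only in what fills lane 0 when mask bit 0 is clear and in which operand supplies the upper three lanes. The scalar sketch below models that selection logic in portable C. The `vec4f` type, the `fill_mode` enum, and the function name are hypothetical and introduced for illustration only. The sketch computes `a*b - c` as two separate operations, so it may round twice where a fused instruction would round once.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 4-lane stand-in for a 128-bit vector of floats. */
typedef struct { float lane[4]; } vec4f;

/* What fills lane 0 when mask bit 0 is clear (illustrative names). */
typedef enum { FILL_FROM_A, FILL_FROM_C, FILL_ZERO } fill_mode;

/* Scalar model of the masked lower-element multiply-subtract.
   Assumption: unfused arithmetic, so rounding can differ from hardware. */
static vec4f model_fmsub_ss(vec4f a, vec4f b, vec4f c, uint8_t k, fill_mode mode)
{
    assert(mode == FILL_FROM_A || mode == FILL_FROM_C || mode == FILL_ZERO);

    /* Upper lanes pass through from c in the "copied from c" form, else from a.
       This also pre-loads lane 0 with the pass-through value. */
    vec4f dst = (mode == FILL_FROM_C) ? c : a;

    if (k & 1u) {
        dst.lane[0] = a.lane[0] * b.lane[0] - c.lane[0];
    } else if (mode == FILL_ZERO) {
        dst.lane[0] = 0.0f;
    }
    return dst;
}
```

With `a[0] = 2`, `b[0] = 3`, `c[0] = 1`, a set mask bit yields 5 in lane 0 under every mode; a clear mask bit yields `a[0]`, `c[0]`, or 0 depending on the mode.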
Floating Point AVX512F Arithmetic Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", alternately subtract and add packed elements in "c" from/to the intermediate result, and store the results in "dst". FOR j := 0 to 7 i := j*64 IF ((j & 1) == 0) dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i] ELSE dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Arithmetic Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", alternately subtract and add packed elements in "c" from/to the intermediate result, and store the results in "dst". [round_note] FOR j := 0 to 7 i := j*64 IF ((j & 1) == 0) dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i] ELSE dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Arithmetic Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", alternately subtract and add packed elements in "c" from/to the intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] IF ((j & 1) == 0) dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i] ELSE dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i] FI ELSE dst[i+63:i] := c[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Arithmetic Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", alternately subtract and add packed elements in "c" from/to the intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set). [round_note] FOR j := 0 to 7 i := j*64 IF k[j] IF ((j & 1) == 0) dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i] ELSE dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i] FI ELSE dst[i+63:i] := c[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Arithmetic Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", alternately subtract and add packed elements in "c" from/to the intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] IF ((j & 1) == 0) dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i] ELSE dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i] FI ELSE dst[i+63:i] := a[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Arithmetic Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", alternately subtract and add packed elements in "c" from/to the intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). [round_note] FOR j := 0 to 7 i := j*64 IF k[j] IF ((j & 1) == 0) dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i] ELSE dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i] FI ELSE dst[i+63:i] := a[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Arithmetic Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", alternately subtract and add packed elements in "c" from/to the intermediate result, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] IF ((j & 1) == 0) dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i] ELSE dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i] FI ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Arithmetic Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", alternately subtract and add packed elements in "c" from/to the intermediate result, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [round_note] FOR j := 0 to 7 i := j*64 IF k[j] IF ((j & 1) == 0) dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i] ELSE dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i] FI ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Arithmetic Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", alternately subtract and add packed elements in "c" from/to the intermediate result, and store the results in "dst". FOR j := 0 to 15 i := j*32 IF ((j & 1) == 0) dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i] ELSE dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Arithmetic Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", alternately subtract and add packed elements in "c" from/to the intermediate result, and store the results in "dst". [round_note] FOR j := 0 to 15 i := j*32 IF ((j & 1) == 0) dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i] ELSE dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Arithmetic Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", alternately subtract and add packed elements in "c" from/to the intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] IF ((j & 1) == 0) dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i] ELSE dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i] FI ELSE dst[i+31:i] := c[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Arithmetic Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", alternately subtract and add packed elements in "c" from/to the intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set). [round_note] FOR j := 0 to 15 i := j*32 IF k[j] IF ((j & 1) == 0) dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i] ELSE dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i] FI ELSE dst[i+31:i] := c[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Arithmetic Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", alternately subtract and add packed elements in "c" from/to the intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] IF ((j & 1) == 0) dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i] ELSE dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i] FI ELSE dst[i+31:i] := a[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Arithmetic Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", alternately subtract and add packed elements in "c" from/to the intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). [round_note] FOR j := 0 to 15 i := j*32 IF k[j] IF ((j & 1) == 0) dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i] ELSE dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i] FI ELSE dst[i+31:i] := a[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Arithmetic Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", alternately subtract and add packed elements in "c" from/to the intermediate result, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] IF ((j & 1) == 0) dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i] ELSE dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i] FI ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Arithmetic Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", alternately subtract and add packed elements in "c" from/to the intermediate result, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [round_note] FOR j := 0 to 15 i := j*32 IF k[j] IF ((j & 1) == 0) dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i] ELSE dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i] FI ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
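The packed entries above add "c" in even-numbered lanes and subtract it in odd-numbered lanes, with the mask deciding which lanes are written at all. A minimal scalar sketch of that loop follows, assuming plain arrays in place of vector registers; the function name and signature are hypothetical. Lanes with a clear mask bit are left untouched, so the caller pre-fills `dst` with "a", "c", or zeros to imitate the three masking forms. The arithmetic is unfused and may round differently from the hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model of the alternating pattern from the pseudocode:
   even lanes compute a*b + c, odd lanes compute a*b - c.
   Lanes whose mask bit is clear keep the value already in dst. */
static void model_fmsubadd(const double *a, const double *b, const double *c,
                           double *dst, size_t n, uint16_t k)
{
    assert(n <= 16);  /* a 16-bit mask can describe at most 16 lanes */
    for (size_t j = 0; j < n; j++) {
        if (!((k >> j) & 1u)) {
            continue;  /* masked-off lane: leave the pre-filled value */
        }
        double prod = a[j] * b[j];
        dst[j] = ((j & 1u) == 0) ? prod + c[j] : prod - c[j];
    }
}
```

Passing `k = 0xFFFF` reproduces the unmasked entries; pre-filling `dst` with zeros reproduces the zeromask entries.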
Floating Point AVX512F Arithmetic Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := -(a[i+63:i] * b[i+63:i]) + c[i+63:i] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Arithmetic Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [round_note] FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := -(a[i+63:i] * b[i+63:i]) + c[i+63:i] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Arithmetic Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := -(a[i+31:i] * b[i+31:i]) + c[i+31:i] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Arithmetic Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [round_note] FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := -(a[i+31:i] * b[i+31:i]) + c[i+31:i] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Arithmetic Multiply the lower double-precision (64-bit) floating-point elements in "a" and "b", and add the negated intermediate result to the lower element in "c". Store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst". [round_note] dst[63:0] := -(a[63:0] * b[63:0]) + c[63:0] dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point AVX512F Arithmetic Multiply the lower double-precision (64-bit) floating-point elements in "a" and "b", and add the negated intermediate result to the lower element in "c". Store the result in the lower element of "dst" using writemask "k" (the element is copied from "c" when mask bit 0 is not set), and copy the upper element from "c" to the upper element of "dst". [round_note] IF k[0] dst[63:0] := -(a[63:0] * b[63:0]) + c[63:0] ELSE dst[63:0] := c[63:0] FI dst[127:64] := c[127:64] dst[MAX:128] := 0
Floating Point AVX512F Arithmetic Multiply the lower double-precision (64-bit) floating-point elements in "a" and "b", and add the negated intermediate result to the lower element in "c". Store the result in the lower element of "dst" using writemask "k" (the element is copied from "c" when mask bit 0 is not set), and copy the upper element from "c" to the upper element of "dst". IF k[0] dst[63:0] := -(a[63:0] * b[63:0]) + c[63:0] ELSE dst[63:0] := c[63:0] FI dst[127:64] := c[127:64] dst[MAX:128] := 0
Floating Point AVX512F Arithmetic Multiply the lower double-precision (64-bit) floating-point elements in "a" and "b", and add the negated intermediate result to the lower element in "c". Store the result in the lower element of "dst" using writemask "k" (the element is copied from "a" when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". [round_note] IF k[0] dst[63:0] := -(a[63:0] * b[63:0]) + c[63:0] ELSE dst[63:0] := a[63:0] FI dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point AVX512F Arithmetic Multiply the lower double-precision (64-bit) floating-point elements in "a" and "b", and add the negated intermediate result to the lower element in "c". Store the result in the lower element of "dst" using writemask "k" (the element is copied from "a" when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". IF k[0] dst[63:0] := -(a[63:0] * b[63:0]) + c[63:0] ELSE dst[63:0] := a[63:0] FI dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point AVX512F Arithmetic Multiply the lower double-precision (64-bit) floating-point elements in "a" and "b", and add the negated intermediate result to the lower element in "c". Store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". [round_note] IF k[0] dst[63:0] := -(a[63:0] * b[63:0]) + c[63:0] ELSE dst[63:0] := 0 FI dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point AVX512F Arithmetic Multiply the lower double-precision (64-bit) floating-point elements in "a" and "b", and add the negated intermediate result to the lower element in "c". Store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". IF k[0] dst[63:0] := -(a[63:0] * b[63:0]) + c[63:0] ELSE dst[63:0] := 0 FI dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point AVX512F Arithmetic Multiply the lower single-precision (32-bit) floating-point elements in "a" and "b", and add the negated intermediate result to the lower element in "c". Store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". [round_note] dst[31:0] := -(a[31:0] * b[31:0]) + c[31:0] dst[127:32] := a[127:32] dst[MAX:128] := 0
Floating Point AVX512F Arithmetic Multiply the lower single-precision (32-bit) floating-point elements in "a" and "b", and add the negated intermediate result to the lower element in "c". Store the result in the lower element of "dst" using writemask "k" (the element is copied from "c" when mask bit 0 is not set), and copy the upper 3 packed elements from "c" to the upper elements of "dst". [round_note] IF k[0] dst[31:0] := -(a[31:0] * b[31:0]) + c[31:0] ELSE dst[31:0] := c[31:0] FI dst[127:32] := c[127:32] dst[MAX:128] := 0
Floating Point AVX512F Arithmetic Multiply the lower single-precision (32-bit) floating-point elements in "a" and "b", and add the negated intermediate result to the lower element in "c". Store the result in the lower element of "dst" using writemask "k" (the element is copied from "c" when mask bit 0 is not set), and copy the upper 3 packed elements from "c" to the upper elements of "dst". IF k[0] dst[31:0] := -(a[31:0] * b[31:0]) + c[31:0] ELSE dst[31:0] := c[31:0] FI dst[127:32] := c[127:32] dst[MAX:128] := 0
Floating Point AVX512F Arithmetic Multiply the lower single-precision (32-bit) floating-point elements in "a" and "b", and add the negated intermediate result to the lower element in "c". Store the result in the lower element of "dst" using writemask "k" (the element is copied from "a" when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". [round_note] IF k[0] dst[31:0] := -(a[31:0] * b[31:0]) + c[31:0] ELSE dst[31:0] := a[31:0] FI dst[127:32] := a[127:32] dst[MAX:128] := 0
Floating Point AVX512F Arithmetic Multiply the lower single-precision (32-bit) floating-point elements in "a" and "b", and add the negated intermediate result to the lower element in "c". Store the result in the lower element of "dst" using writemask "k" (the element is copied from "a" when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". IF k[0] dst[31:0] := -(a[31:0] * b[31:0]) + c[31:0] ELSE dst[31:0] := a[31:0] FI dst[127:32] := a[127:32] dst[MAX:128] := 0
Floating Point AVX512F Arithmetic Multiply the lower single-precision (32-bit) floating-point elements in "a" and "b", and add the negated intermediate result to the lower element in "c". Store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". [round_note] IF k[0] dst[31:0] := -(a[31:0] * b[31:0]) + c[31:0] ELSE dst[31:0] := 0 FI dst[127:32] := a[127:32] dst[MAX:128] := 0
Floating Point AVX512F Arithmetic Multiply the lower single-precision (32-bit) floating-point elements in "a" and "b", and add the negated intermediate result to the lower element in "c". Store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". IF k[0] dst[31:0] := -(a[31:0] * b[31:0]) + c[31:0] ELSE dst[31:0] := 0 FI dst[127:32] := a[127:32] dst[MAX:128] := 0
Floating Point AVX512F Arithmetic Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := -(a[i+63:i] * b[i+63:i]) - c[i+63:i] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Arithmetic Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [round_note] FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := -(a[i+63:i] * b[i+63:i]) - c[i+63:i] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Arithmetic Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := -(a[i+31:i] * b[i+31:i]) - c[i+31:i] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Arithmetic Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [round_note] FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := -(a[i+31:i] * b[i+31:i]) - c[i+31:i] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Arithmetic Multiply the lower double-precision (64-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the negated intermediate result. Store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst". [round_note] dst[63:0] := -(a[63:0] * b[63:0]) - c[63:0] dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point AVX512F Arithmetic Multiply the lower double-precision (64-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the negated intermediate result. Store the result in the lower element of "dst" using writemask "k" (the element is copied from "c" when mask bit 0 is not set), and copy the upper element from "c" to the upper element of "dst". [round_note] IF k[0] dst[63:0] := -(a[63:0] * b[63:0]) - c[63:0] ELSE dst[63:0] := c[63:0] FI dst[127:64] := c[127:64] dst[MAX:128] := 0
Floating Point AVX512F Arithmetic Multiply the lower double-precision (64-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the negated intermediate result. Store the result in the lower element of "dst" using writemask "k" (the element is copied from "c" when mask bit 0 is not set), and copy the upper element from "c" to the upper element of "dst". IF k[0] dst[63:0] := -(a[63:0] * b[63:0]) - c[63:0] ELSE dst[63:0] := c[63:0] FI dst[127:64] := c[127:64] dst[MAX:128] := 0
Floating Point AVX512F Arithmetic Multiply the lower double-precision (64-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the negated intermediate result. Store the result in the lower element of "dst" using writemask "k" (the element is copied from "a" when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". [round_note] IF k[0] dst[63:0] := -(a[63:0] * b[63:0]) - c[63:0] ELSE dst[63:0] := a[63:0] FI dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point AVX512F Arithmetic Multiply the lower double-precision (64-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the negated intermediate result. Store the result in the lower element of "dst" using writemask "k" (the element is copied from "a" when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". IF k[0] dst[63:0] := -(a[63:0] * b[63:0]) - c[63:0] ELSE dst[63:0] := a[63:0] FI dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point AVX512F Arithmetic Multiply the lower double-precision (64-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the negated intermediate result. Store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". [round_note] IF k[0] dst[63:0] := -(a[63:0] * b[63:0]) - c[63:0] ELSE dst[63:0] := 0 FI dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point AVX512F Arithmetic Multiply the lower double-precision (64-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the negated intermediate result. Store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". IF k[0] dst[63:0] := -(a[63:0] * b[63:0]) - c[63:0] ELSE dst[63:0] := 0 FI dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point AVX512F Arithmetic Multiply the lower single-precision (32-bit) floating-point elements in "a" and "b", subtract the lower element in "c" from the negated intermediate result, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". [round_note] dst[31:0] := -(a[31:0] * b[31:0]) - c[31:0] dst[127:32] := a[127:32] dst[MAX:128] := 0
Floating Point AVX512F Arithmetic Multiply the lower single-precision (32-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the negated intermediate result. Store the result in the lower element of "dst" using writemask "k" (the element is copied from "c" when mask bit 0 is not set), and copy the upper 3 packed elements from "c" to the upper elements of "dst". [round_note] IF k[0] dst[31:0] := -(a[31:0] * b[31:0]) - c[31:0] ELSE dst[31:0] := c[31:0] FI dst[127:32] := c[127:32] dst[MAX:128] := 0
Floating Point AVX512F Arithmetic Multiply the lower single-precision (32-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the negated intermediate result. Store the result in the lower element of "dst" using writemask "k" (the element is copied from "c" when mask bit 0 is not set), and copy the upper 3 packed elements from "c" to the upper elements of "dst". IF k[0] dst[31:0] := -(a[31:0] * b[31:0]) - c[31:0] ELSE dst[31:0] := c[31:0] FI dst[127:32] := c[127:32] dst[MAX:128] := 0
Floating Point AVX512F Arithmetic Multiply the lower single-precision (32-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the negated intermediate result. Store the result in the lower element of "dst" using writemask "k" (the element is copied from "a" when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". [round_note] IF k[0] dst[31:0] := -(a[31:0] * b[31:0]) - c[31:0] ELSE dst[31:0] := a[31:0] FI dst[127:32] := a[127:32] dst[MAX:128] := 0
Floating Point AVX512F Arithmetic Multiply the lower single-precision (32-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the negated intermediate result. Store the result in the lower element of "dst" using writemask "k" (the element is copied from "a" when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". IF k[0] dst[31:0] := -(a[31:0] * b[31:0]) - c[31:0] ELSE dst[31:0] := a[31:0] FI dst[127:32] := a[127:32] dst[MAX:128] := 0
Floating Point AVX512F Arithmetic Multiply the lower single-precision (32-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the negated intermediate result. Store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". [round_note] IF k[0] dst[31:0] := -(a[31:0] * b[31:0]) - c[31:0] ELSE dst[31:0] := 0 FI dst[127:32] := a[127:32] dst[MAX:128] := 0
Floating Point AVX512F Arithmetic Multiply the lower single-precision (32-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the negated intermediate result. Store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". IF k[0] dst[31:0] := -(a[31:0] * b[31:0]) - c[31:0] ELSE dst[31:0] := 0 FI dst[127:32] := a[127:32] dst[MAX:128] := 0
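The two negated forms listed above are easy to confuse: "add the negated intermediate result to c" is `-(a*b) + c`, which equals `c - a*b`, while "subtract c from the negated intermediate result" is `-(a*b) - c`, which equals `-(a*b + c)`. The sketch below states both as scalar C helpers and adds the bit-0 selection used by the lower-element entries. All three function names are hypothetical, and the arithmetic is unfused, so rounding may differ from a fused instruction.

```c
#include <assert.h>

/* Negated multiply-add: -(a*b) + c, i.e. c - a*b. */
static double model_fnmadd(double a, double b, double c)
{
    return -(a * b) + c;
}

/* Negated multiply-subtract: -(a*b) - c, i.e. -(a*b + c). */
static double model_fnmsub(double a, double b, double c)
{
    return -(a * b) - c;
}

/* Lower-lane masking as in the scalar entries: `result` is used when bit 0
   of k is set; otherwise `fallback` (a[0], c[0], or 0, depending on the form). */
static double model_masked_lower(unsigned k, double result, double fallback)
{
    assert(k <= 0xFFu);  /* the mask is modeled as an 8-bit value */
    return (k & 1u) ? result : fallback;
}
```

For `a = 2`, `b = 3`, `c = 10`, the first helper gives 4 and the second gives -16.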
Floating Point AVX512F Load Gather double-precision (64-bit) floating-point elements from memory using 32-bit indices. 64-bit elements are loaded from addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst". "scale" should be 1, 2, 4 or 8. FOR j := 0 to 7 i := j*64 m := j*32 addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8 dst[i+63:i] := MEM[addr+63:addr] ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Load Gather double-precision (64-bit) floating-point elements from memory using 32-bit indices. 64-bit elements are loaded from addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8. FOR j := 0 to 7 i := j*64 m := j*32 IF k[j] addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8 dst[i+63:i] := MEM[addr+63:addr] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Load Gather double-precision (64-bit) floating-point elements from memory using 64-bit indices. 64-bit elements are loaded from addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst". "scale" should be 1, 2, 4 or 8. FOR j := 0 to 7 i := j*64 m := j*64 addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8 dst[i+63:i] := MEM[addr+63:addr] ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Load Gather double-precision (64-bit) floating-point elements from memory using 64-bit indices. 64-bit elements are loaded from addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8. FOR j := 0 to 7 i := j*64 m := j*64 IF k[j] addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8 dst[i+63:i] := MEM[addr+63:addr] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Load Gather single-precision (32-bit) floating-point elements from memory using 64-bit indices. 32-bit elements are loaded from addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst". "scale" should be 1, 2, 4 or 8. FOR j := 0 to 7 i := j*32 m := j*64 addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8 dst[i+31:i] := MEM[addr+31:addr] ENDFOR dst[MAX:256] := 0
Floating Point AVX512F Load Gather single-precision (32-bit) floating-point elements from memory using 64-bit indices. 32-bit elements are loaded from addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8. FOR j := 0 to 7 i := j*32 m := j*64 IF k[j] addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8 dst[i+31:i] := MEM[addr+31:addr] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:256] := 0
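The gather pseudocode above indexes "MEM" in bits, which appears to be why the scaled index carries a trailing "* 8": the index times "scale" is a byte offset, and the factor of 8 converts it to a bit offset. The sketch below models the masked 32-bit-index, 64-bit-element form in byte terms, under that reading. The function name and array-based signature are hypothetical; `memcpy` is used so the model does not depend on the alignment of the computed address.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Scalar model of a masked gather of eight doubles using 32-bit indices.
   Assumption: index * scale is a byte offset from base_addr. */
static void model_gather_pd(double dst[8], const double src[8], uint8_t k,
                            const int32_t vindex[8], const void *base_addr,
                            int scale)
{
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    const unsigned char *base = (const unsigned char *)base_addr;

    for (int j = 0; j < 8; j++) {
        if ((k >> j) & 1u) {
            /* Sign-extend the 32-bit index, then scale it to a byte offset. */
            int64_t offset = (int64_t)vindex[j] * scale;
            memcpy(&dst[j], base + offset, sizeof(double));
        } else {
            dst[j] = src[j];  /* masked-off lane: pass through src */
        }
    }
}
```

With `scale = 8` the indices count whole doubles; because indices are sign-extended, a negative index reads from below `base_addr`.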
Floating Point AVX512F Miscellaneous Convert the exponent of each packed double-precision (64-bit) floating-point element in "a" to a double-precision (64-bit) floating-point number representing the integer exponent, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates "floor(log2(x))" for each element. FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := ConvertExpFP64(a[i+63:i]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Miscellaneous Convert the exponent of each packed double-precision (64-bit) floating-point element in "a" to a double-precision (64-bit) floating-point number representing the integer exponent, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates "floor(log2(x))" for each element. [sae_note] FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := ConvertExpFP64(a[i+63:i]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Miscellaneous Convert the exponent of each packed single-precision (32-bit) floating-point element in "a" to a single-precision (32-bit) floating-point number representing the integer exponent, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates "floor(log2(x))" for each element. FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := ConvertExpFP32(a[i+31:i]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Miscellaneous Convert the exponent of each packed single-precision (32-bit) floating-point element in "a" to a single-precision (32-bit) floating-point number representing the integer exponent, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates "floor(log2(x))" for each element. [sae_note] FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := ConvertExpFP32(a[i+31:i]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
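As a rough illustration of the "floor(log2(x))" description, the zeromasked exponent extraction can be modeled per element as below. The helper names are hypothetical, the model applies the logarithm to the magnitude of the input, and the handling of zero, infinity and NaN is an assumption about `ConvertExpFP32`/`ConvertExpFP64` that the entries above do not spell out.

```python
import math

def getexp(x):
    """Scalar model of the exponent extraction for one element."""
    if math.isnan(x):
        return math.nan
    if math.isinf(x):
        return math.inf       # assumed: +inf for an infinite input
    if x == 0.0:
        return -math.inf      # assumed: zero maps to -inf
    # frexp gives |x| = m * 2**e with 0.5 <= m < 1, so floor(log2(|x|)) = e - 1.
    _, e = math.frexp(abs(x))
    return float(e - 1)

def maskz_getexp(k, a):
    """Zeromask: lanes whose mask bit is clear are set to 0."""
    return [getexp(x) if (k >> j) & 1 else 0.0 for j, x in enumerate(a)]

exps = maskz_getexp(0b0111, [8.0, 0.75, -10.0, 1024.0])
```

The result is returned as a floating-point value, matching the description that the exponent is stored as a floating-point number of the same width as the input.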
Floating Point AVX512F Miscellaneous Convert the exponent of the lower double-precision (64-bit) floating-point element in "b" to a double-precision (64-bit) floating-point number representing the integer exponent, store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst". This intrinsic essentially calculates "floor(log2(x))" for the lower element. [sae_note] dst[63:0] := ConvertExpFP64(b[63:0]) dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point AVX512F Miscellaneous Convert the exponent of the lower double-precision (64-bit) floating-point element in "b" to a double-precision (64-bit) floating-point number representing the integer exponent, store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst". This intrinsic essentially calculates "floor(log2(x))" for the lower element. dst[63:0] := ConvertExpFP64(b[63:0]) dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point AVX512F Miscellaneous Convert the exponent of the lower double-precision (64-bit) floating-point element in "b" to a double-precision (64-bit) floating-point number representing the integer exponent, store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". This intrinsic essentially calculates "floor(log2(x))" for the lower element. [sae_note] IF k[0] dst[63:0] := ConvertExpFP64(b[63:0]) ELSE dst[63:0] := src[63:0] FI dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point AVX512F Miscellaneous Convert the exponent of the lower double-precision (64-bit) floating-point element in "b" to a double-precision (64-bit) floating-point number representing the integer exponent, store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". This intrinsic essentially calculates "floor(log2(x))" for the lower element. IF k[0] dst[63:0] := ConvertExpFP64(b[63:0]) ELSE dst[63:0] := src[63:0] FI dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point AVX512F Miscellaneous Convert the exponent of the lower double-precision (64-bit) floating-point element in "b" to a double-precision (64-bit) floating-point number representing the integer exponent, store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". This intrinsic essentially calculates "floor(log2(x))" for the lower element. [sae_note] IF k[0] dst[63:0] := ConvertExpFP64(b[63:0]) ELSE dst[63:0] := 0 FI dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point AVX512F Miscellaneous Convert the exponent of the lower double-precision (64-bit) floating-point element in "b" to a double-precision (64-bit) floating-point number representing the integer exponent, store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". This intrinsic essentially calculates "floor(log2(x))" for the lower element. IF k[0] dst[63:0] := ConvertExpFP64(b[63:0]) ELSE dst[63:0] := 0 FI dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point AVX512F Miscellaneous Convert the exponent of the lower single-precision (32-bit) floating-point element in "b" to a single-precision (32-bit) floating-point number representing the integer exponent, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". This intrinsic essentially calculates "floor(log2(x))" for the lower element. [sae_note] dst[31:0] := ConvertExpFP32(b[31:0]) dst[127:32] := a[127:32] dst[MAX:128] := 0
Floating Point AVX512F Miscellaneous Convert the exponent of the lower single-precision (32-bit) floating-point element in "b" to a single-precision (32-bit) floating-point number representing the integer exponent, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". This intrinsic essentially calculates "floor(log2(x))" for the lower element. dst[31:0] := ConvertExpFP32(b[31:0]) dst[127:32] := a[127:32] dst[MAX:128] := 0
Floating Point AVX512F Miscellaneous Convert the exponent of the lower single-precision (32-bit) floating-point element in "b" to a single-precision (32-bit) floating-point number representing the integer exponent, store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". This intrinsic essentially calculates "floor(log2(x))" for the lower element. [sae_note] IF k[0] dst[31:0] := ConvertExpFP32(b[31:0]) ELSE dst[31:0] := src[31:0] FI dst[127:32] := a[127:32] dst[MAX:128] := 0
Floating Point AVX512F Miscellaneous Convert the exponent of the lower single-precision (32-bit) floating-point element in "b" to a single-precision (32-bit) floating-point number representing the integer exponent, store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". This intrinsic essentially calculates "floor(log2(x))" for the lower element. IF k[0] dst[31:0] := ConvertExpFP32(b[31:0]) ELSE dst[31:0] := src[31:0] FI dst[127:32] := a[127:32] dst[MAX:128] := 0
Floating Point AVX512F Miscellaneous Convert the exponent of the lower single-precision (32-bit) floating-point element in "b" to a single-precision (32-bit) floating-point number representing the integer exponent, store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". This intrinsic essentially calculates "floor(log2(x))" for the lower element. [sae_note] IF k[0] dst[31:0] := ConvertExpFP32(b[31:0]) ELSE dst[31:0] := 0 FI dst[127:32] := a[127:32] dst[MAX:128] := 0
Floating Point AVX512F Miscellaneous Convert the exponent of the lower single-precision (32-bit) floating-point element in "b" to a single-precision (32-bit) floating-point number representing the integer exponent, store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". This intrinsic essentially calculates "floor(log2(x))" for the lower element. IF k[0] dst[31:0] := ConvertExpFP32(b[31:0]) ELSE dst[31:0] := 0 FI dst[127:32] := a[127:32] dst[MAX:128] := 0
Floating Point AVX512F Miscellaneous Normalize the mantissas of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "interv" and the sign depends on "sc" and the source sign. [getmant_note] FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := GetNormalizedMantissa(a[i+63:i], sc, interv) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Miscellaneous Normalize the mantissas of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "interv" and the sign depends on "sc" and the source sign. [getmant_note][sae_note] FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := GetNormalizedMantissa(a[i+63:i], sc, interv) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Miscellaneous Normalize the mantissas of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "interv" and the sign depends on "sc" and the source sign. [getmant_note] FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := GetNormalizedMantissa(a[i+31:i], sc, interv) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Miscellaneous Normalize the mantissas of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "interv" and the sign depends on "sc" and the source sign. [getmant_note][sae_note] FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := GetNormalizedMantissa(a[i+31:i], sc, interv) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
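The "±(2^k)*|x.significand|" description can be sketched for two of the possible intervals. This is a partial model under stated assumptions: the string labels `"1_2"`, `"p5_1"`, `"src"` and `"zero"` are hypothetical stand-ins for the `interv` and `sc` arguments, only the intervals [1, 2) and [0.5, 1) are covered, and only finite, nonzero inputs are handled.

```python
import math

def get_normalized_mantissa(x, sign="src", interval="1_2"):
    """Partial scalar model of GetNormalizedMantissa (finite, nonzero x only)."""
    if sign not in ("src", "zero"):
        raise ValueError("sketch supports sign 'src' or 'zero'")
    if interval not in ("1_2", "p5_1"):
        raise ValueError("sketch supports interval '1_2' or 'p5_1'")
    if x == 0.0 or math.isnan(x) or math.isinf(x):
        raise ValueError("sketch only covers finite, nonzero inputs")
    m, _ = math.frexp(abs(x))     # 0.5 <= m < 1
    if interval == "1_2":
        m *= 2.0                  # rescale the magnitude to [1, 2)
    if sign == "src" and x < 0:
        m = -m                    # keep the source sign
    return m

def maskz_getmant(k, a, sign="src", interval="1_2"):
    """Zeromask: lanes whose mask bit is clear are set to 0."""
    return [get_normalized_mantissa(x, sign, interval) if (k >> j) & 1 else 0.0
            for j, x in enumerate(a)]

mants = maskz_getmant(0b01, [12.0, 3.0])
```

Masked-off lanes are never passed to `get_normalized_mantissa`, so the restriction to finite, nonzero inputs applies only to lanes whose mask bit is set.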
Floating Point AVX512F Miscellaneous Normalize the mantissa of the lower double-precision (64-bit) floating-point element in "b", store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst". This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "interv" and the sign depends on "sc" and the source sign. [getmant_note][sae_note] dst[63:0] := GetNormalizedMantissa(b[63:0], sc, interv) dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point AVX512F Miscellaneous Normalize the mantissa of the lower double-precision (64-bit) floating-point element in "b", store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst". This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "interv" and the sign depends on "sc" and the source sign. [getmant_note] dst[63:0] := GetNormalizedMantissa(b[63:0], sc, interv) dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point AVX512F Miscellaneous Normalize the mantissa of the lower double-precision (64-bit) floating-point element in "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "interv" and the sign depends on "sc" and the source sign. [getmant_note][sae_note] IF k[0] dst[63:0] := GetNormalizedMantissa(b[63:0], sc, interv) ELSE dst[63:0] := src[63:0] FI dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point AVX512F Miscellaneous Normalize the mantissa of the lower double-precision (64-bit) floating-point element in "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "interv" and the sign depends on "sc" and the source sign. [getmant_note] IF k[0] dst[63:0] := GetNormalizedMantissa(b[63:0], sc, interv) ELSE dst[63:0] := src[63:0] FI dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point AVX512F Miscellaneous Normalize the mantissa of the lower double-precision (64-bit) floating-point element in "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "interv" and the sign depends on "sc" and the source sign. [getmant_note][sae_note] IF k[0] dst[63:0] := GetNormalizedMantissa(b[63:0], sc, interv) ELSE dst[63:0] := 0 FI dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point AVX512F Miscellaneous Normalize the mantissa of the lower double-precision (64-bit) floating-point element in "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "interv" and the sign depends on "sc" and the source sign. [getmant_note] IF k[0] dst[63:0] := GetNormalizedMantissa(b[63:0], sc, interv) ELSE dst[63:0] := 0 FI dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point AVX512F Miscellaneous Normalize the mantissa of the lower single-precision (32-bit) floating-point element in "b", store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "interv" and the sign depends on "sc" and the source sign. [getmant_note][sae_note] dst[31:0] := GetNormalizedMantissa(b[31:0], sc, interv) dst[127:32] := a[127:32] dst[MAX:128] := 0
Floating Point AVX512F Miscellaneous Normalize the mantissa of the lower single-precision (32-bit) floating-point element in "b", store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "interv" and the sign depends on "sc" and the source sign. [getmant_note] dst[31:0] := GetNormalizedMantissa(b[31:0], sc, interv) dst[127:32] := a[127:32] dst[MAX:128] := 0
Floating Point AVX512F Miscellaneous Normalize the mantissa of the lower single-precision (32-bit) floating-point element in "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "interv" and the sign depends on "sc" and the source sign. [getmant_note][sae_note] IF k[0] dst[31:0] := GetNormalizedMantissa(b[31:0], sc, interv) ELSE dst[31:0] := src[31:0] FI dst[127:32] := a[127:32] dst[MAX:128] := 0
Floating Point AVX512F Miscellaneous Normalize the mantissa of the lower single-precision (32-bit) floating-point element in "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "interv" and the sign depends on "sc" and the source sign. [getmant_note] IF k[0] dst[31:0] := GetNormalizedMantissa(b[31:0], sc, interv) ELSE dst[31:0] := src[31:0] FI dst[127:32] := a[127:32] dst[MAX:128] := 0
Floating Point AVX512F Miscellaneous Normalize the mantissa of the lower single-precision (32-bit) floating-point element in "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "interv" and the sign depends on "sc" and the source sign. [getmant_note][sae_note] IF k[0] dst[31:0] := GetNormalizedMantissa(b[31:0], sc, interv) ELSE dst[31:0] := 0 FI dst[127:32] := a[127:32] dst[MAX:128] := 0
Floating Point AVX512F Miscellaneous Normalize the mantissa of the lower single-precision (32-bit) floating-point element in "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "interv" and the sign depends on "sc" and the source sign. [getmant_note] IF k[0] dst[31:0] := GetNormalizedMantissa(b[31:0], sc, interv) ELSE dst[31:0] := 0 FI dst[127:32] := a[127:32] dst[MAX:128] := 0
Floating Point AVX512F Swizzle Copy "a" to "dst", then insert 128 bits (composed of 4 packed single-precision (32-bit) floating-point elements) from "b" into "dst" at the location specified by "imm8". dst[511:0] := a[511:0] CASE (imm8[1:0]) OF 0: dst[127:0] := b[127:0] 1: dst[255:128] := b[127:0] 2: dst[383:256] := b[127:0] 3: dst[511:384] := b[127:0] ESAC dst[MAX:512] := 0
Floating Point AVX512F Swizzle Copy "a" to "tmp", then insert 128 bits (composed of 4 packed single-precision (32-bit) floating-point elements) from "b" into "tmp" at the location specified by "imm8". Store "tmp" to "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). tmp[511:0] := a[511:0] CASE (imm8[1:0]) OF 0: tmp[127:0] := b[127:0] 1: tmp[255:128] := b[127:0] 2: tmp[383:256] := b[127:0] 3: tmp[511:384] := b[127:0] ESAC FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := tmp[i+31:i] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Swizzle Copy "a" to "tmp", then insert 128 bits (composed of 4 packed single-precision (32-bit) floating-point elements) from "b" into "tmp" at the location specified by "imm8". Store "tmp" to "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). tmp[511:0] := a[511:0] CASE (imm8[1:0]) OF 0: tmp[127:0] := b[127:0] 1: tmp[255:128] := b[127:0] 2: tmp[383:256] := b[127:0] 3: tmp[511:384] := b[127:0] ESAC FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := tmp[i+31:i] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
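The three 128-bit insert entries above (unmasked, writemask, zeromask) can be modeled on lists of 16 single-precision elements. The function names are hypothetical; the logic follows the pseudocode directly, with `imm8[1:0]` selecting which group of four elements is overwritten.

```python
def insert_f32x4(a, b, imm8):
    """Copy a (16 elements), then overwrite one 4-element lane with b."""
    if len(a) != 16 or len(b) != 4:
        raise ValueError("expected 16 elements in a and 4 in b")
    lane = imm8 & 0b11            # only imm8[1:0] is used
    tmp = list(a)
    tmp[4 * lane:4 * lane + 4] = b
    return tmp

def mask_insert_f32x4(src, k, a, b, imm8):
    """Writemask: lanes with a clear mask bit are copied from src."""
    tmp = insert_f32x4(a, b, imm8)
    return [tmp[j] if (k >> j) & 1 else src[j] for j in range(16)]

def maskz_insert_f32x4(k, a, b, imm8):
    """Zeromask: lanes with a clear mask bit are set to 0."""
    tmp = insert_f32x4(a, b, imm8)
    return [tmp[j] if (k >> j) & 1 else 0.0 for j in range(16)]

a = [float(i) for i in range(16)]
b = [100.0, 101.0, 102.0, 103.0]
inserted = insert_f32x4(a, b, 2)
```

Note that the mask is applied per 32-bit element after the insert, so a mask can keep part of the inserted lane and discard the rest.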
Floating Point AVX512F Swizzle Copy "a" to "dst", then insert 256 bits (composed of 4 packed double-precision (64-bit) floating-point elements) from "b" into "dst" at the location specified by "imm8". dst[511:0] := a[511:0] CASE (imm8[0]) OF 0: dst[255:0] := b[255:0] 1: dst[511:256] := b[255:0] ESAC dst[MAX:512] := 0
Floating Point AVX512F Swizzle Copy "a" to "tmp", then insert 256 bits (composed of 4 packed double-precision (64-bit) floating-point elements) from "b" into "tmp" at the location specified by "imm8". Store "tmp" to "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). tmp[511:0] := a[511:0] CASE (imm8[0]) OF 0: tmp[255:0] := b[255:0] 1: tmp[511:256] := b[255:0] ESAC FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := tmp[i+63:i] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Swizzle Copy "a" to "tmp", then insert 256 bits (composed of 4 packed double-precision (64-bit) floating-point elements) from "b" into "tmp" at the location specified by "imm8". Store "tmp" to "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). tmp[511:0] := a[511:0] CASE (imm8[0]) OF 0: tmp[255:0] := b[255:0] 1: tmp[511:256] := b[255:0] ESAC FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := tmp[i+63:i] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Swizzle Copy "a" to "dst", then insert 128 bits (composed of 4 packed 32-bit integers) from "b" into "dst" at the location specified by "imm8". dst[511:0] := a[511:0] CASE (imm8[1:0]) OF 0: dst[127:0] := b[127:0] 1: dst[255:128] := b[127:0] 2: dst[383:256] := b[127:0] 3: dst[511:384] := b[127:0] ESAC dst[MAX:512] := 0
Integer AVX512F Swizzle Copy "a" to "tmp", then insert 128 bits (composed of 4 packed 32-bit integers) from "b" into "tmp" at the location specified by "imm8". Store "tmp" to "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). tmp[511:0] := a[511:0] CASE (imm8[1:0]) OF 0: tmp[127:0] := b[127:0] 1: tmp[255:128] := b[127:0] 2: tmp[383:256] := b[127:0] 3: tmp[511:384] := b[127:0] ESAC FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := tmp[i+31:i] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Swizzle Copy "a" to "tmp", then insert 128 bits (composed of 4 packed 32-bit integers) from "b" into "tmp" at the location specified by "imm8". Store "tmp" to "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). tmp[511:0] := a[511:0] CASE (imm8[1:0]) OF 0: tmp[127:0] := b[127:0] 1: tmp[255:128] := b[127:0] 2: tmp[383:256] := b[127:0] 3: tmp[511:384] := b[127:0] ESAC FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := tmp[i+31:i] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Swizzle Copy "a" to "dst", then insert 256 bits (composed of 4 packed 64-bit integers) from "b" into "dst" at the location specified by "imm8". dst[511:0] := a[511:0] CASE (imm8[0]) OF 0: dst[255:0] := b[255:0] 1: dst[511:256] := b[255:0] ESAC dst[MAX:512] := 0
Integer AVX512F Swizzle Copy "a" to "tmp", then insert 256 bits (composed of 4 packed 64-bit integers) from "b" into "tmp" at the location specified by "imm8". Store "tmp" to "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). tmp[511:0] := a[511:0] CASE (imm8[0]) OF 0: tmp[255:0] := b[255:0] 1: tmp[511:256] := b[255:0] ESAC FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := tmp[i+63:i] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Swizzle Copy "a" to "tmp", then insert 256 bits (composed of 4 packed 64-bit integers) from "b" into "tmp" at the location specified by "imm8". Store "tmp" to "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). tmp[511:0] := a[511:0] CASE (imm8[0]) OF 0: tmp[255:0] := b[255:0] 1: tmp[511:256] := b[255:0] ESAC FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := tmp[i+63:i] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
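To make the bit-range notation such as `tmp[511:256]` concrete, the 256-bit integer insert can also be modeled on arbitrary-precision Python integers standing in for 512-bit registers. The helper `bits` and the function names are hypothetical; the sketch assumes bit 0 is the least significant bit, as in the pseudocode.

```python
def bits(value, hi, lo):
    """Return value[hi:lo] using the pseudocode's inclusive bit ranges."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)

def insert_i64x4(a, b, imm8):
    """Copy a, then overwrite the lower or upper 256 bits with b[255:0]."""
    b256 = bits(b, 255, 0)
    if imm8 & 1:                                  # imm8[0] == 1: upper half
        return (b256 << 256) | bits(a, 255, 0)
    return (bits(a, 511, 256) << 256) | b256      # imm8[0] == 0: lower half

def maskz_insert_i64x4(k, a, b, imm8):
    """Zeromask applied per 64-bit element after the insert."""
    tmp = insert_i64x4(a, b, imm8)
    dst = 0
    for j in range(8):
        i = j * 64
        if (k >> j) & 1:
            dst |= bits(tmp, i + 63, i) << i
    return dst

# Elements 1..8 packed into a, elements 0xA0..0xA3 packed into b.
a = sum((j + 1) << (64 * j) for j in range(8))
b = sum((0xA0 + j) << (64 * j) for j in range(4))
upper = insert_i64x4(a, b, 1)
```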
Floating Point AVX512F Special Math Functions Compare packed double-precision (64-bit) floating-point elements in "a" and "b", and store packed maximum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := MAX(a[i+63:i], b[i+63:i]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Special Math Functions Compare packed double-precision (64-bit) floating-point elements in "a" and "b", and store packed maximum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [sae_note] FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := MAX(a[i+63:i], b[i+63:i]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Special Math Functions Compare packed double-precision (64-bit) floating-point elements in "a" and "b", and store packed maximum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := MAX(a[i+63:i], b[i+63:i]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Special Math Functions Compare packed double-precision (64-bit) floating-point elements in "a" and "b", and store packed maximum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [sae_note] FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := MAX(a[i+63:i], b[i+63:i]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Special Math Functions Compare packed double-precision (64-bit) floating-point elements in "a" and "b", and store packed maximum values in "dst". FOR j := 0 to 7 i := j*64 dst[i+63:i] := MAX(a[i+63:i], b[i+63:i]) ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Special Math Functions Compare packed double-precision (64-bit) floating-point elements in "a" and "b", and store packed maximum values in "dst". [sae_note] FOR j := 0 to 7 i := j*64 dst[i+63:i] := MAX(a[i+63:i], b[i+63:i]) ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Special Math Functions Compare packed single-precision (32-bit) floating-point elements in "a" and "b", and store packed maximum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := MAX(a[i+31:i], b[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Special Math Functions Compare packed single-precision (32-bit) floating-point elements in "a" and "b", and store packed maximum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [sae_note] FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := MAX(a[i+31:i], b[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Special Math Functions Compare packed single-precision (32-bit) floating-point elements in "a" and "b", and store packed maximum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := MAX(a[i+31:i], b[i+31:i]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Special Math Functions Compare packed single-precision (32-bit) floating-point elements in "a" and "b", and store packed maximum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [sae_note] FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := MAX(a[i+31:i], b[i+31:i]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Special Math Functions Compare packed single-precision (32-bit) floating-point elements in "a" and "b", and store packed maximum values in "dst". FOR j := 0 to 15 i := j*32 dst[i+31:i] := MAX(a[i+31:i], b[i+31:i]) ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Special Math Functions Compare packed single-precision (32-bit) floating-point elements in "a" and "b", and store packed maximum values in "dst". [sae_note] FOR j := 0 to 15 i := j*32 dst[i+31:i] := MAX(a[i+31:i], b[i+31:i]) ENDFOR dst[MAX:512] := 0
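A minimal sketch of the masked packed maximum follows. The entries above only say `MAX`; the sketch assumes the usual x86 rule `a > b ? a : b`, under which the second operand is returned when either input is NaN or when both compare equal (for example +0.0 and -0.0). The function names are hypothetical.

```python
import math

def x86_max(a, b):
    # Assumed MAX semantics: return a only when a > b, otherwise b.
    return a if a > b else b

def mask_max(src, k, a, b):
    """Writemask: lanes with a clear mask bit are copied from src."""
    return [x86_max(a[j], b[j]) if (k >> j) & 1 else src[j]
            for j in range(len(a))]

def maskz_max(k, a, b):
    """Zeromask: lanes with a clear mask bit are set to 0."""
    return [x86_max(a[j], b[j]) if (k >> j) & 1 else 0.0
            for j in range(len(a))]

out = mask_max([9.0] * 4, 0b0111,
               [1.0, math.nan, 5.0, 7.0],
               [2.0, 3.0, math.nan, 8.0])
```

Under this assumption the operation is not commutative when NaNs are involved: lane 1 yields 3.0 because NaN is the first operand, while lane 2 yields NaN because NaN is the second operand.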
Floating Point AVX512F Special Math Functions Compare the lower double-precision (64-bit) floating-point elements in "a" and "b", store the maximum value in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". [sae_note] IF k[0] dst[63:0] := MAX(a[63:0], b[63:0]) ELSE dst[63:0] := src[63:0] FI dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point AVX512F Special Math Functions Compare the lower double-precision (64-bit) floating-point elements in "a" and "b", store the maximum value in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". IF k[0] dst[63:0] := MAX(a[63:0], b[63:0]) ELSE dst[63:0] := src[63:0] FI dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point AVX512F Special Math Functions Compare the lower double-precision (64-bit) floating-point elements in "a" and "b", store the maximum value in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". [sae_note] IF k[0] dst[63:0] := MAX(a[63:0], b[63:0]) ELSE dst[63:0] := 0 FI dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point AVX512F Special Math Functions Compare the lower double-precision (64-bit) floating-point elements in "a" and "b", store the maximum value in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". IF k[0] dst[63:0] := MAX(a[63:0], b[63:0]) ELSE dst[63:0] := 0 FI dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point AVX512F Special Math Functions Compare the lower double-precision (64-bit) floating-point elements in "a" and "b", store the maximum value in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst". [sae_note] dst[63:0] := MAX(a[63:0], b[63:0]) dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point AVX512F Special Math Functions Compare the lower single-precision (32-bit) floating-point elements in "a" and "b", store the maximum value in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". [sae_note] IF k[0] dst[31:0] := MAX(a[31:0], b[31:0]) ELSE dst[31:0] := src[31:0] FI dst[127:32] := a[127:32] dst[MAX:128] := 0
Floating Point AVX512F Special Math Functions Compare the lower single-precision (32-bit) floating-point elements in "a" and "b", store the maximum value in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". IF k[0] dst[31:0] := MAX(a[31:0], b[31:0]) ELSE dst[31:0] := src[31:0] FI dst[127:32] := a[127:32] dst[MAX:128] := 0
Floating Point AVX512F Special Math Functions Compare the lower single-precision (32-bit) floating-point elements in "a" and "b", store the maximum value in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". [sae_note] IF k[0] dst[31:0] := MAX(a[31:0], b[31:0]) ELSE dst[31:0] := 0 FI dst[127:32] := a[127:32] dst[MAX:128] := 0
Floating Point AVX512F Special Math Functions Compare the lower single-precision (32-bit) floating-point elements in "a" and "b", store the maximum value in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". IF k[0] dst[31:0] := MAX(a[31:0], b[31:0]) ELSE dst[31:0] := 0 FI dst[127:32] := a[127:32] dst[MAX:128] := 0
Floating Point AVX512F Special Math Functions Compare the lower single-precision (32-bit) floating-point elements in "a" and "b", store the maximum value in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". [sae_note] dst[31:0] := MAX(a[31:0], b[31:0]) dst[127:32] := a[127:32] dst[MAX:128] := 0
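For the scalar forms, only mask bit 0 matters and the upper elements always come from "a". A small model on 4-element lists, with hypothetical function names and the same assumed `a > b ? a : b` rule for `MAX`, might look like this:

```python
def x86_max(a, b):
    # Assumed MAX semantics: return a only when a > b, otherwise b.
    return a if a > b else b

def mask_max_ss(src, k, a, b):
    """Lower element is max(a[0], b[0]) or src[0]; upper 3 come from a."""
    low = x86_max(a[0], b[0]) if k & 1 else src[0]
    return [low] + list(a[1:4])

def maskz_max_ss(k, a, b):
    """Lower element is max(a[0], b[0]) or 0; upper 3 come from a."""
    low = x86_max(a[0], b[0]) if k & 1 else 0.0
    return [low] + list(a[1:4])

a = [1.0, 2.0, 3.0, 4.0]
b = [5.0, 6.0, 7.0, 8.0]
src = [9.0, 0.0, 0.0, 0.0]
taken = mask_max_ss(src, 0b01, a, b)
kept = mask_max_ss(src, 0b10, a, b)   # bit 0 clear, so src[0] is kept
```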
Floating Point AVX512F Special Math Functions Compare packed double-precision (64-bit) floating-point elements in "a" and "b", and store packed minimum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := MIN(a[i+63:i], b[i+63:i]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Special Math Functions Compare packed double-precision (64-bit) floating-point elements in "a" and "b", and store packed minimum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [sae_note] FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := MIN(a[i+63:i], b[i+63:i]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Special Math Functions Compare packed double-precision (64-bit) floating-point elements in "a" and "b", and store packed minimum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := MIN(a[i+63:i], b[i+63:i]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Special Math Functions Compare packed double-precision (64-bit) floating-point elements in "a" and "b", and store packed minimum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [sae_note] FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := MIN(a[i+63:i], b[i+63:i]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Special Math Functions Compare packed double-precision (64-bit) floating-point elements in "a" and "b", and store packed minimum values in "dst". FOR j := 0 to 7 i := j*64 dst[i+63:i] := MIN(a[i+63:i], b[i+63:i]) ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Special Math Functions Compare packed double-precision (64-bit) floating-point elements in "a" and "b", and store packed minimum values in "dst". [sae_note] FOR j := 0 to 7 i := j*64 dst[i+63:i] := MIN(a[i+63:i], b[i+63:i]) ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Special Math Functions Compare packed single-precision (32-bit) floating-point elements in "a" and "b", and store packed minimum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := MIN(a[i+31:i], b[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Special Math Functions Compare packed single-precision (32-bit) floating-point elements in "a" and "b", and store packed minimum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [sae_note] FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := MIN(a[i+31:i], b[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Special Math Functions Compare packed single-precision (32-bit) floating-point elements in "a" and "b", and store packed minimum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := MIN(a[i+31:i], b[i+31:i]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Special Math Functions Compare packed single-precision (32-bit) floating-point elements in "a" and "b", and store packed minimum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [sae_note] FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := MIN(a[i+31:i], b[i+31:i]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Special Math Functions Compare packed single-precision (32-bit) floating-point elements in "a" and "b", and store packed minimum values in "dst". FOR j := 0 to 15 i := j*32 dst[i+31:i] := MIN(a[i+31:i], b[i+31:i]) ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Special Math Functions Compare packed single-precision (32-bit) floating-point elements in "a" and "b", and store packed minimum values in "dst". [sae_note] FOR j := 0 to 15 i := j*32 dst[i+31:i] := MIN(a[i+31:i], b[i+31:i]) ENDFOR dst[MAX:512] := 0
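The writemask and zeromask variants of the packed minimum entries above differ only in what happens to a lane whose mask bit is clear. A minimal scalar sketch, assuming lanes are modeled as a Python list of floats and the mask as an integer; the name `masked_min` is hypothetical, and Python's `min` does not reproduce the hardware's NaN or signed-zero behavior:

```python
def masked_min(a, b, k, src=None):
    """Lane-wise minimum under mask k (illustrative model only).

    src is None -> zeromask behavior (unselected lanes become 0.0)
    src given   -> writemask behavior (unselected lanes copied from src)
    """
    out = []
    for j, (x, y) in enumerate(zip(a, b)):
        if (k >> j) & 1:
            out.append(min(x, y))   # mask bit set: MIN(a[j], b[j])
        elif src is not None:
            out.append(src[j])      # writemask: keep the lane from src
        else:
            out.append(0.0)         # zeromask: clear the lane
    return out
```

Passing `src` selects the writemask form; omitting it selects the zeromask form. The unmasked entries correspond to a mask with every bit set.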
Floating Point AVX512F Special Math Functions Compare the lower double-precision (64-bit) floating-point elements in "a" and "b", store the minimum value in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". [sae_note] IF k[0] dst[63:0] := MIN(a[63:0], b[63:0]) ELSE dst[63:0] := src[63:0] FI dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point AVX512F Special Math Functions Compare the lower double-precision (64-bit) floating-point elements in "a" and "b", store the minimum value in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". IF k[0] dst[63:0] := MIN(a[63:0], b[63:0]) ELSE dst[63:0] := src[63:0] FI dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point AVX512F Special Math Functions Compare the lower double-precision (64-bit) floating-point elements in "a" and "b", store the minimum value in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". [sae_note] IF k[0] dst[63:0] := MIN(a[63:0], b[63:0]) ELSE dst[63:0] := 0 FI dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point AVX512F Special Math Functions Compare the lower double-precision (64-bit) floating-point elements in "a" and "b", store the minimum value in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". IF k[0] dst[63:0] := MIN(a[63:0], b[63:0]) ELSE dst[63:0] := 0 FI dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point AVX512F Special Math Functions Compare the lower double-precision (64-bit) floating-point elements in "a" and "b", store the minimum value in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst". [sae_note] dst[63:0] := MIN(a[63:0], b[63:0]) dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point AVX512F Special Math Functions Compare the lower single-precision (32-bit) floating-point elements in "a" and "b", store the minimum value in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". [sae_note] IF k[0] dst[31:0] := MIN(a[31:0], b[31:0]) ELSE dst[31:0] := src[31:0] FI dst[127:32] := a[127:32] dst[MAX:128] := 0
Floating Point AVX512F Special Math Functions Compare the lower single-precision (32-bit) floating-point elements in "a" and "b", store the minimum value in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". IF k[0] dst[31:0] := MIN(a[31:0], b[31:0]) ELSE dst[31:0] := src[31:0] FI dst[127:32] := a[127:32] dst[MAX:128] := 0
Floating Point AVX512F Special Math Functions Compare the lower single-precision (32-bit) floating-point elements in "a" and "b", store the minimum value in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". [sae_note] IF k[0] dst[31:0] := MIN(a[31:0], b[31:0]) ELSE dst[31:0] := 0 FI dst[127:32] := a[127:32] dst[MAX:128] := 0
Floating Point AVX512F Special Math Functions Compare the lower single-precision (32-bit) floating-point elements in "a" and "b", store the minimum value in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". IF k[0] dst[31:0] := MIN(a[31:0], b[31:0]) ELSE dst[31:0] := 0 FI dst[127:32] := a[127:32] dst[MAX:128] := 0
Floating Point AVX512F Special Math Functions Compare the lower single-precision (32-bit) floating-point elements in "a" and "b", store the minimum value in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". [sae_note] dst[31:0] := MIN(a[31:0], b[31:0]) dst[127:32] := a[127:32] dst[MAX:128] := 0
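In the scalar minimum entries above, only mask bit 0 matters, and the upper lanes always come from "a" regardless of the mask. A rough sketch of that behavior, assuming a 128-bit vector is modeled as a short list of floats; `masked_min_scalar` is a hypothetical name, and NaN handling is not modeled:

```python
def masked_min_scalar(a, b, k, src=None):
    """Model of the scalar MIN entries: lane 0 is masked, upper lanes pass through from a."""
    if k & 1:
        low = min(a[0], b[0])   # mask bit 0 set: compute the minimum
    elif src is not None:
        low = src[0]            # writemask: lower lane copied from src
    else:
        low = 0.0               # zeromask: lower lane cleared
    return [low] + list(a[1:])  # upper lanes are always taken from a
```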
Floating Point AVX512F Load Load packed double-precision (64-bit) floating-point elements from memory into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). "mem_addr" must be aligned on a 64-byte boundary or a general-protection exception may be generated. FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := MEM[mem_addr+i+63:mem_addr+i] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Move Move packed double-precision (64-bit) floating-point elements from "a" into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := a[i+63:i] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Load Load packed single-precision (32-bit) floating-point elements from memory into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). "mem_addr" must be aligned on a 64-byte boundary or a general-protection exception may be generated. FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := MEM[mem_addr+i+31:mem_addr+i] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Move Move packed single-precision (32-bit) floating-point elements from "a" into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := a[i+31:i] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Move Duplicate even-indexed double-precision (64-bit) floating-point elements from "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). tmp[63:0] := a[63:0] tmp[127:64] := a[63:0] tmp[191:128] := a[191:128] tmp[255:192] := a[191:128] tmp[319:256] := a[319:256] tmp[383:320] := a[319:256] tmp[447:384] := a[447:384] tmp[511:448] := a[447:384] FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := tmp[i+63:i] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Move Duplicate even-indexed double-precision (64-bit) floating-point elements from "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). tmp[63:0] := a[63:0] tmp[127:64] := a[63:0] tmp[191:128] := a[191:128] tmp[255:192] := a[191:128] tmp[319:256] := a[319:256] tmp[383:320] := a[319:256] tmp[447:384] := a[447:384] tmp[511:448] := a[447:384] FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := tmp[i+63:i] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Move Duplicate even-indexed double-precision (64-bit) floating-point elements from "a", and store the results in "dst". dst[63:0] := a[63:0] dst[127:64] := a[63:0] dst[191:128] := a[191:128] dst[255:192] := a[191:128] dst[319:256] := a[319:256] dst[383:320] := a[319:256] dst[447:384] := a[447:384] dst[511:448] := a[447:384] dst[MAX:512] := 0
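The masked duplicate entries above work in two steps: the full duplicated vector "tmp" is built first, and the mask is applied per lane afterwards. A minimal sketch under the assumption that lanes are list elements; the function name `dup_even_masked` is illustrative and not part of any real API:

```python
def dup_even_masked(a, k=None, src=None):
    """Duplicate even-indexed lanes of a, then optionally apply a mask."""
    # Step 1: lane j reads from lane j with its lowest index bit cleared,
    # so lanes (0,1) both get a[0], lanes (2,3) both get a[2], and so on.
    tmp = [a[j & ~1] for j in range(len(a))]
    if k is None:
        return tmp                      # unmasked form
    # Step 2: per-lane mask selection on the already duplicated vector.
    out = []
    for j, t in enumerate(tmp):
        if (k >> j) & 1:
            out.append(t)
        elif src is not None:
            out.append(src[j])          # writemask
        else:
            out.append(0.0)             # zeromask
    return out
```

Because the mask is applied after duplication, an odd lane with its mask bit set still receives the value of its even neighbor.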
Integer AVX512F Load Load packed 32-bit integers from memory into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). "mem_addr" must be aligned on a 64-byte boundary or a general-protection exception may be generated. FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := MEM[mem_addr+i+31:mem_addr+i] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Move Move packed 32-bit integers from "a" into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := a[i+31:i] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Load Load packed 64-bit integers from memory into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). "mem_addr" must be aligned on a 64-byte boundary or a general-protection exception may be generated. FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := MEM[mem_addr+i+63:mem_addr+i] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Move Move packed 64-bit integers from "a" into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := a[i+63:i] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Load Load 512 bits of integer data from memory into "dst". "mem_addr" does not need to be aligned on any particular boundary. dst[511:0] := MEM[mem_addr+511:mem_addr] dst[MAX:512] := 0
Integer AVX512F Load Load packed 32-bit integers from memory into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "mem_addr" does not need to be aligned on any particular boundary. FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := MEM[mem_addr+i+31:mem_addr+i] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Store Store packed 32-bit integers from "a" into memory using writemask "k". "mem_addr" does not need to be aligned on any particular boundary. FOR j := 0 to 15 i := j*32 IF k[j] MEM[mem_addr+i+31:mem_addr+i] := a[i+31:i] FI ENDFOR
Integer AVX512F Load Load packed 32-bit integers from memory into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). "mem_addr" does not need to be aligned on any particular boundary. FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := MEM[mem_addr+i+31:mem_addr+i] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Store Store 512 bits of integer data from "a" into memory. "mem_addr" does not need to be aligned on any particular boundary. MEM[mem_addr+511:mem_addr] := a[511:0]
Integer AVX512F Load Load packed 64-bit integers from memory into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "mem_addr" does not need to be aligned on any particular boundary. FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := MEM[mem_addr+i+63:mem_addr+i] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Store Store packed 64-bit integers from "a" into memory using writemask "k". "mem_addr" does not need to be aligned on any particular boundary. FOR j := 0 to 7 i := j*64 IF k[j] MEM[mem_addr+i+63:mem_addr+i] := a[i+63:i] FI ENDFOR
Integer AVX512F Load Load packed 64-bit integers from memory into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). "mem_addr" does not need to be aligned on any particular boundary. FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := MEM[mem_addr+i+63:mem_addr+i] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
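The masked load and store entries above are asymmetric: a masked load always produces a full result vector (filling unselected lanes from "src" or with zero), whereas a masked store simply skips unselected lanes and leaves that memory untouched. A small sketch of this, assuming memory is modeled as a Python list indexed by element rather than by bit; `mask_load` and `mask_store` are hypothetical helper names:

```python
def mask_store(mem, base, a, k):
    """Write only the lanes of a whose mask bit is set; other memory is unchanged."""
    for j, value in enumerate(a):
        if (k >> j) & 1:
            mem[base + j] = value

def mask_load(mem, base, k, lanes, src=None):
    """Read selected lanes from memory; fill the rest from src (writemask) or 0 (zeromask)."""
    out = []
    for j in range(lanes):
        if (k >> j) & 1:
            out.append(mem[base + j])
        elif src is not None:
            out.append(src[j])
        else:
            out.append(0)
    return out
```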
Integer AVX512F Load Load 512 bits of integer data from memory into "dst" using a non-temporal memory hint. "mem_addr" must be aligned on a 64-byte boundary or a general-protection exception may be generated. dst[511:0] := MEM[mem_addr+511:mem_addr] dst[MAX:512] := 0
Integer AVX512F Store Store 512 bits of integer data from "a" into memory using a non-temporal memory hint. "mem_addr" must be aligned on a 64-byte boundary or a general-protection exception may be generated. MEM[mem_addr+511:mem_addr] := a[511:0]
Floating Point AVX512F Store Store 512 bits (composed of 8 packed double-precision (64-bit) floating-point elements) from "a" into memory using a non-temporal memory hint. "mem_addr" must be aligned on a 64-byte boundary or a general-protection exception may be generated. MEM[mem_addr+511:mem_addr] := a[511:0]
Floating Point AVX512F Store Store 512 bits (composed of 16 packed single-precision (32-bit) floating-point elements) from "a" into memory using a non-temporal memory hint. "mem_addr" must be aligned on a 64-byte boundary or a general-protection exception may be generated. MEM[mem_addr+511:mem_addr] := a[511:0]
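The non-temporal entries above require "mem_addr" to sit on a 64-byte boundary, that is, the address must be a multiple of 64. A minimal sketch of that check, treating addresses as plain integers; `is_aligned` and `align_up` are hypothetical helpers, not functions from any real library:

```python
def is_aligned(addr, boundary=64):
    """True when addr is a multiple of boundary (boundary must be a power of two)."""
    return addr & (boundary - 1) == 0

def align_up(addr, boundary=64):
    """Round addr up to the next multiple of boundary (a power of two)."""
    return (addr + boundary - 1) & ~(boundary - 1)
```

An address that fails `is_aligned` would need either an unaligned load or store entry instead, or a buffer whose start has been rounded up as in `align_up`.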
Floating Point AVX512F Load Load a double-precision (64-bit) floating-point element from memory into the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and set the upper element of "dst" to zero. "mem_addr" must be aligned on a 16-byte boundary or a general-protection exception may be generated. IF k[0] dst[63:0] := MEM[mem_addr+63:mem_addr] ELSE dst[63:0] := src[63:0] FI dst[MAX:64] := 0
Floating Point AVX512F Move Move the lower double-precision (64-bit) floating-point element from "b" to the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". IF k[0] dst[63:0] := b[63:0] ELSE dst[63:0] := src[63:0] FI dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point AVX512F Store Store the lower double-precision (64-bit) floating-point element from "a" into memory using writemask "k". "mem_addr" must be aligned on a 16-byte boundary or a general-protection exception may be generated. IF k[0] MEM[mem_addr+63:mem_addr] := a[63:0] FI
Floating Point AVX512F Load Load a double-precision (64-bit) floating-point element from memory into the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and set the upper element of "dst" to zero. "mem_addr" must be aligned on a 16-byte boundary or a general-protection exception may be generated. IF k[0] dst[63:0] := MEM[mem_addr+63:mem_addr] ELSE dst[63:0] := 0 FI dst[MAX:64] := 0
Floating Point AVX512F Move Move the lower double-precision (64-bit) floating-point element from "b" to the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". IF k[0] dst[63:0] := b[63:0] ELSE dst[63:0] := 0 FI dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point AVX512F Move Duplicate odd-indexed single-precision (32-bit) floating-point elements from "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). tmp[31:0] := a[63:32] tmp[63:32] := a[63:32] tmp[95:64] := a[127:96] tmp[127:96] := a[127:96] tmp[159:128] := a[191:160] tmp[191:160] := a[191:160] tmp[223:192] := a[255:224] tmp[255:224] := a[255:224] tmp[287:256] := a[319:288] tmp[319:288] := a[319:288] tmp[351:320] := a[383:352] tmp[383:352] := a[383:352] tmp[415:384] := a[447:416] tmp[447:416] := a[447:416] tmp[479:448] := a[511:480] tmp[511:480] := a[511:480] FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := tmp[i+31:i] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Move Duplicate odd-indexed single-precision (32-bit) floating-point elements from "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). tmp[31:0] := a[63:32] tmp[63:32] := a[63:32] tmp[95:64] := a[127:96] tmp[127:96] := a[127:96] tmp[159:128] := a[191:160] tmp[191:160] := a[191:160] tmp[223:192] := a[255:224] tmp[255:224] := a[255:224] tmp[287:256] := a[319:288] tmp[319:288] := a[319:288] tmp[351:320] := a[383:352] tmp[383:352] := a[383:352] tmp[415:384] := a[447:416] tmp[447:416] := a[447:416] tmp[479:448] := a[511:480] tmp[511:480] := a[511:480] FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := tmp[i+31:i] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Move Duplicate odd-indexed single-precision (32-bit) floating-point elements from "a", and store the results in "dst". dst[31:0] := a[63:32] dst[63:32] := a[63:32] dst[95:64] := a[127:96] dst[127:96] := a[127:96] dst[159:128] := a[191:160] dst[191:160] := a[191:160] dst[223:192] := a[255:224] dst[255:224] := a[255:224] dst[287:256] := a[319:288] dst[319:288] := a[319:288] dst[351:320] := a[383:352] dst[383:352] := a[383:352] dst[415:384] := a[447:416] dst[447:416] := a[447:416] dst[479:448] := a[511:480] dst[511:480] := a[511:480] dst[MAX:512] := 0
Floating Point AVX512F Move Duplicate even-indexed single-precision (32-bit) floating-point elements from "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). tmp[31:0] := a[31:0] tmp[63:32] := a[31:0] tmp[95:64] := a[95:64] tmp[127:96] := a[95:64] tmp[159:128] := a[159:128] tmp[191:160] := a[159:128] tmp[223:192] := a[223:192] tmp[255:224] := a[223:192] tmp[287:256] := a[287:256] tmp[319:288] := a[287:256] tmp[351:320] := a[351:320] tmp[383:352] := a[351:320] tmp[415:384] := a[415:384] tmp[447:416] := a[415:384] tmp[479:448] := a[479:448] tmp[511:480] := a[479:448] FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := tmp[i+31:i] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Move Duplicate even-indexed single-precision (32-bit) floating-point elements from "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). tmp[31:0] := a[31:0] tmp[63:32] := a[31:0] tmp[95:64] := a[95:64] tmp[127:96] := a[95:64] tmp[159:128] := a[159:128] tmp[191:160] := a[159:128] tmp[223:192] := a[223:192] tmp[255:224] := a[223:192] tmp[287:256] := a[287:256] tmp[319:288] := a[287:256] tmp[351:320] := a[351:320] tmp[383:352] := a[351:320] tmp[415:384] := a[415:384] tmp[447:416] := a[415:384] tmp[479:448] := a[479:448] tmp[511:480] := a[479:448] FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := tmp[i+31:i] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Move Duplicate even-indexed single-precision (32-bit) floating-point elements from "a", and store the results in "dst". dst[31:0] := a[31:0] dst[63:32] := a[31:0] dst[95:64] := a[95:64] dst[127:96] := a[95:64] dst[159:128] := a[159:128] dst[191:160] := a[159:128] dst[223:192] := a[223:192] dst[255:224] := a[223:192] dst[287:256] := a[287:256] dst[319:288] := a[287:256] dst[351:320] := a[351:320] dst[383:352] := a[351:320] dst[415:384] := a[415:384] dst[447:416] := a[415:384] dst[479:448] := a[479:448] dst[511:480] := a[479:448] dst[MAX:512] := 0
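The long assignment lists in the odd-indexed and even-indexed duplicate entries above reduce to a single index rule per lane: set the lowest bit of the lane index to read the odd neighbor, or clear it to read the even neighbor. A sketch of both rules, assuming lanes are list elements; the function names are illustrative only:

```python
def dup_odd_lanes(a):
    """Each pair (2n, 2n+1) receives the odd-indexed element a[2n+1]."""
    return [a[j | 1] for j in range(len(a))]

def dup_even_lanes(a):
    """Each pair (2n, 2n+1) receives the even-indexed element a[2n]."""
    return [a[j & ~1] for j in range(len(a))]
```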
Floating Point AVX512F Load Load a single-precision (32-bit) floating-point element from memory into the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and set the upper elements of "dst" to zero. "mem_addr" must be aligned on a 16-byte boundary or a general-protection exception may be generated. IF k[0] dst[31:0] := MEM[mem_addr+31:mem_addr] ELSE dst[31:0] := src[31:0] FI dst[MAX:32] := 0
Floating Point AVX512F Move Move the lower single-precision (32-bit) floating-point element from "b" to the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". IF k[0] dst[31:0] := b[31:0] ELSE dst[31:0] := src[31:0] FI dst[127:32] := a[127:32] dst[MAX:128] := 0
Floating Point AVX512F Store Store the lower single-precision (32-bit) floating-point element from "a" into memory using writemask "k". "mem_addr" must be aligned on a 16-byte boundary or a general-protection exception may be generated. IF k[0] MEM[mem_addr+31:mem_addr] := a[31:0] FI
Floating Point AVX512F Load Load a single-precision (32-bit) floating-point element from memory into the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and set the upper elements of "dst" to zero. "mem_addr" must be aligned on a 16-byte boundary or a general-protection exception may be generated. IF k[0] dst[31:0] := MEM[mem_addr+31:mem_addr] ELSE dst[31:0] := 0 FI dst[MAX:32] := 0
Floating Point AVX512F Move Move the lower single-precision (32-bit) floating-point element from "b" to the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". IF k[0] dst[31:0] := b[31:0] ELSE dst[31:0] := 0 FI dst[127:32] := a[127:32] dst[MAX:128] := 0
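The scalar load and scalar move entries above share the same masking of the lower element but treat the upper elements differently: the load zeroes them, while the move copies them from "a". A rough sketch contrasting the two, assuming a 128-bit register is modeled as a list of four floats; both function names are hypothetical:

```python
def mask_load_scalar(mem_value, k, lanes=4, src=None):
    """Scalar masked load: lower lane is masked, upper lanes are set to zero."""
    if k & 1:
        low = mem_value
    elif src is not None:
        low = src[0]        # writemask
    else:
        low = 0.0           # zeromask
    return [low] + [0.0] * (lanes - 1)

def mask_move_scalar(a, b, k, src=None):
    """Scalar masked move: lower lane comes from b under the mask, upper lanes from a."""
    if k & 1:
        low = b[0]
    elif src is not None:
        low = src[0]        # writemask
    else:
        low = 0.0           # zeromask
    return [low] + list(a[1:])
```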
Floating Point AVX512F Load Load 512 bits (composed of 8 packed double-precision (64-bit) floating-point elements) from memory into "dst". "mem_addr" does not need to be aligned on any particular boundary. dst[511:0] := MEM[mem_addr+511:mem_addr] dst[MAX:512] := 0
Floating Point AVX512F Load Load packed double-precision (64-bit) floating-point elements from memory into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "mem_addr" does not need to be aligned on any particular boundary. FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := MEM[mem_addr+i+63:mem_addr+i] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Store Store packed double-precision (64-bit) floating-point elements from "a" into memory using writemask "k". "mem_addr" does not need to be aligned on any particular boundary. FOR j := 0 to 7 i := j*64 IF k[j] MEM[mem_addr+i+63:mem_addr+i] := a[i+63:i] FI ENDFOR
Floating Point AVX512F Load Load packed double-precision (64-bit) floating-point elements from memory into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). "mem_addr" does not need to be aligned on any particular boundary. FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := MEM[mem_addr+i+63:mem_addr+i] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Store Store 512 bits (composed of 8 packed double-precision (64-bit) floating-point elements) from "a" into memory. "mem_addr" does not need to be aligned on any particular boundary. MEM[mem_addr+511:mem_addr] := a[511:0]
Floating Point AVX512F Load Load 512 bits (composed of 16 packed single-precision (32-bit) floating-point elements) from memory into "dst". "mem_addr" does not need to be aligned on any particular boundary. dst[511:0] := MEM[mem_addr+511:mem_addr] dst[MAX:512] := 0
Floating Point AVX512F Load Load packed single-precision (32-bit) floating-point elements from memory into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "mem_addr" does not need to be aligned on any particular boundary. FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := MEM[mem_addr+i+31:mem_addr+i] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Store Store packed single-precision (32-bit) floating-point elements from "a" into memory using writemask "k". "mem_addr" does not need to be aligned on any particular boundary. FOR j := 0 to 15 i := j*32 IF k[j] MEM[mem_addr+i+31:mem_addr+i] := a[i+31:i] FI ENDFOR
Floating Point AVX512F Load Load packed single-precision (32-bit) floating-point elements from memory into "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). "mem_addr" does not need to be aligned on any particular boundary. FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := MEM[mem_addr+i+31:mem_addr+i] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Store Store 512 bits (composed of 16 packed single-precision (32-bit) floating-point elements) from "a" into memory. "mem_addr" does not need to be aligned on any particular boundary. MEM[mem_addr+511:mem_addr] := a[511:0]
Floating Point AVX512F Arithmetic Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := a[i+63:i] * b[i+63:i] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Arithmetic Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [round_note] FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := a[i+63:i] * b[i+63:i] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Arithmetic Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := a[i+31:i] * b[i+31:i] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Arithmetic Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [round_note] FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := a[i+31:i] * b[i+31:i] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
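The single-precision multiply entries above produce a 32-bit result per lane, so each product is rounded to single precision. A sketch of the zeromask form under default round-to-nearest, assuming the inputs are already exactly representable as 32-bit floats and the product does not overflow; `to_f32` and `masked_mul_f32` are hypothetical names, and the rounding-mode override described by [round_note] is not modeled:

```python
import struct

def to_f32(x):
    """Round a Python float (64-bit) to the nearest 32-bit float value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def masked_mul_f32(a, b, k):
    """Zeromask multiply: selected lanes get the float32-rounded product, others 0.0."""
    out = []
    for j, (x, y) in enumerate(zip(a, b)):
        if (k >> j) & 1:
            # The product of two float32 values is exact in a 64-bit float,
            # so a single rounding step to float32 is sufficient here.
            out.append(to_f32(x * y))
        else:
            out.append(0.0)
    return out
```

For example, with x = 1 + 2^-23 the exact square 1 + 2^-22 + 2^-46 does not fit in single precision, and the sketch returns 1 + 2^-22.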
Floating Point AVX512F Arithmetic Multiply the lower double-precision (64-bit) floating-point element in "a" and "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". [round_note] IF k[0] dst[63:0] := a[63:0] * b[63:0] ELSE dst[63:0] := src[63:0] FI dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point AVX512F Arithmetic Multiply the lower double-precision (64-bit) floating-point element in "a" and "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". IF k[0] dst[63:0] := a[63:0] * b[63:0] ELSE dst[63:0] := src[63:0] FI dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point AVX512F Arithmetic Multiply the lower double-precision (64-bit) floating-point element in "a" and "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". [round_note] IF k[0] dst[63:0] := a[63:0] * b[63:0] ELSE dst[63:0] := 0 FI dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point AVX512F Arithmetic Multiply the lower double-precision (64-bit) floating-point element in "a" and "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". IF k[0] dst[63:0] := a[63:0] * b[63:0] ELSE dst[63:0] := 0 FI dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point AVX512F Arithmetic Multiply the lower double-precision (64-bit) floating-point element in "a" and "b", store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst". [round_note] dst[63:0] := a[63:0] * b[63:0] dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point AVX512F Arithmetic Multiply the lower single-precision (32-bit) floating-point element in "a" and "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". [round_note] IF k[0] dst[31:0] := a[31:0] * b[31:0] ELSE dst[31:0] := src[31:0] FI dst[127:32] := a[127:32] dst[MAX:128] := 0
Floating Point AVX512F Arithmetic Multiply the lower single-precision (32-bit) floating-point element in "a" and "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". IF k[0] dst[31:0] := a[31:0] * b[31:0] ELSE dst[31:0] := src[31:0] FI dst[127:32] := a[127:32] dst[MAX:128] := 0
Floating Point AVX512F Arithmetic Multiply the lower single-precision (32-bit) floating-point element in "a" and "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". [round_note] IF k[0] dst[31:0] := a[31:0] * b[31:0] ELSE dst[31:0] := 0 FI dst[127:32] := a[127:32] dst[MAX:128] := 0
Floating Point AVX512F Arithmetic Multiply the lower single-precision (32-bit) floating-point element in "a" and "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". IF k[0] dst[31:0] := a[31:0] * b[31:0] ELSE dst[31:0] := 0 FI dst[127:32] := a[127:32] dst[MAX:128] := 0
Floating Point AVX512F Arithmetic Multiply the lower single-precision (32-bit) floating-point element in "a" and "b", store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". [round_note] dst[31:0] := a[31:0] * b[31:0] dst[127:32] := a[127:32] dst[MAX:128] := 0
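The masked scalar multiply records above differ only in what lane 0 receives when mask bit 0 is clear. The following is a minimal scalar sketch of that behaviour in portable C. The type `vec4f` and the functions `ref_mask_mul_ss` and `ref_maskz_mul_ss` are hypothetical names chosen for illustration, not Intel intrinsics, and the rounding-control variants marked [round_note] are not modelled (the default rounding mode is assumed).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a 128-bit register of four floats. */
typedef struct { float f[4]; } vec4f;

/* Writemask form: lane 0 is a*b when bit 0 of k is set, else src lane 0.
   Lanes 1..3 are always copied from a. */
static vec4f ref_mask_mul_ss(vec4f src, uint8_t k, vec4f a, vec4f b) {
    vec4f dst = a;
    dst.f[0] = (k & 1) ? a.f[0] * b.f[0] : src.f[0];
    return dst;
}

/* Zeromask form: lane 0 is a*b when bit 0 of k is set, else 0. */
static vec4f ref_maskz_mul_ss(uint8_t k, vec4f a, vec4f b) {
    vec4f dst = a;
    dst.f[0] = (k & 1) ? a.f[0] * b.f[0] : 0.0f;
    return dst;
}
```

Only bit 0 of the mask matters here, so a mask value such as 2 behaves like 0.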
Integer AVX512F Special Math Functions Compute the absolute value of packed signed 32-bit integers in "a", and store the unsigned results in "dst". FOR j := 0 to 15 i := j*32 dst[i+31:i] := ABS(a[i+31:i]) ENDFOR dst[MAX:512] := 0
Integer AVX512F Special Math Functions Compute the absolute value of packed signed 32-bit integers in "a", and store the unsigned results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := ABS(a[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Special Math Functions Compute the absolute value of packed signed 32-bit integers in "a", and store the unsigned results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := ABS(a[i+31:i]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Special Math Functions Compute the absolute value of packed signed 64-bit integers in "a", and store the unsigned results in "dst". FOR j := 0 to 7 i := j*64 dst[i+63:i] := ABS(a[i+63:i]) ENDFOR dst[MAX:512] := 0
Integer AVX512F Special Math Functions Compute the absolute value of packed signed 64-bit integers in "a", and store the unsigned results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := ABS(a[i+63:i]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Special Math Functions Compute the absolute value of packed signed 64-bit integers in "a", and store the unsigned results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := ABS(a[i+63:i]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
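The phrase "store the unsigned results" matters for the most negative input, whose magnitude does not fit in a signed element. The sketch below models the 32-bit writemask and zeromask forms in scalar C; `abs_u32`, `ref_mask_abs_epi32` and `ref_maskz_abs_epi32` are hypothetical names, and the model assumes the usual two's-complement reading of ABS.

```c
#include <assert.h>
#include <stdint.h>

/* ABS computed on the unsigned bit pattern, so INT32_MIN maps to 0x80000000
   without signed overflow. */
static uint32_t abs_u32(int32_t x) {
    uint32_t u = (uint32_t)x;
    return x < 0 ? 0u - u : u;
}

/* Writemask: lanes with a clear mask bit keep src. */
static void ref_mask_abs_epi32(uint32_t dst[16], const uint32_t src[16],
                               uint16_t k, const int32_t a[16]) {
    for (int j = 0; j < 16; j++)
        dst[j] = ((k >> j) & 1) ? abs_u32(a[j]) : src[j];
}

/* Zeromask: lanes with a clear mask bit become 0. */
static void ref_maskz_abs_epi32(uint32_t dst[16], uint16_t k,
                                const int32_t a[16]) {
    for (int j = 0; j < 16; j++)
        dst[j] = ((k >> j) & 1) ? abs_u32(a[j]) : 0u;
}
```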
Integer AVX512F Arithmetic Add packed 32-bit integers in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := a[i+31:i] + b[i+31:i] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Arithmetic Add packed 64-bit integers in "a" and "b", and store the results in "dst". FOR j := 0 to 7 i := j*64 dst[i+63:i] := a[i+63:i] + b[i+63:i] ENDFOR dst[MAX:512] := 0
Integer AVX512F Arithmetic Add packed 64-bit integers in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := a[i+63:i] + b[i+63:i] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Arithmetic Add packed 64-bit integers in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := a[i+63:i] + b[i+63:i] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
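As a rough illustration of the masked 64-bit add records above, the scalar model below treats each element as an unsigned 64-bit value so that the sum wraps modulo 2^64. The function name `ref_mask_add_epi64` is hypothetical; the zeromask form would differ only by writing 0 instead of `src[j]`.

```c
#include <assert.h>
#include <stdint.h>

/* Writemask add over eight 64-bit lanes. Unsigned arithmetic is used so
   that overflow wraps instead of being undefined behaviour. */
static void ref_mask_add_epi64(uint64_t dst[8], const uint64_t src[8],
                               uint8_t k, const uint64_t a[8],
                               const uint64_t b[8]) {
    for (int j = 0; j < 8; j++)
        dst[j] = ((k >> j) & 1) ? a[j] + b[j] : src[j];
}
```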
Integer AVX512F Logical Compute the bitwise AND of packed 32-bit integers in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := a[i+31:i] AND b[i+31:i] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Logical Compute the bitwise NOT of packed 32-bit integers in "a" and then AND with "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := (NOT a[i+31:i]) AND b[i+31:i] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Logical Compute the bitwise NOT of packed 64-bit integers in "a" and then AND with "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := (NOT a[i+63:i]) AND b[i+63:i] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Logical Compute the bitwise AND of packed 64-bit integers in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := a[i+63:i] AND b[i+63:i] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
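In the AND-NOT records above, the NOT applies to "a" only, which is easy to get backwards. A small scalar sketch of the 64-bit zeromask form follows; `ref_maskz_andnot_epi64` is a hypothetical name used only for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Zeromask AND-NOT: dst = (~a) & b where the mask bit is set, else 0. */
static void ref_maskz_andnot_epi64(uint64_t dst[8], uint8_t k,
                                   const uint64_t a[8], const uint64_t b[8]) {
    for (int j = 0; j < 8; j++)
        dst[j] = ((k >> j) & 1) ? (~a[j] & b[j]) : 0u;
}
```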
Integer AVX512F Set Broadcast 8-bit integer "a" to all elements of "dst". FOR j := 0 to 63 i := j*8 dst[i+7:i] := a[7:0] ENDFOR dst[MAX:512] := 0
Integer AVX512F Swizzle Broadcast the low packed 32-bit integer from "a" to all elements of "dst". FOR j := 0 to 15 i := j*32 dst[i+31:i] := a[31:0] ENDFOR dst[MAX:512] := 0
Integer AVX512F Swizzle Broadcast the low packed 32-bit integer from "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := a[31:0] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Set Broadcast 32-bit integer "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := a[31:0] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Swizzle Broadcast the low packed 32-bit integer from "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := a[31:0] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Set Broadcast 32-bit integer "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := a[31:0] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Set Broadcast 32-bit integer "a" to all elements of "dst". FOR j := 0 to 15 i := j*32 dst[i+31:i] := a[31:0] ENDFOR dst[MAX:512] := 0
Integer AVX512F Swizzle Broadcast the low packed 64-bit integer from "a" to all elements of "dst". FOR j := 0 to 7 i := j*64 dst[i+63:i] := a[63:0] ENDFOR dst[MAX:512] := 0
Integer AVX512F Swizzle Broadcast the low packed 64-bit integer from "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := a[63:0] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Set Broadcast 64-bit integer "a" to all elements of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := a[63:0] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Swizzle Broadcast the low packed 64-bit integer from "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := a[63:0] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Set Broadcast 64-bit integer "a" to all elements of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := a[63:0] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Set Broadcast 64-bit integer "a" to all elements of "dst". FOR j := 0 to 7 i := j*64 dst[i+63:i] := a[63:0] ENDFOR dst[MAX:512] := 0
Integer AVX512F Set Broadcast the low packed 16-bit integer from "a" to all elements of "dst". FOR j := 0 to 31 i := j*16 dst[i+15:i] := a[15:0] ENDFOR dst[MAX:512] := 0
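The masked 32-bit broadcast records above can be sketched in scalar C as follows. The names `ref_mask_set1_epi32` and `ref_maskz_set1_epi32` are hypothetical and stand in for whichever intrinsic a given record describes.

```c
#include <assert.h>
#include <stdint.h>

/* Writemask broadcast: selected lanes receive a, the rest keep src. */
static void ref_mask_set1_epi32(uint32_t dst[16], const uint32_t src[16],
                                uint16_t k, uint32_t a) {
    for (int j = 0; j < 16; j++)
        dst[j] = ((k >> j) & 1) ? a : src[j];
}

/* Zeromask broadcast: selected lanes receive a, the rest become 0. */
static void ref_maskz_set1_epi32(uint32_t dst[16], uint16_t k, uint32_t a) {
    for (int j = 0; j < 16; j++)
        dst[j] = ((k >> j) & 1) ? a : 0u;
}
```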
Integer Mask AVX512F Compare Compare packed signed 32-bit integers in "a" and "b" for less-than, and store the results in mask vector "k". FOR j := 0 to 15 i := j*32 k[j] := ( a[i+31:i] < b[i+31:i] ) ? 1 : 0 ENDFOR k[MAX:16] := 0
Integer Mask AVX512F Compare Compare packed signed 32-bit integers in "a" and "b" for less-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k1[j] k[j] := ( a[i+31:i] < b[i+31:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:16] := 0
Integer Mask AVX512F Compare Compare packed signed 64-bit integers in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k". CASE (imm8[2:0]) OF 0: OP := _MM_CMPINT_EQ 1: OP := _MM_CMPINT_LT 2: OP := _MM_CMPINT_LE 3: OP := _MM_CMPINT_FALSE 4: OP := _MM_CMPINT_NE 5: OP := _MM_CMPINT_NLT 6: OP := _MM_CMPINT_NLE 7: OP := _MM_CMPINT_TRUE ESAC FOR j := 0 to 7 i := j*64 k[j] := ( a[i+63:i] OP b[i+63:i] ) ? 1 : 0 ENDFOR k[MAX:8] := 0
Integer Mask AVX512F Compare Compare packed 64-bit integers in "a" and "b" for equality, and store the results in mask vector "k". FOR j := 0 to 7 i := j*64 k[j] := ( a[i+63:i] == b[i+63:i] ) ? 1 : 0 ENDFOR k[MAX:8] := 0
Integer Mask AVX512F Compare Compare packed signed 64-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k". FOR j := 0 to 7 i := j*64 k[j] := ( a[i+63:i] >= b[i+63:i] ) ? 1 : 0 ENDFOR k[MAX:8] := 0
Integer Mask AVX512F Compare Compare packed signed 64-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k". FOR j := 0 to 7 i := j*64 k[j] := ( a[i+63:i] > b[i+63:i] ) ? 1 : 0 ENDFOR k[MAX:8] := 0
Integer Mask AVX512F Compare Compare packed signed 64-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k". FOR j := 0 to 7 i := j*64 k[j] := ( a[i+63:i] <= b[i+63:i] ) ? 1 : 0 ENDFOR k[MAX:8] := 0
Integer Mask AVX512F Compare Compare packed signed 64-bit integers in "a" and "b" for less-than, and store the results in mask vector "k". FOR j := 0 to 7 i := j*64 k[j] := ( a[i+63:i] < b[i+63:i] ) ? 1 : 0 ENDFOR k[MAX:8] := 0
Integer Mask AVX512F Compare Compare packed signed 64-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k". FOR j := 0 to 7 i := j*64 k[j] := ( a[i+63:i] != b[i+63:i] ) ? 1 : 0 ENDFOR k[MAX:8] := 0
Integer Mask AVX512F Compare Compare packed signed 64-bit integers in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). CASE (imm8[2:0]) OF 0: OP := _MM_CMPINT_EQ 1: OP := _MM_CMPINT_LT 2: OP := _MM_CMPINT_LE 3: OP := _MM_CMPINT_FALSE 4: OP := _MM_CMPINT_NE 5: OP := _MM_CMPINT_NLT 6: OP := _MM_CMPINT_NLE 7: OP := _MM_CMPINT_TRUE ESAC FOR j := 0 to 7 i := j*64 IF k1[j] k[j] := ( a[i+63:i] OP b[i+63:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:8] := 0
Integer Mask AVX512F Compare Compare packed 64-bit integers in "a" and "b" for equality, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k1[j] k[j] := ( a[i+63:i] == b[i+63:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:8] := 0
Integer Mask AVX512F Compare Compare packed signed 64-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k1[j] k[j] := ( a[i+63:i] >= b[i+63:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:8] := 0
Integer Mask AVX512F Compare Compare packed signed 64-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k1[j] k[j] := ( a[i+63:i] > b[i+63:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:8] := 0
Integer Mask AVX512F Compare Compare packed signed 64-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k1[j] k[j] := ( a[i+63:i] <= b[i+63:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:8] := 0
Integer Mask AVX512F Compare Compare packed signed 64-bit integers in "a" and "b" for less-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k1[j] k[j] := ( a[i+63:i] < b[i+63:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:8] := 0
Integer Mask AVX512F Compare Compare packed signed 64-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k1[j] k[j] := ( a[i+63:i] != b[i+63:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:8] := 0
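The imm8-driven signed comparison above can be read as a predicate lookup followed by a masked per-element test. The sketch below follows the CASE table from the record (imm8[2:0] values 0 through 7); `cmp_i64` and `ref_mask_cmp_epi64_mask` are hypothetical helper names, and an all-ones `k1` reproduces the unmasked form.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Predicate selected by imm8[2:0], in the order given by the CASE table. */
static bool cmp_i64(int64_t a, int64_t b, int imm8) {
    switch (imm8 & 7) {
    case 0:  return a == b;   /* EQ    */
    case 1:  return a <  b;   /* LT    */
    case 2:  return a <= b;   /* LE    */
    case 3:  return false;    /* FALSE */
    case 4:  return a != b;   /* NE    */
    case 5:  return a >= b;   /* NLT   */
    case 6:  return a >  b;   /* NLE   */
    default: return true;     /* TRUE  */
    }
}

/* Result bit j is set only if k1 bit j is set and the predicate holds. */
static uint8_t ref_mask_cmp_epi64_mask(uint8_t k1, const int64_t a[8],
                                       const int64_t b[8], int imm8) {
    uint8_t k = 0;
    for (int j = 0; j < 8; j++)
        if (((k1 >> j) & 1) && cmp_i64(a[j], b[j], imm8))
            k |= (uint8_t)(1u << j);
    return k;
}
```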
Integer Mask AVX512F Compare Compare packed unsigned 64-bit integers in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k". CASE (imm8[2:0]) OF 0: OP := _MM_CMPINT_EQ 1: OP := _MM_CMPINT_LT 2: OP := _MM_CMPINT_LE 3: OP := _MM_CMPINT_FALSE 4: OP := _MM_CMPINT_NE 5: OP := _MM_CMPINT_NLT 6: OP := _MM_CMPINT_NLE 7: OP := _MM_CMPINT_TRUE ESAC FOR j := 0 to 7 i := j*64 k[j] := ( a[i+63:i] OP b[i+63:i] ) ? 1 : 0 ENDFOR k[MAX:8] := 0
Integer Mask AVX512F Compare Compare packed unsigned 64-bit integers in "a" and "b" for equality, and store the results in mask vector "k". FOR j := 0 to 7 i := j*64 k[j] := ( a[i+63:i] == b[i+63:i] ) ? 1 : 0 ENDFOR k[MAX:8] := 0
Integer Mask AVX512F Compare Compare packed unsigned 64-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k". FOR j := 0 to 7 i := j*64 k[j] := ( a[i+63:i] >= b[i+63:i] ) ? 1 : 0 ENDFOR k[MAX:8] := 0
Integer Mask AVX512F Compare Compare packed unsigned 64-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k". FOR j := 0 to 7 i := j*64 k[j] := ( a[i+63:i] > b[i+63:i] ) ? 1 : 0 ENDFOR k[MAX:8] := 0
Integer Mask AVX512F Compare Compare packed unsigned 64-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k". FOR j := 0 to 7 i := j*64 k[j] := ( a[i+63:i] <= b[i+63:i] ) ? 1 : 0 ENDFOR k[MAX:8] := 0
Integer Mask AVX512F Compare Compare packed unsigned 64-bit integers in "a" and "b" for less-than, and store the results in mask vector "k". FOR j := 0 to 7 i := j*64 k[j] := ( a[i+63:i] < b[i+63:i] ) ? 1 : 0 ENDFOR k[MAX:8] := 0
Integer Mask AVX512F Compare Compare packed unsigned 64-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k". FOR j := 0 to 7 i := j*64 k[j] := ( a[i+63:i] != b[i+63:i] ) ? 1 : 0 ENDFOR k[MAX:8] := 0
Integer Mask AVX512F Compare Compare packed unsigned 64-bit integers in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). CASE (imm8[2:0]) OF 0: OP := _MM_CMPINT_EQ 1: OP := _MM_CMPINT_LT 2: OP := _MM_CMPINT_LE 3: OP := _MM_CMPINT_FALSE 4: OP := _MM_CMPINT_NE 5: OP := _MM_CMPINT_NLT 6: OP := _MM_CMPINT_NLE 7: OP := _MM_CMPINT_TRUE ESAC FOR j := 0 to 7 i := j*64 IF k1[j] k[j] := ( a[i+63:i] OP b[i+63:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:8] := 0
Integer Mask AVX512F Compare Compare packed unsigned 64-bit integers in "a" and "b" for equality, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k1[j] k[j] := ( a[i+63:i] == b[i+63:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:8] := 0
Integer Mask AVX512F Compare Compare packed unsigned 64-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k1[j] k[j] := ( a[i+63:i] >= b[i+63:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:8] := 0
Integer Mask AVX512F Compare Compare packed unsigned 64-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k1[j] k[j] := ( a[i+63:i] > b[i+63:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:8] := 0
Integer Mask AVX512F Compare Compare packed unsigned 64-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k1[j] k[j] := ( a[i+63:i] <= b[i+63:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:8] := 0
Integer Mask AVX512F Compare Compare packed unsigned 64-bit integers in "a" and "b" for less-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k1[j] k[j] := ( a[i+63:i] < b[i+63:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:8] := 0
Integer Mask AVX512F Compare Compare packed unsigned 64-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k1[j] k[j] := ( a[i+63:i] != b[i+63:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:8] := 0
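The signed and unsigned comparison records share identical pseudocode, so the difference lies entirely in how the element bits are interpreted. A short scalar sketch makes this visible for less-than; both function names are hypothetical, and both take the raw lanes as `int64_t` so that the unsigned view is obtained by a well-defined cast.

```c
#include <assert.h>
#include <stdint.h>

/* Signed less-than over eight 64-bit lanes. */
static uint8_t ref_cmplt_epi64_mask(const int64_t a[8], const int64_t b[8]) {
    uint8_t k = 0;
    for (int j = 0; j < 8; j++)
        if (a[j] < b[j]) k |= (uint8_t)(1u << j);
    return k;
}

/* Unsigned less-than on the same bit patterns. */
static uint8_t ref_cmplt_epu64_mask(const int64_t a[8], const int64_t b[8]) {
    uint8_t k = 0;
    for (int j = 0; j < 8; j++)
        if ((uint64_t)a[j] < (uint64_t)b[j]) k |= (uint8_t)(1u << j);
    return k;
}
```

With an element holding -1 (all bits set), the signed form treats it as the smallest of the two operands while the unsigned form treats it as the largest.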
Integer AVX512F Swizzle Contiguously store the active 32-bit integers in "a" (those with their respective bit set in writemask "k") to "dst", and pass through the remaining elements from "src". size := 32 m := 0 FOR j := 0 to 15 i := j*32 IF k[j] dst[m+size-1:m] := a[i+31:i] m := m + size FI ENDFOR dst[511:m] := src[511:m] dst[MAX:512] := 0
Integer AVX512F Store Swizzle Contiguously store the active 32-bit integers in "a" (those with their respective bit set in writemask "k") to unaligned memory at "base_addr". size := 32 m := base_addr FOR j := 0 to 15 i := j*32 IF k[j] MEM[m+size-1:m] := a[i+31:i] m := m + size FI ENDFOR
Integer AVX512F Swizzle Contiguously store the active 32-bit integers in "a" (those with their respective bit set in zeromask "k") to "dst", and set the remaining elements to zero. size := 32 m := 0 FOR j := 0 to 15 i := j*32 IF k[j] dst[m+size-1:m] := a[i+31:i] m := m + size FI ENDFOR dst[511:m] := 0 dst[MAX:512] := 0
Integer AVX512F Swizzle Contiguously store the active 64-bit integers in "a" (those with their respective bit set in writemask "k") to "dst", and pass through the remaining elements from "src". size := 64 m := 0 FOR j := 0 to 7 i := j*64 IF k[j] dst[m+size-1:m] := a[i+63:i] m := m + size FI ENDFOR dst[511:m] := src[511:m] dst[MAX:512] := 0
Integer AVX512F Store Swizzle Contiguously store the active 64-bit integers in "a" (those with their respective bit set in writemask "k") to unaligned memory at "base_addr". size := 64 m := base_addr FOR j := 0 to 7 i := j*64 IF k[j] MEM[m+size-1:m] := a[i+63:i] m := m + size FI ENDFOR
Integer AVX512F Swizzle Contiguously store the active 64-bit integers in "a" (those with their respective bit set in zeromask "k") to "dst", and set the remaining elements to zero. size := 64 m := 0 FOR j := 0 to 7 i := j*64 IF k[j] dst[m+size-1:m] := a[i+63:i] m := m + size FI ENDFOR dst[511:m] := 0 dst[MAX:512] := 0
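The compress records above pack the selected elements toward the low end and then fill the tail. A minimal scalar sketch for the 32-bit register and memory forms is given below; the function names are hypothetical, and the element count returned by the store variant is an addition of this sketch that the pseudocode does not mention.

```c
#include <assert.h>
#include <stdint.h>

/* Register form: active lanes of a are packed into dst[0..m-1]; the
   remaining lanes dst[m..15] pass through from the same positions of src. */
static void ref_mask_compress_epi32(uint32_t dst[16], const uint32_t src[16],
                                    uint16_t k, const uint32_t a[16]) {
    int m = 0;
    for (int j = 0; j < 16; j++)
        if ((k >> j) & 1) dst[m++] = a[j];
    for (; m < 16; m++)
        dst[m] = src[m];
}

/* Memory form: only the active lanes are written, contiguously from base.
   Returns how many elements were written (sketch-only convenience). */
static int ref_mask_compressstoreu_epi32(uint32_t *base, uint16_t k,
                                         const uint32_t a[16]) {
    int m = 0;
    for (int j = 0; j < 16; j++)
        if ((k >> j) & 1) base[m++] = a[j];
    return m;
}
```

The zeromask register form would fill the tail with 0 instead of `src[m]`.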
Integer AVX512F Swizzle Shuffle 32-bit integers in "a" across lanes using the corresponding index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 id := idx[i+3:i]*32 IF k[j] dst[i+31:i] := a[id+31:id] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Swizzle Shuffle 32-bit integers in "a" across lanes using the corresponding index in "idx", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 id := idx[i+3:i]*32 IF k[j] dst[i+31:i] := a[id+31:id] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Swizzle Shuffle 32-bit integers in "a" across lanes using the corresponding index in "idx", and store the results in "dst". FOR j := 0 to 15 i := j*32 id := idx[i+3:i]*32 dst[i+31:i] := a[id+31:id] ENDFOR dst[MAX:512] := 0
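In the single-source shuffle above, only the low four bits of each index element are used (idx[i+3:i]), so higher index bits are ignored. The scalar sketch below models the unmasked form; `ref_permutexvar_epi32` is a hypothetical name, and a temporary buffer is used so the result is correct even if `dst` aliases `a`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* dst[j] = a[idx[j] mod 16] for each of the sixteen 32-bit lanes. */
static void ref_permutexvar_epi32(uint32_t dst[16], const uint32_t idx[16],
                                  const uint32_t a[16]) {
    uint32_t tmp[16];
    for (int j = 0; j < 16; j++)
        tmp[j] = a[idx[j] & 15u];   /* only bits 3:0 of the index are used */
    memcpy(dst, tmp, sizeof tmp);
}
```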
Integer AVX512F Swizzle Shuffle 32-bit integers in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "idx" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 off := idx[i+3:i]*32 IF k[j] dst[i+31:i] := idx[i+4] ? b[off+31:off] : a[off+31:off] ELSE dst[i+31:i] := idx[i+31:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Swizzle Shuffle 32-bit integers in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 off := idx[i+3:i]*32 IF k[j] dst[i+31:i] := idx[i+4] ? b[off+31:off] : a[off+31:off] ELSE dst[i+31:i] := a[i+31:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Swizzle Shuffle 32-bit integers in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 off := idx[i+3:i]*32 IF k[j] dst[i+31:i] := (idx[i+4]) ? b[off+31:off] : a[off+31:off] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Swizzle Shuffle 32-bit integers in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst". FOR j := 0 to 15 i := j*32 off := idx[i+3:i]*32 dst[i+31:i] := idx[i+4] ? b[off+31:off] : a[off+31:off] ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Swizzle Shuffle double-precision (64-bit) floating-point elements in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "idx" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 off := idx[i+2:i]*64 IF k[j] dst[i+63:i] := idx[i+3] ? b[off+63:off] : a[off+63:off] ELSE dst[i+63:i] := idx[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Swizzle Shuffle double-precision (64-bit) floating-point elements in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 off := idx[i+2:i]*64 IF k[j] dst[i+63:i] := idx[i+3] ? b[off+63:off] : a[off+63:off] ELSE dst[i+63:i] := a[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Swizzle Shuffle double-precision (64-bit) floating-point elements in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 off := idx[i+2:i]*64 IF k[j] dst[i+63:i] := (idx[i+3]) ? b[off+63:off] : a[off+63:off] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Swizzle Shuffle double-precision (64-bit) floating-point elements in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst". FOR j := 0 to 7 i := j*64 off := idx[i+2:i]*64 dst[i+63:i] := idx[i+3] ? b[off+63:off] : a[off+63:off] ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Swizzle Shuffle single-precision (32-bit) floating-point elements in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "idx" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 off := idx[i+3:i]*32 IF k[j] dst[i+31:i] := idx[i+4] ? b[off+31:off] : a[off+31:off] ELSE dst[i+31:i] := idx[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Swizzle Shuffle single-precision (32-bit) floating-point elements in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 off := idx[i+3:i]*32 IF k[j] dst[i+31:i] := idx[i+4] ? b[off+31:off] : a[off+31:off] ELSE dst[i+31:i] := a[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Swizzle Shuffle single-precision (32-bit) floating-point elements in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 off := idx[i+3:i]*32 IF k[j] dst[i+31:i] := (idx[i+4]) ? b[off+31:off] : a[off+31:off] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Swizzle Shuffle single-precision (32-bit) floating-point elements in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst". FOR j := 0 to 15 i := j*32 off := idx[i+3:i]*32 dst[i+31:i] := idx[i+4] ? b[off+31:off] : a[off+31:off] ENDFOR dst[MAX:512] := 0
Integer AVX512F Swizzle Shuffle 64-bit integers in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "idx" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 off := idx[i+2:i]*64 IF k[j] dst[i+63:i] := idx[i+3] ? b[off+63:off] : a[off+63:off] ELSE dst[i+63:i] := idx[i+63:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Swizzle Shuffle 64-bit integers in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 off := idx[i+2:i]*64 IF k[j] dst[i+63:i] := idx[i+3] ? b[off+63:off] : a[off+63:off] ELSE dst[i+63:i] := a[i+63:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Swizzle Shuffle 64-bit integers in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 off := idx[i+2:i]*64 IF k[j] dst[i+63:i] := (idx[i+3]) ? b[off+63:off] : a[off+63:off] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Swizzle Shuffle 64-bit integers in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst". FOR j := 0 to 7 i := j*64 off := idx[i+2:i]*64 dst[i+63:i] := idx[i+3] ? b[off+63:off] : a[off+63:off] ENDFOR dst[MAX:512] := 0
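The two-source shuffles above split each index element into an offset (idx[i+2:i] for 64-bit elements) and a one-bit source selector (idx[i+3]). The masked records differ only in where unselected lanes come from: "idx", "a", or zero. The sketch below folds those three cases into one hypothetical function, `ref_permutex2var_epi64`, with a `fallback` pointer as an illustrative device (pass `idx`, `a`, or `NULL` for the zeromask form); `dst` is assumed not to alias the inputs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* For each lane: pick a[off] or b[off] according to the selector bit;
   lanes with a clear mask bit take fallback[j], or 0 if fallback is NULL. */
static void ref_permutex2var_epi64(uint64_t dst[8], uint8_t k,
                                   const uint64_t *fallback,
                                   const uint64_t a[8], const uint64_t idx[8],
                                   const uint64_t b[8]) {
    for (int j = 0; j < 8; j++) {
        unsigned off = (unsigned)(idx[j] & 7u);      /* idx[i+2:i] */
        int from_b = (int)((idx[j] >> 3) & 1u);      /* idx[i+3]   */
        uint64_t picked = from_b ? b[off] : a[off];
        if ((k >> j) & 1)
            dst[j] = picked;
        else
            dst[j] = fallback ? fallback[j] : 0u;
    }
}
```

For 32-bit elements the same pattern applies with a four-bit offset and the selector in bit 4.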
Floating Point AVX512F Swizzle Shuffle double-precision (64-bit) floating-point elements in "a" within 128-bit lanes using the control in "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). IF (imm8[0] == 0) tmp_dst[63:0] := a[63:0]; FI IF (imm8[0] == 1) tmp_dst[63:0] := a[127:64]; FI IF (imm8[1] == 0) tmp_dst[127:64] := a[63:0]; FI IF (imm8[1] == 1) tmp_dst[127:64] := a[127:64]; FI IF (imm8[2] == 0) tmp_dst[191:128] := a[191:128]; FI IF (imm8[2] == 1) tmp_dst[191:128] := a[255:192]; FI IF (imm8[3] == 0) tmp_dst[255:192] := a[191:128]; FI IF (imm8[3] == 1) tmp_dst[255:192] := a[255:192]; FI IF (imm8[4] == 0) tmp_dst[319:256] := a[319:256]; FI IF (imm8[4] == 1) tmp_dst[319:256] := a[383:320]; FI IF (imm8[5] == 0) tmp_dst[383:320] := a[319:256]; FI IF (imm8[5] == 1) tmp_dst[383:320] := a[383:320]; FI IF (imm8[6] == 0) tmp_dst[447:384] := a[447:384]; FI IF (imm8[6] == 1) tmp_dst[447:384] := a[511:448]; FI IF (imm8[7] == 0) tmp_dst[511:448] := a[447:384]; FI IF (imm8[7] == 1) tmp_dst[511:448] := a[511:448]; FI FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := tmp_dst[i+63:i] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Swizzle Shuffle double-precision (64-bit) floating-point elements in "a" within 128-bit lanes using the control in "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). IF (b[1] == 0) tmp_dst[63:0] := a[63:0]; FI IF (b[1] == 1) tmp_dst[63:0] := a[127:64]; FI IF (b[65] == 0) tmp_dst[127:64] := a[63:0]; FI IF (b[65] == 1) tmp_dst[127:64] := a[127:64]; FI IF (b[129] == 0) tmp_dst[191:128] := a[191:128]; FI IF (b[129] == 1) tmp_dst[191:128] := a[255:192]; FI IF (b[193] == 0) tmp_dst[255:192] := a[191:128]; FI IF (b[193] == 1) tmp_dst[255:192] := a[255:192]; FI IF (b[257] == 0) tmp_dst[319:256] := a[319:256]; FI IF (b[257] == 1) tmp_dst[319:256] := a[383:320]; FI IF (b[321] == 0) tmp_dst[383:320] := a[319:256]; FI IF (b[321] == 1) tmp_dst[383:320] := a[383:320]; FI IF (b[385] == 0) tmp_dst[447:384] := a[447:384]; FI IF (b[385] == 1) tmp_dst[447:384] := a[511:448]; FI IF (b[449] == 0) tmp_dst[511:448] := a[447:384]; FI IF (b[449] == 1) tmp_dst[511:448] := a[511:448]; FI FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := tmp_dst[i+63:i] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Swizzle Shuffle double-precision (64-bit) floating-point elements in "a" within 128-bit lanes using the control in "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). IF (imm8[0] == 0) tmp_dst[63:0] := a[63:0]; FI IF (imm8[0] == 1) tmp_dst[63:0] := a[127:64]; FI IF (imm8[1] == 0) tmp_dst[127:64] := a[63:0]; FI IF (imm8[1] == 1) tmp_dst[127:64] := a[127:64]; FI IF (imm8[2] == 0) tmp_dst[191:128] := a[191:128]; FI IF (imm8[2] == 1) tmp_dst[191:128] := a[255:192]; FI IF (imm8[3] == 0) tmp_dst[255:192] := a[191:128]; FI IF (imm8[3] == 1) tmp_dst[255:192] := a[255:192]; FI IF (imm8[4] == 0) tmp_dst[319:256] := a[319:256]; FI IF (imm8[4] == 1) tmp_dst[319:256] := a[383:320]; FI IF (imm8[5] == 0) tmp_dst[383:320] := a[319:256]; FI IF (imm8[5] == 1) tmp_dst[383:320] := a[383:320]; FI IF (imm8[6] == 0) tmp_dst[447:384] := a[447:384]; FI IF (imm8[6] == 1) tmp_dst[447:384] := a[511:448]; FI IF (imm8[7] == 0) tmp_dst[511:448] := a[447:384]; FI IF (imm8[7] == 1) tmp_dst[511:448] := a[511:448]; FI FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := tmp_dst[i+63:i] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Swizzle Shuffle double-precision (64-bit) floating-point elements in "a" within 128-bit lanes using the control in "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). IF (b[1] == 0) tmp_dst[63:0] := a[63:0]; FI IF (b[1] == 1) tmp_dst[63:0] := a[127:64]; FI IF (b[65] == 0) tmp_dst[127:64] := a[63:0]; FI IF (b[65] == 1) tmp_dst[127:64] := a[127:64]; FI IF (b[129] == 0) tmp_dst[191:128] := a[191:128]; FI IF (b[129] == 1) tmp_dst[191:128] := a[255:192]; FI IF (b[193] == 0) tmp_dst[255:192] := a[191:128]; FI IF (b[193] == 1) tmp_dst[255:192] := a[255:192]; FI IF (b[257] == 0) tmp_dst[319:256] := a[319:256]; FI IF (b[257] == 1) tmp_dst[319:256] := a[383:320]; FI IF (b[321] == 0) tmp_dst[383:320] := a[319:256]; FI IF (b[321] == 1) tmp_dst[383:320] := a[383:320]; FI IF (b[385] == 0) tmp_dst[447:384] := a[447:384]; FI IF (b[385] == 1) tmp_dst[447:384] := a[511:448]; FI IF (b[449] == 0) tmp_dst[511:448] := a[447:384]; FI IF (b[449] == 1) tmp_dst[511:448] := a[511:448]; FI FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := tmp_dst[i+63:i] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Swizzle Shuffle double-precision (64-bit) floating-point elements in "a" within 128-bit lanes using the control in "imm8", and store the results in "dst". IF (imm8[0] == 0) dst[63:0] := a[63:0]; FI IF (imm8[0] == 1) dst[63:0] := a[127:64]; FI IF (imm8[1] == 0) dst[127:64] := a[63:0]; FI IF (imm8[1] == 1) dst[127:64] := a[127:64]; FI IF (imm8[2] == 0) dst[191:128] := a[191:128]; FI IF (imm8[2] == 1) dst[191:128] := a[255:192]; FI IF (imm8[3] == 0) dst[255:192] := a[191:128]; FI IF (imm8[3] == 1) dst[255:192] := a[255:192]; FI IF (imm8[4] == 0) dst[319:256] := a[319:256]; FI IF (imm8[4] == 1) dst[319:256] := a[383:320]; FI IF (imm8[5] == 0) dst[383:320] := a[319:256]; FI IF (imm8[5] == 1) dst[383:320] := a[383:320]; FI IF (imm8[6] == 0) dst[447:384] := a[447:384]; FI IF (imm8[6] == 1) dst[447:384] := a[511:448]; FI IF (imm8[7] == 0) dst[511:448] := a[447:384]; FI IF (imm8[7] == 1) dst[511:448] := a[511:448]; FI dst[MAX:512] := 0
Floating Point AVX512F Swizzle Shuffle double-precision (64-bit) floating-point elements in "a" within 128-bit lanes using the control in "b", and store the results in "dst". IF (b[1] == 0) dst[63:0] := a[63:0]; FI IF (b[1] == 1) dst[63:0] := a[127:64]; FI IF (b[65] == 0) dst[127:64] := a[63:0]; FI IF (b[65] == 1) dst[127:64] := a[127:64]; FI IF (b[129] == 0) dst[191:128] := a[191:128]; FI IF (b[129] == 1) dst[191:128] := a[255:192]; FI IF (b[193] == 0) dst[255:192] := a[191:128]; FI IF (b[193] == 1) dst[255:192] := a[255:192]; FI IF (b[257] == 0) dst[319:256] := a[319:256]; FI IF (b[257] == 1) dst[319:256] := a[383:320]; FI IF (b[321] == 0) dst[383:320] := a[319:256]; FI IF (b[321] == 1) dst[383:320] := a[383:320]; FI IF (b[385] == 0) dst[447:384] := a[447:384]; FI IF (b[385] == 1) dst[447:384] := a[511:448]; FI IF (b[449] == 0) dst[511:448] := a[447:384]; FI IF (b[449] == 1) dst[511:448] := a[511:448]; FI dst[MAX:512] := 0
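The in-lane double-precision shuffles above can be sketched as a scalar model in Python. This is only an illustration of the pseudocode, not an implementation of the intrinsics: the function names are hypothetical, vectors are assumed to be lists of 8 values with element 0 holding bits 63:0, and masks are assumed to be plain integers.

```python
def permute_pd_imm(a, imm8):
    """Model of the imm8-controlled form: bit j of imm8 picks the low (0)
    or high (1) element of the 128-bit lane that contains element j."""
    dst = []
    for j in range(8):
        lane_base = (j // 2) * 2        # first element of the 128-bit lane
        sel = (imm8 >> j) & 1           # imm8[j]
        dst.append(a[lane_base + sel])
    return dst


def permute_pd_var(a, b):
    """Model of the vector-controlled form: the selector is bit 1 of each
    64-bit control element (b[1], b[65], b[129], ... in the pseudocode)."""
    dst = []
    for j in range(8):
        lane_base = (j // 2) * 2
        sel = (b[j] >> 1) & 1
        dst.append(a[lane_base + sel])
    return dst


def apply_zeromask(k, tmp):
    """Zero every element whose mask bit is not set."""
    return [x if (k >> j) & 1 else 0.0 for j, x in enumerate(tmp)]


a = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
swapped = permute_pd_imm(a, 0x55)       # swaps the two elements of each lane
```

Note that the vector-controlled form reads bit 1 of each control element, not bit 0, so a control value of 1 selects the low element and a value of 2 selects the high one.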
Floating Point AVX512F Swizzle Shuffle single-precision (32-bit) floating-point elements in "a" within 128-bit lanes using the control in "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). DEFINE SELECT4(src, control) { CASE(control[1:0]) OF 0: tmp[31:0] := src[31:0] 1: tmp[31:0] := src[63:32] 2: tmp[31:0] := src[95:64] 3: tmp[31:0] := src[127:96] ESAC RETURN tmp[31:0] } tmp_dst[31:0] := SELECT4(a[127:0], imm8[1:0]) tmp_dst[63:32] := SELECT4(a[127:0], imm8[3:2]) tmp_dst[95:64] := SELECT4(a[127:0], imm8[5:4]) tmp_dst[127:96] := SELECT4(a[127:0], imm8[7:6]) tmp_dst[159:128] := SELECT4(a[255:128], imm8[1:0]) tmp_dst[191:160] := SELECT4(a[255:128], imm8[3:2]) tmp_dst[223:192] := SELECT4(a[255:128], imm8[5:4]) tmp_dst[255:224] := SELECT4(a[255:128], imm8[7:6]) tmp_dst[287:256] := SELECT4(a[383:256], imm8[1:0]) tmp_dst[319:288] := SELECT4(a[383:256], imm8[3:2]) tmp_dst[351:320] := SELECT4(a[383:256], imm8[5:4]) tmp_dst[383:352] := SELECT4(a[383:256], imm8[7:6]) tmp_dst[415:384] := SELECT4(a[511:384], imm8[1:0]) tmp_dst[447:416] := SELECT4(a[511:384], imm8[3:2]) tmp_dst[479:448] := SELECT4(a[511:384], imm8[5:4]) tmp_dst[511:480] := SELECT4(a[511:384], imm8[7:6]) FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := tmp_dst[i+31:i] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Swizzle Shuffle single-precision (32-bit) floating-point elements in "a" within 128-bit lanes using the control in "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). DEFINE SELECT4(src, control) { CASE(control[1:0]) OF 0: tmp[31:0] := src[31:0] 1: tmp[31:0] := src[63:32] 2: tmp[31:0] := src[95:64] 3: tmp[31:0] := src[127:96] ESAC RETURN tmp[31:0] } tmp_dst[31:0] := SELECT4(a[127:0], b[1:0]) tmp_dst[63:32] := SELECT4(a[127:0], b[33:32]) tmp_dst[95:64] := SELECT4(a[127:0], b[65:64]) tmp_dst[127:96] := SELECT4(a[127:0], b[97:96]) tmp_dst[159:128] := SELECT4(a[255:128], b[129:128]) tmp_dst[191:160] := SELECT4(a[255:128], b[161:160]) tmp_dst[223:192] := SELECT4(a[255:128], b[193:192]) tmp_dst[255:224] := SELECT4(a[255:128], b[225:224]) tmp_dst[287:256] := SELECT4(a[383:256], b[257:256]) tmp_dst[319:288] := SELECT4(a[383:256], b[289:288]) tmp_dst[351:320] := SELECT4(a[383:256], b[321:320]) tmp_dst[383:352] := SELECT4(a[383:256], b[353:352]) tmp_dst[415:384] := SELECT4(a[511:384], b[385:384]) tmp_dst[447:416] := SELECT4(a[511:384], b[417:416]) tmp_dst[479:448] := SELECT4(a[511:384], b[449:448]) tmp_dst[511:480] := SELECT4(a[511:384], b[481:480]) FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := tmp_dst[i+31:i] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Swizzle Shuffle single-precision (32-bit) floating-point elements in "a" within 128-bit lanes using the control in "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). DEFINE SELECT4(src, control) { CASE(control[1:0]) OF 0: tmp[31:0] := src[31:0] 1: tmp[31:0] := src[63:32] 2: tmp[31:0] := src[95:64] 3: tmp[31:0] := src[127:96] ESAC RETURN tmp[31:0] } tmp_dst[31:0] := SELECT4(a[127:0], imm8[1:0]) tmp_dst[63:32] := SELECT4(a[127:0], imm8[3:2]) tmp_dst[95:64] := SELECT4(a[127:0], imm8[5:4]) tmp_dst[127:96] := SELECT4(a[127:0], imm8[7:6]) tmp_dst[159:128] := SELECT4(a[255:128], imm8[1:0]) tmp_dst[191:160] := SELECT4(a[255:128], imm8[3:2]) tmp_dst[223:192] := SELECT4(a[255:128], imm8[5:4]) tmp_dst[255:224] := SELECT4(a[255:128], imm8[7:6]) tmp_dst[287:256] := SELECT4(a[383:256], imm8[1:0]) tmp_dst[319:288] := SELECT4(a[383:256], imm8[3:2]) tmp_dst[351:320] := SELECT4(a[383:256], imm8[5:4]) tmp_dst[383:352] := SELECT4(a[383:256], imm8[7:6]) tmp_dst[415:384] := SELECT4(a[511:384], imm8[1:0]) tmp_dst[447:416] := SELECT4(a[511:384], imm8[3:2]) tmp_dst[479:448] := SELECT4(a[511:384], imm8[5:4]) tmp_dst[511:480] := SELECT4(a[511:384], imm8[7:6]) FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := tmp_dst[i+31:i] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Swizzle Shuffle single-precision (32-bit) floating-point elements in "a" within 128-bit lanes using the control in "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). DEFINE SELECT4(src, control) { CASE(control[1:0]) OF 0: tmp[31:0] := src[31:0] 1: tmp[31:0] := src[63:32] 2: tmp[31:0] := src[95:64] 3: tmp[31:0] := src[127:96] ESAC RETURN tmp[31:0] } tmp_dst[31:0] := SELECT4(a[127:0], b[1:0]) tmp_dst[63:32] := SELECT4(a[127:0], b[33:32]) tmp_dst[95:64] := SELECT4(a[127:0], b[65:64]) tmp_dst[127:96] := SELECT4(a[127:0], b[97:96]) tmp_dst[159:128] := SELECT4(a[255:128], b[129:128]) tmp_dst[191:160] := SELECT4(a[255:128], b[161:160]) tmp_dst[223:192] := SELECT4(a[255:128], b[193:192]) tmp_dst[255:224] := SELECT4(a[255:128], b[225:224]) tmp_dst[287:256] := SELECT4(a[383:256], b[257:256]) tmp_dst[319:288] := SELECT4(a[383:256], b[289:288]) tmp_dst[351:320] := SELECT4(a[383:256], b[321:320]) tmp_dst[383:352] := SELECT4(a[383:256], b[353:352]) tmp_dst[415:384] := SELECT4(a[511:384], b[385:384]) tmp_dst[447:416] := SELECT4(a[511:384], b[417:416]) tmp_dst[479:448] := SELECT4(a[511:384], b[449:448]) tmp_dst[511:480] := SELECT4(a[511:384], b[481:480]) FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := tmp_dst[i+31:i] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Swizzle Shuffle single-precision (32-bit) floating-point elements in "a" within 128-bit lanes using the control in "imm8", and store the results in "dst". DEFINE SELECT4(src, control) { CASE(control[1:0]) OF 0: tmp[31:0] := src[31:0] 1: tmp[31:0] := src[63:32] 2: tmp[31:0] := src[95:64] 3: tmp[31:0] := src[127:96] ESAC RETURN tmp[31:0] } dst[31:0] := SELECT4(a[127:0], imm8[1:0]) dst[63:32] := SELECT4(a[127:0], imm8[3:2]) dst[95:64] := SELECT4(a[127:0], imm8[5:4]) dst[127:96] := SELECT4(a[127:0], imm8[7:6]) dst[159:128] := SELECT4(a[255:128], imm8[1:0]) dst[191:160] := SELECT4(a[255:128], imm8[3:2]) dst[223:192] := SELECT4(a[255:128], imm8[5:4]) dst[255:224] := SELECT4(a[255:128], imm8[7:6]) dst[287:256] := SELECT4(a[383:256], imm8[1:0]) dst[319:288] := SELECT4(a[383:256], imm8[3:2]) dst[351:320] := SELECT4(a[383:256], imm8[5:4]) dst[383:352] := SELECT4(a[383:256], imm8[7:6]) dst[415:384] := SELECT4(a[511:384], imm8[1:0]) dst[447:416] := SELECT4(a[511:384], imm8[3:2]) dst[479:448] := SELECT4(a[511:384], imm8[5:4]) dst[511:480] := SELECT4(a[511:384], imm8[7:6]) dst[MAX:512] := 0
Floating Point AVX512F Swizzle Shuffle single-precision (32-bit) floating-point elements in "a" within 128-bit lanes using the control in "b", and store the results in "dst". DEFINE SELECT4(src, control) { CASE(control[1:0]) OF 0: tmp[31:0] := src[31:0] 1: tmp[31:0] := src[63:32] 2: tmp[31:0] := src[95:64] 3: tmp[31:0] := src[127:96] ESAC RETURN tmp[31:0] } dst[31:0] := SELECT4(a[127:0], b[1:0]) dst[63:32] := SELECT4(a[127:0], b[33:32]) dst[95:64] := SELECT4(a[127:0], b[65:64]) dst[127:96] := SELECT4(a[127:0], b[97:96]) dst[159:128] := SELECT4(a[255:128], b[129:128]) dst[191:160] := SELECT4(a[255:128], b[161:160]) dst[223:192] := SELECT4(a[255:128], b[193:192]) dst[255:224] := SELECT4(a[255:128], b[225:224]) dst[287:256] := SELECT4(a[383:256], b[257:256]) dst[319:288] := SELECT4(a[383:256], b[289:288]) dst[351:320] := SELECT4(a[383:256], b[321:320]) dst[383:352] := SELECT4(a[383:256], b[353:352]) dst[415:384] := SELECT4(a[511:384], b[385:384]) dst[447:416] := SELECT4(a[511:384], b[417:416]) dst[479:448] := SELECT4(a[511:384], b[449:448]) dst[511:480] := SELECT4(a[511:384], b[481:480]) dst[MAX:512] := 0
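The SELECT4-based single-precision shuffles above reuse the same four 2-bit selectors in every 128-bit lane. A minimal scalar sketch in Python, assuming 16-element lists with element 0 in bits 31:0 and hypothetical function names:

```python
def select4(lane, control):
    """SELECT4 from the pseudocode: pick one of four elements using 2 bits."""
    return lane[control & 3]


def permute_ps_imm(a, imm8):
    """imm8 form: selector for position p of every lane is imm8[2p+1:2p]."""
    dst = []
    for j in range(16):
        base = (j // 4) * 4                     # start of the 128-bit lane
        ctrl = (imm8 >> (2 * (j % 4))) & 3
        dst.append(select4(a[base:base + 4], ctrl))
    return dst


def permute_ps_var(a, b):
    """Vector form: selector is the low 2 bits of each 32-bit control element."""
    dst = []
    for j in range(16):
        base = (j // 4) * 4
        dst.append(select4(a[base:base + 4], b[j]))
    return dst


def apply_writemask(src, k, tmp):
    """Keep tmp[j] where mask bit j is set, otherwise copy src[j]."""
    return [t if (k >> j) & 1 else s for j, (s, t) in enumerate(zip(src, tmp))]


a = list(range(16))
reversed_lanes = permute_ps_imm(a, 0x1B)        # 0x1B reverses each lane
```

With the imm8 form every lane gets the same permutation; the vector form allows a different selector per element but still cannot move data between lanes.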
Floating Point AVX512F Swizzle Shuffle double-precision (64-bit) floating-point elements in "a" within 256-bit lanes using the control in "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). DEFINE SELECT4(src, control) { CASE(control[1:0]) OF 0: tmp[63:0] := src[63:0] 1: tmp[63:0] := src[127:64] 2: tmp[63:0] := src[191:128] 3: tmp[63:0] := src[255:192] ESAC RETURN tmp[63:0] } tmp_dst[63:0] := SELECT4(a[255:0], imm8[1:0]) tmp_dst[127:64] := SELECT4(a[255:0], imm8[3:2]) tmp_dst[191:128] := SELECT4(a[255:0], imm8[5:4]) tmp_dst[255:192] := SELECT4(a[255:0], imm8[7:6]) tmp_dst[319:256] := SELECT4(a[511:256], imm8[1:0]) tmp_dst[383:320] := SELECT4(a[511:256], imm8[3:2]) tmp_dst[447:384] := SELECT4(a[511:256], imm8[5:4]) tmp_dst[511:448] := SELECT4(a[511:256], imm8[7:6]) FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := tmp_dst[i+63:i] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Swizzle Shuffle double-precision (64-bit) floating-point elements in "a" across lanes using the corresponding index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 id := idx[i+2:i]*64 IF k[j] dst[i+63:i] := a[id+63:id] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Swizzle Shuffle double-precision (64-bit) floating-point elements in "a" within 256-bit lanes using the control in "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). DEFINE SELECT4(src, control) { CASE(control[1:0]) OF 0: tmp[63:0] := src[63:0] 1: tmp[63:0] := src[127:64] 2: tmp[63:0] := src[191:128] 3: tmp[63:0] := src[255:192] ESAC RETURN tmp[63:0] } tmp_dst[63:0] := SELECT4(a[255:0], imm8[1:0]) tmp_dst[127:64] := SELECT4(a[255:0], imm8[3:2]) tmp_dst[191:128] := SELECT4(a[255:0], imm8[5:4]) tmp_dst[255:192] := SELECT4(a[255:0], imm8[7:6]) tmp_dst[319:256] := SELECT4(a[511:256], imm8[1:0]) tmp_dst[383:320] := SELECT4(a[511:256], imm8[3:2]) tmp_dst[447:384] := SELECT4(a[511:256], imm8[5:4]) tmp_dst[511:448] := SELECT4(a[511:256], imm8[7:6]) FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := tmp_dst[i+63:i] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Swizzle Shuffle double-precision (64-bit) floating-point elements in "a" across lanes using the corresponding index in "idx", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 id := idx[i+2:i]*64 IF k[j] dst[i+63:i] := a[id+63:id] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Swizzle Shuffle double-precision (64-bit) floating-point elements in "a" within 256-bit lanes using the control in "imm8", and store the results in "dst". DEFINE SELECT4(src, control) { CASE(control[1:0]) OF 0: tmp[63:0] := src[63:0] 1: tmp[63:0] := src[127:64] 2: tmp[63:0] := src[191:128] 3: tmp[63:0] := src[255:192] ESAC RETURN tmp[63:0] } dst[63:0] := SELECT4(a[255:0], imm8[1:0]) dst[127:64] := SELECT4(a[255:0], imm8[3:2]) dst[191:128] := SELECT4(a[255:0], imm8[5:4]) dst[255:192] := SELECT4(a[255:0], imm8[7:6]) dst[319:256] := SELECT4(a[511:256], imm8[1:0]) dst[383:320] := SELECT4(a[511:256], imm8[3:2]) dst[447:384] := SELECT4(a[511:256], imm8[5:4]) dst[511:448] := SELECT4(a[511:256], imm8[7:6]) dst[MAX:512] := 0
Floating Point AVX512F Swizzle Shuffle double-precision (64-bit) floating-point elements in "a" across lanes using the corresponding index in "idx", and store the results in "dst". FOR j := 0 to 7 i := j*64 id := idx[i+2:i]*64 dst[i+63:i] := a[id+63:id] ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Swizzle Shuffle single-precision (32-bit) floating-point elements in "a" across lanes using the corresponding index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 id := idx[i+3:i]*32 IF k[j] dst[i+31:i] := a[id+31:id] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Swizzle Shuffle single-precision (32-bit) floating-point elements in "a" across lanes using the corresponding index in "idx", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 id := idx[i+3:i]*32 IF k[j] dst[i+31:i] := a[id+31:id] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Swizzle Shuffle single-precision (32-bit) floating-point elements in "a" across lanes using the corresponding index in "idx", and store the results in "dst". FOR j := 0 to 15 i := j*32 id := idx[i+3:i]*32 dst[i+31:i] := a[id+31:id] ENDFOR dst[MAX:512] := 0
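The cross-lane shuffles above index the whole 512-bit source, and only the low bits of each index are used (3 bits for eight 64-bit elements, 4 bits for sixteen 32-bit elements). A scalar Python sketch under those assumptions, with a hypothetical function name and vectors modelled as lists:

```python
def permutexvar(a, idx):
    """Cross-lane shuffle: dst[j] = a[idx[j] mod n], where n is the element
    count (8 or 16). Higher index bits are ignored, as in idx[i+2:i] and
    idx[i+3:i] in the pseudocode."""
    n = len(a)
    assert n in (8, 16)
    return [a[i & (n - 1)] for i in idx]


def permutexvar_masked(src, k, a, idx):
    """Writemask form: inactive elements are copied from src."""
    tmp = permutexvar(a, idx)
    return [t if (k >> j) & 1 else s for j, (s, t) in enumerate(zip(src, tmp))]


doubles = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0]
reversed_doubles = permutexvar(doubles, [7, 6, 5, 4, 3, 2, 1, 0])
```

Because the index is truncated rather than range-checked, an index of 8 in the 64-bit form selects element 0.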
Integer AVX512F Swizzle Shuffle 64-bit integers in "a" within 256-bit lanes using the control in "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). DEFINE SELECT4(src, control) { CASE(control[1:0]) OF 0: tmp[63:0] := src[63:0] 1: tmp[63:0] := src[127:64] 2: tmp[63:0] := src[191:128] 3: tmp[63:0] := src[255:192] ESAC RETURN tmp[63:0] } tmp_dst[63:0] := SELECT4(a[255:0], imm8[1:0]) tmp_dst[127:64] := SELECT4(a[255:0], imm8[3:2]) tmp_dst[191:128] := SELECT4(a[255:0], imm8[5:4]) tmp_dst[255:192] := SELECT4(a[255:0], imm8[7:6]) tmp_dst[319:256] := SELECT4(a[511:256], imm8[1:0]) tmp_dst[383:320] := SELECT4(a[511:256], imm8[3:2]) tmp_dst[447:384] := SELECT4(a[511:256], imm8[5:4]) tmp_dst[511:448] := SELECT4(a[511:256], imm8[7:6]) FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := tmp_dst[i+63:i] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Swizzle Shuffle 64-bit integers in "a" across lanes using the corresponding index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 id := idx[i+2:i]*64 IF k[j] dst[i+63:i] := a[id+63:id] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Swizzle Shuffle 64-bit integers in "a" within 256-bit lanes using the control in "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). DEFINE SELECT4(src, control) { CASE(control[1:0]) OF 0: tmp[63:0] := src[63:0] 1: tmp[63:0] := src[127:64] 2: tmp[63:0] := src[191:128] 3: tmp[63:0] := src[255:192] ESAC RETURN tmp[63:0] } tmp_dst[63:0] := SELECT4(a[255:0], imm8[1:0]) tmp_dst[127:64] := SELECT4(a[255:0], imm8[3:2]) tmp_dst[191:128] := SELECT4(a[255:0], imm8[5:4]) tmp_dst[255:192] := SELECT4(a[255:0], imm8[7:6]) tmp_dst[319:256] := SELECT4(a[511:256], imm8[1:0]) tmp_dst[383:320] := SELECT4(a[511:256], imm8[3:2]) tmp_dst[447:384] := SELECT4(a[511:256], imm8[5:4]) tmp_dst[511:448] := SELECT4(a[511:256], imm8[7:6]) FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := tmp_dst[i+63:i] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Swizzle Shuffle 64-bit integers in "a" across lanes using the corresponding index in "idx", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 id := idx[i+2:i]*64 IF k[j] dst[i+63:i] := a[id+63:id] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Swizzle Shuffle 64-bit integers in "a" within 256-bit lanes using the control in "imm8", and store the results in "dst". DEFINE SELECT4(src, control) { CASE(control[1:0]) OF 0: tmp[63:0] := src[63:0] 1: tmp[63:0] := src[127:64] 2: tmp[63:0] := src[191:128] 3: tmp[63:0] := src[255:192] ESAC RETURN tmp[63:0] } dst[63:0] := SELECT4(a[255:0], imm8[1:0]) dst[127:64] := SELECT4(a[255:0], imm8[3:2]) dst[191:128] := SELECT4(a[255:0], imm8[5:4]) dst[255:192] := SELECT4(a[255:0], imm8[7:6]) dst[319:256] := SELECT4(a[511:256], imm8[1:0]) dst[383:320] := SELECT4(a[511:256], imm8[3:2]) dst[447:384] := SELECT4(a[511:256], imm8[5:4]) dst[511:448] := SELECT4(a[511:256], imm8[7:6]) dst[MAX:512] := 0
Integer AVX512F Swizzle Shuffle 64-bit integers in "a" across lanes using the corresponding index in "idx", and store the results in "dst". FOR j := 0 to 7 i := j*64 id := idx[i+2:i]*64 dst[i+63:i] := a[id+63:id] ENDFOR dst[MAX:512] := 0
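For the imm8-controlled 64-bit integer shuffles above, the same four 2-bit selectors are applied independently to each 256-bit half. The following Python sketch models that behaviour on lists of 8 integers; the function names are illustrative only.

```python
def permutex_epi64(a, imm8):
    """Shuffle within 256-bit lanes: position p of each half takes the
    element chosen by imm8[2p+1:2p] from that same half."""
    dst = []
    for j in range(8):
        half_base = (j // 4) * 4                # 0 for a[255:0], 4 for a[511:256]
        ctrl = (imm8 >> (2 * (j % 4))) & 3
        dst.append(a[half_base + ctrl])
    return dst


def permutex_epi64_maskz(k, a, imm8):
    """Zeromask form: inactive elements become 0."""
    tmp = permutex_epi64(a, imm8)
    return [t if (k >> j) & 1 else 0 for j, t in enumerate(tmp)]


a = [100, 101, 102, 103, 104, 105, 106, 107]
broadcast_low = permutex_epi64(a, 0x00)         # element 0 of each half, four times
```

An immediate of 0x00 broadcasts the lowest element of each half, which shows that data never crosses the 256-bit boundary in this form.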
Integer AVX512F Swizzle Load contiguous active 32-bit integers from "a" (those with their respective bit set in mask "k"), and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). m := 0 FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := a[m+31:m] m := m + 32 ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Load Swizzle Load contiguous active 32-bit integers from unaligned memory at "mem_addr" (those with their respective bit set in mask "k"), and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). m := 0 FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := MEM[mem_addr+m+31:mem_addr+m] m := m + 32 ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Swizzle Load contiguous active 32-bit integers from "a" (those with their respective bit set in mask "k"), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). m := 0 FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := a[m+31:m] m := m + 32 ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Load Swizzle Load contiguous active 32-bit integers from unaligned memory at "mem_addr" (those with their respective bit set in mask "k"), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). m := 0 FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := MEM[mem_addr+m+31:mem_addr+m] m := m + 32 ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Swizzle Load contiguous active 64-bit integers from "a" (those with their respective bit set in mask "k"), and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). m := 0 FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := a[m+63:m] m := m + 64 ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Load Swizzle Load contiguous active 64-bit integers from unaligned memory at "mem_addr" (those with their respective bit set in mask "k"), and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). m := 0 FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := MEM[mem_addr+m+63:mem_addr+m] m := m + 64 ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Swizzle Load contiguous active 64-bit integers from "a" (those with their respective bit set in mask "k"), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). m := 0 FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := a[m+63:m] m := m + 64 ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Load Swizzle Load contiguous active 64-bit integers from unaligned memory at "mem_addr" (those with their respective bit set in mask "k"), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). m := 0 FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := MEM[mem_addr+m+63:mem_addr+m] m := m + 64 ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
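The expand operations above read source elements contiguously and scatter them into the positions whose mask bit is set. A scalar Python sketch, assuming list-based vectors, an integer mask, and hypothetical function names:

```python
def mask_expand(src, k, a):
    """Writemask form: consecutive elements of `a` fill the active positions;
    inactive positions are copied from `src`."""
    dst = []
    m = 0                                   # next unread source element
    for j in range(len(src)):
        if (k >> j) & 1:
            dst.append(a[m])
            m += 1
        else:
            dst.append(src[j])
    return dst


def maskz_expand(n, k, a):
    """Zeromask form for an n-element destination."""
    return mask_expand([0] * n, k, a)


packed = [10, 11, 12, 13]                   # only popcount(k) elements are read
expanded = maskz_expand(8, 0b10100101, packed)
```

In this model the memory-source forms behave the same way; the source list only needs as many elements as there are set mask bits, which mirrors the pseudocode reading memory only for active elements.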
Integer AVX512F Load Gather 64-bit integers from memory using 32-bit indices. 64-bit elements are loaded from addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst". "scale" should be 1, 2, 4 or 8. FOR j := 0 to 7 i := j*64 m := j*32 addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8 dst[i+63:i] := MEM[addr+63:addr] ENDFOR dst[MAX:512] := 0
Integer AVX512F Load Gather 64-bit integers from memory using 32-bit indices. 64-bit elements are loaded from addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8. FOR j := 0 to 7 i := j*64 m := j*32 IF k[j] addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8 dst[i+63:i] := MEM[addr+63:addr] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Load Gather 32-bit integers from memory using 64-bit indices. 32-bit elements are loaded from addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst". "scale" should be 1, 2, 4 or 8. FOR j := 0 to 7 i := j*32 m := j*64 addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8 dst[i+31:i] := MEM[addr+31:addr] ENDFOR dst[MAX:256] := 0
Integer AVX512F Load Gather 32-bit integers from memory using 64-bit indices. 32-bit elements are loaded from addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8. FOR j := 0 to 7 i := j*32 m := j*64 IF k[j] addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8 dst[i+31:i] := MEM[addr+31:addr] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512F Load Gather 64-bit integers from memory using 64-bit indices. 64-bit elements are loaded from addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst". "scale" should be 1, 2, 4 or 8. FOR j := 0 to 7 i := j*64 m := j*64 addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8 dst[i+63:i] := MEM[addr+63:addr] ENDFOR dst[MAX:512] := 0
Integer AVX512F Load Gather 64-bit integers from memory using 64-bit indices. 64-bit elements are loaded from addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8. FOR j := 0 to 7 i := j*64 m := j*64 IF k[j] addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8 dst[i+63:i] := MEM[addr+63:addr] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
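The gather address computation above can be illustrated with a scalar Python model. The pseudocode multiplies by 8 because it addresses memory in bits; the sketch below works in bytes instead. Memory is assumed to be a little-endian `bytes` object, indices are assumed to be already sign-extended Python integers, and the function names are hypothetical.

```python
import struct


def gather_epi64(vindex, mem, base_addr, scale):
    """Unmasked form: load one 64-bit integer per index from
    base_addr + index * scale (byte address)."""
    assert scale in (1, 2, 4, 8)
    dst = []
    for index in vindex:
        addr = base_addr + index * scale
        dst.append(struct.unpack_from("<q", mem, addr)[0])
    return dst


def mask_gather_epi64(src, k, vindex, mem, base_addr, scale):
    """Writemask form: memory is read only for active elements."""
    assert scale in (1, 2, 4, 8)
    dst = []
    for j, index in enumerate(vindex):
        if (k >> j) & 1:
            addr = base_addr + index * scale
            dst.append(struct.unpack_from("<q", mem, addr)[0])
        else:
            dst.append(src[j])
    return dst


mem = struct.pack("<8q", *range(100, 108))  # eight consecutive 64-bit values
gathered = gather_epi64([7, 6, 5, 4, 3, 2, 1, 0], mem, 0, 8)
```

With `scale` set to 8 the indices act as element indices into an array of 64-bit values; negative indices address memory below `base_addr`.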
Integer AVX512F Special Math Functions Compare packed signed 32-bit integers in "a" and "b", and store packed maximum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := MAX(a[i+31:i], b[i+31:i]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Special Math Functions Compare packed signed 64-bit integers in "a" and "b", and store packed maximum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := MAX(a[i+63:i], b[i+63:i]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Special Math Functions Compare packed signed 64-bit integers in "a" and "b", and store packed maximum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := MAX(a[i+63:i], b[i+63:i]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Special Math Functions Compare packed signed 64-bit integers in "a" and "b", and store packed maximum values in "dst". FOR j := 0 to 7 i := j*64 dst[i+63:i] := MAX(a[i+63:i], b[i+63:i]) ENDFOR dst[MAX:512] := 0
Integer AVX512F Special Math Functions Compare packed unsigned 32-bit integers in "a" and "b", and store packed maximum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := MAX(a[i+31:i], b[i+31:i]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Special Math Functions Compare packed unsigned 64-bit integers in "a" and "b", and store packed maximum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := MAX(a[i+63:i], b[i+63:i]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Special Math Functions Compare packed unsigned 64-bit integers in "a" and "b", and store packed maximum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := MAX(a[i+63:i], b[i+63:i]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Special Math Functions Compare packed unsigned 64-bit integers in "a" and "b", and store packed maximum values in "dst". FOR j := 0 to 7 i := j*64 dst[i+63:i] := MAX(a[i+63:i], b[i+63:i]) ENDFOR dst[MAX:512] := 0
Integer AVX512F Special Math Functions Compare packed signed 32-bit integers in "a" and "b", and store packed minimum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := MIN(a[i+31:i], b[i+31:i]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Special Math Functions Compare packed signed 64-bit integers in "a" and "b", and store packed minimum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := MIN(a[i+63:i], b[i+63:i]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Special Math Functions Compare packed signed 64-bit integers in "a" and "b", and store packed minimum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := MIN(a[i+63:i], b[i+63:i]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Special Math Functions Compare packed signed 64-bit integers in "a" and "b", and store packed minimum values in "dst". FOR j := 0 to 7 i := j*64 dst[i+63:i] := MIN(a[i+63:i], b[i+63:i]) ENDFOR dst[MAX:512] := 0
Integer AVX512F Special Math Functions Compare packed unsigned 32-bit integers in "a" and "b", and store packed minimum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := MIN(a[i+31:i], b[i+31:i]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Special Math Functions Compare packed unsigned 64-bit integers in "a" and "b", and store packed minimum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := MIN(a[i+63:i], b[i+63:i]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Special Math Functions Compare packed unsigned 64-bit integers in "a" and "b", and store packed minimum values in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := MIN(a[i+63:i], b[i+63:i]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Special Math Functions Compare packed unsigned 64-bit integers in "a" and "b", and store packed minimum values in "dst". FOR j := 0 to 7 i := j*64 dst[i+63:i] := MIN(a[i+63:i], b[i+63:i]) ENDFOR dst[MAX:512] := 0
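The signed and unsigned MAX/MIN entries above share identical pseudocode; they differ only in how each bit pattern is interpreted. The Python sketch below makes that difference explicit. Elements are assumed to be raw non-negative bit patterns, and the function names are illustrative.

```python
def to_signed(x, bits):
    """Interpret a raw bit pattern as a two's-complement signed integer."""
    x &= (1 << bits) - 1
    return x - (1 << bits) if x >> (bits - 1) else x


def packed_minmax(a, b, bits, signed, op):
    """Element-wise MAX or MIN (op is the built-in max or min).
    The result keeps the raw bit pattern of the chosen operand."""
    mask = (1 << bits) - 1
    if signed:
        key = lambda v: to_signed(v, bits)
    else:
        key = lambda v: v
    return [op(x & mask, y & mask, key=key) for x, y in zip(a, b)]


def apply_zeromask(k, tmp):
    """Zero every element whose mask bit is not set."""
    return [t if (k >> j) & 1 else 0 for j, t in enumerate(tmp)]


a = [0xFFFFFFFF, 5]
b = [1, 7]
signed_max = packed_minmax(a, b, 32, True, max)     # 0xFFFFFFFF is -1 when signed
unsigned_max = packed_minmax(a, b, 32, False, max)
```

The pattern 0xFFFFFFFF loses a signed comparison against 1 but wins the unsigned one, which is the only behavioural difference between the two families.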
Integer AVX512F Convert Convert packed 32-bit integers in "a" to packed 8-bit integers with truncation, and store the results in "dst". FOR j := 0 to 15 i := 32*j l := 8*j dst[l+7:l] := Truncate8(a[i+31:i]) ENDFOR dst[MAX:128] := 0
Integer AVX512F Convert Convert packed 32-bit integers in "a" to packed 8-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := 32*j l := 8*j IF k[j] dst[l+7:l] := Truncate8(a[i+31:i]) ELSE dst[l+7:l] := src[l+7:l] FI ENDFOR dst[MAX:128] := 0
Integer AVX512F Convert Store Convert packed 32-bit integers in "a" to packed 8-bit integers with truncation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr". FOR j := 0 to 15 i := 32*j l := 8*j IF k[j] MEM[base_addr+l+7:base_addr+l] := Truncate8(a[i+31:i]) FI ENDFOR
Integer AVX512F Convert Convert packed 32-bit integers in "a" to packed 8-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := 32*j l := 8*j IF k[j] dst[l+7:l] := Truncate8(a[i+31:i]) ELSE dst[l+7:l] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512F Convert Convert packed 32-bit integers in "a" to packed 16-bit integers with truncation, and store the results in "dst". FOR j := 0 to 15 i := 32*j l := 16*j dst[l+15:l] := Truncate16(a[i+31:i]) ENDFOR dst[MAX:256] := 0
Integer AVX512F Convert Convert packed 32-bit integers in "a" to packed 16-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := 32*j l := 16*j IF k[j] dst[l+15:l] := Truncate16(a[i+31:i]) ELSE dst[l+15:l] := src[l+15:l] FI ENDFOR dst[MAX:256] := 0
Integer AVX512F Convert Store Convert packed 32-bit integers in "a" to packed 16-bit integers with truncation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr". FOR j := 0 to 15 i := 32*j l := 16*j IF k[j] MEM[base_addr+l+15:base_addr+l] := Truncate16(a[i+31:i]) FI ENDFOR
Integer AVX512F Convert Convert packed 32-bit integers in "a" to packed 16-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := 32*j l := 16*j IF k[j] dst[l+15:l] := Truncate16(a[i+31:i]) ELSE dst[l+15:l] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512F Convert Convert packed 64-bit integers in "a" to packed 8-bit integers with truncation, and store the results in "dst". FOR j := 0 to 7 i := 64*j k := 8*j dst[k+7:k] := Truncate8(a[i+63:i]) ENDFOR dst[MAX:64] := 0
Integer AVX512F Convert Convert packed 64-bit integers in "a" to packed 8-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := 64*j l := 8*j IF k[j] dst[l+7:l] := Truncate8(a[i+63:i]) ELSE dst[l+7:l] := src[l+7:l] FI ENDFOR dst[MAX:64] := 0
Integer AVX512F Convert Store Convert packed 64-bit integers in "a" to packed 8-bit integers with truncation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr". FOR j := 0 to 7 i := 64*j l := 8*j IF k[j] MEM[base_addr+l+7:base_addr+l] := Truncate8(a[i+63:i]) FI ENDFOR
Integer AVX512F Convert Convert packed 64-bit integers in "a" to packed 8-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := 64*j l := 8*j IF k[j] dst[l+7:l] := Truncate8(a[i+63:i]) ELSE dst[l+7:l] := 0 FI ENDFOR dst[MAX:64] := 0
Integer AVX512F Convert Convert packed 64-bit integers in "a" to packed 32-bit integers with truncation, and store the results in "dst". FOR j := 0 to 7 i := 64*j k := 32*j dst[k+31:k] := Truncate32(a[i+63:i]) ENDFOR dst[MAX:256] := 0
Integer AVX512F Convert Convert packed 64-bit integers in "a" to packed 32-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := 64*j l := 32*j IF k[j] dst[l+31:l] := Truncate32(a[i+63:i]) ELSE dst[l+31:l] := src[l+31:l] FI ENDFOR dst[MAX:256] := 0
Integer AVX512F Convert Store Convert packed 64-bit integers in "a" to packed 32-bit integers with truncation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr". FOR j := 0 to 7 i := 64*j l := 32*j IF k[j] MEM[base_addr+l+31:base_addr+l] := Truncate32(a[i+63:i]) FI ENDFOR
Integer AVX512F Convert Convert packed 64-bit integers in "a" to packed 32-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := 64*j l := 32*j IF k[j] dst[l+31:l] := Truncate32(a[i+63:i]) ELSE dst[l+31:l] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512F Convert Convert packed 64-bit integers in "a" to packed 16-bit integers with truncation, and store the results in "dst". FOR j := 0 to 7 i := 64*j k := 16*j dst[k+15:k] := Truncate16(a[i+63:i]) ENDFOR dst[MAX:128] := 0
Integer AVX512F Convert Convert packed 64-bit integers in "a" to packed 16-bit integers with truncation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := 64*j l := 16*j IF k[j] dst[l+15:l] := Truncate16(a[i+63:i]) ELSE dst[l+15:l] := src[l+15:l] FI ENDFOR dst[MAX:128] := 0
Integer AVX512F Convert Store Convert packed 64-bit integers in "a" to packed 16-bit integers with truncation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr". FOR j := 0 to 7 i := 64*j l := 16*j IF k[j] MEM[base_addr+l+15:base_addr+l] := Truncate16(a[i+63:i]) FI ENDFOR
Integer AVX512F Convert Convert packed 64-bit integers in "a" to packed 16-bit integers with truncation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := 64*j l := 16*j IF k[j] dst[l+15:l] := Truncate16(a[i+63:i]) ELSE dst[l+15:l] := 0 FI ENDFOR dst[MAX:128] := 0
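The truncating conversions above come in three forms: unmasked, writemask (inactive lanes copied from "src"), and zeromask (inactive lanes set to 0). A small Python sketch of all three, assuming vectors are modelled as lists of unsigned lane values and the mask as a plain integer; `truncate_lanes` and its parameters are hypothetical names chosen for illustration.

```python
def truncate_lanes(a, dst_bits, k=None, src=None):
    """Keep the low dst_bits of each lane of `a`.

    k   : optional integer mask; bit j selects lane j (None means all lanes active)
    src : fallback lanes for the writemask form; if None, inactive lanes are zeroed
    """
    keep = (1 << dst_bits) - 1
    out = []
    for j, lane in enumerate(a):
        active = k is None or (k >> j) & 1
        if active:
            out.append(lane & keep)      # Truncate8 / Truncate16 / Truncate32
        elif src is not None:
            out.append(src[j])           # writemask: copy from src
        else:
            out.append(0)                # zeromask: zero the lane
    return out


a = [0x12345678, 0xFFFFFFFF, 0x00000100, 0x0000007F]
print([hex(x) for x in truncate_lanes(a, 8)])
print([hex(x) for x in truncate_lanes(a, 8, k=0b0101)])
print([hex(x) for x in truncate_lanes(a, 8, k=0b0101, src=[1, 2, 3, 4])])
```

Truncation discards the high bits without regard to sign, so `0x00000100` becomes `0x00` rather than clamping to the largest 8-bit value.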
Integer AVX512F Convert Convert packed signed 32-bit integers in "a" to packed 8-bit integers with signed saturation, and store the results in "dst". FOR j := 0 to 15 i := 32*j k := 8*j dst[k+7:k] := Saturate8(a[i+31:i]) ENDFOR dst[MAX:128] := 0
Integer AVX512F Convert Convert packed signed 32-bit integers in "a" to packed 8-bit integers with signed saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := 32*j l := 8*j IF k[j] dst[l+7:l] := Saturate8(a[i+31:i]) ELSE dst[l+7:l] := src[l+7:l] FI ENDFOR dst[MAX:128] := 0
Integer AVX512F Convert Store Convert packed signed 32-bit integers in "a" to packed 8-bit integers with signed saturation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr". FOR j := 0 to 15 i := 32*j l := 8*j IF k[j] MEM[base_addr+l+7:base_addr+l] := Saturate8(a[i+31:i]) FI ENDFOR
Integer AVX512F Convert Convert packed signed 32-bit integers in "a" to packed 8-bit integers with signed saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := 32*j l := 8*j IF k[j] dst[l+7:l] := Saturate8(a[i+31:i]) ELSE dst[l+7:l] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512F Convert Convert packed signed 32-bit integers in "a" to packed 16-bit integers with signed saturation, and store the results in "dst". FOR j := 0 to 15 i := 32*j k := 16*j dst[k+15:k] := Saturate16(a[i+31:i]) ENDFOR dst[MAX:256] := 0
Integer AVX512F Convert Convert packed signed 32-bit integers in "a" to packed 16-bit integers with signed saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := 32*j l := 16*j IF k[j] dst[l+15:l] := Saturate16(a[i+31:i]) ELSE dst[l+15:l] := src[l+15:l] FI ENDFOR dst[MAX:256] := 0
Integer AVX512F Convert Store Convert packed signed 32-bit integers in "a" to packed 16-bit integers with signed saturation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr". FOR j := 0 to 15 i := 32*j l := 16*j IF k[j] MEM[base_addr+l+15:base_addr+l] := Saturate16(a[i+31:i]) FI ENDFOR
Integer AVX512F Convert Convert packed signed 32-bit integers in "a" to packed 16-bit integers with signed saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := 32*j l := 16*j IF k[j] dst[l+15:l] := Saturate16(a[i+31:i]) ELSE dst[l+15:l] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512F Convert Convert packed signed 64-bit integers in "a" to packed 8-bit integers with signed saturation, and store the results in "dst". FOR j := 0 to 7 i := 64*j k := 8*j dst[k+7:k] := Saturate8(a[i+63:i]) ENDFOR dst[MAX:64] := 0
Integer AVX512F Convert Convert packed signed 64-bit integers in "a" to packed 8-bit integers with signed saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := 64*j l := 8*j IF k[j] dst[l+7:l] := Saturate8(a[i+63:i]) ELSE dst[l+7:l] := src[l+7:l] FI ENDFOR dst[MAX:64] := 0
Integer AVX512F Convert Store Convert packed signed 64-bit integers in "a" to packed 8-bit integers with signed saturation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr". FOR j := 0 to 7 i := 64*j l := 8*j IF k[j] MEM[base_addr+l+7:base_addr+l] := Saturate8(a[i+63:i]) FI ENDFOR
Integer AVX512F Convert Convert packed signed 64-bit integers in "a" to packed 8-bit integers with signed saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := 64*j l := 8*j IF k[j] dst[l+7:l] := Saturate8(a[i+63:i]) ELSE dst[l+7:l] := 0 FI ENDFOR dst[MAX:64] := 0
Integer AVX512F Convert Convert packed signed 64-bit integers in "a" to packed 32-bit integers with signed saturation, and store the results in "dst". FOR j := 0 to 7 i := 64*j k := 32*j dst[k+31:k] := Saturate32(a[i+63:i]) ENDFOR dst[MAX:256] := 0
Integer AVX512F Convert Convert packed signed 64-bit integers in "a" to packed 32-bit integers with signed saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := 64*j l := 32*j IF k[j] dst[l+31:l] := Saturate32(a[i+63:i]) ELSE dst[l+31:l] := src[l+31:l] FI ENDFOR dst[MAX:256] := 0
Integer AVX512F Convert Store Convert packed signed 64-bit integers in "a" to packed 32-bit integers with signed saturation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr". FOR j := 0 to 7 i := 64*j l := 32*j IF k[j] MEM[base_addr+l+31:base_addr+l] := Saturate32(a[i+63:i]) FI ENDFOR
Integer AVX512F Convert Convert packed signed 64-bit integers in "a" to packed 32-bit integers with signed saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := 64*j l := 32*j IF k[j] dst[l+31:l] := Saturate32(a[i+63:i]) ELSE dst[l+31:l] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512F Convert Convert packed signed 64-bit integers in "a" to packed 16-bit integers with signed saturation, and store the results in "dst". FOR j := 0 to 7 i := 64*j k := 16*j dst[k+15:k] := Saturate16(a[i+63:i]) ENDFOR dst[MAX:128] := 0
Integer AVX512F Convert Convert packed signed 64-bit integers in "a" to packed 16-bit integers with signed saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := 64*j l := 16*j IF k[j] dst[l+15:l] := Saturate16(a[i+63:i]) ELSE dst[l+15:l] := src[l+15:l] FI ENDFOR dst[MAX:128] := 0
Integer AVX512F Convert Store Convert packed signed 64-bit integers in "a" to packed 16-bit integers with signed saturation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr". FOR j := 0 to 7 i := 64*j l := 16*j IF k[j] MEM[base_addr+l+15:base_addr+l] := Saturate16(a[i+63:i]) FI ENDFOR
Integer AVX512F Convert Convert packed signed 64-bit integers in "a" to packed 16-bit integers with signed saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := 64*j l := 16*j IF k[j] dst[l+15:l] := Saturate16(a[i+63:i]) ELSE dst[l+15:l] := 0 FI ENDFOR dst[MAX:128] := 0
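Signed saturation differs from truncation in that out-of-range values clamp to the nearest representable value of the narrower type. The sketch below is one plausible reading of Saturate8, Saturate16 and Saturate32 as used above, assuming each lane arrives as an unsigned two's-complement bit pattern; `saturate_signed` is an illustrative helper, not part of any library.

```python
def saturate_signed(value, src_bits, dst_bits):
    """Clamp a signed src_bits value into the signed dst_bits range.

    `value` is the raw bit pattern of the source lane; the result is the
    raw bit pattern of the destination lane.
    """
    value &= (1 << src_bits) - 1
    if value >= 1 << (src_bits - 1):          # reinterpret as negative
        value -= 1 << src_bits
    lo = -(1 << (dst_bits - 1))
    hi = (1 << (dst_bits - 1)) - 1
    clamped = max(lo, min(hi, value))
    return clamped & ((1 << dst_bits) - 1)    # back to a bit pattern


print(hex(saturate_signed(300, 32, 8)))         # above the int8 range
print(hex(saturate_signed(0xFFFFFF00, 32, 8)))  # -256, below the int8 range
print(hex(saturate_signed(0xFFFFFFFF, 32, 8)))  # -1, already in range
```

The masked variants apply the same per-lane function and then select between the converted lane, the "src" lane, or zero according to the mask bit.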
Integer AVX512F Convert Sign extend packed 8-bit integers in "a" to packed 32-bit integers, and store the results in "dst". FOR j := 0 to 15 i := 32*j k := 8*j dst[i+31:i] := SignExtend32(a[k+7:k]) ENDFOR dst[MAX:512] := 0
Integer AVX512F Convert Sign extend packed 8-bit integers in "a" to packed 32-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := 32*j l := 8*j IF k[j] dst[i+31:i] := SignExtend32(a[l+7:l]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Convert Sign extend packed 8-bit integers in "a" to packed 32-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := 32*j l := 8*j IF k[j] dst[i+31:i] := SignExtend32(a[l+7:l]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Convert Sign extend packed 8-bit integers in the low 8 bytes of "a" to packed 64-bit integers, and store the results in "dst". FOR j := 0 to 7 i := 64*j k := 8*j dst[i+63:i] := SignExtend64(a[k+7:k]) ENDFOR dst[MAX:512] := 0
Integer AVX512F Convert Sign extend packed 8-bit integers in the low 8 bytes of "a" to packed 64-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := 64*j l := 8*j IF k[j] dst[i+63:i] := SignExtend64(a[l+7:l]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Convert Sign extend packed 8-bit integers in the low 8 bytes of "a" to packed 64-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := 64*j l := 8*j IF k[j] dst[i+63:i] := SignExtend64(a[l+7:l]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Convert Sign extend packed 32-bit integers in "a" to packed 64-bit integers, and store the results in "dst". FOR j := 0 to 7 i := 64*j k := 32*j dst[i+63:i] := SignExtend64(a[k+31:k]) ENDFOR dst[MAX:512] := 0
Integer AVX512F Convert Sign extend packed 32-bit integers in "a" to packed 64-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := 64*j l := 32*j IF k[j] dst[i+63:i] := SignExtend64(a[l+31:l]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Convert Sign extend packed 32-bit integers in "a" to packed 64-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := 64*j l := 32*j IF k[j] dst[i+63:i] := SignExtend64(a[l+31:l]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Convert Sign extend packed 16-bit integers in "a" to packed 32-bit integers, and store the results in "dst". FOR j := 0 to 15 i := 32*j k := 16*j dst[i+31:i] := SignExtend32(a[k+15:k]) ENDFOR dst[MAX:512] := 0
Integer AVX512F Convert Sign extend packed 16-bit integers in "a" to packed 32-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 l := j*16 IF k[j] dst[i+31:i] := SignExtend32(a[l+15:l]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Convert Sign extend packed 16-bit integers in "a" to packed 32-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := 32*j l := 16*j IF k[j] dst[i+31:i] := SignExtend32(a[l+15:l]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Convert Sign extend packed 16-bit integers in "a" to packed 64-bit integers, and store the results in "dst". FOR j := 0 to 7 i := 64*j k := 16*j dst[i+63:i] := SignExtend64(a[k+15:k]) ENDFOR dst[MAX:512] := 0
Integer AVX512F Convert Sign extend packed 16-bit integers in "a" to packed 64-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := 64*j l := 16*j IF k[j] dst[i+63:i] := SignExtend64(a[l+15:l]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Convert Sign extend packed 16-bit integers in "a" to packed 64-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := 64*j l := 16*j IF k[j] dst[i+63:i] := SignExtend64(a[l+15:l]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
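Sign extension widens a lane by replicating its top bit into the new high bits, so negative values stay negative. A short Python sketch of SignExtend32 and SignExtend64 as used above, assuming lanes are passed as unsigned bit patterns; the helper name `sign_extend` is an assumption made for this example.

```python
def sign_extend(value, src_bits, dst_bits):
    """Widen a signed src_bits bit pattern to a dst_bits bit pattern."""
    value &= (1 << src_bits) - 1
    if value >> (src_bits - 1):               # top bit set: value is negative
        value -= 1 << src_bits
    return value & ((1 << dst_bits) - 1)      # two's-complement pattern in dst_bits


print(hex(sign_extend(0x80, 8, 32)))     # int8 -128 widened to 32 bits
print(hex(sign_extend(0x7F, 8, 64)))     # positive values are unchanged
print(hex(sign_extend(0xFFFF, 16, 64)))  # int16 -1 widened to 64 bits
```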
Integer AVX512F Convert Convert packed unsigned 32-bit integers in "a" to packed unsigned 8-bit integers with unsigned saturation, and store the results in "dst". FOR j := 0 to 15 i := 32*j k := 8*j dst[k+7:k] := SaturateU8(a[i+31:i]) ENDFOR dst[MAX:128] := 0
Integer AVX512F Convert Convert packed unsigned 32-bit integers in "a" to packed unsigned 8-bit integers with unsigned saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := 32*j l := 8*j IF k[j] dst[l+7:l] := SaturateU8(a[i+31:i]) ELSE dst[l+7:l] := src[l+7:l] FI ENDFOR dst[MAX:128] := 0
Integer AVX512F Convert Store Convert packed unsigned 32-bit integers in "a" to packed 8-bit integers with unsigned saturation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr". FOR j := 0 to 15 i := 32*j l := 8*j IF k[j] MEM[base_addr+l+7:base_addr+l] := SaturateU8(a[i+31:i]) FI ENDFOR
Integer AVX512F Convert Convert packed unsigned 32-bit integers in "a" to packed unsigned 8-bit integers with unsigned saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := 32*j l := 8*j IF k[j] dst[l+7:l] := SaturateU8(a[i+31:i]) ELSE dst[l+7:l] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512F Convert Convert packed unsigned 32-bit integers in "a" to packed unsigned 16-bit integers with unsigned saturation, and store the results in "dst". FOR j := 0 to 15 i := 32*j k := 16*j dst[k+15:k] := SaturateU16(a[i+31:i]) ENDFOR dst[MAX:256] := 0
Integer AVX512F Convert Convert packed unsigned 32-bit integers in "a" to packed unsigned 16-bit integers with unsigned saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := 32*j l := 16*j IF k[j] dst[l+15:l] := SaturateU16(a[i+31:i]) ELSE dst[l+15:l] := src[l+15:l] FI ENDFOR dst[MAX:256] := 0
Integer AVX512F Convert Store Convert packed unsigned 32-bit integers in "a" to packed 16-bit integers with unsigned saturation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr". FOR j := 0 to 15 i := 32*j l := 16*j IF k[j] MEM[base_addr+l+15:base_addr+l] := SaturateU16(a[i+31:i]) FI ENDFOR
Integer AVX512F Convert Convert packed unsigned 32-bit integers in "a" to packed unsigned 16-bit integers with unsigned saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := 32*j l := 16*j IF k[j] dst[l+15:l] := SaturateU16(a[i+31:i]) ELSE dst[l+15:l] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512F Convert Convert packed unsigned 64-bit integers in "a" to packed unsigned 8-bit integers with unsigned saturation, and store the results in "dst". FOR j := 0 to 7 i := 64*j k := 8*j dst[k+7:k] := SaturateU8(a[i+63:i]) ENDFOR dst[MAX:64] := 0
Integer AVX512F Convert Convert packed unsigned 64-bit integers in "a" to packed unsigned 8-bit integers with unsigned saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := 64*j l := 8*j IF k[j] dst[l+7:l] := SaturateU8(a[i+63:i]) ELSE dst[l+7:l] := src[l+7:l] FI ENDFOR dst[MAX:64] := 0
Integer AVX512F Convert Store Convert packed unsigned 64-bit integers in "a" to packed 8-bit integers with unsigned saturation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr". FOR j := 0 to 7 i := 64*j l := 8*j IF k[j] MEM[base_addr+l+7:base_addr+l] := SaturateU8(a[i+63:i]) FI ENDFOR
Integer AVX512F Convert Convert packed unsigned 64-bit integers in "a" to packed unsigned 8-bit integers with unsigned saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := 64*j l := 8*j IF k[j] dst[l+7:l] := SaturateU8(a[i+63:i]) ELSE dst[l+7:l] := 0 FI ENDFOR dst[MAX:64] := 0
Integer AVX512F Convert Convert packed unsigned 64-bit integers in "a" to packed unsigned 32-bit integers with unsigned saturation, and store the results in "dst". FOR j := 0 to 7 i := 64*j k := 32*j dst[k+31:k] := SaturateU32(a[i+63:i]) ENDFOR dst[MAX:256] := 0
Integer AVX512F Convert Convert packed unsigned 64-bit integers in "a" to packed unsigned 32-bit integers with unsigned saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := 64*j l := 32*j IF k[j] dst[l+31:l] := SaturateU32(a[i+63:i]) ELSE dst[l+31:l] := src[l+31:l] FI ENDFOR dst[MAX:256] := 0
Integer AVX512F Convert Store Convert packed unsigned 64-bit integers in "a" to packed 32-bit integers with unsigned saturation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr". FOR j := 0 to 7 i := 64*j l := 32*j IF k[j] MEM[base_addr+l+31:base_addr+l] := SaturateU32(a[i+63:i]) FI ENDFOR
Integer AVX512F Convert Convert packed unsigned 64-bit integers in "a" to packed unsigned 32-bit integers with unsigned saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := 64*j l := 32*j IF k[j] dst[l+31:l] := SaturateU32(a[i+63:i]) ELSE dst[l+31:l] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512F Convert Convert packed unsigned 64-bit integers in "a" to packed unsigned 16-bit integers with unsigned saturation, and store the results in "dst". FOR j := 0 to 7 i := 64*j k := 16*j dst[k+15:k] := SaturateU16(a[i+63:i]) ENDFOR dst[MAX:128] := 0
Integer AVX512F Convert Convert packed unsigned 64-bit integers in "a" to packed unsigned 16-bit integers with unsigned saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := 64*j l := 16*j IF k[j] dst[l+15:l] := SaturateU16(a[i+63:i]) ELSE dst[l+15:l] := src[l+15:l] FI ENDFOR dst[MAX:128] := 0
Integer AVX512F Convert Store Convert packed unsigned 64-bit integers in "a" to packed 16-bit integers with unsigned saturation, and store the active results (those with their respective bit set in writemask "k") to unaligned memory at "base_addr". FOR j := 0 to 7 i := 64*j l := 16*j IF k[j] MEM[base_addr+l+15:base_addr+l] := SaturateU16(a[i+63:i]) FI ENDFOR
Integer AVX512F Convert Convert packed unsigned 64-bit integers in "a" to packed unsigned 16-bit integers with unsigned saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := 64*j l := 16*j IF k[j] dst[l+15:l] := SaturateU16(a[i+63:i]) ELSE dst[l+15:l] := 0 FI ENDFOR dst[MAX:128] := 0
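Unsigned saturation treats the source lane as a non-negative value and clamps anything too large to the maximum of the narrower unsigned type. A minimal sketch of SaturateU8, SaturateU16 and SaturateU32 under that reading; `saturate_unsigned` is a hypothetical name used only here.

```python
def saturate_unsigned(value, dst_bits):
    """Clamp an unsigned lane value to the range [0, 2**dst_bits - 1]."""
    return min(value, (1 << dst_bits) - 1)


lanes = [7, 300, 0xFFFFFFFF, 0x100]
print([hex(saturate_unsigned(x, 8)) for x in lanes])
```

Compare with the signed forms: the pattern `0xFFFFFFFF` saturates to `0xFF` here because it is read as a large unsigned value, not as -1.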
Integer AVX512F Convert Zero extend packed unsigned 8-bit integers in "a" to packed 32-bit integers, and store the results in "dst". FOR j := 0 to 15 i := 32*j k := 8*j dst[i+31:i] := ZeroExtend32(a[k+7:k]) ENDFOR dst[MAX:512] := 0
Integer AVX512F Convert Zero extend packed unsigned 8-bit integers in "a" to packed 32-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := 32*j l := 8*j IF k[j] dst[i+31:i] := ZeroExtend32(a[l+7:l]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Convert Zero extend packed unsigned 8-bit integers in "a" to packed 32-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := 32*j l := 8*j IF k[j] dst[i+31:i] := ZeroExtend32(a[l+7:l]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Convert Zero extend packed unsigned 8-bit integers in the low 8 bytes of "a" to packed 64-bit integers, and store the results in "dst". FOR j := 0 to 7 i := 64*j k := 8*j dst[i+63:i] := ZeroExtend64(a[k+7:k]) ENDFOR dst[MAX:512] := 0
Integer AVX512F Convert Zero extend packed unsigned 8-bit integers in the low 8 bytes of "a" to packed 64-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := 64*j l := 8*j IF k[j] dst[i+63:i] := ZeroExtend64(a[l+7:l]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Convert Zero extend packed unsigned 8-bit integers in the low 8 bytes of "a" to packed 64-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := 64*j l := 8*j IF k[j] dst[i+63:i] := ZeroExtend64(a[l+7:l]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Convert Zero extend packed unsigned 32-bit integers in "a" to packed 64-bit integers, and store the results in "dst". FOR j := 0 to 7 i := 64*j k := 32*j dst[i+63:i] := ZeroExtend64(a[k+31:k]) ENDFOR dst[MAX:512] := 0
Integer AVX512F Convert Zero extend packed unsigned 32-bit integers in "a" to packed 64-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := 64*j l := 32*j IF k[j] dst[i+63:i] := ZeroExtend64(a[l+31:l]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Convert Zero extend packed unsigned 32-bit integers in "a" to packed 64-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := 64*j l := 32*j IF k[j] dst[i+63:i] := ZeroExtend64(a[l+31:l]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Convert Zero extend packed unsigned 16-bit integers in "a" to packed 32-bit integers, and store the results in "dst". FOR j := 0 to 15 i := 32*j k := 16*j dst[i+31:i] := ZeroExtend32(a[k+15:k]) ENDFOR dst[MAX:512] := 0
Integer AVX512F Convert Zero extend packed unsigned 16-bit integers in "a" to packed 32-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := 32*j l := 16*j IF k[j] dst[i+31:i] := ZeroExtend32(a[l+15:l]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Convert Zero extend packed unsigned 16-bit integers in "a" to packed 32-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := 32*j l := 16*j IF k[j] dst[i+31:i] := ZeroExtend32(a[l+15:l]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Convert Zero extend packed unsigned 16-bit integers in "a" to packed 64-bit integers, and store the results in "dst". FOR j := 0 to 7 i := 64*j k := 16*j dst[i+63:i] := ZeroExtend64(a[k+15:k]) ENDFOR dst[MAX:512] := 0
Integer AVX512F Convert Zero extend packed unsigned 16-bit integers in "a" to packed 64-bit integers, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := 64*j l := 16*j IF k[j] dst[i+63:i] := ZeroExtend64(a[l+15:l]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Convert Zero extend packed unsigned 16-bit integers in "a" to packed 64-bit integers, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := 64*j l := 16*j IF k[j] dst[i+63:i] := ZeroExtend64(a[l+15:l]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
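The widening pseudocode above indexes the source and destination with two different strides, for example `i := 32*j` for the destination and `l := 8*j` for the source. The sketch below mirrors that bit-range notation directly for the unmasked 8-bit to 32-bit zero extension, assuming a whole vector register is modelled as one Python integer; `bits` and `zero_extend_8_to_32` are illustrative names.

```python
def bits(x, hi, lo):
    """Return x[hi:lo] in the pseudocode's inclusive bit-range notation."""
    return (x >> lo) & ((1 << (hi - lo + 1)) - 1)


def zero_extend_8_to_32(a):
    """Widen the 16 bytes of a 128-bit value into 16 dwords of a 512-bit value."""
    dst = 0
    for j in range(16):
        i = 32 * j                          # destination lane offset in bits
        l = 8 * j                           # source lane offset in bits
        dst |= bits(a, l + 7, l) << i       # high 24 bits of each lane stay zero
    return dst


a = int.from_bytes(bytes(range(0x80, 0x90)), "little")
dst = zero_extend_8_to_32(a)
print(hex(bits(dst, 31, 0)), hex(bits(dst, 511, 480)))
```

Byte `0x80` stays `0x00000080` after zero extension, whereas the sign-extending forms would produce `0xFFFFFF80` for the same input.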
Integer AVX512F Arithmetic Multiply the low signed 32-bit integers from each packed 64-bit element in "a" and "b", and store the signed 64-bit results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := SignExtend64(a[i+31:i]) * SignExtend64(b[i+31:i]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Arithmetic Multiply the low signed 32-bit integers from each packed 64-bit element in "a" and "b", and store the signed 64-bit results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := SignExtend64(a[i+31:i]) * SignExtend64(b[i+31:i]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Arithmetic Multiply the low signed 32-bit integers from each packed 64-bit element in "a" and "b", and store the signed 64-bit results in "dst". FOR j := 0 to 7 i := j*64 dst[i+63:i] := SignExtend64(a[i+31:i]) * SignExtend64(b[i+31:i]) ENDFOR dst[MAX:512] := 0
Integer AVX512F Arithmetic Multiply the low unsigned 32-bit integers from each packed 64-bit element in "a" and "b", and store the unsigned 64-bit results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := a[i+31:i] * b[i+31:i] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Arithmetic Multiply the low unsigned 32-bit integers from each packed 64-bit element in "a" and "b", and store the unsigned 64-bit results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := a[i+31:i] * b[i+31:i] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Arithmetic Multiply the low unsigned 32-bit integers from each packed 64-bit element in "a" and "b", and store the unsigned 64-bit results in "dst". FOR j := 0 to 7 i := j*64 dst[i+63:i] := a[i+31:i] * b[i+31:i] ENDFOR dst[MAX:512] := 0
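The multiply entries above read only the low 32 bits of each 64-bit lane and produce a full 64-bit product, with the signed form sign-extending its operands first. A minimal per-lane sketch, assuming lanes are unsigned 64-bit patterns; `mul_low32` and its `signed` flag are hypothetical and exist only to contrast the two behaviours.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def mul_low32(a_lane, b_lane, signed):
    """Multiply the low 32 bits of two 64-bit lanes into a 64-bit result."""
    x = a_lane & MASK32                     # bits 63:32 of each lane are ignored
    y = b_lane & MASK32
    if signed:
        if x >> 31:
            x -= 1 << 32                    # SignExtend64 of the low dword
        if y >> 31:
            y -= 1 << 32
    return (x * y) & MASK64


a = 0xDEADBEEFFFFFFFFF                      # low dword is 0xFFFFFFFF
b = 0x0000000000000002
print(hex(mul_low32(a, b, signed=True)))    # -1 * 2
print(hex(mul_low32(a, b, signed=False)))   # 4294967295 * 2
```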
Integer AVX512F Logical Compute the bitwise OR of packed 32-bit integers in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := a[i+31:i] OR b[i+31:i] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Logical Compute the bitwise OR of packed 64-bit integers in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := a[i+63:i] OR b[i+63:i] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Shift Rotate the bits in each packed 32-bit integer in "a" to the left by the number of bits specified in "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). DEFINE LEFT_ROTATE_DWORDS(src, count_src) { count := count_src % 32 RETURN (src << count) OR (src >> (32 - count)) } FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := LEFT_ROTATE_DWORDS(a[i+31:i], imm8[7:0]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Shift Rotate the bits in each packed 32-bit integer in "a" to the left by the number of bits specified in "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). DEFINE LEFT_ROTATE_DWORDS(src, count_src) { count := count_src % 32 RETURN (src << count) OR (src >> (32 - count)) } FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := LEFT_ROTATE_DWORDS(a[i+31:i], imm8[7:0]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Shift Rotate the bits in each packed 32-bit integer in "a" to the left by the number of bits specified in "imm8", and store the results in "dst". DEFINE LEFT_ROTATE_DWORDS(src, count_src) { count := count_src % 32 RETURN (src << count) OR (src >> (32 - count)) } FOR j := 0 to 15 i := j*32 dst[i+31:i] := LEFT_ROTATE_DWORDS(a[i+31:i], imm8[7:0]) ENDFOR dst[MAX:512] := 0
Integer AVX512F Shift Rotate the bits in each packed 64-bit integer in "a" to the left by the number of bits specified in "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). DEFINE LEFT_ROTATE_QWORDS(src, count_src) { count := count_src % 64 RETURN (src << count) OR (src >> (64 - count)) } FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := LEFT_ROTATE_QWORDS(a[i+63:i], imm8[7:0]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Shift Rotate the bits in each packed 64-bit integer in "a" to the left by the number of bits specified in "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). DEFINE LEFT_ROTATE_QWORDS(src, count_src) { count := count_src % 64 RETURN (src << count) OR (src >> (64 - count)) } FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := LEFT_ROTATE_QWORDS(a[i+63:i], imm8[7:0]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Shift Rotate the bits in each packed 64-bit integer in "a" to the left by the number of bits specified in "imm8", and store the results in "dst". DEFINE LEFT_ROTATE_QWORDS(src, count_src) { count := count_src % 64 RETURN (src << count) OR (src >> (64 - count)) } FOR j := 0 to 7 i := j*64 dst[i+63:i] := LEFT_ROTATE_QWORDS(a[i+63:i], imm8[7:0]) ENDFOR dst[MAX:512] := 0
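The immediate-count rotate records above differ only in how lanes with a clear mask bit are filled. A minimal scalar sketch of the 32-bit form follows, assuming lanes are modelled as a Python list of 16 integers; the names `rotl32` and `rol_epi32` and the `k`/`src` keyword convention are illustrative and are not intrinsic names from the records.

```python
MASK32 = 0xFFFFFFFF

def rotl32(x, count):
    """Rotate a 32-bit value left; the count is reduced modulo 32 first."""
    count %= 32
    x &= MASK32
    # Python integers are unbounded, so the result is masked back to 32 bits.
    return ((x << count) | (x >> (32 - count))) & MASK32

def rol_epi32(a, imm8, k=None, src=None):
    """Hypothetical model of the 16-lane rotate-left by an immediate.

    k is None            -> unmasked form
    k given, src given   -> writemask form (inactive lanes copied from src)
    k given, src is None -> zeromask form (inactive lanes set to 0)
    """
    out = []
    for j in range(16):
        if k is None or (k >> j) & 1:
            out.append(rotl32(a[j], imm8 & 0xFF))
        elif src is not None:
            out.append(src[j] & MASK32)
        else:
            out.append(0)
    return out
```

Because the count is taken modulo the lane width, an immediate of 33 rotates by 1 bit rather than clearing the lane.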
Integer AVX512F Shift Rotate the bits in each packed 32-bit integer in "a" to the left by the number of bits specified in the corresponding element of "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). DEFINE LEFT_ROTATE_DWORDS(src, count_src) { count := count_src % 32 RETURN (src << count) OR (src >> (32 - count)) } FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := LEFT_ROTATE_DWORDS(a[i+31:i], b[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Shift Rotate the bits in each packed 32-bit integer in "a" to the left by the number of bits specified in the corresponding element of "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). DEFINE LEFT_ROTATE_DWORDS(src, count_src) { count := count_src % 32 RETURN (src << count) OR (src >> (32 - count)) } FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := LEFT_ROTATE_DWORDS(a[i+31:i], b[i+31:i]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Shift Rotate the bits in each packed 32-bit integer in "a" to the left by the number of bits specified in the corresponding element of "b", and store the results in "dst". DEFINE LEFT_ROTATE_DWORDS(src, count_src) { count := count_src % 32 RETURN (src << count) OR (src >> (32 - count)) } FOR j := 0 to 15 i := j*32 dst[i+31:i] := LEFT_ROTATE_DWORDS(a[i+31:i], b[i+31:i]) ENDFOR dst[MAX:512] := 0
Integer AVX512F Shift Rotate the bits in each packed 64-bit integer in "a" to the left by the number of bits specified in the corresponding element of "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). DEFINE LEFT_ROTATE_QWORDS(src, count_src) { count := count_src % 64 RETURN (src << count) OR (src >> (64 - count)) } FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := LEFT_ROTATE_QWORDS(a[i+63:i], b[i+63:i]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Shift Rotate the bits in each packed 64-bit integer in "a" to the left by the number of bits specified in the corresponding element of "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). DEFINE LEFT_ROTATE_QWORDS(src, count_src) { count := count_src % 64 RETURN (src << count) OR (src >> (64 - count)) } FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := LEFT_ROTATE_QWORDS(a[i+63:i], b[i+63:i]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Shift Rotate the bits in each packed 64-bit integer in "a" to the left by the number of bits specified in the corresponding element of "b", and store the results in "dst". DEFINE LEFT_ROTATE_QWORDS(src, count_src) { count := count_src % 64 RETURN (src << count) OR (src >> (64 - count)) } FOR j := 0 to 7 i := j*64 dst[i+63:i] := LEFT_ROTATE_QWORDS(a[i+63:i], b[i+63:i]) ENDFOR dst[MAX:512] := 0
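For the variable-count records above, each lane takes its own count from the matching element of "b". The sketch below models the unmasked 64-bit form under the same list-of-lanes assumption; `rotl64` and `rolv_epi64` are illustrative names.

```python
MASK64 = (1 << 64) - 1

def rotl64(x, count):
    """Rotate a 64-bit value left; the count is reduced modulo 64 first."""
    count = (count & MASK64) % 64
    x &= MASK64
    return ((x << count) | (x >> (64 - count))) & MASK64

def rolv_epi64(a, b):
    """Hypothetical model: lane j of a is rotated left by lane j of b."""
    return [rotl64(a[j], b[j]) for j in range(8)]
```

Counts of 64 and 65 behave like counts of 0 and 1, since only the count modulo 64 is used.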
Integer AVX512F Shift Rotate the bits in each packed 32-bit integer in "a" to the right by the number of bits specified in "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). DEFINE RIGHT_ROTATE_DWORDS(src, count_src) { count := count_src % 32 RETURN (src >> count) OR (src << (32 - count)) } FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := RIGHT_ROTATE_DWORDS(a[i+31:i], imm8[7:0]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Shift Rotate the bits in each packed 32-bit integer in "a" to the right by the number of bits specified in "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). DEFINE RIGHT_ROTATE_DWORDS(src, count_src) { count := count_src % 32 RETURN (src >> count) OR (src << (32 - count)) } FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := RIGHT_ROTATE_DWORDS(a[i+31:i], imm8[7:0]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Shift Rotate the bits in each packed 32-bit integer in "a" to the right by the number of bits specified in "imm8", and store the results in "dst". DEFINE RIGHT_ROTATE_DWORDS(src, count_src) { count := count_src % 32 RETURN (src >> count) OR (src << (32 - count)) } FOR j := 0 to 15 i := j*32 dst[i+31:i] := RIGHT_ROTATE_DWORDS(a[i+31:i], imm8[7:0]) ENDFOR dst[MAX:512] := 0
Integer AVX512F Shift Rotate the bits in each packed 64-bit integer in "a" to the right by the number of bits specified in "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). DEFINE RIGHT_ROTATE_QWORDS(src, count_src) { count := count_src % 64 RETURN (src >> count) OR (src << (64 - count)) } FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := RIGHT_ROTATE_QWORDS(a[i+63:i], imm8[7:0]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Shift Rotate the bits in each packed 64-bit integer in "a" to the right by the number of bits specified in "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). DEFINE RIGHT_ROTATE_QWORDS(src, count_src) { count := count_src % 64 RETURN (src >> count) OR (src << (64 - count)) } FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := RIGHT_ROTATE_QWORDS(a[i+63:i], imm8[7:0]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Shift Rotate the bits in each packed 64-bit integer in "a" to the right by the number of bits specified in "imm8", and store the results in "dst". DEFINE RIGHT_ROTATE_QWORDS(src, count_src) { count := count_src % 64 RETURN (src >> count) OR (src << (64 - count)) } FOR j := 0 to 7 i := j*64 dst[i+63:i] := RIGHT_ROTATE_QWORDS(a[i+63:i], imm8[7:0]) ENDFOR dst[MAX:512] := 0
Integer AVX512F Shift Rotate the bits in each packed 32-bit integer in "a" to the right by the number of bits specified in the corresponding element of "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). DEFINE RIGHT_ROTATE_DWORDS(src, count_src) { count := count_src % 32 RETURN (src >> count) OR (src << (32 - count)) } FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := RIGHT_ROTATE_DWORDS(a[i+31:i], b[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Shift Rotate the bits in each packed 32-bit integer in "a" to the right by the number of bits specified in the corresponding element of "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). DEFINE RIGHT_ROTATE_DWORDS(src, count_src) { count := count_src % 32 RETURN (src >> count) OR (src << (32 - count)) } FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := RIGHT_ROTATE_DWORDS(a[i+31:i], b[i+31:i]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Shift Rotate the bits in each packed 32-bit integer in "a" to the right by the number of bits specified in the corresponding element of "b", and store the results in "dst". DEFINE RIGHT_ROTATE_DWORDS(src, count_src) { count := count_src % 32 RETURN (src >> count) OR (src << (32 - count)) } FOR j := 0 to 15 i := j*32 dst[i+31:i] := RIGHT_ROTATE_DWORDS(a[i+31:i], b[i+31:i]) ENDFOR dst[MAX:512] := 0
Integer AVX512F Shift Rotate the bits in each packed 64-bit integer in "a" to the right by the number of bits specified in the corresponding element of "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). DEFINE RIGHT_ROTATE_QWORDS(src, count_src) { count := count_src % 64 RETURN (src >> count) OR (src << (64 - count)) } FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := RIGHT_ROTATE_QWORDS(a[i+63:i], b[i+63:i]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Shift Rotate the bits in each packed 64-bit integer in "a" to the right by the number of bits specified in the corresponding element of "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). DEFINE RIGHT_ROTATE_QWORDS(src, count_src) { count := count_src % 64 RETURN (src >> count) OR (src << (64 - count)) } FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := RIGHT_ROTATE_QWORDS(a[i+63:i], b[i+63:i]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Shift Rotate the bits in each packed 64-bit integer in "a" to the right by the number of bits specified in the corresponding element of "b", and store the results in "dst". DEFINE RIGHT_ROTATE_QWORDS(src, count_src) { count := count_src % 64 RETURN (src >> count) OR (src << (64 - count)) } FOR j := 0 to 7 i := j*64 dst[i+63:i] := RIGHT_ROTATE_QWORDS(a[i+63:i], b[i+63:i]) ENDFOR dst[MAX:512] := 0
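The right-rotate records above mirror the left-rotate ones: a right rotate by n bits gives the same result as a left rotate by (width - n) modulo the width. The scalar sketch below checks that identity for 32-bit lanes; `rotr32` and `rotl32` are illustrative helper names.

```python
MASK32 = 0xFFFFFFFF

def rotr32(x, count):
    """Rotate a 32-bit value right; the count is reduced modulo 32 first."""
    count %= 32
    x &= MASK32
    return ((x >> count) | (x << (32 - count))) & MASK32

def rotl32(x, count):
    """Rotate a 32-bit value left; the count is reduced modulo 32 first."""
    count %= 32
    x &= MASK32
    return ((x << count) | (x >> (32 - count))) & MASK32

def rotations_agree(x):
    """True if rotr32(x, n) == rotl32(x, (32 - n) % 32) for every n in 0..31."""
    return all(rotr32(x, n) == rotl32(x, (32 - n) % 32) for n in range(32))
```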
Integer AVX512F Store Scatter 64-bit integers from "a" into memory using 32-bit indices. 64-bit elements are stored at addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). "scale" should be 1, 2, 4 or 8. FOR j := 0 to 7 i := j*64 m := j*32 addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8 MEM[addr+63:addr] := a[i+63:i] ENDFOR
Integer AVX512F Store Scatter 64-bit integers from "a" into memory using 32-bit indices. 64-bit elements are stored at addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale") subject to mask "k" (elements are not stored when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8. FOR j := 0 to 7 i := j*64 m := j*32 IF k[j] addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8 MEM[addr+63:addr] := a[i+63:i] FI ENDFOR
Integer AVX512F Store Scatter 32-bit integers from "a" into memory using 64-bit indices. 32-bit elements are stored at addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). "scale" should be 1, 2, 4 or 8. FOR j := 0 to 7 i := j*32 m := j*64 addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8 MEM[addr+31:addr] := a[i+31:i] ENDFOR
Integer AVX512F Store Scatter 32-bit integers from "a" into memory using 64-bit indices. 32-bit elements are stored at addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale") subject to mask "k" (elements are not stored when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8. FOR j := 0 to 7 i := j*32 m := j*64 IF k[j] addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8 MEM[addr+31:addr] := a[i+31:i] FI ENDFOR
Integer AVX512F Store Scatter 64-bit integers from "a" into memory using 64-bit indices. 64-bit elements are stored at addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). "scale" should be 1, 2, 4 or 8. FOR j := 0 to 7 i := j*64 m := j*64 addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8 MEM[addr+63:addr] := a[i+63:i] ENDFOR
Integer AVX512F Store Scatter 64-bit integers from "a" into memory using 64-bit indices. 64-bit elements are stored at addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale") subject to mask "k" (elements are not stored when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8. FOR j := 0 to 7 i := j*64 m := j*64 IF k[j] addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8 MEM[addr+63:addr] := a[i+63:i] FI ENDFOR
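The scatter records above can be modelled on a byte buffer. The sketch below covers the form with 32-bit indices and 64-bit elements. It makes three assumptions: memory is a Python `bytearray`, values are stored little-endian, and the trailing `* 8` in the pseudocode converts a byte offset into a bit offset, so the model works in bytes and omits it. The function name and the bounds check are illustrative only.

```python
import struct

MASK64 = (1 << 64) - 1

def scatter_i32_epi64(mem, base, vindex, a, scale, k=0xFF):
    """Hypothetical model: store a[j] at byte offset base + vindex[j] * scale
    for every lane j whose mask bit in k is set (k defaults to all lanes)."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale should be 1, 2, 4 or 8")
    for j in range(8):
        if not (k >> j) & 1:
            continue                      # masked-off lanes are not stored
        idx = vindex[j] & 0xFFFFFFFF
        if idx & 0x80000000:              # sign-extend the 32-bit index
            idx -= 1 << 32
        addr = base + idx * scale
        if addr < 0 or addr + 8 > len(mem):
            raise IndexError("scatter address outside the modelled buffer")
        struct.pack_into("<Q", mem, addr, a[j] & MASK64)
```

The forms with 64-bit indices differ only in using the index as-is, without the sign-extension step.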
Integer AVX512F Swizzle Shuffle 32-bit integers in "a" within 128-bit lanes using the control in "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). DEFINE SELECT4(src, control) { CASE(control[1:0]) OF 0: tmp[31:0] := src[31:0] 1: tmp[31:0] := src[63:32] 2: tmp[31:0] := src[95:64] 3: tmp[31:0] := src[127:96] ESAC RETURN tmp[31:0] } tmp_dst[31:0] := SELECT4(a[127:0], imm8[1:0]) tmp_dst[63:32] := SELECT4(a[127:0], imm8[3:2]) tmp_dst[95:64] := SELECT4(a[127:0], imm8[5:4]) tmp_dst[127:96] := SELECT4(a[127:0], imm8[7:6]) tmp_dst[159:128] := SELECT4(a[255:128], imm8[1:0]) tmp_dst[191:160] := SELECT4(a[255:128], imm8[3:2]) tmp_dst[223:192] := SELECT4(a[255:128], imm8[5:4]) tmp_dst[255:224] := SELECT4(a[255:128], imm8[7:6]) tmp_dst[287:256] := SELECT4(a[383:256], imm8[1:0]) tmp_dst[319:288] := SELECT4(a[383:256], imm8[3:2]) tmp_dst[351:320] := SELECT4(a[383:256], imm8[5:4]) tmp_dst[383:352] := SELECT4(a[383:256], imm8[7:6]) tmp_dst[415:384] := SELECT4(a[511:384], imm8[1:0]) tmp_dst[447:416] := SELECT4(a[511:384], imm8[3:2]) tmp_dst[479:448] := SELECT4(a[511:384], imm8[5:4]) tmp_dst[511:480] := SELECT4(a[511:384], imm8[7:6]) FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := tmp_dst[i+31:i] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
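The sixteen SELECT4 assignments in the shuffle record above follow a regular pattern: every 128-bit lane applies the same four 2-bit selectors from "imm8" to its own four elements. A compact scalar sketch, assuming lanes are a list of 16 integers and using the illustrative name `shuffle_epi32`:

```python
def shuffle_epi32(a, imm8, k=0xFFFF):
    """Hypothetical model of the zeromasked in-lane 32-bit shuffle."""
    out = []
    for j in range(16):
        lane_base = (j // 4) * 4             # first element of this 128-bit lane
        sel = (imm8 >> (2 * (j % 4))) & 0x3  # 2-bit selector for this position
        out.append(a[lane_base + sel] if (k >> j) & 1 else 0)
    return out
```

For example, `imm8 = 0x1B` reverses the four elements inside each lane, and `imm8 = 0x00` broadcasts each lane's lowest element across that lane.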
Integer AVX512F Shift Shift packed 32-bit integers in "a" left by "count" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] IF count[63:0] > 31 dst[i+31:i] := 0 ELSE dst[i+31:i] := ZeroExtend32(a[i+31:i] << count[63:0]) FI ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Shift Shift packed 32-bit integers in "a" left by "count" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] IF count[63:0] > 31 dst[i+31:i] := 0 ELSE dst[i+31:i] := ZeroExtend32(a[i+31:i] << count[63:0]) FI ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Shift Shift packed 32-bit integers in "a" left by "imm8" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] IF imm8[7:0] > 31 dst[i+31:i] := 0 ELSE dst[i+31:i] := ZeroExtend32(a[i+31:i] << imm8[7:0]) FI ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Shift Shift packed 32-bit integers in "a" left by "count" while shifting in zeros, and store the results in "dst". FOR j := 0 to 15 i := j*32 IF count[63:0] > 31 dst[i+31:i] := 0 ELSE dst[i+31:i] := ZeroExtend32(a[i+31:i] << count[63:0]) FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Shift Shift packed 64-bit integers in "a" left by "count" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] IF count[63:0] > 63 dst[i+63:i] := 0 ELSE dst[i+63:i] := ZeroExtend64(a[i+63:i] << count[63:0]) FI ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Shift Shift packed 64-bit integers in "a" left by "imm8" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] IF imm8[7:0] > 63 dst[i+63:i] := 0 ELSE dst[i+63:i] := ZeroExtend64(a[i+63:i] << imm8[7:0]) FI ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Shift Shift packed 64-bit integers in "a" left by "count" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] IF count[63:0] > 63 dst[i+63:i] := 0 ELSE dst[i+63:i] := ZeroExtend64(a[i+63:i] << count[63:0]) FI ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Shift Shift packed 64-bit integers in "a" left by "imm8" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] IF imm8[7:0] > 63 dst[i+63:i] := 0 ELSE dst[i+63:i] := ZeroExtend64(a[i+63:i] << imm8[7:0]) FI ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Shift Shift packed 64-bit integers in "a" left by "count" while shifting in zeros, and store the results in "dst". FOR j := 0 to 7 i := j*64 IF count[63:0] > 63 dst[i+63:i] := 0 ELSE dst[i+63:i] := ZeroExtend64(a[i+63:i] << count[63:0]) FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Shift Shift packed 64-bit integers in "a" left by "imm8" while shifting in zeros, and store the results in "dst". FOR j := 0 to 7 i := j*64 IF imm8[7:0] > 63 dst[i+63:i] := 0 ELSE dst[i+63:i] := ZeroExtend64(a[i+63:i] << imm8[7:0]) FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Shift Shift packed 32-bit integers in "a" left by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] IF count[i+31:i] < 32 dst[i+31:i] := ZeroExtend32(a[i+31:i] << count[i+31:i]) ELSE dst[i+31:i] := 0 FI ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Shift Shift packed 64-bit integers in "a" left by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] IF count[i+63:i] < 64 dst[i+63:i] := ZeroExtend64(a[i+63:i] << count[i+63:i]) ELSE dst[i+63:i] := 0 FI ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Shift Shift packed 64-bit integers in "a" left by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] IF count[i+63:i] < 64 dst[i+63:i] := ZeroExtend64(a[i+63:i] << count[i+63:i]) ELSE dst[i+63:i] := 0 FI ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Shift Shift packed 64-bit integers in "a" left by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst". FOR j := 0 to 7 i := j*64 IF count[i+63:i] < 64 dst[i+63:i] := ZeroExtend64(a[i+63:i] << count[i+63:i]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
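Unlike the rotates, the left-shift records above do not reduce the count modulo the lane width: any count at or above the width yields zero. The sketch below models one uniform-count form and one per-element form, assuming non-negative counts and list-of-lanes inputs; the function names are illustrative.

```python
MASK32 = 0xFFFFFFFF
MASK64 = (1 << 64) - 1

def sll_epi32(a, count, k=0xFFFF, src=None):
    """Hypothetical model: shift all 16 lanes left by one shared count.
    Inactive lanes are copied from src if given (writemask), else zeroed."""
    out = []
    for j in range(16):
        if (k >> j) & 1:
            out.append(0 if count > 31 else (a[j] << count) & MASK32)
        else:
            out.append(0 if src is None else src[j] & MASK32)
    return out

def sllv_epi64(a, count):
    """Hypothetical model: lane j is shifted left by count[j]."""
    return [(a[j] << count[j]) & MASK64 if count[j] < 64 else 0
            for j in range(8)]
```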
Integer AVX512F Shift Shift packed 32-bit integers in "a" right by "count" while shifting in sign bits, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] IF count[63:0] > 31 dst[i+31:i] := (a[i+31] ? 0xFFFFFFFF : 0x0) ELSE dst[i+31:i] := SignExtend32(a[i+31:i] >> count[63:0]) FI ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Shift Shift packed 32-bit integers in "a" right by "count" while shifting in sign bits, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] IF count[63:0] > 31 dst[i+31:i] := (a[i+31] ? 0xFFFFFFFF : 0x0) ELSE dst[i+31:i] := SignExtend32(a[i+31:i] >> count[63:0]) FI ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Shift Shift packed 32-bit integers in "a" right by "imm8" while shifting in sign bits, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] IF imm8[7:0] > 31 dst[i+31:i] := (a[i+31] ? 0xFFFFFFFF : 0x0) ELSE dst[i+31:i] := SignExtend32(a[i+31:i] >> imm8[7:0]) FI ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Shift Shift packed 32-bit integers in "a" right by "count" while shifting in sign bits, and store the results in "dst". FOR j := 0 to 15 i := j*32 IF count[63:0] > 31 dst[i+31:i] := (a[i+31] ? 0xFFFFFFFF : 0x0) ELSE dst[i+31:i] := SignExtend32(a[i+31:i] >> count[63:0]) FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Shift Shift packed 64-bit integers in "a" right by "count" while shifting in sign bits, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] IF count[63:0] > 63 dst[i+63:i] := (a[i+63] ? 0xFFFFFFFFFFFFFFFF : 0x0) ELSE dst[i+63:i] := SignExtend64(a[i+63:i] >> count[63:0]) FI ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Shift Shift packed 64-bit integers in "a" right by "imm8" while shifting in sign bits, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] IF imm8[7:0] > 63 dst[i+63:i] := (a[i+63] ? 0xFFFFFFFFFFFFFFFF : 0x0) ELSE dst[i+63:i] := SignExtend64(a[i+63:i] >> imm8[7:0]) FI ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Shift Shift packed 64-bit integers in "a" right by "count" while shifting in sign bits, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] IF count[63:0] > 63 dst[i+63:i] := (a[i+63] ? 0xFFFFFFFFFFFFFFFF : 0x0) ELSE dst[i+63:i] := SignExtend64(a[i+63:i] >> count[63:0]) FI ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Shift Shift packed 64-bit integers in "a" right by "imm8" while shifting in sign bits, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] IF imm8[7:0] > 63 dst[i+63:i] := (a[i+63] ? 0xFFFFFFFFFFFFFFFF : 0x0) ELSE dst[i+63:i] := SignExtend64(a[i+63:i] >> imm8[7:0]) FI ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Shift Shift packed 64-bit integers in "a" right by "count" while shifting in sign bits, and store the results in "dst". FOR j := 0 to 7 i := j*64 IF count[63:0] > 63 dst[i+63:i] := (a[i+63] ? 0xFFFFFFFFFFFFFFFF : 0x0) ELSE dst[i+63:i] := SignExtend64(a[i+63:i] >> count[63:0]) FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Shift Shift packed 64-bit integers in "a" right by "imm8" while shifting in sign bits, and store the results in "dst". FOR j := 0 to 7 i := j*64 IF imm8[7:0] > 63 dst[i+63:i] := (a[i+63] ? 0xFFFFFFFFFFFFFFFF : 0x0) ELSE dst[i+63:i] := SignExtend64(a[i+63:i] >> imm8[7:0]) FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Shift Shift packed 32-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in sign bits, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] IF count[i+31:i] < 32 dst[i+31:i] := SignExtend32(a[i+31:i] >> count[i+31:i]) ELSE dst[i+31:i] := (a[i+31] ? 0xFFFFFFFF : 0) FI ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Shift Shift packed 64-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in sign bits, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] IF count[i+63:i] < 64 dst[i+63:i] := SignExtend64(a[i+63:i] >> count[i+63:i]) ELSE dst[i+63:i] := (a[i+63] ? 0xFFFFFFFFFFFFFFFF : 0) FI ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Shift Shift packed 64-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in sign bits, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] IF count[i+63:i] < 64 dst[i+63:i] := SignExtend64(a[i+63:i] >> count[i+63:i]) ELSE dst[i+63:i] := (a[i+63] ? 0xFFFFFFFFFFFFFFFF : 0) FI ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Shift Shift packed 64-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in sign bits, and store the results in "dst". FOR j := 0 to 7 i := j*64 IF count[i+63:i] < 64 dst[i+63:i] := SignExtend64(a[i+63:i] >> count[i+63:i]) ELSE dst[i+63:i] := (a[i+63] ? 0xFFFFFFFFFFFFFFFF : 0) FI ENDFOR dst[MAX:512] := 0
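For the arithmetic right shifts above, an oversized count fills the lane with copies of the sign bit, which is the same result as shifting by width - 1. A scalar sketch for 32-bit lanes follows, relying on Python's `>>` being an arithmetic shift on negative integers; `sra32` and `srav_epi32` are illustrative names.

```python
MASK32 = 0xFFFFFFFF

def sra32(x, count):
    """Arithmetic right shift of a 32-bit value; counts above 31 saturate."""
    x &= MASK32
    if x & 0x80000000:        # reinterpret the bit pattern as a signed value
        x -= 1 << 32
    # Clamping the count to 31 reproduces the all-sign-bits result.
    return (x >> min(count, 31)) & MASK32

def srav_epi32(a, count, k=0xFFFF):
    """Hypothetical model of the zeromasked per-element form."""
    return [sra32(a[j], count[j]) if (k >> j) & 1 else 0 for j in range(16)]
```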
Integer AVX512F Shift Shift packed 32-bit integers in "a" right by "count" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] IF count[63:0] > 31 dst[i+31:i] := 0 ELSE dst[i+31:i] := ZeroExtend32(a[i+31:i] >> count[63:0]) FI ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Shift Shift packed 32-bit integers in "a" right by "count" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] IF count[63:0] > 31 dst[i+31:i] := 0 ELSE dst[i+31:i] := ZeroExtend32(a[i+31:i] >> count[63:0]) FI ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Shift Shift packed 32-bit integers in "a" right by "imm8" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] IF imm8[7:0] > 31 dst[i+31:i] := 0 ELSE dst[i+31:i] := ZeroExtend32(a[i+31:i] >> imm8[7:0]) FI ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Shift Shift packed 32-bit integers in "a" right by "count" while shifting in zeros, and store the results in "dst". FOR j := 0 to 15 i := j*32 IF count[63:0] > 31 dst[i+31:i] := 0 ELSE dst[i+31:i] := ZeroExtend32(a[i+31:i] >> count[63:0]) FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Shift Shift packed 64-bit integers in "a" right by "count" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] IF count[63:0] > 63 dst[i+63:i] := 0 ELSE dst[i+63:i] := ZeroExtend64(a[i+63:i] >> count[63:0]) FI ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Shift Shift packed 64-bit integers in "a" right by "imm8" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] IF imm8[7:0] > 63 dst[i+63:i] := 0 ELSE dst[i+63:i] := ZeroExtend64(a[i+63:i] >> imm8[7:0]) FI ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Shift Shift packed 64-bit integers in "a" right by "count" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] IF count[63:0] > 63 dst[i+63:i] := 0 ELSE dst[i+63:i] := ZeroExtend64(a[i+63:i] >> count[63:0]) FI ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Shift Shift packed 64-bit integers in "a" right by "imm8" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] IF imm8[7:0] > 63 dst[i+63:i] := 0 ELSE dst[i+63:i] := ZeroExtend64(a[i+63:i] >> imm8[7:0]) FI ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Shift Shift packed 64-bit integers in "a" right by "count" while shifting in zeros, and store the results in "dst". FOR j := 0 to 7 i := j*64 IF count[63:0] > 63 dst[i+63:i] := 0 ELSE dst[i+63:i] := ZeroExtend64(a[i+63:i] >> count[63:0]) FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Shift Shift packed 64-bit integers in "a" right by "imm8" while shifting in zeros, and store the results in "dst". FOR j := 0 to 7 i := j*64 IF imm8[7:0] > 63 dst[i+63:i] := 0 ELSE dst[i+63:i] := ZeroExtend64(a[i+63:i] >> imm8[7:0]) FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Shift Shift packed 32-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] IF count[i+31:i] < 32 dst[i+31:i] := ZeroExtend32(a[i+31:i] >> count[i+31:i]) ELSE dst[i+31:i] := 0 FI ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Shift Shift packed 64-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] IF count[i+63:i] < 64 dst[i+63:i] := ZeroExtend64(a[i+63:i] >> count[i+63:i]) ELSE dst[i+63:i] := 0 FI ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Shift Shift packed 64-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] IF count[i+63:i] < 64 dst[i+63:i] := ZeroExtend64(a[i+63:i] >> count[i+63:i]) ELSE dst[i+63:i] := 0 FI ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Shift Shift packed 64-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst". FOR j := 0 to 7 i := j*64 IF count[i+63:i] < 64 dst[i+63:i] := ZeroExtend64(a[i+63:i] >> count[i+63:i]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
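The logical right shifts above differ from the arithmetic ones only in what is shifted in: zeros rather than sign bits, so an oversized count always gives zero. A short scalar sketch for 64-bit lanes, with illustrative names and non-negative counts assumed:

```python
MASK64 = (1 << 64) - 1

def srl64(x, count):
    """Logical right shift of a 64-bit value; counts of 64 or more give 0."""
    return (x & MASK64) >> count if count < 64 else 0

def srlv_epi64(a, count, k=0xFF, src=None):
    """Hypothetical model of the per-element form.
    Inactive lanes are copied from src if given (writemask), else zeroed."""
    out = []
    for j in range(8):
        if (k >> j) & 1:
            out.append(srl64(a[j], count[j]))
        else:
            out.append(0 if src is None else src[j] & MASK64)
    return out
```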
Integer AVX512F Arithmetic Subtract packed 32-bit integers in "b" from packed 32-bit integers in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := a[i+31:i] - b[i+31:i] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Arithmetic Subtract packed 64-bit integers in "b" from packed 64-bit integers in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := a[i+63:i] - b[i+63:i] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Arithmetic Subtract packed 64-bit integers in "b" from packed 64-bit integers in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := a[i+63:i] - b[i+63:i] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Arithmetic Subtract packed 64-bit integers in "b" from packed 64-bit integers in "a", and store the results in "dst". FOR j := 0 to 7 i := j*64 dst[i+63:i] := a[i+63:i] - b[i+63:i] ENDFOR dst[MAX:512] := 0
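The subtract records above store each difference in a fixed-width lane. The sketch below assumes the usual two's-complement wraparound (the result is kept modulo 2^64), which the pseudocode implies by writing into a 64-bit field but does not state explicitly. `sub_epi64` is an illustrative name.

```python
MASK64 = (1 << 64) - 1

def sub_epi64(a, b, k=0xFF, src=None):
    """Hypothetical model of the 8-lane 64-bit subtract.
    Inactive lanes are copied from src if given (writemask), else zeroed."""
    out = []
    for j in range(8):
        if (k >> j) & 1:
            out.append((a[j] - b[j]) & MASK64)   # wrap to 64 bits
        elif src is not None:
            out.append(src[j] & MASK64)
        else:
            out.append(0)
    return out
```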
Integer AVX512F Logical Bitwise ternary logic that can implement any three-input Boolean function; the specific function is specified by the value in "imm8". For each bit in each packed 32-bit integer, the corresponding bits from "src", "a", and "b" are used to form a 3-bit index into "imm8", and the value at that bit in "imm8" is written to the corresponding bit in "dst" using writemask "k" at 32-bit granularity (32-bit elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] FOR h := 0 to 31 index[2:0] := (src[i+h] << 2) OR (a[i+h] << 1) OR b[i+h] dst[i+h] := imm8[index[2:0]] ENDFOR ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Logical Bitwise ternary logic that can implement any three-input Boolean function; the specific function is specified by the value in "imm8". For each bit in each packed 32-bit integer, the corresponding bits from "a", "b", and "c" are used to form a 3-bit index into "imm8", and the value at that bit in "imm8" is written to the corresponding bit in "dst" using zeromask "k" at 32-bit granularity (32-bit elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] FOR h := 0 to 31 index[2:0] := (a[i+h] << 2) OR (b[i+h] << 1) OR c[i+h] dst[i+h] := imm8[index[2:0]] ENDFOR ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Logical Bitwise ternary logic that can implement any three-input Boolean function; the specific function is specified by the value in "imm8". For each bit in each packed 32-bit integer, the corresponding bits from "a", "b", and "c" are used to form a 3-bit index into "imm8", and the value at that bit in "imm8" is written to the corresponding bit in "dst". FOR j := 0 to 15 i := j*32 FOR h := 0 to 31 index[2:0] := (a[i+h] << 2) OR (b[i+h] << 1) OR c[i+h] dst[i+h] := imm8[index[2:0]] ENDFOR ENDFOR dst[MAX:512] := 0
Integer AVX512F Logical Bitwise ternary logic that can implement any three-input Boolean function; the specific function is specified by the value in "imm8". For each bit in each packed 64-bit integer, the corresponding bits from "src", "a", and "b" are used to form a 3-bit index into "imm8", and the value at that bit in "imm8" is written to the corresponding bit in "dst" using writemask "k" at 64-bit granularity (64-bit elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] FOR h := 0 to 63 index[2:0] := (src[i+h] << 2) OR (a[i+h] << 1) OR b[i+h] dst[i+h] := imm8[index[2:0]] ENDFOR ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Logical Bitwise ternary logic that can implement any three-input Boolean function; the specific function is specified by the value in "imm8". For each bit in each packed 64-bit integer, the corresponding bits from "a", "b", and "c" are used to form a 3-bit index into "imm8", and the value at that bit in "imm8" is written to the corresponding bit in "dst" using zeromask "k" at 64-bit granularity (64-bit elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] FOR h := 0 to 63 index[2:0] := (a[i+h] << 2) OR (b[i+h] << 1) OR c[i+h] dst[i+h] := imm8[index[2:0]] ENDFOR ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Logical Bitwise ternary logic that can implement any three-input Boolean function; the specific function is specified by the value in "imm8". For each bit in each packed 64-bit integer, the corresponding bits from "a", "b", and "c" are used to form a 3-bit index into "imm8", and the value at that bit in "imm8" is written to the corresponding bit in "dst". FOR j := 0 to 7 i := j*64 FOR h := 0 to 63 index[2:0] := (a[i+h] << 2) OR (b[i+h] << 1) OR c[i+h] dst[i+h] := imm8[index[2:0]] ENDFOR ENDFOR dst[MAX:512] := 0
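The ternary-logic entries above treat "imm8" as an 8-entry truth table indexed by one bit from each of the three inputs. The sketch below models a single lane in Python and shows how an "imm8" value could be derived from a Boolean function. The names `ternary_logic` and `imm8_for` are hypothetical helpers, not part of any library, and masking is left out for brevity.

```python
def ternary_logic(a, b, c, imm8, width=32):
    """One lane of the unmasked pseudocode: per-bit lookup into imm8."""
    dst = 0
    for h in range(width):
        # The bit from a is the most significant index bit, c the least.
        index = (((a >> h) & 1) << 2) | (((b >> h) & 1) << 1) | ((c >> h) & 1)
        dst |= ((imm8 >> index) & 1) << h
    return dst


def imm8_for(f):
    """Build the 8-bit truth table for a function f(a, b, c) on single bits."""
    imm8 = 0
    for index in range(8):
        a, b, c = (index >> 2) & 1, (index >> 1) & 1, index & 1
        imm8 |= (f(a, b, c) & 1) << index
    return imm8
```

Under this indexing, a three-way XOR corresponds to `imm8 = 0x96` and a bitwise select (take `b` where `a` is 1, otherwise `c`) corresponds to `imm8 = 0xCA`. In the writemask forms, the first index bit comes from "src" rather than "a", so the same table applies with the operands shifted by one position.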
Integer Mask AVX512F Logical Compute the bitwise AND of packed 64-bit integers in "a" and "b", producing intermediate 64-bit values, and set the corresponding bit in result mask "k" (subject to writemask "k1") if the intermediate value is non-zero. FOR j := 0 to 7 i := j*64 IF k1[j] k[j] := ((a[i+63:i] AND b[i+63:i]) != 0) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:8] := 0
Integer Mask AVX512F Logical Compute the bitwise AND of packed 64-bit integers in "a" and "b", producing intermediate 64-bit values, and set the corresponding bit in result mask "k" if the intermediate value is non-zero. FOR j := 0 to 7 i := j*64 k[j] := ((a[i+63:i] AND b[i+63:i]) != 0) ? 1 : 0 ENDFOR k[MAX:8] := 0
Integer Mask AVX512F Logical Compute the bitwise AND of packed 32-bit integers in "a" and "b", producing intermediate 32-bit values, and set the corresponding bit in result mask "k" (subject to writemask "k1") if the intermediate value is zero. FOR j := 0 to 15 i := j*32 IF k1[j] k[j] := ((a[i+31:i] AND b[i+31:i]) == 0) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:16] := 0
Integer Mask AVX512F Logical Compute the bitwise AND of packed 32-bit integers in "a" and "b", producing intermediate 32-bit values, and set the corresponding bit in result mask "k" if the intermediate value is zero. FOR j := 0 to 15 i := j*32 k[j] := ((a[i+31:i] AND b[i+31:i]) == 0) ? 1 : 0 ENDFOR k[MAX:16] := 0
Integer Mask AVX512F Logical Compute the bitwise AND of packed 64-bit integers in "a" and "b", producing intermediate 64-bit values, and set the corresponding bit in result mask "k" (subject to writemask "k1") if the intermediate value is zero. FOR j := 0 to 7 i := j*64 IF k1[j] k[j] := ((a[i+63:i] AND b[i+63:i]) == 0) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:8] := 0
Integer Mask AVX512F Logical Compute the bitwise AND of packed 64-bit integers in "a" and "b", producing intermediate 64-bit values, and set the corresponding bit in result mask "k" if the intermediate value is zero. FOR j := 0 to 7 i := j*64 k[j] := ((a[i+63:i] AND b[i+63:i]) == 0) ? 1 : 0 ENDFOR k[MAX:8] := 0
Integer AVX512F Swizzle Unpack and interleave 32-bit integers from the high half of each 128-bit lane in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). DEFINE INTERLEAVE_HIGH_DWORDS(src1[127:0], src2[127:0]) { dst[31:0] := src1[95:64] dst[63:32] := src2[95:64] dst[95:64] := src1[127:96] dst[127:96] := src2[127:96] RETURN dst[127:0] } tmp_dst[127:0] := INTERLEAVE_HIGH_DWORDS(a[127:0], b[127:0]) tmp_dst[255:128] := INTERLEAVE_HIGH_DWORDS(a[255:128], b[255:128]) tmp_dst[383:256] := INTERLEAVE_HIGH_DWORDS(a[383:256], b[383:256]) tmp_dst[511:384] := INTERLEAVE_HIGH_DWORDS(a[511:384], b[511:384]) FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := tmp_dst[i+31:i] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Swizzle Unpack and interleave 32-bit integers from the high half of each 128-bit lane in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). DEFINE INTERLEAVE_HIGH_DWORDS(src1[127:0], src2[127:0]) { dst[31:0] := src1[95:64] dst[63:32] := src2[95:64] dst[95:64] := src1[127:96] dst[127:96] := src2[127:96] RETURN dst[127:0] } tmp_dst[127:0] := INTERLEAVE_HIGH_DWORDS(a[127:0], b[127:0]) tmp_dst[255:128] := INTERLEAVE_HIGH_DWORDS(a[255:128], b[255:128]) tmp_dst[383:256] := INTERLEAVE_HIGH_DWORDS(a[383:256], b[383:256]) tmp_dst[511:384] := INTERLEAVE_HIGH_DWORDS(a[511:384], b[511:384]) FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := tmp_dst[i+31:i] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Swizzle Unpack and interleave 32-bit integers from the high half of each 128-bit lane in "a" and "b", and store the results in "dst". DEFINE INTERLEAVE_HIGH_DWORDS(src1[127:0], src2[127:0]) { dst[31:0] := src1[95:64] dst[63:32] := src2[95:64] dst[95:64] := src1[127:96] dst[127:96] := src2[127:96] RETURN dst[127:0] } dst[127:0] := INTERLEAVE_HIGH_DWORDS(a[127:0], b[127:0]) dst[255:128] := INTERLEAVE_HIGH_DWORDS(a[255:128], b[255:128]) dst[383:256] := INTERLEAVE_HIGH_DWORDS(a[383:256], b[383:256]) dst[511:384] := INTERLEAVE_HIGH_DWORDS(a[511:384], b[511:384]) dst[MAX:512] := 0
Integer AVX512F Swizzle Unpack and interleave 64-bit integers from the high half of each 128-bit lane in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). DEFINE INTERLEAVE_HIGH_QWORDS(src1[127:0], src2[127:0]) { dst[63:0] := src1[127:64] dst[127:64] := src2[127:64] RETURN dst[127:0] } tmp_dst[127:0] := INTERLEAVE_HIGH_QWORDS(a[127:0], b[127:0]) tmp_dst[255:128] := INTERLEAVE_HIGH_QWORDS(a[255:128], b[255:128]) tmp_dst[383:256] := INTERLEAVE_HIGH_QWORDS(a[383:256], b[383:256]) tmp_dst[511:384] := INTERLEAVE_HIGH_QWORDS(a[511:384], b[511:384]) FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := tmp_dst[i+63:i] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Swizzle Unpack and interleave 64-bit integers from the high half of each 128-bit lane in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). DEFINE INTERLEAVE_HIGH_QWORDS(src1[127:0], src2[127:0]) { dst[63:0] := src1[127:64] dst[127:64] := src2[127:64] RETURN dst[127:0] } tmp_dst[127:0] := INTERLEAVE_HIGH_QWORDS(a[127:0], b[127:0]) tmp_dst[255:128] := INTERLEAVE_HIGH_QWORDS(a[255:128], b[255:128]) tmp_dst[383:256] := INTERLEAVE_HIGH_QWORDS(a[383:256], b[383:256]) tmp_dst[511:384] := INTERLEAVE_HIGH_QWORDS(a[511:384], b[511:384]) FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := tmp_dst[i+63:i] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Swizzle Unpack and interleave 64-bit integers from the high half of each 128-bit lane in "a" and "b", and store the results in "dst". DEFINE INTERLEAVE_HIGH_QWORDS(src1[127:0], src2[127:0]) { dst[63:0] := src1[127:64] dst[127:64] := src2[127:64] RETURN dst[127:0] } dst[127:0] := INTERLEAVE_HIGH_QWORDS(a[127:0], b[127:0]) dst[255:128] := INTERLEAVE_HIGH_QWORDS(a[255:128], b[255:128]) dst[383:256] := INTERLEAVE_HIGH_QWORDS(a[383:256], b[383:256]) dst[511:384] := INTERLEAVE_HIGH_QWORDS(a[511:384], b[511:384]) dst[MAX:512] := 0
Integer AVX512F Swizzle Unpack and interleave 32-bit integers from the low half of each 128-bit lane in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). DEFINE INTERLEAVE_DWORDS(src1[127:0], src2[127:0]) { dst[31:0] := src1[31:0] dst[63:32] := src2[31:0] dst[95:64] := src1[63:32] dst[127:96] := src2[63:32] RETURN dst[127:0] } tmp_dst[127:0] := INTERLEAVE_DWORDS(a[127:0], b[127:0]) tmp_dst[255:128] := INTERLEAVE_DWORDS(a[255:128], b[255:128]) tmp_dst[383:256] := INTERLEAVE_DWORDS(a[383:256], b[383:256]) tmp_dst[511:384] := INTERLEAVE_DWORDS(a[511:384], b[511:384]) FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := tmp_dst[i+31:i] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Swizzle Unpack and interleave 32-bit integers from the low half of each 128-bit lane in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). DEFINE INTERLEAVE_DWORDS(src1[127:0], src2[127:0]) { dst[31:0] := src1[31:0] dst[63:32] := src2[31:0] dst[95:64] := src1[63:32] dst[127:96] := src2[63:32] RETURN dst[127:0] } tmp_dst[127:0] := INTERLEAVE_DWORDS(a[127:0], b[127:0]) tmp_dst[255:128] := INTERLEAVE_DWORDS(a[255:128], b[255:128]) tmp_dst[383:256] := INTERLEAVE_DWORDS(a[383:256], b[383:256]) tmp_dst[511:384] := INTERLEAVE_DWORDS(a[511:384], b[511:384]) FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := tmp_dst[i+31:i] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Swizzle Unpack and interleave 32-bit integers from the low half of each 128-bit lane in "a" and "b", and store the results in "dst". DEFINE INTERLEAVE_DWORDS(src1[127:0], src2[127:0]) { dst[31:0] := src1[31:0] dst[63:32] := src2[31:0] dst[95:64] := src1[63:32] dst[127:96] := src2[63:32] RETURN dst[127:0] } dst[127:0] := INTERLEAVE_DWORDS(a[127:0], b[127:0]) dst[255:128] := INTERLEAVE_DWORDS(a[255:128], b[255:128]) dst[383:256] := INTERLEAVE_DWORDS(a[383:256], b[383:256]) dst[511:384] := INTERLEAVE_DWORDS(a[511:384], b[511:384]) dst[MAX:512] := 0
Integer AVX512F Swizzle Unpack and interleave 64-bit integers from the low half of each 128-bit lane in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). DEFINE INTERLEAVE_QWORDS(src1[127:0], src2[127:0]) { dst[63:0] := src1[63:0] dst[127:64] := src2[63:0] RETURN dst[127:0] } tmp_dst[127:0] := INTERLEAVE_QWORDS(a[127:0], b[127:0]) tmp_dst[255:128] := INTERLEAVE_QWORDS(a[255:128], b[255:128]) tmp_dst[383:256] := INTERLEAVE_QWORDS(a[383:256], b[383:256]) tmp_dst[511:384] := INTERLEAVE_QWORDS(a[511:384], b[511:384]) FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := tmp_dst[i+63:i] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Swizzle Unpack and interleave 64-bit integers from the low half of each 128-bit lane in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). DEFINE INTERLEAVE_QWORDS(src1[127:0], src2[127:0]) { dst[63:0] := src1[63:0] dst[127:64] := src2[63:0] RETURN dst[127:0] } tmp_dst[127:0] := INTERLEAVE_QWORDS(a[127:0], b[127:0]) tmp_dst[255:128] := INTERLEAVE_QWORDS(a[255:128], b[255:128]) tmp_dst[383:256] := INTERLEAVE_QWORDS(a[383:256], b[383:256]) tmp_dst[511:384] := INTERLEAVE_QWORDS(a[511:384], b[511:384]) FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := tmp_dst[i+63:i] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Swizzle Unpack and interleave 64-bit integers from the low half of each 128-bit lane in "a" and "b", and store the results in "dst". DEFINE INTERLEAVE_QWORDS(src1[127:0], src2[127:0]) { dst[63:0] := src1[63:0] dst[127:64] := src2[63:0] RETURN dst[127:0] } dst[127:0] := INTERLEAVE_QWORDS(a[127:0], b[127:0]) dst[255:128] := INTERLEAVE_QWORDS(a[255:128], b[255:128]) dst[383:256] := INTERLEAVE_QWORDS(a[383:256], b[383:256]) dst[511:384] := INTERLEAVE_QWORDS(a[511:384], b[511:384]) dst[MAX:512] := 0
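The unpack entries above interleave elements within each 128-bit lane independently; nothing crosses a lane boundary. The sketch below expresses the four INTERLEAVE_* helpers as one Python function over lists of elements (element 0 first). The name `unpack` and its `high` flag are hypothetical, and the masked forms would apply a per-element select to the returned list afterwards.

```python
def unpack(a, b, width, high):
    """Interleave the low or high half of every 128-bit lane of a and b.

    a, b  : lists of elements, element 0 first; length must be a multiple
            of the number of elements per 128-bit lane
    width : element size in bits (32 or 64)
    high  : True for the high half of each lane, False for the low half
    """
    per_lane = 128 // width      # 4 dwords or 2 qwords per lane
    half = per_lane // 2
    dst = []
    for base in range(0, len(a), per_lane):
        start = base + (half if high else 0)
        for e in range(start, start + half):
            dst.append(a[e])     # element from the first source
            dst.append(b[e])     # followed by the same element from the second
    return dst
```

For two lanes of 32-bit elements, `a = [0, 1, 2, 3, 4, 5, 6, 7]` and `b = [10, 11, 12, 13, 14, 15, 16, 17]` give `[2, 12, 3, 13, 6, 16, 7, 17]` for the high half and `[0, 10, 1, 11, 4, 14, 5, 15]` for the low half. A 512-bit vector has four such lanes.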
Integer AVX512F Logical Compute the bitwise XOR of packed 32-bit integers in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := a[i+31:i] XOR b[i+31:i] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Logical Compute the bitwise XOR of packed 64-bit integers in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := a[i+63:i] XOR b[i+63:i] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Elementary Math Functions Compute the approximate reciprocal of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14. FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := (1.0 / a[i+63:i]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Elementary Math Functions Compute the approximate reciprocal of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14. FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := (1.0 / a[i+63:i]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Elementary Math Functions Compute the approximate reciprocal of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst". The maximum relative error for this approximation is less than 2^-14. FOR j := 0 to 7 i := j*64 dst[i+63:i] := (1.0 / a[i+63:i]) ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Elementary Math Functions Compute the approximate reciprocal of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14. FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := (1.0 / a[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Elementary Math Functions Compute the approximate reciprocal of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14. FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := (1.0 / a[i+31:i]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Elementary Math Functions Compute the approximate reciprocal of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst". The maximum relative error for this approximation is less than 2^-14. FOR j := 0 to 15 i := j*32 dst[i+31:i] := (1.0 / a[i+31:i]) ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Elementary Math Functions Compute the approximate reciprocal of the lower double-precision (64-bit) floating-point element in "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". The maximum relative error for this approximation is less than 2^-14. IF k[0] dst[63:0] := (1.0 / b[63:0]) ELSE dst[63:0] := src[63:0] FI dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point AVX512F Elementary Math Functions Compute the approximate reciprocal of the lower double-precision (64-bit) floating-point element in "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". The maximum relative error for this approximation is less than 2^-14. IF k[0] dst[63:0] := (1.0 / b[63:0]) ELSE dst[63:0] := 0 FI dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point AVX512F Elementary Math Functions Compute the approximate reciprocal of the lower double-precision (64-bit) floating-point element in "b", store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst". The maximum relative error for this approximation is less than 2^-14. dst[63:0] := (1.0 / b[63:0]) dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point AVX512F Elementary Math Functions Compute the approximate reciprocal of the lower single-precision (32-bit) floating-point element in "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". The maximum relative error for this approximation is less than 2^-14. IF k[0] dst[31:0] := (1.0 / b[31:0]) ELSE dst[31:0] := src[31:0] FI dst[127:32] := a[127:32] dst[MAX:128] := 0
Floating Point AVX512F Elementary Math Functions Compute the approximate reciprocal of the lower single-precision (32-bit) floating-point element in "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". The maximum relative error for this approximation is less than 2^-14. IF k[0] dst[31:0] := (1.0 / b[31:0]) ELSE dst[31:0] := 0 FI dst[127:32] := a[127:32] dst[MAX:128] := 0
Floating Point AVX512F Elementary Math Functions Compute the approximate reciprocal of the lower single-precision (32-bit) floating-point element in "b", store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". The maximum relative error for this approximation is less than 2^-14. dst[31:0] := (1.0 / b[31:0]) dst[127:32] := a[127:32] dst[MAX:128] := 0
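The scalar reciprocal entries above combine three sources: the low element is computed from "b" (or taken from "src" or zero when mask bit 0 is clear), and the upper elements are copied from "a". The Python sketch below models only that data flow. It uses an exact `1.0 / x` as a stand-in for the hardware approximation, so it says nothing about the actual bit patterns returned, and it does not model zero or special inputs. The names `rcp14_scalar` and `within_rcp14_bound` are hypothetical.

```python
def rcp14_scalar(a, b, k=1, src=None):
    """Scalar form: low element from b, upper elements copied from a.

    a, b, src : lists of floats, element 0 first
    k         : mask; only bit 0 is consulted
    src       : fallback for the writemask form; None selects the zeromask form
    """
    if k & 1:
        low = 1.0 / b[0]   # exact stand-in; hardware returns an approximation
    elif src is None:
        low = 0.0          # zeromask
    else:
        low = src[0]       # writemask
    return [low] + list(a[1:])


def within_rcp14_bound(approx, x):
    """Check the documented bound: relative error of approx vs 1/x below 2^-14."""
    return abs(approx * x - 1.0) < 2.0 ** -14
```

The second helper restates the error bound as `|approx * x - 1| < 2^-14`, which is the relative error of `approx` against `1/x`. It can be used to check whether a candidate value would be an acceptable result under the stated bound.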
Floating Point AVX512F Miscellaneous Round packed double-precision (64-bit) floating-point elements in "a" to the number of fraction bits specified by "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [round_imm_note] DEFINE RoundScaleFP64(src1[63:0], imm8[7:0]) { m[63:0] := FP64(imm8[7:4]) // number of fraction bits after the binary point to be preserved tmp[63:0] := POW(2.0, -m) * ROUND(POW(2.0, m) * src1[63:0], imm8[3:0]) IF IsInf(tmp[63:0]) tmp[63:0] := src1[63:0] FI RETURN tmp[63:0] } FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := RoundScaleFP64(a[i+63:i], imm8[7:0]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Miscellaneous Round packed double-precision (64-bit) floating-point elements in "a" to the number of fraction bits specified by "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [round_imm_note][sae_note] DEFINE RoundScaleFP64(src1[63:0], imm8[7:0]) { m[63:0] := FP64(imm8[7:4]) // number of fraction bits after the binary point to be preserved tmp[63:0] := POW(2.0, -m) * ROUND(POW(2.0, m) * src1[63:0], imm8[3:0]) IF IsInf(tmp[63:0]) tmp[63:0] := src1[63:0] FI RETURN tmp[63:0] } FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := RoundScaleFP64(a[i+63:i], imm8[7:0]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Miscellaneous Round packed double-precision (64-bit) floating-point elements in "a" to the number of fraction bits specified by "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [round_imm_note] DEFINE RoundScaleFP64(src1[63:0], imm8[7:0]) { m[63:0] := FP64(imm8[7:4]) // number of fraction bits after the binary point to be preserved tmp[63:0] := POW(2.0, -m) * ROUND(POW(2.0, m) * src1[63:0], imm8[3:0]) IF IsInf(tmp[63:0]) tmp[63:0] := src1[63:0] FI RETURN tmp[63:0] } FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := RoundScaleFP64(a[i+63:i], imm8[7:0]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Miscellaneous Round packed double-precision (64-bit) floating-point elements in "a" to the number of fraction bits specified by "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [round_imm_note][sae_note] DEFINE RoundScaleFP64(src1[63:0], imm8[7:0]) { m[63:0] := FP64(imm8[7:4]) // number of fraction bits after the binary point to be preserved tmp[63:0] := POW(2.0, -m) * ROUND(POW(2.0, m) * src1[63:0], imm8[3:0]) IF IsInf(tmp[63:0]) tmp[63:0] := src1[63:0] FI RETURN tmp[63:0] } FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := RoundScaleFP64(a[i+63:i], imm8[7:0]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Miscellaneous Round packed double-precision (64-bit) floating-point elements in "a" to the number of fraction bits specified by "imm8", and store the results in "dst". [round_imm_note] DEFINE RoundScaleFP64(src1[63:0], imm8[7:0]) { m[63:0] := FP64(imm8[7:4]) // number of fraction bits after the binary point to be preserved tmp[63:0] := POW(2.0, -m) * ROUND(POW(2.0, m) * src1[63:0], imm8[3:0]) IF IsInf(tmp[63:0]) tmp[63:0] := src1[63:0] FI RETURN tmp[63:0] } FOR j := 0 to 7 i := j*64 dst[i+63:i] := RoundScaleFP64(a[i+63:i], imm8[7:0]) ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Miscellaneous Round packed double-precision (64-bit) floating-point elements in "a" to the number of fraction bits specified by "imm8", and store the results in "dst". [round_imm_note][sae_note] DEFINE RoundScaleFP64(src1[63:0], imm8[7:0]) { m[63:0] := FP64(imm8[7:4]) // number of fraction bits after the binary point to be preserved tmp[63:0] := POW(2.0, -m) * ROUND(POW(2.0, m) * src1[63:0], imm8[3:0]) IF IsInf(tmp[63:0]) tmp[63:0] := src1[63:0] FI RETURN tmp[63:0] } FOR j := 0 to 7 i := j*64 dst[i+63:i] := RoundScaleFP64(a[i+63:i], imm8[7:0]) ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Miscellaneous Round packed single-precision (32-bit) floating-point elements in "a" to the number of fraction bits specified by "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [round_imm_note] DEFINE RoundScaleFP32(src1[31:0], imm8[7:0]) { m[31:0] := FP32(imm8[7:4]) // number of fraction bits after the binary point to be preserved tmp[31:0] := POW(FP32(2.0), -m) * ROUND(POW(FP32(2.0), m) * src1[31:0], imm8[3:0]) IF IsInf(tmp[31:0]) tmp[31:0] := src1[31:0] FI RETURN tmp[31:0] } FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := RoundScaleFP32(a[i+31:i], imm8[7:0]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Miscellaneous Round packed single-precision (32-bit) floating-point elements in "a" to the number of fraction bits specified by "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [round_imm_note][sae_note] DEFINE RoundScaleFP32(src1[31:0], imm8[7:0]) { m[31:0] := FP32(imm8[7:4]) // number of fraction bits after the binary point to be preserved tmp[31:0] := POW(FP32(2.0), -m) * ROUND(POW(FP32(2.0), m) * src1[31:0], imm8[3:0]) IF IsInf(tmp[31:0]) tmp[31:0] := src1[31:0] FI RETURN tmp[31:0] } FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := RoundScaleFP32(a[i+31:i], imm8[7:0]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Miscellaneous Round packed single-precision (32-bit) floating-point elements in "a" to the number of fraction bits specified by "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [round_imm_note] DEFINE RoundScaleFP32(src1[31:0], imm8[7:0]) { m[31:0] := FP32(imm8[7:4]) // number of fraction bits after the binary point to be preserved tmp[31:0] := POW(FP32(2.0), -m) * ROUND(POW(FP32(2.0), m) * src1[31:0], imm8[3:0]) IF IsInf(tmp[31:0]) tmp[31:0] := src1[31:0] FI RETURN tmp[31:0] } FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := RoundScaleFP32(a[i+31:i], imm8[7:0]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Miscellaneous Round packed single-precision (32-bit) floating-point elements in "a" to the number of fraction bits specified by "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [round_imm_note][sae_note] DEFINE RoundScaleFP32(src1[31:0], imm8[7:0]) { m[31:0] := FP32(imm8[7:4]) // number of fraction bits after the binary point to be preserved tmp[31:0] := POW(FP32(2.0), -m) * ROUND(POW(FP32(2.0), m) * src1[31:0], imm8[3:0]) IF IsInf(tmp[31:0]) tmp[31:0] := src1[31:0] FI RETURN tmp[31:0] } FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := RoundScaleFP32(a[i+31:i], imm8[7:0]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Miscellaneous Round packed single-precision (32-bit) floating-point elements in "a" to the number of fraction bits specified by "imm8", and store the results in "dst". [round_imm_note] DEFINE RoundScaleFP32(src1[31:0], imm8[7:0]) { m[31:0] := FP32(imm8[7:4]) // number of fraction bits after the binary point to be preserved tmp[31:0] := POW(FP32(2.0), -m) * ROUND(POW(FP32(2.0), m) * src1[31:0], imm8[3:0]) IF IsInf(tmp[31:0]) tmp[31:0] := src1[31:0] FI RETURN tmp[31:0] } FOR j := 0 to 15 i := j*32 dst[i+31:i] := RoundScaleFP32(a[i+31:i], imm8[7:0]) ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Miscellaneous Round packed single-precision (32-bit) floating-point elements in "a" to the number of fraction bits specified by "imm8", and store the results in "dst". [round_imm_note][sae_note] DEFINE RoundScaleFP32(src1[31:0], imm8[7:0]) { m[31:0] := FP32(imm8[7:4]) // number of fraction bits after the binary point to be preserved tmp[31:0] := POW(FP32(2.0), -m) * ROUND(POW(FP32(2.0), m) * src1[31:0], imm8[3:0]) IF IsInf(tmp[31:0]) tmp[31:0] := src1[31:0] FI RETURN tmp[31:0] } FOR j := 0 to 15 i := j*32 dst[i+31:i] := RoundScaleFP32(a[i+31:i], imm8[7:0]) ENDFOR dst[MAX:512] := 0
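The RoundScale helper in the entries above scales the input by 2^m, rounds to an integer, and scales back, where m comes from imm8[7:4] and the rounding control from imm8[3:0]. The Python sketch below follows that pseudocode for one element. The decoding of imm8[3:0] is an assumption, since the source only references it through [round_imm_note]: bits [1:0] are taken as nearest-even, down, up, truncate; bit 2 is taken as "use the current rounding mode", modeled here as nearest-even; bit 3 (exception suppression) is ignored. The name `round_scale` is hypothetical.

```python
import math


def round_scale(x, imm8):
    """Round x to m fraction bits, m = imm8[7:4], per the pseudocode above."""
    m = (imm8 >> 4) & 0xF
    # Assumed decoding: bit 2 selects the current rounding mode, modeled as nearest-even.
    mode = 0 if imm8 & 0x4 else imm8 & 0x3
    scaled = x * 2.0 ** m
    if not math.isfinite(scaled):
        return x                    # mirrors the IsInf fallback; NaN passes through
    if mode == 0:
        r = round(scaled)           # Python rounds halves to even
    elif mode == 1:
        r = math.floor(scaled)      # toward negative infinity
    elif mode == 2:
        r = math.ceil(scaled)       # toward positive infinity
    else:
        r = math.trunc(scaled)      # toward zero
    return r * 2.0 ** -m
```

For example, `round_scale(1.3, 0x10)` keeps one fraction bit with nearest rounding and returns `1.5`, while `round_scale(1.3, 0x11)` rounds down and returns `1.0`. With `imm8 = 0x00` the function reduces to ordinary round-to-integer. The sketch does not preserve the sign of a zero result.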
Floating Point AVX512F Miscellaneous Round the lower double-precision (64-bit) floating-point element in "b" to the number of fraction bits specified by "imm8", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". [round_imm_note][sae_note] DEFINE RoundScaleFP64(src1[63:0], imm8[7:0]) { m[63:0] := FP64(imm8[7:4]) // number of fraction bits after the binary point to be preserved tmp[63:0] := POW(2.0, -m) * ROUND(POW(2.0, m) * src1[63:0], imm8[3:0]) IF IsInf(tmp[63:0]) tmp[63:0] := src1[63:0] FI RETURN tmp[63:0] } IF k[0] dst[63:0] := RoundScaleFP64(b[63:0], imm8[7:0]) ELSE dst[63:0] := src[63:0] FI dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point AVX512F Miscellaneous Round the lower double-precision (64-bit) floating-point element in "b" to the number of fraction bits specified by "imm8", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". [round_imm_note] DEFINE RoundScaleFP64(src1[63:0], imm8[7:0]) { m[63:0] := FP64(imm8[7:4]) // number of fraction bits after the binary point to be preserved tmp[63:0] := POW(2.0, -m) * ROUND(POW(2.0, m) * src1[63:0], imm8[3:0]) IF IsInf(tmp[63:0]) tmp[63:0] := src1[63:0] FI RETURN tmp[63:0] } IF k[0] dst[63:0] := RoundScaleFP64(b[63:0], imm8[7:0]) ELSE dst[63:0] := src[63:0] FI dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point AVX512F Miscellaneous Round the lower double-precision (64-bit) floating-point element in "b" to the number of fraction bits specified by "imm8", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". [round_imm_note][sae_note] DEFINE RoundScaleFP64(src1[63:0], imm8[7:0]) { m[63:0] := FP64(imm8[7:4]) // number of fraction bits after the binary point to be preserved tmp[63:0] := POW(2.0, -m) * ROUND(POW(2.0, m) * src1[63:0], imm8[3:0]) IF IsInf(tmp[63:0]) tmp[63:0] := src1[63:0] FI RETURN tmp[63:0] } IF k[0] dst[63:0] := RoundScaleFP64(b[63:0], imm8[7:0]) ELSE dst[63:0] := 0 FI dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point AVX512F Miscellaneous Round the lower double-precision (64-bit) floating-point element in "b" to the number of fraction bits specified by "imm8", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". [round_imm_note] DEFINE RoundScaleFP64(src1[63:0], imm8[7:0]) { m[63:0] := FP64(imm8[7:4]) // number of fraction bits after the binary point to be preserved tmp[63:0] := POW(2.0, -m) * ROUND(POW(2.0, m) * src1[63:0], imm8[3:0]) IF IsInf(tmp[63:0]) tmp[63:0] := src1[63:0] FI RETURN tmp[63:0] } IF k[0] dst[63:0] := RoundScaleFP64(b[63:0], imm8[7:0]) ELSE dst[63:0] := 0 FI dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point AVX512F Miscellaneous Round the lower double-precision (64-bit) floating-point element in "b" to the number of fraction bits specified by "imm8", store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst". [round_imm_note][sae_note] DEFINE RoundScaleFP64(src1[63:0], imm8[7:0]) { m[63:0] := FP64(imm8[7:4]) // number of fraction bits after the binary point to be preserved tmp[63:0] := POW(2.0, -m) * ROUND(POW(2.0, m) * src1[63:0], imm8[3:0]) IF IsInf(tmp[63:0]) tmp[63:0] := src1[63:0] FI RETURN tmp[63:0] } dst[63:0] := RoundScaleFP64(b[63:0], imm8[7:0]) dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point AVX512F Miscellaneous Round the lower double-precision (64-bit) floating-point element in "b" to the number of fraction bits specified by "imm8", store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst". [round_imm_note] DEFINE RoundScaleFP64(src1[63:0], imm8[7:0]) { m[63:0] := FP64(imm8[7:4]) // number of fraction bits after the binary point to be preserved tmp[63:0] := POW(2.0, -m) * ROUND(POW(2.0, m) * src1[63:0], imm8[3:0]) IF IsInf(tmp[63:0]) tmp[63:0] := src1[63:0] FI RETURN tmp[63:0] } dst[63:0] := RoundScaleFP64(b[63:0], imm8[7:0]) dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point AVX512F Miscellaneous Round the lower single-precision (32-bit) floating-point element in "b" to the number of fraction bits specified by "imm8", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". [round_imm_note][sae_note] DEFINE RoundScaleFP32(src1[31:0], imm8[7:0]) { m[31:0] := FP32(imm8[7:4]) // number of fraction bits after the binary point to be preserved tmp[31:0] := POW(FP32(2.0), -m) * ROUND(POW(FP32(2.0), m) * src1[31:0], imm8[3:0]) IF IsInf(tmp[31:0]) tmp[31:0] := src1[31:0] FI RETURN tmp[31:0] } IF k[0] dst[31:0] := RoundScaleFP32(b[31:0], imm8[7:0]) ELSE dst[31:0] := src[31:0] FI dst[127:32] := a[127:32] dst[MAX:128] := 0
Floating Point AVX512F Miscellaneous Round the lower single-precision (32-bit) floating-point element in "b" to the number of fraction bits specified by "imm8", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". [round_imm_note] DEFINE RoundScaleFP32(src1[31:0], imm8[7:0]) { m[31:0] := FP32(imm8[7:4]) // number of fraction bits after the binary point to be preserved tmp[31:0] := POW(FP32(2.0), -m) * ROUND(POW(FP32(2.0), m) * src1[31:0], imm8[3:0]) IF IsInf(tmp[31:0]) tmp[31:0] := src1[31:0] FI RETURN tmp[31:0] } IF k[0] dst[31:0] := RoundScaleFP32(b[31:0], imm8[7:0]) ELSE dst[31:0] := src[31:0] FI dst[127:32] := a[127:32] dst[MAX:128] := 0
Floating Point AVX512F Miscellaneous Round the lower single-precision (32-bit) floating-point element in "b" to the number of fraction bits specified by "imm8", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". [round_imm_note][sae_note] DEFINE RoundScaleFP32(src1[31:0], imm8[7:0]) { m[31:0] := FP32(imm8[7:4]) // number of fraction bits after the binary point to be preserved tmp[31:0] := POW(FP32(2.0), -m) * ROUND(POW(FP32(2.0), m) * src1[31:0], imm8[3:0]) IF IsInf(tmp[31:0]) tmp[31:0] := src1[31:0] FI RETURN tmp[31:0] } IF k[0] dst[31:0] := RoundScaleFP32(b[31:0], imm8[7:0]) ELSE dst[31:0] := 0 FI dst[127:32] := a[127:32] dst[MAX:128] := 0
Floating Point AVX512F Miscellaneous Round the lower single-precision (32-bit) floating-point element in "b" to the number of fraction bits specified by "imm8", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". [round_imm_note] DEFINE RoundScaleFP32(src1[31:0], imm8[7:0]) { m[31:0] := FP32(imm8[7:4]) // number of fraction bits after the binary point to be preserved tmp[31:0] := POW(FP32(2.0), -m) * ROUND(POW(FP32(2.0), m) * src1[31:0], imm8[3:0]) IF IsInf(tmp[31:0]) tmp[31:0] := src1[31:0] FI RETURN tmp[31:0] } IF k[0] dst[31:0] := RoundScaleFP32(b[31:0], imm8[7:0]) ELSE dst[31:0] := 0 FI dst[127:32] := a[127:32] dst[MAX:128] := 0
Floating Point AVX512F Miscellaneous Round the lower single-precision (32-bit) floating-point element in "b" to the number of fraction bits specified by "imm8", store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". [round_imm_note][sae_note] DEFINE RoundScaleFP32(src1[31:0], imm8[7:0]) { m[31:0] := FP32(imm8[7:4]) // number of fraction bits after the binary point to be preserved tmp[31:0] := POW(FP32(2.0), -m) * ROUND(POW(FP32(2.0), m) * src1[31:0], imm8[3:0]) IF IsInf(tmp[31:0]) tmp[31:0] := src1[31:0] FI RETURN tmp[31:0] } dst[31:0] := RoundScaleFP32(b[31:0], imm8[7:0]) dst[127:32] := a[127:32] dst[MAX:128] := 0
Floating Point AVX512F Miscellaneous Round the lower single-precision (32-bit) floating-point element in "b" to the number of fraction bits specified by "imm8", store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". [round_imm_note] DEFINE RoundScaleFP32(src1[31:0], imm8[7:0]) { m[31:0] := FP32(imm8[7:4]) // number of fraction bits after the binary point to be preserved tmp[31:0] := POW(FP32(2.0), -m) * ROUND(POW(FP32(2.0), m) * src1[31:0], imm8[3:0]) IF IsInf(tmp[31:0]) tmp[31:0] := src1[31:0] FI RETURN tmp[31:0] } dst[31:0] := RoundScaleFP32(b[31:0], imm8[7:0]) dst[127:32] := a[127:32] dst[MAX:128] := 0
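The RoundScale helper used in the records above can be modeled in scalar form. The following is a minimal sketch, not the hardware definition: the name `round_scale` is hypothetical, and the decoding of the low bits of "imm8" (0 = nearest-even, 1 = toward negative infinity, 2 = toward positive infinity, 3 = truncate) is an assumption taken from the usual `_MM_FROUND_*` encoding. The MXCSR-controlled mode and exception suppression are not modeled.

```python
import math

def round_scale(x: float, imm8: int) -> float:
    """Scalar model of RoundScale: round x keeping imm8[7:4] fraction bits."""
    m = (imm8 >> 4) & 0xF        # number of fraction bits to preserve
    mode = imm8 & 0x3            # assumed rounding-mode encoding (see lead-in)
    if math.isnan(x) or math.isinf(x):
        return x
    try:
        scaled = math.ldexp(x, m)    # x * 2^m
    except OverflowError:
        return x                     # mirrors "IF IsInf(tmp) tmp := src1"
    if mode == 0:
        r = round(scaled)            # Python rounds halves to even
    elif mode == 1:
        r = math.floor(scaled)
    elif mode == 2:
        r = math.ceil(scaled)
    else:
        r = math.trunc(scaled)
    # 2^-m * r, keeping the sign of the input so that -0.0 survives
    return math.copysign(math.ldexp(float(r), -m), x)

print(round_scale(1.3, 0x10))   # one fraction bit, nearest-even
```

With `imm8 = 0x10` the value is rounded to a multiple of 0.5; with `imm8 = 0x00` it is rounded to an integer.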
Floating Point AVX512F Elementary Math Functions Compute the approximate reciprocal square root of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14. FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := (1.0 / SQRT(a[i+63:i])) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Elementary Math Functions Compute the approximate reciprocal square root of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14. FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := (1.0 / SQRT(a[i+63:i])) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Elementary Math Functions Compute the approximate reciprocal square root of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst". The maximum relative error for this approximation is less than 2^-14. FOR j := 0 to 7 i := j*64 dst[i+63:i] := (1.0 / SQRT(a[i+63:i])) ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Elementary Math Functions Compute the approximate reciprocal square root of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14. FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := (1.0 / SQRT(a[i+31:i])) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Elementary Math Functions Compute the approximate reciprocal square root of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14. FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := (1.0 / SQRT(a[i+31:i])) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Elementary Math Functions Compute the approximate reciprocal square root of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst". The maximum relative error for this approximation is less than 2^-14. FOR j := 0 to 15 i := j*32 dst[i+31:i] := (1.0 / SQRT(a[i+31:i])) ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Elementary Math Functions Compute the approximate reciprocal square root of the lower double-precision (64-bit) floating-point element in "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". The maximum relative error for this approximation is less than 2^-14. IF k[0] dst[63:0] := (1.0 / SQRT(b[63:0])) ELSE dst[63:0] := src[63:0] FI dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point AVX512F Elementary Math Functions Compute the approximate reciprocal square root of the lower double-precision (64-bit) floating-point element in "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". The maximum relative error for this approximation is less than 2^-14. IF k[0] dst[63:0] := (1.0 / SQRT(b[63:0])) ELSE dst[63:0] := 0 FI dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point AVX512F Elementary Math Functions Compute the approximate reciprocal square root of the lower double-precision (64-bit) floating-point element in "b", store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst". The maximum relative error for this approximation is less than 2^-14. dst[63:0] := (1.0 / SQRT(b[63:0])) dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point AVX512F Elementary Math Functions Compute the approximate reciprocal square root of the lower single-precision (32-bit) floating-point element in "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". The maximum relative error for this approximation is less than 2^-14. IF k[0] dst[31:0] := (1.0 / SQRT(b[31:0])) ELSE dst[31:0] := src[31:0] FI dst[127:32] := a[127:32] dst[MAX:128] := 0
Floating Point AVX512F Elementary Math Functions Compute the approximate reciprocal square root of the lower single-precision (32-bit) floating-point element in "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". The maximum relative error for this approximation is less than 2^-14. IF k[0] dst[31:0] := (1.0 / SQRT(b[31:0])) ELSE dst[31:0] := 0 FI dst[127:32] := a[127:32] dst[MAX:128] := 0
Floating Point AVX512F Elementary Math Functions Compute the approximate reciprocal square root of the lower single-precision (32-bit) floating-point element in "b", store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". The maximum relative error for this approximation is less than 2^-14. dst[31:0] := (1.0 / SQRT(b[31:0])) dst[127:32] := a[127:32] dst[MAX:128] := 0
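The difference between the writemask, zeromask, and unmasked forms of the reciprocal square root records can be illustrated with a lane-wise model. This is a sketch under stated assumptions: `rsqrt_lanes` is a hypothetical name, vectors are plain Python lists, and the result is the exact 1/sqrt rather than the hardware approximation (which may differ by a relative error below 2^-14). Zero, negative, and NaN inputs are not modeled.

```python
import math

def rsqrt_lanes(a, k=None, src=None):
    """Lane-wise 1/sqrt(a[j]).

    k   -- integer bitmask; None means every lane is computed
    src -- fallback lanes for writemask behavior; None means zeromask
    """
    out = []
    for j, x in enumerate(a):
        if k is None or (k >> j) & 1:
            out.append(1.0 / math.sqrt(x))
        elif src is not None:
            out.append(src[j])     # writemask: copy the lane from src
        else:
            out.append(0.0)        # zeromask: zero the lane
    return out

print(rsqrt_lanes([4.0, 16.0], k=0b01, src=[9.0, 9.0]))
```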
Floating Point AVX512F Miscellaneous Scale the packed double-precision (64-bit) floating-point elements in "a" using values from "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). DEFINE SCALE(src1, src2) { IF (src2 == NaN) IF (src2 == SNaN) RETURN QNAN(src2) FI ELSE IF (src1 == NaN) IF (src1 == SNaN) RETURN QNAN(src1) FI IF (src2 != INF) RETURN QNAN(src1) FI ELSE tmp_src2 := src2 tmp_src1 := src1 IF (IS_DENORMAL(src2) AND MXCSR.DAZ) tmp_src2 := 0 FI IF (IS_DENORMAL(src1) AND MXCSR.DAZ) tmp_src1 := 0 FI FI dst[63:0] := tmp_src1[63:0] * POW(2.0, FLOOR(tmp_src2[63:0])) RETURN dst[63:0] } FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := SCALE(a[i+63:i], b[i+63:i]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Miscellaneous Scale the packed double-precision (64-bit) floating-point elements in "a" using values from "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [round_note] DEFINE SCALE(src1, src2) { IF (src2 == NaN) IF (src2 == SNaN) RETURN QNAN(src2) FI ELSE IF (src1 == NaN) IF (src1 == SNaN) RETURN QNAN(src1) FI IF (src2 != INF) RETURN QNAN(src1) FI ELSE tmp_src2 := src2 tmp_src1 := src1 IF (IS_DENORMAL(src2) AND MXCSR.DAZ) tmp_src2 := 0 FI IF (IS_DENORMAL(src1) AND MXCSR.DAZ) tmp_src1 := 0 FI FI dst[63:0] := tmp_src1[63:0] * POW(2.0, FLOOR(tmp_src2[63:0])) RETURN dst[63:0] } FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := SCALE(a[i+63:i], b[i+63:i]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Miscellaneous Scale the packed double-precision (64-bit) floating-point elements in "a" using values from "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). DEFINE SCALE(src1, src2) { IF (src2 == NaN) IF (src2 == SNaN) RETURN QNAN(src2) FI ELSE IF (src1 == NaN) IF (src1 == SNaN) RETURN QNAN(src1) FI IF (src2 != INF) RETURN QNAN(src1) FI ELSE tmp_src2 := src2 tmp_src1 := src1 IF (IS_DENORMAL(src2) AND MXCSR.DAZ) tmp_src2 := 0 FI IF (IS_DENORMAL(src1) AND MXCSR.DAZ) tmp_src1 := 0 FI FI dst[63:0] := tmp_src1[63:0] * POW(2.0, FLOOR(tmp_src2[63:0])) RETURN dst[63:0] } FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := SCALE(a[i+63:i], b[i+63:i]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Miscellaneous Scale the packed double-precision (64-bit) floating-point elements in "a" using values from "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [round_note] DEFINE SCALE(src1, src2) { IF (src2 == NaN) IF (src2 == SNaN) RETURN QNAN(src2) FI ELSE IF (src1 == NaN) IF (src1 == SNaN) RETURN QNAN(src1) FI IF (src2 != INF) RETURN QNAN(src1) FI ELSE tmp_src2 := src2 tmp_src1 := src1 IF (IS_DENORMAL(src2) AND MXCSR.DAZ) tmp_src2 := 0 FI IF (IS_DENORMAL(src1) AND MXCSR.DAZ) tmp_src1 := 0 FI FI dst[63:0] := tmp_src1[63:0] * POW(2.0, FLOOR(tmp_src2[63:0])) RETURN dst[63:0] } FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := SCALE(a[i+63:i], b[i+63:i]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Miscellaneous Scale the packed double-precision (64-bit) floating-point elements in "a" using values from "b", and store the results in "dst". DEFINE SCALE(src1, src2) { IF (src2 == NaN) IF (src2 == SNaN) RETURN QNAN(src2) FI ELSE IF (src1 == NaN) IF (src1 == SNaN) RETURN QNAN(src1) FI IF (src2 != INF) RETURN QNAN(src1) FI ELSE tmp_src2 := src2 tmp_src1 := src1 IF (IS_DENORMAL(src2) AND MXCSR.DAZ) tmp_src2 := 0 FI IF (IS_DENORMAL(src1) AND MXCSR.DAZ) tmp_src1 := 0 FI FI dst[63:0] := tmp_src1[63:0] * POW(2.0, FLOOR(tmp_src2[63:0])) RETURN dst[63:0] } FOR j := 0 to 7 i := j*64 dst[i+63:i] := SCALE(a[i+63:i], b[i+63:i]) ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Miscellaneous Scale the packed double-precision (64-bit) floating-point elements in "a" using values from "b", and store the results in "dst". [round_note] DEFINE SCALE(src1, src2) { IF (src2 == NaN) IF (src2 == SNaN) RETURN QNAN(src2) FI ELSE IF (src1 == NaN) IF (src1 == SNaN) RETURN QNAN(src1) FI IF (src2 != INF) RETURN QNAN(src1) FI ELSE tmp_src2 := src2 tmp_src1 := src1 IF (IS_DENORMAL(src2) AND MXCSR.DAZ) tmp_src2 := 0 FI IF (IS_DENORMAL(src1) AND MXCSR.DAZ) tmp_src1 := 0 FI FI dst[63:0] := tmp_src1[63:0] * POW(2.0, FLOOR(tmp_src2[63:0])) RETURN dst[63:0] } FOR j := 0 to 7 i := j*64 dst[i+63:i] := SCALE(a[i+63:i], b[i+63:i]) ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Miscellaneous Scale the packed single-precision (32-bit) floating-point elements in "a" using values from "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). DEFINE SCALE(src1, src2) { IF (src2 == NaN) IF (src2 == SNaN) RETURN QNAN(src2) FI ELSE IF (src1 == NaN) IF (src1 == SNaN) RETURN QNAN(src1) FI IF (src2 != INF) RETURN QNAN(src1) FI ELSE tmp_src2 := src2 tmp_src1 := src1 IF (IS_DENORMAL(src2) AND MXCSR.DAZ) tmp_src2 := 0 FI IF (IS_DENORMAL(src1) AND MXCSR.DAZ) tmp_src1 := 0 FI FI dst[31:0] := tmp_src1[31:0] * POW(2.0, FLOOR(tmp_src2[31:0])) RETURN dst[31:0] } FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := SCALE(a[i+31:i], b[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Miscellaneous Scale the packed single-precision (32-bit) floating-point elements in "a" using values from "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [round_note] DEFINE SCALE(src1, src2) { IF (src2 == NaN) IF (src2 == SNaN) RETURN QNAN(src2) FI ELSE IF (src1 == NaN) IF (src1 == SNaN) RETURN QNAN(src1) FI IF (src2 != INF) RETURN QNAN(src1) FI ELSE tmp_src2 := src2 tmp_src1 := src1 IF (IS_DENORMAL(src2) AND MXCSR.DAZ) tmp_src2 := 0 FI IF (IS_DENORMAL(src1) AND MXCSR.DAZ) tmp_src1 := 0 FI FI dst[31:0] := tmp_src1[31:0] * POW(2.0, FLOOR(tmp_src2[31:0])) RETURN dst[31:0] } FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := SCALE(a[i+31:i], b[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Miscellaneous Scale the packed single-precision (32-bit) floating-point elements in "a" using values from "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). DEFINE SCALE(src1, src2) { IF (src2 == NaN) IF (src2 == SNaN) RETURN QNAN(src2) FI ELSE IF (src1 == NaN) IF (src1 == SNaN) RETURN QNAN(src1) FI IF (src2 != INF) RETURN QNAN(src1) FI ELSE tmp_src2 := src2 tmp_src1 := src1 IF (IS_DENORMAL(src2) AND MXCSR.DAZ) tmp_src2 := 0 FI IF (IS_DENORMAL(src1) AND MXCSR.DAZ) tmp_src1 := 0 FI FI dst[31:0] := tmp_src1[31:0] * POW(2.0, FLOOR(tmp_src2[31:0])) RETURN dst[31:0] } FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := SCALE(a[i+31:i], b[i+31:i]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Miscellaneous Scale the packed single-precision (32-bit) floating-point elements in "a" using values from "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [round_note] DEFINE SCALE(src1, src2) { IF (src2 == NaN) IF (src2 == SNaN) RETURN QNAN(src2) FI ELSE IF (src1 == NaN) IF (src1 == SNaN) RETURN QNAN(src1) FI IF (src2 != INF) RETURN QNAN(src1) FI ELSE tmp_src2 := src2 tmp_src1 := src1 IF (IS_DENORMAL(src2) AND MXCSR.DAZ) tmp_src2 := 0 FI IF (IS_DENORMAL(src1) AND MXCSR.DAZ) tmp_src1 := 0 FI FI dst[31:0] := tmp_src1[31:0] * POW(2.0, FLOOR(tmp_src2[31:0])) RETURN dst[31:0] } FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := SCALE(a[i+31:i], b[i+31:i]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Miscellaneous Scale the packed single-precision (32-bit) floating-point elements in "a" using values from "b", and store the results in "dst". DEFINE SCALE(src1, src2) { IF (src2 == NaN) IF (src2 == SNaN) RETURN QNAN(src2) FI ELSE IF (src1 == NaN) IF (src1 == SNaN) RETURN QNAN(src1) FI IF (src2 != INF) RETURN QNAN(src1) FI ELSE tmp_src2 := src2 tmp_src1 := src1 IF (IS_DENORMAL(src2) AND MXCSR.DAZ) tmp_src2 := 0 FI IF (IS_DENORMAL(src1) AND MXCSR.DAZ) tmp_src1 := 0 FI FI dst[31:0] := tmp_src1[31:0] * POW(2.0, FLOOR(tmp_src2[31:0])) RETURN dst[31:0] } FOR j := 0 to 15 i := j*32 dst[i+31:i] := SCALE(a[i+31:i], b[i+31:i]) ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Miscellaneous Scale the packed single-precision (32-bit) floating-point elements in "a" using values from "b", and store the results in "dst". [round_note] DEFINE SCALE(src1, src2) { IF (src2 == NaN) IF (src2 == SNaN) RETURN QNAN(src2) FI ELSE IF (src1 == NaN) IF (src1 == SNaN) RETURN QNAN(src1) FI IF (src2 != INF) RETURN QNAN(src1) FI ELSE tmp_src2 := src2 tmp_src1 := src1 IF (IS_DENORMAL(src2) AND MXCSR.DAZ) tmp_src2 := 0 FI IF (IS_DENORMAL(src1) AND MXCSR.DAZ) tmp_src1 := 0 FI FI dst[31:0] := tmp_src1[31:0] * POW(2.0, FLOOR(tmp_src2[31:0])) RETURN dst[31:0] } FOR j := 0 to 15 i := j*32 dst[i+31:i] := SCALE(a[i+31:i], b[i+31:i]) ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Miscellaneous Scale the lower double-precision (64-bit) floating-point element in "a" using the lower element of "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". [round_note] DEFINE SCALE(src1, src2) { IF (src2 == NaN) IF (src2 == SNaN) RETURN QNAN(src2) FI ELSE IF (src1 == NaN) IF (src1 == SNaN) RETURN QNAN(src1) FI IF (src2 != INF) RETURN QNAN(src1) FI ELSE tmp_src2 := src2 tmp_src1 := src1 IF (IS_DENORMAL(src2) AND MXCSR.DAZ) tmp_src2 := 0 FI IF (IS_DENORMAL(src1) AND MXCSR.DAZ) tmp_src1 := 0 FI FI dst[63:0] := tmp_src1[63:0] * POW(2.0, FLOOR(tmp_src2[63:0])) RETURN dst[63:0] } IF k[0] dst[63:0] := SCALE(a[63:0], b[63:0]) ELSE dst[63:0] := src[63:0] FI dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point AVX512F Miscellaneous Scale the lower double-precision (64-bit) floating-point element in "a" using the lower element of "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". DEFINE SCALE(src1, src2) { IF (src2 == NaN) IF (src2 == SNaN) RETURN QNAN(src2) FI ELSE IF (src1 == NaN) IF (src1 == SNaN) RETURN QNAN(src1) FI IF (src2 != INF) RETURN QNAN(src1) FI ELSE tmp_src2 := src2 tmp_src1 := src1 IF (IS_DENORMAL(src2) AND MXCSR.DAZ) tmp_src2 := 0 FI IF (IS_DENORMAL(src1) AND MXCSR.DAZ) tmp_src1 := 0 FI FI dst[63:0] := tmp_src1[63:0] * POW(2.0, FLOOR(tmp_src2[63:0])) RETURN dst[63:0] } IF k[0] dst[63:0] := SCALE(a[63:0], b[63:0]) ELSE dst[63:0] := src[63:0] FI dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point AVX512F Miscellaneous Scale the lower double-precision (64-bit) floating-point element in "a" using the lower element of "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". [round_note] DEFINE SCALE(src1, src2) { IF (src2 == NaN) IF (src2 == SNaN) RETURN QNAN(src2) FI ELSE IF (src1 == NaN) IF (src1 == SNaN) RETURN QNAN(src1) FI IF (src2 != INF) RETURN QNAN(src1) FI ELSE tmp_src2 := src2 tmp_src1 := src1 IF (IS_DENORMAL(src2) AND MXCSR.DAZ) tmp_src2 := 0 FI IF (IS_DENORMAL(src1) AND MXCSR.DAZ) tmp_src1 := 0 FI FI dst[63:0] := tmp_src1[63:0] * POW(2.0, FLOOR(tmp_src2[63:0])) RETURN dst[63:0] } IF k[0] dst[63:0] := SCALE(a[63:0], b[63:0]) ELSE dst[63:0] := 0 FI dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point AVX512F Miscellaneous Scale the lower double-precision (64-bit) floating-point element in "a" using the lower element of "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". DEFINE SCALE(src1, src2) { IF (src2 == NaN) IF (src2 == SNaN) RETURN QNAN(src2) FI ELSE IF (src1 == NaN) IF (src1 == SNaN) RETURN QNAN(src1) FI IF (src2 != INF) RETURN QNAN(src1) FI ELSE tmp_src2 := src2 tmp_src1 := src1 IF (IS_DENORMAL(src2) AND MXCSR.DAZ) tmp_src2 := 0 FI IF (IS_DENORMAL(src1) AND MXCSR.DAZ) tmp_src1 := 0 FI FI dst[63:0] := tmp_src1[63:0] * POW(2.0, FLOOR(tmp_src2[63:0])) RETURN dst[63:0] } IF k[0] dst[63:0] := SCALE(a[63:0], b[63:0]) ELSE dst[63:0] := 0 FI dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point AVX512F Miscellaneous Scale the lower double-precision (64-bit) floating-point element in "a" using the lower element of "b", store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst". [round_note] DEFINE SCALE(src1, src2) { IF (src2 == NaN) IF (src2 == SNaN) RETURN QNAN(src2) FI ELSE IF (src1 == NaN) IF (src1 == SNaN) RETURN QNAN(src1) FI IF (src2 != INF) RETURN QNAN(src1) FI ELSE tmp_src2 := src2 tmp_src1 := src1 IF (IS_DENORMAL(src2) AND MXCSR.DAZ) tmp_src2 := 0 FI IF (IS_DENORMAL(src1) AND MXCSR.DAZ) tmp_src1 := 0 FI FI dst[63:0] := tmp_src1[63:0] * POW(2.0, FLOOR(tmp_src2[63:0])) RETURN dst[63:0] } dst[63:0] := SCALE(a[63:0], b[63:0]) dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point AVX512F Miscellaneous Scale the lower double-precision (64-bit) floating-point element in "a" using the lower element of "b", store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst". DEFINE SCALE(src1, src2) { IF (src2 == NaN) IF (src2 == SNaN) RETURN QNAN(src2) FI ELSE IF (src1 == NaN) IF (src1 == SNaN) RETURN QNAN(src1) FI IF (src2 != INF) RETURN QNAN(src1) FI ELSE tmp_src2 := src2 tmp_src1 := src1 IF (IS_DENORMAL(src2) AND MXCSR.DAZ) tmp_src2 := 0 FI IF (IS_DENORMAL(src1) AND MXCSR.DAZ) tmp_src1 := 0 FI FI dst[63:0] := tmp_src1[63:0] * POW(2.0, FLOOR(tmp_src2[63:0])) RETURN dst[63:0] } dst[63:0] := SCALE(a[63:0], b[63:0]) dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point AVX512F Miscellaneous Scale the lower single-precision (32-bit) floating-point element in "a" using the lower element of "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". [round_note] DEFINE SCALE(src1, src2) { IF (src2 == NaN) IF (src2 == SNaN) RETURN QNAN(src2) FI ELSE IF (src1 == NaN) IF (src1 == SNaN) RETURN QNAN(src1) FI IF (src2 != INF) RETURN QNAN(src1) FI ELSE tmp_src2 := src2 tmp_src1 := src1 IF (IS_DENORMAL(src2) AND MXCSR.DAZ) tmp_src2 := 0 FI IF (IS_DENORMAL(src1) AND MXCSR.DAZ) tmp_src1 := 0 FI FI dst[31:0] := tmp_src1[31:0] * POW(2.0, FLOOR(tmp_src2[31:0])) RETURN dst[31:0] } IF k[0] dst[31:0] := SCALE(a[31:0], b[31:0]) ELSE dst[31:0] := src[31:0] FI dst[127:32] := a[127:32] dst[MAX:128] := 0
Floating Point AVX512F Miscellaneous Scale the lower single-precision (32-bit) floating-point element in "a" using the lower element of "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". DEFINE SCALE(src1, src2) { IF (src2 == NaN) IF (src2 == SNaN) RETURN QNAN(src2) FI ELSE IF (src1 == NaN) IF (src1 == SNaN) RETURN QNAN(src1) FI IF (src2 != INF) RETURN QNAN(src1) FI ELSE tmp_src2 := src2 tmp_src1 := src1 IF (IS_DENORMAL(src2) AND MXCSR.DAZ) tmp_src2 := 0 FI IF (IS_DENORMAL(src1) AND MXCSR.DAZ) tmp_src1 := 0 FI FI dst[31:0] := tmp_src1[31:0] * POW(2.0, FLOOR(tmp_src2[31:0])) RETURN dst[31:0] } IF k[0] dst[31:0] := SCALE(a[31:0], b[31:0]) ELSE dst[31:0] := src[31:0] FI dst[127:32] := a[127:32] dst[MAX:128] := 0
Floating Point AVX512F Miscellaneous Scale the lower single-precision (32-bit) floating-point element in "a" using the lower element of "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". [round_note] DEFINE SCALE(src1, src2) { IF (src2 == NaN) IF (src2 == SNaN) RETURN QNAN(src2) FI ELSE IF (src1 == NaN) IF (src1 == SNaN) RETURN QNAN(src1) FI IF (src2 != INF) RETURN QNAN(src1) FI ELSE tmp_src2 := src2 tmp_src1 := src1 IF (IS_DENORMAL(src2) AND MXCSR.DAZ) tmp_src2 := 0 FI IF (IS_DENORMAL(src1) AND MXCSR.DAZ) tmp_src1 := 0 FI FI dst[31:0] := tmp_src1[31:0] * POW(2.0, FLOOR(tmp_src2[31:0])) RETURN dst[31:0] } IF k[0] dst[31:0] := SCALE(a[31:0], b[31:0]) ELSE dst[31:0] := 0 FI dst[127:32] := a[127:32] dst[MAX:128] := 0
Floating Point AVX512F Miscellaneous Scale the lower single-precision (32-bit) floating-point element in "a" using the lower element of "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". DEFINE SCALE(src1, src2) { IF (src2 == NaN) IF (src2 == SNaN) RETURN QNAN(src2) FI ELSE IF (src1 == NaN) IF (src1 == SNaN) RETURN QNAN(src1) FI IF (src2 != INF) RETURN QNAN(src1) FI ELSE tmp_src2 := src2 tmp_src1 := src1 IF (IS_DENORMAL(src2) AND MXCSR.DAZ) tmp_src2 := 0 FI IF (IS_DENORMAL(src1) AND MXCSR.DAZ) tmp_src1 := 0 FI FI dst[31:0] := tmp_src1[31:0] * POW(2.0, FLOOR(tmp_src2[31:0])) RETURN dst[31:0] } IF k[0] dst[31:0] := SCALE(a[31:0], b[31:0]) ELSE dst[31:0] := 0 FI dst[127:32] := a[127:32] dst[MAX:128] := 0
Floating Point AVX512F Miscellaneous Scale the lower single-precision (32-bit) floating-point element in "a" using the lower element of "b", store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". [round_note] DEFINE SCALE(src1, src2) { IF (src2 == NaN) IF (src2 == SNaN) RETURN QNAN(src2) FI ELSE IF (src1 == NaN) IF (src1 == SNaN) RETURN QNAN(src1) FI IF (src2 != INF) RETURN QNAN(src1) FI ELSE tmp_src2 := src2 tmp_src1 := src1 IF (IS_DENORMAL(src2) AND MXCSR.DAZ) tmp_src2 := 0 FI IF (IS_DENORMAL(src1) AND MXCSR.DAZ) tmp_src1 := 0 FI FI dst[31:0] := tmp_src1[31:0] * POW(2.0, FLOOR(tmp_src2[31:0])) RETURN dst[31:0] } dst[31:0] := SCALE(a[31:0], b[31:0]) dst[127:32] := a[127:32] dst[MAX:128] := 0
Floating Point AVX512F Miscellaneous Scale the lower single-precision (32-bit) floating-point element in "a" using the lower element of "b", store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". DEFINE SCALE(src1, src2) { IF (src2 == NaN) IF (src2 == SNaN) RETURN QNAN(src2) FI ELSE IF (src1 == NaN) IF (src1 == SNaN) RETURN QNAN(src1) FI IF (src2 != INF) RETURN QNAN(src1) FI ELSE tmp_src2 := src2 tmp_src1 := src1 IF (IS_DENORMAL(src2) AND MXCSR.DAZ) tmp_src2 := 0 FI IF (IS_DENORMAL(src1) AND MXCSR.DAZ) tmp_src1 := 0 FI FI dst[31:0] := tmp_src1[31:0] * POW(2.0, FLOOR(tmp_src2[31:0])) RETURN dst[31:0] } dst[31:0] := SCALE(a[31:0], b[31:0]) dst[127:32] := a[127:32] dst[MAX:128] := 0
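The core of the SCALE helper in the records above is the product src1 * 2^FLOOR(src2). A minimal scalar sketch follows, assuming finite inputs plus plain NaN propagation. The names `scale` and `scalef_lanes` are hypothetical; signaling NaNs, infinite scale operands, and the MXCSR.DAZ handling of denormals are deliberately left out.

```python
import math

def scale(a: float, b: float) -> float:
    """Scalar model of SCALE for finite b: a * 2^floor(b)."""
    if math.isnan(a) or math.isnan(b):
        return math.nan                      # NaN in, NaN out (quieting not modeled)
    if math.isinf(b):
        raise ValueError("infinite scale operand is not modeled in this sketch")
    try:
        return math.ldexp(a, math.floor(b))  # exact power-of-two scaling
    except OverflowError:
        return math.copysign(math.inf, a)    # result too large for a double

def scalef_lanes(a, b):
    """Packed form: lane j of the result uses lane j of both a and b."""
    return [scale(x, y) for x, y in zip(a, b)]

print(scalef_lanes([3.0, 3.0], [2.7, -1.5]))
```

Because FLOOR rounds toward negative infinity, a scale operand of -1.5 gives a factor of 2^-2, not 2^-1.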
Floating Point AVX512F Store Scatter double-precision (64-bit) floating-point elements from "a" into memory using 32-bit indices. 64-bit elements are stored at addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). "scale" should be 1, 2, 4 or 8. FOR j := 0 to 7 i := j*64 m := j*32 addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8 MEM[addr+63:addr] := a[i+63:i] ENDFOR
Floating Point AVX512F Store Scatter double-precision (64-bit) floating-point elements from "a" into memory using 32-bit indices. 64-bit elements are stored at addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale") subject to mask "k" (elements are not stored when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8. FOR j := 0 to 7 i := j*64 m := j*32 IF k[j] addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8 MEM[addr+63:addr] := a[i+63:i] FI ENDFOR
Floating Point AVX512F Store Scatter double-precision (64-bit) floating-point elements from "a" into memory using 64-bit indices. 64-bit elements are stored at addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). "scale" should be 1, 2, 4 or 8. FOR j := 0 to 7 i := j*64 m := j*64 addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8 MEM[addr+63:addr] := a[i+63:i] ENDFOR
Floating Point AVX512F Store Scatter double-precision (64-bit) floating-point elements from "a" into memory using 64-bit indices. 64-bit elements are stored at addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale") subject to mask "k" (elements are not stored when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8. FOR j := 0 to 7 i := j*64 m := j*64 IF k[j] addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8 MEM[addr+63:addr] := a[i+63:i] FI ENDFOR
Floating Point AVX512F Store Scatter single-precision (32-bit) floating-point elements from "a" into memory using 64-bit indices. 32-bit elements are stored at addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). "scale" should be 1, 2, 4 or 8. FOR j := 0 to 7 i := j*32 m := j*64 addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8 MEM[addr+31:addr] := a[i+31:i] ENDFOR
Floating Point AVX512F Store Scatter single-precision (32-bit) floating-point elements from "a" into memory using 64-bit indices. 32-bit elements are stored at addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale") subject to mask "k" (elements are not stored when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8. FOR j := 0 to 7 i := j*32 m := j*64 IF k[j] addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8 MEM[addr+31:addr] := a[i+31:i] FI ENDFOR
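The scatter entries above can be modeled in scalar code. The following is a minimal sketch, not the intrinsic itself: `scatter` is a hypothetical helper, memory is a Python `bytearray`, offsets are byte offsets (reading the trailing `* 8` in the pseudocode as a byte-to-bit conversion), and indices are assumed non-negative.

```python
import struct

def scatter(mem, base, vindex, a, scale, k=None, fmt="<d"):
    """Store a[j] at byte offset base + vindex[j] * scale for each selected j.

    k is an integer bitmask; None models the unmasked form (every element stored).
    fmt is a struct format: "<d" for 64-bit elements, "<f" for 32-bit elements.
    """
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale should be 1, 2, 4 or 8")
    for j, (index, value) in enumerate(zip(vindex, a)):
        if k is None or (k >> j) & 1:
            struct.pack_into(fmt, mem, base + index * scale, value)
    return mem

# Hypothetical usage: only the low four mask bits are set, so only a[0..3] are stored.
mem = bytearray(64)  # room for eight doubles
scatter(mem, 0, [7, 6, 5, 4, 3, 2, 1, 0],
        [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0], scale=8, k=0b00001111)
stored = struct.unpack("<8d", mem)
```

Elements whose mask bit is clear leave memory untouched, which is why the first four slots of `stored` remain zero in this example.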
Floating Point AVX512F Swizzle Shuffle 128-bits (composed of 4 single-precision (32-bit) floating-point elements) selected by "imm8" from "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). DEFINE SELECT4(src, control) { CASE(control[1:0]) OF 0: tmp[127:0] := src[127:0] 1: tmp[127:0] := src[255:128] 2: tmp[127:0] := src[383:256] 3: tmp[127:0] := src[511:384] ESAC RETURN tmp[127:0] } tmp_dst[127:0] := SELECT4(a[511:0], imm8[1:0]) tmp_dst[255:128] := SELECT4(a[511:0], imm8[3:2]) tmp_dst[383:256] := SELECT4(b[511:0], imm8[5:4]) tmp_dst[511:384] := SELECT4(b[511:0], imm8[7:6]) FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := tmp_dst[i+31:i] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Swizzle Shuffle 128-bits (composed of 4 single-precision (32-bit) floating-point elements) selected by "imm8" from "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). DEFINE SELECT4(src, control) { CASE(control[1:0]) OF 0: tmp[127:0] := src[127:0] 1: tmp[127:0] := src[255:128] 2: tmp[127:0] := src[383:256] 3: tmp[127:0] := src[511:384] ESAC RETURN tmp[127:0] } tmp_dst[127:0] := SELECT4(a[511:0], imm8[1:0]) tmp_dst[255:128] := SELECT4(a[511:0], imm8[3:2]) tmp_dst[383:256] := SELECT4(b[511:0], imm8[5:4]) tmp_dst[511:384] := SELECT4(b[511:0], imm8[7:6]) FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := tmp_dst[i+31:i] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Swizzle Shuffle 128-bits (composed of 4 single-precision (32-bit) floating-point elements) selected by "imm8" from "a" and "b", and store the results in "dst". DEFINE SELECT4(src, control) { CASE(control[1:0]) OF 0: tmp[127:0] := src[127:0] 1: tmp[127:0] := src[255:128] 2: tmp[127:0] := src[383:256] 3: tmp[127:0] := src[511:384] ESAC RETURN tmp[127:0] } dst[127:0] := SELECT4(a[511:0], imm8[1:0]) dst[255:128] := SELECT4(a[511:0], imm8[3:2]) dst[383:256] := SELECT4(b[511:0], imm8[5:4]) dst[511:384] := SELECT4(b[511:0], imm8[7:6]) dst[MAX:512] := 0
Floating Point AVX512F Swizzle Shuffle 128-bits (composed of 2 double-precision (64-bit) floating-point elements) selected by "imm8" from "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). DEFINE SELECT4(src, control) { CASE(control[1:0]) OF 0: tmp[127:0] := src[127:0] 1: tmp[127:0] := src[255:128] 2: tmp[127:0] := src[383:256] 3: tmp[127:0] := src[511:384] ESAC RETURN tmp[127:0] } tmp_dst[127:0] := SELECT4(a[511:0], imm8[1:0]) tmp_dst[255:128] := SELECT4(a[511:0], imm8[3:2]) tmp_dst[383:256] := SELECT4(b[511:0], imm8[5:4]) tmp_dst[511:384] := SELECT4(b[511:0], imm8[7:6]) FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := tmp_dst[i+63:i] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Swizzle Shuffle 128-bits (composed of 2 double-precision (64-bit) floating-point elements) selected by "imm8" from "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). DEFINE SELECT4(src, control) { CASE(control[1:0]) OF 0: tmp[127:0] := src[127:0] 1: tmp[127:0] := src[255:128] 2: tmp[127:0] := src[383:256] 3: tmp[127:0] := src[511:384] ESAC RETURN tmp[127:0] } tmp_dst[127:0] := SELECT4(a[511:0], imm8[1:0]) tmp_dst[255:128] := SELECT4(a[511:0], imm8[3:2]) tmp_dst[383:256] := SELECT4(b[511:0], imm8[5:4]) tmp_dst[511:384] := SELECT4(b[511:0], imm8[7:6]) FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := tmp_dst[i+63:i] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Swizzle Shuffle 128-bits (composed of 2 double-precision (64-bit) floating-point elements) selected by "imm8" from "a" and "b", and store the results in "dst". DEFINE SELECT4(src, control) { CASE(control[1:0]) OF 0: tmp[127:0] := src[127:0] 1: tmp[127:0] := src[255:128] 2: tmp[127:0] := src[383:256] 3: tmp[127:0] := src[511:384] ESAC RETURN tmp[127:0] } dst[127:0] := SELECT4(a[511:0], imm8[1:0]) dst[255:128] := SELECT4(a[511:0], imm8[3:2]) dst[383:256] := SELECT4(b[511:0], imm8[5:4]) dst[511:384] := SELECT4(b[511:0], imm8[7:6]) dst[MAX:512] := 0
Integer AVX512F Swizzle Shuffle 128-bits (composed of 4 32-bit integers) selected by "imm8" from "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). DEFINE SELECT4(src, control) { CASE(control[1:0]) OF 0: tmp[127:0] := src[127:0] 1: tmp[127:0] := src[255:128] 2: tmp[127:0] := src[383:256] 3: tmp[127:0] := src[511:384] ESAC RETURN tmp[127:0] } tmp_dst[127:0] := SELECT4(a[511:0], imm8[1:0]) tmp_dst[255:128] := SELECT4(a[511:0], imm8[3:2]) tmp_dst[383:256] := SELECT4(b[511:0], imm8[5:4]) tmp_dst[511:384] := SELECT4(b[511:0], imm8[7:6]) FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := tmp_dst[i+31:i] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Swizzle Shuffle 128-bits (composed of 4 32-bit integers) selected by "imm8" from "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). DEFINE SELECT4(src, control) { CASE(control[1:0]) OF 0: tmp[127:0] := src[127:0] 1: tmp[127:0] := src[255:128] 2: tmp[127:0] := src[383:256] 3: tmp[127:0] := src[511:384] ESAC RETURN tmp[127:0] } tmp_dst[127:0] := SELECT4(a[511:0], imm8[1:0]) tmp_dst[255:128] := SELECT4(a[511:0], imm8[3:2]) tmp_dst[383:256] := SELECT4(b[511:0], imm8[5:4]) tmp_dst[511:384] := SELECT4(b[511:0], imm8[7:6]) FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := tmp_dst[i+31:i] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Swizzle Shuffle 128-bits (composed of 4 32-bit integers) selected by "imm8" from "a" and "b", and store the results in "dst". DEFINE SELECT4(src, control) { CASE(control[1:0]) OF 0: tmp[127:0] := src[127:0] 1: tmp[127:0] := src[255:128] 2: tmp[127:0] := src[383:256] 3: tmp[127:0] := src[511:384] ESAC RETURN tmp[127:0] } dst[127:0] := SELECT4(a[511:0], imm8[1:0]) dst[255:128] := SELECT4(a[511:0], imm8[3:2]) dst[383:256] := SELECT4(b[511:0], imm8[5:4]) dst[511:384] := SELECT4(b[511:0], imm8[7:6]) dst[MAX:512] := 0
Integer AVX512F Swizzle Shuffle 128-bits (composed of 2 64-bit integers) selected by "imm8" from "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). DEFINE SELECT4(src, control) { CASE(control[1:0]) OF 0: tmp[127:0] := src[127:0] 1: tmp[127:0] := src[255:128] 2: tmp[127:0] := src[383:256] 3: tmp[127:0] := src[511:384] ESAC RETURN tmp[127:0] } tmp_dst[127:0] := SELECT4(a[511:0], imm8[1:0]) tmp_dst[255:128] := SELECT4(a[511:0], imm8[3:2]) tmp_dst[383:256] := SELECT4(b[511:0], imm8[5:4]) tmp_dst[511:384] := SELECT4(b[511:0], imm8[7:6]) FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := tmp_dst[i+63:i] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Swizzle Shuffle 128-bits (composed of 2 64-bit integers) selected by "imm8" from "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). DEFINE SELECT4(src, control) { CASE(control[1:0]) OF 0: tmp[127:0] := src[127:0] 1: tmp[127:0] := src[255:128] 2: tmp[127:0] := src[383:256] 3: tmp[127:0] := src[511:384] ESAC RETURN tmp[127:0] } tmp_dst[127:0] := SELECT4(a[511:0], imm8[1:0]) tmp_dst[255:128] := SELECT4(a[511:0], imm8[3:2]) tmp_dst[383:256] := SELECT4(b[511:0], imm8[5:4]) tmp_dst[511:384] := SELECT4(b[511:0], imm8[7:6]) FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := tmp_dst[i+63:i] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Swizzle Shuffle 128-bits (composed of 2 64-bit integers) selected by "imm8" from "a" and "b", and store the results in "dst". DEFINE SELECT4(src, control) { CASE(control[1:0]) OF 0: tmp[127:0] := src[127:0] 1: tmp[127:0] := src[255:128] 2: tmp[127:0] := src[383:256] 3: tmp[127:0] := src[511:384] ESAC RETURN tmp[127:0] } dst[127:0] := SELECT4(a[511:0], imm8[1:0]) dst[255:128] := SELECT4(a[511:0], imm8[3:2]) dst[383:256] := SELECT4(b[511:0], imm8[5:4]) dst[511:384] := SELECT4(b[511:0], imm8[7:6]) dst[MAX:512] := 0
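The 128-bit lane shuffles above all share the same SELECT4 pattern: the two low result lanes come from "a" and the two high result lanes from "b". A minimal scalar sketch, assuming each 512-bit vector is represented as a list of four opaque 128-bit lanes (`shuffle_128bit_lanes` is a hypothetical name):

```python
def shuffle_128bit_lanes(a, b, imm8):
    """Pick result lanes 0-1 from a and lanes 2-3 from b, two control bits each."""
    return [
        a[imm8 & 3],         # imm8[1:0]
        a[(imm8 >> 2) & 3],  # imm8[3:2]
        b[(imm8 >> 4) & 3],  # imm8[5:4]
        b[(imm8 >> 6) & 3],  # imm8[7:6]
    ]

a = ["a0", "a1", "a2", "a3"]
b = ["b0", "b1", "b2", "b3"]
swapped_halves = shuffle_128bit_lanes(a, b, 0b01001110)
```

The element type (32-bit or 64-bit, float or integer) affects only the granularity of the writemask or zeromask applied afterwards, not the lane selection.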
Floating Point AVX512F Swizzle Shuffle double-precision (64-bit) floating-point elements within 128-bit lanes using the control in "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). tmp_dst[63:0] := (imm8[0] == 0) ? a[63:0] : a[127:64] tmp_dst[127:64] := (imm8[1] == 0) ? b[63:0] : b[127:64] tmp_dst[191:128] := (imm8[2] == 0) ? a[191:128] : a[255:192] tmp_dst[255:192] := (imm8[3] == 0) ? b[191:128] : b[255:192] tmp_dst[319:256] := (imm8[4] == 0) ? a[319:256] : a[383:320] tmp_dst[383:320] := (imm8[5] == 0) ? b[319:256] : b[383:320] tmp_dst[447:384] := (imm8[6] == 0) ? a[447:384] : a[511:448] tmp_dst[511:448] := (imm8[7] == 0) ? b[447:384] : b[511:448] FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := tmp_dst[i+63:i] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Swizzle Shuffle double-precision (64-bit) floating-point elements within 128-bit lanes using the control in "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). tmp_dst[63:0] := (imm8[0] == 0) ? a[63:0] : a[127:64] tmp_dst[127:64] := (imm8[1] == 0) ? b[63:0] : b[127:64] tmp_dst[191:128] := (imm8[2] == 0) ? a[191:128] : a[255:192] tmp_dst[255:192] := (imm8[3] == 0) ? b[191:128] : b[255:192] tmp_dst[319:256] := (imm8[4] == 0) ? a[319:256] : a[383:320] tmp_dst[383:320] := (imm8[5] == 0) ? b[319:256] : b[383:320] tmp_dst[447:384] := (imm8[6] == 0) ? a[447:384] : a[511:448] tmp_dst[511:448] := (imm8[7] == 0) ? b[447:384] : b[511:448] FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := tmp_dst[i+63:i] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Swizzle Shuffle double-precision (64-bit) floating-point elements within 128-bit lanes using the control in "imm8", and store the results in "dst". dst[63:0] := (imm8[0] == 0) ? a[63:0] : a[127:64] dst[127:64] := (imm8[1] == 0) ? b[63:0] : b[127:64] dst[191:128] := (imm8[2] == 0) ? a[191:128] : a[255:192] dst[255:192] := (imm8[3] == 0) ? b[191:128] : b[255:192] dst[319:256] := (imm8[4] == 0) ? a[319:256] : a[383:320] dst[383:320] := (imm8[5] == 0) ? b[319:256] : b[383:320] dst[447:384] := (imm8[6] == 0) ? a[447:384] : a[511:448] dst[511:448] := (imm8[7] == 0) ? b[447:384] : b[511:448] dst[MAX:512] := 0
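For the double-precision shuffle, each bit of "imm8" picks the low or high element of one 128-bit lane, alternating between "a" and "b". A small scalar sketch under the assumption that vectors are lists of eight values (`shuffle_pd_512` is a hypothetical name):

```python
def shuffle_pd_512(a, b, imm8):
    """Even result elements come from a, odd ones from b, one control bit each."""
    dst = []
    for lane in range(4):
        lo, hi = 2 * lane, 2 * lane + 1
        dst.append(a[lo + ((imm8 >> lo) & 1)])  # imm8[lo] selects within a's lane
        dst.append(b[lo + ((imm8 >> hi) & 1)])  # imm8[hi] selects within b's lane
    return dst

a = [0, 1, 2, 3, 4, 5, 6, 7]
b = [10, 11, 12, 13, 14, 15, 16, 17]
mixed = shuffle_pd_512(a, b, 0b10101010)
```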
Floating Point AVX512F Swizzle Shuffle single-precision (32-bit) floating-point elements in "a" and "b" within 128-bit lanes using the control in "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). DEFINE SELECT4(src, control) { CASE(control[1:0]) OF 0: tmp[31:0] := src[31:0] 1: tmp[31:0] := src[63:32] 2: tmp[31:0] := src[95:64] 3: tmp[31:0] := src[127:96] ESAC RETURN tmp[31:0] } tmp_dst[31:0] := SELECT4(a[127:0], imm8[1:0]) tmp_dst[63:32] := SELECT4(a[127:0], imm8[3:2]) tmp_dst[95:64] := SELECT4(b[127:0], imm8[5:4]) tmp_dst[127:96] := SELECT4(b[127:0], imm8[7:6]) tmp_dst[159:128] := SELECT4(a[255:128], imm8[1:0]) tmp_dst[191:160] := SELECT4(a[255:128], imm8[3:2]) tmp_dst[223:192] := SELECT4(b[255:128], imm8[5:4]) tmp_dst[255:224] := SELECT4(b[255:128], imm8[7:6]) tmp_dst[287:256] := SELECT4(a[383:256], imm8[1:0]) tmp_dst[319:288] := SELECT4(a[383:256], imm8[3:2]) tmp_dst[351:320] := SELECT4(b[383:256], imm8[5:4]) tmp_dst[383:352] := SELECT4(b[383:256], imm8[7:6]) tmp_dst[415:384] := SELECT4(a[511:384], imm8[1:0]) tmp_dst[447:416] := SELECT4(a[511:384], imm8[3:2]) tmp_dst[479:448] := SELECT4(b[511:384], imm8[5:4]) tmp_dst[511:480] := SELECT4(b[511:384], imm8[7:6]) FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := tmp_dst[i+31:i] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Swizzle Shuffle single-precision (32-bit) floating-point elements in "a" and "b" within 128-bit lanes using the control in "imm8", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). DEFINE SELECT4(src, control) { CASE(control[1:0]) OF 0: tmp[31:0] := src[31:0] 1: tmp[31:0] := src[63:32] 2: tmp[31:0] := src[95:64] 3: tmp[31:0] := src[127:96] ESAC RETURN tmp[31:0] } tmp_dst[31:0] := SELECT4(a[127:0], imm8[1:0]) tmp_dst[63:32] := SELECT4(a[127:0], imm8[3:2]) tmp_dst[95:64] := SELECT4(b[127:0], imm8[5:4]) tmp_dst[127:96] := SELECT4(b[127:0], imm8[7:6]) tmp_dst[159:128] := SELECT4(a[255:128], imm8[1:0]) tmp_dst[191:160] := SELECT4(a[255:128], imm8[3:2]) tmp_dst[223:192] := SELECT4(b[255:128], imm8[5:4]) tmp_dst[255:224] := SELECT4(b[255:128], imm8[7:6]) tmp_dst[287:256] := SELECT4(a[383:256], imm8[1:0]) tmp_dst[319:288] := SELECT4(a[383:256], imm8[3:2]) tmp_dst[351:320] := SELECT4(b[383:256], imm8[5:4]) tmp_dst[383:352] := SELECT4(b[383:256], imm8[7:6]) tmp_dst[415:384] := SELECT4(a[511:384], imm8[1:0]) tmp_dst[447:416] := SELECT4(a[511:384], imm8[3:2]) tmp_dst[479:448] := SELECT4(b[511:384], imm8[5:4]) tmp_dst[511:480] := SELECT4(b[511:384], imm8[7:6]) FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := tmp_dst[i+31:i] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Swizzle Shuffle single-precision (32-bit) floating-point elements in "a" and "b" within 128-bit lanes using the control in "imm8", and store the results in "dst". DEFINE SELECT4(src, control) { CASE(control[1:0]) OF 0: tmp[31:0] := src[31:0] 1: tmp[31:0] := src[63:32] 2: tmp[31:0] := src[95:64] 3: tmp[31:0] := src[127:96] ESAC RETURN tmp[31:0] } dst[31:0] := SELECT4(a[127:0], imm8[1:0]) dst[63:32] := SELECT4(a[127:0], imm8[3:2]) dst[95:64] := SELECT4(b[127:0], imm8[5:4]) dst[127:96] := SELECT4(b[127:0], imm8[7:6]) dst[159:128] := SELECT4(a[255:128], imm8[1:0]) dst[191:160] := SELECT4(a[255:128], imm8[3:2]) dst[223:192] := SELECT4(b[255:128], imm8[5:4]) dst[255:224] := SELECT4(b[255:128], imm8[7:6]) dst[287:256] := SELECT4(a[383:256], imm8[1:0]) dst[319:288] := SELECT4(a[383:256], imm8[3:2]) dst[351:320] := SELECT4(b[383:256], imm8[5:4]) dst[383:352] := SELECT4(b[383:256], imm8[7:6]) dst[415:384] := SELECT4(a[511:384], imm8[1:0]) dst[447:416] := SELECT4(a[511:384], imm8[3:2]) dst[479:448] := SELECT4(b[511:384], imm8[5:4]) dst[511:480] := SELECT4(b[511:384], imm8[7:6]) dst[MAX:512] := 0
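The single-precision shuffle applies the same four 2-bit selectors to every 128-bit lane: the two low results of a lane are read from one source and the two high results from the other. A scalar sketch, assuming vectors are lists of 16 values (`shuffle_ps_512` is a hypothetical name):

```python
def shuffle_ps_512(a, b, imm8):
    """Apply the four 2-bit selectors in imm8 to each group of four elements."""
    sel = [(imm8 >> shift) & 3 for shift in (0, 2, 4, 6)]
    dst = []
    for base in range(0, 16, 4):  # one iteration per 128-bit lane
        dst += [a[base + sel[0]], a[base + sel[1]],
                b[base + sel[2]], b[base + sel[3]]]
    return dst

a = list(range(16))
b = list(range(100, 116))
reversed_pairs = shuffle_ps_512(a, b, 0b00011011)
```

Because the selectors index only within the current lane, no element ever crosses a 128-bit boundary.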
Floating Point AVX512F Elementary Math Functions Compute the square root of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := SQRT(a[i+63:i]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Elementary Math Functions Compute the square root of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [round_note] FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := SQRT(a[i+63:i]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Elementary Math Functions Compute the square root of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := SQRT(a[i+63:i]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Elementary Math Functions Compute the square root of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [round_note]. FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := SQRT(a[i+63:i]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Elementary Math Functions Compute the square root of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst". FOR j := 0 to 7 i := j*64 dst[i+63:i] := SQRT(a[i+63:i]) ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Elementary Math Functions Compute the square root of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst". [round_note]. FOR j := 0 to 7 i := j*64 dst[i+63:i] := SQRT(a[i+63:i]) ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Elementary Math Functions Compute the square root of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := SQRT(a[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Elementary Math Functions Compute the square root of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [round_note] FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := SQRT(a[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Elementary Math Functions Compute the square root of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := SQRT(a[i+31:i]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Elementary Math Functions Compute the square root of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [round_note] FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := SQRT(a[i+31:i]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Elementary Math Functions Compute the square root of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst". FOR j := 0 to 15 i := j*32 dst[i+31:i] := SQRT(a[i+31:i]) ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Elementary Math Functions Compute the square root of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst". [round_note]. FOR j := 0 to 15 i := j*32 dst[i+31:i] := SQRT(a[i+31:i]) ENDFOR dst[MAX:512] := 0
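The packed square-root entries differ only in how unselected elements are filled: a writemask copies from "src", a zeromask writes zero. A minimal sketch of that distinction, assuming vectors are Python lists and `k` is an integer bitmask (`mask_sqrt` is a hypothetical helper; `math.sqrt` raises on negative input where hardware would return NaN, and rounding control is not modeled):

```python
import math

def mask_sqrt(a, k, src=None):
    """Square root per element; src given models a writemask, src=None a zeromask."""
    dst = []
    for j, x in enumerate(a):
        if (k >> j) & 1:
            dst.append(math.sqrt(x))
        else:
            dst.append(0.0 if src is None else src[j])
    return dst

a = [4.0, 9.0, 16.0, 25.0]
with_writemask = mask_sqrt(a, 0b0101, src=[-1.0] * 4)
with_zeromask = mask_sqrt(a, 0b0101)
```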
Floating Point AVX512F Elementary Math Functions Compute the square root of the lower double-precision (64-bit) floating-point element in "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". [round_note] IF k[0] dst[63:0] := SQRT(b[63:0]) ELSE dst[63:0] := src[63:0] FI dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point AVX512F Elementary Math Functions Compute the square root of the lower double-precision (64-bit) floating-point element in "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". IF k[0] dst[63:0] := SQRT(b[63:0]) ELSE dst[63:0] := src[63:0] FI dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point AVX512F Elementary Math Functions Compute the square root of the lower double-precision (64-bit) floating-point element in "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". [round_note] IF k[0] dst[63:0] := SQRT(b[63:0]) ELSE dst[63:0] := 0 FI dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point AVX512F Elementary Math Functions Compute the square root of the lower double-precision (64-bit) floating-point element in "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". IF k[0] dst[63:0] := SQRT(b[63:0]) ELSE dst[63:0] := 0 FI dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point AVX512F Elementary Math Functions Compute the square root of the lower double-precision (64-bit) floating-point element in "b", store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst". [round_note] dst[63:0] := SQRT(b[63:0]) dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point AVX512F Elementary Math Functions Compute the square root of the lower single-precision (32-bit) floating-point element in "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". [round_note] IF k[0] dst[31:0] := SQRT(b[31:0]) ELSE dst[31:0] := src[31:0] FI dst[127:32] := a[127:32] dst[MAX:128] := 0
Floating Point AVX512F Elementary Math Functions Compute the square root of the lower single-precision (32-bit) floating-point element in "b", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". IF k[0] dst[31:0] := SQRT(b[31:0]) ELSE dst[31:0] := src[31:0] FI dst[127:32] := a[127:32] dst[MAX:128] := 0
Floating Point AVX512F Elementary Math Functions Compute the square root of the lower single-precision (32-bit) floating-point element in "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". [round_note] IF k[0] dst[31:0] := SQRT(b[31:0]) ELSE dst[31:0] := 0 FI dst[127:32] := a[127:32] dst[MAX:128] := 0
Floating Point AVX512F Elementary Math Functions Compute the square root of the lower single-precision (32-bit) floating-point element in "b", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". IF k[0] dst[31:0] := SQRT(b[31:0]) ELSE dst[31:0] := 0 FI dst[127:32] := a[127:32] dst[MAX:128] := 0
Floating Point AVX512F Elementary Math Functions Compute the square root of the lower single-precision (32-bit) floating-point element in "b", store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". [round_note] dst[31:0] := SQRT(b[31:0]) dst[127:32] := a[127:32] dst[MAX:128] := 0
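In the scalar forms above, only element 0 is computed, it is taken from "b", and the upper elements pass through from "a". A small sketch of that data flow, assuming 128-bit vectors are short Python lists (`mask_sqrt_scalar` is a hypothetical helper and rounding control is not modeled):

```python
import math

def mask_sqrt_scalar(a, b, k=1, src=None):
    """Low element: sqrt(b[0]) if mask bit 0 is set, else src[0] (or 0 for a zeromask).

    Upper elements are copied from a unchanged.
    """
    if k & 1:
        low = math.sqrt(b[0])
    else:
        low = 0.0 if src is None else src[0]
    return [low] + list(a[1:])

a = [1.0, 2.0]
b = [9.0, 5.0]
computed = mask_sqrt_scalar(a, b, k=1, src=[7.0, 0.0])
kept_src = mask_sqrt_scalar(a, b, k=0, src=[7.0, 0.0])
```

Note that `b[1]` and `a[0]` never influence the result, which is easy to overlook when passing the same register for both operands.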
Floating Point AVX512F Arithmetic Subtract packed double-precision (64-bit) floating-point elements in "b" from packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := a[i+63:i] - b[i+63:i] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Arithmetic Subtract packed double-precision (64-bit) floating-point elements in "b" from packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [round_note] FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := a[i+63:i] - b[i+63:i] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Arithmetic Subtract packed single-precision (32-bit) floating-point elements in "b" from packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := a[i+31:i] - b[i+31:i] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Arithmetic Subtract packed single-precision (32-bit) floating-point elements in "b" from packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). [round_note] FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := a[i+31:i] - b[i+31:i] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
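The zeromasked packed subtraction can be sketched the same way. This is an illustrative model only: `maskz_sub_ps` and `to_f32` are hypothetical helpers, results are rounded to single precision by a round trip through `struct`, and only the default round-to-nearest behaviour is assumed (the [round_note] variants are not modeled).

```python
import struct

def to_f32(x):
    """Round a Python float (double) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def maskz_sub_ps(k, a, b):
    """a[j] - b[j] where mask bit j is set, 0.0 elsewhere."""
    return [to_f32(x - y) if (k >> j) & 1 else 0.0
            for j, (x, y) in enumerate(zip(a, b))]

low_half_only = maskz_sub_ps(0x00FF, [1.5] * 16, [0.25] * 16)
```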
Floating Point AVX512F Arithmetic Subtract the lower double-precision (64-bit) floating-point element in "b" from the lower double-precision (64-bit) floating-point element in "a", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". [round_note] IF k[0] dst[63:0] := a[63:0] - b[63:0] ELSE dst[63:0] := src[63:0] FI dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point AVX512F Arithmetic Subtract the lower double-precision (64-bit) floating-point element in "b" from the lower double-precision (64-bit) floating-point element in "a", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". IF k[0] dst[63:0] := a[63:0] - b[63:0] ELSE dst[63:0] := src[63:0] FI dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point AVX512F Arithmetic Subtract the lower double-precision (64-bit) floating-point element in "b" from the lower double-precision (64-bit) floating-point element in "a", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". [round_note] IF k[0] dst[63:0] := a[63:0] - b[63:0] ELSE dst[63:0] := 0 FI dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point AVX512F Arithmetic Subtract the lower double-precision (64-bit) floating-point element in "b" from the lower double-precision (64-bit) floating-point element in "a", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper element from "a" to the upper element of "dst". IF k[0] dst[63:0] := a[63:0] - b[63:0] ELSE dst[63:0] := 0 FI dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point AVX512F Arithmetic Subtract the lower double-precision (64-bit) floating-point element in "b" from the lower double-precision (64-bit) floating-point element in "a", store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst". [round_note] dst[63:0] := a[63:0] - b[63:0] dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point AVX512F Arithmetic Subtract the lower single-precision (32-bit) floating-point element in "b" from the lower single-precision (32-bit) floating-point element in "a", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". [round_note] IF k[0] dst[31:0] := a[31:0] - b[31:0] ELSE dst[31:0] := src[31:0] FI dst[127:32] := a[127:32] dst[MAX:128] := 0
Floating Point AVX512F Arithmetic Subtract the lower single-precision (32-bit) floating-point element in "b" from the lower single-precision (32-bit) floating-point element in "a", store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". IF k[0] dst[31:0] := a[31:0] - b[31:0] ELSE dst[31:0] := src[31:0] FI dst[127:32] := a[127:32] dst[MAX:128] := 0
Floating Point AVX512F Arithmetic Subtract the lower single-precision (32-bit) floating-point element in "b" from the lower single-precision (32-bit) floating-point element in "a", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". [round_note] IF k[0] dst[31:0] := a[31:0] - b[31:0] ELSE dst[31:0] := 0 FI dst[127:32] := a[127:32] dst[MAX:128] := 0
Floating Point AVX512F Arithmetic Subtract the lower single-precision (32-bit) floating-point element in "b" from the lower single-precision (32-bit) floating-point element in "a", store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set), and copy the upper 3 packed elements from "a" to the upper elements of "dst". IF k[0] dst[31:0] := a[31:0] - b[31:0] ELSE dst[31:0] := 0 FI dst[127:32] := a[127:32] dst[MAX:128] := 0
Floating Point AVX512F Arithmetic Subtract the lower single-precision (32-bit) floating-point element in "b" from the lower single-precision (32-bit) floating-point element in "a", store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". [round_note] dst[31:0] := a[31:0] - b[31:0] dst[127:32] := a[127:32] dst[MAX:128] := 0
Floating Point AVX512F Swizzle Unpack and interleave double-precision (64-bit) floating-point elements from the high half of each 128-bit lane in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). DEFINE INTERLEAVE_HIGH_QWORDS(src1[127:0], src2[127:0]) { dst[63:0] := src1[127:64] dst[127:64] := src2[127:64] RETURN dst[127:0] } tmp_dst[127:0] := INTERLEAVE_HIGH_QWORDS(a[127:0], b[127:0]) tmp_dst[255:128] := INTERLEAVE_HIGH_QWORDS(a[255:128], b[255:128]) tmp_dst[383:256] := INTERLEAVE_HIGH_QWORDS(a[383:256], b[383:256]) tmp_dst[511:384] := INTERLEAVE_HIGH_QWORDS(a[511:384], b[511:384]) FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := tmp_dst[i+63:i] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Swizzle Unpack and interleave double-precision (64-bit) floating-point elements from the high half of each 128-bit lane in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). DEFINE INTERLEAVE_HIGH_QWORDS(src1[127:0], src2[127:0]) { dst[63:0] := src1[127:64] dst[127:64] := src2[127:64] RETURN dst[127:0] } tmp_dst[127:0] := INTERLEAVE_HIGH_QWORDS(a[127:0], b[127:0]) tmp_dst[255:128] := INTERLEAVE_HIGH_QWORDS(a[255:128], b[255:128]) tmp_dst[383:256] := INTERLEAVE_HIGH_QWORDS(a[383:256], b[383:256]) tmp_dst[511:384] := INTERLEAVE_HIGH_QWORDS(a[511:384], b[511:384]) FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := tmp_dst[i+63:i] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Swizzle Unpack and interleave double-precision (64-bit) floating-point elements from the high half of each 128-bit lane in "a" and "b", and store the results in "dst". DEFINE INTERLEAVE_HIGH_QWORDS(src1[127:0], src2[127:0]) { dst[63:0] := src1[127:64] dst[127:64] := src2[127:64] RETURN dst[127:0] } dst[127:0] := INTERLEAVE_HIGH_QWORDS(a[127:0], b[127:0]) dst[255:128] := INTERLEAVE_HIGH_QWORDS(a[255:128], b[255:128]) dst[383:256] := INTERLEAVE_HIGH_QWORDS(a[383:256], b[383:256]) dst[511:384] := INTERLEAVE_HIGH_QWORDS(a[511:384], b[511:384]) dst[MAX:512] := 0
Floating Point AVX512F Swizzle Unpack and interleave single-precision (32-bit) floating-point elements from the high half of each 128-bit lane in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). DEFINE INTERLEAVE_HIGH_DWORDS(src1[127:0], src2[127:0]) { dst[31:0] := src1[95:64] dst[63:32] := src2[95:64] dst[95:64] := src1[127:96] dst[127:96] := src2[127:96] RETURN dst[127:0] } tmp_dst[127:0] := INTERLEAVE_HIGH_DWORDS(a[127:0], b[127:0]) tmp_dst[255:128] := INTERLEAVE_HIGH_DWORDS(a[255:128], b[255:128]) tmp_dst[383:256] := INTERLEAVE_HIGH_DWORDS(a[383:256], b[383:256]) tmp_dst[511:384] := INTERLEAVE_HIGH_DWORDS(a[511:384], b[511:384]) FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := tmp_dst[i+31:i] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Swizzle Unpack and interleave single-precision (32-bit) floating-point elements from the high half of each 128-bit lane in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). DEFINE INTERLEAVE_HIGH_DWORDS(src1[127:0], src2[127:0]) { dst[31:0] := src1[95:64] dst[63:32] := src2[95:64] dst[95:64] := src1[127:96] dst[127:96] := src2[127:96] RETURN dst[127:0] } tmp_dst[127:0] := INTERLEAVE_HIGH_DWORDS(a[127:0], b[127:0]) tmp_dst[255:128] := INTERLEAVE_HIGH_DWORDS(a[255:128], b[255:128]) tmp_dst[383:256] := INTERLEAVE_HIGH_DWORDS(a[383:256], b[383:256]) tmp_dst[511:384] := INTERLEAVE_HIGH_DWORDS(a[511:384], b[511:384]) FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := tmp_dst[i+31:i] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Swizzle Unpack and interleave single-precision (32-bit) floating-point elements from the high half of each 128-bit lane in "a" and "b", and store the results in "dst". DEFINE INTERLEAVE_HIGH_DWORDS(src1[127:0], src2[127:0]) { dst[31:0] := src1[95:64] dst[63:32] := src2[95:64] dst[95:64] := src1[127:96] dst[127:96] := src2[127:96] RETURN dst[127:0] } dst[127:0] := INTERLEAVE_HIGH_DWORDS(a[127:0], b[127:0]) dst[255:128] := INTERLEAVE_HIGH_DWORDS(a[255:128], b[255:128]) dst[383:256] := INTERLEAVE_HIGH_DWORDS(a[383:256], b[383:256]) dst[511:384] := INTERLEAVE_HIGH_DWORDS(a[511:384], b[511:384]) dst[MAX:512] := 0
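For the single-precision form, each 128-bit lane holds four elements and the upper two of each source are interleaved. A small scalar sketch of INTERLEAVE_HIGH_DWORDS, assuming vectors are modeled as 16-element Python lists with index 0 at bits 31:0 (the helper names are hypothetical):

```python
def interleave_high_dwords(src1, src2):
    # Elements 2 and 3 of the lane are bits 95:64 and 127:96.
    return [src1[2], src2[2], src1[3], src2[3]]


def unpackhi_ps(a, b):
    dst = []
    for lane in range(0, 16, 4):        # four 128-bit lanes in a 512-bit vector
        dst += interleave_high_dwords(a[lane:lane + 4], b[lane:lane + 4])
    return dst


a = [float(x) for x in range(16)]
b = [float(100 + x) for x in range(16)]
result = unpackhi_ps(a, b)
```

The interleave never crosses a lane boundary: element 2 of lane 1 (index 6) lands at index 4 of the result, not next to the high elements of lane 0.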
Floating Point AVX512F Swizzle Unpack and interleave double-precision (64-bit) floating-point elements from the low half of each 128-bit lane in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). DEFINE INTERLEAVE_QWORDS(src1[127:0], src2[127:0]) { dst[63:0] := src1[63:0] dst[127:64] := src2[63:0] RETURN dst[127:0] } tmp_dst[127:0] := INTERLEAVE_QWORDS(a[127:0], b[127:0]) tmp_dst[255:128] := INTERLEAVE_QWORDS(a[255:128], b[255:128]) tmp_dst[383:256] := INTERLEAVE_QWORDS(a[383:256], b[383:256]) tmp_dst[511:384] := INTERLEAVE_QWORDS(a[511:384], b[511:384]) FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := tmp_dst[i+63:i] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Swizzle Unpack and interleave double-precision (64-bit) floating-point elements from the low half of each 128-bit lane in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). DEFINE INTERLEAVE_QWORDS(src1[127:0], src2[127:0]) { dst[63:0] := src1[63:0] dst[127:64] := src2[63:0] RETURN dst[127:0] } tmp_dst[127:0] := INTERLEAVE_QWORDS(a[127:0], b[127:0]) tmp_dst[255:128] := INTERLEAVE_QWORDS(a[255:128], b[255:128]) tmp_dst[383:256] := INTERLEAVE_QWORDS(a[383:256], b[383:256]) tmp_dst[511:384] := INTERLEAVE_QWORDS(a[511:384], b[511:384]) FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := tmp_dst[i+63:i] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Swizzle Unpack and interleave double-precision (64-bit) floating-point elements from the low half of each 128-bit lane in "a" and "b", and store the results in "dst". DEFINE INTERLEAVE_QWORDS(src1[127:0], src2[127:0]) { dst[63:0] := src1[63:0] dst[127:64] := src2[63:0] RETURN dst[127:0] } dst[127:0] := INTERLEAVE_QWORDS(a[127:0], b[127:0]) dst[255:128] := INTERLEAVE_QWORDS(a[255:128], b[255:128]) dst[383:256] := INTERLEAVE_QWORDS(a[383:256], b[383:256]) dst[511:384] := INTERLEAVE_QWORDS(a[511:384], b[511:384]) dst[MAX:512] := 0
Floating Point AVX512F Swizzle Unpack and interleave single-precision (32-bit) floating-point elements from the low half of each 128-bit lane in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). DEFINE INTERLEAVE_DWORDS(src1[127:0], src2[127:0]) { dst[31:0] := src1[31:0] dst[63:32] := src2[31:0] dst[95:64] := src1[63:32] dst[127:96] := src2[63:32] RETURN dst[127:0] } tmp_dst[127:0] := INTERLEAVE_DWORDS(a[127:0], b[127:0]) tmp_dst[255:128] := INTERLEAVE_DWORDS(a[255:128], b[255:128]) tmp_dst[383:256] := INTERLEAVE_DWORDS(a[383:256], b[383:256]) tmp_dst[511:384] := INTERLEAVE_DWORDS(a[511:384], b[511:384]) FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := tmp_dst[i+31:i] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Swizzle Unpack and interleave single-precision (32-bit) floating-point elements from the low half of each 128-bit lane in "a" and "b", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). DEFINE INTERLEAVE_DWORDS(src1[127:0], src2[127:0]) { dst[31:0] := src1[31:0] dst[63:32] := src2[31:0] dst[95:64] := src1[63:32] dst[127:96] := src2[63:32] RETURN dst[127:0] } tmp_dst[127:0] := INTERLEAVE_DWORDS(a[127:0], b[127:0]) tmp_dst[255:128] := INTERLEAVE_DWORDS(a[255:128], b[255:128]) tmp_dst[383:256] := INTERLEAVE_DWORDS(a[383:256], b[383:256]) tmp_dst[511:384] := INTERLEAVE_DWORDS(a[511:384], b[511:384]) FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := tmp_dst[i+31:i] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Swizzle Unpack and interleave single-precision (32-bit) floating-point elements from the low half of each 128-bit lane in "a" and "b", and store the results in "dst". DEFINE INTERLEAVE_DWORDS(src1[127:0], src2[127:0]) { dst[31:0] := src1[31:0] dst[63:32] := src2[31:0] dst[95:64] := src1[63:32] dst[127:96] := src2[63:32] RETURN dst[127:0] } dst[127:0] := INTERLEAVE_DWORDS(a[127:0], b[127:0]) dst[255:128] := INTERLEAVE_DWORDS(a[255:128], b[255:128]) dst[383:256] := INTERLEAVE_DWORDS(a[383:256], b[383:256]) dst[511:384] := INTERLEAVE_DWORDS(a[511:384], b[511:384]) dst[MAX:512] := 0
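The low-half entries follow the same pattern for both element widths, so one parametrized model can cover INTERLEAVE_QWORDS and INTERLEAVE_DWORDS. This is an illustrative sketch only; `unpacklo`, `apply_mask`, and the `per_lane` parameter are hypothetical names, and vectors are plain lists with index 0 as the lowest element.

```python
def unpacklo(a, b, per_lane):
    # per_lane = 2 for 64-bit elements, 4 for 32-bit elements (128-bit lanes).
    half = per_lane // 2
    dst = []
    for lane in range(0, len(a), per_lane):
        for e in range(half):           # walk the low half of the lane
            dst.append(a[lane + e])
            dst.append(b[lane + e])
    return dst


def apply_mask(result, k, src=None):
    # src=None models the zeromask form; otherwise elements come from src.
    fallback = src if src is not None else [0.0] * len(result)
    return [result[j] if (k >> j) & 1 else fallback[j] for j in range(len(result))]


a8 = [float(x) for x in range(8)]
b8 = [float(10 + x) for x in range(8)]
lo_pd = unpacklo(a8, b8, per_lane=2)

a16 = [float(x) for x in range(16)]
b16 = [float(100 + x) for x in range(16)]
lo_ps = unpacklo(a16, b16, per_lane=4)
lo_ps_zero_masked = apply_mask(lo_ps, k=0x00FF)
```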
Floating Point AVX512F Cast Cast vector of type __m128d to type __m512d; the upper 384 bits of the result are undefined. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
Floating Point AVX512F Cast Cast vector of type __m256d to type __m512d; the upper 256 bits of the result are undefined. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
Floating Point AVX512F Cast Cast vector of type __m512d to type __m128d. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
Floating Point AVX512F Cast Cast vector of type __m512 to type __m128. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
Floating Point AVX512F Cast Cast vector of type __m512d to type __m256d. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
Floating Point AVX512F Cast Cast vector of type __m128 to type __m512; the upper 384 bits of the result are undefined. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
Floating Point AVX512F Cast Cast vector of type __m256 to type __m512; the upper 256 bits of the result are undefined. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
Floating Point AVX512F Cast Cast vector of type __m512 to type __m256. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
Integer AVX512F Cast Cast vector of type __m128i to type __m512i; the upper 384 bits of the result are undefined. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
Integer AVX512F Cast Cast vector of type __m256i to type __m512i; the upper 256 bits of the result are undefined. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
Integer AVX512F Cast Cast vector of type __m512i to type __m128i. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
Integer AVX512F Cast Cast vector of type __m512i to type __m256i. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
Floating Point AVX512F Cast Cast vector of type __m128d to type __m512d; the upper 384 bits of the result are zeroed. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
Floating Point AVX512F Cast Cast vector of type __m128 to type __m512; the upper 384 bits of the result are zeroed. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
Integer AVX512F Cast Cast vector of type __m128i to type __m512i; the upper 384 bits of the result are zeroed. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
Floating Point AVX512F Cast Cast vector of type __m256d to type __m512d; the upper 256 bits of the result are zeroed. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
Floating Point AVX512F Cast Cast vector of type __m256 to type __m512; the upper 256 bits of the result are zeroed. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
Integer AVX512F Cast Cast vector of type __m256i to type __m512i; the upper 256 bits of the result are zeroed. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
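The cast entries above fall into three groups: widening with undefined upper bits, narrowing, and widening with zeroed upper bits. The toy model below contrasts them; it assumes vectors are Python lists of elements and uses `None` as a stand-in for "undefined" (real hardware may leave any bit pattern there). All function names are hypothetical.

```python
UNDEFINED = None  # placeholder only; the actual upper bits are unspecified


def cast_widen(v, total):
    # Widening cast: low elements preserved, upper elements undefined.
    return list(v) + [UNDEFINED] * (total - len(v))


def zext_widen(v, total):
    # Zero-extending cast: low elements preserved, upper elements are 0.
    return list(v) + [0.0] * (total - len(v))


def cast_narrow(v, total):
    # Narrowing cast: keep only the low elements.
    return list(v[:total])


x128 = [1.0, 2.0]                       # two doubles = 128 bits
widened = cast_widen(x128, 8)
zero_extended = zext_widen(x128, 8)
narrowed = cast_narrow(zero_extended, 2)
```

Code that reads the upper elements after a widening cast should use the zero-extending form, since only that form gives them a defined value.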
Floating Point AVX512F Set Broadcast double-precision (64-bit) floating-point value "a" to all elements of "dst". FOR j := 0 to 7 i := j*64 dst[i+63:i] := a[63:0] ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Set Broadcast single-precision (32-bit) floating-point value "a" to all elements of "dst". FOR j := 0 to 15 i := j*32 dst[i+31:i] := a[31:0] ENDFOR dst[MAX:512] := 0
Integer AVX512F Set Set packed 32-bit integers in "dst" with the repeated 4 element sequence. dst[31:0] := a dst[63:32] := b dst[95:64] := c dst[127:96] := d dst[159:128] := a dst[191:160] := b dst[223:192] := c dst[255:224] := d dst[287:256] := a dst[319:288] := b dst[351:320] := c dst[383:352] := d dst[415:384] := a dst[447:416] := b dst[479:448] := c dst[511:480] := d dst[MAX:512] := 0
Integer AVX512F Set Set packed 64-bit integers in "dst" with the repeated 4 element sequence. dst[63:0] := a dst[127:64] := b dst[191:128] := c dst[255:192] := d dst[319:256] := a dst[383:320] := b dst[447:384] := c dst[511:448] := d dst[MAX:512] := 0
Floating Point AVX512F Set Set packed double-precision (64-bit) floating-point elements in "dst" with the repeated 4 element sequence. dst[63:0] := a dst[127:64] := b dst[191:128] := c dst[255:192] := d dst[319:256] := a dst[383:320] := b dst[447:384] := c dst[511:448] := d dst[MAX:512] := 0
Floating Point AVX512F Set Set packed single-precision (32-bit) floating-point elements in "dst" with the repeated 4 element sequence. dst[31:0] := a dst[63:32] := b dst[95:64] := c dst[127:96] := d dst[159:128] := a dst[191:160] := b dst[223:192] := c dst[255:224] := d dst[287:256] := a dst[319:288] := b dst[351:320] := c dst[383:352] := d dst[415:384] := a dst[447:416] := b dst[479:448] := c dst[511:480] := d dst[MAX:512] := 0
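The broadcast and repeated-sequence entries above reduce to simple list repetition. In this sketch the parameters `a`, `b`, `c`, `d` mirror the names used in the pseudocode; the argument order of the actual C intrinsics is not given in this text, so the signatures here are hypothetical.

```python
def set1(a, count):
    # Broadcast: every element of dst receives a.
    return [a] * count


def set4(a, b, c, d, count):
    # dst element 0 is a, element 1 is b, and so on, repeating every 4 elements.
    # count is the number of elements in the 512-bit vector (16 or 8).
    return [a, b, c, d] * (count // 4)


broadcast = set1(2.5, 8)                 # eight 64-bit elements
seq32 = set4(1, 2, 3, 4, 16)             # sixteen 32-bit elements
seq64 = set4(1.0, 2.0, 3.0, 4.0, 8)      # eight 64-bit elements
```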
Integer AVX512F Set Set packed 8-bit integers in "dst" with the supplied values. dst[7:0] := e0 dst[15:8] := e1 dst[23:16] := e2 dst[31:24] := e3 dst[39:32] := e4 dst[47:40] := e5 dst[55:48] := e6 dst[63:56] := e7 dst[71:64] := e8 dst[79:72] := e9 dst[87:80] := e10 dst[95:88] := e11 dst[103:96] := e12 dst[111:104] := e13 dst[119:112] := e14 dst[127:120] := e15 dst[135:128] := e16 dst[143:136] := e17 dst[151:144] := e18 dst[159:152] := e19 dst[167:160] := e20 dst[175:168] := e21 dst[183:176] := e22 dst[191:184] := e23 dst[199:192] := e24 dst[207:200] := e25 dst[215:208] := e26 dst[223:216] := e27 dst[231:224] := e28 dst[239:232] := e29 dst[247:240] := e30 dst[255:248] := e31 dst[263:256] := e32 dst[271:264] := e33 dst[279:272] := e34 dst[287:280] := e35 dst[295:288] := e36 dst[303:296] := e37 dst[311:304] := e38 dst[319:312] := e39 dst[327:320] := e40 dst[335:328] := e41 dst[343:336] := e42 dst[351:344] := e43 dst[359:352] := e44 dst[367:360] := e45 dst[375:368] := e46 dst[383:376] := e47 dst[391:384] := e48 dst[399:392] := e49 dst[407:400] := e50 dst[415:408] := e51 dst[423:416] := e52 dst[431:424] := e53 dst[439:432] := e54 dst[447:440] := e55 dst[455:448] := e56 dst[463:456] := e57 dst[471:464] := e58 dst[479:472] := e59 dst[487:480] := e60 dst[495:488] := e61 dst[503:496] := e62 dst[511:504] := e63 dst[MAX:512] := 0
Integer AVX512F Set Set packed 16-bit integers in "dst" with the supplied values. dst[15:0] := e0 dst[31:16] := e1 dst[47:32] := e2 dst[63:48] := e3 dst[79:64] := e4 dst[95:80] := e5 dst[111:96] := e6 dst[127:112] := e7 dst[143:128] := e8 dst[159:144] := e9 dst[175:160] := e10 dst[191:176] := e11 dst[207:192] := e12 dst[223:208] := e13 dst[239:224] := e14 dst[255:240] := e15 dst[271:256] := e16 dst[287:272] := e17 dst[303:288] := e18 dst[319:304] := e19 dst[335:320] := e20 dst[351:336] := e21 dst[367:352] := e22 dst[383:368] := e23 dst[399:384] := e24 dst[415:400] := e25 dst[431:416] := e26 dst[447:432] := e27 dst[463:448] := e28 dst[479:464] := e29 dst[495:480] := e30 dst[511:496] := e31 dst[MAX:512] := 0
Integer AVX512F Set Set packed 32-bit integers in "dst" with the supplied values. dst[31:0] := e0 dst[63:32] := e1 dst[95:64] := e2 dst[127:96] := e3 dst[159:128] := e4 dst[191:160] := e5 dst[223:192] := e6 dst[255:224] := e7 dst[287:256] := e8 dst[319:288] := e9 dst[351:320] := e10 dst[383:352] := e11 dst[415:384] := e12 dst[447:416] := e13 dst[479:448] := e14 dst[511:480] := e15 dst[MAX:512] := 0
Integer AVX512F Set Set packed 64-bit integers in "dst" with the supplied values. dst[63:0] := e0 dst[127:64] := e1 dst[191:128] := e2 dst[255:192] := e3 dst[319:256] := e4 dst[383:320] := e5 dst[447:384] := e6 dst[511:448] := e7 dst[MAX:512] := 0
Floating Point AVX512F Set Set packed double-precision (64-bit) floating-point elements in "dst" with the supplied values. dst[63:0] := e0 dst[127:64] := e1 dst[191:128] := e2 dst[255:192] := e3 dst[319:256] := e4 dst[383:320] := e5 dst[447:384] := e6 dst[511:448] := e7 dst[MAX:512] := 0
Floating Point AVX512F Set Set packed single-precision (32-bit) floating-point elements in "dst" with the supplied values. dst[31:0] := e0 dst[63:32] := e1 dst[95:64] := e2 dst[127:96] := e3 dst[159:128] := e4 dst[191:160] := e5 dst[223:192] := e6 dst[255:224] := e7 dst[287:256] := e8 dst[319:288] := e9 dst[351:320] := e10 dst[383:352] := e11 dst[415:384] := e12 dst[447:416] := e13 dst[479:448] := e14 dst[511:480] := e15 dst[MAX:512] := 0
Integer AVX512F Set Set packed 32-bit integers in "dst" with the repeated 4 element sequence in reverse order. dst[31:0] := d dst[63:32] := c dst[95:64] := b dst[127:96] := a dst[159:128] := d dst[191:160] := c dst[223:192] := b dst[255:224] := a dst[287:256] := d dst[319:288] := c dst[351:320] := b dst[383:352] := a dst[415:384] := d dst[447:416] := c dst[479:448] := b dst[511:480] := a dst[MAX:512] := 0
Integer AVX512F Set Set packed 64-bit integers in "dst" with the repeated 4 element sequence in reverse order. dst[63:0] := d dst[127:64] := c dst[191:128] := b dst[255:192] := a dst[319:256] := d dst[383:320] := c dst[447:384] := b dst[511:448] := a dst[MAX:512] := 0
Floating Point AVX512F Set Set packed double-precision (64-bit) floating-point elements in "dst" with the repeated 4 element sequence in reverse order. dst[63:0] := d dst[127:64] := c dst[191:128] := b dst[255:192] := a dst[319:256] := d dst[383:320] := c dst[447:384] := b dst[511:448] := a dst[MAX:512] := 0
Floating Point AVX512F Set Set packed single-precision (32-bit) floating-point elements in "dst" with the repeated 4 element sequence in reverse order. dst[31:0] := d dst[63:32] := c dst[95:64] := b dst[127:96] := a dst[159:128] := d dst[191:160] := c dst[223:192] := b dst[255:224] := a dst[287:256] := d dst[319:288] := c dst[351:320] := b dst[383:352] := a dst[415:384] := d dst[447:416] := c dst[479:448] := b dst[511:480] := a dst[MAX:512] := 0
Integer AVX512F Set Set packed 32-bit integers in "dst" with the supplied values in reverse order. dst[31:0] := e15 dst[63:32] := e14 dst[95:64] := e13 dst[127:96] := e12 dst[159:128] := e11 dst[191:160] := e10 dst[223:192] := e9 dst[255:224] := e8 dst[287:256] := e7 dst[319:288] := e6 dst[351:320] := e5 dst[383:352] := e4 dst[415:384] := e3 dst[447:416] := e2 dst[479:448] := e1 dst[511:480] := e0 dst[MAX:512] := 0
Integer AVX512F Set Set packed 64-bit integers in "dst" with the supplied values in reverse order. dst[63:0] := e7 dst[127:64] := e6 dst[191:128] := e5 dst[255:192] := e4 dst[319:256] := e3 dst[383:320] := e2 dst[447:384] := e1 dst[511:448] := e0 dst[MAX:512] := 0
Floating Point AVX512F Set Set packed double-precision (64-bit) floating-point elements in "dst" with the supplied values in reverse order. dst[63:0] := e7 dst[127:64] := e6 dst[191:128] := e5 dst[255:192] := e4 dst[319:256] := e3 dst[383:320] := e2 dst[447:384] := e1 dst[511:448] := e0 dst[MAX:512] := 0
Floating Point AVX512F Set Set packed single-precision (32-bit) floating-point elements in "dst" with the supplied values in reverse order. dst[31:0] := e15 dst[63:32] := e14 dst[95:64] := e13 dst[127:96] := e12 dst[159:128] := e11 dst[191:160] := e10 dst[223:192] := e9 dst[255:224] := e8 dst[287:256] := e7 dst[319:288] := e6 dst[351:320] := e5 dst[383:352] := e4 dst[415:384] := e3 dst[447:416] := e2 dst[479:448] := e1 dst[511:480] := e0 dst[MAX:512] := 0
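The "reverse order" entries differ from their forward counterparts only in which named value lands in the lowest element. A short sketch, assuming `e[i]` holds the value the pseudocode calls `e`*i* (the function names and list representation are hypothetical, and the real C argument order is not stated in this text):

```python
def set_elements(e):
    # Forward form: dst element i receives e_i.
    return list(e)


def setr_elements(e):
    # Reverse form: dst element 0 receives the last named value.
    return list(reversed(e))


def setr4(a, b, c, d, count):
    # Repeated 4-element sequence in reverse: d, c, b, a, d, c, b, a, ...
    return [d, c, b, a] * (count // 4)


named = list(range(16))                  # named[i] plays the role of e_i
forward = set_elements(named)
reverse = setr_elements(named)
rev4 = setr4(1, 2, 3, 4, 8)
```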
AVX512F Set Return vector of type __m512 with all elements set to zero. dst[MAX:0] := 0
Integer AVX512F Set Return vector of type __m512i with all elements set to zero. dst[MAX:0] := 0
Floating Point AVX512F Set Return vector of type __m512d with all elements set to zero. dst[MAX:0] := 0
Floating Point AVX512F Set Return vector of type __m512 with all elements set to zero. dst[MAX:0] := 0
Integer AVX512F Set Return vector of type __m512i with all elements set to zero. dst[MAX:0] := 0
AVX512F General Support Return vector of type __m512 with undefined elements.
Integer AVX512F General Support Return vector of type __m512i with undefined elements.
Floating Point AVX512F General Support Return vector of type __m512d with undefined elements.
Floating Point AVX512F General Support Return vector of type __m512 with undefined elements.
Floating Point AVX512F Trigonometry Compute the inverse cosine of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" expressed in radians. FOR j := 0 to 7 i := j*64 dst[i+63:i] := ACOS(a[i+63:i]) ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Trigonometry Compute the inverse cosine of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" expressed in radians using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := ACOS(a[i+63:i]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Trigonometry Compute the inverse cosine of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" expressed in radians. FOR j := 0 to 15 i := j*32 dst[i+31:i] := ACOS(a[i+31:i]) ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Trigonometry Compute the inverse cosine of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" expressed in radians using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := ACOS(a[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
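The writemask loop used by these entries is an element-wise map with a per-element fallback. The sketch below models it with Python's `math.acos`; `masked_map` and the list representation are assumptions for illustration, and the code says nothing about how the real intrinsic treats out-of-range inputs.

```python
import math


def masked_map(fn, a, k, src):
    # Element j is fn(a[j]) when bit j of k is set, otherwise src[j].
    return [fn(a[j]) if (k >> j) & 1 else src[j] for j in range(len(a))]


a = [1.0, 0.0, -1.0, 0.5, 2.0, 2.0, 2.0, 2.0]
src = [9.0] * 8
out = masked_map(math.acos, a, 0b00001111, src)
```

In this Python model the function is never called for masked-off elements, which is why the out-of-domain inputs (2.0) in the upper half do not raise an error. That is a property of the sketch, not a claim about the hardware or library behavior.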
Floating Point AVX512F Trigonometry Compute the inverse hyperbolic cosine of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" expressed in radians. FOR j := 0 to 7 i := j*64 dst[i+63:i] := ACOSH(a[i+63:i]) ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Trigonometry Compute the inverse hyperbolic cosine of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" expressed in radians using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := ACOSH(a[i+63:i]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Trigonometry Compute the inverse hyperbolic cosine of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" expressed in radians. FOR j := 0 to 15 i := j*32 dst[i+31:i] := ACOSH(a[i+31:i]) ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Trigonometry Compute the inverse hyperbolic cosine of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" expressed in radians using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := ACOSH(a[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Trigonometry Compute the inverse sine of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" expressed in radians. FOR j := 0 to 7 i := j*64 dst[i+63:i] := ASIN(a[i+63:i]) ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Trigonometry Compute the inverse sine of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" expressed in radians using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := ASIN(a[i+63:i]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Trigonometry Compute the inverse sine of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" expressed in radians. FOR j := 0 to 15 i := j*32 dst[i+31:i] := ASIN(a[i+31:i]) ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Trigonometry Compute the inverse sine of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" expressed in radians using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := ASIN(a[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Trigonometry Compute the inverse hyperbolic sine of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" expressed in radians. FOR j := 0 to 7 i := j*64 dst[i+63:i] := ASINH(a[i+63:i]) ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Trigonometry Compute the inverse hyperbolic sine of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" expressed in radians using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := ASINH(a[i+63:i]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Trigonometry Compute the inverse hyperbolic sine of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" expressed in radians. FOR j := 0 to 15 i := j*32 dst[i+31:i] := ASINH(a[i+31:i]) ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Trigonometry Compute the inverse hyperbolic sine of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" expressed in radians using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := ASINH(a[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Trigonometry Compute the inverse tangent of packed double-precision (64-bit) floating-point elements in "a" divided by packed elements in "b", and store the results in "dst" expressed in radians. FOR j := 0 to 7 i := j*64 dst[i+63:i] := ATAN2(a[i+63:i], b[i+63:i]) ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Trigonometry Compute the inverse tangent of packed double-precision (64-bit) floating-point elements in "a" divided by packed elements in "b", and store the results in "dst" expressed in radians using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := ATAN2(a[i+63:i], b[i+63:i]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Trigonometry Compute the inverse tangent of packed single-precision (32-bit) floating-point elements in "a" divided by packed elements in "b", and store the results in "dst" expressed in radians. FOR j := 0 to 15 i := j*32 dst[i+31:i] := ATAN2(a[i+31:i], b[i+31:i]) ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Trigonometry Compute the inverse tangent of packed single-precision (32-bit) floating-point elements in "a" divided by packed elements in "b", and store the results in "dst" expressed in radians using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := ATAN2(a[i+31:i], b[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
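The two-operand entries use ATAN2 rather than ATAN applied to a quotient. Assuming ATAN2 behaves like the usual C library `atan2` (an assumption; the text does not define it), the difference is that the signs of both operands select the quadrant. A brief comparison on four elements, with hypothetical helper names:

```python
import math


def atan2_elements(a, b):
    # Element-wise ATAN2(a, b): angle of the point (b, a) in radians.
    return [math.atan2(y, x) for y, x in zip(a, b)]


a = [1.0, 1.0, -1.0, -1.0]
b = [1.0, -1.0, -1.0, 1.0]

angles = atan2_elements(a, b)
# Dividing first discards the individual signs, so quadrants 2 and 3
# collapse onto quadrants 4 and 1.
naive = [math.atan(y / x) for y, x in zip(a, b)]
```

Here `angles` covers all four quadrants, while `naive` only ever returns values between -pi/2 and pi/2.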
Floating Point AVX512F Trigonometry Compute the inverse tangent of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" expressed in radians. FOR j := 0 to 7 i := j*64 dst[i+63:i] := ATAN(a[i+63:i]) ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Trigonometry Compute the inverse tangent of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" expressed in radians using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := ATAN(a[i+63:i]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Trigonometry Compute the inverse tangent of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" expressed in radians. FOR j := 0 to 15 i := j*32 dst[i+31:i] := ATAN(a[i+31:i]) ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Trigonometry Compute the inverse tangent of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" expressed in radians using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := ATAN(a[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Trigonometry Compute the inverse hyperbolic tangent of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" expressed in radians. FOR j := 0 to 7 i := j*64 dst[i+63:i] := ATANH(a[i+63:i]) ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Trigonometry Compute the inverse hyperbolic tangent of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" expressed in radians using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := ATANH(a[i+63:i]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Trigonometry Compute the inverse hyperbolic tangent of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" expressed in radians. FOR j := 0 to 15 i := j*32 dst[i+31:i] := ATANH(a[i+31:i]) ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Trigonometry Compute the inverse hyperbolic tangent of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" expressed in radians using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := ATANH(a[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Elementary Math Functions Compute the cube root of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst". FOR j := 0 to 7 i := j*64 dst[i+63:i] := CubeRoot(a[i+63:i]) ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Elementary Math Functions Compute the cube root of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := CubeRoot(a[i+63:i]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Elementary Math Functions Compute the cube root of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst". FOR j := 0 to 15 i := j*32 dst[i+31:i] := CubeRoot(a[i+31:i]) ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Elementary Math Functions Compute the cube root of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := CubeRoot(a[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
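CubeRoot is defined for negative inputs, which a plain fractional power is not. The sketch below is one way to model it per element in Python; `cube_root` is a hypothetical helper, and its result is only as accurate as the floating-point power, so it makes no claim about the accuracy of the real intrinsic.

```python
import math


def cube_root(x):
    # Take the root of the magnitude, then restore the sign.
    # (In Python, (-8.0) ** (1/3) would yield a complex number instead.)
    return math.copysign(abs(x) ** (1.0 / 3.0), x)


def cbrt_elements(a):
    return [cube_root(x) for x in a]


roots = cbrt_elements([27.0, -8.0, 0.0, 1.0, -1.0, 64.0, 0.125, -1000.0])
```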
Floating Point AVX512F Probability/Statistics Compute the cumulative distribution function of packed double-precision (64-bit) floating-point elements in "a" using the normal distribution, and store the results in "dst". FOR j := 0 to 7 i := j*64 dst[i+63:i] := CDFNormal(a[i+63:i]) ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Probability/Statistics Compute the cumulative distribution function of packed double-precision (64-bit) floating-point elements in "a" using the normal distribution, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := CDFNormal(a[i+63:i]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Probability/Statistics Compute the cumulative distribution function of packed single-precision (32-bit) floating-point elements in "a" using the normal distribution, and store the results in "dst". FOR j := 0 to 15 i := j*32 dst[i+31:i] := CDFNormal(a[i+31:i]) ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Probability/Statistics Compute the cumulative distribution function of packed single-precision (32-bit) floating-point elements in "a" using the normal distribution, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := CDFNormal(a[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
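The text says only "the normal distribution" for CDFNormal. Assuming that means the standard normal (mean 0, standard deviation 1), which is an assumption rather than something stated here, the per-element function can be written with the error function as Phi(x) = (1 + erf(x / sqrt(2))) / 2. The helper names are hypothetical.

```python
import math


def cdf_normal(x):
    # Standard normal CDF via the error function (assumed parameters: mean 0, sd 1).
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def cdf_normal_elements(a):
    return [cdf_normal(x) for x in a]


probs = cdf_normal_elements([0.0, 1.96, -1.96, 1.0])
```

The values for 1.96 and -1.96 sum to 1, reflecting the symmetry of the distribution around zero.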
Floating Point AVX512F Probability/Statistics Compute the inverse cumulative distribution function of packed double-precision (64-bit) floating-point elements in "a" using the normal distribution, and store the results in "dst". FOR j := 0 to 7 i := j*64 dst[i+63:i] := InverseCDFNormal(a[i+63:i]) ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Probability/Statistics Compute the inverse cumulative distribution function of packed double-precision (64-bit) floating-point elements in "a" using the normal distribution, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := InverseCDFNormal(a[i+63:i]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Probability/Statistics Compute the inverse cumulative distribution function of packed single-precision (32-bit) floating-point elements in "a" using the normal distribution, and store the results in "dst". FOR j := 0 to 15 i := j*32 dst[i+31:i] := InverseCDFNormal(a[i+31:i]) ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Probability/Statistics Compute the inverse cumulative distribution function of packed single-precision (32-bit) floating-point elements in "a" using the normal distribution, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := InverseCDFNormal(a[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
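As a scalar reference point for InverseCDFNormal, Python's standard library exposes the same function through `statistics.NormalDist.inv_cdf`. The sketch below assumes the standard normal distribution (mean 0, standard deviation 1); the entries above say only "the normal distribution", so that parameter choice is an assumption.

```python
from statistics import NormalDist

# Assumption: standard normal distribution.
std_normal = NormalDist(mu=0.0, sigma=1.0)

a = [0.025, 0.5, 0.975]
dst = [std_normal.inv_cdf(p) for p in a]      # per-lane inverse CDF
round_trip = [std_normal.cdf(x) for x in dst]  # should recover the inputs
```

Inputs must lie strictly between 0 and 1; `inv_cdf` raises `StatisticsError` outside that range.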
Floating Point AVX512F Special Math Functions Round the packed double-precision (64-bit) floating-point elements in "a" up to an integer value, and store the results as packed double-precision floating-point elements in "dst". FOR j := 0 to 7 i := j*64 dst[i+63:i] := CEIL(a[i+63:i]) ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Special Math Functions Round the packed double-precision (64-bit) floating-point elements in "a" up to an integer value, and store the results as packed double-precision floating-point elements in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := CEIL(a[i+63:i]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Special Math Functions Round the packed single-precision (32-bit) floating-point elements in "a" up to an integer value, and store the results as packed single-precision floating-point elements in "dst". FOR j := 0 to 15 i := j*32 dst[i+31:i] := CEIL(a[i+31:i]) ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Special Math Functions Round the packed single-precision (32-bit) floating-point elements in "a" up to an integer value, and store the results as packed single-precision floating-point elements in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := CEIL(a[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Trigonometry Compute the cosine of packed double-precision (64-bit) floating-point elements in "a" expressed in radians, and store the results in "dst". FOR j := 0 to 7 i := j*64 dst[i+63:i] := COS(a[i+63:i]) ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Trigonometry Compute the cosine of packed double-precision (64-bit) floating-point elements in "a" expressed in radians, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := COS(a[i+63:i]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Trigonometry Compute the cosine of packed single-precision (32-bit) floating-point elements in "a" expressed in radians, and store the results in "dst". FOR j := 0 to 15 i := j*32 dst[i+31:i] := COS(a[i+31:i]) ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Trigonometry Compute the cosine of packed single-precision (32-bit) floating-point elements in "a" expressed in radians, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := COS(a[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Trigonometry Compute the cosine of packed double-precision (64-bit) floating-point elements in "a" expressed in degrees, and store the results in "dst". FOR j := 0 to 7 i := j*64 dst[i+63:i] := COSD(a[i+63:i]) ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Trigonometry Compute the cosine of packed double-precision (64-bit) floating-point elements in "a" expressed in degrees, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := COSD(a[i+63:i]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Trigonometry Compute the cosine of packed single-precision (32-bit) floating-point elements in "a" expressed in degrees, and store the results in "dst". FOR j := 0 to 15 i := j*32 dst[i+31:i] := COSD(a[i+31:i]) ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Trigonometry Compute the cosine of packed single-precision (32-bit) floating-point elements in "a" expressed in degrees, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := COSD(a[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Trigonometry Compute the hyperbolic cosine of packed double-precision (64-bit) floating-point elements in "a" expressed in radians, and store the results in "dst". FOR j := 0 to 7 i := j*64 dst[i+63:i] := COSH(a[i+63:i]) ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Trigonometry Compute the hyperbolic cosine of packed double-precision (64-bit) floating-point elements in "a" expressed in radians, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := COSH(a[i+63:i]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Trigonometry Compute the hyperbolic cosine of packed single-precision (32-bit) floating-point elements in "a" expressed in radians, and store the results in "dst". FOR j := 0 to 15 i := j*32 dst[i+31:i] := COSH(a[i+31:i]) ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Trigonometry Compute the hyperbolic cosine of packed single-precision (32-bit) floating-point elements in "a" expressed in radians, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := COSH(a[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Probability/Statistics Compute the error function of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst". FOR j := 0 to 7 i := j*64 dst[i+63:i] := ERF(a[i+63:i]) ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Probability/Statistics Compute the error function of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := ERF(a[i+63:i]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Probability/Statistics Compute the complementary error function of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst". FOR j := 0 to 7 i := j*64 dst[i+63:i] := 1.0 - ERF(a[i+63:i]) ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Probability/Statistics Compute the complementary error function of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := 1.0 - ERF(a[i+63:i]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Probability/Statistics Compute the error function of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst". FOR j := 0 to 15 i := j*32 dst[i+31:i] := ERF(a[i+31:i]) ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Probability/Statistics Compute the error function of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := ERF(a[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Probability/Statistics Compute the complementary error function of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst". FOR j := 0 to 15 i := j*32 dst[i+31:i] := 1.0 - ERF(a[i+31:i]) ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Probability/Statistics Compute the complementary error function of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := 1.0 - ERF(a[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
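The pseudocode writes the complementary error function as 1.0 - ERF(x). That is the mathematical definition, but evaluating it literally in floating point loses all precision once ERF(x) rounds to 1.0. The following Python sketch contrasts the literal formula with a dedicated `erfc` routine; whether a given vector implementation uses a dedicated routine is not stated in the entries above.

```python
import math

x = 10.0
literal = 1.0 - math.erf(x)  # erf(10.0) rounds to exactly 1.0 in double precision
dedicated = math.erfc(x)     # computed directly, keeps the tiny tail value

# For moderate arguments the two forms agree closely.
moderate_gap = abs((1.0 - math.erf(0.5)) - math.erfc(0.5))
```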
Floating Point AVX512F Probability/Statistics Compute the inverse error function of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst". FOR j := 0 to 7 i := j*64 dst[i+63:i] := InverseERF(a[i+63:i]) ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Probability/Statistics Compute the inverse error function of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := InverseERF(a[i+63:i]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Probability/Statistics Compute the inverse error function of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst". FOR j := 0 to 15 i := j*32 dst[i+31:i] := InverseERF(a[i+31:i]) ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Probability/Statistics Compute the inverse error function of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := InverseERF(a[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Probability/Statistics Compute the inverse complementary error function of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst". FOR j := 0 to 7 i := j*64 dst[i+63:i] := InverseERFC(a[i+63:i]) ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Probability/Statistics Compute the inverse complementary error function of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := InverseERFC(a[i+63:i]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Probability/Statistics Compute the inverse complementary error function of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst". FOR j := 0 to 15 i := j*32 dst[i+31:i] := InverseERFC(a[i+31:i]) ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Probability/Statistics Compute the inverse complementary error function of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := InverseERFC(a[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
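The inverse error function is the functional inverse of erf, that is, the value y satisfying erf(y) = x; it is not the reciprocal 1 / erf(x). The same holds for the inverse complementary error function with respect to erfc. The Python sketch below illustrates the distinction with a simple bisection. The names `erf_inv` and `erfc_inv` are hypothetical, and bisection is chosen only for clarity; it is not claimed to be how any vector library computes these functions.

```python
import math

def erf_inv(x, iters=200):
    """Solve erf(y) = x for y by bisection; valid for -1 < x < 1."""
    lo, hi = -6.0, 6.0  # erf(+/-6.0) already rounds to +/-1.0 in double precision
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if math.erf(mid) < x:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def erfc_inv(x):
    """Solve erfc(y) = x for y, using erfc(y) = 1 - erf(y); valid for 0 < x < 2."""
    return erf_inv(1.0 - x)

y = erf_inv(0.5)                    # functional inverse
reciprocal = 1.0 / math.erf(0.5)    # a different quantity altogether
```

Here `y` is about 0.4769 while `reciprocal` is about 1.92, which shows that the two readings of "inverse" are not interchangeable.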
Floating Point AVX512F Elementary Math Functions Compute the exponential value of 10 raised to the power of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst". FOR j := 0 to 7 i := j*64 dst[i+63:i] := POW(10.0, a[i+63:i]) ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Elementary Math Functions Compute the exponential value of 10 raised to the power of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := POW(10.0, a[i+63:i]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Elementary Math Functions Compute the exponential value of 10 raised to the power of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst". FOR j := 0 to 15 i := j*32 dst[i+31:i] := POW(FP32(10.0), a[i+31:i]) ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Elementary Math Functions Compute the exponential value of 10 raised to the power of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := POW(FP32(10.0), a[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Elementary Math Functions Compute the exponential value of 2 raised to the power of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst". FOR j := 0 to 7 i := j*64 dst[i+63:i] := POW(2.0, a[i+63:i]) ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Elementary Math Functions Compute the exponential value of 2 raised to the power of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := POW(2.0, a[i+63:i]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Elementary Math Functions Compute the exponential value of 2 raised to the power of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst". FOR j := 0 to 15 i := j*32 dst[i+31:i] := POW(FP32(2.0), a[i+31:i]) ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Elementary Math Functions Compute the exponential value of 2 raised to the power of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := POW(FP32(2.0), a[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Elementary Math Functions Compute the exponential value of "e" raised to the power of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst". FOR j := 0 to 7 i := j*64 dst[i+63:i] := POW(e, a[i+63:i]) ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Elementary Math Functions Compute the exponential value of "e" raised to the power of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := POW(e, a[i+63:i]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Elementary Math Functions Compute the exponential value of "e" raised to the power of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst". FOR j := 0 to 15 i := j*32 dst[i+31:i] := POW(FP32(e), a[i+31:i]) ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Elementary Math Functions Compute the exponential value of "e" raised to the power of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := POW(FP32(e), a[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Elementary Math Functions Compute the exponential value of "e" raised to the power of packed double-precision (64-bit) floating-point elements in "a", subtract one from each element, and store the results in "dst". FOR j := 0 to 7 i := j*64 dst[i+63:i] := POW(e, a[i+63:i]) - 1.0 ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Elementary Math Functions Compute the exponential value of "e" raised to the power of packed double-precision (64-bit) floating-point elements in "a", subtract one from each element, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := POW(e, a[i+63:i]) - 1.0 ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Elementary Math Functions Compute the exponential value of "e" raised to the power of packed single-precision (32-bit) floating-point elements in "a", subtract one from each element, and store the results in "dst". FOR j := 0 to 15 i := j*32 dst[i+31:i] := POW(FP32(e), a[i+31:i]) - 1.0 ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Elementary Math Functions Compute the exponential value of "e" raised to the power of packed single-precision (32-bit) floating-point elements in "a", subtract one from each element, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := POW(FP32(e), a[i+31:i]) - 1.0 ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
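The pseudocode POW(e, x) - 1.0 defines the result mathematically, but a literal evaluation cancels catastrophically for small x. The Python sketch below compares the literal form with the standard library's `math.expm1`; it illustrates why a dedicated "exp minus one" operation exists and makes no claim about the internals of any particular vector implementation.

```python
import math

x = 1e-20
literal = math.exp(x) - 1.0  # exp(1e-20) rounds to 1.0, so the difference is 0.0
dedicated = math.expm1(x)    # retains the leading term, approximately x

# For larger arguments both forms agree to within rounding.
large_gap = abs((math.exp(2.0) - 1.0) - math.expm1(2.0))
```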
Floating Point AVX512F Special Math Functions Round the packed double-precision (64-bit) floating-point elements in "a" down to an integer value, and store the results as packed double-precision floating-point elements in "dst". FOR j := 0 to 7 i := j*64 dst[i+63:i] := FLOOR(a[i+63:i]) ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Special Math Functions Round the packed double-precision (64-bit) floating-point elements in "a" down to an integer value, and store the results as packed double-precision floating-point elements in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := FLOOR(a[i+63:i]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Special Math Functions Round the packed single-precision (32-bit) floating-point elements in "a" down to an integer value, and store the results as packed single-precision floating-point elements in "dst". FOR j := 0 to 15 i := j*32 dst[i+31:i] := FLOOR(a[i+31:i]) ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Special Math Functions Round the packed single-precision (32-bit) floating-point elements in "a" down to an integer value, and store the results as packed single-precision floating-point elements in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := FLOOR(a[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Elementary Math Functions Compute the length of the hypotenuse of a right triangle, with the lengths of the other two sides of the triangle stored as packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst". FOR j := 0 to 7 i := j*64 dst[i+63:i] := SQRT(POW(a[i+63:i], 2.0) + POW(b[i+63:i], 2.0)) ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Elementary Math Functions Compute the length of the hypotenuse of a right triangle, with the lengths of the other two sides of the triangle stored as packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := SQRT(POW(a[i+63:i], 2.0) + POW(b[i+63:i], 2.0)) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Elementary Math Functions Compute the length of the hypotenuse of a right triangle, with the lengths of the other two sides of the triangle stored as packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst". FOR j := 0 to 15 i := j*32 dst[i+31:i] := SQRT(POW(a[i+31:i], 2.0) + POW(b[i+31:i], 2.0)) ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Elementary Math Functions Compute the length of the hypotenuse of a right triangle, with the lengths of the other two sides of the triangle stored as packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := SQRT(POW(a[i+31:i], 2.0) + POW(b[i+31:i], 2.0)) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
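The hypotenuse formula SQRT(a^2 + b^2) can be checked per lane with a few lines of Python. The sketch also shows that a literal evaluation overflows when the squares exceed the double-precision range, whereas `math.hypot` does not. Whether a particular vector implementation guards against that intermediate overflow is not stated in the entries above.

```python
import math

a, b = 3.0, 4.0
literal = math.sqrt(a * a + b * b)  # direct transcription of the pseudocode
library = math.hypot(a, b)

# Large inputs: the squares overflow to infinity in the literal form.
big_a, big_b = 3e200, 4e200
overflowed = math.sqrt(big_a * big_a + big_b * big_b)
scaled = math.hypot(big_a, big_b)
```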
Integer AVX512F Arithmetic Divide packed signed 32-bit integers in "a" by packed elements in "b", and store the truncated results in "dst". FOR j := 0 to 15 i := 32*j IF b[i+31:i] == 0 #DE FI dst[i+31:i] := Truncate32(a[i+31:i] / b[i+31:i]) ENDFOR dst[MAX:512] := 0
Integer AVX512F Arithmetic Divide packed signed 32-bit integers in "a" by packed elements in "b", and store the truncated results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := 32*j IF k[j] IF b[i+31:i] == 0 #DE FI dst[i+31:i] := Truncate32(a[i+31:i] / b[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Arithmetic Divide packed signed 8-bit integers in "a" by packed elements in "b", and store the truncated results in "dst". FOR j := 0 to 63 i := 8*j IF b[i+7:i] == 0 #DE FI dst[i+7:i] := Truncate8(a[i+7:i] / b[i+7:i]) ENDFOR dst[MAX:512] := 0
Integer AVX512F Arithmetic Divide packed signed 16-bit integers in "a" by packed elements in "b", and store the truncated results in "dst". FOR j := 0 to 31 i := 16*j IF b[i+15:i] == 0 #DE FI dst[i+15:i] := Truncate16(a[i+15:i] / b[i+15:i]) ENDFOR dst[MAX:512] := 0
Integer AVX512F Arithmetic Divide packed signed 64-bit integers in "a" by packed elements in "b", and store the truncated results in "dst". FOR j := 0 to 7 i := 64*j IF b[i+63:i] == 0 #DE FI dst[i+63:i] := Truncate64(a[i+63:i] / b[i+63:i]) ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Elementary Math Functions Compute the inverse square root of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst". FOR j := 0 to 7 i := j*64 dst[i+63:i] := InvSQRT(a[i+63:i]) ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Elementary Math Functions Compute the inverse square root of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := InvSQRT(a[i+63:i]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Elementary Math Functions Compute the inverse square root of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst". FOR j := 0 to 15 i := j*32 dst[i+31:i] := InvSQRT(a[i+31:i]) ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Elementary Math Functions Compute the inverse square root of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := InvSQRT(a[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Arithmetic Divide packed 32-bit integers in "a" by packed elements in "b", and store the remainders as packed 32-bit integers in "dst". FOR j := 0 to 15 i := 32*j dst[i+31:i] := REMAINDER(a[i+31:i] / b[i+31:i]) ENDFOR dst[MAX:512] := 0
Integer AVX512F Arithmetic Divide packed 32-bit integers in "a" by packed elements in "b", and store the remainders as packed 32-bit integers in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := 32*j IF k[j] dst[i+31:i] := REMAINDER(a[i+31:i] / b[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Arithmetic Divide packed 8-bit integers in "a" by packed elements in "b", and store the remainders as packed 8-bit integers in "dst". FOR j := 0 to 63 i := 8*j dst[i+7:i] := REMAINDER(a[i+7:i] / b[i+7:i]) ENDFOR dst[MAX:512] := 0
Integer AVX512F Arithmetic Divide packed 16-bit integers in "a" by packed elements in "b", and store the remainders as packed 16-bit integers in "dst". FOR j := 0 to 31 i := 16*j dst[i+15:i] := REMAINDER(a[i+15:i] / b[i+15:i]) ENDFOR dst[MAX:512] := 0
Integer AVX512F Arithmetic Divide packed 64-bit integers in "a" by packed elements in "b", and store the remainders as packed 64-bit integers in "dst". FOR j := 0 to 7 i := 64*j dst[i+63:i] := REMAINDER(a[i+63:i] / b[i+63:i]) ENDFOR dst[MAX:512] := 0
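The remainder entries do not say which sign the result takes. The Python sketch below assumes signed lanes and the C convention, in which the remainder has the sign of the dividend; both points are assumptions, and `rem_trunc` is a hypothetical helper. Python's own `%` operator follows the sign of the divisor, so it is shown alongside for contrast.

```python
def rem_trunc(a, b):
    """Remainder with the sign of the dividend (assumed C-style semantics)."""
    r = abs(a) % abs(b)
    return -r if a < 0 else r

a = [7, -7, 7, -7]
b = [2, 2, -2, -2]
dst = [rem_trunc(x, y) for x, y in zip(a, b)]
python_mod = [x % y for x, y in zip(a, b)]  # sign follows the divisor instead
```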
Floating Point AVX512F Elementary Math Functions Compute the base-10 logarithm of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst". FOR j := 0 to 7 i := j*64 dst[i+63:i] := LOG(a[i+63:i]) / LOG(10.0) ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Elementary Math Functions Compute the base-10 logarithm of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := LOG(a[i+63:i]) / LOG(10.0) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Elementary Math Functions Compute the base-10 logarithm of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst". FOR j := 0 to 15 i := j*32 dst[i+31:i] := LOG(a[i+31:i]) / LOG(10.0) ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Elementary Math Functions Compute the base-10 logarithm of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := LOG(a[i+31:i]) / LOG(10.0) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
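The pseudocode expresses the base-10 logarithm as LOG(x) / LOG(10.0). Evaluated literally, the quotient of two rounded natural logarithms can differ from a dedicated base-10 routine in the last bits. The following Python sketch compares the two forms for a power of ten; it is illustrative only and says nothing about how a vector library evaluates the function.

```python
import math

x = 1000.0
quotient = math.log(x) / math.log(10.0)  # literal transcription of the pseudocode
dedicated = math.log10(x)                # dedicated base-10 routine

gap = abs(quotient - dedicated)
```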
Floating Point AVX512F Elementary Math Functions Compute the natural logarithm of one plus packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst". FOR j := 0 to 7 i := j*64 dst[i+63:i] := LOG(1.0 + a[i+63:i]) ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Elementary Math Functions Compute the natural logarithm of one plus packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := LOG(1.0 + a[i+63:i]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Elementary Math Functions Compute the natural logarithm of one plus packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst". FOR j := 0 to 15 i := j*32 dst[i+31:i] := LOG(1.0 + a[i+31:i]) ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Elementary Math Functions Compute the natural logarithm of one plus packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := LOG(1.0 + a[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
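LOG(1.0 + x) is exact as a definition, but forming 1.0 + x first discards x entirely when x is much smaller than machine epsilon. The Python sketch below shows the effect using the standard library's `math.log1p`; the behaviour of any specific vector implementation is not claimed here.

```python
import math

x = 1e-20
literal = math.log(1.0 + x)  # 1.0 + 1e-20 rounds to 1.0, so the result is 0.0
dedicated = math.log1p(x)    # retains the leading term, approximately x

# For larger arguments both forms agree to within rounding.
large_gap = abs(math.log(1.0 + 2.0) - math.log1p(2.0))
```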
Floating Point AVX512F Elementary Math Functions Compute the base-2 logarithm of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst". FOR j := 0 to 7 i := j*64 dst[i+63:i] := LOG(a[i+63:i]) / LOG(2.0) ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Elementary Math Functions Compute the base-2 logarithm of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := LOG(a[i+63:i]) / LOG(2.0) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Elementary Math Functions Compute the natural logarithm of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst". FOR j := 0 to 7 i := j*64 dst[i+63:i] := LOG(a[i+63:i]) ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Elementary Math Functions Compute the natural logarithm of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := LOG(a[i+63:i]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Elementary Math Functions Compute the natural logarithm of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst". FOR j := 0 to 15 i := j*32 dst[i+31:i] := LOG(a[i+31:i]) ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Elementary Math Functions Compute the natural logarithm of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := LOG(a[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Elementary Math Functions Convert the exponent of each packed double-precision (64-bit) floating-point element in "a" to a double-precision floating-point number representing the integer exponent, and store the results in "dst". This intrinsic essentially calculates "floor(log2(x))" for each element. FOR j := 0 to 7 i := j*64 dst[i+63:i] := ConvertExpFP64(a[i+63:i]) ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Elementary Math Functions Convert the exponent of each packed double-precision (64-bit) floating-point element in "a" to a double-precision floating-point number representing the integer exponent, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). This intrinsic essentially calculates "floor(log2(x))" for each element. FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := ConvertExpFP64(a[i+63:i]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Elementary Math Functions Convert the exponent of each packed single-precision (32-bit) floating-point element in "a" to a single-precision floating-point number representing the integer exponent, and store the results in "dst". This intrinsic essentially calculates "floor(log2(x))" for each element. FOR j := 0 to 15 i := j*32 dst[i+31:i] := ConvertExpFP32(a[i+31:i]) ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Elementary Math Functions Convert the exponent of each packed single-precision (32-bit) floating-point element in "a" to a single-precision floating-point number representing the integer exponent, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). This intrinsic essentially calculates "floor(log2(x))" for each element. FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := ConvertExpFP32(a[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
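The "floor(log2(x))" behaviour of the exponent-extraction entries can be modelled for finite, nonzero inputs with `math.frexp`, which returns a mantissa m with 0.5 <= |m| < 1 and an exponent e such that x = m * 2**e. The sketch below uses the hypothetical name `get_exp`, applies the formula to the magnitude of x (an assumption), and does not model zero, infinity, NaN, or denormal handling.

```python
import math

def get_exp(x):
    """floor(log2(|x|)) as a float, for finite nonzero x only."""
    _, e = math.frexp(x)  # x = m * 2**e with 0.5 <= |m| < 1
    return float(e - 1)

a = [8.0, 10.0, 0.75, -4.0]
dst = [get_exp(x) for x in a]
```

Reading the exponent field avoids the rounding error that an explicit `math.floor(math.log2(x))` could introduce near exact powers of two.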
Floating Point AVX512F Special Math Functions Rounds each packed double-precision (64-bit) floating-point element in "a" to the nearest integer value and stores the results as packed double-precision floating-point elements in "dst". FOR j := 0 to 7 i := j*64 dst[i+63:i] := NearbyInt(a[i+63:i]) ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Special Math Functions Rounds each packed double-precision (64-bit) floating-point element in "a" to the nearest integer value and stores the results as packed double-precision floating-point elements in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := NearbyInt(a[i+63:i]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Special Math Functions Rounds each packed single-precision (32-bit) floating-point element in "a" to the nearest integer value and stores the results as packed single-precision floating-point elements in "dst". FOR j := 0 to 15 i := j*32 dst[i+31:i] := NearbyInt(a[i+31:i]) ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Special Math Functions Rounds each packed single-precision (32-bit) floating-point element in "a" to the nearest integer value and stores the results as packed single-precision floating-point elements in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := NearbyInt(a[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Elementary Math Functions Raise the packed double-precision (64-bit) floating-point elements in "a" to the power of the corresponding packed elements in "b", and store the results in "dst". FOR j := 0 to 7 i := j*64 dst[i+63:i] := POW(a[i+63:i], b[i+63:i]) ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Elementary Math Functions Raise the packed double-precision (64-bit) floating-point elements in "a" to the power of the corresponding packed elements in "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := POW(a[i+63:i], b[i+63:i]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Elementary Math Functions Raise the packed single-precision (32-bit) floating-point elements in "a" to the power of the corresponding packed elements in "b", and store the results in "dst". FOR j := 0 to 15 i := j*32 dst[i+31:i] := POW(a[i+31:i], b[i+31:i]) ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Elementary Math Functions Raise the packed single-precision (32-bit) floating-point elements in "a" to the power of the corresponding packed elements in "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := POW(a[i+31:i], b[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Elementary Math Functions Computes the reciprocal of packed double-precision (64-bit) floating-point elements in "a", storing the results in "dst". FOR j := 0 to 7 i := j*64 dst[i+63:i] := (1.0 / a[i+63:i]) ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Elementary Math Functions Computes the reciprocal of packed double-precision (64-bit) floating-point elements in "a", storing the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := (1.0 / a[i+63:i]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Elementary Math Functions Computes the reciprocal of packed single-precision (32-bit) floating-point elements in "a", storing the results in "dst". FOR j := 0 to 15 i := j*32 dst[i+31:i] := (1.0 / a[i+31:i]) ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Elementary Math Functions Computes the reciprocal of packed single-precision (32-bit) floating-point elements in "a", storing the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := (1.0 / a[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Special Math Functions Rounds the packed double-precision (64-bit) floating-point elements in "a" to the nearest integer value, with ties rounded to the nearest even integer, and stores the results in "dst". FOR j := 0 to 7 i := j*64 dst[i+63:i] := RoundToNearestEven(a[i+63:i]) ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Special Math Functions Rounds the packed double-precision (64-bit) floating-point elements in "a" to the nearest integer value, with ties rounded to the nearest even integer, and stores the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := RoundToNearestEven(a[i+63:i]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Special Math Functions Rounds the packed single-precision (32-bit) floating-point elements in "a" to the nearest integer value, with ties rounded to the nearest even integer, and stores the results in "dst". FOR j := 0 to 15 i := j*32 dst[i+31:i] := RoundToNearestEven(a[i+31:i]) ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Special Math Functions Rounds the packed single-precision (32-bit) floating-point elements in "a" to the nearest integer value, with ties rounded to the nearest even integer, and stores the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := RoundToNearestEven(a[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Special Math Functions Round the packed double-precision (64-bit) floating-point elements in "a" to the nearest integer value, and store the results as packed double-precision floating-point elements in "dst". FOR j := 0 to 7 i := j*64 dst[i+63:i] := ROUND(a[i+63:i]) ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Special Math Functions Round the packed double-precision (64-bit) floating-point elements in "a" to the nearest integer value, and store the results as packed double-precision floating-point elements in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [round_note] FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := ROUND(a[i+63:i]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Trigonometry Compute the sine of packed double-precision (64-bit) floating-point elements in "a" expressed in radians, and store the results in "dst". FOR j := 0 to 7 i := j*64 dst[i+63:i] := SIN(a[i+63:i]) ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Trigonometry Compute the sine of packed double-precision (64-bit) floating-point elements in "a" expressed in radians, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := SIN(a[i+63:i]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Trigonometry Compute the sine of packed single-precision (32-bit) floating-point elements in "a" expressed in radians, and store the results in "dst". FOR j := 0 to 15 i := j*32 dst[i+31:i] := SIN(a[i+31:i]) ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Trigonometry Compute the sine of packed single-precision (32-bit) floating-point elements in "a" expressed in radians, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := SIN(a[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Trigonometry Compute the hyperbolic sine of packed double-precision (64-bit) floating-point elements in "a" expressed in radians, and store the results in "dst". FOR j := 0 to 7 i := j*64 dst[i+63:i] := SINH(a[i+63:i]) ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Trigonometry Compute the hyperbolic sine of packed double-precision (64-bit) floating-point elements in "a" expressed in radians, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := SINH(a[i+63:i]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Trigonometry Compute the hyperbolic sine of packed single-precision (32-bit) floating-point elements in "a" expressed in radians, and store the results in "dst". FOR j := 0 to 15 i := j*32 dst[i+31:i] := SINH(a[i+31:i]) ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Trigonometry Compute the hyperbolic sine of packed single-precision (32-bit) floating-point elements in "a" expressed in radians, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := SINH(a[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Trigonometry Compute the sine of packed double-precision (64-bit) floating-point elements in "a" expressed in degrees, and store the results in "dst". FOR j := 0 to 7 i := j*64 dst[i+63:i] := SIND(a[i+63:i]) ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Trigonometry Compute the sine of packed double-precision (64-bit) floating-point elements in "a" expressed in degrees, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := SIND(a[i+63:i]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Trigonometry Compute the sine of packed single-precision (32-bit) floating-point elements in "a" expressed in degrees, and store the results in "dst". FOR j := 0 to 15 i := j*32 dst[i+31:i] := SIND(a[i+31:i]) ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Trigonometry Compute the sine of packed single-precision (32-bit) floating-point elements in "a" expressed in degrees, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := SIND(a[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Trigonometry Compute the tangent of packed double-precision (64-bit) floating-point elements in "a" expressed in radians, and store the results in "dst". FOR j := 0 to 7 i := j*64 dst[i+63:i] := TAN(a[i+63:i]) ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Trigonometry Compute the tangent of packed double-precision (64-bit) floating-point elements in "a" expressed in radians, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := TAN(a[i+63:i]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Trigonometry Compute the tangent of packed single-precision (32-bit) floating-point elements in "a" expressed in radians, and store the results in "dst". FOR j := 0 to 15 i := j*32 dst[i+31:i] := TAN(a[i+31:i]) ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Trigonometry Compute the tangent of packed single-precision (32-bit) floating-point elements in "a" expressed in radians, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := TAN(a[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Trigonometry Compute the tangent of packed double-precision (64-bit) floating-point elements in "a" expressed in degrees, and store the results in "dst". FOR j := 0 to 7 i := j*64 dst[i+63:i] := TAND(a[i+63:i]) ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Trigonometry Compute the tangent of packed double-precision (64-bit) floating-point elements in "a" expressed in degrees, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := TAND(a[i+63:i]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Trigonometry Compute the tangent of packed single-precision (32-bit) floating-point elements in "a" expressed in degrees, and store the results in "dst". FOR j := 0 to 15 i := j*32 dst[i+31:i] := TAND(a[i+31:i]) ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Trigonometry Compute the tangent of packed single-precision (32-bit) floating-point elements in "a" expressed in degrees, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := TAND(a[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Trigonometry Compute the hyperbolic tangent of packed double-precision (64-bit) floating-point elements in "a" expressed in radians, and store the results in "dst". FOR j := 0 to 7 i := j*64 dst[i+63:i] := TANH(a[i+63:i]) ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Trigonometry Compute the hyperbolic tangent of packed double-precision (64-bit) floating-point elements in "a" expressed in radians, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := TANH(a[i+63:i]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Trigonometry Compute the hyperbolic tangent of packed single-precision (32-bit) floating-point elements in "a" expressed in radians, and store the results in "dst". FOR j := 0 to 15 i := j*32 dst[i+31:i] := TANH(a[i+31:i]) ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Trigonometry Compute the hyperbolic tangent of packed single-precision (32-bit) floating-point elements in "a" expressed in radians, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := TANH(a[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Special Math Functions Truncate the packed double-precision (64-bit) floating-point elements in "a", and store the results as packed double-precision floating-point elements in "dst". FOR j := 0 to 7 i := j*64 dst[i+63:i] := TRUNCATE(a[i+63:i]) ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Special Math Functions Truncate the packed double-precision (64-bit) floating-point elements in "a", and store the results as packed double-precision floating-point elements in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := TRUNCATE(a[i+63:i]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Special Math Functions Truncate the packed single-precision (32-bit) floating-point elements in "a", and store the results as packed single-precision floating-point elements in "dst". FOR j := 0 to 15 i := j*32 dst[i+31:i] := TRUNCATE(a[i+31:i]) ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Special Math Functions Truncate the packed single-precision (32-bit) floating-point elements in "a", and store the results as packed single-precision floating-point elements in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := TRUNCATE(a[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Arithmetic Divide packed unsigned 32-bit integers in "a" by packed elements in "b", and store the truncated results in "dst". FOR j := 0 to 15 i := 32*j IF b[i+31:i] == 0 #DE FI dst[i+31:i] := Truncate32(a[i+31:i] / b[i+31:i]) ENDFOR dst[MAX:512] := 0
Integer AVX512F Arithmetic Divide packed unsigned 32-bit integers in "a" by packed elements in "b", and store the truncated results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := 32*j IF k[j] IF b[i+31:i] == 0 #DE FI dst[i+31:i] := Truncate32(a[i+31:i] / b[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
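The writemask pattern in the masked divide above can be sketched in portable scalar C. The function name mask_div_u32_model and the error-code convention are assumptions made for illustration: the pseudocode raises #DE on a zero divisor in an active lane, which plain C cannot express, so the sketch returns -1 instead.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model (not an Intel API) of a 16-lane masked
 * unsigned 32-bit divide. Returns 0 on success, or -1 if an active lane
 * has a zero divisor (stand-in for the #DE fault in the pseudocode).
 * On failure, lanes before the faulting one have already been written. */
static int mask_div_u32_model(uint32_t dst[16], const uint32_t src[16],
                              uint16_t k, const uint32_t a[16],
                              const uint32_t b[16])
{
    assert(dst && src && a && b);
    for (int j = 0; j < 16; j++) {
        if ((k >> j) & 1u) {
            if (b[j] == 0)
                return -1;          /* active lane divides by zero */
            dst[j] = a[j] / b[j];   /* unsigned division truncates */
        } else {
            dst[j] = src[j];        /* inactive lane: copy from src */
        }
    }
    return 0;
}
```

Note that a zero divisor in an inactive lane is harmless in this model, because the check sits inside the "IF k[j]" branch, as it does in the pseudocode.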
Integer AVX512F Arithmetic Divide packed unsigned 8-bit integers in "a" by packed elements in "b", and store the truncated results in "dst". FOR j := 0 to 63 i := 8*j IF b[i+7:i] == 0 #DE FI dst[i+7:i] := Truncate8(a[i+7:i] / b[i+7:i]) ENDFOR dst[MAX:512] := 0
Integer AVX512F Arithmetic Divide packed unsigned 16-bit integers in "a" by packed elements in "b", and store the truncated results in "dst". FOR j := 0 to 31 i := 16*j IF b[i+15:i] == 0 #DE FI dst[i+15:i] := Truncate16(a[i+15:i] / b[i+15:i]) ENDFOR dst[MAX:512] := 0
Integer AVX512F Arithmetic Divide packed unsigned 64-bit integers in "a" by packed elements in "b", and store the truncated results in "dst". FOR j := 0 to 7 i := 64*j IF b[i+63:i] == 0 #DE FI dst[i+63:i] := Truncate64(a[i+63:i] / b[i+63:i]) ENDFOR dst[MAX:512] := 0
Integer AVX512F Arithmetic Divide packed unsigned 32-bit integers in "a" by packed elements in "b", and store the remainders as packed unsigned 32-bit integers in "dst". FOR j := 0 to 15 i := 32*j dst[i+31:i] := REMAINDER(a[i+31:i] / b[i+31:i]) ENDFOR dst[MAX:512] := 0
Integer AVX512F Arithmetic Divide packed unsigned 32-bit integers in "a" by packed elements in "b", and store the remainders as packed unsigned 32-bit integers in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := 32*j IF k[j] dst[i+31:i] := REMAINDER(a[i+31:i] / b[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512F Arithmetic Divide packed unsigned 8-bit integers in "a" by packed elements in "b", and store the remainders as packed unsigned 8-bit integers in "dst". FOR j := 0 to 63 i := 8*j dst[i+7:i] := REMAINDER(a[i+7:i] / b[i+7:i]) ENDFOR dst[MAX:512] := 0
Integer AVX512F Arithmetic Divide packed unsigned 16-bit integers in "a" by packed elements in "b", and store the remainders as packed unsigned 16-bit integers in "dst". FOR j := 0 to 31 i := 16*j dst[i+15:i] := REMAINDER(a[i+15:i] / b[i+15:i]) ENDFOR dst[MAX:512] := 0
Integer AVX512F Arithmetic Divide packed unsigned 64-bit integers in "a" by packed elements in "b", and store the remainders as packed unsigned 64-bit integers in "dst". FOR j := 0 to 7 i := 64*j dst[i+63:i] := REMAINDER(a[i+63:i] / b[i+63:i]) ENDFOR dst[MAX:512] := 0
Mask AVX512F Mask Performs bitwise OR between "k1" and "k2", storing the result in "dst". ZF flag is set if "dst" is 0. dst[15:0] := k1[15:0] | k2[15:0] IF dst == 0 SetZF() FI
Mask AVX512F Mask Performs bitwise OR between "k1" and "k2", storing the result in "dst". CF flag is set if "dst" consists of all 1's. dst[15:0] := k1[15:0] | k2[15:0] IF PopCount(dst[15:0]) == 16 SetCF() FI
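The two flag conditions described above (ZF when the OR is zero, CF when all 16 bits are set) can be modeled together in a few lines. The struct and function names below are hypothetical and exist only for this sketch; the real intrinsics return the flag as an integer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result type holding both flags from the entries above. */
struct kortest_flags {
    bool zf;   /* set when (k1 | k2) == 0 */
    bool cf;   /* set when (k1 | k2) has all 16 bits set */
};

/* Scalar model of the OR-and-test operation on two 16-bit masks. */
static struct kortest_flags kortest16_model(uint16_t k1, uint16_t k2)
{
    uint16_t r = (uint16_t)(k1 | k2);
    struct kortest_flags f;
    f.zf = (r == 0);
    f.cf = (r == 0xFFFF);   /* same as PopCount(r) == 16 */
    return f;
}

/* Usage sketch: two complementary masks cover every lane, so CF is set. */
static void kortest16_example(void)
{
    assert(kortest16_model(0x0000, 0x0000).zf);
    assert(kortest16_model(0xFF00, 0x00FF).cf);
    assert(!kortest16_model(0x0001, 0x0002).zf);
    assert(!kortest16_model(0x0001, 0x0002).cf);
}
```

A common use of the CF form is checking that a predicate held in every lane; the ZF form checks that it held in none.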
AVX512F Mask Converts bit mask "k1" into an integer value, storing the results in "dst". dst := ZeroExtend32(k1)
AVX512F Mask Converts integer "mask" into bitmask, storing the result in "dst". dst := mask[15:0]
Integer AVX512F Store Multiplies elements in packed 64-bit integer vectors "a" and "b" together, storing the lower 64 bits of the result in "dst". FOR j := 0 to 7 i := j*64 dst[i+63:i] := a[i+63:i] * b[i+63:i] ENDFOR dst[MAX:512] := 0
Integer AVX512F Store Multiplies elements in packed 64-bit integer vectors "a" and "b" together, storing the lower 64 bits of the result in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := a[i+63:i] * b[i+63:i] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
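Keeping "the lower 64 bits" of a 64-bit product is the same as multiplying modulo 2^64, which is what unsigned multiplication in C already does. The sketch below models the masked form under that observation; mask_mullo_u64_model is a hypothetical name, not an Intel API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of an 8-lane masked 64-bit low multiply.
 * Unsigned multiplication in C wraps modulo 2^64, so each active lane
 * keeps exactly the low 64 bits of the full 128-bit product. */
static void mask_mullo_u64_model(uint64_t dst[8], const uint64_t src[8],
                                 uint8_t k, const uint64_t a[8],
                                 const uint64_t b[8])
{
    assert(dst && src && a && b);
    for (int j = 0; j < 8; j++)
        dst[j] = ((k >> j) & 1u) ? a[j] * b[j] : src[j];
}
```

Because only the low half is kept, the result is the same bit pattern whether the inputs are interpreted as signed or unsigned.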
Floating Point AVX512F Trigonometry Compute the sine and cosine of packed double-precision (64-bit) floating-point elements in "a" expressed in radians, store the sine in "dst", and store the cosine into memory at "mem_addr". FOR j := 0 to 7 i := j*64 dst[i+63:i] := SIN(a[i+63:i]) MEM[mem_addr+i+63:mem_addr+i] := COS(a[i+63:i]) ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Trigonometry Compute the sine and cosine of packed double-precision (64-bit) floating-point elements in "a" expressed in radians, store the sine in "dst", and store the cosine into memory at "mem_addr". Elements are written to their respective locations using writemask "k" (elements are copied from "sin_src" or "cos_src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := SIN(a[i+63:i]) MEM[mem_addr+i+63:mem_addr+i] := COS(a[i+63:i]) ELSE dst[i+63:i] := sin_src[i+63:i] MEM[mem_addr+i+63:mem_addr+i] := cos_src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Trigonometry Compute the sine and cosine of packed single-precision (32-bit) floating-point elements in "a" expressed in radians, store the sine in "dst", and store the cosine into memory at "mem_addr". FOR j := 0 to 15 i := j*32 dst[i+31:i] := SIN(a[i+31:i]) MEM[mem_addr+i+31:mem_addr+i] := COS(a[i+31:i]) ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Trigonometry Compute the sine and cosine of packed single-precision (32-bit) floating-point elements in "a" expressed in radians, store the sine in "dst", and store the cosine into memory at "mem_addr". Elements are written to their respective locations using writemask "k" (elements are copied from "sin_src" or "cos_src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := SIN(a[i+31:i]) MEM[mem_addr+i+31:mem_addr+i] := COS(a[i+31:i]) ELSE dst[i+31:i] := sin_src[i+31:i] MEM[mem_addr+i+31:mem_addr+i] := cos_src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F Convert Copy the lower single-precision (32-bit) floating-point element of "a" to "dst". dst[31:0] := a[31:0]
Floating Point AVX512F Convert Copy the lower double-precision (64-bit) floating-point element of "a" to "dst". dst[63:0] := a[63:0]
Integer AVX512F Convert Copy the lower 32-bit integer in "a" to "dst". dst[31:0] := a[31:0]
Floating Point AVX512F/KNCNI Arithmetic Add packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst". FOR j := 0 to 7 i := j*64 dst[i+63:i] := a[i+63:i] + b[i+63:i] ENDFOR dst[MAX:512] := 0
Floating Point AVX512F/KNCNI Arithmetic Add packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst". [round_note] FOR j := 0 to 7 i := j*64 dst[i+63:i] := a[i+63:i] + b[i+63:i] ENDFOR dst[MAX:512] := 0
Floating Point AVX512F/KNCNI Arithmetic Add packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := a[i+63:i] + b[i+63:i] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F/KNCNI Arithmetic Add packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [round_note] FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := a[i+63:i] + b[i+63:i] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F/KNCNI Arithmetic Add packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst". FOR j := 0 to 15 i := j*32 dst[i+31:i] := a[i+31:i] + b[i+31:i] ENDFOR dst[MAX:512] := 0
Floating Point AVX512F/KNCNI Arithmetic Add packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst". [round_note] FOR j := 0 to 15 i := j*32 dst[i+31:i] := a[i+31:i] + b[i+31:i] ENDFOR dst[MAX:512] := 0
Floating Point AVX512F/KNCNI Arithmetic Add packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := a[i+31:i] + b[i+31:i] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F/KNCNI Arithmetic Add packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [round_note] FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := a[i+31:i] + b[i+31:i] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512F/KNCNI Miscellaneous Concatenate "a" and "b" into a 128-byte intermediate result, shift the result right by "imm8" 32-bit elements, and store the low 64 bytes (16 elements) in "dst". temp[1023:512] := a[511:0] temp[511:0] := b[511:0] temp[1023:0] := temp[1023:0] >> (32*imm8[3:0]) dst[511:0] := temp[511:0] dst[MAX:512] := 0
Integer AVX512F/KNCNI Miscellaneous Concatenate "a" and "b" into a 128-byte intermediate result, shift the result right by "imm8" 32-bit elements, and store the low 64 bytes (16 elements) in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). temp[1023:512] := a[511:0] temp[511:0] := b[511:0] temp[1023:0] := temp[1023:0] >> (32*imm8[3:0]) FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := temp[i+31:i] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
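The concatenate-and-shift operation above is easier to follow at element granularity. The sketch below is a scalar model of the unmasked form, with arrays of 16 elements standing in for the 512-bit vectors; the name alignr_epi32_model is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model: temp = a:b ("a" in the upper 16 elements),
 * shifted right by imm8[3:0] whole 32-bit elements, low 16 elements kept. */
static void alignr_epi32_model(uint32_t dst[16], const uint32_t a[16],
                               const uint32_t b[16], unsigned imm8)
{
    uint32_t temp[32];
    unsigned shift = imm8 & 0xFu;   /* only imm8[3:0] is used */

    assert(dst && a && b);
    for (int j = 0; j < 16; j++) {
        temp[j] = b[j];             /* temp[511:0]    := b */
        temp[16 + j] = a[j];        /* temp[1023:512] := a */
    }
    for (int j = 0; j < 16; j++)
        dst[j] = temp[j + shift];   /* highest index read is 15 + 15 = 30 */
}
```

With a shift of 3, dst receives elements 3 to 15 of "b" followed by elements 0 to 2 of "a". Copying into temp first means dst may alias either input in this model.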
Floating Point AVX512F/KNCNI Swizzle Blend packed double-precision (64-bit) floating-point elements from "a" and "b" using control mask "k", and store the results in "dst". FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := b[i+63:i] ELSE dst[i+63:i] := a[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F/KNCNI Swizzle Blend packed single-precision (32-bit) floating-point elements from "a" and "b" using control mask "k", and store the results in "dst". FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := b[i+31:i] ELSE dst[i+31:i] := a[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point Mask AVX512F/KNCNI Compare Compare packed double-precision (64-bit) floating-point elements in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k". CASE (imm8[4:0]) OF 0: OP := _CMP_EQ_OQ 1: OP := _CMP_LT_OS 2: OP := _CMP_LE_OS 3: OP := _CMP_UNORD_Q 4: OP := _CMP_NEQ_UQ 5: OP := _CMP_NLT_US 6: OP := _CMP_NLE_US 7: OP := _CMP_ORD_Q 8: OP := _CMP_EQ_UQ 9: OP := _CMP_NGE_US 10: OP := _CMP_NGT_US 11: OP := _CMP_FALSE_OQ 12: OP := _CMP_NEQ_OQ 13: OP := _CMP_GE_OS 14: OP := _CMP_GT_OS 15: OP := _CMP_TRUE_UQ 16: OP := _CMP_EQ_OS 17: OP := _CMP_LT_OQ 18: OP := _CMP_LE_OQ 19: OP := _CMP_UNORD_S 20: OP := _CMP_NEQ_US 21: OP := _CMP_NLT_UQ 22: OP := _CMP_NLE_UQ 23: OP := _CMP_ORD_S 24: OP := _CMP_EQ_US 25: OP := _CMP_NGE_UQ 26: OP := _CMP_NGT_UQ 27: OP := _CMP_FALSE_OS 28: OP := _CMP_NEQ_OS 29: OP := _CMP_GE_OQ 30: OP := _CMP_GT_OQ 31: OP := _CMP_TRUE_US ESAC FOR j := 0 to 7 i := j*64 k[j] := (a[i+63:i] OP b[i+63:i]) ? 1 : 0 ENDFOR k[MAX:8] := 0
Floating Point Mask AVX512F/KNCNI Compare Compare packed double-precision (64-bit) floating-point elements in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k". [sae_note] CASE (imm8[4:0]) OF 0: OP := _CMP_EQ_OQ 1: OP := _CMP_LT_OS 2: OP := _CMP_LE_OS 3: OP := _CMP_UNORD_Q 4: OP := _CMP_NEQ_UQ 5: OP := _CMP_NLT_US 6: OP := _CMP_NLE_US 7: OP := _CMP_ORD_Q 8: OP := _CMP_EQ_UQ 9: OP := _CMP_NGE_US 10: OP := _CMP_NGT_US 11: OP := _CMP_FALSE_OQ 12: OP := _CMP_NEQ_OQ 13: OP := _CMP_GE_OS 14: OP := _CMP_GT_OS 15: OP := _CMP_TRUE_UQ 16: OP := _CMP_EQ_OS 17: OP := _CMP_LT_OQ 18: OP := _CMP_LE_OQ 19: OP := _CMP_UNORD_S 20: OP := _CMP_NEQ_US 21: OP := _CMP_NLT_UQ 22: OP := _CMP_NLE_UQ 23: OP := _CMP_ORD_S 24: OP := _CMP_EQ_US 25: OP := _CMP_NGE_UQ 26: OP := _CMP_NGT_UQ 27: OP := _CMP_FALSE_OS 28: OP := _CMP_NEQ_OS 29: OP := _CMP_GE_OQ 30: OP := _CMP_GT_OQ 31: OP := _CMP_TRUE_US ESAC FOR j := 0 to 7 i := j*64 k[j] := (a[i+63:i] OP b[i+63:i]) ? 1 : 0 ENDFOR k[MAX:8] := 0
Floating Point Mask AVX512F/KNCNI Compare Compare packed double-precision (64-bit) floating-point elements in "a" and "b" for equality, and store the results in mask vector "k". FOR j := 0 to 7 i := j*64 k[j] := (a[i+63:i] == b[i+63:i]) ? 1 : 0 ENDFOR k[MAX:8] := 0
Floating Point Mask AVX512F/KNCNI Compare Compare packed double-precision (64-bit) floating-point elements in "a" and "b" for less-than-or-equal, and store the results in mask vector "k". FOR j := 0 to 7 i := j*64 k[j] := (a[i+63:i] <= b[i+63:i]) ? 1 : 0 ENDFOR k[MAX:8] := 0
Floating Point Mask AVX512F/KNCNI Compare Compare packed double-precision (64-bit) floating-point elements in "a" and "b" for less-than, and store the results in mask vector "k". FOR j := 0 to 7 i := j*64 k[j] := (a[i+63:i] < b[i+63:i]) ? 1 : 0 ENDFOR k[MAX:8] := 0
Floating Point Mask AVX512F/KNCNI Compare Compare packed double-precision (64-bit) floating-point elements in "a" and "b" for not-equal, and store the results in mask vector "k". FOR j := 0 to 7 i := j*64 k[j] := (a[i+63:i] != b[i+63:i]) ? 1 : 0 ENDFOR k[MAX:8] := 0
Floating Point Mask AVX512F/KNCNI Compare Compare packed double-precision (64-bit) floating-point elements in "a" and "b" for not-less-than-or-equal, and store the results in mask vector "k". FOR j := 0 to 7 i := j*64 k[j] := (!(a[i+63:i] <= b[i+63:i])) ? 1 : 0 ENDFOR k[MAX:8] := 0
Floating Point Mask AVX512F/KNCNI Compare Compare packed double-precision (64-bit) floating-point elements in "a" and "b" for not-less-than, and store the results in mask vector "k". FOR j := 0 to 7 i := j*64 k[j] := (!(a[i+63:i] < b[i+63:i])) ? 1 : 0 ENDFOR k[MAX:8] := 0
Floating Point Mask AVX512F/KNCNI Compare Compare packed double-precision (64-bit) floating-point elements in "a" and "b" to see if neither is NaN, and store the results in mask vector "k". FOR j := 0 to 7 i := j*64 k[j] := (a[i+63:i] != NaN AND b[i+63:i] != NaN) ? 1 : 0 ENDFOR k[MAX:8] := 0
Floating Point Mask AVX512F/KNCNI Compare Compare packed double-precision (64-bit) floating-point elements in "a" and "b" to see if either is NaN, and store the results in mask vector "k". FOR j := 0 to 7 i := j*64 k[j] := (a[i+63:i] == NaN OR b[i+63:i] == NaN) ? 1 : 0 ENDFOR k[MAX:8] := 0
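The fixed-predicate compares above differ mainly in how they treat NaN. The scalar sketch below models a few of them for 8 double-precision lanes. The enum and function names are hypothetical, and the signaling versus quiet exception behaviour of the real predicates is not modeled, only the resulting mask bits.

```c
#include <assert.h>
#include <math.h>     /* NAN macro only; no libm functions are called */
#include <stdint.h>

/* Hypothetical predicate selector for this sketch. */
enum cmp_model_op {
    CMP_MODEL_EQ, CMP_MODEL_LT, CMP_MODEL_NLT, CMP_MODEL_ORD, CMP_MODEL_UNORD
};

/* Scalar model: bit j of the result is set when the predicate holds for
 * lane j. A NaN in either operand makes EQ and LT false and NLT true. */
static uint8_t cmp_pd_mask_model(const double a[8], const double b[8],
                                 enum cmp_model_op op)
{
    uint8_t k = 0;
    for (int j = 0; j < 8; j++) {
        int unord = (a[j] != a[j]) || (b[j] != b[j]);  /* either is NaN */
        int bit;
        switch (op) {
        case CMP_MODEL_EQ:  bit = (a[j] == b[j]);  break;
        case CMP_MODEL_LT:  bit = (a[j] < b[j]);   break;
        case CMP_MODEL_NLT: bit = !(a[j] < b[j]);  break;
        case CMP_MODEL_ORD: bit = !unord;          break;
        default:            bit = unord;           break;
        }
        if (bit)
            k |= (uint8_t)(1u << j);
    }
    return k;
}

/* Usage sketch: lanes 3 and 4 each contain a NaN operand. */
static void cmp_model_example(void)
{
    const double a[8] = {1, 2, 3, NAN, 5, 6, 7, 8};
    const double b[8] = {1, 3, 2, 4, NAN, 6, 9, 0};
    assert(cmp_pd_mask_model(a, b, CMP_MODEL_LT) == 0x42);
    assert(cmp_pd_mask_model(a, b, CMP_MODEL_NLT) == 0xBD);
    assert(cmp_pd_mask_model(a, b, CMP_MODEL_UNORD) == 0x18);
}
```

In the usage sketch, the "not-less-than" mask is the exact bitwise complement of the "less-than" mask, so the NaN lanes count as not-less-than rather than being dropped.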
Floating Point Mask AVX512F/KNCNI Compare Compare packed double-precision (64-bit) floating-point elements in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). CASE (imm8[4:0]) OF 0: OP := _CMP_EQ_OQ 1: OP := _CMP_LT_OS 2: OP := _CMP_LE_OS 3: OP := _CMP_UNORD_Q 4: OP := _CMP_NEQ_UQ 5: OP := _CMP_NLT_US 6: OP := _CMP_NLE_US 7: OP := _CMP_ORD_Q 8: OP := _CMP_EQ_UQ 9: OP := _CMP_NGE_US 10: OP := _CMP_NGT_US 11: OP := _CMP_FALSE_OQ 12: OP := _CMP_NEQ_OQ 13: OP := _CMP_GE_OS 14: OP := _CMP_GT_OS 15: OP := _CMP_TRUE_UQ 16: OP := _CMP_EQ_OS 17: OP := _CMP_LT_OQ 18: OP := _CMP_LE_OQ 19: OP := _CMP_UNORD_S 20: OP := _CMP_NEQ_US 21: OP := _CMP_NLT_UQ 22: OP := _CMP_NLE_UQ 23: OP := _CMP_ORD_S 24: OP := _CMP_EQ_US 25: OP := _CMP_NGE_UQ 26: OP := _CMP_NGT_UQ 27: OP := _CMP_FALSE_OS 28: OP := _CMP_NEQ_OS 29: OP := _CMP_GE_OQ 30: OP := _CMP_GT_OQ 31: OP := _CMP_TRUE_US ESAC FOR j := 0 to 7 i := j*64 IF k1[j] k[j] := ( a[i+63:i] OP b[i+63:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:8] := 0
Floating Point Mask AVX512F/KNCNI Compare Compare packed double-precision (64-bit) floating-point elements in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). [sae_note] CASE (imm8[4:0]) OF 0: OP := _CMP_EQ_OQ 1: OP := _CMP_LT_OS 2: OP := _CMP_LE_OS 3: OP := _CMP_UNORD_Q 4: OP := _CMP_NEQ_UQ 5: OP := _CMP_NLT_US 6: OP := _CMP_NLE_US 7: OP := _CMP_ORD_Q 8: OP := _CMP_EQ_UQ 9: OP := _CMP_NGE_US 10: OP := _CMP_NGT_US 11: OP := _CMP_FALSE_OQ 12: OP := _CMP_NEQ_OQ 13: OP := _CMP_GE_OS 14: OP := _CMP_GT_OS 15: OP := _CMP_TRUE_UQ 16: OP := _CMP_EQ_OS 17: OP := _CMP_LT_OQ 18: OP := _CMP_LE_OQ 19: OP := _CMP_UNORD_S 20: OP := _CMP_NEQ_US 21: OP := _CMP_NLT_UQ 22: OP := _CMP_NLE_UQ 23: OP := _CMP_ORD_S 24: OP := _CMP_EQ_US 25: OP := _CMP_NGE_UQ 26: OP := _CMP_NGT_UQ 27: OP := _CMP_FALSE_OS 28: OP := _CMP_NEQ_OS 29: OP := _CMP_GE_OQ 30: OP := _CMP_GT_OQ 31: OP := _CMP_TRUE_US ESAC FOR j := 0 to 7 i := j*64 IF k1[j] k[j] := ( a[i+63:i] OP b[i+63:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:8] := 0
Floating Point Mask AVX512F/KNCNI Compare Compare packed double-precision (64-bit) floating-point elements in "a" and "b" for equality, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k1[j] k[j] := (a[i+63:i] == b[i+63:i]) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:8] := 0
Floating Point Mask AVX512F/KNCNI Compare Compare packed double-precision (64-bit) floating-point elements in "a" and "b" for less-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k1[j] k[j] := (a[i+63:i] <= b[i+63:i]) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:8] := 0
Floating Point Mask AVX512F/KNCNI Compare Compare packed double-precision (64-bit) floating-point elements in "a" and "b" for less-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k1[j] k[j] := (a[i+63:i] < b[i+63:i]) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:8] := 0
Floating Point Mask AVX512F/KNCNI Compare Compare packed double-precision (64-bit) floating-point elements in "a" and "b" for not-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k1[j] k[j] := (a[i+63:i] != b[i+63:i]) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:8] := 0
Floating Point Mask AVX512F/KNCNI Compare Compare packed double-precision (64-bit) floating-point elements in "a" and "b" for not-less-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k1[j] k[j] := (!(a[i+63:i] <= b[i+63:i])) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:8] := 0
Floating Point Mask AVX512F/KNCNI Compare Compare packed double-precision (64-bit) floating-point elements in "a" and "b" for not-less-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k1[j] k[j] := (!(a[i+63:i] < b[i+63:i])) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:8] := 0
Floating Point Mask AVX512F/KNCNI Compare Compare packed double-precision (64-bit) floating-point elements in "a" and "b" to see if neither is NaN, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k1[j] k[j] := ((a[i+63:i] != NaN) AND (b[i+63:i] != NaN)) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:8] := 0
Floating Point Mask AVX512F/KNCNI Compare Compare packed double-precision (64-bit) floating-point elements in "a" and "b" to see if either is NaN, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k1[j] k[j] := ((a[i+63:i] == NaN) OR (b[i+63:i] == NaN)) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:8] := 0
Floating Point Mask AVX512F/KNCNI Compare Compare packed single-precision (32-bit) floating-point elements in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k". CASE (imm8[4:0]) OF 0: OP := _CMP_EQ_OQ 1: OP := _CMP_LT_OS 2: OP := _CMP_LE_OS 3: OP := _CMP_UNORD_Q 4: OP := _CMP_NEQ_UQ 5: OP := _CMP_NLT_US 6: OP := _CMP_NLE_US 7: OP := _CMP_ORD_Q 8: OP := _CMP_EQ_UQ 9: OP := _CMP_NGE_US 10: OP := _CMP_NGT_US 11: OP := _CMP_FALSE_OQ 12: OP := _CMP_NEQ_OQ 13: OP := _CMP_GE_OS 14: OP := _CMP_GT_OS 15: OP := _CMP_TRUE_UQ 16: OP := _CMP_EQ_OS 17: OP := _CMP_LT_OQ 18: OP := _CMP_LE_OQ 19: OP := _CMP_UNORD_S 20: OP := _CMP_NEQ_US 21: OP := _CMP_NLT_UQ 22: OP := _CMP_NLE_UQ 23: OP := _CMP_ORD_S 24: OP := _CMP_EQ_US 25: OP := _CMP_NGE_UQ 26: OP := _CMP_NGT_UQ 27: OP := _CMP_FALSE_OS 28: OP := _CMP_NEQ_OS 29: OP := _CMP_GE_OQ 30: OP := _CMP_GT_OQ 31: OP := _CMP_TRUE_US ESAC FOR j := 0 to 15 i := j*32 k[j] := (a[i+31:i] OP b[i+31:i]) ? 1 : 0 ENDFOR k[MAX:16] := 0
Floating Point Mask AVX512F/KNCNI Compare Compare packed single-precision (32-bit) floating-point elements in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k". [sae_note] CASE (imm8[4:0]) OF 0: OP := _CMP_EQ_OQ 1: OP := _CMP_LT_OS 2: OP := _CMP_LE_OS 3: OP := _CMP_UNORD_Q 4: OP := _CMP_NEQ_UQ 5: OP := _CMP_NLT_US 6: OP := _CMP_NLE_US 7: OP := _CMP_ORD_Q 8: OP := _CMP_EQ_UQ 9: OP := _CMP_NGE_US 10: OP := _CMP_NGT_US 11: OP := _CMP_FALSE_OQ 12: OP := _CMP_NEQ_OQ 13: OP := _CMP_GE_OS 14: OP := _CMP_GT_OS 15: OP := _CMP_TRUE_UQ 16: OP := _CMP_EQ_OS 17: OP := _CMP_LT_OQ 18: OP := _CMP_LE_OQ 19: OP := _CMP_UNORD_S 20: OP := _CMP_NEQ_US 21: OP := _CMP_NLT_UQ 22: OP := _CMP_NLE_UQ 23: OP := _CMP_ORD_S 24: OP := _CMP_EQ_US 25: OP := _CMP_NGE_UQ 26: OP := _CMP_NGT_UQ 27: OP := _CMP_FALSE_OS 28: OP := _CMP_NEQ_OS 29: OP := _CMP_GE_OQ 30: OP := _CMP_GT_OQ 31: OP := _CMP_TRUE_US ESAC FOR j := 0 to 15 i := j*32 k[j] := (a[i+31:i] OP b[i+31:i]) ? 1 : 0 ENDFOR k[MAX:16] := 0
Floating Point Mask AVX512F/KNCNI Compare Compare packed single-precision (32-bit) floating-point elements in "a" and "b" for equality, and store the results in mask vector "k". FOR j := 0 to 15 i := j*32 k[j] := (a[i+31:i] == b[i+31:i]) ? 1 : 0 ENDFOR k[MAX:16] := 0
Floating Point Mask AVX512F/KNCNI Compare Compare packed single-precision (32-bit) floating-point elements in "a" and "b" for less-than-or-equal, and store the results in mask vector "k". FOR j := 0 to 15 i := j*32 k[j] := (a[i+31:i] <= b[i+31:i]) ? 1 : 0 ENDFOR k[MAX:16] := 0
Floating Point Mask AVX512F/KNCNI Compare Compare packed single-precision (32-bit) floating-point elements in "a" and "b" for less-than, and store the results in mask vector "k". FOR j := 0 to 15 i := j*32 k[j] := (a[i+31:i] < b[i+31:i]) ? 1 : 0 ENDFOR k[MAX:16] := 0
Floating Point Mask AVX512F/KNCNI Compare Compare packed single-precision (32-bit) floating-point elements in "a" and "b" for not-equal, and store the results in mask vector "k". FOR j := 0 to 15 i := j*32 k[j] := (a[i+31:i] != b[i+31:i]) ? 1 : 0 ENDFOR k[MAX:16] := 0
Floating Point Mask AVX512F/KNCNI Compare Compare packed single-precision (32-bit) floating-point elements in "a" and "b" for not-less-than-or-equal, and store the results in mask vector "k". FOR j := 0 to 15 i := j*32 k[j] := (!(a[i+31:i] <= b[i+31:i])) ? 1 : 0 ENDFOR k[MAX:16] := 0
Floating Point Mask AVX512F/KNCNI Compare Compare packed single-precision (32-bit) floating-point elements in "a" and "b" for not-less-than, and store the results in mask vector "k". FOR j := 0 to 15 i := j*32 k[j] := (!(a[i+31:i] < b[i+31:i])) ? 1 : 0 ENDFOR k[MAX:16] := 0
Floating Point Mask AVX512F/KNCNI Compare Compare packed single-precision (32-bit) floating-point elements in "a" and "b" to see if neither is NaN, and store the results in mask vector "k". FOR j := 0 to 15 i := j*32 k[j] := ((a[i+31:i] != NaN) AND (b[i+31:i] != NaN)) ? 1 : 0 ENDFOR k[MAX:16] := 0
Floating Point Mask AVX512F/KNCNI Compare Compare packed single-precision (32-bit) floating-point elements in "a" and "b" to see if either is NaN, and store the results in mask vector "k". FOR j := 0 to 15 i := j*32 k[j] := ((a[i+31:i] == NaN) OR (b[i+31:i] == NaN)) ? 1 : 0 ENDFOR k[MAX:16] := 0
Floating Point Mask AVX512F/KNCNI Compare Compare packed single-precision (32-bit) floating-point elements in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). CASE (imm8[4:0]) OF 0: OP := _CMP_EQ_OQ 1: OP := _CMP_LT_OS 2: OP := _CMP_LE_OS 3: OP := _CMP_UNORD_Q 4: OP := _CMP_NEQ_UQ 5: OP := _CMP_NLT_US 6: OP := _CMP_NLE_US 7: OP := _CMP_ORD_Q 8: OP := _CMP_EQ_UQ 9: OP := _CMP_NGE_US 10: OP := _CMP_NGT_US 11: OP := _CMP_FALSE_OQ 12: OP := _CMP_NEQ_OQ 13: OP := _CMP_GE_OS 14: OP := _CMP_GT_OS 15: OP := _CMP_TRUE_UQ 16: OP := _CMP_EQ_OS 17: OP := _CMP_LT_OQ 18: OP := _CMP_LE_OQ 19: OP := _CMP_UNORD_S 20: OP := _CMP_NEQ_US 21: OP := _CMP_NLT_UQ 22: OP := _CMP_NLE_UQ 23: OP := _CMP_ORD_S 24: OP := _CMP_EQ_US 25: OP := _CMP_NGE_UQ 26: OP := _CMP_NGT_UQ 27: OP := _CMP_FALSE_OS 28: OP := _CMP_NEQ_OS 29: OP := _CMP_GE_OQ 30: OP := _CMP_GT_OQ 31: OP := _CMP_TRUE_US ESAC FOR j := 0 to 15 i := j*32 IF k1[j] k[j] := ( a[i+31:i] OP b[i+31:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:16] := 0
Floating Point Mask AVX512F/KNCNI Compare Compare packed single-precision (32-bit) floating-point elements in "a" and "b" based on the comparison operand specified by "imm8", and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). [sae_note] CASE (imm8[4:0]) OF 0: OP := _CMP_EQ_OQ 1: OP := _CMP_LT_OS 2: OP := _CMP_LE_OS 3: OP := _CMP_UNORD_Q 4: OP := _CMP_NEQ_UQ 5: OP := _CMP_NLT_US 6: OP := _CMP_NLE_US 7: OP := _CMP_ORD_Q 8: OP := _CMP_EQ_UQ 9: OP := _CMP_NGE_US 10: OP := _CMP_NGT_US 11: OP := _CMP_FALSE_OQ 12: OP := _CMP_NEQ_OQ 13: OP := _CMP_GE_OS 14: OP := _CMP_GT_OS 15: OP := _CMP_TRUE_UQ 16: OP := _CMP_EQ_OS 17: OP := _CMP_LT_OQ 18: OP := _CMP_LE_OQ 19: OP := _CMP_UNORD_S 20: OP := _CMP_NEQ_US 21: OP := _CMP_NLT_UQ 22: OP := _CMP_NLE_UQ 23: OP := _CMP_ORD_S 24: OP := _CMP_EQ_US 25: OP := _CMP_NGE_UQ 26: OP := _CMP_NGT_UQ 27: OP := _CMP_FALSE_OS 28: OP := _CMP_NEQ_OS 29: OP := _CMP_GE_OQ 30: OP := _CMP_GT_OQ 31: OP := _CMP_TRUE_US ESAC FOR j := 0 to 15 i := j*32 IF k1[j] k[j] := ( a[i+31:i] OP b[i+31:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:16] := 0
Floating Point Mask AVX512F/KNCNI Compare Compare packed single-precision (32-bit) floating-point elements in "a" and "b" for equality, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k1[j] k[j] := (a[i+31:i] == b[i+31:i]) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:16] := 0
Floating Point Mask AVX512F/KNCNI Compare Compare packed single-precision (32-bit) floating-point elements in "a" and "b" for less-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k1[j] k[j] := (a[i+31:i] <= b[i+31:i]) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:16] := 0
Floating Point Mask AVX512F/KNCNI Compare Compare packed single-precision (32-bit) floating-point elements in "a" and "b" for less-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k1[j] k[j] := (a[i+31:i] < b[i+31:i]) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:16] := 0
Floating Point Mask AVX512F/KNCNI Compare Compare packed single-precision (32-bit) floating-point elements in "a" and "b" for not-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k1[j] k[j] := (a[i+31:i] != b[i+31:i]) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:16] := 0
Floating Point Mask AVX512F/KNCNI Compare Compare packed single-precision (32-bit) floating-point elements in "a" and "b" for not-less-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k1[j] k[j] := (!(a[i+31:i] <= b[i+31:i])) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:16] := 0
Floating Point Mask AVX512F/KNCNI Compare Compare packed single-precision (32-bit) floating-point elements in "a" and "b" for not-less-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k1[j] k[j] := (!(a[i+31:i] < b[i+31:i])) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:16] := 0
Floating Point Mask AVX512F/KNCNI Compare Compare packed single-precision (32-bit) floating-point elements in "a" and "b" to see if neither is NaN, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k1[j] k[j] := ((a[i+31:i] != NaN) AND (b[i+31:i] != NaN)) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:16] := 0
Floating Point Mask AVX512F/KNCNI Compare Compare packed single-precision (32-bit) floating-point elements in "a" and "b" to see if either is NaN, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k1[j] k[j] := ((a[i+31:i] == NaN) OR (b[i+31:i] == NaN)) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:16] := 0
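The zeromasked compare entries above all share one loop shape: evaluate a predicate per lane, then AND the result with the corresponding bit of "k1". The following is a minimal scalar sketch of that loop for the single-precision case, covering only the eight predicates that correspond to imm8 values 0 through 7. The names cmp_pred, cmp_lane and mask_cmp_ps_model are hypothetical and exist only for this illustration; the sketch also ignores the signaling versus quiet distinction, since plain C comparisons do not expose it.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>

/* Hypothetical predicate names for this sketch, in the same order as
   imm8 values 0..7 in the CASE table (EQ_OQ, LT_OS, LE_OS, UNORD_Q,
   NEQ_UQ, NLT_US, NLE_US, ORD_Q). */
typedef enum {
    CMP_EQ, CMP_LT, CMP_LE, CMP_UNORD,
    CMP_NEQ, CMP_NLT, CMP_NLE, CMP_ORD
} cmp_pred;

/* Evaluate one predicate on one pair of lanes. C comparisons involving
   NaN are false (and != is true), which matches the ordered/unordered
   results of these eight predicates. */
static int cmp_lane(float a, float b, cmp_pred p) {
    switch (p) {
    case CMP_EQ:    return a == b;
    case CMP_LT:    return a < b;
    case CMP_LE:    return a <= b;
    case CMP_UNORD: return isnan(a) || isnan(b);
    case CMP_NEQ:   return a != b;
    case CMP_NLT:   return !(a < b);
    case CMP_NLE:   return !(a <= b);
    case CMP_ORD:   return !isnan(a) && !isnan(b);
    }
    assert(0 && "unknown predicate");
    return 0;
}

/* Scalar model of a 16-lane zeromasked compare: bit j of the result is
   set only if bit j of k1 is set and the predicate holds for lane j. */
uint16_t mask_cmp_ps_model(uint16_t k1, const float a[16],
                           const float b[16], cmp_pred p) {
    uint16_t k = 0;
    for (int j = 0; j < 16; j++) {
        if (((k1 >> j) & 1) && cmp_lane(a[j], b[j], p))
            k |= (uint16_t)(1u << j);
    }
    return k;
}
```

Passing k1 = 0xFFFF gives the behavior of the unmasked entries, which is why the masked and unmasked descriptions differ only in the IF k1[j] test.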
Floating Point AVX512F/KNCNI Arithmetic Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst". FOR j := 0 to 7 i := j*64 dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i] ENDFOR dst[MAX:512] := 0
Floating Point AVX512F/KNCNI Arithmetic Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst". [round_note] FOR j := 0 to 7 i := j*64 dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i] ENDFOR dst[MAX:512] := 0
Floating Point AVX512F/KNCNI Arithmetic Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i] ELSE dst[i+63:i] := c[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F/KNCNI Arithmetic Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set). [round_note] FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i] ELSE dst[i+63:i] := c[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F/KNCNI Arithmetic Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i] ELSE dst[i+63:i] := a[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F/KNCNI Arithmetic Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). [round_note] FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i] ELSE dst[i+63:i] := a[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F/KNCNI Arithmetic Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst". FOR j := 0 to 15 i := j*32 dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i] ENDFOR dst[MAX:512] := 0
Floating Point AVX512F/KNCNI Arithmetic Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst". [round_note] FOR j := 0 to 15 i := j*32 dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i] ENDFOR dst[MAX:512] := 0
Floating Point AVX512F/KNCNI Arithmetic Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i] ELSE dst[i+31:i] := c[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F/KNCNI Arithmetic Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set). [round_note] FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i] ELSE dst[i+31:i] := c[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F/KNCNI Arithmetic Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i] ELSE dst[i+31:i] := a[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F/KNCNI Arithmetic Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). [round_note] FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i] ELSE dst[i+31:i] := a[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F/KNCNI Arithmetic Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst". FOR j := 0 to 7 i := j*64 dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i] ENDFOR dst[MAX:512] := 0
Floating Point AVX512F/KNCNI Arithmetic Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst". [round_note] FOR j := 0 to 7 i := j*64 dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i] ENDFOR dst[MAX:512] := 0
Floating Point AVX512F/KNCNI Arithmetic Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i] ELSE dst[i+63:i] := c[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F/KNCNI Arithmetic Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set). [round_note] FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i] ELSE dst[i+63:i] := c[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F/KNCNI Arithmetic Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i] ELSE dst[i+63:i] := a[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F/KNCNI Arithmetic Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). [round_note] FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i] ELSE dst[i+63:i] := a[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F/KNCNI Arithmetic Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst". FOR j := 0 to 15 i := j*32 dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i] ENDFOR dst[MAX:512] := 0
Floating Point AVX512F/KNCNI Arithmetic Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst". [round_note] FOR j := 0 to 15 i := j*32 dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i] ENDFOR dst[MAX:512] := 0
Floating Point AVX512F/KNCNI Arithmetic Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i] ELSE dst[i+31:i] := c[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F/KNCNI Arithmetic Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set). [round_note] FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i] ELSE dst[i+31:i] := c[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F/KNCNI Arithmetic Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i] ELSE dst[i+31:i] := a[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F/KNCNI Arithmetic Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). [round_note] FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i] ELSE dst[i+31:i] := a[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F/KNCNI Arithmetic Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst". FOR j := 0 to 7 i := j*64 dst[i+63:i] := -(a[i+63:i] * b[i+63:i]) + c[i+63:i] ENDFOR dst[MAX:512] := 0
Floating Point AVX512F/KNCNI Arithmetic Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst". [round_note] FOR j := 0 to 7 i := j*64 dst[i+63:i] := -(a[i+63:i] * b[i+63:i]) + c[i+63:i] ENDFOR dst[MAX:512] := 0
Floating Point AVX512F/KNCNI Arithmetic Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := -(a[i+63:i] * b[i+63:i]) + c[i+63:i] ELSE dst[i+63:i] := c[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F/KNCNI Arithmetic Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set). [round_note] FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := -(a[i+63:i] * b[i+63:i]) + c[i+63:i] ELSE dst[i+63:i] := c[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F/KNCNI Arithmetic Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := -(a[i+63:i] * b[i+63:i]) + c[i+63:i] ELSE dst[i+63:i] := a[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F/KNCNI Arithmetic Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). [round_note] FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := -(a[i+63:i] * b[i+63:i]) + c[i+63:i] ELSE dst[i+63:i] := a[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F/KNCNI Arithmetic Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst". FOR j := 0 to 15 i := j*32 dst[i+31:i] := -(a[i+31:i] * b[i+31:i]) + c[i+31:i] ENDFOR dst[MAX:512] := 0
Floating Point AVX512F/KNCNI Arithmetic Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst". [round_note] FOR j := 0 to 15 i := j*32 dst[i+31:i] := -(a[i+31:i] * b[i+31:i]) + c[i+31:i] ENDFOR dst[MAX:512] := 0
Floating Point AVX512F/KNCNI Arithmetic Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := -(a[i+31:i] * b[i+31:i]) + c[i+31:i] ELSE dst[i+31:i] := c[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F/KNCNI Arithmetic Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set). [round_note] FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := -(a[i+31:i] * b[i+31:i]) + c[i+31:i] ELSE dst[i+31:i] := c[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F/KNCNI Arithmetic Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := -(a[i+31:i] * b[i+31:i]) + c[i+31:i] ELSE dst[i+31:i] := a[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F/KNCNI Arithmetic Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). [round_note] FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := -(a[i+31:i] * b[i+31:i]) + c[i+31:i] ELSE dst[i+31:i] := a[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F/KNCNI Arithmetic Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst". FOR j := 0 to 7 i := j*64 dst[i+63:i] := -(a[i+63:i] * b[i+63:i]) - c[i+63:i] ENDFOR dst[MAX:512] := 0
Floating Point AVX512F/KNCNI Arithmetic Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst". [round_note] FOR j := 0 to 7 i := j*64 dst[i+63:i] := -(a[i+63:i] * b[i+63:i]) - c[i+63:i] ENDFOR dst[MAX:512] := 0
Floating Point AVX512F/KNCNI Arithmetic Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := -(a[i+63:i] * b[i+63:i]) - c[i+63:i] ELSE dst[i+63:i] := c[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F/KNCNI Arithmetic Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set). [round_note] FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := -(a[i+63:i] * b[i+63:i]) - c[i+63:i] ELSE dst[i+63:i] := c[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F/KNCNI Arithmetic Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := -(a[i+63:i] * b[i+63:i]) - c[i+63:i] ELSE dst[i+63:i] := a[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F/KNCNI Arithmetic Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). [round_note] FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := -(a[i+63:i] * b[i+63:i]) - c[i+63:i] ELSE dst[i+63:i] := a[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F/KNCNI Arithmetic Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst". FOR j := 0 to 15 i := j*32 dst[i+31:i] := -(a[i+31:i] * b[i+31:i]) - c[i+31:i] ENDFOR dst[MAX:512] := 0
Floating Point AVX512F/KNCNI Arithmetic Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst". [round_note] FOR j := 0 to 15 i := j*32 dst[i+31:i] := -(a[i+31:i] * b[i+31:i]) - c[i+31:i] ENDFOR dst[MAX:512] := 0
Floating Point AVX512F/KNCNI Arithmetic Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := -(a[i+31:i] * b[i+31:i]) - c[i+31:i] ELSE dst[i+31:i] := c[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F/KNCNI Arithmetic Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set). [round_note] FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := -(a[i+31:i] * b[i+31:i]) - c[i+31:i] ELSE dst[i+31:i] := c[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F/KNCNI Arithmetic Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := -(a[i+31:i] * b[i+31:i]) - c[i+31:i] ELSE dst[i+31:i] := a[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F/KNCNI Arithmetic Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). [round_note] FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := -(a[i+31:i] * b[i+31:i]) - c[i+31:i] ELSE dst[i+31:i] := a[i+31:i] FI ENDFOR dst[MAX:512] := 0
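The two writemask forms above differ only in which operand supplies the unselected lanes ("c" in one, "a" in the other). A minimal scalar sketch of that difference follows; the function names are hypothetical, and arrays of 16 floats stand in for 512-bit registers.

```c
#include <stdint.h>

#define LANES 16 /* 512 bits / 32-bit elements */

/* dst[j] = -(a[j]*b[j]) - c[j] where mask bit j is set; otherwise the lane
   is copied from `fallback`. Hypothetical helper, not a real intrinsic. */
static void fnmsub_ps_model(float dst[LANES], const float a[LANES],
                            const float b[LANES], const float c[LANES],
                            uint16_t k, const float fallback[LANES])
{
    for (int j = 0; j < LANES; j++) {
        if ((k >> j) & 1u) {
            /* The hardware operation is fused (one rounding step);
               plain C may round the product first. */
            dst[j] = -(a[j] * b[j]) - c[j];
        } else {
            dst[j] = fallback[j];
        }
    }
}

/* Variant whose unselected lanes come from "a". */
void fnmsub_merge_from_a(float dst[LANES], const float a[LANES],
                         const float b[LANES], const float c[LANES], uint16_t k)
{
    fnmsub_ps_model(dst, a, b, c, k, a);
}

/* Variant whose unselected lanes come from "c". */
void fnmsub_merge_from_c(float dst[LANES], const float a[LANES],
                         const float b[LANES], const float c[LANES], uint16_t k)
{
    fnmsub_ps_model(dst, a, b, c, k, c);
}
```

With a = 2, b = 3, c = 1 in every lane and k = 0x5555, even lanes hold -7 in both variants, while odd lanes hold 2 (from "a") or 1 (from "c").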
Floating Point AVX512F/KNCNI Load Gather single-precision (32-bit) floating-point elements from memory using 32-bit indices. 32-bit elements are loaded from addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst". "scale" should be 1, 2, 4 or 8. FOR j := 0 to 15 i := j*32 m := j*32 addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8 dst[i+31:i] := MEM[addr+31:addr] ENDFOR dst[MAX:512] := 0
Floating Point AVX512F/KNCNI Load Gather single-precision (32-bit) floating-point elements from memory using 32-bit indices. 32-bit elements are loaded from addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8. FOR j := 0 to 15 i := j*32 m := j*32 IF k[j] addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8 dst[i+31:i] := MEM[addr+31:addr] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
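The pseudocode above addresses memory in bits, which is why the scaled index is multiplied by 8. In byte terms each lane reads from base_addr + index * scale. The following scalar sketch restates the masked gather that way; the function name is hypothetical and arrays stand in for vector registers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LANES 16

/* Scalar model of a masked 32-bit gather with 32-bit signed indices.
   Hypothetical helper for illustration only. */
void mask_gather_ps_model(float dst[LANES], const float src[LANES], uint16_t k,
                          const int32_t vindex[LANES], const void *base_addr,
                          int scale)
{
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    const unsigned char *base = (const unsigned char *)base_addr;
    for (int j = 0; j < LANES; j++) {
        if ((k >> j) & 1u) {
            /* Sign-extend the index, scale it, and use it as a byte offset. */
            const unsigned char *addr = base + (int64_t)vindex[j] * scale;
            memcpy(&dst[j], addr, sizeof(float));
        } else {
            dst[j] = src[j]; /* unselected lanes keep the value from src */
        }
    }
}
```

Passing a mask of 0xFFFF gives the behaviour of the unmasked form. Negative indices are valid as long as the resulting address stays inside the caller's buffer.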
Floating Point AVX512F/KNCNI Miscellaneous Convert the exponent of each packed double-precision (64-bit) floating-point element in "a" to a double-precision (64-bit) floating-point number representing the integer exponent, and store the results in "dst". This intrinsic essentially calculates "floor(log2(x))" for each element. FOR j := 0 to 7 i := j*64 dst[i+63:i] := ConvertExpFP64(a[i+63:i]) ENDFOR dst[MAX:512] := 0
Floating Point AVX512F/KNCNI Miscellaneous Convert the exponent of each packed double-precision (64-bit) floating-point element in "a" to a double-precision (64-bit) floating-point number representing the integer exponent, and store the results in "dst". This intrinsic essentially calculates "floor(log2(x))" for each element. [sae_note] FOR j := 0 to 7 i := j*64 dst[i+63:i] := ConvertExpFP64(a[i+63:i]) ENDFOR dst[MAX:512] := 0
Floating Point AVX512F/KNCNI Miscellaneous Convert the exponent of each packed double-precision (64-bit) floating-point element in "a" to a double-precision (64-bit) floating-point number representing the integer exponent, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). This intrinsic essentially calculates "floor(log2(x))" for each element. FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := ConvertExpFP64(a[i+63:i]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F/KNCNI Miscellaneous Convert the exponent of each packed double-precision (64-bit) floating-point element in "a" to a double-precision (64-bit) floating-point number representing the integer exponent, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). This intrinsic essentially calculates "floor(log2(x))" for each element. [sae_note] FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := ConvertExpFP64(a[i+63:i]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F/KNCNI Miscellaneous Convert the exponent of each packed single-precision (32-bit) floating-point element in "a" to a single-precision (32-bit) floating-point number representing the integer exponent, and store the results in "dst". This intrinsic essentially calculates "floor(log2(x))" for each element. FOR j := 0 to 15 i := j*32 dst[i+31:i] := ConvertExpFP32(a[i+31:i]) ENDFOR dst[MAX:512] := 0
Floating Point AVX512F/KNCNI Miscellaneous Convert the exponent of each packed single-precision (32-bit) floating-point element in "a" to a single-precision (32-bit) floating-point number representing the integer exponent, and store the results in "dst". This intrinsic essentially calculates "floor(log2(x))" for each element. [sae_note] FOR j := 0 to 15 i := j*32 dst[i+31:i] := ConvertExpFP32(a[i+31:i]) ENDFOR dst[MAX:512] := 0
Floating Point AVX512F/KNCNI Miscellaneous Convert the exponent of each packed single-precision (32-bit) floating-point element in "a" to a single-precision (32-bit) floating-point number representing the integer exponent, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). This intrinsic essentially calculates "floor(log2(x))" for each element. FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := ConvertExpFP32(a[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F/KNCNI Miscellaneous Convert the exponent of each packed single-precision (32-bit) floating-point element in "a" to a single-precision (32-bit) floating-point number representing the integer exponent, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). This intrinsic essentially calculates "floor(log2(x))" for each element. [sae_note] FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := ConvertExpFP32(a[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
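For a normal, finite, non-zero single-precision input, the "floor(log2(x))" description above amounts to reading the unbiased exponent field of the absolute value. The sketch below models one lane under that assumption; the function name is hypothetical, and zero, denormal, infinite, and NaN inputs are deliberately left out because the listing does not spell out their results.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* One-lane model of the exponent extraction for normal finite inputs.
   Hypothetical helper; special values are not modeled. */
float getexp_ps_model(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);           /* reinterpret without aliasing issues */
    int biased = (int)((bits >> 23) & 0xFFu); /* 8-bit exponent field */
    assert(biased != 0 && biased != 255);     /* reject zero/denormal/inf/NaN */
    return (float)(biased - 127);             /* remove the IEEE 754 bias */
}
```

For example, 8.0 yields 3.0, 0.75 yields -1.0, and -10.0 yields 3.0, since the sign is ignored.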
Floating Point AVX512F/KNCNI Miscellaneous Normalize the mantissas of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst". This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "interv" and the sign depends on "sc" and the source sign. [getmant_note] FOR j := 0 to 7 i := j*64 dst[i+63:i] := GetNormalizedMantissa(a[i+63:i], sc, interv) ENDFOR dst[MAX:512] := 0
Floating Point AVX512F/KNCNI Miscellaneous Normalize the mantissas of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst". This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "interv" and the sign depends on "sc" and the source sign. [getmant_note][sae_note] FOR j := 0 to 7 i := j*64 dst[i+63:i] := GetNormalizedMantissa(a[i+63:i], sc, interv) ENDFOR dst[MAX:512] := 0
Floating Point AVX512F/KNCNI Miscellaneous Normalize the mantissas of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "interv" and the sign depends on "sc" and the source sign. [getmant_note] FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := GetNormalizedMantissa(a[i+63:i], sc, interv) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F/KNCNI Miscellaneous Normalize the mantissas of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "interv" and the sign depends on "sc" and the source sign. [getmant_note][sae_note] FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := GetNormalizedMantissa(a[i+63:i], sc, interv) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F/KNCNI Miscellaneous Normalize the mantissas of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst". This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "interv" and the sign depends on "sc" and the source sign. [getmant_note] FOR j := 0 to 15 i := j*32 dst[i+31:i] := GetNormalizedMantissa(a[i+31:i], sc, interv) ENDFOR dst[MAX:512] := 0
Floating Point AVX512F/KNCNI Miscellaneous Normalize the mantissas of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst". This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "interv" and the sign depends on "sc" and the source sign. [getmant_note][sae_note] FOR j := 0 to 15 i := j*32 dst[i+31:i] := GetNormalizedMantissa(a[i+31:i], sc, interv) ENDFOR dst[MAX:512] := 0
Floating Point AVX512F/KNCNI Miscellaneous Normalize the mantissas of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "interv" and the sign depends on "sc" and the source sign. [getmant_note] FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := GetNormalizedMantissa(a[i+31:i], sc, interv) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F/KNCNI Miscellaneous Normalize the mantissas of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). This intrinsic essentially calculates "±(2^k)*|x.significand|", where "k" depends on the interval range defined by "interv" and the sign depends on "sc" and the source sign. [getmant_note][sae_note] FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := GetNormalizedMantissa(a[i+31:i], sc, interv) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
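As a rough illustration of the mantissa normalization above, the sketch below models one lane for a single configuration only: it assumes the interval selected by "interv" is [1, 2) and that "sc" keeps the source sign. Other interval and sign settings, and special inputs (zero, denormal, infinity, NaN), are not modeled. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* One-lane model: normalize the significand of x into [1, 2), keeping
   the sign of x. Hypothetical helper for normal finite inputs only. */
float getmant_1_2_model(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    uint32_t biased = (bits >> 23) & 0xFFu;
    assert(biased != 0 && biased != 255);      /* normal finite inputs only */
    /* Keep sign and fraction bits, force the exponent field to the bias
       (127), which corresponds to 2^0. */
    bits = (bits & 0x807FFFFFu) | (127u << 23);
    float r;
    memcpy(&r, &bits, sizeof r);
    return r;
}
```

Under these assumptions 10.0 maps to 1.25 (10 = 1.25 * 2^3) and -0.75 maps to -1.5.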
Floating Point AVX512F/KNCNI Load Load 512 bits (composed of 8 packed double-precision (64-bit) floating-point elements) from memory into "dst". "mem_addr" must be aligned on a 64-byte boundary or a general-protection exception may be generated. dst[511:0] := MEM[mem_addr+511:mem_addr] dst[MAX:512] := 0
Floating Point AVX512F/KNCNI Load Load packed double-precision (64-bit) floating-point elements from memory into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "mem_addr" must be aligned on a 64-byte boundary or a general-protection exception may be generated. FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := MEM[mem_addr+i+63:mem_addr+i] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F/KNCNI Move Move packed double-precision (64-bit) floating-point elements from "a" to "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := a[i+63:i] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F/KNCNI Store Store packed double-precision (64-bit) floating-point elements from "a" into memory using writemask "k". "mem_addr" must be aligned on a 64-byte boundary or a general-protection exception may be generated. FOR j := 0 to 7 i := j*64 IF k[j] MEM[mem_addr+i+63:mem_addr+i] := a[i+63:i] FI ENDFOR
Floating Point AVX512F/KNCNI Store Store 512 bits (composed of 8 packed double-precision (64-bit) floating-point elements) from "a" into memory. "mem_addr" must be aligned on a 64-byte boundary or a general-protection exception may be generated. MEM[mem_addr+511:mem_addr] := a[511:0]
Floating Point AVX512F/KNCNI Load Load 512 bits (composed of 16 packed single-precision (32-bit) floating-point elements) from memory into "dst". "mem_addr" must be aligned on a 64-byte boundary or a general-protection exception may be generated. dst[511:0] := MEM[mem_addr+511:mem_addr] dst[MAX:512] := 0
Floating Point AVX512F/KNCNI Load Load packed single-precision (32-bit) floating-point elements from memory into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "mem_addr" must be aligned on a 64-byte boundary or a general-protection exception may be generated. FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := MEM[mem_addr+i+31:mem_addr+i] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F/KNCNI Move Move packed single-precision (32-bit) floating-point elements from "a" to "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := a[i+31:i] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F/KNCNI Store Store packed single-precision (32-bit) floating-point elements from "a" into memory using writemask "k". "mem_addr" must be aligned on a 64-byte boundary or a general-protection exception may be generated. FOR j := 0 to 15 i := j*32 IF k[j] MEM[mem_addr+i+31:mem_addr+i] := a[i+31:i] FI ENDFOR
Floating Point AVX512F/KNCNI Store Store 512 bits (composed of 16 packed single-precision (32-bit) floating-point elements) from "a" into memory. "mem_addr" must be aligned on a 64-byte boundary or a general-protection exception may be generated. MEM[mem_addr+511:mem_addr] := a[511:0]
Integer AVX512F/KNCNI Load Load 512 bits (composed of 16 packed 32-bit integers) from memory into "dst". "mem_addr" must be aligned on a 64-byte boundary or a general-protection exception may be generated. dst[511:0] := MEM[mem_addr+511:mem_addr] dst[MAX:512] := 0
Integer AVX512F/KNCNI Load Load 512 bits of integer data from memory into "dst". "mem_addr" must be aligned on a 64-byte boundary or a general-protection exception may be generated. dst[511:0] := MEM[mem_addr+511:mem_addr] dst[MAX:512] := 0
Integer AVX512F/KNCNI Load Load packed 32-bit integers from memory into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "mem_addr" must be aligned on a 64-byte boundary or a general-protection exception may be generated. FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := MEM[mem_addr+i+31:mem_addr+i] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512F/KNCNI Move Move packed 32-bit integers from "a" to "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := a[i+31:i] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512F/KNCNI Store Store packed 32-bit integers from "a" into memory using writemask "k". "mem_addr" must be aligned on a 64-byte boundary or a general-protection exception may be generated. FOR j := 0 to 15 i := j*32 IF k[j] MEM[mem_addr+i+31:mem_addr+i] := a[i+31:i] FI ENDFOR
Integer AVX512F/KNCNI Store Store 512 bits (composed of 16 packed 32-bit integers) from "a" into memory. "mem_addr" must be aligned on a 64-byte boundary or a general-protection exception may be generated. MEM[mem_addr+511:mem_addr] := a[511:0]
Integer AVX512F/KNCNI Store Store 512 bits of integer data from "a" into memory. "mem_addr" must be aligned on a 64-byte boundary or a general-protection exception may be generated. MEM[mem_addr+511:mem_addr] := a[511:0]
Integer AVX512F/KNCNI Load Load 512 bits (composed of 8 packed 64-bit integers) from memory into "dst". "mem_addr" must be aligned on a 64-byte boundary or a general-protection exception may be generated. dst[511:0] := MEM[mem_addr+511:mem_addr] dst[MAX:512] := 0
Integer AVX512F/KNCNI Load Load packed 64-bit integers from memory into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "mem_addr" must be aligned on a 64-byte boundary or a general-protection exception may be generated. FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := MEM[mem_addr+i+63:mem_addr+i] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512F/KNCNI Move Move packed 64-bit integers from "a" to "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := a[i+63:i] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512F/KNCNI Store Store packed 64-bit integers from "a" into memory using writemask "k". "mem_addr" must be aligned on a 64-byte boundary or a general-protection exception may be generated. FOR j := 0 to 7 i := j*64 IF k[j] MEM[mem_addr+i+63:mem_addr+i] := a[i+63:i] FI ENDFOR
Integer AVX512F/KNCNI Store Store 512 bits (composed of 8 packed 64-bit integers) from "a" into memory. "mem_addr" must be aligned on a 64-byte boundary or a general-protection exception may be generated. MEM[mem_addr+511:mem_addr] := a[511:0]
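The masked stores above write only the lanes whose mask bit is set and leave the rest of memory untouched, while still requiring a 64-byte-aligned address. A scalar sketch for the 64-bit integer case follows. The function name is hypothetical, and the alignment fault is modeled as an error return rather than an exception.

```c
#include <stdint.h>

/* Scalar model of a masked, aligned store of eight 64-bit integers.
   Returns 0 on success, -1 if mem_addr is not 64-byte aligned (standing
   in for the general-protection exception). Hypothetical helper. */
int mask_store_epi64_model(int64_t *mem_addr, uint8_t k, const int64_t a[8])
{
    if (((uintptr_t)mem_addr & 63u) != 0) {
        return -1;
    }
    for (int j = 0; j < 8; j++) {
        if ((k >> j) & 1u) {
            mem_addr[j] = a[j]; /* selected lane is written */
        }
        /* unselected lanes: memory is left as it was */
    }
    return 0;
}
```

With k = 0x81 only elements 0 and 7 of the destination buffer change.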
Floating Point AVX512F/KNCNI Arithmetic Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := a[i+63:i] * b[i+63:i] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F/KNCNI Arithmetic Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [round_note] FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := a[i+63:i] * b[i+63:i] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F/KNCNI Arithmetic Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst". FOR j := 0 to 7 i := j*64 dst[i+63:i] := a[i+63:i] * b[i+63:i] ENDFOR dst[MAX:512] := 0
Floating Point AVX512F/KNCNI Arithmetic Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst". [round_note] FOR j := 0 to 7 i := j*64 dst[i+63:i] := a[i+63:i] * b[i+63:i] ENDFOR dst[MAX:512] := 0
Floating Point AVX512F/KNCNI Arithmetic Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := a[i+31:i] * b[i+31:i] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F/KNCNI Arithmetic Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [round_note] FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := a[i+31:i] * b[i+31:i] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F/KNCNI Arithmetic Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst". FOR j := 0 to 15 i := j*32 dst[i+31:i] := a[i+31:i] * b[i+31:i] ENDFOR dst[MAX:512] := 0
Floating Point AVX512F/KNCNI Arithmetic Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst". [round_note] FOR j := 0 to 15 i := j*32 dst[i+31:i] := a[i+31:i] * b[i+31:i] ENDFOR dst[MAX:512] := 0
Integer AVX512F/KNCNI Arithmetic Add packed 32-bit integers in "a" and "b", and store the results in "dst". FOR j := 0 to 15 i := j*32 dst[i+31:i] := a[i+31:i] + b[i+31:i] ENDFOR dst[MAX:512] := 0
Integer AVX512F/KNCNI Arithmetic Add packed 32-bit integers in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := a[i+31:i] + b[i+31:i] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512F/KNCNI Logical Compute the bitwise AND of packed 32-bit integers in "a" and "b", and store the results in "dst". FOR j := 0 to 15 i := j*32 dst[i+31:i] := a[i+31:i] AND b[i+31:i] ENDFOR dst[MAX:512] := 0
Integer AVX512F/KNCNI Logical Compute the bitwise AND of 512 bits (representing integer data) in "a" and "b", and store the result in "dst". dst[511:0] := (a[511:0] AND b[511:0]) dst[MAX:512] := 0
Integer AVX512F/KNCNI Logical Compute the bitwise NOT of packed 32-bit integers in "a" and then AND with "b", and store the results in "dst". FOR j := 0 to 15 i := j*32 dst[i+31:i] := (NOT a[i+31:i]) AND b[i+31:i] ENDFOR dst[MAX:512] := 0
Integer AVX512F/KNCNI Logical Compute the bitwise NOT of 512 bits (representing integer data) in "a" and then AND with "b", and store the result in "dst". dst[511:0] := ((NOT a[511:0]) AND b[511:0]) dst[MAX:512] := 0
Integer AVX512F/KNCNI Logical Compute the bitwise NOT of packed 32-bit integers in "a" and then AND with "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := ((NOT a[i+31:i]) AND b[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512F/KNCNI Logical Compute the bitwise NOT of 512 bits (composed of packed 64-bit integers) in "a" and then AND with "b", and store the results in "dst". dst[511:0] := ((NOT a[511:0]) AND b[511:0]) dst[MAX:512] := 0
Integer AVX512F/KNCNI Logical Compute the bitwise NOT of packed 64-bit integers in "a" and then AND with "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := ((NOT a[i+63:i]) AND b[i+63:i]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
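In the AND-NOT operations above it is the first operand, "a", that is inverted, which is easy to get backwards. A small scalar sketch of the masked 64-bit form follows; the function name is hypothetical and arrays stand in for registers.

```c
#include <stdint.h>

/* Scalar model of masked AND-NOT on eight 64-bit lanes:
   dst[j] = (~a[j]) & b[j] where mask bit j is set, else src[j].
   Hypothetical helper for illustration. */
void mask_andnot_epi64_model(uint64_t dst[8], const uint64_t src[8], uint8_t k,
                             const uint64_t a[8], const uint64_t b[8])
{
    for (int j = 0; j < 8; j++) {
        if ((k >> j) & 1u) {
            dst[j] = ~a[j] & b[j]; /* note: a is negated, not b */
        } else {
            dst[j] = src[j];
        }
    }
}
```

With a = 0x0F and b = 0xFF in a selected lane, the result is 0xF0: the bits of "b" that are clear in "a".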
Integer AVX512F/KNCNI Logical Compute the bitwise AND of 512 bits (composed of packed 64-bit integers) in "a" and "b", and store the results in "dst". dst[511:0] := (a[511:0] AND b[511:0]) dst[MAX:512] := 0
Integer AVX512F/KNCNI Logical Compute the bitwise AND of packed 64-bit integers in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := a[i+63:i] AND b[i+63:i] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512F/KNCNI Swizzle Blend packed 32-bit integers from "a" and "b" using control mask "k", and store the results in "dst". FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := b[i+31:i] ELSE dst[i+31:i] := a[i+31:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512F/KNCNI Swizzle Blend packed 64-bit integers from "a" and "b" using control mask "k", and store the results in "dst". FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := b[i+63:i] ELSE dst[i+63:i] := a[i+63:i] FI ENDFOR dst[MAX:512] := 0
Integer Mask AVX512F/KNCNI Compare Compare packed signed 32-bit integers in "a" and "b" based on the comparison operator specified by "imm8", and store the results in mask vector "k". CASE (imm8[2:0]) OF 0: OP := _MM_CMPINT_EQ 1: OP := _MM_CMPINT_LT 2: OP := _MM_CMPINT_LE 3: OP := _MM_CMPINT_FALSE 4: OP := _MM_CMPINT_NE 5: OP := _MM_CMPINT_NLT 6: OP := _MM_CMPINT_NLE 7: OP := _MM_CMPINT_TRUE ESAC FOR j := 0 to 15 i := j*32 k[j] := ( a[i+31:i] OP b[i+31:i] ) ? 1 : 0 ENDFOR k[MAX:16] := 0
Integer Mask AVX512F/KNCNI Compare Compare packed 32-bit integers in "a" and "b" for equality, and store the results in mask vector "k". FOR j := 0 to 15 i := j*32 k[j] := ( a[i+31:i] == b[i+31:i] ) ? 1 : 0 ENDFOR k[MAX:16] := 0
Integer Mask AVX512F/KNCNI Compare Compare packed signed 32-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k". FOR j := 0 to 15 i := j*32 k[j] := ( a[i+31:i] >= b[i+31:i] ) ? 1 : 0 ENDFOR k[MAX:16] := 0
Integer Mask AVX512F/KNCNI Compare Compare packed signed 32-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k". FOR j := 0 to 15 i := j*32 k[j] := ( a[i+31:i] > b[i+31:i] ) ? 1 : 0 ENDFOR k[MAX:16] := 0
Integer Mask AVX512F/KNCNI Compare Compare packed signed 32-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k". FOR j := 0 to 15 i := j*32 k[j] := ( a[i+31:i] <= b[i+31:i] ) ? 1 : 0 ENDFOR k[MAX:16] := 0
Integer Mask AVX512F/KNCNI Compare Compare packed 32-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k". FOR j := 0 to 15 i := j*32 k[j] := ( a[i+31:i] != b[i+31:i] ) ? 1 : 0 ENDFOR k[MAX:16] := 0
Integer Mask AVX512F/KNCNI Compare Compare packed signed 32-bit integers in "a" and "b" based on the comparison operator specified by "imm8", and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). CASE (imm8[2:0]) OF 0: OP := _MM_CMPINT_EQ 1: OP := _MM_CMPINT_LT 2: OP := _MM_CMPINT_LE 3: OP := _MM_CMPINT_FALSE 4: OP := _MM_CMPINT_NE 5: OP := _MM_CMPINT_NLT 6: OP := _MM_CMPINT_NLE 7: OP := _MM_CMPINT_TRUE ESAC FOR j := 0 to 15 i := j*32 IF k1[j] k[j] := ( a[i+31:i] OP b[i+31:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:16] := 0
Integer Mask AVX512F/KNCNI Compare Compare packed 32-bit integers in "a" and "b" for equality, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k1[j] k[j] := ( a[i+31:i] == b[i+31:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:16] := 0
Integer Mask AVX512F/KNCNI Compare Compare packed signed 32-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k1[j] k[j] := ( a[i+31:i] >= b[i+31:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:16] := 0
Integer Mask AVX512F/KNCNI Compare Compare packed signed 32-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k1[j] k[j] := ( a[i+31:i] > b[i+31:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:16] := 0
Integer Mask AVX512F/KNCNI Compare Compare packed signed 32-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k1[j] k[j] := ( a[i+31:i] <= b[i+31:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:16] := 0
Integer Mask AVX512F/KNCNI Compare Compare packed 32-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k1[j] k[j] := ( a[i+31:i] != b[i+31:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:16] := 0
Integer Mask AVX512F/KNCNI Compare Compare packed unsigned 32-bit integers in "a" and "b" based on the comparison operator specified by "imm8", and store the results in mask vector "k". CASE (imm8[2:0]) OF 0: OP := _MM_CMPINT_EQ 1: OP := _MM_CMPINT_LT 2: OP := _MM_CMPINT_LE 3: OP := _MM_CMPINT_FALSE 4: OP := _MM_CMPINT_NE 5: OP := _MM_CMPINT_NLT 6: OP := _MM_CMPINT_NLE 7: OP := _MM_CMPINT_TRUE ESAC FOR j := 0 to 15 i := j*32 k[j] := ( a[i+31:i] OP b[i+31:i] ) ? 1 : 0 ENDFOR k[MAX:16] := 0
Integer Mask AVX512F/KNCNI Compare Compare packed unsigned 32-bit integers in "a" and "b" for equality, and store the results in mask vector "k". FOR j := 0 to 15 i := j*32 k[j] := ( a[i+31:i] == b[i+31:i] ) ? 1 : 0 ENDFOR k[MAX:16] := 0
Integer Mask AVX512F/KNCNI Compare Compare packed unsigned 32-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k". FOR j := 0 to 15 i := j*32 k[j] := ( a[i+31:i] >= b[i+31:i] ) ? 1 : 0 ENDFOR k[MAX:16] := 0
Integer Mask AVX512F/KNCNI Compare Compare packed unsigned 32-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k". FOR j := 0 to 15 i := j*32 k[j] := ( a[i+31:i] > b[i+31:i] ) ? 1 : 0 ENDFOR k[MAX:16] := 0
Integer Mask AVX512F/KNCNI Compare Compare packed unsigned 32-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k". FOR j := 0 to 15 i := j*32 k[j] := ( a[i+31:i] <= b[i+31:i] ) ? 1 : 0 ENDFOR k[MAX:16] := 0
Integer Mask AVX512F/KNCNI Compare Compare packed unsigned 32-bit integers in "a" and "b" for less-than, and store the results in mask vector "k". FOR j := 0 to 15 i := j*32 k[j] := ( a[i+31:i] < b[i+31:i] ) ? 1 : 0 ENDFOR k[MAX:16] := 0
Integer Mask AVX512F/KNCNI Compare Compare packed unsigned 32-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k". FOR j := 0 to 15 i := j*32 k[j] := ( a[i+31:i] != b[i+31:i] ) ? 1 : 0 ENDFOR k[MAX:16] := 0
Integer Mask AVX512F/KNCNI Compare Compare packed unsigned 32-bit integers in "a" and "b" based on the comparison operator specified by "imm8", and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). CASE (imm8[2:0]) OF 0: OP := _MM_CMPINT_EQ 1: OP := _MM_CMPINT_LT 2: OP := _MM_CMPINT_LE 3: OP := _MM_CMPINT_FALSE 4: OP := _MM_CMPINT_NE 5: OP := _MM_CMPINT_NLT 6: OP := _MM_CMPINT_NLE 7: OP := _MM_CMPINT_TRUE ESAC FOR j := 0 to 15 i := j*32 IF k1[j] k[j] := ( a[i+31:i] OP b[i+31:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:16] := 0
Integer Mask AVX512F/KNCNI Compare Compare packed unsigned 32-bit integers in "a" and "b" for equality, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k1[j] k[j] := ( a[i+31:i] == b[i+31:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:16] := 0
Integer Mask AVX512F/KNCNI Compare Compare packed unsigned 32-bit integers in "a" and "b" for greater-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k1[j] k[j] := ( a[i+31:i] >= b[i+31:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:16] := 0
Integer Mask AVX512F/KNCNI Compare Compare packed unsigned 32-bit integers in "a" and "b" for greater-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k1[j] k[j] := ( a[i+31:i] > b[i+31:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:16] := 0
Integer Mask AVX512F/KNCNI Compare Compare packed unsigned 32-bit integers in "a" and "b" for less-than-or-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k1[j] k[j] := ( a[i+31:i] <= b[i+31:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:16] := 0
Integer Mask AVX512F/KNCNI Compare Compare packed unsigned 32-bit integers in "a" and "b" for less-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k1[j] k[j] := ( a[i+31:i] < b[i+31:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:16] := 0
Integer Mask AVX512F/KNCNI Compare Compare packed unsigned 32-bit integers in "a" and "b" for not-equal, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k1[j] k[j] := ( a[i+31:i] != b[i+31:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:16] := 0
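The compare entries above share one structure: the low three bits of "imm8" pick a predicate, each lane yields one mask bit, and the zeromask "k1" clears lanes that are not selected. The signed and unsigned forms differ only in how the same 32 bits are interpreted. A scalar sketch follows, with hypothetical function names and with the predicate numbering taken from the CASE table in the listing.

```c
#include <stdint.h>

/* Evaluate the predicate selected by imm8[2:0] on two values that have
   already been widened to 64 bits (so one routine serves both the
   signed and the unsigned interpretation). */
static int cmpint_predicate(int64_t a, int64_t b, int imm8)
{
    switch (imm8 & 7) {
    case 0: return a == b;  /* EQ    */
    case 1: return a <  b;  /* LT    */
    case 2: return a <= b;  /* LE    */
    case 3: return 0;       /* FALSE */
    case 4: return a != b;  /* NE    */
    case 5: return a >= b;  /* NLT   */
    case 6: return a >  b;  /* NLE   */
    default: return 1;      /* TRUE  */
    }
}

/* Signed 32-bit lanes, zeromask k1. Hypothetical model function. */
uint16_t mask_cmp_epi32_model(uint16_t k1, const int32_t a[16],
                              const int32_t b[16], int imm8)
{
    uint16_t k = 0;
    for (int j = 0; j < 16; j++) {
        if (((k1 >> j) & 1u) && cmpint_predicate(a[j], b[j], imm8)) {
            k |= (uint16_t)(1u << j);
        }
    }
    return k;
}

/* Unsigned 32-bit lanes, zeromask k1. Hypothetical model function. */
uint16_t mask_cmp_epu32_model(uint16_t k1, const uint32_t a[16],
                              const uint32_t b[16], int imm8)
{
    uint16_t k = 0;
    for (int j = 0; j < 16; j++) {
        if (((k1 >> j) & 1u) && cmpint_predicate(a[j], b[j], imm8)) {
            k |= (uint16_t)(1u << j);
        }
    }
    return k;
}
```

The bit pattern 0xFFFFFFFF compared against 1 with the LT predicate sets every selected bit in the signed model (where it is -1) and none in the unsigned model (where it is 4294967295).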
Integer AVX512F/KNCNI Swizzle Shuffle 32-bit integers in "a" across lanes using the corresponding index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). Note that this intrinsic shuffles across 128-bit lanes, unlike past intrinsics that use the "permutevar" name. This intrinsic is identical to "_mm512_mask_permutexvar_epi32", and it is recommended that you use that intrinsic name. FOR j := 0 to 15 i := j*32 id := idx[i+3:i]*32 IF k[j] dst[i+31:i] := a[id+31:id] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512F/KNCNI Swizzle Shuffle 32-bit integers in "a" across lanes using the corresponding index in "idx", and store the results in "dst". Note that this intrinsic shuffles across 128-bit lanes, unlike past intrinsics that use the "permutevar" name. This intrinsic is identical to "_mm512_permutexvar_epi32", and it is recommended that you use that intrinsic name. FOR j := 0 to 15 i := j*32 id := idx[i+3:i]*32 dst[i+31:i] := a[id+31:id] ENDFOR dst[MAX:512] := 0
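A scalar sketch of the cross-lane permute described in the two entries above, written in portable C under the assumption that each vector is represented as an array of 16 `uint32_t` elements. The name `permutexvar_epi32_model` is hypothetical; passing `k = 0xFFFF` gives the unmasked form.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of the writemasked cross-lane 32-bit permute.
   Only the low 4 bits of each index are used (idx[i+3:i] in the pseudocode),
   so any index selects one of the 16 source elements. */
void permutexvar_epi32_model(uint32_t dst[16], const uint32_t src[16],
                             uint16_t k, const uint32_t idx[16],
                             const uint32_t a[16])
{
    assert(dst != a);   /* this sketch reads "a" while writing "dst" */
    for (int j = 0; j < 16; j++) {
        uint32_t id = idx[j] & 0xFu;
        dst[j] = ((k >> j) & 1u) ? a[id] : src[j];
    }
}
```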
Integer AVX512F/KNCNI Load Gather 32-bit integers from memory using 32-bit indices. 32-bit elements are loaded from addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst". "scale" should be 1, 2, 4 or 8. FOR j := 0 to 15 i := j*32 m := j*32 addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8 dst[i+31:i] := MEM[addr+31:addr] ENDFOR dst[MAX:512] := 0
Integer AVX512F/KNCNI Load Gather 32-bit integers from memory using 32-bit indices. 32-bit elements are loaded from addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8. FOR j := 0 to 15 i := j*32 m := j*32 IF k[j] addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8 dst[i+31:i] := MEM[addr+31:addr] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
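The address computation in the gather entries above can be sketched in portable C as follows. Note one difference in units: the pseudocode indexes memory in bits (hence the trailing `* 8`), while C pointers address bytes, so the sketch uses `index * scale` as a byte offset. The function name `mask_i32gather_epi32_model` is hypothetical, and the caller is assumed to guarantee that every active index points at readable memory.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Scalar model of a writemasked gather of 32-bit elements via 32-bit indices. */
void mask_i32gather_epi32_model(uint32_t dst[16], const uint32_t src[16],
                                uint16_t k, const int32_t vindex[16],
                                const void *base_addr, int scale)
{
    const unsigned char *base = (const unsigned char *)base_addr;
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    for (int j = 0; j < 16; j++) {
        if ((k >> j) & 1u) {
            /* Sign-extend the index to 64 bits before scaling. */
            int64_t off = (int64_t)vindex[j] * scale;
            memcpy(&dst[j], base + off, sizeof dst[j]);  /* unaligned-safe load */
        } else {
            dst[j] = src[j];       /* inactive lanes keep the "src" element */
        }
    }
}
```

Indices are signed, so a negative index reads below `base_addr`. Passing `k = 0xFFFF` models the unmasked entry.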
Integer AVX512F/KNCNI Special Math Functions Compare packed signed 32-bit integers in "a" and "b", and store packed maximum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := MAX(a[i+31:i], b[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512F/KNCNI Special Math Functions Compare packed signed 32-bit integers in "a" and "b", and store packed maximum values in "dst". FOR j := 0 to 15 i := j*32 dst[i+31:i] := MAX(a[i+31:i], b[i+31:i]) ENDFOR dst[MAX:512] := 0
Integer AVX512F/KNCNI Special Math Functions Compare packed unsigned 32-bit integers in "a" and "b", and store packed maximum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := MAX(a[i+31:i], b[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512F/KNCNI Special Math Functions Compare packed unsigned 32-bit integers in "a" and "b", and store packed maximum values in "dst". FOR j := 0 to 15 i := j*32 dst[i+31:i] := MAX(a[i+31:i], b[i+31:i]) ENDFOR dst[MAX:512] := 0
Integer AVX512F/KNCNI Special Math Functions Compare packed signed 32-bit integers in "a" and "b", and store packed minimum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := MIN(a[i+31:i], b[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512F/KNCNI Special Math Functions Compare packed signed 32-bit integers in "a" and "b", and store packed minimum values in "dst". FOR j := 0 to 15 i := j*32 dst[i+31:i] := MIN(a[i+31:i], b[i+31:i]) ENDFOR dst[MAX:512] := 0
Integer AVX512F/KNCNI Special Math Functions Compare packed unsigned 32-bit integers in "a" and "b", and store packed minimum values in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := MIN(a[i+31:i], b[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512F/KNCNI Special Math Functions Compare packed unsigned 32-bit integers in "a" and "b", and store packed minimum values in "dst". FOR j := 0 to 15 i := j*32 dst[i+31:i] := MIN(a[i+31:i], b[i+31:i]) ENDFOR dst[MAX:512] := 0
Integer AVX512F/KNCNI Arithmetic Multiply the packed 32-bit integers in "a" and "b", producing intermediate 64-bit integers, and store the low 32 bits of the intermediate integers in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] tmp[63:0] := a[i+31:i] * b[i+31:i] dst[i+31:i] := tmp[31:0] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512F/KNCNI Arithmetic Multiply the packed 32-bit integers in "a" and "b", producing intermediate 64-bit integers, and store the low 32 bits of the intermediate integers in "dst". FOR j := 0 to 15 i := j*32 tmp[63:0] := a[i+31:i] * b[i+31:i] dst[i+31:i] := tmp[31:0] ENDFOR dst[MAX:512] := 0
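A per-element sketch of the low-half multiply described above, in portable C. The name `mullo32_model` is hypothetical. The low 32 bits of the product are the same whether the inputs are read as signed or unsigned, which is why the entries do not distinguish the two.

```c
#include <stdint.h>

/* Scalar model of one lane: form the 64-bit product, keep tmp[31:0]. */
uint32_t mullo32_model(uint32_t a, uint32_t b)
{
    uint64_t tmp = (uint64_t)a * (uint64_t)b;   /* 64-bit intermediate */
    return (uint32_t)tmp;                        /* truncate to low 32 bits */
}
```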
Integer AVX512F/KNCNI Logical Compute the bitwise OR of packed 32-bit integers in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := a[i+31:i] OR b[i+31:i] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512F/KNCNI Logical Compute the bitwise OR of packed 32-bit integers in "a" and "b", and store the results in "dst". FOR j := 0 to 15 i := j*32 dst[i+31:i] := a[i+31:i] OR b[i+31:i] ENDFOR dst[MAX:512] := 0
Integer AVX512F/KNCNI Logical Compute the bitwise OR of 512 bits (representing integer data) in "a" and "b", and store the result in "dst". dst[511:0] := (a[511:0] OR b[511:0]) dst[MAX:512] := 0
Integer AVX512F/KNCNI Logical Compute the bitwise OR of packed 64-bit integers in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := a[i+63:i] OR b[i+63:i] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512F/KNCNI Logical Compute the bitwise OR of packed 64-bit integers in "a" and "b", and store the results in "dst". FOR j := 0 to 7 i := j*64 dst[i+63:i] := a[i+63:i] OR b[i+63:i] ENDFOR dst[MAX:512] := 0

Integer AVX512F/KNCNI Store Scatter 32-bit integers from "a" into memory using 32-bit indices. 32-bit elements are stored at addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). "scale" should be 1, 2, 4 or 8. FOR j := 0 to 15 i := j*32 m := j*32 addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8 MEM[addr+31:addr] := a[i+31:i] ENDFOR
Integer AVX512F/KNCNI Store Scatter 32-bit integers from "a" into memory using 32-bit indices. 32-bit elements are stored at addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale") subject to mask "k" (elements are not stored when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8. FOR j := 0 to 15 i := j*32 m := j*32 IF k[j] addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8 MEM[addr+31:addr] := a[i+31:i] FI ENDFOR
Integer AVX512F/KNCNI Swizzle Shuffle 32-bit integers in "a" within 128-bit lanes using the control in "imm8", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). DEFINE SELECT4(src, control) { CASE(control[1:0]) OF 0: tmp[31:0] := src[31:0] 1: tmp[31:0] := src[63:32] 2: tmp[31:0] := src[95:64] 3: tmp[31:0] := src[127:96] ESAC RETURN tmp[31:0] } tmp_dst[31:0] := SELECT4(a[127:0], imm8[1:0]) tmp_dst[63:32] := SELECT4(a[127:0], imm8[3:2]) tmp_dst[95:64] := SELECT4(a[127:0], imm8[5:4]) tmp_dst[127:96] := SELECT4(a[127:0], imm8[7:6]) tmp_dst[159:128] := SELECT4(a[255:128], imm8[1:0]) tmp_dst[191:160] := SELECT4(a[255:128], imm8[3:2]) tmp_dst[223:192] := SELECT4(a[255:128], imm8[5:4]) tmp_dst[255:224] := SELECT4(a[255:128], imm8[7:6]) tmp_dst[287:256] := SELECT4(a[383:256], imm8[1:0]) tmp_dst[319:288] := SELECT4(a[383:256], imm8[3:2]) tmp_dst[351:320] := SELECT4(a[383:256], imm8[5:4]) tmp_dst[383:352] := SELECT4(a[383:256], imm8[7:6]) tmp_dst[415:384] := SELECT4(a[511:384], imm8[1:0]) tmp_dst[447:416] := SELECT4(a[511:384], imm8[3:2]) tmp_dst[479:448] := SELECT4(a[511:384], imm8[5:4]) tmp_dst[511:480] := SELECT4(a[511:384], imm8[7:6]) FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := tmp_dst[i+31:i] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512F/KNCNI Swizzle Shuffle 32-bit integers in "a" within 128-bit lanes using the control in "imm8", and store the results in "dst". DEFINE SELECT4(src, control) { CASE(control[1:0]) OF 0: tmp[31:0] := src[31:0] 1: tmp[31:0] := src[63:32] 2: tmp[31:0] := src[95:64] 3: tmp[31:0] := src[127:96] ESAC RETURN tmp[31:0] } dst[31:0] := SELECT4(a[127:0], imm8[1:0]) dst[63:32] := SELECT4(a[127:0], imm8[3:2]) dst[95:64] := SELECT4(a[127:0], imm8[5:4]) dst[127:96] := SELECT4(a[127:0], imm8[7:6]) dst[159:128] := SELECT4(a[255:128], imm8[1:0]) dst[191:160] := SELECT4(a[255:128], imm8[3:2]) dst[223:192] := SELECT4(a[255:128], imm8[5:4]) dst[255:224] := SELECT4(a[255:128], imm8[7:6]) dst[287:256] := SELECT4(a[383:256], imm8[1:0]) dst[319:288] := SELECT4(a[383:256], imm8[3:2]) dst[351:320] := SELECT4(a[383:256], imm8[5:4]) dst[383:352] := SELECT4(a[383:256], imm8[7:6]) dst[415:384] := SELECT4(a[511:384], imm8[1:0]) dst[447:416] := SELECT4(a[511:384], imm8[3:2]) dst[479:448] := SELECT4(a[511:384], imm8[5:4]) dst[511:480] := SELECT4(a[511:384], imm8[7:6]) dst[MAX:512] := 0
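The sixteen SELECT4 assignments in the shuffle entries above collapse into a short loop. The following portable C sketch assumes vectors are arrays of 16 `uint32_t` elements; `shuffle_epi32_model` is a hypothetical name. Each 128-bit lane (four elements) is shuffled independently, with the same `imm8` control applied to all four lanes.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of the in-lane 32-bit shuffle controlled by imm8. */
void shuffle_epi32_model(uint32_t dst[16], const uint32_t a[16], unsigned imm8)
{
    assert(dst != a);   /* this sketch reads "a" while writing "dst" */
    for (int j = 0; j < 16; j++) {
        int lane_base = j & ~3;                        /* first element of the 128-bit lane */
        unsigned sel = (imm8 >> (2 * (j & 3))) & 3u;   /* 2-bit SELECT4 control for slot j */
        dst[j] = a[lane_base + (int)sel];
    }
}
```

For example, `imm8 = 0x1B` reverses the four elements inside every lane, and `imm8 = 0x00` broadcasts each lane's first element across that lane.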
Integer AVX512F/KNCNI Shift Shift packed 32-bit integers in "a" left by "imm8" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] IF imm8[7:0] > 31 dst[i+31:i] := 0 ELSE dst[i+31:i] := ZeroExtend32(a[i+31:i] << imm8[7:0]) FI ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512F/KNCNI Shift Shift packed 32-bit integers in "a" left by "imm8" while shifting in zeros, and store the results in "dst". FOR j := 0 to 15 i := j*32 IF imm8[7:0] > 31 dst[i+31:i] := 0 ELSE dst[i+31:i] := ZeroExtend32(a[i+31:i] << imm8[7:0]) FI ENDFOR dst[MAX:512] := 0
Integer AVX512F/KNCNI Shift Shift packed 32-bit integers in "a" left by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] IF count[i+31:i] < 32 dst[i+31:i] := ZeroExtend32(a[i+31:i] << count[i+31:i]) ELSE dst[i+31:i] := 0 FI ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512F/KNCNI Shift Shift packed 32-bit integers in "a" left by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst". FOR j := 0 to 15 i := j*32 IF count[i+31:i] < 32 dst[i+31:i] := ZeroExtend32(a[i+31:i] << count[i+31:i]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512F/KNCNI Shift Shift packed 32-bit integers in "a" right by "imm8" while shifting in sign bits, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] IF imm8[7:0] > 31 dst[i+31:i] := (a[i+31] ? 0xFFFFFFFF : 0x0) ELSE dst[i+31:i] := SignExtend32(a[i+31:i] >> imm8[7:0]) FI ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512F/KNCNI Shift Shift packed 32-bit integers in "a" right by "imm8" while shifting in sign bits, and store the results in "dst". FOR j := 0 to 15 i := j*32 IF imm8[7:0] > 31 dst[i+31:i] := (a[i+31] ? 0xFFFFFFFF : 0x0) ELSE dst[i+31:i] := SignExtend32(a[i+31:i] >> imm8[7:0]) FI ENDFOR dst[MAX:512] := 0
Integer AVX512F/KNCNI Shift Shift packed 32-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in sign bits, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] IF count[i+31:i] < 32 dst[i+31:i] := SignExtend32(a[i+31:i] >> count[i+31:i]) ELSE dst[i+31:i] := (a[i+31] ? 0xFFFFFFFF : 0) FI ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512F/KNCNI Shift Shift packed 32-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in sign bits, and store the results in "dst". FOR j := 0 to 15 i := j*32 IF count[i+31:i] < 32 dst[i+31:i] := SignExtend32(a[i+31:i] >> count[i+31:i]) ELSE dst[i+31:i] := (a[i+31] ? 0xFFFFFFFF : 0) FI ENDFOR dst[MAX:512] := 0
Integer AVX512F/KNCNI Shift Shift packed 32-bit integers in "a" right by "imm8" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] IF imm8[7:0] > 31 dst[i+31:i] := 0 ELSE dst[i+31:i] := ZeroExtend32(a[i+31:i] >> imm8[7:0]) FI ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512F/KNCNI Shift Shift packed 32-bit integers in "a" right by "imm8" while shifting in zeros, and store the results in "dst". FOR j := 0 to 15 i := j*32 IF imm8[7:0] > 31 dst[i+31:i] := 0 ELSE dst[i+31:i] := ZeroExtend32(a[i+31:i] >> imm8[7:0]) FI ENDFOR dst[MAX:512] := 0
Integer AVX512F/KNCNI Shift Shift packed 32-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] IF count[i+31:i] < 32 dst[i+31:i] := ZeroExtend32(a[i+31:i] >> count[i+31:i]) ELSE dst[i+31:i] := 0 FI ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512F/KNCNI Shift Shift packed 32-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst". FOR j := 0 to 15 i := j*32 IF count[i+31:i] < 32 dst[i+31:i] := ZeroExtend32(a[i+31:i] >> count[i+31:i]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
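The shift entries above define a result for counts of 32 or more (zero for logical shifts, a sign fill for arithmetic shifts). In C, shifting a 32-bit value by 32 or more is undefined behaviour, so a scalar model has to branch explicitly. A minimal per-element sketch, with hypothetical function names:

```c
#include <stdint.h>

/* Logical left shift: zeros shifted in, counts >= 32 give 0. */
uint32_t sll32_model(uint32_t a, uint32_t count)
{
    return count < 32 ? a << count : 0u;
}

/* Logical right shift: zeros shifted in, counts >= 32 give 0. */
uint32_t srl32_model(uint32_t a, uint32_t count)
{
    return count < 32 ? a >> count : 0u;
}

/* Arithmetic right shift: sign bits shifted in, counts >= 32 give all
   sign bits. Done on unsigned values to avoid relying on the
   implementation-defined result of >> on negative signed integers. */
uint32_t sra32_model(uint32_t a, uint32_t count)
{
    uint32_t sign_fill = (a >> 31) ? 0xFFFFFFFFu : 0u;
    if (count >= 32)
        return sign_fill;
    if (count == 0)
        return a;
    return (a >> count) | (sign_fill << (32 - count));
}
```

The same functions cover both the "imm8" forms (one count for all elements) and the per-element "count" forms; only the source of the count differs.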
Integer AVX512F/KNCNI Arithmetic Subtract packed 32-bit integers in "b" from packed 32-bit integers in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := a[i+31:i] - b[i+31:i] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512F/KNCNI Arithmetic Subtract packed 32-bit integers in "b" from packed 32-bit integers in "a", and store the results in "dst". FOR j := 0 to 15 i := j*32 dst[i+31:i] := a[i+31:i] - b[i+31:i] ENDFOR dst[MAX:512] := 0
Integer Mask AVX512F/KNCNI Logical Compute the bitwise AND of packed 32-bit integers in "a" and "b", producing intermediate 32-bit values, and set the corresponding bit in result mask "k" (subject to writemask "k1") if the intermediate value is non-zero. FOR j := 0 to 15 i := j*32 IF k1[j] k[j] := ((a[i+31:i] AND b[i+31:i]) != 0) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:16] := 0
Integer Mask AVX512F/KNCNI Logical Compute the bitwise AND of packed 32-bit integers in "a" and "b", producing intermediate 32-bit values, and set the corresponding bit in result mask "k" if the intermediate value is non-zero. FOR j := 0 to 15 i := j*32 k[j] := ((a[i+31:i] AND b[i+31:i]) != 0) ? 1 : 0 ENDFOR k[MAX:16] := 0
Integer AVX512F/KNCNI Logical Compute the bitwise XOR of packed 32-bit integers in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := a[i+31:i] XOR b[i+31:i] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512F/KNCNI Logical Compute the bitwise XOR of packed 32-bit integers in "a" and "b", and store the results in "dst". FOR j := 0 to 15 i := j*32 dst[i+31:i] := a[i+31:i] XOR b[i+31:i] ENDFOR dst[MAX:512] := 0
Integer AVX512F/KNCNI Logical Compute the bitwise XOR of 512 bits (representing integer data) in "a" and "b", and store the result in "dst". dst[511:0] := (a[511:0] XOR b[511:0]) dst[MAX:512] := 0
Integer AVX512F/KNCNI Logical Compute the bitwise XOR of packed 64-bit integers in "a" and "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := a[i+63:i] XOR b[i+63:i] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512F/KNCNI Logical Compute the bitwise XOR of packed 64-bit integers in "a" and "b", and store the results in "dst". FOR j := 0 to 7 i := j*64 dst[i+63:i] := a[i+63:i] XOR b[i+63:i] ENDFOR dst[MAX:512] := 0
Floating Point AVX512F/KNCNI Store Scatter single-precision (32-bit) floating-point elements from "a" into memory using 32-bit indices. 32-bit elements are stored at addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). "scale" should be 1, 2, 4 or 8. FOR j := 0 to 15 i := j*32 m := j*32 addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8 MEM[addr+31:addr] := a[i+31:i] ENDFOR
Floating Point AVX512F/KNCNI Store Scatter single-precision (32-bit) floating-point elements from "a" into memory using 32-bit indices. 32-bit elements are stored at addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale") subject to mask "k" (elements are not stored when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8. FOR j := 0 to 15 i := j*32 m := j*32 IF k[j] addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8 MEM[addr+31:addr] := a[i+31:i] FI ENDFOR
Floating Point AVX512F/KNCNI Arithmetic Subtract packed double-precision (64-bit) floating-point elements in "b" from packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := a[i+63:i] - b[i+63:i] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F/KNCNI Arithmetic Subtract packed double-precision (64-bit) floating-point elements in "b" from packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [round_note] FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := a[i+63:i] - b[i+63:i] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F/KNCNI Arithmetic Subtract packed double-precision (64-bit) floating-point elements in "b" from packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst". FOR j := 0 to 7 i := j*64 dst[i+63:i] := a[i+63:i] - b[i+63:i] ENDFOR dst[MAX:512] := 0
Floating Point AVX512F/KNCNI Arithmetic Subtract packed double-precision (64-bit) floating-point elements in "b" from packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst". [round_note] FOR j := 0 to 7 i := j*64 dst[i+63:i] := a[i+63:i] - b[i+63:i] ENDFOR dst[MAX:512] := 0
Floating Point AVX512F/KNCNI Arithmetic Subtract packed single-precision (32-bit) floating-point elements in "b" from packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := a[i+31:i] - b[i+31:i] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F/KNCNI Arithmetic Subtract packed single-precision (32-bit) floating-point elements in "b" from packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [round_note] FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := a[i+31:i] - b[i+31:i] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F/KNCNI Arithmetic Subtract packed single-precision (32-bit) floating-point elements in "b" from packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst". FOR j := 0 to 15 i := j*32 dst[i+31:i] := a[i+31:i] - b[i+31:i] ENDFOR dst[MAX:512] := 0
Floating Point AVX512F/KNCNI Arithmetic Subtract packed single-precision (32-bit) floating-point elements in "b" from packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst". [round_note] FOR j := 0 to 15 i := j*32 dst[i+31:i] := a[i+31:i] - b[i+31:i] ENDFOR dst[MAX:512] := 0
Floating Point AVX512F/KNCNI Cast Cast vector of type __m512d to type __m512. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
Floating Point Integer AVX512F/KNCNI Cast Cast vector of type __m512d to type __m512i. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
Floating Point AVX512F/KNCNI Cast Cast vector of type __m512 to type __m512d. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
Floating Point Integer AVX512F/KNCNI Cast Cast vector of type __m512 to type __m512i. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
Floating Point AVX512F/KNCNI Cast Cast vector of type __m512i to type __m512d. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
Floating Point AVX512F/KNCNI Cast Cast vector of type __m512i to type __m512. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
Integer AVX512F/KNCNI Arithmetic Reduce the packed 32-bit integers in "a" by addition using mask "k". Returns the sum of all active elements in "a". dst[31:0] := 0 FOR j := 0 to 15 i := j*32 IF k[j] dst[31:0] := dst[31:0] + a[i+31:i] FI ENDFOR
Integer AVX512F/KNCNI Arithmetic Reduce the packed 64-bit integers in "a" by addition using mask "k". Returns the sum of all active elements in "a". dst[63:0] := 0 FOR j := 0 to 7 i := j*64 IF k[j] dst[63:0] := dst[63:0] + a[i+63:i] FI ENDFOR
Floating Point AVX512F/KNCNI Arithmetic Reduce the packed double-precision (64-bit) floating-point elements in "a" by addition using mask "k". Returns the sum of all active elements in "a". dst[63:0] := 0.0 FOR j := 0 to 7 i := j*64 IF k[j] dst[63:0] := dst[63:0] + a[i+63:i] FI ENDFOR
Floating Point AVX512F/KNCNI Arithmetic Reduce the packed single-precision (32-bit) floating-point elements in "a" by addition using mask "k". Returns the sum of all active elements in "a". dst[31:0] := 0.0 FOR j := 0 to 15 i := j*32 IF k[j] dst[31:0] := dst[31:0] + a[i+31:i] FI ENDFOR
Integer AVX512F/KNCNI Logical Reduce the packed 32-bit integers in "a" by bitwise AND using mask "k". Returns the bitwise AND of all active elements in "a". dst[31:0] := 0xFFFFFFFF FOR j := 0 to 15 i := j*32 IF k[j] dst[31:0] := dst[31:0] AND a[i+31:i] FI ENDFOR
Integer AVX512F/KNCNI Logical Reduce the packed 64-bit integers in "a" by bitwise AND using mask "k". Returns the bitwise AND of all active elements in "a". dst[63:0] := 0xFFFFFFFFFFFFFFFF FOR j := 0 to 7 i := j*64 IF k[j] dst[63:0] := dst[63:0] AND a[i+63:i] FI ENDFOR
Integer AVX512F/KNCNI Special Math Functions Reduce the packed signed 32-bit integers in "a" by maximum using mask "k". Returns the maximum of all active elements in "a". dst[31:0] := Int32(-0x80000000) FOR j := 0 to 15 i := j*32 IF k[j] dst[31:0] := (dst[31:0] > a[i+31:i] ? dst[31:0] : a[i+31:i]) FI ENDFOR
Integer AVX512F/KNCNI Special Math Functions Reduce the packed signed 64-bit integers in "a" by maximum using mask "k". Returns the maximum of all active elements in "a". dst[63:0] := Int64(-0x8000000000000000) FOR j := 0 to 7 i := j*64 IF k[j] dst[63:0] := (dst[63:0] > a[i+63:i] ? dst[63:0] : a[i+63:i]) FI ENDFOR
Integer AVX512F/KNCNI Special Math Functions Reduce the packed unsigned 32-bit integers in "a" by maximum using mask "k". Returns the maximum of all active elements in "a". dst[31:0] := 0 FOR j := 0 to 15 i := j*32 IF k[j] dst[31:0] := (dst[31:0] > a[i+31:i] ? dst[31:0] : a[i+31:i]) FI ENDFOR
Integer AVX512F/KNCNI Special Math Functions Reduce the packed unsigned 64-bit integers in "a" by maximum using mask "k". Returns the maximum of all active elements in "a". dst[63:0] := 0 FOR j := 0 to 7 i := j*64 IF k[j] dst[63:0] := (dst[63:0] > a[i+63:i] ? dst[63:0] : a[i+63:i]) FI ENDFOR
Floating Point AVX512F/KNCNI Special Math Functions Reduce the packed double-precision (64-bit) floating-point elements in "a" by maximum using mask "k". Returns the maximum of all active elements in "a". dst[63:0] := Cast_FP64(0xFFEFFFFFFFFFFFFF) FOR j := 0 to 7 i := j*64 IF k[j] dst[63:0] := (dst[63:0] > a[i+63:i] ? dst[63:0] : a[i+63:i]) FI ENDFOR
Floating Point AVX512F/KNCNI Special Math Functions Reduce the packed single-precision (32-bit) floating-point elements in "a" by maximum using mask "k". Returns the maximum of all active elements in "a". dst[31:0] := Cast_FP32(0xFF7FFFFF) FOR j := 0 to 15 i := j*32 IF k[j] dst[31:0] := (dst[31:0] > a[i+31:i] ? dst[31:0] : a[i+31:i]) FI ENDFOR
Integer AVX512F/KNCNI Special Math Functions Reduce the packed signed 32-bit integers in "a" by minimum using mask "k". Returns the minimum of all active elements in "a". dst[31:0] := Int32(0x7FFFFFFF) FOR j := 0 to 15 i := j*32 IF k[j] dst[31:0] := (dst[31:0] < a[i+31:i] ? dst[31:0] : a[i+31:i]) FI ENDFOR
Integer AVX512F/KNCNI Special Math Functions Reduce the packed signed 64-bit integers in "a" by minimum using mask "k". Returns the minimum of all active elements in "a". dst[63:0] := Int64(0x7FFFFFFFFFFFFFFF) FOR j := 0 to 7 i := j*64 IF k[j] dst[63:0] := (dst[63:0] < a[i+63:i] ? dst[63:0] : a[i+63:i]) FI ENDFOR
Integer AVX512F/KNCNI Special Math Functions Reduce the packed unsigned 32-bit integers in "a" by minimum using mask "k". Returns the minimum of all active elements in "a". dst[31:0] := 0xFFFFFFFF FOR j := 0 to 15 i := j*32 IF k[j] dst[31:0] := (dst[31:0] < a[i+31:i] ? dst[31:0] : a[i+31:i]) FI ENDFOR
Integer AVX512F/KNCNI Special Math Functions Reduce the packed unsigned 64-bit integers in "a" by minimum using mask "k". Returns the minimum of all active elements in "a". dst[63:0] := 0xFFFFFFFFFFFFFFFF FOR j := 0 to 7 i := j*64 IF k[j] dst[63:0] := (dst[63:0] < a[i+63:i] ? dst[63:0] : a[i+63:i]) FI ENDFOR
Floating Point AVX512F/KNCNI Special Math Functions Reduce the packed double-precision (64-bit) floating-point elements in "a" by minimum using mask "k". Returns the minimum of all active elements in "a". dst[63:0] := Cast_FP64(0x7FEFFFFFFFFFFFFF) FOR j := 0 to 7 i := j*64 IF k[j] dst[63:0] := (dst[63:0] < a[i+63:i] ? dst[63:0] : a[i+63:i]) FI ENDFOR
Floating Point AVX512F/KNCNI Special Math Functions Reduce the packed single-precision (32-bit) floating-point elements in "a" by minimum using mask "k". Returns the minimum of all active elements in "a". dst[31:0] := Cast_FP32(0x7F7FFFFF) FOR j := 0 to 15 i := j*32 IF k[j] dst[31:0] := (dst[31:0] < a[i+31:i] ? dst[31:0] : a[i+31:i]) FI ENDFOR
Integer AVX512F/KNCNI Arithmetic Reduce the packed 32-bit integers in "a" by multiplication using mask "k". Returns the product of all active elements in "a". dst[31:0] := 1 FOR j := 0 to 15 i := j*32 IF k[j] dst[31:0] := dst[31:0] * a[i+31:i] FI ENDFOR
Integer AVX512F/KNCNI Arithmetic Reduce the packed 64-bit integers in "a" by multiplication using mask "k". Returns the product of all active elements in "a". dst[63:0] := 1 FOR j := 0 to 7 i := j*64 IF k[j] dst[63:0] := dst[63:0] * a[i+63:i] FI ENDFOR
Floating Point AVX512F/KNCNI Arithmetic Reduce the packed double-precision (64-bit) floating-point elements in "a" by multiplication using mask "k". Returns the product of all active elements in "a". dst[63:0] := 1.0 FOR j := 0 to 7 i := j*64 IF k[j] dst[63:0] := dst[63:0] * a[i+63:i] FI ENDFOR
Floating Point AVX512F/KNCNI Arithmetic Reduce the packed single-precision (32-bit) floating-point elements in "a" by multiplication using mask "k". Returns the product of all active elements in "a". dst[31:0] := FP32(1.0) FOR j := 0 to 15 i := j*32 IF k[j] dst[31:0] := dst[31:0] * a[i+31:i] FI ENDFOR
Integer AVX512F/KNCNI Logical Reduce the packed 32-bit integers in "a" by bitwise OR using mask "k". Returns the bitwise OR of all active elements in "a". dst[31:0] := 0 FOR j := 0 to 15 i := j*32 IF k[j] dst[31:0] := dst[31:0] OR a[i+31:i] FI ENDFOR
Integer AVX512F/KNCNI Logical Reduce the packed 64-bit integers in "a" by bitwise OR using mask "k". Returns the bitwise OR of all active elements in "a". dst[63:0] := 0 FOR j := 0 to 7 i := j*64 IF k[j] dst[63:0] := dst[63:0] OR a[i+63:i] FI ENDFOR
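Each masked reduction above starts from the identity element of its operation (0 for addition and OR, all ones for AND, the smallest value for maximum, the largest for minimum), so inactive lanes contribute nothing and an all-zero mask returns the identity itself. A minimal portable C sketch of three of them follows; the function names are hypothetical and the loops mirror the pseudocode rather than any particular instruction sequence.

```c
#include <stdint.h>

/* Masked sum: identity 0, wraps modulo 2^32 on overflow. */
uint32_t mask_reduce_add_epi32_model(uint16_t k, const uint32_t a[16])
{
    uint32_t acc = 0u;
    for (int j = 0; j < 16; j++)
        if ((k >> j) & 1u)
            acc += a[j];
    return acc;
}

/* Masked unsigned minimum: identity 0xFFFFFFFF. */
uint32_t mask_reduce_min_epu32_model(uint16_t k, const uint32_t a[16])
{
    uint32_t acc = 0xFFFFFFFFu;
    for (int j = 0; j < 16; j++)
        if (((k >> j) & 1u) && a[j] < acc)
            acc = a[j];
    return acc;
}

/* Masked signed maximum: identity INT32_MIN. */
int32_t mask_reduce_max_epi32_model(uint16_t k, const int32_t a[16])
{
    int32_t acc = INT32_MIN;
    for (int j = 0; j < 16; j++)
        if (((k >> j) & 1u) && a[j] > acc)
            acc = a[j];
    return acc;
}
```

A caller that needs to distinguish "no active lanes" from "the result happens to equal the identity" has to check the mask separately, since the return value alone cannot tell the two apart.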
Integer AVX512F/KNCNI Arithmetic Reduce the packed 32-bit integers in "a" by addition. Returns the sum of all elements in "a". dst[31:0] := 0 FOR j := 0 to 15 i := j*32 dst[31:0] := dst[31:0] + a[i+31:i] ENDFOR
Integer AVX512F/KNCNI Arithmetic Reduce the packed 64-bit integers in "a" by addition. Returns the sum of all elements in "a". dst[63:0] := 0 FOR j := 0 to 7 i := j*64 dst[63:0] := dst[63:0] + a[i+63:i] ENDFOR
Floating Point AVX512F/KNCNI Arithmetic Reduce the packed double-precision (64-bit) floating-point elements in "a" by addition. Returns the sum of all elements in "a". dst[63:0] := 0.0 FOR j := 0 to 7 i := j*64 dst[63:0] := dst[63:0] + a[i+63:i] ENDFOR
Floating Point AVX512F/KNCNI Arithmetic Reduce the packed single-precision (32-bit) floating-point elements in "a" by addition. Returns the sum of all elements in "a". dst[31:0] := 0.0 FOR j := 0 to 15 i := j*32 dst[31:0] := dst[31:0] + a[i+31:i] ENDFOR
Integer AVX512F/KNCNI Logical Reduce the packed 32-bit integers in "a" by bitwise AND. Returns the bitwise AND of all elements in "a". dst[31:0] := 0xFFFFFFFF FOR j := 0 to 15 i := j*32 dst[31:0] := dst[31:0] AND a[i+31:i] ENDFOR
Integer AVX512F/KNCNI Logical Reduce the packed 64-bit integers in "a" by bitwise AND. Returns the bitwise AND of all elements in "a". dst[63:0] := 0xFFFFFFFFFFFFFFFF FOR j := 0 to 7 i := j*64 dst[63:0] := dst[63:0] AND a[i+63:i] ENDFOR
Integer AVX512F/KNCNI Special Math Functions Reduce the packed signed 32-bit integers in "a" by maximum. Returns the maximum of all elements in "a". dst[31:0] := Int32(-0x80000000) FOR j := 0 to 15 i := j*32 dst[31:0] := (dst[31:0] > a[i+31:i] ? dst[31:0] : a[i+31:i]) ENDFOR
Integer AVX512F/KNCNI Special Math Functions Reduce the packed signed 64-bit integers in "a" by maximum. Returns the maximum of all elements in "a". dst[63:0] := Int64(-0x8000000000000000) FOR j := 0 to 7 i := j*64 dst[63:0] := (dst[63:0] > a[i+63:i] ? dst[63:0] : a[i+63:i]) ENDFOR
Integer AVX512F/KNCNI Special Math Functions Reduce the packed unsigned 32-bit integers in "a" by maximum. Returns the maximum of all elements in "a". dst[31:0] := 0 FOR j := 0 to 15 i := j*32 dst[31:0] := (dst[31:0] > a[i+31:i] ? dst[31:0] : a[i+31:i]) ENDFOR
Integer AVX512F/KNCNI Special Math Functions Reduce the packed unsigned 64-bit integers in "a" by maximum. Returns the maximum of all elements in "a". dst[63:0] := 0 FOR j := 0 to 7 i := j*64 dst[63:0] := (dst[63:0] > a[i+63:i] ? dst[63:0] : a[i+63:i]) ENDFOR
Floating Point AVX512F/KNCNI Special Math Functions Reduce the packed double-precision (64-bit) floating-point elements in "a" by maximum. Returns the maximum of all elements in "a". dst[63:0] := Cast_FP64(0xFFEFFFFFFFFFFFFF) FOR j := 0 to 7 i := j*64 dst[63:0] := (dst[63:0] > a[i+63:i] ? dst[63:0] : a[i+63:i]) ENDFOR
Floating Point AVX512F/KNCNI Special Math Functions Reduce the packed single-precision (32-bit) floating-point elements in "a" by maximum. Returns the maximum of all elements in "a". dst[31:0] := Cast_FP32(0xFF7FFFFF) FOR j := 0 to 15 i := j*32 dst[31:0] := (dst[31:0] > a[i+31:i] ? dst[31:0] : a[i+31:i]) ENDFOR
Integer AVX512F/KNCNI Special Math Functions Reduce the packed signed 32-bit integers in "a" by minimum. Returns the minimum of all elements in "a". dst[31:0] := Int32(0x7FFFFFFF) FOR j := 0 to 15 i := j*32 dst[31:0] := (dst[31:0] < a[i+31:i] ? dst[31:0] : a[i+31:i]) ENDFOR
Integer AVX512F/KNCNI Special Math Functions Reduce the packed signed 64-bit integers in "a" by minimum. Returns the minimum of all elements in "a". dst[63:0] := Int64(0x7FFFFFFFFFFFFFFF) FOR j := 0 to 7 i := j*64 dst[63:0] := (dst[63:0] < a[i+63:i] ? dst[63:0] : a[i+63:i]) ENDFOR
Integer AVX512F/KNCNI Special Math Functions Reduce the packed unsigned 32-bit integers in "a" by minimum. Returns the minimum of all elements in "a". dst[31:0] := 0xFFFFFFFF FOR j := 0 to 15 i := j*32 dst[31:0] := (dst[31:0] < a[i+31:i] ? dst[31:0] : a[i+31:i]) ENDFOR
Integer AVX512F/KNCNI Special Math Functions Reduce the packed unsigned 64-bit integers in "a" by minimum. Returns the minimum of all elements in "a". dst[63:0] := 0xFFFFFFFFFFFFFFFF FOR j := 0 to 7 i := j*64 dst[63:0] := (dst[63:0] < a[i+63:i] ? dst[63:0] : a[i+63:i]) ENDFOR
Floating Point AVX512F/KNCNI Special Math Functions Reduce the packed double-precision (64-bit) floating-point elements in "a" by minimum. Returns the minimum of all elements in "a". dst[63:0] := Cast_FP64(0x7FEFFFFFFFFFFFFF) FOR j := 0 to 7 i := j*64 dst[63:0] := (dst[63:0] < a[i+63:i] ? dst[63:0] : a[i+63:i]) ENDFOR
Floating Point AVX512F/KNCNI Special Math Functions Reduce the packed single-precision (32-bit) floating-point elements in "a" by minimum. Returns the minimum of all elements in "a". dst[31:0] := Cast_FP32(0x7F7FFFFF) FOR j := 0 to 15 i := j*32 dst[31:0] := (dst[31:0] < a[i+31:i] ? dst[31:0] : a[i+31:i]) ENDFOR
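The min/max reductions above share one pattern: seed the accumulator with the identity value of the operation, then fold the lanes in index order. The following is a minimal scalar sketch in C that mirrors the pseudocode, not any compiler's actual instruction sequence. The function names (`reduce_max_i32`, `reduce_min_u32`, `cast_fp32`) are hypothetical, lanes are modelled as plain arrays, and `float` is assumed to be IEEE-754 binary32.

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Signed 32-bit maximum: the seed Int32(-0x80000000) is INT32_MIN. */
int32_t reduce_max_i32(const int32_t a[16])
{
    int32_t dst = INT32_MIN;
    for (size_t j = 0; j < 16; j++)
        dst = (dst > a[j]) ? dst : a[j];
    return dst;
}

/* Unsigned 32-bit minimum: the seed 0xFFFFFFFF is UINT32_MAX. */
uint32_t reduce_min_u32(const uint32_t a[16])
{
    uint32_t dst = UINT32_MAX;
    for (size_t j = 0; j < 16; j++)
        dst = (dst < a[j]) ? dst : a[j];
    return dst;
}

/* Model of Cast_FP32: reinterpret a 32-bit pattern as a float
 * (assumption: float is IEEE-754 binary32). */
float cast_fp32(uint32_t bits)
{
    float f;
    assert(sizeof f == sizeof bits);
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

Under the binary32 assumption, `cast_fp32(0xFF7FFFFF)` is `-FLT_MAX` and `cast_fp32(0x7F7FFFFF)` is `FLT_MAX`, which is why those patterns serve as the floating-point seeds for maximum and minimum respectively.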
Integer AVX512F/KNCNI Arithmetic Reduce the packed 32-bit integers in "a" by multiplication. Returns the product of all elements in "a". dst[31:0] := 1 FOR j := 0 to 15 i := j*32 dst[31:0] := dst[31:0] * a[i+31:i] ENDFOR
Integer AVX512F/KNCNI Arithmetic Reduce the packed 64-bit integers in "a" by multiplication. Returns the product of all elements in "a". dst[63:0] := 1 FOR j := 0 to 7 i := j*64 dst[63:0] := dst[63:0] * a[i+63:i] ENDFOR
Floating Point AVX512F/KNCNI Arithmetic Reduce the packed double-precision (64-bit) floating-point elements in "a" by multiplication. Returns the product of all elements in "a". dst[63:0] := 1.0 FOR j := 0 to 7 i := j*64 dst[63:0] := dst[63:0] * a[i+63:i] ENDFOR
Floating Point AVX512F/KNCNI Arithmetic Reduce the packed single-precision (32-bit) floating-point elements in "a" by multiplication. Returns the product of all elements in "a". dst[31:0] := FP32(1.0) FOR j := 0 to 15 i := j*32 dst[31:0] := dst[31:0] * a[i+31:i] ENDFOR
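For the integer product reductions, the pseudocode assigns each partial product back into a fixed-width `dst`, so only the low bits survive. A minimal scalar sketch in C for the 32-bit case, assuming that reading of the pseudocode; `reduce_mul_u32` is a hypothetical name and the lanes are modelled as a plain array:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Product of 16 lanes, kept modulo 2^32 like dst[31:0] in the pseudocode. */
uint32_t reduce_mul_u32(const uint32_t a[16])
{
    uint32_t dst = 1; /* multiplicative identity, as in the pseudocode */
    assert(a != NULL);
    for (size_t j = 0; j < 16; j++)
        dst *= a[j]; /* unsigned arithmetic wraps modulo 2^32 */
    return dst;
}
```

With sixteen lanes of 2 this returns 65536, while sixteen lanes of 4 give 2^32, which wraps to 0. Callers that need the full product have to widen before reducing.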
Integer AVX512F/KNCNI Logical Reduce the packed 32-bit integers in "a" by bitwise OR. Returns the bitwise OR of all elements in "a". dst[31:0] := 0 FOR j := 0 to 15 i := j*32 dst[31:0] := dst[31:0] OR a[i+31:i] ENDFOR
Integer AVX512F/KNCNI Logical Reduce the packed 64-bit integers in "a" by bitwise OR. Returns the bitwise OR of all elements in "a". dst[63:0] := 0 FOR j := 0 to 7 i := j*64 dst[63:0] := dst[63:0] OR a[i+63:i] ENDFOR
Integer AVX512F/KNCNI Logical Performs element-by-element bitwise AND between packed 32-bit integer elements of "v2" and "v3", storing the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := v2[i+31:i] & v3[i+31:i] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F/KNCNI Convert Performs element-by-element conversion of the lower half of packed single-precision (32-bit) floating-point elements in "v2" to packed double-precision (64-bit) floating-point elements, storing the results in "dst". FOR j := 0 to 7 i := j*32 n := j*64 dst[n+63:n] := Convert_FP32_To_FP64(v2[i+31:i]) ENDFOR dst[MAX:512] := 0
Floating Point AVX512F/KNCNI Convert Performs element-by-element conversion of the lower half of packed single-precision (32-bit) floating-point elements in "v2" to packed double-precision (64-bit) floating-point elements, storing the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 l := j*64 IF k[j] dst[l+63:l] := Convert_FP32_To_FP64(v2[i+31:i]) ELSE dst[l+63:l] := src[l+63:l] FI ENDFOR dst[MAX:512] := 0
Floating Point Integer AVX512F/KNCNI Convert Performs element-by-element conversion of the lower half of packed 32-bit integer elements in "v2" to packed double-precision (64-bit) floating-point elements, storing the results in "dst". FOR j := 0 to 7 i := j*32 l := j*64 dst[l+63:l] := Convert_Int32_To_FP64(v2[i+31:i]) ENDFOR dst[MAX:512] := 0
Floating Point Integer AVX512F/KNCNI Convert Performs element-by-element conversion of the lower half of packed 32-bit integer elements in "v2" to packed double-precision (64-bit) floating-point elements, storing the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 n := j*64 IF k[j] dst[n+63:n] := Convert_Int32_To_FP64(v2[i+31:i]) ELSE dst[n+63:n] := src[n+63:n] FI ENDFOR dst[MAX:512] := 0
Floating Point Integer AVX512F/KNCNI Convert Performs element-by-element conversion of the lower half of packed unsigned 32-bit integer elements in "v2" to packed double-precision (64-bit) floating-point elements, storing the results in "dst". FOR j := 0 to 7 i := j*32 n := j*64 dst[n+63:n] := Convert_UInt32_To_FP64(v2[i+31:i]) ENDFOR dst[MAX:512] := 0
Floating Point Integer AVX512F/KNCNI Convert Performs element-by-element conversion of the lower half of packed unsigned 32-bit integer elements in "v2" to packed double-precision (64-bit) floating-point elements, storing the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 l := j*64 IF k[j] dst[l+63:l] := Convert_UInt32_To_FP64(v2[i+31:i]) ELSE dst[l+63:l] := src[l+63:l] FI ENDFOR dst[MAX:512] := 0
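The signed and unsigned integer-to-double conversions above differ only in how a lane whose top bit is set is interpreted. A small scalar sketch in C of a single lane makes the difference concrete; the helper names `cvt_epi32_lane` and `cvt_epu32_lane` are hypothetical, and every 32-bit integer is exactly representable in a double, so no rounding is involved:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Signed interpretation of a 32-bit lane. */
double cvt_epi32_lane(uint32_t bits)
{
    int32_t s;
    assert(sizeof s == sizeof bits);
    memcpy(&s, &bits, sizeof s); /* reinterpret the bits, no value conversion */
    return (double)s;
}

/* Unsigned interpretation of the same 32-bit lane. */
double cvt_epu32_lane(uint32_t bits)
{
    return (double)bits;
}
```

For the bit pattern 0xFFFFFFFF the signed helper yields -1.0 and the unsigned helper yields 4294967295.0; for lanes below 2^31 the two agree.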
Integer AVX512F/KNCNI Load Up-converts 16 memory locations starting at location "base_addr" at packed 32-bit integer indices stored in "vindex" scaled by "scale" using "conv" to 32-bit integer elements and stores them in "dst". AVX512 only supports _MM_UPCONV_EPI32_NONE. FOR j := 0 to 15 i := j*32 m := j*32 addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8 CASE conv OF _MM_UPCONV_EPI32_NONE: dst[i+31:i] := MEM[addr+31:addr] _MM_UPCONV_EPI32_UINT8: dst[i+31:i] := ZeroExtend32(MEM[addr+7:addr]) _MM_UPCONV_EPI32_SINT8: dst[i+31:i] := SignExtend32(MEM[addr+7:addr]) _MM_UPCONV_EPI32_UINT16: dst[i+31:i] := ZeroExtend32(MEM[addr+15:addr]) _MM_UPCONV_EPI32_SINT16: dst[i+31:i] := SignExtend32(MEM[addr+15:addr]) ESAC ENDFOR dst[MAX:512] := 0
Integer AVX512F/KNCNI Load Up-converts 16 memory locations starting at location "base_addr" at packed 32-bit integer indices stored in "vindex" scaled by "scale" using "conv" to 32-bit integer elements and stores them in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). AVX512 only supports _MM_UPCONV_EPI32_NONE. FOR j := 0 to 15 i := j*32 m := j*32 addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8 IF k[j] CASE conv OF _MM_UPCONV_EPI32_NONE: dst[i+31:i] := MEM[addr+31:addr] _MM_UPCONV_EPI32_UINT8: dst[i+31:i] := ZeroExtend32(MEM[addr+7:addr]) _MM_UPCONV_EPI32_SINT8: dst[i+31:i] := SignExtend32(MEM[addr+7:addr]) _MM_UPCONV_EPI32_UINT16: dst[i+31:i] := ZeroExtend32(MEM[addr+15:addr]) _MM_UPCONV_EPI32_SINT16: dst[i+31:i] := SignExtend32(MEM[addr+15:addr]) ESAC ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512F/KNCNI Load Up-converts 8 memory locations starting at location "base_addr" at packed 32-bit integer indices stored in the lower half of "vindex" scaled by "scale" using "conv" to 64-bit integer elements and stores them in "dst". FOR j := 0 to 7 i := j*64 m := j*32 addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8 CASE conv OF _MM_UPCONV_EPI64_NONE: dst[i+63:i] := MEM[addr+63:addr] ESAC ENDFOR dst[MAX:512] := 0
Integer AVX512F/KNCNI Load Up-converts 8 memory locations starting at location "base_addr" at packed 32-bit integer indices stored in the lower half of "vindex" scaled by "scale" using "conv" to 64-bit integer elements and stores them in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 m := j*32 addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8 IF k[j] CASE conv OF _MM_UPCONV_EPI64_NONE: dst[i+63:i] := MEM[addr+63:addr] ESAC ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F/KNCNI Load Up-converts 16 memory locations starting at location "base_addr" at packed 32-bit integer indices stored in "vindex" scaled by "scale" using "conv" to single-precision (32-bit) floating-point elements and stores them in "dst". AVX512 only supports _MM_UPCONV_PS_NONE. FOR j := 0 to 15 i := j*32 m := j*32 addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8 CASE conv OF _MM_UPCONV_PS_NONE: dst[i+31:i] := MEM[addr+31:addr] _MM_UPCONV_PS_FLOAT16: dst[i+31:i] := Convert_FP16_To_FP32(MEM[addr+15:addr]) _MM_UPCONV_PS_UINT8: dst[i+31:i] := Convert_UInt8_To_FP32(MEM[addr+7:addr]) _MM_UPCONV_PS_SINT8: dst[i+31:i] := Convert_Int8_To_FP32(MEM[addr+7:addr]) _MM_UPCONV_PS_UINT16: dst[i+31:i] := Convert_UInt16_To_FP32(MEM[addr+15:addr]) _MM_UPCONV_PS_SINT16: dst[i+31:i] := Convert_Int16_To_FP32(MEM[addr+15:addr]) ESAC ENDFOR dst[MAX:512] := 0
Floating Point AVX512F/KNCNI Load Up-converts 16 single-precision (32-bit) memory locations starting at location "base_addr" at packed 32-bit integer indices stored in "vindex" scaled by "scale" using "conv" to single-precision (32-bit) floating-point elements and stores them in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). AVX512 only supports _MM_UPCONV_PS_NONE. FOR j := 0 to 15 i := j*32 m := j*32 addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8 IF k[j] CASE conv OF _MM_UPCONV_PS_NONE: dst[i+31:i] := MEM[addr+31:addr] _MM_UPCONV_PS_FLOAT16: dst[i+31:i] := Convert_FP16_To_FP32(MEM[addr+15:addr]) _MM_UPCONV_PS_UINT8: dst[i+31:i] := Convert_UInt8_To_FP32(MEM[addr+7:addr]) _MM_UPCONV_PS_SINT8: dst[i+31:i] := Convert_Int8_To_FP32(MEM[addr+7:addr]) _MM_UPCONV_PS_UINT16: dst[i+31:i] := Convert_UInt16_To_FP32(MEM[addr+15:addr]) _MM_UPCONV_PS_SINT16: dst[i+31:i] := Convert_Int16_To_FP32(MEM[addr+15:addr]) ESAC ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F/KNCNI Load Up-converts 8 double-precision (64-bit) floating-point elements in memory locations starting at location "base_addr" at packed 32-bit integer indices stored in the lower half of "vindex" scaled by "scale" using "conv" to 64-bit floating-point elements and stores them in "dst". FOR j := 0 to 7 i := j*64 m := j*32 addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8 CASE conv OF _MM_UPCONV_PD_NONE: dst[i+63:i] := MEM[addr+63:addr] ESAC ENDFOR dst[MAX:512] := 0
Floating Point AVX512F/KNCNI Load Up-converts 8 double-precision (64-bit) floating-point elements in memory locations starting at location "base_addr" at packed 32-bit integer indices stored in the lower half of "vindex" scaled by "scale" using "conv" to 64-bit floating-point elements and stores them in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 m := j*32 addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8 IF k[j] CASE conv OF _MM_UPCONV_PD_NONE: dst[i+63:i] := MEM[addr+63:addr] ESAC ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F/KNCNI Store Down-converts 16 packed single-precision (32-bit) floating-point elements in "a" and stores them in memory locations starting at location "base_addr" at packed 32-bit integer indices stored in "vindex" scaled by "scale" using "conv". AVX512 only supports _MM_DOWNCONV_PS_NONE. FOR j := 0 to 15 i := j*32 m := j*32 addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8 CASE conv OF _MM_DOWNCONV_PS_NONE: MEM[addr+31:addr] := a[i+31:i] _MM_DOWNCONV_PS_FLOAT16: MEM[addr+15:addr] := Convert_FP32_To_FP16(a[i+31:i]) _MM_DOWNCONV_PS_UINT8: MEM[addr+ 7:addr] := Convert_FP32_To_UInt8(a[i+31:i]) _MM_DOWNCONV_PS_SINT8: MEM[addr+ 7:addr] := Convert_FP32_To_Int8(a[i+31:i]) _MM_DOWNCONV_PS_UINT16: MEM[addr+15:addr] := Convert_FP32_To_UInt16(a[i+31:i]) _MM_DOWNCONV_PS_SINT16: MEM[addr+15:addr] := Convert_FP32_To_Int16(a[i+31:i]) ESAC ENDFOR
Floating Point AVX512F/KNCNI Store Down-converts 16 packed single-precision (32-bit) floating-point elements in "a" according to "conv" and stores them in memory locations starting at location "base_addr" at packed 32-bit integer indices stored in "vindex" scaled by "scale" using writemask "k" (elements are written only when the corresponding mask bit is set). AVX512 only supports _MM_DOWNCONV_PS_NONE. FOR j := 0 to 15 i := j*32 m := j*32 addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8 IF k[j] CASE conv OF _MM_DOWNCONV_PS_NONE: MEM[addr+31:addr] := a[i+31:i] _MM_DOWNCONV_PS_FLOAT16: MEM[addr+15:addr] := Convert_FP32_To_FP16(a[i+31:i]) _MM_DOWNCONV_PS_UINT8: MEM[addr+ 7:addr] := Convert_FP32_To_UInt8(a[i+31:i]) _MM_DOWNCONV_PS_SINT8: MEM[addr+ 7:addr] := Convert_FP32_To_Int8(a[i+31:i]) _MM_DOWNCONV_PS_UINT16: MEM[addr+15:addr] := Convert_FP32_To_UInt16(a[i+31:i]) _MM_DOWNCONV_PS_SINT16: MEM[addr+15:addr] := Convert_FP32_To_Int16(a[i+31:i]) ESAC FI ENDFOR
Floating Point AVX512F/KNCNI Store Down-converts 8 packed double-precision (64-bit) floating-point elements in "a" and stores them in memory locations starting at location "base_addr" at packed 32-bit integer indices stored in "vindex" scaled by "scale" using "conv". FOR j := 0 to 7 i := j*64 m := j*32 addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8 CASE conv OF _MM_DOWNCONV_PD_NONE: MEM[addr+63:addr] := a[i+63:i] ESAC ENDFOR
Floating Point AVX512F/KNCNI Store Down-converts 8 packed double-precision (64-bit) floating-point elements in "a" and stores them in memory locations starting at location "base_addr" at packed 32-bit integer indices stored in "vindex" scaled by "scale" using "conv". Only those elements whose corresponding mask bit is set in writemask "k" are written to memory. FOR j := 0 to 7 i := j*64 m := j*32 addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8 IF k[j] CASE conv OF _MM_DOWNCONV_PD_NONE: MEM[addr+63:addr] := a[i+63:i] ESAC FI ENDFOR
Integer AVX512F/KNCNI Store Down-converts 8 packed 64-bit integer elements in "a" and stores them in memory locations starting at location "base_addr" at packed 32-bit integer indices stored in "vindex" scaled by "scale" using "conv". FOR j := 0 to 7 i := j*64 m := j*32 addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8 CASE conv OF _MM_DOWNCONV_EPI64_NONE: MEM[addr+63:addr] := a[i+63:i] ESAC ENDFOR
Integer AVX512F/KNCNI Store Down-converts 8 packed 64-bit integer elements in "a" and stores them in memory locations starting at location "base_addr" at packed 32-bit integer indices stored in "vindex" scaled by "scale" using "conv". Only those elements whose corresponding mask bit is set in writemask "k" are written to memory. FOR j := 0 to 7 i := j*64 m := j*32 addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8 IF k[j] CASE conv OF _MM_DOWNCONV_EPI64_NONE: MEM[addr+63:addr] := a[i+63:i] ESAC FI ENDFOR
Floating Point AVX512F/KNCNI Convert Performs an element-by-element conversion of packed double-precision (64-bit) floating-point elements in "v2" to single-precision (32-bit) floating-point elements and stores them in "dst". The elements are stored in the lower half of the results vector, while the remaining upper half locations are set to 0. FOR j := 0 to 7 i := j*64 k := j*32 dst[k+31:k] := Convert_FP64_To_FP32(v2[i+63:i]) ENDFOR dst[MAX:512] := 0
Floating Point AVX512F/KNCNI Convert Performs an element-by-element conversion of packed double-precision (64-bit) floating-point elements in "v2" to single-precision (32-bit) floating-point elements and stores them in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). The elements are stored in the lower half of the results vector, while the remaining upper half locations are set to 0. FOR j := 0 to 7 i := j*64 l := j*32 IF k[j] dst[l+31:l] := Convert_FP64_To_FP32(v2[i+63:i]) ELSE dst[l+31:l] := src[l+31:l] FI ENDFOR dst[MAX:512] := 0
Integer AVX512F/KNCNI Load Loads 8 64-bit integer elements from memory starting at location "base_addr" at packed 32-bit integer indices stored in the lower half of "vindex" scaled by "scale" and stores them in "dst". FOR j := 0 to 7 i := j*64 m := j*32 addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8 dst[i+63:i] := MEM[addr+63:addr] ENDFOR dst[MAX:512] := 0
Integer AVX512F/KNCNI Load Loads 8 64-bit integer elements from memory starting at location "base_addr" at packed 32-bit integer indices stored in the lower half of "vindex" scaled by "scale" and stores them in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 m := j*32 IF k[j] addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8 dst[i+63:i] := MEM[addr+63:addr] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F/KNCNI Load Loads 8 double-precision (64-bit) floating-point elements stored at memory locations starting at location "base_addr" at packed 32-bit integer indices stored in the lower half of "vindex" scaled by "scale" and stores them in "dst". FOR j := 0 to 7 i := j*64 m := j*32 addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8 dst[i+63:i] := MEM[addr+63:addr] ENDFOR dst[MAX:512] := 0
Floating Point AVX512F/KNCNI Load Loads 8 double-precision (64-bit) floating-point elements from memory starting at location "base_addr" at packed 32-bit integer indices stored in the lower half of "vindex" scaled by "scale" into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 m := j*32 IF k[j] addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8 dst[i+63:i] := MEM[addr+63:addr] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
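The masked gathers above compute one address per lane from a sign-extended 32-bit index and either load from it or keep the lane from "src". A minimal scalar sketch in C for the 64-bit element case follows. It assumes the trailing `* 8` in the pseudocode converts a byte offset into the bit addressing used by `MEM[...]`, so the C code works in bytes. The function name `model_mask_gather_i32_to_64` and the restriction of "scale" to 1, 2, 4 or 8 are assumptions made for the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Scalar model of a writemasked gather of eight 64-bit elements
 * addressed by 32-bit indices (hypothetical helper, not an intrinsic). */
void model_mask_gather_i32_to_64(int64_t dst[8], const int64_t src[8],
                                 uint8_t k, const int32_t vindex[8],
                                 const void *base_addr, int scale)
{
    const unsigned char *base = (const unsigned char *)base_addr;

    /* Assumption: scale is a byte multiplier of 1, 2, 4 or 8. */
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);

    for (size_t j = 0; j < 8; j++) {
        if ((k >> j) & 1u) {
            /* Byte address: base + sign-extended index * scale. */
            const unsigned char *addr = base + (ptrdiff_t)vindex[j] * scale;
            memcpy(&dst[j], addr, sizeof dst[j]);
        } else {
            /* Mask bit clear: lane is copied from src, memory is not read. */
            dst[j] = src[j];
        }
    }
}
```

Because lanes with a clear mask bit never form an address in this model, their indices may point anywhere without being dereferenced, which matches the position of the address computation inside `IF k[j]` in the pseudocode.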
Floating Point AVX512F/KNCNI Store Stores 8 packed double-precision (64-bit) floating-point elements in "a" to memory locations starting at location "base_addr" at packed 32-bit integer indices stored in "vindex" scaled by "scale". FOR j := 0 to 7 i := j*64 m := j*32 addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8 MEM[addr+63:addr] := a[i+63:i] ENDFOR
Floating Point AVX512F/KNCNI Store Stores 8 packed double-precision (64-bit) floating-point elements in "a" to memory locations starting at location "base_addr" at packed 32-bit integer indices stored in "vindex" scaled by "scale". Only those elements whose corresponding mask bit is set in writemask "k" are written to memory. FOR j := 0 to 7 i := j*64 m := j*32 IF k[j] addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8 MEM[addr+63:addr] := a[i+63:i] FI ENDFOR
Floating Point AVX512F/KNCNI Arithmetic Finds the absolute value of each packed single-precision (32-bit) floating-point element in "v2", storing the results in "dst". FOR j := 0 to 15 i := j*32 dst[i+31:i] := ABS(v2[i+31:i]) ENDFOR dst[MAX:512] := 0
Floating Point AVX512F/KNCNI Arithmetic Finds the absolute value of each packed single-precision (32-bit) floating-point element in "v2", storing the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := ABS(v2[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F/KNCNI Arithmetic Finds the absolute value of each packed double-precision (64-bit) floating-point element in "v2", storing the results in "dst". FOR j := 0 to 7 i := j*64 dst[i+63:i] := ABS(v2[i+63:i]) ENDFOR dst[MAX:512] := 0
Floating Point AVX512F/KNCNI Arithmetic Finds the absolute value of each packed double-precision (64-bit) floating-point element in "v2", storing the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := ABS(v2[i+63:i]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512F/KNCNI Elementary Math Functions Compute the base-2 logarithm of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst". FOR j := 0 to 15 i := j*32 dst[i+31:i] := LOG(a[i+31:i]) / LOG(2.0) ENDFOR dst[MAX:512] := 0
Floating Point AVX512F/KNCNI Elementary Math Functions Compute the base-2 logarithm of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := LOG(a[i+31:i]) / LOG(2.0) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512F/KNCNI Store Down-converts 16 packed 32-bit integer elements in "a" using "conv" and stores them in memory locations starting at location "base_addr" at packed 32-bit integer indices stored in "vindex" scaled by "scale". "hint" indicates to the processor whether the data is non-temporal. AVX512 only supports _MM_DOWNCONV_EPI32_NONE. FOR j := 0 to 15 i := j*32 m := j*32 addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8 CASE conv OF _MM_DOWNCONV_EPI32_NONE: MEM[addr+31:addr] := a[i+31:i] _MM_DOWNCONV_EPI32_UINT8: MEM[addr+ 7:addr] := Truncate8(a[i+31:i]) _MM_DOWNCONV_EPI32_SINT8: MEM[addr+ 7:addr] := Saturate8(a[i+31:i]) _MM_DOWNCONV_EPI32_UINT16: MEM[addr+15:addr] := Truncate16(a[i+31:i]) _MM_DOWNCONV_EPI32_SINT16: MEM[addr+15:addr] := Saturate16(a[i+31:i]) ESAC ENDFOR
Integer AVX512F/KNCNI Store Down-converts 16 packed 32-bit integer elements in "a" using "conv" and stores them in memory locations starting at location "base_addr" at packed 32-bit integer indices stored in "vindex" scaled by "scale". Elements are written using writemask "k" (elements are only written when the corresponding mask bit is set; otherwise, elements are left unchanged in memory). "hint" indicates to the processor whether the data is non-temporal. AVX512 only supports _MM_DOWNCONV_EPI32_NONE. FOR j := 0 to 15 i := j*32 m := j*32 addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8 IF k[j] CASE conv OF _MM_DOWNCONV_EPI32_NONE: MEM[addr+31:addr] := a[i+31:i] _MM_DOWNCONV_EPI32_UINT8: MEM[addr+ 7:addr] := Truncate8(a[i+31:i]) _MM_DOWNCONV_EPI32_SINT8: MEM[addr+ 7:addr] := Saturate8(a[i+31:i]) _MM_DOWNCONV_EPI32_UINT16: MEM[addr+15:addr] := Truncate16(a[i+31:i]) _MM_DOWNCONV_EPI32_SINT16: MEM[addr+15:addr] := Saturate16(a[i+31:i]) ESAC FI ENDFOR
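The integer down-conversions above use two different narrowing rules: the unsigned targets truncate (keep the low bits) and the signed targets saturate (clamp to the representable range). A short scalar sketch in C of both rules as they are named in the pseudocode; the C function names and the `clamp_i32` helper are hypothetical, and `Saturate8`/`Saturate16` are read here as signed saturation:

```c
#include <assert.h>
#include <stdint.h>

/* Clamp x into [lo, hi]. */
int32_t clamp_i32(int32_t x, int32_t lo, int32_t hi)
{
    assert(lo <= hi);
    return x < lo ? lo : (x > hi ? hi : x);
}

/* Truncate8 / Truncate16: keep only the low 8 or 16 bits of the lane. */
uint8_t truncate8(int32_t x)   { return (uint8_t)((uint32_t)x & 0xFFu); }
uint16_t truncate16(int32_t x) { return (uint16_t)((uint32_t)x & 0xFFFFu); }

/* Saturate8 / Saturate16: clamp the full 32-bit lane to the signed range. */
int8_t saturate8(int32_t x)   { return (int8_t)clamp_i32(x, INT8_MIN, INT8_MAX); }
int16_t saturate16(int32_t x) { return (int16_t)clamp_i32(x, INT16_MIN, INT16_MAX); }
```

For a lane holding 300, truncation stores 0x2C while saturation stores 127. Saturation only makes sense when it sees the whole 32-bit lane, since a value already cut to 16 bits can no longer be out of range.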
AVX512IFMA52 Arithmetic Multiply packed unsigned 52-bit integers in each 64-bit element of "b" and "c" to form a 104-bit intermediate result. Add the low 52-bit unsigned integer from the intermediate result with the corresponding unsigned 64-bit integer in "a", and store the results in "dst". FOR j := 0 to 7 i := j*64 tmp[127:0] := ZeroExtend64(b[i+51:i]) * ZeroExtend64(c[i+51:i]) dst[i+63:i] := a[i+63:i] + ZeroExtend64(tmp[51:0]) ENDFOR dst[MAX:512] := 0
AVX512IFMA52 Arithmetic Multiply packed unsigned 52-bit integers in each 64-bit element of "b" and "c" to form a 104-bit intermediate result. Add the low 52-bit unsigned integer from the intermediate result with the corresponding unsigned 64-bit integer in "a", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] tmp[127:0] := ZeroExtend64(b[i+51:i]) * ZeroExtend64(c[i+51:i]) dst[i+63:i] := a[i+63:i] + ZeroExtend64(tmp[51:0]) ELSE dst[i+63:i] := a[i+63:i] FI ENDFOR dst[MAX:512] := 0
AVX512IFMA52 Arithmetic Multiply packed unsigned 52-bit integers in each 64-bit element of "b" and "c" to form a 104-bit intermediate result. Add the low 52-bit unsigned integer from the intermediate result with the corresponding unsigned 64-bit integer in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] tmp[127:0] := ZeroExtend64(b[i+51:i]) * ZeroExtend64(c[i+51:i]) dst[i+63:i] := a[i+63:i] + ZeroExtend64(tmp[51:0]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
AVX512IFMA52 AVX512VL Arithmetic Multiply packed unsigned 52-bit integers in each 64-bit element of "b" and "c" to form a 104-bit intermediate result. Add the low 52-bit unsigned integer from the intermediate result with the corresponding unsigned 64-bit integer in "a", and store the results in "dst". FOR j := 0 to 3 i := j*64 tmp[127:0] := ZeroExtend64(b[i+51:i]) * ZeroExtend64(c[i+51:i]) dst[i+63:i] := a[i+63:i] + ZeroExtend64(tmp[51:0]) ENDFOR dst[MAX:256] := 0
AVX512IFMA52 AVX512VL Arithmetic Multiply packed unsigned 52-bit integers in each 64-bit element of "b" and "c" to form a 104-bit intermediate result. Add the low 52-bit unsigned integer from the intermediate result with the corresponding unsigned 64-bit integer in "a", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k[j] tmp[127:0] := ZeroExtend64(b[i+51:i]) * ZeroExtend64(c[i+51:i]) dst[i+63:i] := a[i+63:i] + ZeroExtend64(tmp[51:0]) ELSE dst[i+63:i] := a[i+63:i] FI ENDFOR dst[MAX:256] := 0
AVX512IFMA52 AVX512VL Arithmetic Multiply packed unsigned 52-bit integers in each 64-bit element of "b" and "c" to form a 104-bit intermediate result. Add the low 52-bit unsigned integer from the intermediate result with the corresponding unsigned 64-bit integer in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k[j] tmp[127:0] := ZeroExtend64(b[i+51:i]) * ZeroExtend64(c[i+51:i]) dst[i+63:i] := a[i+63:i] + ZeroExtend64(tmp[51:0]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:256] := 0
AVX512IFMA52 AVX512VL Arithmetic Multiply packed unsigned 52-bit integers in each 64-bit element of "b" and "c" to form a 104-bit intermediate result. Add the low 52-bit unsigned integer from the intermediate result with the corresponding unsigned 64-bit integer in "a", and store the results in "dst". FOR j := 0 to 1 i := j*64 tmp[127:0] := ZeroExtend64(b[i+51:i]) * ZeroExtend64(c[i+51:i]) dst[i+63:i] := a[i+63:i] + ZeroExtend64(tmp[51:0]) ENDFOR dst[MAX:128] := 0
AVX512IFMA52 AVX512VL Arithmetic Multiply packed unsigned 52-bit integers in each 64-bit element of "b" and "c" to form a 104-bit intermediate result. Add the low 52-bit unsigned integer from the intermediate result with the corresponding unsigned 64-bit integer in "a", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k[j] tmp[127:0] := ZeroExtend64(b[i+51:i]) * ZeroExtend64(c[i+51:i]) dst[i+63:i] := a[i+63:i] + ZeroExtend64(tmp[51:0]) ELSE dst[i+63:i] := a[i+63:i] FI ENDFOR dst[MAX:128] := 0
AVX512IFMA52 AVX512VL Arithmetic Multiply packed unsigned 52-bit integers in each 64-bit element of "b" and "c" to form a 104-bit intermediate result. Add the low 52-bit unsigned integer from the intermediate result with the corresponding unsigned 64-bit integer in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k[j] tmp[127:0] := ZeroExtend64(b[i+51:i]) * ZeroExtend64(c[i+51:i]) dst[i+63:i] := a[i+63:i] + ZeroExtend64(tmp[51:0]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:128] := 0
AVX512IFMA52 Arithmetic Multiply packed unsigned 52-bit integers in each 64-bit element of "b" and "c" to form a 104-bit intermediate result. Add the high 52-bit unsigned integer from the intermediate result with the corresponding unsigned 64-bit integer in "a", and store the results in "dst". FOR j := 0 to 7 i := j*64 tmp[127:0] := ZeroExtend64(b[i+51:i]) * ZeroExtend64(c[i+51:i]) dst[i+63:i] := a[i+63:i] + ZeroExtend64(tmp[103:52]) ENDFOR dst[MAX:512] := 0
AVX512IFMA52 Arithmetic Multiply packed unsigned 52-bit integers in each 64-bit element of "b" and "c" to form a 104-bit intermediate result. Add the high 52-bit unsigned integer from the intermediate result with the corresponding unsigned 64-bit integer in "a", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] tmp[127:0] := ZeroExtend64(b[i+51:i]) * ZeroExtend64(c[i+51:i]) dst[i+63:i] := a[i+63:i] + ZeroExtend64(tmp[103:52]) ELSE dst[i+63:i] := a[i+63:i] FI ENDFOR dst[MAX:512] := 0
AVX512IFMA52 Arithmetic Multiply packed unsigned 52-bit integers in each 64-bit element of "b" and "c" to form a 104-bit intermediate result. Add the high 52-bit unsigned integer from the intermediate result with the corresponding unsigned 64-bit integer in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] tmp[127:0] := ZeroExtend64(b[i+51:i]) * ZeroExtend64(c[i+51:i]) dst[i+63:i] := a[i+63:i] + ZeroExtend64(tmp[103:52]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
AVX512IFMA52 AVX512VL Arithmetic Multiply packed unsigned 52-bit integers in each 64-bit element of "b" and "c" to form a 104-bit intermediate result. Add the high 52-bit unsigned integer from the intermediate result with the corresponding unsigned 64-bit integer in "a", and store the results in "dst". FOR j := 0 to 3 i := j*64 tmp[127:0] := ZeroExtend64(b[i+51:i]) * ZeroExtend64(c[i+51:i]) dst[i+63:i] := a[i+63:i] + ZeroExtend64(tmp[103:52]) ENDFOR dst[MAX:256] := 0
AVX512IFMA52 AVX512VL Arithmetic Multiply packed unsigned 52-bit integers in each 64-bit element of "b" and "c" to form a 104-bit intermediate result. Add the high 52-bit unsigned integer from the intermediate result with the corresponding unsigned 64-bit integer in "a", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k[j] tmp[127:0] := ZeroExtend64(b[i+51:i]) * ZeroExtend64(c[i+51:i]) dst[i+63:i] := a[i+63:i] + ZeroExtend64(tmp[103:52]) ELSE dst[i+63:i] := a[i+63:i] FI ENDFOR dst[MAX:256] := 0
AVX512IFMA52 AVX512VL Arithmetic Multiply packed unsigned 52-bit integers in each 64-bit element of "b" and "c" to form a 104-bit intermediate result. Add the high 52-bit unsigned integer from the intermediate result with the corresponding unsigned 64-bit integer in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k[j] tmp[127:0] := ZeroExtend64(b[i+51:i]) * ZeroExtend64(c[i+51:i]) dst[i+63:i] := a[i+63:i] + ZeroExtend64(tmp[103:52]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:256] := 0
AVX512IFMA52 AVX512VL Arithmetic Multiply packed unsigned 52-bit integers in each 64-bit element of "b" and "c" to form a 104-bit intermediate result. Add the high 52-bit unsigned integer from the intermediate result with the corresponding unsigned 64-bit integer in "a", and store the results in "dst". FOR j := 0 to 1 i := j*64 tmp[127:0] := ZeroExtend64(b[i+51:i]) * ZeroExtend64(c[i+51:i]) dst[i+63:i] := a[i+63:i] + ZeroExtend64(tmp[103:52]) ENDFOR dst[MAX:128] := 0
AVX512IFMA52 AVX512VL Arithmetic Multiply packed unsigned 52-bit integers in each 64-bit element of "b" and "c" to form a 104-bit intermediate result. Add the high 52-bit unsigned integer from the intermediate result with the corresponding unsigned 64-bit integer in "a", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k[j] tmp[127:0] := ZeroExtend64(b[i+51:i]) * ZeroExtend64(c[i+51:i]) dst[i+63:i] := a[i+63:i] + ZeroExtend64(tmp[103:52]) ELSE dst[i+63:i] := a[i+63:i] FI ENDFOR dst[MAX:128] := 0
AVX512IFMA52 AVX512VL Arithmetic Multiply packed unsigned 52-bit integers in each 64-bit element of "b" and "c" to form a 104-bit intermediate result. Add the high 52-bit unsigned integer from the intermediate result with the corresponding unsigned 64-bit integer in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k[j] tmp[127:0] := ZeroExtend64(b[i+51:i]) * ZeroExtend64(c[i+51:i]) dst[i+63:i] := a[i+63:i] + ZeroExtend64(tmp[103:52]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:128] := 0
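The multiply-high step shared by the six entries above can be modeled per 64-bit lane in portable C. This is only a scalar sketch, not the intrinsic itself: the helper names (`mul52_hi`, `madd52hi_lane`, and the two masked variants) are hypothetical, and the 104-bit product is assembled from 26-bit halves so that no 128-bit integer type is assumed.

```c
#include <stdint.h>

#define MASK52 ((UINT64_C(1) << 52) - 1)
#define MASK26 ((UINT64_C(1) << 26) - 1)

/* Bits 103:52 of the 104-bit product of the low 52 bits of b and c.
   Bits 63:52 of the inputs are ignored, as in the pseudocode. */
static inline uint64_t mul52_hi(uint64_t b, uint64_t c)
{
    uint64_t b52 = b & MASK52, c52 = c & MASK52;
    uint64_t bl = b52 & MASK26, bh = b52 >> 26;   /* 26-bit halves of b */
    uint64_t cl = c52 & MASK26, ch = c52 >> 26;   /* 26-bit halves of c */
    uint64_t low = bl * cl;                       /* weight 2^0,  < 2^52 */
    uint64_t mid = bh * cl + bl * ch;             /* weight 2^26, < 2^53 */
    /* bh*ch has weight 2^52; the remaining terms contribute their carry. */
    return bh * ch + ((mid + (low >> 26)) >> 26);
}

/* Unmasked lane: dst = a + hi52(b * c), wrapping modulo 2^64. */
static inline uint64_t madd52hi_lane(uint64_t a, uint64_t b, uint64_t c)
{
    return a + mul52_hi(b, c);
}

/* Writemask lane: the lane keeps a when the mask bit is clear. */
static inline uint64_t madd52hi_lane_mask(uint64_t a, int kbit, uint64_t b, uint64_t c)
{
    return kbit ? madd52hi_lane(a, b, c) : a;
}

/* Zeromask lane: the lane is zeroed when the mask bit is clear. */
static inline uint64_t madd52hi_lane_maskz(uint64_t a, int kbit, uint64_t b, uint64_t c)
{
    return kbit ? madd52hi_lane(a, b, c) : 0;
}
```

Under these assumptions, the 256-bit forms apply the lane function to lanes 0 to 3 and the 128-bit forms to lanes 0 to 1, taking `kbit` from bit `j` of the mask.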
Floating Point AVX512PF Load Prefetch single-precision (32-bit) floating-point elements from memory using 64-bit indices. 32-bit elements are loaded from addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged in cache. "scale" should be 1, 2, 4 or 8. The "hint" parameter may be 1 (_MM_HINT_T0) for prefetching to L1 cache, or 2 (_MM_HINT_T1) for prefetching to L2 cache. FOR j := 0 to 7 i := j*64 m := j*64 addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8 Prefetch(MEM[addr+31:addr], hint) ENDFOR
Floating Point AVX512PF Load Prefetch single-precision (32-bit) floating-point elements from memory using 64-bit indices. 32-bit elements are loaded from addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged in cache using writemask "k" (elements are only brought into cache when their corresponding mask bit is set). "scale" should be 1, 2, 4 or 8. The "hint" parameter may be 1 (_MM_HINT_T0) for prefetching to L1 cache, or 2 (_MM_HINT_T1) for prefetching to L2 cache. FOR j := 0 to 7 i := j*64 m := j*64 IF k[j] addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8 Prefetch(MEM[addr+31:addr], hint) FI ENDFOR
Floating Point AVX512PF Store Prefetch single-precision (32-bit) floating-point elements with intent to write into memory using 64-bit indices. Elements are prefetched into cache level "hint", where "hint" is 0 or 1. 32-bit elements are stored at addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). "scale" should be 1, 2, 4 or 8. FOR j := 0 to 7 i := j*64 m := j*64 addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8 Prefetch(MEM[addr+31:addr], hint) ENDFOR
Floating Point AVX512PF Store Prefetch single-precision (32-bit) floating-point elements with intent to write into memory using 64-bit indices. The "hint" parameter may be 1 (_MM_HINT_T0) for prefetching to L1 cache, or 2 (_MM_HINT_T1) for prefetching to L2 cache. 32-bit elements are stored at addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale") subject to mask "k" (elements are not brought into cache when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8. FOR j := 0 to 7 i := j*64 m := j*64 IF k[j] addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8 Prefetch(MEM[addr+31:addr], hint) FI ENDFOR
Floating Point AVX512PF Load Prefetch double-precision (64-bit) floating-point elements from memory using 32-bit indices. 64-bit elements are loaded from addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged in cache. "scale" should be 1, 2, 4 or 8. The "hint" parameter may be 1 (_MM_HINT_T0) for prefetching to L1 cache, or 2 (_MM_HINT_T1) for prefetching to L2 cache. FOR j := 0 to 7 i := j*64 m := j*32 addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8 Prefetch(MEM[addr+63:addr], hint) ENDFOR
Floating Point AVX512PF Load Prefetch double-precision (64-bit) floating-point elements from memory using 32-bit indices. 64-bit elements are loaded from addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged in cache using writemask "k" (elements are brought into cache only when their corresponding mask bits are set). "scale" should be 1, 2, 4 or 8. The "hint" parameter may be 1 (_MM_HINT_T0) for prefetching to L1 cache, or 2 (_MM_HINT_T1) for prefetching to L2 cache. FOR j := 0 to 7 i := j*64 m := j*32 IF k[j] addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8 Prefetch(MEM[addr+63:addr], hint) FI ENDFOR
Floating Point AVX512PF Store Prefetch double-precision (64-bit) floating-point elements with intent to write using 32-bit indices. The "hint" parameter may be 1 (_MM_HINT_T0) for prefetching to L1 cache, or 2 (_MM_HINT_T1) for prefetching to L2 cache. 64-bit elements are brought into cache from addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). "scale" should be 1, 2, 4 or 8. FOR j := 0 to 7 i := j*64 m := j*32 addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8 Prefetch(MEM[addr+63:addr], hint) ENDFOR
Floating Point AVX512PF Store Prefetch double-precision (64-bit) floating-point elements with intent to write using 32-bit indices. The "hint" parameter may be 1 (_MM_HINT_T0) for prefetching to L1 cache, or 2 (_MM_HINT_T1) for prefetching to L2 cache. 64-bit elements are brought into cache from addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale") subject to mask "k" (elements are not brought into cache when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8. FOR j := 0 to 7 i := j*64 m := j*32 IF k[j] addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8 Prefetch(MEM[addr+63:addr], hint) FI ENDFOR
Floating Point AVX512PF Load Prefetch double-precision (64-bit) floating-point elements from memory into cache level specified by "hint" using 64-bit indices. 64-bit elements are loaded from addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). "scale" should be 1, 2, 4 or 8. The "hint" parameter may be 1 (_MM_HINT_T0) for prefetching to L1 cache, or 2 (_MM_HINT_T1) for prefetching to L2 cache. FOR j := 0 to 7 i := j*64 m := j*64 addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8 Prefetch(MEM[addr+63:addr], hint) ENDFOR
Floating Point AVX512PF Load Prefetch double-precision (64-bit) floating-point elements from memory into cache level specified by "hint" using 64-bit indices. 64-bit elements are loaded from addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). Prefetched elements are merged in cache using writemask "k" (elements are copied from memory when the corresponding mask bit is set). "scale" should be 1, 2, 4 or 8. The "hint" parameter may be 1 (_MM_HINT_T0) for prefetching to L1 cache, or 2 (_MM_HINT_T1) for prefetching to L2 cache. FOR j := 0 to 7 i := j*64 m := j*64 IF k[j] addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8 Prefetch(MEM[addr+63:addr], hint) FI ENDFOR
Floating Point AVX512PF Store Prefetch double-precision (64-bit) floating-point elements with intent to write into memory using 64-bit indices. The "hint" parameter may be 1 (_MM_HINT_T0) for prefetching to L1 cache, or 2 (_MM_HINT_T1) for prefetching to L2 cache. 64-bit elements are brought into cache from addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale"). "scale" should be 1, 2, 4 or 8. FOR j := 0 to 7 i := j*64 m := j*64 addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8 Prefetch(MEM[addr+63:addr], hint) ENDFOR
Floating Point AVX512PF Store Prefetch double-precision (64-bit) floating-point elements with intent to write into memory using 64-bit indices. The "hint" parameter may be 1 (_MM_HINT_T0) for prefetching to L1 cache, or 2 (_MM_HINT_T1) for prefetching to L2 cache. 64-bit elements are brought into cache from addresses starting at "base_addr" and offset by each 64-bit element in "vindex" (each index is scaled by the factor in "scale") subject to mask "k" (elements are not brought into cache when the corresponding mask bit is not set). "scale" should be 1, 2, 4 or 8. FOR j := 0 to 7 i := j*64 m := j*64 IF k[j] addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8 Prefetch(MEM[addr+63:addr], hint) FI ENDFOR
Floating Point AVX512PF/KNCNI Load Prefetch single-precision (32-bit) floating-point elements from memory using 32-bit indices. 32-bit elements are loaded from addresses starting at "base_addr" and offset by each 32-bit element in "vindex" (each index is scaled by the factor in "scale"). Gathered elements are merged in cache using writemask "k" (elements are brought into cache only when their corresponding mask bits are set). "scale" should be 1, 2, 4 or 8. The "hint" parameter may be 1 (_MM_HINT_T0) for prefetching to L1 cache, or 2 (_MM_HINT_T1) for prefetching to L2 cache. FOR j := 0 to 15 i := j*32 m := j*32 IF k[j] addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8 Prefetch(MEM[addr+31:addr], hint) FI ENDFOR
Floating Point AVX512PF/KNCNI Load Prefetches a set of 16 single-precision (32-bit) memory locations pointed to by base address "base_addr" and 32-bit integer index vector "vindex" with scale "scale" to L1 or L2 level of cache depending on the value of "hint". The "hint" parameter may be 1 (_MM_HINT_T0) for prefetching to L1 cache, or 2 (_MM_HINT_T1) for prefetching to L2 cache. The "conv" parameter specifies the granularity used by compilers to better encode the instruction. It should be the same as the "conv" parameter specified for the subsequent gather intrinsic. FOR j := 0 to 15 i := j*32 m := j*32 addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8 Prefetch(MEM[addr+31:addr], hint) ENDFOR
Floating Point AVX512PF/KNCNI Load Prefetches a set of 16 single-precision (32-bit) memory locations pointed to by base address "base_addr" and 32-bit integer index vector "vindex" with scale "scale" to L1 or L2 level of cache depending on the value of "hint". Gathered elements are merged in cache using writemask "k" (elements are brought into cache only when their corresponding mask bits are set). The "hint" parameter may be 1 (_MM_HINT_T0) for prefetching to L1 cache, or 2 (_MM_HINT_T1) for prefetching to L2 cache. The "conv" parameter specifies the granularity used by compilers to better encode the instruction. It should be the same as the "conv" parameter specified for the subsequent gather intrinsic. FOR j := 0 to 15 i := j*32 m := j*32 IF k[j] addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8 Prefetch(MEM[addr+31:addr], hint) FI ENDFOR
Floating Point AVX512PF/KNCNI Store Prefetches a set of 16 single-precision (32-bit) memory locations pointed to by base address "base_addr" and 32-bit integer index vector "vindex" with scale "scale" to L1 or L2 level of cache depending on the value of "hint", with a request for exclusive ownership. The "hint" parameter may be one of the following: _MM_HINT_T0 = 1 for prefetching to L1 cache, _MM_HINT_T1 = 2 for prefetching to L2 cache, _MM_HINT_T2 = 3 for prefetching to L2 cache non-temporal, _MM_HINT_NTA = 0 for prefetching to L1 cache non-temporal. The "conv" parameter specifies the granularity used by compilers to better encode the instruction. It should be the same as the "conv" parameter specified for the subsequent scatter intrinsic. FOR j := 0 to 15 i := j*32 m := j*32 addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8 Prefetch(MEM[addr+31:addr], hint) ENDFOR
Floating Point AVX512PF/KNCNI Store Prefetches a set of 16 single-precision (32-bit) memory locations pointed by base address "base_addr" and 32-bit integer index vector "vindex" with scale "scale" to L1 or L2 level of cache depending on the value of "hint". The "hint" parameter may be 1 (_MM_HINT_T0) for prefetching to L1 cache, or 2 (_MM_HINT_T1) for prefetching to L2 cache. The "conv" parameter specifies the granularity used by compilers to better encode the instruction. It should be the same as the "conv" parameter specified for the subsequent gather intrinsic. Only those elements whose corresponding mask bit in "k" is set are loaded into cache. cachev := 0 FOR j := 0 to 15 i := j*32 m := j*32 IF k[j] addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8 Prefetch(MEM[addr+31:addr], hint) FI ENDFOR
Floating Point AVX512PF/KNCNI Load Prefetches 16 single-precision (32-bit) floating-point elements in memory starting at location "base_addr" at packed 32-bit integer indices stored in "vindex" scaled by "scale". The "hint" parameter may be 1 (_MM_HINT_T0) for prefetching to L1 cache, or 2 (_MM_HINT_T1) for prefetching to L2 cache. FOR j := 0 to 15 i := j*32 m := j*32 addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8 Prefetch(MEM[addr+31:addr], hint) ENDFOR
Floating Point AVX512PF/KNCNI Store Prefetches 16 single-precision (32-bit) floating-point elements in memory starting at location "base_addr" at packed 32-bit integer indices stored in "vindex" scaled by "scale". The "hint" parameter may be 1 (_MM_HINT_T0) for prefetching to L1 cache, or 2 (_MM_HINT_T1) for prefetching to L2 cache. FOR j := 0 to 15 i := j*32 m := j*32 addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8 Prefetch(MEM[addr+31:addr], hint) ENDFOR
Floating Point AVX512PF/KNCNI Store Prefetches 16 single-precision (32-bit) floating-point elements in memory starting at location "base_addr" at packed 32-bit integer indices stored in "vindex" scaled by "scale". The "hint" parameter may be 1 (_MM_HINT_T0) for prefetching to L1 cache, or 2 (_MM_HINT_T1) for prefetching to L2 cache. Only those elements whose corresponding mask bit in "k" is set are loaded into the desired cache. FOR j := 0 to 15 i := j*32 m := j*32 IF k[j] addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8 Prefetch(MEM[addr+31:addr], hint) FI ENDFOR
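The address arithmetic in the gather and scatter prefetch entries above can be illustrated with a small scalar model. It is a sketch under assumptions: the function names are hypothetical, it covers only the 32-bit-index, 16-lane masked form, and it works in byte offsets, reading the trailing `* 8` in the pseudocode as a byte-to-bit conversion for the bit-indexed MEM[] notation (an interpretation, not something the entries state). No prefetch is issued; the code only lists which offsets a masked prefetch would touch.

```c
#include <stdint.h>

/* Byte offset of one lane: the 32-bit index is sign-extended to 64 bits,
   then multiplied by scale (expected to be 1, 2, 4 or 8). */
static inline int64_t lane_byte_offset_i32(int32_t index, int scale)
{
    return (int64_t)index * scale;
}

/* Collect the byte offsets (relative to base_addr) of every lane whose
   mask bit is set. Returns the number of offsets written to out. */
static inline int masked_prefetch_offsets(const int32_t vindex[16], uint16_t k,
                                          int scale, int64_t out[16])
{
    int count = 0;
    for (int j = 0; j < 16; j++) {
        if ((k >> j) & 1) {               /* lane j participates only if k[j] is set */
            out[count++] = lane_byte_offset_i32(vindex[j], scale);
        }
    }
    return count;
}
```

The unmasked forms correspond to calling the model with all sixteen mask bits set; the 64-bit-index forms would use eight lanes and skip the sign extension.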
Integer AVX512VPOPCNTDQ AVX512VL Bit Manipulation Count the number of logical 1 bits in packed 64-bit integers in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). DEFINE POPCNT(a) { count := 0 DO WHILE a > 0 count += a[0] a >>= 1 OD RETURN count } FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := POPCNT(a[i+63:i]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512VPOPCNTDQ AVX512VL Bit Manipulation Count the number of logical 1 bits in packed 64-bit integers in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). DEFINE POPCNT(a) { count := 0 DO WHILE a > 0 count += a[0] a >>= 1 OD RETURN count } FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := POPCNT(a[i+63:i]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VPOPCNTDQ AVX512VL Bit Manipulation Count the number of logical 1 bits in packed 64-bit integers in "a", and store the results in "dst". DEFINE POPCNT(a) { count := 0 DO WHILE a > 0 count += a[0] a >>= 1 OD RETURN count } FOR j := 0 to 3 i := j*64 dst[i+63:i] := POPCNT(a[i+63:i]) ENDFOR dst[MAX:256] := 0
Integer AVX512VPOPCNTDQ AVX512VL Bit Manipulation Count the number of logical 1 bits in packed 64-bit integers in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). DEFINE POPCNT(a) { count := 0 DO WHILE a > 0 count += a[0] a >>= 1 OD RETURN count } FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := POPCNT(a[i+63:i]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512VPOPCNTDQ AVX512VL Bit Manipulation Count the number of logical 1 bits in packed 64-bit integers in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). DEFINE POPCNT(a) { count := 0 DO WHILE a > 0 count += a[0] a >>= 1 OD RETURN count } FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := POPCNT(a[i+63:i]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VPOPCNTDQ AVX512VL Bit Manipulation Count the number of logical 1 bits in packed 64-bit integers in "a", and store the results in "dst". DEFINE POPCNT(a) { count := 0 DO WHILE a > 0 count += a[0] a >>= 1 OD RETURN count } FOR j := 0 to 1 i := j*64 dst[i+63:i] := POPCNT(a[i+63:i]) ENDFOR dst[MAX:128] := 0
Integer AVX512VPOPCNTDQ AVX512VL Bit Manipulation Count the number of logical 1 bits in packed 32-bit integers in "a", and store the results in "dst". DEFINE POPCNT(a) { count := 0 DO WHILE a > 0 count += a[0] a >>= 1 OD RETURN count } FOR j := 0 to 7 i := j*32 dst[i+31:i] := POPCNT(a[i+31:i]) ENDFOR dst[MAX:256] := 0
Integer AVX512VPOPCNTDQ AVX512VL Bit Manipulation Count the number of logical 1 bits in packed 32-bit integers in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). DEFINE POPCNT(a) { count := 0 DO WHILE a > 0 count += a[0] a >>= 1 OD RETURN count } FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := POPCNT(a[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512VPOPCNTDQ AVX512VL Bit Manipulation Count the number of logical 1 bits in packed 32-bit integers in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). DEFINE POPCNT(a) { count := 0 DO WHILE a > 0 count += a[0] a >>= 1 OD RETURN count } FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := POPCNT(a[i+31:i]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512VPOPCNTDQ AVX512VL Bit Manipulation Count the number of logical 1 bits in packed 32-bit integers in "a", and store the results in "dst". DEFINE POPCNT(a) { count := 0 DO WHILE a > 0 count += a[0] a >>= 1 OD RETURN count } FOR j := 0 to 3 i := j*32 dst[i+31:i] := POPCNT(a[i+31:i]) ENDFOR dst[MAX:128] := 0
Integer AVX512VPOPCNTDQ AVX512VL Bit Manipulation Count the number of logical 1 bits in packed 32-bit integers in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). DEFINE POPCNT(a) { count := 0 DO WHILE a > 0 count += a[0] a >>= 1 OD RETURN count } FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := POPCNT(a[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512VPOPCNTDQ AVX512VL Bit Manipulation Count the number of logical 1 bits in packed 32-bit integers in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). DEFINE POPCNT(a) { count := 0 DO WHILE a > 0 count += a[0] a >>= 1 OD RETURN count } FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := POPCNT(a[i+31:i]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512VPOPCNTDQ Bit Manipulation Count the number of logical 1 bits in packed 32-bit integers in "a", and store the results in "dst". DEFINE POPCNT(a) { count := 0 DO WHILE a > 0 count += a[0] a >>= 1 OD RETURN count } FOR j := 0 to 15 i := j*32 dst[i+31:i] := POPCNT(a[i+31:i]) ENDFOR dst[MAX:512] := 0
Integer AVX512VPOPCNTDQ Bit Manipulation Count the number of logical 1 bits in packed 32-bit integers in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). DEFINE POPCNT(a) { count := 0 DO WHILE a > 0 count += a[0] a >>= 1 OD RETURN count } FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := POPCNT(a[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512VPOPCNTDQ Bit Manipulation Count the number of logical 1 bits in packed 32-bit integers in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). DEFINE POPCNT(a) { count := 0 DO WHILE a > 0 count += a[0] a >>= 1 OD RETURN count } FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := POPCNT(a[i+31:i]) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512VPOPCNTDQ Bit Manipulation Count the number of logical 1 bits in packed 64-bit integers in "a", and store the results in "dst". DEFINE POPCNT(a) { count := 0 DO WHILE a > 0 count += a[0] a >>= 1 OD RETURN count } FOR j := 0 to 7 i := j*64 dst[i+63:i] := POPCNT(a[i+63:i]) ENDFOR dst[MAX:512] := 0
Integer AVX512VPOPCNTDQ Bit Manipulation Count the number of logical 1 bits in packed 64-bit integers in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). DEFINE POPCNT(a) { count := 0 DO WHILE a > 0 count += a[0] a >>= 1 OD RETURN count } FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := POPCNT(a[i+63:i]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512VPOPCNTDQ Bit Manipulation Count the number of logical 1 bits in packed 64-bit integers in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). DEFINE POPCNT(a) { count := 0 DO WHILE a > 0 count += a[0] a >>= 1 OD RETURN count } FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := POPCNT(a[i+63:i]) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
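The POPCNT definition repeated in the entries above translates almost line for line into C. The following is a scalar sketch of one lane, with hypothetical helper names, showing the bit-counting loop and how the writemask and zeromask forms differ only in what an inactive lane receives.

```c
#include <stdint.h>

/* Direct transcription of the pseudocode loop: add the low bit, shift right. */
static inline uint64_t popcnt_u64(uint64_t a)
{
    uint64_t count = 0;
    while (a > 0) {
        count += a & 1;
        a >>= 1;
    }
    return count;
}

/* 32-bit lanes use the same loop on a narrower input. */
static inline uint32_t popcnt_u32(uint32_t a)
{
    return (uint32_t)popcnt_u64(a);
}

/* Writemask lane: an inactive lane is copied from src. */
static inline uint64_t popcnt_lane_mask(uint64_t src, int kbit, uint64_t a)
{
    return kbit ? popcnt_u64(a) : src;
}

/* Zeromask lane: an inactive lane is set to zero. */
static inline uint64_t popcnt_lane_maskz(int kbit, uint64_t a)
{
    return kbit ? popcnt_u64(a) : 0;
}
```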
Floating Point AVX512_4FMAPS Arithmetic Multiply packed single-precision (32-bit) floating-point elements specified in 4 consecutive operands "a0" through "a3" by the 4 corresponding packed elements in "b", accumulate with the corresponding elements in "src", and store the results in "dst". dst[511:0] := src[511:0] FOR i := 0 to 15 FOR m := 0 to 3 addr := b + m * 32 dst.fp32[i] := dst.fp32[i] + a{m}.fp32[i] * Cast_FP32(MEM[addr+31:addr]) ENDFOR ENDFOR dst[MAX:512] := 0
Floating Point AVX512_4FMAPS Arithmetic Multiply packed single-precision (32-bit) floating-point elements specified in 4 consecutive operands "a0" through "a3" by the 4 corresponding packed elements in "b", accumulate with the corresponding elements in "src", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). dst[511:0] := src[511:0] FOR i := 0 to 15 FOR m := 0 to 3 addr := b + m * 32 IF k[i] dst.fp32[i] := dst.fp32[i] + a{m}.fp32[i] * Cast_FP32(MEM[addr+31:addr]) FI ENDFOR ENDFOR dst[MAX:512] := 0
Floating Point AVX512_4FMAPS Arithmetic Multiply packed single-precision (32-bit) floating-point elements specified in 4 consecutive operands "a0" through "a3" by the 4 corresponding packed elements in "b", accumulate with the corresponding elements in "src", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). dst[511:0] := src[511:0] FOR i := 0 to 15 FOR m := 0 to 3 addr := b + m * 32 IF k[i] dst.fp32[i] := dst.fp32[i] + a{m}.fp32[i] * Cast_FP32(MEM[addr+31:addr]) ELSE dst.fp32[i] := 0 FI ENDFOR ENDFOR dst[MAX:512] := 0
Floating Point AVX512_4FMAPS Arithmetic Multiply packed single-precision (32-bit) floating-point elements specified in 4 consecutive operands "a0" through "a3" by the 4 corresponding packed elements in "b", accumulate the negated intermediate result with the corresponding elements in "src", and store the results in "dst". dst[511:0] := src[511:0] FOR i := 0 to 15 FOR m := 0 to 3 addr := b + m * 32 dst.fp32[i] := dst.fp32[i] - a{m}.fp32[i] * Cast_FP32(MEM[addr+31:addr]) ENDFOR ENDFOR dst[MAX:512] := 0
Floating Point AVX512_4FMAPS Arithmetic Multiply packed single-precision (32-bit) floating-point elements specified in 4 consecutive operands "a0" through "a3" by the 4 corresponding packed elements in "b", accumulate the negated intermediate result with the corresponding elements in "src", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). dst[511:0] := src[511:0] FOR i := 0 to 15 FOR m := 0 to 3 addr := b + m * 32 IF k[i] dst.fp32[i] := dst.fp32[i] - a{m}.fp32[i] * Cast_FP32(MEM[addr+31:addr]) FI ENDFOR ENDFOR dst[MAX:512] := 0
Floating Point AVX512_4FMAPS Arithmetic Multiply packed single-precision (32-bit) floating-point elements specified in 4 consecutive operands "a0" through "a3" by the 4 corresponding packed elements in "b", accumulate the negated intermediate result with the corresponding elements in "src", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). dst[511:0] := src[511:0] FOR i := 0 to 15 FOR m := 0 to 3 addr := b + m * 32 IF k[i] dst.fp32[i] := dst.fp32[i] - a{m}.fp32[i] * Cast_FP32(MEM[addr+31:addr]) ELSE dst.fp32[i] := 0 FI ENDFOR ENDFOR dst[MAX:512] := 0
Floating Point AVX512_4FMAPS Arithmetic Multiply the lower single-precision (32-bit) floating-point elements specified in 4 consecutive operands "a0" through "a3" by corresponding element in "b", accumulate with the lower element in "src", and store the result in the lower element of "dst". dst[127:0] := src[127:0] FOR m := 0 to 3 addr := b + m * 32 dst.fp32[0] := dst.fp32[0] + a{m}.fp32[0] * Cast_FP32(MEM[addr+31:addr]) ENDFOR dst[MAX:128] := 0
Floating Point AVX512_4FMAPS Arithmetic Multiply the lower single-precision (32-bit) floating-point elements specified in 4 consecutive operands "a0" through "a3" by corresponding element in "b", accumulate with the lower element in "src", and store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set). dst[127:0] := src[127:0] IF k[0] FOR m := 0 to 3 addr := b + m * 32 dst.fp32[0] := dst.fp32[0] + a{m}.fp32[0] * Cast_FP32(MEM[addr+31:addr]) ENDFOR FI dst[MAX:128] := 0
Floating Point AVX512_4FMAPS Arithmetic Multiply the lower single-precision (32-bit) floating-point elements specified in 4 consecutive operands "a0" through "a3" by corresponding element in "b", accumulate with the lower element in "src", and store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set). dst[127:0] := src[127:0] IF k[0] FOR m := 0 to 3 addr := b + m * 32 dst.fp32[0] := dst.fp32[0] + a{m}.fp32[0] * Cast_FP32(MEM[addr+31:addr]) ENDFOR ELSE dst.fp32[0] := 0 FI dst[MAX:128] := 0
Floating Point AVX512_4FMAPS Arithmetic Multiply the lower single-precision (32-bit) floating-point elements specified in 4 consecutive operands "a0" through "a3" by corresponding element in "b", accumulate the negated intermediate result with the lower element in "src", and store the result in the lower element of "dst". dst[127:0] := src[127:0] FOR m := 0 to 3 addr := b + m * 32 dst.fp32[0] := dst.fp32[0] - a{m}.fp32[0] * Cast_FP32(MEM[addr+31:addr]) ENDFOR dst[MAX:128] := 0
Floating Point AVX512_4FMAPS Arithmetic Multiply the lower single-precision (32-bit) floating-point elements specified in 4 consecutive operands "a0" through "a3" by corresponding element in "b", accumulate the negated intermediate result with the lower element in "src", and store the result in the lower element of "dst" using writemask "k" (the element is copied from "src" when mask bit 0 is not set). dst[127:0] := src[127:0] IF k[0] FOR m := 0 to 3 addr := b + m * 32 dst.fp32[0] := dst.fp32[0] - a{m}.fp32[0] * Cast_FP32(MEM[addr+31:addr]) ENDFOR FI dst[MAX:128] := 0
Floating Point AVX512_4FMAPS Arithmetic Multiply the lower single-precision (32-bit) floating-point elements specified in 4 consecutive operands "a0" through "a3" by corresponding element in "b", accumulate the negated intermediate result with the lower element in "src", and store the result in the lower element of "dst" using zeromask "k" (the element is zeroed out when mask bit 0 is not set). dst[127:0] := src[127:0] IF k[0] FOR m := 0 to 3 addr := b + m * 32 dst.fp32[0] := dst.fp32[0] - a{m}.fp32[0] * Cast_FP32(MEM[addr+31:addr]) ENDFOR ELSE dst.fp32[0] := 0 FI dst[MAX:128] := 0
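The four-step accumulation in the AVX512_4FMAPS entries above can be sketched for a single lane as follows. The function names are hypothetical; `a[m]` stands for the lane's element of operand a{m}, and `b[m]` for the m-th 32-bit value read from the memory that "b" points to. The sketch uses a separate multiply and add per step, so its rounding may differ from hardware if the instruction fuses them; that detail is an assumption, not something the entries specify.

```c
/* dst lane = src + sum over m of a{m} * b[m], accumulated in order m = 0..3. */
static inline float fmadd4_lane(float src, const float a[4], const float b[4])
{
    float acc = src;
    for (int m = 0; m < 4; m++) {
        acc = acc + a[m] * b[m];
    }
    return acc;
}

/* Negated form: each product is subtracted from the accumulator. */
static inline float fnmadd4_lane(float src, const float a[4], const float b[4])
{
    float acc = src;
    for (int m = 0; m < 4; m++) {
        acc = acc - a[m] * b[m];
    }
    return acc;
}

/* Writemask lane: an inactive lane keeps the value from src. */
static inline float fmadd4_lane_mask(float src, int kbit, const float a[4], const float b[4])
{
    return kbit ? fmadd4_lane(src, a, b) : src;
}

/* Zeromask lane: an inactive lane is zeroed. */
static inline float fmadd4_lane_maskz(float src, int kbit, const float a[4], const float b[4])
{
    return kbit ? fmadd4_lane(src, a, b) : 0.0f;
}
```

In this model the packed forms run the lane function for lanes 0 to 15 with the same four `b` values, while the scalar forms run it for lane 0 only.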
Integer AVX512_4VNNIW Arithmetic Compute 4 sequential operand source-block dot-products of two signed 16-bit element operands with 32-bit element accumulation, and store the results in "dst". dst[511:0] := src[511:0] FOR i := 0 to 15 FOR m := 0 to 3 lim_base := b + m*32 t.dword := MEM[lim_base+31:lim_base] p1.dword := SignExtend32(a{m}.word[2*i+0]) * SignExtend32(Cast_Int16(t.word[0])) p2.dword := SignExtend32(a{m}.word[2*i+1]) * SignExtend32(Cast_Int16(t.word[1])) dst.dword[i] := dst.dword[i] + p1.dword + p2.dword ENDFOR ENDFOR dst[MAX:512] := 0
Integer AVX512_4VNNIW Arithmetic Compute 4 sequential operand source-block dot-products of two signed 16-bit element operands with 32-bit element accumulation with mask, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). dst[511:0] := src[511:0] FOR i := 0 to 15 IF k[i] FOR m := 0 to 3 lim_base := b + m*32 t.dword := MEM[lim_base+31:lim_base] p1.dword := SignExtend32(a{m}.word[2*i+0]) * SignExtend32(Cast_Int16(t.word[0])) p2.dword := SignExtend32(a{m}.word[2*i+1]) * SignExtend32(Cast_Int16(t.word[1])) dst.dword[i] := dst.dword[i] + p1.dword + p2.dword ENDFOR ELSE dst.dword[i] := src.dword[i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512_4VNNIW Arithmetic Compute 4 sequential operand source-block dot-products of two signed 16-bit element operands with 32-bit element accumulation with mask, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). dst[511:0] := src[511:0] FOR i := 0 to 15 IF k[i] FOR m := 0 to 3 lim_base := b + m*32 t.dword := MEM[lim_base+31:lim_base] p1.dword := SignExtend32(a{m}.word[2*i+0]) * SignExtend32(Cast_Int16(t.word[0])) p2.dword := SignExtend32(a{m}.word[2*i+1]) * SignExtend32(Cast_Int16(t.word[1])) dst.dword[i] := dst.dword[i] + p1.dword + p2.dword ENDFOR ELSE dst.dword[i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512_4VNNIW Arithmetic Compute 4 sequential operand source-block dot-products of two signed 16-bit element operands with 32-bit element accumulation and signed saturation, and store the results in "dst". dst[511:0] := src[511:0] FOR i := 0 to 15 FOR m := 0 to 3 lim_base := b + m*32 t.dword := MEM[lim_base+31:lim_base] p1.dword := SignExtend32(a{m}.word[2*i+0]) * SignExtend32(Cast_Int16(t.word[0])) p2.dword := SignExtend32(a{m}.word[2*i+1]) * SignExtend32(Cast_Int16(t.word[1])) dst.dword[i] := Saturate32(dst.dword[i] + p1.dword + p2.dword) ENDFOR ENDFOR dst[MAX:512] := 0
Integer AVX512_4VNNIW Arithmetic Compute 4 sequential operand source-block dot-products of two signed 16-bit element operands with 32-bit element accumulation with mask and signed saturation, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). dst[511:0] := src[511:0] FOR i := 0 to 15 IF k[i] FOR m := 0 to 3 lim_base := b + m*32 t.dword := MEM[lim_base+31:lim_base] p1.dword := SignExtend32(a{m}.word[2*i+0]) * SignExtend32(Cast_Int16(t.word[0])) p2.dword := SignExtend32(a{m}.word[2*i+1]) * SignExtend32(Cast_Int16(t.word[1])) dst.dword[i] := Saturate32(dst.dword[i] + p1.dword + p2.dword) ENDFOR ELSE dst.dword[i] := src.dword[i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512_4VNNIW Arithmetic Compute 4 sequential operand source-block dot-products of two signed 16-bit element operands with 32-bit element accumulation with mask and signed saturation, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). dst[511:0] := src[511:0] FOR i := 0 to 15 IF k[i] FOR m := 0 to 3 lim_base := b + m*32 t.dword := MEM[lim_base+31:lim_base] p1.dword := SignExtend32(a{m}.word[2*i+0]) * SignExtend32(Cast_Int16(t.word[0])) p2.dword := SignExtend32(a{m}.word[2*i+1]) * SignExtend32(Cast_Int16(t.word[1])) dst.dword[i] := Saturate32(dst.dword[i] + p1.dword + p2.dword) ENDFOR ELSE dst.dword[i] := 0 FI ENDFOR dst[MAX:512] := 0
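The AVX512_4VNNIW entries above differ only in whether the running sum wraps or saturates. A scalar model of one 32-bit lane makes the difference concrete. The names below are hypothetical: `a[m]` holds the lane's pair of signed 16-bit words from operand a{m}, and `b[m]` holds the pair taken from the m-th dword in memory. The wrapping variant adds in unsigned arithmetic to avoid signed overflow in C; converting the result back to `int32_t` is assumed to wrap, as it does on common compilers.

```c
#include <stdint.h>

/* Clamp a 64-bit value to the signed 32-bit range. */
static inline int32_t saturate32(int64_t x)
{
    if (x > INT32_MAX) return INT32_MAX;
    if (x < INT32_MIN) return INT32_MIN;
    return (int32_t)x;
}

/* Wrapping form: dst += a[m][0]*b[m][0] + a[m][1]*b[m][1] for m = 0..3. */
static inline int32_t dpwssd4_lane(int32_t src, const int16_t a[4][2], const int16_t b[4][2])
{
    uint32_t acc = (uint32_t)src;
    for (int m = 0; m < 4; m++) {
        int32_t p1 = (int32_t)a[m][0] * (int32_t)b[m][0];  /* fits in 32 bits */
        int32_t p2 = (int32_t)a[m][1] * (int32_t)b[m][1];
        acc += (uint32_t)p1 + (uint32_t)p2;                /* modulo 2^32 */
    }
    return (int32_t)acc;  /* assumed to wrap for values above INT32_MAX */
}

/* Saturating form: the sum is clamped after every step, as in the pseudocode. */
static inline int32_t dpwssds4_lane(int32_t src, const int16_t a[4][2], const int16_t b[4][2])
{
    int32_t acc = src;
    for (int m = 0; m < 4; m++) {
        int64_t p1 = (int64_t)a[m][0] * b[m][0];
        int64_t p2 = (int64_t)a[m][1] * b[m][1];
        acc = saturate32((int64_t)acc + p1 + p2);
    }
    return acc;
}
```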
Floating Point AVX512_BF16 AVX512VL Convert Convert packed single-precision (32-bit) floating-point elements in two vectors "a" and "b" to packed BF16 (16-bit) floating-point elements, and store the results in single vector "dst". FOR j := 0 to 7 IF j < 4 t := b.fp32[j] ELSE t := a.fp32[j-4] FI dst.word[j] := Convert_FP32_To_BF16(t) ENDFOR dst[MAX:128] := 0
Floating Point AVX512_BF16 AVX512VL Convert Convert packed single-precision (32-bit) floating-point elements in two vectors "a" and "b" to packed BF16 (16-bit) floating-point elements, and store the results in single vector "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 IF k[j] IF j < 4 t := b.fp32[j] ELSE t := a.fp32[j-4] FI dst.word[j] := Convert_FP32_To_BF16(t) ELSE dst.word[j] := src.word[j] FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512_BF16 AVX512VL Convert Convert packed single-precision (32-bit) floating-point elements in two vectors "a" and "b" to packed BF16 (16-bit) floating-point elements, and store the results in single vector "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 IF k[j] IF j < 4 t := b.fp32[j] ELSE t := a.fp32[j-4] FI dst.word[j] := Convert_FP32_To_BF16(t) ELSE dst.word[j] := 0 FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512_BF16 AVX512VL Convert Convert packed single-precision (32-bit) floating-point elements in two vectors "a" and "b" to packed BF16 (16-bit) floating-point elements, and store the results in single vector "dst". FOR j := 0 to 15 IF j < 8 t := b.fp32[j] ELSE t := a.fp32[j-8] FI dst.word[j] := Convert_FP32_To_BF16(t) ENDFOR dst[MAX:256] := 0
Floating Point AVX512_BF16 AVX512VL Convert Convert packed single-precision (32-bit) floating-point elements in two vectors "a" and "b" to packed BF16 (16-bit) floating-point elements, and store the results in single vector "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 IF k[j] IF j < 8 t := b.fp32[j] ELSE t := a.fp32[j-8] FI dst.word[j] := Convert_FP32_To_BF16(t) ELSE dst.word[j] := src.word[j] FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512_BF16 AVX512VL Convert Convert packed single-precision (32-bit) floating-point elements in two vectors "a" and "b" to packed BF16 (16-bit) floating-point elements, and store the results in single vector "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 IF k[j] IF j < 8 t := b.fp32[j] ELSE t := a.fp32[j-8] FI dst.word[j] := Convert_FP32_To_BF16(t) ELSE dst.word[j] := 0 FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512_BF16 AVX512F Convert Convert packed single-precision (32-bit) floating-point elements in two vectors "a" and "b" to packed BF16 (16-bit) floating-point elements, and store the results in single vector "dst". FOR j := 0 to 31 IF j < 16 t := b.fp32[j] ELSE t := a.fp32[j-16] FI dst.word[j] := Convert_FP32_To_BF16(t) ENDFOR dst[MAX:512] := 0
Floating Point AVX512_BF16 AVX512F Convert Convert packed single-precision (32-bit) floating-point elements in two vectors "a" and "b" to packed BF16 (16-bit) floating-point elements, and store the results in single vector "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 31 IF k[j] IF j < 16 t := b.fp32[j] ELSE t := a.fp32[j-16] FI dst.word[j] := Convert_FP32_To_BF16(t) ELSE dst.word[j] := src.word[j] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512_BF16 AVX512F Convert Convert packed single-precision (32-bit) floating-point elements in two vectors "a" and "b" to packed BF16 (16-bit) floating-point elements, and store the results in single vector "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 31 IF k[j] IF j < 16 t := b.fp32[j] ELSE t := a.fp32[j-16] FI dst.word[j] := Convert_FP32_To_BF16(t) ELSE dst.word[j] := 0 FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512_BF16 AVX512VL Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed BF16 (16-bit) floating-point elements, and store the results in "dst". FOR j := 0 to 3 dst.word[j] := Convert_FP32_To_BF16(a.fp32[j]) ENDFOR dst[MAX:128] := 0
Floating Point AVX512_BF16 AVX512VL Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed BF16 (16-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 IF k[j] dst.word[j] := Convert_FP32_To_BF16(a.fp32[j]) ELSE dst.word[j] := src.word[j] FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512_BF16 AVX512VL Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed BF16 (16-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 IF k[j] dst.word[j] := Convert_FP32_To_BF16(a.fp32[j]) ELSE dst.word[j] := 0 FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512_BF16 AVX512VL Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed BF16 (16-bit) floating-point elements, and store the results in "dst". FOR j := 0 to 7 dst.word[j] := Convert_FP32_To_BF16(a.fp32[j]) ENDFOR dst[MAX:128] := 0
Floating Point AVX512_BF16 AVX512VL Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed BF16 (16-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 IF k[j] dst.word[j] := Convert_FP32_To_BF16(a.fp32[j]) ELSE dst.word[j] := src.word[j] FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512_BF16 AVX512VL Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed BF16 (16-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 IF k[j] dst.word[j] := Convert_FP32_To_BF16(a.fp32[j]) ELSE dst.word[j] := 0 FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512_BF16 AVX512F Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed BF16 (16-bit) floating-point elements, and store the results in "dst". FOR j := 0 to 15 dst.word[j] := Convert_FP32_To_BF16(a.fp32[j]) ENDFOR dst[MAX:256] := 0
Floating Point AVX512_BF16 AVX512F Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed BF16 (16-bit) floating-point elements, and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 IF k[j] dst.word[j] := Convert_FP32_To_BF16(a.fp32[j]) ELSE dst.word[j] := src.word[j] FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512_BF16 AVX512F Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed BF16 (16-bit) floating-point elements, and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 IF k[j] dst.word[j] := Convert_FP32_To_BF16(a.fp32[j]) ELSE dst.word[j] := 0 FI ENDFOR dst[MAX:256] := 0
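The two-source and single-source FP32-to-BF16 conversions above can be sketched in scalar Python. The pseudocode does not define Convert_FP32_To_BF16, so this sketch assumes round-to-nearest-even on the upper 16 bits and quiets NaNs; it does not model any hardware treatment of denormals. The names `fp32_to_bf16`, `cvtne2ps_pbh` and `cvtneps_pbh` are illustrative.

```python
import struct

def fp32_to_bf16(x):
    # Assumed conversion: round-to-nearest-even, keeping the top 16 bits.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    if (bits & 0x7F800000) == 0x7F800000 and (bits & 0x007FFFFF):
        return ((bits >> 16) | 0x0040) & 0xFFFF  # keep NaN, set quiet bit
    rounding = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding) >> 16) & 0xFFFF

def cvtne2ps_pbh(a, b, k=None, src=None, zero=False):
    # Elements of b fill the low half of dst, elements of a the high half.
    n = len(a)
    dst = []
    for j in range(2 * n):
        if k is not None and not (k >> j) & 1:
            dst.append(0 if zero else src[j])
        else:
            dst.append(fp32_to_bf16(b[j] if j < n else a[j - n]))
    return dst

def cvtneps_pbh(a, k=None, src=None, zero=False):
    # Single-source form: one BF16 word per FP32 element.
    dst = []
    for j, x in enumerate(a):
        if k is not None and not (k >> j) & 1:
            dst.append(0 if zero else src[j])
        else:
            dst.append(fp32_to_bf16(x))
    return dst
```

The point worth noticing in the two-source form is the ordering: "b" supplies the low elements even though it is the second operand.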
Floating Point AVX512_BF16 AVX512VL Arithmetic Compute dot-product of BF16 (16-bit) floating-point pairs in "a" and "b", accumulating the intermediate single-precision (32-bit) floating-point elements with elements in "src", and store the results in "dst". DEFINE make_fp32(x[15:0]) { y.fp32 := 0.0 y[31:16] := x[15:0] RETURN y } dst := src FOR j := 0 to 3 dst.fp32[j] += make_fp32(a.bf16[2*j+1]) * make_fp32(b.bf16[2*j+1]) dst.fp32[j] += make_fp32(a.bf16[2*j+0]) * make_fp32(b.bf16[2*j+0]) ENDFOR dst[MAX:128] := 0
Floating Point AVX512_BF16 AVX512VL Arithmetic Compute dot-product of BF16 (16-bit) floating-point pairs in "a" and "b", accumulating the intermediate single-precision (32-bit) floating-point elements with elements in "src", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). DEFINE make_fp32(x[15:0]) { y.fp32 := 0.0 y[31:16] := x[15:0] RETURN y } dst := src FOR j := 0 to 3 IF k[j] dst.fp32[j] += make_fp32(a.bf16[2*j+1]) * make_fp32(b.bf16[2*j+1]) dst.fp32[j] += make_fp32(a.bf16[2*j+0]) * make_fp32(b.bf16[2*j+0]) ELSE dst.dword[j] := src.dword[j] FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512_BF16 AVX512VL Arithmetic Compute dot-product of BF16 (16-bit) floating-point pairs in "a" and "b", accumulating the intermediate single-precision (32-bit) floating-point elements with elements in "src", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). DEFINE make_fp32(x[15:0]) { y.fp32 := 0.0 y[31:16] := x[15:0] RETURN y } dst := src FOR j := 0 to 3 IF k[j] dst.fp32[j] += make_fp32(a.bf16[2*j+1]) * make_fp32(b.bf16[2*j+1]) dst.fp32[j] += make_fp32(a.bf16[2*j+0]) * make_fp32(b.bf16[2*j+0]) ELSE dst.dword[j] := 0 FI ENDFOR dst[MAX:128] := 0
Floating Point AVX512_BF16 AVX512VL Arithmetic Compute dot-product of BF16 (16-bit) floating-point pairs in "a" and "b", accumulating the intermediate single-precision (32-bit) floating-point elements with elements in "src", and store the results in "dst". DEFINE make_fp32(x[15:0]) { y.fp32 := 0.0 y[31:16] := x[15:0] RETURN y } dst := src FOR j := 0 to 7 dst.fp32[j] += make_fp32(a.bf16[2*j+1]) * make_fp32(b.bf16[2*j+1]) dst.fp32[j] += make_fp32(a.bf16[2*j+0]) * make_fp32(b.bf16[2*j+0]) ENDFOR dst[MAX:256] := 0
Floating Point AVX512_BF16 AVX512VL Arithmetic Compute dot-product of BF16 (16-bit) floating-point pairs in "a" and "b", accumulating the intermediate single-precision (32-bit) floating-point elements with elements in "src", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). DEFINE make_fp32(x[15:0]) { y.fp32 := 0.0 y[31:16] := x[15:0] RETURN y } dst := src FOR j := 0 to 7 IF k[j] dst.fp32[j] += make_fp32(a.bf16[2*j+1]) * make_fp32(b.bf16[2*j+1]) dst.fp32[j] += make_fp32(a.bf16[2*j+0]) * make_fp32(b.bf16[2*j+0]) ELSE dst.dword[j] := src.dword[j] FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512_BF16 AVX512VL Arithmetic Compute dot-product of BF16 (16-bit) floating-point pairs in "a" and "b", accumulating the intermediate single-precision (32-bit) floating-point elements with elements in "src", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). DEFINE make_fp32(x[15:0]) { y.fp32 := 0.0 y[31:16] := x[15:0] RETURN y } dst := src FOR j := 0 to 7 IF k[j] dst.fp32[j] += make_fp32(a.bf16[2*j+1]) * make_fp32(b.bf16[2*j+1]) dst.fp32[j] += make_fp32(a.bf16[2*j+0]) * make_fp32(b.bf16[2*j+0]) ELSE dst.dword[j] := 0 FI ENDFOR dst[MAX:256] := 0
Floating Point AVX512_BF16 AVX512F Arithmetic Compute dot-product of BF16 (16-bit) floating-point pairs in "a" and "b", accumulating the intermediate single-precision (32-bit) floating-point elements with elements in "src", and store the results in "dst". DEFINE make_fp32(x[15:0]) { y.fp32 := 0.0 y[31:16] := x[15:0] RETURN y } dst := src FOR j := 0 to 15 dst.fp32[j] += make_fp32(a.bf16[2*j+1]) * make_fp32(b.bf16[2*j+1]) dst.fp32[j] += make_fp32(a.bf16[2*j+0]) * make_fp32(b.bf16[2*j+0]) ENDFOR dst[MAX:512] := 0
Floating Point AVX512_BF16 AVX512F Arithmetic Compute dot-product of BF16 (16-bit) floating-point pairs in "a" and "b", accumulating the intermediate single-precision (32-bit) floating-point elements with elements in "src", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). DEFINE make_fp32(x[15:0]) { y.fp32 := 0.0 y[31:16] := x[15:0] RETURN y } dst := src FOR j := 0 to 15 IF k[j] dst.fp32[j] += make_fp32(a.bf16[2*j+1]) * make_fp32(b.bf16[2*j+1]) dst.fp32[j] += make_fp32(a.bf16[2*j+0]) * make_fp32(b.bf16[2*j+0]) ELSE dst.dword[j] := src.dword[j] FI ENDFOR dst[MAX:512] := 0
Floating Point AVX512_BF16 AVX512F Arithmetic Compute dot-product of BF16 (16-bit) floating-point pairs in "a" and "b", accumulating the intermediate single-precision (32-bit) floating-point elements with elements in "src", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). DEFINE make_fp32(x[15:0]) { y.fp32 := 0.0 y[31:16] := x[15:0] RETURN y } dst := src FOR j := 0 to 15 IF k[j] dst.fp32[j] += make_fp32(a.bf16[2*j+1]) * make_fp32(b.bf16[2*j+1]) dst.fp32[j] += make_fp32(a.bf16[2*j+0]) * make_fp32(b.bf16[2*j+0]) ELSE dst.dword[j] := 0 FI ENDFOR dst[MAX:512] := 0
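The BF16 dot-product above can be sketched in scalar Python. `make_fp32` follows the DEFINE in the pseudocode (the BF16 bits become the upper half of an FP32 value). The function names are illustrative, and rounding to FP32 after each accumulation step is an assumption of this sketch; hardware rounding of the intermediate sums may differ.

```python
import struct

def make_fp32(x):
    # Place the 16 BF16 bits in the upper half of a 32-bit float.
    return struct.unpack("<f", struct.pack("<I", (x & 0xFFFF) << 16))[0]

def round_fp32(v):
    # Round a Python float (double) to single precision.
    return struct.unpack("<f", struct.pack("<f", v))[0]

def dpbf16_ps(src, a, b, k=None, zero=False):
    # src: FP32 accumulators; a, b: BF16 bit patterns, two per accumulator.
    dst = list(src)
    for j in range(len(src)):
        if k is not None and not (k >> j) & 1:
            dst[j] = 0.0 if zero else src[j]
            continue
        # Odd pair first, then even pair, in the order the pseudocode lists them.
        dst[j] = round_fp32(dst[j] + make_fp32(a[2 * j + 1]) * make_fp32(b[2 * j + 1]))
        dst[j] = round_fp32(dst[j] + make_fp32(a[2 * j]) * make_fp32(b[2 * j]))
    return dst
```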
Integer Mask AVX512_BITALG Bit Manipulation Gather 64 bits from "b" using selection bits in "c". For each 64-bit element in "b", gather 8 bits from the 64-bit element in "b" at the 8 bit positions controlled by the 8 corresponding 8-bit elements of "c", and store the result in the corresponding 8-bit element of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR i := 0 to 7 //Qword FOR j := 0 to 7 // Byte IF k[i*8+j] m := c.qword[i].byte[j] & 0x3F dst[i*8+j] := b.qword[i].bit[m] ELSE dst[i*8+j] := 0 FI ENDFOR ENDFOR dst[MAX:64] := 0
Integer Mask AVX512_BITALG Bit Manipulation Gather 64 bits from "b" using selection bits in "c". For each 64-bit element in "b", gather 8 bits from the 64-bit element in "b" at the 8 bit positions controlled by the 8 corresponding 8-bit elements of "c", and store the result in the corresponding 8-bit element of "dst". FOR i := 0 to 7 //Qword FOR j := 0 to 7 // Byte m := c.qword[i].byte[j] & 0x3F dst[i*8+j] := b.qword[i].bit[m] ENDFOR ENDFOR dst[MAX:64] := 0
Integer Mask AVX512_BITALG AVX512VL Bit Manipulation Gather 32 bits from "b" using selection bits in "c". For each 64-bit element in "b", gather 8 bits from the 64-bit element in "b" at the 8 bit positions controlled by the 8 corresponding 8-bit elements of "c", and store the result in the corresponding 8-bit element of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR i := 0 to 3 //Qword FOR j := 0 to 7 // Byte IF k[i*8+j] m := c.qword[i].byte[j] & 0x3F dst[i*8+j] := b.qword[i].bit[m] ELSE dst[i*8+j] := 0 FI ENDFOR ENDFOR dst[MAX:32] := 0
Integer Mask AVX512_BITALG AVX512VL Bit Manipulation Gather 32 bits from "b" using selection bits in "c". For each 64-bit element in "b", gather 8 bits from the 64-bit element in "b" at the 8 bit positions controlled by the 8 corresponding 8-bit elements of "c", and store the result in the corresponding 8-bit element of "dst". FOR i := 0 to 3 //Qword FOR j := 0 to 7 // Byte m := c.qword[i].byte[j] & 0x3F dst[i*8+j] := b.qword[i].bit[m] ENDFOR ENDFOR dst[MAX:32] := 0
Integer Mask AVX512_BITALG AVX512VL Bit Manipulation Gather 16 bits from "b" using selection bits in "c". For each 64-bit element in "b", gather 8 bits from the 64-bit element in "b" at the 8 bit positions controlled by the 8 corresponding 8-bit elements of "c", and store the result in the corresponding 8-bit element of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR i := 0 to 1 //Qword FOR j := 0 to 7 // Byte IF k[i*8+j] m := c.qword[i].byte[j] & 0x3F dst[i*8+j] := b.qword[i].bit[m] ELSE dst[i*8+j] := 0 FI ENDFOR ENDFOR dst[MAX:16] := 0
Integer Mask AVX512_BITALG AVX512VL Bit Manipulation Gather 16 bits from "b" using selection bits in "c". For each 64-bit element in "b", gather 8 bits from the 64-bit element in "b" at the 8 bit positions controlled by the 8 corresponding 8-bit elements of "c", and store the result in the corresponding 8-bit element of "dst". FOR i := 0 to 1 //Qword FOR j := 0 to 7 // Byte m := c.qword[i].byte[j] & 0x3F dst[i*8+j] := b.qword[i].bit[m] ENDFOR ENDFOR dst[MAX:16] := 0
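The bit-gather operation above produces a mask with one bit per control byte. A minimal scalar sketch in Python follows; the function name and the list-of-lists layout for "c" are illustrative assumptions, and the vector width is implied by how many qwords are passed in.

```python
def bitshuffle_epi64_mask(b, c, k=None):
    # b: list of 64-bit integers (one per qword).
    # c: for each qword, a list of 8 control bytes.
    # k: optional zeromask; bits that are clear leave the result bit at 0.
    dst = 0
    for i, qword in enumerate(b):
        for j in range(8):
            pos = i * 8 + j
            if k is not None and not (k >> pos) & 1:
                continue
            m = c[i][j] & 0x3F  # only the low 6 bits select a position
            dst |= ((qword >> m) & 1) << pos
    return dst
```

Because the control byte is masked with 0x3F, values of 64 and above wrap around to positions 0 through 63 of the same qword.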
Integer AVX512_BITALG Bit Manipulation Count the number of logical 1 bits in packed 16-bit integers in "a", and store the results in "dst". DEFINE POPCNT(a) { count := 0 DO WHILE a > 0 count += a[0] a >>= 1 OD RETURN count } FOR j := 0 to 31 i := j*16 dst[i+15:i] := POPCNT(a[i+15:i]) ENDFOR dst[MAX:512] := 0
Integer AVX512_BITALG Bit Manipulation Count the number of logical 1 bits in packed 16-bit integers in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). DEFINE POPCNT(a) { count := 0 DO WHILE a > 0 count += a[0] a >>= 1 OD RETURN count } FOR j := 0 to 31 i := j*16 IF k[j] dst[i+15:i] := POPCNT(a[i+15:i]) ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512_BITALG Bit Manipulation Count the number of logical 1 bits in packed 16-bit integers in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). DEFINE POPCNT(a) { count := 0 DO WHILE a > 0 count += a[0] a >>= 1 OD RETURN count } FOR j := 0 to 31 i := j*16 IF k[j] dst[i+15:i] := POPCNT(a[i+15:i]) ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512_BITALG AVX512VL Bit Manipulation Count the number of logical 1 bits in packed 16-bit integers in "a", and store the results in "dst". DEFINE POPCNT(a) { count := 0 DO WHILE a > 0 count += a[0] a >>= 1 OD RETURN count } FOR j := 0 to 15 i := j*16 dst[i+15:i] := POPCNT(a[i+15:i]) ENDFOR dst[MAX:256] := 0
Integer AVX512_BITALG AVX512VL Bit Manipulation Count the number of logical 1 bits in packed 16-bit integers in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). DEFINE POPCNT(a) { count := 0 DO WHILE a > 0 count += a[0] a >>= 1 OD RETURN count } FOR j := 0 to 15 i := j*16 IF k[j] dst[i+15:i] := POPCNT(a[i+15:i]) ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512_BITALG AVX512VL Bit Manipulation Count the number of logical 1 bits in packed 16-bit integers in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). DEFINE POPCNT(a) { count := 0 DO WHILE a > 0 count += a[0] a >>= 1 OD RETURN count } FOR j := 0 to 15 i := j*16 IF k[j] dst[i+15:i] := POPCNT(a[i+15:i]) ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512_BITALG AVX512VL Bit Manipulation Count the number of logical 1 bits in packed 16-bit integers in "a", and store the results in "dst". DEFINE POPCNT(a) { count := 0 DO WHILE a > 0 count += a[0] a >>= 1 OD RETURN count } FOR j := 0 to 7 i := j*16 dst[i+15:i] := POPCNT(a[i+15:i]) ENDFOR dst[MAX:128] := 0
Integer AVX512_BITALG AVX512VL Bit Manipulation Count the number of logical 1 bits in packed 16-bit integers in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). DEFINE POPCNT(a) { count := 0 DO WHILE a > 0 count += a[0] a >>= 1 OD RETURN count } FOR j := 0 to 7 i := j*16 IF k[j] dst[i+15:i] := POPCNT(a[i+15:i]) ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512_BITALG AVX512VL Bit Manipulation Count the number of logical 1 bits in packed 16-bit integers in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). DEFINE POPCNT(a) { count := 0 DO WHILE a > 0 count += a[0] a >>= 1 OD RETURN count } FOR j := 0 to 7 i := j*16 IF k[j] dst[i+15:i] := POPCNT(a[i+15:i]) ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512_BITALG Bit Manipulation Count the number of logical 1 bits in packed 8-bit integers in "a", and store the results in "dst". DEFINE POPCNT(a) { count := 0 DO WHILE a > 0 count += a[0] a >>= 1 OD RETURN count } FOR j := 0 to 63 i := j*8 dst[i+7:i] := POPCNT(a[i+7:i]) ENDFOR dst[MAX:512] := 0
Integer AVX512_BITALG Bit Manipulation Count the number of logical 1 bits in packed 8-bit integers in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). DEFINE POPCNT(a) { count := 0 DO WHILE a > 0 count += a[0] a >>= 1 OD RETURN count } FOR j := 0 to 63 i := j*8 IF k[j] dst[i+7:i] := POPCNT(a[i+7:i]) ELSE dst[i+7:i] := src[i+7:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512_BITALG Bit Manipulation Count the number of logical 1 bits in packed 8-bit integers in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). DEFINE POPCNT(a) { count := 0 DO WHILE a > 0 count += a[0] a >>= 1 OD RETURN count } FOR j := 0 to 63 i := j*8 IF k[j] dst[i+7:i] := POPCNT(a[i+7:i]) ELSE dst[i+7:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512_BITALG AVX512VL Bit Manipulation Count the number of logical 1 bits in packed 8-bit integers in "a", and store the results in "dst". DEFINE POPCNT(a) { count := 0 DO WHILE a > 0 count += a[0] a >>= 1 OD RETURN count } FOR j := 0 to 31 i := j*8 dst[i+7:i] := POPCNT(a[i+7:i]) ENDFOR dst[MAX:256] := 0
Integer AVX512_BITALG AVX512VL Bit Manipulation Count the number of logical 1 bits in packed 8-bit integers in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). DEFINE POPCNT(a) { count := 0 DO WHILE a > 0 count += a[0] a >>= 1 OD RETURN count } FOR j := 0 to 31 i := j*8 IF k[j] dst[i+7:i] := POPCNT(a[i+7:i]) ELSE dst[i+7:i] := src[i+7:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512_BITALG AVX512VL Bit Manipulation Count the number of logical 1 bits in packed 8-bit integers in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). DEFINE POPCNT(a) { count := 0 DO WHILE a > 0 count += a[0] a >>= 1 OD RETURN count } FOR j := 0 to 31 i := j*8 IF k[j] dst[i+7:i] := POPCNT(a[i+7:i]) ELSE dst[i+7:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512_BITALG AVX512VL Bit Manipulation Count the number of logical 1 bits in packed 8-bit integers in "a", and store the results in "dst". DEFINE POPCNT(a) { count := 0 DO WHILE a > 0 count += a[0] a >>= 1 OD RETURN count } FOR j := 0 to 15 i := j*8 dst[i+7:i] := POPCNT(a[i+7:i]) ENDFOR dst[MAX:128] := 0
Integer AVX512_BITALG AVX512VL Bit Manipulation Count the number of logical 1 bits in packed 8-bit integers in "a", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). DEFINE POPCNT(a) { count := 0 DO WHILE a > 0 count += a[0] a >>= 1 OD RETURN count } FOR j := 0 to 15 i := j*8 IF k[j] dst[i+7:i] := POPCNT(a[i+7:i]) ELSE dst[i+7:i] := src[i+7:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512_BITALG AVX512VL Bit Manipulation Count the number of logical 1 bits in packed 8-bit integers in "a", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). DEFINE POPCNT(a) { count := 0 DO WHILE a > 0 count += a[0] a >>= 1 OD RETURN count } FOR j := 0 to 15 i := j*8 IF k[j] dst[i+7:i] := POPCNT(a[i+7:i]) ELSE dst[i+7:i] := 0 FI ENDFOR dst[MAX:128] := 0
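The 16-bit and 8-bit population counts above share one per-element routine and differ only in element width and count. The sketch below mirrors the POPCNT definition from the pseudocode; `popcnt_elements` and its parameters are illustrative names, not part of any intrinsic API.

```python
def popcnt(a):
    # Same loop as the POPCNT definition: add the low bit, shift right.
    count = 0
    while a > 0:
        count += a & 1
        a >>= 1
    return count

def popcnt_elements(a, width, k=None, src=None, zero=False):
    # a: list of elements; width: 8 or 16; k: optional per-element mask.
    limit = (1 << width) - 1
    dst = []
    for j, x in enumerate(a):
        if k is not None and not (k >> j) & 1:
            dst.append(0 if zero else src[j])  # zeromask vs. writemask
        else:
            dst.append(popcnt(x & limit))
    return dst
```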
AVX512_VBMI Bit Manipulation For each 64-bit element in "b", select 8 unaligned bytes using a byte-granular shift control within the corresponding 64-bit element of "a", and store the 8 assembled bytes to the corresponding 64-bit element of "dst". FOR i := 0 to 7 q := i * 64 FOR j := 0 to 7 tmp8 := 0 ctrl := a[q+j*8+7:q+j*8] & 63 FOR l := 0 to 7 tmp8[l] := b[q+((ctrl+l) & 63)] ENDFOR dst[q+j*8+7:q+j*8] := tmp8[7:0] ENDFOR ENDFOR dst[MAX:512] := 0
AVX512_VBMI Bit Manipulation For each 64-bit element in "b", select 8 unaligned bytes using a byte-granular shift control within the corresponding 64-bit element of "a", and store the 8 assembled bytes to the corresponding 64-bit element of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR i := 0 to 7 q := i * 64 FOR j := 0 to 7 tmp8 := 0 ctrl := a[q+j*8+7:q+j*8] & 63 FOR l := 0 to 7 tmp8[l] := b[q+((ctrl+l) & 63)] ENDFOR IF k[i*8+j] dst[q+j*8+7:q+j*8] := tmp8[7:0] ELSE dst[q+j*8+7:q+j*8] := src[q+j*8+7:q+j*8] FI ENDFOR ENDFOR dst[MAX:512] := 0
AVX512_VBMI Bit Manipulation For each 64-bit element in "b", select 8 unaligned bytes using a byte-granular shift control within the corresponding 64-bit element of "a", and store the 8 assembled bytes to the corresponding 64-bit element of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR i := 0 to 7 q := i * 64 FOR j := 0 to 7 tmp8 := 0 ctrl := a[q+j*8+7:q+j*8] & 63 FOR l := 0 to 7 tmp8[l] := b[q+((ctrl+l) & 63)] ENDFOR IF k[i*8+j] dst[q+j*8+7:q+j*8] := tmp8[7:0] ELSE dst[q+j*8+7:q+j*8] := 0 FI ENDFOR ENDFOR dst[MAX:512] := 0
AVX512_VBMI AVX512VL Bit Manipulation For each 64-bit element in "b", select 8 unaligned bytes using a byte-granular shift control within the corresponding 64-bit element of "a", and store the 8 assembled bytes to the corresponding 64-bit element of "dst". FOR i := 0 to 3 q := i * 64 FOR j := 0 to 7 tmp8 := 0 ctrl := a[q+j*8+7:q+j*8] & 63 FOR l := 0 to 7 tmp8[l] := b[q+((ctrl+l) & 63)] ENDFOR dst[q+j*8+7:q+j*8] := tmp8[7:0] ENDFOR ENDFOR dst[MAX:256] := 0
AVX512_VBMI AVX512VL Bit Manipulation For each 64-bit element in "b", select 8 unaligned bytes using a byte-granular shift control within the corresponding 64-bit element of "a", and store the 8 assembled bytes to the corresponding 64-bit element of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR i := 0 to 3 q := i * 64 FOR j := 0 to 7 tmp8 := 0 ctrl := a[q+j*8+7:q+j*8] & 63 FOR l := 0 to 7 tmp8[l] := b[q+((ctrl+l) & 63)] ENDFOR IF k[i*8+j] dst[q+j*8+7:q+j*8] := tmp8[7:0] ELSE dst[q+j*8+7:q+j*8] := src[q+j*8+7:q+j*8] FI ENDFOR ENDFOR dst[MAX:256] := 0
AVX512_VBMI AVX512VL Bit Manipulation For each 64-bit element in "b", select 8 unaligned bytes using a byte-granular shift control within the corresponding 64-bit element of "a", and store the 8 assembled bytes to the corresponding 64-bit element of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR i := 0 to 3 q := i * 64 FOR j := 0 to 7 tmp8 := 0 ctrl := a[q+j*8+7:q+j*8] & 63 FOR l := 0 to 7 tmp8[l] := b[q+((ctrl+l) & 63)] ENDFOR IF k[i*8+j] dst[q+j*8+7:q+j*8] := tmp8[7:0] ELSE dst[q+j*8+7:q+j*8] := 0 FI ENDFOR ENDFOR dst[MAX:256] := 0
AVX512_VBMI AVX512VL Bit Manipulation For each 64-bit element in "b", select 8 unaligned bytes using a byte-granular shift control within the corresponding 64-bit element of "a", and store the 8 assembled bytes to the corresponding 64-bit element of "dst". FOR i := 0 to 1 q := i * 64 FOR j := 0 to 7 tmp8 := 0 ctrl := a[q+j*8+7:q+j*8] & 63 FOR l := 0 to 7 tmp8[l] := b[q+((ctrl+l) & 63)] ENDFOR dst[q+j*8+7:q+j*8] := tmp8[7:0] ENDFOR ENDFOR dst[MAX:128] := 0
AVX512_VBMI AVX512VL Bit Manipulation For each 64-bit element in "b", select 8 unaligned bytes using a byte-granular shift control within the corresponding 64-bit element of "a", and store the 8 assembled bytes to the corresponding 64-bit element of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR i := 0 to 1 q := i * 64 FOR j := 0 to 7 tmp8 := 0 ctrl := a[q+j*8+7:q+j*8] & 63 FOR l := 0 to 7 tmp8[l] := b[q+((ctrl+l) & 63)] ENDFOR IF k[i*8+j] dst[q+j*8+7:q+j*8] := tmp8[7:0] ELSE dst[q+j*8+7:q+j*8] := src[q+j*8+7:q+j*8] FI ENDFOR ENDFOR dst[MAX:128] := 0
AVX512_VBMI AVX512VL Bit Manipulation For each 64-bit element in "b", select 8 unaligned bytes using a byte-granular shift control within the corresponding 64-bit element of "a", and store the 8 assembled bytes to the corresponding 64-bit element of "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR i := 0 to 1 q := i * 64 FOR j := 0 to 7 tmp8 := 0 ctrl := a[q+j*8+7:q+j*8] & 63 FOR l := 0 to 7 tmp8[l] := b[q+((ctrl+l) & 63)] ENDFOR IF k[i*8+j] dst[q+j*8+7:q+j*8] := tmp8[7:0] ELSE dst[q+j*8+7:q+j*8] := 0 FI ENDFOR ENDFOR dst[MAX:128] := 0
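The multishift operation above amounts to reading 8 bits from a rotated copy of each 64-bit element of "b", starting at the bit offset given by each control byte in "a". A scalar sketch of the unmasked form follows; the function name and integer-list operands are assumptions for illustration.

```python
def multishift_epi64_epi8(a, b):
    # a, b: lists of 64-bit integers. Byte j of a[i] is a bit offset into b[i].
    dst = []
    for qa, qb in zip(a, b):
        out = 0
        for j in range(8):
            ctrl = (qa >> (8 * j)) & 63
            byte = 0
            for l in range(8):
                # Bit positions wrap around within the 64-bit element.
                byte |= ((qb >> ((ctrl + l) & 63)) & 1) << l
            out |= byte << (8 * j)
        dst.append(out)
    return dst
```

With control bytes 0, 8, 16, ..., 56 the result reproduces "b" unchanged, which is a convenient sanity check.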
AVX512_VBMI Swizzle Shuffle 8-bit integers in "a" across lanes using the corresponding index in "idx", and store the results in "dst". FOR j := 0 to 63 i := j*8 id := idx[i+5:i]*8 dst[i+7:i] := a[id+7:id] ENDFOR dst[MAX:512] := 0
AVX512_VBMI Swizzle Shuffle 8-bit integers in "a" across lanes using the corresponding index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 63 i := j*8 id := idx[i+5:i]*8 IF k[j] dst[i+7:i] := a[id+7:id] ELSE dst[i+7:i] := src[i+7:i] FI ENDFOR dst[MAX:512] := 0
AVX512_VBMI Swizzle Shuffle 8-bit integers in "a" across lanes using the corresponding index in "idx", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 63 i := j*8 id := idx[i+5:i]*8 IF k[j] dst[i+7:i] := a[id+7:id] ELSE dst[i+7:i] := 0 FI ENDFOR dst[MAX:512] := 0
AVX512_VBMI AVX512VL Swizzle Shuffle 8-bit integers in "a" across lanes using the corresponding index in "idx", and store the results in "dst". FOR j := 0 to 31 i := j*8 id := idx[i+4:i]*8 dst[i+7:i] := a[id+7:id] ENDFOR dst[MAX:256] := 0
AVX512_VBMI AVX512VL Swizzle Shuffle 8-bit integers in "a" across lanes using the corresponding index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*8 id := idx[i+4:i]*8 IF k[j] dst[i+7:i] := a[id+7:id] ELSE dst[i+7:i] := src[i+7:i] FI ENDFOR dst[MAX:256] := 0
AVX512_VBMI AVX512VL Swizzle Shuffle 8-bit integers in "a" across lanes using the corresponding index in "idx", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*8 id := idx[i+4:i]*8 IF k[j] dst[i+7:i] := a[id+7:id] ELSE dst[i+7:i] := 0 FI ENDFOR dst[MAX:256] := 0
AVX512_VBMI AVX512VL Swizzle Shuffle 8-bit integers in "a" using the corresponding index in "idx", and store the results in "dst". FOR j := 0 to 15 i := j*8 id := idx[i+3:i]*8 dst[i+7:i] := a[id+7:id] ENDFOR dst[MAX:128] := 0
AVX512_VBMI AVX512VL Swizzle Shuffle 8-bit integers in "a" using the corresponding index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*8 id := idx[i+3:i]*8 IF k[j] dst[i+7:i] := a[id+7:id] ELSE dst[i+7:i] := src[i+7:i] FI ENDFOR dst[MAX:128] := 0
AVX512_VBMI AVX512VL Swizzle Shuffle 8-bit integers in "a" using the corresponding index in "idx", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*8 id := idx[i+3:i]*8 IF k[j] dst[i+7:i] := a[id+7:id] ELSE dst[i+7:i] := 0 FI ENDFOR dst[MAX:128] := 0
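The single-source byte shuffles above differ only in how many low index bits are used: 6 for 64 bytes, 5 for 32, and 4 for 16. A scalar sketch that derives the index width from the vector length follows; the function name and list operands are illustrative.

```python
def permutexvar_epi8(idx, a, k=None, src=None, zero=False):
    # a: list of 16, 32, or 64 bytes; idx: one index byte per output element.
    n = len(a)
    dst = []
    for j in range(n):
        if k is not None and not (k >> j) & 1:
            dst.append(0 if zero else src[j])
        else:
            # Only the low log2(n) bits of the index byte are used.
            dst.append(a[idx[j] & (n - 1)])
    return dst
```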
AVX512_VBMI Swizzle Shuffle 8-bit integers in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst". FOR j := 0 to 63 i := j*8 off := 8*idx[i+5:i] dst[i+7:i] := idx[i+6] ? b[off+7:off] : a[off+7:off] ENDFOR dst[MAX:512] := 0
AVX512_VBMI Swizzle Shuffle 8-bit integers in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). FOR j := 0 to 63 i := j*8 IF k[j] off := 8*idx[i+5:i] dst[i+7:i] := idx[i+6] ? b[off+7:off] : a[off+7:off] ELSE dst[i+7:i] := a[i+7:i] FI ENDFOR dst[MAX:512] := 0
AVX512_VBMI Swizzle Shuffle 8-bit integers in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "idx" when the corresponding mask bit is not set). FOR j := 0 to 63 i := j*8 IF k[j] off := 8*idx[i+5:i] dst[i+7:i] := idx[i+6] ? b[off+7:off] : a[off+7:off] ELSE dst[i+7:i] := idx[i+7:i] FI ENDFOR dst[MAX:512] := 0
AVX512_VBMI Swizzle Shuffle 8-bit integers in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 63 i := j*8 IF k[j] off := 8*idx[i+5:i] dst[i+7:i] := idx[i+6] ? b[off+7:off] : a[off+7:off] ELSE dst[i+7:i] := 0 FI ENDFOR dst[MAX:512] := 0
AVX512_VBMI AVX512VL Swizzle Shuffle 8-bit integers in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst". FOR j := 0 to 31 i := j*8 off := 8*idx[i+4:i] dst[i+7:i] := idx[i+5] ? b[off+7:off] : a[off+7:off] ENDFOR dst[MAX:256] := 0
AVX512_VBMI AVX512VL Swizzle Shuffle 8-bit integers in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*8 IF k[j] off := 8*idx[i+4:i] dst[i+7:i] := idx[i+5] ? b[off+7:off] : a[off+7:off] ELSE dst[i+7:i] := a[i+7:i] FI ENDFOR dst[MAX:256] := 0
AVX512_VBMI AVX512VL Swizzle Shuffle 8-bit integers in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "idx" when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*8 IF k[j] off := 8*idx[i+4:i] dst[i+7:i] := idx[i+5] ? b[off+7:off] : a[off+7:off] ELSE dst[i+7:i] := idx[i+7:i] FI ENDFOR dst[MAX:256] := 0
AVX512_VBMI AVX512VL Swizzle Shuffle 8-bit integers in "a" and "b" across lanes using the corresponding selector and index in "idx", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*8 IF k[j] off := 8*idx[i+4:i] dst[i+7:i] := idx[i+5] ? b[off+7:off] : a[off+7:off] ELSE dst[i+7:i] := 0 FI ENDFOR dst[MAX:256] := 0
AVX512_VBMI AVX512VL Swizzle Shuffle 8-bit integers in "a" and "b" using the corresponding selector and index in "idx", and store the results in "dst". FOR j := 0 to 15 i := j*8 off := 8*idx[i+3:i] dst[i+7:i] := idx[i+4] ? b[off+7:off] : a[off+7:off] ENDFOR dst[MAX:128] := 0
AVX512_VBMI AVX512VL Swizzle Shuffle 8-bit integers in "a" and "b" using the corresponding selector and index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*8 IF k[j] off := 8*idx[i+3:i] dst[i+7:i] := idx[i+4] ? b[off+7:off] : a[off+7:off] ELSE dst[i+7:i] := a[i+7:i] FI ENDFOR dst[MAX:128] := 0
AVX512_VBMI AVX512VL Swizzle Shuffle 8-bit integers in "a" and "b" using the corresponding selector and index in "idx", and store the results in "dst" using writemask "k" (elements are copied from "idx" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*8 IF k[j] off := 8*idx[i+3:i] dst[i+7:i] := idx[i+4] ? b[off+7:off] : a[off+7:off] ELSE dst[i+7:i] := idx[i+7:i] FI ENDFOR dst[MAX:128] := 0
AVX512_VBMI AVX512VL Swizzle Shuffle 8-bit integers in "a" and "b" using the corresponding selector and index in "idx", and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*8 IF k[j] off := 8*idx[i+3:i] dst[i+7:i] := idx[i+4] ? b[off+7:off] : a[off+7:off] ELSE dst[i+7:i] := 0 FI ENDFOR dst[MAX:128] := 0
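The two-source shuffles above differ only in lane count and in what an unselected lane receives ("a", "idx", or zero). A compact scalar sketch of that family follows. It is an illustration under assumptions, not a library API: the name `permute2_bytes` and the `fallback` parameter are hypothetical, and vectors are modeled as lists of 16, 32, or 64 byte values.

```python
def permute2_bytes(a, idx, b, k=None, fallback="zero"):
    """Hypothetical scalar model of the two-source byte shuffle.

    a, idx, b: equal-length lists of 16, 32 or 64 byte values.
    k: optional integer mask, bit j controls lane j.
    fallback: what a masked-off lane receives: "a", "idx" or "zero".
    """
    n = len(a)
    assert n in (16, 32, 64) and len(idx) == n and len(b) == n
    index_bits = n.bit_length() - 1   # 4, 5 or 6 low bits address the element
    out = []
    for j in range(n):
        if k is None or (k >> j) & 1:
            off = idx[j] & (n - 1)
            # The bit just above the index field selects table b over table a.
            table = b if (idx[j] >> index_bits) & 1 else a
            out.append(table[off])
        elif fallback == "a":
            out.append(a[j])
        elif fallback == "idx":
            out.append(idx[j])
        else:
            out.append(0)
    return out
```

Setting the selector bit on every odd lane, for instance, interleaves the even elements of `a` with the odd elements of `b`.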
Integer AVX512_VBMI2 Shift Concatenate packed 64-bit integers in "b" and "a" producing an intermediate 128-bit result. Shift the result right by the amount specified in the corresponding element of "c", and store the lower 64-bits in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := ((b[i+63:i] << 64)[127:0] | a[i+63:i]) >> (c[i+63:i] & 63) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512_VBMI2 Shift Concatenate packed 64-bit integers in "b" and "a" producing an intermediate 128-bit result. Shift the result right by the amount specified in the corresponding element of "c", and store the lower 64-bits in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := ((b[i+63:i] << 64)[127:0] | a[i+63:i]) >> (c[i+63:i] & 63) ELSE dst[i+63:i] := a[i+63:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512_VBMI2 Shift Concatenate packed 64-bit integers in "b" and "a" producing an intermediate 128-bit result. Shift the result right by the amount specified in the corresponding element of "c", and store the lower 64-bits in "dst". FOR j := 0 to 7 i := j*64 dst[i+63:i] := ((b[i+63:i] << 64)[127:0] | a[i+63:i]) >> (c[i+63:i] & 63) ENDFOR dst[MAX:512] := 0
Integer AVX512_VBMI2 AVX512VL Shift Concatenate packed 64-bit integers in "b" and "a" producing an intermediate 128-bit result. Shift the result right by the amount specified in the corresponding element of "c", and store the lower 64-bits in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := ((b[i+63:i] << 64)[127:0] | a[i+63:i]) >> (c[i+63:i] & 63) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512_VBMI2 AVX512VL Shift Concatenate packed 64-bit integers in "b" and "a" producing an intermediate 128-bit result. Shift the result right by the amount specified in the corresponding element of "c", and store the lower 64-bits in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := ((b[i+63:i] << 64)[127:0] | a[i+63:i]) >> (c[i+63:i] & 63) ELSE dst[i+63:i] := a[i+63:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512_VBMI2 AVX512VL Shift Concatenate packed 64-bit integers in "b" and "a" producing an intermediate 128-bit result. Shift the result right by the amount specified in the corresponding element of "c", and store the lower 64-bits in "dst". FOR j := 0 to 3 i := j*64 dst[i+63:i] := ((b[i+63:i] << 64)[127:0] | a[i+63:i]) >> (c[i+63:i] & 63) ENDFOR dst[MAX:256] := 0
Integer AVX512_VBMI2 AVX512VL Shift Concatenate packed 64-bit integers in "b" and "a" producing an intermediate 128-bit result. Shift the result right by the amount specified in the corresponding element of "c", and store the lower 64-bits in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := ((b[i+63:i] << 64)[127:0] | a[i+63:i]) >> (c[i+63:i] & 63) ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512_VBMI2 AVX512VL Shift Concatenate packed 64-bit integers in "b" and "a" producing an intermediate 128-bit result. Shift the result right by the amount specified in the corresponding element of "c", and store the lower 64-bits in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := ((b[i+63:i] << 64)[127:0] | a[i+63:i]) >> (c[i+63:i] & 63) ELSE dst[i+63:i] := a[i+63:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512_VBMI2 AVX512VL Shift Concatenate packed 64-bit integers in "b" and "a" producing an intermediate 128-bit result. Shift the result right by the amount specified in the corresponding element of "c", and store the lower 64-bits in "dst". FOR j := 0 to 1 i := j*64 dst[i+63:i] := ((b[i+63:i] << 64)[127:0] | a[i+63:i]) >> (c[i+63:i] & 63) ENDFOR dst[MAX:128] := 0
Integer AVX512_VBMI2 Shift Concatenate packed 32-bit integers in "b" and "a" producing an intermediate 64-bit result. Shift the result right by the amount specified in the corresponding element of "c", and store the lower 32-bits in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := ((b[i+31:i] << 32)[63:0] | a[i+31:i]) >> (c[i+31:i] & 31) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512_VBMI2 Shift Concatenate packed 32-bit integers in "b" and "a" producing an intermediate 64-bit result. Shift the result right by the amount specified in the corresponding element of "c", and store the lower 32-bits in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := ((b[i+31:i] << 32)[63:0] | a[i+31:i]) >> (c[i+31:i] & 31) ELSE dst[i+31:i] := a[i+31:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512_VBMI2 Shift Concatenate packed 32-bit integers in "b" and "a" producing an intermediate 64-bit result. Shift the result right by the amount specified in the corresponding element of "c", and store the lower 32-bits in "dst". FOR j := 0 to 15 i := j*32 dst[i+31:i] := ((b[i+31:i] << 32)[63:0] | a[i+31:i]) >> (c[i+31:i] & 31) ENDFOR dst[MAX:512] := 0
Integer AVX512_VBMI2 AVX512VL Shift Concatenate packed 32-bit integers in "b" and "a" producing an intermediate 64-bit result. Shift the result right by the amount specified in the corresponding element of "c", and store the lower 32-bits in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := ((b[i+31:i] << 32)[63:0] | a[i+31:i]) >> (c[i+31:i] & 31) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512_VBMI2 AVX512VL Shift Concatenate packed 32-bit integers in "b" and "a" producing an intermediate 64-bit result. Shift the result right by the amount specified in the corresponding element of "c", and store the lower 32-bits in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := ((b[i+31:i] << 32)[63:0] | a[i+31:i]) >> (c[i+31:i] & 31) ELSE dst[i+31:i] := a[i+31:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512_VBMI2 AVX512VL Shift Concatenate packed 32-bit integers in "b" and "a" producing an intermediate 64-bit result. Shift the result right by the amount specified in the corresponding element of "c", and store the lower 32-bits in "dst". FOR j := 0 to 7 i := j*32 dst[i+31:i] := ((b[i+31:i] << 32)[63:0] | a[i+31:i]) >> (c[i+31:i] & 31) ENDFOR dst[MAX:256] := 0
Integer AVX512_VBMI2 AVX512VL Shift Concatenate packed 32-bit integers in "b" and "a" producing an intermediate 64-bit result. Shift the result right by the amount specified in the corresponding element of "c", and store the lower 32-bits in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := ((b[i+31:i] << 32)[63:0] | a[i+31:i]) >> (c[i+31:i] & 31) ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512_VBMI2 AVX512VL Shift Concatenate packed 32-bit integers in "b" and "a" producing an intermediate 64-bit result. Shift the result right by the amount specified in the corresponding element of "c", and store the lower 32-bits in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := ((b[i+31:i] << 32)[63:0] | a[i+31:i]) >> (c[i+31:i] & 31) ELSE dst[i+31:i] := a[i+31:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512_VBMI2 AVX512VL Shift Concatenate packed 32-bit integers in "b" and "a" producing an intermediate 64-bit result. Shift the result right by the amount specified in the corresponding element of "c", and store the lower 32-bits in "dst". FOR j := 0 to 3 i := j*32 dst[i+31:i] := ((b[i+31:i] << 32)[63:0] | a[i+31:i]) >> (c[i+31:i] & 31) ENDFOR dst[MAX:128] := 0
Integer AVX512_VBMI2 Shift Concatenate packed 16-bit integers in "b" and "a" producing an intermediate 32-bit result. Shift the result right by the amount specified in the corresponding element of "c", and store the lower 16-bits in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*16 IF k[j] dst[i+15:i] := ((b[i+15:i] << 16)[31:0] | a[i+15:i]) >> (c[i+15:i] & 15) ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512_VBMI2 Shift Concatenate packed 16-bit integers in "b" and "a" producing an intermediate 32-bit result. Shift the result right by the amount specified in the corresponding element of "c", and store the lower 16-bits in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*16 IF k[j] dst[i+15:i] := ((b[i+15:i] << 16)[31:0] | a[i+15:i]) >> (c[i+15:i] & 15) ELSE dst[i+15:i] := a[i+15:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512_VBMI2 Shift Concatenate packed 16-bit integers in "b" and "a" producing an intermediate 32-bit result. Shift the result right by the amount specified in the corresponding element of "c", and store the lower 16-bits in "dst". FOR j := 0 to 31 i := j*16 dst[i+15:i] := ((b[i+15:i] << 16)[31:0] | a[i+15:i]) >> (c[i+15:i] & 15) ENDFOR dst[MAX:512] := 0
Integer AVX512_VBMI2 AVX512VL Shift Concatenate packed 16-bit integers in "b" and "a" producing an intermediate 32-bit result. Shift the result right by the amount specified in the corresponding element of "c", and store the lower 16-bits in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*16 IF k[j] dst[i+15:i] := ((b[i+15:i] << 16)[31:0] | a[i+15:i]) >> (c[i+15:i] & 15) ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512_VBMI2 AVX512VL Shift Concatenate packed 16-bit integers in "b" and "a" producing an intermediate 32-bit result. Shift the result right by the amount specified in the corresponding element of "c", and store the lower 16-bits in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*16 IF k[j] dst[i+15:i] := ((b[i+15:i] << 16)[31:0] | a[i+15:i]) >> (c[i+15:i] & 15) ELSE dst[i+15:i] := a[i+15:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512_VBMI2 AVX512VL Shift Concatenate packed 16-bit integers in "b" and "a" producing an intermediate 32-bit result. Shift the result right by the amount specified in the corresponding element of "c", and store the lower 16-bits in "dst". FOR j := 0 to 15 i := j*16 dst[i+15:i] := ((b[i+15:i] << 16)[31:0] | a[i+15:i]) >> (c[i+15:i] & 15) ENDFOR dst[MAX:256] := 0
Integer AVX512_VBMI2 AVX512VL Shift Concatenate packed 16-bit integers in "b" and "a" producing an intermediate 32-bit result. Shift the result right by the amount specified in the corresponding element of "c", and store the lower 16-bits in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*16 IF k[j] dst[i+15:i] := ((b[i+15:i] << 16)[31:0] | a[i+15:i]) >> (c[i+15:i] & 15) ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512_VBMI2 AVX512VL Shift Concatenate packed 16-bit integers in "b" and "a" producing an intermediate 32-bit result. Shift the result right by the amount specified in the corresponding element of "c", and store the lower 16-bits in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*16 IF k[j] dst[i+15:i] := ((b[i+15:i] << 16)[31:0] | a[i+15:i]) >> (c[i+15:i] & 15) ELSE dst[i+15:i] := a[i+15:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512_VBMI2 AVX512VL Shift Concatenate packed 16-bit integers in "b" and "a" producing an intermediate 32-bit result. Shift the result right by the amount specified in the corresponding element of "c", and store the lower 16-bits in "dst". FOR j := 0 to 7 i := j*16 dst[i+15:i] := ((b[i+15:i] << 16)[31:0] | a[i+15:i]) >> (c[i+15:i] & 15) ENDFOR dst[MAX:128] := 0
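The per-element operation shared by the variable-count right-shift entries above can be sketched for a single lane as follows. The helper name `shrd_lane` is hypothetical, and Python's unbounded integers stand in for the double-width intermediate.

```python
def shrd_lane(a, b, c, width):
    """Hypothetical one-lane model: concatenate b:a, shift right, keep the low half.

    width is the element size in bits (16, 32 or 64).
    """
    lane_mask = (1 << width) - 1
    concat = ((b & lane_mask) << width) | (a & lane_mask)  # b is the upper half
    count = c & (width - 1)                                # count is taken modulo width
    return (concat >> count) & lane_mask
```

Two consequences are easy to check with this model: a count equal to the element width wraps to zero and returns `a` unchanged, and passing the same value as both `a` and `b` yields a rotate right.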
Integer AVX512_VBMI2 Shift Concatenate packed 64-bit integers in "b" and "a" producing an intermediate 128-bit result. Shift the result right by "imm8" bits, and store the lower 64-bits in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := ((b[i+63:i] << 64)[127:0] | a[i+63:i]) >> imm8[5:0] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512_VBMI2 Shift Concatenate packed 64-bit integers in "b" and "a" producing an intermediate 128-bit result. Shift the result right by "imm8" bits, and store the lower 64-bits in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := ((b[i+63:i] << 64)[127:0] | a[i+63:i]) >> imm8[5:0] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512_VBMI2 Shift Concatenate packed 64-bit integers in "b" and "a" producing an intermediate 128-bit result. Shift the result right by "imm8" bits, and store the lower 64-bits in "dst". FOR j := 0 to 7 i := j*64 dst[i+63:i] := ((b[i+63:i] << 64)[127:0] | a[i+63:i]) >> imm8[5:0] ENDFOR dst[MAX:512] := 0
Integer AVX512_VBMI2 AVX512VL Shift Concatenate packed 64-bit integers in "b" and "a" producing an intermediate 128-bit result. Shift the result right by "imm8" bits, and store the lower 64-bits in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := ((b[i+63:i] << 64)[127:0] | a[i+63:i]) >> imm8[5:0] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512_VBMI2 AVX512VL Shift Concatenate packed 64-bit integers in "b" and "a" producing an intermediate 128-bit result. Shift the result right by "imm8" bits, and store the lower 64-bits in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k[j] dst[i+63:i] := ((b[i+63:i] << 64)[127:0] | a[i+63:i]) >> imm8[5:0] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512_VBMI2 AVX512VL Shift Concatenate packed 64-bit integers in "b" and "a" producing an intermediate 128-bit result. Shift the result right by "imm8" bits, and store the lower 64-bits in "dst". FOR j := 0 to 3 i := j*64 dst[i+63:i] := ((b[i+63:i] << 64)[127:0] | a[i+63:i]) >> imm8[5:0] ENDFOR dst[MAX:256] := 0
Integer AVX512_VBMI2 AVX512VL Shift Concatenate packed 64-bit integers in "b" and "a" producing an intermediate 128-bit result. Shift the result right by "imm8" bits, and store the lower 64-bits in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := ((b[i+63:i] << 64)[127:0] | a[i+63:i]) >> imm8[5:0] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512_VBMI2 AVX512VL Shift Concatenate packed 64-bit integers in "b" and "a" producing an intermediate 128-bit result. Shift the result right by "imm8" bits, and store the lower 64-bits in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k[j] dst[i+63:i] := ((b[i+63:i] << 64)[127:0] | a[i+63:i]) >> imm8[5:0] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512_VBMI2 AVX512VL Shift Concatenate packed 64-bit integers in "b" and "a" producing an intermediate 128-bit result. Shift the result right by "imm8" bits, and store the lower 64-bits in "dst". FOR j := 0 to 1 i := j*64 dst[i+63:i] := ((b[i+63:i] << 64)[127:0] | a[i+63:i]) >> imm8[5:0] ENDFOR dst[MAX:128] := 0
Integer AVX512_VBMI2 Shift Concatenate packed 32-bit integers in "b" and "a" producing an intermediate 64-bit result. Shift the result right by "imm8" bits, and store the lower 32-bits in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := ((b[i+31:i] << 32)[63:0] | a[i+31:i]) >> imm8[4:0] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512_VBMI2 Shift Concatenate packed 32-bit integers in "b" and "a" producing an intermediate 64-bit result. Shift the result right by "imm8" bits, and store the lower 32-bits in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := ((b[i+31:i] << 32)[63:0] | a[i+31:i]) >> imm8[4:0] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512_VBMI2 Shift Concatenate packed 32-bit integers in "b" and "a" producing an intermediate 64-bit result. Shift the result right by "imm8" bits, and store the lower 32-bits in "dst". FOR j := 0 to 15 i := j*32 dst[i+31:i] := ((b[i+31:i] << 32)[63:0] | a[i+31:i]) >> imm8[4:0] ENDFOR dst[MAX:512] := 0
Integer AVX512_VBMI2 AVX512VL Shift Concatenate packed 32-bit integers in "b" and "a" producing an intermediate 64-bit result. Shift the result right by "imm8" bits, and store the lower 32-bits in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := ((b[i+31:i] << 32)[63:0] | a[i+31:i]) >> imm8[4:0] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512_VBMI2 AVX512VL Shift Concatenate packed 32-bit integers in "b" and "a" producing an intermediate 64-bit result. Shift the result right by "imm8" bits, and store the lower 32-bits in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 IF k[j] dst[i+31:i] := ((b[i+31:i] << 32)[63:0] | a[i+31:i]) >> imm8[4:0] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512_VBMI2 AVX512VL Shift Concatenate packed 32-bit integers in "b" and "a" producing an intermediate 64-bit result. Shift the result right by "imm8" bits, and store the lower 32-bits in "dst". FOR j := 0 to 7 i := j*32 dst[i+31:i] := ((b[i+31:i] << 32)[63:0] | a[i+31:i]) >> imm8[4:0] ENDFOR dst[MAX:256] := 0
Integer AVX512_VBMI2 AVX512VL Shift Concatenate packed 32-bit integers in "b" and "a" producing an intermediate 64-bit result. Shift the result right by "imm8" bits, and store the lower 32-bits in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := ((b[i+31:i] << 32)[63:0] | a[i+31:i]) >> imm8[4:0] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512_VBMI2 AVX512VL Shift Concatenate packed 32-bit integers in "b" and "a" producing an intermediate 64-bit result. Shift the result right by "imm8" bits, and store the lower 32-bits in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 IF k[j] dst[i+31:i] := ((b[i+31:i] << 32)[63:0] | a[i+31:i]) >> imm8[4:0] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512_VBMI2 AVX512VL Shift Concatenate packed 32-bit integers in "b" and "a" producing an intermediate 64-bit result. Shift the result right by "imm8" bits, and store the lower 32-bits in "dst". FOR j := 0 to 3 i := j*32 dst[i+31:i] := ((b[i+31:i] << 32)[63:0] | a[i+31:i]) >> imm8[4:0] ENDFOR dst[MAX:128] := 0
Integer AVX512_VBMI2 Shift Concatenate packed 16-bit integers in "b" and "a" producing an intermediate 32-bit result. Shift the result right by "imm8" bits, and store the lower 16-bits in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*16 IF k[j] dst[i+15:i] := ((b[i+15:i] << 16)[31:0] | a[i+15:i]) >> imm8[3:0] ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512_VBMI2 Shift Concatenate packed 16-bit integers in "b" and "a" producing an intermediate 32-bit result. Shift the result right by "imm8" bits, and store the lower 16-bits in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*16 IF k[j] dst[i+15:i] := ((b[i+15:i] << 16)[31:0] | a[i+15:i]) >> imm8[3:0] ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512_VBMI2 Shift Concatenate packed 16-bit integers in "b" and "a" producing an intermediate 32-bit result. Shift the result right by "imm8" bits, and store the lower 16-bits in "dst". FOR j := 0 to 31 i := j*16 dst[i+15:i] := ((b[i+15:i] << 16)[31:0] | a[i+15:i]) >> imm8[3:0] ENDFOR dst[MAX:512] := 0
Integer AVX512_VBMI2 AVX512VL Shift Concatenate packed 16-bit integers in "b" and "a" producing an intermediate 32-bit result. Shift the result right by "imm8" bits, and store the lower 16-bits in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*16 IF k[j] dst[i+15:i] := ((b[i+15:i] << 16)[31:0] | a[i+15:i]) >> imm8[3:0] ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512_VBMI2 AVX512VL Shift Concatenate packed 16-bit integers in "b" and "a" producing an intermediate 32-bit result. Shift the result right by "imm8" bits, and store the lower 16-bits in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*16 IF k[j] dst[i+15:i] := ((b[i+15:i] << 16)[31:0] | a[i+15:i]) >> imm8[3:0] ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512_VBMI2 AVX512VL Shift Concatenate packed 16-bit integers in "b" and "a" producing an intermediate 32-bit result. Shift the result right by "imm8" bits, and store the lower 16-bits in "dst". FOR j := 0 to 15 i := j*16 dst[i+15:i] := ((b[i+15:i] << 16)[31:0] | a[i+15:i]) >> imm8[3:0] ENDFOR dst[MAX:256] := 0
Integer AVX512_VBMI2 AVX512VL Shift Concatenate packed 16-bit integers in "b" and "a" producing an intermediate 32-bit result. Shift the result right by "imm8" bits, and store the lower 16-bits in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*16 IF k[j] dst[i+15:i] := ((b[i+15:i] << 16)[31:0] | a[i+15:i]) >> imm8[3:0] ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512_VBMI2 AVX512VL Shift Concatenate packed 16-bit integers in "b" and "a" producing an intermediate 32-bit result. Shift the result right by "imm8" bits, and store the lower 16-bits in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*16 IF k[j] dst[i+15:i] := ((b[i+15:i] << 16)[31:0] | a[i+15:i]) >> imm8[3:0] ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512_VBMI2 AVX512VL Shift Concatenate packed 16-bit integers in "b" and "a" producing an intermediate 32-bit result. Shift the result right by "imm8" bits, and store the lower 16-bits in "dst". FOR j := 0 to 7 i := j*16 dst[i+15:i] := ((b[i+15:i] << 16)[31:0] | a[i+15:i]) >> imm8[3:0] ENDFOR dst[MAX:128] := 0
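The immediate-count entries above apply one shift amount to every lane and, in the writemask form, take unselected lanes from a separate "src" operand. A whole-vector sketch of that behavior is shown below; the name `shrdi_vector` and the list-of-integers representation are assumptions for illustration only.

```python
def shrdi_vector(a, b, imm8, width, k=None, src=None):
    """Hypothetical vector model of the immediate concatenate-and-shift-right.

    a, b, src: equal-length lists of unsigned lane values.
    width: lane size in bits (16, 32 or 64).
    k: optional integer mask; with src given it acts as a writemask,
       without src it acts as a zeromask.
    """
    lane_mask = (1 << width) - 1
    count = imm8 & (width - 1)   # imm8[3:0], imm8[4:0] or imm8[5:0]
    out = []
    for j, (lo, hi) in enumerate(zip(a, b)):
        if k is None or (k >> j) & 1:
            out.append((((hi << width) | lo) >> count) & lane_mask)
        elif src is not None:
            out.append(src[j])   # writemask: copy from src
        else:
            out.append(0)        # zeromask
    return out
```

Because only the low bits of "imm8" are used, an immediate of 20 behaves like 4 for 16-bit lanes in this model.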
Integer AVX512_VBMI2 Shift Concatenate packed 64-bit integers in "a" and "b" producing an intermediate 128-bit result. Shift the result left by the amount specified in the corresponding element of "c", and store the upper 64-bits in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] tmp[127:0] := ((a[i+63:i] << 64)[127:0] | b[i+63:i]) << (c[i+63:i] & 63) dst[i+63:i] := tmp[127:64] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512_VBMI2 Shift Concatenate packed 64-bit integers in "a" and "b" producing an intermediate 128-bit result. Shift the result left by the amount specified in the corresponding element of "c", and store the upper 64-bits in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] tmp[127:0] := ((a[i+63:i] << 64)[127:0] | b[i+63:i]) << (c[i+63:i] & 63) dst[i+63:i] := tmp[127:64] ELSE dst[i+63:i] := a[i+63:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512_VBMI2 Shift Concatenate packed 64-bit integers in "a" and "b" producing an intermediate 128-bit result. Shift the result left by the amount specified in the corresponding element of "c", and store the upper 64-bits in "dst". FOR j := 0 to 7 i := j*64 tmp[127:0] := ((a[i+63:i] << 64)[127:0] | b[i+63:i]) << (c[i+63:i] & 63) dst[i+63:i] := tmp[127:64] ENDFOR dst[MAX:512] := 0
Integer AVX512_VBMI2 AVX512VL Shift Concatenate packed 64-bit integers in "a" and "b" producing an intermediate 128-bit result. Shift the result left by the amount specified in the corresponding element of "c", and store the upper 64-bits in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k[j] tmp[127:0] := ((a[i+63:i] << 64)[127:0] | b[i+63:i]) << (c[i+63:i] & 63) dst[i+63:i] := tmp[127:64] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512_VBMI2 AVX512VL Shift Concatenate packed 64-bit integers in "a" and "b" producing an intermediate 128-bit result. Shift the result left by the amount specified in the corresponding element of "c", and store the upper 64-bits in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k[j] tmp[127:0] := ((a[i+63:i] << 64)[127:0] | b[i+63:i]) << (c[i+63:i] & 63) dst[i+63:i] := tmp[127:64] ELSE dst[i+63:i] := a[i+63:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512_VBMI2 AVX512VL Shift Concatenate packed 64-bit integers in "a" and "b" producing an intermediate 128-bit result. Shift the result left by the amount specified in the corresponding element of "c", and store the upper 64-bits in "dst". FOR j := 0 to 3 i := j*64 tmp[127:0] := ((a[i+63:i] << 64)[127:0] | b[i+63:i]) << (c[i+63:i] & 63) dst[i+63:i] := tmp[127:64] ENDFOR dst[MAX:256] := 0
Integer AVX512_VBMI2 AVX512VL Shift Concatenate packed 64-bit integers in "a" and "b" producing an intermediate 128-bit result. Shift the result left by the amount specified in the corresponding element of "c", and store the upper 64-bits in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k[j] tmp[127:0] := ((a[i+63:i] << 64)[127:0] | b[i+63:i]) << (c[i+63:i] & 63) dst[i+63:i] := tmp[127:64] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512_VBMI2 AVX512VL Shift Concatenate packed 64-bit integers in "a" and "b" producing an intermediate 128-bit result. Shift the result left by the amount specified in the corresponding element of "c", and store the upper 64-bits in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k[j] tmp[127:0] := ((a[i+63:i] << 64)[127:0] | b[i+63:i]) << (c[i+63:i] & 63) dst[i+63:i] := tmp[127:64] ELSE dst[i+63:i] := a[i+63:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512_VBMI2 AVX512VL Shift Concatenate packed 64-bit integers in "a" and "b" producing an intermediate 128-bit result. Shift the result left by the amount specified in the corresponding element of "c", and store the upper 64-bits in "dst". FOR j := 0 to 1 i := j*64 tmp[127:0] := ((a[i+63:i] << 64)[127:0] | b[i+63:i]) << (c[i+63:i] & 63) dst[i+63:i] := tmp[127:64] ENDFOR dst[MAX:128] := 0
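The left-shift entries mirror the right-shift ones: "a" forms the upper half of the intermediate value, and the upper half of the shifted result is kept. A one-lane sketch, with the hypothetical helper name `shld_lane`:

```python
def shld_lane(a, b, c, width):
    """Hypothetical one-lane model: concatenate a:b, shift left, keep the high half.

    width is the element size in bits (16, 32 or 64).
    """
    lane_mask = (1 << width) - 1
    wide_mask = (1 << (2 * width)) - 1
    concat = ((a & lane_mask) << width) | (b & lane_mask)  # a is the upper half
    count = c & (width - 1)                                # count is taken modulo width
    tmp = (concat << count) & wide_mask                    # truncate to 2*width bits
    return tmp >> width                                    # upper half, e.g. tmp[127:64]
```

As with the right-shift form, passing the same value for `a` and `b` gives a rotate, here to the left.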
Integer AVX512_VBMI2 Shift Concatenate packed 32-bit integers in "a" and "b" producing an intermediate 64-bit result. Shift the result left by the amount specified in the corresponding element of "c", and store the upper 32-bits in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] tmp[63:0] := ((a[i+31:i] << 32)[63:0] | b[i+31:i]) << (c[i+31:i] & 31) dst[i+31:i] := tmp[63:32] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512_VBMI2 Shift Concatenate packed 32-bit integers in "a" and "b" producing an intermediate 64-bit result. Shift the result left by the amount specified in the corresponding element of "c", and store the upper 32-bits in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] tmp[63:0] := ((a[i+31:i] << 32)[63:0] | b[i+31:i]) << (c[i+31:i] & 31) dst[i+31:i] := tmp[63:32] ELSE dst[i+31:i] := a[i+31:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512_VBMI2 Shift Concatenate packed 32-bit integers in "a" and "b" producing an intermediate 64-bit result. Shift the result left by the amount specified in the corresponding element of "c", and store the upper 32-bits in "dst". FOR j := 0 to 15 i := j*32 tmp[63:0] := ((a[i+31:i] << 32)[63:0] | b[i+31:i]) << (c[i+31:i] & 31) dst[i+31:i] := tmp[63:32] ENDFOR dst[MAX:512] := 0
Integer AVX512_VBMI2 AVX512VL Shift Concatenate packed 32-bit integers in "a" and "b" producing an intermediate 64-bit result. Shift the result left by the amount specified in the corresponding element of "c", and store the upper 32-bits in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 IF k[j] tmp[63:0] := ((a[i+31:i] << 32)[63:0] | b[i+31:i]) << (c[i+31:i] & 31) dst[i+31:i] := tmp[63:32] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512_VBMI2 AVX512VL Shift Concatenate packed 32-bit integers in "a" and "b" producing an intermediate 64-bit result. Shift the result left by the amount specified in the corresponding element of "c", and store the upper 32-bits in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 IF k[j] tmp[63:0] := ((a[i+31:i] << 32)[63:0] | b[i+31:i]) << (c[i+31:i] & 31) dst[i+31:i] := tmp[63:32] ELSE dst[i+31:i] := a[i+31:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512_VBMI2 AVX512VL Shift Concatenate packed 32-bit integers in "a" and "b" producing an intermediate 64-bit result. Shift the result left by the amount specified in the corresponding element of "c", and store the upper 32-bits in "dst". FOR j := 0 to 7 i := j*32 tmp[63:0] := ((a[i+31:i] << 32)[63:0] | b[i+31:i]) << (c[i+31:i] & 31) dst[i+31:i] := tmp[63:32] ENDFOR dst[MAX:256] := 0
Integer AVX512_VBMI2 AVX512VL Shift Concatenate packed 32-bit integers in "a" and "b" producing an intermediate 64-bit result. Shift the result left by the amount specified in the corresponding element of "c", and store the upper 32-bits in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 IF k[j] tmp[63:0] := ((a[i+31:i] << 32)[63:0] | b[i+31:i]) << (c[i+31:i] & 31) dst[i+31:i] := tmp[63:32] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512_VBMI2 AVX512VL Shift Concatenate packed 32-bit integers in "a" and "b" producing an intermediate 64-bit result. Shift the result left by the amount specified in the corresponding element of "c", and store the upper 32-bits in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 IF k[j] tmp[63:0] := ((a[i+31:i] << 32)[63:0] | b[i+31:i]) << (c[i+31:i] & 31) dst[i+31:i] := tmp[63:32] ELSE dst[i+31:i] := a[i+31:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512_VBMI2 AVX512VL Shift Concatenate packed 32-bit integers in "a" and "b" producing an intermediate 64-bit result. Shift the result left by the amount specified in the corresponding element of "c", and store the upper 32-bits in "dst". FOR j := 0 to 3 i := j*32 tmp[63:0] := ((a[i+31:i] << 32)[63:0] | b[i+31:i]) << (c[i+31:i] & 31) dst[i+31:i] := tmp[63:32] ENDFOR dst[MAX:128] := 0
Integer AVX512_VBMI2 Shift Concatenate packed 16-bit integers in "a" and "b" producing an intermediate 32-bit result. Shift the result left by the amount specified in the corresponding element of "c", and store the upper 16-bits in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*16 IF k[j] tmp[31:0] := ((a[i+15:i] << 16)[31:0] | b[i+15:i]) << (c[i+15:i] & 15) dst[i+15:i] := tmp[31:16] ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512_VBMI2 Shift Concatenate packed 16-bit integers in "a" and "b" producing an intermediate 32-bit result. Shift the result left by the amount specified in the corresponding element of "c", and store the upper 16-bits in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*16 IF k[j] tmp[31:0] := ((a[i+15:i] << 16)[31:0] | b[i+15:i]) << (c[i+15:i] & 15) dst[i+15:i] := tmp[31:16] ELSE dst[i+15:i] := a[i+15:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512_VBMI2 Shift Concatenate packed 16-bit integers in "a" and "b" producing an intermediate 32-bit result. Shift the result left by the amount specified in the corresponding element of "c", and store the upper 16-bits in "dst". FOR j := 0 to 31 i := j*16 tmp[31:0] := ((a[i+15:i] << 16)[31:0] | b[i+15:i]) << (c[i+15:i] & 15) dst[i+15:i] := tmp[31:16] ENDFOR dst[MAX:512] := 0
Integer AVX512_VBMI2 AVX512VL Shift Concatenate packed 16-bit integers in "a" and "b" producing an intermediate 32-bit result. Shift the result left by the amount specified in the corresponding element of "c", and store the upper 16-bits in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*16 IF k[j] tmp[31:0] := ((a[i+15:i] << 16)[31:0] | b[i+15:i]) << (c[i+15:i] & 15) dst[i+15:i] := tmp[31:16] ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512_VBMI2 AVX512VL Shift Concatenate packed 16-bit integers in "a" and "b" producing an intermediate 32-bit result. Shift the result left by the amount specified in the corresponding element of "c", and store the upper 16-bits in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*16 IF k[j] tmp[31:0] := ((a[i+15:i] << 16)[31:0] | b[i+15:i]) << (c[i+15:i] & 15) dst[i+15:i] := tmp[31:16] ELSE dst[i+15:i] := a[i+15:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512_VBMI2 AVX512VL Shift Concatenate packed 16-bit integers in "a" and "b" producing an intermediate 32-bit result. Shift the result left by the amount specified in the corresponding element of "c", and store the upper 16-bits in "dst". FOR j := 0 to 15 i := j*16 tmp[31:0] := ((a[i+15:i] << 16)[31:0] | b[i+15:i]) << (c[i+15:i] & 15) dst[i+15:i] := tmp[31:16] ENDFOR dst[MAX:256] := 0
Integer AVX512_VBMI2 AVX512VL Shift Concatenate packed 16-bit integers in "a" and "b" producing an intermediate 32-bit result. Shift the result left by the amount specified in the corresponding element of "c", and store the upper 16-bits in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*16 IF k[j] tmp[31:0] := ((a[i+15:i] << 16)[31:0] | b[i+15:i]) << (c[i+15:i] & 15) dst[i+15:i] := tmp[31:16] ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512_VBMI2 AVX512VL Shift Concatenate packed 16-bit integers in "a" and "b" producing an intermediate 32-bit result. Shift the result left by the amount specified in the corresponding element of "c", and store the upper 16-bits in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*16 IF k[j] tmp[31:0] := ((a[i+15:i] << 16)[31:0] | b[i+15:i]) << (c[i+15:i] & 15) dst[i+15:i] := tmp[31:16] ELSE dst[i+15:i] := a[i+15:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512_VBMI2 AVX512VL Shift Concatenate packed 16-bit integers in "a" and "b" producing an intermediate 32-bit result. Shift the result left by the amount specified in the corresponding element of "c", and store the upper 16-bits in "dst". FOR j := 0 to 7 i := j*16 tmp[31:0] := ((a[i+15:i] << 16)[31:0] | b[i+15:i]) << (c[i+15:i] & 15) dst[i+15:i] := tmp[31:16] ENDFOR dst[MAX:128] := 0
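The variable-count entries above all share one per-lane rule, which can be sketched as a scalar reference model. This is only an illustration under stated assumptions: the function name `shldv_lanes` and its list-based interface are hypothetical and not part of any intrinsic API. Lanes are given as Python integers with lane 0 first, and `k` is an integer whose bit j controls lane j.

```python
def shldv_lanes(a, b, c, width, k=None, zeromask=False):
    """Hypothetical scalar model of the variable-count concatenate-and-shift-left.

    a, b, c  -- lists of unsigned lane values (lane 0 first)
    width    -- lane width in bits (16, 32 or 64)
    k        -- integer bitmask, or None for the unmasked form
    zeromask -- True: masked-off lanes become 0; False: they are copied from a
    """
    lane_mask = (1 << width) - 1
    out = []
    for j, (aj, bj, cj) in enumerate(zip(a, b, c)):
        if k is not None and not (k >> j) & 1:
            # Mask bit clear: zero the lane or pass the lane of "a" through.
            out.append(0 if zeromask else aj)
            continue
        count = cj & (width - 1)              # only the low log2(width) bits count
        tmp = ((aj << width) | bj) << count   # double-width concatenation, shifted
        out.append((tmp >> width) & lane_mask)  # keep the upper half of tmp
    return out
```

For example, with 16-bit lanes, `shldv_lanes([0x1234], [0xABCD], [4], 16)` yields `[0x234A]`: the top four bits of "b" are shifted into the bottom of "a". A count of 20 gives the same result, because the count is reduced modulo the lane width.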
Integer AVX512_VBMI2 Shift Concatenate packed 64-bit integers in "a" and "b" producing an intermediate 128-bit result. Shift the result left by "imm8" bits, and store the upper 64-bits in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] tmp[127:0] := ((a[i+63:i] << 64)[127:0] | b[i+63:i]) << imm8[5:0] dst[i+63:i] := tmp[127:64] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512_VBMI2 Shift Concatenate packed 64-bit integers in "a" and "b" producing an intermediate 128-bit result. Shift the result left by "imm8" bits, and store the upper 64-bits in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] tmp[127:0] := ((a[i+63:i] << 64)[127:0] | b[i+63:i]) << imm8[5:0] dst[i+63:i] := tmp[127:64] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512_VBMI2 Shift Concatenate packed 64-bit integers in "a" and "b" producing an intermediate 128-bit result. Shift the result left by "imm8" bits, and store the upper 64-bits in "dst". FOR j := 0 to 7 i := j*64 tmp[127:0] := ((a[i+63:i] << 64)[127:0] | b[i+63:i]) << imm8[5:0] dst[i+63:i] := tmp[127:64] ENDFOR dst[MAX:512] := 0
Integer AVX512_VBMI2 AVX512VL Shift Concatenate packed 64-bit integers in "a" and "b" producing an intermediate 128-bit result. Shift the result left by "imm8" bits, and store the upper 64-bits in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k[j] tmp[127:0] := ((a[i+63:i] << 64)[127:0] | b[i+63:i]) << imm8[5:0] dst[i+63:i] := tmp[127:64] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512_VBMI2 AVX512VL Shift Concatenate packed 64-bit integers in "a" and "b" producing an intermediate 128-bit result. Shift the result left by "imm8" bits, and store the upper 64-bits in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*64 IF k[j] tmp[127:0] := ((a[i+63:i] << 64)[127:0] | b[i+63:i]) << imm8[5:0] dst[i+63:i] := tmp[127:64] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512_VBMI2 AVX512VL Shift Concatenate packed 64-bit integers in "a" and "b" producing an intermediate 128-bit result. Shift the result left by "imm8" bits, and store the upper 64-bits in "dst". FOR j := 0 to 3 i := j*64 tmp[127:0] := ((a[i+63:i] << 64)[127:0] | b[i+63:i]) << imm8[5:0] dst[i+63:i] := tmp[127:64] ENDFOR dst[MAX:256] := 0
Integer AVX512_VBMI2 AVX512VL Shift Concatenate packed 64-bit integers in "a" and "b" producing an intermediate 128-bit result. Shift the result left by "imm8" bits, and store the upper 64-bits in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k[j] tmp[127:0] := ((a[i+63:i] << 64)[127:0] | b[i+63:i]) << imm8[5:0] dst[i+63:i] := tmp[127:64] ELSE dst[i+63:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512_VBMI2 AVX512VL Shift Concatenate packed 64-bit integers in "a" and "b" producing an intermediate 128-bit result. Shift the result left by "imm8" bits, and store the upper 64-bits in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 1 i := j*64 IF k[j] tmp[127:0] := ((a[i+63:i] << 64)[127:0] | b[i+63:i]) << imm8[5:0] dst[i+63:i] := tmp[127:64] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512_VBMI2 AVX512VL Shift Concatenate packed 64-bit integers in "a" and "b" producing an intermediate 128-bit result. Shift the result left by "imm8" bits, and store the upper 64-bits in "dst". FOR j := 0 to 1 i := j*64 tmp[127:0] := ((a[i+63:i] << 64)[127:0] | b[i+63:i]) << imm8[5:0] dst[i+63:i] := tmp[127:64] ENDFOR dst[MAX:128] := 0
Integer AVX512_VBMI2 Shift Concatenate packed 32-bit integers in "a" and "b" producing an intermediate 64-bit result. Shift the result left by "imm8" bits, and store the upper 32-bits in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] tmp[63:0] := ((a[i+31:i] << 32)[63:0] | b[i+31:i]) << imm8[4:0] dst[i+31:i] := tmp[63:32] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512_VBMI2 Shift Concatenate packed 32-bit integers in "a" and "b" producing an intermediate 64-bit result. Shift the result left by "imm8" bits, and store the upper 32-bits in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] tmp[63:0] := ((a[i+31:i] << 32)[63:0] | b[i+31:i]) << imm8[4:0] dst[i+31:i] := tmp[63:32] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512_VBMI2 Shift Concatenate packed 32-bit integers in "a" and "b" producing an intermediate 64-bit result. Shift the result left by "imm8" bits, and store the upper 32-bits in "dst". FOR j := 0 to 15 i := j*32 tmp[63:0] := ((a[i+31:i] << 32)[63:0] | b[i+31:i]) << imm8[4:0] dst[i+31:i] := tmp[63:32] ENDFOR dst[MAX:512] := 0
Integer AVX512_VBMI2 AVX512VL Shift Concatenate packed 32-bit integers in "a" and "b" producing an intermediate 64-bit result. Shift the result left by "imm8" bits, and store the upper 32-bits in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 IF k[j] tmp[63:0] := ((a[i+31:i] << 32)[63:0] | b[i+31:i]) << imm8[4:0] dst[i+31:i] := tmp[63:32] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512_VBMI2 AVX512VL Shift Concatenate packed 32-bit integers in "a" and "b" producing an intermediate 64-bit result. Shift the result left by "imm8" bits, and store the upper 32-bits in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 IF k[j] tmp[63:0] := ((a[i+31:i] << 32)[63:0] | b[i+31:i]) << imm8[4:0] dst[i+31:i] := tmp[63:32] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512_VBMI2 AVX512VL Shift Concatenate packed 32-bit integers in "a" and "b" producing an intermediate 64-bit result. Shift the result left by "imm8" bits, and store the upper 32-bits in "dst". FOR j := 0 to 7 i := j*32 tmp[63:0] := ((a[i+31:i] << 32)[63:0] | b[i+31:i]) << imm8[4:0] dst[i+31:i] := tmp[63:32] ENDFOR dst[MAX:256] := 0
Integer AVX512_VBMI2 AVX512VL Shift Concatenate packed 32-bit integers in "a" and "b" producing an intermediate 64-bit result. Shift the result left by "imm8" bits, and store the upper 32-bits in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 IF k[j] tmp[63:0] := ((a[i+31:i] << 32)[63:0] | b[i+31:i]) << imm8[4:0] dst[i+31:i] := tmp[63:32] ELSE dst[i+31:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512_VBMI2 AVX512VL Shift Concatenate packed 32-bit integers in "a" and "b" producing an intermediate 64-bit result. Shift the result left by "imm8" bits, and store the upper 32-bits in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 i := j*32 IF k[j] tmp[63:0] := ((a[i+31:i] << 32)[63:0] | b[i+31:i]) << imm8[4:0] dst[i+31:i] := tmp[63:32] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512_VBMI2 AVX512VL Shift Concatenate packed 32-bit integers in "a" and "b" producing an intermediate 64-bit result. Shift the result left by "imm8" bits, and store the upper 32-bits in "dst". FOR j := 0 to 3 i := j*32 tmp[63:0] := ((a[i+31:i] << 32)[63:0] | b[i+31:i]) << imm8[4:0] dst[i+31:i] := tmp[63:32] ENDFOR dst[MAX:128] := 0
Integer AVX512_VBMI2 Shift Concatenate packed 16-bit integers in "a" and "b" producing an intermediate 32-bit result. Shift the result left by "imm8" bits, and store the upper 16-bits in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*16 IF k[j] tmp[31:0] := ((a[i+15:i] << 16)[31:0] | b[i+15:i]) << imm8[3:0] dst[i+15:i] := tmp[31:16] ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512_VBMI2 Shift Concatenate packed 16-bit integers in "a" and "b" producing an intermediate 32-bit result. Shift the result left by "imm8" bits, and store the upper 16-bits in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 31 i := j*16 IF k[j] tmp[31:0] := ((a[i+15:i] << 16)[31:0] | b[i+15:i]) << imm8[3:0] dst[i+15:i] := tmp[31:16] ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512_VBMI2 Shift Concatenate packed 16-bit integers in "a" and "b" producing an intermediate 32-bit result. Shift the result left by "imm8" bits, and store the upper 16-bits in "dst". FOR j := 0 to 31 i := j*16 tmp[31:0] := ((a[i+15:i] << 16)[31:0] | b[i+15:i]) << imm8[3:0] dst[i+15:i] := tmp[31:16] ENDFOR dst[MAX:512] := 0
Integer AVX512_VBMI2 AVX512VL Shift Concatenate packed 16-bit integers in "a" and "b" producing an intermediate 32-bit result. Shift the result left by "imm8" bits, and store the upper 16-bits in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*16 IF k[j] tmp[31:0] := ((a[i+15:i] << 16)[31:0] | b[i+15:i]) << imm8[3:0] dst[i+15:i] := tmp[31:16] ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512_VBMI2 AVX512VL Shift Concatenate packed 16-bit integers in "a" and "b" producing an intermediate 32-bit result. Shift the result left by "imm8" bits, and store the upper 16-bits in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*16 IF k[j] tmp[31:0] := ((a[i+15:i] << 16)[31:0] | b[i+15:i]) << imm8[3:0] dst[i+15:i] := tmp[31:16] ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512_VBMI2 AVX512VL Shift Concatenate packed 16-bit integers in "a" and "b" producing an intermediate 32-bit result. Shift the result left by "imm8" bits, and store the upper 16-bits in "dst". FOR j := 0 to 15 i := j*16 tmp[31:0] := ((a[i+15:i] << 16)[31:0] | b[i+15:i]) << imm8[3:0] dst[i+15:i] := tmp[31:16] ENDFOR dst[MAX:256] := 0
Integer AVX512_VBMI2 AVX512VL Shift Concatenate packed 16-bit integers in "a" and "b" producing an intermediate 32-bit result. Shift the result left by "imm8" bits, and store the upper 16-bits in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*16 IF k[j] tmp[31:0] := ((a[i+15:i] << 16)[31:0] | b[i+15:i]) << imm8[3:0] dst[i+15:i] := tmp[31:16] ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512_VBMI2 AVX512VL Shift Concatenate packed 16-bit integers in "a" and "b" producing an intermediate 32-bit result. Shift the result left by "imm8" bits, and store the upper 16-bits in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*16 IF k[j] tmp[31:0] := ((a[i+15:i] << 16)[31:0] | b[i+15:i]) << imm8[3:0] dst[i+15:i] := tmp[31:16] ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512_VBMI2 AVX512VL Shift Concatenate packed 16-bit integers in "a" and "b" producing an intermediate 32-bit result. Shift the result left by "imm8" bits, and store the upper 16-bits in "dst". FOR j := 0 to 7 i := j*16 tmp[31:0] := ((a[i+15:i] << 16)[31:0] | b[i+15:i]) << imm8[3:0] dst[i+15:i] := tmp[31:16] ENDFOR dst[MAX:128] := 0
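The immediate-count entries above differ from the variable-count ones in two ways: every lane uses the same count taken from "imm8", and the writemask forms pass masked-off lanes through from a separate "src" operand. A minimal scalar sketch follows. The name `shldi_lanes` and its interface are hypothetical, and passing `src=None` together with a mask is this sketch's own convention for selecting the zeromask behaviour.

```python
def shldi_lanes(a, b, imm8, width, k=None, src=None):
    """Hypothetical scalar model of the immediate-count concatenate-and-shift-left.

    a, b  -- lists of unsigned lane values (lane 0 first)
    imm8  -- shift count; only its low log2(width) bits are used
    width -- lane width in bits (16, 32 or 64)
    k     -- integer bitmask, or None for the unmasked form
    src   -- passthrough lanes for the writemask form; None means zeromask
    """
    lane_mask = (1 << width) - 1
    count = imm8 & (width - 1)  # imm8[3:0], imm8[4:0] or imm8[5:0]
    out = []
    for j, (aj, bj) in enumerate(zip(a, b)):
        if k is not None and not (k >> j) & 1:
            # Mask bit clear: zero the lane or copy it from src.
            out.append(0 if src is None else src[j])
            continue
        tmp = ((aj << width) | bj) << count       # double-width value, shifted
        out.append((tmp >> width) & lane_mask)    # upper half of tmp
    return out
```

With 32-bit lanes, `shldi_lanes([0x80000001], [0xF0000000], 4, 32)` returns `[0x1F]`: the low 28 bits of "a" move up by four and the top four bits of "b" fill the gap.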
Integer AVX512_VBMI2 Load Swizzle Load contiguous active 16-bit integers from unaligned memory at "mem_addr" (those with their respective bit set in mask "k"), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). m := 0 FOR j := 0 to 31 i := j*16 IF k[j] dst[i+15:i] := MEM[mem_addr+m+15:mem_addr+m] m := m + 16 ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512_VBMI2 Load Swizzle Load contiguous active 16-bit integers from unaligned memory at "mem_addr" (those with their respective bit set in mask "k"), and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). m := 0 FOR j := 0 to 31 i := j*16 IF k[j] dst[i+15:i] := MEM[mem_addr+m+15:mem_addr+m] m := m + 16 ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512_VBMI2 Swizzle Load contiguous active 16-bit integers from "a" (those with their respective bit set in mask "k"), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). m := 0 FOR j := 0 to 31 i := j*16 IF k[j] dst[i+15:i] := a[m+15:m] m := m + 16 ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512_VBMI2 Swizzle Load contiguous active 16-bit integers from "a" (those with their respective bit set in mask "k"), and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). m := 0 FOR j := 0 to 31 i := j*16 IF k[j] dst[i+15:i] := a[m+15:m] m := m + 16 ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512_VBMI2 AVX512VL Load Swizzle Load contiguous active 16-bit integers from unaligned memory at "mem_addr" (those with their respective bit set in mask "k"), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). m := 0 FOR j := 0 to 15 i := j*16 IF k[j] dst[i+15:i] := MEM[mem_addr+m+15:mem_addr+m] m := m + 16 ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512_VBMI2 AVX512VL Load Swizzle Load contiguous active 16-bit integers from unaligned memory at "mem_addr" (those with their respective bit set in mask "k"), and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). m := 0 FOR j := 0 to 15 i := j*16 IF k[j] dst[i+15:i] := MEM[mem_addr+m+15:mem_addr+m] m := m + 16 ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512_VBMI2 AVX512VL Swizzle Load contiguous active 16-bit integers from "a" (those with their respective bit set in mask "k"), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). m := 0 FOR j := 0 to 15 i := j*16 IF k[j] dst[i+15:i] := a[m+15:m] m := m + 16 ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512_VBMI2 AVX512VL Swizzle Load contiguous active 16-bit integers from "a" (those with their respective bit set in mask "k"), and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). m := 0 FOR j := 0 to 15 i := j*16 IF k[j] dst[i+15:i] := a[m+15:m] m := m + 16 ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512_VBMI2 AVX512VL Load Swizzle Load contiguous active 16-bit integers from unaligned memory at "mem_addr" (those with their respective bit set in mask "k"), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). m := 0 FOR j := 0 to 7 i := j*16 IF k[j] dst[i+15:i] := MEM[mem_addr+m+15:mem_addr+m] m := m + 16 ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512_VBMI2 AVX512VL Load Swizzle Load contiguous active 16-bit integers from unaligned memory at "mem_addr" (those with their respective bit set in mask "k"), and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). m := 0 FOR j := 0 to 7 i := j*16 IF k[j] dst[i+15:i] := MEM[mem_addr+m+15:mem_addr+m] m := m + 16 ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512_VBMI2 AVX512VL Swizzle Load contiguous active 16-bit integers from "a" (those with their respective bit set in mask "k"), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). m := 0 FOR j := 0 to 7 i := j*16 IF k[j] dst[i+15:i] := a[m+15:m] m := m + 16 ELSE dst[i+15:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512_VBMI2 AVX512VL Swizzle Load contiguous active 16-bit integers from "a" (those with their respective bit set in mask "k"), and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). m := 0 FOR j := 0 to 7 i := j*16 IF k[j] dst[i+15:i] := a[m+15:m] m := m + 16 ELSE dst[i+15:i] := src[i+15:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512_VBMI2 Load Swizzle Load contiguous active 8-bit integers from unaligned memory at "mem_addr" (those with their respective bit set in mask "k"), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). m := 0 FOR j := 0 to 63 i := j*8 IF k[j] dst[i+7:i] := MEM[mem_addr+m+7:mem_addr+m] m := m + 8 ELSE dst[i+7:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512_VBMI2 Load Swizzle Load contiguous active 8-bit integers from unaligned memory at "mem_addr" (those with their respective bit set in mask "k"), and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). m := 0 FOR j := 0 to 63 i := j*8 IF k[j] dst[i+7:i] := MEM[mem_addr+m+7:mem_addr+m] m := m + 8 ELSE dst[i+7:i] := src[i+7:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512_VBMI2 AVX512VL Load Swizzle Load contiguous active 8-bit integers from unaligned memory at "mem_addr" (those with their respective bit set in mask "k"), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). m := 0 FOR j := 0 to 31 i := j*8 IF k[j] dst[i+7:i] := MEM[mem_addr+m+7:mem_addr+m] m := m + 8 ELSE dst[i+7:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512_VBMI2 AVX512VL Load Swizzle Load contiguous active 8-bit integers from unaligned memory at "mem_addr" (those with their respective bit set in mask "k"), and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). m := 0 FOR j := 0 to 31 i := j*8 IF k[j] dst[i+7:i] := MEM[mem_addr+m+7:mem_addr+m] m := m + 8 ELSE dst[i+7:i] := src[i+7:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512_VBMI2 AVX512VL Load Swizzle Load contiguous active 8-bit integers from unaligned memory at "mem_addr" (those with their respective bit set in mask "k"), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). m := 0 FOR j := 0 to 15 i := j*8 IF k[j] dst[i+7:i] := MEM[mem_addr+m+7:mem_addr+m] m := m + 8 ELSE dst[i+7:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512_VBMI2 AVX512VL Load Swizzle Load contiguous active 8-bit integers from unaligned memory at "mem_addr" (those with their respective bit set in mask "k"), and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). m := 0 FOR j := 0 to 15 i := j*8 IF k[j] dst[i+7:i] := MEM[mem_addr+m+7:mem_addr+m] m := m + 8 ELSE dst[i+7:i] := src[i+7:i] FI ENDFOR dst[MAX:128] := 0
Integer AVX512_VBMI2 Swizzle Load contiguous active 8-bit integers from "a" (those with their respective bit set in mask "k"), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). m := 0 FOR j := 0 to 63 i := j*8 IF k[j] dst[i+7:i] := a[m+7:m] m := m + 8 ELSE dst[i+7:i] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512_VBMI2 Swizzle Load contiguous active 8-bit integers from "a" (those with their respective bit set in mask "k"), and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). m := 0 FOR j := 0 to 63 i := j*8 IF k[j] dst[i+7:i] := a[m+7:m] m := m + 8 ELSE dst[i+7:i] := src[i+7:i] FI ENDFOR dst[MAX:512] := 0
Integer AVX512_VBMI2 AVX512VL Swizzle Load contiguous active 8-bit integers from "a" (those with their respective bit set in mask "k"), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). m := 0 FOR j := 0 to 31 i := j*8 IF k[j] dst[i+7:i] := a[m+7:m] m := m + 8 ELSE dst[i+7:i] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512_VBMI2 AVX512VL Swizzle Load contiguous active 8-bit integers from "a" (those with their respective bit set in mask "k"), and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). m := 0 FOR j := 0 to 31 i := j*8 IF k[j] dst[i+7:i] := a[m+7:m] m := m + 8 ELSE dst[i+7:i] := src[i+7:i] FI ENDFOR dst[MAX:256] := 0
Integer AVX512_VBMI2 AVX512VL Swizzle Load contiguous active 8-bit integers from "a" (those with their respective bit set in mask "k"), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). m := 0 FOR j := 0 to 15 i := j*8 IF k[j] dst[i+7:i] := a[m+7:m] m := m + 8 ELSE dst[i+7:i] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512_VBMI2 AVX512VL Swizzle Load contiguous active 8-bit integers from "a" (those with their respective bit set in mask "k"), and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). m := 0 FOR j := 0 to 15 i := j*8 IF k[j] dst[i+7:i] := a[m+7:m] m := m + 8 ELSE dst[i+7:i] := src[i+7:i] FI ENDFOR dst[MAX:128] := 0
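The expand entries above read consecutive source elements and scatter them to the lanes whose mask bit is set. The sketch below models that at element granularity, so the pseudocode's bit offset `m` becomes an element index. The name `expand_lanes` and its interface are hypothetical. The same model covers the memory forms if `packed` is taken to be the elements starting at "mem_addr"; only as many elements as there are set mask bits are read.

```python
def expand_lanes(packed, k, n, src=None):
    """Hypothetical scalar model of the masked expand operation.

    packed -- contiguous source elements (register "a" or memory), lowest first
    k      -- integer bitmask; bit j set means lane j receives the next element
    n      -- number of destination lanes
    src    -- passthrough lanes for the writemask form; None means zeromask
    """
    out = []
    m = 0  # index of the next unread element in packed
    for j in range(n):
        if (k >> j) & 1:
            out.append(packed[m])
            m += 1
        else:
            # Mask bit clear: zero the lane or copy it from src.
            out.append(0 if src is None else src[j])
    return out
```

For instance, `expand_lanes([10, 20, 30, 40], 0b1010, 4)` gives `[0, 10, 0, 20]`: the first two source elements land in lanes 1 and 3, and the remaining source elements are ignored.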
Integer AVX512_VBMI2 Store Swizzle Contiguously store the active 16-bit integers in "a" (those with their respective bit set in writemask "k") to unaligned memory at "base_addr". size := 16 m := base_addr FOR j := 0 to 31 i := j*16 IF k[j] MEM[m+size-1:m] := a[i+15:i] m := m + size FI ENDFOR
Integer AVX512_VBMI2 AVX512VL Store Swizzle Contiguously store the active 16-bit integers in "a" (those with their respective bit set in writemask "k") to unaligned memory at "base_addr". size := 16 m := base_addr FOR j := 0 to 15 i := j*16 IF k[j] MEM[m+size-1:m] := a[i+15:i] m := m + size FI ENDFOR
Integer AVX512_VBMI2 AVX512VL Store Swizzle Contiguously store the active 16-bit integers in "a" (those with their respective bit set in writemask "k") to unaligned memory at "base_addr". size := 16 m := base_addr FOR j := 0 to 7 i := j*16 IF k[j] MEM[m+size-1:m] := a[i+15:i] m := m + size FI ENDFOR
Integer AVX512_VBMI2 Swizzle Contiguously store the active 16-bit integers in "a" (those with their respective bit set in zeromask "k") to "dst", and set the remaining elements to zero. size := 16 m := 0 FOR j := 0 to 31 i := j*16 IF k[j] dst[m+size-1:m] := a[i+15:i] m := m + size FI ENDFOR dst[511:m] := 0 dst[MAX:512] := 0
Integer AVX512_VBMI2 Swizzle Contiguously store the active 16-bit integers in "a" (those with their respective bit set in writemask "k") to "dst", and pass through the remaining elements from "src". size := 16 m := 0 FOR j := 0 to 31 i := j*16 IF k[j] dst[m+size-1:m] := a[i+15:i] m := m + size FI ENDFOR dst[511:m] := src[511:m] dst[MAX:512] := 0
Integer AVX512_VBMI2 AVX512VL Swizzle Contiguously store the active 16-bit integers in "a" (those with their respective bit set in zeromask "k") to "dst", and set the remaining elements to zero. size := 16 m := 0 FOR j := 0 to 15 i := j*16 IF k[j] dst[m+size-1:m] := a[i+15:i] m := m + size FI ENDFOR dst[255:m] := 0 dst[MAX:256] := 0
Integer AVX512_VBMI2 AVX512VL Swizzle Contiguously store the active 16-bit integers in "a" (those with their respective bit set in writemask "k") to "dst", and pass through the remaining elements from "src". size := 16 m := 0 FOR j := 0 to 15 i := j*16 IF k[j] dst[m+size-1:m] := a[i+15:i] m := m + size FI ENDFOR dst[255:m] := src[255:m] dst[MAX:256] := 0
Integer AVX512_VBMI2 AVX512VL Swizzle Contiguously store the active 16-bit integers in "a" (those with their respective bit set in zeromask "k") to "dst", and set the remaining elements to zero. size := 16 m := 0 FOR j := 0 to 7 i := j*16 IF k[j] dst[m+size-1:m] := a[i+15:i] m := m + size FI ENDFOR dst[127:m] := 0 dst[MAX:128] := 0
Integer AVX512_VBMI2 AVX512VL Swizzle Contiguously store the active 16-bit integers in "a" (those with their respective bit set in writemask "k") to "dst", and pass through the remaining elements from "src". size := 16 m := 0 FOR j := 0 to 7 i := j*16 IF k[j] dst[m+size-1:m] := a[i+15:i] m := m + size FI ENDFOR dst[127:m] := src[127:m] dst[MAX:128] := 0
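The compress entries above are the inverse packing step: active lanes are gathered to the low end of the destination, and the lanes above them are either zeroed or taken from "src" at the same positions. A minimal sketch at element granularity follows. The names `compress_lanes` and `compress_store` are hypothetical, and memory is modelled as a Python list indexed in elements rather than bits.

```python
def compress_lanes(a, k, src=None):
    """Hypothetical scalar model of the register-destination compress forms."""
    n = len(a)
    # Gather lanes whose mask bit is set, preserving their order.
    active = [a[j] for j in range(n) if (k >> j) & 1]
    m = len(active)
    # Lanes m..n-1: zero (zeromask) or src lanes at the same positions (writemask).
    tail = [0] * (n - m) if src is None else list(src[m:n])
    return active + tail


def compress_store(memory, base, a, k):
    """Hypothetical model of the store forms: write active lanes at memory[base:].

    Returns the number of elements written; memory beyond them is untouched.
    """
    m = base
    for j, lane in enumerate(a):
        if (k >> j) & 1:
            memory[m] = lane
            m += 1
    return m - base
```

With `a = [10, 20, 30, 40]` and `k = 0b1010`, `compress_lanes(a, k)` returns `[20, 40, 0, 0]`, while `compress_lanes(a, k, src=[1, 2, 3, 4])` returns `[20, 40, 3, 4]`. The store form writes only the two active elements and leaves the rest of memory unchanged.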
Integer AVX512_VBMI2 Store Swizzle Contiguously store the active 8-bit integers in "a" (those with their respective bit set in writemask "k") to unaligned memory at "base_addr". size := 8 m := base_addr FOR j := 0 to 63 i := j*8 IF k[j] MEM[m+size-1:m] := a[i+7:i] m := m + size FI ENDFOR
Integer AVX512_VBMI2 AVX512VL Store Swizzle Contiguously store the active 8-bit integers in "a" (those with their respective bit set in writemask "k") to unaligned memory at "base_addr". size := 8 m := base_addr FOR j := 0 to 31 i := j*8 IF k[j] MEM[m+size-1:m] := a[i+7:i] m := m + size FI ENDFOR
Integer AVX512_VBMI2 AVX512VL Store Swizzle Contiguously store the active 8-bit integers in "a" (those with their respective bit set in writemask "k") to unaligned memory at "base_addr". size := 8 m := base_addr FOR j := 0 to 15 i := j*8 IF k[j] MEM[m+size-1:m] := a[i+7:i] m := m + size FI ENDFOR
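The three compress-store entries above write only the active bytes, so memory past the last stored element is left untouched. A minimal sketch in Python, assuming memory is modelled as a bytearray indexed by byte address (the pseudocode above counts "m" in bits instead); the name compress_store is hypothetical.

```python
def compress_store(mem, base_addr, a, k):
    """Scalar model of a byte compress-store.

    mem       : bytearray standing in for memory (assumption)
    base_addr : byte index of the first store
    a         : list of byte lanes (lane 0 first)
    k         : integer mask; bit j selects lane j of a
    Returns the number of bytes written.
    """
    m = base_addr
    for j, value in enumerate(a):
        if (k >> j) & 1:
            mem[m] = value & 0xFF  # store the active byte
            m += 1                 # advance only after a store
    return m - base_addr


mem = bytearray([0xEE] * 8)
written = compress_store(mem, 2, [1, 2, 3, 4], 0b1010)
print(written, list(mem))
```

Here lanes 1 and 3 are active, so two bytes (2 and 4) land at addresses 2 and 3 and every other byte keeps its 0xEE fill.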
Integer AVX512_VBMI2 Swizzle Contiguously store the active 8-bit integers in "a" (those with their respective bit set in zeromask "k") to "dst", and set the remaining elements to zero. size := 8 m := 0 FOR j := 0 to 63 i := j*8 IF k[j] dst[m+size-1:m] := a[i+7:i] m := m + size FI ENDFOR dst[511:m] := 0 dst[MAX:512] := 0
Integer AVX512_VBMI2 Swizzle Contiguously store the active 8-bit integers in "a" (those with their respective bit set in writemask "k") to "dst", and pass through the remaining elements from "src". size := 8 m := 0 FOR j := 0 to 63 i := j*8 IF k[j] dst[m+size-1:m] := a[i+7:i] m := m + size FI ENDFOR dst[511:m] := src[511:m] dst[MAX:512] := 0
Integer AVX512_VBMI2 AVX512VL Swizzle Contiguously store the active 8-bit integers in "a" (those with their respective bit set in zeromask "k") to "dst", and set the remaining elements to zero. size := 8 m := 0 FOR j := 0 to 31 i := j*8 IF k[j] dst[m+size-1:m] := a[i+7:i] m := m + size FI ENDFOR dst[255:m] := 0 dst[MAX:256] := 0
Integer AVX512_VBMI2 AVX512VL Swizzle Contiguously store the active 8-bit integers in "a" (those with their respective bit set in writemask "k") to "dst", and pass through the remaining elements from "src". size := 8 m := 0 FOR j := 0 to 31 i := j*8 IF k[j] dst[m+size-1:m] := a[i+7:i] m := m + size FI ENDFOR dst[255:m] := src[255:m] dst[MAX:256] := 0
Integer AVX512_VBMI2 AVX512VL Swizzle Contiguously store the active 8-bit integers in "a" (those with their respective bit set in zeromask "k") to "dst", and set the remaining elements to zero. size := 8 m := 0 FOR j := 0 to 15 i := j*8 IF k[j] dst[m+size-1:m] := a[i+7:i] m := m + size FI ENDFOR dst[127:m] := 0 dst[MAX:128] := 0
Integer AVX512_VBMI2 AVX512VL Swizzle Contiguously store the active 8-bit integers in "a" (those with their respective bit set in writemask "k") to "dst", and pass through the remaining elements from "src". size := 8 m := 0 FOR j := 0 to 15 i := j*8 IF k[j] dst[m+size-1:m] := a[i+7:i] m := m + size FI ENDFOR dst[127:m] := src[127:m] dst[MAX:128] := 0
Integer AVX512_VNNI Arithmetic Multiply groups of 2 adjacent pairs of signed 16-bit integers in "a" with corresponding 16-bit integers in "b", producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in "src" using signed saturation, and store the packed 32-bit results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 IF k[j] tmp1.dword := SignExtend32(a.word[2*j]) * SignExtend32(b.word[2*j]) tmp2.dword := SignExtend32(a.word[2*j+1]) * SignExtend32(b.word[2*j+1]) dst.dword[j] := Saturate32(src.dword[j] + tmp1 + tmp2) ELSE dst.dword[j] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512_VNNI Arithmetic Multiply groups of 2 adjacent pairs of signed 16-bit integers in "a" with corresponding 16-bit integers in "b", producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in "src" using signed saturation, and store the packed 32-bit results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 IF k[j] tmp1.dword := SignExtend32(a.word[2*j]) * SignExtend32(b.word[2*j]) tmp2.dword := SignExtend32(a.word[2*j+1]) * SignExtend32(b.word[2*j+1]) dst.dword[j] := Saturate32(src.dword[j] + tmp1 + tmp2) ELSE dst.dword[j] := src.dword[j] FI ENDFOR dst[MAX:512] := 0
Integer AVX512_VNNI Arithmetic Multiply groups of 2 adjacent pairs of signed 16-bit integers in "a" with corresponding 16-bit integers in "b", producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in "src" using signed saturation, and store the packed 32-bit results in "dst". FOR j := 0 to 15 tmp1.dword := SignExtend32(a.word[2*j]) * SignExtend32(b.word[2*j]) tmp2.dword := SignExtend32(a.word[2*j+1]) * SignExtend32(b.word[2*j+1]) dst.dword[j] := Saturate32(src.dword[j] + tmp1 + tmp2) ENDFOR dst[MAX:512] := 0
Integer AVX512_VNNI AVX512VL Arithmetic Multiply groups of 2 adjacent pairs of signed 16-bit integers in "a" with corresponding 16-bit integers in "b", producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in "src" using signed saturation, and store the packed 32-bit results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 IF k[j] tmp1.dword := SignExtend32(a.word[2*j]) * SignExtend32(b.word[2*j]) tmp2.dword := SignExtend32(a.word[2*j+1]) * SignExtend32(b.word[2*j+1]) dst.dword[j] := Saturate32(src.dword[j] + tmp1 + tmp2) ELSE dst.dword[j] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512_VNNI AVX512VL Arithmetic Multiply groups of 2 adjacent pairs of signed 16-bit integers in "a" with corresponding 16-bit integers in "b", producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in "src" using signed saturation, and store the packed 32-bit results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 IF k[j] tmp1.dword := SignExtend32(a.word[2*j]) * SignExtend32(b.word[2*j]) tmp2.dword := SignExtend32(a.word[2*j+1]) * SignExtend32(b.word[2*j+1]) dst.dword[j] := Saturate32(src.dword[j] + tmp1 + tmp2) ELSE dst.dword[j] := src.dword[j] FI ENDFOR dst[MAX:256] := 0
Integer AVX512_VNNI AVX512VL Arithmetic Multiply groups of 2 adjacent pairs of signed 16-bit integers in "a" with corresponding 16-bit integers in "b", producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in "src" using signed saturation, and store the packed 32-bit results in "dst". FOR j := 0 to 7 tmp1.dword := SignExtend32(a.word[2*j]) * SignExtend32(b.word[2*j]) tmp2.dword := SignExtend32(a.word[2*j+1]) * SignExtend32(b.word[2*j+1]) dst.dword[j] := Saturate32(src.dword[j] + tmp1 + tmp2) ENDFOR dst[MAX:256] := 0
Integer AVX512_VNNI AVX512VL Arithmetic Multiply groups of 2 adjacent pairs of signed 16-bit integers in "a" with corresponding 16-bit integers in "b", producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in "src" using signed saturation, and store the packed 32-bit results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 IF k[j] tmp1.dword := SignExtend32(a.word[2*j]) * SignExtend32(b.word[2*j]) tmp2.dword := SignExtend32(a.word[2*j+1]) * SignExtend32(b.word[2*j+1]) dst.dword[j] := Saturate32(src.dword[j] + tmp1 + tmp2) ELSE dst.dword[j] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512_VNNI AVX512VL Arithmetic Multiply groups of 2 adjacent pairs of signed 16-bit integers in "a" with corresponding 16-bit integers in "b", producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in "src" using signed saturation, and store the packed 32-bit results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 IF k[j] tmp1.dword := SignExtend32(a.word[2*j]) * SignExtend32(b.word[2*j]) tmp2.dword := SignExtend32(a.word[2*j+1]) * SignExtend32(b.word[2*j+1]) dst.dword[j] := Saturate32(src.dword[j] + tmp1 + tmp2) ELSE dst.dword[j] := src.dword[j] FI ENDFOR dst[MAX:128] := 0
Integer AVX512_VNNI AVX512VL Arithmetic Multiply groups of 2 adjacent pairs of signed 16-bit integers in "a" with corresponding 16-bit integers in "b", producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in "src" using signed saturation, and store the packed 32-bit results in "dst". FOR j := 0 to 3 tmp1.dword := SignExtend32(a.word[2*j]) * SignExtend32(b.word[2*j]) tmp2.dword := SignExtend32(a.word[2*j+1]) * SignExtend32(b.word[2*j+1]) dst.dword[j] := Saturate32(src.dword[j] + tmp1 + tmp2) ENDFOR dst[MAX:128] := 0
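The nine saturating 16-bit entries above share one per-lane computation and differ only in lane count and masking. The following Python sketch models that computation on lists of signed integers. It is an illustration under stated assumptions, not the intrinsic: the function names and the "zero" flag that selects zeromask versus writemask are hypothetical.

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def saturate32(x):
    """Clamp an integer to the signed 32-bit range."""
    return max(INT32_MIN, min(INT32_MAX, x))


def dot2_accumulate_sat(src, a, b, k=None, zero=False):
    """src: signed 32-bit lanes; a, b: signed 16-bit lanes, two per src lane.

    k    : integer mask, or None for the unmasked form
    zero : True models a zeromask, False a writemask (assumed flag)
    """
    dst = []
    for j, s in enumerate(src):
        if k is None or (k >> j) & 1:
            tmp1 = a[2 * j] * b[2 * j]
            tmp2 = a[2 * j + 1] * b[2 * j + 1]
            dst.append(saturate32(s + tmp1 + tmp2))
        else:
            dst.append(0 if zero else s)
    return dst


src = [INT32_MAX, 5]
a = [1, 1, 2, 3]
b = [1, 1, 4, 5]
print(dot2_accumulate_sat(src, a, b))
print(dot2_accumulate_sat(src, a, b, k=0b01, zero=True))
```

Lane 0 would be INT32_MAX + 2 and is clamped to INT32_MAX; lane 1 is 5 + 8 + 15 = 28. With mask 0b01 and a zeromask, lane 1 becomes 0 instead.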
Integer AVX512_VNNI Arithmetic Multiply groups of 2 adjacent pairs of signed 16-bit integers in "a" with corresponding 16-bit integers in "b", producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in "src", and store the packed 32-bit results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 IF k[j] tmp1.dword := SignExtend32(a.word[2*j]) * SignExtend32(b.word[2*j]) tmp2.dword := SignExtend32(a.word[2*j+1]) * SignExtend32(b.word[2*j+1]) dst.dword[j] := src.dword[j] + tmp1 + tmp2 ELSE dst.dword[j] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512_VNNI Arithmetic Multiply groups of 2 adjacent pairs of signed 16-bit integers in "a" with corresponding 16-bit integers in "b", producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in "src", and store the packed 32-bit results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 IF k[j] tmp1.dword := SignExtend32(a.word[2*j]) * SignExtend32(b.word[2*j]) tmp2.dword := SignExtend32(a.word[2*j+1]) * SignExtend32(b.word[2*j+1]) dst.dword[j] := src.dword[j] + tmp1 + tmp2 ELSE dst.dword[j] := src.dword[j] FI ENDFOR dst[MAX:512] := 0
Integer AVX512_VNNI Arithmetic Multiply groups of 2 adjacent pairs of signed 16-bit integers in "a" with corresponding 16-bit integers in "b", producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in "src", and store the packed 32-bit results in "dst". FOR j := 0 to 15 tmp1.dword := SignExtend32(a.word[2*j]) * SignExtend32(b.word[2*j]) tmp2.dword := SignExtend32(a.word[2*j+1]) * SignExtend32(b.word[2*j+1]) dst.dword[j] := src.dword[j] + tmp1 + tmp2 ENDFOR dst[MAX:512] := 0
Integer AVX512_VNNI AVX512VL Arithmetic Multiply groups of 2 adjacent pairs of signed 16-bit integers in "a" with corresponding 16-bit integers in "b", producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in "src", and store the packed 32-bit results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 IF k[j] tmp1.dword := SignExtend32(a.word[2*j]) * SignExtend32(b.word[2*j]) tmp2.dword := SignExtend32(a.word[2*j+1]) * SignExtend32(b.word[2*j+1]) dst.dword[j] := src.dword[j] + tmp1 + tmp2 ELSE dst.dword[j] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512_VNNI AVX512VL Arithmetic Multiply groups of 2 adjacent pairs of signed 16-bit integers in "a" with corresponding 16-bit integers in "b", producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in "src", and store the packed 32-bit results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 IF k[j] tmp1.dword := SignExtend32(a.word[2*j]) * SignExtend32(b.word[2*j]) tmp2.dword := SignExtend32(a.word[2*j+1]) * SignExtend32(b.word[2*j+1]) dst.dword[j] := src.dword[j] + tmp1 + tmp2 ELSE dst.dword[j] := src.dword[j] FI ENDFOR dst[MAX:256] := 0
Integer AVX512_VNNI AVX512VL Arithmetic Multiply groups of 2 adjacent pairs of signed 16-bit integers in "a" with corresponding 16-bit integers in "b", producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in "src", and store the packed 32-bit results in "dst". FOR j := 0 to 7 tmp1.dword := SignExtend32(a.word[2*j]) * SignExtend32(b.word[2*j]) tmp2.dword := SignExtend32(a.word[2*j+1]) * SignExtend32(b.word[2*j+1]) dst.dword[j] := src.dword[j] + tmp1 + tmp2 ENDFOR dst[MAX:256] := 0
Integer AVX512_VNNI AVX512VL Arithmetic Multiply groups of 2 adjacent pairs of signed 16-bit integers in "a" with corresponding 16-bit integers in "b", producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in "src", and store the packed 32-bit results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 IF k[j] tmp1.dword := SignExtend32(a.word[2*j]) * SignExtend32(b.word[2*j]) tmp2.dword := SignExtend32(a.word[2*j+1]) * SignExtend32(b.word[2*j+1]) dst.dword[j] := src.dword[j] + tmp1 + tmp2 ELSE dst.dword[j] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512_VNNI AVX512VL Arithmetic Multiply groups of 2 adjacent pairs of signed 16-bit integers in "a" with corresponding 16-bit integers in "b", producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in "src", and store the packed 32-bit results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 IF k[j] tmp1.dword := SignExtend32(a.word[2*j]) * SignExtend32(b.word[2*j]) tmp2.dword := SignExtend32(a.word[2*j+1]) * SignExtend32(b.word[2*j+1]) dst.dword[j] := src.dword[j] + tmp1 + tmp2 ELSE dst.dword[j] := src.dword[j] FI ENDFOR dst[MAX:128] := 0
Integer AVX512_VNNI AVX512VL Arithmetic Multiply groups of 2 adjacent pairs of signed 16-bit integers in "a" with corresponding 16-bit integers in "b", producing 2 intermediate signed 32-bit results. Sum these 2 results with the corresponding 32-bit integer in "src", and store the packed 32-bit results in "dst". FOR j := 0 to 3 tmp1.dword := SignExtend32(a.word[2*j]) * SignExtend32(b.word[2*j]) tmp2.dword := SignExtend32(a.word[2*j+1]) * SignExtend32(b.word[2*j+1]) dst.dword[j] := src.dword[j] + tmp1 + tmp2 ENDFOR dst[MAX:128] := 0
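The nine entries above omit Saturate32, so the 32-bit sum wraps on overflow. A minimal single-lane sketch in Python, assuming two's-complement wraparound is the intended reading of a plain 32-bit store; the helper names are hypothetical.

```python
def wrap32(x):
    """Reduce an integer to a signed 32-bit value with two's-complement wraparound."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


def dot2_accumulate_lane(s, a0, b0, a1, b1):
    """One dword lane: s + a0*b0 + a1*b1, truncated to 32 bits."""
    return wrap32(s + a0 * b0 + a1 * b1)


print(dot2_accumulate_lane(5, 2, 4, 3, 5))
print(dot2_accumulate_lane(2**31 - 1, 1, 1, 0, 0))
print(dot2_accumulate_lane(0, -32768, -32768, -32768, -32768))
```

The first call gives 28. The second wraps from 2^31 - 1 to -2^31. In the third, the two products of 2^30 sum to 2^31, which also wraps to -2^31 even though "src" is zero.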
Integer AVX512_VNNI Arithmetic Multiply groups of 4 adjacent pairs of unsigned 8-bit integers in "a" with corresponding signed 8-bit integers in "b", producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in "src" using signed saturation, and store the packed 32-bit results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 IF k[j] tmp1.word := Signed(ZeroExtend16(a.byte[4*j]) * SignExtend16(b.byte[4*j])) tmp2.word := Signed(ZeroExtend16(a.byte[4*j+1]) * SignExtend16(b.byte[4*j+1])) tmp3.word := Signed(ZeroExtend16(a.byte[4*j+2]) * SignExtend16(b.byte[4*j+2])) tmp4.word := Signed(ZeroExtend16(a.byte[4*j+3]) * SignExtend16(b.byte[4*j+3])) dst.dword[j] := Saturate32(src.dword[j] + tmp1 + tmp2 + tmp3 + tmp4) ELSE dst.dword[j] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512_VNNI Arithmetic Multiply groups of 4 adjacent pairs of unsigned 8-bit integers in "a" with corresponding signed 8-bit integers in "b", producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in "src" using signed saturation, and store the packed 32-bit results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 IF k[j] tmp1.word := Signed(ZeroExtend16(a.byte[4*j]) * SignExtend16(b.byte[4*j])) tmp2.word := Signed(ZeroExtend16(a.byte[4*j+1]) * SignExtend16(b.byte[4*j+1])) tmp3.word := Signed(ZeroExtend16(a.byte[4*j+2]) * SignExtend16(b.byte[4*j+2])) tmp4.word := Signed(ZeroExtend16(a.byte[4*j+3]) * SignExtend16(b.byte[4*j+3])) dst.dword[j] := Saturate32(src.dword[j] + tmp1 + tmp2 + tmp3 + tmp4) ELSE dst.dword[j] := src.dword[j] FI ENDFOR dst[MAX:512] := 0
Integer AVX512_VNNI Arithmetic Multiply groups of 4 adjacent pairs of unsigned 8-bit integers in "a" with corresponding signed 8-bit integers in "b", producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in "src" using signed saturation, and store the packed 32-bit results in "dst". FOR j := 0 to 15 tmp1.word := Signed(ZeroExtend16(a.byte[4*j]) * SignExtend16(b.byte[4*j])) tmp2.word := Signed(ZeroExtend16(a.byte[4*j+1]) * SignExtend16(b.byte[4*j+1])) tmp3.word := Signed(ZeroExtend16(a.byte[4*j+2]) * SignExtend16(b.byte[4*j+2])) tmp4.word := Signed(ZeroExtend16(a.byte[4*j+3]) * SignExtend16(b.byte[4*j+3])) dst.dword[j] := Saturate32(src.dword[j] + tmp1 + tmp2 + tmp3 + tmp4) ENDFOR dst[MAX:512] := 0
Integer AVX512_VNNI AVX512VL Arithmetic Multiply groups of 4 adjacent pairs of unsigned 8-bit integers in "a" with corresponding signed 8-bit integers in "b", producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in "src" using signed saturation, and store the packed 32-bit results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 IF k[j] tmp1.word := Signed(ZeroExtend16(a.byte[4*j]) * SignExtend16(b.byte[4*j])) tmp2.word := Signed(ZeroExtend16(a.byte[4*j+1]) * SignExtend16(b.byte[4*j+1])) tmp3.word := Signed(ZeroExtend16(a.byte[4*j+2]) * SignExtend16(b.byte[4*j+2])) tmp4.word := Signed(ZeroExtend16(a.byte[4*j+3]) * SignExtend16(b.byte[4*j+3])) dst.dword[j] := Saturate32(src.dword[j] + tmp1 + tmp2 + tmp3 + tmp4) ELSE dst.dword[j] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512_VNNI AVX512VL Arithmetic Multiply groups of 4 adjacent pairs of unsigned 8-bit integers in "a" with corresponding signed 8-bit integers in "b", producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in "src" using signed saturation, and store the packed 32-bit results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 IF k[j] tmp1.word := Signed(ZeroExtend16(a.byte[4*j]) * SignExtend16(b.byte[4*j])) tmp2.word := Signed(ZeroExtend16(a.byte[4*j+1]) * SignExtend16(b.byte[4*j+1])) tmp3.word := Signed(ZeroExtend16(a.byte[4*j+2]) * SignExtend16(b.byte[4*j+2])) tmp4.word := Signed(ZeroExtend16(a.byte[4*j+3]) * SignExtend16(b.byte[4*j+3])) dst.dword[j] := Saturate32(src.dword[j] + tmp1 + tmp2 + tmp3 + tmp4) ELSE dst.dword[j] := src.dword[j] FI ENDFOR dst[MAX:256] := 0
Integer AVX512_VNNI AVX512VL Arithmetic Multiply groups of 4 adjacent pairs of unsigned 8-bit integers in "a" with corresponding signed 8-bit integers in "b", producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in "src" using signed saturation, and store the packed 32-bit results in "dst". FOR j := 0 to 7 tmp1.word := Signed(ZeroExtend16(a.byte[4*j]) * SignExtend16(b.byte[4*j])) tmp2.word := Signed(ZeroExtend16(a.byte[4*j+1]) * SignExtend16(b.byte[4*j+1])) tmp3.word := Signed(ZeroExtend16(a.byte[4*j+2]) * SignExtend16(b.byte[4*j+2])) tmp4.word := Signed(ZeroExtend16(a.byte[4*j+3]) * SignExtend16(b.byte[4*j+3])) dst.dword[j] := Saturate32(src.dword[j] + tmp1 + tmp2 + tmp3 + tmp4) ENDFOR dst[MAX:256] := 0
Integer AVX512_VNNI AVX512VL Arithmetic Multiply groups of 4 adjacent pairs of unsigned 8-bit integers in "a" with corresponding signed 8-bit integers in "b", producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in "src" using signed saturation, and store the packed 32-bit results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 IF k[j] tmp1.word := Signed(ZeroExtend16(a.byte[4*j]) * SignExtend16(b.byte[4*j])) tmp2.word := Signed(ZeroExtend16(a.byte[4*j+1]) * SignExtend16(b.byte[4*j+1])) tmp3.word := Signed(ZeroExtend16(a.byte[4*j+2]) * SignExtend16(b.byte[4*j+2])) tmp4.word := Signed(ZeroExtend16(a.byte[4*j+3]) * SignExtend16(b.byte[4*j+3])) dst.dword[j] := Saturate32(src.dword[j] + tmp1 + tmp2 + tmp3 + tmp4) ELSE dst.dword[j] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512_VNNI AVX512VL Arithmetic Multiply groups of 4 adjacent pairs of unsigned 8-bit integers in "a" with corresponding signed 8-bit integers in "b", producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in "src" using signed saturation, and store the packed 32-bit results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 IF k[j] tmp1.word := Signed(ZeroExtend16(a.byte[4*j]) * SignExtend16(b.byte[4*j])) tmp2.word := Signed(ZeroExtend16(a.byte[4*j+1]) * SignExtend16(b.byte[4*j+1])) tmp3.word := Signed(ZeroExtend16(a.byte[4*j+2]) * SignExtend16(b.byte[4*j+2])) tmp4.word := Signed(ZeroExtend16(a.byte[4*j+3]) * SignExtend16(b.byte[4*j+3])) dst.dword[j] := Saturate32(src.dword[j] + tmp1 + tmp2 + tmp3 + tmp4) ELSE dst.dword[j] := src.dword[j] FI ENDFOR dst[MAX:128] := 0
Integer AVX512_VNNI AVX512VL Arithmetic Multiply groups of 4 adjacent pairs of unsigned 8-bit integers in "a" with corresponding signed 8-bit integers in "b", producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in "src" using signed saturation, and store the packed 32-bit results in "dst". FOR j := 0 to 3 tmp1.word := Signed(ZeroExtend16(a.byte[4*j]) * SignExtend16(b.byte[4*j])) tmp2.word := Signed(ZeroExtend16(a.byte[4*j+1]) * SignExtend16(b.byte[4*j+1])) tmp3.word := Signed(ZeroExtend16(a.byte[4*j+2]) * SignExtend16(b.byte[4*j+2])) tmp4.word := Signed(ZeroExtend16(a.byte[4*j+3]) * SignExtend16(b.byte[4*j+3])) dst.dword[j] := Saturate32(src.dword[j] + tmp1 + tmp2 + tmp3 + tmp4) ENDFOR dst[MAX:128] := 0
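The 8-bit entries above treat the bytes of "a" as unsigned and the bytes of "b" as signed. The following single-lane Python sketch makes that asymmetry explicit by taking both operands as raw byte values (0 to 255) and reinterpreting only those from "b". The function name is hypothetical, and the sketch models the pseudocode rather than calling any intrinsic.

```python
def dot4_accumulate_sat_lane(s, a_bytes, b_bytes):
    """One dword lane of the saturating unsigned-by-signed byte dot product.

    s       : signed 32-bit accumulator value from src
    a_bytes : four raw bytes, read as unsigned (0..255)
    b_bytes : four raw bytes, read as signed (-128..127)
    """
    total = s
    for ua, raw_b in zip(a_bytes, b_bytes):
        sb = raw_b - 256 if raw_b >= 128 else raw_b  # sign-extend the byte from b
        total += ua * sb                             # each product fits in 16 signed bits
    # Clamp the full-precision sum to the signed 32-bit range.
    return max(-2**31, min(2**31 - 1, total))


print(dot4_accumulate_sat_lane(0, [255] * 4, [127] * 4))
print(dot4_accumulate_sat_lane(0, [255] * 4, [0x80] * 4))
print(dot4_accumulate_sat_lane(2**31 - 1, [1, 0, 0, 0], [1, 0, 0, 0]))
```

The extreme products are 255 * 127 = 32385 and 255 * -128 = -32640, both inside the signed 16-bit range, which is why the intermediate results can be described as signed 16-bit. The three calls give 129540, -130560 and 2^31 - 1 (saturated).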
Integer AVX512_VNNI Arithmetic Multiply groups of 4 adjacent pairs of unsigned 8-bit integers in "a" with corresponding signed 8-bit integers in "b", producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in "src", and store the packed 32-bit results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 IF k[j] tmp1.word := Signed(ZeroExtend16(a.byte[4*j]) * SignExtend16(b.byte[4*j])) tmp2.word := Signed(ZeroExtend16(a.byte[4*j+1]) * SignExtend16(b.byte[4*j+1])) tmp3.word := Signed(ZeroExtend16(a.byte[4*j+2]) * SignExtend16(b.byte[4*j+2])) tmp4.word := Signed(ZeroExtend16(a.byte[4*j+3]) * SignExtend16(b.byte[4*j+3])) dst.dword[j] := src.dword[j] + tmp1 + tmp2 + tmp3 + tmp4 ELSE dst.dword[j] := 0 FI ENDFOR dst[MAX:512] := 0
Integer AVX512_VNNI Arithmetic Multiply groups of 4 adjacent pairs of unsigned 8-bit integers in "a" with corresponding signed 8-bit integers in "b", producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in "src", and store the packed 32-bit results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 IF k[j] tmp1.word := Signed(ZeroExtend16(a.byte[4*j]) * SignExtend16(b.byte[4*j])) tmp2.word := Signed(ZeroExtend16(a.byte[4*j+1]) * SignExtend16(b.byte[4*j+1])) tmp3.word := Signed(ZeroExtend16(a.byte[4*j+2]) * SignExtend16(b.byte[4*j+2])) tmp4.word := Signed(ZeroExtend16(a.byte[4*j+3]) * SignExtend16(b.byte[4*j+3])) dst.dword[j] := src.dword[j] + tmp1 + tmp2 + tmp3 + tmp4 ELSE dst.dword[j] := src.dword[j] FI ENDFOR dst[MAX:512] := 0
Integer AVX512_VNNI Arithmetic Multiply groups of 4 adjacent pairs of unsigned 8-bit integers in "a" with corresponding signed 8-bit integers in "b", producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in "src", and store the packed 32-bit results in "dst". FOR j := 0 to 15 tmp1.word := Signed(ZeroExtend16(a.byte[4*j]) * SignExtend16(b.byte[4*j])) tmp2.word := Signed(ZeroExtend16(a.byte[4*j+1]) * SignExtend16(b.byte[4*j+1])) tmp3.word := Signed(ZeroExtend16(a.byte[4*j+2]) * SignExtend16(b.byte[4*j+2])) tmp4.word := Signed(ZeroExtend16(a.byte[4*j+3]) * SignExtend16(b.byte[4*j+3])) dst.dword[j] := src.dword[j] + tmp1 + tmp2 + tmp3 + tmp4 ENDFOR dst[MAX:512] := 0
Integer AVX512_VNNI AVX512VL Arithmetic Multiply groups of 4 adjacent pairs of unsigned 8-bit integers in "a" with corresponding signed 8-bit integers in "b", producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in "src", and store the packed 32-bit results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 7 IF k[j] tmp1.word := Signed(ZeroExtend16(a.byte[4*j]) * SignExtend16(b.byte[4*j])) tmp2.word := Signed(ZeroExtend16(a.byte[4*j+1]) * SignExtend16(b.byte[4*j+1])) tmp3.word := Signed(ZeroExtend16(a.byte[4*j+2]) * SignExtend16(b.byte[4*j+2])) tmp4.word := Signed(ZeroExtend16(a.byte[4*j+3]) * SignExtend16(b.byte[4*j+3])) dst.dword[j] := src.dword[j] + tmp1 + tmp2 + tmp3 + tmp4 ELSE dst.dword[j] := 0 FI ENDFOR dst[MAX:256] := 0
Integer AVX512_VNNI AVX512VL Arithmetic Multiply groups of 4 adjacent pairs of unsigned 8-bit integers in "a" with corresponding signed 8-bit integers in "b", producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in "src", and store the packed 32-bit results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 IF k[j] tmp1.word := Signed(ZeroExtend16(a.byte[4*j]) * SignExtend16(b.byte[4*j])) tmp2.word := Signed(ZeroExtend16(a.byte[4*j+1]) * SignExtend16(b.byte[4*j+1])) tmp3.word := Signed(ZeroExtend16(a.byte[4*j+2]) * SignExtend16(b.byte[4*j+2])) tmp4.word := Signed(ZeroExtend16(a.byte[4*j+3]) * SignExtend16(b.byte[4*j+3])) dst.dword[j] := src.dword[j] + tmp1 + tmp2 + tmp3 + tmp4 ELSE dst.dword[j] := src.dword[j] FI ENDFOR dst[MAX:256] := 0
Integer AVX512_VNNI AVX512VL Arithmetic Multiply groups of 4 adjacent pairs of unsigned 8-bit integers in "a" with corresponding signed 8-bit integers in "b", producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in "src", and store the packed 32-bit results in "dst". FOR j := 0 to 7 tmp1.word := Signed(ZeroExtend16(a.byte[4*j]) * SignExtend16(b.byte[4*j])) tmp2.word := Signed(ZeroExtend16(a.byte[4*j+1]) * SignExtend16(b.byte[4*j+1])) tmp3.word := Signed(ZeroExtend16(a.byte[4*j+2]) * SignExtend16(b.byte[4*j+2])) tmp4.word := Signed(ZeroExtend16(a.byte[4*j+3]) * SignExtend16(b.byte[4*j+3])) dst.dword[j] := src.dword[j] + tmp1 + tmp2 + tmp3 + tmp4 ENDFOR dst[MAX:256] := 0
Integer AVX512_VNNI AVX512VL Arithmetic Multiply groups of 4 adjacent pairs of unsigned 8-bit integers in "a" with corresponding signed 8-bit integers in "b", producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in "src", and store the packed 32-bit results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 3 IF k[j] tmp1.word := Signed(ZeroExtend16(a.byte[4*j]) * SignExtend16(b.byte[4*j])) tmp2.word := Signed(ZeroExtend16(a.byte[4*j+1]) * SignExtend16(b.byte[4*j+1])) tmp3.word := Signed(ZeroExtend16(a.byte[4*j+2]) * SignExtend16(b.byte[4*j+2])) tmp4.word := Signed(ZeroExtend16(a.byte[4*j+3]) * SignExtend16(b.byte[4*j+3])) dst.dword[j] := src.dword[j] + tmp1 + tmp2 + tmp3 + tmp4 ELSE dst.dword[j] := 0 FI ENDFOR dst[MAX:128] := 0
Integer AVX512_VNNI AVX512VL Arithmetic Multiply groups of 4 adjacent pairs of unsigned 8-bit integers in "a" with corresponding signed 8-bit integers in "b", producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in "src", and store the packed 32-bit results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 3 IF k[j] tmp1.word := Signed(ZeroExtend16(a.byte[4*j]) * SignExtend16(b.byte[4*j])) tmp2.word := Signed(ZeroExtend16(a.byte[4*j+1]) * SignExtend16(b.byte[4*j+1])) tmp3.word := Signed(ZeroExtend16(a.byte[4*j+2]) * SignExtend16(b.byte[4*j+2])) tmp4.word := Signed(ZeroExtend16(a.byte[4*j+3]) * SignExtend16(b.byte[4*j+3])) dst.dword[j] := src.dword[j] + tmp1 + tmp2 + tmp3 + tmp4 ELSE dst.dword[j] := src.dword[j] FI ENDFOR dst[MAX:128] := 0
Integer AVX512_VNNI AVX512VL Arithmetic Multiply groups of 4 adjacent pairs of unsigned 8-bit integers in "a" with corresponding signed 8-bit integers in "b", producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in "src", and store the packed 32-bit results in "dst". FOR j := 0 to 3 tmp1.word := Signed(ZeroExtend16(a.byte[4*j]) * SignExtend16(b.byte[4*j])) tmp2.word := Signed(ZeroExtend16(a.byte[4*j+1]) * SignExtend16(b.byte[4*j+1])) tmp3.word := Signed(ZeroExtend16(a.byte[4*j+2]) * SignExtend16(b.byte[4*j+2])) tmp4.word := Signed(ZeroExtend16(a.byte[4*j+3]) * SignExtend16(b.byte[4*j+3])) dst.dword[j] := src.dword[j] + tmp1 + tmp2 + tmp3 + tmp4 ENDFOR dst[MAX:128] := 0
Integer AVX512_VP2INTERSECT AVX512VL Mask Compute intersection of packed 32-bit integer vectors "a" and "b", and store indication of match in the corresponding bit of two mask registers specified by "k1" and "k2". A match in corresponding elements of "a" and "b" is indicated by a set bit in the corresponding bit of the mask registers. MEM[k1+7:k1] := 0 MEM[k2+7:k2] := 0 FOR i := 0 TO 3 FOR j := 0 TO 3 match := (a.dword[i] == b.dword[j] ? 1 : 0) MEM[k1+7:k1].bit[i] |= match MEM[k2+7:k2].bit[j] |= match ENDFOR ENDFOR
Integer AVX512_VP2INTERSECT AVX512VL Mask Compute intersection of packed 32-bit integer vectors "a" and "b", and store indication of match in the corresponding bit of two mask registers specified by "k1" and "k2". A match in corresponding elements of "a" and "b" is indicated by a set bit in the corresponding bit of the mask registers. MEM[k1+7:k1] := 0 MEM[k2+7:k2] := 0 FOR i := 0 TO 7 FOR j := 0 TO 7 match := (a.dword[i] == b.dword[j] ? 1 : 0) MEM[k1+7:k1].bit[i] |= match MEM[k2+7:k2].bit[j] |= match ENDFOR ENDFOR
Integer AVX512_VP2INTERSECT AVX512F Mask Compute intersection of packed 32-bit integer vectors "a" and "b", and store indication of match in the corresponding bit of two mask registers specified by "k1" and "k2". A match in corresponding elements of "a" and "b" is indicated by a set bit in the corresponding bit of the mask registers. MEM[k1+15:k1] := 0 MEM[k2+15:k2] := 0 FOR i := 0 TO 15 FOR j := 0 TO 15 match := (a.dword[i] == b.dword[j] ? 1 : 0) MEM[k1+15:k1].bit[i] |= match MEM[k2+15:k2].bit[j] |= match ENDFOR ENDFOR
Integer AVX512_VP2INTERSECT AVX512VL Mask Compute intersection of packed 64-bit integer vectors "a" and "b", and store indication of match in the corresponding bit of two mask registers specified by "k1" and "k2". A match in corresponding elements of "a" and "b" is indicated by a set bit in the corresponding bit of the mask registers. MEM[k1+7:k1] := 0 MEM[k2+7:k2] := 0 FOR i := 0 TO 1 FOR j := 0 TO 1 match := (a.qword[i] == b.qword[j] ? 1 : 0) MEM[k1+7:k1].bit[i] |= match MEM[k2+7:k2].bit[j] |= match ENDFOR ENDFOR
Integer AVX512_VP2INTERSECT AVX512VL Mask Compute intersection of packed 64-bit integer vectors "a" and "b", and store indication of match in the corresponding bit of two mask registers specified by "k1" and "k2". A match in corresponding elements of "a" and "b" is indicated by a set bit in the corresponding bit of the mask registers. MEM[k1+7:k1] := 0 MEM[k2+7:k2] := 0 FOR i := 0 TO 3 FOR j := 0 TO 3 match := (a.qword[i] == b.qword[j] ? 1 : 0) MEM[k1+7:k1].bit[i] |= match MEM[k2+7:k2].bit[j] |= match ENDFOR ENDFOR
Integer AVX512_VP2INTERSECT AVX512F Mask Compute intersection of packed 64-bit integer vectors "a" and "b", and store indication of match in the corresponding bit of two mask registers specified by "k1" and "k2". A match in corresponding elements of "a" and "b" is indicated by a set bit in the corresponding bit of the mask registers. MEM[k1+7:k1] := 0 MEM[k2+7:k2] := 0 FOR i := 0 TO 7 FOR j := 0 TO 7 match := (a.qword[i] == b.qword[j] ? 1 : 0) MEM[k1+7:k1].bit[i] |= match MEM[k2+7:k2].bit[j] |= match ENDFOR ENDFOR
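The nested loops in the intersection entries above can be modeled in a few lines of Python. This is only a sketch: the function name `vp2intersect` and the use of plain lists and integer masks are assumptions made for illustration, not part of the intrinsic interface.

```python
def vp2intersect(a, b):
    """Model the two-mask intersection for equal-length lists of integers.

    Returns (k1, k2): bit i of k1 is set if a[i] equals any element of b,
    and bit j of k2 is set if b[j] equals any element of a.
    """
    k1 = 0
    k2 = 0
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            if x == y:
                k1 |= 1 << i  # element i of a has a match somewhere in b
                k2 |= 1 << j  # element j of b has a match somewhere in a
    return k1, k2


# Eight 32-bit lanes, as in the 256-bit dword form.
a = [1, 2, 3, 4, 5, 6, 7, 8]
b = [8, 0, 0, 3, 0, 0, 0, 1]
k1, k2 = vp2intersect(a, b)
print(f"k1={k1:08b} k2={k2:08b}")
```

Both masks are needed because a match is reported at position i for "a" and at position j for "b", and those positions generally differ.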
Integer BMI1 Bit Manipulation Extract contiguous bits from unsigned 32-bit integer "a", and store the result in "dst". Extract the number of bits specified by "len", starting at the bit specified by "start". tmp[511:0] := a dst[31:0] := ZeroExtend32(tmp[(start[7:0] + len[7:0] - 1):start[7:0]])
Integer BMI1 Bit Manipulation Extract contiguous bits from unsigned 32-bit integer "a", and store the result in "dst". Extract the number of bits specified by bits 15:8 of "control", starting at the bit specified by bits 7:0 of "control". start := control[7:0] len := control[15:8] tmp[511:0] := a dst[31:0] := ZeroExtend32(tmp[(start[7:0] + len[7:0] - 1):start[7:0]])
Integer BMI1 Bit Manipulation Extract contiguous bits from unsigned 64-bit integer "a", and store the result in "dst". Extract the number of bits specified by "len", starting at the bit specified by "start". tmp[511:0] := a dst[63:0] := ZeroExtend64(tmp[(start[7:0] + len[7:0] - 1):start[7:0]])
Integer BMI1 Bit Manipulation Extract contiguous bits from unsigned 64-bit integer "a", and store the result in "dst". Extract the number of bits specified by bits 15:8 of "control", starting at the bit specified by bits 7:0 of "control". start := control[7:0] len := control[15:8] tmp[511:0] := a dst[63:0] := ZeroExtend64(tmp[(start[7:0] + len[7:0] - 1):start[7:0]])
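The bit-field extraction entries above can be sketched as follows. The names `bextr` and `bextr_control` and the `width` parameter are hypothetical helpers for this illustration; the zero extension of "a" into the wide temporary is modeled by Python's unbounded integers.

```python
def bextr(a, start, length, width=32):
    """Extract `length` bits of `a` beginning at bit `start`, zero-extended."""
    a &= (1 << width) - 1   # treat a as an unsigned value of the given width
    start &= 0xFF           # only the low 8 bits of start are used
    length &= 0xFF          # only the low 8 bits of len are used
    # Bits above the operand width read as zero, as in tmp[511:0] := a.
    field = (a >> start) & ((1 << length) - 1)
    return field & ((1 << width) - 1)


def bextr_control(a, control, width=32):
    """Same operation with start in control[7:0] and length in control[15:8]."""
    return bextr(a, control & 0xFF, (control >> 8) & 0xFF, width)


print(hex(bextr(0xABCD1234, 8, 8)))
print(hex(bextr_control(0xABCD1234, (8 << 8) | 16)))
```

A length of zero yields zero, and a field that runs past the top of the operand is padded with zero bits.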
Integer BMI1 Bit Manipulation Extract the lowest set bit from unsigned 32-bit integer "a" and set the corresponding bit in "dst". All other bits in "dst" are zeroed, and all bits are zeroed if no bits are set in "a". dst := (-a) AND a
Integer BMI1 Bit Manipulation Extract the lowest set bit from unsigned 64-bit integer "a" and set the corresponding bit in "dst". All other bits in "dst" are zeroed, and all bits are zeroed if no bits are set in "a". dst := (-a) AND a
Integer BMI1 Bit Manipulation Set all the lower bits of "dst" up to and including the lowest set bit in unsigned 32-bit integer "a". dst := (a - 1) XOR a
Integer BMI1 Bit Manipulation Set all the lower bits of "dst" up to and including the lowest set bit in unsigned 64-bit integer "a". dst := (a - 1) XOR a
Integer BMI1 Bit Manipulation Copy all bits from unsigned 32-bit integer "a" to "dst", and reset (set to 0) the bit in "dst" that corresponds to the lowest set bit in "a". dst := (a - 1) AND a
Integer BMI1 Bit Manipulation Copy all bits from unsigned 64-bit integer "a" to "dst", and reset (set to 0) the bit in "dst" that corresponds to the lowest set bit in "a". dst := (a - 1) AND a
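The three lowest-set-bit expressions above translate directly to Python once the result is masked to the operand width. The function names and the `width` parameter below are assumptions for this sketch.

```python
def blsi(a, width=32):
    """Isolate the lowest set bit: (-a) AND a."""
    mask = (1 << width) - 1
    a &= mask
    return (-a) & a & mask


def blsmsk(a, width=32):
    """Mask up to and including the lowest set bit: (a - 1) XOR a."""
    mask = (1 << width) - 1
    a &= mask
    return ((a - 1) ^ a) & mask


def blsr(a, width=32):
    """Reset the lowest set bit: (a - 1) AND a."""
    mask = (1 << width) - 1
    a &= mask
    return ((a - 1) & a) & mask


x = 0b10110000
print(bin(blsi(x)), bin(blsmsk(x)), bin(blsr(x)))
```

The final masking step matters for the zero input: `a - 1` is negative in Python, so without it `blsmsk(0)` would not come out as an all-ones value of the operand width.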
Integer BMI1 Bit Manipulation Compute the bitwise NOT of 32-bit integer "a" and then AND with "b", and store the result in "dst". dst[31:0] := ((NOT a[31:0]) AND b[31:0])
Integer BMI1 Bit Manipulation Compute the bitwise NOT of 64-bit integer "a" and then AND with "b", and store the result in "dst". dst[63:0] := ((NOT a[63:0]) AND b[63:0])
Integer BMI1 Bit Manipulation Count the number of trailing zero bits in unsigned 32-bit integer "a", and return that count in "dst". tmp := 0 dst := 0 DO WHILE ((tmp < 32) AND a[tmp] == 0) tmp := tmp + 1 dst := dst + 1 OD
Integer BMI1 Bit Manipulation Count the number of trailing zero bits in unsigned 64-bit integer "a", and return that count in "dst". tmp := 0 dst := 0 DO WHILE ((tmp < 64) AND a[tmp] == 0) tmp := tmp + 1 dst := dst + 1 OD
Integer BMI1 Bit Manipulation Count the number of trailing zero bits in unsigned 32-bit integer "a", and return that count in "dst". tmp := 0 dst := 0 DO WHILE ((tmp < 32) AND a[tmp] == 0) tmp := tmp + 1 dst := dst + 1 OD
Integer BMI1 Bit Manipulation Count the number of trailing zero bits in unsigned 64-bit integer "a", and return that count in "dst". tmp := 0 dst := 0 DO WHILE ((tmp < 64) AND a[tmp] == 0) tmp := tmp + 1 dst := dst + 1 OD
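The trailing-zero loop above can be written as a short Python function. The name `tzcnt` and the `width` parameter are assumptions for this sketch; the loop bound is what makes a zero input return the operand width rather than loop forever.

```python
def tzcnt(a, width=32):
    """Count trailing zero bits of `a`, returning `width` when a is zero."""
    count = 0
    while count < width and (a >> count) & 1 == 0:
        count += 1
    return count


print(tzcnt(8), tzcnt(0), tzcnt(0, width=64))
```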
Integer BMI2 Bit Manipulation Copy all bits from unsigned 32-bit integer "a" to "dst", and reset (set to 0) the high bits in "dst" starting at "index". n := index[7:0] dst := a IF (n < 32) dst[31:n] := 0 FI
Integer BMI2 Bit Manipulation Copy all bits from unsigned 64-bit integer "a" to "dst", and reset (set to 0) the high bits in "dst" starting at "index". n := index[7:0] dst := a IF (n < 64) dst[63:n] := 0 FI
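A minimal Python sketch of the high-bit zeroing entries above follows. The name `bzhi` and the `width` parameter are assumptions; note that only the low 8 bits of "index" take part, and an index at or above the operand width leaves the value unchanged.

```python
def bzhi(a, index, width=32):
    """Clear bits [width-1:n] of `a`, where n is the low 8 bits of `index`."""
    a &= (1 << width) - 1
    n = index & 0xFF
    if n < width:
        a &= (1 << n) - 1  # keep only bits below n
    return a


print(hex(bzhi(0xFFFFFFFF, 8)), hex(bzhi(0xFFFFFFFF, 32)))
```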
Integer BMI2 Bit Manipulation Deposit contiguous low bits from unsigned 32-bit integer "a" to "dst" at the corresponding bit locations specified by "mask"; all other bits in "dst" are set to zero. tmp := a dst := 0 m := 0 k := 0 DO WHILE m < 32 IF mask[m] == 1 dst[m] := tmp[k] k := k + 1 FI m := m + 1 OD
Integer BMI2 Bit Manipulation Deposit contiguous low bits from unsigned 64-bit integer "a" to "dst" at the corresponding bit locations specified by "mask"; all other bits in "dst" are set to zero. tmp := a dst := 0 m := 0 k := 0 DO WHILE m < 64 IF mask[m] == 1 dst[m] := tmp[k] k := k + 1 FI m := m + 1 OD
Integer BMI2 Bit Manipulation Extract bits from unsigned 32-bit integer "a" at the corresponding bit locations specified by "mask" to contiguous low bits in "dst"; the remaining upper bits in "dst" are set to zero. tmp := a dst := 0 m := 0 k := 0 DO WHILE m < 32 IF mask[m] == 1 dst[k] := tmp[m] k := k + 1 FI m := m + 1 OD
Integer BMI2 Bit Manipulation Extract bits from unsigned 64-bit integer "a" at the corresponding bit locations specified by "mask" to contiguous low bits in "dst"; the remaining upper bits in "dst" are set to zero. tmp := a dst := 0 m := 0 k := 0 DO WHILE m < 64 IF mask[m] == 1 dst[k] := tmp[m] k := k + 1 FI m := m + 1 OD
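The deposit and extract loops above are mirror images of each other, which is easiest to see side by side. The names `pdep` and `pext` and the `width` parameter below are assumptions for this sketch.

```python
def pdep(a, mask, width=32):
    """Scatter the low bits of `a` to the positions where `mask` has a 1."""
    dst = 0
    k = 0  # index of the next source bit to consume
    for m in range(width):
        if (mask >> m) & 1:
            dst |= ((a >> k) & 1) << m
            k += 1
    return dst


def pext(a, mask, width=32):
    """Gather the bits of `a` selected by `mask` into the low bits of dst."""
    dst = 0
    k = 0  # index of the next destination bit to fill
    for m in range(width):
        if (mask >> m) & 1:
            dst |= ((a >> m) & 1) << k
            k += 1
    return dst


mask = 0b11010
deposited = pdep(0b101, mask)
print(bin(deposited), bin(pext(deposited, mask)))
```

For a fixed mask, extracting after depositing returns the original low bits, which is a convenient sanity check for either routine.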
Integer BMI2 Arithmetic Multiply unsigned 32-bit integers "a" and "b", store the low 32-bits of the result in "dst", and store the high 32-bits in "hi". This does not read or write arithmetic flags. dst[31:0] := (a * b)[31:0] MEM[hi+31:hi] := (a * b)[63:32]
Integer BMI2 Arithmetic Multiply unsigned 64-bit integers "a" and "b", store the low 64-bits of the result in "dst", and store the high 64-bits in "hi". This does not read or write arithmetic flags. dst[63:0] := (a * b)[63:0] MEM[hi+63:hi] := (a * b)[127:64]
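The widening multiply above splits a double-width product into two halves. In the sketch below, the name `mulx` and the tuple return value are assumptions; the intrinsic itself writes the high half through the "hi" pointer.

```python
def mulx(a, b, width=32):
    """Return (low, high) halves of the unsigned product a * b."""
    mask = (1 << width) - 1
    product = (a & mask) * (b & mask)
    return product & mask, product >> width


lo, hi = mulx(0xFFFFFFFF, 0xFFFFFFFF)
print(hex(lo), hex(hi))
```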
CET_SS Miscellaneous Increment the shadow stack pointer by 4 times the value specified in bits [7:0] of "a". SSP := SSP + a[7:0] * 4
CET_SS Miscellaneous Increment the shadow stack pointer by 8 times the value specified in bits [7:0] of "a". SSP := SSP + a[7:0] * 8
CET_SS Miscellaneous Read the low 32-bits of the current shadow stack pointer, and store the result in "dst". dst := SSP[31:0]
CET_SS Miscellaneous Read the current shadow stack pointer, and store the result in "dst". dst := SSP[63:0]
CET_SS Miscellaneous Save the previous shadow stack pointer context.
CET_SS Miscellaneous Restore the saved shadow stack pointer from the shadow stack restore token previously created on shadow stack by saveprevssp.
CET_SS Miscellaneous Write 32-bit value in "val" to a shadow stack page in memory specified by "p".
CET_SS Miscellaneous Write 64-bit value in "val" to a shadow stack page in memory specified by "p".
CET_SS Miscellaneous Write 32-bit value in "val" to a user shadow stack page in memory specified by "p".
CET_SS Miscellaneous Write 64-bit value in "val" to a user shadow stack page in memory specified by "p".
CET_SS Miscellaneous Mark shadow stack pointed to by IA32_PL0_SSP as busy.
CET_SS Miscellaneous Mark shadow stack pointed to by "p" as not busy.
CET_SS Miscellaneous If CET is enabled, read the low 32-bits of the current shadow stack pointer, and store the result in "dst". Otherwise return 0. dst := SSP[31:0]
CET_SS Miscellaneous If CET is enabled, read the current shadow stack pointer, and store the result in "dst". Otherwise return 0. dst := SSP[63:0]
CET_SS Miscellaneous Increment the shadow stack pointer by 4 times the value specified in bits [7:0] of "a". SSP := SSP + a[7:0] * 4
CLDEMOTE Miscellaneous Hint to hardware that the cache line that contains "p" should be demoted from the cache closest to the processor core to a level more distant from the processor core.
CLFLUSHOPT General Support Invalidate and flush the cache line that contains "p" from all levels of the cache hierarchy.
CLWB General Support Write back to memory the cache line that contains "p" from any level of the cache hierarchy in the cache coherence domain.
Floating Point FMA Arithmetic Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst". FOR j := 0 to 1 i := j*64 dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i] ENDFOR dst[MAX:128] := 0
Floating Point FMA Arithmetic Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst". FOR j := 0 to 3 i := j*64 dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i] ENDFOR dst[MAX:256] := 0
Floating Point FMA Arithmetic Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst". FOR j := 0 to 3 i := j*32 dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i] ENDFOR dst[MAX:128] := 0
Floating Point FMA Arithmetic Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst". FOR j := 0 to 7 i := j*32 dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i] ENDFOR dst[MAX:256] := 0
Floating Point FMA Arithmetic Multiply the lower double-precision (64-bit) floating-point elements in "a" and "b", and add the intermediate result to the lower element in "c". Store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst". dst[63:0] := (a[63:0] * b[63:0]) + c[63:0] dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point FMA Arithmetic Multiply the lower single-precision (32-bit) floating-point elements in "a" and "b", and add the intermediate result to the lower element in "c". Store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". dst[31:0] := (a[31:0] * b[31:0]) + c[31:0] dst[127:32] := a[127:32] dst[MAX:128] := 0
Floating Point FMA Arithmetic Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", alternately add and subtract packed elements in "c" to/from the intermediate result, and store the results in "dst". FOR j := 0 to 1 i := j*64 IF ((j & 1) == 0) dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i] ELSE dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i] FI ENDFOR dst[MAX:128] := 0
Floating Point FMA Arithmetic Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", alternately add and subtract packed elements in "c" to/from the intermediate result, and store the results in "dst". FOR j := 0 to 3 i := j*64 IF ((j & 1) == 0) dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i] ELSE dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i] FI ENDFOR dst[MAX:256] := 0
Floating Point FMA Arithmetic Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", alternately add and subtract packed elements in "c" to/from the intermediate result, and store the results in "dst". FOR j := 0 to 3 i := j*32 IF ((j & 1) == 0) dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i] ELSE dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i] FI ENDFOR dst[MAX:128] := 0
Floating Point FMA Arithmetic Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", alternately add and subtract packed elements in "c" to/from the intermediate result, and store the results in "dst". FOR j := 0 to 7 i := j*32 IF ((j & 1) == 0) dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i] ELSE dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i] FI ENDFOR dst[MAX:256] := 0
Floating Point FMA Arithmetic Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst". FOR j := 0 to 1 i := j*64 dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i] ENDFOR dst[MAX:128] := 0
Floating Point FMA Arithmetic Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst". FOR j := 0 to 3 i := j*64 dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i] ENDFOR dst[MAX:256] := 0
Floating Point FMA Arithmetic Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst". FOR j := 0 to 3 i := j*32 dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i] ENDFOR dst[MAX:128] := 0
Floating Point FMA Arithmetic Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst". FOR j := 0 to 7 i := j*32 dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i] ENDFOR dst[MAX:256] := 0
Floating Point FMA Arithmetic Multiply the lower double-precision (64-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the intermediate result. Store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst". dst[63:0] := (a[63:0] * b[63:0]) - c[63:0] dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point FMA Arithmetic Multiply the lower single-precision (32-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the intermediate result. Store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". dst[31:0] := (a[31:0] * b[31:0]) - c[31:0] dst[127:32] := a[127:32] dst[MAX:128] := 0
Floating Point FMA Arithmetic Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", alternately subtract and add packed elements in "c" from/to the intermediate result, and store the results in "dst". FOR j := 0 to 1 i := j*64 IF ((j & 1) == 0) dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i] ELSE dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i] FI ENDFOR dst[MAX:128] := 0
Floating Point FMA Arithmetic Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", alternately subtract and add packed elements in "c" from/to the intermediate result, and store the results in "dst". FOR j := 0 to 3 i := j*64 IF ((j & 1) == 0) dst[i+63:i] := (a[i+63:i] * b[i+63:i]) + c[i+63:i] ELSE dst[i+63:i] := (a[i+63:i] * b[i+63:i]) - c[i+63:i] FI ENDFOR dst[MAX:256] := 0
Floating Point FMA Arithmetic Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", alternately subtract and add packed elements in "c" from/to the intermediate result, and store the results in "dst". FOR j := 0 to 3 i := j*32 IF ((j & 1) == 0) dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i] ELSE dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i] FI ENDFOR dst[MAX:128] := 0
Floating Point FMA Arithmetic Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", alternately subtract and add packed elements in "c" from/to the intermediate result, and store the results in "dst". FOR j := 0 to 7 i := j*32 IF ((j & 1) == 0) dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i] ELSE dst[i+31:i] := (a[i+31:i] * b[i+31:i]) - c[i+31:i] FI ENDFOR dst[MAX:256] := 0
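The two alternating forms above differ only in which lane parity subtracts, so a lane-by-lane sketch may help. The function names below are assumptions, vectors are modeled as Python lists, and the arithmetic is ordinary double precision with separate rounding of the product and the sum, whereas the hardware operation rounds once.

```python
def fmaddsub(a, b, c):
    """Even lanes compute a*b - c, odd lanes compute a*b + c."""
    return [x * y - z if j % 2 == 0 else x * y + z
            for j, (x, y, z) in enumerate(zip(a, b, c))]


def fmsubadd(a, b, c):
    """Even lanes compute a*b + c, odd lanes compute a*b - c."""
    return [x * y + z if j % 2 == 0 else x * y - z
            for j, (x, y, z) in enumerate(zip(a, b, c))]


a = [1.0, 2.0, 3.0, 4.0]
b = [10.0, 10.0, 10.0, 10.0]
c = [1.0, 1.0, 1.0, 1.0]
print(fmaddsub(a, b, c))
print(fmsubadd(a, b, c))
```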
Floating Point FMA Arithmetic Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst". FOR j := 0 to 1 i := j*64 dst[i+63:i] := -(a[i+63:i] * b[i+63:i]) + c[i+63:i] ENDFOR dst[MAX:128] := 0
Floating Point FMA Arithmetic Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst". FOR j := 0 to 3 i := j*64 dst[i+63:i] := -(a[i+63:i] * b[i+63:i]) + c[i+63:i] ENDFOR dst[MAX:256] := 0
Floating Point FMA Arithmetic Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst". FOR j := 0 to 3 i := j*32 dst[i+31:i] := -(a[i+31:i] * b[i+31:i]) + c[i+31:i] ENDFOR dst[MAX:128] := 0
Floating Point FMA Arithmetic Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst". FOR j := 0 to 7 i := j*32 dst[i+31:i] := -(a[i+31:i] * b[i+31:i]) + c[i+31:i] ENDFOR dst[MAX:256] := 0
Floating Point FMA Arithmetic Multiply the lower double-precision (64-bit) floating-point elements in "a" and "b", and add the negated intermediate result to the lower element in "c". Store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst". dst[63:0] := -(a[63:0] * b[63:0]) + c[63:0] dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point FMA Arithmetic Multiply the lower single-precision (32-bit) floating-point elements in "a" and "b", and add the negated intermediate result to the lower element in "c". Store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". dst[31:0] := -(a[31:0] * b[31:0]) + c[31:0] dst[127:32] := a[127:32] dst[MAX:128] := 0
Floating Point FMA Arithmetic Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst". FOR j := 0 to 1 i := j*64 dst[i+63:i] := -(a[i+63:i] * b[i+63:i]) - c[i+63:i] ENDFOR dst[MAX:128] := 0
Floating Point FMA Arithmetic Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst". FOR j := 0 to 3 i := j*64 dst[i+63:i] := -(a[i+63:i] * b[i+63:i]) - c[i+63:i] ENDFOR dst[MAX:256] := 0
Floating Point FMA Arithmetic Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst". FOR j := 0 to 3 i := j*32 dst[i+31:i] := -(a[i+31:i] * b[i+31:i]) - c[i+31:i] ENDFOR dst[MAX:128] := 0
Floating Point FMA Arithmetic Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst". FOR j := 0 to 7 i := j*32 dst[i+31:i] := -(a[i+31:i] * b[i+31:i]) - c[i+31:i] ENDFOR dst[MAX:256] := 0
Floating Point FMA Arithmetic Multiply the lower double-precision (64-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the negated intermediate result. Store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst". dst[63:0] := -(a[63:0] * b[63:0]) - c[63:0] dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point FMA Arithmetic Multiply the lower single-precision (32-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the negated intermediate result. Store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". dst[31:0] := -(a[31:0] * b[31:0]) - c[31:0] dst[127:32] := a[127:32] dst[MAX:128] := 0
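The non-alternating FMA entries above reduce to four sign combinations of the product and the addend, and the scalar forms additionally pass the upper lanes of "a" through unchanged. The sketch below uses hypothetical names (`fma_variants`, `scalar_fnmsub`), models vectors as lists, and rounds the product and sum separately, unlike the fused hardware operation.

```python
def fma_variants(a, b, c):
    """Return the four sign variants of a*b combined with c for one lane."""
    p = a * b
    return {
        "fmadd": p + c,    #  (a*b) + c
        "fmsub": p - c,    #  (a*b) - c
        "fnmadd": -p + c,  # -(a*b) + c
        "fnmsub": -p - c,  # -(a*b) - c
    }


def scalar_fnmsub(a, b, c):
    """Scalar form: compute lane 0 only and copy the upper lanes from a."""
    return [-(a[0] * b[0]) - c[0]] + list(a[1:])


print(fma_variants(2.0, 3.0, 1.0))
print(scalar_fnmsub([2.0, 9.0], [3.0, 8.0], [1.0, 7.0]))
```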
Floating Point FP16C Convert Convert packed half-precision (16-bit) floating-point elements in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst". FOR j := 0 to 7 i := j*32 m := j*16 dst[i+31:i] := Convert_FP16_To_FP32(a[m+15:m]) ENDFOR dst[MAX:256] := 0
Floating Point FP16C Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst". [sae_note] FOR j := 0 to 7 i := 16*j l := 32*j dst[i+15:i] := Convert_FP32_To_FP16(a[l+31:l]) ENDFOR dst[MAX:128] := 0
Floating Point FP16C Convert Convert packed half-precision (16-bit) floating-point elements in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst". FOR j := 0 to 3 i := j*32 m := j*16 dst[i+31:i] := Convert_FP16_To_FP32(a[m+15:m]) ENDFOR dst[MAX:128] := 0
Floating Point FP16C Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed half-precision (16-bit) floating-point elements, and store the results in "dst". [sae_note] FOR j := 0 to 3 i := 16*j l := 32*j dst[i+15:i] := Convert_FP32_To_FP16(a[l+31:l]) ENDFOR dst[MAX:64] := 0
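The half-precision conversions above can be explored with Python's `struct` module, whose `e` format code is IEEE 754 binary16. The helper names below are assumptions. The sketch always rounds to nearest-even, and out-of-range inputs raise `OverflowError` in Python instead of producing an infinity.

```python
import struct


def fp32_to_fp16_bits(x):
    """Return the 16-bit pattern of x converted to half precision."""
    return struct.unpack("<H", struct.pack("<e", x))[0]


def fp16_bits_to_fp32(h):
    """Return the float value encoded by a 16-bit half-precision pattern."""
    return struct.unpack("<e", struct.pack("<H", h))[0]


def convert_lanes_to_fp16(values):
    """Convert each lane, as the packed forms do element by element."""
    return [fp32_to_fp16_bits(v) for v in values]


print([hex(h) for h in convert_lanes_to_fp16([1.0, -2.0, 0.5, 65504.0])])
print(fp16_bits_to_fp32(0x3C00))
```

The narrowing direction halves the storage per element, which is why the destination of the packed forms is half as wide as the source.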
Integer FSGSBASE General Support Read the FS segment base register and store the 32-bit result in "dst". dst[31:0] := FS_Segment_Base_Register dst[63:32] := 0
Integer FSGSBASE General Support Read the FS segment base register and store the 64-bit result in "dst". dst[63:0] := FS_Segment_Base_Register
Integer FSGSBASE General Support Read the GS segment base register and store the 32-bit result in "dst". dst[31:0] := GS_Segment_Base_Register dst[63:32] := 0
Integer FSGSBASE General Support Read the GS segment base register and store the 64-bit result in "dst". dst[63:0] := GS_Segment_Base_Register
Integer FSGSBASE General Support Write the unsigned 32-bit integer "a" to the FS segment base register. FS_Segment_Base_Register[31:0] := a[31:0] FS_Segment_Base_Register[63:32] := 0
Integer FSGSBASE General Support Write the unsigned 64-bit integer "a" to the FS segment base register. FS_Segment_Base_Register[63:0] := a[63:0]
Integer FSGSBASE General Support Write the unsigned 32-bit integer "a" to the GS segment base register. GS_Segment_Base_Register[31:0] := a[31:0] GS_Segment_Base_Register[63:32] := 0
Integer FSGSBASE General Support Write the unsigned 64-bit integer "a" to the GS segment base register. GS_Segment_Base_Register[63:0] := a[63:0]
FXSR OS-Targeted Reload the x87 FPU, MMX technology, XMM, and MXCSR registers from the 512-byte memory image at "mem_addr". This data should have been written to memory previously using the FXSAVE instruction, and in the same format as required by the operating mode. "mem_addr" must be aligned on a 16-byte boundary. state_x87_fpu_mmx_sse := fxrstor(MEM[mem_addr+512*8-1:mem_addr])
FXSR OS-Targeted Reload the x87 FPU, MMX technology, XMM, and MXCSR registers from the 512-byte memory image at "mem_addr". This data should have been written to memory previously using the FXSAVE64 instruction, and in the same format as required by the operating mode. "mem_addr" must be aligned on a 16-byte boundary. state_x87_fpu_mmx_sse := fxrstor64(MEM[mem_addr+512*8-1:mem_addr])
FXSR OS-Targeted Save the current state of the x87 FPU, MMX technology, XMM, and MXCSR registers to a 512-byte memory location at "mem_addr". The layout of the 512-byte region depends on the operating mode. Bytes [511:464] are available for software use and will not be overwritten by the processor. MEM[mem_addr+512*8-1:mem_addr] := fxsave(state_x87_fpu_mmx_sse)
FXSR OS-Targeted Save the current state of the x87 FPU, MMX technology, XMM, and MXCSR registers to a 512-byte memory location at "mem_addr". The layout of the 512-byte region depends on the operating mode. Bytes [511:464] are available for software use and will not be overwritten by the processor. MEM[mem_addr+512*8-1:mem_addr] := fxsave64(state_x87_fpu_mmx_sse)
Integer GFNI AVX512F Arithmetic Multiply the packed 8-bit integers in "a" and "b" in the finite field GF(2^8), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). The field GF(2^8) is represented in polynomial representation with the reduction polynomial x^8 + x^4 + x^3 + x + 1. DEFINE gf2p8mul_byte(src1byte, src2byte) { tword := 0 FOR i := 0 to 7 IF src2byte.bit[i] tword := tword XOR (src1byte << i) FI ENDFOR FOR i := 14 downto 8 p := 0x11B << (i-8) IF tword.bit[i] tword := tword XOR p FI ENDFOR RETURN tword.byte[0] } FOR j := 0 TO 63 IF k[j] dst.byte[j] := gf2p8mul_byte(a.byte[j], b.byte[j]) ELSE dst.byte[j] := 0 FI ENDFOR dst[MAX:512] := 0
Integer GFNI AVX512F Arithmetic Multiply the packed 8-bit integers in "a" and "b" in the finite field GF(2^8), and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). The field GF(2^8) is represented in polynomial representation with the reduction polynomial x^8 + x^4 + x^3 + x + 1. DEFINE gf2p8mul_byte(src1byte, src2byte) { tword := 0 FOR i := 0 to 7 IF src2byte.bit[i] tword := tword XOR (src1byte << i) FI ENDFOR FOR i := 14 downto 8 p := 0x11B << (i-8) IF tword.bit[i] tword := tword XOR p FI ENDFOR RETURN tword.byte[0] } FOR j := 0 TO 63 IF k[j] dst.byte[j] := gf2p8mul_byte(a.byte[j], b.byte[j]) ELSE dst.byte[j] := src.byte[j] FI ENDFOR dst[MAX:512] := 0
Integer GFNI AVX512F Arithmetic Multiply the packed 8-bit integers in "a" and "b" in the finite field GF(2^8), and store the results in "dst". The field GF(2^8) is represented in polynomial representation with the reduction polynomial x^8 + x^4 + x^3 + x + 1. DEFINE gf2p8mul_byte(src1byte, src2byte) { tword := 0 FOR i := 0 to 7 IF src2byte.bit[i] tword := tword XOR (src1byte << i) FI ENDFOR FOR i := 14 downto 8 p := 0x11B << (i-8) IF tword.bit[i] tword := tword XOR p FI ENDFOR RETURN tword.byte[0] } FOR j := 0 TO 63 dst.byte[j] := gf2p8mul_byte(a.byte[j], b.byte[j]) ENDFOR dst[MAX:512] := 0
Integer GFNI AVX512VL Arithmetic Multiply the packed 8-bit integers in "a" and "b" in the finite field GF(2^8), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). The field GF(2^8) is represented in polynomial representation with the reduction polynomial x^8 + x^4 + x^3 + x + 1. DEFINE gf2p8mul_byte(src1byte, src2byte) { tword := 0 FOR i := 0 to 7 IF src2byte.bit[i] tword := tword XOR (src1byte << i) FI ENDFOR FOR i := 14 downto 8 p := 0x11B << (i-8) IF tword.bit[i] tword := tword XOR p FI ENDFOR RETURN tword.byte[0] } FOR j := 0 TO 31 IF k[j] dst.byte[j] := gf2p8mul_byte(a.byte[j], b.byte[j]) ELSE dst.byte[j] := 0 FI ENDFOR dst[MAX:256] := 0
Integer GFNI AVX512VL Arithmetic Multiply the packed 8-bit integers in "a" and "b" in the finite field GF(2^8), and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). The field GF(2^8) is represented in polynomial representation with the reduction polynomial x^8 + x^4 + x^3 + x + 1. DEFINE gf2p8mul_byte(src1byte, src2byte) { tword := 0 FOR i := 0 to 7 IF src2byte.bit[i] tword := tword XOR (src1byte << i) FI ENDFOR FOR i := 14 downto 8 p := 0x11B << (i-8) IF tword.bit[i] tword := tword XOR p FI ENDFOR RETURN tword.byte[0] } FOR j := 0 TO 31 IF k[j] dst.byte[j] := gf2p8mul_byte(a.byte[j], b.byte[j]) ELSE dst.byte[j] := src.byte[j] FI ENDFOR dst[MAX:256] := 0
Integer GFNI AVX512VL Arithmetic Multiply the packed 8-bit integers in "a" and "b" in the finite field GF(2^8), and store the results in "dst". The field GF(2^8) is represented in polynomial representation with the reduction polynomial x^8 + x^4 + x^3 + x + 1. DEFINE gf2p8mul_byte(src1byte, src2byte) { tword := 0 FOR i := 0 to 7 IF src2byte.bit[i] tword := tword XOR (src1byte << i) FI ENDFOR FOR i := 14 downto 8 p := 0x11B << (i-8) IF tword.bit[i] tword := tword XOR p FI ENDFOR RETURN tword.byte[0] } FOR j := 0 TO 31 dst.byte[j] := gf2p8mul_byte(a.byte[j], b.byte[j]) ENDFOR dst[MAX:256] := 0
Integer GFNI AVX512VL Arithmetic Multiply the packed 8-bit integers in "a" and "b" in the finite field GF(2^8), and store the results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). The field GF(2^8) is represented in polynomial representation with the reduction polynomial x^8 + x^4 + x^3 + x + 1. DEFINE gf2p8mul_byte(src1byte, src2byte) { tword := 0 FOR i := 0 to 7 IF src2byte.bit[i] tword := tword XOR (src1byte << i) FI ENDFOR FOR i := 14 downto 8 p := 0x11B << (i-8) IF tword.bit[i] tword := tword XOR p FI ENDFOR RETURN tword.byte[0] } FOR j := 0 TO 15 IF k[j] dst.byte[j] := gf2p8mul_byte(a.byte[j], b.byte[j]) ELSE dst.byte[j] := 0 FI ENDFOR dst[MAX:128] := 0
Integer GFNI AVX512VL Arithmetic Multiply the packed 8-bit integers in "a" and "b" in the finite field GF(2^8), and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). The field GF(2^8) is represented in polynomial representation with the reduction polynomial x^8 + x^4 + x^3 + x + 1. DEFINE gf2p8mul_byte(src1byte, src2byte) { tword := 0 FOR i := 0 to 7 IF src2byte.bit[i] tword := tword XOR (src1byte << i) FI ENDFOR FOR i := 14 downto 8 p := 0x11B << (i-8) IF tword.bit[i] tword := tword XOR p FI ENDFOR RETURN tword.byte[0] } FOR j := 0 TO 15 IF k[j] dst.byte[j] := gf2p8mul_byte(a.byte[j], b.byte[j]) ELSE dst.byte[j] := src.byte[j] FI ENDFOR dst[MAX:128] := 0
Integer GFNI AVX512VL Arithmetic Multiply the packed 8-bit integers in "a" and "b" in the finite field GF(2^8), and store the results in "dst". The field GF(2^8) is represented in polynomial representation with the reduction polynomial x^8 + x^4 + x^3 + x + 1. DEFINE gf2p8mul_byte(src1byte, src2byte) { tword := 0 FOR i := 0 to 7 IF src2byte.bit[i] tword := tword XOR (src1byte << i) FI ENDFOR FOR i := 14 downto 8 p := 0x11B << (i-8) IF tword.bit[i] tword := tword XOR p FI ENDFOR RETURN tword.byte[0] } FOR j := 0 TO 15 dst.byte[j] := gf2p8mul_byte(a.byte[j], b.byte[j]) ENDFOR dst[MAX:128] := 0
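The gf2p8mul_byte helper defined in the entries above is a carry-less multiply followed by reduction modulo 0x11B, and the zeromask and writemask forms differ only in what an unselected byte receives. The sketch below follows that pseudocode; `gf2p8mul_masked` and its `src=None` convention for zero masking are assumptions made for illustration.

```python
def gf2p8mul_byte(x, y):
    """Multiply two bytes in GF(2^8) with polynomial x^8 + x^4 + x^3 + x + 1."""
    t = 0
    for i in range(8):             # carry-less (XOR) multiplication
        if (y >> i) & 1:
            t ^= x << i
    for i in range(14, 7, -1):     # reduce bits 14..8 using 0x11B
        if (t >> i) & 1:
            t ^= 0x11B << (i - 8)
    return t & 0xFF


def gf2p8mul_masked(a, b, k, src=None):
    """Per-byte multiply; unselected bytes are 0 (src=None) or copied from src."""
    out = []
    for j, (x, y) in enumerate(zip(a, b)):
        if (k >> j) & 1:
            out.append(gf2p8mul_byte(x, y))
        elif src is None:
            out.append(0)          # zeromask behaviour
        else:
            out.append(src[j])     # writemask behaviour
    return out


print(hex(gf2p8mul_byte(0x57, 0x83)))
print([hex(v) for v in gf2p8mul_masked([0x57, 0x57], [0x83, 0x13], 0b10, src=[0xAA, 0xBB])])
```

The reduction polynomial is the one used by AES, so the byte products agree with the worked multiplication examples in the AES specification.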
Integer GFNI AVX512F Arithmetic Compute an affine transformation in the Galois Field 2^8. An affine transformation is defined by "A" * "x" + "b", where "A" represents an 8 by 8 bit matrix, "x" represents an 8-bit vector, and "b" is a constant immediate byte. Store the packed 8-bit results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). DEFINE parity(x) { t := 0 FOR i := 0 to 7 t := t XOR x.bit[i] ENDFOR RETURN t } DEFINE affine_byte(tsrc2qw, src1byte, imm8) { FOR i := 0 to 7 retbyte.bit[i] := parity(tsrc2qw.byte[7-i] AND src1byte) XOR imm8.bit[i] ENDFOR RETURN retbyte } FOR j := 0 TO 7 FOR i := 0 to 7 IF k[j*8+i] dst.qword[j].byte[i] := affine_byte(A.qword[j], x.qword[j].byte[i], b) ELSE dst.qword[j].byte[i] := 0 FI ENDFOR ENDFOR dst[MAX:512] := 0
Integer GFNI AVX512F Arithmetic Compute an affine transformation in the Galois Field 2^8. An affine transformation is defined by "A" * "x" + "b", where "A" represents an 8 by 8 bit matrix, "x" represents an 8-bit vector, and "b" is a constant immediate byte. Store the packed 8-bit results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). DEFINE parity(x) { t := 0 FOR i := 0 to 7 t := t XOR x.bit[i] ENDFOR RETURN t } DEFINE affine_byte(tsrc2qw, src1byte, imm8) { FOR i := 0 to 7 retbyte.bit[i] := parity(tsrc2qw.byte[7-i] AND src1byte) XOR imm8.bit[i] ENDFOR RETURN retbyte } FOR j := 0 TO 7 FOR i := 0 to 7 IF k[j*8+i] dst.qword[j].byte[i] := affine_byte(A.qword[j], x.qword[j].byte[i], b) ELSE dst.qword[j].byte[i] := src.qword[j].byte[i] FI ENDFOR ENDFOR dst[MAX:512] := 0
Integer GFNI AVX512F Arithmetic Compute an affine transformation in the Galois Field 2^8. An affine transformation is defined by "A" * "x" + "b", where "A" represents an 8 by 8 bit matrix, "x" represents an 8-bit vector, and "b" is a constant immediate byte. Store the packed 8-bit results in "dst". DEFINE parity(x) { t := 0 FOR i := 0 to 7 t := t XOR x.bit[i] ENDFOR RETURN t } DEFINE affine_byte(tsrc2qw, src1byte, imm8) { FOR i := 0 to 7 retbyte.bit[i] := parity(tsrc2qw.byte[7-i] AND src1byte) XOR imm8.bit[i] ENDFOR RETURN retbyte } FOR j := 0 TO 7 FOR i := 0 to 7 dst.qword[j].byte[i] := affine_byte(A.qword[j], x.qword[j].byte[i], b) ENDFOR ENDFOR dst[MAX:512] := 0
Integer GFNI AVX512VL Arithmetic Compute an affine transformation in the Galois Field 2^8. An affine transformation is defined by "A" * "x" + "b", where "A" represents an 8 by 8 bit matrix, "x" represents an 8-bit vector, and "b" is a constant immediate byte. Store the packed 8-bit results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). DEFINE parity(x) { t := 0 FOR i := 0 to 7 t := t XOR x.bit[i] ENDFOR RETURN t } DEFINE affine_byte(tsrc2qw, src1byte, imm8) { FOR i := 0 to 7 retbyte.bit[i] := parity(tsrc2qw.byte[7-i] AND src1byte) XOR imm8.bit[i] ENDFOR RETURN retbyte } FOR j := 0 TO 3 FOR i := 0 to 7 IF k[j*8+i] dst.qword[j].byte[i] := affine_byte(A.qword[j], x.qword[j].byte[i], b) ELSE dst.qword[j].byte[i] := 0 FI ENDFOR ENDFOR dst[MAX:256] := 0
Integer GFNI AVX512VL Arithmetic Compute an affine transformation in the Galois Field 2^8. An affine transformation is defined by "A" * "x" + "b", where "A" represents an 8 by 8 bit matrix, "x" represents an 8-bit vector, and "b" is a constant immediate byte. Store the packed 8-bit results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). DEFINE parity(x) { t := 0 FOR i := 0 to 7 t := t XOR x.bit[i] ENDFOR RETURN t } DEFINE affine_byte(tsrc2qw, src1byte, imm8) { FOR i := 0 to 7 retbyte.bit[i] := parity(tsrc2qw.byte[7-i] AND src1byte) XOR imm8.bit[i] ENDFOR RETURN retbyte } FOR j := 0 TO 3 FOR i := 0 to 7 IF k[j*8+i] dst.qword[j].byte[i] := affine_byte(A.qword[j], x.qword[j].byte[i], b) ELSE dst.qword[j].byte[i] := src.qword[j].byte[i] FI ENDFOR ENDFOR dst[MAX:256] := 0
Integer GFNI AVX512VL Arithmetic Compute an affine transformation in the Galois Field 2^8. An affine transformation is defined by "A" * "x" + "b", where "A" represents an 8 by 8 bit matrix, "x" represents an 8-bit vector, and "b" is a constant immediate byte. Store the packed 8-bit results in "dst". DEFINE parity(x) { t := 0 FOR i := 0 to 7 t := t XOR x.bit[i] ENDFOR RETURN t } DEFINE affine_byte(tsrc2qw, src1byte, imm8) { FOR i := 0 to 7 retbyte.bit[i] := parity(tsrc2qw.byte[7-i] AND src1byte) XOR imm8.bit[i] ENDFOR RETURN retbyte } FOR j := 0 TO 3 FOR i := 0 to 7 dst.qword[j].byte[i] := affine_byte(A.qword[j], x.qword[j].byte[i], b) ENDFOR ENDFOR dst[MAX:256] := 0
Integer GFNI AVX512VL Arithmetic Compute an affine transformation in the Galois Field 2^8. An affine transformation is defined by "A" * "x" + "b", where "A" represents an 8 by 8 bit matrix, "x" represents an 8-bit vector, and "b" is a constant immediate byte. Store the packed 8-bit results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). DEFINE parity(x) { t := 0 FOR i := 0 to 7 t := t XOR x.bit[i] ENDFOR RETURN t } DEFINE affine_byte(tsrc2qw, src1byte, imm8) { FOR i := 0 to 7 retbyte.bit[i] := parity(tsrc2qw.byte[7-i] AND src1byte) XOR imm8.bit[i] ENDFOR RETURN retbyte } FOR j := 0 TO 1 FOR i := 0 to 7 IF k[j*8+i] dst.qword[j].byte[i] := affine_byte(A.qword[j], x.qword[j].byte[i], b) ELSE dst.qword[j].byte[i] := 0 FI ENDFOR ENDFOR dst[MAX:128] := 0
Integer GFNI AVX512VL Arithmetic Compute an affine transformation in the Galois Field 2^8. An affine transformation is defined by "A" * "x" + "b", where "A" represents an 8 by 8 bit matrix, "x" represents an 8-bit vector, and "b" is a constant immediate byte. Store the packed 8-bit results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). DEFINE parity(x) { t := 0 FOR i := 0 to 7 t := t XOR x.bit[i] ENDFOR RETURN t } DEFINE affine_byte(tsrc2qw, src1byte, imm8) { FOR i := 0 to 7 retbyte.bit[i] := parity(tsrc2qw.byte[7-i] AND src1byte) XOR imm8.bit[i] ENDFOR RETURN retbyte } FOR j := 0 TO 1 FOR i := 0 to 7 IF k[j*8+i] dst.qword[j].byte[i] := affine_byte(A.qword[j], x.qword[j].byte[i], b) ELSE dst.qword[j].byte[i] := src.qword[j].byte[i] FI ENDFOR ENDFOR dst[MAX:128] := 0
Integer GFNI AVX512VL Arithmetic Compute an affine transformation in the Galois Field 2^8. An affine transformation is defined by "A" * "x" + "b", where "A" represents an 8 by 8 bit matrix, "x" represents an 8-bit vector, and "b" is a constant immediate byte. Store the packed 8-bit results in "dst". DEFINE parity(x) { t := 0 FOR i := 0 to 7 t := t XOR x.bit[i] ENDFOR RETURN t } DEFINE affine_byte(tsrc2qw, src1byte, imm8) { FOR i := 0 to 7 retbyte.bit[i] := parity(tsrc2qw.byte[7-i] AND src1byte) XOR imm8.bit[i] ENDFOR RETURN retbyte } FOR j := 0 TO 1 FOR i := 0 to 7 dst.qword[j].byte[i] := affine_byte(A.qword[j], x.qword[j].byte[i], b) ENDFOR ENDFOR dst[MAX:128] := 0
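The affine_byte pseudocode in the entries above can be modelled on a single byte. The following is a minimal scalar sketch in Python, not the intrinsic itself; the function names and the IDENTITY and BIT_REVERSE constants are illustrative assumptions derived from the byte[7-i] row ordering in the pseudocode.

```python
def parity(x: int) -> int:
    """XOR of the low 8 bits of x, as in DEFINE parity(x)."""
    t = 0
    for i in range(8):
        t ^= (x >> i) & 1
    return t


def affine_byte(a_qword: int, x_byte: int, imm8: int) -> int:
    """Scalar model of affine_byte(tsrc2qw, src1byte, imm8)."""
    ret = 0
    for i in range(8):
        # tsrc2qw.byte[7-i] is the matrix row that produces result bit i.
        row = (a_qword >> (8 * (7 - i))) & 0xFF
        bit = parity(row & x_byte) ^ ((imm8 >> i) & 1)
        ret |= bit << i
    return ret


# Assumed example matrices (not taken from the entries above):
# byte[7-i] == 1 << i maps every bit to itself.
IDENTITY = 0x0102040810204080
# byte[7-i] == 0x80 >> i maps input bit 7-i to result bit i.
BIT_REVERSE = 0x8040201008040201

if __name__ == "__main__":
    print(hex(affine_byte(IDENTITY, 0x5A, 0xFF)))
    print(hex(affine_byte(BIT_REVERSE, 0x01, 0x00)))
```

With the identity matrix the transformation reduces to x XOR b, which is a convenient sanity check before trying other matrices.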
Integer GFNI AVX512F Arithmetic Compute an inverse affine transformation in the Galois Field 2^8. An affine transformation is defined by "A" * "x" + "b", where "A" represents an 8 by 8 bit matrix, "x" represents an 8-bit vector, and "b" is a constant immediate byte. The inverse of the 8-bit values in "x" is defined with respect to the reduction polynomial x^8 + x^4 + x^3 + x + 1. Store the packed 8-bit results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). DEFINE parity(x) { t := 0 FOR i := 0 to 7 t := t XOR x.bit[i] ENDFOR RETURN t } DEFINE affine_inverse_byte(tsrc2qw, src1byte, imm8) { FOR i := 0 to 7 retbyte.bit[i] := parity(tsrc2qw.byte[7-i] AND inverse(src1byte)) XOR imm8.bit[i] ENDFOR RETURN retbyte } FOR j := 0 TO 7 FOR i := 0 to 7 IF k[j*8+i] dst.qword[j].byte[i] := affine_inverse_byte(A.qword[j], x.qword[j].byte[i], b) ELSE dst.qword[j].byte[i] := 0 FI ENDFOR ENDFOR dst[MAX:512] := 0
Integer GFNI AVX512F Arithmetic Compute an inverse affine transformation in the Galois Field 2^8. An affine transformation is defined by "A" * "x" + "b", where "A" represents an 8 by 8 bit matrix, "x" represents an 8-bit vector, and "b" is a constant immediate byte. The inverse of the 8-bit values in "x" is defined with respect to the reduction polynomial x^8 + x^4 + x^3 + x + 1. Store the packed 8-bit results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). DEFINE parity(x) { t := 0 FOR i := 0 to 7 t := t XOR x.bit[i] ENDFOR RETURN t } DEFINE affine_inverse_byte(tsrc2qw, src1byte, imm8) { FOR i := 0 to 7 retbyte.bit[i] := parity(tsrc2qw.byte[7-i] AND inverse(src1byte)) XOR imm8.bit[i] ENDFOR RETURN retbyte } FOR j := 0 TO 7 FOR i := 0 to 7 IF k[j*8+i] dst.qword[j].byte[i] := affine_inverse_byte(A.qword[j], x.qword[j].byte[i], b) ELSE dst.qword[j].byte[i] := src.qword[j].byte[i] FI ENDFOR ENDFOR dst[MAX:512] := 0
Integer GFNI AVX512F Arithmetic Compute an inverse affine transformation in the Galois Field 2^8. An affine transformation is defined by "A" * "x" + "b", where "A" represents an 8 by 8 bit matrix, "x" represents an 8-bit vector, and "b" is a constant immediate byte. The inverse of the 8-bit values in "x" is defined with respect to the reduction polynomial x^8 + x^4 + x^3 + x + 1. Store the packed 8-bit results in "dst". DEFINE parity(x) { t := 0 FOR i := 0 to 7 t := t XOR x.bit[i] ENDFOR RETURN t } DEFINE affine_inverse_byte(tsrc2qw, src1byte, imm8) { FOR i := 0 to 7 retbyte.bit[i] := parity(tsrc2qw.byte[7-i] AND inverse(src1byte)) XOR imm8.bit[i] ENDFOR RETURN retbyte } FOR j := 0 TO 7 FOR i := 0 to 7 dst.qword[j].byte[i] := affine_inverse_byte(A.qword[j], x.qword[j].byte[i], b) ENDFOR ENDFOR dst[MAX:512] := 0
Integer GFNI AVX512VL Arithmetic Compute an inverse affine transformation in the Galois Field 2^8. An affine transformation is defined by "A" * "x" + "b", where "A" represents an 8 by 8 bit matrix, "x" represents an 8-bit vector, and "b" is a constant immediate byte. The inverse of the 8-bit values in "x" is defined with respect to the reduction polynomial x^8 + x^4 + x^3 + x + 1. Store the packed 8-bit results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). DEFINE parity(x) { t := 0 FOR i := 0 to 7 t := t XOR x.bit[i] ENDFOR RETURN t } DEFINE affine_inverse_byte(tsrc2qw, src1byte, imm8) { FOR i := 0 to 7 retbyte.bit[i] := parity(tsrc2qw.byte[7-i] AND inverse(src1byte)) XOR imm8.bit[i] ENDFOR RETURN retbyte } FOR j := 0 TO 3 FOR i := 0 to 7 IF k[j*8+i] dst.qword[j].byte[i] := affine_inverse_byte(A.qword[j], x.qword[j].byte[i], b) ELSE dst.qword[j].byte[i] := 0 FI ENDFOR ENDFOR dst[MAX:256] := 0
Integer GFNI AVX512VL Arithmetic Compute an inverse affine transformation in the Galois Field 2^8. An affine transformation is defined by "A" * "x" + "b", where "A" represents an 8 by 8 bit matrix, "x" represents an 8-bit vector, and "b" is a constant immediate byte. The inverse of the 8-bit values in "x" is defined with respect to the reduction polynomial x^8 + x^4 + x^3 + x + 1. Store the packed 8-bit results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). DEFINE parity(x) { t := 0 FOR i := 0 to 7 t := t XOR x.bit[i] ENDFOR RETURN t } DEFINE affine_inverse_byte(tsrc2qw, src1byte, imm8) { FOR i := 0 to 7 retbyte.bit[i] := parity(tsrc2qw.byte[7-i] AND inverse(src1byte)) XOR imm8.bit[i] ENDFOR RETURN retbyte } FOR j := 0 TO 3 FOR i := 0 to 7 IF k[j*8+i] dst.qword[j].byte[i] := affine_inverse_byte(A.qword[j], x.qword[j].byte[i], b) ELSE dst.qword[j].byte[i] := src.qword[j].byte[i] FI ENDFOR ENDFOR dst[MAX:256] := 0
Integer GFNI AVX512VL Arithmetic Compute an inverse affine transformation in the Galois Field 2^8. An affine transformation is defined by "A" * "x" + "b", where "A" represents an 8 by 8 bit matrix, "x" represents an 8-bit vector, and "b" is a constant immediate byte. The inverse of the 8-bit values in "x" is defined with respect to the reduction polynomial x^8 + x^4 + x^3 + x + 1. Store the packed 8-bit results in "dst". DEFINE parity(x) { t := 0 FOR i := 0 to 7 t := t XOR x.bit[i] ENDFOR RETURN t } DEFINE affine_inverse_byte(tsrc2qw, src1byte, imm8) { FOR i := 0 to 7 retbyte.bit[i] := parity(tsrc2qw.byte[7-i] AND inverse(src1byte)) XOR imm8.bit[i] ENDFOR RETURN retbyte } FOR j := 0 TO 3 FOR i := 0 to 7 dst.qword[j].byte[i] := affine_inverse_byte(A.qword[j], x.qword[j].byte[i], b) ENDFOR ENDFOR dst[MAX:256] := 0
Integer GFNI AVX512VL Arithmetic Compute an inverse affine transformation in the Galois Field 2^8. An affine transformation is defined by "A" * "x" + "b", where "A" represents an 8 by 8 bit matrix, "x" represents an 8-bit vector, and "b" is a constant immediate byte. The inverse of the 8-bit values in "x" is defined with respect to the reduction polynomial x^8 + x^4 + x^3 + x + 1. Store the packed 8-bit results in "dst" using zeromask "k" (elements are zeroed out when the corresponding mask bit is not set). DEFINE parity(x) { t := 0 FOR i := 0 to 7 t := t XOR x.bit[i] ENDFOR RETURN t } DEFINE affine_inverse_byte(tsrc2qw, src1byte, imm8) { FOR i := 0 to 7 retbyte.bit[i] := parity(tsrc2qw.byte[7-i] AND inverse(src1byte)) XOR imm8.bit[i] ENDFOR RETURN retbyte } FOR j := 0 TO 1 FOR i := 0 to 7 IF k[j*8+i] dst.qword[j].byte[i] := affine_inverse_byte(A.qword[j], x.qword[j].byte[i], b) ELSE dst.qword[j].byte[i] := 0 FI ENDFOR ENDFOR dst[MAX:128] := 0
Integer GFNI AVX512VL Arithmetic Compute an inverse affine transformation in the Galois Field 2^8. An affine transformation is defined by "A" * "x" + "b", where "A" represents an 8 by 8 bit matrix, "x" represents an 8-bit vector, and "b" is a constant immediate byte. The inverse of the 8-bit values in "x" is defined with respect to the reduction polynomial x^8 + x^4 + x^3 + x + 1. Store the packed 8-bit results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). DEFINE parity(x) { t := 0 FOR i := 0 to 7 t := t XOR x.bit[i] ENDFOR RETURN t } DEFINE affine_inverse_byte(tsrc2qw, src1byte, imm8) { FOR i := 0 to 7 retbyte.bit[i] := parity(tsrc2qw.byte[7-i] AND inverse(src1byte)) XOR imm8.bit[i] ENDFOR RETURN retbyte } FOR j := 0 TO 1 FOR i := 0 to 7 IF k[j*8+i] dst.qword[j].byte[i] := affine_inverse_byte(A.qword[j], x.qword[j].byte[i], b) ELSE dst.qword[j].byte[i] := src.qword[j].byte[i] FI ENDFOR ENDFOR dst[MAX:128] := 0
Integer GFNI AVX512VL Arithmetic Compute an inverse affine transformation in the Galois Field 2^8. An affine transformation is defined by "A" * "x" + "b", where "A" represents an 8 by 8 bit matrix, "x" represents an 8-bit vector, and "b" is a constant immediate byte. The inverse of the 8-bit values in "x" is defined with respect to the reduction polynomial x^8 + x^4 + x^3 + x + 1. Store the packed 8-bit results in "dst". DEFINE parity(x) { t := 0 FOR i := 0 to 7 t := t XOR x.bit[i] ENDFOR RETURN t } DEFINE affine_inverse_byte(tsrc2qw, src1byte, imm8) { FOR i := 0 to 7 retbyte.bit[i] := parity(tsrc2qw.byte[7-i] AND inverse(src1byte)) XOR imm8.bit[i] ENDFOR RETURN retbyte } FOR j := 0 TO 1 FOR i := 0 to 7 dst.qword[j].byte[i] := affine_inverse_byte(A.qword[j], x.qword[j].byte[i], b) ENDFOR ENDFOR dst[MAX:128] := 0
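The inverse affine entries above apply the matrix to inverse(x) rather than to x. The sketch below models that on one byte in Python. It assumes that inverse(0) is 0 and computes the inverse as x^254 under the stated reduction polynomial. The matrix constant and the 0x63 immediate are an assumed example (the AES S-box affine step) and do not come from the entries themselves.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    r = 0
    for _ in range(8):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
    return r


def gf_inverse(x: int) -> int:
    """x^254, which is the multiplicative inverse for x != 0; 0 maps to 0 (assumption)."""
    result = 1
    for _ in range(254):
        result = gf_mul(result, x)
    return result


def parity(x: int) -> int:
    t = 0
    for i in range(8):
        t ^= (x >> i) & 1
    return t


def affine_inverse_byte(a_qword: int, x_byte: int, imm8: int) -> int:
    """Scalar model of affine_inverse_byte(tsrc2qw, src1byte, imm8)."""
    inv = gf_inverse(x_byte)
    ret = 0
    for i in range(8):
        row = (a_qword >> (8 * (7 - i))) & 0xFF  # tsrc2qw.byte[7-i]
        ret |= (parity(row & inv) ^ ((imm8 >> i) & 1)) << i
    return ret


# Assumed example: rows of the AES S-box affine map, with constant 0x63.
AES_MATRIX = 0xF1E3C78F1F3E7CF8

if __name__ == "__main__":
    print(hex(affine_inverse_byte(AES_MATRIX, 0x53, 0x63)))
```

Using the AES constants makes the model easy to check against published S-box values, which is why they were chosen here.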
INVPCID OS-Targeted Invalidate mappings in the Translation Lookaside Buffers (TLBs) and paging-structure caches for the processor context identifier (PCID) specified by "descriptor" based on the invalidation type specified in "type". The PCID "descriptor" is specified as a 16-byte memory operand (with no alignment restrictions) where bits [11:0] specify the PCID, and bits [127:64] specify the linear address; bits [63:12] are reserved. The types supported are: 0) Individual-address invalidation: If "type" is 0, the logical processor invalidates mappings for a single linear address that are tagged with the PCID specified in "descriptor", except global translations. The instruction may also invalidate global translations, mappings for other linear addresses, or mappings tagged with other PCIDs. 1) Single-context invalidation: If "type" is 1, the logical processor invalidates all mappings tagged with the PCID specified in "descriptor" except global translations. In some cases, it may invalidate mappings for other PCIDs as well. 2) All-context invalidation: If "type" is 2, the logical processor invalidates all mappings tagged with any PCID, including global translations. 3) All-context invalidation, retaining global translations: If "type" is 3, the logical processor invalidates all mappings tagged with any PCID except global translations, ignoring "descriptor". The instruction may also invalidate global translations. CASE type[1:0] OF 0: // individual-address invalidation retaining global translations OP_PCID := MEM[descriptor+11:descriptor] ADDR := MEM[descriptor+127:descriptor+64] // invalidate mappings for ADDR tagged with OP_PCID except global translations BREAK 1: // single PCID invalidation retaining globals OP_PCID := MEM[descriptor+11:descriptor] // invalidate all mappings tagged with OP_PCID except global translations BREAK 2: // all PCID invalidation // invalidate all mappings tagged with any PCID BREAK 3: // all PCID invalidation retaining global translations // invalidate all mappings tagged with any PCID except global translations BREAK ESAC
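The 16-byte descriptor layout described above (PCID in bits [11:0], reserved bits [63:12], linear address in bits [127:64]) can be illustrated by packing it by hand. This is only a layout sketch in Python. The helper name make_invpcid_descriptor is hypothetical, little-endian byte order is assumed, and nothing here executes the privileged instruction.

```python
import struct


def make_invpcid_descriptor(pcid: int, linear_address: int = 0) -> bytes:
    """Pack a 16-byte descriptor: PCID in bits 11:0, address in bits 127:64."""
    if not 0 <= pcid < (1 << 12):
        raise ValueError("PCID must fit in 12 bits")
    if not 0 <= linear_address < (1 << 64):
        raise ValueError("linear address must fit in 64 bits")
    low_qword = pcid  # bits 63:12 are reserved and left as zero
    return struct.pack("<QQ", low_qword, linear_address)


if __name__ == "__main__":
    desc = make_invpcid_descriptor(0x123, 0x00007FFF12345000)
    print(desc.hex())
```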
KNCNI General Support Fetch the line of data from memory that contains address "p" to a location in the cache hierarchy specified by the locality hint "i".
Mask KNCNI Mask Compute the bitwise NOT of 16-bit masks "a" and then AND with "b", and store the result in "k". k[15:0] := (NOT a[15:0]) AND b[15:0] k[MAX:16] := 0
Mask KNCNI Mask Compute the bitwise AND of 16-bit masks "a" and "b", and store the result in "k". k[15:0] := a[15:0] AND b[15:0] k[MAX:16] := 0
Mask KNCNI Mask Copy 16-bit mask "a" to "k". k[15:0] := a[15:0] k[MAX:16] := 0
Mask KNCNI Mask Compute the bitwise NOT of 16-bit mask "a", and store the result in "k". k[15:0] := NOT a[15:0] k[MAX:16] := 0
Mask KNCNI Mask Compute the bitwise OR of 16-bit masks "a" and "b", and store the result in "k". k[15:0] := a[15:0] OR b[15:0] k[MAX:16] := 0
Mask KNCNI Mask Compute the bitwise XNOR of 16-bit masks "a" and "b", and store the result in "k". k[15:0] := NOT (a[15:0] XOR b[15:0]) k[MAX:16] := 0
Mask KNCNI Mask Compute the bitwise XOR of 16-bit masks "a" and "b", and store the result in "k". k[15:0] := a[15:0] XOR b[15:0] k[MAX:16] := 0
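The 16-bit mask operations above can be mirrored with plain integers. A minimal Python sketch follows, with assumed function names; the only subtlety is that NOT has to be truncated to 16 bits, which the pseudocode expresses with k[MAX:16] := 0.

```python
MASK16 = 0xFFFF  # only bits 15:0 are kept; upper bits are zeroed


def kandn(a: int, b: int) -> int:
    """(NOT a) AND b on 16-bit masks."""
    return (~a & MASK16) & b


def kand(a: int, b: int) -> int:
    return a & b & MASK16


def knot(a: int) -> int:
    return ~a & MASK16


def kor(a: int, b: int) -> int:
    return (a | b) & MASK16


def kxnor(a: int, b: int) -> int:
    """NOT (a XOR b) on 16-bit masks."""
    return ~(a ^ b) & MASK16


def kxor(a: int, b: int) -> int:
    return (a ^ b) & MASK16


if __name__ == "__main__":
    print(hex(kandn(0x00FF, 0x0FF0)))
    print(hex(kxnor(0x1234, 0x1234)))
```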
Integer Mask KNCNI Compare Compare packed signed 32-bit integers in "a" and "b" for less-than, and store the results in mask vector "k". FOR j := 0 to 15 i := j*32 k[j] := ( a[i+31:i] < b[i+31:i] ) ? 1 : 0 ENDFOR k[MAX:16] := 0
Integer Mask KNCNI Compare Compare packed signed 32-bit integers in "a" and "b" for less-than, and store the results in mask vector "k" using zeromask "k1" (elements are zeroed out when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k1[j] k[j] := ( a[i+31:i] < b[i+31:i] ) ? 1 : 0 ELSE k[j] := 0 FI ENDFOR k[MAX:16] := 0
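The two compare entries above produce one mask bit per 32-bit lane. The sketch below models them in Python on lists of 16 lane values. The function names are assumptions, and lanes are treated as raw 32-bit patterns that are reinterpreted as signed before comparing.

```python
def to_signed32(v: int) -> int:
    """Reinterpret the low 32 bits of v as a signed integer."""
    v &= 0xFFFFFFFF
    return v - (1 << 32) if v & 0x80000000 else v


def cmplt_epi32_mask(a, b, k1: int = 0xFFFF) -> int:
    """Bit j of the result is set when k1[j] is set and a[j] < b[j] (signed)."""
    k = 0
    for j in range(16):
        if (k1 >> j) & 1 and to_signed32(a[j]) < to_signed32(b[j]):
            k |= 1 << j
    return k


if __name__ == "__main__":
    a = list(range(16))
    b = [8] * 16
    print(hex(cmplt_epi32_mask(a, b)))
    print(hex(cmplt_epi32_mask(a, b, k1=0x0F0F)))
```

Leaving k1 at its default of all ones gives the unmasked form; passing a narrower k1 gives the zeromask form.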
Floating Point KNCNI Load Depending on "bc", loads 1, 4, or 16 elements of type and size determined by "conv" from memory address "mt" and converts all elements to single-precision (32-bit) floating-point elements, storing the results in "dst". "hint" indicates to the processor whether the data is non-temporal. addr := MEM[mt] FOR j := 0 to 15 i := j*32 CASE bc OF _MM_BROADCAST32_NONE: CASE conv OF _MM_UPCONV_PS_NONE: n := j*32 dst[i+31:i] := addr[n+31:n] _MM_UPCONV_PS_FLOAT16: n := j*16 dst[i+31:i] := Convert_FP16_To_FP32(addr[n+15:n]) _MM_UPCONV_PS_UINT8: n := j*8 dst[i+31:i] := Convert_UInt8_To_FP32(addr[n+7:n]) _MM_UPCONV_PS_SINT8: n := j*8 dst[i+31:i] := Convert_Int8_To_FP32(addr[n+7:n]) _MM_UPCONV_PS_UINT16: n := j*16 dst[i+31:i] := Convert_UInt16_To_FP32(addr[n+15:n]) _MM_UPCONV_PS_SINT16: n := j*16 dst[i+31:i] := Convert_Int16_To_FP32(addr[n+15:n]) ESAC _MM_BROADCAST_1X16: CASE conv OF _MM_UPCONV_PS_NONE: n := j*32 dst[i+31:i] := addr[31:0] _MM_UPCONV_PS_FLOAT16: n := j*16 dst[i+31:i] := Convert_FP16_To_FP32(addr[15:0]) _MM_UPCONV_PS_UINT8: n := j*8 dst[i+31:i] := Convert_UInt8_To_FP32(addr[7:0]) _MM_UPCONV_PS_SINT8: n := j*8 dst[i+31:i] := Convert_Int8_To_FP32(addr[7:0]) _MM_UPCONV_PS_UINT16: n := j*16 dst[i+31:i] := Convert_UInt16_To_FP32(addr[15:0]) _MM_UPCONV_PS_SINT16: n := j*16 dst[i+31:i] := Convert_Int16_To_FP32(addr[15:0]) ESAC _MM_BROADCAST_4X16: mod := j%4 CASE conv OF _MM_UPCONV_PS_NONE: n := mod*32 dst[i+31:i] := addr[n+31:n] _MM_UPCONV_PS_FLOAT16: n := mod*16 dst[i+31:i] := Convert_FP16_To_FP32(addr[n+15:n]) _MM_UPCONV_PS_UINT8: n := mod*8 dst[i+31:i] := Convert_UInt8_To_FP32(addr[n+7:n]) _MM_UPCONV_PS_SINT8: n := mod*8 dst[i+31:i] := Convert_Int8_To_FP32(addr[n+7:n]) _MM_UPCONV_PS_UINT16: n := mod*16 dst[i+31:i] := Convert_UInt16_To_FP32(addr[n+15:n]) _MM_UPCONV_PS_SINT16: n := mod*16 dst[i+31:i] := Convert_Int16_To_FP32(addr[n+15:n]) ESAC ESAC ENDFOR dst[MAX:512] := 0
Floating Point KNCNI Load Depending on "bc", loads 1, 4, or 16 elements of type and size determined by "conv" from memory address "mt" and converts all elements to single-precision (32-bit) floating-point elements, storing the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "hint" indicates to the processor whether the data is non-temporal. addr := MEM[mt] FOR j := 0 to 15 i := j*32 IF k[j] CASE bc OF _MM_BROADCAST32_NONE: CASE conv OF _MM_UPCONV_PS_NONE: n := j*32 dst[i+31:i] := addr[n+31:n] _MM_UPCONV_PS_FLOAT16: n := j*16 dst[i+31:i] := Convert_FP16_To_FP32(addr[n+15:n]) _MM_UPCONV_PS_UINT8: n := j*8 dst[i+31:i] := Convert_UInt8_To_FP32(addr[n+7:n]) _MM_UPCONV_PS_SINT8: n := j*8 dst[i+31:i] := Convert_Int8_To_FP32(addr[n+7:n]) _MM_UPCONV_PS_UINT16: n := j*16 dst[i+31:i] := Convert_UInt16_To_FP32(addr[n+15:n]) _MM_UPCONV_PS_SINT16: n := j*16 dst[i+31:i] := Convert_Int16_To_FP32(addr[n+15:n]) ESAC _MM_BROADCAST_1X16: CASE conv OF _MM_UPCONV_PS_NONE: n := j*32 dst[i+31:i] := addr[31:0] _MM_UPCONV_PS_FLOAT16: n := j*16 dst[i+31:i] := Convert_FP16_To_FP32(addr[15:0]) _MM_UPCONV_PS_UINT8: n := j*8 dst[i+31:i] := Convert_UInt8_To_FP32(addr[7:0]) _MM_UPCONV_PS_SINT8: n := j*8 dst[i+31:i] := Convert_Int8_To_FP32(addr[7:0]) _MM_UPCONV_PS_UINT16: n := j*16 dst[i+31:i] := Convert_UInt16_To_FP32(addr[15:0]) _MM_UPCONV_PS_SINT16: n := j*16 dst[i+31:i] := Convert_Int16_To_FP32(addr[15:0]) ESAC _MM_BROADCAST_4X16: mod := j%4 CASE conv OF _MM_UPCONV_PS_NONE: n := mod*32 dst[i+31:i] := addr[n+31:n] _MM_UPCONV_PS_FLOAT16: n := mod*16 dst[i+31:i] := Convert_FP16_To_FP32(addr[n+15:n]) _MM_UPCONV_PS_UINT8: n := mod*8 dst[i+31:i] := Convert_UInt8_To_FP32(addr[n+7:n]) _MM_UPCONV_PS_SINT8: n := mod*8 dst[i+31:i] := Convert_Int8_To_FP32(addr[n+7:n]) _MM_UPCONV_PS_UINT16: n := mod*16 dst[i+31:i] := Convert_UInt16_To_FP32(addr[n+15:n]) _MM_UPCONV_PS_SINT16: n := mod*16 dst[i+31:i] := Convert_Int16_To_FP32(addr[n+15:n]) ESAC ESAC ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Integer KNCNI Load Depending on "bc", loads 1, 4, or 16 elements of type and size determined by "conv" from memory address "mt" and converts all elements to 32-bit integer elements, storing the results in "dst". "hint" indicates to the processor whether the data is non-temporal. addr := MEM[mt] FOR j := 0 to 15 i := j*32 CASE bc OF _MM_BROADCAST32_NONE: CASE conv OF _MM_UPCONV_EPI32_NONE: n := j*32 dst[i+31:i] := addr[n+31:n] _MM_UPCONV_EPI32_UINT8: n := j*8 dst[i+31:i] := ZeroExtend32(addr[n+7:n]) _MM_UPCONV_EPI32_SINT8: n := j*8 dst[i+31:i] := SignExtend32(addr[n+7:n]) _MM_UPCONV_EPI32_UINT16: n := j*16 dst[i+31:i] := ZeroExtend32(addr[n+15:n]) _MM_UPCONV_EPI32_SINT16: n := j*16 dst[i+31:i] := SignExtend32(addr[n+15:n]) ESAC _MM_BROADCAST_1X16: CASE conv OF _MM_UPCONV_EPI32_NONE: n := j*32 dst[i+31:i] := addr[31:0] _MM_UPCONV_EPI32_UINT8: n := j*8 dst[i+31:i] := ZeroExtend32(addr[7:0]) _MM_UPCONV_EPI32_SINT8: n := j*8 dst[i+31:i] := SignExtend32(addr[7:0]) _MM_UPCONV_EPI32_UINT16: n := j*16 dst[i+31:i] := ZeroExtend32(addr[15:0]) _MM_UPCONV_EPI32_SINT16: n := j*16 dst[i+31:i] := SignExtend32(addr[15:0]) ESAC _MM_BROADCAST_4X16: mod := j%4 CASE conv OF _MM_UPCONV_EPI32_NONE: n := mod*32 dst[i+31:i] := addr[n+31:n] _MM_UPCONV_EPI32_UINT8: n := mod*8 dst[i+31:i] := ZeroExtend32(addr[n+7:n]) _MM_UPCONV_EPI32_SINT8: n := mod*8 dst[i+31:i] := SignExtend32(addr[n+7:n]) _MM_UPCONV_EPI32_UINT16: n := mod*16 dst[i+31:i] := ZeroExtend32(addr[n+15:n]) _MM_UPCONV_EPI32_SINT16: n := mod*16 dst[i+31:i] := SignExtend32(addr[n+15:n]) ESAC ESAC ENDFOR dst[MAX:512] := 0
Integer KNCNI Load Depending on "bc", loads 1, 4, or 16 elements of type and size determined by "conv" from memory address "mt" and converts all elements to 32-bit integer elements, storing the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "hint" indicates to the processor whether the data is non-temporal. addr := MEM[mt] FOR j := 0 to 15 i := j*32 IF k[j] CASE bc OF _MM_BROADCAST32_NONE: CASE conv OF _MM_UPCONV_EPI32_NONE: n := j*32 dst[i+31:i] := addr[n+31:n] _MM_UPCONV_EPI32_UINT8: n := j*8 dst[i+31:i] := ZeroExtend32(addr[n+7:n]) _MM_UPCONV_EPI32_SINT8: n := j*8 dst[i+31:i] := SignExtend32(addr[n+7:n]) _MM_UPCONV_EPI32_UINT16: n := j*16 dst[i+31:i] := ZeroExtend32(addr[n+15:n]) _MM_UPCONV_EPI32_SINT16: n := j*16 dst[i+31:i] := SignExtend32(addr[n+15:n]) ESAC _MM_BROADCAST_1X16: CASE conv OF _MM_UPCONV_EPI32_NONE: n := j*32 dst[i+31:i] := addr[31:0] _MM_UPCONV_EPI32_UINT8: n := j*8 dst[i+31:i] := ZeroExtend32(addr[7:0]) _MM_UPCONV_EPI32_SINT8: n := j*8 dst[i+31:i] := SignExtend32(addr[7:0]) _MM_UPCONV_EPI32_UINT16: n := j*16 dst[i+31:i] := ZeroExtend32(addr[15:0]) _MM_UPCONV_EPI32_SINT16: n := j*16 dst[i+31:i] := SignExtend32(addr[15:0]) ESAC _MM_BROADCAST_4X16: mod := j%4 CASE conv OF _MM_UPCONV_EPI32_NONE: n := mod*32 dst[i+31:i] := addr[n+31:n] _MM_UPCONV_EPI32_UINT8: n := mod*8 dst[i+31:i] := ZeroExtend32(addr[n+7:n]) _MM_UPCONV_EPI32_SINT8: n := mod*8 dst[i+31:i] := SignExtend32(addr[n+7:n]) _MM_UPCONV_EPI32_UINT16: n := mod*16 dst[i+31:i] := ZeroExtend32(addr[n+15:n]) _MM_UPCONV_EPI32_SINT16: n := mod*16 dst[i+31:i] := SignExtend32(addr[n+15:n]) ESAC ESAC ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
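The broadcast and up-conversion selection in the 32-bit integer load entries above comes down to two choices per lane: which source element to read (j, 0, or j%4) and how to widen it. The following Python sketch models that on a bytes object. The short string keys stand in for the _MM_BROADCAST_* and _MM_UPCONV_EPI32_* constants, little-endian memory is assumed, and results are returned as raw 32-bit patterns.

```python
# conv key -> (element size in bytes, sign-extend?)
CONV = {
    "NONE": (4, False),
    "UINT8": (1, False),
    "SINT8": (1, True),
    "UINT16": (2, False),
    "SINT16": (2, True),
}
# bc key -> number of distinct elements read from memory
BROADCAST = {"NONE": 16, "1X16": 1, "4X16": 4}


def extload_epi32(mem: bytes, conv="NONE", bc="NONE", k=0xFFFF, src=None):
    """Return 16 lanes; lanes with a clear mask bit are copied from src."""
    size, signed = CONV[conv]
    count = BROADCAST[bc]
    src = src if src is not None else [0] * 16
    dst = []
    for j in range(16):
        if (k >> j) & 1:
            n = (j % count) * size  # j, 0, or j%4 scaled by element size
            value = int.from_bytes(mem[n:n + size], "little", signed=signed)
            dst.append(value & 0xFFFFFFFF)  # keep the 32-bit pattern
        else:
            dst.append(src[j])
    return dst


if __name__ == "__main__":
    print(extload_epi32(bytes([1, 2, 3, 4]), conv="UINT8", bc="4X16"))
```

Collapsing the three broadcast cases into a single modulo is a simplification of the nested CASE statements, not a description of how the hardware decodes them.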
Floating Point KNCNI Load Depending on "bc", loads 1, 4, or 8 elements of type and size determined by "conv" from memory address "mt" and converts all elements to double-precision (64-bit) floating-point elements, storing the results in "dst". "hint" indicates to the processor whether the data is non-temporal. addr := MEM[mt] FOR j := 0 to 7 i := j*64 CASE bc OF _MM_BROADCAST64_NONE: CASE conv OF _MM_UPCONV_PD_NONE: n := j*64 dst[i+63:i] := addr[n+63:n] ESAC _MM_BROADCAST_1X8: CASE conv OF _MM_UPCONV_PD_NONE: n := j*64 dst[i+63:i] := addr[63:0] ESAC _MM_BROADCAST_4X8: mod := j%4 CASE conv OF _MM_UPCONV_PD_NONE: n := mod*64 dst[i+63:i] := addr[n+63:n] ESAC ESAC ENDFOR dst[MAX:512] := 0
Floating Point KNCNI Load Depending on "bc", loads 1, 4, or 8 elements of type and size determined by "conv" from memory address "mt" and converts all elements to double-precision (64-bit) floating-point elements, storing the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "hint" indicates to the processor whether the data is non-temporal. addr := MEM[mt] FOR j := 0 to 7 i := j*64 IF k[j] CASE bc OF _MM_BROADCAST64_NONE: CASE conv OF _MM_UPCONV_PD_NONE: n := j*64 dst[i+63:i] := addr[n+63:n] ESAC _MM_BROADCAST_1X8: CASE conv OF _MM_UPCONV_PD_NONE: n := j*64 dst[i+63:i] := addr[63:0] ESAC _MM_BROADCAST_4X8: mod := j%4 CASE conv OF _MM_UPCONV_PD_NONE: n := mod*64 dst[i+63:i] := addr[n+63:n] ESAC ESAC ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Integer KNCNI Load Depending on "bc", loads 1, 4, or 8 elements of type and size determined by "conv" from memory address "mt" and converts all elements to 64-bit integer elements, storing the results in "dst". "hint" indicates to the processor whether the data is non-temporal. addr := MEM[mt] FOR j := 0 to 7 i := j*64 CASE bc OF _MM_BROADCAST64_NONE: CASE conv OF _MM_UPCONV_EPI64_NONE: n := j*64 dst[i+63:i] := addr[n+63:n] ESAC _MM_BROADCAST_1X8: CASE conv OF _MM_UPCONV_EPI64_NONE: n := j*64 dst[i+63:i] := addr[63:0] ESAC _MM_BROADCAST_4X8: mod := j%4 CASE conv OF _MM_UPCONV_EPI64_NONE: n := mod*64 dst[i+63:i] := addr[n+63:n] ESAC ESAC ENDFOR dst[MAX:512] := 0
Integer KNCNI Load Depending on "bc", loads 1, 4, or 8 elements of type and size determined by "conv" from memory address "mt" and converts all elements to 64-bit integer elements, storing the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "hint" indicates to the processor whether the data is non-temporal. addr := MEM[mt] FOR j := 0 to 7 i := j*64 IF k[j] CASE bc OF _MM_BROADCAST64_NONE: CASE conv OF _MM_UPCONV_EPI64_NONE: n := j*64 dst[i+63:i] := addr[n+63:n] ESAC _MM_BROADCAST_1X8: CASE conv OF _MM_UPCONV_EPI64_NONE: n := j*64 dst[i+63:i] := addr[63:0] ESAC _MM_BROADCAST_4X8: mod := j%4 CASE conv OF _MM_UPCONV_EPI64_NONE: n := mod*64 dst[i+63:i] := addr[n+63:n] ESAC ESAC ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point KNCNI Swizzle Performs a swizzle transformation of each of the four groups of packed 4x single-precision (32-bit) floating-point elements in "v" using swizzle parameter "s", storing the results in "dst". CASE s OF _MM_SWIZ_REG_NONE: dst[511:0] := v[511:0] _MM_SWIZ_REG_DCBA: dst[511:0] := v[511:0] _MM_SWIZ_REG_CDAB: FOR j := 0 to 7 i := j*64 dst[i+31:i] := v[i+63:i+32] dst[i+63:i+32] := v[i+31:i] ENDFOR _MM_SWIZ_REG_BADC: FOR j := 0 to 3 i := j*128 dst[i+31:i] := v[i+95:i+64] dst[i+63:i+32] := v[i+127:i+96] dst[i+95:i+64] := v[i+31:i] dst[i+127:i+96] := v[i+63:i+32] ENDFOR _MM_SWIZ_REG_AAAA: FOR j := 0 to 3 i := j*128 dst[i+31:i] := v[i+31:i] dst[i+63:i+32] := v[i+31:i] dst[i+95:i+64] := v[i+31:i] dst[i+127:i+96] := v[i+31:i] ENDFOR _MM_SWIZ_REG_BBBB: FOR j := 0 to 3 i := j*128 dst[i+31:i] := v[i+63:i+32] dst[i+63:i+32] := v[i+63:i+32] dst[i+95:i+64] := v[i+63:i+32] dst[i+127:i+96] := v[i+63:i+32] ENDFOR _MM_SWIZ_REG_CCCC: FOR j := 0 to 3 i := j*128 dst[i+31:i] := v[i+95:i+64] dst[i+63:i+32] := v[i+95:i+64] dst[i+95:i+64] := v[i+95:i+64] dst[i+127:i+96] := v[i+95:i+64] ENDFOR _MM_SWIZ_REG_DDDD: FOR j := 0 to 3 i := j*128 dst[i+31:i] := v[i+127:i+96] dst[i+63:i+32] := v[i+127:i+96] dst[i+95:i+64] := v[i+127:i+96] dst[i+127:i+96] := v[i+127:i+96] ENDFOR _MM_SWIZ_REG_DACB: FOR j := 0 to 3 i := j*128 dst[i+31:i] := v[i+63:i+32] dst[i+63:i+32] := v[i+95:i+64] dst[i+95:i+64] := v[i+31:i] dst[i+127:i+96] := v[i+127:i+96] ENDFOR ESAC dst[MAX:512] := 0
Floating Point KNCNI Swizzle Performs a swizzle transformation of each of the two groups of packed 4x double-precision (64-bit) floating-point elements in "v" using swizzle parameter "s", storing the results in "dst". CASE s OF _MM_SWIZ_REG_NONE: dst[511:0] := v[511:0] _MM_SWIZ_REG_DCBA: dst[511:0] := v[511:0] _MM_SWIZ_REG_CDAB: FOR j := 0 to 3 i := j*128 dst[i+63:i] := v[i+127:i+64] dst[i+127:i+64] := v[i+63:i] ENDFOR _MM_SWIZ_REG_BADC: FOR j := 0 to 1 i := j*256 dst[i+63:i] := v[i+191:i+128] dst[i+127:i+64] := v[i+255:i+192] dst[i+191:i+128] := v[i+63:i] dst[i+255:i+192] := v[i+127:i+64] ENDFOR _MM_SWIZ_REG_AAAA: FOR j := 0 to 1 i := j*256 dst[i+63:i] := v[i+63:i] dst[i+127:i+64] := v[i+63:i] dst[i+191:i+128] := v[i+63:i] dst[i+255:i+192] := v[i+63:i] ENDFOR _MM_SWIZ_REG_BBBB: FOR j := 0 to 1 i := j*256 dst[i+63:i] := v[i+127:i+64] dst[i+127:i+64] := v[i+127:i+64] dst[i+191:i+128] := v[i+127:i+64] dst[i+255:i+192] := v[i+127:i+64] ENDFOR _MM_SWIZ_REG_CCCC: FOR j := 0 to 1 i := j*256 dst[i+63:i] := v[i+191:i+128] dst[i+127:i+64] := v[i+191:i+128] dst[i+191:i+128] := v[i+191:i+128] dst[i+255:i+192] := v[i+191:i+128] ENDFOR _MM_SWIZ_REG_DDDD: FOR j := 0 to 1 i := j*256 dst[i+63:i] := v[i+255:i+192] dst[i+127:i+64] := v[i+255:i+192] dst[i+191:i+128] := v[i+255:i+192] dst[i+255:i+192] := v[i+255:i+192] ENDFOR _MM_SWIZ_REG_DACB: FOR j := 0 to 1 i := j*256 dst[i+63:i] := v[i+127:i+64] dst[i+127:i+64] := v[i+191:i+128] dst[i+191:i+128] := v[i+63:i] dst[i+255:i+192] := v[i+255:i+192] ENDFOR ESAC dst[MAX:512] := 0
Integer KNCNI Swizzle Performs a swizzle transformation of each of the four groups of packed 4x 32-bit integer elements in "v" using swizzle parameter "s", storing the results in "dst". CASE s OF _MM_SWIZ_REG_NONE: dst[511:0] := v[511:0] _MM_SWIZ_REG_DCBA: dst[511:0] := v[511:0] _MM_SWIZ_REG_CDAB: FOR j := 0 to 7 i := j*64 dst[i+31:i] := v[i+63:i+32] dst[i+63:i+32] := v[i+31:i] ENDFOR _MM_SWIZ_REG_BADC: FOR j := 0 to 3 i := j*128 dst[i+31:i] := v[i+95:i+64] dst[i+63:i+32] := v[i+127:i+96] dst[i+95:i+64] := v[i+31:i] dst[i+127:i+96] := v[i+63:i+32] ENDFOR _MM_SWIZ_REG_AAAA: FOR j := 0 to 3 i := j*128 dst[i+31:i] := v[i+31:i] dst[i+63:i+32] := v[i+31:i] dst[i+95:i+64] := v[i+31:i] dst[i+127:i+96] := v[i+31:i] ENDFOR _MM_SWIZ_REG_BBBB: FOR j := 0 to 3 i := j*128 dst[i+31:i] := v[i+63:i+32] dst[i+63:i+32] := v[i+63:i+32] dst[i+95:i+64] := v[i+63:i+32] dst[i+127:i+96] := v[i+63:i+32] ENDFOR _MM_SWIZ_REG_CCCC: FOR j := 0 to 3 i := j*128 dst[i+31:i] := v[i+95:i+64] dst[i+63:i+32] := v[i+95:i+64] dst[i+95:i+64] := v[i+95:i+64] dst[i+127:i+96] := v[i+95:i+64] ENDFOR _MM_SWIZ_REG_DDDD: FOR j := 0 to 3 i := j*128 dst[i+31:i] := v[i+127:i+96] dst[i+63:i+32] := v[i+127:i+96] dst[i+95:i+64] := v[i+127:i+96] dst[i+127:i+96] := v[i+127:i+96] ENDFOR _MM_SWIZ_REG_DACB: FOR j := 0 to 3 i := j*128 dst[i+31:i] := v[i+63:i+32] dst[i+63:i+32] := v[i+95:i+64] dst[i+95:i+64] := v[i+31:i] dst[i+127:i+96] := v[i+127:i+96] ENDFOR ESAC dst[MAX:512] := 0
Integer KNCNI Swizzle Performs a swizzle transformation of each of the two groups of packed 4x 64-bit integer elements in "v" using swizzle parameter "s", storing the results in "dst". CASE s OF _MM_SWIZ_REG_NONE: dst[511:0] := v[511:0] _MM_SWIZ_REG_DCBA: dst[511:0] := v[511:0] _MM_SWIZ_REG_CDAB: FOR j := 0 to 3 i := j*128 dst[i+63:i] := v[i+127:i+64] dst[i+127:i+64] := v[i+63:i] ENDFOR _MM_SWIZ_REG_BADC: FOR j := 0 to 1 i := j*256 dst[i+63:i] := v[i+191:i+128] dst[i+127:i+64] := v[i+255:i+192] dst[i+191:i+128] := v[i+63:i] dst[i+255:i+192] := v[i+127:i+64] ENDFOR _MM_SWIZ_REG_AAAA: FOR j := 0 to 1 i := j*256 dst[i+63:i] := v[i+63:i] dst[i+127:i+64] := v[i+63:i] dst[i+191:i+128] := v[i+63:i] dst[i+255:i+192] := v[i+63:i] ENDFOR _MM_SWIZ_REG_BBBB: FOR j := 0 to 1 i := j*256 dst[i+63:i] := v[i+127:i+64] dst[i+127:i+64] := v[i+127:i+64] dst[i+191:i+128] := v[i+127:i+64] dst[i+255:i+192] := v[i+127:i+64] ENDFOR _MM_SWIZ_REG_CCCC: FOR j := 0 to 1 i := j*256 dst[i+63:i] := v[i+191:i+128] dst[i+127:i+64] := v[i+191:i+128] dst[i+191:i+128] := v[i+191:i+128] dst[i+255:i+192] := v[i+191:i+128] ENDFOR _MM_SWIZ_REG_DDDD: FOR j := 0 to 1 i := j*256 dst[i+63:i] := v[i+255:i+192] dst[i+127:i+64] := v[i+255:i+192] dst[i+191:i+128] := v[i+255:i+192] dst[i+255:i+192] := v[i+255:i+192] ENDFOR _MM_SWIZ_REG_DACB: FOR j := 0 to 1 i := j*256 dst[i+63:i] := v[i+127:i+64] dst[i+127:i+64] := v[i+191:i+128] dst[i+191:i+128] := v[i+63:i] dst[i+255:i+192] := v[i+255:i+192] ENDFOR ESAC dst[MAX:512] := 0
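The swizzle names in the entries above can be read as the source elements listed from the highest position of each 4-element group down to the lowest, with A as element 0. That reading is inferred from the pseudocode cases rather than stated in the entries, so the Python sketch below is an assumption-labelled model; the function name and the short string keys standing in for the _MM_SWIZ_REG_* constants are illustrative.

```python
def swizzle(v, s: str = "DCBA"):
    """Permute each group of 4 elements of v according to pattern s.

    s is the suffix of the swizzle constant, e.g. "CDAB". "NONE" and
    "DCBA" are both the identity, as in the pseudocode.
    """
    if s == "NONE":
        s = "DCBA"
    if len(s) != 4 or any(c not in "ABCD" for c in s):
        raise ValueError("unknown swizzle pattern")
    if len(v) % 4:
        raise ValueError("element count must be a multiple of 4")
    dst = []
    for base in range(0, len(v), 4):
        for p in range(4):
            # Position p (0 = lowest) takes the letter p places from the right.
            dst.append(v[base + ord(s[3 - p]) - ord("A")])
    return dst


if __name__ == "__main__":
    lanes = list(range(16))  # 16 lanes models the 32-bit forms
    print(swizzle(lanes, "CDAB"))
    print(swizzle(list(range(8)), "DACB"))  # 8 lanes models the 64-bit forms
```

The same function covers both element widths because the pattern depends only on the position within a group of four, not on the element size.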
Floating Point KNCNI Swizzle Performs a swizzle transformation of each of the four groups of packed 4x single-precision (32-bit) floating-point elements in "v" using swizzle parameter "s", storing the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). CASE s OF _MM_SWIZ_REG_NONE: dst[511:0] := v[511:0] _MM_SWIZ_REG_DCBA: dst[511:0] := v[511:0] _MM_SWIZ_REG_CDAB: FOR j := 0 to 7 i := j*64 IF k[j*2] dst[i+31:i] := v[i+63:i+32] ELSE dst[i+31:i] := src[i+31:i] FI IF k[j*2+1] dst[i+63:i+32] := v[i+31:i] ELSE dst[i+63:i+32] := src[i+63:i+32] FI ENDFOR _MM_SWIZ_REG_BADC: FOR j := 0 to 3 i := j*128 IF k[j*4] dst[i+31:i] := v[i+95:i+64] ELSE dst[i+31:i] := src[i+31:i] FI IF k[j*4+1] dst[i+63:i+32] := v[i+127:i+96] ELSE dst[i+63:i+32] := src[i+63:i+32] FI IF k[j*4+2] dst[i+95:i+64] := v[i+31:i] ELSE dst[i+95:i+64] := src[i+95:i+64] FI IF k[j*4+3] dst[i+127:i+96] := v[i+63:i+32] ELSE dst[i+127:i+96] := src[i+127:i+96] FI ENDFOR _MM_SWIZ_REG_AAAA: FOR j := 0 to 3 i := j*128 IF k[j*4] dst[i+31:i] := v[i+31:i] ELSE dst[i+31:i] := src[i+31:i] FI IF k[j*4+1] dst[i+63:i+32] := v[i+31:i] ELSE dst[i+63:i+32] := src[i+63:i+32] FI IF k[j*4+2] dst[i+95:i+64] := v[i+31:i] ELSE dst[i+95:i+64] := src[i+95:i+64] FI IF k[j*4+3] dst[i+127:i+96] := v[i+31:i] ELSE dst[i+127:i+96] := src[i+127:i+96] FI ENDFOR _MM_SWIZ_REG_BBBB: FOR j := 0 to 3 i := j*128 IF k[j*4] dst[i+31:i] := v[i+63:i+32] ELSE dst[i+31:i] := src[i+31:i] FI IF k[j*4+1] dst[i+63:i+32] := v[i+63:i+32] ELSE dst[i+63:i+32] := src[i+63:i+32] FI IF k[j*4+2] dst[i+95:i+64] := v[i+63:i+32] ELSE dst[i+95:i+64] := src[i+95:i+64] FI IF k[j*4+3] dst[i+127:i+96] := v[i+63:i+32] ELSE dst[i+127:i+96] := src[i+127:i+96] FI ENDFOR _MM_SWIZ_REG_CCCC: FOR j := 0 to 3 i := j*128 IF k[j*4] dst[i+31:i] := v[i+95:i+64] ELSE dst[i+31:i] := src[i+31:i] FI IF k[j*4+1] dst[i+63:i+32] := v[i+95:i+64] ELSE dst[i+63:i+32] := src[i+63:i+32] FI IF k[j*4+2] dst[i+95:i+64] := v[i+95:i+64] ELSE 
dst[i+95:i+64] := src[i+95:i+64] FI IF k[j*4+3] dst[i+127:i+96] := v[i+95:i+64] ELSE dst[i+127:i+96] := src[i+127:i+96] FI ENDFOR _MM_SWIZ_REG_DDDD: FOR j := 0 to 3 i := j*128 IF k[j*4] dst[i+31:i] := v[i+127:i+96] ELSE dst[i+31:i] := src[i+31:i] FI IF k[j*4+1] dst[i+63:i+32] := v[i+127:i+96] ELSE dst[i+63:i+32] := src[i+63:i+32] FI IF k[j*4+2] dst[i+95:i+64] := v[i+127:i+96] ELSE dst[i+95:i+64] := src[i+95:i+64] FI IF k[j*4+3] dst[i+127:i+96] := v[i+127:i+96] ELSE dst[i+127:i+96] := src[i+127:i+96] FI ENDFOR _MM_SWIZ_REG_DACB: FOR j := 0 to 3 i := j*128 IF k[j*4] dst[i+31:i] := v[i+63:i+32] ELSE dst[i+31:i] := src[i+31:i] FI IF k[j*4+1] dst[i+63:i+32] := v[i+95:i+64] ELSE dst[i+63:i+32] := src[i+63:i+32] FI IF k[j*4+2] dst[i+95:i+64] := v[i+31:i] ELSE dst[i+95:i+64] := src[i+95:i+64] FI IF k[j*4+3] dst[i+127:i+96] := v[i+127:i+96] ELSE dst[i+127:i+96] := src[i+127:i+96] FI ENDFOR ESAC dst[MAX:512] := 0
Floating Point KNCNI Swizzle Performs a swizzle transformation of each of the two groups of packed 4x double-precision (64-bit) floating-point elements in "v" using swizzle parameter "s", storing the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). CASE s OF _MM_SWIZ_REG_NONE: dst[511:0] := v[511:0] _MM_SWIZ_REG_DCBA: dst[511:0] := v[511:0] _MM_SWIZ_REG_CDAB: FOR j := 0 to 3 i := j*128 IF k[j*2] dst[i+63:i] := v[i+127:i+64] ELSE dst[i+63:i] := src[i+63:i] FI IF k[j*2+1] dst[i+127:i+64] := v[i+63:i] ELSE dst[i+127:i+64] := src[i+127:i+64] FI ENDFOR _MM_SWIZ_REG_BADC: FOR j := 0 to 1 i := j*256 IF k[j*4] dst[i+63:i] := v[i+191:i+128] ELSE dst[i+63:i] := src[i+63:i] FI IF k[j*4+1] dst[i+127:i+64] := v[i+255:i+192] ELSE dst[i+127:i+64] := src[i+127:i+64] FI IF k[j*4+2] dst[i+191:i+128] := v[i+63:i] ELSE dst[i+191:i+128] := src[i+191:i+128] FI IF k[j*4+3] dst[i+255:i+192] := v[i+127:i+64] ELSE dst[i+255:i+192] := src[i+255:i+192] FI ENDFOR _MM_SWIZ_REG_AAAA: FOR j := 0 to 1 i := j*256 IF k[j*4] dst[i+63:i] := v[i+63:i] ELSE dst[i+63:i] := src[i+63:i] FI IF k[j*4+1] dst[i+127:i+64] := v[i+63:i] ELSE dst[i+127:i+64] := src[i+127:i+64] FI IF k[j*4+2] dst[i+191:i+128] := v[i+63:i] ELSE dst[i+191:i+128] := src[i+191:i+128] FI IF k[j*4+3] dst[i+255:i+192] := v[i+63:i] ELSE dst[i+255:i+192] := src[i+255:i+192] FI ENDFOR _MM_SWIZ_REG_BBBB: FOR j := 0 to 1 i := j*256 IF k[j*4] dst[i+63:i] := v[i+127:i+64] ELSE dst[i+63:i] := src[i+63:i] FI IF k[j*4+1] dst[i+127:i+64] := v[i+127:i+64] ELSE dst[i+127:i+64] := src[i+127:i+64] FI IF k[j*4+2] dst[i+191:i+128] := v[i+127:i+64] ELSE dst[i+191:i+128] := src[i+191:i+128] FI IF k[j*4+3] dst[i+255:i+192] := v[i+127:i+64] ELSE dst[i+255:i+192] := src[i+255:i+192] FI ENDFOR _MM_SWIZ_REG_CCCC: FOR j := 0 to 1 i := j*256 IF k[j*4] dst[i+63:i] := v[i+191:i+128] ELSE dst[i+63:i] := src[i+63:i] FI IF k[j*4+1] dst[i+127:i+64] := v[i+191:i+128] ELSE dst[i+127:i+64] := src[i+127:i+64]
FI IF k[j*4+2] dst[i+191:i+128] := v[i+191:i+128] ELSE dst[i+191:i+128] := src[i+191:i+128] FI IF k[j*4+3] dst[i+255:i+192] := v[i+191:i+128] ELSE dst[i+255:i+192] := src[i+255:i+192] FI ENDFOR _MM_SWIZ_REG_DDDD: FOR j := 0 to 1 i := j*256 IF k[j*4] dst[i+63:i] := v[i+255:i+192] ELSE dst[i+63:i] := src[i+63:i] FI IF k[j*4+1] dst[i+127:i+64] := v[i+255:i+192] ELSE dst[i+127:i+64] := src[i+127:i+64] FI IF k[j*4+2] dst[i+191:i+128] := v[i+255:i+192] ELSE dst[i+191:i+128] := src[i+191:i+128] FI IF k[j*4+3] dst[i+255:i+192] := v[i+255:i+192] ELSE dst[i+255:i+192] := src[i+255:i+192] FI ENDFOR _MM_SWIZ_REG_DACB: FOR j := 0 to 1 i := j*256 IF k[j*4] dst[i+63:i] := v[i+127:i+64] ELSE dst[i+63:i] := src[i+63:i] FI IF k[j*4+1] dst[i+127:i+64] := v[i+191:i+128] ELSE dst[i+127:i+64] := src[i+127:i+64] FI IF k[j*4+2] dst[i+191:i+128] := v[i+63:i] ELSE dst[i+191:i+128] := src[i+191:i+128] FI IF k[j*4+3] dst[i+255:i+192] := v[i+255:i+192] ELSE dst[i+255:i+192] := src[i+255:i+192] FI ENDFOR ESAC dst[MAX:512] := 0
Integer KNCNI Swizzle Performs a swizzle transformation of each of the four groups of packed 4x32-bit integer elements in "v" using swizzle parameter "s", storing the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). CASE s OF _MM_SWIZ_REG_NONE: dst[511:0] := v[511:0] _MM_SWIZ_REG_DCBA: dst[511:0] := v[511:0] _MM_SWIZ_REG_CDAB: FOR j := 0 to 7 i := j*64 IF k[j*2] dst[i+31:i] := v[i+63:i+32] ELSE dst[i+31:i] := src[i+31:i] FI IF k[j*2+1] dst[i+63:i+32] := v[i+31:i] ELSE dst[i+63:i+32] := src[i+63:i+32] FI ENDFOR _MM_SWIZ_REG_BADC: FOR j := 0 to 3 i := j*128 IF k[j*4] dst[i+31:i] := v[i+95:i+64] ELSE dst[i+31:i] := src[i+31:i] FI IF k[j*4+1] dst[i+63:i+32] := v[i+127:i+96] ELSE dst[i+63:i+32] := src[i+63:i+32] FI IF k[j*4+2] dst[i+95:i+64] := v[i+31:i] ELSE dst[i+95:i+64] := src[i+95:i+64] FI IF k[j*4+3] dst[i+127:i+96] := v[i+63:i+32] ELSE dst[i+127:i+96] := src[i+127:i+96] FI ENDFOR _MM_SWIZ_REG_AAAA: FOR j := 0 to 3 i := j*128 IF k[j*4] dst[i+31:i] := v[i+31:i] ELSE dst[i+31:i] := src[i+31:i] FI IF k[j*4+1] dst[i+63:i+32] := v[i+31:i] ELSE dst[i+63:i+32] := src[i+63:i+32] FI IF k[j*4+2] dst[i+95:i+64] := v[i+31:i] ELSE dst[i+95:i+64] := src[i+95:i+64] FI IF k[j*4+3] dst[i+127:i+96] := v[i+31:i] ELSE dst[i+127:i+96] := src[i+127:i+96] FI ENDFOR _MM_SWIZ_REG_BBBB: FOR j := 0 to 3 i := j*128 IF k[j*4] dst[i+31:i] := v[i+63:i+32] ELSE dst[i+31:i] := src[i+31:i] FI IF k[j*4+1] dst[i+63:i+32] := v[i+63:i+32] ELSE dst[i+63:i+32] := src[i+63:i+32] FI IF k[j*4+2] dst[i+95:i+64] := v[i+63:i+32] ELSE dst[i+95:i+64] := src[i+95:i+64] FI IF k[j*4+3] dst[i+127:i+96] := v[i+63:i+32] ELSE dst[i+127:i+96] := src[i+127:i+96] FI ENDFOR _MM_SWIZ_REG_CCCC: FOR j := 0 to 3 i := j*128 IF k[j*4] dst[i+31:i] := v[i+95:i+64] ELSE dst[i+31:i] := src[i+31:i] FI IF k[j*4+1] dst[i+63:i+32] := v[i+95:i+64] ELSE dst[i+63:i+32] := src[i+63:i+32] FI IF k[j*4+2] dst[i+95:i+64] := v[i+95:i+64] ELSE dst[i+95:i+64] := src[i+95:i+64] FI IF 
k[j*4+3] dst[i+127:i+96] := v[i+95:i+64] ELSE dst[i+127:i+96] := src[i+127:i+96] FI ENDFOR _MM_SWIZ_REG_DDDD: FOR j := 0 to 3 i := j*128 IF k[j*4] dst[i+31:i] := v[i+127:i+96] ELSE dst[i+31:i] := src[i+31:i] FI IF k[j*4+1] dst[i+63:i+32] := v[i+127:i+96] ELSE dst[i+63:i+32] := src[i+63:i+32] FI IF k[j*4+2] dst[i+95:i+64] := v[i+127:i+96] ELSE dst[i+95:i+64] := src[i+95:i+64] FI IF k[j*4+3] dst[i+127:i+96] := v[i+127:i+96] ELSE dst[i+127:i+96] := src[i+127:i+96] FI ENDFOR _MM_SWIZ_REG_DACB: FOR j := 0 to 3 i := j*128 IF k[j*4] dst[i+31:i] := v[i+63:i+32] ELSE dst[i+31:i] := src[i+31:i] FI IF k[j*4+1] dst[i+63:i+32] := v[i+95:i+64] ELSE dst[i+63:i+32] := src[i+63:i+32] FI IF k[j*4+2] dst[i+95:i+64] := v[i+31:i] ELSE dst[i+95:i+64] := src[i+95:i+64] FI IF k[j*4+3] dst[i+127:i+96] := v[i+127:i+96] ELSE dst[i+127:i+96] := src[i+127:i+96] FI ENDFOR ESAC dst[MAX:512] := 0
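The masked form adds one decision per element: bit j of "k" selects between the swizzled value and the pass-through value from "src". A scalar sketch under that reading follows. The function name `mask_swizzle_epi32` and the `pattern` parameter are hypothetical; `pattern[e]` gives the source slot feeding destination slot e within a group, so {2, 3, 0, 1} stands for the `_MM_SWIZ_REG_BADC` case.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of a write-masked 16 x 32-bit swizzle.
   pattern[e] is the source slot (0..3) read by destination slot e. */
void mask_swizzle_epi32(uint32_t dst[16], const uint32_t src[16], uint16_t k,
                        const uint32_t v[16], const int pattern[4])
{
    assert(dst != v); /* keep the source intact while dst is being written */
    for (int j = 0; j < 16; j++) {
        int group = j / 4;
        int slot = j % 4;
        if ((k >> j) & 1u)
            dst[j] = v[4 * group + pattern[slot]]; /* mask bit set: swizzle */
        else
            dst[j] = src[j];                       /* mask bit clear: copy src */
    }
}
```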
Integer KNCNI Swizzle Performs a swizzle transformation of each of the two groups of packed 4x64-bit integer elements in "v" using swizzle parameter "s", storing the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). CASE s OF _MM_SWIZ_REG_NONE: dst[511:0] := v[511:0] _MM_SWIZ_REG_DCBA: dst[511:0] := v[511:0] _MM_SWIZ_REG_CDAB: FOR j := 0 to 3 i := j*128 IF k[j*2] dst[i+63:i] := v[i+127:i+64] ELSE dst[i+63:i] := src[i+63:i] FI IF k[j*2+1] dst[i+127:i+64] := v[i+63:i] ELSE dst[i+127:i+64] := src[i+127:i+64] FI ENDFOR _MM_SWIZ_REG_BADC: FOR j := 0 to 1 i := j*256 IF k[j*4] dst[i+63:i] := v[i+191:i+128] ELSE dst[i+63:i] := src[i+63:i] FI IF k[j*4+1] dst[i+127:i+64] := v[i+255:i+192] ELSE dst[i+127:i+64] := src[i+127:i+64] FI IF k[j*4+2] dst[i+191:i+128] := v[i+63:i] ELSE dst[i+191:i+128] := src[i+191:i+128] FI IF k[j*4+3] dst[i+255:i+192] := v[i+127:i+64] ELSE dst[i+255:i+192] := src[i+255:i+192] FI ENDFOR _MM_SWIZ_REG_AAAA: FOR j := 0 to 1 i := j*256 IF k[j*4] dst[i+63:i] := v[i+63:i] ELSE dst[i+63:i] := src[i+63:i] FI IF k[j*4+1] dst[i+127:i+64] := v[i+63:i] ELSE dst[i+127:i+64] := src[i+127:i+64] FI IF k[j*4+2] dst[i+191:i+128] := v[i+63:i] ELSE dst[i+191:i+128] := src[i+191:i+128] FI IF k[j*4+3] dst[i+255:i+192] := v[i+63:i] ELSE dst[i+255:i+192] := src[i+255:i+192] FI ENDFOR _MM_SWIZ_REG_BBBB: FOR j := 0 to 1 i := j*256 IF k[j*4] dst[i+63:i] := v[i+127:i+64] ELSE dst[i+63:i] := src[i+63:i] FI IF k[j*4+1] dst[i+127:i+64] := v[i+127:i+64] ELSE dst[i+127:i+64] := src[i+127:i+64] FI IF k[j*4+2] dst[i+191:i+128] := v[i+127:i+64] ELSE dst[i+191:i+128] := src[i+191:i+128] FI IF k[j*4+3] dst[i+255:i+192] := v[i+127:i+64] ELSE dst[i+255:i+192] := src[i+255:i+192] FI ENDFOR _MM_SWIZ_REG_CCCC: FOR j := 0 to 1 i := j*256 IF k[j*4] dst[i+63:i] := v[i+191:i+128] ELSE dst[i+63:i] := src[i+63:i] FI IF k[j*4+1] dst[i+127:i+64] := v[i+191:i+128] ELSE dst[i+127:i+64] := src[i+127:i+64] FI IF k[j*4+2] dst[i+191:i+128] :=
v[i+191:i+128] ELSE dst[i+191:i+128] := src[i+191:i+128] FI IF k[j*4+3] dst[i+255:i+192] := v[i+191:i+128] ELSE dst[i+255:i+192] := src[i+255:i+192] FI ENDFOR _MM_SWIZ_REG_DDDD: FOR j := 0 to 1 i := j*256 IF k[j*4] dst[i+63:i] := v[i+255:i+192] ELSE dst[i+63:i] := src[i+63:i] FI IF k[j*4+1] dst[i+127:i+64] := v[i+255:i+192] ELSE dst[i+127:i+64] := src[i+127:i+64] FI IF k[j*4+2] dst[i+191:i+128] := v[i+255:i+192] ELSE dst[i+191:i+128] := src[i+191:i+128] FI IF k[j*4+3] dst[i+255:i+192] := v[i+255:i+192] ELSE dst[i+255:i+192] := src[i+255:i+192] FI ENDFOR _MM_SWIZ_REG_DACB: FOR j := 0 to 1 i := j*256 IF k[j*4] dst[i+63:i] := v[i+127:i+64] ELSE dst[i+63:i] := src[i+63:i] FI IF k[j*4+1] dst[i+127:i+64] := v[i+191:i+128] ELSE dst[i+127:i+64] := src[i+127:i+64] FI IF k[j*4+2] dst[i+191:i+128] := v[i+63:i] ELSE dst[i+191:i+128] := src[i+191:i+128] FI IF k[j*4+3] dst[i+255:i+192] := v[i+255:i+192] ELSE dst[i+255:i+192] := src[i+255:i+192] FI ENDFOR ESAC dst[MAX:512] := 0
Floating Point KNCNI Store Downconverts packed single-precision (32-bit) floating-point elements stored in "v" to a smaller type depending on "conv" and stores them in memory location "mt". "hint" indicates to the processor whether the data is non-temporal. addr := MEM[mt] FOR j := 0 to 15 i := j*32 CASE conv OF _MM_DOWNCONV_PS_NONE: addr[i+31:i] := v[i+31:i] _MM_DOWNCONV_PS_FLOAT16: n := j*16 addr[n+15:n] := Convert_FP32_To_FP16(v[i+31:i]) _MM_DOWNCONV_PS_UINT8: n := j*8 addr[n+7:n] := Convert_FP32_To_UInt8(v[i+31:i]) _MM_DOWNCONV_PS_SINT8: n := j*8 addr[n+7:n] := Convert_FP32_To_Int8(v[i+31:i]) _MM_DOWNCONV_PS_UINT16: n := j*16 addr[n+15:n] := Convert_FP32_To_UInt16(v[i+31:i]) _MM_DOWNCONV_PS_SINT16: n := j*16 addr[n+15:n] := Convert_FP32_To_Int16(v[i+31:i]) ESAC ENDFOR
Integer KNCNI Store Downconverts packed 32-bit integer elements stored in "v" to a smaller type depending on "conv" and stores them in memory location "mt". "hint" indicates to the processor whether the data is non-temporal. addr := MEM[mt] FOR j := 0 to 15 i := j*32 CASE conv OF _MM_DOWNCONV_EPI32_NONE: addr[i+31:i] := v[i+31:i] _MM_DOWNCONV_EPI32_UINT8: n := j*8 addr[n+7:n] := Int32ToUInt8(v[i+31:i]) _MM_DOWNCONV_EPI32_SINT8: n := j*8 addr[n+7:n] := Int32ToSInt8(v[i+31:i]) _MM_DOWNCONV_EPI32_UINT16: n := j*16 addr[n+15:n] := Int32ToUInt16(v[i+31:i]) _MM_DOWNCONV_EPI32_SINT16: n := j*16 addr[n+15:n] := Int32ToSInt16(v[i+31:i]) ESAC ENDFOR
Floating Point KNCNI Store Downconverts packed double-precision (64-bit) floating-point elements stored in "v" to a smaller type depending on "conv" and stores them in memory location "mt". "hint" indicates to the processor whether the data is non-temporal. addr := MEM[mt] FOR j := 0 to 7 i := j*64 CASE conv OF _MM_DOWNCONV_PD_NONE: addr[i+63:i] := v[i+63:i] ESAC ENDFOR
Integer KNCNI Store Downconverts packed 64-bit integer elements stored in "v" to a smaller type depending on "conv" and stores them in memory location "mt". "hint" indicates to the processor whether the data is non-temporal. addr := MEM[mt] FOR j := 0 to 7 i := j*64 CASE conv OF _MM_DOWNCONV_EPI64_NONE: addr[i+63:i] := v[i+63:i] ESAC ENDFOR
Floating Point KNCNI Store Downconverts packed single-precision (32-bit) floating-point elements stored in "v" to a smaller type depending on "conv" and stores them in memory location "mt" using writemask "k" (elements are not written to memory when the corresponding mask bit is not set). "hint" indicates to the processor whether the data is non-temporal. FOR j := 0 to 15 i := j*32 IF k[j] CASE conv OF _MM_DOWNCONV_PS_NONE: mt[i+31:i] := v[i+31:i] _MM_DOWNCONV_PS_FLOAT16: n := j*16 mt[n+15:n] := Convert_FP32_To_FP16(v[i+31:i]) _MM_DOWNCONV_PS_UINT8: n := j*8 mt[n+7:n] := Convert_FP32_To_UInt8(v[i+31:i]) _MM_DOWNCONV_PS_SINT8: n := j*8 mt[n+7:n] := Convert_FP32_To_Int8(v[i+31:i]) _MM_DOWNCONV_PS_UINT16: n := j*16 mt[n+15:n] := Convert_FP32_To_UInt16(v[i+31:i]) _MM_DOWNCONV_PS_SINT16: n := j*16 mt[n+15:n] := Convert_FP32_To_Int16(v[i+31:i]) ESAC FI ENDFOR
Floating Point KNCNI Store Downconverts packed double-precision (64-bit) floating-point elements stored in "v" to a smaller type depending on "conv" and stores them in memory location "mt" using writemask "k" (elements in "mt" are unaltered when the corresponding mask bit is not set). "hint" indicates to the processor whether the data is non-temporal. addr := MEM[mt] FOR j := 0 to 7 i := j*64 CASE conv OF _MM_DOWNCONV_PD_NONE: IF k[j] addr[i+63:i] := v[i+63:i] FI ESAC ENDFOR
Integer KNCNI Store Downconverts packed 32-bit integer elements stored in "v" to a smaller type depending on "conv" and stores them in memory location "mt" using writemask "k" (elements in "mt" are unaltered when the corresponding mask bit is not set). "hint" indicates to the processor whether the data is non-temporal. addr := MEM[mt] FOR j := 0 to 15 i := j*32 IF k[j] CASE conv OF _MM_DOWNCONV_EPI32_NONE: addr[i+31:i] := v[i+31:i] _MM_DOWNCONV_EPI32_UINT8: n := j*8 addr[n+7:n] := Int32ToUInt8(v[i+31:i]) _MM_DOWNCONV_EPI32_SINT8: n := j*8 addr[n+7:n] := Int32ToSInt8(v[i+31:i]) _MM_DOWNCONV_EPI32_UINT16: n := j*16 addr[n+15:n] := Int32ToUInt16(v[i+31:i]) _MM_DOWNCONV_EPI32_SINT16: n := j*16 addr[n+15:n] := Int32ToSInt16(v[i+31:i]) ESAC FI ENDFOR
Integer KNCNI Store Downconverts packed 64-bit integer elements stored in "v" to a smaller type depending on "conv" and stores them in memory location "mt" using writemask "k" (elements in "mt" are unaltered when the corresponding mask bit is not set). "hint" indicates to the processor whether the data is non-temporal. addr := MEM[mt] FOR j := 0 to 7 i := j*64 IF k[j] CASE conv OF _MM_DOWNCONV_EPI64_NONE: addr[i+63:i] := v[i+63:i] ESAC FI ENDFOR
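For the 64-bit integer store, the only conversion listed is `_MM_DOWNCONV_EPI64_NONE`, so the masked operation amounts to a conditional copy per element. A scalar sketch, assuming the mask is the 8-bit value "k" and ignoring the "hint" argument (which only advises the cache); the name `mask_store_epi64` is hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of the masked 8 x 64-bit store with no down-conversion. */
void mask_store_epi64(uint64_t mt[8], uint8_t k, const uint64_t v[8])
{
    assert(mt != 0 && v != 0);
    for (int j = 0; j < 8; j++) {
        if ((k >> j) & 1u)
            mt[j] = v[j]; /* mask bit set: write the element */
        /* mask bit clear: memory at mt[j] is left untouched */
    }
}
```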
Floating Point KNCNI Store Stores packed single-precision (32-bit) floating-point elements from "v" to memory address "mt" with a no-read hint to the processor. addr := MEM[mt] FOR j := 0 to 15 i := j*32 addr[i+31:i] := v[i+31:i] ENDFOR
Floating Point KNCNI Store Stores packed double-precision (64-bit) floating-point elements from "v" to memory address "mt" with a no-read hint to the processor. addr := MEM[mt] FOR j := 0 to 7 i := j*64 addr[i+63:i] := v[i+63:i] ENDFOR
Floating Point KNCNI Store Stores packed single-precision (32-bit) floating-point elements from "v" to memory address "mt" with a no-read hint and using a weakly-ordered memory consistency model (stores performed with this function are not globally ordered, and subsequent stores from the same thread can be observed before them). addr := MEM[mt] FOR j := 0 to 15 i := j*32 addr[i+31:i] := v[i+31:i] ENDFOR
Floating Point KNCNI Store Stores packed double-precision (64-bit) floating-point elements from "v" to memory address "mt" with a no-read hint and using a weakly-ordered memory consistency model (stores performed with this function are not globally ordered, and subsequent stores from the same thread can be observed before them). addr := MEM[mt] FOR j := 0 to 7 i := j*64 addr[i+63:i] := v[i+63:i] ENDFOR
Integer KNCNI Arithmetic Performs element-by-element addition of packed 32-bit integers in "v2" and "v3" and the corresponding bit in "k2", storing the result of the addition in "dst" and the result of the carry in "k2_res". FOR j := 0 to 15 i := j*32 k2_res[j] := Carry(v2[i+31:i] + v3[i+31:i] + k2[j]) dst[i+31:i] := v2[i+31:i] + v3[i+31:i] + k2[j] ENDFOR dst[MAX:512] := 0
Integer KNCNI Arithmetic Performs element-by-element addition of packed 32-bit integers in "v2" and "v3" and the corresponding bit in "k2", storing the result of the addition in "dst" and the result of the carry in "k2_res" using writemask "k1" (elements are copied from "v2" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k1[j] k2_res[j] := Carry(v2[i+31:i] + v3[i+31:i] + k2[j]) dst[i+31:i] := v2[i+31:i] + v3[i+31:i] + k2[j] ELSE dst[i+31:i] := v2[i+31:i] FI ENDFOR dst[MAX:512] := 0
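The unmasked add-with-carry can be modelled by widening each lane to 64 bits, so the carry-out is simply bit 32 of the wide sum. The sketch below is a scalar reading of the operation text, not the intrinsic; the name `adc_epi32` and the choice to return "k2_res" as the function result are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model: dst = v2 + v3 + carry-in bit; returns the carry-out mask. */
uint16_t adc_epi32(uint32_t dst[16], const uint32_t v2[16], uint16_t k2,
                   const uint32_t v3[16])
{
    uint16_t k2_res = 0;
    assert(dst && v2 && v3);
    for (int j = 0; j < 16; j++) {
        /* 64-bit arithmetic keeps the carry in bit 32 */
        uint64_t wide = (uint64_t)v2[j] + v3[j] + ((k2 >> j) & 1u);
        dst[j] = (uint32_t)wide;
        k2_res |= (uint16_t)((wide >> 32) << j);
    }
    return k2_res;
}
```

Feeding the returned mask into the next call as "k2" chains lanes of a lower word into lanes of a higher word, which is the usual reason to carry through a mask register.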
Floating Point KNCNI Arithmetic Performs element-by-element addition between packed double-precision (64-bit) floating-point elements in "v2" and "v3" and negates their sum, storing the results in "dst". FOR j := 0 to 7 i := j*64 dst[i+63:i] := -(v2[i+63:i] + v3[i+63:i]) ENDFOR dst[MAX:512] := 0
Floating Point KNCNI Arithmetic Performs element-by-element addition between packed double-precision (64-bit) floating-point elements in "v2" and "v3" and negates their sum, storing the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := -(v2[i+63:i] + v3[i+63:i]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point KNCNI Arithmetic Performs element-by-element addition between packed single-precision (32-bit) floating-point elements in "v2" and "v3" and negates their sum, storing the results in "dst". FOR j := 0 to 15 i := j*32 dst[i+31:i] := -(v2[i+31:i] + v3[i+31:i]) ENDFOR dst[MAX:512] := 0
Floating Point KNCNI Arithmetic Performs element-by-element addition between packed single-precision (32-bit) floating-point elements in "v2" and "v3" and negates their sum, storing the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := -(v2[i+31:i] + v3[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
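A scalar sketch of the masked negated add for single precision, following the operation text directly. The name `mask_addn_ps` is hypothetical, and the model uses ordinary C `float` arithmetic in the default rounding mode.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model: dst[j] = -(v2[j] + v3[j]) where k[j] is set, else src[j]. */
void mask_addn_ps(float dst[16], const float src[16], uint16_t k,
                  const float v2[16], const float v3[16])
{
    assert(dst && src && v2 && v3);
    for (int j = 0; j < 16; j++)
        dst[j] = ((k >> j) & 1u) ? -(v2[j] + v3[j]) : src[j];
}
```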
Floating Point KNCNI Arithmetic Performs element-by-element addition between packed double-precision (64-bit) floating-point elements in "v2" and "v3" and negates the sum, storing the result in "dst". [round_note] FOR j := 0 to 7 i := j*64 dst[i+63:i] := -(v2[i+63:i] + v3[i+63:i]) ENDFOR dst[MAX:512] := 0
Floating Point KNCNI Arithmetic Performs element-by-element addition between packed double-precision (64-bit) floating-point elements in "v2" and "v3" and negates the sum, storing the result in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [round_note] FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := -(v2[i+63:i] + v3[i+63:i]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point KNCNI Arithmetic Performs element-by-element addition between packed single-precision (32-bit) floating-point elements in "v2" and "v3" and negates the sum, storing the result in "dst". [round_note] FOR j := 0 to 15 i := j*32 dst[i+31:i] := -(v2[i+31:i] + v3[i+31:i]) ENDFOR dst[MAX:512] := 0
Floating Point KNCNI Arithmetic Performs element-by-element addition between packed single-precision (32-bit) floating-point elements in "v2" and "v3" and negates the sum, storing the result in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [round_note] FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := -(v2[i+31:i] + v3[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point KNCNI Arithmetic Performs element-by-element subtraction of packed double-precision (64-bit) floating-point elements in "v2" from "v3", storing the results in "dst". FOR j := 0 to 7 i := j*64 dst[i+63:i] := v3[i+63:i] - v2[i+63:i] ENDFOR dst[MAX:512] := 0
Floating Point KNCNI Arithmetic Performs element-by-element subtraction of packed double-precision (64-bit) floating-point elements in "v2" from "v3", storing the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := v3[i+63:i] - v2[i+63:i] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point KNCNI Arithmetic Performs element-by-element subtraction of packed single-precision (32-bit) floating-point elements in "v2" from "v3", storing the results in "dst". FOR j := 0 to 15 i := j*32 dst[i+31:i] := v3[i+31:i] - v2[i+31:i] ENDFOR dst[MAX:512] := 0
Floating Point KNCNI Arithmetic Performs element-by-element subtraction of packed single-precision (32-bit) floating-point elements in "v2" from "v3", storing the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := v3[i+31:i] - v2[i+31:i] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point KNCNI Arithmetic Performs element-by-element subtraction of packed double-precision (64-bit) floating-point elements in "v2" from "v3", storing the results in "dst". [round_note] FOR j := 0 to 7 i := j*64 dst[i+63:i] := v3[i+63:i] - v2[i+63:i] ENDFOR dst[MAX:512] := 0
Floating Point KNCNI Arithmetic Performs element-by-element subtraction of packed double-precision (64-bit) floating-point elements in "v2" from "v3", storing the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [round_note] FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := v3[i+63:i] - v2[i+63:i] ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point KNCNI Arithmetic Performs element-by-element subtraction of packed single-precision (32-bit) floating-point elements in "v2" from "v3", storing the results in "dst". [round_note] FOR j := 0 to 15 i := j*32 dst[i+31:i] := v3[i+31:i] - v2[i+31:i] ENDFOR dst[MAX:512] := 0
Floating Point KNCNI Arithmetic Performs element-by-element subtraction of packed single-precision (32-bit) floating-point elements in "v2" from "v3", storing the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [round_note] FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := v3[i+31:i] - v2[i+31:i] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Integer KNCNI Arithmetic Performs element-by-element subtraction of packed 32-bit integer elements in "v2" from "v3", storing the results in "dst". FOR j := 0 to 15 i := j*32 dst[i+31:i] := v3[i+31:i] - v2[i+31:i] ENDFOR dst[MAX:512] := 0
Integer KNCNI Arithmetic Performs element-by-element subtraction of packed 32-bit integer elements in "v2" from "v3", storing the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := v3[i+31:i] - v2[i+31:i] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
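The reversed subtraction records are easy to misread, because the first vector operand "v2" is the subtrahend and "v3" is the minuend. A scalar sketch of the masked 32-bit integer form makes the operand order explicit. The name `mask_subr_epi32` is hypothetical, and lanes are modelled as `uint32_t` so that the difference wraps modulo 2^32, which gives the same bit pattern as two's-complement signed subtraction.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model: dst[j] = v3[j] - v2[j] where k[j] is set, else src[j]. */
void mask_subr_epi32(uint32_t dst[16], const uint32_t src[16], uint16_t k,
                     const uint32_t v2[16], const uint32_t v3[16])
{
    assert(dst && src && v2 && v3);
    for (int j = 0; j < 16; j++) {
        if ((k >> j) & 1u)
            dst[j] = v3[j] - v2[j]; /* note the order: v3 minus v2 */
        else
            dst[j] = src[j];
    }
}
```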
Integer KNCNI Arithmetic Performs element-by-element addition of packed 32-bit integer elements in "v2" and "v3", storing the resultant carry in "k2_res" (carry flag) and the addition results in "dst". FOR j := 0 to 15 i := j*32 dst[i+31:i] := v2[i+31:i] + v3[i+31:i] k2_res[j] := Carry(v2[i+31:i] + v3[i+31:i]) ENDFOR dst[MAX:512] := 0
Integer KNCNI Arithmetic Performs element-by-element addition of packed 32-bit integer elements in "v2" and "v3", storing the resultant carry in "k2_res" (carry flag) and the addition results in "dst" using writemask "k" (elements are copied from "v2" and "k_old" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := v2[i+31:i] + v3[i+31:i] k2_res[j] := Carry(v2[i+31:i] + v3[i+31:i]) ELSE dst[i+31:i] := v2[i+31:i] k2_res[j] := k_old[j] FI ENDFOR dst[MAX:512] := 0
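A scalar sketch of the masked add-and-set-carry follows. It assumes that active lanes produce a carry bit exactly as in the unmasked form, and that inactive lanes keep both the "v2" element and the "k_old" bit. The name `mask_addsetc_epi32` and the choice to return "k2_res" are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of masked add with carry-out.
   Active lanes: dst = v2 + v3, carry bit from unsigned overflow.
   Inactive lanes: dst = v2, carry bit taken from k_old. */
uint16_t mask_addsetc_epi32(uint32_t dst[16], const uint32_t v2[16], uint16_t k,
                            uint16_t k_old, const uint32_t v3[16])
{
    uint16_t k2_res = 0;
    assert(dst && v2 && v3);
    for (int j = 0; j < 16; j++) {
        uint16_t bit = (uint16_t)(1u << j);
        uint32_t a = v2[j]; /* read first so dst may alias v2 */
        if (k & bit) {
            uint32_t sum = a + v3[j];
            dst[j] = sum;
            if (sum < a)    /* unsigned wraparound signals a carry-out */
                k2_res |= bit;
        } else {
            dst[j] = a;
            k2_res |= (uint16_t)(k_old & bit);
        }
    }
    return k2_res;
}
```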
Integer KNCNI Arithmetic Performs an element-by-element addition of packed 32-bit integer elements in "v2" and "v3", storing the results in "dst" and the sign of the sum in "sign" (sign flag). FOR j := 0 to 15 i := j*32 dst[i+31:i] := v2[i+31:i] + v3[i+31:i] sign[j] := v2[i+31:i] & v3[i+31:i] & 0x80000000 ENDFOR dst[MAX:512] := 0
Integer KNCNI Arithmetic Performs an element-by-element addition of packed 32-bit integer elements in "v2" and "v3", storing the results in "dst" and the sign of the sum in "sign" (sign flag). Results are stored using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := v2[i+31:i] + v3[i+31:i] sign[j] := v2[i+31:i] & v3[i+31:i] & 0x80000000 ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point KNCNI Arithmetic Performs an element-by-element addition of packed single-precision (32-bit) floating-point elements in "v2" and "v3", storing the results in "dst" and the sign of the sum in "sign" (sign flag). FOR j := 0 to 15 i := j*32 dst[i+31:i] := v2[i+31:i] + v3[i+31:i] sign[j] := v2[i+31:i] & v3[i+31:i] & 0x80000000 ENDFOR dst[MAX:512] := 0
Floating Point KNCNI Arithmetic Performs an element-by-element addition of packed single-precision (32-bit) floating-point elements in "v2" and "v3", storing the results in "dst" and the sign of the sum in "sign" (sign flag). Results are stored using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := v2[i+31:i] + v3[i+31:i] sign[j] := v2[i+31:i] & v3[i+31:i] & 0x80000000 ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point KNCNI Arithmetic Performs an element-by-element addition of packed single-precision (32-bit) floating-point elements in "v2" and "v3", storing the results in "dst" and the sign of the sum in "sign" (sign flag). [round_note] FOR j := 0 to 15 i := j*32 dst[i+31:i] := v2[i+31:i] + v3[i+31:i] sign[j] := v2[i+31:i] & v3[i+31:i] & 0x80000000 ENDFOR dst[MAX:512] := 0
Floating Point KNCNI Arithmetic Performs an element-by-element addition of packed single-precision (32-bit) floating-point elements in "v2" and "v3", storing the results in "dst" and the sign of the sum in "sign" (sign flag). Results are stored using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [round_note] FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := v2[i+31:i] + v3[i+31:i] sign[j] := v2[i+31:i] & v3[i+31:i] & 0x80000000 ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Integer KNCNI Arithmetic Performs element-by-element subtraction of packed 32-bit integer elements in "v3" from "v2", storing the results in "dst" and the nth borrow bit in the nth position of "borrow" (borrow flag). FOR j := 0 to 15 i := j*32 dst[i+31:i] := v2[i+31:i] - v3[i+31:i] borrow[j] := Borrow(v2[i+31:i] - v3[i+31:i]) ENDFOR dst[MAX:512] := 0
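For unsigned 32-bit lanes, a borrow occurs exactly when the minuend is smaller than the subtrahend. The scalar sketch below models the unmasked subtract-and-set-borrow on that basis; the name `subsetb_epi32` and returning "borrow" as the function result are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model: dst = v2 - v3 per lane; returns the per-lane borrow mask. */
uint16_t subsetb_epi32(uint32_t dst[16], const uint32_t v2[16],
                       const uint32_t v3[16])
{
    uint16_t borrow = 0;
    assert(dst && v2 && v3);
    for (int j = 0; j < 16; j++) {
        uint32_t a = v2[j], b = v3[j]; /* read first so dst may alias inputs */
        dst[j] = a - b;                /* wraps modulo 2^32 */
        if (a < b)
            borrow |= (uint16_t)(1u << j);
    }
    return borrow;
}
```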
Integer KNCNI Arithmetic Performs element-by-element subtraction of packed 32-bit integer elements in "v3" from "v2", storing the results in "dst" and the nth borrow bit in the nth position of "borrow" (borrow flag). Results are stored using writemask "k" (elements are copied from "v2" and "k_old" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := v2[i+31:i] - v3[i+31:i] borrow[j] := Borrow(v2[i+31:i] - v3[i+31:i]) ELSE dst[i+31:i] := v2[i+31:i] borrow[j] := k_old[j] FI ENDFOR dst[MAX:512] := 0
Integer KNCNI Arithmetic Performs element-by-element subtraction of packed 32-bit integer elements in "v2" from "v3", storing the results in "dst" and "v2". The borrowed value from the subtraction difference for the nth element is written to the nth bit of "borrow" (borrow flag). FOR j := 0 to 15 i := j*32 dst[i+31:i] := v3[i+31:i] - v2[i+31:i] borrow[j] := Borrow(v3[i+31:i] - v2[i+31:i]) ENDFOR dst[MAX:512] := 0
Integer KNCNI Arithmetic Performs element-by-element subtraction of packed 32-bit integer elements in "v2" from "v3", storing the results in "dst" and "v2". The borrowed value from the subtraction difference for the nth element is written to the nth bit of "borrow" (borrow flag). Results are written using writemask "k" (borrow bits are copied from "k_old" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] diff := v3[i+31:i] - v2[i+31:i] borrow[j] := Borrow(v3[i+31:i] - v2[i+31:i]) dst[i+31:i] := diff v2[i+31:i] := diff ELSE borrow[j] := k_old[j] FI ENDFOR dst[MAX:512] := 0
Integer KNCNI Arithmetic Performs element-by-element three-input subtraction of packed 32-bit integer elements of "v3" as well as the corresponding bit from "k" from "v2". The borrowed value from the subtraction difference for the nth element is written to the nth bit of "borrow" (borrow flag). Results are stored in "dst". FOR j := 0 to 15 i := j*32 dst[i+31:i] := v2[i+31:i] - v3[i+31:i] - k[j] borrow[j] := Borrow(v2[i+31:i] - v3[i+31:i] - k[j]) ENDFOR dst[MAX:512] := 0
Integer KNCNI Arithmetic Performs element-by-element three-input subtraction of packed 32-bit integer elements of "v3" as well as the corresponding bit from "k2" from "v2". The borrowed value from the subtraction difference for the nth element is written to the nth bit of "borrow" (borrow flag). Results are stored in "dst" using writemask "k1" (elements are copied from "v2" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k1[j] dst[i+31:i] := v2[i+31:i] - v3[i+31:i] - k2[j] borrow[j] := Borrow(v2[i+31:i] - v3[i+31:i] - k2[j]) ELSE dst[i+31:i] := v2[i+31:i] FI ENDFOR dst[MAX:512] := 0
Integer KNCNI Arithmetic Performs element-by-element three-input subtraction of packed 32-bit integer elements of "v2" as well as the corresponding bit from "k" from "v3". The borrowed value from the subtraction difference for the nth element is written to the nth bit of "borrow" (borrow flag). Results are stored in "dst". FOR j := 0 to 15 i := j*32 dst[i+31:i] := v3[i+31:i] - v2[i+31:i] - k[j] borrow[j] := Borrow(v3[i+31:i] - v2[i+31:i] - k[j]) ENDFOR dst[MAX:512] := 0
Integer KNCNI Arithmetic Performs element-by-element three-input subtraction of packed 32-bit integer elements of "v2" as well as the corresponding bit from "k2" from "v3". The borrowed value from the subtraction difference for the nth element is written to the nth bit of "borrow" (borrow flag). Results are stored in "dst" using writemask "k1" (elements are copied from "v2" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k1[j] dst[i+31:i] := v3[i+31:i] - v2[i+31:i] - k2[j] borrow[j] := Borrow(v3[i+31:i] - v2[i+31:i] - k2[j]) ELSE dst[i+31:i] := v2[i+31:i] FI ENDFOR dst[MAX:512] := 0
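A scalar sketch of the unmasked reversed subtract-with-borrow, assuming the borrow flag is derived from the same difference that is written to "dst" (that is, "v3" minus "v2" minus the borrow-in bit). Widening to 64 bits lets the subtrahend plus borrow-in be compared against the minuend without overflow. The name `sbbr_epi32` and the returned mask are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model: dst = v3 - v2 - borrow-in bit; returns the borrow-out mask. */
uint16_t sbbr_epi32(uint32_t dst[16], const uint32_t v2[16], uint16_t k,
                    const uint32_t v3[16])
{
    uint16_t borrow = 0;
    assert(dst && v2 && v3);
    for (int j = 0; j < 16; j++) {
        uint64_t subtrahend = (uint64_t)v2[j] + ((k >> j) & 1u);
        uint64_t minuend = v3[j];
        dst[j] = (uint32_t)(minuend - subtrahend); /* low 32 bits of the difference */
        if (minuend < subtrahend)                  /* needs a borrow from above */
            borrow |= (uint16_t)(1u << j);
    }
    return borrow;
}
```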
Floating Point KNCNI Convert Performs element-by-element conversion of packed double-precision (64-bit) floating-point elements in "v2" to packed single-precision (32-bit) floating-point elements, storing the results in "dst". Results are written to the lower half of "dst", and the upper half locations are set to '0'. [round_note] FOR j := 0 to 7 i := j*64 k := j*32 dst[k+31:k] := Convert_FP64_To_FP32(v2[i+63:i]) ENDFOR dst[MAX:512] := 0
Floating Point KNCNI Convert Performs element-by-element conversion of packed double-precision (64-bit) floating-point elements in "v2" to packed single-precision (32-bit) floating-point elements, storing the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). Results are written to the lower half of "dst", and the upper half locations are set to '0'. [round_note] FOR j := 0 to 7 i := j*64 l := j*32 IF k[j] dst[l+31:l] := Convert_FP64_To_FP32(v2[i+63:i]) ELSE dst[l+31:l] := src[l+31:l] FI ENDFOR dst[MAX:512] := 0
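A minimal scalar sketch of the write-masked double-to-single conversion above. The name `mask_cvtpd_pslo` is hypothetical, lanes are modeled as Python floats, and rounding is assumed to be round-to-nearest-even (what `struct` packing gives by default), whereas the entry allows the rounding mode to be selected.

```python
import struct

def to_fp32(x):
    """Round a Python float (FP64) to the nearest FP32 value.

    Note: struct raises OverflowError for finite values outside the FP32 range.
    """
    return struct.unpack("<f", struct.pack("<f", x))[0]

def mask_cvtpd_pslo(src, k, v2):
    """src: 16 FP32 lanes, k: 8-bit writemask, v2: 8 FP64 lanes.

    Returns 16 FP32 lanes: the low 8 hold converted or pass-through
    values, the high 8 are zero.
    """
    dst = [0.0] * 16
    for j in range(8):
        if (k >> j) & 1:
            dst[j] = to_fp32(v2[j])   # narrow FP64 to FP32
        else:
            dst[j] = src[j]           # masked-off lane is copied from src
    return dst
```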
Floating Point Integer KNCNI Convert Performs element-by-element conversion of packed double-precision (64-bit) floating-point elements in "v2" to packed 32-bit unsigned integer elements, storing the results in "dst". Results are written to the lower half of "dst", and the upper half locations are set to '0'. [round_note] FOR j := 0 to 7 i := j*64 k := j*32 dst[k+31:k] := Convert_FP64_To_Int32(v2[i+63:i]) ENDFOR dst[MAX:512] := 0
Floating Point Integer KNCNI Convert Performs element-by-element conversion of packed double-precision (64-bit) floating-point elements in "v2" to packed 32-bit unsigned integer elements, storing the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). Results are written to the lower half of "dst", and the upper half locations are set to '0'. [round_note] FOR j := 0 to 7 i := j*64 l := j*32 IF k[j] dst[l+31:l] := Convert_FP64_To_Int32(v2[i+63:i]) ELSE dst[l+31:l] := src[l+31:l] FI ENDFOR dst[MAX:512] := 0
Floating Point Integer KNCNI Convert Performs element-by-element conversion of packed single-precision (32-bit) floating-point elements in "v2" to packed 32-bit integer elements and performs an optional exponent adjust using "expadj", storing the results in "dst". [round_note] FOR j := 0 to 15 i := j*32 dst[i+31:i] := v2[i+31:i] CASE expadj OF _MM_EXPADJ_NONE: dst[i+31:i] := dst[i+31:i] * (2 << 0) _MM_EXPADJ_4: dst[i+31:i] := dst[i+31:i] * (2 << 4) _MM_EXPADJ_5: dst[i+31:i] := dst[i+31:i] * (2 << 5) _MM_EXPADJ_8: dst[i+31:i] := dst[i+31:i] * (2 << 8) _MM_EXPADJ_16: dst[i+31:i] := dst[i+31:i] * (2 << 16) _MM_EXPADJ_24: dst[i+31:i] := dst[i+31:i] * (2 << 24) _MM_EXPADJ_31: dst[i+31:i] := dst[i+31:i] * (2 << 31) _MM_EXPADJ_32: dst[i+31:i] := dst[i+31:i] * (2 << 32) ESAC dst[i+31:i] := Float32ToInt32(dst[i+31:i]) ENDFOR dst[MAX:512] := 0
Floating Point Integer KNCNI Convert Performs element-by-element conversion of packed single-precision (32-bit) floating-point elements in "v2" to packed 32-bit unsigned integer elements and performs an optional exponent adjust using "expadj", storing the results in "dst". [round_note] FOR j := 0 to 15 i := j*32 dst[i+31:i] := v2[i+31:i] CASE expadj OF _MM_EXPADJ_NONE: dst[i+31:i] := dst[i+31:i] * (2 << 0) _MM_EXPADJ_4: dst[i+31:i] := dst[i+31:i] * (2 << 4) _MM_EXPADJ_5: dst[i+31:i] := dst[i+31:i] * (2 << 5) _MM_EXPADJ_8: dst[i+31:i] := dst[i+31:i] * (2 << 8) _MM_EXPADJ_16: dst[i+31:i] := dst[i+31:i] * (2 << 16) _MM_EXPADJ_24: dst[i+31:i] := dst[i+31:i] * (2 << 24) _MM_EXPADJ_31: dst[i+31:i] := dst[i+31:i] * (2 << 31) _MM_EXPADJ_32: dst[i+31:i] := dst[i+31:i] * (2 << 32) ESAC dst[i+31:i] := Float32ToUInt32(dst[i+31:i]) ENDFOR dst[MAX:512] := 0
Floating Point Integer KNCNI Convert Performs element-by-element conversion of packed 32-bit unsigned integer elements in "v2" to packed single-precision (32-bit) floating-point elements and performs an optional exponent adjust using "expadj", storing the results in "dst". [round_note] FOR j := 0 to 15 i := j*32 dst[i+31:i] := UInt32ToFloat32(v2[i+31:i]) CASE expadj OF _MM_EXPADJ_NONE: dst[i+31:i] := dst[i+31:i] * (2 << 0) _MM_EXPADJ_4: dst[i+31:i] := dst[i+31:i] * (2 << 4) _MM_EXPADJ_5: dst[i+31:i] := dst[i+31:i] * (2 << 5) _MM_EXPADJ_8: dst[i+31:i] := dst[i+31:i] * (2 << 8) _MM_EXPADJ_16: dst[i+31:i] := dst[i+31:i] * (2 << 16) _MM_EXPADJ_24: dst[i+31:i] := dst[i+31:i] * (2 << 24) _MM_EXPADJ_31: dst[i+31:i] := dst[i+31:i] * (2 << 31) _MM_EXPADJ_32: dst[i+31:i] := dst[i+31:i] * (2 << 32) ESAC ENDFOR dst[MAX:512] := 0
Floating Point Integer KNCNI Convert Performs element-by-element conversion of packed 32-bit unsigned integer elements in "v2" to packed single-precision (32-bit) floating-point elements and performs an optional exponent adjust using "expadj", storing the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [round_note] FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := UInt32ToFloat32(v2[i+31:i]) CASE expadj OF _MM_EXPADJ_NONE: dst[i+31:i] := dst[i+31:i] * (2 << 0) _MM_EXPADJ_4: dst[i+31:i] := dst[i+31:i] * (2 << 4) _MM_EXPADJ_5: dst[i+31:i] := dst[i+31:i] * (2 << 5) _MM_EXPADJ_8: dst[i+31:i] := dst[i+31:i] * (2 << 8) _MM_EXPADJ_16: dst[i+31:i] := dst[i+31:i] * (2 << 16) _MM_EXPADJ_24: dst[i+31:i] := dst[i+31:i] * (2 << 24) _MM_EXPADJ_31: dst[i+31:i] := dst[i+31:i] * (2 << 31) _MM_EXPADJ_32: dst[i+31:i] := dst[i+31:i] * (2 << 32) ESAC ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point KNCNI Elementary Math Functions Approximates the base-2 exponent of the packed single-precision (32-bit) floating-point elements in "v2" with eight bits for sign and magnitude and 24 bits for the fractional part. Results are stored in "dst". FOR j := 0 to 15 i := j*32 dst[i+31:i] := exp223(v2[i+31:i]) ENDFOR dst[MAX:512] := 0
Floating Point KNCNI Elementary Math Functions Approximates the base-2 exponent of the packed single-precision (32-bit) floating-point elements in "v2" with eight bits for sign and magnitude and 24 bits for the fractional part. Results are stored in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := exp223(v2[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point KNCNI Miscellaneous Fixes up NaN's from packed double-precision (64-bit) floating-point elements in "v1" and "v2", storing the results in "dst" and storing the quietized NaN's from "v1" in "v3". FOR j := 0 to 7 i := j*64 dst[i+63:i] := FixupNaNs(v1[i+63:i], v2[i+63:i]) v3[i+63:i] := QuietizeNaNs(v1[i+63:i]) ENDFOR dst[MAX:512] := 0
Floating Point KNCNI Miscellaneous Fixes up NaN's from packed double-precision (64-bit) floating-point elements in "v1" and "v2", storing the results in "dst" using writemask "k" (only elements whose corresponding mask bit is set are used in the computation). Quietized NaN's from "v1" are stored in "v3". FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := FixupNaNs(v1[i+63:i], v2[i+63:i]) v3[i+63:i] := QuietizeNaNs(v1[i+63:i]) FI ENDFOR dst[MAX:512] := 0
Floating Point KNCNI Miscellaneous Fixes up NaN's from packed single-precision (32-bit) floating-point elements in "v1" and "v2", storing the results in "dst" and storing the quietized NaN's from "v1" in "v3". FOR j := 0 to 15 i := j*32 dst[i+31:i] := FixupNaNs(v1[i+31:i], v2[i+31:i]) v3[i+31:i] := QuietizeNaNs(v1[i+31:i]) ENDFOR dst[MAX:512] := 0
Floating Point KNCNI Miscellaneous Fixes up NaN's from packed single-precision (32-bit) floating-point elements in "v1" and "v2", storing the results in "dst" using writemask "k" (only elements whose corresponding mask bit is set are used in the computation). Quietized NaN's from "v1" are stored in "v3". FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := FixupNaNs(v1[i+31:i], v2[i+31:i]) v3[i+31:i] := QuietizeNaNs(v1[i+31:i]) FI ENDFOR dst[MAX:512] := 0
Integer KNCNI Load Loads the high-64-byte-aligned portion of the byte/word/doubleword stream starting at element-aligned address mt-64, up-converted depending on the value of "conv", and expanded into packed 32-bit integers in "dst". The initial values of "dst" are copied from "src". Only those converted doublewords that occur at or after the first 64-byte-aligned address following (mt-64) are loaded. Elements in the resulting vector that do not map to those doublewords are taken from "src". "hint" indicates to the processor whether the loaded data is non-temporal. DEFINE UPCONVERT(addr, offset, convertTo) { CASE conv OF _MM_UPCONV_EPI32_NONE: RETURN MEM[addr + 4*offset] _MM_UPCONV_EPI32_UINT8: RETURN ZeroExtend32(MEM[addr + offset]) _MM_UPCONV_EPI32_SINT8: RETURN SignExtend32(MEM[addr + offset]) _MM_UPCONV_EPI32_UINT16: RETURN ZeroExtend32(MEM[addr + 2*offset]) _MM_UPCONV_EPI32_SINT16: RETURN SignExtend32(MEM[addr + 2*offset]) ESAC } DEFINE UPCONVERTSIZE(convertTo) { CASE conv OF _MM_UPCONV_EPI32_NONE: RETURN 4 _MM_UPCONV_EPI32_UINT8: RETURN 1 _MM_UPCONV_EPI32_SINT8: RETURN 1 _MM_UPCONV_EPI32_UINT16: RETURN 2 _MM_UPCONV_EPI32_SINT16: RETURN 2 ESAC } dst[511:0] := src[511:0] loadOffset := 0 foundNext64BytesBoundary := false upSize := UPCONVERTSIZE(conv) addr := mt-64 FOR j := 0 to 15 IF foundNext64BytesBoundary == false IF (addr + (loadOffset + 1)*upSize) % 64 == 0 foundNext64BytesBoundary := true FI ELSE i := j*32 dst[i+31:i] := UPCONVERT(addr, loadOffset, conv) FI loadOffset := loadOffset + 1 ENDFOR dst[MAX:512] := 0
Integer KNCNI Load Loads the high-64-byte-aligned portion of the byte/word/doubleword stream starting at element-aligned address mt-64, up-converted depending on the value of "conv", and expanded into packed 32-bit integers in "dst". The initial values of "dst" are copied from "src". Only those converted doublewords that occur at or after the first 64-byte-aligned address following (mt-64) are loaded. Elements in the resulting vector that do not map to those doublewords are taken from "src". "hint" indicates to the processor whether the loaded data is non-temporal. Elements are copied to "dst" according to element selector "k" (elements are skipped when the corresponding mask bit is not set). DEFINE UPCONVERT(addr, offset, convertTo) { CASE conv OF _MM_UPCONV_EPI32_NONE: RETURN MEM[addr + 4*offset] _MM_UPCONV_EPI32_UINT8: RETURN ZeroExtend32(MEM[addr + offset]) _MM_UPCONV_EPI32_SINT8: RETURN SignExtend32(MEM[addr + offset]) _MM_UPCONV_EPI32_UINT16: RETURN ZeroExtend32(MEM[addr + 2*offset]) _MM_UPCONV_EPI32_SINT16: RETURN SignExtend32(MEM[addr + 2*offset]) ESAC } DEFINE UPCONVERTSIZE(convertTo) { CASE conv OF _MM_UPCONV_EPI32_NONE: RETURN 4 _MM_UPCONV_EPI32_UINT8: RETURN 1 _MM_UPCONV_EPI32_SINT8: RETURN 1 _MM_UPCONV_EPI32_UINT16: RETURN 2 _MM_UPCONV_EPI32_SINT16: RETURN 2 ESAC } dst[511:0] := src[511:0] loadOffset := 0 foundNext64BytesBoundary := false upSize := UPCONVERTSIZE(conv) addr := mt-64 FOR j := 0 to 15 IF k[j] IF foundNext64BytesBoundary == false IF (addr + (loadOffset + 1)*upSize) % 64 == 0 foundNext64BytesBoundary := true FI ELSE i := j*32 dst[i+31:i] := UPCONVERT(addr, loadOffset, conv) FI loadOffset := loadOffset + 1 FI ENDFOR dst[MAX:512] := 0
Integer KNCNI Load Loads the low-64-byte-aligned portion of the byte/word/doubleword stream starting at element-aligned address mt, up-converted depending on the value of "conv", and expanded into packed 32-bit integers in "dst". The initial values of "dst" are copied from "src". Only those converted doublewords that occur before first 64-byte-aligned address following "mt" are loaded. Elements in the resulting vector that do not map to those doublewords are taken from "src". "hint" indicates to the processor whether the loaded data is non-temporal. DEFINE UPCONVERT(addr, offset, convertTo) { CASE conv OF _MM_UPCONV_EPI32_NONE: RETURN MEM[addr + 4*offset] _MM_UPCONV_EPI32_UINT8: RETURN ZeroExtend32(MEM[addr + offset]) _MM_UPCONV_EPI32_SINT8: RETURN SignExtend32(MEM[addr + offset]) _MM_UPCONV_EPI32_UINT16: RETURN ZeroExtend32(MEM[addr + 2*offset]) _MM_UPCONV_EPI32_SINT16: RETURN SignExtend32(MEM[addr + 2*offset]) ESAC } DEFINE UPCONVERTSIZE(convertTo) { CASE conv OF _MM_UPCONV_EPI32_NONE: RETURN 4 _MM_UPCONV_EPI32_UINT8: RETURN 1 _MM_UPCONV_EPI32_SINT8: RETURN 1 _MM_UPCONV_EPI32_UINT16: RETURN 2 _MM_UPCONV_EPI32_SINT16: RETURN 2 ESAC } dst[511:0] := src[511:0] loadOffset := 0 upSize := UPCONVERTSIZE(conv) addr := mt FOR j := 0 to 15 i := j*32 dst[i+31:i] := UPCONVERT(addr, loadOffset, conv) loadOffset := loadOffset + 1 IF (mt + loadOffset * upSize) % 64 == 0 BREAK FI ENDFOR dst[MAX:512] := 0
Integer KNCNI Load Loads the low-64-byte-aligned portion of the byte/word/doubleword stream starting at element-aligned address mt, up-converted depending on the value of "conv", and expanded into packed 32-bit integers in "dst". The initial values of "dst" are copied from "src". Only those converted doublewords that occur before first 64-byte-aligned address following "mt" are loaded. Elements in the resulting vector that do not map to those doublewords are taken from "src". "hint" indicates to the processor whether the loaded data is non-temporal. Elements are copied to "dst" according to element selector "k" (elements are skipped when the corresponding mask bit is not set). DEFINE UPCONVERT(addr, offset, convertTo) { CASE conv OF _MM_UPCONV_EPI32_NONE: RETURN MEM[addr + 4*offset] _MM_UPCONV_EPI32_UINT8: RETURN ZeroExtend32(MEM[addr + offset]) _MM_UPCONV_EPI32_SINT8: RETURN SignExtend32(MEM[addr + offset]) _MM_UPCONV_EPI32_UINT16: RETURN ZeroExtend32(MEM[addr + 2*offset]) _MM_UPCONV_EPI32_SINT16: RETURN SignExtend32(MEM[addr + 2*offset]) ESAC } DEFINE UPCONVERTSIZE(convertTo) { CASE conv OF _MM_UPCONV_EPI32_NONE: RETURN 4 _MM_UPCONV_EPI32_UINT8: RETURN 1 _MM_UPCONV_EPI32_SINT8: RETURN 1 _MM_UPCONV_EPI32_UINT16: RETURN 2 _MM_UPCONV_EPI32_SINT16: RETURN 2 ESAC } dst[511:0] := src[511:0] loadOffset := 0 upSize := UPCONVERTSIZE(conv) addr := mt FOR j := 0 to 15 IF k[j] i := j*32 dst[i+31:i] := UPCONVERT(addr, loadOffset, conv) loadOffset := loadOffset + 1 IF (mt + loadOffset * upSize) % 64 == 0 BREAK FI FI ENDFOR dst[MAX:512] := 0
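The low and high unpack loads above are typically used as a pair to assemble one unaligned 64-byte load. The following scalar sketch models both for the no-conversion (4-byte element) case only. The function names, the `bytes` buffer standing in for memory, and the little-endian layout are illustrative assumptions. The boundary test in the high variant is read as `(addr + (loadOffset + 1)*upSize) % 64 == 0`.

```python
import struct

def load_u32(mem, addr):
    """Read one little-endian 32-bit word from the byte buffer."""
    return struct.unpack_from("<I", mem, addr)[0]

def loadunpacklo_epi32(src, mem, mt, k=0xFFFF):
    """Fill selected lanes from mt up to (not including) the next 64-byte boundary."""
    dst = list(src)
    offset = 0
    for j in range(16):
        if (k >> j) & 1:
            dst[j] = load_u32(mem, mt + 4 * offset)
            offset += 1
            if (mt + offset * 4) % 64 == 0:
                break
    return dst

def loadunpackhi_epi32(src, mem, mt, k=0xFFFF):
    """Fill selected lanes that map at or after the first boundary following mt-64."""
    dst = list(src)
    addr = mt - 64
    offset = 0
    found = False
    for j in range(16):
        if (k >> j) & 1:
            if not found:
                # Elements before the boundary are skipped (the low load covers them).
                if (addr + (offset + 1) * 4) % 64 == 0:
                    found = True
            else:
                dst[j] = load_u32(mem, addr + 4 * offset)
            offset += 1
    return dst

def unaligned_load_epi32(mem, p):
    """Combine both halves: low part from p, high part from p + 64."""
    v = loadunpacklo_epi32([0] * 16, mem, p)
    return loadunpackhi_epi32(v, mem, p + 64)
```

With a buffer whose 32-bit words hold their own index, `unaligned_load_epi32(mem, 72)` gathers words 18 through 33: the low call fills lanes 0 to 13 and the high call fills lanes 14 and 15.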
Integer KNCNI Load Loads the high-64-byte-aligned portion of the quadword stream starting at element-aligned address mt-64, up-converted depending on the value of "conv", and expanded into packed 64-bit integers in "dst". The initial values of "dst" are copied from "src". Only those converted quadwords that occur at or after the first 64-byte-aligned address following (mt-64) are loaded. Elements in the resulting vector that do not map to those quadwords are taken from "src". "hint" indicates to the processor whether the loaded data is non-temporal. DEFINE UPCONVERT(addr, offset, convertTo) { CASE conv OF _MM_UPCONV_EPI64_NONE: RETURN MEM[addr + 8*offset] ESAC } DEFINE UPCONVERTSIZE(convertTo) { CASE conv OF _MM_UPCONV_EPI64_NONE: RETURN 8 ESAC } dst[511:0] := src[511:0] loadOffset := 0 foundNext64BytesBoundary := false upSize := UPCONVERTSIZE(conv) addr := mt-64 FOR j := 0 to 7 IF foundNext64BytesBoundary == false IF (addr + (loadOffset + 1)*upSize) % 64 == 0 foundNext64BytesBoundary := true FI ELSE i := j*64 dst[i+63:i] := UPCONVERT(addr, loadOffset, conv) FI loadOffset := loadOffset + 1 ENDFOR dst[MAX:512] := 0
Integer KNCNI Load Loads the high-64-byte-aligned portion of the quadword stream starting at element-aligned address mt-64, up-converted depending on the value of "conv", and expanded into packed 64-bit integers in "dst". The initial values of "dst" are copied from "src". Only those converted quadwords that occur at or after the first 64-byte-aligned address following (mt-64) are loaded. Elements in the resulting vector that do not map to those quadwords are taken from "src". "hint" indicates to the processor whether the loaded data is non-temporal. Elements are copied to "dst" according to element selector "k" (elements are skipped when the corresponding mask bit is not set). DEFINE UPCONVERT(addr, offset, convertTo) { CASE conv OF _MM_UPCONV_EPI64_NONE: RETURN MEM[addr + 8*offset] ESAC } DEFINE UPCONVERTSIZE(convertTo) { CASE conv OF _MM_UPCONV_EPI64_NONE: RETURN 8 ESAC } dst[511:0] := src[511:0] loadOffset := 0 foundNext64BytesBoundary := false upSize := UPCONVERTSIZE(conv) addr := mt-64 FOR j := 0 to 7 IF k[j] IF foundNext64BytesBoundary == false IF (addr + (loadOffset + 1)*upSize) % 64 == 0 foundNext64BytesBoundary := true FI ELSE i := j*64 dst[i+63:i] := UPCONVERT(addr, loadOffset, conv) FI loadOffset := loadOffset + 1 FI ENDFOR dst[MAX:512] := 0
Integer KNCNI Load Loads the low-64-byte-aligned portion of the quadword stream starting at element-aligned address mt, up-converted depending on the value of "conv", and expanded into packed 64-bit integers in "dst". The initial values of "dst" are copied from "src". Only those converted quadwords that occur before the first 64-byte-aligned address following "mt" are loaded. Elements in the resulting vector that do not map to those quadwords are taken from "src". "hint" indicates to the processor whether the loaded data is non-temporal. DEFINE UPCONVERT(addr, offset, convertTo) { CASE conv OF _MM_UPCONV_EPI64_NONE: RETURN MEM[addr + 8*offset] ESAC } DEFINE UPCONVERTSIZE(convertTo) { CASE conv OF _MM_UPCONV_EPI64_NONE: RETURN 8 ESAC } dst[511:0] := src[511:0] loadOffset := 0 upSize := UPCONVERTSIZE(conv) addr := mt FOR j := 0 to 7 i := j*64 dst[i+63:i] := UPCONVERT(addr, loadOffset, conv) loadOffset := loadOffset + 1 IF (addr + loadOffset*upSize) % 64 == 0 BREAK FI ENDFOR dst[MAX:512] := 0
Integer KNCNI Load Loads the low-64-byte-aligned portion of the quadword stream starting at element-aligned address mt, up-converted depending on the value of "conv", and expanded into packed 64-bit integers in "dst". The initial values of "dst" are copied from "src". Only those converted quadwords that occur before the first 64-byte-aligned address following "mt" are loaded. Elements in the resulting vector that do not map to those quadwords are taken from "src". "hint" indicates to the processor whether the loaded data is non-temporal. Elements are copied to "dst" according to element selector "k" (elements are skipped when the corresponding mask bit is not set). DEFINE UPCONVERT(addr, offset, convertTo) { CASE conv OF _MM_UPCONV_EPI64_NONE: RETURN MEM[addr + 8*offset] ESAC } DEFINE UPCONVERTSIZE(convertTo) { CASE conv OF _MM_UPCONV_EPI64_NONE: RETURN 8 ESAC } dst[511:0] := src[511:0] loadOffset := 0 upSize := UPCONVERTSIZE(conv) addr := mt FOR j := 0 to 7 IF k[j] i := j*64 dst[i+63:i] := UPCONVERT(addr, loadOffset, conv) loadOffset := loadOffset + 1 IF (addr + loadOffset*upSize) % 64 == 0 BREAK FI FI ENDFOR dst[MAX:512] := 0
Floating Point KNCNI Load Loads the high-64-byte-aligned portion of the byte/word/doubleword stream starting at element-aligned address mt-64, up-converted depending on the value of "conv", and expanded into packed single-precision (32-bit) floating-point elements in "dst". The initial values of "dst" are copied from "src". Only those converted doublewords that occur at or after the first 64-byte-aligned address following (mt-64) are loaded. Elements in the resulting vector that do not map to those doublewords are taken from "src". "hint" indicates to the processor whether the loaded data is non-temporal. DEFINE UPCONVERT(addr, offset, convertTo) { CASE conv OF _MM_UPCONV_PS_NONE: RETURN MEM[addr + 4*offset] _MM_UPCONV_PS_FLOAT16: RETURN Convert_FP16_To_FP32(MEM[addr + 2*offset]) _MM_UPCONV_PS_UINT8: RETURN Convert_UInt8_To_FP32(MEM[addr + offset]) _MM_UPCONV_PS_SINT8: RETURN Convert_Int8_To_FP32(MEM[addr + offset]) _MM_UPCONV_PS_UINT16: RETURN Convert_UInt16_To_FP32(MEM[addr + 2*offset]) _MM_UPCONV_PS_SINT16: RETURN Convert_Int16_To_FP32(MEM[addr + 2*offset]) ESAC } DEFINE UPCONVERTSIZE(convertTo) { CASE conv OF _MM_UPCONV_PS_NONE: RETURN 4 _MM_UPCONV_PS_FLOAT16: RETURN 2 _MM_UPCONV_PS_UINT8: RETURN 1 _MM_UPCONV_PS_SINT8: RETURN 1 _MM_UPCONV_PS_UINT16: RETURN 2 _MM_UPCONV_PS_SINT16: RETURN 2 ESAC } dst[511:0] := src[511:0] loadOffset := 0 foundNext64BytesBoundary := false upSize := UPCONVERTSIZE(conv) addr := mt-64 FOR j := 0 to 15 IF foundNext64BytesBoundary == false IF (addr + (loadOffset + 1)*upSize) % 64 == 0 foundNext64BytesBoundary := true FI ELSE i := j*32 dst[i+31:i] := UPCONVERT(addr, loadOffset, conv) FI loadOffset := loadOffset + 1 ENDFOR dst[MAX:512] := 0
Floating Point KNCNI Load Loads the high-64-byte-aligned portion of the byte/word/doubleword stream starting at element-aligned address mt-64, up-converted depending on the value of "conv", and expanded into packed single-precision (32-bit) floating-point elements in "dst". The initial values of "dst" are copied from "src". Only those converted doublewords that occur at or after the first 64-byte-aligned address following (mt-64) are loaded. Elements in the resulting vector that do not map to those doublewords are taken from "src". "hint" indicates to the processor whether the loaded data is non-temporal. Elements are copied to "dst" according to element selector "k" (elements are skipped when the corresponding mask bit is not set). DEFINE UPCONVERT(addr, offset, convertTo) { CASE conv OF _MM_UPCONV_PS_NONE: RETURN MEM[addr + 4*offset] _MM_UPCONV_PS_FLOAT16: RETURN Convert_FP16_To_FP32(MEM[addr + 2*offset]) _MM_UPCONV_PS_UINT8: RETURN Convert_UInt8_To_FP32(MEM[addr + offset]) _MM_UPCONV_PS_SINT8: RETURN Convert_Int8_To_FP32(MEM[addr + offset]) _MM_UPCONV_PS_UINT16: RETURN Convert_UInt16_To_FP32(MEM[addr + 2*offset]) _MM_UPCONV_PS_SINT16: RETURN Convert_Int16_To_FP32(MEM[addr + 2*offset]) ESAC } DEFINE UPCONVERTSIZE(convertTo) { CASE conv OF _MM_UPCONV_PS_NONE: RETURN 4 _MM_UPCONV_PS_FLOAT16: RETURN 2 _MM_UPCONV_PS_UINT8: RETURN 1 _MM_UPCONV_PS_SINT8: RETURN 1 _MM_UPCONV_PS_UINT16: RETURN 2 _MM_UPCONV_PS_SINT16: RETURN 2 ESAC } dst[511:0] := src[511:0] loadOffset := 0 foundNext64BytesBoundary := false upSize := UPCONVERTSIZE(conv) addr := mt-64 FOR j := 0 to 15 IF k[j] IF foundNext64BytesBoundary == false IF (addr + (loadOffset + 1)*upSize) % 64 == 0 foundNext64BytesBoundary := true FI ELSE i := j*32 dst[i+31:i] := UPCONVERT(addr, loadOffset, conv) FI loadOffset := loadOffset + 1 FI ENDFOR dst[MAX:512] := 0
Floating Point KNCNI Load Loads the low-64-byte-aligned portion of the byte/word/doubleword stream starting at element-aligned address mt, up-converted depending on the value of "conv", and expanded into packed single-precision (32-bit) floating-point elements in "dst". The initial values of "dst" are copied from "src". Only those converted doublewords that occur before the first 64-byte-aligned address following "mt" are loaded. Elements in the resulting vector that do not map to those doublewords are taken from "src". "hint" indicates to the processor whether the loaded data is non-temporal. DEFINE UPCONVERT(addr, offset, convertTo) { CASE conv OF _MM_UPCONV_PS_NONE: RETURN MEM[addr + 4*offset] _MM_UPCONV_PS_FLOAT16: RETURN Convert_FP16_To_FP32(MEM[addr + 2*offset]) _MM_UPCONV_PS_UINT8: RETURN Convert_UInt8_To_FP32(MEM[addr + offset]) _MM_UPCONV_PS_SINT8: RETURN Convert_Int8_To_FP32(MEM[addr + offset]) _MM_UPCONV_PS_UINT16: RETURN Convert_UInt16_To_FP32(MEM[addr + 2*offset]) _MM_UPCONV_PS_SINT16: RETURN Convert_Int16_To_FP32(MEM[addr + 2*offset]) ESAC } DEFINE UPCONVERTSIZE(convertTo) { CASE conv OF _MM_UPCONV_PS_NONE: RETURN 4 _MM_UPCONV_PS_FLOAT16: RETURN 2 _MM_UPCONV_PS_UINT8: RETURN 1 _MM_UPCONV_PS_SINT8: RETURN 1 _MM_UPCONV_PS_UINT16: RETURN 2 _MM_UPCONV_PS_SINT16: RETURN 2 ESAC } dst[511:0] := src[511:0] loadOffset := 0 upSize := UPCONVERTSIZE(conv) addr := mt FOR j := 0 to 15 i := j*32 dst[i+31:i] := UPCONVERT(addr, loadOffset, conv) loadOffset := loadOffset + 1 IF (mt + loadOffset * upSize) % 64 == 0 BREAK FI ENDFOR dst[MAX:512] := 0
Floating Point KNCNI Load Loads the low-64-byte-aligned portion of the byte/word/doubleword stream starting at element-aligned address mt, up-converted depending on the value of "conv", and expanded into packed single-precision (32-bit) floating-point elements in "dst". The initial values of "dst" are copied from "src". Only those converted doublewords that occur before the first 64-byte-aligned address following "mt" are loaded. Elements in the resulting vector that do not map to those doublewords are taken from "src". "hint" indicates to the processor whether the loaded data is non-temporal. Elements are copied to "dst" according to element selector "k" (elements are skipped when the corresponding mask bit is not set). DEFINE UPCONVERT(addr, offset, convertTo) { CASE conv OF _MM_UPCONV_PS_NONE: RETURN MEM[addr + 4*offset] _MM_UPCONV_PS_FLOAT16: RETURN Convert_FP16_To_FP32(MEM[addr + 2*offset]) _MM_UPCONV_PS_UINT8: RETURN Convert_UInt8_To_FP32(MEM[addr + offset]) _MM_UPCONV_PS_SINT8: RETURN Convert_Int8_To_FP32(MEM[addr + offset]) _MM_UPCONV_PS_UINT16: RETURN Convert_UInt16_To_FP32(MEM[addr + 2*offset]) _MM_UPCONV_PS_SINT16: RETURN Convert_Int16_To_FP32(MEM[addr + 2*offset]) ESAC } DEFINE UPCONVERTSIZE(convertTo) { CASE conv OF _MM_UPCONV_PS_NONE: RETURN 4 _MM_UPCONV_PS_FLOAT16: RETURN 2 _MM_UPCONV_PS_UINT8: RETURN 1 _MM_UPCONV_PS_SINT8: RETURN 1 _MM_UPCONV_PS_UINT16: RETURN 2 _MM_UPCONV_PS_SINT16: RETURN 2 ESAC } dst[511:0] := src[511:0] loadOffset := 0 upSize := UPCONVERTSIZE(conv) addr := mt FOR j := 0 to 15 IF k[j] i := j*32 dst[i+31:i] := UPCONVERT(addr, loadOffset, conv) loadOffset := loadOffset + 1 IF (mt + loadOffset * upSize) % 64 == 0 BREAK FI FI ENDFOR dst[MAX:512] := 0
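To make the single-precision up-conversion table concrete, here is a hedged scalar sketch of the masked low unpack load with up-conversion. The string keys ("FLOAT16", "UINT8", and so on), the function names, and the little-endian `bytearray` memory are hypothetical stand-ins for the `_MM_UPCONV_PS_*` constants and real memory. It assumes every source element is addressed with a stride equal to its own size (2 bytes for FLOAT16), and that the address itself, not the value stored at it, is the stream base.

```python
import struct

# conv name -> (struct format of the source element, size in bytes)
UPCONV_PS = {
    "NONE":    ("<f", 4),
    "FLOAT16": ("<e", 2),   # IEEE half precision
    "UINT8":   ("<B", 1),
    "SINT8":   ("<b", 1),
    "UINT16":  ("<H", 2),
    "SINT16":  ("<h", 2),
}

def upconvert_ps(mem, addr, offset, conv):
    """Read source element number `offset` of the stream and widen it to a float."""
    fmt, size = UPCONV_PS[conv]
    return float(struct.unpack_from(fmt, mem, addr + size * offset)[0])

def mask_extloadunpacklo_ps(src, k, mem, mt, conv):
    """Fill selected lanes from mt up to (not including) the next 64-byte boundary."""
    _, size = UPCONV_PS[conv]
    dst = list(src)
    offset = 0
    for j in range(16):
        if (k >> j) & 1:
            dst[j] = upconvert_ps(mem, mt, offset, conv)
            offset += 1
            if (mt + offset * size) % 64 == 0:
                break
    return dst
```

Because the boundary test uses the source element size, a smaller source type packs more elements before the 64-byte boundary: starting at byte 60, two half-precision elements fit, but only one 4-byte float would.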
Floating Point KNCNI Load Loads the high-64-byte-aligned portion of the quadword stream starting at element-aligned address mt-64, up-converted depending on the value of "conv", and expanded into packed double-precision (64-bit) floating-point values in "dst". The initial values of "dst" are copied from "src". Only those converted quadwords that occur at or after the first 64-byte-aligned address following (mt-64) are loaded. Elements in the resulting vector that do not map to those quadwords are taken from "src". "hint" indicates to the processor whether the loaded data is non-temporal. DEFINE UPCONVERT(addr, offset, convertTo) { CASE conv OF _MM_UPCONV_PD_NONE: RETURN MEM[addr + 8*offset] ESAC } DEFINE UPCONVERTSIZE(convertTo) { CASE conv OF _MM_UPCONV_PD_NONE: RETURN 8 ESAC } dst[511:0] := src[511:0] loadOffset := 0 foundNext64BytesBoundary := false upSize := UPCONVERTSIZE(conv) addr := mt-64 FOR j := 0 to 7 IF foundNext64BytesBoundary == false IF (addr + (loadOffset + 1)*upSize) % 64 == 0 foundNext64BytesBoundary := true FI ELSE i := j*64 dst[i+63:i] := UPCONVERT(addr, loadOffset, conv) FI loadOffset := loadOffset + 1 ENDFOR dst[MAX:512] := 0
Floating Point KNCNI Load Loads the high-64-byte-aligned portion of the quadword stream starting at element-aligned address mt-64, up-converted depending on the value of "conv", and expanded into packed double-precision (64-bit) floating-point values in "dst". The initial values of "dst" are copied from "src". Only those converted quadwords that occur at or after the first 64-byte-aligned address following (mt-64) are loaded. Elements in the resulting vector that do not map to those quadwords are taken from "src". "hint" indicates to the processor whether the loaded data is non-temporal. Elements are copied to "dst" according to element selector "k" (elements are skipped when the corresponding mask bit is not set). DEFINE UPCONVERT(addr, offset, convertTo) { CASE conv OF _MM_UPCONV_PD_NONE: RETURN MEM[addr + 8*offset] ESAC } DEFINE UPCONVERTSIZE(convertTo) { CASE conv OF _MM_UPCONV_PD_NONE: RETURN 8 ESAC } dst[511:0] := src[511:0] loadOffset := 0 foundNext64BytesBoundary := false upSize := UPCONVERTSIZE(conv) addr := mt-64 FOR j := 0 to 7 IF k[j] IF foundNext64BytesBoundary == false IF (addr + (loadOffset + 1)*upSize) % 64 == 0 foundNext64BytesBoundary := true FI ELSE i := j*64 dst[i+63:i] := UPCONVERT(addr, loadOffset, conv) FI loadOffset := loadOffset + 1 FI ENDFOR dst[MAX:512] := 0
Floating Point KNCNI Load Loads the low-64-byte-aligned portion of the quadword stream starting at element-aligned address mt, up-converted depending on the value of "conv", and expanded into packed double-precision (64-bit) floating-point elements in "dst". The initial values of "dst" are copied from "src". Only those converted quadwords that occur before the first 64-byte-aligned address following "mt" are loaded. Elements in the resulting vector that do not map to those quadwords are taken from "src". "hint" indicates to the processor whether the loaded data is non-temporal. DEFINE UPCONVERT(addr, offset, convertTo) { CASE conv OF _MM_UPCONV_PD_NONE: RETURN MEM[addr + 8*offset] ESAC } DEFINE UPCONVERTSIZE(convertTo) { CASE conv OF _MM_UPCONV_PD_NONE: RETURN 8 ESAC } dst[511:0] := src[511:0] loadOffset := 0 upSize := UPCONVERTSIZE(conv) addr := mt FOR j := 0 to 7 i := j*64 dst[i+63:i] := UPCONVERT(addr, loadOffset, conv) loadOffset := loadOffset + 1 IF (mt + loadOffset * upSize) % 64 == 0 BREAK FI ENDFOR dst[MAX:512] := 0
Floating Point KNCNI Load Loads the low-64-byte-aligned portion of the quadword stream starting at element-aligned address mt, up-converted depending on the value of "conv", and expanded into packed double-precision (64-bit) floating-point elements in "dst". The initial values of "dst" are copied from "src". Only those converted quadwords that occur before the first 64-byte-aligned address following "mt" are loaded. Elements in the resulting vector that do not map to those quadwords are taken from "src". "hint" indicates to the processor whether the loaded data is non-temporal. Elements are copied to "dst" according to element selector "k" (elements are skipped when the corresponding mask bit is not set). DEFINE UPCONVERT(addr, offset, convertTo) { CASE conv OF _MM_UPCONV_PD_NONE: RETURN MEM[addr + 8*offset] ESAC } DEFINE UPCONVERTSIZE(convertTo) { CASE conv OF _MM_UPCONV_PD_NONE: RETURN 8 ESAC } dst[511:0] := src[511:0] loadOffset := 0 upSize := UPCONVERTSIZE(conv) addr := mt FOR j := 0 to 7 IF k[j] i := j*64 dst[i+63:i] := UPCONVERT(addr, loadOffset, conv) loadOffset := loadOffset + 1 IF (mt + loadOffset * upSize) % 64 == 0 BREAK FI FI ENDFOR dst[MAX:512] := 0
Integer KNCNI Store Down-converts and stores packed 32-bit integer elements of "v1" into a byte/word/doubleword stream according to "conv" at a logically mapped starting address (mt-64), storing the high-64-byte elements of that stream (those elements of the stream that map at or after the first 64-byte-aligned address following (mt-64)). "hint" indicates to the processor whether the data is non-temporal. DEFINE DOWNCONVERT(element, convertTo) { CASE convertTo OF _MM_DOWNCONV_EPI32_NONE: RETURN element[31:0] _MM_DOWNCONV_EPI32_UINT8: RETURN Truncate8(element[31:0]) _MM_DOWNCONV_EPI32_SINT8: RETURN Saturate8(element[31:0]) _MM_DOWNCONV_EPI32_UINT16: RETURN Truncate16(element[31:0]) _MM_DOWNCONV_EPI32_SINT16: RETURN Saturate16(element[31:0]) ESAC } DEFINE DOWNCONVERTSIZE(convertTo) { CASE convertTo OF _MM_DOWNCONV_EPI32_NONE: RETURN 4 _MM_DOWNCONV_EPI32_UINT8: RETURN 1 _MM_DOWNCONV_EPI32_SINT8: RETURN 1 _MM_DOWNCONV_EPI32_UINT16: RETURN 2 _MM_DOWNCONV_EPI32_SINT16: RETURN 2 ESAC } storeOffset := 0 foundNext64BytesBoundary := false downSize := DOWNCONVERTSIZE(conv) addr := mt-64 FOR j := 0 to 15 IF foundNext64BytesBoundary == false IF ((addr + (storeOffset + 1)*downSize) % 64) == 0 foundNext64BytesBoundary := true FI ELSE i := j*32 tmp := DOWNCONVERT(v1[i+31:i], conv) storeAddr := addr + storeOffset * downSize CASE downSize OF 4: MEM[storeAddr] := tmp[31:0] 2: MEM[storeAddr] := tmp[15:0] 1: MEM[storeAddr] := tmp[7:0] ESAC FI storeOffset := storeOffset + 1 ENDFOR dst[MAX:512] := 0
Integer KNCNI Store Down-converts and stores packed 32-bit integer elements of "v1" into a byte/word/doubleword stream according to "conv" at a logically mapped starting address (mt-64), storing the high-64-byte elements of that stream (those elements of the stream that map at or after the first 64-byte-aligned address following (mt-64)). "hint" indicates to the processor whether the data is non-temporal. Elements are stored to memory according to element selector "k" (elements are skipped when the corresponding mask bit is not set). DEFINE DOWNCONVERT(element, convertTo) { CASE convertTo OF _MM_DOWNCONV_EPI32_NONE: RETURN element[31:0] _MM_DOWNCONV_EPI32_UINT8: RETURN Truncate8(element[31:0]) _MM_DOWNCONV_EPI32_SINT8: RETURN Saturate8(element[31:0]) _MM_DOWNCONV_EPI32_UINT16: RETURN Truncate16(element[31:0]) _MM_DOWNCONV_EPI32_SINT16: RETURN Saturate16(element[31:0]) ESAC } DEFINE DOWNCONVERTSIZE(convertTo) { CASE convertTo OF _MM_DOWNCONV_EPI32_NONE: RETURN 4 _MM_DOWNCONV_EPI32_UINT8: RETURN 1 _MM_DOWNCONV_EPI32_SINT8: RETURN 1 _MM_DOWNCONV_EPI32_UINT16: RETURN 2 _MM_DOWNCONV_EPI32_SINT16: RETURN 2 ESAC } storeOffset := 0 foundNext64BytesBoundary := false downSize := DOWNCONVERTSIZE(conv) addr := mt-64 FOR j := 0 to 15 IF k[j] IF foundNext64BytesBoundary == false IF ((addr + (storeOffset + 1)*downSize) % 64) == 0 foundNext64BytesBoundary := true FI ELSE i := j*32 tmp := DOWNCONVERT(v1[i+31:i], conv) storeAddr := addr + storeOffset * downSize CASE downSize OF 4: MEM[storeAddr] := tmp[31:0] 2: MEM[storeAddr] := tmp[15:0] 1: MEM[storeAddr] := tmp[7:0] ESAC FI storeOffset := storeOffset + 1 FI ENDFOR dst[MAX:512] := 0
Integer KNCNI Store Down-converts and stores packed 32-bit integer elements of "v1" into a byte/word/doubleword stream according to "conv" at a logically mapped starting address "mt", storing the low-64-byte elements of that stream (those elements of the stream that map before the first 64-byte-aligned address following "mt"). "hint" indicates to the processor whether the data is non-temporal. DEFINE DOWNCONVERT(element, convertTo) { CASE convertTo OF _MM_DOWNCONV_EPI32_NONE: RETURN element[31:0] _MM_DOWNCONV_EPI32_UINT8: RETURN Truncate8(element[31:0]) _MM_DOWNCONV_EPI32_SINT8: RETURN Saturate8(element[31:0]) _MM_DOWNCONV_EPI32_UINT16: RETURN Truncate16(element[31:0]) _MM_DOWNCONV_EPI32_SINT16: RETURN Saturate16(element[31:0]) ESAC } DEFINE DOWNCONVERTSIZE(convertTo) { CASE convertTo OF _MM_DOWNCONV_EPI32_NONE: RETURN 4 _MM_DOWNCONV_EPI32_UINT8: RETURN 1 _MM_DOWNCONV_EPI32_SINT8: RETURN 1 _MM_DOWNCONV_EPI32_UINT16: RETURN 2 _MM_DOWNCONV_EPI32_SINT16: RETURN 2 ESAC } storeOffset := 0 downSize := DOWNCONVERTSIZE(conv) addr := mt FOR j := 0 to 15 i := j*32 tmp := DOWNCONVERT(v1[i+31:i], conv) storeAddr := addr + storeOffset * downSize CASE downSize OF 4: MEM[storeAddr] := tmp[31:0] 2: MEM[storeAddr] := tmp[15:0] 1: MEM[storeAddr] := tmp[7:0] ESAC storeOffset := storeOffset + 1 IF ((addr + storeOffset * downSize) % 64) == 0 BREAK FI ENDFOR dst[MAX:512] := 0
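The BREAK test in the loop above means that the number of elements written depends on both the alignment of "mt" and the down-converted element size. The following is a minimal Python sketch of that relationship, not vendor code: it assumes "mt" is a plain byte offset that is a multiple of the element size, and the helper names low_store_count_loop and low_store_count_closed_form are hypothetical. It replays the loop to count the stores and compares the count with a closed form.

```python
def low_store_count_loop(mt, down_size, lanes=16):
    """Replay the pseudocode loop and count how many elements get stored."""
    store_offset = 0
    for _ in range(lanes):
        store_offset += 1  # one element is stored per iteration
        if (mt + store_offset * down_size) % 64 == 0:
            break  # the next 64-byte boundary has been reached
    return store_offset


def low_store_count_closed_form(mt, down_size, lanes=16):
    """Elements that fit between mt and the next 64-byte boundary, capped at lanes."""
    return min(lanes, (64 - mt % 64) // down_size)
```

For example, with no conversion (4-byte elements) and mt at offset 56 within a cache line, only two elements fit before the boundary; the remaining lanes are left for the matching high-part store.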
Integer KNCNI Store Down-converts and stores packed 32-bit integer elements of "v1" into a byte/word/doubleword stream according to "conv" at a logically mapped starting address "mt", storing the low-64-byte elements of that stream (those elements of the stream that map before the first 64-byte-aligned address following "mt"). "hint" indicates to the processor whether the data is non-temporal. Elements are written to memory according to element selector "k" (elements are skipped when the corresponding mask bit is not set). DEFINE DOWNCONVERT(element, convertTo) { CASE convertTo OF _MM_DOWNCONV_EPI32_NONE: RETURN element[31:0] _MM_DOWNCONV_EPI32_UINT8: RETURN Truncate8(element[31:0]) _MM_DOWNCONV_EPI32_SINT8: RETURN Saturate8(element[31:0]) _MM_DOWNCONV_EPI32_UINT16: RETURN Truncate16(element[31:0]) _MM_DOWNCONV_EPI32_SINT16: RETURN Saturate16(element[31:0]) ESAC } DEFINE DOWNCONVERTSIZE(convertTo) { CASE convertTo OF _MM_DOWNCONV_EPI32_NONE: RETURN 4 _MM_DOWNCONV_EPI32_UINT8: RETURN 1 _MM_DOWNCONV_EPI32_SINT8: RETURN 1 _MM_DOWNCONV_EPI32_UINT16: RETURN 2 _MM_DOWNCONV_EPI32_SINT16: RETURN 2 ESAC } storeOffset := 0 downSize := DOWNCONVERTSIZE(conv) addr := mt FOR j := 0 to 15 IF k[j] i := j*32 tmp := DOWNCONVERT(v1[i+31:i], conv) storeAddr := addr + storeOffset * downSize CASE downSize OF 4: MEM[storeAddr] := tmp[31:0] 2: MEM[storeAddr] := tmp[15:0] 1: MEM[storeAddr] := tmp[7:0] ESAC storeOffset := storeOffset + 1 IF ((addr + storeOffset * downSize) % 64) == 0 BREAK FI FI ENDFOR dst[MAX:512] := 0
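The masked low-part store above packs only the selected lanes, so consecutive memory slots receive consecutive selected elements rather than consecutive lanes. The sketch below is a scalar Python model of the _MM_DOWNCONV_EPI32_NONE case only. It assumes memory is a bytearray whose offset 0 stands in for a 64-byte-aligned address; the function name packstorelo_epi32 is a hypothetical label for the model, not a binding to the real intrinsic.

```python
def packstorelo_epi32(mem, mt, v1, k=0xFFFF):
    """Write the mask-selected 32-bit lanes of v1 contiguously from byte offset mt,
    stopping once the next 64-byte boundary is reached. Returns the lanes written."""
    store_offset = 0
    for j in range(16):
        if (k >> j) & 1:
            addr = mt + store_offset * 4
            mem[addr:addr + 4] = v1[j].to_bytes(4, "little", signed=True)
            store_offset += 1
            if (mt + store_offset * 4) % 64 == 0:
                break  # the rest of the stream belongs to the high-part store
    return store_offset


mem = bytearray(128)            # offset 0 stands in for a 64-byte-aligned address
v1 = list(range(100, 116))      # lanes 0..15 hold 100..115
written = packstorelo_epi32(mem, 56, v1, k=0b1010101010101010)
```

With the odd lanes selected and mt at offset 56, lanes 1 and 3 land at offsets 56 and 60, and the loop stops at offset 64 without touching the next cache line.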
Integer KNCNI Store Down-converts and stores packed 64-bit integer elements of "v1" into a quadword stream according to "conv" at a logically mapped starting address (mt-64), storing the high-64-byte elements of that stream (those elements of the stream that map at or after the first 64-byte-aligned address following (mt-64)). "hint" indicates to the processor whether the data is non-temporal. DEFINE DOWNCONVERT(element, convertTo) { CASE convertTo OF _MM_UPCONV_EPI64_NONE: RETURN element[63:0] ESAC } DEFINE DOWNCONVERTSIZE(convertTo) { CASE convertTo OF _MM_UPCONV_EPI64_NONE: RETURN 8 ESAC } storeOffset := 0 foundNext64BytesBoundary := false downSize := DOWNCONVERTSIZE(conv) addr := mt-64 FOR j := 0 to 7 IF foundNext64BytesBoundary == false IF ((addr + (storeOffset + 1)*downSize) % 64) == 0 foundNext64BytesBoundary := true FI ELSE i := j*64 tmp := DOWNCONVERT(v1[i+63:i], conv) storeAddr := addr + storeOffset * downSize CASE downSize OF 8: MEM[storeAddr] := tmp[63:0] ESAC FI storeOffset := storeOffset + 1 ENDFOR dst[MAX:512] := 0
Integer KNCNI Store Down-converts and stores packed 64-bit integer elements of "v1" into a quadword stream according to "conv" at a logically mapped starting address (mt-64), storing the high-64-byte elements of that stream (those elements of the stream that map at or after the first 64-byte-aligned address following (mt-64)). "hint" indicates to the processor whether the data is non-temporal. Elements are stored to memory according to element selector "k" (elements are skipped when the corresponding mask bit is not set). DEFINE DOWNCONVERT(element, convertTo) { CASE convertTo OF _MM_UPCONV_EPI64_NONE: RETURN element[63:0] ESAC } DEFINE DOWNCONVERTSIZE(convertTo) { CASE convertTo OF _MM_UPCONV_EPI64_NONE: RETURN 8 ESAC } storeOffset := 0 foundNext64BytesBoundary := false downSize := DOWNCONVERTSIZE(conv) addr := mt-64 FOR j := 0 to 7 IF k[j] IF foundNext64BytesBoundary == false IF ((addr + (storeOffset + 1)*downSize) % 64) == 0 foundNext64BytesBoundary := true FI ELSE i := j*64 tmp := DOWNCONVERT(v1[i+63:i], conv) storeAddr := addr + storeOffset * downSize CASE downSize OF 8: MEM[storeAddr] := tmp[63:0] ESAC FI storeOffset := storeOffset + 1 FI ENDFOR dst[MAX:512] := 0
Integer KNCNI Store Down-converts and stores packed 64-bit integer elements of "v1" into a quadword stream according to "conv" at a logically mapped starting address "mt", storing the low-64-byte elements of that stream (those elements of the stream that map before the first 64-byte-aligned address following "mt"). "hint" indicates to the processor whether the data is non-temporal. DEFINE DOWNCONVERT(element, convertTo) { CASE convertTo OF _MM_UPCONV_EPI64_NONE: RETURN element[63:0] ESAC } DEFINE DOWNCONVERTSIZE(convertTo) { CASE convertTo OF _MM_UPCONV_EPI64_NONE: RETURN 8 ESAC } storeOffset := 0 downSize := DOWNCONVERTSIZE(conv) addr := mt FOR j := 0 to 7 i := j*64 tmp := DOWNCONVERT(v1[i+63:i], conv) storeAddr := addr + storeOffset * downSize CASE downSize OF 8: MEM[storeAddr] := tmp[63:0] ESAC storeOffset := storeOffset + 1 IF ((addr + storeOffset * downSize) % 64) == 0 BREAK FI ENDFOR dst[MAX:512] := 0
Integer KNCNI Store Down-converts and stores packed 64-bit integer elements of "v1" into a quadword stream according to "conv" at a logically mapped starting address "mt", storing the low-64-byte elements of that stream (those elements of the stream that map before the first 64-byte-aligned address following "mt"). "hint" indicates to the processor whether the data is non-temporal. Elements are stored to memory according to element selector "k" (elements are skipped when the corresponding mask bit is not set). DEFINE DOWNCONVERT(element, convertTo) { CASE convertTo OF _MM_UPCONV_EPI64_NONE: RETURN element[63:0] ESAC } DEFINE DOWNCONVERTSIZE(convertTo) { CASE convertTo OF _MM_UPCONV_EPI64_NONE: RETURN 8 ESAC } storeOffset := 0 downSize := DOWNCONVERTSIZE(conv) addr := mt FOR j := 0 to 7 IF k[j] i := j*64 tmp := DOWNCONVERT(v1[i+63:i], conv) storeAddr := addr + storeOffset * downSize CASE downSize OF 8: MEM[storeAddr] := tmp[63:0] ESAC storeOffset := storeOffset + 1 IF ((addr + storeOffset * downSize) % 64) == 0 BREAK FI FI ENDFOR dst[MAX:512] := 0
Floating Point KNCNI Store Down-converts and stores packed single-precision (32-bit) floating-point elements of "v1" into a byte/word/doubleword stream according to "conv" at a logically mapped starting address (mt-64), storing the high-64-byte elements of that stream (those elements of the stream that map at or after the first 64-byte-aligned address following (mt-64)). "hint" indicates to the processor whether the data is non-temporal. DEFINE DOWNCONVERT(element, convertTo) { CASE convertTo OF _MM_UPCONV_PS_NONE: RETURN element[31:0] _MM_UPCONV_PS_FLOAT16: RETURN Convert_FP32_To_FP16(element[31:0]) _MM_UPCONV_PS_UINT8: RETURN Truncate8(element[31:0]) _MM_UPCONV_PS_SINT8: RETURN Saturate8(element[31:0]) _MM_UPCONV_PS_UINT16: RETURN Truncate16(element[31:0]) _MM_UPCONV_PS_SINT16: RETURN Saturate16(element[31:0]) ESAC } DEFINE DOWNCONVERTSIZE(convertTo) { CASE convertTo OF _MM_UPCONV_PS_NONE: RETURN 4 _MM_UPCONV_PS_FLOAT16: RETURN 2 _MM_UPCONV_PS_UINT8: RETURN 1 _MM_UPCONV_PS_SINT8: RETURN 1 _MM_UPCONV_PS_UINT16: RETURN 2 _MM_UPCONV_PS_SINT16: RETURN 2 ESAC } storeOffset := 0 foundNext64BytesBoundary := false downSize := DOWNCONVERTSIZE(conv) addr := mt-64 FOR j := 0 to 15 IF foundNext64BytesBoundary == false IF ((addr + (storeOffset + 1)*downSize) % 64) == 0 foundNext64BytesBoundary := true FI ELSE i := j*32 tmp := DOWNCONVERT(v1[i+31:i], conv) storeAddr := addr + storeOffset * downSize CASE downSize OF 4: MEM[storeAddr] := tmp[31:0] 2: MEM[storeAddr] := tmp[15:0] 1: MEM[storeAddr] := tmp[7:0] ESAC FI storeOffset := storeOffset + 1 ENDFOR dst[MAX:512] := 0
Floating Point KNCNI Store Down-converts and stores packed single-precision (32-bit) floating-point elements of "v1" into a byte/word/doubleword stream according to "conv" at a logically mapped starting address (mt-64), storing the high-64-byte elements of that stream (those elements of the stream that map at or after the first 64-byte-aligned address following (mt-64)). "hint" indicates to the processor whether the data is non-temporal. Elements are stored to memory according to element selector "k" (elements are skipped when the corresponding mask bit is not set). DEFINE DOWNCONVERT(element, convertTo) { CASE convertTo OF _MM_UPCONV_PS_NONE: RETURN element[31:0] _MM_UPCONV_PS_FLOAT16: RETURN Convert_FP32_To_FP16(element[31:0]) _MM_UPCONV_PS_UINT8: RETURN Truncate8(element[31:0]) _MM_UPCONV_PS_SINT8: RETURN Saturate8(element[31:0]) _MM_UPCONV_PS_UINT16: RETURN Truncate16(element[31:0]) _MM_UPCONV_PS_SINT16: RETURN Saturate16(element[31:0]) ESAC } DEFINE DOWNCONVERTSIZE(convertTo) { CASE convertTo OF _MM_UPCONV_PS_NONE: RETURN 4 _MM_UPCONV_PS_FLOAT16: RETURN 2 _MM_UPCONV_PS_UINT8: RETURN 1 _MM_UPCONV_PS_SINT8: RETURN 1 _MM_UPCONV_PS_UINT16: RETURN 2 _MM_UPCONV_PS_SINT16: RETURN 2 ESAC } storeOffset := 0 foundNext64BytesBoundary := false downSize := DOWNCONVERTSIZE(conv) addr := mt-64 FOR j := 0 to 15 IF k[j] IF foundNext64BytesBoundary == false IF ((addr + (storeOffset + 1)*downSize) % 64) == 0 foundNext64BytesBoundary := true FI ELSE i := j*32 tmp := DOWNCONVERT(v1[i+31:i], conv) storeAddr := addr + storeOffset * downSize CASE downSize OF 4: MEM[storeAddr] := tmp[31:0] 2: MEM[storeAddr] := tmp[15:0] 1: MEM[storeAddr] := tmp[7:0] ESAC FI storeOffset := storeOffset + 1 FI ENDFOR dst[MAX:512] := 0
Floating Point KNCNI Store Down-converts and stores packed single-precision (32-bit) floating-point elements of "v1" into a byte/word/doubleword stream according to "conv" at a logically mapped starting address "mt", storing the low-64-byte elements of that stream (those elements of the stream that map before the first 64-byte-aligned address following "mt"). "hint" indicates to the processor whether the data is non-temporal. DEFINE DOWNCONVERT(element, convertTo) { CASE convertTo OF _MM_UPCONV_PS_NONE: RETURN element[31:0] _MM_UPCONV_PS_FLOAT16: RETURN Convert_FP32_To_FP16(element[31:0]) _MM_UPCONV_PS_UINT8: RETURN Truncate8(element[31:0]) _MM_UPCONV_PS_SINT8: RETURN Saturate8(element[31:0]) _MM_UPCONV_PS_UINT16: RETURN Truncate16(element[31:0]) _MM_UPCONV_PS_SINT16: RETURN Saturate16(element[31:0]) ESAC } DEFINE DOWNCONVERTSIZE(convertTo) { CASE convertTo OF _MM_UPCONV_PS_NONE: RETURN 4 _MM_UPCONV_PS_FLOAT16: RETURN 2 _MM_UPCONV_PS_UINT8: RETURN 1 _MM_UPCONV_PS_SINT8: RETURN 1 _MM_UPCONV_PS_UINT16: RETURN 2 _MM_UPCONV_PS_SINT16: RETURN 2 ESAC } storeOffset := 0 downSize := DOWNCONVERTSIZE(conv) addr := mt FOR j := 0 to 15 i := j*32 tmp := DOWNCONVERT(v1[i+31:i], conv) storeAddr := addr + storeOffset * downSize CASE downSize OF 4: MEM[storeAddr] := tmp[31:0] 2: MEM[storeAddr] := tmp[15:0] 1: MEM[storeAddr] := tmp[7:0] ESAC storeOffset := storeOffset + 1 IF ((addr + storeOffset * downSize) % 64) == 0 BREAK FI ENDFOR dst[MAX:512] := 0
Floating Point KNCNI Store Down-converts and stores packed single-precision (32-bit) floating-point elements of "v1" into a byte/word/doubleword stream according to "conv" at a logically mapped starting address "mt", storing the low-64-byte elements of that stream (those elements of the stream that map before the first 64-byte-aligned address following "mt"). "hint" indicates to the processor whether the data is non-temporal. Elements are stored to memory according to element selector "k" (elements are skipped when the corresponding mask bit is not set). DEFINE DOWNCONVERT(element, convertTo) { CASE convertTo OF _MM_UPCONV_PS_NONE: RETURN element[31:0] _MM_UPCONV_PS_FLOAT16: RETURN Convert_FP32_To_FP16(element[31:0]) _MM_UPCONV_PS_UINT8: RETURN Truncate8(element[31:0]) _MM_UPCONV_PS_SINT8: RETURN Saturate8(element[31:0]) _MM_UPCONV_PS_UINT16: RETURN Truncate16(element[31:0]) _MM_UPCONV_PS_SINT16: RETURN Saturate16(element[31:0]) ESAC } DEFINE DOWNCONVERTSIZE(convertTo) { CASE convertTo OF _MM_UPCONV_PS_NONE: RETURN 4 _MM_UPCONV_PS_FLOAT16: RETURN 2 _MM_UPCONV_PS_UINT8: RETURN 1 _MM_UPCONV_PS_SINT8: RETURN 1 _MM_UPCONV_PS_UINT16: RETURN 2 _MM_UPCONV_PS_SINT16: RETURN 2 ESAC } storeOffset := 0 downSize := DOWNCONVERTSIZE(conv) addr := mt FOR j := 0 to 15 IF k[j] i := j*32 tmp := DOWNCONVERT(v1[i+31:i], conv) storeAddr := addr + storeOffset * downSize CASE downSize OF 4: MEM[storeAddr] := tmp[31:0] 2: MEM[storeAddr] := tmp[15:0] 1: MEM[storeAddr] := tmp[7:0] ESAC storeOffset := storeOffset + 1 IF ((addr + storeOffset * downSize) % 64) == 0 BREAK FI FI ENDFOR dst[MAX:512] := 0
Floating Point KNCNI Store Down-converts and stores packed double-precision (64-bit) floating-point elements of "v1" into a quadword stream according to "conv" at a logically mapped starting address (mt-64), storing the high-64-byte elements of that stream (those elements of the stream that map at or after the first 64-byte-aligned address following (mt-64)). "hint" indicates to the processor whether the data is non-temporal. DEFINE DOWNCONVERT(element, convertTo) { CASE convertTo OF _MM_UPCONV_PD_NONE: RETURN element[63:0] ESAC } DEFINE DOWNCONVERTSIZE(convertTo) { CASE convertTo OF _MM_UPCONV_PD_NONE: RETURN 8 ESAC } storeOffset := 0 foundNext64BytesBoundary := false downSize := DOWNCONVERTSIZE(conv) addr := mt-64 FOR j := 0 to 7 IF foundNext64BytesBoundary == false IF ((addr + (storeOffset + 1)*downSize) % 64) == 0 foundNext64BytesBoundary := true FI ELSE i := j*64 tmp := DOWNCONVERT(v1[i+63:i], conv) storeAddr := addr + storeOffset * downSize CASE downSize OF 8: MEM[storeAddr] := tmp[63:0] ESAC FI storeOffset := storeOffset + 1 ENDFOR dst[MAX:512] := 0
Floating Point KNCNI Store Down-converts and stores packed double-precision (64-bit) floating-point elements of "v1" into a quadword stream according to "conv" at a logically mapped starting address (mt-64), storing the high-64-byte elements of that stream (those elements of the stream that map at or after the first 64-byte-aligned address following (mt-64)). "hint" indicates to the processor whether the data is non-temporal. Elements are stored to memory according to element selector "k" (elements are skipped when the corresponding mask bit is not set). DEFINE DOWNCONVERT(element, convertTo) { CASE convertTo OF _MM_UPCONV_PD_NONE: RETURN element[63:0] ESAC } DEFINE DOWNCONVERTSIZE(convertTo) { CASE convertTo OF _MM_UPCONV_PD_NONE: RETURN 8 ESAC } storeOffset := 0 foundNext64BytesBoundary := false downSize := DOWNCONVERTSIZE(conv) addr := mt-64 FOR j := 0 to 7 IF k[j] IF foundNext64BytesBoundary == false IF ((addr + (storeOffset + 1)*downSize) % 64) == 0 foundNext64BytesBoundary := true FI ELSE i := j*64 tmp := DOWNCONVERT(v1[i+63:i], conv) storeAddr := addr + storeOffset * downSize CASE downSize OF 8: MEM[storeAddr] := tmp[63:0] ESAC FI storeOffset := storeOffset + 1 FI ENDFOR dst[MAX:512] := 0
Floating Point KNCNI Store Down-converts and stores packed double-precision (64-bit) floating-point elements of "v1" into a quadword stream according to "conv" at a logically mapped starting address "mt", storing the low-64-byte elements of that stream (those elements of the stream that map before the first 64-byte-aligned address following "mt"). "hint" indicates to the processor whether the data is non-temporal. DEFINE DOWNCONVERT(element, convertTo) { CASE convertTo OF _MM_UPCONV_PD_NONE: RETURN element[63:0] ESAC } DEFINE DOWNCONVERTSIZE(convertTo) { CASE convertTo OF _MM_UPCONV_PD_NONE: RETURN 8 ESAC } storeOffset := 0 downSize := DOWNCONVERTSIZE(conv) addr := mt FOR j := 0 to 7 i := j*64 tmp := DOWNCONVERT(v1[i+63:i], conv) storeAddr := addr + storeOffset * downSize CASE downSize OF 8: MEM[storeAddr] := tmp[63:0] ESAC storeOffset := storeOffset + 1 IF ((addr + storeOffset * downSize) % 64) == 0 BREAK FI ENDFOR dst[MAX:512] := 0
Floating Point KNCNI Store Down-converts and stores packed double-precision (64-bit) floating-point elements of "v1" into a quadword stream according to "conv" at a logically mapped starting address "mt", storing the low-64-byte elements of that stream (those elements of the stream that map before the first 64-byte-aligned address following "mt"). "hint" indicates to the processor whether the data is non-temporal. Elements are stored to memory according to element selector "k" (elements are skipped when the corresponding mask bit is not set). DEFINE DOWNCONVERT(element, convertTo) { CASE convertTo OF _MM_UPCONV_PD_NONE: RETURN element[63:0] ESAC } DEFINE DOWNCONVERTSIZE(convertTo) { CASE convertTo OF _MM_UPCONV_PD_NONE: RETURN 8 ESAC } storeOffset := 0 downSize := DOWNCONVERTSIZE(conv) addr := mt FOR j := 0 to 7 IF k[j] i := j*64 tmp := DOWNCONVERT(v1[i+63:i], conv) storeAddr := addr + storeOffset * downSize CASE downSize OF 8: MEM[storeAddr] := tmp[63:0] ESAC storeOffset := storeOffset + 1 IF ((addr + storeOffset * downSize) % 64) == 0 BREAK FI FI ENDFOR dst[MAX:512] := 0
Integer KNCNI Store Stores 8 packed 64-bit integer elements from "a" to memory locations starting at "base_addr", at offsets given by the packed 32-bit integer indices in "vindex" scaled by "scale". FOR j := 0 to 7 i := j*64 m := j*32 addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8 MEM[addr+63:addr] := a[i+63:i] ENDFOR
Integer KNCNI Store Stores 8 packed 64-bit integer elements from "a" to memory locations starting at "base_addr", at offsets given by the packed 32-bit integer indices in "vindex" scaled by "scale", using writemask "k" (elements whose corresponding mask bit is not set are not written to memory). FOR j := 0 to 7 i := j*64 m := j*32 IF k[j] addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8 MEM[addr+63:addr] := a[i+63:i] FI ENDFOR
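The masked scatter above can be modelled in a few lines of scalar Python. This is a sketch under stated assumptions: memory is a bytearray, "base_addr" is a byte offset into it, and the index times "scale" is treated as a byte offset (the trailing "* 8" in the pseudocode is read here as the conversion to the bit positions used by MEM[addr+63:addr]). The function name i32scatter_epi64 is a hypothetical label for the model.

```python
def i32scatter_epi64(mem, base_addr, vindex, a, scale=8, k=0xFF):
    """Write each mask-selected 64-bit lane a[j] at byte offset
    base_addr + vindex[j] * scale; unselected lanes leave memory untouched."""
    for j in range(8):
        if (k >> j) & 1:
            addr = base_addr + vindex[j] * scale
            mem[addr:addr + 8] = a[j].to_bytes(8, "little", signed=True)


mem = bytearray(64)
vindex = [7, 6, 5, 4, 3, 2, 1, 0]    # lane j targets slot 7 - j
a = [10, 11, 12, 13, 14, 15, 16, 17]
i32scatter_epi64(mem, 0, vindex, a, scale=8, k=0b00001111)
```

Only lanes 0 to 3 are selected, so slots 7 down to 4 receive 10, 11, 12 and 13 while the first 32 bytes stay zero.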
Integer KNCNI Load Loads the high-64-byte-aligned portion of the byte/word/doubleword stream starting at element-aligned address mt-64 and expands them into packed 32-bit integers in "dst". The initial values of "dst" are copied from "src". Only those converted doublewords that occur at or after the first 64-byte-aligned address following (mt-64) are loaded. Elements in the resulting vector that do not map to those doublewords are taken from "src". dst[511:0] := src[511:0] loadOffset := 0 foundNext64BytesBoundary := false addr := mt-64 FOR j := 0 to 15 IF foundNext64BytesBoundary == false IF ((addr + (loadOffset + 1)*4) % 64) == 0 foundNext64BytesBoundary := true FI ELSE i := j*32 tmp := MEM[addr + loadOffset*4] dst[i+31:i] := tmp[31:0] FI loadOffset := loadOffset + 1 ENDFOR dst[MAX:512] := 0
Integer KNCNI Load Loads the high-64-byte-aligned portion of the byte/word/doubleword stream starting at element-aligned address mt-64 and expands them into packed 32-bit integers in "dst". The initial values of "dst" are copied from "src". Only those converted doublewords that occur at or after the first 64-byte-aligned address following (mt-64) are loaded. Elements in the resulting vector that do not map to those doublewords are taken from "src". Elements are loaded from memory according to element selector "k" (elements are skipped when the corresponding mask bit is not set). dst[511:0] := src[511:0] loadOffset := 0 foundNext64BytesBoundary := false addr := mt-64 FOR j := 0 to 15 IF k[j] IF foundNext64BytesBoundary == false IF ((addr + (loadOffset + 1)*4) % 64) == 0 foundNext64BytesBoundary := true FI ELSE i := j*32 tmp := MEM[addr + loadOffset*4] dst[i+31:i] := tmp[31:0] FI loadOffset := loadOffset + 1 FI ENDFOR dst[MAX:512] := 0
Integer KNCNI Load Loads the low-64-byte-aligned portion of the byte/word/doubleword stream starting at element-aligned address mt and expands them into packed 32-bit integers in "dst". The initial values of "dst" are copied from "src". Only those converted doublewords that occur before the first 64-byte-aligned address following "mt" are loaded. Elements in the resulting vector that do not map to those doublewords are taken from "src". dst[511:0] := src[511:0] loadOffset := 0 addr := mt FOR j := 0 to 15 i := j*32 tmp := MEM[addr + loadOffset*4] dst[i+31:i] := tmp[31:0] loadOffset := loadOffset + 1 IF (mt + loadOffset * 4) % 64 == 0 BREAK FI ENDFOR dst[MAX:512] := 0
Integer KNCNI Load Loads the low-64-byte-aligned portion of the byte/word/doubleword stream starting at element-aligned address mt and expands them into packed 32-bit integers in "dst". The initial values of "dst" are copied from "src". Only those converted doublewords that occur before the first 64-byte-aligned address following "mt" are loaded. Elements in the resulting vector that do not map to those doublewords are taken from "src". Elements are loaded from memory according to element selector "k" (elements are skipped when the corresponding mask bit is not set). dst[511:0] := src[511:0] loadOffset := 0 addr := mt FOR j := 0 to 15 i := j*32 IF k[j] tmp := MEM[addr + loadOffset*4] dst[i+31:i] := tmp[31:0] loadOffset := loadOffset + 1 IF (mt + loadOffset * 4) % 64 == 0 BREAK FI FI ENDFOR dst[MAX:512] := 0
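The masked low-part load above expands consecutive doublewords from memory into the selected lanes and leaves all other lanes at their "src" values. A minimal scalar sketch in Python follows; it assumes memory is a bytearray whose offset 0 stands in for a 64-byte-aligned address, and the name loadunpacklo_epi32 is a hypothetical label for the model rather than a call into any real library.

```python
def loadunpacklo_epi32(src, mem, mt, k=0xFFFF):
    """Return a copy of src in which the mask-selected lanes are filled with
    consecutive 32-bit values read from byte offset mt, stopping at the next
    64-byte boundary."""
    dst = list(src)
    load_offset = 0
    for j in range(16):
        if (k >> j) & 1:
            addr = mt + load_offset * 4
            dst[j] = int.from_bytes(mem[addr:addr + 4], "little", signed=True)
            load_offset += 1
            if (mt + load_offset * 4) % 64 == 0:
                break  # remaining lanes keep their src values
    return dst


# Memory holds the 32-bit values 0..31, so the value at byte offset 4*n is n.
mem = bytearray(b"".join(n.to_bytes(4, "little") for n in range(32)))
dst = loadunpacklo_epi32([-1] * 16, mem, 56, k=0b0110)
```

Starting at offset 56, only two doublewords (values 14 and 15) precede the boundary at 64; they go to lanes 1 and 2, the two lanes selected by the mask.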
Integer KNCNI Load Loads the high-64-byte-aligned portion of the quadword stream starting at element-aligned address mt-64 and expands them into packed 64-bit integers in "dst". The initial values of "dst" are copied from "src". Only those converted quadwords that occur at or after the first 64-byte-aligned address following (mt-64) are loaded. Elements in the resulting vector that do not map to those quadwords are taken from "src". dst[511:0] := src[511:0] loadOffset := 0 foundNext64BytesBoundary := false addr := mt-64 FOR j := 0 to 7 IF foundNext64BytesBoundary == false IF ((addr + (loadOffset + 1)*8) % 64) == 0 foundNext64BytesBoundary := true FI ELSE i := j*64 tmp := MEM[addr + loadOffset*8] dst[i+63:i] := tmp[63:0] FI loadOffset := loadOffset + 1 ENDFOR dst[MAX:512] := 0
Integer KNCNI Load Loads the high-64-byte-aligned portion of the quadword stream starting at element-aligned address mt-64 and expands them into packed 64-bit integers in "dst". The initial values of "dst" are copied from "src". Only those converted quadwords that occur at or after the first 64-byte-aligned address following (mt-64) are loaded. Elements in the resulting vector that do not map to those quadwords are taken from "src". Elements are loaded from memory according to element selector "k" (elements are skipped when the corresponding mask bit is not set). dst[511:0] := src[511:0] loadOffset := 0 foundNext64BytesBoundary := false addr := mt-64 FOR j := 0 to 7 IF k[j] IF foundNext64BytesBoundary == false IF ((addr + (loadOffset + 1)*8) % 64) == 0 foundNext64BytesBoundary := true FI ELSE i := j*64 tmp := MEM[addr + loadOffset*8] dst[i+63:i] := tmp[63:0] FI loadOffset := loadOffset + 1 FI ENDFOR dst[MAX:512] := 0
Integer KNCNI Load Loads the low-64-byte-aligned portion of the quadword stream starting at element-aligned address mt and expands them into packed 64-bit integers in "dst". The initial values of "dst" are copied from "src". Only those converted quadwords that occur before the first 64-byte-aligned address following "mt" are loaded. Elements in the resulting vector that do not map to those quadwords are taken from "src". dst[511:0] := src[511:0] loadOffset := 0 addr := mt FOR j := 0 to 7 i := j*64 tmp := MEM[addr + loadOffset*8] dst[i+63:i] := tmp[63:0] loadOffset := loadOffset + 1 IF ((addr + loadOffset*8) % 64) == 0 BREAK FI ENDFOR dst[MAX:512] := 0
Integer KNCNI Load Loads the low-64-byte-aligned portion of the quadword stream starting at element-aligned address mt and expands them into packed 64-bit integers in "dst". The initial values of "dst" are copied from "src". Only those converted quadwords that occur before the first 64-byte-aligned address following "mt" are loaded. Elements in the resulting vector that do not map to those quadwords are taken from "src". Elements are loaded from memory according to element selector "k" (elements are skipped when the corresponding mask bit is not set). dst[511:0] := src[511:0] loadOffset := 0 addr := mt FOR j := 0 to 7 i := j*64 IF k[j] tmp := MEM[addr + loadOffset*8] dst[i+63:i] := tmp[63:0] loadOffset := loadOffset + 1 IF ((addr + loadOffset*8) % 64) == 0 BREAK FI FI ENDFOR dst[MAX:512] := 0
Floating Point KNCNI Load Loads the high-64-byte-aligned portion of the byte/word/doubleword stream starting at element-aligned address mt-64 and expands them into packed single-precision (32-bit) floating-point elements in "dst". The initial values of "dst" are copied from "src". Only those converted doublewords that occur at or after the first 64-byte-aligned address following (mt-64) are loaded. Elements in the resulting vector that do not map to those doublewords are taken from "src". dst[511:0] := src[511:0] loadOffset := 0 foundNext64BytesBoundary := false addr := mt-64 FOR j := 0 to 15 IF foundNext64BytesBoundary == false IF ((addr + (loadOffset + 1)*4) % 64) == 0 foundNext64BytesBoundary := true FI ELSE i := j*32 tmp := MEM[addr + loadOffset*4] dst[i+31:i] := tmp[31:0] FI loadOffset := loadOffset + 1 ENDFOR dst[MAX:512] := 0
Floating Point KNCNI Load Loads the high-64-byte-aligned portion of the doubleword stream starting at element-aligned address mt-64 and expands them into packed single-precision (32-bit) floating-point elements in "dst". The initial values of "dst" are copied from "src". Only those converted doublewords that occur at or after the first 64-byte-aligned address following (mt-64) are loaded. Elements in the resulting vector that do not map to those doublewords are taken from "src". Elements are loaded from memory according to element selector "k" (elements are skipped when the corresponding mask bit is not set). dst[511:0] := src[511:0] loadOffset := 0 foundNext64BytesBoundary := false addr := mt-64 FOR j := 0 to 15 IF k[j] IF foundNext64BytesBoundary == false IF ((addr + (loadOffset + 1)*4) % 64) == 0 foundNext64BytesBoundary := true FI ELSE i := j*32 tmp := MEM[addr + loadOffset*4] dst[i+31:i] := tmp[31:0] FI loadOffset := loadOffset + 1 FI ENDFOR dst[MAX:512] := 0
Floating Point KNCNI Load Loads the low-64-byte-aligned portion of the doubleword stream starting at element-aligned address mt and expands them into packed single-precision (32-bit) floating-point elements in "dst". The initial values of "dst" are copied from "src". Only those converted doublewords that occur before the first 64-byte-aligned address following "mt" are loaded. Elements in the resulting vector that do not map to those doublewords are taken from "src". dst[511:0] := src[511:0] loadOffset := 0 addr := mt FOR j := 0 to 15 i := j*32 tmp := MEM[addr + loadOffset*4] dst[i+31:i] := tmp[31:0] loadOffset := loadOffset + 1 IF (mt + loadOffset * 4) % 64 == 0 BREAK FI ENDFOR dst[MAX:512] := 0
Floating Point KNCNI Load Loads the low-64-byte-aligned portion of the doubleword stream starting at element-aligned address mt and expands them into packed single-precision (32-bit) floating-point elements in "dst". The initial values of "dst" are copied from "src". Only those converted doublewords that occur before the first 64-byte-aligned address following "mt" are loaded. Elements in the resulting vector that do not map to those doublewords are taken from "src". Elements are loaded from memory according to element selector "k" (elements are skipped when the corresponding mask bit is not set). dst[511:0] := src[511:0] loadOffset := 0 addr := mt FOR j := 0 to 15 i := j*32 IF k[j] tmp := MEM[addr + loadOffset*4] dst[i+31:i] := tmp[31:0] loadOffset := loadOffset + 1 IF (mt + loadOffset * 4) % 64 == 0 BREAK FI FI ENDFOR dst[MAX:512] := 0
Floating Point KNCNI Load Loads the high-64-byte-aligned portion of the quadword stream starting at element-aligned address mt-64 and expands them into packed double-precision (64-bit) floating-point elements in "dst". The initial values of "dst" are copied from "src". Only those converted quadwords that occur at or after the first 64-byte-aligned address following (mt-64) are loaded. Elements in the resulting vector that do not map to those quadwords are taken from "src". dst[511:0] := src[511:0] loadOffset := 0 foundNext64BytesBoundary := false addr := mt-64 FOR j := 0 to 7 IF foundNext64BytesBoundary == false IF (addr + (loadOffset + 1)*8) % 64 == 0 foundNext64BytesBoundary := true FI ELSE i := j*64 tmp := MEM[addr + loadOffset*8] dst[i+63:i] := tmp[63:0] FI loadOffset := loadOffset + 1 ENDFOR dst[MAX:512] := 0
Floating Point KNCNI Load Loads the high-64-byte-aligned portion of the quadword stream starting at element-aligned address mt-64 and expands them into packed double-precision (64-bit) floating-point elements in "dst". The initial values of "dst" are copied from "src". Only those converted quadwords that occur at or after the first 64-byte-aligned address following (mt-64) are loaded. Elements in the resulting vector that do not map to those quadwords are taken from "src". Elements are loaded from memory according to element selector "k" (elements are skipped when the corresponding mask bit is not set). dst[511:0] := src[511:0] loadOffset := 0 foundNext64BytesBoundary := false addr := mt-64 FOR j := 0 to 7 IF k[j] IF foundNext64BytesBoundary == false IF (addr + (loadOffset + 1)*8) % 64 == 0 foundNext64BytesBoundary := true FI ELSE i := j*64 tmp := MEM[addr + loadOffset*8] dst[i+63:i] := tmp[63:0] FI loadOffset := loadOffset + 1 FI ENDFOR dst[MAX:512] := 0
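The high-part load above skips the lanes whose stream positions fall before the 64-byte boundary and fills the rest from the following cache line. A scalar Python sketch of the masked double-precision case is given below. It assumes memory is a bytearray whose offset 0 stands in for a 64-byte-aligned address and that doubles are stored in native byte order; the name loadunpackhi_pd is a hypothetical label for the model.

```python
import struct


def loadunpackhi_pd(src, mem, mt, k=0xFF):
    """Return a copy of src in which mask-selected lanes that map at or after
    the first 64-byte boundary following (mt - 64) are loaded from memory."""
    dst = list(src)
    addr = mt - 64
    load_offset = 0
    found_boundary = False
    for j in range(8):
        if (k >> j) & 1:
            if not found_boundary:
                # This lane still maps before the boundary: skip it, but note
                # when the next stream position is the aligned one.
                if (addr + (load_offset + 1) * 8) % 64 == 0:
                    found_boundary = True
            else:
                (dst[j],) = struct.unpack_from("=d", mem, addr + load_offset * 8)
            load_offset += 1
    return dst


# Memory holds the doubles 0.0..23.0, so the value at byte offset 8*n is n.
mem = bytearray(struct.pack("=24d", *[float(n) for n in range(24)]))
dst = loadunpackhi_pd([-1.0] * 8, mem, 48 + 64)
```

The stream starts at offset 48, so lanes 0 and 1 map before the boundary at 64 and keep their "src" values, while lanes 2 to 7 receive the six quadwords at offsets 64 to 104.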
Floating Point KNCNI Load Loads the low-64-byte-aligned portion of the quadword stream starting at element-aligned address mt and expands them into packed double-precision (64-bit) floating-point elements in "dst". The initial values of "dst" are copied from "src". Only those converted quadwords that occur before the first 64-byte-aligned address following "mt" are loaded. Elements in the resulting vector that do not map to those quadwords are taken from "src". dst[511:0] := src[511:0] loadOffset := 0 addr := mt FOR j := 0 to 7 i := j*64 tmp := MEM[addr + loadOffset*8] dst[i+63:i] := tmp[i+63:i] loadOffset := loadOffset + 1 IF ((addr + 8*loadOffset) % 64) == 0 BREAK FI ENDFOR dst[MAX:512] := 0
Floating Point KNCNI Load Loads the low-64-byte-aligned portion of the quadword stream starting at element-aligned address mt and expands them into packed double-precision (64-bit) floating-point values in "dst". The initial values of "dst" are copied from "src". Only those converted quadwords that occur before the first 64-byte-aligned address following "mt" are loaded. Elements in the resulting vector that do not map to those quadwords are taken from "src". Elements are loaded from memory according to element selector "k" (elements are skipped when the corresponding mask bit is not set). dst[511:0] := src[511:0] loadOffset := 0 addr := mt FOR j := 0 to 7 i := j*64 IF k[j] tmp := MEM[addr + loadOffset*8] dst[i+63:i] := tmp[i+63:i] loadOffset := loadOffset + 1 IF ((addr + 8*loadOffset) % 64) == 0 BREAK FI FI ENDFOR dst[MAX:512] := 0
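The low/high unpack loads above are easier to follow with a scalar model. The following is a minimal sketch, not the hardware implementation: `mem` is a hypothetical byte array standing in for MEM[], `mt` is a byte offset into it, the function names are invented for illustration, and the caller is assumed to have pre-filled `dst` with "src". Passing `k = 0xFF` models the unmasked records.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Low part: consume stream elements from mt up to the next 64-byte boundary. */
static void loadunpacklo_64(uint64_t dst[8], const uint8_t *mem, size_t mt, uint8_t k)
{
    assert(mt % 8 == 0);                  /* element-aligned address */
    size_t load_offset = 0;
    for (int j = 0; j < 8; j++) {
        if (!((k >> j) & 1)) continue;    /* masked-off lanes consume no stream data */
        memcpy(&dst[j], mem + mt + load_offset * 8, 8);
        load_offset++;
        if ((mt + load_offset * 8) % 64 == 0) break;   /* boundary reached */
    }
}

/* High part: called with mt = stream start + 64; skips the lanes that the
   low part already covered, then loads from the boundary onward. */
static void loadunpackhi_64(uint64_t dst[8], const uint8_t *mem, size_t mt, uint8_t k)
{
    assert(mt % 8 == 0 && mt >= 64);
    size_t addr = mt - 64;
    size_t load_offset = 0;
    int found_boundary = 0;
    for (int j = 0; j < 8; j++) {
        if (!((k >> j) & 1)) continue;
        if (!found_boundary) {
            if ((addr + (load_offset + 1) * 8) % 64 == 0) found_boundary = 1;
        } else {
            memcpy(&dst[j], mem + addr + load_offset * 8, 8);
        }
        load_offset++;
    }
}
```

Calling the low variant at offset `p` and the high variant at `p + 64` on the same `dst` reproduces an unaligned load of eight consecutive quadwords starting at `p`.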
Integer KNCNI Store Stores packed 32-bit integer elements of "v1" into a doubleword stream at a logically mapped starting address (mt-64), storing the high-64-byte elements of that stream (those elements of the stream that map at or after the first 64-byte-aligned address following (mt-64)). storeOffset := 0 foundNext64BytesBoundary := 0 addr := mt-64 FOR j := 0 to 15 IF foundNext64BytesBoundary == 0 IF ((addr + (storeOffset + 1)*4) % 64) == 0 foundNext64BytesBoundary := 1 FI ELSE i := j*32 MEM[addr + storeOffset*4] := v1[i+31:i] FI storeOffset := storeOffset + 1 ENDFOR
Integer KNCNI Store Stores packed 32-bit integer elements of "v1" into a doubleword stream at a logically mapped starting address (mt-64), storing the high-64-byte elements of that stream (those elements of the stream that map at or after the first 64-byte-aligned address following (mt-64)). Elements are stored to memory according to element selector "k" (elements are skipped when the corresponding mask bit is not set). storeOffset := 0 foundNext64BytesBoundary := 0 addr := mt-64 FOR j := 0 to 15 IF k[j] IF foundNext64BytesBoundary == 0 IF ((addr + (storeOffset + 1)*4) % 64) == 0 foundNext64BytesBoundary := 1 FI ELSE i := j*32 MEM[addr + storeOffset*4] := v1[i+31:i] FI storeOffset := storeOffset + 1 FI ENDFOR
Integer KNCNI Store Stores packed 32-bit integer elements of "v1" into a doubleword stream at a logically mapped starting address "mt", storing the low-64-byte elements of that stream (those elements of the stream that map before the first 64-byte-aligned address following "mt"). storeOffset := 0 addr := mt FOR j := 0 to 15 i := j*32 MEM[addr + storeOffset*4] := v1[i+31:i] storeOffset := storeOffset + 1 IF ((addr + storeOffset*4) % 64) == 0 BREAK FI ENDFOR
Integer KNCNI Store Stores packed 32-bit integer elements of "v1" into a doubleword stream at a logically mapped starting address "mt", storing the low-64-byte elements of that stream (those elements of the stream that map before the first 64-byte-aligned address following "mt"). Elements are stored to memory according to element selector "k" (elements are skipped when the corresponding mask bit is not set). storeOffset := 0 addr := mt FOR j := 0 to 15 IF k[j] i := j*32 MEM[addr + storeOffset*4] := v1[i+31:i] storeOffset := storeOffset + 1 IF ((addr + storeOffset*4) % 64) == 0 BREAK FI FI ENDFOR
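The four doubleword pack-store records above can be modelled in scalar C. This is a sketch under stated assumptions: `mem` is a hypothetical byte array standing in for MEM[], `mt` is a byte offset into it, and the function names are illustrative only. Passing `k = 0xFFFF` models the unmasked forms.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Low part: pack selected lanes contiguously from mt up to the next
   64-byte boundary. */
static void packstorelo_32(uint8_t *mem, size_t mt, uint16_t k, const uint32_t v1[16])
{
    assert(mt % 4 == 0);                  /* element-aligned address */
    size_t store_offset = 0;
    for (int j = 0; j < 16; j++) {
        if (!((k >> j) & 1)) continue;    /* skipped lanes leave no gap in memory */
        memcpy(mem + mt + store_offset * 4, &v1[j], 4);
        store_offset++;
        if ((mt + store_offset * 4) % 64 == 0) break;
    }
}

/* High part: called with mt = stream start + 64; stores only the lanes that
   land at or after the first 64-byte boundary. */
static void packstorehi_32(uint8_t *mem, size_t mt, uint16_t k, const uint32_t v1[16])
{
    assert(mt % 4 == 0 && mt >= 64);
    size_t addr = mt - 64;
    size_t store_offset = 0;
    int found_boundary = 0;
    for (int j = 0; j < 16; j++) {
        if (!((k >> j) & 1)) continue;
        if (!found_boundary) {
            if ((addr + (store_offset + 1) * 4) % 64 == 0) found_boundary = 1;
        } else {
            memcpy(mem + addr + store_offset * 4, &v1[j], 4);
        }
        store_offset++;
    }
}
```

Used as a pair (low at `p`, high at `p + 64`), the two calls write the selected lanes as one contiguous run starting at `p`.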
Integer KNCNI Store Stores packed 64-bit integer elements of "v1" into a quadword stream at a logically mapped starting address (mt-64), storing the high-64-byte elements of that stream (those elements of the stream that map at or after the first 64-byte-aligned address following (mt-64)). storeOffset := 0 foundNext64BytesBoundary := 0 addr := mt-64 FOR j := 0 to 7 IF foundNext64BytesBoundary == 0 IF ((addr + (storeOffset + 1)*8) % 64) == 0 foundNext64BytesBoundary := 1 FI ELSE i := j*64 MEM[addr + storeOffset*8] := v1[i+63:i] FI storeOffset := storeOffset + 1 ENDFOR
Integer KNCNI Store Stores packed 64-bit integer elements of "v1" into a quadword stream at a logically mapped starting address (mt-64), storing the high-64-byte elements of that stream (those elements of the stream that map at or after the first 64-byte-aligned address following (mt-64)). Elements are stored to memory according to element selector "k" (elements are skipped when the corresponding mask bit is not set). storeOffset := 0 foundNext64BytesBoundary := 0 addr := mt-64 FOR j := 0 to 7 IF k[j] IF foundNext64BytesBoundary == 0 IF ((addr + (storeOffset + 1)*8) % 64) == 0 foundNext64BytesBoundary := 1 FI ELSE i := j*64 MEM[addr + storeOffset*8] := v1[i+63:i] FI storeOffset := storeOffset + 1 FI ENDFOR
Integer KNCNI Store Stores packed 64-bit integer elements of "v1" into a quadword stream at a logically mapped starting address "mt", storing the low-64-byte elements of that stream (those elements of the stream that map before the first 64-byte-aligned address following "mt"). storeOffset := 0 addr := mt FOR j := 0 to 7 i := j*64 MEM[addr + storeOffset*8] := v1[i+63:i] storeOffset := storeOffset + 1 IF ((addr + storeOffset*8) % 64) == 0 BREAK FI ENDFOR
Integer KNCNI Store Stores packed 64-bit integer elements of "v1" into a quadword stream at a logically mapped starting address "mt", storing the low-64-byte elements of that stream (those elements of the stream that map before the first 64-byte-aligned address following "mt"). Elements are stored to memory according to element selector "k" (elements are skipped when the corresponding mask bit is not set). storeOffset := 0 addr := mt FOR j := 0 to 7 IF k[j] i := j*64 MEM[addr + storeOffset*8] := v1[i+63:i] storeOffset := storeOffset + 1 IF ((addr + storeOffset*8) % 64) == 0 BREAK FI FI ENDFOR
Floating Point KNCNI Store Stores packed single-precision (32-bit) floating-point elements of "v1" into a doubleword stream at a logically mapped starting address (mt-64), storing the high-64-byte elements of that stream (those elements of the stream that map at or after the first 64-byte-aligned address following (mt-64)). storeOffset := 0 foundNext64BytesBoundary := 0 addr := mt-64 FOR j := 0 to 15 IF foundNext64BytesBoundary == 0 IF ((addr + (storeOffset + 1)*4) % 64) == 0 foundNext64BytesBoundary := 1 FI ELSE i := j*32 MEM[addr + storeOffset*4] := v1[i+31:i] FI storeOffset := storeOffset + 1 ENDFOR
Floating Point KNCNI Store Stores packed single-precision (32-bit) floating-point elements of "v1" into a doubleword stream at a logically mapped starting address (mt-64), storing the high-64-byte elements of that stream (those elements of the stream that map at or after the first 64-byte-aligned address following (mt-64)). Elements are stored to memory according to element selector "k" (elements are skipped when the corresponding mask bit is not set). storeOffset := 0 foundNext64BytesBoundary := 0 addr := mt-64 FOR j := 0 to 15 IF k[j] IF foundNext64BytesBoundary == 0 IF ((addr + (storeOffset + 1)*4) % 64) == 0 foundNext64BytesBoundary := 1 FI ELSE i := j*32 MEM[addr + storeOffset*4] := v1[i+31:i] FI storeOffset := storeOffset + 1 FI ENDFOR
Floating Point KNCNI Store Stores packed single-precision (32-bit) floating-point elements of "v1" into a doubleword stream at a logically mapped starting address "mt", storing the low-64-byte elements of that stream (those elements of the stream that map before the first 64-byte-aligned address following "mt"). storeOffset := 0 addr := mt FOR j := 0 to 15 i := j*32 MEM[addr + storeOffset*4] := v1[i+31:i] storeOffset := storeOffset + 1 IF ((addr + storeOffset*4) % 64) == 0 BREAK FI ENDFOR
Floating Point KNCNI Store Stores packed single-precision (32-bit) floating-point elements of "v1" into a doubleword stream at a logically mapped starting address "mt", storing the low-64-byte elements of that stream (those elements of the stream that map before the first 64-byte-aligned address following "mt"). Elements are stored to memory according to element selector "k" (elements are skipped when the corresponding mask bit is not set). storeOffset := 0 addr := mt FOR j := 0 to 15 IF k[j] i := j*32 MEM[addr + storeOffset*4] := v1[i+31:i] storeOffset := storeOffset + 1 IF ((addr + storeOffset*4) % 64) == 0 BREAK FI FI ENDFOR
Floating Point KNCNI Store Stores packed double-precision (64-bit) floating-point elements of "v1" into a quadword stream at a logically mapped starting address (mt-64), storing the high-64-byte elements of that stream (those elements of the stream that map at or after the first 64-byte-aligned address following (mt-64)). storeOffset := 0 foundNext64BytesBoundary := 0 addr := mt-64 FOR j := 0 to 7 IF foundNext64BytesBoundary == 0 IF ((addr + (storeOffset + 1)*8) % 64) == 0 foundNext64BytesBoundary := 1 FI ELSE i := j*64 MEM[addr + storeOffset*8] := v1[i+63:i] FI storeOffset := storeOffset + 1 ENDFOR
Floating Point KNCNI Store Stores packed double-precision (64-bit) floating-point elements of "v1" into a quadword stream at a logically mapped starting address (mt-64), storing the high-64-byte elements of that stream (those elements of the stream that map at or after the first 64-byte-aligned address following (mt-64)). Elements are stored to memory according to element selector "k" (elements are skipped when the corresponding mask bit is not set). storeOffset := 0 foundNext64BytesBoundary := 0 addr := mt-64 FOR j := 0 to 7 IF k[j] IF foundNext64BytesBoundary == 0 IF ((addr + (storeOffset + 1)*8) % 64) == 0 foundNext64BytesBoundary := 1 FI ELSE i := j*64 MEM[addr + storeOffset*8] := v1[i+63:i] FI storeOffset := storeOffset + 1 FI ENDFOR
Floating Point KNCNI Store Stores packed double-precision (64-bit) floating-point elements of "v1" into a quadword stream at a logically mapped starting address "mt", storing the low-64-byte elements of that stream (those elements of the stream that map before the first 64-byte-aligned address following "mt"). storeOffset := 0 addr := mt FOR j := 0 to 7 i := j*64 MEM[addr + storeOffset*8] := v1[i+63:i] storeOffset := storeOffset + 1 IF ((addr + storeOffset*8) % 64) == 0 BREAK FI ENDFOR
Floating Point KNCNI Store Stores packed double-precision (64-bit) floating-point elements of "v1" into a quadword stream at a logically mapped starting address "mt", storing the low-64-byte elements of that stream (those elements of the stream that map before the first 64-byte-aligned address following "mt"). Elements are stored to memory according to element selector "k" (elements are skipped when the corresponding mask bit is not set). storeOffset := 0 addr := mt FOR j := 0 to 7 IF k[j] i := j*64 MEM[addr + storeOffset*8] := v1[i+63:i] storeOffset := storeOffset + 1 IF ((addr + storeOffset*8) % 64) == 0 BREAK FI FI ENDFOR
KNCNI Bit Manipulation Counts the number of set bits in 32-bit unsigned integer "r1", returning the results in "dst". dst[31:0] := PopCount(r1[31:0])
KNCNI Bit Manipulation Counts the number of set bits in 64-bit unsigned integer "r1", returning the results in "dst". dst[63:0] := PopCount(r1[63:0])
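For readers without access to the hardware instruction, the PopCount operation used in the two records above can be sketched portably. The function names below are illustrative, not part of any intrinsic API; the loop clears the lowest set bit on each iteration.

```c
#include <stdint.h>

/* Count set bits in a 32-bit value by repeatedly clearing the lowest one. */
static uint32_t popcount32(uint32_t r1)
{
    uint32_t n = 0;
    while (r1) {
        r1 &= r1 - 1;   /* clears the least significant set bit */
        n++;
    }
    return n;
}

/* Same technique for a 64-bit value. */
static uint64_t popcount64(uint64_t r1)
{
    uint64_t n = 0;
    while (r1) {
        r1 &= r1 - 1;
        n++;
    }
    return n;
}
```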
Mask KNCNI Mask Inserts the low byte of mask "k2" into the high byte of "dst", and copies the low byte of "k1" to the low byte of "dst". dst[7:0] := k1[7:0] dst[15:8] := k2[7:0]
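The mask merge above amounts to a shift and an OR on 16-bit values. A minimal sketch, with a hypothetical function name and plain `uint16_t` standing in for the mask type:

```c
#include <stdint.h>

/* dst[7:0] := k1[7:0], dst[15:8] := k2[7:0] */
static uint16_t kmerge_low_bytes(uint16_t k1, uint16_t k2)
{
    return (uint16_t)(((k2 & 0xFFu) << 8) | (k1 & 0xFFu));
}
```

The high bytes of both inputs are discarded, so `kmerge_low_bytes(0x12AB, 0x34CD)` yields `0xCDAB`.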
Floating Point Integer KNCNI Convert Performs an element-by-element conversion of elements in packed double-precision (64-bit) floating-point vector "v2" to 32-bit integer elements, storing them in the lower half of "dst". The elements in the upper half of "dst" are set to 0. [round_note] FOR j := 0 to 7 i := j*64 k := j*32 dst[k+31:k] := Convert_FP64_To_Int32(v2[i+63:i]) ENDFOR dst[MAX:512] := 0
Floating Point Integer KNCNI Convert Performs an element-by-element conversion of elements in packed double-precision (64-bit) floating-point vector "v2" to 32-bit integer elements, storing them in the lower half of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). The elements in the upper half of "dst" are set to 0. [round_note] FOR j := 0 to 7 i := j*64 l := j*32 IF k[j] dst[l+31:l] := Convert_FP64_To_Int32(v2[i+63:i]) ELSE dst[l+31:l] := src[l+31:l] FI ENDFOR dst[MAX:512] := 0
Floating Point Integer KNCNI Convert Performs element-by-element conversion of packed 32-bit integer elements in "v2" to packed single-precision (32-bit) floating-point elements and performing an optional exponent adjust using "expadj", storing the results in "dst". [round_note] FOR j := 0 to 15 i := j*32 dst[i+31:i] := Int32ToFloat32(v2[i+31:i]) CASE expadj OF _MM_EXPADJ_NONE: dst[i+31:i] := dst[i+31:i] * (2 << 0) _MM_EXPADJ_4: dst[i+31:i] := dst[i+31:i] * (2 << 4) _MM_EXPADJ_5: dst[i+31:i] := dst[i+31:i] * (2 << 5) _MM_EXPADJ_8: dst[i+31:i] := dst[i+31:i] * (2 << 8) _MM_EXPADJ_16: dst[i+31:i] := dst[i+31:i] * (2 << 16) _MM_EXPADJ_24: dst[i+31:i] := dst[i+31:i] * (2 << 24) _MM_EXPADJ_31: dst[i+31:i] := dst[i+31:i] * (2 << 31) _MM_EXPADJ_32: dst[i+31:i] := dst[i+31:i] * (2 << 32) ESAC ENDFOR dst[MAX:512] := 0
Floating Point KNCNI Elementary Math Functions Compute the base-2 logarithm of packed single-precision (32-bit) floating-point elements in "a" with absolute error of 2^(-23) and store the results in "dst". FOR j := 0 to 15 i := j*32 dst[i+31:i] := LOG(a[i+31:i]) / LOG(2.0) ENDFOR dst[MAX:512] := 0
Floating Point KNCNI Elementary Math Functions Compute the base-2 logarithm of packed single-precision (32-bit) floating-point elements in "a" with absolute error of 2^(-23) and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := LOG(a[i+31:i]) / LOG(2.0) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Integer KNCNI Arithmetic Multiply packed 32-bit integer elements in "a" and "b", add the intermediate result to packed elements in "c" and store the results in "dst". FOR j := 0 to 15 i := j*32 dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i] ENDFOR dst[MAX:512] := 0
Integer KNCNI Arithmetic Multiply packed 32-bit integer elements in "a" and "b", add the intermediate result to packed elements in "c" and store the results in "dst" using writemask "k" (elements are copied from "a" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i] ELSE dst[i+31:i] := a[i+31:i] FI ENDFOR dst[MAX:512] := 0
Integer KNCNI Arithmetic Multiply packed 32-bit integer elements in "a" and "b", add the intermediate result to packed elements in "c" and store the results in "dst" using writemask "k" (elements are copied from "c" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i] ELSE dst[i+31:i] := c[i+31:i] FI ENDFOR dst[MAX:512] := 0
Integer KNCNI Arithmetic Multiply packed 32-bit integer elements in each 4-element set of "a" by element 1 of the corresponding 4-element set from "b", add the intermediate result to element 0 of the corresponding 4-element set from "b", and store the results in "dst". FOR j := 0 to 15 i := j*32 base := (j & ~0x3) * 32 scale[31:0] := b[base+63:base+32] bias[31:0] := b[base+31:base] dst[i+31:i] := (a[i+31:i] * scale[31:0]) + bias[31:0] ENDFOR dst[MAX:512] := 0
Integer KNCNI Arithmetic Multiply packed 32-bit integer elements in each 4-element set of "a" by element 1 of the corresponding 4-element set from "b", add the intermediate result to element 0 of the corresponding 4-element set from "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] base := (j & ~0x3) * 32 scale[31:0] := b[base+63:base+32] bias[31:0] := b[base+31:base] dst[i+31:i] := (a[i+31:i] * scale[31:0]) + bias[31:0] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point KNCNI Arithmetic Multiply packed single-precision (32-bit) floating-point elements in each 4-element set of "a" by element 1 of the corresponding 4-element set from "b", add the intermediate result to element 0 of the corresponding 4-element set from "b", and store the results in "dst". [round_note] FOR j := 0 to 15 i := j*32 base := (j & ~0x3) * 32 scale[31:0] := b[base+63:base+32] bias[31:0] := b[base+31:base] dst[i+31:i] := (a[i+31:i] * scale[31:0]) + bias[31:0] ENDFOR dst[MAX:512] := 0
Floating Point KNCNI Arithmetic Multiply packed single-precision (32-bit) floating-point elements in each 4-element set of "a" by element 1 of the corresponding 4-element set from "b", add the intermediate result to element 0 of the corresponding 4-element set from "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [round_note] FOR j := 0 to 15 i := j*32 IF k[j] base := (j & ~0x3) * 32 scale[31:0] := b[base+63:base+32] bias[31:0] := b[base+31:base] dst[i+31:i] := (a[i+31:i] * scale[31:0]) + bias[31:0] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
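The scale-and-bias pattern in the four records above may be clearer as a scalar loop. The sketch below models the single-precision form on plain arrays; the function name is hypothetical, and the separate multiply and add are an assumption that mirrors the pseudocode (whether the hardware fuses them is not stated here).

```c
#include <stdint.h>

/* For each group of four lanes, b[base + 1] is the scale and b[base] is the
   bias; lanes are kept only where the mask bit is set, otherwise src is used. */
static void mask_fmadd233_ps(float dst[16], const float src[16], uint16_t k,
                             const float a[16], const float b[16])
{
    for (int j = 0; j < 16; j++) {
        int base = j & ~0x3;              /* first lane of this 4-element set */
        float scale = b[base + 1];
        float bias = b[base];
        dst[j] = ((k >> j) & 1) ? a[j] * scale + bias : src[j];
    }
}
```

Passing `k = 0xFFFF` gives the unmasked behaviour. Only two of every four elements of "b" are read; elements 2 and 3 of each set are ignored.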
Floating Point KNCNI Special Math Functions Determines the maximum of the absolute elements of each pair of corresponding elements of packed single-precision (32-bit) floating-point elements in "a" and "b", storing the results in "dst". FOR j := 0 to 15 i := j*32 dst[i+31:i] := FpMax(ABS(a[i+31:i]), ABS(b[i+31:i])) ENDFOR dst[MAX:512] := 0
Floating Point KNCNI Special Math Functions Determines the maximum of the absolute elements of each pair of corresponding elements of packed single-precision (32-bit) floating-point elements in "a" and "b", storing the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := FpMax(ABS(a[i+31:i]), ABS(b[i+31:i])) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point KNCNI Special Math Functions Determines the maximum of each pair of corresponding elements in packed single-precision (32-bit) floating-point elements in "a" and "b", storing the results in "dst". FOR j := 0 to 15 i := j*32 dst[i+31:i] := FpMax(a[i+31:i], b[i+31:i]) ENDFOR dst[MAX:512] := 0
Floating Point KNCNI Special Math Functions Determines the maximum of each pair of corresponding elements of packed single-precision (32-bit) floating-point elements in "a" and "b", storing the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := FpMax(a[i+31:i], b[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point KNCNI Special Math Functions Determines the maximum of the absolute elements of each pair of corresponding elements of packed single-precision (32-bit) floating-point elements in "a" and "b", storing the results in "dst". FOR j := 0 to 15 i := j*32 dst[i+31:i] := FpMax(ABS(a[i+31:i]), ABS(b[i+31:i])) ENDFOR dst[MAX:512] := 0
Floating Point KNCNI Special Math Functions Determines the maximum of the absolute elements of each pair of corresponding elements of packed single-precision (32-bit) floating-point elements in "a" and "b", storing the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := FpMax(ABS(a[i+31:i]), ABS(b[i+31:i])) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point KNCNI Special Math Functions Determines the maximum of each pair of corresponding elements in packed double-precision (64-bit) floating-point elements in "a" and "b", storing the results in "dst". FOR j := 0 to 7 i := j*64 dst[i+63:i] := FpMax(a[i+63:i], b[i+63:i]) ENDFOR dst[MAX:512] := 0
Floating Point KNCNI Special Math Functions Determines the maximum of each pair of corresponding elements of packed double-precision (64-bit) floating-point elements in "a" and "b", storing the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := FpMax(a[i+63:i], b[i+63:i]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point KNCNI Special Math Functions Determines the minimum of each pair of corresponding elements in packed single-precision (32-bit) floating-point elements in "a" and "b", storing the results in "dst". FOR j := 0 to 15 i := j*32 dst[i+31:i] := FpMin(a[i+31:i], b[i+31:i]) ENDFOR dst[MAX:512] := 0
Floating Point KNCNI Special Math Functions Determines the minimum of each pair of corresponding elements of packed single-precision (32-bit) floating-point elements in "a" and "b", storing the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := FpMin(a[i+31:i], b[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point KNCNI Special Math Functions Determines the minimum of each pair of corresponding elements in packed double-precision (64-bit) floating-point elements in "a" and "b", storing the results in "dst". FOR j := 0 to 7 i := j*64 dst[i+63:i] := FpMin(a[i+63:i], b[i+63:i]) ENDFOR dst[MAX:512] := 0
Floating Point KNCNI Special Math Functions Determines the minimum of each pair of corresponding elements of packed double-precision (64-bit) floating-point elements in "a" and "b", storing the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := FpMin(a[i+63:i], b[i+63:i]) ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Integer KNCNI Arithmetic Performs element-by-element multiplication between packed 32-bit integer elements in "a" and "b" and stores the high 32 bits of each result into "dst". FOR j := 0 to 15 i := j*32 dst[i+31:i] := (a[i+31:i] * b[i+31:i]) >> 32 ENDFOR dst[MAX:512] := 0
Integer KNCNI Arithmetic Performs element-by-element multiplication between packed 32-bit integer elements in "a" and "b" and stores the high 32 bits of each result into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := (a[i+31:i] * b[i+31:i]) >> 32 ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Integer KNCNI Arithmetic Performs element-by-element multiplication between packed unsigned 32-bit integer elements in "a" and "b" and stores the high 32 bits of each result into "dst". FOR j := 0 to 15 i := j*32 dst[i+31:i] := (a[i+31:i] * b[i+31:i]) >> 32 ENDFOR dst[MAX:512] := 0
Integer KNCNI Arithmetic Performs element-by-element multiplication between packed unsigned 32-bit integer elements in "a" and "b" and stores the high 32 bits of each result into "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := (a[i+31:i] * b[i+31:i]) >> 32 ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
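The pseudocode `(a * b) >> 32` in the four records above only yields the high half if the product is formed at 64-bit width. A per-element sketch with hypothetical function names; the signed variant assumes an arithmetic right shift on negative values, which mainstream compilers provide but ISO C leaves implementation-defined.

```c
#include <stdint.h>

/* High 32 bits of the signed 64-bit product. */
static int32_t mulhi_i32(int32_t a, int32_t b)
{
    int64_t product = (int64_t)a * (int64_t)b;   /* widen before multiplying */
    return (int32_t)(product >> 32);             /* assumes arithmetic shift */
}

/* High 32 bits of the unsigned 64-bit product. */
static uint32_t mulhi_u32(uint32_t a, uint32_t b)
{
    uint64_t product = (uint64_t)a * (uint64_t)b;
    return (uint32_t)(product >> 32);
}
```

The signed and unsigned results differ whenever an operand has its top bit set: for all-ones inputs the signed form returns 0 (since -1 * -1 = 1) while the unsigned form returns `0xFFFFFFFE`.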
Integer KNCNI Swizzle Permutes 128-bit blocks of the packed 32-bit integer vector "a" using constant "imm8". The results are stored in "dst". DEFINE SELECT4(src, control) { CASE control[1:0] OF 0: tmp[127:0] := src[127:0] 1: tmp[127:0] := src[255:128] 2: tmp[127:0] := src[383:256] 3: tmp[127:0] := src[511:384] ESAC RETURN tmp[127:0] } FOR j := 0 to 3 i := j*128 n := j*2 dst[i+127:i] := SELECT4(a[511:0], imm8[n+1:n]) ENDFOR dst[MAX:512] := 0
Integer KNCNI Swizzle Permutes 128-bit blocks of the packed 32-bit integer vector "a" using constant "imm8". The results are stored in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). DEFINE SELECT4(src, control) { CASE control[1:0] OF 0: tmp[127:0] := src[127:0] 1: tmp[127:0] := src[255:128] 2: tmp[127:0] := src[383:256] 3: tmp[127:0] := src[511:384] ESAC RETURN tmp[127:0] } tmp[511:0] := 0 FOR j := 0 to 3 i := j*128 n := j*2 tmp[i+127:i] := SELECT4(a[511:0], imm8[n+1:n]) ENDFOR FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := tmp[i+31:i] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
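The block permute above selects one of four 128-bit blocks for each destination block using two bits of "imm8", then applies the writemask per 32-bit lane. A scalar sketch on arrays of sixteen 32-bit lanes follows; the function name is hypothetical and `dst` may alias `src` but is assumed not to alias `a`.

```c
#include <stdint.h>
#include <string.h>

static void mask_permute4f128(uint32_t dst[16], const uint32_t src[16], uint16_t k,
                              const uint32_t a[16], uint8_t imm8)
{
    uint32_t tmp[16];
    for (int j = 0; j < 4; j++) {
        int sel = (imm8 >> (2 * j)) & 0x3;        /* imm8[2j+1:2j] picks a source block */
        memcpy(&tmp[4 * j], &a[4 * sel], 4 * sizeof(uint32_t));
    }
    for (int j = 0; j < 16; j++) {
        dst[j] = ((k >> j) & 1) ? tmp[j] : src[j];   /* per-lane writemask */
    }
}
```

With `imm8 = 0x1B` the four blocks are reversed; `imm8 = 0xE4` is the identity permutation.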
Floating Point KNCNI Elementary Math Functions Approximates the reciprocals of packed single-precision (32-bit) floating-point elements in "a" to 23 bits of precision, storing the results in "dst". FOR j := 0 to 15 i := j*32 dst[i+31:i] := (1.0 / a[i+31:i]) ENDFOR dst[MAX:512] := 0
Floating Point KNCNI Elementary Math Functions Approximates the reciprocals of packed single-precision (32-bit) floating-point elements in "a" to 23 bits of precision, storing the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := (1.0 / a[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point KNCNI Convert Round the packed single-precision (32-bit) floating-point elements in "a" to the nearest integer value using "expadj" and in the direction of "rounding", and store the results as packed single-precision floating-point elements in "dst". [round_note] FOR j := 0 to 15 i := j*32 dst[i+31:i] := ROUND(a[i+31:i]) CASE expadj OF _MM_EXPADJ_NONE: dst[i+31:i] := dst[i+31:i] * (2 << 0) _MM_EXPADJ_4: dst[i+31:i] := dst[i+31:i] * (2 << 4) _MM_EXPADJ_5: dst[i+31:i] := dst[i+31:i] * (2 << 5) _MM_EXPADJ_8: dst[i+31:i] := dst[i+31:i] * (2 << 8) _MM_EXPADJ_16: dst[i+31:i] := dst[i+31:i] * (2 << 16) _MM_EXPADJ_24: dst[i+31:i] := dst[i+31:i] * (2 << 24) _MM_EXPADJ_31: dst[i+31:i] := dst[i+31:i] * (2 << 31) _MM_EXPADJ_32: dst[i+31:i] := dst[i+31:i] * (2 << 32) ESAC ENDFOR dst[MAX:512] := 0
Floating Point KNCNI Convert Round the packed single-precision (32-bit) floating-point elements in "a" to the nearest integer value using "expadj" and in the direction of "rounding", and store the results as packed single-precision floating-point elements in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [round_note] FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := ROUND(a[i+31:i]) CASE expadj OF _MM_EXPADJ_NONE: dst[i+31:i] := dst[i+31:i] * (2 << 0) _MM_EXPADJ_4: dst[i+31:i] := dst[i+31:i] * (2 << 4) _MM_EXPADJ_5: dst[i+31:i] := dst[i+31:i] * (2 << 5) _MM_EXPADJ_8: dst[i+31:i] := dst[i+31:i] * (2 << 8) _MM_EXPADJ_16: dst[i+31:i] := dst[i+31:i] * (2 << 16) _MM_EXPADJ_24: dst[i+31:i] := dst[i+31:i] * (2 << 24) _MM_EXPADJ_31: dst[i+31:i] := dst[i+31:i] * (2 << 31) _MM_EXPADJ_32: dst[i+31:i] := dst[i+31:i] * (2 << 32) ESAC ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point KNCNI Miscellaneous Performs element-by-element rounding of packed single-precision (32-bit) floating-point elements in "a" using "expadj" and in the direction of "rounding" and stores results in "dst". [round_note] FOR j := 0 to 15 i := j*32 dst[i+31:i] := ROUND(a[i+31:i]) CASE expadj OF _MM_EXPADJ_NONE: dst[i+31:i] := dst[i+31:i] * (2 << 0) _MM_EXPADJ_4: dst[i+31:i] := dst[i+31:i] * (2 << 4) _MM_EXPADJ_5: dst[i+31:i] := dst[i+31:i] * (2 << 5) _MM_EXPADJ_8: dst[i+31:i] := dst[i+31:i] * (2 << 8) _MM_EXPADJ_16: dst[i+31:i] := dst[i+31:i] * (2 << 16) _MM_EXPADJ_24: dst[i+31:i] := dst[i+31:i] * (2 << 24) _MM_EXPADJ_31: dst[i+31:i] := dst[i+31:i] * (2 << 31) _MM_EXPADJ_32: dst[i+31:i] := dst[i+31:i] * (2 << 32) ESAC ENDFOR dst[MAX:512] := 0
Floating Point KNCNI Miscellaneous Performs element-by-element rounding of packed single-precision (32-bit) floating-point elements in "a" using "expadj" and in the direction of "rounding" and stores results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [round_note] FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := ROUND(a[i+31:i]) CASE expadj OF _MM_EXPADJ_NONE: dst[i+31:i] := dst[i+31:i] * (2 << 0) _MM_EXPADJ_4: dst[i+31:i] := dst[i+31:i] * (2 << 4) _MM_EXPADJ_5: dst[i+31:i] := dst[i+31:i] * (2 << 5) _MM_EXPADJ_8: dst[i+31:i] := dst[i+31:i] * (2 << 8) _MM_EXPADJ_16: dst[i+31:i] := dst[i+31:i] * (2 << 16) _MM_EXPADJ_24: dst[i+31:i] := dst[i+31:i] * (2 << 24) _MM_EXPADJ_31: dst[i+31:i] := dst[i+31:i] * (2 << 31) _MM_EXPADJ_32: dst[i+31:i] := dst[i+31:i] * (2 << 32) ESAC ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point KNCNI Miscellaneous Performs element-by-element rounding of packed double-precision (64-bit) floating-point elements in "a" using "expadj" and in the direction of "rounding" and stores results in "dst". [round_note] FOR j := 0 to 7 i := j*64 dst[i+63:i] := ROUND(a[i+63:i]) CASE expadj OF _MM_EXPADJ_NONE: dst[i+63:i] := dst[i+63:i] * (2 << 0) _MM_EXPADJ_4: dst[i+63:i] := dst[i+63:i] * (2 << 4) _MM_EXPADJ_5: dst[i+63:i] := dst[i+63:i] * (2 << 5) _MM_EXPADJ_8: dst[i+63:i] := dst[i+63:i] * (2 << 8) _MM_EXPADJ_16: dst[i+63:i] := dst[i+63:i] * (2 << 16) _MM_EXPADJ_24: dst[i+63:i] := dst[i+63:i] * (2 << 24) _MM_EXPADJ_31: dst[i+63:i] := dst[i+63:i] * (2 << 31) _MM_EXPADJ_32: dst[i+63:i] := dst[i+63:i] * (2 << 32) ESAC ENDFOR dst[MAX:512] := 0
Floating Point KNCNI Miscellaneous Performs element-by-element rounding of packed double-precision (64-bit) floating-point elements in "a" using "expadj" and in the direction of "rounding" and stores results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). [round_note] FOR j := 0 to 7 i := j*64 IF k[j] dst[i+63:i] := ROUND(a[i+63:i]) CASE expadj OF _MM_EXPADJ_NONE: dst[i+63:i] := dst[i+63:i] * (2 << 0) _MM_EXPADJ_4: dst[i+63:i] := dst[i+63:i] * (2 << 4) _MM_EXPADJ_5: dst[i+63:i] := dst[i+63:i] * (2 << 5) _MM_EXPADJ_8: dst[i+63:i] := dst[i+63:i] * (2 << 8) _MM_EXPADJ_16: dst[i+63:i] := dst[i+63:i] * (2 << 16) _MM_EXPADJ_24: dst[i+63:i] := dst[i+63:i] * (2 << 24) _MM_EXPADJ_31: dst[i+63:i] := dst[i+63:i] * (2 << 31) _MM_EXPADJ_32: dst[i+63:i] := dst[i+63:i] * (2 << 32) ESAC ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point KNCNI Elementary Math Functions Calculates the reciprocal square root of packed single-precision (32-bit) floating-point elements in "a" to 23 bits of accuracy and stores the result in "dst". FOR j := 0 to 15 i := j*32 dst[i+31:i] := Sqrt(1.0 / a[i+31:i]) ENDFOR dst[MAX:512] := 0
Floating Point KNCNI Elementary Math Functions Calculates the reciprocal square root of packed single-precision (32-bit) floating-point elements in "a" to 23 bits of accuracy and stores the result in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := Sqrt(1.0 / a[i+31:i]) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point KNCNI Arithmetic Scales each single-precision (32-bit) floating-point element in "a" by multiplying it by 2**exponent, where the exponent is the corresponding 32-bit integer element in "b", storing results in "dst". FOR j := 0 to 15 i := j*32 dst[i+31:i] := a[i+31:i] * POW(2.0, FP32(b[i+31:i])) ENDFOR dst[MAX:512] := 0
Floating Point KNCNI Arithmetic Scales each single-precision (32-bit) floating-point element in "a" by multiplying it by 2**exponent, where the exponent is the corresponding 32-bit integer element in "b", storing results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := a[i+31:i] * POW(2.0, FP32(b[i+31:i])) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point KNCNI Arithmetic Scales each single-precision (32-bit) floating-point element in "a" by multiplying it by 2**exponent, where the exponent is the corresponding 32-bit integer element in "b", storing results in "dst". Intermediate elements are rounded using "rounding". [round_note] FOR j := 0 to 15 i := j*32 dst[i+31:i] := a[i+31:i] * POW(2.0,FP32(b[i+31:i])) ENDFOR dst[MAX:512] := 0
Floating Point KNCNI Arithmetic Scales each single-precision (32-bit) floating-point element in "a" by multiplying it by 2**exp, where the exp is the corresponding 32-bit integer element in "b", storing results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). Results are rounded using constant "rounding". [round_note] FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := a[i+31:i] * POW(2.0, FP32(b[i+31:i])) ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
Floating Point KNCNI Special Math Functions Determines the minimum element of the packed single-precision (32-bit) floating-point elements stored in "a" and stores the result in "dst". min := a[31:0] FOR j := 1 to 15 i := j*32 min := FpMin(min, a[i+31:i]) ENDFOR dst := min
Floating Point KNCNI Special Math Functions Determines the minimum element of the packed single-precision (32-bit) floating-point elements stored in "a" and stores the result in "dst". Bitmask "k" is used to exclude certain elements (elements are ignored when the corresponding mask bit is not set). min := a[31:0] FOR j := 1 to 15 i := j*32 IF k[j] min := FpMin(min, a[i+31:i]) FI ENDFOR dst := min
Floating Point KNCNI Special Math Functions Determines the minimum element of the packed double-precision (64-bit) floating-point elements stored in "a" and stores the result in "dst". min := a[63:0] FOR j := 1 to 7 i := j*64 min := FpMin(min, a[i+63:i]) ENDFOR dst := min
Floating Point KNCNI Special Math Functions Determines the minimum element of the packed double-precision (64-bit) floating-point elements stored in "a" and stores the result in "dst". Bitmask "k" is used to exclude certain elements (elements are ignored when the corresponding mask bit is not set). min := a[63:0] FOR j := 1 to 7 i := j*64 IF k[j] min := FpMin(min, a[i+63:i]) FI ENDFOR dst := min
Floating Point KNCNI Special Math Functions Determines the maximum element of the packed single-precision (32-bit) floating-point elements stored in "a" and stores the result in "dst". max := a[31:0] FOR j := 1 to 15 i := j*32 max := FpMax(max, a[i+31:i]) ENDFOR dst := max
Floating Point KNCNI Special Math Functions Determines the maximum element of the packed single-precision (32-bit) floating-point elements stored in "a" and stores the result in "dst". Bitmask "k" is used to exclude certain elements (elements are ignored when the corresponding mask bit is not set). max := a[31:0] FOR j := 1 to 15 i := j*32 IF k[j] max := FpMax(max, a[i+31:i]) FI ENDFOR dst := max
Floating Point KNCNI Special Math Functions Determines the maximum element of the packed double-precision (64-bit) floating-point elements stored in "a" and stores the result in "dst". max := a[63:0] FOR j := 1 to 7 i := j*64 max := FpMax(max, a[i+63:i]) ENDFOR dst := max
Floating Point KNCNI Special Math Functions Determines the maximum element of the packed double-precision (64-bit) floating-point elements stored in "a" and stores the result in "dst". Bitmask "k" is used to exclude certain elements (elements are ignored when the corresponding mask bit is not set). max := a[63:0] FOR j := 1 to 7 i := j*64 IF k[j] max := FpMax(max, a[i+63:i]) FI ENDFOR dst := max
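The masked reductions above can be read as a loop that skips every element whose mask bit is clear. The following is a minimal scalar sketch in C under that reading of the prose description. The function name `mask_reduce_min_ps_model` and the `empty` fallback returned for an all-zero mask are hypothetical (the listing does not define that case), and the NaN behaviour of FpMin is not modeled.

```c
#include <stdint.h>

/* Scalar model of a masked minimum reduction over 16 floats.
   Assumption: elements with a clear mask bit are ignored entirely, and
   `empty` is returned when no mask bit is set (not defined by the listing). */
float mask_reduce_min_ps_model(uint16_t k, const float a[16], float empty)
{
    int found = 0;
    float min = empty;
    for (int j = 0; j < 16; ++j) {
        if (!((k >> j) & 1u))
            continue;               /* element excluded by the mask */
        if (!found || a[j] < min) {
            min = a[j];
            found = 1;
        }
    }
    return min;
}
```

The maximum reduction follows the same shape with the comparison reversed.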
KNCNI Bit Manipulation Count the number of trailing zero bits in unsigned 32-bit integer "x" starting at bit "a", and return that count in "dst". tmp := a IF tmp < 0 tmp := 0 FI dst := 0 IF tmp > 31 dst := 32 ELSE DO WHILE ((tmp < 32) AND x[tmp] == 0) tmp := tmp + 1 dst := dst + 1 OD FI
KNCNI Bit Manipulation Count the number of trailing zero bits in unsigned 64-bit integer "x" starting at bit "a", and return that count in "dst". tmp := a IF tmp < 0 tmp := 0 FI dst := 0 IF tmp > 63 dst := 64 ELSE DO WHILE ((tmp < 64) AND x[tmp] == 0) tmp := tmp + 1 dst := dst + 1 OD FI
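The trailing-zero count above starts scanning at bit "a" and counts relative to that starting bit. A minimal scalar sketch in C of the 32-bit pseudocode, with the hypothetical name `tzcnti_32_model`, might look like this; it transcribes the listing and makes no claim about the exact hardware return value.

```c
#include <stdint.h>

/* Scalar transcription of the 32-bit pseudocode: count zero bits of x
   starting at bit position a, moving toward bit 31. */
int tzcnti_32_model(int a, uint32_t x)
{
    int tmp = a < 0 ? 0 : a;        /* negative start positions clamp to 0 */
    if (tmp > 31)
        return 32;                  /* start position beyond the word */
    int dst = 0;
    while (tmp < 32 && ((x >> tmp) & 1u) == 0) {
        ++tmp;
        ++dst;
    }
    return dst;
}
```

For example, with `x = 0x8` the scan from bit 0 stops at bit 3, while a scan from bit 4 finds no set bit and runs to the end of the word.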
KNCNI General Support Stalls a thread without blocking other threads for 32-bit unsigned integer "r1" clock cycles. BlockThread(r1)
KNCNI General Support Stalls a thread without blocking other threads for 64-bit unsigned integer "r1" clock cycles. BlockThread(r1)
KNCNI General Support Set performance monitoring filtering mask to 32-bit unsigned integer "r1". SetPerfMonMask(r1[31:0])
KNCNI General Support Set performance monitoring filtering mask to 64-bit unsigned integer "r1". SetPerfMonMask(r1[63:0])
KNCNI General Support Evicts the cache line containing the address "ptr" from cache level "level" (can be either 0 or 1). CacheLineEvict(ptr, level)
Mask KNCNI Mask Performs a bitwise AND operation between NOT of "k2" and "k1", storing the result in "dst". dst[15:0] := NOT(k2[15:0]) & k1[15:0]
Mask KNCNI Mask Moves high byte from "k2" to low byte of "k1", and moves low byte of "k2" to high byte of "k1". tmp[7:0] := k2[15:8] k2[15:8] := k1[7:0] k1[7:0] := tmp[7:0] tmp[7:0] := k2[7:0] k2[7:0] := k1[15:8] k1[15:8] := tmp[7:0]
Mask KNCNI Mask Performs bitwise OR between "k1" and "k2", storing the result in "dst". ZF flag is set if "dst" is 0. dst[15:0] := k1[15:0] | k2[15:0] IF dst == 0 SetZF() FI
Mask KNCNI Mask Performs bitwise OR between "k1" and "k2", storing the result in "dst". CF flag is set if "dst" consists of all 1's. dst[15:0] := k1[15:0] | k2[15:0] IF PopCount(dst[15:0]) == 16 SetCF() FI
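The two OR-test entries above differ only in which flag they report. A small C sketch that models both flags together, assuming a hypothetical result struct `kor_result` and function `kortest_model` (neither is a real intrinsic), could be:

```c
#include <stdint.h>

/* Hypothetical container for the OR result and the two flags. */
typedef struct {
    uint16_t dst;
    int zf;   /* 1 when dst == 0 */
    int cf;   /* 1 when all 16 bits of dst are set */
} kor_result;

/* Scalar model of the 16-bit mask OR with flag computation. */
kor_result kortest_model(uint16_t k1, uint16_t k2)
{
    kor_result r;
    r.dst = (uint16_t)(k1 | k2);
    r.zf = (r.dst == 0);
    r.cf = (r.dst == 0xFFFF);
    return r;
}
```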
KNCNI Mask Converts bit mask "k1" into an integer value, storing the results in "dst". dst := ZeroExtend32(k1)
KNCNI Mask Converts integer "mask" into bitmask, storing the result in "dst". dst := mask[15:0]
Mask KNCNI Mask Packs masks "k1" and "k2" into the high 32 bits of "dst". The rest of "dst" is set to 0. dst[63:48] := k1[15:0] dst[47:32] := k2[15:0] dst[31:0] := 0
Mask KNCNI Mask Packs masks "k1" and "k2" into the low 32 bits of "dst". The rest of "dst" is set to 0. dst[31:16] := k1[15:0] dst[15:0] := k2[15:0] dst[63:32] := 0
Mask KNCNI Mask Extracts the 16-bit field of 64-bit integer "a" selected by "b", storing the result in "dst". CASE b[1:0] OF 0: dst[15:0] := a[63:48] 1: dst[15:0] := a[47:32] 2: dst[15:0] := a[31:16] 3: dst[15:0] := a[15:0] ESAC dst[MAX:16] := 0
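The pack and extract entries above are plain shifts on a 64-bit integer. The sketch below models them in portable C. The function names are hypothetical, and the extract model follows the CASE table literally, so selector 0 returns the topmost 16 bits.

```c
#include <stdint.h>

/* Pack k1 and k2 into bits 63:32; bits 31:0 are zero. */
uint64_t kconcat_hi_model(uint16_t k1, uint16_t k2)
{
    return ((uint64_t)k1 << 48) | ((uint64_t)k2 << 32);
}

/* Pack k1 and k2 into bits 31:0; bits 63:32 are zero. */
uint64_t kconcat_lo_model(uint16_t k1, uint16_t k2)
{
    return ((uint64_t)k1 << 16) | (uint64_t)k2;
}

/* Extract the 16-bit field selected by b[1:0]: 0 -> bits 63:48, 3 -> bits 15:0. */
uint16_t kextract_model(uint64_t a, unsigned b)
{
    unsigned shift = 48u - 16u * (b & 3u);
    return (uint16_t)(a >> shift);
}
```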
Floating Point KNCNI Arithmetic Multiply packed single-precision (32-bit) floating-point elements in each 4-element set of "a" by element 1 of the corresponding 4-element set from "b", add the intermediate result to element 0 of the corresponding 4-element set from "b", and store the results in "dst". FOR j := 0 to 15 i := j*32 base := (j & ~0x3) * 32 scale[31:0] := b[base+63:base+32] bias[31:0] := b[base+31:base] dst[i+31:i] := (a[i+31:i] * scale[31:0]) + bias[31:0] ENDFOR dst[MAX:512] := 0
Floating Point KNCNI Arithmetic Multiply packed single-precision (32-bit) floating-point elements in each 4-element set of "a" by element 1 of the corresponding 4-element set from "b", add the intermediate result to element 0 of the corresponding 4-element set from "b", and store the results in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 15 i := j*32 IF k[j] base := (j & ~0x3) * 32 scale[31:0] := b[base+63:base+32] bias[31:0] := b[base+31:base] dst[i+31:i] := (a[i+31:i] * scale[31:0]) + bias[31:0] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
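The scale-and-bias operation above uses one scale and one bias per group of four elements, both taken from "b". A scalar sketch in C, with the hypothetical name `scale_bias_ps_model`, is shown below. It uses a separate multiply and add, so any fused rounding the hardware may perform is not modeled.

```c
/* Scalar model: for each 4-element group, scale = b[group+1], bias = b[group+0],
   and dst[j] = a[j] * scale + bias for every element j in the group. */
void scale_bias_ps_model(float dst[16], const float a[16], const float b[16])
{
    for (int j = 0; j < 16; ++j) {
        int base = j & ~3;              /* first element of the 4-element group */
        float scale = b[base + 1];
        float bias  = b[base];
        dst[j] = a[j] * scale + bias;
    }
}
```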
Integer KNCNI Load Up-converts 8 single-precision (32-bit) memory locations starting at location "base_addr" at packed 64-bit integer indices stored in "vindex" scaled by "scale" using "conv" to 32-bit integer elements and stores them in "dst". "hint" indicates to the processor whether the data is non-temporal. FOR j := 0 to 7 i := j*32 m := j*64 addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8 CASE conv OF _MM_UPCONV_EPI32_NONE: dst[i+31:i] := MEM[addr+31:addr] _MM_UPCONV_EPI32_UINT8: dst[i+31:i] := ZeroExtend32(MEM[addr+7:addr]) _MM_UPCONV_EPI32_SINT8: dst[i+31:i] := SignExtend32(MEM[addr+7:addr]) _MM_UPCONV_EPI32_UINT16: dst[i+31:i] := ZeroExtend32(MEM[addr+15:addr]) _MM_UPCONV_EPI32_SINT16: dst[i+31:i] := SignExtend32(MEM[addr+15:addr]) ESAC ENDFOR dst[MAX:256] := 0
Integer KNCNI Load Up-converts 8 single-precision (32-bit) memory locations starting at location "base_addr" at packed 64-bit integer indices stored in "vindex" scaled by "scale" using "conv" to 32-bit integer elements and stores them in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "hint" indicates to the processor whether the data is non-temporal. FOR j := 0 to 7 i := j*32 m := j*64 addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8 IF k[j] CASE conv OF _MM_UPCONV_EPI32_NONE: dst[i+31:i] := MEM[addr+31:addr] _MM_UPCONV_EPI32_UINT8: dst[i+31:i] := ZeroExtend32(MEM[addr+7:addr]) _MM_UPCONV_EPI32_SINT8: dst[i+31:i] := SignExtend32(MEM[addr+7:addr]) _MM_UPCONV_EPI32_UINT16: dst[i+31:i] := ZeroExtend32(MEM[addr+15:addr]) _MM_UPCONV_EPI32_SINT16: dst[i+31:i] := SignExtend32(MEM[addr+15:addr]) ESAC ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:256] := 0
Integer KNCNI Load Up-converts 8 double-precision (64-bit) memory locations starting at location "base_addr" at packed 64-bit integer indices stored in "vindex" scaled by "scale" using "conv" to 64-bit integer elements and stores them in "dst". "hint" indicates to the processor whether the load is non-temporal. FOR j := 0 to 7 i := j*64 m := j*64 addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8 CASE conv OF _MM_UPCONV_EPI64_NONE: dst[i+63:i] := MEM[addr+63:addr] ESAC ENDFOR dst[MAX:512] := 0
Integer KNCNI Load Up-converts 8 double-precision (64-bit) memory locations starting at location "base_addr" at packed 64-bit integer indices stored in "vindex" scaled by "scale" using "conv" to 64-bit integer elements and stores them in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "hint" indicates to the processor whether the load is non-temporal. FOR j := 0 to 7 i := j*64 m := j*64 addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8 IF k[j] CASE conv OF _MM_UPCONV_EPI64_NONE: dst[i+63:i] := MEM[addr+63:addr] ESAC ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point KNCNI Load Up-converts 8 memory locations starting at location "base_addr" at packed 64-bit integer indices stored in "vindex" scaled by "scale" using "conv" to single-precision (32-bit) floating-point elements and stores them in the lower half of "dst". "hint" indicates to the processor whether the load is non-temporal. FOR j := 0 to 7 i := j*32 m := j*64 addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8 CASE conv OF _MM_UPCONV_PS_NONE: dst[i+31:i] := MEM[addr+31:addr] _MM_UPCONV_PS_FLOAT16: dst[i+31:i] := Convert_FP16_To_FP32(MEM[addr+15:addr]) _MM_UPCONV_PS_UINT8: dst[i+31:i] := Convert_UInt8_To_FP32(MEM[addr+7:addr]) _MM_UPCONV_PS_SINT8: dst[i+31:i] := Convert_Int8_To_FP32(MEM[addr+7:addr]) _MM_UPCONV_PS_UINT16: dst[i+31:i] := Convert_UInt16_To_FP32(MEM[addr+15:addr]) _MM_UPCONV_PS_SINT16: dst[i+31:i] := Convert_Int16_To_FP32(MEM[addr+15:addr]) ESAC ENDFOR dst[MAX:256] := 0
Floating Point KNCNI Load Up-converts 8 memory locations starting at location "base_addr" at packed 64-bit integer indices stored in "vindex" scaled by "scale" using "conv" to single-precision (32-bit) floating-point elements and stores them in the lower half of "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "hint" indicates to the processor whether the load is non-temporal. FOR j := 0 to 7 i := j*32 m := j*64 addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8 IF k[j] CASE conv OF _MM_UPCONV_PS_NONE: dst[i+31:i] := MEM[addr+31:addr] _MM_UPCONV_PS_FLOAT16: dst[i+31:i] := Convert_FP16_To_FP32(MEM[addr+15:addr]) _MM_UPCONV_PS_UINT8: dst[i+31:i] := Convert_UInt8_To_FP32(MEM[addr+7:addr]) _MM_UPCONV_PS_SINT8: dst[i+31:i] := Convert_Int8_To_FP32(MEM[addr+7:addr]) _MM_UPCONV_PS_UINT16: dst[i+31:i] := Convert_UInt16_To_FP32(MEM[addr+15:addr]) _MM_UPCONV_PS_SINT16: dst[i+31:i] := Convert_Int16_To_FP32(MEM[addr+15:addr]) ESAC ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:256] := 0
Floating Point KNCNI Load Up-converts 8 double-precision (64-bit) floating-point elements stored in memory starting at location "base_addr" at packed 64-bit integer indices stored in "vindex" scaled by "scale" using "conv" to 64-bit floating-point elements and stores them in "dst". "hint" indicates to the processor whether the data is non-temporal. FOR j := 0 to 7 i := j*64 m := j*64 addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8 CASE conv OF _MM_UPCONV_PD_NONE: dst[i+63:i] := MEM[addr+63:addr] ESAC ENDFOR dst[MAX:512] := 0
Floating Point KNCNI Load Up-converts 8 double-precision (64-bit) floating-point elements stored in memory starting at location "base_addr" at packed 64-bit integer indices stored in "vindex" scaled by "scale" using "conv" to 64-bit floating-point elements and stores them in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). "hint" indicates to the processor whether the data is non-temporal. FOR j := 0 to 7 i := j*64 m := j*64 addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8 IF k[j] CASE conv OF _MM_UPCONV_PD_NONE: dst[i+63:i] := MEM[addr+63:addr] ESAC ELSE dst[i+63:i] := src[i+63:i] FI ENDFOR dst[MAX:512] := 0
Floating Point KNCNI Store Down-converts 8 packed single-precision (32-bit) floating-point elements in "a" using "conv" and stores them in memory locations starting at location "base_addr" at packed 64-bit integer indices stored in "vindex" scaled by "scale". "hint" indicates to the processor whether the data is non-temporal. FOR j := 0 to 7 i := j*32 m := j*64 addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8 CASE conv OF _MM_DOWNCONV_PS_NONE: MEM[addr+31:addr] := a[i+31:i] _MM_DOWNCONV_PS_FLOAT16: MEM[addr+15:addr] := Convert_FP32_To_FP16(a[i+31:i]) _MM_DOWNCONV_PS_UINT8: MEM[addr+ 7:addr] := Convert_FP32_To_UInt8(a[i+31:i]) _MM_DOWNCONV_PS_SINT8: MEM[addr+ 7:addr] := Convert_FP32_To_Int8(a[i+31:i]) _MM_DOWNCONV_PS_UINT16: MEM[addr+15:addr] := Convert_FP32_To_UInt16(a[i+31:i]) _MM_DOWNCONV_PS_SINT16: MEM[addr+15:addr] := Convert_FP32_To_Int16(a[i+31:i]) ESAC ENDFOR
Floating Point KNCNI Store Down-converts 8 packed single-precision (32-bit) floating-point elements in "a" using "conv" and stores them in memory locations starting at location "base_addr" at packed 64-bit integer indices stored in "vindex" scaled by "scale". Elements are only written when the corresponding mask bit is set in "k"; otherwise, elements are unchanged in memory. "hint" indicates to the processor whether the data is non-temporal. FOR j := 0 to 7 i := j*32 m := j*64 addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8 IF k[j] CASE conv OF _MM_DOWNCONV_PS_NONE: MEM[addr+31:addr] := a[i+31:i] _MM_DOWNCONV_PS_FLOAT16: MEM[addr+15:addr] := Convert_FP32_To_FP16(a[i+31:i]) _MM_DOWNCONV_PS_UINT8: MEM[addr+ 7:addr] := Convert_FP32_To_UInt8(a[i+31:i]) _MM_DOWNCONV_PS_SINT8: MEM[addr+ 7:addr] := Convert_FP32_To_Int8(a[i+31:i]) _MM_DOWNCONV_PS_UINT16: MEM[addr+15:addr] := Convert_FP32_To_UInt16(a[i+31:i]) _MM_DOWNCONV_PS_SINT16: MEM[addr+15:addr] := Convert_FP32_To_Int16(a[i+31:i]) ESAC FI ENDFOR
Floating Point KNCNI Store Down-converts 8 packed double-precision (64-bit) floating-point elements in "a" using "conv" and stores them in memory locations starting at location "base_addr" at packed 64-bit integer indices stored in "vindex" scaled by "scale". "hint" indicates to the processor whether the data is non-temporal. FOR j := 0 to 7 i := j*64 m := j*64 addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8 CASE conv OF _MM_DOWNCONV_PD_NONE: MEM[addr+63:addr] := a[i+63:i] ESAC ENDFOR
Floating Point KNCNI Store Down-converts 8 packed double-precision (64-bit) floating-point elements in "a" using "conv" and stores them in memory locations starting at location "base_addr" at packed 64-bit integer indices stored in "vindex" scaled by "scale". Elements are written to memory using writemask "k" (elements are not stored to memory when the corresponding mask bit is not set; the memory location is left unchanged). "hint" indicates to the processor whether the data is non-temporal. FOR j := 0 to 7 i := j*64 m := j*64 addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8 IF k[j] CASE conv OF _MM_DOWNCONV_PD_NONE: MEM[addr+63:addr] := a[i+63:i] ESAC FI ENDFOR
Integer KNCNI Store Down-converts the low 8 packed 32-bit integer elements in "a" using "conv" and stores them in memory locations starting at location "base_addr" at packed 64-bit integer indices stored in "vindex" scaled by "scale". "hint" indicates to the processor whether the data is non-temporal. FOR j := 0 to 7 i := j*32 m := j*64 addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8 CASE conv OF _MM_DOWNCONV_EPI32_NONE: MEM[addr+31:addr] := a[i+31:i] _MM_DOWNCONV_EPI32_UINT8: MEM[addr+ 7:addr] := Truncate8(a[i+31:i]) _MM_DOWNCONV_EPI32_SINT8: MEM[addr+ 7:addr] := Saturate8(a[i+31:i]) _MM_DOWNCONV_EPI32_UINT16: MEM[addr+15:addr] := Truncate16(a[i+31:i]) _MM_DOWNCONV_EPI32_SINT16: MEM[addr+15:addr] := Saturate16(a[i+31:i]) ESAC ENDFOR
Integer KNCNI Store Down-converts the low 8 packed 32-bit integer elements in "a" using "conv" and stores them in memory locations starting at location "base_addr" at packed 64-bit integer indices stored in "vindex" scaled by "scale". Elements are written to memory using writemask "k" (elements are only written when the corresponding mask bit is set; otherwise, the memory location is left unchanged). "hint" indicates to the processor whether the data is non-temporal. FOR j := 0 to 7 i := j*32 m := j*64 addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8 IF k[j] CASE conv OF _MM_DOWNCONV_EPI32_NONE: MEM[addr+31:addr] := a[i+31:i] _MM_DOWNCONV_EPI32_UINT8: MEM[addr+ 7:addr] := Truncate8(a[i+31:i]) _MM_DOWNCONV_EPI32_SINT8: MEM[addr+ 7:addr] := Saturate8(a[i+31:i]) _MM_DOWNCONV_EPI32_UINT16: MEM[addr+15:addr] := Truncate16(a[i+31:i]) _MM_DOWNCONV_EPI32_SINT16: MEM[addr+15:addr] := Saturate16(a[i+31:i]) ESAC FI ENDFOR
Integer KNCNI Store Down-converts 8 packed 64-bit integer elements in "a" using "conv" and stores them in memory locations starting at location "base_addr" at packed 64-bit integer indices stored in "vindex" scaled by "scale". "hint" indicates to the processor whether the data is non-temporal. FOR j := 0 to 7 i := j*64 m := j*64 addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8 CASE conv OF _MM_DOWNCONV_EPI64_NONE: MEM[addr+63:addr] := a[i+63:i] ESAC ENDFOR
Integer KNCNI Store Down-converts 8 packed 64-bit integer elements in "a" using "conv" and stores them in memory locations starting at location "base_addr" at packed 64-bit integer indices stored in "vindex" scaled by "scale". Only those elements whose corresponding mask bit is set in writemask "k" are written to memory. FOR j := 0 to 7 i := j*64 m := j*64 addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8 IF k[j] CASE conv OF _MM_DOWNCONV_EPI64_NONE: MEM[addr+63:addr] := a[i+63:i] ESAC FI ENDFOR
Floating Point KNCNI Swizzle Permutes 128-bit blocks of the packed single-precision (32-bit) floating-point elements in "a" using constant "imm8". The results are stored in "dst". DEFINE SELECT4(src, control) { CASE control[1:0] OF 0: tmp[127:0] := src[127:0] 1: tmp[127:0] := src[255:128] 2: tmp[127:0] := src[383:256] 3: tmp[127:0] := src[511:384] ESAC RETURN tmp[127:0] } FOR j := 0 to 3 i := j*128 n := j*2 dst[i+127:i] := SELECT4(a[511:0], imm8[n+1:n]) ENDFOR dst[MAX:512] := 0
Floating Point KNCNI Swizzle Permutes 128-bit blocks of the packed single-precision (32-bit) floating-point elements in "a" using constant "imm8". The results are stored in "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). DEFINE SELECT4(src, control) { CASE control[1:0] OF 0: tmp[127:0] := src[127:0] 1: tmp[127:0] := src[255:128] 2: tmp[127:0] := src[383:256] 3: tmp[127:0] := src[511:384] ESAC RETURN tmp[127:0] } tmp[511:0] := 0 FOR j := 0 to 3 i := j*128 n := j*2 tmp[i+127:i] := SELECT4(a[511:0], imm8[n+1:n]) ENDFOR FOR j := 0 to 15 i := j*32 IF k[j] dst[i+31:i] := tmp[i+31:i] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:512] := 0
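In the 128-bit block permute above, each 2-bit field of "imm8" selects which source block lands in the corresponding destination block. A scalar sketch in C of the unmasked form follows. The name `permute4f128_model` is hypothetical, and `dst` is assumed not to alias `a`.

```c
/* Scalar model: the vector is 4 blocks of 4 floats. Destination block j is
   copied from the source block selected by bits [2j+1:2j] of imm8. */
void permute4f128_model(float dst[16], const float a[16], unsigned imm8)
{
    for (int j = 0; j < 4; ++j) {
        unsigned sel = (imm8 >> (2 * j)) & 3u;
        for (int e = 0; e < 4; ++e)
            dst[4 * j + e] = a[4 * sel + e];
    }
}
```

With `imm8 = 0x1B` (binary 00 01 10 11) the four blocks are reversed, and with `imm8 = 0xE4` the input is returned unchanged.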
Integer KNCNI Load Loads 8 32-bit integer memory locations starting at location "base_addr" at packed 64-bit integer indices stored in "vindex" scaled by "scale" to "dst". FOR j := 0 to 7 i := j*32 m := j*64 addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8 dst[i+31:i] := MEM[addr+31:addr] ENDFOR dst[MAX:256] := 0
Integer KNCNI Load Loads 8 32-bit integer memory locations starting at location "base_addr" at packed 64-bit integer indices stored in "vindex" scaled by "scale" to "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 m := j*64 IF k[j] addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8 dst[i+31:i] := MEM[addr+31:addr] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:256] := 0
Floating Point KNCNI Load Loads 8 single-precision (32-bit) floating-point memory locations starting at location "base_addr" at packed 64-bit integer indices stored in "vindex" scaled by "scale" to "dst". FOR j := 0 to 7 i := j*32 m := j*64 addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8 dst[i+31:i] := MEM[addr+31:addr] ENDFOR dst[MAX:256] := 0
Floating Point KNCNI Load Loads 8 single-precision (32-bit) floating-point memory locations starting at location "base_addr" at packed 64-bit integer indices stored in "vindex" scaled by "scale" to "dst" using writemask "k" (elements are copied from "src" when the corresponding mask bit is not set). FOR j := 0 to 7 i := j*32 m := j*64 IF k[j] addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8 dst[i+31:i] := MEM[addr+31:addr] ELSE dst[i+31:i] := src[i+31:i] FI ENDFOR dst[MAX:256] := 0
Floating Point KNCNI Store Stores 8 packed single-precision (32-bit) floating-point elements in "a" in memory locations starting at location "base_addr" at packed 64-bit integer indices stored in "vindex" scaled by "scale". FOR j := 0 to 7 i := j*32 m := j*64 addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8 MEM[addr+31:addr] := a[i+31:i] ENDFOR
Floating Point KNCNI Store Stores 8 packed single-precision (32-bit) floating-point elements in "a" in memory locations starting at location "base_addr" at packed 64-bit integer indices stored in "vindex" scaled by "scale" using writemask "k" (elements are only written to memory when the corresponding mask bit is set). FOR j := 0 to 7 i := j*32 m := j*64 IF k[j] addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8 MEM[addr+31:addr] := a[i+31:i] FI ENDFOR
Integer KNCNI Store Stores 8 packed 32-bit integer elements in "a" in memory locations starting at location "base_addr" at packed 64-bit integer indices stored in "vindex" scaled by "scale". FOR j := 0 to 7 i := j*32 m := j*64 addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8 MEM[addr+31:addr] := a[i+31:i] ENDFOR
Integer KNCNI Store Stores 8 packed 32-bit integer elements in "a" in memory locations starting at location "base_addr" at packed 64-bit integer indices stored in "vindex" scaled by "scale" using writemask "k" (elements are only written to memory when the corresponding mask bit is set). FOR j := 0 to 7 i := j*32 m := j*64 IF k[j] addr := base_addr + vindex[m+63:m] * ZeroExtend64(scale) * 8 MEM[addr+31:addr] := a[i+31:i] FI ENDFOR
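The gather and scatter entries above compute one address per element from a 64-bit index and a scale. The pseudocode multiplies by 8 because its MEM[] notation is indexed in bits, which is an interpretation rather than something the listing states. The C sketch below models the masked 32-bit integer forms with byte addressing: `scale` is taken to be a byte count, and the function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Scalar model of a masked gather of eight 32-bit integers.
   Assumption: element j is read from base_addr + vindex[j] * scale bytes. */
void i64gather_epi32_model(int32_t dst[8], const int32_t src[8], uint8_t k,
                           const int64_t vindex[8], const void *base_addr,
                           int scale)
{
    const unsigned char *base = (const unsigned char *)base_addr;
    for (int j = 0; j < 8; ++j) {
        if ((k >> j) & 1u)
            memcpy(&dst[j], base + vindex[j] * scale, sizeof dst[j]);
        else
            dst[j] = src[j];        /* masked-off lanes keep the src value */
    }
}

/* Scalar model of a masked scatter: masked-off lanes leave memory untouched. */
void i64scatter_epi32_model(void *base_addr, uint8_t k,
                            const int64_t vindex[8], const int32_t a[8],
                            int scale)
{
    unsigned char *base = (unsigned char *)base_addr;
    for (int j = 0; j < 8; ++j) {
        if ((k >> j) & 1u)
            memcpy(base + vindex[j] * scale, &a[j], sizeof a[j]);
    }
}
```

Using `memcpy` keeps the model free of alignment and aliasing assumptions about the memory being accessed.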
Mask KNCNI Mask Move the high element from "k1" to the low element of "k1", and insert the low element of "k2" into the high element of "k1". tmp[7:0] := k1[15:8] k1[15:8] := k2[7:0] k1[7:0] := tmp[7:0]
Mask KNCNI Mask Insert the low element of "k2" into the high element of "k1". k1[15:8] := k2[7:0]
Integer LZCNT Bit Manipulation Count the number of leading zero bits in unsigned 32-bit integer "a", and return that count in "dst". tmp := 31 dst := 0 DO WHILE (tmp >= 0 AND a[tmp] == 0) tmp := tmp - 1 dst := dst + 1 OD
Integer LZCNT Bit Manipulation Count the number of leading zero bits in unsigned 64-bit integer "a", and return that count in "dst". tmp := 63 dst := 0 DO WHILE (tmp >= 0 AND a[tmp] == 0) tmp := tmp - 1 dst := dst + 1 OD
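The leading-zero loops above translate directly into C. The sketch below is a portable scalar transcription with hypothetical function names. It is not the LZCNT intrinsic itself, and it returns the operand width for a zero input, as the pseudocode does.

```c
#include <stdint.h>

/* Count leading zero bits of a 32-bit value; returns 32 when a == 0. */
unsigned lzcnt32_model(uint32_t a)
{
    unsigned dst = 0;
    for (int tmp = 31; tmp >= 0 && ((a >> tmp) & 1u) == 0; --tmp)
        ++dst;
    return dst;
}

/* Count leading zero bits of a 64-bit value; returns 64 when a == 0. */
unsigned lzcnt64_model(uint64_t a)
{
    unsigned dst = 0;
    for (int tmp = 63; tmp >= 0 && ((a >> tmp) & 1u) == 0; --tmp)
        ++dst;
    return dst;
}
```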
Integer MMX Convert Copy 64-bit integer "a" to "dst". dst[63:0] := a[63:0]
Integer MMX Convert Copy 64-bit integer "a" to "dst". dst[63:0] := a[63:0]
MMX General Support Empty the MMX state, which marks the x87 FPU registers as available for use by x87 instructions. This instruction must be used at the end of all MMX technology procedures.
Integer MMX Convert Copy 32-bit integer "a" to the lower element of "dst", and zero the upper element of "dst". dst[31:0] := a[31:0] dst[63:32] := 0
Integer MMX Convert Copy the lower 32-bit integer in "a" to "dst". dst[31:0] := a[31:0]
Integer MMX Miscellaneous Convert packed signed 16-bit integers from "a" and "b" to packed 8-bit integers using signed saturation, and store the results in "dst". dst[7:0] := Saturate8(a[15:0]) dst[15:8] := Saturate8(a[31:16]) dst[23:16] := Saturate8(a[47:32]) dst[31:24] := Saturate8(a[63:48]) dst[39:32] := Saturate8(b[15:0]) dst[47:40] := Saturate8(b[31:16]) dst[55:48] := Saturate8(b[47:32]) dst[63:56] := Saturate8(b[63:48])
Integer MMX Miscellaneous Convert packed signed 32-bit integers from "a" and "b" to packed 16-bit integers using signed saturation, and store the results in "dst". dst[15:0] := Saturate16(a[31:0]) dst[31:16] := Saturate16(a[63:32]) dst[47:32] := Saturate16(b[31:0]) dst[63:48] := Saturate16(b[63:32])
Integer MMX Miscellaneous Convert packed signed 16-bit integers from "a" and "b" to packed 8-bit integers using unsigned saturation, and store the results in "dst". dst[7:0] := SaturateU8(a[15:0]) dst[15:8] := SaturateU8(a[31:16]) dst[23:16] := SaturateU8(a[47:32]) dst[31:24] := SaturateU8(a[63:48]) dst[39:32] := SaturateU8(b[15:0]) dst[47:40] := SaturateU8(b[31:16]) dst[55:48] := SaturateU8(b[47:32]) dst[63:56] := SaturateU8(b[63:48])
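The pack operations above narrow each element with saturation, placing the four results from "a" in the low half and the four from "b" in the high half. A scalar sketch in C of the 16-bit to 8-bit forms, with hypothetical helper and function names, is:

```c
#include <stdint.h>

/* Clamp a signed 16-bit value into the signed 8-bit range. */
int8_t saturate8_model(int16_t v)
{
    return v > 127 ? 127 : (v < -128 ? -128 : (int8_t)v);
}

/* Clamp a signed 16-bit value into the unsigned 8-bit range. */
uint8_t saturate_u8_model(int16_t v)
{
    return v > 255 ? 255 : (v < 0 ? 0 : (uint8_t)v);
}

/* Signed pack: dst[0..3] from a, dst[4..7] from b. */
void packs_pi16_model(int8_t dst[8], const int16_t a[4], const int16_t b[4])
{
    for (int j = 0; j < 4; ++j) {
        dst[j]     = saturate8_model(a[j]);
        dst[j + 4] = saturate8_model(b[j]);
    }
}

/* Unsigned pack: same layout, unsigned saturation of signed inputs. */
void packs_pu16_model(uint8_t dst[8], const int16_t a[4], const int16_t b[4])
{
    for (int j = 0; j < 4; ++j) {
        dst[j]     = saturate_u8_model(a[j]);
        dst[j + 4] = saturate_u8_model(b[j]);
    }
}
```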
Integer MMX Swizzle Unpack and interleave 8-bit integers from the high half of "a" and "b", and store the results in "dst". DEFINE INTERLEAVE_HIGH_BYTES(src1[63:0], src2[63:0]) { dst[7:0] := src1[39:32] dst[15:8] := src2[39:32] dst[23:16] := src1[47:40] dst[31:24] := src2[47:40] dst[39:32] := src1[55:48] dst[47:40] := src2[55:48] dst[55:48] := src1[63:56] dst[63:56] := src2[63:56] RETURN dst[63:0] } dst[63:0] := INTERLEAVE_HIGH_BYTES(a[63:0], b[63:0])
Integer MMX Swizzle Unpack and interleave 16-bit integers from the high half of "a" and "b", and store the results in "dst". DEFINE INTERLEAVE_HIGH_WORDS(src1[63:0], src2[63:0]) { dst[15:0] := src1[47:32] dst[31:16] := src2[47:32] dst[47:32] := src1[63:48] dst[63:48] := src2[63:48] RETURN dst[63:0] } dst[63:0] := INTERLEAVE_HIGH_WORDS(a[63:0], b[63:0])
Integer MMX Swizzle Unpack and interleave 32-bit integers from the high half of "a" and "b", and store the results in "dst". dst[31:0] := a[63:32] dst[63:32] := b[63:32]
Integer MMX Swizzle Unpack and interleave 8-bit integers from the low half of "a" and "b", and store the results in "dst". DEFINE INTERLEAVE_BYTES(src1[63:0], src2[63:0]) { dst[7:0] := src1[7:0] dst[15:8] := src2[7:0] dst[23:16] := src1[15:8] dst[31:24] := src2[15:8] dst[39:32] := src1[23:16] dst[47:40] := src2[23:16] dst[55:48] := src1[31:24] dst[63:56] := src2[31:24] RETURN dst[63:0] } dst[63:0] := INTERLEAVE_BYTES(a[63:0], b[63:0])
Integer MMX Swizzle Unpack and interleave 16-bit integers from the low half of "a" and "b", and store the results in "dst". DEFINE INTERLEAVE_WORDS(src1[63:0], src2[63:0]) { dst[15:0] := src1[15:0] dst[31:16] := src2[15:0] dst[47:32] := src1[31:16] dst[63:48] := src2[31:16] RETURN dst[63:0] } dst[63:0] := INTERLEAVE_WORDS(a[63:0], b[63:0])
Integer MMX Swizzle Unpack and interleave 32-bit integers from the low half of "a" and "b", and store the results in "dst". dst[31:0] := a[31:0] dst[63:32] := b[31:0]
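The unpack entries above interleave elements from one half of "a" and "b", with the element from "a" first in each pair. A scalar C sketch of the two 8-bit forms, using hypothetical function names and byte arrays in place of a 64-bit register, is:

```c
#include <stdint.h>

/* Interleave the low four bytes of a and b: a0 b0 a1 b1 a2 b2 a3 b3. */
void unpacklo_pi8_model(uint8_t dst[8], const uint8_t a[8], const uint8_t b[8])
{
    for (int j = 0; j < 4; ++j) {
        dst[2 * j]     = a[j];
        dst[2 * j + 1] = b[j];
    }
}

/* Interleave the high four bytes of a and b: a4 b4 a5 b5 a6 b6 a7 b7. */
void unpackhi_pi8_model(uint8_t dst[8], const uint8_t a[8], const uint8_t b[8])
{
    for (int j = 0; j < 4; ++j) {
        dst[2 * j]     = a[4 + j];
        dst[2 * j + 1] = b[4 + j];
    }
}
```

The 16-bit and 32-bit forms follow the same pattern with wider elements and fewer pairs.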
Integer MMX Arithmetic Add packed 8-bit integers in "a" and "b", and store the results in "dst". FOR j := 0 to 7 i := j*8 dst[i+7:i] := a[i+7:i] + b[i+7:i] ENDFOR
Integer MMX Arithmetic Add packed 16-bit integers in "a" and "b", and store the results in "dst". FOR j := 0 to 3 i := j*16 dst[i+15:i] := a[i+15:i] + b[i+15:i] ENDFOR
Integer MMX Arithmetic Add packed 32-bit integers in "a" and "b", and store the results in "dst". FOR j := 0 to 1 i := j*32 dst[i+31:i] := a[i+31:i] + b[i+31:i] ENDFOR
Integer MMX Arithmetic Add packed signed 8-bit integers in "a" and "b" using saturation, and store the results in "dst". FOR j := 0 to 7 i := j*8 dst[i+7:i] := Saturate8( a[i+7:i] + b[i+7:i] ) ENDFOR
Integer MMX Arithmetic Add packed signed 16-bit integers in "a" and "b" using saturation, and store the results in "dst". FOR j := 0 to 3 i := j*16 dst[i+15:i] := Saturate16( a[i+15:i] + b[i+15:i] ) ENDFOR
Integer MMX Arithmetic Add packed unsigned 8-bit integers in "a" and "b" using saturation, and store the results in "dst". FOR j := 0 to 7 i := j*8 dst[i+7:i] := SaturateU8( a[i+7:i] + b[i+7:i] ) ENDFOR
Integer MMX Arithmetic Add packed unsigned 16-bit integers in "a" and "b" using saturation, and store the results in "dst". FOR j := 0 to 3 i := j*16 dst[i+15:i] := SaturateU16( a[i+15:i] + b[i+15:i] ) ENDFOR
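The saturating additions above compute the sum in a wider type and then clamp it to the element range instead of wrapping. A per-element C sketch for the 8-bit cases, with hypothetical function names, is:

```c
#include <stdint.h>

/* Signed saturating add of two 8-bit elements: clamp the sum to [-128, 127]. */
int8_t adds_i8_elem_model(int8_t a, int8_t b)
{
    int sum = (int)a + (int)b;      /* exact in int, no wraparound */
    if (sum > 127)  return 127;
    if (sum < -128) return -128;
    return (int8_t)sum;
}

/* Unsigned saturating add of two 8-bit elements: clamp the sum to [0, 255]. */
uint8_t adds_u8_elem_model(uint8_t a, uint8_t b)
{
    unsigned sum = (unsigned)a + (unsigned)b;
    return sum > 255u ? 255u : (uint8_t)sum;
}

/* Packed form: apply the signed element model to all eight lanes. */
void adds_pi8_model(int8_t dst[8], const int8_t a[8], const int8_t b[8])
{
    for (int j = 0; j < 8; ++j)
        dst[j] = adds_i8_elem_model(a[j], b[j]);
}
```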
Floating Point Integer MMX Arithmetic Subtract packed 8-bit integers in "b" from packed 8-bit integers in "a", and store the results in "dst". FOR j := 0 to 7 i := j*8 dst[i+7:i] := a[i+7:i] - b[i+7:i] ENDFOR
Floating Point Integer MMX Arithmetic Subtract packed 16-bit integers in "b" from packed 16-bit integers in "a", and store the results in "dst". FOR j := 0 to 3 i := j*16 dst[i+15:i] := a[i+15:i] - b[i+15:i] ENDFOR
Floating Point Integer MMX Arithmetic Subtract packed 32-bit integers in "b" from packed 32-bit integers in "a", and store the results in "dst". FOR j := 0 to 1 i := j*32 dst[i+31:i] := a[i+31:i] - b[i+31:i] ENDFOR
Floating Point Integer MMX Arithmetic Subtract packed signed 8-bit integers in "b" from packed signed 8-bit integers in "a" using saturation, and store the results in "dst". FOR j := 0 to 7 i := j*8 dst[i+7:i] := Saturate8(a[i+7:i] - b[i+7:i]) ENDFOR
Floating Point Integer MMX Arithmetic Subtract packed signed 16-bit integers in "b" from packed signed 16-bit integers in "a" using saturation, and store the results in "dst". FOR j := 0 to 3 i := j*16 dst[i+15:i] := Saturate16(a[i+15:i] - b[i+15:i]) ENDFOR
Floating Point Integer MMX Arithmetic Subtract packed unsigned 8-bit integers in "b" from packed unsigned 8-bit integers in "a" using saturation, and store the results in "dst". FOR j := 0 to 7 i := j*8 dst[i+7:i] := SaturateU8(a[i+7:i] - b[i+7:i]) ENDFOR
Floating Point Integer MMX Arithmetic Subtract packed unsigned 16-bit integers in "b" from packed unsigned 16-bit integers in "a" using saturation, and store the results in "dst". FOR j := 0 to 3 i := j*16 dst[i+15:i] := SaturateU16(a[i+15:i] - b[i+15:i]) ENDFOR
Integer MMX Arithmetic Multiply packed signed 16-bit integers in "a" and "b", producing intermediate signed 32-bit integers. Horizontally add adjacent pairs of intermediate 32-bit integers, and pack the results in "dst". FOR j := 0 to 1 i := j*32 dst[i+31:i] := SignExtend32(a[i+31:i+16]*b[i+31:i+16]) + SignExtend32(a[i+15:i]*b[i+15:i]) ENDFOR
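The multiply-and-add entry above forms four 32-bit products and sums them in adjacent pairs. A scalar C sketch with the hypothetical name `madd_pi16_model` follows. It adds the two products with unsigned arithmetic so that the single overflowing case (all four inputs equal to -32768) wraps to 0x80000000 rather than invoking undefined behaviour, and it assumes the usual two's-complement conversion back to `int32_t`.

```c
#include <stdint.h>

/* Scalar model: dst[j] = a[2j]*b[2j] + a[2j+1]*b[2j+1], with 32-bit wraparound. */
void madd_pi16_model(int32_t dst[2], const int16_t a[4], const int16_t b[4])
{
    for (int j = 0; j < 2; ++j) {
        int32_t lo = (int32_t)a[2 * j]     * (int32_t)b[2 * j];
        int32_t hi = (int32_t)a[2 * j + 1] * (int32_t)b[2 * j + 1];
        /* Unsigned addition wraps modulo 2^32, matching a 32-bit register. */
        dst[j] = (int32_t)((uint32_t)lo + (uint32_t)hi);
    }
}
```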
Integer MMX Arithmetic Multiply the packed signed 16-bit integers in "a" and "b", producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in "dst". FOR j := 0 to 3 i := j*16 tmp[31:0] := SignExtend32(a[i+15:i]) * SignExtend32(b[i+15:i]) dst[i+15:i] := tmp[31:16] ENDFOR
Integer MMX Arithmetic Multiply the packed 16-bit integers in "a" and "b", producing intermediate 32-bit integers, and store the low 16 bits of the intermediate integers in "dst". FOR j := 0 to 3 i := j*16 tmp[31:0] := a[i+15:i] * b[i+15:i] dst[i+15:i] := tmp[15:0] ENDFOR
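The high and low multiply entries above keep different halves of the same 32-bit product. A per-element C sketch with hypothetical function names is given below. It returns raw 16-bit patterns as `uint16_t` to avoid implementation-defined signed conversions.

```c
#include <stdint.h>

/* Upper 16 bits of the signed 32-bit product, returned as a bit pattern. */
uint16_t mulhi_i16_elem_model(int16_t a, int16_t b)
{
    int32_t p = (int32_t)a * (int32_t)b;
    return (uint16_t)((uint32_t)p >> 16);
}

/* Lower 16 bits of the product; identical for signed and unsigned inputs. */
uint16_t mullo_i16_elem_model(int16_t a, int16_t b)
{
    int32_t p = (int32_t)a * (int32_t)b;
    return (uint16_t)((uint32_t)p & 0xFFFFu);
}
```

Combining the two halves as `(hi << 16) | lo` reconstructs the full 32-bit product of the pair.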
Floating Point Integer MMX Shift Shift packed 16-bit integers in "a" left by "count" while shifting in zeros, and store the results in "dst". FOR j := 0 to 3 i := j*16 IF count[63:0] > 15 dst[i+15:i] := 0 ELSE dst[i+15:i] := ZeroExtend16(a[i+15:i] << count[63:0]) FI ENDFOR
Floating Point Integer MMX Shift Shift packed 16-bit integers in "a" left by "imm8" while shifting in zeros, and store the results in "dst". FOR j := 0 to 3 i := j*16 IF imm8[7:0] > 15 dst[i+15:i] := 0 ELSE dst[i+15:i] := ZeroExtend16(a[i+15:i] << imm8[7:0]) FI ENDFOR
Floating Point Integer MMX Shift Shift packed 32-bit integers in "a" left by "count" while shifting in zeros, and store the results in "dst". FOR j := 0 to 1 i := j*32 IF count[63:0] > 31 dst[i+31:i] := 0 ELSE dst[i+31:i] := ZeroExtend32(a[i+31:i] << count[63:0]) FI ENDFOR
Floating Point Integer MMX Shift Shift packed 32-bit integers in "a" left by "imm8" while shifting in zeros, and store the results in "dst". FOR j := 0 to 1 i := j*32 IF imm8[7:0] > 31 dst[i+31:i] := 0 ELSE dst[i+31:i] := ZeroExtend32(a[i+31:i] << imm8[7:0]) FI ENDFOR
Floating Point Integer MMX Shift Shift 64-bit integer "a" left by "count" while shifting in zeros, and store the result in "dst". IF count[63:0] > 63 dst[63:0] := 0 ELSE dst[63:0] := ZeroExtend64(a[63:0] << count[63:0]) FI
Floating Point Integer MMX Shift Shift 64-bit integer "a" left by "imm8" while shifting in zeros, and store the result in "dst". IF imm8[7:0] > 63 dst[63:0] := 0 ELSE dst[63:0] := ZeroExtend64(a[63:0] << imm8[7:0]) FI
Floating Point Integer MMX Shift Shift packed 16-bit integers in "a" right by "count" while shifting in sign bits, and store the results in "dst". FOR j := 0 to 3 i := j*16 IF count[63:0] > 15 dst[i+15:i] := (a[i+15] ? 0xFFFF : 0x0) ELSE dst[i+15:i] := SignExtend16(a[i+15:i] >> count[63:0]) FI ENDFOR
Floating Point Integer MMX Shift Shift packed 16-bit integers in "a" right by "imm8" while shifting in sign bits, and store the results in "dst". FOR j := 0 to 3 i := j*16 IF imm8[7:0] > 15 dst[i+15:i] := (a[i+15] ? 0xFFFF : 0x0) ELSE dst[i+15:i] := SignExtend16(a[i+15:i] >> imm8[7:0]) FI ENDFOR
Floating Point Integer MMX Shift Shift packed 32-bit integers in "a" right by "count" while shifting in sign bits, and store the results in "dst". FOR j := 0 to 1 i := j*32 IF count[63:0] > 31 dst[i+31:i] := (a[i+31] ? 0xFFFFFFFF : 0x0) ELSE dst[i+31:i] := SignExtend32(a[i+31:i] >> count[63:0]) FI ENDFOR
Floating Point Integer MMX Shift Shift packed 32-bit integers in "a" right by "imm8" while shifting in sign bits, and store the results in "dst". FOR j := 0 to 1 i := j*32 IF imm8[7:0] > 31 dst[i+31:i] := (a[i+31] ? 0xFFFFFFFF : 0x0) ELSE dst[i+31:i] := SignExtend32(a[i+31:i] >> imm8[7:0]) FI ENDFOR
Floating Point Integer MMX Shift Shift packed 16-bit integers in "a" right by "count" while shifting in zeros, and store the results in "dst". FOR j := 0 to 3 i := j*16 IF count[63:0] > 15 dst[i+15:i] := 0 ELSE dst[i+15:i] := ZeroExtend16(a[i+15:i] >> count[63:0]) FI ENDFOR
Floating Point Integer MMX Shift Shift packed 16-bit integers in "a" right by "imm8" while shifting in zeros, and store the results in "dst". FOR j := 0 to 3 i := j*16 IF imm8[7:0] > 15 dst[i+15:i] := 0 ELSE dst[i+15:i] := ZeroExtend16(a[i+15:i] >> imm8[7:0]) FI ENDFOR
Floating Point Integer MMX Shift Shift packed 32-bit integers in "a" right by "count" while shifting in zeros, and store the results in "dst". FOR j := 0 to 1 i := j*32 IF count[63:0] > 31 dst[i+31:i] := 0 ELSE dst[i+31:i] := ZeroExtend32(a[i+31:i] >> count[63:0]) FI ENDFOR
Floating Point Integer MMX Shift Shift packed 32-bit integers in "a" right by "imm8" while shifting in zeros, and store the results in "dst". FOR j := 0 to 1 i := j*32 IF imm8[7:0] > 31 dst[i+31:i] := 0 ELSE dst[i+31:i] := ZeroExtend32(a[i+31:i] >> imm8[7:0]) FI ENDFOR
Floating Point Integer MMX Shift Shift 64-bit integer "a" right by "count" while shifting in zeros, and store the result in "dst". IF count[63:0] > 63 dst[63:0] := 0 ELSE dst[63:0] := ZeroExtend64(a[63:0] >> count[63:0]) FI
Floating Point Integer MMX Shift Shift 64-bit integer "a" right by "imm8" while shifting in zeros, and store the result in "dst". IF imm8[7:0] > 63 dst[63:0] := 0 ELSE dst[63:0] := ZeroExtend64(a[63:0] >> imm8[7:0]) FI
Integer MMX Logical Compute the bitwise AND of 64 bits (representing integer data) in "a" and "b", and store the result in "dst". dst[63:0] := (a[63:0] AND b[63:0])
Integer MMX Logical Compute the bitwise NOT of 64 bits (representing integer data) in "a" and then AND with "b", and store the result in "dst". dst[63:0] := ((NOT a[63:0]) AND b[63:0])
Integer MMX Logical Compute the bitwise OR of 64 bits (representing integer data) in "a" and "b", and store the result in "dst". dst[63:0] := (a[63:0] OR b[63:0])
Integer MMX Logical Compute the bitwise XOR of 64 bits (representing integer data) in "a" and "b", and store the result in "dst". dst[63:0] := (a[63:0] XOR b[63:0])
Integer MMX Compare Compare packed 8-bit integers in "a" and "b" for equality, and store the results in "dst". FOR j := 0 to 7 i := j*8 dst[i+7:i] := ( a[i+7:i] == b[i+7:i] ) ? 0xFF : 0 ENDFOR
Integer MMX Compare Compare packed 16-bit integers in "a" and "b" for equality, and store the results in "dst". FOR j := 0 to 3 i := j*16 dst[i+15:i] := ( a[i+15:i] == b[i+15:i] ) ? 0xFFFF : 0 ENDFOR
Integer MMX Compare Compare packed 32-bit integers in "a" and "b" for equality, and store the results in "dst". FOR j := 0 to 1 i := j*32 dst[i+31:i] := ( a[i+31:i] == b[i+31:i] ) ? 0xFFFFFFFF : 0 ENDFOR
Integer MMX Compare Compare packed signed 8-bit integers in "a" and "b" for greater-than, and store the results in "dst". FOR j := 0 to 7 i := j*8 dst[i+7:i] := ( a[i+7:i] > b[i+7:i] ) ? 0xFF : 0 ENDFOR
Integer MMX Compare Compare packed signed 16-bit integers in "a" and "b" for greater-than, and store the results in "dst". FOR j := 0 to 3 i := j*16 dst[i+15:i] := ( a[i+15:i] > b[i+15:i] ) ? 0xFFFF : 0 ENDFOR
Integer MMX Compare Compare packed signed 32-bit integers in "a" and "b" for greater-than, and store the results in "dst". FOR j := 0 to 1 i := j*32 dst[i+31:i] := ( a[i+31:i] > b[i+31:i] ) ? 0xFFFFFFFF : 0 ENDFOR
MMX General Support Empty the MMX state, which marks the x87 FPU registers as available for use by x87 instructions. This instruction must be used at the end of all MMX technology procedures.
Integer MMX Arithmetic Add packed 8-bit integers in "a" and "b", and store the results in "dst". FOR j := 0 to 7 i := j*8 dst[i+7:i] := a[i+7:i] + b[i+7:i] ENDFOR
Integer MMX Arithmetic Add packed 16-bit integers in "a" and "b", and store the results in "dst". FOR j := 0 to 3 i := j*16 dst[i+15:i] := a[i+15:i] + b[i+15:i] ENDFOR
Integer MMX Arithmetic Add packed 32-bit integers in "a" and "b", and store the results in "dst". FOR j := 0 to 1 i := j*32 dst[i+31:i] := a[i+31:i] + b[i+31:i] ENDFOR
Integer MMX Arithmetic Add packed signed 8-bit integers in "a" and "b" using saturation, and store the results in "dst". FOR j := 0 to 7 i := j*8 dst[i+7:i] := Saturate8( a[i+7:i] + b[i+7:i] ) ENDFOR
Integer MMX Arithmetic Add packed signed 16-bit integers in "a" and "b" using saturation, and store the results in "dst". FOR j := 0 to 3 i := j*16 dst[i+15:i] := Saturate16( a[i+15:i] + b[i+15:i] ) ENDFOR
Integer MMX Arithmetic Add packed unsigned 8-bit integers in "a" and "b" using saturation, and store the results in "dst". FOR j := 0 to 7 i := j*8 dst[i+7:i] := SaturateU8( a[i+7:i] + b[i+7:i] ) ENDFOR
Integer MMX Arithmetic Add packed unsigned 16-bit integers in "a" and "b" using saturation, and store the results in "dst". FOR j := 0 to 3 i := j*16 dst[i+15:i] := SaturateU16( a[i+15:i] + b[i+15:i] ) ENDFOR
Integer MMX Arithmetic Subtract packed 8-bit integers in "b" from packed 8-bit integers in "a", and store the results in "dst". FOR j := 0 to 7 i := j*8 dst[i+7:i] := a[i+7:i] - b[i+7:i] ENDFOR
Integer MMX Arithmetic Subtract packed 16-bit integers in "b" from packed 16-bit integers in "a", and store the results in "dst". FOR j := 0 to 3 i := j*16 dst[i+15:i] := a[i+15:i] - b[i+15:i] ENDFOR
Integer MMX Arithmetic Subtract packed 32-bit integers in "b" from packed 32-bit integers in "a", and store the results in "dst". FOR j := 0 to 1 i := j*32 dst[i+31:i] := a[i+31:i] - b[i+31:i] ENDFOR
Integer MMX Arithmetic Subtract packed signed 8-bit integers in "b" from packed signed 8-bit integers in "a" using saturation, and store the results in "dst". FOR j := 0 to 7 i := j*8 dst[i+7:i] := Saturate8(a[i+7:i] - b[i+7:i]) ENDFOR
Integer MMX Arithmetic Subtract packed signed 16-bit integers in "b" from packed signed 16-bit integers in "a" using saturation, and store the results in "dst". FOR j := 0 to 3 i := j*16 dst[i+15:i] := Saturate16(a[i+15:i] - b[i+15:i]) ENDFOR
Integer MMX Arithmetic Subtract packed unsigned 8-bit integers in "b" from packed unsigned 8-bit integers in "a" using saturation, and store the results in "dst". FOR j := 0 to 7 i := j*8 dst[i+7:i] := SaturateU8(a[i+7:i] - b[i+7:i]) ENDFOR
Integer MMX Arithmetic Subtract packed unsigned 16-bit integers in "b" from packed unsigned 16-bit integers in "a" using saturation, and store the results in "dst". FOR j := 0 to 3 i := j*16 dst[i+15:i] := SaturateU16(a[i+15:i] - b[i+15:i]) ENDFOR
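The saturating add and subtract entries above can be sketched in portable C. This is a scalar model only, not the intrinsics themselves: a `uint64_t` stands in for `__m64`, and the names `saturate16`, `saturate_u16`, `model_adds_pi16` and `model_subs_pu16` are hypothetical. It assumes the usual two's-complement behaviour when converting `uint16_t` to `int16_t`.

```c
#include <stdint.h>

/* Clamp a wide intermediate to the signed 16-bit range (Saturate16). */
static int16_t saturate16(int32_t x) {
    if (x > INT16_MAX) return INT16_MAX;
    if (x < INT16_MIN) return INT16_MIN;
    return (int16_t)x;
}

/* Clamp a wide intermediate to the unsigned 16-bit range (SaturateU16). */
static uint16_t saturate_u16(int32_t x) {
    if (x > UINT16_MAX) return UINT16_MAX;
    if (x < 0) return 0;
    return (uint16_t)x;
}

/* Signed saturating add over four 16-bit lanes packed in a uint64_t. */
static uint64_t model_adds_pi16(uint64_t a, uint64_t b) {
    uint64_t dst = 0;
    for (int j = 0; j < 4; j++) {
        int i = j * 16;
        int16_t x = (int16_t)(uint16_t)(a >> i);
        int16_t y = (int16_t)(uint16_t)(b >> i);
        dst |= (uint64_t)(uint16_t)saturate16((int32_t)x + y) << i;
    }
    return dst;
}

/* Unsigned saturating subtract (a - b) over four 16-bit lanes. */
static uint64_t model_subs_pu16(uint64_t a, uint64_t b) {
    uint64_t dst = 0;
    for (int j = 0; j < 4; j++) {
        int i = j * 16;
        uint16_t x = (uint16_t)(a >> i);
        uint16_t y = (uint16_t)(b >> i);
        dst |= (uint64_t)saturate_u16((int32_t)x - (int32_t)y) << i;
    }
    return dst;
}
```

The point of the wide intermediate is that the sum or difference is computed without wrapping first, then clamped; an unsigned lane that would go negative becomes 0 rather than wrapping around.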
Integer MMX Arithmetic Multiply packed signed 16-bit integers in "a" and "b", producing intermediate signed 32-bit integers. Horizontally add adjacent pairs of intermediate 32-bit integers, and pack the results in "dst". FOR j := 0 to 1 i := j*32 dst[i+31:i] := SignExtend32(a[i+31:i+16]*b[i+31:i+16]) + SignExtend32(a[i+15:i]*b[i+15:i]) ENDFOR
Integer MMX Arithmetic Multiply the packed signed 16-bit integers in "a" and "b", producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in "dst". FOR j := 0 to 3 i := j*16 tmp[31:0] := SignExtend32(a[i+15:i]) * SignExtend32(b[i+15:i]) dst[i+15:i] := tmp[31:16] ENDFOR
Integer MMX Arithmetic Multiply the packed 16-bit integers in "a" and "b", producing intermediate 32-bit integers, and store the low 16 bits of the intermediate integers in "dst". FOR j := 0 to 3 i := j*16 tmp[31:0] := a[i+15:i] * b[i+15:i] dst[i+15:i] := tmp[15:0] ENDFOR
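A rough scalar model of the three multiply entries above (multiply-add of adjacent pairs, high half, low half) may make the lane layout easier to follow. A `uint64_t` stands in for `__m64`, and `lane16`, `model_madd_pi16`, `model_mulhi_pi16` and `model_mullo_pi16` are hypothetical names chosen for this sketch.

```c
#include <stdint.h>

/* Extract 16-bit lane j as a signed value. */
static int16_t lane16(uint64_t v, int j) {
    return (int16_t)(uint16_t)(v >> (16 * j));
}

/* Multiply 16-bit lanes, then add adjacent pairs into two 32-bit lanes. */
static uint64_t model_madd_pi16(uint64_t a, uint64_t b) {
    uint64_t dst = 0;
    for (int j = 0; j < 2; j++) {
        int32_t lo = (int32_t)lane16(a, 2 * j) * lane16(b, 2 * j);
        int32_t hi = (int32_t)lane16(a, 2 * j + 1) * lane16(b, 2 * j + 1);
        /* Add as unsigned so the sum wraps to 32 bits instead of overflowing. */
        uint32_t sum = (uint32_t)lo + (uint32_t)hi;
        dst |= (uint64_t)sum << (32 * j);
    }
    return dst;
}

/* Keep the high 16 bits of each signed 32-bit product. */
static uint64_t model_mulhi_pi16(uint64_t a, uint64_t b) {
    uint64_t dst = 0;
    for (int j = 0; j < 4; j++) {
        int32_t p = (int32_t)lane16(a, j) * lane16(b, j);
        dst |= (uint64_t)(uint16_t)((uint32_t)p >> 16) << (16 * j);
    }
    return dst;
}

/* Keep the low 16 bits of each 32-bit product. */
static uint64_t model_mullo_pi16(uint64_t a, uint64_t b) {
    uint64_t dst = 0;
    for (int j = 0; j < 4; j++) {
        int32_t p = (int32_t)lane16(a, j) * lane16(b, j);
        dst |= (uint64_t)(uint16_t)(uint32_t)p << (16 * j);
    }
    return dst;
}
```

Combining the high-half and low-half results lane by lane recovers the full 32-bit products, which is the usual reason the two operations are used together.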
Integer MMX Shift Shift packed 16-bit integers in "a" left by "count" while shifting in zeros, and store the results in "dst". FOR j := 0 to 3 i := j*16 IF count[63:0] > 15 dst[i+15:i] := 0 ELSE dst[i+15:i] := ZeroExtend16(a[i+15:i] << count[63:0]) FI ENDFOR
Integer MMX Shift Shift packed 16-bit integers in "a" left by "imm8" while shifting in zeros, and store the results in "dst". FOR j := 0 to 3 i := j*16 IF imm8[7:0] > 15 dst[i+15:i] := 0 ELSE dst[i+15:i] := ZeroExtend16(a[i+15:i] << imm8[7:0]) FI ENDFOR
Integer MMX Shift Shift packed 32-bit integers in "a" left by "count" while shifting in zeros, and store the results in "dst". FOR j := 0 to 1 i := j*32 IF count[63:0] > 31 dst[i+31:i] := 0 ELSE dst[i+31:i] := ZeroExtend32(a[i+31:i] << count[63:0]) FI ENDFOR
Integer MMX Shift Shift packed 32-bit integers in "a" left by "imm8" while shifting in zeros, and store the results in "dst". FOR j := 0 to 1 i := j*32 IF imm8[7:0] > 31 dst[i+31:i] := 0 ELSE dst[i+31:i] := ZeroExtend32(a[i+31:i] << imm8[7:0]) FI ENDFOR
Integer MMX Shift Shift 64-bit integer "a" left by "count" while shifting in zeros, and store the result in "dst". IF count[63:0] > 63 dst[63:0] := 0 ELSE dst[63:0] := ZeroExtend64(a[63:0] << count[63:0]) FI
Integer MMX Shift Shift 64-bit integer "a" left by "imm8" while shifting in zeros, and store the result in "dst". IF imm8[7:0] > 63 dst[63:0] := 0 ELSE dst[63:0] := ZeroExtend64(a[63:0] << imm8[7:0]) FI
Integer MMX Shift Shift packed 16-bit integers in "a" right by "count" while shifting in sign bits, and store the results in "dst". FOR j := 0 to 3 i := j*16 IF count[63:0] > 15 dst[i+15:i] := (a[i+15] ? 0xFFFF : 0x0) ELSE dst[i+15:i] := SignExtend16(a[i+15:i] >> count[63:0]) FI ENDFOR
Integer MMX Shift Shift packed 16-bit integers in "a" right by "imm8" while shifting in sign bits, and store the results in "dst". FOR j := 0 to 3 i := j*16 IF imm8[7:0] > 15 dst[i+15:i] := (a[i+15] ? 0xFFFF : 0x0) ELSE dst[i+15:i] := SignExtend16(a[i+15:i] >> imm8[7:0]) FI ENDFOR
Integer MMX Shift Shift packed 32-bit integers in "a" right by "count" while shifting in sign bits, and store the results in "dst". FOR j := 0 to 1 i := j*32 IF count[63:0] > 31 dst[i+31:i] := (a[i+31] ? 0xFFFFFFFF : 0x0) ELSE dst[i+31:i] := SignExtend32(a[i+31:i] >> count[63:0]) FI ENDFOR
Integer MMX Shift Shift packed 32-bit integers in "a" right by "imm8" while shifting in sign bits, and store the results in "dst". FOR j := 0 to 1 i := j*32 IF imm8[7:0] > 31 dst[i+31:i] := (a[i+31] ? 0xFFFFFFFF : 0x0) ELSE dst[i+31:i] := SignExtend32(a[i+31:i] >> imm8[7:0]) FI ENDFOR
Integer MMX Shift Shift packed 16-bit integers in "a" right by "count" while shifting in zeros, and store the results in "dst". FOR j := 0 to 3 i := j*16 IF count[63:0] > 15 dst[i+15:i] := 0 ELSE dst[i+15:i] := ZeroExtend16(a[i+15:i] >> count[63:0]) FI ENDFOR
Integer MMX Shift Shift packed 16-bit integers in "a" right by "imm8" while shifting in zeros, and store the results in "dst". FOR j := 0 to 3 i := j*16 IF imm8[7:0] > 15 dst[i+15:i] := 0 ELSE dst[i+15:i] := ZeroExtend16(a[i+15:i] >> imm8[7:0]) FI ENDFOR
Integer MMX Shift Shift packed 32-bit integers in "a" right by "count" while shifting in zeros, and store the results in "dst". FOR j := 0 to 1 i := j*32 IF count[63:0] > 31 dst[i+31:i] := 0 ELSE dst[i+31:i] := ZeroExtend32(a[i+31:i] >> count[63:0]) FI ENDFOR
Integer MMX Shift Shift packed 32-bit integers in "a" right by "imm8" while shifting in zeros, and store the results in "dst". FOR j := 0 to 1 i := j*32 IF imm8[7:0] > 31 dst[i+31:i] := 0 ELSE dst[i+31:i] := ZeroExtend32(a[i+31:i] >> imm8[7:0]) FI ENDFOR
Integer MMX Shift Shift 64-bit integer "a" right by "count" while shifting in zeros, and store the result in "dst". IF count[63:0] > 63 dst[63:0] := 0 ELSE dst[63:0] := ZeroExtend64(a[63:0] >> count[63:0]) FI
Integer MMX Shift Shift 64-bit integer "a" right by "imm8" while shifting in zeros, and store the result in "dst". IF imm8[7:0] > 63 dst[63:0] := 0 ELSE dst[63:0] := ZeroExtend64(a[63:0] >> imm8[7:0]) FI
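Unlike the C shift operators, the packed shifts above are defined for any count: an out-of-range logical shift yields 0, and an out-of-range arithmetic right shift fills each lane with its sign bit. The sketch below models the 16-bit variants on a `uint64_t`; `model_sll_pi16` and `model_sra_pi16` are hypothetical names, and the arithmetic shift is written so it does not rely on implementation-defined `>>` of negative values.

```c
#include <stdint.h>

/* Logical left shift of four 16-bit lanes; counts above 15 give 0. */
static uint64_t model_sll_pi16(uint64_t a, uint64_t count) {
    if (count > 15) return 0;
    uint64_t dst = 0;
    for (int j = 0; j < 4; j++) {
        int i = j * 16;
        uint16_t lane = (uint16_t)(a >> i);
        dst |= (uint64_t)(uint16_t)((uint32_t)lane << count) << i;
    }
    return dst;
}

/* Arithmetic right shift of four 16-bit lanes, shifting in sign bits. */
static uint64_t model_sra_pi16(uint64_t a, uint64_t count) {
    /* Shifting by 15 already fills the lane with the sign bit, so
       clamping reproduces the "count > 15" branch of the pseudocode. */
    int c = count > 15 ? 15 : (int)count;
    uint64_t dst = 0;
    for (int j = 0; j < 4; j++) {
        int i = j * 16;
        int32_t lane = (int16_t)(uint16_t)(a >> i);
        /* For negative lanes, shift the complement and complement back. */
        int32_t shifted = lane < 0 ? ~(~lane >> c) : lane >> c;
        dst |= (uint64_t)(uint16_t)shifted << i;
    }
    return dst;
}
```

Note that the whole 64-bit count is compared, so a count such as 16 or 2^40 behaves the same way; there is no masking to the low bits as with scalar x86 shifts.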
Integer MMX Logical Compute the bitwise AND of 64 bits (representing integer data) in "a" and "b", and store the result in "dst". dst[63:0] := (a[63:0] AND b[63:0])
Integer MMX Logical Compute the bitwise NOT of 64 bits (representing integer data) in "a" and then AND with "b", and store the result in "dst". dst[63:0] := ((NOT a[63:0]) AND b[63:0])
Integer MMX Logical Compute the bitwise OR of 64 bits (representing integer data) in "a" and "b", and store the result in "dst". dst[63:0] := (a[63:0] OR b[63:0])
Integer MMX Logical Compute the bitwise XOR of 64 bits (representing integer data) in "a" and "b", and store the result in "dst". dst[63:0] := (a[63:0] XOR b[63:0])
Integer MMX Compare Compare packed 8-bit integers in "a" and "b" for equality, and store the results in "dst". FOR j := 0 to 7 i := j*8 dst[i+7:i] := ( a[i+7:i] == b[i+7:i] ) ? 0xFF : 0 ENDFOR
Integer MMX Compare Compare packed 16-bit integers in "a" and "b" for equality, and store the results in "dst". FOR j := 0 to 3 i := j*16 dst[i+15:i] := ( a[i+15:i] == b[i+15:i] ) ? 0xFFFF : 0 ENDFOR
Integer MMX Compare Compare packed 32-bit integers in "a" and "b" for equality, and store the results in "dst". FOR j := 0 to 1 i := j*32 dst[i+31:i] := ( a[i+31:i] == b[i+31:i] ) ? 0xFFFFFFFF : 0 ENDFOR
Integer MMX Compare Compare packed signed 8-bit integers in "a" and "b" for greater-than, and store the results in "dst". FOR j := 0 to 7 i := j*8 dst[i+7:i] := ( a[i+7:i] > b[i+7:i] ) ? 0xFF : 0 ENDFOR
Integer MMX Compare Compare packed signed 16-bit integers in "a" and "b" for greater-than, and store the results in "dst". FOR j := 0 to 3 i := j*16 dst[i+15:i] := ( a[i+15:i] > b[i+15:i] ) ? 0xFFFF : 0 ENDFOR
Integer MMX Compare Compare packed signed 32-bit integers in "a" and "b" for greater-than, and store the results in "dst". FOR j := 0 to 1 i := j*32 dst[i+31:i] := ( a[i+31:i] > b[i+31:i] ) ? 0xFFFFFFFF : 0 ENDFOR
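The compare entries produce all-ones or all-zeros lanes, which are meant to be combined with the AND, AND-NOT and OR operations listed earlier to select values without branching. A minimal scalar sketch of that idiom, computing a per-lane signed maximum, follows. `model_cmpgt_pi16` and `model_max_pi16` are hypothetical names, and a `uint64_t` stands in for `__m64`.

```c
#include <stdint.h>

/* Per-lane signed greater-than: 0xFFFF where a > b, else 0. */
static uint64_t model_cmpgt_pi16(uint64_t a, uint64_t b) {
    uint64_t dst = 0;
    for (int j = 0; j < 4; j++) {
        int i = j * 16;
        int16_t x = (int16_t)(uint16_t)(a >> i);
        int16_t y = (int16_t)(uint16_t)(b >> i);
        if (x > y) dst |= (uint64_t)0xFFFF << i;
    }
    return dst;
}

/* Branchless per-lane maximum built from the mask. */
static uint64_t model_max_pi16(uint64_t a, uint64_t b) {
    uint64_t mask = model_cmpgt_pi16(a, b);
    /* (mask AND a) OR ((NOT mask) AND b): take a where a > b, else b. */
    return (mask & a) | (~mask & b);
}
```

Because the comparison is signed, a lane holding 0xFFFF (-1) compares less than a lane holding 0x0001.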
Integer MMX Convert Copy 32-bit integer "a" to the lower element of "dst", and zero the upper element of "dst". dst[31:0] := a[31:0] dst[63:32] := 0
Integer MMX Convert Copy the lower 32-bit integer in "a" to "dst". dst[31:0] := a[31:0]
Integer MMX Convert Copy 64-bit integer "a" to "dst". dst[63:0] := a[63:0]
Integer MMX Convert Copy 64-bit integer "a" to "dst". dst[63:0] := a[63:0]
Integer MMX Set Return vector of type __m64 with all elements set to zero. dst[MAX:0] := 0
Integer MMX Set Set packed 32-bit integers in "dst" with the supplied values. dst[31:0] := e0 dst[63:32] := e1
Integer MMX Set Set packed 16-bit integers in "dst" with the supplied values. dst[15:0] := e0 dst[31:16] := e1 dst[47:32] := e2 dst[63:48] := e3
Integer MMX Set Set packed 8-bit integers in "dst" with the supplied values. dst[7:0] := e0 dst[15:8] := e1 dst[23:16] := e2 dst[31:24] := e3 dst[39:32] := e4 dst[47:40] := e5 dst[55:48] := e6 dst[63:56] := e7
Integer MMX Set Broadcast 32-bit integer "a" to all elements of "dst". FOR j := 0 to 1 i := j*32 dst[i+31:i] := a[31:0] ENDFOR
Integer MMX Set Broadcast 16-bit integer "a" to all elements of "dst". FOR j := 0 to 3 i := j*16 dst[i+15:i] := a[15:0] ENDFOR
Integer MMX Set Broadcast 8-bit integer "a" to all elements of "dst". FOR j := 0 to 7 i := j*8 dst[i+7:i] := a[7:0] ENDFOR
Integer MMX Set Set packed 32-bit integers in "dst" with the supplied values in reverse order. dst[31:0] := e1 dst[63:32] := e0
Integer MMX Set Set packed 16-bit integers in "dst" with the supplied values in reverse order. dst[15:0] := e3 dst[31:16] := e2 dst[47:32] := e1 dst[63:48] := e0
Integer MMX Set Set packed 8-bit integers in "dst" with the supplied values in reverse order. dst[7:0] := e7 dst[15:8] := e6 dst[23:16] := e5 dst[31:24] := e4 dst[39:32] := e3 dst[47:40] := e2 dst[55:48] := e1 dst[63:56] := e0
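The difference between the "set" and "reverse order" entries is easy to get backwards, so here is a small scalar sketch for the 16-bit case. It assumes the arguments are passed as (e3, e2, e1, e0), which is how the pseudocode labels them; the function names are hypothetical and a `uint64_t` stands in for `__m64`.

```c
#include <stdint.h>

/* "Set": the last argument (e0) lands in the lowest lane. */
static uint64_t model_set_pi16(uint16_t e3, uint16_t e2, uint16_t e1, uint16_t e0) {
    return (uint64_t)e0 | (uint64_t)e1 << 16 | (uint64_t)e2 << 32 | (uint64_t)e3 << 48;
}

/* "Set reversed": dst[15:0] := e3 ... dst[63:48] := e0. */
static uint64_t model_setr_pi16(uint16_t e3, uint16_t e2, uint16_t e1, uint16_t e0) {
    return model_set_pi16(e0, e1, e2, e3);
}

/* Broadcast: replicate one 16-bit value into all four lanes. */
static uint64_t model_set1_pi16(uint16_t a) {
    return (uint64_t)a * 0x0001000100010001ULL;
}
```

In other words, the reversed form places its first argument in the lowest lane, which matches memory order when the arguments are written left to right.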
Integer MMX Miscellaneous Convert packed signed 16-bit integers from "a" and "b" to packed 8-bit integers using signed saturation, and store the results in "dst". dst[7:0] := Saturate8(a[15:0]) dst[15:8] := Saturate8(a[31:16]) dst[23:16] := Saturate8(a[47:32]) dst[31:24] := Saturate8(a[63:48]) dst[39:32] := Saturate8(b[15:0]) dst[47:40] := Saturate8(b[31:16]) dst[55:48] := Saturate8(b[47:32]) dst[63:56] := Saturate8(b[63:48])
Integer MMX Miscellaneous Convert packed signed 32-bit integers from "a" and "b" to packed 16-bit integers using signed saturation, and store the results in "dst". dst[15:0] := Saturate16(a[31:0]) dst[31:16] := Saturate16(a[63:32]) dst[47:32] := Saturate16(b[31:0]) dst[63:48] := Saturate16(b[63:32])
Integer MMX Miscellaneous Convert packed signed 16-bit integers from "a" and "b" to packed 8-bit integers using unsigned saturation, and store the results in "dst". dst[7:0] := SaturateU8(a[15:0]) dst[15:8] := SaturateU8(a[31:16]) dst[23:16] := SaturateU8(a[47:32]) dst[31:24] := SaturateU8(a[63:48]) dst[39:32] := SaturateU8(b[15:0]) dst[47:40] := SaturateU8(b[31:16]) dst[55:48] := SaturateU8(b[47:32]) dst[63:56] := SaturateU8(b[63:48])
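The pack entries narrow each lane with saturation, placing the four results from "a" in the low half of "dst" and the four from "b" in the high half. The following is a scalar sketch under the same conventions as the pseudocode; `sat_i8`, `sat_u8`, `pack16to8` and the `model_` wrappers are hypothetical helpers.

```c
#include <stdint.h>

/* Signed saturation of a 16-bit value to 8 bits (Saturate8). */
static uint8_t sat_i8(int16_t x) {
    if (x > 127) return 0x7F;
    if (x < -128) return 0x80;
    return (uint8_t)x;
}

/* Unsigned saturation of a signed 16-bit value to 8 bits (SaturateU8). */
static uint8_t sat_u8(int16_t x) {
    if (x > 255) return 0xFF;
    if (x < 0) return 0x00;
    return (uint8_t)x;
}

/* Narrow lanes of a into bytes 0..3 and lanes of b into bytes 4..7. */
static uint64_t pack16to8(uint64_t a, uint64_t b, uint8_t (*sat)(int16_t)) {
    uint64_t dst = 0;
    for (int j = 0; j < 4; j++) {
        int16_t x = (int16_t)(uint16_t)(a >> (16 * j));
        int16_t y = (int16_t)(uint16_t)(b >> (16 * j));
        dst |= (uint64_t)sat(x) << (8 * j);
        dst |= (uint64_t)sat(y) << (8 * j + 32);
    }
    return dst;
}

static uint64_t model_packs_pi16(uint64_t a, uint64_t b) { return pack16to8(a, b, sat_i8); }
static uint64_t model_packs_pu16(uint64_t a, uint64_t b) { return pack16to8(a, b, sat_u8); }
```

In both variants the inputs are read as signed 16-bit values; only the output range differs, so a negative input becomes 0 under unsigned saturation.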
Integer MMX Swizzle Unpack and interleave 8-bit integers from the high half of "a" and "b", and store the results in "dst". DEFINE INTERLEAVE_HIGH_BYTES(src1[63:0], src2[63:0]) { dst[7:0] := src1[39:32] dst[15:8] := src2[39:32] dst[23:16] := src1[47:40] dst[31:24] := src2[47:40] dst[39:32] := src1[55:48] dst[47:40] := src2[55:48] dst[55:48] := src1[63:56] dst[63:56] := src2[63:56] RETURN dst[63:0] } dst[63:0] := INTERLEAVE_HIGH_BYTES(a[63:0], b[63:0])
Integer MMX Swizzle Unpack and interleave 16-bit integers from the high half of "a" and "b", and store the results in "dst". DEFINE INTERLEAVE_HIGH_WORDS(src1[63:0], src2[63:0]) { dst[15:0] := src1[47:32] dst[31:16] := src2[47:32] dst[47:32] := src1[63:48] dst[63:48] := src2[63:48] RETURN dst[63:0] } dst[63:0] := INTERLEAVE_HIGH_WORDS(a[63:0], b[63:0])
Integer MMX Swizzle Unpack and interleave 32-bit integers from the high half of "a" and "b", and store the results in "dst". dst[31:0] := a[63:32] dst[63:32] := b[63:32]
Integer MMX Swizzle Unpack and interleave 8-bit integers from the low half of "a" and "b", and store the results in "dst". DEFINE INTERLEAVE_BYTES(src1[63:0], src2[63:0]) { dst[7:0] := src1[7:0] dst[15:8] := src2[7:0] dst[23:16] := src1[15:8] dst[31:24] := src2[15:8] dst[39:32] := src1[23:16] dst[47:40] := src2[23:16] dst[55:48] := src1[31:24] dst[63:56] := src2[31:24] RETURN dst[63:0] } dst[63:0] := INTERLEAVE_BYTES(a[63:0], b[63:0])
Integer MMX Swizzle Unpack and interleave 16-bit integers from the low half of "a" and "b", and store the results in "dst". DEFINE INTERLEAVE_WORDS(src1[63:0], src2[63:0]) { dst[15:0] := src1[15:0] dst[31:16] := src2[15:0] dst[47:32] := src1[31:16] dst[63:48] := src2[31:16] RETURN dst[63:0] } dst[63:0] := INTERLEAVE_WORDS(a[63:0], b[63:0])
Integer MMX Swizzle Unpack and interleave 32-bit integers from the low half of "a" and "b", and store the results in "dst". dst[31:0] := a[31:0] dst[63:32] := b[31:0]
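A scalar model of the byte interleave entries may help when checking lane order. In this sketch a `uint64_t` stands in for `__m64`, and `model_unpacklo_pi8` and `model_unpackhi_pi8` are hypothetical names; the high-half version simply reuses the low-half routine on the upper 32 bits of each input.

```c
#include <stdint.h>

/* Interleave the low four bytes of a and b: a0, b0, a1, b1, ... */
static uint64_t model_unpacklo_pi8(uint64_t a, uint64_t b) {
    uint64_t dst = 0;
    for (int j = 0; j < 4; j++) {
        dst |= ((a >> (8 * j)) & 0xFF) << (16 * j);
        dst |= ((b >> (8 * j)) & 0xFF) << (16 * j + 8);
    }
    return dst;
}

/* Interleave the high four bytes of a and b. */
static uint64_t model_unpackhi_pi8(uint64_t a, uint64_t b) {
    return model_unpacklo_pi8(a >> 32, b >> 32);
}
```

A common use is widening: interleaving with an all-zero second operand zero-extends each byte of "a" to a 16-bit lane, for example `model_unpacklo_pi8(a, 0)`.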
MONITOR General Support Arm address monitoring hardware using the address specified in "p". A store to an address within the specified address range triggers the monitoring hardware. Specify optional extensions in "extensions", and optional hints in "hints".
MONITOR General Support Hint to the processor that it can enter an implementation-dependent-optimized state while waiting for an event or store operation to the address range specified by MONITOR.
MOVBE Load Load 16 bits from memory, perform a byte swap operation, and store the result in "dst". FOR j := 0 to 1 i := j*8 dst[i+7:i] := MEM[ptr+15-i:ptr+8-i] ENDFOR
MOVBE Load Load 32 bits from memory, perform a byte swap operation, and store the result in "dst". FOR j := 0 to 3 i := j*8 dst[i+7:i] := MEM[ptr+31-i:ptr+24-i] ENDFOR
MOVBE Load Load 64 bits from memory, perform a byte swap operation, and store the result in "dst". FOR j := 0 to 7 i := j*8 dst[i+7:i] := MEM[ptr+63-i:ptr+56-i] ENDFOR
MOVBE Store Perform a byte swap operation of the 16 bits in "data", and store the results to memory. FOR j := 0 to 1 i := j*8 MEM[ptr+i+7:ptr+i] := data[15-i:8-i] ENDFOR
MOVBE Store Perform a byte swap operation of the 32 bits in "data", and store the results to memory. FOR j := 0 to 3 i := j*8 MEM[ptr+i+7:ptr+i] := data[31-i:24-i] ENDFOR
MOVBE Store Perform a byte swap operation of the 64 bits in "data", and store the results to memory. FOR j := 0 to 7 i := j*8 MEM[ptr+i+7:ptr+i] := data[63-i:56-i] ENDFOR
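On a little-endian machine, the MOVBE load and store entries amount to reading or writing a value in big-endian byte order. A portable sketch of the 32-bit case, written with byte accesses so it does not depend on host endianness, is given below; `load_be32` and `store_be32` are hypothetical names, not part of any intrinsic header.

```c
#include <stdint.h>

/* Read four bytes so that the byte at the lowest address becomes
   the most significant byte of the result. */
static uint32_t load_be32(const unsigned char *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Write v so that its most significant byte lands at the lowest address. */
static void store_be32(unsigned char *p, uint32_t v) {
    p[0] = (unsigned char)(v >> 24);
    p[1] = (unsigned char)(v >> 16);
    p[2] = (unsigned char)(v >> 8);
    p[3] = (unsigned char)v;
}
```

This mirrors the pseudocode, where the byte at `ptr` is paired with bits [31:24] of the register value.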
MOVDIR64B Store Move 64-byte (512-bit) value using direct store from source memory address "src" to destination memory address "dst". MEM[dst+511:dst] := MEM[src+511:src]
MOVDIRI Store Store 64-bit integer from "val" into memory using direct store. MEM[dst+63:dst] := val[63:0]
MOVDIRI Store Store 32-bit integer from "val" into memory using direct store. MEM[dst+31:dst] := val[31:0]
MPX Miscellaneous Make a pointer with the value of "srcmem" and bounds set to ["srcmem", "srcmem" + "size" - 1], and store the result in "dst". dst := srcmem dst.LB := srcmem dst.UB := srcmem + size - 1
MPX Miscellaneous Narrow the bounds for pointer "q" to the intersection of the bounds of "r" and the bounds ["q", "q" + "size" - 1], and store the result in "dst". dst := q IF r.LB > (q + size - 1) OR r.UB < q dst.LB := 1 dst.UB := 0 ELSE dst.LB := MAX(r.LB, q) dst.UB := MIN(r.UB, (q + size - 1)) FI
MPX Miscellaneous Make a pointer with the value of "q" and bounds set to the bounds of "r" (e.g. copy the bounds of "r" to pointer "q"), and store the result in "dst". dst := q dst.LB := r.LB dst.UB := r.UB
MPX Miscellaneous Make a pointer with the value of "q" and open bounds, which allow the pointer to access the entire virtual address space, and store the result in "dst". dst := q dst.LB := 0 dst.UB := 0
MPX Miscellaneous Stores the bounds of "ptr_val" pointer in memory at address "ptr_addr". MEM[ptr_addr].LB := ptr_val.LB MEM[ptr_addr].UB := ptr_val.UB
MPX Miscellaneous Checks if "q" is within its lower bound, and throws a #BR if not. IF q < q.LB #BR FI
MPX Miscellaneous Checks if "q" is within its upper bound, and throws a #BR if not. IF q > q.UB #BR FI
MPX Miscellaneous Checks if ["q", "q" + "size" - 1] is within the lower and upper bounds of "q" and throws a #BR if not. IF q < q.LB OR (q + size - 1) > q.UB #BR FI
MPX Miscellaneous Return the lower bound of "q". dst := q.LB
MPX Miscellaneous Return the upper bound of "q". dst := q.UB
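The bounds entries above treat a pointer as a value plus an inclusive [LB, UB] interval. The sketch below is a plain-C model of narrowing and range checking, not the MPX intrinsics themselves: the `bounded_ptr` struct and the function names are hypothetical, `size` is assumed to be at least 1, and a failed check returns 0 where the hardware would raise #BR.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a pointer value with inclusive bounds. */
typedef struct {
    uintptr_t ptr;
    uintptr_t lb;
    uintptr_t ub;
} bounded_ptr;

/* Intersect the bounds of r with [q, q + size - 1]. An empty
   intersection is encoded as LB = 1, UB = 0, as in the pseudocode. */
static bounded_ptr model_narrow(bounded_ptr r, uintptr_t q, size_t size) {
    bounded_ptr dst = { q, 1, 0 };
    uintptr_t last = q + size - 1;
    if (!(r.lb > last || r.ub < q)) {
        dst.lb = r.lb > q ? r.lb : q;
        dst.ub = r.ub < last ? r.ub : last;
    }
    return dst;
}

/* Return 1 if [p.ptr, p.ptr + size - 1] lies within [p.lb, p.ub]. */
static int model_check(bounded_ptr p, size_t size) {
    return p.ptr >= p.lb && p.ptr + size - 1 <= p.ub;
}
```

With the empty encoding (LB greater than UB), every subsequent check fails, which is the intended effect of narrowing to a disjoint range.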
Integer Bit Manipulation Set "dst" to the index of the lowest set bit in 32-bit integer "a". If no bits are set in "a" then "dst" is undefined. tmp := 0 IF a == 0 // dst is undefined ELSE DO WHILE ((tmp < 32) AND a[tmp] == 0) tmp := tmp + 1 OD FI dst := tmp
Integer Bit Manipulation Set "dst" to the index of the highest set bit in 32-bit integer "a". If no bits are set in "a" then "dst" is undefined. tmp := 31 IF a == 0 // dst is undefined ELSE DO WHILE ((tmp > 0) AND a[tmp] == 0) tmp := tmp - 1 OD FI dst := tmp
Integer Flag Bit Manipulation Set "index" to the index of the lowest set bit in 32-bit integer "a". If no bits are set in "a", then "index" is undefined and "dst" is set to 0, otherwise "dst" is set to 1. tmp := 0 IF a == 0 // MEM[index+31:index] is undefined dst := 0 ELSE DO WHILE ((tmp < 32) AND a[tmp] == 0) tmp := tmp + 1 OD MEM[index+31:index] := tmp dst := 1 FI
Integer Flag Bit Manipulation Set "index" to the index of the highest set bit in 32-bit integer "a". If no bits are set in "a", then "index" is undefined and "dst" is set to 0, otherwise "dst" is set to 1. tmp := 31 IF a == 0 // MEM[index+31:index] is undefined dst := 0 ELSE DO WHILE ((tmp > 0) AND a[tmp] == 0) tmp := tmp - 1 OD MEM[index+31:index] := tmp dst := 1 FI
Integer Flag Bit Manipulation Set "index" to the index of the lowest set bit in 64-bit integer "a". If no bits are set in "a", then "index" is undefined and "dst" is set to 0, otherwise "dst" is set to 1. tmp := 0 IF a == 0 // MEM[index+31:index] is undefined dst := 0 ELSE DO WHILE ((tmp < 64) AND a[tmp] == 0) tmp := tmp + 1 OD MEM[index+31:index] := tmp dst := 1 FI
Integer Flag Bit Manipulation Set "index" to the index of the highest set bit in 64-bit integer "a". If no bits are set in "a", then "index" is undefined and "dst" is set to 0, otherwise "dst" is set to 1. tmp := 63 IF a == 0 // MEM[index+31:index] is undefined dst := 0 ELSE DO WHILE ((tmp > 0) AND a[tmp] == 0) tmp := tmp - 1 OD MEM[index+31:index] := tmp dst := 1 FI
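The bit scan entries return a flag and write the bit position through a pointer. A portable sketch of the 32-bit forward and reverse scans follows, based on the description (return 1 whenever any bit is set, 0 otherwise). `model_bsf32` and `model_bsr32` are hypothetical names, and leaving `*index` unchanged for a zero input is one arbitrary choice for the "undefined" case.

```c
#include <stdint.h>

/* Find the lowest set bit; returns 0 and leaves *index alone if a == 0. */
static unsigned char model_bsf32(uint32_t *index, uint32_t a) {
    if (a == 0) return 0;
    uint32_t tmp = 0;
    while (((a >> tmp) & 1u) == 0) tmp++;
    *index = tmp;
    return 1;
}

/* Find the highest set bit; returns 0 and leaves *index alone if a == 0. */
static unsigned char model_bsr32(uint32_t *index, uint32_t a) {
    if (a == 0) return 0;
    uint32_t tmp = 31;
    while (((a >> tmp) & 1u) == 0) tmp--;
    *index = tmp;
    return 1;
}
```

Callers should test the returned flag before reading the index, since the index carries no meaning for a zero input.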
Integer Flag Bit Manipulation Return the bit at index "b" of 32-bit integer "a". addr := a + ZeroExtend64(b) dst[0] := MEM[addr]
Integer Flag Bit Manipulation Return the bit at index "b" of 32-bit integer "a", and set that bit to its complement. addr := a + ZeroExtend64(b) dst[0] := MEM[addr] MEM[addr] := ~dst[0]
Integer Flag Bit Manipulation Return the bit at index "b" of 32-bit integer "a", and set that bit to zero. addr := a + ZeroExtend64(b) dst[0] := MEM[addr] MEM[addr] := 0
Integer Flag Bit Manipulation Return the bit at index "b" of 32-bit integer "a", and set that bit to one. addr := a + ZeroExtend64(b) dst[0] := MEM[addr] MEM[addr] := 1
Integer Flag Bit Manipulation Return the bit at index "b" of 64-bit integer "a". addr := a + b dst[0] := MEM[addr]
Integer Flag Bit Manipulation Return the bit at index "b" of 64-bit integer "a", and set that bit to its complement. addr := a + b dst[0] := MEM[addr] MEM[addr] := ~dst[0]
Integer Flag Bit Manipulation Return the bit at index "b" of 64-bit integer "a", and set that bit to zero. addr := a + b dst[0] := MEM[addr] MEM[addr] := 0
Integer Flag Bit Manipulation Return the bit at index "b" of 64-bit integer "a", and set that bit to one. addr := a + b dst[0] := MEM[addr] MEM[addr] := 1
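The bit test entries return the old value of a bit and optionally modify it in place. The following is a simplified model restricted to a single 32-bit word with a bit index in 0..31; that restriction is an assumption of this sketch, as are the function names.

```c
#include <stdint.h>

/* Return bit b of *a (b assumed to be in 0..31). */
static unsigned char model_bittest(const uint32_t *a, unsigned b) {
    return (unsigned char)((*a >> b) & 1u);
}

/* Return bit b of *a, then flip it. */
static unsigned char model_bittestandcomplement(uint32_t *a, unsigned b) {
    unsigned char old = model_bittest(a, b);
    *a ^= (uint32_t)1 << b;
    return old;
}

/* Return bit b of *a, then clear it. */
static unsigned char model_bittestandreset(uint32_t *a, unsigned b) {
    unsigned char old = model_bittest(a, b);
    *a &= ~((uint32_t)1 << b);
    return old;
}

/* Return bit b of *a, then set it. */
static unsigned char model_bittestandset(uint32_t *a, unsigned b) {
    unsigned char old = model_bittest(a, b);
    *a |= (uint32_t)1 << b;
    return old;
}
```

In each case the return value is the bit as it was before the update, which is what makes the set and reset forms usable as simple test-and-claim primitives in single-threaded code.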
Integer Bit Manipulation Reverse the byte order of 32-bit integer "a", and store the result in "dst". This intrinsic is provided for conversion between little and big endian values. dst[7:0] := a[31:24] dst[15:8] := a[23:16] dst[23:16] := a[15:8] dst[31:24] := a[7:0]
Integer Bit Manipulation Reverse the byte order of 64-bit integer "a", and store the result in "dst". This intrinsic is provided for conversion between little and big endian values. dst[7:0] := a[63:56] dst[15:8] := a[55:48] dst[23:16] := a[47:40] dst[31:24] := a[39:32] dst[39:32] := a[31:24] dst[47:40] := a[23:16] dst[55:48] := a[15:8] dst[63:56] := a[7:0]
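The byte-order reversal above can be written with shifts and masks. In this sketch `model_bswap32` and `model_bswap64` are hypothetical names, and the 64-bit version is built from two 32-bit swaps whose halves are exchanged.

```c
#include <stdint.h>

/* Reverse the four bytes of a 32-bit value. */
static uint32_t model_bswap32(uint32_t a) {
    return (a >> 24) |
           ((a >> 8) & 0x0000FF00u) |
           ((a << 8) & 0x00FF0000u) |
           (a << 24);
}

/* Reverse the eight bytes of a 64-bit value: swap each half, then
   exchange the halves. */
static uint64_t model_bswap64(uint64_t a) {
    uint32_t lo = (uint32_t)a;
    uint32_t hi = (uint32_t)(a >> 32);
    return ((uint64_t)model_bswap32(lo) << 32) | model_bswap32(hi);
}
```

Applying either function twice returns the original value, which is a convenient sanity check.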
Floating Point Integer Cast Cast from type float to type unsigned __int32 without conversion. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
Floating Point Integer Cast Cast from type double to type unsigned __int64 without conversion. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
Floating Point Integer Cast Cast from type unsigned __int32 to type float without conversion. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
Floating Point Integer Cast Cast from type unsigned __int64 to type double without conversion. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
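The cast entries reinterpret the bits of a value without converting it. In portable C the same effect is usually obtained with `memcpy`, which compilers typically optimise away. This sketch assumes `float` is a 32-bit IEEE 754 type; `float_bits` and `bits_to_float` are hypothetical names.

```c
#include <stdint.h>
#include <string.h>

/* Return the raw bit pattern of f without any numeric conversion. */
static uint32_t float_bits(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Build a float from a raw bit pattern. */
static float bits_to_float(uint32_t u) {
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}
```

This differs from an ordinary C cast such as `(uint32_t)f`, which converts the numeric value and would turn 1.0f into 1 rather than 0x3F800000.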
Integer Shift Shift the bits of unsigned long integer "a" left by the number of bits specified in "shift", rotating the most-significant bit to the least-significant bit location, and store the unsigned result in "dst". // size := 32 or 64 dst := a count := shift AND (size - 1) DO WHILE (count > 0) tmp[0] := dst[size - 1] dst := (dst << 1) OR tmp[0] count := count - 1 OD
Integer Shift Shift the bits of unsigned long integer "a" right by the number of bits specified in "shift", rotating the least-significant bit to the most-significant bit location, and store the unsigned result in "dst". // size := 32 or 64 dst := a count := shift AND (size - 1) DO WHILE (count > 0) tmp[0] := dst[0] dst := (dst >> 1) OR (tmp[0] << (size - 1)) count := count - 1 OD
General Support Read the Performance Monitor Counter (PMC) specified by "a", and store up to 64 bits in "dst". The width of performance counters is implementation specific. dst[63:0] := ReadPMC(a)
Integer Shift Shift the bits of unsigned 32-bit integer "a" left by the number of bits specified in "shift", rotating the most-significant bit to the least-significant bit location, and store the unsigned result in "dst". dst := a count := shift AND 31 DO WHILE (count > 0) tmp[0] := dst[31] dst := (dst << 1) OR tmp[0] count := count - 1 OD
Integer Shift Shift the bits of unsigned 32-bit integer "a" right by the number of bits specified in "shift", rotating the least-significant bit to the most-significant bit location, and store the unsigned result in "dst". dst := a count := shift AND 31 DO WHILE (count > 0) tmp[31] := dst[0] dst := (dst >> 1) OR tmp count := count - 1 OD
Integer Shift Shift the bits of unsigned 16-bit integer "a" left by the number of bits specified in "shift", rotating the most-significant bit to the least-significant bit location, and store the unsigned result in "dst". dst := a count := shift AND 15 DO WHILE (count > 0) tmp[0] := dst[15] dst := (dst << 1) OR tmp[0] count := count - 1 OD
Integer Shift Shift the bits of unsigned 16-bit integer "a" right by the number of bits specified in "shift", rotating the least-significant bit to the most-significant bit location, and store the unsigned result in "dst". dst := a count := shift AND 15 DO WHILE (count > 0) tmp[15] := dst[0] dst := (dst >> 1) OR tmp count := count - 1 OD
Integer Shift Shift the bits of unsigned 64-bit integer "a" left by the number of bits specified in "shift", rotating the most-significant bit to the least-significant bit location, and store the unsigned result in "dst". dst := a count := shift AND 63 DO WHILE (count > 0) tmp[0] := dst[63] dst := (dst << 1) OR tmp[0] count := count - 1 OD
Integer Shift Shift the bits of unsigned 64-bit integer "a" right by the number of bits specified in "shift", rotating the least-significant bit to the most-significant bit location, and store the unsigned result in "dst". dst := a count := shift AND 63 DO WHILE (count > 0) tmp[0] := dst[0] dst := (dst >> 1) OR (tmp[0] << 63) count := count - 1 OD
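The rotate entries are specified as a one-bit-at-a-time loop, but the same result can be computed in one step. The sketch below uses the common shift-and-OR idiom; the function names are hypothetical. The count is masked to the operand width as in the pseudocode, and the complementary shift amount is also masked so that a count of 0 never produces a full-width shift, which C leaves undefined.

```c
#include <stdint.h>

/* Rotate a 32-bit value left by (shift mod 32) bits. */
static uint32_t model_rotl32(uint32_t a, unsigned shift) {
    unsigned c = shift & 31;
    return (a << c) | (a >> ((32 - c) & 31));
}

/* Rotate a 32-bit value right by (shift mod 32) bits. */
static uint32_t model_rotr32(uint32_t a, unsigned shift) {
    unsigned c = shift & 31;
    return (a >> c) | (a << ((32 - c) & 31));
}

/* Rotate a 64-bit value right by (shift mod 64) bits. */
static uint64_t model_rotr64(uint64_t a, unsigned shift) {
    unsigned c = shift & 63;
    return (a >> c) | (a << ((64 - c) & 63));
}
```

Most compilers recognise this pattern and emit a single rotate instruction, though that is an optimisation detail rather than something the sketch guarantees.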
Integer Flag Arithmetic Add unsigned 32-bit integers "a" and "b" with unsigned 8-bit carry-in "c_in" (carry flag), and store the unsigned 32-bit result in "out", and the carry-out in "dst" (carry or overflow flag). tmp[32:0] := a[31:0] + b[31:0] + (c_in > 0 ? 1 : 0) MEM[out+31:out] := tmp[31:0] dst[0] := tmp[32] dst[7:1] := 0
Integer Flag Arithmetic Add unsigned 64-bit integers "a" and "b" with unsigned 8-bit carry-in "c_in" (carry flag), and store the unsigned 64-bit result in "out", and the carry-out in "dst" (carry or overflow flag). tmp[64:0] := a[63:0] + b[63:0] + (c_in > 0 ? 1 : 0) MEM[out+63:out] := tmp[63:0] dst[0] := tmp[64] dst[7:1] := 0
Integer Flag Arithmetic Add unsigned 8-bit borrow "c_in" (carry flag) to unsigned 32-bit integer "b", and subtract the result from unsigned 32-bit integer "a". Store the unsigned 32-bit result in "out", and the carry-out in "dst" (carry or overflow flag). tmp[32:0] := a[31:0] - (b[31:0] + (c_in > 0 ? 1 : 0)) MEM[out+31:out] := tmp[31:0] dst[0] := tmp[32] dst[7:1] := 0
Integer Flag Arithmetic Add unsigned 8-bit borrow "c_in" (carry flag) to unsigned 64-bit integer "b", and subtract the result from unsigned 64-bit integer "a". Store the unsigned 64-bit result in "out", and the carry-out in "dst" (carry or overflow flag). tmp[64:0] := a[63:0] - (b[63:0] + (c_in > 0 ? 1 : 0)) MEM[out+63:out] := tmp[63:0] dst[0] := tmp[64] dst[7:1] := 0
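The add-with-carry and subtract-with-borrow entries above are usually chained limb by limb to build wider arithmetic. The sketch below models the 32-bit pseudocode in portable C by widening to 64 bits so that bit 32 of "tmp" is directly visible; the function names and the little-endian limb layout of `add_limbs_model` are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* tmp[32:0] := a + b + (c_in > 0 ? 1 : 0); returns the carry-out tmp[32]. */
static uint8_t addcarry32_model(uint8_t c_in, uint32_t a, uint32_t b, uint32_t *out)
{
    uint64_t tmp = (uint64_t)a + b + (c_in > 0 ? 1u : 0u);
    *out = (uint32_t)tmp;
    return (uint8_t)(tmp >> 32);
}

/* tmp[32:0] := a - (b + (c_in > 0 ? 1 : 0)); returns the borrow-out tmp[32]. */
static uint8_t subborrow32_model(uint8_t c_in, uint32_t a, uint32_t b, uint32_t *out)
{
    uint64_t tmp = (uint64_t)a - ((uint64_t)b + (c_in > 0 ? 1u : 0u));
    *out = (uint32_t)tmp;
    return (uint8_t)((tmp >> 32) & 1u);
}

/* Add two n-limb numbers (least-significant limb first); returns the final carry. */
static uint8_t add_limbs_model(uint32_t *r, const uint32_t *x, const uint32_t *y, size_t n)
{
    uint8_t carry = 0;
    for (size_t i = 0; i < n; i++)
        carry = addcarry32_model(carry, x[i], y[i], &r[i]);
    return carry;
}
```

Each call consumes the carry produced by the previous limb, which is the role the "c_in" and "dst" operands play in the entries above.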
Miscellaneous Insert the 32-bit data from "a" into a Processor Trace stream via a PTW packet. The PTW packet will be inserted if tracing is currently enabled and ptwrite is currently enabled. The current IP will also be inserted via a FUP packet if FUPonPTW is enabled.
Miscellaneous Insert the 64-bit data from "a" into a Processor Trace stream via a PTW packet. The PTW packet will be inserted if tracing is currently enabled and ptwrite is currently enabled. The current IP will also be inserted via a FUP packet if FUPonPTW is enabled.
Miscellaneous Invoke the Intel SGX enclave user (non-privileged) leaf function specified by "a", and return the error code. The "__data" array contains 3 32-bit elements that may act as input, output, or be unused, depending on the semantics of the specified leaf function; these correspond to ebx, ecx, and edx.
Miscellaneous Invoke the Intel SGX enclave system (privileged) leaf function specified by "a", and return the error code. The "__data" array contains 3 32-bit elements that may act as input, output, or be unused, depending on the semantics of the specified leaf function; these correspond to ebx, ecx, and edx.
Miscellaneous Invoke the Intel SGX enclave virtualized (VMM) leaf function specified by "a", and return the error code. The "__data" array contains 3 32-bit elements that may act as input, output, or be unused, depending on the semantics of the specified leaf function; these correspond to ebx, ecx, and edx.
Miscellaneous Write back and flush internal caches. Initiate writing-back and flushing of external caches.
Floating Point Convert Convert the half-precision (16-bit) floating-point value "a" to a single-precision (32-bit) floating-point value, and store the result in "dst". dst[31:0] := Convert_FP16_To_FP32(a[15:0])
Floating Point Convert Convert the single-precision (32-bit) floating-point value "a" to a half-precision (16-bit) floating-point value, and store the result in "dst". dst[15:0] := Convert_FP32_To_FP16(a[31:0])
Integer PCLMULQDQ Application-Targeted Perform a carry-less multiplication of two 64-bit integers, selected from "a" and "b" according to "imm8", and store the results in "dst". IF (imm8[0] == 0) TEMP1 := a[63:0] ELSE TEMP1 := a[127:64] FI IF (imm8[4] == 0) TEMP2 := b[63:0] ELSE TEMP2 := b[127:64] FI FOR i := 0 to 63 TEMP[i] := (TEMP1[0] and TEMP2[i]) FOR j := 1 to i TEMP[i] := TEMP[i] XOR (TEMP1[j] AND TEMP2[i-j]) ENDFOR dst[i] := TEMP[i] ENDFOR FOR i := 64 to 127 TEMP[i] := 0 FOR j := (i - 63) to 63 TEMP[i] := TEMP[i] XOR (TEMP1[j] AND TEMP2[i-j]) ENDFOR dst[i] := TEMP[i] ENDFOR dst[127] := 0
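Carry-less multiplication is ordinary shift-and-add multiplication with XOR in place of addition, so no carries propagate between bit positions. The portable sketch below computes the same 128-bit product as the pseudocode above by XOR-ing shifted copies of one operand. The struct and function names are hypothetical, and the 128-bit operands are modeled as two-element arrays with index 0 holding bits [63:0].

```c
#include <stdint.h>

/* 128-bit product split into two halves: lo = bits [63:0], hi = bits [127:64]. */
typedef struct { uint64_t lo, hi; } clmul128_model;

/* XOR together x << i for every set bit i of y. */
static clmul128_model clmul64_model(uint64_t x, uint64_t y)
{
    clmul128_model r = {0, 0};
    for (unsigned i = 0; i < 64; i++) {
        if ((y >> i) & 1u) {
            r.lo ^= x << i;
            if (i != 0)
                r.hi ^= x >> (64u - i);  /* bits shifted out of the low half */
        }
    }
    return r;
}

/* imm8[0] selects the half of "a", imm8[4] selects the half of "b". */
static clmul128_model clmul_select_model(const uint64_t a[2], const uint64_t b[2], unsigned imm8)
{
    uint64_t t1 = a[imm8 & 1u];
    uint64_t t2 = b[(imm8 >> 4) & 1u];
    return clmul64_model(t1, t2);
}
```

For example, multiplying 3 by 3 gives 5 rather than 9, because (x + 1)^2 = x^2 + 1 when coefficients are reduced modulo 2.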
PCONFIG Miscellaneous Invoke the PCONFIG leaf function specified by "a". The "__data" array contains 3 32-bit elements that may act as input, output, or be unused, depending on the semantics of the specified leaf function; these correspond to rbx, rcx, and rdx. May return the value in eax, depending on the semantics of the specified leaf function.
Integer Flag POPCNT Bit Manipulation Count the number of bits set to 1 in unsigned 32-bit integer "a", and return that count in "dst". dst := 0 FOR i := 0 to 31 IF a[i] dst := dst + 1 FI ENDFOR
Integer Flag POPCNT Bit Manipulation Count the number of bits set to 1 in unsigned 64-bit integer "a", and return that count in "dst". dst := 0 FOR i := 0 to 63 IF a[i] dst := dst + 1 FI ENDFOR
Integer Flag POPCNT Bit Manipulation Count the number of bits set to 1 in 32-bit integer "a", and return that count in "dst". dst := 0 FOR i := 0 to 31 IF a[i] dst := dst + 1 FI ENDFOR
Integer Flag POPCNT Bit Manipulation Count the number of bits set to 1 in 64-bit integer "a", and return that count in "dst". dst := 0 FOR i := 0 to 63 IF a[i] dst := dst + 1 FI ENDFOR
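The population-count pseudocode above translates almost line for line into portable C. This is a minimal sketch with hypothetical function names, useful as a reference to check a hardware result against; it makes no attempt to be fast.

```c
#include <stdint.h>

/* dst := 0; FOR i := 0 to 31: IF a[i] THEN dst := dst + 1 */
static unsigned popcount32_model(uint32_t a)
{
    unsigned dst = 0;
    for (unsigned i = 0; i < 32; i++)
        if ((a >> i) & 1u)
            dst++;
    return dst;
}

/* Same loop over 64 bit positions. */
static unsigned popcount64_model(uint64_t a)
{
    unsigned dst = 0;
    for (unsigned i = 0; i < 64; i++)
        if ((a >> i) & 1u)
            dst++;
    return dst;
}
```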
PREFETCHWT1 General Support Fetch the line of data from memory that contains address "p" to a location in the cache hierarchy specified by the locality hint "i".
RDPID General Support Copy the IA32_TSC_AUX MSR (signature value) into "dst". dst[31:0] := IA32_TSC_AUX[31:0]
Integer Flag RDRAND Random Read a hardware-generated 16-bit random value and store the result in "val". Return 1 if a random value was generated, and 0 otherwise. IF HW_RND_GEN.ready == 1 val[15:0] := HW_RND_GEN.data dst := 1 ELSE val[15:0] := 0 dst := 0 FI
Integer Flag RDRAND Random Read a hardware-generated 32-bit random value and store the result in "val". Return 1 if a random value was generated, and 0 otherwise. IF HW_RND_GEN.ready == 1 val[31:0] := HW_RND_GEN.data dst := 1 ELSE val[31:0] := 0 dst := 0 FI
Integer Flag RDRAND Random Read a hardware-generated 64-bit random value and store the result in "val". Return 1 if a random value was generated, and 0 otherwise. IF HW_RND_GEN.ready == 1 val[63:0] := HW_RND_GEN.data dst := 1 ELSE val[63:0] := 0 dst := 0 FI
Flag RDSEED Random Read a 16-bit NIST SP800-90B and SP800-90C compliant random value and store it in "val". Return 1 if a random value was generated, and 0 otherwise. IF HW_NRND_GEN.ready == 1 val[15:0] := HW_NRND_GEN.data dst := 1 ELSE val[15:0] := 0 dst := 0 FI
Flag RDSEED Random Read a 32-bit NIST SP800-90B and SP800-90C compliant random value and store it in "val". Return 1 if a random value was generated, and 0 otherwise. IF HW_NRND_GEN.ready == 1 val[31:0] := HW_NRND_GEN.data dst := 1 ELSE val[31:0] := 0 dst := 0 FI
Flag RDSEED Random Read a 64-bit NIST SP800-90B and SP800-90C compliant random value and store it in "val". Return 1 if a random value was generated, and 0 otherwise. IF HW_NRND_GEN.ready == 1 val[63:0] := HW_NRND_GEN.data dst := 1 ELSE val[63:0] := 0 dst := 0 FI
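Because the random-value entries above return 0 when no value is available, callers generally check the return value and retry a bounded number of times. The sketch below shows that pattern in portable C. Everything in it is hypothetical: `rand_step_fn` stands for whichever step function is used, and `fake_step_model` is a stand-in for the hardware that fails twice and then succeeds, included only so the retry loop can be exercised without special CPU support.

```c
#include <stdint.h>

/* A step returns 1 and writes *val on success, or returns 0 and writes 0. */
typedef int (*rand_step_fn)(uint64_t *val);

/* Call step up to max_tries times; return 1 on the first success, else 0. */
static int rand_retry_model(rand_step_fn step, unsigned max_tries, uint64_t *val)
{
    for (unsigned i = 0; i < max_tries; i++)
        if (step(val))
            return 1;
    *val = 0;
    return 0;
}

/* Stand-in for the hardware step: not ready on calls 1 and 2, ready afterwards. */
static int fake_step_model(uint64_t *val)
{
    static unsigned calls = 0;
    if (++calls < 3) {
        *val = 0;
        return 0;
    }
    *val = 0x1234u;
    return 1;
}
```

The retry limit is a design choice left to the caller; a caller that exhausts it should treat the value as unavailable rather than use the zero that was written.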
RDTSCP General Support Copy the current 64-bit value of the processor's time-stamp counter into "dst", and store the IA32_TSC_AUX MSR (signature value) into memory at "mem_addr". dst[63:0] := TimeStampCounter MEM[mem_addr+31:mem_addr] := IA32_TSC_AUX[31:0]
RTM General Support Force an RTM abort. The EAX register is updated to reflect an XABORT instruction caused the abort, and the "imm8" parameter will be provided in bits [31:24] of EAX. Following an RTM abort, the logical processor resumes execution at the fallback address computed through the outermost XBEGIN instruction. IF RTM_ACTIVE == 0 // nop ELSE // restore architectural register state // discard memory updates performed in transaction // update EAX with status and imm8 value eax[31:24] := imm8[7:0] RTM_NEST_COUNT := 0 RTM_ACTIVE := 0 IF _64_BIT_MODE RIP := fallbackRIP ELSE EIP := fallbackEIP FI FI
RTM General Support Specify the start of an RTM code region. If the logical processor was not already in transactional execution, then this call causes the logical processor to transition into transactional execution. On an RTM abort, the logical processor discards all architectural register and memory updates performed during the RTM execution, restores architectural state, and starts execution beginning at the fallback address computed from the outermost XBEGIN instruction. Return status of ~0 (0xFFFFFFFF) if continuing inside transaction; all other codes are aborts. IF RTM_NEST_COUNT < MAX_RTM_NEST_COUNT RTM_NEST_COUNT := RTM_NEST_COUNT + 1 IF RTM_NEST_COUNT == 1 IF _64_BIT_MODE fallbackRIP := RIP ELSE IF _32_BIT_MODE fallbackEIP := EIP FI RTM_ACTIVE := 1 // enter RTM execution, record register state, start tracking memory state FI ELSE // RTM abort (see _xabort) FI
RTM General Support Specify the end of an RTM code region. If this corresponds to the outermost scope, the logical processor will attempt to commit the logical processor state atomically. If the commit fails, the logical processor will perform an RTM abort. IF RTM_ACTIVE == 1 RTM_NEST_COUNT := RTM_NEST_COUNT - 1 IF RTM_NEST_COUNT == 0 // try to commit transaction IF FAIL_TO_COMMIT_TRANSACTION // RTM abort (see _xabort) ELSE RTM_ACTIVE := 0 FI FI FI
RTM General Support Query the transactional execution status, return 1 if inside a transactionally executing RTM or HLE region, and return 0 otherwise. IF (RTM_ACTIVE == 1 OR HLE_ACTIVE == 1) dst := 1 ELSE dst := 0 FI
SERIALIZE General Support Serialize instruction execution, ensuring all modifications to flags, registers, and memory by previous instructions are completed before the next instruction is fetched.
Integer SHA Cryptography Perform an intermediate calculation for the next four SHA1 message values (unsigned 32-bit integers) using previous message values from "a" and "b", and store the result in "dst". W0 := a[127:96] W1 := a[95:64] W2 := a[63:32] W3 := a[31:0] W4 := b[127:96] W5 := b[95:64] dst[127:96] := W2 XOR W0 dst[95:64] := W3 XOR W1 dst[63:32] := W4 XOR W2 dst[31:0] := W5 XOR W3
Integer SHA Cryptography Perform the final calculation for the next four SHA1 message values (unsigned 32-bit integers) using the intermediate result in "a" and the previous message values in "b", and store the result in "dst". W13 := b[95:64] W14 := b[63:32] W15 := b[31:0] W16 := (a[127:96] XOR W13) <<< 1 W17 := (a[95:64] XOR W14) <<< 1 W18 := (a[63:32] XOR W15) <<< 1 W19 := (a[31:0] XOR W16) <<< 1 dst[127:96] := W16 dst[95:64] := W17 dst[63:32] := W18 dst[31:0] := W19
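The two SHA1 message entries above combine to produce the next four schedule words W16..W19 from W0..W15, with one XOR against W8..W11 in between. The following portable sketch models both operations on four 32-bit lanes and chains them; the type and function names are hypothetical, and the lane convention (lane[3] holds bits [127:96], lane[0] holds bits [31:0]) is an assumption chosen to match the bit ranges in the pseudocode.

```c
#include <stdint.h>

/* lane[3] models bits [127:96], lane[0] models bits [31:0]. */
typedef struct { uint32_t lane[4]; } vec128_model;

/* Rotate left by n, for n in 1..31. */
static uint32_t rotl32_model(uint32_t x, unsigned n)
{
    return (x << n) | (x >> (32u - n));
}

/* Intermediate step: XOR of words two positions apart. */
static vec128_model sha1msg1_model(vec128_model a, vec128_model b)
{
    uint32_t W0 = a.lane[3], W1 = a.lane[2], W2 = a.lane[1], W3 = a.lane[0];
    uint32_t W4 = b.lane[3], W5 = b.lane[2];
    vec128_model dst = {{W5 ^ W3, W4 ^ W2, W3 ^ W1, W2 ^ W0}};
    return dst;
}

/* Final step: fold in W13..W15 (and the new W16) and rotate left by 1. */
static vec128_model sha1msg2_model(vec128_model a, vec128_model b)
{
    uint32_t W13 = b.lane[2], W14 = b.lane[1], W15 = b.lane[0];
    uint32_t W16 = rotl32_model(a.lane[3] ^ W13, 1);
    uint32_t W17 = rotl32_model(a.lane[2] ^ W14, 1);
    uint32_t W18 = rotl32_model(a.lane[1] ^ W15, 1);
    uint32_t W19 = rotl32_model(a.lane[0] ^ W16, 1);
    vec128_model dst = {{W19, W18, W17, W16}};
    return dst;
}

/* Compute out[0..3] = W16..W19 from W[0..15]. */
static void sha1_next4_model(const uint32_t W[16], uint32_t out[4])
{
    vec128_model m0 = {{W[3], W[2], W[1], W[0]}};
    vec128_model m1 = {{W[7], W[6], W[5], W[4]}};
    vec128_model m2 = {{W[11], W[10], W[9], W[8]}};
    vec128_model m3 = {{W[15], W[14], W[13], W[12]}};
    vec128_model t = sha1msg1_model(m0, m1);
    for (int i = 0; i < 4; i++)
        t.lane[i] ^= m2.lane[i];          /* XOR in W8..W11 */
    t = sha1msg2_model(t, m3);
    for (int i = 0; i < 4; i++)
        out[i] = t.lane[3 - i];           /* W16 sits in the top lane */
}
```

The result agrees with the textbook recurrence W[t] = (W[t-3] XOR W[t-8] XOR W[t-14] XOR W[t-16]) <<< 1, which is a convenient way to check the lane ordering.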
Integer SHA Cryptography Calculate SHA1 state variable E after four rounds of operation from the current SHA1 state variable "a", add that value to the scheduled values (unsigned 32-bit integers) in "b", and store the result in "dst". tmp := (a[127:96] <<< 30) dst[127:96] := b[127:96] + tmp dst[95:64] := b[95:64] dst[63:32] := b[63:32] dst[31:0] := b[31:0]
Integer SHA Cryptography Perform four rounds of SHA1 operation using an initial SHA1 state (A,B,C,D) from "a" and some pre-computed sum of the next 4 round message values (unsigned 32-bit integers), and state variable E from "b", and store the updated SHA1 state (A,B,C,D) in "dst". "func" contains the logic functions and round constants. IF (func[1:0] == 0) f := f0() K := K0 ELSE IF (func[1:0] == 1) f := f1() K := K1 ELSE IF (func[1:0] == 2) f := f2() K := K2 ELSE IF (func[1:0] == 3) f := f3() K := K3 FI A := a[127:96] B := a[95:64] C := a[63:32] D := a[31:0] W[0] := b[127:96] W[1] := b[95:64] W[2] := b[63:32] W[3] := b[31:0] A[1] := f(B, C, D) + (A <<< 5) + W[0] + K B[1] := A C[1] := B <<< 30 D[1] := C E[1] := D FOR i := 1 to 3 A[i+1] := f(B[i], C[i], D[i]) + (A[i] <<< 5) + W[i] + E[i] + K B[i+1] := A[i] C[i+1] := B[i] <<< 30 D[i+1] := C[i] E[i+1] := D[i] ENDFOR dst[127:96] := A[4] dst[95:64] := B[4] dst[63:32] := C[4] dst[31:0] := D[4]
Integer SHA Cryptography Perform an intermediate calculation for the next four SHA256 message values (unsigned 32-bit integers) using previous message values from "a" and "b", and store the result in "dst". W4 := b[31:0] W3 := a[127:96] W2 := a[95:64] W1 := a[63:32] W0 := a[31:0] dst[127:96] := W3 + sigma0(W4) dst[95:64] := W2 + sigma0(W3) dst[63:32] := W1 + sigma0(W2) dst[31:0] := W0 + sigma0(W1)
Integer SHA Cryptography Perform the final calculation for the next four SHA256 message values (unsigned 32-bit integers) using previous message values from "a" and "b", and store the result in "dst". W14 := b[95:64] W15 := b[127:96] W16 := a[31:0] + sigma1(W14) W17 := a[63:32] + sigma1(W15) W18 := a[95:64] + sigma1(W16) W19 := a[127:96] + sigma1(W17) dst[127:96] := W19 dst[95:64] := W18 dst[63:32] := W17 dst[31:0] := W16
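The two SHA256 message entries above can be chained to produce W16..W19 from W0..W15, with the words W9..W12 added in between. The portable sketch below models that flow. The names are hypothetical, lane[0] is assumed to hold bits [31:0], and the sigma0 and sigma1 definitions are the standard SHA-256 small sigma functions, which the pseudocode references but does not spell out.

```c
#include <stdint.h>

/* lane[0] models bits [31:0], lane[3] models bits [127:96]. */
typedef struct { uint32_t lane[4]; } vec128_model;

/* Rotate right by n, for n in 1..31. */
static uint32_t rotr32_model(uint32_t x, unsigned n)
{
    return (x >> n) | (x << (32u - n));
}

/* Standard SHA-256 small sigma functions. */
static uint32_t sigma0_model(uint32_t x)
{
    return rotr32_model(x, 7) ^ rotr32_model(x, 18) ^ (x >> 3);
}

static uint32_t sigma1_model(uint32_t x)
{
    return rotr32_model(x, 17) ^ rotr32_model(x, 19) ^ (x >> 10);
}

/* Intermediate step: W[i] + sigma0(W[i+1]) for i = 0..3. */
static vec128_model sha256msg1_model(vec128_model a, vec128_model b)
{
    vec128_model dst;
    dst.lane[0] = a.lane[0] + sigma0_model(a.lane[1]);
    dst.lane[1] = a.lane[1] + sigma0_model(a.lane[2]);
    dst.lane[2] = a.lane[2] + sigma0_model(a.lane[3]);
    dst.lane[3] = a.lane[3] + sigma0_model(b.lane[0]);
    return dst;
}

/* Final step: add sigma1 of W14, W15, and the freshly computed W16, W17. */
static vec128_model sha256msg2_model(vec128_model a, vec128_model b)
{
    uint32_t W14 = b.lane[2], W15 = b.lane[3];
    vec128_model dst;
    dst.lane[0] = a.lane[0] + sigma1_model(W14);
    dst.lane[1] = a.lane[1] + sigma1_model(W15);
    dst.lane[2] = a.lane[2] + sigma1_model(dst.lane[0]);
    dst.lane[3] = a.lane[3] + sigma1_model(dst.lane[1]);
    return dst;
}

/* Compute out[0..3] = W16..W19 from W[0..15]. */
static void sha256_next4_model(const uint32_t W[16], uint32_t out[4])
{
    vec128_model m0 = {{W[0], W[1], W[2], W[3]}};
    vec128_model m1 = {{W[4], W[5], W[6], W[7]}};
    vec128_model m3 = {{W[12], W[13], W[14], W[15]}};
    vec128_model t = sha256msg1_model(m0, m1);
    for (int i = 0; i < 4; i++)
        t.lane[i] += W[9 + i];            /* add W[t-7] for t = 16..19 */
    t = sha256msg2_model(t, m3);
    for (int i = 0; i < 4; i++)
        out[i] = t.lane[i];
}
```

The output matches the recurrence W[t] = sigma1(W[t-2]) + W[t-7] + sigma0(W[t-15]) + W[t-16] with 32-bit wraparound addition.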
Integer SHA Cryptography Perform 2 rounds of SHA256 operation using an initial SHA256 state (C,D,G,H) from "a", an initial SHA256 state (A,B,E,F) from "b", and a pre-computed sum of the next 2 round message values (unsigned 32-bit integers) and the corresponding round constants from "k", and store the updated SHA256 state (A,B,E,F) in "dst". A[0] := b[127:96] B[0] := b[95:64] C[0] := a[127:96] D[0] := a[95:64] E[0] := b[63:32] F[0] := b[31:0] G[0] := a[63:32] H[0] := a[31:0] W_K[0] := k[31:0] W_K[1] := k[63:32] FOR i := 0 to 1 A[i+1] := Ch(E[i], F[i], G[i]) + sum1(E[i]) + W_K[i] + H[i] + Maj(A[i], B[i], C[i]) + sum0(A[i]) B[i+1] := A[i] C[i+1] := B[i] D[i+1] := C[i] E[i+1] := Ch(E[i], F[i], G[i]) + sum1(E[i]) + W_K[i] + H[i] + D[i] F[i+1] := E[i] G[i+1] := F[i] H[i+1] := G[i] ENDFOR dst[127:96] := A[2] dst[95:64] := B[2] dst[63:32] := E[2] dst[31:0] := F[2]
SSE Swizzle Macro: Transpose the 4x4 matrix formed by the 4 rows of single-precision (32-bit) floating-point elements in "row0", "row1", "row2", and "row3", and store the transposed matrix in these vectors ("row0" now contains column 0, etc.). __m128 tmp3, tmp2, tmp1, tmp0; tmp0 := _mm_unpacklo_ps(row0, row1); tmp2 := _mm_unpacklo_ps(row2, row3); tmp1 := _mm_unpackhi_ps(row0, row1); tmp3 := _mm_unpackhi_ps(row2, row3); row0 := _mm_movelh_ps(tmp0, tmp2); row1 := _mm_movehl_ps(tmp2, tmp0); row2 := _mm_movelh_ps(tmp1, tmp3); row3 := _mm_movehl_ps(tmp3, tmp1);
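To see why the eight unpack and move steps in the transpose macro above yield a transpose, it can help to trace them on plain arrays. The sketch below is a scalar model, not the macro itself: `ps_model` and the `*_model` helpers are hypothetical names, with f[0] standing for bits [31:0] of a vector.

```c
/* Four single-precision lanes; f[0] models bits [31:0]. */
typedef struct { float f[4]; } ps_model;

/* Interleave the low halves: a0, b0, a1, b1. */
static ps_model unpacklo_model(ps_model a, ps_model b)
{
    ps_model r = {{a.f[0], b.f[0], a.f[1], b.f[1]}};
    return r;
}

/* Interleave the high halves: a2, b2, a3, b3. */
static ps_model unpackhi_model(ps_model a, ps_model b)
{
    ps_model r = {{a.f[2], b.f[2], a.f[3], b.f[3]}};
    return r;
}

/* Low half of a, then low half of b. */
static ps_model movelh_model(ps_model a, ps_model b)
{
    ps_model r = {{a.f[0], a.f[1], b.f[0], b.f[1]}};
    return r;
}

/* High half of b, then high half of a. */
static ps_model movehl_model(ps_model a, ps_model b)
{
    ps_model r = {{b.f[2], b.f[3], a.f[2], a.f[3]}};
    return r;
}

/* Same eight steps as the macro, applied in place to row[0..3]. */
static void transpose4_model(ps_model row[4])
{
    ps_model tmp0 = unpacklo_model(row[0], row[1]);
    ps_model tmp2 = unpacklo_model(row[2], row[3]);
    ps_model tmp1 = unpackhi_model(row[0], row[1]);
    ps_model tmp3 = unpackhi_model(row[2], row[3]);
    row[0] = movelh_model(tmp0, tmp2);
    row[1] = movehl_model(tmp2, tmp0);
    row[2] = movelh_model(tmp1, tmp3);
    row[3] = movehl_model(tmp3, tmp1);
}
```

The unpack steps pair up elements from two rows that share a column, and the move steps then join the pairs from rows 0 and 1 with the pairs from rows 2 and 3 to complete each column.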
SSE General Support Get the unsigned 32-bit value of the MXCSR control and status register. dst[31:0] := MXCSR
SSE General Support Set the MXCSR control and status register with the value in unsigned 32-bit integer "a". MXCSR := a[31:0]
SSE General Support Macro: Get the exception state bits from the MXCSR control and status register. The exception state may contain any of the following flags: _MM_EXCEPT_INVALID, _MM_EXCEPT_DIV_ZERO, _MM_EXCEPT_DENORM, _MM_EXCEPT_OVERFLOW, _MM_EXCEPT_UNDERFLOW, _MM_EXCEPT_INEXACT dst[31:0] := MXCSR AND _MM_EXCEPT_MASK
SSE General Support Macro: Set the exception state bits of the MXCSR control and status register to the value in unsigned 32-bit integer "a", leaving the other bits of MXCSR unchanged. The exception state may contain any of the following flags: _MM_EXCEPT_INVALID, _MM_EXCEPT_DIV_ZERO, _MM_EXCEPT_DENORM, _MM_EXCEPT_OVERFLOW, _MM_EXCEPT_UNDERFLOW, _MM_EXCEPT_INEXACT MXCSR := (MXCSR AND ~_MM_EXCEPT_MASK) OR a[31:0]
SSE General Support Macro: Get the exception mask bits from the MXCSR control and status register. The exception mask may contain any of the following flags: _MM_MASK_INVALID, _MM_MASK_DIV_ZERO, _MM_MASK_DENORM, _MM_MASK_OVERFLOW, _MM_MASK_UNDERFLOW, _MM_MASK_INEXACT dst[31:0] := MXCSR AND _MM_MASK_MASK
SSE General Support Macro: Set the exception mask bits of the MXCSR control and status register to the value in unsigned 32-bit integer "a", leaving the other bits of MXCSR unchanged. The exception mask may contain any of the following flags: _MM_MASK_INVALID, _MM_MASK_DIV_ZERO, _MM_MASK_DENORM, _MM_MASK_OVERFLOW, _MM_MASK_UNDERFLOW, _MM_MASK_INEXACT MXCSR := (MXCSR AND ~_MM_MASK_MASK) OR a[31:0]
SSE General Support Macro: Get the rounding mode bits from the MXCSR control and status register. The rounding mode is one of the following: _MM_ROUND_NEAREST, _MM_ROUND_DOWN, _MM_ROUND_UP, _MM_ROUND_TOWARD_ZERO dst[31:0] := MXCSR AND _MM_ROUND_MASK
SSE General Support Macro: Set the rounding mode bits of the MXCSR control and status register to the value in unsigned 32-bit integer "a", leaving the other bits of MXCSR unchanged. The rounding mode is one of the following: _MM_ROUND_NEAREST, _MM_ROUND_DOWN, _MM_ROUND_UP, _MM_ROUND_TOWARD_ZERO MXCSR := (MXCSR AND ~_MM_ROUND_MASK) OR a[31:0]
SSE General Support Macro: Get the flush-zero bit from the MXCSR control and status register. The flush-zero mode is either _MM_FLUSH_ZERO_ON or _MM_FLUSH_ZERO_OFF dst[31:0] := MXCSR AND _MM_FLUSH_MASK
SSE General Support Macro: Set the flush-zero bit of the MXCSR control and status register to the value in unsigned 32-bit integer "a", leaving the other bits of MXCSR unchanged. The flush-zero mode is either _MM_FLUSH_ZERO_ON or _MM_FLUSH_ZERO_OFF MXCSR := (MXCSR AND ~_MM_FLUSH_MASK) OR a[31:0]
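The get and set macros above each operate on one field of MXCSR, and a setter for one field has to preserve the others, which amounts to a read-modify-write. The sketch below models that on a plain 32-bit variable so it can run anywhere. The `MODEL_*` constants and function names are hypothetical; their values follow the usual MXCSR layout (exception flags in bits 0 to 5, exception masks in bits 7 to 12, rounding control in bits 13 and 14, flush-to-zero in bit 15) and should be treated as assumptions. Masking the new value with the field mask is a defensive choice of this sketch.

```c
#include <stdint.h>

/* Field masks, assuming the usual MXCSR bit layout. */
#define MODEL_EXCEPT_MASK        0x003Fu
#define MODEL_MASK_MASK          0x1F80u
#define MODEL_ROUND_MASK         0x6000u
#define MODEL_FLUSH_MASK         0x8000u
#define MODEL_ROUND_TOWARD_ZERO  0x6000u

/* Getter: keep only the bits of the requested field. */
static uint32_t csr_get_field_model(uint32_t csr, uint32_t field_mask)
{
    return csr & field_mask;
}

/* Setter: clear the field, then OR in the new value; other fields are kept. */
static uint32_t csr_set_field_model(uint32_t csr, uint32_t field_mask, uint32_t value)
{
    return (csr & ~field_mask) | (value & field_mask);
}
```

Starting from a register value of 0x1F80 (all exceptions masked, round to nearest), selecting round-toward-zero gives 0x7F80: the rounding field changes and the exception masks are untouched.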
SSE General Support Fetch the line of data from memory that contains address "p" to a location in the cache hierarchy specified by the locality hint "i".
SSE General Support Perform a serializing operation on all store-to-memory instructions that were issued prior to this instruction. Guarantees that every store instruction that precedes the fence in program order is globally visible before any store instruction that follows the fence in program order.
Integer SSE Special Math Functions Compare packed signed 16-bit integers in "a" and "b", and store packed maximum values in "dst". FOR j := 0 to 3 i := j*16 dst[i+15:i] := MAX(a[i+15:i], b[i+15:i]) ENDFOR
Integer SSE Special Math Functions Compare packed signed 16-bit integers in "a" and "b", and store packed maximum values in "dst". FOR j := 0 to 3 i := j*16 dst[i+15:i] := MAX(a[i+15:i], b[i+15:i]) ENDFOR
Integer SSE Special Math Functions Compare packed unsigned 8-bit integers in "a" and "b", and store packed maximum values in "dst". FOR j := 0 to 7 i := j*8 dst[i+7:i] := MAX(a[i+7:i], b[i+7:i]) ENDFOR
Integer SSE Special Math Functions Compare packed unsigned 8-bit integers in "a" and "b", and store packed maximum values in "dst". FOR j := 0 to 7 i := j*8 dst[i+7:i] := MAX(a[i+7:i], b[i+7:i]) ENDFOR
Integer SSE Special Math Functions Compare packed signed 16-bit integers in "a" and "b", and store packed minimum values in "dst". FOR j := 0 to 3 i := j*16 dst[i+15:i] := MIN(a[i+15:i], b[i+15:i]) ENDFOR
Integer SSE Special Math Functions Compare packed signed 16-bit integers in "a" and "b", and store packed minimum values in "dst". FOR j := 0 to 3 i := j*16 dst[i+15:i] := MIN(a[i+15:i], b[i+15:i]) ENDFOR
Integer SSE Special Math Functions Compare packed unsigned 8-bit integers in "a" and "b", and store packed minimum values in "dst". FOR j := 0 to 7 i := j*8 dst[i+7:i] := MIN(a[i+7:i], b[i+7:i]) ENDFOR
Integer SSE Special Math Functions Compare packed unsigned 8-bit integers in "a" and "b", and store packed minimum values in "dst". FOR j := 0 to 7 i := j*8 dst[i+7:i] := MIN(a[i+7:i], b[i+7:i]) ENDFOR
Integer SSE Arithmetic Multiply the packed unsigned 16-bit integers in "a" and "b", producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in "dst". FOR j := 0 to 3 i := j*16 tmp[31:0] := a[i+15:i] * b[i+15:i] dst[i+15:i] := tmp[31:16] ENDFOR
Integer SSE Arithmetic Multiply the packed unsigned 16-bit integers in "a" and "b", producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in "dst". FOR j := 0 to 3 i := j*16 tmp[31:0] := a[i+15:i] * b[i+15:i] dst[i+15:i] := tmp[31:16] ENDFOR
Integer SSE Probability/Statistics Average packed unsigned 8-bit integers in "a" and "b", and store the results in "dst". FOR j := 0 to 7 i := j*8 dst[i+7:i] := (a[i+7:i] + b[i+7:i] + 1) >> 1 ENDFOR
Integer SSE Probability/Statistics Average packed unsigned 8-bit integers in "a" and "b", and store the results in "dst". FOR j := 0 to 7 i := j*8 dst[i+7:i] := (a[i+7:i] + b[i+7:i] + 1) >> 1 ENDFOR
Integer SSE Probability/Statistics Average packed unsigned 16-bit integers in "a" and "b", and store the results in "dst". FOR j := 0 to 3 i := j*16 dst[i+15:i] := (a[i+15:i] + b[i+15:i] + 1) >> 1 ENDFOR
Integer SSE Probability/Statistics Average packed unsigned 16-bit integers in "a" and "b", and store the results in "dst". FOR j := 0 to 3 i := j*16 dst[i+15:i] := (a[i+15:i] + b[i+15:i] + 1) >> 1 ENDFOR
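The averaging entries above compute (a + b + 1) >> 1, which rounds halves upward and needs one more bit than the element width for the intermediate sum. The portable sketch below makes that widening explicit; the function names are hypothetical, and `avg_pu8_model` treats a 64-bit integer as eight packed bytes with byte 0 in bits [7:0].

```c
#include <stdint.h>

/* Widen to unsigned int so that a + b + 1 cannot overflow 8 bits. */
static uint8_t avg_u8_model(uint8_t a, uint8_t b)
{
    return (uint8_t)(((unsigned)a + b + 1u) >> 1);
}

/* Widen to 32 bits so that a + b + 1 cannot overflow 16 bits. */
static uint16_t avg_u16_model(uint16_t a, uint16_t b)
{
    return (uint16_t)(((uint32_t)a + b + 1u) >> 1);
}

/* FOR j := 0 to 7: average byte j of a and b. */
static uint64_t avg_pu8_model(uint64_t a, uint64_t b)
{
    uint64_t dst = 0;
    for (unsigned j = 0; j < 8; j++) {
        unsigned i = j * 8u;
        uint8_t x = (uint8_t)(a >> i);
        uint8_t y = (uint8_t)(b >> i);
        dst |= (uint64_t)avg_u8_model(x, y) << i;
    }
    return dst;
}
```

The "+ 1" is what makes the average of 1 and 2 come out as 2 rather than 1.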
Integer SSE Arithmetic Miscellaneous Compute the absolute differences of packed unsigned 8-bit integers in "a" and "b", then horizontally sum the 8 differences to produce a single unsigned 16-bit integer, store that sum in the low 16 bits of "dst", and zero the remaining bits of "dst". FOR j := 0 to 7 i := j*8 tmp[i+7:i] := ABS(a[i+7:i] - b[i+7:i]) ENDFOR dst[15:0] := tmp[7:0] + tmp[15:8] + tmp[23:16] + tmp[31:24] + tmp[39:32] + tmp[47:40] + tmp[55:48] + tmp[63:56] dst[63:16] := 0
Floating Point Integer SSE Arithmetic Miscellaneous Compute the absolute differences of packed unsigned 8-bit integers in "a" and "b", then horizontally sum the 8 differences to produce a single unsigned 16-bit integer, store that sum in the low 16 bits of "dst", and zero the remaining bits of "dst". FOR j := 0 to 7 i := j*8 tmp[i+7:i] := ABS(a[i+7:i] - b[i+7:i]) ENDFOR dst[15:0] := tmp[7:0] + tmp[15:8] + tmp[23:16] + tmp[31:24] + tmp[39:32] + tmp[47:40] + tmp[55:48] + tmp[63:56] dst[63:16] := 0
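Following the operation pseudocode of the sum-of-absolute-differences entries above, the eight byte differences collapse into one sum that lands in the low 16 bits of the result. A minimal portable sketch of that computation is shown below; the function name is hypothetical and a 64-bit integer stands in for the packed vector, with byte 0 in bits [7:0].

```c
#include <stdint.h>

/* Sum of |a_j - b_j| over the eight unsigned bytes of a and b. */
static uint64_t sad_pu8_model(uint64_t a, uint64_t b)
{
    uint64_t sum = 0;
    for (unsigned j = 0; j < 8; j++) {
        uint8_t x = (uint8_t)(a >> (j * 8u));
        uint8_t y = (uint8_t)(b >> (j * 8u));
        sum += (x > y) ? (uint64_t)(x - y) : (uint64_t)(y - x);
    }
    /* At most 8 * 255 = 2040, so the sum fits in bits [15:0] and the rest stay 0. */
    return sum;
}
```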
Floating Point SSE Convert Convert the signed 32-bit integer "b" to a single-precision (32-bit) floating-point element, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". dst[31:0] := Convert_Int32_To_FP32(b[31:0]) dst[127:32] := a[127:32]
Integer SSE Convert Convert the signed 32-bit integer "b" to a single-precision (32-bit) floating-point element, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". dst[31:0] := Convert_Int32_To_FP32(b[31:0]) dst[127:32] := a[127:32]
Floating Point Integer SSE Convert Convert the signed 64-bit integer "b" to a single-precision (32-bit) floating-point element, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". dst[31:0] := Convert_Int64_To_FP32(b[63:0]) dst[127:32] := a[127:32] dst[MAX:128] := 0
Floating Point SSE Convert Convert packed 32-bit integers in "b" to packed single-precision (32-bit) floating-point elements, store the results in the lower 2 elements of "dst", and copy the upper 2 packed elements from "a" to the upper elements of "dst". dst[31:0] := Convert_Int32_To_FP32(b[31:0]) dst[63:32] := Convert_Int32_To_FP32(b[63:32]) dst[95:64] := a[95:64] dst[127:96] := a[127:96]
Integer SSE Convert Convert packed signed 32-bit integers in "b" to packed single-precision (32-bit) floating-point elements, store the results in the lower 2 elements of "dst", and copy the upper 2 packed elements from "a" to the upper elements of "dst". dst[31:0] := Convert_Int32_To_FP32(b[31:0]) dst[63:32] := Convert_Int32_To_FP32(b[63:32]) dst[95:64] := a[95:64] dst[127:96] := a[127:96]
Floating Point SSE Convert Convert packed 16-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst". FOR j := 0 to 3 i := j*16 m := j*32 dst[m+31:m] := Convert_Int16_To_FP32(a[i+15:i]) ENDFOR
Floating Point Integer SSE Convert Convert packed unsigned 16-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst". FOR j := 0 to 3 i := j*16 m := j*32 dst[m+31:m] := Convert_Int16_To_FP32(a[i+15:i]) ENDFOR
Floating Point SSE Convert Convert the lower packed 8-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst". FOR j := 0 to 3 i := j*8 m := j*32 dst[m+31:m] := Convert_Int8_To_FP32(a[i+7:i]) ENDFOR
Floating Point Integer SSE Convert Convert the lower packed unsigned 8-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst". FOR j := 0 to 3 i := j*8 m := j*32 dst[m+31:m] := Convert_Int8_To_FP32(a[i+7:i]) ENDFOR
Floating Point SSE Convert Convert packed signed 32-bit integers in "a" to packed single-precision (32-bit) floating-point elements, store the results in the lower 2 elements of "dst", then convert the packed signed 32-bit integers in "b" to single-precision (32-bit) floating-point elements, and store the results in the upper 2 elements of "dst". dst[31:0] := Convert_Int32_To_FP32(a[31:0]) dst[63:32] := Convert_Int32_To_FP32(a[63:32]) dst[95:64] := Convert_Int32_To_FP32(b[31:0]) dst[127:96] := Convert_Int32_To_FP32(b[63:32])
Integer SSE Store Store 64-bits of integer data from "a" into memory using a non-temporal memory hint. MEM[mem_addr+63:mem_addr] := a[63:0]
Integer SSE Store Conditionally store 8-bit integer elements from "a" into memory using "mask" (elements are not stored when the highest bit is not set in the corresponding element) and a non-temporal memory hint. FOR j := 0 to 7 i := j*8 IF mask[i+7] MEM[mem_addr+i+7:mem_addr+i] := a[i+7:i] FI ENDFOR
Integer SSE Store Conditionally store 8-bit integer elements from "a" into memory using "mask" (elements are not stored when the highest bit is not set in the corresponding element). FOR j := 0 to 7 i := j*8 IF mask[i+7] MEM[mem_addr+i+7:mem_addr+i] := a[i+7:i] FI ENDFOR
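In the conditional-store entries above, only the highest bit of each mask byte decides whether the matching data byte is written; memory at unselected positions is left untouched. The sketch below models that in portable C. The function name is hypothetical, a 64-bit integer stands in for each packed operand, and the non-temporal hint has no counterpart in this model.

```c
#include <stdint.h>

/* FOR j := 0 to 7: IF mask[j*8+7] THEN mem_addr[j] := byte j of a. */
static void maskmove_model(uint64_t a, uint64_t mask, uint8_t *mem_addr)
{
    for (unsigned j = 0; j < 8; j++) {
        unsigned i = j * 8u;
        if ((mask >> (i + 7u)) & 1u)
            mem_addr[j] = (uint8_t)(a >> i);
    }
}
```

A mask byte of 0x7F therefore stores nothing, while 0x80 stores the byte, since only bit 7 of each mask element is examined.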
Integer SSE Swizzle Extract a 16-bit integer from "a", selected with "imm8", and store the result in the lower element of "dst". dst[15:0] := (a[63:0] >> (imm8[1:0] * 16))[15:0] dst[31:16] := 0
Integer SSE Swizzle Extract a 16-bit integer from "a", selected with "imm8", and store the result in the lower element of "dst". dst[15:0] := (a[63:0] >> (imm8[1:0] * 16))[15:0] dst[31:16] := 0
Integer SSE Swizzle Copy "a" to "dst", and insert the 16-bit integer "i" into "dst" at the location specified by "imm8". dst[63:0] := a[63:0] sel := imm8[1:0]*16 dst[sel+15:sel] := i[15:0]
Integer SSE Swizzle Copy "a" to "dst", and insert the 16-bit integer "i" into "dst" at the location specified by "imm8". dst[63:0] := a[63:0] sel := imm8[1:0]*16 dst[sel+15:sel] := i[15:0]
Integer SSE Miscellaneous Create mask from the most significant bit of each 8-bit element in "a", and store the result in "dst". FOR j := 0 to 7 i := j*8 dst[j] := a[i+7] ENDFOR dst[MAX:8] := 0
Integer SSE Miscellaneous Create mask from the most significant bit of each 8-bit element in "a", and store the result in "dst". FOR j := 0 to 7 i := j*8 dst[j] := a[i+7] ENDFOR dst[MAX:8] := 0
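The mask-creation entries above gather the top bit of each byte into consecutive low bits of an integer. A minimal portable sketch of that loop follows; the function name is hypothetical and a 64-bit integer stands in for the packed operand.

```c
#include <stdint.h>

/* FOR j := 0 to 7: dst[j] := a[j*8+7]; bits above 7 stay 0. */
static int movemask_pi8_model(uint64_t a)
{
    int dst = 0;
    for (unsigned j = 0; j < 8; j++)
        dst |= (int)((a >> (j * 8u + 7u)) & 1u) << j;
    return dst;
}
```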
Integer SSE Swizzle Shuffle 16-bit integers in "a" using the control in "imm8", and store the results in "dst". DEFINE SELECT4(src, control) { CASE(control[1:0]) OF 0: tmp[15:0] := src[15:0] 1: tmp[15:0] := src[31:16] 2: tmp[15:0] := src[47:32] 3: tmp[15:0] := src[63:48] ESAC RETURN tmp[15:0] } dst[15:0] := SELECT4(a[63:0], imm8[1:0]) dst[31:16] := SELECT4(a[63:0], imm8[3:2]) dst[47:32] := SELECT4(a[63:0], imm8[5:4]) dst[63:48] := SELECT4(a[63:0], imm8[7:6])
Floating Point Integer SSE Swizzle Shuffle 16-bit integers in "a" using the control in "imm8", and store the results in "dst". DEFINE SELECT4(src, control) { CASE(control[1:0]) OF 0: tmp[15:0] := src[15:0] 1: tmp[15:0] := src[31:16] 2: tmp[15:0] := src[47:32] 3: tmp[15:0] := src[63:48] ESAC RETURN tmp[15:0] } dst[15:0] := SELECT4(a[63:0], imm8[1:0]) dst[31:16] := SELECT4(a[63:0], imm8[3:2]) dst[47:32] := SELECT4(a[63:0], imm8[5:4]) dst[63:48] := SELECT4(a[63:0], imm8[7:6])
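The shuffle entries above read "imm8" as four 2-bit selectors, one per destination element, each choosing one of the four 16-bit source elements. The sketch below models SELECT4 and the shuffle in portable C; the function names are hypothetical and a 64-bit integer stands in for the packed operand, with element 0 in bits [15:0].

```c
#include <stdint.h>

/* SELECT4: return the 16-bit element of src chosen by control[1:0]. */
static uint16_t select4_model(uint64_t src, unsigned control)
{
    return (uint16_t)(src >> ((control & 3u) * 16u));
}

/* Destination element k is chosen by imm8[2k+1:2k]. */
static uint64_t shuffle_pi16_model(uint64_t a, unsigned imm8)
{
    uint64_t dst = 0;
    for (unsigned k = 0; k < 4; k++)
        dst |= (uint64_t)select4_model(a, imm8 >> (2u * k)) << (16u * k);
    return dst;
}
```

Under this encoding, 0xE4 is the identity shuffle, 0x1B reverses the four elements, and 0x00 broadcasts element 0 to every position.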
Floating Point SSE Arithmetic Add the lower single-precision (32-bit) floating-point element in "a" and "b", store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". dst[31:0] := a[31:0] + b[31:0] dst[127:32] := a[127:32]
Floating Point SSE Arithmetic Add packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst". FOR j := 0 to 3 i := j*32 dst[i+31:i] := a[i+31:i] + b[i+31:i] ENDFOR
Floating Point SSE Arithmetic Subtract the lower single-precision (32-bit) floating-point element in "b" from the lower single-precision (32-bit) floating-point element in "a", store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". dst[31:0] := a[31:0] - b[31:0] dst[127:32] := a[127:32]
Floating Point SSE Arithmetic Subtract packed single-precision (32-bit) floating-point elements in "b" from packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst". FOR j := 0 to 3 i := j*32 dst[i+31:i] := a[i+31:i] - b[i+31:i] ENDFOR
Floating Point SSE Arithmetic Multiply the lower single-precision (32-bit) floating-point element in "a" and "b", store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". dst[31:0] := a[31:0] * b[31:0] dst[127:32] := a[127:32]
Floating Point SSE Arithmetic Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst". FOR j := 0 to 3 i := j*32 dst[i+31:i] := a[i+31:i] * b[i+31:i] ENDFOR
Floating Point SSE Arithmetic Divide the lower single-precision (32-bit) floating-point element in "a" by the lower single-precision (32-bit) floating-point element in "b", store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". dst[31:0] := a[31:0] / b[31:0] dst[127:32] := a[127:32]
Floating Point SSE Arithmetic Divide packed single-precision (32-bit) floating-point elements in "a" by packed elements in "b", and store the results in "dst". FOR j := 0 to 3 i := 32*j dst[i+31:i] := a[i+31:i] / b[i+31:i] ENDFOR
Floating Point SSE Elementary Math Functions Compute the square root of the lower single-precision (32-bit) floating-point element in "a", store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". dst[31:0] := SQRT(a[31:0]) dst[127:32] := a[127:32]
Floating Point SSE Elementary Math Functions Compute the square root of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst". FOR j := 0 to 3 i := j*32 dst[i+31:i] := SQRT(a[i+31:i]) ENDFOR
Floating Point SSE Elementary Math Functions Compute the approximate reciprocal of the lower single-precision (32-bit) floating-point element in "a", store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". The maximum relative error for this approximation is less than 1.5*2^-12. dst[31:0] := (1.0 / a[31:0]) dst[127:32] := a[127:32]
Floating Point SSE Elementary Math Functions Compute the approximate reciprocal of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst". The maximum relative error for this approximation is less than 1.5*2^-12. FOR j := 0 to 3 i := j*32 dst[i+31:i] := (1.0 / a[i+31:i]) ENDFOR
Floating Point SSE Elementary Math Functions Compute the approximate reciprocal square root of the lower single-precision (32-bit) floating-point element in "a", store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". The maximum relative error for this approximation is less than 1.5*2^-12. dst[31:0] := (1.0 / SQRT(a[31:0])) dst[127:32] := a[127:32]
Floating Point SSE Elementary Math Functions Compute the approximate reciprocal square root of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst". The maximum relative error for this approximation is less than 1.5*2^-12. FOR j := 0 to 3 i := j*32 dst[i+31:i] := (1.0 / SQRT(a[i+31:i])) ENDFOR
Floating Point SSE Special Math Functions Compare the lower single-precision (32-bit) floating-point elements in "a" and "b", store the minimum value in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". dst[31:0] := MIN(a[31:0], b[31:0]) dst[127:32] := a[127:32]
Floating Point SSE Special Math Functions Compare packed single-precision (32-bit) floating-point elements in "a" and "b", and store packed minimum values in "dst". FOR j := 0 to 3 i := j*32 dst[i+31:i] := MIN(a[i+31:i], b[i+31:i]) ENDFOR
Floating Point SSE Special Math Functions Compare the lower single-precision (32-bit) floating-point elements in "a" and "b", store the maximum value in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". dst[31:0] := MAX(a[31:0], b[31:0]) dst[127:32] := a[127:32]
Floating Point SSE Special Math Functions Compare packed single-precision (32-bit) floating-point elements in "a" and "b", and store packed maximum values in "dst". FOR j := 0 to 3 i := j*32 dst[i+31:i] := MAX(a[i+31:i], b[i+31:i]) ENDFOR
Floating Point SSE Logical Compute the bitwise AND of packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst". FOR j := 0 to 3 i := j*32 dst[i+31:i] := (a[i+31:i] AND b[i+31:i]) ENDFOR
Floating Point SSE Logical Compute the bitwise NOT of packed single-precision (32-bit) floating-point elements in "a" and then AND with "b", and store the results in "dst". FOR j := 0 to 3 i := j*32 dst[i+31:i] := ((NOT a[i+31:i]) AND b[i+31:i]) ENDFOR
Floating Point SSE Logical Compute the bitwise OR of packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst". FOR j := 0 to 3 i := j*32 dst[i+31:i] := a[i+31:i] OR b[i+31:i] ENDFOR
Floating Point SSE Logical Compute the bitwise XOR of packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst". FOR j := 0 to 3 i := j*32 dst[i+31:i] := a[i+31:i] XOR b[i+31:i] ENDFOR
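The logical entries above operate on raw bit patterns, which makes sign manipulation cheap. The sketch below assumes they correspond to `_mm_andnot_ps` and `_mm_xor_ps`; `abs_ps` and `neg_ps` are hypothetical helper names.

```c
#include <xmmintrin.h>

/* Absolute value of four floats: clear the sign bit of every lane.
   _mm_andnot_ps(a, b) computes (NOT a) AND b, so passing the sign mask
   as the first operand keeps every bit of x except bit 31. */
static __m128 abs_ps(__m128 x)
{
    const __m128 sign_mask = _mm_set1_ps(-0.0f);   /* 0x80000000 per lane */
    return _mm_andnot_ps(sign_mask, x);
}

/* Negate four floats: XOR with the sign mask flips bit 31 of every lane. */
static __m128 neg_ps(__m128 x)
{
    return _mm_xor_ps(x, _mm_set1_ps(-0.0f));
}
```

Note the operand order of the AND-NOT form: the first operand is the one that gets inverted.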
Floating Point SSE Compare Compare the lower single-precision (32-bit) floating-point elements in "a" and "b" for equality, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". dst[31:0] := ( a[31:0] == b[31:0] ) ? 0xFFFFFFFF : 0 dst[127:32] := a[127:32]
Floating Point SSE Compare Compare packed single-precision (32-bit) floating-point elements in "a" and "b" for equality, and store the results in "dst". FOR j := 0 to 3 i := j*32 dst[i+31:i] := ( a[i+31:i] == b[i+31:i] ) ? 0xFFFFFFFF : 0 ENDFOR
Floating Point SSE Compare Compare the lower single-precision (32-bit) floating-point elements in "a" and "b" for less-than, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". dst[31:0] := ( a[31:0] < b[31:0] ) ? 0xFFFFFFFF : 0 dst[127:32] := a[127:32]
Floating Point SSE Compare Compare packed single-precision (32-bit) floating-point elements in "a" and "b" for less-than, and store the results in "dst". FOR j := 0 to 3 i := j*32 dst[i+31:i] := ( a[i+31:i] < b[i+31:i] ) ? 0xFFFFFFFF : 0 ENDFOR
Floating Point SSE Compare Compare the lower single-precision (32-bit) floating-point elements in "a" and "b" for less-than-or-equal, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". dst[31:0] := ( a[31:0] <= b[31:0] ) ? 0xFFFFFFFF : 0 dst[127:32] := a[127:32]
Floating Point SSE Compare Compare packed single-precision (32-bit) floating-point elements in "a" and "b" for less-than-or-equal, and store the results in "dst". FOR j := 0 to 3 i := j*32 dst[i+31:i] := ( a[i+31:i] <= b[i+31:i] ) ? 0xFFFFFFFF : 0 ENDFOR
Floating Point SSE Compare Compare the lower single-precision (32-bit) floating-point elements in "a" and "b" for greater-than, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". dst[31:0] := ( a[31:0] > b[31:0] ) ? 0xFFFFFFFF : 0 dst[127:32] := a[127:32]
Floating Point SSE Compare Compare packed single-precision (32-bit) floating-point elements in "a" and "b" for greater-than, and store the results in "dst". FOR j := 0 to 3 i := j*32 dst[i+31:i] := ( a[i+31:i] > b[i+31:i] ) ? 0xFFFFFFFF : 0 ENDFOR
Floating Point SSE Compare Compare the lower single-precision (32-bit) floating-point elements in "a" and "b" for greater-than-or-equal, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". dst[31:0] := ( a[31:0] >= b[31:0] ) ? 0xFFFFFFFF : 0 dst[127:32] := a[127:32]
Floating Point SSE Compare Compare packed single-precision (32-bit) floating-point elements in "a" and "b" for greater-than-or-equal, and store the results in "dst". FOR j := 0 to 3 i := j*32 dst[i+31:i] := ( a[i+31:i] >= b[i+31:i] ) ? 0xFFFFFFFF : 0 ENDFOR
Floating Point SSE Compare Compare the lower single-precision (32-bit) floating-point elements in "a" and "b" for not-equal, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". dst[31:0] := ( a[31:0] != b[31:0] ) ? 0xFFFFFFFF : 0 dst[127:32] := a[127:32]
Floating Point SSE Compare Compare packed single-precision (32-bit) floating-point elements in "a" and "b" for not-equal, and store the results in "dst". FOR j := 0 to 3 i := j*32 dst[i+31:i] := ( a[i+31:i] != b[i+31:i] ) ? 0xFFFFFFFF : 0 ENDFOR
Floating Point SSE Compare Compare the lower single-precision (32-bit) floating-point elements in "a" and "b" for not-less-than, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". dst[31:0] := (!( a[31:0] < b[31:0] )) ? 0xFFFFFFFF : 0 dst[127:32] := a[127:32]
Floating Point SSE Compare Compare packed single-precision (32-bit) floating-point elements in "a" and "b" for not-less-than, and store the results in "dst". FOR j := 0 to 3 i := j*32 dst[i+31:i] := (!( a[i+31:i] < b[i+31:i] )) ? 0xFFFFFFFF : 0 ENDFOR
Floating Point SSE Compare Compare the lower single-precision (32-bit) floating-point elements in "a" and "b" for not-less-than-or-equal, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". dst[31:0] := (!( a[31:0] <= b[31:0] )) ? 0xFFFFFFFF : 0 dst[127:32] := a[127:32]
Floating Point SSE Compare Compare packed single-precision (32-bit) floating-point elements in "a" and "b" for not-less-than-or-equal, and store the results in "dst". FOR j := 0 to 3 i := j*32 dst[i+31:i] := (!( a[i+31:i] <= b[i+31:i] )) ? 0xFFFFFFFF : 0 ENDFOR
Floating Point SSE Compare Compare the lower single-precision (32-bit) floating-point elements in "a" and "b" for not-greater-than, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". dst[31:0] := (!( a[31:0] > b[31:0] )) ? 0xFFFFFFFF : 0 dst[127:32] := a[127:32]
Floating Point SSE Compare Compare packed single-precision (32-bit) floating-point elements in "a" and "b" for not-greater-than, and store the results in "dst". FOR j := 0 to 3 i := j*32 dst[i+31:i] := (!( a[i+31:i] > b[i+31:i] )) ? 0xFFFFFFFF : 0 ENDFOR
Floating Point SSE Compare Compare the lower single-precision (32-bit) floating-point elements in "a" and "b" for not-greater-than-or-equal, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". dst[31:0] := (!( a[31:0] >= b[31:0] )) ? 0xFFFFFFFF : 0 dst[127:32] := a[127:32]
Floating Point SSE Compare Compare packed single-precision (32-bit) floating-point elements in "a" and "b" for not-greater-than-or-equal, and store the results in "dst". FOR j := 0 to 3 i := j*32 dst[i+31:i] := (!( a[i+31:i] >= b[i+31:i] )) ? 0xFFFFFFFF : 0 ENDFOR
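The negated predicates above are not the same as their positive counterparts when a NaN is involved: `!(a < b)` is true for a NaN operand, while `a >= b` is false. The sketch below assumes the packed entries correspond to `_mm_cmpnlt_ps` and `_mm_cmpge_ps`, and uses `_mm_movemask_ps` to turn each result into a 4-bit lane mask; the two function names are hypothetical.

```c
#include <xmmintrin.h>

/* Bit j of the result is 1 when lane j satisfies !(a < b). */
static int not_less_than_mask(__m128 a, __m128 b)
{
    return _mm_movemask_ps(_mm_cmpnlt_ps(a, b));
}

/* Bit j of the result is 1 when lane j satisfies a >= b. */
static int greater_equal_mask(__m128 a, __m128 b)
{
    return _mm_movemask_ps(_mm_cmpge_ps(a, b));
}
```

With a = {1, 2, NaN, 3} and b = {1, 3, 0, NaN} (lowest lane first), the first function returns 0xD and the second returns 0x1: the two lanes containing a NaN pass the negated test but fail the positive one.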
Floating Point SSE Compare Compare the lower single-precision (32-bit) floating-point elements in "a" and "b" to see if neither is NaN, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". dst[31:0] := ( a[31:0] != NaN AND b[31:0] != NaN ) ? 0xFFFFFFFF : 0 dst[127:32] := a[127:32]
Floating Point SSE Compare Compare packed single-precision (32-bit) floating-point elements in "a" and "b" to see if neither is NaN, and store the results in "dst". FOR j := 0 to 3 i := j*32 dst[i+31:i] := ( a[i+31:i] != NaN AND b[i+31:i] != NaN ) ? 0xFFFFFFFF : 0 ENDFOR
Floating Point SSE Compare Compare the lower single-precision (32-bit) floating-point elements in "a" and "b" to see if either is NaN, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". dst[31:0] := ( a[31:0] == NaN OR b[31:0] == NaN ) ? 0xFFFFFFFF : 0 dst[127:32] := a[127:32]
Floating Point SSE Compare Compare packed single-precision (32-bit) floating-point elements in "a" and "b" to see if either is NaN, and store the results in "dst". FOR j := 0 to 3 i := j*32 dst[i+31:i] := ( a[i+31:i] == NaN OR b[i+31:i] == NaN ) ? 0xFFFFFFFF : 0 ENDFOR
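Because the packed compares produce all-ones or all-zeros per lane, their results can drive a branchless per-lane select. This is a sketch under the assumption that the "either is NaN" entry corresponds to `_mm_cmpunord_ps`; `select_ps` and `replace_nan_ps` are hypothetical helpers.

```c
#include <xmmintrin.h>

/* Per-lane select: where mask is all ones take a, elsewhere take b. */
static __m128 select_ps(__m128 mask, __m128 a, __m128 b)
{
    return _mm_or_ps(_mm_and_ps(mask, a), _mm_andnot_ps(mask, b));
}

/* Replace NaN lanes of v with a fallback value. Comparing v with itself
   for "unordered" is true exactly in the lanes that hold a NaN. */
static __m128 replace_nan_ps(__m128 v, float fallback)
{
    __m128 is_nan = _mm_cmpunord_ps(v, v);
    return select_ps(is_nan, _mm_set1_ps(fallback), v);
}
```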
Floating Point Flag SSE Compare Compare the lower single-precision (32-bit) floating-point element in "a" and "b" for equality, and return the boolean result (0 or 1). RETURN ( a[31:0] == b[31:0] ) ? 1 : 0
Floating Point Flag SSE Compare Compare the lower single-precision (32-bit) floating-point element in "a" and "b" for less-than, and return the boolean result (0 or 1). RETURN ( a[31:0] < b[31:0] ) ? 1 : 0
Floating Point Flag SSE Compare Compare the lower single-precision (32-bit) floating-point element in "a" and "b" for less-than-or-equal, and return the boolean result (0 or 1). RETURN ( a[31:0] <= b[31:0] ) ? 1 : 0
Floating Point Flag SSE Compare Compare the lower single-precision (32-bit) floating-point element in "a" and "b" for greater-than, and return the boolean result (0 or 1). RETURN ( a[31:0] > b[31:0] ) ? 1 : 0
Floating Point Flag SSE Compare Compare the lower single-precision (32-bit) floating-point element in "a" and "b" for greater-than-or-equal, and return the boolean result (0 or 1). RETURN ( a[31:0] >= b[31:0] ) ? 1 : 0
Floating Point Flag SSE Compare Compare the lower single-precision (32-bit) floating-point element in "a" and "b" for not-equal, and return the boolean result (0 or 1). RETURN ( a[31:0] != b[31:0] ) ? 1 : 0
Floating Point Flag SSE Compare Compare the lower single-precision (32-bit) floating-point element in "a" and "b" for equality, and return the boolean result (0 or 1). This instruction will not signal an exception for QNaNs. RETURN ( a[31:0] == b[31:0] ) ? 1 : 0
Floating Point Flag SSE Compare Compare the lower single-precision (32-bit) floating-point element in "a" and "b" for less-than, and return the boolean result (0 or 1). This instruction will not signal an exception for QNaNs. RETURN ( a[31:0] < b[31:0] ) ? 1 : 0
Floating Point Flag SSE Compare Compare the lower single-precision (32-bit) floating-point element in "a" and "b" for less-than-or-equal, and return the boolean result (0 or 1). This instruction will not signal an exception for QNaNs. RETURN ( a[31:0] <= b[31:0] ) ? 1 : 0
Floating Point Flag SSE Compare Compare the lower single-precision (32-bit) floating-point element in "a" and "b" for greater-than, and return the boolean result (0 or 1). This instruction will not signal an exception for QNaNs. RETURN ( a[31:0] > b[31:0] ) ? 1 : 0
Floating Point Flag SSE Compare Compare the lower single-precision (32-bit) floating-point element in "a" and "b" for greater-than-or-equal, and return the boolean result (0 or 1). This instruction will not signal an exception for QNaNs. RETURN ( a[31:0] >= b[31:0] ) ? 1 : 0
Floating Point Flag SSE Compare Compare the lower single-precision (32-bit) floating-point element in "a" and "b" for not-equal, and return the boolean result (0 or 1). This instruction will not signal an exception for QNaNs. RETURN ( a[31:0] != b[31:0] ) ? 1 : 0
Floating Point Integer SSE Convert Convert the lower single-precision (32-bit) floating-point element in "a" to a 32-bit integer, and store the result in "dst". dst[31:0] := Convert_FP32_To_Int32(a[31:0])
Floating Point SSE Convert Convert the lower single-precision (32-bit) floating-point element in "a" to a 32-bit integer, and store the result in "dst". dst[31:0] := Convert_FP32_To_Int32(a[31:0])
Floating Point Integer SSE Convert Convert the lower single-precision (32-bit) floating-point element in "a" to a 64-bit integer, and store the result in "dst". dst[63:0] := Convert_FP32_To_Int64(a[31:0])
Floating Point SSE Convert Copy the lower single-precision (32-bit) floating-point element of "a" to "dst". dst[31:0] := a[31:0]
Floating Point Integer SSE Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed 32-bit integers, and store the results in "dst". FOR j := 0 to 1 i := 32*j dst[i+31:i] := Convert_FP32_To_Int32(a[i+31:i]) ENDFOR
Floating Point SSE Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed 32-bit integers, and store the results in "dst". FOR j := 0 to 1 i := 32*j dst[i+31:i] := Convert_FP32_To_Int32(a[i+31:i]) ENDFOR
Floating Point Integer SSE Convert Convert the lower single-precision (32-bit) floating-point element in "a" to a 32-bit integer with truncation, and store the result in "dst". dst[31:0] := Convert_FP32_To_Int32_Truncate(a[31:0])
Floating Point SSE Convert Convert the lower single-precision (32-bit) floating-point element in "a" to a 32-bit integer with truncation, and store the result in "dst". dst[31:0] := Convert_FP32_To_Int32_Truncate(a[31:0])
Floating Point Integer SSE Convert Convert the lower single-precision (32-bit) floating-point element in "a" to a 64-bit integer with truncation, and store the result in "dst". dst[63:0] := Convert_FP32_To_Int64_Truncate(a[31:0])
Floating Point Integer SSE Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed 32-bit integers with truncation, and store the results in "dst". FOR j := 0 to 1 i := 32*j dst[i+31:i] := Convert_FP32_To_Int32_Truncate(a[i+31:i]) ENDFOR
Floating Point SSE Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed 32-bit integers with truncation, and store the results in "dst". FOR j := 0 to 1 i := 32*j dst[i+31:i] := Convert_FP32_To_Int32_Truncate(a[i+31:i]) ENDFOR
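The difference between the plain and the "with truncation" conversions shows up on non-integral inputs. The sketch below assumes the scalar entries correspond to `_mm_cvtss_si32` and `_mm_cvttss_si32`, and that the MXCSR rounding mode is still at its default (round to nearest, ties to even); the wrapper names are hypothetical.

```c
#include <xmmintrin.h>

/* Convert using the current MXCSR rounding mode
   (round-to-nearest-even unless the program has changed it). */
static int to_int_rounded(float x)
{
    return _mm_cvtss_si32(_mm_set_ss(x));
}

/* Convert by truncating toward zero, regardless of the rounding mode. */
static int to_int_truncated(float x)
{
    return _mm_cvttss_si32(_mm_set_ss(x));
}
```

Under the default mode, 2.7 rounds to 3 and truncates to 2, while the tie 2.5 rounds to the even neighbour 2.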
Floating Point Integer SSE Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed 16-bit integers, and store the results in "dst". Note: this intrinsic will generate 0x7FFF, rather than 0x8000, for input values between 0x7FFF and 0x7FFFFFFF. FOR j := 0 to 3 i := 16*j k := 32*j IF a[k+31:k] >= FP32(0x7FFF) && a[k+31:k] <= FP32(0x7FFFFFFF) dst[i+15:i] := 0x7FFF ELSE dst[i+15:i] := Convert_FP32_To_Int16(a[k+31:k]) FI ENDFOR
Floating Point Integer SSE Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed 8-bit integers, and store the results in the lower 4 elements of "dst". Note: this intrinsic will generate 0x7F, rather than 0x80, for input values between 0x7F and 0x7FFFFFFF. FOR j := 0 to 3 i := 8*j k := 32*j IF a[k+31:k] >= FP32(0x7F) && a[k+31:k] <= FP32(0x7FFFFFFF) dst[i+7:i] := 0x7F ELSE dst[i+7:i] := Convert_FP32_To_Int8(a[k+31:k]) FI ENDFOR
Floating Point SSE Set Copy single-precision (32-bit) floating-point element "a" to the lower element of "dst", and zero the upper 3 elements. dst[31:0] := a[31:0] dst[127:32] := 0
Floating Point SSE Set Broadcast single-precision (32-bit) floating-point value "a" to all elements of "dst". FOR j := 0 to 3 i := j*32 dst[i+31:i] := a[31:0] ENDFOR
Floating Point SSE Set Broadcast single-precision (32-bit) floating-point value "a" to all elements of "dst". FOR j := 0 to 3 i := j*32 dst[i+31:i] := a[31:0] ENDFOR
Floating Point SSE Set Set packed single-precision (32-bit) floating-point elements in "dst" with the supplied values. dst[31:0] := e0 dst[63:32] := e1 dst[95:64] := e2 dst[127:96] := e3
Floating Point SSE Set Set packed single-precision (32-bit) floating-point elements in "dst" with the supplied values in reverse order. dst[31:0] := e3 dst[63:32] := e2 dst[95:64] := e1 dst[127:96] := e0
Floating Point SSE Set Return vector of type __m128 with all elements set to zero. dst[MAX:0] := 0
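The two "Set packed" entries above differ only in argument order, which is easy to get backwards. The sketch below assumes they correspond to `_mm_set_ps` (arguments listed from the highest lane down) and `_mm_setr_ps` (arguments listed from the lowest lane up); both hypothetical helpers build the same vector.

```c
#include <xmmintrin.h>

/* Lanes 0..3 hold 0, 1, 2, 3: arguments run from lane 3 down to lane 0. */
static __m128 ascending_set(void)
{
    return _mm_set_ps(3.0f, 2.0f, 1.0f, 0.0f);
}

/* Same vector: arguments run from lane 0 up to lane 3. */
static __m128 ascending_setr(void)
{
    return _mm_setr_ps(0.0f, 1.0f, 2.0f, 3.0f);
}
```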
Integer SSE Load Load 2 single-precision (32-bit) floating-point elements from memory into the upper 2 elements of "dst", and copy the lower 2 elements from "a" to "dst". "mem_addr" does not need to be aligned on any particular boundary. dst[31:0] := a[31:0] dst[63:32] := a[63:32] dst[95:64] := MEM[mem_addr+31:mem_addr] dst[127:96] := MEM[mem_addr+63:mem_addr+32]
Integer SSE Load Load 2 single-precision (32-bit) floating-point elements from memory into the lower 2 elements of "dst", and copy the upper 2 elements from "a" to "dst". "mem_addr" does not need to be aligned on any particular boundary. dst[31:0] := MEM[mem_addr+31:mem_addr] dst[63:32] := MEM[mem_addr+63:mem_addr+32] dst[95:64] := a[95:64] dst[127:96] := a[127:96]
Floating Point SSE Load Load a single-precision (32-bit) floating-point element from memory into the lower element of "dst", and zero the upper 3 elements. "mem_addr" does not need to be aligned on any particular boundary. dst[31:0] := MEM[mem_addr+31:mem_addr] dst[127:32] := 0
Floating Point SSE Load Load a single-precision (32-bit) floating-point element from memory into all elements of "dst". dst[31:0] := MEM[mem_addr+31:mem_addr] dst[63:32] := MEM[mem_addr+31:mem_addr] dst[95:64] := MEM[mem_addr+31:mem_addr] dst[127:96] := MEM[mem_addr+31:mem_addr]
Floating Point SSE Load Load a single-precision (32-bit) floating-point element from memory into all elements of "dst". dst[31:0] := MEM[mem_addr+31:mem_addr] dst[63:32] := MEM[mem_addr+31:mem_addr] dst[95:64] := MEM[mem_addr+31:mem_addr] dst[127:96] := MEM[mem_addr+31:mem_addr]
Floating Point SSE Load Load 128-bits (composed of 4 packed single-precision (32-bit) floating-point elements) from memory into "dst". "mem_addr" must be aligned on a 16-byte boundary or a general-protection exception may be generated. dst[127:0] := MEM[mem_addr+127:mem_addr]
Floating Point SSE Load Load 128-bits (composed of 4 packed single-precision (32-bit) floating-point elements) from memory into "dst". "mem_addr" does not need to be aligned on any particular boundary. dst[127:0] := MEM[mem_addr+127:mem_addr]
Floating Point SSE Load Load 4 single-precision (32-bit) floating-point elements from memory into "dst" in reverse order. "mem_addr" must be aligned on a 16-byte boundary or a general-protection exception may be generated. dst[31:0] := MEM[mem_addr+127:mem_addr+96] dst[63:32] := MEM[mem_addr+95:mem_addr+64] dst[95:64] := MEM[mem_addr+63:mem_addr+32] dst[127:96] := MEM[mem_addr+31:mem_addr]
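The load entries above differ mainly in alignment requirements and lane placement. This sketch assumes the reverse-order load corresponds to `_mm_loadr_ps` and the broadcast load to `_mm_load1_ps`; `reverse4` and `splat4` are hypothetical helpers, and the alignment precondition is checked explicitly rather than left to fault.

```c
#include <assert.h>
#include <stdint.h>
#include <xmmintrin.h>

/* Copy four floats from src to dst in reverse order.
   src must be 16-byte aligned (aligned reverse load); dst may be unaligned. */
static void reverse4(const float *src, float *dst)
{
    assert(((uintptr_t)src & 15u) == 0);
    _mm_storeu_ps(dst, _mm_loadr_ps(src));
}

/* Fill dst[0..3] with copies of *value using a broadcast load. */
static void splat4(const float *value, float *dst)
{
    _mm_storeu_ps(dst, _mm_load1_ps(value));
}
```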
Floating Point SSE Store Store 128-bits (composed of 4 packed single-precision (32-bit) floating-point elements) from "a" into memory using a non-temporal memory hint. "mem_addr" must be aligned on a 16-byte boundary or a general-protection exception may be generated. MEM[mem_addr+127:mem_addr] := a[127:0]
Integer SSE Store Store the upper 2 single-precision (32-bit) floating-point elements from "a" into memory. MEM[mem_addr+31:mem_addr] := a[95:64] MEM[mem_addr+63:mem_addr+32] := a[127:96]
Integer SSE Store Store the lower 2 single-precision (32-bit) floating-point elements from "a" into memory. MEM[mem_addr+31:mem_addr] := a[31:0] MEM[mem_addr+63:mem_addr+32] := a[63:32]
Floating Point SSE Store Store the lower single-precision (32-bit) floating-point element from "a" into memory. "mem_addr" does not need to be aligned on any particular boundary. MEM[mem_addr+31:mem_addr] := a[31:0]
Floating Point SSE Store Store the lower single-precision (32-bit) floating-point element from "a" into 4 contiguous elements in memory. "mem_addr" must be aligned on a 16-byte boundary or a general-protection exception may be generated. MEM[mem_addr+31:mem_addr] := a[31:0] MEM[mem_addr+63:mem_addr+32] := a[31:0] MEM[mem_addr+95:mem_addr+64] := a[31:0] MEM[mem_addr+127:mem_addr+96] := a[31:0]
Floating Point SSE Store Store the lower single-precision (32-bit) floating-point element from "a" into 4 contiguous elements in memory. "mem_addr" must be aligned on a 16-byte boundary or a general-protection exception may be generated. MEM[mem_addr+31:mem_addr] := a[31:0] MEM[mem_addr+63:mem_addr+32] := a[31:0] MEM[mem_addr+95:mem_addr+64] := a[31:0] MEM[mem_addr+127:mem_addr+96] := a[31:0]
Floating Point SSE Store Store 128-bits (composed of 4 packed single-precision (32-bit) floating-point elements) from "a" into memory. "mem_addr" must be aligned on a 16-byte boundary or a general-protection exception may be generated. MEM[mem_addr+127:mem_addr] := a[127:0]
Floating Point SSE Store Store 128-bits (composed of 4 packed single-precision (32-bit) floating-point elements) from "a" into memory. "mem_addr" does not need to be aligned on any particular boundary. MEM[mem_addr+127:mem_addr] := a[127:0]
Floating Point SSE Store Store 4 single-precision (32-bit) floating-point elements from "a" into memory in reverse order. "mem_addr" must be aligned on a 16-byte boundary or a general-protection exception may be generated. MEM[mem_addr+31:mem_addr] := a[127:96] MEM[mem_addr+63:mem_addr+32] := a[95:64] MEM[mem_addr+95:mem_addr+64] := a[63:32] MEM[mem_addr+127:mem_addr+96] := a[31:0]
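The two half-vector stores in this group take a 64-bit destination pointer, which is why they are tagged "Integer". A minimal sketch, assuming they correspond to `_mm_storel_pi` and `_mm_storeh_pi` with a `__m64 *` parameter; `split_halves` is a hypothetical helper.

```c
#include <xmmintrin.h>

/* Split v into its lower and upper pairs of floats. */
static void split_halves(__m128 v, float lo[2], float hi[2])
{
    _mm_storel_pi((__m64 *)lo, v);   /* lanes 0 and 1 */
    _mm_storeh_pi((__m64 *)hi, v);   /* lanes 2 and 3 */
}
```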
Floating Point SSE Move Move the lower single-precision (32-bit) floating-point element from "b" to the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". dst[31:0] := b[31:0] dst[127:32] := a[127:32]
Floating Point SSE Swizzle Shuffle single-precision (32-bit) floating-point elements in "a" and "b" using the control in "imm8", and store the results in "dst". The lower 2 elements of "dst" are selected from "a" and the upper 2 elements from "b". DEFINE SELECT4(src, control) { CASE(control[1:0]) OF 0: tmp[31:0] := src[31:0] 1: tmp[31:0] := src[63:32] 2: tmp[31:0] := src[95:64] 3: tmp[31:0] := src[127:96] ESAC RETURN tmp[31:0] } dst[31:0] := SELECT4(a[127:0], imm8[1:0]) dst[63:32] := SELECT4(a[127:0], imm8[3:2]) dst[95:64] := SELECT4(b[127:0], imm8[5:4]) dst[127:96] := SELECT4(b[127:0], imm8[7:6])
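The control byte of the shuffle above is usually built with the `_MM_SHUFFLE(z, y, x, w)` macro, which packs four 2-bit selectors with `w` in bits 1:0. The sketch assumes the entry corresponds to `_mm_shuffle_ps`; the helper names are hypothetical.

```c
#include <xmmintrin.h>

/* Reverse lane order: result lanes 0..3 are v[3], v[2], v[1], v[0].
   Passing v as both sources lets all four selectors read from v. */
static __m128 reverse_lanes(__m128 v)
{
    return _mm_shuffle_ps(v, v, _MM_SHUFFLE(0, 1, 2, 3));
}

/* Result lanes 0..3 are a[0], a[1], b[0], b[1]: the two low selectors
   read from a and the two high selectors read from b. */
static __m128 low_pairs(__m128 a, __m128 b)
{
    return _mm_shuffle_ps(a, b, _MM_SHUFFLE(1, 0, 1, 0));
}
```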
Floating Point SSE Swizzle Unpack and interleave single-precision (32-bit) floating-point elements from the high half of "a" and "b", and store the results in "dst". DEFINE INTERLEAVE_HIGH_DWORDS(src1[127:0], src2[127:0]) { dst[31:0] := src1[95:64] dst[63:32] := src2[95:64] dst[95:64] := src1[127:96] dst[127:96] := src2[127:96] RETURN dst[127:0] } dst[127:0] := INTERLEAVE_HIGH_DWORDS(a[127:0], b[127:0])
Floating Point SSE Swizzle Unpack and interleave single-precision (32-bit) floating-point elements from the low half of "a" and "b", and store the results in "dst". DEFINE INTERLEAVE_DWORDS(src1[127:0], src2[127:0]) { dst[31:0] := src1[31:0] dst[63:32] := src2[31:0] dst[95:64] := src1[63:32] dst[127:96] := src2[63:32] RETURN dst[127:0] } dst[127:0] := INTERLEAVE_DWORDS(a[127:0], b[127:0])
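Taken together, the two interleave entries above turn two separate arrays into one array of pairs. A sketch, assuming they correspond to `_mm_unpackhi_ps` and `_mm_unpacklo_ps`; `interleave4` is a hypothetical helper.

```c
#include <xmmintrin.h>

/* Interleave x[0..3] and y[0..3] into xy[0..7] as x0 y0 x1 y1 x2 y2 x3 y3.
   None of the pointers needs any particular alignment. */
static void interleave4(const float *x, const float *y, float *xy)
{
    __m128 a = _mm_loadu_ps(x);
    __m128 b = _mm_loadu_ps(y);
    _mm_storeu_ps(xy,     _mm_unpacklo_ps(a, b));   /* x0 y0 x1 y1 */
    _mm_storeu_ps(xy + 4, _mm_unpackhi_ps(a, b));   /* x2 y2 x3 y3 */
}
```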
Floating Point SSE Move Move the upper 2 single-precision (32-bit) floating-point elements from "b" to the lower 2 elements of "dst", and copy the upper 2 elements from "a" to the upper 2 elements of "dst". dst[31:0] := b[95:64] dst[63:32] := b[127:96] dst[95:64] := a[95:64] dst[127:96] := a[127:96]
Floating Point SSE Move Move the lower 2 single-precision (32-bit) floating-point elements from "b" to the upper 2 elements of "dst", and copy the lower 2 elements from "a" to the lower 2 elements of "dst". dst[31:0] := a[31:0] dst[63:32] := a[63:32] dst[95:64] := b[31:0] dst[127:96] := b[63:32]
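The high-to-low move is a convenient first step of a horizontal sum. The sketch below assumes that entry corresponds to `_mm_movehl_ps`, and also uses `_mm_add_ps`, `_mm_add_ss` and `_mm_shuffle_ps`; `hsum_ps` is a hypothetical helper, and this is one of several possible reduction orders.

```c
#include <xmmintrin.h>

/* Sum the four lanes of v as (v0 + v2) + (v1 + v3). */
static float hsum_ps(__m128 v)
{
    __m128 hi  = _mm_movehl_ps(v, v);     /* lanes: v2, v3, v2, v3 */
    __m128 sum = _mm_add_ps(v, hi);       /* lanes 0 and 1: v0+v2, v1+v3 */
    __m128 odd = _mm_shuffle_ps(sum, sum, _MM_SHUFFLE(1, 1, 1, 1));
    return _mm_cvtss_f32(_mm_add_ss(sum, odd));
}
```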
Floating Point SSE Miscellaneous Set each bit of mask "dst" based on the most significant bit of the corresponding packed single-precision (32-bit) floating-point element in "a". FOR j := 0 to 3 i := j*32 IF a[i+31] dst[j] := 1 ELSE dst[j] := 0 FI ENDFOR dst[MAX:4] := 0
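The sign-bit mask above is the usual bridge from a vector compare to a scalar branch. A sketch, assuming the entry corresponds to `_mm_movemask_ps`; the helper names are hypothetical.

```c
#include <xmmintrin.h>

/* Bit j of the result is the sign bit of lane j (set for negative values,
   and also for -0.0 and for NaNs whose sign bit is set). */
static int sign_bits(__m128 v)
{
    return _mm_movemask_ps(v);
}

/* Returns 1 when a < b holds in all four lanes. */
static int all_less_than(__m128 a, __m128 b)
{
    return _mm_movemask_ps(_mm_cmplt_ps(a, b)) == 0xF;
}
```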
SSE General Support Allocate "size" bytes of memory, aligned to the alignment specified in "align", and return a pointer to the allocated memory. "_mm_free" should be used to free memory that is allocated with "_mm_malloc".
SSE General Support Free aligned memory that was allocated with "_mm_malloc".
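Aligned allocation pairs naturally with the aligned load and store forms. The sketch assumes `_mm_malloc` and `_mm_free` are reachable through `<xmmintrin.h>`, as they are with GCC and Clang (other toolchains may declare them in a different header); `alloc_floats_aligned` and `is_aligned16` are hypothetical helpers.

```c
#include <stddef.h>
#include <stdint.h>
#include <xmmintrin.h>

/* Allocate count floats on a 16-byte boundary.
   The result must be released with _mm_free, not free. */
static float *alloc_floats_aligned(size_t count)
{
    return (float *)_mm_malloc(count * sizeof(float), 16);
}

/* Returns 1 when p is a multiple of 16. */
static int is_aligned16(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}
```

A buffer obtained this way can be passed to the aligned store, for example `_mm_store_ps(p, _mm_set1_ps(1.0f))`, and is then freed with `_mm_free(p)`.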
Floating Point SSE General Support Return vector of type __m128 with undefined elements.
Floating Point SSE Trigonometry Compute the inverse cosine of packed double-precision (64-bit) floating-point elements in "a" expressed in radians, and store the results in "dst". FOR j := 0 to 1 i := j*64 dst[i+63:i] := ACOS(a[i+63:i]) ENDFOR dst[MAX:128] := 0
Floating Point SSE Trigonometry Compute the inverse cosine of packed single-precision (32-bit) floating-point elements in "a" expressed in radians, and store the results in "dst". FOR j := 0 to 3 i := j*32 dst[i+31:i] := ACOS(a[i+31:i]) ENDFOR dst[MAX:128] := 0
Floating Point SSE Trigonometry Compute the inverse hyperbolic cosine of packed double-precision (64-bit) floating-point elements in "a" expressed in radians, and store the results in "dst". FOR j := 0 to 1 i := j*64 dst[i+63:i] := ACOSH(a[i+63:i]) ENDFOR dst[MAX:128] := 0
Floating Point SSE Trigonometry Compute the inverse hyperbolic cosine of packed single-precision (32-bit) floating-point elements in "a" expressed in radians, and store the results in "dst". FOR j := 0 to 3 i := j*32 dst[i+31:i] := ACOSH(a[i+31:i]) ENDFOR dst[MAX:128] := 0
Floating Point SSE Trigonometry Compute the inverse sine of packed double-precision (64-bit) floating-point elements in "a" expressed in radians, and store the results in "dst". FOR j := 0 to 1 i := j*64 dst[i+63:i] := ASIN(a[i+63:i]) ENDFOR dst[MAX:128] := 0
Floating Point SSE Trigonometry Compute the inverse sine of packed single-precision (32-bit) floating-point elements in "a" expressed in radians, and store the results in "dst". FOR j := 0 to 3 i := j*32 dst[i+31:i] := ASIN(a[i+31:i]) ENDFOR dst[MAX:128] := 0
Floating Point SSE Trigonometry Compute the inverse hyperbolic sine of packed double-precision (64-bit) floating-point elements in "a" expressed in radians, and store the results in "dst". FOR j := 0 to 1 i := j*64 dst[i+63:i] := ASINH(a[i+63:i]) ENDFOR dst[MAX:128] := 0
Floating Point SSE Trigonometry Compute the inverse hyperbolic sine of packed single-precision (32-bit) floating-point elements in "a" expressed in radians, and store the results in "dst". FOR j := 0 to 3 i := j*32 dst[i+31:i] := ASINH(a[i+31:i]) ENDFOR dst[MAX:128] := 0
Floating Point SSE Trigonometry Compute the inverse tangent of packed double-precision (64-bit) floating-point elements in "a" expressed in radians, and store the results in "dst". FOR j := 0 to 1 i := j*64 dst[i+63:i] := ATAN(a[i+63:i]) ENDFOR dst[MAX:128] := 0
Floating Point SSE Trigonometry Compute the inverse tangent of packed single-precision (32-bit) floating-point elements in "a" expressed in radians, and store the results in "dst". FOR j := 0 to 3 i := j*32 dst[i+31:i] := ATAN(a[i+31:i]) ENDFOR dst[MAX:128] := 0
Floating Point SSE Trigonometry Compute the inverse tangent of packed double-precision (64-bit) floating-point elements in "a" divided by packed elements in "b", and store the results in "dst" expressed in radians. FOR j := 0 to 1 i := j*64 dst[i+63:i] := ATAN2(a[i+63:i], b[i+63:i]) ENDFOR dst[MAX:128] := 0
Floating Point SSE Trigonometry Compute the inverse tangent of packed single-precision (32-bit) floating-point elements in "a" divided by packed elements in "b", and store the results in "dst" expressed in radians. FOR j := 0 to 3 i := j*32 dst[i+31:i] := ATAN2(a[i+31:i], b[i+31:i]) ENDFOR dst[MAX:128] := 0
Floating Point SSE Trigonometry Compute the inverse hyperbolic tangent of packed double-precision (64-bit) floating-point elements in "a" expressed in radians, and store the results in "dst". FOR j := 0 to 1 i := j*64 dst[i+63:i] := ATANH(a[i+63:i]) ENDFOR dst[MAX:128] := 0
Floating Point SSE Trigonometry Compute the inverse hyperbolic tangent of packed single-precision (32-bit) floating-point elements in "a" expressed in radians, and store the results in "dst". FOR j := 0 to 3 i := j*32 dst[i+31:i] := ATANH(a[i+31:i]) ENDFOR dst[MAX:128] := 0
Floating Point SSE Elementary Math Functions Compute the cube root of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst". FOR j := 0 to 1 i := j*64 dst[i+63:i] := CubeRoot(a[i+63:i]) ENDFOR dst[MAX:128] := 0
Floating Point SSE Elementary Math Functions Compute the cube root of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst". FOR j := 0 to 3 i := j*32 dst[i+31:i] := CubeRoot(a[i+31:i]) ENDFOR dst[MAX:128] := 0
Floating Point SSE Probability/Statistics Compute the cumulative distribution function of packed double-precision (64-bit) floating-point elements in "a" using the normal distribution, and store the results in "dst". FOR j := 0 to 1 i := j*64 dst[i+63:i] := CDFNormal(a[i+63:i]) ENDFOR dst[MAX:128] := 0
Floating Point SSE Probability/Statistics Compute the cumulative distribution function of packed single-precision (32-bit) floating-point elements in "a" using the normal distribution, and store the results in "dst". FOR j := 0 to 3 i := j*32 dst[i+31:i] := CDFNormal(a[i+31:i]) ENDFOR dst[MAX:128] := 0
Floating Point SSE Probability/Statistics Compute the inverse cumulative distribution function of packed double-precision (64-bit) floating-point elements in "a" using the normal distribution, and store the results in "dst". FOR j := 0 to 1 i := j*64 dst[i+63:i] := InverseCDFNormal(a[i+63:i]) ENDFOR dst[MAX:128] := 0
Floating Point SSE Probability/Statistics Compute the inverse cumulative distribution function of packed single-precision (32-bit) floating-point elements in "a" using the normal distribution, and store the results in "dst". FOR j := 0 to 3 i := j*32 dst[i+31:i] := InverseCDFNormal(a[i+31:i]) ENDFOR dst[MAX:128] := 0
Floating Point SSE Elementary Math Functions Compute the exponential value of "e" raised to the power of packed complex numbers in "a", and store the complex results in "dst". Each complex number is composed of two adjacent single-precision (32-bit) floating-point elements, which defines the complex number "complex = vec.fp32[0] + i * vec.fp32[1]". DEFINE CEXP(a[31:0], b[31:0]) { result[31:0] := POW(FP32(e), a[31:0]) * COS(b[31:0]) result[63:32] := POW(FP32(e), a[31:0]) * SIN(b[31:0]) RETURN result } FOR j := 0 to 1 i := j*64 dst[i+63:i] := CEXP(a[i+31:i], a[i+63:i+32]) ENDFOR dst[MAX:128] := 0
Floating Point SSE Elementary Math Functions Compute the natural logarithm of packed complex numbers in "a", and store the complex results in "dst". Each complex number is composed of two adjacent single-precision (32-bit) floating-point elements, which defines the complex number "complex = vec.fp32[0] + i * vec.fp32[1]". DEFINE CLOG(a[31:0], b[31:0]) { result[31:0] := LOG(SQRT(POW(a, 2.0) + POW(b, 2.0))) result[63:32] := ATAN2(b, a) RETURN result } FOR j := 0 to 1 i := j*64 dst[i+63:i] := CLOG(a[i+31:i], a[i+63:i+32]) ENDFOR dst[MAX:128] := 0
Floating Point SSE Trigonometry Compute the cosine of packed double-precision (64-bit) floating-point elements in "a" expressed in radians, and store the results in "dst". FOR j := 0 to 1 i := j*64 dst[i+63:i] := COS(a[i+63:i]) ENDFOR dst[MAX:128] := 0
Floating Point SSE Trigonometry Compute the cosine of packed single-precision (32-bit) floating-point elements in "a" expressed in radians, and store the results in "dst". FOR j := 0 to 3 i := j*32 dst[i+31:i] := COS(a[i+31:i]) ENDFOR dst[MAX:128] := 0
Floating Point SSE Trigonometry Compute the cosine of packed double-precision (64-bit) floating-point elements in "a" expressed in degrees, and store the results in "dst". FOR j := 0 to 1 i := j*64 dst[i+63:i] := COSD(a[i+63:i]) ENDFOR dst[MAX:128] := 0
Floating Point SSE Trigonometry Compute the cosine of packed single-precision (32-bit) floating-point elements in "a" expressed in degrees, and store the results in "dst". FOR j := 0 to 3 i := j*32 dst[i+31:i] := COSD(a[i+31:i]) ENDFOR dst[MAX:128] := 0
Floating Point SSE Trigonometry Compute the hyperbolic cosine of packed double-precision (64-bit) floating-point elements in "a" expressed in radians, and store the results in "dst". FOR j := 0 to 1 i := j*64 dst[i+63:i] := COSH(a[i+63:i]) ENDFOR dst[MAX:128] := 0
Floating Point SSE Trigonometry Compute the hyperbolic cosine of packed single-precision (32-bit) floating-point elements in "a" expressed in radians, and store the results in "dst". FOR j := 0 to 3 i := j*32 dst[i+31:i] := COSH(a[i+31:i]) ENDFOR dst[MAX:128] := 0
Floating Point SSE Elementary Math Functions Compute the square root of packed complex numbers in "a", and store the complex results in "dst". Each complex number is composed of two adjacent single-precision (32-bit) floating-point elements, which defines the complex number "complex = vec.fp32[0] + i * vec.fp32[1]". DEFINE CSQRT(a[31:0], b[31:0]) { sign[31:0] := (b < 0.0) ? -FP32(1.0) : FP32(1.0) result[31:0] := SQRT((a + SQRT(POW(a, 2.0) + POW(b, 2.0))) / 2.0) result[63:32] := sign * SQRT((-a + SQRT(POW(a, 2.0) + POW(b, 2.0))) / 2.0) RETURN result } FOR j := 0 to 1 i := j*64 dst[i+63:i] := CSQRT(a[i+31:i], a[i+63:i+32]) ENDFOR dst[MAX:128] := 0
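The CSQRT pseudocode above can be followed step by step in scalar form. This is only a model of the pseudocode for a single complex number, not the library's implementation; it takes square roots with `_mm_sqrt_ss` (an assumption, chosen to stay within SSE), and `sqrt_f32` and `csqrt_model` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <xmmintrin.h>

/* Single-precision square root via the SSE scalar instruction. */
static float sqrt_f32(float x)
{
    return _mm_cvtss_f32(_mm_sqrt_ss(_mm_set_ss(x)));
}

/* Scalar model of CSQRT for one complex number re + i*im. */
static void csqrt_model(float re, float im, float *out_re, float *out_im)
{
    float sign, mag;

    assert(out_re != NULL && out_im != NULL);
    sign = (im < 0.0f) ? -1.0f : 1.0f;
    mag  = sqrt_f32(re * re + im * im);              /* |z| */
    *out_re = sqrt_f32((re + mag) / 2.0f);
    *out_im = sign * sqrt_f32((-re + mag) / 2.0f);
}
```

For 3 + 4i the magnitude is 5, so the model yields 2 + 1i.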
Integer SSE Arithmetic Divide packed signed 8-bit integers in "a" by packed elements in "b", and store the truncated results in "dst". FOR j := 0 to 15 i := 8*j IF b[i+7:i] == 0 #DE FI dst[i+7:i] := Truncate8(a[i+7:i] / b[i+7:i]) ENDFOR dst[MAX:128] := 0
Integer SSE Arithmetic Divide packed signed 16-bit integers in "a" by packed elements in "b", and store the truncated results in "dst". FOR j := 0 to 7 i := 16*j IF b[i+15:i] == 0 #DE FI dst[i+15:i] := Truncate16(a[i+15:i] / b[i+15:i]) ENDFOR dst[MAX:128] := 0
Integer SSE Arithmetic Divide packed signed 32-bit integers in "a" by packed elements in "b", and store the truncated results in "dst". FOR j := 0 to 3 i := 32*j IF b[i+31:i] == 0 #DE FI dst[i+31:i] := Truncate32(a[i+31:i] / b[i+31:i]) ENDFOR dst[MAX:128] := 0
Integer SSE Arithmetic Divide packed signed 64-bit integers in "a" by packed elements in "b", and store the truncated results in "dst". FOR j := 0 to 1 i := 64*j IF b[i+63:i] == 0 #DE FI dst[i+63:i] := Truncate64(a[i+63:i] / b[i+63:i]) ENDFOR dst[MAX:128] := 0
Integer SSE Arithmetic Divide packed unsigned 8-bit integers in "a" by packed elements in "b", and store the truncated results in "dst". FOR j := 0 to 15 i := 8*j IF b[i+7:i] == 0 #DE FI dst[i+7:i] := Truncate8(a[i+7:i] / b[i+7:i]) ENDFOR dst[MAX:128] := 0
Integer SSE Arithmetic Divide packed unsigned 16-bit integers in "a" by packed elements in "b", and store the truncated results in "dst". FOR j := 0 to 7 i := 16*j IF b[i+15:i] == 0 #DE FI dst[i+15:i] := Truncate16(a[i+15:i] / b[i+15:i]) ENDFOR dst[MAX:128] := 0
Integer SSE Arithmetic Divide packed unsigned 32-bit integers in "a" by packed elements in "b", and store the truncated results in "dst". FOR j := 0 to 3 i := 32*j IF b[i+31:i] == 0 #DE FI dst[i+31:i] := Truncate32(a[i+31:i] / b[i+31:i]) ENDFOR dst[MAX:128] := 0
Integer SSE Arithmetic Divide packed unsigned 64-bit integers in "a" by packed elements in "b", and store the truncated results in "dst". FOR j := 0 to 1 i := 64*j IF b[i+63:i] == 0 #DE FI dst[i+63:i] := Truncate64(a[i+63:i] / b[i+63:i]) ENDFOR dst[MAX:128] := 0
Floating Point SSE Arithmetic Compute the error function of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst". FOR j := 0 to 1 i := j*64 dst[i+63:i] := ERF(a[i+63:i]) ENDFOR dst[MAX:128] := 0
Floating Point SSE Probability/Statistics Compute the error function of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst". FOR j := 0 to 3 i := j*32 dst[i+31:i] := ERF(a[i+31:i]) ENDFOR dst[MAX:128] := 0
Floating Point SSE Probability/Statistics Compute the complementary error function of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst". FOR j := 0 to 1 i := j*64 dst[i+63:i] := 1.0 - ERF(a[i+63:i]) ENDFOR dst[MAX:128] := 0
Floating Point SSE Probability/Statistics Compute the complementary error function of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst". FOR j := 0 to 3 i := j*32 dst[i+31:i] := 1.0 - ERF(a[i+31:i]) ENDFOR dst[MAX:128] := 0
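The complementary error function entries above define the result as "1.0 - ERF(x)". A minimal Python sketch of that formula follows; `erfc_model` is a hypothetical name, and the comparison against `math.erfc` is only an illustration. The source does not say how an actual implementation evaluates the function.

```python
import math

def erfc_model(x):
    """Literal reading of the pseudocode: 1.0 - ERF(x)."""
    return 1.0 - math.erf(x)

# For moderate inputs the literal formula agrees with a dedicated erfc.
print(abs(erfc_model(0.5) - math.erfc(0.5)) < 1e-12)

# For large inputs erf(x) rounds to 1.0, so the literal formula returns 0.0,
# while a dedicated erfc still returns a tiny positive value.
print(erfc_model(10.0), math.erfc(10.0) > 0.0)
```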
Floating Point SSE Probability/Statistics Compute the inverse complementary error function of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst". FOR j := 0 to 1 i := j*64 dst[i+63:i] := 1.0 / (1.0 - ERF(a[i+63:i])) ENDFOR dst[MAX:128] := 0
Floating Point SSE Probability/Statistics Compute the inverse complementary error function of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst". FOR j := 0 to 3 i := j*32 dst[i+31:i] := 1.0 / (1.0 - ERF(a[i+31:i])) ENDFOR dst[MAX:128] := 0
Floating Point SSE Probability/Statistics Compute the inverse error function of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst". FOR j := 0 to 1 i := j*64 dst[i+63:i] := 1.0 / ERF(a[i+63:i]) ENDFOR dst[MAX:128] := 0
Floating Point SSE Probability/Statistics Compute the inverse error function of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst". FOR j := 0 to 3 i := j*32 dst[i+31:i] := 1.0 / ERF(a[i+31:i]) ENDFOR dst[MAX:128] := 0
Floating Point SSE Elementary Math Functions Compute the exponential value of "e" raised to the power of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst". FOR j := 0 to 1 i := j*64 dst[i+63:i] := POW(e, a[i+63:i]) ENDFOR dst[MAX:128] := 0
Floating Point SSE Elementary Math Functions Compute the exponential value of "e" raised to the power of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst". FOR j := 0 to 3 i := j*32 dst[i+31:i] := POW(FP32(e), a[i+31:i]) ENDFOR dst[MAX:128] := 0
Floating Point SSE Elementary Math Functions Compute the exponential value of 10 raised to the power of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst". FOR j := 0 to 1 i := j*64 dst[i+63:i] := POW(10.0, a[i+63:i]) ENDFOR dst[MAX:128] := 0
Floating Point SSE Elementary Math Functions Compute the exponential value of 10 raised to the power of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst". FOR j := 0 to 3 i := j*32 dst[i+31:i] := POW(FP32(10.0), a[i+31:i]) ENDFOR dst[MAX:128] := 0
Floating Point SSE Elementary Math Functions Compute the exponential value of 2 raised to the power of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst". FOR j := 0 to 1 i := j*64 dst[i+63:i] := POW(2.0, a[i+63:i]) ENDFOR dst[MAX:128] := 0
Floating Point SSE Elementary Math Functions Compute the exponential value of 2 raised to the power of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst". FOR j := 0 to 3 i := j*32 dst[i+31:i] := POW(FP32(2.0), a[i+31:i]) ENDFOR dst[MAX:128] := 0
Floating Point SSE Elementary Math Functions Compute the exponential value of "e" raised to the power of packed double-precision (64-bit) floating-point elements in "a", subtract one from each element, and store the results in "dst". FOR j := 0 to 1 i := j*64 dst[i+63:i] := POW(e, a[i+63:i]) - 1.0 ENDFOR dst[MAX:128] := 0
Floating Point SSE Elementary Math Functions Compute the exponential value of "e" raised to the power of packed single-precision (32-bit) floating-point elements in "a", subtract one from each element, and store the results in "dst". FOR j := 0 to 3 i := j*32 dst[i+31:i] := POW(FP32(e), a[i+31:i]) - 1.0 ENDFOR dst[MAX:128] := 0
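The two entries above describe "e raised to the power of a, minus one". A small Python sketch of the literal formula is given below; `expm1_model` is a hypothetical name, and the comparison with `math.expm1` only illustrates why a dedicated function can matter for tiny inputs. Whether the intrinsic evaluates the formula literally is not stated in the source.

```python
import math

def expm1_model(x):
    """Literal reading of the pseudocode: POW(e, x) - 1.0."""
    return math.pow(math.e, x) - 1.0

# Agreement for an ordinary input.
print(abs(expm1_model(1.0) - math.expm1(1.0)) < 1e-12)

# For a tiny input, e**x rounds to exactly 1.0 and the literal formula
# loses the result entirely; math.expm1 keeps it.
print(expm1_model(1e-20), math.expm1(1e-20))
```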
Floating Point SSE Trigonometry Compute the length of the hypotenuse of a right triangle, with the lengths of the other two sides of the triangle stored as packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst". FOR j := 0 to 1 i := j*64 dst[i+63:i] := SQRT(POW(a[i+63:i], 2.0) + POW(b[i+63:i], 2.0)) ENDFOR dst[MAX:128] := 0
Floating Point SSE Trigonometry Compute the length of the hypotenuse of a right triangle, with the lengths of the other two sides of the triangle stored as packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst". FOR j := 0 to 3 i := j*32 dst[i+31:i] := SQRT(POW(a[i+31:i], 2.0) + POW(b[i+31:i], 2.0)) ENDFOR dst[MAX:128] := 0
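The formula in the two entries above, SQRT(a^2 + b^2), can be sketched in scalar Python as follows. The name `hypot_model` is hypothetical. The second call shows that the literal formula can overflow in the intermediate squares, which a library routine such as `math.hypot` avoids; the source does not say which behaviour the intrinsic has.

```python
import math

def hypot_model(a, b):
    """Literal reading of the pseudocode: SQRT(a*a + b*b)."""
    return math.sqrt(a * a + b * b)

print(hypot_model(3.0, 4.0))       # 5.0
# The squares overflow to infinity even though the true result is finite.
print(hypot_model(1e200, 1e200))   # inf
print(math.isfinite(math.hypot(1e200, 1e200)))
```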
Integer SSE Arithmetic Divide packed 32-bit integers in "a" by packed elements in "b", and store the truncated results in "dst". FOR j := 0 to 3 i := 32*j dst[i+31:i] := TRUNCATE(a[i+31:i] / b[i+31:i]) ENDFOR dst[MAX:128] := 0
Integer SSE Arithmetic Divide packed 32-bit integers in "a" by packed elements in "b", store the truncated results in "dst", and store the remainders as packed 32-bit integers into memory at "mem_addr". FOR j := 0 to 3 i := 32*j dst[i+31:i] := TRUNCATE(a[i+31:i] / b[i+31:i]) MEM[mem_addr+i+31:mem_addr+i] := REMAINDER(a[i+31:i] / b[i+31:i]) ENDFOR dst[MAX:128] := 0
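The divide-with-remainder entry above produces a truncated quotient per lane and a matching remainder. The sketch below models one reading of it in Python. The name `idivrem_model` is hypothetical, and the assumption that REMAINDER takes the sign of the dividend (as in C) is mine; the source does not define REMAINDER precisely. Division by zero and 32-bit overflow are not modelled.

```python
def idivrem_model(a, b):
    """Lane-wise truncating division; returns (quotients, remainders)."""
    quotients, remainders = [], []
    for x, y in zip(a, b):
        # Python's // floors, so truncate toward zero by hand.
        q = abs(x) // abs(y)
        if (x < 0) != (y < 0):
            q = -q
        quotients.append(q)
        # Assumed convention: remainder satisfies x == q*y + r.
        remainders.append(x - q * y)
    return quotients, remainders

print(idivrem_model([7, -7, 7, -7], [2, 2, -2, -2]))
# ([3, -3, -3, 3], [1, -1, 1, -1])
```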
Floating Point SSE Elementary Math Functions Compute the inverse cube root of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst". FOR j := 0 to 1 i := j*64 dst[i+63:i] := InvCubeRoot(a[i+63:i]) ENDFOR dst[MAX:128] := 0
Floating Point SSE Elementary Math Functions Compute the inverse cube root of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst". FOR j := 0 to 3 i := j*32 dst[i+31:i] := InvCubeRoot(a[i+31:i]) ENDFOR dst[MAX:128] := 0
Floating Point SSE Elementary Math Functions Compute the inverse square root of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst". FOR j := 0 to 1 i := j*64 dst[i+63:i] := InvSQRT(a[i+63:i]) ENDFOR dst[MAX:128] := 0
Floating Point SSE Elementary Math Functions Compute the inverse square root of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst". FOR j := 0 to 3 i := j*32 dst[i+31:i] := InvSQRT(a[i+31:i]) ENDFOR dst[MAX:128] := 0
Integer SSE Arithmetic Divide packed 32-bit integers in "a" by packed elements in "b", and store the remainders as packed 32-bit integers in "dst". FOR j := 0 to 3 i := 32*j dst[i+31:i] := REMAINDER(a[i+31:i] / b[i+31:i]) ENDFOR dst[MAX:128] := 0
Floating Point SSE Elementary Math Functions Compute the natural logarithm of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst". FOR j := 0 to 1 i := j*64 dst[i+63:i] := LOG(a[i+63:i]) ENDFOR dst[MAX:128] := 0
Floating Point SSE Elementary Math Functions Compute the natural logarithm of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst". FOR j := 0 to 3 i := j*32 dst[i+31:i] := LOG(a[i+31:i]) ENDFOR dst[MAX:128] := 0
Floating Point SSE Elementary Math Functions Compute the base-10 logarithm of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst". FOR j := 0 to 1 i := j*64 dst[i+63:i] := LOG(a[i+63:i]) / LOG(10.0) ENDFOR dst[MAX:128] := 0
Floating Point SSE Elementary Math Functions Compute the base-10 logarithm of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst". FOR j := 0 to 3 i := j*32 dst[i+31:i] := LOG(a[i+31:i]) / LOG(10.0) ENDFOR dst[MAX:128] := 0
Floating Point SSE Elementary Math Functions Compute the natural logarithm of one plus packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst". FOR j := 0 to 1 i := j*64 dst[i+63:i] := LOG(1.0 + a[i+63:i]) ENDFOR dst[MAX:128] := 0
Floating Point SSE Elementary Math Functions Compute the natural logarithm of one plus packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst". FOR j := 0 to 3 i := j*32 dst[i+31:i] := LOG(1.0 + a[i+31:i]) ENDFOR dst[MAX:128] := 0
Floating Point SSE Elementary Math Functions Compute the base-2 logarithm of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst". FOR j := 0 to 1 i := j*64 dst[i+63:i] := LOG(a[i+63:i]) / LOG(2.0) ENDFOR dst[MAX:128] := 0
Floating Point SSE Elementary Math Functions Compute the base-2 logarithm of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst". FOR j := 0 to 3 i := j*32 dst[i+31:i] := LOG(a[i+31:i]) / LOG(2.0) ENDFOR dst[MAX:128] := 0
Floating Point SSE Elementary Math Functions Convert the exponent of each packed double-precision (64-bit) floating-point element in "a" to a double-precision floating-point number representing the integer exponent, and store the results in "dst". This intrinsic essentially calculates "floor(log2(x))" for each element. FOR j := 0 to 1 i := j*64 dst[i+63:i] := ConvertExpFP64(a[i+63:i]) ENDFOR dst[MAX:128] := 0
Floating Point SSE Elementary Math Functions Convert the exponent of each packed single-precision (32-bit) floating-point element in "a" to a single-precision floating-point number representing the integer exponent, and store the results in "dst". This intrinsic essentially calculates "floor(log2(x))" for each element. FOR j := 0 to 3 i := j*32 dst[i+31:i] := ConvertExpFP32(a[i+31:i]) ENDFOR dst[MAX:128] := 0
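The two entries above say the operation "essentially calculates floor(log2(x))". A minimal Python sketch of that reading follows, valid only for positive, finite, normal inputs; `logb_model` is a hypothetical name, and the handling of zero, negative, infinite or NaN inputs is not covered because the source does not describe it.

```python
import math

def logb_model(x):
    """floor(log2(x)) for positive, finite, normal x, returned as a float."""
    # frexp returns (m, e) with x == m * 2**e and 0.5 <= m < 1,
    # so the unbiased binary exponent is e - 1.
    _mantissa, exponent = math.frexp(x)
    return float(exponent - 1)

print(logb_model(8.0))    # 3.0
print(logb_model(10.0))   # 3.0
print(logb_model(0.75))   # -1.0
```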
Floating Point SSE Elementary Math Functions Compute the exponential value of packed double-precision (64-bit) floating-point elements in "a" raised to the power of packed elements in "b", and store the results in "dst". FOR j := 0 to 1 i := j*64 dst[i+63:i] := POW(a[i+63:i], b[i+63:i]) ENDFOR dst[MAX:128] := 0
Floating Point SSE Elementary Math Functions Compute the exponential value of packed single-precision (32-bit) floating-point elements in "a" raised to the power of packed elements in "b", and store the results in "dst". FOR j := 0 to 3 i := j*32 dst[i+31:i] := POW(a[i+31:i], b[i+31:i]) ENDFOR dst[MAX:128] := 0
Integer SSE Arithmetic Divide packed 8-bit integers in "a" by packed elements in "b", and store the remainders as packed 8-bit integers in "dst". FOR j := 0 to 15 i := 8*j dst[i+7:i] := REMAINDER(a[i+7:i] / b[i+7:i]) ENDFOR dst[MAX:128] := 0
Integer SSE Arithmetic Divide packed 16-bit integers in "a" by packed elements in "b", and store the remainders as packed 16-bit integers in "dst". FOR j := 0 to 7 i := 16*j dst[i+15:i] := REMAINDER(a[i+15:i] / b[i+15:i]) ENDFOR dst[MAX:128] := 0
Integer SSE Arithmetic Divide packed 32-bit integers in "a" by packed elements in "b", and store the remainders as packed 32-bit integers in "dst". FOR j := 0 to 3 i := 32*j dst[i+31:i] := REMAINDER(a[i+31:i] / b[i+31:i]) ENDFOR dst[MAX:128] := 0
Integer SSE Arithmetic Divide packed 64-bit integers in "a" by packed elements in "b", and store the remainders as packed 64-bit integers in "dst". FOR j := 0 to 1 i := 64*j dst[i+63:i] := REMAINDER(a[i+63:i] / b[i+63:i]) ENDFOR dst[MAX:128] := 0
Integer SSE Arithmetic Divide packed unsigned 8-bit integers in "a" by packed elements in "b", and store the remainders as packed unsigned 8-bit integers in "dst". FOR j := 0 to 15 i := 8*j dst[i+7:i] := REMAINDER(a[i+7:i] / b[i+7:i]) ENDFOR dst[MAX:128] := 0
Integer SSE Arithmetic Divide packed unsigned 16-bit integers in "a" by packed elements in "b", and store the remainders as packed unsigned 16-bit integers in "dst". FOR j := 0 to 7 i := 16*j dst[i+15:i] := REMAINDER(a[i+15:i] / b[i+15:i]) ENDFOR dst[MAX:128] := 0
Integer SSE Arithmetic Divide packed unsigned 32-bit integers in "a" by packed elements in "b", and store the remainders as packed unsigned 32-bit integers in "dst". FOR j := 0 to 3 i := 32*j dst[i+31:i] := REMAINDER(a[i+31:i] / b[i+31:i]) ENDFOR dst[MAX:128] := 0
Integer SSE Arithmetic Divide packed unsigned 64-bit integers in "a" by packed elements in "b", and store the remainders as packed unsigned 64-bit integers in "dst". FOR j := 0 to 1 i := 64*j dst[i+63:i] := REMAINDER(a[i+63:i] / b[i+63:i]) ENDFOR dst[MAX:128] := 0
Floating Point SSE Trigonometry Compute the sine of packed double-precision (64-bit) floating-point elements in "a" expressed in radians, and store the results in "dst". FOR j := 0 to 1 i := j*64 dst[i+63:i] := SIN(a[i+63:i]) ENDFOR dst[MAX:128] := 0
Floating Point SSE Trigonometry Compute the sine of packed single-precision (32-bit) floating-point elements in "a" expressed in radians, and store the results in "dst". FOR j := 0 to 3 i := j*32 dst[i+31:i] := SIN(a[i+31:i]) ENDFOR dst[MAX:128] := 0
Floating Point SSE Trigonometry Compute the sine and cosine of packed double-precision (64-bit) floating-point elements in "a" expressed in radians, store the sine in "dst", and store the cosine into memory at "mem_addr". FOR j := 0 to 1 i := j*64 dst[i+63:i] := SIN(a[i+63:i]) MEM[mem_addr+i+63:mem_addr+i] := COS(a[i+63:i]) ENDFOR dst[MAX:128] := 0
Floating Point SSE Trigonometry Compute the sine and cosine of packed single-precision (32-bit) floating-point elements in "a" expressed in radians, store the sine in "dst", and store the cosine into memory at "mem_addr". FOR j := 0 to 3 i := j*32 dst[i+31:i] := SIN(a[i+31:i]) MEM[mem_addr+i+31:mem_addr+i] := COS(a[i+31:i]) ENDFOR dst[MAX:128] := 0
Floating Point SSE Trigonometry Compute the sine of packed double-precision (64-bit) floating-point elements in "a" expressed in degrees, and store the results in "dst". FOR j := 0 to 1 i := j*64 dst[i+63:i] := SIND(a[i+63:i]) ENDFOR dst[MAX:128] := 0
Floating Point SSE Trigonometry Compute the sine of packed single-precision (32-bit) floating-point elements in "a" expressed in degrees, and store the results in "dst". FOR j := 0 to 3 i := j*32 dst[i+31:i] := SIND(a[i+31:i]) ENDFOR dst[MAX:128] := 0
Floating Point SSE Trigonometry Compute the hyperbolic sine of packed double-precision (64-bit) floating-point elements in "a" expressed in radians, and store the results in "dst". FOR j := 0 to 1 i := j*64 dst[i+63:i] := SINH(a[i+63:i]) ENDFOR dst[MAX:128] := 0
Floating Point SSE Trigonometry Compute the hyperbolic sine of packed single-precision (32-bit) floating-point elements in "a" expressed in radians, and store the results in "dst". FOR j := 0 to 3 i := j*32 dst[i+31:i] := SINH(a[i+31:i]) ENDFOR dst[MAX:128] := 0
Floating Point SSE Special Math Functions Round the packed double-precision (64-bit) floating-point elements in "a" up to an integer value, and store the results as packed double-precision floating-point elements in "dst". This intrinsic may generate the "roundpd"/"vroundpd" instruction. FOR j := 0 to 1 i := j*64 dst[i+63:i] := CEIL(a[i+63:i]) ENDFOR dst[MAX:128] := 0
Floating Point SSE Special Math Functions Round the packed single-precision (32-bit) floating-point elements in "a" up to an integer value, and store the results as packed single-precision floating-point elements in "dst". This intrinsic may generate the "roundps"/"vroundps" instruction. FOR j := 0 to 3 i := j*32 dst[i+31:i] := CEIL(a[i+31:i]) ENDFOR dst[MAX:128] := 0
Floating Point SSE Special Math Functions Round the packed double-precision (64-bit) floating-point elements in "a" down to an integer value, and store the results as packed double-precision floating-point elements in "dst". This intrinsic may generate the "roundpd"/"vroundpd" instruction. FOR j := 0 to 1 i := j*64 dst[i+63:i] := FLOOR(a[i+63:i]) ENDFOR dst[MAX:128] := 0
Floating Point SSE Special Math Functions Round the packed single-precision (32-bit) floating-point elements in "a" down to an integer value, and store the results as packed single-precision floating-point elements in "dst". This intrinsic may generate the "roundps"/"vroundps" instruction. FOR j := 0 to 3 i := j*32 dst[i+31:i] := FLOOR(a[i+31:i]) ENDFOR dst[MAX:128] := 0
Floating Point SSE Special Math Functions Round the packed double-precision (64-bit) floating-point elements in "a" to the nearest integer value, and store the results as packed double-precision floating-point elements in "dst". This intrinsic may generate the "roundpd"/"vroundpd" instruction. FOR j := 0 to 1 i := j*64 dst[i+63:i] := ROUND(a[i+63:i]) ENDFOR dst[MAX:128] := 0
Floating Point SSE Special Math Functions Round the packed single-precision (32-bit) floating-point elements in "a" to the nearest integer value, and store the results as packed single-precision floating-point elements in "dst". This intrinsic may generate the "roundps"/"vroundps" instruction. FOR j := 0 to 3 i := j*32 dst[i+31:i] := ROUND(a[i+31:i]) ENDFOR dst[MAX:128] := 0
Floating Point SSE Elementary Math Functions Compute the square root of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst". Note that this intrinsic is less efficient than "_mm_sqrt_pd". FOR j := 0 to 1 i := j*64 dst[i+63:i] := SQRT(a[i+63:i]) ENDFOR dst[MAX:128] := 0
Floating Point SSE Elementary Math Functions Compute the square root of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst". Note that this intrinsic is less efficient than "_mm_sqrt_ps". FOR j := 0 to 3 i := j*32 dst[i+31:i] := SQRT(a[i+31:i]) ENDFOR dst[MAX:128] := 0
Floating Point SSE Trigonometry Compute the tangent of packed double-precision (64-bit) floating-point elements in "a" expressed in radians, and store the results in "dst". FOR j := 0 to 1 i := j*64 dst[i+63:i] := TAN(a[i+63:i]) ENDFOR dst[MAX:128] := 0
Floating Point SSE Trigonometry Compute the tangent of packed single-precision (32-bit) floating-point elements in "a" expressed in radians, and store the results in "dst". FOR j := 0 to 3 i := j*32 dst[i+31:i] := TAN(a[i+31:i]) ENDFOR dst[MAX:128] := 0
Floating Point SSE Trigonometry Compute the tangent of packed double-precision (64-bit) floating-point elements in "a" expressed in degrees, and store the results in "dst". FOR j := 0 to 1 i := j*64 dst[i+63:i] := TAND(a[i+63:i]) ENDFOR dst[MAX:128] := 0
Floating Point SSE Trigonometry Compute the tangent of packed single-precision (32-bit) floating-point elements in "a" expressed in degrees, and store the results in "dst". FOR j := 0 to 3 i := j*32 dst[i+31:i] := TAND(a[i+31:i]) ENDFOR dst[MAX:128] := 0
Floating Point SSE Trigonometry Compute the hyperbolic tangent of packed double-precision (64-bit) floating-point elements in "a" expressed in radians, and store the results in "dst". FOR j := 0 to 1 i := j*64 dst[i+63:i] := TANH(a[i+63:i]) ENDFOR dst[MAX:128] := 0
Floating Point SSE Trigonometry Compute the hyperbolic tangent of packed single-precision (32-bit) floating-point elements in "a" expressed in radians, and store the results in "dst". FOR j := 0 to 3 i := j*32 dst[i+31:i] := TANH(a[i+31:i]) ENDFOR dst[MAX:128] := 0
Floating Point SSE Miscellaneous Truncate the packed double-precision (64-bit) floating-point elements in "a", and store the results as packed double-precision floating-point elements in "dst". This intrinsic may generate the "roundpd"/"vroundpd" instruction. FOR j := 0 to 1 i := j*64 dst[i+63:i] := TRUNCATE(a[i+63:i]) ENDFOR dst[MAX:128] := 0
Floating Point SSE Miscellaneous Truncate the packed single-precision (32-bit) floating-point elements in "a", and store the results as packed single-precision floating-point elements in "dst". This intrinsic may generate the "roundps"/"vroundps" instruction. FOR j := 0 to 3 i := j*32 dst[i+31:i] := TRUNCATE(a[i+31:i]) ENDFOR dst[MAX:128] := 0
Integer SSE Arithmetic Divide packed unsigned 32-bit integers in "a" by packed elements in "b", and store the truncated results in "dst". FOR j := 0 to 3 i := 32*j dst[i+31:i] := TRUNCATE(a[i+31:i] / b[i+31:i]) ENDFOR dst[MAX:128] := 0
Integer SSE Arithmetic Divide packed unsigned 32-bit integers in "a" by packed elements in "b", store the truncated results in "dst", and store the remainders as packed unsigned 32-bit integers into memory at "mem_addr". FOR j := 0 to 3 i := 32*j dst[i+31:i] := TRUNCATE(a[i+31:i] / b[i+31:i]) MEM[mem_addr+i+31:mem_addr+i] := REMAINDER(a[i+31:i] / b[i+31:i]) ENDFOR dst[MAX:128] := 0
Integer SSE Arithmetic Divide packed unsigned 32-bit integers in "a" by packed elements in "b", and store the remainders as packed unsigned 32-bit integers in "dst". FOR j := 0 to 3 i := 32*j dst[i+31:i] := REMAINDER(a[i+31:i] / b[i+31:i]) ENDFOR dst[MAX:128] := 0
Integer SSE Store Store 16-bit integer from the first element of "a" into memory. "mem_addr" does not need to be aligned on any particular boundary. MEM[mem_addr+15:mem_addr] := a[15:0]
Integer SSE Load Load unaligned 64-bit integer from memory into the first element of "dst". dst[63:0] := MEM[mem_addr+63:mem_addr] dst[MAX:64] := 0
Integer SSE Store Store 64-bit integer from the first element of "a" into memory. "mem_addr" does not need to be aligned on any particular boundary. MEM[mem_addr+63:mem_addr] := a[63:0]
Integer SSE Load Load unaligned 16-bit integer from memory into the first element of "dst". dst[15:0] := MEM[mem_addr+15:mem_addr] dst[MAX:16] := 0
Floating Point SSE2 General Support Return vector of type __m128d with undefined elements.
Integer SSE2 General Support Return vector of type __m128i with undefined elements.
Integer SSE2 Load Load unaligned 32-bit integer from memory into the first element of "dst". dst[31:0] := MEM[mem_addr+31:mem_addr] dst[MAX:32] := 0
Integer SSE2 Store Store 32-bit integer from the first element of "a" into memory. "mem_addr" does not need to be aligned on any particular boundary. MEM[mem_addr+31:mem_addr] := a[31:0]
SSE2 General Support Provide a hint to the processor that the code sequence is a spin-wait loop. This can help improve the performance and power consumption of spin-wait loops.
SSE2 General Support Invalidate and flush the cache line that contains "p" from all levels of the cache hierarchy.
SSE2 General Support Perform a serializing operation on all load-from-memory instructions that were issued prior to this instruction. Guarantees that every load instruction that precedes, in program order, the load fence instruction is globally visible before any load instruction which follows the fence in program order.
SSE2 General Support Perform a serializing operation on all load-from-memory and store-to-memory instructions that were issued prior to this instruction. Guarantees that every memory access that precedes, in program order, the memory fence instruction is globally visible before any memory instruction which follows the fence in program order.
Integer SSE2 Arithmetic Add packed 8-bit integers in "a" and "b", and store the results in "dst". FOR j := 0 to 15 i := j*8 dst[i+7:i] := a[i+7:i] + b[i+7:i] ENDFOR
Integer SSE2 Arithmetic Add packed 16-bit integers in "a" and "b", and store the results in "dst". FOR j := 0 to 7 i := j*16 dst[i+15:i] := a[i+15:i] + b[i+15:i] ENDFOR
Integer SSE2 Arithmetic Add packed 32-bit integers in "a" and "b", and store the results in "dst". FOR j := 0 to 3 i := j*32 dst[i+31:i] := a[i+31:i] + b[i+31:i] ENDFOR
Integer SSE2 Arithmetic Add 64-bit integers "a" and "b", and store the result in "dst". dst[63:0] := a[63:0] + b[63:0]
Integer SSE2 Arithmetic Add packed 64-bit integers in "a" and "b", and store the results in "dst". FOR j := 0 to 1 i := j*64 dst[i+63:i] := a[i+63:i] + b[i+63:i] ENDFOR
Integer SSE2 Arithmetic Add packed signed 8-bit integers in "a" and "b" using saturation, and store the results in "dst". FOR j := 0 to 15 i := j*8 dst[i+7:i] := Saturate8( a[i+7:i] + b[i+7:i] ) ENDFOR
Integer SSE2 Arithmetic Add packed signed 16-bit integers in "a" and "b" using saturation, and store the results in "dst". FOR j := 0 to 7 i := j*16 dst[i+15:i] := Saturate16( a[i+15:i] + b[i+15:i] ) ENDFOR
Integer SSE2 Arithmetic Add packed unsigned 8-bit integers in "a" and "b" using saturation, and store the results in "dst". FOR j := 0 to 15 i := j*8 dst[i+7:i] := SaturateU8( a[i+7:i] + b[i+7:i] ) ENDFOR
Integer SSE2 Arithmetic Add packed unsigned 16-bit integers in "a" and "b" using saturation, and store the results in "dst". FOR j := 0 to 7 i := j*16 dst[i+15:i] := SaturateU16( a[i+15:i] + b[i+15:i] ) ENDFOR
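The difference between the plain and the saturating additions above is easiest to see on one lane. The sketch below is a scalar Python model; all function names are hypothetical, and the wraparound behaviour of the plain add is my reading of storing the sum back into an 8-bit field.

```python
def saturate(value, lo, hi):
    """Clamp value into [lo, hi], as Saturate8 / SaturateU8 are assumed to do."""
    return max(lo, min(hi, value))

def wrapping_add_i8(x, y):
    """Plain 8-bit add: the sum is truncated to 8 bits (two's complement)."""
    return ((x + y + 128) & 0xFF) - 128

def saturating_add_i8(x, y):
    return saturate(x + y, -128, 127)

def saturating_add_u8(x, y):
    return saturate(x + y, 0, 255)

print(wrapping_add_i8(100, 100))      # -56
print(saturating_add_i8(100, 100))    # 127
print(saturating_add_i8(-100, -100))  # -128
print(saturating_add_u8(200, 100))    # 255
```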
Integer SSE2 Probability/Statistics Average packed unsigned 8-bit integers in "a" and "b", and store the results in "dst". FOR j := 0 to 15 i := j*8 dst[i+7:i] := (a[i+7:i] + b[i+7:i] + 1) >> 1 ENDFOR
Integer SSE2 Probability/Statistics Average packed unsigned 16-bit integers in "a" and "b", and store the results in "dst". FOR j := 0 to 7 i := j*16 dst[i+15:i] := (a[i+15:i] + b[i+15:i] + 1) >> 1 ENDFOR
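The averaging formula above, (a + b + 1) >> 1, rounds halves upward and needs one extra bit for the intermediate sum. A minimal Python sketch (the name `avg_u8_model` is hypothetical):

```python
def avg_u8_model(a, b):
    """Lane-wise rounding average of unsigned 8-bit values."""
    # Python integers do not overflow, so the 9-bit intermediate is implicit.
    return [(x + y + 1) >> 1 for x, y in zip(a, b)]

print(avg_u8_model([0, 1, 255, 254], [0, 2, 255, 255]))  # [0, 2, 255, 255]
```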
Integer SSE2 Arithmetic Multiply packed signed 16-bit integers in "a" and "b", producing intermediate signed 32-bit integers. Horizontally add adjacent pairs of intermediate 32-bit integers, and pack the results in "dst". FOR j := 0 to 3 i := j*32 dst[i+31:i] := SignExtend32(a[i+31:i+16]*b[i+31:i+16]) + SignExtend32(a[i+15:i]*b[i+15:i]) ENDFOR
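The multiply-and-add entry above combines adjacent pairs of 16-bit products into 32-bit lanes. The sketch below models it in Python; the names are hypothetical, and the wrap to 32 bits reflects my reading of storing the sum into dst[i+31:i].

```python
def wrap_i32(value):
    """Reduce an integer to a signed 32-bit two's-complement value."""
    return ((value + 2**31) % 2**32) - 2**31

def madd_i16_model(a, b):
    """Multiply signed 16-bit lanes and add adjacent pairs into 32-bit lanes."""
    return [
        wrap_i32(a[2 * j] * b[2 * j] + a[2 * j + 1] * b[2 * j + 1])
        for j in range(len(a) // 2)
    ]

print(madd_i16_model([1, 2, 3, 4, 5, 6, 7, 8], [1] * 8))  # [3, 7, 11, 15]
# Only one input combination exceeds the signed 32-bit range:
print(madd_i16_model([-32768] * 8, [-32768] * 8))
# [-2147483648, -2147483648, -2147483648, -2147483648]
```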
Integer SSE2 Special Math Functions Compare packed signed 16-bit integers in "a" and "b", and store packed maximum values in "dst". FOR j := 0 to 7 i := j*16 dst[i+15:i] := MAX(a[i+15:i], b[i+15:i]) ENDFOR
Integer SSE2 Special Math Functions Compare packed unsigned 8-bit integers in "a" and "b", and store packed maximum values in "dst". FOR j := 0 to 15 i := j*8 dst[i+7:i] := MAX(a[i+7:i], b[i+7:i]) ENDFOR
Integer SSE2 Special Math Functions Compare packed signed 16-bit integers in "a" and "b", and store packed minimum values in "dst". FOR j := 0 to 7 i := j*16 dst[i+15:i] := MIN(a[i+15:i], b[i+15:i]) ENDFOR
Integer SSE2 Special Math Functions Compare packed unsigned 8-bit integers in "a" and "b", and store packed minimum values in "dst". FOR j := 0 to 15 i := j*8 dst[i+7:i] := MIN(a[i+7:i], b[i+7:i]) ENDFOR
Integer SSE2 Arithmetic Multiply the packed signed 16-bit integers in "a" and "b", producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in "dst". FOR j := 0 to 7 i := j*16 tmp[31:0] := SignExtend32(a[i+15:i]) * SignExtend32(b[i+15:i]) dst[i+15:i] := tmp[31:16] ENDFOR
Integer SSE2 Arithmetic Multiply the packed unsigned 16-bit integers in "a" and "b", producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in "dst". FOR j := 0 to 7 i := j*16 tmp[31:0] := a[i+15:i] * b[i+15:i] dst[i+15:i] := tmp[31:16] ENDFOR
Integer SSE2 Arithmetic Multiply the packed 16-bit integers in "a" and "b", producing intermediate 32-bit integers, and store the low 16 bits of the intermediate integers in "dst". FOR j := 0 to 7 i := j*16 tmp[31:0] := SignExtend32(a[i+15:i]) * SignExtend32(b[i+15:i]) dst[i+15:i] := tmp[15:0] ENDFOR
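The three 16-bit multiplies above differ only in which half of the 32-bit product is kept and whether the inputs are treated as signed. A scalar Python sketch, with hypothetical function names:

```python
def to_i16(bits):
    """Interpret the low 16 bits as a signed two's-complement value."""
    bits &= 0xFFFF
    return bits - 0x10000 if bits & 0x8000 else bits

def mulhi_i16_model(x, y):
    """High 16 bits of the signed 32-bit product."""
    tmp = (x * y) & 0xFFFFFFFF
    return to_i16(tmp >> 16)

def mulhi_u16_model(x, y):
    """High 16 bits of the unsigned 32-bit product."""
    return ((x * y) & 0xFFFFFFFF) >> 16

def mullo_16_model(x, y):
    """Low 16 bits of the product, returned as a raw bit pattern."""
    return (x * y) & 0xFFFF

print(mulhi_i16_model(-1, 2))       # -1
print(mulhi_u16_model(0xFFFF, 2))   # 1
print(mullo_16_model(300, 300))     # 24464
```

The same bit pattern 0xFFFF gives different high halves because it is read as -1 in the signed case and as 65535 in the unsigned case.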
Integer SSE2 Arithmetic Multiply the low unsigned 32-bit integers from "a" and "b", and store the unsigned 64-bit result in "dst". dst[63:0] := a[31:0] * b[31:0]
Integer SSE2 Arithmetic Multiply the low unsigned 32-bit integers from each packed 64-bit element in "a" and "b", and store the unsigned 64-bit results in "dst". FOR j := 0 to 1 i := j*64 dst[i+63:i] := a[i+31:i] * b[i+31:i] ENDFOR
Integer SSE2 Arithmetic Miscellaneous Compute the absolute differences of packed unsigned 8-bit integers in "a" and "b", then horizontally sum each consecutive 8 differences to produce two unsigned 16-bit integers, and pack these unsigned 16-bit integers in the low 16 bits of 64-bit elements in "dst". FOR j := 0 to 15 i := j*8 tmp[i+7:i] := ABS(a[i+7:i] - b[i+7:i]) ENDFOR FOR j := 0 to 1 i := j*64 dst[i+15:i] := tmp[i+7:i] + tmp[i+15:i+8] + tmp[i+23:i+16] + tmp[i+31:i+24] + tmp[i+39:i+32] + tmp[i+47:i+40] + tmp[i+55:i+48] + tmp[i+63:i+56] dst[i+63:i+16] := 0 ENDFOR
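The sum-of-absolute-differences entry above reduces sixteen byte lanes to two sums. A minimal Python sketch, where `sad_u8_model` is a hypothetical name and the result list holds the values of the two 64-bit elements of "dst":

```python
def sad_u8_model(a, b):
    """Sum of absolute differences over each group of 8 unsigned bytes."""
    diffs = [abs(x - y) for x, y in zip(a, b)]
    # Each sum is at most 8 * 255 = 2040, so it fits in 16 bits.
    return [sum(diffs[0:8]), sum(diffs[8:16])]

a = [10] * 8 + [0] * 8
b = [0] * 8 + [255] * 8
print(sad_u8_model(a, b))  # [80, 2040]
```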
Integer SSE2 Arithmetic Subtract packed 8-bit integers in "b" from packed 8-bit integers in "a", and store the results in "dst". FOR j := 0 to 15 i := j*8 dst[i+7:i] := a[i+7:i] - b[i+7:i] ENDFOR
Integer SSE2 Arithmetic Subtract packed 16-bit integers in "b" from packed 16-bit integers in "a", and store the results in "dst". FOR j := 0 to 7 i := j*16 dst[i+15:i] := a[i+15:i] - b[i+15:i] ENDFOR
Integer SSE2 Arithmetic Subtract packed 32-bit integers in "b" from packed 32-bit integers in "a", and store the results in "dst". FOR j := 0 to 3 i := j*32 dst[i+31:i] := a[i+31:i] - b[i+31:i] ENDFOR
Integer SSE2 Arithmetic Subtract 64-bit integer "b" from 64-bit integer "a", and store the result in "dst". dst[63:0] := a[63:0] - b[63:0]
Integer SSE2 Arithmetic Subtract packed 64-bit integers in "b" from packed 64-bit integers in "a", and store the results in "dst". FOR j := 0 to 1 i := j*64 dst[i+63:i] := a[i+63:i] - b[i+63:i] ENDFOR
Integer SSE2 Arithmetic Subtract packed signed 8-bit integers in "b" from packed 8-bit integers in "a" using saturation, and store the results in "dst". FOR j := 0 to 15 i := j*8 dst[i+7:i] := Saturate8(a[i+7:i] - b[i+7:i]) ENDFOR
Integer SSE2 Arithmetic Subtract packed signed 16-bit integers in "b" from packed 16-bit integers in "a" using saturation, and store the results in "dst". FOR j := 0 to 7 i := j*16 dst[i+15:i] := Saturate16(a[i+15:i] - b[i+15:i]) ENDFOR
Integer SSE2 Arithmetic Subtract packed unsigned 8-bit integers in "b" from packed unsigned 8-bit integers in "a" using saturation, and store the results in "dst". FOR j := 0 to 15 i := j*8 dst[i+7:i] := SaturateU8(a[i+7:i] - b[i+7:i]) ENDFOR
Integer SSE2 Arithmetic Subtract packed unsigned 16-bit integers in "b" from packed unsigned 16-bit integers in "a" using saturation, and store the results in "dst". FOR j := 0 to 7 i := j*16 dst[i+15:i] := SaturateU16(a[i+15:i] - b[i+15:i]) ENDFOR
Integer SSE2 Shift Shift "a" left by "imm8" bytes while shifting in zeros, and store the results in "dst". tmp := imm8[7:0] IF tmp > 15 tmp := 16 FI dst[127:0] := a[127:0] << (tmp*8)
Integer SSE2 Shift Shift "a" left by "imm8" bytes while shifting in zeros, and store the results in "dst". tmp := imm8[7:0] IF tmp > 15 tmp := 16 FI dst[127:0] := a[127:0] << (tmp*8)
Integer SSE2 Shift Shift "a" right by "imm8" bytes while shifting in zeros, and store the results in "dst". tmp := imm8[7:0] IF tmp > 15 tmp := 16 FI dst[127:0] := a[127:0] >> (tmp*8)
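The byte-granular shifts above treat the whole 128-bit value as one integer and clamp the byte count to 16, so any count above 15 yields zero. A small sketch, assuming the register is modeled as a Python integer (helper names are hypothetical):

```python
MASK128 = (1 << 128) - 1

def byte_shift_left(a, imm8):
    """Shift a 128-bit value left by imm8 bytes, shifting in zeros."""
    tmp = imm8 & 0xFF
    if tmp > 15:
        tmp = 16  # clamp: every byte is shifted out
    return (a << (tmp * 8)) & MASK128

def byte_shift_right(a, imm8):
    """Shift a 128-bit value right by imm8 bytes, shifting in zeros."""
    tmp = imm8 & 0xFF
    if tmp > 15:
        tmp = 16
    return (a & MASK128) >> (tmp * 8)
```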
Integer SSE2 Shift Shift packed 16-bit integers in "a" left by "imm8" while shifting in zeros, and store the results in "dst". FOR j := 0 to 7 i := j*16 IF imm8[7:0] > 15 dst[i+15:i] := 0 ELSE dst[i+15:i] := ZeroExtend16(a[i+15:i] << imm8[7:0]) FI ENDFOR
Integer SSE2 Shift Shift packed 16-bit integers in "a" left by "count" while shifting in zeros, and store the results in "dst". FOR j := 0 to 7 i := j*16 IF count[63:0] > 15 dst[i+15:i] := 0 ELSE dst[i+15:i] := ZeroExtend16(a[i+15:i] << count[63:0]) FI ENDFOR
Integer SSE2 Shift Shift packed 32-bit integers in "a" left by "imm8" while shifting in zeros, and store the results in "dst". FOR j := 0 to 3 i := j*32 IF imm8[7:0] > 31 dst[i+31:i] := 0 ELSE dst[i+31:i] := ZeroExtend32(a[i+31:i] << imm8[7:0]) FI ENDFOR
Integer SSE2 Shift Shift packed 32-bit integers in "a" left by "count" while shifting in zeros, and store the results in "dst". FOR j := 0 to 3 i := j*32 IF count[63:0] > 31 dst[i+31:i] := 0 ELSE dst[i+31:i] := ZeroExtend32(a[i+31:i] << count[63:0]) FI ENDFOR
Integer SSE2 Shift Shift packed 64-bit integers in "a" left by "imm8" while shifting in zeros, and store the results in "dst". FOR j := 0 to 1 i := j*64 IF imm8[7:0] > 63 dst[i+63:i] := 0 ELSE dst[i+63:i] := ZeroExtend64(a[i+63:i] << imm8[7:0]) FI ENDFOR
Integer SSE2 Shift Shift packed 64-bit integers in "a" left by "count" while shifting in zeros, and store the results in "dst". FOR j := 0 to 1 i := j*64 IF count[63:0] > 63 dst[i+63:i] := 0 ELSE dst[i+63:i] := ZeroExtend64(a[i+63:i] << count[63:0]) FI ENDFOR
Integer SSE2 Shift Shift packed 16-bit integers in "a" right by "imm8" while shifting in sign bits, and store the results in "dst". FOR j := 0 to 7 i := j*16 IF imm8[7:0] > 15 dst[i+15:i] := (a[i+15] ? 0xFFFF : 0x0) ELSE dst[i+15:i] := SignExtend16(a[i+15:i] >> imm8[7:0]) FI ENDFOR
Integer SSE2 Shift Shift packed 16-bit integers in "a" right by "count" while shifting in sign bits, and store the results in "dst". FOR j := 0 to 7 i := j*16 IF count[63:0] > 15 dst[i+15:i] := (a[i+15] ? 0xFFFF : 0x0) ELSE dst[i+15:i] := SignExtend16(a[i+15:i] >> count[63:0]) FI ENDFOR
Integer SSE2 Shift Shift packed 32-bit integers in "a" right by "imm8" while shifting in sign bits, and store the results in "dst". FOR j := 0 to 3 i := j*32 IF imm8[7:0] > 31 dst[i+31:i] := (a[i+31] ? 0xFFFFFFFF : 0x0) ELSE dst[i+31:i] := SignExtend32(a[i+31:i] >> imm8[7:0]) FI ENDFOR
Integer SSE2 Shift Shift packed 32-bit integers in "a" right by "count" while shifting in sign bits, and store the results in "dst". FOR j := 0 to 3 i := j*32 IF count[63:0] > 31 dst[i+31:i] := (a[i+31] ? 0xFFFFFFFF : 0x0) ELSE dst[i+31:i] := SignExtend32(a[i+31:i] >> count[63:0]) FI ENDFOR
Integer SSE2 Shift Shift "a" right by "imm8" bytes while shifting in zeros, and store the results in "dst". tmp := imm8[7:0] IF tmp > 15 tmp := 16 FI dst[127:0] := a[127:0] >> (tmp*8)
Integer SSE2 Shift Shift packed 16-bit integers in "a" right by "imm8" while shifting in zeros, and store the results in "dst". FOR j := 0 to 7 i := j*16 IF imm8[7:0] > 15 dst[i+15:i] := 0 ELSE dst[i+15:i] := ZeroExtend16(a[i+15:i] >> imm8[7:0]) FI ENDFOR
Integer SSE2 Shift Shift packed 16-bit integers in "a" right by "count" while shifting in zeros, and store the results in "dst". FOR j := 0 to 7 i := j*16 IF count[63:0] > 15 dst[i+15:i] := 0 ELSE dst[i+15:i] := ZeroExtend16(a[i+15:i] >> count[63:0]) FI ENDFOR
Integer SSE2 Shift Shift packed 32-bit integers in "a" right by "imm8" while shifting in zeros, and store the results in "dst". FOR j := 0 to 3 i := j*32 IF imm8[7:0] > 31 dst[i+31:i] := 0 ELSE dst[i+31:i] := ZeroExtend32(a[i+31:i] >> imm8[7:0]) FI ENDFOR
Integer SSE2 Shift Shift packed 32-bit integers in "a" right by "count" while shifting in zeros, and store the results in "dst". FOR j := 0 to 3 i := j*32 IF count[63:0] > 31 dst[i+31:i] := 0 ELSE dst[i+31:i] := ZeroExtend32(a[i+31:i] >> count[63:0]) FI ENDFOR
Integer SSE2 Shift Shift packed 64-bit integers in "a" right by "imm8" while shifting in zeros, and store the results in "dst". FOR j := 0 to 1 i := j*64 IF imm8[7:0] > 63 dst[i+63:i] := 0 ELSE dst[i+63:i] := ZeroExtend64(a[i+63:i] >> imm8[7:0]) FI ENDFOR
Integer SSE2 Shift Shift packed 64-bit integers in "a" right by "count" while shifting in zeros, and store the results in "dst". FOR j := 0 to 1 i := j*64 IF count[63:0] > 63 dst[i+63:i] := 0 ELSE dst[i+63:i] := ZeroExtend64(a[i+63:i] >> count[63:0]) FI ENDFOR
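The per-lane shift records above share one pattern: logical shifts produce zero once the count exceeds the lane width minus one, while arithmetic right shifts fill the lane with copies of the sign bit. A scalar sketch under the assumption that lanes are Python lists of unsigned lane values (function names are hypothetical):

```python
def sll(lanes, count, bits):
    """Logical left shift of each lane; counts above bits-1 give zero."""
    mask = (1 << bits) - 1
    if count > bits - 1:
        return [0] * len(lanes)
    return [(x << count) & mask for x in lanes]

def srl(lanes, count, bits):
    """Logical right shift of each lane; counts above bits-1 give zero."""
    mask = (1 << bits) - 1
    if count > bits - 1:
        return [0] * len(lanes)
    return [(x & mask) >> count for x in lanes]

def sra(lanes, count, bits):
    """Arithmetic right shift; large counts fill the lane with the sign bit."""
    mask = (1 << bits) - 1
    out = []
    for x in lanes:
        x &= mask
        signed = x - (1 << bits) if x >> (bits - 1) else x
        # Shifting by bits-1 already leaves only sign bits, so clamp there.
        out.append((signed >> min(count, bits - 1)) & mask)
    return out
```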
Integer SSE2 Logical Compute the bitwise AND of 128 bits (representing integer data) in "a" and "b", and store the result in "dst". dst[127:0] := (a[127:0] AND b[127:0])
Integer SSE2 Logical Compute the bitwise NOT of 128 bits (representing integer data) in "a" and then AND with "b", and store the result in "dst". dst[127:0] := ((NOT a[127:0]) AND b[127:0])
Integer SSE2 Logical Compute the bitwise OR of 128 bits (representing integer data) in "a" and "b", and store the result in "dst". dst[127:0] := (a[127:0] OR b[127:0])
Integer SSE2 Logical Compute the bitwise XOR of 128 bits (representing integer data) in "a" and "b", and store the result in "dst". dst[127:0] := (a[127:0] XOR b[127:0])
Integer SSE2 Compare Compare packed 8-bit integers in "a" and "b" for equality, and store the results in "dst". FOR j := 0 to 15 i := j*8 dst[i+7:i] := ( a[i+7:i] == b[i+7:i] ) ? 0xFF : 0 ENDFOR
Integer SSE2 Compare Compare packed 16-bit integers in "a" and "b" for equality, and store the results in "dst". FOR j := 0 to 7 i := j*16 dst[i+15:i] := ( a[i+15:i] == b[i+15:i] ) ? 0xFFFF : 0 ENDFOR
Integer SSE2 Compare Compare packed 32-bit integers in "a" and "b" for equality, and store the results in "dst". FOR j := 0 to 3 i := j*32 dst[i+31:i] := ( a[i+31:i] == b[i+31:i] ) ? 0xFFFFFFFF : 0 ENDFOR
Integer SSE2 Compare Compare packed signed 8-bit integers in "a" and "b" for greater-than, and store the results in "dst". FOR j := 0 to 15 i := j*8 dst[i+7:i] := ( a[i+7:i] > b[i+7:i] ) ? 0xFF : 0 ENDFOR
Integer SSE2 Compare Compare packed signed 16-bit integers in "a" and "b" for greater-than, and store the results in "dst". FOR j := 0 to 7 i := j*16 dst[i+15:i] := ( a[i+15:i] > b[i+15:i] ) ? 0xFFFF : 0 ENDFOR
Integer SSE2 Compare Compare packed signed 32-bit integers in "a" and "b" for greater-than, and store the results in "dst". FOR j := 0 to 3 i := j*32 dst[i+31:i] := ( a[i+31:i] > b[i+31:i] ) ? 0xFFFFFFFF : 0 ENDFOR
Integer SSE2 Compare Compare packed signed 8-bit integers in "a" and "b" for less-than, and store the results in "dst". Note: This intrinsic emits the pcmpgtb instruction with the order of the operands switched. FOR j := 0 to 15 i := j*8 dst[i+7:i] := ( a[i+7:i] < b[i+7:i] ) ? 0xFF : 0 ENDFOR
Integer SSE2 Compare Compare packed signed 16-bit integers in "a" and "b" for less-than, and store the results in "dst". Note: This intrinsic emits the pcmpgtw instruction with the order of the operands switched. FOR j := 0 to 7 i := j*16 dst[i+15:i] := ( a[i+15:i] < b[i+15:i] ) ? 0xFFFF : 0 ENDFOR
Integer SSE2 Compare Compare packed signed 32-bit integers in "a" and "b" for less-than, and store the results in "dst". Note: This intrinsic emits the pcmpgtd instruction with the order of the operands switched. FOR j := 0 to 3 i := j*32 dst[i+31:i] := ( a[i+31:i] < b[i+31:i] ) ? 0xFFFFFFFF : 0 ENDFOR
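The compare records above write an all-ones lane for true and zero for false, and the less-than forms are described as greater-than with the operands switched. A minimal sketch of that relationship, assuming lanes are Python lists of unsigned lane values (names are hypothetical):

```python
def to_signed(x, bits):
    """Reinterpret the low `bits` bits of x as a two's-complement value."""
    x &= (1 << bits) - 1
    return x - (1 << bits) if x >> (bits - 1) else x

def cmpeq(a, b, bits):
    ones = (1 << bits) - 1
    return [ones if (x & ones) == (y & ones) else 0 for x, y in zip(a, b)]

def cmpgt(a, b, bits):
    """Signed greater-than, producing a per-lane mask."""
    ones = (1 << bits) - 1
    return [ones if to_signed(x, bits) > to_signed(y, bits) else 0
            for x, y in zip(a, b)]

def cmplt(a, b, bits):
    """Less-than expressed as greater-than with swapped operands."""
    return cmpgt(b, a, bits)
```

Because the comparison is signed, the byte 0xFF is read as -1 and therefore compares less than 0x01.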
Floating Point Integer SSE2 Convert Convert packed signed 32-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst". FOR j := 0 to 1 i := j*32 m := j*64 dst[m+63:m] := Convert_Int32_To_FP64(a[i+31:i]) ENDFOR
Floating Point SSE2 Convert Convert the signed 32-bit integer "b" to a double-precision (64-bit) floating-point element, store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst". dst[63:0] := Convert_Int32_To_FP64(b[31:0]) dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point Integer SSE2 Convert Convert the signed 64-bit integer "b" to a double-precision (64-bit) floating-point element, store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst". dst[63:0] := Convert_Int64_To_FP64(b[63:0]) dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point Integer SSE2 Convert Convert the signed 64-bit integer "b" to a double-precision (64-bit) floating-point element, store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst". dst[63:0] := Convert_Int64_To_FP64(b[63:0]) dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point Integer SSE2 Convert Convert packed signed 32-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst". FOR j := 0 to 3 i := 32*j dst[i+31:i] := Convert_Int32_To_FP32(a[i+31:i]) ENDFOR
Floating Point SSE2 Convert Convert packed signed 32-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst". FOR j := 0 to 1 i := j*32 m := j*64 dst[m+63:m] := Convert_Int32_To_FP64(a[i+31:i]) ENDFOR
Integer SSE2 Convert Copy 32-bit integer "a" to the lower element of "dst", and zero the upper elements of "dst". dst[31:0] := a[31:0] dst[127:32] := 0
Integer SSE2 Convert Copy 64-bit integer "a" to the lower element of "dst", and zero the upper element. dst[63:0] := a[63:0] dst[127:64] := 0
Integer SSE2 Convert Copy 64-bit integer "a" to the lower element of "dst", and zero the upper element. dst[63:0] := a[63:0] dst[127:64] := 0
Integer SSE2 Convert Copy the lower 32-bit integer in "a" to "dst". dst[31:0] := a[31:0]
Integer SSE2 Convert Copy the lower 64-bit integer in "a" to "dst". dst[63:0] := a[63:0]
Integer SSE2 Convert Copy the lower 64-bit integer in "a" to "dst". dst[63:0] := a[63:0]
Integer SSE2 Set Set packed 64-bit integers in "dst" with the supplied values. dst[63:0] := e0 dst[127:64] := e1
Integer SSE2 Set Set packed 64-bit integers in "dst" with the supplied values. dst[63:0] := e0 dst[127:64] := e1
Integer SSE2 Set Set packed 32-bit integers in "dst" with the supplied values. dst[31:0] := e0 dst[63:32] := e1 dst[95:64] := e2 dst[127:96] := e3
Integer SSE2 Set Set packed 16-bit integers in "dst" with the supplied values. dst[15:0] := e0 dst[31:16] := e1 dst[47:32] := e2 dst[63:48] := e3 dst[79:64] := e4 dst[95:80] := e5 dst[111:96] := e6 dst[127:112] := e7
Integer SSE2 Set Set packed 8-bit integers in "dst" with the supplied values. dst[7:0] := e0 dst[15:8] := e1 dst[23:16] := e2 dst[31:24] := e3 dst[39:32] := e4 dst[47:40] := e5 dst[55:48] := e6 dst[63:56] := e7 dst[71:64] := e8 dst[79:72] := e9 dst[87:80] := e10 dst[95:88] := e11 dst[103:96] := e12 dst[111:104] := e13 dst[119:112] := e14 dst[127:120] := e15
Integer SSE2 Set Broadcast 64-bit integer "a" to all elements of "dst". FOR j := 0 to 1 i := j*64 dst[i+63:i] := a[63:0] ENDFOR
Integer SSE2 Set Broadcast 64-bit integer "a" to all elements of "dst". This intrinsic may generate "vpbroadcastq". FOR j := 0 to 1 i := j*64 dst[i+63:i] := a[63:0] ENDFOR
Integer SSE2 Set Broadcast 32-bit integer "a" to all elements of "dst". This intrinsic may generate "vpbroadcastd". FOR j := 0 to 3 i := j*32 dst[i+31:i] := a[31:0] ENDFOR
Integer SSE2 Set Broadcast 16-bit integer "a" to all elements of "dst". This intrinsic may generate "vpbroadcastw". FOR j := 0 to 7 i := j*16 dst[i+15:i] := a[15:0] ENDFOR
Integer SSE2 Set Broadcast 8-bit integer "a" to all elements of "dst". This intrinsic may generate "vpbroadcastb". FOR j := 0 to 15 i := j*8 dst[i+7:i] := a[7:0] ENDFOR
Integer SSE2 Set Set packed 64-bit integers in "dst" with the supplied values in reverse order. dst[63:0] := e1 dst[127:64] := e0
Integer SSE2 Set Set packed 32-bit integers in "dst" with the supplied values in reverse order. dst[31:0] := e3 dst[63:32] := e2 dst[95:64] := e1 dst[127:96] := e0
Integer SSE2 Set Set packed 16-bit integers in "dst" with the supplied values in reverse order. dst[15:0] := e7 dst[31:16] := e6 dst[47:32] := e5 dst[63:48] := e4 dst[79:64] := e3 dst[95:80] := e2 dst[111:96] := e1 dst[127:112] := e0
Integer SSE2 Set Set packed 8-bit integers in "dst" with the supplied values in reverse order. dst[7:0] := e15 dst[15:8] := e14 dst[23:16] := e13 dst[31:24] := e12 dst[39:32] := e11 dst[47:40] := e10 dst[55:48] := e9 dst[63:56] := e8 dst[71:64] := e7 dst[79:72] := e6 dst[87:80] := e5 dst[95:88] := e4 dst[103:96] := e3 dst[111:104] := e2 dst[119:112] := e1 dst[127:120] := e0
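The set records and their reverse-order counterparts differ only in which supplied value lands in the lowest lane. The sketch below assumes the values are passed highest index first (for example e3, e2, e1, e0), which is an assumption about the call signature and is not stated in the records; the helper names are hypothetical.

```python
def set_order(*values):
    """Last argument (e0) goes to the lowest lane. Returns lanes low to high."""
    return list(reversed(values))

def setr_order(*values):
    """Reverse order: first argument goes to the lowest lane."""
    return list(values)

def pack_lanes(lanes, bits):
    """Pack lanes (low to high) into one integer, as in dst[bits*k+bits-1 : bits*k]."""
    mask = (1 << bits) - 1
    value = 0
    for index, lane in enumerate(lanes):
        value |= (lane & mask) << (index * bits)
    return value
```

For instance, `pack_lanes(set_order(3, 2, 1, 0), 32)` places 0 in bits 31:0 and 3 in bits 127:96, while `setr_order` with the same arguments places 3 in bits 31:0.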
Integer SSE2 Set Return vector of type __m128i with all elements set to zero. dst[MAX:0] := 0
Integer SSE2 Load Load 64-bit integer from memory into the first element of "dst". dst[63:0] := MEM[mem_addr+63:mem_addr] dst[MAX:64] := 0
Integer SSE2 Load Load 128 bits of integer data from memory into "dst". "mem_addr" must be aligned on a 16-byte boundary or a general-protection exception may be generated. dst[127:0] := MEM[mem_addr+127:mem_addr]
Integer SSE2 Load Load 128 bits of integer data from memory into "dst". "mem_addr" does not need to be aligned on any particular boundary. dst[127:0] := MEM[mem_addr+127:mem_addr]
Integer SSE2 Store Conditionally store 8-bit integer elements from "a" into memory using "mask" (elements are not stored when the highest bit is not set in the corresponding element) and a non-temporal memory hint. "mem_addr" does not need to be aligned on any particular boundary. FOR j := 0 to 15 i := j*8 IF mask[i+7] MEM[mem_addr+i+7:mem_addr+i] := a[i+7:i] FI ENDFOR
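The conditional store above writes a byte only when the top bit of the matching mask byte is set; all other memory bytes are left untouched. A minimal sketch, assuming memory is modeled as a `bytearray` and the vectors as 16-element lists (the function name is hypothetical, and the non-temporal hint has no equivalent here):

```python
def masked_byte_store(memory, addr, a, mask):
    """Store a[j] to memory[addr + j] only where mask[j] has bit 7 set."""
    for j in range(16):
        if mask[j] & 0x80:
            memory[addr + j] = a[j] & 0xFF
```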
Integer SSE2 Store Store 128 bits of integer data from "a" into memory. "mem_addr" must be aligned on a 16-byte boundary or a general-protection exception may be generated. MEM[mem_addr+127:mem_addr] := a[127:0]
Integer SSE2 Store Store 128 bits of integer data from "a" into memory. "mem_addr" does not need to be aligned on any particular boundary. MEM[mem_addr+127:mem_addr] := a[127:0]
Integer SSE2 Store Store 64-bit integer from the first element of "a" into memory. MEM[mem_addr+63:mem_addr] := a[63:0]
Integer SSE2 Store Store 128 bits of integer data from "a" into memory using a non-temporal memory hint. "mem_addr" must be aligned on a 16-byte boundary or a general-protection exception may be generated. MEM[mem_addr+127:mem_addr] := a[127:0]
Integer SSE2 Store Store 32-bit integer "a" into memory using a non-temporal hint to minimize cache pollution. If the cache line containing address "mem_addr" is already in the cache, the cache will be updated. MEM[mem_addr+31:mem_addr] := a[31:0]
Integer SSE2 Store Store 64-bit integer "a" into memory using a non-temporal hint to minimize cache pollution. If the cache line containing address "mem_addr" is already in the cache, the cache will be updated. MEM[mem_addr+63:mem_addr] := a[63:0]
Integer SSE2 Miscellaneous Copy the lower 64-bit integer in "a" to "dst". dst[63:0] := a[63:0]
Integer SSE2 Move Copy the 64-bit integer "a" to the lower element of "dst", and zero the upper element. dst[63:0] := a[63:0] dst[127:64] := 0
Integer SSE2 Move Copy the lower 64-bit integer in "a" to the lower element of "dst", and zero the upper element. dst[63:0] := a[63:0] dst[127:64] := 0
Integer SSE2 Miscellaneous Convert packed signed 16-bit integers from "a" and "b" to packed 8-bit integers using signed saturation, and store the results in "dst". dst[7:0] := Saturate8(a[15:0]) dst[15:8] := Saturate8(a[31:16]) dst[23:16] := Saturate8(a[47:32]) dst[31:24] := Saturate8(a[63:48]) dst[39:32] := Saturate8(a[79:64]) dst[47:40] := Saturate8(a[95:80]) dst[55:48] := Saturate8(a[111:96]) dst[63:56] := Saturate8(a[127:112]) dst[71:64] := Saturate8(b[15:0]) dst[79:72] := Saturate8(b[31:16]) dst[87:80] := Saturate8(b[47:32]) dst[95:88] := Saturate8(b[63:48]) dst[103:96] := Saturate8(b[79:64]) dst[111:104] := Saturate8(b[95:80]) dst[119:112] := Saturate8(b[111:96]) dst[127:120] := Saturate8(b[127:112])
Integer SSE2 Miscellaneous Convert packed signed 32-bit integers from "a" and "b" to packed 16-bit integers using signed saturation, and store the results in "dst". dst[15:0] := Saturate16(a[31:0]) dst[31:16] := Saturate16(a[63:32]) dst[47:32] := Saturate16(a[95:64]) dst[63:48] := Saturate16(a[127:96]) dst[79:64] := Saturate16(b[31:0]) dst[95:80] := Saturate16(b[63:32]) dst[111:96] := Saturate16(b[95:64]) dst[127:112] := Saturate16(b[127:96])
Integer SSE2 Miscellaneous Convert packed signed 16-bit integers from "a" and "b" to packed 8-bit integers using unsigned saturation, and store the results in "dst". dst[7:0] := SaturateU8(a[15:0]) dst[15:8] := SaturateU8(a[31:16]) dst[23:16] := SaturateU8(a[47:32]) dst[31:24] := SaturateU8(a[63:48]) dst[39:32] := SaturateU8(a[79:64]) dst[47:40] := SaturateU8(a[95:80]) dst[55:48] := SaturateU8(a[111:96]) dst[63:56] := SaturateU8(a[127:112]) dst[71:64] := SaturateU8(b[15:0]) dst[79:72] := SaturateU8(b[31:16]) dst[87:80] := SaturateU8(b[47:32]) dst[95:88] := SaturateU8(b[63:48]) dst[103:96] := SaturateU8(b[79:64]) dst[111:104] := SaturateU8(b[95:80]) dst[119:112] := SaturateU8(b[111:96]) dst[127:120] := SaturateU8(b[127:112])
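The three pack records above narrow each signed source lane to half its width, taking the lanes of "a" first and then those of "b". Signed saturation clamps to the signed range of the narrow type, and unsigned saturation clamps the signed source to the unsigned range. A scalar sketch, assuming lanes are Python lists of unsigned lane values (names are hypothetical):

```python
def to_signed(x, bits):
    """Reinterpret the low `bits` bits of x as a two's-complement value."""
    x &= (1 << bits) - 1
    return x - (1 << bits) if x >> (bits - 1) else x

def pack_saturate(a, b, dst_bits, unsigned_result):
    """Narrow signed lanes of a then b to dst_bits with saturation."""
    src_bits = dst_bits * 2
    if unsigned_result:
        lo, hi = 0, (1 << dst_bits) - 1
    else:
        lo, hi = -(1 << (dst_bits - 1)), (1 << (dst_bits - 1)) - 1
    out = []
    for x in list(a) + list(b):
        clamped = max(lo, min(hi, to_signed(x, src_bits)))
        out.append(clamped & ((1 << dst_bits) - 1))  # store as raw lane bits
    return out
```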
Integer SSE2 Swizzle Extract a 16-bit integer from "a", selected with "imm8", and store the result in the lower element of "dst". dst[15:0] := (a[127:0] >> (imm8[2:0] * 16))[15:0] dst[31:16] := 0
Integer SSE2 Swizzle Copy "a" to "dst", and insert the 16-bit integer "i" into "dst" at the location specified by "imm8". dst[127:0] := a[127:0] sel := imm8[2:0]*16 dst[sel+15:sel] := i[15:0]
Integer SSE2 Miscellaneous Create mask from the most significant bit of each 8-bit element in "a", and store the result in "dst". FOR j := 0 to 15 i := j*8 dst[j] := a[i+7] ENDFOR dst[MAX:16] := 0
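The mask-creation record above gathers the top bit of each of the 16 bytes into bit j of the result and zeroes everything above bit 15. A short sketch, assuming the vector is a list of 16 byte values with element 0 lowest (the function name is hypothetical):

```python
def byte_sign_mask(a_bytes):
    """Collect bit 7 of each of the 16 bytes into a 16-bit integer mask."""
    mask = 0
    for j, byte in enumerate(a_bytes[:16]):
        mask |= ((byte >> 7) & 1) << j
    return mask
```

A common use is to feed the output of a byte compare into this function to learn which lanes matched.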
Integer SSE2 Swizzle Shuffle 32-bit integers in "a" using the control in "imm8", and store the results in "dst". DEFINE SELECT4(src, control) { CASE(control[1:0]) OF 0: tmp[31:0] := src[31:0] 1: tmp[31:0] := src[63:32] 2: tmp[31:0] := src[95:64] 3: tmp[31:0] := src[127:96] ESAC RETURN tmp[31:0] } dst[31:0] := SELECT4(a[127:0], imm8[1:0]) dst[63:32] := SELECT4(a[127:0], imm8[3:2]) dst[95:64] := SELECT4(a[127:0], imm8[5:4]) dst[127:96] := SELECT4(a[127:0], imm8[7:6])
Integer SSE2 Swizzle Shuffle 16-bit integers in the high 64 bits of "a" using the control in "imm8". Store the results in the high 64 bits of "dst", with the low 64 bits being copied from "a" to "dst". dst[63:0] := a[63:0] dst[79:64] := (a >> (imm8[1:0] * 16))[79:64] dst[95:80] := (a >> (imm8[3:2] * 16))[79:64] dst[111:96] := (a >> (imm8[5:4] * 16))[79:64] dst[127:112] := (a >> (imm8[7:6] * 16))[79:64]
Integer SSE2 Swizzle Shuffle 16-bit integers in the low 64 bits of "a" using the control in "imm8". Store the results in the low 64 bits of "dst", with the high 64 bits being copied from "a" to "dst". dst[15:0] := (a >> (imm8[1:0] * 16))[15:0] dst[31:16] := (a >> (imm8[3:2] * 16))[15:0] dst[47:32] := (a >> (imm8[5:4] * 16))[15:0] dst[63:48] := (a >> (imm8[7:6] * 16))[15:0] dst[127:64] := a[127:64]
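In all three shuffle records above, each pair of bits in "imm8" selects one of four source lanes for one destination lane. A minimal sketch, assuming lanes are Python lists ordered low to high (four 32-bit lanes, or eight 16-bit lanes for the high/low variants); the function names are hypothetical.

```python
def select4(lanes, imm8):
    """Destination lane k takes source lane imm8[2k+1:2k]."""
    return [lanes[(imm8 >> (2 * k)) & 3] for k in range(4)]

def shuffle32(a, imm8):
    """Shuffle four 32-bit lanes."""
    return select4(a, imm8)

def shuffle_low16(a, imm8):
    """Shuffle the four low 16-bit lanes; copy the four high lanes."""
    return select4(a[:4], imm8) + list(a[4:])

def shuffle_high16(a, imm8):
    """Shuffle the four high 16-bit lanes; copy the four low lanes."""
    return list(a[:4]) + select4(a[4:], imm8)
```

With a control of 0x1B (binary 00 01 10 11) the four affected lanes are reversed, and with 0x00 the lowest affected lane is broadcast.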
Integer SSE2 Swizzle Unpack and interleave 8-bit integers from the high half of "a" and "b", and store the results in "dst". DEFINE INTERLEAVE_HIGH_BYTES(src1[127:0], src2[127:0]) { dst[7:0] := src1[71:64] dst[15:8] := src2[71:64] dst[23:16] := src1[79:72] dst[31:24] := src2[79:72] dst[39:32] := src1[87:80] dst[47:40] := src2[87:80] dst[55:48] := src1[95:88] dst[63:56] := src2[95:88] dst[71:64] := src1[103:96] dst[79:72] := src2[103:96] dst[87:80] := src1[111:104] dst[95:88] := src2[111:104] dst[103:96] := src1[119:112] dst[111:104] := src2[119:112] dst[119:112] := src1[127:120] dst[127:120] := src2[127:120] RETURN dst[127:0] } dst[127:0] := INTERLEAVE_HIGH_BYTES(a[127:0], b[127:0])
Integer SSE2 Swizzle Unpack and interleave 16-bit integers from the high half of "a" and "b", and store the results in "dst". DEFINE INTERLEAVE_HIGH_WORDS(src1[127:0], src2[127:0]) { dst[15:0] := src1[79:64] dst[31:16] := src2[79:64] dst[47:32] := src1[95:80] dst[63:48] := src2[95:80] dst[79:64] := src1[111:96] dst[95:80] := src2[111:96] dst[111:96] := src1[127:112] dst[127:112] := src2[127:112] RETURN dst[127:0] } dst[127:0] := INTERLEAVE_HIGH_WORDS(a[127:0], b[127:0])
Integer SSE2 Swizzle Unpack and interleave 32-bit integers from the high half of "a" and "b", and store the results in "dst". DEFINE INTERLEAVE_HIGH_DWORDS(src1[127:0], src2[127:0]) { dst[31:0] := src1[95:64] dst[63:32] := src2[95:64] dst[95:64] := src1[127:96] dst[127:96] := src2[127:96] RETURN dst[127:0] } dst[127:0] := INTERLEAVE_HIGH_DWORDS(a[127:0], b[127:0])
Integer SSE2 Swizzle Unpack and interleave 64-bit integers from the high half of "a" and "b", and store the results in "dst". DEFINE INTERLEAVE_HIGH_QWORDS(src1[127:0], src2[127:0]) { dst[63:0] := src1[127:64] dst[127:64] := src2[127:64] RETURN dst[127:0] } dst[127:0] := INTERLEAVE_HIGH_QWORDS(a[127:0], b[127:0])
Integer SSE2 Swizzle Unpack and interleave 8-bit integers from the low half of "a" and "b", and store the results in "dst". DEFINE INTERLEAVE_BYTES(src1[127:0], src2[127:0]) { dst[7:0] := src1[7:0] dst[15:8] := src2[7:0] dst[23:16] := src1[15:8] dst[31:24] := src2[15:8] dst[39:32] := src1[23:16] dst[47:40] := src2[23:16] dst[55:48] := src1[31:24] dst[63:56] := src2[31:24] dst[71:64] := src1[39:32] dst[79:72] := src2[39:32] dst[87:80] := src1[47:40] dst[95:88] := src2[47:40] dst[103:96] := src1[55:48] dst[111:104] := src2[55:48] dst[119:112] := src1[63:56] dst[127:120] := src2[63:56] RETURN dst[127:0] } dst[127:0] := INTERLEAVE_BYTES(a[127:0], b[127:0])
Integer SSE2 Swizzle Unpack and interleave 16-bit integers from the low half of "a" and "b", and store the results in "dst". DEFINE INTERLEAVE_WORDS(src1[127:0], src2[127:0]) { dst[15:0] := src1[15:0] dst[31:16] := src2[15:0] dst[47:32] := src1[31:16] dst[63:48] := src2[31:16] dst[79:64] := src1[47:32] dst[95:80] := src2[47:32] dst[111:96] := src1[63:48] dst[127:112] := src2[63:48] RETURN dst[127:0] } dst[127:0] := INTERLEAVE_WORDS(a[127:0], b[127:0])
Integer SSE2 Swizzle Unpack and interleave 32-bit integers from the low half of "a" and "b", and store the results in "dst". DEFINE INTERLEAVE_DWORDS(src1[127:0], src2[127:0]) { dst[31:0] := src1[31:0] dst[63:32] := src2[31:0] dst[95:64] := src1[63:32] dst[127:96] := src2[63:32] RETURN dst[127:0] } dst[127:0] := INTERLEAVE_DWORDS(a[127:0], b[127:0])
Integer SSE2 Swizzle Unpack and interleave 64-bit integers from the low half of "a" and "b", and store the results in "dst". DEFINE INTERLEAVE_QWORDS(src1[127:0], src2[127:0]) { dst[63:0] := src1[63:0] dst[127:64] := src2[63:0] RETURN dst[127:0] } dst[127:0] := INTERLEAVE_QWORDS(a[127:0], b[127:0])
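The eight unpack records above follow one rule regardless of lane width: take one half of each source and alternate lanes, starting with "a". A sketch, assuming lanes are Python lists ordered low to high (names are hypothetical):

```python
def unpack_low(a, b):
    """Interleave the low halves of a and b: a0, b0, a1, b1, ..."""
    half = len(a) // 2
    out = []
    for x, y in zip(a[:half], b[:half]):
        out += [x, y]
    return out

def unpack_high(a, b):
    """Interleave the high halves of a and b."""
    half = len(a) // 2
    out = []
    for x, y in zip(a[half:], b[half:]):
        out += [x, y]
    return out
```

Passing an all-zero vector as "b" gives a zero-extension of the lanes of "a" to twice their width when the result is reinterpreted with wider lanes.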
Floating Point SSE2 Arithmetic Add the lower double-precision (64-bit) floating-point element in "a" and "b", store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst". dst[63:0] := a[63:0] + b[63:0] dst[127:64] := a[127:64]
Floating Point SSE2 Arithmetic Add packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst". FOR j := 0 to 1 i := j*64 dst[i+63:i] := a[i+63:i] + b[i+63:i] ENDFOR
Floating Point SSE2 Arithmetic Divide the lower double-precision (64-bit) floating-point element in "a" by the lower double-precision (64-bit) floating-point element in "b", store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst". dst[63:0] := a[63:0] / b[63:0] dst[127:64] := a[127:64]
Floating Point SSE2 Arithmetic Divide packed double-precision (64-bit) floating-point elements in "a" by packed elements in "b", and store the results in "dst". FOR j := 0 to 1 i := 64*j dst[i+63:i] := a[i+63:i] / b[i+63:i] ENDFOR
Floating Point SSE2 Special Math Functions Compare the lower double-precision (64-bit) floating-point elements in "a" and "b", store the maximum value in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst". dst[63:0] := MAX(a[63:0], b[63:0]) dst[127:64] := a[127:64]
Floating Point SSE2 Special Math Functions Compare packed double-precision (64-bit) floating-point elements in "a" and "b", and store packed maximum values in "dst". FOR j := 0 to 1 i := j*64 dst[i+63:i] := MAX(a[i+63:i], b[i+63:i]) ENDFOR
Floating Point SSE2 Special Math Functions Compare the lower double-precision (64-bit) floating-point elements in "a" and "b", store the minimum value in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst". dst[63:0] := MIN(a[63:0], b[63:0]) dst[127:64] := a[127:64]
Floating Point SSE2 Special Math Functions Compare packed double-precision (64-bit) floating-point elements in "a" and "b", and store packed minimum values in "dst". FOR j := 0 to 1 i := j*64 dst[i+63:i] := MIN(a[i+63:i], b[i+63:i]) ENDFOR
Floating Point SSE2 Arithmetic Multiply the lower double-precision (64-bit) floating-point element in "a" and "b", store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst". dst[63:0] := a[63:0] * b[63:0] dst[127:64] := a[127:64]
Floating Point SSE2 Arithmetic Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst". FOR j := 0 to 1 i := j*64 dst[i+63:i] := a[i+63:i] * b[i+63:i] ENDFOR
Floating Point SSE2 Elementary Math Functions Compute the square root of the lower double-precision (64-bit) floating-point element in "b", store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst". dst[63:0] := SQRT(b[63:0]) dst[127:64] := a[127:64]
Floating Point SSE2 Elementary Math Functions Compute the square root of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst". FOR j := 0 to 1 i := j*64 dst[i+63:i] := SQRT(a[i+63:i]) ENDFOR
Floating Point SSE2 Arithmetic Subtract the lower double-precision (64-bit) floating-point element in "b" from the lower double-precision (64-bit) floating-point element in "a", store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst". dst[63:0] := a[63:0] - b[63:0] dst[127:64] := a[127:64]
Floating Point SSE2 Arithmetic Subtract packed double-precision (64-bit) floating-point elements in "b" from packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst". FOR j := 0 to 1 i := j*64 dst[i+63:i] := a[i+63:i] - b[i+63:i] ENDFOR
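The floating-point arithmetic records above come in pairs: a scalar form that computes only the lower element and copies the upper element from "a", and a packed form that operates on both elements. The square-root scalar form is the odd one out, since it reads its operand from "b". A sketch, assuming vectors are two-element lists of Python floats (names are hypothetical):

```python
import math
import operator

def scalar_op(op, a, b):
    """Lower element: op(a0, b0). Upper element: copied from a."""
    return [op(a[0], b[0]), a[1]]

def packed_op(op, a, b):
    """Apply op to both element pairs."""
    return [op(x, y) for x, y in zip(a, b)]

def scalar_sqrt(a, b):
    """Lower element: sqrt(b0). Upper element: copied from a."""
    return [math.sqrt(b[0]), a[1]]
```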
Floating Point SSE2 Logical Compute the bitwise AND of packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst". FOR j := 0 to 1 i := j*64 dst[i+63:i] := (a[i+63:i] AND b[i+63:i]) ENDFOR
Floating Point SSE2 Logical Compute the bitwise NOT of packed double-precision (64-bit) floating-point elements in "a" and then AND with "b", and store the results in "dst". FOR j := 0 to 1 i := j*64 dst[i+63:i] := ((NOT a[i+63:i]) AND b[i+63:i]) ENDFOR
Floating Point SSE2 Logical Compute the bitwise OR of packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst". FOR j := 0 to 1 i := j*64 dst[i+63:i] := a[i+63:i] OR b[i+63:i] ENDFOR
Floating Point SSE2 Logical Compute the bitwise XOR of packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst". FOR j := 0 to 1 i := j*64 dst[i+63:i] := a[i+63:i] XOR b[i+63:i] ENDFOR
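The logical records above operate on the raw 64-bit patterns of the doubles, not on their numeric values. Two familiar consequences are that NOT-then-AND with a sign-bit mask clears the sign (absolute value), and XOR with the same mask flips it (negation). A sketch using the standard `struct` module to reach the bit patterns; the function names are hypothetical.

```python
import struct

MASK64 = 0xFFFFFFFFFFFFFFFF

def bits(x):
    """Return the IEEE 754 binary64 bit pattern of x as an integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]

def from_bits(u):
    """Return the double whose bit pattern is u."""
    return struct.unpack("<d", struct.pack("<Q", u & MASK64))[0]

def andnot_pd(a, b):
    """(NOT a) AND b on each 64-bit element."""
    return [from_bits((~bits(x) & MASK64) & bits(y)) for x, y in zip(a, b)]

def xor_pd(a, b):
    """a XOR b on each 64-bit element."""
    return [from_bits(bits(x) ^ bits(y)) for x, y in zip(a, b)]
```

Here `-0.0` serves as the sign-bit mask because its pattern has only bit 63 set.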
Floating Point SSE2 Compare Compare the lower double-precision (64-bit) floating-point elements in "a" and "b" for equality, store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst". dst[63:0] := (a[63:0] == b[63:0]) ? 0xFFFFFFFFFFFFFFFF : 0 dst[127:64] := a[127:64]
Floating Point SSE2 Compare Compare the lower double-precision (64-bit) floating-point elements in "a" and "b" for less-than, store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst". dst[63:0] := (a[63:0] < b[63:0]) ? 0xFFFFFFFFFFFFFFFF : 0 dst[127:64] := a[127:64]
Floating Point SSE2 Compare Compare the lower double-precision (64-bit) floating-point elements in "a" and "b" for less-than-or-equal, store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst". dst[63:0] := (a[63:0] <= b[63:0]) ? 0xFFFFFFFFFFFFFFFF : 0 dst[127:64] := a[127:64]
Floating Point SSE2 Compare Compare the lower double-precision (64-bit) floating-point elements in "a" and "b" for greater-than, store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst". dst[63:0] := (a[63:0] > b[63:0]) ? 0xFFFFFFFFFFFFFFFF : 0 dst[127:64] := a[127:64]
Floating Point SSE2 Compare Compare the lower double-precision (64-bit) floating-point elements in "a" and "b" for greater-than-or-equal, store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst". dst[63:0] := (a[63:0] >= b[63:0]) ? 0xFFFFFFFFFFFFFFFF : 0 dst[127:64] := a[127:64]
Floating Point SSE2 Compare Compare the lower double-precision (64-bit) floating-point elements in "a" and "b" to see if neither is NaN, store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst". dst[63:0] := (a[63:0] != NaN AND b[63:0] != NaN) ? 0xFFFFFFFFFFFFFFFF : 0 dst[127:64] := a[127:64]
Floating Point SSE2 Compare Compare the lower double-precision (64-bit) floating-point elements in "a" and "b" to see if either is NaN, store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst". dst[63:0] := (a[63:0] == NaN OR b[63:0] == NaN) ? 0xFFFFFFFFFFFFFFFF : 0 dst[127:64] := a[127:64]
Floating Point SSE2 Compare Compare the lower double-precision (64-bit) floating-point elements in "a" and "b" for not-equal, store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst". dst[63:0] := (a[63:0] != b[63:0]) ? 0xFFFFFFFFFFFFFFFF : 0 dst[127:64] := a[127:64]
Floating Point SSE2 Compare Compare the lower double-precision (64-bit) floating-point elements in "a" and "b" for not-less-than, store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst". dst[63:0] := (!(a[63:0] < b[63:0])) ? 0xFFFFFFFFFFFFFFFF : 0 dst[127:64] := a[127:64]
Floating Point SSE2 Compare Compare the lower double-precision (64-bit) floating-point elements in "a" and "b" for not-less-than-or-equal, store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst". dst[63:0] := (!(a[63:0] <= b[63:0])) ? 0xFFFFFFFFFFFFFFFF : 0 dst[127:64] := a[127:64]
Floating Point SSE2 Compare Compare the lower double-precision (64-bit) floating-point elements in "a" and "b" for not-greater-than, store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst". dst[63:0] := (!(a[63:0] > b[63:0])) ? 0xFFFFFFFFFFFFFFFF : 0 dst[127:64] := a[127:64]
Floating Point SSE2 Compare Compare the lower double-precision (64-bit) floating-point elements in "a" and "b" for not-greater-than-or-equal, store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst". dst[63:0] := (!(a[63:0] >= b[63:0])) ? 0xFFFFFFFFFFFFFFFF : 0 dst[127:64] := a[127:64]
Floating Point SSE2 Compare Compare packed double-precision (64-bit) floating-point elements in "a" and "b" for equality, and store the results in "dst". FOR j := 0 to 1 i := j*64 dst[i+63:i] := (a[i+63:i] == b[i+63:i]) ? 0xFFFFFFFFFFFFFFFF : 0 ENDFOR
Floating Point SSE2 Compare Compare packed double-precision (64-bit) floating-point elements in "a" and "b" for less-than, and store the results in "dst". FOR j := 0 to 1 i := j*64 dst[i+63:i] := (a[i+63:i] < b[i+63:i]) ? 0xFFFFFFFFFFFFFFFF : 0 ENDFOR
Floating Point SSE2 Compare Compare packed double-precision (64-bit) floating-point elements in "a" and "b" for less-than-or-equal, and store the results in "dst". FOR j := 0 to 1 i := j*64 dst[i+63:i] := (a[i+63:i] <= b[i+63:i]) ? 0xFFFFFFFFFFFFFFFF : 0 ENDFOR
Floating Point SSE2 Compare Compare packed double-precision (64-bit) floating-point elements in "a" and "b" for greater-than, and store the results in "dst". FOR j := 0 to 1 i := j*64 dst[i+63:i] := (a[i+63:i] > b[i+63:i]) ? 0xFFFFFFFFFFFFFFFF : 0 ENDFOR
Floating Point SSE2 Compare Compare packed double-precision (64-bit) floating-point elements in "a" and "b" for greater-than-or-equal, and store the results in "dst". FOR j := 0 to 1 i := j*64 dst[i+63:i] := (a[i+63:i] >= b[i+63:i]) ? 0xFFFFFFFFFFFFFFFF : 0 ENDFOR
Floating Point SSE2 Compare Compare packed double-precision (64-bit) floating-point elements in "a" and "b" to see if neither is NaN, and store the results in "dst". FOR j := 0 to 1 i := j*64 dst[i+63:i] := (a[i+63:i] != NaN AND b[i+63:i] != NaN) ? 0xFFFFFFFFFFFFFFFF : 0 ENDFOR
Floating Point SSE2 Compare Compare packed double-precision (64-bit) floating-point elements in "a" and "b" to see if either is NaN, and store the results in "dst". FOR j := 0 to 1 i := j*64 dst[i+63:i] := (a[i+63:i] == NaN OR b[i+63:i] == NaN) ? 0xFFFFFFFFFFFFFFFF : 0 ENDFOR
Floating Point SSE2 Compare Compare packed double-precision (64-bit) floating-point elements in "a" and "b" for not-equal, and store the results in "dst". FOR j := 0 to 1 i := j*64 dst[i+63:i] := (a[i+63:i] != b[i+63:i]) ? 0xFFFFFFFFFFFFFFFF : 0 ENDFOR
Floating Point SSE2 Compare Compare packed double-precision (64-bit) floating-point elements in "a" and "b" for not-less-than, and store the results in "dst". FOR j := 0 to 1 i := j*64 dst[i+63:i] := (!(a[i+63:i] < b[i+63:i])) ? 0xFFFFFFFFFFFFFFFF : 0 ENDFOR
Floating Point SSE2 Compare Compare packed double-precision (64-bit) floating-point elements in "a" and "b" for not-less-than-or-equal, and store the results in "dst". FOR j := 0 to 1 i := j*64 dst[i+63:i] := (!(a[i+63:i] <= b[i+63:i])) ? 0xFFFFFFFFFFFFFFFF : 0 ENDFOR
Floating Point SSE2 Compare Compare packed double-precision (64-bit) floating-point elements in "a" and "b" for not-greater-than, and store the results in "dst". FOR j := 0 to 1 i := j*64 dst[i+63:i] := (!(a[i+63:i] > b[i+63:i])) ? 0xFFFFFFFFFFFFFFFF : 0 ENDFOR
Floating Point SSE2 Compare Compare packed double-precision (64-bit) floating-point elements in "a" and "b" for not-greater-than-or-equal, and store the results in "dst". FOR j := 0 to 1 i := j*64 dst[i+63:i] := (!(a[i+63:i] >= b[i+63:i])) ? 0xFFFFFFFFFFFFFFFF : 0 ENDFOR
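The negated predicates above are not simply the mirror image of their positive counterparts once NaN is involved. The following is a minimal scalar sketch of one 64-bit lane, assuming the pseudocode's comparison operators follow IEEE 754 ordering (any ordered comparison with a NaN operand is false). The function names are hypothetical and exist only for illustration.

```c
#include <stdint.h>

/* Scalar model of one 64-bit lane: all-ones mask when the predicate holds. */
static uint64_t lane_mask(int predicate) {
    return predicate ? UINT64_C(0xFFFFFFFFFFFFFFFF) : 0;
}

/* "greater-than-or-equal": false (mask 0) when either input is NaN. */
uint64_t cmp_ge_lane(double a, double b) {
    return lane_mask(a >= b);
}

/* "not-less-than": !(a < b), true (all ones) when either input is NaN. */
uint64_t cmp_nlt_lane(double a, double b) {
    return lane_mask(!(a < b));
}
```

For ordered inputs the two functions agree; they differ only when at least one operand is NaN, which is why the negated forms are listed as separate operations.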
Floating Point Flag SSE2 Compare Compare the lower double-precision (64-bit) floating-point element in "a" and "b" for equality, and return the boolean result (0 or 1). RETURN ( a[63:0] == b[63:0] ) ? 1 : 0
Floating Point Flag SSE2 Compare Compare the lower double-precision (64-bit) floating-point element in "a" and "b" for less-than, and return the boolean result (0 or 1). RETURN ( a[63:0] < b[63:0] ) ? 1 : 0
Floating Point Flag SSE2 Compare Compare the lower double-precision (64-bit) floating-point element in "a" and "b" for less-than-or-equal, and return the boolean result (0 or 1). RETURN ( a[63:0] <= b[63:0] ) ? 1 : 0
Floating Point Flag SSE2 Compare Compare the lower double-precision (64-bit) floating-point element in "a" and "b" for greater-than, and return the boolean result (0 or 1). RETURN ( a[63:0] > b[63:0] ) ? 1 : 0
Floating Point Flag SSE2 Compare Compare the lower double-precision (64-bit) floating-point element in "a" and "b" for greater-than-or-equal, and return the boolean result (0 or 1). RETURN ( a[63:0] >= b[63:0] ) ? 1 : 0
Floating Point Flag SSE2 Compare Compare the lower double-precision (64-bit) floating-point element in "a" and "b" for not-equal, and return the boolean result (0 or 1). RETURN ( a[63:0] != b[63:0] ) ? 1 : 0
Floating Point Flag SSE2 Compare Compare the lower double-precision (64-bit) floating-point element in "a" and "b" for equality, and return the boolean result (0 or 1). This instruction will not signal an exception for QNaNs. RETURN ( a[63:0] == b[63:0] ) ? 1 : 0
Floating Point Flag SSE2 Compare Compare the lower double-precision (64-bit) floating-point element in "a" and "b" for less-than, and return the boolean result (0 or 1). This instruction will not signal an exception for QNaNs. RETURN ( a[63:0] < b[63:0] ) ? 1 : 0
Floating Point Flag SSE2 Compare Compare the lower double-precision (64-bit) floating-point element in "a" and "b" for less-than-or-equal, and return the boolean result (0 or 1). This instruction will not signal an exception for QNaNs. RETURN ( a[63:0] <= b[63:0] ) ? 1 : 0
Floating Point Flag SSE2 Compare Compare the lower double-precision (64-bit) floating-point element in "a" and "b" for greater-than, and return the boolean result (0 or 1). This instruction will not signal an exception for QNaNs. RETURN ( a[63:0] > b[63:0] ) ? 1 : 0
Floating Point Flag SSE2 Compare Compare the lower double-precision (64-bit) floating-point element in "a" and "b" for greater-than-or-equal, and return the boolean result (0 or 1). This instruction will not signal an exception for QNaNs. RETURN ( a[63:0] >= b[63:0] ) ? 1 : 0
Floating Point Flag SSE2 Compare Compare the lower double-precision (64-bit) floating-point element in "a" and "b" for not-equal, and return the boolean result (0 or 1). This instruction will not signal an exception for QNaNs. RETURN ( a[63:0] != b[63:0] ) ? 1 : 0
Floating Point SSE2 Convert Convert packed double-precision (64-bit) floating-point elements in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst". FOR j := 0 to 1 i := 32*j k := 64*j dst[i+31:i] := Convert_FP64_To_FP32(a[k+63:k]) ENDFOR dst[127:64] := 0
Floating Point SSE2 Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst". FOR j := 0 to 1 i := 64*j k := 32*j dst[i+63:i] := Convert_FP32_To_FP64(a[k+31:k]) ENDFOR
Floating Point Integer SSE2 Convert Convert packed double-precision (64-bit) floating-point elements in "a" to packed 32-bit integers, and store the results in "dst". FOR j := 0 to 1 i := 32*j k := 64*j dst[i+31:i] := Convert_FP64_To_Int32(a[k+63:k]) ENDFOR
Floating Point Integer SSE2 Convert Convert the lower double-precision (64-bit) floating-point element in "a" to a 32-bit integer, and store the result in "dst". dst[31:0] := Convert_FP64_To_Int32(a[63:0])
Floating Point Integer SSE2 Convert Convert the lower double-precision (64-bit) floating-point element in "a" to a 64-bit integer, and store the result in "dst". dst[63:0] := Convert_FP64_To_Int64(a[63:0])
Floating Point Integer SSE2 Convert Convert the lower double-precision (64-bit) floating-point element in "a" to a 64-bit integer, and store the result in "dst". dst[63:0] := Convert_FP64_To_Int64(a[63:0])
Floating Point SSE2 Convert Convert the lower double-precision (64-bit) floating-point element in "b" to a single-precision (32-bit) floating-point element, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". dst[31:0] := Convert_FP64_To_FP32(b[63:0]) dst[127:32] := a[127:32] dst[MAX:128] := 0
Floating Point SSE2 Convert Copy the lower double-precision (64-bit) floating-point element of "a" to "dst". dst[63:0] := a[63:0]
Floating Point SSE2 Convert Convert the lower single-precision (32-bit) floating-point element in "b" to a double-precision (64-bit) floating-point element, store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst". dst[63:0] := Convert_FP32_To_FP64(b[31:0]) dst[127:64] := a[127:64] dst[MAX:128] := 0
Floating Point Integer SSE2 Convert Convert packed double-precision (64-bit) floating-point elements in "a" to packed 32-bit integers with truncation, and store the results in "dst". FOR j := 0 to 1 i := 32*j k := 64*j dst[i+31:i] := Convert_FP64_To_Int32_Truncate(a[k+63:k]) ENDFOR
Floating Point Integer SSE2 Convert Convert the lower double-precision (64-bit) floating-point element in "a" to a 32-bit integer with truncation, and store the result in "dst". dst[31:0] := Convert_FP64_To_Int32_Truncate(a[63:0])
Floating Point Integer SSE2 Convert Convert the lower double-precision (64-bit) floating-point element in "a" to a 64-bit integer with truncation, and store the result in "dst". dst[63:0] := Convert_FP64_To_Int64_Truncate(a[63:0])
Floating Point Integer SSE2 Convert Convert the lower double-precision (64-bit) floating-point element in "a" to a 64-bit integer with truncation, and store the result in "dst". dst[63:0] := Convert_FP64_To_Int64_Truncate(a[63:0])
Floating Point Integer SSE2 Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed 32-bit integers, and store the results in "dst". FOR j := 0 to 3 i := 32*j dst[i+31:i] := Convert_FP32_To_Int32(a[i+31:i]) ENDFOR
Floating Point Integer SSE2 Convert Convert packed single-precision (32-bit) floating-point elements in "a" to packed 32-bit integers with truncation, and store the results in "dst". FOR j := 0 to 3 i := 32*j dst[i+31:i] := Convert_FP32_To_Int32_Truncate(a[i+31:i]) ENDFOR
Floating Point Integer SSE2 Convert Convert packed double-precision (64-bit) floating-point elements in "a" to packed 32-bit integers, and store the results in "dst". FOR j := 0 to 1 i := 32*j k := 64*j dst[i+31:i] := Convert_FP64_To_Int32(a[k+63:k]) ENDFOR
Floating Point Integer SSE2 Convert Convert packed double-precision (64-bit) floating-point elements in "a" to packed 32-bit integers with truncation, and store the results in "dst". FOR j := 0 to 1 i := 32*j k := 64*j dst[i+31:i] := Convert_FP64_To_Int32_Truncate(a[k+63:k]) ENDFOR
Floating Point SSE2 Set Copy double-precision (64-bit) floating-point element "a" to the lower element of "dst", and zero the upper element. dst[63:0] := a[63:0] dst[127:64] := 0
Floating Point SSE2 Set Broadcast double-precision (64-bit) floating-point value "a" to all elements of "dst". FOR j := 0 to 1 i := j*64 dst[i+63:i] := a[63:0] ENDFOR
Floating Point SSE2 Set Broadcast double-precision (64-bit) floating-point value "a" to all elements of "dst". FOR j := 0 to 1 i := j*64 dst[i+63:i] := a[63:0] ENDFOR
Floating Point SSE2 Set Set packed double-precision (64-bit) floating-point elements in "dst" with the supplied values. dst[63:0] := e0 dst[127:64] := e1
Floating Point SSE2 Set Set packed double-precision (64-bit) floating-point elements in "dst" with the supplied values in reverse order. dst[63:0] := e1 dst[127:64] := e0
Floating Point SSE2 Set Return vector of type __m128d with all elements set to zero. dst[MAX:0] := 0
Floating Point SSE2 Load Load 128-bits (composed of 2 packed double-precision (64-bit) floating-point elements) from memory into "dst". "mem_addr" must be aligned on a 16-byte boundary or a general-protection exception may be generated. dst[127:0] := MEM[mem_addr+127:mem_addr]
Floating Point SSE2 Load Load a double-precision (64-bit) floating-point element from memory into both elements of "dst". dst[63:0] := MEM[mem_addr+63:mem_addr] dst[127:64] := MEM[mem_addr+63:mem_addr]
Floating Point SSE2 Load Load a double-precision (64-bit) floating-point element from memory into both elements of "dst". dst[63:0] := MEM[mem_addr+63:mem_addr] dst[127:64] := MEM[mem_addr+63:mem_addr]
Floating Point SSE2 Load Load 2 double-precision (64-bit) floating-point elements from memory into "dst" in reverse order. "mem_addr" must be aligned on a 16-byte boundary or a general-protection exception may be generated. dst[63:0] := MEM[mem_addr+127:mem_addr+64] dst[127:64] := MEM[mem_addr+63:mem_addr]
Floating Point SSE2 Load Load 128-bits (composed of 2 packed double-precision (64-bit) floating-point elements) from memory into "dst". "mem_addr" does not need to be aligned on any particular boundary. dst[127:0] := MEM[mem_addr+127:mem_addr]
Floating Point SSE2 Load Load a double-precision (64-bit) floating-point element from memory into the lower element of "dst", and zero the upper element. "mem_addr" does not need to be aligned on any particular boundary. dst[63:0] := MEM[mem_addr+63:mem_addr] dst[127:64] := 0
Floating Point SSE2 Load Load a double-precision (64-bit) floating-point element from memory into the upper element of "dst", and copy the lower element from "a" to "dst". "mem_addr" does not need to be aligned on any particular boundary. dst[63:0] := a[63:0] dst[127:64] := MEM[mem_addr+63:mem_addr]
Floating Point SSE2 Load Load a double-precision (64-bit) floating-point element from memory into the lower element of "dst", and copy the upper element from "a" to "dst". "mem_addr" does not need to be aligned on any particular boundary. dst[63:0] := MEM[mem_addr+63:mem_addr] dst[127:64] := a[127:64]
Floating Point SSE2 Store Store 128-bits (composed of 2 packed double-precision (64-bit) floating-point elements) from "a" into memory using a non-temporal memory hint. "mem_addr" must be aligned on a 16-byte boundary or a general-protection exception may be generated. MEM[mem_addr+127:mem_addr] := a[127:0]
Floating Point SSE2 Store Store the lower double-precision (64-bit) floating-point element from "a" into memory. "mem_addr" does not need to be aligned on any particular boundary. MEM[mem_addr+63:mem_addr] := a[63:0]
Floating Point SSE2 Store Store the lower double-precision (64-bit) floating-point element from "a" into 2 contiguous elements in memory. "mem_addr" must be aligned on a 16-byte boundary or a general-protection exception may be generated. MEM[mem_addr+63:mem_addr] := a[63:0] MEM[mem_addr+127:mem_addr+64] := a[63:0]
Floating Point SSE2 Store Store the lower double-precision (64-bit) floating-point element from "a" into 2 contiguous elements in memory. "mem_addr" must be aligned on a 16-byte boundary or a general-protection exception may be generated. MEM[mem_addr+63:mem_addr] := a[63:0] MEM[mem_addr+127:mem_addr+64] := a[63:0]
Floating Point SSE2 Store Store 128-bits (composed of 2 packed double-precision (64-bit) floating-point elements) from "a" into memory. "mem_addr" must be aligned on a 16-byte boundary or a general-protection exception may be generated. MEM[mem_addr+127:mem_addr] := a[127:0]
Floating Point SSE2 Store Store 128-bits (composed of 2 packed double-precision (64-bit) floating-point elements) from "a" into memory. "mem_addr" does not need to be aligned on any particular boundary. MEM[mem_addr+127:mem_addr] := a[127:0]
Floating Point SSE2 Store Store 2 double-precision (64-bit) floating-point elements from "a" into memory in reverse order. "mem_addr" must be aligned on a 16-byte boundary or a general-protection exception may be generated. MEM[mem_addr+63:mem_addr] := a[127:64] MEM[mem_addr+127:mem_addr+64] := a[63:0]
Floating Point SSE2 Store Store the upper double-precision (64-bit) floating-point element from "a" into memory. MEM[mem_addr+63:mem_addr] := a[127:64]
Floating Point SSE2 Store Store the lower double-precision (64-bit) floating-point element from "a" into memory. MEM[mem_addr+63:mem_addr] := a[63:0]
Floating Point SSE2 Swizzle Unpack and interleave double-precision (64-bit) floating-point elements from the high half of "a" and "b", and store the results in "dst". DEFINE INTERLEAVE_HIGH_QWORDS(src1[127:0], src2[127:0]) { dst[63:0] := src1[127:64] dst[127:64] := src2[127:64] RETURN dst[127:0] } dst[127:0] := INTERLEAVE_HIGH_QWORDS(a[127:0], b[127:0])
Floating Point SSE2 Swizzle Unpack and interleave double-precision (64-bit) floating-point elements from the low half of "a" and "b", and store the results in "dst". DEFINE INTERLEAVE_QWORDS(src1[127:0], src2[127:0]) { dst[63:0] := src1[63:0] dst[127:64] := src2[63:0] RETURN dst[127:0] } dst[127:0] := INTERLEAVE_QWORDS(a[127:0], b[127:0])
Floating Point SSE2 Miscellaneous Set each bit of mask "dst" based on the most significant bit of the corresponding packed double-precision (64-bit) floating-point element in "a". FOR j := 0 to 1 i := j*64 IF a[i+63] dst[j] := 1 ELSE dst[j] := 0 FI ENDFOR dst[MAX:2] := 0
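As a rough illustration of the mask operation above, the sketch below models it in portable scalar C. It assumes `double` uses the IEEE 754 binary64 layout, so that bit 63 is the sign bit; the function name is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Scalar model: bit j of the result is the most significant bit (bit 63)
   of element j; all higher result bits stay zero. */
int movemask_pd_model(const double a[2]) {
    int dst = 0;
    for (int j = 0; j < 2; j++) {
        uint64_t bits;
        memcpy(&bits, &a[j], sizeof bits); /* reinterpret the bit pattern */
        dst |= (int)(bits >> 63) << j;
    }
    return dst;
}
```

Because only the top bit is inspected, a negative zero sets its mask bit just like any other negative value.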
Floating Point SSE2 Swizzle Shuffle double-precision (64-bit) floating-point elements using the control in "imm8", and store the results in "dst". dst[63:0] := (imm8[0] == 0) ? a[63:0] : a[127:64] dst[127:64] := (imm8[1] == 0) ? b[63:0] : b[127:64]
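The two-bit control of the shuffle above can be sketched in scalar C as follows. This is only a model of the pseudocode, with a hypothetical function name and plain arrays standing in for the 128-bit vectors.

```c
/* Scalar model: imm8 bit 0 selects which element of a goes to the low half,
   imm8 bit 1 selects which element of b goes to the high half. */
void shuffle_pd_model(const double a[2], const double b[2], int imm8,
                      double dst[2]) {
    double lo = (imm8 & 1) ? a[1] : a[0];
    double hi = (imm8 & 2) ? b[1] : b[0];
    dst[0] = lo;
    dst[1] = hi;
}
```

With a = {1, 2} and b = {3, 4}, the control values 0 through 3 yield {1, 3}, {2, 3}, {1, 4}, and {2, 4} respectively.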
Floating Point SSE2 Move Move the lower double-precision (64-bit) floating-point element from "b" to the lower element of "dst", and copy the upper element from "a" to the upper element of "dst". dst[63:0] := b[63:0] dst[127:64] := a[127:64]
Floating Point SSE2 Cast Cast vector of type __m128d to type __m128. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
Floating Point Integer SSE2 Cast Cast vector of type __m128d to type __m128i. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
Floating Point SSE2 Cast Cast vector of type __m128 to type __m128d. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
Floating Point Integer SSE2 Cast Cast vector of type __m128 to type __m128i. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
Floating Point SSE2 Cast Cast vector of type __m128i to type __m128d. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
Floating Point SSE2 Cast Cast vector of type __m128i to type __m128. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
Floating Point SSE3 Arithmetic Alternately add and subtract packed single-precision (32-bit) floating-point elements in "a" to/from packed elements in "b", and store the results in "dst". FOR j := 0 to 3 i := j*32 IF ((j & 1) == 0) dst[i+31:i] := a[i+31:i] - b[i+31:i] ELSE dst[i+31:i] := a[i+31:i] + b[i+31:i] FI ENDFOR
Floating Point SSE3 Arithmetic Alternately add and subtract packed double-precision (64-bit) floating-point elements in "a" to/from packed elements in "b", and store the results in "dst". FOR j := 0 to 1 i := j*64 IF ((j & 1) == 0) dst[i+63:i] := a[i+63:i] - b[i+63:i] ELSE dst[i+63:i] := a[i+63:i] + b[i+63:i] FI ENDFOR
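The even/odd pattern of the two add-subtract operations above is easy to misread, so here is a minimal scalar sketch of the loop. The function name and the element-count parameter `n` are assumptions made for illustration (n = 4 models the single-precision form).

```c
/* Scalar model: even-indexed elements are subtracted (a - b),
   odd-indexed elements are added (a + b). */
void addsub_model(const float *a, const float *b, float *dst, int n) {
    for (int j = 0; j < n; j++) {
        dst[j] = ((j & 1) == 0) ? a[j] - b[j] : a[j] + b[j];
    }
}
```

For a = {1, 2, 3, 4} and b = {10, 20, 30, 40} this produces {-9, 22, -27, 44}.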
Floating Point SSE3 Arithmetic Horizontally add adjacent pairs of double-precision (64-bit) floating-point elements in "a" and "b", and pack the results in "dst". dst[63:0] := a[127:64] + a[63:0] dst[127:64] := b[127:64] + b[63:0]
Floating Point SSE3 Arithmetic Horizontally add adjacent pairs of single-precision (32-bit) floating-point elements in "a" and "b", and pack the results in "dst". dst[31:0] := a[63:32] + a[31:0] dst[63:32] := a[127:96] + a[95:64] dst[95:64] := b[63:32] + b[31:0] dst[127:96] := b[127:96] + b[95:64]
Floating Point SSE3 Arithmetic Horizontally subtract adjacent pairs of double-precision (64-bit) floating-point elements in "a" and "b", and pack the results in "dst". dst[63:0] := a[63:0] - a[127:64] dst[127:64] := b[63:0] - b[127:64]
Floating Point SSE3 Arithmetic Horizontally subtract adjacent pairs of single-precision (32-bit) floating-point elements in "a" and "b", and pack the results in "dst". dst[31:0] := a[31:0] - a[63:32] dst[63:32] := a[95:64] - a[127:96] dst[95:64] := b[31:0] - b[63:32] dst[127:96] := b[95:64] - b[127:96]
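The operand order in the single-precision operation whose pseudocode computes a[31:0] - a[63:32] matters: the lower element of each adjacent pair is the minuend. A minimal scalar sketch of that pseudocode, with a hypothetical function name, might look like this.

```c
/* Scalar model: each result is (lower element) - (upper element) of an
   adjacent pair; results from a fill the low half, results from b the high half. */
void hsub_ps_model(const float a[4], const float b[4], float dst[4]) {
    float r0 = a[0] - a[1];
    float r1 = a[2] - a[3];
    float r2 = b[0] - b[1];
    float r3 = b[2] - b[3];
    dst[0] = r0;
    dst[1] = r1;
    dst[2] = r2;
    dst[3] = r3;
}
```

For a = {10, 1, 20, 2} and b = {30, 3, 40, 4} the result is {9, 18, 27, 36}.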
Integer SSE3 Load Load 128-bits of integer data from unaligned memory into "dst". This intrinsic may perform better than "_mm_loadu_si128" when the data crosses a cache line boundary. dst[127:0] := MEM[mem_addr+127:mem_addr]
Floating Point SSE3 Move Duplicate the low double-precision (64-bit) floating-point element from "a", and store the results in "dst". dst[63:0] := a[63:0] dst[127:64] := a[63:0]
Floating Point SSE3 Load Load a double-precision (64-bit) floating-point element from memory into both elements of "dst". dst[63:0] := MEM[mem_addr+63:mem_addr] dst[127:64] := MEM[mem_addr+63:mem_addr]
Floating Point SSE3 Move Duplicate odd-indexed single-precision (32-bit) floating-point elements from "a", and store the results in "dst". dst[31:0] := a[63:32] dst[63:32] := a[63:32] dst[95:64] := a[127:96] dst[127:96] := a[127:96]
Floating Point SSE3 Move Duplicate even-indexed single-precision (32-bit) floating-point elements from "a", and store the results in "dst". dst[31:0] := a[31:0] dst[63:32] := a[31:0] dst[95:64] := a[95:64] dst[127:96] := a[95:64]
Floating Point SSE4.1 Swizzle Blend packed double-precision (64-bit) floating-point elements from "a" and "b" using control mask "imm8", and store the results in "dst". FOR j := 0 to 1 i := j*64 IF imm8[j] dst[i+63:i] := b[i+63:i] ELSE dst[i+63:i] := a[i+63:i] FI ENDFOR
Floating Point SSE4.1 Swizzle Blend packed single-precision (32-bit) floating-point elements from "a" and "b" using control mask "imm8", and store the results in "dst". FOR j := 0 to 3 i := j*32 IF imm8[j] dst[i+31:i] := b[i+31:i] ELSE dst[i+31:i] := a[i+31:i] FI ENDFOR
Floating Point SSE4.1 Swizzle Blend packed double-precision (64-bit) floating-point elements from "a" and "b" using "mask", and store the results in "dst". FOR j := 0 to 1 i := j*64 IF mask[i+63] dst[i+63:i] := b[i+63:i] ELSE dst[i+63:i] := a[i+63:i] FI ENDFOR
Floating Point SSE4.1 Swizzle Blend packed single-precision (32-bit) floating-point elements from "a" and "b" using "mask", and store the results in "dst". FOR j := 0 to 3 i := j*32 IF mask[i+31] dst[i+31:i] := b[i+31:i] ELSE dst[i+31:i] := a[i+31:i] FI ENDFOR
Integer SSE4.1 Swizzle Blend packed 8-bit integers from "a" and "b" using "mask", and store the results in "dst". FOR j := 0 to 15 i := j*8 IF mask[i+7] dst[i+7:i] := b[i+7:i] ELSE dst[i+7:i] := a[i+7:i] FI ENDFOR
Integer SSE4.1 Swizzle Blend packed 16-bit integers from "a" and "b" using control mask "imm8", and store the results in "dst". FOR j := 0 to 7 i := j*16 IF imm8[j] dst[i+15:i] := b[i+15:i] ELSE dst[i+15:i] := a[i+15:i] FI ENDFOR
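The immediate-controlled blends above all follow the same per-element rule: bit j of "imm8" chooses between "a" and "b" for element j. The sketch below models the 16-bit case in scalar C; the function name is hypothetical and arrays stand in for the vectors.

```c
#include <stdint.h>

/* Scalar model: element j comes from b when imm8 bit j is set, else from a. */
void blend_epi16_model(const uint16_t a[8], const uint16_t b[8], uint8_t imm8,
                       uint16_t dst[8]) {
    for (int j = 0; j < 8; j++) {
        dst[j] = ((imm8 >> j) & 1) ? b[j] : a[j];
    }
}
```

With imm8 = 0xA5 (bits 0, 2, 5, and 7 set), elements 0, 2, 5, and 7 are taken from "b" and the rest from "a".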
Floating Point SSE4.1 Arithmetic Conditionally multiply the packed double-precision (64-bit) floating-point elements in "a" and "b" using the high 4 bits in "imm8", sum the two products, and conditionally store the sum in "dst" using the low 4 bits of "imm8". DEFINE DP(a[127:0], b[127:0], imm8[7:0]) { FOR j := 0 to 1 i := j*64 IF imm8[(4+j)%8] temp[i+63:i] := a[i+63:i] * b[i+63:i] ELSE temp[i+63:i] := 0.0 FI ENDFOR sum[63:0] := temp[127:64] + temp[63:0] FOR j := 0 to 1 i := j*64 IF imm8[j%8] tmpdst[i+63:i] := sum[63:0] ELSE tmpdst[i+63:i] := 0.0 FI ENDFOR RETURN tmpdst[127:0] } dst[127:0] := DP(a[127:0], b[127:0], imm8[7:0])
Floating Point SSE4.1 Arithmetic Conditionally multiply the packed single-precision (32-bit) floating-point elements in "a" and "b" using the high 4 bits in "imm8", sum the four products, and conditionally store the sum in "dst" using the low 4 bits of "imm8". DEFINE DP(a[127:0], b[127:0], imm8[7:0]) { FOR j := 0 to 3 i := j*32 IF imm8[(4+j)%8] temp[i+31:i] := a[i+31:i] * b[i+31:i] ELSE temp[i+31:i] := 0 FI ENDFOR sum[31:0] := (temp[127:96] + temp[95:64]) + (temp[63:32] + temp[31:0]) FOR j := 0 to 3 i := j*32 IF imm8[j%8] tmpdst[i+31:i] := sum[31:0] ELSE tmpdst[i+31:i] := 0 FI ENDFOR RETURN tmpdst[127:0] } dst[127:0] := DP(a[127:0], b[127:0], imm8[7:0])
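The two roles of "imm8" in the single-precision dot product above (high bits gate the products, low bits gate the destination lanes) can be sketched in scalar C. This is a model of the DP pseudocode under the assumption that plain float arithmetic is an adequate stand-in; the function name is hypothetical.

```c
#include <stdint.h>

/* Scalar model of DP for four single-precision elements. */
void dp_ps_model(const float a[4], const float b[4], uint8_t imm8,
                 float dst[4]) {
    float temp[4];
    /* imm8[7:4]: include product j only when bit 4+j is set. */
    for (int j = 0; j < 4; j++) {
        temp[j] = ((imm8 >> (4 + j)) & 1) ? a[j] * b[j] : 0.0f;
    }
    /* Same pairing order as the pseudocode. */
    float sum = (temp[3] + temp[2]) + (temp[1] + temp[0]);
    /* imm8[3:0]: write the sum to lane j when bit j is set, else zero. */
    for (int j = 0; j < 4; j++) {
        dst[j] = ((imm8 >> j) & 1) ? sum : 0.0f;
    }
}
```

With a = {1, 2, 3, 4} and b = {5, 6, 7, 8}, imm8 = 0xF1 gives a full four-element dot product (70) in lane 0 only, while imm8 = 0x7F gives a three-element dot product (38) broadcast to every lane.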
Floating Point SSE4.1 Swizzle Extract a single-precision (32-bit) floating-point element from "a", selected with "imm8", and store the result in "dst". dst[31:0] := (a[127:0] >> (imm8[1:0] * 32))[31:0]
Integer SSE4.1 Swizzle Extract an 8-bit integer from "a", selected with "imm8", and store the result in the lower element of "dst". dst[7:0] := (a[127:0] >> (imm8[3:0] * 8))[7:0] dst[31:8] := 0
Integer SSE4.1 Swizzle Extract a 32-bit integer from "a", selected with "imm8", and store the result in "dst". dst[31:0] := (a[127:0] >> (imm8[1:0] * 32))[31:0]
Integer SSE4.1 Swizzle Extract a 64-bit integer from "a", selected with "imm8", and store the result in "dst". dst[63:0] := (a[127:0] >> (imm8[0] * 64))[63:0]
Floating Point SSE4.1 Swizzle Copy "a" to "tmp", then insert a single-precision (32-bit) floating-point element from "b" into "tmp" using the control in "imm8". Store "tmp" to "dst" using the mask in "imm8" (elements are zeroed out when the corresponding bit is set). tmp2[127:0] := a[127:0] CASE (imm8[7:6]) OF 0: tmp1[31:0] := b[31:0] 1: tmp1[31:0] := b[63:32] 2: tmp1[31:0] := b[95:64] 3: tmp1[31:0] := b[127:96] ESAC CASE (imm8[5:4]) OF 0: tmp2[31:0] := tmp1[31:0] 1: tmp2[63:32] := tmp1[31:0] 2: tmp2[95:64] := tmp1[31:0] 3: tmp2[127:96] := tmp1[31:0] ESAC FOR j := 0 to 3 i := j*32 IF imm8[j%8] dst[i+31:i] := 0 ELSE dst[i+31:i] := tmp2[i+31:i] FI ENDFOR
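The single-precision insert above packs three fields into "imm8": bits 7:6 choose the source element of "b", bits 5:4 choose the destination slot, and bits 3:0 form a zero mask. A minimal scalar sketch of that decoding, with a hypothetical function name, is shown here.

```c
#include <stdint.h>

/* Scalar model of the three-field imm8 decoding. */
void insert_ps_model(const float a[4], const float b[4], uint8_t imm8,
                     float dst[4]) {
    float tmp[4] = { a[0], a[1], a[2], a[3] };
    float picked = b[(imm8 >> 6) & 3];   /* imm8[7:6]: source element of b */
    tmp[(imm8 >> 4) & 3] = picked;       /* imm8[5:4]: destination slot */
    for (int j = 0; j < 4; j++) {
        /* imm8[3:0]: a set bit zeroes the corresponding element. */
        dst[j] = ((imm8 >> j) & 1) ? 0.0f : tmp[j];
    }
}
```

For a = {1, 2, 3, 4}, b = {10, 20, 30, 40}, and imm8 = 0x98 (source 2, destination 1, zero mask 0b1000), the result is {1, 30, 3, 0}.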
Integer SSE4.1 Swizzle Copy "a" to "dst", and insert the lower 8-bit integer from "i" into "dst" at the location specified by "imm8". dst[127:0] := a[127:0] sel := imm8[3:0]*8 dst[sel+7:sel] := i[7:0]
Integer SSE4.1 Swizzle Copy "a" to "dst", and insert the 32-bit integer "i" into "dst" at the location specified by "imm8". dst[127:0] := a[127:0] sel := imm8[1:0]*32 dst[sel+31:sel] := i[31:0]
Integer SSE4.1 Swizzle Copy "a" to "dst", and insert the 64-bit integer "i" into "dst" at the location specified by "imm8". dst[127:0] := a[127:0] sel := imm8[0]*64 dst[sel+63:sel] := i[63:0]
Integer SSE4.1 Special Math Functions Compare packed signed 8-bit integers in "a" and "b", and store packed maximum values in "dst". FOR j := 0 to 15 i := j*8 dst[i+7:i] := MAX(a[i+7:i], b[i+7:i]) ENDFOR
Integer SSE4.1 Special Math Functions Compare packed signed 32-bit integers in "a" and "b", and store packed maximum values in "dst". FOR j := 0 to 3 i := j*32 dst[i+31:i] := MAX(a[i+31:i], b[i+31:i]) ENDFOR
Integer SSE4.1 Special Math Functions Compare packed unsigned 32-bit integers in "a" and "b", and store packed maximum values in "dst". FOR j := 0 to 3 i := j*32 dst[i+31:i] := MAX(a[i+31:i], b[i+31:i]) ENDFOR
Integer SSE4.1 Special Math Functions Compare packed unsigned 16-bit integers in "a" and "b", and store packed maximum values in "dst". FOR j := 0 to 7 i := j*16 dst[i+15:i] := MAX(a[i+15:i], b[i+15:i]) ENDFOR
Integer SSE4.1 Special Math Functions Compare packed signed 8-bit integers in "a" and "b", and store packed minimum values in "dst". FOR j := 0 to 15 i := j*8 dst[i+7:i] := MIN(a[i+7:i], b[i+7:i]) ENDFOR
Integer SSE4.1 Special Math Functions Compare packed signed 32-bit integers in "a" and "b", and store packed minimum values in "dst". FOR j := 0 to 3 i := j*32 dst[i+31:i] := MIN(a[i+31:i], b[i+31:i]) ENDFOR
Integer SSE4.1 Special Math Functions Compare packed unsigned 32-bit integers in "a" and "b", and store packed minimum values in "dst". FOR j := 0 to 3 i := j*32 dst[i+31:i] := MIN(a[i+31:i], b[i+31:i]) ENDFOR
Integer SSE4.1 Special Math Functions Compare packed unsigned 16-bit integers in "a" and "b", and store packed minimum values in "dst". FOR j := 0 to 7 i := j*16 dst[i+15:i] := MIN(a[i+15:i], b[i+15:i]) ENDFOR
Integer SSE4.1 Convert Miscellaneous Convert packed signed 32-bit integers from "a" and "b" to packed 16-bit integers using unsigned saturation, and store the results in "dst". dst[15:0] := SaturateU16(a[31:0]) dst[31:16] := SaturateU16(a[63:32]) dst[47:32] := SaturateU16(a[95:64]) dst[63:48] := SaturateU16(a[127:96]) dst[79:64] := SaturateU16(b[31:0]) dst[95:80] := SaturateU16(b[63:32]) dst[111:96] := SaturateU16(b[95:64]) dst[127:112] := SaturateU16(b[127:96])
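The pack operation above relies on SaturateU16, which is not spelled out in the entry. The sketch below assumes the usual meaning (clamp a signed 32-bit value into the range 0 to 65535) and models the operation in scalar C; both function names are hypothetical.

```c
#include <stdint.h>

/* Assumed meaning of SaturateU16: clamp a signed value to [0, 65535]. */
static uint16_t saturate_u16(int32_t x) {
    if (x < 0) return 0;
    if (x > 65535) return 65535;
    return (uint16_t)x;
}

/* Scalar model: the four elements of a fill the low half of dst,
   the four elements of b fill the high half. */
void packus_epi32_model(const int32_t a[4], const int32_t b[4],
                        uint16_t dst[8]) {
    for (int j = 0; j < 4; j++) {
        dst[j] = saturate_u16(a[j]);
        dst[4 + j] = saturate_u16(b[j]);
    }
}
```

Negative inputs clamp to 0 and inputs above 65535 clamp to 65535, so no wrap-around occurs.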
Integer SSE4.1 Compare Compare packed 64-bit integers in "a" and "b" for equality, and store the results in "dst". FOR j := 0 to 1 i := j*64 dst[i+63:i] := ( a[i+63:i] == b[i+63:i] ) ? 0xFFFFFFFFFFFFFFFF : 0 ENDFOR
Integer SSE4.1 Convert Sign extend packed 8-bit integers in "a" to packed 16-bit integers, and store the results in "dst". FOR j := 0 to 7 i := j*8 l := j*16 dst[l+15:l] := SignExtend16(a[i+7:i]) ENDFOR
Integer SSE4.1 Convert Sign extend packed 8-bit integers in "a" to packed 32-bit integers, and store the results in "dst". FOR j := 0 to 3 i := 32*j k := 8*j dst[i+31:i] := SignExtend32(a[k+7:k]) ENDFOR
Integer SSE4.1 Convert Sign extend packed 8-bit integers in the low 2 bytes of "a" to packed 64-bit integers, and store the results in "dst". FOR j := 0 to 1 i := 64*j k := 8*j dst[i+63:i] := SignExtend64(a[k+7:k]) ENDFOR
Integer SSE4.1 Convert Sign extend packed 16-bit integers in "a" to packed 32-bit integers, and store the results in "dst". FOR j := 0 to 3 i := 32*j k := 16*j dst[i+31:i] := SignExtend32(a[k+15:k]) ENDFOR
Integer SSE4.1 Convert Sign extend packed 16-bit integers in "a" to packed 64-bit integers, and store the results in "dst". FOR j := 0 to 1 i := 64*j k := 16*j dst[i+63:i] := SignExtend64(a[k+15:k]) ENDFOR
Integer SSE4.1 Convert Sign extend packed 32-bit integers in "a" to packed 64-bit integers, and store the results in "dst". FOR j := 0 to 1 i := 64*j k := 32*j dst[i+63:i] := SignExtend64(a[k+31:k]) ENDFOR
Integer SSE4.1 Convert Zero extend packed unsigned 8-bit integers in "a" to packed 16-bit integers, and store the results in "dst". FOR j := 0 to 7 i := j*8 l := j*16 dst[l+15:l] := ZeroExtend16(a[i+7:i]) ENDFOR
Integer SSE4.1 Convert Zero extend packed unsigned 8-bit integers in "a" to packed 32-bit integers, and store the results in "dst". FOR j := 0 to 3 i := 32*j k := 8*j dst[i+31:i] := ZeroExtend32(a[k+7:k]) ENDFOR
Integer SSE4.1 Convert Zero extend packed unsigned 8-bit integers in the low 2 bytes of "a" to packed 64-bit integers, and store the results in "dst". FOR j := 0 to 1 i := 64*j k := 8*j dst[i+63:i] := ZeroExtend64(a[k+7:k]) ENDFOR
Integer SSE4.1 Convert Zero extend packed unsigned 16-bit integers in "a" to packed 32-bit integers, and store the results in "dst". FOR j := 0 to 3 i := 32*j k := 16*j dst[i+31:i] := ZeroExtend32(a[k+15:k]) ENDFOR
Integer SSE4.1 Convert Zero extend packed unsigned 16-bit integers in "a" to packed 64-bit integers, and store the results in "dst". FOR j := 0 to 1 i := 64*j k := 16*j dst[i+63:i] := ZeroExtend64(a[k+15:k]) ENDFOR
Integer SSE4.1 Convert Zero extend packed unsigned 32-bit integers in "a" to packed 64-bit integers, and store the results in "dst". FOR j := 0 to 1 i := 64*j k := 32*j dst[i+63:i] := ZeroExtend64(a[k+31:k]) ENDFOR
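The sign-extending and zero-extending conversions above differ only in how the upper bits of each widened element are filled. The following scalar sketch contrasts the two for an 8-bit source and a 32-bit destination; the function names are hypothetical, and the sign extension is written with portable integer arithmetic rather than a narrowing cast.

```c
#include <stdint.h>

/* Sign extension: replicate bit 7 into the upper bits. */
int32_t sign_extend_8_to_32(uint8_t x) {
    return (int32_t)(x ^ 0x80) - 0x80;
}

/* Zero extension: the upper bits are always zero. */
uint32_t zero_extend_8_to_32(uint8_t x) {
    return (uint32_t)x;
}
```

The byte 0xFF therefore widens to -1 under sign extension and to 255 under zero extension, while values below 0x80 widen identically in both cases.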
Integer SSE4.1 Arithmetic Multiply the low signed 32-bit integers from each packed 64-bit element in "a" and "b", and store the signed 64-bit results in "dst". FOR j := 0 to 1 i := j*64 dst[i+63:i] := SignExtend64(a[i+31:i]) * SignExtend64(b[i+31:i]) ENDFOR
Integer SSE4.1 Arithmetic Multiply the packed 32-bit integers in "a" and "b", producing intermediate 64-bit integers, and store the low 32 bits of the intermediate integers in "dst". FOR j := 0 to 3 i := j*32 tmp[63:0] := a[i+31:i] * b[i+31:i] dst[i+31:i] := tmp[31:0] ENDFOR
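The low-half multiply above keeps only bits 31:0 of each 64-bit intermediate product. A minimal scalar sketch of one lane follows; the function name is hypothetical, and unsigned types are used because the low 32 bits of the product are the same for signed and unsigned interpretations of the inputs.

```c
#include <stdint.h>

/* Scalar model of one lane: form the 64-bit product, keep the low 32 bits. */
uint32_t mullo_epi32_lane(uint32_t a, uint32_t b) {
    uint64_t tmp = (uint64_t)a * (uint64_t)b;
    return (uint32_t)tmp;
}
```

For example, 100000 * 100000 = 10000000000 does not fit in 32 bits, and the lane result is the low half, 1410065408.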
Integer Flag SSE4.1 Logical Compute the bitwise AND of 128 bits (representing integer data) in "a" and "b", and set "ZF" to 1 if the result is zero, otherwise set "ZF" to 0. Compute the bitwise NOT of "a" and then AND with "b", and set "CF" to 1 if the result is zero, otherwise set "CF" to 0. Return the "ZF" value. IF ((a[127:0] AND b[127:0]) == 0) ZF := 1 ELSE ZF := 0 FI IF (((NOT a[127:0]) AND b[127:0]) == 0) CF := 1 ELSE CF := 0 FI RETURN ZF
Integer Flag SSE4.1 Logical Compute the bitwise AND of 128 bits (representing integer data) in "a" and "b", and set "ZF" to 1 if the result is zero, otherwise set "ZF" to 0. Compute the bitwise NOT of "a" and then AND with "b", and set "CF" to 1 if the result is zero, otherwise set "CF" to 0. Return the "CF" value. IF ((a[127:0] AND b[127:0]) == 0) ZF := 1 ELSE ZF := 0 FI IF (((NOT a[127:0]) AND b[127:0]) == 0) CF := 1 ELSE CF := 0 FI RETURN CF
Integer Flag SSE4.1 Logical Compute the bitwise AND of 128 bits (representing integer data) in "a" and "b", and set "ZF" to 1 if the result is zero, otherwise set "ZF" to 0. Compute the bitwise NOT of "a" and then AND with "b", and set "CF" to 1 if the result is zero, otherwise set "CF" to 0. Return 1 if both the "ZF" and "CF" values are zero, otherwise return 0. IF ((a[127:0] AND b[127:0]) == 0) ZF := 1 ELSE ZF := 0 FI IF (((NOT a[127:0]) AND b[127:0]) == 0) CF := 1 ELSE CF := 0 FI IF (ZF == 0 && CF == 0) dst := 1 ELSE dst := 0 FI
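The three flag-returning tests above share the same ZF and CF computation and differ only in what they return. The sketch below models that in scalar C, using a hypothetical `vec128` struct of two 64-bit halves as a stand-in for the 128-bit operands; the function names are likewise assumptions.

```c
#include <stdint.h>

/* Hypothetical 128-bit container used only for this illustration. */
typedef struct {
    uint64_t lo;
    uint64_t hi;
} vec128;

/* ZF: 1 when (a AND b) is all zeros. */
int test_zf(vec128 a, vec128 b) {
    return ((a.lo & b.lo) | (a.hi & b.hi)) == 0;
}

/* CF: 1 when ((NOT a) AND b) is all zeros. */
int test_cf(vec128 a, vec128 b) {
    return ((~a.lo & b.lo) | (~a.hi & b.hi)) == 0;
}

/* Returns 1 only when both ZF and CF are zero. */
int test_nzc(vec128 a, vec128 b) {
    return !test_zf(a, b) && !test_cf(a, b);
}
```

Read this way, ZF answers "do a and b share no set bits?", CF answers "are all set bits of b also set in a?", and the third form reports the mixed case where neither holds.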
Integer Flag SSE4.1 Logical Compute the bitwise AND of 128 bits (representing integer data) in "a" and "mask", and return 1 if the result is zero, otherwise return 0. IF ((a[127:0] AND mask[127:0]) == 0) ZF := 1 ELSE ZF := 0 FI dst := ZF
Integer Flag SSE4.1 Logical Compute the bitwise AND of 128 bits (representing integer data) in "a" and "mask", and set "ZF" to 1 if the result is zero, otherwise set "ZF" to 0. Compute the bitwise NOT of "a" and then AND with "mask", and set "CF" to 1 if the result is zero, otherwise set "CF" to 0. Return 1 if both the "ZF" and "CF" values are zero, otherwise return 0. IF ((a[127:0] AND mask[127:0]) == 0) ZF := 1 ELSE ZF := 0 FI IF (((NOT a[127:0]) AND mask[127:0]) == 0) CF := 1 ELSE CF := 0 FI IF (ZF == 0 && CF == 0) dst := 1 ELSE dst := 0 FI
Integer Flag SSE4.1 Logical Compute the bitwise NOT of "a" and then AND with a 128-bit vector containing all 1's, and return 1 if the result is zero, otherwise return 0. FOR j := 0 to 127 tmp[j] := 1 ENDFOR IF (((NOT a[127:0]) AND tmp[127:0]) == 0) CF := 1 ELSE CF := 0 FI dst := CF
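The six flag-test entries above all derive their return value from the same two flags. As a rough illustration, the Python sketch below models the 128-bit operands as plain integers and mirrors the pseudocode. The function names are hypothetical and are not the names of the intrinsics.

```python
MASK128 = (1 << 128) - 1  # 128-bit vector with every bit set


def flags_zf_cf(a, b):
    """Return (ZF, CF) as defined in the entries above."""
    zf = 1 if (a & b) & MASK128 == 0 else 0    # a AND b is all zeros
    cf = 1 if (~a & b) & MASK128 == 0 else 0   # (NOT a) AND b is all zeros
    return zf, cf


def all_zeros_under_mask(a, mask):
    """1 if no bit selected by mask is set in a (returns ZF)."""
    return flags_zf_cf(a, mask)[0]


def all_ones(a):
    """1 if every bit of a is set (returns CF against an all-ones vector)."""
    return flags_zf_cf(a, MASK128)[1]


def mixed_ones_zeros(a, mask):
    """1 if the bits selected by mask contain both ones and zeros."""
    zf, cf = flags_zf_cf(a, mask)
    return 1 if zf == 0 and cf == 0 else 0
```

Read this way, ZF answers "are all masked bits clear?" and CF answers "are all masked bits set?", so the combined test is true only when the masked bits are mixed.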
Floating Point SSE4.1 Special Math Functions Round the packed double-precision (64-bit) floating-point elements in "a" using the "rounding" parameter, and store the results as packed double-precision floating-point elements in "dst". [round_note] FOR j := 0 to 1 i := j*64 dst[i+63:i] := ROUND(a[i+63:i], rounding) ENDFOR
Floating Point SSE4.1 Special Math Functions Round the packed double-precision (64-bit) floating-point elements in "a" down to an integer value, and store the results as packed double-precision floating-point elements in "dst". FOR j := 0 to 1 i := j*64 dst[i+63:i] := FLOOR(a[i+63:i]) ENDFOR
Floating Point SSE4.1 Special Math Functions Round the packed double-precision (64-bit) floating-point elements in "a" up to an integer value, and store the results as packed double-precision floating-point elements in "dst". FOR j := 0 to 1 i := j*64 dst[i+63:i] := CEIL(a[i+63:i]) ENDFOR
Floating Point SSE4.1 Special Math Functions Round the packed single-precision (32-bit) floating-point elements in "a" using the "rounding" parameter, and store the results as packed single-precision floating-point elements in "dst". [round_note] FOR j := 0 to 3 i := j*32 dst[i+31:i] := ROUND(a[i+31:i], rounding) ENDFOR
Floating Point SSE4.1 Special Math Functions Round the packed single-precision (32-bit) floating-point elements in "a" down to an integer value, and store the results as packed single-precision floating-point elements in "dst". FOR j := 0 to 3 i := j*32 dst[i+31:i] := FLOOR(a[i+31:i]) ENDFOR
Floating Point SSE4.1 Special Math Functions Round the packed single-precision (32-bit) floating-point elements in "a" up to an integer value, and store the results as packed single-precision floating-point elements in "dst". FOR j := 0 to 3 i := j*32 dst[i+31:i] := CEIL(a[i+31:i]) ENDFOR
Floating Point SSE4.1 Special Math Functions Round the lower double-precision (64-bit) floating-point element in "b" using the "rounding" parameter, store the result as a double-precision floating-point element in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst". [round_note] dst[63:0] := ROUND(b[63:0], rounding) dst[127:64] := a[127:64]
Floating Point SSE4.1 Special Math Functions Round the lower double-precision (64-bit) floating-point element in "b" down to an integer value, store the result as a double-precision floating-point element in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst". dst[63:0] := FLOOR(b[63:0]) dst[127:64] := a[127:64]
Floating Point SSE4.1 Special Math Functions Round the lower double-precision (64-bit) floating-point element in "b" up to an integer value, store the result as a double-precision floating-point element in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst". dst[63:0] := CEIL(b[63:0]) dst[127:64] := a[127:64]
Floating Point SSE4.1 Special Math Functions Round the lower single-precision (32-bit) floating-point element in "b" using the "rounding" parameter, store the result as a single-precision floating-point element in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". [round_note] dst[31:0] := ROUND(b[31:0], rounding) dst[127:32] := a[127:32]
Floating Point SSE4.1 Special Math Functions Round the lower single-precision (32-bit) floating-point element in "b" down to an integer value, store the result as a single-precision floating-point element in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". dst[31:0] := FLOOR(b[31:0]) dst[127:32] := a[127:32]
Floating Point SSE4.1 Special Math Functions Round the lower single-precision (32-bit) floating-point element in "b" up to an integer value, store the result as a single-precision floating-point element in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". dst[31:0] := CEIL(b[31:0]) dst[127:32] := a[127:32]
Integer SSE4.1 Miscellaneous Horizontally compute the minimum amongst the packed unsigned 16-bit integers in "a", store the minimum and index in "dst", and zero the remaining bits in "dst". index[2:0] := 0 min[15:0] := a[15:0] FOR j := 0 to 7 i := j*16 IF a[i+15:i] < min[15:0] index[2:0] := j min[15:0] := a[i+15:i] FI ENDFOR dst[15:0] := min[15:0] dst[18:16] := index[2:0] dst[127:19] := 0
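The horizontal-minimum entry above can be sketched in Python as follows, assuming the input is modeled as a list of eight unsigned 16-bit lanes with lane 0 least significant. `min_pos_u16` and `pack_min_pos` are hypothetical helper names.

```python
def min_pos_u16(a):
    """Return (minimum, index) over eight unsigned 16-bit lanes.

    The strict '<' comparison means the lowest index wins on ties,
    matching the loop in the pseudocode.
    """
    index = 0
    minimum = a[0]
    for j in range(8):
        if a[j] < minimum:
            index = j
            minimum = a[j]
    return minimum, index


def pack_min_pos(minimum, index):
    """Pack the result as in dst: bits 15:0 hold the minimum, bits 18:16 the index."""
    return ((index & 0x7) << 16) | (minimum & 0xFFFF)
```

For example, `min_pos_u16([5, 3, 9, 3, 7, 8, 6, 4])` reports index 1 rather than 3, because a later equal value never replaces the current minimum.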
Integer SSE4.1 Arithmetic Miscellaneous Compute the sum of absolute differences (SADs) of quadruplets of unsigned 8-bit integers in "a" compared to those in "b", and store the 16-bit results in "dst". Eight SADs are performed using one quadruplet from "b" and eight quadruplets from "a". One quadruplet is selected from "b" starting at the offset specified in "imm8". Eight quadruplets are formed from sequential 8-bit integers selected from "a" starting at the offset specified in "imm8". DEFINE MPSADBW(a[127:0], b[127:0], imm8[2:0]) { a_offset := imm8[2]*32 b_offset := imm8[1:0]*32 FOR j := 0 to 7 i := j*8 k := a_offset+i l := b_offset tmp[i*2+15:i*2] := ABS(Signed(a[k+7:k] - b[l+7:l])) + ABS(Signed(a[k+15:k+8] - b[l+15:l+8])) + ABS(Signed(a[k+23:k+16] - b[l+23:l+16])) + ABS(Signed(a[k+31:k+24] - b[l+31:l+24])) ENDFOR RETURN tmp[127:0] } dst[127:0] := MPSADBW(a[127:0], b[127:0], imm8[2:0])
Integer SSE4.1 Load Load 128 bits of integer data from memory into "dst" using a non-temporal memory hint. "mem_addr" must be aligned on a 16-byte boundary or a general-protection exception may be generated. dst[127:0] := MEM[mem_addr+127:mem_addr]
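The sliding sum-of-absolute-differences entry above is easier to follow with byte indices than with bit offsets. The sketch below is a Python model under the assumption that each operand is a list of 16 unsigned bytes with byte 0 least significant. The function name `mpsadbw_model` is hypothetical.

```python
def mpsadbw_model(a, b, imm8):
    """Return eight 16-bit SADs, lane 0 first.

    imm8[2] selects a 32-bit (4-byte) starting offset in a,
    imm8[1:0] selects which 4-byte quadruplet of b is the reference.
    """
    a_offset = ((imm8 >> 2) & 1) * 4   # byte offset, i.e. imm8[2]*32 bits
    b_offset = (imm8 & 3) * 4          # byte offset, i.e. imm8[1:0]*32 bits
    reference = b[b_offset:b_offset + 4]
    sads = []
    for j in range(8):
        # Window j covers four consecutive bytes of a, sliding by one byte.
        window = a[a_offset + j:a_offset + j + 4]
        sads.append(sum(abs(x - y) for x, y in zip(window, reference)))
    return sads
```

The highest byte read from `a` is at index 4 + 7 + 3 = 14, so the eight overlapping windows always stay inside the 16-byte operand.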
SSE4.2 String Compare Compare packed strings with implicit lengths in "a" and "b" using the control in "imm8", and store the generated mask in "dst". [strcmp_note] size := (imm8[0] ? 16 : 8) // 8 or 16-bit characters UpperBound := (128 / size) - 1 BoolRes := 0 // compare all characters aInvalid := 0 bInvalid := 0 FOR i := 0 to UpperBound m := i*size FOR j := 0 to UpperBound n := j*size BoolRes.word[i].bit[j] := (a[m+size-1:m] == b[n+size-1:n]) ? 1 : 0 // invalidate characters after EOS IF a[m+size-1:m] == 0 aInvalid := 1 FI IF b[n+size-1:n] == 0 bInvalid := 1 FI // override comparisons for invalid characters CASE (imm8[3:2]) OF 0: // equal any IF (!aInvalid && bInvalid) BoolRes.word[i].bit[j] := 0 ELSE IF (aInvalid && !bInvalid) BoolRes.word[i].bit[j] := 0 ELSE IF (aInvalid && bInvalid) BoolRes.word[i].bit[j] := 0 FI 1: // ranges IF (!aInvalid && bInvalid) BoolRes.word[i].bit[j] := 0 ELSE IF (aInvalid && !bInvalid) BoolRes.word[i].bit[j] := 0 ELSE IF (aInvalid && bInvalid) BoolRes.word[i].bit[j] := 0 FI 2: // equal each IF (!aInvalid && bInvalid) BoolRes.word[i].bit[j] := 0 ELSE IF (aInvalid && !bInvalid) BoolRes.word[i].bit[j] := 0 ELSE IF (aInvalid && bInvalid) BoolRes.word[i].bit[j] := 1 FI 3: // equal ordered IF (!aInvalid && bInvalid) BoolRes.word[i].bit[j] := 0 ELSE IF (aInvalid && !bInvalid) BoolRes.word[i].bit[j] := 1 ELSE IF (aInvalid && bInvalid) BoolRes.word[i].bit[j] := 1 FI ESAC ENDFOR ENDFOR // aggregate results CASE (imm8[3:2]) OF 0: // equal any IntRes1 := 0 FOR i := 0 to UpperBound FOR j := 0 to UpperBound IntRes1[i] := IntRes1[i] OR BoolRes.word[i].bit[j] ENDFOR ENDFOR 1: // ranges IntRes1 := 0 FOR i := 0 to UpperBound FOR j := 0 to UpperBound IntRes1[i] := IntRes1[i] OR (BoolRes.word[i].bit[j] AND BoolRes.word[i].bit[j+1]) j += 2 ENDFOR ENDFOR 2: // equal each IntRes1 := 0 FOR i := 0 to UpperBound IntRes1[i] := BoolRes.word[i].bit[i] ENDFOR 3: // equal ordered IntRes1 := (imm8[0] ? 
0xFF : 0xFFFF) FOR i := 0 to UpperBound k := i FOR j := 0 to UpperBound-i IntRes1[i] := IntRes1[i] AND BoolRes.word[k].bit[j] k := k+1 ENDFOR ENDFOR ESAC // optionally negate results bInvalid := 0 FOR i := 0 to UpperBound IF imm8[4] IF imm8[5] // only negate valid IF b[n+size-1:n] == 0 bInvalid := 1 FI IF bInvalid // invalid, don't negate IntRes2[i] := IntRes1[i] ELSE // valid, negate IntRes2[i] := -1 XOR IntRes1[i] FI ELSE // negate all IntRes2[i] := -1 XOR IntRes1[i] FI ELSE // don't negate IntRes2[i] := IntRes1[i] FI ENDFOR // output IF imm8[6] // byte / word mask FOR i := 0 to UpperBound j := i*size IF IntRes2[i] dst[j+size-1:j] := (imm8[0] ? 0xFFFF : 0xFF) ELSE dst[j+size-1:j] := 0 FI ENDFOR ELSE // bit mask dst[UpperBound:0] := IntRes2[UpperBound:0] dst[127:UpperBound+1] := 0 FI
Flag SSE4.2 String Compare Compare packed strings with implicit lengths in "a" and "b" using the control in "imm8", and store the generated index in "dst". [strcmp_note] size := (imm8[0] ? 16 : 8) // 8 or 16-bit characters UpperBound := (128 / size) - 1 BoolRes := 0 // compare all characters aInvalid := 0 bInvalid := 0 FOR i := 0 to UpperBound m := i*size FOR j := 0 to UpperBound n := j*size BoolRes.word[i].bit[j] := (a[m+size-1:m] == b[n+size-1:n]) ? 1 : 0 // invalidate characters after EOS IF a[m+size-1:m] == 0 aInvalid := 1 FI IF b[n+size-1:n] == 0 bInvalid := 1 FI // override comparisons for invalid characters CASE (imm8[3:2]) OF 0: // equal any IF (!aInvalid && bInvalid) BoolRes.word[i].bit[j] := 0 ELSE IF (aInvalid && !bInvalid) BoolRes.word[i].bit[j] := 0 ELSE IF (aInvalid && bInvalid) BoolRes.word[i].bit[j] := 0 FI 1: // ranges IF (!aInvalid && bInvalid) BoolRes.word[i].bit[j] := 0 ELSE IF (aInvalid && !bInvalid) BoolRes.word[i].bit[j] := 0 ELSE IF (aInvalid && bInvalid) BoolRes.word[i].bit[j] := 0 FI 2: // equal each IF (!aInvalid && bInvalid) BoolRes.word[i].bit[j] := 0 ELSE IF (aInvalid && !bInvalid) BoolRes.word[i].bit[j] := 0 ELSE IF (aInvalid && bInvalid) BoolRes.word[i].bit[j] := 1 FI 3: // equal ordered IF (!aInvalid && bInvalid) BoolRes.word[i].bit[j] := 0 ELSE IF (aInvalid && !bInvalid) BoolRes.word[i].bit[j] := 1 ELSE IF (aInvalid && bInvalid) BoolRes.word[i].bit[j] := 1 FI ESAC ENDFOR ENDFOR // aggregate results CASE (imm8[3:2]) OF 0: // equal any IntRes1 := 0 FOR i := 0 to UpperBound FOR j := 0 to UpperBound IntRes1[i] := IntRes1[i] OR BoolRes.word[i].bit[j] ENDFOR ENDFOR 1: // ranges IntRes1 := 0 FOR i := 0 to UpperBound FOR j := 0 to UpperBound IntRes1[i] := IntRes1[i] OR (BoolRes.word[i].bit[j] AND BoolRes.word[i].bit[j+1]) j += 2 ENDFOR ENDFOR 2: // equal each IntRes1 := 0 FOR i := 0 to UpperBound IntRes1[i] := BoolRes.word[i].bit[i] ENDFOR 3: // equal ordered IntRes1 := (imm8[0] ? 
0xFF : 0xFFFF) FOR i := 0 to UpperBound k := i FOR j := 0 to UpperBound-i IntRes1[i] := IntRes1[i] AND BoolRes.word[k].bit[j] k := k+1 ENDFOR ENDFOR ESAC // optionally negate results bInvalid := 0 FOR i := 0 to UpperBound IF imm8[4] IF imm8[5] // only negate valid IF b[n+size-1:n] == 0 bInvalid := 1 FI IF bInvalid // invalid, don't negate IntRes2[i] := IntRes1[i] ELSE // valid, negate IntRes2[i] := -1 XOR IntRes1[i] FI ELSE // negate all IntRes2[i] := -1 XOR IntRes1[i] FI ELSE // don't negate IntRes2[i] := IntRes1[i] FI ENDFOR // output IF imm8[6] // most significant bit tmp := UpperBound dst := tmp DO WHILE ((tmp >= 0) AND IntRes2[tmp] == 0) tmp := tmp - 1 dst := tmp OD ELSE // least significant bit tmp := 0 dst := tmp DO WHILE ((tmp <= UpperBound) AND IntRes2[tmp] == 0) tmp := tmp + 1 dst := tmp OD FI
Flag SSE4.2 String Compare Compare packed strings with implicit lengths in "a" and "b" using the control in "imm8", and return 1 if any character in "b" was null, and 0 otherwise. [strcmp_note] size := (imm8[0] ? 16 : 8) // 8 or 16-bit characters UpperBound := (128 / size) - 1 bInvalid := 0 FOR j := 0 to UpperBound n := j*size IF b[n+size-1:n] == 0 bInvalid := 1 FI ENDFOR dst := bInvalid
Flag SSE4.2 String Compare Compare packed strings with implicit lengths in "a" and "b" using the control in "imm8", and returns 1 if the resulting mask was non-zero, and 0 otherwise. [strcmp_note] size := (imm8[0] ? 16 : 8) // 8 or 16-bit characters UpperBound := (128 / size) - 1 BoolRes := 0 // compare all characters aInvalid := 0 bInvalid := 0 FOR i := 0 to UpperBound m := i*size FOR j := 0 to UpperBound n := j*size BoolRes.word[i].bit[j] := (a[m+size-1:m] == b[n+size-1:n]) ? 1 : 0 // invalidate characters after EOS IF a[m+size-1:m] == 0 aInvalid := 1 FI IF b[n+size-1:n] == 0 bInvalid := 1 FI // override comparisons for invalid characters CASE (imm8[3:2]) OF 0: // equal any IF (!aInvalid && bInvalid) BoolRes.word[i].bit[j] := 0 ELSE IF (aInvalid && !bInvalid) BoolRes.word[i].bit[j] := 0 ELSE IF (aInvalid && bInvalid) BoolRes.word[i].bit[j] := 0 FI 1: // ranges IF (!aInvalid && bInvalid) BoolRes.word[i].bit[j] := 0 ELSE IF (aInvalid && !bInvalid) BoolRes.word[i].bit[j] := 0 ELSE IF (aInvalid && bInvalid) BoolRes.word[i].bit[j] := 0 FI 2: // equal each IF (!aInvalid && bInvalid) BoolRes.word[i].bit[j] := 0 ELSE IF (aInvalid && !bInvalid) BoolRes.word[i].bit[j] := 0 ELSE IF (aInvalid && bInvalid) BoolRes.word[i].bit[j] := 1 FI 3: // equal ordered IF (!aInvalid && bInvalid) BoolRes.word[i].bit[j] := 0 ELSE IF (aInvalid && !bInvalid) BoolRes.word[i].bit[j] := 1 ELSE IF (aInvalid && bInvalid) BoolRes.word[i].bit[j] := 1 FI ESAC ENDFOR ENDFOR // aggregate results CASE (imm8[3:2]) OF 0: // equal any IntRes1 := 0 FOR i := 0 to UpperBound FOR j := 0 to UpperBound IntRes1[i] := IntRes1[i] OR BoolRes.word[i].bit[j] ENDFOR ENDFOR 1: // ranges IntRes1 := 0 FOR i := 0 to UpperBound FOR j := 0 to UpperBound IntRes1[i] := IntRes1[i] OR (BoolRes.word[i].bit[j] AND BoolRes.word[i].bit[j+1]) j += 2 ENDFOR ENDFOR 2: // equal each IntRes1 := 0 FOR i := 0 to UpperBound IntRes1[i] := BoolRes.word[i].bit[i] ENDFOR 3: // equal ordered IntRes1 := (imm8[0] ? 
0xFF : 0xFFFF) FOR i := 0 to UpperBound k := i FOR j := 0 to UpperBound-i IntRes1[i] := IntRes1[i] AND BoolRes.word[k].bit[j] k := k+1 ENDFOR ENDFOR ESAC // optionally negate results bInvalid := 0 FOR i := 0 to UpperBound IF imm8[4] IF imm8[5] // only negate valid IF b[n+size-1:n] == 0 bInvalid := 1 FI IF bInvalid // invalid, don't negate IntRes2[i] := IntRes1[i] ELSE // valid, negate IntRes2[i] := -1 XOR IntRes1[i] FI ELSE // negate all IntRes2[i] := -1 XOR IntRes1[i] FI ELSE // don't negate IntRes2[i] := IntRes1[i] FI ENDFOR // output dst := (IntRes2 != 0)
Flag SSE4.2 String Compare Compare packed strings with implicit lengths in "a" and "b" using the control in "imm8", and return 1 if any character in "a" was null, and 0 otherwise. [strcmp_note] size := (imm8[0] ? 16 : 8) // 8 or 16-bit characters UpperBound := (128 / size) - 1 aInvalid := 0 FOR i := 0 to UpperBound m := i*size IF a[m+size-1:m] == 0 aInvalid := 1 FI ENDFOR dst := aInvalid
Flag SSE4.2 String Compare Compare packed strings with implicit lengths in "a" and "b" using the control in "imm8", and returns bit 0 of the resulting bit mask. [strcmp_note] size := (imm8[0] ? 16 : 8) // 8 or 16-bit characters UpperBound := (128 / size) - 1 BoolRes := 0 // compare all characters aInvalid := 0 bInvalid := 0 FOR i := 0 to UpperBound m := i*size FOR j := 0 to UpperBound n := j*size BoolRes.word[i].bit[j] := (a[m+size-1:m] == b[n+size-1:n]) ? 1 : 0 // invalidate characters after EOS IF a[m+size-1:m] == 0 aInvalid := 1 FI IF b[n+size-1:n] == 0 bInvalid := 1 FI // override comparisons for invalid characters CASE (imm8[3:2]) OF 0: // equal any IF (!aInvalid && bInvalid) BoolRes.word[i].bit[j] := 0 ELSE IF (aInvalid && !bInvalid) BoolRes.word[i].bit[j] := 0 ELSE IF (aInvalid && bInvalid) BoolRes.word[i].bit[j] := 0 FI 1: // ranges IF (!aInvalid && bInvalid) BoolRes.word[i].bit[j] := 0 ELSE IF (aInvalid && !bInvalid) BoolRes.word[i].bit[j] := 0 ELSE IF (aInvalid && bInvalid) BoolRes.word[i].bit[j] := 0 FI 2: // equal each IF (!aInvalid && bInvalid) BoolRes.word[i].bit[j] := 0 ELSE IF (aInvalid && !bInvalid) BoolRes.word[i].bit[j] := 0 ELSE IF (aInvalid && bInvalid) BoolRes.word[i].bit[j] := 1 FI 3: // equal ordered IF (!aInvalid && bInvalid) BoolRes.word[i].bit[j] := 0 ELSE IF (aInvalid && !bInvalid) BoolRes.word[i].bit[j] := 1 ELSE IF (aInvalid && bInvalid) BoolRes.word[i].bit[j] := 1 FI ESAC ENDFOR ENDFOR // aggregate results CASE (imm8[3:2]) OF 0: // equal any IntRes1 := 0 FOR i := 0 to UpperBound FOR j := 0 to UpperBound IntRes1[i] := IntRes1[i] OR BoolRes.word[i].bit[j] ENDFOR ENDFOR 1: // ranges IntRes1 := 0 FOR i := 0 to UpperBound FOR j := 0 to UpperBound IntRes1[i] := IntRes1[i] OR (BoolRes.word[i].bit[j] AND BoolRes.word[i].bit[j+1]) j += 2 ENDFOR ENDFOR 2: // equal each IntRes1 := 0 FOR i := 0 to UpperBound IntRes1[i] := BoolRes.word[i].bit[i] ENDFOR 3: // equal ordered IntRes1 := (imm8[0] ? 
0xFF : 0xFFFF) FOR i := 0 to UpperBound k := i FOR j := 0 to UpperBound-i IntRes1[i] := IntRes1[i] AND BoolRes.word[k].bit[j] k := k+1 ENDFOR ENDFOR ESAC // optionally negate results bInvalid := 0 FOR i := 0 to UpperBound IF imm8[4] IF imm8[5] // only negate valid IF b[n+size-1:n] == 0 bInvalid := 1 FI IF bInvalid // invalid, don't negate IntRes2[i] := IntRes1[i] ELSE // valid, negate IntRes2[i] := -1 XOR IntRes1[i] FI ELSE // negate all IntRes2[i] := -1 XOR IntRes1[i] FI ELSE // don't negate IntRes2[i] := IntRes1[i] FI ENDFOR // output dst := IntRes2[0]
Flag SSE4.2 String Compare Compare packed strings with implicit lengths in "a" and "b" using the control in "imm8", and returns 1 if "b" did not contain a null character and the resulting mask was zero, and 0 otherwise. [strcmp_note] size := (imm8[0] ? 16 : 8) // 8 or 16-bit characters UpperBound := (128 / size) - 1 BoolRes := 0 // compare all characters aInvalid := 0 bInvalid := 0 FOR i := 0 to UpperBound m := i*size FOR j := 0 to UpperBound n := j*size BoolRes.word[i].bit[j] := (a[m+size-1:m] == b[n+size-1:n]) ? 1 : 0 // invalidate characters after EOS IF a[m+size-1:m] == 0 aInvalid := 1 FI IF b[n+size-1:n] == 0 bInvalid := 1 FI // override comparisons for invalid characters CASE (imm8[3:2]) OF 0: // equal any IF (!aInvalid && bInvalid) BoolRes.word[i].bit[j] := 0 ELSE IF (aInvalid && !bInvalid) BoolRes.word[i].bit[j] := 0 ELSE IF (aInvalid && bInvalid) BoolRes.word[i].bit[j] := 0 FI 1: // ranges IF (!aInvalid && bInvalid) BoolRes.word[i].bit[j] := 0 ELSE IF (aInvalid && !bInvalid) BoolRes.word[i].bit[j] := 0 ELSE IF (aInvalid && bInvalid) BoolRes.word[i].bit[j] := 0 FI 2: // equal each IF (!aInvalid && bInvalid) BoolRes.word[i].bit[j] := 0 ELSE IF (aInvalid && !bInvalid) BoolRes.word[i].bit[j] := 0 ELSE IF (aInvalid && bInvalid) BoolRes.word[i].bit[j] := 1 FI 3: // equal ordered IF (!aInvalid && bInvalid) BoolRes.word[i].bit[j] := 0 ELSE IF (aInvalid && !bInvalid) BoolRes.word[i].bit[j] := 1 ELSE IF (aInvalid && bInvalid) BoolRes.word[i].bit[j] := 1 FI ESAC ENDFOR ENDFOR // aggregate results CASE (imm8[3:2]) OF 0: // equal any IntRes1 := 0 FOR i := 0 to UpperBound FOR j := 0 to UpperBound IntRes1[i] := IntRes1[i] OR BoolRes.word[i].bit[j] ENDFOR ENDFOR 1: // ranges IntRes1 := 0 FOR i := 0 to UpperBound FOR j := 0 to UpperBound IntRes1[i] := IntRes1[i] OR (BoolRes.word[i].bit[j] AND BoolRes.word[i].bit[j+1]) j += 2 ENDFOR ENDFOR 2: // equal each IntRes1 := 0 FOR i := 0 to UpperBound IntRes1[i] := BoolRes.word[i].bit[i] ENDFOR 3: // equal ordered IntRes1 := 
(imm8[0] ? 0xFF : 0xFFFF) FOR i := 0 to UpperBound k := i FOR j := 0 to UpperBound-i IntRes1[i] := IntRes1[i] AND BoolRes.word[k].bit[j] k := k+1 ENDFOR ENDFOR ESAC // optionally negate results bInvalid := 0 FOR i := 0 to UpperBound IF imm8[4] IF imm8[5] // only negate valid IF b[n+size-1:n] == 0 bInvalid := 1 FI IF bInvalid // invalid, don't negate IntRes2[i] := IntRes1[i] ELSE // valid, negate IntRes2[i] := -1 XOR IntRes1[i] FI ELSE // negate all IntRes2[i] := -1 XOR IntRes1[i] FI ELSE // don't negate IntRes2[i] := IntRes1[i] FI ENDFOR // output dst := (IntRes2 == 0) AND (NOT bInvalid)
SSE4.2 String Compare Compare packed strings in "a" and "b" with lengths "la" and "lb" using the control in "imm8", and store the generated mask in "dst". [strcmp_note] size := (imm8[0] ? 16 : 8) // 8 or 16-bit characters UpperBound := (128 / size) - 1 BoolRes := 0 // compare all characters aInvalid := 0 bInvalid := 0 FOR i := 0 to UpperBound m := i*size FOR j := 0 to UpperBound n := j*size BoolRes.word[i].bit[j] := (a[m+size-1:m] == b[n+size-1:n]) ? 1 : 0 // invalidate characters after EOS IF i == la aInvalid := 1 FI IF j == lb bInvalid := 1 FI // override comparisons for invalid characters CASE (imm8[3:2]) OF 0: // equal any IF (!aInvalid && bInvalid) BoolRes.word[i].bit[j] := 0 ELSE IF (aInvalid && !bInvalid) BoolRes.word[i].bit[j] := 0 ELSE IF (aInvalid && bInvalid) BoolRes.word[i].bit[j] := 0 FI 1: // ranges IF (!aInvalid && bInvalid) BoolRes.word[i].bit[j] := 0 ELSE IF (aInvalid && !bInvalid) BoolRes.word[i].bit[j] := 0 ELSE IF (aInvalid && bInvalid) BoolRes.word[i].bit[j] := 0 FI 2: // equal each IF (!aInvalid && bInvalid) BoolRes.word[i].bit[j] := 0 ELSE IF (aInvalid && !bInvalid) BoolRes.word[i].bit[j] := 0 ELSE IF (aInvalid && bInvalid) BoolRes.word[i].bit[j] := 1 FI 3: // equal ordered IF (!aInvalid && bInvalid) BoolRes.word[i].bit[j] := 0 ELSE IF (aInvalid && !bInvalid) BoolRes.word[i].bit[j] := 1 ELSE IF (aInvalid && bInvalid) BoolRes.word[i].bit[j] := 1 FI ESAC ENDFOR ENDFOR // aggregate results CASE (imm8[3:2]) OF 0: // equal any IntRes1 := 0 FOR i := 0 to UpperBound FOR j := 0 to UpperBound IntRes1[i] := IntRes1[i] OR BoolRes.word[i].bit[j] ENDFOR ENDFOR 1: // ranges IntRes1 := 0 FOR i := 0 to UpperBound FOR j := 0 to UpperBound IntRes1[i] := IntRes1[i] OR (BoolRes.word[i].bit[j] AND BoolRes.word[i].bit[j+1]) j += 2 ENDFOR ENDFOR 2: // equal each IntRes1 := 0 FOR i := 0 to UpperBound IntRes1[i] := BoolRes.word[i].bit[i] ENDFOR 3: // equal ordered IntRes1 := (imm8[0] ? 
0xFF : 0xFFFF) FOR i := 0 to UpperBound k := i FOR j := 0 to UpperBound-i IntRes1[i] := IntRes1[i] AND BoolRes.word[k].bit[j] k := k+1 ENDFOR ENDFOR ESAC // optionally negate results FOR i := 0 to UpperBound IF imm8[4] IF imm8[5] // only negate valid IF i >= lb // invalid, don't negate IntRes2[i] := IntRes1[i] ELSE // valid, negate IntRes2[i] := -1 XOR IntRes1[i] FI ELSE // negate all IntRes2[i] := -1 XOR IntRes1[i] FI ELSE // don't negate IntRes2[i] := IntRes1[i] FI ENDFOR // output IF imm8[6] // byte / word mask FOR i := 0 to UpperBound j := i*size IF IntRes2[i] dst[j+size-1:j] := (imm8[0] ? 0xFFFF : 0xFF) ELSE dst[j+size-1:j] := 0 FI ENDFOR ELSE // bit mask dst[UpperBound:0] := IntRes2[UpperBound:0] dst[127:UpperBound+1] := 0 FI
Flag SSE4.2 String Compare Compare packed strings in "a" and "b" with lengths "la" and "lb" using the control in "imm8", and store the generated index in "dst". [strcmp_note] size := (imm8[0] ? 16 : 8) // 8 or 16-bit characters UpperBound := (128 / size) - 1 BoolRes := 0 // compare all characters aInvalid := 0 bInvalid := 0 FOR i := 0 to UpperBound m := i*size FOR j := 0 to UpperBound n := j*size BoolRes.word[i].bit[j] := (a[m+size-1:m] == b[n+size-1:n]) ? 1 : 0 // invalidate characters after EOS IF i == la aInvalid := 1 FI IF j == lb bInvalid := 1 FI // override comparisons for invalid characters CASE (imm8[3:2]) OF 0: // equal any IF (!aInvalid && bInvalid) BoolRes.word[i].bit[j] := 0 ELSE IF (aInvalid && !bInvalid) BoolRes.word[i].bit[j] := 0 ELSE IF (aInvalid && bInvalid) BoolRes.word[i].bit[j] := 0 FI 1: // ranges IF (!aInvalid && bInvalid) BoolRes.word[i].bit[j] := 0 ELSE IF (aInvalid && !bInvalid) BoolRes.word[i].bit[j] := 0 ELSE IF (aInvalid && bInvalid) BoolRes.word[i].bit[j] := 0 FI 2: // equal each IF (!aInvalid && bInvalid) BoolRes.word[i].bit[j] := 0 ELSE IF (aInvalid && !bInvalid) BoolRes.word[i].bit[j] := 0 ELSE IF (aInvalid && bInvalid) BoolRes.word[i].bit[j] := 1 FI 3: // equal ordered IF (!aInvalid && bInvalid) BoolRes.word[i].bit[j] := 0 ELSE IF (aInvalid && !bInvalid) BoolRes.word[i].bit[j] := 1 ELSE IF (aInvalid && bInvalid) BoolRes.word[i].bit[j] := 1 FI ESAC ENDFOR ENDFOR // aggregate results CASE (imm8[3:2]) OF 0: // equal any IntRes1 := 0 FOR i := 0 to UpperBound FOR j := 0 to UpperBound IntRes1[i] := IntRes1[i] OR BoolRes.word[i].bit[j] ENDFOR ENDFOR 1: // ranges IntRes1 := 0 FOR i := 0 to UpperBound FOR j := 0 to UpperBound IntRes1[i] := IntRes1[i] OR (BoolRes.word[i].bit[j] AND BoolRes.word[i].bit[j+1]) j += 2 ENDFOR ENDFOR 2: // equal each IntRes1 := 0 FOR i := 0 to UpperBound IntRes1[i] := BoolRes.word[i].bit[i] ENDFOR 3: // equal ordered IntRes1 := (imm8[0] ? 
0xFF : 0xFFFF) FOR i := 0 to UpperBound k := i FOR j := 0 to UpperBound-i IntRes1[i] := IntRes1[i] AND BoolRes.word[k].bit[j] k := k+1 ENDFOR ENDFOR ESAC // optionally negate results FOR i := 0 to UpperBound IF imm8[4] IF imm8[5] // only negate valid IF i >= lb // invalid, don't negate IntRes2[i] := IntRes1[i] ELSE // valid, negate IntRes2[i] := -1 XOR IntRes1[i] FI ELSE // negate all IntRes2[i] := -1 XOR IntRes1[i] FI ELSE // don't negate IntRes2[i] := IntRes1[i] FI ENDFOR // output IF imm8[6] // most significant bit tmp := UpperBound dst := tmp DO WHILE ((tmp >= 0) AND IntRes2[tmp] == 0) tmp := tmp - 1 dst := tmp OD ELSE // least significant bit tmp := 0 dst := tmp DO WHILE ((tmp <= UpperBound) AND IntRes2[tmp] == 0) tmp := tmp + 1 dst := tmp OD FI
Flag SSE4.2 String Compare Compare packed strings in "a" and "b" with lengths "la" and "lb" using the control in "imm8", and return 1 if any character in "b" was null, and 0 otherwise. [strcmp_note] size := (imm8[0] ? 16 : 8) // 8 or 16-bit characters UpperBound := (128 / size) - 1 dst := (lb <= UpperBound)
Flag SSE4.2 String Compare Compare packed strings in "a" and "b" with lengths "la" and "lb" using the control in "imm8", and returns 1 if the resulting mask was non-zero, and 0 otherwise. [strcmp_note] size := (imm8[0] ? 16 : 8) // 8 or 16-bit characters UpperBound := (128 / size) - 1 BoolRes := 0 // compare all characters aInvalid := 0 bInvalid := 0 FOR i := 0 to UpperBound m := i*size FOR j := 0 to UpperBound n := j*size BoolRes.word[i].bit[j] := (a[m+size-1:m] == b[n+size-1:n]) ? 1 : 0 // invalidate characters after EOS IF i == la aInvalid := 1 FI IF j == lb bInvalid := 1 FI // override comparisons for invalid characters CASE (imm8[3:2]) OF 0: // equal any IF (!aInvalid && bInvalid) BoolRes.word[i].bit[j] := 0 ELSE IF (aInvalid && !bInvalid) BoolRes.word[i].bit[j] := 0 ELSE IF (aInvalid && bInvalid) BoolRes.word[i].bit[j] := 0 FI 1: // ranges IF (!aInvalid && bInvalid) BoolRes.word[i].bit[j] := 0 ELSE IF (aInvalid && !bInvalid) BoolRes.word[i].bit[j] := 0 ELSE IF (aInvalid && bInvalid) BoolRes.word[i].bit[j] := 0 FI 2: // equal each IF (!aInvalid && bInvalid) BoolRes.word[i].bit[j] := 0 ELSE IF (aInvalid && !bInvalid) BoolRes.word[i].bit[j] := 0 ELSE IF (aInvalid && bInvalid) BoolRes.word[i].bit[j] := 1 FI 3: // equal ordered IF (!aInvalid && bInvalid) BoolRes.word[i].bit[j] := 0 ELSE IF (aInvalid && !bInvalid) BoolRes.word[i].bit[j] := 1 ELSE IF (aInvalid && bInvalid) BoolRes.word[i].bit[j] := 1 FI ESAC ENDFOR ENDFOR // aggregate results CASE (imm8[3:2]) OF 0: // equal any IntRes1 := 0 FOR i := 0 to UpperBound FOR j := 0 to UpperBound IntRes1[i] := IntRes1[i] OR BoolRes.word[i].bit[j] ENDFOR ENDFOR 1: // ranges IntRes1 := 0 FOR i := 0 to UpperBound FOR j := 0 to UpperBound IntRes1[i] := IntRes1[i] OR (BoolRes.word[i].bit[j] AND BoolRes.word[i].bit[j+1]) j += 2 ENDFOR ENDFOR 2: // equal each IntRes1 := 0 FOR i := 0 to UpperBound IntRes1[i] := BoolRes.word[i].bit[i] ENDFOR 3: // equal ordered IntRes1 := (imm8[0] ? 
0xFF : 0xFFFF) FOR i := 0 to UpperBound k := i FOR j := 0 to UpperBound-i IntRes1[i] := IntRes1[i] AND BoolRes.word[k].bit[j] k := k+1 ENDFOR ENDFOR ESAC // optionally negate results FOR i := 0 to UpperBound IF imm8[4] IF imm8[5] // only negate valid IF i >= lb // invalid, don't negate IntRes2[i] := IntRes1[i] ELSE // valid, negate IntRes2[i] := -1 XOR IntRes1[i] FI ELSE // negate all IntRes2[i] := -1 XOR IntRes1[i] FI ELSE // don't negate IntRes2[i] := IntRes1[i] FI ENDFOR // output dst := (IntRes2 != 0)
Flag SSE4.2 String Compare Compare packed strings in "a" and "b" with lengths "la" and "lb" using the control in "imm8", and return 1 if any character in "a" was null, and 0 otherwise. [strcmp_note] size := (imm8[0] ? 16 : 8) // 8 or 16-bit characters UpperBound := (128 / size) - 1 dst := (la <= UpperBound)
Flag SSE4.2 String Compare Compare packed strings in "a" and "b" with lengths "la" and "lb" using the control in "imm8", and returns bit 0 of the resulting bit mask. [strcmp_note] size := (imm8[0] ? 16 : 8) // 8 or 16-bit characters UpperBound := (128 / size) - 1 BoolRes := 0 // compare all characters aInvalid := 0 bInvalid := 0 FOR i := 0 to UpperBound m := i*size FOR j := 0 to UpperBound n := j*size BoolRes.word[i].bit[j] := (a[m+size-1:m] == b[n+size-1:n]) ? 1 : 0 // invalidate characters after EOS IF i == la aInvalid := 1 FI IF j == lb bInvalid := 1 FI // override comparisons for invalid characters CASE (imm8[3:2]) OF 0: // equal any IF (!aInvalid && bInvalid) BoolRes.word[i].bit[j] := 0 ELSE IF (aInvalid && !bInvalid) BoolRes.word[i].bit[j] := 0 ELSE IF (aInvalid && bInvalid) BoolRes.word[i].bit[j] := 0 FI 1: // ranges IF (!aInvalid && bInvalid) BoolRes.word[i].bit[j] := 0 ELSE IF (aInvalid && !bInvalid) BoolRes.word[i].bit[j] := 0 ELSE IF (aInvalid && bInvalid) BoolRes.word[i].bit[j] := 0 FI 2: // equal each IF (!aInvalid && bInvalid) BoolRes.word[i].bit[j] := 0 ELSE IF (aInvalid && !bInvalid) BoolRes.word[i].bit[j] := 0 ELSE IF (aInvalid && bInvalid) BoolRes.word[i].bit[j] := 1 FI 3: // equal ordered IF (!aInvalid && bInvalid) BoolRes.word[i].bit[j] := 0 ELSE IF (aInvalid && !bInvalid) BoolRes.word[i].bit[j] := 1 ELSE IF (aInvalid && bInvalid) BoolRes.word[i].bit[j] := 1 FI ESAC ENDFOR ENDFOR // aggregate results CASE (imm8[3:2]) OF 0: // equal any IntRes1 := 0 FOR i := 0 to UpperBound FOR j := 0 to UpperBound IntRes1[i] := IntRes1[i] OR BoolRes.word[i].bit[j] ENDFOR ENDFOR 1: // ranges IntRes1 := 0 FOR i := 0 to UpperBound FOR j := 0 to UpperBound IntRes1[i] := IntRes1[i] OR (BoolRes.word[i].bit[j] AND BoolRes.word[i].bit[j+1]) j += 2 ENDFOR ENDFOR 2: // equal each IntRes1 := 0 FOR i := 0 to UpperBound IntRes1[i] := BoolRes.word[i].bit[i] ENDFOR 3: // equal ordered IntRes1 := (imm8[0] ? 
0xFF : 0xFFFF) FOR i := 0 to UpperBound k := i FOR j := 0 to UpperBound-i IntRes1[i] := IntRes1[i] AND BoolRes.word[k].bit[j] k := k+1 ENDFOR ENDFOR ESAC // optionally negate results FOR i := 0 to UpperBound IF imm8[4] IF imm8[5] // only negate valid IF i >= lb // invalid, don't negate IntRes2[i] := IntRes1[i] ELSE // valid, negate IntRes2[i] := -1 XOR IntRes1[i] FI ELSE // negate all IntRes2[i] := -1 XOR IntRes1[i] FI ELSE // don't negate IntRes2[i] := IntRes1[i] FI ENDFOR // output dst := IntRes2[0]
Flag SSE4.2 String Compare Compare packed strings in "a" and "b" with lengths "la" and "lb" using the control in "imm8", and returns 1 if "b" did not contain a null character and the resulting mask was zero, and 0 otherwise. [strcmp_note] size := (imm8[0] ? 16 : 8) // 8 or 16-bit characters UpperBound := (128 / size) - 1 BoolRes := 0 // compare all characters aInvalid := 0 bInvalid := 0 FOR i := 0 to UpperBound m := i*size FOR j := 0 to UpperBound n := j*size BoolRes.word[i].bit[j] := (a[m+size-1:m] == b[n+size-1:n]) ? 1 : 0 // invalidate characters after EOS IF i == la aInvalid := 1 FI IF j == lb bInvalid := 1 FI // override comparisons for invalid characters CASE (imm8[3:2]) OF 0: // equal any IF (!aInvalid && bInvalid) BoolRes.word[i].bit[j] := 0 ELSE IF (aInvalid && !bInvalid) BoolRes.word[i].bit[j] := 0 ELSE IF (aInvalid && bInvalid) BoolRes.word[i].bit[j] := 0 FI 1: // ranges IF (!aInvalid && bInvalid) BoolRes.word[i].bit[j] := 0 ELSE IF (aInvalid && !bInvalid) BoolRes.word[i].bit[j] := 0 ELSE IF (aInvalid && bInvalid) BoolRes.word[i].bit[j] := 0 FI 2: // equal each IF (!aInvalid && bInvalid) BoolRes.word[i].bit[j] := 0 ELSE IF (aInvalid && !bInvalid) BoolRes.word[i].bit[j] := 0 ELSE IF (aInvalid && bInvalid) BoolRes.word[i].bit[j] := 1 FI 3: // equal ordered IF (!aInvalid && bInvalid) BoolRes.word[i].bit[j] := 0 ELSE IF (aInvalid && !bInvalid) BoolRes.word[i].bit[j] := 1 ELSE IF (aInvalid && bInvalid) BoolRes.word[i].bit[j] := 1 FI ESAC ENDFOR ENDFOR // aggregate results CASE (imm8[3:2]) OF 0: // equal any IntRes1 := 0 FOR i := 0 to UpperBound FOR j := 0 to UpperBound IntRes1[i] := IntRes1[i] OR BoolRes.word[i].bit[j] ENDFOR ENDFOR 1: // ranges IntRes1 := 0 FOR i := 0 to UpperBound FOR j := 0 to UpperBound IntRes1[i] := IntRes1[i] OR (BoolRes.word[i].bit[j] AND BoolRes.word[i].bit[j+1]) j += 2 ENDFOR ENDFOR 2: // equal each IntRes1 := 0 FOR i := 0 to UpperBound IntRes1[i] := BoolRes.word[i].bit[i] ENDFOR 3: // equal ordered IntRes1 := (imm8[0] ? 
0xFF : 0xFFFF) FOR i := 0 to UpperBound k := i FOR j := 0 to UpperBound-i IntRes1[i] := IntRes1[i] AND BoolRes.word[k].bit[j] k := k+1 ENDFOR ENDFOR ESAC // optionally negate results FOR i := 0 to UpperBound IF imm8[4] IF imm8[5] // only negate valid IF i >= lb // invalid, don't negate IntRes2[i] := IntRes1[i] ELSE // valid, negate IntRes2[i] := -1 XOR IntRes1[i] FI ELSE // negate all IntRes2[i] := -1 XOR IntRes1[i] FI ELSE // don't negate IntRes2[i] := IntRes1[i] FI ENDFOR // output dst := (IntRes2 == 0) AND (lb > UpperBound)
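The "optionally negate results" step shared by the string-compare entries above can be sketched as a small scalar model. This is an illustrative sketch only: the function name `apply_polarity` and the representation of IntRes1 as a Python integer bit mask are assumptions, not part of the reference.

```python
def apply_polarity(int_res1, imm8, lb, upper_bound):
    """Model of the IntRes1 -> IntRes2 negation step (hypothetical helper).

    imm8 bit 4 enables negation; imm8 bit 5 restricts it to valid
    positions, i.e. positions i < lb.
    """
    int_res2 = 0
    for i in range(upper_bound + 1):
        bit = (int_res1 >> i) & 1
        negate = bool(imm8 & 0x10)
        only_valid = bool(imm8 & 0x20)
        if negate and not (only_valid and i >= lb):
            bit ^= 1  # "-1 XOR IntRes1[i]" on a single bit
        int_res2 |= bit << i
    return int_res2
```

With `imm8 = 0x30` only the positions below `lb` are flipped, which is why the masked-negation mode is convenient for strings shorter than the register.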
Integer SSE4.2 Compare Compare packed signed 64-bit integers in "a" and "b" for greater-than, and store the results in "dst". FOR j := 0 to 1 i := j*64 dst[i+63:i] := ( a[i+63:i] > b[i+63:i] ) ? 0xFFFFFFFFFFFFFFFF : 0 ENDFOR
Integer SSE4.2 Cryptography Starting with the initial value in "crc", accumulates a CRC32 value for unsigned 8-bit integer "v", and stores the result in "dst". tmp1[7:0] := v[0:7] // bit reflection tmp2[31:0] := crc[0:31] // bit reflection tmp3[39:0] := tmp1[7:0] << 32 tmp4[39:0] := tmp2[31:0] << 8 tmp5[39:0] := tmp3[39:0] XOR tmp4[39:0] tmp6[31:0] := MOD2(tmp5[39:0], 0x11EDC6F41) // remainder from polynomial division modulus 2 dst[31:0] := tmp6[0:31] // bit reflection
Integer SSE4.2 Cryptography Starting with the initial value in "crc", accumulates a CRC32 value for unsigned 16-bit integer "v", and stores the result in "dst". tmp1[15:0] := v[0:15] // bit reflection tmp2[31:0] := crc[0:31] // bit reflection tmp3[47:0] := tmp1[15:0] << 32 tmp4[47:0] := tmp2[31:0] << 16 tmp5[47:0] := tmp3[47:0] XOR tmp4[47:0] tmp6[31:0] := MOD2(tmp5[47:0], 0x11EDC6F41) // remainder from polynomial division modulus 2 dst[31:0] := tmp6[0:31] // bit reflection
Integer SSE4.2 Cryptography Starting with the initial value in "crc", accumulates a CRC32 value for unsigned 32-bit integer "v", and stores the result in "dst". tmp1[31:0] := v[0:31] // bit reflection tmp2[31:0] := crc[0:31] // bit reflection tmp3[63:0] := tmp1[31:0] << 32 tmp4[63:0] := tmp2[31:0] << 32 tmp5[63:0] := tmp3[63:0] XOR tmp4[63:0] tmp6[31:0] := MOD2(tmp5[63:0], 0x11EDC6F41) // remainder from polynomial division modulus 2 dst[31:0] := tmp6[0:31] // bit reflection
Integer SSE4.2 Cryptography Starting with the initial value in "crc", accumulates a CRC32 value for unsigned 64-bit integer "v", and stores the result in "dst". tmp1[63:0] := v[0:63] // bit reflection tmp2[31:0] := crc[0:31] // bit reflection tmp3[95:0] := tmp1[63:0] << 32 tmp4[95:0] := tmp2[31:0] << 64 tmp5[95:0] := tmp3[95:0] XOR tmp4[95:0] tmp6[31:0] := MOD2(tmp5[95:0], 0x11EDC6F41) // remainder from polynomial division modulus 2 dst[31:0] := tmp6[0:31] // bit reflection
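The CRC32 entries above use the polynomial 0x11EDC6F41 (CRC-32C) with bit reflection on input and output. A bitwise scalar model, written in the usual reflected form, might look like the sketch below. The function names are hypothetical, and the initial value and final XOR in `crc32c` are the conventional CRC-32C framing, which the instruction itself does not apply.

```python
CRC32C_POLY_REFLECTED = 0x82F63B78  # bit-reversed form of 0x1EDC6F41

def crc32c_u8(crc, v):
    """Accumulate one byte into a 32-bit CRC-32C state (no init/final XOR)."""
    crc = (crc ^ (v & 0xFF)) & 0xFFFFFFFF
    for _ in range(8):
        # Shift out one bit; reduce by the polynomial if that bit was set.
        crc = (crc >> 1) ^ (CRC32C_POLY_REFLECTED if crc & 1 else 0)
    return crc

def crc32c_u32(crc, v):
    """Accumulate a 32-bit value, least significant byte first."""
    for shift in (0, 8, 16, 24):
        crc = crc32c_u8(crc, (v >> shift) & 0xFF)
    return crc

def crc32c(data):
    """Conventional CRC-32C of a byte string (init and final XOR of all ones)."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc = crc32c_u8(crc, byte)
    return crc ^ 0xFFFFFFFF
```

Feeding the standard check string `b"123456789"` through `crc32c` yields `0xE3069283`, the published CRC-32C check value.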
Integer SSSE3 Special Math Functions Compute the absolute value of packed signed 8-bit integers in "a", and store the unsigned results in "dst". FOR j := 0 to 7 i := j*8 dst[i+7:i] := ABS(Int(a[i+7:i])) ENDFOR
Integer SSSE3 Special Math Functions Compute the absolute value of packed signed 8-bit integers in "a", and store the unsigned results in "dst". FOR j := 0 to 15 i := j*8 dst[i+7:i] := ABS(a[i+7:i]) ENDFOR
Integer SSSE3 Special Math Functions Compute the absolute value of packed signed 16-bit integers in "a", and store the unsigned results in "dst". FOR j := 0 to 3 i := j*16 dst[i+15:i] := ABS(Int(a[i+15:i])) ENDFOR
Integer SSSE3 Special Math Functions Compute the absolute value of packed signed 16-bit integers in "a", and store the unsigned results in "dst". FOR j := 0 to 7 i := j*16 dst[i+15:i] := ABS(a[i+15:i]) ENDFOR
Integer SSSE3 Special Math Functions Compute the absolute value of packed signed 32-bit integers in "a", and store the unsigned results in "dst". FOR j := 0 to 1 i := j*32 dst[i+31:i] := ABS(a[i+31:i]) ENDFOR
Integer SSSE3 Special Math Functions Compute the absolute value of packed signed 32-bit integers in "a", and store the unsigned results in "dst". FOR j := 0 to 3 i := j*32 dst[i+31:i] := ABS(a[i+31:i]) ENDFOR
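The phrase "store the unsigned results" matters for the most negative input, whose absolute value does not fit in the signed range. A minimal sketch, assuming lanes are passed as Python integers and using the hypothetical name `abs_lanes`:

```python
def abs_lanes(lanes, bits):
    """Absolute value per lane, returned as unsigned values of the given width."""
    mask = (1 << bits) - 1
    return [abs(x) & mask for x in lanes]
```

For 8-bit lanes, `-128` maps to `128` (bit pattern 0x80), which reads as `-128` again if the lane is reinterpreted as signed.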
Integer SSSE3 Swizzle Shuffle packed 8-bit integers in "a" according to shuffle control mask in the corresponding 8-bit element of "b", and store the results in "dst". FOR j := 0 to 15 i := j*8 IF b[i+7] == 1 dst[i+7:i] := 0 ELSE index[3:0] := b[i+3:i] dst[i+7:i] := a[index*8+7:index*8] FI ENDFOR
Integer SSSE3 Swizzle Shuffle packed 8-bit integers in "a" according to shuffle control mask in the corresponding 8-bit element of "b", and store the results in "dst". FOR j := 0 to 7 i := j*8 IF b[i+7] == 1 dst[i+7:i] := 0 ELSE index[2:0] := b[i+2:i] dst[i+7:i] := a[index*8+7:index*8] FI ENDFOR
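The byte shuffle described in the two entries above can be modelled as follows. This is a sketch under stated assumptions: registers are represented as `bytes` objects with lane 0 first, and the name `shuffle_bytes` is hypothetical.

```python
def shuffle_bytes(a, b):
    """Model of the byte shuffle for 16-byte or 8-byte operands.

    Each control byte in b either zeroes the output byte (bit 7 set)
    or selects a byte of a using its low 4 bits (3 bits for 8 bytes).
    """
    n = len(a)  # 16 for the 128-bit form, 8 for the 64-bit form
    out = bytearray(n)
    for j in range(n):
        ctrl = b[j]
        if ctrl & 0x80:
            out[j] = 0
        else:
            out[j] = a[ctrl & (n - 1)]
    return bytes(out)
```

A control vector of `15, 14, ..., 0` reverses the byte order of a 16-byte operand; any control byte with the top bit set produces a zero.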
Integer SSSE3 Miscellaneous Concatenate 16-byte blocks in "a" and "b" into a 32-byte temporary result, shift the result right by "imm8" bytes, and store the low 16 bytes in "dst". tmp[255:0] := ((a[127:0] << 128)[255:0] OR b[127:0]) >> (imm8*8) dst[127:0] := tmp[127:0]
Integer SSSE3 Miscellaneous Concatenate 8-byte blocks in "a" and "b" into a 16-byte temporary result, shift the result right by "imm8" bytes, and store the low 8 bytes in "dst". tmp[127:0] := ((a[63:0] << 64)[127:0] OR b[63:0]) >> (imm8*8) dst[63:0] := tmp[63:0]
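The concatenate-and-shift operation in the two entries above can be sketched with arbitrary-precision integers. The name `alignr` and the use of little-endian `bytes` operands are assumptions made for illustration.

```python
def alignr(a, b, imm8):
    """Concatenate a (high) and b (low), shift right by imm8 bytes, keep the low half."""
    n = len(a)  # 16 for the 128-bit form, 8 for the 64-bit form
    tmp = (int.from_bytes(a, "little") << (8 * n)) | int.from_bytes(b, "little")
    tmp >>= 8 * imm8
    return (tmp & ((1 << (8 * n)) - 1)).to_bytes(n, "little")
```

With `imm8 = 0` the result is `b`, with `imm8` equal to the operand size it is `a`, and with `imm8` at least twice the operand size it is all zeros.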
Integer SSSE3 Arithmetic Horizontally add adjacent pairs of 16-bit integers in "a" and "b", and pack the signed 16-bit results in "dst". dst[15:0] := a[31:16] + a[15:0] dst[31:16] := a[63:48] + a[47:32] dst[47:32] := a[95:80] + a[79:64] dst[63:48] := a[127:112] + a[111:96] dst[79:64] := b[31:16] + b[15:0] dst[95:80] := b[63:48] + b[47:32] dst[111:96] := b[95:80] + b[79:64] dst[127:112] := b[127:112] + b[111:96]
Integer SSSE3 Arithmetic Horizontally add adjacent pairs of signed 16-bit integers in "a" and "b" using saturation, and pack the signed 16-bit results in "dst". dst[15:0] := Saturate16(a[31:16] + a[15:0]) dst[31:16] := Saturate16(a[63:48] + a[47:32]) dst[47:32] := Saturate16(a[95:80] + a[79:64]) dst[63:48] := Saturate16(a[127:112] + a[111:96]) dst[79:64] := Saturate16(b[31:16] + b[15:0]) dst[95:80] := Saturate16(b[63:48] + b[47:32]) dst[111:96] := Saturate16(b[95:80] + b[79:64]) dst[127:112] := Saturate16(b[127:112] + b[111:96])
Integer SSSE3 Arithmetic Horizontally add adjacent pairs of 32-bit integers in "a" and "b", and pack the signed 32-bit results in "dst". dst[31:0] := a[63:32] + a[31:0] dst[63:32] := a[127:96] + a[95:64] dst[95:64] := b[63:32] + b[31:0] dst[127:96] := b[127:96] + b[95:64]
Integer SSSE3 Arithmetic Horizontally add adjacent pairs of 16-bit integers in "a" and "b", and pack the signed 16-bit results in "dst". dst[15:0] := a[31:16] + a[15:0] dst[31:16] := a[63:48] + a[47:32] dst[47:32] := b[31:16] + b[15:0] dst[63:48] := b[63:48] + b[47:32]
Integer SSSE3 Arithmetic Horizontally add adjacent pairs of 32-bit integers in "a" and "b", and pack the signed 32-bit results in "dst". dst[31:0] := a[63:32] + a[31:0] dst[63:32] := b[63:32] + b[31:0]
Integer SSSE3 Arithmetic Horizontally add adjacent pairs of signed 16-bit integers in "a" and "b" using saturation, and pack the signed 16-bit results in "dst". dst[15:0] := Saturate16(a[31:16] + a[15:0]) dst[31:16] := Saturate16(a[63:48] + a[47:32]) dst[47:32] := Saturate16(b[31:16] + b[15:0]) dst[63:48] := Saturate16(b[63:48] + b[47:32])
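The horizontal-add entries above differ only in operand width and in whether sums wrap or saturate. A scalar sketch for the 16-bit case, assuming lanes are given as lists of signed Python integers with lane 0 first (all names hypothetical):

```python
def to_signed16(x):
    """Wrap an integer to the signed 16-bit range."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x

def saturate16(x):
    """Clamp an integer to the signed 16-bit range."""
    return max(-32768, min(32767, x))

def hadd16(a, b, saturate=False):
    """Adjacent-pair sums of a fill the low half of the result, sums of b the high half."""
    out = []
    for src in (a, b):
        for k in range(0, len(src), 2):
            s = src[k] + src[k + 1]
            out.append(saturate16(s) if saturate else to_signed16(s))
    return out
```

The pair `32767 + 1` shows the difference: the wrapping form yields `-32768`, while the saturating form yields `32767`.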
Integer SSSE3 Arithmetic Horizontally subtract adjacent pairs of 16-bit integers in "a" and "b", and pack the signed 16-bit results in "dst". dst[15:0] := a[15:0] - a[31:16] dst[31:16] := a[47:32] - a[63:48] dst[47:32] := a[79:64] - a[95:80] dst[63:48] := a[111:96] - a[127:112] dst[79:64] := b[15:0] - b[31:16] dst[95:80] := b[47:32] - b[63:48] dst[111:96] := b[79:64] - b[95:80] dst[127:112] := b[111:96] - b[127:112]
Integer SSSE3 Arithmetic Horizontally subtract adjacent pairs of signed 16-bit integers in "a" and "b" using saturation, and pack the signed 16-bit results in "dst". dst[15:0] := Saturate16(a[15:0] - a[31:16]) dst[31:16] := Saturate16(a[47:32] - a[63:48]) dst[47:32] := Saturate16(a[79:64] - a[95:80]) dst[63:48] := Saturate16(a[111:96] - a[127:112]) dst[79:64] := Saturate16(b[15:0] - b[31:16]) dst[95:80] := Saturate16(b[47:32] - b[63:48]) dst[111:96] := Saturate16(b[79:64] - b[95:80]) dst[127:112] := Saturate16(b[111:96] - b[127:112])
Integer SSSE3 Arithmetic Horizontally subtract adjacent pairs of 32-bit integers in "a" and "b", and pack the signed 32-bit results in "dst". dst[31:0] := a[31:0] - a[63:32] dst[63:32] := a[95:64] - a[127:96] dst[95:64] := b[31:0] - b[63:32] dst[127:96] := b[95:64] - b[127:96]
Integer SSSE3 Arithmetic Horizontally subtract adjacent pairs of 16-bit integers in "a" and "b", and pack the signed 16-bit results in "dst". dst[15:0] := a[15:0] - a[31:16] dst[31:16] := a[47:32] - a[63:48] dst[47:32] := b[15:0] - b[31:16] dst[63:48] := b[47:32] - b[63:48]
Integer SSSE3 Arithmetic Horizontally subtract adjacent pairs of 32-bit integers in "a" and "b", and pack the signed 32-bit results in "dst". dst[31:0] := a[31:0] - a[63:32] dst[63:32] := b[31:0] - b[63:32]
Integer SSSE3 Arithmetic Horizontally subtract adjacent pairs of signed 16-bit integers in "a" and "b" using saturation, and pack the signed 16-bit results in "dst". dst[15:0] := Saturate16(a[15:0] - a[31:16]) dst[31:16] := Saturate16(a[47:32] - a[63:48]) dst[47:32] := Saturate16(b[15:0] - b[31:16]) dst[63:48] := Saturate16(b[47:32] - b[63:48])
Integer SSSE3 Arithmetic Vertically multiply each unsigned 8-bit integer from "a" with the corresponding signed 8-bit integer from "b", producing intermediate signed 16-bit integers. Horizontally add adjacent pairs of intermediate signed 16-bit integers, and pack the saturated results in "dst". FOR j := 0 to 7 i := j*16 dst[i+15:i] := Saturate16( a[i+15:i+8]*b[i+15:i+8] + a[i+7:i]*b[i+7:i] ) ENDFOR
Integer SSSE3 Arithmetic Vertically multiply each unsigned 8-bit integer from "a" with the corresponding signed 8-bit integer from "b", producing intermediate signed 16-bit integers. Horizontally add adjacent pairs of intermediate signed 16-bit integers, and pack the saturated results in "dst". FOR j := 0 to 3 i := j*16 dst[i+15:i] := Saturate16( a[i+15:i+8]*b[i+15:i+8] + a[i+7:i]*b[i+7:i] ) ENDFOR
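The mixed-sign multiply-add above treats bytes of "a" as unsigned and bytes of "b" as signed, which the pseudocode does not spell out explicitly. A sketch that makes the interpretation visible, assuming `bytes` operands and hypothetical function names:

```python
def to_signed8(x):
    """Interpret a byte value 0..255 as a signed 8-bit integer."""
    return x - 0x100 if x & 0x80 else x

def saturate16(x):
    """Clamp an integer to the signed 16-bit range."""
    return max(-32768, min(32767, x))

def maddubs(a, b):
    """Unsigned a bytes times signed b bytes, adjacent products summed with saturation."""
    out = []
    for k in range(0, len(a), 2):
        lo = a[k] * to_signed8(b[k])
        hi = a[k + 1] * to_signed8(b[k + 1])
        out.append(saturate16(lo + hi))
    return out
```

Saturation is reachable in both directions: `255*127 + 255*127` exceeds 32767, and `255*(-128) + 255*(-128)` falls below -32768.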
Integer SSSE3 Arithmetic Multiply packed signed 16-bit integers in "a" and "b", producing intermediate signed 32-bit integers. Truncate each intermediate integer to the 18 most significant bits, round by adding 1, and store bits [16:1] to "dst". FOR j := 0 to 7 i := j*16 tmp[31:0] := ((SignExtend32(a[i+15:i]) * SignExtend32(b[i+15:i])) >> 14) + 1 dst[i+15:i] := tmp[16:1] ENDFOR
Integer SSSE3 Arithmetic Multiply packed signed 16-bit integers in "a" and "b", producing intermediate signed 32-bit integers. Truncate each intermediate integer to the 18 most significant bits, round by adding 1, and store bits [16:1] to "dst". FOR j := 0 to 3 i := j*16 tmp[31:0] := ((SignExtend32(a[i+15:i]) * SignExtend32(b[i+15:i])) >> 14) + 1 dst[i+15:i] := tmp[16:1] ENDFOR
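The rounding multiply above is effectively a Q15 fixed-point product rounded to nearest. A per-lane sketch, assuming signed Python integers as inputs and using the hypothetical name `mulhrs16`:

```python
def to_signed16(x):
    """Wrap an integer to the signed 16-bit range."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x

def mulhrs16(a, b):
    """tmp := ((a * b) >> 14) + 1, result := tmp[16:1] as a signed 16-bit value."""
    tmp = ((a * b) >> 14) + 1  # Python's >> is arithmetic for negative values
    return to_signed16(tmp >> 1)
```

In Q15 terms `0x4000` is 0.5, so `mulhrs16(0x4000, 0x4000)` gives `0x2000` (0.25). The single overflowing case is `-32768 * -32768`, which wraps back to `-32768`.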
Integer SSSE3 Arithmetic Negate packed 8-bit integers in "a" when the corresponding signed 8-bit integer in "b" is negative, and store the results in "dst". Elements in "dst" are zeroed out when the corresponding element in "b" is zero. FOR j := 0 to 15 i := j*8 IF b[i+7:i] < 0 dst[i+7:i] := -(a[i+7:i]) ELSE IF b[i+7:i] == 0 dst[i+7:i] := 0 ELSE dst[i+7:i] := a[i+7:i] FI ENDFOR
Integer SSSE3 Arithmetic Negate packed 16-bit integers in "a" when the corresponding signed 16-bit integer in "b" is negative, and store the results in "dst". Elements in "dst" are zeroed out when the corresponding element in "b" is zero. FOR j := 0 to 7 i := j*16 IF b[i+15:i] < 0 dst[i+15:i] := -(a[i+15:i]) ELSE IF b[i+15:i] == 0 dst[i+15:i] := 0 ELSE dst[i+15:i] := a[i+15:i] FI ENDFOR
Integer SSSE3 Arithmetic Negate packed 32-bit integers in "a" when the corresponding signed 32-bit integer in "b" is negative, and store the results in "dst". Elements in "dst" are zeroed out when the corresponding element in "b" is zero. FOR j := 0 to 3 i := j*32 IF b[i+31:i] < 0 dst[i+31:i] := -(a[i+31:i]) ELSE IF b[i+31:i] == 0 dst[i+31:i] := 0 ELSE dst[i+31:i] := a[i+31:i] FI ENDFOR
Integer SSSE3 Arithmetic Negate packed 8-bit integers in "a" when the corresponding signed 8-bit integer in "b" is negative, and store the results in "dst". Elements in "dst" are zeroed out when the corresponding element in "b" is zero. FOR j := 0 to 7 i := j*8 IF b[i+7:i] < 0 dst[i+7:i] := -(a[i+7:i]) ELSE IF b[i+7:i] == 0 dst[i+7:i] := 0 ELSE dst[i+7:i] := a[i+7:i] FI ENDFOR
Integer SSSE3 Arithmetic Negate packed 16-bit integers in "a" when the corresponding signed 16-bit integer in "b" is negative, and store the results in "dst". Elements in "dst" are zeroed out when the corresponding element in "b" is zero. FOR j := 0 to 3 i := j*16 IF b[i+15:i] < 0 dst[i+15:i] := -(a[i+15:i]) ELSE IF b[i+15:i] == 0 dst[i+15:i] := 0 ELSE dst[i+15:i] := a[i+15:i] FI ENDFOR
Integer SSSE3 Arithmetic Negate packed 32-bit integers in "a" when the corresponding signed 32-bit integer in "b" is negative, and store the results in "dst". Elements in "dst" are zeroed out when the corresponding element in "b" is zero. FOR j := 0 to 1 i := j*32 IF b[i+31:i] < 0 dst[i+31:i] := -(a[i+31:i]) ELSE IF b[i+31:i] == 0 dst[i+31:i] := 0 ELSE dst[i+31:i] := a[i+31:i] FI ENDFOR
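The conditional-negate entries above apply the sign of each "b" element to the matching "a" element. A width-generic sketch, assuming lanes are lists of signed Python integers (the names `wrap_signed` and `sign_lanes` are hypothetical):

```python
def wrap_signed(x, bits):
    """Wrap an integer to the signed range of the given bit width."""
    x &= (1 << bits) - 1
    return x - (1 << bits) if x >> (bits - 1) else x

def sign_lanes(a, b, bits):
    """Negate a where b < 0, zero where b == 0, pass through where b > 0."""
    out = []
    for x, y in zip(a, b):
        if y < 0:
            r = -x
        elif y == 0:
            r = 0
        else:
            r = x
        out.append(wrap_signed(r, bits))
    return out
```

Negating the most negative value wraps: with 8-bit lanes, `-(-128)` stays `-128`.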
TSC General Support Copy the current 64-bit value of the processor's time-stamp counter into "dst". dst[63:0] := TimeStampCounter
TSXLDTRK Miscellaneous Mark the start of a TSX (HLE/RTM) suspend load address tracking region. If this is used inside a transactional region, subsequent loads are not added to the read set of the transaction. If this is used inside a suspend load address tracking region it will cause transaction abort. If this is used outside of a transactional region it behaves like a NOP.
TSXLDTRK Miscellaneous Mark the end of a TSX (HLE/RTM) suspend load address tracking region. If this is used inside a suspend load address tracking region it will end the suspend region and all following load addresses will be added to the transaction read set. If this is used inside an active transaction but not in a suspend region it will cause transaction abort. If this is used outside of a transactional region it behaves like a NOP.
Integer AVX512VL VAES Cryptography Perform the last round of an AES encryption flow on data (state) in "a" using the round key in "RoundKey", and store the results in "dst". FOR j := 0 to 1 i := j*128 a[i+127:i] := ShiftRows(a[i+127:i]) a[i+127:i] := SubBytes(a[i+127:i]) dst[i+127:i] := a[i+127:i] XOR RoundKey[i+127:i] ENDFOR dst[MAX:256] := 0
Integer AVX512VL VAES Cryptography Perform one round of an AES encryption flow on data (state) in "a" using the round key in "RoundKey", and store the results in "dst". FOR j := 0 to 1 i := j*128 a[i+127:i] := ShiftRows(a[i+127:i]) a[i+127:i] := SubBytes(a[i+127:i]) a[i+127:i] := MixColumns(a[i+127:i]) dst[i+127:i] := a[i+127:i] XOR RoundKey[i+127:i] ENDFOR dst[MAX:256] := 0
Integer AVX512VL VAES Cryptography Perform the last round of an AES decryption flow on data (state) in "a" using the round key in "RoundKey", and store the results in "dst". FOR j := 0 to 1 i := j*128 a[i+127:i] := InvShiftRows(a[i+127:i]) a[i+127:i] := InvSubBytes(a[i+127:i]) dst[i+127:i] := a[i+127:i] XOR RoundKey[i+127:i] ENDFOR dst[MAX:256] := 0
Integer AVX512VL VAES Cryptography Perform one round of an AES decryption flow on data (state) in "a" using the round key in "RoundKey", and store the results in "dst". FOR j := 0 to 1 i := j*128 a[i+127:i] := InvShiftRows(a[i+127:i]) a[i+127:i] := InvSubBytes(a[i+127:i]) a[i+127:i] := InvMixColumns(a[i+127:i]) dst[i+127:i] := a[i+127:i] XOR RoundKey[i+127:i] ENDFOR dst[MAX:256] := 0
Integer VPCLMULQDQ Application-Targeted Carry-less multiplication of one quadword of 'b' by one quadword of 'c' in each of the four 128-bit lanes, storing each 128-bit result in the corresponding lane of 'dst'. The immediate 'Imm8' is used to determine which quadwords of 'b' and 'c' should be used. DEFINE PCLMUL128(X,Y) { FOR i := 0 to 63 TMP[i] := X[ 0 ] and Y[ i ] FOR j := 1 to i TMP[i] := TMP[i] xor (X[ j ] and Y[ i - j ]) ENDFOR DEST[ i ] := TMP[ i ] ENDFOR FOR i := 64 to 126 TMP[i] := 0 FOR j := i - 63 to 63 TMP[i] := TMP[i] xor (X[ j ] and Y[ i - j ]) ENDFOR DEST[ i ] := TMP[ i ] ENDFOR DEST[127] := 0 RETURN DEST // 128b vector } FOR i := 0 to 3 IF Imm8[0] == 0 TEMP1 := b.m128[i].qword[0] ELSE TEMP1 := b.m128[i].qword[1] FI IF Imm8[4] == 0 TEMP2 := c.m128[i].qword[0] ELSE TEMP2 := c.m128[i].qword[1] FI dst.m128[i] := PCLMUL128(TEMP1, TEMP2) ENDFOR dst[MAX:512] := 0
Integer AVX512VL VPCLMULQDQ Application-Targeted Carry-less multiplication of one quadword of 'b' by one quadword of 'c' in each of the two 128-bit lanes, storing each 128-bit result in the corresponding lane of 'dst'. The immediate 'Imm8' is used to determine which quadwords of 'b' and 'c' should be used. DEFINE PCLMUL128(X,Y) { FOR i := 0 to 63 TMP[i] := X[ 0 ] and Y[ i ] FOR j := 1 to i TMP[i] := TMP[i] xor (X[ j ] and Y[ i - j ]) ENDFOR DEST[ i ] := TMP[ i ] ENDFOR FOR i := 64 to 126 TMP[i] := 0 FOR j := i - 63 to 63 TMP[i] := TMP[i] xor (X[ j ] and Y[ i - j ]) ENDFOR DEST[ i ] := TMP[ i ] ENDFOR DEST[127] := 0 RETURN DEST // 128b vector } FOR i := 0 to 1 IF Imm8[0] == 0 TEMP1 := b.m128[i].qword[0] ELSE TEMP1 := b.m128[i].qword[1] FI IF Imm8[4] == 0 TEMP2 := c.m128[i].qword[0] ELSE TEMP2 := c.m128[i].qword[1] FI dst.m128[i] := PCLMUL128(TEMP1, TEMP2) ENDFOR dst[MAX:256] := 0
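PCLMUL128 in the entries above is polynomial multiplication over GF(2): partial products are combined with XOR instead of addition, so no carries propagate. A shift-and-XOR sketch for a single 128-bit lane, assuming lanes are Python integers and using hypothetical names:

```python
MASK64 = (1 << 64) - 1

def clmul64(x, y):
    """Carry-less product of two 64-bit values; the result fits in 127 bits."""
    result = 0
    for i in range(64):
        if (y >> i) & 1:
            result ^= x << i  # XOR in the shifted partial product
    return result

def pclmul_lane(b_lane, c_lane, imm8):
    """Select one quadword from each 128-bit lane via imm8 bits 0 and 4, then multiply."""
    x = ((b_lane >> 64) & MASK64) if (imm8 & 0x01) else (b_lane & MASK64)
    y = ((c_lane >> 64) & MASK64) if (imm8 & 0x10) else (c_lane & MASK64)
    return clmul64(x, y)
```

For example, `clmul64(3, 3)` is 5 rather than 9, because (x + 1)(x + 1) = x^2 + 1 once the two middle terms cancel under XOR.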
Flag WAITPKG Miscellaneous Directs the processor to enter an implementation-dependent optimized state until the TSC reaches or exceeds the value specified in "counter". Bit 0 of "ctrl" selects between a lower power (cleared) or faster wakeup (set) optimized state. Returns the carry flag (CF).
Flag WAITPKG Miscellaneous Directs the processor to enter an implementation-dependent optimized state while monitoring a range of addresses. The instruction wakes up when the TSC reaches or exceeds the value specified in "counter" (if the monitoring hardware did not trigger beforehand). Bit 0 of "ctrl" selects between a lower power (cleared) or faster wakeup (set) optimized state. Returns the carry flag (CF).
WAITPKG Miscellaneous Sets up a linear address range to be monitored by hardware and activates the monitor. The address range should be a writeback memory caching type. The address is contained in "a".
WBNOINVD Miscellaneous Write back and do not flush internal caches. Initiate writing-back without flushing of external caches.
XSAVE OS-Targeted Copy up to 64 bits from the value of the extended control register (XCR) specified by "a" into "dst". Currently only XFEATURE_ENABLED_MASK XCR is supported. dst[63:0] := XCR[a]
XSAVE OS-Targeted Perform a full or partial restore of the enabled processor states using the state information stored in memory at "mem_addr". State is restored based on bits [62:0] in "rs_mask", "XCR0", and "mem_addr.HEADER.XSTATE_BV". "mem_addr" must be aligned on a 64-byte boundary. st_mask := mem_addr.HEADER.XSTATE_BV[62:0] FOR i := 0 to 62 IF (rs_mask[i] AND XCR0[i]) IF st_mask[i] CASE (i) OF 0: ProcessorState[x87_FPU] := mem_addr.FPUSSESave_Area[FPU] 1: ProcessorState[SSE] := mem_addr.FPUSSESaveArea[SSE] DEFAULT: ProcessorState[i] := mem_addr.Ext_Save_Area[i] ESAC ELSE // ProcessorExtendedState := Processor Supplied Values CASE (i) OF 1: MXCSR := mem_addr.FPUSSESave_Area[SSE] ESAC FI FI i := i + 1 ENDFOR
XSAVE OS-Targeted Perform a full or partial restore of the enabled processor states using the state information stored in memory at "mem_addr". State is restored based on bits [62:0] in "rs_mask", "XCR0", and "mem_addr.HEADER.XSTATE_BV". "mem_addr" must be aligned on a 64-byte boundary. st_mask := mem_addr.HEADER.XSTATE_BV[62:0] FOR i := 0 to 62 IF (rs_mask[i] AND XCR0[i]) IF st_mask[i] CASE (i) OF 0: ProcessorState[x87_FPU] := mem_addr.FPUSSESave_Area[FPU] 1: ProcessorState[SSE] := mem_addr.FPUSSESaveArea[SSE] DEFAULT: ProcessorState[i] := mem_addr.Ext_Save_Area[i] ESAC ELSE // ProcessorExtendedState := Processor Supplied Values CASE (i) OF 1: MXCSR := mem_addr.FPUSSESave_Area[SSE] ESAC FI FI i := i + 1 ENDFOR
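The restore pseudocode above sorts each state component into one of three outcomes: loaded from memory, reset to processor-supplied values, or left untouched. The sketch below models only that selection logic with plain integer bit masks. The name `plan_restore` is hypothetical, and nothing here touches real processor state.

```python
MASK63 = (1 << 63) - 1  # bits [62:0]

def plan_restore(rs_mask, xcr0, xstate_bv):
    """Return (components loaded from memory, components reset to initial values)."""
    requested = rs_mask & xcr0 & MASK63
    from_memory = []
    to_init = []
    for i in range(63):
        if not (requested >> i) & 1:
            continue  # not requested or not enabled in XCR0: left untouched
        if (xstate_bv >> i) & 1:
            from_memory.append(i)
        else:
            to_init.append(i)
    return from_memory, to_init
```

A component is restored from the save area only when its bit is set in all three of "rs_mask", "XCR0", and XSTATE_BV.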
XSAVE OS-Targeted Perform a full or partial save of the enabled processor states to memory at "mem_addr". State is saved based on bits [62:0] in "save_mask" and "XCR0". "mem_addr" must be aligned on a 64-byte boundary. mask[62:0] := save_mask[62:0] AND XCR0[62:0] FOR i := 0 to 62 IF mask[i] CASE (i) OF 0: mem_addr.FPUSSESave_Area[FPU] := ProcessorState[x87_FPU] 1: mem_addr.FPUSSESaveArea[SSE] := ProcessorState[SSE] DEFAULT: mem_addr.Ext_Save_Area[i] := ProcessorState[i] ESAC mem_addr.HEADER.XSTATE_BV[i] := INIT_FUNCTION[i] FI i := i + 1 ENDFOR
XSAVE OS-Targeted Perform a full or partial save of the enabled processor states to memory at "mem_addr". State is saved based on bits [62:0] in "save_mask" and "XCR0". "mem_addr" must be aligned on a 64-byte boundary. mask[62:0] := save_mask[62:0] AND XCR0[62:0] FOR i := 0 to 62 IF mask[i] CASE (i) OF 0: mem_addr.FPUSSESave_Area[FPU] := ProcessorState[x87_FPU] 1: mem_addr.FPUSSESaveArea[SSE] := ProcessorState[SSE] DEFAULT: mem_addr.Ext_Save_Area[i] := ProcessorState[i] ESAC mem_addr.HEADER.XSTATE_BV[i] := INIT_FUNCTION[i] FI i := i + 1 ENDFOR
XSAVE XSAVEOPT OS-Targeted Perform a full or partial save of the enabled processor states to memory at "mem_addr". State is saved based on bits [62:0] in "save_mask" and "XCR0". "mem_addr" must be aligned on a 64-byte boundary. The hardware may optimize the manner in which data is saved. The performance of this instruction will be equal to or better than using the XSAVE instruction. mask[62:0] := save_mask[62:0] AND XCR0[62:0] FOR i := 0 to 62 IF mask[i] CASE (i) OF 0: mem_addr.FPUSSESave_Area[FPU] := ProcessorState[x87_FPU] 1: mem_addr.FPUSSESaveArea[SSE] := ProcessorState[SSE] 2: mem_addr.EXT_SAVE_Area2[YMM] := ProcessorState[YMM] DEFAULT: mem_addr.Ext_Save_Area[i] := ProcessorState[i] ESAC mem_addr.HEADER.XSTATE_BV[i] := INIT_FUNCTION[i] FI i := i + 1 ENDFOR
XSAVE XSAVEOPT OS-Targeted Perform a full or partial save of the enabled processor states to memory at "mem_addr". State is saved based on bits [62:0] in "save_mask" and "XCR0". "mem_addr" must be aligned on a 64-byte boundary. The hardware may optimize the manner in which data is saved. The performance of this instruction will be equal to or better than using the XSAVE64 instruction. mask[62:0] := save_mask[62:0] AND XCR0[62:0] FOR i := 0 to 62 IF mask[i] CASE (i) OF 0: mem_addr.FPUSSESave_Area[FPU] := ProcessorState[x87_FPU] 1: mem_addr.FPUSSESaveArea[SSE] := ProcessorState[SSE] 2: mem_addr.EXT_SAVE_Area2[YMM] := ProcessorState[YMM] DEFAULT: mem_addr.Ext_Save_Area[i] := ProcessorState[i] ESAC mem_addr.HEADER.XSTATE_BV[i] := INIT_FUNCTION[i] FI i := i + 1 ENDFOR
XSAVE OS-Targeted Copy 64 bits from "val" to the extended control register (XCR) specified by "a". Currently only XFEATURE_ENABLED_MASK XCR is supported. XCR[a] := val[63:0]
XSAVE XSAVEC OS-Targeted Perform a full or partial save of the enabled processor states to memory at "mem_addr"; xsavec differs from xsave in that it uses compaction and that it may use init optimization. State is saved based on bits [62:0] in "save_mask" and "XCR0". "mem_addr" must be aligned on a 64-byte boundary. mask[62:0] := save_mask[62:0] AND XCR0[62:0] FOR i := 0 to 62 IF mask[i] CASE (i) OF 0: mem_addr.FPUSSESave_Area[FPU] := ProcessorState[x87_FPU] 1: mem_addr.FPUSSESaveArea[SSE] := ProcessorState[SSE] DEFAULT: mem_addr.Ext_Save_Area[i] := ProcessorState[i] ESAC mem_addr.HEADER.XSTATE_BV[i] := INIT_FUNCTION[i] FI i := i + 1 ENDFOR
XSAVE XSS OS-Targeted Perform a full or partial save of the enabled processor states to memory at "mem_addr"; xsaves differs from xsave in that it can save state components corresponding to bits set in IA32_XSS MSR and that it may use the modified optimization. State is saved based on bits [62:0] in "save_mask" and "XCR0". "mem_addr" must be aligned on a 64-byte boundary. mask[62:0] := save_mask[62:0] AND XCR0[62:0] FOR i := 0 to 62 IF mask[i] CASE (i) OF 0: mem_addr.FPUSSESave_Area[FPU] := ProcessorState[x87_FPU] 1: mem_addr.FPUSSESaveArea[SSE] := ProcessorState[SSE] DEFAULT: mem_addr.Ext_Save_Area[i] := ProcessorState[i] ESAC mem_addr.HEADER.XSTATE_BV[i] := INIT_FUNCTION[i] FI i := i + 1 ENDFOR
XSAVE XSAVEC OS-Targeted Perform a full or partial save of the enabled processor states to memory at "mem_addr"; xsavec differs from xsave in that it uses compaction and that it may use init optimization. State is saved based on bits [62:0] in "save_mask" and "XCR0". "mem_addr" must be aligned on a 64-byte boundary. mask[62:0] := save_mask[62:0] AND XCR0[62:0] FOR i := 0 to 62 IF mask[i] CASE (i) OF 0: mem_addr.FPUSSESave_Area[FPU] := ProcessorState[x87_FPU] 1: mem_addr.FPUSSESaveArea[SSE] := ProcessorState[SSE] DEFAULT: mem_addr.Ext_Save_Area[i] := ProcessorState[i] ESAC mem_addr.HEADER.XSTATE_BV[i] := INIT_FUNCTION[i] FI i := i + 1 ENDFOR
XSAVE XSS OS-Targeted Perform a full or partial save of the enabled processor states to memory at "mem_addr"; xsaves differs from xsave in that it can save state components corresponding to bits set in IA32_XSS MSR and that it may use the modified optimization. State is saved based on bits [62:0] in "save_mask" and "XCR0". "mem_addr" must be aligned on a 64-byte boundary. mask[62:0] := save_mask[62:0] AND XCR0[62:0] FOR i := 0 to 62 IF mask[i] CASE (i) OF 0: mem_addr.FPUSSESave_Area[FPU] := ProcessorState[x87_FPU] 1: mem_addr.FPUSSESaveArea[SSE] := ProcessorState[SSE] DEFAULT: mem_addr.Ext_Save_Area[i] := ProcessorState[i] ESAC mem_addr.HEADER.XSTATE_BV[i] := INIT_FUNCTION[i] FI i := i + 1 ENDFOR
XSAVE XSS OS-Targeted Perform a full or partial restore of the enabled processor states using the state information stored in memory at "mem_addr". xrstors differs from xrstor in that it can restore state components corresponding to bits set in the IA32_XSS MSR; xrstors cannot restore from an xsave area in which the extended region is in the standard form. State is restored based on bits [62:0] in "rs_mask", "XCR0", and "mem_addr.HEADER.XSTATE_BV". "mem_addr" must be aligned on a 64-byte boundary. st_mask := mem_addr.HEADER.XSTATE_BV[62:0] FOR i := 0 to 62 IF (rs_mask[i] AND XCR0[i]) IF st_mask[i] CASE (i) OF 0: ProcessorState[x87_FPU] := mem_addr.FPUSSESave_Area[FPU] 1: ProcessorState[SSE] := mem_addr.FPUSSESaveArea[SSE] DEFAULT: ProcessorState[i] := mem_addr.Ext_Save_Area[i] ESAC ELSE // ProcessorExtendedState := Processor Supplied Values CASE (i) OF 1: MXCSR := mem_addr.FPUSSESave_Area[SSE] ESAC FI FI i := i + 1 ENDFOR
XSAVE XSS OS-Targeted Perform a full or partial restore of the enabled processor states using the state information stored in memory at "mem_addr". xrstors differs from xrstor in that it can restore state components corresponding to bits set in the IA32_XSS MSR; xrstors cannot restore from an xsave area in which the extended region is in the standard form. State is restored based on bits [62:0] in "rs_mask", "XCR0", and "mem_addr.HEADER.XSTATE_BV". "mem_addr" must be aligned on a 64-byte boundary. st_mask := mem_addr.HEADER.XSTATE_BV[62:0] FOR i := 0 to 62 IF (rs_mask[i] AND XCR0[i]) IF st_mask[i] CASE (i) OF 0: ProcessorState[x87_FPU] := mem_addr.FPUSSESave_Area[FPU] 1: ProcessorState[SSE] := mem_addr.FPUSSESaveArea[SSE] DEFAULT: ProcessorState[i] := mem_addr.Ext_Save_Area[i] ESAC ELSE // ProcessorExtendedState := Processor Supplied Values CASE (i) OF 1: MXCSR := mem_addr.FPUSSESave_Area[SSE] ESAC FI FI i := i + 1 ENDFOR