IA-32 Instruction Latency and Throughput C
PSUBB/PSUBW/PSUBD xmm, xmm 2 2 1 2 2 1 MMX_ALU
PSUBSB/PSUBSW/PSUBUSB/PSUBUSW xmm, xmm 2 2 1 2 2 1 MMX_ALU
PUNPCKHBW/PUNPCKHWD/PUNPCKHDQ xmm, xmm 4 4 1+1 2 2 2 MMX_SHFT
PUNPCKHQDQ xmm, xmm 4 4 1_1 2 2 2 MMX_SHFT
PUNPCKLBW/PUNPCKLWD/PUNPCKLDQ xmm, xmm 2 2 2 2 2 2 MMX_SHFT
PUNPCKLQDQ³ xmm, xmm 4 4 1 1 1 1 FP_MISC
PXOR xmm, xmm 2 2 1 2 2 1 MMX_ALU
See “Table Footnotes”
Table C-3 Streaming SIMD Extension 2 Double-precision Floating-point Instructions
Instruction Latency¹ Throughput Execution Unit²
CPUID 0F3n 0F2n 0x69n 0F3n 0F2n 0x69n 0F2n
ADDPD xmm, xmm 5 4 4 2 2 2 FP_ADD
ADDSD xmm, xmm 5 4 3 2 2 1 FP_ADD
ANDNPD³ xmm, xmm 4 4 1 2 2 1 MMX_ALU
ANDPD³ xmm, xmm 4 4 1 2 2 1 MMX_ALU
CMPPD xmm, xmm, imm8 5 4 4 2 2 2 FP_ADD
CMPSD xmm, xmm, imm8 5 4 3 2 2 1 FP_ADD
continued
Table C-2 Streaming SIMD Extension 2 128-bit Integer Instructions (continued)
Instruction Latency¹ Throughput Execution Unit²