Example 3-4 Identification of SSE2 with cpuid ..................................................... 3-5
Example 3-5 Identification of SSE2 by the OS ..................................................... 3-6
Example 3-6 Identification of SSE3 with cpuid ..................................................... 3-7
Example 3-7 Identification of SSE3 by the OS ..................................................... 3-8
Example 3-8 Simple Four-Iteration Loop ............................................................ 3-14
Example 3-9 Streaming SIMD Extensions Using Inlined Assembly Encoding ... 3-15
Example 3-10 Simple Four-Iteration Loop Coded with Intrinsics.......................... 3-16
Example 3-11 C++ Code Using the Vector Classes ............................................ 3-18
Example 3-12 Automatic Vectorization for a Simple Loop.................................... 3-19
Example 3-13 C Algorithm for 64-bit Data Alignment........................................... 3-23
Example 3-14 AoS Data Structure ....................................................................... 3-27
Example 3-15 SoA Data Structure ....................................................................... 3-28
Example 3-16 AoS and SoA Code Samples ........................................................ 3-28
Example 3-17 Hybrid SoA Data Structure ............................................................ 3-30
Example 3-18 Pseudo-code Before Strip Mining ................................................. 3-32
Example 3-19 Strip Mined Code........................................................................... 3-33
Example 3-20 Loop Blocking................................................................................ 3-35
Example 3-21 Emulation of Conditional Moves .................................................... 3-37
Example 4-1 Resetting the Register between __m64 and FP Data Types .......... 4-5
Example 4-2 Unsigned Unpack Instructions......................................................... 4-7
Example 4-3 Signed Unpack Code ...................................................................... 4-8
Example 4-4 Interleaved Pack with Saturation ................................................... 4-10
Example 4-5 Interleaved Pack without Saturation .............................................. 4-11
Example 4-6 Unpacking Two Packed-word Sources in a Non-interleaved Way ... 4-13
Example 4-7 pextrw Instruction Code................................................................. 4-14
Example 4-8 pinsrw Instruction Code................................................................. 4-15
Example 4-9 Repeated pinsrw Instruction Code ................................................ 4-16
Example 4-10 pmovmskb Instruction Code.......................................................... 4-17
Example 4-11 pshuf Instruction Code .................................................................. 4-19
Example 4-12 Broadcast Using 2 Instructions...................................................... 4-19
Example 4-13 Swap Using 3 Instructions............................................................. 4-20
Example 4-14 Reverse Using 3 Instructions......................................................... 4-20
Example 4-15 Generating Constants ................................................................... 4-21
Example 4-16 Absolute Difference of Two Unsigned Numbers ............................ 4-23
Example 4-17 Absolute Difference of Signed Numbers ....................................... 4-24
Example 4-18 Computing Absolute Value ........................................................... 4-25
Example 4-19 Clipping to a Signed Range of Words [high, low] .......................... 4-27