Intel ARCHITECTURE IA-32 User Manual

Optimizing for SIMD Floating-point Applications 5
Now consider the case when the data is organized as SoA. Example 5-2
demonstrates how 4 results are computed with 5 instructions.
To make the most efficient use of the four-component-wide registers,
reorganize the data into SoA format. Doing so yields increased throughput
and hence much better performance for the instructions used.
As this simple example shows, vertical computation makes 100% use of
the available SIMD registers and produces 4 results. (The results may
vary based on the application.) If the data structures must be in a
format that is not “friendly” to vertical computation, the data can be
rearranged “on the fly” to achieve full utilization of the SIMD registers.
This operation is referred to as “swizzling,” and the reverse operation
is referred to as “deswizzling.”
Data Swizzling
Swizzling data from one format to another may be required in many
algorithms when the available instruction set extension is limited (e.g.,
only SSE is available). An example of this is AoS format, where the
vertices come as adjacent xyz coordinates. Rearranging them into SoA
format (xxxx, yyyy, zzzz) allows more efficient SIMD computations.
For efficient data shuffling and swizzling, use the following instructions:
• movlps, movhps: load/store and move data in the low and high halves
  of the registers
• shufps, unpckhps, and unpcklps: shuffle and unpack data
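As a rough illustration of how these instructions can combine, the sketch below swizzles four vertices from AoS to SoA using C intrinsics: _mm_loadl_pi and _mm_loadh_pi (which correspond to movlps and movhps) and _mm_shuffle_ps (shufps). It assumes each vertex is padded to four floats (x, y, z, w), which the text above does not state, and the function name swizzle_xyzw_to_soa is hypothetical.

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Hypothetical helper: converts 4 vertices stored as xyzw xyzw xyzw xyzw
   (16 floats, assumed layout) into four arrays xxxx, yyyy, zzzz, wwww.
   No alignment is assumed: 64-bit half loads and unaligned stores are used. */
void swizzle_xyzw_to_soa(const float *aos,
                         float *xs, float *ys, float *zs, float *ws)
{
    __m128 zero = _mm_setzero_ps();

    /* movlps/movhps: gather the (x, y) pairs of two vertices per register */
    __m128 xy01 = _mm_loadl_pi(zero, (const __m64 *)(aos + 0));   /* -- -- y0 x0 */
    xy01 = _mm_loadh_pi(xy01, (const __m64 *)(aos + 4));          /* y1 x1 y0 x0 */
    __m128 xy23 = _mm_loadl_pi(zero, (const __m64 *)(aos + 8));   /* -- -- y2 x2 */
    xy23 = _mm_loadh_pi(xy23, (const __m64 *)(aos + 12));         /* y3 x3 y2 x2 */

    /* Same again for the (z, w) pairs */
    __m128 zw01 = _mm_loadl_pi(zero, (const __m64 *)(aos + 2));   /* -- -- w0 z0 */
    zw01 = _mm_loadh_pi(zw01, (const __m64 *)(aos + 6));          /* w1 z1 w0 z0 */
    __m128 zw23 = _mm_loadl_pi(zero, (const __m64 *)(aos + 10));  /* -- -- w2 z2 */
    zw23 = _mm_loadh_pi(zw23, (const __m64 *)(aos + 14));         /* w3 z3 w2 z2 */

    /* shufps: pick even elements (x or z) or odd elements (y or w) */
    _mm_storeu_ps(xs, _mm_shuffle_ps(xy01, xy23, _MM_SHUFFLE(2, 0, 2, 0)));
    _mm_storeu_ps(ys, _mm_shuffle_ps(xy01, xy23, _MM_SHUFFLE(3, 1, 3, 1)));
    _mm_storeu_ps(zs, _mm_shuffle_ps(zw01, zw23, _MM_SHUFFLE(2, 0, 2, 0)));
    _mm_storeu_ps(ws, _mm_shuffle_ps(zw01, zw23, _MM_SHUFFLE(3, 1, 3, 1)));
}
```

The half-register loads are used here so that each shuffle only has to choose between two interleaved components. Packed xyz data without a fourth component would need a different load pattern.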
Example 5-2 Pseudocode for Vertical (xxxx, yyyy, zzzz, SoA) Computation
mulps ; x*x' for all 4 x-components of 4 vertices
mulps ; y*y' for all 4 y-components of 4 vertices
mulps ; z*z' for all 4 z-components of 4 vertices
addps ; x*x' + y*y'
addps ; x*x'+y*y'+z*z'
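
The pseudocode in Example 5-2 leaves out operands and loads. One way it might look in C with SSE intrinsics is sketched below, where _mm_mul_ps and _mm_add_ps correspond to mulps and addps. The function name dot4_soa, the parameter names, and the use of unaligned loads and stores are assumptions made for illustration, not part of the original example.

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Hypothetical helper: computes four dot products at once from SoA inputs.
   x, y, z hold the components of 4 vertices; xp, yp, zp hold the
   corresponding x', y', z' values. Each pointer refers to 4 floats. */
void dot4_soa(const float *x,  const float *y,  const float *z,
              const float *xp, const float *yp, const float *zp,
              float *out)
{
    __m128 xx = _mm_mul_ps(_mm_loadu_ps(x), _mm_loadu_ps(xp)); /* x*x' for 4 vertices */
    __m128 yy = _mm_mul_ps(_mm_loadu_ps(y), _mm_loadu_ps(yp)); /* y*y' for 4 vertices */
    __m128 zz = _mm_mul_ps(_mm_loadu_ps(z), _mm_loadu_ps(zp)); /* z*z' for 4 vertices */
    __m128 sum = _mm_add_ps(xx, yy);                           /* x*x' + y*y' */
    sum = _mm_add_ps(sum, zz);                                 /* x*x' + y*y' + z*z' */
    _mm_storeu_ps(out, sum);                                   /* 4 results */
}
```

All four lanes of every register carry useful data, so the five arithmetic instructions produce four complete results.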

Intel ARCHITECTURE IA-32 Specifications

General
Instruction Set: x86
Instruction Set Type: CISC
Memory Segmentation: Supported
Operating Modes: Real mode, Protected mode, Virtual 8086 mode
Max Physical Address Size: 36 bits (with PAE)
Max Virtual Address Size: 32 bits
Architecture: IA-32 (Intel Architecture 32-bit)
Addressable Memory: 4 GB (with Physical Address Extension up to 64 GB)
Floating Point Registers: 8 x 80-bit
MMX Registers: 8 x 64-bit
SSE Registers: 8 x 128-bit
Registers: General-purpose registers (EAX, EBX, ECX, EDX, ESI, EDI, ESP, EBP), Segment registers (CS, DS, SS, ES, FS, GS), Instruction pointer (EIP), Flags register (EFLAGS)
Floating Point Unit: Yes (x87)