
Intel ARCHITECTURE IA-32 User Manual

IA-32 Intel® Architecture Optimization
String move/store instructions have multiple data granularities. For
efficient data movement, larger data granularities are preferable. This
means better efficiency can be achieved by decomposing an arbitrary
count value into a number of doubleword moves plus single-byte moves
with a count value less than or equal to 3.
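The decomposition described above can be sketched in C. This is a minimal illustration, not the manual's own code: the names move_plan, split_count, and copy_dwords_then_bytes are hypothetical, and the 4-byte memcpy() call only stands in for one doubleword string move.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper type: how many doubleword moves and how many
   trailing single-byte moves a given byte count decomposes into. */
typedef struct {
    size_t dwords;      /* number of 4-byte moves            */
    size_t tail_bytes;  /* remaining 1-byte moves, always <= 3 */
} move_plan;

/* Split an arbitrary byte count into doublewords plus 0..3 bytes. */
static move_plan split_count(size_t count)
{
    move_plan p;
    p.dwords = count >> 2;     /* count / 4 */
    p.tail_bytes = count & 3;  /* count % 4 */
    return p;
}

/* Copy using the plan: doublewords first, then the short byte tail.
   Assumes the source and destination buffers do not overlap. */
static void *copy_dwords_then_bytes(void *dst, const void *src, size_t count)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    move_plan p = split_count(count);

    for (size_t i = 0; i < p.dwords; i++, d += 4, s += 4)
        memcpy(d, s, 4);       /* stand-in for one doubleword string move */

    for (size_t i = 0; i < p.tail_bytes; i++)
        *d++ = *s++;           /* stand-in for one byte string move */

    return dst;
}
```

For a count of 11, for example, split_count() yields two doubleword moves followed by three byte moves.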
Because software can use SIMD data movement instructions to move 16
bytes at a time, the following paragraphs discuss general guidelines for
designing and implementing high-performance library functions such as
memcpy(), memset(), and memmove(). There are four factors to be
considered:
Throughput per iteration:
If two pieces of code have approximately identical path lengths,
efficiency favors choosing the instruction that moves larger pieces
of data per iteration. Also, smaller code size per iteration will in
general reduce overhead and improve throughput. Sometimes, this
may involve comparing the relative overhead of an iterative
loop structure with that of using the REP prefix for iteration.
Address alignment:
Data movement instructions with the highest throughput usually have
alignment restrictions, or they operate more efficiently if the
destination address is aligned to its natural data size. Specifically,
16-byte moves need to ensure the destination address is aligned to
16-byte boundaries, and 8-byte moves perform better if the destination
address is aligned to 8-byte boundaries. Frequently, moving at
doubleword granularity performs better with addresses that are 8-byte
aligned.
REP string move vs. SIMD move:
Implementing general-purpose memory functions using SIMD
extensions usually requires adding some prolog code to ensure the
availability of SIMD instructions at runtime. A throughput
comparison must also account for the overhead of this prolog when
weighing a REP string implementation against a SIMD approach.
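The alignment and runtime-check considerations above can be sketched together in C. This is an illustrative sketch under stated assumptions, not the manual's implementation. The names head_bytes, have_sse2, copy_wide, copy_narrow, and select_copy are hypothetical. The feature check relies on the GCC/Clang builtin __builtin_cpu_supports() and falls back to reporting "not available" on other compilers or targets. The 16-byte memcpy() calls only stand in for 16-byte SIMD moves to an aligned destination.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bytes to move before dst reaches the next `align`-byte boundary.
   `align` must be a power of two; returns 0 if dst is already aligned. */
static size_t head_bytes(const void *dst, size_t align)
{
    return (size_t)(-(uintptr_t)dst & (align - 1));
}

/* Prolog: runtime check for SIMD support (assumption: GCC/Clang on x86). */
static int have_sse2(void)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("sse2") != 0;
#else
    return 0;  /* unknown toolchain or target: take the narrow path */
#endif
}

/* Wide path: short byte head to align dst, 16-byte chunks, short tail.
   Assumes the buffers do not overlap. */
static void *copy_wide(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t head = head_bytes(d, 16);

    if (head > n)
        head = n;
    memcpy(d, s, head);
    d += head; s += head; n -= head;

    for (; n >= 16; n -= 16, d += 16, s += 16)
        memcpy(d, s, 16);  /* stand-in for one aligned 16-byte store */

    memcpy(d, s, n);       /* fewer than 16 bytes remain */
    return dst;
}

/* Narrow path: plain byte loop, used when the check above fails. */
static void *copy_narrow(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
    return dst;
}

typedef void *(*copy_fn)(void *, const void *, size_t);

/* Pay the prolog cost once and reuse the selected routine afterwards. */
static copy_fn select_copy(void)
{
    return have_sse2() ? copy_wide : copy_narrow;
}
```

Calling select_copy() once and caching the returned pointer keeps the feature check out of the per-call path, which matters most for short copies where the prolog would otherwise dominate.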


