Intel ARCHITECTURE IA-32 User Manual

Appendix C: IA-32 Instruction Latency and Throughput
While several items on the above list involve selecting the right
instruction, this appendix focuses on the following issues. They are
listed in expected priority order, although which item contributes most
to performance varies by application.
Maximize the flow of μops into the execution core. IA-32
instructions that decode into more than four μops must be sequenced
from the microcode ROM. These longer μop flows delay the front end
and reduce the supply of μops to the execution core. In Pentium 4 and
Intel Xeon processors, transfers to the microcode ROM also often reduce
how efficiently μops can be packed into the trace cache. Where
possible, select instructions with four or fewer μops. For example, a
32-bit integer multiply with a memory operand fits in the trace cache
without going to microcode, while a 16-bit integer multiply to memory
does not.
Avoid resource conflicts. Interleaving instructions so that they do not
compete for the same port or execution unit can increase throughput.
For example, PADDQ and PMULUDQ each have a throughput of one issue
every two clock cycles. When interleaved, they can achieve an effective
throughput of one instruction per cycle because, although they use the
same port, they execute in different execution units. Selecting
instructions with fast throughput also helps preserve issue port
bandwidth, hide latency, and raise software performance.
Minimize the latency of dependency chains that are on the critical
path. For example, a shift left by two bits executes faster when encoded
as two adds than when encoded as a shift. If latency is not an issue,
the shift gives a more compact byte encoding.
In addition to the general and specific rules, the coding guidelines, and
the instruction data provided in this manual, you can take advantage of
the software performance analysis and tuning toolset available at
http://developer.intel.com/software/products/index.htm. The tools
include the VTune Performance Analyzer, with its performance-monitoring
capabilities.
