
Intel ARCHITECTURE IA-32 User Manual

IA-32 Intel® Architecture Optimization
Minimizing Bus Latency
The system bus on Intel Xeon and Pentium 4 processors provides up to
6.4 GB/sec of throughput at a 200 MHz scalable bus clock rate. (See the
MSR_EBC_FREQUENCY_ID register.) The peak bus bandwidth is higher still
at higher bus clock rates.
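The 6.4 GB/sec figure can be reproduced from the bus parameters. The sketch
below assumes a quad-pumped data bus (four transfers per bus clock) that is
64 bits (8 bytes) wide; those two parameters are not stated in the text above,
and the function name peak_bus_bandwidth is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Peak bus bandwidth in bytes per second.
 * Assumption (not from the text above): the caller supplies the number of
 * data transfers per bus clock and the data bus width in bytes. */
uint64_t peak_bus_bandwidth(uint64_t bus_clock_hz,
                            unsigned transfers_per_clock,
                            unsigned bus_width_bytes)
{
    assert(transfers_per_clock > 0 && bus_width_bytes > 0);
    return bus_clock_hz * transfers_per_clock * bus_width_bytes;
}
```

Under these assumptions, peak_bus_bandwidth(200000000, 4, 8) evaluates to
6,400,000,000 bytes/sec, that is, 6.4 GB/sec, and the result scales linearly
with the bus clock.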
Each bus transaction includes the overhead of making requests and
arbitrations. The average latency of bus read and bus write transactions
is longer when reads and writes alternate. Segmenting reads and writes
into phases can reduce the average latency of bus transactions, because
it reduces the number of successive transactions in which a read follows
a write or a write follows a read.
User/Source Coding Rule 7. (M impact, ML generality) If there is a blend of
reads and writes on the bus, changing the code to separate these bus
transactions into read phases and write phases can help performance.
Note, however, that the order of read and write operations on the bus is
not the same as the order in which they appear in the program.
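One way to picture the rule is a loop that streams through a source array and
a destination array. The sketch below is illustrative only: the function
names, the CHUNK size, and the staging buffer are assumptions, not taken from
the manual, and whether the phased form is faster depends on the platform and
should be measured.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical chunk size in elements (4 KB of floats); tune by measurement. */
enum { CHUNK = 1024 };

/* Interleaved form: every iteration reads src and writes dst, so read and
 * write traffic to memory tends to be mixed. */
void scale_interleaved(const float *src, float *dst, size_t n, float k)
{
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] * k;
}

/* Phased form: process one chunk at a time, first reading src into a small
 * staging buffer that is expected to stay in cache, then writing dst. */
void scale_phased(const float *src, float *dst, size_t n, float k)
{
    float tmp[CHUNK];
    assert(src != NULL && dst != NULL);
    for (size_t base = 0; base < n; base += CHUNK) {
        size_t len = (n - base < CHUNK) ? (n - base) : CHUNK;

        /* Read phase: memory traffic here is dominated by loads from src. */
        for (size_t i = 0; i < len; i++)
            tmp[i] = src[base + i] * k;

        /* Write phase: memory traffic here is dominated by stores to dst. */
        for (size_t i = 0; i < len; i++)
            dst[base + i] = tmp[i];
    }
}
```

Both functions compute the same result. As the note above says, the hardware
may still reorder the resulting bus transactions, so the phases in the source
are only a hint about the traffic pattern.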
The bus latency of fetching a cache line of data can vary as a function of
the access stride of data references. In general, bus latency increases
as the stride between successive cache misses increases. Independently,
bus latency also increases with bus queue depth (the number of
outstanding bus requests of a given transaction type). The combination of
these two trends can be highly non-linear: in large-stride,
bandwidth-sensitive situations, the effective throughput of the bus system
for data-parallel accesses can be significantly lower than in
small-stride, bandwidth-sensitive situations.
To minimize the per-access cost of memory traffic, or to amortize raw
memory latency effectively, software should control its cache miss
pattern to favor a higher concentration of smaller-stride cache misses.
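A common case where software controls its miss stride is the traversal order
of a two-dimensional array. The following is a minimal sketch, assuming a
row-major matrix stored in one contiguous block; the function names are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Small-stride traversal: consecutive accesses are sizeof(int) bytes apart,
 * so successive cache misses fall on adjacent cache lines. */
long sum_small_stride(const int *m, size_t rows, size_t cols)
{
    long total = 0;
    assert(m != NULL);
    for (size_t r = 0; r < rows; r++)
        for (size_t c = 0; c < cols; c++)
            total += m[r * cols + c];
    return total;
}

/* Large-stride traversal: consecutive accesses are cols * sizeof(int) bytes
 * apart, so for wide rows nearly every access can touch a new cache line. */
long sum_large_stride(const int *m, size_t rows, size_t cols)
{
    long total = 0;
    assert(m != NULL);
    for (size_t c = 0; c < cols; c++)
        for (size_t r = 0; r < rows; r++)
            total += m[r * cols + c];
    return total;
}
```

The two functions return the same sum; only the order of memory accesses
differs. When the matrix is much larger than the cache, the first form matches
the small-stride miss pattern recommended above.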

Intel ARCHITECTURE IA-32 Specifications

General
Instruction Set: x86
Instruction Set Type: CISC
Memory Segmentation: Supported
Operating Modes: Real mode, Protected mode, Virtual 8086 mode
Max Physical Address Size: 36 bits (with PAE)
Max Virtual Address Size: 32 bits
Architecture: IA-32 (Intel Architecture 32-bit)
Addressable Memory: 4 GB (with Physical Address Extension up to 64 GB)
Floating Point Registers: 8 x 80-bit
MMX Registers: 8 x 64-bit
SSE Registers: 8 x 128-bit
Registers: General-purpose registers (EAX, EBX, ECX, EDX, ESI, EDI, ESP, EBP), Segment registers (CS, DS, SS, ES, FS, GS), Instruction pointer (EIP), Flags register (EFLAGS)
Floating Point Unit: Yes (x87)
