Intel ARCHITECTURE IA-32 - Minimizing Bus Latency


IA-32 Intel® Architecture Optimization


Minimizing Bus Latency

The system bus on Intel Xeon and Pentium 4 processors provides up to
6.4 GB/sec of throughput at a 200 MHz scalable bus clock rate. (See the
MSR_EBC_FREQUENCY_ID register.) The peak bus bandwidth is higher at
higher bus clock rates.
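
The 6.4 GB/sec figure can be reproduced with simple arithmetic. The sketch below assumes a quad-pumped bus (4 data transfers per bus clock) that is 64 bits (8 bytes) wide. Those two values are assumptions about the bus design and are not stated in the text above, and the function name is illustrative only.

```c
#include <stdint.h>

/* Peak bus bandwidth in bytes per second.
 * bus_clock_hz:        scalable bus clock rate (200 MHz in the text).
 * transfers_per_clock: data transfers per bus clock (assumed 4 here).
 * bytes_per_transfer:  data bus width in bytes (assumed 8, i.e. 64 bits). */
uint64_t peak_bus_bandwidth(uint64_t bus_clock_hz,
                            uint32_t transfers_per_clock,
                            uint32_t bytes_per_transfer)
{
    return bus_clock_hz * transfers_per_clock * bytes_per_transfer;
}
```

Under these assumptions, peak_bus_bandwidth(200000000, 4, 8) yields 6,400,000,000 bytes/sec, and the result scales linearly with the bus clock rate.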

Each bus transaction includes the overhead of making requests and
arbitrations. The average latency of bus read and bus write
transactions is longer if reads and writes alternate. Segmenting reads
and writes into phases can reduce the average latency of bus
transactions, because it reduces the number of successive transactions
in which a read follows a write or a write follows a read.

User/Source Coding Rule 7. (M impact, ML generality) If there is a
blend of reads and writes on the bus, changing the code to separate
these bus transactions into read phases and write phases can help
performance. Note, however, that the order of read and write operations
on the bus is not the same as the order in which they appear in the
program.
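
One possible way to apply this rule at the source level is sketched below. The function names, the CHUNK size, and the scaling operation are hypothetical and chosen only for illustration. Because bus order differs from program order, and because a compiler may restructure these loops, any benefit should be confirmed by measurement.

```c
#include <stddef.h>

#define CHUNK 256  /* hypothetical phase length in elements; tune by measurement */

/* Interleaved version: each iteration reads src[i] and then writes dst[i],
 * so read and write traffic tends to alternate. */
void scale_interleaved(float *dst, const float *src, size_t n, float k)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] * k;
}

/* Phased version: load and scale up to CHUNK elements into a small local
 * buffer (read phase), then store the buffered results (write phase). */
void scale_phased(float *dst, const float *src, size_t n, float k)
{
    float tmp[CHUNK];

    for (size_t base = 0; base < n; base += CHUNK) {
        size_t len = (n - base < CHUNK) ? (n - base) : CHUNK;

        for (size_t i = 0; i < len; i++)   /* read phase */
            tmp[i] = src[base + i] * k;

        for (size_t i = 0; i < len; i++)   /* write phase */
            dst[base + i] = tmp[i];
    }
}
```

Both functions produce the same results. The phased version only changes the grouping of loads and stores, and the local buffer is kept small so that it is likely to stay in cache.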

The bus latency of fetching a cache line of data can vary as a function
of the access stride of data references. In general, bus latency
increases as the stride of successive cache misses increases.
Independently, bus latency also increases with bus queue depth (the
number of outstanding bus requests of a given transaction type). The
combination of these two trends can be highly non-linear: in
large-stride, bandwidth-sensitive situations, the effective throughput
of the bus system for data-parallel accesses can be significantly less
than in small-stride, bandwidth-sensitive situations.

To minimize the per-access cost of memory traffic, or to amortize raw
memory latency effectively, software should control its cache miss
pattern to favor a higher concentration of smaller-stride cache misses.
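
As a minimal illustration of favoring small-stride accesses, the sketch below sums the same row-major matrix in two loop orders. The function names and the flat-array layout are assumptions made for the example. The only difference between the two functions is the distance in memory between consecutive accesses.

```c
#include <stddef.h>

/* Large-stride traversal: consecutive accesses are cols * sizeof(int)
 * bytes apart, so successive accesses tend to fall in different
 * cache lines. */
long sum_large_stride(const int *m, size_t rows, size_t cols)
{
    long total = 0;
    for (size_t c = 0; c < cols; c++)
        for (size_t r = 0; r < rows; r++)
            total += m[r * cols + c];
    return total;
}

/* Small-stride traversal: consecutive accesses are sizeof(int) bytes
 * apart, so each fetched cache line is fully used before the next one
 * is needed. */
long sum_small_stride(const int *m, size_t rows, size_t cols)
{
    long total = 0;
    for (size_t r = 0; r < rows; r++)
        for (size_t c = 0; c < cols; c++)
            total += m[r * cols + c];
    return total;
}
```

Both functions return the same sum. For matrices much larger than the cache, the small-stride version is the access pattern the guidance above favors.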
