Intel ARCHITECTURE IA-32 - Usage Notes on Bus Activities

Using Performance Monitoring Events B

Usage Notes on Bus Activities

A number of performance metrics in Table B-1 are based on IOQ_active_entries and BSQ_active_entries. The following paragraphs provide information on the various bus transaction underway metrics. These metrics nominally measure the end-to-end latency of transactions entering the BSQ, i.e., the aggregate sum of the allocation-to-deallocation durations of the BSQ entries used for all individual transactions in the processor. They can be divided by the corresponding number-of-transactions metrics (i.e., those that measure allocations) to approximate an average latency per transaction. However, that approximation can be significantly higher than the number of cycles it takes to get the first chunk of data for the demand fetch (e.g., a load), because the entire transaction must complete before the entry is deallocated.
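
To illustrate why dividing an underway total by an allocation count yields an average latency, the following toy model represents BSQ entries as hypothetical (allocation cycle, deallocation cycle) pairs with made-up values. It assumes an entry counts as active on every cycle c with alloc <= c < dealloc. Summing the number of active entries on every cycle, as an occupancy-style counter would, gives the same total as summing each entry's lifetime; dividing by the number of allocations then gives the average. This is a sketch of the arithmetic only, not of how the hardware counters are programmed or read.

```python
# Hypothetical BSQ entries as (alloc_cycle, dealloc_cycle); values are made up.
transactions = [(0, 120), (10, 150), (40, 90), (200, 330)]


def underway_total(txns):
    """Add up the number of active entries on every cycle, as an
    occupancy-style counter would. Assumes an entry is active on
    cycles c with alloc <= c < dealloc."""
    first = min(alloc for alloc, _ in txns)
    last = max(dealloc for _, dealloc in txns)
    total = 0
    for cycle in range(first, last):
        total += sum(1 for alloc, dealloc in txns if alloc <= cycle < dealloc)
    return total


occupancy = underway_total(transactions)
allocations = len(transactions)  # what a number-of-transactions metric counts

# Per-cycle occupancy summed over time equals the sum of entry lifetimes.
assert occupancy == sum(dealloc - alloc for alloc, dealloc in transactions)

avg_latency = occupancy / allocations
print(f"average BSQ latency ~ {avg_latency:.1f} cycles")  # average BSQ latency ~ 110.0 cycles
```

Note that each lifetime here runs to deallocation, so the average includes everything that happens before the entry is freed, not just the arrival of the first data chunk.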

This end-to-end latency includes deallocation overhead and the time taken to fetch the other half of the 128-byte line, a fetch known as an adjacent-sector prefetch. Since adjacent-sector prefetches have lower priority than demand fetches, there is a high probability on a heavily utilized system that the adjacent-sector prefetch will have to wait until the next bus arbitration cycle from that processor. Note also that on current implementations, the granularities at which BSQ_allocation and BSQ_active_entries count can differ, leading to a possible twofold overcounting of latencies for non-partial programmatic loads.
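
To make the adjacent-sector prefetch concrete, this sketch computes, for a demand address, the 64-byte half that the demand fetch targets and the other half of the same 128-byte line that the adjacent-sector prefetch brings in. It assumes the two halves are aligned 64-byte sectors of an aligned 128-byte line; the function name `sectors_for` is hypothetical and not part of any Intel tool.

```python
LINE_SIZE = 128               # 2nd-level cache line size named in the text (bytes)
SECTOR_SIZE = LINE_SIZE // 2  # assumed: each half ("sector") is 64 bytes and aligned


def sectors_for(addr):
    """Return (demand_sector, adjacent_sector) base addresses for a byte address.

    The demand fetch targets the 64-byte half containing addr; the
    adjacent-sector prefetch fetches the other half of the same 128-byte line.
    """
    demand = addr & ~(SECTOR_SIZE - 1)  # align down to the 64-byte half
    adjacent = demand ^ SECTOR_SIZE     # flip bit 6 to select the other half
    return demand, adjacent


demand, adjacent = sectors_for(0x1008)
print(hex(demand), hex(adjacent))  # 0x1000 0x1040
```

Because the BSQ entry is held until both halves have arrived, any delay in fetching the adjacent sector (for example, waiting for the next bus arbitration cycle) is added to the measured latency.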

Users of the bus transaction underway metrics would be best served by employing them for relative comparisons across BSQ latencies of all transactions. Users who want to do cycle-by-cycle or type-by-type analysis should be aware that the underlying event is known to be inaccurate for the “UC Reads Chunk Underway” and “Write WC partial underway” metrics. Relative changes in the average of all BSQ latencies should be viewed as an indication that overall memory performance has changed. That change may or may not be reflected in the measured FSB latencies.
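
A minimal sketch of the recommended usage, assuming you already have an all-transaction underway total and an allocation count from two runs of the same workload: compute each run's average and compare the two relatively, rather than reading either value as an absolute FSB latency. The counter totals are made up, and the helper names (`average_latency`, `relative_change`, `usable_for_type_analysis`) are hypothetical. The two excluded metric names are the ones identified above as inaccurate.

```python
# Per-type underway metrics the text reports as inaccurate for type-by-type analysis.
KNOWN_INACCURATE = {"UC Reads Chunk Underway", "Write WC partial underway"}


def average_latency(underway_total, allocations):
    """Approximate average BSQ latency: underway (occupancy) total / allocations."""
    if allocations == 0:
        raise ValueError("no BSQ allocations recorded")
    return underway_total / allocations


def relative_change(baseline, candidate):
    """Fractional change of candidate relative to baseline (negative = lower latency)."""
    return (candidate - baseline) / baseline


def usable_for_type_analysis(metric_name):
    """Return False for per-type metrics that should not be trusted."""
    return metric_name not in KNOWN_INACCURATE


# Hypothetical all-transaction counter totals from two runs (made-up numbers).
base = average_latency(underway_total=4_000_000, allocations=20_000)  # 200.0 cycles
new = average_latency(underway_total=3_600_000, allocations=20_000)   # 180.0 cycles
print(f"average BSQ latency change: {relative_change(base, new):+.1%}")  # ... -10.0%
```

A drop like this suggests that overall memory performance improved; as noted above, the improvement need not appear in measured FSB latencies.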

Also note that for Pentium 4 and Intel Xeon processor implementations with an integrated 3rd-level cache, BSQ entries are allocated for all 2nd-level writebacks (replaced lines), not just those that become bus
