Sun Microsystems UltraSPARC-I
Sun Microelectronics
UltraSPARC User’s Manual
PDU is somewhat separated from the rest of the pipeline, the I-Cache miss may have occurred while the pipeline was already stalled (for example, by a multicycle integer divide, a floating-point divide dependency, or a dependency on load data that missed the D-Cache). In that case, all or part of the miss latency may be hidden from the pipeline.
When an I-Cache miss is detected, normal instruction fetching is disabled and a request for the missing line is sent to the E-Cache. A full line of 8 instructions (32 bytes) is brought into the processor in two parts, because the interface to the E-Cache is 16 bytes wide. The critical part (that is, the 16 bytes containing the instruction that caused the miss) is brought in first. An I-Cache miss adds 5 cycles relative to an I-Cache hit, assuming there is no conflict in arbitrating for the E-Cache bus. If a predicted-taken branch is in the second 16-byte block brought into the I-Cache, there is a one-cycle delay before the next fetch (the time needed to compute the next address).
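The fill sequence above can be sketched as a small timing model. This is a minimal Python sketch, not a description of the hardware logic: the names `fill_order` and `miss_penalty` are hypothetical, the model assumes the E-Cache bus is uncontended, and it simply adds the one-cycle next-address delay on top of the 5-cycle miss penalty.

```python
from __future__ import annotations

LINE_BYTES = 32         # one I-Cache line: 8 instructions x 4 bytes
CHUNK_BYTES = 16        # width of the E-Cache interface
BASE_MISS_PENALTY = 5   # cycles added vs. an I-Cache hit, assuming no E-Cache bus conflict
NEXT_ADDR_DELAY = 1     # predicted-taken branch in the second-arriving 16-byte chunk

def fill_order(miss_addr: int) -> list[int]:
    """Addresses of the two 16-byte chunks of the missing line, critical chunk first."""
    line_base = miss_addr & ~(LINE_BYTES - 1)
    critical = miss_addr & ~(CHUNK_BYTES - 1)
    # The other chunk is the 16-byte half of the line that does not hold miss_addr.
    other = line_base + ((critical - line_base) ^ CHUNK_BYTES)
    return [critical, other]

def miss_penalty(taken_branch_in_second_chunk: bool) -> int:
    """Extra fetch cycles for one I-Cache miss under this simplified (assumed) model."""
    extra = NEXT_ADDR_DELAY if taken_branch_in_second_chunk else 0
    return BASE_MISS_PENALTY + extra

# A miss on the instruction at 0x1014 fetches 0x1010-0x101F first, then 0x1000-0x100F.
print([hex(a) for a in fill_order(0x1014)])
print(miss_penalty(False), miss_penalty(True))
```

Modeling the line as two aligned 16-byte halves makes the critical-chunk-first rule a simple mask-and-XOR; real arbitration and bus conflicts are deliberately left out.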
Because the processor can stall for 6 cycles when the pipeline is waiting for new instructions, it is desirable to try to make routines fit in the I-Cache and to avoid hot spots (collisions). UltraSPARC provides instrumentation for profiling a program and determining whether instruction accesses hit or miss in the cache. For example, the performance counters can be programmed to monitor I-Cache accesses and I-Cache misses. By checkpointing the counters before and after a large section of code, and combining the counts with a profile of that section, one can determine whether the frequently executed functions generally hit or miss the I-Cache. The same instrumentation can be used to determine whether a trap handler generally resides in the I-Cache or causes a cache miss.
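The checkpointing technique described above reduces to differencing two counter snapshots. The sketch below is a minimal illustration under stated assumptions: `Snapshot` and `region_miss_rate` are hypothetical names, each counter is assumed to be 32 bits wide and to wrap at most once between checkpoints, and reading the raw counter values is platform-specific and not shown.

```python
from dataclasses import dataclass

COUNTER_BITS = 32  # assumption: each counter is 32 bits wide and wraps around

@dataclass(frozen=True)
class Snapshot:
    """Hypothetical checkpoint of two performance counters."""
    ic_accesses: int  # counter programmed to count I-Cache accesses
    ic_misses: int    # counter programmed to count I-Cache misses

def _delta(before: int, after: int) -> int:
    """Counter increment between checkpoints, tolerating at most one wraparound."""
    return (after - before) % (1 << COUNTER_BITS)

def region_miss_rate(before: Snapshot, after: Snapshot) -> float:
    """I-Cache misses per access for the code run between two checkpoints."""
    accesses = _delta(before.ic_accesses, after.ic_accesses)
    misses = _delta(before.ic_misses, after.ic_misses)
    if accesses == 0:
        raise ValueError("no I-Cache accesses recorded between checkpoints")
    return misses / accesses

# Example: checkpoints taken around a hypothetical hot loop.
print(region_miss_rate(Snapshot(100, 10), Snapshot(1100, 60)))
```

A low per-region rate, for a region the profile shows to be hot, suggests that its frequently executed functions generally hit the I-Cache.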
16.2.4 Executing Code Out of the E-Cache
When frequently executed routines do not fit in the I-Cache, it is possible to organize the code so that the main routines reside in the much larger E-Cache without significantly affecting execution time. As an example, consider fpppp. Of the fourteen floating-point programs in SPECfp92, fpppp has the highest I-Cache miss rate: about 21% per cache access, or about 6.0% per instruction. For comparison, the next highest is doduc, with about 3% per cache access (1% per instruction). Even though fpppp's I-Cache miss rate is significant, UltraSPARC is barely affected by it: the misses add only 0.0084 to the CPI. The reasons it performs so well are:
• The code is organized as a large sequential block.
• Branches are predicted very well (over 90%).
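The fpppp figures can be cross-checked with a little arithmetic. Misses per instruction equal misses per access times accesses per instruction, so the two quoted rates imply roughly 0.29 I-Cache accesses per instruction (about 3.5 instructions per access); that interpretation is our inference, not a figure from the text. The sketch also assumes, for comparison only, that every miss would cost the full 5-cycle penalty given above if nothing were overlapped; the function names are hypothetical.

```python
MISS_PENALTY_CYCLES = 5  # cycles an I-Cache miss adds over a hit (no E-Cache bus conflict)

def accesses_per_instruction(miss_per_access: float, miss_per_instr: float) -> float:
    """misses/instr = misses/access * accesses/instr, so divide the two rates."""
    return miss_per_instr / miss_per_access

def naive_cpi_penalty(miss_per_instr: float, penalty: int = MISS_PENALTY_CYCLES) -> float:
    """CPI added if every miss stalled the pipeline for the full penalty (no overlap)."""
    return miss_per_instr * penalty

# fpppp figures from the text
apc = accesses_per_instruction(0.21, 0.060)  # about 0.29 accesses per instruction
naive = naive_cpi_penalty(0.060)             # about 0.30 CPI if no miss latency were hidden
observed = 0.0084                            # measured CPI impact quoted in the text
exposed_fraction = observed / naive          # about 0.028

print(f"accesses/instr = {apc:.3f}, naive CPI = {naive:.2f}, exposed = {exposed_fraction:.1%}")
```

Under these assumptions only about 3% of the naive stall time shows up in the CPI, which is consistent with most of the miss latency being overlapped with useful work.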