Sun Microsystems UltraSPARC-I - Page 290

Sun Microelectronics
16. Code Generation Guidelines
If such a load (D-Cache miss, E-Cache hit) is immediately followed by a use, the
group is broken and an (N+1)-cycle stall occurs; Figure 16-12 illustrates this
situation. (The figure shows a 7-cycle stall, which is consistent with 1–1–1 mode;
2–2 mode incurs an 8-cycle stall.)
Figure 16-12 D-Cache Miss, E-Cache Hit (1–1–1 mode shown)
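A simplified way to express the stall arithmetic above. In this sketch s is the number of cycles from the load to its use and D is the smallest separation that avoids a stall. D is an assumed parameter of this model, not a quantity the manual defines. The text implies D = 8 in 1–1–1 mode. Instruction grouping effects are ignored.

```latex
\begin{aligned}
\mathrm{stall}(s) &= \max(0,\; D - s), \qquad D = 8 \ \text{(1--1--1 mode, assumed)} \\
\mathrm{stall}(1) &= 8 - 1 = 7 \\
\mathrm{stall}(8) &= \max(0,\; 8 - 8) = 0
\end{aligned}
```

The first case matches the 7-cycle stall shown in Figure 16-12. The second matches the no-stall case described for Figure 16-13.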
Because a D-Cache miss carries a high penalty for code scheduled on the assumption
that loads hit the D-Cache, UltraSPARC provides hardware support for non-blocking
loads: a load buffer that allows code to be scheduled for External Cache (E-Cache)
hits instead.
16.3.6 Scheduling for the E-Cache
Some applications have a working set that is too large to fit within the D-Cache
(they cause many capacity misses); others use data in patterns that generate
many conflict misses. Compilers can schedule these applications to “bypass” the
D-Cache and access the data out of the E-Cache.
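One common way for a compiler to schedule for the E-Cache is software pipelining. Each load is hoisted several iterations ahead of its use, so the load buffer absorbs the E-Cache latency. The Python sketch below is a hypothetical model, and the helpers pipelined_schedule and load_use_distances are illustrative names, not a real compiler API. It counts instruction slots and assumes one instruction per slot. The real pipeline groups several instructions per cycle, so slots are only a rough stand-in for cycles.

```python
def pipelined_schedule(n, lookahead):
    """Order loads and uses for n loop iterations so that the load for
    iteration i + lookahead is issued just before the use of iteration i."""
    slots = []
    # Prologue: issue the first `lookahead` loads before any use.
    for i in range(min(lookahead, n)):
        slots.append(("load", i))
    # Steady state, then epilogue once no loads remain to hoist.
    for i in range(n):
        if i + lookahead < n:
            slots.append(("load", i + lookahead))
        slots.append(("use", i))
    return slots


def load_use_distances(slots):
    """Map each iteration to the number of slots between its load and its use."""
    issued = {}
    distances = {}
    for slot, (kind, i) in enumerate(slots):
        if kind == "load":
            issued[i] = slot
        else:
            distances[i] = slot - issued[i]
    return distances


schedule = pipelined_schedule(n=20, lookahead=4)
distances = load_use_distances(schedule)
print(distances[10])            # 9: steady state gives 2 * lookahead + 1 slots
print(min(distances.values()))  # 5: shortest distances fall in prologue/epilogue
```

With a lookahead of 4, steady-state uses sit 9 slots after their loads. The first and last few iterations get shorter distances, and a compiler would typically fill those gaps with other independent work.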
Loads that miss the D-Cache (non-blocking loads) do not necessarily stall the
pipeline. Instead, they are placed in the load buffer, where they wait for their
data to return from the E-Cache. The pipeline stalls only if an instruction that
depends on the non-blocking load enters the pipeline before the load data has
returned.
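Non-blocking behavior can be modeled with a small in-order simulator. This sketch uses hypothetical simplifications. One instruction issues per cycle. A load's data becomes usable a fixed number of cycles after it issues, and 8 is used here, taken from the 1–1–1 discussion in the text. Grouping and group breaks are ignored. The function name count_stalls is illustrative.

```python
def count_stalls(program, load_latency):
    """Count stall cycles for a program under a simplified in-order model.

    program: list of (op, dst, srcs) with op in {"load", "alu"}.
    Hypothetical assumptions: one instruction issues per cycle; a load's
    result is usable `load_latency` cycles after it issues; an ALU result
    is usable on the next cycle.
    """
    ready_at = {}  # register -> first cycle at which its value can be consumed
    cycle = 0
    stalls = 0
    for op, dst, srcs in program:
        earliest = max((ready_at.get(r, 0) for r in srcs), default=0)
        if earliest > cycle:
            # Only an instruction that depends on outstanding load data waits.
            stalls += earliest - cycle
            cycle = earliest
        if op == "load":
            # Non-blocking: the load waits in the load buffer while issue continues.
            ready_at[dst] = cycle + load_latency
        else:
            ready_at[dst] = cycle + 1
        cycle += 1
    return stalls


LATENCY = 8  # load-to-use distance that avoids a stall in 1-1-1 mode, per the text

back_to_back = [("load", "r1", []), ("alu", "r2", ["r1"])]
independent = [("alu", f"r{k}", ["r0"]) for k in range(3, 10)]  # 7 unrelated ops
spread_out = [("load", "r1", [])] + independent + [("alu", "r2", ["r1"])]

print(count_stalls(back_to_back, LATENCY))  # 7
print(count_stalls(spread_out, LATENCY))    # 0
```

The seven independent instructions issue while the load is still outstanding. Only the dependent use could stall, and by then the data has returned.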
16.3.6.1 Load Buffer Timing
The load buffer’s depth and its interaction with the rest of the pipeline are
designed to support full throughput (one load per cycle) for an E-Cache with a
three-cycle pin-to-pin latency and one-cycle throughput (consistent with 1–1–1
mode). As shown in Figure 16-13, if a use is separated from a load by 8 cycles, no
stall occurs and full throughput is achieved. Compared with code scheduled for
D-Cache hits, code scheduled for the E-Cache needs N extra cycles between the load
and the use, where N is determined by the SRAM mode, as shown in Table 16-1 on
page 274. The shaded rows in Figure 16-13 represent these N extra cycles.
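Supporting one load per cycle means several loads are outstanding at once. As a hypothetical simplification, assume each load holds one load-buffer entry from the cycle it issues until its data returns 8 cycles later. Peak occupancy is then roughly loads per cycle times latency (Little's law). The sketch below counts it directly. The actual UltraSPARC-I load-buffer depth is not given on this page, and peak_in_flight is an illustrative name.

```python
def peak_in_flight(issue_cycles, latency):
    """Peak number of loads issued but not yet returned.

    Hypothetical assumption: each load occupies one buffer entry from the
    cycle it issues until its data returns `latency` cycles later.
    """
    events = []
    for t in issue_cycles:
        events.append((t, +1))            # load enters the buffer
        events.append((t + latency, -1))  # data returns; entry is freed
    # Sorting by (cycle, delta) frees entries before new loads claim them
    # in the same cycle.
    events.sort()
    in_flight = peak = 0
    for _, delta in events:
        in_flight += delta
        peak = max(peak, in_flight)
    return peak


print(peak_in_flight(range(100), latency=8))        # 8: one load per cycle
print(peak_in_flight(range(0, 100, 2), latency=8))  # 4: one load every other cycle
```

Halving the load rate halves the number of entries in use. This is why the buffer depth has to be matched to both the target throughput and the E-Cache latency.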
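[Figure 16-12 diagram: load r1 (F D G E C N1, then Q cycles) and a dependent use r1 (F D G G, repeated E cycles, then C N1 N2 N3 W); labels mark the group break, the (N+1)-cycle stall, and the point where execution resumes.]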