Sun Microsystems UltraSPARC-I - Page 288

Sun Microelectronics
16. Code Generation Guidelines
16.3.2 D-Cache Timing
The latency of a load to the D-Cache depends on the opcode. For unsigned loads,
data can be used two cycles after the load. For instance, if the first two
instructions in the instruction buffer are a load and an instruction dependent
on that load, the grouping logic will break the group after the load and a
bubble will be inserted in the pipeline the following cycle. Code compiled for
an earlier SPARC processor with a load-use penalty of one cycle will show a
penalty of about 0.1 CPI just for this rule; thus, it is very important to
separate loads from their use.
16.3.2.1 Signed Loads
All signed loads smaller than 64 bits must be separated from their use by three
cycles; otherwise, an extra bubble is inserted in the pipeline to force the
separation between the load and its use. Floating-point loads are not sign
extended, so they have a latency of two cycles.
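
The latencies above can be summarized in a small C model. This is only a
sketch for reasoning about instruction scheduling, not a description of the
hardware: the names load_kind, load_latency, and load_use_bubbles are
hypothetical, and the model assumes that each missing cycle of separation
costs exactly one bubble.

```c
#include <assert.h>

/* Load categories distinguished by the text (names are illustrative). */
enum load_kind { LOAD_UNSIGNED, LOAD_SIGNED_SUB64, LOAD_FLOAT };

/* Cycles after the load before its data can be used, as stated above. */
static int load_latency(enum load_kind kind)
{
    return kind == LOAD_SIGNED_SUB64 ? 3 : 2;
}

/*
 * Bubbles inserted when the first use would otherwise issue `distance`
 * cycles after the load (1 = the very next cycle).
 * Assumption: one bubble per missing cycle of separation.
 */
static int load_use_bubbles(enum load_kind kind, int distance)
{
    assert(distance >= 1);
    int shortfall = load_latency(kind) - distance;
    return shortfall > 0 ? shortfall : 0;
}
```

Under this model, a use placed directly after an unsigned load costs one
bubble, while the same placement after a signed load smaller than 64 bits
costs two.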
Once a signed load (smaller than 64 bits) is encountered in the instruction stream,
all subsequent consecutive loads (signed or unsigned) also return data in three
cycles; otherwise, there would be a collision between two loads returning data.
As soon as a cycle without a load appears in the pipeline, the latency of loads is
brought back to two cycles.
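
The rule for consecutive loads behaves like a one-bit state that a signed load
sets and a load-free cycle clears. The following C sketch illustrates that
reading; the names slot and load_latencies are hypothetical, and the model
assumes at most one load per cycle.

```c
#include <assert.h>
#include <stddef.h>

/* What occupies the load slot in a given cycle (illustrative names). */
enum slot { SLOT_NO_LOAD, SLOT_UNSIGNED_LOAD, SLOT_SIGNED_SUB64_LOAD };

/*
 * Fill latency[i] with the load latency seen in cycle i
 * (0 for a cycle that contains no load).
 */
static void load_latencies(const enum slot *cycles, size_t n, int *latency)
{
    assert(cycles != NULL && latency != NULL);
    int three_cycle_mode = 0;   /* set by a signed load smaller than 64 bits */

    for (size_t i = 0; i < n; i++) {
        switch (cycles[i]) {
        case SLOT_NO_LOAD:
            three_cycle_mode = 0;   /* a load-free cycle restores two cycles */
            latency[i] = 0;
            break;
        case SLOT_SIGNED_SUB64_LOAD:
            three_cycle_mode = 1;
            latency[i] = 3;
            break;
        case SLOT_UNSIGNED_LOAD:
            latency[i] = three_cycle_mode ? 3 : 2;
            break;
        }
    }
}
```

For the sequence unsigned, signed, unsigned, no load, unsigned, this model
gives latencies of 2, 3, 3, none, and 2 cycles.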
Note: The SPARC-V8 LD instruction is replaced with LDUW in SPARC-V9; the
new instruction does not require sign extension.
16.3.3 Data Alignment
SPARC-V9 requires that all accesses be aligned on an address that is a multiple
of the size of the access. Otherwise a mem_address_not_aligned trap is
generated. This is especially important for double-precision floating-point
loads, which should be aligned on an 8-byte boundary. If misalignment is
determined to be possible at compile time, it is better to use two LDF (load
floating-point, single precision) instructions and avoid the trap. UltraSPARC
supports single-precision loads mixed with double-precision operations, so that
the case above can execute without penalty (except for the additional load). If
a trap does occur, UltraSPARC dedicates a trap vector for this specific
misalignment, which reduces the overall penalty of the trap.
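
At the source level, the two-load strategy can be sketched in C as follows.
This is an illustration only: is_aligned and load_double are hypothetical
helper names, and the C compiler, not this code, decides which load
instructions are actually emitted. The sketch assumes the address is at least
4-byte aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* True if p is a multiple of size; size must be a power of two. */
static int is_aligned(const void *p, size_t size)
{
    assert(size != 0 && (size & (size - 1)) == 0);
    return ((uintptr_t)p & (size - 1)) == 0;
}

/*
 * Read a double that may be only 4-byte aligned. The aligned case uses a
 * single 8-byte copy; otherwise the value is assembled from two 4-byte
 * halves, mirroring the two single-precision loads suggested above.
 */
static double load_double(const void *p)
{
    double value;

    if (is_aligned(p, sizeof value)) {
        memcpy(&value, p, sizeof value);
    } else {
        assert(is_aligned(p, 4));
        uint32_t halves[2];
        memcpy(&halves[0], p, 4);
        memcpy(&halves[1], (const unsigned char *)p + 4, 4);
        memcpy(&value, halves, sizeof value);
    }
    return value;
}
```

Copying the halves in address order preserves the original byte layout, so the
result does not depend on the byte order of the machine.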
Grouping load data is desirable, since a D-Cache sub-block can contain either
four properly aligned single-precision operands or two properly aligned double-
precision operands (eight and four respectively for a D-Cache line). As we shall