Sun Microelectronics
UltraSPARC User’s Manual
Code Example 13-5 Byte-Aligned Block Copy Inner Loop
Note that the loop must be unrolled twice to achieve maximum
performance. All FP registers are double-precision. Eight versions of
this loop are needed to handle all cases of doubleword
misalignment between the source and destination.
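! regaddr is a placeholder for the source or destination address register.
! On entry, %f32 already holds the result aligned from %f48:%f0.
loop:
        faligndata %f0, %f2, %f34
        faligndata %f2, %f4, %f36
        faligndata %f4, %f6, %f38
        faligndata %f6, %f8, %f40
        faligndata %f8, %f10, %f42
        faligndata %f10, %f12, %f44
        faligndata %f12, %f14, %f46
        addcc   %l0, -1, %l0              ! decrement the loop counter
        bg,pt   %icc, l1
        fmovd   %f14, %f48                ! delay slot: carry last doubleword forward
        (end of loop handling)
l1:     ldda    [regaddr] ASI_BLK_P, %f0  ! block load into %f0-%f14
        stda    %f32, [regaddr] ASI_BLK_P ! block store from %f32-%f46
        faligndata %f48, %f16, %f32
        faligndata %f16, %f18, %f34
        faligndata %f18, %f20, %f36
        faligndata %f20, %f22, %f38
        faligndata %f22, %f24, %f40
        faligndata %f24, %f26, %f42
        faligndata %f26, %f28, %f44
        faligndata %f28, %f30, %f46
        addcc   %l0, -1, %l0              ! decrement the loop counter
        be,pn   %icc, done
        fmovd   %f30, %f48                ! delay slot: carry last doubleword forward
        ldda    [regaddr] ASI_BLK_P, %f16 ! block load into %f16-%f30
        stda    %f32, [regaddr] ASI_BLK_P ! block store from %f32-%f46
        ba      loop
        faligndata %f48, %f0, %f32        ! delay slot: first result of next pass
done:   (end of loop processing)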
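
The data movement performed by the faligndata chain above can be checked against a software model. The following C sketch is illustrative only and is not VIS code: the names faligndata_model, load_be64, store_be64, and aligned_copy_model are hypothetical, the align argument stands in for the GSR.align field (assumed to have been set beforehand, for example by alignaddr), and block loads, block stores, and the double-buffered register allocation are not modeled. It shows only how each output doubleword is formed from two adjacent source doublewords, with prev playing roughly the role that %f48 plays in the loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Model of FALIGNDATA: treat hi:lo as one 16-byte value (hi first, in
 * big-endian byte order as on SPARC) and return the 8 bytes that start
 * at byte offset `align` (0..7), which stands in for GSR.align. */
static uint64_t faligndata_model(uint64_t hi, uint64_t lo, unsigned align)
{
    assert(align < 8);
    if (align == 0)
        return hi;                      /* avoids an undefined 64-bit shift */
    return (hi << (8 * align)) | (lo >> (64 - 8 * align));
}

/* Load 8 bytes as a big-endian doubleword, independent of host endianness. */
static uint64_t load_be64(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v = (v << 8) | p[i];
    return v;
}

/* Store a doubleword as 8 big-endian bytes. */
static void store_be64(uint8_t *p, uint64_t v)
{
    for (int i = 7; i >= 0; i--) {
        p[i] = (uint8_t)(v & 0xff);
        v >>= 8;
    }
}

/* Copy n_dwords doublewords to dst from a source that begins `align`
 * bytes past the doubleword-aligned address src_base. Note that this
 * reads n_dwords + 1 doublewords starting at src_base. */
static void aligned_copy_model(uint8_t *dst, const uint8_t *src_base,
                               unsigned align, size_t n_dwords)
{
    uint64_t prev = load_be64(src_base);        /* carried doubleword */
    for (size_t i = 0; i < n_dwords; i++) {
        uint64_t next = load_be64(src_base + 8 * (i + 1));
        store_be64(dst + 8 * i, faligndata_model(prev, next, align));
        prev = next;                            /* like fmovd into %f48 */
    }
}
```

Calling aligned_copy_model(dst, src, a, 8) for each a from 0 to 7 and comparing dst with the 64 bytes at src + a is one way to confirm that a single carried doubleword is sufficient to cover every byte offset within a doubleword.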