Optimization
For example, on the i960 KA, KB, SA, and SB processors, the execution
of a memory operation can overlap the execution of an arithmetic
instruction, provided the memory operation occurs in the instruction
stream first. The following code computes the expression (b*13) + c with these instructions:
ld _b, r4
muli r4, 13, r4
ld _c, r5
addi r5, r4, r4
To optimize this computation, the compiler moves the instruction that fetches the value of c ahead of the multiply instruction:
ld _b, r4
ld _c, r5
muli r4, 13, r4
addi r5, r4, r4
When this rearranged code executes, the load ld _c, r5 executes partly in parallel with the multiplication, and the load ld _b, r4 executes partly in parallel with ld _c, r5.
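The effect of this reordering can be sketched with a toy timing model. The latencies used here (3 cycles for ld and muli, 1 for addi) are made-up values, not i960 timings, and the rule that a load frees the issue slot after one cycle while an arithmetic instruction holds it until completion is an assumption chosen to mirror the overlap rule described above. The function name schedule is hypothetical; the literal 13 is omitted because it creates no register dependency.

```python
# Toy timing model. Latencies are hypothetical, not i960 datasheet values.
LATENCY = {"ld": 3, "muli": 3, "addi": 1}
MEMORY_OPS = {"ld"}


def schedule(program):
    """Return the cycle at which `program` finishes under the toy model.

    Each instruction is (opcode, source_registers, dest_register).
    Loads issue in one cycle and finish in the background; arithmetic
    instructions hold the issue slot until they complete.
    """
    ready = {}      # register -> cycle its value becomes available
    next_issue = 0  # earliest cycle the next instruction may start
    finish = 0
    for op, srcs, dest in program:
        # Wait for the issue slot and for every source operand.
        start = max([next_issue] + [ready.get(r, 0) for r in srcs])
        done = start + LATENCY[op]
        ready[dest] = done
        next_issue = start + 1 if op in MEMORY_OPS else done
        finish = max(finish, done)
    return finish


original = [
    ("ld", [], "r4"),              # ld _b, r4
    ("muli", ["r4"], "r4"),        # muli r4, 13, r4
    ("ld", [], "r5"),              # ld _c, r5
    ("addi", ["r5", "r4"], "r4"),  # addi r5, r4, r4
]
# Hoist ld _c, r5 ahead of the multiply, as the compiler does.
reordered = [original[0], original[2], original[1], original[3]]

print(schedule(original), schedule(reordered))  # prints: 10 7
```

Under these assumptions the original order finishes at cycle 10 and the reordered one at cycle 7: hoisting ld _c lets its latency hide behind muli, and the two loads overlap each other.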
The same kind of rearrangement can also improve performance on the CA and CF processors, where even more parallelism is possible: the CA and CF can issue multiple instructions at once and can execute more categories of instructions in parallel than the KA or KB can.
For example, on the CA and CF processors, the compiler can also replace an instruction with another that has the same effect but executes in a different internal unit of the processor. The most common such substitutions are conversions of mov instructions to lda instructions, and vice versa.
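A similarly simplified sketch of the mov/lda substitution follows. The unit labels REG and MEM, the assignment of each opcode to a unit, and the rule that each unit accepts at most one instruction per cycle are assumptions made for illustration, not a description of the CA or CF pipeline. The model also assumes that every mov and lda in the input is a simple literal or register copy, so the two opcodes are interchangeable. The function name issue is hypothetical.

```python
# Toy dual-unit issue model. The unit labels, opcode-to-unit mapping, and
# one-instruction-per-unit-per-cycle rule are illustrative assumptions,
# not CA/CF pipeline specifications.
UNIT = {"addi": "REG", "mov": "REG", "ld": "MEM", "lda": "MEM"}

# Assumes every mov/lda in the input is a simple literal or register copy,
# so either opcode produces the same result.
EQUIVALENT = {"mov": "lda", "lda": "mov"}


def issue(program, substitute):
    """Issue opcodes in order; return (cycles, opcodes actually issued)."""
    cycles = 1 if program else 0
    busy = set()  # units already used in the current cycle
    issued = []
    for op in program:
        choice = op
        # If the preferred unit is taken, try an equivalent opcode
        # that runs on a unit that is still free this cycle.
        if substitute and UNIT[op] in busy and op in EQUIVALENT:
            alt = EQUIVALENT[op]
            if UNIT[alt] not in busy:
                choice = alt
        if UNIT[choice] in busy:
            # No free unit: start a new cycle and keep the original opcode.
            cycles += 1
            busy = set()
            choice = op
        busy.add(UNIT[choice])
        issued.append(choice)
    return cycles, issued


program = ["addi", "mov", "ld", "mov"]
print(issue(program, substitute=False))  # (3, ['addi', 'mov', 'ld', 'mov'])
print(issue(program, substitute=True))   # (2, ['addi', 'lda', 'ld', 'mov'])
```

With substitution enabled, the first mov becomes lda and issues alongside addi in the first cycle, so in this toy model the sequence needs 2 cycles instead of 3.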