4-18 Performance
AMD-K5 Processor Technical Reference Manual 18524C/0—Nov1996
Table 4-2 shows the timing of internal operations from dispatch
to retire of each ROP for nearly two iterations of this loop.
All memory accesses are assumed to hit in the cache.
EVEN_ARRAY_SIZE is set to 20.
Table 4-2. Integer Dot Product Internal Operations Timing
Instruction                Cycle
                           1 2 3 4 5 6 7 8 9 10 11 12 13 14
MOV EAX,[ESI][ECX*4] L > - - - !
MOV EBX,[ESI][ECX*4]+4 L > - - - !
IMUL EAX,[EDI][ECX*4]
L>- - !
- MMMM> !
IMUL EBX,[EDI][ECX*4]+4
L>- - - !
- MMMM> !
ADD ECX,2 A>- - - !
ADD EDX,EAX ---A>!
ADD EBP,EBX ---A>!
CMP ECX,20 ---A>!
JL LOOP ----B>!
MOV EAX,[ESI][ECX*4] L>- - - !
MOV EBX,[ESI][ECX*4]+4 L>- - - !
IMUL EAX,[EDI][ECX*4]
L>- - !
- MMMM> !
IMUL EBX,[EDI][ECX*4]+4
L>---!
- MMMM>
Notes:
L— load execute
M— multiply execute
A— ALU execute
B— branch execute
>— result
!— retire (update real state)
- — preceding or after execute: waiting in the reservation station
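For readers who want to relate the instruction rows of Table 4-2 to source code, the loop can be sketched in C as below. This is a minimal sketch, not taken from the manual: the function name dot_product_unrolled, the parameter names, and the final addition of the two partial sums are assumptions for illustration. The register names in the comments follow the instructions listed in the table. The assembly loop tests its condition at the bottom (CMP/JL), so it always runs at least once, whereas the for loop here also handles an empty array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EVEN_ARRAY_SIZE 20  /* loop bound used for the timing in Table 4-2 */

/* Hypothetical C rendering of the two-way unrolled integer dot product.
   Two independent partial sums are kept, matching the separate
   accumulators EDX and EBP in the instruction listing. */
int32_t dot_product_unrolled(const int32_t *a, const int32_t *b, size_t n)
{
    assert(n % 2 == 0);            /* two elements are consumed per iteration */

    int32_t sum_even = 0;          /* EDX */
    int32_t sum_odd = 0;           /* EBP */

    for (size_t i = 0; i < n; i += 2) {   /* ADD ECX,2 / CMP ECX,20 / JL LOOP */
        int32_t x = a[i] * b[i];          /* MOV EAX,... then IMUL EAX,...     */
        int32_t y = a[i + 1] * b[i + 1];  /* MOV EBX,...+4 then IMUL EBX,...+4 */
        sum_even += x;                    /* ADD EDX,EAX */
        sum_odd += y;                     /* ADD EBP,EBX */
    }

    /* Assumption: the partial sums are combined after the loop;
       that step is not part of the timed listing. */
    return sum_even + sum_odd;
}
```

Keeping two accumulators means the two multiply-and-add chains within an iteration do not depend on each other, which is consistent with the table showing both loads and both multiplies issued close together.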