
Dispatch and Execution Timing 4-17

18524C/0—Nov1996 AMD-K5 Processor Technical Reference Manual

4.2.3 Integer Dot Product Example

This example illustrates an optimal code sequence for an integer dot product that performs multiply/accumulates (MACs) at the rate of one every 3 cycles. In this example, the array size is a constant. The loop is unrolled so that separate MAC operations proceed in parallel for the even and odd elements, each accumulating into its own register (EDX for even elements, EBP for odd elements). The final sum, along with the final MAC for odd-sized arrays, is computed outside the loop.

; Assumed on entry (not shown in the original): ESI -> array A,
; EDI -> array B, ECX = 0, and EDX = EBP = 0.
mac_loop:
        MOV   EAX, [ESI][ECX*4]         ;load A(i)
        MOV   EBX, [ESI][ECX*4]+4       ;load A(i+1)
        IMUL  EAX, [EDI][ECX*4]         ;A(i) * B(i)
        IMUL  EBX, [EDI][ECX*4]+4       ;A(i+1) * B(i+1)
        ADD   ECX, 2                    ;advance index by two elements
        ADD   EDX, EAX                  ;even sum
        ADD   EBP, EBX                  ;odd sum
        CMP   ECX, EVEN_ARRAY_SIZE      ;loop control
        JL    mac_loop                  ;repeat while ECX < EVEN_ARRAY_SIZE
        ;do final MAC here for odd-sized arrays
        ADD   EDX, EBP                  ;final sum

Table 4-1. Integer Instructions (continued)

Instruction         Opcode                  Fastpath or  Execution  Timing
Mnemonic            Format                  Microcode    Unit
------------------  ---------------------   -----------  ---------  ------
XCHG reg, reg       0_0x_1000011x_xxx_xxx   F            alu        1/1
                                                         alu        2/2
XCHG mem, reg       0_1x_1000011x_xxx_xxx   F            ld         1/1
                                                         st         1/1/2
                                                         alu        1/2
XOR reg, reg        0_0x_001100xx_xxx_xxx   F            alu        1/1
XOR reg, mem        0_1x_0011001x_xxx_xxx   F            ld         1/1
                                                         alu        1/2
XOR mem, reg        0_1x_0011000x_xxx_xxx   F            ld         1/1
                                                         alu        1/2
                                                         st         1/1/3
XOR AL/AX/EAX, imm  0_xx_0011010x_xxx_xxx   F            alu        1/1
XOR reg, imm        0_0x_100000xx_110_xxx   F            alu        1/1
XOR mem, imm        0_1x_100000xx_110_xxx   F            ld         1/1
                                                         alu        1/2
                                                         st         1/1/3

