
AMD AMD5K86 - TABLE 4-2. Integer Dot Product Internal Operations Timing

AMD    18524B/0-Mar1996
AMD5K86 Processor Technical Reference Manual
TABLE 4-1. Integer Instructions (continued)

Instruction Mnemonic   Opcode Format            Fastpath or   Execution
                                                Microcode     Unit Timing
XOR reg, mem           0_1x_0011001x_xxx_xxx    F             ld   1/1
                                                              alu  1/2
XOR mem, reg           0_1x_0011000x_xxx_xxx    F             ld   1/1
                                                              alu  1/2
                                                              st   1/1/3
XOR AL/AX/EAX, imm     0_xx_0011010x_xxx_xxx    F             alu  1/1
XOR reg, imm           0_0x_100000xx_110_xxx    F             alu  1/1
XOR mem, imm           0_1x_100000xx_110_xxx    F             ld   1/1
                                                              alu  1/2
                                                              st   1/1/3
4.2.3 Integer Dot Product Example
This example illustrates an optimal code sequence for an integer dot product operation that performs multiply/accumulates (MACs) at the rate of one every 3 cycles. In this example, the array size is a constant. The loop is unrolled to perform separate MAC operations in parallel for even and odd elements. The final sum is generated outside the loop (as well as the final iteration for odd-sized arrays).
mac_loop:
    MOV   EAX, [ESI][ECX*4]        ;load A(i)
    MOV   EBX, [ESI][ECX*4]+4      ;load A(i+1)
    IMUL  EAX, [EDI][ECX*4]        ;A(i) * B(i)
    IMUL  EBX, [EDI][ECX*4]+4      ;A(i+1) * B(i+1)
    ADD   ECX, 2                   ;increment index
    ADD   EDX, EAX                 ;even sum
    ADD   EBP, EBX                 ;odd sum
    CMP   ECX, EVEN_ARRAY_SIZE     ;loop control
    JL    mac_loop                 ;jump
    ;do final MAC here for odd-sized arrays
    ADD   EDX, EBP                 ;final sum
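For readers more comfortable with a high-level language, the even/odd unrolling in the listing can be sketched in C. This is only an illustrative sketch, not code from the manual: the function name dot_product_unrolled and its parameters are hypothetical, the local even_n stands in for EVEN_ARRAY_SIZE, and the sketch assumes every product and partial sum fits in a signed 32-bit integer (the assembly wraps on overflow, whereas signed overflow is undefined in C). Unlike the assembly loop, the C loop also tests its condition before the first pass.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not from the manual: integer dot product with the
 * loop unrolled by two and separate even and odd partial sums.
 * Assumes all products and sums fit in int32_t. */
int32_t dot_product_unrolled(const int32_t *a, const int32_t *b, size_t n)
{
    assert(n == 0 || (a != NULL && b != NULL));

    size_t even_n = n & ~(size_t)1;  /* stands in for EVEN_ARRAY_SIZE */
    int32_t even_sum = 0;            /* EDX in the listing */
    int32_t odd_sum = 0;             /* EBP in the listing */

    for (size_t i = 0; i < even_n; i += 2) {
        even_sum += a[i] * b[i];          /* A(i) * B(i) */
        odd_sum += a[i + 1] * b[i + 1];   /* A(i+1) * B(i+1) */
    }

    if (n & 1) {
        /* final MAC for odd-sized arrays, done outside the loop */
        even_sum += a[n - 1] * b[n - 1];
    }

    return even_sum + odd_sum;       /* final sum */
}
```

Keeping two accumulators means the two multiply/accumulate chains do not depend on each other, which is what lets the processor overlap them.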
Table 4-2 shows the timing of internal operations from dispatch to retire of each ROP for nearly two iterations of this loop. All memory accesses are assumed to hit in the cache. EVEN_ARRAY_SIZE is set to 20.
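As a rough cross-check of the stated rate, the steady-state loop cost can be estimated from the figures above: each pass performs two MACs, so at one MAC every 3 cycles a pass takes about 6 cycles, and EVEN_ARRAY_SIZE of 20 gives 10 passes, or about 60 cycles. The C sketch below encodes only that arithmetic. The function name steady_state_cycles is hypothetical, and the estimate deliberately ignores pipeline fill, the final ADD, and any branch misprediction, so the table remains the authoritative timing.

```c
#include <assert.h>

/* Hypothetical helper, not from the manual: steady-state cycle estimate
 * for the unrolled loop. Assumes two MACs per pass and a fixed number of
 * cycles per MAC; startup and exit costs are not modeled. */
unsigned steady_state_cycles(unsigned even_array_size, unsigned cycles_per_mac)
{
    assert(even_array_size % 2 == 0);

    unsigned passes = even_array_size / 2;  /* index advances by 2 per pass */
    return passes * 2 * cycles_per_mac;     /* two MACs in every pass */
}
```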
Dispatch and Execution Timing 4-17
