AMD AMD5K86 - Dispatch and Execution Timing

AMD5K86 Processor Technical Reference Manual, 18524B/0-Mar1996, page 4-4

Bit Scan - BSF and BSR take 1 cycle (2 cycles for memory-based input), in contrast to the Pentium processor's data-dependent 6 to 34 cycles.

Bit Test - BT, BTS, BTR, and BTC take 1 cycle for register-based operands, and 2 or 3 cycles for memory-based operands with an immediate bit offset, in contrast to the Pentium processor's 4 to 9 cycles. Register-based bit-offset forms on the AMD5K86 processor take 5 cycles. If the semantics of the register-based bit-offset form are desired (where the bit offset can cover a very large bit string in memory), it is better to emulate this with simpler instructions that can be interleaved with independent instructions for greater parallelism.
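
The emulation suggested above can be sketched in C as a shift, a mask, and an ordinary load or store, which a compiler would typically lower to simple integer instructions. This is only an illustrative sketch, not code from the manual: the helper names `bit_test` and `bit_test_and_set` are hypothetical, the bit string is assumed to be stored as 32-bit words, and the offset is assumed to be non-negative (the hardware instruction also accepts negative register offsets, which this sketch does not cover).

```c
#include <stdint.h>

/* Hypothetical helper: return bit `offset` of the bit string at `base`.
 * Assumes 32-bit words and a non-negative bit offset. */
int bit_test(const uint32_t *base, uint32_t offset)
{
    uint32_t word = base[offset >> 5];           /* word index = offset / 32 */
    return (int)((word >> (offset & 31)) & 1u);  /* bit index  = offset % 32 */
}

/* Hypothetical helper: set bit `offset` and return its previous value,
 * similar in effect to BTS. Unlike a locked BTS, this is not atomic. */
int bit_test_and_set(uint32_t *base, uint32_t offset)
{
    uint32_t mask = (uint32_t)1 << (offset & 31);
    uint32_t *word = &base[offset >> 5];
    int old = (*word & mask) != 0;               /* remember the old bit */
    *word |= mask;                               /* then set it */
    return old;
}
```

Because the index and mask computations are independent of the surrounding code until the final load, they leave room for other instructions to be scheduled alongside them.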

Floating-Point Top-of-Stack Bottleneck - The AMD5K86 processor has a pipelined floating-point unit. Greater parallelism can be achieved by using FXCH in parallel with floating-point operations to alleviate the top-of-stack bottleneck, as in the Pentium processor. The AMD5K86 processor also permits integer operations (ALU, branch, load/store) in parallel with floating-point operations.

Locating Branch Targets - Performance can be sensitive to code alignment, especially in tight loops. Locating branch targets to the first 17 bytes of the 32-byte cache line maximizes the opportunity for parallel execution at the target. NOPs can be added to adjust this alignment. The AMD5K86 processor executes NOPs (opcode 90h) at the rate of two per cycle. Adding NOPs is even more effective if they execute in parallel with existing code. Other instructions of greater length, such as a register-based TEST instruction, can be used as NOPs to minimize the overhead of such padding.
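
As a rough illustration of the alignment rule above, the following C sketch computes how many padding bytes a code generator would need to insert before a branch target so that it lands in the first 17 bytes of a 32-byte cache line. The function name `target_padding` is hypothetical, and the sketch assumes that "first 17 bytes" means line offsets 0 through 16 and that all padding is placed immediately before the target.

```c
#include <stdint.h>

enum {
    LINE_BYTES    = 32,  /* instruction cache line size from the text */
    TARGET_WINDOW = 17   /* preferred region: assumed to be offsets 0..16 */
};

/* Hypothetical helper: bytes of padding needed before `target_addr`
 * so that the target falls inside the preferred window of its line. */
unsigned target_padding(uintptr_t target_addr)
{
    unsigned offset = (unsigned)(target_addr % LINE_BYTES);
    if (offset < TARGET_WINDOW)
        return 0;                 /* already well placed */
    return LINE_BYTES - offset;   /* push the target to the next line start */
}
```

For example, a target at line offset 20 would need 12 bytes of padding, which could be made up of single-byte NOPs or fewer, longer instructions that have no effect.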

Branch Prediction - There are two branch prediction bits in a 32-byte instruction cache line. One bit applies to the first 16 bytes of the line and the second bit applies to the second 16 bytes of the line. For effective branch prediction, code should be generated with one branch per 16-byte line half.
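
A code generator could check the one-branch-per-half guideline with something like the C sketch below. It is an assumption-laden illustration: the function name `first_shared_half` is hypothetical, the input is assumed to be a list of branch instruction addresses sorted in ascending order, and each branch is attributed to a half by the address of its first byte, which the text above does not specify.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: given branch addresses in ascending order, return the
 * index of the first branch that shares a 16-byte line half with the branch
 * before it, or -1 if every branch has a half to itself.
 * Each 16-byte half of a 32-byte line is a 16-byte aligned block, so two
 * addresses share a half exactly when (addr >> 4) is equal. */
long first_shared_half(const uintptr_t *branch_addrs, size_t count)
{
    for (size_t i = 1; i < count; i++) {
        if ((branch_addrs[i] >> 4) == (branch_addrs[i - 1] >> 4))
            return (long)i;   /* two branches compete for one prediction bit */
    }
    return -1;
}
```

When a conflict is reported, padding or reordering code so that the second branch moves into the next 16-byte half is one way to restore the guideline.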

Address-Generation Interlocks (AGIs) - The AMD5K86 processor does not suffer from the single-cycle penalty that the 486 and Pentium processors have when a result from execution or from a data-cache access is used to form a cache address, so it is not necessary to avoid these situations.