EasyManuals Logo

AMD K5 User Manual

AMD K5
406 pages
To Next Page IconTo Next Page
To Next Page IconTo Next Page
To Previous Page IconTo Previous Page
To Previous Page IconTo Previous Page
Page #94 background imageLoading...
Page #94 background image
4-4 Performance
AMD-K5 Processor Technical Reference Manual 18524C/0Nov1996
Bit ScanBSF and BSR take 1 cycle (2 cycles for memory-
based input), in contrast to the Pentium processor's data-
dependent 6 to 34 cycles.
Bit TestBT, BTS, BTR, and BTC take 1 cycle for register-
based operands, and 2 or 3 cycles for memory-based oper-
ands with immediate bit-offset, in contrast to the Pentium
processor's 4 to 9 cycles. Register-based bit-offset forms on
the AMD-K5 processor take 5 cycles. If the semantics of the
register-based bit-offset form are desired (where the bit off-
set can cover a very large bit string in memory), it is better
to emulate this with simpler instructions that can be inter-
leaved with independent instructions for greater parallel-
ism.
Floating-Point Top-of-Stack BottleneckThe AMD-K5 proces-
sor has a pipelined floating-point unit. Greater parallelism
can be achieved by using FXCH in parallel with floating-
point operations to alleviate the top-of-stack bottleneck, as
in the Pentium processor. The AMD-K5 processor also per-
mits integer operations (ALU, branch, load/store) in paral-
lel with floating-point operations.
Locating Branch TargetsPerformance can be sensitive to
code alignment, especially in tight loops. Locating branch
targets to the first 17 bytes of the 32-byte cache line maxi-
mizes the opportunity for parallel execution at the target.
NOPs can be added to adjust this alignment. The AMD-K5
processor executes NOPs (opcode 90h) at the rate of two per
cycle. Adding NOPs is even more effective if they execute
in parallel with existing code. Other instructions of greater
length, such as a register-immediate TEST instruction, can
be used as NOPs to minimize the overhead of such padding.
Branch PredictionThere are two branch prediction bits in
a 32-byte instruction cache line. One bit applies to the first
16 bytes of the line and the second bit applies to the second
16 bytes of the line. For effective branch prediction, code
should be generated with one branch per 16-byte line half.
The prediction is associated with the half-line containing
the last byte of the branch instruction.
Address-Generation Interlocks (AGIs)The AMD-K5 proces-
sor does not suffer from the single-cycle penalty that the
486 and Pentium processors have when a result from execu-
tion or from a data-cache access is used to form a cache
address, so it is not necessary to avoid these situations.

Table of Contents

Questions and Answers:

Question and Answer IconNeed help?

Do you have a question about the AMD K5 and is the answer not in the manual?

AMD K5 Specifications

General IconGeneral
ManufacturerAMD
ModelK5
Architecturex86
MicroarchitectureK5
Introduction Year1996
Clock Speed75 - 133 MHz
Core Count1
SocketSocket 7
Core steppingSSA/5, 5k86
Voltage3.3V
Transistors4.3 million
L1 Cache8 KB (data) + 16 KB (instruction)
FSB50 MHz to 66 MHz
Process Technology350 nm

Related product manuals