4-4 Performance

AMD-K5 Processor Technical Reference Manual 18524C/0—Nov1996

■ Bit Scan—BSF and BSR take 1 cycle (2 cycles for memory-based input), in contrast to the Pentium processor's data-dependent 6 to 34 cycles.

■ Bit Test—BT, BTS, BTR, and BTC take 1 cycle for register-based operands, and 2 or 3 cycles for memory-based operands with immediate bit-offset, in contrast to the Pentium processor's 4 to 9 cycles. Register-based bit-offset forms on the AMD-K5 processor take 5 cycles. If the semantics of the register-based bit-offset form are not needed (in that form the bit-offset can cover a very large bit string in memory), it is better to emulate this with simpler instructions that can be interleaved with independent instructions for greater parallelism.

■ Floating-Point Top-of-Stack Bottleneck—The AMD-K5 processor has a pipelined floating-point unit. Greater parallelism can be achieved by using FXCH in parallel with floating-point operations to alleviate the top-of-stack bottleneck, as in the Pentium processor. The AMD-K5 processor also permits integer operations (ALU, branch, load/store) in parallel with floating-point operations.

■ Locating Branch Targets—Performance can be sensitive to code alignment, especially in tight loops. Locating branch targets to the first 17 bytes of the 32-byte cache line maximizes the opportunity for parallel execution at the target. NOPs can be added to adjust this alignment. The AMD-K5 processor executes NOPs (opcode 90h) at the rate of two per cycle. Adding NOPs is even more effective if they execute in parallel with existing code. Other instructions of greater length, such as a register-immediate TEST instruction, can be used as NOPs to minimize the overhead of such padding.

■ Branch Prediction—There are two branch prediction bits in a 32-byte instruction cache line. One bit applies to the first 16 bytes of the line and the second bit applies to the second 16 bytes of the line. For effective branch prediction, code should be generated with one branch per 16-byte line half. The prediction is associated with the half-line containing the last byte of the branch instruction.

■ Address-Generation Interlocks (AGIs)—The AMD-K5 processor does not suffer from the single-cycle penalty that the 486 and Pentium processors have when a result from execution or from a data-cache access is used to form a cache address, so it is not necessary to avoid these situations.
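
The Bit Test item above recommends emulating the register-based bit-offset forms with simpler operations. The following C sketch shows one way such an emulation might look: a shift to find the containing doubleword, a load, and a mask. The helper names `bit_test` and `bit_set` are hypothetical, and the sketch assumes a non-negative bit offset into an array of 32-bit words, whereas the hardware instruction accepts a signed offset.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: return bit `bit` of a bit string stored as
 * 32-bit words. Assumes `bit` is a non-negative offset within the
 * array (the BT instruction itself also accepts negative offsets). */
static int bit_test(const uint32_t *bits, size_t bit)
{
    uint32_t word = bits[bit >> 5];           /* doubleword holding the bit */
    return (int)((word >> (bit & 31)) & 1u);  /* isolate the selected bit */
}

/* Hypothetical helper: set bit `bit` in the same layout (BTS-like,
 * without returning the previous value). */
static void bit_set(uint32_t *bits, size_t bit)
{
    bits[bit >> 5] |= (uint32_t)1 << (bit & 31);
}
```

Each step here is a simple shift, load, AND, or OR, which a compiler or assembly programmer can interleave with unrelated work.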
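
For the Locating Branch Targets item above, the amount of padding needed can be worked out from the target address. The sketch below is illustrative only: `target_padding` and the two constants are hypothetical names, and it assumes "first 17 bytes" means line offsets 0 through 16.

```c
#include <stdint.h>

enum {
    LINE_BYTES    = 32,  /* instruction cache line size stated in the text */
    TARGET_WINDOW = 17   /* assumed: offsets 0..16 are the preferred region */
};

/* Hypothetical helper: number of padding bytes to insert before a
 * branch target currently at `addr` so that it falls within the
 * first 17 bytes of a 32-byte line. Returns 0 if it already does. */
static unsigned target_padding(uintptr_t addr)
{
    unsigned offset = (unsigned)(addr % LINE_BYTES);
    if (offset < TARGET_WINDOW)
        return 0;                 /* already in the preferred region */
    return LINE_BYTES - offset;   /* push the target to the next line start */
}
```

Under these assumptions the padding never exceeds 15 bytes. Filling it with a few longer instructions rather than many single-byte NOPs is the approach the text suggests for keeping the overhead low.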
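
The Branch Prediction item above ties each prediction bit to the 16-byte half-line that contains the last byte of the branch. A code generator could check whether two branches would compete for the same bit roughly as follows. The function names are hypothetical, and the sketch assumes instruction lengths of at least 1 byte.

```c
#include <stdint.h>

/* Hypothetical helper: index of the 16-byte half-line (counted from
 * address 0) that contains the last byte of a branch instruction
 * starting at `addr` with length `len` bytes (len >= 1). */
static uintptr_t prediction_half(uintptr_t addr, unsigned len)
{
    return (addr + len - 1) / 16;
}

/* Hypothetical helper: nonzero if two branches end in the same
 * half-line and would therefore share one prediction bit. */
static int share_prediction_bit(uintptr_t a_addr, unsigned a_len,
                                uintptr_t b_addr, unsigned b_len)
{
    return prediction_half(a_addr, a_len) == prediction_half(b_addr, b_len);
}
```

Note that a branch starting in one half-line but ending in the next is attributed to the later half, so the check uses the address of the final byte rather than the first.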

Manufacturer: AMD
Architecture: x86
Introduction Year: 1996
Clock Speed: 75 to 133 MHz
Socket: Socket 7
Core stepping: SSA/5, 5k86
Voltage: 3.3 V
Transistors: 4.3 million
L1 Cache: 8 KB (data) + 16 KB (instruction)
FSB: 50 MHz to 66 MHz
Process Technology: 350 nm
