AMD K5 User Manual

Code Optimization 4-3
18524C/0 Nov1996 AMD-K5 Processor Technical Reference Manual
Loops: Unroll loops to get more parallelism and reduce
loop overhead, even with branch prediction. Inline small
routines to avoid procedure-call overhead. In both cases,
however, consider the cost of possible increased register
usage, which might add load/store instructions for register
spilling.
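
The unrolling advice above can be sketched in C. This is a minimal illustration, not code from the manual: the function names and the unroll factor of 4 are assumptions, and an optimizing compiler may already unroll the simple loop on its own.

```c
#include <stddef.h>

/* Baseline: one add and one loop-closing branch per element. */
long sum_rolled(const int *a, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += a[i];
    return sum;
}

/* Unrolled by 4 (factor chosen for illustration only).
 * The four partial sums do not depend on each other, so the adds
 * can overlap, and the loop branch runs once per four elements. */
long sum_unrolled4(const int *a, size_t n)
{
    long s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }

    long sum = s0 + s1 + s2 + s3;

    /* Remainder loop for the last n % 4 elements. */
    for (; i < n; i++)
        sum += a[i];
    return sum;
}
```

The unrolled version keeps four accumulators live in addition to the pointer and the index. With only eight general-purpose registers on 32-bit x86, a larger unroll factor can force spills, which is the cost the paragraph above warns about.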
Indexed Addressing: There is no penalty for base + index
addressing in the AMD-K5 processor. However, future
implementations may have such a penalty to achieve a
higher overall clock rate.
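
As a rough illustration of what base + index addressing looks like at the source level, the two hypothetical functions below compute the same result. The first indexes each array, which a compiler would typically translate into base + index operands. The second walks pointers instead. Which addressing modes are actually emitted depends on the compiler and its options, so treat this as a sketch.

```c
#include <stddef.h>

/* Indexed form: each access is base (dst, a, or b) plus index (i). */
void add_indexed(int *dst, const int *a, const int *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}

/* Pointer-bump form: each access uses a base register only, at the
 * cost of updating three pointers per iteration instead of one index. */
void add_pointer(int *dst, const int *a, const int *b, size_t n)
{
    const int *end = a + n;
    while (a != end)
        *dst++ = *a++ + *b++;
}
```

Since the text states that the indexed form carries no penalty on the AMD-K5 processor, there is no reason on this processor to rewrite indexed loops into the pointer form for addressing reasons alone.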
4.1.2 Techniques Specific to the AMD-K5 Processor
Jumps and Loops: JCXZ requires 1 cycle (correctly
predicted) and therefore is faster than a TEST/JZ, in contrast
to the Pentium processor, in which JCXZ requires 5 or 6
cycles. All forms of LOOP take 2 cycles (correctly
predicted), which is also faster than the Pentium processor's 7
or 8 cycles.
Multiplies: Independent IMULs can be pipelined at one
per cycle with 4-cycle latency, in contrast to the Pentium
processor's serialized 9-cycle time. (MUL has the same
latency, although the implicit AX usage of MUL prevents
independent, parallel MUL operations.)
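
The difference between dependent and independent multiplies can be sketched in C. The function names are hypothetical, and unsigned arithmetic is used only so that overflow wraps in a defined way. A compiler would typically emit the two-operand IMUL form for these 32-bit multiplies, but that is an assumption about the compiler, not something the manual states.

```c
#include <stddef.h>
#include <stdint.h>

/* Dependent chain: every multiply needs the previous product, so
 * each one waits for the full multiply latency of the one before. */
uint32_t product_dependent(const uint32_t *a, size_t n)
{
    uint32_t p = 1;
    for (size_t i = 0; i < n; i++)
        p *= a[i];
    return p;
}

/* Two independent chains: the multiplies into p0 and p1 do not
 * depend on each other, so they can overlap in the multiplier
 * pipeline. Wrapping unsigned multiplication is associative and
 * commutative, so the result matches the single chain. */
uint32_t product_two_chains(const uint32_t *a, size_t n)
{
    uint32_t p0 = 1, p1 = 1;
    size_t i = 0;

    for (; i + 2 <= n; i += 2) {
        p0 *= a[i];
        p1 *= a[i + 1];
    }
    if (i < n)          /* odd element count: fold in the last one */
        p0 *= a[i];
    return p0 * p1;
}
```

Under an idealized model that uses only the figures quoted above (one multiply issued per cycle, 4-cycle latency), N fully independent multiplies would complete in about N + 3 cycles, while a fully dependent chain would take about 4N cycles.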
Dispatch Conflicts: Load-balancing (that is, selecting
instructions for parallel decode) is still important, but to a
lesser extent than on the Pentium processor. In particular,
arrange instructions to avoid execution-unit dispatching
conflicts. (See Section 4.2 on page 4-5.)
Instruction Prefixes: There is no penalty for instruction
prefixes, including combinations such as segment-size and
operand-size prefixes. This is particularly important for
16-bit code. However, future implementations may have
penalties for the use of these prefixes.
Byte Operations: For byte operations, the high and low
bytes of AX, BX, CX, and DX are effectively independent
registers that can be operated on in parallel. For example,
reading AL does not have a dependency on an outstanding
write to AH.
Move and Convert: MOVZX, MOVSX, CBW, CWDE, CWD, and
CDQ all take 1 cycle (2 cycles for memory-based input), in
contrast to the Pentium processor's 2 or 3 cycles.
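
For readers working in C rather than assembly, the conversions below are the kind of source-level operations that usually map onto these instructions. The function names are hypothetical, and the instruction named in each comment is what an x86 compiler would typically choose, not a guarantee.

```c
#include <stdint.h>

/* Sign-extend 8 -> 32 bits (typically MOVSX). */
int32_t widen_signed8(int8_t x)
{
    return x;
}

/* Zero-extend 8 -> 32 bits (typically MOVZX). */
uint32_t widen_unsigned8(uint8_t x)
{
    return x;
}

/* Sign-extend 16 -> 32 bits (typically MOVSX, or CWDE when the
 * value is already in AX). */
int32_t widen_signed16(int16_t x)
{
    return x;
}

/* Signed 32-bit division: on 32-bit x86 the dividend is usually
 * sign-extended into EDX:EAX with CDQ before IDIV. */
int32_t divide_signed(int32_t num, int32_t den)
{
    return num / den;
}
```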
