18524C/0—Nov1996 AMD-K5 Processor Technical Reference Manual
2.1 Prefetch and Predecode
Figure 2-1 (top-left corner) shows the processor’s prefetch and
predecode logic being fed with data from the external bus via
the memory management unit. Prefetching attempts to keep
the instruction cache and prefetch cache filled ahead of the
execution pipeline’s fetch requirements. The processor prefetches
only during fetch-stage misses in the instruction cache,
which typically occur during taken branches.
When a miss occurs, the prefetcher initiates a 32-byte burst
memory read cycle on the bus to fill a prefetch cache. For cacheable
accesses, the prefetch cache also fills 32-byte lines in the
instruction cache. For non-cacheable accesses, the prefetch
cache provides instructions directly to the execution pipeline.
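
The miss-handling flow described above can be sketched as a small behavioral model. This is an illustration only, not a description of the hardware. The names PrefetchModel, read_burst, and fetch_miss are hypothetical, and aligning the burst to a 32-byte boundary is an assumption that the text does not state.

```python
LINE_BYTES = 32  # burst size stated in the text


class PrefetchModel:
    """Toy behavioral model of the miss path; all names are hypothetical."""

    def __init__(self, memory):
        self.memory = memory        # bytes-like object standing in for the external bus
        self.icache = {}            # aligned line address -> 32-byte line
        self.prefetch_cache = b""   # most recently read line

    def read_burst(self, linear_addr):
        # Assumption: the burst covers the aligned 32-byte line
        # that contains the missing address.
        base = linear_addr & ~(LINE_BYTES - 1)
        return base, bytes(self.memory[base:base + LINE_BYTES])

    def fetch_miss(self, linear_addr, cacheable):
        base, line = self.read_burst(linear_addr)
        self.prefetch_cache = line      # every burst fills the prefetch cache
        if cacheable:
            self.icache[base] = line    # cacheable: also fill an instruction-cache line
        # Bytes from the missing address to the end of the line are what
        # the execution pipeline can consume from this burst.
        return line[linear_addr - base:]
```

For a non-cacheable access the model leaves the instruction cache untouched, so the returned bytes come only from the prefetch cache, which mirrors the distinction drawn in the paragraph above.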
The instruction cache contains a copy of certain fields in the
current code-segment descriptor. During a taken branch, the
fetch logic adds the code-segment base to the effective address
and places the resulting linear address in the prefetch program
counter, which then increments as a linear address along a
sequential stream. All branches during prefetching are
assumed to be not taken.
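
The address arithmetic in the paragraph above can be sketched as follows. This is a minimal sketch under stated assumptions: the function names are hypothetical, linear addresses are assumed to wrap at 32 bits, and the prefetch program counter is assumed to advance to the next aligned 32-byte line after the first fetch. Because branches are assumed not taken during prefetching, the stream is purely sequential.

```python
MASK32 = 0xFFFFFFFF  # assumption: 32-bit linear addresses that wrap around


def linear_address(cs_base, effective_addr):
    """Linear address = code-segment base + effective address."""
    return (cs_base + effective_addr) & MASK32


def prefetch_addresses(cs_base, effective_addr, count, line_bytes=32):
    """Return `count` successive prefetch program counter values.

    The first value is the branch target; each later value is the start
    of the next aligned line (hypothetical stepping policy).
    """
    pc = linear_address(cs_base, effective_addr)
    addresses = []
    for _ in range(count):
        addresses.append(pc)
        # Sequential stream: no branch prediction, just the next line.
        pc = ((pc & ~(line_bytes - 1)) + line_bytes) & MASK32
    return addresses
```

For example, with a code-segment base of 0x10000 and an effective address of 0x1234, the first prefetch address is 0x11234, followed by 0x11240 and 0x11260.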
The processor predecodes its x86-instruction stream in the
same clock in which x86 instructions come out of the prefetch
cache. An x86 instruction can be from 1 to 15 bytes long. Predecoding
annotates each instruction byte with information that
later enables the decode stage of the pipeline to perform more
efficiently. The predecode information identifies whether the
byte is the start and/or end of an x86 instruction, whether it is
an opcode byte, and the number of internal RISC operations
(ROPs) it will require at the decode stage. The predecode
information is stored in the instruction cache with each x86
instruction byte. It is passed during instruction fetching to the
decode stage, where it allows multiple x86 instructions to be
decoded in parallel. This avoids delaying the decode of one
instruction until the decode of the prior instruction has determined
its ending byte.
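
The per-byte annotation described above can be illustrated with a short sketch. It is not the processor's actual predecode encoding: the PredecodeBits layout, the function names, and the choice to record the ROP count on the start byte are all assumptions. The sketch also takes instruction lengths as given instead of implementing x86 length decoding.

```python
from dataclasses import dataclass


@dataclass
class PredecodeBits:
    start: bool   # byte is the first byte of an instruction
    end: bool     # byte is the last byte of an instruction
    opcode: bool  # byte is the opcode byte
    rops: int     # ROP count (assumption: recorded on the start byte only)


def predecode(instructions):
    """Annotate every byte of a sequence of instructions.

    `instructions` is a list of (length, opcode_offset, rop_count)
    tuples, a hypothetical input format for this sketch.
    """
    tags = []
    for length, opcode_offset, rop_count in instructions:
        if not 1 <= length <= 15:
            raise ValueError("an x86 instruction is 1 to 15 bytes long")
        for i in range(length):
            tags.append(PredecodeBits(
                start=(i == 0),
                end=(i == length - 1),
                opcode=(i == opcode_offset),
                rops=rop_count if i == 0 else 0,
            ))
    return tags


def instruction_spans(tags):
    """Recover (first_byte, last_byte) for every instruction from the tags alone."""
    starts = [i for i, t in enumerate(tags) if t.start]
    ends = [i for i, t in enumerate(tags) if t.end]
    return list(zip(starts, ends))
```

Because instruction_spans reads only the stored start and end bits, every boundary is known before any instruction is decoded. That property is what lets the decode stage work on several instructions at once instead of waiting for the prior instruction's length.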