2.1 Prefetch and Predecode
Figure 2-1 (top-left corner) shows the processor's prefetch and predecode logic being fed with data from the external bus via the memory management unit. Prefetching attempts to keep the instruction cache and prefetch cache filled ahead of the execution pipeline's fetch requirements. The processor prefetches only during fetch-stage misses in the instruction cache, which typically occur during taken branches. When a miss occurs, the prefetcher initiates a 32-byte burst memory read cycle on the bus to fill the prefetch cache. For cacheable accesses, the prefetch cache also fills 32-byte lines in the instruction cache. For non-cacheable accesses, the prefetch cache provides instructions directly to the execution pipeline.
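
The fill behavior can be summarized in a short C model. The sketch below is not the 5K86's actual logic; the names (prefetch_on_fetch_miss, burst_read_32, icache_fill, pipeline_feed) and the single-line prefetch-cache structure are illustrative assumptions used only to restate the cacheable and non-cacheable paths described above.

    #include <stdbool.h>
    #include <stdint.h>

    #define BURST_BYTES 32  /* one burst read / one cache line */

    /* Illustrative stand-ins for the hardware blocks named in the text. */
    typedef struct { uint8_t data[BURST_BYTES]; uint32_t tag; bool valid; } line_t;

    extern void burst_read_32(uint32_t addr, uint8_t out[BURST_BYTES]); /* bus read  */
    extern void icache_fill(uint32_t addr, const uint8_t line[BURST_BYTES]);
    extern void pipeline_feed(const uint8_t bytes[BURST_BYTES]);

    static line_t prefetch_cache;  /* simplified single-line prefetch cache */

    /* Invoked only on a fetch-stage miss in the instruction cache,
     * which typically happens on a taken branch. */
    void prefetch_on_fetch_miss(uint32_t linear_addr, bool cacheable)
    {
        uint32_t line_addr = linear_addr & ~(uint32_t)(BURST_BYTES - 1);

        /* 32-byte burst memory read cycle fills the prefetch cache. */
        burst_read_32(line_addr, prefetch_cache.data);
        prefetch_cache.tag   = line_addr;
        prefetch_cache.valid = true;

        if (cacheable) {
            /* Cacheable access: also fill a 32-byte instruction-cache line. */
            icache_fill(line_addr, prefetch_cache.data);
        } else {
            /* Non-cacheable access: bytes go straight to the execution pipeline. */
            pipeline_feed(prefetch_cache.data);
        }
    }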
The instruction cache contains a copy of certain fields in the current code-segment descriptor. During a taken branch, the fetch logic adds the code-segment base to the effective address and places the resulting linear address in the prefetch program counter, which then increments as a linear address along a sequential stream. All branches during prefetching are assumed to be not taken.
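
As a rough model of that address calculation, consider the C sketch below. The structure and function names are hypothetical, and the assumption that the prefetch program counter advances by one 32-byte line per step is an illustration, not a statement of the actual hardware sequencing.

    #include <stdint.h>

    /* Fields the text says are copied from the current code-segment descriptor. */
    typedef struct {
        uint32_t base;   /* code-segment base address              */
        /* ... limit and attribute fields omitted for brevity ... */
    } cs_copy_t;

    static uint32_t prefetch_pc;  /* prefetch program counter (a linear address) */

    /* On a taken branch: linear address = code-segment base + effective address. */
    void prefetch_redirect(const cs_copy_t *cs, uint32_t effective_addr)
    {
        prefetch_pc = cs->base + effective_addr;
    }

    /* Between branches the prefetch program counter simply increments along a
     * sequential stream; any branch seen while prefetching is assumed not taken. */
    uint32_t next_prefetch_address(void)
    {
        uint32_t addr = prefetch_pc;
        prefetch_pc += 32;        /* assumed: advance by one 32-byte line */
        return addr;
    }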
The processor predecodes its x86-instruction stream in the same clock in which x86 instructions come out of the prefetch cache. An x86 instruction can be from 1 to 15 bytes long. Predecoding annotates each instruction byte with information that later enables the decode stage of the pipeline to perform more efficiently. The predecode information identifies whether the byte is the start and/or end of an x86 instruction, whether it is an opcode byte, and the number of internal RISC operations (ROPs) it will require at the decode stage. The predecode information is stored in the instruction cache with each x86 instruction byte. It is passed during instruction fetching to the decode stage, where it allows multiple x86 instructions to be decoded in parallel. This avoids delaying the decode of one instruction until the decode of the prior instruction has determined its ending byte.
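
The exact bit encoding of the predecode information is not given in this section. The C sketch below only illustrates the kind of per-byte annotation described (start, end, opcode, ROP count) and how pre-marked start bits let the decoder locate several instruction boundaries in one pass instead of serially computing each instruction's length first; all names and the field layout are assumptions.

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Per-byte predecode annotation of the kind described above.
     * The real bit layout is not specified here; fields are illustrative. */
    typedef struct {
        bool    start;      /* byte starts an x86 instruction            */
        bool    end;        /* byte ends an x86 instruction              */
        bool    opcode;     /* byte is an opcode byte                    */
        uint8_t rop_count;  /* ROPs the instruction needs at decode time */
    } predecode_t;

    /* With start/end bits already stored alongside each cached instruction byte,
     * several instruction boundaries can be found in a single pass, so the
     * decode of one instruction need not wait for the prior instruction's
     * ending byte to be determined. */
    size_t find_instruction_starts(const predecode_t *pd, size_t nbytes,
                                   size_t starts[], size_t max_starts)
    {
        size_t n = 0;
        for (size_t i = 0; i < nbytes && n < max_starts; i++) {
            if (pd[i].start)
                starts[n++] = i;  /* each start can feed a separate decoder slot */
        }
        return n;
    }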