Cray CRAY-1 - Division Algorithm

216 pages

Division algorithm

The CRAY-1 performs floating point division by the method of reciprocal approximation. This facilitates the hardware implementation of a fully-segmented functional unit. Operands may enter the reciprocal unit each clock period because of this segmentation. In vector mode, results are produced at a one clock period rate. These results may be used in other vector operations during chaining because all functional units in the CRAY-1 have the same result rate.
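The effect of segmentation on timing can be pictured with a small sketch. It is only an illustration: the function name and the `latency` value are hypothetical, since this section does not give the length of the reciprocal unit's pipeline.

```python
def segmented_unit_schedule(n_operands: int, latency: int) -> list[tuple[int, int]]:
    """Return (entry_clock, result_clock) for each operand.

    Assumes one operand enters per clock period and that every operand
    spends `latency` clock periods in the unit (a hypothetical parameter).
    """
    return [(clock, clock + latency) for clock in range(n_operands)]


schedule = segmented_unit_schedule(n_operands=4, latency=5)

# Once the pipeline has filled, results emerge on consecutive clock periods.
result_clocks = [result for _, result in schedule]
```

Under these assumptions the first result appears after `latency` clock periods and each later result follows one clock period behind the previous one, which is the one clock period result rate described above.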

The division algorithm that computes S1/S2 to full precision requires four operations:

1.  S3 = 1/S2             Reciprocal approximation
2.  S4 = (2 - S3 * S2)    Reciprocal iteration
3.  S5 = S1 * S3          Numerator * approximation
4.  S6 = S4 * S5          Half-precision quotient * correction factor
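The four operations can be traced with a short Python sketch. It is not a model of the hardware: `reciprocal_approximation` is a hypothetical stand-in that rounds an ordinary floating point reciprocal to about 30 significant bits, and the multiplies use Python's 64-bit floats rather than the machine's own arithmetic.

```python
import math


def reciprocal_approximation(s2: float, bits: int = 30) -> float:
    """Stand-in for operation 1: 1/s2 rounded to `bits` significant bits."""
    mantissa, exponent = math.frexp(1.0 / s2)  # |mantissa| lies in [0.5, 1)
    scale = 2.0 ** bits
    return math.ldexp(round(mantissa * scale) / scale, exponent)


def divide(s1: float, s2: float) -> float:
    """Compute s1/s2 with the four-operation sequence."""
    s3 = reciprocal_approximation(s2)  # 1. reciprocal approximation
    s4 = 2.0 - s3 * s2                 # 2. reciprocal iteration
    s5 = s1 * s3                       # 3. numerator * approximation
    s6 = s4 * s5                       # 4. half-precision quotient * correction factor
    return s6
```

Calling `divide(355.0, 113.0)` returns a value much closer to 355/113 than the intermediate product `s5` alone, because `s4` cancels the first-order error of the approximate reciprocal.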

The approximation is based on Newton's method. The reciprocal approximation at step 1 is correct to 30 bits. The additional Newton iteration at step 2 increases this accuracy to 47 bits. This iteration is applied as a correction factor with a full-precision multiply operation.
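A short derivation suggests why a single iteration is enough. It assumes that "correct to 30 bits" means the step 1 result has a relative error of at most 2^-30, and it treats the multiplies as exact; the symbol ε is introduced here for illustration only.

```latex
\begin{align*}
S_3 &= \frac{1-\varepsilon}{S_2}, \qquad |\varepsilon| \le 2^{-30} \\
S_4 &= 2 - S_3 S_2 = 1 + \varepsilon \\
S_6 &= S_4 S_5 = (1+\varepsilon)\, S_1\, \frac{1-\varepsilon}{S_2}
     = \frac{S_1}{S_2}\,\bigl(1-\varepsilon^{2}\bigr)
\end{align*}
```

Under these assumptions the relative error left after the correction is ε², at most 2^-60, which is smaller than the 47-bit accuracy quoted above. The quoted figure is therefore presumably limited by the precision of the multiply operations rather than by the iteration itself.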

Where 31 bits of accuracy is sufficient, the reciprocal approximation instruction may be used with the half-precision multiply to produce a half-precision quotient. The 18 low-order bits of the half-precision results are returned as zeros with a round applied to the low-order bit of the 30-bit result.
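The rounding described above can be sketched on an integer coefficient. This is an illustration under stated assumptions: the coefficient is taken to be 48 bits wide (30 kept bits plus 18 zeroed bits), the round is taken to be an add of one half unit at the 30-bit boundary, and a carry out of the top bit, which would require renormalization, is not handled. The names are hypothetical.

```python
LOW_BITS = 18    # low-order bits returned as zeros
COEFF_BITS = 48  # assumed coefficient width: 30 kept bits + 18 zeroed bits


def round_to_half_precision(coefficient: int) -> int:
    """Round at the 30-bit boundary, then zero the 18 low-order bits."""
    half_unit = 1 << (LOW_BITS - 1)  # half of the low-order kept bit
    return ((coefficient + half_unit) >> LOW_BITS) << LOW_BITS


rounded_up = round_to_half_precision(0x800000020000)    # exactly half: rounds up
rounded_down = round_to_half_precision(0x80000001FFFF)  # just below half: rounds down
```

Here `rounded_up` is `0x800000040000` and `rounded_down` is `0x800000000000`; in both cases the 18 low-order bits are zero.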

A scalar quotient is computed in 29 clock periods since operations 2 and 3 issue in successive clock periods.

A vector quotient requires effectively three vector times since operations 1 and 3 are chained together. This hides one of the multiply operations. A vector time is one clock period for each element in the vector. For example, two 50-element vectors are divided in about 3 * 50 clock periods. This estimate does not include overhead associated with the functional units.
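The vector estimate is simple arithmetic, shown here as a sketch. The function name is hypothetical, and like the figure in the text the result excludes functional unit overhead.

```python
def vector_divide_clock_periods(n_elements: int, vector_times: int = 3) -> int:
    """Estimate clock periods for a vector divide.

    One vector time is one clock period per element; the divide takes
    effectively three vector times because operations 1 and 3 are chained.
    """
    return vector_times * n_elements


estimate = vector_divide_clock_periods(50)  # the 50-element example from the text
```

For the 50-element example this gives 150 clock periods, matching the 3 * 50 figure above.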

2240004    3-30    E
