Division algorithm

The CRAY-1 performs floating-point division by the method of reciprocal approximation. This facilitates the hardware implementation of a fully segmented functional unit. Because of this segmentation, operands may enter the reciprocal unit each clock period. In vector mode, results are produced at a rate of one per clock period. These results may be used in other vector operations during chaining because all functional units in the CRAY-1 have the same result rate.

The division algorithm that computes S1/S2 to full precision requires four operations:

    1. S3 = 1/S2            Reciprocal approximation
    2. S4 = (2 - S3 * S2)   Reciprocal iteration
    3. S5 = S1 * S3         Numerator * approximation
    4. S6 = S4 * S5         Half-precision quotient * correction factor
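
The four operations above can be sketched in Python as follows. This is an illustration only, not the hardware's arithmetic: `reciprocal_approximation` is a hypothetical stand-in that returns the exact reciprocal with an assumed relative error of 2**-30, and exact rational arithmetic replaces the machine's floating-point format, so no rounding in the multiplies is modeled.

```python
from fractions import Fraction

APPROX_BITS = 30  # assumed accuracy of the step-1 approximation


def reciprocal_approximation(s2: Fraction) -> Fraction:
    """Hypothetical stand-in for the reciprocal unit.

    Returns 1/s2 scaled by (1 - 2**-30), i.e. a reciprocal that is
    deliberately wrong by a relative error of 2**-30.
    """
    return (1 / s2) * (1 - Fraction(1, 2**APPROX_BITS))


def divide(s1: Fraction, s2: Fraction) -> Fraction:
    """Compute s1/s2 with the four-operation sequence from the text."""
    s3 = reciprocal_approximation(s2)  # 1. reciprocal approximation
    s4 = 2 - s3 * s2                   # 2. reciprocal iteration
    s5 = s1 * s3                       # 3. numerator * approximation
    s6 = s4 * s5                       # 4. half-precision quotient * correction factor
    return s6


def relative_error(approx: Fraction, exact: Fraction) -> Fraction:
    """Relative error of approx with respect to exact."""
    return abs(approx - exact) / abs(exact)


if __name__ == "__main__":
    s1, s2 = Fraction(355), Fraction(113)
    half = s1 * reciprocal_approximation(s2)   # result of steps 1 and 3 only
    full = divide(s1, s2)                      # result of all four steps
    print(relative_error(half, s1 / s2) == Fraction(1, 2**30))
    print(relative_error(full, s1 / s2) == Fraction(1, 2**60))
```

In this idealized model the correction factor squares the relative error of the approximation. The accuracy of the real result is limited by the machine's multiply operations, which the sketch does not model.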

The approximation is based on Newton's method. The reciprocal approximation at step 1 is correct to 30 bits. The additional Newton iteration at step 2 increases this accuracy to 47 bits. This iteration is applied as a correction factor with a full-precision multiply operation.
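
Why one iteration is enough can be seen from a short derivation. The symbol \varepsilon, the relative error of the step-1 approximation, is introduced here for illustration and does not appear in the original text; the derivation assumes exact arithmetic in the multiplies.

```latex
\begin{align*}
S_3 &= \frac{1}{S_2}\,(1 - \varepsilon), \qquad |\varepsilon| \le 2^{-30} \\
S_4 &= 2 - S_3 S_2 = 2 - (1 - \varepsilon) = 1 + \varepsilon \\
S_6 &= S_4 S_5 = (1 + \varepsilon)\, S_1 S_3
     = \frac{S_1}{S_2}\,(1 + \varepsilon)(1 - \varepsilon)
     = \frac{S_1}{S_2}\,\bigl(1 - \varepsilon^{2}\bigr)
\end{align*}
```

Under these assumptions the remaining relative error is \varepsilon^2, at most 2^{-60}, which is smaller than the 47-bit accuracy quoted for the result. The quoted figure is therefore presumably set by the precision of the multiply operations rather than by the iteration itself.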

Where 31 bits of accuracy is sufficient, the reciprocal approximation instruction may be used with the half-precision multiply to produce a half-precision quotient. The 18 low-order bits of the half-precision results are returned as zeros with a round applied to the low-order bit of the 30-bit result.
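
The effect on the result coefficient can be sketched as below. The 48-bit width is inferred from the 30 retained bits plus the 18 zeroed bits. The rounding rule (add half of the retained low-order bit, then truncate) is an assumption, since the text does not spell out how the round is applied, and the function name is hypothetical.

```python
COEFF_BITS = 48    # assumed coefficient width: 30 retained + 18 zeroed bits
ZEROED_BITS = 18   # low-order bits returned as zeros


def round_half_precision(coeff: int) -> int:
    """Round a 48-bit coefficient to 30 bits and zero the low 18 bits.

    Assumes round-half-up at the low-order bit of the 30-bit result.
    A carry out of the top bit (result == 1 << 48) is returned as-is;
    renormalization is left to the caller.
    """
    if not 0 <= coeff < (1 << COEFF_BITS):
        raise ValueError("coefficient must fit in 48 bits")
    rounded = (coeff + (1 << (ZEROED_BITS - 1))) >> ZEROED_BITS
    return rounded << ZEROED_BITS


if __name__ == "__main__":
    value = 0x8000_0002_0000          # bit 17 set: rounds up into bit 18
    print(hex(round_half_precision(value)))
```

Whatever the input, the 18 low-order bits of the returned value are zero, matching the description of the half-precision result.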

A scalar quotient is computed in 29 clock periods since operations 2 and 3 issue in successive clock periods. A vector quotient requires effectively three vector times since operations 1 and 3 are chained together. This hides one of the multiply operations. A vector time is one clock period for each element in the vector. For example, two 50-element vectors are divided in about 3 * 50 clock periods. This estimate does not include overhead associated with the functional units.
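
The vector estimate amounts to simple arithmetic, sketched here. The function name is hypothetical, and like the estimate in the text the sketch ignores functional unit overhead.

```python
VECTOR_TIMES_PER_QUOTIENT = 3  # "effectively three vector times" per the text


def vector_divide_clocks(n_elements: int) -> int:
    """Approximate clock periods to divide two n-element vectors.

    One vector time is one clock period per element; functional unit
    overhead is not included.
    """
    if n_elements < 0:
        raise ValueError("element count must be non-negative")
    return VECTOR_TIMES_PER_QUOTIENT * n_elements


if __name__ == "__main__":
    print(vector_divide_clocks(50))  # 150
```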