EDN, September 24, 1998, pg 128
The SuperH Series comprises the SH-1, SH-2, SH-3, and SH-4
series of RISC µPs, µCs, and ASIC cores. The SH-1, -2, and -3
employ a fetch, decode, execute, memory-access, and write-
back-to-register pipeline. Hitachi built the devices around 25
32-bit registers that you access using load/store instructions.
These registers comprise 16 general registers (the SH-3 has
eight 32-bit shadow registers for context switching), five con-
trol registers, and four system registers. Depending on the
chip, the interrupt latency can be as low as seven clock cycles.
The chips use 32-bit datapaths to internally move data, but
all versions use a flexible external bus width. Unlike most RISC
families, the SuperH family also has devices with single-cycle
mask ROM and with one-time-programmable and flash memory in
densities as high as 256 kbytes.
Although devices in the SH series have a similar core, significant
differences exist. The major differences between the SH-1 and SH-2
are that the SH-2 features on-chip cache memory, higher speeds,
and a 32×32-bit multiply-accumulate
(MAC) unit. (The SH-1’s MAC unit is 16×16 bits.) To build
the SH-3, Hitachi added to the SH-2 a memory-management
unit (MMU), a barrel shifter, and the ability for conditional-
branch instructions to enable or disable the pipeline’s delay
slot. Disabling the delay slot, although decreasing perform-
ance, allows the processor to run more deterministically and
reduces the effects of pipeline flushes.
The 200-MHz, two-way-superscalar SH-4 µP includes a 3-D graphics
accelerator that Hitachi claims can perform at 1.2 Gflops. This
µP has four 32×32-bit multipliers fed by two 128-bit buses; it
also has four adders. You can load the multipliers with eight
operands in one cycle; the µP then adds the results in the next
cycle. This hardware performs rotations and transformations on
32-bit, single-precision, floating-point vectors.
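The multiply-then-add arrangement described above amounts to a four-element inner product, and a 4×4 transformation is four such inner products. The following is a minimal sketch in Python, assuming ordinary Python floats stand in for the 32-bit single-precision values. The names `dot4` and `transform4` are illustrative only and are not Hitachi mnemonics.

```python
def dot4(a, b):
    """Four multiplies (eight operands in total), then a sum of the products."""
    if len(a) != 4 or len(b) != 4:
        raise ValueError("dot4 expects two 4-element vectors")
    products = [x * y for x, y in zip(a, b)]  # stands in for the four multipliers
    return products[0] + products[1] + products[2] + products[3]  # the add step


def transform4(matrix, vector):
    """Apply a 4x4 matrix to a 4-element vector: one dot4 per matrix row."""
    if len(matrix) != 4:
        raise ValueError("transform4 expects a 4x4 matrix")
    return [dot4(row, vector) for row in matrix]
```

For example, passing a matrix that rotates 90 degrees about the z axis to `transform4` maps the homogeneous point (1, 0, 0, 1) onto the y axis.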
SuperH processors use a 16-bit instruction word to achieve
compact code. The instruction width limits the number of
basic operation codes, leaves room to encode only 16 general
registers, and allows only two operands per instruction.
Additionally, only 12 bits are available for an immediate offset;
jumps with immediate data must be in 2048-byte hops. However, the
SH-3 supports FAR-relative branches for position-independent code.
Although these restrictions lead to more instructions per task,
the overall result is significantly smaller code.
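To see how a 16-bit word constrains the encoding, consider splitting it into 4-bit fields. The sketch below assumes a two-register format with an opcode nibble at each end and two 4-bit register fields between them, plus a format whose low 12 bits hold a signed offset. These layouts and the function names are assumptions for illustration; consult Hitachi's programming manual for the actual encodings.

```python
def decode_two_register(word):
    """Split a 16-bit word into four nibbles: opcode, Rn, Rm, opcode."""
    if not 0 <= word <= 0xFFFF:
        raise ValueError("instruction word must fit in 16 bits")
    return {
        "op_hi": (word >> 12) & 0xF,
        "rn": (word >> 8) & 0xF,  # 4 bits, so only registers 0..15
        "rm": (word >> 4) & 0xF,  # 4 bits, so only registers 0..15
        "op_lo": word & 0xF,
    }


def displacement12(word):
    """Extract the low 12 bits of a word and sign-extend them."""
    if not 0 <= word <= 0xFFFF:
        raise ValueError("instruction word must fit in 16 bits")
    disp = word & 0xFFF
    return disp - 0x1000 if disp & 0x800 else disp
```

With two register fields taking eight of the 16 bits, only eight bits remain for the operation code, which is why a third operand or a larger register file does not fit.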
The SH-1 µPs can operate from external memory or from
on-chip program memory at a CPU frequency of 20 MHz. The
16-bit-wide external-memory bus can supply the CPU with
instructions from SRAM or fast DRAM on each cycle. If the
processor is operating from external memory, each data
access to external memory may take an additional one to two
cycles.
Instead of on-chip program memory, the SH-2 and SH-3
have a four-way, set-associative on-chip cache (4 kbytes for
the SH-2 and 8 kbytes for the SH-3), a 32-bit-wide memory
bus (for CPU-memory bandwidth as high as 60 MHz with a
synchronous-DRAM interface), and a 32-bit divide unit
(replacing the first chip’s bit-step-divide function on the
SH-2). You can reconfigure the cache as a two-way,
set-associative cache and 2 kbytes (SH-2) or 4 kbytes (SH-3)
of user-configurable RAM. The external-memory bus supports
multiprocessing; it has bus arbitration for multiple masters.
The SH-3 also has a unique RTOS feature: If a task or thread
crashes, the operating system can gracefully recover and not
have the errant task corrupt other tasks or RTOS environ-
ments.
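The cache organizations described above imply a split of each address into a tag, a set index, and a line offset. The following is a minimal sketch, assuming a 16-byte cache line (this text does not state the line size) and using hypothetical helper names.

```python
def num_sets(size_bytes, ways, line_bytes=16):
    """Number of sets in a set-associative cache of the given geometry."""
    sets, remainder = divmod(size_bytes, ways * line_bytes)
    if sets == 0 or remainder:
        raise ValueError("size must be a whole multiple of ways * line_bytes")
    return sets


def split_address(addr, size_bytes, ways, line_bytes=16):
    """Return (tag, set index, line offset) for an address."""
    sets = num_sets(size_bytes, ways, line_bytes)
    offset = addr % line_bytes            # byte within the line
    index = (addr // line_bytes) % sets   # which set to search
    tag = addr // (line_bytes * sets)     # compared against every way in the set
    return tag, index, offset
```

Under the assumed line size, `num_sets(4096, 4)` and `num_sets(2048, 2)` give the same set count, so in this model turning two ways into RAM leaves the index calculation unchanged.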
Power management: Sleep mode discontinues CPU pro-
cessing but keeps peripherals active. Standby stops every-
thing but maintains register and cache contents. The SH-2
and -3 provide several clock modes for reducing power; soft-
ware can adjust the clock rate during program operation. The
SH-3’s unified cache has a special low-power design that
dissipates only 100 mW in operation. Only the sense amps for the
cache set that hits are energized; those for the other three
sets stay switched off. The sense amps respond to a differential
of only 60 mV rather than the full 3.3-V swing.
Special instructions: A 16×16-bit MAC instruction (42-bit
accumulator) in the SH-1 and a 32×32-bit MAC instruction
(64-bit accumulator) in the SH-2 and SH-3 provide a fast DSP
function. Although Hitachi classifies the architecture as
load/store, some instructions reference memory. Delayed
branch instructions minimize pipeline disruption. An
instruction swaps upper and lower bytes. The SH-4 includes
a set of 3-D, floating-point instructions. The SH-DSP, a version
of the SH-2, supports 23 32-bit DSP instructions for
zero-overhead looping and modulo addressing.
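The relationship between the MAC operand widths and accumulator widths can be modeled in a few lines. This is a sketch only: it assumes signed two's-complement operands and an accumulator that wraps on overflow, neither of which this text specifies, and the function names are hypothetical.

```python
def wrap_signed(value, bits):
    """Reduce an integer to a signed two's-complement value of the given width."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def mac(samples, coeffs, operand_bits, acc_bits):
    """Sum of products, with operands and accumulator limited in width."""
    acc = 0
    for x, c in zip(samples, coeffs):
        x = wrap_signed(x, operand_bits)
        c = wrap_signed(c, operand_bits)
        acc = wrap_signed(acc + x * c, acc_bits)  # assumption: wrap, not saturate
    return acc
```

A call such as `mac(samples, coeffs, 16, 42)` models the SH-1 widths, and `mac(samples, coeffs, 32, 64)` models the SH-2 and SH-3 widths. A 16×16 product needs at most 32 bits, so a 42-bit accumulator leaves 10 bits of headroom for summing many products.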
Special on-chip peripherals: The SH-DSP contains a DSP as
an “on-chip peripheral.” This DSP unit shares the five-stage
pipeline with the integer unit; the DSP is not a coprocessor.
The CPU contains a fetch-and-decode unit, which manages
the instruction stream for both the integer and DSP units,
routing instructions to the appropriate unit (see EDN’s 1998
“DSP-architecture directory,” April 23, 1998, pg 54). Other,
more conventional peripherals include memory controllers,
a real-time clock, smart-card and serial codec interfaces, IrDA
support, a floating-point-unit coprocessor, a hardware divi-
sion unit, complex multifunction timers, a PCMCIA inter-
face, and an LCD controller.
The SH-3 contains an MMU with a 128-entry translation-
look-aside buffer (TLB). The TLB caches virtual-to-physical-
address translations from user-created page tables to external
memory, providing both data protection and virtual memo-
ry. Address translation employs a paging system that supports
1- or 4-kbyte pages. The MMU also handles multitasking by
providing multiple virtual-memory modes. Thus, each
process has its own virtual memory and cannot access the
resources of another process or the OS kernel.
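The translation path described above can be sketched as a small software model. It assumes a page table held as a simple mapping from virtual page number to physical page number, and it evicts the oldest cached entry when the 128 entries are full; the replacement policy and the class name `TinyTlb` are assumptions for illustration, not details from Hitachi.

```python
class TinyTlb:
    """Toy model of a TLB that caches page-table translations."""

    def __init__(self, page_table, page_bytes=4096, entries=128):
        if page_bytes not in (1024, 4096):
            raise ValueError("page size must be 1 or 4 kbytes")
        self.page_table = page_table  # virtual page number -> physical page number
        self.page_bytes = page_bytes
        self.entries = entries
        self.cache = {}               # cached translations, oldest first
        self.misses = 0

    def translate(self, vaddr):
        """Return the physical address for vaddr, filling the TLB on a miss."""
        vpn, offset = divmod(vaddr, self.page_bytes)
        if vpn not in self.cache:
            self.misses += 1
            if vpn not in self.page_table:
                raise LookupError(f"no mapping for virtual page {vpn:#x}")
            if len(self.cache) >= self.entries:
                del self.cache[next(iter(self.cache))]  # assumed policy: evict oldest
            self.cache[vpn] = self.page_table[vpn]
        return self.cache[vpn] * self.page_bytes + offset
```

Giving each process its own `page_table` mapping models the isolation the text describes: an address with no entry in that process's table raises an error instead of reaching another process's memory.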
Development tools: Hitachi and a number of third-party
vendors offer development-tool support for the SuperH.
Hitachi, Green Hills Software (www.ghs.com), and Cygnus
(www.cygnus.com) provide C and C++ compilers. Hitachi,
HP (www.hp.com), Orion Instruments (www.yokogawa.com),
and Sophia Systems (www.sophia.com) offer in-circuit
emulators. Wind River (www.windriver.com), Accelerated
Technology Inc (www.atinucleus.com), and Microsoft
(www.microsoft.com) provide RTOSs. Other tools include
assemblers, ROM emulators, integrated Windows-based
development environments, debuggers, floating-point
libraries, and networking libraries. Hitachi supports
Windows CE development with the $10,000 D9000, a
reconfigurable development platform.
Second sources: Seiko-Epson (www.seiko.com), VLSI,
STMicroelectronics, and Sony (www.sel.sony.com) are licensees.
See EDN’s Web-site version, www.ednmag.com, for a block
diagram.
Hitachi SuperH Series