ARM Cortex-M3 - Page 359

Instruction Timing

Unrestricted Access Non-Confidential

Cycle count information:

• P = pipeline reload

• N = count of elements

Instruction type   Size   Cycles count   Description

Combined Branch    16     1+P            CBZ.

Extended           16     0-1            IT and NOP (includes YIELD).

Divide             32     2-12           SDIV and UDIV. 32/32 divides, both signed and
                                         unsigned, with a 32-bit quotient result (no
                                         remainder; it can be derived by subtraction).
                                         The divide terminates early when dividend and
                                         divisor are close in size.

Sleep              32     1+W            WFI, WFE, and SEV are in the class of hinted NOP
                                         instructions that control sleep behavior.

Barriers           16     1+B            ISB, DSB, and DMB are barrier instructions that
                                         ensure certain actions have taken place before
                                         the next instruction is executed.

Saturation         32     1              SSAT and USAT perform saturation on a register.
                                         They perform three tasks: they normalize the
                                         value using a shift, test for overflow from a
                                         selected bit position (the Q value), and set
                                         the xPSR Q bit. Saturation refers to the largest
                                         unsigned value or the largest/smallest signed
                                         value for the size selected.
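
The saturation behavior described in the Saturation row can be sketched in portable C. This is only an illustrative software model under stated assumptions, not ARM's implementation: the names `ssat_model` and `usat_model` are hypothetical, the optional shift (normalize) step is left out, and the `q` flag merely stands in for the xPSR Q bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of signed saturation to `bits` bits (1..32).
 * *q stands in for the xPSR Q bit: it is set when the value is clamped
 * and is never cleared here, so it behaves as a sticky flag. */
int32_t ssat_model(int32_t value, unsigned bits, bool *q)
{
    assert(q != 0 && bits >= 1u && bits <= 32u);
    int64_t max = ((int64_t)1 << (bits - 1u)) - 1;   /* largest signed value  */
    int64_t min = -((int64_t)1 << (bits - 1u));      /* smallest signed value */
    if (value > max) { *q = true; return (int32_t)max; }
    if (value < min) { *q = true; return (int32_t)min; }
    return value;
}

/* Hypothetical model of unsigned saturation to `bits` bits (0..31). */
uint32_t usat_model(int32_t value, unsigned bits, bool *q)
{
    assert(q != 0 && bits <= 31u);
    int64_t max = ((int64_t)1 << bits) - 1;          /* largest unsigned value */
    if (value > max) { *q = true; return (uint32_t)max; }
    if (value < 0)   { *q = true; return 0u; }
    return (uint32_t)value;
}
```

With these definitions, `ssat_model(1000, 8, &q)` clamps to 127 and sets `q`, while `ssat_model(5, 8, &q)` returns 5 and leaves `q` unchanged.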

a. Branches take one cycle for the instruction and then a pipeline reload for the target instruction. Non-taken branches are 1 cycle total. Taken branches with an immediate are normally 1 cycle of pipeline reload (2 cycles total). Taken branches with a register operand are normally 2 cycles of pipeline reload (3 cycles total). Pipeline reload takes longer when branching to unaligned 32-bit instructions, and when accessing slower memory. A branch hint is emitted on the code bus so that a slower system can pre-load the target. This can reduce the branch target penalty for slower memory, but never to less than shown here.
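
As a rough illustration of footnote a, the minimum branch cost can be written as a small C helper. The enum and function names are hypothetical, and the result is a lower bound only: it ignores the extra reload time for unaligned 32-bit targets and slower memory.

```c
#include <assert.h>

/* Hypothetical classification of a branch by its operand. */
enum branch_kind { BRANCH_IMMEDIATE, BRANCH_REGISTER };

/* Minimum total cycles for a branch, following footnote a:
 * 1 cycle for the instruction, plus a pipeline reload only if taken
 * (1 cycle for an immediate target, 2 cycles for a register target). */
unsigned branch_min_cycles(enum branch_kind kind, int taken)
{
    assert(kind == BRANCH_IMMEDIATE || kind == BRANCH_REGISTER);
    if (!taken) {
        return 1u;                       /* non-taken: no pipeline reload */
    }
    unsigned reload = (kind == BRANCH_IMMEDIATE) ? 1u : 2u;
    return 1u + reload;
}
```

For example, `branch_min_cycles(BRANCH_REGISTER, 1)` gives 3, matching the "3 cycles total" case above.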

b. Generally, load-store instructions take two cycles for the first access and one cycle for each additional access. Stores with immediate offsets take one cycle.
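
The rule in footnote b can be sketched as a one-line estimate. The function name and the `store_with_imm_offset` flag are hypothetical, and treating the one-cycle case as applying only to a single-access store is an assumption; the footnote itself says "generally", so this is a first approximation rather than an exact count.

```c
#include <assert.h>
#include <stdbool.h>

/* Estimated cycles for a load-store instruction making `accesses`
 * memory accesses: 2 for the first access, 1 for each additional one.
 * Assumption: a single store with an immediate offset takes 1 cycle. */
unsigned load_store_cycles(unsigned accesses, bool store_with_imm_offset)
{
    assert(accesses >= 1u);
    if (store_with_imm_offset && accesses == 1u) {
        return 1u;
    }
    return 2u + (accesses - 1u);
}
```

Under this model a four-access load multiple is estimated at 5 cycles.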

c. UMULL/SMULL/UMLAL/SMLAL use early termination depending on the size of the source values. These are interruptible (abandoned/restarted), with a worst-case latency of one cycle. MLAL versions take four to seven cycles and MULL versions take three to five cycles. For MLAL, the signed version is one cycle longer than the unsigned.

d. IT instructions can be folded.

e. DIV timings depend on dividend and divisor. DIV is interruptible (abandoned/restarted), with a worst-case latency of one cycle. When dividend and divisor are similar in size, the divide terminates quickly. Minimum time is for cases of a divisor larger than the dividend and a divisor of zero. A divisor of zero returns zero (not a fault), although a debug trap is available to catch this case.
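
The quotient-only result and the zero-divisor case in footnote e can be modelled in C as follows. This is a sketch under assumptions: the function name is hypothetical, the divide-by-zero trap is taken to be disabled, and the remainder is recovered as dividend minus quotient times divisor, since the divide itself produces no remainder.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a signed 32/32 divide that returns the quotient
 * and derives the remainder by multiply-and-subtract. */
int32_t sdiv_with_remainder(int32_t dividend, int32_t divisor, int32_t *remainder)
{
    assert(remainder != 0);
    int32_t q;
    if (divisor == 0) {
        q = 0;                            /* divisor of zero returns zero */
    } else if (dividend == INT32_MIN && divisor == -1) {
        q = INT32_MIN;                    /* avoid C undefined behavior; result wraps */
    } else {
        q = dividend / divisor;           /* C division truncates toward zero */
    }
    /* remainder = dividend - q * divisor, computed with unsigned
     * wrap-around so no intermediate step overflows in C. */
    *remainder = (int32_t)((uint32_t)dividend - (uint32_t)q * (uint32_t)divisor);
    return q;
}
```

Note that with a zero divisor the derived remainder is simply the dividend, which follows from the formula rather than from anything the hardware reports.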

f. Sleep is one cycle for the instruction plus as many sleep cycles as appropriate. WFE uses only one cycle when the event has already passed. WFI is normally more than one cycle unless an interrupt happens to pend exactly when entering WFI.

g. ISB takes one cycle (acts as a branch). DMB and DSB take one cycle unless data is pending in the write buffer or LSU. If an interrupt comes in during a barrier, the barrier is abandoned/restarted.

Table 18-1 Instruction timings (continued)

Instruction type Size Cycles count Description
