Instruction Timing

Unrestricted Access Non-Confidential

18.3 Load-store timings

This section describes how best to pair load and store instructions to achieve further
reductions in timing.

• STR Rx,[Ry,#imm] is always one cycle. This is because the address generation is
performed in the initial cycle, and the data store is performed while the next
instruction is executing. If the store is to the store buffer, and the store buffer
is full, the next instruction is delayed until the store can complete. If the store
is not to the store buffer, such as a store to the Code segment, and that transaction
stalls, the impact on timing is felt only if another load or store operation is
executed before the transaction completes.

• LDR Rx!,[any] is not normally pipelined. That is, a base-update load is generally
at least a two-cycle operation (more if stalled). However, if the next instruction
does not need to read from a register, the load is reduced to one cycle.
Non-register-writing instructions include CMP, TST, NOP, and non-taken IT-controlled
instructions.

• LDR PC,[any] is always a blocking operation. It takes a minimum of two cycles for
the load and three cycles for the pipeline reload, so at least five cycles in total
(more if stalled on the load or the fetch).

• LDR Rx,[PC,#imm] might add a cycle because of contention with the fetch unit.

• TBB and TBH are also blocking operations. They take a minimum of two cycles for
the load, one cycle for the add, and three cycles for the pipeline reload, so at
least six cycles in total (more if stalled on the load or the fetch).

• Any LDR is pipelined when possible. This means that if the next instruction is an
LDR or a non-base-updating STR, and the destination of the first LDR is not used to
compute the address for the next instruction, then one cycle is removed from the
cost of the next instruction. For example, an LDR can be followed by an STR that
writes out what the LDR loaded. Multiple LDRs can also be pipelined together. Some
optimized examples:

— LDR R0,[R1]; LDR R1,[R2] - normally three cycles total

— LDR R0,[R1,R2]; STR R0,[R3,#20] - normally three cycles total

— LDR R0,[R1,R2]; STR R1,[R3,R2] - normally three cycles total

— LDR R0,[R1,R5]; LDR R1,[R2]; LDR R2,[R3,#4] - normally four cycles total.
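
The pairing rules above can be sketched as a small cycle-count model. The Python below is an illustrative estimate only, not a description of the processor's implementation. It assumes a nominal cost of two cycles for an LDR and for a register-offset STR (these nominal figures are not stated in this section and are chosen so that the totals match the examples above), takes the one-cycle cost of STR Rx,[Ry,#imm] and the minimum blocking costs from the text, and ignores stalls, base-update forms, and fetch contention. The names parse, nominal, and estimate_cycles are hypothetical helpers introduced for this sketch.

```python
import re
from dataclasses import dataclass

# Assumed nominal costs in cycles (NOT stated in this section).
NOMINAL_LDR = 2        # consistent with "LDR; LDR" totalling three cycles
NOMINAL_STR_REG = 2    # assumption for register-offset stores
# From the text: STR Rx,[Ry,#imm] is always one cycle.
STR_IMM = 1

# Minimum costs of the blocking operations, taken from the text.
PIPELINE_RELOAD = 3
MIN_LDR_PC = 2 + PIPELINE_RELOAD             # load + reload
MIN_TABLE_BRANCH = 2 + 1 + PIPELINE_RELOAD   # load + add + reload (TBB, TBH)


@dataclass
class MemOp:
    op: str            # "LDR" or "STR"
    rt: str            # register that is loaded or stored
    addr_regs: tuple   # registers used to form the address
    imm_offset: bool   # True for [Ry] or [Ry,#imm]


def parse(text):
    """Parse forms such as 'LDR R0,[R1,R2]' (no base update, no PC)."""
    m = re.fullmatch(r"\s*(LDR|STR)\s+(R\d+)\s*,\s*\[([^\]]+)\]\s*", text)
    if m is None:
        raise ValueError(f"unsupported instruction: {text!r}")
    parts = [p.strip() for p in m.group(3).split(",")]
    regs = tuple(p for p in parts if p.startswith("R"))
    return MemOp(m.group(1), m.group(2), regs, imm_offset=(len(regs) == 1))


def nominal(ins):
    """Unpipelined cost of one instruction under the assumptions above."""
    if ins.op == "STR" and ins.imm_offset:
        return STR_IMM
    return NOMINAL_LDR if ins.op == "LDR" else NOMINAL_STR_REG


def estimate_cycles(sequence):
    """Estimate total cycles for a ';'-separated LDR/STR sequence."""
    total = 0
    prev = None
    for ins in (parse(s) for s in sequence.split(";")):
        cost = nominal(ins)
        # Pipelining rule: the previous instruction is an LDR whose
        # destination is not used to compute this instruction's address.
        pipelined = (prev is not None and prev.op == "LDR"
                     and prev.rt not in ins.addr_regs)
        if pipelined and cost > 1:
            cost -= 1
        total += cost
        prev = ins
    return total


if __name__ == "__main__":
    for seq in ("LDR R0,[R1]; LDR R1,[R2]",
                "LDR R0,[R1,R2]; STR R0,[R3,#20]",
                "LDR R0,[R1,R2]; STR R1,[R3,R2]",
                "LDR R0,[R1,R5]; LDR R1,[R2]; LDR R2,[R3,#4]"):
        print(seq, "->", estimate_cycles(seq))
```

Under these assumptions the model gives the same totals as the four optimized examples. A sequence in which the second instruction addresses through the register just loaded, such as LDR R0,[R1]; LDR R2,[R0], is not pipelined by the model and is counted at the full nominal cost.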
