EasyManuals Logo

Intel ARCHITECTURE IA-32 User Manual

Intel ARCHITECTURE IA-32
568 pages
To Next Page IconTo Next Page
To Next Page IconTo Next Page
To Previous Page IconTo Previous Page
To Previous Page IconTo Previous Page
Page #343 background imageLoading...
Page #343 background image
Optimizing Cache Usage 6
6-53
The baseline for performance comparison is the throughput (bytes/sec)
of 8-MByte region memory copy on a first-generation Pentium M
processor (CPUID signature 0x69n) with a 400-MHz system bus using
byte-sequential technique similar to that shown in Example 6-10. The
degree of improvement relative to the performance baseline for newer
IA-32 processors and platforms with higher system bus speed using
different coding techniques are compared.
The second coding technique moves data at 4-Byte granularity using
REP string instruction. The third column compares the performance of
the coding technique listed in Example 6-11. The fourth column of
performance compares the throughput of fetching 4-KBytes of data at a
time (using hardware prefetch to aggregate bus read transactions) and
writing to memory via 16-Byte streaming stores.
Increases in bus speed is the primary contributor to throughput
improvements. The technique shown in Example 6-12 will likely take
advantage of the faster bus speed in the platform more efficiently.
Additionally, increasing the block size to multiples of 4-KBytes while
keeping the total working set within the second-level cache can improve
the throughput slightly.
The relative performance figure shown in Table 6-2 is representative of
clean microarchitectual conditions within a processor (e.g. looping s
simple sequence of code many times). The net benefit of integrating a
specific memory copy routine into an application (full-featured
applications tend to create many complicated micro-architectural
conditions) will vary for each application.
Deterministic Cache Parameters
If CPUID support the function leaf with input EAX = 4, this is referred
to as the deterministic cache parameter leaf of CPUID (see CPUID
instruction in IA-32 Intel® Architecture Software Developers Manual,
Volume 2A). Software can use the deterministic cache parameter leaf to

Table of Contents

Questions and Answers:

Question and Answer IconNeed help?

Do you have a question about the Intel ARCHITECTURE IA-32 and is the answer not in the manual?

Intel ARCHITECTURE IA-32 Specifications

General IconGeneral
Instruction Setx86
Instruction Set TypeCISC
Memory SegmentationSupported
Operating ModesReal mode, Protected mode, Virtual 8086 mode
Max Physical Address Size36 bits (with PAE)
Max Virtual Address Size32 bits
ArchitectureIA-32 (Intel Architecture 32-bit)
Addressable Memory4 GB (with Physical Address Extension up to 64 GB)
Floating Point Registers8 x 80-bit
MMX Registers8 x 64-bit
SSE Registers8 x 128-bit
RegistersGeneral-purpose registers (EAX, EBX, ECX, EDX, ESI, EDI, ESP, EBP), Segment registers (CS, DS, SS, ES, FS, GS), Instruction pointer (EIP), Flags register (EFLAGS)
Floating Point UnitYes (x87)

Related product manuals