Intel ARCHITECTURE IA-32 - Deterministic Cache Parameters

Optimizing Cache Usage 6
The baseline for performance comparison is the throughput (bytes/sec)
of an 8-MByte memory region copy on a first-generation Pentium M
processor (CPUID signature 0x69n) with a 400-MHz system bus, using a
byte-sequential technique similar to that shown in Example 6-10. The
degree of improvement relative to this baseline is compared for newer
IA-32 processors and platforms with higher system bus speeds, using
different coding techniques.
The second coding technique moves data at 4-Byte granularity using the
REP string instruction. The third column compares the performance of
the coding technique listed in Example 6-11. The fourth column
compares the throughput of fetching 4 KBytes of data at a time (using
hardware prefetch to aggregate bus read transactions) and writing to
memory via 16-Byte streaming stores.
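The fourth technique described above can be illustrated roughly as
follows. This is a minimal sketch under stated assumptions, not the
manual's Example 6-12: the helper name block_stream_copy, the 64-byte
cache-line size, and the use of SSE2 compiler intrinsics are all
assumptions made for illustration. Each 4-KByte block is first read
sequentially, one access per assumed cache line, so that the hardware
prefetcher can bring the block into the cache, and is then written to
the destination with 16-Byte non-temporal stores.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2: _mm_load_si128, _mm_stream_si128 */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_BYTES 4096u  /* block size from the text: 4 KBytes */
#define LINE_BYTES  64u    /* ASSUMPTION: 64-byte cache lines */

/* Hypothetical helper: copy n bytes from src to dst.
 * Both pointers must be 16-byte aligned (required by the
 * aligned load and the streaming store used below). */
static void block_stream_copy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    assert((uintptr_t)d % 16 == 0 && (uintptr_t)s % 16 == 0);

    while (n >= BLOCK_BYTES) {
        /* Pass 1: touch one byte per assumed cache line so the
         * whole block is fetched into the cache. The volatile
         * sink keeps the compiler from discarding the loads. */
        volatile unsigned char sink;
        for (size_t off = 0; off < BLOCK_BYTES; off += LINE_BYTES)
            sink = s[off];
        (void)sink;

        /* Pass 2: copy the now-cached block with 16-byte
         * streaming (non-temporal) stores. */
        for (size_t off = 0; off < BLOCK_BYTES; off += 16) {
            __m128i v = _mm_load_si128((const __m128i *)(s + off));
            _mm_stream_si128((__m128i *)(d + off), v);
        }

        s += BLOCK_BYTES;
        d += BLOCK_BYTES;
        n -= BLOCK_BYTES;
    }

    /* Make the streaming stores globally visible before returning. */
    _mm_sfence();

    /* Tail shorter than one block: fall back to memcpy. */
    if (n > 0)
        memcpy(d, s, n);
}
```

Whether this layout actually outperforms a plain memcpy depends on the
processor, bus speed, and surrounding code, so it should be measured
in the target application rather than assumed.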
Increased bus speed is the primary contributor to throughput
improvements. The technique shown in Example 6-12 will likely take
advantage of the faster bus speed in the platform more efficiently.
Additionally, increasing the block size to multiples of 4 KBytes while
keeping the total working set within the second-level cache can improve
the throughput slightly.
The relative performance figures shown in Table 6-2 are representative
of clean microarchitectural conditions within a processor (e.g., looping
a simple sequence of code many times). The net benefit of integrating a
specific memory copy routine into an application will vary for each
application, because full-featured applications tend to create many
complicated microarchitectural conditions.
Deterministic Cache Parameters
If CPUID supports the function leaf with input EAX = 4, this is referred
to as the deterministic cache parameter leaf of CPUID (see the CPUID
instruction in the IA-32 Intel® Architecture Software Developer's Manual,
Volume 2A). Software can use the deterministic cache parameter leaf to
