Intel ARCHITECTURE IA-32 User Manual

Multi-Core and Hyper-Threading Technology 7
block size for loop blocking should be determined by dividing the target
cache size by the number of logical processors available in a physical
processor package. Typically, some cache lines are needed to access
data that are not part of the source or destination buffers used in cache
blocking, so the block size can be chosen between one quarter and one
half of the target cache (see also Chapter 3).
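The sizing arithmetic above can be sketched in a few lines of C. This is only an illustration: the names `block_range`, `block_size_per_logical_processor`, and `block_size_quarter_to_half` are hypothetical and do not come from the manual, and the cache size and processor count are assumed to be supplied by the caller.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type: a suggested range for the blocking size, in bytes. */
typedef struct {
    size_t low;   /* one quarter of the target cache */
    size_t high;  /* one half of the target cache */
} block_range;

/* Divide the target cache size by the number of logical processors
 * in the physical package (hypothetical helper name). */
size_t block_size_per_logical_processor(size_t cache_bytes,
                                        unsigned logical_processors)
{
    assert(logical_processors > 0);
    return cache_bytes / logical_processors;
}

/* Range of one quarter to one half of the target cache, leaving room
 * for cache lines that hold data outside the blocked buffers. */
block_range block_size_quarter_to_half(size_t cache_bytes)
{
    block_range r;
    r.low = cache_bytes / 4;
    r.high = cache_bytes / 2;
    return r;
}
```

For a 512 KB target cache and two logical processors, the first helper yields 256 KB, which coincides with the upper end of the quarter-to-half range.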
Software can use the deterministic cache parameter leaf of CPUID to
discover which subset of logical processors is sharing a given cache.
(See Chapter 6.) Therefore, the guideline above can be extended to allow all
the logical processors serviced by a given cache to use the cache
simultaneously, by setting the upper limit of the block size to the total
size of the cache divided by the number of logical processors serviced
by that cache. This technique can also be applied to single-threaded
applications that will be used as part of a multitasking workload.
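As a rough sketch of how the extended guideline might be applied, the C fragment below decodes the register values returned by the deterministic cache parameter leaf (CPUID with EAX = 4 and ECX set to the cache index) and derives the block size limit. The type and function names are hypothetical. Executing CPUID itself is not shown, since that is compiler specific, so the three register values are assumed to be passed in by the caller. Note that the field in EAX[25:14] reports the maximum number of addressable logical processor IDs sharing the cache, which may exceed the number actually present, so the result is a conservative limit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for fields decoded from CPUID leaf 4. */
typedef struct {
    unsigned cache_type;      /* EAX[4:0]; 0 means no more caches */
    unsigned level;           /* EAX[7:5]; cache level, starting at 1 */
    unsigned max_sharing_ids; /* EAX[25:14] + 1 */
    size_t   size_bytes;      /* ways * partitions * line size * sets */
} cache_info;

/* Decode the EAX, EBX, and ECX values of one CPUID leaf 4 sub-leaf. */
cache_info decode_cache_leaf(uint32_t eax, uint32_t ebx, uint32_t ecx)
{
    cache_info ci;
    size_t line_size  = (ebx & 0xFFF) + 1;          /* EBX[11:0]  + 1 */
    size_t partitions = ((ebx >> 12) & 0x3FF) + 1;  /* EBX[21:12] + 1 */
    size_t ways       = ((ebx >> 22) & 0x3FF) + 1;  /* EBX[31:22] + 1 */
    size_t sets       = (size_t)ecx + 1;            /* ECX + 1 */

    ci.cache_type      = eax & 0x1F;
    ci.level           = (eax >> 5) & 0x7;
    ci.max_sharing_ids = ((eax >> 14) & 0xFFF) + 1;
    ci.size_bytes      = ways * partitions * line_size * sets;
    return ci;
}

/* Upper limit on the block size: total cache size divided by the
 * number of logical processors serviced by that cache. */
size_t shared_cache_block_limit(const cache_info *ci)
{
    assert(ci->max_sharing_ids > 0);
    return ci->size_bytes / ci->max_sharing_ids;
}
```

A caller would typically loop over cache indices until `cache_type` is 0, pick the cache level it wants to block for, and use `shared_cache_block_limit` as the ceiling for the block size.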
User/Source Coding Rule 32. (H impact, H generality) Use cache blocking
to improve locality of data access. Target one quarter to one half of the cache
size when targeting IA-32 processors supporting Hyper-Threading Technology
or target a block size that allows all the logical processors serviced by a cache
to share that cache simultaneously.
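To make the rule concrete, here is a minimal sketch of cache blocking applied to a square matrix transpose. The function name `transpose_blocked` and the choice of transpose as the workload are illustrative assumptions, not taken from the manual. The `block` parameter is the tile edge in elements, so one tile touches roughly `2 * block * block * sizeof(float)` bytes across source and destination, and the caller is assumed to choose `block` so that this footprint fits the budget given by the rule.

```c
#include <assert.h>
#include <stddef.h>

/* Transpose an n x n row-major matrix one tile at a time, so that each
 * tile of src and dst stays cache resident while it is being used.
 * `block` is the tile edge length in elements (hypothetical parameter). */
void transpose_blocked(const float *src, float *dst, size_t n, size_t block)
{
    assert(block > 0);
    for (size_t ii = 0; ii < n; ii += block) {
        for (size_t jj = 0; jj < n; jj += block) {
            /* Clip the tile at the matrix edge when n is not a
             * multiple of block. */
            size_t i_end = (ii + block < n) ? ii + block : n;
            size_t j_end = (jj + block < n) ? jj + block : n;
            for (size_t i = ii; i < i_end; i++) {
                for (size_t j = jj; j < j_end; j++) {
                    dst[j * n + i] = src[i * n + j];
                }
            }
        }
    }
}
```

Without blocking, the column-wise writes to `dst` stride through memory and can evict lines before they are reused. With tiles sized to the cache budget, each line is filled while it is still resident.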
Shared-Memory Optimization
Maintaining cache coherency between discrete processors frequently
involves moving data across a bus that operates at a clock rate
substantially slower than the processor frequency.
Minimize Sharing of Data between Physical Processors
When two threads are executing on two physical processors and sharing
data, reading from or writing to shared data usually involves several bus
transactions (including snooping, request for ownership changes, and
sometimes fetching data across the bus). A thread accessing a large
amount of shared memory is likely to have poor processor-scaling
performance.
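One common way to reduce traffic on shared data is to accumulate in thread-private storage and publish the result once. The sketch below illustrates that idea for a partial sum. The function name `sum_slice` is hypothetical, thread creation is omitted, and each thread is assumed to be given its own slice of the input and its own result slot.

```c
#include <assert.h>
#include <stddef.h>

/* Sum data[begin..end) in a local variable and write to the shared
 * location exactly once. Updating *shared_result inside the loop would
 * instead touch the shared cache line on every iteration.
 * (Hypothetical helper; the per-thread slicing is an assumption.) */
void sum_slice(const int *data, size_t begin, size_t end, long *shared_result)
{
    long local = 0;  /* private to the calling thread */

    assert(begin <= end);
    for (size_t i = begin; i < end; i++) {
        local += data[i];
    }
    *shared_result = local;  /* single write to shared memory */
}
```

If several threads store into adjacent result slots, those slots may still fall on the same cache line. Spacing them at least one cache line apart is a further refinement not shown here.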

Intel ARCHITECTURE IA-32 Specifications

General
Instruction Set: x86
Instruction Set Type: CISC
Memory Segmentation: Supported
Operating Modes: Real mode, Protected mode, Virtual 8086 mode
Max Physical Address Size: 36 bits (with PAE)
Max Virtual Address Size: 32 bits
Architecture: IA-32 (Intel Architecture 32-bit)
Addressable Memory: 4 GB (with Physical Address Extension up to 64 GB)
Floating Point Registers: 8 x 80-bit
MMX Registers: 8 x 64-bit
SSE Registers: 8 x 128-bit
Registers: General-purpose registers (EAX, EBX, ECX, EDX, ESI, EDI, ESP, EBP), Segment registers (CS, DS, SS, ES, FS, GS), Instruction pointer (EIP), Flags register (EFLAGS)
Floating Point Unit: Yes (x87)
