Intel ARCHITECTURE IA-32 User Manual
568 pages
IA-32 Intel® Architecture Optimization
The best practice to reduce the overhead of thread synchronization is to
start by reducing the application’s requirements for synchronization.
Intel Thread Profiler can be used to profile the execution timeline of
each thread and detect situations where performance is impacted by
frequent occurrences of synchronization overhead.
Several coding techniques and operating system (OS) calls are
frequently used for thread synchronization. These include spin-wait
loops, spin-locks, and critical sections, to name a few. Choosing the
optimal OS calls for the circumstances and implementing synchronization
code with parallelism in mind are critical to minimizing the cost of
handling thread synchronization.
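To make the spin-wait and spin-lock terminology concrete, here is a minimal sketch in portable C11. It assumes a compiler that provides `<stdatomic.h>`; the type `demo_spinlock` and its helper functions are hypothetical names chosen for illustration and do not come from the manual or from any OS API.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical spin-lock type for illustration only. */
typedef struct {
    atomic_flag held;   /* set while some thread owns the lock */
} demo_spinlock;

#define DEMO_SPINLOCK_INIT { ATOMIC_FLAG_INIT }

/* Make a single attempt to take the lock; returns true on success. */
static bool demo_spinlock_try_acquire(demo_spinlock *lock)
{
    assert(lock != NULL);
    /* test_and_set returns the previous state: false means the lock was free. */
    return !atomic_flag_test_and_set_explicit(&lock->held,
                                              memory_order_acquire);
}

/* Spin-wait loop: retry until the lock is obtained. */
static void demo_spinlock_acquire(demo_spinlock *lock)
{
    while (!demo_spinlock_try_acquire(lock)) {
        /* busy-wait; the thread keeps consuming execution resources here */
    }
}

/* Release the lock so that another spinning thread can take it. */
static void demo_spinlock_release(demo_spinlock *lock)
{
    assert(lock != NULL);
    atomic_flag_clear_explicit(&lock->held, memory_order_release);
}
```

A thread would bracket its access to shared data with `demo_spinlock_acquire` and `demo_spinlock_release`. Because the waiting thread stays busy in the loop, a lock of this kind is usually only reasonable when the protected region is very short; otherwise a blocking OS primitive may cost less.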
SSE3 provides two instructions (MONITOR/MWAIT) to help
multithreaded software improve synchronization between multiple
agents. In the first implementation of MONITOR and MWAIT, these
instructions are available to the operating system so that it
can optimize thread synchronization in different areas. For example, an
operating system can use MONITOR and MWAIT in its system idle
loop (known as C0 loop) to reduce power consumption. An operating
system can also use MONITOR and MWAIT to implement its C1 loop and
improve the responsiveness of that loop. (See Chapter 7 in the IA-32
Intel® Architecture Software Developer's Manual, Volume 3A.)
Choice of Synchronization Primitives
Thread synchronization often involves modifying some shared data
while protecting the operation using synchronization primitives. There
are many primitives to choose from; guidelines that are useful when
selecting synchronization primitives are:
Favor compiler intrinsics or an OS-provided interlocked API for
atomic updates of simple data operations, such as increment and
compare/exchange. These are more efficient than more
complicated synchronization primitives with higher overhead. For
more information on using different synchronization primitives, see
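As an illustration of the atomic-update guideline, the sketch below performs an increment and a compare/exchange loop without taking any lock. It uses the C11 `<stdatomic.h>` operations as a portable stand-in for the compiler intrinsics or OS interlocked API that the text refers to; the function names `demo_counter_increment` and `demo_store_max` are hypothetical and chosen only for this example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Atomically add 1 to the counter and return the new value. */
static long demo_counter_increment(atomic_long *counter)
{
    assert(counter != NULL);
    /* fetch_add returns the value held before the addition. */
    return atomic_fetch_add(counter, 1) + 1;
}

/*
 * Atomically raise *maximum to candidate if candidate is larger.
 * Returns true if this call performed the update.
 */
static bool demo_store_max(atomic_long *maximum, long candidate)
{
    assert(maximum != NULL);
    long observed = atomic_load(maximum);
    while (observed < candidate) {
        /* On failure, observed is refreshed with the current value,
           so the loop condition is re-checked against fresh data. */
        if (atomic_compare_exchange_weak(maximum, &observed, candidate)) {
            return true;
        }
    }
    return false;
}
```

Both functions complete with a single atomic read-modify-write in the uncontended case, which is the reason such operations tend to be cheaper than acquiring and releasing a heavier primitive around the same one-word update.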


Intel ARCHITECTURE IA-32 Specifications

General
Instruction Set: x86
Instruction Set Type: CISC
Memory Segmentation: Supported
Operating Modes: Real mode, Protected mode, Virtual 8086 mode
Max Physical Address Size: 36 bits (with PAE)
Max Virtual Address Size: 32 bits
Architecture: IA-32 (Intel Architecture 32-bit)
Addressable Memory: 4 GB (with Physical Address Extension up to 64 GB)
Floating Point Registers: 8 x 80-bit
MMX Registers: 8 x 64-bit
SSE Registers: 8 x 128-bit
Registers: General-purpose registers (EAX, EBX, ECX, EDX, ESI, EDI, ESP, EBP), Segment registers (CS, DS, SS, ES, FS, GS), Instruction pointer (EIP), Flags register (EFLAGS)
Floating Point Unit: Yes (x87)