Chapter 4. Continuous availability and manageability 179
Figure 4-9 shows a schematic of a fault isolation register implementation.
Figure 4-9 Schematic of FIR implementation
Fault isolation
The service processor interprets the error data that is captured by the FFDC checkers (saved in
the FIRs or by other firmware-related data capture methods) to determine the root cause of the
error event.
Root cause analysis might indicate that the event is recoverable, meaning that a service
action point or need for repair has not been reached. Alternatively, it might indicate that a
service action point has been reached, because the event either exceeded a predetermined
threshold or was unrecoverable. Based on the isolation analysis, recoverable error-threshold
counts can be incremented. No specific service action is necessary when the event is recoverable.
When the event requires a service action, additional required information is collected to
service the fault. For unrecoverable errors or for recoverable events that meet or exceed their
service threshold, meaning that a service action point has been reached, a request for
service is initiated through an error logging component.
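The decision flow described above can be sketched as follows. This is a minimal illustration, not the service processor firmware: the names FaultIsolator, ErrorEvent, and Outcome, the single per-source threshold, and the in-memory list standing in for the error logging component are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from enum import Enum


class Outcome(Enum):
    RECOVERED = "recovered"            # no service action is needed
    SERVICE_ACTION = "service_action"  # a service action point was reached


@dataclass
class ErrorEvent:
    source: str        # hypothetical identifier of the failing resource
    recoverable: bool  # assumed result of root cause analysis


@dataclass
class FaultIsolator:
    threshold: int                                 # assumed service threshold per source
    counts: dict = field(default_factory=dict)     # recoverable-error counts per source
    error_log: list = field(default_factory=list)  # stand-in for the error logging component

    def handle(self, event: ErrorEvent) -> Outcome:
        if event.recoverable:
            # Increment the recoverable error-threshold count for this source.
            count = self.counts.get(event.source, 0) + 1
            self.counts[event.source] = count
            if count < self.threshold:
                return Outcome.RECOVERED
            reason = "threshold reached"
        else:
            reason = "unrecoverable"
        # Service action point: request service through the error log.
        self.error_log.append((event.source, reason))
        return Outcome.SERVICE_ACTION


isolator = FaultIsolator(threshold=3)
results = [isolator.handle(ErrorEvent("L2 cache", True)) for _ in range(3)]
```

With an assumed threshold of 3, the first two recoverable events on the same source only increment the count, and the third meets the threshold and is logged as a request for service. An unrecoverable event is logged immediately without touching the counts.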
[Figure 4-9 labels: CPU, L1, L2 / L3, Memory, Disk, Error checkers, Fault isolation register (FIR): unique fingerprint of each captured error, Service Processor, Log error, Non-volatile RAM]