HP Integrity Superdome X
The channel between the memory buffer and the DIMM is the DDR channel. Because up to three DIMMs
reside on the same DDR channel and two DDR channels might be configured in lockstep (RAS mode
enabled), up to six DIMMs are affected by a single faulty DIMM. It is important to distinguish faulty or
suspect DIMMs from healthy DIMMs that happen to reside on the same bus.
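The six-DIMM figure follows directly from the two numbers above. A minimal sketch of that arithmetic, where the function and constant names are illustrative and not part of any HP tool, and the non-lockstep value of three is an inference from the text rather than a stated figure:

```python
# Illustrative arithmetic only; the constants restate figures from the text.
DIMMS_PER_DDR_CHANNEL = 3   # up to three DIMMs reside on one DDR channel
LOCKSTEP_CHANNELS = 2       # two DDR channels paired when RAS mode is enabled


def max_dimms_affected(lockstep_enabled: bool) -> int:
    """Upper bound on the DIMMs affected by a single faulty DIMM."""
    # Assumption: without lockstep, only the faulty DIMM's own channel is affected.
    channels = LOCKSTEP_CHANNELS if lockstep_enabled else 1
    return channels * DIMMS_PER_DDR_CHANNEL


if __name__ == "__main__":
    print(max_dimms_affected(lockstep_enabled=True))
```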
On a new installation, DDR training failures can result from DIMMs being partially unseated during
shipping. A common symptom of a partially unseated DIMM is a MEM_DIMM_NO_VALID_DELAY event.
If the machine is still in the installation phase and has not been released to the customer, try removing
and reinstalling all the DIMMs on that DDR channel before replacing a DIMM. A DIMM that has been in
use for some time is unlikely to have become unseated spontaneously.
If a DIMM suffers a correctable or uncorrectable error at runtime and needs to be replaced, a DIMM pair
might be identified and indicted. A DIMM pair consists of two DIMMs on the same memory buffer with the
same loading letter, such as 19A and 24A. In this case, replace both DIMMs in the pair.
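The pairing rule can be sketched as a lookup by loading letter. This is a minimal illustration, assuming slot labels have the form of a number followed by a single letter (as in 19A and 24A). The list of slots on one memory buffer is hypothetical; the real slot-to-buffer mapping must come from the system documentation.

```python
import re

# Assumed label format: slot number followed by the loading letter, e.g. "19A".
SLOT_LABEL = re.compile(r"^(\d+)([A-Z])$")


def loading_letter(slot: str) -> str:
    """Return the loading letter of a slot label such as '19A'."""
    match = SLOT_LABEL.match(slot)
    if match is None:
        raise ValueError(f"unrecognized slot label: {slot!r}")
    return match.group(2)


def pair_partners(indicted_slot: str, slots_on_buffer: list[str]) -> list[str]:
    """Slots on the same memory buffer that share the indicted slot's letter."""
    letter = loading_letter(indicted_slot)
    return [
        slot
        for slot in slots_on_buffer
        if slot != indicted_slot and loading_letter(slot) == letter
    ]


if __name__ == "__main__":
    # Hypothetical buffer population; only 19A and 24A come from the text.
    print(pair_partners("19A", ["19A", "24A", "20B", "23B"]))
```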
CAE generates error events that mark faulty or suspect DIMMs as indicted. Replace these DIMMs.
Health Repository, the EFI info mem command, and IPMI events might also identify additional
deconfigured DIMMs, sometimes called partner-deconfigured DIMMs, lockstep-disabled DIMMs, or
sibling-disabled DIMMs. These DIMMs are healthy and should not be replaced.
To identify a possible faulty DIMM, use the HR SHOW INDICT command. Replace DIMMs that are
indicted. Do not replace DIMMs that are deconfigured unless there are other indications of a faulty DIMM,
such as being specifically identified with DIMMERR.
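The replacement rule above can be summarized as a small decision function. This is a sketch of the stated policy only. The DimmState record and its field names are hypothetical and do not correspond to fields in any Health Repository output.

```python
from dataclasses import dataclass


@dataclass
class DimmState:
    """Hypothetical summary of what the tools report for one DIMM."""
    location: str
    indicted: bool = False         # reported as indicted (HR SHOW INDICT)
    deconfigured: bool = False     # partner-, lockstep-, or sibling-disabled
    dimmerr_flagged: bool = False  # specifically identified with DIMMERR


def should_replace(dimm: DimmState) -> bool:
    """Apply the rule: replace indicted DIMMs; leave deconfigured DIMMs
    in place unless something else points to a fault."""
    if dimm.indicted:
        return True
    # A deconfigured DIMM is presumed healthy unless DIMMERR singles it out.
    return dimm.deconfigured and dimm.dimmerr_flagged


if __name__ == "__main__":
    for dimm in (
        DimmState("19A", indicted=True),
        DimmState("24A", deconfigured=True),
        DimmState("20B", deconfigured=True, dimmerr_flagged=True),
    ):
        print(dimm.location, should_replace(dimm))
```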
Solution 3
Cause
Using DIMMERR
If there are memory errors that do not clearly indicate which hardware is at fault, the HR dimmerr
command can be used to look for patterns of memory failures.
You can use DIMMERR as follows:
• To corroborate other errors that correspond to a specific DIMM or blade
• To indicate memory training faults
• To look for DIMM errors in newly installed or replaced DIMMs
• To look for DIMM errors during partition boot as part of a system installation
IMPORTANT:
DIMMERR shows memory events that were correctable. Correctable errors are expected on large
memory systems, and all systems will show several correctable errors over time. Correctable errors
result in indictment only after reaching a certain threshold.
DIMMs should not be replaced for normal correctable errors.
From the Health Repository viewer, enter dimmerr <location>, where <location> is the DIMM slot or
a blade.
Example: dimmerr blade-1/1 returns information about all DIMMs for a server blade in slot 1 of
cabinet 1.
DIMM INFO for Cabinet: 1 Board Slot: 1
dimm-1/1/0/1 Location: 1A
Status: OK No Errors Logged.
dimm-1/1/0/2 Location: 2C
Status: OK No Errors Logged.
dimm-1/1/0/3 Location: 3B
Row Bank Col Type Errors First Detected Last Detected
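When a blade has many DIMMs, it can help to filter the dimmerr output down to the DIMMs that have logged events. The following is a minimal sketch that assumes the output layout matches the sample above: a "dimm-… Location: …" line per DIMM, followed either by a "No Errors Logged." status line or by an error table. The function name is illustrative, and the real output may differ in spacing or content. A DIMM listed here is only a candidate for review, since correctable errors alone do not justify replacement.

```python
import re

# Sample text copied from the example output above (the error table rows
# are not shown there, so only the table header appears for dimm-1/1/0/3).
SAMPLE = """\
DIMM INFO for Cabinet: 1 Board Slot: 1
dimm-1/1/0/1 Location: 1A
Status: OK No Errors Logged.
dimm-1/1/0/2 Location: 2C
Status: OK No Errors Logged.
dimm-1/1/0/3 Location: 3B
Row Bank Col Type Errors First Detected Last Detected
"""

# Assumed shape of the per-DIMM header line.
DIMM_LINE = re.compile(r"^(dimm-\S+)\s+Location:\s*(\S+)$")


def dimms_with_logged_events(text: str) -> list[tuple[str, str]]:
    """Return (path, location) for each DIMM not marked 'No Errors Logged.'"""
    results = []
    current = None   # DIMM whose block is being read
    clean = False    # True once that DIMM reports no logged errors
    for raw in text.splitlines():
        line = raw.strip()
        match = DIMM_LINE.match(line)
        if match:
            if current is not None and not clean:
                results.append(current)
            current = (match.group(1), match.group(2))
            clean = False
        elif current is not None and "No Errors Logged" in line:
            clean = True
    if current is not None and not clean:
        results.append(current)
    return results


if __name__ == "__main__":
    for path, location in dimms_with_logged_events(SAMPLE):
        print(path, location)
```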