System Event Log Messages for IPMI Systems

The tables in this chapter list the system event log (SEL) messages, their severity, and cause.

NOTE: For corrective actions, see the appropriate documentation.
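
Each row in the tables that follow has the same three fields: event message, severity, and cause. The sketch below is one way to hold such rows in a program and list the most urgent ones first. The names `Severity`, `SelMessage`, and `most_urgent_first` are hypothetical, and the numeric ordering of the severities is an assumption made for sorting, not something the SEL defines.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    # Assumed ordering for triage: a higher value is more urgent.
    INFORMATION = 0
    WARNING = 1
    CRITICAL = 2


@dataclass(frozen=True)
class SelMessage:
    message: str
    severity: Severity
    cause: str


def most_urgent_first(entries):
    """Return the entries sorted so that Critical messages come first."""
    return sorted(entries, key=lambda e: e.severity, reverse=True)


entries = [
    SelMessage("Log cleared.", Severity.INFORMATION, "The SEL is cleared."),
    SelMessage("Fan redundancy is lost.", Severity.CRITICAL,
               "One or more required fans failed or were removed."),
    SelMessage("Fan redundancy is degraded.", Severity.WARNING,
               "One or more fans failed or were removed."),
]

for entry in most_urgent_first(entries):
    print(f"{entry.severity.name.title():<12}{entry.message}")
```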

Temperature Sensor Events

The temperature sensor event messages help protect critical components by alerting the systems management console when the temperature rises inside the chassis. These event messages use additional variables, such as sensor location, chassis location, previous state, and temperature sensor value or state.

Table 4-1. Temperature Sensor Events 

Event Message

Severity

Cause

<Sensor Name/Location> temperature sensor detected a failure <Reading> where <Sensor Name/Location> is the entity that this sensor is monitoring. For example, "PROC Temp" or "Planar Temp."

Reading is specified in degrees Celsius. For example, 100 C.

Critical

Temperature of the backplane board, system board, or the carrier in the specified system <Sensor Name/Location> exceeded the critical threshold.

<Sensor Name/Location> temperature sensor detected a warning <Reading>.

Warning

Temperature of the backplane board, system board, or the carrier in the specified system <Sensor Name/Location> exceeded the non-critical threshold.

<Sensor Name/Location> temperature sensor returned to warning state <Reading>.

Warning

Temperature of the backplane board, system board, or the carrier in the specified system <Sensor Name/Location> returned from critical state to non-critical state.

<Sensor Name/Location> temperature sensor returned to normal state <Reading>.

Information

Temperature of the backplane board, system board, or the carrier in the specified system <Sensor Name/Location> returned to normal operating range.

The <Sensor Name/Location> temperature is less than the lower warning threshold.

Warning

Temperature of the backplane, system board, system inlet, or the carrier in the specified system <Sensor Name/Location> entered into non-critical state.

The <Sensor Name/Location> temperature is less than the lower critical threshold.

Critical

Temperature of the backplane, system board, system inlet, or the carrier in the specified system <Sensor Name/Location> entered into critical state.

The <Sensor Name/Location> temperature is greater than the upper warning threshold.

Warning

Temperature of the backplane, system board, system inlet, or the carrier in the specified system <Sensor Name/Location> entered into non-critical state.

The <Sensor Name/Location> temperature is greater than the upper critical threshold.

Critical

Temperature of the backplane, system board, system inlet, or the carrier in the specified system <Sensor Name/Location> entered into critical state.

The <Sensor Name/Location> temperature is outside of range.

Critical

Temperature of the backplane, system board, system inlet, or the carrier in the specified system <Sensor Name/Location> is outside of normal operating range.

The <Sensor Name/Location> temperature is within range.

Information

Temperature of the backplane, system board, system inlet, or the carrier in the specified system <Sensor Name/Location> returned to a normal operating range.
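
The threshold messages in Table 4-1 can be read as a comparison of one reading against four limits. The following is a minimal sketch of that comparison, not the firmware's actual logic. The function names, the threshold values, and the choice of strict comparisons are all assumptions made for illustration.

```python
def classify_temperature(reading_c, lower_crit, lower_warn, upper_warn, upper_crit):
    """Map a reading in degrees Celsius to (severity, message suffix)."""
    # Critical limits are checked first so they take priority over warnings.
    if reading_c < lower_crit:
        return "Critical", "is less than the lower critical threshold."
    if reading_c > upper_crit:
        return "Critical", "is greater than the upper critical threshold."
    if reading_c < lower_warn:
        return "Warning", "is less than the lower warning threshold."
    if reading_c > upper_warn:
        return "Warning", "is greater than the upper warning threshold."
    return "Information", "is within range."


def format_event(sensor, reading_c, **thresholds):
    """Build the (severity, message) pair for a named sensor."""
    severity, suffix = classify_temperature(reading_c, **thresholds)
    return severity, f"The {sensor} temperature {suffix}"


# Hypothetical limits for an inlet sensor; real limits come from the platform.
limits = dict(lower_crit=-7, lower_warn=3, upper_warn=42, upper_crit=47)
print(format_event("System Board Inlet", 45, **limits))
```

With these example limits, a reading of 45 falls between the upper warning and upper critical thresholds, so the sketch reports a Warning.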

Voltage Sensor Events

The voltage sensor event messages monitor the number of volts across critical components. These messages provide status and warning information for voltage sensors for a particular chassis.

Table 4-2. Voltage Sensor Events 

Event Message

Severity

Cause

<Sensor Name/Location> voltage sensor detected a failure <Reading> where <Sensor Name/Location> is the entity that this sensor is monitoring.

Reading is specified in volts. For example, 3.860 V.

Critical

The voltage of the monitored device has exceeded the critical threshold.

<Sensor Name/Location> voltage sensor state asserted.

Critical

The voltage specified by <Sensor Name/Location> is in critical state.

<Sensor Name/Location> voltage sensor state de-asserted.

Information

The voltage of a previously reported <Sensor Name/Location> has returned to normal state.

<Sensor Name/Location> voltage sensor detected a warning <Reading>.

Warning

Voltage of the monitored entity <Sensor Name/Location> exceeded the warning threshold.

<Sensor Name/Location> voltage sensor returned to normal <Reading>.

Information

The voltage of a previously reported <Sensor Name/Location> has returned to normal state.

The <Sensor Name/Location> voltage is less than the lower warning threshold.

Warning

Voltage of the monitored entity <Sensor Name/Location> fell below the lower warning threshold.

The <Sensor Name/Location> voltage is less than the lower critical threshold.

Critical

Voltage of the monitored entity <Sensor Name/Location> fell below the lower critical threshold.

The <Sensor Name/Location> voltage is greater than the upper warning threshold.

Warning

Voltage of the monitored entity <Sensor Name/Location> exceeded the upper warning threshold.

The <Sensor Name/Location> voltage is greater than the upper critical threshold.

Critical

Voltage of the monitored entity <Sensor Name/Location> exceeded the upper critical threshold.

The <Sensor Name/Location> voltage is outside of range.

Critical

Voltage of the monitored entity <Sensor Name/Location> is outside of normal operating range.

The <Sensor Name/Location> voltage is within range.

Information

Voltage of the monitored entity <Sensor Name/Location> returned to a normal operating range.
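
The older voltage messages in Table 4-2 embed the sensor name and the reading in a fixed sentence, so a regular expression can pull them apart. The sketch below assumes the reading is rendered as a decimal number followed by " V", as in the 3.860 V example. The sensor name "PROC 1 VCORE" and the function name `parse_voltage_event` are hypothetical.

```python
import re

# Assumed layout: "<sensor> voltage sensor <event> <number> V"
PATTERN = re.compile(
    r"^(?P<sensor>.+?) voltage sensor "
    r"(?P<event>detected a failure|detected a warning|returned to normal) "
    r"(?P<reading>\d+(?:\.\d+)?) V\.?$"
)

# Severities as listed in Table 4-2 for these three message forms.
SEVERITY = {
    "detected a failure": "Critical",
    "detected a warning": "Warning",
    "returned to normal": "Information",
}


def parse_voltage_event(line):
    """Return sensor, severity, and volts for a matching line, else None."""
    match = PATTERN.match(line.strip())
    if match is None:
        return None
    return {
        "sensor": match["sensor"],
        "severity": SEVERITY[match["event"]],
        "volts": float(match["reading"]),
    }


print(parse_voltage_event("PROC 1 VCORE voltage sensor detected a failure 3.860 V"))
```

Lines that do not follow this layout, such as the "state asserted" messages, return None rather than raising an error.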

Fan Sensor Events

The cooling device sensors monitor how well a fan is functioning. These messages provide status, warning, and failure information for the fans in a particular chassis.

Table 4-3. Fan Sensor Events 

Event Message

Severity

Cause

<Sensor Name/Location> Fan sensor detected a failure <Reading> where <Sensor Name/Location> is the entity that this sensor is monitoring. For example, "BMC Back Fan" or "BMC Front Fan."

Reading is specified in RPM. For example, 100 RPM.

Critical

The speed of the specified <Sensor Name/Location> fan is not sufficient to provide enough cooling to the system.

<Sensor Name/Location> Fan sensor returned to normal state <Reading>.

Information

The fan specified by <Sensor Name/Location> has returned to its normal operating speed.

<Sensor Name/Location> Fan sensor detected a warning <Reading>.

Warning

The speed of the specified <Sensor Name/Location> fan may not be sufficient to provide enough cooling to the system.

<Sensor Name/Location> Fan Redundancy sensor redundancy degraded.

Information

The fan specified by <Sensor Name/Location> may have failed and hence, the redundancy has been degraded.

<Sensor Name/Location> Fan Redundancy sensor redundancy lost.

Critical

The fan specified by <Sensor Name/Location> may have failed and hence, the redundancy that was degraded previously has been lost.

<Sensor Name/Location> Fan Redundancy sensor redundancy regained

Information

The fan specified by <Sensor Name/Location> may have started functioning again and hence, the redundancy has been regained.

Fan <number> RPM is less than the lower warning threshold.

Warning

The speed of the specified fan might not provide enough cooling to the system.

Fan <number> RPM is less than the lower critical threshold.

Critical

The speed of the specified fan is not sufficient to provide enough cooling to the system.

Fan <number> RPM is greater than the upper warning threshold.

Warning

The speed of the specified fan exceeded the warning threshold.

Fan <number> RPM is greater than the upper critical threshold.

Critical

The speed of the specified fan exceeded the critical threshold.

Fan <number> RPM is outside of range.

Critical

The speed of the specified fan might not provide enough cooling to the system.

Fan <number> RPM is within range.

Information

The speed of the specified fan is operating in a normal range.

Fan <number> is removed.

Critical

A required fan was removed.

Fan <number> was inserted.

Information

A fan was added.

Fan <number> is present.

Information

The total number of fans present.

Fan <number> is absent.

Critical

A required fan is missing.

The fans are redundant.

Information

One or more fans may have started functioning or been installed, and the redundancy has been regained.

Fan redundancy is lost.

Critical

One or more required fans may have failed or been removed and hence, the redundancy was lost.

Fan redundancy is degraded.

Warning

One or more fans may have failed or been removed and hence, the redundancy has been degraded.

Processor Status Events

The processor status messages monitor the functionality of the processors in a system. These messages provide processor health and warning information for a system.

Table 4-4. Processor Status Events 

Event Message

Severity

Cause

<Processor Entity> status processor sensor IERR, where <Processor Entity> is the processor that generated the event. For example, PROC for a single processor system and PROC # for a multiprocessor system.

Critical

IERR internal error generated by the <Processor Entity>. This event is generated due to processor internal error.

<Processor Entity> status processor sensor Thermal Trip.

Critical

The processor generates this event before it shuts down because of excessive heat caused by lack of cooling or heat synchronization.

<Processor Entity> status processor sensor recovered from IERR.

Information

This event is generated when a processor recovers from the internal error.

<Processor Entity> status processor sensor disabled.

Warning

This event is generated for all processors that are disabled.

<Processor Entity> status processor sensor terminator not present.

Information

This event is generated if the terminator is missing on an empty processor slot.

<Processor Entity> presence was deasserted.

Critical

This event is generated when the system could not detect the processor.

<Processor Entity> presence was asserted.

Information

This event is generated when the earlier processor detection error was corrected.

<Processor Entity> thermal tripped was deasserted.

Information

This event is generated when the processor has recovered from an earlier thermal condition.

<Processor Entity> configuration error was asserted.

Critical

This event is generated when the processor configuration is incorrect.

<Processor Entity> configuration error was deasserted.

Information

This event is generated when the earlier processor configuration error was corrected.

<Processor Entity> throttled was asserted.

Warning

This event is generated when the processor slows down to prevent overheating.

<Processor Entity> throttled was deasserted.

Information

This event is generated when the earlier processor throttled event was corrected.

CPU <number> has an internal error (IERR).

Critical

The specified CPU generated an internal error.

CPU <number> has a thermal trip (over-temperature) event.

Critical

The CPU generates this event before it shuts down because of excessive heat caused by lack of cooling or heat synchronization.

CPU <number> configuration is unsupported.

Warning

The specified CPU is not supported in this system.

CPU <number> is present.

Information

The specified CPU is present.

CPU <number> terminator is present.

Information

This event is generated if the terminator is present on a processor slot.

CPU <number> terminator is absent.

Warning

This event is generated if the terminator is missing on an empty processor slot.

CPU <number> is throttled.

Warning

This event is generated when the processor slows down to prevent overheating.

CPU <number> is absent.

Critical

This event is generated when the system could not detect the processor.

CPU <number> is operating correctly.

Information

This event is generated when the processor recovered from an error.

CPU <number> is configured correctly.

Information

The specified CPU is configured correctly.

Power Supply Events

The power supply sensors monitor the functionality of the power supplies. These messages provide status and warning information for power supplies for a particular system.

Table 4-5. Power Supply Events 

Event Message

Severity

Cause

<Power Supply Sensor Name> power supply sensor removed.

Critical

This event is generated when the power supply sensor is removed.

<Power Supply Sensor Name> power supply sensor AC recovered.

Information

This event is generated when the power supply has been replaced.

<Power Supply Sensor Name> power supply sensor returned to normal state.

Information

This event is generated when the power supply that failed or was removed has been replaced and the state has returned to normal.

<Entity Name> PS Redundancy sensor redundancy degraded.

Information

Power supply redundancy is degraded if one of the power supply sources is removed or has failed.

<Entity Name> PS Redundancy sensor redundancy lost.

Critical

Power supply redundancy is lost if only one power supply is functional.

<Entity Name> PS Redundancy sensor redundancy regained.

Information

This event is generated if the power supply has been reconnected or replaced.

<Power Supply Sensor Name> predictive failure was asserted

Critical

This event is generated when the power supply is about to fail.

<Power Supply Sensor Name> input lost was asserted

Critical

This event is generated when the power supply is unplugged.

<Power Supply Sensor Name> predictive failure was deasserted

Information

This event is generated when the power supply has recovered from an earlier predictive failure event.

<Power Supply Sensor Name> input lost was deasserted

Information

This event is generated when the power supply is plugged in.

PS 1 Status: Power supply sensor for PS 1, presence was asserted

Information

This event is generated when the power supply is plugged in.

PS 1 Status: Power supply sensor for PS 1, presence was deasserted

Critical

This event is generated when the power supply is removed.

PS 1 Status: Power supply sensor for PS 1, failure was asserted

Critical

This event is generated when the power supply has failed.

PS 1 Status: Power supply sensor for PS 1, failure was deasserted

Information

This event is generated when the power supply has recovered from an earlier failure event.

PS 1 Status: Power supply sensor for PS 1, predictive failure was asserted

Warning

This event is generated when the power supply is about to fail.

PS 1 Status: Power supply sensor for PS 1, predictive failure was deasserted

Information

This event is generated when the power supply has recovered from an earlier predictive failure event.

PS 1 Status: Power supply sensor for PS 1, input lost was asserted

Critical

This event is generated when AC power is removed from the power supply.

PS 1 Status: Power supply sensor for PS 1, input lost was deasserted

Information

This event is generated when the power supply is plugged in.

PS 1 Status: Power supply sensor for PS 1, configuration error was asserted

Warning/Critical

This event is generated when an invalid power supply configuration is detected.

PS 1 Status: Power supply sensor for PS 1, configuration error was deasserted

Information

This event is generated when the power supply has recovered from an earlier invalid configuration.

Power supply <number> is present.

Information

This event is generated when the power supply is plugged in.

Power supply <number> is absent.

Critical

This event is generated when the power supply is removed.

Power supply <number> failed.

Critical

This event is generated when the power supply has failed.

A predictive failure detected on power supply <number>.

Warning

This event is generated when the power supply is about to fail.

The power input for power supply <number> is lost.

Critical

This event is generated when input power is removed from the power supply.

The input power for power supply <number> has been restored.

Information

This event is generated if the power supply has been reconnected or replaced.

Power supply <number> is incorrectly configured.

Critical / Warning

This event is generated when an invalid power supply configuration is detected.

Power supply <number> is correctly configured.

Information

This event is generated when the power supply has recovered from an earlier invalid configuration.

Power supply <number> is operating normally.

Information

This event is generated when the power supply has recovered from an earlier failure event.

Cannot communicate with power supply <number>.

Critical

The power supply may operate; however, power supply monitoring is degraded.

The temperature for power supply <number> is in a warning range.

Warning

Temperature of specified power supply entered into non-critical state.

The temperature for power supply <number> is outside of range.

Critical

Temperature of specified power supply entered into critical state.

An under voltage fault detected on power supply <number>.

Critical

The specified power supply detected insufficient voltage.

An over voltage fault detected on power supply <number>.

Critical

The specified power supply detected an over voltage condition.

An over current fault detected on power supply <number>.

Critical

The specified power supply detected an over current condition.

Fan failure detected on power supply <number>.

Critical

The specified power supply fan has failed.

Communication has been restored to power supply <number>.

Information

This event is generated when the power supply has recovered from an earlier communication problem.

A power supply wattage mismatch is detected; power supply <number> is rated for <value> watts.

Critical

This event is generated when there is more than one power supply in the system and the power supply wattages do not match.

Power supply <number> wattage mismatch corrected.

Information

This event is generated when the power supply has recovered from an earlier power supply wattage mismatch.

Power supply redundancy is lost.

Critical

Power supply redundancy is lost if only one power supply is functional.

Power supply redundancy is degraded.

Warning

Power supply redundancy is degraded if one of the power supply sources is removed or has failed.

The power supplies are redundant.

Information

This event is generated if the power supply has been reconnected or replaced.
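
The three redundancy messages at the end of Table 4-5 can be approximated from two counts: how many power supplies are installed and how many are functional. The sketch below follows the causes stated in the table (lost when only one supply is functional, degraded when a supply is removed or has failed). The function name `ps_redundancy` is hypothetical, and real platforms may apply a configured redundancy policy that this simple count does not capture.

```python
def ps_redundancy(installed, functional):
    """Return (severity, message) for the given power supply counts."""
    if not 0 <= functional <= installed:
        raise ValueError("functional must be between 0 and installed")
    # All supplies working, and more than one present: fully redundant.
    if installed > 1 and functional == installed:
        return "Information", "The power supplies are redundant."
    # Assumption: one (or zero) working supply means no redundancy remains.
    if functional <= 1:
        return "Critical", "Power supply redundancy is lost."
    # Otherwise at least one supply is missing or failed, but spares remain.
    return "Warning", "Power supply redundancy is degraded."


for counts in [(2, 2), (3, 2), (2, 1)]:
    print(counts, ps_redundancy(*counts))
```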

Memory ECC Events

The memory ECC event messages monitor the memory modules in a system. These messages monitor the ECC memory correction rate and the type of memory events that occurred.

Table 4-6. Memory ECC Events 

Event Message

Severity

Cause

ECC error correction detected on Bank # DIMM [A/B].

Information

This event is generated when there is a memory error correction on a particular Dual Inline Memory Module (DIMM).

ECC uncorrectable error detected on Bank # [DIMM].

Critical

This event is generated when the chipset is unable to correct the memory errors. Usually, a bank number is provided and DIMM may or may not be identifiable, depending on the error.

Correctable memory error logging disabled.

Critical

This event is generated when the ECC error correction rate in the chipset exceeds a predefined limit.

Persistent correctable memory errors detected on a memory device at location(s) <DIMM number>.

Warning

This event is generated when there is a memory error correction on a particular Dual Inline Memory Module (DIMM).

Multi-bit memory errors detected on a memory device at location(s) <location>.

Critical

This event is generated when the chipset is unable to correct the memory errors. Usually, more than one DIMM is listed because a single DIMM may or may not be identifiable, depending on the error.

Correctable memory error logging disabled for a memory device at location <location>.

Critical

This event is generated when the ECC error correction rate in the chipset exceeds a predefined limit.
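
The "logging disabled" events in Table 4-6 describe a rate limit: once corrected errors arrive faster than a predefined limit, further logging stops. The sketch below models that with a sliding time window. It is an illustration only. The class name `CorrectableErrorMonitor`, the limit of 3 errors, and the 60-second window are assumptions, and the actual limit and how it is measured are platform specific.

```python
from collections import deque


class CorrectableErrorMonitor:
    """Counts corrected ECC errors inside a sliding time window."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.timestamps = deque()
        self.logging_enabled = True

    def record(self, timestamp):
        """Record one corrected error; return an event message or None."""
        if not self.logging_enabled:
            return None  # logging already disabled, nothing more is reported
        self.timestamps.append(timestamp)
        # Drop errors that are older than the window.
        while timestamp - self.timestamps[0] > self.window:
            self.timestamps.popleft()
        if len(self.timestamps) > self.limit:
            self.logging_enabled = False
            return "Correctable memory error logging disabled."
        return None


monitor = CorrectableErrorMonitor(limit=3, window_seconds=60)
events = [monitor.record(t) for t in (0, 10, 20, 30, 40)]
print(events)
```

In this run the fourth error inside the window crosses the limit, so the message is produced once and later errors are no longer reported.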

BMC Watchdog Events

The BMC watchdog operations are performed when the system hangs or crashes. These messages monitor the status and occurrence of these events in a system.

Table 4-7. BMC Watchdog Events 

Event Message

Severity

Cause

BMC OS Watchdog timer expired.

Information

This event is generated when the BMC watchdog timer expires and no action is set.

BMC OS Watchdog performed system reboot.

Critical

This event is generated when the BMC watchdog detects that the system has crashed (timer expired because no response was received from Host) and the action is set to reboot.

BMC OS Watchdog performed system power off.

Critical

This event is generated when the BMC watchdog detects that the system has crashed (timer expired because no response was received from Host) and the action is set to power off.

BMC OS Watchdog performed system power cycle.

Critical

This event is generated when the BMC watchdog detects that the system has crashed (timer expired because no response was received from Host) and the action is set to power cycle.

The OS watchdog timer reset the system.

Critical

This event is generated when the BMC watchdog detects that the system has crashed (timer expired because no response was received from Host) and the action is set to reboot.

The OS watchdog timer power cycled the system.

Critical

This event is generated when the BMC watchdog detects that the system has crashed (timer expired because no response was received from Host) and the action is set to power cycle.

The OS watchdog timer powered off the system.

Critical

This event is generated when the BMC watchdog detects that the system has crashed (timer expired because no response was received from Host) and the action is set to power off.

The OS watchdog timer expired.

Critical

This event is generated when the BMC watchdog timer expires and no action is set.
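
Table 4-7 ties each watchdog message to the action that was configured when the timer expired. The sketch below expresses that relationship as a lookup, using the "BMC OS Watchdog" message forms and severities from the table. The function name `watchdog_event`, its parameters, and the action keys are hypothetical and do not correspond to a real BMC interface.

```python
# Configured action -> (severity, event message), as listed in Table 4-7.
WATCHDOG_EVENTS = {
    "none": ("Information", "BMC OS Watchdog timer expired."),
    "reboot": ("Critical", "BMC OS Watchdog performed system reboot."),
    "power off": ("Critical", "BMC OS Watchdog performed system power off."),
    "power cycle": ("Critical", "BMC OS Watchdog performed system power cycle."),
}


def watchdog_event(seconds_since_host_response, timeout_seconds, action="none"):
    """Return the event logged when the timer expires, or None if it has not."""
    if seconds_since_host_response < timeout_seconds:
        return None  # the host responded in time, so the timer has not expired
    return WATCHDOG_EVENTS[action]


print(watchdog_event(30, 480, action="reboot"))
print(watchdog_event(600, 480, action="reboot"))
print(watchdog_event(600, 480))
```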

Memory Events

The memory modules can be configured in different ways in particular systems. These messages monitor the status, warning, and configuration information about the memory modules in the system.

Table 4-8. Memory Events 

Event Message

Severity

Cause

Memory RAID redundancy degraded.

Warning

This event is generated when there is a memory failure in a RAID-configured memory configuration.

Memory RAID redundancy lost.

Critical

This event is generated when redundancy is lost in a RAID-configured memory configuration.

Memory RAID redundancy regained

Information

This event is generated when the redundancy lost or degraded earlier is regained in a RAID-configured memory configuration.

Memory Mirrored redundancy degraded.

Warning

This event is generated when there is a memory failure in a mirrored memory configuration.

Memory Mirrored redundancy lost.

Critical

This event is generated when redundancy is lost in a mirrored memory configuration.

Memory Mirrored redundancy regained.

Information

This event is generated when the redundancy lost or degraded earlier is regained in a mirrored memory configuration.

Memory Spared redundancy degraded.

Warning

This event is generated when there is a memory failure in a spared memory configuration.

Memory Spared redundancy lost.

Critical

This event is generated when redundancy is lost in a spared memory configuration.

Memory Spared redundancy regained.

Information

This event is generated when the redundancy lost or degraded earlier is regained in a spared memory configuration.

Memory RAID is redundant.

Information

This event is generated when the memory redundancy mode has changed to RAID redundant.

Memory RAID redundancy is lost. Check memory device at location(s) <DIMM number>.

Critical

This event is generated when redundancy is lost in a RAID-configured memory configuration.

Memory RAID redundancy is degraded. Check memory device at location(s) <DIMM number>.

Warning

This event is generated when there is a memory failure in a RAID-configured memory configuration.

Memory is not redundant.

Information

This event is generated when the memory redundancy mode has changed to non-redundant.

Memory mirror is redundant.

Information

This event is generated when the memory redundancy mode has changed to mirror redundant.

Memory mirror redundancy is lost. Check memory device at location(s) <DIMM number>.

Critical

This event is generated when redundancy is lost in a mirror-configured memory configuration.

Memory mirror redundancy is degraded. Check memory device at location <DIMM number>.

Warning

This event is generated when there is a memory failure in a mirror-configured memory configuration.

Memory spare is redundant.

Information

This event is generated when the memory redundancy mode has changed to spare redundant.

Memory spare redundancy is lost. Check memory device at location <DIMM number>.

Critical

This event is generated when redundancy is lost in a spare-configured memory configuration.

Memory spare redundancy is degraded. Check memory device at location <DIMM number>.

Warning

This event is generated when there is a memory failure in a spare-configured memory configuration.

Hardware Log Sensor Events

The hardware logs provide hardware status messages to the system management software. On particular systems, the subsequent hardware messages are not displayed when the log is full. These messages provide status and warning messages when the logs are full.

Table 4-9. Hardware Log Sensor Events 

Event Message

Severity

Cause

Log full detected.

Critical

This event is generated when the SEL device detects that only one entry can be added to the SEL before it is full.

Log cleared.

Information

This event is generated when the SEL is cleared.

Drive Events

The drive event messages monitor the health of the drives in a system. These events are generated when there is a fault in the drives indicated.

Table 4-10. Drive Events 

Event Message

Severity

Cause

Drive <Drive #> asserted fault state.

Critical

This event is generated when the specified drive in the array is faulty.

Drive <Drive #> de-asserted fault state.

Information

This event is generated when the specified drive recovers from a faulty condition.

Drive <Drive #> drive presence was asserted

Information

This event is generated when the drive is installed.

Drive <Drive #> predictive failure was asserted

Warning

This event is generated when the drive is about to fail.

Drive <Drive #> predictive failure was deasserted

Information

This event is generated when the earlier predictive failure of the drive is corrected.

Drive <Drive #> hot spare was asserted

Warning

This event is generated when the drive is assigned as a hot spare.

Drive <Drive #> hot spare was deasserted

Information

This event is generated when the drive is no longer assigned as a hot spare.

Drive <Drive #> consistency check in progress was asserted

Warning

This event is generated when the drive is placed in consistency check.

Drive <Drive #> consistency check in progress was deasserted

Information

This event is generated when the consistency check of the drive is completed.

Drive <Drive #> in critical array was asserted

Critical

This event is generated when the drive is placed in critical array.

Drive <Drive #> in critical array was deasserted

Information

This event is generated when the drive is removed from critical array.

Drive <Drive #> in failed array was asserted

Critical

This event is generated when the drive is placed in the failed array.

Drive <Drive #> in failed array was deasserted

Information

This event is generated when the drive is removed from the failed array.

Drive <Drive #> rebuild in progress was asserted

Information

This event is generated when the drive is rebuilding.

Drive <Drive #> rebuild aborted was asserted

Warning

This event is generated when the drive rebuilding process is aborted.

Drive <Drive #> is installed.

Information

This event is generated when the drive is installed.

Drive <Drive #> is removed.

Critical

This event is generated when the drive is removed.

Fault detected on drive <Drive #>.

Critical

This event is generated when the specified drive in the array is faulty.
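
Many of the drive messages in Table 4-10 share one shape: a drive identifier, a condition, and "was asserted" or "was deasserted". The sketch below splits such a message into those parts. It assumes the drive identifier is a plain number and collapses any line breaks in the message before matching. The names `DriveEvent` and `parse_drive_event` are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class DriveEvent(NamedTuple):
    drive: int
    condition: str
    asserted: bool


_DRIVE_RE = re.compile(
    r"^Drive (?P<drive>\d+) (?P<condition>.+?) was (?P<edge>asserted|deasserted)\.?$"
)


def parse_drive_event(line: str) -> Optional[DriveEvent]:
    """Parse an asserted/deasserted drive message, or return None."""
    # Collapse runs of whitespace so a message wrapped over lines still matches.
    normalized = " ".join(line.split())
    match = _DRIVE_RE.match(normalized)
    if match is None:
        return None
    return DriveEvent(
        drive=int(match["drive"]),
        condition=match["condition"],
        asserted=match["edge"] == "asserted",
    )


print(parse_drive_event("Drive 3 predictive failure was asserted"))
print(parse_drive_event("Drive 0 in critical array was deasserted"))
```

Messages in the newer style, such as "Drive 2 is installed.", do not follow this shape and return None.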

Intrusion Events

The chassis intrusion messages are a security measure. Chassis intrusion alerts are generated when the system's chassis is opened. Alerts are sent to prevent unauthorized removal of parts from the chassis.

Table 4-11. Intrusion Events 

Event Message

Severity

Cause

<Intrusion sensor Name> sensor detected an intrusion.

Critical

This event is generated when the intrusion sensor detects an intrusion.

<Intrusion sensor Name> sensor returned to normal state.

Information

This event is generated when the earlier intrusion has been corrected.

<Intrusion sensor Name> sensor intrusion was asserted while system was ON

Critical

This event is generated when the intrusion sensor detects an intrusion while the system is on.

<Intrusion sensor Name> sensor intrusion was asserted while system was OFF

Critical

This event is generated when the intrusion sensor detects an intrusion while the system is off.

The chassis is open.

Critical

This event is generated when the intrusion sensor detects an intrusion.

The chassis is closed.

Information

This event is generated when the earlier intrusion has been corrected.

The chassis is open while the power is on.

Critical

This event is generated when the intrusion sensor detects an intrusion while the system is on.

The chassis is closed while the power is on.

Information

This event is generated when the earlier intrusion has been corrected while the power is on.

The chassis is open while the power is off.

Critical

This event is generated when the intrusion sensor detects an intrusion while the system is off.

The chassis is closed while the power is off.

Information

This event is generated when the earlier intrusion has been corrected while the power is off.
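
The newer chassis messages in Table 4-11 are determined by two inputs: whether the chassis is open, and, when it is known, whether the power is on. The sketch below builds the message and severity from those inputs. The function name `chassis_message` and the use of None for an unknown power state are assumptions made for illustration.

```python
def chassis_message(is_open, power_on=None):
    """Return (severity, message) for the chassis intrusion state."""
    state = "open" if is_open else "closed"
    # Per Table 4-11, an open chassis is Critical and a closed one is Information.
    severity = "Critical" if is_open else "Information"
    if power_on is None:
        return severity, f"The chassis is {state}."
    power = "on" if power_on else "off"
    return severity, f"The chassis is {state} while the power is {power}."


print(chassis_message(True))
print(chassis_message(True, power_on=False))
print(chassis_message(False, power_on=True))
```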

BIOS Generated System Events

The BIOS-generated messages monitor the health and functionality of the chipsets, I/O channels, and other BIOS-related functions.

Table 4-12. BIOS Generated System Events 

Event Message

Severity

Cause

System Event I/O channel chk.

Critical

This event is generated when a critical interrupt is generated in the I/O Channel.

System Event PCI Parity Err.

Critical

This event is generated when a parity error is detected on the PCI bus.

System Event Chipset Err.

Critical

This event is generated when a chip error is detected.

System Event PCI System Err.

Information

This event indicates historical data, and is generated when the system has crashed and recovered.

System Event PCI Fatal Err.

Critical

This event is generated when a fatal error is detected on the PCI bus.

System Event PCIE Fatal Err.

Critical

This event is generated when a fatal error is detected on the PCIE bus.

POST Err

Critical

This event is generated when an error occurs during system boot. See the system documentation for more information on the error code.

POST fatal error #<number> or <error description>

Critical

This event is generated when a fatal error occurs during system boot. See Table 4-13 for more information.

Memory Spared redundancy lost

Critical

This event is generated when memory spare is no longer redundant.

Memory Mirrored redundancy lost

Critical

This event is generated when memory mirroring is no longer redundant.

Memory RAID redundancy lost

Critical

This event is generated when memory RAID is no longer redundant.

Err Reg Pointer OEM Diagnostic data event was asserted

Information

This event is generated when an OEM event occurs. OEM events can be used by the service team to better understand the cause of the failure.

System Board PFault Fail Safe state asserted

Critical

This event is generated when the system board voltages are not at normal levels.

System Board PFault Fail Safe state deasserted

Information

This event is generated when earlier PFault Fail Safe system voltages return to a normal level.

Memory Add (BANK# DIMM#) presence was asserted

Information

This event is generated when memory is added to the system.

Memory Removed (BANK# DIMM#) presence was asserted

Information

This event is generated when memory is removed from the system.

Memory Cfg Err

configuration error (BANK# DIMM#) was asserted

Critical

This event is generated when memory configuration is incorrect for the system.

Mem Redun Gain

redundancy regained

Information

This event is generated when memory redundancy is regained.

Mem ECC Warning

transition to non-critical from OK

Warning

This event is generated when correctable ECC errors have increased from a normal rate.

Mem ECC Warning

transition to critical from less severe

Critical

This event is generated when correctable ECC errors reach a critical rate.

Mem CRC Err

transition to non-recoverable

Critical

This event is generated when CRC errors enter a non-recoverable state.

Mem Fatal SB CRC

uncorrectable ECC was asserted

Critical

This event is generated while storing CRC errors to memory.

Mem Fatal NB CRC

uncorrectable ECC was asserted

Critical

This event is generated while removing CRC errors from memory.

Mem Overtemp

critical over temperature was asserted

Critical

This event is generated when system memory reaches critical temperature.

USB Over-current

transition to non-recoverable

Critical

This event is generated when the USB exceeds a predefined current level.

Hdwr version err hardware incompatibility (BMC/iDRAC Firmware and CPU mismatch) was asserted

Critical

This event is generated when the BMC/iDRAC firmware and the processor in use are incompatible with each other.

Hdwr version err hardware incompatibility (BMC/iDRAC Firmware and CPU mismatch) was deasserted

Information

This event is generated when an earlier mismatch between the BMC/iDRAC firmware and the processor is corrected.

SBE Log Disabled

correctable memory error logging disabled was asserted

Critical

This event is generated when the ECC single bit error rate is exceeded.

CPU Protocol Err

transition to non-recoverable

Critical

This event is generated when the processor protocol enters a non-recoverable state.

CPU Bus PERR

transition to non-recoverable

Critical

This event is generated when the processor bus PERR enters a non-recoverable state.

CPU Init Err

transition to non-recoverable

Critical

This event is generated when the processor initialization enters a non-recoverable state.

CPU Machine Chk

transition to non-recoverable

Critical

This event is generated when the processor machine check enters a non-recoverable state.

Logging Disabled

all event logging disabled was asserted

Critical

This event is generated when all event logging is disabled.

LinkT/FlexAddr: Link Tuning sensor, device option ROM failed to support link tuning or flex address (Mezz XX) was asserted

Critical

This event is generated when the PCI device option ROM for a NIC does not support link tuning or the Flex addressing feature.

LinkT/FlexAddr: Link Tuning sensor, failed to program virtual MAC address (<location>) was asserted.

Critical

This event is generated when BIOS fails to program virtual MAC address on the given NIC device.

PCIE NonFatal Er: Non Fatal IO Group sensor, PCIe error(<location>)

Warning

This event is generated in association with a CPU IERR.

I/O Fatal Err: Fatal IO Group sensor, fatal IO error (<location>)

Critical

This event is generated in association with a CPU IERR and indicates the PCI/PCIe device that caused the CPU IERR.

Unknown system event sensor unknown system hardware failure was asserted

Critical

This event is generated when an unknown hardware failure is detected.

An I/O channel check error was detected.

Critical

This event is generated when a critical interrupt is generated in the I/O Channel.

A PCI parity error was detected on a component at bus <number> device <number> function <number>.

Critical

This event is generated when a parity error is detected on the PCI bus.

A PCI parity error was detected on a component at slot <number>.

Critical

This event is generated when a parity error is detected on the PCI bus.

A PCI system error was detected on a component at bus <number> device <number> function <number>.

Critical

This event is generated when the system has crashed and recovered.

A PCI system error was detected on a component at slot <number>.

Critical

This event is generated when the system has crashed and recovered.

A bus correctable error was detected on a component at bus <number> device <number> function <number>.

Critical

This event is generated when the system has detected bus correctable errors.

A bus correctable error was detected on a component at slot <number>.

Critical

This event is generated when the system has detected bus correctable errors.

A bus uncorrectable error was detected on a component at bus <number> device <number> function <number>.

Critical

This event is generated when the system has detected bus uncorrectable errors.

A bus uncorrectable error was detected on a component at slot <number>.

Critical

This event is generated when the system has detected bus uncorrectable errors.

A fatal error was detected on a component at bus <number> device <number> function <number>.

Critical

This error is generated when a fatal error is detected on the PCI bus.

A fatal error was detected on a component at slot <number>.

Critical

This error is generated when a fatal error is detected on the PCI bus.

A fatal IO error detected on a component at bus <number> device <number> function <number>.

Critical

This error is generated when a fatal IO error is detected.

A fatal IO error detected on a component at slot <number>.

Critical

This error is generated when a fatal IO error is detected.

A non-fatal PCIe error detected on a component at bus <number> device <number> function <number>.

Warning

This event is generated in association with a CPU IERR.

A non-fatal PCIe error detected on a component at slot <number>.

Warning

This event is generated in association with a CPU IERR.

A non-fatal IO error detected on a component at bus <number> device <number> function <number>.

Warning

This event is generated in association with a CPU IERR and indicates the PCI/PCIe device that caused the CPU IERR.

Memory device was added at location <location>.

Information

This event is generated when memory is added to the system.

Memory device is removed from location <location>.

Information

This event is generated when memory is removed from the system.

Unsupported memory configuration; check memory device at location <location>.

Critical

This event is generated when memory configuration is incorrect for the system.

Correctable memory error rate exceeded for <location>.

Warning

This event is generated when correctable ECC errors have increased from a normal rate.

Correctable memory error rate exceeded for <location>.

Critical

This event is generated when correctable ECC errors reach a critical rate.

Memory device at location <location> is overheating.

Critical

This event is generated when system memory reaches critical temperature.

An OEM diagnostic event occurred.

Information

This event is generated when an OEM event occurs. OEM events can be used by the service team to better understand the cause of the failure.

CPU <number> protocol error detected.

Critical

This event is generated when the processor protocol enters a non-recoverable state.

CPU bus parity error detected.

Critical

This event is generated when the processor bus PERR enters a non-recoverable state.

CPU <number> initialization error detected.

Critical

This event is generated when the processor initialization enters a non-recoverable state.

CPU <number> machine check error detected.

Critical

This event is generated when the processor machine check enters a non-recoverable state.

All event logging is disabled.

Critical

This event is generated when all event logging is disabled.

Logging is disabled.

Critical

This event is generated when the ECC single bit error rate is exceeded.

The system board fail-safe voltage is outside of range.

Critical

This event is generated when the system board voltages are not at normal levels.

The system board fail-safe voltage is within range.

Information

This event is generated when earlier Fail-Safe system voltages return to a normal level.

A hardware incompatibility detected between BMC/iDRAC firmware and CPU.

Critical

This event is generated when the BMC/iDRAC firmware and the processor in use are incompatible with each other.

A hardware incompatibility was corrected between BMC/iDRAC firmware and CPU.

Information

This event is generated when an earlier mismatch between the BMC/iDRAC firmware and the processor is corrected.

Device option ROM on embedded NIC failed to support Link Tuning or FlexAddress.

Critical

This event is generated when the PCI device option ROM for a NIC does not support link tuning or the Flex addressing feature.

Device option ROM on mezzanine card <number> failed to support Link Tuning or FlexAddress.

Critical

This event is generated when the PCI device option ROM for a NIC does not support link tuning or the Flex addressing feature.

Failed to program virtual MAC address on a component at bus <bus> device <device> function <function>.

Critical

This event is generated when BIOS fails to program virtual MAC address on the given NIC device.

Failed to get Link Tuning or FlexAddress data from iDRAC.

Critical

This event is generated when BIOS could not obtain virtual MAC address or Link Tuning data from iDRAC.

An unknown system hardware failure detected.

Critical

This event is generated when an unknown hardware failure is detected.

POST fatal error <error description>

Critical

This event is generated when a fatal error occurs during system boot. See Table 4-13 for more information.
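Several of the messages in the table above embed the component location either as "bus <number> device <number> function <number>" or as "slot <number>". A monitoring script might pull that location out with a regular expression. The following is only a minimal sketch: it assumes the numbers are rendered as decimal digits and that the wording matches the table exactly. The name `parse_location` and the returned dictionary layout are hypothetical and are not part of any IPMI tool.

```python
import re

# Patterns follow the message wording listed in the table above.
# Assumption: each <number> field is rendered as decimal digits.
_BDF = re.compile(r"at bus (\d+) device (\d+) function (\d+)")
_SLOT = re.compile(r"at slot (\d+)")


def parse_location(message):
    """Return the component location embedded in a SEL message, or None."""
    m = _BDF.search(message)
    if m:
        bus, device, function = (int(g) for g in m.groups())
        return {"bus": bus, "device": device, "function": function}
    m = _SLOT.search(message)
    if m:
        return {"slot": int(m.group(1))}
    # Messages without a location (for example, logging events) fall through.
    return None
```

Messages that carry no location, such as "All event logging is disabled.", return `None`, so callers can treat the location as optional.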

POST Code Table

Table 4-13 lists the POST Code errors that are generated when a fatal error occurs during system boot.

Table 4-13. POST Code Errors 

Fatal Error Code

Description

Cause

80

No memory detected

This error code implies that no memory is installed.

81

Memory detected but is not configurable

This error code indicates a memory configuration error that could result from bad memory, mismatched memory, or a bad socket.

82

Memory configured but not usable

This error code indicates a memory subsystem failure.

83

System BIOS shadow failure

This error code indicates system BIOS shadow failure.

84

CMOS failure

This error code indicates that CMOS RAM is not working.

85

DMA controller failure

This error code indicates DMA controller failure.

86

Interrupt controller failure

This error code indicates interrupt controller failure.

87

Timer refresh failure

This error code indicates timer refresh failure.

88

Programmable interval timer error

This error code indicates a programmable interval timer error.

89

Parity error

This error code indicates a parity error.

8A

SIO failure

This error code indicates SIO failure.

8B

Keyboard controller failure

This error code indicates keyboard controller failure.

8C

SMI initialization failure

This error code indicates SMI initialization failure.

C0

Shutdown test failure

This error code indicates a shutdown test failure.

C1

POST Memory test failure

This error code indicates bad memory detection.

C2

RAC configuration failure

Check screen for the actual error message

C3

CPU configuration failure

Check screen for the actual error message

C4

Incorrect memory configuration

Memory population order not correct.

FE

General failure after video

Check screen for the actual error message
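The codes in Table 4-13 lend themselves to a simple lookup when decoding a "POST fatal error" message in a script. The sketch below copies the codes and descriptions from the table. It assumes the codes are hexadecimal, which values such as 8A and FE suggest but the table does not state. The names `POST_FATAL_CODES` and `describe_post_code` are hypothetical.

```python
# Fatal POST error codes and descriptions, copied from Table 4-13.
# Assumption: the codes are hexadecimal values.
POST_FATAL_CODES = {
    0x80: "No memory detected",
    0x81: "Memory detected but is not configurable",
    0x82: "Memory configured but not usable",
    0x83: "System BIOS shadow failure",
    0x84: "CMOS failure",
    0x85: "DMA controller failure",
    0x86: "Interrupt controller failure",
    0x87: "Timer refresh failure",
    0x88: "Programmable interval timer error",
    0x89: "Parity error",
    0x8A: "SIO failure",
    0x8B: "Keyboard controller failure",
    0x8C: "SMI initialization failure",
    0xC0: "Shutdown test failure",
    0xC1: "POST Memory test failure",
    0xC2: "RAC configuration failure",
    0xC3: "CPU configuration failure",
    0xC4: "Incorrect memory configuration",
    0xFE: "General failure after video",
}


def describe_post_code(code):
    """Accept an int or a hex string such as '8A' and return the description."""
    if isinstance(code, str):
        code = int(code, 16)
    return POST_FATAL_CODES.get(code, "Unknown POST fatal error code")
```

Codes that are not listed in the table map to a generic "unknown" string rather than raising an error, so a log parser can keep running when it meets an unfamiliar code.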

Operating System Generated System Events

Table 4-14. Operating System Generated Events 

Description

Severity

Cause

System Event: OS stop event

OS graceful shutdown detected

Information

The operating system was shut down or restarted normally.

OEM Event data record (after OS graceful shutdown/restart event)

Information

Comment string accompanying an operating system shutdown/restart.

System Event: OS stop event runtime

critical stop

Critical

The operating system encountered a critical error and was stopped abnormally.

OEM Event data record (after OS bugcheck event)

Information

Operating system bugcheck code and parameters.

A critical stop occurred during OS load.

Critical

The operating system encountered a critical error and was stopped abnormally while loading.

A runtime critical stop occurred.

Critical

The operating system encountered a critical error and was stopped abnormally.

An OS graceful stop occurred.

Information

The operating system was stopped.

An OS graceful shut-down occurred.

Information

The operating system was shut down normally.

Cable Interconnect Events

The cable interconnect messages in Table 4-15 are used for detecting errors in the hardware cabling.

Table 4-15. Cable Interconnect Events 

Description

Severity

Cause

Cable sensor <Name/Location>

Configuration error was asserted.

Critical

This event is generated when the cable is not connected or is incorrectly connected.

Cable sensor <Name/Location>

Connection was asserted.

Information

This event is generated when the earlier cable connection error was corrected.

The <name> cable or interconnect is not connected or is improperly connected.

Critical

This event is generated when the named cable or interconnect is not connected or is incorrectly connected.

The <name> cable or interconnect is connected.

Information

This event is generated when an earlier connection error on the named cable or interconnect was corrected.

Battery Events

Table 4-16. Battery Events 

Description

Severity

Cause

<Battery sensor Name/Location>

Failed was asserted

Critical

This event is generated when the sensor detects a failed or missing battery.

<Battery sensor Name/Location>

Failed was deasserted

Information

This event is generated when the earlier failed battery was corrected.

<Battery sensor Name/Location>

is low was asserted

Warning

This event is generated when the sensor detects a low battery condition.

<Battery sensor Name/Location>

is low was deasserted

Information

This event is generated when the earlier low battery condition was corrected.

The <Battery sensor Name/Location> battery is low.

Warning

This event is generated when the sensor detects a low battery condition.

The <Battery sensor Name/Location> battery is operating normally.

Information

This event is generated when an earlier battery condition was corrected.

The <Battery sensor Name/Location> battery has failed.

Critical

This event is generated when the sensor detects a failed or missing battery.
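The last three messages in Table 4-16 each end in a fixed phrase, so their severity can be recovered from the message text alone. The following is a minimal sketch under that assumption. It covers only those sentence-style messages, not the "was asserted" and "was deasserted" forms, and the names `BATTERY_SEVERITY` and `battery_severity` are hypothetical.

```python
# Message endings and severities, taken from the sentence-style rows
# of Table 4-16. The asserted/deasserted forms are not handled here.
BATTERY_SEVERITY = (
    ("battery has failed.", "Critical"),
    ("battery is low.", "Warning"),
    ("battery is operating normally.", "Information"),
)


def battery_severity(message):
    """Return the severity for a sentence-style battery message, or None."""
    for suffix, severity in BATTERY_SEVERITY:
        if message.endswith(suffix):
            return severity
    return None
```

For example, a message such as "The CMOS battery is low." (the sensor name here is illustrative) would map to Warning.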

Power And Performance Events

The power and performance events are used to detect degradation in system performance caused by a change in the power supply.

Table 4-17. Power And Performance Events 

Description

Severity

Cause

System Board Power Optimized: Performance status sensor for System Board, degraded, <description of why> was deasserted

Normal

This event is generated when system performance was restored.

System Board Power Optimized: Performance status sensor for System Board, degraded, <description of why> was asserted

Warning

This event is generated when a change in power supply degrades system performance.

System Board Power Optimized: Performance status sensor for System Board, degraded, power capacity changed was asserted

Warning

This event is generated when a change in power supply degrades system performance.

System Board Power Optimized: Performance status sensor for System Board, degraded, power capacity changed was deasserted

Normal

This event is generated when the system performance is restored.

System Board Power Optimized: Performance status sensor for System Board, degraded, user defined power capacity was asserted

Warning

This event is generated when a change in power supply degrades system performance.

System Board Power Optimized: Performance status sensor for System Board, degraded, user defined power capacity was deasserted

Normal

This event is generated when the system performance is restored.

System Board Power Optimized: Performance status sensor for System Board, Halted, system power exceeds capacity was asserted

Critical

This event is generated when a change in power supply degrades system performance.

System Board Power Optimized: Performance status sensor for System Board, Halted, system power exceeds capacity was deasserted

Normal

This event is generated when system performance was restored.

The system performance degraded.

Warning

This event is generated when a change degrades system performance.

The system performance degraded because of thermal protection.

Warning

This event is generated when a change in thermal protection degrades system performance.

The system performance degraded because cooling capacity has changed.

Warning

This event is generated when a change in cooling degrades system performance.

The system performance degraded because power capacity has changed.

Warning

This event is generated when a change in power supply degrades system performance.

The system performance degraded because of user-defined power capacity has changed.

Warning

This event is generated when a change in power supply degrades system performance.

The system halted because system power exceeds capacity.

Critical

This event is generated when there is insufficient power for the system.

The system performance degraded because power exceeds capacity.

Warning

This event is generated when system power is insufficient, causing system performance to degrade.

The system performance degraded because power draw exceeds the power threshold.

Critical

This event is generated when system power is insufficient, causing system performance to degrade.

The system performance restored

Information

This event is generated when system performance was restored.

Entity Presence Events

The entity presence messages are used for detecting different hardware devices.

Table 4-18. Entity Presence Events 

Description

Severity

Cause

<Device Name>

presence was asserted

Information

This event is generated when the device was detected.

<Device Name>

absent was asserted

Critical

This event is generated when the device was not detected.

The <Device Name> is present.

Information

This event is generated when the device was detected.

The <Device Name> is absent.

Critical

This event is generated when the device was not detected.
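Because the sentence-style presence messages in Table 4-18 come in matching "is present" and "is absent" pairs, replaying them in log order yields the last known state of each device. The sketch below illustrates the idea. It assumes the messages appear exactly as worded in the table, and the function name `apply_presence_events` and the device names in the usage note are hypothetical.

```python
import re

# Matches the sentence-style messages from Table 4-18:
#   "The <Device Name> is present." / "The <Device Name> is absent."
_PRESENCE = re.compile(r"^The (?P<device>.+) is (?P<state>present|absent)\.$")


def apply_presence_events(messages):
    """Replay messages in order and return {device name: is_present}."""
    state = {}
    for msg in messages:
        m = _PRESENCE.match(msg)
        if m:
            # Later events overwrite earlier ones for the same device.
            state[m.group("device")] = m.group("state") == "present"
    return state
```

Given the messages "The Fan 1 is present." followed by "The Fan 1 is absent.", the result records Fan 1 as absent, which corresponds to the Critical row of the table.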

Miscellaneous

The following table lists events related to hardware and software components, such as mezzanine cards, sensors, and firmware, and to compatibility issues.

Table 4-19. Miscellaneous Events 

Description

Severity

Cause

System Board Video Riser: Module sensor for System Board, device removed was asserted

Critical

This event is generated when the required module is removed.

Mezz B<slot number> Status: Add-in Card sensor for Mezz B<slot number>, install error was asserted

Critical

This event is generated when an incorrect Mezzanine card is installed for I/O fabric.

Mezz C<slot number> Status: Add-in Card sensor for Mezz C<slot number>, install error was asserted

Critical

This event is generated when an incorrect Mezzanine card is installed for I/O fabric.

Hdwar version err: Version Change sensor, hardware incompatibility was asserted

Critical

This event is generated when incompatible hardware is detected.

Hdwar version err: Version Change sensor, hardware incompatibility (BMC firmware) was asserted

Critical

This event is generated when the hardware is incompatible with the firmware.

Hdwar version err: Version Change sensor, hardware incompatibility (BMC firmware and CPU mismatch) was asserted

Critical

This event is generated when the CPU and firmware are not compatible.

Link Tuning: Version Change sensor, successful software or F/W change was deasserted

Warning

This event is generated when the link tuning setting for proper NIC operation fails to update.

Link Tuning: Version Change sensor, successful hardware change <device slot number> was deasserted

Warning

This event is generated when the link tuning setting for proper NIC operation fails to update.

LinkT/FlexAddr: Link Tuning sensor, failed to program virtual MAC address (Bus # Device # Function #) was asserted

Critical

This event is generated when Flex address cannot be programmed for this device.

LinkT/FlexAddr: Link Tuning sensor, device option ROM failed to support link tuning or flex address (Mezz <location>) was asserted

Critical

This event is generated when ROM does not support Flex address or link tuning.

LinkT/FlexAddr: Link Tuning sensor, failed to get link tuning or flex address data from BMC/iDRAC was asserted

Critical

This event is generated when link tuning or Flex address information is not obtained from BMC/iDRAC.

The <name> is removed.

Critical

This event is generated when the device was removed.

The <name> is inserted.

Information

This event is generated when the device was inserted or installed.

A fabric mismatch detected between IOM and mezzanine card <number>.

Critical

This event is generated when an incorrect Mezzanine card is installed for I/O fabric.

Hardware incompatibility detected with mezzanine card <number>.

Critical

This event is generated when an incorrect Mezzanine card is installed in the system.

The QuickPath Interconnect (QPI) width degraded.

Warning

This event is generated when the bus is not operating at maximum speed or width.

The QuickPath Interconnect (QPI) width regained.

Information

This event is generated when the bus is operating at maximum speed or width.

BIOS detected an error configuring the Intel Trusted Execution Technology (TXT).

Critical

This event is generated when TXT initialization failed.

Processor detected an error while performing an Intel Trusted Execution Technology (TXT) operation.

Critical

This event is generated when TXT CPU microcode boot failed.

BIOS Authenticated Code Module detected an Intel Trusted Execution Technology (TXT) error during POST.

Critical

This event is generated when TXT POST failed.

SINIT Authenticated Code Module detected an Intel Trusted Execution Technology (TXT) error at boot.

Critical

This event is generated when the Authenticated Code Module detected a TXT initialization failure.

Intel Trusted Execution Technology (TXT) is operating correctly.

Information

This event is generated when TXT recovers from a previous failure.

Failure detected on Removable Flash Media <name>.

Critical

This event is generated when the SD card module is installed but improperly configured or failed to initialize.

Removable Flash Media <name> is write protected.

Warning

This event is generated when the module is write-protected. Changes may not be written to the media.

Internal Dual SD Module is redundant.

Information

This event is generated when both SD cards are functioning properly.

Internal Dual SD Module redundancy is lost.

Critical

This event is generated when one or both of the SD cards are not functioning properly.