Name: Cisco CRS-1 - Carrier Routing System Router
Brand: Cisco

Cisco IOS XR Troubleshooting Guide for the Cisco CRS-1 Router

OL-21483-02

Chapter 8 Process Monitoring and Troubleshooting

System Manager

Each process is assigned a job ID (JID) when started. The JID does not change when a process is started, stopped, then restarted. Each process is also assigned a process ID (PID) when started, but this PID changes each time the process is stopped and restarted.
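The difference between the two identifiers can be illustrated with a toy model. The following Python sketch is not Cisco code: the class name, the process name, and the starting ID values are all hypothetical. It only shows the rule described above, that the JID is kept across restarts while a new PID is issued on every start.

```python
import itertools


class ProcessTable:
    """Toy model: the JID is fixed per process, the PID is new on every start."""

    def __init__(self):
        # Starting values are arbitrary and chosen for illustration only.
        self._next_jid = itertools.count(100)
        self._next_pid = itertools.count(5000)
        self._jids = {}   # process name -> JID, kept across restarts
        self._pids = {}   # process name -> PID of the running instance

    def start(self, name):
        # Reuse the JID if the process has run before; otherwise allocate one.
        if name not in self._jids:
            self._jids[name] = next(self._next_jid)
        # Every start yields a fresh PID.
        self._pids[name] = next(self._next_pid)
        return self._jids[name], self._pids[name]

    def stop(self, name):
        # Stopping drops the PID; the JID is kept.
        self._pids.pop(name, None)

    def restart(self, name):
        self.stop(name)
        return self.start(name)


table = ProcessTable()
jid_before, pid_before = table.start("example_proc")
jid_after, pid_after = table.restart("example_proc")
print(jid_before == jid_after, pid_before != pid_after)
```

Because the JID is stable, it is the more convenient handle when tracking one process across several restarts.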

The System Manager (sysmgr) is the fundamental process and the foundation of the system. It is responsible for monitoring, starting, stopping, and restarting almost all processes on the system. Whether a process is restarted is predefined (respawn flag on or off) and honored by sysmgr. The sysmgr is the parent of all processes started on boot-up and by configuration. Two sysmgr instances run on each node, providing hot-standby, process-level redundancy. Each active process is registered with the SysDB, and once the active sysmgr process has started a process, sysmgr is notified when that process is running. If the active sysmgr process dies, the standby process takes over the active state and a new standby process is created. The sysmgr running on a line card (LC) handles all system management duties relevant to that node, such as process creation, respawning, and core dumping.
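How a supervisor honors a predefined respawn flag can be sketched in a few lines of Python. This is a generic illustration under stated assumptions, not the sysmgr implementation: `ManagedProcess`, `Supervisor`, and the process names are hypothetical, and "starting" a process is reduced to setting a flag.

```python
from dataclasses import dataclass


@dataclass
class ManagedProcess:
    name: str
    respawn: bool          # predefined restart policy (on or off)
    running: bool = False
    starts: int = 0


class Supervisor:
    """Minimal supervisor that restarts only processes whose respawn flag is on."""

    def __init__(self, processes):
        self.processes = {p.name: p for p in processes}

    def _start(self, proc):
        proc.running = True
        proc.starts += 1

    def start_all(self):
        for proc in self.processes.values():
            self._start(proc)

    def on_exit(self, name):
        """Handle an unexpected exit; return True if the process is running again."""
        proc = self.processes[name]
        proc.running = False
        if proc.respawn:
            self._start(proc)
        return proc.running


supervisor = Supervisor([
    ManagedProcess("respawning_proc", respawn=True),
    ManagedProcess("one_shot_proc", respawn=False),
])
supervisor.start_all()
restarted = supervisor.on_exit("respawning_proc")
left_down = supervisor.on_exit("one_shot_proc")
print(restarted, left_down)
```

The policy lives with the process definition rather than in the supervisor logic, which matches the description of the restart behavior being predefined and merely honored.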

The sysmgr itself is started on boot-up by the initialization process. Once the sysmgr is started, the initialization process hands ownership of all the processes it started over to sysmgr and exits.

Watchdog System Monitor

The Watchdog System Monitor (wdsysmon) keeps historical data on processes and posts this information to a fault detector dynamic link library (DLL), which can then be queried by manageability applications. Once per minute, wdsysmon polls the kernel for process data. This data is stored in a database maintained by the fm_fd_wdsysmon.dll fault detector, which is loaded by wdsysmon.
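The poll-and-store pattern described here can be sketched as follows. This Python example is only an illustration: `ProcessHistory`, the `sample_fn` callback that stands in for the kernel query, the `memory_kb` field, and the history length of five samples are all assumptions, and nothing here reflects the actual format of the fault detector database.

```python
from collections import defaultdict, deque

POLL_INTERVAL_SECONDS = 60   # "once per minute" in the text
HISTORY_LENGTH = 5           # assumption: samples kept per process


class ProcessHistory:
    """Keeps a bounded history of per-process samples that clients can query."""

    def __init__(self, sample_fn, history_length=HISTORY_LENGTH):
        # sample_fn is a hypothetical stand-in for polling the kernel; it
        # returns a dict mapping process name -> sample data.
        self._sample_fn = sample_fn
        self._history = defaultdict(lambda: deque(maxlen=history_length))

    def poll_once(self):
        # A real monitor would call this every POLL_INTERVAL_SECONDS.
        for name, data in self._sample_fn().items():
            self._history[name].append(data)

    def query(self, name):
        # Oldest sample first; unknown processes yield an empty list.
        return list(self._history.get(name, ()))


# Fake data source producing seven samples with growing memory use.
samples = iter({"example_proc": {"memory_kb": 1000 + i}} for i in range(7))
history = ProcessHistory(lambda: next(samples))
for _ in range(7):
    history.poll_once()
print(history.query("example_proc"))
```

A bounded queue keeps memory use constant: once the limit is reached, each new sample displaces the oldest one.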

For more information on wdsysmon and memory thresholds, see the “Watchdog System Monitor” section on page 9-197 in Chapter 9, “Troubleshooting Memory.”

Deadlock detection

Wdsysmon can attempt to find deadlocks because thread state is returned with the process data. Wdsysmon specifically looks for mutex deadlocks and local Inter-Process Communication (IPC) hangs. Only local IPC deadlocks can be detected. If deadlocks are detected, debugging information is collected in disk0:/wdsysmon_debug.

Deadlocked processes can be stopped and restarted manually using the process restart command.
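A mutex deadlock shows up in thread state as a cycle: each thread in the cycle is blocked on a mutex held by the next one. The Python sketch below uses a generic wait-for-graph check to find such a cycle. It is not the algorithm wdsysmon uses; the function name, the input layout, and the thread and mutex names are hypothetical.

```python
def find_mutex_deadlock(waiting_on, owned_by):
    """Return the list of threads forming a wait cycle, or None if there is none.

    waiting_on: thread -> mutex the thread is blocked on
    owned_by:   mutex  -> thread that currently holds it
    """
    for start in waiting_on:
        path = []
        seen = set()
        thread = start
        # Follow "blocked on a mutex held by" links until they end or repeat.
        while thread in waiting_on and thread not in seen:
            seen.add(thread)
            path.append(thread)
            thread = owned_by.get(waiting_on[thread])
        if thread in seen:
            # Trim any lead-in so only the cycle itself is returned.
            return path[path.index(thread):]
    return None


# t1 holds m1 and waits for m2; t2 holds m2 and waits for m1.
waiting_on = {"t1": "m2", "t2": "m1"}
owned_by = {"m1": "t1", "m2": "t2"}
cycle = find_mutex_deadlock(waiting_on, owned_by)
print(cycle)
```

This check only sees threads and mutexes present in the supplied state, so it can only find waits that are visible locally.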

Hang detection

When an event manager is created in the system, the event manager library registers it with wdsysmon. Wdsysmon expects to periodically hear a “pulse” from every registered event manager in the system. When the pulse from an event manager is missing, wdsysmon runs a debug script that shows exactly what the thread that created the event manager is doing.
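The pulse mechanism is a form of heartbeat monitoring, which can be sketched as follows. This Python example is a minimal illustration, not wdsysmon code: `PulseMonitor`, the 30-second timeout, and the event manager names are assumptions, and timestamps are passed in explicitly so the example is deterministic.

```python
class PulseMonitor:
    """Tracks the last pulse time of each registered event manager."""

    def __init__(self, timeout_seconds):
        # Assumption: one timeout applies to every registered event manager.
        self.timeout_seconds = timeout_seconds
        self._last_pulse = {}

    def register(self, name, now):
        # Registration counts as the first pulse.
        self._last_pulse[name] = now

    def pulse(self, name, now):
        self._last_pulse[name] = now

    def overdue(self, now):
        """Return the names whose last pulse is older than the timeout."""
        return sorted(
            name
            for name, last in self._last_pulse.items()
            if now - last > self.timeout_seconds
        )


monitor = PulseMonitor(timeout_seconds=30)
monitor.register("evm_a", now=0)
monitor.register("evm_b", now=0)
monitor.pulse("evm_a", now=25)      # evm_a keeps pulsing, evm_b goes silent
hung = monitor.overdue(now=40)
print(hung)
```

In this sketch, each name returned by `overdue` marks the point at which a monitor would collect debug information about the owning thread.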
