8-172
Cisco IOS XR Troubleshooting Guide for the Cisco CRS-1 Router
OL-21483-02
Chapter 8 Process Monitoring and Troubleshooting
System Manager
Each process is assigned a job ID (JID) when started. The JID stays the same when the process is
stopped and restarted. Each process is also assigned a process ID (PID) when started, but the PID
changes each time the process is stopped and restarted.
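The JID and PID behavior described above can be illustrated with a small model. This is only a sketch: the ProcessEntry class, the process name, the JID value, and the PID counter are hypothetical stand-ins, not part of Cisco IOS XR.

```python
from dataclasses import dataclass
from itertools import count

# Hypothetical PID source for illustration; real PIDs are assigned by the kernel.
_next_pid = count(1000)


@dataclass
class ProcessEntry:
    name: str
    jid: int      # assigned once and kept across restarts
    pid: int = 0  # reassigned on every start

    def start(self) -> None:
        self.pid = next(_next_pid)

    def restart(self) -> None:
        # Stop and start again: the JID is untouched, the PID is new.
        self.start()


entry = ProcessEntry(name="example_proc", jid=120)
entry.start()
first_pid = entry.pid
entry.restart()
print(entry.jid, first_pid, entry.pid)
```

In this model, a script that tracks a process across restarts would key on the JID, because the PID it saw before the restart no longer identifies the process.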
The System Manager (sysmgr) is the fundamental process and the foundation of the system. The sysmgr
is responsible for monitoring, starting, stopping, and restarting almost all processes on the system.
Whether a process is restarted is predefined (respawn flag on or off) and honored by sysmgr. The sysmgr
is the parent of all processes started at boot-up and by configuration. Two sysmgr instances run on each
node, providing hot-standby, process-level redundancy. Each active process is registered with the SysDB,
and once the active sysmgr process starts a process, the sysmgr is notified when that process is running.
If the active sysmgr process dies, the standby process takes over the active state and a new standby
process is generated.
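The respawn behavior described above can be sketched as a toy supervisor that restarts a child only when its predefined respawn flag is on. The ManagedProcess and Supervisor names are hypothetical; this is a simplified model, not how sysmgr is implemented.

```python
from dataclasses import dataclass


@dataclass
class ManagedProcess:
    name: str
    respawn: bool          # predefined per process; the supervisor only honors it
    running: bool = False
    starts: int = 0        # how many times the process has been started


class Supervisor:
    """Toy parent process that starts children and reacts to their exits."""

    def __init__(self) -> None:
        self.children: dict[str, ManagedProcess] = {}

    def start(self, proc: ManagedProcess) -> None:
        proc.running = True
        proc.starts += 1
        self.children[proc.name] = proc

    def on_exit(self, name: str) -> bool:
        """Handle a child exit; return True if the child was restarted."""
        proc = self.children[name]
        proc.running = False
        if proc.respawn:
            self.start(proc)
        return proc.running


sup = Supervisor()
restartable = ManagedProcess("restartable", respawn=True)
one_shot = ManagedProcess("one_shot", respawn=False)
sup.start(restartable)
sup.start(one_shot)
```

Calling `sup.on_exit("restartable")` starts the process again, while `sup.on_exit("one_shot")` leaves it stopped.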
The sysmgr running on the line card (LC) handles all system management duties relevant to that node,
such as process creation, respawning, and core dumping.
The sysmgr itself is started at boot-up by the initialization process. Once the sysmgr is started,
initialization hands over ownership of all the processes it started to sysmgr and exits.
Watchdog System Monitor
The Watchdog System Monitor (wdsysmon) keeps historical data on processes and posts this
information to a fault detector dynamic link library (DLL), which can then be queried by manageability
applications. Once per minute, wdsysmon polls the kernel for process data. This data is stored in a
database maintained by the fm_fd_wdsysmon.dll fault detector, which is loaded by wdsysmon.
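The polling and history model described above can be pictured as a bounded per-process sample store that is filled once per poll and read by queries. The ProcessHistory class, the history depth, and the sample fields are assumptions for illustration; the actual format of the fault detector database is not described in this guide.

```python
from collections import defaultdict, deque

POLL_INTERVAL_S = 60  # from the text: the kernel is polled once per minute
HISTORY_DEPTH = 5     # assumption: number of samples kept per process


class ProcessHistory:
    """Keeps the most recent samples for each process, keyed by JID."""

    def __init__(self, depth: int = HISTORY_DEPTH) -> None:
        self._samples = defaultdict(lambda: deque(maxlen=depth))

    def record(self, snapshot: dict) -> None:
        # One poll result, for example {120: {"cpu": 3, "mem_kb": 2048}}.
        for jid, data in snapshot.items():
            self._samples[jid].append(data)

    def query(self, jid: int) -> list:
        # Oldest sample first; empty list for an unknown JID.
        return list(self._samples.get(jid, ()))


history = ProcessHistory(depth=2)
for cpu in (1, 5, 9):
    history.record({120: {"cpu": cpu}})
```

With a depth of 2, the oldest of the three recorded samples is discarded, so a query for JID 120 returns only the two most recent ones.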
For more information on wdsysmon and memory thresholds, see the “Watchdog System Monitor”
section on page 9-197 in Chapter 9, “Troubleshooting Memory.”
Deadlock detection
Because thread state is returned with the process data, wdsysmon can attempt to find deadlocks.
It looks specifically for mutex deadlocks and local interprocess communication (IPC) hangs. IPC
deadlocks that are not local cannot be detected. If deadlocks are detected, debugging information is
collected in disk0:/wdsysmon_debug.
Deadlocked processes can be stopped and restarted manually using the process restart command.
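A mutex deadlock of the kind described above can be pictured as a cycle in a wait-for graph built from thread state. The following is a minimal sketch of that general technique, assuming each blocked thread waits on exactly one mutex owner. The find_deadlock function and the thread names are hypothetical, and the guide does not state which algorithm wdsysmon uses.

```python
def find_deadlock(waits_for):
    """Return the threads that form a wait cycle, or None if there is none.

    waits_for maps a blocked thread to the thread that owns the mutex it
    wants. Each thread waits on at most one mutex, so every node has at
    most one outgoing edge and a simple walk is enough to find a cycle.
    """
    for start in waits_for:
        seen = []
        current = start
        while current in waits_for:
            if current in seen:
                # The walk returned to a visited thread: that tail is the cycle.
                return seen[seen.index(current):]
            seen.append(current)
            current = waits_for[current]
    return None


# t1 waits for a mutex held by t2, and t2 waits for one held by t1.
cycle = find_deadlock({"t1": "t2", "t2": "t1"})
# t1 waits for t2, t2 waits for t3, and t3 is not blocked.
no_cycle = find_deadlock({"t1": "t2", "t2": "t3"})
```

The first call reports both threads as a cycle. The second finds no cycle, because the chain ends at a thread that is still running.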
Hang detection
When an event manager is created in the system, the event manager library registers the event manager
with wdsysmon. Wdsysmon expects to periodically hear a “pulse” from every registered event manager in
the system. When the pulse from an event manager is missing, wdsysmon runs a debug script that shows
exactly what the thread that created the event manager is doing.
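The pulse mechanism described above follows a common heartbeat pattern, sketched below. The PulseMonitor class, the 30-second timeout, and the event manager names are assumptions for illustration; the guide does not give the actual pulse interval or registration interface.

```python
class PulseMonitor:
    """Tracks the last pulse time of each registered event manager."""

    def __init__(self, timeout_s: float) -> None:
        self.timeout_s = timeout_s  # assumption: allowed silence before flagging
        self.last_pulse: dict[str, float] = {}

    def register(self, name: str, now: float) -> None:
        # Registration counts as the first sign of life.
        self.last_pulse[name] = now

    def pulse(self, name: str, now: float) -> None:
        self.last_pulse[name] = now

    def missing(self, now: float) -> list:
        """Return event managers whose last pulse is older than the timeout."""
        return sorted(
            name
            for name, seen in self.last_pulse.items()
            if now - seen > self.timeout_s
        )


monitor = PulseMonitor(timeout_s=30)
monitor.register("evm_a", now=0)
monitor.register("evm_b", now=0)
monitor.pulse("evm_a", now=25)  # evm_b stays silent
```

At time 40, `monitor.missing(40)` reports only evm_b. In the model described above, this is the point at which the debug script would be run against the thread that created that event manager.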