Tuesday 11 October 2011

SCOM 2007 R2 - the discovery wizard keeps spinning

The first time I built the R2 version, I found that the discovery wizard sometimes just keeps spinning forever, showing a message about the SQL Broker service at the bottom of the window. This happens only when the filter for servers/clients is turned on, which also turns on the option “Verify discovered computers can be contacted”.

As this was the first time I had installed SCOM 2007 R2 on Windows 2008 R2 using SQL 2008 R2, I rebuilt the servers several times, thinking I had missed something. Since that didn’t help, I had to leave it and get on with deploying agents on servers and importing and tuning MPs.

Searching on Google did not help either, so I decided to take a closer look at what SCOM is actually doing while discovering Windows devices.

Here is what I found:

When the discovery wizard is not running, if you run netstat -a on the SCOM server, you will see a series of connections between the RMS/MS and the agents and database server(s). All these connections will be in one of three states: LISTENING, ESTABLISHED or TIME_WAIT.

If you run the wizard, leave the filter “Computer and Device Classes” on its default setting “Servers and Clients”, leave the option “Verify discovered computers can be contacted” off (as it is by default), and repeatedly execute netstat -a (or netstat -a -t 1 > c:\netstat_log.txt), you will see that there are no new connections.
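To compare a baseline capture with one taken while the wizard runs, it helps to count the connections per state instead of reading the raw output. Here is a minimal Python sketch of how that could be done. It is not anything SCOM provides, and the function name count_tcp_states is my own; it assumes the usual Windows netstat column layout (Proto, Local Address, Foreign Address, State).

```python
from collections import Counter


def count_tcp_states(netstat_output: str) -> Counter:
    """Tally TCP connection states found in `netstat -a` text output.

    Assumes the Windows column order: Proto, Local Address,
    Foreign Address, State. Header and UDP rows are skipped because
    they do not start with "TCP" or have no state column.
    """
    states = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[0].upper() == "TCP":
            states[fields[3]] += 1
    return states
```

You could feed it the contents of the log file, for example count_tcp_states(open(r"c:\netstat_log.txt").read()). With the wizard idle you would expect only LISTENING, ESTABLISHED and TIME_WAIT to show up in the result.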

If you run the wizard and filter devices by setting either servers only or clients only, the option “Verify discovered computers can be contacted” will be on and greyed out, so it cannot be turned off. Now, if you run netstat, you will see that SCOM is trying to establish a connection with every Windows device registered in Active Directory, both servers and PCs. These devices are listed by their NetBIOS names, but you may also see printers, which are listed by their IPs. Both printers and unavailable Windows devices will have the connection state set to SYN_SENT.

If the discovery process is working, netstat rolls through these devices very quickly and keeps going even after the wizard pops up the list. It pauses for a second or two on every printer, and then it finally completes. Once the wizard completes, the netstat list will show the same connections as before the wizard was launched, plus ESTABLISHED connections to the newly discovered devices and to a few others, which is a bit weird. (In my case, these extra machines were Windows 7 desktops and laptops, even though the filter was set to servers only.)

If the discovery process fails, that is, it keeps spinning, the netstat list will at first show many devices being verified, both successfully and unsuccessfully. After a while, the devices that were verified are no longer listed. What remains are the standard connections to the agents and the database server(s), plus several unavailable devices (always Windows devices, in my case client machines) with the state set to SYN_SENT. If netstat keeps running while the wizard is still spinning, the same set of unavailable devices is listed over and over, as if they are waiting to be processed.
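Picking out the devices that stay in SYN_SENT by eye gets tedious in a long log, so here is a rough Python sketch of how they could be pulled from a capture made with the interval form of netstat. It assumes each refresh in the log starts with an "Active Connections" header line, as Windows netstat prints it, and the function names and the window parameter are made up for this illustration.

```python
def split_snapshots(log_text: str) -> list:
    """Split an interval netstat log into one list of lines per refresh."""
    snapshots = []
    current = None  # None until the first "Active Connections" header
    for line in log_text.splitlines():
        if line.strip().startswith("Active Connections"):
            if current is not None:
                snapshots.append(current)
            current = []
        elif current is not None:
            current.append(line)
    if current is not None:
        snapshots.append(current)
    return snapshots


def syn_sent_hosts(lines) -> set:
    """Return the remote hosts (port stripped) that are in SYN_SENT."""
    hosts = set()
    for line in lines:
        fields = line.split()
        if len(fields) >= 4 and fields[0].upper() == "TCP" and fields[3] == "SYN_SENT":
            hosts.add(fields[2].rsplit(":", 1)[0])
    return hosts


def persistent_syn_sent(log_text: str, window: int = 5) -> set:
    """Hosts that are in SYN_SENT in every one of the last `window` refreshes."""
    recent = split_snapshots(log_text)[-window:]
    if not recent:
        return set()
    return set.intersection(*(syn_sent_hosts(s) for s in recent))
```

Hosts that come back from persistent_syn_sent are the ones that keep reappearing while the wizard spins. A larger window makes the check stricter, since a host has to be stuck for more consecutive refreshes before it is reported.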

This suggests that SCOM might have trouble handling unavailable (offline) Windows devices, which somehow prevents it from completing the discovery process. Since these devices seem to be either desktops or laptops, and since most people use SCOM for monitoring servers, I guess it would be nice if Microsoft made the option “Verify discovered computers can be contacted” available even when the filter for servers/clients is on, so we could turn it off if we wanted to.

Discovery Wizard keeps spinning (SCOM 2007 R2 CU5)
