Monday 25 March 2013

SCOM 2007 R2: Notifying about SQL jobs that failed to run


A SQL job may fail to run (start) if there is a problem with the job owner’s account (e.g. it was deleted), a database connectivity problem (e.g. the database was removed), or a missing login or insufficient rights for the SQL Agent account. By default, SCOM 2007 R2 with the SQL Server MP does not alert or notify on SQL jobs that failed to run. The Database Backup Failed To Complete rule alerts on failed backup jobs (i.e. a job that started but then failed for some reason), but it does not alert on jobs that failed to run.

In "How to monitor SQL Agent jobs using the SQL Management Pack and OpsMgr", Kevin Holman explained how to discover all SQL jobs and enable the "An SQL job failed to complete successfully" rule in order to alert on all kinds of issues, but I haven’t tried it and I’m not sure whether this rule actually alerts on jobs that failed to run.

Failed-to-run jobs are logged in the Windows Application event log as Warning events with Event ID 208.

SQL 2005 on Windows 2003:
Log Name: Application
Event Type: Warning
Event Source: SQLAgent$SQL2005
Event ID: 208

SQL 2008 R2 on Windows 2008 R2:
Log Name: Application
Source: SQLSERVERAGENT
Event ID: 208
Level: Warning
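
The criteria above can be expressed as a small predicate. The following is a minimal Python sketch for illustration only, not part of SCOM or the SQL Server MP. The dictionary keys (`log`, `source`, `event_id`, `level`) and the function name are hypothetical, and the source check assumes only the two name shapes shown above (`SQLAgent$<instance>` and `SQLSERVERAGENT`).

```python
def is_failed_to_run_event(event):
    """Return True if an event record looks like a SQL job that failed to run.

    `event` is a plain dict with hypothetical keys: log, source, event_id, level.
    """
    source = event.get("source", "")
    # Assumption: only the two source shapes quoted in the post are handled.
    from_sql_agent = source == "SQLSERVERAGENT" or source.startswith("SQLAgent$")
    return (
        event.get("log") == "Application"
        and event.get("event_id") == 208
        and event.get("level") == "Warning"
        and from_sql_agent
    )


# Example record modelled on the SQL 2005 event shown above.
sample = {
    "log": "Application",
    "source": "SQLAgent$SQL2005",
    "event_id": 208,
    "level": "Warning",
}
print(is_failed_to_run_event(sample))
```

A real monitor would do this filtering inside SCOM; the sketch only makes the combination of conditions explicit.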

So I created two Simple Event Detection monitors with Timer Reset, one for SQL 2005 and one for SQL 2008, enabled alerting, and configured notifications to log these issues as incidents in SCSM.

For the monitoring target, make sure to select SQL 2005 DB Engine and SQL 2008 DB Engine respectively.


Event Source ==> Matches wildcard: $Target/Property[Type="MicrosoftSQLServerLibrary614000!Microsoft.SQLServer.DBEngine"]/AgentName$
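
The `$Target/.../AgentName$` reference is resolved per DB Engine instance to that instance's agent name, so each monitor instance reacts only to events from its own SQL Agent. The sketch below mimics that idea in Python with `fnmatch`. It is an illustration only, not how SCOM evaluates the expression; the `engines` data and the helper name are made up for the example, and case-sensitive matching is an assumption.

```python
from fnmatch import fnmatchcase


def matching_engines(event_source, engines):
    """Return names of engines whose agent name pattern matches the event source.

    `engines` maps a hypothetical DB Engine name to its AgentName value,
    standing in for the $Target/.../AgentName$ substitution.
    """
    return [
        name
        for name, agent_name in engines.items()
        if fnmatchcase(event_source, agent_name)
    ]


# Made-up example data based on the two event sources quoted earlier.
engines = {
    "SQL2005": "SQLAgent$SQL2005",
    "MSSQLSERVER": "SQLSERVERAGENT",
}

print(matching_engines("SQLAgent$SQL2005", engines))
print(matching_engines("SQLSERVERAGENT", engines))
```

With this data, each event source maps to exactly one engine, which is the behaviour the per-target substitution is meant to give.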
