Sometimes a server stops reporting performance data to SCOM, even though we still get alerted when one of its monitors goes critical.
When do you notice this problem? Right, when you want to dive into performance data to troubleshoot something. Then you see that the server hasn’t sent performance data for the last couple of hours, days, weeks…
To solve (or rather work around) this issue, I created a SQL query that points me to servers that haven’t sent performance data in the last 2 hours (excluding servers that are in maintenance mode). It uses the Memory \ Available MBytes counter as the indicator:
WITH _CTE AS (
    SELECT bme.[Path],
           pcv.ObjectName,
           pcv.CounterName,
           pcv.InstanceName,
           pdv.SampleValue,
           pdv.TimeSampled,
           CONVERT(varchar(max), pdv.TimeSampled, 103) + ' ' + CONVERT(varchar(max), pdv.TimeSampled, 108) AS [Last Sample],
           ROW_NUMBER() OVER (PARTITION BY bme.[Path] ORDER BY pdv.TimeSampled DESC) AS RowNumber
    FROM PerformanceDataAllView pdv WITH (NOLOCK)
    INNER JOIN PerformanceCounterView pcv WITH (NOLOCK)
        ON pdv.PerformanceSourceInternalId = pcv.PerformanceSourceInternalId
    INNER JOIN BaseManagedEntity bme WITH (NOLOCK)
        ON pcv.ManagedEntityId = bme.BaseManagedEntityId
    WHERE pcv.ObjectName = 'Memory'
      AND pcv.CounterName = 'Available MBytes'
      AND bme.BaseManagedEntityId NOT IN (
          SELECT M.BaseManagedEntityId
          FROM MaintenanceMode AS M WITH (NOLOCK)
          WHERE M.IsInMaintenanceMode = 1
      )
)
SELECT *
FROM _CTE
WHERE RowNumber = 1
  AND TimeSampled < DATEADD(HOUR, -2, GETUTCDATE())
ORDER BY TimeSampled DESC
I’ve added this query to the SquaredUp dashboard I check every morning.
But I also want to get alerted, so I created a PowerShell monitor that runs this SQL query. As soon as a server stops sending performance data, I get notified and can fix it manually:
- Stop the Microsoft Monitoring Agent service
- Clean up the Health Service State folder
- Start the Microsoft Monitoring Agent service
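The three steps above could be scripted roughly like this. It is only a sketch, not the exact procedure I run: it assumes the agent is installed in its default location under Program Files and that the service is registered as HealthService. The function name Reset-AgentHealthState is made up for this example. Run it in an elevated session on the affected server.

```powershell
# Sketch of the manual fix. Assumptions: default install path and the
# HealthService service name. Adjust both if your agents differ.
function Reset-AgentHealthState {
    [CmdletBinding(SupportsShouldProcess = $true)]
    param(
        # Assumed default location of the agent's state folder.
        [string]$StateFolder = (Join-Path $env:ProgramFiles 'Microsoft Monitoring Agent\Agent\Health Service State'),
        # Assumed service name of the Microsoft Monitoring Agent.
        [string]$ServiceName = 'HealthService'
    )

    # 1. Stop the Microsoft Monitoring Agent service.
    Stop-Service -Name $ServiceName -Force

    # 2. Remove the Health Service State folder if it exists.
    if (Test-Path -LiteralPath $StateFolder) {
        Remove-Item -LiteralPath $StateFolder -Recurse -Force
    }

    # 3. Start the service again.
    Start-Service -Name $ServiceName
}
```

Calling Reset-AgentHealthState -WhatIf first shows what would be stopped, deleted and started without changing anything.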
Hope this helps,