vCenter Hardware Status Stops Polling After 1 Hour

by Bob Plankers on January 18, 2012 · 5 comments

in Virtualization

(Update, 1/19/2012, 1130 CST: The product manager for this feature, commenting below, has indicated that this is actually a bug. I’ve emailed her the details of my case so she can help track down where the information I was given came from and fix my problem, too.)

——————

For what seems like an eternity I’ve had a support case open with VMware because the hardware status functionality in vCenter (4.1 and 5) stops updating. I was told today by my support guy that, for a variety of reasons that cannot be known by me, VMware has decided that the hardware status polling should stop after 1 hour. So my bug isn’t a bug, it’s a feature, case closed.

I am documenting this here, for you all to see, for two reasons. The first is that many people, like me, are probably using the hardware status to actually monitor hardware. Sorry folks, if you’re doing that you should probably go refresh those views; chances are you have an alarm you didn’t know about. That’s actually how I discovered this: I noticed that I had a failed drive on a host, but vCenter didn’t report it.

Second, though, is that terminating polling of hardware status after an hour is one of the dumbest things I’ve ever heard. For me, one of the big selling points of VMware vCenter was that I could do a lot of monitoring using the built-in functionality, and not install the kludgy, expensive, terrible hardware monitoring software from Dell/HP/IBM/etc. I also could set up alarms to take action when there was a hardware fault, like put a host in maintenance mode, etc. Now I can’t, because unless it’s polling it won’t notice a problem. vCenter just got a whole lot less useful, even though it pretends that it can do all that, including shipping with a bunch of hardware alarms.

Imagine if the local weather guy for me, here in Madison, WI, got on the news this morning and told me it was 85 degrees outside (F, that’s 29 C), because he last checked the temperature in August and then stopped polling the thermometer. That’s essentially what this is. It’s 7 degrees outside today (-14 C), and while I’m not dumb enough to think it’s really 85 degrees I am dumb enough to assume that a hardware monitoring solution would operate like all others on Earth. Namely, that it would keep polling because things change, and just because something didn’t fail inside an hour of vCenter’s startup doesn’t mean it hasn’t failed since.
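To make the weather analogy concrete, here is a minimal sketch of a staleness guard: a check that refuses to trust a reading once the last successful poll is too old. It assumes your monitoring tool can give you a last-updated timestamp per host. The function names, the host names, and the one-hour threshold are all hypothetical, chosen for illustration; none of this is vCenter's API.

```python
from datetime import datetime, timedelta

# Hypothetical threshold: data older than this is treated as untrustworthy.
MAX_AGE = timedelta(hours=1)


def is_stale(last_updated, now, max_age=MAX_AGE):
    """Return True when the last successful poll is older than max_age."""
    return now - last_updated > max_age


def stale_hosts(last_polls, now, max_age=MAX_AGE):
    """Return the sorted names of hosts whose hardware data is stale.

    last_polls maps a host name to the datetime of its last successful poll.
    """
    return sorted(
        host for host, ts in last_polls.items() if is_stale(ts, now, max_age)
    )


if __name__ == "__main__":
    now = datetime.now()
    # Made-up hosts and poll times, purely for demonstration.
    polls = {
        "esx01": now - timedelta(minutes=5),
        "esx02": now - timedelta(hours=3),
    }
    for host in stale_hosts(polls, now):
        print(f"WARNING: hardware status for {host} has not been refreshed")
```

The point of a guard like this is that "no alarm" and "no data" become two different states, so a stalled poller shows up as a warning rather than as a clean bill of health.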

So thanks a ton, VMware. I, and probably a lot of others, don’t appreciate the silent loss of functionality here, endangering our environments because you made a ridiculous decision.

I could say more, but I have to go build a hardware monitoring solution now.

{ 5 comments }

Dan January 18, 2012 at 5:07 PM

12th-generation Dell hardware will have agent-free hardware monitoring built into the motherboard. We’ll see how good it is, how easy it is to set up, and how much it costs…

Bob Plankers January 18, 2012 at 5:42 PM

I’m now using agent-free querying via Nagios & ipmitool on Linux, will write it up tomorrow for those that are interested. Works fine for all generations of Dells, actually.

Donna Reineck January 19, 2012 at 10:22 AM

Hello Bob,

Feel free to reach out to me so we can discuss the bug you are experiencing. I am the Product Manager for vCenter Hardware Status monitoring at VMware, and believe me, you’ve encountered a bug, not a feature. The vCenter monitoring service polls ESX hosts at a regular interval; it does not stop after an hour.

If you’ve had an SR on the issue, please share that with me so I can chase this down internally.

Jordan Lederman January 19, 2012 at 1:53 PM

I’ve seen this problem too, and per my SE’s request, I’ve opened a ticket. VMware SR# 12135721201.

Thanks to Bob for making this known.

LiPeng December 11, 2012 at 1:30 AM

Hardware status not OK: host data has not been updated, or the sensors are restarting.

Comments on this entry are closed.
