(Update, 1/19/2012, 1130 CST: The product manager for this feature, commenting below, has indicated this is actually a bug. I’ve emailed her the details of my case so she can help track down where the information I was given came from, and fix my problem, too.)
For what seems like an eternity I’ve had a support case open with VMware because the hardware status functionality in vCenter (4.1 and 5) stops updating. I was told today by my support guy that, for a variety of reasons that I can’t be told, VMware has decided that the hardware status polling should stop after one hour. So my bug isn’t a bug, it’s a feature, case closed.
I am documenting this here, for you all to see, for two reasons. The first is that many people, like me, are probably using the hardware status to actually monitor hardware. Sorry folks, if you’re doing that you should probably go refresh those views, because chances are you have an alarm you didn’t know about. That’s actually how I discovered this: I noticed that I had a failed drive on a host but vCenter didn’t report it.
Second, though, is that terminating polling of hardware status after an hour is one of the dumbest things I’ve ever heard. For me, one of the big selling points of VMware vCenter was that I could do a lot of monitoring using the built-in functionality, and not install the kludgy, expensive, terrible hardware monitoring software from Dell/HP/IBM/etc. I could also set up alarms to take action when there was a hardware fault, like putting a host in maintenance mode. Now I can’t, because unless it’s polling it won’t notice a problem. vCenter just got a whole lot less useful, even though it still pretends it can do all that, right down to shipping with a bunch of hardware alarms.
Imagine if the local weather guy for me, here in Madison, WI, got on the news this morning and told me it was 85 degrees outside (F, that’s 29 C), because he last checked the temperature in August and then stopped polling the thermometer. That’s essentially what this is. It’s 7 degrees outside today (-14 C), and while I’m not dumb enough to think it’s really 85 degrees I am dumb enough to assume that a hardware monitoring solution would operate like all others on Earth. Namely, that it would keep polling because things change, and just because something didn’t fail inside an hour of vCenter’s startup doesn’t mean it hasn’t failed since.
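If you want a sanity check in the meantime, here is a minimal sketch in Python of the idea: given a "last updated" timestamp per host, flag every host whose hardware status is older than some threshold. Everything in it is an assumption on my part. The host names and timestamps are made up, the one-hour threshold is just borrowed from the cutoff I was quoted, and the function name `find_stale_hosts` is hypothetical. It does not call any VMware API, so getting the real timestamps out of vCenter is left to you.

```python
from datetime import datetime, timedelta


def find_stale_hosts(last_updated, now, max_age=timedelta(hours=1)):
    """Return the names of hosts whose last status update is older than max_age.

    last_updated maps a host name to the datetime of its last hardware
    status refresh. The one-hour default is an assumption, not a VMware setting.
    """
    return sorted(
        host for host, updated_at in last_updated.items()
        if now - updated_at > max_age
    )


# Hypothetical data: two hosts, one refreshed recently and one not.
now = datetime(2012, 1, 19, 11, 30)
last_updated = {
    "esx01": datetime(2012, 1, 19, 11, 10),  # refreshed 20 minutes ago
    "esx02": datetime(2012, 1, 17, 9, 0),    # refreshed two days ago
}

stale = find_stale_hosts(last_updated, now)
print("Hosts with stale hardware status:", stale)
```

Anything this flags is a host whose "healthy" status you shouldn’t trust until you refresh it.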
So thanks a ton, VMware. I, and probably a lot of others, don’t appreciate the silent loss of functionality here, endangering our environments because you made a ridiculous decision.
I could say more, but I have to go build a hardware monitoring solution now.