VMware Engineering, Are You Fixing Anything?

I was just reading Josh Andrews’ account of a serious bug still present in the latest releases of VMware vCenter 5.0 (5.0b):

This bug has been known for a while and while U1 contained some mentions about fixing it – we now have U1b and the problem still exists…. Make sure you have a cluster with HA and/or DRS turned on…. Enable VM Storage Profiles… Now return to your cluster summary to verify HA and DRS have been turned off and all settings have been lost.

This is epic bad stuff here, because if there’s one good way to mess a lot of things up it’s to disable DRS. Especially if you have a vCloud Director setup, since if you shut DRS off all your resource pools disappear. If you are like me and have a ton of rules in place to keep VMs separated or on particular hosts it would be disastrous to have those deleted, too.

This resonates with me, because I’ve had several issues open with VMware for ages with no resolution, to the point where VMware Support is closing the cases because there will never be a fix. My biggest one lately: if you try to take a VSS-quiesced snapshot of a Windows 2008 guest, the guest reports NTFS corruption and the snapshot operation fails. The file system doesn’t seem to actually be corrupted, since chkdsk comes back clean, but my Windows admins are unwilling to just ignore the error. I don’t blame them. I haven’t been able to take a quiesced snapshot in 10 months, and since VMware products like Data Recovery don’t have the ability to NOT take a quiesced snapshot of an OS they think they can quiesce, they’re dead to me, too. Thankfully the guys at Veeam thoughtfully offer the option to not quiesce, using old-style crash-consistent snaps instead.

It really makes me wonder how priorities get assigned for fixing problems within VMware Engineering. I have the best level of support you can get from VMware, “Business Critical,” which means absolutely nothing when it comes to reporting an actual, verifiable, reproducible, serious bug. For several months now I’ve been told every two weeks that there’s been no movement on this or the other problems I’ve reported. I’m guessing the same is true of Josh’s bug, too. Meanwhile we sit, with no ability to do fundamental operations on VMs, and no DRS & HA settings anymore.

I think I’m about done reporting problems, since nobody is listening. Maybe I’ll just cancel my support contract and blog about the bugs I find, since the last time I did that I had a project manager actually get back to me.

Or, maybe I won’t. After all, Hyper-V takes a VSS-quiesced snapshot of Windows 2008 just fine.

Comments on this entry are closed.

  • Hi,

    Very good point. I also have multiple cases where engineering doesn’t seem to be doing anything:

    1) Same VSS issues as you. I opened multiple cases, and they told me that it is a Microsoft problem and I should call them.
    2) Problems with promiscuous mode on a vDS, which messes the vDS up.
    3) Sorting doesn’t work in Storage Views (I told them that the day after 5 was released).

    Let’s hope you can make a difference.

    Regards

    Hans

    • Interesting — I was just asked to look at a situation where promiscuous mode on a vDS gets reset. Do you have an SR # I can have my guys look up?

      Storage Views & Storage Profiles need a lot of work, for sure.

  • Hi, could I please get some steps to reproduce that problem as I haven’t seen it and I have storage profiles and HA/DRS across multiple clusters. I’m on the latest patches of ESXi and also vCenter 5.0 U1a.

    Hans, I believe some of the problems you report are fixed in U1a, although some promiscuous mode things are not yet fixed. I have no problem with my storage views. I also haven’t had the VSS issues on 2k8 VMs with vDR causing any problems. There are some event log warnings, but everything still works and I’ve tested it.

    Hopefully you get a resolution on some of your issues.

  • Haven’t been able to reproduce the issue either, to be honest. I also have this version running in my lab with Storage Profiles and HA/DRS. (See KB: http://kb.vmware.com/2008203)

  • And as per the comments on Josh Andrew’s site, both the vCenter server AND the vCenter client must be upgraded to get the complete fix. If the issue persists, then a new Service Request needs to be filed as VMware fully recognizes the severity of this issue.
    Oh, and as I highlighted to Andrew, 5.1 is not yet released, so could you please change the vCenter reference to 5.0? Thanks

    • 5.1 is now 5.0, good call. Seems like I have 5.1 on the brain.

  • Seems like Cormac indicates the HA/DRS issue is fixed if both the client and server are upgraded. Hopefully that fixes Josh up. Thanks guys.

    As for the other stuff, I’m still not sure how priorities get set but I’m hoping my complaints here will kick something loose. 10 months of waiting for anything is too long.

  • With all due respect, the blog you quoted regarding a VMware vCenter 5.0 (5.0b) bug is incorrect. The fix we made is in the Client itself and not VC Server. There’s no indication in that post that the author upgraded his Client, as per KB article:
    HA and DRS appear disabled when a Storage Profile is enabled or disabled on a cluster (2008203)