Why Use SD Cards For VMware ESXi?

I’ve had four interactions now regarding my post on replacing a failed SD card in one of my servers. They’ve ranged from inquisitive:

to downright rude:

“SD cards are NOT reliable and you are putting youre [sic^2] infrastructure at risk. Id [sic] think a person like you would know to use autodeploy.”

Aside from that fellow’s malfunctioning apostrophe, he has a good, if blunt, point. SD cards aren’t all that reliable, and there are other technologies to get a hypervisor like ESXi on a host. So why use SD cards?

1. Cost. Looking at dell.com, if I outfit a Dell PowerEdge R630 with a traditional setup of two magnetic disks and a decent RAID controller, my costs are:

300 GB 10K SAS 2.5″ disk: $212.75
300 GB 10K SAS 2.5″ disk: $212.75
PERC H730: $213.47
Keep My Hard Drive, 5 years: $213.46
Power for this setup, at 5.9 Watts per drive (as per Tom’s Hardware), guesstimating 5.9 Watts for the RAID controller, and $0.14133 per kWh in my locale: $109.60 for 5 years.
Labor costs dealing with drive replacements, monitoring, etc.: $200.00 (this is low).

This comes to $1162.03 per server. On a 32 node cluster that’s $37,184.96, or the cost of three servers, over five years.

In contrast, the Dell Internal Dual SD Module is $104.60 per server with two 16 GB SD cards. That’s $3347.20 for a 32 node cluster.
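If you want to sanity-check that math or plug in your own numbers, the whole comparison boils down to a few lines of arithmetic. A rough PowerShell sketch using the figures above (your drive prices, labor estimate, and power rate will obviously differ):

  # Traditional boot setup: two disks, RAID controller, drive retention, power, labor
  $disks       = 2 * 212.75                 # two 300 GB 10K SAS drives
  $raid        = 213.47                     # PERC H730
  $keepDrives  = 213.46                     # Keep My Hard Drive, 5 years
  $labor       = 200.00                     # drive replacements, monitoring, etc.

  $watts       = (2 * 5.9) + 5.9            # two drives plus a guesstimated 5.9 W for the controller
  $kwhRate     = 0.14133                    # local cost per kWh
  $hours       = 24 * 365 * 5               # five years of runtime
  $power       = ($watts / 1000) * $hours * $kwhRate   # within a few cents of the $109.60 quoted above

  $perServer   = $disks + $raid + $keepDrives + $labor + $power
  $sdPerServer = 104.60                     # Dell Internal Dual SD Module with two 16 GB cards

  "Traditional: {0:N2} per server, {1:N2} for a 32 node cluster" -f $perServer, ($perServer * 32)
  "Dual SD:     {0:N2} per server, {1:N2} for a 32 node cluster" -f $sdPerServer, ($sdPerServer * 32)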

To head off the inevitable comment: the PERC H310/H330 is not a decent RAID controller. To start, it isn’t even certified for VMware VSAN. Anybody who argues that the H330 is fine ought to be okay with the mirroring the Internal Dual SD Module does, because the two are about equal in that regard.

2. Use drive bays more productively. Say I do want to put local disks in my servers, be it some SSDs for caching (a la SanDisk FlashSoft, PernixData, vFRC, etc.) or maybe VSAN. Without SD cards I’d have to use two of my limited drive bays for boot volumes, and that isn’t the most productive use of my expensive drive bays (and data center space).

3. Avoid dependency loops. Auto Deploy is an interesting VMware feature, but it relies on a functioning network, DHCP, PXE, TFTP, DNS, and vCenter infrastructure to work. And that’s a problem when you’re in the middle of an outage (planned or unplanned) and any of that infrastructure is a VM.

If your vCenter is a VM, how do you start everything up after an outage? Oh, you run a management cluster that doesn’t use Auto Deploy… well, that’s a pain in the ass, because you now have a vSphere cluster that’s different. Different means harder to manage, which means human error and additional operational cost. What’s the ongoing cost of that, vs. $104.60 per server?

If your DHCP/PXE/TFTP server is a VM how do you start everything up after an outage? Oh, you use Auto Deploy with local caching? Okay, where does it cache? Traditional magnetic media… ah, you’ve chosen expensive AND complicated!

My network guys use a bunch of VMs to provide network functionality like DHCP, some DNS (half our anycast DNS infrastructure is on VMs, half on physical hosts), carrier-grade NAT, log collection, etc. It’s great – vSphere dramatically reduces risk and improves uptime for their services. We always have to make sure that we keep track of dependencies, though, which is easy to do when there are so few.

4. Avoid unreliable vSphere features. While we’re on the topic of Auto Deploy I’d just like to say that it isn’t production-ready. First, it’s not at all configurable from the vCenter GUIs. It’s all done from PowerShell, which is hard for many IT shops. People just aren’t as good at scripting and CLIs as they should be. Second, it relies on the perpetually crappy Host Profiles. I don’t think I’ve ever seen a cluster that isn’t complaining about host profile compliance. And when you look at what’s out of compliance you see it’s some parameter that gets automatically changed by vCenter. Or the local RAID controller pathing, or a CD-ROM drive, or something that should just be automatically handled for you. And Lord help you if you want to use mutual CHAP with host profiles.

“I seem to have forgotten all the different iSCSI passwords you entered to be compliant with the Hardening Guide, Bob” – Host Profiles, on every edit.
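To give a flavor of what "it's all done from PowerShell" means in practice, standing up an Auto Deploy rule looks roughly like the PowerCLI sketch below. The depot path, image profile name, and pattern are made-up placeholders, not a recommendation:

  # Rough PowerCLI sketch of an Auto Deploy rule (placeholder names throughout)
  Connect-VIServer -Server vcenter.example.com

  # Point Image Builder at an ESXi offline bundle and pick an image profile
  Add-EsxSoftwareDepot C:\depot\ESXi-offline-bundle.zip
  $img = Get-EsxImageProfile -Name "ESXi-standard-placeholder"

  # Tie that image to every host matching the pattern
  $rule = New-DeployRule -Name "DellHosts" -Item $img -Pattern "vendor=Dell Inc."

  # Activate the rule so matching hosts pick it up on their next PXE boot
  Add-DeployRule -DeployRule $rule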

Auto Deploy also scares me a little when it comes to my pre-existing datastores. I don’t build ESXi hosts with fibre channel storage connected, lest something go wrong and do something bad to a few hundred TB of storage. Yet every time an ESXi host boots from Auto Deploy it’ll look to do some partitioning of local storage. It isn’t supposed to interact with “remote” storage, but I don’t trust VMware’s QA very much, especially with underused features like this. Not worth the risk.

Auto Deploy & host profiles are interesting but until VMware puts more effort into both I cannot endorse putting either in your environment’s critical support path, if only because the alternatives are so cheap & reliable.

5. Boot From SAN is complicated, too. The three major alternatives to SD cards are traditional magnetic media, Auto Deploy, and boot from SAN. Boot from SAN is another one of those ideas that seems great on paper but doesn’t really pan out well in real life. First, look at your disk array vendor’s upgrade notes and pay attention to all the caveats that apply when you’re booting from SAN versus booting locally. A number of vendors don’t even support array software updates while hosts are booted from the SAN; everything has to come down first, and that’s lame.

Second, you’ve got dependency issues again. If you’re booting locally you need power and cooling, and you can figure everything else out later. If you’re booting off the SAN you need working power, cooling, networking/SAN, etc. just to start. You’re also exposed to a lot more human error: someone screws up a SAN zoning change and your whole infrastructure is offline, versus just some VMs. Good luck with the finger pointing that ensues.

Last, the pre-boot environment on servers is hard to manage, inflexible, and isn’t very helpful for troubleshooting. To make changes you need to be on the console, as very little of it is manageable through typical system management tools. Configuring this sort of boot often means the horrible BIOS interfaces on the CNAs or NICs you have installed, or archaic DOS utilities you have to figure out how to cram onto an ISO or bootable USB drive. That isn’t worth anybody’s time.

6. It just freakin’ works. When it comes right down to it none of these approaches to booting an ESXi host have any ROI. None. Zip. Zero. So every minute you spend messing around with them is a minute of your life you wasted.

The solution is super cheap, from both CapEx and OpEx perspectives. It doesn’t take long to get to $104.60 of labor, especially when Dell will also pre-load ESXi on your new SD cards, thereby saving you even more time.

Once booted, ESXi only writes configuration data back to the card(s) every 10 minutes, so despite the limited write cycles of flash a decent SD card will last the lifetime of the server. And if it doesn’t, the mirroring is reliable enough to let you limp along. Replacement and remirroring are easy: just say Yes at the prompt.

Last, they save you from a ton of extra complexity, complexity with no ROI. I don’t know about you but I’d rather be working on real problems than spinning my wheels managing an overcomplicated environment.

SD cards — Just Enough Storage.

30 thoughts on “Why Use SD Cards For VMware ESXi?”

  1. Good post. Surprised anyone criticizes SD cards anymore. They seem to be highly recommended for VSAN. We use them in hundreds of blades, saving us from buying spinning disk for all of them.

  2. Great post. I think anyone that is that die-hard about Auto Deploy spent too much time trying to make it work and is attempting to preserve dignity. I’ve been pushing SD cards in all ESXi hosts for years, for every reason you mentioned. To date, I’ve had fewer SD cards fail (none) vs. spinning media (dozens) over a 4 year period (they are a whole lot cheaper and more reliable to keep on the shelf, too).

    • I was shocked to have one die. 🙂 That’s partly why I wrote the original post: nobody had written anything about them out here. Probably because the approach is reliable.

  3. Good article for SMB clients.
    How come you didn’t mention the ESXi patching aspect? Isn’t that a critical process for any production ESXi host? Patching ESXi is unpredictable, as it might just crash the host, but the worst part is that VMware won’t provide you any official support afterwards. VMware states that there is NO HCL for SD cards (a show stopper for any enterprise) and that the only supported deployment is embedded ESXi installed by your hardware vendor.
    There are many reasons why SD isn’t and won’t be taken into consideration for large-scale deployments. If you know of any private cloud RAs, please share.
    Thanks for the article, as it brought up many of the benefits of SD for SMBs.

    • I didn’t mention patching because patching works just fine with SD cards, and I have had no issues with patching hosts running on SD cards configured in this manner. The only trouble I’ve ever had was with some sketchy USB sticks in the lab, and that was a long time ago. It’s smooth sailing.

      If you’re using the Dell Internal Dual SD Module (or the equivalent from HP, etc.) and ordered the server with ESXi preinstalled you meet all three support guidelines. If you installed ESXi yourself you meet two, and are still eligible for support:
      http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1010574

      Given that KB article the lack of an HCL is irrelevant. VMware supports these configurations, for us and others, in enterprise settings. If you’re not taking the SD options from Dell, HP, and others into consideration for all deployments of all sizes you’re overlooking a large cost & complexity savings, especially at scale. Your choice, I guess, but I can find better ways to spend that money.

    • Let’s talk about patching. Using non-SD card systems means I need to patch the RAID controller firmware (a far dicier thing than just patching an ESXi host). Technically I have drive firmware that needs patching, too. Local drives increase the patch/update footprint. VMware Update Manager works just fine with SD cards (I had some hosts go from 3.5u4 to 5.1 over a couple of years without any issues).

  4. I’ve had a very high failure rate of SDHC cards in vSphere clusters on HP ProLiant equipment (15 cards across 300 servers, using HP/Sandisk media). Enough so that I’ve moved away from using them and back to simple RAID. I’m also concerned when I see people use SD or USB for single server deployments. I don’t think VMware communicates the risk of this configuration very well.

  5. I do something similar with my file server… it runs NAS4Free from a LiveCD and stores the config on a USB stick. Since the USB stick is only ever used during boot (a read operation) and config changes (rare), the longevity has proven to be quite decent thus far. I wonder whether something similar would be possible for ESX.

  6. I’ve been having this discussion with a couple of colleagues for some years now. I think it’s a bad idea because we only sell HP servers and they have no RAID solution for SD cards.

    That being said, a lot of people are not aware of the extra configuration an ESX host needs with SD cards. When you first boot up a host it actually warns you: ESX does not have persistent storage. So you need to define “scratch partitions” on your SAN, something that isn’t supported in an NFS environment. So you need FC or iSCSI… Anyway, I wrote a (crappy) blog about it with more information I found: http://www.cloudpressure.com/?p=16

      • Even one that provides either dated or incorrect information? NFS works fine here, and nothing in KB 1033696 suggests otherwise (for almost 2 years, according to the update history).
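        For what it’s worth, pointing scratch at shared storage is one advanced setting per host. A rough PowerCLI sketch, with made-up host and datastore paths:

        # Point the scratch location at a per-host directory on a shared datastore
        # (the host name and datastore path below are placeholders)
        $vmhost  = Get-VMHost -Name esx01.example.com
        $scratch = Get-AdvancedSetting -Entity $vmhost -Name "ScratchConfig.ConfiguredScratchLocation"
        Set-AdvancedSetting -AdvancedSetting $scratch -Value "/vmfs/volumes/shared-datastore/.locker-esx01" -Confirm:$false
        # The new scratch location takes effect after the host reboots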

  7. Well, I don’t see any issue with booting ESXi from an SD card. It loads the kernel into memory, and once it’s running in memory it won’t do that much with the SD card; it will only write some logs, etc.

    A sort of RAID system for SD cards sounds cool; however, I’ve never seen one (I must say that I only work with HP stuff).

    Auto Deploy could be a nice feature; however, I think you need to have a couple of servers installed the old-fashioned way (SD or HDD) and create a DRS rule to group the most critical servers on the ESXi hosts that don’t rely on Auto Deploy to start ESXi.

    However, why would you want to make things more complex and add $$$ when it’s super easy to deploy ESXi on an SD card? 😉
    And personally I haven’t seen that many problems with SD cards. USB sticks are a completely different story; I will never use them for production systems because I had too many issues with them 🙁

  8. I manage 8 ESXi hosts and they all use mirrored SD cards for the ESXi operating system. For logs, some of the older servers do have magnetic disks, but on the newer ones we redirect the logs to SAN storage so they are persistent.
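    For anyone curious, the log redirection is just one advanced setting per host. A rough PowerCLI sketch with placeholder host and datastore names (Syslog.global.logHost works similarly if you’d rather ship logs to a remote syslog server):

      # Send this host's logs to a directory on a shared datastore (placeholder names)
      $vmhost = Get-VMHost -Name esx02.example.com
      $logDir = Get-AdvancedSetting -Entity $vmhost -Name "Syslog.global.logDir"
      Set-AdvancedSetting -AdvancedSetting $logDir -Value "[SharedDatastore] logs/esx02" -Confirm:$false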

    Other than that, I see no reason for disks when you already have a redundant storage array in place. Our initial ESX implementation a few years back did have mirrored SATA disks, but we’ve spent lots of time and money over the years replacing disks as they went bad. I even had a failure on Christmas Eve that took out BOTH disks. Luckily we keep more hardware capacity than needed, so other hosts could take care of the load until the server was ultimately replaced with one that uses dual SD cards and no local disks.

  9. I have been running SD cards in HP servers for 5 years now. Never thought twice about it; it just works. Now we can put read SSDs in the open bays for acceleration. Nice post, Bob

  10. “The PERC H310/H330 is not a decent RAID controller. To start, it isn’t even certified for VMware VSAN.”

    Actually, it’s a good RAID controller; it just has one little pesky drawback that impacts VSAN: the maximum queue depth provided by the custom Dell firmware seems to be limited. The limitation is not a reliability issue.

    That said, I have used the Dell Dual SD module and mirrored SD cards to boot the server, AND a mirrored pair of drives in RAID 1 to provide the scratch partition and local VMFS storage on the same server. In some cases the SD cards can provide faster boot times, and power consumption can be lower.

    My main reservation with SD cards is the tendency for some of our folks to want to just go out to Best Buy or eBay and purchase random consumer-level SD cards of various brands to fit servers with, or to replace a failed card with.

    It is not that enterprise-level SD cards are necessarily unreliable (though they are not infallible either, and are harder to replace than a hot-swap disk drive); it’s that there are many _bad_ SD cards on the market from various brands, no-name brands, and counterfeits.

    Plenty of SD cards would work fine in cameras but are likely to fail early and cause headaches when placed inside servers.

  11. The IBM x3650 M4s we got last year had a 64 GB Enterprise Value SSD option priced at $199 that was great for reads but slow on writes – perfect for a boot drive. I bought two for each host, hooked them into the built-in ServeRAID (a zero-cost option), and put them in RAID 1. It doesn’t appear that this size SSD is available now, however, as the smallest option is a 120 GB for $379. On our last round of servers we booted ESXi from USB sticks, which was very reliable but very, very slow, and maintenance of those USB-booted hosts was painful. Now they boot really quickly! I’m curious if the boot times for ESXi on SD cards are “slow” or “fast” for those using SD?

    • That’s a good way to do it. I’d put boot times from SD on par with traditional spinning disk. Most of the overall time is spent doing device detection, loading drivers, etc. anyhow.

    • As a sidenote, all of those IBM “Enterprise Value” SSDs were withdrawn from marketing last June, after being available for not even 2 years. I’m not sure what the story is there, but it definitely doesn’t help when you’re trying to standardize on a platform (x3650 M4, for example) that is still being marketed.

      Hopefully the situation is different with other vendors..

  12. Right on! I love your points about Auto Deploy and Host Profiles. Furthermore, with the SD cards set up in RAID 1, any decent shop ought to be able to schedule a maintenance window to take care of a failed card.

    Many, many VMware products go directly from simple & elegant to complicated & unreliable. Products like Auto Deploy and NSX leverage an entirely separate management cluster, which completely defeats the purpose and intent of virtualization in the first place.

  13. Thank you for pointing out the dependency issues with Boot from SAN and Auto Deploy. These solutions, while they can be built, should not be used unless staff is already experienced with equivalent solutions for stand-alone servers. They often turn a single critical failure into a lengthy outage for VMware customers. “K.I.S.S.” is the best policy with VMware products.

    Also, if no local storage is available and all shared storage is NFS, persistent data for logs and vmkernel dumps requires Syslog and NetDump services. Fortunately, both are included in vCenter Server for free.

  14. Been using the Dell dual boot SD cards since our last server refresh 3 years ago, for all the reasons you brought up. Had no issues.

    Thanks for posting.

    Perhaps I’m bitter, but I wonder sometimes how much of the advice I see comes from people who don’t actually administer it for a living. You do. That’s one of many reasons why you rock Bob!

  15. I’ve been using SD cards in our HP blades for 6 years. I totally agree – easy, inexpensive solution that just works. It’s funny how much controversy this causes. 🙂

  16. After spending almost the entire day troubleshooting the mess that is host profiles, reading this article really cracked me up. After all these years I can’t believe that VMware hasn’t shown them any love yet. If you’re using iSCSI, it’s just impossible to get logging to datastores and iSCSI setup working in one single host profile.
