I’ve had four interactions now regarding my post on replacing a failed SD card in one of my servers. They’ve ranged from inquisitive:
@plankers why would you use an SD card in a server. I’m not a sys admin, but just curious.
— Allan Çelik (@Allan_Celik) January 22, 2015
to downright rude:
“SD cards are NOT reliable and you are putting youre [sic^2] infrastructure at risk. Id [sic] think a person like you would know to use autodeploy.”
Aside from that fellow’s malfunctioning apostrophe, he has a good, if blunt, point. SD cards aren’t all that reliable, and there are other technologies to get a hypervisor like ESXi on a host. So why use SD cards?
1. Cost. Looking at dell.com, if I outfit a Dell PowerEdge R630 with a traditional setup of two magnetic disks and a decent RAID controller my costs are:
300 GB 10K SAS 2.5″ disk: $212.75
300 GB 10K SAS 2.5″ disk: $212.75
PERC H730: $213.47
Keep My Hard Drive, 5 years: $213.46
Power for this setup, at 5.9 Watts per drive (per Tom’s Hardware), a guesstimated 5.9 Watts for the RAID controller, and $0.14133 per kWh in my locale: $109.60 over 5 years.
Labor costs dealing with drive replacements, monitoring, etc.: $200.00 (this is low).
This comes to $1162.03 per server. On a 32 node cluster that’s $37,184.96, or the cost of three servers, over five years.
In contrast, the Dell Internal Dual SD Module is $104.60 per server with two 16 GB SD cards. That’s $3347.20 for a 32 node cluster.
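If you want to check my math, here’s the arithmetic as a quick sketch. The prices are the Dell list prices quoted above; the power figure uses the quoted $109.60 (my raw watts-times-rate computation rounds a few cents differently):

```python
# 5-year boot-device cost comparison, using the figures quoted above.
NODES = 32

# Traditional setup: two SAS disks, PERC H730, drive-retention service
disks = 2 * 212.75
raid_controller = 213.47
keep_my_drive = 213.46
power = 109.60      # ~17.7 W continuous * ~43,800 h * $0.14133/kWh
labor = 200.00      # drive swaps, monitoring, etc. (a lowball)

per_server = disks + raid_controller + keep_my_drive + power + labor
sd_per_server = 104.60   # Internal Dual SD Module with two 16 GB cards

print(f"magnetic: ${per_server:.2f}/server, ${per_server * NODES:,.2f}/cluster")
print(f"SD cards: ${sd_per_server:.2f}/server, ${sd_per_server * NODES:,.2f}/cluster")
```

About a tenfold difference per cluster, before you even count the ongoing labor.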
To head off the inevitable comment: the PERC H310/H330 is not a decent RAID controller. To start, it isn’t even certified for VMware VSAN. Anybody who argues that the H330 is fine ought to be okay with the mirroring the Internal Dual SD Module does, because the two are about equal in that regard.
2. Use drive bays more productively. Say that I do want to put local disk in my servers, be it some SSD so I can do caching (a la SanDisk Flashsoft, PernixData, vFRC, etc.) or maybe do VSAN. I’d then have to use two of my limited drive bays for boot volumes, which isn’t the most productive use of my expensive drive bays (and data center space).
3. Avoid dependency loops. Auto Deploy is an interesting VMware feature, but it relies on a functioning network, DHCP, PXE, TFTP, DNS, and vCenter infrastructure to work. And that’s a problem when you’re in the middle of an outage (planned or unplanned) and any of that infrastructure is a VM.
If your vCenter is a VM how do you start everything up after an outage? Oh, you run a management cluster that doesn’t use Auto Deploy… well that’s a pain in the ass, because you now have a vSphere cluster that’s different. Different means harder to manage, which means human error and additional operational cost. What’s the ongoing cost of that, vs $104.60 per server?
If your DHCP/PXE/TFTP server is a VM how do you start everything up after an outage? Oh, you use Auto Deploy with local caching? Okay, where does it cache? Traditional magnetic media… ah, you’ve chosen expensive AND complicated!
My network guys use a bunch of VMs to provide network functionality like DHCP, some DNS (half our anycast DNS infrastructure is on VMs, half on physical hosts), carrier-grade NAT, log collection, etc. It’s great – vSphere dramatically reduces risk and improves uptime for their services. We always have to make sure that we keep track of dependencies, though, which is easy to do when there are so few.
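That chicken-and-egg problem is just a cycle in a dependency graph: the boot path needs services that themselves run on the hosts being booted. As a toy illustration (the service names and edges here are hypothetical, modeled on the Auto-Deploy-with-virtual-vCenter scenario above), a simple depth-first cycle check makes it obvious why local SD boot cold-starts and Auto Deploy doesn’t:

```python
# Toy dependency graph: edges read "X needs Y to come up".
# Service names and edges are hypothetical, for illustration only.
deps = {
    "esxi-host": ["dhcp", "tftp", "vcenter"],  # Auto Deploy boot path
    "dhcp":      ["esxi-host"],                # the DHCP server is a VM
    "tftp":      ["esxi-host"],                # so is the TFTP server
    "vcenter":   ["esxi-host"],                # and vCenter itself
    "sd-boot":   [],                           # local boot needs nothing
}

def find_cycle(graph, node, path=()):
    """Depth-first search; returns a dependency cycle, or None."""
    if node in path:
        return path[path.index(node):] + (node,)
    for dep in graph.get(node, []):
        cycle = find_cycle(graph, dep, path + (node,))
        if cycle:
            return cycle
    return None

print(find_cycle(deps, "esxi-host"))  # a loop: no way to cold-start
print(find_cycle(deps, "sd-boot"))    # None: starts with just power
```

Every service you move into the boot path is another candidate edge in that loop.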
4. Avoid unreliable vSphere features. While we’re on the topic of Auto Deploy I’d just like to say that it isn’t production-ready. First, it’s not at all configurable from the vCenter GUIs. It’s all done from PowerShell, which is hard for many IT shops. People just aren’t as good at scripting and CLIs as they should be. Second, it relies on the perpetually crappy Host Profiles. I don’t think I’ve ever seen a cluster that isn’t complaining about host profile compliance. And when you look at what’s out of compliance you see it’s some parameter that gets automatically changed by vCenter. Or the local RAID controller pathing, or a CD-ROM drive, or something that should just be automatically handled for you. And Lord help you if you want to use mutual CHAP with host profiles.
“I seem to have forgotten all the different iSCSI passwords you entered to be compliant with the Hardening Guide, Bob” – Host Profiles, on every edit.
Auto Deploy also scares me a little when it comes to my pre-existing datastores. I don’t build ESXi hosts with fibre channel storage connected, lest something go wrong and do something bad to a few hundred TB of storage. Yet every time an ESXi host boots from Auto Deploy it’ll look to do some partitioning of local storage. It isn’t supposed to interact with “remote” storage, but I don’t trust VMware’s QA very much, especially with underused features like this. Not worth the risk.
Auto Deploy & host profiles are interesting but until VMware puts more effort into both I cannot endorse putting either in your environment’s critical support path, if only because the alternatives are so cheap & reliable.
5. Boot From SAN is complicated, too. The three major alternatives to SD cards are traditional magnetic media, Auto Deploy, and boot from SAN. Boot from SAN is another one of those ideas that seems great on paper but doesn’t really pan out well in real life. First, look at your disk vendor’s upgrade notes. Pay attention to all the caveats if you’re booting from SAN, versus if you’re booting locally. A number of vendors don’t even support array software updates when you’re booting from the SAN. It all has to come down, and that’s lame.
Second, you’ve got dependency issues again. If you’re booting locally you need power and cooling and you can figure everything else out later. If you’re booting off the SAN you need working power, cooling, networking/SAN, etc. to start. You’re also exposed to a lot more human error. Someone screws up a SAN zoning change and your whole infrastructure is offline, versus just some VMs. Good luck with the finger pointing that ensues.
Last, the pre-boot environment on servers is hard to manage, inflexible, and isn’t very helpful for troubleshooting. To make changes you need to be on the console, as very little of it is manageable through typical system management tools. Configuring this sort of boot often means using the horrible BIOS interfaces on the CNAs or NICs you have installed, or archaic DOS utilities you have to figure out how to cram onto an ISO or bootable USB drive. That isn’t worth anybody’s time.
6. It just freakin’ works. When it comes right down to it none of these approaches to booting an ESXi host have any ROI. None. Zip. Zero. So every minute you spend messing around with them is a minute of your life you wasted.
The solution is super cheap, from both CapEx and OpEx perspectives. It doesn’t take long for any of the alternatives to burn through $104.60 in labor, especially when Dell will also pre-load ESXi on your new SD cards, thereby saving you even more time.
Once booted, ESXi only writes configuration data back to the card(s) every 10 minutes, so despite the limited write cycles of flash a decent SD card will last the lifetime of the server. And if it doesn’t, the mirroring is reliable enough to let you limp along. Replacement & remirroring are easy: just say Yes at the prompt.
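The back-of-the-envelope endurance math backs this up. Assuming (my numbers, not VMware’s or any vendor’s) a config save of a few megabytes every 10 minutes and a pessimistic 1,000 program/erase cycles for consumer flash, a wear-leveled 16 GB card barely notices the workload:

```python
# Back-of-the-envelope flash endurance estimate. The write size and
# P/E-cycle figures are assumptions for illustration, not vendor specs.
CARD_GB = 16
PE_CYCLES = 1_000       # pessimistic rating for consumer flash
WRITE_MB = 5            # assumed size of each config save
INTERVAL_MIN = 10       # ESXi config sync interval
YEARS = 5

writes = YEARS * 365 * 24 * (60 // INTERVAL_MIN)
data_written_gb = writes * WRITE_MB / 1024
# Full-card write cycles, assuming ideal wear leveling across the card
card_cycles = data_written_gb / CARD_GB

print(f"{writes:,} config writes, ~{data_written_gb:.0f} GB written")
print(f"~{card_cycles:.0f} full-card cycles of {PE_CYCLES:,} rated")
```

Roughly 80 full-card cycles over five years against a 1,000-cycle rating: an order of magnitude of headroom, even with my pessimistic assumptions.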
Last, they save you from a ton of extra complexity, complexity with no ROI. I don’t know about you but I’d rather be working on real problems than spinning my wheels managing an overcomplicated environment.
SD cards — Just Enough Storage.