Duncan Epping and I were kicking around the whole scale up vs. scale out argument two nights ago on Twitter, which culminated in Duncan’s excellent post on the topic. Aaron Delp also posted some numbers (and a unicorn), adding Microsoft licensing as a consideration. As a Linux guy I hadn’t thought about that style of Microsoft license, and I like that a lot.
While Mr. Epping was crunching numbers, so was I. I am firmly of the belief that scaling up is a better idea, because physical infrastructure and its management are not free. They aren’t cheap, either. You need to consider a lot of different things, including storage connectivity, network connectivity, KVM, power, and cooling. You can throw in guest & application licensing, too, if it can be done per socket. There are also a bunch of costs that are hard to calculate, like staff time, and like many others I omit them. You can assume, though, that fewer hosts take less time to maintain, and less time doing maintenance means more time moving forward.
So I built a spreadsheet, of course, to flesh out some of what Duncan identified as other considerations. You can download it and tweak it yourself if you’d like. Here are the basic assumptions, building on his numbers:
Those are my starting numbers. Some of them are estimates, but I tried to be realistic. I assembled all these so that I could figure out what the five-year cost might be of using:
- a smaller Dell PowerEdge R710 with two quad-core CPUs and 96 GB of RAM,
- a Dell PowerEdge R810 with two CPUs and 256 GB of RAM, and
- a Dell PowerEdge R810 with four 10 core CPUs and 512 GB of RAM.
All would have an H200, two 146 GB 10K disks as RAID 1, dual power supplies, a 5 year Next Business Day warranty (if you have spare capacity you can do this and save some money), and an Intel X520-DA 10 Gbps card in addition to the onboard NICs.
Would there be a big difference?
Note that in this spreadsheet I calculate the number of hosts we’d need based on vCPUs as well as the number of hosts based on vRAM, and then use the larger of the two. Plus one, of course, because we need some extra capacity for HA & maintenance. Similarly, I compute the licenses we’d need based on sockets, as well as based on vRAM, and take the larger of the two to compute license and support costs. I apologize for not making it clearer, but I think the crowd is also tiring of the onslaught of numbers lately. If you grab the spreadsheet you’ll see what I’ve done.
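If you’d rather read code than spreadsheet cells, here is a rough Python sketch of that sizing logic. It is not lifted from the spreadsheet: the function names, parameter names, and every input value in the example are hypothetical, including the per-license vRAM entitlement, so plug in your own numbers.

```python
import math

def hosts_needed(total_vcpus, total_vram_gb, cores_per_host,
                 ram_gb_per_host, vcpus_per_core, spare_hosts=1):
    """Hosts required by the tighter of the CPU and RAM constraints, plus spares."""
    by_cpu = math.ceil(total_vcpus / (cores_per_host * vcpus_per_core))
    by_ram = math.ceil(total_vram_gb / ram_gb_per_host)
    # Whichever resource runs out first decides the host count;
    # the spare covers HA and maintenance.
    return max(by_cpu, by_ram) + spare_hosts

def licenses_needed(hosts, sockets_per_host, total_vram_gb, vram_gb_per_license):
    """Licenses required by the larger of the socket count and the vRAM pool."""
    by_socket = hosts * sockets_per_host
    by_vram = math.ceil(total_vram_gb / vram_gb_per_license)
    return max(by_socket, by_vram)

# Hypothetical workload: 600 vCPUs and 1200 GB of vRAM in total,
# on two-socket hosts with 8 cores and 96 GB of RAM, at 5 vCPUs per core.
hosts = hosts_needed(600, 1200, cores_per_host=8, ram_gb_per_host=96,
                     vcpus_per_core=5)
# vram_gb_per_license=48 is an assumed entitlement, not a quoted figure.
licenses = licenses_needed(hosts, sockets_per_host=2, total_vram_gb=1200,
                           vram_gb_per_license=48)
print(hosts, licenses)
```

In this made-up case CPU is the binding constraint for hosts and sockets are the binding constraint for licenses; bump the RAM per VM and both can flip.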
There is definitely a savings, but it’s not nearly as profound as I’d expected: just $37,588 between the small server configuration and the medium-sized one. That’s definitely because of the numbers Duncan chose for averages, though. Altering the average CPU or RAM needs of the VMs makes the difference larger, and if we increase the consolidation ratio, fitting 8 vCPUs per core, you get a $66,770 savings. Lastly, if we amplify the savings by adding more VMs, say 600, you get $97,331. That’s respectable.
I encourage you to play with the spreadsheet yourself. As for this exercise, I can conclude and observe a few things:
- Scale up is worth exploring, for respectable financial gains/savings as well as the hard-to-quantify time savings, as long as you’re comfortable with how many VMs you are putting on one host.
- Over half the price of these setups is licensing, which levels things out quite a bit.
- Not surprisingly, power usage is pretty level. Big servers use more power, smaller ones use less, and it evens out with quantity. The same is true of network ports, storage ports, and hardware costs. Big servers cost more but potentially use less infrastructure.
- Right-sizing VMs is definitely key, as wasted CPU and RAM eat directly into your costs. That has always been true, but it’s just more profound now.
- A comment from Duncan, as we proofread each other’s posts: “[with] the N+1 you more or less already have the 85% that [Gabrie] refers to. So depending on the approach you want to take, 85% of N vs 85% of N+1, you could potentially save some money.” Very true. Changing the spreadsheet so there’s no +1 shows anywhere from a $28K to a $56K savings overall.
- There is a definite sweet spot for machine size, where cost balances against the drawbacks of bigger hosts. In this example it’s the middle tier, but over time, and depending on your own workloads, it’ll be different.
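Duncan’s point about 85% of N versus 85% of N+1 is easier to see with a bit of arithmetic. The sketch below is my reading of his comment, not something from the spreadsheet: the function names and the example load are hypothetical, and "load" means the number of hosts the workload would fill at 100% utilization.

```python
import math

def total_hosts_headroom_then_spare(load_in_hosts, target_utilization=0.85):
    """Size N so the load is at most 85% of N, then add one spare host."""
    return math.ceil(load_in_hosts / target_utilization) + 1

def total_hosts_spare_counts_as_headroom(load_in_hosts, target_utilization=0.85):
    """Size the whole cluster, spare included, so the load is at most 85% of it."""
    return math.ceil(load_in_hosts / target_utilization)

# Hypothetical load equal to 15 fully utilized hosts.
print(total_hosts_headroom_then_spare(15))        # headroom on N, plus a spare
print(total_hosts_spare_counts_as_headroom(15))   # spare doubles as the headroom
```

Under these assumptions the two approaches differ by exactly one host, which is where the money is.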
Anyhow, I hope this has been useful to somebody. Let me know if you see any errors, or have any good thoughts on quantifying staff time saved as part of this process. I would love to be able to represent that.
A big thank you to the man himself, Duncan Epping, for proofreading, offering perspective, and all the conversation in between. I may be the Lone Sysadmin, but it’s definitely nice to be part of a team.
P.S. If you’re interested in where I got the energy numbers, I used the Dell Energy Smart Solution Advisor.