VMware Scale Up vs. Scale Out: The Big Picture

Duncan Epping and I were kicking around the whole scale up vs. scale out argument two nights ago on Twitter, which culminated in Duncan’s excellent post on the topic. Aaron Delp also posted some numbers (and a unicorn), adding Microsoft licensing as a consideration. As a Linux guy I hadn’t thought about that style of Microsoft license, and I like that a lot.

While Mr. Epping was crunching numbers, so was I. I am firmly of the belief that scaling up is a better idea, because physical infrastructure and its management are not free. They aren’t cheap, either. You need to consider a lot of different things, including storage connectivity, network connectivity, KVM, power, and cooling. You can throw in guest and application licensing, too, if it can be done per socket. There are also a bunch of costs that are hard to calculate, like staff time, and like many others I omit them. You can assume, though, that fewer hosts take less time to maintain, and less time doing maintenance means more time moving forward.

So I built a spreadsheet, of course, to flesh out some of what Duncan identified as other considerations. You can download it and tweak it yourself if you’d like. Here are the basic assumptions, building on his numbers:

Those are my starting numbers, some of which are estimates, but I tried to be realistic. I assembled all these so that I could figure out what the five year cost might be of using:

  • a smaller Dell PowerEdge R710 with two quad-core CPUs and 96 GB of RAM,
  • a Dell PowerEdge R810 with two CPUs and 256 GB of RAM, and
  • a Dell PowerEdge R810 with four 10-core CPUs and 512 GB of RAM.

All would have an H200, two 146 GB 10K disks as RAID 1, dual power supplies, a 5-year Next Business Day warranty (if you have spare capacity you can get by with next-business-day service and save some money), and an Intel X520-DA 10 Gbps card in addition to the onboard NICs.

Would there be a big difference?

Note that in this spreadsheet I calculate both the number of hosts we’d need based on vCPUs as well as the number of hosts based on vRAM, and then use the maximum number out of that. Plus one, of course, because we need some extra capacity for HA & maintenance. Similarly, I compute the licenses we’d need based on sockets, as well as based on vRAM, and take the larger of the two to compute license and support costs. I apologize for not making it more clear, but I think the crowd is also tiring of the onslaught of numbers lately. If you grab the spreadsheet you’ll see what I’ve done.
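For anyone who would rather read the logic than open the spreadsheet, here is a minimal Python sketch of the sizing rules described above. It is not the spreadsheet itself. The function and parameter names are made up for illustration, rounding up to whole hosts and whole licenses is an assumption, and the example inputs are placeholders rather than the numbers used in this post.

```python
import math


def hosts_needed(total_vcpus, total_vram_gb, cores_per_host,
                 vcpus_per_core, ram_per_host_gb, spare_hosts=1):
    """Hosts required: the larger of the CPU-bound and RAM-bound counts,
    plus spare capacity for HA and maintenance (the "plus one")."""
    by_cpu = math.ceil(total_vcpus / (cores_per_host * vcpus_per_core))
    by_ram = math.ceil(total_vram_gb / ram_per_host_gb)
    return max(by_cpu, by_ram) + spare_hosts


def licenses_needed(hosts, sockets_per_host, total_vram_gb,
                    vram_per_license_gb):
    """Licenses required: the larger of the socket count and the number
    of licenses needed to cover the allocated vRAM."""
    by_sockets = hosts * sockets_per_host
    by_vram = math.ceil(total_vram_gb / vram_per_license_gb)
    return max(by_sockets, by_vram)


# Placeholder inputs (assumptions, not the post's figures):
# 600 vCPUs and 1,500 GB of vRAM on two-socket, 8-core, 96 GB hosts,
# 5 vCPUs per core, and a hypothetical 48 GB vRAM entitlement per license.
hosts = hosts_needed(total_vcpus=600, total_vram_gb=1500,
                     cores_per_host=8, vcpus_per_core=5,
                     ram_per_host_gb=96)
licenses = licenses_needed(hosts, sockets_per_host=2,
                           total_vram_gb=1500, vram_per_license_gb=48)
print(hosts, licenses)
```

Passing `spare_hosts=0` models a design without the extra host, which is a quick way to see how much of the total is spare capacity.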

There is definitely a savings, but it’s not nearly as profound as I’d expected, just $37,588 between the small server configuration and the medium-sized one. That’s definitely because of the numbers Duncan chose for averages, though. Altering the average CPU or RAM needs of the VMs makes the difference larger, and if we increase the consolidation ratio, fitting 8 vCPUs per core, you get a $66,770 savings. Lastly, if we amplify the savings by adding more VMs, like 600, you get $97,331. That’s respectable.

I encourage you to play with the spreadsheet yourself. As for this exercise, I can conclude and observe a few things:

  • Scale up is worth exploring, for respectable financial gains/savings as well as the hard-to-quantify time savings, as long as you’re comfortable with how many VMs you are putting on one host.
  • Over half the price of these setups is licensing, which levels things out quite a bit.
  • Not surprisingly, power usage is pretty level. Big servers use more power, smaller ones use less, and it evens out with quantity. The same goes for network and storage ports versus hardware costs: big servers cost more but potentially use less infrastructure.
  • Right-sizing VMs is definitely key, as wasted CPU and RAM eat directly into your costs. That has always been true, but it matters even more now.
  • A comment from Duncan, as we proofread each other’s posts: “[with] the N+1 you more or less already have the 85% that [Gabrie] refers to. So depending on the approach you want to take, 85% of N vs 85% of N+1, you could potentially save some money.” Very true. Changing the spreadsheet so there’s no +1 shows anywhere from a $28K to a $56K savings overall.
  • There is a definite sweet spot for machine size, balancing cost against the drawbacks of larger hosts. In this example it’s the middle tier, but over time, and depending on your own workloads, it’ll be different.

Anyhow, I hope this has been useful to somebody. Let me know if you see any errors, or have any good thoughts on quantifying staff time saved as part of this process. I would love to be able to represent that.

A big thank you to the man himself, Duncan Epping, for proofreading, offering perspective, and all the conversation in between. I may be the Lone Sysadmin, but it’s definitely nice to be part of a team.

P.S. If you’re interested in where I got the energy numbers, I used the Dell Energy Smart Solution Advisor.

Comments on this entry are closed.

  • Great post, very valuable Bob! Thanks once again for validating my logic and contributing, excellent work!

  • Thanks for a good post. It hopefully helps to highlight why some folk (myself included) are disappointed about the VMware licensing changes: they already ran these numbers and made the business case based on them, and now have to find more money to pay for the 33% lift in licensing costs. For those of us who weren’t running Enterprise Plus before, the uplift is even more unpleasant.

  • I know you are aware of this, and this wasn’t the intention behind your and Duncan’s really nice posts in the first place, but I’m still going to beat the almost-dead horse of discontent about the new vRAM license model. If you will excuse me.

    So I changed your chart to include the cost of these environments under the vSphere 4 license model. It shows that for scenarios 2 and 3, the vSphere 5 licensing and SNS cost alone almost doubled, and the overall cost increased by 20%:
    http://img853.imageshack.us/img853/6523/licensesl.png

    So the bottom line is that scale-up with vSphere 5 is still cheaper than scale-out with vSphere 5, but it is significantly more expensive than scale-up with vSphere 4.

    This is why people like me can’t stop barging into debates and posts around costs, licenses and such like these, even when the original author didn’t intend to compare the old and new model anyways.
    (Sorry again for this “slightly unrelated” comment).

    • No worries. While many people are tired of the licensing discussion (mostly VMware employees tired of being railed on for it), it does represent a major pricing change, and an unwelcome one to most, myself included. I had a discussion about it with my sales guys just the other day. I appreciate you changing the chart to compute that; last night as I was pulling this all together I was wondering what the difference would look like.

  • I’m glad it’s been shown that with all things considered, scale up is still a viable and, in these cases, a better solution for the money. My only thought is along the same lines as MK’s: there doesn’t seem to be enough visibility into the fact that in most cases where people have more than 96 GB of memory in a server they’ll be paying more. I understand vRAM, and my response is: why buy a 128 GB RAM server and not use it all? What if we already ARE using it all? I’m glad MK showed the difference in pricing between the generations of product. It’s difficult to sell management on the new features of vSphere 5 with price hikes like that.

    • I think the actual quantity of physical memory in a server is not so closely tied to the VMware licensing cost.
      Having 128 GB of pRAM doesn’t mean you have to buy vRAM for it up front, nor that it’s wasted. The “extra unlicensed” pRAM is welcome when a host fails, and as vRAM is pooled by vCenter instance …

      It’s wise to keep some resources for availability purposes, and there should always be extra resources, just in case …
      The vRAM limit forces us to leave this safety fence untouched, and keeps the “VM sprawl beast” unfed! :)

  • Trying to download your spreadsheet but it doesn’t seem to open in Excel. Any BKMs you can share for opening it?

  • Quad-core processors (such as the X5687) used in dual-socket servers run much faster than the ten-core ones (such as the E7-4870) used in four-socket servers. The former runs at 3.6 GHz vs. 2.4 GHz for the latter. I suppose applications that benefit from faster processing speed would be virtualized better on dual-socket servers?

  • Great analysis! One thing that might be worth considering is to look at this model in a colocation environment rather than assuming it’s in your own datacenter. For example, costs for rack space (by the U) and locking cabinets (minimum size) would be unique to that environment, and power and HVAC would have different (and likely higher) costs.

    Based on some rough numbers from somebody I know that’s in the business, the results would amplify the differences between your three scenarios. Different numbers of switches and other interconnect/infrastructure components could also be significant if they took up space.

    Again, thanks for the great work.
    Taylor