My favorite question from manager types is:
“How many more VMs can we run before we have to expand?”
I can never answer this without someone sticking it to me later. I always do end up answering it, and my answer is always wrong because it’s based on averages and the very little I’m told about future projects, upcoming P2Vs, server replacements, etc. We aren’t going to get 25 more 1.28 vCPU/2.398 GB of RAM VMs, though. It’s like having 1.75 kids — it just doesn’t work that way. I could try to tell them that we have 108 GB of RAM available, but that isn’t what they want, either. They want a concrete number they can multiply by our chargeback rates and put in the budget.
It’s hard to explain the problem with all of this, though, and I’ve been searching for a good analogy to make people realize why I’m so cagy about an answer. My awesome financial analyst, Michelle Fritze, just came up with it:
“How many boxes fit in your office?”
I can’t wait to ask my CIO that.
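To put rough numbers on the boxes problem, here is a minimal Python sketch. The 108 GB of free RAM is the figure from above; the 32 free vCPUs and the three VM sizes are assumptions picked purely for illustration, and `vms_that_fit` is a hypothetical helper, not part of any tool.

```python
# Free capacity. The 108 GB figure is from the post; the 32 free vCPUs
# are an assumed number for illustration only.
FREE_VCPUS = 32
FREE_RAM_GB = 108

# Hypothetical VM sizes as (vCPUs, RAM in GB).
VM_SIZES = {
    "small": (1, 2),
    "medium": (2, 8),
    "large": (8, 32),
}


def vms_that_fit(vcpus, ram_gb, free_vcpus=FREE_VCPUS, free_ram_gb=FREE_RAM_GB):
    """Whole VMs of one size that fit; the scarcer resource sets the limit."""
    return min(free_vcpus // vcpus, free_ram_gb // ram_gb)


for name, (vcpus, ram_gb) in VM_SIZES.items():
    print(f"{name}: {vms_that_fit(vcpus, ram_gb)}")
```

Under those assumptions the same free capacity holds 32 small VMs, 13 medium ones, or 3 large ones. The answer depends entirely on the size of the boxes.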
You need to discuss “a standard virtual-machine-month” with your financial folks, and resource controls with your CIO.
A Unix/Linux example of managing the resources your users can have is at http://broadcast.oreilly.com/2009/06/manage-your-performance-with-cgroups-and-projects.html
If you charge for a certain guaranteed minimum amount of resources per month, resource controls will ensure that you provide them, and a little static capacity planning will keep you from badly oversubscribing your machines (a little over-subscription is fine; suicidal levels are, well, suicidal).
Then you can charge and provide customers more if they need more resources, and manage the results.
But first you need to talk to both your CFO and CIO.
–dave
Your suggestion doesn’t address my problem. It isn’t a question of more arbitrary units. And it isn’t that we don’t know what resource controls are (we do, we aren’t idiots). The problem is getting people to understand what the relationship is between free capacity, server sizes, and projects in the organization.
The biggest issue is that our chargeback model isn’t linear with machine size. There are static per-instance charges, which throw things off.
Talking with (not ‘to’) my CxOs doesn’t help unless I find a way to convey the problem so they understand it. A nice analogy does that; another arbitrary unit does not.
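To illustrate why a flat per-instance charge makes the budget number depend on VM count and not just on capacity, here is a minimal Python sketch. Every rate in it is hypothetical; only the shape of the model (a static per-instance charge plus size-based charges) comes from the description above.

```python
# All rates are hypothetical. Only the shape of the model (a flat
# per-instance charge plus size-based charges) is taken from the thread.
PER_INSTANCE = 50  # flat monthly charge per VM
PER_VCPU = 20      # monthly charge per vCPU
PER_GB_RAM = 5     # monthly charge per GB of RAM


def monthly_charge(count, vcpus, ram_gb):
    """Total monthly charge for `count` identical VMs."""
    return count * (PER_INSTANCE + vcpus * PER_VCPU + ram_gb * PER_GB_RAM)


# The same total footprint (16 vCPUs, 64 GB of RAM), carved up two ways.
many_small = monthly_charge(16, 1, 4)   # sixteen 1 vCPU / 4 GB VMs
few_large = monthly_charge(2, 8, 32)    # two 8 vCPU / 32 GB VMs

print(many_small, few_large)
```

With these made-up rates the sixteen small VMs bill 1440 per month and the two large ones bill 740, even though both consume identical capacity. Free capacity alone therefore cannot be converted into a chargeback figure without knowing how it will be divided.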