Category: System Administration

Future Capacity Planning »

My favorite question from manager types is:

“How many more VMs can we run before we have to expand?”

I can never answer this without someone sticking it to me later. I always do end up answering it, and my answer is always wrong because it’s based on averages and the very little I’m told about future projects, upcoming P2Vs, server replacements, etc. We aren’t going to get 25 more 1.28 vCPU/2.398 GB of RAM VMs, though. It’s like having 1.75 kids — it just doesn’t work that way. I could try to tell them that we have 108 GB of RAM available, but that isn’t what they want, either. They want a concrete number they can multiply by our chargeback rates and put in the budget.
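To put rough numbers on why the average misleads, here is a small Python sketch. The 108 GB of free RAM and the 1.28 vCPU/2.398 GB "average VM" come from above; the 32 free vCPUs and the three other VM shapes are assumptions made up purely for illustration.

```python
from fractions import Fraction

# Free capacity. 108 GB of RAM is the figure from the post; the 32 free
# vCPUs is an assumed number chosen only for illustration.
FREE_RAM_GB = Fraction("108")
FREE_VCPUS = Fraction("32")  # assumption


def how_many_fit(vcpus, ram_gb):
    """Whole VMs of one shape that fit, limited by the scarcer resource."""
    by_cpu = FREE_VCPUS // Fraction(vcpus)
    by_ram = FREE_RAM_GB // Fraction(ram_gb)
    return int(min(by_cpu, by_ram))


# The "average VM" from the post, plus three hypothetical real-world shapes.
shapes = {
    "average (1.28 vCPU / 2.398 GB)": ("1.28", "2.398"),
    "small web server (1 vCPU / 1 GB)": ("1", "1"),
    "app server (2 vCPU / 8 GB)": ("2", "8"),
    "database (4 vCPU / 32 GB)": ("4", "32"),
}

for name, (vcpus, ram_gb) in shapes.items():
    print(f"{name}: {how_many_fit(vcpus, ram_gb)}")
```

Under these assumptions the same free capacity holds 25 "average" VMs, 32 small web servers, 13 app servers, or 3 databases. The answer depends entirely on which VMs actually show up.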

It’s hard to explain the problem with all of this, though, and I’ve been searching for a good analogy to make people realize why I’m so cagey about an answer. My awesome financial analyst, Michelle Fritze, just came up with it:

“How many boxes fit in your office?”

I can’t wait to ask my CIO that.


WebEx & Aero »

WebEx and Microsoft Windows 7 don’t seem to get along 100% quite yet. If you are using WebEx on Windows 7, it’ll disable Aero during your session. If your session is over and you don’t get Aero back, here’s how to fix it without rebooting:

1. Make sure you’ve closed/exited all WebEx components.

2. Go to Windows Menu -> All Programs -> Accessories, right-click Command Prompt, and choose “Run as administrator.” You will need to accept a User Account Control warning about this.

3. Issue the commands:

net stop uxsms
net start uxsms

That should fix it.

Alternatively (and potentially easier): you could restart the “Desktop Window Manager Session Manager” service via the Services administrative tool.
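If this happens often enough to be worth scripting, the two commands can be wrapped in a few lines of Python. This is only a sketch: the function names are mine, and it assumes Python is installed on the Windows 7 machine and that the script is run from an elevated prompt.

```python
import subprocess

# Service name taken from the net stop/net start commands above.
SERVICE = "uxsms"


def restart_commands(service=SERVICE):
    """Return the stop/start command pair as argument lists."""
    return [["net", "stop", service], ["net", "start", service]]


def restart_service(service=SERVICE, run=subprocess.run):
    """Run stop, then start; check=True raises if either command fails."""
    for cmd in restart_commands(service):
        run(cmd, check=True)

# On the affected machine, call restart_service() from an elevated prompt.
```

The `run` parameter exists only so the function can be exercised without touching a real service.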


Linux Virtual Machine Tuning Guide »

It’s been a while in the making, but I finally started consolidating all my Linux VM tuning notes into a single document for all to read: Linux Virtual Machine Tuning Guide.

Please take a look at it, and if there are corrections or additions to be made let me know. I know there is a lot to be done with network stack tuning, which will be added to a future revision when I get my notes sorted out.


Playing Mastermind With My RAM »

I have a Dell PowerEdge R610 in one of my VMware vSphere clusters that has been reporting memory errors. In fact, the machine wouldn’t boot, and the front panel suggested I reseat all the RAM. Okay…

0. Reseat all the RAM. Didn’t work, as expected.

1. Pull all twelve DIMMs out, put four back in. That worked, machine comes up.

2. Put four more DIMMs back. That worked, machine comes up.

3. Put last four DIMMs in. Machine doesn’t boot, same original error.

4. Pull last set of DIMMs out. Boot machine. Notice that BIOS is really old. Upgrade BIOS, thinking this is some stupid BIOS bug. Machine continues to boot.

5. Put last four DIMMs back in. New BIOS actually tells me what DIMMs are bad. Nice, except it says that A1 and A4 are bad. Two DIMMs? Yeah, not likely.

6. Order single replacement DIMM from Dell, decide to play Mastermind with RAM.

7. Replace DIMM A1. Machine switches to saying DIMMs B3 and B5 are bad. Really? DIMM banks B are on the other CPU.

8. Stifle disbelief, take loose DIMM from A1 and replace B3.

9. Machine switches to saying DIMM B5 is bad.

10. Take loose DIMM from B3 and replace B5. Machine likes that, has all of its RAM again, and I probably have the offending DIMM out now. Probably.

Lessons here: A) physical hardware sucks. B) linear troubleshooting rules. C) keep your firmware up to date.


Heisenberg & Monitoring »

From Wikipedia:

In quantum mechanics, the Heisenberg uncertainty principle states that certain pairs of physical properties, like position and momentum, cannot both be known to arbitrary precision. That is, the more precisely one property is known, the less precisely the other can be known… The measurement of position necessarily disturbs a particle’s momentum, and vice versa.

Stated a little more simply (and loosely, since this is really the related observer effect), the sheer act of measuring a particle disturbs it, so you can only get approximate measurements.

This is also true of computing systems and monitoring. The act of watching a system consumes resources on that system, which in turn skews the numbers you get from the monitoring system. The more data you collect, and the more intensive the collection is, the more resources it consumes. The effect is quite observable on virtual machines. I’ve got some virtual machines where customers are running their own performance monitoring tools, and those tools turn what would otherwise be an idle VM into something consuming quite a bit of CPU. Multiply that by the number of VMs involved, and even a 100 MHz increase per VM makes a huge difference en masse, especially if those tools all report on the same schedule (every minute, every 5 minutes, etc. from the top of the hour). Your performance monitoring tool might actually be causing performance problems.
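A quick back-of-the-envelope version of that multiplication, in Python. The 100 MHz per VM is the figure from above; the 200 VMs, the 2.5 GHz cores, and the hostnames are assumptions for illustration. The second function sketches one common way to keep agents from all firing at the top of the interval: give each host a stable offset derived from its name. It is not something any particular monitoring tool is known to do.

```python
import zlib


def fleet_overhead_mhz(vm_count, per_vm_mhz=100):
    """Total CPU consumed by in-guest monitoring across the fleet."""
    return vm_count * per_vm_mhz


def splay_seconds(hostname, interval_seconds):
    """Deterministic per-host offset so agents do not all fire at once."""
    return zlib.crc32(hostname.encode("utf-8")) % interval_seconds


VM_COUNT = 200   # assumption
CORE_MHZ = 2500  # assumption: 2.5 GHz cores

total = fleet_overhead_mhz(VM_COUNT)
print(f"{total} MHz, about {total / CORE_MHZ:.0f} full cores")

for host in ("vm-a", "vm-b", "vm-c"):  # hypothetical hostnames
    print(host, splay_seconds(host, 300))
```

With those assumed numbers, the monitoring agents alone eat 20,000 MHz, or roughly eight whole cores, before anyone does any real work.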

Running performance monitoring tools directly on virtual machines might be a bad idea anyhow. Not only do you waste resources by doing so, you also may get incomplete results because the VM itself doesn’t know the whole story. This is especially true if resource limits are in effect in your virtualization environment[0]. What the VM thinks is 100% of a vCPU might only be 25% of an actual CPU because of resource contention. Out-of-band tools like esxtop, or vCenter’s performance charts, can tell a more factual story[1]. Besides, if you really need the guest OS point of view you can always log in and use Resource Monitor or top/iostat/vmstat to find out what the virtual machine thinks. Just make sure you’re not doing all that extra work to collect the wrong data. :-)

—————————-

[0] The VMware Descheduled Time Accounting service, which comes with the VMware Tools, can help Windows VMs by correctly accounting for time spent waiting for ESX to run the VM again. Newer & upcoming Linux distributions also can account for that with their tickless clock kernel features. But it’s usually more efficient to gather the data from the hypervisor itself.

[1] Remember, though, that esxtop and vCenter’s performance charts use resources somewhere, too (usually the ESX console OS, and/or vCenter Server & SQL Server).


GoDaddy, SSL, and $13 »

A GoDaddy representative left a comment on the post about ipsCA, saying:

GoDaddy.com is happy to help ipsCA customers that have found themselves in a jam. For a limited time, our Standard SSLs are $12.99 with code sslqyh1w. Call 480-505-8877 or order online at http://bit.ly/91M3NV

I’m not usually the kind of person to parrot an ad, especially one left on my site, but it’s actually a decent deal if you want a new, real SSL cert. Admittedly it’s not for their advanced certificates, but if you have a couple of ipsCA certs to replace it might work out just fine. Personally, I’ve been quite happy with GoDaddy as a domain registrar.


ipsCA: Getting What You Pay For »

So the SSL certification authority (CA) ipsCA is frantically sending out email because their root CA certificate will expire on 12/29/2009, and every customer of theirs needs to get a new certificate. This is a problem for my organization because, being an educational institution, we were able to get no-cost[0] SSL certs from them. Because they were no-cost, we have a lot of these certificates for test & development systems, and are now scrambling to find what will break on December 29th.

Once we find all the certificates, there’s another complicating factor. We could just renew the certificates again, but the new ipsCA root certificate is not shipping as part of any browser except Internet Explorer 8 (the next Firefox will have it when it ships in February). Since we know nobody ever patches anything[1], nearly every browser in circulation will continue to have errors. I can only conclude that ipsCA is being run by people who don’t understand their business.[2]

There are a few lessons here:

  • Once again, free doesn’t mean it’s a good value. I’d much rather pay for a product I know will work well than have to babysit something that I paid nothing for. Though I’d be seriously upset if I were actually a paying customer of theirs.
  • It would be real nice to have a central spreadsheet or tracking mechanism for SSL certificates and their expiration dates.
  • It would also be nice to have all those SSL certificates co-terminate, so we can renew them all at once. Of course, we have an opportunity to do that now.
  • For most test & development purposes an internal CA would work just fine, since it’s simple enough for staff to import a CA into their browsers. In fact, some of my coworkers have already set it up.

Let’s just hope these points don’t get lost in the chaos.
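As a starting point for the tracking idea in the list above, here is a minimal Python sketch that flags certificates expiring within a given window. The hostnames, dates, and the 60-day window are made up for illustration; the only library call of note is `ssl.cert_time_to_seconds`, which parses the "notAfter" date format that certificates use.

```python
import ssl
from datetime import datetime, timedelta, timezone


def parse_not_after(not_after):
    """Convert a certificate 'notAfter' string to an aware UTC datetime."""
    seconds = ssl.cert_time_to_seconds(not_after)
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


def expiring_soon(inventory, now, within_days=60):
    """Return (host, expiry) pairs expiring within the window, soonest first."""
    cutoff = now + timedelta(days=within_days)
    hits = [(host, parse_not_after(text)) for host, text in inventory.items()]
    return sorted((h for h in hits if h[1] <= cutoff), key=lambda h: h[1])


# Hypothetical inventory: hostnames and dates are made up for illustration.
inventory = {
    "dev1.example.edu": "Dec 29 00:00:00 2009 GMT",
    "test2.example.edu": "Jun 15 12:00:00 2010 GMT",
    "www.example.edu": "Mar 15 00:00:00 2011 GMT",
}

now = datetime(2009, 12, 1, tzinfo=timezone.utc)
for host, expiry in expiring_soon(inventory, now):
    print(host, expiry.date())
```

In practice the date strings would probably come from the certificates themselves (for example, the "notAfter" field that `getpeercert()` returns on a connected `ssl.SSLSocket`) rather than being typed in by hand, but even a hand-maintained list beats scrambling in December.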

———————————————————-

[0] I say “no-cost” because it’s now obvious to a lot of people that they aren’t free.

[1] Except toolbars, things that install toolbars, and spyware.

[2] That’s probably the most polite I’ve been when describing this situation.


It Belongs To Everybody »

You think that server in our data center is yours?

The CIO paid for it.

The logistics & purchasing team ordered it.

The data center team installed it.

The system administration team configured it and patches it.

You installed the application on it.

The monitoring guys watch it.

The security team scans it.

I think it’s safe to say it belongs to the whole organization, not you.
