Midnight is Always Tomorrow

“So, are you ready for the big power outage on Sunday?” a colleague asks on Thursday.

“You mean Saturday.”

“No… Sunday morning.”

“Um, I was told two months ago, and countless times between, that the outage is on Saturday, midnight to 8 AM, and they were starting to shut things down at 10 PM.”

“It’s Sunday, midnight to 8 AM. They’re going to start shutting things down on Saturday at 10 PM.”

“Did they move the outage?”

“No, I bet they were just telling you when things were going to start. On Saturday.”

Midnight is 00:00, meaning the start of a new day. Always. If you’re in doubt, use 00:01. Assume everybody is clueless about time, because they are. For …
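To make the point concrete, here is a minimal Python sketch of the outage timeline. The calendar date is an assumption chosen only because it falls on a Saturday; the times (10 PM shutdown, midnight to 8 AM outage) come from the conversation above.

```python
from datetime import datetime, timedelta

# Hypothetical date for illustration only: 2010-01-09 was a Saturday.
shutdown_start = datetime(2010, 1, 9, 22, 0)  # Saturday, 10 PM

# Two hours later the clock reads 00:00, which already belongs to the next day.
outage_start = shutdown_start + timedelta(hours=2)
outage_end = outage_start + timedelta(hours=8)


def describe(ts):
    """Format a timestamp as 'Weekday HH:MM'."""
    return ts.strftime("%A %H:%M")


for label, ts in [
    ("shutdown begins", shutdown_start),
    ("outage begins", outage_start),
    ("outage ends", outage_end),
]:
    print(f"{label}: {describe(ts)}")
```

The shutdown begins on Saturday, but the midnight that starts the outage is stamped with Sunday's date, which is the source of the confusion.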

What are P-states and how do I use them in vSphere?

VMware vSphere 4 added the ability to take advantage of Intel SpeedStep and AMD PowerNow! CPU power management features. These features are commonly known as “Dynamic Voltage and Frequency Scaling,” or DVFS, and let an OS cooperate with the CPU to reduce power consumption by lowering both the CPU’s frequency and the voltage at which it operates. The reductions happen in preset tiers known as P-states. On Intel CPUs the feature is trademarked as “SpeedStep” and on AMD it is either “Cool’n’Quiet” or “PowerNow!” The Wikipedia article on Intel SpeedStep points out that “power consumed by a CPU with a capacitance of C, running at voltage V, and frequency f is approximately P = …

If You Don't Like Change…

“If you don’t like change, you’re going to like irrelevance even less.” – General Eric Shinseki, former United States Army Chief of Staff.

Power Consumption of a Dell PowerEdge R610

For planning purposes I just did some power draw testing of a Dell PowerEdge R610. Dual Intel X5550 CPUs, 24 GB of RAM, four SSD disks attached to the PERC6/i, and dual 717 Watt power supplies. My testing methodology was to measure the draw using a Fluke 322 clamp meter, both at idle and running a stress test under Red Hat Enterprise Linux 5 (stress -c 32 -d 8 -i 8 -m 16). I did this with one and two power supplies active.

1 PS, idle: 0.65 Amps @ 202.3 Volts = 131.5 Watts
1 PS, loaded: 1.51 Amps @ 202.3 Volts = 305.5 Watts
2 PS, idle: 0.35 Amps @ 202.3 Volts = 70.8 Watts each (total of 141.6 …
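The arithmetic behind those figures is simply current times voltage. A small Python sketch, using only the readings quoted above, reproduces it. Like the post, it treats amps times volts as watts and ignores power factor; the helper name `watts` is made up for the example.

```python
# Readings quoted in the post: (label, amps, volts).
readings = [
    ("1 PS, idle", 0.65, 202.3),
    ("1 PS, loaded", 1.51, 202.3),
    ("2 PS, idle (each)", 0.35, 202.3),
]


def watts(amps, volts):
    """Current times voltage, rounded to one decimal place."""
    return round(amps * volts, 1)


for label, amps, volts in readings:
    print(f"{label}: {amps} A @ {volts} V = {watts(amps, volts)} W")

# With two supplies sharing the load, the total is twice the per-supply draw.
total_two_ps_idle = round(2 * 0.35 * 202.3, 1)
print(f"2 PS, idle (total): {total_two_ps_idle} W")
```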

Dell PowerEdge R610 & PERC/6i Disk Comparison

I’ve recently done some very basic disk performance testing of a Dell PowerEdge R610 with 24 GB of RAM (1333 MHz), dual Intel X5550 CPUs, a PERC/6i RAID controller, and a bunch of 146 GB 15K RPM 2.5″ disks, as well as four of the Dell 50 GB enterprise SSD disks (which are Samsung drives). I tested various combinations of RAID 0, 1, 5, 6, 10, and 50 with 1, 2, 3, 4, and 6 disks. While the RAID controller configurations varied, all the configs had the element size set to 64 KB, read policy set to Adaptive Read Ahead, and write policy set to Write Back. The PERC/6i firmware was 6.2.0-013. The operating system was Red Hat Enterprise Linux …

Never Send Error Email in a Loop

Some of my favorite system outages are denial-of-service attacks brought on by coders who code as if nothing will ever go wrong. For instance, take the following section of pseudocode:

    foreach $email (@giant_list_of_customer_email_addrs) {
        @customer_info = get_database_info_for_customer($email);
        if (!defined(@customer_info)) {
            send_error_email_to_admins($email);
        } else {
            send_customer_email(@customer_info);
            undef(@customer_info);
        }
    }

When get_database_info_for_customer() fails (such as when the database is down for maintenance), someone will get an email for every failure. This is merely annoying when @giant_list_of_customer_email_addrs is 50 people, but when it’s 200,000 people it’s a big problem. First, you get hundreds of copies of sendmail running (or whatever the mailer function uses; with a lazy coder like this it’ll usually be something that isn’t efficient at all). Second, your local …
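One possible way to avoid the flood, sketched here in Python rather than the pseudocode's Perl-like syntax, is to collect failures inside the loop and send a single summary afterwards. This is an illustration and not necessarily the fix the post goes on to recommend. Every name in it (`process_customers`, `get_info`, `send_admin_email`, the 20-line cap) is hypothetical.

```python
def process_customers(emails, get_info, send_customer_email, send_admin_email):
    """Email each customer; report all lookup failures in ONE admin email.

    The three callables are hypothetical stand-ins for the database lookup
    and mailer functions in the pseudocode above.
    """
    failures = []
    for email in emails:
        try:
            info = get_info(email)
        except Exception as exc:
            # Record the failure instead of mailing the admins right away.
            failures.append((email, str(exc)))
            continue
        if info is None:
            failures.append((email, "no data returned"))
            continue
        send_customer_email(info)

    if failures:
        # Cap the detail so the summary stays small even with 200,000 failures.
        sample = "\n".join(f"{addr}: {why}" for addr, why in failures[:20])
        send_admin_email(
            subject=f"{len(failures)} customer lookups failed",
            body=sample,
        )
    return len(failures)
```

With this shape, a database outage during the run produces one admin message with a count and a short sample, whether the list holds 50 addresses or 200,000.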

Future Capacity Planning

My favorite question from manager types is: “How many more VMs can we run before we have to expand?” I can never answer this without someone sticking it to me later. I always do end up answering it, and my answer is always wrong because it’s based on averages and the very little I’m told about future projects, upcoming P2Vs, server replacements, etc. We aren’t going to get 25 more 1.28 vCPU/2.398 GB of RAM VMs, though. It’s like having 1.75 kids — it just doesn’t work that way. I could try to tell them that we have 108 GB of RAM available, but that isn’t what they want, either. They want a concrete number they can multiply by our …

WebEx & Aero

WebEx and Microsoft Windows 7 don’t seem to get along 100% quite yet. If you are using WebEx on Windows 7 it’ll disable Aero during your session. However, if your session is over and you don’t get Aero back, here’s how to fix it without rebooting:

1. Make sure you’ve closed/exited all WebEx components.
2. Right-click on Windows Menu -> All Programs -> Accessories -> Command Prompt and choose “Run as administrator.” You will need to accept a User Account Control warning about this.
3. Issue the commands:

    net stop uxsms
    net start uxsms

That should fix it. Alternatively (and potentially easier), you could restart the “Desktop Window Manager Session Manager” service via the Services administrative tool.

Playing Mastermind With My RAM

I have a Dell PowerEdge R610 in one of my VMware vSphere clusters that has been reporting memory errors. In fact, the machine wouldn’t boot, and the front panel suggested I reseat all the RAM. Okay…

0. Reseat all the RAM. Didn’t work, as expected.
1. Pull all twelve DIMMs out, put four back in. That worked, machine comes up.
2. Put four more DIMMs back. That worked, machine comes up.
3. Put last four DIMMs in. Machine doesn’t boot, same original error.
4. Pull last set of DIMMs out. Boot machine. Notice that BIOS is really old. Upgrade BIOS, thinking this is some stupid BIOS bug. Machine continues to boot.
5. Put last four DIMMs back in. New BIOS …

Heisenberg & Monitoring

From Wikipedia:

    In quantum mechanics, the Heisenberg uncertainty principle states that certain pairs of physical properties, like position and momentum, cannot both be known to arbitrary precision. That is, the more precisely one property is known, the less precisely the other can be known… The measurement of position necessarily disturbs a particle’s momentum, and vice versa.

Stated a little more simply, the sheer act of measuring a particle disturbs it, such that you can only get approximate measurements. This is also true of computing systems and monitoring. The act of watching a system consumes resources on that system, which in turn skews the numbers you get from the monitoring system. The more data you collect, the more intensive the data …
