Archive for July, 2007

links for 2007-07-30 »

links for 2007-07-28 »

/bin/sh on iPhone »

The Unofficial Apple Weblog posts a story about the iPhone running /bin/sh when it crashes. Of course, there isn’t a keyboard so you end up doing a restore.

Since the iPhone didn’t ship with /bin/sh anyhow, couldn’t you put a script in its place to reboot your phone by calling init or shutdown? Or put something in your .bashrc to sleep for five minutes and then reboot?

Just a thought.

Happy System Administrator Day! »

To all those system administrators who have come before me, who have shared their wisdom with me personally or through books, articles, blogs, forum and list postings, I say thank you. I stand on the shoulders of giants every day I work in this field.

To all those system administrators working to advance this profession, in LOPSA and other organizations, I say thank you. It is because of you that we even have this day.

To all those system administrators out there, who toil every day in relative anonymity ensuring the services we rely on stay operational, I say thank you. It is you who make things work, keeping the users, developers, and managers happy day after day.

Happy System Administrator Appreciation Day!

links for 2007-07-27 »

Why VMmark Sucks »

Sure, sure, having a standard benchmark to measure virtual machine performance is useful. Customers will swoon over hardware vendors’ published results. Virtualization companies will complain that the benchmark is unfair. Then they’ll all go silent, start rigging the tests, scrape and cheat and skew the numbers so that their machines look the greatest, their hypervisor is the fastest. Along the way it’ll stop being about sheer performance and become performance per dollar. Then CapEx vs. OpEx. Watts per tile. Heat per VM. Who knows, except everybody will be the best at something, according to their own marketing department.

Welcome to benchmarking.

It doesn’t make a damn bit of difference to me, though. I’ll never run VMmark. I’ll never pony up $1200 for a copy of SPECweb2005 or $500 for SPECjbb2005. I’ll never buy four copies of Windows Server 2003 per tile, $999 each, or buy a client PC for every tile. This list alone is thousands of dollars, which I will never pay. Never mind the time it would take to set this whole mess up. God help the vendor who wants to prove they have the fastest machine for virtualization.
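To put a rough number on "thousands of dollars", here is a back-of-the-envelope sketch in Python using only the prices quoted above. It assumes the SPEC licenses are bought once while the Windows copies scale with the tile count, which is my reading and not something the VMmark documentation is quoted as saying. The client PC price is not given, so `client_pc_price` is a placeholder parameter.

```python
# Prices in US dollars, as quoted in the post.
SPECWEB2005 = 1200
SPECJBB2005 = 500
WINDOWS_SERVER_2003 = 999
WINDOWS_COPIES_PER_TILE = 4


def vmmark_cost(tiles, client_pc_price=0):
    """Rough cost of the software (and optional client PCs) for `tiles` tiles.

    Assumption: the two SPEC licenses are a one-time purchase, while the
    Windows licenses and the client PC are needed once per tile.
    """
    if tiles < 1:
        raise ValueError("need at least one tile")
    one_time = SPECWEB2005 + SPECJBB2005
    per_tile = WINDOWS_COPIES_PER_TILE * WINDOWS_SERVER_2003 + client_pc_price
    return one_time + tiles * per_tile


if __name__ == "__main__":
    for tiles in (1, 5, 10):
        print(f"{tiles:>2} tile(s): ${vmmark_cost(tiles):,}")
```

Even a single tile with no client PC counted comes to $5,696 under these assumptions, before anyone has spent an hour setting it up.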

It’d be nice if VMware, when they said that “organizations can compare performance and scalability of different virtualization platforms, make appropriate hardware choices and monitor virtual machine performance on an ongoing basis” meant an organization smaller than Fortune 10. It’d be nice to have a benchmark an average guy like me could run. Something I could use to test my own performance tuning, running it before and after each tweak to see if my changes make a difference. Something I could use as a diagnostic, or as a test procedure for a change. Did that last patch mess anything up? Why is my environment really slow? Why am I not getting the advertised performance from these new machines? Something like VMmark Lite, free & open source, might be pretty darn cool.
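The before-and-after idea does not need much machinery. Below is a minimal Python sketch of it, not a benchmark in any real sense: it times a workload a few times, takes the median, and reports the percentage change. The `sample_workload` function is a made-up stand-in; you would swap in whatever actually exercises your environment.

```python
import statistics
import time


def time_workload(workload, runs=5):
    """Return the median wall-clock seconds over `runs` calls of `workload`."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        workload()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def percent_change(before, after):
    """Positive means slower after the change, negative means faster."""
    return (after - before) / before * 100.0


def sample_workload():
    # Hypothetical stand-in: replace with something that exercises your VMs.
    sum(i * i for i in range(100_000))


if __name__ == "__main__":
    before = time_workload(sample_workload)
    # ... apply the tweak or patch here, then measure again ...
    after = time_workload(sample_workload)
    print(f"before={before:.4f}s after={after:.4f}s "
          f"change={percent_change(before, after):+.1f}%")
```

The median is used instead of the mean so that one noisy run does not swamp the comparison. It says nothing about how your numbers compare to anyone else's, which is exactly the gap a shared, free benchmark would fill.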

Instead, I’ll be in meetings explaining to folks why we are maxed out at 30 VMs per server when the vendor says they’ll run 50. Or why we chose VMware over Xen, when Xen claims 100 on the same hardware. I’ll have to remember the line from the FAQ that says that “VMmark is neither a capacity planning tool nor a sizing tool.”

Which raises the question: if it isn’t for use in sizing or capacity planning, exactly what is it good for?

links for 2007-07-26 »

AMD & Linux Data Corruption »

Mad props to Don MacAskill for getting the word out that AMD-based machines with more than 4 GB of RAM running Linux may be subject to a silent data corruption problem, mainly on machines with NVidia chipsets. It is fixed in 2.6.21, but not yet in a shipping Red Hat kernel. The workaround, if you find yourself in this position, is to tell the kernel to ignore the hardware IOMMU with the kernel option “iommu=soft”, or to build yourself a kernel that doesn’t have the problem.
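If you want a quick first pass over a pile of machines, the conditions above can be turned into a rough check. This Python sketch is only a heuristic under the assumptions stated in the post: kernel older than 2.6.21, more than 4 GB of RAM, and no “iommu=soft” on the kernel command line. It does not look at the CPU vendor or chipset, and a vendor kernel with a backported fix will still show up as suspect, so treat a True result as "go look", not as a diagnosis. The function names are my own.

```python
import re

FIXED_IN = (2, 6, 21)          # upstream version with the fix, per the post
FOUR_GB_KB = 4 * 1024 * 1024   # 4 GB expressed in kB, as /proc/meminfo reports


def parse_kernel_version(release):
    """Turn a release string such as '2.6.18-8.el5' into (2, 6, 18)."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", release)
    if not match:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return tuple(int(part) for part in match.groups())


def has_soft_iommu(cmdline):
    """True if the exact token 'iommu=soft' is on the kernel command line."""
    return "iommu=soft" in cmdline.split()


def may_be_affected(release, mem_total_kb, cmdline):
    """Heuristic only: old kernel, more than 4 GB RAM, workaround not set."""
    return (parse_kernel_version(release) < FIXED_IN
            and mem_total_kb > FOUR_GB_KB
            and not has_soft_iommu(cmdline))


if __name__ == "__main__":
    import platform

    with open("/proc/cmdline") as f:
        cmdline = f.read()
    with open("/proc/meminfo") as f:
        mem_kb = next(int(line.split()[1])
                      for line in f if line.startswith("MemTotal:"))
    print(may_be_affected(platform.release(), mem_kb, cmdline))
```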

This points to a bigger problem with things like Red Hat’s Kernel Application Binary Interface compatibility guarantee: agility. kABI compatibility sounds great to developers, but it significantly increases the response time to problems like this. For Red Hat Enterprise Linux, Red Hat states that it won’t make changes unless they address a demonstrated issue encountered by customers, preserve ABI/API compatibility, and are minor feature enhancements. That second point is the killer. They can’t just go wildly patching the kernel, because every patch needs to be examined and carefully merged to guarantee no ABI/API changes. This gets quite tricky when the patches need to be backported, such as when the original LKML patch is against 2.6.21. Red Hat Enterprise Linux 5 is at 2.6.18, and even worse, RHEL 4 is at 2.6.9. Every new kernel release makes the differences more profound and harder to cope with.

So a customer needs to report a bug. Red Hat will then find a patch, backport it into their kernels, and send it to QA. This whole process can take months of work by people with intricate knowledge of the kernel. Tough job, for sure, especially when you have customers suffering and probably complaining, while developers and vendors will complain if you change anything. Rock, meet hard place.

At least this bug has a workaround. :-)