Archive for September, 2007

iPhone Bricks, Macworld, and Warnings »

So Jonathan Seff over at iPhone Central bricked his unlocked iPhone by updating it. As Jason Snell said in the comments, “We’re Macworld. This is what we do. We’ll buy Jon a new phone if need be — he bricked his phone in the interest of finding out what would happen.” Awesome. People should be thanking Macworld. A lot.

My thoughts:

1) iPhone owners agreed to the rules before getting into this. The rules included using AT&T, among others.

2) Apple has been silent about all the hacking going on. But when it came to the update they decided to speak out about it. Why? Does this sound like the behavior of a company that wants to crap on its customers? To me it sounds like they were warning people before this sort of thing happened. They bothered to look at what the unlockers were doing, noticed the unlock changes something in an area that couldn’t be restored (for whatever reasons, legal or technical), and issued a warning to not update. People should have listened.

3) Folks should chill out and wait for the Unlock Dev Team to figure out what people can do to avoid this. They figured out how to unlock it, which is no easy feat. This is just another roadblock, and they’ll figure it out.

So there. I still love my iPhone, despite having to be with AT&T, despite EDGE being a little slow by broadband standards, and despite not having an open API for developers. I like Apple for having warned people about all this. I just wish that people would stop blaming Apple for the consequences of their own actions.

Reliability Isn’t As Straightforward As It Seems »

The concept of reliability isn’t nearly as straightforward as it seems. It also depends heavily on what you are protecting yourself against.

A good example of this is hard disks. You can protect yourself against a single drive failure by adding another disk and mirroring them. However, in doing this you add a controller that is now a point of failure. You also add another disk that may fail in a way that causes disruptions to both disks (freak the controller out, freak the SCSI bus out, etc.).

Is it worth the additional risk? Sure, as long as the controller is way more reliable than the drives.
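Here is a rough sketch of that trade-off. It assumes the two drives fail independently and that the controller sits in series with the mirrored pair. The availability figures are made-up illustration values, not measurements, and the function name is hypothetical. The independence assumption is optimistic: it ignores the case where one failing disk takes the bus or controller down with it.

```python
def mirrored_availability(drive: float, controller: float) -> float:
    """Availability of two mirrored drives behind a single controller.

    Assumes independent failures: the pair is down only when both
    drives are down, and the whole thing is down whenever the
    controller is down.
    """
    pair = 1 - (1 - drive) ** 2   # at least one of the two drives is up
    return controller * pair      # controller is a single point of failure


# Hypothetical numbers, chosen only to show both outcomes.
single_drive = 0.99
good = mirrored_availability(single_drive, controller=0.999)
bad = mirrored_availability(single_drive, controller=0.98)

print(f"single drive:                 {single_drive:.4f}")
print(f"mirror + reliable controller: {good:.4f}")
print(f"mirror + flaky controller:    {bad:.4f}")
```

With these inputs the mirror only beats a single drive when the controller is the more reliable part. Under this model the break-even point is a controller availability of `drive / (1 - (1 - drive) ** 2)`.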

On the other hand, sometimes the best way to make a service reliable is to keep it as simple as possible.

An example of this is a service that wanted off-site replication of their data. They used storage array-based replication software to mirror the data to another array. They bought expensive equipment to extend the SAN to the remote location. They were a medium-sized environment and didn’t have the people or the money to dedicate to being experts in these technologies. The environment went from being a simple collection of servers to being a complex, poorly understood collection of servers, storage, and networking equipment. As a result they ended up having a lot of downtime, which looks bad when you bought all this gear to increase the availability of a service.

People say that availability depends heavily on how much money you invest. In general, I disagree. Without a clear idea of what you are protecting against and without good training and system design to support the implementation, adding components to a simple system design usually serves to make it less reliable.

planetsysadmin.com »

Are you a sysadmin? You might want to go check out some of the other bloggers out there. A great place to start is over at planetsysadmin.com. That site aggregates a number of blogs, mainly sysadmin stuff, and their blogroll is a bunch of great folks with a lot of great content. I was picked up by them quite a while ago, but I have no idea if I’m still getting rebroadcast there (I understand, in the past I’ve had a lot of random content). Regardless, good stuff, and worth a read. Not saying I agree all the time, but a conversation is boring if you’re always in agreement. :-)

This post was originally going to be a “hey, read all these guys” sort of entry, but their blogroll has a number of the same things I’d point out, so why duplicate?

(Warren’s post on GNU su & the ‘wheel’ group is great. Security? No! All information should be free. Ugh, indeed.)

RAID 5 Is A Cruel Mistress »

I’ve long been a fan of RAID 5. Since you only lose one disk’s worth of space to parity it has been the best way to maximize local disk space. Sure, the performance isn’t the greatest, but I haven’t had applications that taxed the local drives, and the disk space and generally decent performance have been a good trade-off.

In the last six months, though, I’ve had three machines die from a double drive fault. This is the Achilles’ heel of RAID 5: it can tolerate a single drive failure and no more. In two of those cases the array had a hot spare drive, and the second drive faulted while the array was rebuilding onto the spare.

This makes me wonder why I’ve gone ten years without any problems, just to be blindsided now.

One answer comes to mind: increased capacities of disks, leading to long array rebuild times.

Think of what happens when a drive fails in a RAID 5 array, especially on an older array that isn’t very busy. If there is a hot spare the controller starts rebuilding the array, which causes a lot of I/O. If there is a second drive that is questionable in the array this might push it over the edge. Before you are done rebuilding you get a second drive error. Game over.
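To put rough numbers on that exposure window, here is a minimal sketch. It assumes a constant, independent failure rate per drive and a fixed sustained rebuild rate. The function names, the 20 MB/s rebuild rate, and the 5% annual failure rate are all hypothetical illustration values. Since the rebuild I/O itself stresses marginal drives, a model like this almost certainly understates the real risk.

```python
import math

HOURS_PER_YEAR = 24 * 365


def rebuild_hours(capacity_gb: float, rebuild_mb_per_s: float) -> float:
    """Time to rewrite one full drive at a sustained rebuild rate."""
    return capacity_gb * 1000 / rebuild_mb_per_s / 3600


def second_failure_probability(surviving_drives: int,
                               annual_failure_rate: float,
                               hours: float) -> float:
    """Chance that at least one surviving drive fails within `hours`.

    Assumes each drive fails independently at a constant rate derived
    from its annual failure rate (an exponential lifetime model).
    """
    rate_per_hour = -math.log(1 - annual_failure_rate) / HOURS_PER_YEAR
    return 1 - math.exp(-surviving_drives * rate_per_hour * hours)


# Hypothetical 6-drive RAID 5 set: 5 drives survive the first failure.
for capacity_gb in (73, 300, 1000):
    window = rebuild_hours(capacity_gb, rebuild_mb_per_s=20)
    risk = second_failure_probability(5, 0.05, window)
    print(f"{capacity_gb:>5} GB: rebuild {window:5.1f} h, "
          f"second-failure risk {risk:.5%}")
```

The absolute percentages depend entirely on the made-up inputs. The point is the shape: under this model the exposure grows roughly in proportion to drive capacity.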

So what do I do about this? Change RAID levels? RAID 0 is out. RAID 1 can’t handle a double drive failure. RAID 1+0 (10) might be able to handle a double drive failure if the two that fail are in the right places. Stick to smaller drives? With less capacity the rebuilds happen faster, helping to minimize the exposure. Use faster drives? Maybe switching from 10,000 RPM disks to 15,000 RPM disks would help. They’re faster, so that would also help minimize the exposure to a double disk fault. However, 15K RPM disks seem to be more sensitive to cooling issues, making them less reliable and more prone to a fault if the environment isn’t perfect.
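The “right places” caveat for RAID 1+0 can be counted directly. This sketch assumes the usual layout of two-drive mirrors striped together, with drives 0 and 1 forming the first mirror, 2 and 3 the second, and so on. It treats every pair of failed drives as equally likely, which is an assumption, and the function name is hypothetical.

```python
from itertools import combinations


def raid10_double_fault_survival(mirror_pairs: int) -> float:
    """Fraction of two-drive failures a RAID 1+0 set survives.

    The array is lost only when both failed drives belong to the same
    mirror. Drives 2k and 2k+1 are assumed to form mirror k.
    """
    drives = range(2 * mirror_pairs)
    total = 0
    survived = 0
    for a, b in combinations(drives, 2):
        total += 1
        if a // 2 != b // 2:  # failures landed in different mirrors
            survived += 1
    return survived / total


for pairs in (2, 3, 4):
    fraction = raid10_double_fault_survival(pairs)
    print(f"{2 * pairs} drives: survives {fraction:.0%} of double faults")
```

The count works out to `1 - 1 / (2 * mirror_pairs - 1)`, so a four-drive set survives two of every three double faults and wider sets do better. RAID 5 survives none of them.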

Maybe I can make disks irrelevant… no. I can’t. I can push my applications towards enterprise storage arrays, but this is a big issue there, too. Similarly, the movement to embed hypervisors in hardware just moves the issues to the central disk arrays. I don’t want to shuffle the problem around, I want to solve it. The closest I can get is keeping backup copies of my data in as many places as possible, sharing as few things as possible: copies in separate data centers, on separate servers and disk arrays, and preferably even on separate types of media, like tape.

All of that is expensive, though. Money is the ultimate trade-off with this sort of discussion.

For now I think RAID 1+0 plus a spare, on 73 GB 15K RPM disks, might be my new direction.

iPhone & Airport Express? »

Wouldn’t it be cool if the iPhone could stream to Airport Express base stations? Like the one in my living room, attached to my stereo?

Just sayin’, that’s all.

VM Escape & VMware Critical vmkernel Updates »

The 9/21/2007 SANS NewsBites newsletter has some good commentary on the VMware updates that have shipped in the last two months. In short, if you are running any VMware product you need to be at the latest version in order to be secure against potential VM escapes.

Normally virtual machines are encapsulated, isolated environments. The operating systems running inside the virtual machine shouldn’t know that they are virtualized, and there should be no way to break out of the virtual machine and alter the parent hypervisor. The process of breaking out and interacting with the hypervisor is called a “VM escape” and it is bad news. If an attacker can gain access to the hypervisor they effectively have unlimited control over every other virtual machine running on the host. Not good.

The SANS editors make the point that VMware needs to be more forthcoming about problems with their hypervisors. I agree. VMware has published a number of “critical” patches for VMkernel, but nowhere do they mention any security issues. Compare the disclosure with the description of problems fixed in ESX-8258730. Do you see any mention of a security problem in 3.0.1? No. Furthermore, there is nothing indicating the urgency of upgrading to 3.0.2. There was nearly a month between the release of 3.0.2 and the disclosure of the problem, meaning attackers had at least a month to exploit this in the wild. Despite firewalls and other security precautions, my environments are a subset of “the wild.”

Certainly all software has bugs, and some of those are bound to be security problems. Fixing those bugs takes time, and with multiple shipping products it will take some time to fix everything. However, downplaying the issue by not mentioning it at all is a credibility problem for VMware. Somebody knows about the security holes, right? And if someone knows about them, someone is exploiting them. Virtualization security is a hot topic right now, and appearing to hide security problems doesn’t seem like the best course of action to me.

What has this all taught me? Mainly that I cannot trust VMware to tell me when I’m vulnerable. This means that any time I see a new version of ESX I have to assume it fixes security vulnerabilities, and get it deployed as soon as I can. It also means that whenever I see an update to the vmkernel marked “critical” I have to apply that, too, despite what it might list as the resolved problems.

Patching more frequently will add to the labor needed to maintain my virtual environments. It also means I will have to work harder to justify the extra labor to my customers and employers. Unplanned labor to remediate a critical security problem is easy to justify. Unplanned labor to remediate what looks like a SCSI driver problem seems like a waste. But now it has to be done, every time a point release comes out, and every time a vmkernel patch is issued. What they aren’t saying is what I’m worried about: that they’ve fixed a big security problem, and mum’s the word.

What is VM Escape? »

What is VM escape?

Normally virtual machines are encapsulated, isolated environments. The operating systems running inside the virtual machine shouldn’t know that they are virtualized, and there should be no way to break out of the virtual machine and interact with the parent hypervisor. The process of breaking out and interacting with the hypervisor is called a “VM escape.” Since the hypervisor controls the execution of all of the virtual machines an attacker that can gain access to the hypervisor can then gain control over every other virtual machine running on the host. Because the hypervisor is between the physical hardware and the guest operating system an attacker will then be able to circumvent security controls in place on the virtual machine.

Posting To rottenneighbor.com Considered Harmful »

I just read Lifehacker’s coverage of rottenneighbor.com. A few things stand out to me:

  1. Anybody can post anything to that site, so it isn’t clear where the problem lies. Is the problem with the neighbor or with the person doing the complaining? Both sides of the story are not represented.
  2. Neighbors you don’t get along with might be just fine for someone else.
  3. There doesn’t appear to be any way to retract, dispute, or remove data from the system. So if your neighbor moves, the record will stick around. Just think of how this works for apartments, too, especially in high-turnover areas like around colleges and universities (say, when immature, drunken, partying college students complain that their middle-class, middle-aged family neighbors are prudes).
  4. Posting anything to the site devalues your home. Why would someone want to buy your house when they’d have to live next to rotten neighbors? The only way this would work is if you’re the person being complained about, and people aren’t attentive enough to notice that.

My suggestion: completely ignore that site.
