Category: Featured

If There’s One Feature I Want… »

If there’s one feature I want to see added to VMware Virtual Infrastructure it’s the ability to update hardware firmware.

“Hi, I’m VirtualCenter. I noticed you have a Dell PowerEdge 2950 with BIOS 2.3.1. I have a copy of BIOS 2.4.3, let me put that on there for you. Your fibre channel HBA has firmware from the stone age? No big deal, maintenance mode, update, reboot, awesome. BTW, I also set the queue depth on the HBA to the optimal values.”

Perhaps you can speculate what I’ve spent the last few hours doing… several hours of my life I’m never getting back. I can only imagine that VMware has thought of this already, but I wish they’d hurry up. :-)

Why Does rnd() Keep Changing? »

My friend Tom found this, and I thought it was worth re-sharing:

I can think of several ways of making things like /dev/random stop changing, mainly based on what my customers have done to machines.

Intel 7400 Memory Population »

Intel’s The Server Room blog has an interesting tidbit of information for those of us thinking about servers with the Intel 7400 series of CPUs in them:

As mentioned before, an MP Xeon 7400 series server will provide four channels of FBD memory. There are a couple of considerations here. First, latency to memory increases for every DIMM added to the system. This is important to note because you can keep the memory latency to a minimum by adding fewer high capacity DIMMs. Second, be sure to evenly distribute the DIMMs across all the channels. In other words, don’t fill up all the slots on one channel and then lightly populate the rest.

Some systems get faster when you have more DIMMs, some get slower, and it’s information like this that can sometimes push us from sixteen 2 GB DIMMs to eight 4 GB DIMMs. On the other hand, there’s always the price. A DIMM that has double the capacity often costs more than twice as much. Then again, if you only use half as many DIMMs it means you get expandability later, though now obviously at the cost of some performance. It’s always about the tradeoffs, isn’t it?
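To make the "evenly distribute the DIMMs" advice concrete, here is a minimal Python sketch that spreads a DIMM count across the four channels mentioned in the quote and compares the two configurations above. The function names are my own invention for illustration, and it only counts DIMMs per channel; it does not model latency or price.

```python
def spread_dimms(dimm_count, channels=4):
    """Distribute DIMMs across memory channels as evenly as possible."""
    base, extra = divmod(dimm_count, channels)
    # The first `extra` channels each get one additional DIMM.
    return [base + (1 if i < extra else 0) for i in range(channels)]


def describe(dimm_count, dimm_gb, channels=4):
    """Summarize a configuration: total capacity and per-channel layout."""
    layout = spread_dimms(dimm_count, channels)
    return {
        "total_gb": dimm_count * dimm_gb,
        "per_channel": layout,
        "balanced": max(layout) == min(layout),
    }


if __name__ == "__main__":
    # Sixteen 2 GB DIMMs versus eight 4 GB DIMMs: same capacity,
    # but half as many DIMMs on each channel in the second case.
    for count, size in [(16, 2), (8, 4)]:
        print(count, "x", size, "GB:", describe(count, size))
```

Both layouts come out at 32 GB and balanced, but the second one leaves half the slots on each channel free, which is the expandability mentioned above.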

No UW Band? What?!? »

It’s going to be a weird Wisconsin/Ohio State football game without the University of Wisconsin - Madison band, seeing as how they all got suspended:

Leckrone suspended the band Friday night from performing in today’s home football game against Ohio State because he said the allegations were serious enough that they required immediate action… There will be no replacement act at tonight’s Badgers football game to fill the absence of the UW Marching Band.

Of course, it’s the one football game a year I actually make it to. I’m not going to make any in-depth comments, except to say that the band has been rowdy and fun for decades, and if people don’t like it, well, all the members are adults (well, they’re over 18) and can make their own decisions. Welcome to life. Personally, I always enjoyed the band parties I went to.

Perhaps I can suggest some replacement halftime activities:

  • interpretive dance by randomly-selected fans (or all the seniors taking the band’s tickets tonight).
  • contests like those that happen between periods at a hockey game, like “who can score the highest on the breathalyzer and still make a field goal.”
  • a high school football game.
  • a deathmatch between mascots, or at least a swordfight or jousting with the team flags.
  • get Donald Lipski, artist who created the sculpture “Nail’s Tales,” on the field to talk about what the hell he was thinking.

Just a thought. We’ll see what they actually do.

The Beauty of Logs »

I’m not sure how many times I’ve been asked by coworkers, friends, and random people if I know how to fix a problem. The conversation always goes something like:

“Hi Bob. I am getting error XYZ when I try to use scp with public keys to copy a VMDK file from one ESX host to another. Can you tell me what I’m doing wrong?”

“Hi Joe. It could be one of thousands of things. You might try looking at /var/log/messages or /var/log/secure to see what SSH thinks the problem is.”

“Bob, thanks! It was a permission problem for my authorized_keys file.” Neato.

The nice thing about logs is that they often give you information that helps you solve a problem[0]. Like today, I’m trying to use VMware’s Update Manager to patch an ESX host but it keeps reporting that “VMware Update Manager had a failure.” Digging in a little, it turns out that it’s complaining about patch metadata being missing, which doesn’t make any sense because all of my other hosts work just fine. It’s just this one customer’s ESX hosts that are being difficult.

So I “tail -f /var/log/vmware/vpx/vpxa.log” to watch what happens when I tell it to scan for updates. Sure enough, I see the error, and it becomes painfully obvious what the problem is:

[2008-10-03 11:26:06.714 'App' 106433456 info] [VpxLRO] -- ERROR task-47870 --  -- vim.host.PatchManager.Scan: vim.fault.PatchMetadataNotFound:
(vim.fault.PatchMetadataNotFound) {
   dynamicType = <unset>,
   patchID = "Unknown",
   metaData = (string) [
      "http://vcserverhostname:80/vci/hostupdates/hostupdate/esx/esx-3.5.0/contents.xml.sig"
   ],
   msg = "Metadata for patch missing."
}

Sure, the metadata can’t be retrieved because, to this host, ‘vcserverhostname’ isn’t resolvable[1].

About 15 seconds later I’d updated the DNS configuration to include my Update Manager server’s domain (as part of the “Look for hosts in the following domains”). Problem solved.

Thanks /var/log/vmware/vpx/vpxa.log!
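If you wanted to automate that check, something like the following Python sketch would do: pull any quoted URLs out of a chunk of log text, extract the hostnames, and see whether they resolve. The function names are hypothetical, the sample text is just the excerpt above, and I am assuming Python is available wherever you run it, which may not be the ESX host itself.

```python
import re
import socket
from urllib.parse import urlparse

# The fault block from the log excerpt above, used as sample input.
SAMPLE = """(vim.fault.PatchMetadataNotFound) {
   dynamicType = <unset>,
   patchID = "Unknown",
   metaData = (string) [
      "http://vcserverhostname:80/vci/hostupdates/hostupdate/esx/esx-3.5.0/contents.xml.sig"
   ],
   msg = "Metadata for patch missing."
}
"""


def metadata_hosts(log_text):
    """Return the sorted, unique hostnames of quoted http(s) URLs in the text."""
    hosts = set()
    for url in re.findall(r'"(https?://[^"]+)"', log_text):
        host = urlparse(url).hostname
        if host:
            hosts.add(host)
    return sorted(hosts)


def resolves(hostname):
    """Return True if the local resolver can turn hostname into an address."""
    try:
        socket.gethostbyname(hostname)
        return True
    except OSError:  # covers socket.gaierror and socket.herror
        return False


if __name__ == "__main__":
    for host in metadata_hosts(SAMPLE):
        status = "resolves" if resolves(host) else "does NOT resolve"
        print(host, status)
```

It is no smarter than reading the log yourself, but it turns "is that hostname resolvable from here?" into a one-liner you can run against every host.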

————

[0] This seems obvious, but given how many times I’ve had that same conversation, it doesn’t seem like a place people usually remember to look.

[1] Which is also easily checked, at least from ESX’s CLI: “host vcserverhostname”. If it tells you “Host vcserverhostname not found: 3(NXDOMAIN)” you know what the problem is.

Solutions to Match Your Problems »

One of the big things I like about virtualization is that you can find or build solutions that match the size of the problem you have. Need live workload migration? Buy VMotion. Need dynamic load balancing? Buy DRS. But if you only need to move your workloads around once in a while maybe you can get by with something like Mike DiPetrillo’s quick migration script. Cheap, easy, right-sized, and it has a well-known path for growth when you decide you really do need VMotion or DRS.

Which, by the way, is why I’ve been telling folks to skip VMware Server and go straight to the free ESXi. That way, when they decide that virtualization is cool and want more of it, all they need is some license keys, rather than a big conversion process.

Failure Modes I Haven’t Seen Before »

It’s a rare day when I get to see operating systems fail in ways I’ve never seen before.

I’ve been having the strangest problems with a virtual machine I’m trying to deploy. It boots but won’t come up properly on the network. Services will start but complain about the network, or just be unresponsive. I can’t ping it, either. I’ve deployed several other virtual machines today from this same image, so it isn’t the image. Regardless, I redeployed it. Still messed up. I double-checked the network settings, /etc/hosts, /etc/resolv.conf, gateway devices, netstat, route, everything. Nothing was wrong. I changed the IP address to something else, and it worked great. I checked with my NOC to see if the IP I’d been using was firewalled, blackholed, or otherwise administratively unusable. Nope. I switched back, and it went back to failing. OMFGWTFBBQIAMSOFRUSTRATEDWTF.

Turns out my hostmaster had set the A record to 192.168.77.74, rather than 192.168.74.74. Not surprisingly, a lot of stuff seems to care about that. The IP looked right at a glance, though, so I didn’t notice for a few hours. A few hours of my life I’ll never get back, that is.
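The check that would have saved me those hours is comparing what DNS says against what the machine is actually configured with, character by character, rather than eyeballing it. A minimal Python sketch, assuming a hypothetical hostname and using the two addresses above; the resolver is passed in as a parameter so the comparison can be tried without touching real DNS:

```python
import socket


def a_record_matches(hostname, configured_ip, resolver=socket.gethostbyname):
    """Compare the forward DNS lookup for hostname with the configured IP.

    Returns (matches, resolved_ip); resolved_ip is None if the lookup fails.
    """
    try:
        resolved = resolver(hostname)
    except OSError:  # lookup failed entirely
        return False, None
    return resolved == configured_ip, resolved


if __name__ == "__main__":
    # "vm.example.com" is a placeholder; substitute the real hostname and
    # the address the machine is configured with.
    ok, seen = a_record_matches("vm.example.com", "192.168.74.74")
    print("match" if ok else "MISMATCH", "- DNS says:", seen)
```

A string comparison does not get fooled by 77 looking like 74.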

Continuous OS Releases »

Gentoo Linux Cancels Distribution:

Instead, Gentoo developers said they are pushing a new model for their distribution — one that eschews the conventional release wisdom used by Red Hat, Novell, Debian and others. Instead of fixed releases, Gentoo is promoting its vision of a live, continuously updating distribution. In practice, that effort revolves around its weekly minimal images, which are then supplemented with customized installed packages.

Continuous OS releases are an interesting idea. One of the annoying aspects of OSes is that every few years you have to go through a big upgrade cycle, as a vendor stops support for version X and forces you to version X+2. For my organization these upgrades haven’t been a problem because you can do the OS upgrades with the normal hardware replacement cycle, every three years or so as leases run out, etc. Now that virtualization is taking over we won’t have the same chance to replace the OS, though. Being able to upgrade the OS more easily and often sounds like a great idea.

The problems with continuous OS releases, though, are numerous. First, application developers are going to hate this. They don’t want the underlying OS to change, ever, and to have it changing constantly means a lot of trouble for them. Time spent testing against new OS releases is time not spent making their software better. That goes double for vendors who have to support things running on these OSes. How do they test & certify their software, hardware, and/or procedures against a constantly changing OS? It would be hard, especially since open source projects already have a terrible time with compatibility issues & QA. Last, all these issues apply to system administrators, too. Instead of having two or three operating environments to track you have the possibility of an infinite number of them. Admins will need to superimpose the old system of releases on top of these continuous release models in order to get any testing done, just like we do now with patch management.

Instead of continuous releases, perhaps a better solution is to make the upgrade process between releases much easier, cleaner, and more seamless. It would also help if some vendors did more frequent releases (five years between Windows releases is a long time, for example). Red Hat releases new Enterprise Linux versions every two years or so, and supports them for seven years. Two years is a nice interval, and offers a controlled, regular opportunity to add new technology. If they’d make the upgrade from RHEL 5 to 6 (or 7, or 8) seamless we would get all the benefits of continuous OS releases without all the support problems.

That’s the sort of upgrade feature I’m hoping for, something as easy as a “svn sw” command for my OS.