Failure Modes I Haven't Seen Before

It’s a rare day when I get to see operating systems fail in ways I’ve never seen before.

I’ve been having the strangest problems with a virtual machine I’m trying to deploy. It boots but won’t come up properly on the network. Services will start but complain about the network, or just be unresponsive. I can’t ping it, either. I’ve deployed several other virtual machines today from this same image, so it isn’t the image. Regardless, I redeployed it. Still messed up. I double-checked the network settings, /etc/hosts, /etc/resolv.conf, gateway devices, netstat, route, everything. Nothing was wrong. I changed the IP address to something else, and it worked great. I checked with my NOC to see if the IP I’d been using was firewalled, blackholed, or otherwise administratively unusable. Nope. I switched back, and it went back to failing. OMFGWTFBBQIAMSOFRUSTRATEDWTF.

Turns out my hostmaster had set the A record to 192.168.77.74, rather than 192.168.74.74. Not surprisingly, a lot of stuff seems to care when a host’s name resolves to an address other than the one it’s actually using. At a glance the IP looked right, though, so I didn’t notice the mismatch for a few hours. A few hours of my life I’ll never get back, that is.
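
A check along these lines would flag this kind of mismatch before the eyeballing starts. This is only a minimal sketch, not something from the original debugging session: the function name `a_record_mismatch` and the hostname `vm.example.test` are made up for illustration. It also assumes the system resolver is a fair stand-in for DNS, which is not always true, since `socket.gethostbyname_ex` will happily return an /etc/hosts entry instead of the A record.

```python
import socket


def a_record_mismatch(hostname, expected_ip, resolve=None):
    """Return the resolved IPv4 addresses if none equal expected_ip.

    An empty list means the name resolves to the expected address.
    `resolve` is a hypothetical hook so the lookup can be swapped out;
    by default the system resolver is used (which may consult /etc/hosts
    before DNS, depending on the machine's configuration).
    """
    if resolve is None:
        # gethostbyname_ex returns (canonical_name, aliases, ipv4_addresses)
        resolve = lambda name: socket.gethostbyname_ex(name)[2]
    addresses = resolve(hostname)
    return [] if expected_ip in addresses else addresses


if __name__ == "__main__":
    # Hypothetical hostname; the address is the one the VM was configured with.
    wrong = a_record_mismatch("vm.example.test", "192.168.74.74")
    if wrong:
        print("name resolves to", wrong, "but the host is configured for 192.168.74.74")
```

Printing the two addresses side by side is the whole point: 192.168.77.74 and 192.168.74.74 are easy to confuse when read separately, and much harder to confuse when a script says they differ.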

Comments on this entry are closed.

  • We hostmasters like to do that once in a while.

    Just because.

  • Well, I can’t say that I haven’t done stupid things to people’s apps before… I’m not really blaming anybody, just wish the error had been more obvious. :-)
