Archive for August, 2007

Good Old FORTRAN 77 »

I walk up to a conversation a few of my coworkers are having. It’s a typical old guy to young guy “you don’t know how good you have it” speech.

“…well, complain all you want, you should have been around to endure the misery of punch cards,” the older admin comments to two students we just hired.

“Yeah, I don’t think those would have been much fun,” comments one of the young guys.

“If you want a taste of the misery you could always program in FORTRAN 77,” I add.

“Why’s that?”

“FORTRAN 77 was built to read programs from punch cards. The first six columns of each card held control information, like statement labels and the line continuation mark. So when you write your program you need to observe that, even though you are writing in a modern text editor.”

“Ooh, that sucks,” comments the student. “I guess I’m lucky that I am learning Java.” That’s debatable, I don’t add.

“When did you do any programming in FORTRAN?” asks the older admin of me, suspicious.

“My first CS classes, FORTRAN 77 on a VAX running VMS. Missed FORTRAN 90, which fixed all that nonsense.”

“And all that knowledge sits uselessly in your head now. The wonders of technology.”

“Oh, I wouldn’t say that. Last fall I helped a guy get some ancient program of his to compile on a Macintosh, of all things. It was rocky, but I got it working. The guy was ecstatic that I, being under 60 years of age, figured it out.”

“Really?”

“Really.”

links for 2007-08-29 »

Call The First Version “1.0” »

Dear Open Source Software Developers,

Take a cue from the commercial software of the world and call your first released version 1.0.

Subsequent feature releases can be 2.0, 3.0, etc. Bug fixes can be 2.1, 3.0.1, etc. I don’t care how you do that, just so long as the version number is greater than or equal to 1.0.0.

“But Bob, I use versions like 0.3.1 to indicate that the software is not feature complete.”

So don’t release it, then, or call it ALPHA. Most software adds features with new version releases. Why are you different?

“Sure, but I want to indicate it isn’t ready for prime-time.”

Hey, pal, unless you put ALPHA or BETA in the version name us poor saps downloading your software will think it’s release quality. When we’re trying a new piece of software we don’t stop to learn all the arcane version numbering schemes for each project. We download it, and when it doesn’t work we think it sucks and ditch it. Give us a hint it’s alpha or beta quality by including one of those words.

“Why do you care what I number each version as?”

Well, there’s a lot of stuff, from CMDBs to package managers, that doesn’t like versions below 1.0. That causes all sorts of pain for people like me. Eventually you’ll run out of sub-1.0 numbers and switch to 1.0, and then whatever hack I use to cope with your sub-1.0 numbers (usually just multiplying by 10) isn’t going to work anymore.

“Whatever, dude. Deal with it.”

I deal with it every day. :-) My responses range from not using your software to dealing with the numbering scheme on my own. And from the looks of things, I’ll continue to have to deal with it for decades…
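To show why the multiply-by-10 hack falls over, here is a minimal sketch. It assumes the hack means treating a two-part version as a decimal number and multiplying it by 10, which is my reading and not necessarily what any particular tool does. The names parse_version and times_ten_hack are made up for illustration, and the parser only understands dotted numbers with an optional alpha or beta suffix.

```python
import re


def parse_version(text):
    """Return (release_tuple, tag) for strings like '0.3.1' or '1.0-beta'."""
    match = re.match(
        r"^(\d+(?:\.\d+)*)(?:[-_ ]?(alpha|beta))?$",
        text.strip(),
        re.IGNORECASE,
    )
    if match is None:
        raise ValueError(f"unrecognized version: {text!r}")
    # Compare versions as tuples of integers, not as decimal numbers.
    release = tuple(int(part) for part in match.group(1).split("."))
    tag = match.group(2).lower() if match.group(2) else None
    return release, tag


def times_ten_hack(text):
    """The coping hack: read a two-part version as a decimal and multiply by 10."""
    return float(text) * 10
```

With the hack, 0.9 becomes roughly 9, so when the project finally ships 1.0 the new release sorts below the old one. Comparing parse_version("0.9")[0] with parse_version("1.0")[0] as tuples gets the order right, and the tag tells you whether anybody bothered to say ALPHA or BETA.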

links for 2007-08-28 »

Lightning, Cold Water, and Me »

This morning my building was hit by lightning. The power sorta browned out for about 10 seconds, and then everything was fine again.

Except some control system freaked out, and the chillers in the data center stopped receiving cold water.

Oops.

Thus began an hour of frantic shutting down of development, test, staging, and otherwise non-production machines. All in an attempt to keep the room temperature down while the facilities guys fixed the problem. It worked.

A couple of things occurred to me during all of this:

1. We discovered that the Jabber server we use to coordinate outage handling is open to the world. We had customers joining our chat room. Not that we were hiding anything, but it’s just not right. Never had that happen before, so we never thought of it.

2. We also discovered that the Jabber server we use has a fairly low limit on the number of people allowed into a chat room. Combined with #1 this made life a little difficult, since lots of the customers didn’t understand that they were actually impeding our ability to fix things. I’m not even sure we’d have caught this with a practice outage, either.

3. During a crisis we always have a lead tech and a lead manager. The manager is in charge of political operations and decision making. The tech is in charge of coordinating the technical operations. I accidentally invented a whole new role, though, which I dub “scribe” or at the very least, chat room moderator. I stayed out of the data center, at my desk, and kept track of everything going on with the servers so the others could make decisions and deal with one-off issues. When the outage went from “shut everything down” to “turn everything on” I already had a spreadsheet of everything that went down, published to the web. I was using Excel, but I want to check out Google Apps to see if it’d be easier to collaborate on a list.

4. As part of being scribe I was in a position to know what servers were down but not off. Some OS & hardware combinations don’t shut themselves off, and for a chiller problem that doesn’t help. Two of my coworkers stepped in. Once I noted a server was downed remotely by an admin they’d go to manually power it off. Super freaking cool, especially since it didn’t tax the guys making decisions about the outage. The best part about it was it just happened. It’s definitely a symptom of having good people on a team together, that things just happen and get done.

5. Low tech helped a lot. We’ve had individual chiller problems before, and because of those outages we bought some big barn fans and extension cords. Lifesavers, they are. Likewise, the $5 desktop thermometer/hygrometers scattered around the data center were great. Sure, the routers and servers could tell us what they thought the temperatures were, but that’s so much harder than just looking at a gauge.

So that was my day. I didn’t get anything done, really, but it was good. And for the first time ever, I’m looking forward to the post-incident review.

links for 2007-08-26 »

links for 2007-08-25 »

esxcfg-nics & esxcfg-vswitch »

One of my ESX Servers’ management NICs died today, right as I was about to start upgrading to ESX 3.0.2. I don’t have the admin NICs in a redundant configuration yet, and it’s fairly inconvenient to lose management capabilities as you’re about to need VMotion.

Luckily[0] I had an extra, unused NIC, esxcfg-nics, and esxcfg-vswitch. With these commands you can display and alter the settings for the NICs and virtual switches from the console.

So, you find out what you have available with “esxcfg-nics -l”

[Screenshot: output of “esxcfg-nics -l”]

Then you look at the relationships between the virtual switches and the NICs using “esxcfg-vswitch -l”

[Screenshot: output of “esxcfg-vswitch -l”]

Since vmnic3 isn’t being used I ran:

esxcfg-vswitch --unlink=vmnic0 vSwitch0
esxcfg-vswitch --link=vmnic3 vSwitch0

And back up it came.
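If you have to do this on more than one box, the order matters: unlink the dead NIC, then link the spare. A small dry-run sketch of that, using only the two options shown above. The helper name swap_uplink_commands is hypothetical, and it only builds the command lines; it does not run anything or check that the NICs exist.

```python
def swap_uplink_commands(vswitch, dead_nic, spare_nic):
    """Build the two esxcfg-vswitch commands that move a vSwitch to a spare NIC."""
    if dead_nic == spare_nic:
        raise ValueError("the spare NIC must differ from the dead one")
    return [
        # First detach the failed uplink from the virtual switch...
        ["esxcfg-vswitch", f"--unlink={dead_nic}", vswitch],
        # ...then attach the unused NIC in its place.
        ["esxcfg-vswitch", f"--link={spare_nic}", vswitch],
    ]


def as_shell_lines(commands):
    """Render the command lists as plain lines to review before typing them."""
    return [" ".join(command) for command in commands]
```

Calling as_shell_lines(swap_uplink_commands("vSwitch0", "vmnic0", "vmnic3")) gives back the same two lines I typed at the console.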

[0] Not too much luck, though, since I intentionally have a spare NIC in all the boxes, just in case of emergency.