Category: General Rambling

Three Organizational Decisions That Help Me Virtualize »

Over the last ten years my organization has come a long way with its IT policies and processes. We’ve gone from the wild, wild west of IT where personal heroism ruled the day, to a place where there’s just enough process to make sure that communication happens correctly and things like our Configuration Management Database (CMDB) stay up to date. It’s been a lot of work, but I am actually really proud of where we’re at.

There are three fundamental decisions we made a long time ago that, had they not been made, would have drastically changed how virtualization has proceeded here.

1. Clearly defined maintenance windows.

Knowing exactly when someone can do maintenance on a server has been crucial to getting things done in our virtualization environment. There are many adjustments you can & should make in virtual environments, but if you can’t ever take the VMs down to make the changes you’re stuck. We’ve been able to do physical-to-virtual migrations, performance tuning, VMware Tools upgrades, vSphere upgrades, and a whole slew of other things in relatively short timeframes because we have this all worked out already. This also lets us “right-size” our VMs: rather than deploying huge VMs just in case they need the CPU or RAM, we deploy smaller ones and then can take an outage to add CPUs and RAM if we need to. The maintenance windows for a server are negotiated between the application/service admins and the system administrators when a machine is put into production, and we track them in our CMDB. Any member of the whole team supporting the service can take the maintenance window, as long as they follow some rules about notifications for the change (timeframes, etc.).
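To make the bookkeeping concrete, here is a minimal sketch in Python of checking a window before starting work. The record layout, the `MaintenanceWindow` class, and the `web01` entry are all made up for illustration; a real CMDB will store this differently, and this sketch only handles a weekly window that starts and ends on the same day.

```python
from dataclasses import dataclass
from datetime import datetime, time


@dataclass(frozen=True)
class MaintenanceWindow:
    """A weekly window as it might be recorded in a CMDB (hypothetical schema)."""
    weekday: int  # 0 = Monday ... 6 = Sunday, matching datetime.weekday()
    start: time
    end: time     # same-day end; windows that cross midnight are not handled


def in_window(window: MaintenanceWindow, when: datetime) -> bool:
    """Return True if `when` falls inside the weekly window (end is exclusive)."""
    return (when.weekday() == window.weekday
            and window.start <= when.time() < window.end)


# Hypothetical CMDB entries keyed by server name.
WINDOWS = {
    "web01": MaintenanceWindow(weekday=1, start=time(10, 0), end=time(12, 0)),
}
```

Something like `in_window(WINDOWS["web01"], datetime.now())` could then gate a maintenance script, so that anyone on the team taking the window gets the same answer about whether it is open.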

2. Use of load-balancing technologies.

We use application load balancers (layers 4 through 7 of the OSI model) to decouple services from individual servers. Not only does this allow us to take a host down without affecting a service, but it also lets us spread the load out more among the physical hosts we have in our virtual infrastructure. In a lot of cases having more, smaller VMs results in better workload scheduling by ESX and DRS, especially on smaller ESX hosts.

Of course, this also plays nicely into the other points, because it’s very liberating to be able to do what we call “rolling maintenance” on a service, just taking one machine down at a time so that customers are not impacted. It also means that system administrator quality of life goes up, for now we can do maintenance tasks during the day instead of on weekends and off-hours. Doing maintenance during business hours has a couple of benefits. First, it means that the maintenance will actually get done. If you try to use someone’s personal time to do work they tend to opt out of that work. Servers go unpatched, tuning doesn’t happen, lots of things that should get done don’t because people will choose their personal time over work. Second, it means that if something goes wrong there are others around to help out. Doing work at 5 AM on a Sunday is fun, but if things go sideways you have to wake someone up or try fixing it yourself. Doing work during the day means you have the rest of the team around to lend a hand.

Third, it gives you a way to make incremental changes and then watch the effects. This has been particularly awesome for performance tuning of applications and our virtual environments themselves. Testing tuning changes is often hard, because test suites and test load generators are synthetic and often don’t compare to real load. But because the load is spread out we can make a change to one VM, or one ESX host servicing one VM, and keep an eye on it. I’m not advocating being a complete cowboy — you still have to do testing — but the risks to your production environment are a lot lower if you can catch problems on one VM first.
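The “rolling maintenance” idea can be sketched in a few lines of Python. This is only an outline under assumptions: the `disable`, `maintain`, `is_healthy`, and `enable` callables are placeholders for whatever your load balancer and patching tools actually provide, not any vendor’s API.

```python
from typing import Callable, Iterable, List


def rolling_maintenance(
    members: Iterable[str],
    disable: Callable[[str], None],
    maintain: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    enable: Callable[[str], None],
) -> List[str]:
    """Maintain pool members one at a time; stop at the first unhealthy one.

    Returns the members that completed maintenance and rejoined the pool.
    """
    done: List[str] = []
    for member in members:
        disable(member)   # stop sending new traffic to this member
        maintain(member)  # patch, tune, reboot, etc.
        if not is_healthy(member):
            # Leave this member out of the pool and stop here, so a bad
            # change reaches one machine rather than the whole service.
            break
        enable(member)
        done.append(member)
    return done
```

Stopping at the first failed health check is the design choice that matters: the rest of the pool keeps serving customers untouched while someone looks at the one machine that went sideways.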

There are usually some other benefits to load balancers, too, that make them virtualization-friendly. Many will offload SSL processing, so your VMs have less work to do. Others have features, like iRules in F5’s products, that let you rewrite network traffic on the fly, which has some really neat implications for security, monitoring, and service delivery. And if you don’t want to buy a piece of hardware you can often get a virtual appliance from these vendors, though the physical appliances are usually a lot faster.

3. Commitment to operating system and application patching.

It is a fundamental belief of mine that one of the best ways to stay secure is to keep up on your patching. My organization agrees, and by using load balancers and defining maintenance windows we’ve made it easy for ourselves to keep our hosts up to date with regular patching cycles. Because we can take servers down without taking services down, and because sysadmins know exactly when a server can come down, we can schedule maintenance cycles easily, whether it’s six months out or two weeks. We can also respond very rapidly to emergency situations, like recent remote execution vulnerabilities in Microsoft Windows, by rolling patches out to development & test hosts, then QA & production, over the course of just two days if needed.

Keeping up to date with patches not only keeps you secure, it also lets you take advantage of new features that are added to operating systems. For example, Red Hat keeps adding new virtualization-friendly features, like kernel interrupt clock dividers. Because the divider is a kernel parameter, you can’t just change it on the fly; it takes a reboot. And if you have to reboot, but can’t get a time to do it, you won’t do it. For us, we just rolled the change into one of our patching cycles and reduced the load on our infrastructure dramatically. That means more VMs per physical host, and a quantifiable amount of savings from just a small change on each machine.
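If you want to see which machines already have the setting, a small Python check along these lines might do. It assumes the parameter in question is the `divider=` kernel boot option (the post doesn’t name it), and the default of 10 used here is an illustrative value, not a recommendation.

```python
from typing import Optional


def timer_divider(cmdline: str) -> Optional[int]:
    """Return the value of a `divider=N` option on a kernel command line, or None."""
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key == "divider" and sep and value.isdigit():
            return int(value)
    return None


def needs_divider(cmdline: str, wanted: int = 10) -> bool:
    """True if the command line lacks `divider=<wanted>` (wanted is an assumed value)."""
    return timer_divider(cmdline) != wanted
```

On a running Linux guest the command line can be read from `/proc/cmdline`, so `needs_divider(open("/proc/cmdline").read())` would flag machines that still need the change folded into their next patch-cycle reboot.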

Furthermore, our commitment to patching also extends to the virtual infrastructure itself, and we have a rule that we will not implement anything that breaks vMotion or Storage vMotion. Why? Because then it becomes very difficult to cope with ESX updates, or hardware failures, or any situation where vMotion could be used to prevent an outage. Sure, this means that we still need physical hardware for some applications, but it’s still just a fraction of the hardware we were buying years ago. This also makes virtual infrastructure easy to upgrade when the time comes, for new versions of vSphere, new storage arrays, and new physical hosts. Instead of planning outages on hundreds of VMs we just vMotion them, and nobody is the wiser.

Disclosure: F5 is a sponsor of Gestalt IT Tech Field Day, of which I have been a participant. I am not a customer of F5 at this time, though.

Youth »

“Hey, do any of you guys have an old, full-height hard disk lying around?”

This was a relatively new person from another group in our organization. People occasionally come looking for random old equipment to use for training & examples, because they know we have things like original IBM PCs, Cisco AGS+ routers, token ring MAUs, and 1200 baud Multitech Multimodems on hand.

“Sure, I’ve got a full height drive, one second.” I produce a full-height 600 MB Imprimis SCSI disk. Made in the USA, so it’s pretty old. It’s a bookend on my bookshelf.

600 MB Imprimis SCSI drive, full height

“What in the heck is that?” he asks.

“Um, a full-height drive?” I reply, really wondering what he thinks he’s asking for.

“No, man, I don’t know what that’s out of but it’s wicked. Full-height is like a couple inches tall, though.”

“And 3.5″ form factor, right?”

“Yeah.”

“Dude, that’s half height.”

“Nah, that’s full height, at least that’s what I’ve been told. So what is that?” he asks, as he points to my impressive specimen of early 1990s drive technology.

“What you were looking for is half height. This is a full-height 600 MB SCSI fixed disk. Final answer.” I hope he didn’t learn full height vs. half height from someone he paid.

“I’ve never seen one of those before. Can I borrow it? The other guys will flip out when they see this thing.”

I wonder what they’d think if they saw 8″ floppy disks. Freakin’ kids.

Oregon Trail: The Movie »

One would think this is off-topic, but for techies it really isn’t. I present to you the official trailer for The Oregon Trail. Very well done!

Makes me proud to have grown up in Minnesota (MECC == “Minnesota Educational Computing Consortium”). If you want to play it, Classic Gaming has it, along with a copy of AppleWin.

Happy System Administrator Appreciation Day! »

The Wisconsin DMV sent me my gift a day early:

And it was a present — I needed replacement plates but hadn’t ordered them yet. I’m glad I didn’t!

I often joke that I haven’t come up with an original solution to anything in years, thanks to all the other sysadmins out there who share their solutions, knowledge, and time in order to make the world better. Thank you all for everything you do!

Gestalt IT Tech Field Day Seattle »

Apple iPad: $670

Wyse PocketCloud RDP/View Client: $14.99

One flight worth of GoGoWireless: $12.95

Posting on my blog via an RDP connection to my work desktop across a VPN from 30,000 feet: priceless.

I’m on my way to Gestalt IT’s Seattle Tech Field Day. I’m excited, for a lot of reasons. It’s an honor to be invited, and to have been nominated by some of the other delegates. I’ve spent little time in Seattle, and while I won’t have a lot of free time this trip it’ll be better than the last time I was there. I’ll get to hang out at a bunch of high-tech places and, best of all, do so with a bunch of high-tech folks that, frankly, I’ve only read about. How cool is that?

Pretty damn cool, if you ask me. And you didn’t, I know. :)

Saturday Morning in Ohio »

Photo courtesy Maitri, background on the Dell from Vladstudio.

It’s A Family Thing »

I’m taking an off-topic break for a moment to express my delight in popular culture, for a change. Those of you watching HBO’s Treme saw something this week that is summarized quite nicely by the New Orleans Times-Picayune, in their “Treme Explained” post:

Antoine and Desiree barbecuing on a parade route illustrates an aspect of Mardi Gras that’s poorly understood by outsiders: The beads ‘n’ boobs “Girls Gone Wild” version of the holiday that Delmond briefly experiences elsewhere in the episode is largely confined to the French Quarter and almost exclusively perpetrated by drunken tourists, not that there’s anything wrong with that. For most local participants in the Carnival parade experience in New Orleans, the setting more resembles a family picnic.

As someone who rides with the Krewe of King Arthur each year this is the number one thing I have to explain to those who ask me where I’m going for two weeks in late winter. I’m really glad someone is shining some light on it, for maybe it will reduce the “Oh, you go to Mardi Gras? That’s okay with your girlfriend?” comments in my future.

Maybe.

As a side note, and slightly back on topic, my coworkers will be able to tell when I’ve had it, because I’ll start naming servers after New Orleans geographical features. Like Tchoupitoulas (chop-uh-too-luss) Street, Marigny (MARE-uh-knee), Treme (trah-MAY), etc. Even something as seemingly simple as Calliope is really “CAL-ee-ope,” and Burgundy more like “burr-GUN-dee.” Mean? Yeah, I know. Though I really would delight in naming something after the OPP.

Links:

- NOLA.com’s “Treme” explained.
- Krewe of King Arthur
- Continued hat tip to Rafe Colburn’s Essential Reference posts. Good stuff.
- The “Back of Town: Blogging Treme” blog also has in-depth commentary by NOLA locals.

Test My ISP »

Test My ISP is a new program from the FCC to measure the speeds people are getting from their ISPs:

Together, the FCC and Samknows are setting out to provide US consumers with reliable and accurate statistics of their broadband connections. If you are interested in using one of our units to measure your home broadband connection, then please sign up below. You will get to play a part in changing the face of the American broadband industry and you also get a free high-speed wireless router!

I think it’s great they’re doing something like this. My AT&T DSL connection never reaches the 6 Mbps that I’m paying for (and want). However, it probably won’t be hard for the smart folks at these ISPs to figure out where these devices are, what kind of traffic they generate, and how to make the traffic high-priority. Especially since every big ISP probably has someone signing up for one of these devices right now. “Evil will always win, because good is dumb.”