Rackspace’s Terrible Maintenance Plan

by Bob Plankers on March 20, 2012 · 9 comments

in Cloud, Outright Rant, System Administration, Virtualization

Update, 3/21/12: please read the comments, too — we got a good response from one of Rackspace’s folks.

I got a note today from Rackspace, where I have two virtual servers in their Rackspace Cloud. It arrived as a support ticket waiting for input from me, with the ticket text labeled as if I had entered it, which was weird.

As part of our ongoing effort to provide you with the best Cloud Servers service possible, we routinely perform maintenance and upgrades of our underlying systems. The majority of these are performed non-disruptively, however maintenances sometimes arise that impact Cloud Servers instances. At this time, a Cloud Servers host update is required that will involve an automated migration (i.e. relocation to a new physical host server) of some cloud servers including the following server(s) in your account:

This makes sense. Over time, hardware gets old and should be replaced. Amazon just did this, too, forcing a lot of people to reboot their stuff so that, on restart, it finds its way to hardware & hypervisors from this decade. It shouldn’t be a big deal; I do this at work with my virtualized stuff, too, and most people can work it into their normal maintenance.

Preferred Option: MANAGED MIGRATIONS – Rackspace managed and controlled

After March 21st, 2012 at 11:59PM CDT, managed migrations will begin being scheduled for any cloud servers listed above. If you have multiple servers listed, it is likely that they will be spread out across several days beyond March 21st. You will be notified 24 hours in advance of any managed migrations and will be provided a specific time window in which the migration will occur. Managed migrations require no effort on your part, will automatically be performed by Rackspace, and will effectively appear to you as a reboot…

Alternate Option: AUTOMATED SELF-MIGRATIONS – One click migrations you control

To allow you to migrate at your convenience and minimize impact to your applications, you may perform your own automated self-migrations anytime between now when your servers are scheduled for a managed migration. This process is simple, can be performed with the click of a button, and will effectively appear to you as a reboot.

The plan seems sound, but today is March 20, 2012. From the time this made it to my inbox I had only 57 hours (33 until the deadline, plus the 24 they give you prior to doing it for you) to reboot my stuff on my own terms. Are you serious?

While I’m reading this whole thing an update comes in:

We apologize for the miscommunication. The date on the initial update is incorrect. Managed migrations will not begin on these servers until after March 27th, 2012 at 11:59 PM CST.

Now I get a week to take my stuff down on my own terms, which, frankly, isn’t a whole lot better. What if I don’t have a maintenance window in the next week, or staff availability, or any other good reason why I can’t or shouldn’t do the work in that big of a rush? This becomes an unplanned emergency for me now, for no good reason as far as I can tell.

Here’s my take on this whole situation:

  1. A day’s notice is irresponsible & asinine, a week’s notice is ridiculous. A month is better. Two months would be nice. Give people ample time & notice to take care of the situation themselves, then force the stragglers into compliance. There’s a darn good chance that in the next two months they’ll have a maintenance window anyhow. And it isn’t like the folks there at Rackspace haven’t known this was going to need to happen for a long time (or, if this was somehow a surprise, they need a new project manager). Send the stragglers a note every week for two months, and in the last email to them assign a firm date & time when they’re going to see a forced reboot.
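  2. Fix the CST/CDT issues in the notices so they’re accurate. The first notice said CDT and the correction said CST, and in late March Central time is on daylight time, so only one of them can be right. Better yet, express time the way everybody else with a multi-timezone audience does: in 24-hour GMT. If you’re worried about people asking questions, put a translation table into the FAQ.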
  3. While I think the support ticket idea was a good one, don’t open support tickets in my name with initial text credited to me that I didn’t enter. At best it’s confusing, and I’m the kind of guy who doesn’t like being credited with things I didn’t do. The followup correction was from “Support at Rackspace Cloud…” so it’s obvious it can be done. Choose to do it right.
  4. Get someone who is detail-oriented to read the notices you send to your customers prior to sending them, to vet the whole plan, and perhaps play the devil’s advocate. I suspect there’s someone in the Rackspace Cloud support organization who could have provided all this same feedback, internally, if they’d had the chance, but now you have an annoyed customer doing it for you. Nice shot.
  5. Do a better job of highlighting that this process won’t be instantaneous, that it requires a soft reboot from the Rackspace Cloud control panel and not from within the OS, and that people should go read the FAQ for more details on how long it’ll take. The notice could easily have been more informative, though the FAQ did do a good job of indicating to me what was going to happen.
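
To put a number on the CST/CDT complaint in item 2, here is a minimal Python sketch. It takes the corrected deadline exactly as the notice states it (March 27th, 2012 at 11:59 PM) and converts it to GMT/UTC twice: once reading "CST" literally as UTC-6, and once as CDT (UTC-5), which is what Central time observes in late March. The variable names are mine, and which reading Rackspace actually meant is an assumption either way; the point is only that the two readings land an hour apart.

```python
from datetime import datetime, timedelta, timezone

# Fixed-offset zones for the two labels that appeared in the notices.
CST = timezone(timedelta(hours=-6), "CST")  # Central Standard Time
CDT = timezone(timedelta(hours=-5), "CDT")  # Central Daylight Time

# Corrected deadline as written: March 27, 2012, 11:59 PM (wall-clock time).
deadline_wall_clock = datetime(2012, 3, 27, 23, 59)

# Reading 1: take "CST" literally.
as_cst_utc = deadline_wall_clock.replace(tzinfo=CST).astimezone(timezone.utc)

# Reading 2: assume they meant local Central time, which is CDT in late March.
as_cdt_utc = deadline_wall_clock.replace(tzinfo=CDT).astimezone(timezone.utc)

# How far apart the two readings are.
ambiguity = as_cst_utc - as_cdt_utc

print(as_cst_utc.strftime("%Y-%m-%d %H:%M UTC"))  # 2012-03-28 05:59 UTC
print(as_cdt_utc.strftime("%Y-%m-%d %H:%M UTC"))  # 2012-03-28 04:59 UTC
print(ambiguity)                                  # 1:00:00
```

A single 24-hour GMT timestamp in the notice would have removed the question entirely.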

I’m comfortable saying that if this were a change request I’d filed in my place of employment it would have been denied by our change managers based on lack of timely customer communication for a non-emergency change and inaccurate details. C’mon Rackspace, you can do much better, and you need to if you want enterprises to move any workloads in your direction after this.

