Reader Ben asked a good question in the comments of my previous post about downtime. I’m going to take a stab at this with the hopes that others will chime in and augment/correct my thinking.
Isn’t it true that scheduled downtime is not usually factored in when calculating historic availability trends? If you had a scheduled maintenance window for 6 hours every Saturday morning, that wouldn’t count at all towards your downtime calculations. That could also affect the number of minutes per 9 calculation above.
I am of the opinion that service availability[0] should be measured as the amount of time the service was actually available. If your service is down for 6 hours every week, it's not available during that time, and you have roughly 96.4% availability (162 of 168 hours).
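Here is a minimal sketch of that arithmetic. The function name and the choice to measure in hours are my own illustration, not any standard tool:

```python
HOURS_PER_WEEK = 7 * 24  # 168 hours in a week


def availability(downtime_hours: float, period_hours: float) -> float:
    """Fraction of the period during which the service was available.

    All downtime counts, scheduled or not.
    """
    if period_hours <= 0:
        raise ValueError("period_hours must be positive")
    if not 0 <= downtime_hours <= period_hours:
        raise ValueError("downtime_hours must be between 0 and period_hours")
    return (period_hours - downtime_hours) / period_hours


# A 6-hour maintenance window every Saturday morning:
weekly = availability(6, HOURS_PER_WEEK)
print(f"{weekly:.1%}")  # 96.4%
```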
However, you can break that downtime down into planned and unplanned downtime. That's an entirely different metric. If every outage you take is scheduled and you never run past your scheduled maintenance window, then all of your downtime is scheduled: 0% unscheduled and 100% scheduled. Excellent job.
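One way to sketch that breakdown, assuming each outage is recorded with its length and, if it was scheduled, the length of its announced maintenance window. I'm treating any time past the end of the window as unplanned; that's my assumption, not a standard rule. The `Outage` record and `downtime_breakdown` function are hypothetical names for illustration:

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass
class Outage:
    duration_hours: float
    # Length of the announced maintenance window, or None for an unplanned outage.
    window_hours: Optional[float] = None


def downtime_breakdown(outages: Iterable[Outage]) -> Tuple[float, float]:
    """Return (planned share, unplanned share) of total downtime."""
    planned = unplanned = 0.0
    for o in outages:
        if o.window_hours is None:
            unplanned += o.duration_hours
        else:
            in_window = min(o.duration_hours, o.window_hours)
            planned += in_window
            # Assumption: running past the window counts as unplanned downtime.
            unplanned += o.duration_hours - in_window
    total = planned + unplanned
    if total == 0:
        return 0.0, 0.0  # no downtime at all
    return planned / total, unplanned / total


# Four Saturday windows, all finished on time:
month = [Outage(6, 6) for _ in range(4)]
print(downtime_breakdown(month))  # (1.0, 0.0)
```

If one of those four windows overran by an hour, the same month would show 96% planned and 4% unplanned downtime, while the availability figure would drop regardless of how the downtime is labeled.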
There is absolutely no shame in lower availability numbers so long as that’s what the service is designed around and the customers are happy[1]. Then, if you can keep your unplanned downtime as close to 0% as possible, you’re set.
——
[0] Note that I said ‘service’ availability, which should be the goal. The goal is not individual server availability, per se. Often the two go hand in hand, but in clustered configurations the availability of an individual server doesn't matter as much. Besides, even with a single server, the server being up doesn't mean the service is available, because software issues could take the service down. If you are measuring the server and inferring service reliability, you might be missing something important.
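A toy illustration of the cluster point, assuming a two-node cluster behind a load balancer where any single healthy node can serve requests. The node names and samples are made up, and each sample is assumed to come from a check that exercises the service on that node (a real request, not just a ping), which also covers the single-server case above:

```python
from typing import Dict, List


def fraction_up(samples: List[bool]) -> float:
    """Fraction of sampling intervals in which the check passed."""
    return sum(samples) / len(samples)


def service_samples(nodes: Dict[str, List[bool]]) -> List[bool]:
    # Assumption: the service is up whenever at least one node is up.
    return [any(states) for states in zip(*nodes.values())]


# Hypothetical per-interval service checks for two nodes:
nodes = {
    "web1": [True, True, False, True, True, True, True, True, True, True],
    "web2": [True, True, True, True, True, False, True, True, True, True],
}
for name, samples in nodes.items():
    print(name, f"{fraction_up(samples):.0%}")  # each node: 90%
print("service", f"{fraction_up(service_samples(nodes)):.0%}")  # service: 100%
```

Each server was down 10% of the time, yet users never saw an outage, so averaging server uptime would understate the availability that actually matters.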
[1] …or a mitigating entity tells them to shut it, such as a manager to cranky internal users. 🙂
Bob, thanks for clearing this up. This matches my preferred downtime definition as well. A few quick searches didn’t turn up any hard facts (perhaps another reader can find this data), but I’m certain that I’ve seen some major service providers (Blizzard, eBay, Amazon, others?) claim that maintenance windows are not part of their downtime calculations. I’ve always thought that seems like a scam.
Complete scam. If you aren’t up, you aren’t up. 🙂
I’ll have to start writing a clause in the fine print of our contracts that we reserve the right to retroactively declare maintenance windows…