When you use software or hardware that you didn’t invent yourself, you have to budget time to keep up with the vendor. If you are an IT manager you need to make sure that you have budgeted time and money for your staff to keep up, and that they actually are keeping up. From time to time vendors change their documentation, especially best practices documents. They don’t tell you when things change, and unless you are looking for the changes you won’t see them.
I’ve been caught by this a number of times. Recently I helped with an effort to resolve some storage problems we were having, and in the process discovered some discrepancies between our procedures and what the vendor recommends. The original problem was poor I/O performance on our VMware servers. While figuring out what the problem was, we ended up retrieving the best practices and general “how to connect a host” documents from our vendors to double-check things. As it turns out, the recommendations had changed as the software on the arrays changed, and our procedures and knowledge weren’t up to date.
It is sometimes helpful to have a staff member who didn’t do the work originally, or who is knowledgeable but normally uninvolved, double-check the accuracy of procedures against what the vendor says. Why? Because they don’t have any preconceived ideas about what the results will be. In the case of the storage problems, I was one of the guys who originally implemented the storage systems, but I have since moved on to other things. When I came back to them I had a fresh view of things.
There are two tricks with procedural reviews. First, you cannot make it an inquisition. If discrepancies are found, assess the impact, fix the problems and the procedures, and move on. Don’t ask how this mistake happened; ask how the team can catch these needed procedural changes faster. If you find a huge problem then there will be questions, but the point of the review is to catch the mistakes. It worked, so move on. It isn’t about blame.
The second trick is to know when to trust your experts. If your experts say everything is cool, do you trust them? Or rather, do you trust their biases and their egos? Techs often suffer from the “Of course it’s accurate, I just did it” syndrome. The occasional, yet polite, response of “prove it to me” really gives you an idea of where people stand, and of when you need to double-check things.
Every system you build and every software package you implement needs constant maintenance. If you don’t keep up with them regularly, the effort to catch up all at once later will be overwhelming. Keep up with things, and make sure to double-check the effort in a non-threatening manner.
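Since vendors don’t announce documentation changes, one way to make “looking for the changes” cheap is to fingerprint your saved copies of their documents and compare them on a schedule. What follows is only a minimal sketch, not a finished tool. It assumes you periodically download the vendor’s best practices documents into a local folder yourself, and the names `fingerprint`, `changed_documents`, and the manifest file are made up for illustration.

```python
import hashlib
import json
from pathlib import Path


def fingerprint(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def changed_documents(doc_dir: Path, manifest_path: Path) -> list[str]:
    """List documents that are new or changed since the last run.

    doc_dir is a folder of saved vendor documents (an assumption of this
    sketch). manifest_path is a JSON file of fingerprints from the previous
    run; keep it outside doc_dir so it is not fingerprinted itself.
    """
    # Load the fingerprints recorded last time, if there was a last time.
    previous = {}
    if manifest_path.exists():
        previous = json.loads(manifest_path.read_text())

    # Fingerprint every file currently in the folder.
    current = {
        p.name: fingerprint(p)
        for p in sorted(doc_dir.iterdir())
        if p.is_file()
    }

    # A document counts as changed if it is new or its digest differs.
    changed = [
        name for name, digest in current.items()
        if previous.get(name) != digest
    ]

    # Record the current state for the next comparison.
    manifest_path.write_text(json.dumps(current, indent=2))
    return changed
```

A changed fingerprint only tells you that a document is different, not what changed or whether it matters. Somebody, ideally that knowledgeable but uninvolved staff member, still has to read the new version against your procedures.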