Human beings tend to hoard things. There are various explanations for it, usually evolutionary. Wealth and possessions tend to be the first things we think of as “hoardable,” but on a daily basis I interact with people who hoard data. In some places it seems to be an epidemic, a disease among IT departments and management, where folks try to collect, index, and retain gobs of data. I ask them “why do you need all this data?” and they reply with vague, ambiguous answers. They cannot even fathom not keeping track of absolutely everything. “You can never have too much data,” one fellow told me a few years ago. “You use the data when making decisions!” he uttered as if I were a complete moron.[0]
The problem is that he, along with everybody else hoarding data, is wrong. Absolutely wrong. We, as IT professionals, as managers, as configuration management database (CMDB) administrators, are keeping too much data.
What happens when we decide to retain data? We need to store it somewhere. We need to display it to people who need to view it. We need to keep it secure so that people who don’t belong cannot view it. We also need to keep it up to date.
Storing data seems like it would be the simplest of the challenges here, but it isn’t. Disk space is cheap, but the format used to store the data is important. So we discuss that in meetings. Maybe the format comes in the form of an application, like a database or something like HP OpenView Service Desk. So we evaluate applications, then spend money to buy and implement them. We have to maintain the application, too. We have to upgrade it, converting the data between versions. Every row of data in a database is a potential problem when you are moving data around. Application maintenance, upgrades, storage, servers, all of it costs money and time. The costs are not linear, either. More data means disproportionately more maintenance, upgrades, storage, and servers.
Displaying the data has a lot to do with the format you store it in. If it’s just a database then you have to train people to use reporting tools to look at it.[1] You have to worry about formats. Do you want your data in a CSV file or in XML? What colors should it be in? Maybe the application you use has a client that needs to be installed on desktops, which means more work for the support guys. Do you need to train people to look at the data? You need to have standards for working with the data so that each field means the same thing to each person. Standards mean meetings. Lots of them, where you have to listen to everybody’s opinion of the definition of the word “purpose.”
Then there is security. Now that we’ve collected all this data we’ve decided that only certain people can update it. Certain people can view it. Certain people can view certain things. We spend a few man-months going through and setting permissions on the data. We are very careful, because now that we have all this data it would be devastating to see it leak to the wrong people. We also decide that, because nobody could agree on standards for the data, we’re only going to have two people in the organization update the data, even though many people in the organization have the right to change the real-world systems. We’ll create policy and procedure to tell those people they cannot change things we track without the approval of those two data gods, to ensure that the updates make it in. Otherwise our data will get out of sync. Don’t worry about a bottleneck there: we’ve assigned our two most productive people to the role of data guardians, taken from customer programming projects.
That’s how we keep our data up to date. Policy, hashed out in thousands of man-hours of meetings, disseminated as commandments by management. All of our employees have been told that their number one goal is to keep our database in sync with reality. They all do that, because we told them to. They understand fully that without accurate data about the version of wget in /usr/local on server A we cannot succeed as a business, because we cannot make good decisions.
And monkeys will fly from my butt. The good decisions went out the window when we decided to keep all this data.
When we keep data we don’t absolutely need, we condemn ourselves to spending money and time handling it. This data and the systems, the policies, the software to maintain it all are overhead. A cost center. Every penny, every second we spend on them is spent not helping our customers, and not driving our business forward. And if we choose to ignore the data and stop spending time and money on it, it becomes useless to us. We cannot use it for anything because it is no longer accurate.
So how do we stop all this nonsense? Where is the line between too much and too little?
My answer: start asking questions. “Why?” and “so what?” work well. It goes like this:
“Why do we need to track the Tivoli Storage Manager node name in our CMDB when it always matches the server name?”
“Do we need to know the specific version of the antivirus software on our desktops? Why isn’t the central AV management console good enough to track this?”
“Why do we need to know that there are 485 MP3s in 48 directories on our Netware file servers? Is the 3 GB they are consuming worth the effort to find them?”
“Seriously, do we ever need to know the speed of the CPUs in a server to make a business decision?”
And the killer: “What happens when we don’t collect this data?”
The decision to collect data is always a battle between making life easier for someone and making it difficult for someone else, often many others. You rob these people, your coworkers and friends, of their time, which could be spent advancing the organization. You rob them of money, maybe not directly from their salaries but from budgets that would fund other initiatives, like things that make customers happier, and therefore grow the business.
If you decide that you do need to collect a certain piece of information, ask yourself how often you need to work with it. Every day? Once a month? Once a year? How will you work with it? If you will just open it up in Excel, do you really need to have it stored in a database? Maybe someone could write a script to gather the data once a month and store it as a file. Sometimes it turns out that data stored in a system is just as hard to use as it is to keep accurate. So why even keep it?
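For the “script plus a file” option, something like the following is about all it takes. This is only a rough sketch, not a recommendation of what to collect: the particular facts gathered, the function names `gather_facts` and `write_snapshot`, and the `snapshots` directory are all made up for illustration. It uses nothing but the Python standard library and writes one CSV per month that opens directly in Excel.

```python
import csv
import datetime
import platform
import shutil
import socket
from pathlib import Path


def gather_facts(path="/"):
    """Collect a few facts about the local machine (an illustrative selection)."""
    usage = shutil.disk_usage(path)
    return {
        "collected_on": datetime.date.today().isoformat(),
        "hostname": socket.gethostname(),
        "os": platform.system(),
        "os_release": platform.release(),
        "disk_total_gb": round(usage.total / 1e9, 1),
        "disk_free_gb": round(usage.free / 1e9, 1),
    }


def write_snapshot(facts, out_dir="snapshots"):
    """Write the facts to a per-month CSV file, overwriting any earlier run that month."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    month = facts["collected_on"][:7]  # the YYYY-MM part of the ISO date
    target = out / f"inventory-{month}.csv"
    with target.open("w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=list(facts))
        writer.writeheader()
        writer.writerow(facts)
    return target


if __name__ == "__main__":
    # Run it from cron or a scheduled task once a month; nobody has to babysit it.
    print(write_snapshot(gather_facts()))
```

No application to buy, no schema to argue about, no permissions project. If nobody opens the files for a year, that tells you something too.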
Ask yourself what data you really need. Ask yourself what will happen when you find you don’t have data you need. Maybe a more general-purpose tool to gather the data when it is needed would be more useful than pre-populating it and watching it rot.
Maybe.
All I really know, though, is that all this data we’re keeping is costing us a lot. So let’s only keep what we need.
[0] I might be. Would I be able to tell if I were a moron?
[1] I’ve seen a few CMDB instances that would be better if they scrapped the front end software and just used a database with Crystal Reports.
“All right, Mr. Wiseguy, you’re so clever, you tell us what color it should be.”