Data. The Good, The Bad and The Ugly

IT asset management hinges on data. Whether you’re trying to work out how many laptops you’ll need to refresh next year, how well you’ve done with scrubbing Windows Server 2008 R2 out of your estate, or how many licences you’ll need to cover that new Datacenter, you’ll need the right data to start figuring things out. And if you want to do a good job, you’re going to need good data, because ugly data leads to ugly conclusions.

So, once you’ve figured out which DBA owes you a favour and might be willing to extract the data you need from their vast and impenetrable database, and it drops into your inbox/SharePoint/Dropbox (please, for the love of Pete, not Dropbox), that should mark the end of all your worries, right? If experience has taught me anything, it is this: it is perfectly possible for this to be the beginning of your troubles rather than the end of them, because often the only thing worse than no data is bad data.

So how can you measure good data? Well, there are a few standards out there, and a quick internet search threw up a couple of relevant organisations (IQ International, DAMA International). Rest assured I will make the time to gleefully devour the information on their websites when I get a chance, but until I sat down to write this blog post I didn’t know about either of them. I say this only to flag that in what follows I may ‘get the words wrong’, because the terminology is all mine, so apologies in advance for any ‘terminological inexactitude’.

How to measure data quality for ITAM

When I start to examine a data set for use in ITAM, I’m primarily reviewing it against four criteria: Age, Completeness, Quantity, and Quality. If the data is too far adrift of the standard I need for a given task, I find it’s often more productive to identify and address the causes of the gaps and issues in the data first. You can then return to the analysis later with an updated data set, rather than trying to piece together some questionable assumptions based on shonky data. Not only are your conclusions likely to be more robust (albeit a bit later than expected), but the same analysis can be performed far more quickly in the future.

First, how old is the data? If it’s more than about 30 days old then it’s probably too old to start working with. Much like sashimi, the fresher it is the better. And if you go to town on some old stuff, it’s likely you’ll end up producing a lot of crap. Again, much like sashimi.
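To make that concrete, here’s a rough sketch of how you might flag stale records, assuming each record in your extract carries a last-scanned date (the field name and the 30-day threshold are mine, so adjust to taste):

```python
from datetime import datetime, timedelta

MAX_AGE = timedelta(days=30)  # assumed freshness threshold

def is_fresh(record, now=None):
    """Return True if the record's last-scanned date is within the assumed threshold."""
    now = now or datetime.now()
    last_scanned = datetime.strptime(record["last_scanned"], "%Y-%m-%d")
    return now - last_scanned <= MAX_AGE

# Illustrative sample records only
records = [
    {"name": "SRV001", "last_scanned": "2019-05-01"},
    {"name": "SRV002", "last_scanned": "2018-11-20"},
]
stale = [r["name"] for r in records if not is_fresh(r)]
print(f"{len(stale)} stale record(s): {stale}")
```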

Secondly, how complete is the data set? Have you got 100 servers? Have you got 100 records, one for each of those assets? If not, it’s time to start figuring out what has gone wrong and closing those gaps.
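A quick way to spot those gaps is to reconcile the extract against whatever you treat as the source of truth for your asset counts, a CMDB export for example. A minimal sketch, assuming both sides can be keyed on an asset name:

```python
# Hypothetical example: reconcile the inventory extract against a known asset list.
known_assets = {"SRV001", "SRV002", "SRV003"}  # e.g. names from the CMDB
extract_assets = {"SRV001", "SRV003"}          # names present in the data set

missing = known_assets - extract_assets     # assets with no record at all
unexpected = extract_assets - known_assets  # records for assets you didn't expect

print(f"Missing records: {sorted(missing)}")
print(f"Unexpected records: {sorted(unexpected)}")
```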

Thirdly, what quantity of data elements do you have in each record? For me, an individual record is all the data a given data set holds on a single unique asset. That record will be composed of a number of data elements, which may be things like the asset’s name, IP address, processor and core counts, and so on. Some of these elements will be need-to-haves while others will be nice-to-haves. If you’re missing any of your need-to-haves, it’s time to start figuring out why; you’re not going to get very far crunching a data set where half the records are missing a device name.
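As an illustration, here’s a small sketch that flags records missing any of their need-to-haves; the element names are just examples, so swap in whatever your own need-to-have list actually contains:

```python
REQUIRED = ["name", "ip_address", "processors", "cores"]  # assumed need-to-have elements

def missing_elements(record):
    """Return the required elements that are absent or empty in a record."""
    return [field for field in REQUIRED if not record.get(field)]

# Illustrative sample records only
records = [
    {"name": "SRV001", "ip_address": "10.0.0.1", "processors": 2, "cores": 16},
    {"name": "", "ip_address": "10.0.0.2", "processors": 1, "cores": 8},
]
for record in records:
    gaps = missing_elements(record)
    if gaps:
        print(f"Record {record['ip_address']} is missing: {gaps}")
```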

Fourthly, but by no means least, what is the quality of your data? Similar data elements from each record should be in the same format, recorded in the same way. If you have the price of something as a data element in a data set, it needs to be recorded numerically every time. It’s only going to cause extra hassle if 10% of your prices are recorded using words (“Four thousand pounds” as opposed to £4000).
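One way to catch that sort of inconsistency early is to check each element against the format you expect, for example confirming that every price parses as a number. A rough sketch (the currency handling here is deliberately simplistic):

```python
def is_numeric_price(value):
    """Return True if a price value can be read as a number after stripping a currency symbol."""
    try:
        float(str(value).replace("£", "").replace(",", "").strip())
        return True
    except ValueError:
        return False

# Illustrative sample values only
prices = [4000, "£4,000", "Four thousand pounds"]
bad = [p for p in prices if not is_numeric_price(p)]
print(f"{len(bad)} price(s) need cleaning up: {bad}")
```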

What does good ITAM data look like?

When it comes to the data available from inventory tools, it’s likely that only the first three of these criteria will cause you a headache, because the inventory tool will lock down the quality of each data element itself. Arguably it’s the first two criteria that you’ll need to spend time addressing, and I posted a little while back about the benefits of using agent-less and agent-led discovery tools to help with this, which I invite you to read.

For manually populated data sets, all four criteria are going to need addressing to ensure the data is usable, because when taking an initial extract from an existing data set you’ll often find it isn’t. That’s probably because you’re taking data that was meant for one purpose and trying to use it for a different one. For example, you might get an extract of software licence purchases from a procurement database with the intention of using it to calculate a licence entitlement position, only to find that key data required to establish entitlement is missing, or is all recorded in one data element as free text. When this happens it’s worth bearing in mind that the data is probably perfectly suitable for whatever purpose the procurement team were using it for; it’s only now that you’re trying to use it to establish licence entitlement that it falls short. In situations like these it pays to create some sort of standard data elements template recording the data elements that are mandatory, plus the format in which they should be captured. This can then be shared with the relevant teams and, if you’re really lucky, implemented in any data collection templates or forms in use.
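One lightweight way to capture such a template is as a simple schema listing each mandatory element and the format it should arrive in, which you can then run against any new extract. A sketch with purely illustrative element names and formats:

```python
import re

# Hypothetical data elements template: mandatory elements and the format each should follow.
TEMPLATE = {
    "licence_sku":   re.compile(r"^[A-Z0-9\-]+$"),       # e.g. a part number
    "quantity":      re.compile(r"^\d+$"),                # whole number of licences
    "purchase_date": re.compile(r"^\d{4}-\d{2}-\d{2}$"),  # ISO date
    "unit_price":    re.compile(r"^\d+(\.\d{2})?$"),      # numeric, no words
}

def validate(record):
    """Return a list of (element, problem) pairs for a single record."""
    problems = []
    for element, pattern in TEMPLATE.items():
        value = record.get(element)
        if value is None or str(value).strip() == "":
            problems.append((element, "missing"))
        elif not pattern.match(str(value)):
            problems.append((element, f"bad format: {value!r}"))
    return problems

# Illustrative sample record only
print(validate({"licence_sku": "ABC-123", "quantity": "ten",
                "purchase_date": "2019-06-01", "unit_price": "4000.00"}))
```

Even if nobody ever runs the check automatically, writing the template down in this sort of explicit form makes it much easier to share with the teams who populate the data in the first place.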

Matt Halstead
