A scoring system for the cleanliness of your data

Most businesses have a database that is used to keep in touch with former customers or prospects, and it might be in a CRM system or it might be a bespoke database of some sort.

But how do you know how accurate it is, and what the quality of the data is like?

If you have a bespoke system, or a database where you can control the structure to an extent, you can build in a relatively simple way of measuring data quality. To a large extent this data generates itself, so the database is “self-measuring” provided you set the thing up properly in the first place.

Assume you have a basic prospect list of people you want to telephone.

How clean is it?

First, within the database, you define the “essential” bits of data you need – for example, a name and a phone number. You may then consider an email a “nice to have” but not essential. Then an address might be seen as optional.

You can create queries or reports which determine which of the fields have data in them, or which have been set to valid values, as appropriate. You need to ensure, obviously, that the essential name and number fields are populated.

Using these measures you can grade your records from A to E, where:

  • A means every field in the table is populated, both essential and optional;
  • B has the essential data and some of the optional information;
  • C has the essential data only
  • D is missing essential data but can potentially be updated
  • E is missing essential data with no prospect of improvement

How old is it?

Then you look at when the data was last updated, and rank that from 1 to 5 – the actual time bands depend on your needs, but the idea is that 1 has been recently updated (within the last 12 months normally) and 5 means you don’t know how old it is.

The information can easily be generated if you have a timestamp in your database – typically these dates set themselves when the record is created or updated, so zero effort is required on your behalf. All you need to do is report it!

So, you might decide on:

  • 1 is a record updated in the last 12 months;
  • 2 is data updated between 12 and 24 months ago
  • 3 is data updated between 24 and 36 months ago;
  • 4 is older than 36 months;
  • 5 is data you have no idea about!

So an A1 record is complete and recently updated, and an E5 is useless, and you can measure all points in between. This can then generate a simple report that measures the “integrity” of your business data with virtually zero effort if planned at the start.

Typically, you will only be looking to work with data that is C or above, so you are confident that you have all the essential fields that you have defined, and it makes sense to keep it as recent as possible, so you may run a query to extract A1, B1, and C1 records to start making you rphone calls.

Equally you can put the metric on a dashboard. A count of the A1, B1, and C1 records can be used to calculate a percentage of those records in the database as a whole, so you can have a “health measure” of (say) 70%, and keep an eye on that.

Photo by Jan Antonin Kolar on Unsplash

Leave a Reply

Your email address will not be published. Required fields are marked *