September 19, 2012
When you last pulled up a chair to this blog, we talked about data quality persistence and disposability for big data. The other side of the coin is: should you even do big data quality at all?
So, this post is dedicated to stepping outside the comfort zone once again and into the world of chaos. Not only may you not want to persist your data quality transformations, you may not want to cleanse the data at all.
Current thinking: Purge poor data from your environment. Put the word “risk” in the same sentence as data quality and watch the hackles go up on data quality professionals. It is like using salt in your coffee instead of sugar. However, the biggest challenge I see many data quality professionals face is getting lost in all the data because they feel they must remove every risk that bad data poses to the business. In the world of big data, you clearly are not going to be able to cleanse all that data. A best practice is to identify the critical data elements that have the most impact on the business and focus efforts there. Problem solved.
Not so fast. Even scoping the data quality effort may not be the right way to go. The time and effort cleansing takes, as well as how accessible the data is, may not fit the business's need to get information quickly. In those cases the business has decided to take the risk, focusing on direction rather than precision.
Reboot: Don’t worry about bad data. Precision is not always the end game, and the business is balancing risk with reward. Understand the decision process. Decisions are based as much on experience and anecdotal evidence as on what the data shows. This trifecta is a balance, and data may be a catalyst or validator, not the only guide. To determine if data cleansing is required, consider the time available, how far the analytic results deviate from the perceived or accepted hypothesis, and the risk within the context of how the data will be used. It may be that data quality really doesn’t matter and the data is good enough.
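To make those three considerations concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `CleansingContext` fields, the 0-to-1 scales, and the tolerance values are hypothetical, and a real decision would weigh these factors with the business rather than hard-code them.

```python
from dataclasses import dataclass


@dataclass
class CleansingContext:
    """Hypothetical inputs for a 'cleanse or not' call on one analysis."""
    hours_available: float   # time before the business needs an answer
    hours_to_cleanse: float  # estimated effort to cleanse the data
    deviation: float         # gap between results and the accepted hypothesis, 0..1
    risk: float              # cost of a wrong decision in this use, 0 (low)..1 (high)


def should_cleanse(ctx, deviation_tolerance=0.2, risk_tolerance=0.5):
    """Return True when cleansing looks worthwhile under these assumed rules."""
    # Not enough time: the business is taking the risk and wants direction.
    if ctx.hours_to_cleanse > ctx.hours_available:
        return False
    # Results match expectations and the stakes are low: good enough.
    if ctx.deviation <= deviation_tolerance and ctx.risk <= risk_tolerance:
        return False
    # Otherwise there is time, and either a surprise or high stakes.
    return True


quick_look = CleansingContext(hours_available=4, hours_to_cleanse=40,
                              deviation=0.5, risk=0.9)
surprising = CleansingContext(hours_available=40, hours_to_cleanse=4,
                              deviation=0.5, risk=0.1)
print(should_cleanse(quick_look))   # no time to cleanse
print(should_cleanse(surprising))   # time available and results are surprising
```

The order of the checks is a design choice in this sketch: time comes first because, as described above, a business that needs direction quickly has already accepted the risk.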
However, don’t throw away your data quality best practices yet. The data quality measures and indexes created for data governance give you guideposts for building a trust continuum for data, one that helps determine when, and when not, to put data quality rules and efforts in place. Continuously profile data sources and the quality of the data feeding analysis, not just to correct it but to signal when action is necessary.
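As a rough illustration of profiling to inform rather than to correct, the sketch below measures one simple quality index (completeness per field) and flags only the fields that fall under a trust threshold. The field names, sample rows, and thresholds are hypothetical, and completeness stands in for whatever measures your governance program already tracks.

```python
def profile_completeness(rows, fields):
    """Return the share of non-empty values for each field (0.0 to 1.0)."""
    total = len(rows)
    scores = {}
    for field in fields:
        filled = sum(1 for row in rows if row.get(field) not in (None, ""))
        scores[field] = filled / total if total else 0.0
    return scores


def fields_needing_action(scores, thresholds):
    """List fields whose score is below the agreed trust threshold."""
    return sorted(
        field for field, score in scores.items()
        if score < thresholds.get(field, 0.0)
    )


# Hypothetical sample from a data source feeding an analysis.
rows = [
    {"customer_id": "c1", "email": "a@example.com"},
    {"customer_id": "c2", "email": ""},
    {"customer_id": "c3", "email": None},
    {"customer_id": "c4", "email": "d@example.com"},
]

scores = profile_completeness(rows, ["customer_id", "email"])

# Assumed thresholds: email only matters for this use if we are strict.
print(fields_needing_action(scores, {"customer_id": 0.99, "email": 0.4}))
print(fields_needing_action(scores, {"customer_id": 0.99, "email": 0.8}))
```

The same profile produces different answers depending on the thresholds, which is the point: the measure stays constant while the context of use decides whether any effort is needed.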