The Theory of Data Trust Relativity
Since the dawn of big data, data quality and data governance professionals have been shouting from the rooftops about the impact of dirty data. Data scientists shout back that "good enough" data is the new reality. Data trust has turned relative.
Consider these data points from Forrester's recent Business Technographics Survey on Data and Analytics and our Online Global Survey on Data Quality and Trust:
- Nearly 9 out of 10 data professionals rate data quality as a very important or important aspect of information governance
- 43% of business and technology management professionals are only somewhat confident in their data, and 25% are concerned
The overall message is clear: data quality still matters quite a bit, and there is still work to do.
So, why the philosophical debate on data quality and governance? There was anecdotal evidence that different data consumption scenarios require different data quality standards. The most evident was the trade-off between speed of insight and quality of data for big data analytics. Being of an analytical mind, we decided anecdotal evidence wasn't good enough. Let's measure this.
This fall we reached out to data professionals and business stakeholders to get a sense of how they vet their data across four data consumption scenarios: raw data in systems, data used within business applications, data used for business intelligence, and data used for predictive and advanced analytics. The results were striking. In rank order, here is how 209 respondents determine they can trust the data:
- Raw data: #1 data quality reports, #2 personal experience and knowledge, #3 data collection process
- Data within business applications: #1 footnotes and annotations, #2 data quality reports, #3 personal experience and knowledge
- Data within business intelligence: #1 trustworthiness of data provider, #2 data quality reports, #3 trustworthiness of the data source
- Data used for predictive and advanced analytics: #1 executives' and managers' trust of the data, #2 trustworthiness of the data provider, #3 trustworthiness of the data source
The first takeaway is that data governance is having an effect on data use by establishing data quality reports to guide data trust. However, there is a noticeable divide for big data analytics, where data scientists rely on tribal input rather than evidence. If we take data quality's impact on results, and the risk of using dirty data for decision making, off the table for a minute (stay with me now!), how does this affect data trust?
Our survey brought in a small number of executive-level business professionals. The sample is too small to be quantitative, but it does give directional insight. We asked participants how much of their time is spent vetting and validating data before they use it.
- Overall, 42% spend more than 40% of their time vetting and validating data.
- Among executives, 70% spend more than 40% of their time vetting and validating data.
This indicates that the burden shifts to executives to make the call on the validity of the results they use for strategic decisions. The shift in practice toward speed over quality may not speed up decision making if executives question results and push back on analysts to verify and iterate their analysis. Where data management, business application, and business intelligence environments have transparency into data quality, executives lack it. The other issue is that there is potentially a high degree of bias in the acceptance of results.
There is certainly a benefit to unfettered exploration and discovery, and too much data governance not only slows things down but may hide the golden insight that has a big impact on the business. However, organizations and data governance teams need, at a minimum, to consider data policies that help business decision makers and data consumers navigate more easily between the speed of business and the risk that good-enough data poses, not only to business outcomes but to the efficiency of putting insights into action faster.