March 6, 2014
Coming back from the SAS Industry Analyst Event left me with one big question: are we following up on the recommendations and insights that analysis provides to see whether they actually produced positive or negative results?
It's a big question for data governance that I'm not hearing discussed around the table. We often emphasize how data is supplied, but how it performs in its consumed state is forgotten.
When leading business intelligence and analytics teams, I always pushed to create reports and analysis that ultimately incented action. What you know should influence behavior and decisions, even if the influence is to say, "Don't change, keep up the good work!" This should be a fundamental function of data governance. We need to care not only that the data is in the right form factor, but also about what the data tells us, how we interpret it, and whether it made us better.
I've talked about the closed loop from a master data management perspective: what you learn about customers will alter and enrich the customer master. The connection to data governance is pretty clear in this case. However, we shouldn't stop at raw data and master definitions. Our attention needs to include the data business users receive and whether it is trusted and accurate. This goes back to the fact that how the business defines data is more than what exists in a database or application. Data is a total, a percentage, an index. This derived data is what the business expects to govern, and if derived data isn't supporting business objectives, that has to be incorporated into the data governance discussion.
Today's machine learning systems are designed to take in new data and continuously optimize insights. The assumption is that more data provides better results. What if we changed this assumption to say that more outcomes provide better results? This changes the game in terms of what we assess and govern. The objective turns to understanding not only whether outcomes improve, but also when acting on insights produces negative results.
By not incorporating the outcomes of using derived data into our data governance efforts, we miss the point of why we govern data. What matters is not that data is used, or shared, or trusted. What matters is the results of using the data. I remember Marcel Jemio of the Financial Management Service (FMS) in the US Department of the Treasury saying that the reason to create an open data program is to drive growth in GDP. All his efforts to govern data were guided by achieving this goal. If they only looked at classification, security, quality, lifecycle, and orchestration, FMS could not define, prioritize, or execute data governance policies and processes that met this goal. They have to understand whether the insight delivered to the market has an impact. They must continually assess whether consumption has produced positive, negative, or neutral results.
While that example is an ambitious undertaking, there is a simple first step: ask and assess whether a recommendation to change a marketing message, optimize product distribution to stores, or prescribe a treatment is producing positive effects, look for the opposite, negative effects as well, and put both into the context of how we manage and govern data. In the new data governance paradigm of balancing risk and reward, we need to build a closed-loop system that oversees raw data (in systems), derived data (insights and intelligence), and the results of consuming each. To simply look at the compliance of data to business rules in data profiling and processing makes data governance a cost center rather than a value center.
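To make the closed loop a little more concrete, here is a minimal sketch in Python of what recording and assessing outcomes might look like. Everything in it is an assumption for illustration: the `Outcome` record, the `classify` and `scorecard` helpers, the 1% tolerance, the idea that a higher number is always better, and the sample figures, which are invented. It is one possible starting point, not a description of any particular tool or governance program.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Outcome:
    """One acted-on recommendation and the business result that followed (hypothetical record)."""
    recommendation: str   # the insight or recommendation that was acted on
    derived_metric: str   # the derived data (total, percentage, index) behind it
    baseline: float       # business result before acting
    observed: float       # business result after acting


def classify(outcome, tolerance=0.01):
    """Label an outcome positive, negative, or neutral by relative change.

    Assumes a higher value is better and that changes within
    +/- tolerance count as neutral; both are illustrative choices.
    """
    if outcome.baseline == 0:
        raise ValueError("baseline must be non-zero")
    change = (outcome.observed - outcome.baseline) / abs(outcome.baseline)
    if change > tolerance:
        return "positive"
    if change < -tolerance:
        return "negative"
    return "neutral"


def scorecard(outcomes, tolerance=0.01):
    """Count positive, negative, and neutral outcomes per derived metric."""
    board = {}
    for outcome in outcomes:
        label = classify(outcome, tolerance)
        board.setdefault(outcome.derived_metric, Counter())[label] += 1
    return board


# Invented sample data, for illustration only.
sample = [
    Outcome("change marketing message", "campaign response index", 100.0, 112.0),
    Outcome("rebalance store distribution", "sell-through percentage", 0.60, 0.54),
    Outcome("keep current pricing", "campaign response index", 100.0, 100.5),
]
board = scorecard(sample)
```

A derived metric that keeps collecting negative or neutral outcomes is a candidate for the governance discussion, even when the underlying raw data passes every profiling rule.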