First there was Hadoop. Then there were data scientists. Then came Agile BI on big data. Drum roll, please . . . bum, bum, bum, bum . . .
Now we have data preparation!
If you are as passionate about data quality and governance as I am, then the 5+-year wait for a scalable capability to take on data trust is amazingly validating. The era of "good enough" when it comes to big data is giving way to an understanding that the way analysts have gotten away with "good enough" was through a significant amount of manual data wrangling. As an analyst, it must have felt like your parents saying you couldn't see your friends and play outside until you cleaned your room (and if it's anything like my kids' rooms, that's a tall order).
There is no denying that analysts are the first to benefit from data preparation tools such as Alteryx, Paxata, and Trifacta. For them, it's a matter of faster time to value for insight. What is still unrecognized in the broader data management and governance strategy is that these early forays are laying the foundation for data citizenry and the cultural shift toward a truly data-driven organization.
Today's data reality is that consumers of data are like any other consumers; they want to shop for what they need. This data consumer journey begins by looking in their own spreadsheets, databases, and warehouses. When they can't find what they want there, data consumers turn to external sources such as partners, third parties, and the Web. Data preparation tools help them judge the value of that data and, ultimately, decide whether they will procure it and possibly pay for it. The other outcome of this data-shopping experience is that data consumers take on the risk and accountability for the value of the data as it is introduced into analysis, decision-making, and automation.
Think about it — those centralized data governance committees of data stewards and custodians can only do so much. Their goal is to drive broader cultural accountability for data. However, traditional data practices emphasized governing data to maintain systems of record and define authoritative sources. What data preparation tools are doing is shifting the data governance model to transition data stewardship and custodianship broadly to data consumers, who will shape and govern data toward end-point interaction — business processes, machine processes, mobile consumption, etc.
What this means for data governance and management teams is that they not only get support from their broader data citizens but they also have a window into the wider consumption and use of data. Data pros can translate this into more scalable and agile capabilities and controls without lengthy requirements gathering. They can even use data preparation tools to prototype and validate new data marts; data sets and models for big data; and virtualized data for analysis, applications, and APIs.
Data preparation is the missing capability that is readily addressing analyst needs today, but it will take on greater importance as a way to scale not only governance but also the value that data provides to the business. Without it, data strategy is stuck in the warehouse. With data preparation, data will drive insights into action at scale.