July 31, 2013
Sometimes getting the data quality right is just hard, if not impossible. Even after implementing data quality tools, acquiring third-party data feeds, and putting data steward remediation processes in place, the business is often still not satisfied with the quality of the data. Data is still missing, or is considered old or irrelevant. For example: insurance companies want access to construction data to improve catastrophe modeling. Food chains need to incorporate drop-off bays and instructions for outlets in shopping malls and plazas to get food supplies to the prep tables. Global companies need to validate address information for logistics in developing countries that have incomplete or fast-changing postal directories. Completing and improving data like this has now entered the realm of hands-on processes.
Crowdflower says it has the answer to the data challenges listed above. Its model combines crowdsourcing with a data stewardship platform to manage the last mile in data quality. The crowd is a vast network of people around the globe who are notified of data quality tasks through the data stewardship platform. If a contributor can help with the data quality need within the time period requested, the contributor accepts the task and gets to work. The crowd can use all resources and channels available to them to complete tasks, such as web searches, visits, and phone inquiries. Quality control is performed to validate crowdsourced data and improvements. If an organization has more data quality tasks, machine learning is applied to analyze and optimize the crowdsourcing based on the scores and results of contributors.
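To make the quality-control step concrete, here is a minimal sketch of one common way to validate crowdsourced answers: a trust-weighted vote across contributors. This is an illustration only, not Crowdflower's actual method. The function name `aggregate_answers`, the `trust` scores, the default score of 0.5 for unknown contributors, and the `min_agreement` threshold are all assumptions made for the example.

```python
from collections import defaultdict


def aggregate_answers(judgments, trust, min_agreement=0.7):
    """Pick the answer with the highest trust-weighted support.

    judgments: list of (contributor_id, answer) pairs for one task.
    trust: dict mapping contributor_id to a score between 0 and 1
           (hypothetical; unknown contributors default to 0.5).
    Returns (answer, confidence); answer is None when agreement is
    below min_agreement, meaning the task needs further review.
    """
    weights = defaultdict(float)
    for contributor, answer in judgments:
        weights[answer] += trust.get(contributor, 0.5)

    total = sum(weights.values())
    if total == 0:
        return None, 0.0

    best = max(weights, key=weights.get)
    confidence = weights[best] / total
    return (best if confidence >= min_agreement else None), confidence


# Hypothetical address-validation task answered by three contributors.
judgments = [("a", "12 Main St"), ("b", "12 Main St"), ("c", "21 Main St")]
trust = {"a": 0.9, "b": 0.8, "c": 0.4}
print(aggregate_answers(judgments, trust))
```

In this example the two higher-scoring contributors agree, so "12 Main St" is accepted with a confidence of about 0.81. A task where the contributors split evenly would come back as `None` and could be routed to more contributors or to a data steward.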
Organizations also have the option to use only the platform and apply crowdsourcing strategies within the organization. It then becomes a data governance platform for managing remediation workflow and proactively addressing data quality issues by calling out to subject matter experts who can quickly contribute.
What I find interesting is that Crowdflower is changing the game in a way that doesn't just address data quality problems but gets to the heart of a sustainable and impactful data quality program.
- Data citizenship is achievable. The platform gives business users anywhere in the organization a place where tasks can be requested and acted on by internal or external contributors who have the experience to resolve them. Rather than data governance, a data community is developed, promoting widespread responsibility.
- Infusing agility into data quality. Data quality tasks are input and acted on when needed without ramping up a data quality project to change integration and data quality processes.
- Priority is linked to business impact up front. As business data users and owners identify data quality issues, the tasks they request are determined by how these issues affect a business process or outcome. Rather than having to think about what matters, data quality tasks just happen.
- Data quality defined by the business. The data quality dimensions that matter are addressed, rather than everything. Tasks can range from validating and classifying insight down to data hygiene. Trust can be based on context and relevance, not just data quality rules.
There are still some outstanding questions about ensuring trust in data coming in from crowdsourced tasks. Social-style data services and portals launched in the past have had difficulty sustaining high-quality data over the long term. There is also the issue of adhering to privacy and security policies when sourcing and fixing data. Crowdflower feels its crowdsourcing process and platform have the right quality controls in place to assuage these concerns, but organizations should see how this aligns with their internal policies and requirements.
Overall, the data quality challenge is as much about what rules to apply as it is about the process and program put in place to be successful. Platforms that support wider data governance while tackling complex data quality issues will become more important as big data and analytics move from reports to operational processes. Agility, managing data quality to the edge, and knowing what to focus on become more important. Crowdflower begins to address these critical factors and is worth taking a look at.