An Approach to Converging the Worlds of Big Data and BI
Webster's dictionary defines a synonym as "a word having the same or nearly the same meaning" or as "a word or expression accepted as another name for something." That description fits popular definitions of BI and big data remarkably well. Forrester defines BI as:
A set of methodologies, processes, architectures, and technologies that transform raw data into meaningful and useful information used to enable more effective strategic, tactical, and operational insights and decision-making.
While BI has been a thriving market for decades and will continue to flourish for the foreseeable future, the world doesn't stand still, and the market:
- Recognizes a need for more innovation. Some approaches in earlier-generation BI applications and platforms started to hit a ceiling a few years ago. For example, SQL and SQL-based database management systems (DBMS), while mature, scalable, and robust, are not agile and flexible enough in a modern world where change is the only constant.
- Needs to address the limitations of earlier-generation BI. To overcome the constraints of more traditional and established BI technologies, big data offers more agile and flexible alternatives, such as NoSQL among many others, that help democratize all data.
Forrester defines big data as:
The practices and technologies that close the gap between the data available and the ability to turn that data into business insight.
But at the end of the day, while new terms help emphasize the need to evolve, change, and innovate, what matters far more is that both strive toward the same goal: transforming data into information and insight. Alas, while many developers are beginning to recognize the synergies and overlaps between BI and big data, quite a few still treat and run the two in separate silos.
Contrary to some of the market hype, data democratization and big data do not eliminate the need for "BI 101" basics such as data governance, data quality, master data management, data modeling, a well-thought-out data architecture, and many others. If anything, big data makes these tasks and processes more challenging: more data is available to more people, which in turn can introduce new mistakes and lead to wrong conclusions. All of the typical end-to-end steps needed to transform raw data into insights still have to happen; they now just happen in different places and at different times in the process.
To address this challenge with a "have your cake and eat it too" approach, Forrester suggests integrating the worlds of BI and big data in a flexible hub-and-spoke data platform. Our hub-and-spoke BI/big data architecture defines components such as:
- Hadoop-based data hubs/lakes to store and process the majority of enterprise data
- Data discovery accelerators to help profile and discover definitions and meanings in data sources
- Data governance that differentiates the processes you need to perform at the ingest, move, use, and monitor stages
- BI that becomes one of many spokes of the Hadoop-based data hub
- A knowledge management portal that serves as a front end to multiple BI spokes
- Integrated metadata for data lineage and impact analysis
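The integrated metadata component is what makes impact analysis practical: if lineage is recorded as a directed graph of "feeds" relationships between datasets in the hub and the BI spokes, finding everything affected by a change to one source becomes a simple graph traversal. The sketch below is a minimal illustration only, assuming lineage is held as an in-memory adjacency map; the dataset names and the functions `impact_of` and `lineage_of` are hypothetical and do not come from Forrester or from any metadata product's API.

```python
from collections import deque

# Hypothetical lineage metadata: each key feeds the datasets listed as its values.
# Edges point downstream (source -> derived), spanning the Hadoop hub and BI spokes.
LINEAGE: dict[str, list[str]] = {
    "crm_raw": ["customer_hub"],
    "erp_raw": ["orders_hub"],
    "customer_hub": ["sales_mart", "churn_model"],
    "orders_hub": ["sales_mart"],
    "sales_mart": ["exec_dashboard"],
    "churn_model": [],
    "exec_dashboard": [],
}


def impact_of(changed: str, graph: dict[str, list[str]]) -> set[str]:
    """Impact analysis: every node reachable from `changed` (breadth-first)."""
    affected: set[str] = set()
    queue = deque(graph.get(changed, []))
    while queue:
        node = queue.popleft()
        if node in affected:
            continue  # already visited; also protects against cycles
        affected.add(node)
        queue.extend(graph.get(node, []))
    return affected


def lineage_of(target: str, graph: dict[str, list[str]]) -> set[str]:
    """Data lineage: every upstream source of `target`.

    Builds the reversed graph (derived -> source) and reuses the same traversal.
    """
    parents: dict[str, list[str]] = {}
    for src, dsts in graph.items():
        for dst in dsts:
            parents.setdefault(dst, []).append(src)
    return impact_of(target, parents)


if __name__ == "__main__":
    print("Changing crm_raw affects:", sorted(impact_of("crm_raw", LINEAGE)))
    print("exec_dashboard is built from:", sorted(lineage_of("exec_dashboard", LINEAGE)))
```

Using one traversal for both directions keeps lineage ("where did this number come from?") and impact analysis ("what breaks if this source changes?") consistent with each other, since both read the same metadata.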
Our research also recommends architecting the hub-and-spoke environment around three key areas:
- A "cold" layer based on Hadoop, where processes may run slower than in a DBMS but the total cost of ownership is much lower. This is where the majority of your enterprise data should end up.
- A "warm" area based on a DBMS, where queries run faster, but at a price. Forrester typically sees less than 30% of enterprise data stored and processed in data warehouses and data marts.
- A "hot" area based on in-memory technology for real-time, low-latency interactive data exploration. While this area requires the most expensive software/hardware investments, real-time data interactivity produces tangible business benefits.
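To make the cold/warm/hot split more concrete, the sketch below places each dataset in the cheapest layer by default and promotes it only when its latency requirement or query volume justifies the higher price. This is a hypothetical placement rule, not a Forrester formula: the `Workload` fields and every threshold (1 second, 60 seconds, 100 queries per day) are illustrative assumptions that each organization would tune against its own cost and performance data. The `tier_shares` helper reports the share of data volume per layer, which can be compared with the observation that warm-layer storage typically stays under 30%.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    HOT = "in-memory"  # real-time, low-latency interactive exploration
    WARM = "DBMS"      # data warehouses and data marts
    COLD = "Hadoop"    # lowest total cost of ownership; default landing zone


@dataclass
class Workload:
    name: str
    max_latency_s: float  # slowest acceptable query response, in seconds
    queries_per_day: int  # how often users query this data
    size_tb: float        # data volume, in terabytes


# Hypothetical thresholds; tune to your own cost/performance trade-offs.
HOT_LATENCY_S = 1.0
WARM_LATENCY_S = 60.0
WARM_MIN_QUERIES = 100


def place(w: Workload) -> Tier:
    """Start in the cheapest tier; promote only when latency or volume demands it."""
    if w.max_latency_s <= HOT_LATENCY_S:
        return Tier.HOT
    if w.max_latency_s <= WARM_LATENCY_S or w.queries_per_day >= WARM_MIN_QUERIES:
        return Tier.WARM
    return Tier.COLD


def tier_shares(workloads: list[Workload]) -> dict[Tier, float]:
    """Fraction of total data volume that lands in each tier."""
    total = sum(w.size_tb for w in workloads)
    shares = {tier: 0.0 for tier in Tier}
    for w in workloads:
        shares[place(w)] += w.size_tb / total
    return shares


if __name__ == "__main__":
    # Made-up workloads, for illustration only.
    portfolio = [
        Workload("clickstream_archive", 3600.0, 5, 400.0),
        Workload("finance_mart", 10.0, 500, 40.0),
        Workload("fraud_signals", 0.2, 10_000, 2.0),
        Workload("sensor_history", 600.0, 20, 158.0),
    ]
    for w in portfolio:
        print(f"{w.name}: {place(w).name} ({place(w).value})")
    for tier, share in tier_shares(portfolio).items():
        print(f"{tier.name}: {share:.1%} of data volume")
```

Defaulting to the cold layer mirrors the recommendation that most enterprise data should end up on Hadoop; the more expensive warm and hot layers are reserved for data whose access patterns actually pay for them.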