November 18, 2011
As renowned data visualization expert Edward Tufte once said, “The world is complex, dynamic, multidimensional; the paper is static, flat. How are we to represent the rich visual world of experience and measurement on mere flatland?” There is indeed too much information out there for knowledge workers in any category to analyze effectively. More often than not, traditional tabular row-and-column reports do not paint the whole picture or, even worse, can lead an analyst to a wrong conclusion. There are multiple reasons to use data visualization; the three main ones are that one:
- Cannot see a pattern without data visualization. Simply seeing numbers on a grid often does not tell the whole story; in the worst case, it can even lead one to a wrong conclusion. This is best demonstrated by Anscombe’s quartet, where four groups of x and y coordinates with nearly identical summary statistics (means, variances, and correlation) reveal very different patterns when represented in a graph.
- Cannot fit all of the necessary data points onto a single screen. Even with the smallest reasonably readable font, single line spacing, and no grid, one cannot realistically fit more than a few thousand data points using numerical information only. When using advanced data visualization techniques, one can fit tens of thousands of data points onto a single screen, a difference of an order of magnitude. In The Visual Display of Quantitative Information, Edward Tufte gives an example of more than 21,000 data points effectively displayed on a US map that fits onto a single screen.
- Cannot effectively show deep and broad data sets on a single screen. While fitting billions of rows of data onto a single screen can be challenging, that challenge has been mostly solved using various data aggregation and grouping techniques. Fitting and analyzing hundreds, and often thousands, of columns is an entirely different challenge. Just imagine a typical drug trial conducted by the pharmaceutical industry, where each patient undergoing trials has thousands of attributes: physical, psychological, genetic, behavioral, etc. Analysts looking for patterns, dependencies, and correlations typically need to run the data through complex statistical models before they can find anything. Building such models and running them through millions of rows of data can be time-consuming and can tax even the most advanced software and hardware resources. A technique often used in the pharma industry offers a shortcut: reducing each data point in a column to a single pixel and color-coding the pixels according to their value ranges can allow an analyst to visualize and identify a pattern relatively easily and then quickly zoom in to research the details.
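To make the Anscombe's quartet point from the first reason concrete, here is a minimal Python sketch that computes the usual summary statistics for the four groups. The data values are the ones published by Anscombe in 1973; the function and variable names (`pearson`, `summarize`, `QUARTET`) are my own illustrative choices, not part of any tool discussed here.

```python
from math import sqrt
from statistics import mean, variance

# Anscombe's quartet: groups I-III share the same x values.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson(xs, ys):
    """Pearson correlation coefficient, computed by hand to avoid version-specific helpers."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))


def summarize(xs, ys):
    """Return the statistics a tabular report would typically show."""
    return {
        "mean_x": mean(xs),
        "mean_y": mean(ys),
        "var_x": variance(xs),  # sample variance
        "var_y": variance(ys),
        "r": pearson(xs, ys),
    }


summaries = {name: summarize(xs, ys) for name, (xs, ys) in QUARTET.items()}
for name, stats in summaries.items():
    print(name, {key: round(value, 2) for key, value in stats.items()})
```

The four printed rows should come out nearly identical, which is exactly why the grid of numbers misleads: only a scatter plot of each group shows that one is roughly linear, one is curved, and two are dominated by a single outlier.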
What’s different between traditional static graphs and charts and modern advanced data visualization (ADV)?
Many corporations have used traditional business graphics like bar charts and pie charts effectively in the past, and those graphics will continue to have their place. At the next level, modern technologies have enabled the use of more dynamic and interactive business graphics, such as real-time dashboards and charts that update automatically as the data changes. Now, through ADV, the potential exists for nontraditional and more visually rich approaches, especially in regard to more complex (i.e., thousands of dimensions or attributes) or larger (i.e., billions of rows) data sets, to reveal insights not possible through conventional means. Forrester differentiates ADV from static graphs and charts along six capabilities, as follows:
- Dynamic data content. These visualizations are linked to data sets (databases) and are updated as the data set changes. Static visualizations produced in most Office documents typically do not have such functionality.
- Visual querying. This is the ability to query and requery data simply by manipulating visual portions of graphs and charts (like clicking on a column to drill into details) or by using visual instrumentation (like dropdown lists, push buttons, and tabs).
- Multiple linked visualizations. A typical single chart or graph cannot display more than a few dimensions (attributes such as region and time) at a time. To visualize and analyze data by multiple dimensions or attributes, one typically needs to display several graphs, charts, or panels and have them dynamically linked. Navigating through a dimension in one panel automatically updates all visualizations on all other panels.
- Animation. If a particular dimension, such as time, has hundreds or thousands of values (as in daily values over multiple years), manually clicking through every day is not practical. Launching an automated, animated scroll through such a dimension is a more practical approach.
- Personalization. What is intuitive and obvious to one analyst may not be obvious to another. Also, to address privacy and risk issues, many organizations have different levels of access to data for different user groups and individual users. ADV tools must be automatically personalized based on users’ access and authorization levels, locality, and personal preferences.
- Actionable alerts. Even data visualizations often cannot lead an analyst to a conclusion if there’s just too much information to comprehend on a single screen. In an example cited earlier, with more than 21,000 data points on a single screen, one cannot be expected to identify a certain condition unless the software can generate visual alerts (like color-coding or flashing). Also, if one is not looking at the visualization when a certain condition is triggered, an ADV application can be programmed to automatically notify the appropriate person with an email or a text message.
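As a rough illustration of the actionable-alerts capability, the Python sketch below color-codes data points by value range and collects the ones that should trigger a notification. Everything in it is hypothetical: the thresholds, the color names, the `Alert` type, and the `scan` function are assumptions made for the example, and the notification is only printed rather than sent by email or text message.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    index: int    # position of the data point in the series
    value: float  # the value that triggered the alert
    color: str    # the visual cue assigned to the point


def color_for(value, warn_at, critical_at):
    """Map a value to a color bucket based on two assumed thresholds."""
    if value >= critical_at:
        return "red"
    if value >= warn_at:
        return "yellow"
    return "green"


def scan(values, warn_at, critical_at):
    """Color-code every point and return alerts for the critical ones."""
    colors = [color_for(v, warn_at, critical_at) for v in values]
    alerts = [
        Alert(i, v, c)
        for i, (v, c) in enumerate(zip(values, colors))
        if c == "red"
    ]
    return colors, alerts


# Made-up readings purely for demonstration.
readings = [12.0, 48.5, 73.2, 95.1, 30.0]
colors, alerts = scan(readings, warn_at=50.0, critical_at=90.0)

for alert in alerts:
    # A real ADV application would hand this off to an email or SMS service.
    print(f"notify analyst: point {alert.index} = {alert.value} ({alert.color})")
```

Separating the color assignment from the alert list mirrors the two halves of the capability: the colors drive what the analyst sees on screen, while the alert list drives notifications for anyone who is not looking at the screen.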
I highly recommend books and research by Perceptual Edge for much more detail on this topic.