By James Kobielus

Databases are evolving faster than ever. Long regarded as an essential but slightly boring centerpiece of the enterprise information management infrastructure, the database is becoming more fluid and adaptive in architecture to keep pace with an online world that is being virtualized at every level.

In many ways, the database as we know it is disappearing into a virtualization fabric of its own. In this emerging paradigm, data will not physically reside anywhere in particular. Instead, it will be transparently persisted, in a growing range of physical and logical formats, to an abstract, seamless grid of interconnected memory and disk resources, and delivered with subsecond latency to consuming applications. Forrester’s ongoing research into the growing market for in-memory distributed-caching middleware shows that this trend is accelerating.
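To make that tiering concrete, here is a minimal Python sketch of the read-through, memory-over-disk caching pattern that distributed-caching middleware generalizes across a grid of nodes. The class and method names are my own invention for illustration, not any vendor’s API:

```python
import shelve  # simple on-disk key-value persistence from the standard library

class FabricCache:
    """Toy read-through cache: an in-memory tier fronting a disk-backed store."""

    def __init__(self, disk_path: str):
        self._memory: dict[str, object] = {}   # hot tier: process memory
        self._disk = shelve.open(disk_path)    # cold tier: local disk

    def get(self, key: str):
        if key in self._memory:                # hit the memory tier first
            return self._memory[key]
        value = self._disk.get(key)            # fall back to disk
        if value is not None:
            self._memory[key] = value          # promote for later reads
        return value

    def put(self, key: str, value: object) -> None:
        self._memory[key] = value              # write to the hot tier
        self._disk[key] = value                # transparently persist to disk

    def close(self) -> None:
        self._disk.close()

cache = FabricCache("orders.db")
cache.put("order:42", {"customer": "acme", "total": 99.5})
print(cache.get("order:42"))
cache.close()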

As database revolutions pick up speed, information and knowledge management (I&KM) professionals are likely to get a bit dizzy trying to keep their perspective and sort through competing approaches. When and where should you implement in-memory vs. on-disk data-persistence approaches? When should you go with row-based vs. column-oriented vs. inverted indexing vs. other physical storage models? When does it make sense to implement any of the competing vendor-specific OLAP variants, old and new, for logical modeling (MOLAP, ROLAP, HOLAP, DOLAP, D’oh!LAP, SchmoLAP, etc.)? When should you federate your databases behind an on-demand semantic virtualization middleware layer vs. consolidate them in an enterprise data warehouse? When should you buy into one vendor’s analytic-database religion (be it columnar or whatever), and when should you remain strictly storage-layer-agnostic?
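The row-versus-column question, at least, lends itself to a concrete illustration. Here is a toy Python comparison (the data is invented for the example) of the same table in both physical layouts, showing why analytic aggregates favor columnar storage while record-level lookups and updates favor rows:

```python
# The same three-row table in a row-oriented and a column-oriented layout.

rows = [                                  # row store: one record per entry,
    {"id": 1, "region": "East", "sales": 100.0},   # good for point lookups/updates
    {"id": 2, "region": "West", "sales": 250.0},
    {"id": 3, "region": "East", "sales": 175.0},
]

columns = {                               # column store: one array per attribute,
    "id": [1, 2, 3],                      # good for scans over a few columns
    "region": ["East", "West", "East"],
    "sales": [100.0, 250.0, 175.0],
}

# An aggregate like SUM(sales) touches every field of every row in the row
# layout, but only one contiguous array in the columnar layout:
total_row_store = sum(r["sales"] for r in rows)
total_col_store = sum(columns["sales"])
assert total_row_store == total_col_store == 525.0
```

On disk, the columnar layout also tends to compress better, since each array holds values of a single type.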

One of the chief trends driving database virtualization is users’ need for a more robust middleware fabric in support of real-time BI. To guarantee subsecond latency for BI, the infrastructure must incorporate a policy-driven, latency-agile, distributed-caching memory grid from end to end. However, the convergence of real-time business-intelligence approaches onto a unified, in-memory, distributed-caching infrastructure may take more than a decade to come to fruition because of the immaturity of the technology; the lack of multivendor standards; and the spotty, fragmented implementation of its enabling technologies among today’s BI/DW vendors.
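What might “policy-driven” and “latency-agile” mean in practice? A minimal sketch, assuming three storage tiers with made-up latency figures; none of this reflects any vendor’s actual policy engine:

```python
from dataclasses import dataclass

@dataclass
class LatencyPolicy:
    """Hypothetical per-workload policy: a latency budget for reads."""
    max_latency_ms: float

# Assumed tier latencies, for illustration only.
TIER_LATENCY_MS = {"memory_grid": 1.0, "local_disk": 10.0, "warehouse": 500.0}

def choose_tier(policy: LatencyPolicy) -> str:
    # Prefer the slowest (and presumably cheapest) tier that meets the budget.
    for tier in ("warehouse", "local_disk", "memory_grid"):
        if TIER_LATENCY_MS[tier] <= policy.max_latency_ms:
            return tier
    raise ValueError("no tier satisfies the latency budget")

# A real-time dashboard with a subsecond budget lands on the memory grid...
assert choose_tier(LatencyPolicy(max_latency_ms=1.0)) == "memory_grid"
# ...while an overnight batch report can tolerate the warehouse tier.
assert choose_tier(LatencyPolicy(max_latency_ms=1000.0)) == "warehouse"
```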

Nevertheless, all signs point to this trend’s inevitability, most notably Microsoft’s recent announcement that it is developing its own information fabric platform, codenamed "Project Velocity," to beef up its real-time analytic and transactional computing capabilities. Bear in mind that no BI/DW vendor has clearly spelled out its approach for supporting the full range of physical and logical data-persistence models across its real-time information fabric. But it’s quite clear that the industry is moving toward a new paradigm wherein the optimal data-persistence model will be provisioned automatically to each node based on its deployment role (EDW, ODS, staging, data mart, and so on), and in which data will be written to whatever blend of virtualized memory and disk best suits applications’ real-time requirements.
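To picture that role-based provisioning, here is a hypothetical mapping from the deployment roles named above to persistence blends. The specific assignments are my illustrative assumptions, not anyone’s published roadmap:

```python
# Sketch of role-driven provisioning: each node's deployment role maps to a
# persistence blend. The role names come from the article; the storage and
# layout assignments are invented for illustration.

ROLE_PERSISTENCE = {
    "edw":       {"storage": "disk",   "layout": "columnar"},  # deep history, scan-heavy
    "ods":       {"storage": "memory", "layout": "row"},       # operational, low-latency writes
    "staging":   {"storage": "disk",   "layout": "row"},       # bulk loads, sequential writes
    "data_mart": {"storage": "memory", "layout": "columnar"},  # interactive analytics
}

def provision(role: str) -> dict:
    """Return the persistence model for a node, keyed by its deployment role."""
    try:
        return ROLE_PERSISTENCE[role]
    except KeyError:
        raise ValueError(f"unknown deployment role: {role}") from None

print(provision("data_mart"))  # {'storage': 'memory', 'layout': 'columnar'}
```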

It would be ridiculous to imagine this evolution will take place overnight. Even if solution vendors suddenly converged on a common information-fabric framework (which is highly doubtful), I&KM managers have too much invested in their enterprises’ current data environments to justify a wholesale migration to a virtualized architecture anytime soon.