February 20, 2010
The NoSQL Movement Is Gaining Momentum, But What The Heck Is It?
The NoSQL movement combines an architectural approach to storing data with software products (such as Tokyo Cabinet, CouchDB, and Redis) that can store data without using SQL. Hence the term NoSQL.
The idea is pretty simple: Not all applications need a traditional relational database management system (RDBMS) that uses SQL to perform operations on data. Rather, data can be stored and retrieved using a single key. The NoSQL products that store data using keys are called Key-Value stores (aka KV stores).
Because these KV stores are not relational and lack SQL, they may be faster than an RDBMS: they don't have to maintain indexes or relationship constraints, or parse SQL. The downside of NoSQL is that you cannot easily perform queries against related data.
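To make the key-value idea concrete, here is a minimal sketch in Python. It is not the API of Tokyo Cabinet, CouchDB, Redis, or any other product. The `KVStore` class and its methods are hypothetical names chosen for illustration. The sketch shows both sides of the trade-off. Lookups by key go straight to a hash table. A question about the contents of values, which is the kind of query SQL handles well, forces a scan of every entry.

```python
from typing import Callable, Dict, List, Optional


class KVStore:
    """Toy key-value store: a single hash table, no SQL parser, no secondary indexes."""

    def __init__(self) -> None:
        # Python dicts are hash tables, so get/put by key are O(1) on average.
        self._data: Dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        # The value is an opaque blob; the store never looks inside it.
        self._data[key] = value

    def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)

    def delete(self, key: str) -> None:
        self._data.pop(key, None)

    def scan(self, predicate: Callable[[bytes], bool]) -> List[str]:
        # Without indexes, any query on value contents must visit every entry.
        return [k for k, v in self._data.items() if predicate(v)]


store = KVStore()
store.put("user:42", b'{"name": "Ada"}')
store.put("user:43", b'{"name": "Linus"}')
print(store.get("user:42"))               # b'{"name": "Ada"}'
print(store.scan(lambda v: b"Ada" in v))  # ['user:42']
```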
Bravo To the NoSQL Approach
As an analyst who focuses on helping clients achieve massive scale and blazing fast performance, I will be one of the first to endorse this approach for many Web applications because:
- Scaling is easier. When data is not directly related to any other data, you can store it anywhere. That means you can handle more data by adding nodes.
- The engines are faster. There is less overhead because the KV store does not have to parse SQL or maintain multiple indexes to support relationships. Often a hashing algorithm can be used to retrieve data instead of a more expensive B-tree type algorithm.
- It is easier to change data structures. Need to add a field? No biggie. Many of these NoSQL products store data as blobs. If your data is stored as XML, you may only need to add an attribute or tag rather than think through the impact of adding a column to a table in your database.
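The first two points both rest on hashing. A key's hash decides which node holds the key and where the key sits in that node's table. Below is a minimal sketch that assumes simple modulo partitioning over in-process dictionaries standing in for nodes. `node_for_key` and `PartitionedStore` are hypothetical names. Real products typically use consistent hashing instead, because with plain modulo, changing the node count remaps most keys.

```python
import hashlib
from typing import Dict, List, Optional


def node_for_key(key: str, num_nodes: int) -> int:
    """Map a key to a node index with a stable hash (modulo partitioning)."""
    # md5 is used only because it is stable across processes and spreads keys
    # evenly; Python's built-in hash() of a str is randomized per process.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


class PartitionedStore:
    """Independent key-value pairs spread over several 'nodes' (plain dicts here)."""

    def __init__(self, num_nodes: int) -> None:
        self.nodes: List[Dict[str, bytes]] = [{} for _ in range(num_nodes)]

    def put(self, key: str, value: bytes) -> None:
        self.nodes[node_for_key(key, len(self.nodes))][key] = value

    def get(self, key: str) -> Optional[bytes]:
        return self.nodes[node_for_key(key, len(self.nodes))].get(key)


store = PartitionedStore(num_nodes=4)
for i in range(1000):
    store.put(f"user:{i}", b"profile")
# Each node ends up holding roughly a quarter of the keys.
print([len(node) for node in store.nodes])
```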
Many Web applications simply don't need to represent data as a set of related tables. Instead, data can be represented as an object graph or byte stream identified by a single key. For example, a user profile can be represented as an object graph (such as a POJO) keyed by the user ID. Another example: documents or media files can be stored under a single key, with their metadata indexed by a separate search engine.
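Here is a short sketch of the user-profile example, which also illustrates the earlier "need to add a field?" point. It assumes profiles are serialized as JSON into a plain dictionary standing in for the KV store. JSON is used rather than XML only for brevity, and a Python dict plays the role of the POJO. `save_profile` and `load_profile` are hypothetical helpers, not any product's API. Adding a field needs no schema change. Old records simply lack the field, so readers supply a default.

```python
import json
from typing import Any, Dict

# Hypothetical profile store: user ID -> JSON-encoded profile blob.
profiles: Dict[str, bytes] = {}


def save_profile(user_id: str, profile: Dict[str, Any]) -> None:
    # Serialize the whole object graph into one value under one key.
    profiles[user_id] = json.dumps(profile).encode("utf-8")


def load_profile(user_id: str) -> Dict[str, Any]:
    return json.loads(profiles[user_id].decode("utf-8"))


save_profile("u1001", {"name": "Ada", "email": "ada@example.com"})

# Later the app starts storing a time zone. No ALTER TABLE, no migration:
# new writes just include the field, and old records simply don't have it.
save_profile("u1002", {"name": "Grace", "email": "grace@example.com", "tz": "UTC"})

for uid in ("u1001", "u1002"):
    profile = load_profile(uid)
    print(uid, profile["name"], profile.get("tz", "unknown"))
# u1001 Ada unknown
# u1002 Grace UTC
```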
Elastic Caching Platforms Are KV Stores On Steroids
Elastic caching platforms such as IBM eXtremeScale, GigaSpaces, Terracotta, Microsoft Velocity, Hazelcast, NCache, and Infinispan are essentially in-memory KV stores that provide most of the benefits of NoSQL KV stores but add the following features:
- Lower latency. These platforms store data in memory, which significantly reduces the latency of data operations. In-memory storage is a drawback, though, if you need to persist objects over time or store large objects such as video or documents.
- Reliability. Distributed caching platforms employ clever data replication algorithms that store the data on multiple nodes. If one of the nodes goes down, the platform will serve the data from a backup node.
- Scale-out. Most of the elastic caching platforms let you add and remove nodes while the grid is running. The platforms use sophisticated algorithms to rebalance the data so that all the nodes in the grid are used efficiently.
- Code execution. Some, but not all, of the platforms also let developers distribute the execution of code across the grid. With distributed code execution, developers can send the workload to where the data resides rather than moving the data to the application.
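Replication, failover, and rebalancing can be sketched in a few dozen lines. This is a toy and does not reflect how IBM eXtremeScale, Hazelcast, or any other product named above is implemented. It assumes a consistent-hash ring with 50 virtual points per node (an arbitrary choice), one backup copy per key, and dictionaries standing in for nodes. `ReplicatedGrid`, `owners`, and `add_node` are hypothetical names. A key's primary owner is the first node clockwise on the ring, and its backup is the next distinct node. If the primary goes down, reads fall through to the backup. A new node only takes over the ring segments it lands on, so only a fraction of keys change primary.

```python
import bisect
import hashlib
from typing import Dict, List, Optional, Set, Tuple


def ring_hash(value: str) -> int:
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


class ReplicatedGrid:
    """Toy in-memory grid: consistent hashing plus one backup copy per key."""

    def __init__(self, node_names: List[str], vnodes: int = 50) -> None:
        self.vnodes = vnodes
        self.ring: List[Tuple[int, str]] = []       # sorted (hash, node) points
        self.data: Dict[str, Dict[str, bytes]] = {}  # node name -> its entries
        self.down: Set[str] = set()                  # nodes simulated as crashed
        for name in node_names:
            self._join(name)

    def _join(self, name: str) -> None:
        self.data[name] = {}
        for i in range(self.vnodes):
            bisect.insort(self.ring, (ring_hash(f"{name}#{i}"), name))

    def owners(self, key: str, copies: int = 2) -> List[str]:
        """Primary first, then backups: distinct nodes found walking clockwise."""
        start = bisect.bisect(self.ring, (ring_hash(key), ""))
        found: List[str] = []
        for i in range(len(self.ring)):
            node = self.ring[(start + i) % len(self.ring)][1]
            if node not in found:
                found.append(node)
                if len(found) == copies:
                    break
        return found

    def put(self, key: str, value: bytes) -> None:
        for node in self.owners(key):
            if node not in self.down:
                self.data[node][key] = value  # primary and backup copies

    def get(self, key: str) -> Optional[bytes]:
        for node in self.owners(key):
            if node not in self.down:         # fail over to the backup
                return self.data[node].get(key)
        return None

    def add_node(self, name: str) -> int:
        """Scale out and return how many keys changed primary.

        For simplicity this toy rewrites every key; a real grid would move
        only the keys whose owners actually changed.
        """
        entries = {k: v for node_data in self.data.values() for k, v in node_data.items()}
        before = {k: self.owners(k)[0] for k in entries}
        self._join(name)
        for node_data in self.data.values():
            node_data.clear()
        for k, v in entries.items():
            self.put(k, v)
        return sum(1 for k in entries if self.owners(k)[0] != before[k])


grid = ReplicatedGrid(["node-a", "node-b", "node-c"])
for i in range(300):
    grid.put(f"session:{i}", b"data")

grid.down.add(grid.owners("session:7")[0])  # simulate the primary crashing
print(grid.get("session:7"))                # b'data', served by the backup

grid.down.clear()
moved = grid.add_node("node-d")
print(f"{moved} of 300 keys changed primary")
```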
NoSQL Wants To Be Elastic Caching When It Grows Up
Platforms that often get labeled as NoSQL, such as Apache Cassandra, are closer to elastic caching platforms because they add many of the features of elastic caching technologies. Ultimately, the real difference between NoSQL and elastic caching may now be in-memory versus persistent storage on disk.
Because both are KV stores I predict the following:
- Elastic caching will offer optimization for persistent data stores. Elastic caching platforms will include better features for customers who want the reliability of replication and the benefits of scale-out but do not need the low latency of in-memory stores. Most products can already read and write data from databases, but only by going through the in-memory cache first. Direct persistence features would be a good approach for large objects such as media files or documents.
- Many of the NoSQL platforms will grow up. Platforms generally associated with NoSQL will evolve to gain reliability through replication and automatic scale-out features. Some will simply remain superbad KV stores honed for a single purpose: serving as a single-repository KV store.
- Query and search across data in the KV stores will be the next big feature. Huh? I thought NoSQL and KV stores were for apps that didn't need much query and search. Wherever data is stored, someone will want to query or search it. But querying is hard when the data is distributed. Try doing a simple aggregate, like counting the number of objects that meet certain criteria. It is hard.
- Code execution is next. As I mentioned above, many of the elastic caching platforms also offer distributed code execution. This lets developers run object code near where the data is stored. Clever developers can implement map/reduce-like applications to process large workloads without moving data around. Or, they can host services on the nodes.
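Distributed code execution offers one answer to the aggregate problem. Instead of pulling every object back to the application to count it, send the counting function to each partition and add up the small partial results. Below is a minimal map/reduce-style sketch. It assumes the partitions are plain dictionaries and that threads stand in for grid nodes. `count_matching` and `distributed_count` are hypothetical names, not any platform's API.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Partition = Dict[str, dict]


def count_matching(partition: Partition, predicate: Callable[[dict], bool]) -> int:
    """The 'map' step: runs where the partition lives and returns only a number."""
    return sum(1 for value in partition.values() if predicate(value))


def distributed_count(partitions: List[Partition], predicate: Callable[[dict], bool]) -> int:
    # Ship the predicate to every partition in parallel (threads stand in for
    # nodes), then 'reduce' by summing the partial counts on the caller.
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        partials = pool.map(lambda p: count_matching(p, predicate), partitions)
        return sum(partials)


partitions = [
    {"order:1": {"status": "shipped"}, "order:2": {"status": "pending"}},
    {"order:3": {"status": "shipped"}},
    {"order:4": {"status": "cancelled"}, "order:5": {"status": "shipped"}},
]
print(distributed_count(partitions, lambda o: o["status"] == "shipped"))  # 3
```

Only one integer per partition crosses the "network" here, which is the point of moving code to the data.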
Whoa. Did I just define the characteristics of a database: persistent storage, query, and stored procedures (code execution)? Back to the future?
Say "Yes" To Elastic KV Stores In Your Architecture
Enterprise application developers and architects should include elastic KV stores in their architectures because:
- Achieve savings by reducing usurious RDBMS license and maintenance costs.
- Add a scaling layer in front of your databases or other data sources.
- Improve performance of Web applications that store session and shared application data.
- Elastic caching and cloud computing are a match made in heaven for app scaling in the cloud.
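A scaling layer in front of a database is commonly built as a cache-aside pattern. The application asks the cache first and goes to the database only on a miss. Below is a minimal single-process sketch. It assumes a time-to-live for entries and a `load_from_db` function standing in for a real SQL query. Both `CacheAside` and `load_from_db` are hypothetical. An elastic caching platform would put the same logic over a distributed, replicated store, which lets every application server share the cached data and session state.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class CacheAside:
    """Check the cache first; on a miss, load from the backing store and cache it."""

    def __init__(self, loader: Callable[[str], Optional[str]], ttl_seconds: float = 60.0) -> None:
        self.loader = loader  # e.g. a function that queries the RDBMS
        self.ttl = ttl_seconds
        self._entries: Dict[str, Tuple[float, Optional[str]]] = {}  # key -> (expiry, value)
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> Optional[str]:
        now = time.monotonic()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            self.hits += 1
            return entry[1]
        self.misses += 1
        value = self.loader(key)  # only misses reach the database
        self._entries[key] = (now + self.ttl, value)
        return value

    def invalidate(self, key: str) -> None:
        # Call after writing to the database so readers don't see stale data.
        self._entries.pop(key, None)


db_calls = []


def load_from_db(key: str) -> Optional[str]:
    db_calls.append(key)  # stand-in for a real SQL query
    return {"product:1": "Widget"}.get(key)


cache = CacheAside(load_from_db, ttl_seconds=30)
for _ in range(5):
    cache.get("product:1")
print(len(db_calls), cache.hits, cache.misses)  # 1 4 1
```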
John Rymer and I plan to publish research on Elastic Caching Platforms (including a Wave) during the first part of Q2 2010. Look for it. If you have a NoSQL or elastic caching success story, we would love to hear from you.
Coverage: Blazing-fast and massively scalable Web and application architectures, development, and user experience design