Comparative Technical Use Cases for Distributed Caches and NoSQL

I have been doing some NoSQL research lately. The first fruit of that work was a guest post on myNoSQL, Ehcache: Distributed Cache or NoSQL Store, which crisply distinguished between a Distributed Cache and NoSQL Stores.

In this article I am going to delve into the suitability of each for various technical use cases. I use the word “technical” because a usual use case is a business use case. Here we are interested in a set of features that allow a certain usage. In a follow up I hope to create a second, more business use case oriented table.

I welcome feedback on this, particularly from those with production experience.

Technical Use Case Distributed Cache NoSQL Key Value NoSQL Columnar NoSQL Graph NoSQL Document
Database Offload Excellent Poor (1) Poor (1) Poor (1) Poor (1)
Database Replacement Poor (2) Poor (3) Poor (3) Poor (3) Poor (3)
Weak Consistency Caching Excellent Average (2) Average Poor Poor
Eventual Consistency Caching Excellent (4) Average (5) Average (5) Average (5) Average (5)
Strongly Consistent Caching Excellent Poor Poor Poor Poor
ACID Transactional Caching Excellent Poor Poor Poor Poor
Low Latency Data Access Excellent Average (5) Average (5) Average (5) Average (5)
Big Data (6) Poor Excellent Excellent Excellent Excellent
Big Memory (7) Excellent (8) Poor Poor Poor Poor

Notes

  1. To offload the database you need to work in places and ways in which the database works. So for example you need to support transactions if they are being used and you need a place to plug in to avoid a ton of work like Hibernate or OpenJPA. NoSQL stores don’t do that.
  2. Distributed caches may not provide long term persistence and management of data. They are also often limited in size so may not be able to store all of the data.
  3. It is not clear that NoSQL is a full database replacement. The “Not Just SQL” as an alternative expansion of the acronym, something widely accepted by the NoSQL community, acknowledges this. The lack of SQL, the lack of ACID, sophisticated operations tools and so on, mean that NoSQL itself is not great at being a replacement. Rather, if you can rethink your need for a database to needing persistence, and you can change your application code, then it comes into play.
  4. In a node to the elegant CAP trade off allowed by eventual consistency, Ehcache 2.4.1, due out the end of March adds this consistency mode.
  5. Distributed Caches store hot data in process. You might think of memcache as a distributed cache, which it claims to be but it does not store data in -process – it is always over the network. And NoSQL is always over the network. In most R + W > N strategies, R is greater than one, so that multiple network reads are required and the caller must wait for R reads where each read is to a different server which will have a varying response. Distributed Ehcache has latencies of < 1 ms whereas the average for NoSQL is 5-10ms. This is also why NoSQL gets an average for Weak Consistency Caching. A cache should be fast.
  6. “Big Data” is a moving target that is today generally understood to start at a few dozen terabytes and go up into petabytes. The current implementation of Ehcache has been used to manage datasets up to 2 TB which is just at the starting point of Big Data. The whole point of NoSQL is Big Data, so they get full marks in this area.
  7. “Big Memory” is also a moving target and is early on it’s use as a term. We define it to mean using the physical limits of the hardware. For many architectures this has not been possible. With Java the issue was first 32 bits and then now the limitation is garbage collection. We overcame that issue with our BigMemory architecture, using storage in off-heap byte buffers in September 2010.
  8. Caches tends to be memory-resident. BigMemory allows in-memory densities per physical server up to their limits, which is 2TB for the current generation of commodity hardware from Dell, HP and Oracle but much lower due to their architectures which require full CPU population to achieve maximum memory. Although not all vendors are similarly constrained: Cisco UCS boxes allow more memory per CPU, so that for example they can do 384GB with 2 CPUs. NoSQL stores focus on persistency and have small in-memory server side caches. They focus on speeding up disk reads and writes by for example doing append only.