Comparative Technical Use Cases for Distributed Caches and NoSQL

I have been doing some NoSQL research lately. The first fruit of that work was a guest post on myNoSQL, Ehcache: Distributed Cache or NoSQL Store, which crisply distinguished between a Distributed Cache and NoSQL Stores.

In this article I am going to delve into the suitability of each for various technical use cases. I use the word “technical” because a use case usually means a business use case. Here we are interested in a set of features that allow a certain usage. In a follow-up I hope to create a second, more business-oriented table.

I welcome feedback on this, particularly from those with production experience.

Technical Use Case           | Distributed Cache | NoSQL Key Value | NoSQL Columnar | NoSQL Graph | NoSQL Document
-----------------------------|-------------------|-----------------|----------------|-------------|---------------
Database Offload             | Excellent         | Poor (1)        | Poor (1)       | Poor (1)    | Poor (1)
Database Replacement         | Poor (2)          | Poor (3)        | Poor (3)       | Poor (3)    | Poor (3)
Weak Consistency Caching     | Excellent         | Average (2)     | Average        | Poor        | Poor
Eventual Consistency Caching | Excellent (4)     | Average (5)     | Average (5)    | Average (5) | Average (5)
Strongly Consistent Caching  | Excellent         | Poor            | Poor           | Poor        | Poor
ACID Transactional Caching   | Excellent         | Poor            | Poor           | Poor        | Poor
Low Latency Data Access      | Excellent         | Average (5)     | Average (5)    | Average (5) | Average (5)
Big Data (6)                 | Poor              | Excellent       | Excellent      | Excellent   | Excellent
Big Memory (7)               | Excellent (8)     | Poor            | Poor           | Poor        | Poor

Notes

  1. To offload the database you need to work in the places and ways in which the database works. For example, you need to support transactions if they are being used, and you need a ready-made place to plug in, such as Hibernate or OpenJPA, so that you avoid a ton of integration work. NoSQL stores don’t offer that.
  2. Distributed caches may not provide long term persistence and management of data. They are also often limited in size so may not be able to store all of the data.
  3. It is not clear that NoSQL is a full database replacement. “Not Just SQL”, an alternative expansion of the acronym that is widely accepted by the NoSQL community, acknowledges this. The lack of SQL, ACID, sophisticated operations tools and so on means that NoSQL itself is not great at being a replacement. Rather, it comes into play if you can rethink your need for a database as a need for persistence, and you can change your application code.
  4. In a nod to the elegant CAP trade-off allowed by eventual consistency, Ehcache 2.4.1, due out at the end of March, adds this consistency mode.
  5. Distributed caches store hot data in-process. You might think of memcache as a distributed cache, which it claims to be, but it does not store data in-process; it is always over the network. And NoSQL is always over the network. In most R + W > N strategies, R is greater than one, so multiple network reads are required, and the caller must wait for R reads, each to a different server with a varying response time. Distributed Ehcache has latencies of < 1 ms, whereas the average for NoSQL is 5-10 ms. This is also why NoSQL gets an average for Weak Consistency Caching. A cache should be fast.
  6. “Big Data” is a moving target that is today generally understood to start at a few dozen terabytes and go up into petabytes. The current implementation of Ehcache has been used to manage datasets up to 2 TB which is just at the starting point of Big Data. The whole point of NoSQL is Big Data, so they get full marks in this area.
  7. “Big Memory” is also a moving target, and the term is still early in its use. We define it to mean using memory up to the physical limits of the hardware. For many architectures this has not been possible. With Java the issue was first 32-bit addressing, and now the limitation is garbage collection. We overcame that issue in September 2010 with our BigMemory architecture, which stores data in off-heap byte buffers.
  8. Caches tend to be memory-resident. BigMemory allows in-memory densities per physical server up to the hardware limits. That limit is 2TB for the current generation of commodity hardware from Dell, HP and Oracle, though in practice it is often much lower because their architectures require full CPU population to reach maximum memory. Not all vendors are similarly constrained: Cisco UCS boxes allow more memory per CPU, so that, for example, they can do 384GB with 2 CPUs. NoSQL stores focus on persistence and have small in-memory server-side caches. They focus on speeding up disk reads and writes, for example by doing append-only writes.
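
The latency argument in note 5 can be made concrete with a little arithmetic. The following is a minimal Java sketch, not taken from any particular NoSQL store: the class name, method names and the per-replica timings are all hypothetical. It checks the R + W > N overlap condition and shows that a read which must wait for R servers is only as fast as the slowest of them.

```java
public class QuorumReadSketch {

    /** True when every read quorum overlaps every write quorum, i.e. R + W > N. */
    static boolean quorumsOverlap(int n, int r, int w) {
        return r + w > n;
    }

    /**
     * Latency seen by the caller when it must wait for a reply from each of
     * the R servers it contacted: the slowest reply sets the pace.
     */
    static double slowestReplyMs(double[] contactedReplicaLatenciesMs) {
        double slowest = 0.0;
        for (double latency : contactedReplicaLatenciesMs) {
            slowest = Math.max(slowest, latency);
        }
        return slowest;
    }

    public static void main(String[] args) {
        int n = 3, r = 2, w = 2;                 // illustrative values only
        double[] contacted = {4.0, 9.0};         // hypothetical response times of the R servers, in ms

        System.out.println("quorums overlap: " + quorumsOverlap(n, r, w));
        System.out.println("read latency ms: " + slowestReplyMs(contacted));
    }
}
```

An in-process cache hit involves no network reads at all, which is the point of the comparison in the table.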

Holy Moly, Batman. Apple upgraded my Maven

I ran a Maven build this morning and it broke. Strange, as I had not changed anything. Maven’s ability to simply break because of a change to a non-versioned dependency or, even more arcane, a versioned dependency with a non-versioned dependency of its own (a transitive dependency) is legendary. So I thought that was the trouble. On my other machine I ran a build and the Maven output “looked” different. Stranger still. So I did a mvn -version on both.

Whoa! My Big Mac was 2.11 and my MacBook Pro was 3.0.2.

Apache Maven 3.0.2 (r1056850; 2011-01-09 10:58:10+1000)
Java version: 1.6.0_24, vendor: Apple Inc.
Java home: /System/Library/Java/JavaVirtualMachines/1.6.0.jdk/Contents/Home
Default locale: en_US, platform encoding: MacRoman
OS name: "mac os x", version: "10.6.6", arch: "x86_64", family: "mac"

It came down in the “Java for Mac OS X 10.6 Update 4” which was released on 10 March.

Now a lot of folks are not ready to move to 3. For myself, firstly, I don’t need the grief until it is rock solid. Secondly, I use the site plugin and reporting extensively, and this is all getting a big makeover in 3 which is not yet done.

There are lots of ways to go back to your old version.

My way is to add the old version to the front of my PATH in .bash_profile:

export PATH=/Users/gluck/work/apache-maven-2.2.1/bin:$PATH

 

News on JSR107 (JCACHE) and JSR342 (Java EE 7)

JSR342

JSR342 was created on 14 March 2011. JSR107, or JCACHE, is included. In JSR342’s words:

The following new JSRs will be candidates for inclusion in the Java EE 7 platform:

Concurrency Utilities for Java EE (JSR-236)
JCache (JSR-107)

Isn’t JSR107 inactive?

But how could this happen if JSR107 is inactive?

Well, the answer is that we are reactivating it. Oracle (various staff) and Terracotta (mostly me) have started work on the specification with the hope of having a draft spec for review by 20 April. To make sure it actually happens, Oracle have allocated resources to work on this project. They are being led by Cameron Purdy, the founder of Coherence, who is co-spec lead of JSR107 along with myself. And of course I founded Ehcache and continue to lead it as its CTO at Terracotta.

To be officially reactivated we need to submit the draft spec. So reactivation should happen on 20 April.

Motivations for finishing JSR107

Today there are two leading widely scoped frameworks for developing enterprise applications: Spring and Java EE. With the release of Spring 3.1, Spring, heavily influenced by Ehcache Annotations for Spring, has significantly enhanced its caching API. It is easier for Spring because they are a single-vendor framework and can do things outside of a standards process. Java EE is still lacking any general purpose caching API. There are some use-specific APIs scattered throughout, such as in JPA, but nothing general that developers can write to. And I know there is a significant need for a general purpose API. So, Java EE 7 wants a general purpose caching API, and this is the primary reason for finishing JSR107.

Another reason is that in-process caching is now heavily commoditised but not standardised. There are more than 20 open source in-process caching projects and another 5 or so commercial distributed caches. But if a user wants to change implementations, they need to change their code. This is akin to database access not having the JDBC standard. So we need to provide a standard API so that users can change caching implementations at low cost.

Scope of JSR107

There has been a bit of discussion about this, but it is most likely that the scope will be as it has been plus two new areas:

  1. Generics – similar to collections, allow caches to be created with defined keys and values
  2. Annotations and integration with JSR342 – allow caching annotations so that for example the return value from any functional method can be cached

The draft specification as it has existed for a few years is available under a net.sf.jsr107cache package name on SourceForge. And Ehcache provides an implementation of that draft spec via ehcache-jcache.
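
To give a feel for the two new areas, here is a minimal Java sketch of a typed cache plus annotation-driven caching of a method's return value. It is an illustration only and not the JSR107 API: the `Cache`, `MapCache`, `CachedResult`, `PriceService` and `withCaching` names are all hypothetical, and the dynamic proxy simply stands in for whatever interception a container would provide.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class CachingApiSketch {

    /** Hypothetical typed cache interface: keys and values are declared, as with collections. */
    interface Cache<K, V> {
        V get(K key);
        void put(K key, V value);
    }

    /** Trivial in-process implementation backed by a concurrent map. */
    static class MapCache<K, V> implements Cache<K, V> {
        private final Map<K, V> store = new ConcurrentHashMap<>();
        public V get(K key) { return store.get(key); }
        public void put(K key, V value) { store.put(key, value); }
    }

    /** Hypothetical marker annotation: cache the return value of the annotated method. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface CachedResult { }

    /** Example service whose result depends only on its argument. */
    interface PriceService {
        @CachedResult
        int priceInCents(String sku);
    }

    /** Wraps target in a proxy that caches annotated calls, keyed by method name and arguments. */
    static <T> T withCaching(T target, Class<T> type, Cache<List<Object>, Object> cache) {
        InvocationHandler handler = (proxy, method, args) -> {
            try {
                if (!method.isAnnotationPresent(CachedResult.class)) {
                    return method.invoke(target, args);
                }
                List<Object> key = new ArrayList<>();
                key.add(method.getName());
                if (args != null) {
                    key.addAll(Arrays.asList(args));
                }
                Object value = cache.get(key);
                if (value == null) {
                    value = method.invoke(target, args);
                    if (value != null) {
                        cache.put(key, value); // null results are not cached in this sketch
                    }
                }
                return value;
            } catch (InvocationTargetException e) {
                throw e.getCause(); // rethrow the target's own exception
            }
        };
        return type.cast(Proxy.newProxyInstance(type.getClassLoader(), new Class<?>[] {type}, handler));
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        PriceService slow = sku -> { calls.incrementAndGet(); return sku.length() * 100; };
        PriceService cached = withCaching(slow, PriceService.class, new MapCache<>());

        cached.priceInCents("abc");
        cached.priceInCents("abc"); // served from the cache
        System.out.println("underlying calls: " + calls.get()); // prints "underlying calls: 1"
    }
}
```

Because the calling code depends only on the `Cache` interface and the annotation, swapping `MapCache` for another implementation would not touch application code, which is the low-cost change of implementation that a standard API is meant to allow.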

Get Involved

It seems a good time to look at the expert group and make a general invitation for new members.

If you are interested and would like to spend some time on this, please email me at gluck At gregluck.com and I can explain how to start. Additions to the JSR107 expert group are by vote of the existing members.

Ehcache: Distributed Cache or NoSQL Store?

Is Ehcache a NoSQL store? No, I would not characterise it as that, but I have seen it used for some NoSQL use cases. In these situations it compared very well — with higher performance and more flexible consistency than the well-known NoSQL stores. Let me explain.
