I will be joining Hazelcast as CTO

I am very excited to announce that I will be joining the world-class team at Hazelcast as CTO.

Hazelcast (www.hazelcast.com) develops, distributes and supports the leading open source in-memory data grid. The product, also called Hazelcast, is a free open source download under the Apache license that any developer can include in minutes, enabling them to build elegantly simple, mission-critical, transactional and terascale in-memory applications. The company provides commercially licensed Enterprise extensions, the Hazelcast Management Console and professional open source training, development support and deployment support. The company is privately held and headquartered in Palo Alto, California.

What this means for Hazelcast Users

I will join my efforts to those of Hazelcast CEO Talip Ozturk, bringing my deep knowledge of caching to Hazelcast to complement that of the existing team. I will be out and about at conferences and visiting customers.

Together with the team, we will figure out which great features to add to Hazelcast and how to improve the leading open source In-Memory Data Grid.

We will also bring to market Enterprise Extensions, which add high-value features around the Hazelcast core.

Hazelcast has made an announcement that puts this move into their own words.

What this means for Terracotta BigMemory Users

We will develop a comparable caching/operational store project and product based on the Hazelcast core. This will then be an alternative for BigMemory users.

What this means for Ehcache Users

Ownership of Ehcache was transferred to Terracotta four and a half years ago when I joined them, and Terracotta took over its maintenance.

While Ehcache remains widely used today, the open source version is only suitable for single-node caching. This is not that useful for most production contexts, so it is not directly competitive with Hazelcast or, for that matter, with In-Memory Data Grids generally, which deal with clusters of computers.

I expect Ehcache will implement JCache and that in the future those ISVs and open source frameworks which currently define Ehcache as their caching layer will instead define it using JCache, of which Ehcache will be one provider.

Hazelcast is developing its JCache implementation, which is already up on GitHub.

What this means for JCache

JCache is the new standard for caches and IMDGs. It includes a key-value API suitable for caches and operational stores. Importantly, it was designed primarily for IMDGs. Listeners, loaders, writers and other user-defined classes are expected to be executed somewhere in the cluster, not in-process. And the spec defines single and batch EntryProcessors, the defining feature of IMDGs, which enable in-situ computation.
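To make the in-situ computation point concrete, here is a sketch of invoking a single-key EntryProcessor in the javax.cache style; the exact signatures varied between spec drafts, so treat the names as indicative rather than definitive.

```java
// Sketch only: assumes the javax.cache EntryProcessor shape.
// 'cache' and 'key' are assumed to exist in the surrounding code.
Integer newValue = cache.invoke(key,
        new EntryProcessor<String, Integer, Integer>() {
            @Override
            public Integer process(MutableEntry<String, Integer> entry, Object... args) {
                // Runs on the cluster member that owns the key,
                // not in the caller's process.
                int next = (entry.exists() ? entry.getValue() : 0) + 1;
                entry.setValue(next);
                return next;
            }
        });
```

Because the processor executes where the data lives, only the small result travels over the network rather than the whole entry.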

I will continue to act as spec lead on new maintenance releases of JCache. I will also work with the Java EE 8 expert group, which is including JCache in Java EE 8. And I will be working with open source frameworks and ISVs as they move to add a JCache layer to their architectures.

Hazelcast will be one of the first to market with an implementation of JCache which should be available in a production-ready implementation in February.

As it is Apache 2 open source, I encourage open source frameworks and ISVs to include Hazelcast in their distributions as they add JCache. That way they can ship with an out of the box IMDG built in without locking themselves or their users/customers into a single vendor.

Introducing Deliberate Caching

A few weeks ago I attended a ThoughtWorks Technology Radar seminar. I worked at ThoughtWorks for years and think that if anyone knows what is trending up and down in software development, these guys do. At number 17 in Techniques, with a rising arrow, is what they called Thoughtful Caching. At drinks with Scott Shaw, I asked him what it meant.

The trend is a movement from reactive caching to a new style. By reactive I mean you find out your system doesn’t perform or scale after you have built it and it is already in production. Lots of Ehcache users come to it that way. The move away from that is a trend I am very happy to see.

Deliberate Caching

The new technique is:

  • proactive
  • planned
  • implemented before the system goes live
  • deliberate
  • more than turning on caching in your framework and hoping for the best – this is the Thoughtful part
  • based on an understanding of the load characteristics and data access patterns

We kicked around a few names for this and came up with Deliberate Caching to sum it all up.

The work we are doing standardising caching for Java and JVM-based languages, JSR107, will only aid this transition. It will be included in Java EE 7, which even for those who have lost interest in following EE specifically will still send a signal that this is an architectural decision that should be made deliberately.

Why has it taken this long?

So, why has it taken until 10 years after Ehcache and Memcache and plenty of others came along for this “new” trend to emerge?  I think there are a few reasons.

Some people think caching is dirty

I have met plenty of developers who think that caching is dirty, and that caching is cheating. They think it indicates an architectural design failure that is best solved some other way.

One of the causes of this is that many early and open source caches (including Ehcache) placed limits on the data safety that could be achieved. The usual situation was that the data in the cache might be correct but was not guaranteed to be. Complicated discussions with Business Analysts were required to find out whether this was acceptable and how stale the data was allowed to be.

This has been overcome by the emergence of enterprise caches, such as Enterprise Ehcache, so named because they are feature-rich and contain extensive data safety options, including in Ehcache’s case: weak consistency, eventual consistency, strong consistency, explicit locking, local and XA transactions and atomic operations. So you can use caching even in situations where the data has to be right.

Following the lead of the giant dotcoms

The other thing that has happened is that it cannot have escaped anyone’s notice that the giant dotcoms all use tons of caching, and that they won’t work if the caching layer is down. So much so that if you are building a big dotcom app, it is clear that you need to build a caching layer in.

Early Performance Optimisation is seen as an anti-pattern

Under Agile we focus on the simplest thing that can possibly work. Requirements are expected to keep changing. Any punts you take on future requirements may turn out to be wrong and your effort wasted. You only add things once it is clear they are needed. Performance and scalability tend to get done this way as well. Following this model you find out about the requirement after you put the app in production and it fails. This same way of thinking causes monolithic systems with single data stores to be built which later turn out to need expensive re-architecting.

I think we need to look at this as Capacity Planning. If we get estimated numbers at the start of the project for number of users, required response times, data volumes, access patterns etc then we can capacity plan the architecture as well as the hardware. And in that architecture planning we can plan to use caching. Because caching affects how the system is architected and what the hardware requirements are, it makes sense to do it then.



javax.cache: The new Java Caching Standard

This post explores the new Java caching standard: javax.cache.

How it Fits into the Java Ecosystem

This standard is being developed by JSR107, of which the author is co-spec lead. JSR107 is included in Java EE 7, which is being developed by JSR342. Java EE 7 is due to be finalised at the end of 2012. But in the meantime javax.cache will work in Java SE 6 and higher and Java EE 6 environments, as well as with Spring and other popular environments.

JSR107 has draft status. We are currently at release 0.3 of the API, the reference implementation and the TCK. The code samples in this article work against this version.


Vendors who are either active members of the expert group or have expressed interest in implementing the specification are:

  • Terracotta – Ehcache
  • Oracle – Coherence
  • JBoss – Infinispan
  • IBM – eXtreme Scale
  • SpringSource – GemFire
  • GridGain
  • TMax
  • Google App Engine Java

Terracotta will be releasing a module for Ehcache to coincide with the final draft and then updating that if required for the final version.


From a design point of view, the basic concepts are a CacheManager that holds and controls a collection of Caches. Caches have entries. The basic API can be thought of as map-like with the following additional features:

  • atomic operations, similar to java.util.ConcurrentMap
  • read-through caching
  • write-through caching
  • cache event listeners
  • statistics
  • transactions including all isolation levels
  • caching annotations
  • generic caches which hold a defined key and value type
  • definition of storage by reference (applicable to on heap caches only) and storage by value

Optional Features

Rather than split the specification into a number of editions targeted at different user constituencies such as Java SE and Spring/EE, we have taken a different approach.

Firstly, for Java SE style caching there are no dependencies. And for Spring/EE where you might want to use annotations and/or transactions, the dependencies will be satisfied by those frameworks.

Secondly, we have a capabilities API via ServiceProvider.isSupported(OptionalFeature feature) so that you can determine at runtime what the capabilities of the implementation are. Optional features are:

  • storeByReference – storeByValue is the default
  • transactional
  • annotations

This makes it possible for an implementation to support the specification without necessarily supporting all the features, and allows end users and frameworks to discover what the features are so they can dynamically configure appropriate usage.

Good for Standalone and Distributed Caching

While the specification does not mandate a particular distributed cache topology, it is cognizant that caches may well be distributed. We have one API that covers both usages but is sensitive to distributed concerns. For example, CacheEntryListener has a NotificationScope of events it listens for, so that events can be restricted to local delivery. We do not have high network cost map-like methods such as keySet() and values(). And we generally prefer zero or low cost return types. So while Map has V put(K key, V value), javax.cache.Cache has void put(K key, V value).


Caches contain data shared by multiple threads which may themselves be running in different container applications or OSGi bundles within one JVM and might be distributed across multiple JVMs in a cluster. This makes classloading tricky.

We have addressed this problem. When a CacheManager is created a classloader may be specified. If none is specified the implementation provides a default. Either way object de-serialization will use the CacheManager’s classloader.

This is a big improvement over the approach taken by caches like Ehcache that use a fall-back approach. First the thread’s context classloader is used and, if that fails, another classloader is tried. This can be made to work in most scenarios but is a bit hit-and-miss and varies considerably by implementation.

Getting the Code

The spec is in Maven central. The Maven snippet is:
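Assuming the 0.3 API release mentioned above, the dependency looks something like this (check Maven Central for the current coordinates and version):

```xml
<dependency>
    <groupId>javax.cache</groupId>
    <artifactId>cache-api</artifactId>
    <version>0.3</version>
</dependency>
```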

A Cook’s Tour of the API

Creating a CacheManager

We support the Java 6 java.util.ServiceLoader creational approach. It will automatically detect a cache implementation on your classpath. You then create a CacheManager with:
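With an implementation on the classpath, the bootstrap call is a one-liner; this is a sketch against the draft API, so check your version's JavaDoc:

```java
// Resolves the implementation via ServiceLoader and returns the default CacheManager.
CacheManager cacheManager = Caching.getCacheManager();
```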

which returns a singleton CacheManager called “__default__”. Subsequent calls return the same CacheManager.

CacheManagers can have names and classloaders configured in, e.g.:
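As a sketch, with "app1" as a hypothetical manager name (the overloads here follow the draft API and may differ in your version):

```java
// Named CacheManager using the default classloader
CacheManager cacheManager = Caching.getCacheManager("app1");

// Named CacheManager with an explicit classloader
ClassLoader cl = Thread.currentThread().getContextClassLoader();
CacheManager cacheManager2 = Caching.getCacheManager(cl, "app1");
```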

Implementations may also support direct creation with new for maximum flexibility:
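For example, using the reference implementation (RICacheManager is the RI's class; substitute your vendor's class, and note the constructor shown is an assumption based on the draft):

```java
// Direct construction: couples your code to one implementation at compile time.
ClassLoader cl = Thread.currentThread().getContextClassLoader();
CacheManager cacheManager = new RICacheManager("app1", cl);
```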

Or to do the same thing without adding a compile time dependency on any particular implementation:
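A hedged sketch using reflection; the class name shown is illustrative, so use whatever class your vendor documents:

```java
// Look the implementation class up by name so there is no compile-time dependency.
String className = "javax.cache.implementation.RICacheManager"; // illustrative name
Class<?> clazz = Class.forName(className);
Constructor<?> ctor = clazz.getConstructor(String.class, ClassLoader.class);
CacheManager cacheManager = (CacheManager) ctor.newInstance(
        "app1", Thread.currentThread().getContextClassLoader());
```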

Creating a Cache

The API supports programmatic creation of caches. This complements the usual convention of configuring caches declaratively which is left to each vendor.

To programmatically configure a cache named “testCache” which is set for read-through:
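A sketch against the 0.x draft CacheBuilder API; method names changed between drafts, so verify against the release you use:

```java
// Build a typed, read-through cache programmatically.
Cache<Integer, Date> cache = cacheManager
        .<Integer, Date>createCacheBuilder("testCache")
        .setReadThrough(true)
        .build();
```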

Getting a reference to a Cache

You get caches from the CacheManager. To get a cache called “testCache”:

Cache<Integer, Date> cache = cacheManager.getCache("testCache");

Basic Cache Operations

To put to a cache:

Cache<Integer, Date> cache = cacheManager.getCache(cacheName);
Date value1 = new Date();
Integer key = 1;
cache.put(key, value1);


To get from a cache:
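Continuing the put example above, a get is a one-liner:

```java
Date value = cache.get(key);
```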


To remove from a cache:
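Again continuing the same example:

```java
cache.remove(key);
```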


Caching Annotations

JSR107 introduces a standardised set of caching annotations, which perform method-level caching interception on annotated classes running in dependency injection containers. Caching annotations are becoming increasingly popular, starting with Ehcache Annotations for Spring, which then influenced Spring 3’s caching annotations.

The JSR107 annotations cover the most common cache operations including:

  • @CacheResult – use the cache
  • @CachePut – put into the cache
  • @CacheRemoveEntry – remove a single entry from the cache
  • @CacheRemoveAll – remove all entries from the cache

When the required cache name, key and value can be inferred, they need not be specified. See the JavaDoc for the details. To allow greater control, you can specify all of these and more. In the following example, the cacheName attribute is specified as “domainCache”, index is specified as the key and domain as the value.
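As a sketch (the Domain class, method name and store are hypothetical; @CacheKey and @CacheValue mark which parameters form the key and which is the cached value):

```java
@CachePut(cacheName = "domainCache")
public void updateDomain(@CacheKey int index, @CacheValue Domain domain) {
    // Persist the domain; the interceptor puts 'domain' into "domainCache"
    // under the key 'index' when the method completes.
    store.save(index, domain);
}
```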

The reference implementation includes an implementation for both Spring and CDI. CDI is the standardised container-driven injection introduced in Java EE 6. The implementations are nicely modularised for reuse and use an Apache license, so we expect several open source caches to reuse them. While we have not done an implementation for Guice, this could easily be done.

Annotation Example

This example shows how to use annotations to keep a cache in sync with an underlying data structure, in this case a Blog manager, and also how to use the cache to speed up responses, done with @CacheResult.
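The original code sample is not reproduced here, so the following is a hedged sketch; the class, method and cache names are illustrative, and loadFromStore/deleteFromStore/clearStore stand in for the real persistence calls:

```java
public class BlogManager {

    // Cached: the method body only runs on a cache miss.
    @CacheResult(cacheName = "blogCache")
    public Blog getBlogEntry(String title) {
        return loadFromStore(title);
    }

    // Keeps the cache in sync: evicts the entry for this title.
    @CacheRemoveEntry(cacheName = "blogCache")
    public void removeBlogEntry(String title) {
        deleteFromStore(title);
    }

    // Clears the whole cache when the underlying store is cleared.
    @CacheRemoveAll(cacheName = "blogCache")
    public void removeAllBlogEntries() {
        clearStore();
    }
}
```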

Wiring Up Spring

For Spring the key is the following config line, which adds the caching annotation interceptors into the Spring context:
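The exact element and namespace prefix depend on the reference-implementation version; as a sketch, it is something like:

```xml
<jcache-spring:annotation-driven proxy-target-class="true"/>
```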

A full example is:
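The original listing is not reproduced here; this hedged sketch shows the general shape of such a Spring context, with the jcache-spring namespace URI and bean class being illustrative assumptions:

```xml
<!-- Sketch of a Spring context wiring in the JSR107 annotation interceptors. -->
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:jcache-spring="http://jsr107.github.com/schema/jcache-spring"
       xsi:schemaLocation="http://www.springframework.org/schema/beans
           http://www.springframework.org/schema/beans/spring-beans.xsd
           http://jsr107.github.com/schema/jcache-spring
           http://jsr107.github.com/schema/jcache-spring/jcache-spring.xsd">

    <jcache-spring:annotation-driven proxy-target-class="true"/>

    <bean id="blogManager" class="com.example.BlogManager"/>
</beans>
```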

Spring has its own caching annotations based on earlier work from JSR107 contributor Eric Dalquist. Those annotations and JSR107 will happily co-exist.

Wiring Up CDI

First create an implementation of javax.cache.annotation.BeanProvider, then tell CDI where to find it by declaring a resource named javax.cache.annotation.BeanProvider in the classpath at /META-INF/services/.
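For example, with a hypothetical provider class com.example.CdiBeanProvider, the file META-INF/services/javax.cache.annotation.BeanProvider would contain the single line:

```
com.example.CdiBeanProvider
```

This follows the standard java.util.ServiceLoader convention: the file is named after the interface and contains the fully qualified name of the implementation.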

For an example using the Weld implementation of CDI, see the CdiBeanProvider in our CDI test harness.

Further Reading

For further reading visit the JSRs home page at https://github.com/jsr107/jsr107spec.

Terracotta acquired by Software AG

As you probably have heard, Terracotta has been acquired by Software AG. This is an exciting development for both companies. Ari Zilka, CTO of Terracotta has a comprehensive blog post detailing the acquisition and its implications.

For me, it means I keep working for Terracotta, but now Terracotta is a wholly owned business unit within Software AG.

Ehcache will remain available in its current two editions: open source under the Apache 2 license and commercial with value-add features. And of course it will get even more investment as part of the larger organization.

I joined Terracotta 21 months ago. It has been an amazing ride so far for me and for Ehcache. I am looking forward to the next chapter. Right now my area of focus is on standardizing Java caching by leading the specification of JSR107. Once that is done we will implement the specification in Ehcache.

JSR107 (Java Caching API) Update – Lots Happening

I have been very busy the last few months getting JSR107 fired up.

Just to remind you, JSR107 is the Java Temporary Caching API. It is designed to be vendor-neutral and will allow easy changeover of implementations, in much the same way as JPA or JDBC. In this way it will allow the community to choose the open source or commercial implementation that best meets their business requirements.

Both Terracotta and Oracle have committed resources to getting this done. Right now that is myself and Yannis Cosmadopoulos from Oracle. Things are moving fast so I thought I would give a status update on where we are at.

JSR107 Early Draft Working Specification Available for Viewing

Work has been going steadily on the spec.

We are drafting this spec in the open on Google Docs. While a work in progress, it is now around 40 pages in length.

See https://docs.google.com/document/d/1YZ-lrH6nW871Vd9Z34Og_EqbX_kxxJi55UrSn4yL2Ak/edit?hl=en

Please keep checking back as this is changing on a daily basis.

We welcome ideas and feedback. Please join the JSR107 public mailing list to do so.

Summary of Scope

The specification covers the following areas:

Object Cache The API will cache objects. Classes that implement Serializable may be stored outside the JVM and potentially shared among JVMs connected by a network but Serializable will not be required.

Format independence The API will specify keys and values but will not limit their types.

Implementation independence The underlying implementation is hidden from API users. An SPI is used to configure a Cache Provider.

Support for Flexible Implementations Though the specification does not require any particular implementation, and a simple in-process implementation will meet it, issues raised by distributed caching and by storage in serialized form outside the heap will be dealt with so that implementations with those features will work well.

Java SE The specification will work with Java SE.

Java EE The specification will work within Java EE. This specification is targeted at inclusion in Java EE 7.

Generics The specification will make use of generic interfaces.

Annotations The specification will define runtime cache annotations.

Declarative Cache Configuration Specifying the behaviour of the CacheManager or Caches in a non-programmatic way. This may take the form of a minimal lowest common denominator with vendor-specific further configuration, or it might take the form of a variety of mechanisms to inject vendor configuration.

Transactions Support for transactions, both local and XA, will be defined but left as optional for implementers.

There are many applied areas of caching. This specification will not deal with them. For example:

• Database Caching. JPA[7] deals with that.

• Servlet Caching

• Caching as a REST service


Active Membership

I am pleased to report that we have a very healthy membership. We reconfirmed the membership, added new members with expertise and also added as observers the leads from JSR342 Java EE7.

The following are active expert group members:

  • Greg Luck, Terracotta and co-spec lead
  • Cameron Purdy, Oracle and co-spec lead
  • Nikita Ivanov, GridGain
  • Manik Surtani, Red Hat Middleware LLC
  • Yannis Cosmadopoulos, Oracle and working on the spec for Oracle
  • Chris Berry
  • Andy Piper (formerly BEA Systems rep)
  • Jon Stevens

The following have been voted in and are moving through the JCP process:

  • Eric Dalquist
  • Ben Cotton, Citigroup
  • David Mossakowski, Citigroup

The following are members of JSR342 and are observers of JSR107 (they get the expert group mails):

  • Linda Demichiel, Oracle
  • Roberto Chinnici, Oracle

GitHub Repositories

Similar to the specification we are also coding up the API in public with repositories on GitHub.


See https://github.com/jsr107/jsr107spec


Ehcache, via the ehcache-jcache module, has provided an implementation of the draft spec (as far as it got) for the past three years. We have now moved it to GitHub and will develop it along with the spec. If anyone wants to help out, please send me your GitHub id.

See https://github.com/jsr107/ehcache-jcache

Public Mailing List

If you want to stay in touch with JSR107, please join our Google Groups public mailing list:  http://groups.google.com/group/jsr107

The address is: jsr107@googlegroups.com

We copy most expert group emails to this list.


Comparative Technical Use Cases for Distributed Caches and NoSQL

I have been doing some NoSQL research lately. The first fruit of that work was a guest post on myNoSQL, Ehcache: Distributed Cache or NoSQL Store, which crisply distinguished between a Distributed Cache and NoSQL Stores.

In this article I am going to delve into the suitability of each for various technical use cases. I use the word “technical” because usually a use case is a business use case; here we are interested in the set of features that allow a certain usage. In a follow-up I hope to create a second, more business-oriented table.

I welcome feedback on this, particularly from those with production experience.

Technical Use Case           | Distributed Cache | NoSQL Key Value | NoSQL Columnar | NoSQL Graph | NoSQL Document
Database Offload             | Excellent         | Poor (1)        | Poor (1)       | Poor (1)    | Poor (1)
Database Replacement         | Poor (2)          | Poor (3)        | Poor (3)       | Poor (3)    | Poor (3)
Weak Consistency Caching     | Excellent         | Average (5)     | Average        | Poor        | Poor
Eventual Consistency Caching | Excellent (4)     | Average (5)     | Average (5)    | Average (5) | Average (5)
Strongly Consistent Caching  | Excellent         | Poor            | Poor           | Poor        | Poor
ACID Transactional Caching   | Excellent         | Poor            | Poor           | Poor        | Poor
Low Latency Data Access      | Excellent         | Average (5)     | Average (5)    | Average (5) | Average (5)
Big Data (6)                 | Poor              | Excellent       | Excellent      | Excellent   | Excellent
Big Memory (7)               | Excellent (8)     | Poor            | Poor           | Poor        | Poor


  1. To offload the database you need to work in the places and ways in which the database is used. So, for example, you need to support transactions if they are being used, and you need somewhere to plug in, such as Hibernate or OpenJPA, to avoid a ton of work. NoSQL stores don’t do that.
  2. Distributed caches may not provide long term persistence and management of data. They are also often limited in size so may not be able to store all of the data.
  3. It is not clear that NoSQL is a full database replacement. “Not Just SQL”, an alternative expansion of the acronym widely accepted by the NoSQL community, acknowledges this. The lack of SQL, the lack of ACID, of sophisticated operations tools and so on mean that NoSQL itself is not great as a replacement. Rather, if you can rethink your need for a database as a need for persistence, and you can change your application code, then it comes into play.
  4. In a nod to the elegant CAP trade-off allowed by eventual consistency, Ehcache 2.4.1, due out at the end of March, adds this consistency mode.
  5. Distributed Caches store hot data in-process. You might think of memcache as a distributed cache, which it claims to be, but it does not store data in-process – access is always over the network. And NoSQL is always over the network. In most R + W > N strategies, R is greater than one, so multiple network reads are required and the caller must wait for R reads, each to a different server with a varying response time. Distributed Ehcache has latencies of < 1 ms whereas the average for NoSQL is 5-10 ms. This is also why NoSQL gets an average for Weak Consistency Caching. A cache should be fast.
  6. “Big Data” is a moving target that is today generally understood to start at a few dozen terabytes and go up into petabytes. The current implementation of Ehcache has been used to manage datasets up to 2 TB which is just at the starting point of Big Data. The whole point of NoSQL is Big Data, so they get full marks in this area.
  7. “Big Memory” is also a moving target and is early in its use as a term. We define it to mean using the physical limits of the hardware. For many architectures this has not been possible. With Java the issue was first 32 bits; now the limitation is garbage collection. We overcame that issue with our BigMemory architecture, which has used storage in off-heap byte buffers since September 2010.
  8. Caches tend to be memory-resident. BigMemory allows in-memory densities per physical server up to the hardware’s limits, which is 2 TB for the current generation of commodity hardware from Dell, HP and Oracle – though often much lower in practice, because their architectures require full CPU population to achieve maximum memory. Not all vendors are similarly constrained: Cisco UCS boxes allow more memory per CPU, so that, for example, they can do 384 GB with 2 CPUs. NoSQL stores focus on persistence and have small in-memory server-side caches. They focus on speeding up disk reads and writes by, for example, doing append-only writes.

News on JSR107 (JCACHE) and JSR342 (Java EE 7)


JSR342 was created on 14 March 2011. JSR107, or JCACHE, is included. In JSR342’s words:

The following new JSRs will be candidates for inclusion in the Java EE 7 platform:

Concurrency Utilities for Java EE (JSR-236)
JCache (JSR-107)

Isn’t JSR107 inactive?

But how could this happen if JSR107 is inactive?

Well, the answer is that we are reactivating it. Oracle (various staff) and Terracotta (mostly me) have started work on the specification with the hope of having a draft spec for review by 20 April. To make sure it actually happens, Oracle has allocated resources to the project, led by Cameron Purdy, who is co-spec lead of JSR107 along with me and the founder of Coherence. And of course I founded Ehcache and continue to lead it at Terracotta.

To be officially reactivated we need to submit the draft spec. So reactivation should happen on 20 April.

Motivations for finishing JSR107

Today there are two leading widely scoped frameworks for developing enterprise applications: Spring and Java EE. With the release of Spring 3.1, Spring, heavily influenced by Ehcache Annotations for Spring, has significantly enhanced its caching API. It is easier for Spring because it is a single-vendor framework and can do things outside of a standards process. Java EE is still lacking any general purpose caching API. There are some use-specific APIs scattered throughout, such as in JPA, but nothing developers can write to. And I know there is a significant need for a general purpose API. So, Java EE 7 wants a general purpose caching API, and this is the primary reason for finishing JSR107.

Another reason is that in-process caching is now heavily commoditised but not standardised. There are more than 20 open source in-process caching projects and another 5 or so commercial distributed caches. But if a user wants to change implementations, they need to change their code. This is akin to database access not having the JDBC standard. So we need to provide a standard API so that users can change caching implementations at low cost.

Scope of JSR107

There has been a bit of discussion about this, but it is most likely that the scope will be as it has been plus two new areas:

  1. Generics – similar to collections, allow caches to be created with defined keys and values
  2. Annotations and integration with JSR342 – allow caching annotations so that for example the return value from any functional method can be cached

The draft specification as it has existed for a few years is available under a net.sf.jsr107cache package name on SourceForge. And Ehcache provides an implementation of that draft spec via ehcache-jcache.

Get Involved

It seems a good time to look at the expert group and make a general invitation for new members.

If you are interested and would like to spend some time on this, please email me at gluck At gregluck.com and I can explain how to start. Additions to the membership of JSR107 are by vote of the existing members.

Ehcache: Distributed Cache or NoSQL Store?

Is Ehcache a NoSQL store? No, I would not characterise it as that, but I have seen it used for some NoSQL use cases. In these situations it compared very well — with higher performance and more flexible consistency than the well-known NoSQL stores. Let me explain.
