Introducing Deliberate Caching

A few weeks ago I attended a ThoughtWorks Technology Radar seminar. I worked at ThoughtWorks for years and think if anyone knows what is trending up and down in software development these guys do. At number 17 in Techniques with a rising arrow is what they called Thoughtful Caching. At drinks with Scott Shaw, I asked him what it meant.

The trend is a movement away from reactive caching to a new style. By reactive I mean that you find out your system doesn’t perform or scale after you have built it and it is already in production. Lots of Ehcache users come to caching that way. The move away from that is a trend I am very happy to see.

Deliberate Caching

The new technique is:

  • proactive
  • planned
  • implemented before the system goes live
  • deliberate
  • is more than turning on caching in your framework and hoping for the best – this is the Thoughtful part
  • uses an understanding of the load characteristics and data access patterns

We kicked around a few names for this and came up with Deliberate Caching to sum all of this up.

The work we are doing on standardising caching for Java and JVM-based languages, JSR107, will only help this transition. It will be included in Java EE 7, which, even for those who have lost interest in following EE specifically, will still send a signal that caching is an architectural decision which should be made deliberately.

Why has it taken this long?

So, why has it taken until 10 years after Ehcache and Memcache and plenty of others came along for this “new” trend to emerge?  I think there are a few reasons.

Some people think caching is dirty

I have met plenty of developers who think that caching is dirty and that caching is cheating. They think it indicates some architectural design failure that is best solved some other way.

One of the causes of this is that many early and open source caches (including Ehcache) placed limits on the data safety that could be achieved. So the usual situation was that the data in the cache might be correct but was not guaranteed to be. Complicated discussions with Business Analysts were required to find out whether this was acceptable and how stale data was allowed to be. This has been overcome by the emergence of enterprise caches, such as Enterprise Ehcache, so named because they are feature rich and contain extensive data safety options, including in Ehcache’s case: weak consistency, eventual consistency, strong consistency, explicit locking, local and XA transactions, and atomic operations. So you can use caching even in situations where the data has to be right.

Following the lead of the giant dotcoms

The other thing that has happened is the rise of the giant dotcoms. It cannot have escaped anyone’s notice that they all use tons of caching, and that they won’t work if the caching layer is down. So much so that if you are building a big dotcom app, it is clear that you need to build a caching layer in.

Early Performance Optimisation is seen as an anti-pattern

Under Agile we focus on the simplest thing that can possibly work. Requirements are expected to keep changing. Any punts you take on future requirements may turn out to be wrong and your effort wasted. You only add things once it is clear they are needed. Performance and scalability tend to get done this way as well. Following this model you find out about the requirement after you put the app in production and it fails. This same way of thinking causes monolithic systems with single data stores to be built which later turn out to need expensive re-architecting.

I think we need to look at this as Capacity Planning. If we get estimated numbers at the start of the project for number of users, required response times, data volumes, access patterns etc then we can capacity plan the architecture as well as the hardware. And in that architecture planning we can plan to use caching. Because caching affects how the system is architected and what the hardware requirements are, it makes sense to do it then.
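To make the capacity planning idea concrete, here is a back-of-envelope sketch in Java. The class name CachePlan and every number in it (request rate, hit ratio, latencies) are assumptions picked for illustration, not measurements from any system.

```java
// Back-of-envelope capacity planning with a cache in front of a data store.
// All names and numbers are illustrative assumptions, not measurements.
final class CachePlan {

    /** Requests per second that still reach the data store after the cache. */
    static double backendLoad(double requestsPerSecond, double hitRatio) {
        return requestsPerSecond * (1.0 - hitRatio);
    }

    /** Mean response time, given hit and miss latencies in milliseconds. */
    static double meanLatencyMs(double hitRatio, double hitMs, double missMs) {
        return hitRatio * hitMs + (1.0 - hitRatio) * missMs;
    }

    public static void main(String[] args) {
        double requestsPerSecond = 2000;
        double hitRatio = 0.9;
        System.out.printf("backend load: %.0f req/s%n",
                backendLoad(requestsPerSecond, hitRatio));
        System.out.printf("mean latency: %.1f ms%n",
                meanLatencyMs(hitRatio, 1, 50));
    }
}
```

Run with your own estimates at the start of the project, the same two formulas tell you roughly how big the data store tier has to be with and without a cache.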

 

 

javax.cache: The new Java Caching Standard

This post explores the new Java caching standard: javax.cache.

How it Fits into the Java Ecosystem

This standard is being developed by JSR107, of which the author is co-spec lead. JSR107 is included in Java EE 7, being developed by JSR342. Java EE 7 is due to be finalised at the end of 2012. But in the meantime javax.cache will work in Java SE 6 and higher and Java EE 6 environments, as well as with Spring and other popular environments.

JSR107 has draft status. We are currently at release 0.3 of the API, the reference implementation and the TCK. The code samples in this article work against this version.

Adoption

Vendors who are either active members of the expert group or have expressed interest in implementing the specification are:

  • Terracotta – Ehcache
  • Oracle – Coherence
  • JBoss – Infinispan
  • IBM – eXtreme Scale
  • SpringSource – Gemfire
  • GridGain
  • TMax
  • Google App Engine Java

Terracotta will be releasing a module for Ehcache to coincide with the final draft and then updating that if required for the final version.

Features

From a design point of view, the basic concepts are a CacheManager that holds and controls a collection of Caches. Caches have entries. The basic API can be thought of as map-like with the following additional features:

  • atomic operations, similar to java.util.concurrent.ConcurrentMap
  • read-through caching
  • write-through caching
  • cache event listeners
  • statistics
  • transactions including all isolation levels
  • caching annotations
  • generic caches which hold a defined key and value type
  • definition of storage by reference (applicable to on heap caches only) and storage by value
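The difference between storage by value and storage by reference is easiest to see in code. The sketch below is a toy map-backed store, not the javax.cache API: TinyStore and its copy helper are hypothetical names, and copying through Java serialization is just one way an implementation might store by value.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

// Illustrative only: a tiny store showing the two storage modes.
// TinyStore is a hypothetical class, not part of javax.cache.
final class TinyStore<K, V extends Serializable> {
    private final Map<K, V> map = new HashMap<>();
    private final boolean storeByValue;

    TinyStore(boolean storeByValue) {
        this.storeByValue = storeByValue;
    }

    void put(K key, V value) {
        // By value: keep a private copy. By reference: keep the caller's object.
        map.put(key, storeByValue ? copy(value) : value);
    }

    V get(K key) {
        V value = map.get(key);
        return (value != null && storeByValue) ? copy(value) : value;
    }

    // Deep copy via Java serialization (an assumed copying strategy).
    @SuppressWarnings("unchecked")
    private static <T extends Serializable> T copy(T original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (T) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("value could not be copied", e);
        }
    }
}
```

With store by value, mutating an object after putting it leaves the cached copy untouched. With store by reference, which only makes sense on heap, the cache sees the mutation.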

Optional Features

Rather than split the specification into a number of editions targeted at different user constituencies such as Java SE and Spring/EE, we have taken a different approach.

Firstly, for Java SE style caching there are no dependencies. And for Spring/EE where you might want to use annotations and/or transactions, the dependencies will be satisfied by those frameworks.

Secondly, we have a capabilities API via ServiceProvider.isSupported(OptionalFeature feature) so that you can determine at runtime what the capabilities of the implementation are. Optional features are:

  • storeByReference – storeByValue is the default
  • transactional
  • annotations

This makes it possible for an implementation to support the specification without necessarily supporting all the features, and allows end users and frameworks to discover what the features are so they can dynamically configure appropriate usage.
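Here is a rough sketch of how a framework might use such a capability check to configure itself. The Feature enum and Provider interface are local stand-ins so the example compiles on its own. They mirror the shape of ServiceProvider.isSupported(OptionalFeature) described above, but the constant names are my assumption.

```java
import java.util.EnumSet;
import java.util.Set;

// Local stand-ins for the spec's OptionalFeature and ServiceProvider types.
// The constant names are assumptions made for this sketch.
enum Feature { STORE_BY_REFERENCE, TRANSACTIONS, ANNOTATIONS }

interface Provider {
    boolean isSupported(Feature feature);
}

final class CapabilityCheck {

    /** Chooses a storage mode that the implementation can actually honour. */
    static String chooseStorageMode(Provider provider, boolean wantByReference) {
        if (wantByReference && provider.isSupported(Feature.STORE_BY_REFERENCE)) {
            return "storeByReference";
        }
        return "storeByValue"; // the default mode
    }

    public static void main(String[] args) {
        // A provider that supports annotations only.
        Set<Feature> supported = EnumSet.of(Feature.ANNOTATIONS);
        Provider basic = supported::contains;
        System.out.println(chooseStorageMode(basic, true));
    }
}
```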

Good for Standalone and Distributed Caching

While the specification does not mandate a particular distributed cache topology it is cognizant that caches may well be distributed. We have one API that covers both usages but it is sensitive to distributed concerns. For example CacheEntryListener has a NotificationScope of events it listens for so that events can be restricted to local delivery. We do not have high network cost map-like methods such as keySet() and values(). And we generally prefer zero or low cost return types. So while Map has V put(K key, V value) javax.cache.Cache has void put(K key, V value).
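To illustrate why the return type matters, the toy below counts the payload bytes a pretend remote store would have to send back to the caller. ToyRemoteStore is a hypothetical class with no real networking. It only models the cost difference between the two put signatures.

```java
import java.util.HashMap;
import java.util.Map;

// A toy "remote" store that counts the payload bytes it would send back
// to the caller. Hypothetical, for illustration only.
final class ToyRemoteStore {
    private final Map<String, byte[]> data = new HashMap<>();
    long bytesReturned = 0;

    /** Map-style put: the previous value has to travel back to the caller. */
    byte[] putReturningOld(String key, byte[] value) {
        byte[] old = data.put(key, value);
        if (old != null) {
            bytesReturned += old.length;
        }
        return old;
    }

    /** Cache-style put: nothing is shipped back. */
    void put(String key, byte[] value) {
        data.put(key, value);
    }
}
```

Overwriting a large value with the Map-style method drags the old value across the network whether or not the caller wants it. The void version avoids that cost.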

Classloading

Caches contain data shared by multiple threads which may themselves be running in different container applications or OSGi bundles within one JVM and might be distributed across multiple JVMs in a cluster. This makes classloading tricky.

We have addressed this problem. When a CacheManager is created a classloader may be specified. If none is specified the implementation provides a default. Either way object de-serialization will use the CacheManager’s classloader.

This is a big improvement over the approach taken by caches like Ehcache that use a fall-back approach. First the thread’s context classloader is used and, if that fails, another classloader is tried. This can be made to work in most scenarios but is a bit hit and miss and varies considerably by implementation.
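For readers wondering what "de-serialization will use the CacheManager’s classloader" looks like mechanically, here is a minimal sketch using the JDK hook for it, ObjectInputStream.resolveClass. The class names are mine, and this is not taken from the reference implementation.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamClass;
import java.io.Serializable;

// An ObjectInputStream that resolves every class against one fixed
// classloader, with no fall-back. Not handled in this sketch: streams that
// contain primitive Class objects such as int.class.
final class LoaderAwareObjectInputStream extends ObjectInputStream {
    private final ClassLoader loader;

    LoaderAwareObjectInputStream(InputStream in, ClassLoader loader) throws IOException {
        super(in);
        this.loader = loader;
    }

    @Override
    protected Class<?> resolveClass(ObjectStreamClass desc) throws ClassNotFoundException {
        return Class.forName(desc.getName(), false, loader);
    }
}

final class LoaderDemo {
    /** Serializes a value, then reads it back using the given classloader. */
    static Object roundTrip(Serializable value, ClassLoader loader) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in = new LoaderAwareObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()), loader)) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }
}
```

Because the loader is fixed when the stream is created, the result no longer depends on which thread happens to trigger the read.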

Getting the Code

The spec is in Maven central. The Maven snippet is:

<dependency>
     <groupId>javax.cache</groupId>
     <artifactId>cache-api</artifactId>
     <version>0.3</version>
</dependency>

A Cook’s Tour of the API

Creating a CacheManager

We support the Java 6 java.util.ServiceLoader creational approach. It will automatically detect a cache implementation in your classpath. You then create a CacheManager with:

CacheManager cacheManager = Caching.getCacheManager();

which returns a singleton CacheManager called “__default__”. Subsequent calls return the same CacheManager.

CacheManagers can have names and classloaders configured in. e.g.

CacheManager cacheManager = Caching.getCacheManager("app1", Thread.currentThread().getContextClassLoader());
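The "same name gives you the same manager" behaviour can be sketched with a concurrent map. NamedManager and ManagerRegistry are hypothetical stand-ins, and for brevity this sketch keys on the name only and ignores the classloader.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical stand-in for a CacheManager; only its name matters here.
final class NamedManager {
    final String name;

    NamedManager(String name) {
        this.name = name;
    }
}

// A registry that hands out exactly one manager per name.
// Simplification: keyed by name only, not by name and classloader.
final class ManagerRegistry {
    static final String DEFAULT_NAME = "__default__";
    private static final ConcurrentMap<String, NamedManager> MANAGERS =
            new ConcurrentHashMap<>();

    /** Returns the default manager, creating it on first use. */
    static NamedManager getManager() {
        return getManager(DEFAULT_NAME);
    }

    /** Returns the single manager registered under this name. */
    static NamedManager getManager(String name) {
        return MANAGERS.computeIfAbsent(name, NamedManager::new);
    }
}
```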

Implementations may also support direct creation with new for maximum flexibility:

CacheManager cacheManager = new RICacheManager("app1", Thread.currentThread().getContextClassLoader());

Or to do the same thing without adding a compile time dependency on any particular implementation:

String className = "javax.cache.implementation.RIServiceProvider";
Class<ServiceProvider> clazz =(Class<ServiceProvider>)Class.forName(className);
ServiceProvider provider = clazz.newInstance();
return provider.createCacheManager(Thread.currentThread().getContextClassLoader(), "app1");

We expect implementations to have their own well-known configuration files, which will be used to configure the CacheManager. The name of the CacheManager can be used to distinguish the configuration file. For Ehcache, this will be the familiar ehcache.xml placed at the root of the classpath, with a hyphenated prefix for the name of the CacheManager. So the default CacheManager will simply use ehcache.xml, and a CacheManager named “app1” will use app1-ehcache.xml.
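
The naming convention for Ehcache amounts to a one-line rule. The helper below is a hypothetical illustration of that rule, not code from Ehcache.

```java
// Hypothetical helper showing the config-file naming convention described
// above; not taken from Ehcache.
final class ConfigNames {
    static final String DEFAULT_MANAGER = "__default__";

    /** Maps a CacheManager name to the classpath resource holding its config. */
    static String configFileFor(String managerName) {
        if (DEFAULT_MANAGER.equals(managerName)) {
            return "ehcache.xml";
        }
        return managerName + "-ehcache.xml";
    }

    public static void main(String[] args) {
        System.out.println(configFileFor(DEFAULT_MANAGER));
        System.out.println(configFileFor("app1"));
    }
}
```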

Creating a Cache

The API supports programmatic creation of caches. This complements the usual convention of configuring caches declaratively which is left to each vendor.

To programmatically configure a cache named “testCache” which is set for read-through:

cacheManager = getCacheManager();
CacheConfiguration cacheConfiguration = cacheManager.createCacheConfiguration();
cacheConfiguration.setReadThrough(true);
Cache testCache = cacheManager.createCacheBuilder("testCache")
     .setCacheConfiguration(cacheConfiguration).build();
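
If read-through is new to you, the idea is that a miss is filled by the cache itself calling a loader, rather than by the caller. Here is a minimal sketch of that behaviour. ReadThroughCache is a hypothetical class built on the JDK’s ConcurrentHashMap, not the javax.cache API.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Minimal read-through cache: a miss is filled by calling the loader.
// Hypothetical class for illustration, not the javax.cache API.
final class ReadThroughCache<K, V> {
    private final Map<K, V> entries = new ConcurrentHashMap<>();
    private final Function<K, V> loader;

    ReadThroughCache(Function<K, V> loader) {
        this.loader = loader;
    }

    /** Returns the cached value, loading and caching it on a miss. */
    V get(K key) {
        return entries.computeIfAbsent(key, loader);
    }
}
```

The caller only ever calls get. Whether the value came from memory or from the underlying system of record is the cache’s business.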

Getting a reference to a Cache

You get caches from the CacheManager. To get a cache called “testCache”:

Cache<Integer, Date> cache = cacheManager.getCache("testCache");

Basic Cache Operations

To put to a cache:

Cache<Integer, Date> cache = cacheManager.getCache(cacheName);

Date value1 = new Date();

Integer key = 1;

cache.put(key, value1);

 

To get from a cache:

Cache<Integer, Date> cache = cacheManager.getCache(cacheName);
Date value2 = cache.get(key);

 

To remove from a cache:

Cache<Integer, Date> cache = cacheManager.getCache(cacheName);
Integer key = 1;
cache.remove(key);
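
The atomic operations in the feature list are modelled on ConcurrentMap, so the JDK’s own map is a reasonable way to get a feel for their semantics. This sketch uses only java.util.concurrent. The equivalent javax.cache method names may differ.

```java
import java.util.Date;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Compare-and-act operations on the JDK's ConcurrentMap, which the cache's
// atomic operations are said to resemble.
final class AtomicOpsDemo {

    /** Runs four operations and reports, in order, whether each took effect. */
    static boolean[] run() {
        ConcurrentMap<Integer, Date> map = new ConcurrentHashMap<>();
        Date first = new Date(0L);
        Date second = new Date(1000L);

        boolean added = map.putIfAbsent(1, first) == null;     // no entry yet, so stored
        boolean readded = map.putIfAbsent(1, second) == null;  // entry exists, not overwritten
        boolean swapped = map.replace(1, first, second);       // swaps only if current value is first
        boolean removed = map.remove(1, first);                // no-op: current value is now second
        return new boolean[] {added, readded, swapped, removed};
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(run()));
    }
}
```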

Annotations

JSR107 introduces a standardised set of caching annotations, which do method level caching interception on annotated classes running in dependency injection containers. Caching annotations are becoming increasingly popular, starting with Ehcache Annotations for Spring, which then influenced Spring 3’s caching annotations.

The JSR107 annotations cover the most common cache operations including:

  • @CacheResult – use the cache
  • @CachePut – put into the cache
  • @CacheRemoveEntry – remove a single entry from the cache
  • @CacheRemoveAll – remove all entries from the cache

When the cache name, key and value can be inferred, they do not need to be specified. See the JavaDoc for the details. To allow greater control, you can specify all these and more. In the following example, the cacheName attribute is specified to be “domainCache”, index is specified as the key and domain as the value.

public class DomainDao {
     @CachePut(cacheName="domainCache")
     public void updateDomain(String domainId, @CacheKeyParam int index,
          @CacheValue Domain domain) {
     ...
     }
}

The reference implementation includes an implementation for both Spring and CDI. CDI is the standardised container driven injection introduced in Java EE 6. The implementation is nicely modularised for reuse, uses an Apache license, and we therefore expect several open source caches to reuse them. While we have not done an implementation for Guice, this could be easily done.

Annotation Example

This example shows how to use annotations to keep a cache in sync with an underlying data structure, in this case a Blog manager, and also how to use the cache to speed up responses, which is done with @CacheResult:

public class BlogManager {

     @CacheResult(cacheName="blogManager")
     public Blog getBlogEntry(String title) {...}

     @CacheRemoveEntry(cacheName="blogManager")
     public void removeBlogEntry(String title) {...}

     @CacheRemoveAll(cacheName="blogManager")
     public void removeAllBlogs() {...}

     @CachePut(cacheName="blogManager")
     public void createEntry(@CacheKeyParam String title,
          @CacheValue Blog blog) {...}

     @CacheResult(cacheName="blogManager")
     public Blog getEntryCached(String randomArg,
          @CacheKeyParam String title) {...}
}
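
To show roughly what an interceptor does with an annotation like @CacheResult, here is a self-contained sketch using a JDK dynamic proxy. The @Cached annotation, the BlogLookup interface and CachingProxy are all hypothetical names. The real interceptors live in the Spring and CDI modules and are more involved, for example in how they generate keys.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical annotation standing in for @CacheResult; not the JSR107 type.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface Cached {}

// Hypothetical service interface used for the demonstration.
interface BlogLookup {
    @Cached
    String getBlogEntry(String title);
}

final class CachingProxy {

    /** Wraps target so that @Cached methods are answered from a map when possible. */
    @SuppressWarnings("unchecked")
    static <T> T wrap(Class<T> type, T target) {
        Map<List<Object>, Object> cache = new ConcurrentHashMap<>();
        InvocationHandler handler = (proxy, method, args) -> {
            if (!method.isAnnotationPresent(Cached.class)) {
                return method.invoke(target, args); // not cached: pass straight through
            }
            // Simplified key: the method name plus all arguments.
            List<Object> key = new ArrayList<>();
            key.add(method.getName());
            if (args != null) {
                key.addAll(Arrays.asList(args));
            }
            Object hit = cache.get(key);
            if (hit != null) {
                return hit;
            }
            Object result = method.invoke(target, args);
            if (result != null) {
                cache.put(key, result); // null results are not cached in this sketch
            }
            return result;
        };
        return (T) Proxy.newProxyInstance(
                type.getClassLoader(), new Class<?>[] {type}, handler);
    }
}
```

Calling the wrapped object twice with the same title reaches the underlying implementation only once. The second call is served from the map.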

Wiring Up Spring

For Spring the key is the following config line, which adds the caching annotation interceptors into the Spring context:

<jcache-spring:annotation-driven proxy-target-class="true"/>

A full example is:

<beans ...>
<context:annotation-config/>
<jcache-spring:annotation-driven proxy-target-class="true"/>
<bean id="cacheManager" class="javax.cache.Caching" factory-method="getCacheManager" />
</beans>

Spring has its own caching annotations based on earlier work from JSR107 contributor Eric Dalquist. Those annotations and JSR107 will happily co-exist.

Wiring Up CDI

First create an implementation of javax.cache.annotation.BeanProvider and then tell CDI where to find it by declaring a resource named javax.cache.annotation.BeanProvider in the classpath at /META-INF/services/.

For an example using the Weld implementation of CDI, see the CdiBeanProvider in our CDI test harness.

Further Reading

For further reading visit the JSR’s home page at https://github.com/jsr107/jsr107spec.

0.3 of JSR107:javax.cache released

0.3 of the JSR107 spec, RI and TCK have been released.

Changes in this release:

  • Numerous changes across the spec, TCK and RI
  • Annotations implementations in the RI for Spring and CDI
  • Transactions API finalised
The release is in Maven central so the snippet for the API is:
<dependency>
<groupId>javax.cache</groupId>
<artifactId>cache-api</artifactId>
<version>0.3</version>
</dependency>

We are pretty much on the home run with this now. Work on the Ehcache, Infinispan and Coherence implementations is starting. Work will now shift to closing open issues and dealing with review comments as they come in.

We welcome community involvement. The jumping off point for all things JSR107 is the GitHub Page.

My Sessions at JavaOne

I will be speaking at JavaOne 2011. My sessions are:

24241 – The Essence of Caching, Parc 55 – Divisadero at 10:30 am Tuesday 4 October

This presentation distills what the speaker has learned from 10 years of scaling Java. It starts with a performance problem and leads you through solving it with caching, discovering the problems of distributed caching and their solution along the way. It will equip you with the tools to analyze a performance situation and see whether a cache will help and what type of cache to apply.

Topics include
• The nature of system load
• Desirable properties of scalable systems
• Caching as a solution for offload, scale-out, and performance
• Why caching works
• Tiered cache design
• SOR coherency problem and solutions
• N * problem and solutions
• Cache cluster topologies
• CAP and PACELC constraints
• Resulting design trade-offs

24223 – The New JSR 107 Caching Standard, Imperial Ballroom A, Hilton San Francisco at 1:30 pm Tuesday 4 October

In this session, the two spec leads for JSR 107 walk you through this important new caching standard, which will form part of Java EE 7.

You will learn how to
• Abstract your caching implementation, much as with JDBC
• Use the rich and modern API
• Use the new caching annotations
• Use the API before Java EE 7 is released within the Java SE, Java EE 6, and Spring environments
• Apply JCache to common caching scenarios

 

Come along and feel free to ask me any questions after my sessions.

Start using JSR107’s JCache API

JCache is rapidly nearing completion and we would like the community to start using it. The API is becoming quite stable.

The home for all things JCache is: https://github.com/jsr107/jsr107spec. Today I updated that page with the following details so that you can all get started.

We expect to release our first non-snapshot release in a few weeks’ time, with further releases leading up to JavaOne.

I am doing two sessions on caching at JavaOne. If you are attending please come along to learn more. My sessions are:

Session ID: 24223

Session Title: The New JSR 107 Caching Standard

Session ID: 24241

Session Title: The Essence of Caching

For the uninitiated JCache is the API being defined in JSR107. It defines a standard Java Caching API for use by developers and a standard SPI (“Service Provider Interface”) for use by implementers.

Release

The stable releases of this software are tagged with version numbers, starting with 0.1. Eventually, when the specification is further along releases will match the specification number.

We expect our first stable release in early August 2011.

Snapshot Releases

Snapshot releases of jars for binaries, source and javadoc are available.

Download the cache-api from https://oss.sonatype.org/index.html#nexus-search;quick~javax-cache

or use the following Maven snippet:

<repository>
    <id>sonatype-nexus-snapshots</id>
    <name>Sonatype Nexus Snapshots</name>
    <url>https://oss.sonatype.org/content/repositories/snapshots</url>
    <releases>
        <enabled>false</enabled>
    </releases>
    <snapshots>
        <enabled>true</enabled>
    </snapshots>
</repository>

<dependency>
  <groupId>javax.cache</groupId>
  <artifactId>cache-api</artifactId>
  <version>0.2-SNAPSHOT</version>
</dependency>

Javadoc

The JavaDoc is available as a jar with the releases. We also have the latest JavaDoc online.

Specification

The evolving specification is available online as a Google Doc.

Reference Implementation

The reference implementation (“RI”) source is available on GitHub.

This implementation is not meant for production use. For that we would refer you to one of the many open source and commercial implementations of JCache.

The RI is there to ensure that the specification and API work.

For example, some things that we leave out:

  • implementation of transactions.
  • eviction. The RI does not use an LRU or similar algorithm; it just evicts an entry when full.
  • concurrency. The RI is not exhaustively tested for thread safety.
  • tiered storage. A simple on heap store is used.
  • replicated or distributed caching
  • cache sizing. All caches are hard coded to be of size 100 entries.

Why did we do this? Because a much greater engineering effort, which gets put into the open source and commercial caches which implement this API, is required to accomplish these things.
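
As one small example of what the RI leaves out, here is least-recently-used eviction in its simplest JDK form, using LinkedHashMap in access order. LruMap is a hypothetical name. This is a sketch of the general technique, not the algorithm used by any particular cache, and it is not thread-safe.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// A bounded map that evicts the least recently used entry when full.
// Sketch of the general technique only; not thread-safe.
final class LruMap<K, V> extends LinkedHashMap<K, V> {
    private static final long serialVersionUID = 1L;
    private final int maxEntries;

    LruMap(int maxEntries) {
        super(16, 0.75f, true); // true = order entries by access, oldest first
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called after each insertion; returning true drops the eldest entry.
        return size() > maxEntries;
    }
}
```

Production caches go well beyond this, with concurrent data structures, tiered storage and smarter eviction policies, which is where the engineering effort mentioned above goes.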

Having said that, the RI is Apache 2 and is a correct implementation of the spec. It can be used to create new cache implementations.

Building From Source

mvn clean install

Mailing list

Please join the mailing list if you’re interested in using or developing the software: http://groups.google.com/group/jsr107

IRC

We will be using the #jsr107 channel on Freenode for chat.

We also have set up a commit hook which publishes commits to the channel.

Issue tracker

Please log issues to: https://github.com/jsr107/jsr107spec/issues

Contributing

Right now code contribution is limited to the Expert Group, but please feel free to post to the mailing list.

License

The API is available under the JPA license and may be freely used.

The TCK is available under a restricted TCK license although the tests.

The reference implementation is available under an Apache 2 license.

For details please read the license in each source code file.

Contributors

This free, open source software was made possible by the JSR107 Expert Group who put many hours of hard work into it.

 

Creating Terracotta Server Arrays with EC2 CloudFormation for use by Ehcache

This is the first in a series of articles showing how to automate deployment of Ehcache in EC2.

Ehcache is a distributed cache which works with a Terracotta Server Array (“TSA”) which acts as the in-memory store over the network. While Ehcache is simply a jar and is included in your app, provisioning a distributed cache in EC2 requires running up a Terracotta Server Array. Some approaches to this include using Chef, rolling your own AMIs and manual installation.

TSA as a Utility

From the point of view of an application running on EC2, the TSA is a utility. As a utility it would be great if you could provision it much like you do RDS or S3. Now while those are built into EC2, CloudFormation templates aim to bring the same ease of provisioning to third party utilities like the TSA.

CloudFormation

On 31 May CloudFormation was upgraded with new features: validation for template parameters, resource deletion policies and the ability to block stack creation until your application is ready. Combine that with Ubuntu’s cloud-init feature, which is supported in the Amazon Linux AMI, and you have a flexible and powerful provisioning infrastructure that you can drive from the AWS Console, command line or SDKs for multiple programming languages. My example uses the Java SDK for CloudFormation.

JUnit Integration Test

This example shows you how to, within a JUnit test, fire up a Terracotta server with CloudFormation and perform an Ehcache integration test. Though relatively simple, it exercises most of the moving parts of CloudFormation.

The example uses standard 32 bit AMIs. It creates a single active Terracotta server, using the open source version. First it starts an AMI, downloads and installs Terracotta and starts it using the config. We then query the stack to find the EC2 Instance and then use the EC2 API to resolve the public DNS name. Using ehcache.xml’s token replacement feature we then inject the public DNS name into the tcconfig and start Ehcache. Ehcache creates a distributed cache across the Internet to the TSA you just created. We then stick something in the cache and read it back out.
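
The token replacement step is conceptually simple: find ${name} placeholders in the config text and substitute values discovered at runtime. The sketch below shows the idea in plain Java. It is not Ehcache’s implementation, and the token name tsa.host and the host value in the usage note are made up for illustration.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Generic ${token} substitution; a sketch of the idea, not Ehcache's code.
final class TokenReplacer {
    private static final Pattern TOKEN = Pattern.compile("\\$\\{([^}]+)\\}");

    /** Replaces each ${name} with its value; unknown tokens are left untouched. */
    static String replaceTokens(String template, Map<String, String> values) {
        Matcher matcher = TOKEN.matcher(template);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            String value = values.get(matcher.group(1));
            String replacement = (value != null) ? value : matcher.group(0);
            // quoteReplacement stops '$' and '\' in the value being interpreted.
            matcher.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(out);
        return out.toString();
    }
}
```

For example, replaceTokens("url=${tsa.host}:9510", values) with tsa.host mapped to the public DNS name you resolved would yield the connection string to hand to Ehcache.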

You can use this example to set up your own integration testing.

You can check out and run my sample code from GitHub with the following URL: git://github.com/gregrluck/cloudformation.git.

To run the example you need to:

  1. have an AWS account
  2. Edit AwsCredentials.properties and add your accessKey and secretKey (available from your account page)
  3. (Optional) Edit TerracottaCloudFormationSample.java and set your keyChainName. It is currently set to gluck. This is an ssh keychain that lets you access the running AMIs.

Though not done in the sample, there are a few other Terracotta deployment options which should be easy with these tools:

  1. Create a tc:config for a striped production cluster
  2. Use a custom terracotta configuration
  3. Place the config on a web server and start each TC server via a remote URL to the tc:config and the -n switch

In the next article I will show you how to automatically deploy an app using Ehcache to BeanStalk, and how to connect your Ehcache to the TSA created.

Terracotta acquired by Software AG

As you probably have heard, Terracotta has been acquired by Software AG. This is an exciting development for both companies. Ari Zilka, CTO of Terracotta has a comprehensive blog post detailing the acquisition and its implications.

For me, it means I keep working for Terracotta, but now Terracotta is a wholly owned business unit within Software AG.

Ehcache will remain available in its current two editions: open source under the Apache 2 license and commercial with value-add features. And of course it will get even more investment as part of the larger organization.

I joined Terracotta 21 months ago. It has been an amazing ride so far for me and for Ehcache. I am looking forward to the next chapter. Right now my area of focus is on standardizing Java caching by leading the specification of JSR107. Once that is done we will implement the specification in Ehcache.

JSR107 JCache Update 9 May 2011

For those not involved in the expert group or following on the Google Group (jsr107@googlegroups.com), here is an update.
As usual the actual specification in all of its evolving glory can also be viewed at https://docs.google.com/document/d/1YZ-lrH6nW871Vd9Z34Og_EqbX_kxxJi55UrSn4yL2Ak/edit?hl=en . Some people find working on and editing a spec in plain view “appalling” but I like it. It is like a nightly build in open source.
And the API and code is on GitHub: https://github.com/jsr107/jsr107spec

Workshops

We have been workshopping through the spec. Everyone on either the Google or EG mailing lists was invited. Attending in person or on the calls on Friday and today were the following EG and non-EG members:

Greg Luck (Ehcache and Terracotta)
Yannis Cosmadopolous (Oracle)
Christa (Oracle)
Ben Cotton
David Mossakowski (Citibank)
Richard Hightower
Ludovic Orban (Terracotta and Bitronix)

Specification Sections for Review

We have made a lot of progress. The spec is now 51 pages long.
Two of the most difficult areas have been mostly done. They are:
  • Chapter 5 – JTA. This is ready for review.
  • Chapter 8 – Annotations. This is complete but needs reformatting and TODOs done. It and the associated spec interfaces should be ready for review on Monday.

Significant Developments

There are also two significant developments:
1. Two versions with different API Packaging.
So that caching can easily be done in SE, we are considering having a jcache-api.jar which includes mandatory parts of the spec and a jcache-api-ee.jar which will include both mandatory and optional APIs. The latter takes up annotations and JTA. These can still be included in non-EE apps like Spring or SE but there will be significant dependencies that those environments would need to add.
2. Map vs Map-like
No one involved in the workshops likes Map. We are enumerating the reasons and proposing a new map-like Cache interface. That is being documented in the next few days. Add comments to the spec doc if you have them.

Next Meeting

The easiest way for you to get involved is at the next meeting which is 10am Pacific Standard time this Thursday.
Some topics for the next meeting:
  1. Reviewing comments added to the Google Doc
  2. Classloading
  3. Configuration
  4. Lifecycle