Ehcache 2.4 with the new Search feature is out!

Ehcache 2.4 launches today. The big new feature right in the core of Ehcache 2.4 is Search.

It uses a new fluent API which looks like this:

Results results = cache.createQuery().includeKeys().addCriteria(age.eq(32).and(gender.eq("male"))).execute();

In short, it lets you further offload the database. With Ehcache now supporting up to 2TB and linear scale-out you can do more than ever.

What is searchable?

You can search against predefined indexes of keys, values, or attributes extracted from values.

Attributes can be extracted using JavaBeans conventions or by specifying a method to call.

For example, to declare a cache searchable and extract age using the JavaBean convention and gender via a method call on a Person class:
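
Something along these lines in ehcache.xml (a sketch: the cache name and sizing are illustrative, and the exact attribute syntax is covered in the search documentation linked at the end of this post):

<cache name="people" maxElementsInMemory="10000" eternal="false">
  <searchable>
    <!-- age is resolved via the JavaBean convention, i.e. Person.getAge() -->
    <searchAttribute name="age"/>
    <!-- gender is extracted by explicitly calling a method on the value -->
    <searchAttribute name="gender" expression="value.getGender()"/>
  </searchable>
</cache>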

Caches can also be made searchable programmatically. And Custom Value Extractors can be created so that you can index and search against virtually anything you put in the cache.

Search Query Language

Ehcache Search introduces EQL, a fluent, object-oriented query language following DSL principles, which should feel familiar and natural to Java programmers.

Here is a full example. Search for men whose names start with “Greg”, and then order the results by age. Don’t return more than 10 results. We want to include keys and values in the results. Finally we iterate through the Results.

// First look up the searchable attributes declared on the cache
Attribute<String> name = cache.getSearchAttribute("name");
Attribute<Gender> gender = cache.getSearchAttribute("gender");
Attribute<Integer> age = cache.getSearchAttribute("age");

Query query = cache.createQuery();
query.includeKeys();
query.includeValues();
query.addCriteria(name.ilike("Greg*").and(gender.eq(Gender.MALE))).addOrderBy(age, Direction.ASCENDING).maxResults(10);

Results results = query.execute();
System.out.println(" Size: " + results.size());
for (Result result : results.all()) {
    System.out.println("Got: Key[" + result.getKey()
            + "] Value class [" + result.getValue().getClass()
            + "] Value [" + result.getValue() + "]");
}

EQL is very rich. There is a large set of Criteria (ilike, lt, gt, between, and, or and more) which you use to build up complex queries. There are also Aggregators, such as min, max, average, sum and count, which summarise the results.
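
For instance, building on the attributes from the example above, criteria can be combined and an aggregator added in one query. This is a sketch rather than a copy of the documented examples; check the search documentation for the definitive signatures of includeAggregator and the attribute aggregator methods.

Query complexQuery = cache.createQuery()
        .addCriteria(age.between(18, 65)
                .and(gender.eq(Gender.MALE))
                .and(name.ilike("A*").or(name.ilike("B*"))))
        .includeAggregator(age.average());   // average age over the matching elements

Results matches = complexQuery.execute();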

Like NoSQL, EQL executes against a single cache – there are no joins. If you need to combine the results of searches from two caches, you can perform two searches  and then combine the results yourself.

Standalone and Distributed

Search is built into Ehcache core. It works with standalone in-process caching and will work for distributed caches in the forthcoming Terracotta 3.5 platform release which goes GA in March and is available as a release candidate now.

Distributed cache search is indexed and executes on the Terracotta Server Array using a scatter gather pattern. The EQL is sent to each cache partition (the scatter), returning partial results (the gather) to the requesting Ehcache node which then combines the results and presents them to the caller. Terracotta servers utilise precomputed indexes to speed queries.

Indeed, distributed cache search has an important property: searches execute in O(log n)/partitions time. So if you have 50GB of cache in one partition which takes 40ms to search, and you then double the data, you can hold the execution time constant by simply adding another partition. Generally, you can hold execution time constant for any size of data with this approach.

The standalone cache takes a different approach. Most in-process caches are relatively small, and Ehcache is lightning fast. We don’t use indexes but instead visit each element in the cache a maximum of once, resolving the EQL as we go. It takes 5ms to run a typical query against a 10,000 element cache. Generally the performance is O(n), but even a 1 million entry cache will take less than a second to search using this approach.

Sample Use Cases

Database Search Offload

Take a shipping company that creates 50GB of consignment notes per week. Customers search by consignment note id but also by addressee name. Most searches (95%) are done within two weeks of the creation of a consignment note. The consignment notes get stored in a relational database that grows and grows. Searches against the database now take 650ms, which takes the enquiry outside its SLA.

Solution: put the last two weeks of data in the cache. Index by consignment note id, first name, last name and date. Search the cache first and only search the database if the consignment note is not found. This takes about 50ms and provides a 95% database offload.
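
A minimal sketch of that cache-first lookup (the consignmentNoteId attribute, the ConsignmentNote class and the findInDatabase method are illustrative names, not part of Ehcache):

// Uses net.sf.ehcache.Cache, net.sf.ehcache.Element and net.sf.ehcache.search.*
public ConsignmentNote lookup(Cache cache, String requestedId) {
    Attribute<String> noteId = cache.getSearchAttribute("consignmentNoteId");

    Results results = cache.createQuery()
            .includeValues()
            .addCriteria(noteId.eq(requestedId))
            .execute();

    if (results.size() > 0) {
        // ~95% of lookups are answered here, in roughly 50ms
        return (ConsignmentNote) results.all().get(0).getValue();
    }

    // The remaining ~5% fall through to the database, and the result is cached for next time
    ConsignmentNote note = findInDatabase(requestedId);
    cache.put(new Element(requestedId, note));
    return note;
}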

Real-Time Analytics Search

In-house analytics engines process large amounts of data and compute results from it. The results need to be queried very quickly to enable processing within a business transaction. And the results need to be updated through the day in response to business events. Some examples are credit card fraud scoring, or a holding position in a trading application.

Create a distributed cache and index it as required. Various roll-ups are cached and updated after the system of record has been written to with new transactions. Use Ehcache’s bulk loading mode to quickly upload the results of overnight analytics runs.
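
The bulk loading step might look like the following sketch. It uses the bulk load API (setNodeBulkLoadEnabled and waitUntilClusterBulkLoadComplete); the overnightResults map and the Position class are illustrative stand-ins for the analytics output.

// Relax cluster consistency while the overnight results are pushed into the cache
cache.setNodeBulkLoadEnabled(true);
try {
    for (Map.Entry<String, Position> entry : overnightResults.entrySet()) {
        cache.put(new Element(entry.getKey(), entry.getValue()));
    }
} finally {
    // Return to normal consistency and wait for the rest of the cluster to catch up
    cache.setNodeBulkLoadEnabled(false);
    cache.waitUntilClusterBulkLoadComplete();
}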

Searches against the cache execute far more quickly than computing positions from scratch against the system of record, enabling real-time analytics.

More Information

You can get Ehcache 2.4 from ehcache.org. Search is fully documented, along with downloadable demos and performance tests, at http://ehcache.org/documentation/search.html.

Something new under the Sun: A new way of doing a cache invalidation protocol with Oracle 11g

Just when you think the database will never get smarter, it does. And the database that matters most to Enterprise Developers is Oracle.

What happens to coherency between your distributed cache and the database when another application changes the database? If it is a Java application, you can add Ehcache to it and configure it to connect to the distributed cache. But this might require work you do not want to do right now. Or there might be tens of apps involved. Or there might be scripts which are run by DBAs from time to time. Or it could be another language. We have the Cache Server, where you can expose a distributed cache via REST or SOAP, and then invalidation becomes as simple as sending an HTTP DELETE to the Element resource. See the Cache Server Documentation for more on this.
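
For example, an invalidation from a script or a non-Java application is just an HTTP DELETE against the element resource. A sketch in plain java.net (the host, port, cache name and key are illustrative; the exact URL layout is in the Cache Server documentation):

// Remove the element with key "1234" from the cache "consignmentNotes" via REST
URL element = new URL("http://cacheserver.example.com:8080/ehcache/rest/consignmentNotes/1234");
HttpURLConnection connection = (HttpURLConnection) element.openConnection();
connection.setRequestMethod("DELETE");
int status = connection.getResponseCode();   // 200 or 204 indicates the element is gone
connection.disconnect();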

For a few years, MySQL users have had access to cache invalidation for Memcached using libmemcached. The database calls back to memcached. Brian Aker mentioned to me he was adding this into MySQL a few years ago. See http://www.mysqlconf.com/mysql2009/public/schedule/detail/6277 for a decent introduction.

For Oracle, back in 2005 I tried to use their message queue integration to achieve the same effect. Back then it didn’t even work. I have heard that it works now, but it is a very messy solution with lots of moving parts.

Fortunately, starting from 11g Release 1 (11.1), the Oracle JDBC driver provides support for the Database Change Notification feature of Oracle Database. Using this functionality of the JDBC drivers, multitier systems can take advantage of the Database Change Notification feature to maintain a data cache as up-to-date as possible, by receiving invalidation events from the JDBC drivers. See  the Database Change Notification chapter in the Oracle docs for details.

This lets you achieve a new and very simple way of doing a cache invalidation protocol to keep your cache and the database in sync.

You create a DatabaseChangeListener and register it with your connection:
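
Something along these lines, using the oracle.jdbc.dcn package from the 11g driver. This is a sketch: the watched query and the blunt cache.removeAll() invalidation are illustrative, and in practice you would map the changed rows back to individual cache keys.

import java.sql.ResultSet;
import java.sql.Statement;
import java.util.Properties;

import net.sf.ehcache.Ehcache;

import oracle.jdbc.OracleConnection;
import oracle.jdbc.OracleStatement;
import oracle.jdbc.dcn.DatabaseChangeEvent;
import oracle.jdbc.dcn.DatabaseChangeListener;
import oracle.jdbc.dcn.DatabaseChangeRegistration;

public void registerInvalidationListener(OracleConnection connection, final Ehcache cache) throws Exception {
    Properties props = new Properties();
    props.setProperty(OracleConnection.DCN_NOTIFY_ROWIDS, "true");

    DatabaseChangeRegistration registration = connection.registerDatabaseChangeNotification(props);

    // Called by the driver whenever the watched rows change; evict the stale entries
    registration.addListener(new DatabaseChangeListener() {
        public void onDatabaseChangeNotification(DatabaseChangeEvent event) {
            cache.removeAll();
        }
    });

    // Associate a query with the registration so the database knows which objects to watch
    Statement statement = connection.createStatement();
    ((OracleStatement) statement).setDatabaseChangeRegistration(registration);
    ResultSet rs = statement.executeQuery("SELECT id FROM consignment_note");
    while (rs.next()) {
        // iterating the result set completes the registration for these rows
    }
    rs.close();
    statement.close();
}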

I like this so much I think we might add a standard DatabaseChangeListener to Ehcache for it.

Ehcache: The Year In Review

It is that time of year again: a time for reflection and of New Year’s resolutions. I therefore thought this was a good time to reflect on what has been happening with Ehcache.

Ehcache and Terracotta got together in August 2009. We got our first combined release done with Ehcache backed by Terracotta three months later. That was really only the beginning. We have done 5 major releases to date with of course another one in the cooker right now.

The big news is that Ehcache and Terracotta together have been a great success. Most Terracotta users now use it as a distributed backing store for Ehcache. Most of Terracotta’s revenue comes from that use case. This means that Ehcache’s needs are driving the evolution of Terracotta.

At the same time, Ehcache as an open source project has seen a huge investment. Most of the features added are designed to work in Ehcache open source as well as with Terracotta as the backing distributed store.

So let’s look at what we added in 2010 and what is in the cooker so far for 2011.

What we did in 2010

New Hibernate Provider

When we merged, Terracotta had just done a Hibernate provider and Ehcache had one too. Neither supported the new Hibernate 3.3 SPI, which uses RegionFactories instead of CacheProviders. So we combined the two into a new implementation and added support for the new SPI at the same time. This meant that for the first time Ehcache supported all of the Hibernate cache strategies, including transactional, and importantly supported them across a caching cluster. (http://ehcache.org/documentation/hibernate.html)

XA Transactions

Right now XA transactions have fallen from favour partly because of flaky support for them in XAResource implementations out there. But what if you could create a canonically correct implementation that could be absolutely relied on in the world’s most demanding transactional applications? We hired one of the Java world’s foremost experts in transactions (Ludovic Orban, the author of the Bitronix transaction manager) and came out with just that. We challenge anyone to prove that our implementation is not correct and does not fully deal with all failure scenarios. If you need to be absolutely sure that the cache is in sync with your other XAResources with a READ_COMMITTED isolation level you have come to the right place. (http://ehcache.org/documentation/jta.html)

Terabyte Scale Out

Ehcache backed by Terracotta initially held keys in memory in each application that had Ehcache in it. This effectively limited the size of the caching clusters that could be created. With a new storage strategy, we blew the lid off that and stopped storing the keys in memory. The result – horizontal scaling to the terabyte level.

Write-through, behind and every which way

What happens when you have off-loaded reads from your database but now your writes are killing you? The answer is write-behind. You write to the cache, which calls a CacheWriter that you implement and connect to the cache; it is called periodically with batches of writes. In your CacheWriter you open a transaction, write the batch and then close the transaction. Much easier for the database. And it is all done in HA: the write-behind queue is safe because it is stored on the Terracotta cluster.

More caching in more places

We were really happy to extend our support for popular frameworks. During the year:

  • Ehcache became the default caching provider for Grails
  • We created an OpenJPA provider
  • We created Ruby Gems for JRuby and Rails 2 and 3 caching providers
  • We created a Google App Engine module

Acknowledgement of the CAP Theorem

Originally Ehcache with Terracotta was consistent above all else. During the year we flexed both ways to allow CAP tradeoffs to be made by our users. We added XA and manual locking modes on the even stricter side and we added an unlocked reads view of a cache and even coherent=false for scale out without coherence on the looser side. And you can choose this cache by cache. There is a tradeoff between latency and consistency, so you choose the highest speed you can afford for a particular cache.

And rather than just blocking on a network partition, we added NonStopCache, a cache decorator which allows you to choose whether to favour availability or consistency.

BigMemory

BigMemory was a big hit. It surprised a lot of people and frankly it surprised us. We were looking to solve our own GC issues in the Terracotta server and found something that was more generally useful than that one use case. So we added BigMemory to Ehcache standalone as well as the server. In the server we have lifted our field engineering recommendation from 20GB of storage per server partition to 100GB. And we have tested BigMemory itself out to 350GB and it works great!

A new Disk Store

Let’s say you are using a 100GB cache in Ehcache standalone. When you restart your JVM you want the cache to still be there, otherwise it might take hours or days to repopulate such a large cache. So we created a new DiskStore that keeps up with BigMemory. It writes at the rate of 10MB/s. So when it is time to shut down your JVM it just needs to do a final sync and then you are done. And it starts up straight away and gradually loads data into memory. A nice complement to BigMemory and very important.

Ehcache Monitor/Terracotta Dev Console Improvements

For those using Ehcache standalone we have only ever had a JMX API. That is fine but we found many people built their own web app to gather stats. So we did the same and the result was Ehcache Monitor. One of the highlights is the charts including a chart of estimated memory use per cache.

The Terracotta Developer Console got an Ehcache panel, and as we added features to Ehcache we added more to the panel. If you are using Ehcache with a backing Terracotta store then it is a full featured tool which gives you deep introspection.

What is Coming in 2011

Speed, speed and more speed

What does everybody want? More speed. We are fine-tuning our concurrency model to enable as much speed as possible. We now have two modes and will be adding more to allow the best tuning for each use case.

Search

Ehcache is based on a Map API. Maps have keys and values. They have a peculiar property – you need to know the key to access the value. What if you want to search for a key, or you want to index values in numerous ways and search those? All of this is coming to Ehcache in February 2011 and is available right now in beta. Oh, and one cool thing: search performance is O(log n)/partitions. So as your data grows and spreads out onto more Terracotta server partitions, your search performance stays constant! (http://ehcache.org/documentation/search.html)

New Transaction Modes

We already did the hard one: XA. Now we are adding Local Transactions. If you just want transactionality within the caches in your CacheManager and there are no other XAResources, you can use a Local Transaction. It will be three times faster than an XA cache. (http://ehcache.org/documentation/jta.html)

.NET Client

Quite a few customers use Java but also some .NET, and they want to be able to share caches. We have lots of users happily using Ehcache for cross-platform use cases, but we are planning on extending our cross-platform support still further – for example, with a native .NET client.

Bigger BigMemory

We are looking at ongoing speedups and testing against larger and larger memory sizes for BigMemory. We are also looking to provide further speed in BigMemory by allowing pluggable Serialization strategies. This will allow our users to use their Serialization framework of choice – and there are now quite a few.

How to find Ehcache and Terracotta Webinars and Presentations

After many requests we have extracted our talks out of the mess that is Webex and have started posting them to ScreenCast. From there you can easily watch them without any sign in rubbish or embed them. See http://www.screencast.com/users/Terracotta

Watch Caching Fundamentals Part 1 by Greg Luck, CTO Ehcache

This is the first in a series of webinars which will explore Caching Principles.

You will learn how to assess the effect that caching will have on a given performance situation and how to calculate the performance improvement. It will also be shown how to tune caches for maximum effectiveness.

This webinar focuses on non-clustered caching principles.

The following topics are covered:

– What exactly is caching.
– Locality of Reference, Data lifespans and reuse patterns
– Pareto Distributions
– Amdahl’s Law
– Cache Statistics

Watch the talk here: http://s3.amazonaws.com/tcvideo/2010/Caching%20Principles%20Part%201%20with%20Greg%20Luck,%20Ehcache%20CTO-20100819%201811-1.mp4

Running Ehcache and Terracotta from Ant

A few weeks ago I blogged about the fantastic tc-maven plugin which works just as well as the Jetty plugin and makes life easy for Maven-based developers. My surveys from talks I do suggest that the mix of Maven and Ant based builds is 40-60% Maven. Three years ago it was about 10% Maven. Interestingly in Philadelphia the maven usage was 40% versus 60% in San Francisco. But many Ant people have tried Maven and had a less than stellar experience. My own experience was that Maven was as painful as EJB ever was. But it has been getting better over time.

The forthcoming release of Ehcache bundles the Terracotta server and I am very interested in making this as easy as possible for developers. The 2.1-beta kit has instructions for using the tc-maven plugin. The upcoming 2.1 final will also support Ant. Fortunately Ant and Maven interoperate and we will support Ant via the Maven Ant Tasks library.

Installation

Install Maven

Download and install Maven, which just means expanding the download somewhere on your file system. Version 2.2.1 or higher is required.

Installing Maven Ant Tasks

There are a couple of choices documented at the Maven site. For simplicity, download Maven Ant Tasks 2.1.0 and copy the jar into your $ANT_HOME/lib directory.

Create a pom.xml

Maven requires a pom.xml which is placed in the same directory as build.xml. Use this sample which has all you need:
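
Something like the following (a sketch: the groupId, artifactId and version of the tc-maven plugin shown here are placeholders – check the Forge page linked below for the current coordinates):

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.example</groupId>
  <artifactId>tc-ant-integration</artifactId>
  <version>1.0-SNAPSHOT</version>
  <packaging>pom</packaging>

  <build>
    <plugins>
      <plugin>
        <groupId>org.terracotta.maven.plugins</groupId>
        <artifactId>tc-maven-plugin</artifactId>
        <!-- placeholder: use the current release version -->
        <version>1.5.1</version>
      </plugin>
    </plugins>
  </build>

  <!-- Add the Terracotta repositories here if the plugin is not available through your mirrors -->
</project>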

Additions to build.xml

Add these to build.xml:
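
A sketch of the additions, using the Maven Ant Tasks artifact:mvn task. The tc-start/tc-stop target names and the tc: goal prefix are assumptions; if the prefix does not resolve, use the fully qualified groupId:artifactId:version:goal form or add the plugin group to pluginGroups in settings.xml.

<!-- Declare the Maven Ant Tasks namespace on your existing <project> element:
     <project name="myapp" xmlns:artifact="antlib:org.apache.maven.artifact.ant"> -->

<!-- Point this at your Maven installation -->
<property name="maven.home" value="/usr/local/apache-maven-2.2.1"/>

<target name="tc-start" description="Start a local Terracotta server via the tc-maven plugin">
  <artifact:mvn pom="pom.xml" mavenHome="${maven.home}" fork="true">
    <arg value="tc:start"/>
  </artifact:mvn>
</target>

<target name="tc-stop" description="Stop the local Terracotta server">
  <artifact:mvn pom="pom.xml" mavenHome="${maven.home}" fork="true">
    <arg value="tc:stop"/>
  </artifact:mvn>
</target>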

Ensure you change your maven.home property value to where you installed Maven.

Usage

Starting Terracotta Server
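
With the build.xml additions sketched above, this is just a matter of invoking the start target, for example ant tc-start (the target name is the one assumed in that sketch).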

The Terracotta server will be running on its default port of 9510.

Stopping Terracotta Server
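
Likewise, invoking the assumed stop target, for example ant tc-stop, shuts the local server down again.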

More Information

The Terracotta plugin is documented on the Forge.

I am interested in people’s experiences using this. Ping me at gluck AT gregluck.com or post questions to the Ehcache Forum.

Adding Terracotta Server into your Maven build

Having servers around at development time is a pain. You need tooling to make it smooth. Fortunately, Terracotta has the tc-maven plugin for this purpose.

Integration Testing with Maven

To start and stop the server pre and post integration tests, add the following to your pom.xml:
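
A sketch of the plugin configuration (the coordinates and version are placeholders, and start/stop are the goal names assumed here – see the Forge documentation for the definitive ones):

<plugin>
  <groupId>org.terracotta.maven.plugins</groupId>
  <artifactId>tc-maven-plugin</artifactId>
  <!-- placeholder: use the current release version -->
  <version>1.5.1</version>
  <executions>
    <execution>
      <id>start-terracotta</id>
      <phase>pre-integration-test</phase>
      <goals>
        <goal>start</goal>
      </goals>
    </execution>
    <execution>
      <id>stop-terracotta</id>
      <phase>post-integration-test</phase>
      <goals>
        <goal>stop</goal>
      </goals>
    </execution>
  </executions>
</plugin>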

Greg Luck: New US and European Tour and other Ehcache news

Rubbing the bull's nose

Being full time with Terracotta gives me an opportunity to engage with the Ehcache community like never before.

For example I just came back from two weeks in the US. I gave talks in San Francisco, Philadelphia, New York and Atlanta, this last at DevNexus. Here are the details of that tour.

Tour Dates

In May and June I will be hitting the road again. Tour dates so far:

Date    | Location      | Event      | Topic
13 May  | Sydney        | JUG        | Scaling Hibernate and DAOs and Ehcache 2.0; New stuff
2 June  | San Francisco | Google JUG | Ehcache Google App Engine module and caching in GAE generally
2 June  | Jacksonville  | JUG        | Scaling Hibernate and DAOs and Ehcache 2.0; New stuff
2 June  | Tampa         | JUG        | Scaling Hibernate and DAOs and Ehcache 2.0; New stuff
15 June | France        | JUG        | Scaling Hibernate and DAOs and Ehcache 2.0; New stuff
16 June | Frankfurt     | JUG        | Scaling Hibernate and DAOs and Ehcache 2.0; New stuff
17 June | Amsterdam     | JUG        | Scaling Hibernate and DAOs and Ehcache 2.0; New stuff
18 June | Sweden        | JUG        | Scaling Hibernate and DAOs and Ehcache 2.0; New stuff

Sunrise over the Flat Iron Building, New York City

Topics Flexible

Most people are interested in scaling Hibernate which most of the talks cover. But I am flexible. If you are interested in attending one of these events send me some topic requests to gluck AT gregluck.com.

For example, I learnt on my last tour that around 45% of shops are using JDBC usually with a DAO layer. Because I always use ORM and have been doing that for 7 years this caught me by surprise. Caching DAOs offers the same benefits as Hibernate second level caching. We are developing some new docs on ehcache.org and sample code to show how to do this. So I am going to include that in my next lot.

Another popular topic is Ehcache versus Memcached. Comparing and contrasting the two is a great way to understand what is on offer with Ehcache, particularly in combination with Terracotta.

Other News

There has been a lot going on. Ehcache 2.0 was released a few weeks ago. Ehcache is doing some interesting integrations with Grails, Google App Engine and EC2. Plus there have been new releases of the RESTful server. And next week some bug fix releases are coming: Ehcache 2.0.1 and ehcache-web 2.1.

Finally, we will likely be making some packaging refinements to make it much easier to get Ehcache with Terracotta integrated into your development process. Terracotta is a server. We will probably add Maven and Ant tooling support so that you can easily deploy it locally for running integration tests. Its startup time is 5 seconds, which is pretty quick and compares favourably with things like Tomcat and ActiveMQ.