Ehcache 2.4 with the new Search feature is out!

Ehcache 2.4 launches today. The big new feature right in the core of Ehcache 2.4 is Search.

It uses a new fluent API which looks like this:

Results results = cache.createQuery().includeKeys().addCriteria(age.eq(32).and(gender.eq("male"))).execute();

In short, it lets you further offload the database. With Ehcache now supporting up to 2TB and linear scale-out you can do more than ever.

What is searchable?

You can search against predefined indexes of keys, values, or attributes extracted from values.

Attributes can be extracted using JavaBeans conventions or by specifying a method to call.

For example, to declare a cache searchable and extract age as a JavaBean property and gender through a method call on a Person class:

<cache name="cache3" maxElementsInMemory="10000" >
     <searchable>
          <searchAttribute name="age"/>
          <searchAttribute name="gender" expression="value.getGender()"/>
     </searchable>
</cache>

Caches can also be made searchable programmatically, and custom value extractors can be created so that you can index and search against virtually anything you put in the cache.
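To make the two extraction styles concrete, here is a minimal sketch in plain Java. It does not use the Ehcache API: AttributeExtractionSketch, Person, extractBeanProperty and NAME_INITIAL are hypothetical names invented for illustration, and the reflection lookup is only an assumption about how a JavaBeans-style attribute such as "age" might be resolved to getAge().

```java
import java.lang.reflect.Method;
import java.util.function.Function;

// Illustrative stand-ins only; none of these types are Ehcache classes.
class AttributeExtractionSketch {

    /** Example cached value. */
    static class Person {
        private final String name;
        private final int age;
        private final String gender;

        Person(String name, int age, String gender) {
            this.name = name;
            this.age = age;
            this.gender = gender;
        }

        public String getName() { return name; }
        public int getAge() { return age; }
        public String getGender() { return gender; }
    }

    /** JavaBeans convention: the attribute "age" is read by calling getAge(). */
    static Object extractBeanProperty(Object value, String attribute) {
        String getter = "get" + Character.toUpperCase(attribute.charAt(0)) + attribute.substring(1);
        try {
            Method method = value.getClass().getMethod(getter);
            return method.invoke(value);
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("No readable property: " + attribute, e);
        }
    }

    /** A custom extractor can compute an attribute the value does not expose directly. */
    static final Function<Person, Object> NAME_INITIAL =
            person -> person.getName().substring(0, 1).toUpperCase();
}
```

Calling extractBeanProperty(person, "age") reads the age through its getter, while NAME_INITIAL.apply(person) derives a value that exists nowhere on the object, which is the kind of case a custom extractor is meant for.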

Search Query Language

Ehcache Search introduces EQL, a fluent, object-oriented query language. It follows DSL principles and should feel familiar and natural to Java programmers.

Here is a full example. Search for men whose names start with “Greg”, and then order the results by age. Don’t return more than 10 results. We want to include keys and values in the results. Finally we iterate through the Results.

Query query = cache.createQuery();
query.includeKeys();
query.includeValues();
query.addCriteria(name.ilike("Greg*").and(gender.eq(Gender.MALE))).addOrderBy(age, Direction.ASCENDING).maxResults(10);

Results results = query.execute();
System.out.println(" Size: " + results.size());
for (Result result : results.all()) {
    System.out.println("Got: Key[" + result.getKey()
            + "] Value class [" + result.getValue().getClass()
            + "] Value [" + result.getValue() + "]");
}

EQL is very rich. There are many Criteria, such as ilike, lt, gt and between, plus the logical and and or, which you use to build up complex queries. There are also Aggregators such as min, max, average, sum and count which summarise the results.

As with most NoSQL stores, EQL executes against a single cache: there are no joins. If you need to combine the results of searches from two caches, you can perform two searches and then combine the results yourself.
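Combining the results yourself can be as simple as intersecting the keys returned by each search. The sketch below is plain Java and makes an assumption not stated above: that both caches share a key (a customer id, say), so a match in both result sets identifies the same entity. CombineResultsSketch and intersect are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper; the key collections stand in for the keys of two search results.
class CombineResultsSketch {

    /** Keeps the keys found by both searches, in the order of the first search. */
    static <K> List<K> intersect(Collection<K> firstKeys, Collection<K> secondKeys) {
        Set<K> lookup = new HashSet<>(secondKeys);   // constant-time membership test
        List<K> combined = new ArrayList<>();
        for (K key : firstKeys) {
            if (lookup.contains(key)) {
                combined.add(key);
            }
        }
        return combined;
    }
}
```

If the two caches do not share keys, the same idea applies to whichever attribute links them: collect that attribute from the first result set and use it as the lookup set for the second.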

Standalone and Distributed

Search is built into Ehcache core. It works with standalone in-process caching and will work for distributed caches in the forthcoming Terracotta 3.5 platform release which goes GA in March and is available as a release candidate now.

Distributed cache search is indexed and executes on the Terracotta Server Array using a scatter-gather pattern. The EQL is sent to each cache partition (the scatter), and each partition returns partial results (the gather) to the requesting Ehcache node, which then combines the results and presents them to the caller. Terracotta servers utilise precomputed indexes to speed queries.
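The scatter-gather pattern itself can be sketched in a few lines of plain Java. This is not how the Terracotta Server Array is implemented: the partitions here are in-memory lists scanned without any index, threads stand in for servers, and ScatterGatherSketch is a hypothetical name. It only shows the shape of the pattern, where the same criteria go to every partition and the caller merges the partial results.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Predicate;

// Hypothetical sketch: in-memory lists stand in for cache partitions.
class ScatterGatherSketch {

    static List<Integer> search(List<List<Integer>> partitions, Predicate<Integer> criteria) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, partitions.size()));
        try {
            // Scatter: one task per partition, each evaluating the same criteria.
            List<Callable<List<Integer>>> tasks = new ArrayList<>();
            for (List<Integer> partition : partitions) {
                tasks.add(() -> {
                    List<Integer> partial = new ArrayList<>();
                    for (Integer value : partition) {
                        if (criteria.test(value)) {
                            partial.add(value);
                        }
                    }
                    return partial;
                });
            }
            // Gather: wait for every partial result and merge them on the caller's side.
            List<Integer> combined = new ArrayList<>();
            for (Future<List<Integer>> partial : pool.invokeAll(tasks)) {
                combined.addAll(partial.get());
            }
            Collections.sort(combined);   // ordering is applied after the merge
            return combined;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Because each partition only looks at its own data, adding partitions shrinks the work done per task, which is the property the pattern relies on.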

Indeed, distributed cache performance has an important property: searches execute in O(log N)/partitions time. So if you have 50GB of cache in one partition which takes 40ms to search and you then double the data, you can hold the execution time constant by simply adding another partition. Generally, you can hold execution time constant for any size of data with this approach.

The standalone cache takes a different approach. Most in-process caches are relatively small. And Ehcache is lightning fast. We don’t use indexes but instead visit each element in the cache at most once, resolving the EQL. It takes 5ms to run a typical query against a 10,000 element cache. Generally the performance is O(N) but even a 1 million entry cache will take less than a second to search using this approach.
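The index-free approach amounts to a single pass over the cache. The following plain-Java sketch illustrates the idea only and is not Ehcache's implementation: a Map stands in for the cache, and LinearScanSketch and scan are hypothetical names. Each entry is tested against the criteria exactly once, and only the matches are sorted and truncated.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical sketch: a Map stands in for a standalone cache.
class LinearScanSketch {

    static <K, V> List<Map.Entry<K, V>> scan(Map<K, V> cache,
                                             Predicate<V> criteria,
                                             Comparator<V> order,
                                             int maxResults) {
        List<Map.Entry<K, V>> matches = new ArrayList<>();
        // One pass over the cache: every element is visited once, so this step is O(N).
        for (Map.Entry<K, V> entry : cache.entrySet()) {
            if (criteria.test(entry.getValue())) {
                matches.add(entry);
            }
        }
        // Ordering and the result limit apply to the matches only.
        matches.sort(Map.Entry.comparingByValue(order));
        if (matches.size() > maxResults) {
            return new ArrayList<>(matches.subList(0, maxResults));
        }
        return matches;
    }
}
```

Note that the sort adds O(M log M) for M matches on top of the O(N) scan, so a query that matches most of the cache and asks for ordering costs more than the scan alone.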

Sample Use Cases

Database Search Offload

Take a shipping company that creates 50GB of consignment notes per week. Customers search by consignment note id but also by addressee name. Most searches (95%) are done within two weeks of the creation of a consignment note. The consignment notes get stored in a relational database that grows and grows. Searches against the database now take 650ms, which takes enquiries outside their SLA.

Solution: Put the last two weeks of data in the cache. Index by consignment note id, first name, last name and date. Search the cache first and only search the database if the consignment note is not found there. This takes about 50ms and provides a 95% database offload.
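The cache-first lookup can be sketched as follows. This is a plain-Java illustration under stated assumptions, not code from the shipping example: a Map stands in for the searchable cache of recent consignment notes, a Function stands in for the slow database query, and CacheFirstLookupSketch, find and databaseSearches are hypothetical names.

```java
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical sketch of "search the cache first, then the database".
class CacheFirstLookupSketch {
    private final Map<String, String> recentNotes;                    // stands in for the cache
    private final Function<String, Optional<String>> databaseSearch;  // stands in for the database
    private int databaseSearches = 0;

    CacheFirstLookupSketch(Map<String, String> recentNotes,
                           Function<String, Optional<String>> databaseSearch) {
        this.recentNotes = recentNotes;
        this.databaseSearch = databaseSearch;
    }

    /** Returns the consignment note, touching the database only on a cache miss. */
    Optional<String> find(String consignmentNoteId) {
        String cached = recentNotes.get(consignmentNoteId);
        if (cached != null) {
            return Optional.of(cached);
        }
        databaseSearches++;   // counted so the offload ratio can be observed
        return databaseSearch.apply(consignmentNoteId);
    }

    int databaseSearches() {
        return databaseSearches;
    }
}
```

Counting the fallbacks is a cheap way to check whether the real offload matches the expected 95%: divide databaseSearches() by the total number of calls to find.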

Real-Time Analytics Search

In-house analytics engines process large amounts of data and compute some result from it. The results need to be queried very quickly to enable processing within a business transaction. And the results need to be updated through the day in response to business events. Some examples are credit card fraud scoring, or a holding position in a trading application.

Solution: Create a distributed cache and index it as required. Various roll-ups are cached and updated after the system of record has been written to with new transactions. Use Ehcache’s bulk loading mode to quickly upload the results of overnight analytics runs.

Searches execute much more quickly than it would take to compute positions from scratch using the system of record, enabling the real-time analytics.

More Information

You can get Ehcache 2.4 from ehcache.org, where Search is fully documented along with downloadable demos and performance tests.