As you probably have heard, Terracotta has been acquired by Software AG. This is an exciting development for both companies. Ari Zilka, CTO of Terracotta has a comprehensive blog post detailing the acquisition and its implications.

For me, it means I keep working for Terracotta, but now Terracotta is a wholly owned business unit within Software AG.

Ehcache will remain available in its current two editions: open source under the Apache 2 license and commercial with value-add features. And of course it will get even more investment as part of the larger organization.

I joined Terracotta 21 months ago. It has been an amazing ride so far for me and for Ehcache. I am looking forward to the next chapter. Right now my area of focus is on standardizing Java caching by leading the specification of JSR107. Once that is done we will implement the specification in Ehcache.

For those not involved in the expert group or following on the Google Group (jsr107@googlegroups.com), here is an update.
As usual the actual specification in all of it’s evolving glory can also be viewed at https://docs.google.com/document/d/1YZ-lrH6nW871Vd9Z34Og_EqbX_kxxJi55UrSn4yL2Ak/edit?hl=en . Some people find working on and editing a spec in plain view “appalling” but I like it. It is like a nightly build in open source.
And the API and code is on GitHub: https://github.com/jsr107/jsr107spec

Workshops

We have been workshopping through the spec. Everyone on both the Google or EG mailing lists was invited. Physically or on the calls Friday and today have been EG and non-EG members:

Greg Luck (Ehcache and Terracotta)
Yannis Cosmadopolous (Oracle)
Christa (Oracle)
Ben Cotton
David Mossakowski (Citibank)
Richard Hightower
Ludovic Orban (Terracotta and Bitronix)

Specification Sections for Review

We have made a lot of progress. The spec is now 51 pages long.
Two of the most difficult areas have been mostly done. They are:
  • Chapter 5 – JTA. This is ready for review.
  • Chapter 8 – Annotations. This is complete but needs reformatting and TODOs done. It and the associated spec interfaces should be ready for review on Monday.

Significant Developments

There are also two significant developments:
1. Two versions with different API Packaging.
So that caching can easily be done in SE, we are considering having a jcache-api.jar which includes mandatory parts of the spec and a jcache-api-ee.jar which will include both mandatory and optional APIs. The latter takes up annotations and JTA. These can still be included in non-EE apps like Spring or SE but there will be significant dependencies that those environments would need to add.
2. Map vs Map-like
No one involved in the workshops likes Map. We are enumerating the reasons and proposing a new map-like Cache interface. That is being documented in the next few days. Add comments to the spec doc if you have them.

Next Meeting

The easiest way for you to get involved is at the next meeting which is 10am Pacific Standard time this Thursday.
Some topics for the next meeting:
  1. Reviewing comments added to the Google Doc
  2. Classloading
  3. Configuration
  4. Lifecycle

 

I have been very busy the last few months getting JSR107 fired up.

Just to remind you JSR 107 is the Java Temporary Caching API. It is designed to be vendor neutral and will allow for easy change over of implementations in much the same way as JPA or JDBC. In this way it will allow the community to choose the best open source or commercial implementation that best meets their business requirements.

Both Terracotta and Oracle have committed resources to getting this done. Right now that is myself and Yannis Cosmadopoulos from Oracle. Things are moving fast so I thought I would give a status update on where we are at.

JSR107 Early Draft Working Specification Available for Viewing

Work has been going steadily on the spec.

We are drafting this spec in the open on Google Docs. While a work in progress, it is now around 40 pages in length.

See https://docs.google.com/document/d/1YZ-lrH6nW871Vd9Z34Og_EqbX_kxxJi55UrSn4yL2Ak/edit?hl=en

Please keep checking back as this is changing on a daily basis.

We welcome ideas and feedback. Please join the JSR107 public mailing list to do so.

Summary of Scope

The specification covers the following areas:

Object Cache The API will cache objects. Classes that implement Serializable may be stored outside the JVM and potentially shared among JVMs connected by a network but Serializable will not be required.

Format independence The API will specify keys and values but will not limit their types.

Implementation independence The underlying implementation is hidden from API users. An SPI is used to configure a Cache Provider.

Support for Flexible Implementations Though the specification does not require any implementation and a simple in-process implementation will meet the specification, issues raised by distributed caching and storage in Serialized form outside the heap will be dealt with so that those implementations with those features will work well.

Java SE The specification will work with Java SE.

Java EE The specification will work within Java EE. This specification is targeted at inclusion in Java EE 7.

Generics The specification will make use of generic interfaces.

Annotations The specification will define runtime cache annotations.

Declarative Cache Configuration Specifying the behaviour of the CacheManager or Caches in a non-programmatic way. This may take the form of a minimal lowest common denominator with vendor specific further configuration, or it might take the form of a variety of mechansims to inject vendor configuration.

Transactions Support for transactions, both local and XA will be defined but left as optional for implementers

There are many applied areas of caching. This specification will not deal with them. For example:

• Database Caching. JPA[7] deals with that.

• Servlet Caching

• Caching as a REST service

 

Active Membership

I am pleased to report that we have a very healthy membership. We reconfirmed the membership, added new members with expertise and also added as observers the leads from JSR342 Java EE7.

The following are active expert group members:

  • Greg Luck, Terracotta and co-spec lead
  • Cameron Purdy, Oracle and co-spec lead
  • Nikita Ivanov, Grid Gain

• Manik Surtani, Red Hat Middleware LLC

• Yannis Cosmadopolous, Oracle and working on the spec for Oracle

• Chris Berry

• Andy Piper (formerly BEA Systems rep)

• Jon Stevens

The following have been voted in they are moving through the JCP process:

• Eric Dalquist

• Ben Cotton, Citigroup

• David Mossakowski, Citigroup

The following are members of JSR342 and are observers of JSR107 (they get the expert group mails):

• Linda Demichiel, Oracle

• Roberto Chinnici, Oracle

GitHub Repositories

Similar to the specification we are also coding up the API in public with repositories on GitHub.

jsr107-api

See https://github.com/jsr107/jsr107spec

ehcache-jcache

Ehcache via the ehcache-jcache module has provided an implementation of the draft spec to where it got to, for the past 3 years. We have now moved it to GitHub and will develop it along with the spec. If anyone wants to help out please send me your GitHub id.

See https://github.com/jsr107/ehcache-jcache

Public Mailing List

If you want to stay in touch with JSR107, please join our Google Groups public mailing list:  http://groups.google.com/group/jsr107

The address is: jsr107@googlegroups.com

We copy most expert group emails to this list.

 

I have been doing some NoSQL research lately. The first fruit of that work was a guest post on myNoSQL, Ehcache: Distributed Cache or NoSQL Store, which crisply distinguished between a Distributed Cache and NoSQL Stores.

In this article I am going to delve into the suitability of each for various technical use cases. I use the word “technical” because a usual use case is a business use case. Here we are interested in a set of features that allow a certain usage. In a follow up I hope to create a second, more business use case oriented table.

I welcome feedback on this, particularly from those with production experience.

Technical Use Case Distributed Cache NoSQL Key Value NoSQL Columnar NoSQL Graph NoSQL Document
Database Offload Excellent Poor (1) Poor (1) Poor (1) Poor (1)
Database Replacement Poor (2) Poor (3) Poor (3) Poor (3) Poor (3)
Weak Consistency Caching Excellent Average (2) Average Poor Poor
Eventual Consistency Caching Excellent (4) Average (5) Average (5) Average (5) Average (5)
Strongly Consistent Caching Excellent Poor Poor Poor Poor
ACID Transactional Caching Excellent Poor Poor Poor Poor
Low Latency Data Access Excellent Average (5) Average (5) Average (5) Average (5)
Big Data (6) Poor Excellent Excellent Excellent Excellent
Big Memory (7) Excellent (8) Poor Poor Poor Poor

Notes

  1. To offload the database you need to work in places and ways in which the database works. So for example you need to support transactions if they are being used and you need a place to plug in to avoid a ton of work like Hibernate or OpenJPA. NoSQL stores don’t do that.
  2. Distributed caches may not provide long term persistence and management of data. They are also often limited in size so may not be able to store all of the data.
  3. It is not clear that NoSQL is a full database replacement. The “Not Just SQL” as an alternative expansion of the acronym, something widely accepted by the NoSQL community, acknowledges this. The lack of SQL, the lack of ACID, sophisticated operations tools and so on, mean that NoSQL itself is not great at being a replacement. Rather, if you can rethink your need for a database to needing persistence, and you can change your application code, then it comes into play.
  4. In a node to the elegant CAP trade off allowed by eventual consistency, Ehcache 2.4.1, due out the end of March adds this consistency mode.
  5. Distributed Caches store hot data in process. You might think of memcache as a distributed cache, which it claims to be but it does not store data in -process – it is always over the network. And NoSQL is always over the network. In most R + W > N strategies, R is greater than one, so that multiple network reads are required and the caller must wait for R reads where each read is to a different server which will have a varying response. Distributed Ehcache has latencies of < 1 ms whereas the average for NoSQL is 5-10ms. This is also why NoSQL gets an average for Weak Consistency Caching. A cache should be fast.
  6. “Big Data” is a moving target that is today generally understood to start at a few dozen terabytes and go up into petabytes. The current implementation of Ehcache has been used to manage datasets up to 2 TB which is just at the starting point of Big Data. The whole point of NoSQL is Big Data, so they get full marks in this area.
  7. “Big Memory” is also a moving target and is early on it’s use as a term. We define it to mean using the physical limits of the hardware. For many architectures this has not been possible. With Java the issue was first 32 bits and then now the limitation is garbage collection. We overcame that issue with our BigMemory architecture, using storage in off-heap byte buffers in September 2010.
  8. Caches tends to be memory-resident. BigMemory allows in-memory densities per physical server up to their limits, which is 2TB for the current generation of commodity hardware from Dell, HP and Oracle but much lower due to their architectures which require full CPU population to achieve maximum memory. Although not all vendors are similarly constrained: Cisco UCS boxes allow more memory per CPU, so that for example they can do 384GB with 2 CPUs. NoSQL stores focus on persistency and have small in-memory server side caches. They focus on speeding up disk reads and writes by for example doing append only.

I ran a Maven build this morning and it broke. Strange, as I had not changed anything. Maven’s ability to simply break because of a change to a non-versioned dependency or even more arcane, a versioned dependency with a non-versioned dependency of it’s own (a transitive dependency) is legendary.  So I thought that was the trouble. On my other machine I ran a build and the Maven output “looked” different. Stranger still. So I did a mvn version on both.

Whoa! My Big Mac was 2.11 and my MacBook Pro was 3.0.2.

Apache Maven 3.0.2 (r1056850; 2011-01-09 10:58:10+1000)
Java version: 1.6.0_24, vendor: Apple Inc.
Java home: /System/Library/Java/JavaVirtualMachines/1.6.0.jdk/Contents/Home
Default locale: en_US, platform encoding: MacRoman
OS name: "mac os x", version: "10.6.6", arch: "x86_64", family: "mac"

It came down in the “Java for Mac  OS X  10.6 Update 4″ which was released on 10 March.

Now a lot of folks are not ready to move to 3. For myself firstly I don’t need the grief until it is rock solid. Secondly, I use the site plugin and reporting extensively and this is all getting a big makeover in 3 which is not yet done.

There are lots of ways to go back to your old version.

My was is to add the old version into the front of my PATH in .bash_profile:

export PATH=/Users/gluck/work/apache-maven-2.2.1/bin:$PATH

 

JSR342

JSR342 was created on 14 March 2011. JSR107, or JCACHE, is included: In JSR342′s words:

The following new JSRs will be candidates for inclusion in the Java EE 7 platform:

Concurrency Utilities for Java EE (JSR-236)
JCache (JSR-107)

Isn’t JSR107 inactive?

But how could this happen if JSR107 is inactive?

Well the answer is that we are reactivating it. Oracle (various staff) and Terracotta (mostly me) have started work on the specification with the hope of having a draft spec for review by 20 April. To actually make sure it happens Oracle have allocated resources to work on this project. They are being led by Cameron Purdy, who is co-spec lead along with myself of JSR107 and the founder of Coherence. And of course I founded and still continue to lead Ehcache as CTO of it at Terracotta.

To be officially reactivated we need to submit the draft spec. So reactivation should happen on 20 April.

Motivations for finishing JSR107

Today there are two leading widely scoped frameworks for developing enterprise applications: Spring and Java EE. With the release of Spring 3.1, Spring, heavily influenced by Ehcache Annotations for Spring, has significantly enhanced their caching API. It is easier for Spring because they are a single vendor framework and can do things outside of a standards process. Java EE is still lacking any general purpose caching API. There are some use-specific APIs scattered throughout such as in JPA, but nothing developers can write to. And I know there is a significant need for a general purpose API. So, Java EE 7 wants a general purpose caching API, and this is the primary reason for finishing JSR107.

Another reason is that in-process caching is now heavily commoditised but not standardised. There are more than 20 open source in-process caching projects and another 5 or so commercial distributed caches. But if a user wants to change implementations, they need to change their code. This is akin to database access not having the JDBC standard. So we need to provide a standard API so that users can change caching implementations at low cost.

Scope of JSR107

There has been a bit of discussion about this, but it is most likely that the scope will be as it has been plus two new areas:

  1. Generics – similar to collections, allow caches to be created with defined keys and values
  2. Annotations and integration with JSR342 – allow caching annotations so that for example the return value from any functional method can be cached

The draft specification as it has existed for a few years is available under a net.sf.jsr107cache package name on SourceForge. And Ehcache provides an implementation of that draft spec via ehcache-jcache.

Get Involved

It seems a good time to look at the expert group and make a general invitation for new members.

If you are interested and would like to spend some time on this, please email me at gluck At gregluck.com and I can explain how to start. Additions to membership of the JSR107 are by voting of the existing members.

Is Ehcache a NoSQL store? No, I would not characterise it as that, but I have seen it used for some NoSQL use cases. In these situations it compared very well — with higher performance and more flexible consistency than the well-known NoSQL stores. Let me explain.

Read more

Check out my Google Tech Talk on The Essence of Caching: http://www.youtube.com/watch?v=TszcAWgCXD0

Ehcache 2.4 launches today. The big new feature right in the core of Ehcache 2.4 is Search.

It uses a new fluent API which looks like this:

Results results = cache.createQuery().includeKeys().addCriteria(age.eq(32).and(gender.eq(“male”))).execute();

In short, it lets you further offload the database. With Ehcache now supporting up to 2TB and linear scale-out you can do more than ever.

What is searchable?

You can search against predefined indexes of keys, values, or attributes extracted from values.

Attributes can be extracted using JavaBeans conventions or by specifying a method to call.

For example to declare a cache searchable and extract age as a JavaBean and gender as a method out of a Person class:

<cache name="cache3" maxElementsInMemory="10000" >
     <searchable>
          <searchAttribute name="age"/>
          <searchAttribute name="gender" expression="value.getGender()"/>
     </searchable>
</cache>

Caches can also be made searchable programmatically. And Custom Value Extractors can be created so that you can index and search against virtually anything you put in the cache.

Search Query Language

Ehcache Search introduces EQL, a fluent, Object Oriented query language which we call EQL, following DSL principles, which should feel familiar and natural to Java programmers.

Here is a full example. Search for men whose names start with “Greg”, and then order the results by age. Don’t return more than 10 results. We want to include keys and values in the results. Finally we iterate through the Results.

Query query = cache.createQuery();
query.includeKeys();
query.includeValues();
query.addCriteria(name.ilike(“Greg*”).and(gender.eq(Gender.MALE))).addOrderBy(age, Direction.ASCENDING).maxResults(10);

Results results = query.execute();
System.out.println(” Size: ” + results.size());
for (Result result : results.all()) {
System.out.println(“Got: Key[" + result.getKey()
+ "] Value class [" + result.getValue().getClass()
+ "] Value [" + result.getValue() + "]“);
}

EQL is very rich. There is a large number of  Criteria such as ilike, lt, gt, between and or which you use to build up complex queries. There are also Aggregators such as min, max, average, sum and count which will summarise the results.

Like NoSQL, EQL executes against a single cache – there are no joins. If you need to combine the results of searches from two caches, you can perform two searches  and then combine the results yourself.

Standalone and Distributed

Search is built into Ehcache core. It works with standalone in-process caching and will work for distributed caches in the forthcoming Terracotta 3.5 platform release which goes GA in March and is available as a release candidate now.

Distributed cache search is indexed and executes on the Terracotta Server Array using a scatter gather pattern. The EQL is sent to each cache partition (the scatter), returning partial results (the gather) to the requesting Ehcache node which then combines the results and presents them to the caller. Terracotta servers utilise precomputed indexes to speed queries.

Indeed, the distributed cache performance has an important property: searches execute in O(logN)/partitions time. So if you have 50GB of cache in one partition which takes 40ms to search and then you double the data, you can hold the execute time constant by simply adding another partition. Generally, you can hold execution time constant for any size of data with this approach.

The standalone cache takes a different approach. Most in-process caches are relatively small. And Ehcache is lightning fast. We don’t use indexes but instead visit each element in the cache a maximum on once, resolving the EQL. It takes 5ms to run a typical query against a 10,000 element cache. Generally the performance is O(N) but even a 1 million entry cache will take less than a second to search using this approach.

Sample Use Cases

Caches can also be made searchable programmatically. And Custom Value Extractors can be created so that you can index and search against virtually anything you put in the cache.

Database Search Offload

Take a shipping company that creates 50GB of consignment notes per week. Customers search by consignment note id but also by addressee name. Most searches (95%) are done within two weeks of the creation of a consignment note.   The consignment notes get stored in a relational database that grows and grows. Searches against the database now take 650ms which take enquiry outside it’s SLA.

Solution: Put the last two weeks of data in the cache. Index by consignment note id, first name, last name and date.  Search the cache first and only search the database if there the consignment note is not found. This takes about  50ms and provides a 95% database offload.

Real-Time Analytics Search

In-house analytics engines processes large amounts of data and compute some result from it. The results need to be queried very quickly to enable processing of a within a business transaction. And the results need to be updated through the day in response to business events. Some examples are credit card fraud scoring, or a holding position in a trading application.

Create a distributed cache and index it as required. Various roll-ups are cached and updated after the system of record has been written to with new transactions. Use Ehcache’s bulk loading mode to quickly upload the results of overnight analytics runs.

Searches execute much more quickly than it would take to compute positions from scratch using the system of record, enabling the real-time analytics.

More Information

You can get Ehcache 2.4 from ehcache.org. Search is fully documented along with downloadable demos and performance tests here.

Just when you think the database will never get smarter, it does. And the database that matters most to Enterprise Developers is Oracle.

What happens to coherency between your distributed cache and the database when another application changes the database? If it is a Java application, you can add Ehcache to it and configure it to connect to the distributed cache. But this might require work you do not want to do right now. Or there might be tens of apps involved. Or there might be scripts which are run by DBAs from time to time. Or it could be another language. We have the Cache Server, where you can expose a distributed cache via REST or SOAP and then invalidation becomes as simple as sending a HTTP Delete to the Element resource. See the Cache Server Documentation for more on this.

For a few years, MySQL users have had access to cache invalidation for Memcached using libmemcached. The database calls back to memcached. Brian Aker mentioned to me he was adding this into MySQL a few years ago. See http://www.mysqlconf.com/mysql2009/public/schedule/detail/6277 for a decent introduction.

For Oracle, back in 2005 I tried to use their message queue integration to achieve the same effect. Back then it didn’t even work. I have heard that it works now, but it is a very messy solution with lots of moving parts.

Fortunately, starting from 11g Release 1 (11.1), the Oracle JDBC driver provides support for the Database Change Notification feature of Oracle Database. Using this functionality of the JDBC drivers, multitier systems can take advantage of the Database Change Notification feature to maintain a data cache as up-to-date as possible, by receiving invalidation events from the JDBC drivers. See  the Database Change Notification chapter in the Oracle docs for details.

This lets you achieve a new and very simple way of doing a cache invalidation protocol to keep your cache and the database in sync.

You create a DatabaseChangeListener and register it with your connection:

// conn is a OracleConnection object.
// prop is a Properties object containing the registration options.
DatabaseChangeRegistration dcr = conn.registerDatabaseChangeNotifictaion(prop);
...
// Attach the listener to the registration.
// Note: DCNListener is a custom listener and not a predefined or standard
// lsiener
DCNListener list = new DCNListener();
dcr.addListener(list);

I like this so much I think we might add a standard DatabaseChangeListener to Ehcache for it.