Introducing the Elephant Curve

While having a few drinks with some French-speaking colleagues at the Le Meridien hotel in San Francisco during JavaOne 2010, I realised that French speakers have a cool name for a load phenomenon of online systems.

It is difficult to tune an online system for the average daily traffic volume because demand varies a lot during the day. Specifically, in my experience it is common to see demand rise in the morning to a midday peak, lull somewhat in the afternoon, and then build to a lower mid-evening peak before things quieten down. Now, my experience is in travel systems. The explanation we had was that though some of the usage was business related, a lot was leisure: users would tend to search for and book travel at lunchtime and then again after dinner.

It turns out that French speakers have come across the same phenomenon but were clever enough to give it a name: the Elephant Curve. The reference is to Le Petit Prince, a best-selling children’s story from 1943 which has been published in 190 languages. In the book a boa constrictor swallows an elephant, and the silhouette of the boa becomes an elephant curve. Though I did not read the book at school, it seems that many people have.

Here is the elephant curve illustration from the book:

So I plan on calling the e-commerce daily double spike the elephant curve from now on, like the Francophones do. I am going to add it as a slide in my talks and to my Caching Theory chapter on ehcache.org.

Thanks to Ludovic Orban (Belgian) and Alex Snaps (German/Belgian) for apprising me of this.

Ehcache BigMemory’s big speedup leading up to GA

Last month we made a big splash with our news of BigMemory, our add-on to Ehcache which creates very large in-process caches in Java (VLIPCCs) while still avoiding the Achilles heel (hell) of GC pauses. We released charts on ehcache.org showing our performance up to 40GB of cache.

Optimising for Byte Arrays. Why?

We will be GAing later this month and have been doing lots of optimisation work. One case we have optimised for is storage in the cache of byte arrays. Why? Because we also use BigMemory in the Terracotta Server Array (TSA), and data in the TSA is stored in byte arrays. We get to escape GC pauses in the TSA, which is generally a very good idea for a high-performance server. A field recommendation for escaping GC has been to use no more than 4GB of in-memory cache in the TSA, with the balance on disk. With BigMemory, users who do not want disk persistence with the TSA can run the whole lot in memory.
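For readers who have not tried BigMemory yet, off-heap storage is enabled per cache in ehcache.xml. The snippet below is a minimal sketch using the attribute names as I recall them from the beta documentation (overflowToOffHeap, maxMemoryOffHeap), so check the current schema before copying it:

```xml
<!-- Minimal sketch of a cache with a 4GB off-heap store.
     Attribute names are as I recall from the BigMemory beta docs.
     The JVM also needs -XX:MaxDirectMemorySize set at least this large. -->
<cache name="bigMemoryCache"
       maxElementsInMemory="10000"
       eternal="false"
       overflowToOffHeap="true"
       maxMemoryOffHeap="4g"/>
```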

Some new Charts

Himadri Singh, a performance engineer at Terracotta, has done some new performance runs to try out our byte array optimisation, using our published perf tester (see https://svn.terracotta.org/repo/forge/offHeap-test/ — Terracotta Community login required).

These were done on a new machine with 8 cores and 350GB of RAM, of which we used up to 100GB for these tests.

The first chart below shows the mean latency for byte arrays. It is almost four times faster than our beta release a month ago! We now get a flat 110 usec response time right out to 100GB, where previously we had 450 usec out to 40GB.

The next chart is the old test from the documentation page on ehcache.org and is still representative of current performance for non-byte-array data types.

The next chart shows throughput. We also get a fourfold increase in throughput compared to the beta.

What’s Next

The speedup for byte arrays was achieved by optimising serialization for this one case. We know that serialization on puts and deserialization on gets is by far the biggest cost, so, applying Amdahl’s law, it makes sense for us to concentrate our further optimisations there.
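To make the Amdahl’s law argument concrete: if serialization were, say, 90% of the put/get cost (an illustrative figure, not a measurement) and we swapped in a library 50 times faster at it, the overall speedup would be about 8.5x. A quick sketch:

```java
public class AmdahlSketch {
    // Amdahl's law: overall speedup when a fraction p of the total time
    // is accelerated by a factor s (the remaining 1 - p is unchanged).
    static double speedup(double p, double s) {
        return 1.0 / ((1.0 - p) + p / s);
    }

    public static void main(String[] args) {
        // Illustrative assumption: serialization is 90% of put/get cost,
        // and a faster library speeds that portion up 50x.
        System.out.printf("overall speedup: %.2fx%n", speedup(0.9, 50.0)); // ~8.47x
    }
}
```

The takeaway is that even a 50x serialization library caps out well below 50x overall, which is why the remaining non-serialization cost matters too.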

The following chart is from the Java Serialization Benchmark, and shows the performance of alternative Java Serialization frameworks.

Some key observations:

  1. Java’s built-in serialization is pretty slow
  2. There are lots of choices now
  3. We can get around a 50-fold speedup with the faster libraries!
  4. Handwritten serialization is the fastest

So what’s next? We don’t want to hold up ehcache-enterprise 2.3, but we are planning a follow-up release to exploit this. We are considering these changes:

  1. Make Ehcache’s Element Externalizable with a pluggable strategy for readExternal and writeExternal
  2. Expose the pluggable serialization strategy via config on a per cache basis or across an entire CacheManager
  3. Probably bundle, or make an optional dependency on, one of the above serialization packages. If you have a favourite you want to argue for, ping me on Twitter: I am gregrluck.
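To make item 1 a little more concrete, here is a minimal, hypothetical sketch of what a pluggable serialization strategy might look like. ValueSerializer and JavaValueSerializer are names I have made up for illustration, not a real Ehcache API; the default shown is plain Java serialization, with faster libraries slotting in behind the same interface:

```java
import java.io.*;

// Hypothetical interface (not a real Ehcache API): a strategy that an
// Externalizable Element could delegate its readExternal/writeExternal to.
interface ValueSerializer<T> {
    void write(ObjectOutput out, T value) throws IOException;
    T read(ObjectInput in) throws IOException, ClassNotFoundException;
}

// Default strategy: plain Java serialization. A Kryo- or protobuf-backed
// implementation would plug in here without changing callers.
class JavaValueSerializer implements ValueSerializer<Object> {
    public void write(ObjectOutput out, Object value) throws IOException {
        out.writeObject(value);
    }
    public Object read(ObjectInput in) throws IOException, ClassNotFoundException {
        return in.readObject();
    }
}

public class SerializerDemo {
    public static void main(String[] args) throws Exception {
        ValueSerializer<Object> serializer = new JavaValueSerializer();

        // Round-trip a value through the strategy.
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        ObjectOutputStream out = new ObjectOutputStream(bytes);
        serializer.write(out, "hello");
        out.close();

        ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()));
        System.out.println(serializer.read(in)); // prints "hello"
    }
}
```

Exposing something like this per cache, or across a CacheManager, is what item 2 would amount to in config.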

Is it time to fork Java?

This year’s JavaOne was a dismal affair. Crammed into the Hilton hotel and Parc 55, attendees felt that Oracle had ruined the conference. The dual-conference idea also caused Java people problems: those who tried to attend the keynote at Moscone with JavaOne passes were turned away and had to go to the Hilton ballroom to see it televised.

Last year, Larry shocked many with a goofball suggestion to port OpenOffice to JavaFX. This year attendees were shocked to hear of the cancellation of JavaFX; those giving and attending sessions on the topic felt it was futile. I was never convinced that JavaFX could achieve the sort of dominance required to make it work as a browser platform. But what of Java 7, something I am interested in? The beer talk I heard at the Thirsty Bear was that the JCP has been stalemated for the past year on Java 7, over Oracle wanting to add a “restricted field of use” condition limiting the OpenJDK to desktop and server, not mobile. One possibility is for Oracle to abandon the JCP and just release it. The other rumour floating around is that future free versions of the JDK will be reference implementations, with higher-quality or more fully featured versions only available under commercial licence.

All of this suggests to me that Java as we have known it is over. Should we wait for Java to lose momentum and popularity to other languages? Or should we as a community step up and go in a new direction? I prefer the latter. Following is a sketch of how this could be done.

What to call the fork

Java is famous for coffee but also for volcanoes. So let’s call the new fork Lava.

Lava Foundation

We don’t want one company to take over the fork. What would be best is if a foundation, like the Linux Foundation, Mozilla Foundation, or Eclipse Foundation, were formed. This group would be funded by corporations with deep enough pockets to make it work, such as Google, IBM, HP and Red Hat.

It would be a non-profit foundation performing the following duties:

  • A code fork of OpenJDK, based on current trunk
  • A new standards body to replace the JCP
  • Creation and maintenance of a Lava TCK, which implementations would test against
  • A new annual conference, or a series of conferences
  • Maintenance of the Write Once, Run Anywhere guarantee

So how would we maintain this guarantee? The Lava compiler and virtual machine would remain backward compatible with Java 6, so the vast array of existing code would work. Then, depending on IP restrictions, Lava could add support for new language features from later versions of Java. If IP restrictions precluded that, it would stay with Java 6.

The question then becomes whether developers would write to the new Java versions or to the evolving Lava. The answer would likely depend on market traction. In Oracle’s favour is that it has the real Java. However, if you need to pay licence fees to Oracle for the higher-quality implementations you would likely want to run in production, that comes at a cost. If enough commercial companies supported Lava so that it was of very high quality, then developers would follow. And developers would want the open source version to win.

I am interested in what the community thinks of this idea. Ping me at gluck AT gregluck.com.

Updates

October 7:

Stefan Asemota created a Lava Foundation Facebook page here.

October 14:

Well, some interesting developments have occurred in the last week. IBM and Oracle jointly announced that IBM was switching from Project Harmony to OpenJDK and would:

    work with IBM and others to enhance and improve the JCP.

What does this mean? Trink Guarino clarified it for me:

    This includes improving the collaboration with other standards bodies, increasing the participation in the JCP processes and expert groups as well as improving the efficiency of the specification process.

A close reading of the various corporate blogs and press releases shows that the approach between the two companies was made after the reaction to this blog. So here is hoping that IBM, by negotiating in its own interest, will also open things up for the community.

Finally, what about the conference? Happily, I was asked to give some feedback on JavaOne, which I did in great detail. Beyond that, I am going to Devoxx 2010 and will be speaking there on “The essence of Caching”. With Google there, and hopefully the Europeans who skipped JavaOne this year, it should be a bumper conference and may well be the largest Java conference this year.