I will be speaking at JavaOne 2011. My sessions are:
24241 – The Essence of Caching, Parc 55 – Divisadero at 10:30am Tuesday 4 October
This presentation distills what the speaker has learned from 10 years of scaling Java. It starts with a performance problem and leads you through solving it with caching, discovering the problems of distributed caching and their solution along the way. It will equip you with the tools to analyze a performance situation and see whether a cache will help and what type of cache to apply.
• The nature of system load
• Desirable properties of scalable systems
• Caching as a solution for offload, scale-out, and performance
• Why caching works
• Tiered cache design
• SOR coherency problem and solutions
• N * problem and solutions
• Cache cluster topologies
• CAP and PACELC constraints
• Resulting design trade-offs
24223 – The New JSR 107 Caching Standard, Imperial Ballroom A, Hilton San Francisco at 1:30pm Tuesday 4 October
In this session, the two spec leads for JSR 107 walk you through this important new caching standard, which will form part of Java EE 7.
You will learn how to
• Abstract your caching implementation, much as with JDBC
• Use the rich and modern API
• Use the new caching annotations
• Use the API before Java EE 7 is released within the Java SE, Java EE 6, and Spring environments
• Apply JCache to common caching scenarios
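To illustrate the first point, here is a minimal sketch of what coding to a cache abstraction looks like. These interfaces are hypothetical stand-ins to show the idea, not the JSR 107 API itself, which was still being finalised at the time:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical names, NOT javax.cache: they just illustrate coding to a
// cache interface rather than an implementation, the way JDBC abstracts
// over database drivers.
interface SimpleCache<K, V> {
    void put(K key, V value);
    V get(K key);
}

// One possible provider behind the interface; Ehcache or any other
// implementation could be swapped in without touching application code.
class MapBackedCache<K, V> implements SimpleCache<K, V> {
    private final Map<K, V> store = new ConcurrentHashMap<>();
    public void put(K key, V value) { store.put(key, value); }
    public V get(K key) { return store.get(key); }
}

class CacheAbstractionDemo {
    public static void main(String[] args) {
        // Application code depends only on the interface.
        SimpleCache<String, String> cache = new MapBackedCache<>();
        cache.put("greeting", "hello");
        System.out.println(cache.get("greeting")); // prints "hello"
    }
}
```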
Come along and feel free to ask me any questions after my sessions.
This How To is too well hidden for its own or its end users’ good:
Specification Sections for Review
- Chapter 5 – JTA. This is ready for review.
- Chapter 8 – Annotations. This is complete but needs reformatting and TODOs done. It and the associated spec interfaces should be ready for review on Monday.
- Reviewing comments added to the Google Doc
Last month we made a big splash with our news of BigMemory, our add-on to Ehcache which creates very large in-process caches in Java (VLIPCCs) while still avoiding the Achilles heel of GC pauses. We released charts on ehcache.org showing our performance up to 40GB of cache.
Optimising for Byte Arrays. Why?
We will be GAing later this month and have been doing lots of optimisation work. One case we have optimised for is storage in the cache of byte arrays. Why? Because we also use BigMemory in the Terracotta Server Array (TSA), and data in the TSA is stored in byte arrays. We get to escape GC pauses in the TSA, which is generally a very good idea for a high-performance server. A field recommendation to escape GC has been to use no more than 4GB of cache in-memory in the TSA, with the balance on disk. With BigMemory, users who do not want disk persistence with the TSA can run the whole lot in memory.
Some new Charts
Himadri Singh, a performance engineer at Terracotta, has done some new performance runs to try out our byte array optimisation, using our published perf tester (see https://svn.terracotta.org/repo/forge/offHeap-test/ ).
These were done on a new machine with 8 cores and 350GB of RAM, of which we used up to 100GB for these tests.
This first chart below shows the mean latency for byte arrays. It is almost four times faster than our beta release a month ago! We now get a flat 110 usec response time right out to 100GB, where the beta gave 450 usec out to 40GB.
This next chart is the old test from the documentation page on ehcache.org and is still representative of the present performance for non-byte-array data types.
This next chart is throughput. We also get a fourfold increase in throughput compared to the beta.
The speedup for byte arrays was achieved by optimising serialization for this one case. We know that serialization for puts and deserialization for gets is by far the biggest cost, so, applying Amdahl’s law, it makes sense for us to concentrate our further optimisations there.
The following chart is from the Java Serialization Benchmark, and shows the performance of alternative Java Serialization frameworks.
Some key observations:
- Java’s built in one is pretty slow
- There are lots of choices now
- We can get around a 50 times speedup with the faster libraries!
- Handwritten is the fastest
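To see why Amdahl’s law points at serialization, plug in some numbers. If serialization were, say, 80% of total put/get cost (an illustrative figure, not a measurement) and a faster library delivered the roughly 50-times speedup above, the overall speedup would be about 4.6 times:

```java
class AmdahlDemo {
    // Amdahl's law: overall speedup = 1 / ((1 - p) + p / s)
    // where p is the fraction of work sped up and s is its speedup factor.
    static double overallSpeedup(double p, double s) {
        return 1.0 / ((1.0 - p) + p / s);
    }

    public static void main(String[] args) {
        // p = 0.8 is an assumed fraction for illustration; s = 50 is the
        // library speedup quoted from the serialization benchmark above.
        System.out.printf("%.2f%n", overallSpeedup(0.8, 50.0)); // prints 4.63
    }
}
```

The leverage drops off quickly once serialization stops dominating, which is exactly why concentrating effort there makes sense.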
So what’s next? We don’t want to hold up ehcache-enterprise 2.3, but we are planning a follow-up release to exploit this. We are thinking about these changes:
- Make Ehcache’s Element Externalizable with a pluggable strategy for readExternal and writeExternal
- Expose the pluggable serialization strategy via config on a per cache basis or across an entire CacheManager
- Probably bundle or make an optional dependency on one of the above serialization packages. If you have a favourite you want to argue for, ping me on Twitter. I am gregrluck.
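A minimal sketch of what the first two changes could look like. These names (SerializationStrategy, DefaultJavaStrategy, CacheElement) are hypothetical, not actual Ehcache code; the point is the Externalizable delegation pattern:

```java
import java.io.*;

// Pluggable strategy: a faster serialization library could implement this.
interface SerializationStrategy {
    void write(ObjectOutput out, Object value) throws IOException;
    Object read(ObjectInput in) throws IOException, ClassNotFoundException;
}

// Default falls back to plain Java serialization.
class DefaultJavaStrategy implements SerializationStrategy {
    public void write(ObjectOutput out, Object value) throws IOException {
        out.writeObject(value);
    }
    public Object read(ObjectInput in) throws IOException, ClassNotFoundException {
        return in.readObject();
    }
}

// Element-like holder: readExternal/writeExternal delegate to the strategy,
// which could be set per cache or per CacheManager via config.
class CacheElement implements Externalizable {
    static SerializationStrategy strategy = new DefaultJavaStrategy();
    private Object value;

    public CacheElement() {}               // required by Externalizable
    public CacheElement(Object value) { this.value = value; }
    public Object getValue() { return value; }

    public void writeExternal(ObjectOutput out) throws IOException {
        strategy.write(out, value);
    }
    public void readExternal(ObjectInput in) throws IOException, ClassNotFoundException {
        value = strategy.read(in);
    }
}

class PluggableSerializationDemo {
    public static void main(String[] args) throws Exception {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(new CacheElement("payload"));
        }
        try (ObjectInputStream ois = new ObjectInputStream(
                new ByteArrayInputStream(bos.toByteArray()))) {
            // round trip through the pluggable strategy
            System.out.println(((CacheElement) ois.readObject()).getValue()); // prints "payload"
        }
    }
}
```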
Ehcache has configurable correctness and coherency settings. A good, though not complete, analogy is database isolation levels.
So, within an Ehcache cache clustered with Terracotta, you have:
Coherent – Read Committed
I went to DevNexus in Atlanta a few weeks back. Neal Ford gave a talk on predicting the future. Neal is a great speaker and I found it very thought provoking. Now as a former colleague of Neal’s I also felt challenged to think about whether I agreed with him. Neal is a super-smart polyglot coder who does not quite view the world the way most devs do.
Here is my list and justifications for each. Some of these disagree with Neal’s predictions.
Will developers have to deal with the challenges of parallelisation in CPUs? No.
Neal said yes, and that this was justification to go with something like Scala with its Actors to avoid learning threading in Java. I say no, and here’s why.
The issue is real enough. Core frequency has stopped increasing and CPUs are getting more cores. Lots more cores. For example, the Sun Fire X4640 server comes with 4 to 8 six-core AMD Opteron processors. That’s up to 48 cores. The normal threading approach has been to use monitors, where one thread at a time gets exclusive access to an object or a synchronized block. That works well with small numbers of cores, but not with lots: more and more time gets spent waiting to acquire a lock. The newer approach avoids this as much as possible with a whole slew of tricks, such as copy-on-write collections like CopyOnWriteArraySet and compare-and-swap (CAS) operations.
So will devs have to deal with this? No. Why? Because in the Java world developers are mostly protected from multi-threading. Instead the JDK (think java.util.concurrent), web containers and app servers (think Glassfish, Jetty and Tomcat) and libraries, like my own Ehcache, have to deal with it. Those libraries that do will get to play in this new world; those that don’t will fall by the wayside. But it is these product and project vendors who are affected, not the vast developer population.
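For a flavour of what those libraries do under the covers, here is a minimal CAS retry loop built on java.util.concurrent’s AtomicLong. This is the style of lock-free code that application developers rarely need to write themselves:

```java
import java.util.concurrent.atomic.AtomicLong;

// A lock-free counter: instead of a synchronized block, a compare-and-swap
// (CAS) retry loop. AtomicLong maps to a hardware CAS instruction, so no
// thread ever blocks waiting for a monitor.
class CasCounter {
    private final AtomicLong count = new AtomicLong();

    public long increment() {
        long current;
        do {
            current = count.get();
            // compareAndSet only succeeds if no other thread changed the
            // value in between; otherwise we re-read and retry.
        } while (!count.compareAndSet(current, current + 1));
        return current + 1;
    }

    public long get() { return count.get(); }

    public static void main(String[] args) {
        CasCounter c = new CasCounter();
        for (int i = 0; i < 3; i++) c.increment();
        System.out.println(c.get()); // prints 3
    }
}
```

(In real code you would just call AtomicLong.incrementAndGet(), which does exactly this internally; the explicit loop is spelled out to show the technique.)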
One ring to rule them all… Maven
William Gibson once said “The future is already here. It’s just not very evenly distributed.” A few years ago I banked on Maven becoming a big deal and decided to back it big time. I converted Ehcache to it and the corporate stuff I was doing too. Everywhere. People complain that Maven is too complicated. Well, the problem is complicated. I admit that Maven ranks up there with EJB in its difficulty. But it is getting better. Maven sets lofty ambitions for a system of world-wide component development, and is largely successful. Hats off to Jason and co.
I surveyed my audiences on my recent US tour. In Silicon Valley, 60% used Maven. In the rest of the country about 40-50%. Where is the future already here?
Swallow the red pill.
Brave new polyglot world
4 years ago, fearing I was missing out on the dynamic language revolution, I learnt Python and then Ruby, both of which were in use at my workplace. That was useful and fun.
Now, as Java developers, it is difficult not to feel inadequate unless you know Java and a couple more JVM languages, which in descending order of popularity I believe are: Groovy, Scala, JRuby, Clojure, Jython …
OK, so how many people can code fluently in Java and one of these in the same day? Well, I asked my audiences. Answer: < 5%. Which, thinking back, gelled with my corporate experience. There were two guys, both brothers, who were polyglots. And a few pretenders. And the rest of us struggled.
For the record, I am saying that we all need to know at least XML, HTML, CSS, JavaScript, shell, Maven and Ant – no argument there.
Neal suggested that projects would be written in multiple languages because “that’s how ThoughtWorks Studios does it and that is the future”. Not. ThoughtWorks is filled with polyglots like Neal – which is not representative of the community at large.
My prediction is that a whole project will be written in one JVM language, whether it be Java or one of the others.
Of course a given project can exploit existing libraries, which are in bytecode form independent of the language they were written in. Incidentally, to support this world, Ehcache is in Grails, has a very useful Groovy caching framework called SpringCache (it is for Groovy, not Spring, despite the name), has a JRuby gem and so on. In other words, we make ourselves available.
Self Healing Open Source
I sometimes worry about the future of Java with the consolidation that has happened – not just the Oracle acquisition but things like Spring. It is a different world than the one we were in 5 years ago. Something similar happened in Unix, which started off free in the 1970s but by the late 80s was dominated by commercial vendors who were in the Unix Wars.
What happened? Linux.
Before turning off the lights, Sun open sourced pretty much everything they owned. That creates a legal basis for the forking of that code into new projects if the open source community feels it needs to self-heal.
So will it? That depends on the vendors. But I think we as developers are safe.
Virtualisation and that cloudy stuff
Love it or hate it, virtualisation is here to stay. Get over it.
Are there problems? Yes. Do the cloud environments create even more problems? Yes. But this is a sea change in our industry which is mostly about freeing us from sysadmin costs and is therefore unstoppable. Get on board or become a dinosaur.
I read a book 5 years ago called The Skeptical Environmentalist which argued quite convincingly for a whole host of politically incorrect views, most notably that there were multiple explanations for the warming observations, such as the sunspot level. Bjørn Lomborg was pilloried for this and formally investigated for scientific dishonesty. He prevailed.
So at the time my view was that the case was unproven. The nice thing about empirical science is that theories can be falsified with more data. And of course the more data we get, the more probable it becomes that global warming is real and is not a problem that will solve itself.
So for some non-Java predictions:
- Global warming will eventually be proven to most people’s satisfaction (think evolution) to be correct
- Despite the heroic efforts of the Europeans amongst others, some serious warming will occur
- This will happen in a perfect storm with the rising cost of fertilizers (based on the price of oil) rolling back a chunk of the green revolution of the 60s
- There will therefore be lots of hot, thirsty, hungry people looking for a new home
- New Zealand is one of the few places in the world predicted to be little affected by global warming
- lots of people will want to move to New Zealand
So the smart money would say “Apply for New Zealand citizenship now and beat the rush”.
Ehcache Server provides a RESTful API for cache operations. I am working on v0.9 and have been doing some performance benchmarks. I thought it would be interesting to compare it with the performance of that other over-the-network cache, Memcache. Now, I already knew that Ehcache in-process was around 1,000 times faster than Memcache. But what would the over-the-network comparison be?
Here are the results:
Memcache and SpyMemcache client:
- 10000 sets: 3396ms
- 10000 gets: 3551ms
- 10000 getMulti: 2132ms
- 10000 deletes: 2065ms

Ehcache Server 0.9 with Ehcache 2.0.0:
- 10000 puts: 2961ms
- 10000 gets: 3841ms
- 10000 deletes: 2685ms
So, the results are a wash. Memcache is slightly slower on put, maybe because the JVM does not have to malloc; it already has the memory in heap. And Memcache is very slightly faster on get and delete.
A few years ago there was a raging thread on the Memcache mailing list about Memcache versus MySQL with in-memory tables. They were also a wash. I think the point is that serialization and network time are more significant than the server time, provided the servers are not that much different. And this is what we see with Ehcache.
And now for the implications:
- REST is just well-formed HTTP, and just about any programming language supports it, without needing a language-specific client as Memcache does. Ehcache Server was the first Java cache to support a REST API, but many others have followed. Why? It is a really good idea.
- Performance wise, REST backed by a Java App Server and Cache, is about as fast as Memcache.
- Therefore, your decision on what to use should depend on other things. Memcache is no-frills. If you chuck Terracotta in behind Ehcache (we are one company now, after all) then you get HA, persistence, and coherence expressed how you like: as a guarantee that you always read the last write, or more formally with XA transactions.
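To make the “REST is just well-formed HTTP” point concrete, the sketch below builds the raw HTTP requests a client would send to a cache resource. The /ehcache/rest/{cache}/{key} URI layout follows the Ehcache Server documentation, but treat the exact paths and host as assumptions to verify against your deployment:

```java
// Builds raw HTTP request text for cache operations. Any HTTP client in
// any language can send these; no special cache client library is needed.
class RestCacheRequests {
    // NOTE: host and URI layout are assumptions for illustration.
    static final String HOST = "localhost:8080";

    // PUT stores a value under /ehcache/rest/{cache}/{key}
    static String put(String cache, String key, String body) {
        return "PUT /ehcache/rest/" + cache + "/" + key + " HTTP/1.1\r\n"
             + "Host: " + HOST + "\r\n"
             + "Content-Type: text/plain\r\n"
             + "Content-Length: " + body.length() + "\r\n\r\n"
             + body;
    }

    // GET retrieves it from the same resource
    static String get(String cache, String key) {
        return "GET /ehcache/rest/" + cache + "/" + key + " HTTP/1.1\r\n"
             + "Host: " + HOST + "\r\n\r\n";
    }

    public static void main(String[] args) {
        System.out.println(put("sampleCache", "1", "hello"));
        System.out.println(get("sampleCache", "1"));
    }
}
```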
Finally, I will be giving a Webinar in a few weeks where I compare Ehcache and Memcache.
The last time I was in the US a few months ago, I was told that the grace period for the new ESTA requirement of the US Visa Waiver Program would expire in a few months. Because that time was not up, I did not apply.
While this is true, the US Department of Immigration in addition requires all Air New Zealand travellers to use the ESTA program. So on attempting to board, we had to apply. No problem.
I searched for “us visa waiver australians” and up came three “Sponsored Links”
www.estaaustralia.org Welcome to the U.S. ESTA Application Website.
www.ESTA-au.org Welcome to the U.S. ESTA Application Website.
www.smartraveller.gov.au Check out the information that is not in your guidebook before you go
I filled out the applications on the first link. AUD53 payment was required for each which I provided by credit card.
Then back to the check-in counter. The applications had not come through, which was strange. Then they said there is no fee. Finally, they had leaflets for the site you are meant to use, which is https://esta.cbp.dhs.gov.
This smelt like a scam. Air New Zealand rang the US Consulate who confirmed that I had not been registered, and then pointed out that I didn’t need to be because the grace period was not over 🙂
Final call was to the Commonwealth Bank to cancel my credit card and dispute the payments. A fun boarding.
I have released a new version of Ehcache, 1.5.0-beta1, which provides a host of new features and a few bug fixes. Some of the features are up to a year old, so this pretty large release is a chance to clear the decks ahead of some exciting new work coming in 1.6, such as the ehcache Cache Server.
Please dive into this version and let me know if you find any issues. I am hoping the tyres will be kicked by enough people to do a final release in 3-4 weeks’ time.
Added JGroups implementation. Thanks to Pierre Monestie for the patch(es) for this. Though new to the core distribution, JGroups replication has been in production use in a large cluster for the last year. This does not create a dependency on JGroups unless you want to use this replicator. That will be made clearer when it is moved to a separate module before the final release.
CachingFilter performance improvements
Constructs performance improvements
Added loadAll() to the loader implementation to enable preloading without specifying keys.
diskPersistent can now be used with caches that use only a MemoryStore in normal use but want to persist to disk
DiskStores are now optional. The diskStore element is now non-mandatory. This will simplify configurations, particularly where multiple CacheManagers are being used. If one or more caches requires a DiskStore and none is configured, java.io.tmpdir will be used and a warning message will be logged to encourage explicit configuration of the diskStore path.
The default RMI-based cache replication can now configure the RemoteObject port so that it can easily be made to work through firewalls. This is done via a new property, remoteListenerPort, on RMICacheManagerPeerListenerFactory.
Added a new system property expansion token “ehcache.disk.store.dir” to DiskStore configuration which can be used to specify the DiskStore directory on the command line or in an environment variable.
e.g. java -Dehcache.disk.store.dir=/u01/myapp/diskdir …
Added the ability to specify system property tokens using $tokenname in ehcache.xml which are then replaced when the configuration is loaded.
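As a sketch of how the disk store features above fit together in ehcache.xml (the cache settings here are illustrative, and the exact token syntax should be checked against the shipped documentation):

```xml
<ehcache>
    <!-- "ehcache.disk.store.dir" is the new expansion token; it resolves
         from e.g. java -Dehcache.disk.store.dir=/u01/myapp/diskdir.
         If this whole element is omitted, java.io.tmpdir is used and a
         warning is logged. -->
    <diskStore path="ehcache.disk.store.dir"/>

    <defaultCache maxElementsInMemory="10000"
                  eternal="false"
                  timeToIdleSeconds="120"
                  timeToLiveSeconds="120"
                  overflowToDisk="true"
                  diskPersistent="true"/>
</ehcache>
```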
Updated the remote debugger with enhanced reporting and better documentation (See Logging page in the documentation).
The new version prints a list of caches with replication configured, prints notifications as they happen, and periodically prints the cache name, size and total events received.
Summary of Bug Fixes
CachingFilter bug fixes for various rare edge conditions
Major speed-up to the shutdown process when diskPersistent is not being used
Fixed various shutdown exceptions when Hibernate and Spring both try to destroy caches