The Limits of notifyAll

notifyAll()

Java’s notifyAll() method, common to all Objects, wakes up all threads
that are waiting on the object’s monitor. A thread waits on an object’s
monitor by calling one of the wait methods. Once woken, the threads
then compete to reacquire the lock.
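
The protocol looks like this in miniature (a sketch of my own, not code from the cache): a waiter blocks until another thread changes the condition and wakes all waiters.

```java
public class GuardedFlag {
    private boolean ready = false;

    // Called by waiting threads; the monitor must be held to call wait().
    public synchronized void awaitReady() throws InterruptedException {
        while (!ready) {   // always re-check the condition in a loop
            wait();        // releases the monitor while waiting
        }
    }

    // Called by the signalling thread.
    public synchronized void setReady() {
        ready = true;
        notifyAll();       // wakes every thread waiting on this monitor
    }

    public static void main(String[] args) throws InterruptedException {
        GuardedFlag flag = new GuardedFlag();
        Thread waiter = new Thread(() -> {
            try {
                flag.awaitReady();
            } catch (InterruptedException ignored) {
            }
        });
        waiter.start();
        Thread.sleep(100); // let the waiter block first
        flag.setReady();
        waiter.join();
    }
}
```

Every woken thread must re-enter the monitor one at a time before it can re-check its condition, which is where the congestion described below comes from.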

A problem emerges

I had a blocking cache in production on a busy J2EE website for almost
a year, and it worked fine that way. Then we started getting reports
that the site was slowing down. All the performance benchmarks looked
good, so we shrugged our shoulders. Then we noticed large numbers of
threads, visibly poor performance, and very high CPU and process
wait counts. What was happening?

Analysis with thread dump

A thread dump, taken with kill -3 on the Java process, showed about
1300 threads. Of these, 800 started with:

"Thread xx" prio=1 tid=0xb0576a70 nid=0x540d waiting for monitor entry

Unfortunately, the Java thread dump format is not documented. At all. I
filed a bug about this with Sun, but they closed it; they consider
documentation of it unnecessary. Having spoken to lots of people
about this, it is clear that no one is entirely sure about all of the
things reported in a thread dump. We considered a threading problem,
but there were other explanations for what we were seeing. Having
hunted those down with no improvement, I took the thread dump
literally and became convinced that threads were being starved of
access to the object lock.

notifyAll() – Gridlock

Looking at the code we became concerned that once enough threads
started competing for the object lock, so much congestion would result
that throughput would drop off. The code in question is shown below:

{code}
/**
 * Looks up an entry. Blocks if the entry is locked.
 * Returns null if the entry is not cached and locks the entry.
 *
 * Note: If this method returns null, {@link #put} must be called
 * at some point after this method, to mark the entry as fetched.
 *
 * Warning: If a put is not done, a deadlock will occur.
 */
public synchronized Serializable get(final Serializable key) throws CacheException {
    LOG.debug("get called on Cache " + cache.getName() + " for key " + key);
    while (locks.contains(key)) {
        if (LOG.isDebugEnabled()) {
            LOG.debug("Cache " + cache.getName() + " blocking on " + key);
        }
        try {
            wait();
        } catch (InterruptedException e) {
            throw new CacheException("Exception");
        }
    }
    Element element = null;
    try {
        element = cache.get(key);
    } catch (net.sf.ehcache.CacheException e) {
        throw new CacheException("Exception");
    }
    if (element != null) {
        return element.getValue();
    } else {
        if (LOG.isDebugEnabled()) {
            LOG.debug("Cache " + cache.getName() + " does not contain " + key);
            LOG.debug("Cache " + cache.getName() + " locks " + key);
        }
        locks.add(key);
        return null;
    }
}

/**
 * Adds an entry and unlocks it.
 */
public synchronized void put(final Serializable key, final Serializable value) {
    if (value != null) {
        if (LOG.isDebugEnabled()) {
            LOG.debug("Cache " + cache.getName() + " adding " + key);
        }
        final Element element = new Element(key, value);
        cache.put(element);
    } else {
        if (LOG.isDebugEnabled()) {
            LOG.debug("Cache " + cache.getName() + " removing " + key);
        }
        cache.remove(key);
    }
    locks.remove(key);
    notifyAll();
}
{code}

Doug Lea talks about the design goals of liveness and safety.
Unfortunately, achieving both at the same time can be the trick. I
started thinking about how to measure liveness, so I added a
synchronized method which simply returned the cache name.
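
The probe can be sketched like this (class and method names here are my own, not the production code). Because the synchronized method does no work, any delay in its return is time spent queuing for the monitor:

```java
public class ProbedCache {
    private final String name;

    public ProbedCache(String name) {
        this.name = name;
    }

    // Does no work, so any delay is time spent waiting for the monitor.
    public synchronized String liveness() {
        return name;
    }

    // Returns milliseconds spent acquiring the monitor and returning.
    public static long timeLiveness(ProbedCache cache) {
        long start = System.nanoTime();
        cache.liveness();
        return (System.nanoTime() - start) / 1_000_000;
    }
}
```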

I saw that, when the system slowed down, liveness was dropping.
Ultimately, before the system ran out of threads, the liveness method
would take more than a minute to return. Some developers I have
discussed this with call it a stampede. notify() would be sufficient in
the above code; the test results for notify() show 50% better
performance than notifyAll().

A new Blocking Cache

Liveness could be improved by:

  • eliminating notifyAll()
  • moving to finer grained locking, by synchronizing on the cache’s keys
    rather than the cache

An extract from the new Blocking Cache is shown below:
{code}
/**
 * A map of cache entry locks, one per key, if present
 */
private final Map locks = new HashMap();

...

/**
 * Looks up an entry. Blocks if the entry is null.
 * Relies on the first thread putting an entry in, which releases the lock.
 * If a put is not done, the lock is never released.
 */
public Serializable get(final Serializable key) throws CacheException {
    Mutex lock = checkLockExistsForKey(key);
    try {
        lock.acquire();
        final Element element = cache.get(key);
        if (element != null) {
            lock.release(); //ok let the other threads in
            return element.getValue();
        } else {
            //don't release the lock until we write
            return null;
        }
    } catch (InterruptedException e) {
        throw new CacheException("Interrupted");
    } catch (net.sf.ehcache.CacheException e) {
        throw new CacheException("EHCache exception");
    }
}

private synchronized Mutex checkLockExistsForKey(final Serializable key) {
    Mutex lock;
    lock = (Mutex) locks.get(key);
    if (lock == null) {
        lock = new Mutex();
        locks.put(key, lock);
    }
    return lock;
}

/**
 * Adds an entry and unlocks it.
 */
public void put(final Serializable key, final Serializable value) {
    Mutex lock = checkLockExistsForKey(key);
    try {
        if (value != null) {
            final Element element = new Element(key, value);
            cache.put(element);
        } else {
            cache.remove(key);
        }
    } finally {
        //Release the lock here. This will have been acquired
        //in the get, where the element was null
        lock.release();
    }
}
{code}
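
For readers on newer JDKs, the same per-key idea can be sketched with java.util.concurrent primitives, which are the standardised descendants of Doug Lea's Mutex. This is my own sketch, not ehcache code; class and method names are invented:

```java
import java.io.Serializable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

public class KeyLockedCache {
    private final ConcurrentHashMap<Serializable, Serializable> store =
            new ConcurrentHashMap<>();
    // One lock per key: threads fetching different keys never contend.
    private final ConcurrentHashMap<Serializable, ReentrantLock> locks =
            new ConcurrentHashMap<>();

    private ReentrantLock lockFor(Serializable key) {
        return locks.computeIfAbsent(key, k -> new ReentrantLock());
    }

    // Returns the cached value, or null with the key's lock still held;
    // the caller must then call put(key, value) to release it.
    public Serializable get(Serializable key) {
        ReentrantLock lock = lockFor(key);
        lock.lock();
        Serializable value = store.get(key);
        if (value != null) {
            lock.unlock();  // hit: let other threads in
            return value;
        }
        return null;        // miss: keep the lock until put()
    }

    // Stores the value and releases the lock taken by the missing get().
    public void put(Serializable key, Serializable value) {
        try {
            if (value != null) {
                store.put(key, value);
            } else {
                store.remove(key);
            }
        } finally {
            lockFor(key).unlock();
        }
    }
}
```

Note that no wait set on a shared monitor exists at all here: a blocked thread is parked on exactly one key's lock and is handed the lock directly when the fetching thread releases it.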

Comparative Liveness

I wrote a test which creates 40 threads, each of which hammers the
cache with gets and puts while checking liveness continuously.
The old cache’s liveness went as high as 4 seconds; the new one never
exceeded 300 milliseconds.
The new cache went into production and is working very well.
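
The shape of that test can be sketched as follows. This is a simplified stand-in of my own, not the actual BlockingCacheTest: the synchronized methods stand in for cache traffic and the liveness probe, and the worst observed probe time is reported.

```java
import java.util.concurrent.atomic.AtomicLong;

public class LivenessHammer {
    // Trivial synchronized probe; any delay is monitor contention.
    static synchronized String liveness() {
        return "cache";
    }

    // Stands in for the get/put traffic hitting the same monitor.
    static synchronized void hammer() {
    }

    // Runs the given number of threads for roughly the given duration
    // and returns the worst liveness observed, in milliseconds.
    public static long run(int threads, long millis) throws InterruptedException {
        AtomicLong worstNanos = new AtomicLong();
        long deadline = System.currentTimeMillis() + millis;
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                while (System.currentTimeMillis() < deadline) {
                    hammer();
                    long t0 = System.nanoTime();
                    liveness();
                    worstNanos.accumulateAndGet(System.nanoTime() - t0, Math::max);
                }
            });
            workers[i].start();
        }
        for (Thread w : workers) {
            w.join();
        }
        return worstNanos.get() / 1_000_000;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("worst liveness: " + run(40, 200) + " ms");
    }
}
```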

Lessons Learned

I think concurrent programming is hard. The ehcache project itself was
born out of concurrency problems in JCS. When I wrote ehcache I wrote a
lot of tests which broke JCS but did not break ehcache, and thus gained
confidence. Similarly, the two blocking cache implementations are tested
to show there is a difference in liveness.
When concurrent programming code sits in a business application, it is
hard to get test focus on it. There were tests for the old blocking
cache, but none that tested all of its concurrent aspects under stress.
Beyond that, I think it is a good idea to push concurrent code out to
the public domain. As Linus’s Law has it, “given enough eyeballs, all
bugs are shallow”.
So, I have moved most of the concurrent programming out of the business
application and into http://ehcache.sourceforge.net.

The ehcache blocking constructs package

For some time I have been thinking about adding a constructs package to
ehcache. The new blocking cache is the foundation of a family of caches
added under the constructs package. The old blocking cache has also
been added to the tests to enable comparisons to be made.
You can see the caches and tests here:

  • http://cvs.sourceforge.net/viewcvs.py/ehcache/ehcache/test/java/net/sf/ehcache/constructs/blocking/BlockingCacheTest.java
  • http://cvs.sourceforge.net/viewcvs.py/ehcache/ehcache/test/java/net/sf/ehcache/constructs/blocking/NonScalableBlockingCache.java
  • http://cvs.sourceforge.net/viewcvs.py/ehcache/ehcache/src/java/net/sf/ehcache/constructs/blocking/BlockingCache.java

Update: 5 July 2004

The constructs package has now been released as part of ehcache 0.9, available at http://ehcache.sourceforge.net.