In Ehcache 1.6, HashMap was replaced with ConcurrentHashMap, with statistical sampling for eviction.
Having completed and released 1.6, I can report that there were a few surprises along the way with ConcurrentHashMap performance.
There is some existing material online about ConcurrentHashMap versus HashMap performance, notably http://www.informit.com/guides/content.aspx?g=java&seqNum=246.
This article finds that ConcurrentHashMap puts are slower than HashMap puts when the map gets large: for a map of 1 million objects, fully populating it took three times longer than with HashMap in a single-threaded scenario. However, once you get to multi-threaded scenarios, you need to put synchronization around HashMap. For those few of you in doubt about this, email me your HashMap usage and I will send you back a multi-threaded test that turns your computer into a fan heater (i.e. an infinite loop at 100% CPU) in about 30 seconds. The cost of synchronization grows as you add concurrency. Put and get work well with ConcurrentHashMap in multi-threaded scenarios.
Iteration in ConcurrentHashMap is a lot slower than it is in HashMap, and gets worse as the map size gets larger. See CacheTest#testConcurrentReadWriteRemoveLFU. For my testing scenarios, which use 57 threads doing a majority of gets plus other operations against a variety of map sizes, we get:
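As a rough illustration of the two safe options for a shared map, here is a minimal sketch. The class and method names (SharedMapDemo, fillConcurrently) are made up for this example and are not part of Ehcache. It fills a synchronized HashMap and a ConcurrentHashMap from several threads and checks that no puts were lost. It deliberately does not run the unsynchronized HashMap case.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class SharedMapDemo {

    /** Fills the map from several threads, each writing its own key range, and returns the final size. */
    public static int fillConcurrently(Map<Integer, Integer> map, int threads, int putsPerThread) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            final int base = t * putsPerThread; // distinct keys per thread
            pool.execute(() -> {
                for (int i = 0; i < putsPerThread; i++) {
                    map.put(base + i, i);
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("writers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return map.size();
    }

    public static void main(String[] args) {
        // Option 1: a plain HashMap where every call takes the same lock.
        Map<Integer, Integer> synced = Collections.synchronizedMap(new HashMap<Integer, Integer>());
        // Option 2: ConcurrentHashMap, which does not need external locking for put/get.
        Map<Integer, Integer> concurrent = new ConcurrentHashMap<Integer, Integer>();

        System.out.println("synchronized HashMap size: " + fillConcurrently(synced, 8, 10_000));
        System.out.println("ConcurrentHashMap size:    " + fillConcurrently(concurrent, 8, 10_000));
    }
}
```

Both maps should end up with every key present. Passing a bare HashMap to the same method is exactly the usage that can lose entries or spin forever.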
* With iterator:
  * 1.6 with 100,000 store size: puts take 45ms. keySet 7ms
  * 1.6 with 1,000,000 store size: puts take 381ms. keySet 7ms
* 1,000,000 - using FastRandom (j.u.Random was dog slow)
  * INFO: Average Get Time for 2065131 observations: 0.013553619 ms
  * INFO: Average Put Time for 46404 obervations: 0.1605034 ms
  * INFO: Average Remove Time for 20515 obervations: 0.1515964 ms
  * INFO: Average Remove All Time for 0 observations: NaN ms
  * INFO: Average keySet Time for 198 observations: 0.0 ms
* 9999 - using iterator
  * INFO: Average Get Time for 4305030 observations: 0.006000423 ms
  * INFO: Average Put Time for 3216 obervations: 0.92008704 ms
  * INFO: Average Remove Time for 5294 obervations: 0.048545524 ms
  * INFO: Average Remove All Time for 0 observations: NaN ms
  * INFO: Average keySet Time for 147342 observations: 0.5606073 ms
* 10001 - using FastRandom
  * INFO: Average Get Time for 4815249 observations: 0.005541354 ms
  * INFO: Average Put Time for 5186 obervations: 0.49826455 ms
  * INFO: Average Remove Time for 129163 obervations: 0.015120429 ms
  * INFO: Average Remove All Time for 0 observations: NaN ms
  * INFO: Average keySet Time for 177342 observations: 0.500733 ms
* 4999 - using iterator
  * INFO: Average Get Time for 4317409 observations: 0.0061599445 ms
  * INFO: Average Put Time for 2708 obervations: 1.0768094 ms
  * INFO: Average Remove Time for 17664 obervations: 0.11713089 ms
  * INFO: Average Remove All Time for 0 observations: NaN ms
  * INFO: Average keySet Time for 321180 observations: 0.26723954 ms
* 5001 - using FastRandom
  * INFO: Average Get Time for 3203904 observations: 0.0053447294 ms
  * INFO: Average Put Time for 152905 obervations: 0.056616854 ms
  * INFO: Average Remove Time for 737289 obervations: 0.008854059 ms
  * INFO: Average Remove All Time for 0 observations: NaN ms
  * INFO: Average keySet Time for 272898 observations: 0.3118601 ms
In summary, with 1 million objects in the map, a put to Ehcache using iteration for eviction takes 381ms!
As a result I have used an alternative to iteration using an algorithm called FastRandom. The result is 0.16 ms, roughly 2,300 times faster! For very small maps, ConcurrentHashMap iteration is quite quick. Based on experimental testing, in Ehcache 1.6 we use iteration up to 5000 entries and FastRandom for sizes above that. Though not as bad as iteration, I have noted size() as slow in ConcurrentHashMap compared to HashMap. In Ehcache 1.6 we limit the usage of size().
If you are using ConcurrentHashMap and using more than get/put, test the performance. It may be far, far worse than you were expecting.
To give ConcurrentHashMap the best chance of optimisation, remember to set the size and expected concurrency when you create it. In Ehcache we set the size to the exact size configured for the cache, and we set concurrency to 100 threads.
I have a very simple test application up on Google App Engine. See gregrluckapphelloworld.appspot.com.
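To make the sampling idea concrete, here is a minimal sketch of choosing an eviction candidate from a small random sample rather than by iterating the whole map. It is not Ehcache's code. The class name SampledEvictionSketch, the xorshift generator (a stand-in for FastRandom, whose internals are not described here), the parallel array of key slots, the sample size of 30 and the least-hits policy are all assumptions for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class SampledEvictionSketch {

    /** Tiny xorshift generator: fast, not thread-safe, low quality. A stand-in, not Ehcache's FastRandom. */
    public static final class XorShift {
        private long state;

        public XorShift(long seed) {
            state = (seed == 0) ? 0x9E3779B97F4A7C15L : seed; // state must never be zero
        }

        public int nextInt(int bound) {
            state ^= state << 13;
            state ^= state >>> 7;
            state ^= state << 17;
            return (int) ((state >>> 1) % bound); // drop the sign bit, then reduce to [0, bound)
        }
    }

    /**
     * Looks at sampleSize randomly chosen key slots and returns the sampled key
     * with the fewest hits, or null if no sampled slot held a live key.
     * The cost depends on sampleSize, not on the number of entries in the map.
     */
    public static String pickVictim(String[] keySlots, Map<String, Integer> hits,
                                    int sampleSize, XorShift rnd) {
        String victim = null;
        int fewest = Integer.MAX_VALUE;
        for (int i = 0; i < sampleSize; i++) {
            String key = keySlots[rnd.nextInt(keySlots.length)];
            Integer count = (key == null) ? null : hits.get(key);
            if (count != null && count < fewest) {
                fewest = count;
                victim = key;
            }
        }
        return victim;
    }

    public static void main(String[] args) {
        int size = 100_000;
        String[] slots = new String[size];                  // hypothetical side array of keys
        Map<String, Integer> hits = new ConcurrentHashMap<>(size);
        for (int i = 0; i < size; i++) {
            slots[i] = "key" + i;
            hits.put(slots[i], i % 50);                     // fake hit counts
        }
        String victim = pickVictim(slots, hits, 30, new XorShift(42));
        System.out.println("evict " + victim + " (hits=" + hits.get(victim) + ")");
    }
}
```

The trade-off is that the victim is only the best of the sample, not the best in the whole map, which is what makes the eviction statistical rather than exact.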
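A naive way to run such a test is sketched below. It is a plain timing loop, not a rigorous benchmark, and the class name MapOperationTiming, the map size and the repetition counts are arbitrary choices for this example. It times get, size() and a full keySet iteration on one populated ConcurrentHashMap so the relative costs can be compared on your own hardware.

```java
import java.util.concurrent.ConcurrentHashMap;

public class MapOperationTiming {

    /** Builds a populated map, presized via the (capacity, load factor, concurrency hint) constructor. */
    public static ConcurrentHashMap<Integer, Integer> buildMap(int entries, int expectedThreads) {
        ConcurrentHashMap<Integer, Integer> map =
                new ConcurrentHashMap<>(entries, 0.75f, expectedThreads);
        for (int i = 0; i < entries; i++) {
            map.put(i, i);
        }
        return map;
    }

    /** Runs the operation the given number of times and returns the average nanoseconds per run. */
    public static double averageNanos(Runnable op, int repetitions) {
        long start = System.nanoTime();
        for (int i = 0; i < repetitions; i++) {
            op.run();
        }
        return (System.nanoTime() - start) / (double) repetitions;
    }

    public static void main(String[] args) {
        ConcurrentHashMap<Integer, Integer> map = buildMap(200_000, 100);
        long[] sink = new long[1]; // results are accumulated so the work is not optimised away

        System.out.printf("get:     %.1f ns%n",
                averageNanos(() -> sink[0] += map.get(12345), 100_000));
        System.out.printf("size:    %.1f ns%n",
                averageNanos(() -> sink[0] += map.size(), 100_000));
        System.out.printf("iterate: %.1f ns%n",
                averageNanos(() -> {
                    for (Integer key : map.keySet()) {
                        sink[0] += key;
                    }
                }, 10));
        System.out.println("(ignore) " + sink[0]);
    }
}
```

The numbers will vary with JVM version and warm-up, so treat them as a relative comparison only, and repeat the run at the map sizes you actually expect.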
Go to gregrluckapphelloworld.appspot.com. Each time you hit it, exactly 10MB gets added to an Ehcache in-process cache. This is an intentional memory leak designed to find out how much you can stick in the heap.
The answer is around 80MB. I suspect, taking Jetty into account, that there is an -Xmx100m setting in play.
When you get an OutOfMemory error the site is cooked. There should be some monitoring that notices and takes it down. That is not the case.
I have a wget script which, every 30 seconds, does
while true; do wget "http://gregrluckapphelloworld.appspot.com/"; sleep 30; done;
The answer is that the dead site stays down for 5 minutes (10 repetitions of my script). And no new instance gets fired up. Your whole site is down.
Update: Google fixed this as of February 2010.
On the page I put an image. I did not configure it as static. I downloaded it and got the IP 74.125.19.141 which is in Mountain View, California.
I then marked the images as static in appengine-web.xml and redeployed.
There was no effect on the serving location or speed of download.
It may be that the files are served from the static content location.
You would expect this to be distributed via Google’s CDN.
Here is the header you get from the static content servers.
HTTP/1.0 200 OK
Date: Tue, 16 Jun 2009 09:47:01 GMT
Expires: Tue, 16 Jun 2009 09:57:01 GMT
Cache-Control: public, max-age=600
Content-Type: image/jpeg
Server: Google Frontend
Content-Length: 237952
Connection: Keep-Alive
Another interesting thing – cache expiry is set to 10 minutes. A CDN will normally set the TTL longer and rely on a technique such as resource renaming to overcome browser cache issues.
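For readers unfamiliar with resource renaming, here is a minimal sketch of one common form of it: putting a hash of the file content into the file name, so a changed file gets a new URL and a long TTL becomes safe. The class name FingerprintedName, the choice of SHA-256 and the 8-character prefix are assumptions for this example, not something Google or Ehcache prescribes.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class FingerprintedName {

    /**
     * Returns the file name with a short content hash inserted before the extension,
     * e.g. "logo.jpg" becomes "logo.<8 hex chars>.jpg".
     */
    public static String fingerprint(String fileName, byte[] content) {
        MessageDigest digest;
        try {
            digest = MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // SHA-256 is a required algorithm on the Java platform
        }
        byte[] hash = digest.digest(content);

        StringBuilder hex = new StringBuilder();
        for (int i = 0; i < 4; i++) {                 // first 4 bytes give 8 hex characters
            hex.append(String.format("%02x", hash[i] & 0xff));
        }

        int dot = fileName.lastIndexOf('.');
        if (dot < 0) {
            return fileName + "." + hex;              // no extension: append the hash
        }
        return fileName.substring(0, dot) + "." + hex + fileName.substring(dot);
    }

    public static void main(String[] args) {
        byte[] imageBytes = "pretend this is image data".getBytes();
        System.out.println(fingerprint("logo.jpg", imageBytes));
    }
}
```

Pages then reference the fingerprinted name, and the old name simply stops being requested once the content changes.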
None of this is good. The first is a very serious limitation. The last two are killers for running a production app. Hopefully Google will fix these things.
The forthcoming Ehcache 1.6.0 is compatible and works with Google App Engine. You can get it now from ehcache snapshots.
Google App Engine provides a constrained runtime which restricts networking, threading and file system access. All features of Ehcache can be used except for the DiskStore and replication. Having said that, there are workarounds for these limitations.
Ehcache cache operations take a few µs, versus around 60ms for Google’s provided client-server cache memcacheg (as reported on cloudstatus.com). Because it uses far fewer resources, it is also cheaper.
You can also store non-Serializable objects in it. And finally there is the rich Ehcache API that you can leverage.
The idea here is that your caches are set up in a cache hierarchy. Ehcache sits in front and memcacheg behind. Combining the two lets you elegantly work around limitations imposed by Google App Engine. You get the benefits of the µs speed of Ehcache together with the unlimited size of memcacheg.
Ehcache contains the hooks to easily do this.
To update memcacheg, use a CacheEventListener.
To search against memcacheg on a local cache miss, use cache.getWithLoader() together with a CacheLoader for memcacheg.
In the CacheEventListener, ensure that when notifyElementEvicted() is called, which it will be when a put exceeds the MemoryStore’s capacity, the key and value are put into memcacheg.
Configure all notifications in CacheEventListener to proxy through to memcacheg.
Any work done by one node can then be shared by all others, with the benefit of local caching of frequently used data.
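The shape of this hierarchy can be sketched with plain Java collections standing in for both tiers. This is not Ehcache or App Engine API code: the class TwoTierCacheSketch is hypothetical, a bounded LinkedHashMap plays the part of the Ehcache MemoryStore, and an ordinary Map plays the part of memcacheg. The comments mark which Ehcache hook each step corresponds to.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class TwoTierCacheSketch {

    private final Map<String, String> backing; // stand-in for memcacheg (large, shared, slower)
    private final Map<String, String> local;   // stand-in for the local MemoryStore (small, fast)

    public TwoTierCacheSketch(final int localCapacity, final Map<String, String> backingTier) {
        this.backing = backingTier;
        // Access-ordered LinkedHashMap that drops its eldest entry once over capacity.
        this.local = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                if (size() > localCapacity) {
                    // Same role as notifyElementEvicted(): keep the evicted entry in the big tier.
                    backingTier.put(eldest.getKey(), eldest.getValue());
                    return true;
                }
                return false;
            }
        };
    }

    public synchronized void put(String key, String value) {
        local.put(key, value);
        backing.put(key, value); // same role as proxying put notifications through
    }

    public synchronized String get(String key) {
        String value = local.get(key);
        if (value == null) {
            value = backing.get(key); // same role as a CacheLoader on a local miss
            if (value != null) {
                local.put(key, value);
            }
        }
        return value;
    }

    public synchronized void remove(String key) {
        local.remove(key);
        backing.remove(key); // same role as proxying remove notifications through
    }

    public synchronized boolean isLocal(String key) {
        return local.containsKey(key);
    }

    public static void main(String[] args) {
        TwoTierCacheSketch cache = new TwoTierCacheSketch(2, new ConcurrentHashMap<String, String>());
        cache.put("a", "1");
        cache.put("b", "2");
        cache.put("c", "3");                       // local tier is over capacity, "a" is evicted locally
        System.out.println(cache.isLocal("a"));    // no longer held locally
        System.out.println(cache.get("a"));        // still found, via the backing tier
    }
}
```

In a real deployment the backing map would be replaced by calls to the memcache service, and the local tier by an Ehcache cache with the listener and loader registered on it.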
Google App Engine provides acceleration for files declared static in appengine-web.xml.
e.g.
You can get acceleration for dynamic files using Ehcache’s caching filters as you usually would.
To get started see the Ehcache with Google App Engine HowTo.
Anyone with a project on SourceForge who uses Maven knows how poorly it supports Maven repositories.
I have been waiting for enough people to move to Java 5 to mandate it as a minimum standard for ehcache. At JavaOne 2008 I found out that a lot of people were still to make the move. Now that we are in 2009 I have decided to move to Java 5. As part of this I have done a general cleanup of the core. I can now retire backport-concurrent which has served the project well (thanks guys) and other dependencies. Ehcache-1.6 core has no dependencies.
| Operation | Number of Times Faster Than Ehcache-1.5.0 |
|---|---|
| get | 92.5 times faster |
| put | 30 times faster |
| remove | 48 times faster |
| removeAll | 80 times faster |
| keySet | 30 times faster |
Users of ehcache server have been discussing extending the basic CRUD operations of REST with some more advanced methods, such as deleting all elements in a cache with one DELETE operation.
You are most welcome to join what has become an informative forum thread here: https://sourceforge.net/forum/forum.php?thread_id=2546225&forum_id=322278
So far we have posts from myself, Jim Webber, Brett Dargan and others interested in creating or finding a REST convention for referring to all and specifying means of multi-get, multi-put and multi-delete.
In April Dave Whitla created a project for a Maven Glassfish Plugin.
Kohsuke Kawaguchi joined the project and copied his code in and released it. His focus was V3 Embedded. It supported one goal: run. There was disagreement as to the features and the code to use. Dave’s plugin was to support a wide range of goals supporting integration of V2 and above into the build process. Now to use the convenience name you normally add a pluginGroup:
mvn glassfish:run
[INFO] Scanning for projects...
[INFO] Searching repository for plugin with prefix: 'glassfish'.
[INFO] ------------------------------------------------------------------------
[ERROR] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Required goal not found: glassfish:run in org.glassfish.maven.plugin:maven-glassfish-plugin:2.1
[INFO] ------------------------------------------------------------------------
Until or unless Dave’s plugin adds a run goal, you can work around it by avoiding Maven’s convenience naming conventions and fully qualifying Kohsuke’s.
mvn org.glassfish:maven-glassfish-plugin:run
It would be nice for one of Kohsuke, Dave or Byron to sort this out.
My suggestion is for Kohsuke to rename his to maven-glassfish-embedded-plugin.
I gave a talk today at the Glassfish V3 Prelude Launch Event. Ehcache Server uses Glassfish for its self-contained cache server. You can watch the video of the session here.
Rick Bryant sent me some sample code he wrote which shows how to use the RESTful Cache Server from Java. Thanks Rick. To use the sample just fire up the cache server: startup.sh and then run the following Java code.
package samples;

import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

/**
 * A simple example Java client which uses the built-in java.net.URLConnection.
 *
 * @author BryantR
 * @author Greg Luck
 */
public class ExampleJavaClient {

    private static String TABLE_COLUMN_BASE =
            "http://localhost:8080/ehcache/rest/tableColumn";
    private static String TABLE_COLUMN_ELEMENT =
            "http://localhost:8080/ehcache/rest/tableColumn/1";

    /**
     * Creates a new instance of EHCacheREST
     */
    public ExampleJavaClient() {
    }

    public static void main(String[] args) {
        URL url;
        HttpURLConnection connection = null;
        InputStream is = null;
        OutputStream os = null;
        int result = 0;
        try {
            //create cache
            URL u = new URL(TABLE_COLUMN_BASE);
            HttpURLConnection urlConnection = (HttpURLConnection) u.openConnection();
            urlConnection.setRequestMethod("PUT");
            int status = urlConnection.getResponseCode();
            System.out.println("Status: " + status);
            urlConnection.disconnect();

            //get cache
            url = new URL(TABLE_COLUMN_BASE);
            connection = (HttpURLConnection) url.openConnection();
            connection.setRequestMethod("GET");
            connection.connect();
            is = connection.getInputStream();
            byte[] response1 = new byte[4096];
            result = is.read(response1);
            while (result != -1) {
                System.out.write(response1, 0, result);
                result = is.read(response1);
            }
            if (is != null) try {
                is.close();
            } catch (Exception ignore) {
            }
            System.out.println("reading cache: " + connection.getResponseCode()
                    + " " + connection.getResponseMessage());
            if (connection != null) connection.disconnect();

            //create entry
            url = new URL(TABLE_COLUMN_ELEMENT);
            connection = (HttpURLConnection) url.openConnection();
            connection.setRequestProperty("Content-Type", "text/plain");
            connection.setDoOutput(true);
            connection.setRequestMethod("PUT");
            connection.connect();
            String sampleData = "ehcache is way cool!!!";
            byte[] sampleBytes = sampleData.getBytes();
            os = connection.getOutputStream();
            os.write(sampleBytes, 0, sampleBytes.length);
            os.flush();
            System.out.println("result=" + result);
            System.out.println("creating entry: " + connection.getResponseCode()
                    + " " + connection.getResponseMessage());
            if (connection != null) connection.disconnect();

            //get entry
            url = new URL(TABLE_COLUMN_ELEMENT);
            connection = (HttpURLConnection) url.openConnection();
            connection.setRequestMethod("GET");
            connection.connect();
            is = connection.getInputStream();
            byte[] response2 = new byte[4096];
            result = is.read(response2);
            while (result != -1) {
                System.out.write(response2, 0, result);
                result = is.read(response2);
            }
            if (is != null) try {
                is.close();
            } catch (Exception ignore) {
            }
            System.out.println("reading entry: " + connection.getResponseCode()
                    + " " + connection.getResponseMessage());
            if (connection != null) connection.disconnect();
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            if (os != null) try {
                os.close();
            } catch (Exception ignore) {
            }
            if (is != null) try {
                is.close();
            } catch (Exception ignore) {
            }
            if (connection != null) connection.disconnect();
        }
    }
}