How We Solved Our Garbage Collection Pause Problem

Our main J2EE app at work was suffering 9-second pauses, on average once every 50 seconds, with peaks of up to 15 seconds. Needless to say, this was a huge performance problem: during a pause nothing, absolutely nothing, gets done in your app, and 9 seconds is a long time. These long pauses are caused by major garbage collections. Minor (young generation) collections also stop the application, but only briefly.
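One way to tell minor from major collections in a running JVM, besides reading verbose GC logs, is the standard java.lang.management API. It exposes one GarbageCollectorMXBean per collector, each with a collection count and an approximate accumulated elapsed time. The sketch below is an illustration, not what we used; the class name GcMonitor is hypothetical. Collector names depend on the GC in use. Under CMS they are typically "ParNew" (young) and "ConcurrentMarkSweep" (old). Under G1 they are "G1 Young Generation" and "G1 Old Generation". Note that getCollectionTime() reports elapsed collection time. For a concurrent collector such as CMS, that is not the same as the time the application was stopped.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: reports, per collector, how many collections have run
// and their approximate accumulated elapsed time since the JVM started.
class GcMonitor {

    // Maps collector name -> {collection count, accumulated time in ms}.
    // Either value is -1 if the JVM does not report it.
    static Map<String, long[]> snapshot() {
        Map<String, long[]> stats = new LinkedHashMap<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            stats.put(gc.getName(),
                      new long[] {gc.getCollectionCount(), gc.getCollectionTime()});
        }
        return stats;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, long[]> e : snapshot().entrySet()) {
            long count = e.getValue()[0];
            long timeMs = e.getValue()[1];
            // Average elapsed time per collection, only when both values are reported.
            String avg = (count > 0 && timeMs >= 0)
                    ? String.format("%.1f ms", (double) timeMs / count)
                    : "n/a";
            System.out.printf("%-25s count=%d total=%d ms avg=%s%n",
                              e.getKey(), count, timeMs, avg);
        }
    }
}
```

Taking two snapshots a few minutes apart and differencing the counts gives a rate for each collector, such as "one old-generation collection every 50 seconds".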
We tried quite a few garbage collection settings. Each behaved differently, but none was clearly better. In the end we consulted some engineers at Sun who, after analysing our verbose GC logs, gave us the following piece of black magic:
java … -XX:+DisableExplicitGC -XX:+UseConcMarkSweepGC -XX:NewSize=1200m -XX:SurvivorRatio=16
The reasoning for each setting is as follows:
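-XX:+DisableExplicitGC – some libraries call System.gc(), which in HotSpot by default forces a full, stop-the-world collection. This flag turns those calls into no-ops. Calling System.gc() is usually a bad idea and could explain some of what we saw.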
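-XX:+UseConcMarkSweepGC – use the low-pause Concurrent Mark Sweep (CMS) collector, which does most of its old-generation work concurrently with the application threads instead of stopping them.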
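-XX:NewSize=1200m -XX:SurvivorRatio=16 – the black magic part. NewSize sets the initial size of the young generation, and SurvivorRatio=16 makes eden 16 times the size of each of the two survivor spaces. Tuning these requires empirical observation of your GC logs, either from verbose GC output or from jstat (a JDK 1.5 tool). In particular, the 1200 MB new size is one quarter of our 4800 MB heap.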
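Working through the SurvivorRatio arithmetic shows where the 1200 MB young generation actually goes. In HotSpot, SurvivorRatio is the ratio of eden to one survivor space. The young generation holds eden plus two survivor spaces, so each survivor space is NewSize / (SurvivorRatio + 2). The following is a minimal sketch of that calculation for our settings. The class and method names are illustrative. A real JVM also rounds sizes to alignment boundaries, so actual byte counts differ slightly.

```java
// Illustrative only: splits a young generation of newSizeMb into eden and two
// survivor spaces using HotSpot's definition of SurvivorRatio (eden : one survivor).
// Ignores the alignment rounding a real JVM applies.
class YoungGenLayout {

    // Size of ONE survivor space: young = eden + 2 * survivor, eden = ratio * survivor.
    static double survivorMb(double newSizeMb, int survivorRatio) {
        return newSizeMb / (survivorRatio + 2);
    }

    // Eden is survivorRatio times one survivor space.
    static double edenMb(double newSizeMb, int survivorRatio) {
        return survivorMb(newSizeMb, survivorRatio) * survivorRatio;
    }

    public static void main(String[] args) {
        double newSizeMb = 1200;  // -XX:NewSize=1200m
        double heapMb = 4800;     // total heap from the text
        int ratio = 16;           // -XX:SurvivorRatio=16
        System.out.printf("eden       = %.1f MB%n", edenMb(newSizeMb, ratio));
        System.out.printf("survivor   = %.1f MB (x2)%n", survivorMb(newSizeMb, ratio));
        System.out.printf("young/heap = %.2f%n", newSizeMb / heapMb);
    }
}
```

Running main prints an eden of about 1066.7 MB, two survivor spaces of about 66.7 MB each, and a young-to-heap ratio of 0.25.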
What was the result? Major GCs, and the pauses that come with them, dropped from one every 50 seconds to 2 per day. Mean response times fell from seconds to milliseconds. All in all, one of the best results I have achieved this year. Thanks, Sun.
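For scale, here is a rough calculation of the situation before tuning. It assumes every major GC took the 9-second average and that 50 seconds is the mean time between the starts of successive major GCs:

```latex
\[
\frac{86\,400\ \text{s/day}}{50\ \text{s}} = 1728\ \text{major GCs/day},
\qquad
1728 \times 9\ \text{s} = 15\,552\ \text{s} \approx 4.3\ \text{h paused per day},
\qquad
\frac{9\ \text{s}}{50\ \text{s}} = 18\%\ \text{of wall-clock time}
\]
```

Afterwards there were 2 major GCs per day instead of roughly 1728.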