Performance problems in ConcurrentHashMap vs synchronized HashMap

In Ehcache 1.6, HashMap was replaced with ConcurrentHashMap, with statistical sampling used for eviction.
Having completed and released 1.6, there were a few surprises along the way with ConcurrentHashMap performance.

PUT and GET performance

There is some existing material online about ConcurrentHashMap versus HashMap performance, notably http://www.informit.com/guides/content.aspx?g=java&seqNum=246.
This article finds that ConcurrentHashMap puts are slower than HashMap puts when the map gets large: fully populating a map of 1 million objects took three times longer than HashMap in a single-threaded scenario. However, once you get to multi-threaded scenarios, you need to put synchronization around HashMap. For those few of you in doubt about this, email me your HashMap usage and I will send you back a multi-threaded test that turns your computer into a fan heater (i.e. a 100% CPU infinite loop) in about 30 seconds. The cost of synchronization grows as you add concurrency. Put and get work well with ConcurrentHashMap in multi-threaded scenarios.
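To make the choice concrete, here is a minimal sketch of the two thread-safe options from multiple writer threads. The class name and thread counts are illustrative, not from Ehcache; a bare HashMap in the same test is what produces the fan-heater behaviour.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class SafeMaps {
    public static void main(String[] args) throws InterruptedException {
        // Option 1: coarse-grained lock around every operation.
        Map<Integer, String> syncMap = Collections.synchronizedMap(new HashMap<>());
        // Option 2: lock striping; gets rarely block.
        Map<Integer, String> chm = new ConcurrentHashMap<>();

        Runnable writer = () -> {
            for (int i = 0; i < 10_000; i++) {
                syncMap.put(i, "v" + i);
                chm.put(i, "v" + i);
            }
        };
        Thread t1 = new Thread(writer);
        Thread t2 = new Thread(writer);
        t1.start(); t2.start();
        t1.join(); t2.join();

        // Both threads wrote the same 10,000 keys, so both maps hold 10,000 entries.
        System.out.println(syncMap.size() + " " + chm.size());  // 10000 10000
    }
}
```

Under synchronizedMap, writers queue on a single lock, so the cost grows with thread count; ConcurrentHashMap stripes its locks across segments so writers mostly proceed in parallel.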

Iterate Performance

Iteration in ConcurrentHashMap is a lot slower than in HashMap, and it gets worse as the map size grows. See CacheTest#testConcurrentReadWriteRemoveLFU. For my testing scenario, which uses 57 threads doing a majority of gets but also other operations across a variety of map sizes, we get:

* With iterator:
* 1.6 with 100,000 store size: puts take 45ms. keySet 7ms
* 1.6 with 1,000,000 store size: puts take 381ms. keySet 7ms
* 1,000,000 - using FastRandom (j.u.Random was dog slow)
* INFO: Average Get Time for 2065131 observations: 0.013553619 ms
* INFO: Average Put Time for 46404 observations: 0.1605034 ms
* INFO: Average Remove Time for 20515 observations: 0.1515964 ms
* INFO: Average Remove All Time for 0 observations: NaN ms
* INFO: Average keySet Time for 198 observations: 0.0 ms
* 9999 - using iterator
* INFO: Average Get Time for 4305030 observations: 0.006000423 ms
* INFO: Average Put Time for 3216 observations: 0.92008704 ms
* INFO: Average Remove Time for 5294 observations: 0.048545524 ms
* INFO: Average Remove All Time for 0 observations: NaN ms
* INFO: Average keySet Time for 147342 observations: 0.5606073 ms
* 10001 - using FastRandom
* INFO: Average Get Time for 4815249 observations: 0.005541354 ms
* INFO: Average Put Time for 5186 observations: 0.49826455 ms
* INFO: Average Remove Time for 129163 observations: 0.015120429 ms
* INFO: Average Remove All Time for 0 observations: NaN ms
* INFO: Average keySet Time for 177342 observations: 0.500733 ms
* 4999 - using iterator
* INFO: Average Get Time for 4317409 observations: 0.0061599445 ms
* INFO: Average Put Time for 2708 observations: 1.0768094 ms
* INFO: Average Remove Time for 17664 observations: 0.11713089 ms
* INFO: Average Remove All Time for 0 observations: NaN ms
* INFO: Average keySet Time for 321180 observations: 0.26723954 ms
* 5001 - using FastRandom
* INFO: Average Get Time for 3203904 observations: 0.0053447294 ms
* INFO: Average Put Time for 152905 observations: 0.056616854 ms
* INFO: Average Remove Time for 737289 observations: 0.008854059 ms
* INFO: Average Remove All Time for 0 observations: NaN ms
* INFO: Average keySet Time for 272898 observations: 0.3118601 ms

In summary, with 1 million objects in the map, a put to Ehcache using iteration for eviction takes 381 ms!

As a result I have used an alternative to iteration: an algorithm called FastRandom. The result is 0.16 ms, 2,300 times faster!

For very small maps, ConcurrentHashMap iteration is quite quick. Based on experimental testing, Ehcache 1.6 uses iteration for maps up to 5,000 entries and FastRandom for sizes above that.

size() performance

Though not as bad as iteration, I have noted that size() is slow in ConcurrentHashMap compared to HashMap. In Ehcache 1.6 we limit the usage of size().
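One way to sidestep slow size() calls is to track the count yourself next to the map, updating it only when a put or remove actually changes membership. This is a sketch of the idea only, not Ehcache's implementation, and the class and field names are mine.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class CountedMap<K, V> {
    private final ConcurrentHashMap<K, V> map = new ConcurrentHashMap<>();
    private final AtomicInteger count = new AtomicInteger();

    public V put(K key, V value) {
        V previous = map.put(key, value);
        if (previous == null) {
            count.incrementAndGet();  // only new keys grow the count
        }
        return previous;
    }

    public V remove(K key) {
        V removed = map.remove(key);
        if (removed != null) {
            count.decrementAndGet();
        }
        return removed;
    }

    public V get(K key) {
        return map.get(key);
    }

    // O(1): reads the counter instead of traversing the map's segments.
    public int size() {
        return count.get();
    }
}
```

Note the caveat: the map update and the counter update are not one atomic step, so under heavy concurrency the counter can be momentarily off by a few. For a cache, where size() drives eviction decisions rather than correctness, an approximate count is usually acceptable.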

Recommendations

If you are using ConcurrentHashMap for more than get/put, test the performance. It may be far, far worse than you were expecting.

To give ConcurrentHashMap the best chance of optimisation, remember to set the initial size and expected concurrency when you create it. In Ehcache we set the size to the exact size configured for the cache, and we set the concurrency level to 100 threads.
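For reference, the three-argument constructor takes the initial capacity, the load factor, and the concurrency level (the number of internal segments in the JDK 5/6 implementation). The figures below are illustrative, not Ehcache's exact configuration.

```java
import java.util.concurrent.ConcurrentHashMap;

public class PreSizedMap {
    public static void main(String[] args) {
        int maxEntries = 10_000;        // e.g. the configured cache size
        float loadFactor = 0.75f;       // the JDK default
        int concurrencyLevel = 100;     // expected concurrent writer threads

        // Pre-sizing avoids rehashing as the cache fills; the concurrency
        // level fixes the segment count for the life of the map.
        ConcurrentHashMap<String, Object> store =
                new ConcurrentHashMap<>(maxEntries, loadFactor, concurrencyLevel);

        store.put("key", "value");
        System.out.println(store.get("key"));  // value
    }
}
```

Note that a higher concurrency level means more segments, which is exactly why size() and iteration get slower: both have to visit every segment.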


By Greg Luck

As Terracotta’s CTO, Greg (@gregrluck) is entrusted with understanding market and technology forces and the business drivers that impact Terracotta’s product innovation and customer success. He helps shape company and technology strategy and designs many of the features in Terracotta’s products. Greg came to Terracotta on the acquisition of the popular caching project Ehcache which he founded in 2003. Prior to joining Terracotta, Greg served as Chief Architect at Australian online travel giant Wotif.com. He also served as a lead consultant for ThoughtWorks on accounts in the United States and Australia, was CIO at Virgin Blue, Tempo Services, Stamford Hotels and Resorts and Australian Resorts and spent seven years as a Chartered Accountant in KPMG’s small business and insolvency divisions. He is a regular speaker at conferences and contributor of articles to the technical press.

8 comments

  1. Hi Greg,
    How did you arrive to the number 100 for concurrency level? It seems to me that it’s a bit high for the common case.
    It would also be interesting to see how NonBlockingHashMap compares.
    Best,
    Ismael

  2. It is not uncommon for Ehcache caches to be hit by hundreds or in some cases thousands of concurrent threads. Having said that, access times for put and get for a range of thread levels are below 10 microseconds. I have tested up to 2000 concurrent threads with good results.

  3. Hi Greg,
    Maybe there’s a terminology issue here. It’s quite possible for people to use hundreds or thousands of threads. However, the concurrency level is about the number of concurrent threads and that is limited by the number of logical cores in your system.
    Systems that provide hundreds or thousands of logical cores are quite rare these days (Azul aside).
    Best,
    Ismael

  4. I forgot to mention, the reason why I bring this up is that the number of segments is affected by the concurrency level and this affects the performance of things like size and iteration.
    Ismael

  5. Hi there,
    I am working on an open-source project that would benefit from having the fastest map/list/set lookup times. I am basing it on at least JDK 1.5, although with 1.7 due out soon, 1.6 has been out long enough that we may require 1.6. I am curious if your library is a) allowed to be distributed with another project so that it can be built and run without requiring another download, and b) does it allow for faster lookups, sorts, even puts than the current JDK map implementation (and list/set for that matter)?
    Thanks.

  6. Sure.
    By number of threads I am referring to the number the program is using. With synchronized, threads queue up to get in. The time taken to get in increases as the number of threads go up.
    The number of threads that can be executed _concurrently_ depends on the number of cores in the system. Our production experience is with 8 cores.

  7. Kevin
    Ehcache has the Apache 2 license. Version 1.6 has NO dependencies, just JDK 1.5 and higher.
    Ehcache will be the fastest implementation you can find for an in-process cache. Caches add things like eviction, expiry, listeners etc. If you really just need concurrent maps/lists/sets I would recommend java.util.concurrent. The Backport Concurrent library has quite a lot more stuff so it is also worth a look.

  8. Greg,
    Yes, but ConcurrentHashMap is mostly lock-free on gets (it just locks if it finds an Entry with value null). Locks are used for puts though. And if you can at most have 8/16 concurrent threads, then 100 segments seems high. To quote from the documentation:
    “Ideally, you should choose a value to accommodate as many threads as will ever concurrently modify the table. Using a significantly higher value than you need can waste space and time.”
    Note that it says “concurrently modify”. If you only have 8/16 cores, it’s unlikely you will have 100 threads concurrently modifying the map. Anyway, the documentation also says that underestimates or overestimates within an order of magnitude do not usually have a noticeable impact.
    I will stop here as I think I have made my point by now. 🙂
    Ismael

Comments are closed.