A terabyte cache with the RESTful Ehcache Server

The RESTful Ehcache Server is designed to achieve massive scaling using data partitioning – all from a RESTful interface. The largest ehcache single instances run at around 20GB in memory. The largest disk stores run at 100Gb each. Add nodes together, with cache data partitioned across them, to get larger sizes. 50 nodes at 20GB gets you to 1 Terabyte. 
Two deployment choices need to be made: where is partitoning performed, and is redundancy required? These choices can be mixed and matched with a number of different deployment topologies.
This topology is the simplest. It does not use a load balancer. Each node is accessed directly by the cache client using REST. No redundancy is provided. 
The client can be implemented in any language because it is simply a HTTP client. It must work out a partitioning scheme. Simple key hashing, as used by memcached, is sufficient. Here is a Java code sample:
String[] cacheservers = new String[]{“cacheserver0.company.com”, “cacheserver1.company.com”, “cacheserver2.company.com”, “cacheserver3.company.com”, “cacheserver4.company.com”, “cacheserver5.company.com”};
Object key = “123231”;
int hash = Math.abs(key.hashCode());
int cacheserverIndex = hash % cacheservers.length;
String cacheserver = cacheservers[cacheserverIndex];
Redundancy is added as shown in the above diagram by:
  1. Replacing each node with a cluster of two nodes. One of the existing distributed caching options in ehcache is used to form the cluster. Options in ehcache 1.5 are RMI and JGroups-based clusters. Ehcache-1.6 will add JMS as a further option.
  2. Put each ehcache cluster behind VIPs on a load balancer.
Interestingly, content-switching load balancers support URI routing using some form of regular expressions. So, you could optionally skip the client-side hashing to achieve partitioning in the load balancer itself. For example:
/ehcache/rest/sampleCache1/a* => cluster1
/ehcache/rest/sampleCache1/a* => cluster2
Things get much more sophisticated with F5 load balancers, which let you create iRules in the TCL language. So rather than regular expression URI routing, you could implement key hashing-based URI routing. Remember in Ehcache’s RESTful server, the key forms the last part of the URI. e.g. In the URI http://cacheserver.company.com/ehcache/rest/sampleCache1/3432 , 3432 is the key. See http://devcentral.f5.com/Default.aspx?tabid=63&PageID=153&ArticleID=135&articleType=ArticleView for a sample URI hashing iRule.
















Published
Categorized as Java

By Greg Luck

As Terracotta’s CTO, Greg (@gregrluck) is entrusted with understanding market and technology forces and the business drivers that impact Terracotta’s product innovation and customer success. He helps shape company and technology strategy and designs many of the features in Terracotta’s products. Greg came to Terracotta on the acquisition of the popular caching project Ehcache which he founded in 2003. Prior to joining Terracotta, Greg served as Chief Architect at Australian online travel giant Wotif.com. He also served as a lead consultant for ThoughtWorks on accounts in the United States and Australia, was CIO at Virgin Blue, Tempo Services, Stamford Hotels and Resorts and Australian Resorts and spent seven years as a Chartered Accountant in KPMG’s small business and insolvency divisions. He is a regular speaker at conferences and contributor of articles to the technical press.