What Does BigMemory Mean for Terracotta?
"First things first: I am an employee of Gigaspaces Technologies, but this isn't an official communique for Gigaspaces."
- Joseph Ottinger
Terracotta recently announced the release of something called "BigMemory," which claims to remove the burden that Java's garbage collector places on users of large VMs. It's always exciting to see new projects, especially ones that attempt to expand what people think Java can do, but…
By Terracotta's own estimates, 80% of their users deploy DSO (Distributed Shared Objects) as a low-cost single-JVM-plus-backup setup - hardly DSO's intended usage, but a use case nonetheless. BigMemory looks like a way to serve those users and generate an extra revenue stream from them. There's nothing wrong with that, but BigMemory isn't a replacement for a truly distributed system (it's a revenue stream from those who aren't willing to purchase DSO), nor is it a panacea for those who simply want more memory with better performance.
Based on descriptions of BigMemory on various news sites (InfoQ, TheServerSide), the Terracotta blog ("BigMemory explained a bit"), and Terracotta's own documentation on the Ehcache site, BigMemory is an enhancement to Ehcache that allocates a region of memory outside the JVM heap and uses it for cached data. Since the cached data is held outside the heap, the JVM can allocate a heap sized for its transient data (i.e., largely ignoring the cache), and the garbage collector can work on a much smaller scale. That minimizes the impact of a garbage collection run.
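To make that concrete, here's a minimal sketch of what the off-heap store looks like from application code. It assumes the Ehcache 2.3-era fluent configuration API described in Terracotta's BigMemory documentation; the exact method names may vary by version.

```java
// Minimal sketch of an Ehcache cache with BigMemory's off-heap store,
// assuming the Ehcache 2.3-era API; method names may differ by version.
import net.sf.ehcache.Cache;
import net.sf.ehcache.CacheManager;
import net.sf.ehcache.Element;
import net.sf.ehcache.config.CacheConfiguration;

public class OffHeapCacheSketch {
    public static void main(String[] args) {
        // The JVM must permit the direct-memory allocation, e.g.:
        //   java -XX:MaxDirectMemorySize=4G ...
        CacheConfiguration config = new CacheConfiguration("bigCache", 10000)
                .overflowToOffHeap(true)   // spill past the on-heap tier
                .maxMemoryOffHeap("4G");   // size of the off-heap region

        CacheManager manager = CacheManager.create();
        manager.addCache(new Cache(config));
        Cache cache = manager.getCache("bigCache");

        // Off-heap values are serialized on put and deserialized on get;
        // the access path is not a plain object reference.
        cache.put(new Element("user:42", "some value"));
        Element hit = cache.get("user:42");
        System.out.println(hit == null ? "miss" : hit.getObjectValue());

        manager.shutdown();
    }
}
```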
There are a number of flaws in this approach, though, that are important to consider. None of this amounts to "don't use BigMemory"; the goal is simply to highlight things you might want to think about.
First things first: I am an employee of Gigaspaces Technologies, but this isn't an official communique for Gigaspaces.
Caches aren't enough
BigMemory is built around a simple key/value mechanism, managed by a simple cache. There's nothing inherently wrong with key/value systems, but the model fundamentally limits the feature set your data repository can offer.
Data services provide a number of features. Storing and retrieving data are crucial, of course, but storage needs more than that: the ability to locate data flexibly.
A key/value store certainly has its uses, and developers have done a lot with simple key/value systems like Ehcache, OSCache, memcached, and other associative caches (as well as Gigaspaces Datagrid itself, of course). But this only works if you can guarantee that you know the key associated with a given data element.
In the real world, you want to be able to find your data based on the data itself, not on an artificial key. Even the "pure cache" vendors have recognized this: MongoDB provides a fairly capable query language, and Oracle Coherence recently added a query language as well.
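To illustrate the difference, here's a contrived sketch - the User type and the plain Map standing in for any key/value cache are hypothetical - showing that with nothing but get(key), a question like "all users in Ohio" degenerates into a scan of the entire cache:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical value type, for illustration only.
record User(String id, String state) {}

public class KeyValueQuerySketch {
    // The fast path a key/value store is built for: you know the key.
    static User byKey(Map<String, User> cache, String id) {
        return cache.get(id);
    }

    // The "query" path: the key/value contract offers no help, so every
    // entry must be fetched and inspected - O(n) over the whole cache.
    static List<User> byState(Map<String, User> cache, String state) {
        List<User> matches = new ArrayList<>();
        for (User user : cache.values()) {
            if (state.equals(user.state())) {
                matches.add(user);
            }
        }
        return matches;
    }
}
```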
Using a cache that limits your data to key/value storage means you're catering to the features the cache gives you, rather than the cache enabling the features you need. Every problem becomes a nail, because the only tool you have is a hammer.
BigMemory is unusual in that it expands the size of a cache in a single JVM to fairly impressive sizes, as opposed to the approach taken by others (distributing the cache across many JVMs); this might be useful in some environments, but it's not a game-changer.
Believe it or not, the JVM Heap is a Good Thing
BigMemory can legitimately claim to avoid the JVM's garbage collection cycle, because it doesn't use the JVM for memory management. However, it's arguable that it should.
The problem is that BigMemory forces Java code to play the "Thing King" - a humorous retelling of what a memory manager does. Among many other things, a good memory manager has to handle allocation of blocks of different sizes.
The downside of handling heterogeneous block sizes: garbage collection. If I allocate irregular block sizes (say, objects containing lists of different lengths) and then deallocate those objects, my block of memory is going to get fragmented. Over time, that means that if my magical 300G cache actually does get saturated and fragmented, it has to…
Run... a...
Garbage collection cycle.
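Here's a toy illustration of why - emphatically not Terracotta's allocator, just arithmetic over an imaginary region. After freeing alternating 100-byte blocks, half the region is free, yet no single hole can satisfy a 200-byte allocation; the only cure is sliding the live blocks together, i.e., compaction:

```java
import java.util.ArrayList;
import java.util.List;

public class FragmentationSketch {
    // A free "hole" in an imaginary contiguous 1000-byte region.
    record Hole(int offset, int length) {}

    public static void main(String[] args) {
        List<Hole> holes = new ArrayList<>();
        // Ten 100-byte blocks were allocated; free every other one.
        for (int i = 0; i < 10; i += 2) {
            holes.add(new Hole(i * 100, 100));
        }
        int freeTotal = holes.stream().mapToInt(Hole::length).sum();
        int largestHole = holes.stream().mapToInt(Hole::length).max().orElse(0);
        System.out.println("free bytes:   " + freeTotal);    // 500
        System.out.println("largest hole: " + largestHole);  // 100
        // Plenty of free space in total, but a 200-byte allocation fails:
        System.out.println("fits 200 bytes? " + (largestHole >= 200));
    }
}
```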
In the InfoQ article on BigMemory, Amit Pandey said: "What we've done is some very clever algorithms that take care of how we handle fragmentation and issues like that, and because we're basically doing it off-line we're not slowing the application down." Fair enough! But fragmentation in any contiguous region eventually has a cost, and saying that it doesn't isn't accurate. Plus, what he's describing is still a fairly complete analog of the normal Java garbage collectors, which themselves run in background threads.
What's more, according to their own notes, accessing the memory outside the JVM is slower than accessing heap (assuming all things are equal, like swap space not being involved). This means that a giant garbage collection/compaction cycle would be even more damaging than an equivalent JVM garbage collection cycle. (And with a region of 350 gigabytes, it's fair to say that the region is "giant.")
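The slower access is inherent in where the data lives: an off-heap value isn't a Java object the application can reference directly, so it has to be copied and deserialized on every read. A minimal sketch using a direct ByteBuffer (the mechanism Keith Gregory's article below describes) shows the round trip:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class OffHeapAccessSketch {
    public static void main(String[] args) {
        // Memory outside the garbage-collected heap.
        ByteBuffer offHeap = ByteBuffer.allocateDirect(1024);

        // "put": copy the value's bytes out of the heap...
        byte[] out = "some cached value".getBytes(StandardCharsets.UTF_8);
        offHeap.putInt(out.length);
        offHeap.put(out);

        // "get": ...and copy them back in, rebuilding the object. An
        // on-heap map would simply hand back the original reference.
        offHeap.flip();
        byte[] in = new byte[offHeap.getInt()];
        offHeap.get(in);
        System.out.println(new String(in, StandardCharsets.UTF_8));
    }
}
```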
In all fairness, though, the compaction cycle would happen far less often than a JVM garbage collection cycle, so while it would severely affect performance while it ran, it would occur fairly rarely.
Another (positive) aspect of the compaction cycle on the BigMemory region is that the cache doesn't maintain references to other objects in the cache - so the compactor (my term) doesn't have to chase references to see if an object can be collected or not. This will help some, but it doesn't mean that the system will never have to address memory fragmentation.
However, the benchmarks on BigMemory seem to assume an optimal access pattern. To get the best results from a cache with BigMemory, you need a read-mostly workload (because writes force the OS to preserve pages), and your reads have to come from a small set of entries (since BigMemory keeps the most-recently-used data in heap). But if you can meet those requirements, you don't need BigMemory in the first place: a read-mostly on-heap key/value map won't affect garbage collection any more than BigMemory does, and it will be faster because it skips a set of abstraction layers.
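A sketch of that alternative, under the same read-mostly assumption: a map built once at startup and never mutated is cheap for a generational collector (long-lived, unchanging objects get promoted and then largely ignored), and a lookup returns a plain reference with no deserialization:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class ReadMostlyMapSketch {
    private static final Map<String, String> CACHE;

    static {
        Map<String, String> entries = new HashMap<>();
        // ...load the reference data once at startup (source omitted)...
        entries.put("user:42", "some value");
        CACHE = Collections.unmodifiableMap(entries);
    }

    // A plain on-heap reference: no abstraction layers, no copying.
    public static String lookup(String key) {
        return CACHE.get(key);
    }
}
```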
Again, this isn't to say that BigMemory is useless or uninteresting, but it's far from being any kind of end-all answer to caching.
How do you see what's going on?
How can you tell what's being managed by the cache, and what usage patterns the cache has seen? Most caches can report hit and miss rates (i.e., how often requested data wasn't in the cache, and how often a given entry was served from it), but managing data outside the JVM heap hides that data from the standard JVM tools.
This means that in order to triage a cache-related problem in your application - and the cache is expected to hold most of your data, given the size of the cache relative to the JVM heap - you'll need two separate kinds of tools: a memory profiler that works on the JVM heap, and… the Ehcache analysis tools.
This is not to say that the Ehcache administration tools are insufficient, but in my experience, as soon as you split your tooling like that (same effective purpose, different tools and reports), your ability to triage a problem drops immensely. It's too easy to look at one report and not the other; it's too easy to fail to correlate the reports properly.
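For reference, the cache side of that split looks something like this - a sketch assuming the Ehcache 2.x statistics API (method names vary across versions). The numbers are real, but they live in a different report than anything your heap profiler shows:

```java
import net.sf.ehcache.Cache;
import net.sf.ehcache.CacheManager;
import net.sf.ehcache.Statistics;

public class CacheStatsSketch {
    public static void main(String[] args) {
        CacheManager manager = CacheManager.create();
        manager.addCache("bigCache"); // built from the default cache config
        Cache cache = manager.getCache("bigCache");
        cache.setStatisticsEnabled(true);

        // Hit/miss counters - the cache's view, separate from the JVM's.
        Statistics stats = cache.getStatistics();
        System.out.println("hits:   " + stats.getCacheHits());
        System.out.println("misses: " + stats.getCacheMisses());

        manager.shutdown();
    }
}
```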
Conclusion
This is not intended to say that BigMemory is useless, or pointless. It's not even to say that it's not interesting - because it is. It's always good to see Java's horizons expanded.
However, it's important to keep your eyes open as to what a product actually gives you, regardless of marketing claims. BigMemory has its uses, but it is primarily a solution to problems that few people have - problems more easily solved by a data grid than by simple cache semantics.
If BigMemory solves your problem, that's great! Use it - and Ehcache makes integrating BigMemory fairly easy. Just remember that a cache isn't enough - and if you're coding to your cache as well as your data store, you're doing too much.
References
1. "BigMemory: Off-heap Store," Terracotta.org
2. "Byte Buffers and Non-Heap Memory," Keith Gregory