Is the overhead of serializing and deserializing POJOs a good reason for using Infinispan over Memcached or Redis for caching POJOs? - caching

I need to cache different user and application data on a daily basis.
Context:
no experience with caches
working on a java web application that sends news articles to users displayed in a user-feed format
MySQL backend
Java middle tier using Hibernate and Jersey
I've looked at different cache technologies, and Memcached and Redis seem to be the most widely used in use cases similar to mine, with many reads and writes (e.g. Facebook, Twitter).
But I have to serialize objects before I cache them in either of those two systems. That seemed like an unnecessary step just to cache a POJO, so I looked into POJO caches and stumbled upon JBoss's Infinispan.
Given the overhead of serializing and then deserializing, are there any good reasons why I shouldn't use Infinispan instead of Memcached or Redis?

When Infinispan works in clustered mode, or when it has to offload data to external stores, it will have to serialize the data.
The good news is:
- you'll avoid any serialization costs unless it has to go somewhere else
- its own serialization mechanism is far more efficient than Java's standard serialization mechanism (and nicely customizable)
Memcached and Redis are "external" caching solutions, while with Infinispan you can keep the same Java instance cached. Whether this is a good or bad thing depends on the specifics of your architecture.
Commonly, though, you'll want a hybrid solution: use Infinispan for your in-JVM needs, cap its memory usage, and have it offload what doesn't fit locally to an external store. It's easy to have it offload the extra data to Redis, Memcached, another Infinispan cluster, or several other alternatives.
The benefits are transparent integration with some popular frameworks (e.g. Hibernate) and efficient serialization handled for you, if and when it's needed, possibly in the background.
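To make the "same Java instance" point concrete, here is a minimal sketch using only the JDK. It does not use the Infinispan, Memcached or Redis APIs: the `Article` class, the `CacheStyles` class and the byte-array map standing in for an external cache are all assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical POJO used only for this illustration.
class Article implements Serializable {
    private static final long serialVersionUID = 1L;
    final String title;
    Article(String title) { this.title = title; }
}

class CacheStyles {
    // In-JVM cache: stores references, so no serialization is involved.
    static final Map<String, Article> inJvm = new ConcurrentHashMap<>();
    // Stand-in for an external cache (assumption): it can only hold bytes.
    static final Map<String, byte[]> external = new ConcurrentHashMap<>();

    static void putExternal(String key, Article value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value); // serialization cost is paid on every put
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        external.put(key, bytes.toByteArray());
    }

    static Article getExternal(String key) {
        byte[] data = external.get(key);
        if (data == null) return null;
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (Article) in.readObject(); // deserialization cost is paid on every get
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Article a = new Article("Caching 101");
        inJvm.put("a1", a);
        putExternal("a1", a);
        System.out.println(inJvm.get("a1") == a);   // true: the same instance comes back
        System.out.println(getExternal("a1") == a); // false: a deserialized copy comes back
    }
}
```

The copy returned by the byte-based path is also why changes made to a cached object are not visible to other readers of an external cache until you put it again.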

Related

Spring ehcache vs Memcached?

I have worked on Spring caching using Ehcache. To me, Memcached looks much the same, just with a different API and implementation.
What is the difference between them in terms of features, apart from the API and implementation?
Update: I have already seen "Hibernate EHCache vs MemCache", but that question is mainly from a Hibernate perspective, while mine is about caching services in general. The answer to that question also states that there is not much difference in terms of features.
Aside from the API differences you noted, the major difference here is going to be that memcached lives in a different process while Ehcache is internal to the JVM - unless configured to store on disk or in a cluster.
This mainly means that with Memcached you always need a serialized version of your objects and you always interact with a different process, remote or not.
Ehcache, and other JVM-based caching solutions, start with an on-heap cache, so a lookup simply returns a reference to your Java object.
Of course this means that the objects keep living in the Java heap, increasing memory pressure. With Ehcache 3.x you have the option to move to off-heap memory and beyond, allowing the cache to grow without impacting the JVM heap.
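The on-heap versus off-heap trade-off can be sketched with plain JDK classes. This is not how Ehcache implements its tiers and it uses none of its API: the `OffHeapSketch` class and its method names are hypothetical, and values are limited to strings to keep the byte conversion simple.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical store: values live either on the heap as objects or off the heap as bytes.
class OffHeapSketch {
    private final Map<String, String> onHeap = new HashMap<>();
    private final Map<String, ByteBuffer> offHeap = new HashMap<>();

    // On-heap: keep the reference; the value counts against the JVM heap.
    void putOnHeap(String key, String value) {
        onHeap.put(key, value);
    }

    // Off-heap: the value must be turned into bytes and copied into direct memory.
    void putOffHeap(String key, String value) {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buffer = ByteBuffer.allocateDirect(bytes.length);
        buffer.put(bytes);
        buffer.flip(); // switch the buffer from writing to reading
        offHeap.put(key, buffer);
    }

    String get(String key) {
        String value = onHeap.get(key);
        if (value != null) return value; // on-heap hit: just a reference
        ByteBuffer buffer = offHeap.get(key);
        if (buffer == null) return null;
        byte[] bytes = new byte[buffer.remaining()];
        buffer.duplicate().get(bytes); // duplicate so the stored buffer's position is untouched
        return new String(bytes, StandardCharsets.UTF_8); // a new object is built on every read
    }
}
```

Only the small `ByteBuffer` handle lives on the heap here; the payload sits in direct memory, at the price of a copy and a decode on each read.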
At this point, the benefit of Memcached may be that you want non-Java clients to access it.
And the final decision really is in your hands. Caches consume memory to reduce latency. What works for you may differ from what works for others. You have to measure and decide.

What are the different parameters for comparing the various caching frameworks?

I am currently aware of the following Caching Frameworks:
EHCache, MemCache, Redis, OSCache, DynaCache, JBoss Cache, JCS, Cache4J.
Apart from the time taken to access data in the cache, what are the different parameters/attributes for comparing these frameworks? And which framework should one use, and when?
A few things to consider at a broad level:
- Technology you are using
- API available for the chosen framework
- Each framework has unique features, so you can pick one depending on your application's requirements.
Descriptions of a few, taken from the source mentioned below:
Ehcache:
Ehcache is a Java distributed cache for general-purpose caching, J2EE and lightweight containers, tuned for large cache objects. It features memory and disk stores, replication by copy and invalidation, listeners, and a gzip caching servlet filter, and it is fast and simple.
Java Caching System (JCS):
JCS is a distributed caching system written in Java for server-side Java applications. It is intended to speed up dynamic web applications by providing a means to manage cached data of various dynamic natures. Like any caching system, JCS is most useful for high-read, low-put applications.
OSCache:
OSCache is a caching solution that includes a JSP tag library and a set of classes to perform fine-grained dynamic caching of JSP content, servlet responses or arbitrary objects. It provides both in-memory and persistent on-disk caches, and can allow your site to continue functioning normally even if the data source is down (for example, if your database goes down, you can serve the cached content so people can still browse the site).
Cache4J:
Cache4j is a cache for Java objects that stores objects only in memory. It is practical mainly for Russian speakers, as there is no documentation in English and the JavaDoc is in Russian as well.
Redis:
Redis can be used for caching sessions and storing simple data structures for fast retrieval, and it can persist them as well when needed.
It is mainly useful for caching POJO objects only.
Here is an interesting article for further insight:
http://javalandscape.blogspot.in/2009/03/intro-to-cachingcaching-algorithms-and.html

Why is ehcache faster than memcache?

Directly quoting from ehcache's website [source]:
The idea here is that your caches are set up in a cache hierarchy.
Ehcache sits in front and memcacheg behind. Combining the two lets you
elegantly work around limitations imposed by Google App Engine. You
get the benefits of the speed of Ehcache together with the unlimited
size of memcached. Ehcache contains the hooks to easily do this. To
update memcached, use a CacheEventListener. To search against
memcacheg on a local cache miss, use cache.getWithLoader() together
with a CacheLoader for memcacheg.
This seems to imply that using Ehcache with memcached would be faster than using memcached alone. Why would Ehcache be faster than memcached? The way I see it, both are in-memory caches, so why the performance difference?
Ehcache usually runs in the same JVM process as the application, so it incurs no serialization or I/O costs.
When you use Ehcache with memcached, some objects are stored on the Ehcache heap and others in memcached, so mixing Ehcache and memcached will be faster than using memcached alone.
Running Ehcache and the application in the same JVM process trades RAM for time, but you cannot put too much data into Ehcache, because you also need to consider replication between servers.
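The "local cache in front, remote cache behind" arrangement can be sketched as follows. This is not the Ehcache `CacheLoader` API: the `TieredCache` class is hypothetical, and the `Function` passed to it is an assumed stand-in for a call to a memcached client.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical two-level cache: a local in-JVM map in front of a slower remote lookup.
class TieredCache<K, V> {
    private final Map<K, V> local = new ConcurrentHashMap<>();
    private final Function<K, V> remoteLookup; // stand-in for a memcached client call (assumption)
    private final AtomicInteger remoteCalls = new AtomicInteger();

    TieredCache(Function<K, V> remoteLookup) {
        this.remoteLookup = remoteLookup;
    }

    V get(K key) {
        V value = local.get(key);
        if (value != null) return value; // local hit: no network, no deserialization
        remoteCalls.incrementAndGet();   // local miss: pay for the remote round trip
        value = remoteLookup.apply(key);
        if (value != null) local.put(key, value); // keep it locally for next time
        return value;
    }

    int remoteCalls() {
        return remoteCalls.get();
    }

    public static void main(String[] args) {
        TieredCache<String, String> cache = new TieredCache<>(key -> "value-for-" + key);
        cache.get("user:1"); // miss: goes to the remote tier
        cache.get("user:1"); // hit: served from the local map
        System.out.println(cache.remoteCalls()); // prints 1
    }
}
```

The speed-up comes only from the hits that never leave the JVM. The sketch ignores invalidation, so a value changed in the remote tier would stay stale in the local map.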

Advantage of using Ehcache over a static HashMap

I have always used a Java singleton class for my basic caching needs.
Now the project is using Ehcache, and without looking deeply into the source code, I cannot figure out what was wrong with the singleton pattern.
That is, what are the benefits of using the Ehcache framework, other than that caching can be set up through XML configuration and annotations without writing boilerplate code (i.e. a static HashMap)?
It depends on what you need from your caching mechanism. Ehcache provides a lot of useful features that would take a lot of well-designed code to implement by hand:
- LRU, LFU and FIFO cache eviction policies
- Flexible configuration
- Persistence
- Replication
- and many more
I would recommend you go through them at http://ehcache.org/about/features and decide whether you really need any of them in your project.
The most important one:
The ability to overflow to disk: this is something you don't have in a plain HashMap, and writing something like that is far from trivial. Ehcache can function as a simple-to-configure key-value database.
Even if you don't use overflow to disk, there is a lot of boilerplate to write in your own implementation. If loading the whole database into memory were possible, then an in-memory database that persists on write and restores on startup would be the solution. But memory is limited, so you have to remove elements from memory. But which ones, and based on what? You must also ensure cache elements are not too old; older elements should be replaced. If you need to remove elements from the cache, you should start with the outdated ones. But should you do it when a user requests something? That will slow down the request. Or should you start your own thread?
With Ehcache you have a library in which all those issues are addressed and tested.
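To give a feel for the boilerplate in question, here is a minimal hand-rolled sketch of just two of those concerns, a size cap with LRU eviction and a time-to-live, built on the JDK's `LinkedHashMap`. The `SimpleLruCache` class is hypothetical, it is not how Ehcache works internally, and the current time is passed in explicitly (an assumption made so the behaviour is easy to check).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical hand-written cache: LRU eviction plus lazy expiry on read.
class SimpleLruCache<K, V> {
    // Value wrapper that remembers when the entry stops being valid.
    private static final class Timed<T> {
        final T value;
        final long expiresAtMillis;
        Timed(T value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final long ttlMillis;
    private final Map<K, Timed<V>> map;

    SimpleLruCache(final int maxEntries, long ttlMillis) {
        this.ttlMillis = ttlMillis;
        // accessOrder = true keeps the least recently used entry first.
        this.map = new LinkedHashMap<K, Timed<V>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, Timed<V>> eldest) {
                return size() > maxEntries; // evict once the cap is exceeded
            }
        };
    }

    synchronized void put(K key, V value, long nowMillis) {
        map.put(key, new Timed<>(value, nowMillis + ttlMillis));
    }

    synchronized V get(K key, long nowMillis) {
        Timed<V> entry = map.get(key); // also marks the entry as recently used
        if (entry == null) return null;
        if (nowMillis >= entry.expiresAtMillis) {
            map.remove(key); // outdated entries are dropped only when someone asks for them
            return null;
        }
        return entry.value;
    }

    synchronized int size() {
        return map.size();
    }
}
```

Even this leaves open the questions raised above: expired entries that are never read again stay in memory until evicted, and there is no background cleanup thread, no disk overflow and no replication.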
There is also a clustered, closed-source version of Ehcache that gives you a distributed cache. That might be one reason to consider using Ehcache.

Scaling and Clustering JPA

I am putting together a regular Java EE application on JBoss 7 that will use JPA in the data tier. I would like this application to scale up with load. It is pretty clear how to scale up the web tier (create more machines and put them behind a load balancer), but scaling up the data tier is less clear.
I can probably cluster my database (MySQL). Still, that leaves the JPA layer unclustered. Ideally, JPA would scale up by using (clustered) in-memory caching backed by MySQL.
When I look around, all information about JPA scaling seems to be 3-4 years old. People talk about Ehcache, Memcached and Infinispan. I am not sure if this is still current.
Can someone tell me the state of the art in Java EE clustering and scaling, especially in the data tier?
Various caching strategies are still the way to scale JPA/Hibernate (you basically named the most popular options in your question). As far as I know, nothing extraordinary has happened in this field in the last 4-5 years. One more option you haven't mentioned is JBoss Cache. So the second-level cache for JPA/Hibernate still rules in this area.
Why no progress here? My wild guess is that, first of all, people who need scalable applications tend to avoid JPA and Hibernate in areas where high performance is needed. Usually they go with plain SQL wrapped in Spring Framework JdbcTemplate helpers and transaction management. Scalability then becomes a matter of the database's capabilities.
The other trend is to use NoSQL databases. There are plenty of solutions: MongoDB, CouchDB, Cassandra, Redis, to name a few. These are usually Google BigTable-like key-value stores (this is an oversimplification, but it is more or less the idea behind the approach) and they scale like hell if you accept their limitations (relations are no longer managed easily, etc.).
There are many solutions; the two main categories are:
- scaling the database
- using a clustered cache to reduce database load
EclipseLink supports data partitioning for sharding data across a set of database instances,
see:
http://java-persistence-performance.blogspot.com/2011/05/data-partitioning-scaling-database.html
You can also use MySQL Cluster,
see:
http://www.mysql.com/products/cluster/
Oracle TopLink Grid provides EclipseLink JPA support for integration with Oracle Coherence as a distributed cache,
see:
http://www.oracle.com/technetwork/middleware/ias/tl-grid-097210.html
EclipseLink's cache supports clustering through cache coordination,
see:
http://wiki.eclipse.org/EclipseLink/Examples/JPA/CacheCoordination
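The first category, partitioning data across several database instances, boils down to routing each key to one instance. Here is a minimal sketch of hash-based routing. It is not EclipseLink's partitioning API: the `ShardRouter` class is hypothetical, and any JDBC URLs passed to it are placeholders you would supply yourself.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical router that picks a database instance from an entity's key.
class ShardRouter {
    private final List<String> shardUrls;

    ShardRouter(List<String> shardUrls) {
        if (shardUrls.isEmpty()) {
            throw new IllegalArgumentException("at least one shard is required");
        }
        this.shardUrls = new ArrayList<>(shardUrls); // defensive copy
    }

    // The same key always maps to the same shard, so reads find what writes stored.
    String shardFor(Object key) {
        // floorMod keeps the index non-negative even for negative hash codes.
        int index = Math.floorMod(key.hashCode(), shardUrls.size());
        return shardUrls.get(index);
    }
}
```

With plain modulo hashing, adding or removing a shard remaps most keys, which is one reason to prefer a partitioning policy provided by the persistence layer or the database over a hand-written one.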

Resources