I have worked on Spring caching using Ehcache. To me it looks like the same thing with a different set of exposed APIs and their implementations.
What is the difference between them in terms of the features provided, apart from API/implementation?
Update: I have already seen Hibernate EHCache vs MemCache, but that question is mainly from a Hibernate perspective, whereas my question is about caching services in general. The answer to that question also states there is not much difference in terms of features.
Aside from the API differences you noted, the major difference here is going to be that memcached lives in a different process while Ehcache is internal to the JVM - unless configured to store on disk or in a cluster.
This mainly means that with Memcached you always need a serialized version of your objects and you always interact with a different process, remote or not.
Ehcache, and other JVM-based caching solutions, start with an on-heap cache, which means lookups simply hand back references to your Java objects.
Of course this means that the objects keep living in the Java heap, increasing memory pressure. In the case of Ehcache 3.x you have the option to move to off-heap memory and more, allowing the cache to grow without impacting the JVM heap.
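As a rough sketch of what that tiering looks like with the Ehcache 3.x builder API (the cache name "articles" and the pool sizes here are invented for the example):

```java
import org.ehcache.Cache;
import org.ehcache.CacheManager;
import org.ehcache.config.builders.CacheConfigurationBuilder;
import org.ehcache.config.builders.CacheManagerBuilder;
import org.ehcache.config.builders.ResourcePoolsBuilder;
import org.ehcache.config.units.EntryUnit;
import org.ehcache.config.units.MemoryUnit;

public class TieredEhcacheSketch {
    public static void main(String[] args) {
        CacheManager cacheManager = CacheManagerBuilder.newCacheManagerBuilder()
                .withCache("articles",
                        CacheConfigurationBuilder.newCacheConfigurationBuilder(Long.class, String.class,
                                ResourcePoolsBuilder.newResourcePoolsBuilder()
                                        .heap(1_000, EntryUnit.ENTRIES)   // hot entries: plain object references on the heap
                                        .offheap(64, MemoryUnit.MB)))     // colder entries: stored serialized, outside the JVM heap
                .build(true);

        Cache<Long, String> articles = cacheManager.getCache("articles", Long.class, String.class);
        articles.put(42L, "cached value");
        String hit = articles.get(42L);           // an on-heap hit comes back without any deserialization
        System.out.println(hit);

        cacheManager.close();
    }
}
```

Anything stored in the off-heap tier is serialized, so keys and values there need a serializer available (the built-in ones cover common types such as Long and String).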
At this point, the benefit of Memcached may be that you want non-Java clients to access it.
And the final decision really is in your hands. Caches consume memory in exchange for reduced latency. What works for you may be different from what works for others. You have to measure and decide.
I need to cache different user and application data on a daily basis.
Context:
- no experience with caches
- working on a Java web application that sends news articles to users displayed in a user-feed format
- MySQL backend
- Java middle tier using Hibernate and Jersey
I've checked out different cache technologies, and it seems like Memcached or Redis are the most used technologies in use cases similar to mine: many reads and writes, e.g. Facebook, Twitter, etc.
But I would have to serialize objects before caching them with the two systems above. That seemed like an unnecessary step just to cache a POJO, so I checked out POJO caches and stumbled upon JBoss's Infinispan.
Does anyone have good reasons why I shouldn't prefer Infinispan over Memcached or Redis, given the serialization (and subsequent deserialization) overhead concern?
When Infinispan works in clustered mode, or when it has to offload data to external stores, it has to deal with serialization.
The good news is:
- you'll avoid any serialization costs unless the data has to go somewhere else
- its own serialization mechanism is far more efficient than Java's standard serialization mechanism (and nicely customizable)
Memcached and Redis are "external" caching solutions, while with Infinispan you can keep the same Java instance cached. Whether this is a good or bad thing depends on your architecture specifics.
Commonly, though, you'll want a hybrid solution: use Infinispan for your in-JVM needs, cap its memory usage, and have it offload whatever doesn't fit locally to an external store. It's easy to have it offload the extra data to Redis, Memcached, another Infinispan cluster, or several other alternatives.
Your benefit is transparent integration with some popular frameworks (e.g. Hibernate), and that it can handle serialization efficiently for you, if and when it's needed, possibly in the background.
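To make the "same Java instance" point concrete, here is a minimal sketch of embedded, local-mode Infinispan (assuming only the infinispan-core dependency; the cache name "articles" is invented). In local mode the cache holds plain references, so no serialization happens:

```java
import org.infinispan.Cache;
import org.infinispan.configuration.cache.ConfigurationBuilder;
import org.infinispan.manager.DefaultCacheManager;

public class LocalInfinispanSketch {
    public static void main(String[] args) {
        // Purely local (non-clustered) cache manager: entries are held as plain object references.
        DefaultCacheManager manager = new DefaultCacheManager();
        manager.defineConfiguration("articles", new ConfigurationBuilder().build());

        Cache<Long, Object> articles = manager.getCache("articles");
        Object article = new Object();                      // stands in for your news-article POJO
        articles.put(1L, article);                          // no serialization in local mode
        System.out.println(articles.get(1L) == article);    // true: the very same instance comes back

        manager.stop();
    }
}
```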
Is it sensible to use Spring on the server side of an in-memory data grid based application?
My gut feeling tells me that it is nonsense in a low latency high performance system. A colleague of mine is insisting on including Spring in it. What are the pros and cons of such inclusion?
My position is that Spring is OK to use in the client, but it is too heavy for the server; it brings in too many dependencies and is one more leaky abstraction to think about.
Data Grid systems are memory and I/O intensive in general. Using Spring does not affect that (you may argue that Spring creates a lot of beans but with proper Garbage Collection tuning this is not a problem).
On the other hand using Spring (or any other DI) helps you structure and test your code.
So if you are implementing some sort of server based on a data grid system, pay attention to properly tuning GC and the sockets in your OS (memory buffers and socket buffer sizes). Those will give you much more benefit than cutting out DI.
First, I'm surprised by the "leaky abstraction" comment. I've never heard anyone criticize Spring for this. In fact, it's just the opposite. Spring removes the implementation details of infrastructure such as data grids from your application code and provides a consistent and familiar programming model, allowing you to focus on business logic. Spring does a lot to enhance configuration of and access to data grids, especially GemFire, and generally does not create any runtime overhead per se. During initialization of a Spring application, Spring uses tools like reflection and AOP internally, which may increase the start-up time of an application, but this has no impact on runtime performance. Spring has been proven in many high-throughput, low-latency production applications. In extreme cases, things like network latency and serialization, concerns external to Spring, are normally the biggest factors affecting performance.
"Spring brings in too many dependencies" is a common complaint, but is a fallacy. I would say Spring brings in the exact right amount of dependencies for what it needs to do. Additionally, Spring Boot starters and the platform BOM do a lot to simplify dependency management so you don't need to worry about version incompatibilities or explicitly declaring common dependencies. I'll have to side with your colleague on this one.
I am currently aware of the following Caching Frameworks:
EHCache, MemCache, Redis, OSCache, DynaCache, JBoss Cache, JCS, Cache4J.
Apart from the time taken to access data from the cache, what are the different parameters/attributes for comparing these frameworks? And which framework should one use, and when?
A few things at a broad level can be:
- The technology you are using
- The API available for the chosen framework
- Each framework has some unique features, so depending on your application requirements you can pick the one that fits.
Descriptions of a few, picked from the source mentioned below:
Ehcache:
Ehcache is a Java distributed cache for general-purpose caching, J2EE, and lightweight containers, tuned for large-size cache objects. It features memory and disk stores, replication by copy and invalidate, listeners, and a gzip caching servlet filter, and it is fast and simple.
Java Caching System (JCS):
JCS is a distributed caching system written in Java for server-side Java applications. It is intended to speed up dynamic web applications by providing a means to manage cached data of various dynamic natures. Like any caching system, JCS is most useful for high-read, low-put applications.
OSCache:
OSCache is a caching solution that includes a JSP tag library and a set of classes to perform fine-grained dynamic caching of JSP content, servlet responses, or arbitrary objects. It provides both in-memory and persistent on-disk caches, and can allow your site to continue functioning normally even if the data source is down (for example, if your DB goes down, you can serve the cached content so people can still browse the site).
Cache4J:
Cache4j is a cache for Java objects that stores objects only in memory (suitable mainly for Russian speakers, as there is no documentation in English and the JavaDoc is also in Russian :D).
Redis:
Redis can be used for caching sessions and storing simple data structures for fast retrieval, and when needed it can provide persistence as well.
It is mainly useful for caching simple values and plain objects (which have to be serialized before being stored, since Redis runs as a separate process).
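As a small illustration of the session-caching use, here is a hedged sketch using the Jedis client (the key name, TTL, and payload are invented; any Redis client would do):

```java
import redis.clients.jedis.Jedis;

public class RedisSessionCacheSketch {
    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            // Cache a (serialized) session payload for 30 minutes; Redis stores strings/bytes, not live objects.
            jedis.setex("session:abc123", 1800, "{\"userId\":42,\"role\":\"reader\"}");
            String session = jedis.get("session:abc123");
            System.out.println(session);
        }
    }
}
```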
Here is an interesting article for further insights:
http://javalandscape.blogspot.in/2009/03/intro-to-cachingcaching-algorithms-and.html
Directly quoting from ehcache's website [source]:
The idea here is that your caches are set up in a cache hierarchy. Ehcache sits in front and memcached behind. Combining the two lets you elegantly work around limitations imposed by Google App Engine. You get the benefits of the speed of Ehcache together with the unlimited size of memcached. Ehcache contains the hooks to easily do this. To update memcached, use a CacheEventListener. To search against memcached on a local cache miss, use cache.getWithLoader() together with a CacheLoader for memcached.
This seems to imply that using Ehcache with memcached would be faster than using memcached alone. Why would Ehcache be faster than memcached? The way I see it, both are in-memory caches, so why the performance difference?
Ehcache often runs in the same JVM process as the application, so it avoids serialization and I/O costs.
When using Ehcache with memcached, some objects are stored in the Ehcache heap and others are in memcached, so mixing Ehcache and memcached is faster than using only memcached.
Running Ehcache and the application in the same JVM process is a way to trade RAM for time, but you cannot put too much data into Ehcache, because you also need to consider replication between servers.
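To illustrate the pattern from the quoted docs, here is a hedged sketch of the read path with Ehcache 2.x as the in-process tier and memcached (via the spymemcached client) behind it. The cache name, TTL, and the fetchFromDatabase helper are all invented for the example:

```java
import net.sf.ehcache.Cache;
import net.sf.ehcache.CacheManager;
import net.sf.ehcache.Element;
import net.spy.memcached.MemcachedClient;

import java.net.InetSocketAddress;

public class TieredCacheSketch {

    private final Cache l1;                 // fast, in-JVM Ehcache tier
    private final MemcachedClient l2;       // larger, out-of-process memcached tier

    public TieredCacheSketch() throws Exception {
        this.l1 = CacheManager.getInstance().getCache("articles"); // assumes a cache named "articles" in ehcache.xml
        this.l2 = new MemcachedClient(new InetSocketAddress("localhost", 11211));
    }

    // Read path: try the in-process cache first, then memcached, then the database.
    public Object get(String key) {
        Element hit = l1.get(key);
        if (hit != null) {
            return hit.getObjectValue();            // L1 hit: plain reference, no deserialization
        }
        Object value = l2.get(key);                 // L2 hit: value is deserialized by the memcached client
        if (value == null) {
            value = fetchFromDatabase(key);         // hypothetical loader for the real source of truth
            l2.set(key, 3600, value);               // keep the shared tier warm (1 hour TTL)
        }
        l1.put(new Element(key, value));            // populate L1 for subsequent local reads
        return value;
    }

    private Object fetchFromDatabase(String key) {
        // placeholder: real code would query the database here
        return "value-for-" + key;
    }

    public static void main(String[] args) throws Exception {
        TieredCacheSketch cache = new TieredCacheSketch();
        System.out.println(cache.get("article:1"));
    }
}
```

The CacheEventListener/CacheLoader hooks mentioned in the quote let Ehcache drive this same flow for you declaratively; the sketch just spells out what happens on a local miss.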
I have always used a Java singleton class for my basic caching needs.
Now the project is using ehcache and without looking deeply into source code, I am not able to figure out what was wrong with the singleton pattern.
i.e. what are the benefits of using the Ehcache framework, other than that the caching can be done via XML configuration and annotations, without writing the boilerplate code (i.e. a static HashMap)?
It depends on what you need from your caching mechanism. Ehcache provides a lot of useful features which would require a lot of well-designed code to implement manually:
LRU, LFU and FIFO cache eviction policies
Flexible configuration
Persistence
Replication
many more ...
I would recommend you go through them at http://ehcache.org/about/features and decide whether you really need any of them in your project.
The most important one:
The ability to overflow to disk - this is something you don't have in a normal HashMap, and writing something like that is far from trivial. EhCache can function as a simple-to-configure key-value database.
Even if you don't use overflow to disk, there's a lot of boilerplate to write in your own implementation. If loading the whole dataset into memory were possible, then an in-memory store with persistence on write and restore on startup would be the solution. But memory is limited, so you have to remove elements from it. Which ones, and based on what? You also have to make sure cache elements are not too old; older elements should be replaced. If you need to remove elements from the cache, you should start with the outdated ones. But should you do that when a user requests something? It would slow down the request. Or should you start your own thread?
With EhCache you have the library in which all those issues are addressed and tested.
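For comparison with the static-HashMap approach, here is a hedged sketch of how a few of those features are switched on with the Ehcache 2.x programmatic configuration (the cache name, sizes, and TTL are invented):

```java
import net.sf.ehcache.Cache;
import net.sf.ehcache.CacheManager;
import net.sf.ehcache.Element;
import net.sf.ehcache.config.CacheConfiguration;
import net.sf.ehcache.config.PersistenceConfiguration;
import net.sf.ehcache.config.PersistenceConfiguration.Strategy;
import net.sf.ehcache.store.MemoryStoreEvictionPolicy;

public class ConfiguredCacheSketch {
    public static void main(String[] args) {
        CacheConfiguration config = new CacheConfiguration("users", 10_000)       // at most 10k entries on heap
                .memoryStoreEvictionPolicy(MemoryStoreEvictionPolicy.LRU)         // eviction policy: no hand-rolled LRU code
                .timeToLiveSeconds(600)                                            // entries expire after 10 minutes
                .persistence(new PersistenceConfiguration().strategy(Strategy.LOCALTEMPSWAP)); // overflow to disk

        CacheManager manager = CacheManager.getInstance();
        Cache users = new Cache(config);
        manager.addCache(users);

        users.put(new Element("user:42", "some cached value"));
        Element cached = users.get("user:42");
        System.out.println(cached.getObjectValue());

        manager.shutdown();
    }
}
```

Everything in that configuration replaces code you would otherwise have to write and test yourself around a static HashMap.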
Also, there is a clustered, closed-source version of Ehcache which allows you to have a distributed cache. That might be another reason to consider using Ehcache.