Infinispan: Which client API to choose?

If we implement the caching server using Infinispan, what client APIs can we choose from? Is the Java Hot Rod client a good choice? Are there any other solutions?
Thank you!

As usual, it depends on your needs.
When you use Hot Rod, you use Infinispan in a fashion similar to MySQL/Sybase: your application connects to a separate data store backend, which means:
dedicated servers need to be set up and maintained
you need multiple (dedicated) boxes for high availability and resiliency
but
HotRod client does some load-balancing for you
you can have dedicated data store servers with very specific configuration/separation/etc.
this mode is useful when Infinispan is used as a distributed store with database persistence
You might also use Infinispan in embedded mode, where your data is shared between your application instances, each of which contains an Infinispan instance. This mode is like having a HashMap that is synchronized across the network with other boxes:
this gives you HA/resiliency by default (if your application is deployed with 2+ instances)
no need to have separate servers (no separate maintenance)
every new instance of your app will also contribute to the Infinispan cluster increasing HA/resiliency
(for testing you'll probably use Infinispan in embedded mode, anyway)
If your applications run on the same network segment (no firewall/switches/etc.), it might be easier just to use Infinispan embedded mode, as it's easy to set up and there are a lot of examples.
My recommendation would be to have a cache layer in your code that separates the cache operations from the implementation, so you can use whatever cache provider you want.
For Infinispan you should read the Infinispan User's Guide, as @Galder pointed out.
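The recommendation above, keeping cache operations behind your own layer, can be sketched roughly as follows. This is only an illustration: `AppCache`, `InMemoryAppCache`, and `UserNameService` are hypothetical names, and the in-memory map is a stand-in for where an Infinispan-backed (embedded or Hot Rod) implementation would go.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical application-facing cache contract. It is not part of
// Infinispan or any other library; the application defines it itself.
interface AppCache<K, V> {
    Optional<V> get(K key);
    void put(K key, V value);
    void remove(K key);
}

// Stand-in implementation backed by a local map. A provider-specific
// implementation would implement the same interface.
final class InMemoryAppCache<K, V> implements AppCache<K, V> {
    private final Map<K, V> store = new ConcurrentHashMap<>();

    @Override
    public Optional<V> get(K key) {
        return Optional.ofNullable(store.get(key));
    }

    @Override
    public void put(K key, V value) {
        store.put(key, value);
    }

    @Override
    public void remove(K key) {
        store.remove(key);
    }
}

// Application code depends only on AppCache, never on the provider.
final class UserNameService {
    private final AppCache<Integer, String> cache;
    private int backendCalls = 0;

    UserNameService(AppCache<Integer, String> cache) {
        this.cache = cache;
    }

    String nameFor(int id) {
        // Cache-aside: look in the cache first, load and store on a miss.
        return cache.get(id).orElseGet(() -> {
            String loaded = loadFromBackend(id);
            cache.put(id, loaded);
            return loaded;
        });
    }

    // Placeholder for a real database or remote lookup.
    private String loadFromBackend(int id) {
        backendCalls++;
        return "user-" + id;
    }

    int backendCalls() {
        return backendCalls;
    }
}
```

Switching providers then means writing another `AppCache` implementation and changing the one place where it is constructed.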

The server modules documentation clarifies this.

Related

Setting Up Ncache (Distributed?/Shared)

I have two servers on which I will deploy the same application. These two servers handle work from a common Web API; the work that is handed out is transformed, goes through some logic, and is loaded into the DB. I want to cache the data that gets loaded, updated, or deleted in the database, so that when the same data is referenced again I can get it from the cache. I am currently using NCache and it works perfectly fine within one application. I am trying to set up a shared cache that both application instances can access. How do I go about doing it?
NCache is a distributed cache so you can continue to use that.
There is good general documentation available and very good getting started material that walks you through all the steps required.
In essence you install NCache on both the servers and then reference both servers in your client configuration (%NCHOME%\config\client.ncconf)
In cluster caches, a single logical cache instance is distributed over multiple server nodes and because the cache process is running outside the application address space, multiple applications can share and see the same exact cache data change in terms of addition, removal and update of the cache content.
Local out-proc caches are limited to one server node but as they are outside the application address space, they also support sharing of data between applications.
In fact, besides allowing multiple applications to share data, NCache supports a pub/sub infrastructure to allow for multiple applications to actually communicate with each other. This allows NCache to play a key part in setting up a fast and reliable microservices environment wherein all the participating services send messages to each other through the NCache platform.
See the links below for information about NCache topologies and getting started:
http://www.alachisoft.com/resources/docs/ncache/admin-guide/cache-topologies.html
http://www.alachisoft.com/resources/videos/five-steps-getting-started.html

Redis or Ehcache?

Which is better suited for the following environment:
Persistence is not required.
Multiple servers (with Ehcache some cache sync must be required).
Infrequent writes and frequent reads.
Relatively small database (very low memory requirement).
I will pour out what's in my head currently. I may be wrong about these.
I know Redis requires a separate server (?) and Ehcache provides local cache so it must be faster but will replicate cache across servers (?). Updating all caches after some update on one is possible with Ehcache.
My question is which will suit better for the environment I mentioned?
Whose performance will be better or what are scenarios when one may outperform another?
Thanks in advance.
You can think of Redis as a shared data structure server, while Ehcache is a memory block storing serialized data objects. This is the main difference.
Redis as a shared data structure server means you can put a predefined data structure (such as String, List, Set, etc.) in from one language and retrieve it in another. This is useful if your project is multilingual, for example Java on the backend and PHP on the frontend; you can use Redis as a shared cache. But it can only store its predefined data structures: you cannot store arbitrary Java objects directly without serializing them first.
If your project is only Java, i.e. not multilingual, Ehcache is a convenient solution.
You will run into issues scaling Ehcache, and you will need resources to manage it during failover and similar events.
Redis benefits over EhCache:
It uses a time-proven gossip protocol for node discovery and synchronization.
Availability of fully managed services like AWS ElastiCache and Azure Redis Cache. Such services offer full automation, support, and management of Redis, so developers can focus on their applications and not on maintaining their databases.
Correct handling of large amounts of memory (Redis can manage hundreds of gigabytes of RAM on a single machine). It doesn't have the garbage collection problems that Java has.
And finally, the existence of a Java-developer-friendly Redis client: Redisson.
Redisson provides many Java friendly objects on top of Redis, like:
Set
ConcurrentMap
List
Queue
Deque
BlockingQueue
BlockingDeque
ReadWriteLock
Semaphore
Lock
AtomicLong
CountDownLatch
Publish / Subscribe
ExecutorService
and many more...
Redisson supports a local cache for the Map structure, which could give you a 45x performance boost for read operations.
Here is the article describing detailed feature comparison of Ehcache and Redis.

scalability of mobicents presence server

I understand that Mobicents PS is not supported now but I want to understand about the scalability of MSPS.
I understand from the source code that MSPS uses JBoss Cache instead of the database to store presence information. I understand the concept of cache but no idea of JBoss cache.
It seems that the storage is limited by the amount of memory available in the machine, and whenever a new node (physical machine) is added the cache has to be replicated to that machine.
Is this the correct behavior, or is my understanding totally wrong?
The database is used and JBoss Cache is aimed to be used for replication of some of the volatile data to support failover.
Your point about cache replication is correct, but the memory limit concerns can be mitigated by using buddy replication instead of full cluster replication.
Nowadays it would be better to move to Cassandra and use an in-memory data grid such as Infinispan or Hazelcast.
Traditional presence has moved on from sharing every status of every contact. It's worth mentioning, for example, the GitHub issue about the Presence API, which is currently in development (https://github.com/Mobicents/RestComm/issues/380).
Would you like to contribute either to Presence Server or RestComm Presence in general?

What are the best practices to implement a caching layer?

I'm going to use Redis as a cache service.
What are the best practices for accessing the caching service?
Through a service/API or in-memory component?
I'm not sure I want to have access to the DB from all the services.
Thanks
The answers to all of your questions depend on the topology and architecture of your system. I don't think you would run a separate service on a separate machine if your application resided entirely on one computer.
But suppose you have a distributed app.
In this case it makes sense to do caching in a separate service on a separate node. As in OOP, you simply encapsulate the data behind the cache. Other services depend on your caching service, not directly on Redis, so you can later decide to swap Redis for something else. Another advantage of a caching service is that it can keep data in its own memory, depending on throughput, and fetch data from Redis only from time to time. Note that you can simply buy a server with a lot of RAM, e.g. 192 GB, because a caching service needs memory more than anything else.
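The idea of a caching service that keeps data in its own memory and only goes back to Redis from time to time could look roughly like this. It is a minimal sketch under stated assumptions: `LocalTierCache` is a hypothetical name, the Redis lookup is represented by a plain `Function` (no real Redis client is called), and entries are simply refetched once a fixed time-to-live has elapsed.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

// Hypothetical in-memory tier placed in front of a slower remote cache.
final class LocalTierCache<K, V> {

    // A cached value together with the time it was fetched.
    private static final class Entry<T> {
        final T value;
        final long loadedAtMillis;

        Entry(T value, long loadedAtMillis) {
            this.value = value;
            this.loadedAtMillis = loadedAtMillis;
        }
    }

    private final Map<K, Entry<V>> local = new ConcurrentHashMap<>();
    private final Function<K, V> remoteFetch; // assumption: wraps a Redis GET
    private final long ttlMillis;
    private final LongSupplier clock;         // injectable so tests can control time

    LocalTierCache(Function<K, V> remoteFetch, long ttlMillis, LongSupplier clock) {
        this.remoteFetch = remoteFetch;
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    V get(K key) {
        long now = clock.getAsLong();
        Entry<V> entry = local.get(key);
        // Go to the remote cache only on a miss or when the local copy is too old.
        // Note: concurrent callers may both refetch; that is acceptable for a sketch.
        if (entry == null || now - entry.loadedAtMillis >= ttlMillis) {
            entry = new Entry<>(remoteFetch.apply(key), now);
            local.put(key, entry);
        }
        return entry.value;
    }
}
```

In a real service the clock would typically be `System::currentTimeMillis`, and the time-to-live would be tuned to how stale the callers can tolerate the data being.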

hazelcast vs ehcache

The question is clear from the title: I would appreciate hearing your thoughts on the advantages, disadvantages, and differences between them.
UPDATE:
I have decided to use Hazelcast because of the advantages like distributed caching/locking mechanism as well as the extremely easy configuration while adapting it to your application.
We tried both of them for one of the largest online classifieds and e-commerce platforms. We started with Ehcache/Terracotta (server array) because it's well known, backed by Terracotta, and has bigger community support than Hazelcast. When we got it into the production environment (distributed, beyond a one-node cluster) things changed: our backend architecture became really expensive, so we decided to give Hazelcast a chance.
Hazelcast is dead simple, it does what it says and performs really well without any configuration overhead.
Our caching layer is on top of hazelcast for more than a year, we are quite pleased with it.
Even though Ehcache has been popular among Java systems, I find it less flexible than other caching solutions. I played around with Hazelcast and yes, it did the job; it was easy to get running and it is newer than Ehcache. I can say that Ehcache has many more features than Hazelcast, is more mature, and has big support behind it.
There are several other good cache solutions as well, each with different properties, such as good old Memcache, Membase (now CouchBase), Redis, AppFabric, and even several NoSQL solutions that provide key-value stores with or without persistence. They differ in how they trade off the CAP properties or follow BASE semantics, and in their support for transactions.
You should care more about which one has the functionality you want in your application; again, you should consider which CAP or BASE trade-offs suit your application.
This test was done very recently with Cassandra on the cloud by Netflix. They reached a million writes per second with about 300 instances. Cassandra is not a memory cache, but its data model is like a cache in that it consists of key-value pairs. You can use Cassandra as a distributed memory cache as well.
Hazelcast has been a nightmare to scale and stability is still a major issue.
The choices for a dedicated client-to-grid component are:
The messy version that can't survive node loss anywhere, negating the point of backups (superclient), or
An incredibly slow native client option that does not allow for any type of load balancing to processing nodes in the grid.
If any host could request records from this data grid it would be a sweet design, but you are stuck with those two lackluster options to get anything out of it.
Also, we see multiple issues with database thread pools locking up on individual members and not writing anything to the databases. The resulting permanent record loss is a frequent issue, and we often have to take the whole thing down for hours to refresh any of the JVMs. Split brain is also still an issue, although in 1.9.6 it seems to have calmed down a little.
Rallying to move to Ehcache and improving the database layer instead of using this as a band-aid.
Hazelcast serializes everything whenever there is a node (a standard one), so any data you save to Hazelcast must be serializable.
http://open.bekk.no/efficient-java-serialization/
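To illustrate what the serialization requirement above means for your own classes, here is a small sketch that uses only `java.io` and makes no Hazelcast calls. The `Listing` class and the `SerializationRoundTrip` helper are hypothetical; the round trip merely mimics what any cache that ships objects between nodes has to do with a stored value.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical value class. Implementing Serializable is what allows it
// to be turned into bytes and sent to another node.
final class Listing implements Serializable {
    private static final long serialVersionUID = 1L;

    final String title;
    final int priceCents;

    Listing(String title, int priceCents) {
        this.title = title;
        this.priceCents = priceCents;
    }
}

// Helper that serializes a value to bytes and reads it back.
final class SerializationRoundTrip {
    private SerializationRoundTrip() {
    }

    static byte[] toBytes(Serializable value) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    static Object fromBytes(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The object that comes back is a copy, not the original instance, which is also why every read of a large value from a remote node pays the full serialization cost.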
Hazelcast has been a nightmare for me. I was able to get it "working" in a clustered Websphere environment. I use the term "working" loosely. First, all of Hazelcast's documentation is out of date and only shows examples using deprecated method calls. Trying to use the new code without comments in the Javadocs and no examples in the documentation is very hard. Also, the J2EE container code simply does not work at this point because it does not support XA transactions in Websphere. An error is thrown when calling code that follows their only J2EE example explicitly (it does look like Milestone 3.0 is addressing this). I had to forget about joining Hazelcast to a J2EE transaction. It does seem Hazelcast is definitely geared to a non-EJB/non-J2EE container environment.
Making calls to Hazelcast.getAllInstances() fails to retain any information about Hazelcast's state when switching from one enterprise Java bean to another. That forces me to create a new Hazelcast instance just to run calls that give me access to my data, which causes many Hazelcast instances to start up on the same JVM.
Also, retrieving data from Hazelcast is not fast. I tried retrieving data using both the Native Client and directly as a member of the cluster. I stored 51 lists, each containing only 625 objects, in Hazelcast. I could not perform a query directly on a list and did not want to store a map just to get access to that feature (SQL operations can be performed on a map). It took about half a second to retrieve each list of 625 objects because Hazelcast serializes the entire list and sends it over the wire rather than just giving me the delta (what has changed).
Another thing: I had to switch to a TCP/IP configuration and explicitly list the IP addresses of the servers I wanted to be in the cluster. The default multicast configuration did not work, and from the group discussions on Google, other people are experiencing that difficulty as well.
To sum up: I did eventually get 8 machines communicating in a cluster through many hours of torturous programmatic configuration and trial and error (the documentation will be of little help), but when I did, I still had no control over the number of instances and partitions being created on each JVM due to the half-finished nature of Hazelcast for EJB/J2EE, and it was VERY SLOW. I implemented a real use case in the unemployment insurance application I work on, and the code was much faster making direct calls to the database. It would have been cool if Hazelcast worked as advertised, because I really did not want to use a separate service to implement what I am trying to do. I have used MongoDB extensively, so I may skip the whole in-memory cache and just serialize my objects as documents in a separate repository.
One advantage of Ehcache is that it is backed by a company (Terracotta) that does extensive performance, failover, and platform testing in a large performance lab. Terracotta provides support, indemnity, etc. For many companies, that sort of thing is important.
I have not used Hazelcast but I've heard that it is easy to use and that it works. I haven't heard anything with respect to scalability or performance of Hazelcast vs Terracotta/Ehcache but given the amount of scalability and failover testing that Terracotta does, it's hard for me to imagine that Hazelcast would be competitive in a production deployment. But I presume it would work fine for smaller uses.
[Bias: I'm a former employee of Terracotta.]
Developers describe Ehcache as "Java's Most Widely-Used Cache". Ehcache is an open-source, standards-based cache for boosting performance, offloading your database, and simplifying scalability. It's the most widely used Java-based cache because it's robust, proven, and full-featured. Ehcache scales from in-process, with one or more nodes, all the way to mixed in-process/out-of-process configurations with terabyte-sized caches. On the other hand, Hazelcast is described as a "Clustering and highly scalable data distribution platform for Java". With its various distributed data structures, distributed caching capabilities, elastic nature, memcache support, integration with Spring and Hibernate and, more importantly, so many happy users, Hazelcast is a feature-rich, enterprise-ready, and developer-friendly in-memory data grid solution.
Ehcache and Hazelcast are primarily classified as "Cache" and "In-Memory Databases" tools respectively.
