Scalable and fast restartable cache - caching

I am looking for a scalable, fast-restartable caching solution: one that can boot up quickly (e.g. by storing the cache on disk at application shutdown, or while running, and restoring it when the application comes back up) and can hold a few GB of cache. (Our cache is not a distributed cache.)
Enterprise Ehcache/BigMemory Go (http://ww1.terracotta.org/documentation/3.7.4/bigmemorygo/configuration/fast-restart) supports this feature, but it requires a license.
Any other suggestions are also welcome. I don't want to use GemFire or Geode, as they need separate infrastructure to maintain.
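To make the requirement concrete, here is a minimal sketch of the "save on shutdown, restore on startup" idea. It is not how BigMemory Go implements fast restart; the class name `RestartableCache` and the JSON snapshot format are assumptions made for illustration, and a single JSON file would likely be too slow for a cache of several GB.

```python
import json
import os
import tempfile


class RestartableCache:
    """In-memory dict cache that can be snapshotted to disk and restored.

    Hypothetical illustration only: keys must be strings and values must be
    JSON-serializable.
    """

    def __init__(self, snapshot_path):
        self.snapshot_path = snapshot_path
        self.data = {}

    def put(self, key, value):
        self.data[key] = value

    def get(self, key, default=None):
        return self.data.get(key, default)

    def save(self):
        # Write to a temp file, then rename over the old snapshot, so a crash
        # mid-write never leaves a half-written snapshot behind.
        directory = os.path.dirname(os.path.abspath(self.snapshot_path))
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w") as f:
            json.dump(self.data, f)
        os.replace(tmp_path, self.snapshot_path)

    def restore(self):
        # Returns True if a snapshot was found and loaded, False otherwise.
        if not os.path.exists(self.snapshot_path):
            return False
        with open(self.snapshot_path) as f:
            self.data = json.load(f)
        return True
```

Calling `save()` from a shutdown hook (or periodically) and `restore()` at startup gives the warm-start behaviour described above, at the cost of losing any writes made after the last snapshot.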

Related

Scalability of Mobicents Presence Server

I understand that Mobicents PS is no longer supported, but I want to understand the scalability of MSPS.
From the source code, I understand that MSPS uses JBoss Cache instead of the database to store presence information. I understand the concept of a cache but know nothing about JBoss Cache.
It seems that storage is limited by the amount of memory available on the machine, and that whenever a new node (physical machine) is added, the cache has to be replicated to that machine.
Is this the correct behavior, or is my understanding totally wrong?
The database is used; JBoss Cache is intended for replicating some of the volatile data to support failover.
What you say about cache replication is correct, but the memory limit concerns can be mitigated by using buddy replication instead of full cluster replication.
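As a rough back-of-the-envelope illustration of why buddy replication eases the memory limit: the function below is a hypothetical helper, not a JBoss Cache API, and it assumes the data is spread evenly and that each node backs up the data of `buddies` other nodes.

```python
def memory_per_node_gb(total_data_gb, nodes, buddies=None):
    """Approximate amount of cached data each node has to hold.

    buddies=None models full cluster replication: every node holds everything.
    Otherwise each node holds its own share plus one backup share per buddy
    it serves (assumes an even spread of data, which is a simplification).
    """
    if buddies is None:
        return total_data_gb
    own_share = total_data_gb / nodes
    return own_share * (1 + buddies)


# Illustrative figures only: 12 GB of presence data on a 4-node cluster.
full = memory_per_node_gb(12, 4)            # every node holds all 12 GB
buddy = memory_per_node_gb(12, 4, buddies=1)  # own 3 GB plus one 3 GB backup
```

Under these assumptions, adding nodes leaves the per-node footprint unchanged with full replication but shrinks it with buddy replication.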
Nowadays you would be better off moving to Cassandra and using an in-memory data grid such as Infinispan or Hazelcast.
Traditional presence has moved on from sharing every status with every contact. It is worth mentioning, for example, the GitHub issue about the Presence API, which is currently in development (https://github.com/Mobicents/RestComm/issues/380).
Would you like to contribute to either Presence Server or RestComm Presence in general?

Cache Cluster deployment topology

I'm going to deploy an in-memory cache cluster (currently thinking Redis) for some public-facing web workloads and was wondering where the cluster should live (deployment topology). I see two options:
Sit it on the Web tier (which is horizontally scalable)
Create a dedicated cache cluster behind the Web tier and in front of the DB tier.
Background: the applications on the Web and DB tiers run on Windows, so if I put the cluster on the Web tier it needs to be supported on Windows (MSFT have a stable Redis port). If I go with a dedicated cache tier, I was thinking of some lightweight Linux servers (HA cluster), so that as the Web tier scales horizontally it uses this cache cluster for its lookups, e.g. reference data.
Pros, cons, thoughts, other options I'm missing?
*Note: I don't have the luxury of using a cloud service provider's "cache as a service"; that is not an option, unfortunately...
Cheers,
Surprised at the lack of community support around Redis and caching in general.
To answer my own question: I ended up going with a Linux (RHEL) master/slave Redis cache tier. I opted for a master/slave deployment topology, which gives me HA at the cache tier (as opposed to a Redis cache cluster). The master takes writes; master and slave both serve reads. This suits my needs, as I go to the DB on a cache miss, and I configured Redis to never persist to disk (in-memory only).
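For anyone wondering what "go to the DB on a cache miss" looks like in code, here is a minimal cache-aside sketch. It uses plain dicts as stand-ins for the Redis master and slave so that it runs without a Redis server; the class name `CacheAside` and the `load_from_db` callback are assumptions for illustration, not part of any Redis client.

```python
class CacheAside:
    """Read from a replica, fall back to the DB on a miss, write to the primary."""

    def __init__(self, primary, replica, load_from_db):
        self.primary = primary            # stand-in for the master (takes writes)
        self.replica = replica            # stand-in for a read replica
        self.load_from_db = load_from_db  # hypothetical DB lookup callback
        self.db_hits = 0                  # counts how often we fell through to the DB

    def get(self, key):
        value = self.replica.get(key)
        if value is not None:
            return value
        # Cache miss: load from the database and populate the cache.
        self.db_hits += 1
        value = self.load_from_db(key)
        if value is not None:
            self.primary[key] = value
        return value


# Passing the same dict as primary and replica simulates instant replication.
store = {}
reference_data = {"country:SE": "Sweden"}
cache = CacheAside(primary=store, replica=store, load_from_db=reference_data.get)
first = cache.get("country:SE")   # miss, loaded from the "DB"
second = cache.get("country:SE")  # served from the cache
```

Because nothing is persisted, a restart of the cache tier simply means a burst of misses until the working set is reloaded from the DB.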

Infinispan : Which client API to choose?

If we implement the caching server using Infinispan, which client APIs can we choose from? Is the Java Hot Rod client a good choice? Any other solutions?
Thank you!
As usual, it depends on your needs.
When you use HotRod you use Infinispan in a fashion similar to using MySQL/Sybase: you have an application that connects to a database backend, which means
dedicated servers need to be set up and maintained
need to have multiple (dedicated) boxes to have high-availability and resiliency
but
HotRod client does some load-balancing for you
you can have dedicated data store servers with very specific configuration/separation/etc.
this mode is useful when Infinispan is used as a distributed store with database persistence
You might also use Infinispan in embedded mode, where your data is shared between your applications, each containing an Infinispan instance; this mode is like having a HashMap that is synchronized across the network with other boxes:
this gives you HA/resiliency by default (if your application is deployed with 2+ instances)
no need to have separate servers (no separate maintenance)
every new instance of your app will also contribute to the Infinispan cluster increasing HA/resiliency
(for testing you'll probably use Infinispan in embedded mode, anyway)
If your applications run on the same network segment (no firewalls/switches/etc. in between), it might be easier to just use Infinispan embedded mode, as it is easy to set up and has lots of examples.
My recommendation would be to have a cache layer in your code that separates cache operations from the implementation, so you can use whatever cache provider you want.
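A minimal sketch of such a cache layer is shown here in Python for brevity; the same shape applies to a Java interface. The names `Cache`, `InMemoryCache` and `ProductService` are hypothetical and not part of Infinispan. A HotRod-backed or embedded implementation would be another subclass of `Cache`.

```python
from abc import ABC, abstractmethod


class Cache(ABC):
    """Provider-neutral cache interface the application codes against."""

    @abstractmethod
    def get(self, key): ...

    @abstractmethod
    def put(self, key, value): ...

    @abstractmethod
    def remove(self, key): ...


class InMemoryCache(Cache):
    """Trivial dict-backed implementation, handy for tests."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def put(self, key, value):
        self._data[key] = value

    def remove(self, key):
        self._data.pop(key, None)


class ProductService:
    """Application code depends only on the Cache interface."""

    def __init__(self, cache: Cache):
        self.cache = cache

    def product_name(self, product_id):
        name = self.cache.get(product_id)
        if name is None:
            name = f"product-{product_id}"  # stand-in for a real lookup
            self.cache.put(product_id, name)
        return name
```

Swapping providers then means writing one new `Cache` subclass and changing the object passed to `ProductService`, with no change to the service itself.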
For Infinispan you should read the Infinispan User's Guide, as #Galder pointed out.
The server modules documentation clarifies this.

Replicated cache system on each HTTP server

I recently set up memcached for a high-traffic PHP site. Before that we used APC, but it cannot act as a single shared cache (invalidating a key on one server doesn't invalidate it on the others).
I noticed a big difference depending on whether memcached is on the same machine as the HTTP server or on a separate server.
http + memcached on the same server -> 0.06 average time spent to deliver a page
http and memcached on different servers (but under NAT) -> 0.15 - 0.20 to deliver a page
So it's a huge difference, and I am wondering whether it would be better to have the cache on the same machine as the HTTP server. The added complexity is that the website is served by a couple of HTTP servers (through a load balancer). So I actually need a cache system with replication, where each HTTP server has a cache "copy" and writes changes only to the "master" (or some other approach that does something similar).
There are a couple of such systems (Couchbase, Redis, and so on). I think Couchbase is not a good fit, as it won't allow connecting to a local cache server but only to the "gate". Redis may work; I am still checking the others.
The main thing is: has anyone tried this approach to speed up a website, i.e. having a cache "copy" on each machine (kept in sync with the others)?
You can use the GigaSpaces XAP solution, which is a distributed in-memory data grid that also integrates with Jetty, allowing you to deploy your web app and manage it from a single management system. The central distributed data grid (which can be used as a simple cache) can have a local cache on each web container that is kept in sync with the main cache. You don't have to use the Jetty integration for this; you can still use your own web container and just create a proxy to the distributed cache, with an embedded local cache, via code. Alternatively, you can have a fully replicated topology between the web containers without a main distributed cache, where each web container holds a full copy of the entire cache, kept in sync with the other web container instances.
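The "local cache in front of a central cache" idea can be sketched roughly as follows. This is a generic illustration in plain Python, not the XAP API; `CentralCache` and `NearCache` are hypothetical names, and a real system would push the invalidations over the network rather than through in-process callbacks.

```python
class CentralCache:
    """Stand-in for the main cache; notifies subscribers when a key changes."""

    def __init__(self):
        self._data = {}
        self._listeners = []

    def subscribe(self, listener):
        self._listeners.append(listener)

    def get(self, key):
        return self._data.get(key)

    def put(self, key, value):
        self._data[key] = value
        for listener in self._listeners:
            listener(key)  # tell every local cache that this key is stale


class NearCache:
    """Local cache on each web server, kept in sync by invalidation."""

    def __init__(self, central):
        self.central = central
        self.local = {}
        central.subscribe(self._invalidate)

    def _invalidate(self, key):
        self.local.pop(key, None)

    def get(self, key):
        if key in self.local:
            return self.local[key]       # fast path, no network hop
        value = self.central.get(key)    # slow path, remote lookup
        if value is not None:
            self.local[key] = value
        return value

    def put(self, key, value):
        # Writes always go to the central cache, which fans out invalidations.
        self.central.put(key, value)
```

Reads that hit the local copy avoid the network round trip that caused the slowdown measured in the question, while writes still go through one place.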
You can read more in:
http://wiki.gigaspaces.com/wiki/display/SBP/Web+Service+PU
http://wiki.gigaspaces.com/wiki/display/XAP9/Web+Jetty+Processing+Unit+Container
http://wiki.gigaspaces.com/wiki/display/XAP9/Client+Side+Caching
Disclaimer: I am a developer working for GigaSpaces.

Caching with multiple servers

I'm building an application that involves multiple servers (4 servers, each with a database and a web server: 1 master database and 3 slaves, plus one load balancer).
There are several approaches to enabling caching. Right now it's fairly simple and not efficient at all.
All the caching is done on an NFS partition shared between all servers. NFS is the bottleneck in the architecture.
I have several ideas for implementing caching:
It can be done at the server level (local file system), but the problem is invalidating a cache file on all servers when the content has been updated. That can be handled with a short cache lifetime (not efficient, because most of the time the cache will be refreshed sooner than it needs to be).
It can also be done with a messaging system (XMPP, for example) where the servers communicate with each other. The server responsible for invalidating the cache sends a request to all the others to let them know that the cache has been invalidated. Latency is probably higher (it takes more time for everybody to know that the cache has been invalidated), but my application doesn't require atomic cache invalidation.
A third approach is to use a cloud system to store the cache (like CouchDB), but I have no idea how this one performs. Is it faster than using a SQL database?
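To illustrate the first idea (a short cache lifetime on each server), here is a minimal sketch in Python rather than PHP. The class `TTLCache` and its injectable `clock` parameter are assumptions made so the expiry logic can be shown without real waiting.

```python
import time


class TTLCache:
    """Local cache whose entries expire after a fixed lifetime."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock    # injectable so expiry can be simulated
        self._entries = {}    # key -> (expires_at, value)

    def put(self, key, value):
        self._entries[key] = (self.clock() + self.ttl, value)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            # Expired: drop it so the caller regenerates the content.
            del self._entries[key]
            return None
        return value
```

The trade-off described above is visible here: every server may serve stale content for up to `ttl_seconds` after an update, and unchanged content is still regenerated every time its entry expires.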
I plan to use Zend Framework, but I don't think that's really relevant (except that packages for dealing with XMPP or CouchDB probably exist in other frameworks).
Requirements: persistent cache (if a server restarts, the cache shouldn't be lost, to avoid bringing down the server while the cache is re-created).
http://www.danga.com/memcached/
Memcached covers most of the requirements you lay out - message-based read, commit and invalidation. High availability and high speed, but very little atomic reliability (sacrificed for performance).
(Also, memcached powers things like YouTube, Wikipedia and Facebook, so I think it is fairly well established that organizations with the time, money and talent to seriously evaluate many distributed caching options settle on memcached!)
Edit (in response to comment)
The idea of a cache is for it to be relatively transitory compared to your backing store. If you need to persist the cache data long-term, I recommend looking at either (a) denormalizing your data tier to get more performance, or (b) adding a middle-tier database server that stores high-volume data in straight key-value-pair tables, or something closely approximating that.
In defence of memcached as a cache store: if you want high performance with low impact from a server reboot, why not just have 4 memcached servers? Or 8? Each 'reboot' would have correspondingly less effect on the database server.
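A quick way to see why more servers soften a reboot: with keys spread across N servers, roughly 1/N of them live on any one server. The sketch below uses simple modulo hashing as an assumption; real memcached clients often use consistent hashing instead, and the server names and helper functions are made up for illustration.

```python
import hashlib


def server_for(key, servers):
    """Pick a server for a key using a stable hash (simplified modulo scheme)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return servers[int(digest, 16) % len(servers)]


def fraction_on(server, keys, servers):
    """Fraction of keys that would be lost if `server` rebooted."""
    hits = sum(1 for k in keys if server_for(k, servers) == server)
    return hits / len(keys)


keys = [f"page:{i}" for i in range(10000)]
four = ["mc1", "mc2", "mc3", "mc4"]
eight = four + ["mc5", "mc6", "mc7", "mc8"]

lost_with_four = fraction_on("mc1", keys, four)    # roughly a quarter of the keys
lost_with_eight = fraction_on("mc1", keys, eight)  # roughly an eighth of the keys
```

Only the keys on the rebooted server have to be rebuilt from the database, so doubling the number of servers roughly halves the extra load after a reboot.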
I think I found a relatively good solution.
I use Zend_Cache to store each cache file locally.
I've created a small daemon based on nanoserver which also manages cache files locally.
When one server creates/modifies/deletes a cache file locally, it sends the same action through the daemon to all the other servers, which perform the same action.
That means I have local cache files and remote actions at the same time.
It's probably not perfect, but it should work for now.
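The shape of that solution, sketched in Python with in-process objects instead of the actual daemon and Zend_Cache: `CacheNode` and its methods are hypothetical names, and direct method calls stand in for the network messages.

```python
class CacheNode:
    """One web server with a local cache that mirrors actions to its peers."""

    def __init__(self, name):
        self.name = name
        self.files = {}   # stand-in for the local cache files
        self.peers = []

    def connect(self, *peers):
        self.peers.extend(peers)

    def write(self, key, content):
        self._apply("write", key, content)
        self._broadcast("write", key, content)

    def delete(self, key):
        self._apply("delete", key, None)
        self._broadcast("delete", key, None)

    def _broadcast(self, action, key, content):
        # Peers only apply the action; they do not re-broadcast it,
        # which avoids messages looping between servers.
        for peer in self.peers:
            peer._apply(action, key, content)

    def _apply(self, action, key, content):
        if action == "write":
            self.files[key] = content
        elif action == "delete":
            self.files.pop(key, None)
```

Every read stays local and fast, and a server that was down while an action was broadcast would miss it, which is one reason the approach is "probably not perfect".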
CouchDB was too slow and NFS is not reliable enough.

Resources