Replicated cache system to each http servers - caching

I have setup recently memcached for a PHP site with lot of traffic. Before we used APC but this lacks the possibility to have a unique cache system (invalidating one key on one server doesn't invalidate through the others).
I noticed a big difference when comes to memcached being on same machine as http server or on separated server.
http+memcached on same server -> 0.06 average time spent to deliver a page
http and memcache on diff servers (but under NAT) -> 0.15 - 0.20 to the deliver a page
So it's a huge difference and I am wondering if won't be better to have the cache system on same machine as http. The additional complexity is the fact the website is served by couple http servers (through a load balancer). So I actually need a cache system with replication, each http server having a cache "copy" and writing the changes only to the "master" (or other approach doing similar things).
There are couple of such systems (couchbase, redis, aso). I think couchbase is not good for this as won't allow connecting to local cache server but rather to the "gate". Redis may work, I am still checking on others.
The main this is: has someone tried this approach to speed up the website? By having on each machine a cache "copy" (kept in synch with the others)?

You can use GigaSpaces XAP solution which is a distributed in memory data grid, but also has an integration with jetty allowing you to deploy your web app and manage it from a single management system. The central distributed data grid (which can be a used as simple cache) can have a local cache on each web container which is kept in sync with the main cache, you don't have to use the jetty integration for it, you can still use your own web container and just create a proxy to the distributed cache with an embedded local cache via code. Or you can also have a fully replicated topology between the web containers without having a main distributed cache and each web container will contain a full copy of the entire cache which will be in sync with the other instances of the web container.
You can read more in:
http://wiki.gigaspaces.com/wiki/display/SBP/Web+Service+PU
http://wiki.gigaspaces.com/wiki/display/XAP9/Web+Jetty+Processing+Unit+Container
http://wiki.gigaspaces.com/wiki/display/XAP9/Client+Side+Caching
Disclaimer: I am a developer working for GigaSpaces.

Related

Setting Up Ncache (Distributed?/Shared)

I have two servers, where I will be deploying the same application. Basically these two servers will handle work from a common Web API, the work that handed out will be transformed and go through some logic and loaded into DB. I want to cache the data the get loaded/update or deleted in the database, so that when the same data is referenced i can get it from the Cache (Kind of explained the cache mechanism). Now I am using Ncache and it working perfectly fine within one application. I am trying have kind of a shared cache, so that both my application can have access to. How do i go about doing it?
NCache is a distributed cache so you can continue to use that.
There is good general documentation available and very good getting started material that walks you through all the steps required.
In essence you install NCache on both the servers and then reference both servers in your client configuration (%NCHOME%\config\client.ncconf)
In cluster caches, a single logical cache instance is distributed over multiple server nodes and because the cache process is running outside the application address space, multiple applications can share and see the same exact cache data change in terms of addition, removal and update of the cache content.
Local out-proc caches are limited to one server node but as they are outside the application address space, they also support sharing of data between applications.
In fact, besides allowing multiple applications to share data, NCache supports a pub/sub infrastructure to allow for multiple applications to actually communicate with each other. This allows NCache to play a key part in setting up a fast and reliable microservices environment wherein all the participating services send messages to each other through the NCache platform.
See the link below where they have shared information about NCache topologies
http://www.alachisoft.com/resources/docs/ncache/admin-guide/cache-topologies.html
http://www.alachisoft.com/resources/videos/five-steps-getting-started.html

how to update local memory cache in all server instances

I have a web server cluster that contains many running web server instances. each instance cache some configurations in its local memory, the original configurations are stored in Database.
these configurations are used for every request, so the cache may necessary for performance reason.
I want to provide an admin page, in which, the administrator can change the configurations. how do I update all the cache in every server instance?
now I have two solutions for this:
set an expire time for the cache.
when administrator update the configuration, notify each instance via some pub/sub mechanism(e.g. use redis).
for solution 1, the drawback is the changes can not take effect immediately.
for solution 2, I'm wondering, if the pub/sub will have impact on the performance of the web server.
which one is better? or is there any common solution for this problem?
Another drawback of option 1 is that you'll periodically hit your database unnecessarily.
If you're already using Redis then option 2 is a good solution. I've used it successfully and can't imagine how there could be a performance impact just because you're using pubsub.
Another option is to create a cache invalidation URL on each website, e.g. /admin/cache-reset/, and have your administration tool call the cache-reset URL on each individual server. The drawback of this solution is that you need to maintain a list of servers. If you're not already using Redis it could just be the simple/practical/low-tech solution that you're looking for.

Performance difference between Azure Redis cache and In-role cache for outputcaching

We are moving an asp.net site to Azure Web Role and Azure Sql Database. The site is using output cache and normal Cache[xxx] (i.e. HttpRuntime.Cache). These are now stored in the classic way in the web role instance memory.
The low hanging fruit is to first start using a distributed cache for output caching. I can use in-role cache, either as co-located or with a dedicated cache role, or Redis cache. Both have outputcache providers ready made.
Are there any performance differences between the two (thee with co-located/dedicated) cache methods?
One thing to consider is that will getting the page from Redis for every pageload on every server be faster or slower than composing the page from scratch one every server every 120 seconds but inbetween just getting it from local memory?
Which will scale better when we want to start caching our own data (i.e. pocos) in a distributed cache instead of HttpRuntime.Cache?
-Mathias
Answering to your each question individually:
Are there any performance differences between the two (thee with
co-located/dedicated) cache methods?
Definately co-located caching solution is faster than dedicated cache server, as in co-located/inproc solution request will be handled locally within the process where as dedicated cache solution will involve getting data over the network. However since data will be in-memory on cache server, getting will still be faster than getting from DB.
One thing to consider is that will getting the page from Redis for
every pageload on every server be faster or slower than composing the
page from scratch one every server every 120 seconds but inbetween
just getting it from local memory?
It will depend on number of objects on page i.e. time taken to compose the page from scratch. Though getting from cache will involve network trip time but its mostly in fractions of a millisecond.
Which will scale better when we want to start caching our own data
(i.e. pocos) in a distributed cache instead of HttpRuntime.Cache?
Since HttpRuntime.Cache is in-process caching, it is limited to single process's memory therefore it is not scalable. A distributed cache on the other hand is a scalable solution where you can always add more servers to increase cache space and throughput. Also out-proc nature of distributed cache solution makes it possible to access data cached by on application process to be used by any other process.
You can also look into NCache for Azure as a distributed caching solution. NCache is a native .Net distributed caching solution.
Following blog posts by Iqbal Khan will help you better understand the need of distributed cache for ASP.Net applications:
Improve ASP.NET Performance in Microsoft Azure with Output Cache
How to use a Distributed Cache for ASP.NET Output Cache
I hope this helps :-)
-Sameer

THE FASTEST Smarty Cache Handler

Does anyone know if there is an overview of the performance of different cache handlers for smarty?
I compared smarty file cache with a memcache handler, but it seemed memcache has a negative impact on performance.
I figured there would be a faster way to cache than through the filesystem... am I wrong?
I don't have a systematic answer for you, but in my experience, the file cache is the fastest. I should clarify that I haven't done any serious performance tests, but in all the time I've used Smarty, I have found the file cache to work best.
One this that definitely improves performance is to disable checking if the template files have changed. This avoids having to stat the tpl files.
File caching is ok when you have a single server instance or using shared drive (NFS) in a server cluster, but when you have a web server cluster (two or more web servers serving the same content), the problem with file based caching is not sync across the web servers. To perform a simple rsync on the caching directories is error prone. May work flawlessly for awhile but not a stable solution. The best solution for a cluster is to use distributed caching, that is memcache, which is a separate server running a memcached instance and each web server has PHP Memcache installed. Each server will then check for the existent of a cached page/item and if exists pulls from memcache otherwise will generate from the database and then save into memcached. When you are dealing with clusters, you cannot skimp on a good caching mechanism. If you are dealing with clusters, then your site already has more traffic (or will be) for a single server to handle.
There is beginners level cluster environment which can be implemented for a relative low cost. You can set up two colocated servers (nginx load balancer and a memcached server), then using free shared web hosting, you create an account of the same domain on those free hosting accounts and install your content. You configure your nginx load balancer to point to the IP addresses of the free web hosts. The free web hosts must have php5 memcache installed or the solution will not work.
Then you set you DNS for the domain with the registrar to point the NGINX IP (which would be a static ip if you are colocating). Now when someone access your domain, nginx redirects to one of your web server clusters located on the free hosting.
You may also want to consider a CDN to off load traffic when serving the static content.

Caching with multiple server

I'm building an application with multiple server involved. (4 servers where each one has a database and a webserver. 1 master database and 3 slaves + one load balancer)
There is several approach to enable caching. Right now it's fairly simple and not efficient at all.
All the caching is done on an NFS partition share between all servers. NFS is the bottleneck in the architecture.
I have several ideas implement
caching. It can be done on a server
level (local file system) but the
problem is to invalidate a cache
file when the content has been
update on all server : It can be
done by having a small cache
lifetime (not efficient because the
cache will be refresh sooner that it
should be most of the time)
It can also be done by a messaging
sytem (XMPP for example) where each
server communicate with each other.
The server responsible for the
invalidation of the cache send a
request to all the other to let them
know that the cache has been
invalidated. Latency is probably
bigger (take more time for everybody
to know that the cache has been
invalidated) but my application
doesn't require atomic cache
invalidation.
Third approach is to use a cloud
system to store the cache (like
CouchDB) but I have no idea of the
performance for this one. Is it
faster than using a SQL database?
I planned to use Zend Framework but I don't think it's really relevant (except that some package probably exists in other Framework to deal with XMPP, CouchDB)
Requirements: Persistent cache (if a server restart, the cache shouldn't be lost to avoid bringing down the server while re-creating the cache)
http://www.danga.com/memcached/
Memcached covers most of the requirements you lay out - message-based read, commit and invalidation. High availability and high speed, but very little atomic reliability (sacrificed for performance).
(Also, memcached powers things like YouTube, Wikipedia, Facebook, so I think it can be fairly well-established that organizations with the time, money and talent to seriously evaluate many distributed caching options settle with memcached!)
Edit (in response to comment)
The idea of a cache is for it to be relatively transitory compared to your backing store. If you need to persist the cache data long-term, I recommend looking at either (a) denormalizing your data tier to get more performance, or (b) adding a middle-tier database server that stores high-volume data in straight key-value-pair tables, or something closely approximating that.
In defence of memcached as a cache store, if you want high peformance with low impact of a server reboot, why not just have 4 memcached servers? Or 8? Each 'reboot' would have correspondingly less effect on the database server.
I think I found a relatively good solution.
I use Zend_Cache to store locally each cache file.
I've created a small daemon based on nanoserver which manage cache files locally too.
When one server create/modify/delete a cache file locally, it send the same action to all server through the daemon which do the same action.
That mean I have local caching files and remote actions at the same time.
Probably not perfect, but should work for now.
CouchDB was too slow and NFS is not reliable enough.

Resources