Caching with multiple servers

I'm building an application with multiple servers involved (four servers, each with a database and a web server: one master database and three slaves, plus a load balancer).
There are several approaches to enable caching. Right now it's fairly simple and not efficient at all.
All the caching is done on an NFS partition shared between all servers. NFS is the bottleneck in the architecture.
I have several ideas for implementing caching:

It can be done at the server level (local file system), but the problem is invalidating a cache file on all servers when the content has been updated. One way is to use a short cache lifetime, but that is not efficient, because most of the time the cache will be refreshed sooner than it needs to be.

It can also be done with a messaging system (XMPP, for example) where the servers communicate with each other: the server responsible for invalidating the cache sends a request to all the others to let them know the cache has been invalidated. Latency is probably higher (it takes more time for everybody to learn that the cache has been invalidated), but my application doesn't require atomic cache invalidation.

A third approach is to use a cloud system to store the cache (like CouchDB), but I have no idea of its performance. Is it faster than using a SQL database?
I plan to use Zend Framework, but I don't think that's really relevant (except that packages probably exist in other frameworks to deal with XMPP and CouchDB).
Requirement: a persistent cache (if a server restarts, the cache shouldn't be lost, to avoid bringing down the server while the cache is re-created).

http://www.danga.com/memcached/
Memcached covers most of the requirements you lay out - message-based read, commit and invalidation. High availability and high speed, but very little atomic reliability (sacrificed for performance).
(Also, memcached powers things like YouTube, Wikipedia, Facebook, so I think it can be fairly well-established that organizations with the time, money and talent to seriously evaluate many distributed caching options settle with memcached!)
Edit (in response to comment)
The idea of a cache is for it to be relatively transitory compared to your backing store. If you need to persist the cache data long-term, I recommend looking at either (a) denormalizing your data tier to get more performance, or (b) adding a middle-tier database server that stores high-volume data in straight key-value-pair tables, or something closely approximating that.

In defence of memcached as a cache store, if you want high performance with low impact from a server reboot, why not just have 4 memcached servers? Or 8? Each 'reboot' would have correspondingly less effect on the database server.
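As a rough sketch of that suggestion (the addresses and key names here are placeholders, not anything from the original setup), pymemcache's HashClient distributes keys across the nodes by consistent hashing, so rebooting one server only empties roughly a quarter of the cache:

```python
# A sketch of spreading the cache over 4 memcached nodes; addresses are
# hypothetical. Keys are spread by consistent hashing, so losing one node
# invalidates only its share of the entries.
from pymemcache.client.hash import HashClient

client = HashClient([
    ("10.0.0.1", 11211),
    ("10.0.0.2", 11211),
    ("10.0.0.3", 11211),
    ("10.0.0.4", 11211),
])

client.set("page:/home", "<html>...</html>", expire=300)  # cache for 5 min
print(client.get("page:/home"))
client.delete("page:/home")  # explicit invalidation when content changes
```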

I think I found a relatively good solution.
I use Zend_Cache to store each cache file locally.
I've created a small daemon based on nanoserver which manages the cache files locally too.
When one server creates/modifies/deletes a cache file locally, it sends the same action to all the other servers through the daemon, and they perform the same action.
That means I have local cache files and remote actions at the same time.
Probably not perfect, but should work for now.
CouchDB was too slow and NFS is not reliable enough.
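For anyone curious, here is a rough sketch of that broadcast idea in Python (this is not the actual nanoserver-based daemon; the peer addresses, port, and cache layout are invented for illustration):

```python
# A rough sketch of the broadcast idea (not the actual nanoserver daemon):
# apply each cache action locally, then replay it to the peer daemons so
# every server mirrors the create/modify/delete.
import json
import socket
import socketserver
from pathlib import Path

CACHE_DIR = Path("/var/cache/app")                 # hypothetical cache root
PEERS = [("10.0.0.2", 9090), ("10.0.0.3", 9090)]   # the other web servers

def apply_action(action: dict) -> None:
    """Perform a cache action on the local file system."""
    path = CACHE_DIR / action["key"]
    if action["op"] == "delete":
        path.unlink(missing_ok=True)
    else:  # "create" or "modify"
        path.write_text(action["value"])

def broadcast(action: dict) -> None:
    """Replay a local action to every peer; ignore unreachable peers,
    since the question states atomic invalidation is not required."""
    payload = json.dumps(action).encode()
    for host, port in PEERS:
        try:
            with socket.create_connection((host, port), timeout=1) as s:
                s.sendall(payload)
        except OSError:
            pass

class PeerHandler(socketserver.StreamRequestHandler):
    def handle(self):
        # A peer sent us an action: apply it locally, don't re-broadcast.
        apply_action(json.loads(self.rfile.read()))

if __name__ == "__main__":
    socketserver.TCPServer(("0.0.0.0", 9090), PeerHandler).serve_forever()
```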

Related

What are the size limits for Laravel's file-based caching?

I am a new developer and am trying to implement Laravel's (5.1) caching facility to improve the speed of my app. I started out caching a large DB table that my app constantly references - but it got too large so I have backed away from that and am now 'forever' caching smaller chunks of data - for example, for each page only the portions of that large DB table that are relevant.
I have watched 'Caching Essentials' on Laracasts, done some Googling and had a search in this forum (and Laracasts') but I still have a couple of questions:
I am not totally clear on how the cache size limits work when you are using Laravel's file-based system - is there an overall in-app size limit for the cache or is one limited size-wise only per key and by your server size?
What are the signs you should switch from file-based caching to something like Memcached or Redis - and what are the benefits of using one of those services? Is it the fact that your caching is handled on a different server (thereby lightening the load on your own)? Do you switch over to one of these services when your local, file-based cache gets too big for your server?
My app utilizes several tables that have 3,000-4,000 rows - the data in these tables is constantly referenced and will remain static unless I decide to add new options. I am basically looking for the best way to speed up queries to the data in these tables.
Thanks!
I don't think Laravel imposes any limitations on its file I/O at all - the limitations will be with how much PHP can read / write to a file at once, or hold in its memory / process at any one time.
It does serialise the data that you cache, and unserialise it when you reload it, so your PHP environment would have to be able to process the entire cache file (which is equivalent to the top level cache key) at once. So, if you are getting cacheduser.firstname, it would have to load the whole cacheduser key from the file, unserialise it, then get the firstname key from that.
I would take the PHP memory limit (classic, I know!) as the first thing to investigate if you want to keep going down this road.
Caching services like Redis or memcached are bespoke, optimised caching solutions. They take some of the logic and responsibility out of your PHP environment.
They can, for example, retrieve sub-keys from items without having to process the whole thing, so can retrieve part of some cached data in a memory efficient way. So, when you request cacheduser.firstname from redis, it just returns you the firstname attribute.
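To make that concrete, here's a tiny sketch with the redis-py client (the cacheduser hash and field names are just examples): only the requested field crosses the wire, not the whole cached user.

```python
# A sketch of sub-key retrieval with Redis hashes; key and fields are
# hypothetical examples.
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

# Store the user as a hash so individual fields stay addressable.
r.hset("cacheduser", mapping={"firstname": "Ada", "lastname": "Lovelace"})

# Retrieve just one field; a file cache would have to load and unserialise
# the entire entry to answer the same question.
print(r.hget("cacheduser", "firstname"))  # -> "Ada"
```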
They have other advantages regarding tagging / clearing out subsets of caches (see [the cache tags Laravel docs](https://laravel.com/docs/5.4/cache#cache-tags)).
Another thing to think about is scaling. If your site is large enough, and is load-balanced across multiple servers, the filesystem caching may be different across those servers, as each server can only check their local filesystem for the cache files. A caching service can be on a different server (many hosts will have a separate redis / memcached services available), so isn't victim to this issue.
Also - as I understand it (and this might be the most important thing), the file cache driver in Laravel is mainly for local development and testing. Although it can work fine for simple applications with basic caching needs, it's not intended for large scalable production environments.
Personally, I develop locally and test with file caching, as I'm only dealing with small amounts of data then, and use Redis to cache on production environments.
It doesn't necessarily need to be on a separate server to get the benefits. If you are never going to scale to multiple application servers, then using a caching service on the same server will already be a large improvement to caching large documents.

Is it a good idea to provide cache as a service?

We have many web services and web applications with caching needs, so we are trying to come up with a caching strategy that can help all the teams irrespective of their technical choices. We have used Memcached (not replicated) and Couchbase (multi-master) running locally on each server node, with applications connecting to them locally using the Memcached protocol, but going forward we are planning to go with a centralized cache cluster exposed via REST APIs which can be used by all the applications running on different server nodes in a datacenter. The following are the reasons behind this thought process:
- Easy maintenance of a cluster without worrying about app server nodes.
- A single protocol (HTTP) used to access the cache without worrying about the underlying implementation. (We might use a Redis, Couchbase or Aerospike cluster.)
But we are not sure about this strategy, because we are worried about the performance impact of the network overhead HTTP introduces.
Has anyone tried this strategy? Is it a good idea to make the cache a centralized service, or are local caches best?
While it's true that HTTP and network add latency, generally you need a cache because the actual operation takes significantly longer. So the question is: if you add 1-2 milliseconds to the cache access, does that still shorten the un-cached operation time significantly? If the answer is yes, and you follow some common best practices, having a centralized cache could be a good idea.
You might want to look into low latency, high throughput server-side frameworks for the HTTP service, like Node.js or Go. Also, you will probably benefit from implementing proper ETag support in your cache HTTP API.
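As a rough illustration of the ETag point, here's a minimal sketch of a cache HTTP API in Python/Flask (the routes and the dict-backed store are illustrative assumptions, not a real product's API): clients that already hold the current version get a cheap 304 instead of the full payload.

```python
# A minimal sketch of a cache HTTP API with ETag support; the routes and the
# dict-backed store are illustrative assumptions, not a real product's API.
import hashlib
from flask import Flask, request, make_response

app = Flask(__name__)
store = {}  # stand-in for the real cache backend (Redis, Couchbase, ...)

@app.route("/cache/<key>", methods=["GET"])
def get_value(key):
    if key not in store:
        return "", 404
    value = store[key]
    etag = hashlib.sha1(value.encode()).hexdigest()
    if request.headers.get("If-None-Match") == etag:
        return "", 304  # client copy is current: skip the payload entirely
    resp = make_response(value)
    resp.headers["ETag"] = etag
    return resp

@app.route("/cache/<key>", methods=["PUT"])
def put_value(key):
    store[key] = request.get_data(as_text=True)
    return "", 204
```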
Another alternative might be centralizing the cache server(s) without wrapping them in an HTTP layer. There are standard cache provider implementations for all the technologies you mentioned available for most modern web frameworks.
Disclaimer: I work for Redis Labs, a commercial company that makes tools for managing Redis and Memcached clusters, and one that has made a business of the strategy you want affirmed :)
Cache is a dish best served close, but remote caching has benefits (e.g., offloading the DB) even if the latency penalty suggests differently. In most cases, compared to the time spent in the application, the local area network latency becomes negligible, so using a shared network-attached cache makes a lot of sense.
To get the best performance, have your app interact directly with the shared cache using its own protocol. An HTTP API, unless provided by the caching engine itself, could add latency to the client app's requests. OTOH, formalizing your apps' access to the cache with a custom layer (such as a REST API) has a lot of nice benefits too, so you should evaluate the cost in the context of your latency budgets.
Your strategy is sound and it is used everywhere to build scalable and performant applications. Feel free to hit me if you need further advice.

Performance difference between Azure Redis cache and In-role cache for outputcaching

We are moving an asp.net site to Azure Web Role and Azure Sql Database. The site is using output cache and normal Cache[xxx] (i.e. HttpRuntime.Cache). These are now stored in the classic way in the web role instance memory.
The low hanging fruit is to first start using a distributed cache for output caching. I can use in-role cache, either as co-located or with a dedicated cache role, or Redis cache. Both have outputcache providers ready made.
Are there any performance differences between the two (three with co-located/dedicated) cache methods?
One thing to consider: will getting the page from Redis for every page load on every server be faster or slower than composing the page from scratch on every server every 120 seconds, but in between just getting it from local memory?
Which will scale better when we want to start caching our own data (i.e. POCOs) in a distributed cache instead of HttpRuntime.Cache?
-Mathias
Answering each of your questions individually:

Are there any performance differences between the two (three with co-located/dedicated) cache methods?

Definitely, a co-located caching solution is faster than a dedicated cache server, as in the co-located/in-proc solution the request will be handled locally within the process, whereas a dedicated cache solution involves getting data over the network. However, since the data will be in memory on the cache server, getting it will still be faster than getting it from the DB.
One thing to consider: will getting the page from Redis for every page load on every server be faster or slower than composing the page from scratch on every server every 120 seconds, but in between just getting it from local memory?

It will depend on the number of objects on the page, i.e. the time taken to compose the page from scratch. Getting it from the cache involves a network round trip, but that is mostly fractions of a millisecond.
Which will scale better when we want to start caching our own data (i.e. POCOs) in a distributed cache instead of HttpRuntime.Cache?

Since HttpRuntime.Cache is in-process caching, it is limited to a single process's memory and is therefore not scalable. A distributed cache, on the other hand, is a scalable solution where you can always add more servers to increase cache space and throughput. The out-of-process nature of a distributed cache also makes it possible for data cached by one application process to be used by any other process.
You can also look into NCache for Azure as a distributed caching solution. NCache is a native .Net distributed caching solution.
Following blog posts by Iqbal Khan will help you better understand the need of distributed cache for ASP.Net applications:
Improve ASP.NET Performance in Microsoft Azure with Output Cache
How to use a Distributed Cache for ASP.NET Output Cache
I hope this helps :-)
-Sameer

Is SQLite suitable for use as a read only cache on a web server?

I am currently building a high traffic GIS system which uses python on the web front end. The system is 99% read only. In the interest of performance, I am considering using an externally generated cache of pre-generated read-optimised GIS information and storing in an SQLite database on each individual web server. In short it's going to be used as a distributed read-only cache which doesn't have to hop over the network. The back end OLTP store will be postgreSQL but that will handle less than 1% of the requests.
I have considered using Redis but the dataset is quite large and therefore it will push up the administrative cost and memory cost on the virtual machines this is being hosted on. Memcache is not suitable as it cannot do range queries.
Am I going to hit read-concurrency problems with SQLite doing this?
Is this a sensible approach?
OK, after much research and performance testing: SQLite is suitable for this. It has good read concurrency on static data. SQLite only becomes an issue if you are doing writes as well as heavy reads.
More information here:
http://www.sqlite.org/lockingv3.html
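For reference, the read-only pattern from the question looks roughly like this in Python (the file path and table schema are hypothetical):

```python
# A minimal sketch of the read-only cache: each web server opens its local,
# externally generated SQLite file in read-only mode. Path and schema are
# assumptions for illustration.
import sqlite3

# mode=ro guarantees the web tier never takes a write lock, which is what
# keeps read concurrency high on static data.
conn = sqlite3.connect("file:/var/cache/gis.sqlite?mode=ro", uri=True)

# Range queries (the reason memcached was ruled out) work as plain SQL.
rows = conn.execute(
    "SELECT id, lat, lon FROM features WHERE lat BETWEEN ? AND ?",
    (51.0, 52.0),
).fetchall()
```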
If your use case is just a cache, why don't you use something like
http://memcached.org/?
You can find memcached bindings for Python in the PyPI repository.
Another option is to use materialized views in Postgres; this way you keep things simple and have everything in one place.
http://tech.jonathangardner.net/wiki/PostgreSQL/Materialized_Views
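A short sketch of that route, assuming psycopg2 and a hypothetical features_summary view: Postgres stores the precomputed result, and you refresh it whenever the base tables change.

```python
# Hypothetical view name and connection string; the view would be created
# once with CREATE MATERIALIZED VIEW from the expensive source query.
import psycopg2

conn = psycopg2.connect("dbname=gis")
with conn, conn.cursor() as cur:
    # Reads hit the precomputed table instead of the expensive query.
    cur.execute("SELECT id, lat, lon FROM features_summary LIMIT 10")
    print(cur.fetchall())
    # After the base tables change, rebuild the cached result.
    cur.execute("REFRESH MATERIALIZED VIEW features_summary")
```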

How do I update an expensive in-memory cache across a SharePoint farm?

We have 3 front-end servers each running multiple web applications. Each web application has an in-memory cache.
Recreating the cache is very expensive (>1 min). Therefore we repopulate it using a web service call to each web application on each front-end server every 5 minutes.
The main problem with this setup is maintaining the target list for updating and the cost of creating the cache several times every few minutes.
We are considering using AppFabric or something similar but I am unsure how time consuming it is to get up and running. Also we really need the easiest solution.
How would you update an expensive in-memory cache across multiple front-end servers?
The problem with memory caching is that it's unique to the server. I'm going with the idea that this is why you want to use AppFabric. I'm also assuming that you're re-creating the cache every few minutes to keep the in memory caches in sync across all servers. With all this work, I can well appreciate that caching is expensive for you.
It sounds like you're doing a lot of work that probably isn't necessary. This article has some detail about the caching mechanisms available within SharePoint. You may be interested in the output cache discussed near the top of the article. You may also want to read the linked TechNet article and the linked article called "Custom Caching Overview".
The only SharePoint way to do that is to use the Service Application infrastructure. The only problem is that it takes some time to understand how it works, and it's too complicated to build from scratch. You might consider downloading one of the existing sample applications and renaming the classes/GUIDs to match your naming conventions. I used this one: http://www.parago.de/2011/09/paragoservices-a-sharepoint-2010-service-application-sample/. In this case you can have a single cache per N front-end servers.
