With Memcached, it is my understanding that each of the cache servers doesn't need to know diddly about the other servers. With AppFabric Cache on the other hand, the shared configuration links the servers (and consequently becomes a single point of failure).
Is it possible to use AppFabric cache servers independently? In other words, can the individual clients choose where to store their key/values based on the available cache servers, and would that decision be the same for all clients (the way it is with memcached)?
NOTE: I do realize that more advanced features such as tagging would be broken without all the servers knowing about each other.
Are you viewing the shared configuration as a single point of failure? If you are using SQL Server as your configuration repository, then this shouldn't be an issue with a redundant SQL Server setup.
This approach would obviously lose you all of the benefits of using a distributed cache. However, if you really want to do this, then simply don't use a shared configuration. When you configure a new AppFabric node, create a new configuration file or database. Choosing an existing one basically says "add this new node to the existing cache cluster".
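For reference, the memcached-style behaviour the question describes (every client independently picking the same server for a given key) can be sketched as below. This is only an illustration of client-side key hashing, not AppFabric's API; the host names and the `pick_server` helper are hypothetical.

```python
import hashlib


def pick_server(key: str, servers: list[str]) -> str:
    """Map a key to one server; every client computes the same answer."""
    # hashlib is stable across processes and machines, unlike built-in hash().
    digest = hashlib.md5(key.encode("utf-8")).digest()
    # Sort so that clients listing the servers in a different order still agree.
    ordered = sorted(servers)
    return ordered[int.from_bytes(digest[:8], "big") % len(ordered)]


# Hypothetical, independent cache servers that know nothing about each other.
SERVERS = ["cache-a:11211", "cache-b:11211", "cache-c:11211"]

client_1 = pick_server("user:42", SERVERS)
client_2 = pick_server("user:42", list(reversed(SERVERS)))
```

With plain modulo hashing, adding or removing a server remaps most keys; consistent hashing is the usual way to limit that.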
Correct me if I'm wrong, but from my understanding, "database caches" are usually implemented with an in-memory database that is local to the web server (same machine as the web server). Also, these "database caches" store the actual results of queries. I have also read up on the multiple caching strategies like - Cache Aside, Read Through, Write Through, Write Behind, Write Around.
For some context, the Write Through strategy looks like this:
and the Cache Aside strategy looks like this:
I believe that the "Application" refers to a backend server with a REST API.
My first question is, in the Write Through strategy (application writes to cache, cache then writes to database), how does this work? From my understanding, the most commonly used database caches are Redis or Memcached - which are just key-value stores. Suppose you have a relational database as the main database, how are these key-value stores going to write back to the relational database? Do these strategies only apply if your main database is also a key-value store?
In a Write Through (or Read Through) strategy, the cache sits in between the application and the database. How does that even work? How do you get the cache to talk to the database server? From my understanding, the web server (the application) is always the one facilitating the communication between the cache and the main database - which is basically a Cache Aside strategy. Unless Redis has some kind of functionality that allows it to talk to another database, I don't quite understand how this works.
Isn't it possible to mix and match caching strategies? From how I see it, Cache Aside and Read Through are caching strategies for application reads (user wants to read data), while Write Through and Write Behind are caching strategies for application writes (user wants to write data). Couldn't you have a strategy that uses both Cache Aside and Write Through? Why do most articles always seem to portray them as independent strategies?
What happens if you have a cluster of web servers? Do they each have their own local in-memory database that acts as a cache?
Could you implement a cache using a normal (not in-memory) database? I suppose this would still be somewhat useful since you do not need to make an additional network hop to the database server (since the cache lives on the same machine as the web server)?
Introduction & clarification
I think you have misunderstood one point: the cache is NOT necessarily stored on the same server as the webserver. Sometimes not even the database is separated onto its own server from the webserver. If you think of APIs, like HTTP REST APIs, you can use caching to avoid spending too many resources on database connections and queries. Generally, you want to use as few database connections and queries as possible. Now imagine the following setting:
You have a webserver that serves your application and a REST API, which the webserver uses to work with some resources. Those resources come from a database (let's say a relational database) which is also stored on the same server. Now there is one endpoint which serves, e.g., a list of posts (like blog posts). Every user can fetch all posts (to keep this example simple). This is a case where the API request could be cached, so that users do not all hit the database just to query the same resources (via the REST API) over and over again. This is where caching comes in. Redis is one of many tools which can be used for caching. Since Redis is a simple in-memory key-value store, you can just put all of your posts (remember the REST API) into the cache after the first DB query. All future requests for the posts list would first check whether the posts are already cached. If they are, the API will return the cached content for this specific request.
This is one simple example to show off, what caching can be used for.
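The posts example could be sketched roughly like this. It is a minimal illustration, assuming a plain dict as a stand-in for Redis and an in-memory SQLite table as the relational database; the `posts` table, the `posts:all` key and the `get_posts` function are made up for the example.

```python
import json
import sqlite3

cache: dict[str, str] = {}  # stand-in for Redis: key -> serialized value
db_queries = 0              # counts how often the database is actually hit

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT)")
db.executemany("INSERT INTO posts (title) VALUES (?)", [("First",), ("Second",)])


def get_posts() -> list[dict]:
    """Return all posts, querying the database only on a cache miss."""
    global db_queries
    cached = cache.get("posts:all")
    if cached is not None:
        return json.loads(cached)  # cache hit: no database work
    db_queries += 1
    rows = db.execute("SELECT id, title FROM posts ORDER BY id").fetchall()
    posts = [{"id": row[0], "title": row[1]} for row in rows]
    cache["posts:all"] = json.dumps(posts)  # store for future requests
    return posts


first = get_posts()   # miss: runs the query and fills the cache
second = get_posts()  # hit: served from the cache
```

Here the application itself talks to both the cache and the database, which is the Cache Aside pattern from the question.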
Answers on your question
My first question is, why would you ever write to a cache?
To reduce the amount of database connections and queries.
how is writing to these key-value stores going to help with updating the relational database?
It does not help you with updating; instead, it helps you spend fewer resources. It also helps with temporarily backing up some data, but only as a very small side effect. There are more attractive solutions for that (Redis is not persistent by default, although it supports persistence).
Do these cache writing strategies only apply if your main database is also a key-value store?
No, it is not important which database you use, whether it's a NoSQL or SQL DB. It strongly depends on what you want to cache and how the database and its tables are set up. Do your resources change frequently? Do resources get updated manually or only on user-initiated actions? Those are the questions that lead you to the right caching implementation.
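To make this concrete, here is a minimal sketch of write-through in front of a relational database. It assumes the "cache that writes to the database" is a thin layer in application code (or a caching library) rather than Redis or Memcached talking to SQL on their own; the class name, the `users` table and the dict used as the key-value store are all hypothetical.

```python
import sqlite3
from typing import Optional


class WriteThroughUsers:
    """Hypothetical write-through layer: each write updates the DB and the cache."""

    def __init__(self, db: sqlite3.Connection) -> None:
        self.db = db
        self.cache: dict[str, str] = {}  # stand-in for Redis or Memcached

    def save(self, user_id: int, name: str) -> None:
        # 1. Write to the relational database (the source of truth).
        with self.db:  # commits on success, rolls back on error
            self.db.execute(
                "INSERT OR REPLACE INTO users (id, name) VALUES (?, ?)",
                (user_id, name),
            )
        # 2. Update the key-value cache in the same call.
        self.cache[f"user:{user_id}"] = name

    def load(self, user_id: int) -> Optional[str]:
        key = f"user:{user_id}"
        if key in self.cache:
            return self.cache[key]
        row = self.db.execute(
            "SELECT name FROM users WHERE id = ?", (user_id,)
        ).fetchone()
        if row is None:
            return None
        self.cache[key] = row[0]  # fill the cache on a miss
        return row[0]


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
users = WriteThroughUsers(db)
users.save(1, "Ada")
users.save(1, "Ada Lovelace")
```

The key-value store only ever holds a serialized copy of a row, so the main database does not need to be a key-value store itself.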
Isn't it possible to mix and match caching strategies?
I am not an expert at caching strategies, but let me try:
I guess it is possible, but it also highly depends on what you are doing in your DB and what kind of application you have. Once you know what kind of application you are building, you will know which strategy you have to use. I guess it is also not recommended to mix those strategies, because they are coupled to your application type; in other words, it will probably not work out well.
What happens if you have a cluster of web servers? Do they each have their own local in-memory database that acts as a cache?
I guess that both are possible. Usually you have one database, maybe clustered or synchronized with copies, to which your webservers (e.g. REST APIs) make their requests. Then either each of your API servers has its own cache, so that it does not have to query the database at all (in cloud-based applications your database may also be on a separate server, so that is another "hop" in terms of networking), OR (what I can also imagine) you have another middleware between your (clustered) APIs and your (maybe also clustered) DB. But I guess that no one would do that because of the network traffic: it would result in a higher response time, which you usually want to prevent.
Could you implement a cache using a normal (not in-memory) database?
Yes, you could, but it would be much slower. A machine can access in-memory data faster than it can build up another (local) connection to a database and query your cached entries. Also, your database has to write the entries to files on your machine to persist the data.
Conclusion
All in all, it is all about fast response times and avoiding unnecessary network traffic. I hope that I could help you out a little bit.
I have two servers on which I will be deploying the same application. Basically, these two servers will handle work from a common Web API; the work that is handed out will be transformed, go through some logic, and be loaded into the DB. I want to cache the data that gets loaded, updated, or deleted in the database, so that when the same data is referenced I can get it from the cache (which is roughly the cache mechanism). I am currently using NCache and it is working perfectly fine within one application. I am trying to set up a kind of shared cache that both my applications can access. How do I go about doing it?
NCache is a distributed cache so you can continue to use that.
There is good general documentation available and very good getting started material that walks you through all the steps required.
In essence, you install NCache on both servers and then reference both servers in your client configuration (%NCHOME%\config\client.ncconf).
In cluster caches, a single logical cache instance is distributed over multiple server nodes and because the cache process is running outside the application address space, multiple applications can share and see the same exact cache data change in terms of addition, removal and update of the cache content.
Local out-proc caches are limited to one server node but as they are outside the application address space, they also support sharing of data between applications.
In fact, besides allowing multiple applications to share data, NCache supports a pub/sub infrastructure to allow for multiple applications to actually communicate with each other. This allows NCache to play a key part in setting up a fast and reliable microservices environment wherein all the participating services send messages to each other through the NCache platform.
See the links below for information about NCache topologies and getting started:
http://www.alachisoft.com/resources/docs/ncache/admin-guide/cache-topologies.html
http://www.alachisoft.com/resources/videos/five-steps-getting-started.html
I have a web server cluster that contains many running web server instances. Each instance caches some configuration in its local memory; the original configuration is stored in a database.
This configuration is used for every request, so the cache may be necessary for performance reasons.
I want to provide an admin page in which the administrator can change the configuration. How do I update the cache in every server instance?
I currently have two solutions for this:
1. Set an expiry time for the cache.
2. When the administrator updates the configuration, notify each instance via some pub/sub mechanism (e.g. using Redis).
For solution 1, the drawback is that changes cannot take effect immediately.
For solution 2, I'm wondering whether the pub/sub will have an impact on the performance of the web server.
Which one is better? Or is there a common solution for this problem?
Another drawback of option 1 is that you'll periodically hit your database unnecessarily.
If you're already using Redis then option 2 is a good solution. I've used it successfully and can't imagine how there could be a performance impact just because you're using pubsub.
Another option is to create a cache invalidation URL on each website, e.g. /admin/cache-reset/, and have your administration tool call the cache-reset URL on each individual server. The drawback of this solution is that you need to maintain a list of servers. If you're not already using Redis it could just be the simple/practical/low-tech solution that you're looking for.
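The pub/sub option could look roughly like the sketch below. The `Bus` class is an in-process stand-in for a real channel such as Redis pub/sub, the plain dict stands in for the configuration database, and all names are hypothetical; in a real deployment each instance would subscribe over the network instead.

```python
from typing import Callable


class Bus:
    """In-process stand-in for a pub/sub channel (e.g. Redis pub/sub)."""

    def __init__(self) -> None:
        self.subscribers: list[Callable[[str], None]] = []

    def subscribe(self, handler: Callable[[str], None]) -> None:
        self.subscribers.append(handler)

    def publish(self, message: str) -> None:
        for handler in self.subscribers:
            handler(message)


class WebInstance:
    """One web server instance with its own local configuration cache."""

    def __init__(self, bus: Bus, database: dict[str, str]) -> None:
        self.database = database
        self.local_cache: dict[str, str] = {}
        bus.subscribe(self.on_invalidate)

    def on_invalidate(self, key: str) -> None:
        # Drop the stale entry; the next request reloads it from the database.
        self.local_cache.pop(key, None)

    def get_config(self, key: str) -> str:
        if key not in self.local_cache:
            self.local_cache[key] = self.database[key]
        return self.local_cache[key]


database = {"theme": "light"}  # stand-in for the configuration database
bus = Bus()
a, b = WebInstance(bus, database), WebInstance(bus, database)
a.get_config("theme")
b.get_config("theme")          # both instances now cache "light"

database["theme"] = "dark"     # the administrator changes the setting
bus.publish("theme")           # every instance is told to invalidate the key
```

Publishing only the key name keeps the message small; each instance pays for a database read only on its next request for that key.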
I recently set up memcached for a PHP site with a lot of traffic. Before that we used APC, but it lacks the possibility of having a single cache system (invalidating a key on one server doesn't invalidate it on the others).
I noticed a big difference depending on whether memcached is on the same machine as the HTTP server or on a separate server.
http+memcached on same server -> 0.06 average time spent to deliver a page
http and memcached on different servers (but under NAT) -> 0.15 - 0.20 to deliver a page
So it's a huge difference, and I am wondering whether it would be better to have the cache system on the same machine as the HTTP server. The additional complexity is that the website is served by a couple of HTTP servers (through a load balancer). So I actually need a cache system with replication, with each HTTP server having a cache "copy" and writing changes only to the "master" (or another approach that does something similar).
There are a couple of such systems (Couchbase, Redis, and so on). I think Couchbase is not good for this, as it won't allow connecting to a local cache server but rather to the "gate". Redis may work; I am still checking on others.
The main thing is: has someone tried this approach to speed up a website, by having a cache "copy" on each machine (kept in sync with the others)?
You can use the GigaSpaces XAP solution, which is a distributed in-memory data grid that also has an integration with Jetty, allowing you to deploy your web app and manage it from a single management system. The central distributed data grid (which can be used as a simple cache) can have a local cache on each web container which is kept in sync with the main cache. You don't have to use the Jetty integration for this; you can still use your own web container and just create a proxy to the distributed cache with an embedded local cache via code. Alternatively, you can have a fully replicated topology between the web containers without a main distributed cache, where each web container holds a full copy of the entire cache that is kept in sync with the other instances of the web container.
You can read more in:
http://wiki.gigaspaces.com/wiki/display/SBP/Web+Service+PU
http://wiki.gigaspaces.com/wiki/display/XAP9/Web+Jetty+Processing+Unit+Container
http://wiki.gigaspaces.com/wiki/display/XAP9/Client+Side+Caching
Disclaimer: I am a developer working for GigaSpaces.
I'm building an application with multiple servers involved (4 servers, each with a database and a webserver: 1 master database and 3 slaves, plus one load balancer).
There are several approaches to enable caching. Right now it's fairly simple and not efficient at all.
All the caching is done on an NFS partition shared between all servers. NFS is the bottleneck in the architecture.
I have several ideas for implementing caching.
It can be done at the server level (local file system), but the problem is invalidating a cache file on all servers when the content has been updated. That can be done by having a small cache lifetime (not efficient, because most of the time the cache will be refreshed sooner than it should be).
It can also be done by a messaging system (XMPP for example) where the servers communicate with each other. The server responsible for invalidating the cache sends a request to all the others to let them know that the cache has been invalidated. Latency is probably bigger (it takes more time for everybody to know that the cache has been invalidated), but my application doesn't require atomic cache invalidation.
A third approach is to use a cloud system to store the cache (like CouchDB), but I have no idea of the performance of this one. Is it faster than using a SQL database?
I planned to use Zend Framework, but I don't think it's really relevant (except that some packages probably exist in other frameworks to deal with XMPP or CouchDB).
Requirements: Persistent cache (if a server restarts, the cache shouldn't be lost, to avoid bringing down the server while re-creating the cache)
http://www.danga.com/memcached/
Memcached covers most of the requirements you lay out - message-based read, commit and invalidation. High availability and high speed, but very little atomic reliability (sacrificed for performance).
(Also, memcached powers things like YouTube, Wikipedia, Facebook, so I think it can be fairly well-established that organizations with the time, money and talent to seriously evaluate many distributed caching options settle with memcached!)
Edit (in response to comment)
The idea of a cache is for it to be relatively transitory compared to your backing store. If you need to persist the cache data long-term, I recommend looking at either (a) denormalizing your data tier to get more performance, or (b) adding a middle-tier database server that stores high-volume data in straight key-value-pair tables, or something closely approximating that.
In defence of memcached as a cache store, if you want high performance with low impact from a server reboot, why not just have 4 memcached servers? Or 8? Each 'reboot' would have correspondingly less effect on the database server.
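A rough way to see why more servers soften a reboot: if keys are spread evenly by hash, restarting one of N servers empties only about 1/N of the cache. The sketch below simulates that with a hypothetical hash-modulo placement and made-up key names; it assumes the client keeps the same key-to-server mapping while the server is down.

```python
import hashlib


def server_for(key: str, server_count: int) -> int:
    """Hypothetical placement: stable hash of the key modulo the server count."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % server_count


def fraction_lost(server_count: int, total_keys: int = 10_000) -> float:
    """Share of cached keys that vanish when server 0 is rebooted."""
    keys = (f"key:{i}" for i in range(total_keys))
    lost = sum(1 for key in keys if server_for(key, server_count) == 0)
    return lost / total_keys


for n in (1, 4, 8):
    print(f"{n} server(s): about {fraction_lost(n):.0%} of keys lost on one reboot")
```

Only the lost fraction has to be rebuilt from the database, so the load spike after a reboot shrinks as servers are added.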
I think I found a relatively good solution.
I use Zend_Cache to store each cache file locally.
I've created a small daemon based on nanoserver which also manages cache files locally.
When one server creates, modifies, or deletes a cache file locally, it sends the same action through the daemon to all the other servers, which perform the same action.
That means I have local cache files and remote actions at the same time.
Probably not perfect, but should work for now.
CouchDB was too slow and NFS is not reliable enough.