If I understand correctly, the Play Framework stores the whole session in a cookie, while PHP stores only a session ID in a cookie and keeps the actual session on the server side.
The Play Framework promotes good horizontal scalability with its approach. But I do not see the advantage over using any other framework and saving my session in a database, for example with Symfony and Redis.
So how is the Play Framework better (for some use cases)?
The initial idea behind Play's architecture is that the designers wanted it to be stateless, i.e. no data is maintained between requests on the server side. This is why it does not follow the servlet spec. Statelessness opens up flexibility with things like scalability, as you have mentioned. That is in itself a big advantage if your application is large enough that it needs to scale across more than a single machine, because managing server-side session data across a cluster is a pain.
But of course, anything other than a trivial application will need to maintain some session data, and as you have mentioned yourself, you would usually do this with a cache or DB. The cookie session Play uses is restricted to about 4 KB, so it is only intended for a relatively small amount of data.
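To make the cookie-session idea concrete, here is a minimal sketch of how a signed session cookie could work. This is not Play's actual cookie format; the secret, the function names and the `signature-payload` layout are assumptions made up for illustration. The point is that the server keeps no session state: it only needs the secret to verify what the browser sends back.

```python
import base64
import hashlib
import hmac
import json

SECRET_KEY = b"change-me"   # hypothetical application secret
MAX_COOKIE_BYTES = 4096     # typical per-cookie browser limit (about 4 KB)


def _sign(payload: str) -> str:
    """Return a hex HMAC-SHA256 signature for the payload."""
    return hmac.new(SECRET_KEY, payload.encode(), hashlib.sha256).hexdigest()


def encode_session(data: dict) -> str:
    """Serialize the session and prefix it with its signature."""
    raw = json.dumps(data, sort_keys=True).encode()
    payload = base64.urlsafe_b64encode(raw).decode()
    cookie = f"{_sign(payload)}-{payload}"
    if len(cookie) > MAX_COOKIE_BYTES:
        raise ValueError("session too large for a cookie")
    return cookie


def decode_session(cookie: str) -> dict:
    """Return the session data, or an empty session if the signature is invalid."""
    signature, _, payload = cookie.partition("-")
    if not hmac.compare_digest(signature.encode(), _sign(payload).encode()):
        return {}
    return json.loads(base64.urlsafe_b64decode(payload))
```

Any server in a cluster that knows the secret can call `decode_session` on an incoming cookie, which is why no session synchronization between machines is needed. The size check also shows why only small amounts of data fit.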
The benefits of a stateless architecture are manifold and are what Play's architecture is designed to exploit.
This article is a bit dated, but it is still relevant.
Correct me if I'm wrong, but from my understanding, "database caches" are usually implemented with an in-memory database that is local to the web server (same machine as the web server). Also, these "database caches" store the actual results of queries. I have also read up on the multiple caching strategies like - Cache Aside, Read Through, Write Through, Write Behind, Write Around.
For some context, the Write Through strategy looks like this:
and the Cache Aside strategy looks like this:
I believe that the "Application" refers to a backend server with a REST API.
My first question is, in the Write Through strategy (application writes to cache, cache then writes to database), how does this work? From my understanding, the most commonly used database caches are Redis or Memcached - which are just key-value stores. Suppose you have a relational database as the main database, how are these key-value stores going to write back to the relational database? Do these strategies only apply if your main database is also a key-value store?
In a Write Through (or Read Through) strategy, the cache sits in between the application and the database. How does that even work? How do you get the cache to talk to the database server? From my understanding, the web server (the application) is always the one facilitating the communication between the cache and the main database - which is basically a Cache Aside strategy. Unless Redis has some kind of functionality that allows it to talk to another database, I don't quite understand how this works.
Isn't it possible to mix and match caching strategies? From how I see it, Cache Aside and Read Through are caching strategies for application reads (user wants to read data), while Write Through and Write Behind are caching strategies for application writes (user wants to write data). Couldn't you have a strategy that uses both Cache Aside and Write Through? Why do most articles always seem to portray them as independent strategies?
What happens if you have a cluster of web servers? Do they each have their own local in-memory database that acts as a cache?
Could you implement a cache using a normal (not in-memory) database? I suppose this would still be somewhat useful since you do not need to make an additional network hop to the database server (since the cache lives on the same machine as the web server)?
Introduction & clarification
I think you have misunderstood one point: the cache is NOT necessarily stored on the same server as the web server. Sometimes not even the database is separated onto its own server from the web server. If you think of APIs, like HTTP REST APIs, you can use caching to avoid spending too many resources on database connections and queries. Generally, you want to use as few database connections and queries as possible. Now imagine the following setting:
You have a web server that serves your application and a REST API, which the web server uses to work with some resources. Those resources come from a database (let's say a relational database) that is hosted on the same server. Now there is one endpoint which serves, e.g., a list of posts (like blog posts). Every user can fetch all posts (to keep this example simple). This is a case where the API response could be cached, so that users do not all trigger the database just to query the same resources (via the REST API) over and over again. This is where caching comes in. Redis is one of many tools that can be used for caching. Since Redis is a simple in-memory key-value store, you can just put all of your posts (remember the REST API) into the cache after the first DB query. All future requests for the posts list would first check whether the posts are already cached. If they are, the API returns the cached content for this specific request.
This is one simple example to show what caching can be used for.
Answers to your questions
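The flow described above could be sketched roughly like this. It is only an illustration: a plain dict stands in for Redis, and `query_posts_from_db`, the `posts:all` key and the `stats` counter are hypothetical names invented for this example.

```python
import json

cache = {}                     # stand-in for Redis; a real client would use GET/SET
stats = {"db_queries": 0}      # counts how often the "database" is hit


def query_posts_from_db() -> list:
    """Hypothetical database query returning all posts."""
    stats["db_queries"] += 1
    return [{"id": 1, "title": "Hello"}, {"id": 2, "title": "World"}]


def get_posts() -> list:
    """Return the posts list, querying the database only on a cache miss."""
    cached = cache.get("posts:all")
    if cached is not None:
        return json.loads(cached)          # cache hit: no database access
    posts = query_posts_from_db()          # cache miss: query once
    cache["posts:all"] = json.dumps(posts) # store for all future requests
    return posts
```

The first call to `get_posts()` queries the database; every later call is answered from the cache until the entry is removed or replaced.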
My first question is, why would you ever write to a cache?
To reduce the amount of database connections and queries.
how is writing to these key-value stores going to help with updating the relational database?
It does not help you with updating; instead, it helps you spend fewer resources. As a very minor side effect, it also serves as a temporary backup of some data, but there are more suitable solutions for that (Redis is primarily an in-memory store, although it supports persistence).
Do these cache writing strategies only apply if your main database is also a key-value store?
No, it does not matter which database you use, whether it is a NoSQL or SQL DB. It strongly depends on what you want to cache and how the database and its tables are set up. Do your resources change frequently? Do resources get updated manually or only by user-initiated actions? Those are the questions that lead you to the right caching implementation.
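As a rough illustration that a key-value cache can sit alongside a relational database, here is a small sketch in which the application code writes to both on every update. Strictly speaking, in a textbook write-through setup a caching layer would forward the write to the database; in this sketch the application plays that role. The table, the `post:<id>` key format and the function names are assumptions for this example, and a dict stands in for Redis or Memcached.

```python
import sqlite3

db = sqlite3.connect(":memory:")   # relational database (in-memory only for the demo)
db.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT)")
kv_cache = {}                      # stand-in for a key-value cache


def save_post(post_id: int, title: str) -> None:
    """Update the relational database and the cache in the same operation."""
    db.execute(
        "INSERT OR REPLACE INTO posts (id, title) VALUES (?, ?)",
        (post_id, title),
    )
    db.commit()
    kv_cache[f"post:{post_id}"] = title


def load_post(post_id: int):
    """Read from the cache first and fall back to the database on a miss."""
    key = f"post:{post_id}"
    if key in kv_cache:
        return kv_cache[key]
    row = db.execute("SELECT title FROM posts WHERE id = ?", (post_id,)).fetchone()
    if row is None:
        return None
    kv_cache[key] = row[0]         # repopulate the cache after a miss
    return row[0]
```

The cache only ever sees keys and values; the mapping between a cache key and a table row lives entirely in the application code.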
Isn't it possible to mix and match caching strategies?
I am not an expert at caching strategies, but let me try:
I guess it is possible, but it also depends heavily on what you are doing in your DB and what kind of application you have. Once you know what kind of application you are building, you will know which strategy to use. I would also guess that mixing strategies is not recommended, because each strategy is coupled to your application type; in other words, it would probably not work out well.
What happens if you have a cluster of web servers? Do they each have their own local in-memory database that acts as a cache?
I guess both are possible. Usually you have one database, maybe clustered or synchronized with copies, to which your web servers (e.g. REST APIs) make their requests. Then either each of your API servers has its own cache, so that it does not need to query the database at all (in cloud-based applications your database may also be on a separate server, which means another "hop" in terms of networking), OR (what I can also imagine) you have another middleware layer between your (clustered) APIs and your (maybe also clustered) DB. But I guess that no one would do that because of the network traffic: it would result in a higher response time, which you usually want to avoid.
Could you implement a cache using a normal (not in-memory) database?
Yes, you could, but it would be much slower. A machine can access in-memory data faster than it can open another (local) connection to a database and query your cached entries. In addition, the database has to write the entries to files on your machine to persist the data.
Conclusion
All in all, it is all about fast response times and avoiding unnecessary network traffic. I hope I could help you out a little bit.
I'm working on a web app that receives data from an API provider. Now I need a way to cache the data to save from calling the provider again for the same data.
Then I stumbled on Redis which seems to serve my purpose but I'm not 100% clear about the concept of caching using Redis. I've checked their documentation but I don't really follow what they have to say.
Let's suppose I have just deployed my website to live and I have my first visitor, called A. Since A is the first person to visit, my website will request a new set of data from the API provider, and after a couple of seconds the page will be loaded with the data that A wanted.
My website caches this data to Redis to serve future visitors that will hit the same page.
Now I have my second visitor B.
B hits the same page url as A did and because my website has this data stored in the cache, B is served from the cache and will experience much faster loading time than what A has experienced.
Is my understanding in line with the concept of web caching?
I always thought of caching as being on a per-user basis, so that my interaction with a website has no influence whatsoever on other people, but Redis seems to work on a per-application basis.
Yes, your understanding of web caching is spot on, but it can get much more complex, depending on your use case. Redis is essentially a key-value store. So, if you want application-level caching, your theoretical key/value pair would look like this:
key: /path/to/my/page
value: <html><...whatever...></html>
If you want user-level caching, your theoretical key would just change slightly:
key: visitorA|/path/to/my/page
value: <html><...whatever...></html>
Make sense? Essentially, there would be a tag in the key to define the user (but it would generally be a hash or something, not a plain-text string).
There are Redis client libraries written for different web-development frameworks and content-management systems that define how they handle caching (i.e. user-specific or application-specific). If you are writing a custom web app, then you can choose application-level caching or user-level caching and do whatever else you want with caching.
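The two key shapes above could be built with something like the following sketch. The `page:` and `user:` prefixes, the function names and the choice of a truncated SHA-256 hash are just assumptions for illustration, not a convention of Redis or of any particular client library.

```python
import hashlib


def app_cache_key(path: str) -> str:
    """Key shared by every visitor of the same page (application-level caching)."""
    return f"page:{path}"


def user_cache_key(user_id: str, path: str) -> str:
    """Key scoped to one visitor; the user ID is hashed, not stored in plain text."""
    user_tag = hashlib.sha256(user_id.encode()).hexdigest()[:16]
    return f"user:{user_tag}|page:{path}"
```

With `app_cache_key`, visitors A and B read and write the same entry. With `user_cache_key`, each of them gets a separate entry for the same URL.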
It seems some web architects aim to have a stateless web application. Does that mean basically not storing user sessions? Or is there more to it?
If it is just the user session storing, what is the benefit of not doing that?
Reduces memory usage. Imagine if Google stored session information about every one of their users.
Easier to support server farms. If you need session data and you have more than 1 server, you need a way to sync that session data across servers. Normally this is done using a database.
Reduce session expiration problems. Sometimes expiring sessions cause issues that are hard to find and test for. Sessionless applications don't suffer from these.
Url linkability. Some sites store the ID of what the user is looking at in the sessions. This makes it impossible for users to simply copy and paste the URL or send it to friends.
NOTE: session data is really cached data. This is what it should be used for. If you have an expensive query which is going to be reused, then save it into session. Just remember that you cannot assume it will be there when you try and get it later. Always check if it exists before retrieving.
From a developer's perspective, statelessness can help make an application more maintainable and easier to work with. If I know a website I'm working on is stateless, I need not worry about things being correctly initialized in the session before loading a particular page.
From a user's perspective, statelessness allows resources to be linkable. If a page is stateless, then when I link a friend to that page, I know that they'll see what I'm seeing.
From the scaling and performance perspective, see tster's answer.
I'm starting to step into unfamiliar territory with regards to performance improvement and our RIA (Rich Internet Application) built with GWT. For those unfamiliar with GWT, essentially when deployed it's just pure JavaScript. We're interfacing with the server side using a REST-style XML web service via XMLHttpRequest.
Our XML is un-marshalled into JavaScript objects and used within the application to represent the data model behind the interface. When changes occur, the model is updated and marshalled back to XML and sent back to the server.
I've learned the number one rule of performance (in terms of user experience) is to make as few requests as possible. Obviously this brings up the possibility of caching. Caching is great for static data, but things get tricky in a multi-user system where data on the server may be changing. Also, using "Last-Modified" and "If-Modified-Since" doesn't quite do enough, since we'd like to avoid unnecessary requests altogether.
I'm trying to figure out if caching data in the browser is even right for us before researching the approaches. I hope someone has trodden this path before. I'm looking for similar approaches, lessons learned, things to avoid, etc.
I'm happy to provide more specific info if needed...
For GWT, if performance matters that much to you, you get better performance by sending all the data you need in a single request instead of making multiple small queries. I would recommend against client-side data caching, as there are lots of issues, like keeping the data in sync with the database.
Besides, you already have a good advantage with GWT over traditional HTML apps. Unless you are dealing with special data (e.g. data that does not become stale too quickly, which implies mostly-read queries), I have found that there is no special need for caching. You are better off with service-layer caching, since most of the time should be going into server-side processing.
If you can provide more details about the nature of the app, maybe some different conclusions can be taken.
With squid, we can cache webpages. I am not sure if it provides the same number of caching methods as ASP.NET caching (I primarily use ASP.NET), but it's a tool to cache webpages.
Then we have memcached, which can cache database tables. I believe this is correct, and it is like SqlCacheDependency (correct me if I am wrong).
However, is there any situation in a large web application where one would find room to use memcached, squid, AND ASP.NET (or PHP, JSP - application framework-level) caching?
Thanks!
You may find that caching entire pages is too coarsely-grained, and caching database tables doesn't get you enough of a boost, leaving a big need for caching chunks of stuff.
Say, for example, you had an application that showed the name of the logged-in user on every page. Caching entire pages wouldn't really work, so you need to drop down a level and cache somewhere within the app framework.
Then we have memcached, which can cache database tables. I believe this is correct, and it is like SqlCacheDependency (correct me if I am wrong).
Memcached is a distributed hashtable. The main benefits over the built-in .NET caching are that your cache is scalable (you can add as many memcached boxen as you want) and synchronized (all your web servers have access to the same stuff, and invalidating or updating data from one web server is instantly propagated to all of them).
Performance-wise, it is worse than the .NET cache (you are looking up objects across servers, as opposed to an in-memory lookup on one machine).
However, is there any situation in a large web application where one would find room to use memcached, squid, AND ASP.NET (or PHP, JSP - application framework-level) caching?
For the reasons above, I can imagine a 2-level cache, using the .NET cache first, then memcached. (e.g. a Get() looks at memcached, stores the result in the .NET cache set to expire in 10 seconds, then uses the .NET cache for all the get calls with the same cache key during the next 10 seconds, rinse, repeat)
This way, you get the performance of the in-memory cache lookup without the network IO cost of a pure memcached solution, with the synchronization and scalability benefits of memcached.
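A minimal sketch of that 2-level idea, written in Python rather than .NET purely for illustration. The `TwoLevelCache` class and its parameters are hypothetical; `remote` is assumed to be any object with a `get(key)` method, standing in for a memcached client, and the local dict plays the role of the in-process .NET cache.

```python
import time


class TwoLevelCache:
    """Short-lived local cache in front of a shared remote cache."""

    def __init__(self, remote, local_ttl=10.0, clock=time.monotonic):
        self.remote = remote        # shared cache; stands in for a memcached client
        self.local_ttl = local_ttl  # seconds a local copy is trusted
        self.clock = clock          # injectable so the expiry can be tested
        self.local = {}             # key -> (expires_at, value)

    def get(self, key):
        entry = self.local.get(key)
        if entry is not None and entry[0] > self.clock():
            return entry[1]         # fresh local hit: no network round trip
        value = self.remote.get(key)  # local miss or expired: ask the shared cache
        if value is not None:
            self.local[key] = (self.clock() + self.local_ttl, value)
        return value
```

The trade-off is visible in the TTL: a web server can serve a value that is up to `local_ttl` seconds out of date after another server has updated the shared cache.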