Is a CDN an appropriate solution to cache query results? - ajax

I'm working on a little search engine, where I'm trying to find out how to cache query results.
These results are simple JSON text, retrieved using an ajax request.
Storing results in memory is not an option, I can see two options remaining:
Use a nosql database to retrieve cached results.
Store results on a CDN and redirect the http request (307 - Temporary Redirect) in case the result was already cached.
However, I don't have much experience with CDN, and wonder if using it for a huge amount of temporary small text files is a good practice.
Is it a good practice to use redirection on an ajax request?
Is a CDN an appropriate solution to cache small text files?

Short answer: no.
Long: Usually, you use a CDN for large static files that you want the CDN to mirror all around the world so it's close to a user when she requests them. When you have data that changes a lot, it would always take a while to propagate the changes to all nodes of the CDN, in the meantime users get inconsistent results (this may or may not matter to you).
Also, to avoid higher latency I wouldn't use an HTTP redirect (where you tell the client to make a second request to somewhere else) but rather figure out whether to get the data from the cache or the engine on your end (e.g. using a caching proxy or a load balancer) and then serve it directly to the client.

Related

Incremental updates using browser cache

The client (an AngularJS application) gets rather big lists from the server. The lists may have hundreds or thousands of elements, which can mean a few megabytes uncompressed (and some users (admins) get much more data).
I'm not planning to let the client get partial results as sorting and filtering should not bother the server.
Compression works fine (factor of about 10) and as the lists don't change often, 304 NOT MODIFIED helps a lot, too. But another important optimization is missing:
As a typical change of the lists are rather small (e.g., modifying two elements and adding a new one), transferring the changes only sounds like a good idea. I wonder how to do it properly.
Something like GET /offer/123/items should always return all the items in the offer number 123, right? Compression and 304 can be used here, but no incremental update. A request like GET /offer/123/items?since=1495765733 sounds like the way to go, but then browser caching does not get used:
either nothing has changed and the answer is empty (and caching it makes no sense)
or something has changed, the client updates its state and does never ask for changes since 1495765733 anymore (and caching it makes even less sense)
Obviously, when using the "since" query, nothing will be cached for the "resource" (the original query gets used just once or not at all).
So I can't rely on the browser cache and I can only use localStorage or sessionStorage, which have a few downsides:
it's limited to a few megabytes (the browser HTTP cache may be much bigger and gets handled automatically)
I have to implement some replacement strategy when I hit the limit
the browser cache stores already compressed data which I don't get (I'd have to re-compress them)
it doesn't work for the users (admins) getting bigger lists as even a single list may already be over limit
it gets emptied on logout (a customer's requirement)
Given that there's HTML 5 and HTTP 2.0, that's pretty unsatisfactory. What am I missing?
Is it possible to use the browser HTTP cache together with incremental updates?
I think there is one thing you are missing: in short, headers. What I'm thinking you could do and that would match (most) of your requirements, would be to:
First GET /offer/123/items is done normally, nothing special.
Subsequents GET /offer/123/items will be sent with a Fetched-At: 1495765733 header, indicating your server when the initial request has been sent.
From this point on, two scenarios are possible.
Either there is no change, and you can send the 304.
If there is a change however, return the new items since the time stamp previously sent has headers, but set a Cache-Control: no-cache from your response.
This leaves you to the point where you can have incremental updates, with caching of the initial megabytes-sized elements.
There is still one drawback though, that the caching is only done once, it won't cache updates. You said that your lists are not updated often so it might already work for you, but if you really want to push this further, I could think of one more thing.
Upon receiving an incremental update, you could trigger in the background another request without the Fetched-At header that won't be used at all by your application, but will just be there to update your http cache. It should not be as bad as it sounds performance-wise since your framework won't update its data with the new one (and potentially trigger re-renders), the only notable drawback would be in term of network and memory consumption. On mobile it might be problematic, but it doesn't sounds like an app intended to be displayed on them anyway.
I absolutely don't know your use-case and will just throw that out there, but are you really sure that doing some sort of pagination won't work? Megabytes of data sounds a lot to display and process for normal humans ;)
I would ditch the request/response cycle entirely and move to a push model.
Specifically, WebSockets.
This is the standard technology used on financial trading websites serving tables of real-time ticker data. Here is one such production application demonstrating the power of WebSockets:
https://www.poloniex.com/exchange#btc_eth
WebSocket applications have two types of state: global and user. The above link will show three tables of global data. When you're logged in, two aditional tables of user data are displayed at the bottom.
This is not HTTP; you won't be able to just slap this into a Java Servlet. You'll need to run a separate process on your server which communicates over TCP. The good news is, there are mature solutions readily available. A Java-based solution with a very decent free licensing option, which includes both client and server APIs (and does integrate with Angular2) is Lightstreamer. They have a well-organized demo page too. There are also adapters available to integrate with your data sources.
You may be hesitant to ditch your existing servlet approach, but this will be less headaches in the long run, and scales marvelously. HTTP polling, even with well-designed header-only requests, do not scale well with large lists which update frequently.
---------- EDIT ----------
Since the list updates are infrequent, WebSockets are probably overkill. Based on the further details provided by comments on this answer, I would recommend a DOM-based, AJAX-updated sorter and filterer such as DataTables, which has some built-in options for caching. In order to reuse client data across sessions, ajax requests in the previous link should be modified to save the current data in the table to localStorage after every ajax request, and when the client starts a new session, populate the table with this data. This will allow the plugin to manage the filtering, sorting, caching and browser-based persistence.
I'm thinking about something similar to Aperçu's idea, but using two requests. The idea is yet incomplete, so bear with me...
The client asks for GET /offer/123/items, possibly with the ETag and Fetched-At headers.
The server answers with
200 and a full list if either header is missing, or when there are too many changes since the Fetched-At timestamp
304 if nothing has changed since then
304 and a special Fetch-More header telling the client that more data is to be fetched otherwise
The last case is violating how HTTP should work, but AFAIK it's the only way letting the browser cache everything what I want it to cache. Since the whole communication is encrypted, proxies can't punish me for violating the spec.
The client reacts to Fetch-Errata by requesting GET /offer/123/items/errata. This way, the resource has got split into two requests. The split is ugly, but an angular $http interceptor can hide the ugliness from the application.
The second request is cacheable, too, and there can be also a Fetched-At header. The details are unclear, but some strong handwavium makes me believe that it can work. Actually, the errata could itself be inaccurate but still useful and get an errata itself.... etc.
With HTTP/1.1, more requests may mean more latency, but having a couple of them should still be profitable because of the saved bandwidth. The server can decide when to stop.
With HTTP/2, multiple requests could be send at once. The server could be make to handle them efficiently as it knows that they belong together. Some more handwavium...
I find the idea strange, but interesting and I'm looking forward to comments. Feel free to downvote me, but please leave an explanation.

Should requests contain unnecessary parameters which are sent if manually browsing the application

I'm currently testing a asp.net application. I have recorded all the steps i need and i have noticed that if i remove some of the parameters that i'm sending with the request the scripts still work and the desired outcome still happens. Anyway i couldn't find difference in the response time with them or without them, and i was wondering can i remove those parameters which are not needed and is this going to impact the performance in any way? I understand that the most realistic way of executing the scripts should be to do it like a normal user does (send all which is sent with normal usage) but this would really improve the readability of my scripts, any idea?
Thank you in advance and here is a picture which shows for example some parameters which i can remove and the scripts still work this is from a document management system and i'm performing step which doesn't direct the document as the parameters say but the normal usage records those :
Although it may be something very trivial like pre-populating date and time in calendar in user's time zone I believe you shouldn't be omitting any request parameters.
I strongly believe that load testing should mimic real user as close as possible so if it is not a big deal to send these extra parameters and perform their correlation - I would leave them.
Few other tips:
Embedded Resources (scripts, styles, images). Real-browsers download these entities so
Make sure you have "Retrieve All Embedded Resources" box checked
Make sure you "Use concurrent pool" size 3-5 threads
Filter out any "external" stuff via "URLs must match" input
Well-behaved browsers download embedded resources but do it only once. On subsequent requests they're being returned from browser's cache. Add HTTP Cache Manager to your Test Plan to simulate browser cache.
Add HTTP Cookie Manager to represent browser cookies and deal with cookie-based authentication.
See How To Make JMeter Behave More Like A Real Browser article for above tips explained just in case you want to dive into details
Less data to send, faster response time (normally).
Like you said, it's more realistic to test with all data from the recorded case, but if these parameters really doesn't impact your result and measured time, you can remove them for a better readability.
Sometimes jmeter records not necessary parameters because they are only needed for brower compability.

Large number of concurrent ajax calls and ways to deal with it

I have a web page which, upon loading, needs to do a lot of JSON fetches from the server to populate various things dynamically. In particular, it updates parts of a large-ish data structure from which I derive a graphical representation of the data.
So it works great in Chrome; however, Safari and Firefox appear to suffer somewhat. Upon the querying of the numerous JSON requests, the browsers become sluggish and unusable. I am under the assumption that this is due to the rather expensive iteration of said data structure. Is this a valid assumption?
How can I mitigate this without changing the query language so that it's a single fetch?
I was thinking of applying a queue that could limit the number of concurrent Ajax queries (and hence also limit the number of concurrent updates to the data structure)... Any thoughts? Useful pointers? Other suggestions?
In browser-side JS, create a wrapper around jQuery.post() (or whichever method you are using)
that appends the requests to a queue.
Also create a function 'queue_send' that will actually call jQuery.post() passing the entire queue structure.
On server create a proxy function called 'queue_receive' that replays the JSON to your server interfaces as though it came from the browser, collects the results into a single response, sends back to browser.
Browser-side queue_send_success() (success handler for queue_send) must decode this response and populate your data structure.
With this, you should be able to reduce your initialization traffic to one actual request, and maybe consolidate some other requests on your website as well.
in particular, it updates parts of a largish data structure from which i derive a graphical representation of the data.
I'd try:
Queuing responses as they come in, then update the structure once
Hiding the representation invisible until the responses are in
Magicianeer's answer is also good - but I'm not sure if it fits your definition of "without changing the query language so that it's a single fetch" - it would avoid re-engineering existing logic.

Combining apache requests?

I have an ajax client that often needs to retrieve 3-10 static documents from the server. Those 3-10 documents are selected by the client from roughly 100 documents in total. I have no way of knowing in advance which 3-10 documents the client will require.
Additionally, those 100 documents are generated from database content, and are dynamic.
It does not seem logical though to make a single ajax requests per document.
My first thought was to write a JSP that makes use of the include action.
ie in pseudo code:
for (param in params){
jsp:include page="[param]"
}
Tomcat can't support this as it doesn't just include the html resource, it re-compiles it, generating a class file each time, which also seems costly .
Can the community provide a solution for combining apache requests to static files to make use of single requests, rather than several, but without the overhead of extra class files for each of the static files and in way that avoids the regeneration each time the static file changes?
You can certainly write a Servlet (or JSP if you're really insistent on doing so) which will serve the contents of a set of documents.
However, doing so would encounter some of the following issues
How do you delimit the documents so that the clients know which is which
How to stop the clients from requesting something they don't have permission to (this one is tricky)

How do I get around the Twitter API caching problem?

I'm building a Twitter app that requires to check user data somewhat frequently, but I'm facing trouble with a cache that's oddly on Twitter's side, not mine.
Try the following user:
users/show in XML: http://twitter.com/users/show.xml?screen_name=technolocus
users/show in JSON: http://twitter.com/users/show.json?screen_name=technolocus
normal page: http://twitter.com/technolocus
All these methods of accessing data should return the same values, right? Check the statuses_count for each of them.
XML: 12548
JSON: 12513
normal: 12498
The normal method (i.e. just visiting the profile non-programatically) serves up the most correct value of 12498. If I post or delete tweets to this account, it gets updated on the profile page instantly, but the XML and JSON methods still return cached data.
At this point, the values of the XML and JSON methods are 12 to 18 hours old respectively.
I first tried to access these methods from my website (hosted on Dreamhost). I thought it was Dreamhost caching the responses. Then I tried to access the API directly from my browser. I did a cURL from the command line from my machine after that. It wasn't dreamhost. I thought it was probably my ISP (I think they use NetApp or something like that). Then I asked a friend in another corner of India to try it. He's getting the exact same cached responses as I am.
So it isn't Dreamhost's cache; it isn't my ISP or my country's cache. There's only one conclusion - Twitter is caching responses.
How in the heavens do I get around this?!?
Forgot to mention this: The script on the server is in PHP and is using cURL to retrieve the XML and JSON data from Twitter, while the local tests have been just using the browser. Both have the exact same result!
First, I think you should report this a a bug to Twitter. I see the same discrepancy as you, and no matter what that seems like a bug. Even if they're caching, I'd expect that a cache on their side would store an abstract form that would then be rendered into HTML, JSON, and XML. I wonder if what's actually going on is that these requests are performing similar but different queries.
Are you sure that the values are "old"? For example, did you actually delete about 50 updates recently (since you say the HTML one is newest but shows a lower count than the other two)? If you create another update do you see the HTML number increment while the other numbers stay the same, or do they all increment simultaneously?
If what you are saying is accurate, and it probably is, generally, you can't get around it. Twitter would want to be caching its responses since they are costly to reproduce every single time.
When you use Twitter's APIs, you end up being bound by its conventions, even if that includes caching.
Your best bet is to tweet to #twitterapi and get them to give you a response as to why the two representations are divergent.
Add ?blah=xxxx to all urls.
I don't develop anything against twitter and ocassionaly manually "follow" three tweets by going to them in my browser. They always lag behind by half a day. I add ?asdsadsadsad to the url (everytime something different) and it always updates. I don't know what Twitter is doing here and came here while searching for the problem. But I guess this trick of appending a random value to the url via GET will probably work for your api requests, too.

Resources