Automatically rebuild cache - performance

I run a Symfony 1.4 project with a very large amount of data. The main page and the category pages use pagers which need to know how many rows are available. I'm passing a query containing joins to the pager, which leads to a loading time of about 1 minute on these pages.
I configured cache.yml for the respective actions, but I think this workaround is insufficient. Here are my assumptions:
Symfony rebuilds the cache within a single request, which is made by a user. Let's call this user the "cache-victim" to simplify things.
In our case, the data needs to be reasonably up-to-date - a lifetime of 10 minutes would be sufficient. Obviously, the cache won't be rebuilt if no user is willing to be the "cache-victim" and instead just cancels the request. Are these assumptions correct?
So, I came up with this idea:
Symfony should rebuild the cache by faking the HTTP request itself. The new cache entries should be written to a temporary file/directory and swapped in for the previous cache entries as soon as the rebuild has finished.
Is this possible?
In my opinion, this is similar to the concept of double buffering.
Wouldn't it be silly if there were a single "gpu-victim" in a multiplayer game who saw the screen building up line by line? (This is a lopsided comparison, I know ... ;) )
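To make the idea concrete, here is a minimal sketch of the swap step, assuming a purely file-based cache whose directory can be replaced as a whole (the paths and the warm-up step are made up, not symfony API):

$liveDir  = '/var/www/project/cache/frontend/prod';
$buildDir = $liveDir . '.building';
$oldDir   = $liveDir . '.old';

// 1. Rebuild the expensive pages into $buildDir (e.g. by faking the HTTP
//    requests from a cron task); users keep being served from $liveDir.
// 2. Swap the directories once the rebuild has finished.
rename($liveDir, $oldDir);
rename($buildDir, $liveDir);

// 3. Throw away the previous generation.
exec('rm -rf ' . escapeshellarg($oldDir));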
Edit
There is no "cache-victim" - Every 10 minutes page reloading takes 1 minute for every user.

I think your problem is due to some missing or wrong indexes. I have a sf1.4 project for a large soccer site (i.e. 2M pages/day) and pagers aren't that slow even though our database has more than 1M rows these days. Take a look at your query with EXPLAIN and check where it is going bad...
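If the query is built with Doctrine, one quick way to get at the generated SQL so you can EXPLAIN it in the MySQL console (the model and join names below are only placeholders for whatever your pager query looks like):

$q = Doctrine_Query::create()
  ->from('Article a')
  ->leftJoin('a.Category c')
  ->leftJoin('a.Tags t');

// Prints the raw SQL (with ? placeholders); paste it into the MySQL
// console prefixed with EXPLAIN and look at the chosen indexes.
echo $q->getSqlQuery();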

Sorry for necromancing (is there a badge for that?).
By configuring cache.yml you are just caching the view layer of your app (that is, css, js and html) for REQUESTS WITHOUT PARAMETERS. Navigating the pager obviously has a ?page=X on the GET request.
Taken from symfony 1.4 config.yml documentation:
An incoming request with GET parameters in the query string or submitted with the POST, PUT, or DELETE method will never be cached by symfony, regardless of the configuration.
http://www.symfony-project.org/reference/1_4/en/09-Cache
What might help you is to cache the database results, but it's a painful process in symfony/doctrine. Refer to:
http://www.symfony-project.org/more-with-symfony/1_4/en/08-Advanced-Doctrine-Usage#chapter_08_using_doctrine_result_caching
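A minimal sketch of what that chapter describes, assuming APC is available (the model names, the page parameter and the 10-minute lifetime are just examples; check the exact configuration hook against the chapter):

// config/ProjectConfiguration.class.php - register a result cache driver
public function configureDoctrine(Doctrine_Manager $manager)
{
  $manager->setAttribute(Doctrine_Core::ATTR_RESULT_CACHE, new Doctrine_Cache_Apc());
  $manager->setAttribute(Doctrine_Core::ATTR_RESULT_CACHE_LIFESPAN, 600);
}

// In the action: cache the expensive joined query for 10 minutes.
$page = $request->getParameter('page', 1);
$q = Doctrine_Query::create()
  ->from('Article a')
  ->leftJoin('a.Category c')
  ->useResultCache(true, 600, 'category_page_' . $page);

$pager = new sfDoctrinePager('Article', 20);
$pager->setQuery($q);
$pager->setPage($page);
$pager->init();

Whether the pager's separate count query also benefits from the result cache depends on how it is built, so it is worth verifying with the web debug toolbar.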
Edit:
This might help you as well:
http://www.zalas.eu/symfony-meets-apc-alternative-php-cache

Related

Incremental updates using browser cache

The client (an AngularJS application) gets rather big lists from the server. The lists may have hundreds or thousands of elements, which can mean a few megabytes uncompressed (and some users (admins) get much more data).
I'm not planning to let the client get partial results as sorting and filtering should not bother the server.
Compression works fine (factor of about 10) and as the lists don't change often, 304 NOT MODIFIED helps a lot, too. But another important optimization is missing:
As a typical change to the lists is rather small (e.g., modifying two elements and adding a new one), transferring only the changes sounds like a good idea. I wonder how to do it properly.
Something like GET /offer/123/items should always return all the items in the offer number 123, right? Compression and 304 can be used here, but no incremental update. A request like GET /offer/123/items?since=1495765733 sounds like the way to go, but then browser caching does not get used:
either nothing has changed and the answer is empty (and caching it makes no sense)
or something has changed, the client updates its state and never asks for changes since 1495765733 again (and caching it makes even less sense)
Obviously, when using the "since" query, nothing will be cached for the "resource" (the original query gets used just once or not at all).
So I can't rely on the browser cache and I can only use localStorage or sessionStorage, which have a few downsides:
it's limited to a few megabytes (the browser HTTP cache may be much bigger and gets handled automatically)
I have to implement some replacement strategy when I hit the limit
the browser cache stores the data already compressed, which I don't get here (I'd have to re-compress it myself)
it doesn't work for the users (admins) getting bigger lists as even a single list may already be over limit
it gets emptied on logout (a customer's requirement)
Given that there's HTML5 and HTTP/2, that's pretty unsatisfactory. What am I missing?
Is it possible to use the browser HTTP cache together with incremental updates?
I think there is one thing you are missing: in short, headers. What I'm thinking you could do, and what would match most of your requirements, would be this:
First GET /offer/123/items is done normally, nothing special.
Subsequent GET /offer/123/items requests will be sent with a Fetched-At: 1495765733 header, telling your server when the initial request was made.
From this point on, two scenarios are possible.
Either there is no change, and you can send the 304.
If there is a change, however, return the new items since the timestamp previously sent in the header, but set Cache-Control: no-cache on your response.
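Just to illustrate the header logic on the server side - written as plain PHP here although the real backend could be anything, and load_all_items()/items_changed_since()/load_items_changed_since() are hypothetical helpers:

$fetchedAt = isset($_SERVER['HTTP_FETCHED_AT']) ? (int) $_SERVER['HTTP_FETCHED_AT'] : null;

if ($fetchedAt === null) {
    // Initial request: full list, cacheable as usual.
    header('Cache-Control: private, max-age=600');
    echo json_encode(load_all_items(123));
} elseif (!items_changed_since(123, $fetchedAt)) {
    // Nothing changed since the client fetched the list.
    http_response_code(304);
} else {
    // Incremental answer: it only makes sense once, so keep it out of the cache.
    header('Cache-Control: no-cache');
    echo json_encode(load_items_changed_since(123, $fetchedAt));
}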
This gets you to the point where you can have incremental updates, with caching of the initial megabytes-sized payload.
There is still one drawback though: the caching is only done once, it won't cache updates. You said that your lists are not updated often, so it might already work for you, but if you really want to push this further, I can think of one more thing.
Upon receiving an incremental update, you could trigger another request in the background, without the Fetched-At header, that won't be used at all by your application but will just be there to refresh your HTTP cache. It should not be as bad as it sounds performance-wise, since your framework won't update its data with the new one (and potentially trigger re-renders); the only notable drawback would be in terms of network and memory consumption. On mobile it might be problematic, but it doesn't sound like an app intended to be used on mobile anyway.
I absolutely don't know your use case and will just throw this out there, but are you really sure that some sort of pagination won't work? Megabytes of data sounds like a lot for normal humans to display and process ;)
I would ditch the request/response cycle entirely and move to a push model.
Specifically, WebSockets.
This is the standard technology used on financial trading websites serving tables of real-time ticker data. Here is one such production application demonstrating the power of WebSockets:
https://www.poloniex.com/exchange#btc_eth
WebSocket applications have two types of state: global and user. The above link will show three tables of global data. When you're logged in, two additional tables of user data are displayed at the bottom.
This is not HTTP; you won't be able to just slap this into a Java Servlet. You'll need to run a separate process on your server which communicates over TCP. The good news is, there are mature solutions readily available. A Java-based solution with a very decent free licensing option, which includes both client and server APIs (and does integrate with Angular2) is Lightstreamer. They have a well-organized demo page too. There are also adapters available to integrate with your data sources.
You may be hesitant to ditch your existing servlet approach, but this will mean fewer headaches in the long run, and it scales marvelously. HTTP polling, even with well-designed header-only requests, does not scale well with large lists that update frequently.
---------- EDIT ----------
Since the list updates are infrequent, WebSockets are probably overkill. Based on the further details provided by comments on this answer, I would recommend a DOM-based, AJAX-updated sorter and filterer such as DataTables, which has some built-in options for caching. In order to reuse client data across sessions, ajax requests in the previous link should be modified to save the current data in the table to localStorage after every ajax request, and when the client starts a new session, populate the table with this data. This will allow the plugin to manage the filtering, sorting, caching and browser-based persistence.
I'm thinking about something similar to Aperçu's idea, but using two requests. The idea is still incomplete, so bear with me...
The client asks for GET /offer/123/items, possibly with the ETag and Fetched-At headers.
The server answers with
200 and a full list if either header is missing, or when there are too many changes since the Fetched-At timestamp
304 if nothing has changed since then
304 and a special Fetch-More header telling the client that more data is to be fetched otherwise
The last case violates how HTTP should work, but AFAIK it's the only way to let the browser cache everything I want it to cache. Since the whole communication is encrypted, proxies can't punish me for violating the spec.
The client reacts to Fetch-More by requesting GET /offer/123/items/errata. This way, the resource gets split into two requests. The split is ugly, but an angular $http interceptor can hide the ugliness from the application.
The second request is cacheable, too, and it can also carry a Fetched-At header. The details are unclear, but some strong handwavium makes me believe it can work. Actually, the errata could itself be inaccurate but still useful, and get an errata of its own... etc.
With HTTP/1.1, more requests may mean more latency, but having a couple of them should still be profitable because of the saved bandwidth. The server can decide when to stop.
With HTTP/2, multiple requests could be sent at once. The server could be made to handle them efficiently, as it knows that they belong together. Some more handwavium...
I find the idea strange, but interesting and I'm looking forward to comments. Feel free to downvote me, but please leave an explanation.

Parse Server - Saving objects with many fields - Schema Validation takes too long (enforceFieldExists)

We're using Parse Server to migrate a Cloud Code based application to Heroku.
Using versions:
parse#1.8.5
parse-server#2.2.16
We noticed (it's hard not to notice) that saving some objects is unreasonably slow. These objects are typically saved a few at a time - between 2 and 6 objects - using a Parse.Object.saveAll, which fires a REST call to /1/batch.
Saving each of these objects now takes anything between 4 and 12 seconds. Digging into the Parse code, it was easy to see that schema validation is the cause:
SchemaController.validateObject() {
  ...
  SchemaController.enforceFieldExists()
}
We are using triggers for simple validation, but as per the logic in RestWrite.js this causes schema validation to be executed twice - once before the trigger and once after.
The problem lies in that our collection has about 40 fields. SchemaController.enforceFieldExists() loads the entire schema twice while attempting to validate each field. Moreover, it always attempts to write to the schema document (again, for each field), only to fail usually because all fields are already listed in the schema.
This means we get an overhead of about 240 round trips to the database for each object, and we typically save up to 5 objects in each invocation. That adds up to over 1,000 round trips to the database, so we easily exceed the Heroku router timeout limit of 30 seconds.
My questions are:
Is there anything I can do to speed up this validation? (did not find documentation or settings for that)
Is there a fix for this redundant implementation planned or available anywhere?
Can I safely castrate enforceFieldExists() to do nothing, without anything else breaking on me, assuming we don't add fields often? What is this collection (_SCHEMA) used for other than to draw the tables in the Dashboard UI?
I'm currently thinking about patching this function to do nothing with an npm postinstall script. Does that sound like a good approach?
Appreciate any help on this,
Ron
This is being fixed by this pull request:
https://github.com/ParsePlatform/parse-server/pull/2286
and specifically this line:
https://github.com/ParsePlatform/parse-server/pull/2286/files#diff-7d0dd667d7bdafd6ebee06cf70139fa0R555
It skips trying to write to the schema if the current field is already present.
This should be released soon.

Varnish: purge cache every time user hits "like" button

I need to implement like/dislike functionality (for anonymous users, so there is no need to sign up). The problem is that content is served by Varnish and I need to display the actual number of likes.
I'm wondering how it's done on websites like Stack Overflow. Assuming pages are cached in Varnish (for anonymous users only), every time a user votes on an answer/question the page needs to be purged from the cache. Am I right? The current number of votes needs to be visible to other users.
What is a good approach in this situation? Should I send a PURGE to Varnish every time a user hits the "like" button?
A common way of implementing this is to handle the like button and the count display client-side in JavaScript instead. This sidesteps the issue somewhat.
Assuming that pressing Like leads to a POST request hitting a single Varnish server, you can make the object be invalidated/replaced in different ways. Using purge and a VCL restart is most likely the better way to do this.
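A minimal sketch of the purge from the application side, assuming your VCL has a vcl_recv rule that accepts PURGE requests from the application servers (the URL and host are placeholders):

function purge_from_varnish($url)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_CUSTOMREQUEST, 'PURGE');
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 2);
    curl_exec($ch);
    curl_close($ch);
}

// After the vote has been stored:
purge_from_varnish('http://varnish-host/questions/12345');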
Of course there is a slight race here, where other clients will be served the old page while this is ongoing.

Codeigniter cache page

I have a site developed in CodeIgniter.
On the search page I have a form; when I fill it in and submit it, I send a request to a server with cURL and it returns an XML document.
This query plus rendering the results takes about 15 seconds, because I have to query many servers, and this time is unavoidable.
But here is the problem: I have a list of elements, and when I click on an element I make a query to retrieve that element's data.
But if I click back, or click to go back to all the searched elements, I don't want to run another query that takes 15 seconds.
When I search for the elements I make a GET request and end up with a URL like this:
http://myurl/backend/hotel/hotel_list?nation=94&city=1007&check-in=12%2FApr%2F2013&check-out=13%2FApr%2F2013&n_single_rooms=1&n_double_rooms=0&n_triple_rooms=0&n_extra_beds=0
I load the page and it can contain several elements. I click on one of them via a simple link like this:
http://myurl/backend/hotel/hotel_view?id_service=tra_0_YYW
From this page I need to be able to go back to the previous URL (the first one) without re-running the query that takes so long.
I can't cache the results permanently because it's a real-time database that changes every minute or even every second, but I thought about caching the search page when I enter it and, if I go back to it, reloading it from the cache if it is less than, say, 2 minutes old.
Is this a good way, or is there a more performant way to do this in CodeIgniter?
I can't put it in the session because the data is large.
The other solutions are:
- cache the page (but I have to delete it every minute)
- cache the result (but I have to delete it every minute)
- use session flashdata (but I have a large amount of data)
Is there a way, using the browser, to avoid rebuilding the page when I go back?
Thanks
cache the page (but I have to delete it every minute)
I think you can easily implement this with CodeIgniter's page caching function "$this->output->cache(1);" (see the sketch below).
cache the result (but I have to delete it every minute)
You would use CodeIgniter's object caching (the Cache driver) to implement this; it is also shown in the sketch below.
use session flashdata (but I have a large amount of data)
It's not a good idea to save huge data in the session. Rather, use database sessions instead, which let you handle it in a similar way, and CodeIgniter has integrated support for them.
Hope this helps. You can read more about all the kinds of CodeIgniter caching if you are just getting started with it.
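A rough sketch of both caching options in one controller (CodeIgniter 2.x style; the cache key, the lifetimes and fetch_results_from_remote() are placeholders, and you would want to fold the search parameters into the key):

class Hotel extends CI_Controller
{
    public function hotel_list()
    {
        // Option 1: page caching - cache the rendered output for 1 minute.
        $this->output->cache(1); // argument is in minutes

        // Option 2: object caching - cache the slow cURL/XML result itself.
        $this->load->driver('cache', array('adapter' => 'file'));
        if (!$results = $this->cache->get('hotel_search_results')) {
            $results = $this->fetch_results_from_remote(); // the 15-second lookup
            $this->cache->save('hotel_search_results', $results, 60); // seconds
        }

        $this->load->view('hotel_list', array('results' => $results));
    }
}

Also check how your CodeIgniter version keys the page cache when query strings are involved, since the search URLs rely entirely on GET parameters.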

Invalidating cached category pages (page1, page2 etc.) when new post is added?

Let's imagine that we have a blog with category A. Category A currently has 1000 posts spread over 100 pages. All the pages are cached as files (for example, by the Smarty template engine). I'm adding a post and want it to be displayed on the first page immediately. So, I have to clear or invalidate the cache for all 100 pages of category A.
Deleting the cached pages is not a good idea because there can be too many files (for example, thousands of pages). I think that invalidating the cache and regenerating each page on request is a much more efficient way.
My only thought is to add the number of posts in the category to the cache id. So, first we get the number of posts in the category (for example, from memcache) and then check whether the cached version is valid against this number.
Everything looks fine and simple. But let's imagine a situation where I add a new post and then, 1 minute later, remove another (older) post. The number of posts is still 1000, and some category pages will stay stale (if they were not viewed during that 1 minute).
What is the solution?
PS: Sorry for my English, but I think my question will be clear to people who have already faced such a problem.
Thank you
The number of posts is not a good solution, because when you edit a post you would want to refresh the cache as well.
A couple of strategies I can think of:
Use the time when a change was made as a reference.
When a post is added (removed, edited), store the current timestamp on the category; let's call it cache_threshold. When a page is requested, check when that page was cached. If it is older than our threshold, the page needs to be regenerated.
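A rough sketch of that idea with Memcached and Smarty 3 (the key and template names are invented); here the threshold is folded into the cache id, so any change in the category automatically makes every page of that category miss the cache:

// When a post in category 5 is added, removed or edited:
$memcached = new Memcached();
$memcached->addServer('127.0.0.1', 11211);
$memcached->set('category_5_cache_threshold', time());

// When page 2 of category 5 is requested:
$threshold = (int) $memcached->get('category_5_cache_threshold');
$cacheId   = 'category_5_page_2|' . $threshold;

$smarty = new Smarty();
$smarty->setCaching(Smarty::CACHING_LIFETIME_CURRENT);

if (!$smarty->isCached('category.tpl', $cacheId)) {
    // ... run the queries and assign() the posts only on a cache miss ...
}
$smarty->display('category.tpl', $cacheId);

Pages cached under an old threshold are simply never requested again and expire with the normal cache lifetime.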
Switch to object caching rather than page caching.
Instead of caching whole pages, you can cache each individual post. When a post is added (removed, edited) you just regenerate its cache immediately, as that is not time-consuming. To display a page you then only need to grab the required number of cached posts and render them.
This solution requires more work but it is more flexible and effective.
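And a short sketch of the object-caching variant, reusing the Memcached connection from the sketch above (get_post_ids_for_page() and the key scheme are hypothetical):

// Whenever a post is added or edited, refresh just that one entry
// ($postId / $postData come from wherever the post was just saved):
$memcached->set('post_' . $postId, $postData);

// To render page 2 of category 5: fetch the ids for that page from the
// database (cheap), then pull the posts themselves from the cache.
$ids   = get_post_ids_for_page(5, 2, 10);
$keys  = array_map(function ($id) { return 'post_' . $id; }, $ids);
$posts = $memcached->getMulti($keys);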
