Trouble getting Varnish to cache

We are hosting a major tennis tournament website and are trying to use Varnish on Rackspace to help with the traffic we anticipate. We have hired two systems consultants to help install Varnish on our cloud servers, but for whatever reason they are not able to get Varnish to work with our scripts. A typical script can be found here:
162.242.140.232/scoring/DemoGetOOP.php
There is nothing special about this script. It doesn't have any special caching commands in the headers and doesn't use session control. You can see by the date/time we added at the bottom for testing purposes that the page is not being cached. We set up a timer page which is cached:
162.242.140.232/scoring/timer.php
and also an info.php page at:
162.242.140.232/scoring/info.php
What's odd is that if you first go to timer.php, you can see it's cached for 10 seconds. However, if you then run our DemoGetOOP.php script and go back to timer.php, it's no longer cached. We have to clear the cache again or open a private browser window to see the caching. We have tried matching the URL in our VCL with both
if (req.url ~ "^/scoring/DemoGetOOP.php")
and
if (req.url ~ "/scoring/DemoGetOOP.php")
any help would be greatly appreciated!

First of all, I would start by setting correct cache headers; I would prefer the Cache-Control header. The DemoGetOOP script also sends a cookie, which makes Varnish pass instead of caching.
I would suggest checking varnishlog, which will give you clear insight into why Varnish decides to cache or not.
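For illustration, a minimal VCL sketch of the cookie fix (assuming Varnish 4 syntax, and assuming the script does not actually need its cookie):

sub vcl_recv {
    # Drop the client cookie for this endpoint so Varnish can serve it from cache
    if (req.url ~ "^/scoring/DemoGetOOP.php") {
        unset req.http.Cookie;
    }
}

sub vcl_backend_response {
    # Drop the Set-Cookie the script emits, otherwise the object becomes uncacheable
    if (bereq.url ~ "^/scoring/DemoGetOOP.php") {
        unset beresp.http.Set-Cookie;
        set beresp.ttl = 120s;
    }
}

To watch the decision being made, a VSL query like this (Varnish 4+ syntax) shows the TTL and header records for just that URL:
varnishlog -g request -q 'ReqUrl ~ "/scoring/DemoGetOOP.php"'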

They seem to be working fine to me: the first link has a TTL of 120 seconds and the second a TTL of 10 seconds, and both are behaving as expected.
I'd say always double-check the cookies when things seem to not be working.

Related

Varnish cache doesn't hit first time

I just learned about and implemented a Varnish reverse proxy to increase my website speed.
Everything works fine, but something minor bothers me.
For some reason, when I check the page's TTFB for the first time, I get 0.999s; however, when I rerun the test, the number drops to 0.237s.
I use the following website to check TTFB:
https://www.webpagetest.org
and my website is:
https://www.findfestival.com/
It makes me wonder whether the first request to the website ever hits the cache. When I use curl I can see x-varnish in the headers, but it's still strange that clicking on links the first time is slower than the second time (specifically on mobile).
Can you please help me understand why the Varnish cache doesn't hit the first time?
This is my default.vcl:
Thanks,
PS, I have seen this post and already tried the solution with no luck!
Varnish Cache first time hit
Seeing that you have X-Mod-Pagespeed in your headers and a minimalistic VCL, the conclusion is that you need to take a look at Downstream Caching and make sure that PageSpeed does not send Cache-Control: max-age=0, no-cache, which breaks Varnish caching for the most part.
In my own experience, Pagespeed does not play well with Varnish even with downstream caching configuration applied.
It "loves" to send the aforementioned header no matter what. Even if you manage to turn this behaviour off, it results in PageSpeed's own assets not having proper Cache-Control headers plus a few more interesting issues like causing Varnish "hit-for-pass" when rebeaconing has to take place - which is really bad and breaks caching further.
Also have a look at possible configurations. You might want to put PageSpeed at your SSL terminator level (option #1) - that way you don't need Downstream Cache configuration and PageSpeed will be "in front" of Varnish.
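A quick way to confirm what is actually being served is to inspect the response headers; here is a sketch using Python requests (header names as PageSpeed and Varnish typically emit them):

import requests

# If Cache-Control comes back as "max-age=0, no-cache", PageSpeed's
# downstream-caching settings are the place to look.
r = requests.get("https://www.findfestival.com/")
for name in ("Cache-Control", "X-Mod-Pagespeed", "X-Varnish", "Age"):
    print(name + ":", r.headers.get(name))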

CloudFlare permanently caches page even though cookies are set

I had a problem where CloudFlare wouldn't cache any of my pages because all of my pages returned session cookies. I fixed this with a method I wrote myself, which removes unnecessary cookies from my response header. It's based on the same idea used and described here: https://github.com/HaiFangHui/sessionmonster.
Currently I'm having a situation which is driving me bananas, and I was hoping someone could help me out a little bit with their expertise on this subject.
The problem is that if CloudFlare has had a chance to cache a page on a previous request, then after I log in to my site it will keep serving that cached page until the Edge TTL expires.
Since the official CloudFlare documentation states it will not cache a page if it contains cookies, I was hoping that after a successful login attempt it would serve a live/personalized version of the page. But it seems that is not the case.
Does somebody know if this is normal? Of course I'm interested in knowing a way to circumvent this. I'm wondering how other sites solve this problem; surely I'm not the first one to run into this issue.
Any advice regarding this subject would be greatly appreciated.
So it seems that configuring a "Bypass cache" rule is the solution.
It's only available on a paid plan.
More info: https://blog.cloudflare.com/caching-anonymous-page-views/
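For illustration, the rule described in that post looks roughly like this (values hypothetical; per the linked post the setting is available on higher-tier plans):
URL pattern: example.com/*
Setting: Bypass Cache on Cookie
Value: PHPSESSID=.*
With a rule like that, requests carrying a session cookie go to the origin, while anonymous visitors keep being served from the cache.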

Scrapy persistent cache

We need to be able to re-crawl historical data. Imagine today is the 23rd of June. We crawl a website today, but after a few days we realize we have to re-crawl it, "seeing" it exactly as it was on the 23rd. That means including all possible redirects, GET and POST requests, etc. All the pages the spider sees should be exactly as they were on the 23rd, no matter what.
Use-case: if there is a change in the website, and our spider is unable to crawl something, we want to be able to get back "in the past" and re-run the spider after we fix it.
Generally, this should be quite easy: subclass Scrapy's standard cache, force it to use dates for subfolders, and end up with something like this:
cache/spider_name/2015-06-23/HERE ARE THE CACHED DIRS
but when I was experimenting with this, I realized the spider sometimes crawls the live website. That is, it doesn't take some pages from the cache (even though the appropriate files exist on disk) but instead fetches them from the live website. This happened with pages with captchas in particular, but maybe with some others as well.
How can we force Scrapy to always take the page from the cache, not hitting the live website at all? Ideally, it should even work with no internet connection.
Update: we've used the Dummy policy and HTTPCACHE_EXPIRATION_SECS = 0
Thank you!
To do exactly what you want, you should add this to your settings:
HTTPCACHE_IGNORE_MISSING = True
If enabled, requests not found in the cache will be ignored instead of downloaded.
When you set:
HTTPCACHE_EXPIRATION_SECS = 0
it only assures you that cached requests will never expire; if a page isn't in your cache, it will still be downloaded.
You can check the documentation.
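Putting the pieces together, here is a sketch of the settings plus a date-stamped storage class along the lines the question proposes. The module path, class name, and CRAWL_DATE setting are hypothetical, and it overrides FilesystemCacheStorage's private _get_request_path, so pin your Scrapy version:

import os
import time

from scrapy.extensions.httpcache import FilesystemCacheStorage
from scrapy.utils.request import request_fingerprint

class DatedCacheStorage(FilesystemCacheStorage):
    # Store cached responses under cache/<spider>/<date>/..., as in the question
    def __init__(self, settings):
        super().__init__(settings)
        # CRAWL_DATE names the snapshot to read/write; default to today
        self.crawl_date = settings.get('CRAWL_DATE', time.strftime('%Y-%m-%d'))

    def _get_request_path(self, spider, request):
        key = request_fingerprint(request)
        return os.path.join(self.cachedir, spider.name, self.crawl_date,
                            key[0:2], key)

Then in settings.py:

HTTPCACHE_ENABLED = True
HTTPCACHE_STORAGE = 'myproject.httpcache.DatedCacheStorage'  # hypothetical path
HTTPCACHE_POLICY = 'scrapy.extensions.httpcache.DummyPolicy'
HTTPCACHE_EXPIRATION_SECS = 0     # cached entries never expire
HTTPCACHE_IGNORE_MISSING = True   # never fall back to the live site

With CRAWL_DATE=2015-06-23 set for the re-run, the spider reads from the 23rd's snapshot and, thanks to HTTPCACHE_IGNORE_MISSING, never touches the live site.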

How do I know if my page is being cached?

I have a WordPress site that is doing a few weird things, and I believe it is because it is being cached. I changed the contents of a CSS stylesheet file, and the change took around 10 minutes before it appeared live.
I can't however find any caching mechanism setup. I've looked through cPanel and can't see anything setup there. The IP of the site resolves to the IP that cPanel is showing.
I've looked for plugins in WordPress and can't see any caching plugins (although if it was a caching plugin, would accessing a stylesheet be cached?).
Any tips on how I can see if the page is being cached on the server or by a plugin?
Put a JavaScript bug on the page which crafts a random URL and requests it. Compare the number of page requests to random URL requests. But there are lots of scenarios where a browser can cache a page in the absence of caching information.
If your website is behind the CloudFlare network or similar, this is normal behavior.
Try running the following command (Windows command prompt / Linux terminal):
ping www.yoursite.com
and visit the resolved IP address in a browser; this may tell you whether you are behind a caching network.
Take a look at this article: http://www.mobify.com/blog/beginners-guide-to-http-cache-headers/
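You can also check from the command line; here is a small sketch (the stylesheet URL is hypothetical, and header names vary by cache) that fetches the same resource twice:

import requests

url = "https://example.com/wp-content/themes/mytheme/style.css"  # hypothetical
for attempt in (1, 2):
    r = requests.get(url)
    # A growing Age, or X-Cache / CF-Cache-Status reporting HIT on the second
    # fetch, points at a server-side or CDN cache rather than your browser.
    print(attempt, r.status_code, r.headers.get("Age"),
          r.headers.get("X-Cache"), r.headers.get("CF-Cache-Status"))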

Clear all website cache?

Is it possible to clear all site cache? I would like to do this when the user logs out or the session expires instead of instructing the browser not to cache on each request.
As far as I know, there is no way to instruct the browser to clear all the pages it has cached for your site. The only control that you, as a website author, have over caching of a page occurs when the browser tries to access that page. You can specify that cached versions of your pages should expire at a certain time using the Expires header, but even then the browser won't actually clear the page from its cache at that time.
I certainly hope not; that would give the website destructive powers over the client machine!
If security is your main concern here, why not use HTTPS? Browsers don't cache content received via HTTPS (or cache it only in memory).
One tricky way to mimic this would be to include the session id as a parameter when referencing any static piece of content on the site. When the user establishes the session, the browser will see all the pieces of content as new because of this parameter. For the duration of the session the browser will use the static content in its cache. After the user logs out and logs back in, the session-id parameter for the static content will be different, so the browser will treat it as completely new content and download everything again.
That being said, this is a hack and I wouldn't recommend pursuing it. For what reason do you want the user's cache to be cleared after their session expires? There's probably a better solution for your situation than what you are currently asking for.
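For completeness, the mechanics of the trick are tiny; a minimal sketch (helper name hypothetical):

# Cache-busting static URLs per session: a new session means new URLs,
# so the browser re-downloads everything exactly once per session.
def static_url(path, session_id):
    return path + "?sid=" + session_id

print(static_url("/css/site.css", "abc123"))  # -> /css/site.css?sid=abc123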
If you are talking about ASP.NET cache objects, you can use this:
' Snapshot the keys first: removing entries while enumerating the live Cache is unsafe
Dim keys = (From entry As DictionaryEntry In Cache Select CStr(entry.Key)).ToList()
For Each key In keys
    Cache.Remove(key)
Next
to remove items from the cache, but that may not be the full extent of what you are trying to accomplish.
