Varnish: how to cache on a public Cache-Control header

I just started updating my VCL and found a good boilerplate on the web:
https://github.com/mattiasgeniar/varnish-5.0-configuration-templates/blob/master/default.vcl
The only problem I have is that when I get a valid response header like Cache-Control: public, max-age=2592000, it is not being cached and keeps returning a MISS every time. I found out that if I unconditionally do unset beresp.http.set-cookie; in the method below, it works, but of course that probably covers far too much.
What is the correct way to set up this config so that it only caches when the HTTP headers tell it to?
# Handle the HTTP response coming from our backend
sub vcl_backend_response {
# Called after the response headers have been successfully retrieved from the backend.
# Enable ESI processing and remove the Surrogate-Control header
if (beresp.http.Surrogate-Control ~ "ESI/1.0") {
unset beresp.http.Surrogate-Control;
set beresp.do_esi = true;
}
# Enable cache for all static files
# The same argument as for the static caches above: monitor your cache size, and if data is getting nuked out of it, consider giving up on the static file cache.
# Before you blindly enable this, have a read here: https://ma.ttias.be/stop-caching-static-files/
if (bereq.url ~ "^[^?]*\.(7z|avi|bmp|bz2|css|csv|doc|docx|eot|flac|flv|gif|gz|ico|jpeg|jpg|js|less|mka|mkv|mov|mp3|mp4|mpeg|mpg|odt|otf|ogg|ogm|opus|pdf|png|ppt|pptx|rar|rtf|svg|svgz|swf|tar|tbz|tgz|ttf|txt|txz|wav|webm|webp|woff|woff2|xls|xlsx|xml|xz|zip)(\?.*)?$") {
unset beresp.http.set-cookie;
}
# Large static files are delivered directly to the end-user without
# waiting for Varnish to fully read the file first.
# Varnish 4 fully supports Streaming, so use streaming here to avoid locking.
if (bereq.url ~ "^[^?]*\.(7z|avi|bz2|flac|flv|gz|mka|mkv|mov|mp3|mp4|mpeg|mpg|ogg|ogm|opus|rar|tar|tgz|tbz|txz|wav|webm|xz|zip)(\?.*)?$") {
unset beresp.http.set-cookie;
set beresp.do_stream = true; # Check memory usage it'll grow in fetch_chunksize blocks (128k by default) if the backend doesn't send a Content-Length header, so only enable it for big objects
}
# Sometimes, a 301 or 302 redirect formed via Apache's mod_rewrite can mess with the HTTP port that is being passed along.
# This often happens with simple rewrite rules in a scenario where Varnish runs on :80 and Apache on :8080 on the same box.
# A redirect can then often redirect the end-user to a URL on :8080, where it should be :80.
# This may need finetuning on your setup.
#
# To prevent accidental replace, we only filter the 301/302 redirects for now.
if (beresp.status == 301 || beresp.status == 302) {
set beresp.http.Location = regsub(beresp.http.Location, ":[0-9]+", "");
}
# Mark the response as uncacheable (hit-for-miss) for 2 minutes if it has no TTL, sets a cookie, or varies on everything
if (beresp.ttl <= 0s || beresp.http.Set-Cookie || beresp.http.Vary == "*") {
set beresp.ttl = 120s; # Important, you shouldn't rely on this, SET YOUR HEADERS in the backend
set beresp.uncacheable = true;
return (deliver);
}
# Don't cache 5xx responses
if (beresp.status == 500 || beresp.status == 502 || beresp.status == 503 || beresp.status == 504) {
return (abandon);
}
# Allow stale content, in case the backend goes down.
# make Varnish keep all objects for 6 hours beyond their TTL
set beresp.grace = 6h;
return (deliver);
}
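One way to approach the question above (a sketch only, not part of the boilerplate): strip cookies only when the backend explicitly marks the response as publicly cacheable, so caching is driven by the Cache-Control header itself. The check on the word "public" is an assumption about how your backend signals cacheability.

```vcl
sub vcl_backend_response {
    # Only strip Set-Cookie when the backend explicitly says the
    # response is publicly cacheable; everything else keeps its
    # cookies and falls through to the hit-for-miss logic.
    if (beresp.http.Cache-Control ~ "public" &&
        beresp.http.Cache-Control !~ "no-cache|no-store|private") {
        unset beresp.http.Set-Cookie;
    }
}
```

With something like this in place, a response carrying Cache-Control: public, max-age=2592000 would be cached for its full max-age, while responses that set cookies without a public marker remain uncacheable.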

Varnish - stale-while-revalidate doesn't appear to revalidate

We are using Varnish cache 6.2 to sit in front of our WebAPI backend.
The backend is sending a cache-control header back on certain requests, for things that we can cache for a bit longer.
However - should the backend go down, and stay down, we send stale-while-revalidate of an hour.
So a typical cache-control response header from our backend looks like:
public, max-age=30, stale-while-revalidate=3600
In our Varnish VCL we have added a routine that stops background fetch on certain errors. This is to stop the bad response from the backend from entering the cache:
sub vcl_backend_response {
if (beresp.status == 500 || beresp.status == 502 || beresp.status == 503 || beresp.status == 504)
{
if (bereq.is_bgfetch)
{
return (abandon);
}
set beresp.ttl = 1s;
}
}
The problem we are facing is simple - Varnish does not update the item in the cache after Max-Age expires, even though the backend is available. (And changes have occurred to the response)
We have seen issues where the responding "Age" header from Varnish exceeds 200s, with the wrong response. We have also seen cases where the "Age" header is 1-3s, which would indicate a background fetch (or normal fetch) has occurred.
This happens often enough that we notice it - but not on every request.
I have tried a simple "pass", such as the following in Varnish:
sub vcl_recv {
return(pass);
}
However, this appeared to have no effect.
Could there be anything else with Varnish setup that could cause the situation above?
EDIT, as per comment, this is a small thing we add to each sub that interacts with our request, to see what actually happened:
sub vcl_deliver {
if (obj.uncacheable) {
set req.http.x-cache = req.http.x-cache + " uncacheable" ;
} else {
set req.http.x-cache = req.http.x-cache + " cached" ;
}
set resp.http.x-cache = req.http.x-cache;
}
sub vcl_hit {
set req.http.x-cache = "hit";
}
That's the expected behavior. Once the object is fetched from the backend for the first time (i.e. t=0), Varnish caches it, setting beresp.ttl to 30s and beresp.grace to 3600s. Then, if you request the object from Varnish at t=3000, the stale object will be delivered to the client (i.e. Age: 3000) and an asynchronous background fetch will be triggered to refresh the cached object. If you request the object again at t=3001 and the background fetch has already completed, a fresh object will be delivered (i.e. Age: 1). The following test illustrates this behavior:
varnishtest "..."
server s1 {
rxreq
txresp -hdr "Cache-Control: public, max-age=1, stale-while-revalidate=60" \
-hdr "Version: 1"
rxreq
txresp -hdr "Cache-Control: public, max-age=1, stale-while-revalidate=60" \
-hdr "Version: 2"
} -start
varnish v1 -vcl+backend {
} -start
client c1 {
txreq
rxresp
expect resp.http.Version == 1
expect resp.http.Age == 0
delay 5.0
txreq
rxresp
expect resp.http.Version == 1
expect resp.http.Age == 5
delay 0.1
txreq
rxresp
expect resp.http.Version == 2
expect resp.http.Age == 0
} -run
varnish v1 -expect client_req == 3
In order to refresh the object synchronously once the cached item exhausts its TTL, you need to adjust req.grace during vcl_recv. You probably want to set it to 0s (or close to it) when the backend is healthy. Please check https://varnish-cache.org/docs/trunk/users-guide/vcl-grace.html#misbehaving-servers for details.
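The suggestion above can be sketched roughly as follows (assuming vmod_std is imported and the backend has a health probe configured, so std.healthy() is meaningful; the exact grace values are illustrative):

```vcl
vcl 4.0;
import std;

sub vcl_recv {
    if (std.healthy(req.backend_hint)) {
        # Healthy backend: accept only slightly stale objects, so a
        # request past the TTL triggers a synchronous fetch instead
        # of serving hour-old content from grace.
        set req.grace = 10s;
    } else {
        # Sick backend: fall back to the full grace period so stale
        # content can still be served.
        set req.grace = 1h;
    }
}
```

Note that req.grace only caps the grace of the stored object; it does not extend beresp.grace set at fetch time.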

What does Varnish's hit-for-pass do?

When we were running a load test on our application, which sits behind a Varnish 4.1 server, we noticed that after a server error (a 500 returned with Cache-Control: no-cache) we were experiencing a load peak on our backend.
After diving into the Varnish configuration we spotted this line:
https://github.com/varnishcache/varnish-cache/blob/master/bin/varnishd/builtin.vcl#L157
sub vcl_backend_response {
if (bereq.uncacheable) {
return (deliver);
} else if (beresp.ttl <= 0s ||
beresp.http.Set-Cookie ||
beresp.http.Surrogate-control ~ "no-store" ||
(!beresp.http.Surrogate-Control &&
beresp.http.Cache-Control ~ "no-cache|no-store|private") ||
beresp.http.Vary == "*") {
# Mark as "Hit-For-Miss" for the next 2 minutes
set beresp.ttl = 120s;
set beresp.uncacheable = true;
}
return (deliver);
}
If a page returns no-cache, it will not be cacheable for the next 2 minutes, even if the next call to the backend returns a valid cacheable response.
I cannot figure out why this is the default behaviour (and it has been for a long time, according to the repository history...).
In my case, an error in my backend generates a 500 with no-cache, which then leads to more traffic, and finally causes a 503...
I plan to remove this rule, but I want to understand it before I do.
Any clue?
Thanks in advance
M.
You might want to read
https://info.varnish-software.com/blog/hit-for-pass-varnish-cache
to understand hit-for-pass.
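If the 2-minute hit-for-miss window is the problem, one possible mitigation (a sketch, not the builtin behaviour) is to shorten that window for error responses, so a recovered backend becomes cacheable again quickly while normal uncacheable pages keep the default:

```vcl
sub vcl_backend_response {
    if (beresp.status >= 500 && beresp.ttl <= 0s) {
        # Remember the failure only briefly; once the backend
        # recovers, the next fetch can enter the cache normally
        # instead of being passed for 2 minutes.
        set beresp.ttl = 5s;
        set beresp.uncacheable = true;
        return (deliver);
    }
}
```

Because this runs before the builtin vcl_backend_response, the builtin 120s hit-for-miss marker is never reached for these responses.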

How to clear complete cache in Varnish?

I'm looking for a way to clear the cache for all domains and all URLs in Varnish.
Currently, I would need to issue individual commands for each URLs, for example:
curl -X PURGE http://example.com/url1
curl -X PURGE http://example.com/url2
curl -X PURGE http://subdomain.example.com/
curl -X PURGE http://subdomain.example.com/url1
// etc.
While I'm looking for a way to do something like
curl -X PURGE http://example.com/*
And that would clear all URLs under example.com, but also all URLs in sub-domains of example.com, basically all the URLs managed by Varnish.
Any idea how to achieve this?
This is my current VCL file:
vcl 4.0;
backend default {
.host = "127.0.0.1";
.port = "8080";
}
sub vcl_recv {
# Command to clear the cache
# curl -X PURGE http://example.com
if (req.method == "PURGE") {
return (purge);
}
}
With Varnish 4.0 I ended up implementing it with the ban command:
sub vcl_recv {
# ...
# Command to clear complete cache for all URLs and all sub-domains
# curl -X XCGFULLBAN http://example.com
if (req.method == "XCGFULLBAN") {
ban("req.http.host ~ .*");
return (synth(200, "Full cache cleared"));
}
# ...
}
Well, I suggest just restarting Varnish. It will purge everything, because Varnish keeps its cache in memory.
Run: sudo /etc/init.d/varnish restart
Assuming no change of URL or internal cache key, the simplest approach to a full flush is to restart Varnish, since it keeps its cache in memory.
If a quick restart is not acceptable, the BAN suggested by Rastislav is a great approach. A ban needs to stay active for as long as your longest TTL, so if you frequently need a full flush, the ban list will be pretty much permanent, as the ban lurker (which sweeps away bans that are no longer relevant) may always consider your ban useful.
So in your case, your VCL would be:
# Highly recommend that you set up an ACL for IPs that are allowed
# to make the BAN call
acl acl_ban {
"localhost";
"1.2.3.4"/32;
}
sub vcl_recv {
if (client.ip ~ acl_ban && req.method == "BAN") {
ban("req.http.host == " + req.http.host);
# Throw a synthetic page so the request won't go to the backend.
return(synth(200, "Ban added"));
}
}
However as noted by Carlos in the comments, this will actually create a lazy invalidation (and so only removed at request time). If you want to have the objects actually get purged by the background ban lurker every so often, you can instead do:
# Highly recommend that you set up an ACL for IPs that are allowed
# to make the BAN call
acl acl_ban {
"localhost";
"1.2.3.4"/32;
}
sub vcl_recv {
if (client.ip ~ acl_ban && req.method == "BAN") {
# see below for why this is obj. rather than req.
ban("obj.http.host == " + req.http.host);
# Throw a synthetic page so the request won't go to the backend.
return(synth(200, "Ban added"));
}
}
sub vcl_backend_response {
# add any portions of the request that would want to be able
# to BAN on. Doing it in vcl_backend_response means that it
# will make it into the storage object
set beresp.http.host = bereq.http.host;
}
sub vcl_deliver {
# Unless you want the header to actually show in the response,
# clear it here. It will remain part of the stored object
# but otherwise be invisible. Note this is resp, not beresp:
# beresp is not available in vcl_deliver.
unset resp.http.host;
}
Then to do the flush:
curl -X BAN http://example.com
Purge all Varnish cache from the command line (invalidate the whole cache):
varnishadm "ban req.url ~ ." # Matches all URLs
Note: the command is ban.url in Varnish 3.x and purge.url in Varnish 2.x.
We can also ban by a hostname:
varnishadm "ban req.http.host == xxx.com"

Varnish Cache - Cache a 403 response

We're using Varnish in front of an AWS S3 bucket, and things have been running really well: we've had a 98.4% hit rate, which has saved us from very large S3 bills!
Our applications now need to be able to make requests for files which may or may not exist yet. When this happens, Varnish makes a request to S3 and receives a 403 (permission denied) response. We catch that response in the vcl_error function, as it allows us to display a custom error message. Since we're expecting 400-500 requests per second, with about 40% being for files which don't exist yet, we will run into cost issues with S3.
My question is, is it possible to have Varnish remember that the file returned a 403 and return a cached response? I would like Varnish to wait 5 minutes before requesting the file from the backend. We're running Varnish 3.
I've read the documentation, which appears to suggest I can use "set obj.ttl = 5m;" in the vcl_error function, but this doesn't seem to work.
Thanks!
Alan
Yes, you can cache it. Just check the status code of the response from S3 and set a TTL.
Varnish 3:
sub vcl_fetch {
if (beresp.status == 403 || beresp.status == 404 || beresp.status >= 500)
{
set beresp.ttl = 3s;
}
}
Varnish 4:
sub vcl_backend_response {
if (beresp.status == 403 || beresp.status == 404 || beresp.status >= 500)
{
set beresp.ttl = 3s;
}
}
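If you only want this negative caching for the S3-backed paths (and for the 5 minutes the question asks for), you can guard it on the request URL. This extends the Varnish 4 variant; the /assets/ prefix is a hypothetical example, not something from the question:

```vcl
sub vcl_backend_response {
    # Cache the S3 "not there yet" 403 for 5 minutes, instead of
    # hitting S3 on every request for a missing file.
    if (bereq.url ~ "^/assets/" && beresp.status == 403) {
        set beresp.ttl = 5m;
    }
}
```

The trade-off is that once a file is actually uploaded, clients may keep seeing the cached 403 for up to 5 minutes unless you purge it.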

How to return(pass) all hosts not in acl - Varnish

So here is what I am trying to accomplish. I'm trying to get Varnish working in a shared environment, and I would like to set it up so that only domains within a VCL include get cached, while the rest are simply passed. Here is what I am looking at:
include "/etc/varnish/whitelist.vcl";
if (req.http.host !~ vhosts) {
return(pass);
}
acl vhosts {
"domain.net";
"www.domain.net";
"...";
}
...
Now Varnish tells me that this isn't possible:
Message from VCC-compiler:
Expected CSTR got 'vhosts'
(program line 940), at
('input' Line 11 Pos 30)
if (req.http.host !~ vhosts) {
-----------------------------######---
Running VCC-compiler failed, exit 1
VCL compilation failed
Now I know I can just do the following:
sub vcl_recv {
if (req.http.host == "domain1.com" ||
req.http.host == "domain2.com") {
return(pass);
}
}
But I really like the clean look of the first. Any ideas?
Unfortunately, you can't use an ACL for the HTTP Host header. ACLs only match client addresses.
ghloogh is kind of right, except that you can use the std.ip() function to convert IP address strings into the correct format for matching against ACLs. That still doesn't work with hostnames, though.
But I suggest you use regular expressions instead of individual string matches, like so:
sub vcl_recv {
if (req.http.host ~ "^(domain1\.com|domain2\.com)$") {
return(pass);
}
}
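The std.ip() trick mentioned above can be sketched like this. It only helps when the thing you want to match is an IP address, not a hostname; the X-Real-IP header and the ACL entries are assumptions for illustration:

```vcl
vcl 4.0;
import std;

acl trusted {
    "192.0.2.0"/24;
}

sub vcl_recv {
    # X-Real-IP is assumed to be set by an upstream proxy.
    # std.ip() falls back to 0.0.0.0 when the header is missing
    # or is not a valid IP, so the ACL match fails safely.
    if (std.ip(req.http.X-Real-IP, "0.0.0.0") ~ trusted) {
        return (pass);
    }
}
```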