I installed varnish and everything works OK.
However, I need to cache pages for logged-in users. This is what I have in my VCL:
backend default {
.host = "127.0.0.1";
.port = "8080";
}
sub vcl_recv {
unset req.http.Cookie;
if (req.http.Authorization || req.http.Cookie) {
return (lookup);
}
return (lookup);
}
sub vcl_fetch {
unset beresp.http.Set-Cookie;
set beresp.ttl = 24h;
return(deliver);
}
The above works, but users can see other users' data. For example, say I'm logged in as Sam and access page A. When another user, say Angie, logs in and opens page A, she sees the same content as Sam.
Is there a way I can restrict pages to logged-in users who actually are authorized to view that page?
My request header is as follows:
Accept text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Encoding gzip, deflate
Accept-Language en-US,en;q=0.5
Authorization Basic YWRtaW46YWRtaW4=
Connection keep-alive
Cookie tree-s="eJzTyCkw5NLIKTDiClZ3hANXW3WuAmOuRKCECUjWFEnWIyIdJGvGlQgEegAD3hAj"; __ac="0zVm7PLtpaeQUXtm9BeYrC5%2BzobmBLPQkjRtxjyRGHs1MGY1MTgzZGFkbWluIQ%3D%3D"; 'wc.cookiecredentials'="0241d135445f7ca0b1cb7aced45da4e750f5414dadmin!"
Can I use the Authorization entry on the request header to enforce this restriction?
What your VCL is currently doing is removing Cookie from the request header and caching all requests. This causes the exact behavior you describe: the first page load is cached and all subsequent users get the cached content - no matter who makes the request. Generally you only want to cache content for users that have not logged in.
You cannot do authorization or access control using Varnish. This needs to be handled by the backend. In order to do this you need to identify the cookie that contains relevant session information and keep the cookie if a session is defined, and drop the cookie in other cases.
For example:
sub vcl_recv {
if(req.http.Cookie) {
# Care only about SESSION_COOKIE
if (req.http.Cookie !~ "SESSION_COOKIE" ) {
remove req.http.Cookie;
}
}
}
That way all requests that contain a "SESSION_COOKIE" cookie will be passed through to the backend while users that have not logged in receive a cached copy from Varnish.
If you wish to use Varnish for caching with logged in users as well, I'd suggest you take a look at Varnish's ESI features.
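As a rough illustration of the ESI approach (Varnish 4 syntax; the /esi/ URL prefix and the .html match are assumptions, not part of the original answer), the cacheable page embeds a placeholder for the personalized fragment, and Varnish stitches it in on delivery:

```vcl
# The page itself is cached; its HTML contains, e.g.:
#   <esi:include src="/esi/user-box"/>
sub vcl_backend_response {
    if (bereq.url ~ "\.html$" || bereq.url == "/") {
        # Ask Varnish to parse the response for <esi:include/> tags.
        set beresp.do_esi = true;
    }
}

sub vcl_recv {
    if (req.url ~ "^/esi/") {
        # Personalized fragments keep their cookies and go to the backend.
        return (pass);
    }
}
```

This way the bulk of the page is served from cache while only the small per-user fragment hits the backend.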
It can be done entirely on the Varnish side - with the addition of a VMOD or inline C.
A simple example:
Let's say you have three different pages for three different levels of users: (not logged in, logged in, admin) and the level is stored in the cookie. You can take the cookie value and add a get parameter to the url:
http://example.com/homepage.html?user_level=none
http://example.com/homepage.html?user_level=logged_in
http://example.com/homepage.html?user_level=admin
(Your vmod would handle either adding '?' or '&' and the name value pair to the end of the URL.)
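For the simple case, the same idea can be sketched in pure VCL without a VMOD. This assumes a hypothetical "level" cookie; the cookie name and values are illustrative only:

```vcl
sub vcl_recv {
    # Derive the user level from the (hypothetical) "level" cookie.
    if (req.http.Cookie ~ "level=admin") {
        set req.http.X-User-Level = "admin";
    } elsif (req.http.Cookie ~ "level=member") {
        set req.http.X-User-Level = "logged_in";
    } else {
        set req.http.X-User-Level = "none";
    }
    # Append with '?' or '&' depending on whether the URL already
    # carries a query string.
    if (req.url ~ "\?") {
        set req.url = req.url + "&user_level=" + req.http.X-User-Level;
    } else {
        set req.url = req.url + "?user_level=" + req.http.X-User-Level;
    }
}
```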
All of this would normally be done in a vmod but it could also be done, depending upon the size, as inline C. Vmods are preferred but inline C can get you up and running for initial testing - then move to a vmod.
When vcl_hash() runs, it will hash over the URL that you just modified. The Varnish cache will now contain up to three different versions of the page; from Varnish's point of view these are three different cache objects. You can set a different TTL for each variation of the page.
On a cache miss, the backend web server can either ignore this GET parameter you just added, or you can remove it within the VCL, in vcl_hash() or vcl_backend_fetch. Your backend server would typically use the cookie value, not this VCL-added parameter.
In summary, you are altering the URL parameters so that the request hashes differently for the different user levels you need. Alternatively, you can change your vcl_hash() method to hash over cookie values instead, but I find altering the URL better for varnishncsa logging and reporting. In addition, if the cookie is encrypted, PURGE requests can be handled more easily with basic cURL requests, since you simply modify the URL parameters. Your PURGE conditional will check whether the client sending the PURGE request is an authorized user.
Related
I have a series of videos on various URLs across the site. I would like to cache them all with Varnish, even if the user is logged in. I can use the VCL configuration in order to whitelist certain URLs for caching. But I don't know how I can whitelist all videos.
Is there a way to say that all responses that return with a content type of video/mp4 are cached?
Deciding to serve an object from cache and deciding to store an object in cache are two different things in Varnish. Both situations need to be accounted for.
Built-in VCL
In order to understand what happens out-of-the-box, you need to have a look at the following VCL file: https://github.com/varnishcache/varnish-cache/blob/master/bin/varnishd/builtin.vcl
This is the built-in VCL. For each subroutine, this logic is executed when you don't do an explicit return(xyz) in the corresponding subroutine of your own VCL file.
This means you have some sort of safety net to protect you.
From a technical perspective, the Varnish compiler will append the built-in VCL parts to the subroutines you extend in your VCL before compiling the VCL file into C code.
What do we learn from the built-in VCL
The built-in VCL teaches us the following things when it comes to cacheability:
Varnish will only serve an object from cache for GET and HEAD requests (see vcl_recv)
Varnish will not serve an object from cache if a Cookie or Authorization header is present (see vcl_recv)
Varnish will not store an object in cache if a Set-Cookie header is present (see vcl_backend_response)
Varnish will not store an object in cache if TTL is zero or less (see vcl_backend_response)
Varnish will not store an object in cache if the Cache-Control header contains no-store (see vcl_backend_response)
Varnish will not store an object in cache if the Surrogate-Control header contains no-cache, no-store or private (see vcl_backend_response)
Varnish will not store an object in cache if the Vary header performs cache variations on all headers via a * (see vcl_backend_response)
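For reference, the part of the built-in vcl_recv that implements the first two rules looks roughly like this (abridged paraphrase; see the linked file for the authoritative version):

```vcl
sub vcl_recv {
    if (req.method != "GET" && req.method != "HEAD") {
        # Only GET and HEAD are candidates for a cache lookup.
        return (pass);
    }
    if (req.http.Authorization || req.http.Cookie) {
        # Requests carrying state are not served from cache.
        return (pass);
    }
    return (hash);
}
```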
How to make sure video files are served from cache
In vcl_recv you have to make sure Varnish is willing to lookup video requests from cache. In practical terms, this means taking care of the cookies.
My advice would be to remove all cookies, except the ones you really need. The example below will remove all cookies, except the PHPSESSID cookie, which is required by my backend:
vcl 4.1;
sub vcl_recv {
if (req.http.Cookie) {
set req.http.Cookie = ";" + req.http.Cookie;
set req.http.Cookie = regsuball(req.http.Cookie, "; +", ";");
set req.http.Cookie = regsuball(req.http.Cookie, ";(PHPSESSID)=", "; \1=");
set req.http.Cookie = regsuball(req.http.Cookie, ";[^ ][^;]*", "");
set req.http.Cookie = regsuball(req.http.Cookie, "^[; ]+|[; ]+$", "");
if (req.http.cookie ~ "^\s*$") {
unset req.http.cookie;
}
}
}
This example will remove tracking cookies from the request, which is fine, because they are processed by JavaScript.
When the PHPSESSID cookie is not set, vcl_recv will fall back on the built-in VCL, and the request will be served from cache.
But in your case, you want them to be served from cache, even when users are logged in. This is fine, because videos are static files that aren't influenced by state.
The problem is that in the request context you cannot specify Content-Type information. You'll have to use the URL.
Here's an example:
sub vcl_recv {
if(req.url ~ "^/video") {
return(hash);
}
}
This snippet will bypass the built-in VCL, and will explicitly look the object up in cache, if the URL matches the ^/video regex pattern.
How to make sure video files are stored in cache
When you do an explicit return(hash) in vcl_recv, the hash will be created and a cache lookup takes place. But if the object is not stored in cache, you'll still have a miss, which results in a backend request.
When the backend response comes back, it needs to be stored in cache for a certain amount of time. Given the built-in VCL, you have to make sure you don't specify a zero-TTL, and the Cache-Control response header must return cacheable syntax.
This is how I would set the Cache-Control header if for example we want to cache video files for a day:
Cache-Control: public, max-age=86400
Varnish will respect this header, and will set the TTL to 1 day based on the max-age syntax.
Even if you don't specify a Cache-Control header, Varnish will still store the object in cache, but only for 2 minutes, which is the default TTL.
Here's an example where Varnish will not store the object in cache, based on the Cache-Control header:
Cache-Control: private, max-age=0, s-maxage=0, no-cache, no-store
If any of these directives appears in Cache-Control, Varnish will make the object uncacheable.
If Set-Cookie headers are part of the response, the object becomes uncacheable as well.
In case you don't have full control over the headers that are returned by the backend server, you can still force your luck in VCL.
Here's a VCL snippet where we force objects to be stored in cache for images and videos:
sub vcl_backend_response {
if(beresp.http.Content-Type ~ "^(image|video)/") {
set beresp.ttl = 1d;
unset beresp.http.set-cookie;
return (deliver);
}
}
This example strips off Set-Cookie headers, overrides the TTL to one day, and explicitly stores and delivers the object. This only happens when the Content-Type response header starts with either image/ or video/.
I'm wondering if a fetch call in a ServiceWorker uses the normal browser cache or bypasses it and sends the request always to the server
For example. Does the fetch call on line 5 first look in the browser cache or not.
self.addEventListener('fetch', function(event) {
event.respondWith(
caches.open('mysite-dynamic').then(function(cache) {
return cache.match(event.request).then(function (response) {
return response || fetch(event.request).then(function(response) {
cache.put(event.request, response.clone());
return response;
});
});
})
);
});
That's a good question. And the answer is: fetch inside the SW works just like fetch in the browser context. This means the browser's HTTP cache is checked first, and only after that is the network consulted. Fetch from the SW doesn't bypass the HTTP cache.
This comes with a potential for a race condition if you're not careful with how you name your static assets.
Example:
asset.css is served from the server with max-age: 1y
after the first request for it the browser's HTTP cache has it
now the file's contents are updated; the file is different but the name is still the same (asset.css)
any fetch for asset.css is now served from the HTTP cache, and any logic the SW implements to check the file against the server actually gets the stale file from step 1 out of the HTTP cache
at this point the file on the server could be incompatible with some other files that are cached and something breaks
Mitigations:
1. Always change the name of the static asset when the content changes
2. Include a query string (do not ask for asset.css, but for asset.css?timestamporsomething)
Required very good reading: https://jakearchibald.com/2016/caching-best-practices/
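Mitigation 2 can be sketched as a small helper that your build or templating step uses when emitting asset URLs (the v parameter name is an arbitrary choice, not part of any standard):

```javascript
// Append a version token so the HTTP cache treats every revision of
// the asset as a distinct URL.
function versionedUrl(url, version) {
  var sep = url.indexOf('?') === -1 ? '?' : '&';
  return url + sep + 'v=' + encodeURIComponent(version);
}

// versionedUrl('asset.css', '20160501') → 'asset.css?v=20160501'
// versionedUrl('/a.js?min=1', '2')      → '/a.js?min=1&v=2'
```

As long as the version token changes whenever the content changes, the race condition above cannot occur, because the stale copy lives under a different URL.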
The force cache miss note states:
Forcing a cache miss does not evict old content. This causes Varnish to keep multiple copies of the content in cache. In such cases, the newest copy is always used. Keep in mind that duplicated objects will stay around as long as their time-to-live is positive.
I don't want to keep multiple copies in the cache. Is my approach of priming the url valid? Where I manually evict the old content by adding them to the ban lurker. And then forcing a cache miss myself to replace the content which were banned.
acl purge_prime {
"127.0.0.1";
"::1";
}
sub vcl_recv {
if (req.method == "PRIME") {
if (!client.ip ~ purge_prime) {
return(synth(405,"No priming for you. (" + client.ip + ")"));
}
# Add to the ban lurker. Purging existing pages.
ban("obj.http.x-host == " + req.http.host + " && obj.http.x-url == " + req.url);
# Call the backend to fetch new content and add it to the cache.
set req.method = "GET";
set req.hash_always_miss = true;
}
# ... other custom rules.
}
# ... other subroutines below, e.g. adding ban-lurker support etc.
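For the ban above to be lurker-friendly, each cached object needs to carry the x-host and x-url headers the ban expression matches on. A common sketch of that supporting code (header names mirror the ban expression in vcl_recv above):

```vcl
sub vcl_backend_response {
    # Record host and URL on the object so the ban lurker can evaluate
    # obj.http.x-host / obj.http.x-url bans without a client request.
    set beresp.http.x-host = bereq.http.host;
    set beresp.http.x-url = bereq.url;
}

sub vcl_deliver {
    # Don't leak the bookkeeping headers to clients.
    unset resp.http.x-host;
    unset resp.http.x-url;
}
```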
The logic makes sense to me; I'm just worried because no one else seems to have done it (and I'm assuming there's a reason for that).
Is this the wrong approach over using purge with restart, if so what's the best way to prime a url using a single http request?
I would say your approach is simply not good because it uses bans.
Using purges instead of bans will allow you to leverage grace mode.
Restarts are perfectly fine - they do not result in two or more HTTP requests, but rather push the request through the VCL state machine once more.
I use Varnish with Docker - see million12/varnish.
GET requests work great! But I have no idea what I have to set in the configuration to cache POST requests.
On Google I have found many posts (from 2010 or 2011) saying that POST requests cannot be cached with Varnish - is this statement still correct?
Or is there another way to cache POST requests?
Here are my varnish.vcl settings:
vcl 4.0;
backend default {
...
}
# Respond to incoming requests.
sub vcl_recv {
unset req.http.Cookie;
}
# Set a header to track a cache HIT/MISS.
sub vcl_deliver {
if (obj.hits > 0) {
set resp.http.X-Varnish-Cache = "HIT";
}
else {
set resp.http.X-Varnish-Cache = "MISS";
}
}
# Don't cache error responses (status 400-600)
sub vcl_backend_response {
if (beresp.status >= 400 && beresp.status <= 600) {
set beresp.ttl = 0s;
}
}
Thanks for the help!
Edit Dec. 2020:
There is an updated tutorial for this here:
https://docs.varnish-software.com/tutorials/caching-post-requests/
There is a Varnish module and tutorial for caching POST requests. It adds the ability to include the POST body in the hash key and to pass the POST request along.
The VMOD is available for Varnish 4 releases and includes the following
functions:
buffer_req_body(BYTES size): buffers the request body if it is smaller than size. This function is a "better" (bug-free) copy of std.CacheReqBody(), so please use this one instead of the one provided by libvmod-std. Please note that retrieving req.body makes it possible to retry pass operations (POST, PUT).
len_req_body(): returns the request body length.
rematch_req_body(STRING re): regular expression match on the request body.
hash_req_body(): adds body bytes to the input hash key.
https://info.varnish-software.com/blog/caching-post-requests-with-varnish
https://github.com/aondio/libvmod-bodyaccess
Note that in the 4.1 branch of this VMOD, the built in std.cache_req_body() is used instead of buffer_req_body(), but a standing bug in Varnish 4.1 breaks the 4.1 branch. https://github.com/varnishcache/varnish-cache/issues/1927
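Roughly, wiring the VMOD up looks like this (a sketch adapted from the tutorial linked above; the 110KB limit is an arbitrary choice):

```vcl
vcl 4.1;
import std;
import bodyaccess;

sub vcl_recv {
    unset req.http.X-Body-Len;
    if (req.method == "POST") {
        # Buffer the body so it is still available in vcl_hash.
        std.cache_req_body(110KB);
        set req.http.X-Body-Len = bodyaccess.len_req_body();
        if (req.http.X-Body-Len == "-1") {
            return (synth(400, "Request body too big"));
        }
        # Force a cache lookup even though this is a POST.
        return (hash);
    }
}

sub vcl_hash {
    # Mix the body into the cache key so different payloads
    # map to different objects.
    if (req.http.X-Body-Len) {
        bodyaccess.hash_req_body();
    } else {
        hash_data("");
    }
}

sub vcl_backend_fetch {
    # Varnish turns a cache-miss fetch into a GET by default;
    # restore the POST method for the backend request.
    if (bereq.http.X-Body-Len) {
        set bereq.method = "POST";
    }
}
```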
Currently Varnish can't cache POST requests.
AFAIK, people's attempts to cache POST requests have failed. Varnish seems to end up converting those into GET requests.
Sources:
A Blog (with additional information about how to do it with Nginx)
Varnish Forum
Put another nginx/apache/whatever on an unused port.
Push the POST requests to that server.
There you forward them to Varnish as GET requests and fetch the result.
Return the result via your relay server.
This will probably slow down the whole thing a bit - but we are talking dirty workarounds here, right? Hope this is not a way too crazy solution.
I want to write an RSS feed generator application.
I want to know if varnish or similar caching solution can be used for caching the RSS feed.
Yes, caching an RSS feed application with Varnish will work very well.
Just send the usual "Cache-Control: max-age=XXX" response header from your application, and Varnish will happily cache it for the duration.
I've seen that some RSS clients send a "?forceupdate=" GET argument to RSS feeds. Depending on your traffic levels and requirements you might want to do some request URL sanitation to handle that:
sub vcl_recv {
if (req.url ~ "/rss/") {
# remove any GET arguments to increase cache hit rate
set req.url = regsub(req.url, "\?.*$", "");
}
}