How does Varnish know how long to cache each response for? - caching

Does Varnish simply follow the Cache-Control header from the origin server?
And are there any other ways that you can control how long it caches a response for? For example, can you tell Varnish to cache a response “indefinitely” (i.e. “until further notice”) and then later explicitly instruct it to delete that object from the cache when you know the underlying data has changed?
(Please note: I've never used Varnish; I'm just trying to work out whether it would be a good fit for a forthcoming project.)

Those are very basic questions. I think you should start from reading great docs on https://www.varnish-cache.org/docs/
To answer your question: It depends on how you configure varnish.
You can leave the defaults so it'll use expires;
You can set it up to have different TTL(Time To Live) for each domain/backend/filetype/cookie...
If you set it up with ie. 1year cache TTL, you can remove it from cache by "Purging" specific address/url or whole domain.
You can do so in two ways:
by PURGE HTTP Method if you have it configured in your vcl file
by using purge command in varnishadm/varnish console
https://www.varnish-cache.org/docs/2.1/tutorial/purging.html

Related

Browser Cache Private S3 Resources

Stack is:
Angular
Laravel
S3
nginx
I'm using S3 to store confidential resources of my users. Bucket access is set to private which means I can access files either by creating temporary (signed, dynamic) links or by using Storage::disk('s3')->get('path/to/resource') method and returning an actual file as a response.
I'm looking for a way to cache resources in user's browser. I have tried to set cache headers to resource response directly on AWS, but since I'm creating temporary urls, they are dynamic and cache is not working in that case.
Any suggestion is highly appreciated.
EDIT: One thing that makes the whole problem even more complex is that security of resources should be intact. It means that I need a way to cache resources, but in the same time I must prevent users from copy-pasting links and using them outside of the app (sharing with others via direct links).
Temporary links in terms of security are still not an ideal solution, since they can be shared (and accessed multiple times) within the period of time they are valid for (in my case it's 30 seconds).
Caching will work as-is (based on Cache-Control, et al.) as long as the URL stays the same. So, if your application uses the same signed URL for awhile, you'll be fine.
The problem comes when you want to update an expiration date or something. This of course has different querystring parameters, and is effectively a different URL. You need a different caching key, but the browser has no concept of this by default.
If it is acceptable for your security, you can create a Service Worker which uses just the base URL (without querystring) as the cache key. Then, future requests for the same object on the bucket will be able to used the cached response, regardless of other URL parameters.
I must prevent users from copy-pasting links and using them outside of the app (sharing with others via direct links).
This part of your requirement is impossible, and unrelated to caching. Once that URL is signed, it can be used by others.
You have just add one parameter in your code.
'ResponseCacheControl' => 'no-store'
Storage::disk('s3')->getAwsTemporaryUrl(Storage::disk('s3')->getDriver()->getAdapter(), trim($mNameS3), \Carbon\Carbon::now()->addMinutes(config('app.aws_bucket_temp_url_time')), ['ResponseCacheControl' => 'no-store']);

How to choose TTURLRequestCachePolicy?

I'm building an app with Three20 and I'm using the photo gallery component.
I can't find any documentation about the different cache policy available.
Could you explain to me each of them ?
TTURLRequestCachePolicyDefault
TTURLRequestCachePolicyDisk
TTURLRequestCachePolicyEtag
TTURLRequestCachePolicyLocal
TTURLRequestCachePolicyMemory
TTURLRequestCachePolicyNetwork
TTURLRequestCachePolicyNoCache
TTURLRequestCachePolicyNone
Thanks !
I'm not sure of the exact policy of each type, and they are not well documented. These is the information I have found out by using and reading the code:
TTURLRequestCachePolicyNone - requests will not use three20 cache system. meaning each request will perform a network request.
TTURLRequestCachePolicyMemory - the request will try to look for an existing cache object in the device memory. memory is cleaned each time the application is terminated. not sure how useful it is. from what i have seem, it's working only for UIImage objects
TTURLRequestCachePolicyDisk - Three20 saves cache objects in the application document folder as files. The request will look only on that disk cache.
TTURLRequestCachePolicyNetwork - not sure. i think it's checks the header expire date of the content.
TTURLRequestCachePolicyNoCache - will not cache new responses and will not look for cache objects in existing cache
TTURLRequestCachePolicyEtag - requests will be looked based on their header etag. I think it's a little buggy in three20, so it's better not to use it.
TTURLRequestCachePolicyLocal - requests will be looked on both disk & memory cache
TTURLRequestCachePolicyDefault - requests will be looked in all cache types (besides the etag)
From my experience, i use TTURLRequestCachePolicyDefault with expiration time i want, and TTURLRequestCachePolicyNoCache for requests i want to disable cache and make sure each request is doing a network call.

Refreshing in RestSharp for Windows Phone

I implemented RestSharp succesfully in my WP7 application, but one issue remains:
When I load resources from the server (for example a GET request on http://localhost:8080/cars), the first time the collection of (in this case) cars is succesfully returned.
When I issue the same request for the second time, I always get the same result as the first time - even when the resources have changed in the meantime. When looking at my server, the second time there is no request issued at all.
I presume there's a caching mechanism implemented in RestSharp, but I see no way to invalidate the cache results.
Are there any ways to manually invalidate the RestSharp for Windows Phone cache results? (Or ways to force the library to get the results from the server)
You can control caching of resources by setting headers on the response your server sends back. If you do not want the resource to be cached then set the cache-control header to no-cache.
It is the server's job to specify how long a resource is good for, the client should do its best to respect that information.
If you really, really want to delete entries in the cache you need to go via the WinINet API
As a quick hack to avoid caching you can append a unique value to the end of the query string. The current DateTime (including seconds and milliseconds if necessary) or a GUID are suitable.
eg.
var uri = "http://example.com/myrequest?rand=" + DateTime.Now().ToString();

Can I clear a specific URL in the browser's cache (using POST, or otherwise)?

The Problem
There's an item (foo.js) that rarely changes. I'd like this item to be stored in the browser's cache (using Expires header). However, when it does change, I'd like the browser to update to the newest version.
The Attempt
Foo.js is returned with a far future Expires header. It's cached on the browser and requires no round trip query to the server. Just the way I like it. Now, when it changes....
Let's assume I know that the user's version of foo.js is outdated. How can I force a fresh copy of it to be obtained? I use xhr to perform a POST to foo.js. This should, in theory, force the browser to get a newer version of foo.js.
Unfortunately, this only seems to work in Firefox. Other browsers will use their cached version of the copy, even if other POST paramters are set.
WTF
First off, is there a way to do what I'm trying to do?
Second, why is there no sensible key/value type of cache that browser's have? Why can I not simply not include in headers: "Cache: some_key, some_expiration_time" and also specify "Clear-Cache: key1, key2, key3" (the keys must be domain specific, of course). Instead, we're stuck with either expensive round-trips that ask "is content new?", or the ridiculous "guess how long it'll be before you modify something" Expires header.
Thanks
Any comments on this matter are greatly appreciated.
Edits
I realize that adding a version number to the file would solve this. However, in my case it is not possible -- the call to "foo.js" is hardcoded into a bookmarklet.
You can just add a querystring to the end of the file, the server can ignore it, but the browser can't, it must treat it as a new request:
http://www.site.com/foo.js?v=1.12345
Many people use this approach, SO uses a hash of some sort, I use the build number (so users get a new version each build). If either of these is an option, you get the benefit of long duration cache headers, but still force a fetch of a new copy when needed.
Why set your cache expiration so far in the future? If you set it to one day for instance, the only overhead that you will incur (once a day) is the browser revalidating that it is the same file. If you still have not changed it, then you will not re-download the file, the server will respond with a not-modified response.
All caches have a set of rules that
they use to determine when to serve a
representation from the cache, if it’s
available. Some of these rules are set
in the protocols (HTTP 1.0 and 1.1),
and some are set by the administrator
of the cache (either the user of the
browser cache, or the proxy
administrator).
Generally speaking, these are the most
common rules that are followed (don’t
worry if you don’t understand the
details, it will be explained below):
If the response’s headers tell the cache not to keep it, it won’t.
If the request is authenticated or secure (i.e., HTTPS), it won’t be
cached.
A cached representation is considered fresh (that is, able to be
sent to a client without checking with
the origin server) if:
* It has an expiry time or other age-controlling header set, and
is still within the fresh period, or
* If the cache has seen the representation recently, and it was
modified relatively long ago.
Fresh representations are served directly from the cache, without
checking with the origin server.
If an representation is stale, the origin server will be asked to
validate it, or tell the cache whether
the copy that it has is still good.
Under certain circumstances — for example, when it’s disconnected
from a network — a cache can serve
stale responses without checking with
the origin server.
If no validator (an ETag or
Last-Modified header) is present on a
response, and it doesn't have any
explicit freshness information, it
will usually — but not always — be
considered uncacheable.
Together, freshness and validation are
the most important ways that a cache
works with content. A fresh
representation will be available
instantly from the cache, while a
validated representation will avoid
sending the entire representation over
again if it hasn’t changed.
http://www.mnot.net/cache_docs/#BROWSER
There is an excellent suggestion made in this thread: How can I make the browser see CSS and Javascript changes?
See the accepted answer by user, "grom".
The idea is to use the "modified" time stamp from the server to note when the file has been modified, and adding a version parameter to the end of the URL, making your CSS and JS files have URLs like this: my.js?version=12345678
This makes the browser think it is a new file, and so it does not refer to the cached version.
I am using a similar method in my app. It works pretty well. Of course, this would assume you are using something like PHP to process your HTML.
Here is another link with a more simple implementation for WordPress: http://markjaquith.wordpress.com/2009/05/04/force-css-changes-to-go-live-immediately/
With these constraints I guess your only option is to use window.location.reload(true) and force the browser to fresh all the cached items.. it's not pretty
You can invalidate cache on a specific url, using Cache-Control HTML header.
On your desired URL you can run (with xhr/ajax for instance) a request with following headers :
headers: {
'Cache-Control': 'no-cache, no-store, must-revalidate, max-age=0',
Pragma: 'no-cache',
Expires: '0',
}
Your cache will be invalidated, and next GET requests will return a brand new result.

Check if web page is modifed / has expired with Ruby

I'm writing a crawler for Ruby, and I want to honour the headers that the server sends out in order to make the crawl more efficient. Is there a straightforward way in Ruby of determining whether a page needs to be re-downloaded by the client? I know I need to consider at least these headers:
Last Modified
Etags
Cache Control
Expires
What's the definitive way of determining this - is it specified anywhere?
You are right on the headers you will need to look at, but you need to consider that the server is what is setting these. If they are set correctly, then you can use them to make the decision, but none of them are required.
Personally, I would probably start with tracking the expires value as I do the initial download, as well as logging the etag. Finally I'd look at last modified as I did the next pass, assuming the expires or etag showed some sign that I might need to re-download (or if they aren't even set). I wouldn't expect Cache Control to be all the useful.
You'll want to read about the head method in Net::HTTP -- http://www.ruby-doc.org/stdlib/

Resources