How to control amp-live-list caching?

I'm implementing an amp-live-list for our site and I have everything set up. Everything looks good when I go to the AMP version of my live-blog pages (where the element is implemented). However, when I run the URL through Google, i.e. https://www.google.com/amp/www.example.com/test-live-blog/amp, it can take up to 3-4 minutes for an update to come through, even though polling is set to the minimum of 15 seconds.
The delay directly on the AMP URL, i.e. https://www.example.com/test-live-blog/amp, is around the expected 15-second mark. Does Google AMP have a separate cache or request header it uses? What response header can I set to try to reduce the time to live for the AMP version of my document? I can't find any suitable documentation for these kinds of caching questions. Thanks.

The Google AMP Cache respects the max-age header, as specified in the docs:
The cache follows a "stale-while-revalidate" model. It uses the origin's caching headers, such as Max-Age, as hints in deciding whether a particular document or resource is stale. When a user makes a request for something that is stale, that request causes a new copy to be fetched, so that the next user gets fresh content.
The Google AMP Cache, even when the cache ping (update mechanism) is used, has some latency on the order of minutes; the lowest I have seen is about a minute.
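If you control the origin, the main lever you have is the Cache-Control header on the AMP document itself. Below is a minimal sketch of serving the live-blog page with a short max-age, using Python/Flask purely for illustration; the route and template names are hypothetical, and the AMP Cache still treats max-age only as a hint.

from flask import Flask, make_response, render_template

app = Flask(__name__)

@app.route("/test-live-blog/amp")
def live_blog_amp():
    # Hypothetical template containing the amp-live-list markup.
    response = make_response(render_template("live_blog_amp.html"))
    # Hint to the Google AMP Cache that this document goes stale quickly;
    # the cache may still take minutes to pick up an update.
    response.headers["Cache-Control"] = "max-age=15"
    return response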

Related

How to calculate TTL for various types of cache?

Is there a standard way to calculate TTL for different types of cache? This is more of a generic question, so let's assume we're designing a system from scratch and we have the following requirements/specs:
Static resources served by CDNs that are rarely updated, e.g. privacy policy, about page, images, and maps
An application cache used to serve (a) sessions and (b) recently used reads, regardless of type
A client-side cache for previously requested files, as well as, say, images or posts a client can see (something similar to Instagram/Twitter in this case)
Calculate TTL for the following types based on the little to no information provided above:
Client cache
CDN
Webserver cache (used for media)
Application cache (sessions and recent reads of some data)
TTLs are mostly defined using historical data, use cases, and experience. There are no predefined rules or theories that tell you the right cache expiry. Choosing a TTL is a trade-off: if you set the TTL too high, you might serve expired (stale) data, so ask what the impact of stale data is in your application. In some cases stale data is not acceptable at all, but in others it's fine to serve stale data for some time.
Still, you'll observe that each caching system ships with some predefined TTL; for example, AWS's CDN defaults to a 24-hour expiry and Google's CDN to 1 hour. ETags are another mechanism used by CDNs to revalidate content.
A CDN can cache data for a week, but some data changes hourly, in which case the expiry is set to a lower value; similar reasoning applies to other use cases.
Sessions should be cached for a week or so, though some applications cache sessions for longer. Of course, there are pros and cons to using a low or high TTL.
An application data cache has similar characteristics to CDN data: the data can change at any time, and the change must be reflected in the cache. Again, the TTL should depend on the use case; in my experience you can cache some data for a day or a week, but other data cannot be cached for more than 15 minutes because it might be updated within that window.
Depending on the nature of the data you can always find an optimal TTL, but finding it takes time, as you have to monitor the cache hit/miss ratio and how often stale data is served.
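To make that concrete, here is a toy sketch of a per-tier TTL table for the kind of system described in the question; every value is an illustrative starting point to be tuned against your hit/miss and staleness monitoring, not a recommendation.

# Illustrative starting-point TTLs (in seconds) per cache tier.
TTLS = {
    "client": 24 * 3600,            # browser cache for previously fetched files
    "cdn_static": 7 * 24 * 3600,    # rarely updated assets (privacy policy, images)
    "cdn_dynamic": 3600,            # content that can change hourly
    "webserver_media": 24 * 3600,   # media served through the web server cache
    "app_sessions": 7 * 24 * 3600,  # sessions are often cached for about a week
    "app_recent_reads": 15 * 60,    # data that might be updated within minutes
}

def cache_control(tier):
    # Build a Cache-Control header value for the given tier.
    return "max-age=%d" % TTLS[tier]

print(cache_control("cdn_static"))  # max-age=604800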
References:
https://martinfowler.com/bliki/TwoHardThings.html
https://www.stevesouders.com/blog/2012/03/22/cache-them-if-you-can/

Does the Google Analytics API throttle requests?

Does the Google Analytics API throttle requests?
We have a batch script that I have just moved from v2 to v3 of the API. The requests go through quite well for the first bit (50 queries or so), and then they start taking around 4 s each. Is this Google throttling us?
While Matthew is correct, I have another possibility for you: the Google Analytics API caches your requests to some extent. Let me try to explain.
I have a customer/site that I request data from. While testing I noticed some strange things:
The first million rows would come back within an acceptable amount of time.
After a million rows, things started to slow down; results were taking five times as long, so instead of 5 minutes we were waiting 20 minutes or more for them to return.
Example:
Request URL :
https://www.googleapis.com/analytics/v3/data/ga?ids=ga:34896748&dimensions=ga:date,ga:sourceMedium,ga:country,ga:networkDomain,ga:pagePath,ga:exitPagePath,ga:landingPagePath&metrics=ga:entrances,ga:pageviews,ga:exits,ga:bounces,ga:timeOnPage,ga:uniquePageviews&filters=ga:userType%3D%3DReturning+Visitor;ga:deviceCategory%3D%3Ddesktop&start-date=2014-05-12&end-date=2014-05-22&start-index=236001&max-results=2000&oauth_token={OauthToken}
Request Time (seconds:milliseconds): 0:484
Request URL :
https://www.googleapis.com/analytics/v3/data/ga?ids=ga:34896748&dimensions=ga:date,ga:sourceMedium,ga:country,ga:networkDomain,ga:pagePath,ga:exitPagePath,ga:landingPagePath&metrics=ga:entrances,ga:pageviews,ga:exits,ga:bounces,ga:timeOnPage,ga:uniquePageviews&filters=ga:userType%3D%3DReturning+Visitor;ga:deviceCategory%3D%3Ddesktop&start-date=2014-05-12&end-date=2014-05-22&start-index=238001&max-results=2000&oauth_token={OauthToken}
Request Time (seconds:milliseconds): 7:968
I did a lot of testing, stopping and starting my application, but I couldn't figure out why the data was so fast in the beginning and slow later.
Now, I have some contacts on the Google Analytics development team, the guys in charge of the API. So I made a nice test app, logged some results showing my issue, and sent it off to them with the question: are you throttling me?
They were also perplexed, and told me there is no throttle on the API, only the flood-protection limit that Matthew speaks of. My developer contact forwarded it to the guys in charge of the traffic.
Fast forward a few weeks. It seems that when we make a request for a bunch of data, Google caches that data for us; it's saved on the server in case we request it again. By restarting my application I was accessing the cached data, and it would return fast. When I let the application run longer, I would eventually reach non-cached data, and it would take longer for the requests to return.
I asked how long data is cached for; the answer was that there is no set time. So I don't think you are being throttled. I think your initial speedy requests hit cached data and your slower requests hit non-cached data.
Email back from google:
Hi Linda,
I talked to the engineers and they had a look. The response was
basically that they thinks it's because of caching. The response is
below. If you could do some additional queries to confirm the behavior
it might be helpful. However, what they need to determine is if it's
because you are querying and hitting cached results (because you've
already asked for that data). Anyway, take a look at the comments
below and let me know if you have additional questions or results that
you can share.
Summary from talking to engineer: "Items not already in our cache will
exhibit a slower retrieval processing time than items already present
in the cache. The first query loads the response into our cache and
typical query times without using the cache is about 7 seconds and
with using the cache is a few milliseconds. We can also confirm that
you are not hitting any rate limits on our end, as far as we can tell.
To confirm if this is indeed what's happening in your case, you might
want to rerun verified slow queries a second time to see if the next
query speeds up considerably (this could be what you're seeing when
you say you paste the request URL into a browser and results return
instantly)."
-- IMBA Google Analytics API Developer --
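If you want to confirm the caching explanation on your own account, the engineer's suggestion boils down to timing the same slow query twice. A rough sketch, where run_query() is a hypothetical stand-in for executing one of the request URLs above:

import time

def run_query(start_index):
    ...  # hypothetical: execute one page of the Core Reporting request

def timed(fn, *args):
    # Return the elapsed wall-clock seconds for a single call.
    start = time.monotonic()
    fn(*args)
    return time.monotonic() - start

# If the second run is dramatically faster than the first, you were very
# likely hitting Google's server-side cache rather than a throttle.
first = timed(run_query, 238001)
second = timed(run_query, 238001)
print("first: %.1fs, second: %.1fs" % (first, second))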
Google's Analytics API does have a rate limit per their docs: https://developers.google.com/analytics/devguides/reporting/core/v3/coreErrors
However, hitting it should not cause delayed requests; rather, the request should fail with a response of: 403 userRateLimitExceeded
Description of that error:
Indicates that the user rate limit has been exceeded. The maximum rate limit is 10 qps per IP address. The default value set in Google Developers Console is 1 qps per IP address. You can increase this limit in the Google Developers Console to a maximum of 10 qps.
Google's recommended course of action:
Retry using exponential back-off. You need to slow down the rate at which you are sending the requests.
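A minimal sketch of that exponential back-off, assuming a hypothetical send_request callable and a RateLimitError wrapper for the 403 userRateLimitExceeded response:

import random
import time

class RateLimitError(Exception):
    # Hypothetical wrapper for a 403 userRateLimitExceeded response.
    pass

def request_with_backoff(send_request, max_retries=5):
    # Retry with exponential back-off plus jitter, per Google's guidance.
    for attempt in range(max_retries):
        try:
            return send_request()
        except RateLimitError:
            # Sleep 2^attempt seconds plus up to 1 second of random jitter.
            time.sleep(2 ** attempt + random.random())
    raise RuntimeError("still rate-limited after %d retries" % max_retries)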

Varnish: How to send hit/miss stats to backend

I hope you can help
I have an image server that generates images on the fly.
I'm using varnish to cache generated images.
I need to record how many requests (per image) Varnish receives, as well as whether each was a hit or a miss (a pass gets marked as a miss). Currently I'm writing access logs with hit/miss to a file; a crontab job then processes this access-log file and writes the data to my DB...
What I would like to do instead is:
Have Varnish make a request to my backend notifying it of a cache hit (and, if possible, the response size in bytes).
My backend could then save this data...
Is this at all possible and if so how?
In case anybody is interested:
2 Varnish instances, each with 1 (Java + Tomcat) backend.
The service manipulates and generates each image to the specific requirements of the request...
The figures below are per day:
Over 35 million page views, where each page has at least 3 images on it.
Varnish gets around 3+ million requests for images (images are also cached by the browser).
Varnish has an 87% hit rate.
Response times for a hit are a few microseconds.
Response times for a miss are 50 ms to 1000 ms, depending on the size of the image (both source and output).
The best way of doing this is to have a helper process that tails the varnishlog output and makes the HTTP calls when needed.
You can do this by logging the necessary data with std.log() in vcl_deliver, so the helper process gets all the data it needs. Use obj.hits > 0 to check whether the request was a cache hit.
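A rough sketch of such a helper in Python, assuming varnishncsa is run with a custom format that emits the URL, response size, and hit/miss state per request; the /stats endpoint on the backend is hypothetical. It batches the reports rather than making one backend call per request (see also the aggregation suggestion below):

import subprocess
import requests

# Emit "<url> <bytes> <hit|miss>" per request; %{Varnish:hitmiss}x is a
# varnishncsa format extension.
cmd = ["varnishncsa", "-F", "%U %b %{Varnish:hitmiss}x"]
proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True)

batch = []
for line in proc.stdout:
    url, size, state = line.split()
    batch.append({"url": url, "bytes": size, "hit": state == "hit"})
    # Ship an aggregate every 100 requests instead of one call per request.
    if len(batch) >= 100:
        requests.post("http://backend.example.com/stats", json=batch)
        batch = []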
If you really, really need to do it inline (and slow down all your cache hits badly), you can use libvmod-curl:
https://github.com/varnish/libvmod-curl
If you are going to send a request to a stats server from within your VCL, I would try to aggregate, sending one request every 100 (or however many) requests instead of one per incoming request.
Like the other answer, I would recommend using varnishncsa (or varnishlog) with a process that tails the log file. There can be some delay in that method, but if that is acceptable, I would consider post-processing the Varnish log when logrotate runs. That way you have a full day's worth of data and can churn through it, producing whatever report you need.
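If the logrotate route fits, the post-processing pass can be as simple as counting hits and misses per URL over the rotated file. A sketch, assuming the same three-field format as the tail example above and a hypothetical log path:

from collections import Counter

counts = Counter()
with open("/var/log/varnish/access.log.1") as log:  # hypothetical rotated log
    for line in log:
        url, _size, state = line.split()
        counts[(url, state)] += 1

# Print the ten busiest (url, hit/miss) pairs for the day's report.
for (url, state), n in counts.most_common(10):
    print("%8d  %-4s  %s" % (n, state, url))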

How long should static files be cached?

I'd like to set browser caching for some Amazon S3 files. I plan to use this metadata:
Cache-Control: max-age=86400, must-revalidate
That's equal to one day.
Many of the examples I see look like this:
Cache-Control: max-age=3600
Why only 3600 and why not use must-revalidate?
For a file that I rarely change, how long should it be cached?
What happens if I update the file and need that update to be seen immediately, but its cache doesn't expire for another 5 days?
Why only 3600 ?
Presumably because the author of that particular example decided that one hour was an appropriate cache timeout for that page.
Why not use must-revalidate ?
If the response does not contain information for which strict adherence to your cache rules is required, omitting must-revalidate can in theory let a few more requests be served from the cache, since caches may serve stale content in some situations. See this answer for details; the most relevant part, from the HTTP spec:
When a cache has a stale entry that it would like to use as a response
to a client's request, it first has to check with the origin server
(or possibly an intermediate cache with a fresh response) to see if
its cached entry is still usable.
For a file that I rarely change, how long should it be cached?
Much web performance advice says to set the cache expiration very far into the future, such as a few years. That way, the client browser will only download the data once, and subsequent visits will be served from the cache. This works well for "truly static" files, such as JavaScript or CSS.
On the other hand, if the data is dynamic but does not change too often, you should set an expiration time that is reasonable for your specific scenario. Do you need to get the newest version to the customer as soon as it's available, or is it okay to serve a stale version? Do you know when the data changes? Etc. An hour or a day is often an appropriate trade-off between server load, client performance, and data freshness, but it depends on your requirements.
What happens if I update the file and need that update to be seen immediately, but its cache doesn't expire for another 5 days?
Give the file a new name, or append a value to the query string. You will of course need to update all links. This is the general approach when static resources need to change.
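A common way to implement that renaming is to derive a fingerprint from the file's contents and bake it into the URL, so a long max-age stays safe. A small sketch with hypothetical paths:

import hashlib
from pathlib import Path

def fingerprinted_url(path):
    # Append a short content hash as a query-string cache-buster; the URL
    # changes whenever the file's contents change.
    digest = hashlib.md5(Path(path).read_bytes()).hexdigest()[:8]
    return "/static/%s?v=%s" % (Path(path).name, digest)

print(fingerprinted_url("static/app.css"))  # e.g. /static/app.css?v=3f2a9c1b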
Also, here is a nice overview of the cache control attributes available to you.

Automatically rebuild cache

I run a Symfony 1.4 project with a very large amount of data. The main page and category pages use pagers, which need to know how many rows are available. I'm passing a query that contains joins to the pager, which leads to a loading time of 1 minute on these pages.
I configured cache.yml for the respective actions, but I think the workaround is insufficient. Here are my assumptions:
Symfony rebuilds the cache within a single request, which is made by a user. Let's call this user the "cache-victim" to simplify things.
In our case, the data needs to be up to date; a lifetime of 10 minutes would be sufficient. Obviously, the cache won't be rebuilt if no user is willing to be the "cache-victim" and instead just cancels the request. Are these assumptions correct?
So, I came up with this idea:
Symfony should fake the HTTP request after rebuilding the cache. The new cache entries should be written to a temporary file/directory and swapped with the previous cache entries as soon as cache rebuilding has finished.
Is this possible?
In my opinion, this is similar to the concept of double buffering.
Wouldn't it be silly if there were a single "gpu-victim" in a multiplayer game who sees the screen building up line by line? (This is a lop-sided comparison, I know... ;) )
Edit
There is no single "cache-victim": every 10 minutes, the page load takes 1 minute for every user.
I think your problem is due to missing or wrong indexes. I have a sf1.4 project for a large soccer site (i.e. 2M pages/day), and pagers aren't that slow even though our database has more than 1M rows these days. Take a look at your query with EXPLAIN and check where it is going bad...
Sorry for necromancing (is there a badge for that?).
By configuring cache.yml you are only caching the view layer of your app (that is, CSS, JS, and HTML), and only for requests WITHOUT parameters. Navigating the pager obviously puts a ?page=X on the GET request.
Taken from symfony 1.4 config.yml documentation:
An incoming request with GET parameters in the query string or submitted with the POST, PUT, or DELETE method will never be cached by symfony, regardless of the configuration. http://www.symfony-project.org/reference/1_4/en/09-Cache
What might help you is caching the database results, though that's a painful process in symfony/Doctrine. Refer to:
http://www.symfony-project.org/more-with-symfony/1_4/en/08-Advanced-Doctrine-Usage#chapter_08_using_doctrine_result_caching
Edit:
This might help you as well:
http://www.zalas.eu/symfony-meets-apc-alternative-php-cache
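As a workaround for the "cache-victim" problem itself, one framework-agnostic trick along the lines of the question's "fake the HTTP request" idea is to warm the cache from a cron job slightly more often than the cache lifetime, so the slow rebuild never happens inside a real user's request. A hedged sketch with hypothetical URLs:

import requests

# Hypothetical heavy pages whose cache expires every 10 minutes.
PAGES = [
    "https://www.example.com/",
    "https://www.example.com/category/news",
]

# Run from cron more often than the cache lifetime (e.g. */9 * * * *)
# so the cron job, not a visitor, pays the one-minute rebuild cost.
for url in PAGES:
    requests.get(url, timeout=120)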
