API response non static fields and AWS Cloudfront caching - caching

For an API I'm currently building I'm including timestamp and a unique requestId (uuid) for every service response delivered to the client e.g:
"requestId": "bf0c66fa-0f1c-406c-9ee0-48ff73b8c5ee"
"timestamp": "2017-11-03T23:57:40.141Z"
The API sits behind AWS CloundFront.
The API delivers mostly static contents and we'd like the majority of requests to hit the CDN cache rather than the API itself.
Could the varying field values for requestId and timestamp have any undesired side effects when it comes to caching?

No, varying the content will not have an impact on caching.
CloudFront does not examine the content or make caching decisions based on it. It's only interested in the Cache-Control headers and the TTLs you've defined in the relevant Cache Behavior settings.
If you are using CloudFront to cache the responses, then the requestId and timestamp will of course be the same each time a response is returned from the cache, rather than being sent to the origin.

Related

Prevent AJAX POST responses from being cached

I have AJAX POST requests generated from my webpage, and there may be multiple post requests with the same post data. But the response may vary, and I want to make sure I am not getting cached responses to any of these requests. I need each request to hit the webpage.
Am I right in assuming that responses to POST requests will not be cached?
There is two level of caching will be involved in that process
Browser caching
Server caching
To eliminate first one you have to cheat your browser and add a fake parameter to your ajax request so it will think it's unique each time i.e
www.example.com/api/ajax?123
www.example.com/api/ajax?1234
For server level you have to make sure that no cache been added to your configuration for such link, for example some developer will cache any file ends with .json or service like Cloud Flare it will automatically cache any static content.

Is permanent storage of video metadata against the YouTube Data API ToS?

The YouTube Data API ToS says:
Your API Client may employ session-based caching solely of YouTube API results, but You must use commercially reasonable efforts to cause Your API Client to update cached results upon any changes in video metadata. For example, if a video is removed from the YouTube service or made "private" by the video uploader, cached results shall be removed from Your cache. For the avoidance of doubt, Your API Client shall not be designed to cache YouTube audiovisual content.
The YouTube Data API overview also says:
Your application can cache API resources and their ETags. Then, when your application requests a stored resource again, it specifies the ETag associated with that resource. If the resource has changed, the API returns the modified resource and the ETag associated with that version of the resource. If the resource has not changed, the API returns an HTTP 304 response (Not Modified), which indicates that the resource has not changed. Your application can reduce latency and bandwidth usage by serving cached resources in this manner.
This does not mean that any time I wish to request a resource I must go back to the YouTube Data API, correct?
The only data from the API I'm interested in storing in my database is
ETag
Video id
Duration
Title
Is it okay for me to store these four items in my database (given I update the information reasonably regularly)?
I'm not interested in storing any portion of the actual video whatsoever. Just the metadata.

Is it possible for CloudFront to cache REST API calls

I have a Single Page Application, and would like to cache some of the public REST API calls. Is it possible to use CloudFront to cache the JSON result of those API calls?
You can point api.yourdomain.com to cloudfront domain. Cloudfront will cache the json response based on your cache control headers.
However, you'll likely have to deal with cross domain issue if your single page app is not served from api.yourdomain.com. Cloudfront supports OPTIONS request which means it should be able to support CORS. You can also enable caching of OPTIONS requests.
http://aws.amazon.com/cloudfront/faqs/#Does_Amazon_CloudFront_cache_POST_responses

Geo location / filtering and HTTP Caching

I'm trying to add cache support (both HTTP and server) for a ASP.NET Web Api solution.
The solution is geo located, meaning that I can get different results based on the caller IP address.
The question can be trivially solved for the server side cache, using an approach similar to VaryByCustom (like this one). However that does not solve the problem with the client side HTTP caches. Here are the alternatives
I'm considering the following options:
Enforcing a must-revalidate in the cache
Keep the validation server side using the same algorithm to VaryByCustom, but include the extra cache revalidate calls on the server side with ETAGS or any mechanism that keep track of the originally cached value country of origin.
Creating country specific routes HTTP 302
In this scenario an application invoking
http://site/UK/content
Redirects to US version if originating from an US IP address when the cache has expired
http://site/US/content
It might present out-of-date contents that do not match the IP of origin local. That is not a serious problem if the cache expires is a small value (< 1 hour), since country changes are fairly uncommon.
What is the recommended solution?
I'm not sure I understand the problem.
For client caching, if you enable private caching then a user in UK will cache the UK version of http://site/content and the US user will cache the US version of http://site/content.
The only problem I can see is if a user travels from the US to the UK and accesses the content. Or if you allow public caching and some intermediary is shared by US and UK users.
After detailed evaluation first approach was chosen. Actual implementation is:
Create a cache key that depends on the country of origin IP address
Create a ETag for that cache key and store it in Server cache
Additional requests that include ETag If-None-Match header are evaluates in server for cache freshness:
If the country of origin is the same, the cache key will be the same and ETag is valid, returning a HTTP 304 not modified
If the country of origin is different, cache key will be different and such the ETag is not valid, returning a HTTP 200 and returning a new ETag.
Agree with Poul-Henning Kamp geolocation should be a transport level thing, but unfortunately is not, so this is the only way we could come up with to ensure cache freshness for a given country.
The disadvantage is that cannot have any infrastructure cache, e.g., all requests need to check the server for cache freshness.

Amazon Cloudfront: private content but maximise local browser caching

For JPEG image delivery in my web app, I am considering using Amazon S3 (or Amazon Cloudfront
if it turns out to be the better option) but have two, possibly opposing,
requirements:
The images are private content; I want to use signed URLs with short expiration times.
The images are large; I want them cached long-term by the users' browser.
The approach I'm thinking is:
User requests www.myserver.com/the_image
Logic on my server determines the user is allowed to view the image. If they are allowed...
Redirect the browser (is HTTP 307 best ?) to a signed Cloudfront URL
Signed Cloudfront URL expires in 60 seconds but its response includes "Cache-Control max-age=31536000, private"
The problem I forsee is that the next time the page loads, the browser will be looking for
www.myserver.com/the_image but its cache will be for the signed Cloudfront URL. My server
will return a different signed Cloudfront URL the second time, due to very short
expiration times, so the browser won't know it can use its cache.
Is there a way round this without having my webserver proxy the image from Cloudfront (which obviously negates all the
benefits of using Cloudfront)?
Wondering if there may be something I could do with etag and HTTP 304 but can't quite join the dots...
To summarize, you have private images you'd like to serve through Amazon Cloudfront via signed urls with a very short expiration. However, while access by a particular url may be time limited, it is desirable that the client serve the image from cache on subsequent requests even after the url expiration.
Regardless of how the client arrives at the cloudfront url (directly or via some server redirect), the client cache of the image will only be associated with the particular url that was used to request the image (and not any other url).
For example, suppose your signed url is the following (expiry timestamp shortened for example purposes):
http://[domain].cloudfront.net/image.jpg?Expires=1000&Signature=[Signature]
If you'd like the client to benefit from caching, you have to send it to the same url. You cannot, for example, direct the client to the following url and expect the client to use a cached response from the first url:
http://[domain].cloudfront.net/image.jpg?Expires=5000&Signature=[Signature]
There are currently no cache control mechanisms to get around this, including ETag, Vary, etc. The nature of client caching on the web is that a resource in cache is associated with a url, and the purpose of the other mechanisms is to help the client determine when its cached version of a resource identified by a particular url is still fresh.
You're therefore stuck in a situation where, to benefit from a cached response, you have to send the client to the same url as the first request. There are potential ways to accomplish this (cookies, local storage, server scripting, etc.), and let's suppose that you have implemented one.
You next have to consider that caching is only just a suggestion and even then it isn't a guarantee. If you expect the client to have the image cached and serve it the original url to benefit from that caching, you run the risk of a cache miss. In the case of a cache miss after the url expiry time, the original url is no longer valid. The client is then left unable to display the image (from the cache or from the provided url).
The behavior you're looking for simply cannot be provided by conventional caching when the expiry time is in the url.
Since the desired behavior cannot be achieved, you might consider your next best options, each of which will require giving up on one aspect of your requirement. In the order I would consider them:
If you give up short expiry times, you could use longer expiry times and rotate urls. For example, you might set the url expiry to midnight and then serve that same url for all requests that day. Your client will benefit from caching for the day, which is likely better than none at all. Obvious disadvantage is that your urls are valid longer.
If you give up content delivery, you could serve the images from a server which checks for access with each request. Clients will be able to cache the resource for as long as you want, which may be better than content delivery depending on the frequency of cache hits. A variation of this is to trade Amazon CloudFront for another provider, since there may be other content delivery networks which support this behavior (although I don't know of any). The loss of the content delivery network may be a disadvantage or may not matter much depending on your specific visitors.
If you give up the simplicity of a single static HTTP request, you could use client side scripting to determine the request(s) that should be made. For example, in javascript you could attempt to retrieve the resource using the original url (to benefit from caching), and if it fails (due to a cache miss and lapsed expiry) request a new url to use for the resource. A variation of this is to use some caching mechanism other than the browser cache, such as local storage. The disadvantage here is increased complexity and compromised ability for the browser to prefetch.
Save a list of user+image+expiration time -> cloudfront links. If a user has an non-expired cloudfront link use it for an image and don't generate a new one.
It seems you already solved the issue. You said that your server is issuing a redirect http 307 to the cloudfront URL (signed URL) so the browser caches only the cloudfront URL not your URL(www.myserver.com/the_image). So the scenario is as follows :
Client 1 checks www.myserver.com/the_image -> is redirect to CloudFront URL -> content is cached
The CloudFront url now expires.
Client 1 checks again www.myserver.com/the_image -> is redirected to the same CloudFront URL-> retrieves the content from cache without to fetch again the cloudfront content.
Client 2 checks www.myserver.com/the_image -> is redirected to CloudFront URL which denies its accesss because the signature expired.

Resources