Single Page Application with Lambda@Edge - aws-lambda

So I have a SPA served from AWS S3 by AWS CloudFront. I have configured the following Error Pages behaviour:
404: Not Found -> /index.html with HTTP code 200
This is needed to handle routing on the client side.
Now I've got a Lambda@Edge function, triggered by the viewer-response event in CloudFront, which sets some custom headers like HSTS and X-Frame-Options. The function is invoked and works as intended on every resource except the actual /index.html. I'm inclined to think this is because the GET request for the HTML is being handled by the aforementioned error pages configuration in CloudFront.
What would be a practical approach to solving this?
I'm not sure why the redirect doesn't trigger the Lambda function. Is there any way to implement the same logic as the error pages configuration in Lambda@Edge?

Update: the behavior of the service has changed.
https://aws.amazon.com/about-aws/whats-new/2017/12/lambda-at-edge-now-allows-you-to-customize-error-responses-from-your-origin/
The answer below was correct at the time it was posted, but it is no longer applicable. Origin errors now trigger the Lambda@Edge function as expected in Origin Response triggers (but not in Viewer Response triggers).
Note that you can generate a custom response body in an Origin Response trigger, but you don't have programmatic access to read the original response body returned from the origin, if there is one. You can replace it or leave it as it is -- whatever it is. This is because Lambda@Edge Origin Response triggers don't wait for CloudFront to receive the entire response from the origin -- they appear to fire as soon as the origin finishes returning complete, valid response headers to CloudFront.
When you're working with the HTTP response, note that Lambda@Edge does not expose the HTML body that is returned by the origin server to the origin-response trigger. You can generate a static content body by setting it to the desired value, or remove the body inside the function by setting the value to be empty. If you don't update the body field in your function, the original body returned by the origin server is returned back to the viewer.
https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/lambda-updating-http-responses.html
Important reminders: Any time you are testing changes on CloudFront, remember that your changes tend to start working sooner than you would expect -- before the distribution state changes back to Deployed, but you may need to do a cache invalidation to make your changes fully live and visible. Invalidations should include the path actually requested by the browser, not the path being requested from the origin (if different), or /* to invalidate everything. When viewing a response from CloudFront, if there is an Age: response header, you are viewing a cached response. Remember, also, that errors use a different set of timers for caching responses. These are configured separately from the TTL values in the cache behavior. See my answer to Amazon CloudFront Latency for an explanation of how to change the Error Caching Minimum TTL, which defaults to 5 minutes and does not generally respect Cache-Control headers. This is a protective measure to prevent excessive errors from reaching your origin (or triggering your Lambda functions) but is confusing during testing and troubleshooting if you aren't aware of its impact.
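For reference, here is a minimal sketch of starting such an invalidation programmatically with the AWS SDK for JavaScript (v2); 'EXAMPLE_DISTRIBUTION_ID' is a placeholder you would replace with your own distribution ID:
const AWS = require('aws-sdk');
const cloudfront = new AWS.CloudFront();
// Invalidate every path on the distribution.
cloudfront.createInvalidation({
  DistributionId: 'EXAMPLE_DISTRIBUTION_ID', // placeholder
  InvalidationBatch: {
    CallerReference: String(Date.now()), // must be unique per invalidation request
    Paths: { Quantity: 1, Items: ['/*'] }
  }
}, (err, data) => {
  if (err) console.error(err);
  else console.log('Invalidation started:', data.Invalidation.Id);
});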
(original answer follows)
CloudFront doesn't execute Lambda functions for origin response and viewer response events if the origin returns HTTP status code 400 or higher.
http://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/lambda-requirements-limits.html
What this means is that an unhandled error results in no response triggers firing.
However, when an origin error is handled by a custom error response document, the origin triggers do fire on the fallback request, including Origin Response if the error document renders successfully, and here is where you'll find your solution.
Your code will run if you implement it as an Origin Response trigger instead of a Viewer Response trigger, because when fetching /index.html (the substitute error page) the origin returns 200, which invokes the Origin Response trigger -- but the Viewer Response trigger still doesn't fire. This behavior does not appear to be fully documented, but testing reveals Origin Request and Origin Response triggers firing separately on successful error document fetches, as long as the Cache Behavior whose path matches the error document is configured with the triggers.
In fact, an Origin Response trigger seems to make more sense for your application anyway, because it can modify the response before it goes into the cache, and the added headers will be cached along with the response -- which should reduce the overall number of times the trigger actually needs to fire.
You can add it as an Origin Response trigger, wait for the distribution to return to Deployed, then do a cache invalidation for /* (so that you don't serve any pages that were cached without the headers added), and after the invalidation is complete, remove the Viewer Response trigger.
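As an illustration, here is a minimal sketch of such an Origin Response trigger adding the kind of headers mentioned in the question; the header values are examples, not recommendations:
'use strict';
// Sketch of a Lambda@Edge Origin Response trigger that adds security headers.
// Header values below are illustrative only.
exports.handler = (event, context, callback) => {
  const response = event.Records[0].cf.response;
  const headers = response.headers;
  // Header names in the event structure are lowercase; 'key' carries the canonical casing.
  headers['strict-transport-security'] = [{ key: 'Strict-Transport-Security', value: 'max-age=31536000; includeSubDomains' }];
  headers['x-frame-options'] = [{ key: 'X-Frame-Options', value: 'DENY' }];
  callback(null, response);
};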
Aside: I submitted a feature request to support firing response triggers on errors, though I didn't know whether it was something they were considering. Apparently I wasn't the only one, since the feature has since been implemented and released, as described in the revised answer above.

Related

Firefox 68.0 update causes API calls to return 403 after CSP report-uri POST requests

After the most recent Firefox update (68.0), I am having problems with persistent session data.
When a user logs in, as the page loads, there are various expected CSP violations that send a POST request containing the violation report to the path the report-uri directive specifies.
Subsequent API GET requests to retrieve user data return 403 Forbidden, which (by design) redirects the user back to the login page. Since the user is already logged in, the same API requests are sent and result in another 403, which leads to a loop until, after an arbitrary number of iterations, the API requests return 200 OK.
All requests (both POST and GET) before and after the update are the same.
It seems to me that the presence of the CSP report POST requests before the API requests changes something related to the session, which the back-end uses to determine whether the user has the correct privileges.
Could Firefox have changed the way it handles CSP report-uri requests, or the way it handles their responses, with this update?
What would be a good way to approach this problem?
Firefox has since been updated to version 68.0.1, and the update seems to have fixed this problem. The release notes don't mention anything I can relate to it, but regardless, the problem is solved.

Embedded Google Drive API to show a PDF returns 204

My website has an iframe pointing to https://drive.google.com/viewer?url=https://mywebsite/myfile.pdf&embedded=true
Most of the time the PDF loads correctly, but sometimes it doesn't and I just get a blank page. The request seems to return 204 (request successful, empty response).
I can even replicate this by entering the URL above directly in the browser and refreshing multiple times until I get a 204, so it is not something in my website and/or the iframe. Any idea why this happens, and how to prevent it?
Thanks in advance :)
HTTP status 204 (No Content) indicates that the server has successfully fulfilled the request and that there is no content to send in the response payload body. The server might want to return updated meta information in the form of entity headers, which, if present, SHOULD be applied to the current document's active view, if any.
By default, a 204 (No Content) response is cacheable. If caching needs to be overridden, the response must include the respective cache headers.
To deal with the lost update problem, the server may also include the HTTP ETag header, letting the client validate its copy of the resource before making further updates on the server:
The lost update problem happens when multiple people edit a resource without knowledge of each other's changes. In this scenario, the last person to update a resource "wins", and previous updates are lost. ETags can be used in combination with the If-Match header to let the server decide if a resource should be updated. If the ETag does not match, the server informs the client via a 412 (Precondition Failed) response.
Please check this site for more details.
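As a hedged sketch of that ETag / If-Match flow in browser JavaScript (the /api/resource endpoint and payload are hypothetical placeholders):
async function safeUpdate() {
  // Fetch the resource and remember the server's current ETag.
  const res = await fetch('/api/resource');
  const etag = res.headers.get('ETag');
  // Conditional update: the server applies the PUT only if the ETag still matches.
  const update = await fetch('/api/resource', {
    method: 'PUT',
    headers: { 'If-Match': etag, 'Content-Type': 'application/json' },
    body: JSON.stringify({ name: 'example' })
  });
  if (update.status === 412) {
    // Precondition Failed: someone else updated the resource in the meantime.
  }
}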

CloudFront Lambda@Edge HTTPS redirect

I have a CloudFront distribution with a Lambda function attached to the viewer request hook. I'm using this to redirect to the canonical domain (eg. www.foo.tld -> foo.tld). I also have the distribution itself set up to redirect HTTP -> HTTPS.
The problem is that this requires clients to potentially have to do 2 requests to get to the correct URL. For example:
http://www.foo.tld/ -> https://www.foo.tld/ (performed by CloudFront)
https://www.foo.tld/ -> https://foo.tld/ (performed by Lambda function attached to viewer request hook)
I would like to have this done in 1 request:
http://www.foo.tld/ -> https://foo.tld/
It looks like I need to add this functionality to the Request Event, but the documentation seems to indicate the protocol is not exposed to the Lambda function in the request event.
My question is:
How do I expose the protocol to the Lambda function attached to the Viewer Request hook?
Alternately, is there a better way to do this?
Side note: redirects that change both the hostname and the scheme may be problematic, more in the future than now, as browsers become less accepting of HTTP behavior without TLS. I am at a loss, at the moment, to cite a source to back this up, but am under the impression that redirecting directly from http://www.example.com to https://example.com should be avoided. Still, if that's what you want...
CloudFront and Lambda@Edge support this, but only in an Origin Request trigger.
If you whitelist the CloudFront-Forwarded-Proto header in the Cache Behavior settings, you can then access that value like this:
const request = event.Records[0].cf.request; // you may already have this
const scheme = request.headers['cloudfront-forwarded-proto'][0].value;
The value of scheme will either be http or https.
I'm a little bit pedantic, so I like a failsafe. This alternative version will always set scheme to https, avoiding the exception that would be thrown if for whatever reason the header were not there. This may or may not suit your taste:
const request = event.Records[0].cf.request; // you may already have this
const scheme = (request.headers['cloudfront-forwarded-proto'] || [{ value: 'https' }])[0].value;
The reason this can only be done in an Origin Request trigger is that CloudFront doesn't actually add this header internally until after the Viewer Request trigger has already fired, if there is one.
But note also that you almost certainly want to do this in an Origin Request trigger anyway, because responses from these triggers can be cached... which should mean faster responses and lower costs. Whitelisting the header also adds it to the cache key, meaning CloudFront will automatically cache separate HTTP and HTTPS responses for any given page and only replay them for identical requests.
See also https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/lambda-requirements-limits.html#lambda-cloudfront-star-headers
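Putting the pieces together, here is a hedged sketch of an Origin Request trigger that performs the single-hop redirect from the question; foo.tld stands in for the canonical domain, and it assumes both CloudFront-Forwarded-Proto and Host are whitelisted (whitelisting Host is possible for custom origins) so the viewer's values are visible to the function:
'use strict';
// Sketch: redirect any non-canonical host or scheme to https://foo.tld in one hop.
exports.handler = (event, context, callback) => {
  const request = event.Records[0].cf.request;
  const host = request.headers['host'][0].value;
  const scheme = (request.headers['cloudfront-forwarded-proto'] || [{ value: 'https' }])[0].value;
  if (host !== 'foo.tld' || scheme !== 'https') {
    return callback(null, {
      status: '301',
      statusDescription: 'Moved Permanently',
      headers: {
        'location': [{ key: 'Location', value: 'https://foo.tld' + request.uri }],
        // Let CloudFront cache the redirect so the trigger doesn't fire on every request.
        'cache-control': [{ key: 'Cache-Control', value: 'max-age=3600' }]
      }
    });
  }
  return callback(null, request); // already canonical; pass through to the origin
};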

Do browsers allow cross-domain requests to be "sent"?

I am a newbie to website security and am currently trying to understand the Same-Origin Policy in some depth.
While there are very good posts on Stack Overflow and elsewhere about the concept of the SOP, I could not find updated information on whether Chrome and other browsers allow cross-domain XHR POST requests to be sent in the first place.
From this 5-year-old post, it appears that Chrome allows the request to pass through to the requested server but does not allow the requester to read the response.
I tested that on my website trying to change user info on my server from a different domain. Details below:
My domain: "www.mysite.com"
Attacker domain: "www.attacker.mysite.com"
According to Same-Origin-Policy those two are considered different Origins.
The user (while logged in to www.mysite.com) opens www.attacker.mysite.com and presses a button that fires a POST request to the 'www.mysite.com' server... The submitted hidden form (without tokens in this case) has all the information required to change the user's info on the 'www.mysite.com' server --> Result: successful CSRF attack: the user info does indeed change.
Now do the same, but with JavaScript submitting the form through jQuery's .post instead of submitting the form directly --> Result: besides Chrome giving the usual response:
No 'Access-Control-Allow-Origin' header is present on the requested resource
I found that no change is made on the server side... It seems that the request does not even leave the browser. The user info does not change at all! While that sounds good, I was expecting the opposite.
According to my understanding and the post linked above, for cross-domain requests only the server response should be blocked by the browser, not the sending of the POST request to the server in the first place.
Also, I do not have any CORS configuration set; no Access-Control-Allow-Origin headers are sent. But even if I had that set, it should apply only to 'reading' the server response, not to actually sending the request... right?
I thought of preflights, where a request is sent to check whether it's allowed on the server, thus blocking the request before its actual data is sent to change the user info. However, according to Access_Control_CORS, those preflights are only sent in specific situations which do not apply to my simple AJAX POST request (a plain form with the default enctype application/x-www-form-urlencoded and no custom headers).
So has Chrome changed its security specs to prevent the POST request to a cross domain from being sent in the first place?
Or am I missing something here in my understanding of the Same-Origin Policy?
Either way, it would be helpful to know if there is a source for updated security measures implemented in different web browsers.
The XMLHttpRequest object's behavior has been revisited over time.
The first AJAX requests were unconstrained.
When the SOP was introduced, XMLHttpRequest was updated to restrict every cross-origin request:
If the origin of url is not same origin with the XMLHttpRequest origin the user agent should raise a SECURITY_ERR exception and terminate these steps.
From XMLHttpRequest Level 1, open method
The idea was that an AJAX request that couldn't read the response was useless and probably malicious, so they were forbidden.
So in general a cross-origin AJAX call would never make it to the server.
This API is now called XMLHttpRequest Level 1.
It turned out that the SOP was in general too strict. Before CORS was developed, Microsoft started to ship (and tried to standardize) a new XMLHttpRequest2 API that would allow only some specific requests, stripped of any cookie and most headers.
The standardization effort failed, and the API was merged back into XMLHttpRequest after the advent of CORS. The behavior of Microsoft's API was mostly retained, but more complex (read: potentially dangerous) requests were allowed upon specific permission from the server (through the use of pre-flights).
A POST request with non-simple headers or Content-Type is considered complex, so it requires a pre-flight.
Pre-flights are done with the OPTIONS method and don't contain any form information; as such, no updates happen on the server.
When a pre-flight fails, the user agent (the browser) terminates the AJAX request, preserving the XMLHttpRequest Level 1 behavior.
So in short: for XMLHttpRequest, the SOP was stronger, denying all cross-origin operations despite the goals stated by the SOP principles. This was possible because, at the time, it didn't break anything.
CORS loosened the policy, allowing "non-harmful" requests by default and allowing negotiation of the others.
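As a hedged illustration of that distinction (the URL is a placeholder; fetch is used here, but the same rules apply to XMLHttpRequest):
// Simple request: form-encoded POST with no custom headers -- sent without a pre-flight.
fetch('https://api.example.com/update', {
  method: 'POST',
  headers: { 'Content-Type': 'application/x-www-form-urlencoded' },
  body: 'name=value'
});
// Complex request: JSON is not a "simple" Content-Type, so the browser sends an
// OPTIONS pre-flight first and only proceeds if the server allows it.
fetch('https://api.example.com/update', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ name: 'value' })
});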
OK... I got it... It's neither a new policy in Chrome nor something missing in my understanding of the SOP...
The session cookies of "www.mysite.com" were set to "HttpOnly", which means, as mentioned here, that they won't be sent along with the AJAX requests, and thus the server won't change the user's details in the jQuery scenario above.
Once I added xhrFields: { withCredentials: true } to my POST request, I was able to change the user's information with a cross-domain XHR POST call, as expected.
Although this confirms the already known fact that the browser actually sends cross-domain POST requests to the server and only blocks reading of the server response, it might still be helpful to those trying to deepen their understanding of the SOP and/or experimenting with CORS.
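For reference, a minimal sketch of the adjusted jQuery call (the URL and payload are placeholders):
// Cross-origin POST that includes the session cookies.
$.ajax({
  url: 'https://www.mysite.com/api/update',
  method: 'POST',
  data: { name: 'value' },
  xhrFields: { withCredentials: true } // send cookies with the cross-origin request
});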

How to invalidate the cache of an arbitrary URL?

According to http://www.w3.org/Protocols/rfc2616/rfc2616-sec13.html#sec13.10 clients must invalidate the cache associated with a URL after a POST, PUT, or DELETE request.
Is it possible to instruct a web browser to invalidate the cache of an arbitrary URL, without making an HTTP request to it?
For example:
PUT /companies/Nintendo creates a new company called "Nintendo"
GET /companies lists all companies
Every time I create a new company, I want to invalidate the cache associated with GET /companies. The browser doesn't do this automatically because the two operate on different URLs.
Is the Cache-Control mechanism inappropriate for this situation? Should I use no-cache along with ETag instead? What is the best-practice for this situation?
I know I can pass no-cache the next time I GET /companies, but that requires the application to keep track of URL invalidation instead of pushing the responsibility to the browser. Meaning, I want to invalidate the URL after step 1, as opposed to having to persist this information and apply it at step 2. Any ideas?
Yes, you can (within the same domain). From this answer (slightly paraphrased):
In response to a PUT or POST request, if the Content-Location header URI is different from the request URI, then the cache for the Content-Location URI is invalidated.
So in your case, include a Content-Location: /companies header in response to your POST request. This will invalidate the browser's cached version of /companies.
Note that this does not work for GET requests.
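A minimal sketch of that response, using Express purely as an illustration (the route follows the example in the question):
const express = require('express');
const app = express();
app.put('/companies/:name', (req, res) => {
  // ... create or update the company here ...
  // Content-Location differs from the request URI (/companies/Nintendo),
  // so the browser invalidates its cached copy of /companies.
  res.set('Content-Location', '/companies');
  res.status(201).end();
});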
No, in HTTP/1.1 you may only invalidate a client's cache for a resource in a response to a request for that resource. It may be in response to a PUT, POST or DELETE rather than a GET (see RFC 7234, section 4.4 for details).
If you have a resource where you need clients to confirm that they have the latest version then no-cache and an entity tag is an ideal solution.
HTTP/2 allows for pushing a cache clear (Nine Things to Expect from HTTP/2, item 4: Cache Pushing).
In the link you have given, "the phrase 'invalidate an entity' means that the cache will either remove all instances of that entity from its storage, or will mark these as 'invalid' and in need of a mandatory revalidation before they can be returned in response to a subsequent request." Now the question is: where are the caches? I believe the cache the article is talking about is the server cache.
I have worked on a project in VC++ where, whenever a model changes, the cache is updated; there is programming logic involved to achieve this. Your article rightly says "There is no way for the HTTP protocol to guarantee that all such cache entries are marked invalid": the HTTP protocol cannot invalidate a cache on its own.
In our project we used a publish-subscribe mechanism. Whenever an object of class A is updated or inserted, it is published to a bus. Controllers register to listen for objects on the bus. Suppose controller A is interested in changes to objects of type A; it will not be called back when an object of type B is changed and published. When an object of type A is indeed changed and published, controller A's listener updates the cache with the latest changes to that object. A subsequent GET /companies request then gets the latest data from the cache. There is a time gap between changing object A and the cache being refreshed with the latest changes; to avoid anything going wrong in this gap, the object is marked dirty before it changes, so a request arriving in between waits for the dirty flag to be cleared.
There is also the browser cache. ETags are used to validate it: an ETag is a checksum of the resource, and the client keeps the old ETag value. If the checksum of the resource has changed, the new resource is sent with HTTP 200; otherwise HTTP 304 (use local copy) is sent.
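A hedged sketch of that validation flow on the server side, again using Express as an illustration (loadCompanies() is a hypothetical data source):
const express = require('express');
const crypto = require('crypto');
const app = express();
app.get('/companies', (req, res) => {
  const body = JSON.stringify(loadCompanies()); // hypothetical data source
  const etag = '"' + crypto.createHash('md5').update(body).digest('hex') + '"';
  if (req.get('If-None-Match') === etag) {
    return res.status(304).end(); // client's cached copy is still valid
  }
  res.set('ETag', etag).type('json').send(body);
});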
[Update]
PUT /companies/Nintendo
GET /companies
are two different resources. Only the cache for /companies/Nintendo is expected to be updated, not the cache for /companies (I am talking about the client-side cache), when the PUT /companies/Nintendo request is executed. If you call GET /companies/Nintendo next time, the response is returned based on the HTTP headers; GET /companies is a brand-new request, as it points to a different resource.
Now the question is: what should the HTTP headers be? That is purely application specific. If it were a stock quote, I would not cache it; if it were a news item, I would cache it for a certain time. Your reference link http://www.w3.org/Protocols/rfc2616/rfc2616-sec13.html has all the details of the cache HTTP headers. The one thing not covered in much detail is ETag usage; an ETag can hold a checksum of the resource. Check http://en.wikipedia.org/wiki/HTTP_ETag and also https://devcenter.heroku.com/articles/increasing-application-performance-with-http-cache-headers
