How to get host of current URL if using CDN? - ruby

I am using various domain aliases through a CDN and I would like to return slightly different results depending on the domain alias. Is there a way to identify the domain of the current URL in Ruby?
For example, the origin URL of my website is non-cdn.herokuapp.com. There is a CDN which is caching that origin URL at 123.cloudfront.net. There are 2 custom domains which are using CNAMEs to point to that CDN URL, cdn-url1.com and cdn-url2.com. When someone visits cdn-url1.com, request.host returns non-cdn.herokuapp.com rather than cdn-url1.com.
I know that I can return the true domain via Javascript, but can I determine it in Ruby?

Amazon Cloudfront is not forwarding the host header to the origin. Here is how to fix it:
Note: This resolution applies to origins other than an Amazon Simple
Storage Service (Amazon S3) bucket. If you're using an Amazon S3
origin, avoid whitelisting the host header. For more information, see
Selecting the Headers to Base Caching On.
Open the Amazon CloudFront console, and then choose your distribution.
Choose the Behaviors view, and then choose the path you are using.
Choose Edit.
For Cache Based on Selected Request Headers, choose Whitelist.
Under Whitelist Headers, choose Host from the column on the left, and then choose Add.
Choose Yes, Edit.
https://aws.amazon.com/premiumsupport/knowledge-center/configure-cloudfront-to-forward-headers/

Related

CloudFront distribution with EC2 origin (no ALB)

I set up CF distribution with EC2 origin for dynamic content.
When I use the CF URL, it shows me the website, and it's secure, and everything looks okay, which means Get request is working, but I click on some links similar save or next then I got a 403 error.
EC2 origin need to set as HTTP not https

Get CloudFront custom domain in the headers of a request

I have a CloudFront distribution abcd1234.cloudfront.net and I've added a custom domain of mysite.com to the distribution.
The CloudFront distribution's origin: aaabbbccc.execute-api.us-east-1.amazonaws.com
When I load the page mysite.com/hello/world, CloudFront is then consuming API Gateway aaabbbccc.execute-api.us-east-1.amazonaws.com/prod/{proxy+}
the API Gateway path endpoint is invoking a Lambda Function that calls a function like getPageContent(customDomainName, pagePath) which should be mysite.com and /hello/world respectively.
However, inside that function, the Host header that eventually makes it into the function's event.headers.Host value is never the custom domain. Instead, the Host header is always aaabbbccc.execute-api.us-east-1.amazonaws.com.
I want headers.Host to equal mysite.com (or another header to show that the request comes from mysite.com, but no matter what I do, the Host value is always just the origin url.
Edit: I tried whitelisting Host and it caused the site to break completely, with the error about not being able to reach the CloudFront distribution.

How to validate that a certain domain is reachable from browser?

Our single page app embeds videos from Youtube for the end-users consumption. Everything works great if the user does have access to the Youtube domain and to the content of that domain's pages.
We however frequently run into users whose access to Youtube is blocked by a web filter box on their network, such as https://us.smoothwall.com/web-filtering/ . The challenge here is that the filter doesn't actually kill the request, it simply returns another page instead with a HTTP status 200. The page usually says something along the lines of "hey, sorry, this content is blocked".
One option is to try to fetch https://www.youtube.com/favicon.ico to prove that the domain is reachable. The issue is that these filters usually involve a custom SSL certificate to allow them to inspect the HTTP content (see: https://us.smoothwall.com/ssl-filtering-white-paper/), so I can't rely TLS catching the content being swapped for me with the incorrect certificate, and I will instead receive a perfectly valid favicon.ico file, except from a different site. There's also the whole CORS issue of issuing an XHR from our domain against youtube.com's domain, which means if I want to get that favicon.ico I have to do it JSONP-style. However even by using a plain old <img> I can't test the contents of the image because of CORS, see Get image data in JavaScript? , so I'm stuck with that approach.
Are there any proven and reliable ways of dealing with this situation and testing browser-level reachability towards a specific domain?
Cheers.
In general, web proxies that want to play nicely typically annotate the HTTP conversation with additional response headers that can be detected.
So one approach to building a man-in-the-middle detector may be to inspect those response headers and compare the results from when behind the MITM, and when not.
Many public websites will display the headers for a arbitrary request; redbot is one.
So perhaps you could ask the party whose content is being modified to visit a url like: youtube favicon via redbot.
Once you gather enough samples, you could heuristically build a detector.
Also, some CDNs (eg, Akamai) will allow customers to visit a URL from remote proxy locations in their network. That might give better coverage, although they are unlikely to be behind a blocking firewall.

Using AWS Route 53 http redirect working, https times out

Using the routing rules as mentioned here: Set up DNS based URL forwarding in Amazon Route53
<RoutingRules>
<RoutingRule>
<Redirect>
<Protocol>https</Protocol>
<HostName>dota2.becomethegamer.com</HostName>
<HttpRedirectCode>301</HttpRedirectCode>
</Redirect>
</RoutingRule>
</RoutingRules>
I am able to see that http://becomethegamer.com properly redirect to https://dota2.becomethegamer.com but https://becomethegamer.com times out.
I thought it was the Protocol piece but realized that's the outbound rather than inbound.
This is in a bucked named becomethegamer.com and in Route 53 becomethegamer.com is an alias with the target as that bucket.
What could be causing https to not redirect?
No, it's this:
The website endpoints do not support https.
http://docs.aws.amazon.com/AmazonS3/latest/dev/WebsiteEndpoints.html
You can't redirect an https request without speaking https, and additionally, you need an SSL certificate that's valid for the hostname.
You can still do exactly what you're trying to do, but you'll need to use CloudFront in front and S3 in the back. Your S3 redirection configuration stays the same, but you'll create a CloudFront distribution, configure your domain name as an alternative domain name there, load your SSL cert into CloudFront, use the bucket-name.s3-website-xx-xxxx-xx.amazonaws.com web site endpoint (from the S3 console) as the Origin server, and point Route 53 to CloudFront instead of S3.
http://docs.aws.amazon.com/gettingstarted/latest/swh/getting-started-create-cfdist.html

Amazon Cloudfront: private content but maximise local browser caching

For JPEG image delivery in my web app, I am considering using Amazon S3 (or Amazon Cloudfront
if it turns out to be the better option) but have two, possibly opposing,
requirements:
The images are private content; I want to use signed URLs with short expiration times.
The images are large; I want them cached long-term by the users' browser.
The approach I'm thinking is:
User requests www.myserver.com/the_image
Logic on my server determines the user is allowed to view the image. If they are allowed...
Redirect the browser (is HTTP 307 best ?) to a signed Cloudfront URL
Signed Cloudfront URL expires in 60 seconds but its response includes "Cache-Control max-age=31536000, private"
The problem I forsee is that the next time the page loads, the browser will be looking for
www.myserver.com/the_image but its cache will be for the signed Cloudfront URL. My server
will return a different signed Cloudfront URL the second time, due to very short
expiration times, so the browser won't know it can use its cache.
Is there a way round this without having my webserver proxy the image from Cloudfront (which obviously negates all the
benefits of using Cloudfront)?
Wondering if there may be something I could do with etag and HTTP 304 but can't quite join the dots...
To summarize, you have private images you'd like to serve through Amazon Cloudfront via signed urls with a very short expiration. However, while access by a particular url may be time limited, it is desirable that the client serve the image from cache on subsequent requests even after the url expiration.
Regardless of how the client arrives at the cloudfront url (directly or via some server redirect), the client cache of the image will only be associated with the particular url that was used to request the image (and not any other url).
For example, suppose your signed url is the following (expiry timestamp shortened for example purposes):
http://[domain].cloudfront.net/image.jpg?Expires=1000&Signature=[Signature]
If you'd like the client to benefit from caching, you have to send it to the same url. You cannot, for example, direct the client to the following url and expect the client to use a cached response from the first url:
http://[domain].cloudfront.net/image.jpg?Expires=5000&Signature=[Signature]
There are currently no cache control mechanisms to get around this, including ETag, Vary, etc. The nature of client caching on the web is that a resource in cache is associated with a url, and the purpose of the other mechanisms is to help the client determine when its cached version of a resource identified by a particular url is still fresh.
You're therefore stuck in a situation where, to benefit from a cached response, you have to send the client to the same url as the first request. There are potential ways to accomplish this (cookies, local storage, server scripting, etc.), and let's suppose that you have implemented one.
You next have to consider that caching is only just a suggestion and even then it isn't a guarantee. If you expect the client to have the image cached and serve it the original url to benefit from that caching, you run the risk of a cache miss. In the case of a cache miss after the url expiry time, the original url is no longer valid. The client is then left unable to display the image (from the cache or from the provided url).
The behavior you're looking for simply cannot be provided by conventional caching when the expiry time is in the url.
Since the desired behavior cannot be achieved, you might consider your next best options, each of which will require giving up on one aspect of your requirement. In the order I would consider them:
If you give up short expiry times, you could use longer expiry times and rotate urls. For example, you might set the url expiry to midnight and then serve that same url for all requests that day. Your client will benefit from caching for the day, which is likely better than none at all. Obvious disadvantage is that your urls are valid longer.
If you give up content delivery, you could serve the images from a server which checks for access with each request. Clients will be able to cache the resource for as long as you want, which may be better than content delivery depending on the frequency of cache hits. A variation of this is to trade Amazon CloudFront for another provider, since there may be other content delivery networks which support this behavior (although I don't know of any). The loss of the content delivery network may be a disadvantage or may not matter much depending on your specific visitors.
If you give up the simplicity of a single static HTTP request, you could use client side scripting to determine the request(s) that should be made. For example, in javascript you could attempt to retrieve the resource using the original url (to benefit from caching), and if it fails (due to a cache miss and lapsed expiry) request a new url to use for the resource. A variation of this is to use some caching mechanism other than the browser cache, such as local storage. The disadvantage here is increased complexity and compromised ability for the browser to prefetch.
Save a list of user+image+expiration time -> cloudfront links. If a user has an non-expired cloudfront link use it for an image and don't generate a new one.
It seems you already solved the issue. You said that your server is issuing a redirect http 307 to the cloudfront URL (signed URL) so the browser caches only the cloudfront URL not your URL(www.myserver.com/the_image). So the scenario is as follows :
Client 1 checks www.myserver.com/the_image -> is redirect to CloudFront URL -> content is cached
The CloudFront url now expires.
Client 1 checks again www.myserver.com/the_image -> is redirected to the same CloudFront URL-> retrieves the content from cache without to fetch again the cloudfront content.
Client 2 checks www.myserver.com/the_image -> is redirected to CloudFront URL which denies its accesss because the signature expired.

Resources