I spent a few hours on an AWS CloudFront problem, hope someone can save me :D
When I load the site homepage, for every asset I get (on each reload):
X-Cache: Miss from cloudfront
When I try to open an asset in another tab, I then get the correct behavior (a miss on the first call, then a hit on each reload). Same with curl requests.
I reload my homepage and inspect the same asset, and I get:
X-Cache: Hit from cloudfront
with an Age similar to the one in the other tab.
Then I reload my homepage and inspect the same element again:
X-Cache: Miss from cloudfront
:'(
And from this point on, when I reload the asset in the other tab, I get a miss...
Any idea?
Thanks,
Jérémy
After many tests on the Apache configuration and the CloudFront configuration, I found that cookies were causing this behavior.
Some cookies used by tracking systems change their values on each call, so CloudFront does not cache the response.
To solve it, create a custom cache behavior for each type of path (backend calls, assets, etc.) and, in each behavior, whitelist only the cookies you really need to forward.
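For example, the relevant slice of a cache behavior with a cookie whitelist might look roughly like this (a sketch of the classic ForwardedValues shape from the distribution config; the path pattern, origin id and cookie name are assumptions):

const assetsBehavior = {
  PathPattern: 'assets/*',
  TargetOriginId: 'my-origin',
  ViewerProtocolPolicy: 'redirect-to-https',
  MinTTL: 0,
  ForwardedValues: {
    QueryString: false,
    Cookies: {
      Forward: 'whitelist', // forward only the listed cookies; everything else is dropped
      WhitelistedNames: { Quantity: 1, Items: ['PHPSESSID'] }
    }
  }
};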
I hope this will help someone.
Thanks @AlexZakharenko and @BruceP for your replies ;)
So I just linked my custom domain with Firebase and it shows as connected:
[image of connected status]
which is great. But now, when I visit the website without /index.html, it redirects me to this page. I want to see this page, which is accessible only when I append /index.html to the website domain. I am new to Firebase. How can I make my domain serve the index.html page without my explicitly typing /index.html?
EDIT: I just noticed that it's working fine on mobile devices and in an incognito tab on my PC. It must be something with the Chrome browser I am logged in with, which is weird :/ Should I change the title? Because I think the fault could be related to the browser. But help me if you can.
So the real issue wasn't the configuration but the browser cache. If you are facing similar issues, try clearing the browser cache or browsing the website on a different device. I spent literally 1-2 hours on such a silly problem. Either way, thank you.
I am using webpack to bundle all of my files. Inside webpack I use chunkhash, Md5Hash and a manifest to produce a unique chunkhash for each of the files the browser downloads. The result looks like this:
styles.3840duiel348fdjh385idfki.css
bundle.488503289dfksdlkor93lfui.js
vendor.sdkkfuuewkf892377rfkjdle.js
image.dkkdiiue9984ujjkfld003kfpp.png
This means the browser can cache them, and if a hash has not changed it will use the cached version. At the same time I can, for example, change only the styles, and only that hash will change, so when I deploy the app the browser will download only the new styles.
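For reference, a webpack setup producing names like that might look roughly like this (a sketch assuming webpack 2/3 and the webpack-md5-hash plugin; the entry names are made up):

const webpack = require('webpack');
const WebpackMd5Hash = require('webpack-md5-hash');

module.exports = {
  entry: { bundle: './src/index.js', vendor: ['backbone'] },
  output: {
    path: __dirname + '/dist',
    filename: '[name].[chunkhash].js' // e.g. bundle.488503289dfksdlkor93lfui.js
  },
  plugins: [
    new WebpackMd5Hash(),
    // splitting out the manifest keeps the webpack runtime from
    // invalidating the vendor hash on every build
    new webpack.optimize.CommonsChunkPlugin({ names: ['vendor', 'manifest'] })
  ]
};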
The problem is that on my server I use this:
Cache-Control: public, max-age=31536000
This represents aggressive caching: the browser will use the cached version for one year unless the URL (i.e., the file name) changes. When I change the styles, for example, the hash changes and the browser should request the new styles from the server. That is according to this article (pattern 1) and a few more that I found: https://jakearchibald.com/2016/caching-best-practices/
My problem is that when I update something, for example the styles, the styles hash changes and I deploy that. But the browser will not request the new styles unless I hit reload while on my page; it will serve the cached files. How can I fix this?
I could use Cache-Control: no-cache, but that is not a solution, because then the browser has to check with the server every time whether it can use the cached version. That is 4 HTTP requests that are not needed every time someone visits the page.
The way I solved this is by adding one more number (Date.now()) to my file names, as below.
filename: `[name].${Date.now().valueOf()}.[chunkhash].js`
This works pretty reliably for the foreseeable future. The only drawback I see with this method: with each release, it forces all the bundles to be refreshed.
However, there are cases where chunkhash itself creates a problem: when the modules do not change but their order changes (and hence the module ids). The module id and order are not part of chunkhash! Please refer to: https://github.com/webpack/webpack/issues/1856
Another alternative is to use named modules (which I believe has some performance impact), and it may also require naming all my modules, which I am not sure about.
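If you go the named-modules route, a minimal sketch (assuming webpack 2+, where the plugin is built in) would be:

const webpack = require('webpack');

module.exports = {
  // ...the rest of the config as before
  plugins: [
    // use module paths instead of numeric ids, so reordering
    // modules no longer changes the chunkhash
    new webpack.NamedModulesPlugin()
  ]
};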
The browser is only instructed to load your assets when your page loads... that is just the way it works. You can poll, or use some kind of push to the browser, to detect backend changes and then force your users to refresh.
This is not a caching issue, not a browser issue... just the fact that as soon as you have info it might already be outdated. You'll have to live with it or create a workaround.
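A minimal polling sketch of such a workaround (the /version.json endpoint and the one-minute interval are assumptions):

// ask the server for its current build version once a minute;
// reload the page when it changes
let knownVersion = null;
setInterval(async () => {
  const res = await fetch('/version.json', { cache: 'no-store' });
  const { version } = await res.json();
  if (knownVersion && version !== knownVersion) location.reload();
  knownVersion = version;
}, 60 * 1000);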
See it as a calendar next to the coffee machine at your company which has an event 'party' this Saturday. You see it and run to your coworkers to tell them the great news. Meanwhile, the person who wrote the event on the calendar realizes he has made a mistake and changes the event to next Saturday. You don't have this new information, so you will give your coworkers wrong information for as long as you do not go for another coffee. The only way to know about the change is if someone notifies you sooner than your next coffee... for example, the person who wrote the event on the calendar sends an email to everyone saying sorry for his mistake and that the party is scheduled for next Saturday.
I did it like this. I use different webpack configs for my client and server bundles. For my client bundle, which has bundle.js, vendor.js, styles.css..., I use chunkhash and Cache-Control: public, max-age=31536000 as described in my question. For my server bundle, which serves the HTML (ejs), I use Cache-Control: no-cache. This works well because the browser contacts the server on every reload, but that is only one HTTP request. If nothing has changed, the browser uses the cached version of all the assets, since the chunkhashes in the HTML didn't change.
If, for example, I update the styles and the chunkhash changes, the browser will see that when it contacts the server on reload, and only the new styles will be downloaded.
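In server terms the split might look roughly like this (a sketch assuming Express; the paths and view name are made up):

const express = require('express');
const app = express();

// hashed assets: cache aggressively, they never change in place
app.use('/assets', express.static('dist', { maxAge: '1y', immutable: true }));

// HTML: always revalidate, so new chunkhashes are picked up on reload
app.get('*', (req, res) => {
  res.set('Cache-Control', 'no-cache');
  res.render('index'); // ejs template that references the hashed bundles
});

app.listen(3000);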
I had this problem where CloudFlare wouldn't cache any of my pages because all of my pages returned session cookies. I fixed it with a method I wrote myself, which removes unnecessary cookies from the response headers. It's based on the same idea used and described here: https://github.com/HaiFangHui/sessionmonster.
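Not the original code, but the idea can be sketched as Express middleware like this (the req.session.userId check is an assumed "logged in" marker):

const express = require('express');
const app = express();
// (assumes some session middleware has populated req.session upstream)

// drop Set-Cookie from responses to anonymous visitors so the CDN may cache them
app.use((req, res, next) => {
  const end = res.end;
  res.end = function (...args) {
    if (!req.session || !req.session.userId) {
      res.removeHeader('Set-Cookie');
    }
    return end.apply(this, args);
  };
  next();
});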
Currently I'm in a situation which is driving me bananas, and I was hoping someone could help me out a little bit with their expertise on this subject.
My problem is that after I log in to my site, CloudFlare keeps serving the page it cached on a previous request... and it will keep doing that until the Edge TTL expires.
Since the official CloudFlare documentation states it will not cache a page if it contains cookies, I was hoping that after a successful login it would serve a live/personalized version of the page. But it seems that is not the case.
Does somebody know if this is normal? Of course I'm interested in a way to circumvent it. I wonder how other sites solve this problem; I assume I wouldn't be the first one to have this issue.
Any advice regarding this subject would be greatly appreciated.
So it seems that a "Bypass Cache" setting is the solution.
It's only available on a paid plan.
More info: https://blog.cloudflare.com/caching-anonymous-page-views/
I have a WordPress site that is doing a few weird things, and I believe it is because it is being cached. I changed the contents of a CSS stylesheet, and the change took around 10 minutes to appear live.
However, I can't find any caching mechanism set up. I've looked through cPanel and can't see anything set up there. The site's domain resolves to the IP that cPanel is showing.
I've looked through the plugins in WordPress and can't see any caching plugins (although if it were a caching plugin, would a stylesheet request even be cached?).
Any tips on how I can see if the page is being cached on the server or by a plugin?
Put a JavaScript bug on the page that crafts a random URL and requests it, then compare the number of page requests to the number of random-URL requests. Bear in mind there are lots of scenarios where a browser can cache a page in the absence of caching information.
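A minimal sketch of such a bug (the path prefix is made up):

// each page view requests a unique, uncacheable URL; counting these hits
// in the server log gives the true (uncached) page-view count
new Image().src = '/cache-probe-' + Math.random().toString(36).slice(2) + '.gif';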
If your website is behind the CloudFlare network or similar, this is normal behavior.
Try running the following command (Windows Command Prompt / Linux terminal):
ping www.yoursite.com
and visit resolved IP address in browser - this may tell you if you are behind caching network.
Take a look at this article: http://www.mobify.com/blog/beginners-guide-to-http-cache-headers/
I have a single-page JavaScript (Backbone) frontend running on S3, and I'd like a couple of deeplinks to be redirected to the same index file. You'd normally do this with mod_rewrite in Apache, but there is no way to do this in S3.
I have tried setting the default error document to be the same as the index document, and
this works on the surface, but if you check the actual response status you'll see the page comes back as a 404. This is obviously not good.
There is another solution; it's ugly, but better than the error-document hack:
It turns out you can create a copy of index.html and simply name it the same as the subdirectory (minus the trailing slash). For example, if I clone index.html, name the copy 'about', and make sure its Content-Type is set to text/html (in the metadata tab), all requests to /about will return the new 'about', which is a copy of index.html.
Obviously this solution is sub-optimal and only works with predefined deeplink targets, but the hassle could be lessened if the step to clone index.html were part of the frontend's build process. Using Backbone-Boilerplate, I could write a grunt task to do just that.
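Such a task might look roughly like this (a sketch; the route names and output directory are assumptions, and each copy would still need its Content-Type set to text/html when uploaded to S3):

// in Gruntfile.js
module.exports = function (grunt) {
  grunt.registerTask('clone-index', function () {
    // one copy of index.html per predefined deeplink target
    ['about', 'contact'].forEach(function (route) {
      grunt.file.copy('dist/index.html', 'dist/' + route);
    });
  });
};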
Other than these two hacky workarounds, I don't see a way of doing this short of resorting to hashbangs...
Any suggestions will be greatly appreciated.
UPDATE:
S3 now (for a while, actually) supports Index Documents, which solves this problem.
Also, if you use Route 53 for your DNS management, you can set up an alias record pointing to your S3 bucket, so you don't need a subdomain + CNAME anymore :)
Unfortunately, as far as I know (and I use S3 websites quite a bit), you're right on the money. The 404 hack is a really bad idea, as you said, so you basically have these options:
Use a regular backend of some kind and not S3
The Content-Type work-around
Hashbangs
Sorry to be the bearer of bad news :)
For me, the fact that you can't really direct the root of the domain to S3 websites was the deal breaker for some of my stuff. mod_rewrite-type scenarios sound like another good example of where it just doesn't work.
Did you try redirecting to a hash? I am not sure whether this S3 feature was available when you asked this question, but I was able to fix the problem using these redirection rules in the static website hosting section of the bucket's properties.
<RoutingRules>
  <RoutingRule>
    <Condition>
      <KeyPrefixEquals>topic/</KeyPrefixEquals>
    </Condition>
    <Redirect>
      <ReplaceKeyPrefixWith>#topic/</ReplaceKeyPrefixWith>
    </Redirect>
  </RoutingRule>
</RoutingRules>
The rest is handled in the Backbone.js application.
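On the app side, a hash-based router along these lines (a sketch; the route and handler names are made up) picks up the rewritten fragment:

// assumes Backbone (and its dependencies) are already loaded
var Router = Backbone.Router.extend({
  routes: {
    'topic/:id': 'showTopic' // matches the #topic/... fragment S3 redirects to
  },
  showTopic: function (id) {
    // render the deeplinked topic view here
  }
});

new Router();
Backbone.history.start(); // hash-based routing, no pushState needed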