Upon a cache miss at a CDN edge server:
the server might redirect the end-user to another CDN server that has the content, or
it might download the requested content from the producer, cache it, and then respond to the end-user.
Now, if this understanding is right, how does the second approach work for very big files (like a movie)? I just do not understand how the edge server can put the end-user on hold for several minutes to download the content from the producer and only then send it to the end-user!
The CDN won't delay the user more than the origin (producer) would.
In the case of streaming a movie, with or without a CDN between you and the origin, you don't download the entire movie and then play it (and neither does the CDN); you download the movie in chunks while playing the previously downloaded chunk (the CDN does the same, caching the chunks as they pass through).
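To make that concrete, here is a minimal Python sketch (with a hypothetical origin host and cache directory) of what an edge server can do on a cache miss: stream the origin response to the end-user chunk by chunk while writing the same chunks into its local cache, so the user is never put on hold for the whole download:

    import os
    import requests  # third-party; pip install requests

    ORIGIN = "https://origin.example.com"   # hypothetical origin (producer) host
    CACHE_DIR = "/var/cache/edge"           # hypothetical local cache directory

    def serve_on_cache_miss(path):
        """Yield the requested file chunk by chunk while caching it.

        The end-user starts receiving data as soon as the first chunk arrives
        from the origin, instead of waiting for the full download.
        """
        cache_path = os.path.join(CACHE_DIR, path.lstrip("/").replace("/", "_"))
        with requests.get(ORIGIN + path, stream=True, timeout=30) as upstream, \
                open(cache_path + ".partial", "wb") as cache_file:
            for chunk in upstream.iter_content(chunk_size=64 * 1024):
                cache_file.write(chunk)   # fill the cache for future requests...
                yield chunk               # ...while forwarding to the current user
        os.replace(cache_path + ".partial", cache_path)  # publish the cached copy

A real CDN also honours Cache-Control headers, range requests and so on, but the streaming-while-caching idea is the same.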
By the way, a CDN can actually accelerate downloads of big files or dynamic content even when it doesn't hold them in cache, by optimizing TCP and using other techniques.
I have a hosted web server with HTTP/2 (medium fast), and additionally I have space on a fast CDN server that only supports HTTP/1.1.
Is it recommended to load some resources from the CDN, or should I use only the web server because of HTTP/2?
Could loading too many resources from the CDN become a bottleneck due to HTTP/1.1?
I'd be grateful for some hints...
You need to test. It really depends on your app, your users and your servers.
Under HTTP/1.1 you are limited to 6 connections to a domain. So hosting content on a separate domain (e.g. static.example.com) or loading from a CDN was a way to increase that limit beyond 6. These separate domains are also often cookie-less, which is good for performance and security. And finally, if loading jQuery from code.jquery.com, you might benefit from the user already having downloaded it for another site and so save that download completely (though with the number of versions of libraries and CDNs out there, the chance of a commonly used library already being in the browser cache is questionable in my opinion).
However, a separate domain requires setting up a separate connection. That means a DNS lookup, a TCP connection and usually an HTTPS handshake too. This all takes time, and especially if you're downloading just one asset (e.g. jQuery), those costs can often eat up any benefits of hosting the asset on a separate site! This is in fact why browsers limit the connections to 6 - there was a diminishing rate of return in increasing it beyond that. I've questioned the value of sharded domains for a while because of this, and people shouldn't just assume that they will be faster.
HTTP/2 aims to remove the need for separate (aka sharded) domains by allowing multiplexing over a single connection, effectively removing the limit of 6 "connections" without the downsides of opening separate ones. It also adds HTTP header compression, reducing the performance cost of sending large cookies back and forth.
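To see the multiplexing in action, here is a small sketch using the third-party httpx library (installed with its HTTP/2 extra); the asset URLs are placeholders. Every request goes over one HTTP/2 connection instead of competing for HTTP/1.1's 6-connections-per-domain limit:

    import httpx  # third-party; pip install "httpx[http2]"

    # Hypothetical assets all served from the same origin.
    ASSETS = [
        "https://www.example.com/css/site.css",
        "https://www.example.com/js/app.js",
        "https://www.example.com/img/logo.png",
    ]

    with httpx.Client(http2=True) as client:
        for url in ASSETS:
            response = client.get(url)
            # Against an HTTP/2 server, every response reports "HTTP/2" here and
            # all of them are multiplexed over a single TCP + TLS connection.
            print(url, response.http_version, response.status_code)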
So in that sense I would recommend just serving everything from your local server. Not everyone will be on HTTP/2 of course, but support is incredibly strong, so most users will be.
However, the other benefit of a CDN is that it is usually globally distributed. So a user on the other side of the world can connect to a nearby CDN server rather than come all the way back to your server. This helps with connection time (as the TCP and HTTPS handshakes happen over shorter distances), and content can also be cached there. Though if the CDN has to refer back to the origin server for a lot of content, there is still a lag (the benefits for the TCP and HTTPS setup remain, though).
So in that sense I would advise using a CDN. However, I would say put all the content through the CDN rather than just some of it as you are suggesting, though you are right that HTTP/1.1 could limit the usefulness of that. That's odd though, as most commercial CDNs support HTTP/2, and you also say you have a "CDN server" (rather than a network of servers - plural), so maybe you mean a static domain rather than a true CDN?
Either way it all comes down to testing because, as stated at the beginning of this answer, it really depends on your app, your users and your servers, and there is no one true, definite answer here.
Hopefully that gives you some idea of the things to consider. If you want to know more, because Stack Overflow really isn't the place for some of this and this answer is already long enough, then I've just written a book which spends large parts discussing all this: https://www.manning.com/books/http2-in-action
I have a network connection where I pay per megabyte, so I'm interested in reducing my bandwidth usage as far as possible while still having a reasonably good browsing experience. I use this wonderful extension (https://bandwidth-hero.com/). This extension runs an image-compression proxy on my Heroku account that accepts image URLs and returns low-quality versions of those images. This reduces bandwidth usage by 30-40% when images are loaded.
To further reduce usage, I typically browse with both JavaScript and images disabled (there are various extensions for doing this in firefox/firefox-esr/google-chrome). This has an added bonus of blocking most ads (since they usually need JavaScript to run).
For daily browsing, the most efficient solution is using a text-mode browser such as elinks/lynx/links2 in a virtual console, running over ssh (with zlib compression) on a VPS. But sometimes JavaScript becomes necessary, as some sites will not render without it. Elinks is the only text-mode browser that even tries to support JavaScript, and even that support is quite rudimentary. When I have to go back to using firefox/chrome, I find my bandwidth usage shooting up. I would like to avoid this.
I find that bandwidth is used partly to get the 'raw' HTML files of the sites I'm browsing, but more often for the associated .js/.css files. These are typically highly compressible. On my local workstation, html+css+javascript files typically compress by a factor of more than 10x with lzma(2) compression.
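For example, a quick check with Python's standard gzip and lzma modules on a local copy of a text asset (the file name here is just a placeholder) shows the kind of ratio I mean:

    import gzip
    import lzma
    from pathlib import Path

    # Hypothetical local copy of a typical text asset (HTML/CSS/JS).
    data = Path("app.js").read_bytes()

    for name, compress in [("gzip", gzip.compress), ("lzma", lzma.compress)]:
        compressed = compress(data)
        print(f"{name}: {len(data)} -> {len(compressed)} bytes "
              f"(ratio {len(data) / len(compressed):.1f}x)")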
It seems to me that one way to drastically reduce bandwidth consumption would be to use the same approach as the bandwidth-hero extension, i.e. run a compression proxy either on a VPS or on my Heroku account, but do so for text content (.html/.js/.css).
Ideally, I would like to run a compression proxy on my local machine. When I open a site (say www.stackoverflow.com), the browser would send its request to this local proxy. The local proxy then sends a request to a back-end running on Heroku/a VPS. The back-end actually fetches all the content and compresses it (lzma/bzip/gzip). The compressed content is sent back to my local proxy, which decompresses it and finally gives it to the browser.
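A very rough sketch of the remote half of that idea, assuming a small Flask app on the Heroku/VPS side (the /fetch route and url parameter are names I just made up), would fetch the requested URL, lzma-compress the body and send it back; the local proxy would then decompress it before handing it to the browser:

    import lzma

    import requests                              # third-party; pip install requests
    from flask import Flask, Response, request   # third-party; pip install flask

    app = Flask(__name__)

    @app.route("/fetch")
    def fetch():
        """Remote side: download the target URL and return an lzma-compressed body."""
        url = request.args["url"]            # e.g. /fetch?url=https://www.stackoverflow.com/
        upstream = requests.get(url, timeout=30)
        body = lzma.compress(upstream.content)
        return Response(body, mimetype="application/octet-stream",
                        headers={"X-Original-Content-Type":
                                 upstream.headers.get("Content-Type", "")})

    # On the local proxy, the matching step would simply be:
    #   raw = requests.get(BACKEND + "/fetch", params={"url": url}).content
    #   original_bytes = lzma.decompress(raw)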
There is something like this mentioned in this answer (https://stackoverflow.com/a/42505732/10690958) for node.js. I am thinking of the same for Python.
From what Google searches show, HTTP clients can "automatically" ask for gzip versions of pages. But does this also apply to the files that are loaded by JavaScript, and to the CSS files? Perhaps what I am thinking about is already implemented by default?
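For instance, the Python requests library already sends an Accept-Encoding header by default and transparently decompresses the response, which is easy to check:

    import requests  # third-party; pip install requests

    resp = requests.get("https://www.stackoverflow.com/")
    print(resp.request.headers.get("Accept-Encoding"))   # "gzip, deflate" by default
    print(resp.headers.get("Content-Encoding"))          # e.g. "gzip" if the server compressed it
    # requests decompresses transparently, so resp.text is the readable HTML.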
Any pointers would be welcome. I was thinking of writing the local proxy in Python, as I am reasonably fluent in it. But I know little about Heroku or the intricacies of HTTP.
Thanks.
Update: I found a possible solution here: https://github.com/barnacs/compy
It does almost exactly what I need (minify + compress with brotli/gzip + transcode jpeg/gif/png). It uses Go instead of Python, but that does not really matter. It also has a Docker image here: https://hub.docker.com/r/andrewgaul/compy/ . Since I'm not very familiar with Heroku, I can't figure out how to use this to run the compression proxy service on my account. The Heroku docs also weren't of much help to me. Any pointers would be welcome.
We're currently optimizing our web project, and our lead told us to push the use of CDNs for external libraries instead of including them in our compile+compress process and shipping them off a cache-enabled nginx setup.
His assumption is that if the user visits example.com, which uses a CDN'ed version of jQuery, that copy of jQuery gets cached. If the user then visits example2.com, which happens to use the same CDN'ed jQuery, the jQuery will be loaded from cache instead of over the network.
So my question is: Do domains actually share their cache?
I argued that even if the browser does share its cache across domains, we are running on the assumption that the previous sites used the exact same file from the exact same CDN. What are the chances of a user having browsed a site using that same CDN'ed file? He said to use the largest CDN to increase the chances.
So the follow-up question would be: If the browser does share cache, is it worth the hassle to optimize based on his assumption?
I have looked up topics about CDNs and I have found nothing about this "shared domain cache" or CDNs being used this way.
Well, your lead is right; this is basic HTTP caching.
All you are doing is indicating to the client where it can find the file.
The client then handles sending a request to the CDN in compliance with their caching rules.
But you shouldn't over-use CDNs for libraries either. Keep in mind that if you need a specific version of a library, especially an older one, you are unlikely to get many cache hits because of version fragmentation.
For widely used and heavy libraries like jQuery, using the latest version is recommended.
If you can take them all from the same CDN, all the better (e.g. Google's), especially as HTTP/2 is coming.
Additionally they save you bandwidth, which can amount to a lot when you have high loads of traffic, and can reduce the load time for users far from your server (Google's is great for this).
I am new to Magento and I have just started setting up my first sites. One of the requirements I am after is to store all image files on a separate server from the one the site is hosted on. I have briefly looked into Amazon CloudFront and the following plugin:
http://www.magentocommerce.com/magento-connect/cloudfront-cdn.html
This works alongside my CloudFront distribution setup, so the images are being accessed from the CDN alongside the JS, CSS etc. when I check the source. My issue is that they still reside on my own server too.
Is there a way to have everything only on the CDN, so that my server disk space can be kept as low as possible, with only the template files on there and no images?
Based on experience, I would really recommend you do not try to completely move media files off your actual server disk. The main role of the CDN should just be to mirror these files whenever they are new or updated.
However, if you really want to do this, I would also sternly warn you not to attempt it with JS and CSS files. The trouble is just not worth it. You'll see why later.
So we're left with media, mostly image files. These files are usually very large, which is the reasoning behind moving them off the server disk.
My strategy, and what I did before, was to use an S3 bucket behind the CloudFront CDN. I moved everything from the media directory to S3, configured CloudFront to pull from S3, and then CNAME'd the CloudFront distribution as media.mydomain.com. Obviously, I then set my media base URLs (System > Configuration > General > Web) to http://media.mydomain.com/media/ and https://media.mydomain.com/media/.
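For reference, the one-off migration of the media directory to S3 can be scripted with the boto3 library; this is just a sketch with a placeholder bucket name and local path, and CloudFront is then pointed at the bucket and CNAME'd as media.mydomain.com:

    import os

    import boto3  # third-party; pip install boto3

    MEDIA_DIR = "/var/www/magento/media"   # hypothetical Magento media directory
    BUCKET = "mydomain-media"              # hypothetical S3 bucket behind CloudFront

    s3 = boto3.client("s3")

    for root, _dirs, files in os.walk(MEDIA_DIR):
        for name in files:
            local_path = os.path.join(root, name)
            # Keep the media/ prefix so URLs look like https://media.mydomain.com/media/...
            key = "media/" + os.path.relpath(local_path, MEDIA_DIR)
            s3.upload_file(local_path, BUCKET, key)
            print("uploaded", key)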
It worked perfectly. No CORS issues at all, because I did not touch the CSS/JS base URL paths. For those files, I just relied on the free Cloudflare CDN (yeah, yeah, I know).
The next thing I knew, I saw a defect with this setup: uploads do not work at all. WYSIWYG uploads do not go to the S3 bucket immediately. There was a workaround using s3fuse, but that immediately became a problem of its own as it had bad memory leaks too.
What ultimately worked is that we just paid for the additional disk space (we were using Amazon AWS), wrapped the whole domain in the Cloudflare CDN, and when we needed SSL we upgraded to Pro.
Simple, it works, and it's headache-free.
NB: I'm not connected with Cloudflare whatsoever, I'm just really really happy with their service.
I'm using an Azure CDN endpoint on a hosted service (meaning, not a Blob Storage CDN endpoint).
The service lazily renders images, and once they are rendered they are practically static (I can safely use Cache-Control: public, max-age=31536000).
In the naive implementation, there will be up to** X misses (X times the service will render an image), where X is the number of CDN nodes around the world.
There are two workarounds, as I see it:
The lazily created images are stored in Blob Storage and later pulled from there.
Implement a cache in the Cloud Service.
Is there a way to propagate files to all nodes? Is there a better solution than having two caching layers (Cloud Service cache / Blob Storage + CDN)?
** "Up to", depending on the geographical location of web requests. In my case, all around the world.
There is currently not a way for you to push something to one of the remote CDN nodes. This is a feature that many folks have asked Microsoft for in the CDN product.
Both workarounds would work. The first one has the benefit of not having to re-render the images for the other CDN nodes at all, and it reduces the load on the server since it wouldn't be serving them up again after rendering them the first time. However, if the request has to hit the server anyway before redirecting to a version already in Blob Storage, you could just as easily return the cached image directly. I think it depends on how many images you are talking about. If you have a LOT of them, I'd lean more toward the first option.
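As a rough sketch of that first option, using the azure-storage-blob v12 SDK (the connection string, container name and render_image helper are placeholders), the service would render an image only if it isn't already in Blob Storage, store it there with a long Cache-Control, and let the CDN or a redirect serve it from there:

    from azure.storage.blob import BlobServiceClient, ContentSettings  # pip install azure-storage-blob

    CONNECTION_STRING = "..."        # placeholder: storage account connection string
    CONTAINER = "rendered-images"    # placeholder container name

    service = BlobServiceClient.from_connection_string(CONNECTION_STRING)

    def render_image(name: str) -> bytes:
        """Placeholder: the service's actual lazy-rendering logic goes here."""
        raise NotImplementedError

    def get_or_render(image_name: str) -> str:
        """Return the Blob Storage URL for a rendered image, rendering it only once."""
        blob = service.get_blob_client(CONTAINER, image_name)
        if not blob.exists():
            png_bytes = render_image(image_name)
            blob.upload_blob(
                png_bytes,
                content_settings=ContentSettings(
                    content_type="image/png",
                    cache_control="public, max-age=31536000",
                ),
            )
        return blob.url   # the web role can redirect here, or the CDN can pull from it

That way, no matter which CDN node misses, the image is only ever rendered once.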