How to force CloudFront cache in all locations - caching

I know that CloudFront caches data in an edge location after the first "miss", but is there a way to avoid that first miss by forcefully caching my content on all edge servers?
Even if there is no official AWS solution, any nice work-around would do. My current strategy is to browse the content through a VPN, but that isn't very practical.
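One unofficial work-around, since CloudFront offers no "pre-warm" API, is to request each object yourself through proxies near the edge locations you care about, so each location's first miss happens before real users arrive. A minimal sketch; the proxy endpoints and hostnames are placeholders, not real services:

```python
# Hypothetical pre-warming script: fetch each object once through a proxy in
# each region so that region's edge cache is populated. Proxy endpoints below
# are assumptions -- substitute proxies you actually control.
import urllib.request

REGIONAL_PROXIES = {  # hypothetical proxy endpoints, one per region
    "us-east": "http://proxy-us-east.example.com:8080",
    "eu-west": "http://proxy-eu-west.example.com:8080",
    "ap-south": "http://proxy-ap-south.example.com:8080",
}

def build_warmup_plan(base_url, paths, proxies=REGIONAL_PROXIES):
    """Return (region, proxy, url) tuples: one fetch per path per region."""
    return [(region, proxy, base_url + path)
            for region, proxy in sorted(proxies.items())
            for path in paths]

def warm(plan):
    for region, proxy, url in plan:
        opener = urllib.request.build_opener(
            urllib.request.ProxyHandler({"http": proxy, "https": proxy}))
        try:
            opener.open(url, timeout=10).read()  # populate that region's edge
        except OSError as exc:
            print(f"{region}: failed to warm {url}: {exc}")
```

Note this only warms whichever edge location each proxy happens to be routed to, so coverage is best-effort, much like the VPN approach.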

Related

GCP CDN cache-miss granularity

I'm trying to make an accurate estimate of CDN usage on Google Cloud Platform but am not sure about the fill costs.
Fill costs are incurred on a cache miss and the data is gotten from origin or another cache. What's not specifically mentioned is how granular a "cache" miss is. That is - is it a cache miss for the region? zone? POP? node?
With an international distribution this could make a huge difference in the estimate.
According to the official documentation, cache fill charges vary based on source and destination. Source is the region of the origin server or, in the case of cache-to-cache cache fill, the region of the source cache. Destination is a geographic area determined by client IP address.
I asked Google support directly on this one and got back that cache fills occur in each "cache site." Or as they put it:
Cache fill is counted for each caching sites since cache fill occured between one cache location to another cache location.
The updated list of cache sites/locations is in their documentation.
At the time of writing that means a hypothetical maximum of 81 cache fills for a given result (not counting entries expiring or being evicted from the cache and re-filling, etc.), presuming your content is requested from each of these locations, as a cache is only filled when requested.
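The worst case above reduces to simple arithmetic: every cache site fills once. A back-of-envelope sketch, where the site count is the figure quoted above and the per-GiB fill price is an assumed placeholder (check the current GCP pricing page):

```python
# Worst-case cache-fill cost estimate: the object is pulled into every
# cache site exactly once. Both constants are assumptions for illustration.
NUM_CACHE_SITES = 81        # from the docs at time of writing
FILL_PRICE_PER_GIB = 0.04   # assumed blended $/GiB cache-fill rate

def max_fill_cost_gib(object_size_gib, sites=NUM_CACHE_SITES,
                      price=FILL_PRICE_PER_GIB):
    """Upper bound: one fill per cache site, no evictions or re-fills."""
    return object_size_gib * sites * price

# A 1 GiB asset requested from all 81 sites:
worst_case = max_fill_cost_gib(1.0)
```

Real spend will usually be far lower, since only sites that actually receive requests incur a fill.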

System Design: Global Caching and consistency

Let's take the example of Twitter. There is a huge cache which gets updated frequently. For example: if person Foo tweets and they have followers all across the globe, ideally all the caches across all PoPs need to be updated, i.e. they should remain in sync.
How does replication across datacenters (PoPs) work for real-time caches?
What tools/technologies are preferred ?
What are potential issues here in this system design ?
I am not sure there is a right/wrong answer to this, but here are my two pennies' worth.
I would tackle the problem from a slightly different angle: when a user posts something, that something goes in a distributed storage (not necessarily a cache) that is already redundant across multiple geographies. I would also presume that, in the interest of performance, these nodes are eventually consistent.
Now the caching. I would not design a system that takes care of synchronising all the caches each time someone does something. I would rather implement caching at the service level. Imagine a small service residing in a geographically distributed cluster. Each time a user tries to fetch data, the service checks its local cache - if it is a miss, it reads the tweets from the storage and puts a portion of them in a cache (subject to eviction policies). All subsequent accesses, if any, would be cached at a local level.
In terms of design precautions:
Carefully consider the DC / AZ topology in order to ensure sufficient bandwidth and low latency
Cache at the local level in order to avoid useless network trips
Cache updates don't happen from the centre to the periphery; cache is created when a cache miss happens
I am stating the obvious here, but implement the right eviction policies in order to keep only the right objects in the cache
The only message that should go from the centre to the periphery is a cache flush broadcast (tell all the nodes to get rid of their cache)
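The design above can be sketched as a small read-through cache: each PoP-local service checks its own cache, fills it lazily from storage on a miss, and the only centre-to-periphery message is a flush. A minimal sketch; the plain dict here stands in for the distributed store, and all names are illustrative:

```python
# Service-level read-through cache with LRU eviction and a flush hook.
from collections import OrderedDict

class LocalReadThroughCache:
    def __init__(self, storage, capacity=1000):
        self.storage = storage      # stands in for the distributed store
        self.capacity = capacity
        self.cache = OrderedDict()  # insertion order doubles as LRU order

    def get(self, key):
        if key in self.cache:              # local hit: no network trip
            self.cache.move_to_end(key)
            return self.cache[key]
        value = self.storage[key]          # miss: read through to storage
        self.cache[key] = value
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict least recently used
        return value

    def flush(self):
        # The broadcast from the centre: drop everything, refill on demand.
        self.cache.clear()
```

Note the deliberate absence of any "push update to all caches" path; staleness is bounded by eviction and flushes, matching the eventual-consistency assumption.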
I am certainly missing many other things here, but hopefully this is good food for thought.

Not making a network call in case of Hazelcast cache miss

I have enabled Hazelcast near cache for one of my applications. In case of a cache miss in the near cache, Hazelcast makes a network call to look up the data.
The behaviour I am looking for is, in case of cache miss from "near cache" no network call should be made.
Any idea how to achieve that with Hazelcast?
I don't think this is possible in Hazelcast; at least I don't know of any way. What is the reason you want to prevent it from calling down to the cluster? Near Cache is a speed optimization (at the tradeoff of consistency), not a full-blown local cache. Maybe looking at the Continuous Query Cache would solve your issue, but I don't have enough information about your use case to understand your need :-)
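If the fall-through really must be avoided, one work-around outside Hazelcast itself is to keep your own local mirror in front of the client and simply return a miss instead of delegating to the cluster. This is a conceptual sketch, not a Hazelcast API; `cluster_map` is any object exposing `get()`, such as a distributed-map client, and the class name is made up:

```python
# Hypothetical local-only view: get() never touches the network; the cluster
# is consulted only during an explicit preload step you control.
class LocalOnlyView:
    def __init__(self, cluster_map):
        self.cluster_map = cluster_map
        self.local = {}

    def get(self, key):
        # Miss returns None instead of making a network call.
        return self.local.get(key)

    def preload(self, keys):
        # Populate the local view deliberately, e.g. at application startup.
        for key in keys:
            self.local[key] = self.cluster_map.get(key)
```

The consistency tradeoff is severe: entries updated in the cluster stay stale locally until the next preload, which is exactly why Near Cache falls through by design.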

What is the best way to handle a lot of images from a shared hosting plan?

I have a shared hosting plan and am designing a single page site which will include a slideshow. The browser typically limits the number of simultaneous requests to a single domain. I don't expect a lot of traffic, but I would like the traffic I do receive to have fast load times. I may be able to add unlimited subdomains, but does that really affect the speed for the customer considering they are probably the only one polling my server and all subdomains point to the same processor? I have already created two versions of every image, one for the slideshow, and one for larger format via AJAX request, but the lag times are still a little long for my taste. Any suggestions?
Before you contrive a bunch of subdomains to maximize parallel connections, you should profile your page load behavior so you know where most of the time is being spent. There might be easier and more rewarding optimizations to make first.
There are several tools that can help with this, use all of them:
https://developers.google.com/speed/pagespeed/
http://developer.yahoo.com/yslow/
http://www.webpagetest.org/
Some important factors to look at are cache optimization and image compression.
If you've done all those things, and you are sure that you want to use multiple (sub)domains, then I would recommend using a content delivery network (CDN) instead of hosting the static files (images) on the same shared server. You might consider Amazon's CloudFront service. It's super easy to set up, and reasonably priced.
Lastly, don't get carried away with too many (sub)domains, because each host name will require a separate DNS lookup; find a balance.
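If you do end up sharding across subdomains, map each image to a shard deterministically so the same URL is emitted on every page load and the browser cache stays effective. A small sketch; the subdomain names are placeholders:

```python
# Deterministic domain sharding: hash the path so a given image always
# resolves to the same subdomain. Hostnames are hypothetical.
import hashlib

SHARDS = ["img1.example.com", "img2.example.com"]

def shard_url(path, shards=SHARDS):
    """Always map the same path to the same subdomain."""
    digest = hashlib.md5(path.encode("utf-8")).digest()
    return f"https://{shards[digest[0] % len(shards)]}{path}"
```

Hashing rather than round-robin matters: a random assignment would give the same image a different hostname on each render, defeating the browser cache.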

What's the best way to cache binary data?

I pre-generate 20+ million gzipped html pages, store them on disk, and serve them with a web server. Now I need this data to be accessible by multiple web servers. Rsync-ing the files takes too long. NFS seems like it may take too long.
I considered using a key/value store like Redis, but Redis only stores strings as values, and I suspect it will choke on gzipped files.
My current thinking is to use a simple MySQL/Postgres table with a string key and a binary value. Before I implement this solution, I wanted to see if anyone else had experience in this area and could offer advice.
I've heard good things about Redis, that's one.
I've also heard extremely positive things about memcached. It is suitable for binary data as well.
Take Facebook for example: These guys use memcached, also for the images!
As you know, images are in binary.
So, get memcached, get a machine to utilize it, a binder for PHP or whatever you use for your sites, and off you go! Good luck!
First off, why cache the gzips? Network latency and transmission time are orders of magnitude higher than the CPU time spent compressing the file, so doing it on the fly may be the simplest solution.
However, if you definitely have a need, then I'm not sure a central database is going to be any quicker than a file share (of course you should be measuring, not guessing, these things!). A simple approach could be to host the original files on an NFS share and let each web server gzip and cache them locally on demand. memcached (as Poni suggests) is also a good alternative, but adds a layer of complexity.
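The "gzip and cache locally on demand" idea amounts to a few lines per server: keep a directory of compressed copies and fill it lazily from the shared originals. A sketch under that assumption; the paths are illustrative and the shared source could be an NFS mount:

```python
# Lazy per-server gzip cache: compress each original once on first access,
# then serve the cached .gz file thereafter.
import gzip
import hashlib
import os

def cached_gzip(source_path, cache_dir):
    """Return the path to a gzipped copy, compressing on first access."""
    key = hashlib.sha1(source_path.encode("utf-8")).hexdigest()
    gz_path = os.path.join(cache_dir, key + ".gz")
    if not os.path.exists(gz_path):  # local miss: compress exactly once
        with open(source_path, "rb") as src, gzip.open(gz_path, "wb") as dst:
            dst.write(src.read())
    return gz_path
```

Each web server builds its own cache independently, so there is nothing to synchronise; stale entries can be handled by deleting the cache directory and letting it refill.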