Caching images in Memcached

The user profile images are stored on a separate file server, and I am thinking of caching them in memcached. The memcached server is local to the app, and each image is less than 1MB.
But I saw over here that using memcached for images is a bad idea. Is it really? I am not convinced.
Any best practices or suggestions? I am using the SpyMemcached Java client.

Linux automatically caches files that are read from disk. Caching proxies like Squid are also good at caching images.
So... there certainly are better tools for the job. On the other hand, nginx recently added memcached support. Without context, it's really hard to judge that recommendation.
They might mean "Don't serve images from memcached via a PHP script", in which case they're absolutely correct -- PHP adds a lot of overhead. But I don't see how using nginx's memcached support to serve images out of memcached would be a bad thing.
Edit: It appears that Facebook may have cached profile images in memcached at one point.
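For what it's worth, the pattern is simple regardless of client. Here is a minimal sketch in Python using pymemcache (the question uses the SpyMemcached Java client, but the idea is identical); the file-server fetch below is a hypothetical placeholder, not a real API:

    # Minimal sketch: cache raw profile-image bytes in a local memcached,
    # falling back to the file server on a miss. fetch_from_fileserver is
    # a hypothetical stand-in for the real file-server call.
    from pymemcache.client.base import Client

    cache = Client(("127.0.0.1", 11211))  # memcached is local to the app

    def fetch_from_fileserver(user_id: int) -> bytes:
        # Placeholder for the real HTTP/NFS call to the file server.
        return b"\x89PNG\r\n\x1a\n..."

    def profile_image(user_id: int) -> bytes:
        # If you later serve via nginx's memcached support, the key just
        # needs to match whatever nginx looks up (commonly the request URI).
        key = f"profile-img:{user_id}"
        img = cache.get(key)
        if img is None:
            img = fetch_from_fileserver(user_id)
            # Each image is under 1MB, so it fits memcached's default item size limit.
            cache.set(key, img, expire=3600)
        return img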

Related

Implementing "extreme" bandwidth saving for web browsing with a compression proxy

I have a network connection where I pay per megabyte, so I'm interested in reducing my bandwidth usage as far as possible while still having a reasonably good browsing experience. I use this wonderful extension (https://bandwidth-hero.com/). This extension runs an image-compression proxy on my Heroku account that accepts image URLs and returns a low-quality version of those images. This reduces bandwidth usage by 30-40% when images are loaded.
To further reduce usage, I typically browse with both JavaScript and images disabled (there are various extensions for doing this in firefox/firefox-esr/google-chrome). This has an added bonus of blocking most ads (since they usually need JavaScript to run).
For daily browsing, the most efficient solution is using a text-mode browser in a virtual console such as elinks/lynx/links2 running over ssh (with zlib compression) on a VPS server. But sometimes using JavaScript becomes necessary, as sites will not render without it. Elinks is the only text-mode browser that even tries to support JavaScript, and even that support is quite rudimentary. When I have to go back to using firefox/chrome, I find my bandwidth usage shooting up. I would like to avoid this.
I find that bandwidth is used partially to get the 'raw' html files of the sites I'm browsing, but more often for the associated .js/.css files. These are typically highly compressible. On my local workstation, html+css+javascript files typically compress by a factor of more than 10x when using lzma(2) compression.
It seems to me that one way to drastically reduce bandwidth consumption would be to use the same approach as the bandwidth-hero extension, i.e. run a compression proxy either on a VPS or on my Heroku account, but do so for text content (.html/.js/.css).
Ideally, I would like to run a compression proxy on my local machine. When I open a site (say www.stackoverflow.com), the browser should send a request to this local proxy. This local proxy then sends a request to a back-end running on heroku/vps. The heroku/vps back-end actually fetches all the content, and compresses it (lzma/bzip/gzip). The compressed content is sent back to my local proxy. The local proxy decompresses the content and finally gives it to the browser.
There is something like this mentioned in this answer (https://stackoverflow.com/a/42505732/10690958) for node.js. I am thinking of the same for Python.
From what Google searches show, HTTP can "automatically" ask for gzip versions of pages. But does this also apply to the associated files that are loaded by JavaScript, and to the CSS files? Perhaps what I am thinking about is already implemented by default?
Any pointers would be welcome. I was thinking of writing a local proxy in Python, as I am reasonably fluent in it. But I know little about Heroku or the intricacies of HTTP.
Thanks.
Update: I found a possible solution here: https://github.com/barnacs/compy, which does almost exactly what I need (minify + compress with brotli/gzip + transcode jpeg/gif/png). It uses Go instead of Python, but that does not really matter. It also has a Docker image here: https://hub.docker.com/r/andrewgaul/compy/. Since I'm not very familiar with Heroku, I can't figure out how to use this to run the compression proxy service on my account. The Heroku docs also weren't of much help to me. Any pointers would be welcome.
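To make the intended flow concrete, here is a rough sketch (Python, standard library only) of the local half of such a proxy. The backend URL and the X-Original-Content-Type header are my own assumptions, not an existing service: the backend is presumed to fetch the target URL, lzma-compress the body, and send it back.

    # Sketch of the local decompressing proxy: the browser is pointed at
    # 127.0.0.1:8888, and every request is forwarded to a hypothetical
    # compressing backend running on Heroku or a VPS.
    import lzma
    import urllib.parse
    import urllib.request
    from http.server import BaseHTTPRequestHandler, HTTPServer

    BACKEND = "https://my-compression-backend.example.com/fetch"  # hypothetical

    class LocalProxy(BaseHTTPRequestHandler):
        def do_GET(self):
            # With the browser configured to use this proxy, self.path is the
            # absolute URL of the page or asset being requested.
            target = urllib.parse.quote(self.path, safe="")
            with urllib.request.urlopen(f"{BACKEND}?url={target}") as resp:
                compressed = resp.read()
                ctype = resp.headers.get("X-Original-Content-Type",
                                         "application/octet-stream")
            body = lzma.decompress(compressed)  # backend is assumed to use lzma
            self.send_response(200)
            self.send_header("Content-Type", ctype)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    if __name__ == "__main__":
        HTTPServer(("127.0.0.1", 8888), LocalProxy).serve_forever()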

Do websites share cached files?

We're currently optimizing our web project, and our lead told us to push the use of CDNs for external libraries instead of including them in a compile+compress process and serving them from a cache-enabled nginx setup.
His assumption is that if the user visits example.com, which uses a CDN'ed version of jQuery, the jQuery gets cached at that point. If the user then visits example2.com, which happens to use the same CDN'ed jQuery, the jQuery will be loaded from cache instead of over the network.
So my question is: Do domains actually share their cache?
I argued that even if the browser does share cache across domains, the problem is that we are running on the assumption that a previously visited site used the exact same file from the exact same CDN. What are the chances that a user has already browsed a site using the same CDN'ed file? He said to use the largest CDN to increase the chances.
So the follow-up question would be: If the browser does share cache, is it worth the hassle to optimize based on his assumption?
I have looked up topics about CDNs and I have found nothing about this "shared domain cache" or CDNs being used this way.
Well, your lead is right: this is basic HTTP.
All you are doing is indicating to the client where it can find the file.
The client then handles sending a request to the CDN in compliance with their caching rules.
But you shouldn't over-use CDNs for libraries either. Keep in mind that if you need a specific version of a library, especially an older one, you're unlikely to get many cache hits because of version fragmentation.
For widely used, heavy libraries like jQuery, using the latest version is recommended.
If you can take them all from the same CDN (e.g. Google's), all the better, especially with HTTP/2 coming.
Additionally, CDNs save you bandwidth, which can add up under heavy traffic, and they can reduce load times for users far from your server (Google's is great for this).
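If you want to see the caching rules the client follows, you can just inspect the headers the CDN sends back; a quick check in Python (the URL is simply an example of a Google-hosted jQuery build):

    # Print the caching headers a CDN returns for a shared jQuery build;
    # the browser decides whether and how long to cache based on these.
    import urllib.request

    url = "https://ajax.googleapis.com/ajax/libs/jquery/3.6.0/jquery.min.js"
    with urllib.request.urlopen(url) as resp:
        print(resp.headers.get("Cache-Control"))
        print(resp.headers.get("Expires"))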

What's a good alternative to page caching on Heroku?

I understand page caching isn't a good option on Heroku since each dyno has an ephemeral filesystem (so dynos wouldn't share cached files, and the cache would get wiped out on every restart).
So I'm wondering what the best alternative is. I have a large number of potential files that could get generated in a traditional page-caching scenario (say 10GB-100GB), so redis/memcached don't seem like good options here. Redis can write out to disk, but my understanding is that once you exceed its memory capacity, it's not the right solution for reading off disk.
Has anyone found a good solution here? I'm thinking maybe MongoStore. (And some way to run this in conjunction with redis since I'm using redis for some other scenarios.) Thanks!
If your site is 100% static content and never going to be dynamic, S3 may be a good option. You can then create a CNAME to the S3 domain, which also allows you to leverage CloudFront should you need it. Otherwise, that 100GB would have to go into the database, which your application would then pull from.
Heroku's cedar stack allows for custom buildpacks. This one vendors nginx. This would be good if you envision transitioning to a more dynamic site.
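As a rough illustration of the S3 route, here is a minimal sketch with boto3. The bucket name is hypothetical; the idea is simply to write rendered pages to S3 instead of the dyno's ephemeral filesystem:

    # Store a rendered page in S3 so it survives dyno restarts and can be
    # fronted by CloudFront. Assumes AWS credentials are configured and a
    # bucket named "my-page-cache" (hypothetical) exists.
    import boto3

    s3 = boto3.client("s3")

    def cache_page(path: str, html: str) -> None:
        s3.put_object(
            Bucket="my-page-cache",                 # hypothetical bucket name
            Key=path.lstrip("/") or "index.html",
            Body=html.encode("utf-8"),
            ContentType="text/html; charset=utf-8",
            CacheControl="public, max-age=300",     # let CloudFront/browsers cache it
        )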

Is memcache(d) necessary when using Cloudflare/Incapsula

If you need caching on your website to reduce database load, do you have to do it with memcache or memcached (in PHP, for example), or can you achieve this by using professional services like CloudFlare, Incapsula or others that do some caching for you?
Services like Cloudflare cache your HTML and/or assets like images and CSS files in a CDN, so that your entire server is hit less often. This is great for semi-static sites but may not be the best fit for highly dynamic sites.
Local caches like memcached just store any data in a way that's fast to access. You can use that to cache database queries and lower your database activity, but you can also use it to store pre-computed data that would be expensive to re-create all the time or whatever else you may want to store non-permanently in a fast-to-access way.
Both solutions solve different problems. You may use both together, or either, or neither. It really depends on where exactly your bottleneck is and which solution fits your problem better.
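To make the "cache database queries" case concrete, here is a minimal read-through sketch with pymemcache; load_user_from_db is a hypothetical stand-in for the expensive query, not a real function:

    # Read-through cache for an expensive database query, with a short TTL.
    import json
    from pymemcache.client.base import Client

    cache = Client(("127.0.0.1", 11211))

    def load_user_from_db(user_id: int) -> dict:
        return {"id": user_id, "name": "example"}     # placeholder query

    def get_user(user_id: int) -> dict:
        key = f"user:{user_id}"
        cached = cache.get(key)
        if cached is not None:
            return json.loads(cached)                 # hit: database not touched
        user = load_user_from_db(user_id)
        cache.set(key, json.dumps(user), expire=300)  # keep for five minutes
        return user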
I'm the CEO of CloudFlare and I'd say: more (intelligent) caching is almost always a good thing. While we can significantly decrease the load coming to your web server, to get the best performance it's still extremely important to optimize your web application and its interaction with your database. To that end, memcache and other fast caching layers can play an important role, and I'd never discourage them.
PS - we work great with dynamic sites. 95%+ of our sites are highly dynamic web applications.

Which schemaless datastores provide good performance?

I've recently written a web app that uses couchdb. I like couchdb and it suited the app - which has a lot of dynamic behaviour and simply pulls JSON directly from couchdb. Being able to upload images via a browser is nice and it's a snap to do tweaks to document data. The replication also has made deployment a breeze as the app is a couchapp, and all that's required to deploy is a replicate to the production server.
However, for a new app I'm thinking of (think blog-type thingy), I want good performance, and that's one area where I think couchdb is not strong. The app will be predominantly read-oriented (I'm estimating 90% reads to 10% writes).
Which datastores provide the best performance in a single server scenario? I'd be very interested to hear people's experiences in this...
I think MongoDB is beginning to look like the front runner, performance-wise, for schemaless data stores.
We're currently in the process of evaluating it for storing binary objects that can range from 10KB to 50MB, and I've been very impressed with its performance, even on modest hardware.
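For binaries in that size range, the usual route in MongoDB is GridFS (individual documents are capped at 16MB). A minimal pymongo sketch, with the connection URL and database name as assumptions:

    # Store and read back a binary blob via GridFS. The database name
    # "blobstore" and the local connection URL are assumptions.
    import gridfs
    from pymongo import MongoClient

    db = MongoClient("mongodb://localhost:27017")["blobstore"]
    fs = gridfs.GridFS(db)

    file_id = fs.put(b"\x00" * 1024 * 1024, filename="sample.bin")  # ~1MB blob
    blob = fs.get(file_id).read()                                   # read it back
    assert len(blob) == 1024 * 1024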
If it is primarily read performance you are worried about, why not just put a varnish proxy in front of couchdb? I use a couple of custom configurations in varnish to tell it not to actually query couchdb for cached objects even though couchdb specifies must-validate, and then have a script doing an active HTTP GET on _changes that uses the data from _changes to explicitly purge changed entries from varnish.
As a plus, varnish lets you do URL rewriting, which I need. Most of the other solutions for that involve running something like apache or nginx just to rewrite URLs for couchdb.
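A rough sketch of that purge script in Python (standard library only). The CouchDB and varnish addresses, and the assumption that a document's cache URL is just its id, are mine; varnish also needs a VCL rule that honours the PURGE method:

    # Follow CouchDB's continuous _changes feed and PURGE each changed
    # document from varnish. Addresses and the id-to-URL mapping are
    # assumptions.
    import json
    import urllib.request

    COUCH = "http://127.0.0.1:5984/mydb"    # hypothetical CouchDB database
    VARNISH = "http://127.0.0.1:6081"       # hypothetical varnish front end

    feed = urllib.request.urlopen(
        f"{COUCH}/_changes?feed=continuous&heartbeat=30000")
    for raw in feed:
        line = raw.strip()
        if not line:
            continue                        # heartbeat newline, ignore
        doc_id = json.loads(line).get("id")
        if not doc_id:
            continue
        req = urllib.request.Request(f"{VARNISH}/{doc_id}", method="PURGE")
        try:
            urllib.request.urlopen(req)
        except Exception:
            pass                            # object may simply not be cached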
