How can I improve the performance of this architecture?

I'm running a website that is CPU-heavy because it generates a lot of image thumbnails.
This is how I currently do things:
User uploads an image to the server.
The server keeps a local copy and stores the image on Amazon S3.
When a thumbnail is requested, the server generates it from the local copy, stores it on S3, and then gives the S3 URL to the client.
Subsequent requests are optimized as follows: the server caches the S3 URL in memcached so it won't do the work again; the server never generates a thumbnail again if the file already exists; and the server uses mid-sized thumbnails to generate small ones, so as not to work with large files when that isn't necessary.
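The last optimization, deriving small thumbnails from mid-sized ones, amounts to choosing the cheapest source image that is still large enough. A minimal Python sketch, assuming renditions are tracked as (width, height) pairs; `pick_source` is a hypothetical helper, not part of PIL or Django:

```python
from typing import Optional, Sequence, Tuple

Size = Tuple[int, int]  # (width, height) in pixels


def pick_source(target: Size, available: Sequence[Size]) -> Optional[Size]:
    """Return the smallest available rendition that still covers `target`.

    `available` holds the original upload plus any thumbnails already
    generated. Returns None if nothing is large enough.
    """
    tw, th = target
    candidates = [(w, h) for (w, h) in available if w >= tw and h >= th]
    if not candidates:
        return None
    # Fewest pixels means the least decoding and resampling work.
    return min(candidates, key=lambda s: s[0] * s[1])
```

For example, `pick_source((100, 100), [(4000, 3000), (800, 600), (200, 150)])` returns `(200, 150)`, so the small thumbnail would be resampled from the 200x150 rendition rather than from the original.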
Now, I'm hosting on a Linode 4G instance (8 cores with 4x priority, 4GB RAM), and despite my optimizations and a memcached hit ratio of 70%, my average CPU usage is 170%. I constantly see all 8 CPUs working, with frequent simultaneous spikes to 100% on many of them.
I'm using nginx and gunicorn to serve a Django application, and the thumbnails are generated with PIL.
How can I improve this architecture?
I was thinking about a few possibilities:
#1. Easiest: add a second identical server with a load balancer in front, so that they'd share the load.
The problem with this is that the two servers would not share the local image cache. Could I solve this by placing that shared cache on a network drive, or would the latency ultimately cancel out the gains?
#2. A little harder: split the thumbnailing code out of my app into a separate web service running on a second server. This way the main application and database would not suffer from high CPU usage, and the web pages would be served fast. The thumbnails are already loaded asynchronously with JavaScript anyway.
Can anyone recommend some other solution?

Are you sure your performance problems come from thumbnails? OK, I suppose you've checked that.
You can downsize the image and upload the 2 thumbnails to S3 immediately (or shortly) after the user uploads it. That way you avoid the CPU load you're now wasting on every HTTP request checking for those thumbnails and doing IPC with memcached.
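A minimal sketch of that eager approach, assuming a fixed set of sizes and deterministic S3 keys. `THUMB_SIZES`, `thumb_key`, `on_upload` and `thumb_url` are hypothetical names; `make_thumb` stands in for the PIL resize and `store` for the S3 upload, both supplied by the caller. In practice `on_upload` would run in a background job so the upload request itself stays fast:

```python
from typing import Callable, Dict, Tuple

Size = Tuple[int, int]  # (width, height) in pixels

# Hypothetical presets: the answer mentions two thumbnails per image.
THUMB_SIZES: Dict[str, Size] = {"mid": (400, 400), "small": (100, 100)}


def thumb_key(image_id: str, label: str) -> str:
    """Deterministic S3 object key, so nobody has to look it up later."""
    return f"thumbs/{label}/{image_id}.jpg"


def on_upload(image_id: str,
              make_thumb: Callable[[str, Size], bytes],
              store: Callable[[str, bytes], None]) -> None:
    """Generate and store every thumbnail once, right after the upload."""
    for label, size in THUMB_SIZES.items():
        store(thumb_key(image_id, label), make_thumb(image_id, size))


def thumb_url(bucket_url: str, image_id: str, label: str) -> str:
    """Request path: just build the URL; no memcached or S3 round trip."""
    return f"{bucket_url}/{thumb_key(image_id, label)}"
```

Because the key is derived from the image ID, serving a thumbnail becomes string formatting: no memcached lookup and no existence check on S3.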

In a way your problem is a "good" problem to have (or at least it could have been a lot worse), in that there are no dependencies between separate image resizing tasks, so you can trivially distribute them over multiple servers. A few comments:
Have you checked to see if there is anything you can do to make the image resizing operations faster? (Google brought this up, don't know if it's any help: http://dmmartins.appspot.com/blog/speeding-up-image-resizing-with-python-and-pil) Even if you still find you need to add more servers, anything you can do to make each resize operation more efficient will make each server go farther.
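One well-known speed-up for JPEG sources is to let the decoder produce a reduced image directly, at 1/2, 1/4 or 1/8 scale, instead of decoding the full-size original and then shrinking it. The sketch below only computes which reduction factor would still leave enough pixels for the target size; it approximates the idea behind Pillow's `Image.draft()`, not its exact rounding rules, and `draft_scale` is a hypothetical helper:

```python
from typing import Tuple

Size = Tuple[int, int]  # (width, height) in pixels

# Reduction factors a JPEG decoder can apply while decoding, largest first.
JPEG_SCALES = (8, 4, 2, 1)


def draft_scale(original: Size, target: Size) -> int:
    """Largest reduction factor whose output still covers `target`."""
    ow, oh = original
    tw, th = target
    for scale in JPEG_SCALES:
        # Scaled dimensions are rounded up, hence the ceiling division.
        if -(-ow // scale) >= tw and -(-oh // scale) >= th:
            return scale
    return 1  # target is larger than the original: decode at full size
```

With Pillow, calling `img.draft("RGB", (w, h))` on a freshly opened JPEG requests such a reduced decode, and `Image.thumbnail()` already calls `draft()` internally where the format supports it.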
If your user base keeps growing, you will eventually need to "scale out", but in the short term you could possibly solve the problem simply by paying another $80 for the next "tier" of service (8 cores at 8x priority).
Is image resizing really your app's only bottleneck? If image resizing were "free", how much further could you scale on your existing server before rendering pages, running DB queries, etc. would limit throughput? If you don't know, it would be good to do some simulated load testing and find out. I ask because if rendering pages, DB queries, etc. are also bottlenecks, or soon will be, you are going to have to distribute the app anyway. In that case, you might as well keep thumbnailing in the main app and distribute it right now, rather than running your thumbnailing as a web service on a 2nd server.
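A simulated load test does not need much tooling. The sketch below assumes you pass in a `fetch` callable that performs one request against a staging copy of the site (hypothetical; it could be built on `urllib.request.urlopen`), and reports throughput and mean latency for a given concurrency:

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


def load_test(fetch: Callable[[], None], requests: int, concurrency: int) -> Dict[str, float]:
    """Call `fetch` `requests` times using `concurrency` threads.

    Returns throughput (requests/second) and mean latency (seconds).
    """

    def timed_call(_: int) -> float:
        start = time.perf_counter()
        fetch()
        return time.perf_counter() - start

    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = list(pool.map(timed_call, range(requests)))
    wall = time.perf_counter() - wall_start
    return {
        "throughput_rps": requests / wall,
        "mean_latency_s": sum(latencies) / len(latencies),
    }
```

Running it once against a page that triggers thumbnailing and once against a page that doesn't gives a rough idea of how much headroom the rest of the app has.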
Regardless of whether you distribute the main app, or split out thumbnailing into a separate app on a different server, you need some kind of authoritative store to keep track of where each thumbnail is kept on S3. You can keep that information in memcached, in a database, or wherever you want. It doesn't really matter. Even if you keep it in memcached, that doesn't mean you can't share the cache between 2 servers -- 1 server can connect to a memcached instance running on the other server.
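In Django, sharing the cache is a settings change. A sketch of `settings.py` for both servers, assuming memcached runs on the second server at the hypothetical private address `10.0.0.2`:

```python
# settings.py on BOTH app servers (the address is hypothetical).
# Both point at the same memcached instance, so an S3 URL cached by one
# server is visible to the other.
CACHES = {
    "default": {
        # PyMemcacheCache ships with Django 3.2+; older Django versions used
        # django.core.cache.backends.memcached.MemcachedCache instead.
        "BACKEND": "django.core.cache.backends.memcached.PyMemcacheCache",
        # A list of "host:port" strings here would shard keys across servers.
        "LOCATION": "10.0.0.2:11211",
    }
}
```

memcached has no authentication by default, so it should listen only on a private network interface.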
You asked if "the latency" of checking a cache which is held on a different server will "hinder the gains". I don't think you need to worry about that. Your problem is throughput, not latency. Those high-latency network operations parallelize very well. So if you just service more requests in parallel, you can still make full use of your CPUs (which is the resource bottleneck right now).

Related

No load time difference with and without varnish

I'm trying to cache static files on my server using Varnish Cache. I configured Varnish to cache files with image extensions (.jpg, .png, etc.). Then I opened my website, inspected it with the browser developer tools, and checked the load time of all the images on my site: there is no difference in load time with or without Varnish. There is a "HIT" in the X-Cache entry of the response header, so the images are in my cache, right? Any idea what I might be doing wrong?
P.S. I'm using nginx as the backend server.
Varnish shouldn't have a real impact on static files, especially when they're located on an SSD. Very heavily frequented sites may be an exception, particularly when the data is stored on a (slow) HDD. There you have a huge amount of I/O, which can be greatly reduced by caching the images in RAM with Varnish. But these are special cases where caching static files makes sense. Also note that nginx is a very fast web server that is very good at delivering static files.
The main purpose of Varnish is caching HTML generated by a server-side backend in PHP, ASP.NET, or other languages designed for this task. Compared with serving static files, generating dynamic content is very time-consuming: the backend has to run database queries, which are very common in web applications today, or parse templates, for example. WordPress is a widespread CMS and a good example of this: tens of thousands of lines of PHP code are executed on a single request, and depending on the number of plugins, 100 or more database queries are no exception.
So the server has a lot to do on every request. For you as a site owner, this has the following effects:
The page load time increases, which causes problems when it gets too high:
Visitors are not very patient, and they will leave your page if they think it's not fast enough. An online shop making $100k per day can lose up to $2.5 million per year from a 1-second delay (see https://blog.kissmetrics.com/loading-time/ for more information).
As a result, it's not surprising that Google uses load time as a ranking signal (see http://www.shoutmeloud.com/google-started-ranking-websites-based-on-load-time-and-speed.html).
Depending on the number of visitors, it can cost you money for more servers, or more powerful ones.
Varnish can store the HTML generated by a backend in RAM or on disk; the latter makes sense especially with an SSD. Depending on the structure and use of your site, Varnish will at least improve the speed of your pages and may also save money, because fewer (or less powerful) servers can do the job.
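Conceptually, what Varnish does for dynamic pages is a time-limited cache keyed by URL. The class below is a tiny in-process illustration of that idea, not Varnish itself and not a replacement for it; `PageCache` and the `render` callable are hypothetical:

```python
import time
from typing import Callable, Dict, Tuple


class PageCache:
    """Keep rendered HTML for `ttl` seconds and serve it without re-rendering."""

    def __init__(self, render: Callable[[str], str], ttl: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.render = render          # the expensive backend (PHP, DB queries, ...)
        self.ttl = ttl
        self.clock = clock
        self.entries: Dict[str, Tuple[float, str]] = {}

    def get(self, path: str) -> str:
        now = self.clock()
        hit = self.entries.get(path)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]             # cache HIT: backend not touched
        html = self.render(path)      # cache MISS: do the full backend work
        self.entries[path] = (now, html)
        return html
```

With a 60-second TTL, a page requested many times per second costs the backend one render per minute instead of one per request.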
When Varnish is used as a frontend for dynamically generated content, you'll notice a difference, and depending on the application, even a big one. I configured Varnish for a vBulletin-based forum and improved the page load time by a factor of about 5.
In summary, you should focus on caching dynamic pages instead of static content like images or client scripts, because in most cases the web server is already good enough at delivering those. If static content really is slow, a CDN can probably improve it. Or maybe your web server is not configured for optimal speed: for example, perhaps no cache lifetime is defined for images. This can hurt performance, especially for larger files. But without more information about your application and configuration, it's not possible to investigate the performance issue and give concrete tips on how to improve it.
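Defining a lifetime for images means sending caching headers, so browsers reuse their copy instead of asking the server again. A sketch of the headers involved, assuming a 30-day lifetime is acceptable for your static files (`static_cache_headers` is a hypothetical helper):

```python
import time
from email.utils import formatdate
from typing import Dict, Optional

STATIC_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif", ".css", ".js")


def static_cache_headers(path: str, max_age: int = 30 * 24 * 3600,
                         now: Optional[float] = None) -> Dict[str, str]:
    """Headers that let browsers reuse a static file without asking again."""
    if not path.lower().endswith(STATIC_EXTENSIONS):
        return {}  # dynamic responses: leave caching decisions to the app
    now = time.time() if now is None else now
    return {
        "Cache-Control": f"public, max-age={max_age}",
        # Expires is the HTTP/1.0 equivalent; HTTP dates are always in GMT.
        "Expires": formatdate(now + max_age, usegmt=True),
    }
```

In nginx, the `expires` directive (for example `expires 30d;` in the location block for images) sets both headers for you.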

Openshift Prestashop 1.5.6 Performance Tuning

In the prestashop performance page, it offers caching with APC, memcached, File System, and XCache. There's a warning about making sure the infrastructure contains one front-end server. If not sure, ask your hosting company. What's available and best here?
I'm currently using the free 3-gear setup: 1 small gear with PHP 5.4 and scaling turned on (minimum 1, maximum all available), and 1 small gear for MySQL 5.5. HAProxy shows one of my gears as active/down (red) with 0 bytes of anything; I'm assuming that's because I don't have high availability set up correctly. I'm getting page load times of 1.5 to 3 seconds on average. I know the small gear has little CPU, but is that a normal response time? How can I tune this to under 1 second with what I have? I'd like to see the performance and scaling capabilities before I hand over a credit card for bigger and more gears.
Speaking of scaling, I see this: "The code in the git repository is copied to each new gear, but the data directory begins empty." I recently made some post_deploy hooks that symlink an images directory, so that git updates from my local machine don't destroy custom uploads. Am I going to have to switch to some kind of CDN? I believe I saw an option to set CDN servers in the application.
Thanks in advance,
Alan
PrestaShop is probably a pretty beefy application to run on a small gear, even with the database moved to its own gear. Your web gear would have to receive enough traffic to scale up to another web gear; as long as it only gets a few requests, it may respond slowly without scaling up. For a few cents an hour, it might be worth spinning up a medium or large gear and checking its performance.
As for the data directory symlink: on a scaled application, anything you upload into your OPENSHIFT_DATA_DIR will not be synced across the gears that are spun up. You might want to consider some kind of Amazon S3 plugin or similar.

Load times with @font-face vs. Google Fonts, or local files vs. CDNs

Is loading fonts by storing them on your server and using @font-face slower than loading them from Google's font API? Or does it depend on the font and vary from situation to situation?
And the same for JavaScript and similar files: is it faster or slower to load them from CDNs than to store them on your own server and serve them from there?
Or are there too many variables involved from situation to situation to generalize to a single answer? I would imagine that it depends on which CDN you're accessing and/or your personal server settings and the size/nature of the files you're loading, etc, but I was just curious if there might be a general rule or strategy to knowing which is faster?
A CDN might be faster because it is built with speed in mind (high-performance, tuned web servers, good caching...) and is usually composed of a network of geographically distributed servers, which lowers latency both because they are nearer to the user and because they share the load. They may also be placed directly on backbones, which allows much faster transfer rates than a low-to-mid-priced server will ever achieve.
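The "nearer" argument can be quantified with a physical lower bound: light in optical fiber covers roughly 200 km per millisecond, and every sequential round trip (TCP handshake, TLS handshake, the request itself) pays that cost. A sketch that ignores routing detours and server time; the helper names are hypothetical:

```python
# Light in optical fiber travels at roughly 200,000 km/s (about 2/3 of c).
FIBER_KM_PER_MS = 200.0


def min_rtt_ms(distance_km: float) -> float:
    """Physical lower bound on one round trip; real routes are longer."""
    return 2 * distance_km / FIBER_KM_PER_MS


def min_fetch_ms(distance_km: float, round_trips: int) -> float:
    """Lower bound for a fetch needing `round_trips` sequential round trips
    (e.g. one each for the TCP handshake, a TLS 1.3 handshake and the request)."""
    return round_trips * min_rtt_ms(distance_km)
```

For a visitor 6,000 km from the server, three round trips cost at least 180 ms before the file starts arriving; from an edge server 500 km away, the same bound is 15 ms.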
That said, for a low-traffic website visited mostly from one specific country near the server's location, the difference in load time is negligible.
The reasons for using the Google or jQuery CDN are to save bandwidth on your server (if the respective owner allows you to use theirs, of course) and to be sure you do not miss urgent patches: they will push fixed versions to the CDN as soon as possible, while otherwise you have to get notified, download the new version, and then upload it to your server (although I guess this is not a big issue in modern, sandboxed browsers).

What's the speediest web hosting choices out there that are scalable to large traffic spikes and can handle fast page loads?

Is cloud hosting the way to go? Or is there something better that delivers fast page loads?
The reason I ask is that I run a BuddyPress site on a Bluehost dedicated server, but it seems slow most times of the day. This scares me because the site isn't live yet, and I'm afraid that when it gets traffic it'll get worse and my visitors will lose interest. I use Amazon Cloud to handle all my media, JS, and CSS files, along with a caching plugin, but it still loads slowly at times.
I feel like the problem is Bluehost, because other sites running BuddyPress seem to load instantly. I'm not web-hosting savvy, so can someone please point me in the right direction?
The hosting choice depends on many factors such as technical requirements, growth rates, burst rates, budgets and more.
Bigger Hardware
To scale up a hosting operation, your first choice is often simply a more powerful server, VPS, or cloud instance. The point is not so much cloud vs. dedicated, but that you bring more compute power to the problem. Cloud can make scaling up easier, often with a few clicks.
Division of Labor
The next step is often division of labor: you offload the database, static content, caching, or other items to specific servers or services. For example, you could offload static content to a CDN, or use a dedicated database server.
Once again, cloud vs non-cloud is not the issue. The point is to bring more resources to your hosting problems.
Pick the Right Application Stack
I cannot stress enough the importance of picking the right underlying technology for your needs. For example, I recently helped a client switch from an Apache/PHP stack to a Varnish/Nginx/PHP-FPM stack for a very busy WordPress operation (>100 million page views/mo). This change boosted capacity nearly 5x with modest hardware changes.
Same App. Different Story
Also, just because you are using a specific application does not mean the same hosting setup will work for you. I don't know the specific app you are using, but with Drupal, WordPress, Joomla, vBulletin and others, the plugins, site design, themes and other items are critical to overall performance.
To complicate matters, user behavior needs to be considered as well. Consider a discussion forum with a 95:1 read:post ratio. What if you change the design to encourage more posts and that ratio moves to 75:1? That means more database writes, less caching, etc.
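To put a number on that example, assuming each post is one database write and each read is one request that could be served from cache (`write_share` is a hypothetical helper):

```python
def write_share(reads_per_post: float) -> float:
    """Fraction of requests that are writes for a read:post ratio of N:1."""
    return 1 / (reads_per_post + 1)


before = write_share(95)   # 1/96, about 1.04% of requests write
after = write_share(75)    # 1/76, about 1.32% of requests write
increase = after / before  # 96/76, about 1.26x as many writes for the same traffic
```

So the same traffic produces about 26% more writes, and each write may also invalidate cached pages, which this simple model does not capture.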
In short, details matter, so get a good understanding of your application before you start to scale out hosting.
A hosting service is part of the solution. Another part is proper server configuration.
For instance, this guy optimized his setup to serve 10 million requests a day from a micro instance on AWS.
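To put "10 million requests a day" in perspective, it helps to convert it to requests per second. The peak factor below is an assumption, since real traffic is never flat:

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_rps(requests_per_day: float) -> float:
    """Average requests per second over a full day."""
    return requests_per_day / SECONDS_PER_DAY


def peak_rps(requests_per_day: float, peak_factor: float) -> float:
    """`peak_factor` is an assumed peak-to-average traffic ratio."""
    return average_rps(requests_per_day) * peak_factor
```

That is about 116 requests per second on average, or roughly 347 at an assumed 3x peak.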
I think you should look at your server config first, then shop for other hosts. If you can't control server configuration, try AWS, Rackspace or other cloud services.
Just an FYI: you can sign up for AWS and use a micro instance free for one year. In the link I posted, he optimized on the same server. You might have to upgrade to a small instance, because Amazon has stated that micro instances are meant only for short spikes, not sustained traffic.
Good luck.

Images in load balanced environment

I have a load-balanced environment with over 10 web servers running IIS. All websites access a single file storage that hosts all the pictures. We currently have 200GB of pictures, stored in directories of 1000 images each. Right now all the images are on a single storage device (RAID 10) connected to a single server that acts as the file server. All web servers are connected to the file server on the same LAN.
I am looking to improve the architecture so that we would have no single point of failure.
I am considering two alternatives:
Replicate the file storage to all of the webservers so that they all access the data locally
replicate the file storage to another storage so if something happens to the current storage we would be able to switch to it.
Obviously the main operations done on the file storage are read, but there are also a lot of write operations. What do you think is the preferred method? Any other idea?
I am currently ruling out use of CDN as it will require an architecture change on the application which we cannot make right now.
Things I would normally consider before going for an architecture change are:
what are the issues with the current architecture
what am I doing wrong with the current architecture (if it has been working for a while, minor tweaks will normally solve a lot of issues)
will it allow me to grow easily (there will always be an upper limit here); based on the past growth of the data, you can plan this effectively
reliability
easy to maintain / monitor / troubleshoot
cost
200GB is not a lot of data, and you can go for a home-grown solution or use something like a NAS, which will let you expand later on, with a hot-swappable replica of it.
Replicating to the storage of all the web servers is a very expensive setup, and since, as you said, there are a lot of write operations, replicating to all the servers will have a large overhead (which will only grow with the number of servers and the amount of data). There is also the issue of stale data being served by one of the other nodes. Apart from that, troubleshooting replication issues will be a mess with 10 or more nodes.
Unless the lookup/read/write of files is very time-critical, replicating to all the web servers is not a good idea. Web users will hardly notice a difference of 100-200 ms in load time.
There are some enterprise solutions for this sort of thing. But I don't doubt that they are expensive. NAS doesn’t scale well. And you have a single point of failure which is not good.
There are some ways you can write code to help with this. You could cache the images on the web servers the first time they are requested; this will reduce the load on the image server.
You could use a master/slave setup, so that you have one main image server and other servers that copy from it. You could load balance these, and put some logic in your code so that if a slave doesn't have a copy of an image, you check on the master. You could also assign these in priority order, so that if the master is not available, the first slave becomes the master.
Since you have so little data in your storage, it makes sense to buy several big HDDs or use the free space on your web servers to keep copies. This takes the strain off your backend storage system, and when it fails, you can still deliver content to your users. Even better, if you need to scale (more downloads), you can simply add a new server and the load on your backend won't change much.
If I had to do this, I'd use rsync or unison to copy the image files to the exact same path on the web servers as on the storage device (this way, you can swap out the copy for a network file system mount any time).
Run rsync every now and then (for example after every upload, or once a night; you'll know best which schedule suits you).
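A sketch of the rsync call, assuming rsync is installed on both ends. The question's web servers run IIS, so on Windows this would need an rsync port, or `robocopy /MIR` as the built-in equivalent. `rsync_command` and `mirror` are hypothetical helpers:

```python
import subprocess
from typing import List


def rsync_command(storage_root: str, web_root: str, delete: bool = True) -> List[str]:
    """rsync invocation mirroring the storage tree to the same layout on a web server.

    The trailing slash on the source copies the directory's contents, so the
    tree under web_root matches storage_root exactly.
    """
    cmd = ["rsync", "-a"]          # -a: recursive, preserve times, permissions, symlinks
    if delete:
        cmd.append("--delete")     # remove files that were deleted on the storage side
    cmd += [storage_root.rstrip("/") + "/", web_root]
    return cmd


def mirror(storage_root: str, web_root: str) -> None:
    """Run one sync; raises CalledProcessError if rsync fails."""
    subprocess.run(rsync_command(storage_root, web_root), check=True)
```

`--delete` keeps the copies from accumulating images that were removed from storage; leave it out if the web servers hold files that must survive a sync.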
A more versatile solution would be to use a P2P protocol like BitTorrent. That way, you could publish all the changes on the storage backend to the web servers, and they would optimize the updates automatically.
