What is the preferred way of serving static files for an application that is deployed in a microservices architecture (in production)?
Let's say for simplicity that I have 3 application servers and one load-balancer that forwards requests to these servers.
Should the load-balancer store the files and serve them immediately upon request? Or...
Should the load-balancer forward static files requests to the different application instances (each request to a different instance)?
Is there a best practice for this?
As stated in other comments/answers, there are a lot of ways to handle this. It largely depends on what you actually need (version control, access control, HTTP headers, CDN).
Specific to your question: if it were me, I wouldn't deploy these files on the load balancers, because shipping a newer version of the static files would require downtime on the load balancers. Instead, I would build very simple Nginx/Caddy containers whose sole purpose is to serve these files, and have the LB route static-file traffic to those containers.
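As a rough sketch of what such a container's config might look like (paths, port, and cache lifetimes are placeholders, not a definitive setup):

```nginx
# Minimal nginx config for a static-files-only container.
# The files are baked into the image; deploying a new version of the
# assets means shipping a new image, and the LB is never touched.
server {
    listen 80;
    root /usr/share/nginx/html;   # assumed location of the static files

    location / {
        # Long-lived caching; use fingerprinted file names (or a new
        # image tag) to "invalidate".
        add_header Cache-Control "public, max-age=31536000, immutable";
        try_files $uri =404;
    }
}
```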
Best practice would be to store them in a service meant for static content, like blob storage (in the cloud this would be Azure Storage, S3, etc.). If necessary, leverage a CDN to improve latency and throughput to end users.
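For the S3 route, the deployment step can be as small as a script that uploads the assets with sensible cache headers. A minimal Python sketch using boto3; the bucket name is hypothetical and credentials are assumed to be configured in the environment:

```python
# Upload every file under a local directory to S3 with long-lived
# cache headers, so browsers and any CDN in front can cache aggressively.
import mimetypes
from pathlib import Path

import boto3

s3 = boto3.client("s3")
BUCKET = "my-static-assets"  # hypothetical bucket name

def upload_static(root: str) -> None:
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        content_type, _ = mimetypes.guess_type(path.name)
        s3.upload_file(
            str(path),
            BUCKET,
            str(path.relative_to(root)).replace("\\", "/"),
            ExtraArgs={
                "ContentType": content_type or "application/octet-stream",
                # Assumes assets are immutable once deployed (e.g.
                # fingerprinted file names).
                "CacheControl": "public, max-age=31536000, immutable",
            },
        )

upload_static("./static")
```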
But as someone else commented, there are many ways to handle this depending on your particular requirements.
Sorry for such a question, but I cannot find any article on the web covering the pros and cons of this. My guess is that it is about async uploading and downloading, but it's just a guess. Is there detailed information on this somewhere?
It's mostly about specialization, data locality, and concurrency.
Servers that specialize in serving static content typically do so much faster than dynamic web servers, because they are optimized for that specific use case.
You also have the advantage of storing your content in many zones to achieve better performance (the content is physically closer to the person requesting it), whereas web applications typically should be near their other dependencies, such as databases.
Lastly, browsers (for HTTP/1.x at least) only allow a fixed number of connections per host, so if your images and API calls are on separate hosts, one cannot influence the other in terms of request scheduling.
There are a lot of other reasons I'm sure, but these are just off the top of my head.
There is this example on Amazon of a high-traffic web application. I noticed that they are using S3 as their content delivery method. I was wondering if I need to have a Web Server for content delivery and a Web App for my application. I don't understand why they have two Web Servers and two Web Apps in the diagram.
Also, what is the best way to set up a website that serves images and static content through S3 and the rest of the content through regular storage?
My last question is: can I consider S3 as a main storage, reliable enough that I can keep my static content only there, without normal storage as a backup?
That is a very general diagram; the specifics will vary depending on the overall architecture.
Having said that, I believe the Web Server represents something like Apache or Nginx, and the App Server represents something like Rails, Rack Server, Unicorn, Gunicorn, Django, Sinatra, Flask, Jetty, Tomcat, etc. In some cases you can merge the Web Server and the App Server together, for example by deploying Apache with Python's mod_wsgi to run your Django app. (So it depends on the architecture.)
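To make the split concrete, here is a minimal sketch of what the App Server side actually exposes: a WSGI callable that a server such as Gunicorn or mod_wsgi hosts, while the Web Server in front handles raw HTTP, TLS, and static files. This is illustrative, not tied to any particular framework:

```python
# Minimal WSGI application: this is the contract an "app server"
# (Gunicorn, uWSGI, mod_wsgi, ...) hosts. The web server in front
# proxies dynamic requests here and serves static files itself.
def application(environ, start_response):
    body = b"Hello from the app server\n"
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]

if __name__ == "__main__":
    # Quick local test without a real app server.
    from wsgiref.simple_server import make_server
    make_server("127.0.0.1", 8000, application).serve_forever()
```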
"What is the best way to set up a website that serves images and static content through S3 and the rest of the content through regular storage?"
There's no single best way beyond pointing your dynamic content at your databases (SQL or NoSQL) and your static files (images, CSS, JavaScript, etc.) at an S3 bucket. You can also use third-party modules depending on your application stack; for example, you can accomplish this in Django with the django-storages module. You can find similar modules for other app stacks like Rails.
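For Django, the django-storages setup is only a few settings. A minimal sketch, assuming django-storages and boto3 are installed; the bucket name and region are placeholders, and note that recent Django versions configure this through the STORAGES dict instead of STATICFILES_STORAGE:

```python
# settings.py (excerpt) -- serve collected static files from S3.
INSTALLED_APPS = [
    # ... your apps ...
    "storages",
]

# Classic django-storages configuration for S3-backed static files.
STATICFILES_STORAGE = "storages.backends.s3boto3.S3Boto3Storage"
AWS_STORAGE_BUCKET_NAME = "my-static-assets"   # hypothetical bucket
AWS_S3_REGION_NAME = "us-east-1"               # hypothetical region
AWS_S3_CUSTOM_DOMAIN = f"{AWS_STORAGE_BUCKET_NAME}.s3.amazonaws.com"
STATIC_URL = f"https://{AWS_S3_CUSTOM_DOMAIN}/"
```

After this, `python manage.py collectstatic` pushes the assets to the bucket and your templates' `{% static %}` URLs point at S3.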
"Can I consider S3 as a main storage, reliable enough that I can keep my static content only there, without normal storage as a backup?"
S3 is pretty reliable; Amazon quotes 99.999999999% durability for your data. That goes down if you use RRS (Reduced Redundancy Storage), but if you want to use RRS you would probably back up your data in a non-RRS bucket anyway. In any case, if it's extremely critical data, you are more than free to back it up somewhere else just in case.
Notice in the diagram that they also recommend using CloudFront for your static files and this is especially useful if your users will be accessing your application from different geographical areas.
Hope this helps.
I can see that all the big sites store their images on a completely different server. What are the benefits of this practice?
Load balancing.
Separation of dynamic and static content.
Static content is served from servers which are geographically (or in network distance) close to the client.
(Update) I forgot to mention that browsers used to limit the number of concurrent requests to the same server or domain (I don't know if this still applies), so using different domain names allowed the site to bypass this limitation.
This way, each kind of server serves the resources it's tuned for, so clients get pages faster.
This way, the browser won't send cookies when requesting images.
It also enables the use of location-aware CDNs for images only.
Does anyone know if there is an overview of the performance of different cache handlers for Smarty?
I compared Smarty's file cache with a memcache handler, but memcache seemed to have a negative impact on performance.
I figured there would be a faster way to cache than through the filesystem... am I wrong?
I don't have a systematic answer for you, but in my experience, the file cache is the fastest. I should clarify that I haven't done any serious performance tests, but in all the time I've used Smarty, I have found the file cache to work best.
One thing that definitely improves performance is to disable checking whether the template files have changed (in Smarty, this is the $smarty->compile_check setting). This avoids having to stat the .tpl files on every request.
File caching is fine when you have a single server instance, or a shared drive (NFS) in a server cluster. But when you have a web server cluster (two or more web servers serving the same content), the problem with file-based caching is that it is not synced across the web servers. Performing a simple rsync on the caching directories is error-prone; it may work flawlessly for a while, but it is not a stable solution.

The best solution for a cluster is distributed caching, i.e. memcache: a separate server running a memcached instance, with the PHP memcache extension installed on each web server. Each server then checks for the existence of a cached page/item; if it exists, it pulls it from memcached, otherwise it generates the page from the database and then saves it into memcached.

When you are dealing with clusters, you cannot skimp on a good caching mechanism. If you are dealing with clusters, your site already has (or will have) more traffic than a single server can handle.
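The answer above describes a PHP stack, but the check-then-generate pattern (cache-aside) is language-agnostic. A minimal Python sketch, assuming the pymemcache client; the memcached hostname and render_from_database() are placeholders:

```python
# Cache-aside against a shared memcached instance: every web server in
# the cluster checks the cache first and only regenerates (and
# re-caches) the page on a miss.
from pymemcache.client.base import Client

cache = Client(("memcache.internal", 11211))  # hypothetical host

def render_from_database(page_id: str) -> bytes:
    # Placeholder for the expensive template/DB rendering step.
    return f"<html>page {page_id}</html>".encode()

def get_page(page_id: str) -> bytes:
    key = f"page:{page_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached                      # hit: shared by all servers
    page = render_from_database(page_id)
    cache.set(key, page, expire=300)       # miss: regenerate and store
    return page
```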
There is a beginner-level cluster environment that can be implemented at relatively low cost. You set up two colocated servers (an nginx load balancer and a memcached server); then, using free shared web hosting, you create accounts for the same domain on several free hosts and install your content there. You configure your nginx load balancer to point to the IP addresses of the free web hosts. The free web hosts must have the PHP 5 memcache extension installed or the solution will not work.
Then you set your DNS for the domain with the registrar to point at the nginx IP (which would be a static IP if you are colocating). Now when someone accesses your domain, nginx proxies the request to one of the web servers in your cluster located on the free hosting.
You may also want to consider a CDN to offload traffic when serving static content.
I have a web application that consists of a website and a REST API. Should I host them on the same server or on different servers? By "server" I mean a server cluster: several servers behind a load balancer.
The API is mostly inbound traffic; the website is mostly outbound.
If it matters - hosted on Rackspace and/or AWS.
Here is what I see so far:
Benefits of having Website and REST API on the same server
Simple deployment
Simple scaling: if something is slow, just launch another instance
Single load balancer configuration
Simple monitoring
Simple, simple, simple ...
Effective use of full duplex network (API - inbound, website - outbound)
Benefits of splitting
API overload will not affect website load time
Detailed monitoring (I will know which component uses resources at this moment)
Any comments?
Thank you
Alexander
Just as you stated, in most situations there are more advantages to hosting the API on the same server as the website, so I would stick with that option.
But if you predict a lot of traffic for either the website or the API, then a separate server might be better suited.
If this is behind a load balancer, why not leave the services and pages on the same site and let the load balancer/cluster do its job?
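One way to keep that option open: route by path at the balancer from day one, so the website and API can share a pool now and be split later without clients noticing. A hedged nginx sketch with hypothetical upstream and host names:

```nginx
# Path-based routing at the load balancer (hypothetical names).
# Both pools point at the same servers today; move the api pool to
# dedicated instances later without changing any client URLs.
upstream website {
    server app1.internal:8000;
    server app2.internal:8000;
}

upstream api {
    server app1.internal:8000;   # same servers for now;
    server app2.internal:8000;   # split into a dedicated pool later
}

server {
    listen 80;

    location /api/ {
        proxy_pass http://api;
    }

    location / {
        proxy_pass http://website;
    }
}
```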
Your list of advantages/disadvantages covers operational considerations, but you should consider application needs as well:
Caching?
Security?
Other resources, e.g. the filesystem
These may or may not apply, but if your application architecture is different between the two, be sure to factor this into your decision.