Setting up a Server as a CDN - performance

We've got a server and domain (hosted by MediaTemple) that we basically use as a big hard drive for video files and images. What would it take to set up this server and domain as a CDN?
I saw this article:
http://www.riyaz.net/blog/how-to-setup-your-own-cdn-in-30-minutes/technology/890/
But that looks to be aliasing to the box, and not actually moving the content. Our content is actually hosted on a different box.

One of the tenets of a CDN is that content is geographically close to the client - if you only have one CDN server (rather than several replicated servers), it's not a CDN.
However, you can still get some of the benefits of a CDN. Browsers will typically only fetch 8 resources in parallel from any given hostname. You can give your 'CDN' server several subdomain hostnames and round-robin requests.
www1.example.com
www2.example.com
www3.example.com
...
This will effectively triple the number of concurrent requests a browser will make to your server, as it will see the three hostnames as three separate web servers.
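If you generate your pages, a deterministic mapping keeps each asset on the same shard, so browser caching still works. A minimal Python sketch, where the wwwN hostnames are the illustrative ones above:

```python
import zlib

# The three illustrative shard hostnames from above.
SHARDS = ["www1.example.com", "www2.example.com", "www3.example.com"]

def shard_url(path: str) -> str:
    # crc32 is stable across runs (unlike Python's built-in hash()),
    # so an asset always lands on the same shard and stays cacheable.
    return f"https://{SHARDS[zlib.crc32(path.encode()) % len(SHARDS)]}{path}"

for p in ["/img/logo.png", "/img/hero.jpg", "/js/app.js"]:
    print(shard_url(p))
```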

It's basically like creating a "best possible route" to the content for each client.
What you basically do is assign multiple IP addresses to one hostname. For example:
non-static content (dynamic pages) is served from www.example.com,
whereas media files (JPG, AVI, etc.) are stored on media.cdn.example.com.
media.cdn.example.com resolves to 1.2.3.4;8.8.9.9;103.10.4.5;etc.,
so the resolver on the user's end will pick the address nearest to its location, and that will be your CDN.
Another way is to force content to be served over a certain route, which pushes the routing to do the same.
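To see the address list a client actually receives for such a name, you can resolve it yourself. A short standard-library Python sketch; media.cdn.example.com is the hypothetical hostname from above, so substitute a real multi-A-record name to try it:

```python
import socket

# Ask the resolver for every address behind the (hypothetical) media hostname.
infos = socket.getaddrinfo("media.cdn.example.com", 80, proto=socket.IPPROTO_TCP)
for family, socktype, proto, canonname, sockaddr in infos:
    print(sockaddr[0])  # one line per A/AAAA record returned
```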

Related

Why should we use IP spoofing when performance testing?

Could anyone please tell me what the use of IP spoofing is in terms of performance testing?
There are two main reasons for using IP spoofing while load testing a web application:
Routing stickiness (a.k.a. persistence) - Many load balancers use IP stickiness when distributing incoming load across application servers (this is also called persistence: using application-layer information to stick a client to a single server). So, if you generate the load from the same IP, you could load only one application server instead of distributing the load to all of them. Using IP spoofing, you avoid this stickiness and make sure your load is distributed across all application servers.
IP Blocking - Some web applications detect a mass of HTTP requests coming from the same IP and block them to defend themselves. When you use IP spoofing you avoid being detected as a harmful source.
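At the load-generator level, "spoofing" usually just means binding each connection to a different local source IP (the addresses must already be configured as aliases on the generator's interface). A minimal Python sketch using only the standard library; the host and IPs are hypothetical:

```python
from http.client import HTTPConnection

# Source IPs assumed to be configured on this machine's interface.
SOURCE_IPS = ["10.0.0.11", "10.0.0.12", "10.0.0.13"]

for ip in SOURCE_IPS:
    # Bind the outgoing TCP connection to a specific local address (port 0 = any).
    conn = HTTPConnection("app.example.com", 80, source_address=(ip, 0))
    conn.request("GET", "/")
    print(ip, "->", conn.getresponse().status)
    conn.close()
```

With each connection arriving from a different source IP, an IP-sticky load balancer will spread the sessions across its backends instead of pinning them all to one.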
When it comes to load testing of web applications, a well-behaved test should represent a real user using a real browser as closely as possible, with all its stuff like:
Cookies
Headers
Cache
Handling of "embedded resources" (images, scripts, styles, fonts, etc.)
Think times
You might need to simulate requests originating from different IP addresses if your application (or its infrastructure, like a load balancer) assumes that each user has a unique IP address. Also, DNS caching at the operating-system or JVM level may lead to a situation where all your requests hit only one endpoint while the others remain idle. So, if possible, it is better to shape the requests so they come from different addresses.

How to host images when using HTTP/2

We are migrating our page to HTTP/2.
When using HTTP/1 there was a limitation of 2 concurrent connections per host. Usually, a technique called sharding was used to work around that.
So content was delivered from www.example.com and the images from img.example.com.
Also, you wouldn't send all the cookies for www.example.com to the image domain, which also saves bandwidth (see What is a cookie free domain).
Things have changed with HTTP/2; what is the best way to serve images using HTTP/2?
same domain?
different domain?
Short:
No sharding is required; HTTP/2 web servers usually have a liberal connection limit.
As with HTTP/1.1, keep the files as small as possible; HTTP/2 is still bound by the same physical bandwidth constraints.
Multiplexing is a real plus for concurrent image loading. There are a few demos out there, you can Google them. I can point you to the one that I did:
https://demo1.shimmercat.com/10/
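If you want to verify the multiplexing yourself, the third-party httpx library (installed with pip install "httpx[http2]") can fetch many images over a single HTTP/2 connection. A hedged sketch; the URLs are hypothetical:

```python
import httpx

# Ten hypothetical image URLs on the same origin.
urls = [f"https://www.example.com/img/photo{i}.jpg" for i in range(10)]

# One client = one connection; HTTP/2 multiplexes all requests over it.
with httpx.Client(http2=True) as client:
    for url in urls:
        resp = client.get(url)
        print(resp.http_version, url, len(resp.content), "bytes")
```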

What is the fastest way(/make loading faster) to serve images and other static content on a site?

Ours is an e-commerce site with lots of images and Flash (the same heavy Flash is rendered across all pages). All the static content is stored on and served from the web server (IHS, clustered, 2 nodes). We still notice that image delivery is slow. Is this approach correct at all? What are the alternative ways of doing this, like maybe serving the images through a third-party vendor or implementing some kind of caching?
P.S. All our pages are HTTPS. Could this be a reason?
Edit 1: The images are served over HTTPS too, so the alerts are a non-issue?
Edit 2: The loading is slower on IE, and most of our users are on IE. I am not sure if browser-specific styling could be causing the slower IE loading? (We have some browser-specific styling for IE.)
While serving pages over plain HTTP may be faster (though I doubt HTTPS is monstrously slow for small files), a good many browsers will complain if included resources such as images and JS are not on https:// URLs. This will give your customers annoying popup notifications.
There are high-performance servers for static file serving, but unless your SSL certificate works for multiple subdomains, there are a variety of complications. Putting the high-performance server in front of your dynamic content server and reverse proxying might be an option, if that server can do the SSL negotiation. On Unix platforms, Nginx is well liked for its reverse proxying and static file serving. Proxy-cache setups like Squid may be an option too.
Serving static content from a cloud like Amazon's is an option, and some of the cloud providers let you use HTTPS as well, as long as you are fine with using a subdomain of their domain name (due to technical limitations in SSL).
You could create a CDN. If your site uses HTTPS, you'll need an HTTPS-capable CDN domain too (otherwise you might get a non-secure items warning).
Also, if your site uses cookies (and most do), hosting static content on a different domain (or, if you use www.example.com, on cdn.example.com) means the cookie data isn't sent with those HTTP requests.

When would you need multiple servers to host one web application?

Is that called "clustering" of servers? When a web request is sent, does it go through the main server, and if the main server can't handle the extra load, then it forwards it to the secondary servers that can handle the load? Also, is one "server" that's up and running the application called an "instance"?
[...] Is that called "clustering" of servers?
Clustering is indeed transparently using multiple nodes that are seen as a single entity: the cluster. Clustering allows you to scale: you can spread your load across all the nodes and, if you need more power, you can add more nodes (short version). Clustering also allows you to be fault tolerant: if one node (physical or logical) goes down, other nodes can still process requests and your service remains available (short version).
When a web request is sent, does it go through the main server, and if the main server can't handle the extra load, then it forwards it to the secondary servers that can handle the load?
In general, this is the job of a dedicated component called a "load balancer" (hardware or software) that can use many algorithms to balance the requests: round-robin, FIFO, LIFO, load-based...
In the case of EC2, you previously had to load balance with round-robin DNS and/or HA Proxy. See Introduction to Software Load Balancing with Amazon EC2. But for some time now, Amazon has launched load balancing and auto-scaling (beta) as part of their EC2 offerings. See Elastic Load Balancing.
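To make the simplest of those algorithms concrete, here is a tiny round-robin sketch in Python (the backend names are made up):

```python
import itertools

# The balancer cycles through its backends in a fixed order.
backends = itertools.cycle(["app1:8080", "app2:8080", "app3:8080"])

for request_id in range(6):
    print(f"request {request_id} -> {next(backends)}")
# request 0 -> app1:8080, request 1 -> app2:8080, ... wrapping around
```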
Also, is one "server" that's up and running the application called an "instance"?
Actually, an instance can be many things (depending on who's speaking): a machine, a virtual machine, a server (software) up and running, etc.
In the case of EC2, you might want to read Amazon EC2 Instance Types.
Here is a real example:
This specific configuration is hosted at RackSpace in their Managed Colo group.
Requests pass through a Cisco firewall. They are then routed across a gigabit LAN to a Cisco CSS 11501 Content Services Switch (i.e. a load balancer). The load balancer matches the incoming content to a content rule, handles the SSL decryption if necessary, and then forwards the traffic to one of several back-end web servers.
Every 5 seconds, the load balancer requests a URL on each web server. If a web server fails (two times in a row, IIRC) to respond with the correct value, that server is not sent any traffic until the URL starts responding correctly.
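A hedged Python sketch of that health-check loop (the 5-second interval and 2-failure threshold come from the description above; the internal hostnames and the expected "OK" body are made up):

```python
import time
import urllib.request

# Consecutive-failure count per backend health URL (hypothetical hosts).
backends = {"http://web1.internal/health": 0, "http://web2.internal/health": 0}
MAX_FAILURES = 2

while True:
    for url in backends:
        try:
            body = urllib.request.urlopen(url, timeout=2).read()
            ok = body.strip() == b"OK"  # assumed "correct value"
        except OSError:
            ok = False
        backends[url] = 0 if ok else backends[url] + 1
        if backends[url] >= MAX_FAILURES:
            print(url, "taken out of rotation")  # stand-in for real removal
    time.sleep(5)  # poll every 5 seconds, as described
```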
Further behind the web servers is a MySQL master/slave configuration. Connections may be made to the master (for transactions) or to the slaves for read-only requests.
Memcached is installed on each of the web servers, with 1 GB of RAM dedicated to caching. Each web application may utilize the cluster of memcached servers to cache all kinds of content.
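The cache-aside pattern that implies looks roughly like this; a sketch using the third-party pymemcache client (pip install pymemcache), with a made-up key and value:

```python
from pymemcache.client.base import Client

cache = Client(("localhost", 11211))  # memcached on the local web server

page = cache.get("product:42:html")   # try the cache first
if page is None:
    page = b"<html>rendered product page</html>"  # stand-in for real rendering
    cache.set("product:42:html", page, expire=300)  # keep for 5 minutes
print(page)
```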
Deployment is handled using rsync to sync specific directories on a management server out to each web server. Apache restarts, etc., are handled through similar scripting over SSH from the management server.
The amount of traffic that can be handled through this configuration is significant. The advantages of easy scaling and easy maintenance are great as well.
For clustering, any web request would be handled by a load balancer which, being kept updated on the current loads of the servers forming the cluster, sends the request to the least burdened server. As for whether it's called an "instance"... I believe so, but I'd wait for confirmation on that.
You'd need a very large application to be bothered with thinking about clustering and the "fun" that comes with it, software- and hardware-wise, though. Unless you're looking to start or are already running something big, it wouldn't be anything to worry about.
Yes, it can be required for clustering. Typically, as the load goes up, you might find yourself with a frontend server that does URL rewriting, HTTPS if required, and caching with Squid, say. The requests get passed on to multiple backend servers - probably using cookies to associate a session with a particular backend if necessary. You might have the database on a separate server as well.
I should add that there are other reasons why you might need multiple servers; for instance, there may be a requirement that the database is not on the frontend server for security reasons.

Why do some websites require "www"? [duplicate]

Browsing the internet over the last few years, I've seen more and more pages getting rid of the 'www' subdomain.
Are there any good reasons to use or not to use the 'www' subdomain?
There are a ton of good reasons to include it, the best of which is here:
Yahoo Performance Best Practices
Due to the dot rule with cookies, if you don't have the 'www.' then you can't set two-dot cookies or cross-subdomain cookies a la *.example.com. There are two pertinent impacts.
First it means that any user you're giving cookies to will send those cookies back with requests that match the domain. So even if you have a subdomain, images.example.com, the example.com cookie will always be sent with requests to that domain. This creates overhead that wouldn't exist if you had made www.example.com the authoritative name. Of course you can use a CDN, but that depends on your resources.
Also, you then don't have the ability to set a cross-subdomain cookie. This seems evident, but this means allowing authenticated users to move between your subdomains is more of a technical challenge.
So ask yourself some questions. Do I set cookies? Do I care about potentially needless bandwidth expenditure? Will authenticated users be crossing subdomains? If you're really concerned with inconveniencing the user, you can always configure your server to take care of the www/no www thing automatically.
See dropwww and yes-www (saved).
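The bandwidth point is easy to see if you write the domain-match rule out. A minimal Python sketch of (a simplified form of) the RFC 6265 rule:

```python
def cookie_sent_to(cookie_domain: str, request_host: str) -> bool:
    """Simplified RFC 6265 domain-match: the cookie is sent when the request
    host equals the cookie's domain or is a subdomain of it."""
    cookie_domain = cookie_domain.lstrip(".")
    return request_host == cookie_domain or request_host.endswith("." + cookie_domain)

# A cookie scoped to the bare domain rides along to every subdomain:
print(cookie_sent_to("example.com", "images.example.com"))      # True
# A cookie scoped to www stays off the static-asset host:
print(cookie_sent_to("www.example.com", "images.example.com"))  # False
```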
Just after asking this question I came across the no-www page, which says:
...Succinctly, use of the www subdomain is redundant and time-consuming to communicate. The internet, media, and society are all better off without it.
Take it from a domainer: use both www.domainname.com and the plain domainname.com,
otherwise you are just throwing your traffic away to the browser's search engine (DNS error).
Actually, it is amazing how many domains out there, especially amongst the top 100, correctly resolve for www.domainname.com but not domainname.com.
There are MANY reasons to use the www sub-domain!
When writing a URL, it's easier to handwrite and type "www.stackoverflow.com" than "http://stackoverflow.com". Most text editors, email clients, word processors and WYSIWYG controls will automatically recognise both of the above and create hyperlinks. Typing just "stackoverflow.com" will not result in a hyperlink; after all, it's just a domain name. Who says there's a web service there? Who says the reference to that domain is a reference to its web service?
What would you rather write/type/say: "www." (4 chars) or "http://" (7 chars)?
"www." is an established shorthand way of unambiguously communicating the fact that the subject is a web address, not a URL for another network service.
When verbally communicating a web address, it should be clear from the context that it's a web address so saying "www" is redundant. Servers should be configured to return HTTP 301 (Moved Permanently) responses forwarding all requests for #.stackoverflow.com (the root of the domain) to the www subdomain.
In my experience, people who think WWW should be omitted tend to be people who don't understand the difference between the web and the internet and use the terms interchangeably, like they're synonymous. The web is just one of many network services.
If you want to get rid of www, why not change your HTTP server to use a different port as well? TCP port 80 is sooo yesterday... Let's change it to port 1234. YAY, now people have to say and type "http://stackoverflow.com:1234" (aitch tee tee pee colon slash slash stack overflow dot com colon one two three four), but at least we don't have to say "www", eh?
There are several reasons, here are some:
1) The person wanted it this way on purpose
People use DNS for many things, not only the web. They may need the main DNS name for some other service that is more important to them.
2) Misconfigured DNS servers
If someone does a lookup of www against your DNS server, your DNS server needs to be able to resolve it.
3) Misconfigured web servers
A web server can host many different web sites. It distinguishes which site you want via the Host header. You need to specify which host names you want to be used for your website.
4) Website optimization
It is better to not handle both, but to forward one with a moved permanently http status code. That way the 2 addresses won't compete for inbound link ranks.
5) Cookies
To avoid problems with cookies not being sent back by the browser. This can also be solved with the moved permanently http status code.
6) Client side browser caching
Web browsers may not cache an image if you make a request to www and another without. This can also be solved with the moved permanently http status code.
There is no huge advantage to including it or not including it, and no single objectively best strategy. "no-www.org" is a silly load of old dogma trying to present itself as definitive fact.
If the “big organisation that has many different services and doesn't want to have to dedicate the bare domain name to being a web server” scenario doesn't apply to you (and in reality it rarely does), which address you choose is a largely cultural matter. Are people where you are used to seeing a bare “example.org” domain written on advertising materials, would they immediately recognise it as a web address without the extra ‘www’ or ‘http://’? In Japan, for example, you would get funny looks for choosing the non-www version.
Whichever you choose, though, be consistent. Make both www and non-www versions accessible, but make one of them definitive, always link to that version, and make the other redirect to it (permanently, status code 301). Having both hostnames respond directly is bad for SEO, and serving any old hostname that resolves to your server leaves you open to DNS rebinding attacks.
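The permanent redirect itself is trivial to implement. A minimal sketch using only Python's standard library (the canonical hostname and port are illustrative; in practice you'd do this in your web server's configuration):

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

CANONICAL = "www.example.com"  # the hostname chosen as definitive

class CanonicalHostHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        host = (self.headers.get("Host") or "").split(":")[0]
        if host != CANONICAL:
            # Permanently redirect any other hostname to the canonical one.
            self.send_response(301)
            self.send_header("Location", f"https://{CANONICAL}{self.path}")
            self.end_headers()
        else:
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"canonical host\n")

if __name__ == "__main__":
    HTTPServer(("", 8080), CanonicalHostHandler).serve_forever()
```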
Apart from the load optimization regarding cookies, there is also a DNS-related reason for using the www subdomain: you can't point a CNAME at the naked domain. On yes-www.org (saved) it says:
When using a provider such as Heroku or Akamai to host your web site, the provider wants to be able to update DNS records in case it needs to redirect traffic from a failing server to a healthy server. This is set up using DNS CNAME records, and the naked domain cannot have a CNAME record. This is only an issue if your site gets large enough to require highly redundant hosting with such a service.
As jdangel points out, the www is good practice in some cookie situations, but I believe there is another reason to use www.
Isn't it our responsibility to care for and protect our users? As most people expect www, you will give them a less-than-perfect experience by not programming for it.
To me it seems a little arrogant not to set up a DNS entry just because in theory it's not required. There is no overhead in carrying the DNS entry, and through redirects etc. visitors can be sent on to the non-www address.
Seriously, don't lose valuable traffic by leaving your potential visitor with an unnecessary "site not found" error.
Additionally, in a Windows-only network you might be able to set up a Windows DNS server to avoid the following problem, but I don't think you can in a mixed environment of Mac and Windows: if a Mac does a DNS query against a Windows DNS server, mydomain.com will return all the available name servers, not the web server. So if you type mydomain.com in your browser, the browser will query a name server, not a web server; in this case you need a subdomain (e.g. www.mydomain.com) pointing to the specific web server.
Some sites require it because the service is configured, on that particular setup, to deliver web content via the www sub-domain only.
This is correct, as www is the conventional sub-domain for "World Wide Web" traffic,
just as port 80 is the standard port. Obviously there are other standard services and ports as well (HTTP over TCP/IP on port 80 is nothing special!).
Imagine mycompany...
mx1.mycompany.com 25 smtp, etc
ftp.mycompany.com 21 ftp
www.mycompany.com 80 http
Sites that don't require it basically have forwarding in DNS or redirection of some kind.
e.g.
*.mycompany.com 80 http
The only reason to do it, as far as I can see, is if you prefer it and you want to.
