Launching an online app: do I use CDN, Amazon Services or a dedicated server? - performance

My web application requires as little lag as possible. I have tried hosting it on a dedicated server, but users on the other side of world have complained about latency issues.
So I am considering using CDN or Amazon services.... would either help resolve this?
The application uses a lot of AJAX, so latency can be an issue.

Amazon's Cloudfront, part of the Amazon Web Services (AWS) that you can purchase, is a CDN (Content Delivery Network) -- so asking whether to use Amazon or "a CDN" strikes me as a weird question, akin to asking whether you should drink Coke or "a soda" (given that Coke is "a soda"). Rather you should ask "should I use Amazon or another CDN?" just like you'd ask "should I drink Coke or another soda?".
Your decision among CDNs must be based on many parameters - cost, reliability, convenience, speed, and so forth. Unfortunately I have no first-hand experience of CloudFront; however, on paper, it seems particularly simple to use (especially if you're already using other AWS components, since getting data e.g. from S3 to CloudFront is fast and cheap indeed;-), and reasonably priced (based on usage). But I have no experience about its uptime record or delivery speed.

A content delivery network is a great idea to speed up delivery of static content (images, javascript, etc). You could even use this in combination with a dedicated server if you want.
You may also consider using a tool such as YSlow to analyze what may be causing your latency issues.

A CDN will only improve the performance of your static content -- if your Ajax code requires active content, then it won't help for that.
Amazon AWS might help, but it depends on the details of your application. Amazon isn't particularly well-known for delivering a low-latency solution.
Most apps that require low latency end up addressing the issue from many directions. A combination of a CDN and dedicated servers is certainly one approach. One key there is choosing the right data center for your servers (a low-latency hub).
In case it might help, I wrote a book about this subject: Ultra-Fast ASP.NET, which includes a discussion of client-side issues, hardware infrastructure, CDNs, caching, and many other issues that can impact latency.

Related

Launching a website to be accessed globally

I have a website that could be visited by countries in different continentals. I noticed that most hosting companies have data centers in the US only, which might affect the performance when people from India, for example, are visiting the site. AWS and google own data centers all around the world, so would this be a better choice to solve the above-mentioned doubt? Are they using some technology that makes the website located in all datacenters ?
More about the website :
It is a dynamic website which depends heavily on the database. It mostly involves text. Few ajax code is there.
It is a Q & A website.
You would use some sort of load balancer.
Such as
AWS Elastic Load Balancing
Cloud Load Balancing
Cloud providers such as AWS has something called edge locations. When you deploy a website code, AWS will deploy the same code to edge locations around the world. When a user visits your website and the request reaches to AWS, AWS will redirect the requests to the edge location that is geographically closer to the user. So that the request will be served to the user faster.
I noticed that most hosting companies have data centers in the US only, which might affect the performance when people from India, for example, are visiting the site.
If your web site has purely or mostly static content, it usually won't matter (read about web caching), unless its traffic is large. As a typical example, I manage http://refpersys.org/ (physically hosted by OVH in France) and it is well visible from India: the latency is less than a few hundred milliseconds.
If your web site is extremely dynamic, it could matter (e.g. if every keystroke in a web browser started from India required an AJAX call to the US-located host).
Read much more about HTTP and perhaps TCP/IP. Don't confuse the World Wide Web with the Internet (which existed before the Web).
If performance really matters to you, you would set up some distributed and load balanced web service, by hosting on each continent. You might for instance use some distributed database technologies for the data (read about database replication), e.g. with PostGreSQL as your back-end database.
Of course, you can find web hosting in India.
And all that has some cost, mostly software development and deployment (network sysadmin skills are rare).
It is a Q & A website.
Then it is not that critical (assuming a small to medium traffic), and you can afford (at first) a single hosting located in a single place. I assume no harm is done if a given answer becomes visible worldwide only after several minutes.
Once your website is popular enough, you would have resources (human labor and computing hosting) to redesign it. AFAIK, StackOverflow started with a single web hosting and later improved to its current state. Design your website with some agile software development mindset: the data (that is past questions and answers typed by human users) is the most important asset, so make sure to design your database schema correctly, taking into account database normalization), and ensure that your data is backed-up correctly and often enough. And web technologies are evolving too (in 2021 the Web won't be exactly the same as today in December 2019, see e.g. this question).
If you wanted a world-wide fault-proof Q & A website, you could get a PhD in designing it well enough. Global distributed database consistency is still a research topic (see e.g. this research report).

CDN vs Homegrown Caching

My understanding of a CDN (like Akamai or Limelight) is that they are heavy-duty caching services.
I also understand that they are very expensive. So I'm wondering why I can't just create my own cluster of replicated caching servers (using, say, EhCache or Memcached) for all my web app's caching needs (images, URL hits/responses, Javascripts, etc) and basically get the same thing?
In essence, from a developer's perspective, what are the benefits (technical or otherwise) of paying a CDN vs. just using your own caching solution? Or, if I have completely misunderstood what CDNs are, please correct my understanding! Thanks in advance!
The key to this is the N in CDN, Content Delivery Network.
The advantage of a CDN (a good one, at least) is that it's geographically distributed. What the big CDN providers do (the ones you mentioned along with others) is have hundreds or thousands of servers in facilities all over the world. This lets what ever resource the CDN is serving sit as geographically close as possible to the end user no matter where they are in the world.
You can certainly replicate the functionality. At a very basic level most CDN's are simply an object/value store which plenty of software can do - what you can't do, at the scale these guys do it at any rate, is have servers all over the world to serve and replicate these objects.
If all you're interested in is an efficient way to store and serve static files then an object store coupled with something like nginx would do just fine. A CDN is used to make sites load as fast as they possibly can, but they come at a price.
For the record, Amazon's Cloud Front, while not as distributed as Limelight or Akamai is a lot cheaper and a good middle ground compromise on cost vs service.
The advantage of big CDNs is that they own distributed resources all over the world, allowing them to serve the users from a servers near them. Besides caching, this is the major point that makes them fast.
To build your own CDN, you would have to install servers on multiple continents, arrange good Internet connectivity for them, setup caching and make sure you have the servers all synchronized. It's not impossible to do that, but it will be very cost-intensive and might not be profitable.
With all due respect, I disagree with the other two answers given thus far to this question. Just to be clear, I'm not saying they are wrong, I'm just offering a different perspective.
While it is certainly true that the key to CDN performance is the "N," as Bulk eloquently explained, that doesn't mean you can't build your own CDN. The question is whether or not it's worth the time (and by definition, the money).
We live in a world of cheap servers and even cheaper virtual machines. Sure the big CDN networks have thousands/millions of servers all over the world, but that's because they have thousands/millions of sites to serve. Depending on the size of your site/app, all you really need in terms of resources is as much as your site(s) need. If you're small, the minimum might be a VPS on each US coasts, one in Europe, maybe two in Asia, and one in Australia. Sure, the hardware costs are too high for your typical homepage, but they are certainly not extreme, and if you're looking at CDNs in the first place, they are probably within your budget.
To me, commercial CDN services just provide PaaS convenience, but there is nothing preventing you from getting IaaS and building up your own platform.
One more thing on this topic:
I once read a comment either by David Heinemeier Hansson (the creator of Ruby on Rails) or by someone referring to him that went something along the lines of: The owner(s) of 37 Signals were concerned DHH was using Ruby to build their application. At that point Ruby was still very obscure. Almost all web hosts were offering PHP, Perl, and Microsoft technologies. When asked about the fact that there were only a handful of Ruby hosts in the world, DHH asked, "well how many do you need?"
To me the point is you need to look at what's best for your needs and the needs of your application, and not necessarily what the guys with millions of servers think.

What's the speediest web hosting choices out there that are scalable to large traffic spikes and can handle fast page loads?

Is cloud hosting the way to go? Or is there something better that delivers fast page loads?
The reason I ask is because I run a buddypress site on a bluehost dedicated server, but it seems to run slow at most times of the day. This scares me because at the moment the sites not live and I'm afraid when it gets traffic it'll become worse and my visitors will lose interest. I use Amazon Cloud to handle all my media, JS, and CSS files along with a catching plugin, but it still loads slow at times.
I feel like the problem is Bluehost, because I visit other sites running buddypress and their sites seem to load instantly. Im not web hosting savvy so can someone please point me in the right direction here?
The hosting choice depends on many factors such as technical requirements, growth rates, burst rates, budgets and more.
Bigger Hardware
To scale up hosting operation, your first choice is often just using a more powerful server, VPS, or cloud instance. The point is not so much cloud vs. dedicated but that you simply bring more compute power to the problem. Cloud can make scaling up easier - often with a few clicks.
Division of Labor
The next step often is division of labor. You offload database, static content, caching or other items to specific servers or services. For example, you could offload static content to a CDN. You could a dedicated database.
Once again, cloud vs non-cloud is not the issue. The point is to bring more resources to your hosting problems.
Pick the Right Application Stack
I cannot stress enough picking the right underlying technology for your needs. For example, I've recently helped a client switch from a Apache/PHP stack to a Varnish/Nginx/PHP-FPM stack for a very business Wordpress operation (>100 million page views/mo). This change boosted capacity by nearly 5X with modest hardware changes.
Same App. Different Story
Also just because you are using a specific application, it does not mean the same hosting setup will work for you. I don't know about the specific app you are using but with Drupal, Wordpress, Joomla, Vbulletin and others, the plugins, site design, themes and other items are critical to overall performance.
To complicate matter, user behavior is something to consider as well. Consider a discussion form that has a 95:1 read:post ratio. What if you do something in the design to encourage more posts and that ratio moves to 75:1. That means more database writes, less caching, etc.
In short, details matter, so get a good understanding of your application before you start to scale out hosting.
A hosting service is part of the solution. Another part is proper server configuration.
For instance this guy has optimized his setup to serve 10 million requests in a day off a micro-instance on AWS.
I think you should look at your server config first, then shop for other hosts. If you can't control server configuration, try AWS, Rackspace or other cloud services.
just an FYI: You can sign up for AWS and use a micro instance free for one year. The link I posted - he just optimized on the same server. You might have to upgrade to a small server because Amazon has stated that micro is only to handle spikes and sustained traffic.
Good luck.

Why would you not want to use Cloud Computing [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 10 years ago.
Our company is considering moving from hosting our own servers to EC2 and I was wondering if this was a good idea.
I have seen a lot of stuff about can cloud computing (and specifically EC2) do x, or can it do y, but my real question is why would you NOT want to use it?
If you were setting up a business, what are the reasons (outside of cost) that you would choose to go through the trouble of managing your own servers?
I know there are a lot of cost calculations you can put in regarding bandwidth, disk usage etc, but there are of course, other costs regarding maintenance of your own server. For the sake of this discussion I am willing to consider the costs roughly equal.
I seem to remember that Joel Spolsky wrote a little blur on this at one time, but I was unable to find it.
Anyone have any reasons?
Thanks!
I can think of several reasons why not use EC2 (and I am talking about EC2, not grid comp in general):
Reliability: Amazon makes no guarantee as to the availability / down time / safety of EC2
Security: Amazon does not makes any guarantee as to whom it will disclose your data
Persistence: ensuring persistence of your data (that includes, effort to set up the system) is complicated over EC2
Management: there are very few integrated management tools for a cloud deployed on EC2
Network: the virtual network that allows EC2 instances to communicates has some quite painful limitations (latency, no multicast, arbitrary topological location)
And to finish that:
Cost: on the long run, if you are not using EC2 to absorb peak traffic, it is going to be much more costly than investing into your own servers (cheapo servers like Supermicro cost just a couple of hundred bucks...)
On the other side, I still think EC2 is a great way to soak up non-sensitive peak traffic, if your architecture allows it.
Some questions to ask:
What is the expected uptime, and how does downtime affect your business? What sort of service level agreement can you get, what are the penalties for missing it, and how confident are you that the SLA uptime goals will be met? (They may be better or worse at keeping the systems up than you are.)
How sensitive is the data you're proposing to put into the cloud? Again, we get into the questions of how secure the provider promises to be, what the contractual penalties and indemnities are, and how confident you are that the provider will live up to the agreement. Further, there may be external requirements. If you deal with health-related data in the US, you are subject to very strict requirements. If you deal with credit card data, you also have responsibilities (contractual, not legal).
How easy will it be to back out of the arrangement, should service not be what was expected, or if you find a better deal elsewhere? This includes not only getting your data back, but also some version of the applications you've been using. Consider the possibilities of your provider going bankrupt (Amazon isn't going to go bankrupt any time soon, but they could split off a cloud provider which could then go bankrupt), or having an internal reorganization. Bear in mind that a company in serious trouble may not be able to live up to your expectations of service.
How much independence are you going to have? Are you going to be running their software or software you pick? How easy will it be to reconfigure?
What is the pricing scheme? Is it possible for the bills to hit unacceptable levels without adequate warning?
What is the disaster plan? Ideally, it's running your software on servers in a different location from where the disaster hit.
What does your legal department (or retained corporate attorney) think of the contract? Is there a dispute resolution mechanism, and, if so, is it fair to you?
Finally, what do you expect to get out of moving to the cloud? What are you willing to pay? What can you compromise on, and what do you need?
Highly sensitive data might be better to control yourself. And there's legislation; some privacy sensitive information, for example, might not leave the the country.
Also, except for Microsoft Azure in combination with SDS, the data stores tend to be not relational, which is a nuisance in certain cases.
Maybe concern that that big a company will more likely be approached by an Agent Smith from the government to spy on everyone that a little small provider somewhere.
Big company - more customers - more data to aggregate and recognize patterns - more resources to organize a sophisticated watch system.
Maybe it's more of a fantasy but who ever knows?
If you don't have a paranoia it doesn't mean yet that you are not being watched.
The big one is: if Amazon goes down, there's nothing you can do to bring it back up.
I'm not talking about doomsday scenarios where the company disappears. I mean that you're at the mercy of their downtime, with little recourse of your own.
Security -- you don't know what is being done to your data
Dependency -- your business is now directly intertwined with the provider
There are different kinds of cloud computing with lots of different vendors providing it. It would make me nervous to code my apps to work with a single cloud vendor. that you specifically had to code for..amazon and Microsoft I believe you need to specifically code for that platform - maybe google too.
That said, I recently jettisoned my own dedicated servers and moved to Rackspaces Mosso Cloud platform (which have no proprietary coding necessary) and I am really, really pleased with it so far. Cut my costs in half, and performance is way better than before. My sql server databases are now running on 64Bit enterprise SQL server versions with 32G of ram - that would have cost me a fortune on my previous providers infrastructure.
As far as being out of luck when the cloud is down, that was true if my dedicated server went down - it never did, but if there was a hardware crash on my dedicated server, I am not sure it would be back on-line any quicker than rackspace could bring their cloud back up.
Lack of control.
Putting your software on someone else's cloud represents handing over some control. They might institute a file upload size limit, or memory limits which could ruin your application. A security vulnerbility in their control panel could get your site hacked.
Security issues are not relevant if your application does its own encryption. Amazon is then storing encrypted data that they have no way of decrypting.
But in addition to the uptime issues, Amazon could decide to increase their prices to whatever they want. If you're dependent on them, you'll just have to pay it.
Depends how much you trust your own infrastructure in comparison to a 3rd party cloud service. In my opinion, most businesses (at least not IT related) should choose the later.
Another thing you lose with the cloud is the ability to choose exactly what operating system you want to run. For example, the latest Fedora Linux kernel available on EC2 is FC8, and the latest Windows version is Server 2003.
Besides the issues raised regarding dependability, reliability, and cost is the issue of data ownership. When you locate data on someone else's server, you no longer control who views, accesses, modifies, or uses that data. While the cloud operators can limit your access, you possess no way of limiting theirs or limiting who they give access to. Yes, you can encrypt all the data on the server but you lack any way of knowing who possesses root access to the server itself and any means to stop others from downloading your encrypted data and cracking it open. You lose control over your data; depending on what type of apps you are running and the proprietary nature of the data involved, this could engender corporate security and/or liability risks.
The other factor to consider is what would happen to your company if Amazon and/or EC2 were to suddenly vanish overnight. While a seemingly preposterous position, it could happen. Would you be able to quickly fill the hole and restore service, or would your potentially revenue generating apps languish while the IT staff scramble to obtain servers and bandwidth to get them back online? Also, what would happen to your data? The cloud hard drive holding all your information still exists, somewhere, and could pose a potential liability risk depending on the information you stored there--items such as personal information, business transaction records etc.
If I was starting my own business now, I would go through the hassle of purchasing and maintaining my own severs so I retained data ownership. I could control root access to the hardware, as well as control who can access and modify the data.
Unanswered security questions.
Really, do you want your IP out there, where you're not the one in control of it?
Most cloud computing environment are at least partially vendor specific. There's no good way to move stuff from one cloud to another without having to do a lot of rewriting. That sort of lock-in puts you at the mercy of one vendor when it comes to downtime, price increases, etc. If you rent or own your own servers, hosting providers and colos are pretty much interchangeable. You always have the option of moving somewhere else.
This may change in the future, as these things become standardized, but for now tying yourself to the cloud means tying yourself to a specific vendor.
This is kind of like the "Why would you use Linux" comment I received from management many years ago. The response I got was that it is a solution in search of a problem.
So what are your goals and objectives in moving to EC2?
I'd be interested to know if you'd still want to move to a cloud, if it was your own.
Cloud computing has brought parallel programming a little closer to the masses, but you still have to understand how best to use it - otherwise you're going to waste compute cycles and bandwidth.
Re-architecting your application for most efficient use of a cloud computing service is non-trivial.
Besides what has already been said here, we have to consider uniformity across the business. Are all of you applications going to be hosted in the cloud, or only most? Is most enough to pull the trigger on using the cloud when you still have to have personnel to handle a few special servers?
In particular, there might be special hardware that you need to communicate with such modems to accept incoming data, or voice cards that make automated phone calls. I don't know how such things could be handled in a cloud environment.

What are the reasons for a "simple" website not to choose Cloud Based Hosting?

I have been doing some catching up lately by reading about cloud hosting.
For a client that has about the same characteristics as StackOverflow (Windows stack, same amount of visitors), I need to set up a hosting environment. Stackoverflow went from renting to buying.
The question is why didn't they choose cloud hosting?
Since Stackoverflow doesn't use any weird stuff that needs to run on a dedicated server and supposedly cloud hosting is 'the' solution, why not use it?
By getting answers to this question I hope to be able to make a weighted decision myself.
I honestly do not know why SO runs like it does, on privately owned servers.
However, I can assume why a website would prefer this:
Maintainability - when things DO go wrong, you want to be hands-on on the problem, and solve it as quickly as possible, without needing to count on some third-party. Of course the downside is that you need to be available 24/7 to handle these problems.
Scalability - Cloud hosting (or any external hosting, for that matter) is very convenient for a small to medium-sized site. And most of the hosting providers today do give you the option to start small (shared hosting for example) and grow to private servers/VPN/etc... But if you truly believe you will need that extra growth space, you might want to count only on your own infrastructure.
Full Control - with your own servers, you are never bound to any restrictions or limitations a hosting service might impose on you. Run whatever you want, hog your CPU or your RAM, whatever. It's your server. Many hosting providers do not give you this freedom (unless you pay up, of course :) )
Again, this is a cost-effectiveness issue, and each business will handle it differently.
I think this might be a big reason why:
Cloud databases are typically more
limited in functionality than their
local counterparts. App Engine returns
up to 1000 results. SimpleDB times out
within 5 seconds. Joining records from
two tables in a single query breaks
databases optimized for scale. App
Engine offers specialized storage and
query types such as geographical
coordinates.
The database layer of a cloud instance
can be abstracted as a separate
best-of-breed layer within a cloud
stack but developers are most likely
to use the local solution for both its
speed and simplicity.
From Niall Kennedy
Obviously I cannot say for StackOverflow, but I have a few clients that went the "cloud hosting" route. All of which are now frantically trying to get off of the cloud.
In a lot of cases, it just isn't 100% there yet. Limitations in user tracking (passing of requestor's IP address), fluctuating performance due to other load on the cloud, and unknown usage number are just a few of the issues that have came up.
From what I've seen (and this is just based on reading various blogged stories) most of the time the dollar-costs of cloud hosting just don't work out, especially given a little bit of planning or analysis. It's only really valuable for somebody who expects highly fluctuating traffic which defies prediction, or seasonal bursts. I guess in it's infancy it's just not quite competitive enough.
IIRC Jeff and Joel said (in one of the podcasts) that they did actually run the numbers and it didn't work out cloud-favouring.
I think Jeff said in one of the Podcasts that he wanted to learn a lot of things about hosting, and generally has fun doing it. Some headaches aside (see the SO blog), I think it's a great learning experience.
Cloud computing definitely has it's advantages as many of the other answers have noted, but sometimes you just want to be able to control every bit of your server.
I looked into it once for quite a small site. Running a small Amazon instance for a year would cost around £700 + bandwidth costs + S3 storage costs. VPS hosting with similar specs and a decent bandwidth allowance chucked in is around £500. So I think cost has a lot to do with it unless you are going to have fluctuating traffic and lots of it!
I'm sure someone from SO will answer it but "Isn't just more hassle"? Old school hosting is still cheap and unless you got big scalability problems why would you do cloud hosting?

Resources