Launching a website to be accessed globally - web-hosting

I have a website that could be visited by countries in different continentals. I noticed that most hosting companies have data centers in the US only, which might affect the performance when people from India, for example, are visiting the site. AWS and google own data centers all around the world, so would this be a better choice to solve the above-mentioned doubt? Are they using some technology that makes the website located in all datacenters ?
More about the website :
It is a dynamic website which depends heavily on the database. It mostly involves text. Few ajax code is there.
It is a Q & A website.

You would use some sort of load balancer.
Such as
AWS Elastic Load Balancing
Cloud Load Balancing

Cloud providers such as AWS has something called edge locations. When you deploy a website code, AWS will deploy the same code to edge locations around the world. When a user visits your website and the request reaches to AWS, AWS will redirect the requests to the edge location that is geographically closer to the user. So that the request will be served to the user faster.

I noticed that most hosting companies have data centers in the US only, which might affect the performance when people from India, for example, are visiting the site.
If your web site has purely or mostly static content, it usually won't matter (read about web caching), unless its traffic is large. As a typical example, I manage http://refpersys.org/ (physically hosted by OVH in France) and it is well visible from India: the latency is less than a few hundred milliseconds.
If your web site is extremely dynamic, it could matter (e.g. if every keystroke in a web browser started from India required an AJAX call to the US-located host).
Read much more about HTTP and perhaps TCP/IP. Don't confuse the World Wide Web with the Internet (which existed before the Web).
If performance really matters to you, you would set up some distributed and load balanced web service, by hosting on each continent. You might for instance use some distributed database technologies for the data (read about database replication), e.g. with PostGreSQL as your back-end database.
Of course, you can find web hosting in India.
And all that has some cost, mostly software development and deployment (network sysadmin skills are rare).
It is a Q & A website.
Then it is not that critical (assuming a small to medium traffic), and you can afford (at first) a single hosting located in a single place. I assume no harm is done if a given answer becomes visible worldwide only after several minutes.
Once your website is popular enough, you would have resources (human labor and computing hosting) to redesign it. AFAIK, StackOverflow started with a single web hosting and later improved to its current state. Design your website with some agile software development mindset: the data (that is past questions and answers typed by human users) is the most important asset, so make sure to design your database schema correctly, taking into account database normalization), and ensure that your data is backed-up correctly and often enough. And web technologies are evolving too (in 2021 the Web won't be exactly the same as today in December 2019, see e.g. this question).
If you wanted a world-wide fault-proof Q & A website, you could get a PhD in designing it well enough. Global distributed database consistency is still a research topic (see e.g. this research report).

Related

Is browser caching of images good enough to invalidate the need for server side storing?

I had an architecture question, and I had to rewrite the question title multiple times, since SO asked me to. So please feel free to correct it, if you feel so. I am not an expert in cache related things so I would very much appreciate some insights about my architecture related question.
So the situation is like this. We have a web based design app (frontend Javascript, backend PHP) which presents lots of clipart images to our customers who use that in creating online art work. Earlier, our app was loaded into an AWS machine and we used to have the clipart images also stored locally in the same server in order to not have any network transfer required to load the clipart and thus make the design app load time faster. The customer created designs were also saved into a backend MySQL server connected directly to the web based design app (in JSON and relational model).
A while before a new team joined to make a mobile version of this app, and they insisted that the cliparts should be loaded from a "central location" both for our web app, and for the mobile app they are creating. They also said that the design should also be stored into a "central database", accessible by the web and mobile apps (and there were some major re-architecting of the JSON structure as well)
So finally, the architecture changed such that, the cliparts now reside in a centralized location (S3 Server). And there is an "Asset Delivery and Storage (ADS) System" to which our design app makes requests for clipart images and gets served. (Please note that the cliparts repository is very large and only a subset of clipart images are served based on various parameters - such as the style of the design, account type of the customer etc). So this task is now done by the ADS system (written in python).
And since our web design app no longer has any local storage of cliparts nor logic of cliparts filtering (which got delegated to ADS, so no more server side PHP), it has also become a purely web based (front end Javasdcript) app without any server requirements and subsequently got moved to S3.
Now the real matter is that, our web app seems much more slower when initial loading, than when we had our on stash of cliparts stored in the server. I read that if an app requests for images, those images are cached in the browser and if the customer, for eg, loads the same order before that cache has expired then there is no repeat request that needs to be sent to the server (in this case ADS).
If that is true, is there any case I can really make to state that moving the clipart images from the design app server to the ADS system and having to send a request and load them every time a design is loaded has contributed in part to the recent slowness of the design app?
Also most times I hear the answer that "mobile app also does the same and is faster".I am not a mobile developer. Could there be some mobile cache tricks that help the mobile app to be much more "cache-efficient" than the purely web based design app, such that even though the architecture is same for both (sending request ADS for cliparts), the mobile app does it in a better and more efficient manner?
End note: I realise I am not asking a specific programming question. But from some of the notes I have read here, SO is a community for programmers, and I do not know of any other community that so well answers programming related questions. The architecture question I have is a genuine programming related question I face at work and sadly I am not skilled enough to understand if all the recent architectural changes there has any drawbacks that is causing our web app performance to degrade noticeably.
Thanks for reading, and I would really appreciate any pointers or even links to reading for better understanding this.
In chrome, open up the developer tools, and click on the network tab. 90% of the time you can identify the slow resource from there.

how to make website, use distributed file system- hadoop for data management

I am naive in big data technology, and have curiosity to relate it with the conventional application development.
The conventional way to develop any web application is to have a hosting server (or application server) and a database to manage the data.
But lets say, I have a huge data set which is generated by the website, (i.e. GBs per second), then the website will fall into the category of managing big data.
lets suppose, I have a cluster of 20 computers, with 200GB of hard drives and core i3 processor. So now I will have enough processing and storage power for the website. (of-course hadoop is scalable too, if I need more resources).
how to setup application server, to host the website in this cluster ?
will I need load balancers for my application server since there is higher velocity, of http request to the application server?
can anyone please guide !
thanks in advance.
EDIT:
I just wanted to take an overview idea of how web application development takes place in relation with big data. Let's imagine Facebook. It is basically a web application. How application servers and database management is done, for Facebook is my curiosity.
As it is a fact that such a big company like Facebook, will have to use distributed system. E.g. hadoop clusters. And my question relates with the same concept. But Facebook has huge clusters, and to understand the way it has been implemented is tough, in my question I mentioned cluster of 20 computers. If someone has experience in setting up the hadoop clusters for web application hosting, then I would request to share the knowledge
I don't know much about Hadoop, but if I were going to make a web site I would use Visual Studio.
https://msdn.microsoft.com/en-us/library/k4cbh4dh.aspx?f=255&MSPPError=-2147217396
https://www.youtube.com/watch?v=GIRmPB0xshw
Visual Studio Express is free and very easy to use.

What's the speediest web hosting choices out there that are scalable to large traffic spikes and can handle fast page loads?

Is cloud hosting the way to go? Or is there something better that delivers fast page loads?
The reason I ask is because I run a buddypress site on a bluehost dedicated server, but it seems to run slow at most times of the day. This scares me because at the moment the sites not live and I'm afraid when it gets traffic it'll become worse and my visitors will lose interest. I use Amazon Cloud to handle all my media, JS, and CSS files along with a catching plugin, but it still loads slow at times.
I feel like the problem is Bluehost, because I visit other sites running buddypress and their sites seem to load instantly. Im not web hosting savvy so can someone please point me in the right direction here?
The hosting choice depends on many factors such as technical requirements, growth rates, burst rates, budgets and more.
Bigger Hardware
To scale up hosting operation, your first choice is often just using a more powerful server, VPS, or cloud instance. The point is not so much cloud vs. dedicated but that you simply bring more compute power to the problem. Cloud can make scaling up easier - often with a few clicks.
Division of Labor
The next step often is division of labor. You offload database, static content, caching or other items to specific servers or services. For example, you could offload static content to a CDN. You could a dedicated database.
Once again, cloud vs non-cloud is not the issue. The point is to bring more resources to your hosting problems.
Pick the Right Application Stack
I cannot stress enough picking the right underlying technology for your needs. For example, I've recently helped a client switch from a Apache/PHP stack to a Varnish/Nginx/PHP-FPM stack for a very business Wordpress operation (>100 million page views/mo). This change boosted capacity by nearly 5X with modest hardware changes.
Same App. Different Story
Also just because you are using a specific application, it does not mean the same hosting setup will work for you. I don't know about the specific app you are using but with Drupal, Wordpress, Joomla, Vbulletin and others, the plugins, site design, themes and other items are critical to overall performance.
To complicate matter, user behavior is something to consider as well. Consider a discussion form that has a 95:1 read:post ratio. What if you do something in the design to encourage more posts and that ratio moves to 75:1. That means more database writes, less caching, etc.
In short, details matter, so get a good understanding of your application before you start to scale out hosting.
A hosting service is part of the solution. Another part is proper server configuration.
For instance this guy has optimized his setup to serve 10 million requests in a day off a micro-instance on AWS.
I think you should look at your server config first, then shop for other hosts. If you can't control server configuration, try AWS, Rackspace or other cloud services.
just an FYI: You can sign up for AWS and use a micro instance free for one year. The link I posted - he just optimized on the same server. You might have to upgrade to a small server because Amazon has stated that micro is only to handle spikes and sustained traffic.
Good luck.

Launching an online app: do I use CDN, Amazon Services or a dedicated server?

My web application requires as little lag as possible. I have tried hosting it on a dedicated server, but users on the other side of world have complained about latency issues.
So I am considering using CDN or Amazon services.... would either help resolve this?
The application uses a lot of AJAX, so latency can be an issue.
Amazon's Cloudfront, part of the Amazon Web Services (AWS) that you can purchase, is a CDN (Content Delivery Network) -- so asking whether to use Amazon or "a CDN" strikes me as a weird question, akin to asking whether you should drink Coke or "a soda" (given that Coke is "a soda"). Rather you should ask "should I use Amazon or another CDN?" just like you'd ask "should I drink Coke or another soda?".
Your decision among CDNs must be based on many parameters - cost, reliability, convenience, speed, and so forth. Unfortunately I have no first-hand experience of CloudFront; however, on paper, it seems particularly simple to use (especially if you're already using other AWS components, since getting data e.g. from S3 to CloudFront is fast and cheap indeed;-), and reasonably priced (based on usage). But I have no experience about its uptime record or delivery speed.
A content delivery network is a great idea to speed up delivery of static content (images, javascript, etc). You could even use this in combination with a dedicated server if you want.
You may also consider using a tool such as YSlow to analyze what may be causing your latency issues.
A CDN will only improve the performance of your static content -- if your Ajax code requires active content, then it won't help for that.
Amazon AWS might help, but it depends on the details of your application. Amazon isn't particularly well-known for delivering a low-latency solution.
Most apps that require low latency end up addressing the issue from many directions. A combination of a CDN and dedicated servers is certainly one approach. One key there is choosing the right data center for your servers (a low-latency hub).
In case it might help, I wrote a book about this subject: Ultra-Fast ASP.NET, which includes a discussion of client-side issues, hardware infrastructure, CDNs, caching, and many other issues that can impact latency.

Best scaling methodologies for a highly traffic web application?

We have a new project for a web app that will display banners ads on websites (as a network) and our estimate is for it to handle 20 to 40 billion impressions a month.
Our current language is in ASP...but are moving to PHP. Does PHP 5 has its limit with scaling web application? Or, should I have our team invest in picking up JSP?
Or, is it a matter of the app server and/or DB? We plan to use Oracle 10g as the database.
No offense, but I strongly suspect you're vastly overestimating how many impressions you'll serve.
That said:
PHP or other languages used in the application tier really have little to do with scalability. Since the application tier delegates it's state to the database or equivalent, it's straightforward to add as much capacity as you need behind appropriate load balancing. Choice of language does influence per server efficiency and hence costs, but that's different than scalability.
It's scaling the state/data storage that gets more complicated.
For your app, you have three basic jobs:
what ad do we show?
serving the add
logging the impression
Each of these will require thought and likely different tools.
The second, serving the add, is most simple: use a CDN. If you actually serve the volume you claim, you should be able to negotiate favorable rates.
Deciding which ad to show is going to be very specific to your network. It may be as simple as reading a few rows from a database that give ad placements for a given property for a given calendar period. Or it may be complex contextual advertising like google. Assuming it's more the former, and that the database of placements is small, then this is the simple task of scaling database reads. You can use replication trees or alternately a caching layer like memcached.
The last will ultimately be the most difficult: how to scale the writes. A common approach would be to still use databases, but to adopt a sharding scaling strategy. More exotic options might be to use a key/value store supporting counter instructions, such as Redis, or a scalable OLAP database such as Vertica.
All of the above assumes that you're able to secure data center space and network provisioning capable of serving this load, which is not trivial at the numbers you're talking.
You do realize that 40 billion per month is roughly 15,500 per second, right?
Scaling isn't going to be your problem - infrastructure period is going to be your problem. No matter what technology stack you choose, you are going to need an enormous amount of hardware - as others have said in the form of a farm or cloud.
This question (and the entire subject) is a bit subjective. You can write a dog slow program in any language, and host it on anything.
I think your best bet is to see how your current implementation works under load. Maybe just a few tweaks will make things work for you - but changing your underlying framework seems a bit much.
That being said - your infrastructure team will also have to be involved as it seems you have some serious load requirements.
Good luck!
I think that it is not matter of language, but it can be be a matter of database speed as CPU processing speed. Have you considered a web farm? In this way you can have more than one machine serving your application. There are some ways to implement this solution. You can start with two server and add more server as the app request more processing volume.
In other point, Oracle 10g is a very good database server, in my humble opinion you only need a stand alone Oracle server to commit the volume of request. Remember that a SQL server is faster as the people request more or less the same things each time and it happens in web application if you plan your database schema carefully.
You also have to check all the Ad Server application solutions and there are a very good ones, just try Google with "Open Source AD servers".
PHP will be capable of serving your needs. However, as others have said, your first limits will be your network infrastructure.
But your second limits will be writing scalable code. You will need good abstraction and isolation so that resources can easily be added at any level. Things like a fast data-object mapper, multiple data caching mechanisms, separate configuration files, and so on.

Resources