Is memcache(d) necessary when using Cloudflare/Incapsula?

If you need caching on your website to reduce database load, do you have to implement it with memcache or memcached (in PHP, for example), or can you achieve the same thing with professional services like CloudFlare or Incapsula that do some caching for you?

Services like Cloudflare cache your HTML and/or assets like images and CSS files in a CDN, so that your entire server is hit less often. This is great for semi-static sites but may not be the best fit for highly dynamic sites.
Local caches like memcached just store any data in a way that's fast to access. You can use that to cache database queries and lower your database activity, but you can also use it to store pre-computed data that would be expensive to re-create all the time or whatever else you may want to store non-permanently in a fast-to-access way.
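To make the memcached side concrete, here's a minimal read-through sketch in PHP using the Memcached extension; the server address, cache key, DSN and query are illustrative assumptions, not anything from the question:

<?php
// Minimal read-through cache: try memcached first, fall back to the
// database and store the result for next time.
$cache = new Memcached();
$cache->addServer('127.0.0.1', 11211); // assumed local memcached instance

$key  = 'articles:front_page';         // hypothetical cache key
$rows = $cache->get($key);

if ($rows === false && $cache->getResultCode() === Memcached::RES_NOTFOUND) {
    // Cache miss: hit the database once, then cache the result.
    $pdo  = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass'); // assumed DSN
    $rows = $pdo->query('SELECT id, title FROM articles LIMIT 10')
                ->fetchAll(PDO::FETCH_ASSOC);
    $cache->set($key, $rows, 300);      // keep for five minutes
}

Every request served from the cache is one less query against the database.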
Both solutions solve different problems. You may use both together, or either, or neither. It really depends on where exactly your bottleneck is and which solution fits your problem better.

I'm the CEO of CloudFlare and I'd say: more (intelligent) caching is almost always a good thing. While we can significantly decrease the load coming to your web server, to get the best performance it's still extremely important to optimize your web application and its interaction with your database. To that end, memcache and other fast caching layers can play an important role and I'd never discourage them.
PS - we work great with dynamic sites. 95%+ of our sites are highly dynamic web applications.

Related

Is it better to use Cache or CDN?

I was studying browser performance when loading static files, and this question came up.
Some people say that serving static files from a CDN (e.g. Google Code, the latest jQuery, the AJAX CDN, ...) is better for performance, because the requests go to a different domain than the rest of the page.
Another way to improve performance is to set the Expires header to some months in the future, forcing the browser to cache the static files and cutting down on requests.
I'm wondering which approach is best for performance, and whether I can combine both.
Ultimately it is better to employ both techniques if you are doing web performance optimization (WPO) of a site, also known as front-end optimization (FEO). They can work amazingly well hand in hand. Although if I had to pick one over the other, I'd definitely pick caching any day. In fact, I'd say it's imperative that you set up proper resource caching for all web projects, even if you are going to use a CDN.
Caching
Setting Expires headers and caching of resources is a must and should be done 100% of the time for your resources. There really is no excuse for not doing caching. On Apache this is super easy to configure after enabling mod_expires.c and mod_headers.c. The HTML5 Boilerplate project has a good implementation example in its .htaccess file, and if your server is something else like nginx, lighttpd or IIS, check out these other server configs.
Here's a good read if anyone is interested in learning about caching: Mark Nottingham's Caching Tutorial
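If some assets are served through a PHP script rather than directly by Apache, the same far-future headers can be sent from code; a minimal sketch (the one-year lifetime is just an illustrative choice):

<?php
// Send far-future caching headers for a static resource served via PHP.
$ttl = 365 * 24 * 60 * 60; // one year, in seconds

header('Cache-Control: public, max-age=' . $ttl);
header('Expires: ' . gmdate('D, d M Y H:i:s', time() + $ttl) . ' GMT');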
Content Delivery Network
You mentioned Google Code, the latest jQuery and the AJAX CDN, and I want to touch on CDNs in general, including those you pay for to host your own resources, but the same applies if you are simply using the hosted jQuery files or loading something from http://cdnjs.com/, for example.
I would say a CDN is less important than setting server-side cache headers, but it can provide significant performance gains; actual performance will vary depending on the provider.
This is especially true if your audience is worldwide and the CDN provider has many worldwide edge/peer locations. It will also reduce your web hosting bandwidth significantly, and CPU usage a bit, since you're offloading some of the delivery work to the CDN.
A CDN can, in some rarer cases, have a negative impact on performance, if the latency of the CDN ends up being higher than that of your server. Also, if you over-optimize and employ too much parallelization of resources (using multiple subdomains like cdn1, cdn2, cdn3, etc.), it is possible to slow down the user experience and cause overhead with extra DNS lookups. A good balance is needed here.
One other negative impact that can happen is if the CDN is down. It has happened, and will happen again. This is especially true with free CDNs. If the CDN goes down, for whatever reason, so does your site. It is yet another potential single point of failure (SPOF). For JavaScript resources you can get clever: load the resource from the CDN and, should that fail for whatever reason, detect it and load a local copy. Here's an example of loading jQuery from ajax.googleapis.com with a fallback (taken from the HTML5 Boilerplate):
<script src="//ajax.googleapis.com/ajax/libs/jquery/1.8.2/jquery.min.js"></script>
<script>window.jQuery || document.write('<script src="js/vendor/jquery-1.8.2.min.js"><\/script>')</script>
Besides the obvious free resources out there (jQuery, Google APIs, etc.), if you're using a CDN you may have to pay a fee for usage, so it is going to add to hosting costs. With some CDNs you even have to pay extra to get access to certain locations; for example, Asian nodes might cost more than North American ones.
For public applications, go for CDN.
Caching helps for repeated requests, but not for the first request.
To ensure a fast load on the first page visit, use a CDN; chances are pretty good that the browser already has the file cached from another site.
As others have mentioned, CDN results are of course heavily cached too.
However if you have an intranet website you might want to host the files yourself as they typically load faster from an internal source than from a CDN.
You then also have the option to combine several files into one to reduce the number of requests.
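For that last step, a crude build-time sketch in PHP (the file names are made up for illustration):

<?php
// Concatenate several JS files into one bundle to cut the request count.
$bundle = '';
foreach (['js/jquery.js', 'js/plugins.js', 'js/app.js'] as $file) {
    $bundle .= file_get_contents($file) . ";\n"; // ';' guards against missing semicolons
}
file_put_contents('js/bundle.js', $bundle);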
A CDN has the benefit of providing multiple servers and automatically routing your traffic to the closest location to your client. This can result in faster delivery, optimized by location.
Also, static content doesn't require special application servers (unlike dynamic content), so being able to offload it to a CDN removes that traffic from your servers entirely. A streaming video clip may be too big to cache, or should not be cached, but you don't necessarily want to support that bandwidth yourself. A CDN will take on that traffic for you.
It is not always about the cache. A small application web server may just want to provide the dynamic content but needs a solution for the heavy hitting media that rarely changes. CDNs handle the scaling issue for you.
Agree with @Anthony_Hatzopoulos (+1)
CDN complements Caching; also in some cases, it will help optimize Caching directives.
For example, a company I work for has integrated behavior-learning algorithms into its CDN, to identify and dynamically cache generated objects.
Ordinarily, these objects would be un-cacheable (i.e. carry a Cache-Control: max-age=0 HTTP header). But in this case, the system is able to identify caching possibilities and override the original HTTP header directives. (For example: a dynamically generated popular product page that should be cached, or a popular search results page that, while generated dynamically, is still presented time after time in the same form to thousands of users.)
And yes, before you ask, the system can also identify personalized data and verify freshness, to prevent false positives... :)
Implementing such an algorithm was only possible due to a reverse-proxy CDN technology. This is an example of how CDN and Caching can complement each other, to create better and smarter acceleration solutions.
The expert answers above are perfect for understanding both CDN technology and caching.
I'll just add my personal experience. I worked on a Joomla VirtueMart site that unfortunately could not be updated to newer Joomla and VirtueMart versions, because the product pages had too many customised fields. Once visitors reached about 900/day, lots of users could not put items in their basket, because each order action triggered lots of JS and AJAX calls that took too much time.
After optimising the site, we decided to use a CDN, and performance really improved. According to GTmetrix, the initial YSlow score was 50%; after optimisation + CDN it rose to 74%.
https://gtmetrix.com/reports/www.florihana.com/jWlY35im
From the CDN's dashboard you can also see which data centre costs the most and where the most data is charged, which can inform your marketing.
That said, when configuring a CDN you have to be careful with purge times and balance the number of CDN resources, because if one goes down you need to figure out which CDN resource caused the problem.
Hope this helps.

CDN vs Homegrown Caching

My understanding of a CDN (like Akamai or Limelight) is that they are heavy-duty caching services.
I also understand that they are very expensive. So I'm wondering why I can't just create my own cluster of replicated caching servers (using, say, EhCache or Memcached) for all my web app's caching needs (images, URL hits/responses, Javascripts, etc) and basically get the same thing?
In essence, from a developer's perspective, what are the benefits (technical or otherwise) of paying a CDN vs. just using your own caching solution? Or, if I have completely misunderstood what CDNs are, please correct my understanding! Thanks in advance!
The key to this is the N in CDN, Content Delivery Network.
The advantage of a CDN (a good one, at least) is that it's geographically distributed. What the big CDN providers do (the ones you mentioned, along with others) is run hundreds or thousands of servers in facilities all over the world. This lets whatever resource the CDN is serving sit as geographically close as possible to the end user, no matter where they are in the world.
You can certainly replicate the functionality. At a very basic level most CDNs are simply an object/value store, which plenty of software can do. What you can't do, at the scale these guys do it at any rate, is have servers all over the world to serve and replicate these objects.
If all you're interested in is an efficient way to store and serve static files then an object store coupled with something like nginx would do just fine. A CDN is used to make sites load as fast as they possibly can, but they come at a price.
For the record, Amazon's CloudFront, while not as distributed as Limelight or Akamai, is a lot cheaper and a good middle-ground compromise on cost vs. service.
The advantage of big CDNs is that they own distributed resources all over the world, allowing them to serve users from servers near them. Besides caching, this is the major point that makes them fast.
To build your own CDN, you would have to install servers on multiple continents, arrange good Internet connectivity for them, set up caching, and keep all the servers synchronized. It's not impossible, but it will be very cost-intensive and might not be profitable.
With all due respect, I disagree with the other two answers given thus far to this question. Just to be clear, I'm not saying they are wrong, I'm just offering a different perspective.
While it is certainly true that the key to CDN performance is the "N," as Bulk eloquently explained, that doesn't mean you can't build your own CDN. The question is whether or not it's worth the time (and by definition, the money).
We live in a world of cheap servers and even cheaper virtual machines. Sure, the big CDN networks have thousands/millions of servers all over the world, but that's because they have thousands/millions of sites to serve. Depending on the size of your site/app, all you really need in terms of resources is as much as your site(s) need. If you're small, the minimum might be a VPS on each US coast, one in Europe, maybe two in Asia, and one in Australia. Sure, the hardware costs are more than a typical homepage warrants, but they are certainly not extreme, and if you're looking at CDNs in the first place, they are probably within your budget.
To me, commercial CDN services just provide PaaS convenience, but there is nothing preventing you from getting IaaS and building up your own platform.
One more thing on this topic:
I once read a comment either by David Heinemeier Hansson (the creator of Ruby on Rails) or by someone referring to him that went something along the lines of: The owner(s) of 37 Signals were concerned DHH was using Ruby to build their application. At that point Ruby was still very obscure. Almost all web hosts were offering PHP, Perl, and Microsoft technologies. When asked about the fact that there were only a handful of Ruby hosts in the world, DHH asked, "well how many do you need?"
To me the point is you need to look at what's best for your needs and the needs of your application, and not necessarily what the guys with millions of servers think.

how to store dynamically generated pages in html?

I'm working on an ASP.NET MVC3 web application that is facing scalability issues.
To improve performance I want to store dynamically generated pages as HTML and serve them directly from the generated HTML, rather than querying the database for each page request.
I'm sure this will dramatically increase performance.
Can anyone share hints / examples / tutorials on how to do this? And what are the challenges?
I would also like to know how others handle performance for large e-commerce sites with at least a thousand categories and 200k products, serving 200-500 concurrent visitors. What are the best approaches?
Thanks in advance.
You should have a look at Improving Performance with Output Caching.
It provides several ways to cache the output of your controllers like this:
[OutputCache(Duration = 10, VaryByParam = "none")]
public ActionResult Index()
{
    return View();
}
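For product pages like those in the question, you would typically vary the cache by the route parameter instead, e.g. VaryByParam="id", so each product gets its own cached copy.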
You don't need to do that; just enable the output cache. The effect is the same: instead of running all your page-generation logic, you will be serving a static copy, just from the cache instead of from disk.
I would look at implementing a cache system for commonly viewed pages. The .NET platform has some really nice cache libraries that allow you to manage the cache in real time.
Cache Best Practices
msdn.microsoft.com/en-us/library/aa478965.aspx
I might also take a look at writing a RESTful API that can be load-balanced across multiple nodes of a cluster.
Do you know where your capacity bottleneck is?
My guess is your DB is the bottleneck, but unless you measure to find out where the bottleneck is you're likely to spend a lot of time optimising things that may not make much difference.
First things to do are get hold of a copy of PAL and monitor the web server and DB to see what it tells you.
Also run SQL queries to diagnose the most expensive and frequent queries.
Measure your actual page generation times and follow them down the stack to see which calls are being made, which are expensive, and which can be cached.
Rather than output caching, I'd generally introduce a caching layer in front of the web servers, and a caching layer between the app server and the DB, but without measuring it's very hard to judge.
If you want to know more about caching in general, I'd read this answer I wrote a while back How to get started with web caching, CDNs, and proxy servers?

Which schemaless datastores provide good performance?

I've recently written a web app that uses couchdb. I like couchdb and it suited the app - which has a lot of dynamic behaviour and simply pulls JSON directly from couchdb. Being able to upload images via a browser is nice and it's a snap to do tweaks to document data. The replication also has made deployment a breeze as the app is a couchapp, and all that's required to deploy is a replicate to the production server.
However, for a new app I'm thinking of (think blog-type thingy), I want good performance, and that's one area where I think CouchDB is not strong. The app will be predominantly read-oriented (I'm estimating 90% reads to 10% writes).
Which datastores provide the best performance in a single server scenario? I'd be very interested to hear people's experiences in this...
I think MongoDB is beginning to look like the front runner, performance-wise, among schemaless data stores.
We're currently in the process of evaluating it for storing binary objects that can range from 10Kb to 50Mb, and I've been very impressed with its performance even on modest hardware.
If it is primarily read performance you are worried about, why not just put a Varnish proxy in front of CouchDB? I use a couple of custom configurations in Varnish to tell it not to actually query CouchDB for cached objects (despite CouchDB specifying must-revalidate), then have a script holding an active HTTP GET on _changes that uses the data from _changes to explicitly purge changed entries from Varnish.
As a plus, Varnish lets you do URL rewriting, which I need. Most of the other solutions involve running something like Apache or nginx just to rewrite URLs for CouchDB.
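A minimal sketch of such a purge script in PHP; the CouchDB URL, database name and Varnish address are assumptions, and it presumes a vcl_recv rule in Varnish that accepts the PURGE method:

<?php
// Follow CouchDB's continuous _changes feed and purge changed docs from Varnish.
$couch   = 'http://127.0.0.1:5984'; // assumed CouchDB address
$db      = 'blog';                  // hypothetical database name
$varnish = 'http://127.0.0.1:6081'; // assumed Varnish address

// Long-lived connection to the continuous changes feed.
$feed = fopen("$couch/$db/_changes?feed=continuous&since=now", 'r');

while (($line = fgets($feed)) !== false) {
    $change = json_decode($line, true);
    if (!isset($change['id'])) {
        continue; // heartbeat or last_seq line
    }
    // Issue an HTTP PURGE for the changed document's URL.
    $ch = curl_init("$varnish/$db/" . rawurlencode($change['id']));
    curl_setopt($ch, CURLOPT_CUSTOMREQUEST, 'PURGE'); // needs a matching vcl_recv rule
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_exec($ch);
    curl_close($ch);
}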

Best scaling methodologies for a high-traffic web application?

We have a new project for a web app that will display banner ads on websites (as a network), and our estimate is for it to handle 20 to 40 billion impressions a month.
Our current language is ASP... but we are moving to PHP. Does PHP 5 have limits when scaling a web application? Or should I have our team invest in picking up JSP?
Or, is it a matter of the app server and/or DB? We plan to use Oracle 10g as the database.
No offense, but I strongly suspect you're vastly overestimating how many impressions you'll serve.
That said:
PHP or other languages used in the application tier really have little to do with scalability. Since the application tier delegates its state to the database or equivalent, it's straightforward to add as much capacity as you need behind appropriate load balancing. Choice of language does influence per-server efficiency and hence costs, but that's different from scalability.
It's scaling the state/data storage that gets more complicated.
For your app, you have three basic jobs:
what ad do we show?
serving the ad
logging the impression
Each of these will require thought and likely different tools.
The second, serving the ad, is the simplest: use a CDN. If you actually serve the volume you claim, you should be able to negotiate favorable rates.
Deciding which ad to show is going to be very specific to your network. It may be as simple as reading a few rows from a database that give ad placements for a given property for a given calendar period. Or it may be complex contextual advertising, like Google's. Assuming it's more the former, and that the database of placements is small, this is the simple task of scaling database reads. You can use replication trees or, alternatively, a caching layer like memcached.
The last will ultimately be the most difficult: how to scale the writes. A common approach would be to still use databases, but to adopt a sharding scaling strategy. More exotic options might be to use a key/value store supporting counter instructions, such as Redis, or a scalable OLAP database such as Vertica.
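To sketch the counter approach with PHP's Redis extension (the key layout and hourly bucketing are illustrative assumptions; a separate job would flush counters to durable storage):

<?php
// Count impressions per ad per hour with Redis INCR.
$redis = new Redis();
$redis->connect('127.0.0.1', 6379); // assumed local Redis

function logImpression(Redis $redis, int $adId): void
{
    $key = sprintf('impressions:%d:%s', $adId, date('YmdH')); // hypothetical key layout
    $redis->incr($key);                  // atomic, no read-modify-write race
    $redis->expire($key, 7 * 24 * 3600); // keep a week of hourly buckets
}

logImpression($redis, 42);

Because INCR is atomic on the server, thousands of concurrent writers never contend on a single database row the way they would with an UPDATE ... SET count = count + 1.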
All of the above assumes that you're able to secure data center space and network provisioning capable of serving this load, which is not trivial at the numbers you're talking.
You do realize that 40 billion per month is roughly 15,500 per second, right?
Scaling isn't going to be your problem - infrastructure period is going to be your problem. No matter what technology stack you choose, you are going to need an enormous amount of hardware - as others have said in the form of a farm or cloud.
This question (and the entire subject) is a bit subjective. You can write a dog slow program in any language, and host it on anything.
I think your best bet is to see how your current implementation works under load. Maybe just a few tweaks will make things work for you - but changing your underlying framework seems a bit much.
That being said - your infrastructure team will also have to be involved as it seems you have some serious load requirements.
Good luck!
I think it is not a matter of language, but it can be a matter of database speed as much as CPU processing speed. Have you considered a web farm? That way you can have more than one machine serving your application. There are several ways to implement this; you can start with two servers and add more as the app demands more processing volume.
On another point, Oracle 10g is a very good database server; in my humble opinion you only need a standalone Oracle server to handle that volume of requests. Remember that an SQL database gets faster when people request more or less the same things each time, and that's what happens in a web application if you plan your database schema carefully.
You should also check out the available ad-server solutions; there are some very good ones. Just try Google with "Open Source AD servers".
PHP will be capable of serving your needs. However, as others have said, your first limits will be your network infrastructure.
But your second limits will be writing scalable code. You will need good abstraction and isolation so that resources can easily be added at any level. Things like a fast data-object mapper, multiple data caching mechanisms, separate configuration files, and so on.
