Where should I install varnish? - amazon-ec2

So my web app is hosted on amazon using Opswork.
Currently I have a 1 dedicated instance for Postgresql, 1 instance as my webserver, and another dedicated instance running Redis for caching purposes.
I would like to improve the performance by adding Varnish. Given my architecture where should I install varnish? and also taking into account I may soon outgrow this solucion and be using more webservers behind a loadbalancer.
Any help would be appreciated!
Bye

Varnish will always be quicker if you run it with memory storage - so the one with the most free memory would be a good pick. Even if you don't have enough to spare for the storage, it also uses quite some memory for the connection handling when you reach a bit more traffic.
Further along the road when you want a load balancer a good start would be to use a dedicated server for varnish that also can do load balancing just fine. It's not as effecient as a lightweight dedicated loadbalancer but until you need multiple varnish servers (way down the road) there is generally no point in using anything before it.

You should use Varnish in front of Apache Web Server. However it's fine to reside on Web Server itself and point Load Balancers to Varnish.

Related

How to prevent being affected by data-center DDoS attack & maintainance related downtime?

I'm hosting a web application which should be highly-available. I'm hosting on multiple linodes and using a nodebalancer to distribute the traffic. My question might be stupid simple - but not long ago I was affected by a DDoS hitting the data-center. That made me think how I can be better prepared next time this happens.
The nodebalancer and servers are all in the same datacenter which should, of course, be fixed. But how does one go about doing this? If I have two load balancers in two different data centers - how can I setup the domain to point to both, but ignore the one affected by DDoS? Should I look into the DNS manager? Am I making things too complicated?
Really would appreciate some insights.
Thanks everyone...
You have to look at ways to load balance across datacenters. There's a few ways to do this, each with pros and cons.
If you have a lot of DB calls, running to datacenters HOT can introduce a lot of latency problems. What I would do is as follows.
Have the second datacenter (DC2) be a warm location. It is configured for everything to work and is constantly getting data from the master DB in DC 1, but isn't actively getting traffic.
Use a service like CLoudFlare for their extremely fast DNS switching. Have a service in DC2 that constantly pings the load balancer in DC1 to make sure that everything is up and well. When it has trouble contacting DC1, it can connect to CloudFlare via the API and switch the main 'A' record to point to DC2, in which case it now picks up the traffic.
I forget what CloudFlare calls it but it has a DNS feature that allows you to switch 'A' records almost instantly because the actual IP address given to the public is their own, they just route the traffic for you.
Amazon also have a similar feature with CloudFront I believe.
This plan is costly however as you're running much more infrastructure that rarely gets used. Linode is and will be rolling out more network improvements so hopefully this becomes less necessary.
For more advanced load balancing and HA, you can go with more "cloud" providers but it does come at a cost.
-Ricardo
Developer Evangelist, CircleCI, formally Linode

Amazon EC2 + Windows Server 2008 + Memcached = how?

We are building a system that would benefit greatly from a Distributed Caching mechanism, like Memcached. But i cant get my head around the configuration of Memcached daemons and clients finding each other on an Amazon Data Center. Do we manually setup the IP addresses of each memcache instance (they wont be dedicated, they will run on Web Servers or Worker Boxes) or is there a automagic way of getting them to talk to each other? I was looking at Microsoft Windows Server App Fabric Caching, but it seems to either need a file share or a domain to work correctly, and i have neither at the moment... given internal IP addresses are Transient on Amazon, i am wondering how you get around this...
I haven't setup a cluster of memcached servers before, but Membase is a solution that could take away all of the pain you are experiencing with memcached. Membase is basically memcached with a persistence layer underneath and comes with great cluster management software. Clustering servers together is as easy since all you need to do is tell the cluster what the ip address of the new node is. If you already have an application written for Memcached it will also work with Membase since Membase uses the Memcached protocol. It might be worth taking a look at.
I believe you could create an elastic ip in EC2 for each of the boxes that hold your memcached servers. These elastic ips can be dynamically mapped to any EC2 instance. Then your memcached clients just use the elastic ips as if they were static ip addresses.
http://alestic.com/2009/06/ec2-elastic-ip-internal
As you seemed to have discovered, Route53 is commonly used for these discovery purposes. For your specific use case, however, I would just use Amazon ElasticCache. Amazon has both memcached and redis compliant versions of ElasticCache and they manage the infrastructure for you including providing you with a DNS entry point. Also for managing things like asp.net session state, you might consider this article on the DynamoDB session state provider.
General rule of thumb: if you are developing a new app then try and leverage what the cloud provides vs. build it, it'll make your life way simpler.

THE FASTEST Smarty Cache Handler

Does anyone know if there is an overview of the performance of different cache handlers for smarty?
I compared smarty file cache with a memcache handler, but it seemed memcache has a negative impact on performance.
I figured there would be a faster way to cache than through the filesystem... am I wrong?
I don't have a systematic answer for you, but in my experience, the file cache is the fastest. I should clarify that I haven't done any serious performance tests, but in all the time I've used Smarty, I have found the file cache to work best.
One this that definitely improves performance is to disable checking if the template files have changed. This avoids having to stat the tpl files.
File caching is ok when you have a single server instance or using shared drive (NFS) in a server cluster, but when you have a web server cluster (two or more web servers serving the same content), the problem with file based caching is not sync across the web servers. To perform a simple rsync on the caching directories is error prone. May work flawlessly for awhile but not a stable solution. The best solution for a cluster is to use distributed caching, that is memcache, which is a separate server running a memcached instance and each web server has PHP Memcache installed. Each server will then check for the existent of a cached page/item and if exists pulls from memcache otherwise will generate from the database and then save into memcached. When you are dealing with clusters, you cannot skimp on a good caching mechanism. If you are dealing with clusters, then your site already has more traffic (or will be) for a single server to handle.
There is beginners level cluster environment which can be implemented for a relative low cost. You can set up two colocated servers (nginx load balancer and a memcached server), then using free shared web hosting, you create an account of the same domain on those free hosting accounts and install your content. You configure your nginx load balancer to point to the IP addresses of the free web hosts. The free web hosts must have php5 memcache installed or the solution will not work.
Then you set you DNS for the domain with the registrar to point the NGINX IP (which would be a static ip if you are colocating). Now when someone access your domain, nginx redirects to one of your web server clusters located on the free hosting.
You may also want to consider a CDN to off load traffic when serving the static content.

Should I host Website and REST API on the same server or split?

I have a web application that consists of Website and REST API. Should I host them on the same server or should I host them on different servers? By "server" I mean a server cluster - several servers behind load balancer.
API is mostly inbound traffic, website - mostly outbound.
If it matters - hosted on Rackspace and/or AWS.
Here is what I see so far:
Benefits of having Website and REST API on the same server
Simple deployment
Simple scaling - something is slow - just launch another instance
Single load balancer configuration
Simple monitoring
Simple, simple, simple ...
Effective use of full duplex network (API - inbound, website - outbound)
Benefits of splitting
API overload will not affect website load time
Detailed monitoring (I will know which component uses resources at this moment)
Any comments?
Thank you
Alexander
Just as you stated, in most situations, there are more advantages in hosting the API on the same server as the website. So I would stick with that option.
But if you predict allot of traffic for either the website or the API, then maybe a separate server would be more suited.
If this is on a load balancer why don't you leave the services and pages on the same site and let the load balancer/cluster do its job?
Your list of advantages/disadvantages are operational considerations, but you should consider application needs as well.
Caching?
Security?
Other resources, i.e. filesystem
These may or may not apply, but if your application architecture is different between the two, be sure to factor this into your decision.

Load Balancing in Amazon EC2?

We've been fighting with HAProxy for a few days now in Amazon EC2; the experience has so far been great, but we're stuck on squeezing more performance out of the software load balancer. We're not exactly Linux networking whizzes (we're a .NET shop, normally), but we've so far held our own, attempting to set proper ulimits, inspecting kernel messages and tcpdumps for any irregularities.
So far though, we've reached a plateau of about 1,700 requests/sec, at which point client timeouts abound (we've been using and tweaking httperf for this purpose). A coworker and I were listening to the most recent Stack Overflow podcast, in which the Reddit founders note that their entire site runs off one HAProxy node, and that it so far hasn't become a bottleneck. Ack! Either there's somehow not seeing that many concurrent requests, we're doing something horribly wrong, or the shared nature of EC2 is limiting the network stack of the Ec2 instance (we're using a large instance type). Considering the fact that both Joel and the Reddit founders agree that network will likely be the limiting factor, is it possible that's the limitation we're seeing?
Any thoughts are greatly appreciated!
Edit It looks like the actual issue was not, in fact, with the load balancer node! The culprit was actually the nodes running httperf, in this instance. As httperf builds and tears down a socket for each request, it spends a good amount of CPU time in the kernel. As we bumped the request rate higher, the TCP FIN TTL (being 60s by default) was keeping sockets around too long, and the ip_local_port_range's default was too low for this usage scenario. Basically, after a few minutes of the client (httperf) node constantly creating and destroying new sockets, the number of unused ports ran out, and subsequent 'requests' errored-out at this stage, yielding low request/sec numbers and a large amount of errors.
We also had looked at nginx, but We've been working with RighScale, and they've got drop-in scripts for HAProxy. Oh, and we've got too tight a deadline [of course] to switch out components unless it proves absolutely necessary. Mercifully, being on AWS allows us to test out another setup using nginx in parallel (if warranted), and make the switch overnight later on.
This page describes each of the sysctl variables fairly well (ip_local_port_range and tcp_fin_timeout were tuned, in this case).
Not answering the question directly, but EC2 now supports load balancing through Elastic Load Balancing rather than running your own load balancer in an EC2 instance.
EDIT: Amazon's Route 53 DNS service now offers a way to point a top-level domain at an ELB with an "alias" record. Since Amazon knows the current IP address of the ELB, it can return an A record for that current IP rather than having to use a CNAME record, while still being free to change the IP from time to time.
Not really an answer to your question, but nginx and pound both have good reputations as load-balancers. Wordpress just switched to nginx with good results.
But more specifically, to debug your problem. If you aren't seeing 100% cpu usage (including I/O wait), then you are network bound, yes.
EC2 internally uses a gigabit network, try using an XL instance, so you have the underlying hardware to yourself, and don't have to share that gigabit network port.
Yes, You could use an off-site load balancer.. and on bare metal LVS is a great choice, but your latency will be awful! Rumour has it that Amazon is going to fix the CNAME issue. However they are unlikely to add https, indepth or custom health checks, feedback agents, url matching, cookie insertion (and some people with good architecture would say quite right too.) However thats why Scalr, RightScale and others are using HAProxy usually two of them behind a round robin DNS entry. Here at Loadbalancer.org we are just about to launch our own EC2 load balancing appaliance:
http://blog.loadbalancer.org/ec2-load-balancer-appliance-rocks-and-its-free-for-now-anyway/
We are planning on using SSH scripts to intergrate with autoscaling in the same way rightscale does, any comments appreciated on the blog.
Thanks
I would look at switching to a off-site load balancer, not in the cloud and run something like IPVS on top of it. [The reason why it would be off of amazon's cloud is because of kernel stuff] If Amazon doesn't limit the source IP of packets coming out of the you could go with a unidirectional load balancing mechanism. We do something like this, and it gets us about 800,000 simultaneous requests [though we don't deal with latency]. I also would say use "ab2" (apache bench), as it is a little more user friendly, and easier to use in my humble opinion.
Even though your issue solved. KEMP Technologies now have a fully blown load balancer for AWS. Might save you some hassle.

Resources