HAProxy - routing to backend IP based on URL /path? - amazon-ec2

I'm trying to use HAProxy as a dynamic proxy for backend hosts based on partial /path regex match. The use case is routing from an HTTPS frontend to a large number of nodes that come and go frequently, without maintaining an explicit mapping of /path to server hostnames.
Specifically in this case the nodes are members of an Amazon EMR cluster, and I'd like to reverse-proxy/rewrite HTTP requests like:
<haproxy>/emr/ip-99-88-77-66:4040 -> 99.88.77.66:4040
<haproxy>/emr/ip-55-44-33-22/ganglia -> 55.44.33.22/ganglia
<haproxy>/emr/ip-11-11-11-11:8088/cluster/nodes -> 11.11.11.11:8088/cluster/nodes
...etc
dynamically.
That is, parse the path beginning at /emr and proxy requests to an IP captured by the regex:
emr\/ip-(\d{1,3}-\d{1,3}-\d{1,3}-\d{1,3})(.*)
Is this possible with HAProxy? I know it's probably not the right tool for the job, but if possible (even non-performant) I'd like to use the tooling we already have in place.
tl;dr basically nginx proxy_pass, but with HAProxy and plucking a backend IP from the url.
Thanks!

Yes, it's possible by using filters in HAProxy; see the link below for more details.
https://fossies.org/linux/haproxy/doc/internals/filters.txt

Yes, this can be done. I would recommend using ACLs, together with round-robin balancing and health checks, which let you verify that an instance is up before routing to it. That way, the system only routes to service instances that are up and running.
In addition, this lets you constantly cycle instances in and out, for example if your AWS instance costs change relative to other providers you may use, and lets you load balance with maximum cost savings in mind.

Yes, this is possible. Check the official manual:
Using ACLs and fetching samples
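For example, here is an untested sketch using ACLs, variables, and set-dst. This assumes a recent HAProxy (2.2+ for replace-path); the bind and cert lines are placeholders, and requests without an explicit port or without a trailing path would need extra rules:
frontend https_in
    bind :443 ssl crt /etc/haproxy/site.pem
    acl is_emr path_beg /emr/ip-
    use_backend be_emr if is_emr

backend be_emr
    # second path segment, e.g. "ip-99-88-77-66:4040"
    http-request set-var(txn.seg) path,field(3,/)
    # host part: drop "ip-", turn dashes into dots -> "99.88.77.66"
    http-request set-var(txn.ip) var(txn.seg),field(1,:),field(2,-,0),regsub(-,.,g)
    # port part, if present after the colon
    http-request set-var(txn.port) var(txn.seg),field(2,:)
    http-request set-dst var(txn.ip)
    http-request set-dst-port var(txn.port)
    # strip the /emr/ip-... prefix before proxying
    http-request replace-path ^/emr/[^/]+(/.*)?$ \1
    # 0.0.0.0:0 means "connect to whatever set-dst chose"
    server dyn 0.0.0.0:0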

Related

rsyslog rewrite hostname before relay

I am setting up rsyslog in a multi-tenant environment to relay to a central server. Because it is multi-tenanted, I would like to prefix the hostname from the first rsyslog server with a customer-specific prefix before relaying on to the central server. I had planned to set the prefix manually; however, the prefix is configured in another file on the server, and if it could be read from that file, that would be even better.
Because the first server will be relaying from multiple hosts, the prefix has to be a dynamic rewrite that includes the original hostname, rather than a hard-coded overwrite of the same hostname for all entries, which I've seen in some examples.
Ideally, what I am trying to do is summarised by the following pseudocode:
ruleset(name="myrule"){
    set $hostname = "<prefix>-%HOSTNAME%"
    action(type="omfwd" target="remote-ip")
}
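The closest real configuration I can construct (untested, and I'm not sure standard properties like $hostname can be assigned directly, hence a template-based rewrite; the prefix, port, and protocol here are examples only) would be:
# modeled on the traditional forwarding format, with the prefix spliced in
template(name="custPrefix" type="string"
         string="<%PRI%>%TIMESTAMP% customerA-%HOSTNAME% %syslogtag%%msg%")
ruleset(name="myrule"){
    # forward everything entering this ruleset, rewritten via the template
    action(type="omfwd" target="remote-ip" port="514" protocol="tcp" template="custPrefix")
}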
I will be responsible for both the intermediate relay and the central server, but each relay can host multiple customers, so I don't think the rewrite can be done on the central server, even though I have full control of both layers. Each customer is connected via a dedicated interface, and I was planning a separate ruleset attached to an input configured for each interface, with the ruleset carrying the customer-specific prefix. For this reason, I think the config needs to be on the relay, but if there's a different way, I am willing to try anything that meets the end goal of making events customer-identifiable.
The reason for wanting the hostname rewrite is that it is in line with how other tools are configured in the environment, and it is highly desirable to keep a homogeneous setup. However, another method may be considered if this one is not technically feasible.
What is the correct way to do this?

Can a single ELB serve multiple domains? Can it serve multiple subdomains?

I am wondering whether an ELB can route HTTP requests to different ASGs (or different individual instances, if the backend is instance-based rather than ASG-based) based on the domain name.
Say I am a company owning two domains, and the two domains serve different services. Can I put a single ELB in front of the two ASGs serving the different logic? (See the following diagram for what I have in mind.)
(If the answer to the above question is 'no', would you please explain why, which may answer the next question as well?) I also have a similar question: can an ELB serve different subdomains from different ASGs (see the next diagram)?
No. An ELB evenly distributes traffic across the instances associated with it. Multiple Auto Scaling groups can indeed be associated with a single ELB; however, it isn't possible to influence the load-balancing algorithm based on any factor.
In your case, you need 2 ELBs.
A possible work around: If all your instances behind the ELB had Apache with Virtual Hosts running on them, you could serve different domains or subdomains using a single ELB. However, each of your instances would be identical - you wouldn't have some instances for domain 1 and some for domain 2.
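For illustration, the name-based virtual host configuration (identical on every instance behind the ELB; hostnames and paths are hypothetical) might look like:
<VirtualHost *:80>
    ServerName www.domain1.com
    DocumentRoot /var/www/domain1
</VirtualHost>
<VirtualHost *:80>
    ServerName www.domain2.com
    DocumentRoot /var/www/domain2
</VirtualHost>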
The moral of the story is that when using ELBs, all of your instances behind the ELB need to be stateless and do the same thing. And, you cannot influence how the ELB distributes traffic to the nodes behind it.
A reading of the documentation would be of benefit to you.
Host-based routing was later added to ELB, at least as early as this announcement: https://aws.amazon.com/about-aws/whats-new/2017/04/elastic-load-balancing-adds-support-for-host-based-routing-and-increased-rules-on-its-application-load-balancer/ Depending on your specific scenario, such as TLS requirements, this may or may not address your needs.
The Application Load Balancer was extended further, as documented here: https://aws.amazon.com/blogs/aws/new-advanced-request-routing-for-aws-application-load-balancers/ which adds features such as routing on custom headers and more powerful boolean logic.
As of this writing, certain limitations apply, for example around TLS and wildcard certificates, though https://aws.amazon.com/blogs/aws/new-application-load-balancer-sni/ addresses some of these.
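As a rough example of host-based routing (all ARNs and hostnames below are placeholders), a rule can be added to an ALB listener with the AWS CLI:
aws elbv2 create-rule \
    --listener-arn arn:aws:elasticloadbalancing:region:account:listener/app/my-alb/xxxx/yyyy \
    --priority 10 \
    --conditions Field=host-header,Values=app1.example.com \
    --actions Type=forward,TargetGroupArn=arn:aws:elasticloadbalancing:region:account:targetgroup/app1/zzzz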

HTTP session management when using Nginx as a "round robin" load balancer?

I'm trying to load-balance two web servers (running Apache/PHP) by putting Nginx in front of them. I need to use the round-robin algorithm, but when I do, I can't keep sessions stable.
(I understand that with round robin, the session information is lost as soon as the next request hits the other server.)
Is there a proper way to achieve this? Any advice on the industry standard for this?
FYI, I have already put these two web servers into a GlusterFS cluster, so I have common storage (in case you want to suggest something based on that).
The nginx manual says that session affinity (the "sticky" directive) is in the commercial distribution only. If you don't use the commercial distribution, you'll have to grab a third-party module and rebuild the server with support for it
(searching for "sticky" should help you find the third-party addons)
If there isn't any specific reason to use round robin, you can try the ip_hash load-balancing mechanism:
upstream myapp1 {
    ip_hash;
    server srv1.example.com;
    server srv2.example.com;
    server srv3.example.com;
}
If there is the need to tie a client to a particular application server — in other words, make the client’s session “sticky” or “persistent” in terms of always trying to select a particular server — the ip-hash load balancing mechanism can be used.
Please refer to the nginx load-balancing documentation for more information.

How to configure kube-proxy master_url with multiple apiservers

I'm using a cluster setup with multiple apiservers behind a load balancer for external access, installed on bare metal.
As mentioned in the High Availability Kubernetes Clusters docs, I would like to use internal load balancing via the kubernetes service within my cluster. This works fine so far, but I'm not sure of the best way to set up kube-proxy. It obviously cannot use the service IP, since kube-proxy is itself what proxies to that IP based on data from the apiserver (master). I could use the IP of any one of the apiservers, but this would lose the high availability. So the only viable option I currently see is to use my external load balancer, but that seems somehow wrong.
Does anybody have ideas or best practices?
This is quite an old question, but as the problem persists... here it goes.
There is a bug in the Kubernetes restclient that prevents using more than one IP/URL: it will always pick the first IP/URL in the list. This affects kube-proxy and also kubelet, leaving a single point of failure in those tools if you don't use a load balancer (as you did) in a multi-master setup. Your load-balancer approach is probably not the most elegant solution ever, but currently (I think) it is the easiest one.
Another solution (which I prefer, but may not work for everyone, and does not solve all the problems) is to create a DNS entry that round-robins your API servers; but as pointed out in one of the links below, that only solves the load balancing, not the HA. A zone-file sketch follows the links below.
You can see the progress of this story in the following links:
The kube-proxy/kubelet issue: https://github.com/kubernetes/kubernetes/issues/18174
The restclient PR: https://github.com/kubernetes/kubernetes/pull/30588
The "official" solution: https://github.com/kubernetes/kubernetes/issues/18174#issuecomment-199381822
I think the way it is meant to be set up is that you have a kube-proxy on each master node, so each kube-proxy points to its own master on 127.0.0.1/localhost.
The podmaster determines which apiserver should run, which in turn makes use of the local proxy of that master.
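A minimal sketch of the kubeconfig such a kube-proxy could use, assuming the apiserver listens on the default secure port and with hypothetical file paths:
# kubeconfig for the kube-proxy running on each master node
apiVersion: v1
kind: Config
clusters:
- name: local
  cluster:
    certificate-authority: /etc/kubernetes/ca.crt
    # always this node's own apiserver
    server: https://127.0.0.1:6443
users:
- name: kube-proxy
  user:
    client-certificate: /etc/kubernetes/kube-proxy.crt
    client-key: /etc/kubernetes/kube-proxy.key
contexts:
- name: local
  context:
    cluster: local
    user: kube-proxy
current-context: local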

1 A-record for every subdomain (10000+); any potential issues? Any other solution?

Most solutions I've read here for supporting a subdomain per user at the DNS level point everything to one IP using a wildcard (*.domain.com).
It is an easy and simple solution, but what if I want to point the first 1000 registered users to serverA and the next 1000 registered users to serverB? This is our preferred way to keep down software and hardware clustering costs.
[Diagram from the MS IIS site: http://learn.iis.net/file.axd?i=1101]
The most logical solution seems to be one A record per subdomain in the zone data files. BIND doesn't seem to have any size limit on zone files; they are only restricted by available memory.
However, my team is worried about the latency of getting a new subdomain up and ready, since creating a new subdomain consists of inserting a new A record and restarting the DNS server.
Is the performance of restarting the DNS server something we should worry about?
Thank you in advance.
UPDATE:
It seems most of you suggest using a reverse-proxy setup instead:
[Diagram from the MS IIS site: http://learn.iis.net/file.axd?i=1102 (ARR is IIS7's reverse-proxy solution)]
However, here are the cons I can see:
single point of failure
cannot strategically set up servers in different locations based on IP geolocation
Use the wildcard DNS entry, then use load balancing to distribute the load between servers, regardless of which client it is.
While you're at it, skip the URL rewriting step and have your application determine which account it is based on the URL as entered (you can just as easily determine what X is in X.domain.com as in domain.com?user=X).
EDIT:
Based on your additional info, you may want to develop a "broker" that stores which clients are to access which servers. Make it public-facing, then pull from the resources associated with the client as stored in the broker. Your front end can be load balanced, and you can then fetch from the file/db servers based on who the client is.
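A sketch of that broker idea using Apache's RewriteMap on the front-end proxy (the map file, its contents, and the serverA fallback are hypothetical; requires mod_rewrite and mod_proxy):
# usermap.txt holds "subdomain backend" pairs, e.g. "alice serverA"
RewriteMap usermap txt:/etc/apache2/usermap.txt
RewriteEngine On
RewriteCond %{HTTP_HOST} ^([^.]+)\.domain\.com$ [NC]
# look the user up in the map, defaulting to serverA, and proxy the request
RewriteRule ^/(.*)$ http://${usermap:%1|serverA}/$1 [P]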
The front-end proxy with a wild-card DNS entry really is the way to go with this. It's how big sites like LiveJournal work.
Note that this is not just a TCP-layer load balancer - there are plenty of solutions that will examine the host part of the URL to figure out which back-end server to forward the query to. You can easily do it with Apache running on a low-spec server with suitable configuration.
The proxy ensures that each user's session always goes to the right back-end server and most any session handling methods will just keep on working.
Also the proxy needn't be a single point of failure. It's perfectly possible and pretty easy to run two or more front-end proxies in a redundant configuration (to avoid failure) or even to have them share the load (to avoid stress).
I'd also second John Sheehan's suggestion that the application just look at the left-hand part of the URL to determine which user's content to display.
If using Apache for the back-end, see this post too for info about how to configure it.
If you use tinydns, you don't need to restart the nameserver when you modify its database, and it should not be a bottleneck because it is generally very fast. I don't know whether it performs well with 10000+ entries, though (it would surprise me if it didn't).
http://cr.yp.to/djbdns.html
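For example, tinydns entries are one line per record in its data file, and publishing a change is an atomic rebuild of the constant database rather than a restart (paths follow the standard djbdns layout):
# in /service/tinydns/root/data: one "+fqdn:ip:ttl" line per subdomain
+user0001.domain.com:192.0.2.10:300
+user0002.domain.com:192.0.2.10:300
+user1001.domain.com:192.0.2.20:300
# then rebuild data.cdb atomically; tinydns picks it up without a restart
cd /service/tinydns/root && make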
