Persistent & clustered connections with traefik reverse proxy - cluster-computing

Let's say I have a cluster of database replicas that I would like to make available under a frontend. These databases replicate with each other. Can I have Traefik serve the same backend to the same client IP if possible, such that the UI can be made consistent even when the DBs are still replicating the newest state?

What you seem to be asking for is sticky sessions (aka session affinity) on a per-IP address basis.
Traefik supports cookie-based stickiness, which means that a cookie is assigned on the initial request if the relevant Traefik option is enabled. Subsequent requests will then reach the same backend unless that backend becomes unreachable, at which point a new sticky backend is selected.
The option can be enabled like this:
[backends]
  [backends.backend1]
    [backends.backend1.loadbalancer]
      sticky = true
Details can be found in the Traefik documentation (search for "sticky sessions").
If you are running Traefik with one of the dynamic providers (e.g., Docker, Kubernetes, Marathon), there are usually labels/tags/annotations available that you can set per backend. The TOML configuration file documentation contains all the details.
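With the Docker provider, for example, the equivalent is usually a container label. A minimal sketch assuming Traefik 1.x label syntax (the hostname and image name are placeholders):

docker run -d \
  --label traefik.backend=backend1 \
  --label traefik.frontend.rule=Host:db.example.com \
  --label traefik.backend.loadbalancer.sticky=true \
  my-db-frontend-image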
If you are looking for true IP address-based stickiness where the IP address space gets hashed and traffic evenly distributed across all backends: This isn't possible yet, although there's an open feature request.

Related

How to disable sticky sessions in Openshift3

If you scale up a pod in OpenShift 3, all requests coming from the same client IP address are sent to the container that has the session associated with it.
Is there any configuration to disable sticky sessions? How can I manage the options of the internal HAProxy in OpenShift?
For posterity, and since I had the same problem, I want to document the solution I used from Graham Dumpleton's excellent comment.
As it turns out, a cookie set during the first request directs subsequent requests to the same back-end. To disable this behavior on a per-route basis:
oc annotate routes myroute haproxy.router.openshift.io/disable_cookies='true'
This prevents the cookie from being set and allows the balance algorithm to select the appropriate back-end for subsequent requests from the same client. To change the balance algorithm:
oc annotate routes myroute haproxy.router.openshift.io/balance='roundrobin'
With these two annotations set, requests from the same client IP address are sent to each back-end in turn, instead of the same back-end over and over.
oc set env dc/router ROUTER_TCP_BALANCE_SCHEME=roundrobin will change the load balancing algorithm HAProxy uses for routes it just passes through (the default is source). ROUTER_LOAD_BALANCE_ALGORITHM will change it for routes where it terminates TLS (the default is leastconn).
More info on changing the internals of how HAProxy works is in the OCP 3.5 docs.
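For reference, a minimal sketch of changing both router-wide defaults with the environment variables mentioned above (the algorithm values are just examples):

oc set env dc/router ROUTER_TCP_BALANCE_SCHEME=roundrobin
oc set env dc/router ROUTER_LOAD_BALANCE_ALGORITHM=roundrobin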

Loadbalancing web sockets - AWS Elastic Loadbalancer

I have a question about how to load balance web sockets with AWS elastic load balancer.
I have 2 EC2 instances behind AWS elastic load balancer.
When any user logs in, the user session will be established with one of the servers, say EC2 instance1. Now, all the requests from the same user will be routed to EC2 instance1.
Now, I have a different stateless request coming from a different system. This request will have a userId in it. This request might end up going to EC2 instance2. We are supposed to send a notification to the user based on the userId in the request.
Now,
1) Assume the user session is with EC2 instance1, but the notification originates from EC2 instance2.
I am not sure how to notify the user's browser in this case.
2) Is there any limitation on the number of websocket connections (like 64K), and how can it be overcome with multiple servers, since the user comes through the load balancer?
Thanks
You will need something else to notify the server end of the browser's websocket about the event coming from the other system. There are a couple of publish-subscribe based solutions which might help, but without knowing more details it is a bit hard to figure out which one fits best. Redis is generally a good answer, and ElastiCache supports it.
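As a rough sketch of the publish-subscribe idea with Redis (the channel naming scheme notifications:<userId> is just an assumption for illustration): the instance holding the user's websocket subscribes to that user's channel, and whichever instance receives the stateless request publishes to it.

# On the instance holding the user's websocket connection:
redis-cli SUBSCRIBE notifications:user-42

# On the instance that received the stateless request:
redis-cli PUBLISH notifications:user-42 '{"type":"alert","text":"hello"}'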
I found this regarding AWS ELB's limits:
http://docs.aws.amazon.com/general/latest/gr/aws_service_limits.html#limits_elastic_load_balancer
But none of them seems to be related to your question.
Websocket requests start with HTTP communication before handing over to websockets. In theory if you could include a cookie in that initial HTTP request then the sticky session features of ELB would allow you to direct websockets to specific EC2 instances. However, your websocket client may not support this.
A preferred solution would be to make your EC2 instances stateless. Store the websocket session data in AWS ElastiCache (either Redis or Memcached), and then incoming connections will be able to access the session regardless of which EC2 instance is used.
The advantage of this solution is that you remove the dependency on individual EC2 instances and your application will scale and handle failures better.
If the ELB has too many incoming connections, it should scale automatically, although I can't find a reference for that. ELBs are relatively slow to scale (minutes rather than seconds); if you are expecting surges in traffic, AWS can "pre-warm" additional ELB capacity for you. This is done via support requests.
Also, factor in the ELB connection timeout. By default this is 60 seconds; it can be increased via the AWS console or API. Your application needs to send at least 1 byte of traffic before the timeout, or the ELB will drop the connection.
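For example, with the AWS CLI the idle timeout of a classic ELB can be raised like this (my-elb is a placeholder; treat this as a sketch and check the current CLI docs):

aws elb modify-load-balancer-attributes \
  --load-balancer-name my-elb \
  --load-balancer-attributes "{\"ConnectionSettings\":{\"IdleTimeout\":300}}"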
I recently had to hook up crossbar.io websockets with an ALB. Basically there are two things to consider: 1) you need to set stickiness to 1 day on the target group attributes; 2) you either need something on the same port that returns a static webpage if the connection is not upgraded, or a separate port serving a static webpage with a custom health check specifying that port on the target group. Go for an ALB over an ELB: ALBs have support for ws:// and wss://; they only lack health checks over websockets.
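A rough sketch of those two target group settings with the AWS CLI (the ARN, port, and path are placeholders):

# 1) One-day cookie stickiness on the target group
aws elbv2 modify-target-group-attributes \
  --target-group-arn <target-group-arn> \
  --attributes Key=stickiness.enabled,Value=true Key=stickiness.lb_cookie.duration_seconds,Value=86400

# 2) Health check against a separate port that serves a static page
aws elbv2 modify-target-group \
  --target-group-arn <target-group-arn> \
  --health-check-port 8080 --health-check-path /health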

Google Compute Engine load balancing keep session

I have 2 TomEE servers on Google's machines. They both serve the same application.
The web application has a login page with JAAS, and both servers work with the same DB.
When I try to access the servers separately, everything works fine.
But when I try to access them via the load balancer, it looks like the load balancer is hopping my requests between the two servers, and therefore my web app does not work well, since the VM that I did not log in to rejects my requests.
My problem is: how do I make sessions work correctly when load balancing the servers?
You want to look at the sessionAffinity feature of the load balancer.
Specifically, per the load balancer target pool docs:
sessionAffinity
[Optional] Controls the method used to select a backend virtual machine instance. You can only set this value during the creation of the target pool. Once set, you cannot modify this value. The hash method selects a backend based on a subset of the following 5 values:
Source / Destination IP
Source / Destination Port
Layer 4 Protocol (TCP, UDP)
Possible hashes are:
NONE (i.e., no hash specified) (default): 5-tuple hashing, which uses the source and destination IPs, source and destination ports, and protocol. Each new connection can end up on any instance, but all traffic for a given connection will stay on the same instance if the instance stays healthy.
CLIENT_IP_PROTO: 3-tuple hashing, which uses the source and destination IPs and the protocol. All connections from a client will end up on the same instance as long as they use the same protocol and the instance stays healthy.
CLIENT_IP: 2-tuple hashing, which uses the source and destination IPs. All connections from a client will end up on the same instance regardless of protocol as long as the instance stays healthy.
5-tuple hashing provides a good distribution of traffic across many virtual machines. However, a second session from the same client may arrive on a different instance because the source port may change. If you want all sessions from the same client to reach the same backend, as long as the backend stays healthy, you can specify CLIENT_IP_PROTO or CLIENT_IP options.
In general, if you select a 3-tuple or 2-tuple method, it will provide for better session affinity than the default 5-tuple method, but the overall traffic may not be as evenly distributed.
Caution: If a large portion of your clients are behind a proxy server, you should not use CLIENT_IP_PROTO or CLIENT_IP. Using them would end up sending all the traffic from those clients to the same instance.
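For example, if you decide on client-IP affinity, it is specified when the target pool is created; a minimal sketch with gcloud (the pool name and region are placeholders):

gcloud compute target-pools create my-pool \
  --region us-central1 \
  --session-affinity CLIENT_IP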

Load balancing with nginx

I want to stop serving requests to my back-end servers if the load on those servers goes above a certain level. Anyone who is already surfing the site will still get routed, but new connections will be sent to a static "server busy" page until the load drops below a predetermined level.
I can use cookies to let the current customers in, but I can't find information on how to do routing based on a custom load metric.
Can anyone point me in the right direction?
Nginx has an HTTP Upstream module for load balancing. Checking the responsiveness of the backend servers is done with the max_fails and fail_timeout options. Routing to an alternate page when no backends are available is done with the backup option. I recommend translating your load metrics into the options that Nginx supplies.
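For reference, a minimal sketch of an upstream block using those directives (the hostnames and the static "busy" server are placeholders):

upstream backend {
    # Take a backend out of rotation after 3 failures, retry it after 30s
    server app1.example.com max_fails=3 fail_timeout=30s;
    server app2.example.com max_fails=3 fail_timeout=30s;

    # Used only when all regular backends are considered down:
    # points at a server that returns the static "server busy" page
    server busy.example.com backup;
}

server {
    listen 80;
    location / {
        proxy_pass http://backend;
    }
}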
Let's say though that Nginx is still seeing the backend as being "up" when the load is higher than you want. You may be able to adjust that further by tuning the maximum connections of the backend servers. So, maybe the backend servers can only handle 5 connections before the load is too high, so you tune them to only allow 5 connections. Then on the front end, Nginx will time out immediately when trying to send a sixth connection, and mark that server as inoperative.
Another option is to handle this outside of Nginx. Software like Nagios can not only monitor load, but can also proactively trigger actions based on the monitoring it does.
You can generate your Nginx configs from a template that has options to mark each upstream node as up or down. When a monitor detects that the upstream load is too high, it could re-generate the Nginx config from the template as appropriate and then reload Nginx.
A lightweight version of the same idea could be done with a script that runs on the same machine as your Nagios server and performs simple monitoring as well as the config file updates.
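A rough sketch of such a script, assuming the upstream config can simply be regenerated and that a 1-minute load average is a good enough metric (all hostnames, paths, and the threshold are made up):

#!/bin/sh
# Regenerate the nginx upstream config, marking overloaded backends "down",
# then reload nginx. Meant to be run periodically (e.g., from cron).
THRESHOLD=5
CONF=/etc/nginx/conf.d/upstream.conf

{
    echo "upstream backend {"
    for host in app1.example.com app2.example.com; do
        # 1-minute load average of the backend, truncated to an integer
        load=$(ssh "$host" "awk '{print int(\$1)}' /proc/loadavg")
        load=${load:-0}   # an unreachable host is left in rotation; adjust as needed
        if [ "$load" -ge "$THRESHOLD" ]; then
            echo "    server $host down;"
        else
            echo "    server $host max_fails=3 fail_timeout=30s;"
        fi
    done
    echo "    server busy.example.com backup;"
    echo "}"
} > "$CONF"

nginx -s reload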

When would you need multiple servers to host one web application?

Is that called "clustering" of servers? When a web request is sent, does it go through the main server, and if the main server can't handle the extra load, then it forwards it to the secondary servers that can handle the load? Also, is one "server" that's up and running the application called an "instance"?
[...] Is that called "clustering" of servers?
Clustering is indeed transparently using multiple nodes that are seen as a single entity: the cluster. Clustering allows you to scale: you can spread your load across all the nodes and, if you need more power, you can add more nodes (short version). Clustering also makes you fault tolerant: if one node (physical or logical) goes down, the other nodes can still process requests and your service remains available (short version).
When a web request is sent, does it go through the main server, and if the main server can't handle the extra load, then it forwards it to the secondary servers that can handle the load?
In general, this is the job of a dedicated component called a "load balancer" (hardware or software) that can use many algorithms to balance requests: round-robin, FIFO, LIFO, load-based...
In the case of EC2, you previously had to load balance with round-robin DNS and/or HA Proxy. See Introduction to Software Load Balancing with Amazon EC2. But for some time now, Amazon has launched load balancing and auto-scaling (beta) as part of their EC2 offerings. See Elastic Load Balancing.
Also, is one "server" that's up and running the application called an "instance"?
Actually, an instance can be many things (depending on who's speaking): a machine, a virtual machine, a server (software) up and running, etc.
In the case of EC2, you might want to read Amazon EC2 Instance Types.
Here is a real example:
This specific configuration is hosted at RackSpace in their Managed Colo group.
Requests pass through a Cisco firewall. They are then routed across a gigabit LAN to a Cisco CSS 11501 Content Services Switch (i.e., a load balancer). The load balancer matches the incoming content to a content rule, handles the SSL decryption if necessary, and then forwards the traffic to one of several back-end web servers.
Every 5 seconds, the load balancer requests a URL on each webserver. If the webserver fails (two times in a row, IIRC) to respond with the correct value, that server is not sent any traffic until the URL starts responding correctly.
Further behind the webservers is a MySQL master/slave configuration. Connections may be made to the master (for transactions) or to the slaves for read-only requests.
Memcached is installed on each of the webservers, with 1 GB of RAM dedicated to caching. Each web application may utilize the cluster of memcache servers to cache all kinds of content.
Deployment is handled using rsync to sync specific directories on a management server out to each webserver. Apache restarts, etc., are handled through similar scripting over ssh from the management server.
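A sketch of what that rsync/ssh deployment loop might look like (the hostnames and paths are made up):

# Run on the management server: push the app directory to each webserver,
# then gracefully restart Apache over ssh.
for host in web01 web02 web03; do
    rsync -az --delete /srv/deploy/app/ "$host:/var/www/app/"
    ssh "$host" "apachectl graceful"
done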
The amount of traffic that can be handled through this configuration is significant. The advantages of easy scaling and easy maintenance are great as well.
For clustering, any web request would be handled by a load balancer, which, being kept up to date on the current load of the servers forming the cluster, sends the request to the least burdened server. As for whether one running server is called an "instance"... I believe so, but I'd wait for confirmation on that first.
You'd need a very large application to be bothered with thinking about clustering and the "fun" that comes with it, software- and hardware-wise, though. Unless you're looking to start, or are already running, something big, it wouldn't be anything to worry about.
Yes, it can be required for clustering. Typically, as the load goes up, you might find yourself with a frontend server that does URL rewriting, HTTPS if required, and caching (with Squid, say). The requests get passed on to multiple backend servers, probably using cookies to associate a session with a particular backend if necessary. You might also have the database on a separate server.
I should add that there are other reasons why you might need multiple servers; for instance, there may be a requirement that the database is not on the frontend server for security reasons.
