How should I map multiple requests to the same physical server through my load balancer if the source IP is constant?

In my use case, an outside application would be making multiple requests to my application for different users, and I need to relay all the requests for a particular user to the same physical server through my load balancer.
I thought of using sticky sessions, but since the originating address would be the same for all requests, I'm not sure how I should go about it.
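A common way around this is to key stickiness on an application-level user identifier instead of the source IP. Here is a minimal Go sketch of that idea (my illustration, not something from this thread; the X-User-Id header and backend names are assumptions):

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// Hypothetical pool of application servers behind the balancer.
var backends = []string{
	"app-1.internal:8080",
	"app-2.internal:8080",
	"app-3.internal:8080",
}

// pickBackend hashes a stable per-user key (e.g. the value of an
// X-User-Id header sent by the outside application) so that every
// request for that user lands on the same backend, regardless of
// the source IP it arrives from.
func pickBackend(userID string) string {
	h := fnv.New32a()
	h.Write([]byte(userID))
	return backends[h.Sum32()%uint32(len(backends))]
}

func main() {
	fmt.Println(pickBackend("user-42")) // always the same backend
	fmt.Println(pickBackend("user-43")) // possibly a different one
}
```

Most load balancers can do the equivalent natively (e.g. hashing on a header or cookie), so the point of the sketch is only the routing key, not the implementation.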

Related

Load testing a web app which has a load balancer

I wrote a JMeter test (that uses different user credentials) to load test a web app which has a load balancer, but the balancer forwards all the requests to a single node. How can I solve this?
I used the DNS Cache manager but that did not work.
Are there any other tools I could use? (I looked into AWS load testing, but that too won't work, because all the containers would get the same set of user credentials, and when parallel tests are run they would fail.)
It depends on the load balancing mechanism used in your load balancer; it might be the case that it's looking at the source IP address and forwarding requests from the same IP to the same backend node. You can try using multiple IP addresses (or aliases) and see whether it makes a difference. See the IP Spoofing With JMeter: How to Simulate Requests from Different IP Addresses article for more details.
Also, adding the DNS Cache Manager might not be sufficient; you can try configuring a custom DNS resolver, e.g. 1.1.1.1 as the DNS server, so each thread resolves the underlying IP address on its own.
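To make the multiple-source-address idea concrete at the socket level, here is a Go sketch (rather than JMeter configuration) that binds each outgoing connection to a different local alias so the balancer sees distinct client IPs; the alias addresses and target URL are assumptions:

```go
package main

import (
	"fmt"
	"net"
	"net/http"
)

// clientFor returns an HTTP client whose TCP connections are bound to
// the given local alias IP, so the load balancer sees a distinct
// source address per "virtual user". The alias must already exist on
// a local interface (e.g. added with `ip addr add`).
func clientFor(localIP string) *http.Client {
	dialer := &net.Dialer{
		LocalAddr: &net.TCPAddr{IP: net.ParseIP(localIP)},
	}
	return &http.Client{
		Transport: &http.Transport{DialContext: dialer.DialContext},
	}
}

func main() {
	for _, ip := range []string{"10.0.0.11", "10.0.0.12"} {
		client := clientFor(ip)
		resp, err := client.Get("http://app.example.com/")
		if err != nil {
			fmt.Println(ip, err)
			continue
		}
		resp.Body.Close()
		fmt.Println(ip, resp.Status)
	}
}
```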

Why should we use IP spoofing when performance testing?

Could anyone please tell me what the use of IP spoofing is in terms of performance testing?
There are two main reasons for using IP spoofing while load testing a web application:
Routing stickiness (a.k.a. persistence) - Many load balancers use IP stickiness when distributing incoming load across application servers. So, if you generate the load from the same IP, you could load only one application server instead of distributing the load to all application servers (this is also called persistence: using application-layer information to stick a client to a single server). Using IP spoofing, you avoid this stickiness and make sure your load is distributed across all application servers.
IP Blocking - Some web applications detect a mass of HTTP requests coming from the same IP and block them to defend themselves. When you use IP spoofing you avoid being detected as a harmful source.
When it comes to load testing web applications, a well-behaved test should represent a real user using a real browser as closely as possible, with all its trappings, like:
Cookies
Headers
Cache
Handling of "embedded resources" (images, scripts, styles, fonts, etc.)
Think times
You might need to simulate requests originating from different IP addresses if your application (or its infrastructure, like the load balancer) assumes that each user has a unique IP address. Also, DNS caching at the operating-system or JVM level may lead to a situation where all your requests hit only one endpoint while the others remain idle. So, if possible, it is better to mimic the requests so that they come from different addresses.
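As a rough illustration of resolving DNS per request through an explicit server instead of the OS or JVM cache (a Go sketch, not JMeter; the 1.1.1.1 resolver echoes the earlier suggestion and the hostname is a placeholder):

```go
package main

import (
	"context"
	"fmt"
	"net"
	"time"
)

func main() {
	// Resolve through an explicit DNS server on every lookup,
	// bypassing any OS- or runtime-level DNS cache.
	resolver := &net.Resolver{
		PreferGo: true,
		Dial: func(ctx context.Context, network, address string) (net.Conn, error) {
			d := net.Dialer{Timeout: 2 * time.Second}
			return d.DialContext(ctx, network, "1.1.1.1:53")
		},
	}

	// Successive lookups may return the A records in a different
	// order, so connections spread across all nodes behind the name.
	ips, err := resolver.LookupIPAddr(context.Background(), "app.example.com")
	if err != nil {
		panic(err)
	}
	for _, ip := range ips {
		fmt.Println(ip.IP)
	}
}
```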

User state in a big (high traffic) application

Assumptions -
There are 4 servers sitting behind a reverse proxy which acts as a load balancer
Load Balancer is purely load balancing and sends a request to any of the 4 servers depending on their current load
Users need to be authenticated to access this application, and some component must hold the state of all users, as the reverse proxy is only load balancing
Application needs to scale beyond 4 servers, say to 4000 servers.
Question -
In a large-scale multi-server system, which component holds the state of all users - the load balancer, each server, or a separate server?
Is the state of all users saved on all servers so that the load balancer can send a request to any server? How does this scale to 100M users?
You can use sticky sessions. It enables the load balancer to bind a user's session to a specific instance. This ensures that all requests from the user during the session are sent to the same instance. Read Sticky and NON-Sticky sessions.
Also, say the instance gets killed for some reason; in order to maintain state, the authentication token and other information can also be saved in a separate Redis cache, which is much faster to query. Read Session Management in microservices.
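A minimal sketch of that Redis-backed session pattern in Go, assuming a local Redis instance and the github.com/redis/go-redis/v9 client; the key layout, token value, and TTL are made up for illustration:

```go
package main

import (
	"context"
	"fmt"
	"time"

	"github.com/redis/go-redis/v9"
)

func main() {
	ctx := context.Background()
	rdb := redis.NewClient(&redis.Options{Addr: "localhost:6379"})

	// On login: store the session outside any single app server, so
	// a killed instance loses nothing.
	err := rdb.Set(ctx, "session:user-42", "auth-token-abc", 30*time.Minute).Err()
	if err != nil {
		panic(err)
	}

	// On a later request: any server in the pool can rehydrate the
	// user's state, even if the original instance is gone.
	token, err := rdb.Get(ctx, "session:user-42").Result()
	if err != nil {
		panic(err)
	}
	fmt.Println("restored token:", token)
}
```

Because the state lives outside the instances, a replacement instance can pick up exactly where a killed one left off.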
In a stateless multi-server system, a separate server (authentication server) or a separate server cluster (authentication API) holds the state of all users. If it's a single authentication server for a large application, you can expect it to have RAM in the range of hundreds of GBs, maybe more.
No, the state of all users is usually not replicated on all application servers; that would be a huge waste of resources. The authentication server (or server cluster) may act as a load balancer itself or forward all requests to a separate load balancer - true for a stateless application.
In a stateful application, individual servers hold the state of their users through sticky sessions.
If possible, try to keep your application stateless. A stateless application will have better performance and will be easier to scale out than a stateful application!
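One way applications achieve that statelessness (my example, in the spirit of signed tokens such as JWTs, not something from this answer) is to put the session data in a token the client carries, signed with a secret shared by every server, so no server has to remember anything:

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/base64"
	"fmt"
	"strings"
)

// The same secret is deployed to every server in the pool.
var secret = []byte("hypothetical-shared-secret")

// sign produces "payload.signature"; any server holding the secret
// can verify the token without shared session storage.
func sign(payload string) string {
	mac := hmac.New(sha256.New, secret)
	mac.Write([]byte(payload))
	sig := base64.RawURLEncoding.EncodeToString(mac.Sum(nil))
	return payload + "." + sig
}

// verify checks the signature and returns the payload if it is valid.
func verify(token string) (string, bool) {
	i := strings.LastIndex(token, ".")
	if i < 0 {
		return "", false
	}
	payload := token[:i]
	return payload, hmac.Equal([]byte(sign(payload)), []byte(token))
}

func main() {
	t := sign("user-42")
	payload, ok := verify(t)
	fmt.Println(payload, ok) // user-42 true
}
```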

Getting (non-HTTP) Client IP with load-balancer

Say I want to run something like the nyan cat telnet server (http://miku.acm.uiuc.edu/) and I need to handle 10,000 concurrent connections total. I have 10 servers in addition to a load balancer. Each server can handle 1,000 concurrent connections, and I want to put a load balancer in front of it to randomly divide the traffic to the 10 servers.
From what I've read, it's fairly simple for a load balancer to pass an HTTP request (along with the client IP) to the backend server, perhaps with FastCGI or with an X- header.
What would be the simplest way for the load balancer to pass the client IP to the backend server in this case with a simple TCP server? Would a hardware load balancer be needed, or are there ways to do this simply through software?
In other words, is there a uniform way to pass the client IP when load balancing non-HTTP traffic? The same way Google gets the client IP when it load-balances its Google Talk XMPP server or its Gmail IMAP server.
This isn't for anything in specific; I'm just curious about if and how it can be done. Thanks in advance!
The simplest way would be for the load balancer to make itself completely invisible and pass the connection on with the source and destination IP address unmolested. For this to work, the same IP address must be assigned (as a loopback address, not to a physical interface) to all 10 servers and that would be the IP address the clients connect to. Internet traffic to that IP address has to go to the load balancer. The load balancer must be the default gateway for the servers.
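A widely used software alternative, not mentioned in this answer but aimed at exactly the non-HTTP case asked about, is the PROXY protocol (originally from HAProxy): the balancer prepends a single plain-text line carrying the original client address, and the backend reads it before speaking its own protocol. A Go sketch of a backend consuming a PROXY v1 header (the port and greeting are made up):

```go
package main

import (
	"bufio"
	"fmt"
	"net"
	"strings"
)

// readProxyHeader consumes a PROXY protocol v1 line such as
// "PROXY TCP4 203.0.113.7 10.0.0.5 51234 23\r\n" and returns the
// real client address the load balancer saw.
func readProxyHeader(r *bufio.Reader) (string, error) {
	line, err := r.ReadString('\n')
	if err != nil {
		return "", err
	}
	fields := strings.Fields(strings.TrimSpace(line))
	if len(fields) < 6 || fields[0] != "PROXY" {
		return "", fmt.Errorf("not a PROXY v1 header: %q", line)
	}
	return fields[2], nil // the source-address field
}

func main() {
	ln, err := net.Listen("tcp", ":2323") // hypothetical port
	if err != nil {
		panic(err)
	}
	for {
		conn, err := ln.Accept()
		if err != nil {
			continue
		}
		go func(c net.Conn) {
			defer c.Close()
			ip, err := readProxyHeader(bufio.NewReader(c))
			if err != nil {
				return
			}
			fmt.Fprintf(c, "hello, real client %s\r\n", ip)
		}(conn)
	}
}
```

The trade-off versus the transparent setup above is that the backend must understand the extra header, but no special routing or shared loopback addresses are needed.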

When would you need multiple servers to host one web application?

Is that called "clustering" of servers? When a web request is sent, does it go through the main server, and if the main server can't handle the extra load, then it forwards it to the secondary servers that can handle the load? Also, is one "server" that's up and running the application called an "instance"?
[...] Is that called "clustering" of servers?
Clustering is indeed transparently using multiple nodes that are seen as a single entity: the cluster. Clustering allows you to scale: you can spread your load across all the nodes and, if you need more power, you can add more nodes (short version). Clustering also allows you to be fault tolerant: if one node (physical or logical) goes down, the other nodes can still process requests and your service remains available (short version).
When a web request is sent, does it go through the main server, and if the main server can't handle the extra load, then it forwards it to the secondary servers that can handle the load?
In general, this is the job of a dedicated component called a "load balancer" (hardware or software) that can use many algorithms to balance requests: round-robin, FIFO, LIFO, load-based...
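For instance, round-robin, the simplest of those algorithms, just hands requests to the backends in a fixed cycle. A minimal Go sketch (the backend names are placeholders):

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// roundRobin cycles through backends; the atomic counter makes it
// safe to call from concurrent request handlers.
type roundRobin struct {
	backends []string
	next     atomic.Uint64
}

func (rr *roundRobin) pick() string {
	n := rr.next.Add(1) - 1
	return rr.backends[n%uint64(len(rr.backends))]
}

func main() {
	rr := &roundRobin{backends: []string{"node-a", "node-b", "node-c"}}
	for i := 0; i < 5; i++ {
		fmt.Println(rr.pick()) // node-a, node-b, node-c, node-a, node-b
	}
}
```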
In the case of EC2, you previously had to load balance with round-robin DNS and/or HAProxy. See Introduction to Software Load Balancing with Amazon EC2. But for some time now, Amazon has offered load balancing and auto-scaling as part of its EC2 offerings. See Elastic Load Balancing.
Also, is one "server" that's up and running the application called an "instance"?
Actually, an instance can be many things (depending on who's speaking): a machine, a virtual machine, a server (software) up and running, etc.
In the case of EC2, you might want to read Amazon EC2 Instance Types.
Here is a real example:
This specific configuration is hosted at RackSpace in their Managed Colo group.
Requests pass through a Cisco firewall. They are then routed across a gigabit LAN to a Cisco CSS 11501 Content Services Switch (i.e. a load balancer). The load balancer matches the incoming content to a content rule, handles SSL decryption if necessary, and then forwards the traffic to one of several back-end web servers.
Every 5 seconds, the load balancer requests a URL on each web server. If the web server fails (two times in a row, IIRC) to respond with the correct value, that server is not sent any traffic until the URL starts responding correctly again.
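A toy version of that health-check loop in Go, using the 5-second interval and the two-failures rule from the description (the health-check URLs are assumptions):

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

func main() {
	// Hypothetical health-check URLs, one per back-end web server.
	servers := []string{
		"http://web1.internal/health",
		"http://web2.internal/health",
	}
	failures := make(map[string]int)

	for range time.Tick(5 * time.Second) { // poll every 5 seconds
		for _, url := range servers {
			resp, err := http.Get(url)
			healthy := err == nil && resp.StatusCode == http.StatusOK
			if resp != nil {
				resp.Body.Close()
			}
			if healthy {
				failures[url] = 0
				continue
			}
			failures[url]++
			// Two failures in a row: take the server out of rotation
			// until the URL starts responding correctly again.
			if failures[url] >= 2 {
				fmt.Println("removing from rotation:", url)
			}
		}
	}
}
```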
Further behind the web servers is a MySQL master/slave configuration. Connections may be made to the master (for transactions) or to the slaves for read-only requests.
Memcached is installed on each of the web servers, with 1 GB of RAM dedicated to caching. Each web application may utilize the cluster of memcached servers to cache all kinds of content.
Deployment is handled using rsync to sync specific directories on a management server out to each web server. Apache restarts, etc., are handled through similar scripting over SSH from the management server.
The amount of traffic that can be handled through this configuration is significant. The advantages of easy scaling and easy maintenance are great as well.
For clustering, any web request would be handled by a load balancer, which, being kept up to date on the current loads of the servers forming the cluster, sends the request to the least burdened server. As for whether it's called an instance... I believe so, but I'd wait for confirmation on that.
You'd need a very large application to be bothered with clustering and the "fun" that comes with it, software- and hardware-wise, though. Unless you're looking to start or are already running something big, it wouldn't be anything to worry about.
Yes, it can be required for clustering. Typically, as the load goes up, you might find yourself with a frontend server that does URL rewriting, HTTPS if required, and caching with Squid, say. The requests get passed on to multiple backend servers - probably using cookies to associate a session with a particular backend if necessary. You might have the database on a separate server as well.
I should add that there are other reasons why you might need multiple servers; for instance, there may be a requirement that the database is not on the frontend server for security reasons.
