Could anyone please tell me what the use of IP spoofing is in performance testing?
There are two main reasons for using IP spoofing while load testing a web application:
Routing stickiness (a.k.a. persistence) - Many load balancers use IP stickiness when distributing incoming load across application servers. So, if you generate all the load from the same IP, you could end up loading only one application server instead of distributing the load across all of them (this is also called persistence: using application-layer information to stick a client to a single server). Using IP spoofing, you avoid this stickiness and make sure your load is distributed across all application servers, as the sketch below illustrates.
IP blocking - Some web applications detect a mass of HTTP requests coming from the same IP and block them to defend themselves. When you use IP spoofing, you avoid being detected as a harmful source.
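As an illustration of what happens under the hood when a load generator spreads traffic across several source IPs: each outgoing TCP connection can be bound to a different local IP alias before connecting. A minimal Java sketch, assuming the aliases 192.168.1.101 and 192.168.1.102 are already configured on the load generator's network interface (these addresses and the target host are placeholders):

    import java.net.InetAddress;
    import java.net.InetSocketAddress;
    import java.net.Socket;

    public class SourceIpDemo {
        public static void main(String[] args) throws Exception {
            // Hypothetical aliases configured on the local network interface
            String[] aliases = {"192.168.1.101", "192.168.1.102"};
            for (String alias : aliases) {
                try (Socket socket = new Socket()) {
                    // Bind the outgoing connection to a specific source IP (port 0 = any free port)
                    socket.bind(new InetSocketAddress(InetAddress.getByName(alias), 0));
                    socket.connect(new InetSocketAddress("app.example.com", 80));
                    System.out.println("Connected from " + socket.getLocalAddress());
                }
            }
        }
    }

From the load balancer's point of view, each connection now arrives from a different client address, defeating IP-based stickiness and per-IP rate limits.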
When it comes to load testing web applications, a well-behaved test should represent a real user using a real browser as closely as possible, with everything that entails (see the sketch after this list):
Cookies
Headers
Cache
Handling of "embedded resources" (images, scripts, styles, fonts, etc.)
Think times
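For instance, here is a minimal sketch using the JDK's built-in java.net.http.HttpClient (the URL and header values are placeholders) that keeps cookies between iterations, sends browser-like headers, and pauses for a think time:

    import java.net.CookieManager;
    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class RealisticUserDemo {
        public static void main(String[] args) throws Exception {
            // Shared cookie jar so the simulated "user" keeps its session across requests
            HttpClient client = HttpClient.newBuilder()
                    .cookieHandler(new CookieManager())
                    .build();
            for (int i = 0; i < 3; i++) {
                HttpRequest request = HttpRequest.newBuilder(URI.create("https://app.example.com/"))
                        // Browser-like headers
                        .header("User-Agent", "Mozilla/5.0")
                        .header("Accept-Encoding", "gzip, deflate")
                        .GET()
                        .build();
                HttpResponse<String> response =
                        client.send(request, HttpResponse.BodyHandlers.ofString());
                System.out.println("Iteration " + i + ": " + response.statusCode());
                Thread.sleep(2000); // think time between iterations
            }
        }
    }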
You might need to simulate requests originating from different IP addresses if your application (or its infrastructure, like a load balancer) assumes that each user has a unique IP address. Also, DNS caching at the operating system or JVM level may lead to a situation where all your requests are basically hitting only one endpoint while the others remain idle. So, if there is a possibility, it is better to mimic the requests so that they come from different addresses.
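On the JVM side, the DNS cache behaviour can be tuned via security properties, as long as they are set before the first lookup happens. A minimal sketch (the TTL values are illustrative):

    import java.security.Security;

    public class DnsCacheDemo {
        public static void main(String[] args) {
            // Disable positive DNS caching: re-resolve on every lookup,
            // so rotated DNS records actually spread load across endpoints
            Security.setProperty("networkaddress.cache.ttl", "0");
            // Keep negative (failed) lookups cached only briefly
            Security.setProperty("networkaddress.cache.negative.ttl", "10");
        }
    }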
Related
I wrote a JMeter test (that uses different user credentials) to load test a web app which has a load balancer, but it forwards all the requests to a single node. How can I solve this?
I used the DNS Cache Manager but that did not work.
Are there any other tools I could use? (I looked into AWS load testing, but that won't work either, because all the containers would get the same set of user credentials, and when parallel tests are run they would fail.)
It depends on the load balancing mechanism used in your load balancer; it might be the case that it's looking at the source IP address and forwarding requests from the same IP to the same backend node. You can try using multiple IP addresses (or aliases) and see whether it makes a difference. See the IP Spoofing With JMeter: How to Simulate Requests from Different IP Addresses article for more details.
Also, adding the DNS Cache Manager might not be sufficient; you can try configuring a custom DNS resolver, e.g. 1.1.1.1, as the DNS server so that each thread resolves the underlying IP address on its own.
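As an illustration, resolving a hostname against a specific DNS server rather than the OS resolver could look like this with the dnsjava library (the hostname is a placeholder):

    import org.xbill.DNS.Lookup;
    import org.xbill.DNS.Record;
    import org.xbill.DNS.SimpleResolver;
    import org.xbill.DNS.Type;

    public class CustomDnsDemo {
        public static void main(String[] args) throws Exception {
            // Ask 1.1.1.1 directly instead of relying on the OS/JVM cache
            Lookup lookup = new Lookup("app.example.com", Type.A);
            lookup.setResolver(new SimpleResolver("1.1.1.1"));
            Record[] records = lookup.run();
            if (records != null) {
                for (Record record : records) {
                    System.out.println(record.rdataToString());
                }
            }
        }
    }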
I've used Gatling and Siege to load test my application. However, at certain points (especially when my load is higher), I would get a lot of gateway and requestTimeoutException errors. Since the requests don't even seem to reach the app, I presume the issue is my IP address being blocked due to the influx of traffic from one IP address. How do you overcome this? I'm assuming that the users Gatling and Siege create to send concurrent requests are all under the same IP as my machine?
This is not possible with Gatling; the relevant feature request has been closed. You might want to consider using Apache JMeter instead: JMeter's HTTP Request sampler has a "Source IP" field where you can put the needed IP address or alias.
More information: Using IP Spoofing to Simulate Requests from Different IP Addresses with JMeter
In my use case, an outside application would be making multiple requests to my application for different users, and I need to relay all the requests for a particular user to the same physical server through my load balancer.
I thought of using sticky sessions, but since the originating address would be the same for all the requests, I'm not sure how I should go about it.
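One common way around this is to stick on an application-layer value instead of the source address, e.g. a user identifier carried in a header or cookie, so the balancer keys on the user rather than the IP. A toy sketch of the idea (the backend list and user IDs are made up):

    import java.util.List;

    public class UserStickinessDemo {
        static final List<String> BACKENDS = List.of("10.0.0.1", "10.0.0.2", "10.0.0.3");

        // Always map the same user to the same backend, regardless of source IP
        static String backendFor(String userId) {
            return BACKENDS.get(Math.floorMod(userId.hashCode(), BACKENDS.size()));
        }

        public static void main(String[] args) {
            System.out.println(backendFor("alice")); // same user -> same backend every time
            System.out.println(backendFor("bob"));
        }
    }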
In server-side load balancing, the clients call an intermediate server, which then decides which instance of the actual server (or microservice) to call.
In client-side load balancing too, the clients call an intermediate server (the API gateway - Zuul, for instance - configured with a load balancer - Ribbon, for instance - and a naming server - Eureka, for instance), which then decides which instance of the microservice to call.
Unless we include the API gateway as part of the client, the client still doesn't know the IP address of the exact server to which it should send the request. Seems to me, to be a lot like server-side load-balancing. Is there something I'm missing?
(Including the API gateway as part of the client seems weird, since it's usually deployed on a different server from the client.)
In client-side load balancing, the client does the heavy lifting of discovery and connection to the origin server. The client may reference a lookup service (Eureka, Consul, maybe DDNS) to discover what the end destination is, and the registry will dole out a valid origin. The communication is direct, client to server, without a middleman.
In server-side load balancing, the client is dumb and makes a call to a predetermined address (usually DNS or a static IP). That device then proxies the connection (at the TCP or protocol level) to the origin server based on a lookup, heartbeats, etc.
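To make the contrast concrete, here is a toy sketch; the ServiceRegistry interface stands in for a naming server like Eureka or Consul, and the addresses are made up. With client-side load balancing the client consults the registry and connects directly; with server-side load balancing it only ever calls one predetermined address:

    import java.util.List;
    import java.util.concurrent.ThreadLocalRandom;

    public class LoadBalancingDemo {
        // Hypothetical registry, standing in for Eureka/Consul
        interface ServiceRegistry {
            List<String> instancesOf(String serviceName);
        }

        // Client-side: the client picks an instance itself and calls it directly
        static String clientSidePick(ServiceRegistry registry, String service) {
            List<String> instances = registry.instancesOf(service);
            return instances.get(ThreadLocalRandom.current().nextInt(instances.size()));
        }

        // Server-side: the client only ever knows one predetermined address;
        // the balancer behind it decides which origin server handles the call
        static String serverSideAddress() {
            return "lb.example.com";
        }

        public static void main(String[] args) {
            ServiceRegistry registry = service -> List.of("10.0.0.1:8080", "10.0.0.2:8080");
            System.out.println("client-side picked: " + clientSidePick(registry, "catalog"));
            System.out.println("server-side calls:  " + serverSideAddress());
        }
    }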
I've seen benefits in client-side routing in that, as long as you have IP connectivity between client and server, adding new services, locations, products, apps, etc. is trivial for the infrastructure. As long as the new server can "register" with the registry and the client has IP access to the server, it just works, and IT does not have to be involved in rolling out your new service.
The drawback is that it makes the client a little heavier, it requires direct IP access from client to server, and it may be confusing for traditional IT folks and auditors. Each client needs to be aware of the registry and have code to make the calls (or use a sidecar/sidekick).
I've seen it in practice where a group started to transition their apps to a Docker environment: they were able to run their Docker-based apps alongside their non-Docker versions at the same time without having to get IT involved, and to do a lot of experimentation and testing quickly and autonomously.
If you have autonomous teams, are highly advanced on the devops spectrum, and have a lot of trust with your teams, Client Side routing and load balancing may be a good experience for you.
Is that called "clustering" of servers? When a web request is sent, does it go through the main server, and if the main server can't handle the extra load, then it forwards it to the secondary servers that can handle the load? Also, is one "server" that's up and running the application called an "instance"?
[...] Is that called "clustering" of servers?
Clustering is indeed transparently using multiple nodes that are seen as a single entity: the cluster. Clustering allows you to scale: you can spread your load across all the nodes and, if you need more power, you can add more nodes (short version). Clustering also allows you to be fault tolerant: if one node (physical or logical) goes down, the other nodes can still process requests and your service remains available (short version).
When a web request is sent, does it go through the main server, and if the main server can't handle the extra load, then it forwards it to the secondary servers that can handle the load?
In general, this is the job of a dedicated component called a "load balancer" (hardware or software) that can use many algorithms to balance the requests: round-robin, FIFO, LIFO, load-based...
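For example, round-robin boils down to a counter over the list of backends. A minimal sketch (the backend addresses are made up):

    import java.util.List;
    import java.util.concurrent.atomic.AtomicInteger;

    public class RoundRobinDemo {
        static final List<String> SERVERS = List.of("10.0.0.1", "10.0.0.2", "10.0.0.3");
        static final AtomicInteger COUNTER = new AtomicInteger();

        // Each call hands out the next server in rotation, thread-safely
        static String next() {
            return SERVERS.get(Math.floorMod(COUNTER.getAndIncrement(), SERVERS.size()));
        }

        public static void main(String[] args) {
            for (int i = 0; i < 5; i++) {
                System.out.println(next()); // .1, .2, .3, .1, .2
            }
        }
    }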
In the case of EC2, you previously had to load balance with round-robin DNS and/or HA Proxy. See Introduction to Software Load Balancing with Amazon EC2. But for some time now, Amazon has launched load balancing and auto-scaling (beta) as part of their EC2 offerings. See Elastic Load Balancing.
Also, is one "server" that's up and running the application called an "instance"?
Actually, an instance can be many things (depending on who's speaking): a machine, a virtual machine, a server (software) up and running, etc.
In the case of EC2, you might want to read Amazon EC2 Instance Types.
Here is a real example:
This specific configuration is hosted at RackSpace in their Managed Colo group.
Requests pass through a Cisco firewall. They are then routed across a gigabit LAN to a Cisco CSS 11501 Content Services Switch (i.e., a load balancer). The load balancer matches the incoming content to a content rule, handles the SSL decryption if necessary, and then forwards the traffic to one of several back-end web servers.
Every 5 seconds, the load balancer requests a URL on each webserver. If a webserver fails (two times in a row, IIRC) to respond with the correct value, that server is not sent any traffic until the URL starts responding correctly again.
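The logic described above amounts to a periodic probe with a failure counter. A rough sketch of the idea (the check URL, 5-second interval, and two-failure threshold mirror the description; everything else is illustrative):

    import java.net.HttpURLConnection;
    import java.net.URL;

    public class HealthCheckDemo {
        public static void main(String[] args) throws Exception {
            int consecutiveFailures = 0;
            boolean inRotation = true;
            while (true) {
                boolean healthy = probe("http://10.0.0.1/health"); // hypothetical check URL
                if (healthy) {
                    consecutiveFailures = 0;
                    inRotation = true; // responding correctly again: resume traffic
                } else if (++consecutiveFailures >= 2) {
                    inRotation = false; // two failures in a row: pull from rotation
                }
                System.out.println("in rotation: " + inRotation);
                Thread.sleep(5000); // poll every 5 seconds
            }
        }

        static boolean probe(String url) {
            try {
                HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
                conn.setConnectTimeout(2000);
                return conn.getResponseCode() == 200;
            } catch (Exception e) {
                return false;
            }
        }
    }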
Further behind the webservers is a MySQL master/slave configuration. Connections may be made to the master (for transactions) or to the slaves for read-only requests.
Memcached is installed on each of the webservers, with 1 GB of RAM dedicated to caching. Each web application may utilize the cluster of memcached servers to cache all kinds of content.
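As an illustration of how a web application might talk to that memcached tier, here is a minimal sketch using the spymemcached client (hostnames and the cached key/value are placeholders):

    import java.net.InetSocketAddress;
    import net.spy.memcached.MemcachedClient;

    public class MemcacheDemo {
        public static void main(String[] args) throws Exception {
            // Point the client at the memcached instances running on each webserver
            MemcachedClient cache = new MemcachedClient(
                    new InetSocketAddress("web1.example.com", 11211),
                    new InetSocketAddress("web2.example.com", 11211));
            // Cache a rendered fragment for an hour, then read it back
            cache.set("homepage:fragment", 3600, "<div>...</div>");
            Object value = cache.get("homepage:fragment");
            System.out.println(value);
            cache.shutdown();
        }
    }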
Deployment is handled using rsync to sync specific directories on a management server out to each webserver. Apache restarts, etc. are handled through similar scripting over SSH from the management server.
The amount of traffic that can be handled through this configuration is significant. The advantages of easy scaling and easy maintenance are great as well.
For clustering, any web request would be handled by a load balancer which, being kept up to date on the current loads of the servers forming the cluster, sends the request to the least burdened server. As for whether it's an instance... I believe so, but I'd wait for confirmation on that first.
You'd need a very large application to be bothered with thinking about clustering and the "fun" that comes with it, software- and hardware-wise, though. Unless you're looking to start, or are already running, something big, it wouldn't be anything to worry about.
Yes, it can be required for clustering. Typically, as the load goes up, you might find yourself with a frontend server that does URL rewriting, HTTPS if required, and caching with Squid, say. The requests get passed on to multiple backend servers, probably using cookies to associate a session with a particular backend if necessary. You might have the database on a separate server as well.
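A toy sketch of that cookie-based association (backend addresses and cookie handling are simplified; a real frontend would do this in its proxy layer):

    import java.util.List;

    public class CookieAffinityDemo {
        static final List<String> BACKENDS = List.of("10.0.0.1", "10.0.0.2");
        static int counter = 0;

        // If the request carries a backend cookie, honour it; otherwise assign one
        static String route(String backendCookie) {
            if (backendCookie != null && BACKENDS.contains(backendCookie)) {
                return backendCookie; // stick the session to its existing backend
            }
            return BACKENDS.get(counter++ % BACKENDS.size()); // new session: pick one
        }

        public static void main(String[] args) {
            String first = route(null);       // new session gets assigned a backend
            System.out.println(route(first)); // later requests stick to it
        }
    }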
I should add that there are other reasons why you might need multiple servers; for instance, there may be a requirement that the database not be on the frontend server, for security reasons.