What is the difference between replenishRate and burstCapacity? - spring

In the Redis implementation of the RequestRateLimiter, we must specify two properties, redis-rate-limiter.replenishRate and redis-rate-limiter.burstCapacity, as arguments for the RequestRateLimiter filter.
According to the documentation,
The redis-rate-limiter.replenishRate is how many requests per second
do you want a user to be allowed to do, without any dropped requests.
This is the rate that the token bucket is filled.
The redis-rate-limiter.burstCapacity is the maximum number of requests
a user is allowed to do in a single second. This is the number of
tokens the token bucket can hold. Setting this value to zero will
block all requests.
From what I see, replenishRate is the rate at which requests are being made, and burstCapacity is the maximum number of requests that can be made (both within one second).
However, I can't seem to understand the difference between the two in a practical scenario.

It's easier to grasp with different time units, e.g.:
replenish rate: 1000 requests per minute
burst capacity: 500 requests per second
The former controls that you never get more than 1000 requests in a minute while the latter allows you to support temporary load peaks of up to 500 requests in the same second. You could have one 500 burst in second 0, another 500 burst in second 1 and you would've reached the rate limit (1000 requests within the same minute), so new requests in the following 58 seconds would be dropped.
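To make the token-bucket mechanics concrete, here is a minimal in-memory sketch in Java. It is an illustration only, not Spring Cloud Gateway's actual Redis/Lua implementation; for the example above you would construct it as new TokenBucket(1000.0 / 60, 500), i.e. roughly 16.7 tokens replenished per second with room for a 500-request spike.

    import java.util.concurrent.TimeUnit;

    // Minimal token-bucket sketch: "replenish rate" is how fast tokens are added back,
    // "burst capacity" is the bucket size and therefore the largest spike you can absorb.
    class TokenBucket {
        private final double replenishPerNano; // tokens added per nanosecond
        private final double burstCapacity;    // maximum tokens the bucket can hold
        private double tokens;
        private long lastRefill = System.nanoTime();

        TokenBucket(double replenishPerSecond, double burstCapacity) {
            this.replenishPerNano = replenishPerSecond / TimeUnit.SECONDS.toNanos(1);
            this.burstCapacity = burstCapacity;
            this.tokens = burstCapacity; // start full, so an initial burst is allowed
        }

        synchronized boolean tryConsume() {
            long now = System.nanoTime();
            // Refill according to the replenish rate, but never beyond the burst capacity.
            tokens = Math.min(burstCapacity, tokens + (now - lastRefill) * replenishPerNano);
            lastRefill = now;
            if (tokens >= 1) {   // a request costs one token
                tokens -= 1;
                return true;     // request allowed
            }
            return false;        // request rejected (SCG would answer HTTP 429 here)
        }
    }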
In the context of Spring Cloud Gateway (SCG) the documentation is kind of ambiguous (the rate limiter needs to be allowed some time...):
A steady rate is accomplished by setting the same value in
replenishRate and burstCapacity. Temporary bursts can be allowed by
setting burstCapacity higher than replenishRate. In this case, the
rate limiter needs to be allowed some time between bursts (according
to replenishRate), as two consecutive bursts will result in dropped
requests (HTTP 429 - Too Many Requests).
Extrapolating from the previous example I'd say that SCG works like this:
replenish rate: 1000 requests per second
burst capacity: 2000 requests per second
You are allowed a burst (peak) of 2000 requests within the same second (second 0). Since your replenish rate is 1000 rps, that burst consumes two seconds' worth of allowance, so you couldn't send another burst of that size until second 2, once the bucket has refilled.
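For completeness, the same numbers can be wired up through SCG's Java DSL roughly as below. This is a sketch based on the documented RedisRateLimiter(replenishRate, burstCapacity) constructor; the route id, path, backend URI and the per-IP KeyResolver are made-up examples, and the exact filter-config method names may vary between SCG versions (most setups declare the same thing in application.yml instead).

    import org.springframework.cloud.gateway.filter.ratelimit.KeyResolver;
    import org.springframework.cloud.gateway.filter.ratelimit.RedisRateLimiter;
    import org.springframework.cloud.gateway.route.RouteLocator;
    import org.springframework.cloud.gateway.route.builder.RouteLocatorBuilder;
    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.Configuration;
    import reactor.core.publisher.Mono;

    @Configuration
    public class RateLimitConfig {

        @Bean
        public RedisRateLimiter redisRateLimiter() {
            // replenishRate = 1000 tokens added per second, burstCapacity = 2000 tokens maximum
            return new RedisRateLimiter(1000, 2000);
        }

        @Bean
        public KeyResolver ipKeyResolver() {
            // Illustrative key resolver: one bucket per client IP address
            return exchange -> Mono.just(
                    exchange.getRequest().getRemoteAddress().getAddress().getHostAddress());
        }

        @Bean
        public RouteLocator routes(RouteLocatorBuilder builder,
                                   RedisRateLimiter limiter, KeyResolver keyResolver) {
            return builder.routes()
                    .route("limited-route", r -> r.path("/api/**")
                            .filters(f -> f.requestRateLimiter(c -> {
                                c.setRateLimiter(limiter);
                                c.setKeyResolver(keyResolver);
                            }))
                            .uri("http://example.org")) // hypothetical backend
                    .build();
        }
    }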

Related

jmeter thread group vs Constant Throughput Timer

I am conducting a performance (TPS) test using JMeter.
I am targeting about 10,000 TPS, but the following two setups give different results.
(Assume the system can respond normally at 10,000 TPS.)
1000 threads x 600 target throughput (in samples per minute)
100 threads x 6000 target throughput (in samples per minute)
I think the two results should be the same, so why does the response time increase as the number of threads increases?
"I think the two results should be the same" - why would they be the same?
Let's imagine your system has a fixed response time of 1 second. In that case:
With 1000 threads you will get 1000 requests per second and you can limit the throughput to 10 requests per second using the Constant Throughput Timer
With 100 threads you will get 100 requests per second, no limiting is required
And what if response time is 2 seconds?
With 1000 threads you will get 500 requests per second
With 100 threads you will get 50 requests per second
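In other words, the ceiling is simply threads divided by response time (Little's Law), and the Constant Throughput Timer can only slow threads down to a target below that ceiling, never push them above it. A quick back-of-the-envelope check with the numbers above:

    // Little's Law sketch: maximum achievable throughput = threads / responseTime.
    public class ThroughputCeiling {
        public static void main(String[] args) {
            double targetTps = 10_000;
            int[] threadCounts = {100, 1000};
            double[] responseTimesSec = {1.0, 2.0};

            for (double rt : responseTimesSec) {
                for (int threads : threadCounts) {
                    System.out.printf("threads=%d, responseTime=%.1fs -> ceiling=%.0f req/s%n",
                            threads, rt, threads / rt);
                }
                // Threads actually required to reach 10,000 TPS at this response time:
                System.out.printf("threads needed for %.0f TPS at %.1fs response time = %.0f%n",
                        targetTps, rt, targetTps * rt);
            }
        }
    }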
Constant Throughput Timer:
acts precisely enough only on the "minute" scale; if your test lasts less than a minute it might not apply the throughput at all
can only pause the threads to limit the throughput (requests per minute) to the desired value; if the current number of threads is not enough to produce the required load, the timer won't have any effect
If you want to send requests at a rate of 10,000 TPS, it is worth considering the Throughput Shaping Timer and Concurrency Thread Group combination, connected via the Feedback Function; in this case JMeter will be able to kick off extra threads if the current number is not sufficient.
But also be informed that:
JMeter should be able to start as many threads as needed to send 10,000 TPS, so make sure to follow the JMeter Best Practices or even consider going for Distributed Testing mode.
The application needs to be able to handle the load and respond fast enough; JMeter waits for the previous response before starting the next request, so if the application can only serve, say, 5,000 requests per second, you won't be able to reach 10,000 by any means.

Read Throughputs in Summary Report

I have a hard time understanding the throughput for multiple requests versus the total throughput.
For example, I send 10 requests to the app server (request 1, request 2, and so on) and I get results for each of them and for the total; for example, request 1 shows 17/sec, request 2 shows 18/sec, and the Total throughput is 115/sec.
So is the application throughput 17/sec or 115/sec?
I don't know how to interpret the results.
I think you are confusing response time with throughput.
Throughput is measured in requests per second: it shows how many requests were completed in one second (in the saved results, 30.0 requests/minute is stored as 0.5).
Each sampler row in the Summary Report shows that sampler's own throughput, while the Total row shows the combined throughput of all samplers over the whole test, which is why 115/sec is higher than any individual figure.
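To illustrate the arithmetic with hypothetical numbers close to yours (the sample counts and test duration below are assumptions; only the calculation matters):

    // Summary Report arithmetic: each sampler row is that sampler's own rate,
    // the Total row is all samples of all samplers divided by the same elapsed time.
    public class SummaryThroughput {
        public static void main(String[] args) {
            double elapsedSeconds = 60;      // assumed test duration
            double request1Samples = 1_020;  // assumed number of "request 1" samples

            System.out.printf("request 1 throughput: %.1f/sec%n",
                    request1Samples / elapsedSeconds);                        // ~17/sec

            double totalSamples = 6_900;     // assumed samples across all 10 requests
            System.out.printf("total throughput: %.1f/sec%n",
                    totalSamples / elapsedSeconds);                           // ~115/sec

            // And the unit conversion from the docs: 30.0 requests/minute is stored as 0.5/sec.
            System.out.printf("30 requests/minute = %.1f requests/second%n", 30.0 / 60.0);
        }
    }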

Azure Table Increased Latency

I'm trying to create an app which can efficiently write data into Azure Table storage. In order to test storage performance, I created a simple console app which sends hardcoded entities in a loop. Each entity is 0.1 kB, and data is sent in batches (100 items per batch, about 10 kB per batch). For every batch I prepare entities with the same partition key, which is generated by incrementing a global counter, so I never send more than one request to the same partition. I also control the degree of parallelism by increasing or decreasing the number of threads. Each thread sends batches synchronously (no request overlapping).
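For reference, the shape of that write loop is roughly as follows. This is a Java sketch of the pattern described above, not the real code (the actual app is C# using the CosmosDB.Table SDK); sendBatch is a hypothetical stand-in for the SDK's batch-execute call.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.atomic.AtomicLong;
    import java.util.function.BiConsumer;

    // Pattern sketch: N worker threads, each sending 100-entity batches synchronously,
    // with a fresh partition key per batch so no two batches target the same partition.
    public class BatchWriterSketch {
        private static final AtomicLong partitionCounter = new AtomicLong();

        public static void run(int threads, int batchesPerThread,
                               BiConsumer<String, List<String>> sendBatch) {
            ExecutorService pool = Executors.newFixedThreadPool(threads);
            for (int t = 0; t < threads; t++) {
                pool.execute(() -> {
                    for (int b = 0; b < batchesPerThread; b++) {
                        // One unique partition key per batch (global incrementing counter)
                        String partitionKey = "pk-" + partitionCounter.incrementAndGet();
                        List<String> batch = new ArrayList<>();
                        for (int i = 0; i < 100; i++) {
                            batch.add("entity-" + i); // ~0.1 kB hardcoded entity in the real test
                        }
                        // Synchronous send: the thread waits, so requests never overlap per thread
                        sendBatch.accept(partitionKey, batch);
                    }
                });
            }
            pool.shutdown();
        }
    }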
If I use 1 thread, I see 5 requests per second (5 batches, 500 entities). At that point the Azure portal metrics show table latency below 100 ms, which is quite good.
If I increase the number of threads to 12, I see a 12x increase in outgoing requests. This rate stays stable for a few minutes, but then, for some reason, I start being throttled: latency increases and the request rate drops.
Below you can see the account metrics; the highlighted point shows 2.31K transactions (batches) per minute, which is 3,850 entities per second. If the thread count is increased to 50, latency rises to 4 seconds and the transaction rate drops to 700 requests per second.
According to the documentation, I should be able to send up to 20K transactions per second within one account (my test account is used only for this performance test), and 20K batches would mean 2M entities. So the question is: why am I being throttled at under 4K entities per second?
Test details:
Azure Datacenter: West US 2.
My location: Los Angeles.
App is written in C# and uses the CosmosDB.Table NuGet package with the following configuration: ServicePointManager.DefaultConnectionLimit = 250, Nagle's algorithm disabled.
Host machine is quite powerful, with a 1 Gbps internet link (i7, 8 cores; no high CPU or memory usage observed during the test).
PS: I've read docs
The system's ability to handle a sudden burst of traffic to a partition is limited by the scalability of a single partition server until the load balancing operation kicks-in and rebalances the partition key range.
and waited for 30 mins, but the situation didn't change.
EDIT
I got a comment that E2E latency doesn't necessarily reflect a server-side problem.
So below is a new graph which shows not only the E2E latency but also the server-side latency. As you can see, they are almost identical, which makes me think the source of the problem is not on the client side.

Why would AWS API Gateway let cached queries through to the backend?

I have a GET method in AWS API Gateway. The cache is enabled for the stage and works for most requests. However, some requests seem to slip through to the backend no matter what I do; that is, some requests going through the API are not cached.
I have defined the parameters a, b & c to be cached by checking their respective "caching" box under the request settings. There are also other parameters which are not cached.
The request can either have all three parameters or just one:
example.com/?a=foo&b=bar&c=baz&d=qux
example.com/?a=foo&d=qux
a, b & c can take on between 3 and 25 different values. But a can only have one value if b & c are present. Also b cannot be present without c and vice versa.
As an example, say the cache's TTL is 60 seconds and I send this between time 0 and 10:
example.com/?a=foo&b=bar&c=baz&d=qux
example.com/?a=quux&d=qux
example.com/?a=foo&b=quux&c=baz&d=qux
example.com/?a=foo&b=corge&c=fred&d=qux
example.com/?a=baz&d=qux
And then between time 30 and 40 I send the same requests, and I might see the following in the backend log (i.e. these requests reached the backend):
example.com/?a=foo&b=bar&c=baz&d=qux
example.com/?a=quux&d=qux
example.com/?a=baz&d=qux
So these requests were served from the cache, while the ones above (which hit the backend) were not:
example.com/?a=foo&b=quux&c=baz&d=qux
example.com/?a=foo&b=corge&c=fred&d=qux
In the example above most requests were not cached, but that is not the case in reality: most queries are cached. In the real case there is a fairly large number of requests coming in on the second run, about 600/s, while in the first run the request rate is about 1/s. The queries I see slipping through are among the first that the application would request.
It seems unlikely that AWS API Gateway couldn't handle similar request rates (throttling is set to 10,000 requests per second with a burst of 5,000), yet it seems the first few queries the application sends slip through. Is this to be expected from API Gateway?
I was also thinking that there might be a cache size issue but increasing the cache does not seem to help.
So what reasons could there be for API Gateway to let seemingly cached requests slip through to the backend?
UPDATE: The nature of the application which creates the requests is that it starts a request chain. There are about 500-600 applications which all start at the same time. When they start, they make a handful of requests asynchronously and then a chain of about 300-500 requests (synchronously).
With this in mind, the burst rate at 0 s is probably much higher. The ~600 requests/s stated above is the average of ~36,000 queries over 60 s. Most of the requests are made at the beginning of those 60 s, but I don't have a number for the exact rate. An estimate might be about 1,000-2,000 requests/s for the first few seconds, and maybe even more (say 3,000+) in the first second.
In short, I still don't know why this happens but I did manage to minimize the number of requests that slipped through.
I did this by having the requesting application delay its start (I explained the nature of the start sequence in the update to the question) by some random amount of time: each application picks a random start time between 0 and 3 minutes, to avoid spikes hitting API Gateway.
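In code, the jitter is as simple as this (a sketch; startRequestChain is a placeholder for whatever kicks off the application's request chain):

    import java.util.concurrent.ThreadLocalRandom;
    import java.util.concurrent.TimeUnit;

    // Spread the ~500-600 clients over a 3-minute window instead of starting them all at once.
    public class JitteredStart {
        public static void main(String[] args) throws InterruptedException {
            long delayMillis = ThreadLocalRandom.current().nextLong(TimeUnit.MINUTES.toMillis(3));
            Thread.sleep(delayMillis);   // random 0-3 minute delay before the first request
            startRequestChain();         // placeholder for the app's ~300-500 request chain
        }

        private static void startRequestChain() {
            // the application's actual requests would go here
        }
    }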
This didn't eliminate the phenomenon of requests slipping through, but it lowered the number from about 500-1500 over 60 s to between 0 and 10 over 3 minutes, which my backend could easily handle, compared to the 1000+ over 60 s that had it on the edge.
It seems to me that when API Gateway is flooded with a large number of requests over a short time it will just pass these requests through. I am surprised (and a little skeptical) that these numbers would be so large as to cause problems for AWS but that is what I see.
Perhaps this can be solved by changing the throttling levels, but I found no difference when playing around with it (mind you, I'm no expert!).

How are server hits/second more than active thread count? | Jmeter

I'm running a load test to test the throughput of a server by making HTTP requests through JMeter.
I'm using the Thread Stepper plugin that allows me to increase the number of threads I'm using to make the requests after a particular time period.
The following graphs show the number of active threads over time, and another shows the corresponding hits per second I was able to make.
The third graph shows the latencies of the requests, and the fourth one shows the responses per second.
I'm not able to correlate the four graphs.
In the server hits per second, I'm able to make a maximum of around 240 requests per second with only 50 active threads. However, the latency of the request is around 1 second.
My understanding is that a single thread would make a request, and then wait for the response to return before making the second request.
Since the minimum latency in my case is around 1 second, how is JMeter able to hit 240 requests per second with only 50 threads?
Graph 1: server hits per second (max of 240 with only 50 threads - how?)
Graph 2: response latencies (minimum latency of 1 second)
Graph 3: active threads over time (50 threads when server hits are 240/sec)
Graph 4: responses per second (max of 300/sec - how?)
My expectation is that the reasons could be:
The response time is less than 1 second, therefore JMeter is able to send more than one request per second with every thread.
It might also be connected with HTTP redirections and/or embedded resources processing, as per the plugin's documentation:
Hits includes child samples from transactions and embedded resources hits.
For example, a single HTTP Request with one user can result in 20 sub-samples, all of which are counted by the "Server Hits Per Second" plugin.
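To put rough numbers on the second point (the sub-sample count below is an assumption, purely to show the arithmetic):

    // Back-of-the-envelope check: how 50 threads with ~1 s responses can still show ~240 hits/s
    // when the "Server Hits Per Second" plugin also counts child samples.
    public class HitsPerSecond {
        public static void main(String[] args) {
            int threads = 50;
            double responseTimeSec = 1.0;   // reported minimum latency

            // Main samples per second are capped at threads / response time:
            double mainSamplesPerSec = threads / responseTimeSec;  // = 50

            // Assumed ~4 child samples (redirects, embedded resources) per main sample:
            double childSamplesPerMain = 4.0;
            double hitsPerSec = mainSamplesPerSec * (1 + childSamplesPerMain);

            System.out.printf("main samples/s = %.0f, hits/s = %.0f%n",
                    mainSamplesPerSec, hitsPerSec);
            // 50 * (1 + 4) = 250 hits/s, in the same ballpark as the observed ~240/s.
        }
    }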
I took some time to analyse the four graphs you provided, and to me the JMeter graphs look reasonable (since you feel JMeter is plotting them incorrectly, I will try to explain why they look normal to me). Taking a cue from point 1 of the answer @Dmitri T provided, here is my analysis:
1. As @Dmitri T pointed out, responses are coming back faster than hits (requests) are being sent to the server. You can see this in the responses-per-second graph: the first batch of hits is sent at roughly 50-70/sec during the first five minutes, while the responses for that set arrive at a noticeably faster 60-90/sec over the same period. The same trend holds for the hits fired from minute five to minute ten (again responses outpace hits, i.e. 100-150 responses compared to 85-130 hits). Because of this continuing trend, the load generator is able to send more and more hits with the same 50 active threads, which, coupled with the Thread Stepper plugin's ramp-up, produces the upward slope.
Hence the hits and responses graphs move in lockstep, with the responses graph having a slightly steeper slope than the hits-per-second graph.
2. This upward trend continues until the queuing effect of the server's full processing capacity being used kicks in at around minute 23. From that point on, all the graphs reverse the behaviour they showed for the first ~23 minutes.
The response latency (i.e. the time taken to get a response) increases from the 23rd minute onwards. At the same time there is a drop in hits per second, probably because the load generator no longer has enough free threads to fire the next request: the threads (users) are queued up waiting and have not yet finished their current request. This drop in requests in turn lowers the rate at which responses arrive, as seen in the responses-per-second graph. Even so, the "service centre" is still processing requests efficiently, i.e. sending responses back faster than new requests arrive; in queuing-theory terms the service rate is still higher than the arrival rate, which reinforces point 1 of this analysis.
3. At the 60-user load something happens: queuing! (Confirm this by checking whether the response-time graph degrades at the same time the throughput graph drops; if so, requests were piling up, i.e. queued, at the server.) This is the point where all the service centres are busy, hence the rise in response time, which in turn prevents the user threads from generating new hits and causes the low hits-per-second figure.
4. The error codes observed in the responses-per-second graph, namely 400, 403, 500 and 504, appear from the 10-user load onwards, which may indicate a time-bound or data issue (for example, the first 10 users in your CSV have proper data in the database and the rest don't).
Or it could be related to the "credit"/"debit" transactions, since the two may conflict or even deadlock on the same bank account, etc.
If you look at the distribution of the error codes, they are most numerous where the response volume is highest (up to minute 23) and fall away once queuing reduces the response volume from minute 23 onwards, so they are roughly proportional to the response rate. The 504 (gateway timeout) errors are a sure sign that processing is taking too long and the web server is timing out, which means the load is high; so we can treat the load up to 80 users (i.e. around the 40th minute) as a reasonable load-bearing capacity of the system. (Obviously, if many more 504 errors were observed earlier, we would fix that earlier point as the unstressed load the system can handle.)
Important: check your Hits-per-second graph configuration. The granularity used to plot the graph may not match the expected scale, i.e. per second. You are expecting hits per second, but if the graph's plotting interval is configured as 500 ms (half a second), the plotted values can come out inflated, i.e. higher than 50 hits for your 50 users.
