Redis scaling with the number of clients - parallel-processing

I am testing Redis performance on my local machine and I want to know how well Redis scales as the number of parallel connections increases. My machine has 24 cores.
First I tested with -c 1; the benchmark command was ./redis-benchmark -c 1 -n 100000 -t set,get. The result was around 70K requests/s. Then I ran ./redis-benchmark -c 8 -n 100000 -t set,get. The result was 200K requests/s. Finally I ran ./redis-benchmark -c 10 -n 100000 -t set,get. It was still around 200K requests/s. I expected the throughput to increase around 8 times when the number of parallel connections increases 8 times. Also, why is there no difference between -c 8 and -c 10? Many thanks for your time.

Redis is single-threaded. The maximum QPS it can achieve is limited by the power of a single processor, so 200K might be the maximum QPS achievable on your hardware.
If you want to achieve higher QPS, you need a more powerful CPU or more Redis instances.
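For instance, a rough sketch of the second option (the ports and instance count here are arbitrary, not from the question) is to run several independent Redis instances on one machine and benchmark each of them, since every instance gets its own thread:

# Hypothetical example: start 3 independent Redis instances on separate ports.
for port in 6380 6381 6382; do
    redis-server --port $port --daemonize yes
done

# Benchmark all instances in parallel; the aggregate QPS is the sum of the three.
for port in 6380 6381 6382; do
    ./redis-benchmark -p $port -c 8 -n 100000 -t set,get &
done
wait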

Related

Understanding MPI multi-host mode

I have two hosts, each with the same number of cores (20 = 10 physical + 10 hyper-threaded).
Just to test my MPI performance I use a simple matrix multiplication program.
My understanding of MPI with two hosts is "start the maximum number of cores on every host".
The problem is the following:
In one-node mode I execute mpirun -n 20 mm and the execution time is about 0.5 sec.
Then in multi-node mode I execute mpirun -n 20 --host srv1:20,srv2:20 mm and the time is again about 0.5 sec.
So there is no performance improvement from using two hosts, although I expected one.
What settings, options, configuration files (and so on) should I check and fix to get the expected result?
Thanks.
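One thing worth checking (assuming Open MPI; other MPI implementations use different flag names) is where the 20 ranks actually end up, for example:

# Hypothetical Open MPI commands to inspect and control rank placement.
# Print which host and cores each rank is bound to:
mpirun -n 20 --host srv1:20,srv2:20 --report-bindings mm

# Spread the ranks evenly across both hosts instead of filling one host first:
mpirun -n 20 --host srv1:10,srv2:10 --map-by node mm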

How to verify JMeter's performance on a distributed performance test?

I'm doing a REST API performance test, where I have to make a lot of requests simultaneously. To do it I'm using 3 JMeter instances (1 master and 2 slaves).
To give you some more context, I wrote a JMeter script with 2 thread groups, and in each group I have 150 threads and a constant throughput timer.
Here is the command line I use to launch the test:
./jmeter -n -t ./script.jmx -l ./samples.csv -e -o ./dashboard -R 127.0.0.1,192.168.1.96,192.168.1.175 -Gthroughput=900000 -Gduration=10 -Gvmnb=3 -G ./API.properties
In this command line, throughput is the total throughput I'm aiming for across the 3 servers (its value is divided by vmnb, my 3rd variable, so that each server handles its share of the throughput) and duration is the duration of the test.
In this case, the constant throughput should be 900K samples/minute (300K per server) for 10 minutes. The ramp-up period is 5 minutes (duration/2).
Now my question:
If I understood correctly, at the end I should have 900K * 10 minutes = 9000K samples in my result file (per API).
On my JMeter dashboard, I have only 200K and 160K samples for each URL. Even if the dashboard only shows the master's results (I think), I'm far away from the expected numbers, no?
dashboard image (I can't upload an image yet...)
Am I missing something or I'm having some performance issues with my VMs, and they aren't able to deliver the high throughput?
I would like to thank you all in advance for your help,
Best regards,
Marc
The JMeter master doesn't generate any load; it only loads the test plan and sends it to the slave instances, so in your setup you have 2 load generators.
The Constant Throughput Timer can only pause threads to limit JMeter's throughput to the given value, so you need to ensure that you have enough threads to produce the desired throughput. If your target is 9M samples in 10 minutes, that means 900k samples per minute, or 450k samples per minute per slave, which is 7500 requests per second. To reach 7500 requests per second with 150 threads you would need a 0.02-second response time, while your average response time is around 1 second.
Given the above, I would recommend switching to the Throughput Shaping Timer and Concurrency Thread Group combination. They can be connected via the Scheduled Feedback Function so JMeter will be able to kick off extra threads to reach and maintain the defined throughput.
Also make sure to follow JMeter Best Practices, as 7500 RPS is quite a high load and you need confidence that JMeter is capable of sending requests fast enough.
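As a quick sanity check of the arithmetic above (all numbers taken from the answer, computed with plain shell arithmetic):

# Back-of-the-envelope check of the target load described above.
samples=9000000        # 9M samples over the whole test
minutes=10
slaves=2               # the master does not generate load
threads=150

per_minute=$((samples / minutes))            # 900000 samples/minute
per_slave_minute=$((per_minute / slaves))    # 450000 samples/minute per slave
per_slave_second=$((per_slave_minute / 60))  # 7500 requests/second per slave

# Each of the 150 threads must therefore finish a request every
# 150 / 7500 = 0.02 seconds to sustain the target throughput.
echo "$per_minute $per_slave_minute $per_slave_second"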

How does lftp calculate the throughput in parallel mode?

I'm using lftp (lftp --version shows Version 4.0.9) in mirror mode to test the performance of some SFTP servers. I'm especially interested in the throughput (bytes/sec) when I run lftp with different numbers of concurrent connections.
When I ran the test with 25 concurrent connections it gave me a rather strange number of 5866 seconds as the download time. To check the real time spent on the download I used the time command (as suggested in this related question).
The output was:
$ time lftp -e 'mirror --parallel=25 (rest of the command-line)'
21732837094 bytes transferred in 5866 seconds (3.53M/s)
real 4m31.315s
user 1m25.977s
sys 1m38.041s
My first thought was that those 5866 seconds were the sum of the time spent by every connection, so dividing that by 25 gives me 234.64 seconds (or 3m54.64s), which is quite far from 4m31.315s.
Does anyone have an insight on how the numbers from lftp are calculated?
Before lftp 4.5.0, mirror summed the overlapping durations of the parallel transfers, which was incorrect. This was fixed in 4.5.0 to count the wall-clock time during which any of the transfers was active.
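Using the numbers from the question, you can see the difference this makes (a quick awk calculation; 271.315 s is the wall-clock time reported by the time command):

# lftp 4.0.9 divides the bytes by the summed per-connection time (5866 s),
# giving the reported 3.53M/s; dividing by the wall-clock time (~271 s)
# gives the actual end-to-end throughput of roughly 76 MiB/s (~80 MB/s).
awk 'BEGIN { b = 21732837094; printf "%.2f MiB/s vs %.2f MiB/s\n", b/5866/1048576, b/271.315/1048576 }'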

bash loop with curl evidencing non-linear scaling of response times

I wrote this simple Bash script to detect the incidence of error pages:
date;
iterations=10000;
count_error=0;
count_expected=0;
for ((counter = 0; counter < iterations; ++counter)); do
    if curl -s http://www.example.com/example/path | grep -iq error;
    then
        ((count_error++));
    else
        ((count_expected++));
    fi;
    sleep 0.1;
done;
date;
echo count_error=$count_error count_expected=$count_expected
I'm finding that the total execution time does not scale linearly with the iteration count: 10 iterations take 00:00:12, 100 take 00:01:46, 1000 take 00:17:24, 10000 take ~50 mins, and 100000 take ~10 hrs.
Can anyone provide insights into the non-linearity and/or improvements to the script? Is curl unable to fire requests at a rate of 10/sec? Is GC having to periodically clear internal buffers filling up with response text?
Here are a few thoughts:
You are not creating 10 requests per second here (as you stated in the question), instead you are running the requests sequentially, i.e. as many per second as possible.
The ; at the end of each line is not required in Bash.
When testing your script from my machine against a different URL, 10 iterations take 3 seconds, 100 take 31 seconds, and 1000 take 323 seconds, so the execution time scales linearly in this range.
You could try using htop or top to identify performance bottlenecks on your client or server.
The Apache benchmark tool ab is a standard tool for benchmarking web servers and is available on most distributions. See the manpage ab(1) for more information.
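For example (rough sketches; the URL is the placeholder one from the question and the request counts are arbitrary):

# ab: 10000 requests total, 10 at a time, against the same path as the script.
ab -n 10000 -c 10 http://www.example.com/example/path

# Or, to genuinely fire about 10 requests per second from a shell loop, launch
# curl in the background instead of waiting for each response to complete:
for ((counter = 0; counter < 100; ++counter)); do
    curl -s -o /dev/null http://www.example.com/example/path &
    sleep 0.1
done
wait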

Maximizing parallel HTTP requests on OS X

I have to scrape thousands of different websites, as fast as possible. With my app running in a single process I was able to fetch 10 URLs per second. However, if I fork it into 10 worker processes, I can reach 64 reqs/sec.
Why is that? Why am I limited to 10 reqs/sec in a single process, and why do I have to spawn workers to reach 64 reqs/sec?
I am not reaching max sockets/host limit: all urls are from unique hosts.
I am not reaching max file descriptors limit (AFAIK): my ulimit -n is 2560, and lsof shows that my scraper never uses more than 20 file descriptors.
I've increased settings for kern.maxfiles, kern.maxfilesperproc, kern.ipc.somaxconn, and kern.ipc.maxsockets in sysctl.conf, and rebooted. No effect.
Tried increasing ulimit -n. No change.
Is there any limit I don't know about?
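For reference, the limits mentioned above can be checked like this (the sysctl keys are the ones named in the question; <pid> is a placeholder for the scraper's process ID):

# Per-process file-descriptor limit for the current shell
ulimit -n

# Kernel-level limits mentioned in the question (OS X sysctl keys)
sysctl kern.maxfiles kern.maxfilesperproc kern.ipc.somaxconn kern.ipc.maxsockets

# Count the descriptors the running scraper actually has open
lsof -p <pid> | wc -l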
