I'm trying to use Redis pub/sub to transfer data between apps at high velocity (25,000 messages per second).
I have tested it as below:
topology:
1 publisher, 1 subscriber and redis server. All are hosted in the same pc.
pc hardware:
CPU: Intel(R) Core(TM) i7-4578U CPU @ 3.00GHz
Memory: 16.0GB
code:
Stopwatch sw = new Stopwatch();
sw.Start();
while (_started)   // _started, db and redisValue are fields set up elsewhere
{
    //db.PublishAsync(RawMessagesCapturedMsg.TopicGroupName, redisValue);
    db.Publish(RawMessagesCapturedMsg.TopicGroupName, redisValue);
    totalRedisMsg++;
    if (totalRedisMsg % 10000 == 0)
    {
        Console.WriteLine("totalRedisMsg: {0} # {1}, time used(ms): {2}",
            totalRedisMsg, DateTime.Now, sw.ElapsedMilliseconds);
    }
}
sw.Stop();
results:
As shown in the results, it takes about 6 seconds to publish 10k messages.
I want to confirm: is this the actual performance of Redis (or StackExchange.Redis), or is there something wrong with my test?
Update:
According to the accepted answer, I found out the reason: my message size is too large (300 kB).
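The arithmetic (using the figures from this post) makes the bottleneck clear: 300 kB × 10,000 messages in ~6 s is about 500 MB/s of payload, and the original 25,000 msg/s target would need roughly 7.5 GB/s, far more than a single Redis connection can realistically push.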
A few things to check:
What is the CPU load? Is it full? If not, you're probably stuck on bandwidth or latency.
What is the size of the message? Multiply it by the transfer rate you see; is it comparable to the bandwidth you have (or expect to have)?
What is the ping to the Redis instance? Maybe round trips are taking most of the time. In that case you can use many threads with many connections to increase throughput (see the sketch after this list).
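With StackExchange.Redis specifically, a simpler first step than adding connections is often to stop paying a round trip per publish. A minimal sketch, not from the original post (the channel name and payload are made up), assuming the StackExchange.Redis client:

// Minimal sketch: the synchronous Publish in the question waits for one
// full round trip per message; FireAndForget queues commands without
// awaiting replies, and a single Ping at the end drains the pipeline
// before the clock stops.
using System;
using System.Diagnostics;
using StackExchange.Redis;

class PublishThroughputSketch
{
    static void Main()
    {
        var conn = ConnectionMultiplexer.Connect("localhost");
        var sub = conn.GetSubscriber();
        var channel = new RedisChannel("bench-channel", RedisChannel.PatternMode.Literal);
        RedisValue payload = new string('x', 100);  // keep messages small

        var sw = Stopwatch.StartNew();
        const int total = 100000;
        for (int i = 0; i < total; i++)
        {
            // Queue the publish without waiting for the server's reply.
            sub.Publish(channel, payload, CommandFlags.FireAndForget);
        }
        sub.Ping();  // one blocking round trip: everything queued before it has been sent
        sw.Stop();

        Console.WriteLine("{0} messages in {1} ms", total, sw.ElapsedMilliseconds);
    }
}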
I had a benchmark at hand that I used to answer another question. In Java (lettuce client library) these are the results for 1 thread: local CPU i5-6400, remote CPU E5-2603 v4, 0.180 ms ping to the remote host, and the message is "hello".
Benchmark              (address)   Mode  Cnt      Score      Error  Units
LettuceThreads.pooled  socket     thrpt    5  35699.267 ±  706.946  ops/s
LettuceThreads.pooled  localhost  thrpt    5  28130.801 ± 9476.584  ops/s
LettuceThreads.pooled  remote     thrpt    5   3080.115 ±  422.390  ops/s
LettuceThreads.shared  socket     thrpt    5  41717.332 ± 3559.226  ops/s
LettuceThreads.shared  localhost  thrpt    5  31092.925 ± 9894.748  ops/s
LettuceThreads.shared  remote     thrpt    5   3920.260 ±  178.637  ops/s
Compare it to the hardware you have; maybe it will help you evaluate your library's performance. Note how performance drops 10x for the remote case; even allowing for the remote CPU being about 2x slower, that is a lot.
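One rough sanity check, not part of the benchmark output: a single blocking connection cannot exceed 1/RTT round trips per second, so the 0.180 ms ping alone caps one thread at about 1 / 0.00018 ≈ 5,500 ops/s; the measured ~3,000-4,000 ops/s against the remote host is exactly the order of magnitude you expect from a latency-bound client.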
And the following is for 16 threads. So, as you see, a bigger number of threads may help recover throughput despite the latency.
Benchmark              (address)   Mode  Cnt       Score       Error  Units
LettuceThreads.pooled  socket     thrpt    5  123846.426 ±  2926.807  ops/s
LettuceThreads.pooled  localhost  thrpt    5   83997.678 ± 31410.595  ops/s
LettuceThreads.pooled  remote     thrpt    5   31045.111 ±  2198.065  ops/s
LettuceThreads.shared  socket     thrpt    5  218331.662 ± 17459.352  ops/s
LettuceThreads.shared  localhost  thrpt    5  182296.689 ± 52163.154  ops/s
LettuceThreads.shared  remote     thrpt    5   30803.575 ±  2128.306  ops/s
Related
I have a standalone Redis instance in production. Earlier, 8 instances of my application, each having 64 Redis connections (total 12*64), at a rate of 2000 QPS per instance, would give me a latency of < 10 ms (which I am fine with). Due to an increase in traffic, I had to increase the number of application instances to 16, while also decreasing the connection count per instance from 128 to 16 (total 16*16 = 256). This was done after benchmarking with memtier_benchmark as below.
12 Threads
64 Connections per thread
2000 Requests per thread
ALL STATS
========================================================================
Type         Ops/sec     Hits/sec   Misses/sec      Latency       KB/sec
------------------------------------------------------------------------
Sets            0.00          ---          ---      0.00000         0.00
Gets        79424.54       516.26     78908.28      9.90400      2725.45
Waits           0.00          ---          ---      0.00000          ---
Totals      79424.54       516.26     78908.28      9.90400      2725.45
16 Threads
16 Connections per thread
2000 Requests per thread
ALL STATS
========================================================================
Type         Ops/sec     Hits/sec   Misses/sec      Latency       KB/sec
------------------------------------------------------------------------
Sets            0.00          ---          ---      0.00000         0.00
Gets        66631.87       433.11     66198.76      3.32800      2286.47
Waits           0.00          ---          ---      0.00000          ---
Totals      66631.87       433.11     66198.76      3.32800      2286.47
redis-benchmark gave similar results.
However, when I made this change in production (16*16), the latency shot back up to 60-70 ms. I thought the provisioned connection count was too low (which seemed unlikely) and went back to 64 connections (64*16), which, as expected, increased the latency further. For now, I have half of my applications hitting the master Redis and the other half connected to the slave, each instance having 64 connections (8*64 to master, 8*64 to slave), and this works for me (8-10 ms latency).
What could have gone wrong such that the latency increased with 256 (16*16) connections but reduced with 512 (64*8) connections, even though the benchmark says otherwise? I agree the benchmark should not be fully trusted, but even as a guideline these are polar-opposite results.
Note: 1. The application and Redis are colocated, so there is no network latency; memory used is about 40% in Redis and the fragmentation ratio is about 1.4. The application uses Jedis for connection pooling. 2. The latency does not include the overhead of a Redis miss; only the Redis round trip is considered.
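Not part of the original question, but a rough queueing sanity check on those numbers: with blocking clients, sustainable throughput is at most roughly connections ÷ round-trip latency. At ~10 ms per request, 256 connections support about 256 / 0.010 ≈ 25,600 QPS, while 16 instances at 2,000 QPS each offer ~32,000 QPS; once the offered load exceeds that ceiling, requests queue in the pool and measured latency climbs. This is a crude model (it ignores server-side CPU and pool contention), but it suggests why 256 connections could behave far worse in production than in an isolated benchmark.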
Running the latency tests provided in perf/*_lat on a single-processor server and a multiprocessor server shows a huge variance in the latency figures.
16 cpu machine
./remote_lat tcp://127.0.0.1:5555 30 1000
message size: 30 [B]
roundtrip count: 1000
average latency: 97.219 [us]
single cpu machine
./remote_lat tcp://127.0.0.1:5555 30 1000
message size: 30 [B]
roundtrip count: 1000
average latency: 27.195 [us]
Both processes run on the same machine.
Using libzmq 4.2.5
Update:
my laptop:
Intel Core i5, 7th generation
16 cpu server:
Intel(R) Xeon(R) CPU @ 2.30GHz
I have a problem with an experiment on my computer. I ran 300 tests of a parallel algorithm (32 threads) and saw that the runtime of about 10% of the tests is lower than the rest. It looks like this: we have 100 tests with a runtime of about 100 ms each, then 30 tests with runtime ~80 ms, and again 170 tests with runtime ~100 ms. It happens in every experiment. I used OpenMP, TBB, PTHREAD and std::thread, and it happens with every parallel technology.
What is the reason for that?
CPU: Intel® Core™ i7 Kaby Lake H 2800 - 3800 MHz
Cores: 4
Threads: 8
We have got this TOTAL in the JMeter report:
Label: 10
Average: 1288
Median: 1278
90%: 1525
95%: 1525
99%: 1546
Min: 887
Max: 1546
Throughput: 6.406149903907751
KB/sec: 39.21264413837284
What does KB/sec mean? Please help me understand it.
According to the Glossary
KB/s(Aggregate Report)
Throughput is measured in bytes and represents the amount of data that the Virtual users received from the server. The Throughput KPI is measured in kilobytes (KB) per second.
So basically it is average amount of data received by JMeter from the application under test per second.
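As a rough cross-check against the TOTAL row above (not from the glossary): KB/sec ≈ throughput × average response size, so 39.21 KB/s ÷ 6.406 requests/s ≈ 6.1 KB received per response.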
KB/sec is the speed of a connection: KB meaning kilobyte and sec meaning per second.
You get faster speeds of MB/sec, which is megabytes per second, and even faster speeds of GB/sec, which is gigabytes per second.
1000 KB = 1 MB
1000 MB = 1 GB
Hope this helps :)
I just installed Ubuntu 10.10 server with NodeJS 0.4.6 using this guide: http://www.codediesel.com/linux/installing-node-js-on-ubuntu-10-04/ on my laptop:
Acer 5920G (Intel Core 2 Duo (2ghz), 4 gb ram)
After that I created a little test of how Node.js would perform and wrote this little hello world script:
var http = require('http');
http.createServer(function(req, res) {
    res.writeHead(200, {'Content-Type': 'text/html'});
    res.write('Hello World');
    res.end();
}).listen(8000);
Now, to test the performance, I used ApacheBench on Windows with the following settings:
ab -r -c 1000 -n 10000 http://192.168.1.103:8000/
But the results are very low compared to http://zgadzaj.com/benchmarking-node-js-testing-performance-against-apache-php/
Server Software:
Server Hostname: 192.168.1.103
Server Port: 8000
Document Path: /
Document Length: 12 bytes
Concurrency Level: 1000
Time taken for tests: 23.373 seconds
Complete requests: 10000
Failed requests: 0
Write errors: 0
Total transferred: 760000 bytes
HTML transferred: 120000 bytes
Requests per second: 427.84 [#/sec] (mean)
Time per request: 2337.334 [ms] (mean)
Time per request: 2.337 [ms] (mean, across all concurrent requests)
Transfer rate: 31.75 [Kbytes/sec] received
Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    1   1.3      1      28
Processing:  1236 2236 281.2   2327    2481
Waiting:      689 1522 169.5   1562    1785
Total:       1237 2238 281.2   2328    2484
Percentage of the requests served within a certain time (ms)
50% 2328
66% 2347
75% 2358
80% 2364
90% 2381
95% 2397
98% 2442
99% 2464
100% 2484 (longest request)
Anyone got a clue? (compile, hardware problem, drivers, configuration, slow script)
Edit 4-17 14:04 GMT+1
I am testing the machine over a 1 Gbit local connection. When I ping it, I get 0 ms, so that should be fine, I guess. When I run ApacheBench on my Windows 7 machine, its CPU rises to 100% :|
It seems like you are running the test over a medium with a high bandwidth-delay product; in your case, high latency (>1 s). Assuming a 1 s delay, a 100 Mbit link and 76 bytes per request, you need more than 150,000 requests in parallel to saturate it.
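Spelling out that estimate with the same numbers: the bandwidth-delay product is 100 Mbit/s × 1 s = 12.5 MB in flight, and 12.5 MB ÷ 76 bytes ≈ 164,000 requests that must be outstanding at once to keep the link full.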
First, test the latency (with ping or similar). Also, watch the CPU and network usage on all participating machines; this will give you an indication of the bottleneck in your tests. What are the benchmark results for an Apache webserver?
Also, it could be a hardware/driver problem. Watch dmesg on both machines. And although it's probably not the reason for this specific problem, don't forget to change the CPU speed governor to performance on both machines!