ZeroMQ performance test: what's the accurate latency?

I'm using ZeroMQ to carry messages across processes, and I want to run some performance tests to measure latency and throughput.
The official site provides a guide, How to Run Performance Tests.
For example, I tried:
local_lat tcp://*:15213 200 100000
remote_lat tcp://127.0.0.1:15213 200 100000
and got this result:
message size: 200 [B]
roundtrip count: 100000
average latency: 13.845 [us]
But when I tried the pub-sub example in C++, I found the interval between sending and receiving is about 150 us (measured by printing timestamped log lines on the sending and receiving sides).
Could anybody explain the difference between these two results?
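For reference, a timestamped pub-sub measurement of the kind described above might look roughly like the following sketch (older cppzmq API; the endpoint, the 200-byte payload, and the use of wall-clock timestamps are illustrative assumptions, not the exact code that was run):

// publisher: send one 200-byte message and log the send time
#include <zmq.hpp>
#include <chrono>
#include <cstdio>
#include <thread>

int main() {
    zmq::context_t ctx(1);
    zmq::socket_t pub(ctx, ZMQ_PUB);
    pub.bind("tcp://*:15214");                              // example endpoint
    std::this_thread::sleep_for(std::chrono::seconds(1));   // give the SUB time to connect
    zmq::message_t msg(200);                                // 200-byte payload, as in the perf test
    long long t = std::chrono::duration_cast<std::chrono::microseconds>(
        std::chrono::system_clock::now().time_since_epoch()).count();
    pub.send(msg);
    std::printf("sent at %lld us\n", t);
}

// subscriber: log the receive time; the one-way latency is the difference of the two logs
#include <zmq.hpp>
#include <chrono>
#include <cstdio>

int main() {
    zmq::context_t ctx(1);
    zmq::socket_t sub(ctx, ZMQ_SUB);
    sub.connect("tcp://127.0.0.1:15214");
    sub.setsockopt(ZMQ_SUBSCRIBE, "", 0);                   // subscribe to everything
    zmq::message_t msg;
    sub.recv(&msg);                                         // blocking receive
    long long t = std::chrono::duration_cast<std::chrono::microseconds>(
        std::chrono::system_clock::now().time_since_epoch()).count();
    std::printf("received at %lld us\n", t);
}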
EDIT:
I found the question "0mq: pubsub latency continually growing with messages?". Its result shows a nearly constant delay of 0.00015 s, i.e. 150 us, the same as my test and about 10x the official performance test result. Why the difference?

I'm having the same problem: ZeroMQ - pub / sub latency
I ran Wireshark on my example code, which publishes a ZeroMQ message every second. Here is the Wireshark output:
145 10.900249 10.0.1.6 -> 10.0.1.6 TCP 89 5557→51723 [PSH, ACK] Seq=158 Ack=95 Win=408192 Len=33 TSval=502262367 TSecr=502261368
146 10.900294 10.0.1.6 -> 10.0.1.6 TCP 56 51723→5557 [ACK] Seq=95 Ack=191 Win=408096 Len=0 TSval=502262367 TSecr=502262367
147 11.901993 10.0.1.6 -> 10.0.1.6 TCP 89 5557→51723 [PSH, ACK] Seq=191 Ack=95 Win=408192 Len=33 TSval=502263367 TSecr=502262367
148 11.902041 10.0.1.6 -> 10.0.1.6 TCP 56 51723→5557 [ACK] Seq=95 Ack=224 Win=408064 Len=0 TSval=502263367 TSecr=502263367
As you can see, it's taking about 45 microseconds to send and acknowledge each message. At first I thought the connection was getting re-established on each message, but that's not it. So I turned my attention to the receiver...
while (true) {
    if (subscriber.recv(&message, ZMQ_NOBLOCK)) {
        // print receive time
    }
}
By adding ZMQ_NOBLOCK and polling in a hard while loop, I got the time down to 100 us. That still seems large, and it comes at the price of spiking one core. But I do feel like I understand the problem slightly better. Any insight would be appreciated.
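For comparison, the same receive path can be written around zmq::poll so the thread blocks until the SUB socket is readable instead of spinning. A minimal sketch, assuming the same subscriber socket and message object as above:

// Wait for readability instead of spinning on ZMQ_NOBLOCK.
zmq::pollitem_t items[] = {
    { static_cast<void*>(subscriber), 0, ZMQ_POLLIN, 0 }
};
while (true) {
    zmq::poll(items, 1, -1);             // block until a message is queued
    if (items[0].revents & ZMQ_POLLIN) {
        subscriber.recv(&message);       // won't block: data is already queued
        // print receive time
    }
}

This avoids pegging a core, but the kernel wakeup may add back some of the latency the hard loop removed, so the trade-off would need to be measured.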

Related

Slow Ingestion of Network Packets

I am using a packet capturing and analysis tool called Cisco Joy to generate network flows.
Here is the link: https://github.com/cisco/joy
Joy captures packets on a network interface, driven by a configuration file, and writes JSON files to an output directory. I have configured Cisco Joy with AF_PACKET to generate the network flows.
I have been replaying packets with tcpreplay onto a virtual network interface at a rate of about 3 Gbps, but Joy is not receiving the packets at the same speed.
Actual: 450889 packets (397930525 bytes) sent in 1.06 seconds
Rated: 374307598.1 Bps, 2994.46 Mbps, 424122.22 pps
Flows: 12701 flows, 11947.01 fps, 450298 flow packets, 591 non-flow
Statistics for network device: vth0
Successful packets: 450889
Failed packets: 0
Truncated packets: 0
Retried packets (ENOBUFS): 0
Retried packets (EAGAIN): 0
Packets Received from Cisco Joy: 260850
So here tcpreplay sent about 450k packets, but Cisco Joy received only around 260k.
I have tried changing the size of the buffer the packets are captured into, but that didn't resolve it. Does anyone have a clue what could be going on?

ZMQ PUB-SUB lost TCP connection on PUB side

I use the ZeroMQ PUB-SUB pattern to send notifications to workers. After running for several days, the PUB side lost TCP connections last night.
I create ONE PUB socket on the server and have 230 SUB clients.
Among them, 90 SUB clients receive slowly because they do heavy CPU work after receiving published messages. The PUB side lost the TCP connections for exactly these 90 subscribers.
pyzmq:17.0.0
python:2.7.5
In my program design, slow SUBs are expected because the workers' handling is slow, and the HWM should protect the PUB-SUB pattern. Any suggestions?
[root@localhost apolo]# netstat -an|grep "127.0.0.1:5000 ESTABLISHED"|wc -l
230
[root@localhost apolo]# netstat -an|grep "0 127.0.0.1:5000"|wc -l
141
PUB code
import zmq
context = zmq.Context()
zmq_publish = context.socket(zmq.PUB)
zmq_publish.bind("tcp://127.0.0.1:5000")
SUB code
zmq_subscribe = context.socket(zmq.SUB)
zmq_subscribe.setsockopt(zmq.SUBSCRIBE, b"")  # subscribe to all topics (actual filter not shown)
zmq_subscribe.connect("tcp://127.0.0.1:5000")

Performance issues running kafkacat over a slow speed link

I have weird performance issues with the fetch.max.message.bytes parameter in the librdkafka consumer implementation (version 0.11). I ran some tests using kafkacat over a slow network link (4 Mbps) and got the following results:
1024 bytes = 1.740s
65536 bytes = 2.670s
131072 bytes = 7.070s
When I started debugging the protocol messages, I noticed abnormally high RTT values.
|SEND|rdkafka| Sent FetchRequest (v4, 68 bytes @ 0, CorrId 8)
|RECV|rdkafka| Received FetchResponse (v4, 131120 bytes, CorrId 8, rtt 607.68ms)
It seems that increasing the fetch.max.message.bytes value causes very high network saturation, yet each request carries only a single message.
On the other hand, when I try kafka-console-consumer, everything runs as expected (I get a throughput of 500 messages per second over the same network link).
Any ideas or suggestions on where to look?
You are most likely hitting issue #1384, which is a bug in the new v0.11.0 consumer. The bug is particularly evident on slow links or with MessageSets/batches containing few messages.
A fix is on the way.

WRK benchmark: Please explain results

I'm trying to benchmark blocking vs non-blocking IO.
For the blocking case I use Spring Boot.
For the non-blocking case, the Play Framework.
I call an endpoint which makes 4 remote calls (sequentially).
Here are the results:
Spring boot
Running 5m test @ http://localhost:8080/remote-multiple
4 threads and 20000 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 713.90ms 429.81ms 2.00s 82.16%
Req/Sec 33.04 22.55 340.00 68.84%
9602 requests in 5.00m, 201.85MB read
Socket errors: connect 15145, read 21942, write 0, timeout 2401
Requests/sec: 32.00
Transfer/sec: 688.83KB
Play framework
Running 5m test @ http://localhost:9000/remote-multiple
4 threads and 20000 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 1.40s 395.00ms 2.00s 54.73%
Req/Sec 37.97 21.21 230.00 70.71%
39792 requests in 5.00m, 846.41MB read
Socket errors: connect 15145, read 36185, write 60, timeout 35944
Requests/sec: 132.61
Transfer/sec: 2.82MB
Though Play shows higher Requests/sec, it also has more errors, more timeouts, and higher latency.
Can anybody please explain what all these parameters in the results mean?
Is Requests/sec the number of successful requests per second? etc.
P.S.:
I ran this benchmark on a 2013 MBP, Intel Core i7 2.3 GHz, 16 GB RAM.
If you post benchmarks: start with a link to the actual benchmark code; the results have little value without it. Second: in general, running the load generator on the same machine as the code under test is considered bad practice.

What's a great way to benchmark Apache locally on Linux?

I've been developing a website that uses Django and MySQL; what I want to know is how many HTTP requests my server can handle when serving certain pages.
I have been using siege, but I'm not sure whether it's a good benchmarking tool.
ab, the Apache HTTP server benchmarking tool. Many options. An example of use with ten concurrent requests:
% ab -n 20 -c 10 http://www.bortzmeyer.org/
...
Benchmarking www.bortzmeyer.org (be patient).....done
Server Software: Apache/2.2.9
Server Hostname: www.bortzmeyer.org
Server Port: 80
Document Path: /
Document Length: 208025 bytes
Concurrency Level: 10
Time taken for tests: 9.535 seconds
Complete requests: 20
Failed requests: 0
Write errors: 0
Total transferred: 4557691 bytes
HTML transferred: 4551113 bytes
Requests per second: 2.10 [#/sec] (mean)
Time per request: 4767.540 [ms] (mean)
Time per request: 476.754 [ms] (mean, across all concurrent requests)
Transfer rate: 466.79 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 22 107 254.6 24 854
Processing: 996 3301 1837.9 3236 8139
Waiting: 23 25 1.3 25 27
Total: 1018 3408 1795.9 3269 8164
Percentage of the requests served within a certain time (ms)
50% 3269
66% 4219
...
(In that case, network latency was the main slowness factor.)
ab reports itself in the User-Agent field, so in the HTTP server's log you'll see something like:
2001:660:3003:8::4:69 - - [28/Jul/2009:12:22:45 +0200] "GET / HTTP/1.0" 200 208025 "-" "ApacheBench/2.3" www.bortzmeyer.org
ab is a widely used benchmarking tool that comes with Apache httpd.
Grinder is pretty good. It lets you simulate coordinated load from several client machines, which is more meaningful than from a single machine.
There's also JMeter.
I've used httperf and it's quite easy to use. There's a peepcode screencast on how to use it as well.
