ZMQ PUB-SUB lost TCP connection on PUB side - zeromq

I use the ZMQ PUB-SUB pattern to send notifications to workers. After running for several days, the PUB side lost TCP connections last night.
I create ONE PUB socket on the server and have 230 SUB clients.
Among them, 90 SUB clients receive slowly because of heavy CPU work after each published message, and the PUB side lost the TCP connections to these 90 SUBs.
pyzmq: 17.0.0
python: 2.7.5
In my design, slow SUBs should be normal because the workers are slow to handle messages, and I assumed the HWM would protect the PUB-SUB pattern. Any suggestions?
[root@localhost apolo]# netstat -an|grep "127.0.0.1:5000 ESTABLISHED"|wc -l
230
[root@localhost apolo]# netstat -an|grep "0 127.0.0.1:5000"|wc -l
141
PUB code
import zmq
context = zmq.Context()
zmq_publish = context.socket(zmq.PUB)
zmq_publish.bind("tcp://127.0.0.1:5000")
SUB code (in each worker process)
context = zmq.Context()
zmq_subscribe = context.socket(zmq.SUB)
zmq_subscribe.connect("tcp://127.0.0.1:5000")
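For reference, here is a minimal pyzmq sketch of where the high-water marks can be set explicitly on both sides; the value 1000 is illustrative (it is also the libzmq default) and the subscribe-to-everything filter is an assumption, not taken from the program above.
import zmq
context = zmq.Context()
# PUB side: SNDHWM bounds the queue kept per subscriber, so a slow SUB
# causes messages for it to be dropped instead of buffered without limit.
zmq_publish = context.socket(zmq.PUB)
zmq_publish.setsockopt(zmq.SNDHWM, 1000)
zmq_publish.bind("tcp://127.0.0.1:5000")
# SUB side: bound the receive queue as well and subscribe to everything.
zmq_subscribe = context.socket(zmq.SUB)
zmq_subscribe.setsockopt(zmq.RCVHWM, 1000)
zmq_subscribe.setsockopt(zmq.SUBSCRIBE, b"")
zmq_subscribe.connect("tcp://127.0.0.1:5000")
Once a slow subscriber's pipe reaches the HWM, a PUB socket drops further messages for that subscriber rather than queueing them without bound.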

Related

Postgres connect time delay on Windows

There is a long delay between "forked new backend" and "connection received", from about 200 to 13000 ms. Postgres 12.2, Windows Server 2016.
During this delay the client is waiting for the network packet to start the authentication. Example:
14:26:33.312 CEST 3184 DEBUG: forked new backend, pid=4904 socket=5340
14:26:33.771 CEST 172.30.100.238 [unknown] 4904 LOG: connection received: host=* port=56983
This was discussed earlier here:
Postgresql slow connect time on Windows
But I have not found a solution.
After rebooting the server the delay is much shorter, about 50 ms. Then it gradually increases in the course of a few hours. There are about 100 clients connected.
I use ip addresses only in "pg_hba.conf". "log_hostname" is off.
There is BitDefender running on the server but switching it off did not help. Further, Postgres files are excluded from BitDefender checks.
I used Process Monitor, which revealed the following:
Forking the postgres.exe process takes 3 to 4 ms.
Then, after loading DLLs, postgres.exe looks for custom and extended locale info for 648 locales and finds none of them. This locale search takes 560 ms (there is a gap of 420 ms, though). Perhaps this step can be skipped by setting a connection parameter.
After reading some TCP/IP parameters, there are no events for 388 ms. This period overlaps the 420 ms gap mentioned above.
Then postgres.exe creates a thread. The total connection time measured by the client was 823 ms.
Locale example, performed 648 times:
"02.9760160","RegOpenKey","HKLM\System\CurrentControlSet\Control\Nls\CustomLocale","REPARSE","Desired Access: Read"
"02.9760500","RegOpenKey","HKLM\System\CurrentControlSet\Control\Nls\CustomLocale","SUCCESS","Desired Access: Read"
"02.9760673","RegQueryValue","HKLM\System\CurrentControlSet\Control\Nls\CustomLocale\bg-BG","NAME NOT FOUND","Length: 532"
"02.9760827","RegCloseKey","HKLM\System\CurrentControlSet\Control\Nls\CustomLocale","SUCCESS",""
"02.9761052","RegOpenKey","HKLM\System\CurrentControlSet\Control\Nls\ExtendedLocale","REPARSE","Desired Access: Read"
"02.9761309","RegOpenKey","HKLM\System\CurrentControlSet\Control\Nls\ExtendedLocale","SUCCESS","Desired Access: Read"
"02.9761502","RegQueryValue","HKLM\System\CurrentControlSet\Control\Nls\ExtendedLocale\bg-BG","NAME NOT FOUND","Length: 532"
"02.9761688","RegCloseKey","HKLM\System\CurrentControlSet\Control\Nls\ExtendedLocale","SUCCESS",""
No events for 388 ms:
"03.0988152","RegCloseKey","HKLM\System\CurrentControlSet\Services\Tcpip6\Parameters\Winsock","SUCCESS",""
"03.4869332","Thread Create","","SUCCESS","Thread ID: 2036"

Smallest size of message in gRPC

I need to find out the average size of a request in gRPC when I'm sending a string to the server and receiving a string back.
I read somewhere it should be around 20 bytes, but what I see in the Network Monitor app is that the request is above 500 bytes. So is that the smallest possible size of a gRPC message, or what?
For a single RPC, gRPC needs to do a few things:
1. Establish HTTP/2
2. (Optional) Establish TLS
3. Exchange headers for the RPCs (size depends on schema)
4. Exchange actual RPC messages (size depends on schema)
5. Close the connection
gRPC is meant to be used for many RPCs on a single connection, so the smallest possible RPC message is really the bytes used for step 4.
[Edit]
I checked and the minimum data exchanged for an rpc is over 500 bytes, in terms of raw IP packets.
I used the gRPC helloworld.proto, changed to send an int32.
Inspecting packets in Wireshark showed the following IP packet totals:
1286 bytes to establish connection, exchange headers and do the first rpc
564 bytes for each subsequent rpc
176 bytes for client Shutdown
Of those 564 "minimum" bytes:
67% was TCP/IP overhead (acknowledgements, packet headers)
10% was "trailer" data sent after the rpc

Running TCP and UDP in parallel using OMNeT++

I would like to use five TCP and five UDP streams (sending and receiving) in parallel on a single host, where the UDP traffic consists of video, and the TCP traffic is arbitrary. How do I use both transport layers in parallel on the same node?
In INET 3.0.0 there is an example called nclients in the examples\inet directory. It could be a good starting point for preparing your model.
As long as the TCP and UDP traffic is independent, you can easily install several UDP and TCP applications at the same time in the same host. Something like this:
**.cli[*].numTcpApps = 2
**.cli[*].tcpApp[0].typename = "TelnetApp"
**.cli[*].tcpApp[1].typename = "TCPBasicClientApp"
**.cli[*].numUdpApps = 2
**.cli[*].udpApp[0].typename = "UDPVideoStreamSvr"
**.cli[*].udpApp[1].typename = "UDPVideoStreamSvr"
// ... further configuration of the applications

zeromq performance test. What's the accurate latency?

I'm using zmq to carry messages across processes, and I want to do some performance tests to get the latency and throughput.
The official site gives a guide on How to Run Performance Tests.
For example, I tried:
local_lat tcp://*:15213 200 100000
remote_lat tcp://127.0.0.1:15213 200 100000
and get the result:
message size: 200 [B]
roundtrip count: 100000
average latency: 13.845 [us]
But when trying the pub-sub example in C++, I found the time interval between sending and receiving is about 150 us (I got the result by printing a log line with a timestamp).
Could anybody explain the difference between these two?
EDIT:
I found the question 0mq: pubsub latency continually growing with messages? The result there gives a nearly constant delay of 0.00015 s, i.e. 150 us, the same as my test and about 10x the official performance test. Why the difference?
I'm having the same problem: ZeroMQ - pub / sub latency
I ran wireshark on my example code which publishes a zeromq message every second. Here is the output of wireshark:
145 10.900249 10.0.1.6 -> 10.0.1.6 TCP 89 5557→51723 [PSH, ACK] Seq=158 Ack=95 Win=408192 Len=33 TSval=502262367 TSecr=502261368
146 10.900294 10.0.1.6 -> 10.0.1.6 TCP 56 51723→5557 [ACK] Seq=95 Ack=191 Win=408096 Len=0 TSval=502262367 TSecr=502262367
147 11.901993 10.0.1.6 -> 10.0.1.6 TCP 89 5557→51723 [PSH, ACK] Seq=191 Ack=95 Win=408192 Len=33 TSval=502263367 TSecr=502262367
148 11.902041 10.0.1.6 -> 10.0.1.6 TCP 56 51723→5557 [ACK] Seq=95 Ack=224 Win=408064 Len=0 TSval=502263367 TSecr=502263367
As you can see it's taking about 45 microseconds to send and acknowledge each message. At first I thought that the connection was getting re-established on each message but that's not it. So I turned my attention to the receiver...
while (true) {
    if (subscriber.recv(&message, ZMQ_NOBLOCK)) {
        // print receive time
    }
}
By adding ZMQ_NOBLOCK and polling in a tight while loop I got the time down to 100 us. That still seems large, and it comes at the price of pegging one core. But I do feel like I understand the problem slightly better. Any insight would be appreciated.
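For comparison, here is the same busy-polling receive loop sketched in pyzmq; the endpoint (port 5557, as in the capture above) and the subscribe-to-everything filter are assumptions. With zmq.NOBLOCK, recv raises zmq.Again when nothing is queued, so the loop spins on one core just like the C++ version.
import time
import zmq
context = zmq.Context()
subscriber = context.socket(zmq.SUB)
subscriber.connect("tcp://127.0.0.1:5557")
subscriber.setsockopt(zmq.SUBSCRIBE, b"")
while True:
    try:
        message = subscriber.recv(zmq.NOBLOCK)
    except zmq.Again:
        continue  # nothing queued yet; spin and try again
    print("%.6f received %d bytes" % (time.time(), len(message)))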

Getting AltQ working in pf.conf (limiting inbound Tor traffic)

I'm trying to learn the ropes of packet queueing, so I thought I'd set up a limit on traffic coming into port 80 from known Tor exit nodes. This is on FreeBSD 9, so OpenBSD-specific solutions might not apply (syntax, etc.).
# Snipped to mainly the relevant parts
table <torlist> persist file "/var/db/torlist"
# ...
set block-policy return
scrub in all
scrub out on $ext_if all
# What I *want* to do is create a queue for known tor exit nodes:
# no single IP should be able to do more than 56k/sec,
# but the combined bandwidth of all tor visitors should not
# exceed 512k/sec, basically limiting Tor visitors to something
# like dialup
altq on $ext_if cbq bandwidth 512k queue { qin-tor }
queue qin-tor bandwidth 56Kb cbq ( default rio )
# ...
block in log all
antispoof for { $ext_if, $tun_if }
antispoof quick for $int_if inet
### inbound web rules
# Main Jail ($IP4_PUB3 is my webserver IP)
pass in on $ext_if inet proto tcp from <torlist> to $IP4_PUB3 port www synproxy state queue qin-tor
pass in on $ext_if inet proto tcp to $IP4_PUB3 port www synproxy state
The problem is, when the altq, queue, and pass lines specific to torlist are enabled, all connections are extremely slow. I've even tested my own IP against pfctl -t torlist -T test and got back "0/1 addresses match", and if I test one from the list it's "1/1 addresses match".
So I'm not really sure what exactly I'm doing wrong. I assumed the pass in line with <torlist> in it would only apply to the IPs listed in that table, so my own IP wouldn't match that rule and would fall through to the next one.
Getting it working isn't urgent, but any help in understanding where I'm failing would be greatly appreciated.
Turns out that I didn't quite understand how altq works. When I created an altq on my external interface with only one queue, that queue became the default for all connections. As a result I had to declare my interface's top speed and create a default queue for everything else.
For example if my system has 100Mb top
altq on $ext_if cbq bandwidth 100Mb queue { qin-www, qin-tor }
queue qin-www bandwidth 98Mb priority 1 cbq ( default borrow )
queue qin-tor bandwidth 56Kb priority 7 cbq ( rio )
...
pass in on $ext_if inet proto tcp to $IP4_PUB3 port www synproxy state
pass in on $ext_if inet proto tcp from <torlist> to $IP4_PUB3 port www synproxy state queue qin-tor
(The <torlist> rule doesn't need to be on top, since pf evaluates all the rules and the last matching rule wins unless you use 'quick'.)
This way only the IPs matching <torlist> get throttled down into the qin-tor queue; everything else defaults to the qin-www queue.
The OpenBSD pf FAQ didn't make this clear to me until I thought about why there would be an error about a "default" queue; then I figured the queue setup applies to the whole interface, so you need to define a default queue for traffic not assigned to a specific queue.
So there it is... the solution to my 'simple' problem. Hopefully anyone else who has this problem comes across this.
This is the FAQ I was going by for packet queueing: http://www.openbsd.org/faq/pf/queueing.html
