While reading the ZeroMQ guide, I came across client code which sends 100k requests in a loop and then receives the replies in a second loop.
#include "../include/mdp.h"
#include <time.h>
int main (int argc, char *argv [])
{
int verbose = (argc > 1 && streq (argv [1], "-v"));
mdp_client_t *session = mdp_client_new ("tcp://localhost:5555", verbose);
int count;
for (count = 0; count < 100000; count++) {
zmsg_t *request = zmsg_new ();
zmsg_pushstr (request, "Hello world");
mdp_client_send (session, "echo", &request);
}
printf("sent all\n");
for (count = 0; count < 100000; count++) {
zmsg_t *reply = mdp_client_recv (session,NULL,NULL);
if (reply)
zmsg_destroy (&reply);
else
break; // Interrupted by Ctrl-C
printf("reply received:%d\n", count);
}
printf ("%d replies received\n", count);
mdp_client_destroy (&session);
return 0;
}
I have added a counter to count the number of replies that the worker (test_worker.c) sends to the broker, and another counter in mdp_broker.c to count the number of replies the broker sends to a client. Both of these count up to 100k, but the client is receiving only around 37k replies.
If the number of client requests is set to around 40k, then it receives all the replies. Can someone please tell me why packets are lost when the client sends more than 40k asynchronous requests?
I tried setting the HWM to 100k for the broker socket, but the problem persists:
static broker_t *
s_broker_new (int verbose)
{
    broker_t *self = (broker_t *) zmalloc (sizeof (broker_t));
    int64_t hwm = 100000;

    // Initialize broker state
    self->ctx = zctx_new ();
    self->socket = zsocket_new (self->ctx, ZMQ_ROUTER);
    zmq_setsockopt (self->socket, ZMQ_SNDHWM, &hwm, sizeof (hwm));
    zmq_setsockopt (self->socket, ZMQ_RCVHWM, &hwm, sizeof (hwm));
    self->verbose = verbose;
    self->services = zhash_new ();
    self->workers = zhash_new ();
    self->waiting = zlist_new ();
    self->heartbeat_at = zclock_time () + HEARTBEAT_INTERVAL;
    return self;
}
Without setting the HWM and with the default TCP settings, packet loss occurred with just 50k messages.
The following helped to mitigate the packet loss at the broker:
Setting the HWM for the zeromq socket.
Increasing the TCP send/receive buffer size.
This helped only up to a certain point. With two clients, each sending 100k messages, the broker was able to manage fine. But when the number of clients was increased to three, they stopped receiving all the replies.
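For reference, the TCP buffer increase mentioned above can also be applied per ZeroMQ socket rather than system-wide, via the ZMQ_SNDBUF / ZMQ_RCVBUF options (which map to SO_SNDBUF / SO_RCVBUF on the underlying socket). This is only a sketch, assuming ZeroMQ v3.x+ where these options take an int; the 256 KB figure is illustrative and the OS may cap it:
#include <zmq.h>

//  Sketch: enlarge the kernel TCP buffers behind a ZeroMQ socket.
//  Call before zmq_bind()/zmq_connect(); 'bytes' is an illustrative size.
static void set_tcp_buffers (void *socket, int bytes)
{
    zmq_setsockopt (socket, ZMQ_SNDBUF, &bytes, sizeof (bytes));
    zmq_setsockopt (socket, ZMQ_RCVBUF, &bytes, sizeof (bytes));
}
//  e.g. set_tcp_buffers (self->socket, 256 * 1024);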
Finally, what has helped me to ensure no packet loss is to change the design of the client code in the following way:
A client can send up to N messages at once. The client's RCVHWM and the broker's SNDHWM should be sufficiently high to hold a total of N messages.
After that, for every reply received, the client sends two more requests. A sketch of this pipelined client is shown below.
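Here is a minimal sketch of that design, reusing only the mdp_client calls already shown in the question; WINDOW (the N above) and TOTAL are illustrative values and must fit within the client RCVHWM / broker SNDHWM:
#include "../include/mdp.h"
#include <stdio.h>

#define TOTAL   100000      //  total requests (illustrative)
#define WINDOW    1000      //  N: initial burst, must fit within the HWMs

int main (void)
{
    mdp_client_t *session = mdp_client_new ("tcp://localhost:5555", 0);
    int sent = 0, received = 0;

    //  Prime the pipeline with N requests
    while (sent < WINDOW && sent < TOTAL) {
        zmsg_t *request = zmsg_new ();
        zmsg_pushstr (request, "Hello world");
        mdp_client_send (session, "echo", &request);
        sent++;
    }
    //  For every reply received, send (up to) two more requests
    while (received < TOTAL) {
        zmsg_t *reply = mdp_client_recv (session, NULL, NULL);
        if (!reply)
            break;              //  Interrupted by Ctrl-C
        zmsg_destroy (&reply);
        received++;
        int burst;
        for (burst = 0; burst < 2 && sent < TOTAL; burst++) {
            zmsg_t *request = zmsg_new ();
            zmsg_pushstr (request, "Hello world");
            mdp_client_send (session, "echo", &request);
            sent++;
        }
    }
    printf ("%d replies received\n", received);
    mdp_client_destroy (&session);
    return 0;
}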
You send 100k messages and only then begin to receive the replies. Thus, all 100k messages have to be stored in a buffer, and when that buffer is exhausted and cannot store any more messages, you hit ZeroMQ's high water mark. The behaviour at the high water mark is specified in the ZeroMQ documentation.
In the case of the above code, the broker may discard some of the messages, since a Majordomo broker uses a ROUTER socket (whose action on reaching the high water mark is to drop messages). One resolution would be to split the send/receive loops into separate threads.
Why lost?
In ZeroMQ v2.1, the default value for ZMQ_HWM was INF (infinite), which made the said test somewhat meaningful, but at the cost of a heavy risk of memory-overflow crashes, as the buffer allocation was not constrained or controlled and could grow until it hit some physical limit.
As of ZeroMQ v3.0+, ZMQ_SNDHWM / ZMQ_RCVHWM default to 1000, and these values can be changed via zmq_setsockopt().
You may also read an explicit warning, that
ØMQ does not guarantee that the socket will accept as many as ZMQ_SNDHWM messages, and the actual limit may be as much as 60-70% lower depending on the flow of messages on the socket.
Will splitting the sending / receiving part into separate threads help?
No.
Quick fix?
Yes, for the purpose of demo-test experimenting, set infinite high-water marks again, but be careful to avoid such practice in any production-grade software.
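For example (a sketch only, assuming ZeroMQ v3.x+ where the options take an int and a value of 0 means "no limit"):
#include <zmq.h>

//  Demo-only helper: lift both high-water marks on a socket.
//  In ZeroMQ v3.x+ a value of 0 means "no limit"; set this before
//  zmq_bind()/zmq_connect(), and never ship it in production code.
static void set_unlimited_hwm (void *socket)
{
    int unlimited = 0;
    zmq_setsockopt (socket, ZMQ_SNDHWM, &unlimited, sizeof (unlimited));
    zmq_setsockopt (socket, ZMQ_RCVHWM, &unlimited, sizeof (unlimited));
}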
Why test ZeroMQ performance in this way?
As said above, the original demo-test made some sense in its v2.1 implementation.
Since those days, ZeroMQ has evolved a lot. A very nice read for your particular interest in performance envelopes, which may help build further insight into this domain, is the step-by-step guide, with code examples, in the ZeroMQ Guide's case study on protocol overheads and performance for large file transfers:
... we already run into a problem: if we send too much data to the ROUTER socket, we can easily overflow it. The simple but stupid solution is to put an infinite high-water mark on the socket. It's stupid because we now have no protection against exhausting the server's memory. Yet without an infinite HWM, we risk losing chunks of large files.
Try this: set the HWM to 1,000 (in ZeroMQ v3.x this is the default) and then reduce the chunk size to 100K so we send 10K chunks in one go. Run the test, and you'll see it never finishes. As the zmq_socket() man page says with cheerful brutality, for the ROUTER socket: "ZMQ_HWM option action: Drop".
We have to control the amount of data the server sends up-front. There's no point in it sending more than the network can handle. Let's try sending one chunk at a time. In this version of the protocol, the client will explicitly say, "Give me chunk N", and the server will fetch that specific chunk from disk and send it.
The best part, as far as I know, is the commented progression of the resulting performance up to the "model 3" flow control; one can learn a lot from the great chapters and real-life remarks in the ZeroMQ Guide.
Related
I'm sending video as a sequence of images (equals zmq messages) but sometimes, perhaps when the network is slow, they are received at a slower rate than they're sent and a growing latency appears, seemingly up to about a minute of video or 100s of images or megabytes of data. It usually clears itself eventually with the subscriber receiving messages at a faster rate than the publisher sends.
Instead, I want it to discard missed messages the same way it's supposed to if the subscriber is too slow recving them. I hoped zmq.CONFLATE=1 would do this but it doesn't. How then? I suspect they're being buffered at the publisher, which is not supposed to have any zmq buffer, or in the network stack somehow.
Simplified server code
import io
import zmq
from picamera import PiCamera

context = zmq.Context()
socket = context.socket(zmq.PUB)
socket.bind("tcp://*:12345")

camera = PiCamera()
stream = io.BytesIO()
for _ in camera.capture_continuous(stream, 'jpeg', use_video_port=True):
    stream.truncate()
    stream.seek(0)
    socket.send(stream.read())
    stream.seek(0)
Simplified client code
# Initialization
self.context = zmq.Context()
self.video_socket = self.context.socket(zmq.SUB)
self.video_socket.setsockopt(zmq.CONFLATE, 1)
self.video_socket.setsockopt(zmq.SUBSCRIBE, b"")
self.video_socket.connect("tcp://" + ip_address + ":12345")

def get_image(self):
    # Receive the latest image
    poll_result = self.video_socket.poll(timeout=0)
    if poll_result == zmq.POLLIN:
        return self.video_socket.recv()
    else:
        return None
The publisher is on a Raspberry Pi and the subscriber is on Windows.
I am not sure which version of Python zmq you are using, but based on the underlying C++ libzmq you need to:
Set the ZMQ_SNDHWM socket option on the server socket
Set the ZMQ_RCVHWM socket option on the client socket.
These options limit the number of messages queued per completed connection in the case of pub/sub. If the queue grows larger than the HWM (high water mark), messages will be discarded.
Also turn off conflate as that will interfere with these options.
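In the underlying libzmq C API that this answer refers to, setting those options looks roughly like the sketch below (the limit of 2 messages is illustrative); in pyzmq the equivalents are socket.setsockopt(zmq.SNDHWM, 2) on the publisher and socket.setsockopt(zmq.RCVHWM, 2) on the subscriber, set before bind()/connect():
#include <zmq.h>

//  Sketch: cap the per-connection queues so stale frames are dropped
//  instead of piling up. The value 2 is illustrative; set the options
//  before zmq_bind()/zmq_connect().
static void cap_video_queues (void *pub_socket, void *sub_socket)
{
    int hwm = 2;
    zmq_setsockopt (pub_socket, ZMQ_SNDHWM, &hwm, sizeof (hwm));
    zmq_setsockopt (sub_socket, ZMQ_RCVHWM, &hwm, sizeof (hwm));
}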
Also set zmq.CONFLATE=1 on the server to keep only the latest message in the send queue.
Before binding the server socket
socket.setsockopt(zmq.CONFLATE, 1)
For some reason I mistakenly thought the PUB socket didn't have a send queue but it does.
Based on the following code, I built a version of an echo server, but with a threaded delay. This was built because I've noticed that upon initial connection, my first send is sent back to the client, but the client does not receive it until a second send. My real-world use case is that I need to send messages to the server, do a lot of processing, and then send the result back... say 10-30 seconds later (could be hours in some cases).
http://www.wangafu.net/~nickm/libevent-book/Ref8_listener.html
So here is my code. For brevity's sake, I have only included the libevent-related code; not the threading code or other stuff. When debugging, a new connection is set up, the string buffer is filled properly, and debugging reveals that the writes go successfully.
http://pastebin.com/g02S2RTi
But I only receive the echo from the send-before-last. I send from the client numbers to validate this and when I send a 1 from the client, I receive nothing from the server via echo... even though the server is definitely writing to the buffer using evbuffer_add ( I have also tried this using bufferevent_write_buffer).
From the client, when I send a 2, I then receive the 1 from the previous send. It's like my writes are being cached... I have turned off Nagle.
So, my question is: Does libevent cache sends using the following method?
evbuffer_add( outputBuffer, buffer, length );
Is there a way to flush this cache? Is there some other method to mark the cache as finished or complete? Can I force a send? It never sends on its own... I have even put in delays. Replacing evbuffer_add with "send" works perfectly every time.
Most likely you are affected by the Nagle algorithm: basically, it buffers outgoing data before sending it to the network. Take a look at this article: TCP/IP options for high-performance data transmission.
Here is an example of how to disable that buffering:
int flag = 1;
int result = setsockopt(sock,            /* socket affected */
                        IPPROTO_TCP,     /* set option at TCP level */
                        TCP_NODELAY,     /* name of option */
                        (char *) &flag,  /* the cast is historical cruft */
                        sizeof(int));    /* length of option value */
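In the libevent context of the question, the same option can be applied to the fd behind the bufferevent. This is only a sketch, where bev stands for whatever bufferevent the connection uses:
#include <stdio.h>
#include <event2/bufferevent.h>
#include <event2/util.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

//  Sketch: disable Nagle on the socket underlying a libevent bufferevent.
static void disable_nagle (struct bufferevent *bev)
{
    int flag = 1;
    evutil_socket_t fd = bufferevent_getfd (bev);
    if (setsockopt (fd, IPPROTO_TCP, TCP_NODELAY,
                    (char *) &flag, sizeof (flag)) < 0)
        perror ("setsockopt(TCP_NODELAY)");
}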
This is more of an observation, and also a request for suggestions on the best way to handle this scenario.
I have two threads: one just pumps in data, and the other receives the data and does a lot of work before sending it out on another socket. The two threads are connected via a domain socket. The protocol used here is UDP. I did not want to use TCP, as it is stream-based, which means that if there is little space in the queue my data may be split and sent; this is bad, as I am sending data that should not be split. Hence I used DGRAM. Interestingly, when the send thread overwhelms the recv thread by pumping in so much data, at some point the domain socket buffer fills up and sendto() returns ENOBUFS. I was of the opinion that, should this happen, sendto() would block until buffer space is available; that would be my desired behaviour. However, this does not seem to be the case. I solve this problem in a rather weird way.
CPU Yield method
If I get ENOBUFS, I do a sched_yield(), as there is no pthread_yield() in OS X. After that, I try to send again; if that fails, I keep doing the same until the data is accepted. This is bad, as I am wasting CPU cycles doing something useless. I would love it if sendto() blocked.
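In code, the retry loop looks roughly like this (a sketch only; fd is the connected datagram socket, and buf/len is the message that must not be split):
#include <errno.h>
#include <sched.h>
#include <sys/types.h>
#include <sys/socket.h>

//  Busy-retry workaround described above: if the datagram does not fit
//  into the socket buffer, yield the CPU and try again. Wastes cycles.
static ssize_t send_with_yield (int fd, const void *buf, size_t len)
{
    ssize_t n;
    while ((n = send (fd, buf, len, 0)) < 0 && errno == ENOBUFS)
        sched_yield ();
    return n;
}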
Sleep method
I tried to solve the same issue using sleep(1) instead of sched_yield(), but this is of no use, as sleep() would put my process to sleep instead of just the send thread.
Neither of these seems to work for me, and I am running out of options. Can someone suggest the best way to handle this issue? Are there some clever tricks I am not aware of that can reduce the unnecessary CPU cycles? By the way, what the man page says about sendto() is wrong, based on this discussion: http://lists.freebsd.org/pipermail/freebsd-hackers/2004-January/005385.html
The UDP code in the kernel:
The udp_output() function in /sys/netinet/udp_usrreq.c seems clear:
/*
 * Calculate data length and get a mbuf
 * for UDP and IP headers.
 */
M_PREPEND(m, sizeof(struct udpiphdr), M_DONTWAIT);
if (m == 0) {
    error = ENOBUFS;
    if (addr)
        splx(s);
    goto release;
}
I'm not sure why sendto() isn't blocking for you... but you might try calling this function before each call to sendto():
#include <stdio.h>
#include <sys/select.h>

// Won't return until there is space available on the socket for writing
void WaitUntilSocketIsReadyForWrite(int socketFD)
{
    fd_set writeSet;
    FD_ZERO(&writeSet);
    FD_SET(socketFD, &writeSet);
    if (select(socketFD+1, NULL, &writeSet, NULL, NULL) < 0) perror("select");
}
Btw how big are the packets that you are trying to send?
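For example, a sketch of how that helper could wrap the retry (assuming the datagram socket is connected, so plain send() is equivalent to the sendto() in the question, and relying on the WaitUntilSocketIsReadyForWrite() function defined above):
#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>

//  Sketch: retry only after select() reports the socket writable again,
//  instead of spinning with sched_yield(). Uses the helper defined above.
static ssize_t send_when_ready (int socketFD, const void *buf, size_t len)
{
    ssize_t n;
    while ((n = send (socketFD, buf, len, 0)) < 0 && errno == ENOBUFS)
        WaitUntilSocketIsReadyForWrite (socketFD);
    return n;
}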
sendto() on OS X really is non-blocking (that is what the M_DONTWAIT flag is for).
I suggest you use a stream-based connection and just receive the whole data on the other side using the MSG_WAITALL flag of the recv() function. If your data has a strict structure, this is simple: just pass the correct size to recv(). If not, first send a fixed-size control packet containing the size of the next chunk of data, and then the data itself. On the receiver side, wait for the fixed-size control packet and then for the amount of data given in that control packet.
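A sketch of that scheme over a stream socket: a fixed-size control packet carrying the payload length, followed by the payload, with MSG_WAITALL on the receiving side. The 32-bit network-order length and the helper names are illustrative, and short send()s and signal handling are glossed over:
#include <arpa/inet.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/socket.h>

//  Send a 4-byte length header, then the payload.
static int send_frame (int fd, const void *data, uint32_t len)
{
    uint32_t hdr = htonl (len);
    if (send (fd, &hdr, sizeof (hdr), 0) != (ssize_t) sizeof (hdr)) return -1;
    if (send (fd, data, len, 0) != (ssize_t) len) return -1;
    return 0;
}

//  Receive the header, then exactly that many payload bytes.
static void *recv_frame (int fd, uint32_t *len_out)
{
    uint32_t hdr, len;
    if (recv (fd, &hdr, sizeof (hdr), MSG_WAITALL) != (ssize_t) sizeof (hdr))
        return NULL;
    len = ntohl (hdr);
    void *buf = malloc (len);
    if (buf == NULL || recv (fd, buf, len, MSG_WAITALL) != (ssize_t) len) {
        free (buf);
        return NULL;
    }
    *len_out = len;
    return buf;
}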
There are a few existing questions that discuss how to use ZeroMQ to work around the possibility of dropped messages and most have been very instructive.
Still, there is one thing that just keeps troubling me about the ZMQ_DEALER socket. I have been testing with a very simple case: 1 server and 2 clients, all using a single ZMQ_DEALER socket each. The server sends messages and the clients receive them.
If the server uses socket.bind() and the clients socket.connect(), we can observe proper round-robin balancing and killing one of the clients results in the server redirecting all its messages to the remaining client. No delay, no packet loss, works beautifully.
Now if I have the clients do socket.bind(), and the server socket.connect() (still using one single socket but connect to both clients), the server behavior is affected. After killing one of the clients, instead of redirecting its traffic to the remaining one, it will keep on load balancing to both until the number of messages in the queue hits the high watermark for the dead client.
The possibility of using connect on an already-bound socket led me to think that the usage would be more or less symmetrical, but I would be curious both to know why such behavior occurs and whether there is a way to replicate the failover of bound sockets with connected ones.
EDIT: in order to make the question a little more inspiring, here is some code for you to test this behavior.
This is the dealer:
// dealer.cc
// compile using something like this: g++ dealer.cc -o dealer -lzmq
#include <zmq.hpp>
#include <unistd.h>
#include <stdint.h>
#include <cstring>

int main() {
    // prepare zmq
    zmq::context_t context (1);
    zmq::socket_t socket (context, ZMQ_DEALER);
    socket.bind ("tcp://127.0.0.1:5555");
    //socket.connect("tcp://127.0.0.1:5555");
    //socket.connect("tcp://127.0.0.1:5556");

    zmq::message_t msg;
    int64_t more;
    int counter = 0;
    size_t more_size = sizeof more;
    bool gotClients = false;

    while (true) {
        // send incrementing numbers
        zmq::message_t world(sizeof(int));
        memcpy(world.data(), &counter, sizeof(int));
        socket.send(world);
        counter++;
        usleep(100000);
    }
    return 0;
}
And this is the client:
// client.cc
// compile using something like this: g++ client.cc -o client -lzmq
#include <zmq.hpp>
#include <iostream>

int main(int argc, char ** argv) {
    zmq::context_t context (1);
    zmq::socket_t socket (context, ZMQ_DEALER);
    socket.connect("tcp://127.0.0.1:5555");
    //socket.bind(argv[1]);

    zmq::message_t msg;
    while (true) {
        socket.recv(&msg, 0);
        std::cout << "Received a message: " << *(int *)msg.data() << std::endl;
    }
    return 0;
}
Create 2 clients, then start the dealer, then kill one of the clients before killing the dealer.
If you check the output, you can see not a single message was dropped or stuck in zmq queue limbo. The load was balanced as long as both clients were alive, and completely redirected to the remaining one when the "failure" occurred.
Now let's swap the connect()/bind() calls for their inverses (using the commented-out code).
We have to let the clients know which address to bind to so they should be started with the URL as follows:
./client tcp://127.0.0.1:5555
./client tcp://127.0.0.1:5556
Then, just as previously, start the dealer and kill one of the clients.
You can see that the remaining client only receives half of the dealer's messages, even after the first client was killed.
My understanding is that as long as the underlying ZMQ queue is not full, the dealer will continue queuing messages for the disconnected peer (which, in this example and given the default parameters, is going to take a really, really long time).
First of all, you don't normally bind unless you are a server, i.e. listening on a socket, so I don't know why you would want the clients to bind. Binding tells the OS that when packets arrive on a particular port number and a particular address, you want to receive them in your app. This is at a lower level and is much more general than ZeroMQ, and I don't believe that ZeroMQ changes this.
When you are talking about bind and connect, you are not talking about ZeroMQ but about the lower level transport over which ZeroMQ messages travel.
Connecting to multiple servers from a single client socket C is another answer on the topic of bind.
I am looking to send a large message (> 1 MB) through the Windows Sockets send API. Is there an efficient way to do this? I do not want to loop and send the data in chunks. I have read somewhere that you can increase the socket buffer size and that this could help. Could anyone please elaborate on this? Any help is appreciated.
You should, and in fact must, loop to send the data in chunks.
As explained in Beej's networking guide:
"send() returns the number of bytes actually sent out—this might be less than the number you told it to send! See, sometimes you tell it to send a whole gob of data and it just can't handle it. It'll fire off as much of the data as it can, and trust you to send the rest later."
This implies that even if you set the packet size to 1 MB, the send() function may not send all of it, and you are forced to loop until the total number of bytes sent by your calls to send() equals the number of bytes you are trying to send. In fact, the greater the size of the packet, the more likely it is that send() will not send it all.
Aside from all that, you don't want to send 1MB packets because if they get lost, you will have to transmit the entire 1MB packet again, whereas if you lost a 1K packet, retransmitting it is not a big deal.
In summary, you will have to loop your send() calls, and the receiver will even have to loop their recv() calls too. You will likely need to prepend a small header to each packet to tell the receiver how many bytes are being sent so the receiver can loop the appropriate number of times.
I suggest you take a look at Beej's network guide for more detailed info about send() and recv() and how to deal with this problem. It can be found at http://beej.us/guide/bgnet/output/print/bgnet_USLetter.pdf
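As a sketch of the receiving side of that suggestion (the helper names, the 4-byte network-order length header, and the error handling are illustrative, not part of any particular API):
#include <winsock2.h>   // link with ws2_32; on POSIX use <sys/socket.h> instead
#include <stdlib.h>

//  Loop until exactly 'len' bytes have arrived; recv() may return fewer
//  bytes than requested. Returns 0 on success, -1 on error or close.
static int RecvAll (SOCKET s, char *buf, int len)
{
    int got = 0;
    while (got < len) {
        int n = recv (s, buf + got, len - got, 0);
        if (n <= 0)
            return -1;
        got += n;
    }
    return 0;
}

//  Read a small 4-byte length header first, then that many payload bytes,
//  matching the header suggestion above. Caller frees the returned buffer.
static char *RecvMessage (SOCKET s, unsigned long *len_out)
{
    unsigned long netlen, len;
    char *buf;
    if (RecvAll (s, (char *) &netlen, sizeof (netlen)) < 0)
        return NULL;
    len = ntohl (netlen);
    buf = (char *) malloc (len);
    if (buf == NULL || RecvAll (s, buf, (int) len) < 0) {
        free (buf);
        return NULL;
    }
    *len_out = len;
    return buf;
}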
Why don't you want to send it in chunks?
That's the way to do it in 99% of the cases.
What makes you think that sending in chunks is inefficient? The OS is likely to chunk large "send" calls anyway, and may coalesce small ones.
Likewise on the receiving side the client should be looping anyway as there's no guarantee of getting all the data in one go.
The Windows sockets subsystem is not obliged to send the whole buffer you provide anyway. You can't force it, since some network-level protocols have an upper limit on the packet size.
As a practical matter, you can actually allocate a large buffer and send in one call using Winsock. If you are not messing with socket buffer sizes, the buffer will generally be copied into kernel mode for sending anyway.
There is a theoretical possibility that it will return without sending everything, however, so you really should loop for correctness. The chunks you send should be large (64k or in that ballpark), though, to avoid repeated kernel transitions.
If you want to do a loop after all, you can use this C++ code:
#include <winsock2.h>
#include <cstring>
#include <string>

#define DEFAULT_BUFLEN 1452

int SendStr(const SOCKET &ConnectSocket, const std::string &str, int strlen){
    char sndbuf[DEFAULT_BUFLEN];
    int sndbuflen = DEFAULT_BUFLEN;
    int iResult;
    int count = 0;
    int len;
    while(count < strlen){
        len = min(strlen-count, sndbuflen);
        //void * memcpy ( void * destination, const void * source, size_t num );
        memcpy(sndbuf, str.data()+count, len);
        // Send a buffer
        iResult = send(ConnectSocket, sndbuf, len, 0);
        // iResult: Bytes sent
        if (iResult == SOCKET_ERROR){
            throw WSAGetLastError();
        }
        else{
            if(iResult > 0){
                count += iResult;
            }
            else{
                break;
            }
        }
    }
    return count;
}