I don't see my ZeroMQ PUSH/PULL socket pair behaving as documented (here and here):
When a ZMQ_PUSH socket enters the mute state due to having reached the high water mark for all downstream nodes, or if there are no downstream nodes at all, then any zmq_send(3) operations on the socket shall block until the mute state ends or at least one downstream node becomes available for sending; messages are not discarded.
The Java example below shows a scenario where I expect the second call to send to block: I've set the send HWM to 1, I've already sent one message, and no message has been received on the PULL socket. What I'm seeing is that the send completes, returning true (indicating the send was successful).
The idea is to apply back pressure to the data flowing through this PUSH/PULL pair. The send calls on the push side should block the thread if it's outpacing the PULL side. Can someone suggest how I can achieve this?
I built jzmq from the HEAD of the master branch and am running against version 3.2.4 of zmq installed from brew on a Mac (I saw the same behaviour against zmq version 4.0.5_2 also installed with brew, with jzmq built from the same source).
import org.zeromq.ZMQ;

public class Main {
    public static void main(String[] args) {
        ZMQ.Context context = ZMQ.context(1);

        ZMQ.Socket pullSocket = context.socket(ZMQ.PULL);
        pullSocket.setRcvHWM(1);
        pullSocket.bind("tcp://*:9944");

        ZMQ.Socket pushSocket = context.socket(ZMQ.PUSH);
        pushSocket.setSndHWM(1);
        pushSocket.connect("tcp://localhost:9944");

        boolean pushResult = pushSocket.send("a");
        // Expecting this call to block, but it delivers the message:
        pushResult = pushSocket.send("b");

        String message = new String(pullSocket.recv());
        message = new String(pullSocket.recv());
        // This third call does block, since only two messages were pushed:
        message = new String(pullSocket.recv());
    }
}
I came across the callback queue feature in RabbitMQ, and it's pretty fancy. The idea is that I have created one message queue (queue1), its callback queue (queue1_cb), and its DLQ (queue1_dlq). I am implementing HA with 2 nodes.
The problem comes when I deploy 2 instances of my application (I have one sender app and one receiver app in Spring Boot). Both are listening to the same HA cluster. The scenario is as follows.
The sender publishes a message to RabbitMQ.
The receiver app consumes the message. The receiver app has to call a third-party API, which is socket based and asynchronous, so I do not get the response on the same connection. So I store the Channel and Message objects, which I need in order to ack the message. (Please note I am delaying the ack until I receive the response from the third-party API.)
When I deploy 2 instances of the receiver app, either instance may get the response from the third-party API, and that instance will not necessarily have the Channel and Message objects needed to ack the message and send the message to the callback queue.
Can anyone suggest a solution? This is a priority.
Below is my code.
At Receiver side :
@Override
public void onMessage(Message arg0, Channel arg1) throws Exception {
    String msg = new String(arg0.getBody());
    AppObject obj = mapper.readValue(msg, AppObject.class);
    Packet packet = new Packet();
    packet.setChannel(arg1);
    packet.setMessage(arg0);
    packet.setAppObject(obj);
    AppParam.objects.put(String.valueOf(key), packet);
    // Call third-party API
}
At the time of acking and sending the callback message:
public boolean pushMessageToCallBack(String key, AppObject packet, Channel channel, Message message) throws IOException {
    RabbitTemplate replyRabbitTemplate = // Get the RabbitTemplate object. It is handled properly.
    replyRabbitTemplate.convertAndSend(packet);
    channel.basicAck(message.getMessageProperties().getDeliveryTag(), false);
    return true;
}
You need a different callback queue for each instance or, more simply, just use Direct Reply-to where you don't need a queue at all.
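For illustration, here is a minimal sketch of Direct Reply-to using the plain RabbitMQ Java client (which Spring AMQP sits on top of). It assumes the 5.x Java client and a broker that supports Direct Reply-to (RabbitMQ 3.4+); the queue name queue1 comes from your setup, but the payload, the class name, and the crude sleep are placeholders, not your actual code. The key points are that the reply consumer is registered on the same channel that publishes, with auto-ack, and that no real reply queue is ever declared.

import com.rabbitmq.client.AMQP;
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;

public class DirectReplyToSketch {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        Connection conn = factory.newConnection();
        Channel channel = conn.createChannel();

        // Consume from the pseudo-queue before publishing; replies are pushed
        // straight to this consumer, no real queue exists on the broker.
        channel.basicConsume("amq.rabbitmq.reply-to", true,
                (consumerTag, delivery) ->
                        System.out.println("Reply: " + new String(delivery.getBody())),
                consumerTag -> { });

        // Tell the receiver where to send the reply.
        AMQP.BasicProperties props = new AMQP.BasicProperties.Builder()
                .replyTo("amq.rabbitmq.reply-to")
                .build();
        channel.basicPublish("", "queue1", props, "request payload".getBytes());

        Thread.sleep(5000); // crude wait for the reply, just for this sketch
        conn.close();
    }
}

On the receiver side, the reply is simply published to the default exchange using the replyTo value taken from the request's properties; whichever instance produces the reply, it goes back to the connection that made the request.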
Code I'm using:
client.blockingConnect();
try {
Wearable.MessageApi.sendMessage(client,
nodeId, path, message.getBytes("UTF-16"));
} catch (UnsupportedEncodingException e) {
e.printStackTrace();
}
client.disconnect();
The variables path and message are strings that contain just what they're named after, and client and nodeId are set with this code (which, with the latest Android Wear release, needs to be modified to accommodate multiple devices, but that's not the issue I'm currently working on):
client = new GoogleApiClient.Builder(context)
        .addApi(Wearable.API)
        .build();

while (nodeId.length() < 1) {
    client.blockingConnect();
    Wearable.NodeApi.getConnectedNodes(client).setResultCallback(new ResultCallback<NodeApi.GetConnectedNodesResult>() {
        @Override
        public void onResult(NodeApi.GetConnectedNodesResult nodes) {
            for (Node node : nodes.getNodes()) {
                nodeId = node.getId();
                //nodeName = node.getDisplayName();
                haveId = true;
                status = ConnectionStatus.connected;
            }
        }
    });
    client.disconnect();
}
The problem I'm having is that sometimes it works quickly, other times it works after a long delay, and sometimes it doesn't work at all. Tides, phase of the moon, humidity, butterflies flapping on the other side of the world - I'm not sure what changes. Android Wear always reports the device as connected, though. Sometimes the messages carry the same values, but they still need to be handled separately, because when they happen it's important that either the watch or the mobile responds.
Is there any way to improve the reliability?
I've tried:
sendMessage(String.valueOf(System.currentTimeMillis()), "wake up!");
But that doesn't always go through either.
No, MessageApi is inherently unreliable. Think of it as UDP. You can use it if you want to deliver the message fast and you don't mind if it fails, because you can repeat it (for example, the user switches track in your music app - either it works, or they will have to press the button again).
If you need reliability, use DataApi. It's slower, but it guarantees eventual consistency.
If you want both speed and guaranteed delivery, use both approaches - send both a message and set a data item with the same token. If the message is received, keep the token and ignore the data item later. If not, the data item will finally trigger the action.
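A minimal sketch of that combined approach; the path "/wake-up", the data keys, and the helper class are hypothetical names for illustration, not part of the Wearable API:

import com.google.android.gms.common.api.GoogleApiClient;
import com.google.android.gms.wearable.PutDataMapRequest;
import com.google.android.gms.wearable.Wearable;

public class ReliableSend {
    // Send a fast-but-unreliable message and a reliable data item carrying the
    // same token, so the receiver acts once and ignores whichever arrives second.
    public static void send(GoogleApiClient client, String nodeId, String token, byte[] payload) {
        // Fast path: may be dropped if the node is not connected right now.
        Wearable.MessageApi.sendMessage(client, nodeId, "/wake-up", payload);

        // Reliable path: the Data Layer syncs this item eventually.
        PutDataMapRequest request = PutDataMapRequest.create("/wake-up");
        request.getDataMap().putString("token", token);
        request.getDataMap().putByteArray("payload", payload);
        Wearable.DataApi.putDataItem(client, request.asPutDataRequest());
    }
}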
EDIT
The documentation states that messages will be delivered to a node only if the node is connected:
Messages are delivered to connected network nodes. A message is
considered successful if it has been queued for delivery to the
specified node. A message will only be queued if the specified node is
connected. The DataApi should be used for messages to nodes which are
not currently connected (to be delivered on connection).
I have a straightforward SignalR setup: OWIN-hosted .NET server and JavaScript client (both # v2.1.1). The client uses SignalR to synchronize its copy of an ordered event stream maintained in an Rx ReplaySubject on the server. When a client connects, it provides a startAfter query parameter that is used to initialize an IObserver against the ReplaySubject, and this observer then sends each event in the observed sequence to the client. Each event has a sequence number, and the client can tell, based on the event sequence number, if any event is missing in the sequence. (Which would be a serious problem in this application.)
The problem is that the client regularly receives only portions of the event sequence. In fact, there is a regular pattern to this: for every 250 events there is a large gap. For example, each test shows that the first gap runs from somewhere between 70 and 80 up to 250. Why always 250? And from there on, the "skip-to" point always falls on a multiple of 250; e.g., a gap from 263 to 500, then one from 511 to 750, and so on. I have to assume that this is some kind of default buffer size.
Also, the first time a client connects to the server it always receives the entire sequence just fine. It's the subsequent connections that exhibit the regular skipping problem. So it seems like it's a server-side problem, and not a client problem at all.
I then added some checks to the server to ensure that the IObserver for each client is seeing all of the events in the correct order. It is. So it seems almost certain that the problem is on the SignalR server side and has nothing to do with Rx.
And finally, I checked to see if the dropped messages were perhaps just being delivered out of order (which I could live with, although I assumed SignalR provides an ordered-delivery guarantee). They are not - the messages just disappear into a void.
If it helps, I'm currently running locally, with IIS Express on Win 8.1 x64 and testing on IE Developer Channel as well as Chrome 36. The connection is using WebSockets. I couldn't find any reference to 250 as a special quantity in either the SignalR source (client or server) or the Rx.Net source.
Any suggestions on troubleshooting? I'd love to find a stable solution before I start building a complicated workaround.
Here's the relevant server-side code:
public class AllEventsReplaySource
{
    private readonly IHubConnectionContext<dynamic> clients;
    private readonly ReplaySubject<dynamic> allEvents;

    private AllEventsReplaySource(IHubConnectionContext<dynamic> clients)
    {
        this.clients = clients;
        this.allEvents = new ReplaySubject<dynamic>();
        // (Not shown: code that generates the input to the ReplaySubject.)
    }

    public void SubscribeClient(string connectionId, int startAfter)
    {
        this.allEvents.Skip(startAfter).Subscribe(e =>
        {
            // (Not shown: code that verifies no skips are occurring at this point for a client.)
            clients.Client(connectionId).notifyEvent(e);
        });
    }

    private readonly static Lazy<AllEventsReplaySource> instance =
        new Lazy<AllEventsReplaySource>(() => new AllEventsReplaySource(
            GlobalHost.ConnectionManager.GetHubContext<AllEventsReplayHub>().Clients));

    public static AllEventsReplaySource Instance
    {
        get { return instance.Value; }
    }
}

[HubName("allEventsReplayHub")]
public class AllEventsReplayHub : Hub
{
    private readonly AllEventsReplaySource source;

    public AllEventsReplayHub()
        : this(AllEventsReplaySource.Instance)
    { }

    public AllEventsReplayHub(AllEventsReplaySource source)
    {
        this.source = source;
    }

    public override Task OnConnected()
    {
        var previousSequenceNumber = Int32.Parse(Context.QueryString["startAfter"]);
        var connectionId = this.Context.ConnectionId;
        AllEventsReplaySource.Instance.SubscribeClient(connectionId, previousSequenceNumber);
        return base.OnConnected();
    }
}
The issue you are experiencing seems consistent with a message buffer overflow. When SignalR releases messages from its buffer, it does so in 250 message fragments by default.
SignalR will buffer at least the last 1000 messages sent to a given connectionId. This means that when you send the 1251st message, the first 250 get dereferenced by the buffer. This explains why when a client first connects to the server, it receives the entire sequence of messages. You have to send at least 1251 messages to a given client before the buffer will drop fragments. Again, this is all assuming default settings.
While you could increase the DefaultMessageBufferSize, that probably will not fix your root problem. It seems that you are trying to send messages faster than the server can send them to the client. If you do that continuously, you will run out of buffer space no matter the size.
It's more common to reduce the DefaultMessageBufferSize rather than increase it, since the buffers can consume a lot of memory, especially if you are sending a lot of large unique messages to many different clients.
Your best bet to avoid overrunning the buffer is to have the client send an ACK at least every 1000 messages. Given this, it might be possible to avoid sending over 1000 unACKed messages thereby avoiding this problem altogether.
By the way, you can take a look at SignalR's message buffer implementation yourself if you feel so inclined. Note that the capacity constructor argument is the DefaultMessageBufferSize.
Fairly new to ZeroMQ and trying to get a basic pub/sub to work. When I run the following (sub starting before pub), the publisher finishes but the subscriber hangs, having not received all the messages - why?
I think the socket is being closed before all the messages have been sent. Is there a way of ensuring all messages are received?
Publisher:
import zmq
import random
import time
import tnetstring

context = zmq.Context()
socket = context.socket(zmq.PUB)
socket.bind("tcp://*:5556")

y = 0
for x in xrange(5000):
    st = random.randrange(1, 10)
    data = []
    data.append(random.randrange(1, 100000))
    data.append(int(time.time()))
    data.append(random.uniform(1.0, 10.0))
    s = tnetstring.dumps(data)
    print 'Sending ...%d %s' % (st, s)
    socket.send("%d %s" % (st, s))
    print "Messages sent: %d" % x
    y += 1

print '*** SERVER FINISHED. # MESSAGES SENT = ' + str(y)
Subscriber:
import sys
import zmq
import tnetstring

# Socket to talk to server
context = zmq.Context()
socket = context.socket(zmq.SUB)
socket.connect("tcp://localhost:5556")

filter = ""  # get all messages
socket.setsockopt(zmq.SUBSCRIBE, filter)

x = 0
while True:
    topic, data = socket.recv().split()
    print "Topic: %s, Data = %s. Total # Messages = %d" % (topic, data, x)
    x += 1
In ZeroMQ, clients and servers always try to reconnect; they won't go down if the other side disconnects (because in many cases you'd want them to resume talking if the other side comes up again). So in your test code, the client will just wait until the server starts sending messages again, unless you stop recv()ing messages at some point.
In your specific instance, you may want to investigate using socket.close() and context.term(); with the default linger setting, context.term() will block until all queued messages have been sent. You also have the slow-joiner problem: you can add a sleep after the bind but before you start publishing. That works in a test case, but you will want to understand what is a real solution versus a band-aid.
You need to think of the PUB/SUB pattern like a radio. The sender and receiver are both asynchronous. The Publisher will continue to send even if no one is listening. The subscriber will only receive data if it is listening. If the network goes down in the middle, the data will be lost.
You need to understand this in order to design your messages. For example, if you design your messages to be "idempotent", it doesn't matter if you lose data. An example of this would be a status type message. It doesn't matter if you have any of the previous statuses. The latest one is correct and message loss doesn't matter. The benefits to this approach is that you end up with a more robust and performant system. The downsides are when you can't design your messages this way.
Your example includes a type of message that requires no loss. Another type of message would be transactional. For example, if you just sent the deltas of what changed in your system, you would not be able to lose the messages. Database replication is often managed this way which is why db replication is often so fragile. To try to provide guarantees, you need to do a couple things. One thing is to add a persistent cache. Each message sent needs to be logged in the persistent cache. Each message needs to be assigned a unique id (preferably a sequence) so that the clients can determine if they are missing a message. A second socket (ROUTER/REQ) needs to be added for the client to request the missing messages individually. Alternatively, you could just use the secondary socket to request resending over the PUB/SUB. The clients would then all receive the messages again (which works for the multicast version). The clients would ignore the messages they had already seen. NOTE: this follows the MAJORDOMO pattern found in the ZeroMQ guide.
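As a rough illustration of the sequence-number part, here is a sketch in Java with the org.zeromq bindings used earlier on this page; the wire format "topic seq payload", the endpoint, and the REQ-based resend are assumptions for the sketch, not a complete implementation.

import org.zeromq.ZMQ;

public class GapDetectingSubscriber {
    public static void main(String[] args) {
        ZMQ.Context ctx = ZMQ.context(1);
        ZMQ.Socket sub = ctx.socket(ZMQ.SUB);
        sub.connect("tcp://localhost:5556");
        sub.subscribe("".getBytes()); // all topics

        long expected = 0;
        while (true) {
            // Assumed wire format "topic seq payload", where seq is a
            // monotonically increasing number the publisher also logs
            // to its persistent cache.
            String[] parts = new String(sub.recv()).split(" ", 3);
            long seq = Long.parseLong(parts[1]);
            if (seq != expected) {
                // Gap detected: this is where a side REQ socket would ask the
                // publisher's cache to resend messages expected .. seq-1.
                System.out.println("Missing " + expected + " to " + (seq - 1));
            }
            expected = seq + 1;
        }
    }
}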
An alternative approach is to create your own broker using the ROUTER/DEALER sockets. When the ROUTER socket saw each DEALER connect, it would store its ID. When the ROUTER needed to send data, it would iterate over all client IDs and publish the message. Each message should contain a sequence so that the client can know what missing messages to request. NOTE: this is a sort of reimplementation of Kafka from linkedin.
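And a very rough sketch of the ROUTER side of that broker idea, again in Java with assumed endpoints and framing; a real broker would poll for client traffic and outgoing data separately and keep the persistent cache described above.

import java.nio.ByteBuffer;
import java.util.HashSet;
import java.util.Set;
import org.zeromq.ZMQ;

public class RouterBroadcastSketch {
    public static void main(String[] args) {
        ZMQ.Context ctx = ZMQ.context(1);
        ZMQ.Socket router = ctx.socket(ZMQ.ROUTER);
        router.bind("tcp://*:5557");

        // Identities wrapped in ByteBuffer so the Set compares by content.
        Set<ByteBuffer> clientIds = new HashSet<ByteBuffer>();
        long seq = 0;
        while (true) {
            // A frame from a DEALER arrives prefixed with that client's
            // identity; remember it so we can address the client later.
            byte[] identity = router.recv();
            router.recv(); // the client's "hello"/request payload, ignored here
            clientIds.add(ByteBuffer.wrap(identity));

            // "Publish" the next sequenced message to every known client.
            for (ByteBuffer id : clientIds) {
                router.send(id.array(), ZMQ.SNDMORE);
                router.send("seq " + seq + " payload");
            }
            seq++;
        }
    }
}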
I have to send a lot of data, in small blocks, to a client connected to my server.
So, I have something like:
for (;;) {
    messageEvent.getChannel().write("Hello World");
}
The problem is that, for some reason, the client is receiving dirty data, as if the Netty buffer were not cleared on each iteration, so we get something like "Hello WorldHello".
If I make a small change to my code and put in a thread sleep, everything works fine:
for (;;) {
    messageEvent.getChannel().write("Hello World");
    Thread.sleep(1000);
}
As MRAB said, if the server sends multiple messages on a channel without indicating the end of each message, the client cannot always read the messages correctly. Adding a sleep after writing each message will not solve the root cause of the problem either.
To fix this problem, you have to mark the end of each message in a way the other party can identify. If the client and server are both using Netty, you can add a LengthFieldPrepender and a LengthFieldBasedFrameDecoder before your JSON handlers.
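A minimal sketch of such a pipeline, written against Netty 3.x to match the MessageEvent-style code above; the handler names and the commented-out placeholder JSON handler are assumptions.

import org.jboss.netty.channel.ChannelPipeline;
import org.jboss.netty.channel.ChannelPipelineFactory;
import org.jboss.netty.channel.Channels;
import org.jboss.netty.handler.codec.frame.LengthFieldBasedFrameDecoder;
import org.jboss.netty.handler.codec.frame.LengthFieldPrepender;
import org.jboss.netty.handler.codec.string.StringDecoder;
import org.jboss.netty.handler.codec.string.StringEncoder;

public class FramedPipelineFactory implements ChannelPipelineFactory {
    @Override
    public ChannelPipeline getPipeline() {
        ChannelPipeline pipeline = Channels.pipeline();
        // Outbound: prepend a 4-byte length field to every message.
        pipeline.addLast("frameEncoder", new LengthFieldPrepender(4));
        pipeline.addLast("stringEncoder", new StringEncoder());
        // Inbound: reassemble the byte stream into one frame per message,
        // stripping the 4-byte length prefix again.
        pipeline.addLast("frameDecoder",
                new LengthFieldBasedFrameDecoder(1048576, 0, 4, 0, 4));
        pipeline.addLast("stringDecoder", new StringDecoder());
        // pipeline.addLast("jsonHandler", new YourJsonHandler()); // your handler here
        return pipeline;
    }
}

With this framing in place, each write on the server arrives at the client as one complete, separate message, so the "Hello WorldHello" effect disappears without any sleeps.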
String encodedMsg = new Gson().toJson(
        sendToClient, new TypeToken<ArrayList<CoordinateVO>>() {}.getType());
By default, Gson uses HTML escaping for content; sometimes this leads to weird encoding. You can disable it if required by using a Gson factory:
final static GsonBuilder gsonBuilder = new GsonBuilder().disableHtmlEscaping();
....
String encodedMsg = gsonBuilder.create().toJson(object);
In neither case are you sending anything to indicate where one item ends and the next begins, or how long each item is.
In the second case, the sleep gives the channel time to flush, so the client sees a 'break', which it interprets as the end of the item.
The client should never see this "dirty data". If that's really the case then it's a bug, but to be honest I can't think of anything that could lead to this in Netty. Every Channel.write(..) event is added to a queue, which then gets written to the client when possible, so whatever data is passed to the write(..) method just gets written. There is no "concat" of the data.
Do you maybe have some custom encoder in the pipeline that buffers the data before sending it to the client?
It would also help if you could show the complete code that gives this behaviour, so we can see what handlers are in the pipeline, etc.