Twisted reactor not calling functions from thread correctly - reactor

I am having problems with twisted.internet.reactor. All my clients have completely identical environments, but only some experience this problem:
They correctly connectTCP to the server over WebSocket and exchange the first several messages. About one minute in, they should send a message to the server via
def execute(self, message, callback=None):
    print(">>>", message, flush=True)
    reactor.callFromThread(self._client_protocol_instance.send, message, callback)
The self._client_protocol_instance.send method is defined as follows:
def send(self, command, callback):
    print("send", command, callback, flush=True)
    timestamp = int(time() * 1000000)
    msg = (command.strip() + " --timestamp:" + str(timestamp))
    if self._debug:
        self._commands[str(timestamp)] = msg
    if callback is not None:
        self._callbacks[str(timestamp)] = callback
    payload = msg.encode()
    self._status_controller.set_state(payload)
    self.sendMessage(payload)
The first print shows up in stdout, but the second one doesn't, so I assume that send never gets executed. Apart from reactor.run(), this is the only reference to the reactor in the entire program.
Killing the client's process after this happens is immediately detected by the server, so the connection was still alive at that point.
What could be causing this?

I found the solution: the problem was that the previous task sometimes hadn't finished by the time the client tried to send the message.
I solved it by moving all CPU-heavy response handling logic into threads, freeing up the reactor for other messages.
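Roughly, that fix amounts to handing the heavy work to the reactor's thread pool. Below is a minimal sketch (not the original code; handle_response and on_message_received are made-up names) of how twisted.internet.threads.deferToThread can keep the reactor free while a response is being processed:
from twisted.internet import threads

def handle_response(payload):
    # stand-in for the CPU-heavy response handling; runs in a pool thread
    return payload.decode().upper()

def on_message_received(payload):
    # called from the protocol's onMessage; returning immediately keeps the
    # reactor free to service reactor.callFromThread(send, ...) requests
    d = threads.deferToThread(handle_response, payload)
    d.addCallback(lambda result: print("processed:", result))
    d.addErrback(lambda failure: print("processing failed:", failure))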

Related

Why are LINGER=0 and SNDTIMEO=0 used for Zyre actors' PAIR sockets?

Reviewing Pyre's (Python version of Zyre) source code, I saw the following:
def zcreate_pipe(ctx, hwm=1000):
    backend = zsocket.ZSocket(ctx, zmq.PAIR)
    frontend = zsocket.ZSocket(ctx, zmq.PAIR)
    # ...
    # close immediately on shutdown
    backend.setsockopt(zmq.LINGER, 0)
    frontend.setsockopt(zmq.LINGER, 0)
class ZActor(object):
    # ...
    def __init__(self, ctx, actor, *args, **kwargs):
        # ...
        self.pipe, self.shim_pipe = zhelper.zcreate_pipe(ctx)
        # ...

    def run(self):
        self.shim_handler(*self.shim_args, **self.shim_kwargs)
        self.shim_pipe.set(zmq.SNDTIMEO, 0)
        self.shim_pipe.signal()
        self.shim_pipe.close()

    def destroy(self):
        # ...
        self.pipe.set(zmq.SNDTIMEO, 0)
        self.pipe.send_unicode("$TERM")
        self.pipe.wait()
        self.pipe.close()
Interesting to me were the uses of LINGER=0 and SNDTIMEO=0.
The corresponding documentation entries for the two options say:
ZMQ_SNDTIMEO: Maximum time before a send operation returns with EAGAIN
[rather self-explanatory]
ZMQ_LINGER: Set linger period for socket shutdown
[...] The linger period determines how long pending messages which have yet to be sent to a peer shall linger in memory after a socket is closed with zmq_close(3), and further affects the termination of the socket's context with zmq_term(3). [...]
[...]
The value of 0 specifies no linger period. Pending messages shall be discarded immediately when the socket is closed with zmq_close().
[...]
So in short, the last message in both directions may not be sent. If send would block, SNDTIMEO=0 would kick in, and (presumably if there is still something in the send queue) LINGER=0 could discard the message during close.
That seems like a bad idea, because if $TERM is discarded, the actor isn't killed, and if the signal is discarded, the calling thread would just block. The only way it makes sense to me is if the messages may never be discarded (because of some characteristics of PAIR over inproc:// transport?), but then, why use the socket options in the first place?
What makes this code work as expected, why were the socket options used this way, and in what situation should/shouldn't I follow this example?
This looks like a latent bug (a deadlock/hang) to me. If the actor doesn't read the messages sent to it fast enough, the queue (of size hwm) can fill up, meaning that zmq won't send anything, and destroy() will end up waiting for a signal that is only sent when the actor exits. But since the actor never receives "$TERM", it can't react, and unless there's a timeout in its recv(), it may wait forever for a message as well.
[I notice there appears to be a skeptical comment just before the wait() in the destroy() method, so you may not be the first to notice this.]
I'd handle destroy() with care. Practically speaking, you could code your solution so that overflowing the queue is highly unlikely, or you could check whether the send succeeds in destroy() and, if not, either try again (with a timeout) or just skip the wait().
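For illustration, here is a rough sketch of that defensive destroy(), written against plain pyzmq rather than Pyre's actual pipe helpers (safe_destroy and its timeout are hypothetical): the "$TERM" send uses a finite timeout instead of 0, and if it cannot be delivered the shutdown handshake is skipped instead of blocking forever.
import zmq

def safe_destroy(pipe, timeout_ms=1000):
    # try to deliver "$TERM" with a finite timeout instead of 0
    pipe.setsockopt(zmq.SNDTIMEO, timeout_ms)
    try:
        pipe.send_string("$TERM")
    except zmq.Again:
        # queue full / peer not reading: give up on the shutdown handshake
        pipe.close()
        return False
    # wait (again with a timeout) for the signal the actor sends on exit
    if pipe.poll(timeout_ms, zmq.POLLIN):
        pipe.recv()
    pipe.close()
    return True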

Send multiple messages in a websocket using threads

I'm making a Ruby server using the em-websocket gem. When a client sends some message (e.g. "thread") the server creates two different threads and sends two answers to the client in parallel (I'm actually studying multithreading and websockets). Here's my code:
EM.run {
  EM::WebSocket.run(:host => "0.0.0.0", :port => 8080) do |ws|
    ws.onmessage { |msg|
      puts "Received message: #{msg}"
      if msg == "thread"
        threads = []
        threads << a = Thread.new {
          sleep(1)
          puts "1"
          ws.send("Message sent from thread 1")
        }
        threads << b = Thread.new {
          sleep(2)
          puts "2"
          ws.send("Message sent from thread 2")
        }
        threads.each { |aThread| aThread.join }
      end
    }
  end
}
How it executes:
I send the "thread" message to the server.
After one second, I see the string "1" printed in my console. After another second I see "2".
Only after that are both messages sent to the client, simultaneously.
The problem is that I want each message to be sent at exactly the same moment its debug output ("1" or "2") is printed.
My Ruby version is 1.9.3p194.
I don't have experience with EM, so take this with a pinch of salt.
However, at first glance, it looks like "aThread.join" is actually blocking the "onmessage" method from completing and thus also preventing the "ws.send" from being processed.
Have you tried removing the "threads.each" block?
Edit:
After having tested this on Arch Linux with both Ruby 1.9.3 and 2.0.0 (using "test.html" from the em-websocket examples), I am sure that even if removing the "threads.each" block doesn't fix the problem for you, you will still have to remove it, as Thread#join will suspend the current thread until the "joined" threads are finished.
If you follow the function call of "ws.onmessage" through the source code, you will end up at the Connection#send_data method of the Eventmachine module and find the following within the comments:
Call this method to send data to the remote end of the network connection. It takes a single String argument, which may contain binary data. Data is buffered to be sent at the end of this event loop tick (cycle).
As "onmessage" is blocked by the "join" until both "send" methods have run, the event loop tick cannot finish until both sets of data are buffered and thus, all the data cannot be sent until this time.
If it is still not working for you after removing the "threads.each" block, make sure that you have restarted your eventmachine and try setting the second sleep to 5 seconds instead. I don't know how long a typical event loop takes in eventmachine (and I can't imagine it to be as long as a second), however, the documentation basically says that if several "send" calls are made within the same tick, they will all be sent at the same time. So increasing the time difference will make sure that this is not happening.
I think the problem is that you are calling the sleep method with 1 in the first thread and 2 in the second thread.
Try removing the sleep call from both threads, or passing the same value in each call.

Publisher finishes before subscriber and messages are lost - why?

Fairly new to zeromq and trying to get a basic pub/sub to work. When I run the following (sub starting before pub) the publisher finishes, but the subscriber hangs, having not received all the messages. Why?
I think the socket is being closed before all the messages have been sent? Is there a way of ensuring all messages are received?
Publisher:
import zmq
import random
import time
import tnetstring

context = zmq.Context()
socket = context.socket(zmq.PUB)
socket.bind("tcp://*:5556")

y = 0
for x in xrange(5000):
    st = random.randrange(1, 10)
    data = []
    data.append(random.randrange(1, 100000))
    data.append(int(time.time()))
    data.append(random.uniform(1.0, 10.0))
    s = tnetstring.dumps(data)
    print 'Sending ...%d %s' % (st, s)
    socket.send("%d %s" % (st, s))
    print "Messages sent: %d" % x
    y += 1
print '*** SERVER FINISHED. # MESSAGES SENT = ' + str(y)
Subscriber:
import sys
import zmq
import tnetstring

# Socket to talk to server
context = zmq.Context()
socket = context.socket(zmq.SUB)
socket.connect("tcp://localhost:5556")

filter = ""  # get all messages
socket.setsockopt(zmq.SUBSCRIBE, filter)

x = 0
while True:
    topic, data = socket.recv().split()
    print "Topic: %s, Data = %s. Total # Messages = %d" % (topic, data, x)
    x += 1
In ZeroMQ, clients and servers always try to reconnect; they won't go down if the other side disconnects (because in many cases you'd want them to resume talking if the other side comes up again). So in your test code, the client will just wait until the server starts sending messages again, unless you stop recv()ing messages at some point.
In your specific instance, you may want to investigate using socket.close() and context.term(); term() will block until all the messages have been sent. You also have the slow-joiner problem: you can add a sleep after the bind but before you start publishing. This works in a test case, but you will want to really understand what is a solution vs. a band-aid.
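As a rough illustration of that band-aid (a sketch only, not the original code), a publisher could sleep briefly after bind() so the subscriber has time to connect, and keep the default LINGER so close()/term() drain whatever is still queued before the process exits:
import time
import zmq

context = zmq.Context()
socket = context.socket(zmq.PUB)
socket.bind("tcp://*:5556")

# band-aid for the slow joiner: give the subscriber a moment to finish
# connecting before publishing, otherwise the earliest messages are dropped
time.sleep(1)

for x in range(5000):
    socket.send_string("%d hello" % x)

# with the default LINGER, close()/term() block until all queued messages
# have been handed off, instead of discarding them
socket.close()
context.term()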
You need to think of the PUB/SUB pattern like a radio. The sender and receiver are both asynchronous. The Publisher will continue to send even if no one is listening. The subscriber will only receive data if it is listening. If the network goes down in the middle, the data will be lost.
You need to understand this in order to design your messages. For example, if you design your messages to be "idempotent", it doesn't matter if you lose data. An example of this would be a status-type message: it doesn't matter if you have missed any of the previous statuses, the latest one is correct, so message loss doesn't matter. The benefit of this approach is that you end up with a more robust and performant system. The downside is when you can't design your messages this way.
Your example includes a type of message that requires no loss. Another type of message would be transactional. For example, if you just sent the deltas of what changed in your system, you would not be able to lose the messages. Database replication is often managed this way which is why db replication is often so fragile. To try to provide guarantees, you need to do a couple things. One thing is to add a persistent cache. Each message sent needs to be logged in the persistent cache. Each message needs to be assigned a unique id (preferably a sequence) so that the clients can determine if they are missing a message. A second socket (ROUTER/REQ) needs to be added for the client to request the missing messages individually. Alternatively, you could just use the secondary socket to request resending over the PUB/SUB. The clients would then all receive the messages again (which works for the multicast version). The clients would ignore the messages they had already seen. NOTE: this follows the MAJORDOMO pattern found in the ZeroMQ guide.
An alternative approach is to create your own broker using the ROUTER/DEALER sockets. When the ROUTER socket saw each DEALER connect, it would store its ID. When the ROUTER needed to send data, it would iterate over all client IDs and publish the message. Each message should contain a sequence so that the client can know what missing messages to request. NOTE: this is a sort of reimplementation of Kafka from linkedin.
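As a minimal sketch of the sequence-number idea (the helper names here are made up, and the resend socket is omitted): the publisher prefixes every message with a sequence frame, and the subscriber compares it with the sequence it expects, so gaps become detectable and can be requested again over the secondary socket.
import zmq

def publish_with_sequence(pub_socket, payloads):
    # prefix each message with a monotonically increasing sequence number
    # so subscribers can detect gaps and ask for resends
    for seq, payload in enumerate(payloads):
        pub_socket.send_multipart([str(seq).encode(), payload])

def receive_with_sequence(sub_socket, expected_seq):
    # compare the received sequence with the one we expect; a jump means
    # messages were missed and should be requested from the publisher
    seq_frame, payload = sub_socket.recv_multipart()
    seq = int(seq_frame)
    if seq != expected_seq:
        print("missed messages %d..%d" % (expected_seq, seq - 1))
    return seq + 1, payload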

Why is gevent.sleep(0.1) necessary in this example to prevent the app from blocking?

I'm pulling my hair out over this one. I'm trying to get the simplest of examples working with zeromq and gevent. I changed this script to use PUB/SUB sockets and when I run it the 'server' socket loops forever. If I uncomment the gevent.sleep(0.1) line then it works as expected and yields to the other green thread, which in this case is the client.
The problem is, why should I have to manually add a sleep call? I thought when I import the zmq.green version of zmq that the send and receive calls are non blocking and underneath do the task switching.
In other words, why should I have to add the gevent.sleep() call to get this example working? In Jeff Lindsey's original example, he's doing REQ/REP sockets and he doesn't need to add sleep calls...but when I changed this to PUB/SUB I need it there for this to yield to the client for processing.
# Notes: code taken from slide: http://www.google.com/url?sa=t&rct=j&q=zeromq%20gevent&source=web&cd=27&ved=0CFsQFjAGOBQ&url=https%3A%2F%2Fraw.github.com%2Fstrangeloop%2F2011-slides%2Fmaster%2FLindsay-DistributedGeventZmq.pdf&ei=JoDNUO6OIePQiwK8noHQBg&usg=AFQjCNFa5g9ZliRVoN_yVH7aizU_fDMtfw&bvm=bv.1355325884,d.cGE
# Jeff Lindsey talk on gevent and zeromq
import gevent
from gevent import spawn
import zmq.green as zmq

context = zmq.Context()

def serve():
    print 'server online'
    socket = context.socket(zmq.PUB)
    socket.bind("ipc:///tmp/jeff")
    while True:
        print 'send'
        socket.send("World")
        #gevent.sleep(0.1)

def client():
    print 'client online'
    socket = context.socket(zmq.SUB)
    socket.connect("ipc:///tmp/jeff")
    socket.setsockopt(zmq.SUBSCRIBE, '')
    while True:
        print 'recv'
        message = socket.recv()

cl = spawn(client)
server = spawn(serve)
print 'joinall'
gevent.joinall([cl, server])
print 'end'
I thought when I import the zmq.green version of zmq that the send and receive calls are non blocking and underneath do the task switching.
zmq.green will only yield if these calls would block; it does not yield if they are ready (there's nothing to wait for). In your case the sender is always ready, so it never has a reason to yield.
Some pointers:
A minimal explicit yield is gevent.sleep(0); it doesn't need to be finite.
zmq.green only yields on blocking calls. That is, if a socket is always ready to send/recv when you ask it to, it will never yield.
socket.send only blocks when the socket is not ready to send (not (socket.events & zmq.POLLOUT)), which can never actually be true of a PUB socket (you will see it at the HWM for PUSH, DEALER, etc.). In general, don't trust send to yield; because of the way zeromq works this will rarely be the case unless you are exceeding the capacity of your configuration.
Unlike send, recv regularly blocks in normal usage, so it yields on most calls. But if a peer is flooding your incoming buffer, repeated recv calls will not yield until there is nothing ready to receive, so you may again need to explicitly yield every so often to prevent starvation.
What zmq.green amounts to is turning send/recv into:
try:
    socket.send(msg, zmq.NOBLOCK)  # or recv
except zmq.ZMQError as e:
    if e.errno == zmq.EAGAIN:
        yield  # and wait for socket to be ready, then try again
so if send/recv with NOBLOCK are always succeeding, the socket never yields.
To put it another way: If a socket has nothing to wait for, it won't wait.
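So, as a minimal illustration (a sketch, not the author's exact code), the fix for the serve() loop above is simply an explicit yield per iteration; gevent.sleep(0) is enough, since the point is only to hand control to the client greenlet, not to add a real delay:
import gevent

def serve_cooperatively(socket):
    # variant of serve() above: PUB sends essentially never block, so yield
    # explicitly on every iteration to let other greenlets run
    while True:
        socket.send(b"World")
        gevent.sleep(0)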

Writing to channel in a loop

I have to send a lot of data in small blocks to a client connected to my server.
So, I have something like:
for (;;) {
    messageEvent.getChannel().write("Hello World");
}
The problem is that, for some reason, the client is receiving dirty data, as if the Netty buffer were not cleared on each iteration, so we get something like "Hello WorldHello".
If I make a small change to my code, putting in a thread sleep, everything works fine:
for (;;) {
    messageEvent.getChannel().write("Hello World");
    Thread.sleep(1000);
}
As MRAB said, if the server is sending multiple messages on a channel without indicating the end of each message, the client cannot always read the messages correctly. Adding a sleep after writing each message will not solve the root cause of the problem either.
To fix this, you have to mark the end of each message in a way the other party can identify. If the client and server are both using Netty, you can add a LengthFieldPrepender and LengthFieldBasedFrameDecoder before your JSON handlers.
String encodedMsg = new Gson().toJson(
        sendToClient, new TypeToken<ArrayList<CoordinateVO>>() {}.getType());
By default, Gson uses HTML escaping for content; sometimes this leads to weird encoding. You can disable it if required by using a GsonBuilder:
final static GsonBuilder gsonBuilder = new GsonBuilder().disableHtmlEscaping();
....
String encodedMsg = gsonBuilder.create().toJson(object);
In neither case are you sending anything to indicate where one item ends and the next begins, or how long each item is.
In the second case the sleep gives the channel time to flush, so the client sees a 'break', which it interprets as the end of the item.
The client should never see this "dirty data". If that's really the case then it's a bug, but to be honest I can't think of anything in Netty that could lead to this, as every Channel.write(..) event is added to a queue which then gets written to the client when possible. So all data passed to the write(..) method will just get written; there is no "concat" of the data.
Do you maybe have some custom Encoder in the pipeline that buffers the data before sending it to the client?
It would also help if you could show the complete code that gives this behaviour, so we can see what handlers are in the pipeline etc.

Resources