Defer with celluloid ZMQ causing data issue - ruby

Sample Code to explain my problem.
Sender
The Sender, that sends the request to the Server(over ZMQ) in format
["sender-1", "sender-1-bdc1c14624076b691b8d9e15fbd754f1"]
..
["sender-99","sender-99-a2123d02c2989ef57da370bb13ba60e4"]
Server
The Server, upon receiving the data from sender relays it back to receiver in the same format.
["sender-1", "sender-1-bdc1c14624076b691b8d9e15fbd754f1"]
...
["sender-99","sender-99-a2123d02c2989ef57da370bb13ba60e4"]
Receiver
The Receiver, upon receiving the request from the Server just prints the message.
Description:
If I don't use a mutex defined in this line(inside the server). I see some data appearing at the receiver end which does not adhere to the above format/standard.
Example the Server would print (at this line)
"Sending sender-97 -- sender-97-9a284488454c8e8fd22bbbcf678895e9"
"Sending sender-98 -- sender-98-447eb5be94e7f6d949e764d7c88239ad"
But on the receiver end, I see messages that look like this.
sender-97 -- sender-98
sender-98-22d9f01a8f801f559a9ff4e388813251 --
Question:
To me, it seems like a possible Thread issue(I may be wrong). Wherein the data that is passed to (inside Server) Celluloid (send) is getting changed by other thread.
I was under the impression that Celluloid handles almost all of your threads issue.
Is my understanding correct about all this?

You need an asynchronous logger.
Whenever you use puts you are outputting to a buffer which is actually very slow, compared to other things. What you ought to do, is use an actor used purely in the place of puts everywhere.
The best thing would be to instantiate a supervised actor, say named :logger and then interact with it inside your other actors by Celluloid[:logger].async.ouput() or else forward output to Celluloid[:logger].async so that every time you use output or console or something like that, it fires off an asynchronous method call. That way, even though your actor doing work is moving on to something else, the console output will still be in perfect sequence.
Your console output is being mangled because of the above issue with asynchrony in programming itself, not Celluloid ... and this is an extremely common problem with and without ZeroMQ.
Yes defer is disrupting the sequence of your Receiver output, but no this is not a Thread error in Celluloid. This is the natural behavior of asynchronous operations. In this case, you need to remove defer {} and keep async.received_data() like you have it.
Otherwise, as you see, Server will bombard Receiver with messages out of sequence. It doesn't help either that you're directly managing threads on top of all this, rather than letting Celluloid::ZMQ do it for you.
Also: I would remove all "external" thread management, and remove the defer {} wrapper... and let Celluloid::ZMQ smooth everything out, and keep sequence for you.

Related

Cancel last sent message ZeroMQ (python) (dealer/router and pushh/pull)

How would one cancel the last sent message ?
I have this set up
The idea is that the client can ask for different types of large data.
The server reads the request from the client and answers an acknowledgement.
Once its data is ready, it pushes it through the other socket.
This enables queueing task on the server side when multiple clients are connected.
However, if the client decides that it does not need the data anymore, it can send a cancel message to the server.
I'm using asyncio.Queue for queueing messages, so I can easily empty the queue, however, I don't know how to drop a message that is in the push/pull pipe to free up the channel?
The kill switch example (Figure 19 - Parallel Pipeline with Kill Signaling) in https://zguide.zeromq.org/docs/chapter2/ is used to end the process. I just want to cancel it.
My idea was to close the socket on the server side and reopen it, but even with linger set to 0, the messages are not dropped.
EDIT: The messages are indeed dropped, but I feel the solution is wrong.
It doesn't really make any sense for ZeroMQ itself to have such a feature.
Suppose that it did have a cancel message feature. For it to operate as expected, you would be critically dependent on the speed of the network. You might develop on a slow network and their you have the time available to decide to cancel, submit the request and for that to take effect before anything has moved anywhere. But on a fast network you won't.
ZeroMQ is a bit like the post office. Once you have posted a letter, they are going to deliver it.
Other issues for a library developer would include how messages are identified, who can cancel a message, etc? It would get very complex for the library to do it and cater for all possible use cases, so it's not unreasonable that they've left such things as an exercise for the application developers.
Chop the Responses Up
You could divide the responses up into smaller messages, send them at some likely rate (proportionate to the network throughput) and check to see if a cancellation has been received before sending each chunk.
It's a bit fiddly, you'd need to know what kind of rate to send the smaller messages so that you don't starve the network, but don't over do it either.
Or, Convert to CSP
The problem lies in ZeroMQ implementing Actor Model, where the transport buffers messages. What you need is Communicating Sequential Processes, which does not buffer messages. You can implement this quite easily on top of ZeroMQ, basically all you need to do is have a two way message exchange going on basically like:
Peer1->Peer2: I'd like to send you a message
time passes
Peer2->Peer1: Okay send a message
Peer1->Peer2: Here is the message
time passes
Peer2->Peer1: I have received the message
end
And in doing this the peers would block, ie peer 1 does nothing else until it gets peer 2's final response.
This feels clunky, but it's what you have to do to reign in an Actor Model system and control where your messages are at any point in time. It's slower because there's more too-ing and fro-ing going on between the peers (in systems like Transputers, this was all done down at the electronic level, so it wasn't an encumberance on software).
The blocking can be a blessing, if throughput matters. Basically, if you find the sender is being blocked too much, that just means you haven't got enough receivers for the tasks they're performing. Actor Model can deceive, because buffering in the network / actor model implementation can temporarily soak up an excess of messages, adding a bit of latency that goes unnoticed.
Anyway, this way you can have a mechanism whereby the flow of messages is fully managed within the application, and not within the ZeroMQ library. If a client does send a "cancel my last request" message (using the above mechanism to send it), that either arrives before the reponse has started to be sent, or after the response has already been delivered to the client (using the mechanism above to send it). There is no intermediate state where a response is already on the way, but out of control of the applications.
CSP is a mode that I'd dearly like ZeroMQ to implement natively. It nearly does, in that you can control the socket high water marks. Unfortunately, a high water mark of 0 means "inifinite", not zero.
CSP itself is a 1970s idea, that saw some popularity and indeed silicon in the 1980s, early 1990s (Inmos, Transputers, Occam, etc) but has recently made something of a comeback in languages like Rust, Go, Erlang. There's even a MS-supplied library for .NET that does it too (not that they call it CSP).
The really big benefit of CSP is that it is algebraically analysable - a design can be analysed and proven to be free of deadlock, without having to do any testing. However, with Actor model systems you cannot do that, and testing will not confirm a lack of problems either. Complex, circular message flows in Actor model can easily lead to deadlock, but that might not occur until the network between computers becomes just a tiny bit busier. Deadlock can happen in CSP too, but it's basically guaranteed to happen every time, if the system has accidentally been architected to deadlock. This shows up in testing quite readily (so at least you know early on!).
As I alluded to early, CSP also doesn't deceive you into thinking there is enough compute resources in a system. If a sender has a strict schedule to keep, and the recipient(s) aren't keeping up, the sender ends up being blocked trying to send instead of waiting for fresh input. It's easy to detect that the real time requirement has not been met. Whereas with Actor model, the send launches messages off into some buffer, and so long as the receiver(s) on average keeps up, all appears to be OK. However, you have no visibility of whether messages are building up inside the (in this case) ZeroMQ's own buffers, so there is little notice of a trending problem in the overall system.

Server architecture: websocket multicast server?

What would be the simplest way to build a server that receives incoming connections via a websocket, and streams the data flowing in that socket out to n subscribers on other websockets. Think for example of a streaming application, where one person is broadcasting to n consumers.
Neglecting things like authentication, what would be the simplest way to build a server that can achieve this? I'm a little confused about what would happen when a chunk of data hits the server. It would go into a buffer in memory, then how would it be distributed to the n consumers waiting for it? Some sort of circular buffer? Are websockets an appropriate protocol for this? Thanks.
Here's one using the Ruby Plezi framework (I'm the author, so I'm biased):
require 'plezi'
class Client
# Plezi recognizes websocket handlers by the presence of the
# `on_message` callback.
def on_message data
true
end
protected
# this will be out event.
def publish data
write data
end
end
class Streamer
def on_message data
Client.broadcast :publish, data
end
end
# the streamer will connect to the /streamer path
route '/streamer', Streamer
# the client will connect to the /streamer path
route '/', Client
# on irb, we start the server by exiting the `irb` terminal
exit
You can test it with the Ruby terminal (irb) - it's that simple.
I tested the connections using the Websocket.org echo test with two browser windows, one "streaming" and the other listening.
use ws://localhost:3000/streamer for the streamer websocket connection
use ws://localhost:3000/ for the client's connection.
EDIT (relating to your comment regarding the Library and architecture)
The magic happens in the IO core, which I placed in a separate Ruby gem (Ruby libraries are referred to as 'gems') called Iodine.
Iodine leverages Ruby's Object Oriented approach (in Ruby, everything is an object) to handle broadcasting.
A good entry point for digging through that piece of the code is here. When you encounter the method each, note that it's inherited from the core Protocol and uses an Array derived from the IO map.
Iodine's websocket implementation iterates through the array of IO handlers (the value half of a key=>value map), and if the IO handler is a Websocket it will "broadcast" the message to that IO handler by invoking the on_broadcst callback. The callback is invoked asynchronously and it locks the IO handler while being executed, to avoid conflicts.
Plezi leverages Iodine's broadcast method and uses the same concept so that the on_broadcast callback will filter out irrelevant messages.
Unicasting works a little bit differently, for performance reasons, but it's mostly similar.
I'm sorry for using a lot of shorthand in my code... pre-Ruby habits I guess. I use the condition ? when_true : when_false shorthand a lot and tend to squish stuff into single lines... but it should be mostly readable.
Good luck!

Ruby Sockets and parallel event handling

I'm writing a library that can interact with a socket server that transmits data as events to certain actions my library sends it.
I created an Actions module that formats the actions so that the server can read it. It also generates an action_id, because the events parser can identify it with the action that sent it. There are more than one event per action possible.
While I'm sending my action to the server, the event parser is still getting data from the server, so they work independent from each other (but then again they do work together: events response aggregator triggers the action callback).
In my model, I want to get a list of some resource from the server. The server sends its data one line at a time, but that's being handled by the events aggregator, so don't worry about that.
Okay, my problem:
In my model I am requesting the resources, but since the events are being parsed in another thread, I need to do a "infinite" loop that checks if the list is filled, and then break out to return it to the consumer of the model (e.g. my controller).
Is there another (better) way of doing this or am I on the right track? I would love your thoughts :)
Here is my story in code: https://gist.github.com/anonymous/8652934
Check out Ruby EventMachine.
It's designed to simplify this sort of reactor pattern application.
It depends on the implementation. In the code you provide you're not showing how actually the request and responses are processed.
If you know exactly the number of responses you're supposed to receive, in each one you could check if all are completed, then execute an specific action. e.g.
# suppose response_receiver is the method which receives the server response
def response_receiver data
#responses_list << data
if #response_list.size == #expected_size
# Execute some action
end
end

boost::asio sending data faster than receiving over TCP. Or how to disable buffering

I have created a client/server program, the client starts
an instance of Writer class and the server starts an instance of
Reader class. Writer will then write a DATA_SIZE bytes of data
asynchronously to the Reader every USLEEP mili seconds.
Every successive async_write request by the Writer is done
only if the "on write" handler from the previous request had
been called.
The problem is, If the Writer (client) is writing more data into the
socket than the Reader (server) is capable of receiving this seems
to be the behaviour:
Writer will start writing into (I think) system buffer and even
though the data had not yet been received by the Reader it will be
calling the "on write" handler without an error.
When the buffer is full, boost::asio won't fire the "on write"
handler anymore, untill the buffer gets smaller.
In the meanwhile, the Reader is still receiving small chunks
of data.
The fact that the Reader keeps receiving bytes after I close
the Writer program seems to prove this theory correct.
What I need to achieve is to prevent this buffering because the
data need to be "real time" (as much as possible).
I'm guessing I need to use some combination of the socket options that
asio offers, like the no_delay or send_buffer_size, but I'm just guessing
here as I haven't had success experimenting with these.
I think that the first solution that one can think of is to use
UDP instead of TCP. This will be the case as I'll need to switch to
UDP for other reasons as well in the near future, but I would
first like to find out how to do it with TCP just for the sake
of having it straight in my head in case I'll have a similar
problem some other day in the future.
NOTE1: Before I started experimenting with asynchronous operations in asio library I had implemented this same scenario using threads, locks and asio::sockets and did not experience such buffering at that time. I had to switch to the asynchronous API because asio does not seem to allow timed interruptions of synchronous calls.
NOTE2: Here is a working example that demonstrates the problem: http://pastie.org/3122025
EDIT: I've done one more test, in my NOTE1 I mentioned that when I was using asio::iosockets I did not experience this buffering. So I wanted to be sure and created this test: http://pastie.org/3125452 It turns out that the buffering is there event with asio::iosockets, so there must have been something else that caused it to go smoothly, possibly lower FPS.
TCP/IP is definitely geared for maximizing throughput as intention of most network applications is to transfer data between hosts. In such scenarios it is expected that a transfer of N bytes will take T seconds and clearly it doesn't matter if receiver is a little slow to process data. In fact, as you noticed TCP/IP protocol implements the sliding window which allows the sender to buffer some data so that it is always ready to be sent but leaves the ultimate throttling control up to the receiver. Receiver can go full speed, pace itself or even pause transmission.
If you don't need throughput and instead want to guarantee that the data your sender is transmitting is as close to real time as possible, then what you need is to make sure the sender doesn't write the next packet until he receives an acknowledgement from the receiver that it has processed the previous data packet. So instead of blindly sending packet after packet until you are blocked, define a message structure for control messages to be sent back from the receiver back to the sender.
Obviously with this approach, your trade off is that each sent packet is closer to real-time of the sender but you are limiting how much data you can transfer while slightly increasing total bandwidth used by your protocol (i.e. additional control messages). Also keep in mind that "close to real-time" is relative because you will still face delays in the network as well as ability of the receiver to process data. So you might also take a look at the design constraints of your specific application to determine how "close" do you really need to be.
If you need to be very close, but at the same time you don't care if packets are lost because old packet data is superseded by new data, then UDP/IP might be a better alternative. However, a) if you have reliable deliver requirements, you might ends up reinventing a portion of tcp/ip's wheel and b) keep in mind that certain networks (corporate firewalls) tend to block UDP/IP while allowing TCP/IP traffic and c) even UDP/IP won't be exact real-time.

Concurrent web requests with Ruby (Sinatra?)?

I have a Sinatra app that basically takes some input values and then finds data matching those values from external services like Flickr, Twitter, etc.
For example:
input:"Chattanooga Choo Choo"
Would go out and find images at Flickr on the Chattanooga Choo Choo and tweets from Twitter, etc.
Right now I have something like:
#images = Flickr::...find...images..
#tweets = Twitter::...find...tweets...
#results << #images
#results << #tweets
So my question is, is there an efficient way in Ruby to run those requests concurrently? Instead of waiting for the images to finish before the tweets finish.
Threads would work, but it's a crude tool. You could try something like this:
flickr_thread = Thread.start do
#flickr_result = ... # make the Flickr request
end
twitter_thread = Thread.start do
#twitter_result = ... # make the Twitter request
end
# this makes the main thread wait for the other two threads
# before continuing with its execution
flickr_thread.join
twitter_thread.join
# now both #flickr_result and #twitter_result have
# their values (unless an error occurred)
You'd have to tinker a bit with the code though, and add proper error detection. I can't remember right now if instance variables work when declared inside the thread block, local variables wouldn't unless they were explicitly declared outside.
I wouldn't call this an elegant solution, but I think it works, and it's not too complex. In this case there is luckily no need for locking or synchronizations apart from the joins, so the code reads quite well.
Perhaps a tool like EventMachine (in particular the em-http-request subproject) might help you, if you do a lot of things like this. It could probably make it easier to code at a higher level. Threads are hard to get right.
You might consider making a client side change to use asynchronous Ajax requests to get each type (image, twitter) independently. The problem with server threads (one of them anyway) is that if one service hangs, the entire request hangs waiting for that thread to finish. With Ajax, you can load an images section, a twitter section, etc, and if one hangs the others will still show their results; eventually you can timeout the requests and show a fail whale or something in that section only.
Yes why not threads?
As i understood. As soon as the user submit a form, you want to process all request in parallel right? You can have one multithread controller (Ruby threads support works really well.) where you receive one request, then you execute in parallel the external queries services and then you answer back in one response or in the client side you send one ajax post for each service and process it (maybe each external service has your own controller/actions?)
http://github.com/pauldix/typhoeus
parallel/concurrent http requests
Consider using YQL for this. It supports subqueries, so that you can pull everything you need with a single (client-side, even) call that just spits out JSON of what you need to render. There are tons of tutorials out there already.

Resources