Ruby: How do I receive and send data parallel? - ruby

I want to create an application that sends and receives data parallel, like a chat application. It gets input and also sends some output, but NOT only if it receives data. I want to use UDP as protocol. I'm using ruby 1.9.3.
here's the code that receives data:
#s = UDPSocket.new
#s.bind(localhost, 1234)
Socket.udp_server_loop_on([#s]) do |message, sender|
#do something
end
This code should run independent from the rest of the application, it shouldn't block it.
Should I use a thread? I've never tried a network program and I'm not a professional developer, so please be patient. Perhaps my code/design is just crap, so feel free to tell me how this is done by professionals! ;)

UDP lends itself this sort of non-blocking processing quite naturally since you're receiving individual, atomic messages over your socket and can reply in the same fashion.
Inside that loop, just be sure to process things quickly and send response messages. If you make long blocking calls it will hold up your loop and affect response times.
EventMachine provides a structure for writing asynchronous applications and has its own methods for handling UDP and TCP sockets.
Don't forget to look at solutions which are already implemented. For chat applications, Socket.IO is a great place to start.

You should take a look at Eventmachine gem which handles blocking IO very efficiently.
Among others it also offers TCP and UDP server/client API.

Related

Detecting socket connection using ZeroMQ STREAM sockets

I am building a new application that receives data from a number of external devices and needs to make it available to a number of different components. ZeroMQ seems purpose-built for the "data bus" aspect of my architecture.
I recently became aware that zmq STREAM sockets can connect to native TCP sockets and send/received messages. Using zmq throughout has a lot of appeal, but I have one problem that I don't know how to get around.
One of my devices needs to be set up. That is, I connect a socket to it, send it some configuration information, then sit back and wait for it to send me data. The device also has a "reset" capability (useful in some contexts), that requires re-sending the configuration information. Doing this depends upon having visibility to the setup/tear-down stage of the socket interface. I need to know when a new connection is established, so I can send the necessary configuration messages.
It seems that zmq is purposely designed to shield me from that knowledge. Is there a way to do what I want? Or should I just use regular sockets for this interface?
Well, it turns out that reading (the right version of) the fine manual can be instructive.
When a connection is made, a zero-length message will be received by the application. Similarly, when the peer disconnects (or the connection is lost), a zero-length message will be received by the application.
I guess all that remains is to disambiguate between connect and disconnect. Still looking for advice from the community, if others have dealt with this situation before.
Following up on your own answer, I would hesitate to rely on that zero length connect/disconnect message as your whole strategy - that seems needlessly fragile. It's not clear to me from your question which end is persistent and which end needs configuration information, but I expect that one end knows it's resetting and reconnecting, and that end needs configuration information from the peer, so it should ask for it with a message when it needs it, to which the peer responds with the requested information.
If the peer does not yet have the required configuration information before it receives some other message, it could either queue up that work or it could respond back with the need for the config, and then have the rest of the network handle that need appropriately.
You shouldn't need stream/tcp sockets to make that work, it should work with more standard ZMQ socket types, you just need to build the robustness into your application rather than trying to get it for free from TCP/socket actions.
If I've missed your point, and what I'm suggesting won't work for some reason, you will have to give more specific information about your network topology for anyone else to understand what a suitable solution might be.

How do I do multiple publishers with a single endpoint in ZeroMQ?

I'm attempting to do a pub/sub architecture where multiple publishers and multiple subscribers exist on the same bus. According to what I've read on the internet, only one socket should ever call bind(), and all others (whether pub or sub) should call connect().
The problem is, with this approach I'm finding that only the publisher that actually calls bind() on the socket ever publishes messages. All of my publishers that call connect() seem to fail silently and don't actually publish any messages to the bus. I've confirmed this isn't a subscriber key issue, as I've written a simple "sniffer" app that subscribes to all messages on the bus, and it is only showing the publisher that called bind().
If I attempt multiple binds with the publisher, the "expected" zmq behavior of silently stealing the bus occurs with ipc, and a port in use error is thrown with tcp.
I've verified this behavior with ipc and tcp endpoints, but ultimately the full system will be using epgm. I assume (though of course may be wrong) that in this situation I wouldn't need a broker since there's no dynamic discovery occurring (endpoints are known, whether ipc, tcp, or epgm multicast).
Is there something I'm missing, perhaps a socket setting, that would be causing the connecting publishers to not actually send their data? According to the literature I've seen on the internet, I'm doing things the "correct" way but it still doesn't work.
For reference, my publisher class has the following methods for setting up the endpoint:
ZmqPublisher::ZmqPublisher()
: m_zmqContext(1), m_zmqSocket(m_zmqContext, ZMQ_PUB)
{}
void ZmqPublisher::bindEndpoint(std::string ep)
{
m_zmqSocket.bind(ep.c_str());
}
void ZmqPublisher::connect(std::string ep)
{
m_zmqSocket.connect(ep.c_str());
}
So ultimately, my question is this: What is the proper way to handle multiple publishers on the same endpoint, and why am I not seeing messages from more than one publisher?
It may or may not be relevant, but The 0MQ Guide has the following slightly enigmatic remark:
In theory with ØMQ sockets, it does not matter which end connects and which end binds. However, in practice there are undocumented differences that I'll come to later. For now, bind the PUB and connect the SUB, unless your network design makes that impossible.
I've not yet discovered where the "come to later" actually happens, but I don't use pub/sub so much, and haven't read the "Advanced Pub-Sub Patterns" part of the guide in great detail.
However, the idea of multiple publishers on a single end-point, to me, suggests the need for an XPUB/XSUB style broker; it's not about dynamic discovery, it's about single point of contact and routing. Ultimately, I think a broker-based topology would simplify your application, and make it easier to identify problems.
Your mistake was that you call a single publisher with bind and others with connect. This is not supported with plain PUB-SUB pattern.
Plain PUB-SUB in ZeroMQ supports only two scenarios (see the image below):
single publisher, multiple subscribers
single subscriber, multiple publishers
In both cases, the party that is "single" must bind and the party that is "multiple" must connect. Otherwise, if you want many-to-many, you can use XPUB-XSUB or some other pattern.

Asynchronous methods calls with Ruby like with Ajax

I am working with XMPP and I have a message callback which is activated on the event of every message being sent. My aim is to send the data arriving by the message to an API within the callback and based on the response send something back using the XMPP client.
User type message (Browser Chat Client)
Message arrives to the server via XMPP
Message is sent to the API
Response is received
Response is sent back to the chat client.
My code for this is as follows
admin_muc_client.activate_message_callbacks do |m|
sender = m.from.resource
room_slug = m.from.node
message = m.body
r = HTTParty.get('http://example.com/api/v1/query?msg=message')
Rails.logger.debug(r.inspect)
admin_muc_client.send_message("Message #{r.body}") if m.from.resource != 'admin'
end
My concern here is that since the callback is evented and the request to the API would be a blocking call this could become a bottleneck for the entire application.
I would prefer to use something like AJAX for Javascript which would fire the request and when the response is available send data. How could I implement that in Ruby?
I have looked at delayed_job and backgroundrb which look like tools for fire and forget operations. I would need something that activates a callback in an asynchronous manner with the response.
I would really really appreciate some help on how to achieve the asynchronous behavior that i want. I am also familiar with message queues like RabbitMQ which I feel would be addition of significant complexity.
Did you look at girl_friday? From it's wiki -
girl_friday is a Ruby library for performing asynchronous tasks. Often times you don't want to block a web response by performing some task, like sending an email, so you can just use this gem to perform it in the background. It works with any Ruby application, including Rails 3 applications.
Why not use any of the zillions of other async solutions (Resque, dj, etc)? Because girl_friday is easier and more efficient than those solutions: girl_friday runs in your Rails process and uses the actor pattern for safe concurrency. Because it runs in the same process, you don't have to monitor a separate set of processes, deploy a separate codebase, waste hundreds of extra MB in RAM for those processes, etc. See my intro to Actors in Ruby for more detail.
You do need to write thread-safe code. This is not hard to do: the actor pattern means that you get a message and process that message. There is no shared data which requires locks and could lead to deadlock in your application code. Because girl_friday does use Threads under the covers, you need to ensure that your Ruby environment can execute Threads efficiently. JRuby, Rubinius 1.2 and Ruby 1.9.2 should be sufficient for most applications. I do not support Ruby 1.8 because of its poor threading support.
I think this is what you are looking for.

Messaging system design

I am looking for a way to send requests and receive call backs from another party.
The only gotcha is that we do not now how it will be designed/deployed on the receiver side.
We do have the text/JSON based messages defined and agreed upon.
Looked at RabbitMQ and others, but each requires a server that would need to be maintained.
Thanks,
RabbitMQ is pretty easy to maintain. You would use two queues, one for requests and the other for replies. Use the AMQP correlation_id header to tag requests and replies so that when a reply message is received it can be matched with the orginal request.
However, if a broker is not for you, then use ZeroMQ. It is a client library available for a dozen or more languages and it enforces messaging patterns over top of sockets. This means that your app does not have to do all the low level socket management. Instead you declare the socket as REQ/REP and ZeroMQ handles all the rest. You just send messages in any format you desire, and you get messages back.
I've used ZeroMQ to implement a memcache style application in Python using REQ/REP.
#user821692: You have to agree not only message format but also destination/transport protocol. For e.g. if both communicating parties has access to same queue physically located anywhere, then they can communicate pre-defined messages. You may also look of sending messages over HTTP..

Web sockets make ajax/CORS obsolete?

Will web sockets when used in all web browsers make ajax obsolete?
Cause if I could use web sockets to fetch data and update data in realtime, why would I need ajax? Even if I use ajax to just fetch data once when the application started I still might want to see if this data has changed after a while.
And will web sockets be possible in cross-domains or only to the same origin?
WebSockets will not make AJAX entirely obsolete and WebSockets can do cross-domain.
AJAX
AJAX mechanisms can be used with plain web servers. At its most basic level, AJAX is just a way for a web page to make an HTTP request. WebSockets is a much lower level protocol and requires a WebSockets server (either built into the webserver, standalone, or proxied from the webserver to a standalone server).
With WebSockets, the framing and payload is determined by the application. You could send HTML/XML/JSON back and forth between client and server, but you aren't forced to. AJAX is HTTP. WebSockets has a HTTP friendly handshake, but WebSockets is not HTTP. WebSockets is a bi-directional protocol that is closer to raw sockets (intentionally so) than it is to HTTP. The WebSockets payload data is UTF-8 encoded in the current version of the standard but this is likely to be changed/extended in future versions.
So there will probably always be a place for AJAX type requests even in a world where all clients support WebSockets natively. WebSockets is trying to solve situations where AJAX is not capable or marginally capable (because WebSockets its bi-directional and much lower overhead). But WebSockets does not replace everything AJAX is used for.
Cross-Domain
Yes, WebSockets supports cross-domain. The initial handshake to setup the connection communicates origin policy information. The wikipedia page shows an example of a typical handshake: http://en.wikipedia.org/wiki/WebSockets
I'll try to break this down into questions:
Will web sockets when used in all web browsers make ajax obsolete?
Absolutely not. WebSockets are raw socket connections to the server. This comes with it's own security concerns. AJAX calls are simply async. HTTP requests that can follow the same validation procedures as the rest of the pages.
Cause if I could use web sockets to fetch data and update data in realtime, why would I need ajax?
You would use AJAX for simpler more manageable tasks. Not everyone wants to have the overhead of securing a socket connection to simply allow async requests. That can be handled simply enough.
Even if I use ajax to just fetch data once when the application started I still might want to see if this data has changed after a while.
Sure, if that data is changing. You may not have the data changing or constantly refreshing. Again, this is code overhead that you have to account for.
And will web sockets be possible in cross-domains or only to the same origin?
You can have cross domain WebSockets but you have to code your WS server to accept them. You have access to the domain (host) header which you can then use to accept / deny requests. This can, however, be spoofed by something as simple as nc. In order to truly secure the connection you will need to authenticate the connection by other means.
Websockets have a couple of big downsides in terms of scalability that ajax avoids. Since ajax sends a request/response and closes the connection (..or shortly after) if someone stays on the web page it doesn't use server resources when idling. Websockets are meant to stream data back to the browser, and they tie up server resources to do so. Servers have a limit in how many simultaneous connections they can keep open at one time. Not to mention depending on your server side technology, they may tie up a thread to handle the socket. So websockets have more resource intensive requirements for both sides per connection. You could easily exhaust all of your threads servicing clients and then no new clients could come in if lots of users are just sitting on the page. This is where nodejs, vertx, netty can really help out, but even those have upper limits as well.
Also there is the issue of state of the underlying socket, and writing the code on both sides that carry on the stateful conversation which isn't something you have to do with ajax style because it's stateless. Websockets require you create a low level protocol which is solved for you with ajax. Things like heart beating, closing idle connections, reconnection on errors, etc are vitally important now. These are things you didn't have to solve when using AJAX because it was stateless. State is very important to the stability of your app and more importantly the health of your server. It's not trivial. Pre-HTTP we built a lot of stateful TCP protocols (FTP, telnet, SSH), and then HTTP happened. And no one did that stuff much anymore because even with its limitations HTTP was surprisingly easier and more robust. Websockets bring back the good and the bad of stateful protocols. You'll learn soon enough if you didn't get a dose of that last go around.
If you need streaming of realtime data this extra overhead is warranted because polling the server to get streamed data is worse, but if all you are doing is user interaction->request->response->update UI, then ajax is easier and will use less resources because once the response is sent the conversation is over and no additional server resources are used. So I think it's a tradeoff and the architect has to decide which tool fits their problem. AJAX has its place, and websockets have their place.
Update
So the architecture of your server is what matters when we are talking about threads. If you are using a traditionally multi-threaded server (or processes) where a each socket connection gets its own thread to respond to requests then websockets matter a lot to you. So for each connection we have a socket, and eventually the OS will fall over if you have too many of these, and the same goes for threads (more so for processes). Threads are heavier than sockets (in terms of resources) so we try and conserve how many threads we have running simultaneously. That means creating a thread pool which is just a fixed number of threads that is shared among all sockets. But once a socket is opened the thread is used for the entire conversation. The length of those conversations govern how quickly you can repurpose those threads for new sockets coming in. The length of your conversation governs how much you can scale. However if you are streaming this model doesn't work well for scaling. You have to break the thread/socket design.
HTTP's request/response model makes it very efficient in turning over threads for new sockets. If you are just going to use request/response use HTTP its already built and much easier than reimplementing something like that in websockets.
Since websockets don't have to be request/response as HTTP and can stream data if your server has a fixed number of threads in its thread pool and you have the same number of websockets tying up all of your threads with active conversations, you can't service new clients coming in! You've reached your maximum capacity. That's where protocol design is important too with websockets and threads. Your protocol might allow you to loosen the thread per socket per conversation model that way people just sitting there don't use a thread on your server.
That's where asynchronous single thread servers come in. In Java we often call this NIO for non-blocking IO. That means it's a different API for sockets where sending and receiving data doesn't block the thread performing the call.
So traditional in blocking sockets when you call socket.read() or socket.write() they wait until the data is received or sent before returning control to your program. That means your program is stuck waiting for the socket data to come in or go out until you can do anything else. That's why we have threads so we can do work concurrently (at the same time). Send this data to client X while I wait on data from client Y. Concurrencies is the name of the game when we talk about servers.
In a NIO server we use a single thread to handle all clients and register callbacks to be notified when data arrives. For example
socket.read( function( data ) {
// data is here! Now you can process it very quickly without waiting!
});
The socket.read() call will return immediately without reading any data, but our function we provided will be called when it comes in. This design radically changes how you build and architect your code because if you get hung up waiting on something you can't receive any new clients. You have a single thread you can't really do two things at once! You have to keep that one thread moving.
NIO, Asynchronous IO, Event based program as this is all known as, is a much more complicated system design, and I wouldn't suggest you try and write this if you are starting out. Even very Senior programmers find it very hard to build a robust systems. Since you are asynchronous you can't call APIs that block. Like reading data from the DB or sending messages to other servers have to be performed asynchronously. Even reading/writing from the file system can slow your single thread down lowering your scalability. Once you go asynchronous it's all asynchronous all the time if you want to keep the single thread moving. That's where it gets challenging because eventually you'll run into an API, like DBs, that is not asynchronous and you have to adopt more threads at some level. So a hybrid approaches are common even in the asynchronous world.
The good news is there are other solutions that use this lower level API already built that you can use. NodeJS, Vertx, Netty, Apache Mina, Play Framework, Twisted Python, Stackless Python, etc. There might be some obscure library for C++, but honestly I wouldn't bother. Server technology doesn't require the very fastest languages because it's IO bound more than CPU bound. If you are a die hard performance nut use Java. It has a huge community of code to pull from and it's speed is very close (and sometimes better) than C++. If you just hate it go with Node or Python.
Yes, yes it does. :D
The earlier answers lack imagination. I see no more reason to use AJAX if websockets are available to you.

Resources