Server architecture: websocket multicast server?

What would be the simplest way to build a server that receives incoming connections via a websocket, and streams the data flowing in that socket out to n subscribers on other websockets. Think for example of a streaming application, where one person is broadcasting to n consumers.
Neglecting things like authentication, what would be the simplest way to build a server that can achieve this? I'm a little confused about what would happen when a chunk of data hits the server. It would go into a buffer in memory, then how would it be distributed to the n consumers waiting for it? Some sort of circular buffer? Are websockets an appropriate protocol for this? Thanks.

Here's one using the Ruby Plezi framework (I'm the author, so I'm biased):
require 'plezi'

class Client
  # Plezi recognizes websocket handlers by the presence of the
  # `on_message` callback.
  def on_message data
    true
  end

  protected

  # this will be our event.
  def publish data
    write data
  end
end

class Streamer
  def on_message data
    Client.broadcast :publish, data
  end
end

# the streamer will connect to the /streamer path
route '/streamer', Streamer
# the client will connect to the root path
route '/', Client

# on irb, we start the server by exiting the `irb` terminal
exit
You can test it with the Ruby terminal (irb) - it's that simple.
I tested the connections using the Websocket.org echo test with two browser windows, one "streaming" and the other listening.
use ws://localhost:3000/streamer for the streamer websocket connection
use ws://localhost:3000/ for the client's connection.
EDIT (relating to your comment regarding the Library and architecture)
The magic happens in the IO core, which I placed in a separate Ruby gem (Ruby libraries are referred to as 'gems') called Iodine.
Iodine leverages Ruby's Object Oriented approach (in Ruby, everything is an object) to handle broadcasting.
A good entry point for digging through that piece of the code is here. When you encounter the method each, note that it's inherited from the core Protocol and uses an Array derived from the IO map.
Iodine's websocket implementation iterates through the array of IO handlers (the value half of a key=>value map), and if an IO handler is a Websocket it will "broadcast" the message to that handler by invoking its on_broadcast callback. The callback is invoked asynchronously, and it locks the IO handler while being executed, to avoid conflicts.
Plezi leverages Iodine's broadcast method and uses the same concept so that the on_broadcast callback will filter out irrelevant messages.
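Conceptually, the broadcast pass looks something like the following simplified sketch (this is illustrative only, not Iodine's actual source; io_map, defer and the per-handler mutex are assumptions):

def broadcast(data)
  # io_map: a Hash mapping file descriptors to their IO handler objects
  io_map.values.each do |handler|
    # only websocket handlers receive the broadcast
    next unless handler.is_a?(Websocket)
    defer do
      # the handler is locked while the callback runs, to avoid conflicts
      handler.mutex.synchronize { handler.on_broadcast(data) }
    end
  end
end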
Unicasting works a little bit differently, for performance reasons, but it's mostly similar.
I'm sorry for using a lot of shorthand in my code... pre-Ruby habits I guess. I use the condition ? when_true : when_false shorthand a lot and tend to squish stuff into single lines... but it should be mostly readable.
Good luck!

Related

Defer with celluloid ZMQ causing data issue

Sample Code to explain my problem.
Sender
The Sender sends requests to the Server (over ZMQ) in this format:
["sender-1", "sender-1-bdc1c14624076b691b8d9e15fbd754f1"]
..
["sender-99","sender-99-a2123d02c2989ef57da370bb13ba60e4"]
Server
The Server, upon receiving the data from the Sender, relays it back to the Receiver in the same format:
["sender-1", "sender-1-bdc1c14624076b691b8d9e15fbd754f1"]
...
["sender-99","sender-99-a2123d02c2989ef57da370bb13ba60e4"]
Receiver
The Receiver, upon receiving the request from the Server, just prints the message.
Description:
If I don't use the mutex defined in this line (inside the server), I see some data appearing at the receiver end which does not adhere to the above format/standard.
For example, the Server would print (at this line):
"Sending sender-97 -- sender-97-9a284488454c8e8fd22bbbcf678895e9"
"Sending sender-98 -- sender-98-447eb5be94e7f6d949e764d7c88239ad"
But on the receiver end, I see messages that look like this.
sender-97 -- sender-98
sender-98-22d9f01a8f801f559a9ff4e388813251 --
Question:
To me it seems like a possible thread issue (I may be wrong), wherein the data passed to Celluloid's send (inside the Server) is getting changed by another thread.
I was under the impression that Celluloid handles almost all of your thread issues.
Is my understanding correct about all this?
You need an asynchronous logger.
Whenever you use puts you are writing to a buffer, which is actually very slow compared to everything else you are doing. What you ought to do is use an actor purely in place of puts everywhere.
The best thing would be to instantiate a supervised actor, say named :logger, and then interact with it inside your other actors by Celluloid[:logger].async.output(), or else forward output to Celluloid[:logger].async, so that every time you use output or console or something like that, it fires off an asynchronous method call. That way, even though the actor doing the work moves on to something else, the console output will still be in perfect sequence.
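A minimal sketch of that pattern, using the supervise_as registration available in Celluloid around the 0.15/0.16 era (the class name ConsoleLogger is illustrative):

require 'celluloid'

class ConsoleLogger
  include Celluloid

  # an actor's mailbox processes one method call at a time,
  # so output lines can never interleave
  def output(message)
    puts message
  end
end

# supervised and registered under :logger, as described above
ConsoleLogger.supervise_as :logger

# from any other actor:
Celluloid[:logger].async.output("Sending sender-1 -- ...")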
Your console output is being mangled because of the above issue with asynchrony in programming itself, not Celluloid ... and this is an extremely common problem with and without ZeroMQ.
Yes, defer is disrupting the sequence of your Receiver output, but no, this is not a thread error in Celluloid. This is the natural behavior of asynchronous operations. In this case, you need to remove defer {} and keep async.received_data() like you have it.
Otherwise, as you see, Server will bombard Receiver with messages out of sequence. It doesn't help either that you're directly managing threads on top of all this, rather than letting Celluloid::ZMQ do it for you.
Also: I would remove all "external" thread management, and remove the defer {} wrapper... and let Celluloid::ZMQ smooth everything out, and keep sequence for you.

Ruby Sockets and parallel event handling

I'm writing a library that interacts with a socket server, which transmits data as events in response to actions my library sends it.
I created an Actions module that formats the actions so that the server can read them. It also generates an action_id, so that the events parser can match incoming events with the action that sent them. More than one event per action is possible.
While I'm sending my action to the server, the event parser is still getting data from the server, so they work independently of each other (but they do work together: the events response aggregator triggers the action callback).
In my model, I want to get a list of some resource from the server. The server sends its data one line at a time, but that's being handled by the events aggregator, so don't worry about that.
Okay, my problem:
In my model I am requesting the resources, but since the events are being parsed in another thread, I need to run an "infinite" loop that checks whether the list is filled, and then break out to return it to the consumer of the model (e.g. my controller).
Is there another (better) way of doing this or am I on the right track? I would love your thoughts :)
Here is my story in code: https://gist.github.com/anonymous/8652934
Check out Ruby EventMachine.
It's designed to simplify this sort of reactor pattern application.
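For a taste of the style, here's a minimal sketch of an EventMachine connection that collects a line-by-line response and reacts when it's complete (the module name, the LIST action, and the END marker are all made up for illustration):

require 'eventmachine'

module ResourceClient
  def post_init
    send_data "LIST resources\r\n"  # hypothetical action sent to the server
    @lines = []
  end

  # EventMachine calls this whenever bytes arrive, so there is
  # no polling loop; we simply react to data
  def receive_data(data)
    @lines.concat(data.split("\r\n"))
    return unless @lines.last == 'END'  # hypothetical end-of-list marker
    handle_list(@lines[0..-2])
    EM.stop
  end

  def handle_list(list)
    p list  # hand the complete list to the model's callback
  end
end

EM.run { EM.connect 'localhost', 1234, ResourceClient }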
It depends on the implementation. The code you provide doesn't show how the requests and responses are actually processed.
If you know exactly how many responses you're supposed to receive, in each one you could check whether all are complete, then execute a specific action, e.g.:
# suppose response_receiver is the method which receives the server response
def response_receiver data
  @responses_list << data
  if @responses_list.size == @expected_size
    # Execute some action
  end
end
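For what it's worth, if you just need the requesting thread to wait without spinning, Ruby's thread-safe Queue blocks on pop until another thread delivers the result. A minimal sketch (fetch_resources_from_events is a hypothetical stand-in for the aggregator):

require 'thread'

list_ready = Queue.new

# the events-parser thread pushes the finished list when done
Thread.new do
  resources = fetch_resources_from_events  # hypothetical aggregator call
  list_ready << resources
end

# the model blocks here (no busy loop) until the list arrives
resources = list_ready.pop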

Ruby HTTP server without networking

I am trying to add an HTTP server to an existing Ruby application. The application is based around a select loop, and I want to handle incoming HTTP requests there too (it is important to process the requests in the same thread, or I have to jump through hoops to marshal them there).
Ruby has plenty of solutions for standalone HTTP servers, but I can't seem to find a library which implements an HTTP server on an existing socket. I don't want the HTTP library to open a port and wait, I want to feed it sockets.
The basic logic I'm looking for is this:
handler = SomeHTTPParsingLibrary.new
# set up handler callbacks, etc on handler...
while socket = get_incoming_connection()
  handler.handle_request(socket)
end
Are there any existing Ruby libraries that can work like this? HTTP is a simple enough protocol, but there are enough irritating details involved (I need cookies, basic auth, etc) that I'd rather not roll my own.
You may have to roll your sleeves up a bit to figure out what methods to call, but I'd suggest trying the HTTPParser class from within mongrel.
A quick glance through the code in httprequest.rb (WEBrick, from the Ruby stdlib) suggests it might suit your purpose.
A WEBrick::HTTPRequest object is able to accept a socket as an argument to its parse() method. It will then block, and return when the request object has been fully populated with the incoming HTTP request.
eg:
res = HTTPResponse.new(@config)
req = HTTPRequest.new(@config)

# some code to "select" a socket goes here
# sock is active, hand it over to the req object for reading
req.parse(sock)
res.request_method = req.request_method
Of course, this assumes that the thread will block while the current request handling completes.
OTOH, something like tmm1/http_parser.rb might also fit your needs, but it sacrifices other things (like handling cookies) in favor of speed.
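Feeding raw bytes to http_parser.rb looks roughly like this (based on its README; treat the details as an assumption):

require 'http/parser'

parser = Http::Parser.new

parser.on_headers_complete = proc do |headers|
  p headers  # a plain Hash; note: no cookie or basic-auth conveniences
end

parser.on_message_complete = proc do
  puts 'request fully parsed'
end

# feed whatever bytes your select loop pulled off the socket
parser << "GET / HTTP/1.1\r\nHost: example.com\r\n\r\n"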

realtime communication with ruby

I'm about to write a game server with ruby. One feature of the game includes player walking around & others should be able to see it.
I've already written a pure socket demo using EventMachine. But since most of the communication is going to be HTTP-based, I'm looking for some HTTP polling solution. And of course I could write it with EventMachine, but is there any gem out there for this kind of job already?
I've tried things like Faye, but most of these are messaging systems built around subscribing and publishing to a channel; I can't seem to control which clients I push to. In my case I need to push to specific clients: if one player moves from 10,10 to 20,20, only those around him (maybe from 0,0 to 30,30, but not a player at 40,50) need to receive the message.
------------ progress with Cramp
Here's a quick update. I'm working with Cramp: at 5000 connections and 100 client moves per second, the CPU usage is almost 100%. When I double both figures, the CPU usage is still about 100%, and the response is very slow.
Clearly I'm not using every resource I have; only one CPU core is occupied. More work needed.
------------Node.js's turn
@aam1r
Actually Node.js is doing better than Cramp. With 5000 connections and 100 clients moving per second, the CPU usage is over 60%. When I doubled that to 10000 connections and 200 clients moving per second, the CPU usage hit 100% and the response became slow. Same problem here: either Cramp or Node.js can only use one CPU core per process. That's a problem.
------------What about JRuby?
Because of the presence of the GIL, there's no true multi-threaded simultaneous execution with Ruby MRI. None with Node.js either. So I'm going to give JRuby a try.
When a client moves, use another thread to find all the other clients that need to be notified (which is CPU-heavy work), then push the result to a channel.
The main thread simply subscribes the channel. When it gets the result, push them to the clients.
Need some time to write a demo though.
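In the meantime, a rough sketch of that design using plain Ruby threads and a Queue as the channel (clients_near and push_to are hypothetical helpers for the spatial lookup and the socket write):

moves   = Queue.new  # fed by the connection handlers
channel = Queue.new  # results for the main IO thread

# worker thread: the CPU-heavy neighbour search; on JRuby these
# threads run truly in parallel, under MRI's GIL they only interleave
Thread.new do
  loop do
    move = moves.pop
    channel << [clients_near(move), move]  # hypothetical spatial lookup
  end
end

# main thread: subscribes to the channel and only does IO
loop do
  targets, move = channel.pop
  targets.each { |client| push_to(client, move) }  # hypothetical push
end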
I would recommend using Espresso with Server-Sent Events.
On the server-side you define a streaming action:
class App < E
  map :/

  attr_reader :connections

  def subscribe
    @connections ||= []
    stream :keep_open do |conn|
      connections << conn
      conn.callback { connections.delete conn }
    end
  end

  private

  def communicate_to_clients
    connections.each do |conn|
      conn << 'some message'
    end
  end
end
The :keep_open option instructs the server not to close the connection.
Then open a connection from JavaScript:
pool = new EventSource('/subscribe');
pool.onmessage = function(msg) {
  // here you receive messages sent by the server
  // via the communicate_to_clients method
}
I would suggest not using polling. Polling would result in too much overhead, since you'll be making a new connection every time you make a new request. Also, it won't be real-time enough for you (i.e. you will poll every X seconds, not instantly).
Instead, I would suggest using something like Cramp. From their website:
Cramp is a fully asynchronous real-time web application framework in Ruby. It is built on top of EventMachine and primarily designed for working with larger number of open connections and providing full-duplex bi-directional communication.
All your clients would maintain a persistent connection through which they can send and receive messages. There won't be the overhead of making a new connection every time, and messages will be sent in real time since clients won't be checking "every X seconds".
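A minimal Cramp websocket handler looks roughly like this (adapted from memory of Cramp's README, circa version 0.15, so treat the exact API as an assumption):

require 'cramp'

class GameSocket < Cramp::Websocket
  on_data :received_data

  def received_data(data)
    # the connection stays open; every message reuses it,
    # so there is no per-request connection overhead
    render "seen: #{data}"
  end
end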
You can also use Node.js instead of Cramp. It's a JavaScript platform that can be used to develop real-time applications.
Here are some more resources that should help you out:
Slideshow on using Node.js with Ruby
Discussion on "Real time ruby apps: CRAMP vs NODE.JS"

Rack concurrency - rack.multithread, async.callback, or both?

I'm attempting to fully understand the options for concurrent request handling in Rack. I've used async_sinatra to build a long-polling app, and am now experimenting with bare-metal Rack using throw :async and/or Thin's --threaded flag. I am comfortable with the subject, but there are some things I just can't make sense of. (No, I am not mistaking concurrency for parallelism, and yes, I do understand the limitations imposed by the GIL).
Q1. My tests indicate that thin --threaded (i.e. rack.multithread=true) runs requests concurrently in separate threads (I assume using EM), meaning long-running request A will not block request B (IO aside). This means my application does not require any special coding (e.g. callbacks) to achieve concurrency (again, ignoring blocking DB calls, IO, etc.). This is what I believe I have observed - is it correct?
Q2. There is another, more oft discussed means of achieving concurrency, involving EventMachine.defer and throw :async. Strictly speaking, requests are not handled using threads. They are dealt with serially, but pass their heavy lifting and a callback off to EventMachine, which uses async.callback to send a response at a later time. After request A has offloaded its work to EM.defer, request B is begun. Is this correct?
Q3. Assuming the above are more-or-less correct, is there any particular advantage to one method over the other? Obviously --threaded looks like a magic bullet. Are there any downsides? If not, why is everyone talking about async_sinatra / throw :async / async.callback ? Perhaps the former is "I want to make my Rails app a little snappier under heavy load" and the latter is better-suited for apps with many long-running requests? Or perhaps scale is a factor? Just guessing here.
I'm running Thin 1.2.11 on MRI Ruby 1.9.2. (FYI, I have to use the --no-epoll flag, as there's a long-standing, supposedly-resolved-but-not-really problem with EventMachine's use of epoll and Ruby 1.9.2. That's beside the point, but any insight is welcome.)
Note: I use Thin as synonym for all web servers implementing the async Rack extension (i.e. Rainbows!, Ebb, future versions of Puma, ...)
Q1. Correct. It will wrap the response generation (aka call) in EventMachine.defer { ... }, which will cause EventMachine to push it onto its built-in thread pool.
Q2. Using async.callback in conjunction with EM.defer doesn't make much sense, as it would basically use the thread pool too, ending up with a construct similar to the one described in Q1. Using async.callback makes sense when you use only EventMachine libraries for IO. Thin will send the response to the client once env['async.callback'] is called with a normal Rack response as its argument.
If the body is an EM::Deferrable, Thin will not close the connection until that deferrable succeeds. A rather well-kept secret: if you want more than just long polling (i.e. to keep the connection open after sending a partial response), you can also return an EM::Deferrable as the body object directly, without having to use throw :async or a status code of -1.
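To make that concrete, a minimal async Rack app for Thin might look like this sketch (the one-second timer just simulates slow work):

# config.ru -- run with: thin start -R config.ru
require 'eventmachine'

class AsyncApp
  RESPONSE = [200, { 'Content-Type' => 'text/plain' }, ['done']].freeze

  def call(env)
    # schedule the real response for later, off the request cycle
    EM.add_timer(1) { env['async.callback'].call(RESPONSE) }
    # tell Thin the response will be delivered asynchronously
    throw :async
  end
end

run AsyncApp.new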
Q3. You're guessing correctly. Threaded serving might improve performance under load for an otherwise unchanged Rack application. I see a 20% improvement for simple Sinatra applications on my machine with Ruby 1.9.3, and even more when running on Rubinius or JRuby, where all cores can be utilized. The second approach is useful if you write your application in an evented manner.
You can throw a lot of magic and hacks on top of Rack to have a non-evented application make use of those mechanisms (see em-synchrony or sinatra-synchrony), but that will leave you in debugging and dependency hell.
The async approach makes real sense with applications that tend to be best solved with an evented approach, like a web chat. However, I would not recommend using the threaded approach for implementing long-polling, because every polling connection will block a thread. This will leave you with either a ton of threads or connections you can't deal with. EM's thread pool has a size of 20 threads by default, limiting you to 20 waiting connections per process.
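That default is tunable, if memory serves, via EventMachine's threadpool_size setting:

# must be set before the pool is first used, e.g. at boot
EM.threadpool_size = 100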
You could use a server that creates a new thread for every incoming connection, but creating threads is expensive (except on MacRuby, but I would not use MacRuby in any production app). Examples are serv and net-http-server. Ideally, what you want is an n:m mapping of requests and threads. But there's no server out there offering that.
If you want to learn more on the topic: I gave a presentation about this at Rocky Mountain Ruby (and a ton of other conferences). A video recording can be found on confreaks.
