How to connect to multiple WebSockets with Ruby? - ruby

Using faye-websocket and EventMachine the code looks very similar to faye-websocket's client example:
require 'faye/websocket'
require 'eventmachine'
def setup_socket(url)
EM.run {
ws = Faye::WebSocket::Client.new(url)
ws.on :open do ... end
ws.on :message do ... end
ws.on :close do ... end
}
end
I'd like to have multiple connections open parallely. I can't simply call setup_socket multiple times as the execution won't exit the EM.run clause. I've tried to run setup_socket multiple times in separate threads as:
urls.each do |url|
Thread.new { setup_socket(url) }
end
But it doesn't seem to do anyhting as the puts statements don't reach the output.
I'm not restricted to use faye-websocket but it seemed most people use this library. If possible I'd like to avoid multithreading. I'd also not like to lose the possiblity to make changes (e.g. add a new websocket) over time. Therefore moving the iteration of URLs inside the EM.run clause is not desired but instead starting multiple EMs would be more beneficial. I found an example for starting multiple servers via EM in a very clean way. I'm looking for something similar.
How can I connect to multiple WebSockets at the same time?

Here's one way to do it.
First, you have to accept that the EM thread needs to be running. Without this thread you won't be able to process any current connections. So you just can't get around that.
Then, in order to add new URLs to the EM thread you then need some way to communicate from the main thread to the EM thread, so you can tell it to launch a new connection. This can be done with EventMachine::Channel.
So what we can build now is something like this:
#channel = EventMachine::Channel.new
Thread.new {
EventMachine.run {
#channel.subscribe { |url|
ws = Faye::...new(url)
...
}
}
}
Then in the main thread, any time you want to add a new URL to the event loop, you just use this:
def setup_socket(url)
#channel.push(url)
end

Here's another way to do it... Use Iodine's native websocket support (or the Plezi framework) instead of em-websocket...
...I'm biased (I'm the author), but I think they make it a lot easier. Also, Plezi offers automatic scaling with Redis, so it's easy to grow.
Here's an example using Plezi, where each Controller acts like a channel, with it's own URL and Websocket callback (although I think Plezi's Auto-Dispatch is easier than the lower level on_message callback). This code can be placed in a config.ru file:
require 'plezi'
# Once controller / channel for all members of the "Red" group
class RedGroup
def index # HTTP index for the /red URL
"return the RedGroup client using `:render`".freeze
end
# handle websocket messages
def on_message data
# in this example, we'll send the data to all the members of the other group.
BlueGroup.broadcast :handle_message, data
end
# This is the method activated by the "broadcast" message
def handle_message data
write data # write the data to the client.
end
end
# the blue group controller / channel
class BlueGroup
def index # HTTP index for the /blue URL
"return the BlueGroup client using `:render`".freeze
end
# handle websocket messages
def on_message data
# in this example, we'll send the data to all the members of the other group.
RedGroup.broadcast :handle_message, data
end
# This is the method activated by the "broadcast" message
def handle_message data
write data
end
end
# the routes
Plezi.route '/red', RedGroup
Plezi.route '/blue', BlueGroup
# Set the Rack application
run Plezi.app
P.S.
I wrote this answer also because em-websocket might fail or hog resources in some cases. I'm not sure about the details, but it was noted both on the websocket-shootout benchmark and the AnyCable Websocket Benchmarks.

Related

Single thread still handles concurrency request?

Ruby process is single thread. When we start a single process using thin server, why are we still able to handle concurrency request?
require 'sinatra'
require 'thin'
set :server, %w[thin]
get '/test' do
sleep 2 <----
"success"
end
What is inside thin that can handle concurrency request? If it is due to event-machine framework, the code above is actually a sync code which is not for EM used.
Quoting the chapter: "Non blocking IOs/Reactor pattern" in
http://merbist.com/2011/02/22/concurrency-in-ruby-explained/:
"this is the approach used by Twisted, EventMachine and Node.js. Ruby developers can use EventMachine or
an EventMachine based webserver like Thin as well as EM clients/drivers to make non blocking async calls."
The heart of the matter regard EventMachine.defer
*
used for integrating blocking operations into EventMachine's control flow.
The action of defer is to take the block specified in the first parameter (the "operation")
and schedule it for asynchronous execution on an internal thread pool maintained by EventMachine.
When the operation completes, it will pass the result computed by the block (if any)
back to the EventMachine reactor.
Then, EventMachine calls the block specified in the second parameter to defer (the "callback"),
as part of its normal event handling loop.
The result computed by the operation block is passed as a parameter to the callback.
You may omit the callback parameter if you don't need to execute any code after the operation completes.
*
Essentially, in response to an HTTP request, the server executes that you wrote,
invokes the process method in the Connecction class.
have a look at the code in $GEM_HOME/gems/thin-1.6.2/lib/thin/connection.rb:
# Connection between the server and client.
# This class is instanciated by EventMachine on each new connection
# that is opened.
class Connection < EventMachine::Connection
# Called when all data was received and the request
# is ready to be processed.
def process
if threaded?
#request.threaded = true
EventMachine.defer(method(:pre_process), method(:post_process))
else
#request.threaded = false
post_process(pre_process)
end
end
..here is where a threaded connection invoke EventMachine.defer
The reactor
To see where is activated the EventMachine reactor
should follow the initialization of the program:
Notice that for all Sinatra applications and middleware ($GEM_HOME/gems/sinatra-1.4.5/base.rb)
can run the Sinatra app as a self-hosted server using Thin, Puma, Mongrel, or WEBrick.
def run!(options = {}, &block)
return if running?
set options
handler = detect_rack_handler
....
the method detect_rack_handler returns the first Rack::Handler
return Rack::Handler.get(server_name.to_s)
in our test we require thin therefore it returns a Thin rack handler and setup a threaded server
# Starts the server by running the Rack Handler.
def start_server(handler, server_settings, handler_name)
handler.run(self, server_settings) do |server|
....
server.threaded = settings.threaded if server.respond_to? :threaded=
$GEM_HOME/gems/thin-1.6.2/lib/thin/server.rb
# Start the server and listen for connections.
def start
raise ArgumentError, 'app required' unless #app
log_info "Thin web server (v#{VERSION::STRING} codename #{VERSION::CODENAME})"
...
log_info "Listening on #{#backend}, CTRL+C to stop"
#backend.start { setup_signals if #setup_signals }
end
$GEM_HOME/gems/thin-1.6.2/lib/thin/backends/base.rb
# Start the backend and connect it.
def start
#stopping = false
starter = proc do
connect
yield if block_given?
#running = true
end
# Allow for early run up of eventmachine.
if EventMachine.reactor_running?
starter.call
else
#started_reactor = true
EventMachine.run(&starter)
end
end

How do I loop the restart of a daemon?

I am trying to use Ruby's daemon gem and loop the restart of a daemon that has its own loop. My code looks like this now:
require 'daemons'
while true
listener = Daemons.call(:force => true) do
users = accounts.get_updated_user_list
TweetStream::Client.new.follow(users) do |status|
puts "#{status.text}"
end
end
sleep(60)
listener.restart
end
Running this gives me the following error (after 60 seconds):
undefined method `restart' for #<Daemons::Application:0x007fc5b29f5658> (NoMethodError)
So obviously Daemons.call doesn't return a controllable daemon like I think it does. What do I need to do to set this up correctly. Is a daemon the right tool here?
I think this is what you're after, although I haven't tested it.
class RestartingUserTracker
def initialize
#client = TweetStream::Client.new
end
def handle_status(status)
# do whatever it is you're going to do with the status
end
def fetch_users
accounts.get_updated_user_list
end
def restart
#client.stop_stream
users = fetch_users
#client.follow(users) do |status|
handle_status(status)
end
end
end
EM.run do
client = RestartingUserTracker.new
client.restart
EM::PeriodicTimer.new(60) do
client.restart
end
end
Here's how it works:
TweetStream uses EventMachine internally, as a way of polling the API forever and handling the responses. I can see why you might have felt stuck, because the normal TweetStream API blocks forever and doesn't give you a way to intervene at any point. However, TweetStream does allow you to set up other things in the same event loop. In your case, a timer. I found the documentation on how to do that here: https://github.com/intridea/tweetstream#removal-of-on_interval-callback
By starting up our own EventMachine reactor, we're able to inject our own code into the reactor as well as use TweetStream. In this case, we're using a simple timer that just restarts the client every 60 seconds.
EventMachine is an implementation of something called the Reactor Pattern. If you want to fully understand and maintain this code, it would serve you well to find some resources about it and gain a full understanding. The reactor pattern is very powerful, but can be difficult to grasp at first.
However, this code should get you started. Also, I'd consider renaming the RestartingUserTracker to something more appropriate.

Ruby and Celluloid

Due to some limitations I want to switch my current project from EventMachine/EM-Synchrony to Celluloid but I've some trouble to get in touch with it. The project I am coding on is a web harvester which should crawl tons of pages as fast as possible.
For the basic understanding of Celluloid I've generated 10.000 dummy pages on a local web server and wanna crawl them by this simple Celluloid snippet:
#!/usr/bin/env jruby --1.9
require 'celluloid'
require 'open-uri'
IDS = 1..9999
BASE_URL = "http://192.168.0.20/files"
class Crawler
include Celluloid
def read(id)
url = "#{BASE_URL}/#{id}"
puts "URL: " + url
open(url) { |x| x.read }
end
end
pool = Crawler.pool(size: 100)
IDS.to_a.map do |id|
pool.future(:read, id)
end
As far as I understand Celluloid, futures are the way to go to get the response of a fired request (comparable to callbacks in EventMachine), right? The other thing is, every actor runs in its own thread, so I need some kind of batching the requests cause 10.000 threads would result in errors on my OSX dev machine.
So creating a pool is the way to go, right? BUT: the code above iterates over the 9999 URLs but only 1300 HTTP requests are sent to the web server. So something goes wrong with limiting the requests and iterating over all URLs.
Likely your program is exiting as soon as all of your futures are created. With Celluloid a future will start execution but you can't be assured of it finishing until you call #value on the future object. This holds true for futures in pools as well. Probably what you need to do is change it to something like this:
crawlers = IDS.to_a.map do |id|
begin
pool.future(:read, id)
rescue DeadActorError, MailboxError
end
end
crawlers.compact.each { |crawler| crawler.value rescue nil }

Test Ruby TCPSocket server

I have a small application, serving connections(like a chat). It catches the connection, grabs login from it, then listens to the data and broadcasts it to each connection, except sender.
The problem is i'm not a very advanced tester and do not know, how this can be tested.
# Handle each connection
def serve(io)
io.puts("LOGIN\n")
# Listen for identifier
user = io.gets.chomp
...
# Add connection to the list
#mutex.synchronize { #chatters[user] = io }
# Get and broadcast input until connection returns nil
loop do
incoming = io.gets
broadcast(incoming, io)
end
end
#Send message out to everyone, but sender
def broadcast(message="", sender)
# Mutex for safety - GServer uses threads
#mutex.synchronize do
#chatters.each do |chatter|
socket = chatter[1]
# Do not send to sender
if sock != sender
sock.print(message)
end
end
end
end
If you just want to do unit testing, you could use RSpec mocks (or some other mocking framework) to stub your methods and ensure that the logic works the way you expect. If you actually want to drive an integration test, that's a lot more work, and will require that you create a separate reader and writer for the socket so that you can actually test each piece of the conversation independently for expected behavior.
Someone else has apparently blogged about a similar issue to yours. Perhaps that example will help you.
If your question is more about the test cases you should write, instead of about how to test sockets, then you may want to rewrite your question so that answers will be more on-target.

Ruby Eventmachine queueing problem

I have a Http client written in Ruby that can make synchronous requests to URLs. However, to quickly execute multiple requests I decided to use Eventmachine. The idea is to
queue all the requests and execute them using eventmachine.
class EventMachineBackend
...
...
def execute(request)
$q ||= EM.Queue.new
$q.push(request)
$q.pop {|request| request.invoke}
EM.run{EM.next_tick {EM.stop}}
end
...
end
Forgive my use of a global queue variable. I will refactor it later. Is what I am doing in EventMachineBackend#execute the right way of using Eventmachine queues?
One problem I see in my implementation is it is essentially synchronous. I push a request, pop and execute the request and wait for it to complete.
Could anyone suggest a better implementation.
Your the request logic has to be asynchronous for it to work with EventMachine, I suggest that you use em-http-request. You can find an example on how to use it here, it shows how to run the requests in parallel. An even better interface for running multiple connections in parallel is the MultiRequest class from the same gem.
If you want to queue requests and only run a fixed number of them in parallel you can do something like this:
EM.run do
urls = [...] # regular array with URLs
active_requests = 0
# this routine will be used as callback and will
# be run when each request finishes
when_done = proc do
active_requests -= 1
if urls.empty? && active_requests == 0
# if there are no more urls, and there are no active
# requests it means we're done, so shut down the reactor
EM.stop
elsif !urls.empty?
# if there are more urls launch a new request
launch_next.call
end
end
# this routine launches a request
launch_next = proc do
# get the next url to fetch
url = urls.pop
# launch the request, and register the callback
request = EM::HttpRequest.new(url).get
request.callback(&when_done)
request.errback(&when_done)
# increment the number of active requests, this
# is important since it will tell us when all requests
# are done
active_requests += 1
end
# launch three requests in parallel, each will launch
# a new requests when done, so there will always be
# three requests active at any one time, unless there
# are no more urls to fetch
3.times do
launch_next.call
end
end
Caveat emptor, there may very well be some detail I've missed in the code above.
If you think it's hard to follow the logic in my example, welcome to the world of evented programming. It's really tricky to write readable evented code. It all goes backwards. Sometimes it helps to start reading from the end.
I've assumed that you don't want to add more requests after you've started downloading, it doesn't look like it from the code in your question, but should you want to you can rewrite my code to use an EM::Queue instead of a regular array, and remove the part that does EM.stop, since you will not be stopping. You can probably remove the code that keeps track of the number of active requests too, since that's not relevant. The important part would look something like this:
launch_next = proc do
urls.pop do |url|
request = EM::HttpRequest.new(url).get
request.callback(&launch_next)
request.errback(&launch_next)
end
end
Also, bear in mind that my code doesn't actually do anything with the response. The response will be passed as an argument to the when_done routine (in the first example). I also do the same thing for success and error, which you may not want to do in a real application.

Resources