Ruby and Celluloid

Due to some limitations I want to switch my current project from EventMachine/EM-Synchrony to Celluloid, but I'm having some trouble getting started with it. The project I am working on is a web harvester that should crawl tons of pages as fast as possible.
To get a basic understanding of Celluloid I've generated 10,000 dummy pages on a local web server and want to crawl them with this simple Celluloid snippet:
#!/usr/bin/env jruby --1.9
require 'celluloid'
require 'open-uri'

IDS = 1..9999
BASE_URL = "http://192.168.0.20/files"

class Crawler
  include Celluloid

  def read(id)
    url = "#{BASE_URL}/#{id}"
    puts "URL: " + url
    open(url) { |x| x.read }
  end
end

pool = Crawler.pool(size: 100)

IDS.to_a.map do |id|
  pool.future(:read, id)
end
As far as I understand Celluloid, futures are the way to go to get the response of a fired request (comparable to callbacks in EventMachine), right? The other thing is that every actor runs in its own thread, so I need some way of batching the requests, because 10,000 threads would result in errors on my OS X dev machine.
So creating a pool is the way to go, right? BUT: the code above iterates over the 9999 URLs, yet only about 1300 HTTP requests reach the web server. So something goes wrong with limiting the requests and iterating over all the URLs.

Your program is likely exiting as soon as all of your futures are created. With Celluloid a future will start executing, but you can't be sure it has finished until you call #value on the future object. This holds true for futures in pools as well. What you probably need to do is change the code to something like this:
crawlers = IDS.to_a.map do |id|
  begin
    pool.future(:read, id)
  rescue DeadActorError, MailboxError
  end
end

crawlers.compact.each { |crawler| crawler.value rescue nil }
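As a minimal illustration of those semantics (a sketch reusing the pool and :read method from the snippet above), scheduling via future only starts the work, while #value is what blocks until it has finished:
future = pool.future(:read, 1)  # the actor starts fetching asynchronously
body   = future.value           # blocks the caller until the read has completed
puts body.size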


How to connect to multiple WebSockets with Ruby?

Using faye-websocket and EventMachine the code looks very similar to faye-websocket's client example:
require 'faye/websocket'
require 'eventmachine'

def setup_socket(url)
  EM.run {
    ws = Faye::WebSocket::Client.new(url)
    ws.on :open do ... end
    ws.on :message do ... end
    ws.on :close do ... end
  }
end
I'd like to have multiple connections open in parallel. I can't simply call setup_socket multiple times, as execution never leaves the EM.run clause. I've tried running setup_socket multiple times in separate threads:
urls.each do |url|
  Thread.new { setup_socket(url) }
end
But it doesn't seem to do anything, as the puts statements never reach the output.
I'm not restricted to faye-websocket, but it seems to be the library most people use. If possible I'd like to avoid multithreading. I'd also not like to lose the possibility of making changes (e.g. adding a new websocket) over time. Therefore moving the iteration over the URLs inside the EM.run clause is not desired; starting multiple EMs would be more beneficial instead. I found an example for starting multiple servers via EM in a very clean way. I'm looking for something similar.
How can I connect to multiple WebSockets at the same time?
Here's one way to do it.
First, you have to accept that the EM thread needs to be running. Without this thread you won't be able to process any current connections. So you just can't get around that.
Then, in order to add new URLs to the EM thread, you need some way to communicate from the main thread to the EM thread so you can tell it to launch a new connection. This can be done with EventMachine::Channel.
So what we can build now is something like this:
@channel = EventMachine::Channel.new

Thread.new {
  EventMachine.run {
    @channel.subscribe { |url|
      ws = Faye::WebSocket::Client.new(url)
      # ...
    }
  }
}
Then in the main thread, any time you want to add a new URL to the event loop, you just use this:
def setup_socket(url)
  @channel.push(url)
end
Here's another way to do it... Use Iodine's native websocket support (or the Plezi framework) instead of em-websocket...
...I'm biased (I'm the author), but I think they make it a lot easier. Also, Plezi offers automatic scaling with Redis, so it's easy to grow.
Here's an example using Plezi, where each Controller acts like a channel with its own URL and Websocket callback (although I think Plezi's Auto-Dispatch is easier than the lower-level on_message callback). This code can be placed in a config.ru file:
require 'plezi'

# One controller / channel for all members of the "Red" group
class RedGroup
  def index # HTTP index for the /red URL
    "return the RedGroup client using `:render`".freeze
  end

  # handle websocket messages
  def on_message data
    # in this example, we'll send the data to all the members of the other group.
    BlueGroup.broadcast :handle_message, data
  end

  # This is the method activated by the "broadcast" message
  def handle_message data
    write data # write the data to the client.
  end
end

# the blue group controller / channel
class BlueGroup
  def index # HTTP index for the /blue URL
    "return the BlueGroup client using `:render`".freeze
  end

  # handle websocket messages
  def on_message data
    # in this example, we'll send the data to all the members of the other group.
    RedGroup.broadcast :handle_message, data
  end

  # This is the method activated by the "broadcast" message
  def handle_message data
    write data
  end
end

# the routes
Plezi.route '/red', RedGroup
Plezi.route '/blue', BlueGroup

# Set the Rack application
run Plezi.app
P.S.
I wrote this answer also because em-websocket might fail or hog resources in some cases. I'm not sure about the details, but it was noted both on the websocket-shootout benchmark and the AnyCable Websocket Benchmarks.

How do I loop the restart of a daemon?

I am trying to use Ruby's daemons gem and loop the restart of a daemon that has its own loop. My code looks like this now:
require 'daemons'

while true
  listener = Daemons.call(:force => true) do
    users = accounts.get_updated_user_list
    TweetStream::Client.new.follow(users) do |status|
      puts "#{status.text}"
    end
  end
  sleep(60)
  listener.restart
end
Running this gives me the following error (after 60 seconds):
undefined method `restart' for #<Daemons::Application:0x007fc5b29f5658> (NoMethodError)
So obviously Daemons.call doesn't return a controllable daemon like I thought it does. What do I need to do to set this up correctly? Is a daemon even the right tool here?
I think this is what you're after, although I haven't tested it.
class RestartingUserTracker
  def initialize
    @client = TweetStream::Client.new
  end

  def handle_status(status)
    # do whatever it is you're going to do with the status
  end

  def fetch_users
    accounts.get_updated_user_list
  end

  def restart
    @client.stop_stream
    users = fetch_users
    @client.follow(users) do |status|
      handle_status(status)
    end
  end
end
EM.run do
  client = RestartingUserTracker.new
  client.restart

  EM::PeriodicTimer.new(60) do
    client.restart
  end
end
Here's how it works:
TweetStream uses EventMachine internally, as a way of polling the API forever and handling the responses. I can see why you might have felt stuck, because the normal TweetStream API blocks forever and doesn't give you a way to intervene at any point. However, TweetStream does allow you to set up other things in the same event loop. In your case, a timer. I found the documentation on how to do that here: https://github.com/intridea/tweetstream#removal-of-on_interval-callback
By starting up our own EventMachine reactor, we're able to inject our own code into the reactor as well as use TweetStream. In this case, we're using a simple timer that just restarts the client every 60 seconds.
EventMachine is an implementation of something called the Reactor Pattern. If you want to fully understand and maintain this code, it would serve you well to find some resources about it and gain a full understanding. The reactor pattern is very powerful, but can be difficult to grasp at first.
However, this code should get you started. Also, I'd consider renaming the RestartingUserTracker to something more appropriate.

Ruby Celluloid and resources consumption

I'm new to Celluloid and have some questions about pools and futures. I'm building a simple web crawler (see the example at the bottom). My URLS array contains tens of thousands of URLs, so the example is stripped down to a few hundred.
What I want to do now is batch them into groups of at most 50 requests using futures, get their results, then crawl the next 50 URLs, and so on. The problem I have with this code: I would expect it to spawn at most 50 threads, but in my case it spawns 400 and more. If the input data grows, the snippet crashes because it cannot spawn further threads (OS limits, OS X in my case).
Why are so many threads spawned, and how can I avoid this? I need a fast crawler that uses all the resources the OS provides, but not more than that :) About 2,000 threads seems to be the limit on OS X; anything above that makes the code crash.
#!/usr/bin/env jruby
require 'celluloid'
require 'open-uri'
URLS = ["http://instagr.am/p/Clh2","http://instagr.am/p/EKpI1","http://instagr.am/p/G-PoDSS6zX","http://instagr.am/p/G5YjYMC4MW","http://instagr.am/p/G6sEojDvgy","http://instagr.am/p/G7LGzIjvMp","http://instagr.am/p/G9RQlkQAc9","http://instagr.am/p/HChQX4SMdy","http://instagr.am/p/HDRNwKojXS","http://instagr.am/p/HDjzB-RYMz","http://instagr.am/p/HDkLCGgSjX","http://instagr.am/p/HE2Xgjj0rn","http://instagr.am/p/HE5M9Lp0MC","http://instagr.am/p/HEW5I2RohI","http://instagr.am/p/HEzv41gS6m","http://instagr.am/p/HG2WCVTQwQ","http://instagr.am/p/HG5XWovFFa","http://instagr.am/p/HGwQvEiSmA","http://instagr.am/p/HH0navKTcf","http://instagr.am/p/HH2OzNQIn8","http://instagr.am/p/HH2kTskO2e","http://instagr.am/p/HH3GaNlTbd","http://instagr.am/p/HH3QbejSMF","http://instagr.am/p/HH3S17HnW5","http://instagr.am/p/HH3dQPqYmJ","http://instagr.am/p/HH3egLxVJU","http://instagr.am/p/HH3nVPS1i0","http://instagr.am/p/HH3zdlB3e-","http://instagr.am/p/HH40eevAr2","http://instagr.am/p/HH49zqInZc","http://instagr.am/p/HH4EMQNnpx","http://instagr.am/p/HH4KCKoc-7","http://instagr.am/p/HH4asXlbpp","http://instagr.am/p/HH4yNBydG2","http://instagr.am/p/HH5M5vCCWu","http://instagr.am/p/HH5MXqLQaz","http://instagr.am/p/HH5YeDpw88","http://instagr.am/p/HH5b89nlyH","http://instagr.am/p/HH61z-Fb-R","http://instagr.am/p/HH68sgJDZZ","http://instagr.am/p/HH69Tlt91p","http://instagr.am/p/HH6BwRgqe4","http://instagr.am/p/HH6E6aGS44","http://instagr.am/p/HH6EEYJgSo","http://instagr.am/p/HH6H7htWJo","http://instagr.am/p/HH6hBRzZZD","http://instagr.am/p/HH6xEExaco","http://instagr.am/p/HH6xcVscEg","http://instagr.am/p/HH70aWB1No","http://instagr.am/p/HH73nUMBMI","http://instagr.am/p/HH74ogvrX5","http://instagr.am/p/HH76mRwZnp","http://instagr.am/p/HH77CPmYE0","http://instagr.am/p/HH78hPNnzQ","http://instagr.am/p/HH7ADox4JO","http://instagr.am/p/HH7KFdOeTE","http://instagr.am/p/HH7KJNGDSG","http://instagr.am/p/HH7KJtpxyA","http://instagr.am/p/HH7KjwpM-J","http://instagr.am/p/HH7Q","http://instagr.am/p/HH7QCiqsOX","http://instagr.am/p/HH7R9er-Oq","http://instagr.am/p/HH7SoqgRYB","http://instagr.am/p/HH7YhZGA75","http://instagr.am/p/HH7aHSJd3D","http://instagr.am/p/HH7bPrMLTB","http://instagr.am/p/HH7bQUnKyn","http://instagr.am/p/HH7c2yADVv","http://instagr.am/p/HH7cEXSCTC","http://instagr.am/p/HH7dxAlxr4","http://instagr.am/p/HH7eJTwO8K","http://instagr.am/p/HH7efCKQ-0","http://instagr.am/p/HH7fczIMyr","http://instagr.am/p/HH7gVnBjad","http://instagr.am/p/HH7gYljc-0","http://instagr.am/p/HH7gYpMKH7","http://instagr.am/p/HH7hDMo_Za","http://instagr.am/p/HH7hfhighk","http://instagr.am/p/HH7hpVm92Q","http://instagr.am/p/HH7hssHUyN","http://instagr.am/p/HH7iS0on88","http://instagr.am/p/HH7j6It5zy","http://instagr.am/p/HH7j75jipU","http://instagr.am/p/HH7j76pkjl","http://instagr.am/p/HH7jMlQLaG","http://instagr.am/p/HH7kHaPbBn","http://instagr.am/p/HH7kKZopDN","http://instagr.am/p/HH7lOFkkOV","http://instagr.am/p/HH7lQtstCP","http://instagr.am/p/HH7l_Aurfu","http://instagr.am/p/HH7m0JDpOC","http://instagr.am/p/HH7m2POzdu","http://instagr.am/p/HH7mHPL0cI","http://instagr.am/p/HH7mL2BdJL","http://instagr.am/p/HH7mN3snzl","http://instagr.am/p/HH7mXJEJIt","http://instagr.am/p/HH7mZAKfdo","http://instagr.am/p/HH7mbxmSnp","http://instagr.am/p/HH7mkHIRM2","http://instagr.am/p/HH7ml5CcLM","http://instagr.am/p/HH7mnxEAJ8","http://instagr.am/p/HH7mqFk38v","http://instagr.am/p/HH7mqtAaOP","http://instagr.am/p/HH7mytlLQm","http://instagr.am/p/HH7n29K0Q1","http://instagr.am/p/HH7naXyW_g","http://instagr.am/p/HH7ncNPJOX","http://instagr.am/p/HH7ndmC0DH",
"http://instagr.am/p/HH7nifiLCI","http://instagr.am/p/HH7rWttci5","http://instagr.am/p/HH8--LwWs_","http://instagr.am/p/HH8-0DkaPE","http://instagr.am/p/HH8-2CLQEV","http://instagr.am/p/HH8-4gSIJo","http://instagr.am/p/HH8-4liH8g","http://instagr.am/p/HH8-5TCi2b","http://instagr.am/p/HH8-6AKI4j","http://instagr.am/p/HH8-8MtC6l","http://instagr.am/p/HH8-A-gpce","http://instagr.am/p/HH8-A-pXLv","http://instagr.am/p/HH8-BEFQb6","http://instagr.am/p/HH8-C9IxAs","http://instagr.am/p/HH8-CMRIT9","http://instagr.am/p/HH8-DMiDM3","http://instagr.am/p/HH8-Dwg_5V","http://instagr.am/p/HH8-DyHmmX","http://instagr.am/p/HH8-IEnIBo","http://instagr.am/p/HH8-KBCg0f","http://instagr.am/p/HH8-Kbm9Jb","http://instagr.am/p/HH8-LHryjV","http://instagr.am/p/HH8-LIKIXR","http://instagr.am/p/HH8-MdpM-m","http://instagr.am/p/HH8-N9pzfv","http://instagr.am/p/HH8-NbqDLG","http://instagr.am/p/HH8-NwoEwm","http://instagr.am/p/HH8-ODsfzo","http://instagr.am/p/HH8-OHE0p8","http://instagr.am/p/HH8-QFmasl","http://instagr.am/p/HH8-QaA7Rb","http://instagr.am/p/HH8-R-poCB","http://instagr.am/p/HH8-S5PDIy","http://instagr.am/p/HH8-SqHrOY","http://instagr.am/p/HH8-SzPREN","http://instagr.am/p/HH8-U1r5VK","http://instagr.am/p/HH8-UjEeXv","http://instagr.am/p/HH8-VaRadH","http://instagr.am/p/HH8-WFIPij","http://instagr.am/p/HH8-WHRwHP","http://instagr.am/p/HH8-X-SkFA","http://instagr.am/p/HH8-a5icLX","http://instagr.am/p/HH8-aSRpdn","http://instagr.am/p/HH8-aTm5g8","http://instagr.am/p/HH8-aatV6Q","http://instagr.am/p/HH8-azAmc5","http://instagr.am/p/HH8-bcLP_v","http://instagr.am/p/HH8-dGrMku","http://instagr.am/p/HH8-dKABGr","http://instagr.am/p/HH8-eFTTJ8","http://instagr.am/p/HH8-eLRwvK","http://instagr.am/p/HH8-ehmwGz","http://instagr.am/p/HH8-h-D72a","http://instagr.am/p/HH8-hhmEOT","http://instagr.am/p/HH8-ibSZTj","http://instagr.am/p/HH8-jospUb","http://instagr.am/p/HH8-kMpc2F","http://instagr.am/p/HH8-kNBmGm","http://instagr.am/p/HH8-lArilF","http://instagr.am/p/HH8-lWTDwj","http://instagr.am/p/HH8-mNnqZL","http://instagr.am/p/HH8-n4sGGS","http://instagr.am/p/HH8-n9xHbn","http://instagr.am/p/HH8-pYx3JZ","http://instagr.am/p/HH8-pppok3","http://instagr.am/p/HH8-qoy3LK","http://instagr.am/p/HH8-qvROzb","http://instagr.am/p/HH8-qytoRH","http://instagr.am/p/HH8-rOyW_y","http://instagr.am/p/HH8-s9KXi6","http://instagr.am/p/HH8-sVyS7K","http://instagr.am/p/HH8-sbnQEO","http://instagr.am/p/HH8-txJV-e","http://instagr.am/p/HH8-u0Mewa","http://instagr.am/p/HH8-u1BFJ-","http://instagr.am/p/HH8-uXBu_r","http://instagr.am/p/HH8-ujO2m1","http://instagr.am/p/HH8-v7pm7L","http://instagr.am/p/HH8-vBRADm","http://instagr.am/p/HH8-vkwQNF","http://instagr.am/p/HH8-x5R6u2","http://instagr.am/p/HH8-xArCJB","http://instagr.am/p/HH8-xOxnVQ","http://instagr.am/p/HH8-xrmqCf","http://instagr.am/p/HH8-y4Li29","http://instagr.am/p/HH8-yamwjM","http://instagr.am/p/HH802xDyEm","http://instagr.am/p/HH804Gw-Fe","http://instagr.am/p/HH804hAMqQ","http://instagr.am/p/HH805wBvVI","http://instagr.am/p/HH806SguSx","http://instagr.am/p/HH806rEtcY","http://instagr.am/p/HH809ClkbW","http://instagr.am/p/HH809kPN-5","http://instagr.am/p/HH80Cxst8p","http://instagr.am/p/HH80E3Ibo0","http://instagr.am/p/HH80ELOZpk","http://instagr.am/p/HH80EVFFIz","http://instagr.am/p/HH80FngJs0","http://instagr.am/p/HH80M0kiBG","http://instagr.am/p/HH80cKKQ_E","http://instagr.am/p/HH80gaBUzQ","http://instagr.am/p/HH80lSDT71","http://instagr.am/p/HH80mYOHwX","http://instagr.am/p/HH80nfAYsL","http://instagr.am/p/HH80pUNIO2","http://instagr.am/p/HH80sxRLtt","http://instagr.am/p/HH8
0vbDjj0","http://instagr.am/p/HH80w7xI-m","http://instagr.am/p/HH80wDHTN4","http://instagr.am/p/HH81-5RjEB","http://instagr.am/p/HH811fo-_e","http://instagr.am/p/HH813tkiVZ","http://instagr.am/p/HH813vkGMo","http://instagr.am/p/HH814RDHuG","http://instagr.am/p/HH814TOYiW","http://instagr.am/p/HH8179vxAg","http://instagr.am/p/HH81AwC6db","http://instagr.am/p/HH81BGyWUr","http://instagr.am/p/HH81FoFjxm","http://instagr.am/p/HH81H-IH_i","http://instagr.am/p/HH81MnoSaI","http://instagr.am/p/HH81MtN3bH","http://instagr.am/p/HH81O1Cfe7","http://instagr.am/p/HH81RprFKO","http://instagr.am/p/HH81Z2pq3V","http://instagr.am/p/HH81aCPRem","http://instagr.am/p/HH81aVTWZm","http://instagr.am/p/HH81bBo8cM","http://instagr.am/p/HH81k2xVJ4","http://instagr.am/p/HH81kERlbh","http://instagr.am/p/HH81vqHC0M","http://instagr.am/p/HH81vqyti3","http://instagr.am/p/HH81wbS-cj","http://instagr.am/p/HH81xfEjvZ","http://instagr.am/p/HH81zsrbsz","http://instagr.am/p/HH823tDEIP","http://instagr.am/p/HH823ytt2P","http://instagr.am/p/HH825MgnYc","http://instagr.am/p/HH827QrTPF","http://instagr.am/p/HH82AWzhzS","http://instagr.am/p/HH82EGE05q","http://instagr.am/p/HH82FDu8Mf","http://instagr.am/p/HH82HTmdze","http://instagr.am/p/HH82L-iG-U","http://instagr.am/p/HH82NpFsn7","http://instagr.am/p/HH82YTOqEF","http://instagr.am/p/HH82bpEdvj","http://instagr.am/p/HH82cShmmV","http://instagr.am/p/HH82czP-SU","http://instagr.am/p/HH82h9LhYy","http://instagr.am/p/HH82iizf4G","http://instagr.am/p/HH82jUw184","http://instagr.am/p/HH82mrnPeW","http://instagr.am/p/HH82t9u8Mg","http://instagr.am/p/HH82tPH1El","http://instagr.am/p/HH82wzhczs","http://instagr.am/p/HH82zzjj7W","http://instagr.am/p/HH83-3oaAb","http://instagr.am/p/HH83-AlcOq","http://instagr.am/p/HH8302rtlY","http://instagr.am/p/HH833ty-ck","http://instagr.am/p/HH834lswSl","http://instagr.am/p/HH835DFp5j","http://instagr.am/p/HH835FKCBP","http://instagr.am/p/HH835UmKXt","http://instagr.am/p/HH835qnQot","http://instagr.am/p/HH8383zIXz","http://instagr.am/p/HH8384ROzS","http://instagr.am/p/HH83AMP4a0","http://instagr.am/p/HH83B5B1Nt","http://instagr.am/p/HH83CqkA0O","http://instagr.am/p/HH83DpMRPq","http://instagr.am/p/HH83EjPNA_","http://instagr.am/p/HH83Frqolx","http://instagr.am/p/HH83KmM8EC","http://instagr.am/p/HH83RJuxBF","http://instagr.am/p/HH83WCuGEA","http://instagr.am/p/HH83XtGGIV","http://instagr.am/p/HH83ZKNcTS","http://instagr.am/p/HH83aNohKe","http://instagr.am/p/HH83bCudp9","http://instagr.am/p/HH83f0vFsx","http://instagr.am/p/HH83gsmWCm","http://instagr.am/p/HH83gyJWp5","http://instagr.am/p/HH83k0h0C3","http://instagr.am/p/HH83nDlyBo","http://instagr.am/p/HH83nSlA26","http://instagr.am/p/HH83nfnS7m","http://instagr.am/p/HH83puJ0UJ","http://instagr.am/p/HH83qGPaXH","http://instagr.am/p/HH83r9D_FK","http://instagr.am/p/HH83uAFKtr","http://instagr.am/p/HH83uJxZeV","http://instagr.am/p/HH83vcTWsX","http://instagr.am/p/HH83xtmDSU","http://instagr.am/p/HH841GGzT3","http://instagr.am/p/HH841UMarm","http://instagr.am/p/HH841VgcD4","http://instagr.am/p/HH8429HDTT","http://instagr.am/p/HH842SMBUn","http://instagr.am/p/HH842cRA6V","http://instagr.am/p/HH842nNboH","http://instagr.am/p/HH844ISVI_","http://instagr.am/p/HH844QPBbt","http://instagr.am/p/HH8460RADl","http://instagr.am/p/HH846VkDLB","http://instagr.am/p/HH846jSV9B","http://instagr.am/p/HH847YpeiM","http://instagr.am/p/HH848JoFPh","http://instagr.am/p/HH849dRQnD","http://instagr.am/p/HH84EBB-rW","http://instagr.am/p/HH84GXHQEN","http://instagr.am/p/HH84IOO6Hd","http://instagr.am/p/HH84K7vdZp","http://instagr
.am/p/HH84O1vefu","http://instagr.am/p/HH84O2hj7y","http://instagr.am/p/HH84OALIqP","http://instagr.am/p/HH84PVk-tn","http://instagr.am/p/HH84RquusO","http://instagr.am/p/HH84TnhJKv","http://instagr.am/p/HH84WQH1En","http://instagr.am/p/HH84XPiGqI","http://instagr.am/p/HH84YLH5ty","http://instagr.am/p/HH84YpLGfC","http://instagr.am/p/HH84Ywvdk6","http://instagr.am/p/HH84ZdzhTA","http://instagr.am/p/HH84afzC-V","http://instagr.am/p/HH84ctJ5s1","http://instagr.am/p/HH84dTHX9F","http://instagr.am/p/HH84fXPKi5","http://instagr.am/p/HH84fhto0L","http://instagr.am/p/HH84geJyhL","http://instagr.am/p/HH84hUpz82","http://instagr.am/p/HH84iYKYQp","http://instagr.am/p/HH84kFDSyv","http://instagr.am/p/HH84nNH_1J","http://instagr.am/p/HH84o1D3Um","http://instagr.am/p/HH84ohtzcL","http://instagr.am/p/HH84pNDJcd","http://instagr.am/p/HH84pOH6TN","http://instagr.am/p/HH84pXMYZd","http://instagr.am/p/HH84qkJ0i3","http://instagr.am/p/HH84sTvixj","http://instagr.am/p/HH84tan8wH","http://instagr.am/p/HH84w1gm7Z","http://instagr.am/p/HH84yNv-z-","http://instagr.am/p/HH84zAoMEl","http://instagr.am/p/HH85-0RTj8","http://instagr.am/p/HH850YgA3T","http://instagr.am/p/HH850pPNBB","http://instagr.am/p/HH850tOWXm","http://instagr.am/p/HH851nnMar","http://instagr.am/p/HH851yhV8o","http://instagr.am/p/HH852bqPAx","http://instagr.am/p/HH852nDatV","http://instagr.am/p/HH852pxXn5","http://instagr.am/p/HH853TsOYx","http://instagr.am/p/HH854_ob--","http://instagr.am/p/HH854kL_yC","http://instagr.am/p/HH8563jp99","http://instagr.am/p/HH856HhpBi","http://instagr.am/p/HH857CEjxZ","http://instagr.am/p/HH857URkql","http://instagr.am/p/HH857UqVCN","http://instagr.am/p/HH8580SWLd","http://instagr.am/p/HH858wITqb","http://instagr.am/p/HH85AXKxP5","http://instagr.am/p/HH85CIL_yB","http://instagr.am/p/HH85CKCp4U","http://instagr.am/p/HH85DLn-09","http://instagr.am/p/HH85Dnljqy","http://instagr.am/p/HH85E0Jcj3","http://instagr.am/p/HH85EKR9fm","http://instagr.am/p/HH85EgBaHm","http://instagr.am/p/HH85ElD4b_","http://instagr.am/p/HH85HBm9f4","http://instagr.am/p/HH85HFrCl3","http://instagr.am/p/HH85JYunBd","http://instagr.am/p/HH85LVoMhr","http://instagr.am/p/HH85LWCbeC","http://instagr.am/p/HH85MKFbQt","http://instagr.am/p/HH85NJv80J","http://instagr.am/p/HH85NUvTvk","http://instagr.am/p/HH85NyufqK","http://instagr.am/p/HH85PZOR6d","http://instagr.am/p/HH85Q2M2uh","http://instagr.am/p/HH85T2Ofcs","http://instagr.am/p/HH85VUKVTZ","http://instagr.am/p/HH85VVKoly","http://instagr.am/p/HH85VdK6R1","http://instagr.am/p/HH85Vfmn0-","http://instagr.am/p/HH85VxIOrP","http://instagr.am/p/HH85WoR6Ls","http://instagr.am/p/HH85Ztrf-m","http://instagr.am/p/HH85aLrxjq","http://instagr.am/p/HH85bOR6u0","http://instagr.am/p/HH85cZLXr6","http://instagr.am/p/HH85ckD-JY","http://instagr.am/p/HH85d6JlSW","http://instagr.am/p/HH85dUwcKY","http://instagr.am/p/HH85fUuT6W","http://instagr.am/p/HH85fiAaOe","http://instagr.am/p/HH85gMJBEP","http://instagr.am/p/HH85gVFvEt","http://instagr.am/p/HH85hIveqD","http://instagr.am/p/HH85hZAKiO","http://instagr.am/p/HH85i8CyMs","http://instagr.am/p/HH85jQhUo7","http://instagr.am/p/HH85kBSD2v","http://instagr.am/p/HH85lsFy6K","http://instagr.am/p/HH85mLnJky","http://instagr.am/p/HH85oVROzU","http://instagr.am/p/HH85p9ylU4","http://instagr.am/p/HH85qENCWv","http://instagr.am/p/HH85qhNMcj","http://instagr.am/p/HH85s3N7vl","http://instagr.am/p/HH85sJSPSe","http://instagr.am/p/HH85tStObP","http://instagr.am/p/HH85txnKmv","http://instagr.am/p/HH85uAvobc","http://instagr.am/p/HH85x8rlE0","http://instagr.am/p/HH85xZuNO8","http:
//instagr.am/p/HH85yYQwXj","http://instagr.am/p/HH85z3wB35","http://instagr.am/p/HH86-kGL7D","http://instagr.am/p/HH860TiUwo","http://instagr.am/p/HH861PIkO2","http://instagr.am/p/HH862smvKU","http://instagr.am/p/HH863NxVJ5","http://instagr.am/p/HH863Zw3Z2","http://instagr.am/p/HH865_H-sk","http://instagr.am/p/HH865nxSRn","http://instagr.am/p/HH868cli0X","http://instagr.am/p/HH86AKRtq-","http://instagr.am/p/HH86CjRQTU","http://instagr.am/p/HH86D8sx0Q","http://instagr.am/p/HH86DqzgPd","http://instagr.am/p/HH86EkSO83","http://instagr.am/p/HH86FzO4D_","http://instagr.am/p/HH86G8MkDx","http://instagr.am/p/HH86GGwXA0","http://instagr.am/p/HH86Hwq8Nu","http://instagr.am/p/HH86HzG4HB","http://instagr.am/p/HH86HzJtHN","http://instagr.am/p/HH86IMLZYc","http://instagr.am/p/HH86IXyAMV","http://instagr.am/p/HH86IwQ8Qk","http://instagr.am/p/HH86KXQuR6","http://instagr.am/p/HH86LTkJJE","http://instagr.am/p/HH86NrLH5g","http://instagr.am/p/HH86SWRNOm","http://instagr.am/p/HH86TJI64M","http://instagr.am/p/HH86TtpBEQ","http://instagr.am/p/HH86UYKOhh","http://instagr.am/p/HH86VGlnpW","http://instagr.am/p/HH86ZfpEEw","http://instagr.am/p/HH86_NPrK2","http://instagr.am/p/HH86afDy4x","http://instagr.am/p/HH86afyFIQ","http://instagr.am/p/HH86bGLfNe","http://instagr.am/p/HH86btlKts","http://instagr.am/p/HH86c9u8Wq","http://instagr.am/p/HH86cSR3_L","http://instagr.am/p/HH86fEHdLt","http://instagr.am/p/HH86g5m-xm","http://instagr.am/p/HH86gKCqXT","http://instagr.am/p/HH86gQGfPG","http://instagr.am/p/HH86hlSkE9","http://instagr.am/p/HH86hzhSep","http://instagr.am/p/HH86inFZGd","http://instagr.am/p/HH86ipACWp","http://instagr.am/p/HH86jXw0gP","http://instagr.am/p/HH86l3lWdw","http://instagr.am/p/HH86lBIcoK","http://instagr.am/p/HH86lbyhFP","http://instagr.am/p/HH86maiG-V","http://instagr.am/p/HH86mgqMW3","http://instagr.am/p/HH86mqFBiB","http://instagr.am/p/HH86mwm5g5","http://instagr.am/p/HH86oeJWA2","http://instagr.am/p/HH86ofFMPU","http://instagr.am/p/HH86ohOsAZ","http://instagr.am/p/HH86olHu0A","http://instagr.am/p/HH86otvR1t","http://instagr.am/p/HH86pHTMXp","http://instagr.am/p/HH86qENjaU","http://instagr.am/p/HH86rtCd4L","http://instagr.am/p/HH86s0N3Ay","http://instagr.am/p/HH86sDPkW6","http://instagr.am/p/HH86tLMj5x"]
class Crawler
  include Celluloid

  def fetch(id)
    uri = URI("http://api.instagram.com/oembed?url=#{id}")
    req = open(uri).read
  end
end
URLS.each_slice(50).map do |idset|
  pool = Crawler.pool(size: 50)
  crawlers = idset.to_a.map do |id|
    begin
      pool.future(:fetch, id)
    rescue
      nil
    end
  end
  crawlers.compact.each do |resp|
    puts resp.value.size rescue nil
  end
end
Split the class. The wiki says never to create a pool of a worker from inside that worker itself:
https://github.com/celluloid/celluloid/wiki/Pools
Gotcha: Don't make pools inside workers!
Using MyWorker.pool within MyWorker will result in an unbounded explosion of worker threads.
Update
If you want to limit your pool, just create it outside the each_slice block so that you always reuse the same threads, I guess.
pool = Crawler.pool(size: 50)

URLS.each_slice(50).map do |idset|
  crawlers = idset.to_a.map do |id|
    begin
      pool.future(:fetch, id)
    rescue
      nil
    end
  end
  # ...
Each iteration through the slice of 50 resets the value of pool, which likely dereferences your pool manager. Since actors aren't garbage collected just by being dereferenced (you have to call #terminate), you're probably piling up your old pools. It should be fine to make just one pool and create all your futures at once (if you keep the return value small, the future object itself is small). If you do find that you have to slice, instantiate your pool outside the each_slice and it will keep using the same pool instead of making a new one each time around. If for some other reason you want a fresh pool each time, call terminate on the pool before you dereference it. Also make sure you're working with Celluloid 0.12.0+, as it fixes an issue where pool workers weren't being terminated when the pool was.
When I iterate around actors, I've found this bit of logging to be useful to be sure I don't have any actor leaks:
logger.info "Actors left: #{Celluloid::Actor.all.to_set.length} Alive: #{Celluloid::Actor.all.to_set.reject { |a| a.nil? || !a.alive? }.length}"
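For instance, a sketch of the terminate-before-dereferencing variant (using the same Crawler class and 50-URL slices from the question, with error handling omitted) could look like this:
URLS.each_slice(50) do |idset|
  pool    = Crawler.pool(size: 50)
  futures = idset.map { |id| pool.future(:fetch, id) }
  futures.each { |f| puts f.value.size rescue nil }
  pool.terminate   # shut the pool down explicitly before the reference is dropped
end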

what is the advantage of EventMachine

This is my test case. I found that EM is not faster than a plain TCP server.
The EM server:
require 'rubygems'
require 'benchmark'
require 'eventmachine'

class Handler < EventMachine::Connection
  def receive_data(data)
    operation = proc do
      # simulate a long running request
      a = []
      n = 5000
      for i in 1..n
        a << rand(n)
        a.sort!
      end
    end

    # Callback block to execute once the request is fulfilled
    callback = proc do |res|
      send_data "send_response\n"
    end

    puts data
    EM.defer(operation, callback)
  end
end

EventMachine::run {
  EventMachine.epoll
  EventMachine::start_server("0.0.0.0", 8080, Handler)
  puts "Listening..."
}
and my benchmark test:
require 'rubygems'
require 'benchmark'
require 'socket'

Benchmark.bm do |x|
  x.report("times:") do
    for i in 1..20
      TCPSocket.open "127.0.0.1", 8080 do |s|
        s.send "#{i}th sending\n", 0
        if line = s.gets
          puts line
        end
        puts "#{i}th sending"
      end
    end
  end
end
Simplicity compared to threads, not speed. Look here for more insights: EventMachine: Fast and Scalable Event-Driven I/O Framework
The citation that applies to your question:
A lot has been written about the fact that event-driven programs are not theoretically any faster than threaded ones, and that is true. But in practice, I think the event-driven model is easier to work with, if you want to get to extremely high scalability and performance while still ensuring maximum robustness. I write programs that have to run for months or years without crashing, leaking memory, or exhibiting any kind of lumpy performance, so in practice, event-driven programming works better. Now, here's the problem with event-driven programming: you have to write "backwards." A threaded model stores your program state (inefficiently) in local variables on a runtime stack. In EM you have to do that yourself, which is very unintuitive to programmers who are used to threads. This is why I'm interested in fibers, because it opens the possibility of writing what looks to the programmer like blocking I/O, but still is evented and uses no threads.
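The fiber-based style the quote alludes to is what EM-Synchrony provides. A minimal sketch (assuming the em-synchrony and em-http-request gems; the URL is just a placeholder) reads like blocking I/O but still runs inside the reactor:
require 'em-synchrony'
require 'em-synchrony/em-http'

EM.synchrony do
  # looks synchronous, but the fiber yields to the reactor while the request is in flight
  page = EM::HttpRequest.new('http://www.example.com/').get
  puts page.response.length
  EM.stop
end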
We just went through this exercise on our project yesterday. Conceptual hurdles abound.
Have a look at this demo Rails app by Ilya Grigorik. He uses Apache Bench to hit the server concurrently, as if you were getting traffic from multiple visitors to your site. This is where you get an advantage from EventMachine: instead of having all the calls to the database line up behind each other, they are sent asynchronously, and the results are dramatic. If you install the demo you can see the difference by replacing the em_mysql2 adapter (fast) with the mysql2 adapter (slow) in database.yml.
Likewise, if you hit EventMachine in a loop, you are constrained by the synchronous nature of the loop itself (slow).
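To illustrate the point: the benchmark in the question sends its 20 requests strictly one after another, so it only measures sequential latency. A rough sketch of a concurrent client (plain Ruby threads against the same host and port as the original test) would load the EM server the way multiple visitors would:
require 'socket'
require 'benchmark'

puts Benchmark.measure {
  threads = (1..20).map do |i|
    Thread.new do
      TCPSocket.open("127.0.0.1", 8080) do |s|
        s.send "#{i}th sending\n", 0
        s.gets   # wait for the server's reply
      end
    end
  end
  threads.each(&:join)
}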
One thing - you should call EM.epoll before entering the event loop with EM.run instead of inside it.
EventMachine.epoll

EventMachine::run {
  EventMachine::start_server("0.0.0.0", 8080, Handler)
  puts "Listening..."
}

EventMachine: What is the maximum of parallel HTTP requests EM can handle?

I'm building a distributed web crawler and trying to get the maximum out of the resources of each single machine. I run parsing functions in EventMachine through an Iterator and use em-http-request to make asynchronous HTTP requests. For now I have 100 iterations running at the same time, and it seems I can't get past this level; increasing the number of iterations doesn't affect the crawling speed. However, I only get 10-15% CPU load and 20-30% network load, so there's plenty of room to crawl faster.
I'm using Ruby 1.9.2. Is there any way to improve the code to use resources effectively, or am I maybe doing it altogether wrong?
def start_job_crawl
  @redis.lpop @queue do |link|
    if link.nil?
      EventMachine::add_timer( 1 ){ start_job_crawl() }
    else
      # parsing link, using asynchronous http request,
      # doing something with the content
      parse(link)
    end
  end
end
# main reactor loop
EM.run {
  EM.kqueue

  @redis = EM::Protocols::Redis.connect(:host => "127.0.0.1")
  @redis.errback do |code|
    puts "Redis error: #{code}"
  end

  # 100 parallel 'threads'. Want to increase this
  EM::Iterator.new(0..99, 100).each do |num, iter|
    start_job_crawl()
  end
}
If you are using select() (which is the default for EM), the maximum is 1024 because select() is limited to 1024 file descriptors.
However, it seems you are using kqueue, so it should be able to handle much more than 1024 file descriptors at once.
What is the value of your EM.threadpool_size?
Try enlarging it; I suspect the limit is not in kqueue but in the thread pool handling the requests...
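For example, a minimal sketch of that suggestion (the value 100 is just an illustration, not a recommendation) raises the pool size before the reactor starts:
require 'eventmachine'

EM.threadpool_size = 100   # default is 20; set it before (or early in) EM.run

EM.run {
  EM.kqueue
  # ... connect to Redis and start the iterator as in the question ...
}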
