Building an asynchronous queue in Ruby

I need to process jobs off of a queue within a process, with IO performed asynchronously. That's pretty straightforward. The gotcha is that those jobs can add additional items to the queue.
I think I've been fiddling with this problem too long so my brain is cloudy — it shouldn't be too difficult. I keep coming up with an either-or scenario:
The queue can perform jobs asynchronously and results can be joined in afterward.
The queue can synchronously perform jobs until the last finishes and the queue is empty.
I've been fiddling with everything from EventMachine and Goliath (both of which can use EM::HttpRequest) to Celluloid (never actually got around to building something with it though), and writing Enumerators using Fibers. My brain is fried though.
What I'd like, simply, is to be able to do this:
items = [1,2,3]
items.each do |item|
  if item.has_particular_condition?
    items << item.process_one_way
  elsif item.other_condition?
    items << item.process_another_way
  # ...
  end
end
#=> [1,2,3,4,5,6,7,8,9]
...where 4, 5, and 6 were all results of processing the original items in the set, and 7, 8, and 9 are results from processing 4, 5, and 6. I don't need to worry about indefinitely processing the queue because the data I'm processing will end after a couple of iterations.
High-level guidance, comments, links to other libraries, etc are all welcome, as well as lower-level implementation code examples.

I have had similar requirements in the past and what you need is a solid, high performance work queue from the sounds of it. I recommend you check out beanstalkd which I discovered over a year ago and have since been using to process thousands and thousands of jobs reliably in ruby.
I have since started developing Ruby libraries around beanstalkd. In particular, be sure to check out backburner, a production-ready work queue for Ruby built on beanstalkd. The syntax and setup are easy, defining how jobs are processed is quick, and handling of job failures, retries, job scheduling, and a lot more are built in.
Let me know if you have any questions but I think beanstalkd and backburner would fit your requirements quite well.

I wound up implementing something a little less ideal — basically just wrapping an EM Fiber Iterator in a loop that terminates once no new results are queued.
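require 'set'
class SetRunner
  def initialize(seed_queue)
    @results = seed_queue.to_set
  end
  # Yields each batch of unprocessed items plus the bucket that collects their
  # output, and repeats until a pass adds nothing new to the result set.
  def run
    begin
      yield last_loop_results, result_bucket
    end until new_loop_results.empty?
    return @results
  end
  def last_loop_results
    result_bucket.shift(result_bucket.count)
  end
  def result_bucket
    @result_bucket ||= @results.to_a
  end
  def new_loop_results
    # .add? returns nil if already in the set, so map (not each) is needed
    # for compact to leave only the items that were genuinely new
    result_bucket.map { |item| @results.add? item }.compact
  end
end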
Then, to use it with EventMachine:
queue = [1,2,3]
results = SetRunner.new(queue).run do |set, output|
  EM::Synchrony::FiberIterator.new(set, 3).each do |item|
    output.push(item + 3) if item <= 6
  end
end
# => [1,2,3,4,5,6,7,8,9]
Then each set will get run with the concurrency level passed to the FiberIterator, but the results from each set will get run in the next iteration of the outer SetRunner loop.
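For comparison, the same "jobs may enqueue more jobs" loop can be sketched with only the standard library, using threads instead of EventMachine. This is a rough sketch and not the approach above: `drain` and its `concurrency:` parameter are hypothetical names, the block is assumed to return an array of follow-up items, and items are assumed to be neither nil nor false (a falsy item would stop a worker early).

```ruby
require "set"

# Hypothetical helper: processes `seed` and every follow-up item the block
# returns, using `concurrency` worker threads. Returns all distinct items.
def drain(seed, concurrency: 3, &work)
  queue   = Queue.new
  seen    = Set.new
  lock    = Mutex.new
  pending = 0 # items queued or being processed right now

  enqueue = lambda do |item|
    lock.synchronize do
      next unless seen.add?(item) # add? returns nil for items already seen
      pending += 1
      queue << item
    end
  end

  seed.each { |item| enqueue.call(item) }
  return seen.to_a if pending.zero?

  workers = Array.new(concurrency) do
    Thread.new do
      while (item = queue.pop) # pop returns nil once the queue is closed and empty
        begin
          work.call(item).each { |child| enqueue.call(child) }
        ensure
          # Children are queued before this decrement, so zero means all done.
          finished = lock.synchronize { (pending -= 1).zero? }
          queue.close if finished
        end
      end
    end
  end
  workers.each(&:join)
  seen.to_a
end

results = drain([1, 2, 3]) { |n| n <= 6 ? [n + 3] : [] }
p results.sort # => [1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Unlike the batch-by-batch loop, a follow-up item here can start as soon as a worker is free, without waiting for the rest of its batch to finish.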

Related

How to reschedule in rufus-scheduler?

I'm writing a Telegram bot server in Ruby, and I want to run some code repeatedly. The problem is that the code I want to run repeatedly is dynamic. How can I reschedule it?
I'm not sure I'm answering your question, but it's fairly easy to reuse a block with different schedules.
require 'rufus-scheduler'
s = Rufus::Scheduler.new
job = lambda do
  puts "hello #{Time.now}"
end
s.in('1s', &job)
# later on, rescheduling...
s.in('2s', &job)
s.join # just so that the example doesn't end here
You can also use a Handler and schedule it multiple times: https://github.com/jmettraux/rufus-scheduler#scheduling-handler-classes

Which of Ruby's concurrency devices would be best suited for this scenario?

The whole threads/fibers/processes thing is confusing me a little. I have a practical problem that can be solved with some concurrency, so I thought this was a good opportunity to ask professionals and people more knowledgeable than me about it.
I have a long array, let's say 3,000 items. I want to send an HTTP request for each item in the array.
Actually iterating over the array, generating requests, and sending them is very rapid. What takes time is waiting for each item to be received, processed, and acknowledged by the party I'm sending to. I'm essentially sending 100 bytes, waiting 2 seconds, sending 100 bytes, waiting 2 seconds.
What I would like to do instead is send these requests asynchronously. I want to send a request, specify what to do when I get the response, and in the meantime, send the next request.
From what I can see, there are four concurrency options I could use here.
1. Threads.
2. Fibers.
3. Processes; unsuitable as far as I know because multiple processes accessing the same array isn't feasible/safe.
4. Asynchronous functionality like JavaScript's XMLHttpRequest.
The simplest would seem to be the last one. But what is the best, simplest way to do that using Ruby?
Failing #4, which of the remaining three is the most sensible choice here?
Would any of these options also allow me to say "Have no more than 10 pending requests at any time"?
This is your classic producer/consumer problem and is nicely suited for threads in Ruby. Just create a Queue:
urls = [...] # array with bunches of urls
require "thread"
queue = SizedQueue.new(10) # holds at most 10 responses; the producer blocks while it is full
producer = Thread.new do
  urls.each do |url|
    response = do_http_request(url)
    queue << response
  end
  queue << :done # sentinel: nothing more will be produced
end
consumer = Thread.new do
  loop do
    http_response = queue.pop # blocks until a response is available
    break if http_response == :done
    process(http_response)
  end
end
# wait for the consumer to finish
consumer.join
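Note that the SizedQueue above bounds how many finished responses are waiting for the consumer, not how many requests are in flight. To address the "no more than 10 pending requests" part of the question, one option is a fixed set of worker threads pulling from a shared queue. The sketch below uses only the standard library; `each_with_limit` and its `limit:` parameter are hypothetical names, and the `sleep` stands in for a real HTTP call.

```ruby
# Hypothetical helper: runs the block once per item on a fixed set of worker
# threads, so at most `limit` calls are in flight. Results keep input order.
def each_with_limit(items, limit: 10, &work)
  jobs = Queue.new
  items.each_with_index { |item, index| jobs << [item, index] }
  jobs.close # workers exit once the queue is drained

  results = Array.new(items.size)
  lock    = Mutex.new
  workers = Array.new([limit, items.size].min) do
    Thread.new do
      while (job = jobs.pop) # pop returns nil when the queue is closed and empty
        item, index = job
        value = work.call(item)
        lock.synchronize { results[index] = value }
      end
    end
  end
  workers.each(&:join)
  results
end

# Stand-in for the HTTP round trip; a real block would call an HTTP client.
responses = each_with_limit((1..30).to_a, limit: 10) do |n|
  sleep 0.01
  "response #{n}"
end
puts responses.first # prints "response 1"
```

Because each worker handles one item at a time, the number of worker threads is the concurrency limit.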
Use EventMachine as the event loop and em-synchrony as a Fiber wrapper that turns its callbacks into synchronous-looking code.
Copied from the em-synchrony README:
require "em-synchrony"
require "em-synchrony/em-http"
require "em-synchrony/fiber_iterator"
EM.synchrony do
  concurrency = 2
  urls = ['http://url.1.com', 'http://url2.com']
  results = []
  EM::Synchrony::FiberIterator.new(urls, concurrency).each do |url|
    resp = EventMachine::HttpRequest.new(url).get
    results.push resp.response
  end
  p results # all completed requests
  EventMachine.stop
end
This is an IO-bound case, which suits both of the following:
Threading model: no problem with MRI Ruby here, because threads work well for IO-bound work; the GIL has almost no effect.
Asynchronous model, which has proven (in practice and in theory) to be far superior to threads for IO-specific problems.
For this specific case, and to keep things far simpler, I would go with the Typhoeus HTTP client, whose parallel request support follows the evented (asynchronous) concurrency model.
Example:
require 'typhoeus'
hydra = Typhoeus::Hydra.new
%w(url1 url2 url3).each do |url|
  request = Typhoeus::Request.new(url, followlocation: true)
  request.on_complete do |response|
    # do something with response
  end
  hydra.queue(request)
end
hydra.run # this is a blocking call that returns once all requests are complete

Ruby Celluloid and resources consumption

I'm new to Celluloid and have some questions about pools and futures. I'm building a simple web crawler (see the example at the bottom). My URLS array holds tens of thousands of URLs, so the example is stripped down to a few hundred.
What I want to do now is limit the crawl to at most 50 requests per second using futures: fetch a group of 50, collect the results, then crawl the next 50 URLs, and so on. The problem with this code is that I would expect it to use at most 50 threads, but it spawns 400 and more in my case. As the input data grows, the script aborts because it cannot spawn any more requests (OS limits, OS X in my case).
Why are so many threads spawned, and how can I avoid this? I need a fast crawler that uses all the resources the OS provides, but no more than that :) About 2,000 threads seems to be the limit on OS X; anything above that makes the code crash.
#!/usr/bin/env jruby
require 'celluloid'
require 'open-uri'
URLS = ["http://instagr.am/p/Clh2","http://instagr.am/p/EKpI1","http://instagr.am/p/G-PoDSS6zX","http://instagr.am/p/G5YjYMC4MW","http://instagr.am/p/G6sEojDvgy","http://instagr.am/p/G7LGzIjvMp","http://instagr.am/p/G9RQlkQAc9","http://instagr.am/p/HChQX4SMdy","http://instagr.am/p/HDRNwKojXS","http://instagr.am/p/HDjzB-RYMz","http://instagr.am/p/HDkLCGgSjX","http://instagr.am/p/HE2Xgjj0rn","http://instagr.am/p/HE5M9Lp0MC","http://instagr.am/p/HEW5I2RohI","http://instagr.am/p/HEzv41gS6m","http://instagr.am/p/HG2WCVTQwQ","http://instagr.am/p/HG5XWovFFa","http://instagr.am/p/HGwQvEiSmA","http://instagr.am/p/HH0navKTcf","http://instagr.am/p/HH2OzNQIn8","http://instagr.am/p/HH2kTskO2e","http://instagr.am/p/HH3GaNlTbd","http://instagr.am/p/HH3QbejSMF","http://instagr.am/p/HH3S17HnW5","http://instagr.am/p/HH3dQPqYmJ","http://instagr.am/p/HH3egLxVJU","http://instagr.am/p/HH3nVPS1i0","http://instagr.am/p/HH3zdlB3e-","http://instagr.am/p/HH40eevAr2","http://instagr.am/p/HH49zqInZc","http://instagr.am/p/HH4EMQNnpx","http://instagr.am/p/HH4KCKoc-7","http://instagr.am/p/HH4asXlbpp","http://instagr.am/p/HH4yNBydG2","http://instagr.am/p/HH5M5vCCWu","http://instagr.am/p/HH5MXqLQaz","http://instagr.am/p/HH5YeDpw88","http://instagr.am/p/HH5b89nlyH","http://instagr.am/p/HH61z-Fb-R","http://instagr.am/p/HH68sgJDZZ","http://instagr.am/p/HH69Tlt91p","http://instagr.am/p/HH6BwRgqe4","http://instagr.am/p/HH6E6aGS44","http://instagr.am/p/HH6EEYJgSo","http://instagr.am/p/HH6H7htWJo","http://instagr.am/p/HH6hBRzZZD","http://instagr.am/p/HH6xEExaco","http://instagr.am/p/HH6xcVscEg","http://instagr.am/p/HH70aWB1No","http://instagr.am/p/HH73nUMBMI","http://instagr.am/p/HH74ogvrX5","http://instagr.am/p/HH76mRwZnp","http://instagr.am/p/HH77CPmYE0","http://instagr.am/p/HH78hPNnzQ","http://instagr.am/p/HH7ADox4JO","http://instagr.am/p/HH7KFdOeTE","http://instagr.am/p/HH7KJNGDSG","http://instagr.am/p/HH7KJtpxyA","http://instagr.am/p/HH7KjwpM-J","http://instagr.am/p/HH7Q","http://instagr.am/p/HH7QCiqs
OX","http://instagr.am/p/HH7R9er-Oq","http://instagr.am/p/HH7SoqgRYB","http://instagr.am/p/HH7YhZGA75","http://instagr.am/p/HH7aHSJd3D","http://instagr.am/p/HH7bPrMLTB","http://instagr.am/p/HH7bQUnKyn","http://instagr.am/p/HH7c2yADVv","http://instagr.am/p/HH7cEXSCTC","http://instagr.am/p/HH7dxAlxr4","http://instagr.am/p/HH7eJTwO8K","http://instagr.am/p/HH7efCKQ-0","http://instagr.am/p/HH7fczIMyr","http://instagr.am/p/HH7gVnBjad","http://instagr.am/p/HH7gYljc-0","http://instagr.am/p/HH7gYpMKH7","http://instagr.am/p/HH7hDMo_Za","http://instagr.am/p/HH7hfhighk","http://instagr.am/p/HH7hpVm92Q","http://instagr.am/p/HH7hssHUyN","http://instagr.am/p/HH7iS0on88","http://instagr.am/p/HH7j6It5zy","http://instagr.am/p/HH7j75jipU","http://instagr.am/p/HH7j76pkjl","http://instagr.am/p/HH7jMlQLaG","http://instagr.am/p/HH7kHaPbBn","http://instagr.am/p/HH7kKZopDN","http://instagr.am/p/HH7lOFkkOV","http://instagr.am/p/HH7lQtstCP","http://instagr.am/p/HH7l_Aurfu","http://instagr.am/p/HH7m0JDpOC","http://instagr.am/p/HH7m2POzdu","http://instagr.am/p/HH7mHPL0cI","http://instagr.am/p/HH7mL2BdJL","http://instagr.am/p/HH7mN3snzl","http://instagr.am/p/HH7mXJEJIt","http://instagr.am/p/HH7mZAKfdo","http://instagr.am/p/HH7mbxmSnp","http://instagr.am/p/HH7mkHIRM2","http://instagr.am/p/HH7ml5CcLM","http://instagr.am/p/HH7mnxEAJ8","http://instagr.am/p/HH7mqFk38v","http://instagr.am/p/HH7mqtAaOP","http://instagr.am/p/HH7mytlLQm","http://instagr.am/p/HH7n29K0Q1","http://instagr.am/p/HH7naXyW_g","http://instagr.am/p/HH7ncNPJOX","http://instagr.am/p/HH7ndmC0DH","http://instagr.am/p/HH7nifiLCI","http://instagr.am/p/HH7rWttci5","http://instagr.am/p/HH8--LwWs_","http://instagr.am/p/HH8-0DkaPE","http://instagr.am/p/HH8-2CLQEV","http://instagr.am/p/HH8-4gSIJo","http://instagr.am/p/HH8-4liH8g","http://instagr.am/p/HH8-5TCi2b","http://instagr.am/p/HH8-6AKI4j","http://instagr.am/p/HH8-8MtC6l","http://instagr.am/p/HH8-A-gpce","http://instagr.am/p/HH8-A-pXLv","http://instagr.am/p/HH8-BEFQb6","http://instagr.
am/p/HH8-C9IxAs","http://instagr.am/p/HH8-CMRIT9","http://instagr.am/p/HH8-DMiDM3","http://instagr.am/p/HH8-Dwg_5V","http://instagr.am/p/HH8-DyHmmX","http://instagr.am/p/HH8-IEnIBo","http://instagr.am/p/HH8-KBCg0f","http://instagr.am/p/HH8-Kbm9Jb","http://instagr.am/p/HH8-LHryjV","http://instagr.am/p/HH8-LIKIXR","http://instagr.am/p/HH8-MdpM-m","http://instagr.am/p/HH8-N9pzfv","http://instagr.am/p/HH8-NbqDLG","http://instagr.am/p/HH8-NwoEwm","http://instagr.am/p/HH8-ODsfzo","http://instagr.am/p/HH8-OHE0p8","http://instagr.am/p/HH8-QFmasl","http://instagr.am/p/HH8-QaA7Rb","http://instagr.am/p/HH8-R-poCB","http://instagr.am/p/HH8-S5PDIy","http://instagr.am/p/HH8-SqHrOY","http://instagr.am/p/HH8-SzPREN","http://instagr.am/p/HH8-U1r5VK","http://instagr.am/p/HH8-UjEeXv","http://instagr.am/p/HH8-VaRadH","http://instagr.am/p/HH8-WFIPij","http://instagr.am/p/HH8-WHRwHP","http://instagr.am/p/HH8-X-SkFA","http://instagr.am/p/HH8-a5icLX","http://instagr.am/p/HH8-aSRpdn","http://instagr.am/p/HH8-aTm5g8","http://instagr.am/p/HH8-aatV6Q","http://instagr.am/p/HH8-azAmc5","http://instagr.am/p/HH8-bcLP_v","http://instagr.am/p/HH8-dGrMku","http://instagr.am/p/HH8-dKABGr","http://instagr.am/p/HH8-eFTTJ8","http://instagr.am/p/HH8-eLRwvK","http://instagr.am/p/HH8-ehmwGz","http://instagr.am/p/HH8-h-D72a","http://instagr.am/p/HH8-hhmEOT","http://instagr.am/p/HH8-ibSZTj","http://instagr.am/p/HH8-jospUb","http://instagr.am/p/HH8-kMpc2F","http://instagr.am/p/HH8-kNBmGm","http://instagr.am/p/HH8-lArilF","http://instagr.am/p/HH8-lWTDwj","http://instagr.am/p/HH8-mNnqZL","http://instagr.am/p/HH8-n4sGGS","http://instagr.am/p/HH8-n9xHbn","http://instagr.am/p/HH8-pYx3JZ","http://instagr.am/p/HH8-pppok3","http://instagr.am/p/HH8-qoy3LK","http://instagr.am/p/HH8-qvROzb","http://instagr.am/p/HH8-qytoRH","http://instagr.am/p/HH8-rOyW_y","http://instagr.am/p/HH8-s9KXi6","http://instagr.am/p/HH8-sVyS7K","http://instagr.am/p/HH8-sbnQEO","http://instagr.am/p/HH8-txJV-e","http://instagr.am/p/HH8-u0Mewa","ht
tp://instagr.am/p/HH8-u1BFJ-","http://instagr.am/p/HH8-uXBu_r","http://instagr.am/p/HH8-ujO2m1","http://instagr.am/p/HH8-v7pm7L","http://instagr.am/p/HH8-vBRADm","http://instagr.am/p/HH8-vkwQNF","http://instagr.am/p/HH8-x5R6u2","http://instagr.am/p/HH8-xArCJB","http://instagr.am/p/HH8-xOxnVQ","http://instagr.am/p/HH8-xrmqCf","http://instagr.am/p/HH8-y4Li29","http://instagr.am/p/HH8-yamwjM","http://instagr.am/p/HH802xDyEm","http://instagr.am/p/HH804Gw-Fe","http://instagr.am/p/HH804hAMqQ","http://instagr.am/p/HH805wBvVI","http://instagr.am/p/HH806SguSx","http://instagr.am/p/HH806rEtcY","http://instagr.am/p/HH809ClkbW","http://instagr.am/p/HH809kPN-5","http://instagr.am/p/HH80Cxst8p","http://instagr.am/p/HH80E3Ibo0","http://instagr.am/p/HH80ELOZpk","http://instagr.am/p/HH80EVFFIz","http://instagr.am/p/HH80FngJs0","http://instagr.am/p/HH80M0kiBG","http://instagr.am/p/HH80cKKQ_E","http://instagr.am/p/HH80gaBUzQ","http://instagr.am/p/HH80lSDT71","http://instagr.am/p/HH80mYOHwX","http://instagr.am/p/HH80nfAYsL","http://instagr.am/p/HH80pUNIO2","http://instagr.am/p/HH80sxRLtt","http://instagr.am/p/HH80vbDjj0","http://instagr.am/p/HH80w7xI-m","http://instagr.am/p/HH80wDHTN4","http://instagr.am/p/HH81-5RjEB","http://instagr.am/p/HH811fo-_e","http://instagr.am/p/HH813tkiVZ","http://instagr.am/p/HH813vkGMo","http://instagr.am/p/HH814RDHuG","http://instagr.am/p/HH814TOYiW","http://instagr.am/p/HH8179vxAg","http://instagr.am/p/HH81AwC6db","http://instagr.am/p/HH81BGyWUr","http://instagr.am/p/HH81FoFjxm","http://instagr.am/p/HH81H-IH_i","http://instagr.am/p/HH81MnoSaI","http://instagr.am/p/HH81MtN3bH","http://instagr.am/p/HH81O1Cfe7","http://instagr.am/p/HH81RprFKO","http://instagr.am/p/HH81Z2pq3V","http://instagr.am/p/HH81aCPRem","http://instagr.am/p/HH81aVTWZm","http://instagr.am/p/HH81bBo8cM","http://instagr.am/p/HH81k2xVJ4","http://instagr.am/p/HH81kERlbh","http://instagr.am/p/HH81vqHC0M","http://instagr.am/p/HH81vqyti3","http://instagr.am/p/HH81wbS-cj","http://instagr.am/p/HH
81xfEjvZ","http://instagr.am/p/HH81zsrbsz","http://instagr.am/p/HH823tDEIP","http://instagr.am/p/HH823ytt2P","http://instagr.am/p/HH825MgnYc","http://instagr.am/p/HH827QrTPF","http://instagr.am/p/HH82AWzhzS","http://instagr.am/p/HH82EGE05q","http://instagr.am/p/HH82FDu8Mf","http://instagr.am/p/HH82HTmdze","http://instagr.am/p/HH82L-iG-U","http://instagr.am/p/HH82NpFsn7","http://instagr.am/p/HH82YTOqEF","http://instagr.am/p/HH82bpEdvj","http://instagr.am/p/HH82cShmmV","http://instagr.am/p/HH82czP-SU","http://instagr.am/p/HH82h9LhYy","http://instagr.am/p/HH82iizf4G","http://instagr.am/p/HH82jUw184","http://instagr.am/p/HH82mrnPeW","http://instagr.am/p/HH82t9u8Mg","http://instagr.am/p/HH82tPH1El","http://instagr.am/p/HH82wzhczs","http://instagr.am/p/HH82zzjj7W","http://instagr.am/p/HH83-3oaAb","http://instagr.am/p/HH83-AlcOq","http://instagr.am/p/HH8302rtlY","http://instagr.am/p/HH833ty-ck","http://instagr.am/p/HH834lswSl","http://instagr.am/p/HH835DFp5j","http://instagr.am/p/HH835FKCBP","http://instagr.am/p/HH835UmKXt","http://instagr.am/p/HH835qnQot","http://instagr.am/p/HH8383zIXz","http://instagr.am/p/HH8384ROzS","http://instagr.am/p/HH83AMP4a0","http://instagr.am/p/HH83B5B1Nt","http://instagr.am/p/HH83CqkA0O","http://instagr.am/p/HH83DpMRPq","http://instagr.am/p/HH83EjPNA_","http://instagr.am/p/HH83Frqolx","http://instagr.am/p/HH83KmM8EC","http://instagr.am/p/HH83RJuxBF","http://instagr.am/p/HH83WCuGEA","http://instagr.am/p/HH83XtGGIV","http://instagr.am/p/HH83ZKNcTS","http://instagr.am/p/HH83aNohKe","http://instagr.am/p/HH83bCudp9","http://instagr.am/p/HH83f0vFsx","http://instagr.am/p/HH83gsmWCm","http://instagr.am/p/HH83gyJWp5","http://instagr.am/p/HH83k0h0C3","http://instagr.am/p/HH83nDlyBo","http://instagr.am/p/HH83nSlA26","http://instagr.am/p/HH83nfnS7m","http://instagr.am/p/HH83puJ0UJ","http://instagr.am/p/HH83qGPaXH","http://instagr.am/p/HH83r9D_FK","http://instagr.am/p/HH83uAFKtr","http://instagr.am/p/HH83uJxZeV","http://instagr.am/p/HH83vcTWsX","http://in
stagr.am/p/HH83xtmDSU","http://instagr.am/p/HH841GGzT3","http://instagr.am/p/HH841UMarm","http://instagr.am/p/HH841VgcD4","http://instagr.am/p/HH8429HDTT","http://instagr.am/p/HH842SMBUn","http://instagr.am/p/HH842cRA6V","http://instagr.am/p/HH842nNboH","http://instagr.am/p/HH844ISVI_","http://instagr.am/p/HH844QPBbt","http://instagr.am/p/HH8460RADl","http://instagr.am/p/HH846VkDLB","http://instagr.am/p/HH846jSV9B","http://instagr.am/p/HH847YpeiM","http://instagr.am/p/HH848JoFPh","http://instagr.am/p/HH849dRQnD","http://instagr.am/p/HH84EBB-rW","http://instagr.am/p/HH84GXHQEN","http://instagr.am/p/HH84IOO6Hd","http://instagr.am/p/HH84K7vdZp","http://instagr.am/p/HH84O1vefu","http://instagr.am/p/HH84O2hj7y","http://instagr.am/p/HH84OALIqP","http://instagr.am/p/HH84PVk-tn","http://instagr.am/p/HH84RquusO","http://instagr.am/p/HH84TnhJKv","http://instagr.am/p/HH84WQH1En","http://instagr.am/p/HH84XPiGqI","http://instagr.am/p/HH84YLH5ty","http://instagr.am/p/HH84YpLGfC","http://instagr.am/p/HH84Ywvdk6","http://instagr.am/p/HH84ZdzhTA","http://instagr.am/p/HH84afzC-V","http://instagr.am/p/HH84ctJ5s1","http://instagr.am/p/HH84dTHX9F","http://instagr.am/p/HH84fXPKi5","http://instagr.am/p/HH84fhto0L","http://instagr.am/p/HH84geJyhL","http://instagr.am/p/HH84hUpz82","http://instagr.am/p/HH84iYKYQp","http://instagr.am/p/HH84kFDSyv","http://instagr.am/p/HH84nNH_1J","http://instagr.am/p/HH84o1D3Um","http://instagr.am/p/HH84ohtzcL","http://instagr.am/p/HH84pNDJcd","http://instagr.am/p/HH84pOH6TN","http://instagr.am/p/HH84pXMYZd","http://instagr.am/p/HH84qkJ0i3","http://instagr.am/p/HH84sTvixj","http://instagr.am/p/HH84tan8wH","http://instagr.am/p/HH84w1gm7Z","http://instagr.am/p/HH84yNv-z-","http://instagr.am/p/HH84zAoMEl","http://instagr.am/p/HH85-0RTj8","http://instagr.am/p/HH850YgA3T","http://instagr.am/p/HH850pPNBB","http://instagr.am/p/HH850tOWXm","http://instagr.am/p/HH851nnMar","http://instagr.am/p/HH851yhV8o","http://instagr.am/p/HH852bqPAx","http://instagr.am/p/HH852nDat
V","http://instagr.am/p/HH852pxXn5","http://instagr.am/p/HH853TsOYx","http://instagr.am/p/HH854_ob--","http://instagr.am/p/HH854kL_yC","http://instagr.am/p/HH8563jp99","http://instagr.am/p/HH856HhpBi","http://instagr.am/p/HH857CEjxZ","http://instagr.am/p/HH857URkql","http://instagr.am/p/HH857UqVCN","http://instagr.am/p/HH8580SWLd","http://instagr.am/p/HH858wITqb","http://instagr.am/p/HH85AXKxP5","http://instagr.am/p/HH85CIL_yB","http://instagr.am/p/HH85CKCp4U","http://instagr.am/p/HH85DLn-09","http://instagr.am/p/HH85Dnljqy","http://instagr.am/p/HH85E0Jcj3","http://instagr.am/p/HH85EKR9fm","http://instagr.am/p/HH85EgBaHm","http://instagr.am/p/HH85ElD4b_","http://instagr.am/p/HH85HBm9f4","http://instagr.am/p/HH85HFrCl3","http://instagr.am/p/HH85JYunBd","http://instagr.am/p/HH85LVoMhr","http://instagr.am/p/HH85LWCbeC","http://instagr.am/p/HH85MKFbQt","http://instagr.am/p/HH85NJv80J","http://instagr.am/p/HH85NUvTvk","http://instagr.am/p/HH85NyufqK","http://instagr.am/p/HH85PZOR6d","http://instagr.am/p/HH85Q2M2uh","http://instagr.am/p/HH85T2Ofcs","http://instagr.am/p/HH85VUKVTZ","http://instagr.am/p/HH85VVKoly","http://instagr.am/p/HH85VdK6R1","http://instagr.am/p/HH85Vfmn0-","http://instagr.am/p/HH85VxIOrP","http://instagr.am/p/HH85WoR6Ls","http://instagr.am/p/HH85Ztrf-m","http://instagr.am/p/HH85aLrxjq","http://instagr.am/p/HH85bOR6u0","http://instagr.am/p/HH85cZLXr6","http://instagr.am/p/HH85ckD-JY","http://instagr.am/p/HH85d6JlSW","http://instagr.am/p/HH85dUwcKY","http://instagr.am/p/HH85fUuT6W","http://instagr.am/p/HH85fiAaOe","http://instagr.am/p/HH85gMJBEP","http://instagr.am/p/HH85gVFvEt","http://instagr.am/p/HH85hIveqD","http://instagr.am/p/HH85hZAKiO","http://instagr.am/p/HH85i8CyMs","http://instagr.am/p/HH85jQhUo7","http://instagr.am/p/HH85kBSD2v","http://instagr.am/p/HH85lsFy6K","http://instagr.am/p/HH85mLnJky","http://instagr.am/p/HH85oVROzU","http://instagr.am/p/HH85p9ylU4","http://instagr.am/p/HH85qENCWv","http://instagr.am/p/HH85qhNMcj","http://instagr.a
m/p/HH85s3N7vl","http://instagr.am/p/HH85sJSPSe","http://instagr.am/p/HH85tStObP","http://instagr.am/p/HH85txnKmv","http://instagr.am/p/HH85uAvobc","http://instagr.am/p/HH85x8rlE0","http://instagr.am/p/HH85xZuNO8","http://instagr.am/p/HH85yYQwXj","http://instagr.am/p/HH85z3wB35","http://instagr.am/p/HH86-kGL7D","http://instagr.am/p/HH860TiUwo","http://instagr.am/p/HH861PIkO2","http://instagr.am/p/HH862smvKU","http://instagr.am/p/HH863NxVJ5","http://instagr.am/p/HH863Zw3Z2","http://instagr.am/p/HH865_H-sk","http://instagr.am/p/HH865nxSRn","http://instagr.am/p/HH868cli0X","http://instagr.am/p/HH86AKRtq-","http://instagr.am/p/HH86CjRQTU","http://instagr.am/p/HH86D8sx0Q","http://instagr.am/p/HH86DqzgPd","http://instagr.am/p/HH86EkSO83","http://instagr.am/p/HH86FzO4D_","http://instagr.am/p/HH86G8MkDx","http://instagr.am/p/HH86GGwXA0","http://instagr.am/p/HH86Hwq8Nu","http://instagr.am/p/HH86HzG4HB","http://instagr.am/p/HH86HzJtHN","http://instagr.am/p/HH86IMLZYc","http://instagr.am/p/HH86IXyAMV","http://instagr.am/p/HH86IwQ8Qk","http://instagr.am/p/HH86KXQuR6","http://instagr.am/p/HH86LTkJJE","http://instagr.am/p/HH86NrLH5g","http://instagr.am/p/HH86SWRNOm","http://instagr.am/p/HH86TJI64M","http://instagr.am/p/HH86TtpBEQ","http://instagr.am/p/HH86UYKOhh","http://instagr.am/p/HH86VGlnpW","http://instagr.am/p/HH86ZfpEEw","http://instagr.am/p/HH86_NPrK2","http://instagr.am/p/HH86afDy4x","http://instagr.am/p/HH86afyFIQ","http://instagr.am/p/HH86bGLfNe","http://instagr.am/p/HH86btlKts","http://instagr.am/p/HH86c9u8Wq","http://instagr.am/p/HH86cSR3_L","http://instagr.am/p/HH86fEHdLt","http://instagr.am/p/HH86g5m-xm","http://instagr.am/p/HH86gKCqXT","http://instagr.am/p/HH86gQGfPG","http://instagr.am/p/HH86hlSkE9","http://instagr.am/p/HH86hzhSep","http://instagr.am/p/HH86inFZGd","http://instagr.am/p/HH86ipACWp","http://instagr.am/p/HH86jXw0gP","http://instagr.am/p/HH86l3lWdw","http://instagr.am/p/HH86lBIcoK","http://instagr.am/p/HH86lbyhFP","http://instagr.am/p/HH86maiG-V","htt
p://instagr.am/p/HH86mgqMW3","http://instagr.am/p/HH86mqFBiB","http://instagr.am/p/HH86mwm5g5","http://instagr.am/p/HH86oeJWA2","http://instagr.am/p/HH86ofFMPU","http://instagr.am/p/HH86ohOsAZ","http://instagr.am/p/HH86olHu0A","http://instagr.am/p/HH86otvR1t","http://instagr.am/p/HH86pHTMXp","http://instagr.am/p/HH86qENjaU","http://instagr.am/p/HH86rtCd4L","http://instagr.am/p/HH86s0N3Ay","http://instagr.am/p/HH86sDPkW6","http://instagr.am/p/HH86tLMj5x"]
class Crawler
  include Celluloid
  def fetch(id)
    uri = URI("http://api.instagram.com/oembed?url=#{id}")
    req = open(uri).read
  end
end
URLS.each_slice(50).map do |idset|
  pool = Crawler.pool(size: 50)
  crawlers = idset.to_a.map do |id|
    begin
      pool.future(:fetch, id)
    rescue
      nil
    end
  end
  crawlers.compact.each do |resp|
    puts resp.value.size rescue nil
  end
end
Split the class. The Celluloid wiki says never to create a pool of a worker from inside that worker:
https://github.com/celluloid/celluloid/wiki/Pools
Gotcha: Don't make pools inside workers!
Using MyWorker.pool within MyWorker will result in an unbounded
explosion of worker threads.
Update
If you want to limit your pool, create it outside the each_slice block so that, as far as I can tell, the same threads are always reused.
pool = Crawler.pool(size: 50)
URLS.each_slice(50).map do |idset|
  crawlers = idset.to_a.map do |id|
    begin
      pool.future(:fetch, id)
    rescue
      nil
    end
  end
  # ...
Each iteration through the slice of 50 you're resetting the value of pool, which likely is dereferencing your poolmanager. Since actors aren't garbage collected just by being dereferenced (you have to call #terminate) you're probably piling up your old pools. It should be ok to just make one pool, and create all your futures at once (if you keep the return value small the future object itself is small). If you do find that you have to slice, instantiate your pool outside the each_slice and it will continue to use the same pool without making a new one each time around. If for some other reason you want to get a new pool each time, call terminate on the pool before you dereference it. Also be sure you're working with celluloid 0.12.0+ as it fixes an issue where pool workers weren't being terminated when the pool was.
When I iterate around actors, I've found this bit of logging to be useful to be sure I don't have any actor leaks:
logger.info "Actors left: #{Celluloid::Actor.all.to_set.length} Alive: #{Celluloid::Actor.all.to_set.reject { |a| a.nil? || !a.alive? }.length}"
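The advice in the answers above (create one pool, reuse it for every slice, terminate it when done) can be illustrated without Celluloid. The sketch below is a standard-library stand-in and not Celluloid's API: `TinyPool`, its `future` method taking a block, and `terminate` are all hypothetical names chosen to mirror the discussion.

```ruby
# Minimal stand-in for a fixed-size pool with futures (not Celluloid's API).
class TinyPool
  class Future
    def initialize
      @slot = Queue.new
    end

    def fulfill(kind, payload)
      @slot << [kind, payload]
    end

    # Blocks until a worker has run the job, then returns or re-raises.
    def value
      @outcome ||= @slot.pop
      kind, payload = @outcome
      raise payload if kind == :error
      payload
    end
  end

  attr_reader :threads

  def initialize(size)
    @jobs = Queue.new
    @threads = Array.new(size) do
      Thread.new do
        while (job = @jobs.pop) # nil once the queue is closed and empty
          future, work = job
          begin
            future.fulfill(:ok, work.call)
          rescue StandardError => e
            future.fulfill(:error, e)
          end
        end
      end
    end
  end

  def future(&work)
    Future.new.tap { |f| @jobs << [f, work] }
  end

  # Stops accepting jobs and waits for the workers to exit.
  def terminate
    @jobs.close
    @threads.each(&:join)
  end
end

pool = TinyPool.new(5) # created once, outside the slicing loop
squares = (1..20).each_slice(5).flat_map do |slice|
  slice.map { |n| pool.future { n * n } }.map(&:value)
end
pool.terminate
p squares.last                  # => 400
p pool.threads.count(&:alive?)  # => 0
```

However many slices are processed, the thread count stays at the pool size, which is the behaviour the question was after.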

Parallelism in Ruby

I've got a loop in my Ruby build script that iterates over each project and calls msbuild and does various other bits like minify CSS/JS.
Each loop iteration is independent of the others so I'd like to parallelise it.
How do I do this?
I've tried:
myarray.each{|item|
  Thread.start {
    # do stuff
  }
}
puts "foo"
but Ruby just seems to exit straight away (prints "foo"). That is, it runs over the loop, starts a load of threads, but because there's nothing after the each, Ruby exits killing the other threads :(
I know I can do thread.join, but if I do this inside the loop then it's no longer parallel.
What am I missing?
I'm aware of http://peach.rubyforge.org/ but using that I get all kinds of weird behaviour that look like variable scoping issues that I don't know how to solve.
Edit
It would be useful if I could wait for all child-threads to execute before putting "foo", or at least the main ruby thread exiting. Is this possible?
Store all your threads in an array and loop through the array calling join:
threads = myarray.map do |item|
  Thread.start do
    # do stuff
  end
end
threads.each { |thread| thread.join }
puts "foo"
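If each iteration also produces a result (a build status, say), `Thread#value` can replace the explicit join: it waits for the thread and returns the value of its block. A small sketch, in which `in_parallel` and the project names are made up for illustration:

```ruby
# Hypothetical helper: runs the block for every item on its own thread and
# returns the block results in input order.
def in_parallel(items, &work)
  threads = items.map do |item|
    Thread.new(item) { |i| work.call(i) }
  end
  threads.map(&:value) # value joins each thread before returning its result
end

built = in_parallel(%w[web api docs]) { |project| "#{project}: ok" }
puts built
puts "foo" # reached only after every thread has finished
```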
Use em-synchrony here :). Fibers are cute.
require "em-synchrony"
require "em-synchrony/fiber_iterator"
# if you really need a Fiber per item
# in real life you could set concurrency to, for example, 10 and it could even improve performance
# it depends on the amount of IO in your job
concurrency = myarray.size
EM.synchrony do
  EM::Synchrony::FiberIterator.new(myarray, concurrency).each do |url|
    # do some job here
  end
  EM.stop
end
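Take into account that MRI runs only one thread of Ruby code at a time (green threads in Ruby 1.8, native threads behind a global interpreter lock since 1.9), so you don't get true parallelism for CPU-bound Ruby code. Threads that are waiting on IO or on an external process such as msbuild do overlap. If you need true parallelism, I would recommend taking a look at JRuby and Rubinius: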
http://www.engineyard.com/blog/2011/concurrency-in-jruby/

What happens when you don't join your Threads?

I'm writing a ruby program that will be using threads to do some work. The work that is being done takes a non-deterministic amount of time to complete and can range anywhere from 5 to 45+ seconds. Below is a rough example of what the threading code looks like:
loop do # Program loop
  items = get_items
  threads = []
  for item in items
    threads << Thread.new(item) do |i|
      # do work on i
    end
    threads.each { |t| t.join } # What happens if this isn't there?
  end
end
My preference would be to skip joining the threads and not block the entire application. However I don't know what the long term implications of this are, especially because the code is run again almost immediately. Is this something that is safe to do? Or is there a better way to spawn a thread, have it do work, and clean up when it's finished, all within an infinite loop?
I think it really depends on the content of your thread work. If, for example, your main thread needed to print "X work done", you would need to join to guarantee that you were showing the correct answer. If you have no such requirement, then you wouldn't necessarily need to join up.
After writing the question out, I realized that this is exactly what a web server does when serving pages. I googled and found an article about a Ruby web server whose loop code looks pretty much like mine:
loop do
  session = server.accept
  request = session.gets
  # log stuff
  Thread.start(session, request) do |session, request|
    HttpServer.new(session, request, basePath).serve()
  end
end
Thread.start is effectively the same as Thread.new, so it appears that letting the threads finish and die off is OK to do.
If you split a workload across several threads and need to combine their results at the end, you definitely need a join; otherwise you can do without one.
If you removed the join, you could end up with new items getting started faster than the older ones get finished. If you're working on too many items at once, it may cause performance issues.
You should use a Queue instead (snippet from http://ruby-doc.org/stdlib/libdoc/thread/rdoc/classes/Queue.html):
require 'thread'
queue = Queue.new
producer = Thread.new do
  5.times do |i|
    sleep rand(i) # simulate expense
    queue << i
    puts "#{i} produced"
  end
end
consumer = Thread.new do
  5.times do |i|
    value = queue.pop
    sleep rand(i/2) # simulate expense
    puts "consumed #{value}"
  end
end
consumer.join
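One more consequence of skipping join, relevant to the original question: an exception raised inside a thread ends that thread but does not reach the main thread unless something joins it (or asks for its value). By default Ruby 2.5 and later print a report to stderr, but the program carries on. A small sketch of that behaviour; `failure_of` is a hypothetical helper name.

```ruby
# Returns the exception that killed `thread`, or nil if it finished normally.
def failure_of(thread)
  thread.join # join re-raises whatever terminated the thread
  nil
rescue StandardError => e
  e
end

worker = Thread.new do
  Thread.current.report_on_exception = false # silence the stderr report for this demo
  raise ArgumentError, "bad item"
end

sleep 0.01 while worker.alive?
puts "main thread still running" # the worker has died, the program has not

error = failure_of(worker)
puts "worker failed with: #{error.message}" if error
```

So fire-and-forget threads are safe in the sense that they clean up after themselves, but anything that must not fail silently needs a join, a rescue inside the thread, or some other way to report errors.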
