EventMachine: What is the maximum number of parallel HTTP requests EM can handle?

I'm building a distributed web crawler and trying to get the most out of each machine's resources. I run parsing functions in EventMachine through an Iterator and use em-http-request to make asynchronous HTTP requests. Right now I have 100 iterations running at the same time, and it seems I can't get past this level; increasing the number of iterations doesn't affect the crawling speed. However, I only get 10-15% CPU load and 20-30% network load, so there's plenty of room to crawl faster.
I'm using Ruby 1.9.2. Is there any way to improve the code to use resources more effectively, or am I perhaps doing it wrong altogether?
def start_job_crawl
  @redis.lpop @queue do |link|
    if link.nil?
      EventMachine::add_timer( 1 ){ start_job_crawl() }
    else
      # parsing link, using asynchronous http request,
      # doing something with the content
      parse(link)
    end
  end
end
# main reactor loop
EM.run {
  EM.kqueue

  @redis = EM::Protocols::Redis.connect(:host => "127.0.0.1")
  @redis.errback do |code|
    puts "Redis error: #{code}"
  end

  # 100 parallel 'threads'. Want to increase this
  EM::Iterator.new(0..99, 100).each do |num, iter|
    start_job_crawl()
  end
}

If you are using select() (which is the default for EM), the maximum is 1024, because select() is limited to 1024 file descriptors.
However, it seems you are using kqueue, so it should be able to handle much more than 1024 file descriptors at once.

What is the value of your EM.threadpool_size?
Try enlarging it; I suspect the limit is not in kqueue but in the pool handling the requests...
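In case it helps, a minimal sketch of that tuning, assuming the EM.threadpool_size and kqueue switches behave as in EventMachine 1.0 (the value 100 is an arbitrary example, not a recommendation):

require 'eventmachine'

EM.run do
  # Use a kernel event mechanism instead of select() to lift the
  # ~1024 file descriptor ceiling (kqueue on BSD/OS X, epoll on Linux).
  EM.kqueue

  # Enlarge the pool used by EM.defer; the default is 20 threads.
  EM.threadpool_size = 100

  # ... start the crawl iterators here, as in the question ...
end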

Related

How do I properly use Threads to ping a URL?

I am trying to ping a large number of URLs and retrieve information about each URL's certificate. As I read in this Thoughtbot article (Thoughtbot Threads) and others, the best way to do this is by using Threads. When I implement threads, however, I keep running into Timeout errors and other problems for URLs that I can retrieve successfully on their own. I've been told in another related question that I asked earlier that I should not use Timeout with Threads. However, the examples I see wrap API/Net::HTTP/TCPSocket calls in the Timeout block, and based on what I've read, that entire API/Net::HTTP/TCPSocket call will be nested within the Thread. Here is my code:
class SslClient
  attr_reader :url, :port, :timeout

  def initialize(url, port = '443', timeout = 30)
    @url = url
    @port = port
    @timeout = timeout
  end

  def ping_for_certificate_info
    context = OpenSSL::SSL::SSLContext.new
    certificates = nil
    verify_result = nil
    Timeout.timeout(timeout) do
      tcp_client = TCPSocket.new(url, port)
      ssl_client = OpenSSL::SSL::SSLSocket.new tcp_client, context
      ssl_client.hostname = url
      ssl_client.sync_close = true
      ssl_client.connect
      certificates = ssl_client.peer_cert_chain
      verify_result = ssl_client.verify_result
      tcp_client.close
    end
    { certificate: certificates.first, verify_result: verify_result }
  rescue => error
    puts url
    puts error.inspect
  end
end

[VERY LARGE LIST OF URLS].map do |url|
  Thread.new do
    ssl_client = SslClient.new(url)
    cert_info = ssl_client.ping_for_certificate_info
    puts cert_info
  end
end.map(&:value)
If you run this code in your terminal, you will see many Timeout errors and Errno::ETIMEDOUT errors for sites like fandango.com, fandom.com, mcaffee.com, google.de etc. that should return information. When I run these individually, however, I get the information I need. When I run them in threads they tend to fail, especially for domains with a foreign domain name. What I'm asking is whether I am using Threads correctly. This snippet of code is part of a larger piece of code that interacts with ActiveRecord objects in Rails depending on the results given. Am I using Timeout and Threads correctly? What do I need to do to make this work? Why would a ping work individually but not wrapped in a thread? Help would be greatly appreciated.
There are several issues:
You should not spawn thousands of threads; use a connection pool (e.g. https://github.com/mperham/connection_pool) so you have at most 20-30 concurrent requests going (the right maximum should be determined by testing at which point network performance drops and you start getting these timeouts). A sketch of this approach follows after this list.
It's difficult to guarantee that your code is not broken when you use threads, which is why I suggest you use something where others have figured it out for you, like https://github.com/httprb/http (with examples for thread safety and concurrent requests at https://github.com/httprb/http/wiki/Thread-Safety). There are other libs out there (Typhoeus, patron), but this one is pure Ruby, so basic thread safety is easier to achieve.
You should not use Timeout (see https://jvns.ca/blog/2015/11/27/why-rubys-timeout-is-dangerous-and-thread-dot-raise-is-terrifying and https://medium.com/@adamhooper/in-ruby-dont-use-timeout-77d9d4e5a001). Use IO.select or something else.
Also, I suggest you learn about threading issues like deadlocks, starvation, and all the other gotchas. In your case you are starving the network resource, because all the threads are fighting for bandwidth.
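As a minimal sketch of that pooled approach, here is one way to cap concurrency with a fixed set of worker threads pulling from a queue. It reuses the SslClient from the question; MAX_CONCURRENCY and urls are placeholders you would tune and fill in yourself:

require 'thread'

MAX_CONCURRENCY = 20            # tune by measuring where timeouts start
queue   = Queue.new
results = Queue.new

urls.each { |url| queue << url }   # urls stands for the big list from the question

workers = MAX_CONCURRENCY.times.map do
  Thread.new do
    begin
      loop do
        url = queue.pop(true)      # non-blocking pop; raises ThreadError when empty
        results << SslClient.new(url).ping_for_certificate_info
      end
    rescue ThreadError
      # queue drained, let this worker exit
    end
  end
end

workers.each(&:join)
puts "collected #{results.size} results"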

Which of Ruby's concurrency devices would be best suited for this scenario?

The whole threads/fibers/processes thing is confusing me a little. I have a practical problem that can be solved with some concurrency, so I thought this was a good opportunity to ask professionals and people more knowledgable than me about it.
I have a long array, let's say 3,000 items. I want to send an HTTP request for each item in the array.
Actually iterating over the array, generating requests, and sending them is very rapid. What takes time is waiting for each item to be received, processed, and acknowledged by the party I'm sending to. I'm essentially sending 100 bytes, waiting 2 seconds, sending 100 bytes, waiting 2 seconds.
What I would like to do instead is send these requests asynchronously. I want to send a request, specify what to do when I get the response, and in the meantime, send the next request.
From what I can see, there are four concurrency options I could use here.
Threads.
Fibers.
Processes; unsuitable as far as I know because multiple processes accessing the same array isn't feasible/safe.
Asynchronous functionality like JavaScript's XMLHttpRequest.
The simplest would seem to be the last one. But what is the best, simplest way to do that using Ruby?
Failing #4, which of the remaining three is the most sensible choice here?
Would any of these options also allow me to say "Have no more than 10 pending requests at any time"?
This is your classic producer/consumer problem and it is nicely suited to threads in Ruby. Just create a Queue:
require "thread"

urls = [...] # array with bunches of urls

queue = SizedQueue.new(10) # this will only allow 10 items on the queue at once

producer = Thread.new do
  urls.each do |url|
    response = do_http_request(url)
    queue << response
  end
  queue << "done"
end

consumer = Thread.new do
  loop do
    http_response = queue.pop # blocks until an item is available
    break if http_response == "done"
    process(http_response)
  end
end

# wait for the consumer to finish
consumer.join
EventMachine as an event loop and em-synchrony as a Fiber wrapper that turns its callbacks into synchronous code.
Copy-pasted from the em-synchrony README:
require "em-synchrony"
require "em-synchrony/em-http"
require "em-synchrony/fiber_iterator"
EM.synchrony do
concurrency = 2
urls = ['http://url.1.com', 'http://url2.com']
results = []
EM::Synchrony::FiberIterator.new(urls, concurrency).each do |url|
resp = EventMachine::HttpRequest.new(url).get
results.push resp.response
end
p results # all completed requests
EventMachine.stop
end
This is an IO-bound case that fits both:
The threading model: no problem with MRI Ruby here, because threads work well for IO-bound work; the GIL's effect is almost zero.
The asynchronous model, which proves (in practice and theory) to be far superior to threads for IO-specific problems.
For this specific case, and to keep things far simpler, I would go with the Typhoeus HTTP client, whose parallel support works on the evented (asynchronous) concurrency model.
Example:
hydra = Typhoeus::Hydra.new

%w(url1 url2 url3).each do |url|
  request = Typhoeus::Request.new(url, followlocation: true)
  request.on_complete do |response|
    # do something with response
  end
  hydra.queue(request)
end

hydra.run # this is a blocking call that returns once all requests are complete
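Hydra can also cap how many requests are in flight, which answers the "no more than 10 pending requests at any time" part of the question. max_concurrency is the option name in current Typhoeus releases, so double-check it against your version; handle is a placeholder:

hydra = Typhoeus::Hydra.new(max_concurrency: 10)

urls.each do |url|
  request = Typhoeus::Request.new(url, followlocation: true)
  request.on_complete { |response| handle(response) }
  hydra.queue(request)
end

hydra.run # still blocking, but never more than 10 requests in flight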

Ruby Celluloid and resource consumption

I'm new to Celluloid and have some questions about pools and futures. I'm building a simple web crawler (see the example at the bottom). My URLS array contains tens of thousands of URLs, so the example is stripped down to a few hundred.
What I now want to do is group these into batches of at most 50 requests using futures, get their callbacks, crawl the next 50 URLs, and so on. The problem I have with this code: I would expect it to spawn at most 50 threads, but it spawns up to 400 and more in my case. If the input data grows, the snippet dies because it cannot spawn further threads (OS limits; OS X in my case).
Why are so many threads spawned, and how can I avoid this? I need a fast crawler that uses all the resources the OS provides, but no more than that :) Around 2,000 threads seems to be the limit on OS X; anything above that value makes the code crash.
#!/usr/bin/env jruby
require 'celluloid'
require 'open-uri'
URLS = ["http://instagr.am/p/Clh2","http://instagr.am/p/EKpI1","http://instagr.am/p/G-PoDSS6zX","http://instagr.am/p/G5YjYMC4MW","http://instagr.am/p/G6sEojDvgy","http://instagr.am/p/G7LGzIjvMp","http://instagr.am/p/G9RQlkQAc9","http://instagr.am/p/HChQX4SMdy","http://instagr.am/p/HDRNwKojXS","http://instagr.am/p/HDjzB-RYMz","http://instagr.am/p/HDkLCGgSjX","http://instagr.am/p/HE2Xgjj0rn","http://instagr.am/p/HE5M9Lp0MC","http://instagr.am/p/HEW5I2RohI","http://instagr.am/p/HEzv41gS6m","http://instagr.am/p/HG2WCVTQwQ","http://instagr.am/p/HG5XWovFFa","http://instagr.am/p/HGwQvEiSmA","http://instagr.am/p/HH0navKTcf","http://instagr.am/p/HH2OzNQIn8","http://instagr.am/p/HH2kTskO2e","http://instagr.am/p/HH3GaNlTbd","http://instagr.am/p/HH3QbejSMF","http://instagr.am/p/HH3S17HnW5","http://instagr.am/p/HH3dQPqYmJ","http://instagr.am/p/HH3egLxVJU","http://instagr.am/p/HH3nVPS1i0","http://instagr.am/p/HH3zdlB3e-","http://instagr.am/p/HH40eevAr2","http://instagr.am/p/HH49zqInZc","http://instagr.am/p/HH4EMQNnpx","http://instagr.am/p/HH4KCKoc-7","http://instagr.am/p/HH4asXlbpp","http://instagr.am/p/HH4yNBydG2","http://instagr.am/p/HH5M5vCCWu","http://instagr.am/p/HH5MXqLQaz","http://instagr.am/p/HH5YeDpw88","http://instagr.am/p/HH5b89nlyH","http://instagr.am/p/HH61z-Fb-R","http://instagr.am/p/HH68sgJDZZ","http://instagr.am/p/HH69Tlt91p","http://instagr.am/p/HH6BwRgqe4","http://instagr.am/p/HH6E6aGS44","http://instagr.am/p/HH6EEYJgSo","http://instagr.am/p/HH6H7htWJo","http://instagr.am/p/HH6hBRzZZD","http://instagr.am/p/HH6xEExaco","http://instagr.am/p/HH6xcVscEg","http://instagr.am/p/HH70aWB1No","http://instagr.am/p/HH73nUMBMI","http://instagr.am/p/HH74ogvrX5","http://instagr.am/p/HH76mRwZnp","http://instagr.am/p/HH77CPmYE0","http://instagr.am/p/HH78hPNnzQ","http://instagr.am/p/HH7ADox4JO","http://instagr.am/p/HH7KFdOeTE","http://instagr.am/p/HH7KJNGDSG","http://instagr.am/p/HH7KJtpxyA","http://instagr.am/p/HH7KjwpM-J","http://instagr.am/p/HH7Q","http://instagr.am/p/HH7QCiqsOX","http://instagr.am/p/HH7R9er-Oq","http://instagr.am/p/HH7SoqgRYB","http://instagr.am/p/HH7YhZGA75","http://instagr.am/p/HH7aHSJd3D","http://instagr.am/p/HH7bPrMLTB","http://instagr.am/p/HH7bQUnKyn","http://instagr.am/p/HH7c2yADVv","http://instagr.am/p/HH7cEXSCTC","http://instagr.am/p/HH7dxAlxr4","http://instagr.am/p/HH7eJTwO8K","http://instagr.am/p/HH7efCKQ-0","http://instagr.am/p/HH7fczIMyr","http://instagr.am/p/HH7gVnBjad","http://instagr.am/p/HH7gYljc-0","http://instagr.am/p/HH7gYpMKH7","http://instagr.am/p/HH7hDMo_Za","http://instagr.am/p/HH7hfhighk","http://instagr.am/p/HH7hpVm92Q","http://instagr.am/p/HH7hssHUyN","http://instagr.am/p/HH7iS0on88","http://instagr.am/p/HH7j6It5zy","http://instagr.am/p/HH7j75jipU","http://instagr.am/p/HH7j76pkjl","http://instagr.am/p/HH7jMlQLaG","http://instagr.am/p/HH7kHaPbBn","http://instagr.am/p/HH7kKZopDN","http://instagr.am/p/HH7lOFkkOV","http://instagr.am/p/HH7lQtstCP","http://instagr.am/p/HH7l_Aurfu","http://instagr.am/p/HH7m0JDpOC","http://instagr.am/p/HH7m2POzdu","http://instagr.am/p/HH7mHPL0cI","http://instagr.am/p/HH7mL2BdJL","http://instagr.am/p/HH7mN3snzl","http://instagr.am/p/HH7mXJEJIt","http://instagr.am/p/HH7mZAKfdo","http://instagr.am/p/HH7mbxmSnp","http://instagr.am/p/HH7mkHIRM2","http://instagr.am/p/HH7ml5CcLM","http://instagr.am/p/HH7mnxEAJ8","http://instagr.am/p/HH7mqFk38v","http://instagr.am/p/HH7mqtAaOP","http://instagr.am/p/HH7mytlLQm","http://instagr.am/p/HH7n29K0Q1","http://instagr.am/p/HH7naXyW_g","http://instagr.am/p/HH7ncNPJOX","http://instagr.am/p/HH7ndmC0DH",
"http://instagr.am/p/HH7nifiLCI","http://instagr.am/p/HH7rWttci5","http://instagr.am/p/HH8--LwWs_","http://instagr.am/p/HH8-0DkaPE","http://instagr.am/p/HH8-2CLQEV","http://instagr.am/p/HH8-4gSIJo","http://instagr.am/p/HH8-4liH8g","http://instagr.am/p/HH8-5TCi2b","http://instagr.am/p/HH8-6AKI4j","http://instagr.am/p/HH8-8MtC6l","http://instagr.am/p/HH8-A-gpce","http://instagr.am/p/HH8-A-pXLv","http://instagr.am/p/HH8-BEFQb6","http://instagr.am/p/HH8-C9IxAs","http://instagr.am/p/HH8-CMRIT9","http://instagr.am/p/HH8-DMiDM3","http://instagr.am/p/HH8-Dwg_5V","http://instagr.am/p/HH8-DyHmmX","http://instagr.am/p/HH8-IEnIBo","http://instagr.am/p/HH8-KBCg0f","http://instagr.am/p/HH8-Kbm9Jb","http://instagr.am/p/HH8-LHryjV","http://instagr.am/p/HH8-LIKIXR","http://instagr.am/p/HH8-MdpM-m","http://instagr.am/p/HH8-N9pzfv","http://instagr.am/p/HH8-NbqDLG","http://instagr.am/p/HH8-NwoEwm","http://instagr.am/p/HH8-ODsfzo","http://instagr.am/p/HH8-OHE0p8","http://instagr.am/p/HH8-QFmasl","http://instagr.am/p/HH8-QaA7Rb","http://instagr.am/p/HH8-R-poCB","http://instagr.am/p/HH8-S5PDIy","http://instagr.am/p/HH8-SqHrOY","http://instagr.am/p/HH8-SzPREN","http://instagr.am/p/HH8-U1r5VK","http://instagr.am/p/HH8-UjEeXv","http://instagr.am/p/HH8-VaRadH","http://instagr.am/p/HH8-WFIPij","http://instagr.am/p/HH8-WHRwHP","http://instagr.am/p/HH8-X-SkFA","http://instagr.am/p/HH8-a5icLX","http://instagr.am/p/HH8-aSRpdn","http://instagr.am/p/HH8-aTm5g8","http://instagr.am/p/HH8-aatV6Q","http://instagr.am/p/HH8-azAmc5","http://instagr.am/p/HH8-bcLP_v","http://instagr.am/p/HH8-dGrMku","http://instagr.am/p/HH8-dKABGr","http://instagr.am/p/HH8-eFTTJ8","http://instagr.am/p/HH8-eLRwvK","http://instagr.am/p/HH8-ehmwGz","http://instagr.am/p/HH8-h-D72a","http://instagr.am/p/HH8-hhmEOT","http://instagr.am/p/HH8-ibSZTj","http://instagr.am/p/HH8-jospUb","http://instagr.am/p/HH8-kMpc2F","http://instagr.am/p/HH8-kNBmGm","http://instagr.am/p/HH8-lArilF","http://instagr.am/p/HH8-lWTDwj","http://instagr.am/p/HH8-mNnqZL","http://instagr.am/p/HH8-n4sGGS","http://instagr.am/p/HH8-n9xHbn","http://instagr.am/p/HH8-pYx3JZ","http://instagr.am/p/HH8-pppok3","http://instagr.am/p/HH8-qoy3LK","http://instagr.am/p/HH8-qvROzb","http://instagr.am/p/HH8-qytoRH","http://instagr.am/p/HH8-rOyW_y","http://instagr.am/p/HH8-s9KXi6","http://instagr.am/p/HH8-sVyS7K","http://instagr.am/p/HH8-sbnQEO","http://instagr.am/p/HH8-txJV-e","http://instagr.am/p/HH8-u0Mewa","http://instagr.am/p/HH8-u1BFJ-","http://instagr.am/p/HH8-uXBu_r","http://instagr.am/p/HH8-ujO2m1","http://instagr.am/p/HH8-v7pm7L","http://instagr.am/p/HH8-vBRADm","http://instagr.am/p/HH8-vkwQNF","http://instagr.am/p/HH8-x5R6u2","http://instagr.am/p/HH8-xArCJB","http://instagr.am/p/HH8-xOxnVQ","http://instagr.am/p/HH8-xrmqCf","http://instagr.am/p/HH8-y4Li29","http://instagr.am/p/HH8-yamwjM","http://instagr.am/p/HH802xDyEm","http://instagr.am/p/HH804Gw-Fe","http://instagr.am/p/HH804hAMqQ","http://instagr.am/p/HH805wBvVI","http://instagr.am/p/HH806SguSx","http://instagr.am/p/HH806rEtcY","http://instagr.am/p/HH809ClkbW","http://instagr.am/p/HH809kPN-5","http://instagr.am/p/HH80Cxst8p","http://instagr.am/p/HH80E3Ibo0","http://instagr.am/p/HH80ELOZpk","http://instagr.am/p/HH80EVFFIz","http://instagr.am/p/HH80FngJs0","http://instagr.am/p/HH80M0kiBG","http://instagr.am/p/HH80cKKQ_E","http://instagr.am/p/HH80gaBUzQ","http://instagr.am/p/HH80lSDT71","http://instagr.am/p/HH80mYOHwX","http://instagr.am/p/HH80nfAYsL","http://instagr.am/p/HH80pUNIO2","http://instagr.am/p/HH80sxRLtt","http://instagr.am/p/HH8
0vbDjj0","http://instagr.am/p/HH80w7xI-m","http://instagr.am/p/HH80wDHTN4","http://instagr.am/p/HH81-5RjEB","http://instagr.am/p/HH811fo-_e","http://instagr.am/p/HH813tkiVZ","http://instagr.am/p/HH813vkGMo","http://instagr.am/p/HH814RDHuG","http://instagr.am/p/HH814TOYiW","http://instagr.am/p/HH8179vxAg","http://instagr.am/p/HH81AwC6db","http://instagr.am/p/HH81BGyWUr","http://instagr.am/p/HH81FoFjxm","http://instagr.am/p/HH81H-IH_i","http://instagr.am/p/HH81MnoSaI","http://instagr.am/p/HH81MtN3bH","http://instagr.am/p/HH81O1Cfe7","http://instagr.am/p/HH81RprFKO","http://instagr.am/p/HH81Z2pq3V","http://instagr.am/p/HH81aCPRem","http://instagr.am/p/HH81aVTWZm","http://instagr.am/p/HH81bBo8cM","http://instagr.am/p/HH81k2xVJ4","http://instagr.am/p/HH81kERlbh","http://instagr.am/p/HH81vqHC0M","http://instagr.am/p/HH81vqyti3","http://instagr.am/p/HH81wbS-cj","http://instagr.am/p/HH81xfEjvZ","http://instagr.am/p/HH81zsrbsz","http://instagr.am/p/HH823tDEIP","http://instagr.am/p/HH823ytt2P","http://instagr.am/p/HH825MgnYc","http://instagr.am/p/HH827QrTPF","http://instagr.am/p/HH82AWzhzS","http://instagr.am/p/HH82EGE05q","http://instagr.am/p/HH82FDu8Mf","http://instagr.am/p/HH82HTmdze","http://instagr.am/p/HH82L-iG-U","http://instagr.am/p/HH82NpFsn7","http://instagr.am/p/HH82YTOqEF","http://instagr.am/p/HH82bpEdvj","http://instagr.am/p/HH82cShmmV","http://instagr.am/p/HH82czP-SU","http://instagr.am/p/HH82h9LhYy","http://instagr.am/p/HH82iizf4G","http://instagr.am/p/HH82jUw184","http://instagr.am/p/HH82mrnPeW","http://instagr.am/p/HH82t9u8Mg","http://instagr.am/p/HH82tPH1El","http://instagr.am/p/HH82wzhczs","http://instagr.am/p/HH82zzjj7W","http://instagr.am/p/HH83-3oaAb","http://instagr.am/p/HH83-AlcOq","http://instagr.am/p/HH8302rtlY","http://instagr.am/p/HH833ty-ck","http://instagr.am/p/HH834lswSl","http://instagr.am/p/HH835DFp5j","http://instagr.am/p/HH835FKCBP","http://instagr.am/p/HH835UmKXt","http://instagr.am/p/HH835qnQot","http://instagr.am/p/HH8383zIXz","http://instagr.am/p/HH8384ROzS","http://instagr.am/p/HH83AMP4a0","http://instagr.am/p/HH83B5B1Nt","http://instagr.am/p/HH83CqkA0O","http://instagr.am/p/HH83DpMRPq","http://instagr.am/p/HH83EjPNA_","http://instagr.am/p/HH83Frqolx","http://instagr.am/p/HH83KmM8EC","http://instagr.am/p/HH83RJuxBF","http://instagr.am/p/HH83WCuGEA","http://instagr.am/p/HH83XtGGIV","http://instagr.am/p/HH83ZKNcTS","http://instagr.am/p/HH83aNohKe","http://instagr.am/p/HH83bCudp9","http://instagr.am/p/HH83f0vFsx","http://instagr.am/p/HH83gsmWCm","http://instagr.am/p/HH83gyJWp5","http://instagr.am/p/HH83k0h0C3","http://instagr.am/p/HH83nDlyBo","http://instagr.am/p/HH83nSlA26","http://instagr.am/p/HH83nfnS7m","http://instagr.am/p/HH83puJ0UJ","http://instagr.am/p/HH83qGPaXH","http://instagr.am/p/HH83r9D_FK","http://instagr.am/p/HH83uAFKtr","http://instagr.am/p/HH83uJxZeV","http://instagr.am/p/HH83vcTWsX","http://instagr.am/p/HH83xtmDSU","http://instagr.am/p/HH841GGzT3","http://instagr.am/p/HH841UMarm","http://instagr.am/p/HH841VgcD4","http://instagr.am/p/HH8429HDTT","http://instagr.am/p/HH842SMBUn","http://instagr.am/p/HH842cRA6V","http://instagr.am/p/HH842nNboH","http://instagr.am/p/HH844ISVI_","http://instagr.am/p/HH844QPBbt","http://instagr.am/p/HH8460RADl","http://instagr.am/p/HH846VkDLB","http://instagr.am/p/HH846jSV9B","http://instagr.am/p/HH847YpeiM","http://instagr.am/p/HH848JoFPh","http://instagr.am/p/HH849dRQnD","http://instagr.am/p/HH84EBB-rW","http://instagr.am/p/HH84GXHQEN","http://instagr.am/p/HH84IOO6Hd","http://instagr.am/p/HH84K7vdZp","http://instagr
.am/p/HH84O1vefu","http://instagr.am/p/HH84O2hj7y","http://instagr.am/p/HH84OALIqP","http://instagr.am/p/HH84PVk-tn","http://instagr.am/p/HH84RquusO","http://instagr.am/p/HH84TnhJKv","http://instagr.am/p/HH84WQH1En","http://instagr.am/p/HH84XPiGqI","http://instagr.am/p/HH84YLH5ty","http://instagr.am/p/HH84YpLGfC","http://instagr.am/p/HH84Ywvdk6","http://instagr.am/p/HH84ZdzhTA","http://instagr.am/p/HH84afzC-V","http://instagr.am/p/HH84ctJ5s1","http://instagr.am/p/HH84dTHX9F","http://instagr.am/p/HH84fXPKi5","http://instagr.am/p/HH84fhto0L","http://instagr.am/p/HH84geJyhL","http://instagr.am/p/HH84hUpz82","http://instagr.am/p/HH84iYKYQp","http://instagr.am/p/HH84kFDSyv","http://instagr.am/p/HH84nNH_1J","http://instagr.am/p/HH84o1D3Um","http://instagr.am/p/HH84ohtzcL","http://instagr.am/p/HH84pNDJcd","http://instagr.am/p/HH84pOH6TN","http://instagr.am/p/HH84pXMYZd","http://instagr.am/p/HH84qkJ0i3","http://instagr.am/p/HH84sTvixj","http://instagr.am/p/HH84tan8wH","http://instagr.am/p/HH84w1gm7Z","http://instagr.am/p/HH84yNv-z-","http://instagr.am/p/HH84zAoMEl","http://instagr.am/p/HH85-0RTj8","http://instagr.am/p/HH850YgA3T","http://instagr.am/p/HH850pPNBB","http://instagr.am/p/HH850tOWXm","http://instagr.am/p/HH851nnMar","http://instagr.am/p/HH851yhV8o","http://instagr.am/p/HH852bqPAx","http://instagr.am/p/HH852nDatV","http://instagr.am/p/HH852pxXn5","http://instagr.am/p/HH853TsOYx","http://instagr.am/p/HH854_ob--","http://instagr.am/p/HH854kL_yC","http://instagr.am/p/HH8563jp99","http://instagr.am/p/HH856HhpBi","http://instagr.am/p/HH857CEjxZ","http://instagr.am/p/HH857URkql","http://instagr.am/p/HH857UqVCN","http://instagr.am/p/HH8580SWLd","http://instagr.am/p/HH858wITqb","http://instagr.am/p/HH85AXKxP5","http://instagr.am/p/HH85CIL_yB","http://instagr.am/p/HH85CKCp4U","http://instagr.am/p/HH85DLn-09","http://instagr.am/p/HH85Dnljqy","http://instagr.am/p/HH85E0Jcj3","http://instagr.am/p/HH85EKR9fm","http://instagr.am/p/HH85EgBaHm","http://instagr.am/p/HH85ElD4b_","http://instagr.am/p/HH85HBm9f4","http://instagr.am/p/HH85HFrCl3","http://instagr.am/p/HH85JYunBd","http://instagr.am/p/HH85LVoMhr","http://instagr.am/p/HH85LWCbeC","http://instagr.am/p/HH85MKFbQt","http://instagr.am/p/HH85NJv80J","http://instagr.am/p/HH85NUvTvk","http://instagr.am/p/HH85NyufqK","http://instagr.am/p/HH85PZOR6d","http://instagr.am/p/HH85Q2M2uh","http://instagr.am/p/HH85T2Ofcs","http://instagr.am/p/HH85VUKVTZ","http://instagr.am/p/HH85VVKoly","http://instagr.am/p/HH85VdK6R1","http://instagr.am/p/HH85Vfmn0-","http://instagr.am/p/HH85VxIOrP","http://instagr.am/p/HH85WoR6Ls","http://instagr.am/p/HH85Ztrf-m","http://instagr.am/p/HH85aLrxjq","http://instagr.am/p/HH85bOR6u0","http://instagr.am/p/HH85cZLXr6","http://instagr.am/p/HH85ckD-JY","http://instagr.am/p/HH85d6JlSW","http://instagr.am/p/HH85dUwcKY","http://instagr.am/p/HH85fUuT6W","http://instagr.am/p/HH85fiAaOe","http://instagr.am/p/HH85gMJBEP","http://instagr.am/p/HH85gVFvEt","http://instagr.am/p/HH85hIveqD","http://instagr.am/p/HH85hZAKiO","http://instagr.am/p/HH85i8CyMs","http://instagr.am/p/HH85jQhUo7","http://instagr.am/p/HH85kBSD2v","http://instagr.am/p/HH85lsFy6K","http://instagr.am/p/HH85mLnJky","http://instagr.am/p/HH85oVROzU","http://instagr.am/p/HH85p9ylU4","http://instagr.am/p/HH85qENCWv","http://instagr.am/p/HH85qhNMcj","http://instagr.am/p/HH85s3N7vl","http://instagr.am/p/HH85sJSPSe","http://instagr.am/p/HH85tStObP","http://instagr.am/p/HH85txnKmv","http://instagr.am/p/HH85uAvobc","http://instagr.am/p/HH85x8rlE0","http://instagr.am/p/HH85xZuNO8","http:
//instagr.am/p/HH85yYQwXj","http://instagr.am/p/HH85z3wB35","http://instagr.am/p/HH86-kGL7D","http://instagr.am/p/HH860TiUwo","http://instagr.am/p/HH861PIkO2","http://instagr.am/p/HH862smvKU","http://instagr.am/p/HH863NxVJ5","http://instagr.am/p/HH863Zw3Z2","http://instagr.am/p/HH865_H-sk","http://instagr.am/p/HH865nxSRn","http://instagr.am/p/HH868cli0X","http://instagr.am/p/HH86AKRtq-","http://instagr.am/p/HH86CjRQTU","http://instagr.am/p/HH86D8sx0Q","http://instagr.am/p/HH86DqzgPd","http://instagr.am/p/HH86EkSO83","http://instagr.am/p/HH86FzO4D_","http://instagr.am/p/HH86G8MkDx","http://instagr.am/p/HH86GGwXA0","http://instagr.am/p/HH86Hwq8Nu","http://instagr.am/p/HH86HzG4HB","http://instagr.am/p/HH86HzJtHN","http://instagr.am/p/HH86IMLZYc","http://instagr.am/p/HH86IXyAMV","http://instagr.am/p/HH86IwQ8Qk","http://instagr.am/p/HH86KXQuR6","http://instagr.am/p/HH86LTkJJE","http://instagr.am/p/HH86NrLH5g","http://instagr.am/p/HH86SWRNOm","http://instagr.am/p/HH86TJI64M","http://instagr.am/p/HH86TtpBEQ","http://instagr.am/p/HH86UYKOhh","http://instagr.am/p/HH86VGlnpW","http://instagr.am/p/HH86ZfpEEw","http://instagr.am/p/HH86_NPrK2","http://instagr.am/p/HH86afDy4x","http://instagr.am/p/HH86afyFIQ","http://instagr.am/p/HH86bGLfNe","http://instagr.am/p/HH86btlKts","http://instagr.am/p/HH86c9u8Wq","http://instagr.am/p/HH86cSR3_L","http://instagr.am/p/HH86fEHdLt","http://instagr.am/p/HH86g5m-xm","http://instagr.am/p/HH86gKCqXT","http://instagr.am/p/HH86gQGfPG","http://instagr.am/p/HH86hlSkE9","http://instagr.am/p/HH86hzhSep","http://instagr.am/p/HH86inFZGd","http://instagr.am/p/HH86ipACWp","http://instagr.am/p/HH86jXw0gP","http://instagr.am/p/HH86l3lWdw","http://instagr.am/p/HH86lBIcoK","http://instagr.am/p/HH86lbyhFP","http://instagr.am/p/HH86maiG-V","http://instagr.am/p/HH86mgqMW3","http://instagr.am/p/HH86mqFBiB","http://instagr.am/p/HH86mwm5g5","http://instagr.am/p/HH86oeJWA2","http://instagr.am/p/HH86ofFMPU","http://instagr.am/p/HH86ohOsAZ","http://instagr.am/p/HH86olHu0A","http://instagr.am/p/HH86otvR1t","http://instagr.am/p/HH86pHTMXp","http://instagr.am/p/HH86qENjaU","http://instagr.am/p/HH86rtCd4L","http://instagr.am/p/HH86s0N3Ay","http://instagr.am/p/HH86sDPkW6","http://instagr.am/p/HH86tLMj5x"]
class Crawler
  include Celluloid

  def fetch(id)
    uri = URI("http://api.instagram.com/oembed?url=#{id}")
    req = open(uri).read
  end
end
URLS.each_slice(50).map do |idset|
  pool = Crawler.pool(size: 50)
  crawlers = idset.to_a.map do |id|
    begin
      pool.future(:fetch, id)
    rescue
      nil
    end
  end
  crawlers.compact.each do |resp|
    puts resp.value.size rescue nil
  end
end
Split the class. The Celluloid wiki says to never create a pool of a worker inside that worker:
https://github.com/celluloid/celluloid/wiki/Pools
Gotcha: Don't make pools inside workers!
Using MyWorker.pool within MyWorker will result in an unbounded
explosion of worker threads.
Update
If you want to limit your pool, just create it outside the each_slice block, so you always reuse the same threads, I guess.
pool = Crawler.pool(size: 50)

URLS.each_slice(50).map do |idset|
  crawlers = idset.to_a.map do |id|
    begin
      pool.future(:fetch, id)
    rescue
      nil
    end
  end
  # ...
Each iteration through the slice of 50 you're resetting the value of pool, which likely is dereferencing your poolmanager. Since actors aren't garbage collected just by being dereferenced (you have to call #terminate) you're probably piling up your old pools. It should be ok to just make one pool, and create all your futures at once (if you keep the return value small the future object itself is small). If you do find that you have to slice, instantiate your pool outside the each_slice and it will continue to use the same pool without making a new one each time around. If for some other reason you want to get a new pool each time, call terminate on the pool before you dereference it. Also be sure you're working with celluloid 0.12.0+ as it fixes an issue where pool workers weren't being terminated when the pool was.
When I iterate around actors, I've found this bit of logging to be useful to be sure I don't have any actor leaks:
logger.info "Actors left: #{Celluloid::Actor.all.to_set.length} Alive: #{Celluloid::Actor.all.to_set.reject { |a| a.nil? || !a.alive? }.length}"

Why Sinatra request takes EM thread?

A Sinatra app receives requests for long-running tasks and EM.defers them, launching them in EM's internal pool of 20 threads. When more than 20 EM.defer operations are running, the extra ones are stored in EM's threadqueue by EM.defer.
However, it seems Sinatra won't service any requests until there is an EM thread available to handle them. My question is: isn't Sinatra supposed to use the reactor of the main thread to service all requests? Why am I seeing an add on the threadqueue when I make a new request?
Steps to reproduce:
Access /track/
Launch 30 /sleep/ reqs to fill the threadqueue
Access /ping/ and notice the add in the threadqueue as well as the delay
Code to reproduce it:
require 'sinatra'

# monkeypatch EM so we can access threadpools
module EventMachine
  def self.queuedDefers
    @threadqueue == nil ? 0 : @threadqueue.size
  end

  def self.availThreads
    @threadqueue == nil ? 0 : @threadqueue.num_waiting
  end

  def self.busyThreads
    @threadqueue == nil ? 0 : @threadpool_size - @threadqueue.num_waiting
  end
end

get '/track/?' do
  EM.add_periodic_timer(1) do
    p "Busy: " + EventMachine.busyThreads.to_s + "/" + EventMachine.threadpool_size.to_s +
      ", Available: " + EventMachine.availThreads.to_s + "/" + EventMachine.threadpool_size.to_s +
      ", Queued: " + EventMachine.queuedDefers.to_s
  end
end

get '/sleep/?' do
  EM.defer(Proc.new { sleep 20 }, Proc.new { body "DONE" })
end

get '/ping/?' do
  body "pong"
end
I tried the same thing on Rack/Thin (no Sinatra) and it works as it's supposed to, so I guess Sinatra is causing it.
Ruby version: 1.9.3.p125
EventMachine: 1.0.0.beta.4.1
Sinatra: 1.3.2
OS: Windows
OK, so it seems Sinatra starts Thin in threaded mode by default, causing the above behavior.
You can add
set :threaded, false
in your Sinatra configure section; this will prevent the reactor from deferring requests onto a separate thread, and from blocking under load.
Source1
Source2
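A minimal sketch of where that setting lives, assuming the app from the question:

require 'sinatra'

configure do
  # Serve requests on the reactor thread instead of letting Thin
  # defer each request onto EM's thread pool.
  set :threaded, false
end

get '/ping/?' do
  body "pong"
end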
Unless I'm misunderstanding something about your question, this is pretty much how EventMachine works. If you check out the docs for EM.defer, they state:
Don't write a deferred operation that will block forever. If so, the
current implementation will not detect the problem, and the thread
will never be returned to the pool. EventMachine limits the number of
threads in its pool, so if you do this enough times, your subsequent
deferred operations won't get a chance to run.
Basically, there's a finite number of threads, and if you use them up, any pending operations will block until a thread is available.
It might be possible to bump threadpool_size if you just need more threads, although ultimately that's not a long-term solution.
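If you do go that route, the pool size is a plain attribute on EventMachine; set it before the first EM.defer runs (100 here is an arbitrary example):

EM.threadpool_size = 100  # default is 20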
Is Sinatra multi threaded? is a really good question here on SO about Sinatra and threads. In short, Sinatra is awesome but if you need decent threading you might need to look elsewhere.

Improving ruby code performance with threading not working

I am trying to optimize this piece of ruby code with Thread, which involves a fair bit of IO and network activity. Unfortunately that is not going down too well.
# each host is somewhere in the local network
hosts.each { |host|
  # reach out to every host with something for it to do
  # wait for host to complete the work and get back
}
My original plan was to wrap the internal of the loop into a new Thread for each iteration. Something like:
# each host is somewhere in the local network
hosts.each { |host|
  Thread.new {
    # reach out to every host with something for it to do
    # wait for host to complete the work and get back
  }
}
# join all threads here before main ends
I was hoping that since this is I/O bound, I should be able to gain something even without Ruby 1.9, but nope, nothing. Any ideas how this might be improved?
I'm not sure how many hosts you have, but if it's a lot you may want to try a producer/consumer model with a fixed number of threads:
require 'thread'

THREADS_COUNT = (ARGV[0] || 4).to_i

$q = Queue.new
hosts.each { |h| $q << h }

threads = (1..THREADS_COUNT).map {
  Thread.new {
    begin
      loop {
        host = $q.shift(true)
        # do something with host
      }
    rescue ThreadError
      # queue is empty, pass
    end
  }
}
threads.each(&:join)
If this is all too complicated, you may try using xargs -P. ;)
@Fanatic23, before you draw any conclusions, instrument your code with puts and see whether the network requests are actually overlapping. In each call to puts, print a status string indicating the line that is executing, along with Time.now and Time.now.nsec.
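A rough sketch of that instrumentation applied to the threaded loop from the question (the label text is illustrative):

hosts.each { |host|
  Thread.new {
    t = Time.now
    puts "#{t} #{t.nsec} -> starting request to #{host}"
    # reach out to the host and wait for it to complete the work
    t = Time.now
    puts "#{t} #{t.nsec} <- finished request to #{host}"
  }
}

If the requests really do overlap, the "starting" lines should cluster together rather than strictly alternating with "finished" lines.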
You say "since this [is] I/O bound even without ruby 1.9 I should be able to gain something".
Vain hope. :-)
When a thread in Ruby 1.8 blocks on IO, the entire process blocks on IO. This is because it uses green threads.
Upgrade to Ruby 1.9, and you'll have access to your platform's native threads implementation. For more, see:
http://en.wikipedia.org/wiki/Green_threads
Enjoy!
