I have an application that makes several slow HTTP calls on certain inbound API requests, and I'd like those to run in parallel because there are several of them and each is slow.
For a thread pool, I've previously used http://burgestrand.se/articles/quick-and-simple-ruby-thread-pool.html.
Are there any architecturally sound solutions for running this in parallel, with or without a thread pool?
Edit
My apologies, I was watching a movie while typing this up and wrote "serial" in the places where I have italicized "parallel". Thanks to #Catnapper for the catch. How embarrassing.
For good leads try Sidekiq:
http://mperham.github.com/sidekiq/
And Celluloid:
http://www.unlimitednovelty.com/2011/05/introducing-celluloid-concurrent-object.html
Related
I'm trying to interact with Google's Calendar API. My tests so far show response times of 5-10 seconds to insert a single event, and I may need to export thousands of events at once [don't ask]. This seems likely to spam the heck out of my queues for unreasonable amounts of time. (95% of current jobs in this app finish in <300ms, so this will make it harder to allocate resources appropriately.)
I'm currently using Faraday in this app to call other, faster Google APIs. The Faraday wiki suggests using Typhoeus for parallel HTTP requests; however, using Typhoeus with Sidekiq was deemed "a bad idea" as of 2014.
Is Typhoeus still a bad idea? If so, is it reasonable to spawn N threads in a Sidekiq worker, make an HTTP request within each thread, and then wait for all threads to rejoin? Is there some other way to accomplish this extremely I/O-bound task without throwing more workers at the problem? Should I ask my manager to increase our Sidekiq Enterprise spend? ;) Or should I just throw these jobs in a low-priority queue and tell our users with ridiculous habits that they'll just have to wait?
It's reasonable to use threads within Sidekiq job threads. It's not reasonable to build your own threading infrastructure. You can use a reusable thread pool with the concurrent-ruby or parallel gems, you can use an http client which is thread-safe and allows concurrent requests, etc. HTTP.rb is a good one from Tony Arcieri but plain old net/http will work too:
https://github.com/httprb/http/wiki/Thread-Safety
Just remember that there are a few complexities: the job might be retried, and you need to decide how to handle errors that the HTTP client raises. If you don't split these requests 1-to-1 with jobs, you may need to track each request individually, or idempotency becomes an issue.
And you are always welcome to increase your Sidekiq Enterprise thread count. :-D
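A minimal sketch of the "threads inside a job" approach described above, using only stdlib threads and net/http. The `fetch_in_threads` helper and the batching constant are illustrative, not a Sidekiq API; the fetcher lambda is injectable so the HTTP call can be stubbed out:

```ruby
require "net/http"
require "uri"

# Spawn a bounded batch of threads per job, run one slow call per thread,
# and join the whole batch before moving on. Results are collected under a
# mutex because the hash is shared across threads.
DEFAULT_FETCHER = ->(url) { Net::HTTP.get_response(URI(url)).code }

def fetch_in_threads(urls, max_threads: 10, fetcher: DEFAULT_FETCHER)
  results = {}
  mutex = Mutex.new

  urls.each_slice(max_threads) do |batch|
    batch.map { |url|
      Thread.new do
        value = begin
          fetcher.call(url)
        rescue StandardError => e
          e # capture the error; re-raising would fail (and retry) the job
        end
        mutex.synchronize { results[url] = value }
      end
    }.each(&:join) # wait for the whole batch before starting the next
  end

  results
end
```

Keeping `max_threads` modest matters here: every concurrent Sidekiq job multiplies it, so 25 job threads times 10 request threads is already 250 outbound connections.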
I understand that multiple processes (worker processes) can be used to offload the web processes of a Heroku app. Before my app gets a lot of traffic, would it make sense to keep the potentially blocking tasks on separate threads and call them asynchronously, instead of using multiple processes?
I see no reason why this would be a problematic approach in the beginning, but I was wondering whether there are reasons I haven't thought of that would make it a bad way to start?
Thank you
I'm working on a Ruby script that will be making hundreds of network requests (via open-uri) to various APIs and I'd like to do this in parallel since each request is slow, and blocking.
I have been looking at using Thread or Process to achieve this but I'm not sure which method to use.
With regard to network requests, when should I use a Thread over a Process, or does it not matter?
Before going into detail, there is already a library solving your problem. Typhoeus is optimized to run a large number of HTTP requests in parallel and is based on the libcurl library.
Like a modern code version of the mythical beast with 100 serpent heads, Typhoeus runs HTTP requests in parallel while cleanly encapsulating handling logic.
Threads run in the same process as your application. Since Ruby 1.9, native threads are used as the underlying implementation. Resources can be shared easily across threads, as they all have access to the shared state of the application. The problem, however, is that you cannot utilize the multiple cores of your CPU with most Ruby implementations.
Ruby uses the Global Interpreter Lock (GIL). The GIL is a locking mechanism that ensures shared state is not corrupted by concurrent modifications from different threads, at the cost of allowing only one thread to execute Ruby code at a time. Other Ruby implementations such as JRuby, Rubinius, or MacRuby offer approaches without a GIL.
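Crucially for network-bound work, CRuby releases the GIL while a thread is blocked on I/O, so threads still overlap waiting time even though only one thread runs Ruby code at once. A small illustration, with sleep standing in for a slow HTTP call:

```ruby
# Ten 0.2-second blocking waits overlap across threads instead of
# serializing, because the GIL is released while a thread sleeps or
# waits on I/O.
start = Time.now
threads = 10.times.map do
  Thread.new { sleep 0.2 } # stand-in for a slow, blocking HTTP call
end
threads.each(&:join)
elapsed = Time.now - start
# elapsed ends up near 0.2s, not the 2s that serial execution would take
```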
Processes run separately from each other. Processes do not share resources, which means every process has its own state. This can be a problem if you want to share data across your requests. A process also allocates its own stack of memory. You could still share data by using a messaging bus like RabbitMQ.
I cannot recommend using either only threads or only processes. If you want to implement this yourself, you should use both: fork a new process for every n requests, and have each process spawn a number of threads to issue the HTTP requests. Why?
If you fork a separate process for every HTTP request, you end up with too many processes. Although your operating system might be able to handle this, the overhead is still tremendous. Some HTTP requests finish very fast, so why bother with an extra process? Just run them in another thread.
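The hybrid scheme above can be sketched with stdlib only: fork one child process per chunk of URLs, spawn one thread per request inside each child, and ship the results back over a pipe. This is POSIX-only (it relies on `Process.fork`), and the injectable `fetcher` lambda stands in for a real HTTP call:

```ruby
# Fork a worker process per chunk of `per_process` URLs; each child runs
# one thread per request and marshals its results back to the parent
# through a pipe.
def hybrid_fetch(urls, per_process: 10, fetcher:)
  readers = urls.each_slice(per_process).map do |chunk|
    reader, writer = IO.pipe
    fork do
      reader.close
      # One thread per request inside this child; Thread#value joins the
      # thread and returns its block's result.
      results = chunk.map { |url| Thread.new { [url, fetcher.call(url)] } }
                     .map(&:value)
      writer.write(Marshal.dump(results.to_h))
      writer.close
    end
    writer.close # parent keeps only the read end
    reader
  end

  # Reading to EOF works because each child closes its write end on exit.
  readers.flat_map { |r| Marshal.load(r.read).to_a }.to_h
         .tap { Process.waitall } # reap the child processes
end
```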
As I understand it, Ruby 1.9 uses OS threads, but still only one thread actually runs at a time (though one thread may be doing blocking IO while another is processing). The threading examples I've seen just use Thread.new to launch a new thread. Coming from a Java background, I typically use thread pools so as not to launch too many new threads, since they are "heavyweight."
Is there a thread pool construct built into Ruby? I didn't see one in the default language libraries. Or is there a standard gem that is typically used? Since OS-level threading is a newer feature of Ruby, I don't know how mature the libraries for it are.
You are correct that the default C Ruby interpreter only executes one thread at a time (other C-based dynamic languages such as Python have similar restrictions). Because of this restriction, threading is not all that common in Ruby, and as a result there is no default thread pool library. If there are tasks to be done in parallel, people typically use processes, since processes can scale over multiple servers.
If you do need to use threads, I would recommend you use https://github.com/meh/ruby-threadpool on the JRuby platform, which is a Ruby interpreter running on the JVM. That should be right up your alley, and because it is running on the virtual machine it will have true threading.
The accepted answer is correct, but there are many tasks for which threads are fine. After all, there are reasons they exist. Even though only one thread can run at a time, the result can still be considered parallel in many real-life situations.
For example, suppose we have 100 long-running tasks, each taking approximately 10 minutes to complete. Using threads in Ruby, even with all those restrictions, a thread pool running 10 tasks at a time will finish far sooner than the 100 × 10 minutes it would take without threads. Examples include live capturing of file changes and sending a large number of web requests (such as status checks).
You can understand how pooling works by reading https://blog.codeship.com/understanding-fundamental-ruby-abstraction-concurrency/ . In production code, use https://github.com/meh/ruby-thread#pool
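For reference, a minimal reusable pool in the spirit of the ones linked above takes only a few lines of stdlib Ruby: a fixed set of worker threads pulls jobs off a thread-safe Queue until a shutdown sentinel arrives. This is a teaching sketch; in real code, prefer a maintained gem such as concurrent-ruby.

```ruby
# A minimal fixed-size thread pool: workers block on Queue#pop and run
# each scheduled block; a nil "poison pill" per worker ends the loop.
class ThreadPool
  def initialize(size)
    @queue = Queue.new
    @workers = size.times.map do
      Thread.new do
        while (job = @queue.pop) # nil sentinel breaks the loop
          job.call
        end
      end
    end
  end

  def schedule(&block)
    @queue << block
  end

  def shutdown
    @workers.size.times { @queue << nil } # one poison pill per worker
    @workers.each(&:join)
  end
end
```

Usage mirrors the 100-task example: `pool = ThreadPool.new(10)`, schedule each task with `pool.schedule { ... }`, then call `pool.shutdown` to drain the queue and join the workers.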
We are experiencing slow processing of requests under heavy load. When looking at the currently running requests during these bursts I can see many requests to our web-service code.
The number of requests is not that large but they appear to be stuck in a preprocessing state. Below is an example:
We are running an IIS7 app pool in classic mode due to the need to support some legacy code.
Other requests continue to be processed but these stuck requests gradually seem to fill up the available threads leading to slow processing of other pages.
Does anyone have any idea where these requests are getting stuck?
There appears to be no resource issue with the DB, and the requests' state suggests this is all preprocessing.
We have run load tests on the code involved on local machines and cannot replicate the issue.
Another possible factor is we are making use of MVC and UrlRouting.
Many thanks for any help.
Some issues unfortunately only happen on production servers, as load tests can never fully simulate real-world users.
You can try to capture hang dumps when performance is bad, and then analyze them (on your own or open a support case via http://support.microsoft.com to work with Microsoft support).
You may have hit the famous thread pool bottleneck, http://support.microsoft.com/kb/821268. Dump analysis can easily identify the culprit and help locate a solution.
Why not move them into their own AppPool to separate them from the Classic ASP app? You'll then have more options for tuning.