Limiting the number of threads to be executed - ruby

I am trying to limit the number of threads that run simultaneously, or alternatively to split the elements of the array into chunks (slice) and send them separately, but I have no idea where or how to start. The code spawns all the threads at once, and that is causing problems.
My code:
file = File.open('lista.txt').read
file.gsub!(/\r\n?/, "")

arr = []
file.each_line do |log|
  arr << Thread.new {
    puts login = Utils.req("GET", "http://httpbin.org/GET", {
      'Host': 'api.httpbin.org',
      'content-type': 'application/json',
      'accept': '*/*',
    },
    false)
  }
end
arr.each(&:join)

If you want to limit how many threads are running concurrently, use a thread pool. This creates a "pool" of worker threads and limits the number of threads running at the same time. You can write your own, or use a gem like celluloid.
class Whatever
  include Celluloid

  def login_request
    Utils.req(
      "POST",
      "http://httpbin.org/post",
      {
        'Host': 'api.httpbin.org',
        'content-type': 'application/json',
        'accept': '*/*',
      },
      false
    )
  end
end

# Only 5 threads will run at a time.
pool = Whatever.pool(size: 5)

file.each_line do |log|
  split = log.split("|")
  id = split[0]
  number = split[1]
  # If there's a free worker in the pool, this will execute immediately.
  # If not, it will wait until there is a free worker.
  # (Use pool.async.login_request if you want the calls dispatched
  # without blocking this loop.)
  pool.login_request
end

You could use a thread pool. Here's a very basic one using Queue:
queue = Queue.new

pool = Array.new(5) do
  Thread.new do
    loop do
      line = queue.pop
      break if line == :stop
      # do something with line, e.g.
      # id, number = line.split('|')
      # Utils.req(...)
    end
  end
end
It creates 5 threads and stores them in an array for later reference. Each thread runs a loop, calling Queue#pop to fetch a line from the queue. If the queue is empty, pop will suspend the thread until data becomes available. So initially, the threads will just sit there waiting for work to come:
queue.num_waiting #=> 5
pool.map(&:status) #=> ["sleep", "sleep", "sleep", "sleep", "sleep"]
Once the thread has retrieved a line, it will process it (to be implemented) and fetch a new one (or fall asleep again if there is none). If the line happens to be the symbol :stop, the thread will break the loop and terminate. (a very pragmatic approach)
To fill the queue, we can use a simple loop in our main thread:
file.each_line { |line| queue << line }
Afterwards, we push 5 :stop symbols to the end of the queue and then wait for the threads to finish:
pool.each { queue << :stop }
pool.each(&:join)
At this point, no threads are waiting for the queue – they all terminated normally:
queue.num_waiting #=> 0
pool.map(&:status) #=> [false, false, false, false, false]
Note that Queue isn't limited to strings or symbols. You can push any Ruby object to it. So instead of pushing the raw line, you could push the pre-processed data:
file.each_line do |line|
  id, number = line.split('|')
  queue << [id, number]
end
This way, your threads don't have to know about the log format. They can just work on id and number:
loop do
  value = queue.pop
  break if value == :stop
  id, number = value
  # do something with id and number
end
Instead of writing your own thread pool, you could of course use one of the various concurrency gems.
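For example, here is a minimal sketch using the concurrent-ruby gem's FixedThreadPool (my own illustration, not part of the original answer; Utils.req and the httpbin URL are carried over from the question):

require 'concurrent'

# at most 5 tasks run at the same time
pool = Concurrent::FixedThreadPool.new(5)

file.each_line do |line|
  pool.post do
    # work posted here is picked up by one of the 5 pool threads
    Utils.req("GET", "http://httpbin.org/get", {}, false)
  end
end

pool.shutdown               # stop accepting new work
pool.wait_for_termination   # block until all queued tasks have finished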

Related

How to asynchronously collect results from new threads created in real time in ruby

I would like to continuously check the table in the DB for commands to run.
Some commands might take 4 minutes to complete, some 10 seconds.
Hence I would like to run them in threads. So every record creates a new thread, and after the thread is created, the record gets removed.
Because the DB lookup + thread creation will run in an endless loop, how do I get the 'response' from the thread (the thread will issue a shell command and get a response code which I would like to read)?
I thought about creating two threads, each with an endless loop:
- the first for DB lookups + creating new threads
- the second for ...somehow reading the threads' results and acting upon each response
Or maybe I should use fork, or spawn a new OS process?
You can have each thread push its results onto a Queue, then your main thread can read from the Queue. Reading from a Queue is a blocking operation by default, so if there are no results, your code will block and wait on the read.
http://ruby-doc.org/stdlib-2.0.0/libdoc/thread/rdoc/Queue.html
Here is an example:
require 'thread'

jobs = Queue.new
results = Queue.new

thread_pool = []
pool_size = 5

(1..pool_size).each do |i|
  thread_pool << Thread.new do
    loop do
      job = jobs.shift # blocks waiting for a task
      break if job == "!NO-MORE-JOBS!"

      # Otherwise, do job...
      puts "#{i}...."
      sleep rand(1..5) # Simulate the time it takes to do a job
      results << "thread#{i} finished #{job}" # Push some result from the job onto the Queue
      # Go back and get another task from the Queue
    end
  end
end

# All threads are now blocking waiting for a job...
puts 'db_stuff'

db_stuff = [
  'job1',
  'job2',
  'job3',
  'job4',
  'job5',
  'job6',
  'job7',
]

db_stuff.each do |job|
  jobs << job
end

# Threads are now attacking the Queue like hungry dogs.

pool_size.times do
  jobs << "!NO-MORE-JOBS!"
end

result_count = 0
loop do
  result = results.shift
  puts "result: #{result}"
  result_count += 1
  break if result_count == 7
end

Is there a better way to make multiple HTTP requests asynchronously in Ruby?

I'm trying to make multiple HTTP requests in Ruby. I know it can be done in NodeJS quite easily. I'm trying to do it in Ruby using threads, but I don't know if that's the best way. I haven't had a successful run for high numbers of requests (e.g. over 50).
require 'json'
require 'net/http'

urls = [
  {"link" => "url1"},
  {"link" => "url2"},
  {"link" => "url3"}
]

urls.each do |thing|
  Thread.new do
    result = Net::HTTP.get(URI.parse(thing["link"]))
    json_stuff = JSON::parse(result)
    info = json_stuff["person"]["bio"]["info"]

    thing["name"] = info
  end
end

# Wait until threads are done.
while !urls.all? { |url| url.has_key? "name" }; end

puts urls
Any thoughts?
Instead of the while clause you used, you can call Thread#join to make the main thread wait for other threads.
threads = []

urls.each do |thing|
  threads << Thread.new do
    result = Net::HTTP.get(URI.parse(thing["link"]))
    json_stuff = JSON::parse(result)
    info = json_stuff["person"]["bio"]["info"]

    thing["name"] = info
  end
end

# Wait until threads are done.
threads.each { |aThread| aThread.join }
Your way might work, but it's going to end up in a busy loop, eating up CPU cycles when it really doesn't need to. A better way is to only check whether you're done when a request completes. One way to accomplish this would be to use a Mutex and a ConditionVariable.
Using a mutex and condition variable, we can have the main thread waiting, and when one of the worker threads receives its response, it can wake up the main thread. The main thread can then see if any URLs remain to be downloaded; if so, it'll just go to sleep again, waiting; otherwise, it's done.
To wait for a signal:
mutex.synchronize { cv.wait mutex }
To wake up the waiting thread:
mutex.synchronize { cv.signal }
You might want to check for done-ness and set thing['name'] inside the mutex.synchronize block to avoid accessing data in multiple threads simultaneously.
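Putting those two snippets together, here is a minimal sketch of the pattern described above (my own illustration, not the original answer's code; the URLs are placeholders and the raw response body is stored instead of the parsed JSON):

require 'net/http'

urls = [{"link" => "http://example.com/1"}, {"link" => "http://example.com/2"}]

mutex = Mutex.new
cv    = ConditionVariable.new

urls.each do |thing|
  Thread.new do
    result = Net::HTTP.get(URI.parse(thing["link"]))
    mutex.synchronize do
      thing["name"] = result   # touch the shared data only inside the lock
      cv.signal                # wake the main thread so it can re-check
    end
  end
end

mutex.synchronize do
  # sleep until a worker signals, re-checking the done-ness condition each time
  cv.wait(mutex) until urls.all? { |u| u.key?("name") }
end

puts urls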

Odd bug with DataMapper, Mutexes, and Threads?

I have a database full of URLs that I need to test HTTP response time for on a regular basis. I want to have many worker threads combing the database at all times for a URL that hasn't been tested recently, and if it finds one, test it.
Of course, this could cause multiple threads to snag the same URL from the database. I don't want this. So, I'm trying to use Mutexes to prevent this from happening. I realize there are other options at the database level (optimistic locking, pessimistic locking), but I'd at least prefer to figure out why this isn't working.
Take a look at this test code I wrote:
threads = []
mutex = Mutex.new

50.times do |i|
  threads << Thread.new do
    while true do
      url = nil

      mutex.synchronize do
        url = URL.first(:locked_for_testing => false, :times_tested.lt => 150)
        if url
          url.locked_for_testing = true
          url.save
        end
      end

      if url
        # simulate testing the url
        sleep 1
        url.times_tested += 1
        url.save
        mutex.synchronize do
          url.locked_for_testing = false
          url.save
        end
      end
    end
    sleep 1
  end
end

threads.each { |t| t.join }
Of course there is no real URL testing here. But what should happen is at the end of the day, each URL should end up with "times_tested" equal to 150, right?
(I'm basically just trying to make sure the mutexes and worker-thread mentality are working)
But each time I run it, a few odd URLs here and there end up with times_tested equal to a much lower number, say, 37, and locked_for_testing frozen on "true"
Now as far as I can tell from my code, if any URL gets locked, it will have to unlock. So I don't understand how some URLs are ending up "frozen" like that.
There are no exceptions and I've tried adding begin/ensure but it didn't do anything.
Any ideas?
I'd use a Queue, and a master thread to pull what you want. If you have a single master, you control what's getting accessed. This isn't perfect, but it's not going to blow up because of concurrency. Remember, if you aren't locking the database, a mutex doesn't really help you if something else accesses the db.
code completely untested
require 'thread'

queue = Queue.new
keep_running = true
# trap cntrl_c or something to reset keep_running

master = Thread.new do
  while keep_running
    # check if we need some work to do
    if queue.size == 0
      urls = URL.all(:times_tested.lt => 150)
      urls.each do |u|
        queue << u.id
      end
    end
    # keep from spinning the queue
    sleep(0.1)
  end
end

workers = []
50.times do
  workers << Thread.new do
    while keep_running
      # get an id
      id = queue.shift
      url = URL.get(id)
      # do something with the url
      url.save
      sleep(0.1)
    end
  end
end

workers.each do |w|
  w.join
end

Thread and Queue

I am interested in knowing what would be the best way to implement a thread-based queue.
For example:
I have 10 actions which I want to execute with only 4 threads. I would like to create a queue with all 10 actions placed linearly and start the first 4 actions with 4 threads; once one of the threads is done executing, the next one will start, and so on. So at any time, the number of running threads is at most 4.
There is a Queue class in the thread standard library. Using that you can do something like this:
require 'thread'

queue = Queue.new
threads = []

# add work to the queue
queue << work_unit

4.times do
  threads << Thread.new do
    # loop until there are no more things to do
    until queue.empty?
      # pop with the non-blocking flag set, this raises
      # an exception if the queue is empty, in which case
      # work_unit will be set to nil
      work_unit = queue.pop(true) rescue nil
      if work_unit
        # do work
      end
    end
    # when there is no more work, the thread will stop
  end
end

# wait until all threads have completed processing
threads.each { |t| t.join }
The reason I pop with the non-blocking flag is that between the until queue.empty? and the pop, another thread may have popped the queue, so unless the non-blocking flag is set we could get stuck at that line forever.
If you're using MRI, the default Ruby interpreter, bear in mind that threads will not be absolutely concurrent. If your work is CPU bound you may just as well run single threaded. If you have some operation that blocks on IO you may get some parallelism, but YMMV. Alternatively, you can use an interpreter that allows full concurrency, such as JRuby or Rubinius.
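As a rough illustration of that point (my own snippet, timings approximate): sleep releases MRI's lock the way blocking IO does, so the waits overlap across threads, whereas pure CPU work would not get this speedup.

require 'benchmark'

elapsed = Benchmark.realtime do
  # four threads that each "wait on IO" for a second
  4.times.map { Thread.new { sleep 1 } }.each(&:join)
end
puts elapsed  # roughly 1 second rather than 4, because the sleeps overlap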
There are a few gems that implement this pattern for you: parallel, peach, and mine, called threach (or jruby_threach under JRuby). It's a drop-in replacement for #each but allows you to specify how many threads to run with, using a SizedQueue underneath to keep things from spiraling out of control.
So...
(1..10).threach(4) {|i| do_my_work(i) }
Not pushing my own stuff; there are plenty of good implementations out there to make things easier.
If you're using JRuby, jruby_threach is a much better implementation -- Java just offers a much richer set of threading primitives and data structures to use.
Executable descriptive example:
require 'thread'

p tasks = [
  {:file => 'task1'},
  {:file => 'task2'},
  {:file => 'task3'},
  {:file => 'task4'},
  {:file => 'task5'}
]

tasks_queue = Queue.new
tasks.each { |task| tasks_queue << task }

# run workers
workers_count = 3
workers = []
workers_count.times do |n|
  workers << Thread.new(n + 1) do |my_n|
    while (task = tasks_queue.shift(true) rescue nil) do
      delay = rand(0)
      sleep delay
      task[:result] = "done by worker ##{my_n} (in #{delay})"
      p task
    end
  end
end

# wait for all threads
workers.each(&:join)

# output results
puts "all done"
p tasks
You could use a thread pool. It's a fairly common pattern for this type of problem.
http://en.wikipedia.org/wiki/Thread_pool_pattern
Github seems to have a few implementations you could try out:
https://github.com/search?type=Everything&language=Ruby&q=thread+pool
Celluloid has a worker pool example that does this.
I use a gem called work_queue. It's really practical.
Example:
require 'work_queue'

wq = WorkQueue.new 4, 10

(1..10).each do |number|
  wq.enqueue_b("Thread#{number}") do |thread_name|
    puts "Hello from the #{thread_name}"
  end
end

wq.join

Ruby threading pass control to main

I am programming an application in Ruby which creates a new thread for every new job. So this is like a queue manager, where I check how many threads can be started from a database. Now when a thread finishes, I want to call the method to start a new job (i.e. a new thread). I do not want to create nested threads, so is there any way to join/terminate/exit the calling thread and pass control over to the main thread? Just to make the situation clear, there can be other threads running at this time.
I tried simply joining the calling thread if it's not the main thread, and I get the following error:
"thread 0x7f8cf8dcf438 tried to join itself"
Any suggestions will be highly appreciated.
Thanks in advance.
I'd propose two solutions:
The first one is effectively to join on a thread, but join has to be called from the main thread (assuming you started all of your worker threads from the main thread):
def thread_proc(s)
  sleep rand(5)
  puts "#{Thread.current.inspect}: #{s}"
end

strings = ["word", "test", "again", "value", "fox", "car"]
threads = []

2.times {
  threads << Thread.new(strings.shift) { |s| thread_proc(s) }
}

while !threads.empty?
  threads.each { |t|
    t.join
    threads << Thread.new(strings.shift) { |s| thread_proc(s) } unless strings.empty?
    threads.delete(t)
  }
end
but that method is kind of inefficient, because creating threads over and over again induces memory and CPU overhead.
A better approach is to synchronize a fixed pool of reused threads using a Queue:
require 'thread'

strings = ["word", "test", "again", "value", "fox", "car"]

q = Queue.new
strings.each { |s| q << s }

threads = []
2.times {
  threads << Thread.new {
    # non-blocking pop: returns nil (instead of blocking forever) if another
    # thread grabbed the last item between our check and our pop
    while (s = (q.pop(true) rescue nil))
      sleep(rand(5))
      puts "#{Thread.current.inspect}: #{s}"
    end
  }
}
threads.each { |t| t.join }
t1 = Thread.new { Thread.current[:status] = "1"; sleep 10; Thread.pass; sleep 100 }
t2 = Thread.new { Thread.current[:status] = "2"; sleep 1000 }
t3 = Thread.new { Thread.current[:status] = "3"; sleep 1000 }

puts Thread.list.map { |x| x[:status] }
#=> 1,2,3

Thread.list.each do |x|
  if x[:status] == "2"
    x.kill # kill the thread
    break
  end
end

puts Thread.list.map { |x| x[:status] }
#=> 1,3
"Thread::pass" will pass control to the scheduler which can now schedule any other thread. The thread has voluntarily given up control to the scheduler - we cannot specify to pass control onto a specific thread
"Thread#kill" will kill the instance the thread
"Thread::list" will return the list of threads
Threads are managed by the scheduler; if you want explicit control, then check out fibers. But they have some gotchas: fibers are not supported in JRuby.
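For reference, a tiny sketch of the explicit control transfer fibers give you (my example, not the original poster's):

fiber = Fiber.new do
  puts "step 1"
  Fiber.yield        # hand control back to whoever called resume
  puts "step 2"
end

fiber.resume         # prints "step 1", then stops at the yield
puts "main has control again"
fiber.resume         # prints "step 2"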
Also check out thread-local variables; they will help you communicate the status or return value of a thread without joining it.
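A minimal sketch of that idea (again my example): thread-local variables can be read from outside the thread, and Thread#value returns the block's result if you do eventually want to join.

t = Thread.new do
  Thread.current[:status] = "working"
  sleep 1                           # pretend to do some work
  Thread.current[:status] = "done"
  42                                # the block's return value
end

sleep 0.1
puts t[:status]   # => "working", read without joining the thread
puts t.value      # joins the thread and returns 42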
http://github.com/defunkt/resque is a good option for a queue, check it out. Also try JRuby if you are going to make heavy use of threads. Its advantage is that it will wrap Java threads in Ruby goodness.

Resources