How to control running order between threads? (Ruby)

Below is the structure of my code for processing data and printing results. Some content is printed during process_records, so I want every thread's ensure part to print at the very end of the program, in order: the first thread's ensure block should print first. How can I do that without moving print_report out of Thread.new do? Thanks
lock = Mutex.new
threads = []
thread_num.times do |i|
  threads << Thread.new do
    records = lock.synchronize { db_proxy.query(account_id) }
    result1 = process_records(records)
    result2 = process_records2(records)
    result3 = process_records3(records)
  ensure
    print_report(result1, result2, result3)
  end
end
threads.each(&:join)
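One way to keep print_report inside the block and still print the reports in thread order is to serialize only the ensure step with a shared turn counter and a ConditionVariable. A minimal sketch, reusing thread_num, db_proxy, and the process_records* helpers from the question:

require 'thread'

lock    = Mutex.new
turn    = 0                   # index of the thread whose report may print next
turn_mu = Mutex.new
turn_cv = ConditionVariable.new

threads = []
thread_num.times do |i|
  threads << Thread.new do
    records = lock.synchronize { db_proxy.query(account_id) }
    result1 = process_records(records)
    result2 = process_records2(records)
    result3 = process_records3(records)
  ensure
    turn_mu.synchronize do
      # Block until every lower-numbered thread has printed its report.
      turn_cv.wait(turn_mu) until turn == i
      print_report(result1, result2, result3)
      turn += 1
      turn_cv.broadcast       # wake the next thread in line
    end
  end
end
threads.each(&:join)

The queries and processing still run in parallel; each thread simply parks in its ensure block until every lower-numbered thread has printed.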

Related

Limit the number of threads in an iteration ruby

When I have my code like this, I get "can't create thread, resource temporarily unavailable". There are over 24k files in the directory to process.
frames.map do |image|
  Thread.new do
    pipeline = ImageProcessing::MiniMagick
      .source(File.open("original/#{image}"))
      .append("-fuzz", "30%")
      .append("-transparent", "#ff00fe")
    result = pipeline.call
    puts result.path
    file_parts = image.split("_")
    frame_number = file_parts[2]
    FileUtils.cp(result.path, "transparent/image_transparent_#{frame_number}")
    puts "Done with #{image}!"
    puts "#{Dir.children("transparent").count} / #{Dir.children("original").count}"
    puts "\n"
  end
end.each { |thread| thread.join }
So I tried just the first 1001 files, slicing indexes 0-1000, and did it this way:
frames[0..1000].each_with_index do |image, index|
  thread = Thread.new do
    pipeline = ImageProcessing::MiniMagick
      .source(File.open("original/#{image}"))
      .append("-fuzz", "30%")
      .append("-transparent", "#ff00fe")
    result = pipeline.call
    puts result.path
    file_parts = image.split("_")
    frame_number = file_parts[2]
    FileUtils.cp(result.path, "transparent/image_transparent_#{frame_number}")
    puts "Done with #{image}!"
    puts "#{Dir.children("transparent").count} / #{Dir.children("original").count}"
    puts "\n"
  end
  thread.join
end
And while this is processing, the speed seems to be about the same as if it were running on a single thread, from what I see in the Terminal.
But I want the code to use as many threads as the OS will allow before it refuses, so that it can work through all the files faster.
Or at least:
1. Find the maximum number of threads allowed.
2. Divide the original directory's file count by that number.
3. Process the files in batches of that size.
(A worker-pool sketch that bounds the thread count follows below.)
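One thread per file exhausts OS limits, and joining each thread immediately serializes everything. The usual middle ground is a fixed-size worker pool fed from a Queue. A minimal sketch, reusing the pipeline from the question; POOL_SIZE is an arbitrary choice rather than anything the OS reports (for subprocess-spawning work like MiniMagick, a number near your core count is a reasonable start):

require "thread"
require "fileutils"
require "image_processing/mini_magick"

POOL_SIZE = 8                     # tune by experiment; not an OS-reported limit
queue = Queue.new
frames.each { |image| queue << image }

workers = POOL_SIZE.times.map do
  Thread.new do
    loop do
      image = begin
        queue.pop(true)           # non-blocking pop; raises when empty
      rescue ThreadError
        break                     # queue drained, worker exits
      end
      pipeline = ImageProcessing::MiniMagick
        .source(File.open("original/#{image}"))
        .append("-fuzz", "30%")
        .append("-transparent", "#ff00fe")
      result = pipeline.call
      frame_number = image.split("_")[2]
      FileUtils.cp(result.path, "transparent/image_transparent_#{frame_number}")
      puts "Done with #{image}!"
    end
  end
end
workers.each(&:join)

Since every file is enqueued up front and workers exit when the queue drains, this processes all 24k files with a bounded number of threads instead of one thread each.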

Multi-threading IO

I am trying to make an interactive telnet client for Ruby. The current library is extremely lacking, so I have been trying to add onto it by creating an interactive gem that allows a user to stream data in real time over telnet. To do this I need to use multithreading:
t1 accepts user input. The user must always be able to enter data throughout the entire application. Once user data is sent, we receive data back right away, which is caught by our block { |c| print c }. The problem is that we want data to be streamed to us. In other words, right now we only get data back after we send something; we want to keep receiving data for up to a minute after we send a command. We want data to be constantly flowing to us.
I made t2 for this purpose. t2 waits for data to be received and then displays it when its regex pattern is matched. The problem with t2 is that if data is never received, the user cannot enter information into t1.
t3 joins t1 and t2.
My question is: how can I organize my threads so that the user can constantly type in the console and submit commands, while simultaneously and continuously receiving information back from the server?
t1 = Thread.new {
  while true
    input = gets.chomp
    localhost.cmd(input) { |c| print c }
  end
}
t2 = Thread.new {
  puts localhost.waitfor("Prompt" => /[$%#>:?.|](\e\[0?m\s*)* *\z/)
}
t3 = Thread.new {
  t1.join
  t2.join
}
t3.join
"The problem is that we want data to be streamed to us. In other words, right now we only get data that is sent back after we send something."
require 'thread'

user_data = Queue.new

# Producer: read lines from the user until a blank line, then push a sentinel.
t1 = Thread.new do
  loop do
    print "Enter data: "
    line = gets.chomp
    if line == ""
      user_data << "END_OF_DATA"
      break
    else
      user_data << line
    end
  end
end.join

# Consumer: drain the queue until the sentinel arrives.
t2 = Thread.new do
  processed_data = []
  loop do
    line = user_data.shift
    break if line == "END_OF_DATA"
    processed_data << line
  end
  p processed_data
end.join
You might want to read this:
https://www.rfc-editor.org/rfc/rfc854
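As for the streaming requirement itself, the blocking cmd/waitfor calls are the obstacle: they only read in response to a command. One way around it (a sketch, assuming a plain TCPSocket in place of Net::Telnet, with host as a placeholder) is a dedicated reader thread that prints server data the moment it arrives, leaving the main thread permanently free for input:

require 'socket'

sock = TCPSocket.new(host, 23)   # 23 is the standard telnet port

# Reader thread: stream whatever the server sends, whenever it sends it.
reader = Thread.new do
  begin
    loop { print sock.readpartial(4096) }
  rescue EOFError
    puts "\n[connection closed]"
  end
end

# Main thread: the user can always type; each line goes straight to the server.
while (line = $stdin.gets)
  sock.write(line)
end

sock.close
reader.join

This ignores telnet option negotiation (the IAC sequences RFC 854 describes), so a server that negotiates options will print a few bytes of garbage up front; a real client would filter those.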

Ruby Parallel each loop

I have the following code:
FTP ... do |ftp|
  files.each do |file|
    ...
    ftp.put(file)
    sleep 1
  end
end
I'd like to upload each file in a separate thread, or in some other parallel way. What's the correct way to do this? Would this be right?
Here's my attempt with the parallel gem:
FTP ... do |ftp|
  Parallel.map(files) do |file|
    ...
    ftp.put(file)
    sleep 1
  end
end
The issue with parallel is that puts output from different workers can interleave, like so:
as = [1,2,3,4,5,6,7,8]
results = Parallel.map(as) do |a|
  puts a
end
How can I force puts to occur as it normally would, one line per call?
The whole point of parallelization is to run things at the same time. But if there's some part of the process that you'd like to run sequentially, you can use a mutex, like:
semaphore = Mutex.new
as = [1,2,3,4,5,6,7,8]
results = Parallel.map(as, in_threads: 3) do |a|
  # Parallel stuff
  sleep rand
  semaphore.synchronize {
    # Sequential stuff
    puts a
  }
  # Parallel stuff
  sleep rand
end
You'll see that it prints correctly, but not necessarily in the same order. I used in_threads instead of in_processes (the default) because a Mutex doesn't work across processes. See the second reference below, and the sketch after it, for an alternative if you do need processes.
References:
http://ruby-doc.org/core-2.2.0/Mutex.html
http://dev.housetrip.com/2014/01/28/efficient-cross-processing-locking-in-ruby/
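For the in_processes case, a Mutex lives in a single process's memory, so the usual substitute (the approach the second link above discusses) is an OS-level file lock, which is visible across forked workers. A minimal sketch; the lock-file path is arbitrary:

require 'parallel'

as = [1,2,3,4,5,6,7,8]
results = Parallel.map(as, in_processes: 3) do |a|
  # Parallel stuff here...
  File.open("/tmp/puts.lock", File::RDWR | File::CREAT) do |f|
    f.flock(File::LOCK_EX)   # blocks until no other process holds the lock
    puts a                   # sequential stuff, one process at a time
  end                        # lock released when the block closes the file
end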
In the interest of keeping it simple, here's what I'd do with the built-in Thread:
results = files.map do |file|
  Thread.new do
    ftp.put(file)
  end
end
results.each(&:join)
Note that this code assumes that ftp.put(file) returns safely. If that isn't guaranteed, you'll have to handle it yourself: wrap each call in a timeout block, have each thread return the exception if one is raised, and at the very end check that results does not contain any exceptions. A sketch of that pattern follows.
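A minimal sketch of that error-handling pattern, assuming the same ftp and files from above (the 30-second timeout is an arbitrary choice):

require 'timeout'

results = files.map do |file|
  Thread.new do
    Timeout.timeout(30) { ftp.put(file) }   # raise if an upload hangs
    nil
  rescue StandardError => e
    e                                       # the exception becomes the thread's value
  end
end.map(&:value)                            # Thread#value joins and returns the block's result

failures = results.compact
warn "#{failures.size} upload(s) failed: #{failures.map(&:message).join(', ')}" unless failures.empty?

Note also that a single FTP connection performs one transfer at a time, so real parallelism generally needs one connection per thread rather than a shared ftp object.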

Ruby understanding multithreading

I'm trying to multithread a loop in Ruby following this example: http://t-a-w.blogspot.com/2010/05/very-simple-parallelization-with-ruby.html.
I copied that code and wrote this:
module Enumerable
  def ignore_exception
    begin
      yield
    rescue Exception => e
      STDERR.puts e.message
    end
  end

  def in_parallel(n)
    t_queue = Queue.new
    threads = (1..n).map {
      Thread.new {
        while x = t_queue.deq
          ignore_exception { yield(x[0]) }
        end
      }
    }
    each { |x| t_queue << [x] }
    n.times { t_queue << nil }
    threads.each { |t|
      t.join
      unless t[:out].nil?
        puts t[:out]
      end
    }
  end
end

ids.in_parallel(10) { |id|
  conn = open_conn(loc)
  out = conn.getData(id)
  Thread.current[:out] = out
}
The way I understand it, it will dequeue up to 10 items at a time, process the block per id, and join the 10 threads at the end, repeating until finished. After running this code I sometimes get different results, especially if the size of my ids is less than 10, and I am confused about why this is occurring. Half the time it will not output anything for up to half the ids, even though I can check on the server side that output for those ids exists. For example, if the correct output is "Got id 1" and "Got id 2", it will print only {"Got id 1"}, or only {"Got id 2"}, or {"Got id 1", "Got id 2"}. My question is: is my understanding of this code correct?
The issue in my code was the open_conn() function call, which was not thread safe. I fixed the issue by synchronizing around getting the connection handle:
connLock = Mutex.new

ids.in_parallel(10) { |id|
  conn = nil
  connLock.synchronize {
    conn = open_conn(loc)
  }
  out = conn.getData(id)
  Thread.current[:out] = out
}
You could also use http://peach.rubyforge.org/ for the loop parallelization:
ids.peach(10){ |id| ... }
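Separately, note that the puts t[:out] reporting in in_parallel can also drop results even with a thread-safe open_conn: Thread.current[:out] holds only the last value each worker produced, so when one worker dequeues several ids, the earlier outputs are overwritten. Collecting results into a shared thread-safe Queue avoids that; a minimal sketch, reusing connLock and in_parallel from above:

outputs = Queue.new

ids.in_parallel(10) { |id|
  conn = connLock.synchronize { open_conn(loc) }
  outputs << conn.getData(id)   # every result is kept, not just each thread's last
}

puts outputs.pop until outputs.empty?

Because in_parallel joins all workers before returning, the queue is complete by the time it is drained, and since no thread sets :out, in_parallel's own nil-checked puts prints nothing extra.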

Odd bug with DataMapper, Mutexes, and Threads?

I have a database full of URLs that I need to test HTTP response time for on a regular basis. I want to have many worker threads combing the database at all times for a URL that hasn't been tested recently, and if it finds one, test it.
Of course, this could cause multiple threads to snag the same URL from the database. I don't want this. So, I'm trying to use Mutexes to prevent this from happening. I realize there are other options at the database level (optimistic locking, pessimistic locking), but I'd at least prefer to figure out why this isn't working.
Take a look at this test code I wrote:
threads = []
mutex = Mutex.new

50.times do |i|
  threads << Thread.new do
    while true do
      url = nil
      mutex.synchronize do
        url = URL.first(:locked_for_testing => false, :times_tested.lt => 150)
        if url
          url.locked_for_testing = true
          url.save
        end
      end
      if url
        # simulate testing the url
        sleep 1
        url.times_tested += 1
        url.save
        mutex.synchronize do
          url.locked_for_testing = false
          url.save
        end
      end
    end
    sleep 1
  end
end

threads.each { |t| t.join }
Of course there is no real URL testing here. But what should happen is at the end of the day, each URL should end up with "times_tested" equal to 150, right?
(I'm basically just trying to make sure the mutexes and worker-thread mentality are working)
But each time I run it, a few odd URLs here and there end up with times_tested equal to a much lower number, say 37, and locked_for_testing frozen on "true".
Now, as far as I can tell from my code, if any URL gets locked, it will have to be unlocked. So I don't understand how some URLs end up "frozen" like that.
There are no exceptions, and I've tried adding begin/ensure, but it didn't do anything.
Any ideas?
I'd use a Queue, and a master thread to pull the work you want done. If you have a single master, you control what's getting accessed. This isn't perfect, but it's not going to blow up because of concurrency. Remember: if you aren't locking the database, a mutex doesn't really help you if something else accesses the db.
Code completely untested:
require 'thread'

queue = Queue.new
keep_running = true
# trap cntrl_c or something to reset keep_running

master = Thread.new do
  while keep_running
    # check if we need some work to do
    if queue.size == 0
      urls = URL.all(:times_tested.lt => 150)
      urls.each do |u|
        queue << u.id
      end
      # keep from spinning the queue
      sleep(0.1)
    end
  end
end

workers = []
50.times do
  workers << Thread.new do
    while keep_running
      # get an id
      id = queue.shift
      url = URL.get(id)
      # do something with the url
      url.save
      sleep(0.1)
    end
  end
end

workers.each do |w|
  w.join
end
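One refinement worth noting about the sketch above: keep_running is only checked between jobs, and queue.shift blocks forever on an empty queue, so a worker may never notice the flag change. A common pattern, independent of DataMapper, is to unblock every worker with a sentinel at shutdown:

WORKER_COUNT = 50

workers = WORKER_COUNT.times.map do
  Thread.new do
    while (id = queue.shift) != :stop   # :stop is the shutdown sentinel
      url = URL.get(id)
      # do something with the url
      url.save
    end
  end
end

# At shutdown: one sentinel per worker guarantees every queue.shift returns.
WORKER_COUNT.times { queue << :stop }
workers.each(&:join)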
