Ruby 1.8.7: Forks & Pipes - Troubleshooting - ruby

I'm aware that there are great gems like Parallel, but I came up with the class below as an exercise.
It works fine, but with a lot of iterations Ruby sometimes gets "stuck". When I press CTRL+C, the backtrace shows that it is always at line 38 or 45 (the two Marshal lines).
Can you see anything wrong here? It looks as if the pipes are "hanging", so I thought I might be using them the wrong way.
My goal was to iterate through an array (which I pass as 'objects') with a limited number of forks (max_forks) and to return some values. Additionally, I wanted to guarantee that all children get killed when the parent gets killed (even in case of kill -9), which is why I introduced the "life_line" pipe (I've read here on Stack Overflow that this might do the trick).
class Parallel
def self.do_fork(max_forks, objects)
waiter_threads = []
fork_counter = []
life_line = {}
comm_line = {}
objects.each do |object|
key = rand(24 ** 24).to_s(36)
sleep(0.01) while fork_counter.size >= max_forks
if fork_counter.size < max_forks
fork_counter << true
life_line[key] = {}
life_line[key][:r], life_line[key][:w] = IO.pipe
comm_line[key] = {}
comm_line[key][:r], comm_line[key][:w] = IO.pipe
pid = fork {
life_line[key][:w].close
comm_line[key][:r].close
Thread.new {
begin
life_line[key][:r].read
rescue SignalException, SystemExit => e
raise e
rescue Exception => e
Kernel.exit
end
}
Marshal.dump(yield(object), comm_line[key][:w]) # return yield
}
waiter_threads << Thread.new {
Process.wait(pid)
comm_line[key][:w].close
reply = Marshal.load(comm_line[key][:r])
# process reply here
comm_line[key][:r].close
life_line[key][:r].close
life_line[key][:w].close
life_line[key] = nil
fork_counter.pop
}
end
end
waiter_threads.each { |k| k.join } # wait for all threads to finish
end
end

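The bug was this:
A pipe can buffer only a limited amount of data (e.g. 64 KB).
Once the buffer is full, further writes block until a reader drains it. Because the parent called Process.wait before reading, the child blocked in its write while the parent blocked in wait, so the pipe got "stuck" forever.
An easy solution is to start reading the pipe in a thread before the child starts writing to it.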
comm_line = IO.pipe
# Buffered pipe reading (in case the payload is bigger than 64 KB)
reply = ""
read_buffer = Thread.new {
  while !comm_line[0].eof?
    reply = Marshal.load(comm_line[0])
  end
}
child_pid = fork {
  comm_line[0].close
  # write to the write end (index 1), marshalled so that Marshal.load can read it
  Marshal.dump("HUGE DATA LARGER THAN 64 KB", comm_line[1])
}
Process.wait(child_pid)
comm_line[1].close
read_buffer.join
comm_line[0].close
puts reply # outputs the "HUGE DATA"
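The same idea can also be sketched without a reader thread, assuming the parent has nothing else to do while the child runs: close the parent's copy of the write end, drain the pipe, and only then reap the child. The helper name run_in_child is made up for this illustration and is not part of the original class.

```ruby
# Hypothetical helper: run a block in a forked child and return its result.
def run_in_child
  reader, writer = IO.pipe
  reader.binmode
  writer.binmode

  pid = fork do
    reader.close                    # the child only writes
    Marshal.dump(yield, writer)     # may exceed the pipe buffer; the parent is reading
    writer.close
  end

  writer.close                      # the parent only reads; needed so read sees EOF
  data = reader.read                # drains the pipe until the child closes its end
  reader.close
  Process.wait(pid)                 # reap the child only after the pipe is drained
  Marshal.load(data)
end

big = run_in_child { "x" * 200_000 } # well above a typical 64 KB pipe buffer
puts big.size
```

Reading before Process.wait is the key point: the child can never block on a full pipe while the parent is waiting for it to exit.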

I don't think the problem is with Marshal. The more obvious one seems to be that your fork may finish execution before the waiter thread gets to it (leaving the latter waiting forever).
Try changing Process.wait(pid) to Process.wait(pid, Process::WNOHANG). The Process::WNOHANG flag tells Ruby not to block if no child (matching the given PID, if any) has exited yet. Note that this flag may not be available on all platforms, but it should at least work on Linux.
There are a number of other potential problems with your code, but if you just came up with it "as an exercise", they probably don't matter. For example, Marshal.load does not like to encounter EOF, so I'd guard against that with something like Marshal.load(comm_line[key][:r]) unless comm_line[key][:r].eof?, or loop until comm_line[key][:r].eof? if you expect several objects to be read.
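As a minimal sketch of the EOF guard described above, the snippet below writes a few small objects into a pipe within a single process and reads them back until eof? is true. The sample objects are made up for illustration, and the data is kept small so that it fits into the pipe buffer without a concurrent reader.

```ruby
reader, writer = IO.pipe

# Write several marshalled objects back to back, then signal EOF.
[1, "two", { :three => 3 }].each { |obj| Marshal.dump(obj, writer) }
writer.close

# Load objects until the pipe is exhausted; Marshal.load is never called at EOF.
objects = []
objects << Marshal.load(reader) until reader.eof?
reader.close

p objects
```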

Related

Ruby Parallel each loop

I have the following code:
FTP ... do |ftp|
files.each do |file|
...
ftp.put(file)
sleep 1
end
end
I'd like to upload each file in a separate thread or in some other parallel way. What's the correct way to do this? Would this be right?
Here's my attempt with the parallel gem:
FTP ... do |ftp|
Parallel.map(files) do |file|
...
ftp.put(file)
sleep 1
end
end
The issue with parallel is that puts output from different workers can be interleaved, like so:
as = [1,2,3,4,5,6,7,8]
results = Parallel.map(as) do |a|
puts a
end
How can I force the puts output to appear as it normally would, one line at a time?
The whole point of parallelization is to run things at the same time. But if there's some part of the code that you'd like to run sequentially, you could use a mutex like:
semaphore = Mutex.new
as = [1,2,3,4,5,6,7,8]
results = Parallel.map(as, in_threads: 3) do |a|
# Parallel stuff
sleep rand
semaphore.synchronize {
# Sequential stuff
puts a
}
# Parallel stuff
sleep rand
end
You'll see that it prints each value on its own line, but not necessarily in the original order. I used in_threads instead of in_processes (the default) because Mutex doesn't work across processes. See below for an alternative if you do need processes.
References:
http://ruby-doc.org/core-2.2.0/Mutex.html
http://dev.housetrip.com/2014/01/28/efficient-cross-processing-locking-in-ruby/
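For the multi-process case, one possible approach is an advisory file lock. This is only a sketch under assumptions: it uses plain fork instead of the parallel gem, it takes the lock with File#flock on the output file itself, and the file name is made up for the example.

```ruby
require "tmpdir"

# Hypothetical output file shared by all child processes.
path = File.join(Dir.tmpdir, "flock-demo-#{Process.pid}.log")

pids = (1..4).map do |n|
  fork do
    File.open(path, "a") do |f|
      f.flock(File::LOCK_EX)        # blocks while another process holds the lock
      f.puts "line from child #{n}" # sequential part
      f.flush                       # make sure the line is written before unlocking
    end                             # closing the file releases the lock
  end
end

pids.each { |pid| Process.wait(pid) }

lines = File.readlines(path)
File.delete(path)
puts lines
```

Each line arrives intact, although the order still depends on which child gets the lock first.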
In the interest of keeping it simple, here's what I'd do with built-in Thread:
results = files.map do |file|
result = Thread.new do
ftp.put(file)
end
end
Note that this code assumes that ftp.put(file) returns safely. If that isn't guaranteed, you'll have to handle it yourself: wrap each call in a timeout block and have each thread return the exception if one is raised. Then, after the loop, do a blocking check (for example, by calling value on each thread in results) that none of them returned an exception.
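A rough sketch of that timeout-and-collect idea follows. The upload method is a hypothetical stand-in for ftp.put(file), the file names are invented, and the 2 second limit is an arbitrary choice.

```ruby
require "timeout"

# Hypothetical stand-in for ftp.put(file); it fails for one file on purpose.
def upload(file)
  raise IOError, "upload failed for #{file}" if file == "bad.txt"
  "uploaded #{file}"
end

files = ["a.txt", "bad.txt", "c.txt"]

threads = files.map do |file|
  Thread.new do
    begin
      Timeout.timeout(2) { upload(file) } # give up after 2 seconds
    rescue StandardError => e
      e                                   # return the exception instead of raising it
    end
  end
end

results  = threads.map { |t| t.value }    # blocks until every thread has finished
failures = results.select { |r| r.is_a?(Exception) }
puts "#{failures.size} upload(s) failed"
```

Timeout::Error is a StandardError, so a timed-out upload ends up in failures just like any other error.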

Is $SAFE = 4 and a timed execution limit enough to prevent eval's security vulnerabilities in Ruby?

Here is my current implementation of a safe eval in Ruby:
$mthread = Thread.new {}
class SafeEval
def self.safeEval code
$killed = false
$mthread = Thread.new {
$SAFE = 4
result = begin
eval code
rescue Exception => e
"Error in eval: #{e}"
end
Thread.current[:evalResult] = result
}
Thread.new {
sleep 3
if $mthread.alive?
$killed = true
Thread.kill $mthread
end
}.join
$mthread.join
$killed ? 'Error in eval: Maximum execution time reached' : String($mthread[:evalResult])
end
end
It uses $SAFE = 4. From my understanding, and from this post I've read, that's not enough to stop security vulnerabilities. However, if I set a maximum execution time, and kill the thread running the code after the time expires, is that enough for a safe eval?
If not, why isn't it safe? Are there still any vulnerabilities? Is there any way to prevent these vulnerabilities as well?
Of course setting an execution time is not secure. All you're doing then is making the execution path of whatever is executed less predictable.
Security is not about saying 'Oh, no untrusted code can cause trouble if it runs for less than 4s'. Security starts with not letting untrusted code execute anywhere outside of a strict sandboxed environment.
Why are you using eval here? What are you trying to accomplish?
edit- I'm an idiot, ignore, I read that as a timeout, not as a level. :P That said, this works perfectly well on my local machine:
$mthread = Thread.new {}
class SafeEval
def self.safeEval code
$killed = false
$mthread = Thread.new {
$SAFE = 4
result = begin
eval code
rescue Exception => e
"Error in eval: #{e}"
end
Thread.current[:evalResult] = result
}
Thread.new {
sleep 3
if $mthread.alive?
$killed = true
Thread.kill $mthread
end
}.join
$mthread.join
$killed ? 'Error in eval: Maximum execution time reached' : String($mthread[:evalResult])
end
end
SafeEval.safeEval("`cat /etc/passwd > /Users/usr/development/source/tests/test.txt`")
Run that code on a web server that has a mail client or some other way of connecting to remote servers, and an attacker can enumerate the user accounts on your machine and from there use social engineering to recover passwords.
Sandboxing is important because it prevents things like the above. $SAFE is not enough in and of itself, and this is one of the reasons you never put eval(), or anything else whose core job is to execute untrusted code, in an environment that an attacker could reach.
If you consider 'being able to kill the bot' a security vulnerability, then $SAFE = 4 is not safe enough, as we found out while testing it.
People can execute this, without getting the 'unsafe eval' error:
loop { Thread.start { loop{} } }
This starts many threads within 3 seconds, and after enough executions this will have created lots and lots of threads, which has killed the bot while testing.
Or this:
Thread.start { loop { Thread.start { loop {} } } }
It starts a thread which keeps generating other threads. The timeout does not stop this.

Ruby 192 recursive thread lock error

I am working with Ruby 1.9.2-p290. One unit-test script (shown below) throws a ThreadError:
1) Error:
test_orchpr_pass(TC_MyTest):
ThreadError: deadlock; recursive locking
<internal:prelude>:8:in `lock'
<internal:prelude>:8:in `synchronize'
testth.rb:121:in `orchpr_run'
testth.rb:158:in `test_orchpr_pass'
With Ruby 1.8.7 it gives the error: Thread tried to join itself.
CODE
def orchpr_run(timeout = 60)
  # used by the update function to signal that a final update was
  # received from all clients
  @update_mutex.lock
  # required since we'll have to act as an observer to the DRb server
  DRb.start_service
  # get configuration objects
  run_config_type = DataLayer.get_run_config_type
  client_daemon = DataLayer.get_client_daemon_by_branch(run_config_type, @branch)
  client_daemon['port_no'] = 9096
  # get the servers for this client_daemon
  servers = DataLayer.get_servers(run_config_type, client_daemon.id)
  servers.each { |server| @pr[server.host_name] = OrchestratedPlatformRun.new(run_config_type, server, timeout) }
  @pr.each_value { |x| x.add_observer(self) }
  @pr.each_value { |x| x.start(@service_command_pass, true) }
  # wait for update to receive notifications from all servers
  # this is the statement causing the error:
  @update_mutex.synchronize {}
end
Another piece of code throwing same error:
require "thread"
require "timeout"
def calc_fib(n)
if n == 0
0
elsif n == 1
1
else
calc_fib(n-1) + calc_fib(n-2)
end
end
lock = Mutex.new
threads = 20.times.collect do
Thread.new do
20.times do
begin
Timeout.timeout(0.25) do
lock.synchronize{ calc_fib(1000) }
end
rescue ThreadError => e
puts "#{e.class}: #{e.message}:\n" + e.backtrace.join("\n") + "\n\n"
rescue Timeout::Error => e
#puts e.class
nil
end
end
end
end
threads.each{ |t| t.join }
Commenting out the synchronize block makes the error disappear, but then the threads are no longer synchronized. I found some posts online saying that this is a bug in Ruby 1.9.2 that requires changes to prelude.rb and thread.c regarding Mutex synchronization.
But on my Windows installation I am unable to find prelude.rb.
If a mutex is locked by a thread, an error will be raised if you try to lock it again from the same thread.
This is exactly what you are doing, since synchronize is just a convenience method for locking the mutex, yielding to the block, and then releasing the lock. I'm not sure what you're trying to do, but it feels to me like you might be trying to use mutexes for something other than their intended purpose.
Using threads and locks well is difficult - you might want to look at celluloid for a different approach to concurrency.
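To make the recursive-locking rule concrete, here is a small sketch. The second half uses Monitor from the standard library, which is reentrant; that is a swapped-in alternative for comparison, not something the question's code uses.

```ruby
require "monitor"

lock = Mutex.new

# Locking a Mutex again from the thread that already holds it raises ThreadError.
error = begin
  lock.synchronize { lock.synchronize { } }
  nil
rescue ThreadError => e
  e
end
puts "#{error.class}: #{error.message}"

# A Monitor may be re-entered by the thread that already holds it.
monitor = Monitor.new
value = monitor.synchronize { monitor.synchronize { :ok } }
p value
```

The outer synchronize still releases the Mutex when the error propagates, so the lock is not left held afterwards.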

Thread and Queue

I am interested in knowing what would be the best way to implement a thread based queue.
For example:
I have 10 actions which I want to execute with only 4 threads. I would like to create a queue with all 10 actions placed linearly and start the first 4 actions with 4 threads; once one of the threads is done executing, the next action starts, and so on. So at any time, the number of running threads is 4 or fewer.
There is a Queue class in thread in the standard library. Using that you can do something like this:
require 'thread'
queue = Queue.new
threads = []
# add work to the queue
queue << work_unit
4.times do
threads << Thread.new do
# loop until there are no more things to do
until queue.empty?
# pop with the non-blocking flag set, this raises
# an exception if the queue is empty, in which case
# work_unit will be set to nil
work_unit = queue.pop(true) rescue nil
if work_unit
# do work
end
end
# when there is no more work, the thread will stop
end
end
# wait until all threads have completed processing
threads.each { |t| t.join }
The reason I pop with the non-blocking flag is that between the until queue.empty? and the pop another thread may have pop'ed the queue, so unless the non-blocking flag is set we could get stuck at that line forever.
If you're using MRI, the default Ruby interpreter, bear in mind that threads will not be truly concurrent. If your work is CPU bound you may just as well run single threaded. If you have some operation that blocks on IO you may get some parallelism, but YMMV. Alternatively, you can use an interpreter that allows full concurrency, such as JRuby or Rubinius.
There are a few gems that implement this pattern for you: parallel, peach, and mine, which is called threach (or jruby_threach under JRuby). It's a drop-in replacement for #each but allows you to specify how many threads to run with, using a SizedQueue underneath to keep things from spiraling out of control.
So...
(1..10).threach(4) {|i| do_my_work(i) }
Not pushing my own stuff; there are plenty of good implementations out there to make things easier.
If you're using JRuby, jruby_threach is a much better implementation -- Java just offers a much richer set of threading primitives and data structures to use.
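The SizedQueue idea mentioned above can be sketched roughly as follows. This is not threach's actual implementation; the :stop sentinel and the squaring "work" are assumptions made purely for illustration.

```ruby
require "thread"

queue   = SizedQueue.new(4)   # the producer blocks once 4 items are waiting
results = Queue.new

workers = 4.times.map do
  Thread.new do
    # A blocking pop is safe here because every worker eventually receives :stop.
    while (item = queue.pop) != :stop
      results << item * item  # stand-in for the real work
    end
  end
end

(1..10).each { |i| queue << i }  # blocks whenever the queue is full
4.times { queue << :stop }       # one sentinel per worker
workers.each { |t| t.join }

squares = []
squares << results.pop until results.empty?
p squares.sort
```

Because the queue is bounded, the producer can never run far ahead of the workers, which keeps memory use flat even for very long inputs.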
Executable descriptive example:
require 'thread'
p tasks = [
{:file => 'task1'},
{:file => 'task2'},
{:file => 'task3'},
{:file => 'task4'},
{:file => 'task5'}
]
tasks_queue = Queue.new
tasks.each {|task| tasks_queue << task}
# run workers
workers_count = 3
workers = []
workers_count.times do |n|
workers << Thread.new(n+1) do |my_n|
while (task = tasks_queue.shift(true) rescue nil) do
delay = rand(0)
sleep delay
task[:result] = "done by worker ##{my_n} (in #{delay})"
p task
end
end
end
# wait for all threads
workers.each(&:join)
# output results
puts "all done"
p tasks
You could use a thread pool. It's a fairly common pattern for this type of problem.
http://en.wikipedia.org/wiki/Thread_pool_pattern
Github seems to have a few implementations you could try out:
https://github.com/search?type=Everything&language=Ruby&q=thread+pool
Celluloid has a worker pool example that does this.
I use a gem called work_queue. It's really practical.
Example:
require 'work_queue'
wq = WorkQueue.new 4, 10
(1..10).each do |number|
wq.enqueue_b("Thread#{number}") do |thread_name|
puts "Hello from the #{thread_name}"
end
end
wq.join

Limiting concurrent threads

I'm using threads in a program that uploads files over sftp. The number of files to upload can be very large or very small. I'd like to have 5 or fewer simultaneous uploads, and if there are more, have them wait. My understanding is that a condition variable would usually be used for this, but it looks to me like that would only allow 1 thread at a time.
cv = ConditionVariable.new
t2 = Thread.new {
mutex.synchronize {
cv.wait(mutex)
upload(file)
cv.signal
}
}
I think that should tell it to wait for the cv to be available, then release it when done. My question is: how can I do this while allowing more than 1 at a time but still limiting the number?
edit: I'm using Ruby 1.8.7 on Windows from the 1 click installer
Use a ThreadPool instead. See Deadlock in ThreadPool (the accepted answer, specifically).
A word of caution -- there is no real concurrency in Ruby unless you are using JRuby. Also, an exception in a worker thread will freeze the main loop (it never receives its :done message) unless you are running in debug mode.
require "thread"
POOL_SIZE = 5
items_to_process = (0..100).to_a
message_queue = Queue.new
start_thread =
lambda do
Thread.new(items_to_process.shift) do |i|
puts "Processing #{i}"
message_queue.push(:done)
end
end
items_left = items_to_process.length
[items_left, POOL_SIZE].min.times do
start_thread[]
end
while items_left > 0
message_queue.pop
items_left -= 1
start_thread[] unless items_left < POOL_SIZE
end
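If you would rather stay with the Mutex and ConditionVariable from the question, a counting semaphore is one way to let up to N threads through at once. This is a sketch under assumptions: the CountingSemaphore class is hypothetical, and sleep stands in for upload(file).

```ruby
require "thread"

# Hypothetical helper: allows at most `limit` holders at any one time.
class CountingSemaphore
  def initialize(limit)
    @limit = limit
    @count = 0
    @mutex = Mutex.new
    @cv    = ConditionVariable.new
  end

  def acquire
    @mutex.synchronize do
      @cv.wait(@mutex) while @count >= @limit # re-check after every wakeup
      @count += 1
    end
  end

  def release
    @mutex.synchronize do
      @count -= 1
      @cv.signal                              # wake one waiting thread
    end
  end
end

sem     = CountingSemaphore.new(5)
stats   = Mutex.new
running = 0
peak    = 0
done    = 0

threads = (1..20).map do
  Thread.new do
    sem.acquire
    begin
      stats.synchronize { running += 1; peak = [peak, running].max }
      sleep 0.01                              # stand-in for upload(file)
      stats.synchronize { running -= 1; done += 1 }
    ensure
      sem.release                             # always give the slot back
    end
  end
end

threads.each { |t| t.join }
puts "peak concurrency: #{peak}, completed: #{done}"
```

Unlike the pool above, this starts one thread per file and only limits how many run the upload at once, so it suits moderate file counts better than very large ones.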

Resources