Ruby parallel process in map - ruby

How can I implement a pmap method for Array that works like map but runs in two processes?
I have this code:
class Array
def pmap
out = []
each do |e|
out << yield(e)
end
out
end
end
require 'benchmark'
seconds = Benchmark.realtime do
[1, 2, 3].pmap do |x|
sleep x
puts x**x
end
end
puts "work #{seconds} seconds"
As a result, the benchmark should report about 3 seconds.

To get exactly 2 forks:
You don't strictly need RPC. Marshal plus a pipe should usually work.
class Array
def pmap
first, last = self[0..(self.length/2)], self[(self.length/2+1)..-1]
pipes = [first, last].map do |array|
read, write = IO.pipe
fork do
read.close
message = []
array.each do |item|
message << yield(item)
end
write.write(Marshal.dump message)
write.close
end
write.close
read
end
# Read before waiting: a child blocks on write once the pipe buffer is full
first_out, last_out = pipes.map do |read|
Marshal.load(read.read)
end
Process.waitall
first_out + last_out
end
end
Edit
Now using fork
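The same Marshal + pipe idea can be sketched for any number of worker processes. This is only a sketch: fork_map and its workers: keyword are hypothetical names, not part of the answer above, and it assumes a platform where fork is available and block results that Marshal can dump.

```ruby
# Hypothetical helper: map over items in forked child processes.
# Assumes fork is available and that results can be marshalled.
def fork_map(items, workers: 2)
  return [] if items.empty?

  slice_size = (items.size / workers.to_f).ceil
  readers = items.each_slice(slice_size).map do |slice|
    read, write = IO.pipe
    fork do
      read.close
      # The child computes its slice and sends the results back through the pipe.
      write.write(Marshal.dump(slice.map { |item| yield item }))
      write.close
    end
    write.close # the parent keeps only the read end
    read
  end

  # Drain every pipe before waiting, so a child never blocks on a full pipe.
  results = readers.flat_map do |read|
    data = Marshal.load(read.read)
    read.close
    data
  end
  Process.waitall
  results
end

squares = fork_map([1, 2, 3, 4, 5]) { |x| x * x }
```

Slices are read back in the order they were created, so the result keeps the order of the input.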

Try the parallel gem.
require 'parallel'
class Array
def pmap(&blk)
Parallel.map(self, in_processes: 3, &blk)
end
end

Related

Refactoring out a for loop ruby

I am having problems refactoring some duplicated code out of two methods that share a for loop. The two methods are gcdOfFiveUpToFive and remainderStepsUpToFive. Both set the instance variable @m to 5, both use a for x in 1..5 loop that sets @n to x, and both call euclidGcd, although one calls it for its return value and the other for its side effect of incrementing the @count variable. I do not want to return 2 values from one method. I guess I could add a 4th instance variable called @countArray to hold an array of the remainder step counts.
require 'minitest/autorun'
class GCDTest < Minitest::Test
def test_euclid_gcd
gcdTestObject=GCD.new(20,5)
assert gcdTestObject.euclidGcd==5
assert gcdTestObject.gcdRemainderSteps==1
end
def test_euclid_two
gcdTestObject=GCD.new(13,8)
assert gcdTestObject.euclidGcd==1
assert gcdTestObject.gcdRemainderSteps==5
end
def test_euclid_loop
gcdTestObject=GCD.new(0,0)
assert gcdTestObject.gcdOfFiveUpToFive==[1,1,1,1,5]
end
def test_count_of_loop
gcdTestObject=GCD.new(0,0)
assert gcdTestObject.remainderStepsUpToFive==[1,2,3,2,1]
end
end
class GCD
attr_accessor :m,:n
attr_reader :count
def initialize(m,n)
@m=m
@n=n
@count=0
end
def euclidGcd
@count=1
m=@m
n=@n
r= m % n
until r==0
m=n
n=r
r= m % n
@count+=1
end
return n
end
def gcdRemainderSteps
return @count
end
def gcdOfFiveUpToFive
@m=5
gcdArrayUpToFive=[]
for x in 1..5
@n=x
gcdArrayUpToFive << euclidGcd
end
return gcdArrayUpToFive
end
def remainderStepsUpToFive
@m=5
gcdStepArrayUpToFive=[]
for x in 1..5
@n=x
euclidGcd
gcdStepArrayUpToFive << gcdRemainderSteps
end
return gcdStepArrayUpToFive
end
def fiveLoopExtraction
end
end
Code that repeats itself is this:
array=[]
for x in 1..5
# result = do something with x
array << result
end
return array
That is exactly what the map method does (see: What does the "map" method do in Ruby?).
Ruby method names should be snake_case. Let's refactor this to use the proper naming convention and map.
def gcd_of_five_up_to_five
@m=5
(1..5).map do |x|
@n = x
# in ruby you don't have to write return
# value of last expression is returned automatically
euclid_gcd
end
end
def remainder_steps_up_to_five
@m=5
(1..5).map do |x|
@n = x
euclid_gcd
gcd_remainder_steps
end
end
I'd call it with params instead of using @m and @n. That would simplify the code. If you change euclid_gcd to this: def euclid_gcd(m:, n:) you'd get this:
def gcd_of_5_up_to_5
(1..5).map { |x| euclid_gcd(m: 5, n: x) }
end
def remainder_steps_up_to_5
(1..5).map do |x|
euclid_gcd(m: 5, n: x)
gcd_remainder_steps
end
end
Seems like this needs little or no further refactoring.
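For completeness, here is how the whole class might look once euclid_gcd takes keyword arguments. It is a sketch only: the class name GcdSketch is made up for this example, and the step counter is kept in @count as in the question.

```ruby
# Hypothetical class name; mirrors the question's GCD class with keyword arguments.
class GcdSketch
  # Euclid's algorithm; also records the number of remainder steps in @count.
  def euclid_gcd(m:, n:)
    @count = 1
    r = m % n
    until r.zero?
      m, n = n, r
      r = m % n
      @count += 1
    end
    n
  end

  # Remainder steps taken by the most recent euclid_gcd call.
  def gcd_remainder_steps
    @count
  end

  def gcd_of_5_up_to_5
    (1..5).map { |x| euclid_gcd(m: 5, n: x) }
  end

  def remainder_steps_up_to_5
    (1..5).map do |x|
      euclid_gcd(m: 5, n: x)
      gcd_remainder_steps
    end
  end
end

sketch = GcdSketch.new
p sketch.gcd_of_5_up_to_5        # [1, 1, 1, 1, 5]
p sketch.remainder_steps_up_to_5 # [1, 2, 3, 2, 1]
```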

How to create a memory efficient Ruby Pipe class with lazy evaluation?

I would like to create a Pipe class to emulate Unix commands in Ruby in a two step fashion. First step is to compile a pipeline by adding a number of commands, and the second step is to run that pipeline. Here is a mockup:
#!/usr/bin/env ruby
p = Pipe.new
p.add(:cat, input: "table.txt")
p.add(:cut, field: 2)
p.add(:grep, pattern: "foo")
p.add(:puts, output: "result.txt")
p.run
The question is how to code this using lazy evaluation, so that the pipe is processed record by record when run() is called without loading all of the data into memory at any one time?
Take a look at the http://ruby-doc.org/core-2.0.0/Enumerator.html class. The Pipe class will stitch together an Enumerator, e.g. add(:cat, input: 'foo.txt') will create an enumerator which yields lines of foo.txt. add(:grep) will filter it according to regexp etc.
Here's the lazy file reader
require 'benchmark'
def lazy_cat(filename)
e = Enumerator.new do |yielder|
f = File.open filename
s = f.gets
while s
yielder.yield s
s = f.gets
end
end
e.lazy
end
def cat(filename)
Enumerator.new do |yielder|
f = File.open filename
s = f.gets
while s
yielder.yield s
s = f.gets
end
end
end
lazy = Benchmark.realtime { puts lazy_cat("log.txt").map{|s| s.upcase}.take(1).to_a }
puts "Lazy: #{lazy}"
eager = Benchmark.realtime { puts cat("log.txt").map{|s| s.upcase}.take(1).to_a }
puts "Eager: #{eager}"
Eager version takes 7 seconds for 10 million line file, lazy version takes pretty much no time.
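The same laziness is also available without a hand-written Enumerator, since File.foreach returns an enumerator when called without a block. A rough sketch, assuming Ruby 2.4+ (for String#match?) and comma-separated input; lazy_cut_grep and the sample file contents are invented for this example:

```ruby
require 'tempfile'

# Hypothetical helper: lazily cut one comma-separated field, then grep it.
# Lines are read only as they are demanded by the consumer.
def lazy_cut_grep(path, field:, pattern:)
  File.foreach(path).lazy
      .map { |line| line.chomp.split(',')[field] }
      .select { |value| value && value.match?(pattern) }
end

# Throwaway input file with made-up contents.
sample = Tempfile.new('table')
sample.write("a,foo1\nb,bar\nc,foo2\n")
sample.close

p lazy_cut_grep(sample.path, field: 1, pattern: /foo/).first(1)
p lazy_cut_grep(sample.path, field: 1, pattern: /foo/).to_a
```

Each call to first or to_a is what drives the reading; nothing is read when the pipeline is built.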
From what I understood, you can simply read one line at a time, move this single line through the pipeline, and then write it to the output. Some code:
output = File.new("output.txt", "w")
File.new("input.txt").each do |line|
record = read_record(line)
newrecord = run_pipeline_on_one_record(record)
output.write(dump_record(newrecord))
end
Another, much heavier option would be to create actual blocking IO pipes and use one thread for each task in the pipeline. This somewhat resembles what Unix does.
Sample usage with OP's syntax:
class Pipe
def initialize
@actions = []
end
def add(&block)
@actions << block
end
def run(infile, outfile)
output = File.open(outfile, "w")
File.open(infile).each do |line|
line.chomp!
@actions.each {|act| line = act[line] }
output.write(line+"\n")
end
end
end
p = Pipe.new
p.add {|line| line.size.to_s }
p.add {|line| "number of chars: #{line}" }
p.run("in.txt", "out.txt")
Sample in.txt:
aaa
12345
h
Generated out.txt:
number of chars: 3
number of chars: 5
number of chars: 1
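The heavier one-thread-per-task option mentioned above could be sketched with bounded queues standing in for the blocking pipes. This is an illustration under assumptions, not a tested design: threaded_pipeline and its capacity: keyword are hypothetical names, and each stage is assumed to be a callable that maps one record to one record.

```ruby
# Hypothetical sketch: one thread per stage, connected by bounded queues.
# A full queue blocks the producer, much like a full Unix pipe would.
def threaded_pipeline(source, stages, capacity: 16)
  done = Object.new # end-of-stream marker passed down the chain
  queues = Array.new(stages.size + 1) { SizedQueue.new(capacity) }

  feeder = Thread.new do
    source.each { |item| queues.first << item }
    queues.first << done
  end

  workers = stages.each_with_index.map do |stage, i|
    Thread.new do
      loop do
        item = queues[i].pop
        break if item.equal?(done)
        queues[i + 1] << stage.call(item)
      end
      queues[i + 1] << done
    end
  end

  # The calling thread acts as the final consumer.
  results = []
  loop do
    item = queues.last.pop
    break if item.equal?(done)
    results << item
  end
  (workers << feeder).each(&:join)
  results
end

stages = [->(line) { line.size.to_s }, ->(line) { "number of chars: #{line}" }]
puts threaded_pipeline(%w[aaa 12345 h], stages)
```

Because each stage is a single thread reading from a FIFO queue, records leave the pipeline in the order they entered it.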
This seems to work:
#!/usr/bin/env ruby
require 'pp'
class Pipe
def initialize
@commands = []
end
def add(command, options = {})
@commands << [command, options]
self
end
def run
enum = nil
@commands.each do |command, options|
enum = method(command).call enum, options
end
enum.each {}
enum
end
def to_s
cmd_string = "Pipe.new"
@commands.each do |command, options|
opt_list = []
options.each do |key, value|
if value.is_a? String
opt_list << "#{key}: \"#{value}\""
else
opt_list << "#{key}: #{value}"
end
end
cmd_string << ".add(:#{command}, #{opt_list.join(", ")})"
end
cmd_string << ".run"
end
private
def cat(enum, options)
Enumerator.new do |yielder|
enum.each { |line| yielder << line } if enum
File.open(options[:input]) do |ios|
ios.each { |line| yielder << line }
end
end.lazy
end
def cut(enum, options)
Enumerator.new do |yielder|
enum.each do |line|
fields = line.chomp.split(%r{#{options[:delimiter]}})
yielder << fields[options[:field]]
end
end.lazy
end
def grep(enum, options)
Enumerator.new do |yielder|
enum.each do |line|
yielder << line if line.match(options[:pattern])
end
end.lazy
end
def save(enum, options)
Enumerator.new do |yielder|
File.open(options[:output], 'w') do |ios|
enum.each do |line|
ios.puts line
yielder << line
end
end
end.lazy
end
end
p = Pipe.new
p.add(:cat, input: "table.txt")
p.add(:cut, field: 2, delimiter: ',\s*')
p.add(:grep, pattern: "4")
p.add(:save, output: "result.txt")
p.run
puts p
https://stackoverflow.com/a/20049201/3183101
require 'benchmark'
def lazy_cat(filename)
e = Enumerator.new do |yielder|
f = File.open filename
s = f.gets
while s
yielder.yield s
s = f.gets
end
end
e.lazy
end
def cat(filename)
Enumerator.new do |yielder|
f = File.open filename
s = f.gets
while s
yielder.yield s
s = f.gets
end
end
end
lazy = Benchmark.realtime { puts lazy_cat("log.txt").map{|s| s.upcase}.take(1).to_a }
puts "Lazy: #{lazy}"
eager = Benchmark.realtime { puts cat("log.txt").map{|s| s.upcase}.take(1).to_a }
puts "Eager: #{eager}"
This could have been simplified to the following, which I think makes the diff between the two methods easier to see.
require 'benchmark'
def cat(filename, evaluation_strategy: :eager)
e = Enumerator.new do |yielder|
f = File.open filename
s = f.gets
while s
yielder.yield s
s = f.gets
end
end
evaluation_strategy == :lazy ? e.lazy : e
end
lazy = Benchmark.realtime { puts cat("log.txt", evaluation_strategy: :lazy).map{ |s|
s.upcase}.take(1).to_a
}
puts "Lazy: #{lazy}"
eager = Benchmark.realtime { puts cat("log.txt", evaluation_strategy: :eager).map{ |s|
s.upcase}.take(1).to_a
}
puts "Eager: #{eager}"
I would have just put this in a comment, but I'm too 'green' here to be permitted to do so. Anyway, the ability to post all of the code I think makes it clearer.
This builds on previous answers, and serves as a warning about a gotcha regarding enumerators. An enumerator that hasn't been exhausted (i.e. raised StopIteration) will not run ensure blocks. That means a construct like File.open { } won't clean up after itself.
Example:
def lazy_cat(filename)
f = nil # visible to the define_singleton_method block
e = Enumerator.new do |yielder|
# Also stored in @f for demonstration purposes only, so we can examine it later
@f = f = File.open filename
s = f.gets
while s
yielder.yield s
s = f.gets
end
end
e.lazy.tap do |enum|
# Provide a finish method to close the File
# We can't use def enum.finish because it can't see 'f'
enum.define_singleton_method(:finish) do
f.close
end
end
end
def get_first_line(path)
enum = lazy_cat(path)
enum.take(1).to_a
end
def get_first_line_with_finish(path)
enum = lazy_cat(path)
enum.take(1).to_a
ensure
enum.finish
end
# foo.txt contains:
# abc
# def
# ghi
puts "Without finish"
p get_first_line('foo.txt')
if @f.closed?
puts "OK: handle was closed"
else
puts "FAIL: handle not closed!"
@f.close
end
puts
puts "With finish"
p get_first_line_with_finish('foo.txt')
if @f.closed?
puts "OK: handle was closed"
else
puts "FAIL: handle not closed!"
@f.close
end
Running this produces:
Without finish
["abc\n"]
FAIL: handle not closed!
With finish
["abc\n"]
OK: handle was closed
Note that if you don't provide the finish method, the stream won't be closed, and you'll leak file descriptors. It's possible that GC will close it, but you shouldn't depend on that.
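One way to sidestep the problem is to keep File.open's block form in charge of the handle and only hand the lazy enumerator to the caller's block. A minimal sketch, where with_lazy_lines is a made-up name and the file contents are invented; the file object is yielded as a second argument only so that the example can inspect it afterwards:

```ruby
require 'tempfile'

# Hypothetical helper: the file stays open only while the caller's block runs.
# File.open's block form closes the handle even if the enumerator is abandoned early.
def with_lazy_lines(path)
  File.open(path) do |file|
    yield file.each_line.lazy, file
  end
end

# Throwaway input with made-up contents.
sample = Tempfile.new('foo')
sample.write("abc\ndef\nghi\n")
sample.close

handle = nil
first_line = with_lazy_lines(sample.path) do |lines, file|
  handle = file
  lines.map(&:upcase).first(1)
end

p first_line
puts handle.closed? ? "OK: handle was closed" : "FAIL: handle not closed!"
```

The trade-off is that the lazy enumerator must be consumed inside the block; it is not usable after the file has been closed.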

ruby Thread#allocate TypeError

I was looking in detail at the Thread class. Basically, I was looking for an elegant mechanism to allow thread-local variables to be inherited as threads are created. For example the functionality I am looking to create would ensure that
Thread.new do
self[:foo]="bar"
t1=Thread.new { puts self[:foo] }
end
=> "bar"
i.e. a Thread would inherit its calling thread's thread-local variables.
So I hit upon the idea of redefining Thread.new, so that I could add an extra step to copy the thread-local variables into the new thread from the current thread. Something like this:
class Thread
def self.another_new(*args)
o=allocate
o.send(:initialize, *args)
Thread.current.keys.each{ |k| o[k]=Thread.current[k] }
o
end
end
But when I try this I get the following error:
:in `allocate': allocator undefined for Thread (TypeError)
I thought that as Thread is a subclass of Object, it should have a working #allocate method. Is this not the case?
Does anyone have any deep insight on this, and on how to achieve the functionality I am looking for.
Thanks in advance
Steve
Thread.new do
Thread.current[:foo]="bar"
t1=Thread.new(Thread.current) do |parent|
puts parent[:foo] ? parent[:foo] : 'nothing'
end.join
end.join
#=> bar
UPDATED:
Try this in irb:
thread_ext.rb
class Thread
def self.another_new(*args)
parent = Thread.current
a = Thread.new(parent) do |parent|
parent.keys.each{ |k| Thread.current[k] = parent[k] }
yield
end
a
end
end
use_case.rb
A = Thread.new do
Thread.current[:local_a]="A"
B1 =Thread.another_new do
C1 = Thread.another_new{p Thread.current[:local_a] }.join
end
B2 =Thread.another_new do
C2 = Thread.another_new{p Thread.current[:local_a] }.join
end
[B1, B2].each{|b| b.join }
end.join
output
"A"
"A"
Here is a revised answer based on @CodeGroover's suggestion, with a simple unit test harness
ext/thread.rb
class Thread
def self.inherit(*args, &block)
parent = Thread.current
t = Thread.new(parent, *args) do |parent|
parent.keys.each{ |k| Thread.current[k] = parent[k] }
yield *args
end
t
end
end
test/thread.rb
require 'test/unit'
require 'ext/thread'
class ThreadTest < Test::Unit::TestCase
def test_inherit
Thread.current[:foo]=1
m=Mutex.new
#check basic inheritence
t1= Thread.inherit do
assert_equal(1, Thread.current[:foo])
end
#check inheritence with parameters - in this case a mutex
t2= Thread.inherit(m) do |m|
assert_not_nil(m)
m.synchronize{ Thread.current[:bar]=2 }
assert_equal(1, Thread.current[:foo])
assert_equal(2, Thread.current[:bar])
sleep 0.1
end
#ensure t2 runs its mutexs-synchronized block first
sleep 0.05
#check that the inheritence works downwards only - not back up in reverse
m.synchronize do
assert_nil(Thread.current[:bar])
end
[t1,t2].each{|x| x.join }
end
end
I was looking for the same thing recently and was able to come up with the following answer. Note I am aware the following is a hack and not recommended, but for the sake of answering the specific question on how you could alter the Thread.new functionality, I have done as following:
class Thread
class << self
alias :original_new :new
def new(*args, **options, &block)
original_thread = Thread.current
instance = original_new(*args, **options, &block)
original_thread.keys.each do |key|
instance[key] = original_thread[key]
end
instance
end
end
end
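If reopening Thread feels too invasive, the same copying step can live in a plain helper method. This is a sketch along the lines of the answers above, not a drop-in replacement: spawn_inheriting is a hypothetical name, and the values are copied when the child starts running, so later changes in the parent are not seen.

```ruby
# Hypothetical helper: start a thread that begins with a copy of the
# caller's Thread#[] values, without modifying the Thread class.
def spawn_inheriting(*args)
  parent = Thread.current
  Thread.new(*args) do |*thread_args|
    # Copy each key from the spawning thread into the new thread.
    parent.keys.each { |key| Thread.current[key] = parent[key] }
    yield(*thread_args)
  end
end

Thread.current[:foo] = "bar"
child = spawn_inheriting { Thread.current[:foo] }
puts child.value # the block's return value, read after the thread finishes
```

Thread#value joins the thread and returns the block's result, which makes the inherited value easy to check.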

Lazy evaluation of infinite enumerator in Ruby 1.9 - calling instance method on object from different class

I'm trying to get to grips with Lazy Evaluation of an enumerator using Ruby 1.9. This is work in progress so will probably have other bugs/missing code but I have one specific problem right now. I'm trying to pass this test (note I cannot change the test):
def test_enumerating_with_a_single_enumerator
enumerator = SomeClass.new(some_infinite_sequence.to_enum)
assert_equal [1, 2, 3, 4, 5], enumerator.take(5)
end
I've written this code below and I know the problem is that I'm calling the lazy_select instance method from the SomeClass on the argument from the initialize method which is an instance of the Enumerator class, so I get a NoMethodError. Any suggestions? Thank you.
class SomeClass < Enumerator
def initialize(*enumerators)
super() do |yielder|
enumerators.each do |enumerator|
enumerator.lazy_select { |yielder, first_value, second_value| yielder.yield first_value if (first_value <=> second_value) <= 0 }
.first(20)
end
end
end
def lazy_select(&block)
self.class.new do |yielder|
each_cons(2) do |first_value, second_value|
block.call(yielder, first_value, second_value)
end
end
end
end
I have one specific problem right now. I'm trying to pass this test
(note I cannot change the test):
def test_enumerating_with_a_single_enumerator
enumerator = SomeClass.new(some_infinite_sequence.to_enum)
assert_equal [1, 2, 3, 4, 5], enumerator.take(5)
end
class SomeClass < Enumerator
def initialize(enum, &block)
super() do |y|
begin
enum.each do |val|
if block
block.call(y, val) #while initializing sc2 in Line B execution takes this branch
else
y << val #while initializing sc1 from Line A execution halts here
end
end
rescue StopIteration
end
end
end
def lazy_take(n)
taken = 0
SomeClass.new(self) do |y, val| #Line B
if taken < n
y << val
taken += 1
else
raise StopIteration
end
end
end
def take(n)
lazy_take(n).to_a
end
end
sc1 = SomeClass.new( (1..6).cycle ) #Line A
p sc1.take(10)
--output:--
[1, 2, 3, 4, 5, 6, 1, 2, 3, 4]
sc2 is the name I'm giving to the anonymous instance created inside lazy_take().
The code is very difficult to understand. The code sets things up so that sc1's enumerator is cycle, and sc2's enumerator is sc1 (initialize() requires that the first arg be an Enumerator). When sc1 is initialized, the code starts stepping through the values in cycle and halts at this line:
y << val
Then when lazy_take() is called, sc2 is created, and its initialization code starts stepping through the values in sc1. But there are no values in sc1, so sc1 executes the line:
y << val
to inject a value from cycle into sc1's yielder. Then sc1's yielder immediately yields the val to sc2, because in sc2's code the each() method is demanding a value from sc1. sc2 then takes val and injects it into sc2's yielder. Then the next iteration of the each block in sc2 takes place, and once again sc2's code demands a value from sc1. sc2 repeatedly demands a value from sc1, which causes sc1 to pass on a value retrieved from cycle. Once sc2 runs the loop n times, it stops demanding values from sc1. The next step is to make sc2 give up the values in its yielder.
If you prefer, you can define initialize() like this:
def initialize(enum)
super() do |y|
begin
enum.each do |val|
if block_given?
yield y, val #while initializing sc2 in Line B execution takes this branch
else
y << val #while initializing sc1 from Line A execution halts here
end
end
rescue StopIteration
end
end
end
That shows that you do not have to specify a block parameter and explicitly call() the block. Instead, you can dispense with the block parameter and call yield(), and the values will be sent to the block automatically.
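Stripped of the enumerator machinery, the two styles look like this. The method names are made up for the comparison; both methods do the same thing.

```ruby
# Explicit block parameter, invoked with call().
def each_doubled_with_call(values, &block)
  values.each { |value| block.call(value * 2) }
end

# No block parameter; yield hands the value to the method's block.
def each_doubled_with_yield(values)
  values.each { |value| yield value * 2 }
end

from_call = []
each_doubled_with_call([1, 2, 3]) { |v| from_call << v }

from_yield = []
each_doubled_with_yield([1, 2, 3]) { |v| from_yield << v }

p from_call
p from_yield
```

Note that the yield inside the inner each block still refers to the block given to the enclosing method, which is why the rewritten initialize() works.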
Thanks for the comments received above. They were very helpful. I've managed to solve it as follows:
class SomeClass < Enumerator
class SomeOtherClass < RuntimeError
attr_reader :enumerator
def initialize(enumerator)
@enumerator = enumerator
end
end
def initialize(*enumerators)
super() do |yielder|
values = []
enumerators.each do |enumerator|
values.push lazy_select(enumerator) { |value| sorted? enumerator }.take(@number_to_take)
end
values.flatten.sort.each { |value| yielder.yield value }
end
end
def lazy_select(enumerator, &block)
Enumerator.new do |yielder|
enumerator.each do |value|
yielder.yield value if block.call enumerator
end
end
end
def sorted?(enumerator)
sorted = enumerator.each_cons(2).take(@number_to_take).all? { |value_pair| compare value_pair }
sorted || raise(SomeClass::SomeOtherClass, enumerator)
end
def compare(pair)
pair.first <= pair.last
end
def take(n)
@number_to_take = n
super
end
end
This passes all my tests.

Mocking an object method within a Thread?

In the situation below, the @crawl object DOES RECEIVE the crawl call, but the method mock fails, i.e. the method is not mocked.
Does Thread somehow create its own copy of the @crawl object, escaping the mock?
@crawl.should_receive(:crawl).with(an_instance_of(String)).twice.and_return(nil)
threads = @crawl.create_threads
thread creation code:
def crawl(uri)
dosomecrawling
end
def create_threads
(1..5).each do
Thread.new do
crawl(someurifeedingmethod)
end
end
end
It does not appear from the code posted that you are joining the threads. If so, there is a race condition: sometimes the test will finish before some or all of the threads have done their job. The fix is along these lines:
#!/usr/bin/ruby1.9
class Crawler
def crawl(uri)
dosomecrawling
end
def create_threads
@threads = (1..5).collect do
Thread.new do
crawl(someurifeedingmethod)
end
end
end
def join
@threads.each do |thread|
thread.join
end
end
end
describe "the above code" do
it "should crawl five times" do
crawler = Crawler.new
uri = "uri"
crawler.should_receive(:someurifeedingmethod).with(no_args).exactly(5).times.and_return(uri)
crawler.should_receive(:crawl).with(uri).exactly(5).times
crawler.create_threads
crawler.join
end
end
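The effect of joining can be seen without RSpec. In this sketch (run_workers is a made-up helper) each thread records its work in a queue, and the count is only guaranteed to be complete because every thread is joined before it is read:

```ruby
# Hypothetical helper: start `count` threads and report how many finished their work.
def run_workers(count)
  finished = Queue.new
  threads = Array.new(count) do |i|
    Thread.new do
      sleep 0.01 # stand-in for real work such as crawling
      finished << i
    end
  end
  # Without this join, the size below could be read before the threads are done.
  threads.each(&:join)
  finished.size
end

puts run_workers(5)
```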
This code works perfectly.
You can set the expectation five times.
class Hello
def crawl(uri)
puts uri
end
def create_threads
(1..5).each do
Thread.new do
crawl('http://hello')
end
end
end
end
describe 'something' do
it 'should mock' do
crawl = Hello.new
5.times do
crawl.should_receive(:crawl).with(an_instance_of(String)).and_return(nil)
end
threads = crawl.create_threads
end
end
