I am trying to emulate UNIX command line pipes in a Ruby-only solution that uses multiple cores. Eventually, the records piped from command to command will be Ruby objects marshaled using msgpack. Unfortunately, the below code hangs after the first dump command. I suspect a pipe deadlock, but I fail to resolve it?
#!/usr/bin/env ruby
require 'parallel'
require 'msgpack'
require 'pp'
class Pipe
def initialize
#commands = []
end
def add(command, options = {})
#commands << Command.new(command, options)
self
end
def run
#commands.each_cons(2) do |c_in, c_out|
reader, writer = IO.pipe
c_out.input = MessagePack::Unpacker.new(reader)
c_in.output = MessagePack::Packer.new(writer)
end
Parallel.each(#commands, in_processes: #commands.size) { |command| command.run }
self
end
class Command
attr_accessor :input, :output
def initialize(command, options)
#command = command
#options = options
#input = nil
#output = nil
end
def run
send #command
end
def cat
#input.each_with_index { |record, i| #output.write(record).flush } if #input
File.open(#options[:input]) do |ios|
ios.each { |record| #output.write(record).flush } if #output
end
end
def dump
#input.each do |record|
puts record
#output.write(record).flush if #output
end
end
end
end
p = Pipe.new
p.add(:cat, input: "foo.tab").add(:dump).add(:cat, input: "table.txt").add(:dump)
p.run
Related
I am writing a Ruby script which loops through some hashes and executes the command in the hash, like so
$conf_info = {
"OpenSSH" => {
"type" => "static",
"cmd" => %q[sshd -T],
"msg" => "Testing OpenSSH Configuration",
"res" => $cmd_results,
}
}
I have multiple of these hashes and I loop through all of them, executing each command. I am also using threads. The problem is that I can't output to a file. I have separate threads like this.
threads = []
get_enums($options[:enumerate])
threads << Thread.new {_run_($sys_info , $options[:system] , $enum_sys) }
threads << Thread.new {_run_($net_info , $options[:network] , $enum_net) }
threads.each {|t| t.join}
and I output to file like this
File.open("test_file", "w") do |file|
file.puts __start__
end
but, the file is only filled with contents like this
#<Thread:0x98993fc>
#<Thread:0x9872fcc>
and not the actual output of the program. I would also need the program to output to STDOUT and the file, may someone please tell me what I might be doing wrong?
You can't capture output of __start__ by assigning it to a variable. You can redirect output instead:
With TeeIO (Updated based from this answer too):
class TeeIO
def initialize(*ios)
ios.each{ |e| raise ArgumentError, "Not an IO object: #{e}" if not e.is_a? IO }
#ios = ios
end
def write(data)
#ios.each{ |io| io.write(data) }
end
def close
#ios.each{ |io| io.close }
end
def flush
#ios.each{ |io| io.flush }
end
def method_missing(meth, *args)
# Return first if it's not an IO or else return self.
first = #ios.map{ |io| io.send(meth, *args) }.first
first.is_a? IO ? self : first
end
def respond_to_missing?(meth, include_all)
#ios.all?{ |io| io.respond_to?(meth, include_all) }
end
end
...
real_stdout = $stdout
file = File.open("test_file", "w")
$stdout = TeeIO.new(real_stdout, file)
__start__
$stdout = real_stdout
file.close
I would like to create a Pipe class to emulate Unix commands in Ruby in a two step fashion. First step is to compile a pipeline by adding a number of commands, and the second step is to run that pipeline. Here is a mockup:
#!/usr/bin/env ruby
p = Pipe.new
p.add(:cat, input: "table.txt")
p.add(:cut, field: 2)
p.add(:grep, pattern: "foo")
p.add(:puts, output: "result.txt")
p.run
The question is how to code this using lazy evaluation, so that the pipe is processed record by record when run() is called without loading all of the data into memory at any one time?
Take a look at the http://ruby-doc.org/core-2.0.0/Enumerator.html class. The Pipe class will stitch together an Enumerator, e.g. add(:cat, input: 'foo.txt') will create an enumerator which yields lines of foo.txt. add(:grep) will filter it according to regexp etc.
Here's the lazy file reader
require 'benchmark'
def lazy_cat(filename)
e = Enumerator.new do |yielder|
f = File.open filename
s = f.gets
while s
yielder.yield s
s = f.gets
end
end
e.lazy
end
def cat(filename)
Enumerator.new do |yielder|
f = File.open filename
s = f.gets
while s
yielder.yield s
s = f.gets
end
end
end
lazy = Benchmark.realtime { puts lazy_cat("log.txt").map{|s| s.upcase}.take(1).to_a }
puts "Lazy: #{lazy}"
eager = Benchmark.realtime { puts cat("log.txt").map{|s| s.upcase}.take(1).to_a }
puts "Eager: #{eager}"
Eager version takes 7 seconds for 10 million line file, lazy version takes pretty much no time.
For what I understood you can simply read one line at a time and move this single line thought the pipeline, then write it to the output. Some code:
output = File.new("output.txt")
File.new("input.txt").each do |line|
record = read_record(line)
newrecord = run_pipeline_on_one_record(record)
output.write(dump_record(newrecord))
end
Another much heavier option would be create actual IO blocking pipes and use one thread for each task in the pipeline. This somewhat reassembles what Unix does.
Sample usage with OP's syntax:
class Pipe
def initialize
#actions = []
end
def add(&block)
#actions << block
end
def run(infile, outfile)
output = File.open(outfile, "w")
File.open(infile).each do |line|
line.chomp!
#actions.each {|act| line = act[line] }
output.write(line+"\n")
end
end
end
p = Pipe.new
p.add {|line| line.size.to_s }
p.add {|line| "number of chars: #{line}" }
p.run("in.txt", "out.txt")
Sample in.txt:
aaa
12345
h
Generated out.txt:
number of chars: 3
number of chars: 5
number of chars: 1
This seems to work:
#!/usr/bin/env ruby
require 'pp'
class Pipe
def initialize
#commands = []
end
def add(command, options = {})
#commands << [command, options]
self
end
def run
enum = nil
#commands.each do |command, options|
enum = method(command).call enum, options
end
enum.each {}
enum
end
def to_s
cmd_string = "Pipe.new"
#commands.each do |command, options|
opt_list = []
options.each do |key, value|
if value.is_a? String
opt_list << "#{key}: \"#{value}\""
else
opt_list << "#{key}: #{value}"
end
end
cmd_string << ".add(:#{command}, #{opt_list.join(", ")})"
end
cmd_string << ".run"
end
private
def cat(enum, options)
Enumerator.new do |yielder|
enum.map { |line| yielder << line } if enum
File.open(options[:input]) do |ios|
ios.each { |line| yielder << line }
end
end.lazy
end
def cut(enum, options)
Enumerator.new do |yielder|
enum.each do |line|
fields = line.chomp.split(%r{#{options[:delimiter]}})
yielder << fields[options[:field]]
end
end.lazy
end
def grep(enum, options)
Enumerator.new do |yielder|
enum.each do |line|
yielder << line if line.match(options[:pattern])
end
end.lazy
end
def save(enum, options)
Enumerator.new do |yielder|
File.open(options[:output], 'w') do |ios|
enum.each do |line|
ios.puts line
yielder << line
end
end
end.lazy
end
end
p = Pipe.new
p.add(:cat, input: "table.txt")
p.add(:cut, field: 2, delimiter: ',\s*')
p.add(:grep, pattern: "4")
p.add(:save, output: "result.txt")
p.run
puts p
https://stackoverflow.com/a/20049201/3183101
require 'benchmark'
def lazy_cat(filename)
e = Enumerator.new do |yielder|
f = File.open filename
s = f.gets
while s
yielder.yield s
s = f.gets
end
end
e.lazy
end
def cat(filename)
Enumerator.new do |yielder|
f = File.open filename
s = f.gets
while s
yielder.yield s
s = f.gets
end
end
end
lazy = Benchmark.realtime { puts lazy_cat("log.txt").map{|s| s.upcase}.take(1).to_a }
puts "Lazy: #{lazy}"
eager = Benchmark.realtime { puts cat("log.txt").map{|s| s.upcase}.take(1).to_a }
puts "Eager: #{eager}"
This could have been simplified to the following, which I think makes the diff between the two methods easier to see.
require 'benchmark'
def cat(filename, evaluation_strategy: :eager)
e = Enumerator.new do |yielder|
f = File.open filename
s = f.gets
while s
yielder.yield s
s = f.gets
end
end
e.lazy if evaluation_strategy == :lazy
end
lazy = Benchmark.realtime { puts cat("log.txt", evaluation_strategy: :lazy).map{ |s|
s.upcase}.take(1).to_a
}
puts "Lazy: #{lazy}"
eager = Benchmark.realtime { puts cat("log.txt", evaluation_strategy: :eager).map{ |s|
s.upcase}.take(1).to_a
}
puts "Eager: #{eager}"
I would have just put this in a comment, but I'm too 'green' here to be permitted to do so. Anyway, the ability to post all of the code I think makes it clearer.
This builds on previous answers, and serves as a warning about a gotcha regarding enumerators. An enumerator that hasn't been exhausted (i.e. raised StopIteration) will not run ensure blocks. That means a construct like File.open { } won't clean up after itself.
Example:
def lazy_cat(filename)
f = nil # visible to the define_singleton_method block
e = Enumerator.new do |yielder|
# Also stored in #f for demonstration purposes only, so we examine it later
#f = f = File.open filename
s = f.gets
while s
yielder.yield s
s = f.gets
end
end
e.lazy.tap do |enum|
# Provide a finish method to close the File
# We can't use def enum.finish because it can't see 'f'
enum.define_singleton_method(:finish) do
f.close
end
end
end
def get_first_line(path)
enum = lazy_cat(path)
enum.take(1).to_a
end
def get_first_line_with_finish(path)
enum = lazy_cat(path)
enum.take(1).to_a
ensure
enum.finish
end
# foo.txt contains:
# abc
# def
# ghi
puts "Without finish"
p get_first_line('foo.txt')
if #f.closed?
puts "OK: handle was closed"
else
puts "FAIL: handle not closed!"
#f.close
end
puts
puts "With finish"
p get_first_line_with_finish('foo.txt')
if #f.closed?
puts "OK: handle was closed"
else
puts "FAIL: handle not closed!"
#f.close
end
Running this produces:
Without finish
["abc\n"]
FAIL: handle not closed!
With finish
["abc\n"]
OK: handle was closed
Note that if you don't provide the finish method, the stream won't be closed, and you'll leak file descriptors. It's possible that GC will close it, but you shouldn't depend on that.
I wrote the following code:
class Actions
def initialize
#people = []
#commands = {
"ADD" => ->(name){#people << name },
"REMOVE" => ->(n=0){ puts "Goodbye" },
"OTHER" => ->(n=0){puts "Do Nothing" }
}
end
def run_command(cmd,*param)
#commands[cmd].call param if #commands.key?(cmd)
end
def people
#people
end
end
act = Actions.new
act.run_command('ADD','joe')
act.run_command('ADD','jack')
puts act.people
This works, however, when the #commands hash is a class variable, the code inside the hash doesn't know the #people array.
How can I make the #commands hash be a class variable and still be able to access the specific object instance variables?
You could use instance_exec to supply the appropriate context for the lambdas when you call them, look for the comments to see the changes:
class Actions
# Move the lambdas to a class variable, a COMMANDS constant
# would work just as well and might be more appropriate.
##commands = {
"ADD" => ->(name) { #people << name },
"REMOVE" => ->(n = 0) { puts "Goodbye" },
"OTHER" => ->(n = 0) { puts "Do Nothing" }
}
def initialize
#people = [ ]
end
def run_command(cmd, *param)
# Use instance_exec and blockify the lambdas with '&'
# to call them in the context of 'self'. Change the
# ##commands to COMMANDS if you prefer to use a constant
# for this stuff.
instance_exec(param, &##commands[cmd]) if ##commands.key?(cmd)
end
def people
#people
end
end
EDIT Following #VictorMoroz's and #mu's recommendations:
class Actions
def initialize
#people = []
end
def cmd_add(name)
#people << name
end
def cmd_remove
puts "Goodbye"
end
def cmd_other
puts "Do Nothing"
end
def people
p #people
end
def run_command(cmd, *param)
cmd = 'cmd_' + cmd.to_s.downcase
send(cmd, *param) if respond_to?(cmd)
end
end
act = Actions.new
act.run_command('add', 'joe')
act.run_command(:ADD, 'jill')
act.run_command('ADD', 'jack')
act.run_command('people') # does nothing
act.people
Or
class Actions
ALLOWED_METHODS = %w( add remove other )
def initialize
#people = []
end
def add(name)
#people << name
end
def remove
puts "Goodbye"
end
def other
puts "Do Nothing"
end
def people
p #people
end
def run_command(cmd, *param)
cmd = cmd.to_s.downcase
send(cmd, *param) if ALLOWED_METHODS.include?(cmd)
end
end
act = Actions.new
act.run_command('add', 'joe')
act.run_command(:add, 'jill')
act.run_command('add', 'jack')
act.run_command('people') # does nothing
act.people
I am trying to write fails and errors that occur in the test to a log file, so that they don't appear on-screen, but it appears as though errors and assert failures write to STDOUT instead of STDERR. I have been unable to find information on how to redirect this output after hours of googling, and would greatly appreciate help.
Why should the errors not be on stdout?
Up to now I have no prepared solution to suppress the output for specific errors.
If you accept the unchanged output, I have a solution to store errors and failures in a file. It will be no problem to create a 2nd file for Notifications...
gem 'test-unit'
require 'test/unit'
module Test
module Unit
class TestSuite
alias :old_run :run
def run(result, &progress_block)
old_run(result, &progress_block)
File.open('test.log', 'w'){|f|
result.faults.each{|err|
case err
when Test::Unit::Error, Test::Unit::Failure
f << err.long_display
f << "\n===========\n"
#not in log file
when Test::Unit::Pending, Test::Unit::Notification, Test::Unit::Omission
end
}
}
end
end
end
end
class MyTest < Test::Unit::TestCase
def test_1()
assert_equal( 3, 1+1) #failure
end
def test_2()
1 / 0 #force an error
end
def test_3()
notify 'sss'
end
def test_4()
pend "MeineKlasse.new"
end
def test_5
omit 'aaa' if RUBY_VERSION == '1.9.2'
end
def test_5
assert_in_delta( 0.1, 0.00001, 1.0/10)
end
end
Have you tried redirecting the output to StringIO, to write it to a logfile later?
original_stdout = $stdout
original_stderr = $stderr
fake_stdout = StringIO.new
fake_stderr = StringIO.new
$stdout = fake_stdout
$stderr = fake_stderr
Then after you've ran the tests:
$stdout = original_stdout
$stderr = original_stderr
#stdout = fake_stdout.string
#stderr = fake_stderr.string
I'm not really sure this will work though...
I hope I understood yor question correct.
Do you mean something like this:
gem 'test-unit'
require 'test/unit'
class StdOutLogger < IO
def initialize(*)
super
#file = File.open('log.txt', 'w')
#stdout = true
end
#write to stdout and file
def write(s)
$stdout << s
#file << s
end
end
STDOUT = StdOutLogger.new(1)
class MyTest < Test::Unit::TestCase
def test_1()
assert_equal( 2, 1+1)
assert_equal( 2, 4/2)
assert_equal( 1, 3/2)
assert_equal( 1.5, 3/2.0)
end
end
But I would recommend to copy stdout on operation system level
ruby mytest.rb > log.txt
Here is a version to skip between stdout and file output. The output switch is only a interim solution - perhaps it can be made better.
class StdOutLogger < IO
def initialize(*)
super
#file = File.open('log.txt', 'w')
#stdout = true
end
def write(s)
case s
when /\AError:/
#stdout = false #change to file
when /\A(Pending):/
#stdout = true #change to file
end
if #stdout
$stdout << s
else
#file << s
end
end
end
STDOUT = StdOutLogger.new(1)
I have a testing library to assist in testing logging:
require 'stringio'
module RedirectIo
def setup
$stderr = #stderr = StringIO.new
$stdin = #stdin = StringIO.new
$stdout = #stdout = StringIO.new
super
end
def teardown
$stderr = STDERR
$stdin = STDIN
$stdout = STDOUT
super
end
end
It is meant to be used like so:
require 'lib/redirect_io'
class FooTest < Test::Unit::TestCase
include RedirectIo
def test_logging
msg = 'bar'
Foo.new.log msg
assert_match /^#{TIMESTAMP_REGEX} #{msg}$/, #stdout.string, 'log message'
end
end
Of course, I have a unit test for my test library. :)
require 'lib/redirect_io'
class RedirectIoTest < Test::Unit::TestCase
def test_setup_and_teardown_are_mixed_in
%W{setup teardown}.each do |method|
assert_not_equal self.class, self.class.new(__method__).method(method).owner, "owner of method #{method}"
end
self.class.class_eval { include RedirectIo }
%W{setup teardown}.each do |method|
assert_equal RedirectIo, self.class.new(__method__).method(method).owner, "owner of method #{method}"
end
end
def test_std_streams_captured
%W{STDERR STDIN STDOUT}.each do |stream_name|
assert_equal eval(stream_name), self.class.new(__method__).instance_eval("$#{stream_name.downcase}"), stream_name
end
self.class.class_eval { include RedirectIo }
setup
%W{STDERR STDIN STDOUT}.each do |stream_name|
assert_not_equal eval(stream_name), self.class.new(__method__).instance_eval("$#{stream_name.downcase}"),
stream_name
end
teardown
%W{STDERR STDIN STDOUT}.each do |stream_name|
assert_equal eval(stream_name), self.class.new(__method__).instance_eval("$#{stream_name.downcase}"), stream_name
end
end
end
I'm having trouble figuring out how to test that RedirectIo.setup and teardown call super. Any ideas?
I figured out one way to do it, which actually makes the test code much cleaner as well!
require 'lib/redirect_io'
class RedirectIoTest < Test::Unit::TestCase
class RedirectIoTestSubjectSuper < Test::Unit::TestCase
def setup
#setup = true
end
def teardown
#teardown = true
end
end
class RedirectIoTestSubject < RedirectIoTestSubjectSuper
include RedirectIo
def test_fake; end
end
def test_setup_and_teardown_are_mixed_in
%W{setup teardown}.each do |method|
assert_not_equal RedirectIoTestSubject, self.method(method).owner, "owner of method #{method}"
end
%W{setup teardown}.each do |method|
assert_equal RedirectIo, RedirectIoTestSubject.new(:test_fake).method(method).owner, "owner of method #{method}"
end
end
def test_std_streams_captured
obj = RedirectIoTestSubject.new(:test_fake)
$stderr = STDERR
$stdin = STDIN
$stdout = STDOUT
obj.setup
%W{STDERR STDIN STDOUT}.each do |stream_name|
assert_not_equal eval(stream_name), eval("$#{stream_name.downcase}"), stream_name
end
obj.teardown
%W{STDERR STDIN STDOUT}.each do |stream_name|
assert_equal eval(stream_name), eval("$#{stream_name.downcase}"), stream_name
end
end
def test_setup_and_teardown_call_super
obj = RedirectIoTestSubject.new(:test_fake)
obj.setup
assert obj.instance_eval{#setup}, 'super called?'
obj.teardown
assert obj.instance_eval{#teardown}, 'super called?'
end
end