Ignore directories passed as arguments in ARGF - ruby

I want to write a script that does some work on some files specified on a command line, but I want the user to be able to call
fooscript *
even if * expands to include some directory names
I know I can use ARGF to handle the files
ARGF.each do |f|
puts f
end
But this gives me an error for the case of a directory. I'd like to skip over directories, or perhaps handle them specially. What's the most idiomatic way to accomplish this in Ruby?

You could just filter them from ARGV before reading ARGF:
ARGV.reject! { |f| File.directory?(f) }
# Now there are no dirs in ARGF...
ARGF.each do |line|
  # ... etc.
end
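If you would rather handle directories specially than drop them, one option (a sketch, assuming you want each directory argument expanded into the files directly inside it) is to rewrite ARGV before ARGF starts reading:
# Expand each directory argument into its (non-directory) contents,
# leaving ordinary file arguments untouched.
ARGV.replace(ARGV.flat_map do |arg|
  if File.directory?(arg)
    Dir.glob(File.join(arg, '*')).reject { |path| File.directory?(path) }
  else
    [arg]
  end
end)

ARGF.each do |line|
  puts line
end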


Running a set of Ruby scripts in Terminal

I want to repeatedly run a Ruby script in the Mac Terminal that does a text search on a text file. The script works well on a single text file at a time, but I want to run it over a sequence of files.
I've tried creating a script in Automator but had no luck. As an FYI, the Ruby script is attached below, but that is not the issue.
Thank you
#!/usr/bin/env ruby
require 'yaml'
require 'csv'
abort "You must specify one or more files to search." if ARGV.size == 0
search_terms = "---
:stage1:
JTSJ3:
- text term 1
:stage2n:
JTSJ3:
- nothing
:stage2p:
JTSJ3:
- text term 2
:stage3:
JTSJ3:
- nothing"
...
File.open(File.join(result_dir, 'results_stage3.yml'), 'w') do |f|
  f.write stage3_results.to_yaml
end
File.open(File.join(result_dir, 'results_stage3.csv'), 'w') do |f|
  f.write csv_header.to_csv
  stage3_results.each do |r|
    f.write [ r[:category], r[:term], r[:line], r[:text], r[:file] ].to_csv
  end
end
There are a few patterns you can use here, but the key to all of them is to define a simple entry-point method that you can call as necessary, instead of having all that code strewn about in the main namespace.
First, pull filenames from the ARGV arguments list:
ARGV.each do |file|
  process(file)
end
You can use File.basename(file, '.yml') to strip off the extension, and swap in .csv when you build the output filename. Keep your method as generic as possible.
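Putting that together, a minimal sketch of the entry-point pattern (the body of process is a placeholder for whatever the search/CSV work actually is):
#!/usr/bin/env ruby
require 'yaml'
require 'csv'

# Wrap the per-file work in one method so it can be driven by ARGV,
# Dir.glob, or xargs without touching the logic itself.
def process(file)
  data = YAML.load_file(file)
  out  = File.join(File.dirname(file), File.basename(file, '.yml') + '.csv')
  # ... search `data` and write rows to `out` here ...
  puts "processed #{file} -> #{out}"
end

abort "You must specify one or more files to search." if ARGV.empty?
ARGV.each { |file| process(file) }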
Secondly, you can use xargs externally:
find . -name '*.yml' | xargs ruby program.rb
Where that will append all the files matching that pattern as arguments to your program. You can even tweak the options to run in parallel:
find . -name '*.yml' | xargs -n 2 -P 8 ruby program.rb
Where that runs up to 8 parallel processes (-P 8), each invocation processing up to two files (-n 2).
You can also do this yourself with:
Dir.glob('source_dir/**/*.yml') do |file|
  process(file)
end
Dir.glob is great at collecting every file that matches a pattern. To parallelize that you can either use threads or forking; xargs is a quick way to get that for "free".
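For what it's worth, here is a rough sketch of the threaded variant (assuming process does not share mutable state between files):
# Split the file list into 8 roughly equal slices and run each slice in a thread.
files = Dir.glob('source_dir/**/*.yml')
slice_size = [(files.size / 8.0).ceil, 1].max
threads = files.each_slice(slice_size).map do |slice|
  Thread.new { slice.each { |file| process(file) } }
end
threads.each(&:join)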

Discover the file ruby require method would load?

The require method in Ruby searches the load path and loads the first matching file it finds, if needed. Is there any way to print the path to the file that would be loaded? I'm looking for (ideally built-in) functionality similar to the which command in bash, and I'm hoping it can be that simple too. Thanks.
I don't know of a built-in functionality, but defining your own isn't hard. Here's a solution adapted from this question:
def which(string)
  $:.each do |p|
    if File.exist? File.join(p, string)
      puts File.join(p, string)
      break
    end
  end
end
which 'nokogiri'
#=> /opt/local/lib/ruby1.9/gems/1.9.1/gems/nokogiri-1.4.1/lib/nokogiri
Explanation: $: is a pre-defined variable. It's an array of places to search for files you can load or require. The which method iterates through each of those paths looking for the file you called it on; if it finds a match, it prints the file path and stops.
I'm assuming you just want the output to be a single line showing the full filepath of the required file, like which. If you want to also see the files your required file will load itself, something like the solution in the linked question might be more appropriate:
module Kernel
  def require_and_print(string)
    $:.each do |p|
      if File.exist? File.join(p, string)
        puts File.join(p, string)
        break
      end
    end
    require_original(string)
  end

  alias_method :require_original, :require
  alias_method :require, :require_and_print
end
require 'nokogiri'
#=> /opt/local/lib/ruby1.9/gems/1.9.1/gems/nokogiri-1.4.1/lib/nokogiri
# /opt/local/lib/ruby1.9/gems/1.9.1/gems/rubygems-update-1.3.5/lib/rbconfig
# /opt/local/lib/ruby1.9/gems/1.9.1/gems/nokogiri-1.4.1/lib/nokogiri/xml
# /opt/local/lib/ruby1.9/gems/1.9.1/gems/nokogiri-1.4.1/lib/nokogiri/xml/pp
# /opt/local/lib/ruby1.9/gems/1.9.1/gems/nokogiri-1.4.1/lib/nokogiri/xml/sax
# /opt/local/lib/ruby1.9/gems/1.9.1/gems/nokogiri-1.4.1/lib/nokogiri/xml/node
# /opt/local/lib/ruby1.9/gems/1.9.1/gems/nokogiri-1.4.1/lib/nokogiri/xml/xpath
# /opt/local/lib/ruby1.9/gems/1.9.1/gems/nokogiri-1.4.1/lib/nokogiri/xslt
# /opt/local/lib/ruby1.9/gems/1.9.1/gems/nokogiri-1.4.1/lib/nokogiri/html
# /opt/local/lib/ruby1.9/gems/1.9.1/gems/nokogiri-1.4.1/lib/nokogiri/css
# /opt/local/lib/ruby1.9/1.9.1/racc/parser.rb
$ gem which filename # (no .rb suffix) is what I use...

Good Way to Handle Many Different Files?

I'm building a specialized pipeline, and basically, every step in the pipeline involves taking one file as input and creating a different file as output. Not all files are in the same directory, all output files are of a different format, and because I'm using several different programs, different actions have to be taken to appease the different programs.
This has led to some complicated file management in my code, and the more I try to organize the file directories, the more ugly it's getting. Just about every class involves some sort of code like the following:
@fileName = File.basename(file)
@dataPath = "#{$path}/../data/"
MzmlToOther.new("mgf", "#{@dataPath}/spectra/#{@fileName}.mzML", 1, false).convert
system("wine readw.exe --mzXML #{@file}.raw #{$path}../data/spectra/#{File.basename(@file + ".raw", ".raw")}.mzXML 2>/dev/null")
fileName = "#{$path}../data/" + parts[0] + parts[1][6..parts[1].length-1].chomp(".pep.xml")
Is there some sort of design pattern, or ruby gem, or something to clean this up? I like writing clean code, so this is really starting to bother me.
You could use a Makefile.
Make is essentially a DSL designed for converting one type of file into another by running external programs. As an added bonus, it will perform only the steps necessary to incrementally rebuild your output when some of the source files change.
If you really want to stay in Ruby, try a Rakefile. Rake will do this, and it's still Ruby.
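For example, a minimal Rakefile sketch; the .raw to .mzXML step and the readw.exe command are borrowed from the question, while the paths and task name are illustrative:
# Rakefile
RAW_FILES   = Rake::FileList['data/spectra/*.raw']
MZXML_FILES = RAW_FILES.ext('.mzXML')

# Teach rake how to build a .mzXML file from the matching .raw file.
rule '.mzXML' => '.raw' do |t|
  sh "wine readw.exe --mzXML #{t.source} #{t.name}"
end

desc 'Convert all raw spectra (run with: rake spectra)'
task :spectra => MZXML_FILES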
You can make this as sophisticated as you want, but this basic script matches a file suffix to a handler, which it then calls with the file path.
# a conversion method can be used for each file type if you want to
# make the code more readable or if you need to rearrange filenames.
def htm_convert file
  "HTML #{file}"
end

# file suffix as key, lambda as value; the last entry uses an external method
routines = {
  :log  => lambda { |file| puts "LOG #{file}" },
  :rb   => lambda { |file| puts "RUBY #{file}" },
  :haml => lambda { |file| puts "HAML #{file}" },
  :htm  => lambda { |file| puts htm_convert(file) }
}

# this loops recursively through the directory and its subfolders
Dir['**/*.*'].each do |f|
  suffix = f.split(".")[-1]
  if routine = routines[suffix.to_sym]
    routine.call(f)
  else
    puts "UNPROCESSED -- #{f}"
  end
end
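One small tweak worth considering: File.extname is a bit more robust than split(".")[-1] when a path has dots elsewhere in it (e.g. ./some.dir/README.htm). A sketch of the same dispatch using it:
Dir['**/*.*'].each do |f|
  suffix = File.extname(f).sub(/\A\./, '')   # ".rb" -> "rb"
  if (routine = routines[suffix.to_sym])
    routine.call(f)
  else
    puts "UNPROCESSED -- #{f}"
  end
end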

Loop from contents of a file in ruby

Okay, so I am new to Ruby and I have a strong background in bash/ksh/sh.
What I am trying to do is use a simple for loop to run a command across several servers. In bash I would do it like:
for SERVER in `cat etc/SERVER_LIST`
do
  ssh -q ${SERVER} "ls -l /etc"
done
etc/SERVER_LIST is just a file that looks like:
server1
server2
server3
etc
I can't seem to get this right in Ruby. This is what I have so far:
#!/usr/bin/ruby
### SSH testing
#
#
require 'net/ssh'

File.open("etc/SERVER_LIST") do |f|
  f.each_line do |line|
    Net::SSH.start(line, 'andex') do |ssh|
      result = ssh.exec!("ls -l")
      puts result
    end
  end
end
I'm getting these errors now:
andex#master:~/sysauto> ./ssh2.rb
/usr/lib64/ruby/gems/1.8/gems/net-ssh-2.0.23/lib/net/ssh/transport/session.rb:65:in `initialize': newline at the end of hostname (SocketError)
from /usr/lib64/ruby/gems/1.8/gems/net-ssh-2.0.23/lib/net/ssh/transport/session.rb:65:in `open'
from /usr/lib64/ruby/gems/1.8/gems/net-ssh-2.0.23/lib/net/ssh/transport/session.rb:65:in `initialize'
from /usr/lib64/ruby/1.8/timeout.rb:53:in `timeout'
from /usr/lib64/ruby/1.8/timeout.rb:93:in `timeout'
from /usr/lib64/ruby/gems/1.8/gems/net-ssh-2.0.23/lib/net/ssh/transport/session.rb:65:in `initialize'
from /usr/lib64/ruby/gems/1.8/gems/net-ssh-2.0.23/lib/net/ssh.rb:179:in `new'
from /usr/lib64/ruby/gems/1.8/gems/net-ssh-2.0.23/lib/net/ssh.rb:179:in `start'
from ./ssh2.rb:10
from ./ssh2.rb:9:in `each_line'
from ./ssh2.rb:9
from ./ssh2.rb:8:in `open'
from ./ssh2.rb:8
The file path is correct; I am deliberately using a relative path, since I run this from a scripting directory that keeps the file in a subdirectory called etc (not the system /etc).
File.open("/etc/SERVER_LIST", "r") do |file_handle|
file_handle.each_line do |server|
# do stuff to server here
end
end
The first line opens the file for reading and immediately goes into a block. (The block is the code between do and end. You can also surround blocks with just { and }; the rule of thumb is do..end for multi-line blocks and {...} for single-line blocks. Blocks are very common in Ruby, and far more idiomatic than a while or for loop.) The call to open receives the file handle automatically, and you give it a name between the pipes.
Once you have a hold of that, so to speak, you can call each_line on it and iterate over it as if it were an array. Again, each iteration automatically passes you a line, which you can name whatever you like between the pipes.
The nice thing about this method is that it saves you the trouble of closing the file when you're finished with it. A file opened this way will automatically get closed as you leave the outer block.
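For comparison, this is roughly what the non-block form looks like; the begin/ensure bookkeeping is what the block version does for you:
f = File.open("/etc/SERVER_LIST", "r")
begin
  f.each_line do |server|
    # do stuff to server here
  end
ensure
  f.close   # you have to remember to close it yourself
end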
One other thing: The file is almost certainly named /etc/SERVER_LIST. You need the initial / to indicate the root of the file system (unless you are intentionally using a relative value for the path to the file, which I doubt). That alone may have kept you from getting the file open.
Update for new error: Net::SSH is barfing up over the newline. Where you have this:
Net::SSH.start(line, 'andex') do |ssh|
make it this:
Net::SSH.start(line.chomp, 'andex') do |ssh|
The chomp method removes any final newline character from a string.
Use File.foreach:
require 'net/ssh'

File.foreach('etc/SERVER_LIST', "\n") do |line|
  # chomp strips the trailing newline, which Net::SSH rejects
  Net::SSH.start(line.chomp, 'andex') do |ssh|
    result = ssh.exec!("ls -l")
    puts result
  end
end
The most common construct I see when doing by-line iteration of a file is:
File.open("etc/SERVER_LIST") do |f|
f.each_line do |line|
# do something here
end
end
To expand on the above with some more general Ruby info... this syntax is equivalent to:
File.open("etc/SERVER_LIST") { |f|
f.each_line { |line|
# do something here
}
}
When I was first introduced to Ruby, I had no idea what the |f| and |line| syntax meant. I knew when to use it and how it worked, but not why they chose that syntax. It is, in my opinion, one of the magical things about Ruby. That simple syntax above is actually hiding a very advanced programming concept right under your nose. The code nested inside the "do"/"end" or { } is called a block, and you can think of it as an anonymous function or lambda. The |f| and |line| syntax is just the handle to the parameter passed to the block by the executing parent.
In the case of File.open(), the anonymous function takes a single argument, which is the handle to the underlying File IO object.
In the case of each_line, this is an iterator method which gets called once for every line. The |line| is simply a variable handle to the data that gets passed with each iteration.
Oh, and one nice thing about do/end with File.open is it automatically closes the file at the end.
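To make the "anonymous function" idea concrete, here is a tiny made-up example of a method that takes a block:
# yield hands each value to whatever block the caller supplied;
# the |name| between the pipes is that block's parameter.
def each_server
  yield "server1"
  yield "server2"
end

each_server { |name| puts name.upcase }
# SERVER1
# SERVER2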
Edit:
The error you're getting now suggests the SSH call doesn't appreciate the extra whitespace (newline) at the end of the string. To fix this, simply do a
Net::SSH.start(line.strip, 'andex') do |ssh|
end
Reading in lines from a file is a common operation, and Ruby has an easy way to do this:
servers = File.readlines('/etc/SERVER_LIST')
The readlines method will open the file, read the file into an array, and close the file for you (so you don't have to worry about any of that). The variable servers will be an array of strings; each string will be a line from the file. You can use the Array::each method to iterate through this array and use the code you already have. Try this:
servers = File.readlines('/etc/SERVER_LIST')
servers.each { |s|
  # chomp the trailing newline that readlines leaves on each entry
  Net::SSH.start(s.chomp, 'andex') { |ssh| puts ssh.exec!("ls -l") }
}
I think this is what you want for the in 'initialize': newline at the end of hostname (SocketError) error:
Net::SSH.start(line.chomp, 'andex')
The each_line method includes the "\n", and the chomp will remove it.

How to search file text for a pattern and replace it with a given value

I'm looking for a script to search a file (or list of files) for a pattern and, if found, replace that pattern with a given value.
Thoughts?
Disclaimer: This approach is a naive illustration of Ruby's capabilities, and not a production-grade solution for replacing strings in files. It's prone to various failure scenarios, such as data loss in case of a crash, interrupt, or disk being full. This code is not fit for anything beyond a quick one-off script where all the data is backed up. For that reason, do NOT copy this code into your programs.
Here's a quick short way to do it.
file_names = ['foo.txt', 'bar.txt']

file_names.each do |file_name|
  text = File.read(file_name)
  new_contents = text.gsub(/search_regexp/, "replacement string")

  # To merely print the contents of the file, use:
  puts new_contents

  # To write changes to the file, use:
  File.open(file_name, "w") { |file| file.puts new_contents }
end
Actually, Ruby does have an in-place editing feature. Like Perl, you can say
ruby -pi.bak -e "gsub(/oldtext/, 'newtext')" *.txt
This will apply the code in double-quotes to all files in the current directory whose names end with ".txt". Backup copies of edited files will be created with a ".bak" extension ("foobar.txt.bak" I think).
NOTE: this does not appear to work for multiline searches. For those, you have to do it the other less pretty way, with a wrapper script around the regex.
Keep in mind that, when you do this, the filesystem could be out of space and you may create a zero-length file. This is catastrophic if you're doing something like writing out /etc/passwd files as part of system configuration management.
Note that in-place file editing like in the accepted answer will always truncate the file and write out the new file sequentially. There will always be a race condition where concurrent readers will see a truncated file. If the process is aborted for any reason (ctrl-c, OOM killer, system crash, power outage, etc) during the write then the truncated file will also be left over, which can be catastrophic. This is the kind of dataloss scenario which developers MUST consider because it will happen. For that reason, I think the accepted answer should most likely not be the accepted answer. At a bare minimum write to a tempfile and move/rename the file into place like the "simple" solution at the end of this answer.
You need to use an algorithm that:
Reads the old file and writes out to the new file. (You need to be careful about slurping entire files into memory).
Explicitly closes the new temporary file, which is where you may throw an exception because the file buffers cannot be written to disk because there is no space. (Catch this and clean up the temporary file if you like, but you need to rethrow something or fail fairly hard at this point.)
Fixes the file permissions and modes on the new file.
Renames the new file and drops it into place.
With ext3 filesystems you are guaranteed that the metadata write to move the file into place will not get rearranged by the filesystem and written before the data buffers for the new file are written, so the move either fully succeeds or leaves the old file intact. The ext4 filesystem has also been patched to support this kind of behavior. If you are very paranoid you should call the fdatasync() system call as a step 3.5 before moving the file into place.
Regardless of language, this is best practice. In languages where calling close() does not throw an exception (Perl or C) you must explicitly check the return of close() and throw an exception if it fails.
The suggestion above to simply slurp the file into memory, manipulate it and write it out to the file will be guaranteed to produce zero-length files on a full filesystem. You need to always use FileUtils.mv to move a fully-written temporary file into place.
A final consideration is the placement of the temporary file. If you open a file in /tmp then you have to consider a few problems:
If /tmp is mounted on a different filesystem, you may run /tmp out of space even though there is enough room at the destination to write out the new version of the file.
Probably more importantly, when you try to mv the file across a device mount you will transparently get converted to cp behavior. The old file will be opened, the old file's inode will be preserved and reopened, and the file contents will be copied. This is most likely not what you want, and you may run into "text file busy" errors if you try to edit the contents of a running file. This also defeats the purpose of using the filesystem's mv command, and you may run the destination filesystem out of space with only a partially written file.
This also has nothing to do with Ruby's implementation. The system mv and cp commands behave similarly.
It is preferable to open a Tempfile in the same directory as the old file. This ensures that there will be no cross-device move issues. The mv itself should never fail, and you should always get a complete and untruncated file. Any failures, such as the device being out of space, permission errors, etc., should be encountered while writing the Tempfile out.
The only downsides to the approach of creating the Tempfile in the destination directory are:
Sometimes you may not be able to open a Tempfile there, such as if you are trying to 'edit' a file in /proc for example. For that reason you might want to fall back and try /tmp if opening the file in the destination directory fails.
You must have enough space on the destination partition in order to hold both the complete old file and the new file. However, if you have insufficient space to hold both copies then you are probably short on disk space and the actual risk of writing a truncated file is much higher, so I would argue this is a very poor tradeoff outside of some exceedingly narrow (and well-monitored) edge cases.
Here's some code that implements the full algorithm (the Windows branch is untested and unfinished):
#!/usr/bin/env ruby

require 'tempfile'
require 'fileutils'

def file_edit(filename, regexp, replacement)
  tempdir = File.dirname(filename)
  tempprefix = File.basename(filename)
  tempprefix.prepend('.') unless RUBY_PLATFORM =~ /mswin|mingw|windows/
  tempfile =
    begin
      Tempfile.new(tempprefix, tempdir)
    rescue
      Tempfile.new(tempprefix)
    end
  File.open(filename).each do |line|
    tempfile.puts line.gsub(regexp, replacement)
  end
  tempfile.fdatasync unless RUBY_PLATFORM =~ /mswin|mingw|windows/
  tempfile.close
  unless RUBY_PLATFORM =~ /mswin|mingw|windows/
    stat = File.stat(filename)
    FileUtils.chown stat.uid, stat.gid, tempfile.path
    FileUtils.chmod stat.mode, tempfile.path
  else
    # FIXME: apply perms on windows
  end
  FileUtils.mv tempfile.path, filename
end

file_edit('/tmp/foo', /foo/, "baz")
file_edit('/tmp/foo', /foo/, "baz")
And here is a slightly tighter version that doesn't worry about every possible edge case (if you are on Unix and don't care about writing to /proc):
#!/usr/bin/env ruby

require 'tempfile'
require 'fileutils'

def file_edit(filename, regexp, replacement)
  Tempfile.open(".#{File.basename(filename)}", File.dirname(filename)) do |tempfile|
    File.open(filename).each do |line|
      tempfile.puts line.gsub(regexp, replacement)
    end
    tempfile.fdatasync
    tempfile.close
    stat = File.stat(filename)
    FileUtils.chown stat.uid, stat.gid, tempfile.path
    FileUtils.chmod stat.mode, tempfile.path
    FileUtils.mv tempfile.path, filename
  end
end

file_edit('/tmp/foo', /foo/, "baz")
The really simple use-case, for when you don't care about file system permissions (either you're not running as root, or you're running as root and the file is root owned):
#!/usr/bin/env ruby

require 'tempfile'
require 'fileutils'

def file_edit(filename, regexp, replacement)
  Tempfile.open(".#{File.basename(filename)}", File.dirname(filename)) do |tempfile|
    File.open(filename).each do |line|
      tempfile.puts line.gsub(regexp, replacement)
    end
    tempfile.close
    FileUtils.mv tempfile.path, filename
  end
end

file_edit('/tmp/foo', /foo/, "baz")
TL;DR: That should be used instead of the accepted answer at a minimum, in all cases, in order to ensure the update is atomic and concurrent readers will not see truncated files. As I mentioned above, creating the Tempfile in the same directory as the edited file is important here to avoid cross device mv operations being translated into cp operations if /tmp is mounted on a different device. Calling fdatasync is an added layer of paranoia, but it will incur a performance hit, so I omitted it from this example since it is not commonly practiced.
There isn't really a way to edit files in-place. What you usually do when you can get away with it (i.e. if the files are not too big) is, you read the file into memory (File.read), perform your substitutions on the read string (String#gsub) and then write the changed string back to the file (File.open, File#write).
If the files are big enough for that to be unfeasible, what you need to do is read the file in chunks (if the pattern you want to replace won't span multiple lines, then one chunk usually means one line; you can use File.foreach to read a file line by line), and for each chunk perform the substitution on it and append it to a temporary file. When you're done iterating over the source file, you close it and use FileUtils.mv to overwrite it with the temporary file.
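A compact sketch of that chunked approach, assuming the pattern never spans lines (it is essentially a condensed version of the Tempfile answer above):
require 'tempfile'
require 'fileutils'

def rewrite_by_line(filename, pattern, replacement)
  Tempfile.open(File.basename(filename), File.dirname(filename)) do |tmp|
    File.foreach(filename) { |line| tmp.write line.gsub(pattern, replacement) }
    tmp.close
    FileUtils.mv tmp.path, filename
  end
end

rewrite_by_line('big.txt', /search_regexp/, 'replacement')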
Another approach is to use inplace editing inside Ruby (not from the command line):
#!/usr/bin/ruby

def inplace_edit(file, bak, &block)
  old_stdout = $stdout
  argf = ARGF.clone

  argf.argv.replace [file]
  argf.inplace_mode = bak
  argf.each_line do |line|
    yield line
  end
  argf.close

  $stdout = old_stdout
end

inplace_edit 'test.txt', '.bak' do |line|
  line = line.gsub(/search1/, "replace1")
  line = line.gsub(/search2/, "replace2")
  print line unless line.match(/something/)
end
If you don't want to create a backup then change '.bak' to ''.
This works for me:
filename = "foo"
text = File.read(filename)
content = text.gsub(/search_regexp/, "replacestring")
File.open(filename, "w") { |file| file << content }
Here's a solution for find/replace in all files of a given directory. Basically I took the answer provided by sepp2k and expanded it.
# First set the files to search/replace in
files = Dir.glob("/PATH/*")

# Then set the variables for find/replace
@original_string_or_regex = /REGEX/
@replacement_string = "STRING"

files.each do |file_name|
  text = File.read(file_name)
  # use gsub (not gsub!) so files without a match are rewritten unchanged
  # instead of being clobbered with nil
  replace = text.gsub(@original_string_or_regex, @replacement_string)
  File.open(file_name, "w") { |file| file.puts replace }
end
require 'trollop'

opts = Trollop::options do
  opt :output, "Output file", :type => String
  opt :input, "Input file", :type => String
  opt :ss, "String to search", :type => String
  opt :rs, "String to replace", :type => String
end

# Trollop returns a plain hash, so the options are read with [:key]
text = File.read(opts[:input])
text.gsub!(opts[:ss], opts[:rs])
File.open(opts[:output], 'w') { |f| f.write(text) }
If you need to do substitutions across line boundaries, then using ruby -pi -e won't work because the p processes one line at a time. Instead, I recommend the following, although it could fail with a multi-GB file:
ruby -e "file='translation.ja.yml'; IO.write(file, (IO.read(file).gsub(/\s+'$/, %q('))))"
The regex looks for whitespace (potentially including newlines) followed by a quote, in which case it gets rid of the whitespace. The %q(') is just a fancy way of quoting the quote character.
Here is an alternative to the one-liner from jim, this time in a script:
ARGV[0..-3].each{|f| File.write(f, File.read(f).gsub(ARGV[-2],ARGV[-1]))}
Save it in a script, e.g. replace.rb
You start it from the command line with
replace.rb *.txt <string_to_replace> <replacement>
*.txt can be replaced with another glob, or with specific filenames or paths
Broken down so that I can explain what's happening, but still executable:
# ARGV is an array of the arguments passed to the script.
ARGV[0..-3].each do |f|           # loop over every argument except the last two
  File.write(f,                   # open the argument (= filename) for writing
    File.read(f)                  # open the argument (= filename) for reading
      .gsub(ARGV[-2], ARGV[-1]))  # and replace all occurrences of the second-to-last argument with the last one
end
EDIT: if you want to use a regular expression, use this instead.
Obviously, this is only for handling relatively small text files, not gigabyte monsters.
ARGV[0..-3].each{|f| File.write(f, File.read(f).gsub(/#{ARGV[-2]}/,ARGV[-1]))}
I am using the tty-file gem
Apart from replacing, it supports appending, prepending (at a given text/regex inside the file), diffing, and more.
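A quick sketch of what that looks like; the method names are as I remember them from the gem's README, so double-check against the current documentation:
require 'tty-file'

# Replace every match of the pattern in place (tty-file logs the action it takes).
TTY::File.replace_in_file('Gemfile', /gem 'rails'/, "gem 'hanami'")

# Append content to the end of a file.
TTY::File.append_to_file('config.rb', "# added at the end\n")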
