Directory Traversal in Ruby - ruby

I have been trying to implement a directory traversal in Ruby for part of a bigger program using the simple recursive approach. However I have found that Dir.foreach does not include the directories inside of it. How can I get them listed?
Code:
def walk(start)
Dir.foreach(start) do |x|
if x == "." or x == ".."
next
elsif File.directory?(x)
walk(x)
else
puts x
end
end
end

The problem is that each time you recurse, the path you pass to File.directory? is no is just the entity (file or directory) name; all context is lost. So say you go into one/two/three/ to check if one/two/three/file.txt is a directory, File.directory? just gets "file.txt" as the path instead of the whole thing, from the perspective of the top-level directory. You have to maintain the relative path each time you recurse. This seems to work fine:
def walk(start)
Dir.foreach(start) do |x|
path = File.join(start, x)
if x == "." or x == ".."
next
elsif File.directory?(path)
puts path + "/" # remove this line if you want; just prints directories
walk(path)
else
puts x
end
end
end

For recursion you should use Find:
From the documentation:
The Find module supports the top-down traversal of a set of file paths.
For example, to total the size of all files under your home directory, ignoring anything in a “dot” directory (e.g. $HOME/.ssh):
require 'find'
total_size = 0
Find.find(ENV["HOME"]) do |path|
if FileTest.directory?(path)
if File.basename(path)[0] == ?.
Find.prune # Don't look any further into this directory.
else
next
end
else
total_size += FileTest.size(path)
end
end

Related

Ordering in Ruby Pathname .children

It seems the order of the filesystem entities returned by Pathname's .children method is arbitrary or at least not alphabetical.
Is there a way to have these returned in alphabetical order via the file system rather than calling .sort on the returned array?
Pathname's children is actually doing:
def children(with_directory=true)
with_directory = false if #path == '.'
result = []
Dir.foreach(#path) {|e|
next if e == '.' || e == '..'
if with_directory
result << self.class.new(File.join(#path, e))
else
result << self.class.new(e)
end
}
result
end
Dir.foreach calls the OS and iterates the directory passed in. There is no provision for telling the OS to sort by a particular order.
"What is the "directory order" of files in a directory (used by ls -U)?" is probably of interest to you.

A twist on directory walking in Ruby

I'd like to do the following:
Given a directory tree:
Root
|_dirA
|_dirB
|_file1
|_file2
|_dirC
|_dirD
|_dirE
|_file3
|_file4
|_dirF
|_dirG
|_file5
|_file6
|_file7
... I'd like to walk the directory tree and build an array that contains the path to the first file in each directory that has at least one file. The overall structure may be quite large with many more files than directories, so I'd like to capture just the path to the first file without iterating through all the files in a given directory. One file is enough. For the above tree, the result should look like an array that contains only:
root/dirB/file1
root/dirC/dirD/dirE/file3
root/dirF/dirG/file5
I've played with the Dir and Find options in ruby, but my approach feels too brute-force-ish.
Is there an efficient way to code this functionality? It feels like I am missing some ruby trick here.
Many thanks!
Here's my approach:
root="/home/subtest/tsttree/"
Dir.chdir(root)
dir_list=Dir.glob("**/*/") #this invokes recursion
result=Array.new
dir_list.each do |d|
Dir.chdir(root + d)
Dir.open(Dir.pwd).each do |filename|
next if File.directory? filename #some directories may contain only other directories so exclude them
result.push(d + filename)
break
end
end
puts result
Works, but seems messy.
require 'pathname'
# My answer to stackoverflow question posted here:
# http://stackoverflow.com/questions/12684736/a-twist-on-directory-walking-in-ruby
class ShallowFinder
def initialize(root)
#matches = {}
#root = Pathname(root)
end
def matches
while match = next_file
#matches[match.parent.to_s] = match
end
#matches.values
end
private
def next_file
#root.find do |entry|
Find.prune if previously_matched?(entry)
return entry if entry.file?
end
nil
end
def previously_matched?(entry)
return unless entry.directory?
#matches.key?(entry.to_s)
end
end
puts ShallowFinder.new('Root').matches
Outputs:
Root/B/file1
Root/C/D/E/file3
Root/F/G/file5

recursive file list in Ruby

I'm new to Ruby (being a Java dev) and trying to implement a method (oh, sorry, a function) that would retrieve and yield all files in the subdirectories recursively.
I've implemented it as:
def file_list_recurse(dir)
Dir.foreach(dir) do |f|
next if f == '.' or f == '..'
f = dir + '/' + f
if File.directory? f
file_list_recurse(File.absolute_path f) { |x| yield x }
else
file = File.new(f)
yield file
end
end
end
My questions are:
Does File.new really OPEN a file? In Java new File("xxx") doesn't... If I need to yield some structure that I could query file info (ctime, size etc) from what would it be in Ruby?
{ |x| yield x } looks a little strange to me, is this OK to do yields from recursive functions like that, or is there some way to avoid it?
Is there any way to avoid checking for '.' and '..' on each iteration?
Is there a better way to implement this?
Thanks
PS:
the sample usage of my method is something like this:
curr_file = nil
file_list_recurse('.') do |file|
curr_file = file if curr_file == nil or curr_file.ctime > file.ctime
end
puts curr_file.to_path + ' ' + curr_file.ctime.to_s
(that would get you the oldest file from the tree)
==========
So, thanks to #buruzaemon I found out the great Dir.glob function which saved me a couple of lines of code.
Also, thanks to #Casper I found out the File.stat method, which made my function run two times faster than with File.new
In the end my code is looking something like this:
i=0
curr_file = nil
Dir.glob('**/*', File::FNM_DOTMATCH) do |f|
file = File.stat(f)
next unless file.file?
i += 1
curr_file = [f, file] if curr_file == nil or curr_file[1].ctime > file.ctime
end
puts curr_file[0] + ' ' + curr_file[1].ctime.to_s
puts "total files #{i}"
=====
By default Dir.glob ignores file names starting with a dot (considered to be 'hidden' in *nix), so it's very important to add the second argument File::FNM_DOTMATCH
How about this?
puts Dir['**/*.*']
According to the docs File.new does open the file. You might want to use File.stat instead, which gathers file-related stats into a queryable object. But note that the stats are gathered at point of creation. Not when you call the query methods like ctime.
Example:
Dir['**/*'].select { |f| File.file?(f) }.map { |f| File.stat(f) }
this thing tells me to consider accepting an answer, I hope it wouldn't mind me answering it myself:
i=0
curr_file = nil
Dir.glob('**/*', File::FNM_DOTMATCH) do |f|
file = File.stat(f)
next unless file.file?
i += 1
curr_file = [f, file] if curr_file == nil or curr_file[1].ctime > file.ctime
end
puts curr_file[0] + ' ' + curr_file[1].ctime.to_s
puts "total files #{i}"
You could use the built-in Find module's find method.
If you are on Windows see my answer here under for a mutch faster (~26 times) way than standard Ruby Dir. If you use mtime it's still going to be waaayyy faster.
If you use another OS you could use the same technique, I'm curious if the gain would be that big but I'm almost certain.
How to find the file path of file that is not the current file in ruby

Increment part of a string in Ruby

I have a method in a Ruby script that is attempting to rename files before they are saved. It looks like this:
def increment (path)
if path[-3,2] == "_#"
print " Incremented file with that name already exists, renaming\n"
count = path[-1].chr.to_i + 1
return path.chop! << count.to_s
else
print " A file with that name already exists, renaming\n"
return path << "_#1"
end
end
Say you have 3 files with the same name being saved to a directory, we'll say the file is called example.mp3. The idea is that the first will be saved as example.mp3 (since it won't be caught by if File.exists?("#{file_path}.mp3") elsewhere in the script), the second will be saved as example_#1.mp3 (since it is caught by the else part of the above method) and the third as example_#2.mp3 (since it is caught by the if part of the above method).
The problem I have is twofold.
1) if path[-3,2] == "_#" won't work for files with an integer of more than one digit (example_#11.mp3 for example) since the character placement will be wrong (you'd need it to be path[-4,2] but then that doesn't cope with 3 digit numbers etc).
2) I'm never reaching problem 1) since the method doesn't reliably catch file names. At the moment it will rename the first to example_#1.mp3 but the second gets renamed to the same thing (causing it to overwrite the previously saved file).
This is possibly too vague for Stack Overflow but I can't find anything that addresses the issue of incrementing a certain part of a string.
Thanks in advance!
Edit/update:
Wayne's method below seems to work on it's own but not when included as part of the whole script - it can increment a file once (from example.mp3 to example_#1.mp3) but doesn't cope with taking example_#1.mp3 and incrementing it to example_#2.mp3. To provide a little more context - currently when the script finds a file to save it is passing the name to Wayne's method like this:
file_name = increment(image_name)
File.open("images/#{file_name}.jpeg", 'w') do |output|
open(image_url) do |input|
output << input.read
end
end
I've edited Wayne's script a little so now it looks like this:
def increment (name)
name = name.gsub(/\s{2,}|(http:\/\/)|(www.)/i, '')
if File.exists?("images/#{name}.jpeg")
_, filename, count, extension = *name.match(/(\A.*?)(?:_#(\d+))?(\.[^.]*)?\Z/)
count = (count || '0').to_i + 1
"#{name}_##{count}#{extension}"
else
return name
end
end
Where am I going wrong? Again, thanks in advance.
A regular expression will git 'er done:
#!/usr/bin/ruby1.8
def increment(path)
_, filename, count, extension = *path.match(/(\A.*?)(?:_#(\d+))?(\.[^.]*)?\Z/)
count = (count || '0').to_i + 1
"#{filename}_##{count}#{extension}"
end
p increment('example') # => "example_#1"
p increment('example.') # => "example_#1."
p increment('example.mp3') # => "example_#1.mp3"
p increment('example_#1.mp3') # => "example_#2.mp3"
p increment('example_#2.mp3') # => "example_#3.mp3"
This probably doesn't matter for the code you're writing, but if you ever may have multiple threads or processes using this algorithm on the same files, there's a race condition when checking for existence before saving: Two writers can both find the same filename unused and write to it. If that matters to you, then open the file in a mode that fails if it exists, rescuing the exception. When the exception occurs, pick a different name. Roughly:
loop do
begin
File.open(filename, File::CREAT | File::EXCL | File::WRONLY) do |file|
file.puts "Your content goes here"
end
break
rescue Errno::EEXIST
filename = increment(filename)
redo
end
end
Here's a variation that doesn't accept a file name with an existing count:
def non_colliding_filename( filename )
if File.exists?(filename)
base,ext = /\A(.+?)(\.[^.]+)?\Z/.match( filename ).to_a[1..-1]
i = 1
i += 1 while File.exists?( filename="#{base}_##{i}#{ext}" )
end
filename
end
Proof:
%w[ foo bar.mp3 jim.bob.mp3 ].each do |desired|
3.times{
file = non_colliding_filename( desired )
p file
File.open( file, 'w' ){ |f| f << "tmp" }
}
end
#=> "foo"
#=> "foo_#1"
#=> "foo_#2"
#=> "bar.mp3"
#=> "bar_#1.mp3"
#=> "bar_#2.mp3"
#=> "jim.bob.mp3"
#=> "jim.bob_#1.mp3"
#=> "jim.bob_#2.mp3"

Ruby Deleting subdirectories that contain only a specific directory

I need to delete a bunch of subdirectories that only contain other directories, and ".svn" directories.
If you look at it like a tree, the "leaves" contain only ".svn" directories, so it should be possible to delete the leaves, then step back up a level, delete the new leaves, etc.
I think this code should do it, but I'm stuck on what to put in "something".
Find.find('./com/') do |path|
if File.basename(path) == 'something'
FileUtils.remove_dir(path, true)
Find.prune
end
end
Any suggestions?
This one takes new leaves into account (sort.reverse for entries means that /a/b/.svn is processed before /a/b; thus if /a/b is otherwise empty, it will be removed and size<=2 is because with FNM_DOTMATCH glob will always return a minimum of 2 entries ('.' and '..'))
require 'fileutils'
def delete_leaves(dirname)
Dir.glob(dirname+"/**/",File::FNM_DOTMATCH).sort.reverse.each do |d|
FileUtils.rm_rf(d) if d.match(/.svn/) or Dir.glob(d+"/*",File::FNM_DOTMATCH).size<=2
end
end
delete_leaves(ARGV[0])
This would do the job... however it doesn't take into consideration, that the it's own run could create new leaves
#!/usr/bin/env ruby
require 'fileutils'
def remove_leaves(dir=".")
Dir.chdir(dir) do
entries=Dir.entries(Dir.pwd).reject { |e| e=="." or e==".."}
if entries.size == 1 and entries.first == ".svn"
puts "Removing #{Dir.pwd}"
FileUtils.rm_rf(Dir.pwd)
else
entries.each do |e|
if File.directory? e
remove_leaves(e)
end
end
end
end
end
remove_leaves

Resources