A twist on directory walking in Ruby

I'd like to do the following:
Given a directory tree:
Root
|_dirA
|_dirB
|  |_file1
|  |_file2
|_dirC
|  |_dirD
|     |_dirE
|        |_file3
|        |_file4
|_dirF
   |_dirG
      |_file5
      |_file6
      |_file7
... I'd like to walk the directory tree and build an array that contains the path to the first file in each directory that has at least one file. The overall structure may be quite large with many more files than directories, so I'd like to capture just the path to the first file without iterating through all the files in a given directory. One file is enough. For the above tree, the result should look like an array that contains only:
root/dirB/file1
root/dirC/dirD/dirE/file3
root/dirF/dirG/file5
I've played with the Dir and Find options in ruby, but my approach feels too brute-force-ish.
Is there an efficient way to code this functionality? It feels like I am missing some ruby trick here.
Many thanks!
Here's my approach:
root="/home/subtest/tsttree/"
Dir.chdir(root)
dir_list=Dir.glob("**/*/") #this invokes recursion
result=Array.new
dir_list.each do |d|
  Dir.chdir(root + d)
  Dir.open(Dir.pwd).each do |filename|
    next if File.directory? filename #some directories may contain only other directories so exclude them
    result.push(d + filename)
    break
  end
end
puts result
Works, but seems messy.

require 'find'
require 'pathname'
# My answer to stackoverflow question posted here:
# http://stackoverflow.com/questions/12684736/a-twist-on-directory-walking-in-ruby
class ShallowFinder
  def initialize(root)
    @matches = {}           # parent directory path => first file found in it
    @root = Pathname(root)
  end

  def matches
    while (match = next_file)
      @matches[match.parent.to_s] = match
    end
    @matches.values
  end

  private

  # Restart the walk from the root, pruning every directory that already has
  # a match, and return the first file found (nil when none are left).
  def next_file
    @root.find do |entry|
      Find.prune if previously_matched?(entry)
      return entry if entry.file?
    end
    nil
  end

  def previously_matched?(entry)
    return unless entry.directory?
    @matches.key?(entry.to_s)
  end
end

puts ShallowFinder.new('Root').matches
Outputs:
Root/B/file1
Root/C/D/E/file3
Root/F/G/file5
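For comparison, here is a minimal recursive sketch that does not restart the traversal for every match. It is not from either post. The method name first_files is hypothetical, and the sketch assumes Ruby 2.5+ (for Dir.children). It also assumes "first" means alphabetically first, which should match the order Find visits entries. It still has to list every directory's entries to discover the subdirectories, but it records at most one file per directory. Because it recurses into every subdirectory, it also covers directories that contain both files and subdirectories.

```ruby
# Hypothetical helper: for +dir+ and every directory below it, return the
# path of the alphabetically first regular file (directories without files
# contribute nothing).
def first_files(dir)
  result = []
  subdirs = []
  Dir.children(dir).sort.each do |name|
    path = File.join(dir, name)
    if File.directory?(path)
      subdirs << path
    elsif result.empty? && File.file?(path)
      result << path # first file in this directory; later files are ignored
    end
  end
  # Recurse only after this directory's own file has been recorded.
  subdirs.each { |sub| result.concat(first_files(sub)) }
  result
end
```

With the tree from the question, first_files('Root') returns Root/dirB/file1, Root/dirC/dirD/dirE/file3 and Root/dirF/dirG/file5.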

Related

Recursive in Ruby - Return method in itself

I want to call the method from within itself (recursively):
def self.open_folder(file)
  Dir.glob(file+"*") do |subfiles|
    if File.directory?(subfiles)
      open_folder(subfiles) ###Problem here
    end
    if File.file?(subfiles)
      open_file(subfiles)
    end
  end
end
What I want is for open_folder to call itself so that it keeps opening the sub-folders, but I get an error:
`block in open_folder': stack level too deep
Can you help me to find the solution for it?
If you just want to apply some method to every file in the subdirectories, you could use:
Dir.glob("**/*").select{ |path| File.file?(path) }.each{ |file| open_file(file) }
This code works for me:
def open_file(file)
  # Do your stuff here
  puts file
end

def open_folder(file)
  Dir.glob("#{file}/*") do |subfile|
    File.directory?(subfile) ? open_folder(subfile) : open_file(subfile)
  end
end
open_folder('path/to/directory')
NOTES:
You don't need to define the methods as self.* if you are running this code directly in irb or outside any class defined by you.
I used string interpolation (#{foo}) instead of concatenating the string.
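Appending '/*' to the file path matches only the files and directories directly under the parent (not the nested subdirectories and files). Without the slash, file + "*" also matches the folder itself, so open_folder kept calling itself with the same path, which is what caused the stack level too deep error.
Instead of using two ifs, you can use elsif in this case, since only one of the conditions can be true in each iteration.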
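A minimal sketch of the fixed recursion recurses with File.join(folder, '*') so the separator is always present. The helper name collect_files is hypothetical. Like the answer's version, it relies on Dir.glob, so dotfiles are skipped. Folder names containing glob metacharacters such as [ or { would also need escaping.

```ruby
# Hypothetical helper: collect every regular file below +folder+.
# File.join(folder, '*') inserts the separator, so the pattern matches only
# the folder's children; folder + '*' would also match the folder itself.
def collect_files(folder, acc = [])
  Dir.glob(File.join(folder, '*')).each do |entry|
    if File.directory?(entry)
      collect_files(entry, acc)   # descend with the full path
    elsif File.file?(entry)
      acc << entry
    end
  end
  acc
end
```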

Directory Traversal in Ruby

I have been trying to implement a directory traversal in Ruby for part of a bigger program using the simple recursive approach. However, I have found that Dir.foreach does not include the directories inside of it. How can I get them listed?
Code:
def walk(start)
  Dir.foreach(start) do |x|
    if x == "." or x == ".."
      next
    elsif File.directory?(x)
      walk(x)
    else
      puts x
    end
  end
end
The problem is that Dir.foreach yields only the entity (file or directory) name, so the path you pass to File.directory? is just that name and all context is lost. File.directory? then resolves the bare name against the current working directory, not against the directory being listed. Say you go into one/two/three/ to check whether one/two/three/file.txt is a directory. File.directory? just gets "file.txt" as the path instead of the whole thing. You have to maintain the relative path each time you recurse. This seems to work fine:
def walk(start)
  Dir.foreach(start) do |x|
    path = File.join(start, x)
    if x == "." or x == ".."
      next
    elsif File.directory?(path)
      puts path + "/" # remove this line if you want; just prints directories
      walk(path)
    else
      puts x
    end
  end
end
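The context problem described above can be reproduced in a few lines. This is a sketch, not code from either post. The method name directory_checks is hypothetical, and Dir.children needs Ruby 2.5+. A bare entry name is resolved against the process's current directory, while File.join(parent, name) is resolved against the directory being listed.

```ruby
# Hypothetical helper: for each entry of +parent+, report what File.directory?
# says for the bare name (resolved against the current working directory)
# and for the joined path (resolved against +parent+).
def directory_checks(parent)
  Dir.children(parent).sort.map do |name|
    [name, File.directory?(name), File.directory?(File.join(parent, name))]
  end
end
```

For a directory one/ containing a subdirectory two/, run from a working directory that has no entry named two, this reports [["two", false, true]].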
For recursion you should use Find:
From the documentation:
The Find module supports the top-down traversal of a set of file paths.
For example, to total the size of all files under your home directory, ignoring anything in a “dot” directory (e.g. $HOME/.ssh):
require 'find'

total_size = 0
Find.find(ENV["HOME"]) do |path|
  if FileTest.directory?(path)
    if File.basename(path)[0] == ?.
      Find.prune # Don't look any further into this directory.
    else
      next
    end
  else
    total_size += FileTest.size(path)
  end
end
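One caveat when adapting the documentation example: if the starting path itself is ".", its basename starts with a dot, so that check would prune the whole walk. The sketch below guards against this by never pruning the start path, and it collects file paths instead of summing sizes. The name visible_files is hypothetical.

```ruby
require 'find'

# Hypothetical helper: regular files under +dir+, skipping dot directories
# (but never the starting directory itself, which may be ".").
def visible_files(dir)
  files = []
  Find.find(dir) do |path|
    if File.directory?(path)
      Find.prune if path != dir && File.basename(path).start_with?('.')
    elsif File.file?(path)
      files << path
    end
  end
  files
end
```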

recursive file list in Ruby

I'm new to Ruby (being a Java dev) and trying to implement a method (oh, sorry, a function) that would retrieve and yield all files in the subdirectories recursively.
I've implemented it as:
def file_list_recurse(dir)
  Dir.foreach(dir) do |f|
    next if f == '.' or f == '..'
    f = dir + '/' + f
    if File.directory? f
      file_list_recurse(File.absolute_path f) { |x| yield x }
    else
      file = File.new(f)
      yield file
    end
  end
end
My questions are:
Does File.new really OPEN a file? In Java, new File("xxx") doesn't... If I need to yield some structure from which I could query file info (ctime, size, etc.), what would it be in Ruby?
{ |x| yield x } looks a little strange to me, is this OK to do yields from recursive functions like that, or is there some way to avoid it?
Is there any way to avoid checking for '.' and '..' on each iteration?
Is there a better way to implement this?
Thanks
PS:
the sample usage of my method is something like this:
curr_file = nil
file_list_recurse('.') do |file|
  curr_file = file if curr_file == nil or curr_file.ctime > file.ctime
end
puts curr_file.to_path + ' ' + curr_file.ctime.to_s
(that would get you the oldest file from the tree)
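Here is a sketch of how the three questions above could be addressed together. It is a variant of the method above, not an accepted answer. Dir.each_child (Ruby 2.5+) never yields "." or "..". File.lstat returns a File::Stat snapshot without opening the file. &block passes the caller's block straight into the recursive call. Skipping symlinks is an assumption of this sketch.

```ruby
# Variant of file_list_recurse: yields (path, File::Stat) for each regular
# file below +dir+; returns an Enumerator when called without a block.
def file_list_recurse(dir, &block)
  return enum_for(__method__, dir) unless block
  Dir.each_child(dir) do |name|           # never yields "." or ".."
    path = File.join(dir, name)
    stat = File.lstat(path)               # no open, symlinks not followed
    if stat.directory?
      file_list_recurse(path, &block)     # hand the same block down
    elsif stat.file?
      block.call(path, stat)
    end
    # anything else (symlinks, sockets, ...) is ignored
  end
end
```

Usage, for the oldest file: oldest_path, oldest_stat = file_list_recurse('.').min_by { |_path, stat| stat.ctime }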
==========
So, thanks to @buruzaemon I found out about the great Dir.glob function, which saved me a couple of lines of code.
Also, thanks to @Casper I found out about the File.stat method, which made my function run twice as fast as with File.new.
In the end my code is looking something like this:
i=0
curr_file = nil
Dir.glob('**/*', File::FNM_DOTMATCH) do |f|
  file = File.stat(f)
  next unless file.file?
  i += 1
  curr_file = [f, file] if curr_file == nil or curr_file[1].ctime > file.ctime
end
puts curr_file[0] + ' ' + curr_file[1].ctime.to_s
puts "total files #{i}"
=====
By default, Dir.glob ignores file names starting with a dot (considered 'hidden' on *nix), so it's very important to pass File::FNM_DOTMATCH as the second argument.
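A quick way to see the difference is shown in the sketch below. The name glob_both is hypothetical, and the base: keyword needs Ruby 2.5+. Without the flag, dotfiles are left out, and so is everything inside dot directories.

```ruby
# Hypothetical helper: the same recursive pattern with and without
# File::FNM_DOTMATCH, with results relative to +dir+.
def glob_both(dir)
  plain  = Dir.glob('**/*', base: dir)
  dotted = Dir.glob('**/*', File::FNM_DOTMATCH, base: dir)
  [plain, dotted]
end
```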
How about this?
puts Dir['**/*.*']
According to the docs, File.new does open the file. You might want to use File.stat instead, which gathers file-related stats into a queryable object. Note, though, that the stats are gathered when the object is created, not when you call query methods like ctime.
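The snapshot behaviour is easy to check. In this sketch (the method name stat_sizes is hypothetical), a File::Stat taken before an append keeps reporting the old size.

```ruby
require 'tempfile'

# Hypothetical helper: returns [size seen by an early stat, size seen by a
# fresh stat] for a file that grows in between.
def stat_sizes
  Tempfile.create('stat-demo') do |f|
    f.write('abc')
    f.flush
    early = File.stat(f.path)   # values are captured here
    f.write('defgh')
    f.flush
    [early.size, File.stat(f.path).size]
  end
end
```

stat_sizes returns [3, 8]: the early object still reports 3 bytes after the file has grown to 8.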
Example:
Dir['**/*'].select { |f| File.file?(f) }.map { |f| File.stat(f) }
You could use the built-in Find module's find method.
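The sketch below applies the Find approach to the oldest-file example. The name oldest_file is hypothetical. Called without a block, Find.find returns an Enumerator, so the files can be filtered and compared with min_by. Find also visits dotfiles, so no FNM_DOTMATCH-style flag is needed. The time attribute is a parameter that defaults to :ctime, as in the question.

```ruby
require 'find'

# Hypothetical helper: path of the regular file under +dir+ with the smallest
# value of +time_attr+ (:ctime, :mtime or :atime), or nil if there is none.
def oldest_file(dir, time_attr = :ctime)
  Find.find(dir)
      .select { |path| File.file?(path) }
      .min_by { |path| File.stat(path).public_send(time_attr) }
end
```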
If you are on Windows, see my answer here for a much faster way (~26 times) than standard Ruby Dir. If you use mtime, it's still going to be way faster.
If you use another OS, you could use the same technique. I'm curious whether the gain would be as big, but I'm almost certain there would be one.

Avoiding making multiple calls to Find.find("./") in Ruby

I am not sure what the best strategy for this is. I have a class where I can search the filesystem for a certain pattern of files. I want to execute Find.find("./") only once. How would I approach this?
def files_pattern(pattern)
  Find.find("./") do |f|
    if f.include? pattern
      @fs << f
    end
  end
end
Remembering the (usually computationally intensive) result of a method call so that you don't need to recalculate it the next time the method is called is known as memoization, so you will probably want to read more about that.
One way of achieving that in Ruby is to use a little wrapper class that stores the result in an instance variable, e.g.
require 'find'

class Finder
  def initialize(pattern)
    @pattern = pattern
  end

  def matches
    @matches ||= find_matches
  end

  private

  def find_matches
    puts "find_matches" # shows when the search actually runs (see the irb session below)
    fs = []
    Find.find("./") do |f|
      if f.include? @pattern
        fs << f
      end
    end
    fs
  end
end
And then you can do:
irb(main):089:0> f = Finder.new 'xml'
=> #<Finder:0x2cfc568 @pattern="xml">
irb(main):090:0> f.matches
find_matches
=> ["./example.xml"]
irb(main):091:0> f.matches # won't result in call to find_matches
=> ["./example.xml"]
Note: the ||= operator performs an assignment only if the variable on the left-hand side evaluates to false (nil or false). That is, @matches ||= find_matches is roughly shorthand for @matches = @matches || find_matches, where find_matches will only be called the first time due to short-circuit evaluation. There are lots of other questions explaining it on Stack Overflow.
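One consequence of that rule can be sketched with a hypothetical Lookup class: a cached nil or false is treated as "not cached yet", so the computation runs again on every call. An empty array is truthy in Ruby, so a Finder that finds no matches still caches its [] result.

```ruby
# Hypothetical class: counts how often the "expensive" computation runs.
class Lookup
  attr_reader :calls

  def initialize(result)
    @result = result
    @calls = 0
  end

  # ||= only assigns when @value is nil or false, so a nil/false result
  # is recomputed on every call.
  def value
    @value ||= compute
  end

  private

  def compute
    @calls += 1
    @result
  end
end
```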
Slight variation: You could change your method to return a list of all files and then use methods from Enumerable such as grep and select to perform multiple searches against the same list of files. Of course, this has the downside of keeping the entire list of files in memory. Here is an example though:
require 'find'

def find_all
  fs = []
  Find.find("./") do |f|
    fs << f
  end
  fs
end
And then use it like:
files = find_all
files.grep(/\.xml/)
files.select { |f| f.include? '.cpp' }
# etc
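The two ideas can be combined so the tree is walked only once, no matter how many patterns are searched: memoize the full file list, then filter it on each call. This is a sketch with a hypothetical FileIndex class. The scans counter exists only to make the single traversal visible. It keeps the whole list in memory, the same trade-off as find_all.

```ruby
require 'find'

# Hypothetical class: walks +root+ once, then answers any number of
# substring queries from the cached list of files.
class FileIndex
  attr_reader :scans # number of times the tree was actually walked

  def initialize(root = './')
    @root = root
    @scans = 0
  end

  def files_matching(pattern)
    all_files.select { |f| f.include?(pattern) }
  end

  private

  def all_files
    @all_files ||= begin
      @scans += 1
      Find.find(@root).select { |f| File.file?(f) }
    end
  end
end
```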
If I understand your question correctly, you want to run Find.find and assign the result to an instance variable. You can move what is now the block into a separate method and call that to return only the files matching your pattern.
The only problem is that if the directory contains many files, you are holding a big array in memory.
how about system "find / -name #{my_pattern}"
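Note that Kernel#system only returns true, false or nil. The command's output goes to standard output rather than into a Ruby value. If you want the matching paths back as strings, a sketch using Open3.capture2 from the standard library could look like the one below. It assumes a Unix-like system with find(1) on the PATH, and find_by_name is a hypothetical name. Passing the arguments separately keeps a shell from interpreting the pattern.

```ruby
require 'open3'

# Hypothetical helper: run find(1) without a shell and return matching paths.
def find_by_name(dir, pattern)
  out, status = Open3.capture2('find', dir, '-name', pattern)
  raise "find failed: #{status}" unless status.success?
  out.split("\n")
end
```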

Ruby Deleting subdirectories that contain only a specific directory

I need to delete a bunch of subdirectories that only contain other directories, and ".svn" directories.
If you look at it like a tree, the "leaves" contain only ".svn" directories, so it should be possible to delete the leaves, then step back up a level, delete the new leaves, etc.
I think this code should do it, but I'm stuck on what to put in "something".
Find.find('./com/') do |path|
  if File.basename(path) == 'something'
    FileUtils.remove_dir(path, true)
    Find.prune
  end
end
Any suggestions?
This one takes new leaves into account. The entries are sorted and then reversed, so /a/b/.svn is processed before /a/b; thus if /a/b is otherwise empty, it will be removed as well. The size<=2 check is there because, with FNM_DOTMATCH, glob will always return a minimum of 2 entries ('.' and '..').
require 'fileutils'

def delete_leaves(dirname)
  Dir.glob(dirname+"/**/",File::FNM_DOTMATCH).sort.reverse.each do |d|
    # \.svn: escape the dot so only ".svn" matches, not e.g. "xsvn"
    FileUtils.rm_rf(d) if d.match(/\.svn/) or Dir.glob(d+"/*",File::FNM_DOTMATCH).size<=2
  end
end

delete_leaves(ARGV[0])
This would do the job... however, it doesn't take into account that its own run could create new leaves.
#!/usr/bin/env ruby
require 'fileutils'

def remove_leaves(dir=".")
  Dir.chdir(dir) do
    entries=Dir.entries(Dir.pwd).reject { |e| e=="." or e==".."}
    if entries.size == 1 and entries.first == ".svn"
      puts "Removing #{Dir.pwd}"
      FileUtils.rm_rf(Dir.pwd)
    else
      entries.each do |e|
        if File.directory? e
          remove_leaves(e)
        end
      end
    end
  end
end

remove_leaves
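For comparison, here is a post-order sketch that handles the new-leaves case in a single pass. Each directory's children are processed first, and the directory is then removed if nothing except a .svn entry is left in it. prune_svn_only is a hypothetical name. Symlinked directories are not followed. The starting directory itself is also removed if it ends up empty.

```ruby
require 'fileutils'

# Hypothetical helper: remove every directory under (and including) +dir+
# whose only remaining entry, after its children are processed, is ".svn"
# (or that is empty). Returns true if +dir+ itself was removed.
def prune_svn_only(dir)
  Dir.children(dir).each do |name| # array snapshot, so deletions don't disturb the loop
    path = File.join(dir, name)
    next if name == '.svn' || File.symlink?(path) || !File.directory?(path)
    prune_svn_only(path) # children first: this may turn +dir+ into a new leaf
  end
  if (Dir.children(dir) - ['.svn']).empty?
    FileUtils.rm_rf(dir)
    true
  else
    false
  end
end
```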
