Ruby: Deleting subdirectories that contain only a specific directory

I need to delete a bunch of subdirectories that only contain other directories, and ".svn" directories.
If you look at it like a tree, the "leaves" contain only ".svn" directories, so it should be possible to delete the leaves, then step back up a level, delete the new leaves, etc.
I think this code should do it, but I'm stuck on what to put in "something".
require 'find'
require 'fileutils'

Find.find('./com/') do |path|
  if File.basename(path) == 'something'
    FileUtils.remove_dir(path, true)
    Find.prune
  end
end
Any suggestions?

This one takes new leaves into account. Calling sort.reverse on the entries means that /a/b/.svn is processed before /a/b, so if /a/b is otherwise empty it will be removed as well. The size<=2 test is there because, with FNM_DOTMATCH, glob will always return a minimum of 2 entries ('.' and '..').
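require 'fileutils'

def delete_leaves(dirname)
  # Deepest paths sort last, so reversing handles children before parents.
  Dir.glob(dirname + "/**/", File::FNM_DOTMATCH).sort.reverse.each do |d|
    # The dot is escaped so that only a literal ".svn" matches.
    FileUtils.rm_rf(d) if d.match(/\.svn/) or Dir.glob(d + "/*", File::FNM_DOTMATCH).size <= 2
  end
end

delete_leaves(ARGV[0])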

This would do the job. However, it doesn't take into account that its own run could create new leaves.
#!/usr/bin/env ruby
require 'fileutils'

def remove_leaves(dir=".")
  Dir.chdir(dir) do
    entries=Dir.entries(Dir.pwd).reject { |e| e=="." or e==".."}
    if entries.size == 1 and entries.first == ".svn"
      puts "Removing #{Dir.pwd}"
      FileUtils.rm_rf(Dir.pwd)
    else
      entries.each do |e|
        if File.directory? e
          remove_leaves(e)
        end
      end
    end
  end
end

remove_leaves
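The two answers above can be combined into a single bottom-up walk: prune the children first, then check whether the current directory is left with nothing but ".svn". The following is only a minimal sketch of that idea. The name prune_svn_leaves is hypothetical, and it assumes that an empty directory should be removed as well (as the glob-based answer does) and that symlinks should never be followed.

```ruby
require 'fileutils'

# Returns true if +dir+ holds nothing worth keeping, i.e. every entry is
# either ".svn" or a subdirectory that is itself prunable. Prunable
# subdirectories are deleted on the way back up, so leaves created by
# earlier deletions are handled in the same pass.
# (prune_svn_leaves is a hypothetical helper, not taken from the answers.)
def prune_svn_leaves(dir)
  prunable = true
  Dir.children(dir).each do |name|
    next if name == '.svn'
    path = File.join(dir, name)
    if File.directory?(path) && !File.symlink?(path)
      if prune_svn_leaves(path)
        FileUtils.rm_rf(path)
      else
        prunable = false
      end
    else
      # Anything that is not a real directory keeps this directory alive.
      prunable = false
    end
  end
  prunable
end
```

Calling prune_svn_leaves('./com') leaves ./com itself in place; the return value only reports whether it, too, contains nothing but ".svn" material.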

Related

Hash with duplicate key values

I'm writing a small script that checks for repeated files inside a folder. I did it with an array, and it worked. The problem is that I also want to store the folder location, so I can see where the duplicated files are.
My first thought was to use a Hash. But since there will be a lot of files in the same folder, I can't do hash[folder] = file. The reverse is also impossible, because if I have repeated files, they will be overwritten (hash[file] = folder).
So what is the best approach for this?
My code:
class FilesList
  attr_accessor :elements

  def initialize(path)
    @elements = Hash.new
    @path = path
    printDirectory(@path)
  end

  def printDirectory(folderPath)
    entries = Dir.entries(folderPath) - [".", "..", "repeat.rb"]
    entries.each do |single|
      if File.directory?("#{folderPath}/#{single}")
        printDirectory("#{folderPath}/#{single}")
      else
        @elements[single] = folderPath
      end
    end
  end

  def printArray
    puts @elements
  end

  def each()
    @elements.each do |x, y|
      yield x, y
    end
  end

  def checkRepeated
    if @elements.length == @elements.keys.uniq.length
      puts "No repeated Files"
    else
      counts = Hash.new(0)
      @elements.each do |key,val|
        counts[val] += 1
      end
      repeateds = counts.reject{|val,count|count==1}.keys
      puts repeateds
    end
  end
end

array = FilesList.new(Dir.pwd)
array.printArray
You can store arrays (or sets) of file names (or folder paths) as hash values.
For example, in your code you could change @elements[single] = folderPath to:
@elements[single] ||= []
@elements[single] << folderPath
Then each value will be an array of the folders in which that file was found.
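Similar to the above, but don't use the file as the key.
# The block form gives every key its own array; Hash.new([]) would share
# one default array between all keys and never store the key.
@elements = Hash.new { |hash, key| hash[key] = [] }
entries.each do |single|
  if File.directory?("#{folderPath}/#{single}")
    printDirectory("#{folderPath}/#{single}")
  else
    @elements[folderPath] << single
  end
end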
Then you'll get something that looks like this:
{ '/path1' => ['awesome_file.rb', 'beautiful.js'],
'/path2' => ['beautiful.js', 'coffee.rb'] }
And then, if I understand you correctly, you can find duplicate files like so:
files = @elements.values.flatten
repeateds = files.select{ |file| files.count(file) > 1 }
This will return an array, ["beautiful.js", "beautiful.js"], which you can call .uniq on to get just ["beautiful.js"], or do a count if you like, or map the results to another hash that tells you how often each file is repeated, etc.
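Both suggestions boil down to mapping each file name to every folder it appears in. Here is a minimal sketch of that idea, assuming duplicates are identified by file name alone (not by content). The name find_duplicates and the use of Dir.glob in place of the hand-written recursion are illustrative choices, not part of the original code; note that this glob pattern skips dotfiles.

```ruby
# Maps each file name to the folders containing a file of that name, then
# keeps only the names that were seen in more than one folder.
# (find_duplicates is a hypothetical helper name.)
def find_duplicates(root)
  locations = Hash.new { |hash, name| hash[name] = [] }
  Dir.glob(File.join(root, '**', '*')).each do |path|
    next unless File.file?(path)
    locations[File.basename(path)] << File.dirname(path)
  end
  locations.select { |_name, folders| folders.size > 1 }
end
```

The result is a hash such as { 'beautiful.js' => ['/path1', '/path2'] }, so the folder locations are available alongside each repeated name.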

Dir.glob to get all csv and xls files in folder

folder_to_analyze = ARGV.first
folder_path = File.join(Dir.pwd, folder_to_analyze)

unless File.directory?(folder_path)
  puts "Error: #{folder_path} is not a valid folder."
  exit
end

def get_csv_file_paths(path)
  files = []
  Dir.glob(path + '/**/*.csv').each do |f|
    files << f
  end
  return files
end

def get_xlsx_file_path(path)
  files = []
  Dir.glob(path + '/**/*.xls').each do |f|
    files << f
  end
  return files
end

files_to_process = []
files_to_process << get_csv_file_paths(folder_path)
files_to_process << get_xlsx_file_path(folder_path)
puts files_to_process[1].length # Not what I want, I want:
# puts files_to_process.length
I'm trying to make a simple script in Ruby that allows me to call it from the command line, like ruby counter.rb mailing_list1 and it goes to the folder and counts all .csv and .xls files.
I intend to operate on each file, getting a row count, etc.
Currently the files_to_process array is actually an array of arrays, which I don't want. I want a single array containing both the .csv and the .xls files.
Since I don't know how to yield from the Dir.glob call, I added them to an array and returned that.
How can I accomplish this using a single array?
Just stick the file extensions together into one group:
Dir[path + "/**/*.{csv,xls}"]
Well, yielding is simple. Just yield.
def get_csv_file_paths(path)
  Dir.glob(path + '/**/*.csv').each do |f|
    yield f
  end
end

def get_xlsx_file_path(path)
  Dir.glob(path + '/**/*.xls').each do |f|
    yield f
  end
end

files_to_process = []
get_csv_file_paths(folder_path) {|f| files_to_process << f }
get_xlsx_file_path(folder_path) {|f| files_to_process << f }
puts files_to_process.length
Every method in Ruby can be passed a block, and the yield keyword sends data to that block. If the block may or may not be provided, yield is usually guarded with block_given?.
yield f if block_given?
Update
The code can be further simplified by passing your block directly to glob.each:
def get_csv_file_paths(path, &block)
  Dir.glob(path + '/**/*.csv').each(&block)
end

def get_xlsx_file_path(path, &block)
  Dir.glob(path + '/**/*.xls').each(&block)
end
That said, this block/proc conversion is a slightly more advanced topic.
def get_folder_paths(root_path)
  # Search below root_path rather than the current working directory.
  Dir.glob(File.join(root_path, '**/*.csv')) + Dir.glob(File.join(root_path, '**/*.xls'))
end

folder_path = File.join(Dir.pwd, ARGV.first || '')
raise "#{folder_path} is not a valid folder" unless File.directory?(folder_path)
puts get_folder_paths(folder_path).length
The get_folder_paths method returns an array of CSV and XLS files. Building an array of file names may not be what you really want, especially if there are a lot of them. Dir.glob returns an Array when called without a block, so passing it a block that handles each match in turn would be more appropriate in that case, provided you do not need the file count first.
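For illustration, the brace pattern and the block form of Dir.glob can be combined in one small helper. This is a sketch only: each_spreadsheet is a hypothetical name, and it assumes root contains no glob metacharacters.

```ruby
# Passes every .csv and .xls path under +root+ to the given block.
# Without a block, Dir.glob returns the matches as an Array instead.
# (each_spreadsheet is a hypothetical helper name.)
def each_spreadsheet(root, &block)
  Dir.glob(File.join(root, '**', '*.{csv,xls}'), &block)
end
```

With a block, each_spreadsheet(folder_path) { |f| puts f } processes files one by one; without one, each_spreadsheet(folder_path).length gives the combined count from a single flat array.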

Directory Traversal in Ruby

I have been trying to implement a directory traversal in Ruby for part of a bigger program using the simple recursive approach. However I have found that Dir.foreach does not include the directories inside of it. How can I get them listed?
Code:
def walk(start)
  Dir.foreach(start) do |x|
    if x == "." or x == ".."
      next
    elsif File.directory?(x)
      walk(x)
    else
      puts x
    end
  end
end
The problem is that each time you recurse, the path you pass to File.directory? is just the entity (file or directory) name; all context is lost. So say you go into one/two/three/ to check whether one/two/three/file.txt is a directory: File.directory? just gets "file.txt" as the path instead of the whole thing, and resolves it from the perspective of the top-level directory. You have to maintain the relative path each time you recurse. This seems to work fine:
def walk(start)
  Dir.foreach(start) do |x|
    path = File.join(start, x)
    if x == "." or x == ".."
      next
    elsif File.directory?(path)
      puts path + "/" # remove this line if you want; just prints directories
      walk(path)
    else
      puts x
    end
  end
end
For recursion you should use Find:
From the documentation:
The Find module supports the top-down traversal of a set of file paths.
For example, to total the size of all files under your home directory, ignoring anything in a “dot” directory (e.g. $HOME/.ssh):
require 'find'

total_size = 0
Find.find(ENV["HOME"]) do |path|
  if FileTest.directory?(path)
    if File.basename(path)[0] == ?.
      Find.prune # Don't look any further into this directory.
    else
      next
    end
  else
    total_size += FileTest.size(path)
  end
end
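Applied to the question, the same module can replace the hand-written recursion. A minimal sketch, assuming it is acceptable to skip hidden directories the way the documentation example does; list_files is a hypothetical name:

```ruby
require 'find'

# Returns the paths of all non-directory entries under +start+, skipping
# any directory (other than +start+ itself) whose name begins with a dot.
# (list_files is a hypothetical helper name.)
def list_files(start)
  files = []
  Find.find(start) do |path|
    if File.directory?(path)
      Find.prune if path != start && File.basename(path).start_with?('.')
    else
      files << path
    end
  end
  files
end
```

Because Find.find yields full paths relative to the starting point, the lost-context problem from the original walk method does not arise.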

A twist on directory walking in Ruby

I'd like to do the following:
Given a directory tree:
Root
 |_dirA
 |_dirB
    |_file1
    |_file2
 |_dirC
    |_dirD
       |_dirE
          |_file3
          |_file4
 |_dirF
    |_dirG
       |_file5
       |_file6
       |_file7
... I'd like to walk the directory tree and build an array that contains the path to the first file in each directory that has at least one file. The overall structure may be quite large with many more files than directories, so I'd like to capture just the path to the first file without iterating through all the files in a given directory. One file is enough. For the above tree, the result should look like an array that contains only:
root/dirB/file1
root/dirC/dirD/dirE/file3
root/dirF/dirG/file5
I've played with the Dir and Find options in ruby, but my approach feels too brute-force-ish.
Is there an efficient way to code this functionality? It feels like I am missing some ruby trick here.
Many thanks!
Here's my approach:
root="/home/subtest/tsttree/"
Dir.chdir(root)
dir_list=Dir.glob("**/*/") #this invokes recursion
result=Array.new
dir_list.each do |d|
  Dir.chdir(root + d)
  Dir.open(Dir.pwd).each do |filename|
    next if File.directory? filename #some directories may contain only other directories so exclude them
    result.push(d + filename)
    break
  end
end
puts result
Works, but seems messy.
require 'pathname'

# My answer to stackoverflow question posted here:
# http://stackoverflow.com/questions/12684736/a-twist-on-directory-walking-in-ruby
class ShallowFinder
  def initialize(root)
    @matches = {}
    @root = Pathname(root)
  end

  def matches
    while match = next_file
      @matches[match.parent.to_s] = match
    end
    @matches.values
  end

  private

  def next_file
    @root.find do |entry|
      Find.prune if previously_matched?(entry)
      return entry if entry.file?
    end
    nil
  end

  def previously_matched?(entry)
    return unless entry.directory?
    @matches.key?(entry.to_s)
  end
end

puts ShallowFinder.new('Root').matches
Outputs:
Root/B/file1
Root/C/D/E/file3
Root/F/G/file5
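A single-pass alternative is sketched below. It still has to read every directory entry in order to discover subdirectories, but it records at most one file per directory and never restarts the traversal. The name first_files is hypothetical, symlinked directories are deliberately not followed, and "first" means the first file encountered, since the order in which Dir.each_child yields entries depends on the filesystem.

```ruby
# Collects one file path per directory under +dir+ (including +dir+ itself)
# that directly contains at least one regular file.
# (first_files is a hypothetical helper name.)
def first_files(dir, result = [])
  found = false
  Dir.each_child(dir) do |name|
    path = File.join(dir, name)
    if File.directory?(path) && !File.symlink?(path)
      first_files(path, result)
    elsif !found && File.file?(path)
      result << path
      found = true
    end
  end
  result
end
```

For the tree in the question, first_files('Root') would return three paths, one each from dirB, dirC/dirD/dirE and dirF/dirG.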

Avoiding making multiple calls to Find.find("./") in Ruby

I am not sure what the best strategy is for this. I have a class that searches the filesystem for files matching a certain pattern, and I want to execute Find.find("./") only once. How would I approach this?
def files_pattern(pattern)
  Find.find("./") do |f|
    if f.include? pattern
      @fs << f
    end
  end
end
Remembering the (usually computationally intensive) result of a method call, so that you don't need to recalculate it the next time the method is called, is known as memoization. You will probably want to read more about that.
One way of achieving that in Ruby is to use a little wrapper class that stores the result in an instance variable, e.g.
require 'find'

class Finder
  def initialize(pattern)
    @pattern = pattern
  end

  def matches
    # Runs find_matches on the first call only; later calls reuse @matches.
    @matches ||= find_matches
  end

  private

  def find_matches
    fs = []
    Find.find("./") do |f|
      if f.include? @pattern
        fs << f
      end
    end
    fs
  end
end
And then you can do:
irb(main):089:0> f = Finder.new 'xml'
=> #<Finder:0x2cfc568 @pattern="xml">
irb(main):090:0> f.matches
find_matches
=> ["./example.xml"]
irb(main):091:0> f.matches # won't result in call to find_matches
=> ["./example.xml"]
Note: the ||= operator performs the assignment only if the variable on the left-hand side evaluates to nil or false. That is, @matches ||= find_matches behaves like @matches = @matches || find_matches, where find_matches is only called the first time thanks to short-circuit evaluation. There are lots of other questions explaining it on Stack Overflow.
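To see the short-circuit behaviour directly, the sketch below counts how often the expensive method runs. CountingFinder, calls and expensive_search are hypothetical names, and the hard-coded result stands in for a real Find.find traversal.

```ruby
# Demonstrates that ||= memoization runs the expensive method only once.
# (CountingFinder is a hypothetical class used purely for illustration.)
class CountingFinder
  attr_reader :calls

  def initialize
    @calls = 0
  end

  def matches
    @matches ||= expensive_search
  end

  private

  def expensive_search
    @calls += 1
    ['./example.xml'] # stand-in for the real filesystem search
  end
end

finder = CountingFinder.new
3.times { finder.matches }
puts finder.calls
```

An empty array is truthy in Ruby, so a search with no matches is cached as well. Only a result of nil or false would cause the method to be run again on the next call.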
Slight variation: You could change your method to return a list of all files and then use methods from Enumerable such as grep and select to perform multiple searches against the same list of files. Of course, this has the downside of keeping the entire list of files in memory. Here is an example though:
def find_all
  fs = []
  Find.find("./") do |f|
    fs << f
  end
  fs
end
And then use it like:
files = find_all
files.grep /\.xml/
files.select { |f| f.include? '.cpp' }
# etc
If I understand your question correctly you want to run Find.find to assign the result to an instance variable. You can move what is now the block to a separate method and call that to return only files matching your pattern.
Only problem is that if the directory contains many files, you are holding a big array in memory.
How about system "find / -name #{my_pattern}"?
