Recursively rename folders based on contents in Ruby

I have a collection of folders (within a folder) that all need to be renamed based on their contents.
Specifically, I'd like to rename "/working_directory/my_folder/my_file.extension" to "/working_directory/my_file/my_file.extension".
There are a few other files within /my_folder/. How might I recursively do this using Ruby?
I'm new to Ruby and programming. So far I have tried just to extract the file names, but have not had much luck. Below is my attempt at iterating through the folders. It cycles through /working_directory/ every time Find.find is called, whereas the intent is to search only /working_directory/my_folder/ for the file with the .fls extension.
require 'find'
Path = "/working_directory/"
Dir.foreach(Path) do |file|
puts file
new_dir = Path+file
puts new_dir
Find.find(new_dir) do |i| # this is intended to be /working_directory/my_folder/
fls_file << i if i =~ /.*\.fls$/
puts fls_file
end
end

Assuming my_file is to be chosen by its extension, one might do:
require 'fileutils'

Dir["/working_directory/**/*"].select do |dir_or_file|
  File.directory? dir_or_file # select only directories, recursively
end.inject({}) do |memo, dir|
  new_name = Dir["#{dir}/*.extension"]
  unless new_name.size == 1 # check that the folder contains exactly one matching file
    puts "Multiple/No choices; can not rename dir [#{dir}] ⇒ skipping..."
    next memo # skip this folder, keeping the accumulator
  end
  my_file = new_name.first[/[^\/]+(?=\.extension\z)/] # get my_file
  memo[dir] = dir.sub(/[^\/]+\z/, my_file) # swap the last path component
  memo
end.each do |old, neu|
  # dry run to make sure everything is OK
  puts "Gonna rename #{old} to #{neu}"
  # uncomment the lines below once you are certain the code works properly
  # neu_folder = neu[/(.*?)([^\/]+\z)/, 1] # parent folder of the new name
  # FileUtils.mkdir neu_folder unless File.exist? neu_folder
  # FileUtils.mv old, neu # rename
end
The rename is done after the main processing so that the list of directories stays consistent while it is being iterated. In this case it could probably be done in the previous loop, instead of injecting old/neu pairs into a hash and iterating over it later.
We are heavily using string parsing with regexps here.
my_file = new_name.first[/[^\/]+(?=\.extension\z)/] # get my_file
This line gets the new folder name by parsing the tail of the string: a run of characters containing no slashes, followed by '.extension' at the very end of the string (see positive lookahead).
memo[dir] = dir.sub(/[^\/]+\z/, my_file)
This line adds a new element to the accumulator hash, substituting the old folder name (the last path component) with the new one.
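For comparison, the same idea can be sketched with File.basename and File.dirname instead of regexps. This is only a sketch under stated assumptions: rename_plan and apply_plan are hypothetical helper names, the extension is passed with its leading dot, and nested folders are renamed deepest first so that renaming a parent never invalidates a child's path.

```ruby
require 'fileutils'

# Hypothetical helper: maps each directory under root that holds exactly one
# file with the given extension to its new path (named after that file).
def rename_plan(root, ext)
  dirs = Dir.glob(File.join(root, '**', '*')).select { |path| File.directory?(path) }
  dirs.each_with_object({}) do |dir, plan|
    matches = Dir.glob(File.join(dir, "*#{ext}"))
    next unless matches.size == 1 # skip folders with zero or several candidates

    new_dir = File.join(File.dirname(dir), File.basename(matches.first, ext))
    plan[dir] = new_dir unless new_dir == dir
  end
end

# Hypothetical helper: prints the plan by default, renames only when asked to.
def apply_plan(plan, dry_run: true)
  # Deepest paths first, so a child is renamed before its parent.
  plan.sort_by { |old, _new| -old.count('/') }.each do |old, new|
    if dry_run
      puts "Would rename #{old} to #{new}"
    elsif File.exist?(new)
      warn "Target #{new} already exists; skipping #{old}"
    else
      FileUtils.mv(old, new)
    end
  end
end
```

A call such as apply_plan(rename_plan('/working_directory', '.fls')) only prints what would happen; passing dry_run: false performs the renames.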

Related

Checking for words inside folders/subfolders and files

I am having an issue with regular expressions. Basically, I have a folder that contains subfolders as well as files, and I have to check the files in those folders for certain words. The words I have to check for are listed in a file called words.txt.
This is the code that I have so far in Ruby:
def check_words
array_of_words = File.readlines('words.txt')
re = Regexp.union(array_of_words)
new_array_of_words = [/\b(?:#{re.source})\b/]
Dir['source_test/**/*'].select{|f| File.file?(f) }.each do |filepath|
new_array_of_words.each do |word|
puts File.foreach(filepath).include?(word)
end
end
end
When I execute this code I keep getting false, even though some of the files inside the folders/subfolders contain those words.
new_array_of_words holds a single regex, and include? compares each line of the file with ==, so a regex is never equal to a line (and it doesn't make much sense to iterate over a one-element array anyway).
You can keep using the regex, but use regex methods such as match? or =~ instead.
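A minimal sketch of the regex route, assuming words.txt holds one word per line and the scanned files are plain text. The helper name files_containing is hypothetical; it returns the paths whose contents match at least one of the words as a whole word.

```ruby
# Hypothetical helper: returns the files under dir that contain any listed word.
def files_containing(words_file, dir)
  # Strip newlines and surrounding whitespace, and ignore blank lines.
  words = File.readlines(words_file).map(&:strip).reject(&:empty?)
  # Regexp.union escapes each word; \b restricts matches to whole words.
  pattern = /\b(?:#{Regexp.union(words).source})\b/

  Dir.glob(File.join(dir, '**', '*')).select do |path|
    File.file?(path) && File.foreach(path).any? { |line| line.match?(pattern) }
  end
end
```

Calling files_containing('words.txt', 'source_test') would then list the matching files instead of printing true or false per file.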
You can also fix your current code as follows:
arr = File.readlines('/home/afifit/Tests/word.txt').map(&:strip)
Dir['yourdir/**/*'].select { |f| File.file?(f) }.each do |filepath|
  content = File.read(filepath)
  arr.each do |word|
    puts content.include?(word) # substring check against the whole file
  end
end
I also used strip to remove any unnecessary whitespace and newlines.

Check if file/folder is in a subdirectory in Ruby

What's the nicest way to check if a given file/directory is in some other directory (or one of its subdirectories)? Platform-independence and absolute/relative path handling would be nice.
One easy way is just to search through the files and check each time, but maybe there is a better one.
e.g. given directory A, is A anywhere in the directory subtree rooted at B, i.e. is_underneath?(A,B) or something.
A nice and quick way is to use the glob method provided by the Dir class in the Ruby stdlib.
glob( pattern, [flags] ) # => matches
Expands pattern, which is an Array of patterns or a pattern String, and returns the results as matches or as arguments given to the block.
It works with both files and directories and allows you to search recursively.
It returns an array of the files/dirs that match the pattern; the array is empty if nothing matches.
root = '/my_root'
value = 'et_voila.txt'
Dir.glob("#{root}/**/#{value}")
# ** Matches directories recursively.
# or you can pass also the relative path
Dir.glob("./foo/**/#{value}")
I hope I understood your question correctly.
An example:
require 'pathname'
A = '/usr/xxx/a/b/c.txt'
path = Pathname.new(A)
[
'/usr/xxx/a/b',
'/usr/yyy/a/b',
].each{|b|
if path.fnmatch?(File.join(b,'**'))
puts "%s is in %s" % [A,b]
else
puts "%s is not in %s" % [A,b]
end
}
Result:
/usr/xxx/a/b/c.txt is in /usr/xxx/a/b
/usr/xxx/a/b/c.txt is not in /usr/yyy/a/b
The solution uses the class Pathname. One advantage of it: Pathname represents the name of a file or directory on the filesystem, not the file itself, so you can make your test without read access to the file.
The test itself is made with Pathname#fnmatch? and the glob pattern File.join(path,'**') (** means all sub-directories).
If you need it more often, you could extend Pathname:
require 'pathname'
class Pathname
def is_underneath?(path)
return self.fnmatch?(File.join(path,'**'))
end
end
A = '/usr/xxx/a/b/c.txt'
path = Pathname.new(A)
[
'/usr/xxx/a/b',
'/usr/yyy/a/b',
].each{|b|
if path.is_underneath?(b)
puts "%s is in %s" % [A,b]
else
puts "%s is not in %s" % [A,b]
end
}
To handle absolute/relative paths, it may help to expand the paths, as in the following (sorry, this is untested):
class Pathname
def is_underneath?(path)
return self.expand_path.fnmatch?(File.expand_path(File.join(path,'**')))
end
end
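One caveat of the fnmatch? approach is that glob characters in a directory name (such as * or [) are interpreted as patterns. A purely lexical alternative can be sketched with Pathname#ascend, which walks from a path up through each of its parents. The method name underneath? is hypothetical, and treating a directory as being underneath itself is an assumption of this sketch, not something the question specifies.

```ruby
require 'pathname'

# Hypothetical helper: true if child is parent itself or lies anywhere below it.
# Both paths are expanded first, so relative paths and ".." segments are handled.
# Nothing is read from disk; symlinks are not resolved.
def underneath?(child, parent)
  child  = Pathname.new(child).expand_path
  parent = Pathname.new(parent).expand_path
  # ascend yields the path itself, then each ancestor up to the root.
  child.ascend { |ancestor| return true if ancestor == parent }
  false
end

puts underneath?('/usr/xxx/a/b/c.txt', '/usr/xxx/a/b') # prints true
puts underneath?('/usr/xxx/a/b/c.txt', '/usr/yyy/a/b') # prints false
```

Because whole path components are compared, /usr/xxx/a/bc is not reported as being underneath /usr/xxx/a/b.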

Improve speed of the file search in Ruby

I have a directory with about 100,000 small files (each file is about 1 kB).
I need to get a list of these files and iterate over it in order to find files with the same name but different case (the files are on a Linux ext4 FS).
Currently, I use some code like this:
def similar_files_in_folder(file_path, folder, exclude_folders = false)
files = Dir.glob(file_path, File::FNM_CASEFOLD)
files_set = files.select{|f| f.start_with?(folder)}
return files_set unless exclude_folders
files_set.reject{|entry| File.directory? entry}
end
dir_entries = Dir.entries(#directory) - ['.', '..']
dir_entries.map do |file_name|
similar_files_in_folder(file_name, #directory)
end
The issue with this approach is that the snippet takes a very long time to finish.
It runs for several hours on my system.
Is there another way to achieve the same goal but much faster in Ruby?
Limitation: I can't load the file list into memory once and then just compare the downcased names, because new files keep appearing in #directory.
So I need to scan #directory on each iteration.
Thanks for any hint.
If I understand your code correctly, this already returns an array of all those 100k filenames:
dir_entries = Dir.entries(#directory) - ['.', '..']
#=> ["foo.txt", "bar.txt", "BAR.txt", ...]
I would group this array by the lowercase filename:
dir_entries.group_by(&:downcase)
#=> {"foo.txt"=>["foo.txt"], "bar.txt"=>["bar.txt", "BAR.txt"], ... }
And select the ones with more than one occurrence:
dir_entries.group_by(&:downcase).select { |k, v| v.size > 1 }
#=> {"bar.txt"=>["bar.txt", "BAR.txt"], ...}
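The grouping steps above can be wrapped in a small function. This is a sketch only; case_collisions is a hypothetical name, and because new files keep appearing, it would be called on a fresh Dir.entries listing each time.

```ruby
# Hypothetical helper: groups names that differ only by case and keeps the
# groups with more than one member.
def case_collisions(names)
  names.group_by(&:downcase).select { |_key, group| group.size > 1 }
end

p case_collisions(%w[foo.txt bar.txt BAR.txt])
# prints {"bar.txt"=>["bar.txt", "BAR.txt"]}
```

This makes a single pass over the listing, whereas the original code ran one glob over the whole directory per file.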
What I meant by my comment was that you could search for a string as you traverse the filesystem, instead of first building up a huge array of all possible files and only then searching. I wrote something similar to a linux find <path> | grep --color -i <pattern> , except highlighting the pattern only in basename:
require 'find'
#find files whose basename matches a pattern (and output results to console)
def find_similar(s, opts={})
#by default, path is '.', case insensitive, no bash terminal coloring
opts[:verbose] ||= false
opts[:path] ||= '.'
opts[:insensitive]=true if opts[:insensitive].nil?
opts[:color]||=false
boldred = "\e[1m\e[31m\\1\e[0m" #contains an escaped \1 for regex
puts "searching for \"#{s}\" in \"#{opts[:path]}\", insensitive=#{opts[:insensitive]}..." if opts[:verbose]
reg = opts[:insensitive] ? /(#{s})/i : /(#{s})/
dir,base = '',''
Find.find(opts[:path]) {|path|
dir,base = File.dirname(path), File.basename(path)
if base =~ reg
if opts[:color]
puts "#{dir}/#{base.gsub(reg, boldred)}"
else
puts path
end
end
}
end
time = Time.now
#find_similar('LOg', :color=>true) #similar to find . | grep --color -i LOg
find_similar('pYt', :path=>'c:/bin/sublime3/', :color=>true, :verbose=>true)
puts "search took #{Time.now-time}sec"
The example was run under Cygwin, but it also works when run from cmd.exe.

How to mass rename files in ruby

I have been trying to work out a file rename program in Ruby as a programming exercise for myself (I am aware of rename under Linux, but I want to learn Ruby, and rename is not available on Mac).
In the code below, the issue is that the .include? method always returns false even though I can see that the filename contains the search pattern. If I comment out the include? check, gsub() does not seem to generate a new file name at all (i.e. the file name remains the same). Can someone please take a look and see what I did wrong? Thanks a bunch in advance!
Here is the expected behavior:
Assuming that in current folder there are three files: a1.jpg, a2.jpg, and a3.jpg
The Ruby script should be able to rename it to b1.jpg, b2.jpg, b3.jpg
#!/Users/Antony/.rvm/rubies/ruby-1.9.3-p194/bin/ruby
puts "Enter the file search query"
searchPattern = gets
puts "Enter the target to replace"
target = gets
puts "Enter the new target name"
newTarget = gets
Dir.glob("./*").sort.each do |entry|
origin = File.basename(entry, File.extname(entry))
if origin.include?(searchPattern)
newEntry = origin.gsub(target, newTarget)
File.rename( origin, newEntry )
puts "Rename from " + origin + " to " + newEntry
end
end
Slightly modified version:
puts "Enter the file search query"
searchPattern = gets.strip
puts "Enter the target to replace"
target = gets.strip
puts "Enter the new target name"
newTarget = gets.strip
Dir.glob(searchPattern).sort.each do |entry|
if File.basename(entry, File.extname(entry)).include?(target)
newEntry = entry.gsub(target, newTarget)
File.rename( entry, newEntry )
puts "Rename from " + entry + " to " + newEntry
end
end
Key differences:
Use .strip to remove the trailing newline that you get from gets. Otherwise, this newline character will mess up all of your match attempts.
Use the user-provided search pattern in the glob call instead of globbing for everything and then manually filtering it later.
Use entry (that is, the complete filename) in the calls to gsub and rename instead of origin. origin is really only useful for the .include? test. Since it's a fragment of a filename, it can't be used with rename. I removed the origin variable entirely to avoid the temptation to misuse it.
For your example folder structure, entering *.jpg, a, and b for the three input prompts (respectively) should rename the files as you are expecting.
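The differences listed above can also be sketched as a reusable function. Two things here go beyond the answer and are assumptions of this sketch: the replacement is applied only to the file name without its extension (so a target such as a cannot alter a directory name or the extension), and existing files are never overwritten. The name rename_matching is hypothetical.

```ruby
# Hypothetical helper: renames every file matched by pattern whose name
# (without extension) contains target. Returns { old_path => new_path }.
def rename_matching(pattern, target, replacement)
  Dir.glob(pattern).sort.each_with_object({}) do |entry, renamed|
    ext  = File.extname(entry)
    stem = File.basename(entry, ext)
    next unless stem.include?(target)

    new_entry = File.join(File.dirname(entry), stem.gsub(target, replacement) + ext)
    next if File.exist?(new_entry) # assumption: never overwrite an existing file

    File.rename(entry, new_entry)
    renamed[entry] = new_entry
  end
end
```

For the example folder, rename_matching('*.jpg', 'a', 'b') would turn a1.jpg, a2.jpg and a3.jpg into b1.jpg, b2.jpg and b3.jpg.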
I used the accepted answer to fix a bunch of copied files' names.
Dir.glob('./*').sort.each do |entry|
if File.basename(entry).include?(' copy')
newEntry = entry.gsub(' copy', '')
File.rename( entry, newEntry )
end
end
Your problem is that gets returns a newline at the end of the string. So, if you type "foo" then searchPattern becomes "foo\n". The simplest fix is:
searchPattern = gets.chomp
I might rewrite your code slightly:
$stdout.sync = true
print "Enter the file search query: "; search = gets.chomp
print "Enter the target to replace: "; target = gets.chomp
print " Enter the new target name: "; replace = gets.chomp
Dir['*'].each do |file|
  # Skip directories
  next unless File.file?(file)
  old_name = File.basename(file,'.*')
  if old_name.include?(search)
    # Are you sure you want gsub here, and not sub?
    # Don't use `old_name` here, it doesn't have the extension
    new_name = File.basename(file).gsub(target,replace)
    File.rename( file, new_name )
    puts "Renamed #{file} to #{new_name}" if $DEBUG
  end
end
Here's a short version I've used today (without pattern matching)
Save this as rename.rb file and run it inside the command prompt with ruby rename.rb
count = 1
newname = "car"
Dir["/path/to/folder/*"].each do |old|
File.rename(old, newname + count.to_s)
count += 1
end
I had /Copy of _MG_2435.JPG converted into car1, car2, ...
I made a small script to rename the entire DBZ series by season, implemented like this:
count = 1
new_name = "Dragon Ball Z S05E"
format_file = ".mkv"
Dir.glob("dragon ball Z*").each do |old_name|
File.rename(old_name, new_name + count.to_s + format_file)
count += 1
end
The result would be:
Dragon Ball Z S05E1.mkv
Dragon Ball Z S05E2.mkv
Dragon Ball Z S05E3.mkv
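A sketch of a slightly more defensive variant of the counter-based renames above. The helper numbered_names is hypothetical; it only computes the new names, sorting the input first so the numbering is deterministic, zero-padding the counter, and keeping each file's own extension (all three are choices of this sketch, not of the scripts above).

```ruby
# Hypothetical helper: maps each old name to "<prefix><NN><original extension>".
def numbered_names(old_names, prefix, width: 2)
  old_names.sort.each_with_index.map do |old, index|
    [old, format("%s%0#{width}d%s", prefix, index + 1, File.extname(old))]
  end.to_h
end

p numbered_names(['ep b.mkv', 'ep a.mkv'], 'Dragon Ball Z S05E')
# prints {"ep a.mkv"=>"Dragon Ball Z S05E01.mkv", "ep b.mkv"=>"Dragon Ball Z S05E02.mkv"}
```

The resulting hash can be reviewed first and then applied with something like each { |old, new| File.rename(old, new) }.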
In a folder, I wanted to remove the trailing underscore _ of any audio filename while keeping everything else. Sharing my code here as it might help someone.
What the program does:
Prompts the user for the:
Directory path: c:/your/path/here (make sure to use slashes /, not backslashes, \, and without the final one).
File extension: mp3 (without the dot .)
Trailing characters to remove: _
Looks for any file ending with c:/your/path/here/filename_.mp3 and renames it c:/your/path/here/filename.mp3 while keeping the file’s original extension.
puts 'Enter directory path'
path = gets.strip
directory_path = Dir.glob("#{path}/*")
# Get file extension
puts 'Enter file extension'
file_extension = gets.strip
# Get trailing characters to remove
puts 'Enter trailing characters to remove'
trailing_characters = gets.strip
suffix = "#{trailing_characters}.#{file_extension}"
# Rename file if condition is met
directory_path.each do |file_path|
next unless file_path.end_with?(suffix)
File.rename(file_path, "#{file_path.delete_suffix(suffix)}.#{file_extension}")
end

Ruby class is outputting files as well as directories

Why does the class below output directories as well as filenames on the line print "\n"+f?
I just want to output the files, but directories are being output as well.
class Sort
require 'find'
directoryToSort = "c:\\test"
total_size = 0
Find.find(directoryToSort) do |path|
if FileTest.directory?(path)
if File.basename(path)[0] == ?.
Find.prune # Don't look any further into this directory.
else
Dir.foreach(path) do
|f|
# do whatever you want with f, which is a filename within the
# given directory (not fully-qualified)
if !FileTest.directory? f
print "\n"+f
end
end
next
end
else
end
end
end
It says right there in a comment:
# do whatever you want with f, which is a filename within the
# given directory (not fully-qualified)
The key is the "not fully-qualified" part. You need to do something like:
if !FileTest.directory?(File.join(path, f))
Consider using the Ruby standard File.directory? method instead.
You need File.directory?( filename ) to check whether the entry is a directory.
You probably want to do something along these lines.
This is a helper method that descends recursively into a directory and executes an action on each file whose name matches a regular expression. It is a bit overkill for you, but maybe it helps.
# recursiveDirectoryDescend
# do action for files matching regexp
#
# (not very elegant solution, but just for illustration purposes. Pulled from some very old code.)
def recursive_dir_descend(dir,regexp,action)
olddir = Dir.pwd
dirp = Dir.open(dir)
Dir.chdir(dir)
pwd = Dir.pwd
for file in dirp
file.chomp
next if file =~ /^\.\.?$/ # ON UNIX, ignore '.' and '..' directories
filename = "#{pwd}/#{file}"
if File.directory?(filename) # CHECK IF DIRECTORY
recursive_dir_descend(filename,regexp,action)
else
if file =~ regexp
eval action # execute action on filename
end
end
end
Dir.chdir(olddir)
end
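As an alternative to changing directories and using eval, the same traversal can be sketched with Find.find and a block. The method name each_matching_file is hypothetical. Find.find yields full paths, so File.directory? can be applied to them directly, and hidden directories are pruned as in the question's code.

```ruby
require 'find'

# Hypothetical helper: yields the full path of every regular entry under dir
# whose basename matches regexp; directories themselves are never yielded.
def each_matching_file(dir, regexp)
  return enum_for(:each_matching_file, dir, regexp) unless block_given?

  Find.find(dir) do |path|
    name = File.basename(path)
    if File.directory?(path)
      # Do not descend into hidden directories (but never prune the root itself).
      Find.prune if name.start_with?('.') && path != dir
    elsif name.match?(regexp)
      yield path
    end
  end
end
```

For the question's case, each_matching_file('c:/test', //) { |path| puts path } would print files only.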
