Find all sub directories and long paths - ruby

I am trying to find every folder that contains a .fbi2 file for processing later on in the script. But I am running into an issue where long paths are causing the script to terminate.
My code is as follows:
require "find"

def find_case_directories(search_directories)
  case_paths = []
  search_directories.each do |search_directory|
    Find.find(search_directory) do |path|
      if FileTest.directory?(path)
        unless Dir["#{path}/case.fbi2"].empty?
          case_paths << "#{path}"
          Find.prune # Don't look any further into this directory.
        else
          next
        end
      end
    end
  end
  case_paths
end
And the error I am getting is:
Script was cancelled by the user.
Errno::ESRCH: No such process - D:/Cases/<REDACTED>/<REDACTED>/20180914_Import_01/20180914_Import_01/<REDACTED>/<REDACTED>/Tranche 2/<REDACTED>/<REDACTED>.MSG
lstat at org/jruby/RubyFile.java:931
block in find at uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/find.rb:51
catch at org/jruby/RubyKernel.java:1114
block in find at uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/find.rb:48
each at org/jruby/RubyArray.java:1734
find at uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/find.rb:43
block in find_case_directories at <script>:76
each at org/jruby/RubyArray.java:1734
find_case_directories at <script>:75
<main> at <script>:98
I have checked the path the script mentions, and it is indeed over 255 characters in length.
How can I recursively walk directories to find all folders that contain a .fbi2 file, and store them in an array?
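One possible approach, sketched under the assumption that skipping an unreadable entry is acceptable: walk the tree by hand with Dir.entries and rescue SystemCallError (the parent class of the Errno errors) per directory, so a single over-long path does not abort the whole search. The marker parameter and the warning messages are illustrative additions, not part of the original script, and whether this avoids the error on a given JRuby and Windows setup is untested.

```ruby
# Hypothetical replacement for the Find-based version: directories are walked
# manually so that one failing entry is skipped instead of ending the search.
def find_case_directories(search_directories, marker = "case.fbi2")
  case_paths = []
  pending = search_directories.dup # directories still to be visited

  until pending.empty?
    dir = pending.pop
    begin
      entries = Dir.entries(dir) - %w[. ..]
    rescue SystemCallError => e
      warn "Skipping #{dir}: #{e.message}"
      next
    end

    if entries.include?(marker)
      case_paths << dir
      next # do not descend further, like Find.prune
    end

    entries.each do |entry|
      child = File.join(dir, entry)
      begin
        pending << child if File.directory?(child)
      rescue SystemCallError => e
        warn "Skipping #{child}: #{e.message}"
      end
    end
  end

  case_paths
end
```

Note that entries.include?(marker) compares names exactly, whereas the glob in the original code follows the file system's case rules on Windows.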

Related

Rename specific files depending on a different file in same directory

I'm practising some programming and I'm now faced with the following issue. I have a folder with multiple subfolders inside. Each subfolder contains two files: an .xlsx and a .doc file. I want to rename the .xlsx depending on the name of the .doc file. For example, in directory documents\main_folder\folder_1 there are two files: test_file.xlsx and final_file.doc. After running my code, result should be final_file.xlsx and final_file.doc. This must happen with all subfolders.
My code so far:
require 'FileUtils'
filename = nil
files = Dir.glob('**/*.doc')
files.each do |rename|
filename = File.basename(rename, File.extname(rename))
puts "working with file: #{filename}"
end
subs = Dir.glob('**/*.xlsx')
subs.each do |renaming|
File.rename(renaming, filename)
end
Two issues with this code: first, the .xlsx file is moved to the directory where the .rb file is located. Second, the renaming only partly works: the extension is dropped entirely instead of being kept. Any help?
Dir.glob('**/*.doc').each do |doc_file|
  # extract folder path e.g. "./foo" from "./foo/bar.doc"
  dir = File.dirname(doc_file)
  # extract filename without extension e.g. "bar" from "./foo/bar.doc"
  basename = File.basename(doc_file, File.extname(doc_file))
  # find the xlsx file in the same folder
  xlsx_file = Dir.glob("#{dir}/*.xlsx")[0]
  # skip folders without an xlsx file, since File.rename(nil, ...) would raise
  next if xlsx_file.nil?
  # perform the replacement
  File.rename(xlsx_file, "#{dir}/#{basename}.xlsx")
end
Edit: the validation step you requested:
# first, get all the directories
dirs = Dir.glob("**/*").select { |path| File.directory?(path) }
# then validate each of them
dirs.each do |dir|
  [".doc", ".xlsx"].each do |ext|
    # raise an error unless the extension has exactly 1 file
    unless Dir.glob("#{dir}/*#{ext}").count == 1
      raise "#{dir} doesn't have exactly 1 #{ext} file"
    end
  end
end
You can also collect the errors into one combined message if you prefer: push each error message into an errors array instead of raising as soon as one comes up, then raise once at the end.
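A minimal sketch of that combined-message variant. The helper name validation_errors and its root parameter are hypothetical, introduced only for illustration:

```ruby
# Hypothetical helper: returns one message for every directory/extension pair
# that does not contain exactly one matching file.
def validation_errors(root = ".")
  dirs = Dir.glob(File.join(root, "**", "*")).select { |path| File.directory?(path) }
  errors = []
  dirs.each do |dir|
    [".doc", ".xlsx"].each do |ext|
      count = Dir.glob(File.join(dir, "*#{ext}")).count
      # collect the problem instead of raising immediately
      errors << "#{dir} has #{count} #{ext} file(s), expected exactly 1" unless count == 1
    end
  end
  errors
end

# Possible usage: raise once, with every problem listed.
#   errors = validation_errors
#   raise errors.join("\n") unless errors.empty?
```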

How do I open each file in a directory with Ruby?

I need to open each file inside a directory. My attempt at this looks like:
Dir.foreach('path/to/directory') do |filename|
next if filename == '.' || filename == '..'
puts "working on #{filename}"
# this is where it crashes
file = File.open(filename, 'r')
#some code
file.close
# more code
end
My code keeps crashing at File.open(filename, 'r'). I'm not sure what filename should be.
Dir.foreach yields bare entry names, so the filename must be prefixed with the directory path unless that directory is the current working directory:
path = 'path/to/directory'
Dir.foreach(path) do |filename|
next if filename == '.' || filename == '..'
puts "working on #{filename}"
file = File.open("#{path}/#{filename}", 'r')
#some code
file.close
# more code
end
I recommend using Find.find.
While we can use various methods from the Dir class, it will look and retrieve the list of files before returning, which can be costly if we're recursively searching multiple directories or have a huge number of files embedded in the directories.
Instead, Find.find will walk the directories, returning both the directories and files as each is found. A simple check lets us decide which we want to continue processing or whether we want to skip it. The documentation has this example which should be easy to understand:
The Find module supports the top-down traversal of a set of file paths.
For example, to total the size of all files under your home directory, ignoring anything in a “dot” directory (e.g. $HOME/.ssh):
require 'find'
total_size = 0
Find.find(ENV["HOME"]) do |path|
if FileTest.directory?(path)
if File.basename(path)[0] == ?.
Find.prune # Don't look any further into this directory.
else
next
end
else
total_size += FileTest.size(path)
end
end
I'd go for Dir.glob or Find.find, but not Dir.foreach, as it also yields . and .., which you don't want.
Dir.glob('something/*').each do |filename|
next if File.directory?(filename)
do_something_with_the_file(filename)
end

Storing file names in an array

I'm trying to store the file names in some directory in an array. I have the following script:
files= Dir.glob('C:\Users\Abder-Rahman\Desktop\drugsatfda\*.*')
files.each do |filename|
contents = IO.read(filename)
puts contents
end
exit
But, I don't know why it doesn't work. What could I be missing?
Unfortunately, the documentation does not spell this out, but Dir.glob does not raise an exception when you give it a path that does not exist; it simply returns an empty array.
files = Dir.glob("./an/imaginary/directory/that/doesnt/exist/*")
# => []
Please make sure that the path you've provided exists and actually contains files.
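A related point for Windows paths: glob patterns treat the backslash as an escape character, so forward slashes are the safer separator. A minimal sketch that normalises the separators and fails loudly when the directory is missing, instead of silently returning an empty array. The helper name list_files is hypothetical:

```ruby
# Hypothetical helper: globs a directory after converting backslashes to
# forward slashes, and raises if the directory does not exist.
def list_files(directory, pattern = "*")
  normalised = directory.tr("\\", "/")
  unless File.directory?(normalised)
    raise ArgumentError, "no such directory: #{directory}"
  end
  # keep regular files only, so that IO.read does not fail on subdirectories
  Dir.glob(File.join(normalised, pattern)).select { |path| File.file?(path) }
end
```

With this, the loop from the question could iterate over list_files('C:\Users\Abder-Rahman\Desktop\drugsatfda') and call IO.read on each returned path.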

Search in current dir only

I'm using
Find.find("c:\\test")
to search for files in a dir. I just want to search the dir at this level though, so any dir within c:\test does not get searched.
Is there another method I can use ?
Thanks
# Temporarily make c:\test your current directory
Dir.chdir('c:/test') do
# Get a list of file names just in this directory as an array of strings
Dir['*'].each do |filename|
# ...
end
end
Alternatively:
# Get a list of paths like "c:/test/foo.txt"
Dir['c:/test/*'].each do |absolute|
  # Get just the filename, e.g. "foo.txt"
  filename = File.basename(absolute)
  # ...
end
With both you can get just the filenames into an array, if you like:
files = Dir.chdir('c:/test'){ Dir['*'] }
files = Dir['c:/test/*'].map{ |f| File.basename(f) }
Find's prune method allows you to skip a current file or directory:
Skips the current file or directory, restarting the loop with the next entry. If the current file is a directory, that directory will not be recursively entered. Meaningful only within the block associated with Find::find.
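root = "c:\\test"
Find.find(root) do |path|
  if FileTest.directory?(path)
    # Find.find yields the starting directory first; pruning it would end the
    # walk immediately, so only prune the directories below it.
    Find.prune unless path == root
  else
    # path is not a directory, so it must be a file directly under c:\test
    # do something with file
  end
end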
You may use Dir.foreach(), for example, to list every entry directly under c:\test (this includes subdirectories as well as "." and ".."):
Dir.foreach("c:\\test") {|x| puts "#{x}" }
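If only regular files at the top level are wanted, the entries can be filtered. A minimal sketch; files_directly_in is a hypothetical helper name:

```ruby
# Hypothetical helper: names of the regular files directly inside dir,
# without descending into subdirectories.
def files_directly_in(dir)
  Dir.entries(dir)
     .reject { |name| name == "." || name == ".." }
     .select { |name| File.file?(File.join(dir, name)) }
     .sort
end
```

Calling files_directly_in("c:/test") would then return just the file names, sorted alphabetically.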

Check directory for files, retrieve first file

I'm writing a small ruby daemon that I am hoping will do the following:
Check if a specific directory has files (in this case, .yml files)
If so, take the first file (numerically sorted preferably), and parse into a hash
Do a 'yield', with this hash as the argument
What I have right now is like:
loop do
get_next_in_queue { |s| THINGS }
end
def get_next_in_queue
queue_dir = Dir[File.dirname(__FILE__)+'/../queue']
info = YAML::load_file(queue_dir[0]) #not sure if this works or not
yield info
end
I'd like to make the yield conditional if possible, so it only happens if a file is actually found. Thanks!
Okay, I got this working!
The problem with queue_dir.empty? is that a directory always contains [".", ".."]
So what I did was:
def get_next_in_queue
  queue_dir = Dir.entries(File.dirname(__FILE__)+'/../queue')
  queue_dir.delete "."
  queue_dir.delete ".."
  if !queue_dir.empty?
    info = YAML::load_file("#{File.dirname(__FILE__)}/../queue/#{queue_dir[0]}")
    yield info
  else
    sleep(30) #since it is empty, we probably don't need to check instantly
  end
end
Just add additional checks:
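require 'yaml'

def get_next_in_queue
  # glob for the .yml files inside the queue directory; globbing the directory
  # path alone would return the directory itself rather than its contents
  queue_files = Dir[File.dirname(__FILE__)+'/../queue/*.yml'].sort
  return if queue_files.empty?
  info = YAML::load_file(queue_files[0])
  yield info if info
end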
Depending on the behaviour you want, you can additionally raise an exception, log an error, sleep for N seconds, etc.
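For the "numerically sorted" part of the question, a minimal sketch that assumes the queue files are named with a leading number (for example 2.yml, 10.yml). The helper name next_in_queue and its queue_dir parameter are hypothetical:

```ruby
require 'yaml'

# Hypothetical helper: loads the .yml file with the lowest leading number in
# queue_dir, yields the parsed data when there is any, and returns it.
# Returns nil when the directory holds no .yml files.
def next_in_queue(queue_dir)
  files = Dir.glob(File.join(queue_dir, "*.yml"))
  return nil if files.empty?

  # "10.yml" sorts before "2.yml" as a string, so compare the integer value
  # of the base name first and fall back to the path to break ties.
  first = files.min_by { |path| [File.basename(path, ".yml").to_i, path] }
  info = YAML.load_file(first)
  yield info if info && block_given?
  info
end
```

Names without a leading number convert to 0 with to_i, so they sort first; adjust the key if that is not the behaviour you want.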
