Directory walk call method when directory is reached - ruby

I'm trying to write a script that searches through a directory and its sub-directories for specific files. I would like to know how to call a method when a certain directory (or directories) comes up.
This is what I have tried, without success:
def display_directory(path)
list = Dir[path+'/*']
return if list.length == 0
list.each do |f|
if File.directory? f #is it a directory?
if File.directory?('config')
puts "this is the config folder"
end
printf "%-50s %s\n", f, "is a directory:".upcase.rjust(25)
else
printf "%-50s %s\n", f, "is not a directory:".upcase.rjust(25)
end
end
end
start = File.join("**")
puts "Processing directory\n\n".upcase.center(30)
display_directory start
This is what I want to happen:
app
app/controllers
app/helpers
app/mailers
app/models
app/models/bugzilla
app/models/security
app/views
app/views/auth
app/views/calendar
app/views/layouts
app/views/step
app/views/step_mailer
app/views/suggestion
app/views/suggestion_mailer
app/views/task
app/views/user
bin
--------------------------------------
config <----------(call method foo)
config/environments
config/initializers
config/locales
--------------------------------------
db
db/bugzilla
db/migrate
db/security
lib
lib/tasks
log
public
public/images
public/javascripts
public/stylesheets
script
script/performance
script/process
--------------------------
test <---------(call method foobar)
test/fixtures
test/fixtures/mailer
test/functional
test/integration
test/performance
test/unit
--------------------------
vendor
vendor/plugins

Instead of
if File.directory?('config')
try
if f.include?('config')
(f is already a String returned by Dir[], so it has no path method.) This will match every directory that has config anywhere in its path, so use a longer substring, or compare File.basename(f) with 'config', to get a more precise match.
Also, it is idiomatic in Ruby to use do..end for multi-line blocks and {..} for single-line blocks.
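To avoid substring false positives, the check can be made on the directory's base name and the matching method looked up in a table. The following is only a minimal sketch under assumptions: foo and foobar stand in for the methods named in the question's diagram, and walk_with_handlers and its handlers parameter are hypothetical names introduced here. It uses Find from the standard library.

```ruby
require 'find'

# Hypothetical handlers standing in for the question's foo and foobar.
def foo(dir)
  "foo called for #{dir}"
end

def foobar(dir)
  "foobar called for #{dir}"
end

# Walk root recursively. Whenever a directory whose base name is a key of
# handlers is reached, call the mapped method and collect its return value.
def walk_with_handlers(root, handlers = { 'config' => :foo, 'test' => :foobar })
  results = []
  Find.find(root) do |path|
    next unless File.directory?(path)
    handler = handlers[File.basename(path)]
    results << send(handler, path) if handler
  end
  results
end

# Example: walk_with_handlers('.').each { |line| puts line }
```

Because the comparison is on File.basename, a directory such as app/myconfig would not trigger the config handler.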

I figured out a way. This works pretty well. I've added a method to show all the files in the mentioned directory when it is reached.
def special_dir(path)
  puts "------------------------------------"
  # List everything beneath the special directory, at any depth.
  Dir.glob(File.join(path, "**", "*")).each do |cf|
    puts "\t" + cf
  end
end
def walk(path)
  # Dir[] already returns paths prefixed with `path`, so they can be
  # tested and recursed into directly, without joining them again.
  list = Dir[File.join(path, '*')].reject { |r| r['doc'] || r['tmp'] }
  list.each do |x|
    next unless File.directory?(x)
    if x =~ /config/ || x =~ /test/
      special_dir(x)
    else
      puts x
      walk(x)
    end
  end
end
# Start from the current directory.
walk "."

Related

Ruby file full path

I am writing a program where I cycle through all files in the sub-folders of a target folder and do stuff with what's written in them. So my local folder looks like this:
Folder
--Subfolder
---File
---File
---File
--Subfolder
---File
.
.
.
So I have an each loop to cycle through all subfolders, and for each subfolder I call a method that does basically the same thing inside the subfolder: it calls another method for each file, passing the file as an argument (obtained through a Dir.foreach(folder){ |file| method(file) } call).
So it looks like this:
Dir.foreach(Dir.pwd){ |folder| call_method(folder) }
def call_method(folder) Dir.foreach(folder){|file| reading_method(file) } end
That last method (reading_method) should call a C program and pass it the full path of the file as an argument (so that the C program can open it). I'm using File.absolute_path(file) in reading_method, but instead of returning C:/folder/subfolder/file as I want, it returns C:/folder/file, skipping the subfolder (and thus the C program fails to execute).
Is there a way to get the full path of that file?
Thanks for your help
EDIT : Here is the full code as asked
## Module
module GBK_Reader
PATH = "Z:/Folder/"
SAFETY = true
SAFETY_COUNT = 10
end
## Methods definitions
def read_file(file)
path = File.absolute_path(file)
c_string = `C:/Code/GBK_Reader/bin/Debug/GBK_Reader.exe #{path}`
return c_string.split(/ /).collect!{|spec| spec.to_i}
end
def read_folder(folder)
Dir.foreach(folder){ |file|
next if File.extname(file) != ".gbk"
temp = read_file(file)
#$bacteria_specs[0] += temp[0]
#$bacteria_specs[1] += temp[1]
}
return $bacteria_specs
end
## Main
# Look for folder
Dir.chdir(GBK_Reader::PATH)
puts "Directory found"
# Cycle through all sub-folders
$high_gc = {} #Hash to store high GC content bacterias
$count = 0
puts "Array variable set"
Dir.foreach(Dir.pwd){ |file|
next if file == "." || file == ".."
break if $count >= GBK_Reader::SAFETY_COUNT
$count += 1 if GBK_Reader::SAFETY
$bacteria_specs = [0.00, 0.00, 0.00]
$path = File.expand_path(file)
if File.directory?(file)
# Cycle through all .gbk files in sub-folder and call C program
read_folder(file)
else
# Call C program to directly evaluate GC content
c_string = read_file(file) if File.extname(file) == ".gbk"
$bacteria_specs[0] = c_string[0].to_i
$bacteria_specs[1] = c_string[1].to_i
end
# Evaluate GC content and store suitable entries
$bacteria_specs[2] = ($bacteria_specs[0]/$bacteria_specs[1])*100.00
$high_gc[file] = $bacteria_specs if $bacteria_specs[2] > 60
}
# Display suitable entries
puts "\n\n\n"
puts $high_gc
gets.chomp
OK, I may have found something, but it seems ugly, so if anyone has a better solution, by all means go ahead.
I edited my read_folder method to pass the full path to the read_file method as follows:
def read_folder(folder)
Dir.foreach(folder){ |file|
next if File.extname(file) != ".gbk"
path = File.absolute_path(folder)+'/'+File.basename(file)
temp = read_file(path)
$bacteria_specs[0] += temp[0]
$bacteria_specs[1] += temp[1]
}
return $bacteria_specs
end
And I do get the path I expect. (though my calling the C program still fails so I'll have to check somewhere else :D)
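For what it's worth, the subfolder goes missing because Dir.foreach yields bare entry names, and File.absolute_path resolves a relative name against the current working directory rather than against the folder being listed. A possibly tidier sketch is to hand the folder to File.expand_path as its second argument. Here gbk_paths is a hypothetical helper name, not something from the original script.

```ruby
# Return the absolute paths of the .gbk files directly inside folder.
# gbk_paths is a hypothetical helper name used for illustration only.
def gbk_paths(folder)
  Dir.foreach(folder)
     .select { |name| File.extname(name) == '.gbk' }
     # The second argument makes expand_path resolve name relative to
     # folder instead of the current working directory.
     .map { |name| File.expand_path(name, folder) }
     .sort
end

# Example: gbk_paths('Subfolder').each { |path| puts path }
```

Each returned path could then be passed to read_file unchanged.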

Script to append files

I am trying to write a script to do the following:
There are two directories A and B. In directory A, there are files called "today" and "today1". In directory B, there are three files called "today", "today1" and "otherfile".
I want to loop over the files in directory A and append the files that have similar names in directory B to the files in Directory A.
I wrote the method below to handle this but I am not sure if this is on track or if there is a more straightforward way to handle such a case?
Please note I am running the script from directory B.
def append_data_to_daily_files
directory = "B"
Dir.entries('B').each do |file|
fileName = file
next if file == '.' or file == '..'
File.open(File.join(directory, file), 'a') {|file|
Dir.entries('.').each do |item|
next if !(item.match(/fileName/))
File.open(item, "r")
file<<item
item.close
end
#file.puts "hello"
file.close
}
end
end
In my opinion, your append_data_to_daily_files() method is trying to do too many things -- which makes it difficult to reason about. Break down the logic into very small steps, and write a simple method for each step. Here's a start along that path.
require 'set'
def dir_entries(dir)
  Dir.chdir(dir) {
    return Dir.glob('*').to_set
  }
end
def append_file_content(target, source)
  File.open(target, 'a') { |fh|
    fh.write(IO.read(source))
  }
end
def append_common_files(target_dir, source_dir)
  ts = dir_entries(target_dir)
  ss = dir_entries(source_dir)
  common_files = ts.intersection(ss)
  common_files.each do |file_name|
    t = File.join(target_dir, file_name)
    s = File.join(source_dir, file_name)
    append_file_content(t, s)
  end
end
# Run script like this:
# ruby my_script.rb A B
append_common_files(*ARGV)
By using a Set, you can easily figure out the common files. By using glob you can avoid the hassle of filtering out the dot-directories. By designing the code to take its directory names from the command line (rather than hard-coding the names in the script), you end up with a potentially re-usable tool.
My solution....
def append_old_logs_to_daily_files
directory = "B"
#For each file in the folder "B"
Dir.entries('B').each do |file|
fileName = file
#skip dot directories
next if file == '.' or file == '..'
#Open each file
File.open(File.join(directory, file), 'a') {|file|
#Get each log file from the current directory in turn
Dir.entries('.').each do |item|
next if item == '.' or item == '..'
#that matches the day we are looking for
next if !(item.match(fileName))
#Read the log file
logFilesToBeCopied = File.open(item, "r")
contents = logFilesToBeCopied.read
file<<contents
end
file.close
}
end
end

How to open and read files line-by-line from a directory?

I am trying to read file lines from a directory containing about 200 text files, however, I can't get Ruby to read them line-by-line. I did it before, using one text file, not reading them from a directory.
I can get the file names as strings, but I am struggling to open them and read each line.
Here are some of the methods I've tried.
Method 1:
def readdirectory
  @filearray = []
  Dir.foreach('mydirectory') do |i|
    # puts i.class
    @filearray.push(i)
    @filearray.each do |s|
      # @words =IO.readlines('s')
      puts s
    end#do
    # puts @words
  end#do
end#readdirectory
Method 2:
def tryread
Dir.foreach('mydir'){
|x| IO.readlines(x)
}
end#tryread
Method 3:
def tryread
Dir.foreach('mydir') do |s|
File.readlines(s).each do |line|
sentence =line.split
end#inner do
end #do
end#tryread
With every attempt to open the string passed by the loop function, I keep getting the error:
Permission denied - . (Errno::EACCES)
sudo ruby reader.rb (or whatever your filename is).
Since permissions are process-based, you cannot read files that require elevated permissions if the reading process does not have them.
The only solutions are either to run the script with more permissions or to call another process that is already running with higher permissions to read the files for you.
Thanks for all the replies. I did a bit of trial and error and got it to work. This is the syntax I used:
Dir.entries('lemmatised').each do |s|
  if !File.directory?(s)
    file = File.open("pathname/#{s}", 'r')
    file.each_line do |line|
      count += 1
      @words << line.split(/[^a-zA-Z]/)
    end # inner do
    puts @words
  end # if
end # do
Try this one:
# it'll hold the lines
f = []
# here the test directory contains all the files;
# write the path as it is on your computer
dir = "c:/Users/lordsangram/desktop/test"
# fetch the file names, skipping "." and "..", which have no associated
# files (they are not guaranteed to be the first two entries)
a = Dir.entries(dir).reject { |name| name == "." || name == ".." }
# read the files, line by line
a.each do |name|
  File.readlines(File.join(dir, name)).each do |line|
    f.push(line)
  end
end
f.each do |l|
  puts l
end
@the Tin Man -> you need to avoid processing "." and "..", which are listed by Dir.foreach and cause the permission denied error. A simple if should fix all your approaches.
Dir.foreach(ARGV[0]) do |f|
if f != "." and f != ".."
# code to process file
# example
# File.open(ARGV[0] + "\\" + f) do |file|
# end
end
end
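Putting the pieces of this thread together, here is a minimal sketch of reading every file in a directory line by line. It assumes only that you want the regular files directly inside the directory; each_line_in_dir is a hypothetical helper name. Joining the directory onto each entry name is what the attempts in the question were missing, and testing File.file? on the joined path also skips "." and "..".

```ruby
# Yield the path and each line of every regular file directly inside dir.
# each_line_in_dir is a hypothetical helper name used for illustration.
def each_line_in_dir(dir)
  return enum_for(:each_line_in_dir, dir) unless block_given?

  Dir.foreach(dir) do |name|
    path = File.join(dir, name)
    next unless File.file?(path) # skips ".", ".." and sub-directories
    File.foreach(path) { |line| yield path, line }
  end
end

# Example:
# each_line_in_dir('mydir') { |path, line| puts "#{path}: #{line}" }
```

File.foreach reads one line at a time, so large files are not loaded into memory in full.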

How do I get all the files names in one folder using Ruby?

These are in a folder:
This_is_a_very_good_movie-y08iPnx_ktA.mp4
myMovie2-lKESbDzUwUg.mp4
his_is_another_movie-lKESbDzUwUg.mp4
How do I fetch the first part of the string mymovie1 from the file by giving the last part, y08iPnx_ktA? Something like:
get_first_part("y08iPnx_kTA") #=> "This_is_a_very_good_movie"
Break the problem into parts. The method get_first_part should go something like:
1. Use Dir to get a listing of files.
2. Iterate over each file.
3. Extract the "name" ('This_is_a_very_good_movie') and the "tag" ('y08iPnx_ktA'). The same regex should be used for each file.
4. If the "tag" matches what is being looked for, return "name".
Happy coding.
Play around in the REPL and have fun :-)
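If it helps, the steps above could be sketched roughly like this. The regex is an assumption inferred from the sample file names (a name, a hyphen, an 11-character tag, then an extension); it is not something stated in the question, so adjust it to your real naming scheme.

```ruby
# Return the "name" part of the first file in path whose "tag" equals tag,
# or nil when no file matches.
def get_first_part(path, tag)
  # Assumed layout: <name>-<tag>.<extension>, with an 11-character tag
  # made of word characters or hyphens, such as "y08iPnx_ktA".
  pattern = /\A(?<name>.+)-(?<tag>[\w-]{11})\.\w+\z/
  Dir.entries(path).each do |file_name|
    match = pattern.match(file_name)
    return match[:name] if match && match[:tag] == tag
  end
  nil
end

# Example: get_first_part('/path/to/files', 'y08iPnx_ktA')
```

The "." and ".." entries simply fail to match the pattern, so they need no special handling here.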
def get_first_part(path, suffix)
Dir.entries(path).find do |fname|
File.basename(fname, File.extname(fname)).end_with?(suffix)
end.split(suffix).first
end
Kind of expands on the answer from @Steve Wilhelm, except that it doesn't use glob (there's no need for it when we're only working with filenames), avoids Regexp, and passes File.extname(fname) to the File.basename call so you don't have to include the file extension. It also returns the string "This_is_a_very_good_movie" instead of an array of files.
This will of course raise if no file is found (find returns nil, and nil has no split method). If you just want to return nil when no match is found:
def get_first_part(path, suffix)
file = Dir.entries(path).find do |fname|
File.basename(fname, File.extname(fname)).end_with?(suffix)
end
file.split(suffix).first if file
end
Can it be done cleaner than this? REVISED based on @Tin Man's suggestion
def get_first_part(path, suffix)
Dir.glob(path + "*" + suffix + "*").map { |x| File.basename(x).gsub(Regexp.new("#{suffix}\.*$"),'') }
end
puts get_first_part("/path/to/files/", "-y08iPnx_kTA")
If the filenames only have a single hyphen:
path = '/Users/greg/Desktop/test'
target = 'rb'
def get_files(path, target)
Dir.chdir(path) do
return Dir["*#{ target }*"].map{ |f| f.split('-').first }
end
end
puts get_files(path, 'y08iPnx_ktA')
# >> This_is_a_very_good_movie
If there are multiple hyphens:
def get_files(path, target)
Dir.chdir(path) do
return Dir["*#{ target }*"].map{ |f| f.split(target).first.chop }
end
end
puts get_files(path, 'y08iPnx_ktA')
# >> This_is_a_very_good_movie
If the code is assumed to be running from inside the directory containing the files, then Dir.chdir can be removed, simplifying things to either:
puts Dir["*#{ target }*"].map{ |f| f.split('-').first }
# >> This_is_a_very_good_movie
or
puts Dir["*#{ target }*"].map{ |f| f.split(target).first.chop }
# >> This_is_a_very_good_movie

Get names of all files from a folder with Ruby

I want to get all file names from a folder using Ruby.
You also have the shortcut option of
Dir["/path/to/search/*"]
and if you want to find all Ruby files in any folder or sub-folder:
Dir["/path/to/search/**/*.rb"]
Dir.entries(folder)
example:
Dir.entries(".")
Source: http://ruby-doc.org/core/classes/Dir.html#method-c-entries
The following snippet shows exactly the names of the files inside a directory, skipping subdirectories and the "." and ".." entries:
Dir.entries("your/folder").select { |f| File.file? File.join("your/folder", f) }
To get all files (strictly files only) recursively:
Dir.glob('path/**/*').select { |e| File.file? e }
Or anything that's not a directory (File.file? would reject non-regular files):
Dir.glob('path/**/*').reject { |e| File.directory? e }
Alternative Solution
Using Find#find over a pattern-based lookup method like Dir.glob is actually better. See this answer to "One-liner to Recursively List Directories in Ruby?".
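For reference, a minimal sketch of the Find approach from the standard library. One practical difference from a glob is that Find.prune lets you skip whole subtrees while walking. files_under and its skip parameter are hypothetical names introduced here for illustration.

```ruby
require 'find'

# Return every regular file below root, skipping any directory whose
# base name is listed in skip. files_under is a hypothetical helper name.
def files_under(root, skip: [])
  files = []
  Find.find(root) do |path|
    if File.directory?(path)
      # Find.prune stops the walk from descending into this directory.
      Find.prune if skip.include?(File.basename(path))
    elsif File.file?(path)
      files << path
    end
  end
  files
end

# Example: files_under('.', skip: ['tmp', 'doc'])
```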
This works for me:
If you don't want hidden files[1], use Dir[]:
# With a relative path, Dir[] will return relative paths
# as `[ './myfile', ... ]`
#
Dir[ './*' ].select{ |f| File.file? f }
# Want just the filename?
# as: [ 'myfile', ... ]
#
Dir[ '../*' ].select{ |f| File.file? f }.map{ |f| File.basename f }
# Turn them into absolute paths?
# [ '/path/to/myfile', ... ]
#
Dir[ '../*' ].select{ |f| File.file? f }.map{ |f| File.absolute_path f }
# With an absolute path, Dir[] will return absolute paths:
# as: [ '/home/../home/test/myfile', ... ]
#
Dir[ '/home/../home/test/*' ].select{ |f| File.file? f }
# Need the paths to be canonical?
# as: [ '/home/test/myfile', ... ]
#
Dir[ '/home/../home/test/*' ].select{ |f| File.file? f }.map{ |f| File.expand_path f }
Now, Dir.entries will return hidden files, and you don't need the wildcard asterisk (you can just pass the variable with the directory name), but it returns base names only, so the File.xxx functions won't work unless the directory is the current working directory or you join the directory back on.
# In the current working dir:
#
Dir.entries( '.' ).select{ |f| File.file? f }
# In another directory, relative or otherwise, you need to transform the path
# so it is either absolute, or relative to the current working dir to call File.xxx functions:
#
home = "/home/test"
Dir.entries( home ).select{ |f| File.file? File.join( home, f ) }
[1] .dotfile on unix, I don't know about Windows
In Ruby 2.5 you can now use Dir.children. It gets filenames as an array except for "." and ".."
Example:
Dir.children("testdir") #=> ["config.h", "main.rb"]
http://ruby-doc.org/core-2.5.0/Dir.html#method-c-children
Personally, I found this the most useful for looping over files in a folder, as it safely skips any directories:
Dir['/etc/path/*'].each do |file_name|
next if File.directory? file_name
end
This is a solution to find files in a directory:
files = Dir["/work/myfolder/**/*.txt"]
files.each do |file_name|
if !File.directory? file_name
puts file_name
File.open(file_name) do |file|
file.each_line do |line|
if line =~ /banco1/
puts "Found: #{line}"
end
end
end
end
end
This code returns only the file names with their extensions (without the full path):
Dir.children("/path/to/search/")
#=> ["file_1.rb", "file_2.html", "file_3.js"]
While getting all the file names in a directory, this snippet can be used to reject both directories (including . and ..) and hidden files, whose names start with a dot:
files = Dir.entries("your/folder").reject { |f| File.directory?(File.join("your/folder", f)) || f.start_with?('.') }
This is what works for me:
Dir.entries(dir).select { |f| File.file?(File.join(dir, f)) }
Dir.entries returns an array of strings. Then, we have to provide a full path of the file to File.file?, unless dir is equal to our current working directory. That's why this File.join().
Dir.new('/home/user/foldername').each { |file| puts file }
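You may also want to use Rake::FileList (provided you have the rake dependency):
require 'rake'
# A block given to FileList.new receives the list itself, so iterate with each.
Rake::FileList.new('lib/*').each do |file|
  p file
end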
According to the API:
FileLists are lazy. When given a list of glob patterns for possible
files to be included in the file list, instead of searching the file
structures to find the files, a FileList holds the pattern for latter
use.
https://docs.ruby-lang.org/en/2.1.0/Rake/FileList.html
One simple way could be:
dir = './' # desired directory
files = Dir.glob(File.join(dir, '**', '*')).select{|file| File.file?(file)}
files.each do |f|
puts f
end
def get_path_content(dir)
queue = Queue.new
result = []
queue << dir
until queue.empty?
current = queue.pop
Dir.entries(current).each { |file|
full_name = File.join(current, file)
if not (File.directory? full_name)
result << full_name
elsif file != '.' and file != '..'
queue << full_name
end
}
end
result
end
This returns the paths of all files in the directory and its subdirectories, each prefixed with the dir argument that was passed in.
If you want to get an array of filenames including symlinks, use
dir = '/path/to/dir'
Dir.new(dir).entries.reject { |f| File.directory? File.join(dir, f) }
or even
Dir.new(dir).reject { |f| File.directory? File.join(dir, f) }
and if you want to go without symlinks, use
Dir.new(dir).select { |f| File.file? File.join(dir, f) }
As shown in other answers, use Dir.glob('/path/to/dir/**/*') instead of Dir.new('/path/to/dir') if you want to get all the files recursively.
In addition to the suggestions in this thread, I wanted to mention that if you need to return dot files as well (.gitignore, etc), with Dir.glob you would need to include a flag as so:
Dir.glob("/path/to/dir/*", File::FNM_DOTMATCH)
By default, Dir.entries includes dot files, as well as the current and parent directory entries (. and ..).
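As a small illustration of the flag, here is a sketch that lists the regular files of one directory with dot files included. file_names_with_dotfiles is a hypothetical helper name, and the File.file? filter is there because with File::FNM_DOTMATCH the pattern can also match the "." entry.

```ruby
# Return the sorted base names of the regular files directly inside dir,
# including dot files such as .gitignore.
# file_names_with_dotfiles is a hypothetical helper name.
def file_names_with_dotfiles(dir)
  Dir.glob(File.join(dir, '*'), File::FNM_DOTMATCH)
     .select { |path| File.file?(path) } # drops "." and sub-directories
     .map { |path| File.basename(path) }
     .sort
end

# Example: puts file_names_with_dotfiles('/path/to/dir')
```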
For anyone interested, I was curious how the answers here compared to each other in execution time. Here are the results against a deeply nested hierarchy. The first three results are non-recursive:
      user     system      total         real
Dir[*]: (34900 files stepped over 100 iterations)
  0.110729   0.139060   0.249789 (  0.249961)
Dir.glob(*): (34900 files stepped over 100 iterations)
  0.112104   0.142498   0.254602 (  0.254902)
Dir.entries(): (35600 files stepped over 100 iterations)
  0.142441   0.149306   0.291747 (  0.291998)
Dir[**/*]: (2211600 files stepped over 100 iterations)
  9.399860  15.802976  25.202836 ( 25.250166)
Dir.glob(**/*): (2211600 files stepped over 100 iterations)
  9.335318  15.657782  24.993100 ( 25.006243)
Dir.entries() recursive walk: (2705500 files stepped over 100 iterations)
 14.653018  18.602017  33.255035 ( 33.268056)
Dir.glob(**/*, File::FNM_DOTMATCH): (2705500 files stepped over 100 iterations)
 12.178823  19.577409  31.756232 ( 31.767093)
These were generated with the following benchmarking script:
require 'benchmark'
base_dir = "/path/to/dir/"
n = 100
Benchmark.bm do |x|
x.report("Dir[*]:") do
i = 0
n.times do
i = i + Dir["#{base_dir}*"].select {|f| !File.directory? f}.length
end
puts " (#{i} files stepped over #{n} iterations)"
end
x.report("Dir.glob(*):") do
i = 0
n.times do
i = i + Dir.glob("#{base_dir}/*").select {|f| !File.directory? f}.length
end
puts " (#{i} files stepped over #{n} iterations)"
end
x.report("Dir.entries():") do
i = 0
n.times do
i = i + Dir.entries(base_dir).select {|f| !File.directory? File.join(base_dir, f)}.length
end
puts " (#{i} files stepped over #{n} iterations)"
end
x.report("Dir[**/*]:") do
i = 0
n.times do
i = i + Dir["#{base_dir}**/*"].select {|f| !File.directory? f}.length
end
puts " (#{i} files stepped over #{n} iterations)"
end
x.report("Dir.glob(**/*):") do
i = 0
n.times do
i = i + Dir.glob("#{base_dir}**/*").select {|f| !File.directory? f}.length
end
puts " (#{i} files stepped over #{n} iterations)"
end
x.report("Dir.entries() recursive walk:") do
i = 0
n.times do
def walk_dir(dir, result)
Dir.entries(dir).each do |file|
next if file == ".." || file == "."
path = File.join(dir, file)
if Dir.exist?(path)
walk_dir(path, result)
else
result << file
end
end
end
result = Array.new
walk_dir(base_dir, result)
i = i + result.length
end
puts " (#{i} files stepped over #{n} iterations)"
end
x.report("Dir.glob(**/*, File::FNM_DOTMATCH):") do
i = 0
n.times do
i = i + Dir.glob("#{base_dir}**/*", File::FNM_DOTMATCH).select {|f| !File.directory? f}.length
end
puts " (#{i} files stepped over #{n} iterations)"
end
end
The differences in file counts are due to Dir.entries including hidden files by default. Dir.entries ended up taking a bit longer in this case due to needing to rebuild the absolute path of the file to determine if a file was a directory, but even without that it was still taking consistently longer than the other options in the recursive case. This was all using ruby 2.5.1 on OSX.
To load the names of all files in the working directory, you can use
Dir.glob("*")
This returns all entries in the directory the application is running from (note that for Rails this is the top-level directory of the application).
Additional matching and recursive searching are described at https://ruby-doc.org/core-2.7.1/Dir.html#method-c-glob
If you create directories with spaces:
mkdir "a b"
touch "a b/c"
You don't need to escape the directory names, it will do it automatically:
p Dir["a b/*"] # => ["a b/c"]

Resources