Ruby: exclude directories and/or files in search

I am trying to find all files in a directory and all subdirectories, that match a certain file extension, but ignore files which match any elements from an 'ignore' array.
Example:
ignore = ['test.conf', 'another.conf']
The files 'test.conf' and 'another.conf' should be ignored.
So far I have this:
Find.find('./').select { |x|
x.match('.*\.conf$') # => only files ending in .conf
}.reject { |x|
# code to reject any files which match any elements from 'ignore'
}
I know I can do something like this:
Find.find('./').select { |x|
x.match('.*\.conf')
}.reject { |x|
x.match('test.conf|another.conf')
}
But the array may contain a large number of files, and I do not want to write them all out as above.
Help appreciated.

What you should be using is Array#- (array difference):
matches - ignore
For your purpose, a better way to get the matches is Dir.glob. So the whole code should be
Dir.glob("**/*.conf") - ignore
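One caveat with the subtraction: Dir.glob("**/*.conf") returns paths relative to the current directory (for example sub/test.conf), so subtracting the array only removes exact matches at the top level. Here is a minimal sketch that compares base names instead, assuming the ignore list holds bare file names. The helper name conf_files is made up for illustration.

```ruby
# Hypothetical helper: list *.conf files under `root`, skipping any whose
# base name appears in `ignore`.
def conf_files(root, ignore)
  Dir.glob(File.join(root, '**', '*.conf')).reject do |path|
    ignore.include?(File.basename(path))
  end
end

ignore = ['test.conf', 'another.conf']
puts conf_files('.', ignore)
```

For a very large ignore list, converting it to a Set first (require 'set') should keep each lookup cheap.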

Related

Checking for words inside folders/subfolders and files

I am having an issue with regular expressions. I have a folder that contains subfolders as well as files, and I have to check for certain words in those files. The words I have to check for are listed in a file called words.txt.
This is the code that I have so far in Ruby:
def check_words
array_of_words = File.readlines('words.txt')
re = Regexp.union(array_of_words)
new_array_of_words = [/\b(?:#{re.source})\b/]
Dir['source_test/**/*'].select{|f| File.file?(f) }.each do |filepath|
new_array_of_words.each do |word|
puts File.foreach(filepath).include?(word)
end
end
end
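When I execute this code I keep getting false, even though some of the files inside the folders/subfolders contain those words.
new_array_of_words is an array holding a single regex, and include? tests each line for equality with that regex object, which is never true for a string (and it doesn't make much sense to iterate over a one-element array anyway). Note also that File.readlines keeps the trailing newline on every word, so those newlines end up inside the regex.
You can keep using the regex, but use regex methods instead.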
You can also fix your current code as follows:
arr = File.readlines('/home/afifit/Tests/word.txt').map(&:strip)
Dir['yourdir/*'].select { |f| File.file?(f) }.each do |filepath|
  arr.each do |word|
    # true if any line of the file contains the word as a substring
    puts File.foreach(filepath).any? { |line| line.include?(word) }
  end
end
I also used strip to remove any unnecessary whitespace and newlines.
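If you would rather keep the regex, something along these lines might work. It is only a sketch: the helper names word_pattern and file_mentions? are invented here, and it assumes the words file has one word per line.

```ruby
# Build one regex from the word list; strip each entry so the trailing
# newline from readlines does not become part of the pattern.
def word_pattern(words)
  cleaned = words.map(&:strip).reject(&:empty?)
  /\b(?:#{Regexp.union(cleaned).source})\b/
end

# True if any line of the file at `path` matches `pattern`.
def file_mentions?(path, pattern)
  File.foreach(path).any? { |line| line.match?(pattern) }
end

pattern = word_pattern(["foo\n", "bar\n"])
puts pattern.match?('a foo here')   # prints true
puts pattern.match?('food')         # prints false (whole words only)
```

With that in place, the loop over Dir['source_test/**/*'] only needs to call file_mentions?(filepath, pattern) once per file.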

How do I lookup a key/symbol based on which Regex match?

I am extracting files from a zip archive in Ruby using RubyZip, and I need to label files based on characteristics of their filenames:
Example:
I have the following hash:
labels = {
:data_file=>/.\.dat/i,
:metadata=>/.\.xml/i,
:text_location=>/.\.txt/i
}
So, I have the file name of each file in the zip, let's say an example is
filename = "382582941917841df.xml"
Assume that each file will match only one Regex in the labels hash; if it matches more than one, it doesn't matter, just choose the first match. (In this case the regular expressions all detect extensions, but they could detect any filename mask, like DSC****.jpg for example.)
I am doing this now:
label_match = labels.find {|key,value| filename =~ value}
# ---> label_match = [:metadata, /.\.xml/i]
label_sym = label_match.nil? ? nil : label_match.first
So this works fine, but it doesn't seem very Ruby-like. Is there something I am missing to clean this up nicely?
A case when does this effortlessly:
filename = "382582941917841df.xml"
category = case filename
when /.\.dat/i ; :data_file
when /.\.xml/i ; :metadata
when /.\.txt/i ; :text_location
end
p category # => :metadata ; nil if nothing matched
I think you're doing it backwards and the hard way. Ruby makes it easy to get the extension of a file, which then makes it easy to map it to something.
Starting with something like:
FILENAMES = %w[ foo.bar foo.baz 382582941917841df.xml DSC****.jpg]
FILETYPES = {
'.bar' => 'bar',
'.baz' => 'baz',
'.xml' => 'metadata',
'.dat' => 'data',
'.jpg' => 'image'
}
FILENAMES.each do |fn|
puts "#{ fn } is a #{ FILETYPES[File.extname(fn)] } file"
end
# >> foo.bar is a bar file
# >> foo.baz is a baz file
# >> 382582941917841df.xml is a metadata file
# >> DSC****.jpg is a image file
File.extname is built into Ruby. The File class contains many similar methods for finding out what the OS knows about a file and for tearing apart file paths and file names, so it is well worth becoming familiar with.
It's also important to understand that an improperly written regexp, such as /.\.dat/i, can be the source of a lot of pain. Consider these:
'foo.xml.dat'[/.\.dat/] # => "l.dat"
'foo.database.20010101.csv'[/.\.dat/] # => "o.dat"
Are the files really "data" files?
Why is the character in front of the delimiting . important or necessary?
Do you really want to slow your code with unanchored regexp patterns when a method such as extname will be faster and need less maintenance?
Those are things to consider when writing code.
Rather than using nil to indicate the label when there is no match, consider using another symbol like :unknown.
Then you can do:
labels = {
:data_file=>/.\.dat/i,
:metadata=>/.\.xml/i,
:text_location=>/.\.txt/i,
:unknown=>/.*/
}
label = labels.find {|key,value| filename =~ value}.first
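Pulling the suggestions in this thread together, here is one possible sketch: the patterns are anchored to the end of the name with \z, and the :unknown fallback lives in the lookup rather than in a catch-all regex. The constant LABELS and the method label_for are illustrative names, not anything from RubyZip.

```ruby
# Patterns anchored to the end of the name (\z), so "foo.database.csv"
# is not mistaken for a data file.
LABELS = {
  data_file:     /\.dat\z/i,
  metadata:      /\.xml\z/i,
  text_location: /\.txt\z/i
}.freeze

# Returns the first matching label, or :unknown when nothing matches.
def label_for(filename, labels = LABELS)
  match = labels.find { |_label, pattern| filename.match?(pattern) }
  match ? match.first : :unknown
end

p label_for('382582941917841df.xml')      # => :metadata
p label_for('foo.database.20010101.csv')  # => :unknown
```

Keeping the fallback outside the hash means the result does not depend on the catch-all entry being listed last.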

In Ruby, how would I find all .csv files in a folder and print out path to the files that contain the word "meh"?

Title pretty much says it all, looking to search for all .csv files and puts out a list of all files with the word meh in the name. Assume there are a few.
EDIT:
This method is significantly more direct and efficient:
d = Dir.new('.')
d.entries.select do |e|
/^.+\.csv$/.match(e) && IO.readlines(e).grep(/meh/).length > 0
end
This should do it, assuming you want to search the current directory:
d = Dir.new('.')
# This will find all files whose path ends in .csv
csvs = d.entries.select {|e| /^.+\.csv$/.match(e)}
# This will find all .csv files that contain one or more instances
# of the pattern /meh/
mehs = csvs.select do |e|
  # File.foreach closes the file when it finishes, unlike a bare File.open
  File.foreach(e).any? { |line| line =~ /meh/ }
end
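Both versions above only look at one directory. If the .csv files may sit in subfolders, a recursive variant could look like the sketch below. It assumes a plain substring search is enough, and csvs_containing is a made-up helper name.

```ruby
# Hypothetical helper: paths of all .csv files under `root` (recursively)
# that have at least one line containing `word`.
def csvs_containing(root, word)
  Dir.glob(File.join(root, '**', '*.csv')).select do |path|
    # any? stops reading at the first matching line
    File.file?(path) && File.foreach(path).any? { |line| line.include?(word) }
  end
end

puts csvs_containing('.', 'meh')
```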

How do I find files in a directory that do not have extensions?

I am looking for code that will find files without extensions. In Rails, there is a file app_name/doc/README_FOR_APP. I am searching for a way to find files similar to this, with no extension associated with the file, e.g. 'gemfile'. Something like:
file = File.join(directory_path, "**", "__something__")
Since your question didn't explicitly specify whether you want to search for files without extensions recursively (though in the comments it sounded like you might), or whether you would like to keep files with a leading dot (i.e. hidden files in unix), I'm including options for each scenario.
Visible Files (non-recursive)
Dir['*'].reject { |file| file.include?('.') }
will return an array of all files that do not contain a '.' and therefore only files that do not have extensions.
Hidden Files (non-recursive)
Dir.new('.').entries.reject { |file| %w(. ..).include?(file) or file[1..-1].include?('.') }
This finds all of the files in the current directory and then removes any files with a '.' in any character except the first (i.e. any character from index 1 to the end, a.k.a index -1). Also note that since Dir.new('.').entries contains '.' and '..' those are rejected as well.
Visible Files (recursive)
require 'find'
Find.find('.').reject { |file| File.basename(file).include?('.') }.map { |file| file[2..-1] }
The map on the end of this one is just to remain consistent with the others by removing the leading './'. If you don't care about that, you can remove it.
Hidden Files (recursive)
require 'find'
Find.find('.').reject { |file| file == '.' || File.basename(file)[1..-1].include?('.') }.map { |file| file[2..-1] }
Note: each of the above will also include directories (which are sometimes considered files too, well, in unix at least). To remove them, just add .select { |file| File.file?(file) } to the end of any one of the above.
Dir.glob(File.join(directory_path, "**", "*")).reject do |path|
File.directory?(path) || File.basename(path).include?('.')
end
Update: If you want to take a stricter definition of "extension", here's something a little more complex that considers a file name to have an extension if and only if it has exactly one dot and that dot is neither the first nor last character in the name:
Dir.glob(File.join(directory_path, "**", "*")).reject do |path|
name = File.basename(path)
File.directory?(path) || (name.count('.') == 1 && name[0] != '.' && name[-1] != '.')
end
I suspect "not having a dot" is more what you were looking for, however.
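Another option is to let File.extname decide what counts as an extension. The sketch below does that. The helper name extensionless_files and its include_hidden option are invented for illustration, and under this definition dotfiles such as .bashrc count as having no extension.

```ruby
# Hypothetical helper: regular files under `root` whose name has no
# extension according to File.extname.
def extensionless_files(root, include_hidden: false)
  # Without FNM_DOTMATCH, glob skips names that start with a dot.
  flags = include_hidden ? File::FNM_DOTMATCH : 0
  Dir.glob(File.join(root, '**', '*'), flags).select do |path|
    File.file?(path) && File.extname(path).empty?
  end
end

puts extensionless_files('.')
```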
nonfile = File.join("**", "*.")
Dir.glob(nonfile).each do |path|
puts path
end
I was messing around and talking to a colleague, and we thought of this.
Wouldn't that do the trick?

One-liner to recursively list directories in Ruby?

What is the fastest, most optimized, one-liner way to get an array of the directories (excluding files) in Ruby?
How about including files?
Dir.glob("**/*/") # for directories
Dir.glob("**/*") # for all files
Instead of Dir.glob(foo) you can also write Dir[foo] (however Dir.glob can also take a block, in which case it will yield each path instead of creating an array).
Ruby Glob Docs
I believe none of the solutions here deal with hidden directories (e.g. '.test'):
require 'find'
Find.find('.') { |e| puts e if File.directory?(e) }
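If you would rather stay with Dir.glob, passing File::FNM_DOTMATCH should make it consider hidden entries as well. A rough sketch, with all_directories as a made-up helper name:

```ruby
# Hypothetical helper: every directory under `root`, hidden ones included.
def all_directories(root)
  Dir.glob(File.join(root, '**', '*/'), File::FNM_DOTMATCH).reject do |dir|
    # Some Ruby versions also return "." entries when FNM_DOTMATCH is set.
    %w[. ..].include?(File.basename(dir))
  end
end

puts all_directories('.')
```

Each returned path keeps the trailing slash from the pattern.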
For a list of directories, try
Dir['**/']
A list of files is harder, because in Unix a directory is also a file, so you need to test the type, or remove from the returned list every entry that is a parent of other entries.
Dir['**/*'].reject {|fn| File.directory?(fn) }
And for a list of all files and directories, simply
Dir['**/*']
Dir['**/*']
As noted in other answers here, you can use Dir.glob. Keep in mind that folders can have lots of strange characters in them, and glob arguments are patterns, so some characters have special meanings. As such, it's unsafe to do something like the following:
Dir.glob("#{folder}/**/*")
Instead do:
Dir.chdir(folder) { Dir.glob("**/*").map {|path| File.expand_path(path) } }
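On Ruby 2.5 or later, Dir.glob also accepts a base: keyword, which keeps the folder name out of the pattern without changing the working directory. A small sketch, where entries_under is an illustrative name:

```ruby
# List everything under `folder` without interpolating it into the pattern.
# Paths come back relative to `folder`, so join them to get usable paths.
def entries_under(folder)
  Dir.glob('**/*', base: folder).map { |rel| File.join(folder, rel) }
end

puts entries_under('.')
```

Because the folder is never parsed as a pattern, a name such as we[ird] is treated literally.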
Fast one liner
Only directories
`find . -type d`.split("\n")
Directories and normal files
`find . -type d -or -type f`.split("\n")
Pure beautiful ruby
require "pathname"
def rec_path(path, file = false)
  puts path
  path.children.collect do |child|
    if file and child.file?
      child
    elsif child.directory?
      rec_path(child, file) + [child]
    end
  end.select { |x| x }.flatten(1)
end
# only directories
rec_path(Pathname.new(dir), false)
# directories and normal files
rec_path(Pathname.new(dir), true)
In PHP or other languages to get the content of a directory and all its subdirectories, you have to write some lines of code, but in Ruby it takes 2 lines:
require 'find'
Find.find('./') do |f| p f end
This will print the contents of the current directory and all its subdirectories.
Or, shorter, you can use the '**' notation (note that the *.* pattern only matches names containing a dot):
p Dir['**/*.*']
How many lines will you write in PHP or in Java to get the same result?
Here's an example that combines dynamic discovery of a Rails project directory with Dir.glob:
dir = Dir.glob(Rails.root.join('app', 'assets', 'stylesheets', '*'))
Dir.open(Dir.pwd).map { |h| (File.file?(h) ? "#{h} - file" : "#{h} - folder") if h[0] != '.' }
Entries starting with a dot map to nil; append .compact to drop them.
Although not a one-line solution, I think this is the best way to do it using Ruby calls.
First, delete all the files recursively.
Second, delete all the now-empty directories, deepest first, so that each one is empty when it is removed.
Dir.glob("./logs/**/*").each { |file| File.delete(file) if File.file? file }
Dir.glob("./logs/**/*/").sort.reverse.each { |directory| Dir.delete(directory) }
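If the goal is simply to empty a directory, the standard library's FileUtils can do the recursion for you. A minimal sketch, assuming you want to keep the top-level directory itself; empty_directory is a made-up helper name, and the demonstration runs against a temporary directory rather than a real logs folder:

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical helper: remove everything inside `dir` but keep `dir` itself.
def empty_directory(dir)
  Dir.children(dir).each do |name|
    FileUtils.rm_rf(File.join(dir, name))
  end
end

# Demonstration on a throwaway directory.
Dir.mktmpdir do |tmp|
  FileUtils.mkdir_p(File.join(tmp, 'a', 'b'))
  FileUtils.touch(File.join(tmp, 'a', 'b', 'old.log'))
  empty_directory(tmp)
  p Dir.children(tmp)  # => []
end
```

Dir.children needs Ruby 2.5 or later; it lists hidden entries too, so those are removed as well.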
