In Ruby, how can I interpret (expand) a glob relative to a directory? - ruby

Wider context: Case-insensitive filename on case sensitive file system
Given the path of a directory (as a string, which might be relative to the current working dir or absolute), I'd like to open a specific file. I know the file's name except for its case. (It could be TASKDATA.XML, TaskData.xml or even tAsKdAtA.xMl.)
Inspired by the accepted answer to Open a file case-insensitively in Ruby under Linux, I've come up with this little module to produce a glob for matching the file's name:
module Utils
  def self.case_insensitive_glob_string(string)
    string.each_char.map do |c|
      cased = c.upcase != c.downcase
      cased ? "[#{c.upcase}#{c.downcase}]" : c
    end.join
  end
end
For my specific case, I'd call this with
Utils.case_insensitive_glob_string('taskdata.xml')
and would get
'[Tt][Aa][Ss][Kk][Dd][Aa][Tt][Aa].[Xx][Mm][Ll]'
Specific context: glob relative to a dir ≠ pwd
Now I have to expand the glob, i.e. match it against actual files in the given directory. Unfortunately, Dir.glob(...) doesn't seem to have an argument for passing a directory (or its path) relative to which the glob should be expanded. Intuitively, it would make sense to me to create a Dir object and have that handle the glob:
d = Dir.new(directory_path)
# => #<Dir:/the/directory>
filename = d.glob(Utils.case_insensitive_glob_string('taskdata.xml')).first() # I wish ...
# NoMethodError: undefined method `glob' for #<Dir:/the/directory>
... but glob only exists as a class method, not as an instance method. (Anybody know why that's true of so many of Dir's methods that would perfectly make sense relative to a specific directory?)
So it looks like I have two options:
Change the current working dir to the given directory
or
expand the filename's glob in combination with the directory path
The first option is easy: Use Dir.chdir. But because this is in a Gem, and I don't want to mess with the environment of the users of my Gem, I shy away from it. (It's probably somewhat better when used with the block synopsis than manually (or not) resetting the working dir when I'm done.)
The second option looks easy. Simply do
taskdata_xml_name_glob = Utils.case_insensitive_glob_string('taskdata.xml')
taskdata_xml_path_glob = File.join(directory_path, taskdata_xml_name_glob)
filename = Dir.glob(taskdata_xml_path_glob).first()
, right? Almost. When directory_path contains characters that have a special meaning in globs, they will wrongly be expanded, when I only want glob expansion on the filename. This is unlikely, but as the path is provided by the Gem user, I have to account for it, anyway.
Question
Should I escape directory_path before File.joining it with the filename glob? If so, is there a facility to do that or would I have to code the escaping function myself?
Or should I use a different approach (be it chdir, or something yet different)?

If I were implementing that behaviour, I would go with filtering an array, returned by Dir#entries:
Dir.entries("#{target}").select { |f| f =~ /\A#{filename}\z/i }
Please be aware that on Unix platforms both the . and .. entries will be listed as well, but they are unlikely to match in the second step. Also, the filename should probably be escaped with Regexp.escape:
Dir.entries("#{target}").select { |f| f =~ /\A#{Regexp.escape(filename)}\z/i }
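On Ruby 2.5 and later, Dir.glob also accepts a base: keyword, which may make escaping the directory path unnecessary. A minimal sketch, assuming that Ruby version; find_case_insensitive is a hypothetical helper name, and the filename itself is assumed to contain no glob metacharacters:

```ruby
# Hypothetical helper; name and signature are illustrative only.
# Assumes Ruby 2.5+ (for the `base:` keyword of Dir.glob).
def find_case_insensitive(directory_path, filename)
  # Build a glob such as "[Tt][Aa][Ss]..." from the plain filename.
  pattern = filename.each_char.map do |c|
    c.upcase != c.downcase ? "[#{c.upcase}#{c.downcase}]" : c
  end.join

  # Only `pattern` is treated as a glob; `base:` names the directory to search.
  match = Dir.glob(pattern, base: directory_path).first

  # Dir.glob returns names relative to `base`, so re-attach the directory.
  match && File.join(directory_path, match)
end
```

Called as find_case_insensitive(directory_path, 'taskdata.xml'), this should return the path of the first matching file, or nil if there is none.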

Related

How to get filepaths that match a glob without having them on the filesystem

I have a list of filepaths relative to a root directory, and am trying to determine which would be matched by a glob pattern. I'm trying to get the same results that I would get if all the files were on my filesystem and I ran Dir.glob(<my_glob_pattern>) from the root directory.
If this is the list of filepaths:
foo/index.md
foo/bar/index.md
foo/bar/baz/index.md
foo/bar/baz/qux/index.md
and this is the glob pattern:
foo/bar/*.md
If the files existed on my filesystem, Dir.glob('foo/bar/*.md') would return only foo/bar/index.md.
The glob docs mention fnmatch, and I tried using it but found that the pattern foo/bar/*.md was matching .md files in any number of nested subdirectories, similar to what Dir.glob('foo/bar/**/*.md') would, not just the direct children of the foo/bar directory:
my_glob = 'foo/bar/*.md'
filepaths = [
'foo/index.md',
'foo/bar/index.md',
'foo/bar/baz/index.md',
'foo/bar/baz/qux/index.md',
]
# Using the provided filepaths
filepaths_that_match_pattern = filepaths.select{|path| File.fnmatch?(my_glob, path)}.sort
# If the filepaths actually existed on my filesystem
filepaths_found_by_glob = Dir.glob(my_glob).sort
raise Exception.new("They don't match!") unless filepaths_that_match_pattern == filepaths_found_by_glob
I [incorrectly] expected the above code to work, but filepaths_found_by_glob only contains the direct children, while filepaths_that_match_pattern contains all the nested children too.
How can I get the same results as Dir.glob without having the file paths on my filesystem?
You can pass the File::FNM_PATHNAME flag when calling File.fnmatch, so the call would look like this: File.fnmatch(pattern, path, File::FNM_PATHNAME)
You can see examples related to its usage here: https://apidock.com/ruby/File/fnmatch/class
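Applied to the list from the question, the flag can be sketched like this (the variable names matches and loose are only illustrative):

```ruby
my_glob = 'foo/bar/*.md'
filepaths = %w[
  foo/index.md
  foo/bar/index.md
  foo/bar/baz/index.md
  foo/bar/baz/qux/index.md
]

# With FNM_PATHNAME, '*' does not match '/', so only direct children match.
matches = filepaths.select { |path| File.fnmatch?(my_glob, path, File::FNM_PATHNAME) }

# Without the flag, '*' also matches '/', so nested paths match as well.
loose = filepaths.select { |path| File.fnmatch?(my_glob, path) }

puts matches.inspect
puts loose.inspect
```

Here matches contains only foo/bar/index.md, while loose also contains the two nested paths.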
Don't use File.fnmatch, instead use Pathname.fnmatch:
require 'pathname'
PATTERN = 'foo/bar/*.md'
%w[
  foo/index.md
  foo/bar/index.md
  foo/bar/baz/index.md
  foo/bar/baz/qux/index.md
].each do |p|
  puts 'path: %-24s %s' % [
    p,
    Pathname.new(p).fnmatch(PATTERN) ? 'matches' : 'does not match'
  ]
end
# >> path: foo/index.md             does not match
# >> path: foo/bar/index.md         matches
# >> path: foo/bar/baz/index.md     matches
# >> path: foo/bar/baz/qux/index.md matches
File assumes the existence of files or paths on the drive whereas Pathname:
Pathname represents the name of a file or directory on the filesystem, but not the file itself.
Also, regarding Dir.glob: be careful when using it. It immediately attempts to find every file or path on the drive that matches and returns the hits. On a big or slow drive, or with a poorly written pattern (for example while debugging or testing), your code can be tied up for a long time, or Ruby or the machine it is running on can slow to a crawl. It only gets worse if you're checking a shared or remote drive. As an example of what can happen, try the following at your command line, but be prepared to hit Ctrl+C to regain control:
ls /**/*
Instead, I recommend using the Find module in the Standard Library, as it iterates over the entries one at a time. See its documentation for examples.
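As a rough sketch of that approach (each_matching_file is a hypothetical helper name, and matching on the basename with File.fnmatch? is just one possible choice):

```ruby
require 'find'

# Hypothetical helper: walk `root` one entry at a time and yield every regular
# file whose basename matches `name_pattern` (an fnmatch-style pattern).
def each_matching_file(root, name_pattern)
  # Without a block, return an Enumerator so callers can stop early.
  return enum_for(:each_matching_file, root, name_pattern) unless block_given?

  Find.find(root) do |path|
    next unless File.file?(path)
    yield path if File.fnmatch?(name_pattern, File.basename(path))
  end
end
```

Because the walk is incremental, something like each_matching_file('.', '*.md').first(5) should stop traversing once five hits have been found, rather than collecting every match first.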

Pathname with Regex

I would like to know how to use Regex when instantiating a new Pathname.
I am instantiating a Pathname and passing it to FileUtils#rm_rf to delete the file. The problem I am trying to solve is to remove files that have a certain name without regard to extension:
See this contrived example:
target = Pathname.new("#{@app_name}/#{@file_name}")
FileUtils.rm_rf(target)
@file_name does not include extensions such as .rb or .html.erb, but I would like to match all files whose name equals @file_name, no matter what extension they use.
My initial approach was to use Regex. But how can I use it, or any other suggestions?
You can use Dir.glob like this:
Dir.glob("#{@app_name}/#{@file_name}.*").each { |f| File.delete(f) }
See more on that at http://ruby-doc.org/core-2.2.1/Dir.html#method-c-glob
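Wrapped into a small method, the idea might look like the following sketch. remove_by_basename is a hypothetical name, and the sketch assumes app_dir itself contains no glob metacharacters, since the whole joined string is treated as a pattern:

```ruby
require 'fileutils'

# Hypothetical helper: delete every file in `app_dir` named `file_name`
# plus any extension (e.g. "foo.rb" and "foo.html.erb" for "foo").
def remove_by_basename(app_dir, file_name)
  # "foo.*" requires a literal "foo." prefix, so "foobar.rb" is left alone.
  targets = Dir.glob(File.join(app_dir, "#{file_name}.*"))
  FileUtils.rm_f(targets)
  targets # return what was removed, which is handy for logging
end
```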

Ignoring hidden files with regular expressions in Ruby

I'm trying to read every file in a specified directory. I'd like to ignore hidden files. I've found a way to do this, but I'm pretty sure it is the most inefficient way to do this.
This is what I've tried,
Find.find(directory) do |path|
file_paths << path if path =~ /.*\./ and !path.split("/")[-1].to_s.starts_with?(".")
end
This works. But I hate it.
I then tried to do this,
file_paths << path if path =~ /.*\./ and path =~ /^\./
But this returned nothing for me. What am I doing wrong here?
You could just use Dir
file_paths = Dir.glob("#{directory}/*")
Dir.glob docs:
Returns the filenames found by expanding pattern which is an Array of the patterns or the pattern String, either as an array or as parameters to the block.
Note, this will not match Unix-like hidden files (dotfiles). In order to include those in the match results, you must use something like “{,.}”.
Per @arco444, if you want this to search recursively:
file_paths = Dir.glob("#{directory}/**/*")
If you wanted to ignore files starting with ., the below would append those that don't to the file_paths array
Find.find(directory) do |path|
  if File.file?(path)
    file_paths << path unless File.basename(path).start_with?(".")
  end
end
Note that this will not necessarily ignore hidden files, for the reasons mentioned in the comments. It also currently includes "hidden" directories, i.e. a file such as /some/.hidden/directory/normal.file would be included in the list.
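One way to also skip the contents of "hidden" directories is Find.prune. A minimal sketch, assuming that "hidden" simply means "name starts with a dot"; visible_files is a hypothetical helper name:

```ruby
require 'find'

# Hypothetical helper: collect regular files under `directory`, skipping
# dotfiles and everything beneath dot-directories.
def visible_files(directory)
  file_paths = []
  Find.find(directory) do |path|
    # Never prune the starting directory itself, even if it is "." or ".foo".
    if path != directory && File.basename(path).start_with?('.')
      # Skips this entry and, for a directory, everything below it.
      Find.prune
    end
    file_paths << path if File.file?(path)
  end
  file_paths
end
```

With this, a file such as /some/.hidden/directory/normal.file should no longer be included when searching /some.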

Relative paths from inside a file: `../` --> parent directory vs `./` --> file itself?

Suppose file_a and file_b live in the same directory, and file_a contains a require statement that requires file_b. The way to do this seems to be like so:
require File.expand_path('../file_b', __FILE__)
But I kind of expected it to look like this instead:
require File.expand_path('./file_b', __FILE__)
I played around with it and sure enough, the ./ version doesn't work and the ../ version does. The ./ version returns a path like path/to/file_a/file_b. Is the concept that code inside file_a lives inside that file, much like file_a "lives" inside its parent directory? I feel like I just answered my own question, but want to make sure I'm understanding this right.
File.expand_path(file_name [, dir_string] ) -> abs_file_name
Converts a pathname to an absolute pathname. Relative paths are referenced from the current working directory of the process unless dir_string is given, in which case it will be used as the starting point.
File.expand_path treats the second (optional) parameter as dir_string; it doesn't care whether it is actually a directory or not. So it's your job to make sure the second parameter you pass in is a path to a directory.
If you want to preserve the ./ prefix, you can change the second parameter passed in:
require File.expand_path('./file_b', File.dirname(__FILE__))
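The difference can be seen without requiring anything. In this sketch the path /path/to/file_a.rb is a made-up stand-in for __FILE__, and POSIX-style paths are assumed:

```ruby
# Made-up stand-in for __FILE__ as seen from inside file_a.rb.
file_a = '/path/to/file_a.rb'

# The second argument is treated as a directory, even though it names a file.
from_file_dot    = File.expand_path('./file_b', file_a)
from_file_dotdot = File.expand_path('../file_b', file_a)

# Passing the real directory makes the './' form resolve as expected.
from_dir_dot = File.expand_path('./file_b', File.dirname(file_a))

puts from_file_dot    # /path/to/file_a.rb/file_b
puts from_file_dotdot # /path/to/file_b
puts from_dir_dot     # /path/to/file_b
```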

If I have the name of a file, how do I search a folder for a file that contains that filename?

I have an image with the filename media_httpfarm3static_mAyIi.jpg.
I would like to search the parent folder and all subfolders of that parent folder for a file that contains that name - it doesn't have to be the EXACT name, but must contain that string.
E.g. this file should be returned: 11605730-media_httpfarm3static_mAyIi.jpg
So this is a 2-part question:
How do I achieve the above?
Once I have the file, how do I return the path for that file?
Use Dir::[] and File::absolute_path:
partial_name = "media_httpfarm3static_mAyIi.jpg"
Dir["../**/*#{partial_name}"].each do |filename|
puts File.absolute_path(filename)
end
This uses the glob "../**/*media_httpfarm3static_mAyIi.jpg": go up one directory, then search all subdirectories recursively for any file whose name ends in the partial string "media_httpfarm3static_mAyIi.jpg". The relative paths are then returned in an Array.
You can use Array#each, Array#map, etc. to convert this into what you need. To convert a relative path into an absolute path, just pass it to File::absolute_path.
Once you have the absolute path, you can use it to open the file, read the file, etc.
On File Paths
The glob "../**/*media_httpfarm3static_mAyIi.jpg" is relative to the current working directory. Normally, this is the directory from which the program was run, not the directory of the source file. It can also change while the program runs, for example via Dir.chdir.
To always use a glob relative to the source code file, try:
Dir[File.expand_path("../**/*#{partial_name}", __FILE__)]
You can also use:
Dir[File.join(__dir__, "..", "**", "*#{partial_name}")]
Note: __dir__ was added in Ruby 2.0. For older versions of ruby use File.dirname(__FILE__)
In the first code sample File::absolute_path was used. In the last sample File::expand_path is used. In most situations these can be used interchangeably. There is a minor difference, per the documentation:
File::absolute_path
Converts a pathname to an absolute pathname. Relative paths are
referenced from the current working directory of the process unless
dir_string is given, in which case it will be used as the starting
point. If the given pathname starts with a “~” it is NOT expanded, it
is treated as a normal directory name.
File::expand_path
Converts a pathname to an absolute pathname. Relative paths are
referenced from the current working directory of the process unless
dir_string is given, in which case it will be used as the starting
point. The given pathname may start with a “~”, which expands to the
process owner’s home directory (the environment variable HOME must be
set correctly). “~user” expands to the named user’s home directory.
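The "~" difference can be checked with a short sketch. tilde_comparison is a hypothetical helper name, and the example assumes a POSIX system with HOME set:

```ruby
# Hypothetical helper: resolve the same path with both methods.
def tilde_comparison(path, base)
  {
    # "~" is kept as an ordinary directory name under `base`.
    absolute_path: File.absolute_path(path, base),
    # "~" is replaced by the home directory, and `base` is ignored.
    expand_path: File.expand_path(path, base)
  }
end

result = tilde_comparison('~/notes', '/tmp')
puts result[:absolute_path]
puts result[:expand_path]
```

With HOME set to, say, /home/demo, the first line printed would be /tmp/~/notes and the second /home/demo/notes.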
