Open a file case-insensitively in Ruby under Linux - ruby

Is there a way to open a file case-insensitively in Ruby under Linux? For example, given the string foo.txt, can I open the file FOO.txt?
One possible way would be reading all the filenames in the directory and manually search the list for the required file, but I'm looking for a more direct method.

One approach would be to write a little method to build a case insensitive glob for a given filename:
def ci_glob(filename)
glob = ''
filename.each_char do |c|
glob += c.downcase != c.upcase ? "[#{c.downcase}#{c.upcase}]" : c
end
glob
end
irb(main):024:0> ci_glob('foo.txt')
=> "[fF][oO][oO].[tT][xX][tT]"
and then you can do:
filename = Dir.glob(ci_glob('foo.txt')).first
Alternatively, you can write the directory search you suggested quite concisely. e.g.
filename = Dir.glob('*').find { |f| f.downcase == 'foo.txt' }
Prior to Ruby 3.1 it was possible to use the FNM_CASEFOLD option to make glob case insensitive e.g.
filename = Dir.glob('foo.txt', File::FNM_CASEFOLD).first
if filename
# use filename here
else
# no matching file
end
The documentation suggested FNM_CASEFOLD couldn't be used with glob but it did actually work in older Ruby versions. However, as mentioned by lildude in the comments, the behaviour has now been brought inline with the documentation and so this approach shouldn't be used.

You can use Dir.glob with the FNM_CASEFOLD flag to get a list of all filenames that match the given name except for case. You can then just use first on the resulting array to get any result back or use min_by to get the one that matches the case of the orignial most closely.
def find_file(f)
Dir.glob(f, File::FNM_CASEFOLD).min_by do |f2|
f.chars.zip(f2.chars).count {|c1,c2| c1 != c2}
end
end
system "touch foo.bar"
system "touch Foo.Bar"
Dir.glob("FOO.BAR", File::FNM_CASEFOLD) #=> ["foo.bar", "Foo.Bar"]
find_file("FOO.BAR") #=> ["Foo.Bar"]

Related

Ruby: Get filename without the extensions

How can I get the filename without the extensions? For example, input of "/dir1/dir2/test.html.erb" should return "test".
In actual code I will passing in __FILE__ instead of "/dir1/dir2/test.html.erb".
Read documentation:
basename(file_name [, suffix] ) → base_name
Returns the last component of the filename given in file_name, which
can be formed using both File::SEPARATOR and File::ALT_SEPARATOR as
the separator when File::ALT_SEPARATOR is not nil. If suffix is given
and present at the end of file_name, it is removed.
=> File.basename('public/500.html', '.html')
=> "500"
in you case:
=> File.basename("test.html.erb", ".html.erb")
=> "test"
How about this
File.basename(f, File.extname(f))
returns the file name without the extension.. works for filenames with multiple '.' in it.
In case you don't know the extension you can combine File.basename with File.extname:
filepath = "dir/dir/filename.extension"
File.basename(filepath, File.extname(filepath)) #=> "filename"
Pathname provides a convenient object-oriented interface for dealing with file names.
One method lets you replace the existing extension with a new one, and that method accepts the empty string as an argument:
>> Pathname('foo.bar').sub_ext ''
=> #<Pathname:foo>
>> Pathname('foo.bar.baz').sub_ext ''
=> #<Pathname:foo.bar>
>> Pathname('foo').sub_ext ''
=> #<Pathname:foo>
This is a convenient way to get the filename stripped of its extension, if there is one.
But if you want to get rid of all extensions, you can use a regex:
>> "foo.bar.baz".sub(/(?<=.)\..*/, '')
=> "foo"
Note that this only works on bare filenames, not paths like foo.bar/pepe.baz. For that, you might as well use a function:
def without_extensions(path)
p = Pathname(path)
p.parent / p.basename.sub(
/
(?<=.) # look-behind: ensure some character, e.g., for ‘.foo’
\. # literal ‘.’
.* # extensions
/x, '')
end
Split by dot and the first part is what you want.
filename = 'test.html.erb'
result = filename.split('.')[0]
Considering the premise, the most appropriate answer for this case (and similar cases with other extensions) would be something such as this:
__FILE__.split('.')[0...-1].join('.')
Which will only remove the extension (not the other parts of the name: myfile.html.erb here becomes myfile.html, rather than just myfile.
Thanks to #xdazz and #Monk_Code for their ideas. In case others are looking, the final code I'm using is:
File.basename(__FILE__, ".*").split('.')[0]
This generically allows you to remove the full path in the front and the extensions in the back of the file, giving only the name of the file without any dots or slashes.
name = "filename.100.jpg"
puts "#{name.split('.')[-1]}"
Yet understanding it's not a multiplatform solution, it'd work for unixes:
def without_extensions(path)
lastSlash = path.rindex('/')
if lastSlash.nil?
theFile = path
else
theFile = path[lastSlash+1..-1]
end
# not an easy thing to define
# what an extension is
theFile[0...theFile.index('.')]
end
puts without_extensions("test.html.erb")
puts without_extensions("/test.html.erb")
puts without_extensions("a.b/test.html.erb")
puts without_extensions("/a.b/test.html.erb")
puts without_extensions("c.d/a.b/test.html.erb")

Improve speed of the file search in Ruby

Given a directory with about 100 000 small files (each files is about 1kB).
I need to get list of these files and iterate over it in order to find files with the same name but different case (the files are on Linux ext4 FS).
Currently, I use some code like this:
def similar_files_in_folder(file_path, folder, exclude_folders = false)
files = Dir.glob(file_path, File::FNM_CASEFOLD)
files_set = files.select{|f| f.start_with?(folder)}
return files_set unless exclude_folders
files_set.reject{|entry| File.directory? entry}
end
dir_entries = Dir.entries(#directory) - ['.', '..']
dir_entries.map do |file_name|
similar_files_in_folder(file_name, #directory)
end
The issue with this approach is that the snippet takes a lot!!! of time to finish.
It is about some hours on my system.
Is there another way to achieve the same goal but much faster in Ruby?
Limitation: I can't load the file list in memory and then just compare the names in down case, because in the #directory new files are appear.
So, I need to scan the #directory on each iteration.
Thanks for any hint.
If I understand your code correctly, this already returns an array of all those 100k filenames:
dir_entries = Dir.entries(#directory) - ['.', '..']
#=> ["foo.txt", "bar.txt", "BAR.txt", ...]
I would group this array by the lowercase filename:
dir_entries.group_by(&:downcase)
#=> {"foo.txt"=>["foo.txt"], "bar.txt"=>["bar.txt", "BAR.txt"], ... }
And select the ones with more than 1 occurrences:
dir_entries.group_by(&:downcase).select { |k, v| v.size > 1 }
#=> {"bar.txt"=>["bar.txt", "BAR.txt"], ...}
What I meant by my comment was that you could search for a string as you traverse the filesystem, instead of first building up a huge array of all possible files and only then searching. I wrote something similar to a linux find <path> | grep --color -i <pattern> , except highlighting the pattern only in basename:
require 'find'
#find files whose basename matches a pattern (and output results to console)
def find_similar(s, opts={})
#by default, path is '.', case insensitive, no bash terminal coloring
opts[:verbose] ||= false
opts[:path] ||= '.'
opts[:insensitive]=true if opts[:insensitive].nil?
opts[:color]||=false
boldred = "\e[1m\e[31m\\1\e[0m" #contains an escaped \1 for regex
puts "searching for \"#{s}\" in \"#{opts[:path]}\", insensitive=#{opts[:insensitive]}..." if opts[:verbose]
reg = opts[:insensitive] ? /(#{s})/i : /(#{s})/
dir,base = '',''
Find.find(opts[:path]) {|path|
dir,base = File.dirname(path), File.basename(path)
if base =~ reg
if opts[:color]
puts "#{dir}/#{base.gsub(reg, boldred)}"
else
puts path
end
end
}
end
time = Time.now
#find_similar('LOg', :color=>true) #similar to find . | grep --color -i LOg
find_similar('pYt', :path=>'c:/bin/sublime3/', :color=>true, :verbose=>true)
puts "search took #{Time.now-time}sec"
example output (cygwin), but also works if run from cmd.exe

Is there a nice way to switch a file extension in ruby?

I'd like to switch the extension of a file. For example:
test_dir/test_file.jpg to .txt should give test_dir/test_file.txt.
I also want the solution to work on a file with two extensions.
test_dir/test_file.ext1.jpg to .txt should should give test_dir/test_file.ext1.txt
Similarly, on a file with no extension it should just add the extension.
test_dir/test_file to .txt should give test_dir/test_file.txt
I feel like this should be simple, but I haven't found a simple solution. Here is what I have right now. I think it is really ugly, but it does seem to work.
def switch_ext(f, new_ext)
File.join(File.dirname(f), File.basename(f, File.extname(f))) + new_ext
end
Do you have any more elegant ways to do this? I've looked on the internet, but I'm guessing that I'm missing something obvious. Are there any gotcha's to be aware of? I prefer a solution that doesn't use a regular expression.
Your example method isn't that ugly. Please do continue to use file naming semantic aware methods over string regexp. You could try the Pathname stdlib which might make it a little cleaner:
require 'pathname'
def switch_ext(f, new_ext)
p = Pathname.new f
p.dirname + "#{ p.basename('.*') }#{ new_ext }"
end
>> puts %w{ test_dir/test_file.jpg test_dir/test_file.ext1.jpg testfile .vimrc }.
| map{|f| switch_ext f, '.txt' }
test_dir/test_file.txt
test_dir/test_file.ext1.txt
testfile.txt
.vimrc.txt
def switch_ext(f, new_ext)
(n = f.rindex('.')) == 0 ? nil : (f[0..n] + new_ext)
end
It will find the most right occurrence of '.' if it is not the first character.
Regular expressions were invented for this sort of task.
def switch_ext f, new_ext
f.sub(/((?<!\A)\.[^.]+)?\Z/, new_ext)
end
puts switch_ext 'test_dir/test_file.jpg', '.txt'
puts switch_ext 'test_dir/test_file.ext1.jpg', '.txt'
puts switch_ext 'testfile', '.txt'
puts switch_ext '.vimrc', '.txt'
Output:
test_dir/test_file.txt
test_dir/test_file.ext1.txt
testfile.txt
.vimrc.txt
def switch_ext(filename, new_ext)
filename.chomp( File.extname(filename)) + new_ext
end
I just found this answer to my own question here at the bottom of this long discussion.
http://www.ruby-forum.com/topic/179524
I personally think it is the best one I've seen. I definitely want to avoid a regular expression, because they are hard for me to remember and therefore error prone.
For dotfiles this function just adds the extension onto the file. This behaviour seems sensible to me.
switch_ext('.vimrc', '.txt') # => ".vimrc.txt"
Please continue to post better answers if there are any, and post comments to let me know if you see any deficiencies in this answer. I'll leave the question open for now.
You can use regular expressions, or you can use things like the built-in filename manipulation tools in File:
%w[
test_dir/test_file.jpg
test_dir/test_file.ext1.jpg
test_dir/test_file
].each do |fn|
puts File.join(
File.dirname(fn),
File.basename(fn, File.extname(fn)) + '.txt'
)
end
Which outputs:
test_dir/test_file.txt
test_dir/test_file.ext1.txt
test_dir/test_file.txt
I personally use the File methods. They're aware of different OS's needs for filename separators so porting to another OS is a no brainer. In your use-case it's not a big deal. Mix in path manipulations and it becomes more important.
def switch_ext f, new_ext
"#{f.sub(/\.[^.]+\z/, "")}.#{new_ext}"
end
def switch_ext(filename, ext)
begin
filename[/\.\w+$/] = ext
rescue
filename << ext
end
filename
end
Usage
>> switch_ext('test_dir/test_file.jpeg', '.txt')
=> "test_dir/test_file.txt"
>> switch_ext('test_dir/no_ext_file', '.txt')
=> "test_dir/no_ext_file.txt"
Hope this help.
Since Ruby 1.9.1 the easiest answer is Pathname::sub_ext(replacement) which strips off the extension and replaces it with the given replacement (which can be an empty string ''):
Pathname.new('test_dir/test_file.jpg').sub_ext('.txt')
=> #<Pathname:test_dir/test_file.txt>
Pathname.new('test_dir/test_file.ext1.jpg').sub_ext('.txt')
=> #<Pathname:test_dir/test_file.ext1.txt>
Pathname.new('test_dir/test_file').sub_ext('.txt')
=> #<Pathname:test_dir/test_file.txt>
Pathname.new('test_dir/test_file.txt').sub_ext('')
=> #<Pathname:test_dir/test_file>
One thing to watch out for is that you need to have a leading . in the replacement:
Pathname.new('test_dir/test_file.jpg').sub_ext('txt')
=> #<Pathname:test_dir/test_filetxt>

One-liner to recursively list directories in Ruby?

What is the fastest, most optimized, one-liner way to get an array of the directories (excluding files) in Ruby?
How about including files?
Dir.glob("**/*/") # for directories
Dir.glob("**/*") # for all files
Instead of Dir.glob(foo) you can also write Dir[foo] (however Dir.glob can also take a block, in which case it will yield each path instead of creating an array).
Ruby Glob Docs
I believe none of the solutions here deal with hidden directories (e.g. '.test'):
require 'find'
Find.find('.') { |e| puts e if File.directory?(e) }
For list of directories try
Dir['**/']
List of files is harder, because in Unix directory is also a file, so you need to test for type or remove entries from returned list which is parent of other entries.
Dir['**/*'].reject {|fn| File.directory?(fn) }
And for list of all files and directories simply
Dir['**/*']
As noted in other answers here, you can use Dir.glob. Keep in mind that folders can have lots of strange characters in them, and glob arguments are patterns, so some characters have special meanings. As such, it's unsafe to do something like the following:
Dir.glob("#{folder}/**/*")
Instead do:
Dir.chdir(folder) { Dir.glob("**/*").map {|path| File.expand_path(path) } }
Fast one liner
Only directories
`find -type d`.split("\n")
Directories and normal files
`find -type d -or -type f`.split("\n")`
Pure beautiful ruby
require "pathname"
def rec_path(path, file= false)
puts path
path.children.collect do |child|
if file and child.file?
child
elsif child.directory?
rec_path(child, file) + [child]
end
end.select { |x| x }.flatten(1)
end
# only directories
rec_path(Pathname.new(dir), false)
# directories and normal files
rec_path(Pathname.new(dir), true)
In PHP or other languages to get the content of a directory and all its subdirectories, you have to write some lines of code, but in Ruby it takes 2 lines:
require 'find'
Find.find('./') do |f| p f end
this will print the content of the current directory and all its subdirectories.
Or shorter, You can use the ’**’ notation :
p Dir['**/*.*']
How many lines will you write in PHP or in Java to get the same result?
Here's an example that combines dynamic discovery of a Rails project directory with Dir.glob:
dir = Dir.glob(Rails.root.join('app', 'assets', 'stylesheets', '*'))
Dir.open(Dir.pwd).map { |h| (File.file?(h) ? "#{h} - file" : "#{h} - folder") if h[0] != '.' }
dots return nil, use compact
Although not a one line solution, I think this is the best way to do it using ruby calls.
First delete all the files recursively
Second delete all the empty directories
Dir.glob("./logs/**/*").each { |file| File.delete(file) if File.file? file }
Dir.glob("./logs/**/*/").each { |directory| Dir.delete(directory) }

Cut off the filename and extension of a given string

I build a little script that parses a directory for files of a given filetype and stores the location (including the filename) in an array. This look like this:
def getFiles(directory)
arr = Dir[directory + '/**/*.plt']
arr.each do |k|
puts "#{k}"
end
end
The output is the path and the files. But I want only the path.
Instead of /foo/bar.txt I want only the /foo/
My first thought was a regexp but I am not sure how to do that.
Could File.dirname be of any use?
File.dirname(file_name ) → dir_name
Returns all components of the filename
given in file_name except the last
one. The filename must be formed using
forward slashes (``/’’) regardless of
the separator used on the local file
system.
File.dirname("/home/gumby/work/ruby.rb") #=> "/home/gumby/work"
You don't need a regex or split.
File.dirname("/foo/bar/baz.txt")
# => "/foo/bar"
The following code should work (tested in the ruby console):
>> path = "/foo/bar/file.txt"
=> "/foo/bar/file.txt"
>> path[0..path.rindex('/')]
=> "/foo/bar/"
rindex finds the index of the last occurrence of substring. Here is the documentation http://docs.huihoo.com/api/ruby/core/1.8.4/classes/String.html#M001461
Good luck!
I would split it into an array by the slashes, then remove the last element (the filename), then join it into a string again.
path = '/foo/bar.txt'
path = path.split '/'
path.pop
path = path.join '/'
# path is now '/foo'
not sure what language your in but here is the regex for the last / to the end of the string.
/[^\/]*+$/
Transliterates to all characters that are not '/' before the end of the string
For a regular expression, this should work, since * is greedy:
.*/

Resources