Is there a nice way to switch a file extension in ruby? - ruby

I'd like to switch the extension of a file. For example:
test_dir/test_file.jpg to .txt should give test_dir/test_file.txt.
I also want the solution to work on a file with two extensions.
test_dir/test_file.ext1.jpg to .txt should give test_dir/test_file.ext1.txt
Similarly, on a file with no extension it should just add the extension.
test_dir/test_file to .txt should give test_dir/test_file.txt
I feel like this should be simple, but I haven't found a simple solution. Here is what I have right now. I think it is really ugly, but it does seem to work.
def switch_ext(f, new_ext)
File.join(File.dirname(f), File.basename(f, File.extname(f))) + new_ext
end
Do you have any more elegant ways to do this? I've looked on the internet, but I'm guessing that I'm missing something obvious. Are there any gotchas to be aware of? I prefer a solution that doesn't use a regular expression.

Your example method isn't that ugly. Please do continue to prefer methods that understand file naming semantics over string regexps. You could try the Pathname stdlib, which might make it a little cleaner:
require 'pathname'
def switch_ext(f, new_ext)
p = Pathname.new f
p.dirname + "#{ p.basename('.*') }#{ new_ext }"
end
>> puts %w{ test_dir/test_file.jpg test_dir/test_file.ext1.jpg testfile .vimrc }.
| map{|f| switch_ext f, '.txt' }
test_dir/test_file.txt
test_dir/test_file.ext1.txt
testfile.txt
.vimrc.txt

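def switch_ext(f, new_ext)
n = f.rindex('.')
return nil if n == 0 # leading dot (dotfile): nothing to switch
(n ? f[0...n] : f) + new_ext # drop the old extension, dot included
end
It finds the rightmost occurrence of '.', unless that is the first character. With no '.' at all, the new extension is simply appended.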
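One gotcha with searching the raw string for the last '.': a dot inside a directory name is found too. A minimal sketch of the difference, where `naive_ext` and the path `archive.d/README` are hypothetical names made up for illustration:

```ruby
# Hypothetical helper: treats everything from the last '.' onward as the extension.
def naive_ext(path)
  i = path.rindex('.')
  i ? path[i..-1] : ''
end

path = 'archive.d/README' # the only dot is in the directory name

puts naive_ext(path).inspect    # ".d/README" (wrong: crosses the separator)
puts File.extname(path).inspect # "" (File.extname only looks at the last path component)
```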

Regular expressions were invented for this sort of task.
def switch_ext f, new_ext
f.sub(/((?<!\A)\.[^.]+)?\Z/, new_ext)
end
puts switch_ext 'test_dir/test_file.jpg', '.txt'
puts switch_ext 'test_dir/test_file.ext1.jpg', '.txt'
puts switch_ext 'testfile', '.txt'
puts switch_ext '.vimrc', '.txt'
Output:
test_dir/test_file.txt
test_dir/test_file.ext1.txt
testfile.txt
.vimrc.txt

def switch_ext(filename, new_ext)
filename.chomp( File.extname(filename)) + new_ext
end
I just found this answer to my own question here at the bottom of this long discussion.
http://www.ruby-forum.com/topic/179524
I personally think it is the best one I've seen. I definitely want to avoid a regular expression, because they are hard for me to remember and therefore error prone.
For dotfiles this function just adds the extension onto the file. This behaviour seems sensible to me.
switch_ext('.vimrc', '.txt') # => ".vimrc.txt"
Please continue to post better answers if there are any, and post comments to let me know if you see any deficiencies in this answer. I'll leave the question open for now.
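For anyone who wants to check the edge cases quickly, here is a small sketch that runs the same function over the inputs from the question plus two extras. The paths `.vimrc` and `dir.d/file` are just illustrative inputs, not anything from a real project:

```ruby
# Same approach as above: strip whatever File.extname reports, then append.
def switch_ext(filename, new_ext)
  filename.chomp(File.extname(filename)) + new_ext
end

samples = %w[
  test_dir/test_file.jpg
  test_dir/test_file.ext1.jpg
  test_dir/test_file
  .vimrc
  dir.d/file
]

samples.each do |name|
  puts "#{name} -> #{switch_ext(name, '.txt')}"
end
```

The last sample shows that a dot in a directory name is left alone, because File.extname only inspects the final path component.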

You can use regular expressions, or you can use things like the built-in filename manipulation tools in File:
%w[
test_dir/test_file.jpg
test_dir/test_file.ext1.jpg
test_dir/test_file
].each do |fn|
puts File.join(
File.dirname(fn),
File.basename(fn, File.extname(fn)) + '.txt'
)
end
Which outputs:
test_dir/test_file.txt
test_dir/test_file.ext1.txt
test_dir/test_file.txt
I personally use the File methods. They're aware of different OS's needs for filename separators so porting to another OS is a no brainer. In your use-case it's not a big deal. Mix in path manipulations and it becomes more important.

def switch_ext f, new_ext
"#{f.sub(/\.[^.]+\z/, "")}.#{new_ext}"
end

def switch_ext(filename, ext)
begin
filename[/\.\w+$/] = ext
rescue
filename << ext
end
filename
end
Usage
>> switch_ext('test_dir/test_file.jpeg', '.txt')
=> "test_dir/test_file.txt"
>> switch_ext('test_dir/no_ext_file', '.txt')
=> "test_dir/no_ext_file.txt"
Hope this helps.

Since Ruby 1.9.1 the easiest answer is Pathname#sub_ext(replacement), which strips off the extension and replaces it with the given replacement (which can be an empty string ''):
Pathname.new('test_dir/test_file.jpg').sub_ext('.txt')
=> #<Pathname:test_dir/test_file.txt>
Pathname.new('test_dir/test_file.ext1.jpg').sub_ext('.txt')
=> #<Pathname:test_dir/test_file.ext1.txt>
Pathname.new('test_dir/test_file').sub_ext('.txt')
=> #<Pathname:test_dir/test_file.txt>
Pathname.new('test_dir/test_file.txt').sub_ext('')
=> #<Pathname:test_dir/test_file>
One thing to watch out for is that you need to have a leading . in the replacement:
Pathname.new('test_dir/test_file.jpg').sub_ext('txt')
=> #<Pathname:test_dir/test_filetxt>

Related

How do I lookup a key/symbol based on which Regex match?

I am extracting files from a zip archive in Ruby using RubyZip, and I need to label files based on characteristics of their filenames:
Example:
I have the following hash:
labels = {
:data_file=>/.\.dat/i,
:metadata=>/.\.xml/i,
:text_location=>/.\.txt/i
}
So, I have the file name of each file in the zip, let's say an example is
filename = "382582941917841df.xml"
Assume that each file will match only one Regex in the labels hash, and if not it doesn't matter, just choose the first match. (In this case the regular expressions are all for detecting extensions, but it could be to detect any filename mask, like DSC****.jpg for example.)
I am doing this now:
label_match = labels.find {|key,value| filename =~ value}
# => label_match = [:metadata, /.\.xml/i]
label_sym = label_match.nil? ? nil : label_match.first
So this works fine, but it doesn't seem very Ruby-like. Is there something I am missing to clean this up nicely?
A case when does this effortlessly:
filename = "382582941917841df.xml"
category = case filename
when /.\.dat/i ; :data_file
when /.\.xml/i ; :metadata
when /.\.txt/i ; :text_location
end
p category # => :metadata ; nil if nothing matched
I think you're doing it backwards and the hard way. Ruby makes it easy to get the extension of a file, which then makes it easy to map it to something.
Starting with something like:
FILENAMES = %w[ foo.bar foo.baz 382582941917841df.xml DSC****.jpg]
FILETYPES = {
'.bar' => 'bar',
'.baz' => 'baz',
'.xml' => 'metadata',
'.dat' => 'data',
'.jpg' => 'image'
}
FILENAMES.each do |fn|
puts "#{ fn } is a #{ FILETYPES[File.extname(fn)] } file"
end
# >> foo.bar is a bar file
# >> foo.baz is a baz file
# >> 382582941917841df.xml is a metadata file
# >> DSC****.jpg is a image file
File.extname is built into Ruby. The File class contains many similar methods useful for finding out things about files known by the OS and/or tearing apart file paths and file names so it's a really good thing to become very familiar with.
It's also important to understand that an improperly written regexp, such as /.\.dat/i can be the source of a lot of pain. Consider these:
'foo.xml.dat'[/.\.dat/] # => "l.dat"
'foo.database.20010101.csv'[/.\.dat/] # => "o.dat"
Are the files really "data" files?
Why is the character in front of the delimiting . important or necessary?
Do you really want to slow your code using unanchored regexp patterns when a method, such as extname will be faster and less maintenance?
Those are things to consider when writing code.
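To tie this back to the labels in the question, here is a rough sketch of an extension lookup that ignores case and falls back to a default. `LABELS_BY_EXT`, `label_for` and the `:unknown` fallback are names I made up for the example:

```ruby
# Hypothetical mapping from lower-case extension to the question's labels.
LABELS_BY_EXT = {
  '.dat' => :data_file,
  '.xml' => :metadata,
  '.txt' => :text_location
}.freeze

# Looks up the label by the file's real extension; :unknown when nothing matches.
def label_for(filename)
  LABELS_BY_EXT.fetch(File.extname(filename).downcase, :unknown)
end

p label_for('382582941917841df.XML')     # :metadata
p label_for('foo.database.20010101.csv') # :unknown, not a false "data" match
```

This only covers the extension case. For arbitrary filename masks you would still need patterns, preferably anchored ones.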
Rather than using nil to indicate the label when there is no match, consider using another symbol like :unknown.
Then you can do:
labels = {
:data_file=>/.\.dat/i,
:metadata=>/.\.xml/i,
:text_location=>/.\.txt/i,
:unknown=>/.*/
}
label = labels.find {|key,value| filename =~ value}.first

Ruby: Get filename without the extensions

How can I get the filename without the extensions? For example, input of "/dir1/dir2/test.html.erb" should return "test".
In actual code I will be passing in __FILE__ instead of "/dir1/dir2/test.html.erb".
Read documentation:
basename(file_name [, suffix] ) → base_name
Returns the last component of the filename given in file_name, which
can be formed using both File::SEPARATOR and File::ALT_SEPARATOR as
the separator when File::ALT_SEPARATOR is not nil. If suffix is given
and present at the end of file_name, it is removed.
>> File.basename('public/500.html', '.html')
=> "500"
In your case:
>> File.basename("test.html.erb", ".html.erb")
=> "test"
How about this
File.basename(f, File.extname(f))
This returns the file name without the extension. It also works for filenames with multiple '.' in them.
In case you don't know the extension you can combine File.basename with File.extname:
filepath = "dir/dir/filename.extension"
File.basename(filepath, File.extname(filepath)) #=> "filename"
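Note that this removes only the last extension, so "test.html.erb" becomes "test.html". If you want every extension gone without hard-coding them, one possibility is to repeat the same step until File.extname reports nothing. A sketch, with `strip_all_extensions` being a hypothetical helper name:

```ruby
# Hypothetical helper: peel off extensions one at a time using File.extname.
def strip_all_extensions(path)
  name = File.basename(path)
  loop do
    ext = File.extname(name)
    break if ext.empty?
    name = name.chomp(ext) # remove the trailing extension just reported
  end
  name
end

puts strip_all_extensions('/dir1/dir2/test.html.erb') # test
puts strip_all_extensions('a.b/archive.tar.gz')       # archive
```

Because File.extname treats a leading dot as part of the name, a dotfile such as .vimrc comes back unchanged.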
Pathname provides a convenient object-oriented interface for dealing with file names.
One method lets you replace the existing extension with a new one, and that method accepts the empty string as an argument:
>> Pathname('foo.bar').sub_ext ''
=> #<Pathname:foo>
>> Pathname('foo.bar.baz').sub_ext ''
=> #<Pathname:foo.bar>
>> Pathname('foo').sub_ext ''
=> #<Pathname:foo>
This is a convenient way to get the filename stripped of its extension, if there is one.
But if you want to get rid of all extensions, you can use a regex:
>> "foo.bar.baz".sub(/(?<=.)\..*/, '')
=> "foo"
Note that this only works on bare filenames, not paths like foo.bar/pepe.baz. For that, you might as well use a function:
def without_extensions(path)
p = Pathname(path)
p.parent / p.basename.sub(
/
(?<=.) # look-behind: ensure some character, e.g., for ‘.foo’
\. # literal ‘.’
.* # extensions
/x, '')
end
Split by dot and the first part is what you want.
filename = 'test.html.erb'
result = filename.split('.')[0]
Considering the premise, the most appropriate answer for this case (and similar cases with other extensions) would be something such as this:
__FILE__.split('.')[0...-1].join('.')
Which will only remove the extension (not the other parts of the name: myfile.html.erb here becomes myfile.html, rather than just myfile).
Thanks to @xdazz and @Monk_Code for their ideas. In case others are looking, the final code I'm using is:
File.basename(__FILE__, ".*").split('.')[0]
This generically allows you to remove the full path in the front and the extensions in the back of the file, giving only the name of the file without any dots or slashes.
name = "filename.100.jpg"
puts "#{name.split('.')[-1]}"
Although it's not a cross-platform solution, this works on Unix-like systems:
def without_extensions(path)
lastSlash = path.rindex('/')
if lastSlash.nil?
theFile = path
else
theFile = path[lastSlash+1..-1]
end
# not an easy thing to define
# what an extension is
theFile[0...theFile.index('.')]
end
puts without_extensions("test.html.erb")
puts without_extensions("/test.html.erb")
puts without_extensions("a.b/test.html.erb")
puts without_extensions("/a.b/test.html.erb")
puts without_extensions("c.d/a.b/test.html.erb")

Copy a file with the variables substituted

I have a file containing substituted variables (#{...}) and I would like to copy it into another file, with the variables substituted by their values.
Here's what I have
file = File.open(@batch_file_name, "w+")
script=File.open("/runBatch.script","r")
script.each do |line|
file.puts(line)
end
But this is apparently not the right way to do that. Any suggestion ?
Instead of #{...} in your file use ERB files.
No, this isn't the right way to do it. You can't expect Ruby to magically interpret any #{} it encounters anywhere in your data as variable interpolation. This would (amongst other terrible side effects) yield massive security problems everywhere.
If you want to interpolate data into a string you'll need to eval it, which has its own security risks:
str = 'The value of x is #{x}'
puts str # The value of x is #{x}
x = "123"
puts eval "\"#{str}\"" # The value of x is 123
It's not clear which variables you're trying to interpolate into your data. This is almost certainly the wrong way to go about doing whatever it is you're doing.
Ok say you have a file named tmp.file that has the following text:
This is #{foobar}!
Then you can easily do the following:
str = ""
File.open("tmp.file", "r") do |f|
str = f.read
end
foobar = "Sparta"
puts eval('"' + str + '"')
And your result would be This is Sparta!
But as already suggested you should go with a real template solution like ERB. Then you would use your files like views in Rails. Instead of This is #{foobar}. you would have This is <%= foobar %>.
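For reference, a minimal ERB sketch of that idea. The helper name `render_template` is hypothetical, and the template text is the one from the example above rewritten with ERB tags:

```ruby
require 'erb'

# Hypothetical helper: renders an ERB template string.
# The local variable `foobar` is visible to the template through `binding`.
def render_template(source, foobar)
  ERB.new(source).result(binding)
end

template = 'This is <%= foobar %>!'
puts render_template(template, 'Sparta') # This is Sparta!
```

To copy a file this way you would read the template with File.read and write the rendered string with File.write. Unlike eval on the raw text, only the code inside the <% %> tags is evaluated, though the template itself should still come from a trusted source.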

Open a file case-insensitively in Ruby under Linux

Is there a way to open a file case-insensitively in Ruby under Linux? For example, given the string foo.txt, can I open the file FOO.txt?
One possible way would be reading all the filenames in the directory and manually search the list for the required file, but I'm looking for a more direct method.
One approach would be to write a little method to build a case insensitive glob for a given filename:
def ci_glob(filename)
glob = ''
filename.each_char do |c|
glob += c.downcase != c.upcase ? "[#{c.downcase}#{c.upcase}]" : c
end
glob
end
irb(main):024:0> ci_glob('foo.txt')
=> "[fF][oO][oO].[tT][xX][tT]"
and then you can do:
filename = Dir.glob(ci_glob('foo.txt')).first
Alternatively, you can write the directory search you suggested quite concisely. e.g.
filename = Dir.glob('*').find { |f| f.downcase == 'foo.txt' }
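If the file may live in a directory other than the current one, the same search can be wrapped in a small method. This is just a sketch; `find_case_insensitive` is a made-up name, and it returns nil when nothing matches:

```ruby
# Hypothetical helper: returns the path of the first entry in `dir` whose name
# equals `name` ignoring case, or nil if there is none.
def find_case_insensitive(dir, name)
  wanted = name.downcase
  match = Dir.entries(dir).find { |entry| entry.downcase == wanted }
  match && File.join(dir, match)
end

path = find_case_insensitive('.', 'foo.txt')
puts(path ? "found #{path}" : 'no matching file')
```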
Prior to Ruby 3.1 it was possible to use the FNM_CASEFOLD option to make glob case insensitive e.g.
filename = Dir.glob('foo.txt', File::FNM_CASEFOLD).first
if filename
# use filename here
else
# no matching file
end
The documentation suggested FNM_CASEFOLD couldn't be used with glob but it did actually work in older Ruby versions. However, as mentioned by lildude in the comments, the behaviour has now been brought in line with the documentation and so this approach shouldn't be used.
You can use Dir.glob with the FNM_CASEFOLD flag to get a list of all filenames that match the given name except for case. You can then just use first on the resulting array to get any result back or use min_by to get the one that matches the case of the original most closely.
def find_file(f)
Dir.glob(f, File::FNM_CASEFOLD).min_by do |f2|
f.chars.zip(f2.chars).count {|c1,c2| c1 != c2}
end
end
system "touch foo.bar"
system "touch Foo.Bar"
Dir.glob("FOO.BAR", File::FNM_CASEFOLD) #=> ["foo.bar", "Foo.Bar"]
find_file("FOO.BAR") #=> "Foo.Bar"

How can I write this so it's easier to understand?

Is there any way to write code like this in a way that makes what it does clearer?
a = (a.split(" ")[1..-1]).join(" ")
That deletes the first word of a sentence but the code doesn't look expressive at all.
irb(main):024:0> "this is a test".split.drop(1) * " "
=> "is a test"
Edited to add:
Explanation:
By default #split delimits on whitespace.
#drop(1) gets rid of the first entry.
* " " does the same as #join(" ").
For somebody who is used to reading regexps this is pretty clean:
a = a.sub(/^\S+\s/, "")
ymmv
code
a = a.split[1..-1] * " "
explanation
String#split's default parameter is " "
Array * String is an alias for Array.join(String)
On second thought, I'm not sure if it's more transparent to someone who is not familiar with ruby, per se. But anyone who has worked with Ruby strings for a little bit will understand what's going on. And it's a lot more clean than the original version.
UPDATE
As per just-my-correct-opinion's answer (which you all should vote up instead of mine), if you are running Ruby 1.9.1 (which you should be, anyway) or Ruby 1.8.7, you can do:
a = a.split.drop(1) * " "
maybe making the process explicit will help
words = a.split
words.shift # remove the first word (pop would remove the last one)
a = words.join " "
And if you were using this throughout some code you might want to create the
methods in String and Array to make your code readable. (works in 1.8.6 too)
class String
def words
split
end
end
class Array
def but_first
self[1..-1]
end
def to_sentence
join(' ')
end
end
str = "And, this is a sentence"
puts str.words.but_first.to_sentence
Something like following
a = a[a.index(' ')+1, a.length-(a.index(' ')+1)]
There is no check for a sentence without a space, though.
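Another option that avoids the index arithmetic is String#partition, which splits at the first space and copes with one-word input. A sketch, where `drop_first_word` is a name invented for the example:

```ruby
# Hypothetical helper: everything after the first space, or "" if there is none.
def drop_first_word(sentence)
  sentence.partition(' ').last
end

puts drop_first_word('this is a test') # is a test
puts drop_first_word('word').inspect   # ""
```

Unlike the split/join versions, this leaves the spacing in the rest of the sentence exactly as it was.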
