Ordering in Ruby Pathname .children

It seems the order of the filesystem entities returned by Pathname's .children method is arbitrary or at least not alphabetical.
Is there a way to have these returned in alphabetical order via the file system rather than calling .sort on the returned array?

Pathname#children is actually implemented like this:
def children(with_directory=true)
  with_directory = false if @path == '.'
  result = []
  Dir.foreach(@path) {|e|
    next if e == '.' || e == '..'
    if with_directory
      result << self.class.new(File.join(@path, e))
    else
      result << self.class.new(e)
    end
  }
  result
end
Dir.foreach calls the OS and iterates the directory passed in. There is no provision for telling the OS to sort by a particular order.
The question "What is the 'directory order' of files in a directory (used by ls -U)?" is probably of interest to you.
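Since the order comes straight from the filesystem, sorting in Ruby is the practical option. A minimal sketch, assuming you want either plain byte-wise order (Pathname#<=>, so uppercase names sort before lowercase ones) or a case-insensitive order by the entry's own name; both method names are hypothetical:

```ruby
require 'pathname'

# Byte-wise (case-sensitive) order, using Pathname#<=>.
def sorted_children(dir)
  Pathname.new(dir).children.sort
end

# Case-insensitive order by the entry's basename.
def sorted_children_ci(dir)
  Pathname.new(dir).children.sort_by { |p| p.basename.to_s.downcase }
end
```

The extra cost is one in-memory sort of the entry list, which is usually negligible next to the directory read itself.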

Related

Ruby Loops While Statements

I have an array of strings, dictionary, and a string, target:
dictionary = ['a', 'b', 'c', 'ab', 'abc']
target = 'abba'
My goal is to return combinations of words from dictionary that can make up target. It should return something like this:
['a abc', 'a a b c', 'a ab c']
This is what I have:
def possible_combinations(dictionary, target)
  results = [] #eventually an array of results
  i = 0 #to go through the dictionary index starting at 0
  t = 0 #to go through the target index starting at 0
  while i < dictionary.count #while 0 is less than the total index in dict
    while t < target.length
      if dictionary[i] == target[t] #dict is not changing but target[t] is changing
        puts 'I am ' + dictionary[i] + ' at DICT for now'
        puts 'I am ' + target[t] + ' at t for now'
        puts 'I match somewhere in target so I am added.' #dict[1] is not happening here.
        # results.push(dictionary[i])
        if results.empty?
          results.push(dictionary[i])
          puts results
        else
          results = results[0] + ' ' + dictionary[i] #this is not entirely working?
          puts results
        end
      else
        puts 'forget about me'
      end
      t = t + 1
    end
    i = i + 1
  end
end
When I run it, I get this:
I am a at DICT for now
I am a at t for now
I match somewhere in target so I am added.
a
forget about me
forget about me
I am a at DICT for now
I am a at t for now
I match somewhere in target so I am added.
a a
I notice that target[t] is changing, but dictionary[i] is not. I don't understand nested while loops. I think the inner while loop has to finish before control goes back to the outer one, so dictionary[i] is getting stuck. I want to iterate over both dictionary and target, so I am using nested while loops.
If target = 'aaaba', I get this:
I am a at DICT for now
I am a at t for now
I match somewhere in target so I am added.
a
I am a at DICT for now
I am a at t for now
I match somewhere in target so I am added.
a a
I am a at DICT for now
I am a at t for now
I match somewhere in target so I am added.
a a
forget about me
I am a at DICT for now
I am a at t for now
I match somewhere in target so I am added.
a a
Notice how the results got stuck with two 'a' but not three or four?
Instead of using while, you can use each on the dictionary and each_char on the target:
dictionary.each do |word|
  target.each_char do |char|
    puts word, char
  end
end
The problem with your current loops is that you are initializing t = 0 outside both loops, so you only loop through the target once; after that, the inner while condition is always false. If you move that assignment inside the outer while loop, you will get a result closer to what you expect.
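A minimal sketch of that fix, with the printing replaced by a hypothetical visited_pairs helper that records which (word, char) pairs the nested loops actually visit:

```ruby
# Records every (dictionary word, target char) pair the nested while loops visit.
def visited_pairs(dictionary, target)
  pairs = []
  i = 0
  while i < dictionary.count
    t = 0 # reset for every word, so the inner loop runs again each time
    while t < target.length
      pairs << [dictionary[i], target[t]]
      t += 1
    end
    i += 1
  end
  pairs
end
```

With t = 0 placed before the outer loop instead, only the first word would ever be paired with the target's characters, because t would already equal target.length when the second word comes around.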
See the documentation with:
ri Array.each
ri String.index
You're doing this in the most un-Ruby-like way possible. Read the chapter on the Enumerable module, and look up all the methods String supports. There is almost always a better way than using while in Ruby.
http://ruby-doc.com/docs/ProgrammingRuby/
Note that while indexing a String with [] works, Strings are not Enumerable, so you'd be better off splitting the string into an array of chars (for example with chars or each_char) if you want to enumerate over it. Or, better yet, use string search methods rather than direct comparisons. In your code, dictionary[i] is a string while target[t] is a single char, so your test will never be true when the dictionary elements are longer than one char.
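To see the mismatch concretely: 'abba'[0] is the one-character string "a", so comparing it with "ab" is always false. Comparing a slice of the target that is as long as the word avoids this. A minimal sketch; occurrences is a hypothetical helper name, and String#index(word, offset) would be an equally valid building block:

```ruby
# Returns every start offset at which word occurs in target (overlaps included),
# by comparing word-length slices instead of single characters.
def occurrences(target, word)
  (0..target.length - word.length).select { |i| target[i, word.length] == word }
end
```

For example, occurrences('abba', 'b') gives [1, 2], while a word longer than the target simply produces an empty range and no matches.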
Here is a more Ruby-like way to write your program:
def possible_combinations(dictionary, target)
  results = [] # eventually an array of results
  dictionary.each do |d|
    str = ''
    target.each_char do |t|
      if d == t # still only true for one-character words
        puts 'I am ' + d + ' at DICT for now'
        puts 'I am at target ' + t + ' now'
        puts 'I match somewhere in target so I am added.'
        str << ' ' unless str.empty?
        str << d
        puts results
      else
        puts 'forget about me'
      end
    end
    results << str
  end
  results
end
It took just a couple of minutes to do this translation. I mainly removed the indices, so the iterations are over the elements of the dictionary and target objects.
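Neither loop structure actually builds the combinations the question asks for. Assuming the goal is every way to split target into a sequence of dictionary words (all of them non-empty, otherwise the recursion never ends), a short recursive sketch with flat_map and String#start_with? could look like this; word_breaks is a hypothetical name:

```ruby
# Returns every way to write target as space-separated dictionary words.
# Assumes no dictionary word is empty.
def word_breaks(dictionary, target)
  return [''] if target.empty? # one way to split nothing: no words at all
  dictionary.flat_map do |word|
    next [] unless target.start_with?(word)
    word_breaks(dictionary, target[word.length..-1]).map do |rest|
      rest.empty? ? word : "#{word} #{rest}"
    end
  end
end
```

For target 'aabc' this returns ['a a b c', 'a ab c', 'a abc'], the same three strings as the question's expected output in a different order; for 'abba' it returns ['a b b a', 'ab b a']. It recomputes shared suffixes, so memoizing on the remaining string would help for long inputs.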

Directory Traversal in Ruby

I have been trying to implement a directory traversal in Ruby, as part of a bigger program, using the simple recursive approach. However, I have found that Dir.foreach does not include the directories inside it. How can I get them listed?
Code:
def walk(start)
  Dir.foreach(start) do |x|
    if x == "." or x == ".."
      next
    elsif File.directory?(x)
      walk(x)
    else
      puts x
    end
  end
end
The problem is that each time you recurse, the path you pass to File.directory? is just the entity (file or directory) name; all context is lost. Say you go into one/two/three/ to check whether one/two/three/file.txt is a directory: File.directory? just gets "file.txt" as the path instead of the whole path from the top-level directory, and resolves it relative to the current working directory. You have to maintain the relative path each time you recurse. This seems to work fine:
def walk(start)
  Dir.foreach(start) do |x|
    path = File.join(start, x)
    if x == "." or x == ".."
      next
    elsif File.directory?(path)
      puts path + "/" # remove this line if you want; just prints directories
      walk(path)
    else
      puts x
    end
  end
end
For recursion you should use Find:
From the documentation:
The Find module supports the top-down traversal of a set of file paths.
For example, to total the size of all files under your home directory, ignoring anything in a “dot” directory (e.g. $HOME/.ssh):
require 'find'
total_size = 0
Find.find(ENV["HOME"]) do |path|
  if FileTest.directory?(path)
    if File.basename(path)[0] == ?.
      Find.prune # Don't look any further into this directory.
    else
      next
    end
  else
    total_size += FileTest.size(path)
  end
end
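Along the same lines, a minimal sketch that collects regular files under a directory with Find, pruning dot-directories as in the documentation example. files_under is a hypothetical name, and the root itself is never pruned, so it also works for a start path like '.':

```ruby
require 'find'

# Collects the paths of all regular files under root, skipping any
# directory (other than root itself) whose name starts with a dot.
def files_under(root)
  files = []
  Find.find(root) do |path|
    if File.directory?(path)
      Find.prune if path != root && File.basename(path).start_with?('.')
    elsif File.file?(path)
      files << path
    end
  end
  files.sort
end
```

Find builds each yielded path from the start path, so the File.directory? check sees the full relative path, which is exactly the context the hand-written walk had to maintain itself.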

recursive file list in Ruby

I'm new to Ruby (being a Java dev) and trying to implement a method (oh, sorry, a function) that would retrieve and yield all files in the subdirectories recursively.
I've implemented it as:
def file_list_recurse(dir)
  Dir.foreach(dir) do |f|
    next if f == '.' or f == '..'
    f = dir + '/' + f
    if File.directory? f
      file_list_recurse(File.absolute_path f) { |x| yield x }
    else
      file = File.new(f)
      yield file
    end
  end
end
My questions are:
Does File.new really OPEN a file? In Java, new File("xxx") doesn't... If I need to yield some structure from which I can query file info (ctime, size, etc.), what would it be in Ruby?
{ |x| yield x } looks a little strange to me; is it OK to yield from recursive functions like that, or is there some way to avoid it?
Is there any way to avoid checking for '.' and '..' on each iteration?
Is there a better way to implement this?
Thanks
PS:
the sample usage of my method is something like this:
curr_file = nil
file_list_recurse('.') do |file|
  curr_file = file if curr_file == nil or curr_file.ctime > file.ctime
end
puts curr_file.to_path + ' ' + curr_file.ctime.to_s
(that would get you the oldest file from the tree)
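On the { |x| yield x } question: a recursive method can capture its block as a parameter with &block and pass that same block down to the recursive call, instead of wrapping it in a new block at every level. A minimal sketch (each_file is a hypothetical name) that yields full paths; it assumes Ruby 2.5+ for Dir.children, which already leaves out '.' and '..':

```ruby
# Yields the path of every non-directory entry under dir, recursing into subdirectories.
# Dir.children (Ruby 2.5+) omits '.' and '..', so no explicit check is needed.
def each_file(dir, &block)
  Dir.children(dir).each do |name|
    path = File.join(dir, name)
    if File.directory?(path)
      each_file(path, &block) # pass the same block down instead of { |x| yield x }
    else
      yield path
    end
  end
end
```

Yielding paths rather than File objects also avoids opening every file; File.stat(path) can be called in the block when ctime or size is needed.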
==========
So, thanks to @buruzaemon, I found out about the great Dir.glob function, which saved me a couple of lines of code.
Also, thanks to @Casper, I found out about the File.stat method, which made my function run twice as fast as with File.new.
In the end my code is looking something like this:
i = 0
curr_file = nil
Dir.glob('**/*', File::FNM_DOTMATCH) do |f|
  file = File.stat(f)
  next unless file.file?
  i += 1
  curr_file = [f, file] if curr_file == nil or curr_file[1].ctime > file.ctime
end
puts curr_file[0] + ' ' + curr_file[1].ctime.to_s
puts "total files #{i}"
=====
By default, Dir.glob ignores file names starting with a dot (considered to be 'hidden' in *nix), so it's very important to add the second argument, File::FNM_DOTMATCH.
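A quick way to see the difference, as a sketch assuming Ruby 2.5+ for the base: keyword (which makes the pattern relative to a given directory rather than the current one). Only regular files are kept, so the '.' entry that FNM_DOTMATCH can also match does not get in the way; glob_files is a hypothetical name:

```ruby
# Lists regular files under dir, relative to dir; dot files appear only with FNM_DOTMATCH.
def glob_files(dir, include_dotfiles)
  flags = include_dotfiles ? File::FNM_DOTMATCH : 0
  Dir.glob('**/*', flags, base: dir).select { |f| File.file?(File.join(dir, f)) }.sort
end
```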
How about this?
puts Dir['**/*.*']
According to the docs, File.new does open the file. You might want to use File.stat instead, which gathers file-related stats into a queryable object. But note that the stats are gathered when that object is created, not when you call query methods like ctime.
Example:
Dir['**/*'].select { |f| File.file?(f) }.map { |f| File.stat(f) }
You could use the built-in Find module's find method.
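With Find, the oldest-file search from the question fits in one pass. A minimal sketch; oldest_file is a hypothetical name. It compares mtime because ctime means inode-change time on Unix but creation time on Windows, so swap in ctime if that is what you need. It uses File.lstat so a broken symlink is skipped rather than raising:

```ruby
require 'find'

# Returns [path, mtime] for the least recently modified regular file under root,
# or nil when there are no regular files.
def oldest_file(root)
  oldest = nil
  Find.find(root) do |path|
    stat = File.lstat(path)
    next unless stat.file?
    oldest = [path, stat.mtime] if oldest.nil? || stat.mtime < oldest[1]
  end
  oldest
end
```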
If you are on Windows, see my answer here for a much faster (~26 times) way than standard Ruby Dir. If you use mtime, it's still going to be way faster.
If you use another OS, you could use the same technique; I'm curious whether the gain would be that big, but I'm almost certain it would be.

Increment part of a string in Ruby

I have a method in a Ruby script that is attempting to rename files before they are saved. It looks like this:
def increment (path)
  if path[-3,2] == "_#"
    print " Incremented file with that name already exists, renaming\n"
    count = path[-1].chr.to_i + 1
    return path.chop! << count.to_s
  else
    print " A file with that name already exists, renaming\n"
    return path << "_#1"
  end
end
Say you have 3 files with the same name being saved to a directory; we'll say the file is called example.mp3. The idea is that the first will be saved as example.mp3 (since it won't be caught by if File.exists?("#{file_path}.mp3") elsewhere in the script), the second as example_#1.mp3 (since it is caught by the else branch of the above method), and the third as example_#2.mp3 (since it is caught by the if branch of the above method).
The problem I have is twofold.
1) if path[-3,2] == "_#" won't work for files with an integer of more than one digit (example_#11.mp3 for example) since the character placement will be wrong (you'd need it to be path[-4,2] but then that doesn't cope with 3 digit numbers etc).
2) I'm never reaching problem 1) since the method doesn't reliably catch file names. At the moment it will rename the first to example_#1.mp3 but the second gets renamed to the same thing (causing it to overwrite the previously saved file).
This is possibly too vague for Stack Overflow but I can't find anything that addresses the issue of incrementing a certain part of a string.
Thanks in advance!
Edit/update:
Wayne's method below seems to work on its own but not when included as part of the whole script: it can increment a file once (from example.mp3 to example_#1.mp3) but doesn't cope with taking example_#1.mp3 and incrementing it to example_#2.mp3. To provide a little more context, when the script finds a file to save, it currently passes the name to Wayne's method like this:
file_name = increment(image_name)
File.open("images/#{file_name}.jpeg", 'w') do |output|
  open(image_url) do |input|
    output << input.read
  end
end
I've edited Wayne's script a little so now it looks like this:
def increment (name)
  name = name.gsub(/\s{2,}|(http:\/\/)|(www.)/i, '')
  if File.exists?("images/#{name}.jpeg")
    _, filename, count, extension = *name.match(/(\A.*?)(?:_#(\d+))?(\.[^.]*)?\Z/)
    count = (count || '0').to_i + 1
    "#{name}_##{count}#{extension}"
  else
    return name
  end
end
Where am I going wrong? Again, thanks in advance.
A regular expression will git 'er done:
#!/usr/bin/ruby1.8
def increment(path)
  _, filename, count, extension = *path.match(/(\A.*?)(?:_#(\d+))?(\.[^.]*)?\Z/)
  count = (count || '0').to_i + 1
  "#{filename}_##{count}#{extension}"
end
p increment('example') # => "example_#1"
p increment('example.') # => "example_#1."
p increment('example.mp3') # => "example_#1.mp3"
p increment('example_#1.mp3') # => "example_#2.mp3"
p increment('example_#2.mp3') # => "example_#3.mp3"
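The edited version in the question checks File.exists? only once, so when example.jpeg exists it returns example_#1 without checking whether example_#1.jpeg exists too; it also interpolates name rather than the captured filename, so a name already ending in _#1 would get a second suffix. A minimal sketch that keeps incrementing until the name is free, reusing the increment method above (repeated so the snippet runs on its own). free_name and its dir argument are hypothetical, and the name passed in is assumed to include its extension:

```ruby
# Same regex as above: splits "stem_#N.ext" into stem, N and .ext, then bumps N.
def increment(path)
  _, filename, count, extension = *path.match(/(\A.*?)(?:_#(\d+))?(\.[^.]*)?\Z/)
  count = (count || '0').to_i + 1
  "#{filename}_##{count}#{extension}"
end

# Keeps incrementing until no file with that name exists in dir.
def free_name(dir, name)
  name = increment(name) while File.exist?(File.join(dir, name))
  name
end
```

In the question's setup, that would be called as something like free_name('images', "#{image_name}.jpeg").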
This probably doesn't matter for the code you're writing, but if you ever may have multiple threads or processes using this algorithm on the same files, there's a race condition when checking for existence before saving: Two writers can both find the same filename unused and write to it. If that matters to you, then open the file in a mode that fails if it exists, rescuing the exception. When the exception occurs, pick a different name. Roughly:
loop do
  begin
    File.open(filename, File::CREAT | File::EXCL | File::WRONLY) do |file|
      file.puts "Your content goes here"
    end
    break
  rescue Errno::EEXIST
    filename = increment(filename)
    redo
  end
end
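A self-contained version of that idea, as a sketch: create_exclusive is a hypothetical helper, and it builds the numbered names directly rather than calling increment. Because File::EXCL makes the open itself fail with Errno::EEXIST when the name is taken, there is no gap between checking and writing:

```ruby
# Creates dir/base+ext, or base_#1+ext, base_#2+ext, ... if taken; never overwrites.
# Returns the name that was actually created.
def create_exclusive(dir, base, ext, content)
  i = 0
  loop do
    name = i.zero? ? "#{base}#{ext}" : "#{base}_##{i}#{ext}"
    begin
      File.open(File.join(dir, name), File::CREAT | File::EXCL | File::WRONLY) do |f|
        f.write(content)
      end
      return name
    rescue Errno::EEXIST
      i += 1 # someone else owns this name; try the next number
    end
  end
end
```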
Here's a variation that doesn't accept a file name with an existing count:
def non_colliding_filename( filename )
  # File.exist? (not the deprecated File.exists?, removed in Ruby 3.2)
  if File.exist?(filename)
    base,ext = /\A(.+?)(\.[^.]+)?\Z/.match( filename ).to_a[1..-1]
    i = 1
    i += 1 while File.exist?( filename="#{base}_##{i}#{ext}" )
  end
  filename
end
Proof:
%w[ foo bar.mp3 jim.bob.mp3 ].each do |desired|
  3.times{
    file = non_colliding_filename( desired )
    p file
    File.open( file, 'w' ){ |f| f << "tmp" }
  }
end
#=> "foo"
#=> "foo_#1"
#=> "foo_#2"
#=> "bar.mp3"
#=> "bar_#1.mp3"
#=> "bar_#2.mp3"
#=> "jim.bob.mp3"
#=> "jim.bob_#1.mp3"
#=> "jim.bob_#2.mp3"

Avoiding making multiple calls to Find.find("./") in Ruby

I'm not sure what the best strategy is for this. I have a class in which I can search the filesystem for a certain pattern of files, and I want to execute Find.find("./") only once. How would I approach this?
def files_pattern(pattern)
  Find.find("./") do |f|
    if f.include? pattern
      #fs << f
    end
  end
end
Remembering the (usually computationally intensive) result of a method call so that you don't need to recalculate it the next time the method is called is known as memoization, so you will probably want to read more about that.
One way of achieving that in Ruby is to use a small wrapper class that stores the result in an instance variable, for example:
require 'find'

class Finder
  def initialize(pattern)
    @pattern = pattern
  end

  def matches
    @matches ||= find_matches
  end

  private

  def find_matches
    puts 'find_matches' # trace, so you can see when the search really runs
    fs = []
    Find.find("./") do |f|
      if f.include? @pattern
        fs << f
      end
    end
    fs
  end
end
And then you can do:
irb(main):089:0> f = Finder.new 'xml'
=> #<Finder:0x2cfc568 @pattern="xml">
irb(main):090:0> f.matches
find_matches
=> ["./example.xml"]
irb(main):091:0> f.matches # won't result in call to find_matches
=> ["./example.xml"]
Note: the ||= operator performs an assignment only if the variable on the left-hand side evaluates to false or nil. That is, @matches ||= find_matches is roughly shorthand for @matches = @matches || find_matches, where find_matches will only be called the first time thanks to short-circuit evaluation. There are lots of other questions explaining it on Stack Overflow.
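A minimal sketch of both the pattern and its one caveat: ||= treats nil and false as "not computed yet", so a method whose result can be nil or false is re-run on every call. Checking defined? on the instance variable avoids that. Memo and its methods are hypothetical:

```ruby
class Memo
  attr_reader :calls # how many times the expensive computation actually ran

  def initialize
    @calls = 0
  end

  # ||= skips compute once @list holds a truthy value (an Array, even empty, is truthy).
  def list
    @list ||= compute([])
  end

  # With ||=, a nil result would be recomputed every time,
  # so this variant checks whether the ivar has been set at all.
  def maybe
    return @maybe if defined?(@maybe)
    @maybe = compute(nil)
  end

  private

  def compute(result)
    @calls += 1
    result
  end
end
```

For Finder this caveat does not bite, since find_matches always returns an Array.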
Slight variation: You could change your method to return a list of all files and then use methods from Enumerable such as grep and select to perform multiple searches against the same list of files. Of course, this has the downside of keeping the entire list of files in memory. Here is an example though:
require 'find'

def find_all
  fs = []
  Find.find("./") do |f|
    fs << f
  end
  fs
end
And then use it like:
files = find_all
files.grep(/\.xml/)
files.select { |f| f.include? '.cpp' }
# etc
If I understand your question correctly, you want to run Find.find and assign the result to an instance variable. You can move what is now the block into a separate method and call that to return only the files matching your pattern.
The only problem is that if the directory contains many files, you are holding a big array in memory.
How about system "find / -name #{my_pattern}"?
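Note that Kernel#system only returns whether the command succeeded (true, false, or nil); the matches go to standard output rather than back to Ruby, and interpolating the pattern into a shell string lets the shell expand or misinterpret it. A sketch that captures the output instead, assuming a POSIX find(1) on the PATH; external_find is a hypothetical name:

```ruby
# Runs find(1) directly (array form, no shell involved) and returns the matching paths.
def external_find(dir, pattern)
  IO.popen(['find', dir, '-name', pattern], &:read).split("\n")
end
```

Passing the command as an array means '*.xml' reaches find as a literal argument instead of being globbed by a shell.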
