Ruby - Get all files in a directory from shallower to deeper - ruby

By the magic of Dir I can get all files in a directory:
Dir['lib/**/*.rb']
=> ["lib/a.rb", "lib/foo/bar/c.rb", "lib/foo/b.rb"]
But I want to iterate them from shallower to deeper. i.e. a.rb -> b.rb -> c.rb.
Any suggestion?

Well, you could sort them by the number of slashes, which may not be very efficient but is easy:
["lib/a.rb", "lib/foo/bar/c.rb", "lib/foo/b.rb"].sort_by { |s| s.count('/') }
#=> ["lib/a.rb", "lib/foo/b.rb", "lib/foo/bar/c.rb"]
Or use group_by and get an array of files per directory level:
["lib/a.rb", "lib/foo/bar/c.rb", "lib/foo/b.rb"].group_by { |s| s.count('/') }
#=> {1=>["lib/a.rb"], 3=>["lib/foo/bar/c.rb"], 2=>["lib/foo/b.rb"]}
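Ruby does not guarantee that sort_by is stable, so files at the same depth may come back in any order. A minimal sketch that adds the path itself as a tie-breaker is shown here; the helper name shallow_first and the sample paths are made up for illustration:

```ruby
# Hypothetical helper: order paths by directory depth, then alphabetically
# within the same depth so the result is deterministic.
def shallow_first(paths)
  paths.sort_by { |path| [path.count(File::SEPARATOR), path] }
end

# Sample input; in practice this would be something like Dir['lib/**/*.rb'].
paths = ["lib/a.rb", "lib/foo/bar/c.rb", "lib/foo/b.rb", "lib/b.rb"]
sorted = shallow_first(paths)
puts sorted
```

Sorting on an array key compares the depth first and only falls back to the path when two depths are equal.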

Related

Net::SFTP sort directory files?

I'm currently doing the following to get a list of all the files in a directory:
Net::SFTP.start('host', 'username', :password => 'password') do |sftp|
  sftp.dir.foreach("/path") do |entry|
    puts entry.name
  end
end
But that lists the files seemingly at random. I need to order the files by name.
So, how can I sort the files by name?
Since SFTP just returns the entries in whatever order the server sends them, you could sort the results yourself:
entries = sftp.dir.entries("/path").sort_by(&:name)
entries.each do |entry|
  puts entry.name
end
This isn't quite what OP was looking for, but here's a sample of sorting by modified date, to list the oldest files first. You could easily adapt this to sort by any other attributes, reverse sort, etc.
It also filters out directories and dot-files, and ultimately only returns the filename, with no preceding path.
def files_to_process
  sftp.dir
    .glob(inbox_path, '*')
    .reject { |file| file.name.start_with?('.') } # start_with? is core Ruby; starts_with? needs ActiveSupport
    .select(&:file?)
    .sort_by { |file| file.attributes.mtime }
    .map(&:name)
end
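The sorting itself can be tried without a server. The sketch below uses plain Struct stand-ins (FakeEntry and FakeAttributes are invented for this example and are not part of Net::SFTP), assuming only that real entries respond to name and attributes.mtime as in the answers above:

```ruby
# Stand-ins for illustration only; real entries would come from
# sftp.dir.entries or sftp.dir.glob.
FakeAttributes = Struct.new(:mtime)
FakeEntry = Struct.new(:name, :attributes)

entries = [
  FakeEntry.new("b.csv", FakeAttributes.new(300)),
  FakeEntry.new("a.csv", FakeAttributes.new(200)),
  FakeEntry.new("c.csv", FakeAttributes.new(100)),
]

# Alphabetical by name.
by_name = entries.sort_by(&:name).map(&:name)

# Oldest first, then the same list reversed for newest first.
oldest_first = entries.sort_by { |e| e.attributes.mtime }.map(&:name)
newest_first = oldest_first.reverse

puts by_name.inspect
puts oldest_first.inspect
puts newest_first.inspect
```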

How to Store and Retrieve a Hash by an Array in Ruby

I want to build a hash that is used to store a directory. I want to have multiple levels of keys. At the match point, I want an array of files. It is like a directory structure on a computer. It seems a hash is the best way to do this.
Given that I have an array of folders ["folder1", "folder1a", "folder1ax"], how do I:
1) Set a hash using the folder structure as the key and the file as the value in an array, and
2) Query the hash using the folder structure?
I'm using this to parse out URLs to show them in a folder structure, and it's very similar to dumping into JSTree in a Rails app. So, if you have a better alternative for how to display 5000 URLs that works great with Rails views, please provide an alternative.
This is a starting point:
dirs = %w(Downloads)
Hash[ dirs.map{ |dir| [dir, Dir.glob("#{dir}/*")] } ]
This is the result:
{"Downloads"=> ["Downloads/jquery-ui-1.9.1.custom.zip", ... ] }
You can refine the code, for example by making it recursive or by removing the folder name from the array results. This is an example of a recursive implementation:
class Dir
  def self.ls_r(dir)
    Hash[ dir,
      entries(dir).reject{ |entry| %w(. ..).include?(entry) }.map do |entry|
        entry_with_dir = File.join(dir, entry)
        File.directory?(entry_with_dir) ? ls_r(entry_with_dir) : entry
      end ]
  end
end
# Dir.entries does not expand "~", so pass a path that resolves as-is.
puts Dir.ls_r('Downloads').inspect
#=> { "Downloads" => ["file1", {"Downloads/folder1"=>["subfile1", ...]}, ...] }
Note that this is not the best implementation, because the recursion doesn't take into consideration that the child folder keys should be relative to their parent keys. To resolve this, that information has to be carried through the recursion:
class Dir
  def self.ls_r(dir, key_as_last_path_component = false)
    Hash[ (key_as_last_path_component ? File.split(dir).last : dir),
      entries(dir).reject{ |entry| %w(. ..).include?(entry) }.map do |entry|
        entry_with_dir = File.join(dir, entry)
        File.directory?(entry_with_dir) ? ls_r(entry_with_dir, true) : entry
      end ]
  end
end
# Dir.entries does not expand "~", so pass a path that resolves as-is.
puts Dir.ls_r('Downloads').inspect
#=> { "Downloads" => ["file1", {"folder1"=>["subfile1", ...]}, ...] }
and now the children folders are relative to their parent keys.
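For the original question of setting and querying by an array of folder names, one possible approach is a nested hash walked with reduce and read with Hash#dig. This is only a sketch: store_file, files_at and the :files key are names invented here, and keeping file lists under a symbol key is an assumption made so that they cannot collide with folder-name strings.

```ruby
# Assumption: file lists live under a symbol key, folder names are strings.
FILES_KEY = :files

# Hypothetical helper: walk (and create) one hash level per folder name,
# then append the file to the list stored at that level.
def store_file(tree, folders, file)
  node = folders.reduce(tree) { |hash, folder| hash[folder] ||= {} }
  (node[FILES_KEY] ||= []) << file
  tree
end

# Hypothetical helper: return the files stored at the given folder path,
# or an empty array when the path does not exist.
def files_at(tree, folders)
  tree.dig(*folders, FILES_KEY) || []
end

tree = {}
store_file(tree, %w[folder1 folder1a folder1ax], "index.html")
store_file(tree, %w[folder1 folder1a folder1ax], "about.html")
store_file(tree, %w[folder1], "root.html")

puts files_at(tree, %w[folder1 folder1a folder1ax]).inspect
puts files_at(tree, %w[folder1 missing]).inspect
```

Hash#dig needs Ruby 2.3 or newer; on older versions the lookup could be written with the same reduce pattern instead.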

In Ruby, how to be warned of duplicate keys in hashes when loading a YAML document?

In the following Ruby example, is there a mode to have YAML NOT silently ignore the duplicate key 'one'?
irb(main):001:0> require 'yaml'
=> true
irb(main):002:0> str = '{ one: 1, one: 2 }'
=> "{ one: 1, one: 2 }"
irb(main):003:0> YAML.load(str)
=> {"one"=>2}
Thanks!
Using Psych, you can traverse the AST to find duplicate keys. I'm using the following helper method in my test suite to validate that there are no duplicate keys in my i18n translations:
require 'psych'

def duplicate_keys(file_or_content)
  yaml = file_or_content.is_a?(File) ? file_or_content.read : file_or_content
  duplicate_keys = []
  validator = ->(node, parent_path) do
    if node.is_a?(Psych::Nodes::Mapping)
      children = node.children.each_slice(2) # In a Mapping, children alternate: key node, then value node.
      duplicates = children.map { |key_node, _value_node| key_node }.group_by(&:value).select { |_value, nodes| nodes.size > 1 }
      duplicates.each do |key, nodes|
        duplicate_key = {
          file: (file_or_content.path if file_or_content.is_a?(File)),
          key: parent_path + [key],
          occurrences: nodes.map { |occurrence| "line: #{occurrence.start_line + 1}" },
        }.compact
        duplicate_keys << duplicate_key
      end
      # respond_to? replaces ActiveSupport's try, so this also runs outside Rails.
      children.each { |key_node, value_node| validator.call(value_node, parent_path + [(key_node.value if key_node.respond_to?(:value))].compact) }
    else
      node.children.to_a.each { |child| validator.call(child, parent_path) }
    end
  end
  ast = Psych.parse_stream(yaml)
  validator.call(ast, [])
  duplicate_keys
end
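A smaller variant of the same idea, using only Psych from the standard library, is sketched below. The method name find_duplicate_keys is made up for this example, and it deliberately ignores non-scalar keys rather than trying to compare them:

```ruby
require 'psych'

# Hypothetical helper: collect the key paths that appear more than once
# inside the same mapping. Only scalar keys are considered.
def find_duplicate_keys(node, path = [], found = [])
  # Scalars and aliases have no children to descend into.
  return found unless node.respond_to?(:children) && node.children

  if node.is_a?(Psych::Nodes::Mapping)
    seen = Hash.new(0)
    # Mapping children alternate: key node, value node, key node, ...
    node.children.each_slice(2) do |key_node, value_node|
      key = key_node.is_a?(Psych::Nodes::Scalar) ? key_node.value : nil
      if key
        seen[key] += 1
        found << (path + [key]) if seen[key] == 2 # report each duplicate once
      end
      find_duplicate_keys(value_node, key ? path + [key] : path, found)
    end
  else
    node.children.each { |child| find_duplicate_keys(child, path, found) }
  end
  found
end

ast = Psych.parse_stream("{ one: 1, one: 2, two: { a: 1, a: 2 } }")
duplicates = find_duplicate_keys(ast)
p duplicates
# prints [["one"], ["two", "a"]]
```

Working on the parsed node tree rather than the loaded hash is what makes this possible, since the duplicates are already collapsed by the time YAML.load returns.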
There is a solution involving a linter, but I'm not sure it will be relevant to
you since it's not a 100% Ruby solution. I'll post it anyway since I don't know
any way to do this in Ruby:
You can use the yamllint command-line tool:
sudo pip install yamllint
Specifically, it has a rule key-duplicates that detects duplicated keys:
$ cat test.yml
{ one: 1, one: 2 }
$ yamllint test.yml
test.yml
1:11 error duplication of key "one" in mapping (key-duplicates)
One of the things I do to help maintain the YAML files I use is to write code that initially generates them from a known structure in Ruby. That gets me started.
Then, I'll write a little snippet that loads it and outputs what it parsed using either PrettyPrint or Awesome Print so I can compare that to the file.
I also sort the fields as necessary to make it easy to look for duplicates.
No. You'd have to decide how to rename the keys since hash keys have to be unique - I'd go for some workaround like manually looking for keys that are the same and renaming them before you do a YAML::load.

Search for text in files in the path using ruby

I need to search all the *.c source files in the path for references to each *.h header, in order to find unused C headers. I wrote a Ruby script but it feels very clumsy.
I create an array with all C files and an array with all the H files.
I iterate over the header file array. For each header I open each C file and look for a reference to the header.
Is there an easier or better way?
# 'ftools' was removed in Ruby 1.9 and 'find' is not needed here; Dir and File are core classes.
# add a file search
class File
  def self.find(dir, filename="*.*", subdirs=true)
    Dir[ subdirs ? File.join(dir.split(/\\/), "**", filename) : File.join(dir.split(/\\/), filename) ]
  end
end
files = File.find(".", "*.c", true)
headers = File.find(".", "*.h", true)
headers.each do |file|
  #puts "Searching for #{file}(#{File.basename(file)})"
  found = 0
  files.each do |cfile|
    #puts "searching in #{cfile}"
    if File.read(cfile).downcase.include?(File.basename(file).downcase)
      found += 1
    end
  end
  puts "#{file} used #{found} times"
end
As already pointed out, you can use Dir.glob to simplify your file-finding. You could also consider switching your loops, which would mean opening each C file once, instead of once per H file.
I'd consider going with something like the following, which ran on the Ruby source in 3 seconds:
# collect the File.basename for all h files in tree
hfile_names = Dir.glob("**/*.h").collect{|hfile| File.basename(hfile) }
h_counts = Hash.new(0) # somewhere to store the counts
Dir.glob("**/*.c").each do |cfile| # enumerate the C files
  file_text = File.read(cfile) # downcase here if necessary
  hfile_names.each do |hfile|
    h_counts[hfile] += 1 if file_text.include?(hfile)
  end
end
h_counts.each { |file, found| puts "#{file} used #{found} times" }
EDIT: That won't list H files not referenced in any C files. To be certain to catch those, the hash would have to be explicitly initialised:
h_counts = {}
hfile_names.each { |hfile| h_counts[hfile] = 0 }
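A plain substring match also counts header names that only appear in comments or as part of longer names. As a rough alternative, the sketch below looks only at #include lines; the helper name unused_headers and the regular expression are assumptions for this example, and it still treats a header that is included only from other headers as unused:

```ruby
require 'set'
require 'tmpdir'

# Hypothetical helper: basenames of .h files under root that no .c file
# under root pulls in with an #include line.
def unused_headers(root)
  headers = Dir.glob(File.join(root, "**", "*.h")).map { |h| File.basename(h) }
  included = Set.new
  Dir.glob(File.join(root, "**", "*.c")).each do |cfile|
    File.foreach(cfile) do |line|
      # scrub guards against invalid byte sequences in source files
      match = line.scrub.match(/^\s*#\s*include\s*["<]([^">]+)[">]/)
      included << File.basename(match[1]) if match
    end
  end
  headers.uniq.reject { |h| included.include?(h) }.sort
end

# Small self-contained demo in a temporary directory.
Dir.mktmpdir do |root|
  File.write(File.join(root, "main.c"), "#include \"used.h\"\n#include <stdio.h>\nint main(void) { return 0; }\n")
  File.write(File.join(root, "used.h"), "")
  File.write(File.join(root, "orphan.h"), "")
  p unused_headers(root)
end
```

Each C file is still read only once, as in the loop above, and headers with no references need no special initialisation because the result starts from the full header list.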
To search *.c and *.h files, you could use Dir.glob
irb(main):012:0> Dir.glob("*.[ch]")
=> ["test.c", "test.h"]
To search across any subdirectory, you can pass **/*
irb(main):013:0> Dir.glob("**/*.[ch]")
=> ["src/Python-2.6.2/Demo/embed/demo.c", "src/Python-2.6.2/Demo/embed/importexc.c",
.........
Well, once you've found your .c files, you can do this to them:
1) open the file and store the text in a variable
2) use 'grep' : http://ruby-doc.org/core/classes/Enumerable.html#M003121
FileList in the Rake API is very useful for this. Just be aware of the list size growing larger than you have memory to handle. :)
http://rake.rubyforge.org/

Getting a list of folders in a directory

How do I get a list of the folders that exist in a certain directory with ruby?
Dir.entries() looks close but I don't know how to limit to folders only.
I've found this more useful and easy to use:
Dir.chdir('/destination_directory')
Dir.glob('*').select {|f| File.directory? f}
It gets all folders in the current directory, excluding . and .. (Dir.glob('*') also skips hidden folders whose names start with a dot).
To recurse into subfolders, use **/* in place of *.
The Dir.glob line can also be passed to Dir.chdir as a block, which restores the previous working directory afterwards:
Dir.chdir('/destination_directory') do
  Dir.glob('*').select { |f| File.directory? f }
end
Jordan is close, but Dir.entries doesn't return the full path that File.directory? expects. Try this:
Dir.entries('/your_dir').select {|entry| File.directory? File.join('/your_dir',entry) and !(entry =='.' || entry == '..') }
In my opinion Pathname is much better suited for filenames than plain strings.
require "pathname"
Pathname.new(directory_name).children.select { |c| c.directory? }
This gives you an array of all directories in that directory as Pathname objects.
If you want to have strings
Pathname.new(directory_name).children.select { |c| c.directory? }.collect { |p| p.to_s }
If directory_name was absolute, these strings are absolute too.
Recursively find all folders under a certain directory:
Dir.glob 'certain_directory/**/*/'
Non-recursive version:
Dir.glob 'certain_directory/*/'
Note: Dir.[] works like Dir.glob.
This one recursively returns an array of paths to your directories, subdirectories, sub-subdirectories and so on.
I used this code to eager load these files inside the config/application file.
Dir.glob("path/to/your/dir/**/*").select { |entry| File.directory? entry }
In addition, we don't need to deal with the boring . and .. anymore. The accepted answer needed to deal with them.
directory = 'Folder'
puts Dir.entries(directory).select { |file| File.directory? File.join(directory, file)}
You can use File.directory? from the FileTest module to find out if a file is a directory. Combining this with Dir.entries makes for a nice one(ish)-liner:
directory = 'some_dir'
Dir.entries(directory).select { |file| File.directory?(File.join(directory, file)) }
Edit: Updated per ScottD's correction.
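Note that Dir.entries includes . and .. in its result, and both pass the File.directory? test. On Ruby 2.5 or newer, Dir.children leaves those two out. A minimal sketch, with subdirectory_names as a made-up helper name and a temporary directory used only for the demo:

```ruby
require 'tmpdir'

# Hypothetical helper: names of the immediate subdirectories of dir.
# Dir.children omits "." and ".." but keeps hidden directories.
def subdirectory_names(dir)
  Dir.children(dir).select { |entry| File.directory?(File.join(dir, entry)) }.sort
end

Dir.mktmpdir do |root|
  Dir.mkdir(File.join(root, "alpha"))
  Dir.mkdir(File.join(root, ".hidden"))
  File.write(File.join(root, "notes.txt"), "not a directory")
  p subdirectory_names(root)
  # prints [".hidden", "alpha"]
end
```

Unlike the Dir.glob('*') variants, this keeps hidden directories, which may or may not be what you want.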
Dir.glob('/your_dir/*').select { |e| File.directory?(e) }
dir_target = "/Users/david/Movies/Camtasia 2/AzureMobileServices.cmproj/media"
Dir.glob("#{dir_target}/**/*").each do |f|
  puts f if File.directory?(f)
end
For a generic solution you probably want to use
Dir.glob(File.expand_path(path))
This will work with paths like ~/*/ (all folders within your home directory).
We can combine Borh's answer and johannes' answer to get quite an elegant solution to getting the directory names in a folder.
require 'pathname'
# use globbing to get a list of directories for a path
base_dir_path = ''
directory_paths = Dir.glob(File.join(base_dir_path, '*', ''))
# or recursive version:
directory_paths = Dir.glob(File.join(base_dir_path, '**', '*', ''))
# cast to Pathname
directories = directory_paths.collect {|path| Pathname.new(path) }
# return the basename of the directories
directory_names = directories.collect {|dir| dir.basename.to_s }
Only folders ('.' and '..' are excluded):
Dir.glob(File.join(path, "*", File::SEPARATOR))
Folders and files:
Dir.glob(File.join(path, "*"))
I think you can test each file to see if it is a directory with FileTest.directory?(file_name). See the documentation for FileTest for more info.

Resources