Too much nesting in Ruby?

Surely there must be a better way of doing this:
File.open('Data/Networks/to_process.txt', 'w') do |out|
  Dir['Data/Networks/*'].each do |f|
    if File.directory?(f)
      File.open("#{f}/list.txt").each do |line|
        out.puts File.basename(f) + "/" + line.split(" ")[0]
      end
    end
  end
end
Cheers!

You can get rid of one level of nesting by using the guard clause pattern:
File.open('Data/Networks/to_process.txt', 'w') do |out|
  Dir['Data/Networks/*'].each do |f|
    next unless File.directory?(f)
    # File.foreach reads line by line and closes list.txt when it is done
    File.foreach("#{f}/list.txt") do |line|
      out.puts File.basename(f) + "/" + line.split(" ")[0]
    end
  end
end
See Jeff Atwood's article on this approach.

IMHO there's nothing wrong with your code, but you could do the directory globbing and the check from the if in one statement, saving one level of nesting:
Dir.glob('Data/Networks/*').select { |fn| File.directory?(fn) }.each do |f|
  ...
end

Since you're looking for a particular file in each of the directories, just let Dir#[] find them for you, completely eliminating the need to check for a directory. In addition, IO#puts will accept an array, putting each element on a new line. This will get rid of another level of nesting.
File.open('Data/Networks/to_process.txt', 'w') do |out|
  # Dir[] returns an array and ignores any block passed to it, so iterate with #each
  Dir['Data/Networks/*/list.txt'].each do |file|
    dir = File.basename(File.dirname(file))
    out.puts File.readlines(file).map { |l| "#{dir}/#{l.split.first}" }
  end
end
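A self-contained sketch of the list.txt approach that you can run anywhere: it builds a throwaway copy of the Data/Networks layout in a temporary directory (the network names alpha/beta and the file contents are made-up sample data, and the two method names are hypothetical). It iterates over the array from Dir.glob with .each, sorted so the output order is predictable on every Ruby version. Note that Dir[] silently ignores a block attached directly to it, so such a block would never run.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical helper: build a sample Data/Networks tree under root.
def build_sample_tree(root)
  { 'alpha' => "a1 x\na2 y\n", 'beta' => "b1 z\n" }.each do |net, body|
    dir = File.join(root, 'Data', 'Networks', net)
    FileUtils.mkdir_p(dir)
    File.write(File.join(dir, 'list.txt'), body)
  end
end

# Hypothetical helper: write "<network>/<first field>" for every list.txt line.
def write_to_process(root)
  base = File.join(root, 'Data', 'Networks')
  File.open(File.join(base, 'to_process.txt'), 'w') do |out|
    # Only */list.txt matches, so to_process.txt itself is never picked up.
    Dir.glob(File.join(base, '*', 'list.txt')).sort.each do |file|
      dir = File.basename(File.dirname(file))
      # IO#puts writes each array element on its own line.
      out.puts File.readlines(file).map { |l| "#{dir}/#{l.split.first}" }
    end
  end
end

Dir.mktmpdir do |root|
  build_sample_tree(root)
  write_to_process(root)
  print File.read(File.join(root, 'Data', 'Networks', 'to_process.txt'))
end
```

With the sample data this prints alpha/a1, alpha/a2 and beta/b1, one per line.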

Reducing the nesting a bit by separating the input from the output:
directories = Dir['Data/Networks/*'].find_all { |f| File.directory?(f) }
output_lines = directories.flat_map do |f|
  # File.foreach without a block returns an enumerator and closes the file after reading it
  File.foreach("#{f}/list.txt").map do |line|
    File.basename(f) + "/" + line.split(" ")[0]
  end
end
File.open('Data/Networks/to_process.txt', 'w') do |out|
  out.puts output_lines # puts writes each array element on its own line
end
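Separating input from output also makes the reading half easy to test on its own. A sketch of that split, assuming hypothetical method names network_lines and write_lines and taking the base directory as a parameter instead of hard-coding Data/Networks:

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical "input" step: returns the lines to write, touches no output file.
def network_lines(base)
  dirs = Dir.glob(File.join(base, '*')).sort.select { |f| File.directory?(f) }
  dirs.flat_map do |dir|
    name = File.basename(dir)
    # File.foreach without a block returns an enumerator that closes the file when done.
    File.foreach(File.join(dir, 'list.txt')).map { |line| "#{name}/#{line.split.first}" }
  end
end

# Hypothetical "output" step: IO#puts prints one array element per line.
def write_lines(path, lines)
  File.open(path, 'w') { |out| out.puts lines }
end
```

Because network_lines only returns an array, it can be checked without opening any output file, and write_lines stays a one-liner.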

Related

Refactoring my code so that file closes automatically once loaded, how does the syntax work?

My program loads a list from a file, and I'm trying to change the method so that it closes automatically.
I've looked at the Ruby documentation, a broad Stack Overflow answer, and this guy's website, but the syntax is always different and doesn't mean much to me yet.
My original load:
def load_students(filename = "students.csv")
  if filename == nil
    filename = "students.csv"
  elsif filename == ''
    filename = "students.csv"
  end
  file = File.open(filename, "r")
  file.readlines.each do |line|
    name, cohort = line.chomp.split(",")
    add_students(name).to_s
  end
  file.close
  puts "List loaded from #{filename}."
end
My attempt to close automatically:
def load_students(filename = "students.csv")
  if filename == nil
    filename = "students.csv"
  elsif filename == ''
    filename = "students.csv"
  end
  open(filename, "r", &block)
  line.each do |line|
    name, cohort = line.chomp.split(",")
    add_students(name).to_s
  end
  puts "List loaded from #{filename}."
end
I'm looking for the same result, but without having to manually close the file.
I don't think it'll be much different, so how does the syntax work for automatically closing with blocks?
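Pass a block to File.open; the file is yielded to the block and closed automatically when the block exits:
File.open(filename, 'r') do |file|
  file.readlines.each do |line|
    name, cohort = line.chomp.split(",")
    add_students(name).to_s
  end
end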
I’d refactor the whole code:
def load_students(filename = "students.csv")
  filename = "students.csv" if filename.to_s.empty?
  File.open(filename, "r") do |file|
    file.readlines.each do |line|
      add_students(line.chomp.split(",").first)
    end
  end
  puts "List loaded from #{filename}."
end
Or, even better, as suggested by Kimmo Lehto in comments:
def load_students(filename = "students.csv")
  filename = "students.csv" if filename.to_s.empty?
  File.foreach(filename) do |line|
    add_students(line.chomp.split(",").first)
  end
  puts "List loaded from #{filename}."
end
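The block form of File.open closes the file when the block exits, and it does so even when the block raises. A small runnable check of that behaviour; load_names is a hypothetical stand-in for the question's add_students (which isn't shown) and simply collects the first CSV column, and the student data is made up:

```ruby
require 'tempfile'

# Hypothetical stand-in for load_students/add_students: collect the names.
def load_names(filename)
  names = []
  File.foreach(filename) { |line| names << line.chomp.split(',').first }
  names
end

tmp = Tempfile.new(['students', '.csv'])
tmp.write("Alice,november\nBob,december\n")
tmp.close

# The block form closes the file when the block finishes...
handle = nil
File.open(tmp.path) { |f| handle = f }
puts handle.closed? # => true

# ...and also when the block raises an exception.
begin
  File.open(tmp.path) { |f| handle = f; raise 'boom' }
rescue RuntimeError
  puts handle.closed? # => true
end

p load_names(tmp.path) # => ["Alice", "Bob"]
```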

Output lines to separate lines in Ruby

I have a file where I search for specific lines, like this:
<ClCompile Include="..\..\..\Source\fileA.c" />
<ClCompile Include="..\..\..\Tests\fileB.c" />
In my script I can find these lines and extract only the path string between the double quotes. When I find them, I save them to an array (which I use later in my code). It looks like this:
source_path_array = []
File.open(file_name) do |f|
  f.each_line { |line|
    if line =~ /<ClCompile Include="..\\/
      source_path = line.scan(/".*.c"/)
      ###### Add path to array ######
      source_path_array << source_path
    end
  }
end
So far, everything is OK. Later in my script I output the array to another file, on a line "Source Files":
f.puts "Source Files= #{source_path_array.flatten.join(" ")}"
The result then looks like this:
Source Files= "..\..\..\Source\fileA.c" "..\..\..\Tests\fileB.c"
I would like to have the output in this form:
Source Files=..\..\..\Source\fileA.c
Source Files=..\..\..\Tests\fileB.c
As you can see, I want each path on a separate line, preceded by the string "Source Files=" and without the double quotes. Any idea? Maybe my concept with the array is also not the best.
Don't use #join, then. Use #each or #map. Also, you can use #gsub to remove the quotes:
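source_path_array.flatten.each do |path|
  # /^"|"$/ strips the leading and the trailing quote
  f.puts "Source Files=#{path.gsub(/^"|"$/, '')}"
end
or
# Use braces here: a do...end block would bind to f.puts instead of map
f.puts source_path_array.flatten.map { |path|
  "Source Files=#{path.gsub(/^"|"$/, '')}"
}.join("\n")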
The second version is probably more I/O efficient.
For this to work (and as an answer to the second part of your question), the source_path_array should contain strings. Here's a way to obtain this:
regex = /<ClCompile Include="(\.\.\\[^"]+)/
File.open(file_name) do |f|
  f.each_line do |line|
    regex.match(line) do |matches|
      source_path_array << matches[1]
    end
  end
end
If you don't mind reading the entire file in memory at once, this is slightly shorter:
regex = /<ClCompile Include="(\.\.\\[^"]+)/
File.readlines(file_name).each do |line|
  regex.match(line) do |matches|
    source_path_array << matches[1]
  end
end
Finally, here's an example using Nokogiri:
require 'nokogiri'
source_path_array = File.open(file_name) do |f|
  Nokogiri::XML(f)
end.css('ClCompile[Include^=..\\]').map { |el| el['Include'] }
All of these parse out the quotes, so you can remove the #gsub from the first portion.
All together now:
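require 'nokogiri'
# Parse first: a do...end block written inside the argument to f.puts would bind to f.puts itself
doc = File.open(file_name) { |source| Nokogiri::XML(source) }
f.puts doc.css('ClCompile[Include^=..\\]').map { |el|
  "Source Files=#{el['Include']}"
}.join("\n")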
And let's not loop twice (#map then #join) when once (a single #reduce) is doable:
require 'nokogiri'
doc = File.open(file_name) { |source| Nokogiri::XML(source) }
f.puts doc.css('ClCompile[Include^=..\\]').reduce('') { |memo, el|
  memo + "Source Files=#{el['Include']}\n"
}.chomp
Thanks to @Félix Saparelli, the following worked for me:
source_path_array.flatten.each do |path|
  f.puts "Source Files=#{path.delete('"')}"
end
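The whole pipeline as one runnable sketch: a capture group extracts the path without its quotes, and passing the mapped array to puts writes one "Source Files=" line per path. StringIO objects stand in for both the project file and the output file; the two ClCompile lines are the ones from the question, and the ClInclude line is made up to show that non-matching lines are skipped.

```ruby
require 'stringio'

# Stand-in for the project file; the single-quoted heredoc keeps backslashes literal.
project = StringIO.new(<<~'XML')
  <ClCompile Include="..\..\..\Source\fileA.c" />
  <ClCompile Include="..\..\..\Tests\fileB.c" />
  <ClInclude Include="..\..\..\Source\fileA.h" />
XML

# The capture group grabs only the text between the quotes, so nothing needs stripping.
regex = /<ClCompile Include="([^"]+)"/
source_paths = project.each_line.map { |line| line[regex, 1] }.compact

# Stand-in for the output file; puts with an array writes one element per line.
out = StringIO.new
out.puts source_paths.map { |path| "Source Files=#{path}" }
print out.string
```

This prints the two lines in exactly the form the question asks for.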

How to take the result from another method

I have a directory structure with sub-directories:
../../../../../MY_PROJECT/TEST_A/cats/
../../../../../MY_PROJECT/TEST_B/dogs/
../../../../../MY_PROJECT/TEST_A/tigers/
../../../../../MY_PROJECT/TEST_A/elephants/
each of which has a file that ends with ".sln":
../../../../../MY_PROJECT/TEST_A/cats/cats.sln
../../../../../MY_PROJECT/TEST_B/dogs/dogs.sln
...
These files contain information specific to their directory. I would like to do the following:
Create a file "myfile.txt" within each sub-directory, and write some strings to them:
../../../../../MY_PROJECT/TEST_A/cats/myfile.txt
../../../../../MY_PROJECT/TEST_B/dogs/myfile.txt
../../../../../MY_PROJECT/TEST_A/tigers/myfile.txt
../../../../../MY_PROJECT/TEST_A/elephants/myfile.txt
Copy a specific string in the ".sln" files to the myfile.txt of certain directories using the following method:
def parse_sln_files
  sln_files = Dir["../../../../../MY_PROJECT/TEST_*/**/*.sln"]
  sln_files.each do |file_name|
    File.open(file_name) do |f|
      f.each_line { |line|
        if line =~ /C Source files ="..\\/ #"
          path = line.scan(/".*.c"/)
          puts path
        end
      }
    end
  end
end
I would like to do something like this:
def create_myfile
  Dir['../../../../../MY_PROJECT/TEST_*/*/'].each do |dir|
    File.new File.join(dir, 'myfile.txt'), 'w+'
    Dir['../../../../../TEST/TEST_*/*/myfile.txt'].each do |path|
      File.open(path,'w+') do |f|
        f.puts "some text...."
        f.puts "some text..."
        f.puts # here I would like to return the result of parse_sln_files
      end
    end
  end
end
Any suggestions on how to express this?
It seems like you want to read a list of C file names from a Visual C++ solution file and store them in a separate file in the same directory. You may have to merge the two loops that you have shown in your code and do something like this:
def parse_sln_and_store_source_files
  sln_files = Dir["../../../../../MY_PROJECT/TEST_*/**/*.sln"]
  sln_files.each do |file_name|
    #### Let's collect source file names in this array
    source_file_names = []
    File.open(file_name) do |f|
      f.each_line { |line|
        if line =~ /C Source files ="..\\/ #"
          path = line.scan(/".*.c"/)
          ############ Add path to array ############
          source_file_names << path
        end
      }
    end
    #### Let's create `myfile.txt` in the same dir as the .sln
    test_file = File.expand_path(File.dirname(file_name)) + "/myfile.txt"
    File.open(test_file,'w+') do |f|
      f.puts "some text...."
      f.puts "some text..."
      ##### Iterate over source file names & write to file
      source_file_names.each { |n| f.puts n }
    end
  end
end
This can be done a bit more elegantly with some more refactoring. Also note that this code is untested; hopefully you get the gist of what I am suggesting.
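The title question, taking the result from another method, mostly comes down to making the parser return its values instead of printing them with puts. A refactored sketch along those lines: the method names c_sources and write_myfiles are hypothetical, the C Source files ="..." line format is assumed from the question's regex, and a temporary directory replaces the real MY_PROJECT tree.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical parser: returns the quoted .c paths found in one .sln file.
def c_sources(sln_path)
  File.foreach(sln_path).map { |line| line[/C Source files ="([^"]+\.c)"/, 1] }.compact
end

# Writes myfile.txt next to every .sln below root, using the parser's return value.
def write_myfiles(root)
  Dir.glob(File.join(root, 'TEST_*', '**', '*.sln')).each do |sln|
    File.open(File.join(File.dirname(sln), 'myfile.txt'), 'w') do |f|
      f.puts 'some text....'
      f.puts c_sources(sln) # one path per line
    end
  end
end

Dir.mktmpdir do |root|
  dir = File.join(root, 'TEST_A', 'cats')
  FileUtils.mkdir_p(dir)
  File.write(File.join(dir, 'cats.sln'), %(C Source files ="..\\src\\cats.c"\nother line\n))
  write_myfiles(root)
  print File.read(File.join(dir, 'myfile.txt'))
end
```

Because c_sources returns an array, any other method (or a test) can reuse it without re-parsing printed output.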

Nokogiri and XPath: saving text result of scrape

I would like to save the text results of a scrape in a file. This is my current code:
require "rubygems"
require "open-uri"
require "nokogiri"
class Scrapper
  attr_accessor :html, :single
  def initialize(url)
    download = URI.open(url) # a plain open(url) no longer fetches URLs in Ruby 3.0+
    @page = Nokogiri::HTML(download)
    @html = @page.xpath('//div[@class = "quoteText" and following-sibling::div[1][@class = "quoteFooter" and .//a[@href and normalize-space() = "hard-work"]]]')
  end
  def get_quotes
    @quotes_array = @html.collect { |node| node.text.strip }
    @single = @quotes_array.each do |quote|
      quote.gsub(/\s{2,}/, " ")
    end
  end
end
I know that I can write a file like this:
File.open('text.txt', 'w') do |fo|
  fo.write(content)
end
but I don't know how to incorporate @single, which holds the results of my scrape. The ultimate goal is to insert the information into a database.
I have come across some folks using YAML, but I am finding it hard to follow the step-by-step guide.
Can anyone point me in the right direction?
Thank you.
Just use:
@single = @quotes_array.map do |quote|
  quote.squeeze(' ')
end
File.open('text.txt', 'w') do |fo|
  fo.puts @single
end
Or:
File.open('text.txt', 'w') do |fo|
  fo.puts @quotes_array.map { |q| q.squeeze(' ') }
end
and don't bother creating @single.
Or:
File.open('text.txt', 'w') do |fo|
  fo.puts @html.collect { |node| node.text.strip.squeeze(' ') }
end
and don't bother creating @single or @quotes_array.
squeeze is part of the String class. This is from the documentation:
"  now   is  the".squeeze(" ")   #=> " now is the"

Get names of all files from a folder with Ruby

I want to get all file names from a folder using Ruby.
You also have the shortcut option of
Dir["/path/to/search/*"]
and if you want to find all Ruby files in any folder or sub-folder:
Dir["/path/to/search/**/*.rb"]
Dir.entries(folder)
example:
Dir.entries(".")
Source: http://ruby-doc.org/core/classes/Dir.html#method-c-entries
The following snippet shows exactly the names of the files inside a directory, skipping subdirectories and the "." and ".." entries:
Dir.entries("your/folder").select { |f| File.file? File.join("your/folder", f) }
To get all files (strictly files only) recursively:
Dir.glob('path/**/*').select { |e| File.file? e }
Or anything that's not a directory (File.file? would reject non-regular files):
Dir.glob('path/**/*').reject { |e| File.directory? e }
Alternative Solution
Using Find.find rather than a pattern-based lookup method like Dir.glob is actually better. See this answer to "One-liner to Recursively List Directories in Ruby?".
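For reference, a minimal Find.find sketch (from the standard library's find module). It walks a tree recursively and lets you skip a whole subdirectory with Find.prune; pruning dot-directories such as .git is just an example policy, files_under is a hypothetical name, and the sample tree is made up.

```ruby
require 'find'
require 'tmpdir'
require 'fileutils'

# Hypothetical helper: regular files below root, skipping dot-directories.
def files_under(root)
  found = []
  Find.find(root) do |path|
    if File.directory?(path) && path != root && File.basename(path).start_with?('.')
      Find.prune # do not descend into e.g. .git
    elsif File.file?(path)
      found << path
    end
  end
  found.sort
end

Dir.mktmpdir do |root|
  FileUtils.mkdir_p(File.join(root, 'lib', 'deep'))
  FileUtils.mkdir_p(File.join(root, '.git'))
  File.write(File.join(root, 'lib', 'deep', 'a.rb'), '')
  File.write(File.join(root, '.git', 'HEAD'), '')
  p files_under(root).map { |f| f.delete_prefix("#{root}/") } # => ["lib/deep/a.rb"]
end
```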
This works for me:
If you don't want hidden files[1], use Dir[]:
# With a relative path, Dir[] will return relative paths
# as `[ './myfile', ... ]`
#
Dir[ './*' ].select{ |f| File.file? f }
# Want just the filename?
# as: [ 'myfile', ... ]
#
Dir[ '../*' ].select{ |f| File.file? f }.map{ |f| File.basename f }
# Turn them into absolute paths?
# [ '/path/to/myfile', ... ]
#
Dir[ '../*' ].select{ |f| File.file? f }.map{ |f| File.absolute_path f }
# With an absolute path, Dir[] will return absolute paths:
# as: [ '/home/../home/test/myfile', ... ]
#
Dir[ '/home/../home/test/*' ].select{ |f| File.file? f }
# Need the paths to be canonical?
# as: [ '/home/test/myfile', ... ]
#
Dir[ '/home/../home/test/*' ].select{ |f| File.file? f }.map{ |f| File.expand_path f }
Dir.entries, on the other hand, returns hidden files, and you don't need the wildcard asterisk (you can just pass the variable with the directory name), but it returns bare names rather than paths, so the File.xxx functions won't work on them unless you are in that directory or join the directory onto each name.
# In the current working dir:
#
Dir.entries( '.' ).select{ |f| File.file? f }
# In another directory, relative or otherwise, you need to transform the path
# so it is either absolute, or relative to the current working dir to call File.xxx functions:
#
home = "/home/test"
Dir.entries( home ).select{ |f| File.file? File.join( home, f ) }
[1] .dotfiles on Unix; I don't know about Windows
Since Ruby 2.5 you can use Dir.children. It returns the entry names as an array, excluding "." and "..".
Example:
Dir.children("testdir") #=> ["config.h", "main.rb"]
http://ruby-doc.org/core-2.5.0/Dir.html#method-c-children
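A small comparison of Dir.entries, Dir.children and Dir.each_child (the latter two were added in Ruby 2.5) on a made-up directory. Dir.children still includes subdirectory names, so the sketch filters with File.file? (after joining the directory, since the names are relative to it) when only files are wanted; file_names is a hypothetical helper name.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical helper: names of regular files directly inside dir.
def file_names(dir)
  Dir.children(dir).select { |name| File.file?(File.join(dir, name)) }.sort
end

Dir.mktmpdir do |dir|
  File.write(File.join(dir, 'main.rb'), '')
  File.write(File.join(dir, 'config.h'), '')
  FileUtils.mkdir(File.join(dir, 'sub'))

  p Dir.entries(dir).sort  # => [".", "..", "config.h", "main.rb", "sub"]
  p Dir.children(dir).sort # => ["config.h", "main.rb", "sub"]
  p file_names(dir)        # => ["config.h", "main.rb"]

  # Dir.each_child yields one name at a time instead of building an array.
  Dir.each_child(dir) { |name| puts name }
end
```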
Personally, I find this the most useful for looping over the files in a folder; the guard clause keeps it safe if subdirectories show up later:
Dir['/etc/path/*'].each do |file_name|
  next if File.directory? file_name
  # ... work with file_name here
end
This is a solution to find files in a directory:
files = Dir["/work/myfolder/**/*.txt"]
files.each do |file_name|
  if !File.directory? file_name
    puts file_name
    File.open(file_name) do |file|
      file.each_line do |line|
        if line =~ /banco1/
          puts "Found: #{line}"
        end
      end
    end
  end
end
This returns only the entry names with their extensions (without the directory path); note that subdirectory names are included too:
Dir.children("/path/to/search/")
#=> ["file_1.rb", "file_2.html", "file_3.js"]
When getting all the file names in a directory, this snippet rejects both directories (including "." and "..") and hidden files, whose names start with a ".":
files = Dir.entries("your/folder").reject { |f| File.directory?(File.join("your/folder", f)) || f.start_with?('.') }
This is what works for me:
Dir.entries(dir).select { |f| File.file?(File.join(dir, f)) }
Dir.entries returns an array of strings. Then, we have to provide a full path of the file to File.file?, unless dir is equal to our current working directory. That's why this File.join().
Dir.new('/home/user/foldername').each { |file| puts file }
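You may also want to use Rake::FileList (provided the rake gem is available):
require 'rake'
# The block given to FileList.new receives the list itself, so iterate with #each
Rake::FileList.new('lib/*').each do |file|
  p file
end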
According to the API:
FileLists are lazy. When given a list of glob patterns for possible
files to be included in the file list, instead of searching the file
structures to find the files, a FileList holds the pattern for latter
use.
https://docs.ruby-lang.org/en/2.1.0/Rake/FileList.html
One simple way could be:
dir = './' # desired directory
files = Dir.glob(File.join(dir, '**', '*')).select{|file| File.file?(file)}
files.each do |f|
puts f
end
def get_path_content(dir)
  queue = Queue.new
  result = []
  queue << dir
  until queue.empty?
    current = queue.pop
    Dir.entries(current).each { |file|
      full_name = File.join(current, file)
      if not (File.directory? full_name)
        result << full_name
      elsif file != '.' and file != '..'
        queue << full_name
      end
    }
  end
  result
end
This returns the relative paths of all files in the directory and all of its subdirectories.
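If you want to get an array of file names including symlinks, use the following (each name is joined to the directory because Dir#entries returns bare names; "." and ".." are directories, so they are rejected as well):
dir = '/path/to/dir'
Dir.new(dir).entries.reject { |f| File.directory?(File.join(dir, f)) }
or even
Dir.new(dir).reject { |f| File.directory?(File.join(dir, f)) }
and if you want to leave out symlinks, check for them explicitly, because File.file? follows symlinks and would keep links to regular files:
Dir.new(dir).select { |f| path = File.join(dir, f); File.file?(path) && !File.symlink?(path) }
As shown in other answers, use Dir.glob('/path/to/dir/**/*') instead of Dir.new('/path/to/dir') if you want to get all the files recursively (Dir.glob returns full paths, so no File.join is needed there).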
In addition to the suggestions in this thread, I wanted to mention that if you need to return dot files as well (.gitignore, etc.), with Dir.glob you need to pass a flag, like so:
Dir.glob("/path/to/dir/*", File::FNM_DOTMATCH)
By default, Dir.entries includes dot files, as well as the current and parent directories ("." and "..").
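A quick check of what File::FNM_DOTMATCH changes, using a throwaway directory with made-up .gitignore and README.md files. Depending on the Ruby version, '*' with this flag can also match "." (and, in older versions, ".."), so the hypothetical listing helper drops those two entries before comparing.

```ruby
require 'tmpdir'

# Hypothetical helper: names matching '*' in dir, without "." and "..".
def listing(dir, flags = 0)
  (Dir.glob('*', flags, base: dir) - ['.', '..']).sort
end

Dir.mktmpdir do |dir|
  File.write(File.join(dir, '.gitignore'), '')
  File.write(File.join(dir, 'README.md'), '')

  p listing(dir)                     # => ["README.md"]
  p listing(dir, File::FNM_DOTMATCH) # => [".gitignore", "README.md"]
end
```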
For anyone interested: I was curious how the answers here compare to each other in execution time, so here are the results against a deeply nested hierarchy. The first three results are non-recursive:
user system total real
Dir[*]: (34900 files stepped over 100 iterations)
0.110729 0.139060 0.249789 ( 0.249961)
Dir.glob(*): (34900 files stepped over 100 iterations)
0.112104 0.142498 0.254602 ( 0.254902)
Dir.entries(): (35600 files stepped over 100 iterations)
0.142441 0.149306 0.291747 ( 0.291998)
Dir[**/*]: (2211600 files stepped over 100 iterations)
9.399860 15.802976 25.202836 ( 25.250166)
Dir.glob(**/*): (2211600 files stepped over 100 iterations)
9.335318 15.657782 24.993100 ( 25.006243)
Dir.entries() recursive walk: (2705500 files stepped over 100 iterations)
14.653018 18.602017 33.255035 ( 33.268056)
Dir.glob(**/*, File::FNM_DOTMATCH): (2705500 files stepped over 100 iterations)
12.178823 19.577409 31.756232 ( 31.767093)
These were generated with the following benchmarking script:
require 'benchmark'
base_dir = "/path/to/dir/"
n = 100
Benchmark.bm do |x|
  x.report("Dir[*]:") do
    i = 0
    n.times do
      i = i + Dir["#{base_dir}*"].select {|f| !File.directory? f}.length
    end
    puts " (#{i} files stepped over #{n} iterations)"
  end
  x.report("Dir.glob(*):") do
    i = 0
    n.times do
      i = i + Dir.glob("#{base_dir}/*").select {|f| !File.directory? f}.length
    end
    puts " (#{i} files stepped over #{n} iterations)"
  end
  x.report("Dir.entries():") do
    i = 0
    n.times do
      i = i + Dir.entries(base_dir).select {|f| !File.directory? File.join(base_dir, f)}.length
    end
    puts " (#{i} files stepped over #{n} iterations)"
  end
  x.report("Dir[**/*]:") do
    i = 0
    n.times do
      i = i + Dir["#{base_dir}**/*"].select {|f| !File.directory? f}.length
    end
    puts " (#{i} files stepped over #{n} iterations)"
  end
  x.report("Dir.glob(**/*):") do
    i = 0
    n.times do
      i = i + Dir.glob("#{base_dir}**/*").select {|f| !File.directory? f}.length
    end
    puts " (#{i} files stepped over #{n} iterations)"
  end
  x.report("Dir.entries() recursive walk:") do
    i = 0
    n.times do
      def walk_dir(dir, result)
        Dir.entries(dir).each do |file|
          next if file == ".." || file == "."
          path = File.join(dir, file)
          if Dir.exist?(path)
            walk_dir(path, result)
          else
            result << file
          end
        end
      end
      result = Array.new
      walk_dir(base_dir, result)
      i = i + result.length
    end
    puts " (#{i} files stepped over #{n} iterations)"
  end
  x.report("Dir.glob(**/*, File::FNM_DOTMATCH):") do
    i = 0
    n.times do
      i = i + Dir.glob("#{base_dir}**/*", File::FNM_DOTMATCH).select {|f| !File.directory? f}.length
    end
    puts " (#{i} files stepped over #{n} iterations)"
  end
end
The differences in file counts are due to Dir.entries including hidden files by default. Dir.entries ended up taking a bit longer in this case because it needs to rebuild the absolute path of each file to determine whether it is a directory, but even without that it was consistently slower than the other options in the recursive case. This was all using Ruby 2.5.1 on OS X.
When loading all the names in the current working directory you can use
Dir.glob("*")
This returns all non-hidden entries (files and directories) in the directory the application is running from (note: for Rails this is the top-level directory of the application).
You can do additional matching and recursive searching found here https://ruby-doc.org/core-2.7.1/Dir.html#method-c-glob
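A sketch of that relative-path behaviour in a throwaway directory (the file names are made up): Dir.glob('*') is resolved against the current working directory and returns directories as well as files, but no dotfiles. Dir.chdir with a block is used only to keep the example self-contained; it changes the working directory for the whole process while the block runs.

```ruby
require 'tmpdir'
require 'fileutils'

Dir.mktmpdir do |dir|
  Dir.chdir(dir) do
    File.write('app.rb', '')
    File.write('.env', '')
    FileUtils.mkdir_p('config')
    File.write('config/boot.rb', '')

    p Dir.glob('*').sort                         # => ["app.rb", "config"]
    p Dir.glob('*').select { |f| File.file?(f) } # => ["app.rb"]
    p Dir.glob('**/*.rb').sort                   # => ["app.rb", "config/boot.rb"]
  end
end
```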
If you create a directory with a space in its name:
mkdir "a b"
touch "a b/c"
You don't need to escape the space in the pattern; spaces have no special meaning in glob patterns:
p Dir["a b/*"] # => ["a b/c"]
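A self-contained version of that check, run inside a temporary directory instead of the current one (the "a b" and "c" names come from the answer). The base: keyword, available since Ruby 2.5, makes Dir.glob return paths relative to the given directory.

```ruby
require 'tmpdir'
require 'fileutils'

Dir.mktmpdir do |root|
  dir = File.join(root, 'a b')
  FileUtils.mkdir(dir)
  FileUtils.touch(File.join(dir, 'c'))

  # The space needs no quoting or backslash-escaping in the pattern.
  p Dir.glob('a b/*', base: root)                         # => ["a b/c"]
  p Dir[File.join(dir, '*')].map { |f| File.basename(f) } # => ["c"]
end
```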
