Refactoring my code so that file closes automatically once loaded, how does the syntax work? - ruby

My program loads a list from a file, and I'm trying to change the method so that it closes automatically.
I've looked at the Ruby documentation, the broad stackoverflow answer, and this guy's website, but the syntax is always different and doesn't mean much to me yet.
My original load:
def load_students(filename = "students.csv")
if filename == nil
filename = "students.csv"
elsif filename == ''
filename = "students.csv"
end
file = File.open(filename, "r")
file.readlines.each do |line|
name, cohort = line.chomp.split(",")
add_students(name).to_s
end
file.close
puts "List loaded from #{filename}."
end
My attempt to close automatically:
def load_students(filename = "students.csv")
if filename == nil
filename = "students.csv"
elsif filename == ''
filename = "students.csv"
end
open(filename, "r", &block)
line.each do |line|
name, cohort = line.chomp.split(",")
add_students(name).to_s
end
puts "List loaded from #{filename}."
end
I'm looking for the same result, but without having to manually close the file.
I don't think it'll be much different, so how does the syntax work for automatically closing with blocks?

File.open(filename, 'r') do |file|
file.readlines.each do |line|
name, cohort = line.chomp.split(",")
add_students(name).to_s
end
end
I’d refactor the whole code:
def load_students(filename = "students.csv")
filename = "students.csv" if filename.to_s.empty?
File.open(filename, "r") do |file|
file.readlines.each do |line|
add_students(line.chomp.split(",").first)
end
end
puts "List loaded from #{filename}."
end
Or, even better, as suggested by Kimmo Lehto in comments:
def load_students(filename = "students.csv")
filename = "students.csv" if filename.to_s.empty?
File.foreach(filename) do |line|
add_students(line.chomp.split(",").first)
end
puts "List loaded from #{filename}."
end

Related

How to remove redundant file open operation in ruby

I made a ruby program to copy content of one CSV file to a new CSV file.
This is my code -
require 'csv'
class CopyFile
def self.create_duplicate_file(file_name)
CSV.open(file_name, "wb") do |output_row|
output_row << CSV.open('input.csv', 'r') { |csv| csv.first }
CSV.foreach('input.csv', headers: true) do |row|
output_row << row
end
end
end
end
puts "Insert duplicate file name"
file_name = gets.chomp
file_name = file_name+".csv"
CopyFile.create_duplicate_file(file_name)
puts "\nDuplicate File Created."
I am opening the input.csv file twice, one to copy headers and then to copy content.
I want to optimise my code. So is there a way to optimise it further?
Just use the cp method:
FileUtils.cp(src, destination, options), no need to reinvent the wheel, like this:
class CopyFile
def self.create_duplicate_file(file_name)
FileUtils.cp('input.csv',file_name)
end
end
or better yet:
file_name = gets.chomp
file_name = file_name+".csv"
FileUtils.cp('input.csv', file_name)

how do I save the parsed data to a file

I wounder how I can save the parsed data to a txt file. My script is only saving the last parsed. Do i need to add .each do ? kind of lost right now
here is my code and if maybe somebody could explain to me how save the parsed info on a new line
here is the code
require 'rubygems'
require 'nokogiri'
require 'open-uri'
url = "http://www.clearsearch.se/foretag/-/q_advokat/1/"
doc = Nokogiri::HTML(open(url))
doc.css(".gray-border-bottom").each do |item|
title = item.css(".medium").text.strip
phone = item.css(".grayborderwrapper > .bold").text.strip
adress = item.css(".grayborder span").text.strip
www = item.css(".click2www").map { |link| link['href'] }
puts "#{title} ; \n"
puts "#{phone} ; \n"
puts "#{adress} ; \n"
puts "#{www} ; \n\n\n"
puts "Writing"
company = "#{title}; #{phone}; #{adress}; #{www} \n\n"
puts "saving"
file = File.open("exporterad.txt", "w")
file.write(company)
file.close
puts "done"
end
puts "done"
Calling File.open inside your loop truncates the file to zero length with each invocation. Instead, open the file outside your loop (using the block form):
File.open("exporterad.txt", "w") do |file|
doc.css(".gray-border-bottom").each do |item|
# ...
file.write(company)
# ...
end
end # <- file is closed automatically at the end of the block

Nokogiri and XPath: saving text result of scrape

I would like to save the text results of a scrape in a file. This is my current code:
require "rubygems"
require "open-uri"
require "nokogiri"
class Scrapper
attr_accessor :html, :single
def initialize(url)
download = open(url)
#page = Nokogiri::HTML(download)
#html = #page.xpath('//div[#class = "quoteText"andfollowing-sibling::div[1][#class = "quoteFooter" and .//a[#href and normalize-space() = "hard-work"]]]')
end
def get_quotes
#quotes_array = #html.collect {|node| node.text.strip}
#single = #quotes_array.each do |quote|
quote.gsub(/\s{2,}/, " ")
end
end
end
I know that I can write a file like this:
File.open('text.txt', 'w') do |fo|
fo.write(content)
but I don't know how to incorporate #single which holds the results of my scrape. Ultimate goal is to insert the information into a database.
I have come across some folks using Yaml but I am finding it hard to follow the step to step guide.
Can anyone point me in the right direction?
Thank you.
Just use:
#single = #quotes_array.map do |quote|
quote.squeeze(' ')
end
File.open('text.txt', 'w') do |fo|
fo.puts #single
end
Or:
File.open('text.txt', 'w') do |fo|
fo.puts #quotes_array.map{ |q| q.squeeze(' ') }
end
and don't bother creating #single.
Or:
File.open('text.txt', 'w') do |fo|
fo.puts #html.collect { |node| node.text.strip.squeeze(' ') }
end
and don't bother creating #single or #quotes_array.
squeeze is part of the String class. This is from the documentation:
" now is the".squeeze(" ") #=> " now is the"

Calling multiple methods on a CSV object

I have constructed an Event Manager class that performs parsing actions on a CSV file, and produces html letters using erb. It is part of a jumpstart labs tutorial
The program works fine, but I am unable to call multiple methods on an object without the earlier methods interfering with the later methods. As a result, I have opted to create multiple objects to call instance methods on, which seems like a clunky inelegant solution. Is there a better way to do this, where I can create a single new object and call methods on it?
Like so:
eventmg = EventManager.new("event_attendees.csv")
eventmg.print_valid_phone_numbers
eventmg_2 = EventManager.new("event_attendees.csv")
eventmg_2.print_zipcodes
eventmg_3 = EventManager.new("event_attendees.csv")
eventmg_3.time_targeter
eventmg_4 = EventManager.new("event_attendees.csv")
eventmg_4.day_of_week
eventmg_5 = EventManager.new("event_attendees.csv")
eventmg_5.create_thank_you_letters
The complete code is as follows
require 'csv'
require 'sunlight/congress'
require 'erb'
class EventManager
INVALID_PHONE_NUMBER = "0000000000"
Sunlight::Congress.api_key = "e179a6973728c4dd3fb1204283aaccb5"
def initialize(file_name, list_selections = [])
puts "EventManager Initialized."
#file = CSV.open(file_name, {:headers => true,
:header_converters => :symbol} )
#list_selections = list_selections
end
def clean_zipcode(zipcode)
zipcode.to_s.rjust(5,"0")[0..4]
end
def print_zipcodes
puts "Valid Participant Zipcodes"
#file.each do |line|
zipcode = clean_zipcode(line[:zipcode])
puts zipcode
end
end
def clean_phone(phone_number)
converted = phone_number.scan(/\d/).join('').split('')
if converted.count == 10
phone_number
elsif phone_number.to_s.length < 10
INVALID_PHONE_NUMBER
elsif phone_number.to_s.length == 11 && converted[0] == 1
phone_number.shift
phone_number.join('')
elsif phone_number.to_s.length == 11 && converted[0] != 1
INVALID_PHONE_NUMBER
else
phone_number.to_s.length > 11
INVALID_PHONE_NUMBER
end
end
def print_valid_phone_numbers
puts "Valid Participant Phone Numbers"
#file.each do |line|
clean_number = clean_phone(line[:homephone])
puts clean_number
end
end
def time_targeter
busy_times = Array.new(24) {0}
#file.each do |line|
registration = line[:regdate]
prepped_time = DateTime.strptime(registration, "%m/%d/%Y %H:%M")
prepped_time = prepped_time.hour.to_i
# inserts filtered hour into the array 'list_selections'
#list_selections << prepped_time
end
# tallies number of registrations for each hour
i = 0
while i < #list_selections.count
busy_times[#list_selections[i]] += 1
i+=1
end
# delivers a result showing the hour and the number of registrations
puts "Number of Registered Participants by Hour:"
busy_times.each_with_index {|counter, hours| puts "#{hours}\t#{counter}"}
end
def day_of_week
busy_day = Array.new(7) {0}
d_of_w = ["Monday:", "Tuesday:", "Wednesday:", "Thursday:", "Friday:", "Saturday:", "Sunday:"]
#file.each do |line|
registration = line[:regdate]
# you have to reformat date because of parser format
prepped_date = Date.strptime(registration, "%m/%d/%y")
prepped_date = prepped_date.wday
# adds filtered day of week into array 'list selections'
#list_selections << prepped_date
end
i = 0
while i < #list_selections.count
# i is minus one since days of week begin at '1' and arrays begin at '0'
busy_day[#list_selections[i-1]] += 1
i+=1
end
#busy_day.each_with_index {|counter, day| puts "#{day}\t#{counter}"}
prepared = d_of_w.zip(busy_day)
puts "Number of Registered Participants by Day of Week"
prepared.each{|date| puts date.join(" ")}
end
def legislators_by_zipcode(zipcode)
Sunlight::Congress::Legislator.by_zipcode(zipcode)
end
def save_thank_you_letters(id,form_letter)
Dir.mkdir("output") unless Dir.exists?("output")
filename = "output/thanks_#{id}.html"
File.open(filename,'w') do |file|
file.puts form_letter
end
end
def create_thank_you_letters
puts "Thank You Letters Available in Output Folder"
template_letter = File.read "form_letter.erb"
erb_template = ERB.new template_letter
#file.each do |line|
id = line[0]
name = line[:first_name]
zipcode = clean_zipcode(line[:zipcode])
legislators = legislators_by_zipcode(zipcode)
form_letter = erb_template.result(binding)
save_thank_you_letters(id,form_letter)
end
end
end
The reason you're experiencing this problem is because when you apply each to the result of CSV.open you're moving the file pointer each time. When you get to the end of the file with one of your methods, there is nothing for anyone else to read.
An alternative is to read the contents of the file into an instance variable at initialization with readlines. You'll get an array of arrays which you can operate on with each just as easily.
"Is there a better way to do this, where I can create a single new object and call methods on it?"
Probably. If your methods are interfering with one another, it means you're changing state within the manager, instead of working on local variables.
Sometimes, it's the right thing to do (e.g. Array#<<); sometimes not (e.g. Fixnum#+)... Seeing your method names, it probably isn't.
Nail the offenders down and adjust the code accordingly. (I only scanned your code, but those Array#<< calls on an instance variable, in particular, look fishy.)

Too much nesting in ruby?

Surely there must be a better way of doing this:
File.open('Data/Networks/to_process.txt', 'w') do |out|
Dir['Data/Networks/*'].each do |f|
if File.directory?(f)
File.open("#{f}/list.txt").each do |line|
out.puts File.basename(f) + "/" + line.split(" ")[0]
end
end
end
end
Cheers!
You can rid of 1 level of nesting by utilizing Guard Clause pattern:
File.open('Data/Networks/to_process.txt', 'w') do |out|
Dir['Data/Networks/*'].each do |f|
next unless File.directory?(f)
File.open("#{f}/list.txt").each do |line|
out.puts File.basename(f) + "/" + line.split(" ")[0]
end
end
end
See Jeff Atwood's article on this approach.
IMHO there's nothing wrong with your code, but you could do the directory globbing and the check from the if in one statement, saving one level of nesting:
Dir.glob('Data/Networks/*').select { |fn| File.directory?(fn) }.each do |f|
...
end
Since you're looking for a particular file in each of the directories, just let Dir#[] find them for you, completely eliminating the need to check for a directory. In addition, IO#puts will accept an array, putting each element on a new line. This will get rid of another level of nesting.
File.open('Data/Networks/to_process.txt', 'w') do |out|
Dir['Data/Networks/*/list.txt'] do |file|
dir = File.basename(File.dirname(file))
out.puts File.readlines(file).map { |l| "#{dir}/#{l.split.first}" }
end
end
Reducing the nesting a bit by separating the input from the output:
directories = Dir['Data/Networks/*'].find_all{|f| File.directory?(f)}
output_lines = directories.flat_map do |f|
output_lines_for_directory = File.open("#{f}/list.txt").map do |line|
File.basename(f) + "/" + line.split(" ")[0]
end
end
File.open('Data/Networks/to_process.txt', 'w') do |out|
out.puts output_lines.join("\n")
end

Resources