Ruby CSV enumeration confusion

How come this does not work? The CSV file is there and has values, and I have require "csv" and require "time" at the top, so that part is fine. The problem seems to be that csv.each doesn't actually do anything.
It prints:
=> [] is the most common registration hour
=> [] is the most common registration day (Sunday being 0, Mon => 1 ... Sat => 6)
If there is any more info I can provide, please let me know.
@x = CSV.open \
  'event_attendees.csv', headers: true, header_converters: :symbol
def time_target
  y = []
  @x.each do |line|
    if line[:regdate].to_s.length > 0
      y << DateTime.strptime(line[:regdate], "%m/%d/%y %H:%M").hour
      y = y.sort_by {|i| grep(i).length }.last
    end
  end
  puts "#{y} is the most common registration hour"
  y = []
  @x.each do |line|
    if line[:regdate].to_s.length > 0
      y << DateTime.strptime(line[:regdate], "%m/%d/%y %H:%M").wday
      y = y.sort_by {|i| grep(i).length }.last
    end
  end
  puts "#{y} is the most common registration day \
(Sunday being 0, Mon => 1 ... Sat => 6)"
end
Changing all the y's to @y's has not fixed it.
Here is a sample from the CSV I'm using:
,RegDate,first_Name,last_Name,Email_Address,HomePhone,Street,City,State,Zipcode
1,11/12/08 10:47,Allison,Nguyen,arannon@jumpstartlab.com,6154385000,3155 19th St NW,Washington,DC,20010
2,11/12/08 13:23,SArah,Hankins,pinalevitsky@jumpstartlab.com,414-520-5000,2022 15th Street NW,Washington,DC,20009
3,11/12/08 13:30,Sarah,Xx,lqrm4462@jumpstartlab.com,(941)979-2000,4175 3rd Street North,Saint Petersburg,FL,33703

Try this to load your data:
def database_load(arg='event_attendees.csv')
  # Read every row once into @people; unlike the CSV object itself,
  # the resulting Array can be iterated as many times as needed.
  @contents = CSV.open(arg, headers: true, header_converters: :symbol)
  @people = []
  @contents.each do |row|
    person = {}
    person["id"] = row[0]
    person["regdate"] = row[:regdate]
    person["first_name"] = row[:first_name].downcase.capitalize
    person["last_name"] = row[:last_name].downcase.capitalize
    person["email_address"] = row[:email_address]
    # PhoneNumber, City and Zipcode are project-specific cleanup classes,
    # not part of the csv library
    person["homephone"] = PhoneNumber.new(row[:homephone].to_s)
    person["street"] = row[:street]
    person["city"] = City.new(row[:city]).clean
    person["state"] = row[:state]
    person["zipcode"] = Zipcode.new(row[:zipcode]).clean
    @people << person
  end
  puts "Loaded #{@people.count} Records from file: '#{arg}'..."
end
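Two details in the question's code are worth noting. A CSV object returned by CSV.open reads from the file like any IO, so once one pass has reached the end of the file, further each calls yield nothing; that fits both results being [] (the loop bodies evidently never ran, because grep(i) with no receiver would normally raise NoMethodError). Also, y = y.sort_by { ... }.last replaces the array with a single value on every iteration. Below is a minimal sketch, assuming the rows are read once into memory (as the answer above does) and passed in; most_common is a hypothetical helper name, and ties are broken arbitrarily.

```ruby
require "csv"
require "date"

# Return the value that occurs most often in `values` (nil when empty).
def most_common(values)
  return nil if values.empty?

  values.group_by(&:itself).max_by { |_value, group| group.size }.first
end

# `rows` is any collection of CSV rows that can be enumerated more than
# once, e.g. the CSV::Table returned by CSV.read or CSV.parse with headers.
def time_target(rows)
  times = rows.map { |row| row[:regdate].to_s }
              .reject(&:empty?)
              .map { |s| DateTime.strptime(s, "%m/%d/%y %H:%M") }

  hour = most_common(times.map(&:hour))
  wday = most_common(times.map(&:wday))
  puts "#{hour} is the most common registration hour"
  puts "#{wday} is the most common registration day (Sunday being 0, Mon => 1 ... Sat => 6)"
  [hour, wday]
end
```

With the file from the question this could be called as time_target(CSV.read('event_attendees.csv', headers: true, header_converters: :symbol)). CSV.read returns a CSV::Table, which can be enumerated as often as needed.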

Ruby - reading CSV from STDIN

I'm trying to read a .csv file and create an object from the attributes of every row.
My code works fine:
def self.load_csv
  puts "Name of a file?"
  filename = STDIN.gets.chomp
  rows = []
  text = File.read(filename).gsub(/\\"/,'""')
  CSV.parse(text, headers: true, header_converters: :symbol) do |row|
    row = row.to_h
    row = row.each_with_object({}){|(k,v), h| h[k.to_sym] = v}
    rows << row
  end
  rows.map do |row|
    Call.new(row)
  end
end
end # closes the enclosing class (not shown)
Now I want to take the filename from STDIN instead, so I simply changed it to:
def self.load_csv(filename)
  rows = []
  text = File.read(filename).gsub(/\\"/,'""')
  CSV.parse(text, headers: true, header_converters: :symbol) do |row|
    row = row.to_h
    row = row.each_with_object({}){|(k,v), h| h[k.to_sym] = v}
    rows << row
  end
  rows.map do |row|
    Call.new(row)
  end
end
end
When I run ruby program.rb filename.csv I get the error "no implicit conversion of String into IO", and after removing the line with File.read it does nothing, like an infinite loop maybe? Of course I invoke certain methods with the STDIN argument in different parts of the code. I have used similar code to read from STDIN successfully in the past; what am I doing wrong this time?
This code is working:
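require 'csv'
# Minimal stand-in for the asker's Call class
class Call
  def initialize(args)
  end
end
def load_csv(filename)
  rows = []
  # Turn backslash-escaped quotes (\") into CSV-style doubled quotes ("")
  text = File.read(filename).gsub(/\\"/,'""')
  CSV.parse(text, headers: true, header_converters: :symbol) do |row|
    row = row.to_h
    row = row.each_with_object({}){ |(k,v), h| h[k.to_sym] = v }
    rows << row
  end
  rows.map { |row| Call.new(row) }
end
# The filename comes from the first command-line argument:
#   ruby program.rb filename.csv
filename = ARGV[0]
load_csv(filename)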
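One detail that matters when mixing command-line arguments with keyboard input: bare Kernel#gets reads from ARGF, so when ARGV holds filename.csv it reads lines from that file rather than from the keyboard, whereas STDIN.gets always reads standard input. A minimal sketch, assuming you want to accept the filename either as the first command-line argument or, failing that, from a prompt; resolve_filename is a hypothetical helper name, not part of the code above.

```ruby
# Hypothetical helper: use the first command-line argument if there is one,
# otherwise prompt for a filename on `input` (standard input by default).
# Calling input.gets explicitly, rather than bare Kernel#gets, never falls
# back to ARGF, which would treat the entries of ARGV as input files.
def resolve_filename(argv = ARGV, input = $stdin, output = $stdout)
  return argv.first unless argv.empty?

  output.print "Name of a file? "
  input.gets.to_s.chomp
end
```

With this, load_csv(resolve_filename) works both for ruby program.rb filename.csv and for a plain ruby program.rb followed by typing the name.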

Ruby how to merge two CSV files with slightly different headers

I have two CSV files with some common headers and others that only appear in one or in the other, for example:
# csv_1.csv
H1,H2,H3
V11,V22,V33
V14,V25,V35
# csv_2.csv
H1,H4
V1a,V4b
V1c,V4d
I would like to merge both and obtain a new CSV file that combines all the information from the previous CSV files, injecting new columns when needed and filling the new cells with null values.
Result example:
H1,H2,H3,H4
V11,V22,V33,
V14,V25,V35,
V1a,,,V4b
V1c,,,V4d
Challenge accepted :)
#!/usr/bin/env ruby
require "csv"
module MergeCsv
  class << self
    def run(csv_paths)
      csv_files = csv_paths.map { |p| CSV.read(p, headers: true) }
      merge(csv_files)
    end
    private
    def merge(csv_files)
      headers = csv_files.flat_map(&:headers).uniq.sort
      hash_array = csv_files.flat_map(&method(:csv_to_hash_array))
      CSV.generate do |merged_csv|
        merged_csv << headers
        hash_array.each do |row|
          merged_csv << row.values_at(*headers)
        end
      end
    end
    # Probably not the most performant way, but easy
    def csv_to_hash_array(csv)
      csv.to_a[1..-1].map { |row| csv.headers.zip(row).to_h }
    end
  end
end
if(ARGV.length == 0)
  puts "Use: ruby merge_csv.rb <file_path_csv_1> <file_path_csv_2>"
  exit 1
end
puts MergeCsv.run(ARGV)
I have an answer; I just wanted to help people who are looking for the same solution.
require "csv"
module MergeCsv
  def self.run(csv_1_path, csv_2_path)
    merge(File.read(csv_1_path), File.read(csv_2_path))
  end
  def self.merge(csv_1, csv_2)
    csv_1_table = CSV.parse(csv_1, :headers => true)
    csv_2_table = CSV.parse(csv_2, :headers => true)
    return csv_2_table.to_csv if csv_1_table.headers.empty?
    return csv_1_table.to_csv if csv_2_table.headers.empty?
    headers_in_1_not_in_2 = csv_1_table.headers - csv_2_table.headers
    headers_in_1_not_in_2.each do |header_in_1_not_in_2|
      csv_2_table[header_in_1_not_in_2] = nil
    end
    headers_in_2_not_in_1 = csv_2_table.headers - csv_1_table.headers
    headers_in_2_not_in_1.each do |header_in_2_not_in_1|
      csv_1_table[header_in_2_not_in_1] = nil
    end
    csv_2_table.each do |csv_2_row|
      csv_1_table << csv_1_table.headers.map { |csv_1_header| csv_2_row[csv_1_header] }
    end
    csv_1_table.to_csv
  end
end
if(ARGV.length != 2)
  puts "Use: ruby merge_csv.rb <file_path_csv_1> <file_path_csv_2>"
  exit 1
end
puts MergeCsv.run(ARGV[0], ARGV[1])
And execute it from the console this way:
$ ruby merge_csv.rb csv_1.csv csv_2.csv
Any other, maybe cleaner, solution is welcome.
Simplified version of the first answer:
How to use it:
listPart_A = CSV.read(csv_path_A, headers:true)
listPart_B = CSV.read(csv_path_B, headers:true)
listPart_C = CSV.read(csv_path_C, headers:true)
list = merge(listPart_A,listPart_B,listPart_C)
Function:
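require "csv"
def merge(*csvs)
  headers = csvs.flat_map(&:headers).compact.uniq.sort
  hash_array = csvs.flat_map(&method(:csv_to_hash_array))
  # Write the union of all headers, then each row's values in that order
  # (columns a row lacks come out as empty cells)
  CSV.generate do |merged_csv|
    merged_csv << headers
    hash_array.each { |row| merged_csv << row.values_at(*headers) }
  end
end
def csv_to_hash_array(csv)
  csv.to_a[1..-1].map do |row|
    Hash[csv.headers.zip(row)]
  end
end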
I had to do something very similar: merge n CSV files that may share some of their columns but not others.
If you want to keep the structure and do it easily, I think the best way is to convert to a hash and then convert back to a CSV file.
My solution:
#!/usr/bin/env ruby
require "csv"
def join_multiple_csv(csv_path_array)
  return nil if csv_path_array.nil? || csv_path_array.empty?
  f = CSV.parse(File.read(csv_path_array[0]), :headers => true)
  f_h = {}
  f.headers.each {|header| f_h[header] = f[header]}
  n_rows = f.size
  csv_path_array.drop(1).each do |csv_file|
    curr_csv = CSV.parse(File.read(csv_file), :headers => true)
    new_headers = curr_csv.headers - f_h.keys
    exist_headers = curr_csv.headers - new_headers
    # Pad the columns this file lacks so that later files stay aligned
    (f_h.keys - curr_csv.headers).each { |missing_header|
      f_h[missing_header] = f_h[missing_header] + Array.new(curr_csv.size)
    }
    new_headers.each { |new_header|
      f_h[new_header] = Array.new(n_rows) + curr_csv[new_header]
    }
    exist_headers.each {|exist_header|
      f_h[exist_header] = f_h[exist_header] + curr_csv[exist_header]
    }
    n_rows = n_rows + curr_csv.size
  end
  csv_string = CSV.generate do |csv|
    csv << f_h.keys
    (0..n_rows-1).each do |i|
      row = []
      f_h.each_key do |header|
        row << f_h[header][i]
      end
      csv << row
    end
  end
  return csv_string
end
if(ARGV.length < 2)
  puts "Use: ruby merge_csv.rb <file_path_csv_1> <file_path_csv_2> .. <file_path_csv_n>"
  exit 1
end
csv_str = join_multiple_csv(ARGV)
# File.write opens, writes and closes the file
File.write("results.csv", csv_str)
puts "CSV merge is done"

Ruby Array Variable Reference Lost During Loop

I am writing a parsing routine in Ruby 2.1 for a spreadsheet. The code works properly through the first array of pricing data. Unfortunately, on the fifth loop through datatable, while processing the second set of pricing data, the variable termtable is not set, even though @tmptermtables is modified by the shift method in this statement (line 72 in my file, marked "# Here is where it breaks" below): termtable = @tmptermtables.shift if termtable.empty? This is possibly a scope problem, and I am hoping someone can explain to me why the reference is lost.
Below is a copy of the code. Thank you in advance for lending me your brain.
require 'date'
def sp_parser()
  begin
    pricing_date = Date.new(2014,2,7)
    tz = DateTime.parse(Time.now.to_s).strftime('%z')
    expires = DateTime.new(pricing_date.year,pricing_date.mon,pricing_date.mday,17,00,00,tz)
    datatable = Array.new
    datatable << ["Zone","Business - Low Load Factor",nil,nil,nil,"Business - Medium Load Factor",nil,nil,nil,"Business - High Load Factor",nil,nil,nil]
    datatable << [nil,6,9,12,15,6,9,12,15,6,9,12,15]
    datatable << [nil,"Daily Pricing",nil,nil,nil,"Daily Pricing",nil,nil,nil,"Daily Pricing",nil,nil,nil]
    datatable << ["COAST",6.41,6.55,6.19,6.01,6.07,6.18,5.88,5.74,5.63,5.71,5.48,5.37]
    datatable << ["NORTH",6.58,6.74,6.35,6.15,6.02,6.13,5.85,5.68,5.61,5.68,5.47,5.33]
    # Originally written as 3/1/2014, which Ruby evaluates as integer division (0)
    datatable << [nil,Date.new(2014,3,1),nil,nil,nil,Date.new(2014,3,1),nil,nil,nil,Date.new(2014,3,1),nil,nil,nil]
    datatable << ["COAST",7.08,6.53,6.20,6.00,6.63,6.17,5.89,5.73,6.06,5.69,5.49,5.36]
    datatable << ["NORTH",7.34,6.72,6.36,6.13,6.60,6.10,5.86,5.66,6.06,5.65,5.48,5.31]
    loadprofiles = Array.new
    termtables = Array.new
    pvalue = 0
    load_factor_found = false
    daily_pricing_found = false
    dataset = []
    datatable.each_index {|row|
      record = datatable[row]
      termtable = Array.new
      @tmptermtables = Array.new(termtables)
      @tmploadprofiles = Array.new(loadprofiles)
      record.each_index {|col|
        val = record[col]
        ## Build the load profile table
        loadprofiles << "LOW" if val.to_s.downcase.match(/ low/)
        loadprofiles << "MEDIUM" if val.to_s.downcase.match(/medium/)
        loadprofiles << "HIGH" if val.to_s.downcase.match(/high/)
        load_factor_found = true if val.to_s.downcase.match(/load factor/)
        daily_pricing_found = true if val.to_s.downcase.match(/daily pricing/)
        ## Build the term tables for each load profile
        if load_factor_found and !daily_pricing_found
          isinteger = val.is_a? Integer
          if isinteger
            cvalue = val
            if cvalue > pvalue
              termtable << cvalue
              pvalue = cvalue
              termtables << termtable if col == record.length - 1
            else
              unless termtable.empty?
                termtables << termtable
                termtable = []
                termtable << cvalue
                pvalue = cvalue
              end
            end
          else
            cvalue = 0
          end
        end
        if daily_pricing_found
          @start_date = pricing_date if val.to_s.downcase.match(/daily pricing/)
          @start_date = val if val.is_a? Date
          @zone = "CenterPoint" if val.to_s.downcase.match(/coast/)
          @zone = "Oncor" if val.to_s.downcase.match(/north/)
          if val.is_a? Float
            @load = @tmploadprofiles.shift if termtable.empty?
            # Here is where it breaks
            termtable = @tmptermtables.shift if termtable.empty?
            term = termtable.shift unless termtable.empty?
            price = (val/100).round(4)
            r = {
              :loaded => Time.now,
              :start => @start_date,
              :load => @load,
              :term => term,
              :zone => @zone,
              :price => price,
              :expiration => expires,
              :product => "Fixed"
            }
            dataset << r
          end
        end
      }
    }
    return dataset
  rescue => err
    puts "\n" + DateTime.parse(Time.now.to_s).strftime("%Y-%m-%d %r") + " Exception: #{__callee__} in #{__FILE__} generated an error: #{err}\n"
    err
  end
end
x = sp_parser()
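Tracing the sample datatable suggests a likely cause that is not about scope: @tmptermtables = Array.new(termtables) makes a shallow copy, so the outer array is new but the inner term arrays are the same objects as the ones in termtables. termtable.shift therefore empties those shared inner arrays while the first COAST price row is processed, and on the next row (index 4, the fifth pass) the fresh copy holds only empty arrays, so term stays nil. The snippet below isolates that behaviour with made-up term lists; in sp_parser, copying the inner arrays too (for example termtables.map(&:dup)) would be one way to give each row its own term lists.

```ruby
# Array.new(other) copies only the outer array: the inner arrays are shared.
termtables = [[6, 9, 12, 15], [6, 9, 12, 15]]
shallow = Array.new(termtables)
shallow.first.shift(4)   # consume every term of the first table via the copy
p termtables.first       # prints [] because the original inner array was emptied

# Duplicating each inner array gives the copy its own term lists.
termtables = [[6, 9, 12, 15], [6, 9, 12, 15]]
deep = termtables.map(&:dup)
deep.first.shift(4)
p termtables.first       # prints [6, 9, 12, 15]; the original is untouched
```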

How do I customize the spreadsheet gem/output?

I have a program that uses the spreadsheet gem to create a CSV file, but I have not been able to find a way to configure the functionality that I need.
This is what I would like the gem to do: the model number and additional_image fields should be "in sync", that is, each additional image written to the spreadsheet should go on its own line instead of being wrapped into one cell.
Here are some snippets of the desired output, in contrast with the current output. These fields are defined by XPath objects that are screen-scraped using another gem. The program won't know in advance how many objects it will encounter in the additional image field, but due to business logic the number of additional images should match the number of model number entries written to the spreadsheet.
model        additional_image
168868837a   1688688371.jpg
168868837a   1688688372.jpg
168868837a   1688688373.jpg
168868837a   1688688374.jpg
168868837a   1688688375.jpg
168868837a   1688688376.jpg
This is the current code:
require "capybara/dsl"
require "spreadsheet"
require "fileutils"
require "open-uri"
LOCAL_DIR = 'data-hold/images'
# File.exists? is deprecated and was removed in Ruby 3.2
FileUtils.makedirs(LOCAL_DIR) unless File.exist?(LOCAL_DIR)
Capybara.run_server = false
Capybara.default_driver = :selenium
Capybara.default_selector = :xpath
Spreadsheet.client_encoding = 'UTF-8'
class Tomtop
  include Capybara::DSL
  def initialize
    @excel = Spreadsheet::Workbook.new
    @work_list = @excel.create_worksheet
    @row = 0
  end
  def go
    visit_main_link
  end
  def retryable(options = {}, &block)
    opts = { :tries => 1, :on => Exception }.merge(options)
    retry_exception, retries = opts[:on], opts[:tries]
    begin
      return yield
    rescue retry_exception
      retry if (retries -= 1) > 0
    end
    yield
  end
  def visit_main_link
    retryable(:tries => 1, :on => OpenURI::HTTPError) do
      visit "http://www.example.com/clothing-accessories?dir=asc&limit=72&order=position"
      results = all("//h5/a[contains(@onclick, 'analyticsLog')]")
      item = []
      results.each do |a|
        item << a[:href]
      end
      item.each do |link|
        visit link
        save_item
      end
      @excel.write "inventory.csv"
    end
  end
  def save_item
    data = all("//*[@id='content-wrapper']/div[2]/div/div")
    data.each do |info|
      @work_list[@row, 0] = info.find("//*[@id='productright']/div/div[1]/h1").text
      price = info.first("//div[contains(@class, 'price font left')]")
      @work_list[@row, 1] = (price.text.to_f * 1.33).round(2) if price
      @work_list[@row, 2] = info.find("//*[@id='productright']/div/div[11]").text
      @work_list[@row, 3] = info.find("//*[@id='tabcontent1']/div/div").text.strip
      color = info.all("//dd[1]//select[contains(@name, 'options')]//*[@price='0']")
      @work_list[@row, 4] = color.collect(&:text).join(', ')
      size = info.all("//dd[2]//select[contains(@name, 'options')]//*[@price='0']")
      @work_list[@row, 5] = size.collect(&:text).join(', ')
      model = File.basename(info.find("//*[@id='content-wrapper']/div[2]/div/div/div[1]/div[1]/a")['href'])
      # gsub rather than gsub!, which returns nil when nothing is replaced
      @work_list[@row, 6] = model.gsub(/\D/, "")
      @work_list[@row, 7] = File.basename(info.find("//*[@id='content-wrapper']/div[2]/div/div/div[1]/div[1]/a")['href'])
      additional_image = info.all("//*[@rel='lightbox[rotation]']")
      @work_list[@row, 8] = additional_image.map { |link| File.basename(link['href']) }.join(', ')
      # This used `imagelink`, which is never defined
      images = additional_image.map { |link| link['href'] }
      images.each do |image|
        # Save into LOCAL_DIR; File.basename("#{LOCAL_DIR}/#{image}") dropped the directory
        File.open(File.join(LOCAL_DIR, File.basename(image)), 'wb') do |f|
          # URI.open replaces Kernel#open for URLs (removed in Ruby 3.0)
          f.write(URI.open(image).read)
        end
      end
      @row = @row + 1
    end
  end
end
tomtop = Tomtop.new
tomtop.go
I would like this to do two things that I'm not sure how to do:
Each additional image should be printed on a new line (currently they are all printed in one cell).
The model field should be duplicated exactly as many times as there are additional images, on new lines in the same way.
Use the CSV gem. I took the long way of writing this so you can see how it works.
require 'csv'
DOC = "file.csv"
profile = []
profile[0] = "model"
CSV.open(DOC, "a") do |me|
  me << profile
end
img_url = ['pic_1.jpg','pic_2.jpg','pic_3.jpg','pic_4.jpg','pic_5.jpg','pic_6.jpg']
a = 0
b = img_url.length
while a < b
  profile = []
  profile[0] = img_url[a]
  CSV.open(DOC, "a") do |me|
    me << profile
  end
  a += 1
end
The CSV file should look like this:
model
pic_1.jpg
pic_2.jpg
pic_3.jpg
pic_4.jpg
pic_5.jpg
pic_6.jpg
For your last question:
whatever = temp[1] + " " + temp[2]
profile[x] = whatever
OR
profile[x] = temp[1] + " " + temp[2]
If you get a nil error in the array (temp[2] is missing):
if temp[2].nil?
  profile[x] = temp[1]
else
  profile[x] = temp[1] + " " + temp[2]
end
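For the original question (model and additional_image in sync): the spreadsheet gem writes Excel (.xls) workbooks, so naming the output inventory.csv does not make it a CSV file. A minimal sketch with Ruby's csv library that writes one row per additional image and repeats the model number on each of those rows; rows_for and write_inventory are hypothetical helper names, and items is assumed to be an Array of [model, image_hrefs] pairs collected while scraping (for example in save_item).

```ruby
require "csv"

# Hypothetical helper: one [model, image filename] row per image href,
# so the model number is repeated once for every additional image.
def rows_for(model, image_hrefs)
  image_hrefs.map { |href| [model, File.basename(href)] }
end

# Hypothetical helper: `items` is assumed to be an Array of
# [model, image_hrefs] pairs gathered while scraping.
def write_inventory(path, items)
  CSV.open(path, "w") do |csv|
    csv << %w[model additional_image]
    items.each do |model, image_hrefs|
      rows_for(model, image_hrefs).each { |row| csv << row }
    end
  end
end
```

Because every image gets its own row with its own copy of the model number, the two columns always have the same length, which is the "in sync" layout shown in the question.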

How to create an infinite enumerable of Times?

I want to be able to have an object extend Enumerable in Ruby to be an infinite list of Mondays (for example).
So it would yield: March 29, April 5, April 12, etc.
How can I implement this in Ruby?
In Ruby 1.9 (and probably earlier versions, using backports), you can easily create an enumerator:
require 'date'
def ndays_from(from, step=7)
  Enumerator.new {|y|
    loop {
      y.yield from
      from += step
    }
  }
end
e = ndays_from(Date.today)
p e.take(5)
#=> [#<Date: 2010-03-25 (4910561/2,0,2299161)>, #<Date: 2010-04-01 (4910575/2,0,2299161)>, #<Date: 2010-04-08 (4910589/2,0,2299161)>, #<Date: 2010-04-15 (4910603/2,0,2299161)>, #<Date: 2010-04-22 (4910617/2,0,2299161)>]
Store a Date as an instance variable, initialized to a Monday. Then implement an each method that yields the stored date and keeps incrementing it by 7 days (date += 7).
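The instance-variable approach just described can be sketched as follows. Mondays is a hypothetical class name, and rounding the constructor's argument forward to the first Monday on or after it is an assumption about the desired starting point.

```ruby
require "date"

# Hypothetical class: an infinite, Enumerable sequence of Mondays.
class Mondays
  include Enumerable

  def initialize(from = Date.today)
    # Move forward to the first Monday on or after `from` (wday 1 = Monday).
    @start = from + ((1 - from.wday) % 7)
  end

  # Yields @start and then every date 7 days later, forever.
  def each
    return enum_for(:each) unless block_given?

    date = @start
    loop do
      yield date
      date += 7
    end
  end
end
```

Because each never returns on its own, only methods that stop early (first, take, take_while, or anything chained after .lazy) are safe to call; map or to_a would loop forever. For example, Mondays.new(Date.new(2010, 3, 25)).first(3) gives the Dates for March 29, April 5 and April 12, 2010.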
You could do something by extending Date...
#!/usr/bin/ruby
require 'date'
class Date
  def current_monday
    self - self.wday + 1
  end
  def next_monday
    self.current_monday + 7
  end
end
todays_date = Date.today
current_monday = todays_date.current_monday
# 4 iterations, matching the four dates printed below
4.times do |i|
  puts current_monday.to_s
  current_monday = current_monday.next_monday
end
2010-03-22
2010-03-29
2010-04-05
2010-04-12
...with the usual warnings about extending base classes of course.
You can extend the Date class with a new method, mondays:
class Date
  def self.mondays(start_date=Date.today, count=10)
    monday = start_date.wday > 1 ? start_date - start_date.wday + 8 : start_date - start_date.wday + 1
    mondays = []
    count.times { |i| mondays << monday + i*7}
    mondays
  end
end
By default, Date.mondays returns an Array of 10 Mondays, starting from the first Monday on or after Date.today. You can pass parameters:
Date.mondays(start_date, count)
start_date - a Date; the starting point from which to find the nearest Monday
count - an Integer; the number of Mondays you want
For example:
Date.mondays(Date.parse('11.3.2002'))
Date.mondays(Date.parse('11.3.2002'), 30)
module LazyEnumerable
  # include (not extend) so that classes including this module also get
  # Enumerable's methods such as first and take
  include Enumerable
  def select(&block)
    lazily_enumerate { |enum, value| enum.yield(value) if block.call(value) }
  end
  def map(&block)
    lazily_enumerate {|enum, value| enum.yield(block.call(value))}
  end
  def collect(&block)
    map(&block)
  end
  private
  def lazily_enumerate(&block)
    Enumerator.new do |enum|
      self.each do |value|
        block.call(enum, value)
      end
    end
  end
end
...........
class LazyInfiniteDays
  include LazyEnumerable
  attr_reader :day
  def self.day_of_week
    # Each key may appear only once: a second :sundays => 7 entry would
    # overwrite 0, and Date#wday never returns 7
    dow = { :sundays => 0, :mondays => 1, :tuesdays => 2, :wednesdays => 3,
            :thursdays => 4, :fridays => 5, :saturdays => 6 }
    dow.default = -10
    dow
  end
  DAY_OF_WEEK = day_of_week()
  def advance_to_midnight_of_next_specified_day(day_sym)
    year = DateTime.now.year
    month = DateTime.now.month
    day_of_month = DateTime.now.day
    output_day = DateTime.civil(year, month, day_of_month)
    output_day += 1 until output_day.wday == DAY_OF_WEEK[day_sym]
    output_day
  end
  def initialize(day_sym)
    @day = advance_to_midnight_of_next_specified_day(day_sym)
  end
  def each
    day = @day.dup
    loop {
      yield day
      day += 7
    }
  end
  def ==(other)
    return false unless other.kind_of? LazyInfiniteDays
    @day.wday == other.day.wday
  end
end
Ruby 2.7 introduced Enumerator.produce for creating an infinite enumerator from any block, which results in a very elegant, very functional way of implementing the original problem:
irb(main):001:0> require 'date'
=> true
irb(main):002:0> puts Date.today
2022-09-23
=> nil
irb(main):003:0> Date.today.friday?
=> true
irb(main):004:0> future_mondays = Enumerator.produce { |date|
date = (date || Date.today).succ
date = date.succ until date.monday?
date
}
=> #<Enumerator: #<Enumerator::Producer:0x00007fa4300b3070>:each>
irb(main):005:0> puts future_mondays.first(5)
2022-09-26
2022-10-03
2022-10-10
2022-10-17
2022-10-24
=> nil