Ruby: How to iterate through a hash created from a csv file - ruby

I am trying to take an existing CSV file, add a fourth column to it, and then iterate through the second and third columns to create the fourth column's values. Using Ruby, I've created hashes where the headers are the keys and the column values are the hash values (e.g. "id" => "1", "new_fruit" => "apple").
My practice CSV file looks like this: [image: practice CSV file]
My goal is to create a fourth column, "brand_new" (which I was able to do), and then add values to it by concatenating the values from the second and third columns (which I am stuck on). At the moment I just have a placeholder value (x) for the fourth column's values so I could see whether adding the fourth column to the hash actually worked: [image: results with x = 1]
Here is my code:
require 'csv'
def self.import
table = []
CSV.foreach(File.path("practice.csv"), headers: true) do |row|
table.each do |row|
row["brand_new"] = full_name
end
table << row.to_h
end
table
end
def full_name
x = 1
return x
end
# Add another col, row by row:
import.each do |row|
row["brand_new"] = full_name
end
puts import
Any suggestions or guidance would be much appreciated. Thank you.

I simplified your code a bit. I read the file first, then iterate over the content that was read.
Note: Change col_sep to comma or delete it to use the default if needed.
require "csv"
def self.import
  table = CSV.read("practice.csv", headers: true, col_sep: ";")
  table.each do |row|
    row["brand_new"] = "#{row["old_fruit"]} #{row["new_fruit"]}"
  end
  puts table
end
I use the read method to read the CSV file content. It allows you to directly access the column/cell values.
The assignment inside the loop shows how to concatenate the column values into a string:
"#{row["old_fruit"]} #{row["new_fruit"]}"
Refer to this old SO post and the CSV Ruby docs to learn more about working with CSV files.
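As a self-contained sketch of the same idea, the snippet below parses an inline string instead of practice.csv, so it can be run without any file on disk. The sample values and the comma separator are assumptions made for illustration; only the column names come from the answer above.

```ruby
require "csv"

# Hypothetical sample data standing in for practice.csv
# (column names follow the answer above; the values are made up).
data = <<~TEXT
  id,old_fruit,new_fruit
  1,apple,pear
  2,kiwi,plum
TEXT

table = CSV.parse(data, headers: true)

# Assigning to a header that does not exist yet appends a new field to the row.
table.each do |row|
  row["brand_new"] = "#{row["old_fruit"]} #{row["new_fruit"]}"
end

# One hash per row, with the headers as keys, as described in the question.
rows = table.map(&:to_h)

puts table.to_csv
```

Calling to_h on each row gives the array of hashes the question was building by hand, while to_csv on the table gives the text to write back to a file.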

Related

Fetching second row from csv file in Ruby [duplicate]

actual_row = File.open(file_name[0], 'r')
first_row_data = []
CSV.foreach(actual_row) do |row|
first_row_data << row[1]
end
puts first_row_data
With this I am trying to fetch the second row of CSV but it is printing the second column instead.
The foreach method returns an enumerator if no block is given, which allows you to use methods such as drop from Enumerable:
# outputs all rows after the first
CSV.foreach('test.csv').drop(1).each { |row| puts row.inspect }
To limit to just one row, we can then take:
# outputs only the second row
CSV.foreach('test.csv').drop(1).take(1).each { |row| puts row.inspect }
But, we're still parsing the entire file and just discarding most of it. Luckily, we can add lazy into the mix:
# outputs only the second row, parsing only the first 2 rows of the file
CSV.foreach('test.csv').lazy.drop(1).take(1).each { |row| puts row.inspect }
But, if the first row is a header row, don't forget you can tell CSV about it:
# outputs only the second row, as a CSV::Row, only parses 2 rows
CSV.foreach('test.csv', headers: true).take(1).each { |row| puts row.inspect }
As an aside (in case I did this wrong), it looks like the shift method is what CSV is using for parsing the rows, so I just added:
class CSV
alias :orig_shift :shift
def shift
$stdout.puts "shifting row"
orig_shift
end
end
and ran with a sample csv to see how many times "shifting row" was output for each of the examples.
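The lazy variant can be tried end to end with a temporary file. This is only a sketch: the file contents are hypothetical, and first is used in place of take(1).each so the row comes back as a plain value.

```ruby
require "csv"
require "tempfile"

second_row = nil

# Hypothetical three-line file used only for this demonstration.
Tempfile.create(["rows", ".csv"]) do |file|
  file.write("id,name\n1,apple\n2,pear\n")
  file.flush

  # drop(1) skips the first row; first stops the enumeration as soon as
  # the next row has been produced.
  second_row = CSV.foreach(file.path).lazy.drop(1).first
end

p second_row
```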
If you'd like the entire row, you should change
row[1]
to just
row
row[1] is grabbing the second column's value of the entire row. Each column value is stored sequentially in the row variable. You can see this directly in your console if you print
puts row.inspect
If you want just the second row, you can try something like this:
actual_row = File.open(file_name[0], 'r')
first_row_data = []
CSV.foreach(actual_row) do |row|
if $. == 2 # $. is the number of the last line read, so the second row is line 2
first_row_data << row
end
end
puts first_row_data
You can learn more about $. and similar variables here: https://docs.ruby-lang.org/en/2.4.0/globals_rdoc.html

How to create a new CSV row of data per X amount of strings in an array

I'm trying to create a spreadsheet from an array.
#Loop through each .olpOffer (product listing) and gather content from various elements
parse_page.css('.olpOffer').each do |a|
if a.css('.olpSellerName img').empty?
seller = a.css('.olpSellerName').text.strip
else
seller = a.css('.olpSellerName img').attr('alt').value
end
offer_price = a.css('.olpOfferPrice').text.strip
prime = a.css('.supersaver').text.strip
shipping_info = a.css('.olpShippingInfo').text.strip.squeeze(" ").gsub!(/(\n)/, '')
condition = a.css('.olpCondition').text.strip
fba = "FBA" unless a.css('.olpBadge').empty?
#Push data from each product listing into array
arr.push(seller,offer_price,prime,shipping_info,condition,fba)
end
#Need to make each product listing's data begin in new row [HELP!!]
CSV.open("file.csv", "wb") do |csv|
csv << ["Seller", "Price", "Prime", "Shipping", "Condition", "FBA"]
end
end
I need to reset the row that the array is writing to after the "FBA" column so that I don't end up with one huge row of data in row 2.
I can't figure out how to correlate each string to a specific column header. Should I not use an array?
I figured it out. I needed the array that I was feeding into my csv to create a new row after every 7 strings in the array. Here's how I did it:
# arr is an array of strings whose length is always divisible by 7
rows = arr.each_slice(7)
CSV.open("#{file_name}", "ab") do |csv|
csv << [title, asin]
rows.each do |row|
csv << row
end
end
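To see each_slice turning one flat array into CSV rows, here is a small sketch that writes to a string with CSV.generate rather than to a file. The sample values and the slice size of 3 are made up for illustration; the answer above uses 7.

```ruby
require "csv"

# Hypothetical flat array: two listings with three fields each.
flat = ["Seller A", "$10", "FBA", "Seller B", "$12", nil]
fields_per_row = 3

output = CSV.generate do |csv|
  csv << ["Seller", "Price", "FBA"]
  # each_slice yields consecutive groups of fields_per_row elements,
  # so every group becomes one CSV row.
  flat.each_slice(fields_per_row) { |row| csv << row }
end

puts output
```

An alternative that avoids the slicing step altogether is to push one small array per listing inside the scraping loop, so the data is already grouped by row.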

Ruby: Write to CSV if condition met

I am brand new to Ruby and using it to try to read/write to csv. So far, I have a script that does the following:
Imports data from a CSV file, storing select columns as a separate array (I don't need data from every column)
Performs calculations on the data, stores the results in newly created arrays
Transposes the arrays to table rows, to be outputted to a csv
table = [Result1, Result2, Result3].transpose
Currently, I am able to output the table using the following:
CSV.open(resultsFile, "wb",
  :write_headers => true,
  :headers => ["Result1", "Result2", "Result3"]
) do |csv|
  table.each do |row|
    csv << row
  end
end
My question is, how can I add a conditional to only output rows where one of the results equals a certain text string. For example, if the value in result2 is equal to "Apple", I want the data in that row to be written to the csv file. If not, then skip that row.
I've tried placing if/else in a few different areas and have not had any success.
Thanks for any help
You could do something like below:
header = ["Result1", "Result2", "Result3"]
CSV.open(resultsFile, "wb", :write_headers=> true, :headers => header) do |csv|
table.each do |row|
csv << row if header.zip(row).to_h["Result2"] == "Apple"
end
end
zip merges two arrays into an array of arrays, where each sub-array holds the elements found at the same index in the input arrays, and to_h converts any array of 2-element arrays into a hash. For example:
row = ["Orange", "Apple", "Guava"]
header = ["Result1", "Result2", "Result3"]
header.zip(row).to_h
=> {"Result1"=>"Orange", "Result2"=>"Apple", "Result3"=>"Guava"}
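Since the header order is fixed, the column position can also be looked up once instead of building a hash for every row. The sketch below is a variation on the answer above, not the answerer's code; the table contents are made up, and it writes to a string so it runs without a results file.

```ruby
require "csv"

header = ["Result1", "Result2", "Result3"]

# Hypothetical rows standing in for the transposed result arrays.
table = [
  ["Orange", "Apple", "Guava"],
  ["Pear", "Banana", "Lime"],
  ["Plum", "Apple", "Fig"]
]

# Find the position of the column to test once, up front.
result2_index = header.index("Result2")

output = CSV.generate do |csv|
  csv << header
  table.each do |row|
    # Only rows whose Result2 value is "Apple" are written.
    csv << row if row[result2_index] == "Apple"
  end
end

puts output
```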

Open CSV without reading header rows in Ruby

I'm opening CSV using Ruby:
CSV.foreach(file_name, "r+") do |row|
next if row[0] == 'id'
update_row! row
end
and I don't really care about headers row.
I don't like next if row[0] == 'id' inside the loop. Is there any way to tell CSV to skip the header row and just iterate through the rows with data?
I assume provided CSVs always have a header row.
There are a few ways you could handle this. The simplest method would be to pass the {headers: true} option to your loop:
CSV.foreach(file_name, headers: true) do |row|
update_row! row
end
Notice that no mode is specified. According to the documentation this answer was written against, CSV::foreach takes only the file and an options hash as its arguments (as opposed to, say, CSV::open, which does allow one to specify a mode). Later versions of the csv library also accept an optional mode argument.
Alternatively, you could read the data into an array (rather than using foreach), and shift the array before iterating over it:
my_csv = CSV.read(file_name)
my_csv.shift
my_csv.each do |row|
update_row! row
end
According to the Ruby docs:
options = {headers: true}
# ** passes the hash as keyword arguments, which Ruby 3 and later require
CSV.foreach(file_name, **options) ...
should suffice.
A simple thing to do that works when reading files line-by-line is:
CSV.foreach(file_name, "r+") do |row|
next if $. == 1
update_row! row
end
$. is a global variable in Ruby that contains the line-number of the file being read.
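The headers: true approach from the answers above can be checked with a short runnable sketch. The file contents are hypothetical, and collecting the ids stands in for the question's update_row! method, which is not shown.

```ruby
require "csv"
require "tempfile"

ids = []

# Hypothetical file with a header row followed by two data rows.
Tempfile.create(["items", ".csv"]) do |file|
  file.write("id,name\n1,apple\n2,pear\n")
  file.flush

  # With headers: true the first line is consumed as the header row,
  # so the block only sees data rows (as CSV::Row objects).
  CSV.foreach(file.path, headers: true) do |row|
    ids << row["id"]
  end
end

p ids
```

Because each row is a CSV::Row here, fields can be read by header name as well as by index.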

How do I merge two CSV's with nearly identical sets using rules for which data is kept? (Using Ruby & FasterCSV)

I have two csv files, each with 13 columns.
The first column of each row contains a unique string. Some are duplicated in each file, some only exist in one file.
If the row exists in only one file I want to keep it in the new file.
If it exists in both I want to keep the one that has a certain value (or lacks a certain value) in a certain column of that same row.
For example:
file 1:
D600-DS-1991, name1, address1, date1
D601-DS-1991, name2, address2, date2
D601-DS-1992, name3, address3, date3
file 2:
D600-DS-1991, name1, address1, time1
D601-DS-1992, dave1, address2, date2
I would keep the first row of the first file because the fourth column contains date instead of time.
I would keep the second row of the first file since the value in its first column is unique.
I would keep the second row of the second file as the third row of the new file because it contains text other than "name#" in the second column.
Should I first map all of the unique values to one another so that each file contains the same number of entries - even if some are blank or just have filler data?
I only know a little ruby and python... but I much prefer to solve this with a single Ruby file if at all possible since I will be able to understand the code better. If you can't do it in Ruby then please feel free to answer differently!
I'm not super happy with my solution but it works:
require 'csv'
def readcsv(filename)
csv = {}
CSV.foreach(filename) do |line|
csv[line[0]] = { name: line[1], address: line[2], date: line[3] }
end
csv
end
csv1 = readcsv('orders1.csv')
csv2 = readcsv('orders2.csv')
results = {}
csv1.each do |id, val|
unless csv2[id]
results[id] = val # checks to see if it only exists in 1 file
next
end
#see if name exists
if (val[:name] =~ /name/) && (csv2[id]) && (csv2[id][:name] =~ /name/).nil?
csv1.delete(id)
end
#missing some if statement regarding date vs. time
end
results = results.merge(csv2) # merge together whatever is remaining
CSV.open('newfile.csv', 'w') do |csv|
results.each do |key, val|
row = []
csv << (row.push(key, val.values)).flatten
end
end
Output of newfile.csv :
D601-DS-1991, name2, address2, date2
D600-DS-1991, name1, address1, time1
D601-DS-1992, dave1, address2, date2
I won't give you the complete code but here's a general approach to such a problem:
require 'csv'
# list of csv files to read
files = ['a.csv', 'b.csv']
# used to resolve conflicts when we have an existing entry with the same id
# here, we prefer the new entry if its fourth column starts with `'date'`
# this also means that the last file in the list above wins if both entries are valid.
def resolve_conflict(existing_entry, new_entry)
if new_entry[3].start_with? 'date'
new_entry
else
existing_entry
end
end
# keep a hash of entries, with the unique id as key.
# we use this id to detect duplicate entries later on.
entries = {}
files.each do |file|
  CSV.foreach(file) do |new_entry|
    # get id (first column) from row
    id = new_entry[0]
    # see if we have a conflicting entry
    existing_entry = entries[id]
    if existing_entry.nil?
      # no conflict, just save the row
      entries[id] = new_entry
    else
      # resolve conflict and save that
      entries[id] = resolve_conflict(existing_entry, new_entry)
    end
  end
end
# now all conflicts are resolved
# note that stale rows from the first file could now be in the result
# you might want to filter them out as well
# we can now build a new csv file with the result
CSV.open("result.csv", "w") do |csv|
entries.values.each do |row|
csv << row
end
end
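The general approach above can be exercised on the sample data from the question. In this sketch the two files are replaced by inline strings (with the spaces after the commas removed so that the fourth column really starts with "date"), and only the date-versus-time rule is implemented; the rule about "name#" values is left out.

```ruby
require "csv"

# Hypothetical stand-ins for the two files from the question.
file1 = <<~TEXT
  D600-DS-1991,name1,address1,date1
  D601-DS-1991,name2,address2,date2
  D601-DS-1992,name3,address3,date3
TEXT
file2 = <<~TEXT
  D600-DS-1991,name1,address1,time1
  D601-DS-1992,dave1,address2,date2
TEXT

# Prefer the new entry only if its fourth column starts with "date".
def resolve_conflict(existing_entry, new_entry)
  new_entry[3].to_s.start_with?("date") ? new_entry : existing_entry
end

entries = {}
[file1, file2].each do |text|
  CSV.parse(text).each do |new_entry|
    id = new_entry[0]
    existing_entry = entries[id]
    entries[id] = existing_entry ? resolve_conflict(existing_entry, new_entry) : new_entry
  end
end

merged = CSV.generate { |csv| entries.each_value { |row| csv << row } }
puts merged
```

With this data the first file's D600-DS-1991 row is kept because the second file's version has a time in the fourth column, while D601-DS-1992 is taken from the second file because both versions are valid and the later file wins.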
