How to Get Specific Row Value From CSV? - ruby

I have a vertical CSV file that looks like this:
name,value
case,"123Case0001"
custodian,"Doe_John"
PDate,"10/30/2013"
I can read the file like this:
CSV.foreach("#{batch_File_Dir_cdata}", :quote_char => '"', :col_sep =>',', :row_sep =>:auto, :headers => true) do |record|
ev_info = record[0]
ev_val = record[1]
The problem is, I need to get a specific ev_val for just one specific ev_info. I could potentially use the row number, but foresight tells me that this could change. What will be the same is the name of information. I want to find the row with the specific information name and get that value.
When I do the foreach, it gets that value and then goes past it and leaves me with an empty variable, because it went on to the other rows.
Can anyone help?

You've got a lot of choices, but the easiest is to assign to a variable based on the contents, as in:
ev_info = record[0]
ev_val = record[1] if ev_info == 'special name'
Note, though, that you need to define whatever variable you are assigning to outside of the block as it will otherwise be created as a local variable and be inaccessible to you afterwards.
Alternatively, you can read in the entire array and then select the record you're interested in with index or select.
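As a minimal sketch of the "read everything, then select" approach, the snippet below parses the question's sample data from an inline string instead of a file. The helper name value_for and the constant VERTICAL_CSV are made up for illustration; CSV.parse and find are the standard calls.

```ruby
require 'csv'

# Sample data from the question, inlined so the sketch is self-contained.
VERTICAL_CSV = <<~DATA
  name,value
  case,"123Case0001"
  custodian,"Doe_John"
  PDate,"10/30/2013"
DATA

# Hypothetical helper: return the value whose "name" column equals info_name,
# or nil when no such row exists.
def value_for(csv_text, info_name)
  row = CSV.parse(csv_text, headers: true).find { |r| r['name'] == info_name }
  row && row['value']
end

puts value_for(VERTICAL_CSV, 'custodian') # prints Doe_John
```

Because the lookup returns a value instead of assigning inside a block, the block-scoping problem mentioned above does not arise.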

I'd do it something like:
require 'pp'
require 'csv'
ROWS_IN_RECORD = 4
data = []
File.open('test.dat', 'r') do |fi|
  loop do
    record = {}
    ROWS_IN_RECORD.times do
      row = fi.readline.parse_csv
      record[row.first] = row.last
    end
    data << record
    break if fi.eof?
  end
end
pp data
Running that outputs:
[{"name"=>"value",
"case"=>"123Case0001",
"custodian"=>"Doe_John",
"PDate"=>"10/30/2013"},
{"name"=>"value_2",
"case"=>"123Case0001 2",
"custodian"=>"Doe_John 2",
"PDate"=>"10/30/2013 2"}]
It returns an array of hashes, so each hash is the record you'd normally get from CSV if the file was a normal CSV file.
There are other ways of breaking down the input file into logical groups, but this is scalable, with a minor change, to work on huge data files. For a huge file just process each record at the end of the loop instead of pushing it onto the data variable.
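The "process each record instead of collecting it" variant could look roughly like the sketch below. The helper name each_vertical_record is hypothetical, and a StringIO stands in for the real file so the example runs on its own.

```ruby
require 'csv'
require 'stringio'

# Hypothetical helper: read rows_per_record lines at a time from io and
# yield each group as a hash, without keeping earlier records in memory.
def each_vertical_record(io, rows_per_record)
  until io.eof?
    record = {}
    rows_per_record.times do
      break if io.eof? # tolerate a short final record
      key, value = io.readline.parse_csv
      record[key] = value
    end
    yield record
  end
end

input = StringIO.new(<<~DATA)
  name,value
  case,"123Case0001"
  custodian,"Doe_John"
  PDate,"10/30/2013"
DATA

each_vertical_record(input, 4) { |record| puts record['custodian'] } # prints Doe_John
```

With a real file, the same helper would be called inside File.open('test.dat', 'r') { |fi| ... }.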

I got it to work. I originally had the following:
CSV.foreach("#{batch_File_Dir_cdata}", :quote_char => '"', :col_sep =>',', :row_sep =>:auto, :headers => true) do |record|
ev_info = record[0]
c_val = record[1]
case when ev_info == "Custodian"
cust = cval
end
end
puts cust
What I needed to do was this:
c_val = nil # defined outside the block so it is still visible afterwards
CSV.foreach("#{batch_File_Dir_cdata}", :quote_char => '"', :col_sep =>',', :row_sep =>:auto, :headers => true) do |record|
  ev_info = record[0]
  case when ev_info == "Custodian"
    c_val = record[1]
  end
end
puts c_val

Related

Wrapping output of an array to CSV conversion in quotations in Ruby

What I want to find out is how to have every entry passed from the array to the CSV at the end of the program wrapped in double quotes so that Excel reads it correctly. I know this needs to be done before or during the "push" at line 34, but doing "streets.push('"'+street_name+'"')" results in every entry being surrounded by THREE quotation marks, which doesn't make much sense to me.
#!ruby.exe
require 'csv'
puts "Please enter a file name:" #user input file name (must be in same folder as this file)
file = gets.chomp
begin
File.open(file, 'r')
rescue
print "Failed to open #{file}\n"
exit
end #makes sure that the file exists, if it does not it posts an error
data_file = File.new(file)
data = [] #initializes array for addresses from .csv
counter=0 #set counter up to allow for different sized files to be used without issue
CSV.foreach(data_file, headers: true) do |row|
data << row.to_hash
counter+=1
end #goes through .csv one line at a time
data.reject(&:empty?)
puts "Which column do you want to parse?"
column = gets.chomp
i=0
streets = []
while (i<counter)
address = data[i][column]
street_name = address.gsub(/^((\d[a-zA-Z])|[^a-zA-Z])*/, '')
streets.push(street_name)
i+=1
end
streets.reject(&:empty?)
puts "What do you want the output to be called?"
new_file = gets.chomp
CSV.open(new_file, "w", :write_headers=> true, :headers => [column]) do |hdr|
hdr << streets
end
You can pass the :force_quotes option to the CSV library to have it quote everything in the csv for you:
base_options = {headers: ['first,col', 'second column'], write_headers: true}
options = [{}, {force_quotes: true}]
data = [
['a', 'b'],
['c', 'd'],
['e', 'f']
]
options.each do |option|
result = CSV.generate(base_options.merge(option)) do |csv|
data.each do |datum|
csv << datum
end
end
puts "#{option}:\n#{result}"
end
For instance, in this small script, by default, the only thing that gets quoted is the first column header, because it contains a comma. In the second pass, with force_quotes: true, everything gets quoted.
Output:
{}:
"first,col",second column
a,b
c,d
e,f
{:force_quotes=>true}:
"first,col","second column"
"a","b"
"c","d"
"e","f"
You can use map to process the array before putting it in csv.
streets.map!{|s| '"'+s+'"'}
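The triple quotes from the question come from CSV escaping: a quote character inside a field is doubled, and the field is then wrapped in quotes again. The sketch below shows that effect next to force_quotes. The helper name streets_to_csv and the street names are made up. It also writes one street per row, whereas hdr << streets in the question puts every street on a single row.

```ruby
require 'csv'

# Hypothetical helper: one street per output row, every field quoted.
def streets_to_csv(column, streets)
  CSV.generate(force_quotes: true) do |csv|
    csv << [column]
    streets.each { |street| csv << [street] }
  end
end

# Adding quotes by hand makes them part of the data, so CSV escapes them.
puts ['"Main St"'].to_csv # prints """Main St"""

# Letting the library quote the fields avoids the escaping.
puts streets_to_csv('Address', ['Main St', 'Oak Ave'])
```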

Ruby- CSV merge field value

I started learning Ruby this weekend. I'm working on a script that is going to read a CSV file that has a Date field and a Time field, and merge the values into a new DateTime field written to the output.
What I have is partially working, but the problem I have is the Date and Time values are comma separated. I would like to remove the comma and replace it with a space. How can I remove the comma and merge the values together?
require 'csv'
CSV.open("output.csv", "wb", :headers => true) do |output|
CSV.foreach("input.csv", :headers => true, :return_headers => true) do |row|
if row.header_row?
output << (row << 'DateTime')
else
output << (row << row['Date'].to_s << (row['Time'].to_s))
end
end
end
You can use tr to replace contents in a string.
date = row['Date']
time = row['Time']
datetime = "#{date} #{time}".tr(',', ' ')
Something like this should help:
require 'csv'
CSV.open("output.csv", "wb", :headers => true) do |output|
  output << ['DateTime']
  CSV.foreach("input.csv", :headers => true, :header_converters => :symbol) do |row|
    output << ["#{row[:date]} #{row[:time]}"]
  end
end
The changes here represent this functionality:
:header_converters => :symbol converts the header field names to symbols
Dropped :return_headers => true, so the header row is not yielded as a data row
Moved the header output outside the input CSV loop, as the headers can safely be written before any row data
Reference the columns by symbol, as row[:date] and row[:time]
Wrote the row data as an array (of one element), consisting of the interpolated string containing both row[:date] and row[:time]
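If the original columns should be kept and the merged value appended as a new DateTime column, which is what the question describes, a sketch along these lines might do. The helper name add_datetime_column and the Event column in the sample are assumptions. It works on strings so it can be run without input.csv.

```ruby
require 'csv'

# Hypothetical helper: append a DateTime column built from Date and Time.
def add_datetime_column(csv_text)
  table = CSV.parse(csv_text, headers: true)
  CSV.generate do |out|
    out << (table.headers + ['DateTime'])
    table.each do |row|
      out << (row.fields + ["#{row['Date']} #{row['Time']}"])
    end
  end
end

# Made-up sample input; only the Date and Time headers come from the question.
sample = "Date,Time,Event\n10/30/2013,14:05,login\n"
puts add_datetime_column(sample)
```

For files, the same logic can be wrapped with File.read for the input and File.write for the result.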

Ruby CSV: Comparison of columns (from two csvs), write new column in one

I've searched and haven't found a method for this particular conundrum. I have two CSV files of data that sometimes relate to the same thing. Here's an example:
CSV1 (500 lines):
date,reference,amount,type
10/13/2015,,1510.40,sale
10/13/2015,,312.90,sale
10/14/2015,,928.50,sale
10/15/2015,,820.25,sale
10/12/2015,,702.70,credit
CSV2 (20000 lines):
reference,date,amount
243534985,10/13/2015,312.90
345893745,10/15/2015,820.25
086234523,10/14/2015,928.50
458235832,10/13/2015,1510.40
My goal is to match the date and amount from CSV2 with the date and amount in CSV1, and write the reference from CSV2 to the reference column in the corresponding row.
This is a simplified view, as CSV2 actually contains many many more columns - these are just the relevant ones, so ideally I'd like to refer to them by header name or maybe index somehow?
Here's what I've attempted, but I'm a bit stuck.
require 'csv'
data1 = {}
data2 = {}
CSV.foreach("data1.csv", :headers => true, :header_converters => :symbol, :converters => :all) do |row|
data1[row.fields[0]] = Hash[row.headers[1..-1].zip(row.fields[1..-1])]
end
CSV.foreach("data2.csv", :headers => true, :header_converters => :symbol, :converters => :all) do |row|
data2[row.fields[0]] = Hash[row.headers[1..-1].zip(row.fields[1..-1])]
end
data1.each do |data1_row|
data2.each do |data2_row|
if (data1_row['comparitive'] == data2_row['comparitive'])
puts data1_row['identifier'] + data2_row['column_thats_important_and_wanted']
end
end
end
Result:
22:in `[]': no implicit conversion of String into Integer (TypeError)
I've also tried:
CSV.foreach('data2.csv') do |data2|
CSV.foreach('data1.csv') do |data1|
if (data1[3] == data2[4])
data1[1] << data2[1]
puts "Change made!"
else
puts "nothing changed."
end
end
end
This however did not match anything inside the if statement, so perhaps not the right approach?
The headers method should help you match columns--from there it's a matter of parsing and writing the modified data back out to a file.
Solved.
require 'csv'
data1 = CSV.read('data1.csv')
data2 = CSV.read('data2.csv')
data2.each do |row2|
  data1.each do |row1|
    if row1[5] == row2[4]
      row1[1] = row2[1]
      puts "Change made!"
      puts row1
    end
  end
end
File.open('referenced.csv','w'){ |f| f << data1.map(&:to_csv).join("")}
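The nested loop compares every row of one file with every row of the other. A hash keyed on date and amount avoids that and lets the columns be addressed by header name, as the question asked. This is only a sketch: the helper name fill_references is made up, it works on strings rather than files, and it compares the fields as plain strings (so 1510.40 and 1510.4 would not match).

```ruby
require 'csv'

# Hypothetical helper: copy "reference" from refs_text into sales_text
# wherever date and amount both match.
def fill_references(sales_text, refs_text)
  # Index the large file once: [date, amount] => reference (first one wins).
  index = {}
  CSV.parse(refs_text, headers: true).each do |row|
    index[[row['date'], row['amount']]] ||= row['reference']
  end

  sales = CSV.parse(sales_text, headers: true)
  sales.each do |row|
    match = index[[row['date'], row['amount']]]
    row['reference'] = match if match
  end
  sales.to_csv
end

# Trimmed sample data from the question.
sales = "date,reference,amount,type\n10/13/2015,,1510.40,sale\n10/12/2015,,702.70,credit\n"
refs  = "reference,date,amount\n458235832,10/13/2015,1510.40\n"
puts fill_references(sales, refs)
```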

use array to iterate and parse other arrays CSV?

I've got a list of persons saved in an array and I want to loop a file with organizations looking for matches and save them but it keeps going wrong. I think I'm doing something wrong with the arrays.
This is exactly what I'm doing:
I have a list of persons in a file called 'personen_fixed.csv'.
I save that list into an array.
I have another file that also has the name of the people ("pers2"), but also three other interesting columns of data. I save the four columns into arrays.
I want to loop over the first array (the persons) and search for matches with the list of persons ("pers2").
If there is a match I want to save that row.
What I'm getting now is two rows of data, one of which is filled with ALL persons. See my code below. At the bottom I have some sample input data.
require 'csv'
array_pers1 = []
array_pers2 = []
array_orgaan = []
array_functie = []
array_rol = []
filename_1 = 'personen_fixed.csv'
CSV.foreach(filename_1, :col_sep => ";", :encoding => "windows-1251:utf-8", :return_headers => false) do |row|
array_pers1 << row[0].to_s
end
filename_2 = 'Functies_fixed.csv'
CSV.foreach(filename_2, :col_sep => ";", :encoding => "windows-1251:utf-8", :return_headers => false) do |row|
array_pers2 << row[1].to_s
array_orgaan << row[16].to_s
array_functie << row[17].to_s
array_rol << row[18].to_s
end
CSV.open("testrij.csv", "w") do |row|
row << ["rijnummer","link","ptext","soort_woonhuis"]
for rij in array_pers1
for x in 1...4426 do
if rij === array_pers2["#{x}".to_f]
pers2 = array_pers2["#{x}".to_f]
orgaan = array_orgaan["#{x}".to_f]
functie = array_functie["#{x}".to_f]
rol = array_rol["#{x}".to_f]
row << [pers2,orgaan,functie,rol]
else
pers2 = ""
orgaan = ""
functie = ""
rol = ""
end
end
end
end
Input data for the first Excel file (column name and first row of data):
person
someonesname
Input data for the second excel file:
person,organizationid,role,organization,function
someonesname,34971,member,americanairways,boardofdirectors
Since many of the people in the dataset have multiple jobs at different organizations, I want to save all of them next to each other (output I'm going for):
person,organization(1),function(1),role(1),organization(2),function(2),role(2) (max 5)
I don't understand the purpose of storing a single row from your Functies csv file in 4 separate arrays, and then combining them together later, so my answer doesn't tell you why your approach isn't working. Instead, I suggest a different approach that I believe is cleaner.
Building an array of names from the first file is ok. For the second file, I would store each row as an array and use a hash:
data = {
  "name1" => ["name1", "orgaan1", "functie1", "rol1"],
  "name2" => ["name2", "orgaan2", "functie2", "rol2"],
  ...
}
Building it might look like
data = {}
CSV.foreach(filename_2, :col_sep => ";", :encoding => "windows-1251:utf-8", :return_headers => false) do |row|
name = row[1]
orgaan = row[16]
functie = row[17]
rol = row[18]
data[name] = [name, orgaan, functie, rol]
end
Then you would iterate over your first array and keep all the arrays that match
results = []
for name in array_pers1
results << data[name] if data.include?(name)
end
On the other hand, if you don't want to use a hash and insist on using arrays (perhaps because names are not unique), I would still store them like
data = [
["name1", "orgaan1", "functie1", "rol1"],
["name2", "orgaan2", "functie2", "rol2"]
]
And then during your search step you would just iterate like
results = []
for name in array_pers1
for row in data
results << row if row[0] == name
end
end
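Since the question mentions people with several jobs who should end up on one output row, a hash whose values are lists of jobs may fit better than either structure above. The following is a sketch under that assumption. The helper names jobs_by_person and wide_rows are hypothetical, and every sample row except the first is made up.

```ruby
require 'csv'

# Hypothetical helper: group [orgaan, functie, rol] triples by person name.
def jobs_by_person(rows)
  rows.each_with_object(Hash.new { |hash, name| hash[name] = [] }) do |row, jobs|
    name, orgaan, functie, rol = row
    jobs[name] << [orgaan, functie, rol]
  end
end

# Hypothetical helper: one output row per known person, with at most
# max_jobs jobs placed side by side.
def wide_rows(people, jobs, max_jobs = 5)
  people.select { |name| jobs.key?(name) }
        .map { |name| [name] + jobs[name].first(max_jobs).flatten }
end

rows = [
  ['someonesname', 'americanairways', 'boardofdirectors', 'member'],
  ['someonesname', 'someotherorg', 'committee', 'chair'], # made-up second job
  ['anothername', 'thirdorg', 'staff', 'member']          # made-up person
]

jobs = jobs_by_person(rows)
wide_rows(['someonesname'], jobs).each { |row| puts row.to_csv }
```

In the real script, rows would be built from row[1], row[16], row[17] and row[18] inside the CSV.foreach over the second file.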

Parse CSV Data with Ruby

I am trying to return a specific cell value based on two criteria.
The logic:
If ClientID = 1 and BranchID = 1, puts SurveyID
Using Ruby 1.9.3, I want to basically look through an excel file and for two specific values located within the ClientID and BranchID column, return the corresponding value in the SurveyID column.
This is what I have so far, which I found during my online searches. It seemed promising, but no luck:
require 'csv'
# Load file
csv_fname = 'FS_Email_Test.csv'
# Key is the column to check, value is what to match
search_criteria = { 'ClientID' => '1',
'BranchID' => '1' }
options = { :headers => :first_row,
:converters => [ :numeric ] }
# Save `matches` and a copy of the `headers`
matches = nil
headers = nil
# Iterate through the `csv` file and locate where
# data matches the options.
CSV.open( csv_fname, "r", options ) do |csv|
matches = csv.find_all do |row|
match = true
search_criteria.keys.each do |key|
match = match && ( row[key] == search_criteria[key] )
end
match
end
headers = csv.headers
end
# Once matches are found, we print the results
# for a specific row. The row `row[8]` is
# tied specifically to a notes field.
matches.each do |row|
row = row[1]
puts row
end
I know the last bit of code following matches.each do |row| is invalid, but I left it in in hopes that it will make sense to someone else.
How can I write puts surveyID if ClientID == 1 & BranchID == 1?
You were very close indeed. Your only error was setting the values of the search_criteria hash to the strings '1' instead of numbers. Since you have converters: :numeric in there, find_all was comparing 1 to '1' and getting false. Change the values to integers and you're done.
Alternatively this should work for you.
The key is the line
Hash[row].select { |k,v| search_criteria[k] } == search_criteria
Hash[row] converts the row into a hash instead of an array of arrays. Select generates a new hash that has only those elements that appear in search_criteria. Then just compare the two hashes to see if they're the same.
require 'csv'
# Load file
csv_fname = 'FS_Email_Test.csv'
# Key is the column to check, value is what to match
search_criteria = {
'ClientID' => 1,
'BranchID' => 1,
}
options = {
headers: :first_row,
converters: :numeric,
}
# Save `matches` and a copy of the `headers`
matches = nil
headers = nil
# Iterate through the `csv` file and locate where
# data matches the options.
CSV.open(csv_fname, 'r', options) do |csv|
matches = csv.find_all do |row|
Hash[row].select { |k,v| search_criteria[k] } == search_criteria
end
headers = csv.headers
end
p headers
# Once matches are found, print the SurveyID
# of each matching row.
matches.each { |row| puts row['SurveyID'] }
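The string-versus-number mismatch described in this answer can be reproduced with a small inline example. This is a sketch only: the helper name survey_ids and the sample rows are made up, and only the column names come from the question.

```ruby
require 'csv'

# Hypothetical helper: return the SurveyID of every row matching all criteria.
def survey_ids(csv_text, criteria)
  CSV.parse(csv_text, headers: true, converters: :numeric)
     .select { |row| criteria.all? { |column, wanted| row[column] == wanted } }
     .map { |row| row['SurveyID'] }
end

# Made-up sample data using the column names from the question.
SURVEYS = <<~DATA
  ClientID,BranchID,SurveyID
  1,1,1001
  1,2,1002
  2,1,1003
DATA

# With the :numeric converter the fields are Integers, so integer criteria match...
p survey_ids(SURVEYS, { 'ClientID' => 1, 'BranchID' => 1 })     # prints [1001]
# ...while string criteria never do.
p survey_ids(SURVEYS, { 'ClientID' => '1', 'BranchID' => '1' }) # prints []
```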
Possibly...
require 'csv'
b_headers = false
client_id_col = 0
branch_id_col = 0
survey_id_col = 0
CSV.open('FS_Email_Test.csv') do |file|
file.find_all do |row|
if b_headers == false then
client_id_col = row.index("ClientID")
branch_id_col = row.index("BranchID")
survey_id_col = row.index("SurveyID")
b_headers = true
if branch_id_col.nil? || client_id_col.nil? || survey_id_col.nil? then
puts "Invalid csv file - Missing one of these columns (or no headers):\nClientID\nBranchID\nSurveyID"
break
end
else
puts row[survey_id_col] if row[branch_id_col] == "1" && row[client_id_col] == "1"
end
end
end
