Read complete CSV cell data for complicated strings - ruby

I am using Ruby to merge CSV files that might contain different headers.
My problem is that some of the values in the CSV files are quite complicated, and data gets lost in the merge process.
For example, the original value "[cell([""A"",""B""]),""X""+cell([""A"",""C""])+""W""].join(""_"")" will be written as "[cell([""A"",v1,""B""]),
As a result, I get CSV::MalformedCSVError when trying to read the merged file.
How can I read and write the exact content of each CSV cell?
My code and a running example:
def join_multiple_csv(csv_path_array)
f = CSV.parse(File.read(csv_path_array[0]), :headers => true, :quote_char => "'")
f_h = {}
f.headers.each {|header| f_h[header] = f[header]}
n_rows = f.size
csv_path_array.shift(1)
csv_path_array.each do |csv_file|
curr_csv = CSV.parse(File.read(csv_file), :headers => true, :quote_char => "'")
curr_h = {}
curr_csv.headers.each {|header| curr_h[header] = curr_csv[header]}
new_headers = curr_csv.headers - f_h.keys
exist_headers = curr_csv.headers - new_headers
new_headers.each { |new_header|
f_h[new_header] = Array.new(n_rows) + curr_csv[new_header]
}
exist_headers.each {|exist_header|
f_h[exist_header] = f_h[exist_header] + curr_csv[exist_header]
}
n_rows = n_rows + curr_csv.size
end
csv_headers = f_h.keys.map {|string| string}
output = csv_headers.join(",") + "\n"
(0..n_rows-1).each do |i|
row = ''
f_h.each_key do |header|
if f_h[header][i].nil?
row.concat(f_h[header][i].to_s + ",")
else
row.concat(f_h[header][i].to_s + ",")
end
end
output.concat(row + "\n")
end
return output
end
csv_files = ['f1.csv', 'f2.csv']
outputs = join_multiple_csv(csv_files)
f = CSV.new(outputs)
row = f.readline
while row do
row = f.readline
end
running example:
f1.csv
H1,H3,H4
v1,v2,v3
f2.csv
H2,H3,H4
v1,v3,"[cell([""A"",""B""]),""X""+cell([""A"",""C""])+""W""].join(""_"")"
expected output:
H1,H2,H3,H4
v1,,v2,v3
,v1,v3,"[cell([""A"",""B""]),""X""+cell([""A"",""C""])+""W""].join(""_"")"
output:
H1,H3,H4,H2,
v1,v2,v3,,,
,v3,"[cell([""A"",v1,""B""]),
,,,,,
,,,,,
Any idea what can I do?

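Sorry, I answered in a rush.
I ran your program and found that the quote character is what causes the cell value to be split on each comma in the string: the files quote their fields with double quotes, but the code tells the parser to use a single quote. Changing the quote character to a double quote worked for me: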
f = CSV.parse(File.read(csv_path_array[0]), :headers => true, :quote_char => '"')
curr_csv = CSV.parse(File.read(csv_file), :headers => true, :quote_char => '"')
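Separately from the quote character, the code in the question builds each output row by concatenating values with commas, so a value that itself contains commas or double quotes is written without being re-quoted. Below is a minimal sketch of the merge that lets the CSV library handle quoting on both the read and the write side. The helper name `merge_csv_strings` is hypothetical, and it takes CSV text rather than file paths only to keep the example self-contained:

```ruby
require 'csv'

# Hypothetical helper: merges CSV documents (given as strings) whose
# headers may differ. Cells with no value for a header are left empty.
def merge_csv_strings(csv_strings)
  tables  = csv_strings.map { |text| CSV.parse(text, headers: true) }
  headers = tables.flat_map(&:headers).uniq

  # CSV.generate quotes and escapes every field that needs it.
  CSV.generate do |out|
    out << headers
    tables.each do |table|
      table.each { |row| out << headers.map { |h| row[h] } }
    end
  end
end

f1 = "H1,H3,H4\nv1,v2,v3\n"
f2 = <<~'CSV_TEXT'
  H2,H3,H4
  v1,v3,"[cell([""A"",""B""]),""X""+cell([""A"",""C""])+""W""].join(""_"")"
CSV_TEXT

merged = merge_csv_strings([f1, f2])
puts merged
```

In this sketch the column order follows first appearance (H1, H3, H4, H2); sort `headers` if a different order is needed.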

Related

How to convert a ruby hash into a string with a specific format

This is the hash I'm trying to format
input = {"test_key"=>"test_value", "test_key2"=>"test_value2"}
And this is the expected result
"{\n\t\"test_key\" = \"test_value\";\n\t\"test_key2\" = \"test_value2\";\n}"
I have the following code so far
def format_hash(hash)
output = ""
hash.to_s.split(',').each do |k|
new_string = k + ';'
new_string.gsub!('=>', ' = ')
output += new_string
end
end
which gives me this output
output = "{\"test_key\" = \"test_value\"; \"test_key2\" = \"test_value2\"};"
But I'm still struggling with adding the rest. Any ideas/suggestions?
input = {"test_key"=>"test_value", "test_key2"=>"test_value2"}
"{" << input.map { |k,v| "\n\t\"#{k}\" = \"#{v}\"" }.join(';') << ";\n}"
#=> "{\n\t\"test_key\" = \"test_value\";\n\t\"test_key2\" = \"test_value2\";\n}"
The steps are as follows.
a = input.map { |k,v| "\n\t\"#{k}\" = \"#{v}\"" }
#=> ["\n\t\"test_key\" = \"test_value\"", "\n\t\"test_key2\" = \"test_value2\""]
b = a.join(';')
#=> "\n\t\"test_key\" = \"test_value\";\n\t\"test_key2\" = \"test_value2\""
"{" << b << ";\n}"
#=> "{\n\t\"test_key\" = \"test_value\";\n\t\"test_key2\" = \"test_value2\";\n}"
input may contain any number of key-value pairs that adhere to the indicated pattern.
One starting point might be to use the JSON formatter:
require 'json'
input = {"test_key"=>"test_value", "test_key2"=>"test_value2"}
JSON.pretty_generate(input)
=> "{\n  \"test_key\": \"test_value\",\n  \"test_key2\": \"test_value2\"\n}"
This has some subtle differences, since it looks like you use = as opposed to :. That said, perhaps it's easier to work from this than from what you have.
Working with JSON
JSON.pretty_generate(input).gsub(/:/,' =').gsub(/,(?=\n)/, ';').gsub(/(;\n|\n)\s+/, '\1'+"\t")
=> "{\n\t\"test_key\" = \"test_value\";\n\t\"test_key2\" = \"test_value2\"\n}"
Note that this result has no semicolon after the last pair, so it differs slightly from the expected string, and the /:/ substitution would also rewrite any colon inside a key or value.
Custom Formatter
Of course you could define your custom formatter:
def formatter(hash)
output = ""
output += "{\n\t"
output += hash.entries.map{|a| "\"#{a[0]}\" = \"#{a[1]}\"" }.join(";\n\t")
output += ";\n}"
end
formatter( input )

CSV parsing, newline/linebreak issues

I'm trying to create a parser for multiple CSV files, that will eventually output to another CSV file in Excel-compatible format. The CSV files are exported by a commercial tool that takes a Firewall configuration and gives us a report of any issues it finds.
So far I have figured out how to read a directory of files in, look for certain values, determine the type of device I have and then spit it out to screen or to a CSV, but only if each line has single cell entries. If the source IP 'cell' (or any other) contains more than one IP, separated by a newline, the output breaks on that newline and pushes the remainder onto the next line.
The code I have so far is:
require 'csv'
require 'pp'
nipperfiles = Dir.glob(ARGV[0] + '/*.csv')
def allcsv(nipperfiles)
filearray = []
nipperfiles.each do |csv|
filearray << csv
end
filearray
end
def devicetype(filelist)
filelist.each do |f|
CSV.foreach(f, :headers => true, :force_quotes => true) do |row|
if row["Table"] =~ /audit device list/ && row["Device"] =~ /Cisco/
return "Cisco"
elsif row["Table"] =~ /audit device list/ && row["Device"] =~ /Dell/
return "Sonicwall"
elsif row["Table"] =~ /audit device list/ && row["Device"] =~ /Juniper/
return "Juniper"
end
end
end
end
def adminservices(device, filelist)
administrative = []
filelist.each do |f|
CSV.foreach(f, :headers => true, :col_sep => ",", :force_quotes => true, :encoding => Encoding::UTF_8) do |row|
if row["Table"] =~ /administrative service rule/
if row["Dst Port"] != "Any" and row["Service"] != "[Host] Any"
if device == "Cisco"
administrative << row["Table"] + ',' + row["Rule"] + ',' + row["Protocol"] + ',' + row["Source"] + ',' + row["Destination"] + ',' + row["Dst Port"]
elsif device == "Sonicwall"
administrative << row["Table"] + ',' + row["Rule"] + ',' + row["Source"] + ',' + row["Destination"] + ',' + row["Service"]
elsif device == "Juniper"
administrative << row["Table"] + ',' + row["Rule"] + ',' + row["Source"] + ',' + row["Destination"] + ',' + row["Service"]
end
end
end
end
end
administrative
end
def writecsv(admin)
finalcsv = File.new("randomstorm.csv", "w+")
finalcsv.puts("Administrative Services Table:\n", admin, "\r\n")
finalcsv.close
end
filelist = allcsv(nipperfiles)
device = devicetype(filelist)
adminservices(device, filelist)
admin = adminservices(device, filelist)
writecsv(admin)
Is there a way to get it to ignore the newlines that are inside cells, or is my code complete balls and needs to be started again?
I have tried writing a CSV file with the CSV library, but the results are the same and I figured this code was slightly clearer for demonstrating the issue.
I can sanitise an input file if it would help.
Newlines are OK inside fields as long as they are quoted:
CSV.parse("1,\"2\n\n\",3")
=> [["1", "2\n\n", "3"]]
Try writing directly to a string or a file with the CSV library, as in its documentation, which ensures that fields containing newlines are quoted:
def writecsv(admin)
csv_string = CSV.generate do |csv|
admin.each { |row| csv << row }
end
finalcsv = File.new("randomstorm.csv", "w+")
finalcsv.puts("Administrative Services Table:\n", csv_string, "\r\n")
finalcsv.close
end
Also ensure you are writing your fields as an array inside of adminservices():
administrative << [row["Table"], row["Rule"], row["Protocol"], row["Source"], row["Destination"], row["Dst Port"]]
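As a quick check of the advice above, here is a small sketch showing that a cell containing a newline survives a write and read round trip when the row is passed to the CSV library as an array. The row values are made up for illustration:

```ruby
require 'csv'

# Illustrative row: the "Source" cell holds two IPs separated by a newline.
rows = [
  ["administrative service rule", "1", "TCP", "10.0.0.1\n10.0.0.2", "Any", "22"]
]

csv_string = CSV.generate do |csv|
  rows.each { |row| csv << row }
end

# The multi-line cell is wrapped in quotes on output, so it parses
# back as a single field of a single record.
parsed = CSV.parse(csv_string)
puts parsed.length   # number of records read back
p parsed[0][3]       # the Source cell, newline intact
```

Joining the values with ',' by hand, as in the question, skips this quoting step, which is why the output breaks at the embedded newline.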

Trying to Remove Comma Before or During Parsing CSV in Ruby

I am importing a CSV file that is being parsed with the separator set to "|". I would like to remove the commas, or escape them, so they do not mess up the columns.
Here is the part where I think the code to remove the commas should go:
namespace :postonce do
desc "Check postonce ftp files and post loads and trucks."
task :post => :environment do
files = %x[ls /home/web2_postonce/].split("\n")
files.each do |file|
%x[ iconv -t UTF-8 /home/web2_postonce/#{file} > /home/deployer/postonce/#{file} ]
%x[ mv /home/web2_postonce/#{file} /home/deployer/postonce_backup/ ]
end
files = %x[ ls /home/deployer/postonce/ ].split("\n")
files.each do |file|
begin
lines = CSV.read("/home/deployer/postonce/#{file}")
rescue Exception => e
log.error e
next
end
h = lines.shift
header = CSV.parse_line(h[0], { :col_sep => "|" } )
lines.each do |line|
fields = CSV.parse_line(line[0],{:col_sep => "|"})
post = Hash[header.zip fields]
if post["EmailAddress"].blank?
log.error "Blank Email #{post["EmailAddress"]}"
else
log.debug "Email #{post["EmailAddress"]}"
end
Here is the full code that pulls the file and parses it into columns:
require 'resque'
require 'logger'
log = Logger.new("#{Rails.root}/log/PostOnce.log")
log.datetime_format = "%F %T"
namespace :postonce do
desc "Check postonce ftp files and post loads and trucks."
task :post => :environment do
files = %x[ls /home/web2_postonce/].split("\n")
files.each do |file|
%x[ iconv -t UTF-8 /home/web2_postonce/#{file} > /home/deployer/postonce/#{file} ]
%x[ mv /home/web2_postonce/#{file} /home/deployer/postonce_backup/ ]
end
files = %x[ ls /home/deployer/postonce/ ].split("\n")
files.each do |file|
begin
lines = CSV.read("/home/deployer/postonce/#{file}")
rescue Exception => e
log.error e
next
end
h = lines.shift
header = CSV.parse_line(h[0], { :col_sep => "|" } )
lines.each do |line|
fields = CSV.parse_line(line[0],{:col_sep => "|"})
post = Hash[header.zip fields]
if post["EmailAddress"].blank?
log.error "Blank Email #{post["EmailAddress"]}"
else
log.error "Email #{post["EmailAddress"]}"
end
if post["Notes"].blank?
post["Notes"] = "~PostOnce~"
else
post["Notes"] = post["Notes"]+" ~PostOnce~"
end
if Company.where(:name => post["Company"]).first.nil?
c = Company.new
c.name = post["Company"]
c.dispatch = post["Customer_Phone"]
c.save
end
if User.where(:email => post["EmailAddress"]).first.blank?
u = User.new
c = Company.where(:name => post["Company"]).first unless Company.where(:name => post["Company"]).first.nil?
u.company_id = c.id
u.username = post["EmailAddress"].gsub(/#.*/,"") unless post["EmailAddress"].nil?
u.password = Time.now.to_s
u.email = post["EmailAddress"]
u.dispatch = post["Customer_Phone"]
u.save
end
#If Load
if file.start_with?("PO_loads")
record = Hash.new
begin
record[:user_id] = User.where(:email => post["EmailAddress"]).first.id
rescue Exception => e
log.error e
next
end
record[:origin] = "#{post["Starting_City"]}, #{post["Starting_State"]}"
record[:dest] = "#{post["Destination_City"]}, #{post["Destination_State"]}"
record[:pickup] = Time.parse(post["Pickup_Date_Time"])
record[:ltl] = false
record[:ltl] = true unless post["#Load_type_full"] == "FULL"
begin
record[:equipment_id] = Equipment.where(:code => post["Type_of_Equipment"]).first.id
rescue Exception => e
record[:equipment_id] = 34
end
record[:comments] = post["Notes"]
record[:weight] = post["Weight"]
record[:length] = post["Length"]
record[:rate] = post["Payment_amount"]
record[:rate] = '' if post["Payment_amount"] == 'Call' or post["Payment_amount"] == 'CALL'
Resque.enqueue(MajorPoster, record)
#If Truck
elsif file.start_with?("PO_trucks")
record = Hash.new
begin
record[:user_id] = User.where(:email => post["EmailAddress"]).first.id
rescue Exception => e
log.error e
next
end
record[:origin] = "#{post["Starting_City"]}, #{post["Starting_State"]}"
record[:dest] = "#{post["Destination_City"]}, #{post["Destination_State"]}"
record[:available] = Time.parse(post["Pickup_Date_Time"])
record[:expiration] = record[:available] + 8.days
begin
record[:equipment_id] = Equipment.where(:code => post["Type_of_Equipment"]).first.id
rescue Exception => e
record[:equipment_id] = 34
end
record[:comments] = post["Notes"]
Resque.enqueue(MajorPoster, record)
end
end
# %x[rm /home/deployer/postonce/#{file}]
end
end
end
Here is a sample of the data that I am trying to load. The commas appear in Customer_Contact and in Notes. This data comes to us through FTP.
Member_ID|Action_type|Entry_Number|Pickup_Date_Time|Starting_City|Starting_State|Destination_City|Destination_State|Type_of_Equipment|Length|Quantity|#Load_type_full|Extra_Stops|Payment_amount|Weight|Distance|Notes|Customer_Phone|Extension|Customer_Contact|EmailAddress|Company|
SUMMIT|L-delete|16491978|20140213|PEWAMO|MI|DENVER|CO|FT|45|1|FULL|0|Call|46000|||866-807-4968||DISPATCH, Dispatch|IANP#SUMMITTRANS.NET|SUMMIT TRANSPORTATION SERVICES INC.|
SUMMIT|L-delete|16490693|20140213|PEWAMO|MI|DENVER|CO|V|48|1|FULL|0|Call|44000|||866-807-4968||DISPATCH|IANP#SUMMITTRANS.NET|SUMMIT TRANSPORTATION SERVICES INC.|
SUMMIT|L-delete|16490699|20140214|PEWAMO|MI|DENVER|CO|V|48|1|FULL|0|Call|44000|||866-807-4968||DISPATCH|IANP#SUMMITTRANS.NET|SUMMIT TRANSPORTATION SERVICES INC.|
megacorpwv|L-Delete|16491928|20140214|WAITE PARK|MN|DOLTON|IL|R||1|FULL|0|CALL|0|0|(859) 538-1660 x2007|877-670-2837|||snewman#megacorplogistics.com|MEGACORP LOGISTICS 03|
My log shows this. As you can see, I manually put a comma in one field of the first record and it acted as a separator:
2014-02-13 12:29:41 ERROR -- Blank Email
2014-02-13 12:29:41 ERROR -- undefined method `id' for nil:NilClass
2014-02-13 12:29:41 DEBUG -- Email IANP#SUMMITTRANS.NET
2014-02-13 12:29:42 DEBUG -- Email IANP#SUMMITTRANS.NET
2014-02-13 12:29:42 DEBUG -- Email snewman#megacorplogistics.com
I think your problem is that you're only parsing the first element of the arrays h and line. Try removing the [0] from those two lines. It's not that the email is blank; everything except Member_ID is blank.
header = CSV.parse_line(h, { :col_sep => "|" } )
lines.each do |line|
fields = CSV.parse_line(line,{:col_sep => "|"})
Ah, OK. Phillip Hallstrom has identified the problem: it's in the CSV.read statement. By default, CSV.read splits fields on commas. It reads each line of the file and parses it into an array of fields, returning an array of arrays. Therefore, if your file looks like this:
a|b|c|d|e
apple|ball, bearing|cantelope|date|elephant
CSV.read will return the following array:
[["a|b|c|d|e"], ["apple|ball", " bearing|cantelope|date|elephant"]]
You can see that CSV.read performs the full comma-based parse before you get the opportunity to specify a delimiter.
Either read the lines using normal file I/O, or specify the delimiter in the CSV.read call itself (for example with :col_sep => "|").
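The difference can be sketched with the two-line sample from this answer. The example uses CSV.parse on a string so that it runs without a file; CSV.read accepts the same col_sep option:

```ruby
require 'csv'

data = "a|b|c|d|e\napple|ball, bearing|cantelope|date|elephant\n"

# Default comma delimiter: the embedded comma splits the second line.
default_parse = CSV.parse(data)

# Pipe delimiter: the comma is ordinary text inside the second field.
pipe_parse = CSV.parse(data, col_sep: "|")

p default_parse
p pipe_parse

# With headers enabled, each row can be addressed by column name,
# which replaces the manual Hash[header.zip fields] step.
table = CSV.parse(data, col_sep: "|", headers: true)
puts table[0]["b"]
```

Passing the delimiter up front means there is no need to strip or escape the commas at all.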

Parse CSV Data with Ruby

I am trying to return a specific cell value based on two criteria.
The logic:
If ClientID = 1 and BranchID = 1, puts SurveyID
Using Ruby 1.9.3, I want to basically look through an excel file and for two specific values located within the ClientID and BranchID column, return the corresponding value in the SurveyID column.
This is what I have so far, which I found during my online searches. It seemed promising, but no luck:
require 'csv'
# Load file
csv_fname = 'FS_Email_Test.csv'
# Key is the column to check, value is what to match
search_criteria = { 'ClientID' => '1',
'BranchID' => '1' }
options = { :headers => :first_row,
:converters => [ :numeric ] }
# Save `matches` and a copy of the `headers`
matches = nil
headers = nil
# Iterate through the `csv` file and locate where
# data matches the options.
CSV.open( csv_fname, "r", options ) do |csv|
matches = csv.find_all do |row|
match = true
search_criteria.keys.each do |key|
match = match && ( row[key] == search_criteria[key] )
end
match
end
headers = csv.headers
end
# Once matches are found, we print the results
# for a specific row. The row `row[8]` is
# tied specifically to a notes field.
matches.each do |row|
row = row[1]
puts row
end
I know the last bit of code following matches.each do |row| is invalid, but I left it in in hopes that it will make sense to someone else.
How can I write puts surveyID if ClientID == 1 & BranchID == 1?
You were very close indeed. Your only error was setting the values of the search_criteria hash to strings '1' instead of numbers. Since you have converters: :numeric in there the find_all was comparing 1 to '1' and getting false. You could just change that and you're done.
Alternatively this should work for you.
The key is the line
Hash[row].select { |k,v| search_criteria[k] } == search_criteria
Hash[row] converts the row into a hash instead of an array of arrays. Select generates a new hash that has only those elements that appear in search_criteria. Then just compare the two hashes to see if they're the same.
require 'csv'
# Load file
csv_fname = 'FS_Email_Test.csv'
# Key is the column to check, value is what to match
search_criteria = {
'ClientID' => 1,
'BranchID' => 1,
}
options = {
headers: :first_row,
converters: :numeric,
}
# Save `matches` and a copy of the `headers`
matches = nil
headers = nil
# Iterate through the `csv` file and locate where
# data matches the options.
CSV.open(csv_fname, 'r', options) do |csv|
matches = csv.find_all do |row|
Hash[row].select { |k,v| search_criteria[k] } == search_criteria
end
headers = csv.headers
end
p headers
# Once matches are found, print the SurveyID
# of each matching row.
matches.each { |row| puts row['SurveyID'] }
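The hash comparison used in this answer can be tried without a file. The sketch below parses made-up sample data from a string; only the column names come from the question, and `row.to_h` stands in for `Hash[row]`:

```ruby
require 'csv'

# Made-up sample data; only the column names come from the question.
data = <<~CSV_TEXT
  ClientID,BranchID,SurveyID
  1,1,101
  1,2,102
  2,1,103
  1,1,104
CSV_TEXT

# Values are integers because the :numeric converter turns "1" into 1.
search_criteria = { 'ClientID' => 1, 'BranchID' => 1 }

table = CSV.parse(data, headers: true, converters: :numeric)

# Keep only the columns named in search_criteria, then compare hashes.
matches = table.select do |row|
  row.to_h.select { |key, _| search_criteria.key?(key) } == search_criteria
end

survey_ids = matches.map { |row| row['SurveyID'] }
p survey_ids
```

Using `key?` in the inner select avoids dropping a criterion whose expected value is nil or false.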
Possibly...
require 'csv'
b_headers = false
client_id_col = 0
branch_id_col = 0
survey_id_col = 0
CSV.open('FS_Email_Test.csv') do |file|
file.find_all do |row|
if b_headers == false then
client_id_col = row.index("ClientID")
branch_id_col = row.index("BranchID")
survey_id_col = row.index("SurveyID")
b_headers = true
if branch_id_col.nil? || client_id_col.nil? || survey_id_col.nil? then
puts "Invalid csv file - Missing one of these columns (or no headers):\nClientID\nBranchID\nSurveyID"
break
end
else
puts row[survey_id_col] if row[branch_id_col] == "1" && row[client_id_col] == "1"
end
end
end

What is the syntax for array.select?

I'm trying to use Array.select to separate out, and then delete, strings from a database that contain unwanted items. I get no errors but this does not seem to be working as hoped.
The relevant code is the last part:
totaltext = []
masterfacs = ''
nilfacs = ''
roomfacs_hash = {'lcd' => lcd2, 'wifi'=> wifi2, 'wired' => wired2, 'ac' => ac2}
roomfacs_hash.each do |fac, fac_array|
if roomfacs.include? (fac)
totaltext = (totaltext + fac_array)
masterfacs = (masterfacs + fac + ' ')
else
nilfacs = (nilfacs + fac + ' ')
end
end
finaltext = Array.new
text_to_delete = totaltext2.select {|sentences| sentences =~ /#{nilfacs}/i}
finaltext = totaltext2.delete (text_to_delete)
puts finaltext
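It's probably not working because delete doesn't return the modified array: it returns the deleted object on success, or nil if nothing matched. It also compares its argument against each element, so passing the whole text_to_delete array will not remove the individual sentences. To simplify your code, just use reject. Note that nilfacs must be an array of facility names here (rather than the space-separated string built above) for any? to work: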
finaltext = totaltext.reject{|sentence| nilfacs.any?{|fac| sentence =~ /#{fac}/i } }
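A small self-contained sketch of the reject approach, with made-up sentences and with nilfacs held as an array. Regexp.escape is an addition here, in case a facility name ever contains regex metacharacters:

```ruby
# Made-up sentences and facility names for illustration.
totaltext = [
  "Every room has a large LCD screen.",
  "Free WiFi is available throughout.",
  "Air conditioning is provided."
]
nilfacs = ['lcd', 'wired']   # facilities the room does NOT have

# reject returns a new array without the sentences that mention any
# missing facility; totaltext itself is left unchanged.
finaltext = totaltext.reject do |sentence|
  nilfacs.any? { |fac| sentence =~ /#{Regexp.escape(fac)}/i }
end

puts finaltext
```

In the original loop, the array could be built with `nilfacs << fac` (starting from `nilfacs = []`) instead of string concatenation.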
