How to read specific columns of a zipped CSV file - ruby

I used the code below to read the contents of a zipped CSV file.
Zip::ZipFile.foreach(file) do |entry|
istream = entry.get_input_stream
data = istream.read
#...
end
It gives me the entire content of the text (CSV) file with headers like below:
NAME AGE GENDER
NAME1 29 MALE
NAME2 30 FEMALE
but I need specific data of the column. For example, I want to display only the names (NAME). Please help me proceed with this.

Though your example shows ZipFile, you're really asking a CSV question. First, check the docs at http://www.ruby-doc.org/stdlib-2.0/libdoc/csv/rdoc/CSV.html
You'll find that if you parse your data with the :headers => true option, you get a CSV::Table object that knows how to extract a column of data, as follows. (For obvious reasons, I wouldn't code it this way; this is for example only.)
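require 'zip' # rubyzip 1.0 or later; older releases use require 'zip/zip' and Zip::ZipFile
require 'csv'
csv_table = nil
Zip::File.foreach("x.csv.zip") do |entry|
  istream = entry.get_input_stream
  data = istream.read
  # :headers => true treats the first line as the column names
  csv_table = CSV.parse(data, :col_sep => " ", :headers => true)
end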
With the data you gave, we need :col_sep => " " since you're using spaces as column separators. But now we can do:
>> csv_table["NAME"] # extract the NAME column
=> ["NAME1", "NAME2"]
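For a quick check without a zip archive, the same column lookup can be tried on an in-memory string. This is only a minimal sketch: the data variable reproduces the sample from the question, and pairs is a hypothetical extra that shows one way to pull more than one column per row.

```ruby
require 'csv'

# Sample data from the question: space-separated, first line is the header.
data = "NAME AGE GENDER\nNAME1 29 MALE\nNAME2 30 FEMALE\n"

table = CSV.parse(data, col_sep: " ", headers: true)

names = table["NAME"]                                 # one column, looked up by header
pairs = table.map { |row| row.fields("NAME", "AGE") } # several columns from each row

p names
p pairs
```

In the zipped case, data would be the string returned by istream.read.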

First you can use this for reference:
http://www.ruby-doc.org/stdlib-2.0/libdoc/csv/rdoc/CSV.html
If you have the data in a string, you can do
array = CSV.parse(data)
This gives you an array of arrays, one for each line (pass :col_sep if your file is not comma-separated).
Now, if you know that the first column of each line is the name, you can just work with that array, e.g.
array.map { |line| line[0] }.join(",") # returns NAME,<name>,<name>,<name> ...

Related

Change Headers for Certain Columns in CSV File

I have a CSV file that I want to change the headers only for certain columns (about 20 of them in my actual file). Here's a sample CSV file:
CSV File
"name","blah_01_blah","foo_1_01_foo","bacon_01_bacon","bacon_02_bacon"
"John","yucky","summer","yum","food"
"Mary","","","cool","sundae"
I have been trying this with a File/IO class, but when it reads the file to do the gsub it removes all of the quotation marks around each string separated by commas. Here's the code I'm using:
Ruby Code
file = 'file.csv'
replacements = {
'blah_01_blah' => 'newblah1',
'foo_01_foo' => 'coolfoo1',
'bacon_01_bacon' => 'goodpig1',
'bacon_01_bacon' => 'goodpig2'
}
matcher = /#{replacements.keys.join('|')}/
outdata = File.read(file).gsub(matcher, replacements)
File.open(file, 'w') do |out|
out << outdata
end
What I end up with is this in the CSV file:
New CSV File
name,blah_01_blah,foo_1_01_foo,bacon_01_bacon,bacon_02_bacon
John,yucky,summer,yum,food
Mary,"","",cool,sundae
It's keeping the quotation marks in fields that are blank, but taking them out around the strings elsewhere. I want to retain those quotation marks in case for some reason a rogue comma ends up in a string somewhere so it doesn't get thrown off. How can I change the headers without losing my quotation marks around the strings?
EDIT - This is what I want the file to look like at the end.
Expected Result CSV File
"name","newblah1","coolfoo1","goodpig1","goodpig2"
"John","yucky","summer","yum","food"
"Mary","","","cool","sundae"
Thanks!
You don’t need to handle CSV at all:
File.write(
file,
File.readlines(file).tap do |lines|
lines.first.gsub!(matcher, replacements)
end.join
)
This uses File.readlines, which returns the file's lines as an array.
The trick here is that we only touch the first line, treating it as plain text.
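To see the first-line-only idea in isolation, here is a small sketch that works on an array of lines instead of a file. Two things in it are my assumptions rather than the asker's code: the replacements keys are adjusted to match the sample header exactly (foo_1_01_foo and bacon_02_bacon), and Regexp.union stands in for the interpolated regex.

```ruby
# Keys adjusted (assumption) so that every one of them occurs in the sample header.
replacements = {
  'blah_01_blah'   => 'newblah1',
  'foo_1_01_foo'   => 'coolfoo1',
  'bacon_01_bacon' => 'goodpig1',
  'bacon_02_bacon' => 'goodpig2'
}
matcher = Regexp.union(replacements.keys)

# Stand-in for File.readlines(file).
lines = [
  %("name","blah_01_blah","foo_1_01_foo","bacon_01_bacon","bacon_02_bacon"\n),
  %("John","yucky","summer","yum","food"\n),
  %("Mary","","","cool","sundae"\n)
]

# Only the header line is rewritten; the data lines are never parsed,
# so their quotation marks stay exactly as they were.
lines[0] = lines[0].gsub(matcher, replacements)
result = lines.join

puts result
```

Because nothing is parsed and re-serialized, the quoting of the data rows cannot change.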
Let's first create the input CSV file.
text =<<_
"name","blah_01_blah","foo_1_01_foo","bacon_01_bacon","bacon_02_bacon"
"John","yucky","summer","yum","food"
"Mary","","","cool","sundae"
_
file_in = 'file_in.csv'
file_out = 'file_out.csv'
File.write(file_in, text)
#=> 137
Here is the replacements hash, which I simplified slightly.
replacements = {'blah_01_blah'=>'newblah1', 'foo_01_foo'=>'coolfoo1',
'bacon_01_bacon'=>'goodpig1'}
The first task is to modify this hash so that if it has no key k, replacements[k] will return k. For this we use the method Hash#default_proc=.
replacements.default_proc = ->(_,k) { k }
Here are two examples of how this hash is used.
replacements['bacon_01_bacon']
#=> "goodpig1"
replacements['name']
#=> "name"
The latter follows because replacements has no key 'name'.
The code is as follows.
require 'csv'
f_in = CSV.read(file_in, headers:true)
CSV.open(file_out, 'w') do |csv_out|
csv_out << replacements.values_at(*f_in.headers)
f_in.each { |row| csv_out << row }
end
#=> #<CSV::Table mode:col_or_row row_count:3>
Note that
f_in.headers
#=> ["name", "blah_01_blah", "foo_1_01_foo", "bacon_01_bacon", "bacon_02_bacon"]
Let's look at the output file.
puts File.read(file_out)
prints
name,newblah1,foo_1_01_foo,goodpig1,bacon_02_bacon
John,yucky,summer,yum,food
Mary,"","",cool,sundae
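The output above no longer has the quotation marks the question asked to keep. One possible way to get them back, sketched here under a few assumptions, is the CSV writer option force_quotes: true, which quotes every field on output. The example works on strings (CSV.parse and CSV.generate) rather than files, and the replacements keys are adjusted to match the sample header exactly; both choices are mine, not the answer's.

```ruby
require 'csv'

text = <<~TEXT
  "name","blah_01_blah","foo_1_01_foo","bacon_01_bacon","bacon_02_bacon"
  "John","yucky","summer","yum","food"
  "Mary","","","cool","sundae"
TEXT

# Keys adjusted (assumption) so that they match the headers in the sample.
replacements = {
  'blah_01_blah'   => 'newblah1',
  'foo_1_01_foo'   => 'coolfoo1',
  'bacon_01_bacon' => 'goodpig1',
  'bacon_02_bacon' => 'goodpig2'
}

table = CSV.parse(text, headers: true)

# Headers without a replacement are kept unchanged.
new_headers = table.headers.map { |header| replacements.fetch(header, header) }

# force_quotes: true wraps every written field in quotation marks.
output = CSV.generate(force_quotes: true) do |csv|
  csv << new_headers
  table.each { |row| csv << row }
end

puts output
```

The same option can be passed to CSV.open when writing to a file.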

How do I detect if a CSV is empty or has no rows?

I have an array info, where I am reading each item and adding it to a CSV file like so:
info.each do |listing|
CSV.open(csvfile, "a+") do |csv|
csv << listing
end
end
However, what I want to do is when the CSV is empty (i.e. this is the first row being added to this specific CSV) I will add a header first before adding any data. A header being a row that just has data-categories: First Name, Last Name, Address, etc.
If I add it to that loop, it will add it after each record.
Also, there is no guarantee that the first item in the array will be the first item in the CSV. The CSV could be empty by the time the iterator is at i[10] for example.
How do I approach this?
You can check whether the CSV::Table contains any rows:
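require 'csv'
filepath = File.join('.', 'test')
# Create an empty file for the demonstration ('wb' truncates an existing file).
csv = CSV.open(filepath, 'wb', col_sep: ';', quote_char: "\x00")
csv.close
# CSV.table parses the file with headers; an empty file yields no rows.
csv_table = CSV.table(filepath)
csv_table.count
#=> 0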
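Applied to the loop from the question, the check could look roughly like the sketch below. It tests the file itself (missing or zero bytes) before each append instead of parsing it. The HEADER constant, the append_listing helper and the sample info rows are hypothetical names and data made up for illustration.

```ruby
require 'csv'
require 'tmpdir'

# Hypothetical header row; use your own column names.
HEADER = ['First Name', 'Last Name', 'Address'].freeze

# Appends one listing, writing the header first when the file is missing or empty.
def append_listing(csvfile, listing)
  needs_header = !File.exist?(csvfile) || File.zero?(csvfile)
  CSV.open(csvfile, 'a') do |csv|
    csv << HEADER if needs_header
    csv << listing
  end
end

# Made-up sample data standing in for the info array.
info = [
  ['Ada', 'Lovelace', '12 Example St'],
  ['Alan', 'Turing', '34 Sample Rd']
]

csvfile = File.join(Dir.mktmpdir, 'listings.csv')
info.each { |listing| append_listing(csvfile, listing) }

rows = CSV.read(csvfile)
p rows
```

Since the check runs on every append, it does not matter which element of the array happens to be the first one written to a given file.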

Ruby - Matching a value and returning a corresponding value from a separate file

I'm looking to match one value (A sku from a website) to the same value (A sku from lookup.csv) and return a corresponding model (From lookup.csv).
Here's sample data from lookup.csv:
SKU , Model
2520045 , DQ.SUNAA.002
7423599 , DA.MX00.1CC
9547543 , DX.MF01.2BM
Here's my code thus far:
url = "http://www.bestbuy.com/site/acer-aspire-23-touch-screen-all-in-one-intel-core-i5-8gb-memory-2tb-hard-drive-black/2520045.p?id=1219547718151&skuId=2520045"
page = Nokogiri::HTML(open(url))
sku = page.css('span#sku-value').text
#model = match the sku to the sku in lookup.csv and return corresponding model
puts model
I know that I can open the file with
open("lookup.csv", 'r')
but past that, I'm not quite sure how to match/return a corresponding value.
Any help is appreciated!
Aoukar's suggestion would work, but it would be slow with large data sets.
Here is a better solution: read the CSV once using the CSV library (no need to reinvent the wheel) and store the data in a hash. After that you can just ask for the right model. Here is a working sample.
I'm using the CSV data in the DATA part of the script here so I don't need the CSV file itself.
require "csv"
lookup = {}
CSV.parse(DATA, col_sep: " , ", headers: true, force_quotes: false, :quote_char => "\x00").each do |row|
lookup.merge! Hash[row['SKU'], row['Model']]
end
lookup #{"2520045"=>"DQ.SUNAA.002", "7423599"=>"DA.MX00.1CC", "9547543"=>"DX.MF01.2BM"}
lookup['2520045'] #"DQ.SUNAA.002"
__END__
_ ,SKU , Model #the first element is to work around a bug in CSV used this way
2520045 , DQ.SUNAA.002
7423599 , DA.MX00.1CC
9547543 , DX.MF01.2BM
You can try this code, but I didn't test it, so it might need some modifications. I've written it as a function since that's what I'm used to.
def search(path, key) # path to file, and SKU to search for
  File.foreach(path) do |line| # read the file line by line
    sku, model = line.split(',').map(&:strip) # drop the padding and the newline
    return model if sku == key # return the Model of the matching SKU
  end
  nil # no matching SKU found
end
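As a variation on the hash approach, the padding around the commas can be stripped instead of being treated as part of the separator. This is a sketch only: the data string stands in for lookup.csv, the sku is hard-coded rather than scraped, and a lambda passed as header_converters trims the header names.

```ruby
require 'csv'

# Stand-in for File.read('lookup.csv').
data = <<~TEXT
  SKU , Model
  2520045 , DQ.SUNAA.002
  7423599 , DA.MX00.1CC
  9547543 , DX.MF01.2BM
TEXT

# Build the lookup table once; strip the spaces around headers and values.
lookup = {}
CSV.parse(data, headers: true, header_converters: ->(h) { h.strip }).each do |row|
  lookup[row['SKU'].strip] = row['Model'].strip
end

sku = '2520045' # in the question this comes from the scraped page
model = lookup[sku]
puts model
```

A SKU that is not in the file simply gives nil, which the caller can check for.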

Import CSV column where header includes X?

I have a CSV file with a column header like so:
[NPSScore] On a scale of 0 to 10, how likely would you b...
But sometimes the header may look like this:
[NPSScore]
I need to read this column into an array. My code looks like this:
CSV.foreach(csv_file_to_read_for_is_average_nps_score, :col_sep => "\t", :encoding => "BOM|UTF-16LE:UTF-8", :headers => true) do |column|
is_average_nps_score_arr << column['[NPSScore]']
end
However, this will only pick up the latter of the two header options.
How can I write this to pick up any column that includes NPSScore?
To get the value of the first column whose header includes the string '[NPSScore]', use
column.find{|k,_| k.include? '[NPSScore]' }.last
instead of
column['[NPSScore]']
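Put together, the loop might look like the following sketch. It parses a small tab-separated string instead of the UTF-16 file, so the data variable and the Comment column are invented for illustration; the long header is copied from the question as-is, including its truncation.

```ruby
require 'csv'

# Made-up sample with the long form of the header.
data = "[NPSScore] On a scale of 0 to 10, how likely would you b...\tComment\n" \
       "9\tgreat\n" \
       "7\tok\n"

scores = []
CSV.parse(data, col_sep: "\t", headers: true) do |row|
  # Each row enumerates [header, value] pairs; take the first matching header.
  _header, value = row.find { |header, _| header.to_s.include?('[NPSScore]') }
  scores << value
end

p scores
```

The same block body should work unchanged inside CSV.foreach with the encoding options from the question. If no header matches, value is nil.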

How do I make an array of arrays out of a CSV?

I have a CSV file that looks like this:
Jenny, jenny#example.com ,
Ricky, ricky#example.com ,
Josefina josefina#example.com ,
I'm trying to get this output:
users_array = [
['Jenny', 'jenny#example.com'], ['Ricky', 'ricky#example.com'], ['Josefina', 'josefina#example.com']
]
I've tried this:
users_array = Array.new
file = File.new('csv_file.csv', 'r')
file.each_line("\n") do |row|
puts row + "\n"
columns = row.split(",")
users_array.push columns
puts users_array
end
Unfortunately, in Terminal, this returns:
Jenny
jenny#example.com
Ricky
ricky#example.com
Josefina
josefina#example.com
Which I don't think will work for this:
users_array.each_with_index do |user|
add_page.form_with(:id => 'new_user') do |f|
f.field_with(:id => "user_email").value = user[0]
f.field_with(:id => "user_name").value = user[1]
end.click_button
end
What do I need to change? Or is there a better way to solve this problem?
Ruby's standard library has a CSV class with an API similar to File's, plus a number of useful methods for working with tabular data. To get the output you want, all you need to do is this:
require 'csv'
users_array = CSV.read('csv_file.csv')
PS - I think you are getting the output you expected with your file parsing as well, but maybe you're thrown off by how it is printing to the terminal. puts behaves differently with arrays, printing each member object on a new line instead of as a single array. If you want to view it as an array, use puts my_array.inspect.
Assuming that your CSV file actually has a comma between the name and email address on the third line:
require 'csv'
users_array = []
CSV.foreach('csv_file.csv') do |row|
users_array.push row.delete_if(&:nil?).map(&:strip)
end
users_array
# => [["Jenny", "jenny#example.com"],
# ["Ricky", "ricky#example.com"],
# ["Josefina", "josefina#example.com"]]
There may be a simpler way, but what I'm doing there is discarding the nil field created by the trailing comma and stripping the spaces around the email addresses.
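One candidate for that simpler way is Array#compact, which drops nil entries in a single call. The sketch below parses an in-memory string rather than csv_file.csv and, like the answer above, assumes the third line does have a comma between name and address.

```ruby
require 'csv'

# Stand-in for the file contents, with the missing comma on line 3 added.
data = "Jenny, jenny#example.com ,\n" \
       "Ricky, ricky#example.com ,\n" \
       "Josefina, josefina#example.com ,\n"

# compact removes the nil left by the trailing comma; strip trims the padding.
users_array = CSV.parse(data).map { |row| row.compact.map(&:strip) }

p users_array
```

With a file, CSV.read('csv_file.csv') can replace CSV.parse(data).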

Resources