I am using the roo gem in Ruby to get Excel sheet cell values.
I have a file 'ruby.rb' with:
require 'spreadsheet'
require 'roo'
xls = Roo::Spreadsheet.open('test_work.xls')
xls.each do |row|
  p row
end
My output in the terminal when I run ruby 'ruby.rb' is:
["id", "header2", "header3", "header4"]
["val1", "val2", "val3", "val4"]
["val1", "val2", "val3", "val4"]
When I change it to:
require 'spreadsheet'
require 'roo'
xls = Roo::Spreadsheet.open('test_work.xls')
xls.each do |row|
  two_dimensional = []
  two_dimensional << row
  p two_dimensional
end
I get:
[["id", "header2", "header3", "header4"]]
[["val1", "val2", "val3", "val4"]]
[["val1", "val2", "val3", "val4"]]
What I want is:
[["id", "header2", "header3", "header4"],
["val1", "val2", "val3", "val4"],
["val1", "val2", "val3", "val4"]]
How would I go about doing this?
Thanks!
Just declare the array outside the each block. Inside the block, it is reset to [] on every iteration, so each row ends up in a fresh one-element array. Declared outside, every row is appended to the same array.
require 'roo'
two_dimensional = []
xls = Roo::Spreadsheet.open('test_work.xls')
xls.each do |row|
  two_dimensional << row
end
# Print once, after every row has been appended.
p two_dimensional
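If the goal is only to collect every row, the explicit loop may not be needed at all. The sketch below assumes the sheet object is Enumerable (it responds to each, as used above); FakeSheet is a hypothetical stand-in, not part of roo, so that the example runs without a spreadsheet file.

```ruby
# Hypothetical stand-in for a roo sheet: it yields one array per row,
# just as xls.each does in the question.
class FakeSheet
  include Enumerable

  ROWS = [
    ["id", "header2", "header3", "header4"],
    ["val1", "val2", "val3", "val4"],
    ["val1", "val2", "val3", "val4"]
  ].freeze

  def each(&block)
    ROWS.each(&block)
  end
end

sheet = FakeSheet.new

# Enumerable#to_a collects every yielded row into one array of arrays.
two_dimensional = sheet.to_a

# Dropping the first row leaves only the data rows, without the header.
data_rows = sheet.drop(1)

p two_dimensional
p data_rows
```

With a real file, the assumption is that calling to_a on the opened spreadsheet gives the same shape; that is worth checking against the roo version in use.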
You can also try
require 'rubygems'
require 'roo'
class InputExcelReader
  $INPUTPATH = 'C:\test_input_excel.xlsx'
  excel_data_array = Array.new()
  workbook = Roo::Spreadsheet.open($INPUTPATH)
  worksheets = workbook.sheets
  puts worksheets
  puts "Found #{worksheets.count} worksheets"
  worksheets.each do |worksheet|
    puts "Reading: #{worksheet}"
    num_rows = 0
    workbook.sheet(worksheet).each_row_streaming do |row|
      if(num_rows>0)
        puts "Reading the row no: #{num_rows}"
        row_cells = row.map { |cell|
          puts "Reading cells"
          cell.value
        }
        excel_data_array.push(row_cells)
      end
      num_rows += 1
    end
    puts excel_data_array.to_s
  end
end
I have a CSV from which I've removed the irrelevant data.
Now I need to split "Name and surname" into two columns on the space, ignoring a third name when there are three; then swap the order of the "Name and surname" and "Phone" columns (phone first); and then write the result to a file without the headers. I've never actually learned Ruby, but I played with Python 10 years ago. Can you help me? This is what I have so far:
require 'csv'
csv_table = CSV.read(ARGV[0], :headers => true)
keep = ["Name and surname", "Phone", "Email"]
new_csv_table = csv_table.by_col!.delete_if do |column_name,column_values|
  !keep.include? column_name
end
new_csv_table.to_csv
Begin by creating a CSV file.
str =<<~END
Name and surname,Phone,Email
John Doe,250-256-3145,John#Doe.com
Marsha Magpie,250-256-3154,Marsha#Magpie.com
END
File.write('t_in.csv', str)
#=> 109
Initially, let's read the file, add two columns, "Name" and "Surname", and optionally delete the column, "Name and surname", without regard to column order.
First read the file into a CSV::Table object.
require 'csv'
tbl = CSV.read('t_in.csv', headers: true)
#=> #<CSV::Table mode:col_or_row row_count:3>
Add the new columns.
tbl.each do |row|
  row["Name"], row["Surname"] = row["Name and surname"].split
end
#=> #<CSV::Table mode:col_or_row row_count:3>
Note that if row["Name and surname"] had equaled "John Paul Jones", we would have obtained row["Name"] #=> "John" and row["Surname"] #=> "Paul".
If the column "Name and surname" is no longer required we can delete it.
tbl.delete("Name and surname")
#=> ["John Doe", "Marsha Magpie"]
Write tbl to a new CSV file.
CSV.open('t_out.csv', "w") do |csv|
  csv << tbl.headers
  tbl.each { |row| csv << row }
end
#=> #<CSV::Table mode:col_or_row row_count:3>
Let's see what was written.
puts File.read('t_out.csv')
displays
Phone,Email,Name,Surname
250-256-3145,John#Doe.com,John,Doe
250-256-3154,Marsha#Magpie.com,Marsha,Magpie
Now let's rearrange the order of the columns.
header_order = ["Phone", "Name", "Surname", "Email"]
CSV.open('t_out.csv', "w") do |csv|
  csv << header_order
  tbl.each { |row| csv << header_order.map { |header| row[header] } }
end
#=> #<CSV::Table mode:col_or_row row_count:3>
puts File.read('t_out.csv')
displays
Phone,Name,Surname,Email
250-256-3145,John,Doe,John#Doe.com
250-256-3154,Marsha,Magpie,Marsha#Magpie.com
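The question also asks to ignore a third name and to leave out the header row. A minimal sketch of that variant follows; it works on strings rather than files so that it is self-contained, and the "John Paul Jones" row is a made-up sample added for illustration.

```ruby
require 'csv'

input = <<~END_CSV
  Name and surname,Phone,Email
  John Paul Jones,250-256-3145,John#Doe.com
  Marsha Magpie,250-256-3154,Marsha#Magpie.com
END_CSV

output = CSV.generate do |csv|
  CSV.parse(input, headers: true).each do |row|
    # first(2) keeps the first two names and ignores any third one.
    name, surname = row["Name and surname"].split.first(2)
    # Phone goes first; no header row is written.
    csv << [row["Phone"], name, surname]
  end
end

puts output
```

Writing the result to disk would then be a matter of File.write with a file name of your choice. Note that this keeps "Paul" as the surname for "John Paul Jones", matching the behavior described above; taking the last word instead is an equally plausible reading of the requirement.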
I have created a CSV file about my eshop that contains multiple items with different SKUs. Some SKUs appear more than once because they can be in more than one category (but the Title and Price will always be the same for a given SKU). Example:
SKU,Title,Category,Price
001,Soap,Bathroom,0.5
001,Soap,Kitchen,0.5
002,Water,Kitchen,0.4
002,Water,Garage,0.4
003,Juice,Kitchen,0.8
I now wish to create from that file another CSV file that has no duplicate SKUs and aggregates the "Category" attributes as follows:
SKU,Title,Category,Price
001,Soap,Bathroom/Kitchen,0.5
002,Water,Kitchen/Garage,0.4
003,Juice,Kitchen,0.8
How can I do that?
It's my understanding that you wish to read a CSV file, perform some operations on the data and then write the result to a new CSV file. You could do that as follows.
Code
require 'csv'
def convert(csv_file_in, csv_file_out, group_field, aggregate_field)
  # Use the method's parameters rather than the FNameIn/FNameOut constants.
  csv = CSV.read(csv_file_in, headers: true)
  headers = csv.headers
  arr = csv.group_by { |row| row[group_field] }.
            map do |_,a|
              headers.map { |h| h==aggregate_field ?
                (a.map { |row| row[aggregate_field] }.join('/')) : a.first[h] }
            end
  CSV.open(csv_file_out, "wb") do |csv_out|
    csv_out << headers
    arr.each { |row| csv_out << row }
  end
end
Example
Let's create a CSV file with the following data:
s =<<_
SKU,Title,Category,Price
001,Soap,Bathroom,0.5
001,Soap,Kitchen,0.5
002,Water,Kitchen,0.4
002,Water,Garage,0.4
003,Juice,Kitchen,0.8
_
FNameIn = 'testin.csv'
FNameOut = 'testout.csv'
IO.write(FNameIn, s)
#=> 135
Now execute the method with these values:
convert(FNameIn, FNameOut, "SKU", "Category")
and confirm FNameOut was written correctly:
puts IO.read(FNameOut)
SKU,Title,Category,Price
001,Soap,Bathroom/Kitchen,0.5
002,Water,Kitchen/Garage,0.4
003,Juice,Kitchen,0.8
Explanation
The steps are as follows:
csv_file_in = FNameIn
csv_file_out = FNameOut
group_field = "SKU"
aggregate_field = "Category"
csv = CSV.read(FNameIn, headers: true)
See CSV::read.
headers = csv.headers
#=> ["SKU", "Title", "Category", "Price"]
h = csv.group_by { |row| row[group_field] }
#=> {"001"=>[
# #<CSV::Row "SKU":"001" "Title":"Soap" "Category":"Bathroom" "Price":"0.5">,
# #<CSV::Row "SKU":"001" "Title":"Soap" "Category":"Kitchen" "Price":"0.5">
# ],
# "002"=>[
# #<CSV::Row "SKU":"002" "Title":"Water" "Category":"Kitchen" "Price":"0.4">,
# #<CSV::Row "SKU":"002" "Title":"Water" "Category":"Garage" "Price":"0.4">
# ],
# "003"=>[
# #<CSV::Row "SKU":"003" "Title":"Juice" "Category":"Kitchen" "Price":"0.8">
# ]
# }
arr = h.map do |_,a|
  headers.map { |h| h==aggregate_field ?
    (a.map { |row| row[aggregate_field] }.join('/')) : a.first[h] }
end
#=> [["001", "Soap", "Bathroom/Kitchen", "0.5"],
# ["002", "Water", "Kitchen/Garage", "0.4"],
# ["003", "Juice", "Kitchen", "0.8"]]
See CSV#headers and Enumerable#group_by, an oft-used method. Lastly, write the output file:
CSV.open(FNameOut, "wb") do |csv|
  csv << headers
  arr.each { |row| csv << row }
end
See CSV::open. Now let's return to the calculation of arr. This is most easily explained by inserting some puts statements and executing the code.
arr = h.map do |_,a|
  puts " _=#{_}"
  puts " a=#{a}"
  headers.map do |h|
    puts " header=#{h}"
    if h==aggregate_field
      a.map { |row| row[aggregate_field] }.join('/')
    else
      a.first[h]
    end.
    tap { |s| puts " mapped to #{s}" }
  end
end
See Object#tap. The following is displayed.
_=001
a=[#<CSV::Row "SKU":"001" "Title":"Soap" "Category":"Bathroom" "Price":"0.5">,
#<CSV::Row "SKU":"001" "Title":"Soap" "Category":"Kitchen" "Price":"0.5">]
header=SKU
mapped to 001
header=Title
mapped to Soap
header=Category
mapped to Bathroom/Kitchen
header=Price
mapped to 0.5
_=002
a=[#<CSV::Row "SKU":"002" "Title":"Water" "Category":"Kitchen" "Price":"0.4">,
#<CSV::Row "SKU":"002" "Title":"Water" "Category":"Garage" "Price":"0.4">]
header=SKU
mapped to 002
header=Title
mapped to Water
header=Category
mapped to Kitchen/Garage
header=Price
mapped to 0.4
_=003
a=[#<CSV::Row "SKU":"003" "Title":"Juice" "Category":"Kitchen" "Price":"0.8">]
header=SKU
mapped to 003
header=Title
mapped to Juice
header=Category
mapped to Kitchen
header=Price
mapped to 0.8
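The same grouping can also be tried without touching the file system, which is handy for experimenting. This is only a sketch: merge_rows is a hypothetical helper name, and the uniq call is an added assumption (that a category repeated for the same SKU should appear once), not something the question specifies.

```ruby
require 'csv'

# Hypothetical helper: same grouping idea as above, but CSV text in, CSV text out.
def merge_rows(csv_text, group_field, aggregate_field)
  table = CSV.parse(csv_text, headers: true)
  headers = table.headers
  CSV.generate do |out|
    out << headers
    table.group_by { |row| row[group_field] }.each_value do |rows|
      # Start from the first row of the group, then overwrite the aggregated field.
      merged = rows.first.to_h
      merged[aggregate_field] = rows.map { |row| row[aggregate_field] }.uniq.join('/')
      out << headers.map { |h| merged[h] }
    end
  end
end

sample = <<~END_CSV
  SKU,Title,Category,Price
  001,Soap,Bathroom,0.5
  001,Soap,Kitchen,0.5
  001,Soap,Kitchen,0.5
  003,Juice,Kitchen,0.8
END_CSV

result = merge_rows(sample, "SKU", "Category")
puts result
```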
For this to be correct, we must assume that rows sharing a SKU always have the same title and price. Since Category is the only field whose values you want to merge, here is how you can do it.
Assuming this is your test.csv in the same path as the ruby script:
# test.csv
SKU,Title,Category,Price
001,Soap,Bathroom,0.5
001,Soap,Kitchen,0.5
002,Water,Kitchen,0.4
002,Water,Garage,0.4
003,Juice,Kitchen,0.8
Ruby script in the same directory as your test.csv file:
# fix_csv.rb
require 'csv'
rows = CSV.read('test.csv', headers: true)
# Group the rows by SKU, then collapse each group into a single hash.
merged = rows.group_by { |row| row['SKU'] }.map do |_sku, group|
  new_data = group.first.to_h
  # Join the categories of every row that shares this SKU.
  new_data['Category'] = group.map { |row| row['Category'] }.join('/')
  new_data
end
CSV.open('merged_data.csv', 'w') do |csv|
  csv << merged.first.keys # writes the header row
  merged.each do |hash|
    csv << hash.values
  end
end
puts 'see contents of merged_data.csv'
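Because the merge silently keeps the title and price of the first row in each group, it may be worth checking the assumption before merging. The sketch below is one possible check; inconsistent_skus is a hypothetical helper, and the sample data deliberately contains a conflicting price for illustration.

```ruby
require 'csv'

# Hypothetical helper: returns the SKUs whose rows disagree on any of the
# given fields, so they can be inspected before merging.
def inconsistent_skus(csv_text, fields)
  CSV.parse(csv_text, headers: true)
     .group_by { |row| row['SKU'] }
     .select { |_sku, rows| rows.map { |row| fields.map { |f| row[f] } }.uniq.size > 1 }
     .keys
end

sample = <<~END_CSV
  SKU,Title,Category,Price
  001,Soap,Bathroom,0.5
  001,Soap,Kitchen,0.6
  002,Water,Kitchen,0.4
  002,Water,Garage,0.4
END_CSV

bad = inconsistent_skus(sample, ['Title', 'Price'])
p bad
```

An empty result means every SKU has a single title and price, so taking the first row of each group loses nothing.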
Read a CSV file and dynamically construct a new class named after the file. So if the CSV is persons.csv, the Ruby class should be person; if it's places.csv, the Ruby class should be places.
Also create methods for reading and displaying each value in the CSV file; the values in the first row of the file act as the method names.
Construct an array of objects and associate each object with a row of the CSV file. For example, the content of the CSV file could be:
name,age,city
abd,45,TUY
kjh,65,HJK
Previous code:
require 'csv'
class Feed
  def initialize(source_name, column_names = [])
    if column_names.empty?
      column_names = CSV.open(source_name, 'r', &:first)
    end
    columns = column_names.reduce({}) { |columns, col_name| columns[col_name] = []; columns }
    define_singleton_method(:columns) { column_names }
    column_names.each do |col_name|
      define_singleton_method(col_name.to_sym) { columns[col_name] }
    end
    CSV.foreach(source_name, headers: true) do |row|
      column_names.each do |col_name|
        columns[col_name] << row[col_name]
      end
    end
  end
end
feed = Feed.new('input.csv')
puts feed.columns #["name", "age", "city"]
puts feed.name # ["abd", "kjh"]
puts feed.age # ["45", "65"]
puts feed.city # ["TUY", "HJK"]
I am trying to refine this solution by using class methods and splitting the code into smaller methods. When I call the values from outside the class using the key names, I get errors like "undefined method `age' for Feed:Class". Is there a way I can access the values from outside the class?
My solution looks like this:
require 'csv'
class Feed
  attr_accessor :column_names
  def self.col_name(source_name, column_names = [])
    if column_names.empty?
      @column_names = CSV.open(source_name, :headers => true)
    end
    columns = @column_names.reduce({}) { |columns, col_name| columns[col_name] = []; columns }
  end
  def self.get_rows(source_name)
    col_name(source_name, column_names = [])
    define_singleton_method(:columns) { column_names }
    column_names.each do |col_name|
      define_singleton_method(col_name.to_sym) { columns[col_name] }
    end
    CSV.foreach(source_name, headers: true) do |row|
      @column_names.each do |col_name|
        columns[col_name] << row[col_name]
      end
    end
  end
end
obj = Feed.new
Feed.get_rows('Input.csv')
puts obj.class.columns
puts obj.class.name
puts obj.class.age
puts obj.class.city
Expected Result -
input = Input.new
p input.name # ["abd", "kjh"]
p input.age # ["45", "65"]
input.name ='XYZ' # Value must be appended to array
input.age = 25
p input.name # ["abd", "kjh", "XYZ"]
p input.age # ["45", "65", "25"]
Let's create the CSV file.
str =<<END
name,age,city
abd,45,TUY
kjh,65,HJK
END
FName = 'temp/persons.csv'
File.write(FName, str)
#=> 36
Now let's create a class:
klass = Class.new
#=> #<Class:0x000057d0519de8a0>
and name it:
class_name = File.basename(FName, ".csv").capitalize
#=> "Persons"
Object.const_set(class_name, klass)
#=> Persons
Persons.class
#=> Class
See File::basename, String#capitalize and Module#const_set.
Next read the CSV file with headers into a CSV::Table object:
require 'csv'
csv = CSV.read(FName, headers: true)
#=> #<CSV::Table mode:col_or_row row_count:3>
csv.class
#=> CSV::Table
See CSV::read. We may now create the methods name, age and city.
csv.headers.each { |header| klass.define_method(header) { csv[header] } }
See CSV::Table#headers, Module#define_method and CSV::Table#[].
We can now confirm they work as intended:
k = klass.new
k.name
#=> ["abd", "kjh"]
k.age
#=> ["45", "65"]
k.city
#=> ["TUY", "HJK"]
or
p = Persons.new
#=> #<Persons:0x0000598dc6b01640>
p.name
#=> ["abd", "kjh"]
and so on.
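The expected result in the question also has assignment append to the column rather than replace it. One way this might be done is sketched below; build_feed_class is a hypothetical helper, the class name is passed in rather than derived from a file name, and the CSV content is given as a string so that the example is self-contained.

```ruby
require 'csv'

# Hypothetical helper: builds a class whose readers return column arrays
# and whose writers append to them.
def build_feed_class(class_name, csv_text)
  table = CSV.parse(csv_text, headers: true)
  headers = table.headers

  klass = Class.new do
    define_method(:initialize) do
      # Each instance gets its own copy of every column.
      @columns = headers.each_with_object({}) { |h, cols| cols[h] = table[h].dup }
    end

    headers.each do |header|
      define_method(header) { @columns[header] }
      # Assignment appends instead of replacing, as the question requests.
      define_method("#{header}=") { |value| @columns[header] << value.to_s }
    end
  end

  Object.const_set(class_name, klass)
end

csv_text = <<~END_CSV
  name,age,city
  abd,45,TUY
  kjh,65,HJK
END_CSV

build_feed_class("Input", csv_text)

input = Input.new
input.name = 'XYZ'
input.age = 25
p input.name
p input.age
```

Converting the assigned value with to_s is a design choice made here so that the appended 25 matches the string values read from the file.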
I am trying to store the results from my scraping exercise into a CSV file.
The current CSV file gives me the following output:
Name of Movie 1
Rating 1
Name of Movie 2
Rating 2
I would like to get the following output:
Name of Movie 1 Rating 1
Name of Movie 2 Rating 2
Here is my code. I guess it has to do with the row / column separator:
require 'open-uri'
require 'nokogiri'
require 'csv'
array = []
for i in 1..10
  url = "http://www.allocine.fr/film/meilleurs//?page=#{i}"
  html_file = open(url).read
  html_doc = Nokogiri::HTML(html_file)
  html_doc.search('.img_side_content').each do |element|
    array << element.search('.no_underline').inner_text
    element.search('.note').each do |data|
      array << data.inner_text
    end
  end
end
puts array
csv_options = { row_sep: ',', force_quotes: true, quote_char: '"' }
filepath = 'allocine.csv'
CSV.open(filepath, 'wb', csv_options) do |csv|
  array.each { |item| csv << [item] }
end
I think the problem here is that you are not pushing the elements correctly into your array variable. Basically, your array ends up looking like this:
['Movie 1 Title', 'Movie 1 rating', 'Movie 2 Title', 'Movie 2 rating', ...]
What you actually want is an array of arrays, like so:
[
['Movie 1 Title', 'Movie 1 rating'],
['Movie 2 Title', 'Movie 2 rating'],
...
]
And once your array is correctly set, you don't even need to specify a row separator in your CSV options.
The following should do the trick:
require 'open-uri'
require 'nokogiri'
require 'csv'
array = []
# Pages are numbered from 1, as in the original for loop.
1.upto(10) do |i|
  url = "http://www.allocine.fr/film/meilleurs//?page=#{i}"
  # URI.open is the open-uri entry point; Kernel#open no longer accepts URLs in Ruby 3.
  html_file = URI.open(url).read
  html_doc = Nokogiri::HTML(html_file)
  html_doc.search('.img_side_content').each do |element|
    title = element.search('.no_underline').inner_text.strip
    notes = element.search('.note').map { |note| note.inner_text }
    # One row per movie: the title followed by its ratings.
    array << [title, notes].flatten
  end
end
puts array
filepath = 'allocine.csv'
csv_options = { force_quotes: true, quote_char: '"' }
CSV.open(filepath, 'w', **csv_options) do |csv|
  array.each do |item|
    csv << item
  end
end
(I also took the liberty of changing your for loop to 1.upto(10), which is more Ruby-like ;) )
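To see the difference between the two array shapes without scraping anything, here is a small sketch with placeholder values. It assumes exactly one rating per title; with a variable number of ratings per movie, building the rows inside the loop as above is the safer route.

```ruby
require 'csv'

# Flat list as produced by the original loop (placeholder values).
flat = ['Movie 1 Title', 'Movie 1 rating', 'Movie 2 Title', 'Movie 2 rating']

# each_slice(2) regroups the flat list into [title, rating] pairs.
rows = flat.each_slice(2).to_a

# Each inner array becomes one CSV line; no row separator option is needed.
csv_text = CSV.generate(force_quotes: true) do |csv|
  rows.each { |row| csv << row }
end

puts csv_text
```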
I am trying to scrape the allocine website as an exercise, and my output is the following:
Movie Name
Rating 1 Rating 2
Example :
Coco
4,14,6
Forrest Gump
2,64,6
It should instead be:
Movie Name
Rating 1
Rating 2
Hope you can help me!
require 'open-uri'
require 'nokogiri'
require 'csv'
array = []
for i in 1..10
  url = "http://www.allocine.fr/film/meilleurs//?page=#{i}"
  html_file = open(url).read
  html_doc = Nokogiri::HTML(html_file)
  html_doc.search('.img_side_content').each do |element|
    array << element.search('.no_underline').inner_text
    array << element.search('.note').inner_text
  end
end
puts array
csv_options = { col_sep: ',', force_quotes: true, quote_char: '"' }
filepath = 'allocine.csv'
CSV.open(filepath, 'wb', csv_options) do |csv|
  array.each { |item| csv << [item] }
end
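You call inner_text on the whole set of .note nodes at once, which concatenates the text of every match; this is why the ratings appear run together, with no space, in the console.
Instead, iterate with each and push every rating into your array separately, like this: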
element.search('.note').each do |data|
  array << data.inner_text
end
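The effect can be illustrated in plain Ruby, without Nokogiri. In this sketch the two strings are placeholder ratings for one movie, and join stands in for what happens when the text of all matched nodes is concatenated.

```ruby
# Placeholder ratings for one movie, written with the French decimal comma.
notes = ['4,1', '4,6']

# Concatenating every rating with no separator reproduces the run-together output.
joined = notes.join

# Pushing each rating separately keeps them apart.
array = []
notes.each { |note| array << note }

p joined
p array
```

The joined string is the "4,14,6" seen in the question, while the array keeps the two ratings as separate entries.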