I'm trying to take an Excel file and put it into Postgres. I can access the file and read rows from it:
require 'spreadsheet'
Spreadsheet.client_encoding = 'UTF-8'
book = Spreadsheet.open 'C:\test\RubyTestFile.xls'
sheet1 = book.worksheet 0
$test_array = []
sheet1.each do |row|
  $test_array += row
end
print $test_array
My problem is that it won't read null values (empty cells). Is there a method to grab, say, 3 columns of every row? Should I handle this when I upload to Postgres instead? Is there a better way of doing this? I tried searching but couldn't find anything.
Here's a slightly more idiomatic Ruby version:
require 'spreadsheet'
Spreadsheet.client_encoding = 'UTF-8'
def read_spreadsheet(path)
  book = Spreadsheet.open(path)
  sheet1 = book.worksheet 0
  test_array = [ ]
  sheet1.each do |row|
    # pad each row with nils, then keep exactly 3 cells
    test_array << (row + [ nil ] * 3).first(3)
  end
  test_array
end
puts read_spreadsheet('C:\test\RubyTestFile.xls').inspect
If you'd rather have a literal 'null' string in there, substitute it for the nil in the padding array.
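For reference, here is the same padding idea as a standalone helper. This is a minimal sketch: pad_row and its filler parameter are hypothetical names, not part of the spreadsheet gem. It uses Array#fetch with a default, which returns the filler for any index past the end of the row. Spreadsheet::Row is an Array subclass, so it should accept rows from sheet1.each directly; plain arrays are used here so the example runs without the gem.

```ruby
# Return exactly `width` cells: missing trailing cells become `filler`
# (nil by default), extra cells are dropped.
def pad_row(row, width, filler = nil)
  Array.new(width) { |i| row.fetch(i, filler) }
end

rows = [
  ['a', 1],        # trailing cell missing
  ['b', 2, 3],
  ['c', 4, 5, 6]   # one cell too many
]

padded = rows.map { |row| pad_row(row, 3) }
p padded  # [["a", 1, nil], ["b", 2, 3], ["c", 4, 5]]
```

Keeping nil rather than the string 'null' is usually the better choice if the rows end up in Postgres: the pg gem's exec_params sends a nil parameter as SQL NULL.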
I am trying to store the results of my scraping exercise in a CSV file.
The current CSV file gives me the following output:
Name of Movie 1
Rating 1
Name of Movie 2
Rating 2
I would like to get the following output:
Name of Movie 1 Rating 1
Name of Movie 2 Rating 2
Here is my code. I guess the problem has to do with the row/column separator:
require 'open-uri'
require 'nokogiri'
require 'csv'
array = []
for i in 1..10
  url = "http://www.allocine.fr/film/meilleurs//?page=#{i}"
  html_file = URI.open(url).read
  html_doc = Nokogiri::HTML(html_file)
  html_doc.search('.img_side_content').each do |element|
    array << element.search('.no_underline').inner_text
    element.search('.note').each do |data|
      array << data.inner_text
    end
  end
end
puts array
csv_options = { row_sep: ',', force_quotes: true, quote_char: '"' }
filepath = 'allocine.csv'
CSV.open(filepath, 'wb', **csv_options) do |csv|
  array.each { |item| csv << [item] }
end
I think the problem here is that you are not pushing the elements correctly into your array variable. Basically, your array ends up looking like this:
['Movie 1 Title', 'Movie 1 rating', 'Movie 2 Title', 'Movie 2 rating', ...]
What you actually want is an array of arrays, like so:
[
['Movie 1 Title', 'Movie 1 rating'],
['Movie 2 Title', 'Movie 2 rating'],
...
]
And once your array is correctly set, you don't even need to specify a row separator in your CSV options.
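If you keep the question's loop unchanged, the flat array can also be regrouped after the fact with Enumerable#each_slice. This is a minimal sketch that assumes every movie contributes exactly one rating; the map over '.note' in the code below handles any number of ratings, which makes it the safer fix. The sample titles and ratings are made up, and the CSV is written to a string with CSV.generate so the example runs without a network or a file.

```ruby
require 'csv'

# Flat array in the shape the question's loop builds: title, rating, title, rating, ...
# (made-up sample values)
flat = ['Movie 1', '4.5', 'Movie 2', '4.2']

# Regroup into [title, rating] pairs. This only stays aligned if every movie
# contributed exactly one rating.
pairs = flat.each_slice(2).to_a

# One CSV line per pair, using the default row separator ("\n").
csv_text = CSV.generate(force_quotes: true) do |csv|
  pairs.each { |pair| csv << pair }
end

puts csv_text
# "Movie 1","4.5"
# "Movie 2","4.2"
```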
The following should do the trick:
require 'open-uri'
require 'nokogiri'
require 'csv'
array = []
10.times do |i|
  # times counts from 0, so add 1 to request pages 1..10 like the original loop
  url = "http://www.allocine.fr/film/meilleurs//?page=#{i + 1}"
  html_file = URI.open(url).read
  html_doc = Nokogiri::HTML(html_file)
  html_doc.search('.img_side_content').each do |element|
    title = element.search('.no_underline').inner_text.strip
    notes = element.search('.note').map { |note| note.inner_text }
    # one row per movie: [title, rating, rating, ...]
    array << [title, notes].flatten
  end
end
puts array
filepath = 'allocine.csv'
csv_options = { force_quotes: true, quote_char: '"' }
CSV.open(filepath, 'w', **csv_options) do |csv|
  array.each do |item|
    csv << item
  end
end
(I also took the liberty of changing your for loop to a times loop, which is more Ruby-like ;) )
I am trying to sum two matrices read from CSV files.
Currently, I read them into two arrays and then turn the arrays into matrices. However, when I print the sum, I get concatenated strings instead of summed integers.
require 'csv'
require 'matrix'
matrix1 = "./matrix1.csv"
matrix2 = "./matrix2.csv"
line_count = 0
elements_in_line_count = 0
arr1 = Array.new #=> []
arr2 = Array.new #=> []
CSV.foreach(matrix1) do |row|
  arr1 << row
  line_count += 1
  elements_in_line_count = row.size
end
n1 = elements_in_line_count
m1 = line_count
# find n and m of second matrix
line_count = 0
elements_in_line_count = 0
CSV.foreach(matrix2) do |row|
  # print row
  arr2 << row
  line_count += 1
  elements_in_line_count = row.size
end
puts Matrix.rows(arr1) + Matrix.rows(arr2)
For example, CSV 1 is:
1,2
3,4
Same for CSV 2.
The output is:
Matrix[[11, 22], [33, 44]]
But I want it to be Matrix[[2, 4], [6, 8]].
When you read in a CSV file, every field is a String by default. The Ruby CSV class accepts an optional :converters parameter to foreach, new and similar methods, which it uses to convert each applicable field. One of the built-in converters is:
:integer
Converts any field Integer() accepts.
So you can also change your code to look like this:
csv_options = { converters: [:integer] }
CSV.foreach(matrix1, **csv_options) do |row|
  # ...
CSV.foreach(matrix2, **csv_options) do |row|
to achieve results similar to calling map(&:to_i) on each row.
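To see why the sum came out as 11, 22 and so on, compare parsing with and without the :integer converter. This is a minimal sketch using CSV.parse on an in-memory string instead of the matrix1.csv file, so it runs on its own. The last two lines show why the results are only "similar" to map(&:to_i): with the converter, empty fields stay nil and non-numeric fields stay strings, while to_i turns both into 0.

```ruby
require 'csv'

text = "1,2\n3,4\n"

as_strings  = CSV.parse(text)                        # every field is a String
as_integers = CSV.parse(text, converters: :integer)  # fields Integer() accepts become Integers

p as_strings   # [["1", "2"], ["3", "4"]]
p as_integers  # [[1, 2], [3, 4]]

# String#+ concatenates, which is where the 11, 22, ... in the question came from
p as_strings[0][0] + as_strings[0][0]    # "11"
p as_integers[0][0] + as_integers[0][0]  # 2

# Where the converter and to_i differ
p CSV.parse_line('1,,x', converters: :integer)  # [1, nil, "x"]
p CSV.parse_line('1,,x').map(&:to_i)            # [1, 0, 0]
```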
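Alternatively, you can skip the Matrix class: convert each row with map(&:to_i) and add the rows element by element:
[matrix1, matrix2].map do |path|
  CSV.foreach(path).map { |row| row.map(&:to_i) }
end.reduce do |a, b|
  a.map.with_index do |row, idx|
    row.zip(b[idx]).map { |cell1, cell2| cell1 + cell2 }
  end
end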
When you're reading in the CSV, all columns will be strings, so you'll have to manually do the conversion to a number in the code.
If all of the columns in the CSV are meant to be numbers, you can call .map(&:to_i) on each row, like this:
CSV.foreach(matrix1) do |row|
  arr1 << row.map(&:to_i) # <-- this will turn everything in the row into a number
  line_count += 1
  elements_in_line_count = row.size
end
As you want to add matrices, consider using Ruby's built-in Matrix class, and the instance method Matrix#+ in particular.
Let's first construct three CSV files.
fname1 = 't1.csv'
fname2 = 't2.csv'
fname3 = 't3.csv'
File.write(fname1, "1,2\n3,4")
#=> 7
File.write(fname2, "100,200\n300,400")
#=> 15
File.write(fname3, "1000,2000\n3000,4000")
#=> 19
We can sum the underlying matrices as follows.
require 'csv'
require 'matrix'
# Define the helper before calling it; Ruby runs a script top to bottom.
def matrix_from_CSV(fname)
  Matrix[*CSV.read(fname, converters: [:integer])]
end
fnames = [fname1, fname2, fname3]
fnames.drop(1).reduce(matrix_from_CSV(fnames.first)) do |t, fname|
  t + matrix_from_CSV(fname)
end.to_a
  #=> [[1101, 2202],
  #    [3303, 4404]]
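reduce also works without an explicit starting value: it then uses the first element as the initial accumulator, which is what the drop(1)/first combination above does by hand. A minimal sketch, reading the same numbers from strings instead of the t1.csv to t3.csv files so it is self-contained (matrix_from_csv_text is a hypothetical helper):

```ruby
require 'csv'
require 'matrix'

# Build a Matrix from CSV text, converting every field with Integer().
def matrix_from_csv_text(text)
  Matrix.rows(CSV.parse(text, converters: :integer))
end

texts = ["1,2\n3,4", "100,200\n300,400", "1000,2000\n3000,4000"]

# reduce(:+) folds Matrix#+ over the list, starting from the first matrix.
sum = texts.map { |t| matrix_from_csv_text(t) }.reduce(:+)

p sum.to_a  # [[1101, 2202], [3303, 4404]]
```

Two caveats: reduce(:+) returns nil for an empty list, and Matrix#+ raises an error if the matrices do not all have the same dimensions.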
I borrowed converters: [:integer] from #Simple's answer. I wasn't aware of that.
I am using the following code to convert a CSV file to a spreadsheet:
require 'spreadsheet'
require 'csv'
book = Spreadsheet::Workbook.new
sheet1 = book.create_worksheet
CSV.open("product_2014-11-19_10-41-00.csv", 'r') do |csv|
  csv.each_with_index do |row, i|
    sheet1.row(i).replace(row)
  end
end
book.write("temp.xls")
But when I do this, the spreadsheet shows a leading quote in columns that hold integer values. For example, for the row SGDEL,18,,,,140,0,Bib the cells for 18, 140 and 0 become '18, '140 and '0. Why is this so, and how can I fix it?
Thanks
There is a force_quotes option for CSV.open:
CSV.open("…", "…", force_quotes: false) do |csv|   # <-- this option
Setting it to false should do the trick.
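Note that force_quotes only affects how CSV writes fields, and here the file is only being read. If the apostrophes persist, one likely cause (an assumption, not verified against the spreadsheet gem) is that CSV yields every field as a String, so "18" is stored as a text cell rather than a number. A minimal sketch of converting numeric-looking fields with the built-in :numeric converter, using the row from the question:

```ruby
require 'csv'

line = 'SGDEL,18,,,,140,0,Bib'

# Without converters every non-empty field is a String.
raw = CSV.parse_line(line)
# => ["SGDEL", "18", nil, nil, nil, "140", "0", "Bib"]

# :numeric tries Integer() and then Float() on each field and leaves
# anything that fails (and empty fields) untouched.
converted = CSV.parse_line(line, converters: :numeric)
# => ["SGDEL", 18, nil, nil, nil, 140, 0, "Bib"]
```

In the question's loop, that would mean opening the file with CSV.open("product_2014-11-19_10-41-00.csv", 'r', converters: :numeric), so that sheet1.row(i).replace(row) receives Integers instead of Strings.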
I am extracting data from Excel using the spreadsheet gem in Ruby, and it works well. This is the code that does it:
require 'spreadsheet'
require 'open-uri'
url = "Linio_batch1_semantic_24092014.xls"
book = nil
a1 = Array.new
a2 = Array.new
open url do |f|
  book = Spreadsheet.open f
end
book.worksheets.each do |sheet|
  #puts "Sheet called #{sheet.name} has #{sheet.row_count} rows and #{sheet.column_count} columns"
  s = sheet.column(5)
  s.each do |m|
    a1 << m
  end
  s = sheet.column(6)
  s.each do |n|
    a2 << n
  end
end
I am storing the results in arrays, but I don't know how to write them to a new spreadsheet. How can I write the array contents to a new spreadsheet?
You can write something like the following:
require 'spreadsheet'
book = Spreadsheet::Workbook.new
sheet1 = book.create_worksheet :name => 'test'
sheet1.row(0).push "just text","another text"
book.write 'test.xls'
You can also refer to this page or this page
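Since the spreadsheet API writes row by row, one way to handle the a1 and a2 columns is to turn them into rows first. This is a minimal sketch: columns_to_rows is a hypothetical helper, the sample values are made up, and the spreadsheet calls are left as comments (they mirror sheet1.row(i).replace(row) and book.write used elsewhere on this page) so the block runs without the gem.

```ruby
# Turn several column arrays into rows, padding shorter columns with nil
# so that no value is dropped.
def columns_to_rows(*columns)
  height = columns.map(&:size).max || 0
  Array.new(height) { |i| columns.map { |col| col[i] } }
end

a1 = ['Name', 'Foo', 'Bar']  # made-up contents of column 5
a2 = ['Price', 10]           # made-up contents of column 6

rows = columns_to_rows(a1, a2)
# => [["Name", "Price"], ["Foo", 10], ["Bar", nil]]

# With the spreadsheet gem, each row could then be written like this
# ('output.xls' is a placeholder file name):
#   rows.each_with_index { |row, i| sheet1.row(i).replace(row) }
#   book.write 'output.xls'
```

a1.zip(a2) would also produce rows, but it silently drops values whenever a2 is longer than a1; padding to the longest column avoids that.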
Here is the link to an XLS file (the URL is in the code below). I am trying to use the Spreadsheet gem to extract its contents. In particular, I want to collect all the column headers (Year, Gross National Product, etc.). The issue is that they are not in the same row: for example, the Gross National Income header spans three rows. I also want to know how many rows are merged to make the cell 'Year'.
I have started writing the program, and this is as far as I have got:
require 'rubygems'
require 'open-uri'
require 'spreadsheet'
rows = Array.new
url = 'http://www.stats.gov.cn/tjsj/ndsj/2012/html/C0201e.xls'
doc = Spreadsheet.open(URI.open(url))
sheet1 = doc.worksheet 0
sheet1.each do |row|
  if row.is_a? Spreadsheet::Formula
    # puts row.value
    rows << row.value
  else
    # puts row
    rows << row
  end
  # puts row.value
end
But now I am stuck and really need some guidance on how to proceed. Any help is much appreciated.
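This collects the header rows (from the row starting with "Year" down to the first data row) and joins each column's pieces into one header:
require 'rubygems'
require 'open-uri'
require 'spreadsheet'
rows = Array.new
temp_rows = Array.new
column_headers = Array.new
index = 0
url = 'http://www.stats.gov.cn/tjsj/ndsj/2012/html/C0201e.xls'
doc = Spreadsheet.open(URI.open(url))
sheet1 = doc.worksheet 0
# copy every row into a plain array
sheet1.each do |row|
  rows << row.to_a
end
# find the row where the header block starts (first cell is "Year")
rows.each_with_index do |row, ind|
  if row[0] == "Year"
    index = ind
    break
  end
end
# collect header rows until the first data row (first cell contains a digit);
# to_s is needed because year cells may be numbers rather than strings
(index...rows.size).each do |i|
  # puts rows[i].inspect
  if rows[i][0].to_s =~ /[0-9]/
    break
  else
    temp_rows << rows[i]
  end
end
col_size = temp_rows[0].size
# puts temp_rows.inspect
# for each column, join the header pieces spread over several rows
col_size.times do |c|
  temp_str = ""
  temp_rows.each do |row|
    temp_str += ' ' + row[c].to_s unless row[c].nil?
  end
  # puts temp_str.inspect
  column_headers << temp_str unless temp_str.nil?
end
puts 'Column Headers of this xls file are : '
# puts column_headers.inspect
column_headers.each do |col|
  puts col.strip.inspect if col.length > 1
end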
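The column-joining loop can also be written more compactly. This is a minimal sketch: combine_header_rows is a hypothetical helper, and the header rows below are made up to mimic the layout described in the question (a merged 'Year' cell next to headers split across rows). Calling to_s on every cell also keeps numeric or empty cells from breaking the string concatenation.

```ruby
# For each column, join the non-empty cells from all header rows with a space.
def combine_header_rows(header_rows)
  width = header_rows.map(&:size).max || 0
  Array.new(width) do |c|
    header_rows.map { |row| row[c].to_s.strip }
               .reject(&:empty?)
               .join(' ')
  end
end

# Made-up header rows: "Year" occupies one merged cell, while the other
# headers are spread over several rows.
header_rows = [
  ['Year', 'Gross National', nil],
  [nil,    'Income',         'Population'],
  [nil,    '(100 million yuan)', '(10 000 persons)']
]

p combine_header_rows(header_rows)
# ["Year", "Gross National Income (100 million yuan)", "Population (10 000 persons)"]
```

In the answer's code, column_headers = combine_header_rows(temp_rows) would stand in for the col_size.times loop.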