reading only the second row of an xlsx file using roo - ruby

I am using roo to read an xlsx file.
book = Roo::Spreadsheet.open("codes.xlsx")
sheet = book.sheet('Sheet1')
This reads all the rows in the file.
I can iterate through this sheet using
plans_sheet.each(column_1: 'column_1', column_2: 'column_2') do |hash|
But this iteration happens from the first row which has all the column names as well.
I need to take only the second row data.
Update -
When you call row(2), it returns an array. When you iterate using .each, it returns a hash that has the column names as its keys.
How can I get only the second row as a hash?

The Roo::Excelx#each method returns a standard Ruby Enumerator:
enum = sheet.each(column_1: 'column_1', column_2: 'column_2')
enum.class # => Enumerator
So there are two ways to achieve your goal:
1) Use drop method:
enum.drop(1).each do |hash|
# do something with the hash
end
If you need only the second row:
hash = enum.drop(1).first
2) Move the internal position of the iterator:
enum.next # move 1 step forward to skip the first row
# continue moving
loop do
hash = enum.next
# do something with the hash
end
If you need only the second row:
enum.next # skip the first row
hash = enum.next # second row
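Both approaches rely only on standard Enumerator behavior, so they can be tried without a spreadsheet. The following is a minimal sketch in which a plain array of hashes stands in for the rows that Roo would yield; the rows array and its values are made up for illustration and are not part of Roo.

```ruby
# Hypothetical stand-in for the hashes yielded by sheet.each(...):
# the first element mirrors the header row, the rest are data rows.
rows = [
  { column_1: 'column_1', column_2: 'column_2' }, # header row
  { column_1: 'a1', column_2: 'b1' },             # second row
  { column_1: 'a2', column_2: 'b2' }
]

enum = rows.each # an Enumerator, because no block was given

# Way 1: drop the first element, then take the next one.
second_via_drop = enum.drop(1).first

# Way 2: advance the enumerator's internal position by hand.
enum.next                   # skip the header row
second_via_next = enum.next # second row
enum.rewind                 # reset the position for any later use

puts second_via_drop == second_via_next
```

Note that drop builds an array of all remaining rows, while next only advances one step at a time, which may matter for large sheets.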
Also take into account:
There is also the Roo::Excelx::Sheet class, which represents a worksheet and has an each_row method that accepts an :offset parameter. Unfortunately, it doesn't have an option to transform a row into a hash with given keys.
book = Roo::Spreadsheet.open("codes.xlsx")
sheet = book.sheet_for('Sheet1')
sheet.each_row(offset: 1) do |row| # skip first row
# do something with the row
end

If you want to iterate over rows starting from the second one, just use Enumerable#drop and then iterate over the result (drop itself ignores a block, so chain each after it):
plans_sheet.each(column_1: 'column_1', column_2: 'column_2').drop(1).each do |hash|
# ...
end

Just do
whatever_sheet.row(2)
It is always good to have a look at the documentation. This is explained in the README.md of the gem, see https://github.com/roo-rb/roo#accessing-rows-and-columns

Related

Ruby: How to iterate through a hash created from a csv file

I am trying to take an existing CSV file, add a fourth column to it, and then iterate through the second and third columns to create the fourth column's values. Using Ruby I've created hashes where the headers are the keys and the column values are the hash values (ex: "id"=>"1", "new_fruit" => "apple")
My practice CSV file looks like this: [image: practice CSV file]
My goal is to create a fourth column, "brand_new" (which I was able to do), and then add values to it by concatenating the values from the second and third columns (which I am stuck on). At the moment I just have a placeholder value (x) for the fourth column's values so I could see if adding the fourth column to the hash actually worked: [image: results with x = 1]
Here is my code:
require 'csv'
def self.import
table = []
CSV.foreach(File.path("practice.csv"), headers: true) do |row|
table.each do |row|
row["brand_new"] = full_name
end
table << row.to_h
end
table
end
def full_name
x = 1
return x
end
# Add another col, row by row:
import.each do |row|
row["brand_new"] = full_name
end
puts import
Any suggestions or guidance would be much appreciated. Thank you.
I simplified your code a bit. I read the file first, then iterate over the content that was read.
Note: Change col_sep to comma or delete it to use the default if needed.
require "csv"
def self.import
table = CSV.read("practice.csv", headers: true , col_sep: ";")
table.each do |row|
row["brand_new"] = "#{row["old_fruit"]} #{row["new_fruit"]}"
end
puts table
end
I use the read method to read the CSV file content. It allows you to directly access the column/cell values.
The line inside the each block shows how to concatenate the column values as a string:
"#{row["old_fruit"]} #{row["new_fruit"]}"
Refer to this old SO post and the CSV Ruby docs to learn more about working with CSV files.
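To try the same idea without a file on disk, here is a minimal sketch that parses an inline string instead of practice.csv. The sample data is invented, and it assumes comma-separated values and the column names old_fruit and new_fruit used above.

```ruby
require 'csv'

# Made-up sample data standing in for practice.csv.
csv_text = <<~CSV_TEXT
  id,old_fruit,new_fruit
  1,apple,pear
  2,plum,fig
CSV_TEXT

table = CSV.parse(csv_text, headers: true)

# Add the fourth column, row by row, from the second and third columns.
table.each do |row|
  row['brand_new'] = "#{row['old_fruit']} #{row['new_fruit']}"
end

puts table.to_csv
```

Because each row is a CSV::Row, assigning to a header that does not exist yet appends a new field to that row.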

CSV iteration in Ruby, and grouping by column value to get last line of each group

I have a csv of transaction data, with columns like:
ID,Name,Transaction Value,Running Total,
5,mike,5,5,
5,mike,2,7,
20,bob,1,1,
20,bob,15,16,
1,jane,4,4,
etc...
I need to loop through every line and do something with the transaction value, and do something different when I get to the last line of each ID.
I currently do something like this:
total = ""
id = ""
idHold = ""
totalHold = ""
CSV.foreach(csvFile) do |row|
totalHold = total
idHold = id
id = row[0]
value = row[2]
total = row[3]
if id != idHold
# do stuff with the totalHold here
end
end
But this has a problem - it skips the last line. Also, something about it doesn't feel right. I feel like there should be a better way of detecting the last line of an 'ID'.
Is there a way of grouping the id's and then detecting the last item in the id group?
note: all id's are grouped together in the csv
Let's first construct a CSV file.
str =<<~END
ID,Name,Transaction Value,Running Total
5,mike,5,5
5,mike,2,7
20,bob,1,1
20,bob,15,16
1,jane,4,4
END
CSVFile = 't.csv'
File.write(CSVFile, str)
#=> 107
I will first create a method that takes two arguments: an instance of CSV::Row and a boolean to indicate whether the CSV row is the last of the group (true if it is).
def process_row(row, is_last)
puts "Do something with row #{row}"
puts "last row: #{is_last}"
end
This method would of course be modified to perform whatever operations need be performed for each row.
Below are three ways to process the file. All three use the method CSV::foreach to read the file line-by-line. This method is called with two arguments, the file name and an options hash { headers: true, converters: :numeric } that indicates that the first line of the file is a header row and that strings representing numbers are to be converted to the appropriate numeric object. Here values for "ID", "Transaction Value" and "Running Total" will be converted to integers.
Though it is not mentioned in the doc, when foreach is called without a block it returns an enumerator (in the same way that IO::foreach does).
We of course need:
require 'csv'
Chain foreach to Enumerable#chunk
I have chosen to use chunk, as opposed to Enumerable#group_by, because the lines of the file are already grouped by ID.
CSV.foreach(CSVFile, headers:true, converters: :numeric).
chunk { |row| row['ID'] }.
each do |_,(*arr, last_row)|
arr.each { |row| process_row(row, false) }
process_row(last_row, true)
end
displays
Do something with row 5,mike,5,5
last row: false
Do something with row 5,mike,2,7
last row: true
Do something with row 20,bob,1,1
last row: false
Do something with row 20,bob,15,16
last row: true
Do something with row 1,jane,4,4
last row: true
Note that
enum = CSV.foreach(CSVFile, headers:true, converters: :numeric).
chunk { |row| row['ID'] }.
each
#=> #<Enumerator: #<Enumerator::Generator:0x00007ffd1a831070>:each>
Each element generated by this enumerator is passed to the block and the block variables are assigned values by a process called array decomposition:
_,(*arr,last_row) = enum.next
#=> [5, [#<CSV::Row "ID":5 "Name":"mike" "Transaction Value":5 "Running Total ":5>,
# #<CSV::Row "ID":5 "Name":"mike" "Transaction Value":2 "Running Total ":7>]]
resulting in the following:
_ #=> 5
arr
#=> [#<CSV::Row "ID":5 "Name":"mike" "Transaction Value":5 "Running Total ":5>]
last_row
#=> #<CSV::Row "ID":5 "Name":"mike" "Transaction Value":2 "Running Total ":7>
See Enumerator#next.
I have followed the convention of using an underscore for block variables that are not used in the block calculation (to alert readers of your code). Note that an underscore is a valid block variable (see footnote 1).
Use Enumerable#slice_when in place of chunk
CSV.foreach(CSVFile, headers:true, converters: :numeric).
slice_when { |row1,row2| row1['ID'] != row2['ID'] }.
each do |*arr, last_row|
arr.each { |row| process_row(row, false) }
process_row(last_row, true)
end
This displays the same information that is produced when chunk is used.
Use Kernel#loop to step through the enumerator CSV.foreach(CSVFile, headers:true)
enum = CSV.foreach(CSVFile, headers:true, converters: :numeric)
row = nil
loop do
row = enum.next
next_row = enum.peek
process_row(row, row['ID'] != next_row['ID'])
end
process_row(row, true)
This displays the same information that is produced when chunk is used. See Enumerator#next and Enumerator#peek.
After enum.next returns the last CSV::Row object enum.peek will generate a StopIteration exception. As explained in its doc, loop handles that exception by breaking out of the loop. row must be initialized to an arbitrary value before entering the loop so that row is visible after the loop terminates. At that time row will contain the CSV::Row object for the last line of the file.
1 IRB uses the underscore for its own purposes, resulting in the block variable _ being assigned an erroneous value when the code above is run.
Yes.. ruby has got your back.
grouped = CSV.table('./test.csv').group_by { |r| r[:id] }
# Then process the rows of each group individually:
grouped.map { |id, rows|
puts [id, rows.length ]
}
Tip: You can access each row as a hash by using CSV.table
CSV.table('./test.csv').first[:name]
=> "mike"
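The grouping above can be extended to pick out the last row of each ID group, which is what the question asks for. The following is a rough sketch, assuming the sample data from the question; it parses an inline string with the same options that CSV.table applies (headers, numeric converters, symbolized header names) rather than reading ./test.csv.

```ruby
require 'csv'

# Sample data from the question, inlined so the sketch is self-contained.
csv_text = <<~CSV_TEXT
  ID,Name,Transaction Value,Running Total
  5,mike,5,5
  5,mike,2,7
  20,bob,1,1
  20,bob,15,16
  1,jane,4,4
CSV_TEXT

table = CSV.parse(csv_text, headers: true,
                            converters: :numeric,
                            header_converters: :symbol)

# Group rows by ID, then keep the running total of the last row in each group.
last_totals = table.group_by { |row| row[:id] }
                   .transform_values { |rows| rows.last[:running_total] }

last_totals.each { |id, total| puts "#{id}: #{total}" }
```

Unlike chunk, group_by does not require the IDs to be contiguous in the file, but it holds all rows in memory at once.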

Why am I getting the error `stack level too deep (systemstackerror)` when hashing 2 columns in a CSV?

This is my code, which is supposed to hash the 2 columns in fotoFd.csv and then save the hashed columns in a separate file, T4Friendship.csv:
require "csv"
arrayUser=[]
arrayUserUnique=[]
arrayFriends=[]
fileLink = "fotoFd.csv"
f = File.open(fileLink, "r")
f.each_line { |line|
row = line.split(",");
arrayUser<<row[0]
arrayFriends<<row[1]
}
arrayUserUnique = arrayUser.uniq
arrayHash = []
for i in 0..arrayUser.size-1
arrayHash<<arrayUser[i]
arrayHash<<i
end
hash = Hash[arrayHash.each_slice(2).to_a]
array1 =hash.values_at *arrayUser
array2 =hash.values_at *arrayFriends
fileLink = "T4Friendship.csv"
for i in 0..array1.size-1
logfile = File.new(fileLink,"a")
logfile.print("#{array1[i]},#{array2[i]}\n")
logfile.close
end
The first column contains users, and the second column contains their friends. So, I want it to produce something like this in the T4Friendship.csv:
1 2
1 4
1 10
1 35
2 1
2 8
2 11
3 28
3 31
...
The problem is caused by the splat expansion of a large array. The splat * can be used to expand an array as a parameter list. The parameters are passed on the stack. If there are too many parameters, you'll exhaust stack space and get the mentioned error.
Here's a quick demo of the problem in irb that tries to splat an array of one million elements when calling puts:
irb
irb(main):001:0> a = [0] * 1000000; nil # Use nil to suppress statement output
=> nil
irb(main):002:0> puts *a
SystemStackError: stack level too deep
from /usr/lib/ruby/1.9.1/irb/workspace.rb:80
Maybe IRB bug!
irb(main):003:0>
You seem to be processing large CSV files, and so your arrayUser array is quite large. Expanding the large array with the splat causes the problem on the line:
array1 =hash.values_at *arrayUser
You can avoid the splat by calling map on arrayUser, and converting each value in a block:
array1 = arrayUser.map{ |user| hash[user] }
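On a small input the two forms give the same result, which can be checked with a short sketch like the one below. The names and ID numbers are invented for illustration.

```ruby
# Made-up lookup table and user list.
hash  = { 'ann' => 0, 'bob' => 1, 'cy' => 2 }
users = %w[ann bob ann cy]

# Splat version: every element becomes a separate method argument.
via_splat = hash.values_at(*users)

# Map version: one lookup per element, no argument list to build.
via_map = users.map { |user| hash[user] }

puts via_splat == via_map
```

The map form keeps working as the array grows, because it never expands the array into an argument list.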
Suggested Code
Your code appears to map names to unique ID numbers. The output appears to be the same format as the input, except with the names translated to ID numbers. You can do this without keeping any arrays around eating up memory, and just use a single hash built up during read, and used to translate the names to numbers on the fly. The code would look like this:
def convertCsvNamesToNums(inputFileName, outputFileName)
  # Create unique ID number hash
  # When an unknown key is looked up, it is added with a new unique ID number
  # Produces a 0 based index
  nameHash = Hash.new { |hash, key| hash[key] = hash.size }
  # Convert input CSV with names to output CSV with ID numbers
  File.open(inputFileName, "r") do |inputFile|
    File.open(outputFileName, 'w') do |outputFile|
      inputFile.each_line do |line|
        # Parse names from input CSV; chomp strips the trailing newline so
        # that "bob\n" in the last column matches "bob" in the first
        userName, friendName = line.chomp.split(",")
        # Map names to unique ID numbers
        userNum = nameHash[userName]
        friendNum = nameHash[friendName]
        # Write unique ID numbers to output CSV
        outputFile.puts "#{userNum}, #{friendNum}"
      end
    end
  end
end
convertCsvNamesToNums("fotoFd.csv", "T4Friendship.csv")
Note: This code assigns ID numbers to users and friends as they are encountered. Your previous code assigned ID numbers to users only, and then looked up the friends afterwards. The code I suggested will ensure friends are assigned ID numbers, even if they never appeared in the user list. The numerical ordering will differ slightly from what you supplied, but I assume that is not important.
You can also shorten the body of the inner loop to:
# Parse names from input, map to ID numbers, and write to output
outputFile.puts line.chomp.split(",").map{|name| nameHash[name]}.join(',')
I thought I'd include this change separately for readability.
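The default block passed to Hash.new is what hands out the ID numbers, so it may help to see it in isolation. This is a small sketch with invented names; it also shows why a trailing newline on the last field matters when the names are used as hash keys.

```ruby
# A hash that assigns the next free 0-based ID to any key it has not seen.
name_ids = Hash.new { |hash, key| hash[key] = hash.size }

first_ann  = name_ids['ann'] # new key, gets the current size as its ID
first_bob  = name_ids['bob'] # new key, gets the next ID
second_ann = name_ids['ann'] # known key, the stored ID is returned

# each_line yields lines with their newline, so the last field keeps it
# unless the line is chomped first.
raw_fields     = "ann,bob\n".split(',')
chomped_fields = "ann,bob\n".chomp.split(',')

puts name_ids.inspect
puts raw_fields.inspect
puts chomped_fields.inspect
```

Without chomp, "bob\n" and "bob" would be treated as two different names and receive two different ID numbers.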
Updated Code
As per your request in the comments, here is code that gives priority to the user column for ID numbers. Only once the first column is completely processed will ID numbers be assigned to entries in the second column. It does this by first passing over the input once, adding the first column to the hash, and then passing over the input a second time to process it as before, using the pre-prepared hash from the first pass. New entries can still be added in the second pass in the case where the friend column contains a new entry that doesn't exist anywhere in the user column.
def convertCsvNamesToNums(inputFileName, outputFileName)
  # Create unique ID number hash
  # When an unknown key is looked up, it is added with a new unique ID number
  # Produces a 0 based index
  nameHash = Hash.new { |hash, key| hash[key] = hash.size }
  # Pass over the data once to give priority to user column for ID numbers
  File.open(inputFileName, "r") do |inputFile|
    inputFile.each_line do |line|
      name, = line.chomp.split(",") # Parse name from line, ignore the rest
      nameHash[name] # Add name to unique ID number hash (if it doesn't already exist)
    end
  end
  # Convert input CSV with names to output CSV with ID numbers
  File.open(inputFileName, "r") do |inputFile|
    File.open(outputFileName, 'w') do |outputFile|
      inputFile.each_line do |line|
        # Parse names from input (without the trailing newline), map to ID numbers, and write to output
        outputFile.puts line.chomp.split(",").map{|name| nameHash[name]}.join(',')
      end
    end
  end
end
convertCsvNamesToNums("fotoFd.csv", "T4Friendship.csv")
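The effect of the extra pass can be seen on a tiny in-memory input. The sketch below uses an invented two-line data set in place of fotoFd.csv and compares single-pass and two-pass numbering; the helper name number_lines is hypothetical.

```ruby
# Made-up input: "zed" appears as a friend before "ann" appears as a user.
lines = ["bob,zed\n", "ann,bob\n"]

# Hypothetical helper: translate each line to ID numbers using the given hash.
def number_lines(lines, ids)
  lines.map { |line| line.chomp.split(',').map { |name| ids[name] }.join(',') }
end

# Single pass: IDs are handed out in order of first appearance in any column.
single_ids  = Hash.new { |hash, key| hash[key] = hash.size }
single_pass = number_lines(lines, single_ids)

# Two passes: the user column is numbered first, friends afterwards.
two_ids = Hash.new { |hash, key| hash[key] = hash.size }
lines.each { |line| two_ids[line.chomp.split(',').first] }
two_pass = number_lines(lines, two_ids)

puts single_pass.inspect
puts two_pass.inspect
```

In the single-pass run "zed" is numbered before "ann"; in the two-pass run both users are numbered before any name that appears only as a friend.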

Pushing to an array not working as expected

When I execute the code below, my array 'tasks' ends up with the same last row from the dbi call repeated for each row in the database.
require 'dbi'
require 'pp'
dbh = DBI.connect('DBI:SQLite3:test', 'test', '')
dbh.do("DROP TABLE IF EXISTS TASK;")
dbh.do("CREATE TABLE TASK(ID INT, NAME VARCHAR(20))")
# Insert two rows
1.upto(2) do |i|
sql = "INSERT INTO TASK (ID, NAME) VALUES (?, ?)"
dbh.do(sql, i, "Task #{i}")
end
sth = dbh.prepare('select * from TASK')
sth.execute
tasks = Array.new
while row=sth.fetch do
p row
p row.object_id
tasks.push(row)
end
pp(tasks)
sth.finish
So if I have two rows in my TASK table, then instead of getting this in the tasks array:
[[1, "Task 1"], [2, "Task 2"]]
I get this
[[2, "Task 2"], [2, "Task 2"]]
The full output looks like this:
[1, "Task 1"]
19877028
[2, "Task 2"]
19876728
[[2, "Task 2"], [2, "Task 2"]]
What am I doing wrong?
There seems to be some strange behavior in the row objects, which appear to be some kind of singleton, and that's why the dup method won't solve it.
Looking at the source code, it seems that the to_a method duplicates the inner row elements, which is why it works. So the answer is to call to_a on the row object; if you prefer, you can also convert it to a Hash to preserve the column metadata.
while row=sth.fetch do
tasks.push(row.to_a)
end
But I recommend the more idiomatic Ruby way:
sth.fetch do |row|
tasks << row.to_a
end
Are you sure you have copied your code exactly as it is? AFAIK the code you have written shouldn't work at all. You mix two constructs that are not intended to be used that way.
Am I wrong to assume that you come from a C or Java background? Iteration in Ruby is very different; let me try to explain.
A while loop in ruby has this structure :
while condition
# code to be executed as long as condition is true
end
A method with a block has this structure :
sth.fetch do |element|
# code to be executed once per element in the sth collection
end
Now there is something really important to understand: fetch, or any other method of this kind in Ruby, is not an iterator as you would encounter in C, for example. You do not have to call it again and again until the iterator hits the end of the collection.
You just call it once and give it a block as an argument, which is a kind of anonymous function (as in JavaScript). The fetch method will then pass ("yield") each element of the collection, one after another, to this block.
So the correct syntax in your case should be :
sth.fetch do |row|
p row
tasks.push row
end
which could be otherwise written like this, in a more "old school" fashion :
# define a function
# = this is your block
def process( row )
p row
tasks.push row
end
# pass each element of a collection to this function
# = this is done inside the fetch method
for row in sth
process row
end
I would advise you to read more on blocks, procs and lambdas, because they are all over the place in Ruby, and IMHO are one of the reasons this language is so awesome. Iterators are just the beginning; you can do a LOT more with these. If you need good reference docs, the pickaxe is considered one of the best sources among Rubyists, and I can tell you more if you want.
I don't know how your code works entirely, but I guess that if you change tasks.push(row) into tasks.push(row.dup), it should work. If that is the case, then sth.fetch keeps giving you the same array (same object id) each time even if its content is renewed, and you are pushing the same array into tasks repeatedly.
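The effect of pushing one reused object can be reproduced without a database. The sketch below is only an analogy: a plain array named buffer (hypothetical) plays the part of the row object that is refilled on every fetch, so it says nothing about how DBI::Row itself implements dup or to_a.

```ruby
# One array that gets refilled in place, as a driver might reuse a row object.
buffer = []

rows_shared = [] # stores the same object every time
rows_copied = [] # stores an independent copy every time

[[1, 'Task 1'], [2, 'Task 2']].each do |values|
  buffer.replace(values) # overwrite the contents, keep the same object
  rows_shared << buffer
  rows_copied << buffer.dup
end

puts rows_shared.inspect
puts rows_copied.inspect
```

Every element of rows_shared refers to the one buffer, so all of them show whatever was written to it last, while rows_copied keeps a snapshot per iteration.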
There are many things that could be happening, but try this.
First, wrap the assignment in parentheses so that the while condition is unambiguous:
while (row=sth.fetch) do
p row
tasks.push(row)
end
Then the idiomatic ruby way
sth.fetch do |row|
p row
tasks << row # same as push
end

hashes ruby merge

My txt file contains a few lines, and I want to add each line to a hash with the first two words as the key and the third word as the value. The following code has no errors, but the logic may be wrong: the last line is supposed to print all the keys of the hash, but nothing happens. Please help.
def word_count(string)
count = string.count(' ')
return count
end
h = Hash.new
f = File.open('sheet.txt','r')
f.each_line do |line|
count = word_count(line)
if count == 3
a = line.split
h.merge(a[0]+a[1] => a[2])
end
end
puts h.keys
Hash#merge doesn't modify the hash you call it on, it returns the merged Hash:
merge(other_hash) → new_hash
Returns a new hash containing the contents of other_hash and the contents of hsh. [...]
Note the Returns a new hash... part. When you say this:
h.merge(a[0]+a[1] => a[2])
You're merging the new values you built into a copy of h and then throwing away the merged hash; the end result is that h never gets anything added to it and ends up being empty after all your work.
You want to use merge! to modify the Hash:
h.merge!(a[0]+a[1] => a[2])
or keep using merge but save the return value:
h = h.merge(a[0]+a[1] => a[2])
or, since you're only adding a single value, just assign it:
h[a[0] + a[1]] = a[2]
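The difference between the variants can be checked with a short sketch. The input lines here are invented and stand in for the contents of sheet.txt.

```ruby
# Made-up lines in place of sheet.txt.
lines = ["alpha beta gamma\n", "delta epsilon zeta\n"]

unchanged = {} # filled with merge, whose result is discarded
updated   = {} # filled with merge!
assigned  = {} # filled with plain assignment

lines.each do |line|
  a = line.split
  unchanged.merge(a[0] + a[1] => a[2]) # returns a new hash, receiver untouched
  updated.merge!(a[0] + a[1] => a[2])  # modifies the receiver in place
  assigned[a[0] + a[1]] = a[2]         # same effect for a single pair
end

puts unchanged.keys.inspect
puts updated.keys.inspect
```

Only the hash filled with merge stays empty, which matches the behavior described in the question.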
If you want to add the first three words of each line to the hash, regardless of how many words there are, then you can drop the if count == 3 line. Or you can change it to if count > 2 if you want to make sure that there are at least three words.
Also, mu is correct. You'll want h.merge!

Resources