I know how to do it with CSV.read, but I'm not sure how to do it with CSV.open and an enumerator. Alternatively, how do I omit those specific rows before loading them into new_csv?
Thanks!
new_csv = []
CSV.open(file, headers:true) do |unit|
units = unit.each
units.select do |row|
#delete row [0][1][2][3]
new_csv << row
end
end
If you want to skip the first four rows plus the header, here are some options.
Get a plain array:
new_csv = CSV.read(filename)[5..]
or keep the CSV::Row objects:
new_csv = []
CSV.open(filename, headers:true) do |csv|
csv.each_with_index do |row, i|
new_csv << row if i > 3
end
end
or using Enumerator#with_object:
csv = CSV.open(filename, headers:true)
new_csv = csv.each_with_index.with_object([]) do |(row, i), ary|
ary << row if i > 3
end
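Another option, offered as a minimal sketch rather than a definitive answer, is to let Enumerable#drop discard the leading rows from the enumerator that CSV.foreach returns when called without a block. The sample data, the column names and the Tempfile are assumptions made only for illustration.

```ruby
require 'csv'
require 'tempfile'

# Hypothetical sample file: one header line plus six data rows.
sample = Tempfile.new(['rows', '.csv'])
sample.write("id,name\n1,a\n2,b\n3,c\n4,d\n5,e\n6,f\n")
sample.close

# Without a block, CSV.foreach returns an Enumerator of CSV::Row objects.
# drop(4) discards the first four data rows and returns the rest as an Array.
kept = CSV.foreach(sample.path, headers: true).drop(4)
kept_ids = kept.map { |row| row['id'] }
```

Because headers: true already consumes the header line, only the number of data rows to skip is passed to drop.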
Let's begin by creating a CSV file:
contents =<<~END
name,nickname,age
Robert,Bobbie,23
Wilma,Stretch,45
William,Billy-Bob,72
Henrietta,Mama,53
END
FName = 'x.csv'
File.write(FName, contents)
#=> 91
We can use CSV::foreach without a block to return an enumerator.
csv = CSV.foreach(FName, headers:true)
#=> #<Enumerator: CSV:foreach("x.csv", "r", headers: true)>
The enumerator csv generates CSV::Row objects:
obj = csv.next
#=> #<CSV::Row "name":"Robert" "nickname":"Bobbie" "age":"23">
obj.class
#=> CSV::Row
Before continuing, let me rewind csv with Enumerator#rewind so that csv.next will once again generate its first element.
csv.rewind
Suppose we wish to skip the first two records. We can do that using Enumerator#next:
2.times { csv.next }
Now continue generating elements with the enumerator, mapping them to an array of hashes:
loop.map { csv.next.to_h }
#=> [{"name"=>"William", "nickname"=>"Billy-Bob", "age"=>"72"},
# {"name"=>"Henrietta", "nickname"=>"Mama", "age"=>"53"}]
See Kernel#loop and CSV::Row#to_h. The enumerator csv raises a StopIteration exception when next is invoked after the enumerator has generated its last element. As you can see from its doc, loop handles that exception by breaking out of the loop.
loop is a very versatile method. I generally use it in place of while and until, as well as when I need it to handle a StopIteration exception.
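The interplay of next, rewind and loop described above can be seen without any CSV file. The following is a small sketch using a plain array enumerator; the variable names are chosen only for illustration.

```ruby
numbers = [10, 20, 30].each   # an external enumerator over three elements

first = numbers.next          # consume one element
numbers.rewind                # start over from the first element

collected = []
# Enumerator#next raises StopIteration after the last element;
# Kernel#loop rescues that exception and ends the loop quietly.
loop { collected << numbers.next }

# Outside of loop, the same call raises and must be rescued by hand.
begin
  numbers.next
  raised = false
rescue StopIteration
  raised = true
end
```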
If you just want the values, then:
csv.rewind
2.times { csv.next }
loop.with_object([]) { |_,arr| arr << csv.next.map(&:last) }
#=> [["William", "Billy-Bob", "72"],
# ["Henrietta", "Mama", "53"]]
I have a CSV in the following format:
name,contacts.0.phone_no,contacts.1.phone_no,codes.0,codes.1
YK,1234,4567,AB001,AK002
As you can see, this is a nested structure. The CSV may contain multiple rows. I would like to convert this into an array of hashes like this:
[
{
name: 'YK',
contacts: [
{
phone_no: '1234'
},
{
phone_no: '4567'
}
],
codes: ['AB001', 'AK002']
}
]
The structure uses numbers in the given format to represent arrays. There can be hashes inside arrays. Is there a simple way to do that in Ruby?
The CSV headers are dynamic. It can change. I will have to create the hash on the fly based on the CSV file.
There is a similar node library called csvtojson to do that for JavaScript.
Just read and parse it line by line. The arr variable in the code below will hold the array of hashes that you need.
arr = []
File.readlines('README.md').drop(1).each do |line|
fields = line.split(',').map(&:strip)
hash = { name: fields[0], contacts: [{ phone_no: fields[1] }, { phone_no: fields[2] }], codes: [fields[3], fields[4]] }
arr.push(hash)
end
Let's first construct a CSV file.
str = <<~END
name,contacts.0.phone_no,contacts.1.phone_no,codes.0,IQ,codes.1
YK,1234,4567,AB001,173,AK002
ER,4321,7654,BA001,81,KA002
END
FName = 't.csv'
File.write(FName, str)
#=> 121
I have written a helper method that builds a pattern used to convert each row of the CSV file (every row after the first, which contains the headers) into an element (a hash) of the desired array.
require 'csv'
def construct_pattern(csv)
csv.headers.group_by { |col| col[/[^.]+/] }.
transform_values do |arr|
case arr.first.count('.')
when 0
arr.first
when 1
arr
else
key = arr.first[/(?<=\d\.).*/]
arr.map { |v| { key=>v } }
end
end
end
In the code below, for the example being considered:
construct_pattern(csv)
#=> {"name"=>"name",
# "contacts"=>[{"phone_no"=>"contacts.0.phone_no"},
# {"phone_no"=>"contacts.1.phone_no"}],
# "codes"=>["codes.0", "codes.1"],
# "IQ"=>"IQ"}
By tacking if pattern.empty? onto the assignment pattern = construct_pattern(csv) in the code below, we ensure the pattern is constructed only once.
We may now construct the desired array.
pattern = {}
CSV.foreach(FName, headers: true).map do |csv|
pattern = construct_pattern(csv) if pattern.empty?
pattern.each_with_object({}) do |(k,v),h|
h[k] =
case v
when Array
case v.first
when Hash
v.map { |g| g.transform_values { |s| csv[s] } }
else
v.map { |s| csv[s] }
end
else
csv[v]
end
end
end
#=> [{"name"=>"YK",
# "contacts"=>[{"phone_no"=>"1234"}, {"phone_no"=>"4567"}],
# "codes"=>["AB001", "AK002"],
# "IQ"=>"173"},
# {"name"=>"ER",
# "contacts"=>[{"phone_no"=>"4321"}, {"phone_no"=>"7654"}],
# "codes"=>["BA001", "KA002"],
# "IQ"=>"81"}]
The CSV methods I've used are documented in CSV. See also Enumerable#group_by and Hash#transform_values.
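If the headers can nest to an arbitrary depth, one possible generalisation is to walk each dotted header segment by segment. The sketch below is an assumption-laden illustration, not part of the answers above: nest_row is a hypothetical helper, purely numeric segments are treated as array indices, and keys are left as strings rather than symbols.

```ruby
require 'csv'

# Hypothetical helper: expand dotted headers such as "contacts.0.phone_no"
# into nested hashes and arrays. Purely numeric segments are array indices.
def nest_row(flat)
  flat.each_with_object({}) do |(header, value), result|
    keys = header.split('.').map { |k| k.match?(/\A\d+\z/) ? k.to_i : k }
    node = result
    keys.each_with_index do |key, i|
      if i == keys.size - 1
        node[key] = value # leaf: store the field value
      else
        # Create an Array when the next segment is an index, else a Hash.
        node[key] ||= keys[i + 1].is_a?(Integer) ? [] : {}
        node = node[key]
      end
    end
  end
end

data = <<~CSV_TEXT
  name,contacts.0.phone_no,contacts.1.phone_no,codes.0,codes.1
  YK,1234,4567,AB001,AK002
CSV_TEXT

nested = CSV.parse(data, headers: true).map { |row| nest_row(row.to_h) }
```

The container type is decided by looking one segment ahead, so no header-specific knowledge is hard-coded.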
I am trying to sum 2 matrices read from CSV files.
Currently, I put them into two arrays and then transform the arrays into matrices. However, when I print the result, I get concatenated strings, not summed integers.
require 'csv'
require 'matrix'
matrix1 = "./matrix1.csv"
matrix2 = "./matrix2.csv"
line_count = 0
elements_in_line_count = 0
arr1 = Array.new #=> []
arr2 = Array.new #=> []
CSV.foreach(matrix1) do |row|
arr1 << row
line_count += 1
elements_in_line_count = row.size
end
n1 = elements_in_line_count
m1 = line_count
# find n and m of second matrix
line_count = 0
elements_in_line_count = 0
CSV.foreach(matrix2) do |row|
# print row
arr2 << row
line_count += 1
elements_in_line_count = row.size
end
puts Matrix.rows(arr1) + Matrix.rows(arr2)
For example, CSV 1 is:
1,2
3,4
Same for CSV 2.
The output is
Matrix[[11, 22], [33, 44]]
But I want it to be [2,4],[6,8]
When you read in the CSV, all rows and columns are read as strings by default. The Ruby CSV class accepts an optional parameter called :converters in foreach, new and similar methods, which it uses to convert each applicable field. One of the converters it can take is
:integer
Converts any field Integer() accepts.
So you can also change your code to look like:
csv_options = { converters: [:integer] }
CSV.foreach(matrix1, **csv_options) do |row| # ** passes the hash as keyword options (required on Ruby 3)
# ...
CSV.foreach(matrix2, **csv_options) do |row|
to achieve results similar to calling map(&:to_i) on each row.
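As a small self-contained check of the :integer converter, the sketch below parses the question's sample data from a string instead of a file. The variable names are assumptions for illustration only.

```ruby
require 'csv'
require 'matrix'

text = "1,2\n3,4"

as_strings  = CSV.parse(text)                          # fields stay Strings
as_integers = CSV.parse(text, converters: [:integer])  # fields become Integers

# With Integer elements, Matrix#+ performs numeric addition.
sum = Matrix.rows(as_integers) + Matrix.rows(as_integers)
```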
[matrix1, matrix2].map do |path|
CSV.foreach(path).map { |row| row.map(&:to_i) }
end.reduce do |sum, rows|
sum.map.with_index do |row, idx|
row.zip(rows[idx]).map { |cell1, cell2| cell1 + cell2 }
end
end
When you're reading in the CSV, all columns will be strings, so you'll have to manually do the conversion to a number in the code.
If all of the columns of the CSV are intended to be numbers, you can add .map(&:to_i) to the row line. Like this:
CSV.foreach(matrix1) do |row|
arr1 << row.map(&:to_i) # <-- this will turn everything in the row into a number
line_count += 1
elements_in_line_count = row.size
end
As you want to add matrices, consider using Ruby's built-in Matrix class, and the instance method Matrix#+ in particular.
Let's first construct three CSV files.
fname1 = 't1.csv'
fname2 = 't2.csv'
fname3 = 't3.csv'
File.write(fname1, "1,2\n3,4")
#=> 7
File.write(fname2, "100,200\n300,400")
#=> 15
File.write(fname3, "1000,2000\n3000,4000")
#=> 19
We can sum the underlying matrices as follows. Note that the helper method must be defined before it is called.
require 'csv'
require 'matrix'
def matrix_from_CSV(fname)
Matrix[*CSV.read(fname, converters: [:integer])]
end
fnames = [fname1, fname2, fname3]
fnames.drop(1).reduce(matrix_from_CSV(fnames.first)) do |t,fname|
t + matrix_from_CSV(fname)
end.to_a
#=> [[1101, 2202],
# [3303, 4404]]
I borrowed converters: [:integer] from @Simple's answer; I wasn't aware of it before.
Here is how I get all the hands
def get_all_hands
doc = Nokogiri::HTML(open('http://www.cardplayer.com/rules-of-poker/hand-rankings'))
hand_hash = {}
hands_array = []
doc.css('div#rules-of-poker-accordion').each do |hands|
hands.css('strong').each do |hand|
hand_hash[:name] = hand.text
end
hands.css('div.rules-cards').each do |hand|
hand_value = []
hand.css('img').each do |card|
hand_value << card.attr('src')
hand_hash[:value] = hand_value
end
end
hands_array << hand_hash
end
hands_array
end
HandScraper.new.get_all_hands
This returns:
[{:name=>"10. High Card",
:value=>
["/packages/cards/Large/Diamond/3-909f8b1571f834c774576c93eae26594.png",
"/packages/cards/Large/Club/J-58b4c0f26e3e0cf8c0772ab3e9e34784.png",
"/packages/cards/Large/Spade/8-60d335b08119f600c3ca02aa58fa902d.png",
"/packages/cards/Large/Heart/4-712ce04b7f2c7e588c48a1e2b46a4244.png",
"/packages/cards/Large/Spade/2-e2d1cee5fc0db0b70990036153d57906.png"]}]
which is the tenth and final hand, when I want it to return all 10.
This particular piece of code is the reason it doesn't work. You are iterating over each strong element and assigning its text to the same key (:name) of hand_hash. The same happens in the next iteration. Basically, you keep overwriting the same hash without saving it anywhere until the last iteration.
hands.css('strong').each_with_index do |hand, index|
hand_hash[index] = hand.text
end
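The overwriting effect can be reproduced without Nokogiri. The following sketch uses a hypothetical list of names in place of the scraped elements and contrasts one shared hash with one hash per item.

```ruby
# Hypothetical stand-in for the text of the scraped <strong> elements.
names = ['1. Royal flush', '2. Straight flush', '3. Four of a kind']

# One shared hash: every iteration overwrites :name, so only the last survives.
hand_hash = {}
names.each { |name| hand_hash[:name] = name }
overwritten = [hand_hash]

# One fresh hash per name: every entry is kept.
per_item = names.map { |name| { name: name } }
```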
I made some changes in your own code to fix this:
doc = Nokogiri::HTML(URI.open('http://www.cardplayer.com/rules-of-poker/hand-rankings')) # needs require 'open-uri'; Kernel#open no longer opens URLs on Ruby 3
hands_array = []
doc.css('div#rules-of-poker-accordion').each do |hands|
hands.css('strong').zip(hands.css('div.rules-cards')).each do |hand, value|
hand_hash = {}
hand_hash[:name] = hand.text
hand_value = []
value.css('img').each do |card|
hand_value << card.attr('src')
hand_hash[:value] = hand_value
end
hands_array << hand_hash #here, now you are saving after each hand
end
end
hands_array
hands.css('strong').zip(hands.css('div.rules-cards')) pairs each hand name with its cards element, and each pair is then turned into a hash that is added to hands_array.
Result:
[{:name=>"1. Royal flush", :value=>
["/packages/cards/Large/Diamond/A-49a04aae5e96d2f948dc2062c2c4fcd5.png",
"/packages/cards/Large/Diamond/K-0bfc14d8f58cc13891b108e4178f92f9.png",
"/packages/cards/Large/Diamond/Q-b981aa1f57642480de1dceaf1c2e810f.png",
"/packages/cards/Large/Diamond/J-d915fc38dbca1ca74cdd75dd913de1f3.png",
"/packages/cards/Large/Diamond/T-ef2fe11bbd701e4c5b6681e506271700.png"]},
{:name=>"2. Straight flush"}, {:name=>"3. Four of a kind", :value=>
["/packages/cards/Large/Heart/J-2bf19067cda29391286416d0d00646d6.png",
"/packages/cards/Large/Diamond/J-d915fc38dbca1ca74cdd75dd913de1f3.png",
"/packages/cards/Large/Spade/J-fff29c49da8ca1f7a272c5ac83f51d06.png",
"/packages/cards/Large/Club/J-58b4c0f26e3e0cf8c0772ab3e9e34784.png",
"/packages/cards/Large/Diamond/7-7e507c2122efe10ed7abacab95edff97.png"]},
{:name=>"4. Full house", :value=>
["/packages/cards/Large/Heart/T-c3f8fd4ffc3e09ec705a817aa212dc86.png",
"/packages/cards/Large/Diamond/T-ef2fe11bbd701e4c5b6681e506271700.png",
"/packages/cards/Large/Spade/T-9a16f63a333b3edeb50c4372f8dd9883.png",
"/packages/cards/Large/Club/9-e6f0020a48aef9907b626477c5a60ac2.png",
"/packages/cards/Large/Diamond/9-3e500833bafc81a708d195f16d005125.png"]},
{:name=>"5. Flush", :value=>
["/packages/cards/Large/Spade/4-4200c8b5f3f5ba04d9fd5a69d71dab2f.png",
"/packages/cards/Large/Spade/J-fff29c49da8ca1f7a272c5ac83f51d06.png",
"/packages/cards/Large/Spade/8-60d335b08119f600c3ca02aa58fa902d.png",
"/packages/cards/Large/Spade/2-e2d1cee5fc0db0b70990036153d57906.png",
"/packages/cards/Large/Spade/9-b0d71e77734375ceb3954156232f1f2d.png"]},
{:name=>"6. Straight", :value=>
["/packages/cards/Large/Club/9-e6f0020a48aef9907b626477c5a60ac2.png",
"/packages/cards/Large/Diamond/8-6cd5b3025be0dd56cd52dfd2a49d922d.png",
"/packages/cards/Large/Spade/7-6c1d119e9c923f8e4773cf00d05e26d6.png",
"/packages/cards/Large/Diamond/6-a0c0218210a1a6c4ec17e5cec17ee3d8.png",
"/packages/cards/Large/Heart/5-f498916a3011c2b7199e1c1008dbe330.png"]},
{:name=>"7. Three of a kind", :value=>
["/packages/cards/Large/Club/7-5610625720208cc02c1107c91365eb37.png",
"/packages/cards/Large/Diamond/7-7e507c2122efe10ed7abacab95edff97.png",
"/packages/cards/Large/Spade/7-6c1d119e9c923f8e4773cf00d05e26d6.png",
"/packages/cards/Large/Club/K-3e8312c33de4718943cd0276de8a16a1.png",
"/packages/cards/Large/Diamond/3-909f8b1571f834c774576c93eae26594.png"]},
{:name=>"8. Two pair", :value=>
["/packages/cards/Large/Club/4-33a9251d25da1ea2ba49e69e94549aee.png",
"/packages/cards/Large/Spade/4-4200c8b5f3f5ba04d9fd5a69d71dab2f.png",
"/packages/cards/Large/Club/3-0c3eda54cfb6808b0a94950c045e497a.png",
"/packages/cards/Large/Diamond/3-909f8b1571f834c774576c93eae26594.png",
"/packages/cards/Large/Club/Q-9fcc4fd7692aa96ba9fcb04fa9fd727d.png"]},
{:name=>"9. Pair", :value=>
["/packages/cards/Large/Heart/A-748f3f87f79ac475e6a432750725b64c.png",
"/packages/cards/Large/Diamond/A-49a04aae5e96d2f948dc2062c2c4fcd5.png",
"/packages/cards/Large/Club/8-c3708e4821723f1100d514e5280b3f32.png",
"/packages/cards/Large/Spade/4-4200c8b5f3f5ba04d9fd5a69d71dab2f.png",
"/packages/cards/Large/Heart/7-1610ff3e74c68f6dd8a855bd16887457.png"]},
{:name=>"10. High Card", :value=>
["/packages/cards/Large/Diamond/3-909f8b1571f834c774576c93eae26594.png",
"/packages/cards/Large/Club/J-58b4c0f26e3e0cf8c0772ab3e9e34784.png",
"/packages/cards/Large/Spade/8-60d335b08119f600c3ca02aa58fa902d.png",
"/packages/cards/Large/Heart/4-712ce04b7f2c7e588c48a1e2b46a4244.png",
"/packages/cards/Large/Spade/2-e2d1cee5fc0db0b70990036153d57906.png"]}]
Hope it helps :)
The doc.css('div#rules-of-poker-accordion') call is returning a single div, as that is how the page is structured. Because of that, you are only actually entering the each loop once.
The layout of that site is a bit funny, so you'll have to get the names and the values separately. Here is a brute force solution...
def get_all_hands
doc = Nokogiri::HTML(URI.open('http://www.cardplayer.com/rules-of-poker/hand-rankings')) # needs require 'open-uri'; Kernel#open no longer opens URLs on Ruby 3
hands_array = []
doc.css('div#rules-of-poker-accordion').css("strong").each do |name|
hands_array.push({name: name.text, value: []})
end
doc.css('div#rules-of-poker-accordion').css(".rules-cards").each_with_index do |hand, i|
hand.css('img').each do |card|
hands_array[i][:value].push card.attr('src')
end
end
hands_array
end
The answer by @kiddorails is correct, but you can be much more concise and Ruby-idiomatic in this problem. Consider the following, which gives the same result:
def get_all_hands
doc = Nokogiri::HTML(URI.open('http://www.cardplayer.com/rules-of-poker/hand-rankings')) # needs require 'open-uri'; Kernel#open no longer opens URLs on Ruby 3
hands = doc.css('div#rules-of-poker-accordion').first
hands.css('strong').zip(hands.css('div.rules-cards')).map do |hand, value|
{name: hand.text, value: value.css('img').map { |card| card.attr('src') }}
end
end
Ruby has powerful methods for handling and transforming Arrays. Once you have an Array, you can transform it using one of Ruby's many expressive Enumerable methods. @kiddorails started down this path by using #zip to join two arrays together, and the above refactoring finishes the job by taking the resulting array of arrays and transforming it into an array of hashes using #map.
In Ruby, anytime you find yourself writing this pattern:
result = []
array.each { |element| result << method(element) }
result
You can generally replace those three lines with:
array.map { |element| method(element) }
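A concrete instance of that rewrite, as a minimal sketch with made-up data and String#upcase standing in for the per-element work:

```ruby
words = %w[ace king queen]

# Accumulator pattern: build the result by hand.
upcased_each = []
words.each { |word| upcased_each << word.upcase }

# Equivalent with map, which returns the new array directly.
upcased_map = words.map { |word| word.upcase }
```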
I'm trying to parse a csv file with 16 columns to 16 separate arrays. I need each cell to be another object in the array. So the values in column 1 becomes arr1, column 2 becomes arr2, etc. This is the code I have so far:
file = "FS_Email_Test.csv"
arr1 = []
arr2 = []
arr3 = []
list =CSV.foreach(file, :col_sep => ";", :return_headers => false) do |row|
arr1 << row[0].to_i
arr2 << row[1].to_i
arr3 << row[2].to_s
end
puts arr1
This code correctly parses column1 into arr1, but it returns 0 values for arr2 and arr3. I need it to work for each column. Ideas/thoughts? Thanks for the help.
Problem solved. There was an issue with the .to_i and .to_s calls on the values being appended to the arrays. I removed them and the code works just fine. Thanks for the help.
Code:
file = "FS_Email_Test.csv"
arr1 = []
arr2 = []
arr3 = []
list =CSV.foreach(file, :col_sep => ";", :return_headers => false) do |row|
arr1 << row[0]
arr2 << row[1]
arr3 << row[2]
end
puts arr1
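With 16 columns, maintaining 16 hand-written arrays gets tedious. One alternative, sketched here under the assumption that every row has the same number of fields, is to parse all rows and call Array#transpose. The inline data is hypothetical and has only three columns to keep the example short.

```ruby
require 'csv'

# Hypothetical semicolon-separated data with three columns.
data = "1;10;a\n2;20;b\n3;30;c\n"

rows = CSV.parse(data, col_sep: ';')

# transpose turns an array of rows into an array of columns;
# it raises IndexError if the rows differ in length.
columns = rows.transpose
arr1, arr2, arr3 = columns
```

For a file on disk, CSV.read(file, col_sep: ';') returns the same kind of array of rows.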
I have an array which I read from Excel (using ParseExcel) with the following code:
workbook = Spreadsheet::ParseExcel.parse("test.xls")
rows = workbook.worksheet(1).map() { |r| r }.compact
grid = rows.map() { |r| r.map() { |c| c.to_s('latin1') unless c.nil?}.compact rescue nil }
grid.sort_by { |k| k[2]}
test.xls has lots of rows and 6 columns. The code above sorts by column 3.
I would like to write the rows in the array "grid" to several text files like this:
- After sorting, all the rows that share the same value in column 3 should go into one file, with a separate file for each other distinct value in column 3.
I hope I explained this clearly. Thanks for any help/tips.
P.S. I searched through most postings on this site but could not find a solution.
Instead of using your code above, I made a test array of 100 rows, each row being a 6-element array.
You pass in the array and the index of the column you want matched, and this method writes rows that have the same nth element into separate files.
Since I used integers, I used the nth element of each row as the filename. You could use a counter, or the md5 of the element, or something like that, if your nth element does not make a good filename. The output directory must already exist.
a = []
100.times do
b = []
6.times do
b.push rand(10)
end
a.push(b)
end
def print_files(a, column)
h = Hash.new
a.each do |element|
(h[element[column]] ||= []) << element # group rows by the value in the chosen column
end
h.each do |k, v|
File.open("output/" + k.to_s, 'w') do |f|
v.each do |line|
f.puts line.join(", ")
end
end
end
end
print_files(a, 2)
Here is the same code using braces instead of do .. end:
a = Array.new
100.times{b = Array.new;6.times{b.push rand(10)};a.push(b)}
def print_files(a, column)
h = Hash.new
a.each{|element| (h[element[column]] ||= []) << element}
h.map{|k, v| File.open("output/" + k.to_s, 'w'){|f| v.map{|line| f.puts line.join(", ")}}}
end
print_files(a, 2)
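The grouping step can also be expressed with Enumerable#group_by. The sketch below is one possible variant, not the code above: write_groups is a hypothetical helper, the sample rows are made up, and a temporary directory is used so that the example does not depend on an existing output folder.

```ruby
require 'tmpdir'

# Hypothetical helper: write rows sharing the same value in `column`
# to one file per distinct value inside `dir`.
def write_groups(rows, column, dir)
  rows.group_by { |row| row[column] }.each do |key, group|
    lines = group.map { |row| row.join(', ') }
    File.write(File.join(dir, "#{key}.txt"), lines.join("\n") + "\n")
  end
end

rows = [[1, 'a', 'x'], [2, 'b', 'y'], [3, 'c', 'x']]
dir = Dir.mktmpdir
write_groups(rows, 2, dir)
```

group_by preserves the original order of the rows within each group, so sorting beforehand is only needed if the order inside each file matters.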