Why does exporting data to CSV give only numbers? - ruby

I am trying to export a mongo structure to CSV with the following code:
file = Tempfile.new(['genreport', '.csv'], file_path)
file_name = file.path
CSV.open(file_name, "w") do |csv|
  result_cursor.each do |eachdoc|
    eachdoc.each do |key, value|
      csv << key.to_s
      csv << value.to_s
    end
    csv << "\n"
  end
end
The CSV file is created as expected, but it is full of numbers only. What am I doing wrong?
Here are the types:
result_cursor is a mongo cursor, eachdoc will be a hash, and key and value will be Strings.

I'm not sure how your code differs from or alters the context of what's posted, but when I try to run the code as is, I get an exception (undefined method `map' for the value of key). However, when I do this, it works fine:
file = Tempfile.new(['genreport', '.csv'], file_path)
file_name = file.path
CSV.open(file_name, "w") do |csv|
  result_cursor.each do |eachdoc|
    eachdoc.each do |key, value|
      csv << [key, value]
    end
  end
end
That really doesn't help explain the numbers you're seeing, though. Perhaps something else is overwriting the contents of the temp file.
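For what it's worth, CSV#<< writes one row per call and expects an Array (or CSV::Row) rather than a bare String, which is why pushing key.to_s and value.to_s misbehaves. A minimal sketch, using a plain array of hashes as a hypothetical stand-in for the mongo cursor:

```ruby
require 'csv'
require 'tempfile'

# Hypothetical stand-in for the question's mongo cursor: each doc is a
# hash of string keys and values.
result_cursor = [{ 'name' => 'Alice', 'age' => '30' }]

file = Tempfile.new(['genreport', '.csv'])
CSV.open(file.path, 'w') do |csv|
  result_cursor.each do |doc|
    doc.each { |key, value| csv << [key, value] }  # each pair becomes one row
  end
end
```

Each `csv << array` call appends one properly quoted, newline-terminated row, so no manual `csv << "\n"` is needed.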

Related

Stub a CSV file in Ruby Test::Unit

So I have a Ruby script which parses a CSV file as a list of rules, does some processing, and returns an array of hashes as processed data.
CSV looks like this:
id_number,rule,notes
1,"dummy_rule","necessary"
2,"sample_rule","optional"
The parsing of CSV looks like this:
def parse_csv(file_name)
  filtered_data = []
  CSV.foreach(file_name, headers: true) do |row|
    filtered_data << row # some processing
  end
  filtered_data
end
Now I am wondering if it is possible to stub/mock an actual CSV file for unit testing, in such a way that I could pass a "filename" into this function. I could make a CSV object and use the generate function, but then there would be no actual filename to pass.
def test_parse_csv
  expected_result = ["id_number"=>1, "rule"=>"dummy_rule", "notes"=>"optional"}]
  # stub/mock csv with filename: say "rules.csv"
  assert.equal(expected_result, parse_csv(file_name))
end
I use Test::Unit.
I also found a Ruby gem called mocha, but I don't know how it works for CSVs:
https://github.com/freerange/mocha
Any thoughts are welcome!
I would create a support directory inside your test directory, then create a file:
# test/support/rules.csv
id_number,rule,notes
1,"dummy_rule","necessary"
2,"sample_rule","optional"
Your test should look like this; note I've also added the opening curly bracket that you missed on line #2 of your expected_result.
def test_parse_csv
  expected_result = [{"id_number"=>1, "rule"=>"dummy_rule", "notes"=>"optional"}]
  assert_equal(expected_result.first, parse_csv(csv_file).first)
end

def csv_file
  Rails.root.join('test', 'support', 'rules.csv')
end
EDIT: Sorry, I should have noticed this wasn't Ruby on Rails... here's a plain Ruby solution:
def test_parse_csv
  expected_result = [{"id_number"=>1, "rule"=>"dummy_rule", "notes"=>"optional"}]
  assert_equal(expected_result.first, parse_csv(csv_file).first)
end

def csv_file
  'test/support/rules.csv'
end
I did this in my test: let(:raw_csv) { "row_number \n 1 \n 2" }, telling myself that a CSV file is surely read as a simple string with line breaks and commas... and it works fine. Be careful with extra whitespace and commas; it is easy to make a mistake.
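As an alternative to a fixture checked into test/support, a Tempfile gives the method under test a real file name without any stubbing library. This sketch mirrors the question's parse_csv, with row.to_h added so plain hashes come back instead of CSV::Row objects:

```ruby
require 'csv'
require 'tempfile'

# Mirrors the question's method; row.to_h converts each CSV::Row to a
# plain hash for easy comparison in assertions.
def parse_csv(file_name)
  filtered_data = []
  CSV.foreach(file_name, headers: true) do |row|
    filtered_data << row.to_h
  end
  filtered_data
end

# Write the fixture to a Tempfile so parse_csv receives a real path.
fixture = Tempfile.new(['rules', '.csv'])
fixture.write("id_number,rule,notes\n1,dummy_rule,necessary\n")
fixture.close

result = parse_csv(fixture.path)
# Note: CSV yields field values as strings, so id_number is "1", not 1.
```

Keep the string-vs-integer detail in mind when writing expected_result; a test comparing against 1 instead of "1" will fail.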

I have a conundrum involving blocks and passing them around, need help solving it

OK, so I've built a DSL, and part of it requires the user of the DSL to define what I call a 'writer block':
writer do |data_block|
  CSV.open("data.csv", "wb") do |csv|
    headers_written = false
    data_block do |hash|
      (csv << headers_written && headers_written = true) unless headers_written
      csv << hash.values
    end
  end
end
The writer block gets called like this:
def pull_and_store
  raise "No writer detected" unless @writer
  @writer.call(->(&block) {
    pull(pull_initial, &block)
  })
end
The problem is twofold. First, is this the best way to handle this kind of thing? Second, I'm getting a strange error:
undefined method `data_block' for Servo_City:Class (NoMethodError)
It's strange because I can see data_block right there, or at least it exists before the CSV block at any rate.
What I'm trying to create is a way for the user to write a wrapper block that both wraps around a block and yields a block to the block being wrapped; wow, that's a mouthful.
Inner me does not want to write an answer before the question is clarified.
Other me wagers that code examples will help to clarify the problem.
I assume that the writer block has the task of persisting some data. Could you pass the data into the block in an enumerable form? That would allow the DSL user to write something like this:
writer do |data|
  CSV.open("data.csv", "wb") do |csv|
    csv << header_row
    data.each do |hash|
      data_row = hash.values
      csv << data_row
    end
  end
end
No block passing required.
Note that you can pass in a lazy collection if you're dealing with huge data sets.
Does this solve your problem?
Trying to open the CSV file every time you want to write a record seems overly complex and likely to cause bad performance (unless writing is intermittent). It will also overwrite the CSV file each time unless you change the file mode from wb to ab.
I think something simple like:
csv = CSV.open('data.csv', 'wb')
csv << headers
writer do |hash|
  csv << hash.values
end
would be something more understandable.
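A runnable sketch of the enumerable-based suggestion; here `writer` is a hypothetical stand-in for the DSL hook, hard-coded to hand a small array of hashes to the user's block:

```ruby
require 'csv'

# Hypothetical stand-in for the DSL's writer hook: it simply passes the
# data (an enumerable of hashes) to the user-supplied block.
def writer(&block)
  data = [{ 'a' => 1, 'b' => 2 }, { 'a' => 3, 'b' => 4 }]
  block.call(data)
end

writer do |data|
  CSV.open('data.csv', 'wb') do |csv|
    csv << data.first.keys                  # header row from the hash keys
    data.each { |hash| csv << hash.values } # one row per hash
  end
end
```

Because the block receives plain data rather than another block, there is no nested block passing and no NoMethodError to trip over.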

CSV.generate and converters?

I'm trying to create a converter to remove newline characters from CSV output.
I've got:
nonewline = lambda do |s|
  s.gsub(/(\r?\n)+/, ' ')
end
I've verified that this works properly IF I load a variable and then run something like:
csv = CSV(variable, :converters => [nonewline])
However, I'm attempting to use this code to update a bunch of preexisting code using CSV.generate, and it does not appear to work at all.
CSV.generate(:converters => [nonewline]) do |csv|
  csv << ["hello\ngoodbye"]
end
returns:
"\"hello\ngoodbye\"\n"
I've tried quite a few things as well as trying other examples I've found online, and it appears as though :converters has no effect when used with CSV.generate.
Is this correct, or is there something I'm missing?
You need to write your converter as below:
CSV::Converters[:nonewline] = lambda do |s|
  s.gsub(/(\r?\n)+/, ' ')
end
Then do:
CSV.generate(:converters => [:nonewline]) do |csv|
  csv << ["hello\ngoodbye"]
end
Read the documentation for CSV::Converters.
I left the part above in place to show how to write a custom CSV converter; the way you wrote yours is incorrect.
Now read the documentation for CSV::generate:
This method wraps a String you provide, or an empty default String, in a CSV object which is passed to the provided block. You can use the block to append CSV rows to the String and when the block exits, the final String will be returned.
After reading the docs, it is quite clear that this method is for writing CSV, not for reading it. The converter options (like :converters and :header_converters) are applied when you are reading a CSV file, but not when you are writing one.
Let me show you 2 examples to illustrate this more clearly.
require 'csv'

string = <<_
foo,bar
baz,quack
_
File.write('a', string)

CSV::Converters[:upcase] = lambda do |s|
  s.upcase
end
I am reading from a CSV file, so the :converters option is applied:
CSV.open('a', 'r', :converters => :upcase) do |csv|
  puts csv.read
end
output
# >> FOO
# >> BAR
# >> BAZ
# >> QUACK
Now I am writing into the CSV file; the :converters option is not applied:
CSV.open('a', 'w', :converters => :upcase) do |csv|
  csv << ['dog', 'cat']
end
CSV.read('a') # => [["dog", "cat"]]
Attempting to remove newlines using :converters did not work.
I had to override the << method from csv.rb, adding the following code to it:
# Change all CR/NLs into one space
row.map! { |element|
  if element.is_a?(String)
    element.gsub(/(\r?\n)+/, ' ')
  else
    element
  end
}
Placed right before
output = row.map(&@quote).join(@col_sep) + @row_sep # quote and separate
at line 21.
I would think this would be a good patch to CSV, as newlines will always produce bad CSV output.
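For completeness, the same cleanup can be done without patching csv.rb by scrubbing each row just before it is appended; strip_newlines is a helper name made up for this sketch:

```ruby
require 'csv'

# A lighter alternative to overriding CSV#<<: replace newline runs with a
# single space in every String field, leaving other types untouched.
def strip_newlines(row)
  row.map { |f| f.is_a?(String) ? f.gsub(/(\r?\n)+/, ' ') : f }
end

out = CSV.generate do |csv|
  csv << strip_newlines(["hello\ngoodbye", 42])
end
```

This keeps the transformation at the call site instead of changing library behavior for every CSV user in the process.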

How not to save to csv when array is empty

I'm parsing through a website, looking at potentially millions of rows of content. However, CSV/Excel/ODS doesn't allow more than a million rows.
That is why I'm trying to use a conditional to avoid saving empty content. However, it's not working: my code keeps creating empty rows in the CSV.
This is the code I have:
# create csv
CSV.open("neverending.csv", "w") do |csv|
  csv << ["kuk", "date", "name"]
  # loop through all urls
  File.foreach("neverendingurls.txt") do |line|
    begin
      doorzoekbarefile = Nokogiri::HTML(open(line))
      for k in 1..999 do
        # PROVISIONARY / CONDITIONAL
        unless doorzoekbarefile.at_xpath("(//td[contains(@style, '60px')])[#{k}]").nil?
          # xpaths
          kuk  = doorzoekbarefile.at_xpath("(//td[contains(@style, '60px')])[#{k}]")
          date = doorzoekbarefile.at_xpath("(//td[contains(@style, '60px')])[#{k}]/following-sibling::*[1]")
          name = doorzoekbarefile.at_xpath("(//td[contains(@style, '60px')])[#{k}]/following-sibling::*[2]")
          # save to csv
          csv << [kuk, date, name]
        end
      end
    rescue
      puts "error bij url #{line}"
    end
  end
end
Anybody have a clue what's going wrong or how to solve the problem? Basically I simply need to change the code so that it doesn't create a new row of csv data when the xpaths are empty.
This really doesn't have anything to do with XPath; it's simply Array#compact and Array#empty?:
row = [kuk, date, name]
csv << row unless row.compact.empty?
BTW, your code is a mess. Learn how to indent at least before posting again.
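Spelled out with plain arrays standing in for the scraped XPath results, the guard behaves like this:

```ruby
require 'csv'

# The skip-empty-rows guard in isolation: a row of all-nil fields is
# compacted to [] and therefore never written.
rows = [['kuk1', '2013-01-01', 'bob'], [nil, nil, nil]]
out = CSV.generate do |csv|
  rows.each { |row| csv << row unless row.compact.empty? }
end
```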

Ruby: how can use the dump method to output data to a csv file?

I'm trying to use the Ruby standard CSV lib to dump an array of objects to a CSV file called 'a.csv':
http://ruby-doc.org/stdlib-1.9.3/libdoc/csv/rdoc/CSV.html#method-c-dump
dump(ary_of_objs, io = "", options = Hash.new)
But with this method, how can I dump into a file?
There are no such examples in the docs, and googling didn't turn up any either.
Also, the docs say that:
The next method you can provide is an instance method called
csv_headers(). This method is expected to return the second line of
the document (again as an Array), which is to be used to give each
column a header. By default, ::load will set an instance variable if
the field header starts with an @ character or call send() passing the
header as the method name and the field value as an argument. This
method is only called on the first object of the Array.
Anyone knows how to pass the instance method csv_headers() to this dump function?
I haven't tested this out yet, but it looks like io should be set to a file. According to the doc you linked, "The io parameter can be used to serialize to a File".
Something like:
f = File.open("filename", "w")
CSV.dump(ary_of_objs, f)
The accepted answer doesn't really answer the question so I thought I'd give a useful example.
First of all if you look at the docs at http://ruby-doc.org/stdlib-1.9.3/libdoc/csv/rdoc/CSV.html, if you hover over the method name for dump you see you can click to show source. If you do that you'll see that the dump method attempts to call csv_headers on the first object you pass in from ary_of_objs:
obj_template = ary_of_objs.first
...snip...
headers = obj_template.csv_headers
Then later you see that the method will call csv_dump on each object in ary_of_objs and pass in the headers:
ary_of_objs.each do |obj|
  begin
    csv << obj.csv_dump(headers)
  rescue NoMethodError
    csv << headers.map do |var|
      if var[0] == "@"
        obj.instance_variable_get(var)
      else
        obj[var[0..-2]]
      end
    end
  end
end
So we need to augment each entry in ary_of_objs to respond to those two methods. Here's an example wrapper class that takes a Hash, returns the hash keys as the CSV headers, and can dump each row based on those headers:
class CsvRowDump
  def initialize(row_hash)
    @row = row_hash
  end

  def csv_headers
    @row.keys
  end

  def csv_dump(headers)
    headers.map { |h| @row[h] }
  end
end
There's one more catch, though. This dump method wants to write an extra line at the top of the CSV file before the headers, and there's no way to skip that when you call this method, due to this code at the top:
# write meta information
begin
  csv << obj_template.class.csv_meta
rescue NoMethodError
  csv << [:class, obj_template.class]
end
Even if you return '' from CsvRowDump.csv_meta, that will still be a blank line where a parser expects the headers. So instead, let dump write that line and then remove it afterwards. This example assumes you have an array of hashes that all have the same keys (which become the CSV headers).
@rows = @hashes.map { |h| CsvRowDump.new(h) }

File.open(@filename, "wb") do |f|
  str = CSV::dump(@rows)
  f.write(str.split(/\n/)[1..-1].join("\n"))
end
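One caveat worth noting: CSV::dump and CSV::load have been removed from recent versions of the csv library, so on current Rubies the same header-plus-rows output is easier to produce with CSV.generate. A minimal sketch, assuming an array of same-keyed hashes as above:

```ruby
require 'csv'

# Header row from the first hash's keys, then one row per hash; this
# reproduces dump's output minus the meta line, without CSV::dump.
hashes = [{ 'a' => 1, 'b' => 2 }, { 'a' => 3, 'b' => 4 }]
out = CSV.generate do |csv|
  csv << hashes.first.keys
  hashes.each { |h| csv << h.values }
end
```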
