Ruby: how can use the dump method to output data to a csv file? - ruby

I try to use the ruby standard csv lib to dump out the arr of object to a csv.file , called 'a.csv'
http://ruby-doc.org/stdlib-1.9.3/libdoc/csv/rdoc/CSV.html#method-c-dump
dump(ary_of_objs, io = "", options = Hash.new)
but in this method, how can i dump into a file?
there is no such examples exists and help. I google it no example to do for me...
Also, the docs said that...
The next method you can provide is an instance method called
csv_headers(). This method is expected to return the second line of
the document (again as an Array), which is to be used to give each
column a header. By default, ::load will set an instance variable if
the field header starts with an # character or call send() passing the
header as the method name and the field value as an argument. This
method is only called on the first object of the Array.
Anyone knows how to pass the instance method csv_headers() to this dump function?

I haven't tested this out yet, but it looks like io should be set to a file. According to the doc you linked "The io parameter can be used to serialize to a File"
Something like:
f = File.open("filename")
dump(ary_of_objs, io = f, options = Hash.new)

The accepted answer doesn't really answer the question so I thought I'd give a useful example.
First of all if you look at the docs at http://ruby-doc.org/stdlib-1.9.3/libdoc/csv/rdoc/CSV.html, if you hover over the method name for dump you see you can click to show source. If you do that you'll see that the dump method attempts to call csv_headers on the first object you pass in from ary_of_objs:
obj_template = ary_of_objs.first
...snip...
headers = obj_template.csv_headers
Then later you see that the method will call csv_dump on each object in ary_of_objs and pass in the headers:
ary_of_objs.each do |obj|
begin
csv << obj.csv_dump(headers)
rescue NoMethodError
csv << headers.map do |var|
if var[0] == #
obj.instance_variable_get(var)
else
obj[var[0..-2]]
end
end
end
end
So we need to augment each entry in array_of_objs to respond to those two methods. Here's an example wrapper class that would take a Hash, and return the hash keys as the CSV headers and then be able to dump each row based on the headers.
class CsvRowDump
def initialize(row_hash)
#row = row_hash
end
def csv_headers
#row.keys
end
def csv_dump(headers)
headers.map { |h| #row[h] }
end
end
There's one more catch though. This dump method wants to write an extra line at the top of the CSV file before the headers, and there's no way to skip that if you call this method due to this code at the top:
# write meta information
begin
csv << obj_template.class.csv_meta
rescue NoMethodError
csv << [:class, obj_template.class]
end
Even if you return '' from CsvRowDump.csv_meta that will still be a blank line where a parse expects the headers. So instead lets let dump write that line and then remove it afterwards when we call dump. This example assumes you have an array of hashes that all have the same keys (which will be the CSV header).
#rows = #hashes.map { |h| CsvRowDump.new(h) }
File.open(#filename, "wb") do |f|
str = CSV::dump(#rows)
f.write(str.split(/\n/)[1..-1].join("\n"))
end

Related

How to "observe" a stream in Ruby's CSV module?

I am writing a class that takes a CSV files, transforms it, and then writes the new data out.
module Transformer
class Base
def initialize(file)
#file = file
end
def original_data(&block)
opts = { headers: true }
CSV.open(file, 'rb', opts, &block)
end
def transformer
# complex manipulations here like modifying columns, picking only certain
# columns to put into new_data, etc but simplified to `+10` to keep
# example concise
-> { |row| new_data << row['some_header'] + 10 }
end
def transformed_data
self.original_data(self.transformer)
end
def write_new_data
CSV.open('new_file.csv', 'wb', opts) do |new_data|
transformed_data
end
end
end
end
What I'd like to be able to do is:
Look at the transformed data without writing it out (so I can test that it transforms the data correctly, and I don't need to write it to file right away: maybe I want to do more manipulation before writing it out)
Don't slurp all the file at once, so it works no matter the size of the original data
Have this as a base class with an empty transformer so that instances only need to implement their own transformers but the behavior for reading and writing is given by the base class.
But obviously the above doesn't work because I don't really have a reference to new_data in transformer.
How could I achieve this elegantly?
I can recommend one of two approaches, depending on your needs and personal taste.
I have intentionally distilled the code to just its bare minimum (without your wrapping class), for clarity.
1. Simple read-modify-write loop
Since you do not want to slurp the file, use CSV::Foreach. For example, for a quick debugging session, do:
CSV.foreach "source.csv", headers: true do |row|
row["name"] = row["name"].upcase
row["new column"] = "new value"
p row
end
And if you wish to write to file during that same iteration:
require 'csv'
csv_options = { headers: true }
# Open the target file for writing
CSV.open("target.csv", "wb") do |target|
# Add a header
target << %w[new header column names]
# Iterate over the source CSV rows
CSV.foreach "source.csv", **csv_options do |row|
# Mutate and add columns
row["name"] = row["name"].upcase
row["new column"] = "new value"
# Push the new row to the target file
target << row
end
end
2. Using CSV::Converters
There is a built in functionality that might be helpful - CSV::Converters - (see the :converters definition in the CSV::New documentation)
require 'csv'
# Register a converter in the options hash
csv_options = { headers: true, converters: [:stripper] }
# Define a converter
CSV::Converters[:stripper] = lambda do |value, field|
value ? value.to_s.strip : value
end
CSV.open("target.csv", "wb") do |target|
# same as above
CSV.foreach "source.csv", **csv_options do |row|
# same as above - input data will already be converted
# you can do additional things here if needed
end
end
3. Separate input and output from your converter classes
Based on your comment, and since you want to minimize I/O and iterations, perhaps extracting the read/write operations from the responsibility of the transformers might be of interest. Something like this.
require 'csv'
class NameCapitalizer
def self.call(row)
row["name"] = row["name"].upcase
end
end
class EmailRemover
def self.call(row)
row.delete 'email'
end
end
csv_options = { headers: true }
converters = [NameCapitalizer, EmailRemover]
CSV.open("target.csv", "wb") do |target|
CSV.foreach "source.csv", **csv_options do |row|
converters.each { |c| c.call row }
target << row
end
end
Note that the above code still does not handle the header, in case it was changed. You will probably have to reserve the last row (after all transformations) and prepend its #headers to the output CSV.
There are probably plenty other ways to do it, but the CSV class in Ruby does not have the cleanest interface, so I try to keep code that deals with it as simple as I can.

I have a conundrum involving blocks and passing them around, need help solving it

Ok, so I've build a DSL and part of it requires the user of the DSL to define what I called a 'writer block'
writer do |data_block|
CSV.open("data.csv", "wb") do |csv|
headers_written = false
data_block do |hash|
(csv << headers_written && headers_written = true) unless headers_written
csv << hash.values
end
end
end
The writer block gets called like this:
def pull_and_store
raise "No writer detected" unless #writer
#writer.call( -> (&block) {
pull(pull_initial,&block)
})
end
The problem is two fold, first, is this the best way to handle this kind of thing and second I'm getting a strange error:
undefined method data_block' for Servo_City:Class (NoMethodError)
It's strange becuase I can see data_block right there, or at least it exists before the CSV block at any rate.
What I'm trying to create is a way for the user to write a wrapper block that both wraps around a block and yields a block to the block that is being wrapped, wow that's a mouthful.
Inner me does not want to write an answer before the question is clarified.
Other me wagers that code examples will help to clarify the problem.
I assume that the writer block has the task of persisting some data. Could you pass the data into the block in an enumerable form? That would allow the DSL user to write something like this:
writer do |data|
CSV.open("data.csv", "wb") do |csv|
csv << header_row
data.each do |hash|
data_row = hash.values
csv << data_row
end
end
end
No block passing required.
Note that you can pass in a lazy collection if dealing with hugely huge data sets.
Does this solve your problem?
Trying to open the CSV file every time you want to write a record seems overly complex and likely to cause bad performance (unless writing is intermittent). It will also overwrite the CSV file each time unless you change the file mode from wb to ab.
I think something simple like:
csv = CSV.open('data.csv', 'wb')
csv << headers
writer do |hash|
csv << hash.values
end
would be something more understandable.

How do I call a function in Ruby?

I'm trying to call but I keep getting an error. This is my code:
require 'rubygems'
require 'net/http'
require 'uri'
require 'json'
class AlchemyAPI
#Setup the endpoints
##ENDPOINTS = {}
##ENDPOINTS['taxonomy'] = {}
##ENDPOINTS['taxonomy']['url'] = '/url/URLGetRankedTaxonomy'
##ENDPOINTS['taxonomy']['text'] = '/text/TextGetRankedTaxonomy'
##ENDPOINTS['taxonomy']['html'] = '/html/HTMLGetRankedTaxonomy'
##BASE_URL = 'http://access.alchemyapi.com/calls'
def initialize()
begin
key = File.read('C:\Users\KVadher\Desktop\api_key.txt')
key.strip!
if key.empty?
#The key file should't be blank
puts 'The api_key.txt file appears to be blank, please copy/paste your API key in the file: api_key.txt'
puts 'If you do not have an API Key from AlchemyAPI please register for one at: http://www.alchemyapi.com/api/register.html'
Process.exit(1)
end
if key.length != 40
#Keys should be exactly 40 characters long
puts 'It appears that the key in api_key.txt is invalid. Please make sure the file only includes the API key, and it is the correct one.'
Process.exit(1)
end
#apiKey = key
rescue => err
#The file doesn't exist, so show the message and create the file.
puts 'API Key not found! Please copy/paste your API key into the file: api_key.txt'
puts 'If you do not have an API Key from AlchemyAPI please register for one at: http://www.alchemyapi.com/api/register.html'
#create a blank file to hold the key
File.open("api_key.txt", "w") {}
Process.exit(1)
end
end
# Categorizes the text for a URL, text or HTML.
# For an overview, please refer to: http://www.alchemyapi.com/products/features/text-categorization/
# For the docs, please refer to: http://www.alchemyapi.com/api/taxonomy/
#
# INPUT:
# flavor -> which version of the call, i.e. url, text or html.
# data -> the data to analyze, either the the url, text or html code.
# options -> various parameters that can be used to adjust how the API works, see below for more info on the available options.
#
# Available Options:
# showSourceText -> 0: disabled (default), 1: enabled.
#
# OUTPUT:
# The response, already converted from JSON to a Ruby object.
#
def taxonomy(flavor, data, options = {})
unless ##ENDPOINTS['taxonomy'].key?(flavor)
return { 'status'=>'ERROR', 'statusInfo'=>'Taxonomy info for ' + flavor + ' not available' }
end
#Add the URL encoded data to the options and analyze
options[flavor] = data
return analyze(##ENDPOINTS['taxonomy'][flavor], options)
print
end
**taxonomy(text,"trees",1)**
end
In ** ** I have entered my call. Am I doing something incorrect. The error I receive is:
C:/Users/KVadher/Desktop/testrub:139:in `<class:AlchemyAPI>': undefined local variable or method `text' for AlchemyAPI:Class (NameError)
from C:/Users/KVadher/Desktop/testrub:6:in `<main>'
I feel as though I'm calling as normal and that there is something wrong with the api code itself? Although I may be wrong.
Yes, as jon snow says, the function (method) call must be outside of the class. The methods are defined along with the class.
Also, Options should be a Hash, not a number, as you call options[flavor] = data, which is going to cause you another problem.
I believe maybe you meant to put text in quotes, as that is one of your flavors.
Furthermore, because you declared a class, this is called an instance method, and you must make an instance of the class to use this:
my_instance = AlchemyAPI.new
my_taxonomy = my_instance.taxonomy("text", "trees")
That's enough to get it to work, it seems like you have a ways to go to get this all working though. Good luck!

How do I post/upload multiple files at once using HttpClient?

def test_post_with_file filename = 'test01.xml'
File.open(filename) do |file|
response = #http_client.post(url, {'documents'=>file})
end
end
How do I modify the above method to handle a multi-file-array post/upload?
file_array = ['test01.xml', 'test02.xml']
You mean like this?
def test_post_with_file(file_array=[])
file_array.each do |filename|
File.open(filename) do |file|
response = #http_client.post(url, {'documents'=>file})
end
end
end
I was having the same problem and finally figured out how to do it:
def test_post_with_file(file_array)
form = file_array.map { |n| ['documents[]', File.open(n)] }
response = #http_client.post(#url, form)
end
You can see in the docs how to pass multiple values: http://rubydoc.info/gems/httpclient/HTTPClient#post_content-instance_method .
In the "body" row, I tried without success to use the 4th example. Somehow HttpClient just decides to apply .to_s to each hash in the array.
Then I tried the 2nd solution and it wouldn't work either because only the last value is kept by the server. But I discovered after some tinkering that the second solution works if the parameter name includes the square brackets to indicate there are mutiple values as an array.
Maybe this is a bug in Sinatra (that's what I'm using), maybe the handling of such data is implementation-dependent, maybe the HttpClient doc is outdated/wrong. Or a combination of these.

Sinatra with a persistent variable

My sinatra app has to parse a ~60MB XML-file. This file hardly ever changes: on a nightly cron job, It is overwritten with another one.
Are there tricks or ways to keep the parsed file in memory, as a variable, so that I can read from it on incoming requests, but not have to parse it over and over for each incoming request?
Some Pseudocode to illustrate my problem.
get '/projects/:id'
return #nokigiri_object.search("//projects/project[#id=#{params[:id]}]/name/text()")
end
post '/projects/update'
if params[:token] == "s3cr3t"
#nokogiri_object = reparse_the_xml_file
end
end
What I need to know, is how to create such a #nokogiri_object so that it persists when Sinatra runs. Is that possible at all? Or do I need some storage for that?
You could try:
configure do
##nokogiri_object = parse_xml
end
Then ##nokogiri_object will be available in your request methods. It's a class variable rather than an instance variable, but should do what you want.
The proposed solution gives a warning
warning: class variable access from toplevel
You can use a class method to access the class variable and the warning will disappear
require 'sinatra'
class Cache
##count = 0
def self.init()
##count = 0
end
def self.increment()
##count = ##count + 1
end
def self.count()
return ##count
end
end
configure do
Cache::init()
end
get '/' do
if Cache::count() == 0
Cache::increment()
"First time"
else
Cache::increment()
"Another time #{Cache::count()}"
end
end
Two options:
Save the parsed file to a new file and always read that one.
You can save in a file – serialize - a hash with two keys: 'last-modified' and 'data'.
The 'last-modified' value is a date and you check in every request if that day is today. If it is not today then a new file is downloaded, parsed and stored with today's date.
The 'data' value is the parsed file.
That way you parse just once time, sort of a cache.
Save the parsed file to a NoSQL database, for example redis.

Resources