Ruby - reading from .csv and creating objects out of it

I have a .csv file in which every row represents one call with a certain duration, number, etc. I need to create an array of Call objects. Every Call.new expects a Hash of parameters, so it should be easy: it just takes rows from the CSV. But for some reason it doesn't work: when I invoke Call.new(raw_call), it's nil.
I also can't see any output. I placed puts in various places in the code (inside blocks etc.) and it simply doesn't show anything. I have another class, Call, which holds initialize for Call etc.
require 'csv'
class CSVCallParser
attr_accessor :io
def initialize(io)
self.io = io
end
NAMES = {
a: :date,
b: :service,
c: :phone_number,
d: :duration,
e: :unit,
f: :cost
}
def run
parse do |raw_call|
parse_call(raw_call)
end
end
private
def parse_call(raw_call)
NAMES.each_with_object({}) do |name, title, memo|
memo[name] = raw_call[title.to_s]
end
end
def parse(&block)
CSV.parse(io, headers: true, header_converters: :symbol, &block)
end
end
CSVCallParser.new(ARGV[0]).run
Small sample of my .csv file: headers and one row:
"a","b","c","d","e","f"
"01.09.2016 08:49","International","48627843111","0:29","","0,00"

I noticed a few things that aren't going as expected. In the parse_call method,
def parse_call(raw_call)
NAMES.each_with_object({}) do |name, title, memo|
memo[name] = raw_call[title.to_s]
end
end
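I printed name, title, and memo. I expected to get :a, :date, and {}, but what I actually got was [:a, :date], {}, and nil. each_with_object yields each hash entry as a single [key, value] pair followed by the memo, so a block with three flat parameters receives the pair in name, the memo in title, and nil in memo.
Also, because of header_converters: :symbol, the raw_call headers are :a, :b, :c, ..., not :date, :service, ..., so you should be using raw_call[name]. Converting the key to a string will not help, since the keys in raw_call are symbols.
So I modified the function to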
def parse_call(raw_call)
NAMES.each_with_object({}) do |name_title, memo|
memo[name_title[1]] = raw_call[name_title[0]]
end
end
name_title[1] returns the title (:date, :service, etc)
name_title[0] returns the name (:a, :b, etc)
Also, in this method
def run
parse do |raw_call|
parse_call(raw_call)
end
end
run does not collect or return the hashes that parse_call builds, so you get nil.
So I changed it to
def run
res = []
parse do |raw_call|
res << parse_call(raw_call)
end
res
end
Now, if I output the line
p CSVCallParser.new(File.read("file1.csv")).run
I get (I added two more lines to the csv sample)
[{:date=>"01.09.2016 08:49", :service=>"International", :phone_number=>"48627843111", :duration=>"0:29", :unit=>"", :cost=>"0,00"},
{:date=>"02.09.2016 08:49", :service=>"International", :phone_number=>"48622454111", :duration=>"1:29", :unit=>"", :cost=>"0,00"},
{:date=>"03.09.2016 08:49", :service=>"Domestic", :phone_number=>"48627843111", :duration=>"0:29", :unit=>"", :cost=>"0,00"}]
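If you want to run this program from the terminal like so
ruby csv_call_parser.rb calls.csv
(in this case, calls.csv is available in the script as ARGV[0])
you can do so by modifying the last line of the Ruby file. CSV.parse expects the CSV text itself, not a file path. Passing the path string means the path is treated as the header line and no rows are yielded, which is why the block never ran and puts showed nothing. Read the file first:
p CSVCallParser.new(File.read(ARGV[0])).run
This also returns the array of hashes, as before.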
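Putting the fixes above together, here is a minimal sketch that goes all the way to Call objects. The Call struct is a hypothetical stand-in (the question only says Call.new takes a Hash of parameters), parse_calls is an illustrative helper name, and the block destructures each NAMES entry with |(name, title), memo| instead of indexing name_title.

```ruby
require 'csv'

# Hypothetical stand-in for the asker's Call class; only the
# "takes a hash of attributes" constructor is assumed from the question.
Call = Struct.new(:date, :service, :phone_number, :duration, :unit, :cost,
                  keyword_init: true)

NAMES = { a: :date, b: :service, c: :phone_number,
          d: :duration, e: :unit, f: :cost }.freeze

# Build one Call per CSV row, renaming the header keys via NAMES.
def parse_calls(csv_text)
  CSV.parse(csv_text, headers: true, header_converters: :symbol).map do |row|
    attrs = NAMES.each_with_object({}) do |(name, title), memo|
      memo[title] = row[name]
    end
    Call.new(**attrs)
  end
end

sample = <<~CSV
  "a","b","c","d","e","f"
  "01.09.2016 08:49","International","48627843111","0:29","","0,00"
CSV

calls = parse_calls(sample)
puts calls.first.service
```

Without a block, CSV.parse returns a table that can be mapped directly, so no accumulator array is needed.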

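csv = CSV.parse(csv_text, headers: true)
p csv.map(&:to_h)
outputs (for a two-column, two-row input; without converters the keys and values are strings):
[{"a"=>"1", "b"=>"1"}, {"a"=>"2", "b"=>"2"}]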

Related

How to "observe" a stream in Ruby's CSV module?

I am writing a class that takes a CSV files, transforms it, and then writes the new data out.
module Transformer
class Base
def initialize(file)
@file = file
end
def original_data(&block)
opts = { headers: true }
CSV.open(file, 'rb', opts, &block)
end
def transformer
# complex manipulations here like modifying columns, picking only certain
# columns to put into new_data, etc but simplified to `+10` to keep
# example concise
-> { |row| new_data << row['some_header'] + 10 }
end
def transformed_data
self.original_data(self.transformer)
end
def write_new_data
CSV.open('new_file.csv', 'wb', opts) do |new_data|
transformed_data
end
end
end
end
What I'd like to be able to do is:
Look at the transformed data without writing it out (so I can test that it transforms the data correctly, and I don't need to write it to file right away: maybe I want to do more manipulation before writing it out)
Don't slurp all the file at once, so it works no matter the size of the original data
Have this as a base class with an empty transformer so that instances only need to implement their own transformers but the behavior for reading and writing is given by the base class.
But obviously the above doesn't work because I don't really have a reference to new_data in transformer.
How could I achieve this elegantly?

I can recommend a few approaches, depending on your needs and personal taste.
I have intentionally distilled the code to just its bare minimum (without your wrapping class), for clarity.
1. Simple read-modify-write loop
Since you do not want to slurp the file, use CSV.foreach. For example, for a quick debugging session, do:
CSV.foreach "source.csv", headers: true do |row|
row["name"] = row["name"].upcase
row["new column"] = "new value"
p row
end
And if you wish to write to file during that same iteration:
require 'csv'
csv_options = { headers: true }
# Open the target file for writing
CSV.open("target.csv", "wb") do |target|
# Add a header
target << %w[new header column names]
# Iterate over the source CSV rows
CSV.foreach "source.csv", **csv_options do |row|
# Mutate and add columns
row["name"] = row["name"].upcase
row["new column"] = "new value"
# Push the new row to the target file
target << row
end
end
2. Using CSV::Converters
There is built-in functionality that might be helpful, CSV::Converters (see the converters option in the CSV.new documentation).
require 'csv'
# Register a converter in the options hash
csv_options = { headers: true, converters: [:stripper] }
# Define a converter
CSV::Converters[:stripper] = lambda do |value, field|
value ? value.to_s.strip : value
end
CSV.open("target.csv", "wb") do |target|
# same as above
CSV.foreach "source.csv", **csv_options do |row|
# same as above - input data will already be converted
# you can do additional things here if needed
end
end
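To see the converter in isolation, here is a small self-contained sketch that parses an in-memory string instead of a file. The sample text and column names are made up for illustration. Note that converters run on field values, not on the header row.

```ruby
require 'csv'

# Register a named converter; each field value passes through it while parsing.
CSV::Converters[:stripper] = lambda do |value, _field_info|
  value.is_a?(String) ? value.strip : value
end

# Made-up sample data with padded fields.
csv_text = "name,city\n alice , paris \n"

rows = CSV.parse(csv_text, headers: true, converters: [:stripper])
puts rows.first['name'].inspect
puts rows.first['city'].inspect
```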
3. Separate input and output from your converter classes
Based on your comment, and since you want to minimize I/O and iterations, perhaps extracting the read/write operations from the responsibility of the transformers might be of interest. Something like this.
require 'csv'
class NameCapitalizer
def self.call(row)
row["name"] = row["name"].upcase
end
end
class EmailRemover
def self.call(row)
row.delete 'email'
end
end
csv_options = { headers: true }
converters = [NameCapitalizer, EmailRemover]
CSV.open("target.csv", "wb") do |target|
CSV.foreach "source.csv", **csv_options do |row|
converters.each { |c| c.call row }
target << row
end
end
Note that the above code still does not write the header, which the transformers may have changed. You will probably have to keep a reference to the last row (after all transformations) and prepend its #headers to the output CSV.
There are probably plenty of other ways to do it, but the CSV class in Ruby does not have the cleanest interface, so I try to keep code that deals with it as simple as I can.
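For the original goal of looking at transformed rows without writing them, one option is to expose the rows as a lazy enumerator. The sketch below is only an illustration under assumptions: BaseTransformer, Upcaser, and the sample columns are hypothetical names, and it relies on CSV.foreach returning an Enumerator when called without a block.

```ruby
require 'csv'
require 'tempfile'

# Hypothetical base class: subclasses only override #transform(row).
class BaseTransformer
  def initialize(path)
    @path = path
  end

  # Identity transform by default.
  def transform(row)
    row
  end

  # Lazy stream of transformed rows; nothing is read until it is consumed.
  def transformed_rows
    CSV.foreach(@path, headers: true).lazy.map { |row| transform(row) }
  end

  # Stream the transformed rows to target_path, taking the header line
  # from the first transformed row so changed columns are reflected.
  def write(target_path)
    CSV.open(target_path, 'wb') do |out|
      header_written = false
      transformed_rows.each do |row|
        unless header_written
          out << row.headers
          header_written = true
        end
        out << row
      end
    end
  end
end

class Upcaser < BaseTransformer
  def transform(row)
    row['name'] = row['name'].upcase
    row
  end
end

# Made-up sample input written to a temporary file.
source = Tempfile.new(['source', '.csv'])
source.write("name,email\nalice,a@example.com\nbob,b@example.com\n")
source.close

# Inspect only the first transformed row; the rest of the file is not parsed.
preview = Upcaser.new(source.path).transformed_rows.first(1)
puts preview.map { |row| row['name'] }.inspect
```

A test can call transformed_rows.first(n) and never touch the output file, while write streams row by row and so does not slurp the input.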

Parse JSON like syntax to ruby object

Simple parser which turned out to be much harder than I thought. I need a string parser to convert nested fields to a Ruby object. In my case the API response will only return the desired fields.
Given
Parser.parse "album{name, photo{name, picture, tags}}, post{id}"
Desired output or similar
{album: [:name, photo: [:name, :picture, :tags]], post: [:id]}
Any thoughts?

Wrote my own solution
module Parser
extend self
def parse str
parse_list(str).map do |i|
extract_item_fields i
end
end
def extract_item_fields item
field_name, fields_str = item.scan(/(.+?){(.+)}/).flatten
if field_name.nil?
item
else
fields = parse_list fields_str
result = fields.map { |field| extract_item_fields(field) }
{ field_name => result }
end
end
def parse_list list
return list if list.nil?
list.concat(',').scan(/([^,{}]+({.+?})?),/).map(&:first).map(&:strip)
end
end
str = 'album{name, photo{name, picture, tags}}, post{id}'
puts Parser.parse(str).inspect
# => [{"album"=>["name", {"photo"=>["name", "picture", "tags"]}]}, {"post"=>["id"]}]
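As an alternative to the regex-based solution above, the same grammar can be handled with a small recursive-descent parser built on StringScanner from the standard library. This is a sketch, not the answer's method: FieldParser is a hypothetical name, field names are assumed to match \w+, and the result uses symbols (closer to the desired output in the question) but is still an array of items rather than a single hash.

```ruby
require 'strscan'

# Grammar handled by this sketch:
#   list := item (',' item)*
#   item := name ('{' list '}')?
module FieldParser
  module_function

  def parse(str)
    scanner = StringScanner.new(str)
    result = parse_list(scanner)
    raise ArgumentError, "unexpected input at #{scanner.pos}" unless scanner.eos?
    result
  end

  def parse_list(scanner)
    items = []
    loop do
      scanner.skip(/\s*/)
      name = scanner.scan(/\w+/) or
        raise ArgumentError, "expected a field name at #{scanner.pos}"
      scanner.skip(/\s*/)
      if scanner.skip(/\{/)
        # Nested fields: recurse, then require the closing brace.
        items << { name.to_sym => parse_list(scanner) }
        scanner.skip(/\s*/)
        scanner.skip(/\}/) or raise ArgumentError, "expected '}' at #{scanner.pos}"
      else
        items << name.to_sym
      end
      scanner.skip(/\s*/)
      break unless scanner.skip(/,/)
    end
    items
  end
end

p FieldParser.parse('album{name, photo{name, picture, tags}}, post{id}')
```

Because each nested list is parsed by a recursive call, nesting depth is not limited, and malformed input raises ArgumentError instead of being silently dropped.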

How to access block parameters using Object.send

I'm trying to run the following code:
class RentLimit < ActiveRecord::Base
def self.load_data
rows = CSV.open("csvs/income_limits_2011_to_2015.csv").read
rows.shift
rows.each do |county, yr, date, _50pct_1br, _50pct_2br, _50pct_3br, _50pct_4br, _60pct_1br, _60pct_2br, _60pct_3br, _60pct_4br|
[50, 60].each do |ami|
[1, 2, 3, 4].each do |br|
r = new
r.county = county
r.state = "SC"
r.year = yr
r.effective_date = Date.parse(date)
r.pct_ami = ami
r.br = br
r.max_rent = self.send("_#{ami}pct_#{br}br".to_sym)
r.save
end#of brs
end# of amis
end# of rows
end
end
but am getting this error message when trying to run it:
NoMethodError: undefined method `_50pct_1br' for #<Class:0x007fe942ce3b18>
The send method isn't able to access those block parameters inside of the scope. Is there any way to give access to block parameters to send? If not, how else might I dynamically access block parameters?
How do I use send or its equivalent to access block parameters in Ruby?

This is much easier if you tell CSV.open what your column names are. It looks like your CSV file might have a header row that you're skipping with rows.shift, in which case you shouldn't skip it and should use the headers: true option instead. Then you can access each field by name with row["field_name"] or, in your case, row["_#{ami}pct_#{br}br"]:
CSV_PATH = "csvs/income_limits_2011_to_2015.csv"
DEFAULT_STATE = "SC"
def self.load_data
CSV.open(CSV_PATH, 'r', headers: true) do |csv|
csv.each do |row|
# ami and br come from the same nested loops as in the question
[50, 60].each do |ami|
[1, 2, 3, 4].each do |br|
create(
county: row["county"],
state: DEFAULT_STATE,
year: row["yr"],
effective_date: Date.parse(row["date"]),
pct_ami: ami,
br: br,
max_rent: row["_#{ami}pct_#{br}br"]
)
end
end
end
end
end
Note that I used CSV.open with a block to ensure that the file is closed after it's been read, which your original code wasn't doing. I also used create instead of new; ... save, since the latter is needlessly verbose.
If you're skipping the first row for some other reason, or you want to use field names other than those in the header row, you can set the options return_headers: false, headers: names, where names is an array of names, e.g.:
CSV_HEADERS = %w[
county yr date _50pct_1br _50pct_2br _50pct_3br _50pct_4br
_60pct_1br _60pct_2br _60pct_3br _60pct_4br
].freeze
def self.load_data
CSV.open(CSV_PATH, 'r', return_headers: false, headers: CSV_HEADERS) do |csv|
# ...
end
end
Finally, since some of your attributes are the same for every object created, I'd move those out of the loop:
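def self.load_data
# state is the only attribute shared by every record
base_attrs = { state: DEFAULT_STATE }
CSV.open(CSV_PATH, 'r', headers: true) do |csv|
csv.each do |row|
# attributes shared by all records built from this row
row_attrs = base_attrs.merge(
county: row["county"],
year: row["yr"],
effective_date: Date.parse(row["date"])
)
[50, 60].each do |ami|
[1, 2, 3, 4].each do |br|
create(row_attrs.merge(pct_ami: ami, br: br, max_rent: row["_#{ami}pct_#{br}br"]))
end
end
end
end
end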
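On the narrower question of why send fails: block parameters are local variables, not methods, so send has nothing to call. If the positional block parameters are kept as they are, Binding#local_variable_get can look a local up by name. A minimal sketch with made-up rows and a shortened column list (rents_by_ami and the sample data are hypothetical):

```ruby
# Block parameters are local variables, so look them up through the
# current binding instead of calling them as methods with send.
def rents_by_ami(rows)
  rows.flat_map do |county, _50pct_1br, _60pct_1br|
    [50, 60].map do |ami|
      rent = binding.local_variable_get("_#{ami}pct_1br")
      { county: county, pct_ami: ami, max_rent: rent }
    end
  end
end

# Made-up sample data for illustration only.
rows = [["Aiken", 500, 600], ["Berkeley", 550, 660]]
rents_by_ami(rows).each { |attrs| p attrs }
```

Looking fields up by header name, as in the answer above, is usually easier to read than reflecting on local variables.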

Minitest: How to stub/mock the file result of Kernel.open on a URL

I have been trying to use Minitest to test my code (full repo) but am having trouble with one method which downloads a SHA1 hash from a .txt file on a website and returns the value.
Method:
def download_remote_sha1
@log.info('Downloading Elasticsearch SHA1.')
@remote_sha1 = ''
Kernel.open(@verify_url) do |file|
@remote_sha1 = file.read
end
@remote_sha1 = @remote_sha1.split(/\s\s/)[0]
@remote_sha1
end
You can see that I log what is occurring to the command line, create an object to hold my SHA1 value, and open the URL (e.g. https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-1.4.2.deb.sha1.txt).
I then split the string so that I only have the SHA1 value.
The problem is that during a test, I want to stub the Kernel.open which uses OpenURI to open the URL. I would like to ensure that I'm not actually reaching out to download any file, but rather that I'm just passing the block my own mock IO object and testing that it correctly splits the string.
I attempted it like the block below, but when @remote_sha1 = file.read occurs, file is nil.
@mock_file = Minitest::Mock.new
@mock_file.expect(:read, 'd377e39343e5cc277104beee349e1578dc50f7f8 elasticsearch-1.4.2.deb')
Kernel.stub :open, @mock_file do
@downloader = ElasticsearchUpdate::Downloader.new(hash, true)
@downloader.download_remote_sha1.must_equal 'd377e39343e5cc277104beee349e1578dc50f7f8'
end

I was working on this question too, but matt figured it out first. To add to what matt posted:
When you write:
Kernel.stub(:open, @mock_file) do
#block code
end
...that means when Kernel.open() is called (in any code, anywhere before the stub() block ends), the return value of Kernel.open() will be @mock_file. However, you never use the return value of Kernel.open() in your code:
Kernel.open(@verify_url) do |f|
@remote_sha1 = f.read
end
If you wanted to use the return value of Kernel.open(), you would have to write:
return_val = Kernel.open(@verify_url) do |f|
@remote_sha1 = f.read
end
#do something with return_val
Therefore, the return value of Kernel.open() is irrelevant in your code, which means the second argument of stub() is irrelevant.
A careful examination of the source code for stub() reveals that stub() takes a third argument: an argument which will be passed to a block specified after the stubbed method call. You, in fact, have specified a block after your stubbed Kernel.open() method call:
stubbed method call -+   +- start of block
|                    |   |
V                    V   V
Kernel.open(@verify_url) do |f|
@remote_sha1 = f.read
end
^
|
end of block
So, in order to pass @mock_file to the block, you need to specify it as the third argument to Kernel.stub():
Kernel.stub(:open, 'irrelevant', @mock_file) do
end
Here is a full example for future searchers:
require 'minitest/autorun'
class Dog
def initialize
@verify_url = 'http://www.google.com'
end
def download_remote_sha1
@remote_sha1 = ''
Kernel.open(@verify_url) do |f|
@remote_sha1 = f.read
end
#puts @remote_sha1[0..300]
@remote_sha1 = @remote_sha1.split(" ")[0] # Using a single space for the split() pattern will split on contiguous whitespace.
end
end
#Dog.new.download_remote_sha1
describe 'downloaded file' do
it 'should be an sha1 code' do
@mock_file = Minitest::Mock.new
@mock_file.expect(:read, 'd377e39343e5cc277104beee349e1578dc50f7f8 elasticsearch-1.4.2.deb')
Kernel.stub(:open, 'irrelevant', @mock_file) do
@downloader = Dog.new
@downloader.download_remote_sha1.must_equal 'd377e39343e5cc277104beee349e1578dc50f7f8'
end
end
end

The second argument to stub is what you want the return value to be for the duration of your test, but the way Kernel.open is used here requires the value it yields to the block to be changed instead.
You can achieve this by providing a third argument. Try changing the call to Kernel.stub to
Kernel.stub :open, true, @mock_file do
#...
Note the extra argument true, so that @mock_file is now the third argument and will be yielded to the block. The actual value of the second argument doesn't really matter in this case; you might want to use @mock_file there too, to more closely correspond to how open behaves.

Inserting values into a Hash for YAML dump

I'm creating a hash that will eventually be dumped to disk as YAML, but I need to capture multiple values stored in a file on disk and insert them into the hash. I can successfully create a variable with comma-separated values, but I need to insert those values under my "classes" key:
variable_values = "class1,class2,class3"
Ultimately, I need to get them into my test hash so it simulates something like this:
test_hash = {'Classes' => ['class1', 'class2', 'class3']}
Finally, I can output them to yaml so it looks like this:
---
classes:
- class1
- class2
- class3
What's the best way to iterate through the values and insert them into the hash? Thanks for any help you can offer!

You'd probably want something like:
test_hash = {'Classes' => variable_values.split(',')}
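A short end-to-end sketch of that one-liner, including the dump. It uses the lowercase key 'classes' so that the result matches the YAML shown in the question (the question's hash literal uses 'Classes', which would be emitted as written).

```ruby
require 'yaml'

variable_values = "class1,class2,class3"

# Split the comma-separated string into an array and store it under one key.
test_hash = { 'classes' => variable_values.split(',') }

# Prints the YAML document; YAML.dump(test_hash) returns the same string.
puts test_hash.to_yaml
```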

If you're wanting to serialize Ruby Classes (I'm not able to tell for sure), you'll probably want the following code (courtesy of opensoul.org, and as used in the Small Eigen Collider)
class Module
yaml_as "tag:ruby.yaml.org,2002:module"
def Module.yaml_new( klass, tag, val )
if String === val
val.split(/::/).inject(Object) {|m, n| m.const_get(n)}
else
raise YAML::TypeError, "Invalid Module: " + val.inspect
end
end
def to_yaml( opts = {} )
YAML::quick_emit( nil, opts ) { |out|
out.scalar( "tag:ruby.yaml.org,2002:module", self.name, :plain )
}
end
end
class Class
yaml_as "tag:ruby.yaml.org,2002:class"
def Class.yaml_new( klass, tag, val )
if String === val
val.split(/::/).inject(Object) {|m, n| m.const_get(n)}
else
raise YAML::TypeError, "Invalid Class: " + val.inspect
end
end
def to_yaml( opts = {} )
YAML::quick_emit( nil, opts ) { |out|
out.scalar( "tag:ruby.yaml.org,2002:class", self.name, :plain )
}
end
end
The code currently throws an exception if you try to serialize/deserialize anonymous classes (something I could fix but don't need to), and apart from that it works well for me.
