Object-oriented CSV parsing - Ruby

I have to parse a CSV file that contains customers and the products they have ordered. A customer can appear on multiple rows, once for each product. I have to collect all the unique customers and the products they have ordered, then print each customer with their products. I have been asked to do it in an object-oriented way, so:
1) Should I create customer objects and have the products as their attribute?
2) Or should I just write a program using foreach, loop through the file, store customers and products in a hash, and print it out?
What throws me off is that I have been asked to do it in an object-oriented way. If I do it by creating objects, how can I store a custom object in memory, so that if I come across a customer a second time I can add the product to it, and at the end loop through all the objects and print them out? Sorry for my bad English, and thanks for reading a long question and for the help.

How can you store a custom object in memory? By creating the object and keeping it in a list, hash, or whatever seems appropriate. (Probably a hash, with the key being whatever unique value you have in your CSV, and the value would be a collection of products.)
Being asked to do it in "an object-oriented way" is a little arbitrary, though.
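A minimal sketch of that approach (the Customer class, the column names, and the file name are assumptions, not taken from the question):
require 'csv'

class Customer
  attr_reader :name, :products

  def initialize(name)
    @name = name
    @products = []
  end
end

customers = {}
CSV.foreach('orders.csv', headers: true) do |row|
  # Reuse the existing Customer when the same name appears again
  customer = customers[row['customer']] ||= Customer.new(row['customer'])
  customer.products << row['product'] unless customer.products.include?(row['product'])
end

customers.each_value do |c|
  puts "#{c.name}: #{c.products.join(', ')}"
end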

If you are using FasterCSV, or the CSV library that ships with Ruby 1.9, you can extend the parser, allowing you to map each CSV row to a custom object.
# http://ruby-doc.org/stdlib-1.9.2/libdoc/csv/rdoc/CSV.html#method-c-load
# http://ruby-doc.org/stdlib-1.9.2/libdoc/csv/rdoc/CSV.html#method-c-dump
# https://github.com/JEG2/faster_csv/blob/master/test/tc_serialization.rb
require 'csv'

class Person
  attr_accessor :id, :name, :email

  def self.csv_load(meta, headers, row)
    person = Person.new
    headers.each.with_index { |h, i|
      person.send "#{h}=", row[i]
    }
    person
  end

  def self.parse(csv)
    meta = "class,#{self.to_s}\n"
    CSV.load("#{meta}#{csv}")
  end

  def dump
    self.class.dump([self])
  end

  def self.dump(people, io='', options={})
    CSV.dump(people, io, options).strip
  end

  def self.csv_meta
    []
  end

  def csv_headers
    %w(id name email)
  end

  def csv_dump(headers)
    headers.map { |h| self.instance_variable_get "@#{h}" }
  end
end
CSV_DUMP = <<-CSV
class,Person
id=,name=,email=
1,"First Dude",dude@company.com
2,"Second Dude",2nddude@company.com
3,"Third Dude",3rddude@company.com
CSV
CSV_INPUT = <<-CSV
id,name,email
1,"First Dude",dude@company.com
2,"Second Dude",2nddude@company.com
3,"Third Dude",3rddude@company.com
CSV
CSV_DUMP2 = <<-CSV
class,Person
#{CSV_INPUT}
CSV
people = Person.parse(CSV_INPUT)
puts people.inspect
dumped = Person.dump(people)
puts dumped
puts "----"
puts Person.parse(dumped).inspect


How to "observe" a stream in Ruby's CSV module?

I am writing a class that takes a CSV file, transforms it, and then writes the new data out.
module Transformer
  class Base
    def initialize(file)
      @file = file
    end

    def original_data(&block)
      opts = { headers: true }
      CSV.open(@file, 'rb', opts, &block)
    end

    def transformer
      # complex manipulations here like modifying columns, picking only certain
      # columns to put into new_data, etc., but simplified to `+10` to keep
      # the example concise
      ->(row) { new_data << row['some_header'] + 10 }
    end

    def transformed_data
      self.original_data(&self.transformer)
    end

    def write_new_data
      CSV.open('new_file.csv', 'wb', opts) do |new_data|
        transformed_data
      end
    end
  end
end
What I'd like to be able to do is:
Look at the transformed data without writing it out (so I can test that it transforms the data correctly, and I don't need to write it to file right away: maybe I want to do more manipulation before writing it out)
Don't slurp all the file at once, so it works no matter the size of the original data
Have this as a base class with an empty transformer so that instances only need to implement their own transformers but the behavior for reading and writing is given by the base class.
But obviously the above doesn't work because I don't really have a reference to new_data in transformer.
How could I achieve this elegantly?
I can recommend one of two approaches, depending on your needs and personal taste.
I have intentionally distilled the code to just its bare minimum (without your wrapping class), for clarity.
1. Simple read-modify-write loop
Since you do not want to slurp the file, use CSV.foreach. For example, for a quick debugging session, do:
CSV.foreach "source.csv", headers: true do |row|
  row["name"] = row["name"].upcase
  row["new column"] = "new value"
  p row
end
And if you wish to write to file during that same iteration:
require 'csv'

csv_options = { headers: true }

# Open the target file for writing
CSV.open("target.csv", "wb") do |target|
  # Add a header
  target << %w[new header column names]
  # Iterate over the source CSV rows
  CSV.foreach "source.csv", **csv_options do |row|
    # Mutate and add columns
    row["name"] = row["name"].upcase
    row["new column"] = "new value"
    # Push the new row to the target file
    target << row
  end
end
2. Using CSV::Converters
There is built-in functionality that might be helpful - CSV::Converters (see the :converters definition in the CSV.new documentation):
require 'csv'

# Register a converter in the options hash
csv_options = { headers: true, converters: [:stripper] }

# Define a converter
CSV::Converters[:stripper] = lambda do |value, field|
  value ? value.to_s.strip : value
end

CSV.open("target.csv", "wb") do |target|
  # same as above
  CSV.foreach "source.csv", **csv_options do |row|
    # same as above - input data will already be converted
    # you can do additional things here if needed
  end
end
3. Separate input and output from your converter classes
Based on your comment, and since you want to minimize I/O and iterations, perhaps extracting the read/write operations from the responsibility of the transformers might be of interest. Something like this.
require 'csv'

class NameCapitalizer
  def self.call(row)
    row["name"] = row["name"].upcase
  end
end

class EmailRemover
  def self.call(row)
    row.delete 'email'
  end
end

csv_options = { headers: true }
converters = [NameCapitalizer, EmailRemover]

CSV.open("target.csv", "wb") do |target|
  CSV.foreach "source.csv", **csv_options do |row|
    converters.each { |c| c.call row }
    target << row
  end
end
Note that the above code still does not handle the header row, in case it was changed by a transformation. You will probably have to keep the last row (after all transformations) and prepend its #headers to the output CSV.
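A streaming-friendly variation on that idea (a sketch reusing the converter classes above; it takes the headers from the first transformed row instead of the last, which works as long as every row ends up with the same columns):
require 'csv'

csv_options = { headers: true }
converters = [NameCapitalizer, EmailRemover]

CSV.open("target.csv", "wb") do |target|
  wrote_header = false
  CSV.foreach "source.csv", **csv_options do |row|
    converters.each { |c| c.call row }
    # Write the (possibly modified) headers once, taken from the first transformed row
    unless wrote_header
      target << row.headers
      wrote_header = true
    end
    target << row
  end
end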
There are probably plenty of other ways to do it, but the CSV class in Ruby does not have the cleanest interface, so I try to keep code that deals with it as simple as I can.

Naming different files. Handler? IO? Stream? Processor? Controller?

I'm having some trouble naming some files that I wrote. I don't really know the difference between a stream, I/O, a handler, a processor (is this a real concept?), and a controller. This is what my files look like in Ruby:
Starting from the rakefile:
desc "Calculate chocolate totals from a CSV of orders"
task :redeem_orders, [:orders_csv_path, :redemptions_csv_path] do |t, args|
  args.with_defaults(:orders_csv_path => "./public/input/orders.csv", :redemptions_csv_path => "./public/output/redemptions.csv")
  DataController.transfer(
    input_path: args[:orders_csv_path],
    output_path: args[:redemptions_csv_path],
    formatter: ChocolateTotalsFormatter,
    converter: ChocolateTotalsConverter
  )
end
Then the controller (which in my mind delegates between different classes with the data obtained from the rakefile):
class DataController
  def self.transfer(input_path:, output_path:, formatter:, converter:)
    data_processor = DataProcessor.new(
      input_path: input_path,
      output_path: output_path,
      formatter: formatter
    )
    export_data = converter.convert(data_processor.import)
    data_processor.export(export_data)
  end
end
The processor (which performs imports and exports according to the various files that were passed into this file):
class DataProcessor
  attr_reader :input_path,
              :output_path,
              :formatter,
              :input_file_processor,
              :output_file_processor

  def initialize(input_path:, output_path:, formatter:)
    @input_path = input_path
    @output_path = output_path
    @formatter = formatter
    @input_file_processor = FileProcessorFactory.create(File.extname(input_path))
    @output_file_processor = FileProcessorFactory.create(File.extname(output_path))
  end

  def import
    formatter.format_input(input_file_processor.read(input_path: input_path))
  end

  def export(export_data)
    output_file_processor.write(
      output_path: output_path,
      data: formatter.format_output(export_data)
    )
  end
end
The converter referenced in the controller looks like this (it converts data that was passed in to a different format; I'm more confident about this naming):
class ChocolateTotalsConverter
  def self.convert(data)
    data.map do |row|
      ChocolateTotalsCalculator.new(row).calculate
    end
  end
end
And the FileProcessorFactory in the above code snippet creates a file like this one that actually does the reading and the writing to CSV:
require 'csv'

class CSVProcessor
  include FileTypeProcessor

  def self.read(input_path:, with_headers: true, return_headers: false)
    CSV.read(input_path, headers: with_headers, return_headers: return_headers, converters: :numeric)
  end

  def self.write(output_path:, data:, write_headers: false)
    CSV.open(output_path, "w", write_headers: write_headers) do |csv|
      data.each do |row|
        csv << row
      end
    end
  end
end
I'm having trouble with naming. Does it look like I named things correctly? What should be named something like DataIO vs. DataProcessor? What should a file named DataStream be doing? What about something that's a converter?
Ruby isn't a kingdom of nouns. Some programmers hear "everything is an object" and think "I am processing data, therefore I need a DataProcessor object!" But in Ruby, "everything is an object". There's only one novel "thing" in your example: a chocolate order (maybe redemptions, too). So you only need one custom class: ChocolateOrder. The other "things" we already have objects for: CSV represents the CSV file, Array (or Set or Hash) can represent the collection of chocolate orders.
Processing a CSV row into an order, converting an order into workable data, and totaling those data into a result aren't "things". They're actions! In Ruby, actions are methods, blocks, procs, lambdas, or top-level functions*. In your case I see a method like ChocolateOrder#payment for getting just the price to add up, then maybe some blocks for the rest of the processing.
In pseudocode I imagine something like this:
# input
orders = CSV.foreach(input_file).map do |row|
  # get important stuff out of the row
  Order.new(x, y, z)
end

# processing
redemptions = orders.map { |order| order.get_redemption }

# output
CSV.open(output_file, "wb") do |csv|
  redemptions.each do |redemption|
    # convert redemption to an array of strings
    csv << redemption_ary
  end
end
If your rows are really simple, I would even consider just setting headers: true on the CSV so it yields hash-like CSV::Row objects, and leaving orders as those.
* Procs, lambdas, and top-level functions are objects too. But that's beside the point.
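A minimal illustration of the headers: true suggestion above (the file and column names are assumptions):
require 'csv'

# With headers: true, each yielded row is a CSV::Row, which can be indexed like a hash
orders = CSV.foreach("orders.csv", headers: true).to_a
orders.each do |order|
  puts "#{order['customer']} ordered #{order['product']}"
end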
This seems like quite a 'Java' way of thinking - in Ruby I haven't seen patterns like this used very often. I'd say that you might only really need the DataProcessor class. CSVProcessor and ChocolateTotalsConverter have only class methods, which might be more idiomatic as instance methods of DataProcessor instead. I'd start there and see how you feel about it.
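A rough sketch of that consolidation (one possible arrangement, not the only one; ChocolateTotalsCalculator is carried over from the question):
require 'csv'

class DataProcessor
  def initialize(input_path:, output_path:)
    @input_path = input_path
    @output_path = output_path
  end

  def import
    CSV.read(@input_path, headers: true, converters: :numeric)
  end

  def convert(rows)
    rows.map { |row| ChocolateTotalsCalculator.new(row).calculate }
  end

  def export(rows)
    CSV.open(@output_path, "w") do |csv|
      rows.each { |row| csv << row }
    end
  end

  def run
    export(convert(import))
  end
end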

List dynamic attributes in a Mongoid Model

I have gone over the documentation, and I can't find a specific way to go about this. I have already added some dynamic attributes to a model, and I would like to be able to iterate over all of them.
So, for a concrete example:
class Order
  include Mongoid::Document
  field :status, type: String, default: "pending"
end
And then I do the following:
Order.new(status: "processed", internal_id: "1111")
And later I want to come back and be able to get a list/array of all the dynamic attributes (in this case, "internal_id" is it).
I'm still digging, but I'd love to hear if anyone else has solved this already.
Just include something like this in your model:
module DynamicAttributeSupport
  def self.included(base)
    base.send :include, InstanceMethods
  end

  module InstanceMethods
    def dynamic_attributes
      attributes.keys - _protected_attributes[:default].to_a - fields.keys
    end

    def static_attributes
      fields.keys - dynamic_attributes
    end
  end
end
and here is a spec to go with it:
require 'spec_helper'

describe "dynamic attributes" do
  class DynamicAttributeModel
    include Mongoid::Document
    include DynamicAttributeSupport
    field :defined_field, type: String
  end

  it "provides dynamic_attribute helper" do
    d = DynamicAttributeModel.new(age: 45, defined_field: 'George')
    d.dynamic_attributes.should == ['age']
  end

  it "has static attributes" do
    d = DynamicAttributeModel.new(foo: 'bar')
    d.static_attributes.should include('defined_field')
    d.static_attributes.should_not include('foo')
  end

  it "allows creation with dynamic attributes" do
    d = DynamicAttributeModel.create(age: 99, blood_type: 'A')
    d = DynamicAttributeModel.find(d.id)
    d.age.should == 99
    d.blood_type.should == 'A'
    d.dynamic_attributes.should == ['age', 'blood_type']
  end
end
This will give you only the dynamic field names for a given record x:
dynamic_attribute_names = x.attributes.keys - x.fields.keys
If you use additional Mongoid features, you need to subtract the fields associated with those features, e.g. for Mongoid::Versioning:
dynamic_attribute_names = (x.attributes.keys - x.fields.keys) - ['versions']
To get the key/value pairs for only the dynamic attributes, make sure to clone the result of attributes(), otherwise you modify x:
attr_hash = x.attributes.clone  # make sure to clone this, otherwise you modify x!
dyn_attr_hash = attr_hash.delete_if { |k, v| !dynamic_attribute_names.include?(k) }
or in one line:
x.attributes.clone.delete_if { |k, v| !dynamic_attribute_names.include?(k) }
So, what I ended up doing is this. I'm not sure if it's the best way to go about it, but it seems to give me the results I'm looking for.
class Order
  def dynamic_attributes
    self.attributes.delete_if { |attribute|
      self.fields.keys.member? attribute
    }
  end
end
attributes appears to be a hash of the actual attributes on the object, while fields appears to be a hash of the fields that were predefined. I couldn't exactly find that in the documentation, but I'm going with it for now unless someone else knows of a better way!
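For illustration, with the Order model from the question (the exact return values are my assumption of what Mongoid produces):
order = Order.new(status: "processed", internal_id: "1111")
order.fields.keys        # => ["_id", "status"] - the predefined fields
order.attributes.keys    # => ["_id", "status", "internal_id"] - everything set on the document
order.dynamic_attributes # => {"internal_id" => "1111"}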
Try .methods or .instance_variables.
Not sure I liked the clone approach, so I wrote one too. From this you could easily build a hash of the content as well. This merely outputs all the dynamic fields (flat structure):
(d.attributes.keys - d.fields.keys).each { |a| puts "#{a} = #{d[a]}" }
I wasn't able to get any of the above solutions to work (I didn't want to have to add slabs and slabs of code to each model, and, for some reason, the attributes method did not exist on my model instances), so I decided to write my own helper to do this for me. Please note that this method includes both dynamic and predefined fields.
helpers/mongoid_attribute_helper.rb:
module MongoidAttributeHelper
  def self.included(base)
    base.extend(AttributeMethods)
  end

  module AttributeMethods
    def get_all_attributes
      map = %Q{
        function() {
          for(var key in this) {
            emit(key, null);
          }
        }
      }
      reduce = %Q{
        function(key, value) {
          return null;
        }
      }

      # Returns an array of hashes (i.e. {"_id"=>"EmailAddress", "value"=>nil})
      hashed_results = self.map_reduce(map, reduce).out(inline: true)

      # Build an array of just the "_id"s.
      results = Array.new
      hashed_results.each do |value|
        results << value["_id"]
      end

      return results
    end
  end
end
models/user.rb:
class User
  include Mongoid::Document
  include MongoidAttributeHelper
  ...
end
Once I've added the aforementioned include (include MongoidAttributeHelper) to each model which I would like to use this method with, I can get a list of all fields using User.get_all_attributes.
Granted, this may not be the most efficient or elegant of methods, but it definitely works. :)

Testing a lambda

I am creating an import feature that imports CSV files into several tables. I made a module called CsvParser which parses a CSV file and creates records. My models that receive the create actions extend the CsvParser module. They make a call to CsvParser.create and pass the correct attribute order and an optional lambda called value_parser. This lambda transforms values in a hash to a preferred format.
class Mutation < ActiveRecord::Base
  extend CsvParser

  def self.import_csv(csv_file)
    attribute_order = %w[reg_nr receipt_date reference_number book_date is_credit sum balance description]
    value_parser = lambda do |h|
      h["is_credit"] = ((h["is_credit"] == 'B') if h["is_credit"].present?)
      h["sum"] = -1 * h["sum"].to_f unless h["is_credit"]
      return [h]
    end
    CsvParser.create(csv_file, self, attribute_order, value_parser)
  end
end
The reason I'm using a lambda instead of checks inside the CsvParser.create method is that the lambda is like a business rule that belongs to this model.
My question is how I should test this lambda. Should I test it in the model or in the CsvParser? Should I test the lambda itself, or the resulting array from the self.import_csv method? Or maybe I should structure the code differently?
My CsvParser looks as follows:
require "csv"

module CsvParser
  def self.create(csv_file, klass, attribute_order, value_parser = nil)
    parsed_csv = CSV.parse(csv_file, col_sep: "|")
    records = []
    ActiveRecord::Base.transaction do
      parsed_csv.each do |row|
        record = Hash.new { |h, k| h[k] = [] }
        row.each_with_index do |value, index|
          record[attribute_order[index]] = value
        end
        if value_parser.blank?
          records << klass.create(record)
        else
          value_parser.call(record).each do |parsed_record|
            records << klass.create(parsed_record)
          end
        end
      end
    end
    return records
  end
end
I'm testing the module itself:
require 'spec_helper'

describe CsvParser do
  it "should create relations" do
    file = File.new(Rails.root.join('spec/fixtures/files/importrelaties.txt'))
    Relation.should_receive(:create).at_least(:once)
    Relation.import_csv(file).should be_kind_of Array
  end

  it "should create mutations" do
    file = File.new(Rails.root.join('spec/fixtures/files/importmutaties.txt'))
    Mutation.should_receive(:create).at_least(:once)
    Mutation.import_csv(file).should be_kind_of Array
  end

  it "should create strategies" do
    file = File.new(Rails.root.join('spec/fixtures/files/importplan.txt'))
    Strategy.should_receive(:create).at_least(:once)
    Strategy.import_csv(file).should be_kind_of Array
  end

  it "should create reservations" do
    file = File.new(Rails.root.join('spec/fixtures/files/importreservering.txt'))
    Reservation.should_receive(:create).at_least(:once)
    Reservation.import_csv(file).should be_kind_of Array
  end
end
Some interesting questions. A couple of notes:
You probably shouldn't have a return within the lambda. Just make the last statement [h].
If I understand the code correctly, the first and second lines of your lambda are overcomplicated. Reduce them to make them more readable and easier to refactor:
h["is_credit"] = (h['is_credit'] == 'B') # I *think* that will do the same
h['sum'] = h['sum'].to_f # Your original code would have left this a string
h['sum'] *= -1 unless h['is_credit']
It looks like your lambda doesn't depend on anything external (aside from h), so I would test it separately. You could even make it a constant:
class Mutation < ActiveRecord::Base
  extend CsvParser # <== See point 5 below

  PARSE_CREDIT_AND_SUM = lambda do |h|
    h["is_credit"] = (h['is_credit'] == 'B')
    h['sum'] = h['sum'].to_f
    h['sum'] *= -1 unless h['is_credit']
    [h]
  end
end
Without knowing the rationale, it's hard to say where you should put this code. My gut instinct is that it is not the job of the CSV parser (although a good parser may detect floating point numbers and convert them from strings?) Keep your CSV parser reusable. (Note: Re-reading, I think you've answered this question yourself - it is business logic, tied to the model. Go with your gut!)
Lastly, you are defining the method CsvParser.create. You don't need to extend CsvParser to get access to it, although if you have other facilities in CsvParser, consider making CsvParser.create a normal module method called something like create_from_csv_file.
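To test the lambda in isolation, a minimal spec sketch (the input hashes and expected values are my assumptions, derived from the rules above):
require 'spec_helper'

describe "Mutation::PARSE_CREDIT_AND_SUM" do
  it "negates the sum for non-credit rows" do
    h = { "is_credit" => nil, "sum" => "12.50" }
    result = Mutation::PARSE_CREDIT_AND_SUM.call(h).first
    result["is_credit"].should == false
    result["sum"].should == -12.5
  end

  it "keeps the sum positive for credit rows" do
    h = { "is_credit" => 'B', "sum" => "12.50" }
    result = Mutation::PARSE_CREDIT_AND_SUM.call(h).first
    result["is_credit"].should == true
    result["sum"].should == 12.5
  end
end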

Ruby regular expressions - speed problem

I want to obtain the information of students in class c79363 ONLY.
The following file (users.txt) contains the user IDs:
c79363::7117:dputnam,gliao01,hmccon01,crober06,cpurce01,cdavid03,dlevin01,jsmith88
d79363::7117:dputn,gliao0,hmcc01,crob06,cpur01,cdad03,dlen01,jsmh88
f79363::7117:dpnam,gli01,hmcn01,ober06,crce01,cdav03,dln01,jith88
The other file (user_info.txt) contains specific information about each user, in this format:
jsmith88:*:4185:208:jsmith113:/students/jsmith88:/usr/bin/bash
userd:*:4185:208:jsmith113:/students/jsmith88:/usr/bin/bash
gliao01:*:4185:208:jsmith113:/students/jsmith88:/usr/bin/bash
Here was my solution, but it was slow! I want to optimize the speed.
Pseudo code:
1) Read the file using File.readlines("users.txt").
2) Split each line with split(/,/), then pop from the array until I had an array with the following values: dputnam,gliao01,hmccon01,crober06,cpurce01,cdavid03,dlevin01,jsmith88
3) Continue by reading user_info.txt with File.readlines("user_info.txt").
4) Split each line with split(/:/) to get a USER ARRAY.
5) Finally, compare the first entry of each USER ARRAY with my users in class c79363.
user_info = {}
File.foreach("user_info.txt") { |line| user_info[line[/^[^:]+/]] = line.chomp }

class_name = "c79363"

File.foreach("users.txt") do |line|
  next unless line[/^[^:]+/] == class_name
  line[/[^:]+$/].rstrip.split(/,/).each do |user|
    puts user_info[user] if user_info.has_key?(user)
  end
end
user_info = {}
File.readlines("user_info.txt").each do |line|
  user_info[line.split(/:/, 2)[0]] = line.chomp
end

class_name = "c79363"

File.readlines("users.txt").each do |line|
  line = line.strip.split(/:/)
  if line[0] == class_name
    line[-1].split(",").each { |u| puts user_info[u] if user_info[u] }
  end
end
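If you want to measure which variant is faster on your data, a quick harness using Ruby's Benchmark module (a sketch; lookup_with_foreach and lookup_with_readlines are hypothetical wrappers around the two solutions above):
require 'benchmark'

Benchmark.bm(12) do |bm|
  bm.report("foreach:")   { lookup_with_foreach }   # hypothetical wrapper around the first solution
  bm.report("readlines:") { lookup_with_readlines } # hypothetical wrapper around the second solution
end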
