I'm having some trouble naming some files that I wrote. I don't really know the different between a stream, I/O, a handler, a processor (Is this a real concept?), and a controller. These are what my files look like in Ruby:
Starting from the rakefile:
desc "Calculate chocolate totals from a CSV of orders"
task :redeem_orders, [:orders_csv_path, :redemptions_csv_path] do |t, args|
args.with_defaults(:orders_csv_path => "./public/input/orders.csv", :redemptions_csv_path => "./public/output/redemptions.csv")
DataController.transfer(
input_path: args[:orders_csv_path],
output_path: args[:redemptions_csv_path],
formatter: ChocolateTotalsFormatter,
converter: ChocolateTotalsConverter
)
end
Then the controller (which in my mind delegates between different classes with the data obtained from the rakefile):
class DataController
def self.transfer(input_path:, output_path:, formatter:, converter:)
data_processor = DataProcessor.new(
input_path: input_path,
output_path: output_path,
formatter: formatter
)
export_data = converter.convert(data_processor.import)
data_processor.export(export_data)
end
end
The processor (which performs imports and exports according to the various files that were passed into this file):
class DataProcessor
attr_reader :input_path,
:output_path,
:formatter,
:input_file_processor,
:output_file_processor
def initialize(input_path:, output_path:, formatter:)
#input_path = input_path
#output_path = output_path
#formatter = formatter
#input_file_processor = FileProcessorFactory.create(File.extname(input_path))
#output_file_processor = FileProcessorFactory.create(File.extname(output_path))
end
def import
formatter.format_input(input_file_processor.read(input_path: input_path))
end
def export(export_data)
output_file_processor.write(
output_path: output_path,
data: formatter.format_output(export_data)
)
end
end
the converter referenced in the controller looks like this (it converts data that was passed in to a different format... I'm more confident about this naming):
class ChocolateTotalsConverter
def self.convert(data)
data.map do |row|
ChocolateTotalsCalculator.new(row).calculate
end
end
end
And the FileProcessorFactory in the above code snippet creates a file like this one that actually does the reading and the writing to CSV:
require 'csv'
class CSVProcessor
include FileTypeProcessor
def self.read(input_path:, with_headers: true, return_headers: false)
CSV.read(input_path, headers: with_headers, return_headers: return_headers, converters: :numeric)
end
def self.write(output_path:, data:, write_headers: false)
CSV.open(output_path, "w", write_headers: write_headers) do |csv|
data.each do |row|
csv << row
end
end
end
end
I'm having trouble with naming. Does it looks like I named things correctly? What should be named something like DataIO vs DataProcessor? What should a file named DataStream be doing? What about something that's a converter?
Ruby isn't a kingdom of nouns. Some programmers hear "everything is an object" and think "I am processing data, therefore I need a DataProcessor object!" But in Ruby, "everything is an object". There's only one novel "thing" in your example: a chocolate order (maybe redemptions, too). So you only need one custom class: ChocolateOrder. The other "things" we already have objects for: CSV represents the CSV file, Array (or Set or Hash) can represent the collection of chocolate orders.
Processing a CSV row into an order, converting an order into workable data, and totaling those data into a result aren't "things". They're actions! In Ruby, actions are methods, blocks, procs, lambdas, or top-level functions*. In your case I see a method like ChocolateOrder#payment for getting just the price to add up, then maybe some blocks for the rest of the processing.
In pseudocode I imagine something like this:
# input
orders = CSV.foreach(input_file).map do |row|
# get important stuff out of the row
Order.new(x, y, z)
end
# processing
redemptions = orders.map { |order| order.get_redemption }
# output
CSV.open(output_file, "wb") do |csv|
redemptions.each do |redemption|
# convert redemption to an array of strings
csv << redemption_ary
end
end
If your rows are really simple, I would even consider just setting headers:true on the CSV so it returns Hash and leave orders as that.
* Procs, lambdas, and top-level functions are objects too. But that's beside the point.
This seems like quite a 'java' way of thinking - in Ruby I haven't seen patterns like this used very often. I'd say that you might only really need the DataProcessor class. CSVProcessor and ChocolateTotalsConverter have only class methods, which might be more idiomatic if they were instance methods of DataProcessor instead. I'd start there and see how you feel about it.
Related
I am writing a class that takes a CSV files, transforms it, and then writes the new data out.
module Transformer
class Base
def initialize(file)
#file = file
end
def original_data(&block)
opts = { headers: true }
CSV.open(file, 'rb', opts, &block)
end
def transformer
# complex manipulations here like modifying columns, picking only certain
# columns to put into new_data, etc but simplified to `+10` to keep
# example concise
-> { |row| new_data << row['some_header'] + 10 }
end
def transformed_data
self.original_data(self.transformer)
end
def write_new_data
CSV.open('new_file.csv', 'wb', opts) do |new_data|
transformed_data
end
end
end
end
What I'd like to be able to do is:
Look at the transformed data without writing it out (so I can test that it transforms the data correctly, and I don't need to write it to file right away: maybe I want to do more manipulation before writing it out)
Don't slurp all the file at once, so it works no matter the size of the original data
Have this as a base class with an empty transformer so that instances only need to implement their own transformers but the behavior for reading and writing is given by the base class.
But obviously the above doesn't work because I don't really have a reference to new_data in transformer.
How could I achieve this elegantly?
I can recommend one of two approaches, depending on your needs and personal taste.
I have intentionally distilled the code to just its bare minimum (without your wrapping class), for clarity.
1. Simple read-modify-write loop
Since you do not want to slurp the file, use CSV::Foreach. For example, for a quick debugging session, do:
CSV.foreach "source.csv", headers: true do |row|
row["name"] = row["name"].upcase
row["new column"] = "new value"
p row
end
And if you wish to write to file during that same iteration:
require 'csv'
csv_options = { headers: true }
# Open the target file for writing
CSV.open("target.csv", "wb") do |target|
# Add a header
target << %w[new header column names]
# Iterate over the source CSV rows
CSV.foreach "source.csv", **csv_options do |row|
# Mutate and add columns
row["name"] = row["name"].upcase
row["new column"] = "new value"
# Push the new row to the target file
target << row
end
end
2. Using CSV::Converters
There is a built in functionality that might be helpful - CSV::Converters - (see the :converters definition in the CSV::New documentation)
require 'csv'
# Register a converter in the options hash
csv_options = { headers: true, converters: [:stripper] }
# Define a converter
CSV::Converters[:stripper] = lambda do |value, field|
value ? value.to_s.strip : value
end
CSV.open("target.csv", "wb") do |target|
# same as above
CSV.foreach "source.csv", **csv_options do |row|
# same as above - input data will already be converted
# you can do additional things here if needed
end
end
3. Separate input and output from your converter classes
Based on your comment, and since you want to minimize I/O and iterations, perhaps extracting the read/write operations from the responsibility of the transformers might be of interest. Something like this.
require 'csv'
class NameCapitalizer
def self.call(row)
row["name"] = row["name"].upcase
end
end
class EmailRemover
def self.call(row)
row.delete 'email'
end
end
csv_options = { headers: true }
converters = [NameCapitalizer, EmailRemover]
CSV.open("target.csv", "wb") do |target|
CSV.foreach "source.csv", **csv_options do |row|
converters.each { |c| c.call row }
target << row
end
end
Note that the above code still does not handle the header, in case it was changed. You will probably have to reserve the last row (after all transformations) and prepend its #headers to the output CSV.
There are probably plenty other ways to do it, but the CSV class in Ruby does not have the cleanest interface, so I try to keep code that deals with it as simple as I can.
Ok, so I've build a DSL and part of it requires the user of the DSL to define what I called a 'writer block'
writer do |data_block|
CSV.open("data.csv", "wb") do |csv|
headers_written = false
data_block do |hash|
(csv << headers_written && headers_written = true) unless headers_written
csv << hash.values
end
end
end
The writer block gets called like this:
def pull_and_store
raise "No writer detected" unless #writer
#writer.call( -> (&block) {
pull(pull_initial,&block)
})
end
The problem is two fold, first, is this the best way to handle this kind of thing and second I'm getting a strange error:
undefined method data_block' for Servo_City:Class (NoMethodError)
It's strange becuase I can see data_block right there, or at least it exists before the CSV block at any rate.
What I'm trying to create is a way for the user to write a wrapper block that both wraps around a block and yields a block to the block that is being wrapped, wow that's a mouthful.
Inner me does not want to write an answer before the question is clarified.
Other me wagers that code examples will help to clarify the problem.
I assume that the writer block has the task of persisting some data. Could you pass the data into the block in an enumerable form? That would allow the DSL user to write something like this:
writer do |data|
CSV.open("data.csv", "wb") do |csv|
csv << header_row
data.each do |hash|
data_row = hash.values
csv << data_row
end
end
end
No block passing required.
Note that you can pass in a lazy collection if dealing with hugely huge data sets.
Does this solve your problem?
Trying to open the CSV file every time you want to write a record seems overly complex and likely to cause bad performance (unless writing is intermittent). It will also overwrite the CSV file each time unless you change the file mode from wb to ab.
I think something simple like:
csv = CSV.open('data.csv', 'wb')
csv << headers
writer do |hash|
csv << hash.values
end
would be something more understandable.
I want to communicate between ruby and other applications in XML. I have defined a schema for this communication and I'm looking for the best way to do the transformation from data in Ruby to the XML and vice versa.
I have an XML document my_document.xml:
<myDocument>
<number>1</number>
<distance units="km">20</distance>
</myDocument>
Which conforms to an Schema my_document_type.xsd (I shalln't bother writing it out here).
Now I'd like to have the following class automatically generated from the XSD - is this reasonable or feasible?
# Represents a document created in the form of my_document_type.xsd
class MyDocument
attr_accessor :number, :distance, :distance_units
# Allows me to create this object from data in Ruby
def initialize(data)
#number = data['number']
#distance = data['distance']
#distance_units = data['distance_units']
end
# Takes an XML document of the correct form my_document.xml and populates internal systems
def self.from_xml(xml)
# Reads the XML and populates:
doc = ALibrary.load(xml)
#number = doc.xpath('/number').text()
#distance = doc.xpath('/distance').text()
#distance_units = doc.xpath('/distance').attr('units') # Or whatever
end
def to_xml
# Jiggery pokery
end
end
So that now I can do:
require 'awesomelibrary'
awesome_class = AwesomeLibrary.load_from_xsd('my_document_type.xsd')
doc = awesome_class.from_xml('my_document.xml')
p doc.distance # => 20
p doc.distance_units # => 'km'
And I can also do
doc = awesome_class.new('number' => 10, 'distance_units' => 'inches', 'distance' => '5')
p doc.to_xml
And get:
<myDocument>
<number>10</number>
<distance units="inches">5</distance>
</myDocument>
This sounds like fairly intense functionality to me, so I'm not expecting a full answer, but any tips as to libraries which already do this (I've tried using RXSD, but I can't figure out how to get it to do this) or any feasibility thoughts and so on.
Thanks in advance!
Have you tried Nokogiri? The Slop decorator implements method_missing into the document in such a way that it essentially duplicates the functionality you're looking for.
I am creating an import feature that imports CSV files into several tables. I made a module called CsvParser which parses a CSV file and creates records. My models that receive the create actions extends theCsvParser. They make a call to CsvParser.create and pass the correct attribute order and an optional lambda called value_parser. This lambda transforms values in a hash to a preffered format.
class Mutation < ActiveRecord::Base
extend CsvParser
def self.import_csv(csv_file)
attribute_order = %w[reg_nr receipt_date reference_number book_date is_credit sum balance description]
value_parser = lambda do |h|
h["is_credit"] = ((h["is_credit"] == 'B') if h["is_credit"].present?)
h["sum"] = -1 * h["sum"].to_f unless h["is_credit"]
return [h]
end
CsvParser.create(csv_file, self, attribute_order, value_parser)
end
end
The reason that I'm using a lambda instead of checks inside the CsvParser.create method is because the lambda is like a business rule that belongs to this model.
My question is how i should test this lambda. Should i test it in the model or the CsvParser? Should i test the lambda itself or the result of an array of the self.import method? Maybe i should make another code structure?
My CsvParser looks as follows:
require "csv"
module CsvParser
def self.create(csv_file, klass, attribute_order, value_parser = nil)
parsed_csv = CSV.parse(csv_file, col_sep: "|")
records = []
ActiveRecord::Base.transaction do
parsed_csv.each do |row|
record = Hash.new {|h, k| h[k] = []}
row.each_with_index do |value, index|
record[attribute_order[index]] = value
end
if value_parser.blank?
records << klass.create(record)
else
value_parser.call(record).each do |parsed_record|
records << klass.create(parsed_record)
end
end
end
end
return records
end
end
I'm testing the module itself:
require 'spec_helper'
describe CsvParser do
it "should create relations" do
file = File.new(Rails.root.join('spec/fixtures/files/importrelaties.txt'))
Relation.should_receive(:create).at_least(:once)
Relation.import_csv(file).should be_kind_of Array
end
it "should create mutations" do
file = File.new(Rails.root.join('spec/fixtures/files/importmutaties.txt'))
Mutation.should_receive(:create).at_least(:once)
Mutation.import_csv(file).should be_kind_of Array
end
it "should create strategies" do
file = File.new(Rails.root.join('spec/fixtures/files/importplan.txt'))
Strategy.should_receive(:create).at_least(:once)
Strategy.import_csv(file).should be_kind_of Array
end
it "should create reservations" do
file = File.new(Rails.root.join('spec/fixtures/files/importreservering.txt'))
Reservation.should_receive(:create).at_least(:once)
Reservation.import_csv(file).should be_kind_of Array
end
end
Some interesting questions. A couple of notes:
You probably shouldn't have a return within the lambda. Just make the last statement [h].
If I understand the code correctly, the first and second lines of your lambda are overcomplicated. Reduce them to make them more readable and easier to refactor:
h["is_credit"] = (h['is_credit'] == 'B') # I *think* that will do the same
h['sum'] = h['sum'].to_f # Your original code would have left this a string
h['sum'] *= -1 unless h['is_credit']
It looks like your lambda doesn't depend on anything external (aside from h), so I would test it separately. You could even make it a constant:
class Mutation < ActiveRecord::Base
extend CsvParser # <== See point 5 below
PARSE_CREDIT_AND_SUM = lambda do |h|
h["is_credit"] = (h['is_credit'] == 'B')
h['sum'] = h['sum'].to_f
h['sum'] *= -1 unless h['is_credit']
[h]
end
Without knowing the rationale, it's hard to say where you should put this code. My gut instinct is that it is not the job of the CSV parser (although a good parser may detect floating point numbers and convert them from strings?) Keep your CSV parser reusable. (Note: Re-reading, I think you've answered this question yourself - it is business logic, tied to the model. Go with your gut!)
Lastly, you are defining and the method CsvParser.create. You don't need to extend CsvParser to get access to it, although if you have other facilities in CsvParser, consider making CsvParser.create a normal module method called something like create_from_csv_file
i have to parse csv file which has customers and product they have ordered . customers can repeat for different product . i have to get all the unique customers and products they have ordered. Then print out each customer and there product . i have been asked to do in a object oriented way so
1) should i create a customer objects and have a product as there attribute
2) just write a program using foreach and loop through and store customer and product in a hash and print it out .
what throws me off is i have been asked to do it in a object oriented way. if do it by creating objects how can i store a custom object in memory ? so that if i come across a customer second time i have to add the product and at the end i have to loop through all the objects and print it out . sorry i have bad English thanks reading a long question and for the help.
How can you store a custom object in memory? By creating the object and keeping it in a list, hash, or whatever seems appropriate. (Probably a hash, with the key being whatever unique value you have in your CSV, and the value would be a collection of products.)
Being asked to do it in "an object-oriented way" is a little arbitrary, though.
If you are using FasterCSV or Ruby 1.9 you can extend the parser allowing you to map each CSV row to a custom object.
# http://ruby-doc.org/stdlib-1.9.2/libdoc/csv/rdoc/CSV.html#method-c-load
# http://ruby-doc.org/stdlib-1.9.2/libdoc/csv/rdoc/CSV.html#method-c-dump
# https://github.com/JEG2/faster_csv/blob/master/test/tc_serialization.rb
require 'csv'
class Person
attr_accessor :id, :name, :email
def self.csv_load(meta, headers, row)
person = Person.new
headers.each.with_index { |h,i|
person.send "#{h}=", row[i]
}
person
end
def self.parse(csv)
meta = "class,#{self.to_s}\n"
CSV.load("#{meta}#{csv}")
end
def dump
self.class.dump([self])
end
def self.dump(people, io='', options={})
CSV.dump(people, io, options).strip
end
def self.csv_meta
[]
end
def csv_headers
%w(id name email)
end
def csv_dump(headers)
headers.map { |h| self.instance_variable_get "##{h}" }
end
end
CSV_DUMP = <<-CSV
class,Person
id=,name=,email=
1,"First Dude",dude#company.com
2,"Second Dude",2nddude#company.com
3,"Third Dude",3rddude#company.com
CSV
CSV_INPUT = <<-CSV
id,name,email
1,"First Dude",dude#company.com
2,"Second Dude",2nddude#company.com
3,"Third Dude",3rddude#company.com
CSV
CSV_DUMP2 = <<-CSV
class,Person
#{CSV_INPUT}
CSV
people = Person.parse(CSV_INPUT)
puts people.inspect
dumped = Person.dump(people)
puts dumped
puts "----"
puts Person.parse(dumped).inspect