I'm trying to serialize an S3 Object so that I can deserialize at a later time. Deserialization is failing to grab the Object's class and is not grouping the object's variables. Here's my current code:
require 'yaml'
def serialize_array_of_objects(array, filename)
unless array.empty?
File.open(filename, "w+") do |f|
array.each { |element|
serialized_object = YAML::dump(element)
f.write(serialized_object)
}
end
end
end
Here's the contents of the file (redacted):
--- !ruby/struct:Aws::S3::Types::Object
key: file1.csv
last_modified: 2019-03-24 17:24:41.000000000 Z
etag: '"REDACTED"'
size: 41248
storage_class: STANDARD
owner:
--- !ruby/struct:Aws::S3::Types::Object
key: file2.csv
last_modified: 2019-04-24 15:30:41.000000000 Z
etag: '"REDACTED"'
size: 33527
storage_class: STANDARD
owner:
To deserialize the objects I'm using this code:
def serialized_file_to_array(filename)
array = []
File.open(filename, "r").each { |line|
array << YAML::load(line)
}
return array
end
My problem is that the object gets distorted on load. Here's the array now:
[nil, {"key"=>"file1.csv"}, {"last_modified"=>2019-03-24 17:24:41 UTC}, {"etag"=>"\"REDACTED\""}, {"size"=>41248}, {"storage_class"=>"STANDARD"}, {"owner"=>nil}, nil, {"key"=>"file2.csv"}, {"last_modified"=>2019-04-24 15:30:41 UTC}, {"etag"=>"\"REDACTED\""}, {"size"=>33527}, {"storage_class"=>"STANDARD"}, {"owner"=>nil}]
I need to be able to pull the object key values in the deserialized version.
The issue is that each dumped object spans several lines in the YAML file, but you load the file back one line at a time. A single line obviously does not contain the whole object, which is why you get back an array of hashes (one per line).
You need to either collect lines until the next object marker appears, or read the whole file, split it into objects (e.g. with a regular expression), and load each piece.
The first approach would be like:
File.readlines(FILE).
each_with_object([[], []]) do |line, (inner_acc, outer_acc)|
if line.start_with?('---')
outer_acc << YAML.load(inner_acc.join) unless inner_acc.empty?
inner_acc.clear << line
else
inner_acc << line
end
end.tap do |inner_acc, outer_acc|
break outer_acc << YAML.load(inner_acc.join) # last chunk
end
With regular expression, it should be even simpler.
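A minimal sketch of that regular-expression variant, assuming each dumped object starts with a line beginning with "---". The helper name load_yaml_documents and the sample data are hypothetical, and plain hashes stand in for the S3 structs:

```ruby
require 'yaml'

# Hypothetical helper: split the text before every line that starts with "---"
# and parse each chunk as one complete YAML document.
def load_yaml_documents(text)
  text.split(/^(?=---)/)
      .reject { |chunk| chunk.strip.empty? }
      .map { |chunk| YAML.load(chunk) }
end

# Plain hashes are used here instead of Aws::S3::Types::Object (an assumption
# made to keep the example self-contained).
sample = [
  { 'key' => 'file1.csv', 'size' => 41248 },
  { 'key' => 'file2.csv', 'size' => 33527 }
].map { |h| YAML.dump(h) }.join

documents = load_yaml_documents(sample)
puts documents.map { |doc| doc['key'] }
```

For documents tagged with !ruby/struct, recent Psych versions may refuse to instantiate the class from a plain YAML.load; in that case YAML.unsafe_load or the permitted_classes: option would probably be needed. YAML.load_stream is another way to read a multi-document file without splitting it by hand.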
I am writing a class that takes a CSV files, transforms it, and then writes the new data out.
module Transformer
class Base
def initialize(file)
@file = file
end
def original_data(&block)
opts = { headers: true }
CSV.open(file, 'rb', opts, &block)
end
def transformer
# complex manipulations here like modifying columns, picking only certain
# columns to put into new_data, etc but simplified to `+10` to keep
# example concise
-> { |row| new_data << row['some_header'] + 10 }
end
def transformed_data
self.original_data(self.transformer)
end
def write_new_data
CSV.open('new_file.csv', 'wb', opts) do |new_data|
transformed_data
end
end
end
end
What I'd like to be able to do is:
Look at the transformed data without writing it out (so I can test that it transforms the data correctly, and I don't need to write it to file right away: maybe I want to do more manipulation before writing it out)
Don't slurp all the file at once, so it works no matter the size of the original data
Have this as a base class with an empty transformer so that instances only need to implement their own transformers but the behavior for reading and writing is given by the base class.
But obviously the above doesn't work because I don't really have a reference to new_data in transformer.
How could I achieve this elegantly?
I can recommend one of the following approaches, depending on your needs and personal taste.
I have intentionally distilled the code to just its bare minimum (without your wrapping class), for clarity.
1. Simple read-modify-write loop
Since you do not want to slurp the file, use CSV.foreach. For example, for a quick debugging session, do:
CSV.foreach "source.csv", headers: true do |row|
row["name"] = row["name"].upcase
row["new column"] = "new value"
p row
end
And if you wish to write to file during that same iteration:
require 'csv'
csv_options = { headers: true }
# Open the target file for writing
CSV.open("target.csv", "wb") do |target|
# Add a header
target << %w[new header column names]
# Iterate over the source CSV rows
CSV.foreach "source.csv", **csv_options do |row|
# Mutate and add columns
row["name"] = row["name"].upcase
row["new column"] = "new value"
# Push the new row to the target file
target << row
end
end
2. Using CSV::Converters
There is a built-in feature that might be helpful: CSV::Converters (see the :converters option in the CSV.new documentation).
require 'csv'
# Register a converter in the options hash
csv_options = { headers: true, converters: [:stripper] }
# Define a converter
CSV::Converters[:stripper] = lambda do |value, field|
value ? value.to_s.strip : value
end
CSV.open("target.csv", "wb") do |target|
# same as above
CSV.foreach "source.csv", **csv_options do |row|
# same as above - input data will already be converted
# you can do additional things here if needed
end
end
3. Separate input and output from your converter classes
Based on your comment, and since you want to minimize I/O and iterations, perhaps extracting the read/write operations from the responsibility of the transformers might be of interest. Something like this.
require 'csv'
class NameCapitalizer
def self.call(row)
row["name"] = row["name"].upcase
end
end
class EmailRemover
def self.call(row)
row.delete 'email'
end
end
csv_options = { headers: true }
converters = [NameCapitalizer, EmailRemover]
CSV.open("target.csv", "wb") do |target|
CSV.foreach "source.csv", **csv_options do |row|
converters.each { |c| c.call row }
target << row
end
end
Note that the above code still does not handle the header, in case it was changed. You will probably have to reserve the last row (after all transformations) and prepend its #headers to the output CSV.
There are probably plenty of other ways to do it, but the CSV class in Ruby does not have the cleanest interface, so I try to keep code that deals with it as simple as I can.
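To tie this back to the base class from the question, here is a minimal sketch, not a definitive design. It assumes each subclass overrides a hypothetical transform(row) method, and PlusTen is an invented example subclass. Rows are produced lazily through an Enumerator, so the transformed data can be inspected without writing it, and the header is taken from the first transformed row:

```ruby
require 'csv'

module Transformer
  class Base
    def initialize(file)
      @file = file
    end

    # Subclasses override this; the base class passes rows through untouched.
    def transform(row)
      row
    end

    # Yields transformed rows one at a time (the file is never slurped).
    # Without a block it returns an Enumerator, which is handy in tests.
    def transformed_data
      return enum_for(:transformed_data) unless block_given?

      CSV.foreach(@file, headers: true) { |row| yield transform(row) }
    end

    # Writes the header of the first transformed row, then every row.
    def write_new_data(target)
      CSV.open(target, 'wb') do |out|
        transformed_data.each_with_index do |row, index|
          out << row.headers if index.zero?
          out << row
        end
      end
    end
  end
end

# Hypothetical subclass mirroring the `+10` example from the question.
class PlusTen < Transformer::Base
  def transform(row)
    row['some_header'] = row['some_header'].to_i + 10
    row
  end
end
```

With this shape, PlusTen.new("source.csv").transformed_data.first(5) reads only the first few rows, while write_new_data("target.csv") streams the whole file.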
I have a .csv file in which every row represents one call with a certain duration, number, etc. I need to create an array of Call objects. Every Call.new expects a Hash of parameters, so it should be easy: it just takes rows from the CSV. But for some reason it doesn't work: when I invoke Call.new(raw_call) it's nil.
It's also impossible for me to see any output - I placed puts in various places in code (inside blocks etc) and it simply doesn't show anything. I obviously have another class - Call, which holds initialize for Call etc.
require 'csv'
class CSVCallParser
attr_accessor :io
def initialize(io)
self.io = io
end
NAMES = {
a: :date,
b: :service,
c: :phone_number,
d: :duration,
e: :unit,
f: :cost
}
def run
parse do |raw_call|
parse_call(raw_call)
end
end
private
def parse_call(raw_call)
NAMES.each_with_object({}) do |name, title, memo|
memo[name] = raw_call[title.to_s]
end
end
def parse(&block)
CSV.parse(io, headers: true, header_converters: :symbol, &block)
end
end
CSVCallParser.new(ARGV[0]).run
Small sample of my .csv file: headers and one row:
"a","b","c","d","e","f"
"01.09.2016 08:49","International","48627843111","0:29","","0,00"
I noticed a few things that aren't going as expected. In the parse_call method,
def parse_call(raw_call)
NAMES.each_with_object({}) do |name, title, memo|
memo[name] = raw_call[title.to_s]
end
end
I tried to print name, title, and memo. I expected to get :a, :date, and {}, but what I actually got was [:a,:date],{}, and nil.
Also, raw_call headers are :a,:b,:c..., not :date, :service..., so you should be using raw_call[name], and converting that to string will not help, since the key is a symbol in the raw_call.
So I modified the function to
def parse_call(raw_call)
NAMES.each_with_object({}) do |name_title, memo|
memo[name_title[1]] = raw_call[name_title[0]]
end
end
name_title[1] returns the title (:date, :service, etc)
name_title[0] returns the name (:a, :b, etc)
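As a side note, the pair can also be destructured directly in the block parameters by wrapping it in parentheses, which avoids the [0] and [1] indexing. A small self-contained sketch using the sample row from the question (the csv_text variable is an assumption standing in for the file contents):

```ruby
require 'csv'

NAMES = {
  a: :date, b: :service, c: :phone_number,
  d: :duration, e: :unit, f: :cost
}.freeze

# Sample data copied from the question.
csv_text = <<~TEXT
  "a","b","c","d","e","f"
  "01.09.2016 08:49","International","48627843111","0:29","","0,00"
TEXT

calls = CSV.parse(csv_text, headers: true, header_converters: :symbol).map do |row|
  # (name, title) unpacks the [key, value] pair; memo is the hash being built.
  NAMES.each_with_object({}) do |(name, title), memo|
    memo[title] = row[name]
  end
end

p calls
```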
Also, in this method
def run
parse do |raw_call|
parse_call(raw_call)
end
end
You are not returning the results you collect, so you get nil.
So, I changed it to
def run
res = []
parse do |raw_call|
res << parse_call(raw_call)
end
res
end
Now, if I output the line
p CSVCallParser.new(File.read("file1.csv")).run
I get (I added two more lines to the csv sample)
[{:date=>"01.09.2016 08:49", :service=>"International", :phone_number=>"48627843111", :duration=>"0:29", :unit=>"", :cost=>"0,00"},
{:date=>"02.09.2016 08:49", :service=>"International", :phone_number=>"48622454111", :duration=>"1:29", :unit=>"", :cost=>"0,00"},
{:date=>"03.09.2016 08:49", :service=>"Domestic", :phone_number=>"48627843111", :duration=>"0:29", :unit=>"", :cost=>"0,00"}]
If you want to run this program from the terminal like so
ruby csv_call_parser.rb calls.csv
(In this case, calls.csv is passed in as an argument to ARGV)
You can do so by modifying the last line of the ruby file.
p CSVCallParser.new(File.read(ARGV[0])).run
This will also return the array with hashes like before.
csv = CSV.parse(csv_text, :headers => true)
puts csv.map(&:to_h)
outputs:
[{a:1, b:1}, {a:2, b:2}]
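The csv_text variable is not defined in the snippet above, so here is a self-contained sketch with an assumed two-column sample. Without converters the values come back as strings, and header_converters: :symbol is what turns the header names into symbol keys:

```ruby
require 'csv'

# Hypothetical sample input; the original csv_text is not shown.
csv_text = "a,b\n1,1\n2,2\n"

rows = CSV.parse(csv_text, headers: true, header_converters: :symbol).map(&:to_h)

# Each element is a plain Hash keyed by the symbolized header names.
rows.each { |row| p row }
```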
I have the following json...
{
"NumPages":"17",
"Page":"1",
"PageSize":"50",
"Total":"808",
"Start":"1",
"End":"50",
"FirstPageUri":"/v3/results?PAGE=1",
"LastPageUri":"/v3/results?PAGE=17",
"PreviousPageUri":"",
"NextPageUri":"/v3/results?PAGE=2",
"User":[
{
"RowNumber":"1",
"UserId":"86938",
"InternalId":"",
"CompletionPercentage":"100",
"DateTimeTaken":"2014-06-18T01:43:25Z",
"DateTimeLastUpdated":"2014-06-18T01:58:11Z",
"DateTimeCompleted":"2014-06-18T01:58:11Z",
"Account":{
"Id":"655",
"Name":"Technical Community College"
},
"FirstName":"Matthew",
"LastName":"Knice",
"EmailAddress":"knice@gmail.com",
"AssessmentResults":[
{
"Title":"Life Factors",
"Code":"LifeFactors",
"IsComplete":"1",
"AttemptNumber":"1",
"Percent":"58",
"Readiness":"fail",
"DateTimeCompleted":"2014-06-18T01:46:00Z"
},
{
"Title":"Learning Styles",
"Code":"LearnStyles",
"IsComplete":"0"
},
{
"Title":"Personal Attributes",
"Code":"PersonalAttributes",
"IsComplete":"1",
"AttemptNumber":"1",
"Percent":"52.08",
"Readiness":"fail",
"DateTimeCompleted":"2014-06-18T01:49:00Z"
},
{
"Title":"Technical Competency",
"Code":"TechComp",
"IsComplete":"1",
"AttemptNumber":"1",
"Percent":"100",
"Readiness":"pass",
"DateTimeCompleted":"2014-06-18T01:51:00Z"
},
{
"Title":"Technical Knowledge",
"Code":"TechKnowledge",
"IsComplete":"1",
"AttemptNumber":"1",
"Percent":"73.44",
"Readiness":"question",
"DateTimeCompleted":"2014-06-18T01:58:00Z"
},
{
"Title":"Reading Rate & Recall",
"Code":"Reading",
"IsComplete":"0"
},
{
"Title":"Typing Speed & Accuracy",
"Code":"Typing",
"IsComplete":"0"
}
]
},
{
"RowNumber":"2",
"UserId":"8654723",
"InternalId":"",
"CompletionPercentage":"100",
"DateTimeTaken":"2014-06-13T14:37:59Z",
"DateTimeLastUpdated":"2014-06-13T15:00:12Z",
"DateTimeCompleted":"2014-06-13T15:00:12Z",
"Account":{
"Id":"655",
"Name":"Technical Community College"
},
"FirstName":"Virginia",
"LastName":"Bustas",
"EmailAddress":"bigBusta@students.college.edu",
"AssessmentResults":[
{
...
I need to start processing where you see "User:". I want to ignore the stuff at the beginning (NumPages, Page, etc.). Here is the processing script I am working on...
require 'csv'
require 'json'
CSV.open("your_csv.csv", "w") do |csv| #open new file for write
JSON.parse(File.open("sample.json").read).each do |hash| #open json to parse
csv << hash.values
end
end
Right now this fails with the error:
convert.rb:6:in `block (2 levels) in <main>': undefined method `values' for ["NumPages", "17"]:Array (NoMethodError)
I have run the JSON through a parser, and it seems to be valid. What is the best way to only process the "User" data?
You have to look at the structure of the JSON object being created. Here's a very small subset of your document being parsed, which makes it easier to see and understand:
require 'json'
foo = '{"NumPages":17,"User":[{"UserId":12345}]}'
bar = JSON[foo]
# => {"NumPages"=>17, "User"=>[{"UserId"=>12345}]}
bar['User'].first['UserId'] # => 12345
foo contains the JSON for a hash. bar contains the Ruby object created by the JSON parser after it reads foo.
User is the key pointing to an array of hashes. Because it's an array, you have to specify which of the hashes in the array you want to look at, which is what bar['User'].first does.
An alternate way to access that sub-hash is:
bar['User'][0]['UserId'] # => 12345
If there were multiple hashes inside the array, you could access them by using the appropriate index value. For example, if there are two hashes, and I want the second one:
foo = '{"NumPages":17,"User":[{"UserId":12345},{"UserId":12346}]}'
bar = JSON[foo]
# => {"NumPages"=>17, "User"=>[{"UserId"=>12345}, {"UserId"=>12346}]}
bar['User'].first['UserId'] # => 12345
bar['User'][0]['UserId'] # => 12345
bar['User'][1]['UserId'] # => 12346
I'm wondering if I am going down the wrong road with the JSON.parse(File.open("sample.json").read).each do |hash|?
Yes, you are. You need to understand what you're doing, and break your code into digestible pieces so they make sense to you. Consider this:
require 'csv'
require 'json'
json_object = JSON.parse(File.read("sample.json"))
CSV.open("your_csv.csv", "w") do |csv| #open new file for write
csv << %w[RowNumber UserID AccountID AccountName FirstName LastName EmailAddress]
json_object['User'].each do |user_hash|
puts 'RowNumber: %s' % user_hash['RowNumber']
puts 'UserID: %s' % user_hash['UserId']
account = user_hash['Account']
puts 'Account->Id: %s' % account['Id']
puts 'Account->Name: %s' % account['Name']
puts 'FirstName: %s' % user_hash['FirstName']
puts 'LastName: %s' % user_hash['LastName']
puts 'EmailAddress: %s' % user_hash['EmailAddress']
csv << [
user_hash['RowNumber'],
user_hash['UserId'],
account['Id'],
account['Name'],
user_hash['FirstName'],
user_hash['LastName'],
user_hash['EmailAddress']
]
end
end
This reads the JSON file and parses it into a Ruby object immediately. There is no special magic or anything else that happens with the file: it's opened, read, and closed, and its content is passed to the JSON parser and assigned to json_object.
Once parsed, the CSV file is opened and a header row is written. It could have been written as part of the open statement but this is clearer for explaining what's going on.
json_object is a hash, so to access the 'User' data you have to use a normal hash access json_object['User']. The value for the User key is an array of hashes, so those need to be iterated over, which is what json_object['User'].each does, passing the hash elements of that array into the block as user_hash.
Inside that block it's pretty much the same thing as accessing the value for 'User': each "element" is a key/value pair, except 'Account', which is an embedded hash.
Read the error message. each called on a hash is giving you a sequence of arrays with two members (the key and value together). There is no values method on an array. And in any case if what you have is a hash there seems little point cycling through it with each; if you want the "User" entry in the hash, why don't you ask for it up front?
Just for posterity and context, this is the script I ended up using in its entirety. I needed to pull from a URL, process the results, and move them to a simple CSV. I needed to write the student id, first name, last name, and the score from each of 4 assessments to the CSV.
require 'csv'
require 'json'
require 'curb'
c = Curl::Easy.new('myURL/m/v3/results')
c.http_auth_types = :basic
c.username = 'myusername'
c.password = 'mypassword'
c.perform
json_object = JSON.parse(c.body_str)
CSV.open("your_csv.csv", "w") do |csv| #open new file for write
csv << %w[UserID FirstName LastName LifeFactors PersonalAttributes TechComp TechKnowledge]
json_object['User'].each do |user_hash|
csv << [
user_hash['UserId'],
user_hash['FirstName'],
user_hash['LastName'],
user_hash['AssessmentResults'][0]['Percent'],
user_hash['AssessmentResults'][2]['Percent'],
user_hash['AssessmentResults'][3]['Percent'],
user_hash['AssessmentResults'][4]['Percent']
]
end
end
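The script above relies on the position of each assessment inside AssessmentResults. A hedged alternative, assuming every entry carries a "Code" field as in the sample JSON, is to look the scores up by code instead. The user_row helper and the trimmed-down inline JSON are hypothetical:

```ruby
require 'csv'
require 'json'

# Assessment codes to export, in column order (taken from the sample JSON).
CODES = %w[LifeFactors PersonalAttributes TechComp TechKnowledge].freeze

# Hypothetical helper: build one CSV row, looking up each score by its "Code".
# Assessments without a "Percent" (incomplete ones) yield nil, i.e. an empty cell.
def user_row(user_hash)
  percent_by_code = user_hash['AssessmentResults'].to_h do |result|
    [result['Code'], result['Percent']]
  end
  [user_hash['UserId'], user_hash['FirstName'], user_hash['LastName'],
   *percent_by_code.values_at(*CODES)]
end

# Trimmed-down copy of the first user from the question.
json_object = JSON.parse(<<~TEXT)
  {"User": [{"UserId": "86938", "FirstName": "Matthew", "LastName": "Knice",
    "AssessmentResults": [
      {"Code": "LifeFactors", "Percent": "58"},
      {"Code": "LearnStyles", "IsComplete": "0"},
      {"Code": "PersonalAttributes", "Percent": "52.08"},
      {"Code": "TechComp", "Percent": "100"},
      {"Code": "TechKnowledge", "Percent": "73.44"}
    ]}]}
TEXT

csv_text = CSV.generate do |csv|
  csv << (%w[UserID FirstName LastName] + CODES)
  json_object['User'].each { |user_hash| csv << user_row(user_hash) }
end

puts csv_text
```

This keeps the columns correct even if the API returns the assessments in a different order. Array#to_h with a block needs Ruby 2.6 or later.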
I have a question about Ruby. What I want to do is first to sort my items ascending and then write them out to a CSV-file. Now, the problem is further complicated by the fact that I want to iterate over a lot of CSV-files. I found this thread and the answer looks fine, but I am not able to get more than the last line written to my output file.
How can I get the whole data sorted and written to different CSV-files?
My code:
require 'date'
require 'csv'
class Daily <
# Daily has a open
Struct.new(:open)
# a method to print out a csv record for the current Daily.
def print_csv_record
printf("%s,", open)
printf("\n")
end
end
#------#
# MAIN #
#------#
# This is where I iterate over my csv-files:
foobar = ['foo', 'bar']
foobar.each do |foobar|
# get the input filename from the command line
input_file = "#{foobar}.csv"
# define an array to hold the Daily records
arr = Array.new
# loop through each record in the csv file, adding
# each record to my array while overlooking the header.
f = File.open(input_file, "r")
f.each_with_index { |row, i|
next if i == 0
words = row.split(',')
p = Daily.new
# do a little work here to convert my numbers
p.open = words[1].to_f
arr.push(p)
}
# sort the data by ascending opens
arr.sort! { |a,b| a.open <=> b.open }
# print out all the sorted records (just print to stdout)
arr.each { |p|
CSV.open("#{foobar}_new.csv", "w") do |csv|
csv << p.print_csv_record
end
}
end
My input CSV-file:
Open
52.23
52.45
52.36
52.07
52.69
52.38
51.2
50.99
51.41
51.89
51.38
50.94
49.55
50.21
50.13
50.14
49.49
48.5
47.92
My output CSV-file:
47.92
You need to put the iteration inside the open CSV file:
CSV.open("#{foobar}_new.csv", "w") do |csv|
arr.each { |p|
csv << p.print_csv_record
}
end
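The reason only one line survives in the original is that CSV.open with "w" truncates the file every time it is called, so each record overwrote the previous one. Below is a fuller sketch of the whole loop, assuming the input has a single column with the header Open as in the sample. The helper name write_sorted_opens is hypothetical, and rows are pushed as arrays rather than through print_csv_record, which prints to stdout instead of returning a row:

```ruby
require 'csv'

# Hypothetical helper: read the "Open" column, sort it, write one output file.
def write_sorted_opens(input_file, output_file)
  opens = CSV.foreach(input_file, headers: true).map { |row| row['Open'].to_f }

  # Open the target once; every row is appended inside this single block.
  CSV.open(output_file, 'w') do |csv|
    csv << ['Open']
    opens.sort.each { |value| csv << [value] }
  end
end

# Iterate over the input files, skipping any that are missing.
%w[foo bar].each do |name|
  input_file = "#{name}.csv"
  next unless File.exist?(input_file)

  write_sorted_opens(input_file, "#{name}_new.csv")
end
```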
I am trying to use the Ruby standard csv lib to dump an array of objects to a CSV file called 'a.csv'.
http://ruby-doc.org/stdlib-1.9.3/libdoc/csv/rdoc/CSV.html#method-c-dump
dump(ary_of_objs, io = "", options = Hash.new)
But with this method, how can I dump into a file?
The docs give no example, and I could not find one by searching either.
Also, the docs said that...
The next method you can provide is an instance method called
csv_headers(). This method is expected to return the second line of
the document (again as an Array), which is to be used to give each
column a header. By default, ::load will set an instance variable if
the field header starts with an @ character or call send() passing the
header as the method name and the field value as an argument. This
method is only called on the first object of the Array.
Does anyone know how to pass the instance method csv_headers() to this dump function?
I haven't tested this out yet, but it looks like io should be set to a file. According to the doc you linked "The io parameter can be used to serialize to a File"
Something like:
# Open the file for writing; the block form closes it afterwards
File.open("filename", "w") do |f|
CSV.dump(ary_of_objs, f)
end
The accepted answer doesn't really answer the question so I thought I'd give a useful example.
First of all if you look at the docs at http://ruby-doc.org/stdlib-1.9.3/libdoc/csv/rdoc/CSV.html, if you hover over the method name for dump you see you can click to show source. If you do that you'll see that the dump method attempts to call csv_headers on the first object you pass in from ary_of_objs:
obj_template = ary_of_objs.first
...snip...
headers = obj_template.csv_headers
Then later you see that the method will call csv_dump on each object in ary_of_objs and pass in the headers:
ary_of_objs.each do |obj|
begin
csv << obj.csv_dump(headers)
rescue NoMethodError
csv << headers.map do |var|
if var[0] == ?@
obj.instance_variable_get(var)
else
obj[var[0..-2]]
end
end
end
end
So we need to augment each entry in ary_of_objs to respond to those two methods. Here's an example wrapper class that takes a Hash, returns the hash keys as the CSV headers, and can then dump each row based on the headers.
class CsvRowDump
def initialize(row_hash)
@row = row_hash
end
def csv_headers
@row.keys
end
def csv_dump(headers)
headers.map { |h| @row[h] }
end
end
There's one more catch though. This dump method wants to write an extra line at the top of the CSV file before the headers, and there's no way to skip that if you call this method due to this code at the top:
# write meta information
begin
csv << obj_template.class.csv_meta
rescue NoMethodError
csv << [:class, obj_template.class]
end
Even if you return '' from CsvRowDump.csv_meta, that will still be a blank line where a parser expects the headers. So instead, let's let dump write that line and then remove it afterwards when we call dump. This example assumes you have an array of hashes that all have the same keys (which will be the CSV header).
@rows = @hashes.map { |h| CsvRowDump.new(h) }
File.open(@filename, "wb") do |f|
str = CSV::dump(@rows)
f.write(str.split(/\n/)[1..-1].join("\n"))
end
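If the goal is simply a CSV file with a header row, a sketch that sidesteps CSV.dump (and its extra meta line) may be easier to maintain. It makes the same assumption as above, namely that all hashes share the same keys; the helper name write_hashes is hypothetical:

```ruby
require 'csv'

# Hypothetical helper: write an array of hashes as a CSV file with one header row.
def write_hashes(filename, hashes)
  return if hashes.empty?

  headers = hashes.first.keys
  CSV.open(filename, 'wb') do |csv|
    csv << headers
    # values_at keeps every row in header order, even if a hash's key order differs.
    hashes.each { |hash| csv << hash.values_at(*headers) }
  end
end
```

For example, write_hashes('a.csv', [{ name: 'x', size: 1 }, { name: 'y', size: 2 }]) would produce a header line followed by one line per hash.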