Calculate elements in CSV file - ruby

I am new to Ruby and I have run into a problem while trying to aggregate some values.
I've got 6 CSV files with the same headers, and the question is how to find the total amount of payments for each paid month.
01-test.csv
Payment date,Payable month,House,Apartment,Amount of payment
2014-09-14,2014-08,Panel,84,5839.77
2014-09-14,2014-08,Brick,118,4251.63
2014-09-14,2014-08,Brick,97,471.5
2014-09-14,2014-08,Panel,53,236.22
2014-09-14,2014-08,Panel,83,4220.77
.......
02-test.csv
Payment date,Payable month,House,Apartment,Amount of payment
2014-10-01,2014-08,Brick,34,1522.59
2014-10-01,2014-08,Brick,117,1285.57
2014-10-01,2014-08,Brick,136,1925.97
2014-10-01,2014-08,Brick,24,1032.95
2014-10-01,2014-08,Brick,113,957.01
.......
Here is my code:
def create_month_array(payments)
months = []
months = payments.uniq { |a| a[:payed_for] }
months
end
def payed_for_each_month(payments, months)
sums = Array.new(months.length){|a| a = 0}
months.each{|a|
if(a[:payed_for] == payments.each{|x| x[:payed_for]})
.....
end
}
p sum
sum.round(2)
end
Thanks for any hints.

Suppose the data were read from files into strings.
str1 =<<_
2014-09-14,2014-08,Panel,84,5839.77
2014-09-14,2014-08,Brick,118,4251.63
2014-09-14,2014-09,Brick,97,471.5
2014-09-14,2014-10,Panel,53,236.22
2014-09-14,2014-10,Panel,83,4220.77
_
str2 =<<_
2014-10-01,2014-08,Brick,34,1522.59
2014-10-01,2014-09,Brick,117,1285.57
2014-10-01,2014-09,Brick,136,1925.97
2014-10-01,2014-10,Brick,24,1032.95
2014-10-01,2014-11,Brick,113,957.01
_
We can then combine the strings into a single string, convert it to an array of lines, and use a counting hash to aggregate the values for each payable month, which I assume to be the value of the second field. See Hash::new, specifically the form where new is given an argument that becomes the default value (here 0).
(str1 + str2).lines.each_with_object(Hash.new(0)) do |line,h|
_, payable_month, _, _, amount = line.split(',')
h[payable_month] += amount.to_f
end
#=> {"2014-08"=>11613.990000000002, (5839.77 + 4251.63 + 1522.59)
# "2014-09"=>3683.04, ( 471.5 + 1285.57 + 1925.97)
# "2014-10"=>5489.9400000000005, ( 236.22 + 4220.77 + 1032.95)
# "2014-11"=>957.01} ( 957.01)
If a hash h is defined
h = Hash.new(0)
Ruby expands h[payable_month] += amount.to_f to
h[payable_month] = h[payable_month] + amount.to_f
If h has no key payable_month, h[payable_month] on the right of the equality sign returns the default value. Hence,
h[payable_month] = 0 + amount.to_f
#=> amount.to_f
Note we could have alternatively written
(str1.lines + str2.lines).each_with_object(Hash.new(0))...
or we could have read each file line-by-line and written all those lines to one file.
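The same counting-hash idea can also be applied to the files directly, without first building one big string. The following is only a sketch under assumptions: the helper name `totals_by_month`, the temporary directory, and the sample file names and amounts are made up for illustration, and the files are assumed to share the header row shown in the question.

```ruby
require 'csv'
require 'tmpdir'

# Hypothetical helper: sums "Amount of payment" per "Payable month"
# across every CSV file listed in `paths`.
def totals_by_month(paths)
  paths.each_with_object(Hash.new(0)) do |path, totals|
    CSV.foreach(path, headers: true) do |row|
      totals[row['Payable month']] += row['Amount of payment'].to_f
    end
  end
end

header = "Payment date,Payable month,House,Apartment,Amount of payment\n"

monthly_totals = Dir.mktmpdir do |dir|
  # Two small sample files standing in for 01-test.csv, 02-test.csv, ...
  first  = File.join(dir, '01-sample.csv')
  second = File.join(dir, '02-sample.csv')
  File.write(first,  header + "2014-09-14,2014-08,Panel,84,100.50\n" \
                              "2014-09-14,2014-09,Brick,97,20.25\n")
  File.write(second, header + "2014-10-01,2014-08,Brick,34,50.25\n")
  totals_by_month([first, second])
end

# Round only for display; the hash keeps the raw float sums.
monthly_totals.each { |month, total| puts "#{month}: #{total.round(2)}" }
```

With `headers: true` the header row is skipped automatically and fields can be addressed by name, so the position of the "Payable month" column no longer matters.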

To combine all CSV data across multiple files use the following:
require 'csv'
csv_files = ["01-test.csv", "02-test.csv", "03-test.csv", "04-test.csv", "05-test.csv", "06-test.csv"]
csv_data = CSV.generate(headers: :first_row) do |csv|
  # Write the header row once, taken from the first file
  csv << CSV.open(csv_files.first).readline
  csv_files.each do |csv_file|
    # Skip each file's own header row and append the data rows
    CSV.read(csv_file)[1..-1].each { |row| csv << row }
  end
end
To then calculate the sum for each "Payable month" (or "Payment date";
it was not clear which one is the paid month), do the following.
Interpret the data using Ruby's CSV library:
data = CSV.parse(csv_data, headers: true)
Group the data by the paid month:
month_array = data.group_by { |row| row["Payable month"] }
# month_array = data.group_by { |row| row["Payment date"][0..6] }
Choose either line and comment out the other.
For each month get the sum/reduce of all the "Amount of payment" into a
total for that month within our collection of totals
payed_for_each_month = month_array.each_with_object({}) do |(month, rows), totals|
totals[month] = rows.reduce(0.0) { |sum, row| sum + row["Amount of payment"].to_f }
end
This produces the final result with the presented data
payed_for_each_month
# => {"2014-08"=>21743.98}
If the month of "Payment date" were used instead, the totals would be:
month_array = data.group_by { |row| row["Payment date"][0..6] }
# ...
payed_for_each_month
# => {"2014-09"=>15019.890000000001,
# "2014-10"=>6724.09}
All the code together:
data = CSV.parse(csv_data, headers: true)
month_array = data.group_by { |row| row["Payable month"] }
# month_array = data.group_by { |row| row["Payment date"][0..6] }
payed_for_each_month = month_array.each_with_object({}) do |(month, rows), totals|
totals[month] = rows.reduce(0.0) { |sum, row| sum + row["Amount of payment"].to_f }
end
payed_for_each_month
# => {"2014-08"=>21743.98}
References:
group_by
reduce
each_with_object
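On Ruby 2.4 or later, the group-then-reduce step can be written more compactly with `transform_values` and `sum`. This is a sketch rather than a replacement for the code above; the inline CSV text is sample data adapted from the question (the third row's payable month is changed so that two groups appear).

```ruby
require 'csv'

# Sample data adapted from the question; not the real files.
csv_data = <<~ROWS
  Payment date,Payable month,House,Apartment,Amount of payment
  2014-09-14,2014-08,Panel,84,5839.77
  2014-09-14,2014-08,Brick,118,4251.63
  2014-10-01,2014-09,Brick,34,1522.59
ROWS

rows_by_month = CSV.parse(csv_data, headers: true)
                   .group_by { |row| row['Payable month'] }

# transform_values keeps the month keys and replaces each group of rows
# with its total, rounded to two decimals to hide float noise.
monthly_totals = rows_by_month.transform_values do |rows|
  rows.sum { |row| row['Amount of payment'].to_f }.round(2)
end

p monthly_totals
```

Rounding inside the block is a design choice for display purposes; if the totals feed further arithmetic, it may be better to round only at the point of output.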

Related

Sum matrixes as numbers not strings ruby

I am trying to sum 2 matrices read from CSV files.
Currently, I put them into two arrays and then transform the arrays into matrices. However, when I print the result, I get concatenated strings, not summed integers.
require 'csv'
require 'matrix'
matrix1 = "./matrix1.csv"
matrix2 = "./matrix2.csv"
line_count = 0
elements_in_line_count = 0
arr1 = Array.new #=> []
arr2 = Array.new #=> []
CSV.foreach(matrix1) do |row|
arr1 << row
line_count += 1
elements_in_line_count = row.size
end
n1 = elements_in_line_count
m1 = line_count
# find n and m of second matrix
line_count = 0
elements_in_line_count = 0
CSV.foreach(matrix2) do |row|
# print row
arr2 << row
line_count += 1
elements_in_line_count = row.size
end
puts Matrix.rows(arr1) + Matrix.rows(arr2)
For example, CSV 1 is:
1,2
3,4
Same for CSV 2.
The output is
Matrix[[11, 22], [33, 44]]
But I want it to be [2,4],[6,8]
When you read in the CSV, all rows and columns are read as strings by default. The Ruby CSV class accepts an optional :converters parameter (in foreach, new, and similar methods) that it uses to convert each applicable field. One of the built-in converters is
:integer
Converts any field Integer() accepts.
So you can also change your code to look like:
csv_options = { converters: [:integer] }
# The double splat passes the Hash as keyword options (required on Ruby 3+)
CSV.foreach(matrix1, **csv_options) do |row|
# ...
CSV.foreach(matrix2, **csv_options) do |row|
to achieve results similar to calling map(&:to_i) on each row.
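The effect of the converter can be checked on inline data. This is a small sketch that parses the question's sample matrix from a string instead of from `matrix1.csv`; the variable names are made up for illustration.

```ruby
require 'csv'
require 'matrix'

csv_text = "1,2\n3,4\n"

# Without converters every field is a String.
as_strings = CSV.parse(csv_text)

# With the :integer converter, fields that Integer() accepts become Integers.
as_integers = CSV.parse(csv_text, converters: :integer)

sum = Matrix.rows(as_integers) + Matrix.rows(as_integers)

p as_strings   # [["1", "2"], ["3", "4"]]
p sum.to_a     # [[2, 4], [6, 8]]
```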
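# matrix1 and matrix2 are the CSV file paths from the question
[matrix1, matrix2].map do |path|
  CSV.foreach(path).map { |row| row.map(&:to_i) }
end.reduce do |a, b|
  # Add the two arrays of rows cell by cell
  a.map.with_index do |row, idx|
    row.zip(b[idx]).map { |cell1, cell2| cell1 + cell2 }
  end
end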
When you're reading in the CSV, all columns will be strings, so you'll have to manually do the conversion to a number in the code.
If all of the columns of the CSV are intended to be numbers, you can add .map(&:to_i) to the row line. Like this:
CSV.foreach(matrix1) do |row|
arr1 << row.map(&:to_i) # <-- this will turn everything in the row into a number
line_count += 1
elements_in_line_count = row.size
end
As you want to add matrices, consider using Ruby's built-in Matrix class, and the instance method Matrix#+ in particular.
Let's first construct three CSV files.
fname1 = 't1.csv'
fname2 = 't2.csv'
fname3 = 't3.csv'
File.write(fname1, "1,2\n3,4")
#=> 7
File.write(fname2, "100,200\n300,400")
#=> 15
File.write(fname3, "1000,2000\n3000,4000")
#=> 19
We can sum the underlying matrices as follows. Note that the helper method has to be defined before it is called.
require 'csv'
require 'matrix'
def matrix_from_CSV(fname)
  Matrix[*CSV.read(fname, converters: [:integer])]
end
fnames = [fname1, fname2, fname3]
fnames.drop(1).reduce(matrix_from_CSV(fnames.first)) do |t, fname|
  t + matrix_from_CSV(fname)
end.to_a
#=> [[1101, 2202],
#    [3303, 4404]]
I borrowed converters: [:integer] from @Simple's answer. I wasn't aware of that.

Modifying reference to hash value in Ruby

I'm new to Ruby and I have a JSON data set that I am de-identifying using stympy's Faker. I would prefer to change the values in the Hash by reference.
I've tried changing the assignments, e.g. key['v'] = namea[1] to data['cachedBook']['rows'][key][value] = namea[1], but I get a "no implicit conversion of Array into String" error. That makes sense, since each row is an array in itself, but I'm unsure how to proceed.
A single row e.g. data['cachedBook']['rows'] looks like this:
[{"v":"Sijpkes_PreviewUser","c":"LN","uid":"9######","iuid":"3####7","avail":true,"sortval":"Sijpkes_PreviewUser"},
{"v":"Paul","c":"FN","sortval":"Paul"},
{"v":"#####_previewuser","c":"UN"},
{"v":"","c":"SI"},{"v":"30 June 2016","c":"LA","sortval":1467261918000},
{"v":"Available","c":"AV"},[],[],[],[],[],[],
{"v":"-","tv":"","numAtt":"0","c":"374595"},[],[],
{"v":"-","tv":"","numAtt":"0","c":"374596"},[],[],[],
{"v":0,"tv":"0.0","mp":840,"or":"y","c":"362275"},
{"v":0,"tv":"0.0","mp":99.99999,"or":"y","c":"389721"}]
The key and value are interpreted as the first two entries.
Sensitive data has been removed with ####s.
Ruby code:
data['cachedBook']['rows'].each do |key, value|
fullname = Faker::Name.name
namea = fullname.split(' ')
str = "OLD: " + String(key['v']) + " " + String(value['v']) +"\n";
puts str
if ["Ms.", "Mr.", "Dr.", "Miss", "Mrs."].any? { |needle| fullname.include? needle }
key['v'] = namea[2]
value['v'] = namea[1]
value['sortval'] = namea[1]
else
key['v'] = namea[1]
value['v'] = namea[0]
value['sortval'] = namea[1]
end
str = "\nNEW: \nFullname: "+String(fullname)+"\nConverted surname: "+ String(key['v']) + "\n\t firstname: " + String(value['v'])
puts str
end
puts data
OK, this has been an excellent learning exercise!
The problem I was having was in two parts:
1. The output of JSON.parse was a Hash, but that Hash was storing Arrays, so my code was breaking. The sample row above even includes some empty arrays: ... [],[],[] ....
2. I misunderstood how each works here. I assumed key, value pairs (similar to jQuery's each), but each row is itself an Array, so key and value in the original each block were actually bound to the first two elements of that row.
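The destructuring behaviour can be reproduced in isolation. The sketch below uses a made-up, shortened row (the names and values are illustrative, not the real data) to contrast a two-parameter block with a one-parameter block.

```ruby
# Each "row" is an Array of cells; some cells are empty Arrays.
rows = [
  [{ 'v' => 'Smith', 'c' => 'LN' },
   { 'v' => 'Ann',   'c' => 'FN' },
   [],
   { 'v' => 'x',     'c' => 'UN' }]
]

# Two block parameters destructure each row, so only the first two
# cells are bound and the remaining cells are silently ignored.
first_two = rows.map { |first_cell, second_cell| [first_cell['c'], second_cell['c']] }

# One block parameter receives the whole row, so every Hash cell can
# be visited and the empty Arrays skipped.
codes = rows.flat_map do |row|
  row.select { |cell| cell.is_a?(Hash) }.map { |cell| cell['c'] }
end

p first_two
p codes
```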
So here is my amended code:
data['cachedBook']['rows'].map! { |row|
fullname = Faker::Name.name
namea = fullname.split(' ')
row.each { |val|
if val.class == Hash
newval = val.clone
if ["Ms.", "Mr.", "Dr.", "Miss", "Mrs."].any? { |needle| fullname.include? needle }
if val.key?("c") && val["c"] == "LN"
newval["v"] = namea[1]
newval["sortval"] = namea[1]
end
if val.key?("c") && val["c"] == "FN"
newval["v"] = namea[2]
newval["sortval"] = namea[2]
end
else
if val.key?("c") && val["c"] == "LN"
newval["v"] = namea[0]
newval["sortval"] = namea[0]
end
if val.key?("c") && val["c"] == "FN"
newval["v"] = namea[1]
newval["sortval"] = namea[1]
end
end
val.merge!(newval)
end
}
}

Merging Ranges using Sets - Error - Stack level too deep (SystemStackError)

I have a number of ranges that I want to merge together if they overlap. The way I’m currently doing this is by using Sets.
This is working. However, when I attempt the same code with larger ranges as follows, I get a stack level too deep (SystemStackError).
require 'set'
ranges = [Range.new(73, 856), Range.new(82, 1145), Range.new(116, 2914), Range.new(3203, 3241)]
set = Set.new
ranges.each { |r| set << r.to_set }
set.flatten!
sets_subsets = set.divide { |i, j| (i - j).abs == 1 } # this line causes the error
puts sets_subsets
The line that is failing is taken directly from the Ruby Set Documentation.
I would appreciate it if anyone could suggest a fix or an alternative that works for the above example
EDIT
I have put the full code I’m using here:
Basically it is used to add html tags to an amino acid sequence according to some features.
require 'set'
def calculate_formatting_classes(hsps, signalp)
merged_hsps = merge_ranges(hsps)
sp = format_signalp(merged_hsps, signalp)
hsp_class = (merged_hsps - sp[1]) - sp[0]
rank_format_positions(sp, hsp_class)
end
def merge_ranges(ranges)
set = Set.new
ranges.each { |r| set << r.to_set }
set.flatten
end
def format_signalp(merged_hsps, sp)
sp_class = sp - merged_hsps
sp_hsp_class = sp & merged_hsps # overlap regions between sp & merged_hsp
[sp_class, sp_hsp_class]
end
def rank_format_positions(sp, hsp_class)
results = []
results += sets_to_hash(sp[0], 'sp')
results += sets_to_hash(sp[1], 'sphsp')
results += sets_to_hash(hsp_class, 'hsp')
results.sort_by { |s| s[:pos] }
end
def sets_to_hash(set = nil, cl)
return nil if set.nil?
hashes = []
merged_set = set.divide { |i, j| (i - j).abs == 1 }
merged_set.each do |s|
hashes << { pos: s.min.to_i - 1, insert: "<span class=#{cl}>" }
hashes << { pos: s.max.to_i - 0.1, insert: '</span>' } # for ordering
end
hashes
end
working_hsp = [Range.new(7, 136), Range.new(143, 178)]
not_working_hsp = [Range.new(73, 856), Range.new(82, 1145),
Range.new(116, 2914), Range.new(3203, 3241)]
sp = Range.new(1, 20).to_set
# working
results = calculate_formatting_classes(working_hsp, sp)
# Not Working
# results = calculate_formatting_classes(not_working_hsp, sp)
puts results
Here is one way to do this:
ranges = [Range.new(73, 856), Range.new(82, 1145),
Range.new(116, 2914), Range.new(3203, 3241)]
ranges.size.times do
ranges = ranges.sort_by(&:begin)
t = ranges.each_cons(2).to_a
t.each do |r1, r2|
if (r2.cover? r1.begin) || (r2.cover? r1.end) ||
(r1.cover? r2.begin) || (r1.cover? r2.end)
ranges << Range.new([r1.begin, r2.begin].min, [r1.end, r2.end].max)
ranges.delete(r1)
ranges.delete(r2)
t.delete [r1,r2]
end
end
end
p ranges
#=> [73..2914, 3203..3241]
The other answers aren't bad, but I prefer a simple recursive approach:
def merge_ranges(*ranges)
range, *rest = ranges
return if range.nil?
# Find the index of the first range in `rest` that overlaps this one
other_idx = rest.find_index do |other|
range.cover?(other.begin) || other.cover?(range.begin)
end
if other_idx
# An overlapping range was found; remove it from `rest` and merge
# it with this one
other = rest.slice!(other_idx)
merged = ([range.begin, other.begin].min)..([range.end, other.end].max)
# Try again with the merged range and the remaining `rest`
merge_ranges(merged, *rest)
else
# No overlapping range was found; move on
[ range, *merge_ranges(*rest) ]
end
end
Note: This code assumes each range is ascending (e.g. 10..5 will break it).
Usage:
ranges = [ 73..856, 82..1145, 116..2914, 3203..3241 ]
p merge_ranges(*ranges)
# => [73..2914, 3203..3241]
ranges = [ 0..10, 5..20, 30..50, 45..80, 50..90, 100..101, 101..200 ]
p merge_ranges(*ranges)
# => [0..20, 30..90, 100..200]
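If recursion depth is a concern for long lists, the same result can be reached iteratively by sorting on the start of each range and sweeping once. This is a sketch of a different technique from the recursive method above; `merge_sorted` is a hypothetical name, and it assumes ascending, inclusive integer ranges.

```ruby
# Hypothetical helper: merges overlapping ranges by sorting them on
# their start and extending the last merged range while they overlap.
def merge_sorted(ranges)
  ranges.sort_by(&:begin).each_with_object([]) do |range, merged|
    last = merged.last
    if last && range.begin <= last.end
      # Overlaps the previous merged range: extend it if needed.
      merged[-1] = last.begin..[last.end, range.end].max
    else
      merged << range
    end
  end
end

merged = merge_sorted([73..856, 82..1145, 116..2914, 3203..3241])
p merged
```

After the sort, each range only needs to be compared with the most recently merged one, so the work is dominated by the sort itself.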
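I believe your resulting set has too many items to be used with divide. The flattened set has 2881 elements (2842 from 73..2914 plus 39 from 3203..3241). As far as I can tell from set.rb in Ruby 2.x and 3.x, divide with a two-argument block calls the block for every ordered pair of elements (2881^2, about 8.3 million calls) and then finds the connected groups with TSort, whose depth-first search is recursive. A run of 2842 consecutive integers forms one long chain, so the recursion goes thousands of levels deep, which is what raises the stack level too deep error.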
Without using sets, you can use this code to merge the ranges:
module RangeMerger
def merge(range_b)
if cover?(range_b.first) && cover?(range_b.last)
self
elsif cover?(range_b.first)
self.class.new(first, range_b.last)
elsif cover?(range_b.last)
self.class.new(range_b.first, last)
else
nil # Unmergable
end
end
end
module ArrayRangePusher
def <<(item)
if item.kind_of?(Range)
item.extend RangeMerger
each_with_index do |own_item, idx|
own_item.extend RangeMerger
if new_range = own_item.merge(item)
self[idx] = new_range
return self
end
end
end
super
end
end
ranges = [Range.new(73, 856), Range.new(82, 1145), Range.new(116, 2914), Range.new(3203, 3241)]
new_ranges = Array.new
new_ranges.extend ArrayRangePusher
ranges.each do |range|
new_ranges << range
end
puts ranges.inspect
puts new_ranges.inspect
This will output:
[73..856, 82..1145, 116..2914, 3203..3241]
[73..2914, 3203..3241]
which I believe is the intended output for your original problem. It's a bit ugly, but I'm a bit rusty at the moment.
Edit: I don't think this has anything to do with your original problem before the edits which was about merging ranges.

local variable vs instance variable Ruby initialize

I have a class in Ruby where I pass in a Hash of commodity prices. They are in the form
{"date (string)" => price (float), etc, etc}
and in the initialise method I convert the dates to Dates like so:
@data = change_key_format(dates)
But I notice that that method seems to change the original argument as well. Why is that? Here is the code:
def initialize(commodity_name, data)
puts "creating ...#{commodity_name}"
@commodity_name = commodity_name
@data = change_hash_keys_to_dates(data)
@dates = array_of_hash_keys(data)
puts data ######## UNCHANGED
@data = fix_bloomberg_dates(@data, @dates)
puts data ######## CHANGED -------------------- WHY???
#get_price_data
end
def fix_bloomberg_dates(data, dates)
#Fixes the bad date from bloomberg
data.clone.each do |date, price|
#Looks for obvious wrong date
if date < Date.strptime("1900-01-01")
puts dates[1].class
date_gap = (dates[1] - dates[2]).to_i
last_date_day = dates[1].strftime("%a %d %b")
last_date_day = last_date_day.split(" ")
last_date_day = last_date_day[0].downcase
#Correct the data for either weekly or daily prices
#Provided there are no weekend prices
if date_gap == 7 && last_date_day == "fri"
new_date = dates[1] + 7
data[new_date] = data.delete(date)
elsif date_gap == 1 && last_date_day == "thu"
new_date = dates[1] + 4
data[new_date] = data.delete(date)
else
new_date = dates[1] + 1
data[new_date] = data.delete(date)
end
end
end
return data
end
def change_hash_keys_to_dates(hash)
hash.clone.each do |k,v|
date = Date.strptime(k, "%Y-%m-%d")
#Transforms the keys from strings to dates format
hash[date] = hash.delete(k)
end
return hash
end
def array_of_hash_keys(hash)
keys = hash.map do |date, price|
date
end
return keys
end
Because of these lines:
data[new_date] = data.delete(date)
You're modifying the original data object. If you don't want to do this, create a copy of the object:
data2 = data.clone
and then replace all other references to data with data2 in your method (including return data2).
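The key-conversion helper has the same pattern: calling clone only protects the iteration, while hash[date] = hash.delete(k) still writes to the argument itself. A non-mutating variant can build a new Hash instead. The sketch below is one way to do that; `keys_to_dates` and the sample prices are made up for illustration.

```ruby
require 'date'

# Hypothetical non-mutating version of change_hash_keys_to_dates:
# builds and returns a new Hash and leaves the argument untouched.
def keys_to_dates(hash)
  hash.each_with_object({}) do |(key, price), converted|
    converted[Date.strptime(key, '%Y-%m-%d')] = price
  end
end

prices = { '2015-01-02' => 52.7, '2015-01-05' => 50.1 }
converted = keys_to_dates(prices)

p prices.keys      # the original keys are still Strings
p converted.keys   # the new Hash has Date keys
```

Returning a fresh Hash means the caller's data and the instance variable are different objects, so later in-place fixes to one no longer show up in the other.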

Parse CSV Data with Ruby

I am trying to return a specific cell value based on two criteria.
The logic:
If ClientID = 1 and BranchID = 1, puts SurveyID
Using Ruby 1.9.3, I want to basically look through an excel file and for two specific values located within the ClientID and BranchID column, return the corresponding value in the SurveyID column.
This is what I have so far, which I found during my online searches. It seemed promising, but no luck:
require 'csv'
# Load file
csv_fname = 'FS_Email_Test.csv'
# Key is the column to check, value is what to match
search_criteria = { 'ClientID' => '1',
'BranchID' => '1' }
options = { :headers => :first_row,
:converters => [ :numeric ] }
# Save `matches` and a copy of the `headers`
matches = nil
headers = nil
# Iterate through the `csv` file and locate where
# data matches the options.
CSV.open( csv_fname, "r", options ) do |csv|
matches = csv.find_all do |row|
match = true
search_criteria.keys.each do |key|
match = match && ( row[key] == search_criteria[key] )
end
match
end
headers = csv.headers
end
# Once matches are found, we print the results
# for a specific row. The row `row[8]` is
# tied specifically to a notes field.
matches.each do |row|
row = row[1]
puts row
end
I know the last bit of code following matches.each do |row| is invalid, but I left it in in hopes that it will make sense to someone else.
How can I write puts surveyID if ClientID == 1 & BranchID == 1?
You were very close indeed. Your only error was setting the values of the search_criteria hash to the strings '1' instead of numbers. Since you have the :numeric converter in there, find_all was comparing 1 to '1' and getting false. You could just change that and you're done.
Alternatively this should work for you.
The key is the line
Hash[row].select { |k,v| search_criteria[k] } == search_criteria
Hash[row] converts the CSV row into a plain hash of header/value pairs. select then builds a new hash containing only the keys that appear in search_criteria. Comparing that hash with search_criteria tells us whether every criterion matches.
require 'csv'
# Load file
csv_fname = 'FS_Email_Test.csv'
# Key is the column to check, value is what to match
search_criteria = {
'ClientID' => 1,
'BranchID' => 1,
}
options = {
headers: :first_row,
converters: :numeric,
}
# Save `matches` and a copy of the `headers`
matches = nil
headers = nil
# Iterate through the `csv` file and locate where
# data matches the options.
CSV.open(csv_fname, 'r', options) do |csv|
matches = csv.find_all do |row|
Hash[row].select { |k,v| search_criteria[k] } == search_criteria
end
headers = csv.headers
end
p headers
# Once matches are found, print the SurveyID of each matching row.
# The header name is case-sensitive and must match the file exactly.
matches.each { |row| puts row['SurveyID'] }
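The hash comparison can be tried without the real file. The sketch below parses a made-up sample (the column values are invented; only the header names come from the question) and collects the matching SurveyID values.

```ruby
require 'csv'

# Hypothetical sample data standing in for FS_Email_Test.csv.
csv_text = <<~ROWS
  ClientID,BranchID,SurveyID
  1,1,1001
  1,2,1002
  2,1,1003
  1,1,1004
ROWS

search_criteria = { 'ClientID' => 1, 'BranchID' => 1 }

survey_ids = CSV.parse(csv_text, headers: true, converters: :numeric)
                .select { |row| row.to_h.select { |k, _| search_criteria.key?(k) } == search_criteria }
                .map { |row| row['SurveyID'] }

p survey_ids
```

Because the :numeric converter turns "1" into the Integer 1, the criteria must hold Integers as well; with string criteria the comparison would never succeed.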
Possibly...
require 'csv'
b_headers = false
client_id_col = 0
branch_id_col = 0
survey_id_col = 0
CSV.open('FS_Email_Test.csv') do |file|
file.find_all do |row|
if b_headers == false then
client_id_col = row.index("ClientID")
branch_id_col = row.index("BranchID")
survey_id_col = row.index("SurveyID")
b_headers = true
if branch_id_col.nil? || client_id_col.nil? || survey_id_col.nil? then
puts "Invalid csv file - Missing one of these columns (or no headers):\nClientID\nBranchID\nSurveyID"
break
end
else
puts row[survey_id_col] if row[branch_id_col] == "1" && row[client_id_col] == "1"
end
end
end
