Can anyone think of an elegant way to take a .csv file and multiply two columns together?
I want to calculate a person's body mass index (BMI) from their height and weight and store it in a third column.
The formula is:
weight/height^2
Should I read the .csv line by line or make an array of arrays?
Data would look like this.
ID,Forename,Surname,height,weight,
0,jack,smith,177,80,
1,dan,barker,178,82,
2,ben,allen,176,93,
3,ian,bell,175,76,
4,tim,hope,174,75,
5,john,smith,165,80,
Thanks
UPDATE:
So far I have two arrays of height and weight
require 'csv'
filename = 'bmi_test.csv'
height = []
weight = []
CSV.foreach(filename, :headers => true) do |row|
height << row[3].to_i
weight << row[4].to_i
end
...and now I have two arrays and I was trying to multiply index 0 from one array with index 0 from the other.
require 'csv'
CSV.open("output.csv", "wb", :headers => true) do |output|
CSV.foreach("input.csv", :headers => true, :return_headers => true) do |row|
if row.header_row?
output << (row << "bmi")
else
output << (row << row['weight'].to_f / (row['height'].to_f / 100) ** 2)
end
end
end
Or if you don't want to output a CSV, you just want the result in an array:
result = []
CSV.foreach("input.csv", :headers => true) do |row|
row['bmi'] = row['weight'].to_f / (row['height'].to_f / 100) ** 2
result << row
end
You should now have an array in which you can access result[0]['bmi'], etc.
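As a minimal sketch of the same idea without touching the file system, the rows can be parsed from a string and the new column assigned by header name. The inline sample below drops the trailing commas from the question's data, which is an assumption made here so that every field has a header.

```ruby
require 'csv'

# Inline sample in the question's layout, minus the trailing commas
# (an assumption for this sketch).
data = <<~ROWS
  ID,Forename,Surname,height,weight
  0,jack,smith,177,80
  1,dan,barker,178,82
ROWS

table = CSV.parse(data, headers: true)

table.each do |row|
  height_m = row['height'].to_f / 100          # centimetres to metres
  row['bmi'] = (row['weight'].to_f / height_m**2).round(2)
end

puts table.to_csv
```

Assigning with `row['bmi'] = ...` stores the header together with the value, so the field can later be read back by name.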
Here's one way to do it once you have your two arrays:
bmi = weight.each_with_index.map { |w, i| w.to_f / (height[i] / 100.0)**2 } # height is in cm
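A variant of the two-array approach, sketched with `zip` so that no index is needed. The sample values are the first three rows of the question's data, and the heights are assumed to be in centimetres.

```ruby
# Sample values copied from the question's data (height in cm, weight in kg).
height = [177, 178, 176]
weight = [80, 82, 93]

# zip pairs each weight with the height at the same position.
bmi = weight.zip(height).map do |w, h|
  (w / (h / 100.0)**2).round(2)
end

p bmi
```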
This could be done with Ruby, but for an elegant solution use awk:
awk -F, 'NR==1 { print $0 "bmi"; next } { printf "%s%.2f\n", $0, $5/($4/100)^2 }' file
Results:
ID,Forename,Surname,height,weight,bmi
0,jack,smith,177,80,25.54
1,dan,barker,178,82,25.88
2,ben,allen,176,93,30.02
3,ian,bell,175,76,24.82
4,tim,hope,174,75,24.77
5,john,smith,165,80,29.38
Related
I am trying to sum 2 matrices from a CSV file
Currently, I put them into two arrays and then transform the arrays into matrices. However, when I print them, I get concatenated strings, not summed integers.
require 'csv'
require 'matrix'
matrix1 = "./matrix1.csv"
matrix2 = "./matrix2.csv"
line_count = 0
elements_in_line_count = 0
arr1 = Array.new #=> []
arr2 = Array.new #=> []
CSV.foreach(matrix1) do |row|
arr1 << row
line_count += 1
elements_in_line_count = row.size
end
n1 = elements_in_line_count
m1 = line_count
# find n and m of second matrix
line_count = 0
elements_in_line_count = 0
CSV.foreach(matrix2) do |row|
# print row
arr2 << row
line_count += 1
elements_in_line_count = row.size
end
puts Matrix.rows(arr1) + Matrix.rows(arr2)
For example, CSV 1 is:
1,2
3,4
Same for CSV 2.
The output is
Matrix[[11, 22], [33, 44]]
But I want it to be [2,4],[6,8]
When you read in the CSV, all the fields are read as strings by default. The Ruby CSV class accepts an optional :converters parameter (on foreach, new, and similar methods) that it uses to convert each applicable field. One of the converters it can take is
:integer
Converts any field Integer() accepts.
So you can also change your code to look like:
csv_options = { converters: [:integer] }
CSV.foreach(matrix1, **csv_options) do |row|
# ...
CSV.foreach(matrix2, **csv_options) do |row|
to achieve results similar to calling map(&:to_i) on each row.
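For illustration, here is a small self-contained sketch of the converter in action. The two strings stand in for the contents of matrix1.csv and matrix2.csv from the question, so nothing is read from disk.

```ruby
require 'csv'
require 'matrix'

# Inline stand-ins for the contents of matrix1.csv and matrix2.csv.
text1 = "1,2\n3,4\n"
text2 = "1,2\n3,4\n"

# converters: [:integer] turns every field that Integer() accepts into an Integer.
rows1 = CSV.parse(text1, converters: [:integer])
rows2 = CSV.parse(text2, converters: [:integer])

sum = Matrix.rows(rows1) + Matrix.rows(rows2)
puts sum   # prints Matrix[[2, 4], [6, 8]]
```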
[matrix1, matrix2].map do |path|
  CSV.foreach(path).map { |row| row.map(&:to_i) }
end.reduce do |a, b|
  a.map.with_index do |row, idx|
    row.zip(b[idx]).map { |cell1, cell2| cell1 + cell2 }
  end
end
When you're reading in the CSV, all columns will be strings, so you'll have to manually do the conversion to a number in the code.
If all of the columns of the CSV are intended to be numbers, you can add .map(&:to_i) to the row line. Like this:
CSV.foreach(matrix1) do |row|
arr1 << row.map(&:to_i) # <-- this will turn everything in the row into a number
line_count += 1
elements_in_line_count = row.size
end
As you want to add matrices, consider using Ruby's built-in Matrix class, and the instance method Matrix#+ in particular.
Let's first construct three CSV files.
fname1 = 't1.csv'
fname2 = 't2.csv'
fname3 = 't3.csv'
File.write(fname1, "1,2\n3,4")
#=> 7
File.write(fname2, "100,200\n300,400")
#=> 15
File.write(fname3, "1000,2000\n3000,4000")
#=> 19
We can sum the underlying matrices as follows.
require 'csv'
require 'matrix'
# Define the helper before it is called.
def matrix_from_CSV(fname)
  Matrix[*CSV.read(fname, converters: [:integer])]
end
fnames = [fname1, fname2, fname3]
fnames.drop(1).reduce(matrix_from_CSV(fnames.first)) do |t,fname|
  t + matrix_from_CSV(fname)
end.to_a
#=> [[1101, 2202],
# [3303, 4404]]
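The same sum can probably be written a little more compactly by mapping every file to a matrix first and then reducing with `+`. This is only a sketch: the helper name `matrix_from_csv` and the temporary directory are choices made here so the example runs on its own.

```ruby
require 'csv'
require 'matrix'
require 'tmpdir'

# Hypothetical helper (lower-case name chosen for this sketch).
def matrix_from_csv(fname)
  Matrix.rows(CSV.read(fname, converters: [:integer]))
end

total = Dir.mktmpdir do |dir|
  contents = ["1,2\n3,4", "100,200\n300,400", "1000,2000\n3000,4000"]
  # Write the three sample files into the temporary directory.
  fnames = contents.each_with_index.map do |text, i|
    path = File.join(dir, "t#{i + 1}.csv")
    File.write(path, text)
    path
  end
  fnames.map { |fname| matrix_from_csv(fname) }.reduce(:+)
end

p total.to_a   # => [[1101, 2202], [3303, 4404]]
```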
I borrowed converters: [:integer] from @Simple's answer. I wasn't aware of that.
I've got a list of persons saved in an array, and I want to loop over a file of organizations looking for matches and save them, but it keeps going wrong. I think I'm doing something wrong with the arrays.
This is exactly what I'm doing:
I have a list of persons in a file called 'personen_fixed.csv'.
I save that list into an array.
I have another file that also has the name of the people ("pers2"), but also three other interesting columns of data. I save the four columns into arrays.
I want to loop over the first array (the persons) and search for matches with the list of persons ("pers2").
If there is a match I want to save that row.
What I'm getting now is two rows of data, one of which is filled with ALL persons. See my code below. At the bottom I have some sample input data.
require 'csv'
array_pers1 = []
array_pers2 = []
array_orgaan = []
array_functie = []
array_rol = []
filename_1 = 'personen_fixed.csv'
CSV.foreach(filename_1, :col_sep => ";", :encoding => "windows-1251:utf-8", :return_headers => false) do |row|
array_pers1 << row[0].to_s
end
filename_2 = 'Functies_fixed.csv'
CSV.foreach(filename_2, :col_sep => ";", :encoding => "windows-1251:utf-8", :return_headers => false) do |row|
array_pers2 << row[1].to_s
array_orgaan << row[16].to_s
array_functie << row[17].to_s
array_rol << row[18].to_s
end
CSV.open("testrij.csv", "w") do |row|
row << ["rijnummer","link","ptext","soort_woonhuis"]
for rij in array_pers1
for x in 1...4426 do
if rij === array_pers2["#{x}".to_f]
pers2 = array_pers2["#{x}".to_f]
orgaan = array_orgaan["#{x}".to_f]
functie = array_functie["#{x}".to_f]
rol = array_rol["#{x}".to_f]
row << [pers2,orgaan,functie,rol]
else
pers2 = ""
orgaan = ""
functie = ""
rol = ""
end
end
end
end
Input data for the first Excel file (column name and first row of data):
person
someonesname
Input data for the second excel file:
person,organizationid,role,organization,function
someonesname,34971,member,americanairways,boardofdirectors
Since many of the people in the dataset have multiple jobs at different organizations, I want to save all of them next to each other (the output I'm going for):
person,organization(1),function(1),role(1),organization(2),function(2),role(2) (max 5)
I don't understand the purpose of storing a single row from your Functies csv file in 4 separate arrays, and then combining them together later, so my answer doesn't tell you why your approach isn't working. Instead, I suggest a different approach that I believe is cleaner.
Building an array of names from the first file is ok. For the second file, I would store each row as an array and use a hash:
data = {
"name1" => ["name1", "orgaan1", "functie1", "rol1"],
"name2" => ["name2", "orgaan2", "functie2", "rol2"],
...
}
Building it might look like
data = {}
CSV.foreach(filename_2, :col_sep => ";", :encoding => "windows-1251:utf-8", :return_headers => false) do |row|
name = row[1]
orgaan = row[16]
functie = row[17]
rol = row[18]
data[name] = [name, orgaan, functie, rol]
end
Then you would iterate over your first array and keep all the arrays that match
results = []
for name in array_pers1
results << data[name] if data.include?(name)
end
On the other hand, if you don't want to use a hash and insist on using arrays (perhaps because names are not unique), I would still store them like
data = [
["name1", "orgaan1", "functie1", "rol1"],
["name2", "orgaan2", "functie2", "rol2"]
]
And then during your search step you would just iterate like
results = []
for name in array_pers1
for row in data
results << row if row[0] == name
end
end
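Since the question asks for up to five jobs per person side by side, one possible sketch is to group the rows by name and then flatten each person's jobs into a single wide row. The sample rows below are hypothetical (the second organization is invented for illustration), and the column order [name, orgaan, functie, rol] follows the arrays above.

```ruby
# Hypothetical rows in the order [name, orgaan, functie, rol].
data = [
  ["someonesname", "americanairways", "boardofdirectors", "member"],
  ["someonesname", "exampleorg", "advisor", "chair"],
  ["othername", "exampleorg", "treasurer", "member"]
]
array_pers1 = ["someonesname"]

# Group job rows by person so that repeated names keep all of their rows.
jobs_by_name = data.group_by { |row| row[0] }

results = array_pers1.map do |name|
  jobs = jobs_by_name.fetch(name, []).first(5)   # at most five jobs per person
  [name] + jobs.flat_map { |row| row.drop(1) }   # drop the repeated name
end

p results
```

The hash lookup avoids rescanning the whole second file for every person, which the nested loops would do.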
I want to open a text file with three lines
3 televisions at 722.49
1 carton of eggs at 14.99
2 pairs of shoes at 34.85
and turn it into this:
hash = {
"1"=>{:item=>"televisions", :price=>722.49, :quantity=>3},
"2"=>{:item=>"carton of eggs", :price=>14.99, :quantity=>1},
"3"=>{:item=>"pair of shoes", :price=>34.85, :quantity=>2}
}
I'm very stuck and not sure how to go about doing this. Here's what I have so far:
f = File.open("order.txt", "r")
lines = f.readlines
h = {}
n = 1
while n < lines.size
lines.each do |line|
h["#{n}"] = {:quantity => line[line =~ /^[0-9]/]}
n+=1
end
end
No reason for anything this simple to look ugly!
h = {}
lines.each_with_index do |line, i|
quantity, item, price = line.match(/^(\d+) (.*) at (\d+\.\d+)$/).captures
h[i+1] = {quantity: quantity.to_i, item: item, price: price.to_f}
end
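A slightly more defensive sketch of the regex approach: it skips lines that do not fit the expected shape instead of raising, and it uses string keys as in the hash the question asks for. The names `LINE_PATTERN` and `parse_order` are made up for this example.

```ruby
# Hypothetical pattern: "<quantity> <item> at <price>", optional trailing newline.
LINE_PATTERN = /\A(\d+) (.+) at (\d+(?:\.\d+)?)\s*\z/

def parse_order(lines)
  lines.each_with_index.with_object({}) do |(line, i), h|
    m = LINE_PATTERN.match(line)
    next unless m   # skip lines that do not fit the pattern
    h[(i + 1).to_s] = { item: m[2], price: m[3].to_f, quantity: m[1].to_i }
  end
end

lines = ["3 televisions at 722.49\n", "not an order line\n", "2 pairs of shoes at 34.85\n"]
p parse_order(lines)
```

Keys are based on the line number, so a skipped line leaves a gap in the numbering.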
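h = {}   # declared outside the block so it is still available afterwards
File.open("order.txt", "r") do |f|
  n = 0
  f.each_line do |line|
    n += 1
    line =~ /(\d+) (.*) at (\d*\.\d*)/   # \d+ so multi-digit quantities match
    h[n.to_s] = { :quantity => $1.to_i, :item => $2, :price => $3.to_f }
  end
end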
hash = File.readlines('/path/to/your/file.txt').each_with_index.with_object({}) do |(line, idx), h|
/(?<quantity>\d+)\s(?<item>.*)\sat\s(?<price>\d+(?:\.\d+)?)$/ =~ line
h[(idx + 1).to_s] = {:item => item, :price => price.to_f, :quantity => quantity.to_i}
end
I don't know Ruby, so feel free to ignore my answer as I'm just making assumptions based on documentation, but I figured I'd provide a non-regex solution since a regex seems like overkill in a case like this.
I'd assume you can just use line.split(" ") and assign position [0] to quantity, position [-1] to price, and then assign item to [1..-3].join(" ")
Per the first Ruby console I could find:
test = "3 televisions at 722.49"
foo = test.split(" ")
hash = {1=>{:item=>foo[1..-3].join(" "),:quantity=>foo[0], :price=>foo[-1]}}
=> {1=>{:item=>"televisions", :quantity=>"3", :price=>"722.49"}}
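Extending the split idea to all three lines, a rough sketch might also convert the quantity and price to numbers and use string keys as in the question. The lines are given inline here rather than read from order.txt.

```ruby
lines = [
  "3 televisions at 722.49",
  "1 carton of eggs at 14.99",
  "2 pairs of shoes at 34.85"
]

hash = {}
lines.each_with_index do |line, i|
  words = line.split(" ")
  hash[(i + 1).to_s] = {
    item: words[1..-3].join(" "),   # everything between the quantity and "at"
    price: Float(words[-1]),
    quantity: Integer(words[0])
  }
end

p hash
```

`Float()` and `Integer()` raise on malformed input, which may be preferable to `to_f` and `to_i` silently returning 0.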
I want to load a CSV-file with two columns (each with a name and a row of numbers) and save only the numbers of the two columns in two different arrays.
Then I want to make some calculations with the data in those two columns, using two arrays to save the numbers of each column.
This is what I have so far:
require 'csv'
filename = 'file.csv'
csv_data = CSV.read(filename, :col_sep => ";")
csv_data.shift
csv_data.each_with_index { |column, index_c|
average = 0
column.each_with_index{ |element, index_e|
csv_data[index_c][index_e] = element.to_i
}
}
csv_data = csv_data.transpose
How can I split the columns of csv_data into two arrays?
This should create your two column arrays without first reading the whole file into csv_data.
require 'csv'
filename = 'file.csv'
arr1 = []
arr2 = []
CSV.foreach(filename, :col_sep => ";", :headers => true) do |row| # :headers => true skips the name row
arr1 << row[0].to_i
arr2 << row[1].to_i
end
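Another way to get the two columns, sketched here on an inline string, is to parse with headers and read each column from the resulting table. The column names and values below are hypothetical.

```ruby
require 'csv'

# Hypothetical two-column data with a header row and ";" as the separator.
data = "a;b\n1;10\n2;20\n3;30\n"

table = CSV.parse(data, col_sep: ";", headers: true, converters: [:integer])

# by_col switches the table to column access, so an index selects a column.
arr1 = table.by_col[0]
arr2 = table.by_col[1]

avg1 = arr1.sum / arr1.size.to_f   # example calculation on one column

p arr1, arr2, avg1
```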
I have an array which I read from Excel (using ParseExcel) with the following code:
workbook = Spreadsheet::ParseExcel.parse("test.xls")
rows = workbook.worksheet(1).map() { |r| r }.compact
grid = rows.map() { |r| r.map() { |c| c.to_s('latin1') unless c.nil?}.compact rescue nil }
grid.sort_by { |k| k[2]}
test.xls has lots of rows and 6 columns. The code above sorts by column 3.
I would like to output the rows in the array "grid" to multiple text files like this:
- After sorting, I want to print all the rows where column 3 has the same value into one file, with a separate file for each distinct value in column 3.
I hope I have explained this clearly. Thanks for any help/tips.
P.S. I searched through most of the postings on this site but could not find a solution.
Instead of using your code above, I made a test array of 100 rows, each row being a 6-element array.
You pass in the array and the column number you want matched, and this method prints rows that have the same nth element into separate files.
Since I used integers, I used the nth element of each row as the filename. You could use a counter, or the md5 of the element, or something like that, if your nth element does not make a good filename. The output directory must already exist.
a = []
100.times do
b = []
6.times do
b.push rand(10)
end
a.push(b)
end
def print_files(a, column)
h = Hash.new
a.each do |element|
(h[element[column]] ||= []) << element
end
h.each do |k, v|
File.open("output/" + k.to_s, 'w') do |f|
v.each do |line|
f.puts line.join(", ")
end
end
end
end
print_files(a, 2)
Here is the same code using braces instead of do .. end:
a = Array.new
100.times{b = Array.new;6.times{b.push rand(10)};a.push(b)}
def print_files(a, column)
h = Hash.new
a.each{|element| (h[element[column]] ||= []) << element}
h.map{|k, v| File.open("output/" + k.to_s, 'w'){|f| v.map{|line| f.puts line.join(", ")}}}
end
print_files(a, 2)
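The grouping step can also be sketched with `group_by`, which builds the hash of rows per key in one call. In this version the target directory is passed in as a third argument and created if missing; both of those are choices made for this example rather than part of the code above.

```ruby
require 'fileutils'
require 'tmpdir'

# Writes one file per distinct value in the given column and returns the groups.
def print_files(rows, column, dir)
  FileUtils.mkdir_p(dir)
  groups = rows.group_by { |row| row[column] }
  groups.each do |key, members|
    File.open(File.join(dir, key.to_s), 'w') do |f|
      members.each { |row| f.puts row.join(", ") }
    end
  end
  groups
end

rows = [[1, 2, "x"], [3, 4, "y"], [5, 6, "x"]]
Dir.mktmpdir do |tmp|
  print_files(rows, 2, File.join(tmp, "output"))
  puts File.read(File.join(tmp, "output", "x"))
end
```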