Ruby and CSV - How to get a column value after filtering rows - ruby

I have parsed a CSV file with headers of ID, Name, and Address. I need to find the row(s) in the data that have a particular ID and Name. Once I find that row, I need to access the Address value.
data = CSV.parse(my_csv, headers: true)
rows = data.select { |row| row['ID'] == someId } # ids are not unique
row = rows.select { |row| row['Name'] == name } # names are unique
How do I get the Address value in this row of data? row[2] doesn't work.

I need to find the row(s) in the data that have a particular ID and Name.
You can combine as many criteria as you wish in a single select():
rows = data.select {|row| row['ID'] == someId && row['Name'] == name}
In fact, you can make this boolean expression as simple or as complex as you need; it is up to you to work out the exact logic that gives the results you want.
I need to access the Address value.
To get the 'Address' column of each row selected, use map():
addresses = rows.map {|row| row['Address']}
To get the 'Address' column of the first row selected:
rows[0]['Address']
Be careful with this last one, though: if rows is empty, rows[0] is nil and rows[0]['Address'] raises a NoMethodError.
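A minimal runnable sketch of the select/map approach above. The inline CSV string is assumed sample data mirroring the question's ID, Name and Address headers; the squiggly heredoc and the &. operator need Ruby 2.3 or later:

```ruby
require 'csv'

# Assumed sample data with the question's headers.
my_csv = <<~CSV_DATA
  ID,Name,Address
  1,cat,Mars
  2,dog,Jupyter
  1,dog,Venus
CSV_DATA

data = CSV.parse(my_csv, headers: true)

# Both criteria in a single select; cells are Strings because no converters are used.
rows = data.select { |row| row['ID'] == '1' && row['Name'] == 'dog' }

# 'Address' of every matching row.
addresses = rows.map { |row| row['Address'] }

# 'Address' of the first match; &. yields nil instead of raising when rows is empty.
first_address = rows.first&.[]('Address')

no_rows = data.select { |row| row['ID'] == '9' }
missing = no_rows.first&.[]('Address')

p addresses
p first_address
p missing
```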

Let's first create some test data.
my_csv = <<~_
ID,Name,Address
1,cat,Mars
2,dog,Jupyter
1,dog,Venus
_
#=> "ID,Name,Address\n1,cat,Mars\n2,dog,Jupyter\n1,dog,Venus\n"
puts my_csv
ID,Name,Address
1,cat,Mars
2,dog,Jupyter
1,dog,Venus
As the values of 'Name' are guaranteed to be unique, we can use Enumerable#find, which stops iterating as soon as the condition in its block evaluates to true.
require 'csv'
def find_addr(csv_str, id, name)
  row = CSV.parse(csv_str, headers: true).find do |r|
    r['ID'] == id && r['Name'] == name
  end
  row.nil? ? nil : row['Address']
end
find_addr(my_csv, "1", "cat") #=> "Mars"
find_addr(my_csv, "1", "dog") #=> "Venus"
find_addr(my_csv, "2", "dog") #=> "Jupyter"
find_addr(my_csv, "2", "pig") #=> nil
Note that row is assigned nil if no row satisfies the block condition.
We could shorten this with the Safe Navigation Operator &. (introduced in Ruby 2.3):
def find_addr(csv_str, id, name)
  CSV.parse(csv_str, headers: true).find do |row|
    row['ID'] == id && row['Name'] == name
  end&.[]('Address')
end
find_addr(my_csv, "1", "cat") #=> "Mars"
find_addr(my_csv, "1", "dog") #=> "Venus"
find_addr(my_csv, "2", "dog") #=> "Jupyter"
find_addr(my_csv, "2", "pig") #=> nil
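Note that with the Safe Navigation Operator we cannot use the syntactic-sugar form of a method call. Here
...end&['Address']
would be parsed as a call to the binary & operator with the array ['Address'] as its argument, not as a safe-navigation call to [], so it raises a NoMethodError when a row is found (and returns false rather than nil when none is).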
If find_addr returning nil signals a problem with the data, we could either handle that case in code or remove the Safe Navigation Operator from the second method above, which would cause a NoMethodError to be raised if no row satisfies the required condition.
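One way to handle the not-found case in code is to raise a descriptive error instead of returning nil. This is only a sketch: the method name find_addr! and the choice of KeyError are assumptions, not part of the original answer:

```ruby
require 'csv'

my_csv = <<~CSV_DATA
  ID,Name,Address
  1,cat,Mars
  2,dog,Jupyter
  1,dog,Venus
CSV_DATA

# Hypothetical variant of find_addr: raises instead of returning nil.
def find_addr!(csv_str, id, name)
  row = CSV.parse(csv_str, headers: true).find do |r|
    r['ID'] == id && r['Name'] == name
  end
  raise KeyError, "no row with ID=#{id.inspect} and Name=#{name.inspect}" if row.nil?

  row['Address']
end

puts find_addr!(my_csv, "1", "cat") # prints Mars

begin
  find_addr!(my_csv, "2", "pig")
rescue KeyError => e
  puts e.message # prints: no row with ID="2" and Name="pig"
end
```

Raising at the lookup site keeps the error message close to the data problem, rather than surfacing later as a NoMethodError on nil.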

Related

Transforming hash values based on condition after using group_by

I have an array of pipes, each with the following attributes: pipe_id, grade and grade_confidence.
I am looking to find objects within an array that have the same attributes as other objects with the same ID. I have been using group_by and transform_values to find the IDs that have only one grade - that works fine (thanks to answers in Using group_by for only certain attributes). However I would still like to keep the grade_confidence in the final result if possible.
class Pipes
  attr_accessor :pipe_id, :grade, :grade_confidence
  def initialize(pipe_id, grade, grade_confidence)
    @pipe_id = pipe_id
    @grade = grade
    @grade_confidence = grade_confidence
  end
end
pipe1 = Pipes.new("10001", "60", "A")
pipe2 = Pipes.new("10001", "60", "A")
pipe3 = Pipes.new("10001", "3", "F")
pipe4 = Pipes.new("1005", "15", "A")
pipe5 = Pipes.new("1004", "40", "A")
pipe6 = Pipes.new("1004", "40", "B")
pipes = []
pipes.push(pipe1, pipe2, pipe3, pipe4, pipe5, pipe6)
# We now have our array of pipe objects.
non_dups = pipes.group_by(&:pipe_id).transform_values { |a| a.map(&:grade).uniq }.select { |k,v| v.size == 1 }
puts non_dups
# => {"1005"=>["15"], "1004"=>["40"]}
Desired
The above does what I want - as "10001" has two different grades, it is ignored, and "1004" and "1005" have the same grades per ID. But what I would like is to keep the grade_confidence too, or include grade_confidence based on a condition also.
E.g. if grade_confidence == "B", the final result would be # => {"1004"=>["40", "B"]}
or
if grade_confidence == "A", the final result would be # => {"1005"=>["15", "A"], "1004"=>["40", "A"]}
Is it possible to tweak the transform_values to allow this or would I need to go another route?
Thanks
You need to carry the confidences through the chain as well:
non_dups = pipes
  .group_by(&:pipe_id)
  .transform_values { |a| [a.map(&:grade).uniq, a.map(&:grade_confidence)] }
  .select { |_, (grades, _confidences)| grades.size == 1 }
  .transform_values { |grades, confidences| [grades.first, confidences.sort.first] }
# => {"1005"=>["15", "A"], "1004"=>["40", "A"]} (the alphabetically first confidence per ID)
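If the goal is instead to include the confidence based on a condition, as in the question's "A"/"B" examples, one possible reading is: keep the IDs whose grades all agree, then keep only those where at least one pipe has the requested confidence. The helper single_grade_with_confidence below is hypothetical and that interpretation is an assumption; transform_values needs Ruby 2.4 or later:

```ruby
# Stand-in for the question's Pipes class, using @ivars so the readers work.
class Pipes
  attr_accessor :pipe_id, :grade, :grade_confidence

  def initialize(pipe_id, grade, grade_confidence)
    @pipe_id = pipe_id
    @grade = grade
    @grade_confidence = grade_confidence
  end
end

# Hypothetical helper: IDs with a single grade that also have at least one pipe
# with the requested confidence, mapped to [grade, confidence].
def single_grade_with_confidence(pipes, confidence)
  pipes
    .group_by(&:pipe_id)
    .select { |_id, group| group.map(&:grade).uniq.size == 1 }
    .select { |_id, group| group.any? { |pipe| pipe.grade_confidence == confidence } }
    .transform_values { |group| [group.first.grade, confidence] }
end

pipes = [
  Pipes.new("10001", "60", "A"),
  Pipes.new("10001", "60", "A"),
  Pipes.new("10001", "3", "F"),
  Pipes.new("1005", "15", "A"),
  Pipes.new("1004", "40", "A"),
  Pipes.new("1004", "40", "B")
]

p single_grade_with_confidence(pipes, "B")
p single_grade_with_confidence(pipes, "A")
```

For the question's sample pipes this reproduces both desired results: only "1004" for "B", and "1005" plus "1004" for "A".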

Convert List of Objects to Hash

In Ruby, I have a list of objects called Things with an Id property and a value property.
I want to make a Hash that contains Id as the key and Value as the value for the corresponding key.
I tried:
result = Hash[things.map { |t| t.id, t.value }]
where things is a list of Thing
But this did not work.
class Thing
  attr_reader :id, :value
  def initialize(id, value)
    @id = id
    @value = value
  end
end
cat = Thing.new("cat", 9)
#=> #<Thing:0x007fb86411ad90 @id="cat", @value=9>
dog = Thing.new("dog", 1)
#=> #<Thing:0x007fb8650e49b0 @id="dog", @value=1>
instances = [cat, dog]
#=> [#<Thing:0x007fb86411ad90 @id="cat", @value=9>,
#    #<Thing:0x007fb8650e49b0 @id="dog", @value=1>]
instances.map { |i| [i.id, i.value] }.to_h
#=> {"cat"=>9, "dog"=>1}
or, for Ruby versions prior to 2.1 (which added Array#to_h):
Hash[instances.map { |i| [i.id, i.value] }]
#=> {"cat"=>9, "dog"=>1}
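result = things.map { |t| { t.id => t.value } }
The content of the outer pair of curly brackets is a block; the inner pair forms a hash. Note that this returns an array of single-pair hashes (e.g. [{"cat"=>9}, {"dog"=>1}]), not one hash.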
However, if one hash is the desired result (as suggested by Cary Swoveland) this may work:
result = things.each_with_object({}){| t, h | h[t.id] = t.value}
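For comparison, a sketch of two more ways to end up with a single hash. Enumerable#to_h with a block needs Ruby 2.6 or later; reduce({}, :merge) is shown only to collapse the array of one-pair hashes produced by the map version:

```ruby
class Thing
  attr_reader :id, :value

  def initialize(id, value)
    @id = id
    @value = value
  end
end

things = [Thing.new("cat", 9), Thing.new("dog", 1)]

# Ruby 2.6+: the block returns a [key, value] pair for each element.
by_to_h = things.to_h { |t| [t.id, t.value] }

# Works on older Rubies as well: fill one hash incrementally.
by_each = things.each_with_object({}) { |t, h| h[t.id] = t.value }

# map with an inner hash literal gives an array of one-pair hashes...
array_of_hashes = things.map { |t| { t.id => t.value } }
# ...which can be merged into one hash afterwards.
merged = array_of_hashes.reduce({}, :merge)

p by_to_h
p merged
```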

Ruby pop an element from a hash table?

I am looking at http://ruby-doc.org/core-1.9.3/Hash.html and there does not appear to be a pop method? I think I am missing something though...
if (x = d['a']) != nil
  d.delete('a')
end
If you know the key, just use delete directly.
If the hash doesn't contain the key, you will get nil back; otherwise you will get whatever was stored there.
From the doc you linked to:
h = { "a" => 100, "b" => 200 }
h.delete("a") #=> 100
h.delete("z") #=> nil
h.delete("z") { |el| "#{el} not found" } #=> "z not found"
There is also shift, which deletes and returns the first (oldest) key-value pair:
hsh = Hash.new
hsh['bb'] = 42
hsh['aa'] = 23
hsh['cc'] = 65
p hsh.shift
#=> ["bb", 42]
As can be seen, the order of a hash is its insertion order, not the order of its keys or values. From the doc:
Hashes enumerate their values in the order that the corresponding keys were inserted.
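Ruby has no Hash#pop, but because hashes keep insertion order, a pop-like helper that removes the most recently inserted pair can be sketched on top of delete. hash_pop is a hypothetical name for illustration, not a core method:

```ruby
# Hypothetical helper: remove and return the newest [key, value] pair,
# mirroring Array#pop; returns nil for an empty hash.
def hash_pop(hash)
  return nil if hash.empty?

  key = hash.keys.last
  [key, hash.delete(key)]
end

h = { "bb" => 42, "aa" => 23, "cc" => 65 }
first = h.shift     # oldest pair: ["bb", 42]
last  = hash_pop(h) # newest pair: ["cc", 65]
p h                 # only "aa" remains
```

hash.keys.last builds a temporary array of all keys, which is fine for a sketch but costs O(n) per call.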

Comparing values of one hash to many hashes to get inverse document frequency in ruby

I'm trying to find the inverse document frequency for a categorization algorithm and am having trouble getting it the way that my code is structured (with nested hashes), and generally comparing one hash to many hashes.
My training code looks like this so far:
def train!
  @data = {}
  @all_books.each do |category, books|
    @data[category] = {
      words: 0,
      books: 0,
      freq: Hash.new(0)
    }
    books.each do |filename, tokens|
      @data[category][:words] += tokens.count
      @data[category][:books] += 1
      tokens.each do |token|
        @data[category][:freq][token] += 1
      end
    end
    @data[category][:freq].map { |k, v| v = (v / @data[category][:freq].values.max) }
  end
end
Basically, I have a hash with 4 categories (subject to change), and for each have word count, book count, and a frequency hash which shows term frequency for the category. How do I get the frequency of individual words from one category compared against the frequency of the words shown in all categories? I know how to do the comparison for one set of hash keys against another, but am not sure how to loop through a nested hash to get the frequency of terms against all other terms, if that makes sense.
Edit to include predicted outcome -
I'd like to return a hash of nested hashes (one for each category) that shows the word as the key, and the number of other categories in which it appears as the value, e.g. {:category1 => {:word => 3, :other => 2, :third => 1}, :category2 => {:another => 1, ...}}. Alternatively, an array of category names as the value, instead of the number of categories, would also work.
I've tried creating a new hash as follows, but it's turning up empty:
def train!
  @data = {}
  @all_words = Hash.new([]) # new hash for all words, default value is empty array
  @all_books.each do |category, books|
    @data[category] = {
      words: 0,
      books: 0,
      freq: Hash.new(0)
    }
    books.each do |filename, tokens|
      @data[category][:words] += tokens.count
      @data[category][:books] += 1
      tokens.each do |token|
        @data[category][:freq][token] += 1
        @all_words[token] << category # should insert category name if the word appears, right?
      end
    end
    @data[category][:freq].map { |k, v| v = (v / @data[category][:freq].values.max) }
  end
end
If someone can help me figure out why the @all_words hash is empty when the code is run, I may be able to get the rest.
I haven't gone through it all, but you certainly have an error:
@all_words[token] << category # should insert category name if the word appears, right?
Nope. @all_words[token] returns the hash's default value, but it does not create a new slot for token as you're assuming. Because Hash.new([]) hands back the same array object for every missing key, << appends every category to that one shared default array, and the @all_words hash itself stays empty.
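A small demonstration of that behaviour, with made-up words and categories. The block form of Hash.new at the end is shown as an alternative to the lazy-init fix, not as part of the original answer:

```ruby
# Hash.new(obj) returns the SAME default object for every missing key,
# but never stores that object in the hash.
words = Hash.new([])
words["apple"] << :fruit
words["carrot"] << :vegetable

p words.empty?      # the hash itself still has no entries
p words.default     # the single shared default array received both appends
p words["anything"] # any missing key returns that same mutated array

# The block form stores a fresh array under each new key, so appends stick.
fixed = Hash.new { |hash, key| hash[key] = [] }
fixed["apple"] << :fruit
fixed["carrot"] << :vegetable
p fixed
```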
Try these 2 changes and see if it helps:
@all_words = {} # ditch the default value
...
(@all_words[token] ||= []) << category # lazy-init the array, and append
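Going one step further toward the nested hash described in the question, here is a sketch that records, for each category, how many categories each of its words appears in. The @all_books shape ({ category => { filename => tokens } }) is inferred from the question's loops, the sample data is invented, and plain local variables stand in for the instance variables:

```ruby
# Invented sample data in the shape the question's loops imply:
# { category => { filename => [token, ...] } }
all_books = {
  fiction: { "a.txt" => %w[dragon castle sword], "b.txt" => %w[castle love] },
  history: { "c.txt" => %w[castle war], "d.txt" => %w[war treaty] },
  cooking: { "e.txt" => %w[sword knife], "f.txt" => %w[knife butter] }
}

# token => categories it appears in (lazy-init, as in the answer above)
all_words = {}
all_books.each do |category, books|
  books.each_value do |tokens|
    tokens.each do |token|
      (all_words[token] ||= []) << category
    end
  end
end
all_words.each_value(&:uniq!) # count each category only once per word

# category => { word => number of categories containing that word }
category_counts = all_books.transform_values do |books|
  books.values.flatten.uniq.each_with_object({}) do |word, counts|
    counts[word] = all_words[word].size
  end
end

p category_counts[:fiction]
```

These counts include the word's own category; subtract 1 if only the other categories should be counted. For an inverse document frequency, a common formulation is Math.log(number_of_categories.to_f / count) for each word, e.g. Math.log(all_books.size.to_f / category_counts[:fiction]["castle"]).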

Parse CSV Data with Ruby

I am trying to return a specific cell value based on two criteria.
The logic:
If ClientID = 1 and BranchID = 1, puts SurveyID
Using Ruby 1.9.3, I basically want to look through an Excel file and, for two specific values in the ClientID and BranchID columns, return the corresponding value in the SurveyID column.
This is what I have so far, which I found during my online searches. It seemed promising, but no luck:
require 'csv'
# Load file
csv_fname = 'FS_Email_Test.csv'
# Key is the column to check, value is what to match
search_criteria = { 'ClientID' => '1',
                    'BranchID' => '1' }
options = { :headers    => :first_row,
            :converters => [ :numeric ] }
# Save `matches` and a copy of the `headers`
matches = nil
headers = nil
# Iterate through the `csv` file and locate where
# data matches the options.
CSV.open( csv_fname, "r", options ) do |csv|
  matches = csv.find_all do |row|
    match = true
    search_criteria.keys.each do |key|
      match = match && ( row[key] == search_criteria[key] )
    end
    match
  end
  headers = csv.headers
end
# Once matches are found, we print the results
# for a specific row. The row `row[8]` is
# tied specifically to a notes field.
matches.each do |row|
  row = row[1]
  puts row
end
I know the last bit of code following matches.each do |row| is invalid, but I left it in in hopes that it will make sense to someone else.
How can I write puts surveyID if ClientID == 1 & BranchID == 1?
You were very close indeed. Your only error was setting the values of the search_criteria hash to the strings '1' instead of numbers. Since you have converters: :numeric, the parsed cells are Integers, so find_all was comparing 1 to '1' and getting false. You could just change that and you're done.
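A self-contained sketch of that mismatch, using a small inline CSV string as a hypothetical stand-in for FS_Email_Test.csv; the survey_ids helper and the all?-based matching are illustrative choices, not the original code:

```ruby
require 'csv'

# Hypothetical stand-in for FS_Email_Test.csv.
csv_text = <<~CSV_DATA
  ClientID,BranchID,SurveyID
  1,1,501
  1,2,502
  2,1,503
CSV_DATA

# converters: :numeric turns numeric-looking cells into Integers.
table = CSV.parse(csv_text, headers: true, converters: :numeric)

string_criteria = { 'ClientID' => '1', 'BranchID' => '1' }
int_criteria    = { 'ClientID' => 1,   'BranchID' => 1 }

# Hypothetical helper: SurveyID of every row matching all criteria.
def survey_ids(table, criteria)
  table.select { |row| criteria.all? { |col, val| row[col] == val } }
       .map { |row| row['SurveyID'] }
end

p survey_ids(table, string_criteria) # no matches, because 1 != '1'
p survey_ids(table, int_criteria)    # the SurveyID of the first data row
```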
Alternatively this should work for you.
The key is the line
Hash[row].select { |k,v| search_criteria[k] } == search_criteria
Hash[row] converts the row into a hash instead of an array of arrays. select then builds a new hash containing only the entries whose keys appear in search_criteria. Then just compare the two hashes to see if they're the same.
require 'csv'
# Load file
csv_fname = 'FS_Email_Test.csv'
# Key is the column to check, value is what to match
search_criteria = {
  'ClientID' => 1,
  'BranchID' => 1,
}
options = {
  headers: :first_row,
  converters: :numeric,
}
# Save `matches` and a copy of the `headers`
matches = nil
headers = nil
# Iterate through the `csv` file and locate where
# data matches the options.
# (On Ruby 3.0+ pass the options as keywords: CSV.open(csv_fname, 'r', **options))
CSV.open(csv_fname, 'r', options) do |csv|
  matches = csv.find_all do |row|
    Hash[row].select { |k,v| search_criteria[k] } == search_criteria
  end
  headers = csv.headers
end
p headers
# Once matches are found, print the SurveyID of each
# matching row (header names are case-sensitive).
matches.each { |row| puts row['SurveyID'] }
Possibly...
require 'csv'
b_headers = false
client_id_col = 0
branch_id_col = 0
survey_id_col = 0
CSV.open('FS_Email_Test.csv') do |file|
  file.each do |row|
    if !b_headers
      # First row: locate the three columns by header name
      client_id_col = row.index("ClientID")
      branch_id_col = row.index("BranchID")
      survey_id_col = row.index("SurveyID")
      b_headers = true
      if branch_id_col.nil? || client_id_col.nil? || survey_id_col.nil?
        puts "Invalid csv file - Missing one of these columns (or no headers):\nClientID\nBranchID\nSurveyID"
        break
      end
    else
      # Data rows: cells are Strings because no converters are used
      puts row[survey_id_col] if row[branch_id_col] == "1" && row[client_id_col] == "1"
    end
  end
end
