Comparing lists of field-hashes with equivalent AR-objects - ruby

I have a list of hashes, as such:
incoming_links = [
{:title => 'blah1', :url => "http://blah.com/post/1"},
{:title => 'blah2', :url => "http://blah.com/post/2"},
{:title => 'blah3', :url => "http://blah.com/post/3"}]
And an ActiveRecord model which has fields in the database with some matching rows, say:
Link.all =>
[<Link#2 #title='blah2' #url='...post/2'>,
<Link#3 #title='blah3' #url='...post/3'>,
<Link#4 #title='blah4' #url='...post/4'>]
I'd like to do set operations on Link.all with incoming_links so that I can figure out that <Link#4 ...> is not in the set of incoming_links, and {:title => 'blah1', :url =>'http://blah.com/post/1'} is not in the Link.all set, like so:
#pseudocode
#incoming_links = as above
links = Link.all
expired_links = links - incoming_links
missing_links = incoming_links - links
expired_links.destroy
missing_links.each{|link| Link.create(link)}
Crappy solution a):
I'd rather not rewrite Array#- and such, and I'm okay with converting incoming_links to a set of unsaved Link objects; so I've tried overwriting hash eql? and so on in Link so that it ignored the id equality that AR::Base provides by default. But this is the only place this sort of equality should be considered in the application - in other places the Link#id default identity is required. Is there some way I could subclass Link and apply the hash, eql?, etc overwriting there?
Crappy solution b):
The other route I've tried is to pull out the attributes hash for each Link and do a .slice('id',...etc) to prune the hashes down. But this requires writing separate - methods for keeping track of the Link objects while doing set operations on the hashes, and writing separate Proxy classes to wrap the incoming_links hashes and Links, which seems a bit overkill. Nonetheless, this is the current solution for me.
Can you think of a better way to design this interaction? Extra credit for cleanliness.

Try this:
incoming_links = [
{:title => 'blah1', :url => "http://blah.com/post/1"},
{:title => 'blah2', :url => "http://blah.com/post/2"},
{:title => 'blah3', :url => "http://blah.com/post/3"}]
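# attributes returns string keys, so symbolize them to match incoming_links
ar_links = Link.all(:select => 'title, url').map { |link| link.attributes.symbolize_keys }
# which incoming links are not in ar_links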
incoming_links - ar_links
# and vice versa
ar_links - incoming_links
Update:
For your Link model:
def self.not_in_array(array)
keys = array.first.keys
all.reject do |item|
hash = {}
keys.each { |k| hash[k] = item.send(k) }
array.include? hash
end
end
def self.not_in_class(array)
keys = array.first.keys
class_array = []
all.each do |item|
hash = {}
keys.each { |k| hash[k] = item.send(k) }
class_array << hash
end
array - class_array
end
ar = [{:title => 'blah1', :url => 'http://blah.com/ddd'}]
Link.not_in_array ar
#=> all links from the Link model which are not in `ar`
Link.not_in_class ar
#=> all links from `ar` which are not in your Link model

If you rewrite the equality method, will ActiveRecord complain still?
Can't you do something similar to this (as in a regular ruby class):
class Link
attr_reader :title, :url
def initialize(title, url)
@title = title
@url = url
end
def eql?(another_link)
self.title == another_link.title and self.url == another_link.url
end
def hash
title.hash * url.hash
end
end
aa = [Link.new('a', 'url1'), Link.new('b', 'url2')]
bb = [Link.new('a', 'url1'), Link.new('d', 'url4')]
(aa - bb).each{|x| puts x.title}

The requirements are:
# Keep track of original link objects when
# comparing against a set of incomplete `attributes` hashes.
# Don't alter the `hash` and `eql?` methods of Link permanently,
# or globally, throughout the application.
The current solution is in effect using Hash's eql? method, and annotating the hashes with the original objects:
class LinkComp < Hash
LINK_COLS = [:title, :url]
attr_accessor :link
def self.[](args)
if args.first.is_a?(Link) #not necessary for the algorithm,
#but nice for finding typos and logic errors
links = args.collect do |lnk|
lk = super(lnk.attributes.slice(*LINK_COLS.collect(&:to_s)))
lk.link = lnk
lk
end
elsif args.blank?
[]
#else #raise error for finding typos
end
end
end
incoming_links = [
{:title => 'blah1', :url => "http://blah.com/post/1"},
{:title => 'blah2', :url => "http://blah.com/post/2"},
{:title => 'blah3', :url => "http://blah.com/post/3"}]
#Link.all =>
#[<Link#2 #title='blah2' #url='...post/2'>,
# <Link#3 #title='blah3' #url='...post/3'>,
# <Link#4 #title='blah4' #url='...post/4'>]
incoming_links= LinkComp[incoming_links.collect{|i| Link.new(i)}]
links = LinkComp[Link.all] #As per fl00r's suggestion
#this could be :select'd down somewhat, w.l.o.g.
missing_links = (incoming_links - links).collect(&:link)
expired_links = (links - incoming_links).collect(&:link)
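Another way to leave Link's default identity untouched is to compare on an explicit key rather than on the objects themselves. The following is only a minimal sketch under stated assumptions, not the method of any answer above: `Link` here is a plain Struct standing in for the ActiveRecord model, and `KEY_COLS` and `key_for` are hypothetical names introduced for illustration. It relies only on `[]` access by column name, which hashes, Structs and ActiveRecord models all support.

```ruby
# Stand-in for the ActiveRecord model (assumption: the real Link
# also responds to [] with a column name).
Link = Struct.new(:id, :title, :url)

# Columns that define "sameness" for this one comparison (hypothetical name).
KEY_COLS = [:title, :url]

# Builds a comparable key from either a hash or a record.
def key_for(item)
  KEY_COLS.map { |col| item[col] }
end

incoming_links = [
  { :title => 'blah1', :url => 'http://blah.com/post/1' },
  { :title => 'blah2', :url => 'http://blah.com/post/2' },
  { :title => 'blah3', :url => 'http://blah.com/post/3' }
]

links = [
  Link.new(2, 'blah2', 'http://blah.com/post/2'),
  Link.new(3, 'blah3', 'http://blah.com/post/3'),
  Link.new(4, 'blah4', 'http://blah.com/post/4')
]

# Index both sides by key; the values keep the original objects.
incoming_by_key = incoming_links.map { |h| [key_for(h), h] }.to_h
links_by_key    = links.map { |l| [key_for(l), l] }.to_h

expired_links = links_by_key.reject { |key, _| incoming_by_key.key?(key) }.values
missing_links = incoming_by_key.reject { |key, _| links_by_key.key?(key) }.values

p expired_links.map(&:id)                #=> [4]
p missing_links.map { |h| h[:title] }    #=> ["blah1"]
```

In the real application, expired_links would hold Link records ready to destroy and missing_links the raw hashes ready for Link.create, with no change to eql? or hash on the model.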

Related

How would I construct a Hash from this scenario in Ruby?

Given I have the following code:
ENDPOINT = 'http://api.eventful.com'
API_KEY = 'PbFVZfjTXJQWrnJp'
def get_xml(url, options={})
compiled_url = "#{ENDPOINT}/rest#{url}" << "?app_key=#{API_KEY}&sort_order=popularity"
options.each { |k, v| compiled_url << "&#{k.to_s}=#{v.to_s}" }
REXML::Document.new((Net::HTTP.get(URI.parse(URI.escape(compiled_url)))))
end
def event_search(location, date)
get_xml('/events/search',
:location => "#{location}, United Kingdom",
:date => date
)
end
And we access the XML data formatted by REXML::Document like this:
events = event_search('London', 'Today').elements
And we can access these elements like this (this prints all the titles in the events):
events.each('search/events/event/title') do |title|
puts title.text
end
The XML I'm using can be found here. I would like to construct a Hash like so:
{"Title1" => {:title => 'Title1', :date => 'Date1', :post_code => 'PostCode1'},
"Title2" => {:title => 'Title2', :date => 'Date2', :post_code => 'PostCode2'}}
using the values from events.each('search/events/event/title'), events.each('search/events/event/date'), and events.each('search/events/event/post_code').
So I want to create a Hash from the XML provided by the URL I have included above. Thanks!
You should loop over the events themselves, not the titles. Something like this:
events_by_title = {}
events.each('search/events/event') do |event|
  title = event.get_elements('title').first.text
  events_by_title[title] = {
    :title => title,
    :date => event.get_elements('start_time').first.text,
    :post_code => event.get_elements('postal_code').first.text
  }
end
Get the root element using root() on the REXML::Document object, then use each_element("search/events/event") to iterate over each "event" node. You can then extract the different values out of it using the different methods on element: http://ruby-doc.org/stdlib-1.9.3/libdoc/rexml/rdoc/REXML/Element.html
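To try the per-event loop without hitting the API, here is a small self-contained sketch that parses an inline XML string. The XML shape (search/events/event with title, start_time and postal_code children) is an assumption modelled on the element names used in the answer above; the real feed may differ.

```ruby
require 'rexml/document'

# Assumed sample data; the real response comes from the remote API.
xml = <<~XML
  <search>
    <events>
      <event>
        <title>Title1</title>
        <start_time>Date1</start_time>
        <postal_code>PostCode1</postal_code>
      </event>
      <event>
        <title>Title2</title>
        <start_time>Date2</start_time>
        <postal_code>PostCode2</postal_code>
      </event>
    </events>
  </search>
XML

doc = REXML::Document.new(xml)
events_by_title = {}

# Visit each <event> once and read its child elements.
doc.elements.each('search/events/event') do |event|
  title = event.elements['title'].text
  events_by_title[title] = {
    :title     => title,
    :date      => event.elements['start_time'].text,
    :post_code => event.elements['postal_code'].text
  }
end

p events_by_title.keys                    #=> ["Title1", "Title2"]
p events_by_title['Title1'][:post_code]   #=> "PostCode1"
```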

Refactor ruby on rails model

Given the following code,
How would you refactor this so that the method search_word has access to issueid?
I would say that changing the function search_word so it accepts 3 arguments or making issueid an instance variable (@issueid) could be considered bad practice, but honestly I cannot find any other solution. If there's no solution aside from these, would you mind explaining why there's no other solution?
Please bear in mind that it is a Ruby on Rails model.
def search_type_of_relation_in_text(issueid, type_of_causality)
relation_ocurrences = Array.new
keywords_list = {
:C => ['cause', 'causes'],
:I => ['prevent', 'inhibitors'],
:P => ['type','supersets'],
:E => ['effect', 'effects'],
:R => ['reduce', 'inhibited'],
:S => ['example', 'subsets']
}[type_of_causality.to_sym]
for keyword in keywords_list
relation_ocurrences + search_word(keyword, relation_type)
end
return relation_ocurrences
end
def search_word(keyword, relation_type)
relation_ocurrences = Array.new
@buffer.search('//p[text()*= "'+keyword+'"]/a').each { |relation|
relation_suggestion_url = 'http://en.wikipedia.org'+relation.attributes['href']
relation_suggestion_title = URI.unescape(relation.attributes['href'].gsub("_" , " ").gsub(/[\w\W]*\/wiki\//, ""))
if not @current_suggested[relation_type].include?(relation_suggestion_url)
if @accepted[relation_type].include?(relation_suggestion_url)
relation_ocurrences << {:title => relation_suggestion_title, :wiki_url => relation_suggestion_url, :causality => type_of_causality, :status => "A", :issue_id => issueid}
else
relation_ocurrences << {:title => relation_suggestion_title, :wiki_url => relation_suggestion_url, :causality => type_of_causality, :status => "N", :issue_id => issueid}
end
end
}
end
If you need additional context, pass it through as an additional argument. That's how it's supposed to work.
Setting @-type instance variables to pass context is bad form, as you've identified.
There are a number of Ruby conventions you seem to be unaware of:
Instead of Array.new just use [ ], and instead of Hash.new use { }.
Use a case statement or a constant instead of defining a Hash and then retrieving only one of the elements, discarding the remainder.
Avoid using return unless strictly necessary, as the last operation is always returned by default.
Use array.each do |item| instead of for item in array
Use do ... end instead of { ... } for multi-line blocks, where the curly brace version is generally reserved for one-liners. Avoids confusion with hash declarations.
Try and avoid duplicating large chunks of code when the differences are minor. For instance, declare a temporary variable, conditionally manipulate it, then store it instead of defining multiple independent variables.
With that in mind, here's a reworking of it:
KEYWORDS = {
  :C => ['cause', 'causes'],
  :I => ['prevent', 'inhibitors'],
  :P => ['type','supersets'],
  :E => ['effect', 'effects'],
  :R => ['reduce', 'inhibited'],
  :S => ['example', 'subsets']
}
def search_type_of_relation_in_text(issue_id, type_of_causality)
  # flat_map merges each keyword's results into one flat array
  KEYWORDS[type_of_causality.to_sym].flat_map do |keyword|
    search_word(keyword, type_of_causality, issue_id)
  end
end
def search_word(keyword, relation_type, issue_id)
  relation_occurrences = [ ]
  @buffer.search(%Q{//p[text()*= "#{keyword}"]/a}).each do |relation|
    relation_suggestion_url = "http://en.wikipedia.org#{relation.attributes['href']}"
    relation_suggestion_title = URI.unescape(relation.attributes['href'].gsub("_" , " ").gsub(/[\w\W]*\/wiki\//, ""))
    if (!@current_suggested[relation_type].include?(relation_suggestion_url))
      occurrence = {
        :title => relation_suggestion_title,
        :wiki_url => relation_suggestion_url,
        :causality => relation_type,
        :issue_id => issue_id
      }
      occurrence[:status] =
        if (@accepted[relation_type].include?(relation_suggestion_url))
          'A'
        else
          'N'
        end
      relation_occurrences << occurrence
    end
  end
  relation_occurrences
end
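To see the "pass context as an extra argument" advice in isolation, here is a small runnable sketch. `fake_search_word` is a hypothetical stand-in for the real page search (it needs no buffer or network), and `search_all` is likewise an illustrative name. It also shows one detail worth knowing: when every call returns an array, collect yields an array of arrays, while flat_map yields one flat list.

```ruby
# Keyword table kept in a constant (trimmed to two entries for the sketch).
KEYWORDS = {
  :C => ['cause', 'causes'],
  :E => ['effect', 'effects']
}

# Hypothetical stand-in for search_word: returns one hash per "match".
# The issue id arrives as a plain argument, not via an instance variable.
def fake_search_word(keyword, relation_type, issue_id)
  [{ :keyword => keyword, :causality => relation_type, :issue_id => issue_id }]
end

def search_all(issue_id, type_of_causality)
  KEYWORDS.fetch(type_of_causality.to_sym).flat_map do |keyword|
    fake_search_word(keyword, type_of_causality, issue_id)
  end
end

results = search_all(42, 'C')
p results.length                            #=> 2
p results.map { |r| r[:keyword] }           #=> ["cause", "causes"]
p results.all? { |r| r[:issue_id] == 42 }   #=> true

# With collect, each element is itself an array of results.
nested = KEYWORDS[:C].collect { |k| fake_search_word(k, 'C', 42) }
p nested.first.class                        #=> Array
```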

Question on Ruby collect method

I have an array of hashes
Eg:
cars = [{:company => "Ford", :type => "SUV"},
{:company => "Honda", :type => "Sedan"},
{:company => "Toyota", :type => "Sedan"}]
# i want to fetch all the companies of the cars
cars.collect{|c| c[:company]}
# => ["Ford", "Honda", "Toyota"]
# i'm lazy and i want to do something like this
cars.collect(&:company)
# => undefined method `company'
I was wondering if there is a similar shortcut to perform the above.
I believe your current code cars.collect{|c| c[:company]} is the best way if you're enumerating over an arbitrary array. The method you would pass in via the & shortcut would have to be a method defined on Hash since each object in the array is of type Hash. Since there is no company method defined for Hash you get the "undefined method 'company'" error.
You could use cars.collect(&:company) if you were operating on an Array of Cars though, because each object passed into the collect block would be of type Car (which has the company method available). So maybe you could modify your code so that you use an array of Cars instead.
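As a rough sketch of that idea, a Struct is enough to get a company method. `Car` here is a hypothetical class introduced for illustration, not something from the question.

```ruby
# Hypothetical Car class; Struct generates the company and type readers.
Car = Struct.new(:company, :type)

hashes = [{ :company => "Ford",   :type => "SUV" },
          { :company => "Honda",  :type => "Sedan" },
          { :company => "Toyota", :type => "Sedan" }]

# Convert each hash into a Car once, up front.
cars = hashes.map { |h| Car.new(h[:company], h[:type]) }

# &:company turns the symbol into a proc that calls #company on each element.
p cars.collect(&:company)              #=> ["Ford", "Honda", "Toyota"]

# The shortcut is equivalent to the explicit block form.
p cars.collect { |c| c.company }       #=> ["Ford", "Honda", "Toyota"]

# The proc can also be built and called by hand.
p :company.to_proc.call(cars.first)    #=> "Ford"
```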
You could convert the hashes to OpenStructs.
require 'ostruct'
cars = [{:company => "Ford", :type => "SUV"},
{:company => "Honda", :type => "Sedan"},
{:company => "Toyota", :type => "Sedan"}]
cars = cars.map{|car| OpenStruct.new(car)}
p cars.map( &:company )
#=> ["Ford", "Honda", "Toyota"]
That shortcut can't be used in your case, because the block calls the method [] with the argument :company. The &:company construction takes the symbol :company and converts it to a Proc that calls the method of that name, so the symbol can only supply a method name, not an argument.
Unfortunately Ruby hashes can't do that. Clojure maps, on the other hand, have functions for each key which return the corresponding value. That would be easy enough to add to Ruby hashes if you are so inclined (you should also add the corresponding respond_to? method):
>> class Hash
.. def method_missing(m)
.. self.has_key?(m) ? self[m] : super
.. end
.. end #=> nil
>> cars.collect(&:company) #=> ["Ford", "Honda", "Toyota"]
>> cars.collect(&:compay)
NoMethodError: undefined method `compay' for {:type=>"SUV", :company=>"Ford"}:Hash
Note: I'm not advising this, I'm just saying it's possible.
Another horrible monkeypatch you shouldn't really use:
class Symbol
def to_proc
if self.to_s =~ /bracket_(.*)/
Proc.new {|x| x[$1.to_sym]}
else
Proc.new {|x| x.send(self)}
end
end
end
cars = [{:company => "Ford", :type => "SUV"},
{:company => "Honda", :type => "Sedan"},
{:company => "Toyota", :type => "Sedan"}]
cars.collect(&:bracket_company)

Recursive DFS Ruby method

I have a YAML file of groups that I would like to get into a MongoDB collection called groups with documents like {"name" => "golf", "parent" => "sports"} (Top level groups, like sports, would just be {"name" => "sports"} without a parent.)
We are trying to traverse the nested hash, but I'm not sure if it's working correctly. I'd prefer to use a recursive method rather than a lambda proc. What should we change to make it work?
Thanks!
Matt
Here's the working code:
require 'mongo'
require 'yaml'
conn = Mongo::Connection.new
db = conn.db("acani")
interests = db.collection("interests")
@@interest_id = 0
interests_hash = YAML::load_file('interests.yml')
def interests.insert_interest(interest, parent=nil)
interest_id = @@interest_id.to_s(36)
if interest.is_a? String # base case
insert({:_id => interest_id, :n => interest, :p => parent})
@@interest_id += 1
else # it's a hash
interest = interest.first # get key-value pair in hash
interest_name = interest[0]
insert({:_id => interest_id, :n => interest_name, :p => parent})
@@interest_id += 1
interest[1].each do |i|
insert_interest(i, interest_name)
end
end
end
interests.insert_interest interests_hash
View the Interests YAML.
View the acani source.
Your question is just how to convert this code:
insert_enumerable = lambda {|obj, collection|
# obj = {:value => obj} if !obj.kind_of? Enumerable
if(obj.kind_of? Array or obj.kind_of? Hash)
obj.each do |k, v|
v = (v.nil?) ? k : v
insert_enumerable.call({:value => v, :parent => obj}, collection)
end
else
obj = {:value => obj}
end
# collection.insert({name => obj[:value], :parent => obj[:parent]})
pp({name => obj[:value], :parent => obj[:parent]})
}
...to use a method rather than a lambda? If so, then:
def insert_enumerable( obj, collection )
# obj = {:value => obj} if !obj.kind_of? Enumerable
if(obj.kind_of? Array or obj.kind_of? Hash)
obj.each do |k, v|
v = (v.nil?) ? k : v
insert_enumerable({:value => v, :parent => obj}, collection)
end
else
obj = {:value => obj}
end
# collection.insert({name => obj[:value], :parent => obj[:parent]})
pp({name => obj[:value], :parent => obj[:parent]})
end
If that's not what you're asking, please help clarify.
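For the original goal (name/parent documents from a nested structure), a recursive depth-first method could look like the sketch below. It is only an illustration under assumptions: the data shape (hashes mapping a group name to an array of children, with strings as leaves) is guessed from the question, `flatten_groups` and `group_doc` are hypothetical names, and the documents are collected into an array rather than inserted into MongoDB.

```ruby
# Builds one document; top-level groups get no "parent" key.
def group_doc(name, parent)
  parent ? { "name" => name, "parent" => parent } : { "name" => name }
end

# Depth-first walk: emit a node, then recurse into its children.
def flatten_groups(node, parent = nil, out = [])
  case node
  when Hash
    node.each do |name, children|
      out << group_doc(name, parent)
      flatten_groups(children, name, out)
    end
  when Array
    node.each { |child| flatten_groups(child, parent, out) }
  when nil
    # A group with no children: nothing more to add.
  else
    out << group_doc(node.to_s, parent)
  end
  out
end

# Assumed shape of the parsed YAML.
groups = {
  "sports" => ["golf", { "racquet" => ["tennis"] }],
  "music"  => nil
}

docs = flatten_groups(groups)
p docs.map { |d| d["name"] }     #=> ["sports", "golf", "racquet", "tennis", "music"]
p docs.map { |d| d["parent"] }   #=> [nil, "sports", "sports", "racquet", nil]
```

Each element of docs could then be passed to the collection's insert call in place of the pp used above.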

Ruby Style: How to check whether a nested hash element exists

Consider a "person" stored in a hash. Two examples are:
fred = {:person => {:name => "Fred", :spouse => "Wilma", :children => {:child => {:name => "Pebbles"}}}}
slate = {:person => {:name => "Mr. Slate", :spouse => "Mrs. Slate"}}
If the "person" doesn't have any children, the "children" element is not present. So, for Mr. Slate, we can check whether he has children:
slate_has_children = !slate[:person][:children].nil?
So, what if we don't know that "slate" is a "person" hash? Consider:
dino = {:pet => {:name => "Dino"}}
We can't easily check for children any longer:
dino_has_children = !dino[:person][:children].nil?
NoMethodError: undefined method `[]' for nil:NilClass
So, how would you check the structure of a hash, especially if it is nested deeply (even deeper than the examples provided here)? Maybe a better question is: What's the "Ruby way" to do this?
The most obvious way to do this is to simply check each step of the way:
has_children = slate[:person] && slate[:person][:children]
Use of .nil? is really only required when you use false as a placeholder value, and in practice this is rare. Generally you can simply test it exists.
Update: If you're using Ruby 2.3 or later there's a built-in dig method that does what's described in this answer.
If not, you can also define your own Hash "dig" method which can simplify this substantially:
class Hash
def dig(*path)
path.inject(self) do |location, key|
location.respond_to?(:keys) ? location[key] : nil
end
end
end
This method will check each step of the way and avoid tripping up on calls to nil. For shallow structures the utility is somewhat limited, but for deeply nested structures I find it's invaluable:
has_children = slate.dig(:person, :children)
You might also make this more robust, for example, testing if the :children entry is actually populated:
children = slate.dig(:person, :children)
has_children = children && !children.empty?
With Ruby 2.3 we'll have support for the safe navigation operator:
https://www.ruby-lang.org/en/news/2015/11/11/ruby-2-3-0-preview1-released/
has_children now could be written as:
has_children = slate[:person]&.[](:children)
dig is being added as well:
has_children = slate.dig(:person, :children)
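Here is a short sketch of both features applied to the hashes from the question (Ruby 2.3 or later assumed). It also shows one behaviour of the built-in dig worth knowing about: it raises TypeError when an intermediate value is not diggable, such as a String.

```ruby
fred  = {:person => {:name => "Fred", :spouse => "Wilma",
                     :children => {:child => {:name => "Pebbles"}}}}
slate = {:person => {:name => "Mr. Slate", :spouse => "Mrs. Slate"}}
dino  = {:pet => {:name => "Dino"}}

# dig returns nil as soon as any key along the path is missing.
p fred.dig(:person, :children, :child, :name)   #=> "Pebbles"
p slate.dig(:person, :children)                 #=> nil
p dino.dig(:person, :children)                  #=> nil

# Safe navigation skips the call when the receiver is nil.
p dino[:person]&.[](:children)                  #=> nil

# Digging "through" a String raises instead of returning nil.
begin
  fred.dig(:person, :name, :first)
  raised = false
rescue TypeError
  raised = true
end
p raised                                        #=> true
```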
Another alternative:
dino.fetch(:person, {})[:children]
You can use the andand gem:
require 'andand'
fred[:person].andand[:children].nil? #=> false
dino[:person].andand[:children].nil? #=> true
You can find further explanations at http://andand.rubyforge.org/.
One could use hash with default value of {} - empty hash. For example,
dino = Hash.new({})
dino[:pet] = {:name => "Dino"}
dino_has_children = !dino[:person][:children].nil? #=> false
That works with already created Hash as well:
dino = {:pet=>{:name=>"Dino"}}
dino.default = {}
dino_has_children = !dino[:person][:children].nil? #=> false
Or you can define [] method for nil class
class NilClass
def [](* args)
nil
end
end
nil[:a] #=> nil
Traditionally, you really had to do something like this:
structure[:a] && structure[:a][:b]
However, Ruby 2.3 added a feature that makes this way more graceful:
structure.dig :a, :b # nil if it misses anywhere along the way
There is a gem called ruby_dig that will back-patch this for you.
def flatten_hash(hash)
hash.each_with_object({}) do |(k, v), h|
if v.is_a? Hash
flatten_hash(v).map do |h_k, h_v|
h["#{k}_#{h_k}"] = h_v
end
else
h[k] = v
end
end
end
irb(main):012:0> fred = {:person => {:name => "Fred", :spouse => "Wilma", :children => {:child => {:name => "Pebbles"}}}}
=> {:person=>{:name=>"Fred", :spouse=>"Wilma", :children=>{:child=>{:name=>"Pebbles"}}}}
irb(main):013:0> slate = {:person => {:name => "Mr. Slate", :spouse => "Mrs. Slate"}}
=> {:person=>{:name=>"Mr. Slate", :spouse=>"Mrs. Slate"}}
irb(main):014:0> flatten_hash(fred).keys.any? { |k| k.include?("children") }
=> true
irb(main):015:0> flatten_hash(slate).keys.any? { |k| k.include?("children") }
=> false
This will flatten all the hashes into one, and then any? returns true if any key containing the substring "children" exists.
This might also help.
dino_has_children = !dino.fetch(:person, {})[:children].nil?
Note that in Rails you can also do:
dino_has_children = !dino[:person].try(:[], :children).nil?
Here is a way you can do a deep check for any blank values in the hash and any nested hashes without monkey patching the Ruby Hash class (PLEASE don't monkey patch the Ruby core classes; it is something you should not do, EVER).
(Assuming Rails, although you could easily modify this to work outside of Rails)
def deep_all_present?(hash)
fail ArgumentError, 'deep_all_present? only accepts Hashes' unless hash.is_a? Hash
hash.each do |key, value|
return false if key.blank? || value.blank?
return false if value.is_a?(Hash) && !deep_all_present?(value)
end
true
end
Simplifying the above answers here:
Create a recursive hash method whose values are never nil, as follows:
def recursive_hash
Hash.new {|hash, key| hash[key] = recursive_hash}
end
> slate = recursive_hash
> slate[:person][:name] = "Mr. Slate"
> slate[:person][:spouse] = "Mrs. Slate"
> slate
=> {:person=>{:name=>"Mr. Slate", :spouse=>"Mrs. Slate"}}
slate[:person][:state][:city]
=> {}
If you don't mind creating empty hashes if the value does not exist for the key :)
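To make that trade-off concrete, the sketch below shows that merely reading a missing key stores an empty hash under it, so a later key? check reports the key as present. It reuses the recursive_hash method from this answer; nothing else is assumed.

```ruby
def recursive_hash
  # The default block receives the hash and the missing key,
  # and stores a fresh recursive hash under that key.
  Hash.new { |hash, key| hash[key] = recursive_hash }
end

slate = recursive_hash
slate[:person][:name] = "Mr. Slate"

p slate[:person].key?(:children)     #=> false

# A plain read is enough to create the entry...
slate[:person][:children]

# ...so the structure now claims to have a :children key.
p slate[:person].key?(:children)     #=> true
p slate[:person][:children].empty?   #=> true
```

So with this approach a "has children" test should look at whether the value is empty, not at whether the key exists.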
You can try to play with
dino.default = {}
Or for example:
empty_hash = {}
empty_hash.default = empty_hash
dino.default = empty_hash
That way you can call
empty_hash[:a][:b][:c][:d][:e] # and so on...
dino[:person][:children] # at worst it returns {}
Given
x = {:a => {:b => 'c'}}
y = {}
you could check x and y like this:
(x[:a] || {})[:b] # 'c'
(y[:a] || {})[:b] # nil
Thanks @tadman for the answer.
For those who want performance (and are stuck with Ruby < 2.3), this method is 2.5x faster:
unless Hash.method_defined? :dig
class Hash
def dig(*path)
val, index, len = self, 0, path.length
index += 1 while(index < len && val = val[path[index]])
val
end
end
end
and if you use RubyInline, this method is 16x faster:
unless Hash.method_defined? :dig
require 'inline'
class Hash
inline do |builder|
builder.c_raw '
VALUE dig(int argc, VALUE *argv, VALUE self) {
rb_check_arity(argc, 1, UNLIMITED_ARGUMENTS);
self = rb_hash_aref(self, *argv);
if (NIL_P(self) || !--argc) return self;
++argv;
return dig(argc, argv, self);
}'
end
end
end
You can also define a module to alias the brackets methods and use the Ruby syntax to read/write nested elements.
UPDATE: Instead of overriding the bracket accessors, request Hash instance to extend the module.
module Nesty
def []=(*keys,value)
key = keys.pop
if keys.empty?
super(key, value)
else
if self[*keys].is_a? Hash
self[*keys][key] = value
else
self[*keys] = { key => value}
end
end
end
def [](*keys)
self.dig(*keys)
end
end
class Hash
def nesty
self.extend Nesty
self
end
end
Then you can do:
irb> a = {}.nesty
=> {}
irb> a[:a, :b, :c] = "value"
=> "value"
irb> a
=> {:a=>{:b=>{:c=>"value"}}}
irb> a[:a,:b,:c]
=> "value"
irb> a[:a,:b]
=> {:c=>"value"}
irb> a[:a,:d] = "another value"
=> "another value"
irb> a
=> {:a=>{:b=>{:c=>"value"}, :d=>"another value"}}
I don't know how "Ruby" it is(!), but the KeyDial gem which I wrote lets you do this basically without changing your original syntax:
has_kids = !dino[:person][:children].nil?
becomes:
has_kids = !dino.dial[:person][:children].call.nil?
This uses some trickery to intermediate the key access calls. At call, it will try to dig the previous keys on dino, and if it hits an error (as it will), returns nil. nil? then of course returns true.
You can combine the safe navigation operator (&.) with key?. key? is a single hash lookup, whereas dig performs one lookup per key in its path, and &. makes sure person is accessed without raising NoMethodError: undefined method `[]' for nil:NilClass.
fred[:person]&.key?(:children) #=> true
slate[:person]&.key?(:children) #=> false
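One difference from dig is worth a quick sketch: key? asks whether the key is present, not whether its value is non-nil. The `with_nil` hash below is a made-up example for illustration; the other hashes are the ones from the question.

```ruby
fred  = {:person => {:name => "Fred", :spouse => "Wilma",
                     :children => {:child => {:name => "Pebbles"}}}}
slate = {:person => {:name => "Mr. Slate", :spouse => "Mrs. Slate"}}
dino  = {:pet => {:name => "Dino"}}

p fred[:person]&.key?(:children)    #=> true
p slate[:person]&.key?(:children)   #=> false
# dino has no :person, so &. short-circuits to nil (which is falsy).
p dino[:person]&.key?(:children)    #=> nil

# Made-up case: the key exists but holds nil.
with_nil = {:person => {:children => nil}}
p with_nil[:person]&.key?(:children)       #=> true
p with_nil.dig(:person, :children).nil?    #=> true
```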
