How to optimize extracting data from nested hashes in ruby? - ruby

Background
I have a collection of nested hashes which present a set of parameters to define application behavior:
custom_demo_options: {
verticals: {
fashion: true,
automotive: false,
fsi: false
},
channels: {
b2b: true,
b2c: true
}
}
website_data: {
verticals: {
fashion: {
b2b: {
code: 'luma_b2b',
url: 'b2b.luma.com'
},
b2c: {
code: 'base',
url: 'luma.com'
}
}
}
}
The choices made in the custom_demo_options hash relate to data stored in the website_data hash and serve to return values from it:
data = []
collection = {}
custom_demo_options[:verticlas].each do |vertical_name, vertical_choice|
  # Get each vertical selection
  if vertical_choice == true
    # Loop through the channels for each selected vertical
    custom_demo_options[:channels].each do |channel_name, channel_choice|
      # Get each channel selection for each vertical selection
      if channel_choice == true
        # Loop through the website data for each vertical/channel selection
        website_data[:verticals].each do |site_vertical, vertical_data|
          # Look at the keys of the [:website_data][:verticals] hash
          # If we have a vertical selection that matches a website_data vertical...
          if site_vertical == vertical_name
            # For each website_data vertical collection...
            vertical_data.each do |vertical_channel, channel_value|
              # If we have a matching channel in the collection...
              if vertical_channel == channel_name
                # Add the channel's url and code to the collection hash
                collection[:url] = channel_value[:url]
                collection[:code] = channel_value[:code]
                # Push the collection hash(es) onto the data array
                data.push(collection)
              end
            end
          end
        end
      end
    end
  end
end
The data pushed to the data array is ultimately used to create the following nginx map definition:
map $http_host $MAGE_RUN_CODE {
luma.com base;
b2b.luma.com luma_b2b;
}
As an example of the relationship between the hashes, if a user sets custom_demo_options[:channels][:b2b] to false, the b2b code/url pair stored in the website_data hash would be removed from the nginx block:
map $http_host $MAGE_RUN_CODE {
luma.com base;
}
Question
The above code works, but I know it's horribly inefficient. I'm relatively new to ruby, but I think this is most likely a logical challenge rather than a language-specific one.
My question is, what is the proper way to connect these hashes rather than using loops as I've done? I've done some reading on hash.select and it seems like this might be the best route, but I'd like to know: are there other approaches I should consider that would optimize this operation?
UPDATE
I've been able to implement the first suggestion (thanks again to the poster); however, I think the second solution will be a better approach. Everything works as described; however, my data structure has changed slightly, and although I understand what the solution is doing, I'm having trouble adapting accordingly. Here's the new structure:
custom_demo_options = {
verticals: {
fashion: true,
automotive: false,
fsi: false
},
channels: {
b2b: true,
b2c: true
},
geos: [
'us_en'
]
}
website_data = {
verticals: {
fashion: {
us_en: {
b2b: {
code: 'luma_b2b',
url: 'b2b.luma.com'
},
b2c: {
code: 'base',
url: 'luma.com'
}
}
}
}
}
So, I've added another level to the hashes, :geo.
I've tried to adapt the second solution as follows:
class CustomOptionsMap
  attr_accessor :custom_options, :website_data
  def initialize(custom_options, website_data)
    @custom_options = custom_options
    @website_data = website_data[:verticals]
  end
  def data
    verticals = selected_verticals
    channels = selected_channels
    geos = selected_geos
    # I know this is the piece I'm not understanding. How to map channels and geos accordingly.
    verticals.map{ |vertical| @website_data.fetch(vertical).slice(*channels) }
  end
  private
  def selected_geos
    @custom_options[:geos].select{|_,v| v } # I think this is correct, as it extracts the geo from the array and we don't have additional keys
  end
  def selected_verticals
    @custom_options[:verticals].select{|_,v| v }.keys
  end
  def selected_channels
    @custom_options[:channels].select{|_,v| v }.keys
  end
end
demo_configuration = CustomOptionsMap.new(custom_demo_options, website_data)
print demo_configuration.data
Any guidance on what I'm missing regarding the map statement would be very much appreciated.

Object-oriented approach.
Using OOP might be more readable and consistent in this context, as Ruby is an object-oriented language.
By introducing a simple Ruby class and using the activesupport gem, which extends Hash with some useful methods, the same result can be achieved in the following way:
class WebsiteConifg
  attr_accessor :custom_options, :website_data
  def initialize(custom_options, website_data)
    @custom_options = custom_options
    @website_data = website_data[:verticals]
  end
  def data
    verticals = selected_verticals
    channels = selected_channels
    verticals.map{ |vertical| @website_data.fetch(vertical).slice(*channels) }
  end
  private
  def selected_verticals
    @custom_options[:verticals].select{|_,v| v }.keys
  end
  def selected_channels
    @custom_options[:channels].select{|_,v| v }.keys
  end
end
Based on the passed custom_demo_options, we select only those verticals and channels whose values are set to true.
For your configuration this returns:
selected_verticals # [:fashion]
selected_channels # [:b2b, :b2c]
+data()
This simple public interface iterates through all selected verticals based on the passed options and returns an Array of hashes for the given channels by using slice(keys).
fetch(key)
returns the value for the given key; it is equivalent to h[:key], except that it raises KeyError when the key is missing
h = {a: 2, b: 3}
h.fetch(:a) # 2
h.fetch(:b) # 3
slice(key1, key2) requires activesupport on Ruby versions before 2.5 (since Ruby 2.5, Hash#slice is built in)
returns a hash containing only the keys passed as arguments. The method accepts multiple arguments; since in our example we have an Array of those keys, we can use the * splat operator to comply with this interface.
h = {a: 2, b: 3}
h.slice(:a) # {:a=>2}
h.slice(:a, :b) # {:a=>2, :b=>3}
h.slice(*[:a, :b]) # {:a=>2, :b=>3}
Usage
website_config = WebsiteConifg.new(custom_demo_options, website_data)
website_config.data
# returns
# [{:b2b=>{:code=>"luma_b2b", :url=>"b2b.luma.com"}, :b2c=>{:code=>"base", :url=>"luma.com"}}]
UPDATE
Changed relevant parts:
def data
  verticals = selected_verticals
  channels = selected_channels
  geos = selected_geos
  verticals.map do |vertical|
    verticals_data = @website_data.fetch(vertical)
    # in case of multiple geolocations
    # collecting relevant entries of all of them
    geos_data = geos.map{|geo| verticals_data.fetch(geo) }
    # for each geo-location getting selected channels
    geos_data.map {|geo_data| geo_data.slice(*channels) }
  end.flatten
end
private
# as the `website_data' hash is using symbols, we need to convert string->sym
def selected_geos
  @custom_options[:geos].map(&:to_sym)
end
def selected_verticals
  selected_for(:verticals).keys
end
def selected_channels
  selected_for(:channels).keys
end
def selected_for(key)
  @custom_options[key].select{|_,v| v }
end
The easiest way to understand what output (data) you have at each step of the each (map) iterator is to place a debugger there, such as pry or byebug.
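As a self-contained variation on the geo-aware lookup, here is a minimal sketch that walks every vertical/geo/channel combination with Array#product and Hash#dig. The sample data is a trimmed copy of the structure from the question, with b2c switched off to show the filtering; truthy_keys is a name introduced only for this illustration.

```ruby
custom_demo_options = {
  verticals: { fashion: true, automotive: false, fsi: false },
  channels:  { b2b: true, b2c: false },
  geos:      ['us_en']
}
website_data = {
  verticals: {
    fashion: {
      us_en: {
        b2b: { code: 'luma_b2b', url: 'b2b.luma.com' },
        b2c: { code: 'base', url: 'luma.com' }
      }
    }
  }
}

# Hypothetical helper: keys of an options hash whose value is truthy
truthy_keys = ->(options) { options.select { |_, chosen| chosen }.keys }

verticals = truthy_keys.call(custom_demo_options[:verticals])
channels  = truthy_keys.call(custom_demo_options[:channels])
geos      = custom_demo_options[:geos].map(&:to_sym)

# Every vertical/geo/channel path; dig returns nil for a missing path,
# and compact drops those instead of raising KeyError like fetch would.
data = verticals.product(geos, channels).map do |path|
  website_data[:verticals].dig(*path)
end.compact
```

Unlike fetch, this quietly skips combinations that have no website data, which may or may not be what you want.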

Say you have key = :foo and hsh = { foo: 1, bar: 2 }, and you want to know the hash's value for that key.
The approach you're using here is essentially
result = nil
hsh.each { |k,v| result = v if k == :foo }
But why do that when you can simply say
result = hsh[:foo]
It seems like you understand how hashes can be iterable structures, and you can traverse them like arrays. But you're overdoing it, and forgetting that hashes are indexed structures. In terms of your code I would refactor it like so:
data = []
# fixed typo here: verticlas => verticals
custom_demo_options[:verticals].each do |vertical_name, vertical_choice|
  # == true is almost always unnecessary, just use a truthiness check
  next unless vertical_choice
  custom_demo_options[:channels].each do |channel_name, channel_choice|
    next unless channel_choice
    # index directly by the selected vertical instead of looping over website_data
    vertical_data = website_data[:verticals][vertical_name]
    channel_value = vertical_data[channel_name]
    # This must be initialized here:
    collection = {}
    collection[:url] = channel_value[:url]
    collection[:code] = channel_value[:code]
    data.push(collection)
  end
end
You can see that a lot of the nesting and complexity is removed. Note that I am initializing collection at the time it has attributes added to it. This is a little too much to go into here, but I highly advise reading up on mutability in Ruby. Your current code will probably not do what you expect, because you're pushing the same collection hash into the array multiple times.
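A small sketch of the mutability issue, using made-up values ('a' and 'b') rather than the real data: pushing one shared hash stores the same object twice, so the later assignment overwrites what the earlier element appears to hold.

```ruby
# One hash created outside the loop and reused on every iteration
shared = {}
same_object = []
%w[a b].each do |url|
  shared[:url] = url
  same_object.push(shared) # pushes a reference to the single `shared` hash
end

# A new hash created on every iteration
fresh_objects = []
%w[a b].each do |url|
  fresh_objects.push({ url: url })
end

p same_object   # both elements are the same hash, holding the last value
p fresh_objects # two distinct hashes
```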
At this point, you could refactor it into a more functional-programming style, with some chained methods, but I'll leave that exercise to you.
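For readers who want to compare notes on that exercise, here is one possible sketch of a chained version. It is not the answerer's code; the sample data is copied from the question, and the intermediate variable names are chosen for this illustration.

```ruby
custom_demo_options = {
  verticals: { fashion: true, automotive: false, fsi: false },
  channels:  { b2b: true, b2c: true }
}
website_data = {
  verticals: {
    fashion: {
      b2b: { code: 'luma_b2b', url: 'b2b.luma.com' },
      b2c: { code: 'base', url: 'luma.com' }
    }
  }
}

selected_verticals = custom_demo_options[:verticals].select { |_, on| on }.keys
selected_channels  = custom_demo_options[:channels].select { |_, on| on }.keys

data = selected_verticals.flat_map do |vertical_name|
  # An empty hash stands in for verticals that have no website data
  vertical_data = website_data[:verticals].fetch(vertical_name, {})
  selected_channels.map do |channel_name|
    channel_value = vertical_data[channel_name]
    # A fresh hash per pair, so entries never share state; nil if no match
    channel_value && { url: channel_value[:url], code: channel_value[:code] }
  end
end.compact

# Lines for the body of the nginx map block
map_lines = data.map { |entry| "#{entry[:url]} #{entry[:code]};" }
```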

Related

Ruby: Initializing a hash: Assign values, throw on undefined key access, and freeze, all at once?

In Ruby, I want to initialize a new hash, such that:
The hash is assigned a specific set of initial key-value pairs;
The hash is configured to raise an error if an attempt is made to retrieve the value for an undefined key;
The hash is frozen (can't be further modified).
Is there an elegant Ruby-ish way to do this setup all at once?
I'm aware that this can be done in three separate lines, e.g.:
COIN_SIDES = { heads: 'heads', tails: 'tails' }
COIN_SIDES.default_proc = -> (h, k) { raise KeyError, "Key '#{k}' not found" }
COIN_SIDES.freeze
You can do this by initializing the hash with a default block (which becomes its default_proc) and then adding the entries with merge!:
h = Hash.new{|hash, key| raise KeyError, "Key '#{key}' not found"}.merge!({ heads: 'heads', tails: 'tails' }).freeze
I'm not sure that this is terribly elegant, but one way to achieve this in one (long) line is by using .tap:
COIN_SIDES = { heads: 'heads', tails: 'tails' }.tap { |cs| cs.default_proc = -> (h, k) { raise KeyError, "Key '#{k}' not found" } }.tap(&:freeze)
This approach does at least avoid the RuboCop: Freeze mutable objects assigned to constants [Style/MutableConstant] warning generated when running the RuboCop linter on the 3-line version of the code from the original question, above.
You can accomplish most of this functionality by making a custom class, the only downside being it's not really a hash, so you'd need to explicitly add on extra functionality like .keys, each, etc if needed:
class HashLike
  def initialize(hsh)
    singleton_class.attr_reader *hsh.keys
    hsh.each { |k,v| instance_variable_set "@#{k}", v }
  end
end
hashlike = HashLike.new(some_value: 1)
hashlike.some_value # 1
hashlike.missing_value # NoMethodError
hashlike.some_value = 2 # NoMethodError
Another similar way:
class HashLike2
  def initialize(hsh)
    @hsh = hsh
  end
  def [](key)
    @hsh.fetch(key)
  end
end
hashlike2 = HashLike2.new(some_value: 1)
hashlike2[:some_value] # 1
hashlike2[:missing_value] # KeyError
hashlike2[:some_value] = 2 # NoMethodError
But in my opinion, there's not much of a reason to do this. You can easily move your original 3 lines into a method somewhere out of the way, and then it doesn't matter if it's 3 lines or 1 anymore.
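A minimal sketch of that last suggestion, assuming a helper method is acceptable in your codebase. The name strict_frozen_hash is hypothetical, and a local variable is used here in place of the COIN_SIDES constant.

```ruby
# Hypothetical helper: builds a hash that raises on unknown keys and is frozen
def strict_frozen_hash(pairs)
  Hash.new { |_, key| raise KeyError, "Key '#{key}' not found" }
      .merge!(pairs)
      .freeze
end

coin_sides = strict_frozen_hash(heads: 'heads', tails: 'tails')

coin_sides[:heads] # returns 'heads'
# coin_sides[:edge]        raises KeyError
# coin_sides[:edge] = 'x'  raises FrozenError on Ruby 2.5+ (a RuntimeError subclass)
```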

Return a Hash Grouped by Top Level Domains

I have an array of emails that I need to convert into a Hash using their Top Level Domain:
Example:
["kevin@yahoo.fr", "edward@gmail.fr", "julien@mdn.com", "dimitri@berlin.de"]
Should Return
{
  com: ["julien@mdn.com"],
  de: ["dimitri@berlin.de"],
  fr: ["kevin@yahoo.fr", "edward@gmail.fr"]
}
What I have done so far.
def group_by_tld(emails)
# TODO: return a Hash with emails grouped by TLD
new_hash = {}
emails.each do |e|
last_el = e.partition(".").last
if last_el == e.partition(".").last
new_hash[last_el] = e
else
break
end
end
return new_hash
end
Output: {"fr"=>"edward@gmail.fr", "com"=>"julien@mdn.com", "de"=>"dimitri@berlin.de"}
How can I fix this so both values are in an array?
Thanks
Onur
How can I fix this so both values are in an array?
You're not actually creating an array. Create one and append values to it:
new_hash[last_el] ||= [] # make sure array exists, and don't overwrite it if it does
new_hash[last_el] << e
Alternatively, this whole snippet can be replaced with
emails.group_by{|e| e.partition(".").last }
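A slightly more defensive sketch of the same group_by idea: rpartition splits on the last dot, so an address with a dot in the local part (the "first.last@example.com" entry is made up for this example) still yields the TLD, and to_sym gives the symbol keys shown in the question's expected output.

```ruby
def group_by_tld(emails)
  # rpartition(".") splits at the LAST dot, so .last is always the TLD
  emails.group_by { |email| email.rpartition(".").last.to_sym }
end

emails = ["kevin@yahoo.fr", "edward@gmail.fr", "julien@mdn.com", "dimitri@berlin.de"]
grouped = group_by_tld(emails)

# partition(".") would split at the first dot and return "last@example.com" here
dotted = group_by_tld(["first.last@example.com"])
```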

how to pass variable from a class to another class in ruby

I'm trying to extract data from MongoDB into Elasticsearch. getMongodoc = coll.find().limit(10)
will find the first 10 entries in Mongo.
As you can see, result = ec.mongoConn should get the result from the method mongoConn() in class MongoConnector. When I use p hsh (to check that the output is correct), it prints 10 entries, while p result = ec.mongoConn prints #<Enumerator: #<Mongo::Cursor:0x70284070232580 @view=#<Mongo::Collection::View:0x70284066032180 namespace='mydatabase.mycollection' @filter={} @options={"limit"=>10}>>:each>
When I change p hsh to return hsh, p result = ec.mongoConn gets a correct result, but it prints only the first entry, not all 10 entries. It seems that the value of hsh is not passed to result = ec.mongoConn correctly. Can anyone tell me what I am doing wrong? Is this because I did something wrong with the method call?
class MongoConncetor
  def mongoConn()
    BSON::OrderedHash.new
    client = Mongo::Client.new([ 'xx.xx.xx.xx:27017' ], :database => 'mydatabase')
    coll = client[:mycollection]
    getMongodoc = coll.find().limit(10)
    getMongodoc.each do |document|
      hsh = symbolize_keys(document.to_hash).select { |hsh| hsh != :_id }
      return hsh
      # p hsh
    end
  end
end
class ElasticConnector < MongoConncetor
  include Elasticsearch::API
  CONNECTION = ::Faraday::Connection.new url: 'http://localhost:9200'
  def perform_request(method, path, params, body)
    puts "--> #{method.upcase} #{path} #{params} #{body}"
    CONNECTION.run_request \
      method.downcase.to_sym,
      path,
      ((
        body ? MultiJson.dump(body) : nil)),
      {'Content-Type' => 'application/json'}
  end
end
ec = ElasticConnector.new
p result = ec.mongoConn
client = ElasticConnector.new
client.bulk index: 'myindex',
  type:'test' ,
  body: result
You are calling return inside a loop (each). This will stop the loop and return the first result. Try something like:
getMongodoc.map do |document|
symbolize_keys(document.to_hash).select { |hsh| hsh != :_id }
end
Notes:
In Ruby you usually don't need the return keyword, as the last value is returned automatically. Usually you'd use return to exit early and prevent later code from being executed.
In Ruby, snake_case is used for variable and method names (as opposed to CamelCase or camelCase).
map enumerates a collection (by calling the block for every item in the collection) and returns a new collection of the same size with the return values from the block.
You don't need empty parens () on method definitions.
UPDATE:
The data structure returned by MongoDB is a Hash (BSON is a special kind of serialization). A Hash is a collection of keys ("_id", "response") that point to values. The difference you point out in your comment is the class of the hash key: string vs. symbol.
In your case a document in Mongo is represented as a Hash, one hash per document.
If you want to return multiple documents, then an array is required. More specifically, an array of hashes: [{}, {}, ...]
If your target (ES) only accepts one hash at a time, then you will need to loop over the results from Mongo and add them one by one:
list_of_results = get_mongo_data
list_of_results.each do |result|
add_result_to_es(result)
end
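The difference between return inside each and a plain map can be seen without a database. This sketch uses an in-memory array of hashes as a stand-in for the Mongo cursor; the method names and sample documents are made up for illustration.

```ruby
documents = [
  { "_id" => 1, "response" => "a" },
  { "_id" => 2, "response" => "b" }
]

def first_only(documents)
  documents.each do |document|
    # return exits the whole method on the first iteration
    return document.reject { |key, _| key == "_id" }
  end
end

def all_of_them(documents)
  # map collects the block's value for every document
  documents.map do |document|
    document.reject { |key, _| key == "_id" }
  end
end

p first_only(documents)  # a single hash
p all_of_them(documents) # an array of hashes, one per document
```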

Issues iterating over a hash in Ruby

What I'd like to do is pass in a hash of hashes that looks something like this:
input = {
"configVersion" => "someVers",
"box" =>
{
"primary" => {
"ip" => "192.168.1.1",
"host" => "something"
},
"api" => {
"live" => "livekey",
"test" => "testkey"
}
}
}
then iterate over it, continuing if the value is another hash, and generating output with it. The result should be something like this:
configVersion = "someVers"
box.primary.ip = "192.168.1.1"
box.primary.host = "something"
and so on...
I know how to crawl through and continue if the value is a hash, but I'm unsure how to concatenate the whole thing together and pass the value back up. Here is my code:
def crawl(input)
  input.each do |k,v|
    case v
    when Hash
      out << "#{k}."
      crawl(v)
    else
      out << " = '#{v}';"
    end
  end
end
My problem is: where to define out and how to return it all back. I'm very new to Ruby.
You can pass strings between multiple calls of the recursive method and use them like accumulators.
This method uses an ancestors string to build up your dot-notation string of keys, and an output str that collects the output and returns it at the end of the method. The str is passed through every call; the chain variable is a modified version of the ancestor string that changes from call to call:
def hash_to_string(hash, ancestors = "", str = "")
hash.each do |key, value|
chain = ancestors.empty? ? key : "#{ancestors}.#{key}"
if value.is_a? Hash
hash_to_string(value, chain, str)
else
str << "#{chain} = \"#{value}\"\n"
end
end
str
end
hash_to_string input
(This assumes you want your output to be a string formatted as you've shown above)
This blog post has a decent solution for the recursion and offers a slightly better alternative using the method_missing method available in Ruby.
In general, your recursion is correct, you just want to be doing something different instead of concatenating the output to out.
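An alternative sketch that avoids threading an accumulator through the calls: each call returns an array of lines, and flat_map merges the results of the recursive calls. The method name flatten_keys is chosen for this example, and the input is the hash from the question.

```ruby
def flatten_keys(hash, prefix = nil)
  hash.flat_map do |key, value|
    path = prefix ? "#{prefix}.#{key}" : key.to_s
    if value.is_a?(Hash)
      flatten_keys(value, path)   # recurse; flat_map splices the lines in
    else
      ["#{path} = \"#{value}\""]  # leaf: one formatted line
    end
  end
end

input = {
  "configVersion" => "someVers",
  "box" => {
    "primary" => { "ip" => "192.168.1.1", "host" => "something" },
    "api" => { "live" => "livekey", "test" => "testkey" }
  }
}

lines = flatten_keys(input)
puts lines.join("\n")
```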

What is this Hash-like/Tree-like Construct Called?

I want to create a "Config" class that acts somewhere between a hash and a tree. It's just for storing global values, which can have a context.
Here's how I use it:
Config.get("root.parent.child_b") #=> "value"
Here's what the class might look like:
class Construct
def get(path)
# split path by "."
# search tree for nodes
end
def set(key, value)
# split path by "."
# create tree node if necessary
# set tree value
end
def tree
{
:root => {
:parent => {
:child_a => "value",
:child_b => "another value"
},
:another_parent => {
:something => {
:nesting => "goes on and on"
}
}
}
}
end
end
Is there a name for this kind of thing, somewhere between Hash and Tree (not a Computer Science major)? Basically a hash-like interface to a tree.
Something that outputs like this:
t = TreeHash.new
t.set("root.parent.child_a", "value")
t.set("root.parent.child_b", "another value")
desired output format:
t.get("root.parent.child_a") #=> "value"
t.get("root") #=> {"parent" => {"child_a" => "value", "child_b" => "another value"}}
instead of this:
t.get("root") #=> nil
or this (which you get the value from by calling {}.value)
t.get("root") #=> {"parent" => {"child_a" => {}, "child_b" => {}}}
You can implement one in no time:
class TreeHash < Hash
attr_accessor :value
def initialize
block = Proc.new {|h,k| h[k] = TreeHash.new(&block)}
super &block
end
def get(path)
find_node(path).value
end
def set(path, value)
find_node(path).value = value
end
private
def find_node(path)
path.split('.').inject(self){|h,k| h[k]}
end
end
You could improve the implementation by making unneeded Hash methods private, but it already works the way you wanted. The data is stored in a hash, so you can easily convert it to YAML.
EDIT:
To meet further expectations (and to make to_yaml work properly by default), you should use this modified version:
class TreeHash < Hash
def initialize
block = Proc.new {|h,k| h[k] = TreeHash.new(&block)}
super &block
end
def get(path)
path.split('.').inject(self){|h,k| h[k]}
end
def set(path, value)
path = path.split('.')
leaf = path.pop
path.inject(self){|h,k| h[k]}[leaf] = value
end
end
This version is a slight trade-off, as you cannot store values in non-leaf nodes.
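One thing to be aware of with a default block that assigns h[k] is that reading a missing path creates empty nodes along the way. A possible variation, sketched here with a hypothetical PathTree class that wraps a plain Hash instead of subclassing it, reads with Hash#dig so that lookups never modify the tree. It assumes paths do not descend through a leaf value.

```ruby
class PathTree
  def initialize
    @tree = {}
  end

  # Returns the value or subtree at the dotted path, or nil if absent
  def get(path)
    @tree.dig(*path.split('.'))
  end

  # Creates intermediate hashes as needed, then stores the value at the leaf
  def set(path, value)
    *parents, leaf = path.split('.')
    node = parents.inject(@tree) { |hash, key| hash[key] ||= {} }
    node[leaf] = value
  end
end

t = PathTree.new
t.set("root.parent.child_a", "value")
t.set("root.parent.child_b", "another value")

t.get("root.parent.child_a") # the stored string
t.get("root")                # the nested hash under "root"
t.get("missing.key")         # nil, and no nodes are created
```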
I think the name for the structure is really a nested hash, and the code in the question is a reinvention of JavaScript's dictionaries. Since a dictionary in JS (or Python or ...) can be nested, each value can be another dictionary, which has its own key/val pairs. In JavaScript, that's all an object is.
And the best bit is being able to use JSON to define it neatly, and pass it around:
tree : {
'root' : {
'parent' : {
'child_a' : "value",
'child_b' : "another value"
},
'another_parent' : {
'something' : {
'nesting' : "goes on and on"
}
}
}
};
In JS you can then do tree.root.parent.child_a.
This answer to another question suggests using the Hashie gem to convert JSON objects into Ruby objects.
I think this resembles a TreeMap data structure similar to the one in Java described here. It does the same thing (key/value mappings) but retrieval might be different since you are using the nodes themselves as the keys. Retrieval from the TreeMap described is abstracted from the implementation since, when you pass in a key, you don't know the exact location of it in the tree.
Hope that makes sense!
Er... it can certainly be done, using a hierarchical hash table, but why do you need the hierarchy? If you only need exactly-matching get and put, why can't you just make a single hash table that happens to use a dot-separated naming convention?
That's all that's needed to implement the functionality you've asked for, and it's obviously very simple...
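A minimal sketch of the flat approach described in this answer, using the paths from the question. Exact lookups are a single hash access; a subtree query such as everything under "root.parent" is not free, though, and is shown here as a scan over the keys.

```ruby
config = {}
config["root.parent.child_a"] = "value"
config["root.parent.child_b"] = "another value"
config["root.another_parent.something.nesting"] = "goes on and on"

# Exact-match get is a plain hash lookup
child_b = config["root.parent.child_b"]

# A subtree query has to scan the keys for a matching prefix
under_parent = config.select { |key, _| key.start_with?("root.parent.") }
```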
Why use a hash-like interface at all? Why not use chaining of methods to navigate your tree? For example config.root.parent.child_b and use instance methods and if needed method_missing() to implement them?
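A rough sketch of the method-chaining idea from this answer, assuming symbol keys and a read-only tree. ConfigNode is a hypothetical class name; unknown names fall through to the normal NoMethodError.

```ruby
class ConfigNode
  def initialize(tree)
    @tree = tree
  end

  # Treat any known key as a reader method; wrap nested hashes so calls chain
  def method_missing(name, *args)
    return super unless args.empty? && @tree.key?(name)
    value = @tree[name]
    value.is_a?(Hash) ? ConfigNode.new(value) : value
  end

  # Keeps respond_to? consistent with method_missing
  def respond_to_missing?(name, include_private = false)
    @tree.key?(name) || super
  end
end

config = ConfigNode.new(root: { parent: { child_a: "value", child_b: "another value" } })
child_b = config.root.parent.child_b
```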

Resources