How to merge hash results from an API in Ruby? - ruby

I am retrieving subscriber information from our email provider's API. They return a hash of 25 subscribers per page. I would like to merge the results from each page into one hash (@members_subs_all). However, while my method sets @members_subs correctly, @members_subs_all does not contain any data.
Is merging the right approach? If so, what am I missing to make this work correctly? If not, what should I do differently?
def retrieve_members_info
  @members_subs_all = {}
  pages = *(0..1000)
  pages.each do |i|
    @members_subs = # API returns hash of 25 subscribers per page
    @members_subs_all.merge(@members_subs)
    break if @members_subs["data"].empty?
  end
  puts "members_subs_all #{@members_subs_all["data"]}"
end

You want merge! rather than merge. merge returns the merged hash but doesn't change the hash it's called on; merge! returns the merged hash and also updates the receiver in place.
Compare the docs on each:
http://www.ruby-doc.org/core-2.0.0/Hash.html#method-i-merge
http://www.ruby-doc.org/core-2.0.0/Hash.html#method-i-merge-21
Note also that this is a pattern for naming in Ruby and Ruby libraries:
method returns the value of some change to the caller, but doesn't change the caller's value permanently.
method! returns the value of the change and updates the caller with that value.
(See also method?: predicate methods that return a truthy or falsy value based on whether method is true of the caller.)
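A quick illustration of the difference on a throwaway hash:

```ruby
h = { "a" => 1 }

h.merge("b" => 2)  # => {"a"=>1, "b"=>2}
h                  # => {"a"=>1}  (h itself is unchanged)

h.merge!("b" => 2) # => {"a"=>1, "b"=>2}
h                  # => {"a"=>1, "b"=>2}  (h was updated in place)
```

So in the loop above, each page's results were being merged into a temporary hash that was immediately thrown away; merge! keeps them.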

def merge_hashs(a, b)
  a.merge(b) { |key, a_v, b_v| merge_hashs(a_v, b_v) }
end
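That helper deep-merges nested hashes; it assumes that whenever a key collides, both values are themselves hashes. Repeated here (with invented page data) so the example runs on its own:

```ruby
def merge_hashs(a, b)
  a.merge(b) { |key, a_v, b_v| merge_hashs(a_v, b_v) }
end

# Two hypothetical pages of API results sharing a top-level "data" key:
page1 = { "data" => { "sub1" => "a@example.com" } }
page2 = { "data" => { "sub2" => "b@example.com" } }

merge_hashs(page1, page2)
# => {"data"=>{"sub1"=>"a@example.com", "sub2"=>"b@example.com"}}
```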

Related

how can I extract out this ruby code as a method that converts a hash to variables?

I have the following code spread out across a bunch of methods:
json_element is passed as an argument to the method.
issues:
The values in the hash change, meaning one could have key but the next time could have search
sometimes the value is nil so it blows up.
The gem I used which creates it has those ['$'] for elements if there's a value, but it errors out if you do json_element['COLLECTION']['$'].nil?
json = json_element['JSON']['$'] unless json_element['JSON'].nil?
predicate = json_element['PREDICATE']['$'] unless json_element['PREDICATE'].nil?
key = json_element['KEY']['$'] unless json_element['KEY'].nil?
options = json_element['OPTIONS']['$'] unless json_element['OPTIONS'].nil?
cache_key = json_element['CACHE-KEY']['$'] unless json_element['CACHE-KEY'].nil?
question: how can I extract this whole bit as a method which allows for flexible keys and doesn't error out when a value is nil
I'm not sure I understand this right. If the values in the hash are nil, it shouldn't error out. The variables would just be assigned nil. #nil? also shouldn't error out.
Just refactoring your code into a method:
def process(json_element)
  return if json_element.nil?
  hash = {} # store the extracted values in a hash
  %w{json predicate key options cache-key}.each do |i|
    hash[i.upcase] = json_element[i.upcase]['$'] unless json_element[i.upcase].nil?
  end
  hash # return the hash, not the result of each
end
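As a sketch of how this helper might be called (the element below is invented, and this variant returns the hash explicitly, since each by itself would return the key list instead):

```ruby
def process(json_element)
  return if json_element.nil?
  hash = {} # store the extracted values in a hash
  %w{json predicate key options cache-key}.each do |i|
    hash[i.upcase] = json_element[i.upcase]['$'] unless json_element[i.upcase].nil?
  end
  hash # return the collected values
end

# Hypothetical element: nil entries are simply skipped, no error raised.
element = { 'JSON' => { '$' => '{"a":1}' }, 'KEY' => { '$' => 'users' }, 'PREDICATE' => nil }
process(element)
# => {"JSON"=>"{\"a\":1}", "KEY"=>"users"}
```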

Detect if something is Enumerable in Ruby

I have a method that accepts either a single object or a collection of objects. What is the proper way of detecting if what's passed in is Enumerable? I'm currently doing the following (which works but I'm not sure it's the correct way):
def foo(bar)
  if bar.respond_to? :map
    # loop over it
  else
    # single object
  end
end
I would use is_a?.
bar.is_a? Enumerable
But there’s a better way to take a single object or a collection, assuming that the caller knows which one they’re passing in. Use a splat:
def foo(*args)
  args.each do |arg|
    …
  end
end
Then you can call it as foo(single_arg), foo(arg1, arg2), and foo(*argv).
It depends on your exact needs, but it's usually not a great idea to differentiate between a single object and an Enumerable. In particular, a Hash is an Enumerable, but in most cases it should be considered a single object.
It's usually better to distinguish between a single object and an array-like argument. That's what Ruby often does. The best way to do this is to check if arg.respond_to? :to_ary. If it does, then all methods of Array should be available to you, if not treat it as a single object.
If you really want to check for Enumerable, you could test arg.is_a? Enumerable but consider that a Hash is an Enumerable and so are Lazy enumerators (and calling map on them won't even give you an array!)
If your purpose is to loop over it, then the standard way is to ensure it is an array. You can do it like this without condition.
def foo(bar)
  [*bar] # Loop over it. It is ensured to be an array.
end
What about handling single items or a collection in one shot?
[*bar].each { |item| puts item }
This will work whether bar is a single item or an array or hash or whatever. This probably isn't the best for working with hashes, but with arrays it works pretty well.
Another way to ensure that something is an Array is with the Array "function" (technically still a method):
def foo(bar)
Array(bar).map { |o| … }
end
Array will leave an array an array, and convert single elements to an array:
Array(["foo"]) # => ["foo"]
Array("foo") # => ["foo"]
Array(nil) # => []

undefined method `assoc' for #<Hash:0x10f591518> (NoMethodError)

I'm trying to return a list of values based on user defined arguments, from hashes defined in the local environment.
def my_method *args
  # initialize accumulator
  accumulator = Hash.new(0)
  # define hashes in local environment
  foo = Hash["key1"=>["var1","var2"], "key2"=>["var3","var4","var5"]]
  bar = Hash["key3"=>["var6"], "key4"=>["var7","var8","var9"], "key5"=>["var10","var11","var12"]]
  baz = Hash["key6"=>["var13","var14","var15","var16"]]
  # iterate over args and build accumulator
  args.each do |x|
    if foo.has_key?(x)
      accumulator = foo.assoc(x)
    elsif bar.has_key?(x)
      accumulator = bar.assoc(x)
    elsif baz.has_key?(x)
      accumulator = baz.assoc(x)
    else
      puts "invalid input"
    end
  end
  # convert accumulator to list, and return value
  return accumulator = accumulator.to_a {|k,v| [k].product(v).flatten}
end
The user is to call the method with arguments that are keywords, and the function to return a list of values associated with each keyword received.
For instance
> my_method(key5,key6,key1)
=> ["var10","var11","var12","var13","var14","var15","var16","var1","var2"]
The output can be in any order. I received the following error when I tried to run the code:
undefined method `assoc' for #<Hash:0x10f591518> (NoMethodError)
Please would you point me how to troubleshoot this? In Terminal assoc performs exactly how I expect it to:
> foo.assoc("key1")
=> ["var1","var2"]
I'm guessing you're coming to Ruby from some other language, as there is a lot of unnecessary cruft in this method. Furthermore, it won't return what you expect for a variety of reasons.
`accumulator = Hash.new(0)`
This is unnecessary, as (1), you're expecting an array to be returned, and (2), you don't need to pre-initialize variables in ruby.
The Hash[...] syntax is unconventional in this context, and is typically used to convert some other enumerable (usually an array) into a hash, as in Hash[1,2,3,4] #=> { 1 => 2, 3 => 4}. When you're defining a hash, you can just use the curly brackets { ... }.
For every iteration of args, you're assigning accumulator to the result of the hash lookup instead of accumulating values (which, based on your example output, is what you need to do). Instead, you should be looking at various array concatenation methods like push, +=, <<, etc.
As it looks like you don't need the keys in the result, assoc is probably overkill. You would be better served with fetch or simple bracket lookup (hash[key]).
Finally, while you can call any method in Ruby with a block, as you've done with to_a, unless the method specifically yields a value to the block, Ruby will ignore it, so [k].product(v).flatten isn't actually doing anything.
I don't mean to be too critical - Ruby's syntax is extremely flexible but also relatively compact compared to other languages, which means it's easy to take it too far and end up with hard to understand and hard to maintain methods.
There is another side effect of how your method is constructed wherein the accumulator will only collect the values from the first hash that has a particular key, even if more than one hash has that key. Since I don't know if that's intentional or not, I'll preserve this functionality.
Here is a version of your method that returns what you expect:
def my_method(*args)
  foo = { "key1"=>["var1","var2"], "key2"=>["var3","var4","var5"] }
  bar = { "key3"=>["var6"], "key4"=>["var7","var8","var9"], "key5"=>["var10","var11","var12"] }
  baz = { "key6"=>["var13","var14","var15","var16"] }
  merged = [foo, bar, baz].reverse.inject({}, :merge)
  args.inject([]) do |array, key|
    array += Array(merged[key])
  end
end
In general, I wouldn't define a method with built-in data, but I'm going to leave it in to be closer to your original method. Hash#merge combines two hashes and overwrites any duplicate keys in the original hash with those in the argument hash. The Array() call coerces the lookup result to an array (nil becomes []), so you don't need to explicitly handle missing keys.
I would encourage you to look up the inject method - it's quite versatile and is useful in many situations. inject uses its own accumulator variable (optionally defined as an argument) which is yielded to the block as the first block parameter.
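Running the rewritten method through the question's example call (note that the keys must be passed as strings, since the hashes use string keys; the bare key5 etc. in the question would be undefined locals):

```ruby
def my_method(*args)
  foo = { "key1"=>["var1","var2"], "key2"=>["var3","var4","var5"] }
  bar = { "key3"=>["var6"], "key4"=>["var7","var8","var9"], "key5"=>["var10","var11","var12"] }
  baz = { "key6"=>["var13","var14","var15","var16"] }
  # Later hashes in the (reversed) list lose to earlier ones on duplicate keys,
  # preserving the original first-hash-wins behavior.
  merged = [foo, bar, baz].reverse.inject({}, :merge)
  args.inject([]) { |array, key| array += Array(merged[key]) }
end

my_method("key5", "key6", "key1")
# => ["var10", "var11", "var12", "var13", "var14", "var15", "var16", "var1", "var2"]
```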

Generating configuration hash with reduce in the Jekyll source code?

I've been looking through the Jekyll source code, and stumbled upon this method:
# Public: Generate a Jekyll configuration Hash by merging the default
# options with anything in _config.yml, and adding the given options on top.
#
# override - A Hash of config directives that override any options in both
# the defaults and the config file. See Jekyll::DEFAULTS for a
# list of option names and their defaults.
#
# Returns the final configuration Hash.
def self.configuration(override)
  # Convert any symbol keys to strings and remove the old key/values
  override = override.reduce({}) { |hsh, (k, v)| hsh.merge(k.to_s => v) }
  # _config.yml may override default source location, but until
  # then, we need to know where to look for _config.yml
  source = override['source'] || Jekyll::DEFAULTS['source']
  # Get configuration from <source>/_config.yml or <source>/<config_file>
  config_file = override.delete('config')
  config_file = File.join(source, "_config.yml") if config_file.to_s.empty?
  begin
    config = YAML.safe_load_file(config_file)
    raise "Configuration file: (INVALID) #{config_file}" if !config.is_a?(Hash)
    $stdout.puts "Configuration file: #{config_file}"
  rescue SystemCallError
    # Errno::ENOENT = file not found
    $stderr.puts "Configuration file: none"
    config = {}
  rescue => err
    $stderr.puts " " +
      "WARNING: Error reading configuration. " +
      "Using defaults (and options)."
    $stderr.puts "#{err}"
    config = {}
  end
  # Merge DEFAULTS < _config.yml < override
  Jekyll::DEFAULTS.deep_merge(config).deep_merge(override)
end
I can't figure out what it does despite the comments. reduce({}) especially bothers me - what does it do?
Also, the method that is called just before configuration is:
options = normalize_options(options.__hash__)
What does __hash__ do?
Let's look at the code in question:
override.reduce({}) { |hsh,(k,v)| hsh.merge(k.to_s => v) }
Now let's look at the docs for Enumerable#reduce:
Combines all elements of enum by applying a binary operation, specified by a block or a symbol that names a method or operator.
If you specify a block, then for each element in enum the block is passed an accumulator value (memo) and the element. If you specify a symbol instead, then each element in the collection will be passed to the named method of memo. In either case, the result becomes the new value for memo. At the end of the iteration, the final value of memo is the return value for the method.
So, override is going to be your typical Ruby options hash, like:
{
  debug: 'true',
  awesomeness: 'maximum'
}
So what happens when you use that reduce on override?
It will combine all the elements of the enum (the key/value pairs of the override hash) using the binary operation merge. merge takes a hash and returns a new hash combining it with the receiver. So what's happening here?
hsh starts out as {} and the first key/value pair is merged: {}.merge(:debug.to_s => "true").
hsh is now {"debug" => "true"}.
The next key/value pair is merged into that: {"debug" => "true"}.merge(:awesomeness.to_s => "maximum").
hsh is now {"debug" => "true", "awesomeness" => "maximum"}
There are no more elements, so this value of hsh is returned.
This matches up with the code comment, which says "Convert any symbol keys to strings and remove the old key/values", although technically the old values are not removed. Rather, a new hash is constructed, and once the variable is reassigned the old hash is discarded, to be garbage-collected eventually along with the intermediate hashes created by the merges inside the reduce. As an aside, this means that merge! would be slightly more efficient than merge here, since it would not create those intermediate objects.
__foo__ is a ruby idiom for a quasi-private and/or 'core' method that you want to make sure isn't redefined, e.g., __send__ because things like Socket want to use send. In Ruby, hash is the hash value of an object (computed using a hash function, used when the object is used as a hash key), so __hash__ probably points to an instance variable of the options object that stores its data as a hash. Here's a class from a gem that does just that. You'd have to look at the docs for whatever type of object options is to be sure though. (You'd have to look at the code to be really sure. ;)
reduce is often used to build an array or hash, in a way that is similar to using map or collect, by iteratively adding each element to that container, usually after some manipulation to the element.
I use each_with_object instead as it's more intuitive for that sort of operation:
[:foo, :bar].each_with_object({}) do |e, h|
  h[e.to_s] = e
end
Notice that each_with_object doesn't need to have the "remembered" value returned from the block like reduce or inject wants. reduce and inject are great for other types of summing magic that each_with_object doesn't do though, so leave those in your toolbox too.
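For a side-by-side comparison on a made-up options hash, both idioms build the same string-keyed result:

```ruby
keys = { debug: 'true', awesomeness: 'maximum' }

# reduce must return the accumulator from the block, so merge is used:
via_reduce = keys.reduce({}) { |hsh, (k, v)| hsh.merge(k.to_s => v) }

# each_with_object carries the accumulator itself, so plain assignment works:
via_each_with_object = keys.each_with_object({}) { |(k, v), hsh| hsh[k.to_s] = v }

via_reduce # => {"debug"=>"true", "awesomeness"=>"maximum"}
via_reduce == via_each_with_object # => true
```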

Timeseries transformations in Ruby, Yahoo! Pipes-style

I'm trying to build a system for programmatically filtering timeseries data and wonder if this problem has been solved, or at least hacked at, before. It seems that it's a perfect opportunity to do some Ruby block magic, given the scope and passing abilities; however, I'm still a bit short of fully grokking how to take advantage of blocks.
To wit:
Pulling data from my database, I can create either a hash or an array, let's use array:
data = [[timestamp0, value0],[timestamp1,value1], … [timestampN, valueN]]
Then I can add a method to array, maybe something like:
class Array
  def filter &block
    …
    self.each_with_index do |v, i|
      …
      # Always call with timestep, value, index
      block.call(v[0], v[1], i)
      …
    end
  end
end
I understand that one of the powers of Ruby blocks is that the passed block of code executes within the scope of the closure. So somehow calling data.filter should allow me to work with that scope; I can only figure out how to do it without taking advantage of the scope. To wit:
# average if we have a single null value, assumes data is correctly ordered
# average if we have a single null value, assumes data is correctly ordered
data.filter do |t, v, i|
  # Of course, we do some error checking…
  (data[i-1][1] + data[i+1][1]) / 2 if v.nil?
end
What I want to do is actually is (allow the user to) build up mathematical filters programmatically, but taking it one step at a time, we'll build some functions:
def average_single_values(args)
  # average over single null values
  # return filterable array
end

def filter_by_std(args)
  # limit results to those within N standard deviations
  # return filterable array
end

def pull_bad_values(args)
  # delete or replace values seen as "bad"
  # return filterable array
end
my_filters = [average_single_values, filter_by_std, pull_bad_values]
Then, having a list of filters, I figure (somehow) I should be able to do:
data.filter do |t, v, i|
  my_filters.each do |f|
    f.call t, v, i
  end
end
or, assuming a different filter implementation:
filtered_data = data.filter my_filters
which would probably be a better way to design it, as it returns a new array and is non-destructive
The result being an array that has been run through all of the filters. The eventual goal, is to be able to have static data arrays that can be run through arbitrary filters, and filters that can be passed (and shared) as objects the way that Yahoo! Pipes does so with feeds. I'm not looking for too generalized a solution right now, I can make the format of the array/returns strict.
Has anyone seen something similar in Ruby? Or have some basic pointers?
The first half of your question about working in the scope of the array seems unnecessary and irrelevant to your problem. As for creating operations to manipulate data with blocks, you can use Proc instances ("procs"), which essentially are blocks stored in an object. For example, if you want to store them with names, you can create a hash of filters:
my_filters = {}
my_filters[:filter_name] = lambda do |*args|
# filter body here...
end
You do not need to name them, of course, and can use arrays. Then, to run some data through an ordered series of filters, take the procs from the hash with values (iterating the hash directly would yield [name, proc] pairs) and chain them with the helpful Enumerable#inject method:
my_filters.values.inject(data) do |result, filter|
  filter.call result
end
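A tiny end-to-end sketch (the filter names and data here are invented):

```ruby
# Each filter is a proc that takes a [[timestamp, value], ...] series
# and returns a new, filtered series.
filters = {
  drop_nils: lambda { |series| series.reject { |t, v| v.nil? } },
  clamp:     lambda { |series| series.map { |t, v| [t, [v, 100].min] } }
}

data = [[0, 5], [1, nil], [2, 250]]

# inject threads the series through each filter in order.
filtered = filters.values.inject(data) { |result, f| f.call(result) }
# => [[0, 5], [2, 100]]
```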
And it needs no monkeypatching, either!
