check if hash keys exist in ruby [duplicate] - ruby

Ruby 2.3 introduces a new method on Array and Hash called dig. The examples I've seen in blog posts about the new release are contrived and convoluted:
# Hash#dig
user = {
  user: {
    address: {
      street1: '123 Main street'
    }
  }
}
user.dig(:user, :address, :street1) # => '123 Main street'
# Array#dig
results = [[[1, 2, 3]]]
results.dig(0, 0, 0) # => 1
I'm not using triple-nested flat arrays. What's a realistic example of how this would be useful?
UPDATE
It turns out these methods solve one of the most commonly-asked Ruby questions. The questions below have something like 20 duplicates, all of which are solved by using dig:
How to avoid NoMethodError for missing elements in nested hashes, without repeated nil checks?
Ruby Style: How to check whether a nested hash element exists

In our case, NoMethodErrors due to nil references are by far the most common errors we see in our production environments.
The new Hash#dig allows you to omit nil checks when accessing nested elements. Since hashes are best used when the structure of the data is unknown or volatile, having official support for this makes a lot of sense.
Let's take your example. The following:
user.dig(:user, :address, :street1)
Is not equivalent to:
user[:user][:address][:street1]
In the case where user[:user] or user[:user][:address] is nil, the chained lookup raises a NoMethodError at runtime.
Rather, it is equivalent to the following, which is the current idiom:
user[:user] && user[:user][:address] && user[:user][:address][:street1]
Note how it is trivial to pass a list of symbols that was created elsewhere into Hash#dig, whereas it is not very straightforward to recreate the latter construct from such a list. Hash#dig allows you to easily do dynamic access without having to worry about nil references.
Clearly Hash#dig is also a lot shorter.
One important point to take note of is that Hash#dig itself returns nil if any of the keys turn out to be missing, which can lead to the same class of errors one step down the line, so it can be a good idea to provide a sensible default. (This way of providing an object which always responds to the methods expected is called the Null Object Pattern.)
Again, in your example, an empty string or something like "N/A", depending on what makes sense:
user.dig(:user, :address, :street1) || ""
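A minimal sketch of the points above, using made-up user hashes: dig returns nil rather than raising when a level is missing, accepts a key path that was built elsewhere, and combines with || to supply a default. The data, the path, and the default value are illustrative assumptions.

```ruby
# Hypothetical data: the second hash has no :address level at all.
with_address    = { user: { address: { street1: '123 Main street' } } }
without_address = { user: {} }

# A key path built elsewhere can be splatted into dig.
path = [:user, :address, :street1]

street_a = with_address.dig(*path)           # value found at the end of the path
street_b = without_address.dig(*path)        # nil: :address is missing, no error raised
street_c = without_address.dig(*path) || ''  # fall back to a default

# Plain chained indexing raises once it hits the missing level.
begin
  without_address[:user][:address][:street1]
rescue NoMethodError => e
  chained_error = e.class
end
```

Note that || also replaces a stored false value with the default, so this shortcut only suits values that are never legitimately false.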

One way would be in conjunction with the splat operator reading from some unknown document model.
some_json = JSON.parse( '{"people": {"me": 6, ... } ...}' )
# => {"people" => {"me" => 6, ... }, ... }
a_bunch_of_args = response.data[:query]
# => ["people", "me"]
some_json.dig(*a_bunch_of_args)
# => 6

It's useful for working your way through deeply nested Hashes/Arrays, which might be what you'd get back from an API call, for instance.
In theory it saves a ton of code that would otherwise check at each level whether another level exists, without which you risk constant errors. In practice you may still need a lot of this code, as dig will still raise errors in some cases (e.g. if anything in the chain is a non-keyed object such as a String).
It is for this reason that your question is actually really valid - dig hasn't seen the usage we might expect. This is commented on here for instance: Why nobody speaks about dig.
To make dig avoid these errors, try the KeyDial gem, which I wrote to wrap around dig and force it to return nil/default if any error crops up.
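To illustrate the failure case described above, here is a small sketch with a made-up payload in which an intermediate value is a String. safe_dig is a hypothetical helper written for this example; it is not part of Ruby and is not the KeyDial API.

```ruby
# Hypothetical API payload where :address is a String instead of a Hash.
payload = { user: { address: '123 Main street' } }

# dig raises TypeError when an intermediate value does not respond to #dig.
begin
  payload.dig(:user, :address, :street1)
rescue TypeError => e
  dig_error = e.class
end

# safe_dig is a hypothetical helper: it returns a default instead of raising.
def safe_dig(data, *keys, default: nil)
  value = data.dig(*keys)
  value.nil? ? default : value
rescue TypeError
  default
end

safe_dig(payload, :user, :address, :street1)                  # nil instead of an error
safe_dig(payload, :user, :address, :street1, default: 'N/A')  # the supplied default
safe_dig(payload, :user, :address)                            # the stored String
```

Swallowing the TypeError hides malformed data, so a helper like this is probably best reserved for input whose shape you genuinely cannot control.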

Related

Fetch from hash with either Singular or Plural

I get the following input hash in my ruby code
my_hash = { include: 'a,b,c' }
(or)
my_hash = { includes: 'a,b,c' }
Now I want the fastest way to get 'a,b,c'
I currently use
def my_includes
  my_hash[:include] || my_hash[:includes]
end
But this is very slow because it always checks for :include keyword first then if it fails it'll look for :includes. I call this function several times and the value inside this hash can keep changing. Is there any way I can optimise and speed up this? I won't get any other keywords. I just need support for :include and :includes.
Caveats and Considerations
First, some caveats:
You tagged this Rails 3, so you're probably on a very old Ruby that doesn't support a number of optimizations, newer Hash-related method calls like #fetch_values or #transform_keys!, or pattern matching for structured data.
You can do all sorts of things with your Hash lookups, but none of them are likely to be faster than a Boolean short-circuit, assuming you can be sure of having only one key or the other at all times.
You haven't shown any of the calling code, so without benchmarks it's tough to see how this operation can be considered "slow" in any general sense.
If you're using Rails and not looking for a pure Ruby solution, you might want to consider ActiveModel::Dirty to only take action when an attribute has changed.
Use Memoization
Regardless of the foregoing, what you're probably missing here is some form of memoization so you don't need to constantly re-evaluate the keys and extract the values each time through whatever loop feels slow to you. For example, you could store the results of your Hash evaluation until it needs to be refreshed:
attr_accessor :includes
def extract_includes(hash)
  @includes = hash[:include] || hash[:includes]
end
You can then call #includes or #includes= (or use the @includes instance variable directly if you like) from anywhere in scope as often as you like without having to re-evaluate the hashes or keys. For example:
def count_includes
  @includes.split(?,).count
end
500.times { count_includes }
The tricky part is knowing if and when to update your memoized value. You should only call #extract_includes when you fetch a new Hash from somewhere like ActiveRecord or a remote API. Until that happens, you can reuse the stored value for as long as it remains valid.
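Putting the memoization pieces together, a self-contained sketch might look like the following. IncludeReader and its lookups counter are hypothetical names added only to make the effect visible; they are not part of the original code.

```ruby
# IncludeReader is a hypothetical class name used only for illustration.
class IncludeReader
  attr_reader :includes, :lookups

  def initialize
    @lookups = 0
  end

  # Call this only when a new hash arrives.
  def extract_includes(hash)
    @lookups += 1
    @includes = hash[:include] || hash[:includes]
  end

  # Reuses the memoized string; no hash lookup happens here.
  def count_includes
    includes.split(',').count
  end
end

reader = IncludeReader.new
reader.extract_includes({ includes: 'a,b,c' })
500.times { reader.count_includes }

# Refresh the memoized value when the input changes.
reader.extract_includes({ include: 'x,y' })
```

After this runs, the hash has been inspected twice in total, no matter how many times count_includes was called.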
You could work with a modified hash that has both keys :include and :includes with the same values:
my_hash = { include: 'a,b,c' }
my_hash.update(my_hash.key?(:include) ? { includes: my_hash[:include] } :
{ include: my_hash[:includes] })
#=> {:include=>"a,b,c", :includes=>"a,b,c"}
This may be fastest if you were using the same hash my_hash for multiple operations. If, however, a new hash is generated after just a few interrogations, you might see if both the keys :include and :includes can be included when the hash is constructed.

Specify Ruby method namespace for readability

This is a bit of a weird question, but I'm not quite sure how to look it up. In our project, we already have an existing concept of a "shift". There's a section of code that reads:
foo.shift
In this scenario, it's easy to read this as trying to access the shift variable of object foo. But it could also be Array#shift. Is there a way to specify which class we expect the method to belong to? I've tried variations such as:
foo.send(Array.shift)
Array.shift(foo)
to make it more obvious which method was being called, but I can't get it to work. Is there a way to be more explicit about which class the method you're trying to call belongs to to help in code readability?
On a fundamental level you shouldn't be concerned about this sort of thing and you absolutely can't tell the Array shift method to operate on anything but an Array object. Many of the core Ruby classes are implemented in C and have optimizations that often depend on specific internals being present. There are safety measures in place to prevent you from trying to do something too crazy, like rebinding and applying methods of that sort arbitrarily.
Here's an example of two "shifty" objects to help illustrate a real-world situation and how that applies:
class CharacterArray < Array
  def initialize(*args)
    super(args.flat_map(&:chars))
  end
  def inspect
    join('').inspect
  end
end
class CharacterList < String
  def shift
    slice!(0, 1)
  end
end
You can smash Array#shift on to the first and it will work by pure chance because you're dealing with an Array. It won't work with the second one because that's not an Array; it's missing significant methods that the shift method likely depends on.
In practice it doesn't matter what you're using, they're both the same:
list_a = CharacterArray.new("test")
list_a.shift
# => "t"
list_a.shift
# => "e"
list_a << "y"
# => "sty"
list_b = CharacterList.new("test")
list_b.shift
# => "t"
list_b.shift
# => "e"
list_b << "y"
# => "sty"
These both implement the same interfaces, they both produce the same results, and as far as you're concerned, as the caller, that's good enough. This is the foundation of Duck Typing which is the philosophy Ruby has deeply embraced.
If you try the rebind trick on the CharacterList you're going to end up in trouble, it won't work, yet that class delivers on all your expectations as far as interface goes.
Edit: As Sergio points out, you can't use the rebind technique, Ruby abruptly explodes:
Array.instance_method(:shift).bind(list_b).call
# => Error: bind argument must be an instance of Array (TypeError)
If readability is the goal then that has 35 more characters than list_b.shift which is usually going dramatically in the wrong direction.
After some discussion in the comments, one solution is:
Array.instance_method(:shift).bind(foo).call
Super ugly, but gets across the idea that I wanted which was to completely specify which instance method was actually being called. Alternatives would be to rename the variable to something like foo_array or to call it as foo.to_a.shift.
The reason this is difficult is that Ruby is dynamically typed, and this question is all about trying to bring more explicit typing to it. That's why the solution is gross! Thanks to everybody for their input!
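If the goal is only to find out which shift a given call resolves to, rather than to force one, Method#owner can report it while debugging or in a test. This is a sketch under assumptions: Shift and Roster are hypothetical stand-ins for the project's own "shift" concept.

```ruby
# Shift and Roster are hypothetical domain classes used only for illustration.
Shift = Struct.new(:starts_at)

class Roster
  attr_reader :shift

  def initialize(shift)
    @shift = shift
  end
end

queue  = [1, 2, 3]
roster = Roster.new(Shift.new('09:00'))

# Method#owner reports the class or module that defines the method.
queue.method(:shift).owner   # => Array
roster.method(:shift).owner  # => Roster
```

This does not change how the call reads in the source, so renaming the variable is still likely the more readable fix.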

What is a cleaner way of accessing a nested array+hash in Ruby?

I have the following code, and I feel like there is probably a cleaner way to access the objects that I want:
id = job.args.size > 0 && job.args[0]['arguments'].size > 0 ? job.args[0]['arguments'][0] : nil
This is what dig is for:
id = job.args.dig(0, 'arguments', 0)
dig is defined for Array, Hash, and Struct so it can deal with most kinds of nested structures.
It's hard to give advice on how to better organise the code if we can only see one line of it! Dealing with a messy object like this indicates you may have a wider design issue that could improve the code quality. However...
Based on the above, a "happy scenario" is if:
job.args == [{"arguments"=>["foo"]}]
i.e. An array whose first element is a hash with key 'arguments', which maps to a non-empty array. This looks very messy!
However, you can simplify this to:
job.args.dig(0, 'arguments', 0)
This is applying Array#dig (note: there's also Hash#dig) to chain the method calls and gracefully respond with nil if any fail.
This answer assumes you are using ruby version >= 2.3.0, since this is when dig was added to the language. If you are running an older version, you could also use this gem to back-port the feature.
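A short sketch of how dig behaves on the shapes discussed above. Job is a hypothetical Struct standing in for whatever object exposes args in the question.

```ruby
# Job is a hypothetical stand-in for the object that exposes #args.
Job = Struct.new(:args)

happy   = Job.new([{ 'arguments' => ['foo'] }])
no_args = Job.new([])
no_list = Job.new([{ 'arguments' => [] }])
no_key  = Job.new([{}])

happy.args.dig(0, 'arguments', 0)    # => "foo"
no_args.args.dig(0, 'arguments', 0)  # => nil
no_list.args.dig(0, 'arguments', 0)  # => nil
no_key.args.dig(0, 'arguments', 0)   # => nil

# Struct also defines dig, so the lookup can start from the job itself.
happy.dig(:args, 0, 'arguments', 0)  # => "foo"
```

The no_key case is one where the original one-liner would raise, because it calls size on a missing 'arguments' entry.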

Why can't I overwrite self in the Integer class?

I want to be able to write number.incr, like so:
num = 1; num.incr; num
#=> 2
The error I'm seeing states:
Can't change the value of self
If that's true, how do bang! methods work?
You cannot change the value of self
An object is a class pointer and a set of instance variables (note that this link is to an old version of Ruby, because it's dramatically simpler, and thus better for explanatory purposes).
"Pointing" at an object means you have a variable which stores the object's location in memory. Then to do anything with the object, you first go to the location in memory (we might say "follow the pointer") to get the object, and then do the thing (e.g. invoke a method, set an ivar).
All Ruby code everywhere is executing in the context of some object. This is where your instance variables get saved, it's where Ruby looks for methods that don't have a receiver (e.g. $stdout is the receiver in $stdout.puts "hi", and the current object is the receiver in puts "hi"). Sometimes you need to do something with the current object. The way to work with objects is through variables, but what variable points at the current object? There isn't one. To fill this need, the keyword self is provided.
self acts like a variable in that it points at the location of the current object. But it is not like a variable, because you can't assign it a new value. If you could, the code after that point would suddenly be operating on a different object, which is confusing and has no benefits over just using a variable.
Also remember that the object is tracked by variables which store memory addresses. What is self = 2 supposed to mean? Does it only mean that the current code operates as if it were invoked on 2? Or does it mean that all variables pointing at the old object now have their values updated to point at the new one? It isn't really clear, but the former unnecessarily introduces an identity crisis, and the latter is prohibitively expensive and introduces situations where it's unclear what is correct (I'll go into that a bit more below).
You cannot mutate Fixnums
Some objects are special at the C level in Ruby (false, true, nil, fixnums, and symbols).
Variables pointing at them don't actually store a memory location. Instead, the address itself stores the type and identity of the object. Wherever it matters, Ruby checks to see if it's a special object (e.g. when looking up an instance variable), and then extracts the value from it.
So there isn't a spot in memory where the object 123 is stored. Which means self contains the idea of Fixnum 123 rather than a memory address like usual. As with variables, it will get checked for and handled specially when necessary.
Because of this, you cannot mutate the object itself (though it appears they keep a special global variable to allow you to set instance variables on things like Symbols).
Why are they doing all of this? To improve performance, I assume. A number stored in a register is just a series of bits (typically 32 or 64), which means there are hardware instructions for things like addition and multiplication. That is to say, the ALU is wired to perform these operations in a single clock cycle, rather than writing the algorithms in software, which would take many orders of magnitude longer. By storing them like this, they avoid the cost of storing and looking up the object in memory, and they gain the advantage that they can directly add the two pointers using hardware. Note, however, that there are still some additional costs in Ruby that you don't have in C (e.g. checking for overflow and converting the result to Bignum).
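The immutability described above can be observed from plain Ruby. This sketch assumes a reasonably recent CRuby; the exact error class raised for the instance variable has varied across versions, so the example records it rather than naming it.

```ruby
a = 123
b = 123

same_object = a.equal?(b)  # true: both names refer to the one and only 123
frozen      = a.frozen?    # true: integers cannot be modified in place

# Setting an instance variable on an integer is rejected.
begin
  a.instance_variable_set(:@note, 'x')
rescue StandardError => e
  ivar_error = e.class
end

# "Incrementing" therefore means rebinding the variable to a different object.
num = 1
num += 1
```

num ends up referring to 2, but the object 1 itself was never changed, which is why a method like incr cannot work on an Integer receiver.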
Bang methods
You can put a bang at the end of any method. It doesn't require the object to change, it's just that people usually try to warn you when you're doing something that could have unexpected side-effects.
class C
  def initialize(val)
    @val = val # => 12
  end # => :initialize
  def bang_method!
    "My val is: #{@val}" # => "My val is: 12"
  end # => :bang_method!
end # => :bang_method!
c = C.new 12 # => #<C:0x007fdac48a7428 @val=12>
c.bang_method! # => "My val is: 12"
c # => #<C:0x007fdac48a7428 @val=12>
Also, there are no bang methods on integers. It wouldn't fit with the paradigm:
Fixnum.instance_methods.grep(/!$/) # => [:!]
# Okay, there's one, but it's actually a boolean negation
1.! # => false
# And it's not a Fixnum method, it's an inherited boolean operator
1.method(:!).owner # => BasicObject
# In reality, you call it this way, and the interpreter translates it
!1 # => false
Alternatives
Make a wrapper object: I'm not going to advocate this one, but it's the closest to what you're trying to do. Basically create your own class, which is mutable, and then make it look like an integer. There's a great blog post walking through this at http://blog.rubybestpractices.com/posts/rklemme/019-Complete_Numeric_Class.html; it will get you 95% of the way there.
Don't depend directly on the value of a Fixnum: I can't give better advice than this without knowing what you're trying to do / why you feel this is a need.
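As a rough idea of what the wrapper alternative could look like, here is a deliberately tiny sketch. Counter is a hypothetical class and covers far less than the complete numeric class described in the linked post.

```ruby
# Counter is a hypothetical wrapper class, not part of Ruby.
class Counter
  attr_reader :value

  def initialize(value = 0)
    @value = value
  end

  # Mutates the wrapper, not the integer: @value is rebound to a new Integer.
  def incr
    @value += 1
    self
  end

  def to_i
    value
  end

  def to_s
    value.to_s
  end
end

num = Counter.new(1)
num.incr
num.value  # => 2
```

The variable num still points at the same Counter throughout; only the wrapper's internal reference changes, which is what makes num.incr possible here.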
Also, you should show your code when you ask questions like this. I misunderstood how you were approaching it for a long time.
It's simply impossible to change self to another object. self is the receiver of the message send. There can be only one.
If that's true, how do bang! methods work?
The bang (!) is simply part of the method name. It has absolutely no special meaning whatsoever. It is a convention among Ruby programmers to name surprising variants of less surprising methods with a bang, but that's just that: a convention.
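The convention can be seen with String#upcase and String#upcase! from the core library; the variable names below are just for illustration.

```ruby
original = 'ruby'.dup        # dup so the example also works with frozen string literals
copy     = original.upcase   # returns a new String, original is untouched
original                     # => "ruby"

original.upcase!             # modifies the receiver in place
original                     # => "RUBY"

# This bang method returns nil when there was nothing to change.
original.upcase!             # => nil
```

In both cases the variable keeps pointing at the same object; the bang variant changes that object's contents, which is something an Integer does not allow.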

ruby setting variable versus using variable

I'm somewhat new to ruby so there may be an easy solution to this.
But basically I want to reuse an object @result, so that when I execute a method on it (filter) I continue to be using the original object. However, as I run the method, the object itself seems to be changing.
The object (@result) is an instance of the RDF::Query::Solutions class
http://rdf.rubyforge.org/RDF/Query/Solutions.html#filter-instance_method
@result = rdf_query(query) # solutions object
At this point @result contains all the solutions, approximately 30 results
@pubinfo = @result.filter(:ptype => RDF::URI("http://scta.info/pubInfo"))
At this point @result becomes equivalent to what I want only @pubinfo to be. There are only 5 or so results
@contentinfo = @result.filter(:ptype => RDF::URI("http://scta.info/contentInfo"))
At this point @contentinfo comes up nil because the filter is actually run on the solutions left from the previous filter. But I wanted to run this filter on the original contents of @result
@linkinginfo = @result.filter(:ptype => RDF::URI("http://scta.info/linkingInfo"))
Again, predictably, @linkinginfo is nil because @result was set to nil in the previous filter. But I don't want @result changing.
Please help.
update
Look what happens if i try the following
@pubinfo = @result
@pubinfo2 = @pubinfo.filter(:ptype => RDF::URI("http://scta.info/pubInfo"))
binding.pry
At this point @result has been filtered. Why should @result be affected at all by what I do to @pubinfo? In other words, how do I make @pubinfo a mere copy or duplicate of @result so that one is not affected by the other?
If you read the documentation:
This method returns an undefined value.
Filters this solution sequence by the given criteria.
This is quite vague, I agree, but one thing stands out: it returns an undefined value. From this I conclude that this is a destructive method, which changes the current object rather than returning a new object with the result of the filter. Another hint is that it is "Also known as: filter!", since methods ending in ! are by convention destructive in Ruby.
Looking at the source code verified this conclusion, as it uses reject! in the code.
As to solutions on how to do it properly: I'm not familiar with this library, and it has proven quite hard to figure out from the documentation. I suggest you find a way to do one of the following (ordered from most recommended down to last fallback):
Find a non-destructive API
Find a dup or clone API
Re-query before each filter...
And maybe try to contact the author to provide his own recommendation...
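The aliasing problem and the dup fallback can be shown with plain Arrays. This is only an analogy: the rows below are made up, and whether dup on an RDF::Query::Solutions object produces a sufficiently independent copy is an assumption you would need to verify against the library.

```ruby
# Plain Arrays stand in for the solutions object; the rows are invented.
result = [
  { ptype: 'pubInfo' },
  { ptype: 'contentInfo' },
  { ptype: 'pubInfo' }
]

# Assignment copies the reference, so a destructive call would affect both names.
alias_of_result = result
alias_of_result.equal?(result)  # => true

# dup creates a separate collection that can be filtered destructively.
pubinfo = result.dup
pubinfo.select! { |row| row[:ptype] == 'pubInfo' }

contentinfo = result.dup
contentinfo.select! { |row| row[:ptype] == 'contentInfo' }

result.size       # => 3
pubinfo.size      # => 2
contentinfo.size  # => 1
```

dup is a shallow copy, so the individual rows are still shared; that is fine for filtering but not for code that mutates the rows themselves.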
