Is there a similar solution as Array#wrap for hashes? - ruby

I use Array.wrap(x) all the time in order to ensure that Array methods actually exist on an object before calling them.
What is the best way to similarly ensure a Hash?
Example:
def ensure_hash(x)
# TODO: this is what I'm looking for
end
values = [nil,1,[],{},'',:a,1.0]
values.all?{|x| ensure_hash(x).respond_to?(:keys) } # true

The best I've been able to come up with so far is:
Hash::try_convert(x) || {}
However, I would prefer something more elegant.

tl; dr: In an app with proper error handling, there is no "easy, care-free" way to handle something that may or may not be hashy.
From a conceptual standpoint, the answer is no. There is no similar solution as Array.wrap(x) for hashes.
An array is a collection of values. Single values can be stored outside of arrays (e.g. x = 42) , so it's a straight-forward task to wrap a value in an array (a = [42]).
A hash is a collection of key-value pairs. In ruby, single key-value pairs can't exist outside of a hash. The only way to express a key-value pair is with a hash: h = { v: 42 }
Of course, there are a thousand ways to express a key-value pair as a single value. You could use an array [k, v] or a delimited string `"k:v" or some more obscure method.
But at that point, you're no longer wrapping, you're parsing. Parsing relies on properly formatted data and has multiple points of failure. No matter how you look at it, if you find yourself in a situation where you may or may not have a hash, that means you need to write a proper chunk of code for data validation and parsing (or refactor your upstream code so that you can always expect a hash).

Related

Fetch from hash with either Singular or Plural

I get the following input hash in my ruby code
my_hash = { include: 'a,b,c' }
(or)
my_hash = { includes: 'a,b,c' }
Now I want the fastest way to get 'a,b,c'
I currently use
def my_includes
my_hash[:include] || my_hash[:includes]
end
But this is very slow because it always checks for :include keyword first then if it fails it'll look for :includes. I call this function several times and the value inside this hash can keep changing. Is there any way I can optimise and speed up this? I won't get any other keywords. I just need support for :include and :includes.
Caveats and Considerations
First, some caveats:
You tagged this Rails 3, so you're probably on a very old Ruby that doesn't support a number of optimizations, newer Hash-related method calls like #fetch_values or #transform_keys!, or pattern matching for structured data.
You can do all sorts of things with your Hash lookups, but none of them are likely to be faster than a Boolean short-circuit when assuming you can be sure of having only one key or the other at all times.
You haven't shown any of the calling code, so without benchmarks it's tough to see how this operation can be considered "slow" in any general sense.
If you're using Rails and not looking for a pure Ruby solution, you might want to consider ActiveModel::Dirty to only take action when an attribute has changed.
Use Memoization
Regardless of the foregoing, what you're probably missing here is some form of memoization so you don't need to constantly re-evaluate the keys and extract the values each time through whatever loop feels slow to you. For example, you could store the results of your Hash evaluation until it needs to be refreshed:
attr_accessor :includes
def extract_includes(hash)
#includes = hash[:include] || hash[:includes]
end
You can then call #includes or #includes= (or use the #includes instance variable directly if you like) from anywhere in scope as often as you like without having to re-evaluate the hashes or keys. For example:
def count_includes
#includes.split(?,).count
end
500.times { count_includes }
The tricky part is basically knowing if and when to update your memoized value. Basically, you should only call #extract_includes when you fetch a new Hash from somewhere like ActiveRecord or a remote API. Until that happens, you can reuse the stored value for as long as it remains valid.
You could work with a modified hash that has both keys :include and :includes with the same values:
my_hash = { include: 'a,b,c' }
my_hash.update(my_hash.key?(:include) ? { includes: my_hash[:include] } :
{ include: my_hash[:includes] })
#=> {:include=>"a,b,c", :includes=>"a,b,c"}
This may be fastest if you were using the same hash my_hash for multiple operations. If, however, a new hash is generated after just a few interrogations, you might see if both the keys :include and :includes can be included when the hash is constructed.

JSON generate unique hash value (SHA-512)

I'm searching for a way to generate a SHA-512 hash from a json string in Ruby, independent from the positions of the elements in it, and independent from nestings, arrays, nested arrays and so on. I just want to hash the raw data along with its keys.
I tried some approaches with converting the JSON into a ruby hash, deep sort them by their keys, append everything into one, long string and hash it. But I bet that my solution isn't the most efficient one, and that there must be a better way to do this.
EDIT
So far, I convert JSON into a Ruby hash. Then I try to use this function to get a canonical representation:
def self.canonical_string_from_hash value, key=nil
str = ""
if value.is_a? Hash
value.keys.sort.each do |k|
str += canonical_string_from_hash(value[k], k)
end
elsif value.is_a? Array
str += key.to_s
value.each do |v|
str += canonical_string_from_hash(v)
end
else
str += key ? "#{key}#{value}" : value.to_s
end
return str
end
But I'm not sure, if this is a good and efficient way to do this.
For example, this hash
hash = {
id: 3,
zoo: "test",
global: [
{ukulele: "ringding", blub: 3},
{blub: nil, ukulele: "rangdang", guitar: "stringstring"}
],
foo: {
ids: [3,4,5],
bar: "asdf"
}
}
gets converted to this string:
barasdfids345globalblub3ukuleleringdingblubguitarstringstringukulelerangdangid3zootest
But I'm not sure, if this is a good and efficient way to do this.
Depends on what you are trying to do. Your canonical/equivalent structures need to represent what is important to you for the comparison. Removing details such as object structure makes sense if you consider two items with different structure but same string values equivalent.
According to your comments, you are attempting to sign a request that is being transferred from one system to a second one. In other words you want security, not a measure of similarity or a digital fingerprint for some other purpose. Therefore equivalent requests are ones that are identical in all the ways that affect the processing that you want to protect. It is simpler, and very likely more secure, to lock down the raw bytes of data that transfer between your two systems.
In which case your whole approach needs a re-think. The reasons for that are probably best discussed on security.stackoverflow.com
However, in brief:
Use an HMAC routine (HMAC-SHA512), it is designed for your purpose. Instead of a salt, this uses a secret, which is essentially the same thing (in fact you need to keep your salt a secret in your implementation too, which is unusual for something called a salt), but has been combined with the SHA in a way which makes it resilient to a couple of attack forms possible against simple concatenation followed by SHA. The worst of these is that it is possible to extend the data and have it generate the same SHA when processed, without needing to know the salt. In other words, an attacker could take a known valid request and use it to forge other requests which will get past your security check. Your proposed solution looks vulnerable to this form of attack to me.
Unpacking the request and analysing the details to get a "canonical" view of the request is not necessary, and also reduces the security of your solution. The only reason for doing this is that you are for some reason not able to handle the request once it has been serialised to JSON, and are forced to work only with the de-serialised request at one end or another of the two systems. If that is purely a knowledge or convenience thing, then fix that problem rather than trying to roll your own security protocol using SHA-512.
You should sign the request, and check the signature, against the fully serialised JSON string. If you need to de-serialise data from a "man-in-the-middle" attack, then you are potentially already exposed to some attacks via the parser. You should work to reject suspect requests before any data processing has been done to them.
TL;DR - ALthough not a direct answer to your question, the correct solution for you is to not write this code at all. Instead you need to place your secure signature code closer to the ins and outs of your two services that need to trust each other.

Maximum arity of ruby function?

I am looking to make an efficient function to clear out a redis-based cache.
I have a method call that returns a number of keys from redis:
$redis.keys("foo:*")
That returns all the keys that start with "foo:". Next, I'd like to delete all the values for these keys.
One (memory-intensive) way to do this is:
$redis.keys("foo:*").each do |key|
$redis.del(key)
end
I'd like to avoid loading all the keys into memory, and then making numerous requests to the redis server.
Another way that I like is to use the splat operator:
keys = $redis.keys("foo:*")
$redis.del(*keys)
The problem is that I don't know what the maximum arity of the $redis.del method, nor of any ruby method, I can't seem to find it online.
What is the maximum arity?
#muistooshort in the comments had a good suggestion that turned out to be right, the redis driver knows what to do with an array argument:
# there are 1,000,000 keys of the form "foo:#{number}"
keys = $redis.keys("foo:*")
$redis.del(keys) # => 1000000
Simply pass an array of keys to $redis.del

Can't all or most cases of `each` be replaced with `map`?

The difference between Enumerable#each and Enumerable#map is whether it returns the receiver or the mapped result. Getting back to the receiver is trivial and you usually do not need to continue a method chain after each like each{...}.another_method (I probably have not seen such case. Even if you want to get back to the receiver, you can do that with tap). So I think all or most cases where Enumerable#each is used can be replaced by Enumerable#map. Am I wrong? If I am right, what is the purpose of each? Is map slower than each?
Edit:
I know that there is a common practice to use each when you are not interested in the return value. I am not interested in whether such practice exists, but am interested in whether such practice makes sense other than from the point of view of convention.
The difference between map and each is more important than whether one returns a new array and the other doesn't. The important difference is in how they communicate your intent.
When you use each, your code says "I'm doing something for each element." When you use map, your code says "I'm creating a new array by transforming each element."
So while you could use map in place of each, performance notwithstanding, the code would now be lying about its intent to anyone reading it.
The choice between map or each should be decided by the desired end result: a new array or no new array. The result of map can be huge and/or silly:
p ("aaaa".."zzzz").map{|word| puts word} #huge and useless array of nil's
I agree with what you said. Enumerable#each simply returns the original object it was called on while Enumerable#map sets the current element being iterated over to the return value of the block, and then returns a new object with those changes.
Since Enumerable#each simply returns the original object itself, it can be very well preferred over the map when it comes to cases where you need to simply iterate or traverse over elements.
In fact, Enumerable#each is a simple and universal way of doing a traditional iterating for loop, and each is much preferred over for loops in Ruby.
You can see the significant difference between map and each when you're composing these enumaratiors.
For example you need to get new array with indixes in it:
array.each.with_index.map { |index, element| [index, element] }
Or for example you just need to apply some method to all elements in array and print result without changing the original array:
m = 2.method(:+)
[1,2,3].each { |a| puts m.call(a) } #=> prints 3, 4, 5
And there's a plenty another examples where the difference between each and map is important key in the writing code in functional style.

ruby hash autovivification (facets)

Here is a clever trick to enable hash autovivification in ruby (taken from facets):
# File lib/core/facets/hash/autonew.rb, line 19
def self.autonew(*args)
leet = lambda { |hsh, key| hsh[key] = new( &leet ) }
new(*args,&leet)
end
Although it works (of course), I find it really frustrating that I can't figure out how this two liner does what it does.
leet is put as a default value. So that then just accessing h['new_key'] somehow brings it up and creates 'new_key' => {}
Now, I'd expect h['new_key'] returning default value object as opposed to evaluating it. That is, 'new_key' => {} is not automatically created. So how does leet actually get called? Especially with two parameters?
The standard new method for Hash accepts a block. This block is called in the event of trying to access a key in the Hash which does not exist. The block is passed the Hash itself and the key that was requested (the two parameters) and should return the value that should be returned for the requested key.
You will notice that the leet lambda does 2 things. It returns a new Hash with leet itself as the block for handling defaults. This is the behaviour which allows autonew to work for Hashes of arbitrary depth. It also assigns this new Hash to hsh[key] so that next time you request the same key you will get the existing Hash rather than a new one being created.
It's also worth noting that this code can be made into a one-liner as follows:
def self.autonew(*args)
new(*args){|hsh, key| hsh[key] = Hash.new(&hsh.default_proc) }
end
The call to Hash#default_proc returns the proc that was used to create the parent, so we have a nice recursive setup here.
I talk about a similar case to this on my blog.
Alternatively, you might consider my xkeys gem. It's a module that you can use to extend arrays or hashes to facilitate nested access.
If you look for something that doesn't exist yet, you get a nil value (or another value or an exception if you prefer) without creating anything by looking. It can also append to the end of arrays.
You can opt to autovivify either hashes or arrays for integer keys (but just once for the entire structure).

Resources