Maximum arity of ruby function? - ruby

I am looking to make an efficient function to clear out a redis-based cache.
I have a method call that returns a number of keys from redis:
$redis.keys("foo:*")
That returns all the keys that start with "foo:". Next, I'd like to delete all the values for these keys.
One (memory-intensive) way to do this is:
$redis.keys("foo:*").each do |key|
  $redis.del(key)
end
I'd like to avoid loading all the keys into memory, and then making numerous requests to the redis server.
Another way that I like is to use the splat operator:
keys = $redis.keys("foo:*")
$redis.del(*keys)
The problem is that I don't know the maximum arity of the $redis.del method, or of any Ruby method, and I can't seem to find it online.
What is the maximum arity?

@muistooshort had a good suggestion in the comments that turned out to be right: the redis driver knows what to do with an array argument:
# there are 1,000,000 keys of the form "foo:#{number}"
keys = $redis.keys("foo:*")
$redis.del(keys) # => 1000000
Simply pass an array of keys to $redis.del.
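If holding every key in memory at once is still a concern, one possible variation is to delete in batches. The sketch below is only an illustration: FakeRedis is a hypothetical stand-in so the code runs without a server, and delete_matching and batch_size are names invented here. It assumes the real client offers a scan_each(match: ...) enumerator (redis-rb does) and a del that accepts an array, as described above.

```ruby
# Hypothetical in-memory stand-in for a Redis client (not the real driver).
# It mimics only the two calls the sketch relies on.
class FakeRedis
  def initialize(keys)
    @store = keys.each_with_object({}) { |k, h| h[k] = true }
  end

  # Yields each key matching a "prefix*" pattern, one at a time.
  def scan_each(match:)
    return enum_for(:scan_each, match: match) unless block_given?
    prefix = match.chomp("*")
    @store.keys.each { |k| yield k if k.start_with?(prefix) }
  end

  # Accepts a list of keys or a single array; returns how many were removed.
  def del(*keys)
    keys.flatten.count { |k| @store.delete(k) }
  end

  def size
    @store.size
  end
end

# Deletes matching keys in fixed-size batches instead of one huge call.
def delete_matching(redis, pattern, batch_size: 1000)
  deleted = 0
  redis.scan_each(match: pattern).each_slice(batch_size) do |batch|
    deleted += redis.del(batch)
  end
  deleted
end

redis = FakeRedis.new((1..2500).map { |i| "foo:#{i}" } + ["bar:1"])
removed = delete_matching(redis, "foo:*")
```

Each batch is a single request, so the number of round trips is roughly the key count divided by batch_size.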

Related

Fetch from hash with either Singular or Plural

I get the following input hash in my ruby code
my_hash = { include: 'a,b,c' }
(or)
my_hash = { includes: 'a,b,c' }
Now I want the fastest way to get 'a,b,c'
I currently use
def my_includes
  my_hash[:include] || my_hash[:includes]
end
But this is very slow because it always checks for the :include key first and only looks for :includes if that fails. I call this function many times, and the value inside the hash can keep changing. Is there any way I can optimise this and speed it up? I won't get any other keys; I just need support for :include and :includes.
Caveats and Considerations
First, some caveats:
You tagged this Rails 3, so you're probably on a very old Ruby that doesn't support a number of optimizations, newer Hash-related method calls like #fetch_values or #transform_keys!, or pattern matching for structured data.
You can do all sorts of things with your Hash lookups, but none of them are likely to be faster than a Boolean short-circuit, assuming you can be sure of having only one key or the other at all times.
You haven't shown any of the calling code, so without benchmarks it's tough to see how this operation can be considered "slow" in any general sense.
If you're using Rails and not looking for a pure Ruby solution, you might want to consider ActiveModel::Dirty to only take action when an attribute has changed.
Use Memoization
Regardless of the foregoing, what you're probably missing here is some form of memoization so you don't need to constantly re-evaluate the keys and extract the values each time through whatever loop feels slow to you. For example, you could store the results of your Hash evaluation until it needs to be refreshed:
attr_accessor :includes
def extract_includes(hash)
  @includes = hash[:include] || hash[:includes]
end
You can then call #includes or #includes= (or use the @includes instance variable directly if you like) from anywhere in scope as often as you like without having to re-evaluate the hashes or keys. For example:
def count_includes
  @includes.split(?,).count
end
500.times { count_includes }
The tricky part is knowing if and when to update your memoized value. You should only call #extract_includes when you fetch a new Hash from somewhere like ActiveRecord or a remote API. Until that happens, you can reuse the stored value for as long as it remains valid.
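Put together, the memoization idea might look like the following self-contained sketch. The class name IncludeReader is hypothetical and exists only to give the instance variable somewhere to live; the method names follow the fragments above.

```ruby
# Hypothetical wrapper class; only the method names come from the answer.
class IncludeReader
  attr_reader :includes

  # Call this only when a new hash arrives (for example, after a fresh fetch).
  def extract_includes(hash)
    @includes = hash[:include] || hash[:includes]
  end

  # Reuses the stored value; no hash lookup happens here.
  def count_includes
    @includes.split(",").count
  end
end

reader = IncludeReader.new
reader.extract_includes(includes: "a,b,c")
total = 0
500.times { total += reader.count_includes }
```

The hash is consulted once per new input rather than once per call, which is the whole saving.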
You could work with a modified hash that has both keys :include and :includes with the same values:
my_hash = { include: 'a,b,c' }
my_hash.update(my_hash.key?(:include) ? { includes: my_hash[:include] } :
                                        { include: my_hash[:includes] })
  #=> {:include=>"a,b,c", :includes=>"a,b,c"}
This may be fastest if you were using the same hash my_hash for multiple operations. If, however, a new hash is generated after just a few interrogations, you might see if both the keys :include and :includes can be included when the hash is constructed.

Is there a similar solution as Array#wrap for hashes?

I use Array.wrap(x) all the time in order to ensure that Array methods actually exist on an object before calling them.
What is the best way to similarly ensure a Hash?
Example:
def ensure_hash(x)
  # TODO: this is what I'm looking for
end
values = [nil,1,[],{},'',:a,1.0]
values.all?{|x| ensure_hash(x).respond_to?(:keys) } # true
The best I've been able to come up with so far is:
Hash::try_convert(x) || {}
However, I would prefer something more elegant.
tl; dr: In an app with proper error handling, there is no "easy, care-free" way to handle something that may or may not be hashy.
From a conceptual standpoint, the answer is no. There is no similar solution as Array.wrap(x) for hashes.
An array is a collection of values. Single values can be stored outside of arrays (e.g. x = 42) , so it's a straight-forward task to wrap a value in an array (a = [42]).
A hash is a collection of key-value pairs. In ruby, single key-value pairs can't exist outside of a hash. The only way to express a key-value pair is with a hash: h = { v: 42 }
Of course, there are a thousand ways to express a key-value pair as a single value. You could use an array [k, v] or a delimited string "k:v" or some more obscure method.
But at that point, you're no longer wrapping, you're parsing. Parsing relies on properly formatted data and has multiple points of failure. No matter how you look at it, if you find yourself in a situation where you may or may not have a hash, that means you need to write a proper chunk of code for data validation and parsing (or refactor your upstream code so that you can always expect a hash).
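For reference, the fallback from the question can be written as the short sketch below. It is not an equivalent of Array.wrap: anything that is not already hash-like is silently replaced by an empty hash, which is exactly the loss of information the answer warns about.

```ruby
# Returns x if it is (or converts to) a Hash, otherwise an empty Hash.
# Note: non-hash input is discarded, not wrapped.
def ensure_hash(x)
  Hash.try_convert(x) || {}
end

values = [nil, 1, [], {}, '', :a, 1.0]
all_hashy = values.all? { |x| ensure_hash(x).respond_to?(:keys) }
kept = ensure_hash({ v: 42 })
dropped = ensure_hash(42)
```

Here kept still holds the original pair, while dropped is empty, so the caller cannot tell a missing hash from an invalid one.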

Updating Hash Values in Ruby Clarified

I was going to comment on the original question but I don't have the reputation to do so yet....
I too was wondering how to easily update all the values in a hash, or if there was some kind of equivalent .map! method for hashes. Someone put up this elegant solution:
hash.update(hash) { |key, v1| expression }
on this question:
Ruby: What is the easiest method to update Hash values?
My question is: how does the block know to iterate over each element in the hash? For example, I'd normally have to call .each on a hash to access each element, so why isn't it something like:
hash.update(hash.each) do |key, value|
  value += 1
end
In the block with {|key, value| expression} I am accessing each individual hash element yet I don't have to explicitly tell the system this? Why not? Thank you very much.
Hash#update is an alias for Hash#merge! which is more descriptive.
When calling the method with a block, the following happens (excerpt from the docs):
If [a] block is specified, [...] the value of each duplicate key is
determined by calling the block with the key [...]
So, the above code works like this:
The hash is merged with itself, and for each duplicate key the block is called. As we merge the hash with itself, every key being merged in is a duplicate and therefore the block is invoked. The result is that every value in the hash gets replaced by the result of expression.
Hash#update takes a hash as the first parameter, and an optional block as the second parameter. If the second parameter is left out, the method will internally loop on each key-value pair in the supplied hash and use them to merge into the original hash.
If the block (second parameter) is supplied, the method does exactly the same thing. It loops over each key-value in the supplied hash and merges it in. The only difference is where a collision is found (the original hash already has an entry for a specific key). In this case the block is called to help resolve the conflict.
Based on this understanding, simply passing the hash into itself will cause it to loop over every key-value because that's how update always works. Calling .each would be redundant.
To see this more clearly, take a look at the source code for the #update method, and note the internal call to rb_hash_foreach in either logic branch.
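The behaviour described above can be observed with a small experiment. The variable names are made up for illustration; note that the block actually receives three arguments (the key, the old value, and the new value).

```ruby
# Merging two different hashes: the block runs only for keys present in both.
calls = []
merged = { a: 1, b: 2 }.merge({ b: 20, c: 30 }) do |key, old_value, new_value|
  calls << key
  old_value + new_value
end
# merged == { a: 1, b: 22, c: 30 } and calls == [:b]

# Merging a hash with itself: every key is a duplicate, so the block runs
# once per key and its return value replaces the stored value.
counts = { x: 1, y: 2 }
counts.update(counts) { |_key, old_value, _new_value| old_value + 1 }
# counts == { x: 2, y: 3 }
```

No explicit .each is needed because the iteration over the argument hash happens inside the method itself.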

JSON generate unique hash value (SHA-512)

I'm searching for a way to generate a SHA-512 hash from a json string in Ruby, independent from the positions of the elements in it, and independent from nestings, arrays, nested arrays and so on. I just want to hash the raw data along with its keys.
I tried some approaches with converting the JSON into a ruby hash, deep sort them by their keys, append everything into one, long string and hash it. But I bet that my solution isn't the most efficient one, and that there must be a better way to do this.
EDIT
So far, I convert JSON into a Ruby hash. Then I try to use this function to get a canonical representation:
def self.canonical_string_from_hash value, key=nil
  str = ""
  if value.is_a? Hash
    value.keys.sort.each do |k|
      str += canonical_string_from_hash(value[k], k)
    end
  elsif value.is_a? Array
    str += key.to_s
    value.each do |v|
      str += canonical_string_from_hash(v)
    end
  else
    str += key ? "#{key}#{value}" : value.to_s
  end
  return str
end
But I'm not sure, if this is a good and efficient way to do this.
For example, this hash
hash = {
  id: 3,
  zoo: "test",
  global: [
    {ukulele: "ringding", blub: 3},
    {blub: nil, ukulele: "rangdang", guitar: "stringstring"}
  ],
  foo: {
    ids: [3,4,5],
    bar: "asdf"
  }
}
gets converted to this string:
barasdfids345globalblub3ukuleleringdingblubguitarstringstringukulelerangdangid3zootest
But I'm not sure, if this is a good and efficient way to do this.
Depends on what you are trying to do. Your canonical/equivalent structures need to represent what is important to you for the comparison. Removing details such as object structure makes sense if you consider two items with different structure but same string values equivalent.
According to your comments, you are attempting to sign a request that is being transferred from one system to a second one. In other words you want security, not a measure of similarity or a digital fingerprint for some other purpose. Therefore equivalent requests are ones that are identical in all the ways that affect the processing that you want to protect. It is simpler, and very likely more secure, to lock down the raw bytes of data that transfer between your two systems.
In which case your whole approach needs a re-think. The reasons for that are probably best discussed on security.stackoverflow.com
However, in brief:
Use an HMAC routine (HMAC-SHA512), it is designed for your purpose. Instead of a salt, this uses a secret, which is essentially the same thing (in fact you need to keep your salt a secret in your implementation too, which is unusual for something called a salt), but has been combined with the SHA in a way which makes it resilient to a couple of attack forms possible against simple concatenation followed by SHA. The worst of these is that it is possible to extend the data and have it generate the same SHA when processed, without needing to know the salt. In other words, an attacker could take a known valid request and use it to forge other requests which will get past your security check. Your proposed solution looks vulnerable to this form of attack to me.
Unpacking the request and analysing the details to get a "canonical" view of the request is not necessary, and also reduces the security of your solution. The only reason for doing this is that you are for some reason not able to handle the request once it has been serialised to JSON, and are forced to work only with the de-serialised request at one end or another of the two systems. If that is purely a knowledge or convenience thing, then fix that problem rather than trying to roll your own security protocol using SHA-512.
You should sign the request, and check the signature, against the fully serialised JSON string. If you de-serialise data that may have been tampered with by a "man-in-the-middle" before checking it, then you are potentially already exposed to some attacks via the parser. You should work to reject suspect requests before any data processing has been done to them.
TL;DR - Although not a direct answer to your question, the correct solution for you is to not write this code at all. Instead you need to place your secure signature code closer to the ins and outs of your two services that need to trust each other.
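As a rough illustration of the advice above, signing the raw serialised body with HMAC-SHA512 could look like the sketch below. It uses Ruby's standard OpenSSL bindings; the SECRET value, the method names, and the sample body are all assumptions made for the example, and real key management is out of scope here.

```ruby
require "openssl"

# Hypothetical shared secret; in practice it would come from configuration.
SECRET = "replace-with-a-long-random-secret"

# Signs the raw bytes of the serialised request, with no parsing involved.
def sign(body, secret = SECRET)
  OpenSSL::HMAC.hexdigest(OpenSSL::Digest.new("SHA512"), secret, body)
end

# Compares two strings without stopping at the first differing byte.
def secure_equal?(a, b)
  return false unless a.bytesize == b.bytesize
  diff = 0
  a.bytes.zip(b.bytes) { |x, y| diff |= x ^ y }
  diff.zero?
end

# The receiver recomputes the signature over the bytes it received.
def valid_signature?(body, signature, secret = SECRET)
  secure_equal?(sign(body, secret), signature)
end

body = '{"id":3,"zoo":"test"}'
signature = sign(body)
accepted = valid_signature?(body, signature)
rejected = valid_signature?(body + " ", signature)
```

The receiver checks the signature before handing the body to a JSON parser, so a request altered in transit is rejected without being de-serialised.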

ruby hash autovivification (facets)

Here is a clever trick to enable hash autovivification in ruby (taken from facets):
# File lib/core/facets/hash/autonew.rb, line 19
def self.autonew(*args)
  leet = lambda { |hsh, key| hsh[key] = new( &leet ) }
  new(*args,&leet)
end
Although it works (of course), I find it really frustrating that I can't figure out how this two liner does what it does.
leet is put as a default value. So that then just accessing h['new_key'] somehow brings it up and creates 'new_key' => {}
Now, I'd expect h['new_key'] returning default value object as opposed to evaluating it. That is, 'new_key' => {} is not automatically created. So how does leet actually get called? Especially with two parameters?
The standard new method for Hash accepts a block. This block is called in the event of trying to access a key in the Hash which does not exist. The block is passed the Hash itself and the key that was requested (the two parameters) and should return the value that should be returned for the requested key.
You will notice that the leet lambda does 2 things. It returns a new Hash with leet itself as the block for handling defaults. This is the behaviour which allows autonew to work for Hashes of arbitrary depth. It also assigns this new Hash to hsh[key] so that next time you request the same key you will get the existing Hash rather than a new one being created.
It's also worth noting that this code can be made into a one-liner as follows:
def self.autonew(*args)
  new(*args){|hsh, key| hsh[key] = Hash.new(&hsh.default_proc) }
end
The call to Hash#default_proc returns the proc that was used to create the parent, so we have a nice recursive setup here.
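To see the recursion in action without patching Hash, the same default block can be passed to Hash.new directly. This is a minimal sketch; the variable name tree and the keys are arbitrary.

```ruby
# Each missing key gets a new Hash that shares the parent's default block.
tree = Hash.new { |hsh, key| hsh[key] = Hash.new(&hsh.default_proc) }

# Intermediate levels are created on first access.
tree[:a][:b][:c] = 1

# Merely reading a missing key also creates it, because the block assigns.
had_key_before = tree.key?(:x)
tree[:x]
has_key_after = tree.key?(:x)

# The nested hashes reuse the very same proc object as the top level.
same_proc = tree[:a].default_proc.equal?(tree.default_proc)
```

The fact that a plain read creates an entry is the usual trade-off of this approach.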
I talk about a similar case to this on my blog.
Alternatively, you might consider my xkeys gem. It's a module that you can use to extend arrays or hashes to facilitate nested access.
If you look for something that doesn't exist yet, you get a nil value (or another value or an exception if you prefer) without creating anything by looking. It can also append to the end of arrays.
You can opt to autovivify either hashes or arrays for integer keys (but just once for the entire structure).
