Search array of symbols - Ruby - ruby

Given an array of symbols, how would you iterate over the array, and store the array indexes of matching pairs?
foo = [:date, :recorded_at, :scheduled_for, :amount, :activity, :pending ]
For example, if you searched for 'recorded_at' and 'activity', it should return [1,4]
I thought something like this would work:
bar = ['recorded_at','activity']
buzz = bar.each{|i| foo.index(i.to_sym)}
However this just returns the strings, ["foo", "bar"], not the actual array indexes.

Use Array#map instead of Array#each:
buzz = bar.map{|i| foo.index(i.to_sym)}
#=> [1, 4]

An efficient way, particularly if you have to do this multiple times, for different values of bar, or if the arrays are large, would be to first convert foo to a hash:
foo = [:date, :recorded_at, :scheduled_for, :amount, :activity, :pending ]
foo_hash = Hash[foo.map(&:to_s).zip([*0...foo.size]).to_h]
#=> {"date"=>0, "recorded_at"=>1, "scheduled_for"=>2,
# "amount"=>3, "activity"=>4, "pending"=>5}
Then it is a simple hash lookup for any value of bar:
bar = ['recorded_at','activity']
foo_hash.values_at(*bar)
#=> [1, 4]
For Ruby 1.9+ Hash[arr] can be replaced by arr.to_h.
If n=foo.size and m=bar.size, only m hash lookups would be required after constructing the hash. By constrast, for each element of bar, there would be, on average, n/2 comparisons to find the index of an element of foo that matches the element of bar, for a total of m*n/2 comparisons.

Related

Simplest way in Ruby to iterate over element plus next element, where "the next element" is nil on the last iteration?

I want to loop over an array like [:foo, :bar, :baz].
On each iteration, I want the item and the next item. On the last iteration, the "next item" should be nil.
So I want to yield in turn :foo, :bar, then :bar, :baz, and finally :baz, nil.
I can think of a few ways of achieving this:
my_list.zip(my_list.from(1)) { |item, next_item| … }
my_list.each.with_index(1) { |item, i| next_item = my_list[i] }
[*my_list, nil].each_cons(2) { |item, next_item| … }
But I feel like I might be missing some simpler way. Am I?
If you only every have three non-nil elements in your Array and always want two elements at a time from #each_cons, then just append a nil and call it a day.
%i[foo bar baz].push(nil).each_cons(2).map { [_1, _2] }
#=> [[:foo, :bar], [:bar, :baz], [:baz, nil]]
Alternatively, you can just reference each indexed element of the named Array by the current or successor index number, because any reference to elements outside the Array's bounds will be nil. For example:
array = %i[foo bar baz]
array.each_index.map { [array[_1], array[_1.succ]] }
#=> [[:foo, :bar], [:bar, :baz], [:baz, nil]]
In either case, you'll need #map or some other mechanism if you want to return an Array of Array objects. If you want to do something else, like yield, you can do that too.
Note that if you're going to have a variable number of Array elements, or make changes to the number of #each_cons items, then you'll need a different approach. If that is the case, feel free to open a new or related question.

Flatten deep nested hash to array for sha1 hashing

I want to compute an unique sha1 hash from a ruby hash. I thought about
(Deep) Converting the Hash into an array
Sorting the array
Join array by empty string
calculate sha1
Consider the following hash:
hash = {
foo: "test",
bar: [1,2,3]
hello: {
world: "world",
arrays: [
{foo: "bar"}
]
}
}
How can I get this kind of nested hash into an array like
[:foo, "test", :bar, 1, 2, 3, :hello, :world, "earth", :arrays, :my, "example"]
I would then sort the array, join it with array.join("") and compute the sha1 hash like this:
require 'digest/sha1'
Digest::SHA1.hexdigest hash_string
How could I flatten the hash like I described above?
Is there already a gem for this?
Is there a quicker / easier way to solve this? I have a large amount of objects to convert (~700k), so performance does matter.
EDIT
Another problem that I figured out by the answers below are this two hashes:
a = {a: "a", b: "b"}
b = {a: "b", b: "a"}
When flattening the hash and sorting it, this two hashes produce the same output, even when a == b => false.
EDIT 2
The use case for this whole thing is product data comparison. The product data is stored inside a hash, then serialized and sent to a service that creates / updates the product data.
I want to check if anything has changed inside the product data, so I generate a hash from the product content and store it in a database. The next time the same product is loaded, I calculate the hash again, compare it to the one in the DB and decide wether the product needs an update or not.
EDIT : As you detailed, two hashes with keys in different order should give the same string. I would reopen the Hash class to add my new custom flatten method :
class Hash
def custom_flatten()
self.sort.map{|pair| ["key: #{pair[0]}", pair[1]]}.flatten.map{ |elem| elem.is_a?(Hash) ? elem.custom_flatten : elem }.flatten
end
end
Explanation :
sort converts the hash to a sorted array of pairs (for the comparison of hashes with different keys order)
.map{|pair| ["key: #{pair[0]}", pair[1]]} is a trick to differentiate keys from values in the final flatten array, to avoid the problem of {a: {b: {c: :d}}}.custom_flatten == {a: :b, c: :d}.custom_flatten
flatten converts an array of arrays into a single array of values
map{ |elem| elem.is_a?(Hash) ? elem.custom_flatten : elem } calls back fully_flatten on any sub-hash left.
Then you just need to use :
require 'digest/sha1'
Digest::SHA1.hexdigest hash.custom_flatten.to_s
I am not aware of a gem that does something like what you are looking for. There is a Hash#flatten method in ruby, but it does not flatten nested hashes recursively. Here is a straight forward recursive function that will flatten in the way that you requested in your question:
def completely_flatten(hsh)
hsh.flatten(-1).map{|el| el.is_a?(Hash) ? completely_flatten(el) : el}.flatten
end
This will yield
hash = {
foo: "test",
bar: [1,2,3]
hello: {
world: "earth",
arrays: [
{my: "example"}
]
}
}
completely_flatten(hash)
#=> [:foo, "test", :bar, 1, 2, 3, :hello, :world, "earth", :arrays, :my, "example"]
To get the string representation you are looking for (before making the sha1 hash) convert everything in the array to a string before sorting so that all of the elements can be meaningfully compared or else you will get an error:
hash_string = completely_flatten(hash).map(&:to_s).sort.join
#=> "123arraysbarearthexamplefoohellomytestworld"
The question is how to "flatten" a hash. There is a second, implicit, question concerning sha1, but, by SO rules, that needs to be addressed in a separate question. You can "flatten" any hash or array as follows.
Code
def crush(obj)
recurse(obj).flatten
end
def recurse(obj)
case obj
when Array then obj.map { |e| recurse e }
when Hash then obj.map { |k,v| [k, recurse(v)] }
else obj
end
end
Example
crush({
foo: "test",
bar: [1,2,3],
hello: {
world: "earth",
arrays: [{my: "example"}]
}
})
#=> [:foo, "test", :bar, 1, 2, 3, :hello, :world, "earth", :arrays, :my, "example"]
crush([[{ a:1, b:2 }, "cat", [3,4]], "dog", { c: [5,6] }])
#=> [:a, 1, :b, 2, "cat", 3, 4, "dog", :c, 5, 6]
Use Marshal for Fast Serialization
You haven't articulated a useful reason to change your data structure before hashing. Therefore, you should consider marshaling for speed unless your data structures contain unsupported objects like bindings or procs. For example, using your hash variable with the syntax corrected:
require 'digest/sha1'
hash = {
foo: "test",
bar: [1,2,3],
hello: {
world: "world",
arrays: [
{foo: "bar"}
]
}
}
Digest::SHA1.hexdigest Marshal.dump(hash)
#=> "f50bc3ceb514ae074a5ab9672ae5081251ae00ca"
Marshal is generally faster than other serialization options. If all you need is speed, that will be your best bet. However, you may find that JSON, YAML, or a simple #to_s or #inspect meet your needs better for other reasons. As long as you are comparing similar representations of your object, the internal format of the hashed object is largely irrelevant to ensuring you have a unique or unmodified object.
Any solution based on flattening the hash will fail for nested hashes. A robust solution is to explicitly sort the keys of each hash recursively (from ruby 1.9.x onwards, hash keys order is preserved), and then serialize it as a string and digest it.
def canonize_hash(h)
r = h.map { |k, v| [k, v.is_a?(Hash) ? canonize_hash(v) : v] }
Hash[r.sort]
end
def digest_hash(hash)
Digest::SHA1.hexdigest canonize_hash(hash).to_s
end
digest_hash({ foo: "foo", bar: "bar" })
# => "ea1154f35b34c518fda993e8bb0fe4dbb54ae74a"
digest_hash({ bar: "bar", foo: "foo" })
# => "ea1154f35b34c518fda993e8bb0fe4dbb54ae74a"

Unexpected result with splat operator

I have a hash, whose values are an array of size 1:
hash = {:start => [1]}
I want to unpack the arrays as in:
hash.each_pair{ |key, value| hash[key] = value[0] } # => {:start=>1}
and I thought the *-operator as in the following would work, but it does not give the expected result:
hash.each_pair{ |key, value| hash[key] = *value } # => {:start=>[1]}
Why does *value return [1] and not 1?
Because the []= method applied to hash takes only one argument in addition to the key (which is put inside the [] part), and a splatted/expanded array, which is in general a sequence of values (which coincidentally happens to be a single element in this particular case) cannot be directly accepted as the argument as is splatted. So it is accepted by the argument of []= as an array after all.
In other words, an argument (of the []= method) must be an object, but splatted elements (such as :foo, :bar, :baz) are not an object. The only way to interpret them as an object is to put them back into an array (such as [:foo, :bar, :baz]).
Using the splat operator, you can do it like this:
hash.each_pair{|key, value| hash.[]= key, *value}
sawa and Ninigi already pointed out why the assignment doesn't work as expected. Here's my attempt.
Ruby's assignment features work regardless of whether you're assigning to a variable, a constant or by implicitly invoking an assignment method like Hash#[]= with the assignment operator. For the sake of simplicity, I'm using a variable in the following examples.
Using the splat operator in an assignment does unpack the array, i.e.
a = *[1, 2, 3]
is evaluated as:
a = 1, 2, 3
But Ruby also allows you to implicitly create arrays during assignment by listing multiple values. Therefore, the above is in turn equivalent to:
a = [1, 2, 3]
That's why *[1] results in [1] - it's unpacked, just to be converted back to an array.
Elements can be assigned separately using multiple assignment:
a, b = [1, 2, 3]
a #=> 1
b #=> 2
or just:
a, = [1, 2, 3]
a #=> 1
You could use this in your code (note the comma after hash[key]):
hash = {:start => [1]}
hash.each_pair { |key, values| hash[key], = values }
#=> {:start=>1}
But there's another and more elegant way: you can unpack the array by putting parentheses around the array argument:
hash = {:start => [1]}
hash.each_pair { |key, (value)| hash[key] = value }
#=> {:start=>1}
The parentheses will decompose the array, assigning the first array element to value.
Because Ruby is acting unexpectedly smart here.
True, the splash operator will "fold" and "unfold" an array, but the catch in your code is what you do with that fanned value.
Take this code into account:
array = ['a', 'b']
some_var = *array
array # => ['a', 'b']
As you can see the splat operator seemingly does nothing to your array, while this:
some_var, some_other_var = *array
some_var # => "a"
somet_other_var # => "b"
Will do what you'd expect it does.
It seems ruby just "figures" if you splat an array into a single variable, that you want the array, not the values.
EDIT: As sawa pointed out in the comments, hash[key] = is not identical to variable =. []= is an instance Method of Hash, with it's own C-Code under the hood, which COULD (in theory) lead to different behaviour in some instances. I don't know of any example, but that does not mean there is none.
But for the sake of simplicity, we can asume that the regular variable assignment behaves exactly identical to hash[key] =.

Ruby how to return an element of a dictionary?

# dictionary = {"cat"=>"Sam"}
This a return a key
#dictionary.key(x)
This returns a value
#dictionary[x]
How do I return the entire element
"cat"=>"Sam"
#dictionary
should do the trick for you
whatever is the last evaluated expression in ruby is the return value of a method.
If you want to return the hash as a whole. the last line of the method should look like the line I have written above
Your example is a bit (?) misleading in a sense it only has one pair (while not necessarily), and you want to get one pair. What you call a "dictionary" is actually a hashmap (called a hash among Rubyists).
A hashrocket (=>) is a part of hash definition syntax. It can't be used outside it. That is, you can't get just one pair without constructing a new hash. So, a new such pair would look as: { key => value }.
So in order to do that, you'll need a key and a value in context of your code somewhere. And you've specified ways to get both if you have one. If you only have a value, then:
{ #dictionary.key(x) => x }
...and if just a key, then:
{ x => #dictionary[x] }
...but there is no practical need for this. If you want to process each pair in a hash, use an iterator to feed each pair into some code as an argument list:
#dictionary.each do |key, value|
# do stuff with key and value
end
This way a block of code will get each pair in a hash once.
If you want to get not a hash, but pairs of elements it's constructed of, you can convert your hash to an array:
#dictionary.to_a
# => [["cat", "Sam"]]
# Note the double braces! And see below.
# Let's say we have this:
#dictionary2 = { 1 => 2, 3 => 4}
#dictionary2[1]
# => 2
#dictionary2.to_a
# => [[1, 2], [3, 4]]
# Now double braces make sense, huh?
It returns an array of pairs (which are arrays as well) of all elements (keys and values) that your hashmap contains.
If you wish to return one element of a hash h, you will need to specify the key to identify the element. As the value for key k is h[k], the key-value pair, expressed as an array, is [k, h[k]]. If you wish to make that a hash with a single element, use Hash[[[k, h[k]]]].
For example, if
h = { "cat"=>"Sam", "dog"=>"Diva" }
and you only wanted to the element with key "cat", that would be
["cat", h["cat"]] #=> ["cat", "Sam"]
or
Hash[[["cat", h["cat"]]]] #=> {"cat"=>"Sam"}
With Ruby 2.1 you could alternatively get the hash like this:
[["cat", h["cat"]]].to_h #=> {"cat"=>"Sam"}
Let's look at a little more interesting case. Suppose you have an array arr containing some or all of the keys of a hash h. Then you can get all the key-value pairs for those keys by using the methods Enumerable#zip and Hash#values_at:
arr.zip(arr.values_at(*arr))
Suppose, for example,
h = { "cat"=>"Sam", "dog"=>"Diva", "pig"=>"Petunia", "owl"=>"Einstein" }
and
arr = ["dog", "owl"]
Then:
arr.zip(h.values_at(*arr))
#=> [["dog", "Diva"], ["owl", "Einstein"]]
In steps:
a = h.values_at(*arr)
#=> h.values_at(*["dog", "owl"])
#=> h.values_at("dog", "owl")
#=> ["Diva", "Einstein"]
arr.zip(a)
#=> [["dog", "Diva"], ["owl", "Einstein"]]
To instead express as a hash:
Hash[arr.zip(h.values_at(*arr))]
#=> {"dog"=>"Diva", "owl"=>"Einstein"}
You can get the key and value in one go - resulting in an array:
#h = {"cat"=>"Sam", "dog"=>"Phil"}
key, value = p h.assoc("cat") # => ["cat", "Sam"]
Use rassoc to search by value ( .rassoc("Sam") )

Looping on array

I'm having some trouble figuring out the right way to do this:
I have an array and a separate array of arrays that I want to compare to the first array. The first array is a special Enumerable object that happens to contain an array.
Logic tells me that I should be able to do this:
[1,2,3].delete_if do |n|
[[2,4,5], [3,6,7]].each do |m|
! m.include?(n)
end
end
Which I would expect to return
=> [2,3]
But it returns [] instead.
This idea works if I do this:
[1,2,3].delete_if do |n|
! [2,4,5].include?(n)
end
It will return
=> [2]
I can't assign the values to another object, as the [1,2,3] array must stay its special Enumerable object. I'm sure there is a much simpler explanation to this than what I'm trying. Anybody have any ideas?
You can also flatten the multi-dimensional array and use the Array#& intersection operator to get the same result:
# cast your enumerable to array with to_a
e = [1,2,3].each
e.to_a & [[2,4,5], [3,6,7]].flatten
# => [2, 3]
Can't you just add the two inner array together, and and check the inclusion on the concatenated array?
[1,2,3].delete_if do |n|
!([2,4,5] + [3,6,7]).include?(n)
end
The problem is that the return value of each is the array being iterated over, not the boolean (which is lost). Since the array is truthy, the value returned back to delete_if is always true, so all elements are deleted. You should instead use any?:
[1,2,3].delete_if do |n|
![[2,4,5], [3,6,7]].any? do |m|
m.include?(n)
end
end
#=> [2, 3]

Resources