How to convert only the string hash values that are numbers to integers - Ruby

I have rows of hashes imported from several different XML database dumps that look like this (but with varying keys):
{"Id"=>"1", "Name"=>"Cat", "Description"=>"Feline", "Count"=>"123"}
I tried using #to_i but it converts a non-number string to 0:
"Feline".to_i
# => 0
But what I'd like is a way for "Feline" to remain a string, while Id and Count in the above example become integers 1 and 123.
Is there an easy way to convert only the string values that are numbers into integers?

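One-line answers:
Using a regex:
h.merge(h) { |k, v| v.match?(/\A[+-]?\d+\z/) ? v.to_i : v }
(The regex accepts only optionally signed whole numbers, so a value like "1.5" stays a string instead of being truncated to 1 by to_i.)
Using Kernel#Integer:
h.merge(h) { |k, v| Integer(v) rescue v }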
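On Ruby 2.6 or later, the same idea can be written without a rescue modifier. The sketch below uses Hash#transform_values, which returns a new hash, and Kernel#Integer with exception: false, which returns nil instead of raising. The helper name integerize_values is only for illustration, and passing base 10 is a choice made here so that a zero-padded value such as "010" is not read as octal.

```ruby
# Sketch, assuming Ruby 2.6+ for Integer(..., exception: false).
# integerize_values is a hypothetical helper name.
def integerize_values(hash)
  hash.transform_values do |value|
    # Leave non-strings (nil, already-converted numbers, ...) untouched.
    next value unless value.is_a?(String)

    # Base 10: "010" becomes 10, not octal 8. For "Feline" Integer returns nil,
    # so we fall back to the original string.
    Integer(value, 10, exception: false) || value
  end
end

row = { "Id" => "1", "Name" => "Cat", "Description" => "Feline", "Count" => "123" }
converted = integerize_values(row)
# converted["Id"] is 1, converted["Count"] is 123, converted["Name"] is still "Cat"
```

The original row is left unchanged, because transform_values builds a new hash.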

Use Kernel#Integer:
my_hash = {"Id"=>"1", "Name"=>"Cat", "Description"=>"Feline", "Count"=>"123"}
Hash[my_hash.map { |key, value|
  [key,
   begin
     Integer(value)
   rescue ArgumentError
     value # not a number, keep the original string
   end]
}]
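Kernel#Integer is stricter than String#to_i, but it is not a plain decimal parser either. The lines below illustrate a few standard Kernel#Integer behaviours worth knowing before relying on it for database IDs; they are general Ruby semantics, not something specific to this answer.

```ruby
# With the default base, Integer honours radix prefixes.
Integer("0x1A")     # => 26
Integer("010")      # => 8, a leading zero means octal
Integer("010", 10)  # => 10, an explicit base 10 turns prefix detection off

# Whitespace around the number and underscores between digits are accepted.
Integer(" 42 ")     # => 42
Integer("1_000")    # => 1000

# Trailing junk raises, unlike to_i, which stops at the first non-digit.
"12abc".to_i        # => 12
begin
  Integer("12abc")
rescue ArgumentError
  puts "ArgumentError for \"12abc\""
end

# nil raises TypeError, which `rescue ArgumentError` does not catch.
begin
  Integer(nil)
rescue TypeError
  puts "TypeError for nil"
end
```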
ADDED LATER: With my y_support gem, you can make hash operations even more concise.
require 'y_support/core_ext/hash'
my_hash.with_values { |v|
  begin
    Integer(v)
  rescue ArgumentError
    v
  end
}
YSupport can be installed with gem install y_support. It also offers Hash#with_keys, Hash#with_values! and Hash#with_keys!, which do what you'd expect, and Hash#modify, which expects a binary block returning a pair of values, modifying the hash in place. There have been proposals to add such methods directly to Ruby core; Ruby 2.4 later added Hash#transform_values and Hash#transform_values!, and Ruby 2.5 added Hash#transform_keys and Hash#transform_keys!.

I think you know what fields should be integers (your consuming code probably depends on it), so I would recommend you convert the specific fields.
c = Hash[h.map { |k,v| [k, %w(Id Count).include?(k) ? Integer(v) : v ] }]
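Converting named fields also means bad data can fail loudly instead of silently staying a string. A sketch of that idea, assuming the integer fields are known up front; INTEGER_FIELDS and convert_known_fields are hypothetical names:

```ruby
# Hypothetical schema: the keys that must hold integers.
INTEGER_FIELDS = %w[Id Count].freeze

def convert_known_fields(row)
  row.map { |key, value|
    # Integer(value, 10) raises ArgumentError for values like "12abc",
    # so a malformed Count is reported instead of being passed through.
    [key, INTEGER_FIELDS.include?(key) ? Integer(value, 10) : value]
  }.to_h
end

row = { "Id" => "1", "Name" => "Cat", "Description" => "Feline", "Count" => "123" }
converted = convert_known_fields(row)
# converted["Id"] is 1 and converted["Count"] is 123; "Name" and "Description" are unchanged
```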

I had a similar problem to solve: results for pesticide analysis came into the system in a heterogeneous (bad design!) format: negative integers as special codes (not detected, not tested, not quantified, etc.), nil as a synonym for not detected, floats for quantified compounds, and strings for pass/fail booleans. Hold your horses: this is a 10-year-old, heavily patched app that has been running in production and was never rewritten from scratch ;)
Two things I learned from top Rubyists:
0) DON'T ITERATE-MODIFY AN ENUMERABLE (return a copy)
1) YOUR REGEX WON'T COVER ALL CASES
While I am not a big fan of rescue, I think it fits here because it keeps the code clean. So I've been using this to sanitize my input:
ha = {
"p_permethrin" => nil,
"p_acequinocyl"=>"0.124",
"p_captan"=>"2.12",
"p_cypermethrin"=>"-6",
"p_cyfluthrin"=>"-6",
"p_fenhexamid"=>"-1",
"p_spinetoram"=>"-6",
"p_pentachloronitrobenzene"=>"-6",
"p_zpass"=>"true"
}
Hash[ha.map { |k, v| [k, (Float(v) rescue v)] }]      # keeps nil as nil
Hash[ha.map { |k, v| [k, (Float(v) rescue v.to_s)] }] # turns nil into an empty string
I would even add it to Hash:
class Hash
  # Return a copy of the hash in which every value that Float() can parse
  # is converted to a Float; all other values are left unchanged.
  def evaluate_values
    Hash[map { |k, v| [k, (Float(v) rescue v)] }]
  end
end
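One side effect of Float() is that the negative special codes come back as floats ("-6" becomes -6.0). If those codes should stay integers, a variant can try Integer before Float. A sketch assuming Ruby 2.6+ (for the exception: keyword); parse_result is a hypothetical name, and like the code above it returns a new hash rather than modifying the input:

```ruby
# Sketch, assuming Ruby 2.6+. parse_result is a hypothetical helper.
# Integer-looking strings become Integer, decimal strings become Float,
# and nil or non-numeric strings ("true") pass through unchanged.
def parse_result(value)
  return value unless value.is_a?(String)

  Integer(value, 10, exception: false) ||
    Float(value, exception: false) ||
    value
end

results = {
  "p_permethrin"   => nil,
  "p_acequinocyl"  => "0.124",
  "p_cypermethrin" => "-6",
  "p_zpass"        => "true"
}
parsed = results.transform_values { |v| parse_result(v) } # new hash; results is untouched
# parsed["p_cypermethrin"] is the Integer -6, parsed["p_acequinocyl"] is the Float 0.124
```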

Using a regex and the ternary operator, you could incorporate this into the logic somewhere:
string =~ /\A\d+\z/ ? string.to_i : string  # \A and \z anchor the whole string, not a single line
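The choice of anchors matters here. In Ruby, ^ and $ match at the start and end of every line, while \A and \z match only at the start and end of the whole string, so /^\d+$/ accepts a multi-line value as long as one of its lines is all digits. A small illustration with a made-up sample string:

```ruby
value = "12\nabc"

value.match?(/^\d+$/)   # => true, the first line alone satisfies the pattern
value.match?(/\A\d+\z/) # => false, the whole string is not digits

# With line anchors, the ternary silently converts and drops the rest of the string.
converted = value =~ /^\d+$/ ? value.to_i : value    # => 12
safe      = value =~ /\A\d+\z/ ? value.to_i : value  # => "12\nabc"
```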

This handles not only integers but also decimals and exponent notation (note that it also converts the keys to symbols):
my_hash = {"Id"=>"1", "Name"=>"Cat", "Description"=>"Feline", "Count"=>"123"}
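result = my_hash.inject({}) { |memo, (key, value)|
  if value.match(/\A\s*[+-]?((\d+_?)*\d+(\.(\d+_?)*\d+)?|\.(\d+_?)*\d+)(\s*|([eE][+-]?(\d+_?)*\d+)\s*)\z/)
    # to_i would truncate "1.5" to 1 and "1e3" to 1, so use to_f for fractions and exponents
    memo[key.to_sym] = value =~ /[.eE]/ ? value.to_f : value.to_i
  else
    memo[key.to_sym] = value
  end
  memo
}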
Thanks to the question "Determine if a string is a valid float value" for the regexp.

Define a new method for String: String#to_number
class String
def to_number
Integer(self, 10) rescue Float(self) rescue self  # base 10, so "010" becomes 10 rather than octal 8
end
end
Test it:
"1".to_number   # => 1
"Cat".to_number # => "Cat"

Related

Ruby Set with custom class to equal basic strings

I want to be able to find a custom class in my set given just a string. Like so:
require 'set'
Rank = Struct.new(:name, keyword_init: true) {
def hash
name.hash
end
def eql?(other)
hash == other.hash
end
def ==(other)
hash == other.hash
end
}
one = Rank.new(name: "one")
two = Rank.new(name: "two")
set = Set[one, two]
But while one == "one" and one.eql?("one") are both true, set.include?("one") is still false. What am I missing?
Thanks!
Set is built upon Hash, and Hash considers two objects the same if:
[...] their hash value is identical and the two objects are eql? to each other.
What you are missing is that eql? isn't necessarily commutative. Making Rank#eql? recognize strings doesn't change the way String#eql? works:
one.eql?('one') #=> true
'one'.eql?(one) #=> false
Therefore it depends on which object is the hash key and which is the argument to include?:
Set['one'].include?(one) #=> true
Set[one].include?('one') #=> false
In order to make two objects a and b interchangeable hash keys, 3 conditions have to be met:
a.hash == b.hash
a.eql?(b) == true
b.eql?(a) == true
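The three conditions can be checked directly. A sketch that reuses the Rank definition from the question; interchangeable_keys? is a hypothetical helper, not a Ruby method:

```ruby
# Same definitions as in the question: identity is delegated to the name.
Rank = Struct.new(:name, keyword_init: true) do
  def hash
    name.hash
  end

  def eql?(other)
    hash == other.hash
  end

  def ==(other)
    hash == other.hash
  end
end

# Hypothetical helper: the three conditions for a and b to be interchangeable Hash/Set keys.
def interchangeable_keys?(a, b)
  a.hash == b.hash && a.eql?(b) && b.eql?(a)
end

one = Rank.new(name: "one")
interchangeable_keys?(one, Rank.new(name: "one")) # => true
interchangeable_keys?(one, "one")                 # => false, because "one".eql?(one) is false
```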
But don't try to modify String#eql? – fiddling with Ruby's core classes isn't recommended and monkey-patching probably won't work anyway because Ruby usually calls the C methods directly for performance reasons.
In fact, making both hash and eql? mimic name doesn't seem like a good idea in the first place. It makes the object's identity ambiguous, which can lead to very strange behavior and hard-to-find bugs:
h = { one => 1, 'one' => 1 }
#=> {#<struct Rank name="one">=>1, "one"=>1}
# vs
h = { 'one' => 1, one => 1 }
#=> {"one"=>1}
what am i missing?
What you are missing is that "one" isn't in your set. one is in your set, but "one" isn't.
Therefore, the answer Ruby is giving you is perfectly correct.
All that you have done with your implementation of Rank is that any two ranks with the same name are considered to be the same by a Hash, Set, or Array#uniq. But, a Rank is not the same as a String.
If you want to be able to have a set-like data structure where you can look up things by one of their attributes, you will have to write it yourself.
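Something like this might get you started. It relies on your Rank#hash and Rank#eql?, so a bare name is wrapped in a Rank before the lookup; only include? is overridden:
class RankSet < Set
  # Accept either a Rank or a plain name string.
  def include?(item)
    super(item.is_a?(String) ? Rank.new(name: item) : item)
  end
end
ranks = RankSet[one, two]
ranks.include?("one") # => true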
Or, instead of writing your own set, you can just use the fact that any arbitrary rank with the right name can be used for lookup:
set.include?(Rank.new(name: "one"))
#=> true
# even though it is a *different* `Rank` object
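A different way to look things up by name, not taken from either answer, is to keep a plain Hash index keyed by the attribute, next to or instead of the set. A sketch; by_name is an illustrative name, and it assumes names are unique:

```ruby
require 'set'

# A plain Struct: no custom hash/eql?, so a Rank never pretends to be a String.
Rank = Struct.new(:name, keyword_init: true)

ranks = Set[Rank.new(name: "one"), Rank.new(name: "two")]

# Index the ranks by the attribute you want to look them up by.
by_name = ranks.map { |rank| [rank.name, rank] }.to_h

by_name.key?("one")   # => true
by_name.key?("three") # => false
by_name["two"].name   # => "two"
```

This keeps Rank's identity unambiguous while still allowing lookup by a bare string.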

Ruby Hash destructive vs. non-destructive method

I couldn't find a previous post that answers my question. I'm learning how to use destructive vs. non-destructive methods in Ruby. I found an answer to the exercise I'm working on (destructively adding a number to hash values), but I want to understand why some of my earlier solutions did not work. Here's the answer that works:
def modify_a_hash(the_hash, number_to_add_to_each_value)
the_hash.each { |k, v| the_hash[k] = v + number_to_add_to_each_value}
end
These two solutions turn out to be non-destructive. Since they all use each, I can't figure out why. Is it the assignment (=) above that makes the first one destructive?
def modify_a_hash(the_hash, number_to_add_to_each_value)
the_hash.each_value { |v| v + number_to_add_to_each_value}
end
def modify_a_hash(the_hash, number_to_add_to_each_value)
the_hash.each { |k, v| v + number_to_add_to_each_value}
end
The terms "destructive" and "non-destructive" are a bit misleading here. Better is to use the conventional "in-place modification" vs. "returns a copy" terminology.
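Generally, methods that modify in place have ! at the end of their name as a warning, like gsub! for String. Not every in-place method has one, though: push for Array modifies the array without a !, because ! conventionally marks the more dangerous version of a method that also has a safe counterpart.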
The = performs an assignment into the hash within the loop. Your other examples don't do anything useful: v + number_to_add_to_each_value computes a new value that is immediately discarded, and each returns the original object being iterated over regardless of what the block produces.
If you wanted to return a copy you'd do this:
def modify_a_hash(the_hash, number_to_add)
Hash[
the_hash.collect do |k, v|
[ k, v + number_to_add ]
end
]
end
That would return a copy. The inner operation collect transforms key-value pairs into new key-value pairs with the adjustment applied. No = is required since there's no assignment.
The outer method Hash[] transforms those key-value pairs into a proper Hash object. This is then returned and is independent of the original.
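Since Ruby 2.4, Hash#transform_values and Hash#transform_values! cover both variants directly: the first returns a new hash, the bang version rewrites the receiver's values in place. A sketch; add_to_values and add_to_values! are hypothetical names:

```ruby
# Requires Ruby 2.4+. add_to_values / add_to_values! are hypothetical helpers.
def add_to_values(hash, number_to_add)
  hash.transform_values { |v| v + number_to_add }  # returns a new hash
end

def add_to_values!(hash, number_to_add)
  hash.transform_values! { |v| v + number_to_add } # rewrites hash's values in place
end

scores = { a: 1, b: 2 }
copy = add_to_values(scores, 10)
# copy[:a] is 11, scores[:a] is still 1

add_to_values!(scores, 10)
# scores[:a] is now 11
```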
Generally a non-destructive or "return a copy" method needs to create a new, independent version of the thing it's manipulating for the purpose of storing the results. This applies to String, Array, Hash, or any other class or container you might be working with.
Maybe this slightly different example will be helpful.
We have a hash:
2.0.0-p481 :014 > hash
=> {1=>"ann", 2=>"mary", 3=>"silvia"}
Then we iterate over it and change all the letters to the uppercase:
2.0.0-p481 :015 > hash.each { |key, value| value.upcase! }
=> {1=>"ANN", 2=>"MARY", 3=>"SILVIA"}
The original hash has changed because we used the upcase! method, which modifies each string in place.
Compare this with downcase, which has no ! and does not modify the hash values:
2.0.0-p481 :017 > hash.each { |key, value| value.downcase }
=> {1=>"ANN", 2=>"MARY", 3=>"SILVIA"}
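The distinction here is between mutating the String objects stored in the hash and merely rebinding the block variable. A sketch contrasting the two; the variable names are illustrative:

```ruby
# dup gives mutable strings even if frozen_string_literal is enabled.
names = { 1 => "ann".dup, 2 => "mary".dup }
same_object = names[1] # another reference to the String stored under key 1

# Reassigning the block variable changes only the local name; the hash is untouched.
names.each { |_key, value| value = value.upcase }
after_rebinding = names[1].dup # still "ann"

# upcase! mutates the String object itself, so every reference sees the change.
names.each { |_key, value| value.upcase! }
# names[1] is "ANN", and so is same_object, because both point to the same object
```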

How can I know how many parameters a method passes to a block?

If I have two variables like a and h.
a = ["cat", "dog", "mat"]
h = {cat: 'gatto', dog: 'cane', mat: 'stuoia'} # (Italian translations)
If I call .each on them without knowing what kind of object they point to, how can I know that the block passed to a.each takes one parameter while the block passed to h.each takes two?
In other words, when I pass a block to a method, how can I know how many block parameters the method will set?
Is there some_method which returns the number of parameters a block should take? So that obj.general_method_that_takes_a_block.some_method would return the number of parameters that general_method_that_takes_a_block passes to its block?
A straightforward way is:
a.each{|e| p [*e].length}
# => 1 1 1
h.each{|e| p [*e].length}
# => 2 2 2
The each block always gets a single argument; it never gets two. In the Hash case, when you do this:
h.each { |k, v| ... }
Ruby is, more or less, doing this behind your back:
h.each { |a| k, v = a; ... }
So you could check if the block's argument is an Array:
e.each do |x|
if x.kind_of? Array
# e might be a Hash
else
# e might be an Array
end
end
The problem is that e might be something like [ [1,2], [3,4] ] which would incorrectly put you into the might be a Hash branch; this sort of e will also fool a [*e].length check.
I don't think there is any clean and simple way to know what you're iterating over from inside the block.
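Proc#arity tells you how many parameters a block declares, but not how many values a method will yield to it. One way to find the latter empirically, sketched below with a hypothetical helper yielded_value_count, is to call the method with a block that collects everything through a splat (|*args|) and stop at the first yield. It needs a non-empty receiver and really runs the method, so it is a debugging aid rather than a type check. It also confirms the point above: Hash#each yields a single [key, value] pair, whereas Enumerable#each_with_index yields two separate values.

```ruby
# Hypothetical helper: how many values does `method_name` yield on its first iteration?
def yielded_value_count(receiver, method_name)
  # `return` inside the block exits yielded_value_count at the first yield.
  receiver.public_send(method_name) { |*args| return args.length }
  nil # the method never yielded (for example, an empty collection)
end

a = ["cat", "dog", "mat"]
h = { cat: "gatto", dog: "cane", mat: "stuoia" }

yielded_value_count(a, :each)            # => 1
yielded_value_count(h, :each)            # => 1, a single [key, value] pair
yielded_value_count(a, :each_with_index) # => 2, the element and its index

# Proc#arity describes the block itself, not what a method will yield to it.
proc { |k, v| }.arity                    # => 2
```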

Ruby regex selecting multiple words at the same time

I have a hash that I am using regex on to select what key/value pairs I want. Here is the method I have written:
def extract_gender_race_totals(gender, race)
totals = @data.select {|k,v| k.to_s.match(/(#{gender})(#{race})/)}
temp = 0
totals.each {|key, value| temp += value}
temp
end
The hash looks like this:
@data = {
:number_of_african_male_senior_managers=>2,
:number_of_coloured_male_senior_managers=>0,
:number_of_indian_male_senior_managers=>0,
:number_of_white_male_senior_managers=>0,
:number_of_african_female_senior_managers=>0,
:number_of_coloured_female_senior_managers=>0,
:number_of_indian_female_senior_managers=>0,
:number_of_white_female_senior_managers=>0,
:number_of_african_male_middle_managers=>2,
:number_of_coloured_male_middle_managers=>0,
:number_of_indian_male_middle_managers=>0,
:number_of_white_male_middle_managers=>0,
:number_of_african_female_middle_managers=>0,
:number_of_coloured_female_middle_managers=>0,
:number_of_indian_female_middle_managers=>0,
:number_of_white_female_middle_managers=>0,
:number_of_african_male_junior_managers=>0,
:number_of_coloured_male_junior_managers=>0,
:number_of_indian_male_junior_managers=>0,
:number_of_white_male_junior_managers=>0,
:number_of_african_female_junior_managers=>0,
:number_of_coloured_female_junior_managers=>0,
:number_of_indian_female_junior_managers=>0,
:number_of_white_female_junior_managers=>0
}
but it's repopulated with data after an SQL query.
I would like to make it so that the key must contain both the race and the gender in order for it to return something. Otherwise it must return 0. Is this right or is the regex syntax off?
It's returning 0 for all, which it shouldn't.
So, for example:
%td.total_cell= @ee_demographics_presenter.extract_gender_race_totals("male","african")
should return 4, since there are 4 African male managers.
Try something like this.
def extract_gender_race_totals(gender, race)
@data.select { |k, v| k.to_s.match(/#{race}_#{gender}/) }.values.reduce(0, :+) # the 0 seed returns 0, not nil, when nothing matches
end
extract_gender_race_totals("male", "african")
# => 4
gmalete's answer gives an elegant solution; here is an explanation of why your regexp isn't quite right. If you corrected the regexp, I think your approach would work; it just isn't as idiomatic Ruby.
/(#{gender})(#{race})/ won't match number_of_african_male_senior_managers for 2 reasons:
1) the race comes before the gender in the hash key and 2) there is an underscore in the hash key that needs to be in the regexp. e.g.
/(#{race})_(#{gender})/
would work, but the parentheses aren't needed so this can be simplified to
/#{race}_#{gender}/
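Putting the corrected regexp together, a standalone sketch: gender_race_total is a hypothetical name, the data is passed in as an argument instead of being read from @data, and the sample is a subset of the hash from the question. The surrounding underscores and Regexp.escape are choices made here so that only whole key segments match; match? needs Ruby 2.4+.

```ruby
# Hypothetical standalone version of extract_gender_race_totals.
def gender_race_total(data, gender, race)
  # Keys look like :number_of_<race>_<gender>_<level>_managers, so race comes first.
  pattern = /_#{Regexp.escape(race)}_#{Regexp.escape(gender)}_/
  # reduce(0, :+) yields 0 rather than nil when no key matches.
  data.select { |key, _count| key.to_s.match?(pattern) }.values.reduce(0, :+)
end

sample = {
  number_of_african_male_senior_managers: 2,
  number_of_african_female_senior_managers: 0,
  number_of_african_male_middle_managers: 2,
  number_of_white_male_middle_managers: 0
}
gender_race_total(sample, "male", "african") # => 4
gender_race_total(sample, "female", "white") # => 0
```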
Rather than having specific methods to query pieces of your keys (i.e. "gender_race"), you could make a general method to query any attribute in any order:
def extract_totals(*keywords)
keywords.inject(@data) { |memo, keyword| memo.select { |k, v| k.to_s =~ /_#{Regexp.escape(keyword)}(?:_|\b)/ } }.values.reduce(0, :+)
end
Usage:
extract_totals("senior")
extract_totals("male", "african")
extract_totals("managers") # maybe you'll have _employees later...
# etc.
Not exactly what you asked for, but maybe it will help.

Search ruby hash for empty value

I have a ruby hash like this
h = {"a" => "1", "b" => "", "c" => "2"}
Now I have a Ruby function which evaluates this hash and returns true if it finds a key with an empty value. The following function always returns true, even when none of the values in the hash is empty:
def hash_has_blank(hsh)
hsh.each do |k,v|
if v.empty?
return true
end
end
return false
end
What am I doing wrong here?
Try this:
def hash_has_blank hsh
hsh.values.any? &:empty?
end
Or:
def hash_has_blank hsh
hsh.values.any?{|i|i.empty?}
end
The block form also works on old 1.8.x Rubies, where Symbol#to_proc (the &:empty? shorthand) may not be available.
I hope you're ready to learn some Ruby magic here. I wouldn't define such a function globally like you did. If it's an operation on a hash, then it should be an instance method on the Hash class. You can do it like this:
class Hash
def has_blank?
self.reject { |k, v| !v.nil? && v.length > 0 }.size > 0 # keep only nil or empty values
end
end
reject returns a new hash containing only the blank values (nil or empty), and then it checks how big this new hash is.
A possibly more efficient way (any? stops at the first blank value instead of building a new hash):
class Hash
def has_blank?
self.values.any?{|v| v.nil? || v.length == 0}
end
end
But this will still traverse the whole hash, if there is no empty value
I've replaced empty? with explicit nil? and length checks because I don't know how your empty? method works, and so that nil values also count as blank.
If you just want to check if any of the values is an empty string you could do
h.has_value?('')
but your function seems to work fine.
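Hash#any? can also be called on the hash itself, which avoids building the intermediate values array and stops at the first blank value. A sketch; the helper name is hypothetical, and treating whitespace-only strings as blank is an assumption (similar in spirit to ActiveSupport's blank?, but not that method):

```ruby
# Hypothetical helper: true if any value is nil, empty, or whitespace-only.
def hash_has_blank?(hsh)
  # Hash#any? yields [key, value] pairs and stops at the first match.
  hsh.any? { |_key, value| value.nil? || value.to_s.strip.empty? }
end

hash_has_blank?({ "a" => "1", "b" => "", "c" => "2" }) # => true
hash_has_blank?({ "a" => "1", "c" => "2" })            # => false
```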
I'd consider refactoring your model domain. Obviously the hash represents something tangible. Why not make it an object? If the item can be completely represented by a hash, you may wish to subclass Hash. If it's more complicated, the hash can be an attribute.
Secondly, the reason for which you are checking blanks can be named to better reflect your domain. You haven't told us the "why", but let's assume that your Item is only valid if it doesn't have any blank values.
class MyItem < Hash
def valid?
!invalid?
end
def invalid?
values.any?{|i| i.empty?}
end
end
The point is, if you can establish a vocabulary that makes sense in your domain, your code will be cleaner and more understandable. Using a Hash is just a means to an end and you'd be better off using more descriptive, domain-specific terms.
Using the example above, you'd be able to do:
my_item = MyItem["a" => "1", "b" => "", "c" => "2"]
my_item.valid? #=> false
