Replace keys from the hash by values from the hash - ruby

I have a hash (with hundreds of pairs) and I have a string.
I want to replace every occurrence of each hash key in this string with the corresponding value from the hash.
I understand that I can do something like this
some_hash.each { |key, value| str = str.gsub(key, value) }
However, I am wondering whether there is some better (performance wise) method to do this.

You only need to run gsub once. Since regex (oniguruma) is implemented in C, it should be faster than looping within Ruby.
some_hash = {
"a" => "A",
"b" => "B",
"c" => "C",
}
"abcdefgabcdefg".gsub(Regexp.union(some_hash.keys), some_hash)
# => "ABCdefgABCdefg"
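One caveat worth checking with the single-pass approach: a regex alternation tries its alternatives left to right, so if one key is a prefix of another, the shorter key can win. Below is a minimal sketch, assuming you want the longest key to take precedence. The `replacements` hash and the sample string are made up for illustration.

```ruby
# Hypothetical data: "a" is a prefix of "ab".
replacements = { "a" => "1", "ab" => "2" }

# Alternatives are tried left to right, so "a" matches before "ab" is considered.
naive = Regexp.union(replacements.keys)
naive_result = "abc".gsub(naive, replacements)

# Sorting the keys longest-first lets the longer key match where it applies.
longest_first = Regexp.union(replacements.keys.sort_by { |k| -k.length })
sorted_result = "abc".gsub(longest_first, replacements)

puts naive_result   # 1bc
puts sorted_result  # 2c
```

If none of your keys overlap, the sort is unnecessary and the one-liner above is enough.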

Some benchmarks:
require 'benchmark'
SOME_HASH = Hash[('a'..'z').zip('A'..'Z')]
SOME_REGEX = Regexp.union(SOME_HASH.keys)
SHORT_STRING = ('a'..'z').to_a.join
LONG_STRING = SHORT_STRING * 100
N = 10_000
def sub1(str)
SOME_HASH.each { |key, value|
str = str.gsub(key, value)
}
str
end
def sub2(str)
SOME_HASH.each { |key, value|
str.gsub!(key, value)
}
str
end
def sub_regex(str)
str.gsub(SOME_REGEX, SOME_HASH)
end
puts RUBY_VERSION
puts "#{ N } loops"
puts
puts "sub1: #{ sub1(SHORT_STRING) }"
puts "sub2: #{ sub2(SHORT_STRING) }"
puts "sub_regex: #{ sub_regex(SHORT_STRING) }"
puts
Benchmark.bm(10) do |b|
b.report('gsub') { N.times { sub1(LONG_STRING) } }
b.report('gsub!') { N.times { sub2(LONG_STRING) } }
b.report('regex') { N.times { sub_regex(LONG_STRING) } }
end
Which outputs:
1.9.3
10000 loops
sub1: ABCDEFGHIJKLMNOPQRSTUVWXYZ
sub2: ABCDEFGHIJKLMNOPQRSTUVWXYZ
sub_regex: ABCDEFGHIJKLMNOPQRSTUVWXYZ
user system total real
gsub 14.360000 0.030000 14.390000 ( 14.412178)
gsub! 1.940000 0.010000 1.950000 ( 1.957591)
regex 0.080000 0.000000 0.080000 ( 0.075038)


Convert Hash to OpenStruct recursively

Given I have this hash:
h = { a: 'a', b: 'b', c: { d: 'd', e: 'e'} }
And I convert to OpenStruct:
o = OpenStruct.new(h)
=> #<OpenStruct a="a", b="b", c={:d=>"d", :e=>"e"}>
o.a
=> "a"
o.b
=> "b"
o.c
=> {:d=>"d", :e=>"e"}
o.c.d
NoMethodError: undefined method `d' for {:d=>"d", :e=>"e"}:Hash
I want all the nested keys to be methods as well. So I can access d as such:
o.c.d
=> "d"
How can I achieve this?
You can monkey-patch the Hash class:
require 'json'
require 'ostruct'
class Hash
def to_o
JSON.parse to_json, object_class: OpenStruct
end
end
then you can say
h = { a: 'a', b: 'b', c: { d: 'd', e: 'e'} }
o = h.to_o
o.c.d # => 'd'
See Convert a complex nested hash to an object.
I came up with this solution:
h = { a: 'a', b: 'b', c: { d: 'd', e: 'e'} }
json = h.to_json
=> "{\"a\":\"a\",\"b\":\"b\",\"c\":{\"d\":\"d\",\"e\":\"e\"}}"
object = JSON.parse(json, object_class:OpenStruct)
object.c.d
=> "d"
So for this to work, I had to do an extra step: convert it to json.
Personally, I use the recursive-open-struct gem. It's then as simple as RecursiveOpenStruct.new(<nested_hash>).
But for the sake of recursion practice, I'll show you a fresh solution:
require 'ostruct'
def to_recursive_ostruct(hash)
result = hash.each_with_object({}) do |(key, val), memo|
memo[key] = val.is_a?(Hash) ? to_recursive_ostruct(val) : val
end
OpenStruct.new(result)
end
puts to_recursive_ostruct(a: { b: 1}).a.b
# => 1
edit
Weihang Jian showed a slight improvement to this here https://stackoverflow.com/a/69311716/2981429
def to_recursive_ostruct(hash)
hash.each_with_object(OpenStruct.new) do |(key, val), memo|
memo[key] = val.is_a?(Hash) ? to_recursive_ostruct(val) : val
end
end
Also see https://stackoverflow.com/a/63264908/2981429 which shows how to handle arrays
note
The reason this is better than the JSON-based solutions is that you can lose data when you convert to JSON. For example, if you convert a Time object to JSON and then parse it, you get a string back. There are many other examples of this:
class Foo; end
JSON.parse({obj: Foo.new}.to_json)["obj"]
# => "#<Foo:0x00007fc8720198b0>"
yeah ... not super useful. You've completely lost your reference to the actual instance.
Here's a recursive solution that avoids converting the hash to json:
def to_o(obj)
if obj.is_a?(Hash)
return OpenStruct.new(obj.map{ |key, val| [ key, to_o(val) ] }.to_h)
elsif obj.is_a?(Array)
return obj.map{ |o| to_o(o) }
else # Assumed to be a primitive value
return obj
end
end
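To make the difference from the JSON round trip concrete, here is a small sketch that runs the same recursive conversion on a hash containing an array and a Time value. The sample `data` hash is hypothetical, and `to_o` is repeated (written with `case`) only so the snippet is self-contained.

```ruby
require 'ostruct'
require 'json'

# Same recursive conversion as above, repeated so the sketch runs on its own.
def to_o(obj)
  case obj
  when Hash  then OpenStruct.new(obj.map { |key, val| [key, to_o(val)] }.to_h)
  when Array then obj.map { |o| to_o(o) }
  else obj
  end
end

# Hypothetical sample data with a nested array and a Time value.
data = { users: [{ name: 'John' }, { name: 'Jane' }], created_at: Time.now }

direct = to_o(data)
via_json = JSON.parse(data.to_json, object_class: OpenStruct)

puts direct.users.first.name    # John
puts direct.created_at.class    # Time
puts via_json.created_at.class  # String
```

Hashes inside arrays become OpenStructs too, and non-JSON values such as Time keep their class.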
My solution is cleaner and faster than @max-pleaner's.
I don't actually know why, but I don't instantiate extra Hash objects:
def dot_access(hash)
hash.each_with_object(OpenStruct.new) do |(key, value), struct|
struct[key] = value.is_a?(Hash) ? dot_access(value) : value
end
end
Here is the benchmark for your reference:
require 'ostruct'
def dot_access(hash)
hash.each_with_object(OpenStruct.new) do |(key, value), struct|
struct[key] = value.is_a?(Hash) ? dot_access(value) : value
end
end
def to_recursive_ostruct(hash)
result = hash.each_with_object({}) do |(key, val), memo|
memo[key] = val.is_a?(Hash) ? to_recursive_ostruct(val) : val
end
OpenStruct.new(result)
end
require 'benchmark/ips'
Benchmark.ips do |x|
hash = { a: 1, b: 2, c: { d: 3 } }
x.report('dot_access') { dot_access(hash) }
x.report('to_recursive_ostruct') { to_recursive_ostruct(hash) }
end
Warming up --------------------------------------
dot_access 4.843k i/100ms
to_recursive_ostruct 5.218k i/100ms
Calculating -------------------------------------
dot_access 51.976k (± 5.0%) i/s - 261.522k in 5.044482s
to_recursive_ostruct 50.122k (± 4.6%) i/s - 250.464k in 5.008116s
My solution, based on max pleaner's answer and similar to Xavi's answer:
require 'ostruct'
def initialize_open_struct_deeply(value)
case value
when Hash
OpenStruct.new(value.transform_values { |hash_value| send __method__, hash_value })
when Array
value.map { |element| send __method__, element }
else
value
end
end
Here is one way to override the initializer so you can do OpenStruct.new({ a: "b", c: { d: "e", f: ["g", "h", "i"] }}).
Further, this class is included when you require 'json', so be sure to do this patch after the require.
class OpenStruct
def initialize(hash = nil)
@table = {}
if hash
hash.each_pair do |k, v|
self[k] = v.is_a?(Hash) ? OpenStruct.new(v) : v
end
end
end
def keys
@table.keys.map{|k| k.to_s}
end
end
Basing a conversion on OpenStruct works fine until it doesn't. For instance, none of the other answers here properly handle these simple hashes:
people = { person1: { display: { first: 'John' } } }
creds = { oauth: { trust: true }, basic: { trust: false } }
The method below works with those hashes, modifying the input hash rather than returning a new object.
def add_indifferent_access!(hash)
hash.each_pair do |k, v|
hash.instance_variable_set("@#{k}", v.tap { |v| send(__method__, v) if v.is_a?(Hash) } )
hash.define_singleton_method(k, proc { hash.instance_variable_get("@#{k}") } )
end
end
then
add_indifferent_access!(people)
people.person1.display.first # => 'John'
Or if your context calls for a more inline call structure:
creds.yield_self(&method(:add_indifferent_access!)).oauth.trust # => true
Alternatively, you could mix it in:
module HashExtension
def very_indifferent_access!
each_pair do |k, v|
instance_variable_set("@#{k}", v.tap { |v| v.extend(HashExtension) && v.send(__method__) if v.is_a?(Hash) } )
define_singleton_method(k, proc { self.instance_variable_get("@#{k}") } )
end
end
end
and apply to individual hashes:
favs = { song1: { title: 'John and Marsha', author: 'Stan Freberg' } }
favs.extend(HashExtension).very_indifferent_access!
favs.song1.title
Here is a variation for monkey-patching Hash, should you opt to do so:
class Hash
def with_very_indifferent_access!
each_pair do |k, v|
instance_variable_set("@#{k}", v.tap { |v| v.send(__method__) if v.is_a?(Hash) } )
define_singleton_method(k, proc { instance_variable_get("@#{k}") } )
end
end
end
# Note the omission of "v.extend(HashExtension)" vs. the mix-in variation.
Comments to other answers expressed a desire to retain class types. This solution accommodates that.
people = { person1: { created_at: Time.now } }
people.with_very_indifferent_access!
people.person1.created_at.class # => Time
Whatever solution you choose, I recommend testing with this hash:
people = { person1: { display: { first: 'John' } }, person2: { display: { last: 'Jingleheimer' } } }
If you are ok with monkey-patching the Hash class, you can do:
require 'ostruct'
module Structurizable
def each_pair(&block)
each do |k, v|
v = OpenStruct.new(v) if v.is_a? Hash
yield k, v
end
end
end
Hash.prepend Structurizable
people = { person1: { display: { first: 'John' } }, person2: { display: { last: 'Jingleheimer' } } }
puts OpenStruct.new(people).person1.display.first
Ideally, instead of prepending this module, we should be able to use a refinement, but for some reason I can't understand, it didn't work for the each_pair method (unfortunately, refinements are still pretty limited).

Retrieving from an array an object that satisfies some characteristics

I have some objects in an array objects. Given a certain property-value pair, I need a function that returns the first object that matches this. For example, given objects.byName "John", it should return the first object with name: "John".
Currently I'm doing this:
def self.byName name
ID_obj_by_name = {}
@@objects.each_with_index do |o, index|
ID_obj_by_name[o.name] = index
end
@@objects[ID_obj_by_name[name]]
end
But it seems very slow, and is using a lot of memory. How can I improve this?
If you need performance, you should consider this approach:
require 'benchmark'
class Foo
def initialize(name)
@name = name
end
def name
@name
end
end
# Using array ######################################################################
test = []
500000.times do |i|
test << Foo.new("ABC" + i.to_s + "#!###!DS")
end
puts "using array"
time = Benchmark.measure {
result = test.find { |o| o.name == "ABC250000#!###!DS" }
}
puts time
####################################################################################
# Using a hash #####################################################################
test = {}
i_am_your_object = Object.new
500000.times do |i|
test["ABC" + i.to_s + "#!###!DS"] = i_am_your_object
end
puts "using hash"
time = Benchmark.measure {
result = test["ABC250000#!###!DS"]
}
puts time
####################################################################################
Results:
using array
0.060000 0.000000 0.060000 ( 0.060884)
using hash
0.000000 0.000000 0.000000 ( 0.000005)
Try something like
def self.by_name name
@@objects.find { |o| o.name == name }
end
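The hash benchmark above only measures the lookup itself. If you call by_name often, one option is to keep an index hash next to the array and fill it as objects are created. This is only a sketch: the `Person` class and the `@@index_by_name` variable are made-up names, and it assumes names do not change after an object is created.

```ruby
# Hypothetical registry class; Person and @@index_by_name are illustrative names.
class Person
  attr_reader :name

  @@objects = []
  @@index_by_name = {}

  def initialize(name)
    @name = name
    @@objects << self
    # ||= keeps the first object registered under each name.
    @@index_by_name[name] ||= self
  end

  # Hash lookup instead of scanning the array on every call.
  def self.by_name(name)
    @@index_by_name[name]
  end
end

first_john = Person.new("John")
Person.new("Jane")
Person.new("John")

puts Person.by_name("John").equal?(first_john)  # true
puts Person.by_name("Nobody").inspect           # nil
```

The trade-off is extra memory for the index and the need to keep it in sync if objects are removed or renamed. For a one-off lookup, find is simpler.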

Get a list of all the prefixes of a string

Is there any built-in method in the Ruby String class that can give me all the prefixes of a string? Something like:
"ruby".all_prefixes => ["ruby", "rub", "ru", "r"]
Currently I have made a custom function for this:
def all_prefixes search_string
dup_string = search_string.dup
return_list = []
while(dup_string.length != 0)
return_list << dup_string.dup
dup_string.chop!
end
return_list
end
But I am looking for something more rubylike, less code and something magical.
Note: of course it goes without saying original_string should remain as it is.
No, there is no built-in method for this. You could do it like this:
def all_prefixes(string)
string.size.times.collect { |i| string[0..i] }
end
all_prefixes('ruby')
# => ["r", "ru", "rub", "ruby"]
A quick benchmark:
require 'fruity'
string = 'ruby'
compare do
toro2k do
string.size.times.collect { |i| string[0..i] }
end
marek_lipka do
(0...(string.length)).map{ |i| string[0..i] }
end
jorg_w_mittag do
string.chars.inject([[], '']) { |(res, memo), c|
[res << memo += c, memo]
}.first
end
jorg_w_mittag_2 do
acc = ''
string.chars.map {|c| acc += c }
end
stefan do
Array.new(string.size) { |i| string[0..i] }
end
end
And the winner is:
Running each test 512 times. Test will take about 1 second.
jorg_w_mittag_2 is faster than stefan by 19.999999999999996% ± 10.0%
stefan is faster than marek_lipka by 10.000000000000009% ± 10.0%
marek_lipka is faster than jorg_w_mittag by 10.000000000000009% ± 1.0%
jorg_w_mittag is similar to toro2k
def all_prefixes(str)
acc = ''
str.chars.map {|c| acc += c }
end
What about
str = "ruby"
prefixes = Array.new(str.size) { |i| str[0..i] } #=> ["r", "ru", "rub", "ruby"]
This is maybe a long shot, but if you want to find distinct abbreviations for a set of strings, you can use the Abbrev module:
require 'abbrev'
Abbrev.abbrev(['ruby']).keys
=> ["rub", "ru", "r", "ruby"]
A little bit shorter form:
def all_prefixes(search_string)
(0...(search_string.length)).map{ |i| search_string[0..i] }
end
all_prefixes 'ruby'
# => ["r", "ru", "rub", "ruby"]
def all_prefixes(str)
str.chars.inject([[], '']) {|(res, memo), c| [res << memo += c, memo] }.first
end
str = "ruby"
prefixes = str.size.times.map { |i| str[0..i] } #=> ["r", "ru", "rub", "ruby"]
Two solutions not mentioned before, both faster than those in @toro2k's accepted comparison answer:
(1..s.size).map { |i| s[0, i] }
=> ["r", "ru", "rub", "ruby"]
Array.new(s.size) { |i| s[0, i+1] }
=> ["r", "ru", "rub", "ruby"]
Strangely, nobody used String#[start, length] before, only the slower String#[range].
And I think at least my first solution is quite straightforward.
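As a quick check against the question's requirements, here is a sketch that wraps the first variant in a method and runs it on a frozen string, so any in-place mutation of the original would raise. The method name `all_prefixes` follows the question. The `reverse` call is only there to get the longest-first order the question showed.

```ruby
def all_prefixes(str)
  # String#[start, length] returns a new string each time; str is not modified.
  (1..str.size).map { |i| str[0, i] }
end

original = "ruby".freeze  # frozen: mutating it would raise FrozenError
prefixes = all_prefixes(original)

p prefixes          # ["r", "ru", "rub", "ruby"]
p prefixes.reverse  # ["ruby", "rub", "ru", "r"]
```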
Benchmark results (using Ruby 2.4.2):
user system total real
toro2k 14.594000 0.000000 14.594000 ( 14.724630)
marek_lipka 12.485000 0.000000 12.485000 ( 12.635404)
jorg_w_mittag 16.968000 0.000000 16.968000 ( 17.080315)
jorg_w_mittag_2 11.828000 0.000000 11.828000 ( 11.935078)
stefan 10.766000 0.000000 10.766000 ( 10.831517)
stefanpochmann 9.734000 0.000000 9.734000 ( 9.765227)
stefanpochmann 2 8.219000 0.000000 8.219000 ( 8.240854)
My benchmark code:
require 'benchmark'
string = 'ruby'
@n = 10**7
Benchmark.bm(20) do |x|
@x = x
def report(name, &block)
@x.report(name) {
@n.times(&block)
}
end
report('toro2k') {
string.size.times.collect { |i| string[0..i] }
}
report('marek_lipka') {
(0...(string.length)).map{ |i| string[0..i] }
}
report('jorg_w_mittag') {
string.chars.inject([[], '']) { |(res, memo), c|
[res << memo += c, memo]
}.first
}
report('jorg_w_mittag_2') {
acc = ''
string.chars.map {|c| acc += c }
}
report('stefan') {
Array.new(string.size) { |i| string[0..i] }
}
report('stefanpochmann') {
(1..string.size).map { |i| string[0, i] }
}
report('stefanpochmann 2') {
Array.new(string.size) { |i| string[0, i+1] }
}
end

ruby string to hash conversion

I have a string like this,
str = "uu#p, xx#m, yy#n, zz#m"
I want to know how to convert this string into a hash. That is, I need to know which values (the parts before the # symbol) belong to m, n and p. I don't want counts; I need the exact values. Ideally the output would look like this:
{"m" => ["xx", "zz"], "n" => ["yy"], "p" => ["uu"]}
Can anyone help me, please?
Direct copy/paste of an IRB session:
>> str.split(/, /).inject(Hash.new{|h,k|h[k]=[]}) do |h, s|
.. v,k = s.split(/#/)
.. h[k] << v
.. h
.. end
=> {"p"=>["uu"], "m"=>["xx", "zz"], "n"=>["yy"]}
Simpler code for a newbie :)
str = "uu#p, xx#m, yy#n, zz#m"
h = {}
str.split(", ").each do |x| # split on ", " so values carry no leading space
v,k = x.split('#')
h[k] ||= []
h[k].push(v)
end
p h
FP style:
grouped = str
.split(", ")
.group_by { |s| s.split("#")[1] }
.transform_values { |ss| ss.map { |s| s.split("#")[0] } }
#=> {"m"=>["xx", "zz"], "n"=>["yy"], "p"=>["uu"]}
This is a pretty common pattern. Using Facets.map_by:
require 'facets'
str.split(", ").map_by { |s| s.split("#", 2).reverse }
#=> {"m"=>["xx", "zz"], "n"=>["yy"], "p"=>["uu"]}

ruby fast reading from std

What is the fastest way to read from STDIN a number of 1000000 characters (integers), and split it into an array of one character integers (not strings) ?
123456 > [1,2,3,4,5,6]
The quickest method I have found so far is as follows :-
gets.unpack("c*").map { |c| c-48}
Here are some results from benchmarking most of the provided solutions. These tests were run with a 100,000 digit file but with 10 reps for each test.
user system total real
each_char_full_array: 1.780000 0.010000 1.790000 ( 1.788893)
each_char_empty_array: 1.560000 0.010000 1.570000 ( 1.572162)
map_byte: 0.760000 0.010000 0.770000 ( 0.773848)
gets_scan 2.220000 0.030000 2.250000 ( 2.250076)
unpack: 0.510000 0.020000 0.530000 ( 0.529376)
And here is the code that produced them
#!/usr/bin/env ruby
require "benchmark"
MAX_ITERATIONS = 100000
FILE_NAME = "1_million_digits"
def build_test_file
File.open(FILE_NAME, "w") do |f|
MAX_ITERATIONS.times {|x| f.syswrite rand(10)}
end
end
def each_char_empty_array
STDIN.reopen(FILE_NAME)
a = []
STDIN.each_char do |c|
a << c.to_i
end
a
end
def each_char_full_array
STDIN.reopen(FILE_NAME)
a = Array.new(MAX_ITERATIONS)
idx = 0
STDIN.each_char do |c|
a[idx] = c.to_i
idx += 1
end
a
end
def map_byte()
STDIN.reopen(FILE_NAME)
a = STDIN.each_byte.map { |c| c-48 } # IO#bytes was removed in Ruby 3.0; each_byte is equivalent
a[-1] == -38 && a.pop
a
end
def gets_scan
STDIN.reopen(FILE_NAME)
gets.scan(/\d/).map(&:to_i)
end
def unpack
STDIN.reopen(FILE_NAME)
gets.unpack("c*").map { |c| c-48}
end
reps = 10
build_test_file
Benchmark.bm(10) do |x|
x.report("each_char_full_array: ") { reps.times {|y| each_char_full_array}}
x.report("each_char_empty_array:") { reps.times {|y| each_char_empty_array}}
x.report("map_byte: ") { reps.times {|y| map_byte}}
x.report("gets_scan ") { reps.times {|y| gets_scan}}
x.report("unpack: ") { reps.times {|y| unpack}}
end
This should be reasonably fast:
a = []
STDIN.each_char do |c|
a << c.to_i
end
although some rough benchmarking shows this hackish version is considerably faster:
a = STDIN.each_byte.map { |c| c-48 } # IO#bytes was removed in Ruby 3.0; each_byte is equivalent
scan(/\d/).map(&:to_i)
This will split any string into an array of integers, ignoring any non-numeric characters. If you want to grab user input from STDIN add gets:
gets.scan(/\d/).map(&:to_i)
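One detail to watch with the byte-based versions: gets keeps the trailing newline, so subtracting 48 from every byte leaves a stray -38 at the end (the benchmark's map_byte pops it off for that reason). Here is a minimal sketch that chomps the line first. StringIO stands in for STDIN here purely so the snippet runs without piped input.

```ruby
require 'stringio'

# StringIO stands in for STDIN so the sketch runs without piped input.
input = StringIO.new("123456\n")

# chomp drops the trailing newline before the bytes are converted;
# each_byte yields integer byte values, and "0" is byte 48.
digits = input.gets.chomp.each_byte.map { |b| b - 48 }

p digits  # [1, 2, 3, 4, 5, 6]
```

With real input you would call gets (or STDIN.gets) in place of input.gets.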
