gsub method and regex (case sensitive and case insensitive) - ruby

In Ruby, I want to substitute some letters in a string. Is there a better way of doing this?
string = "my random string"
string.gsub(/a/, "#").gsub(/i/, "1").gsub(/o/, "0")
And if I want to substitute both "a" and "A" with a "#", I know I can do .gsub(/a/i, "#"), but what if I want to substitute every "a" with an "e" and every "A" with an "E"? Is there a way of abstracting it instead of specifying both like .gsub(/a/, "e").gsub(/A/, "E")?

You can use a Hash, e.g.:
h = {'a' => '#', 'b' => 'B', 'A' => 'E'}
"aAbBcC".gsub(/[abA]/, h)
# => "#EBBcC"
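For the case-preserving part of the question, one possible sketch is to generate both the lower-case and upper-case pairs from a single mapping and pass the resulting hash to gsub. The helper name case_preserving_map is made up for this illustration and is not part of Ruby or of the answer above.

```ruby
# Hypothetical helper (an assumption for this sketch): expands a mapping
# so that each pair covers both the lower-case and upper-case letters.
def case_preserving_map(pairs)
  pairs.each_with_object({}) do |(from, to), h|
    h[from.downcase] = to.downcase
    h[from.upcase]   = to.upcase
  end
end

map = case_preserving_map('a' => 'e', 'o' => 'u')
# map is {"a"=>"e", "A"=>"E", "o"=>"u", "O"=>"U"}

# The /i flag makes the class match both cases; the hash picks the replacement.
result = "A random Option".gsub(/[ao]/i, map)
# result is "E rendum Uptiun"
```

This keeps the rules in one place: adding a letter means adding one pair to the mapping and one character to the class.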

This does not answer your question directly, but here is another way to proceed: use String#tr (translation):
'aaAA'.tr('aA', 'eE')
# => 'eeEE'
For the same transformation, you can also rely on ASCII codes ('e' is four code points after 'a', and 'E' is four after 'A'):
'aaAA'.gsub(/a/i) {|c| (c.ord + 4).chr}
# => 'eeEE'
Another example (when the second argument is shorter than the first, its last character is repeated to pad it):
'aAaabbXXX'.tr('baA', 'B#')
# => '####BBXXX'
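String#tr also understands ranges and negation, which can shorten the argument lists. A small sketch of both, with variable names chosen only for this illustration:

```ruby
# Ranges are written with a hyphen, much like in a regex character class.
shifted = "hello".tr('a-y', 'b-z')
# shifted is "ifmmp"

# Mapping the two ranges onto each other swaps the case of ASCII letters.
swapped = "Hello World".tr('a-zA-Z', 'A-Za-z')
# swapped is "hELLO wORLD"

# A leading ^ negates the set: every character except a vowel becomes '*'.
masked = "hello".tr('^aeiou', '*')
# masked is "*e**o"
```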

Here are two variants of @Santosh's answer:
str = "aAbBcC"
h = {'a' => '#', 'b' => 'B', 'A' => 'E'}
#1
str.gsub(/[#{h.keys.join}]/, h) #=> "#EBBcC"
#2
h.default_proc = ->(_,k) { k }
str.gsub(/./, h) #=> "#EBBcC"
These are easier to maintain should h change in the future.
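Interpolating the keys into a character class assumes that every key is a single character with no special meaning inside brackets (such as ], ^ or -). If that may not hold, a possible alternative is Regexp.union, which escapes each key and also allows multi-character keys. The sample hash below is invented for this sketch.

```ruby
# Keys include characters that are special inside [...] and one two-letter key.
h = { 'a' => '#', '-' => '_', ']' => ')', 'th' => 'T' }

# Regexp.union escapes each string and joins them with |.
pattern = Regexp.union(h.keys)

out = "math-a]".gsub(pattern, h)
# out is "m#T_#)"
```

Alternatives are tried from left to right, so when one key is a prefix of another, the longer key should come first in the hash.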

You can also pass gsub a block
"my random string".gsub(/[aoi]/) do |match|
case match; when "a"; "#"; when "o"; "0"; when "i"; "I" end
end
# => "my r#nd0m strIng"
The use of a hash is of course much more elegant in this case, but if you have complex rules of substitution it can come in handy to dedicate a class to it.
"my random string".gsub(/[aoi]/) {|match| Substitute.new(match).result}
# => "my raws0m strAINng"
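The Substitute class is not shown above. As a rough idea of what such a class could look like, here is a hypothetical version in which each rule pairs a test with a transformation; the rules themselves are invented for this sketch and do not reproduce the output shown above.

```ruby
# Hypothetical rule object; the rules are assumptions made for illustration.
class Substitute
  RULES = [
    # Digits are doubled.
    [->(m) { m.match?(/\A\d\z/) }, ->(m) { (m.to_i * 2).to_s }],
    # Lower-case vowels are upcased.
    [->(m) { "aeiou".include?(m) }, ->(m) { m.upcase }],
  ].freeze

  def initialize(match)
    @match = match
  end

  # Applies the first rule whose test passes; otherwise returns the match unchanged.
  def result
    _, transform = RULES.find { |test, _| test.call(@match) }
    transform ? transform.call(@match) : @match
  end
end

out = "my 2 random strings".gsub(/[aoi\d]/) { |match| Substitute.new(match).result }
# out is "my 4 rAndOm strIngs"
```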

Related

Convert array string objects into hashes?

I have an array:
a = ["us.production => 1", "us.stats => 1", "us.stats.total_active => 1", "us.stats.inactive => 0"]
How can I modify it into a hash object? e.g.:
h = {"us.production" => 1, "us.stats" => 1, "us.stats.total_active" => 1, "us.stats.inactive" => 0}
Thank you,
If the pattern in your strings is consistent, you can try the following:
h = a.map { |x| x.split(' => ') }.to_h
# => {"us.production"=>"1", "us.stats"=>"1", "us.stats.total_active"=>"1", "us.stats.inactive"=>"0"}
It is better to use split(/\s*=>\s*/) than split(' => '), since it tolerates missing or extra whitespace around =>.
You can split every string with String#split and then convert an array of pairs into a hash with Array#to_h:
a = ["us.production => 1", "us.stats => 1", "us.stats.total_active => 1", "us.stats.inactive => 0"]
pairs = a.map{|s| s.split(/\s*=>\s*/)}
# => [["us.production", "1"], ["us.stats", "1"], ["us.stats.total_active", "1"], ["us.stats.inactive", "0"]]
pairs.to_h
# => {"us.production"=>"1", "us.stats"=>"1", "us.stats.total_active"=>"1", "us.stats.inactive"=>"0"}
/\s*=>\s*/ is a regular expression that first matches any number of whitespace characters with \s*, then =>, then again any number of whitespace characters. As it is the String#split delimiter, this part of the string won't be present in the resulting pair.
The other answers to date are incorrect because they leave the values as strings, whereas the spec is that they be integers. That can be easily corrected. One way is to change s.split(/\s*=>\s*/) in @mrzasa's answer to k,v = s.split(/\s*=>\s*/); [k,v.to_i]. Another way is to tack .transform_values(&:to_i) to the ends of the expressions given in those answers. I expect the authors of those answers either didn't notice that integers were required or intended to leave it as an exercise for the OP to do the (rather uninteresting) conversion.
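Both corrections can be sketched as follows. The second variant uses Integer() rather than to_i, which is a choice made for this example: it raises on non-numeric text instead of silently returning 0.

```ruby
a = ["us.production => 1", "us.stats => 1", "us.stats.inactive => 0"]

# Option 1: convert after building the hash (transform_values needs Ruby 2.4+).
h1 = a.map { |s| s.split(/\s*=>\s*/) }.to_h.transform_values(&:to_i)

# Option 2: convert while building it (Array#to_h with a block needs Ruby 2.6+).
# Integer() raises ArgumentError on text such as "abc" instead of returning 0.
h2 = a.to_h do |s|
  key, value = s.split(/\s*=>\s*/, 2)
  [key, Integer(value)]
end

# Both h1 and h2 are {"us.production"=>1, "us.stats"=>1, "us.stats.inactive"=>0}
```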
To make a single pass through the array and avoid the creation of a temporary array and local variables (other than block variables), I suggest using Enumerable#each_with_object (rather than map and to_h), and use regular expressions to extract both keys and values (rather than using String#split):
a = ["us.production => 1", "us.stats=>1", "us.stats.total_active => 1"]
a.each_with_object({}) { |s,h| h[s[/.*[^ ](?= *=>)/]] = s[/\d+\z/].to_i }
#=> {"us.production"=>1, "us.stats"=>1, "us.stats.total_active"=>1}
The first regular expression reads, "match zero or more characters (.*) followed by a character that is not a space ([^ ]), provided that is followed by zero or more spaces ( *) followed by the string "=>"". (?= *=>) is a positive lookahead.
The second regular expression reads, "match one or more digits (\d+) at the end of the string (the anchor \z)". If that string could represent a negative integer, change that regex to /-?\d+\z/ (? makes the minus sign optional).
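A short sketch of the negative-number variant, using made-up input strings:

```ruby
# Sample data invented for this sketch; note the second entry has no spaces.
a = ["temp.min => -5", "temp.max=>12"]

h = a.each_with_object({}) do |s, acc|
  key   = s[/.*[^ ](?= *=>)/]   # everything up to the last non-space before =>
  value = s[/-?\d+\z/]          # optional minus sign, then digits, at the end
  acc[key] = value.to_i
end
# h is {"temp.min"=>-5, "temp.max"=>12}
```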

Replace words in a string with the words defined as values in a hash

The aim is to replace specific words in a string by the values defined in the dictionary.
dictionary =
{"Hello" => "hi",
"to, two, too" => "2",
"for, four" => "4",
"be" => "b",
"you" => "u",
"at" => "#",
"and" => "&"
}
def word_substituter(tweet)
tweet_array = tweet.split(',') ##converting the string to array
tweet_array.each do |word|
if word === dictionary.keys ##if the words of array are equal to the keys of the dictionary
word == dictionary.values ##then now the words are now the the values of the dictionary
puts word
end
end
word.join(", ")
end
word_substituter("Hey guys, can anyone teach me how to be cool? I really want to be the best at everything, you know what I mean? Tweeting is super fun you guys!!!!")
I would appreciate the help. Could you explain it?
Naive words enumeration
DICTIONARY = {
"Hello" => "hi",
"to, two, too" => "2",
"for, four" => "4",
"be" => "b",
"you" => "u",
"at" => "#",
"and" => "&"
}.freeze
def word_substituter(tweet)
dict = {}
DICTIONARY.keys.map { |k| k.split(', ') }.flatten.each do |w|
DICTIONARY.each { |k, v| dict.merge!(w => v) if k.include?(w) }
end
tweet.split(' ').map do |s|
dict.each { |k, v| s.sub!(/#{k}/, v) if s =~ /\A#{k}[[:punct:]]*\z/ }
s
end.join(' ')
end
word_substituter("Hey guys, I'm Robe too. Can anyone teach me how to be cool? I really want to be the best at everything, you know what I mean? Tweeting is super fun you guys!!!!")
# => "Hey guys, I'm Robe 2. Can anyone teach me how 2 b cool? I really want 2 b the best # everything, u know what I mean? Tweeting is super fun u guys!!!!"
I feel like this provides a fairly simple solution to this:
DICTIONARY = {
"Hello" => "hi",
"to, two, too" => "2",
"for, four" => "4",
"be" => "b",
"you" => "u",
"at" => "#",
"and" => "&"
}.freeze
def word_substituter(tweet)
tweet.dup.tap do |t|
DICTIONARY.each { |key, replacement| t.gsub!(/\b(#{key.split(', ').join('|')})\b/, replacement) }
end
end
word_substituter("Hey guys, can anyone teach me how to be cool? I really want to be the best at everything, you know what I mean? Tweeting is super fun you guys!!!!")
# => "Hey guys, can anyone teach me how 2 b cool? I really want 2 b the best # everything, u know what I mean? Tweeting is super fun u guys!!!!"
Breaking it down into steps:
the method takes a tweet and creates a copy
it passes this to tap so it's returned from the method call
it iterates through the DICTIONARY
transforms the key into a regex matcher (e.g. /\b(to|two|too)\b/)
passes this to gsub! and replaces any matches with the replacement
If you want this to replace occurrence within words (e.g. what => wh#), you can remove the checks for word boundaries (\b) in the regex.
One gotcha is that if any of your dictionary's keys contain regex matchers, this would need a little rework: if you had "goodbye." => 'later xxx', the dot would match any character and include it in what's replaced. Regexp.escape is your friend here.
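As a sketch of that rework, each alternative can be passed through Regexp.escape before the pattern is built. Because \b does not match after a trailing dot that is followed by a space, this version swaps the word boundaries for lookarounds, which is a change from the code above. The names WORDS and substitute_words are invented for this example.

```ruby
# Sample dictionary for this sketch; "goodbye." contains a regex metacharacter.
WORDS = {
  "to, two, too" => "2",
  "goodbye."     => "later!",
}.freeze

def substitute_words(text, dictionary)
  dictionary.reduce(text) do |result, (key, replacement)|
    # Escape every alternative so that "." only matches a literal dot.
    alternatives = key.split(', ').map { |w| Regexp.escape(w) }.join('|')
    # (?<!\w) and (?!\w) require that no word character touches the match.
    result.gsub(/(?<!\w)(?:#{alternatives})(?!\w)/, replacement)
  end
end

out = substitute_words("I said goodbye. Then goodbyes to two friends", WORDS)
# out is "I said later! Then goodbyes 2 2 friends"
```

Without the escaping, the dot would also match the "s" in "goodbyes", and that word would be replaced too.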
Hope this helps - let me know how you get on.

Scanning through a hash and return a value if true

Based on my hash, I want to match it if it's in the string:
def conv
str = "I only have one, two or maybe sixty"
hash = {:one => 1, :two => 2, :six => 6, :sixty => 60 }
str.match( Regexp.union( hash.keys.to_s ) )
end
puts conv # => <blank>
The above does not work but this only matches "one":
str.match( Regexp.union( hash[0].to_s ) )
Edited:
Any idea how to match "one", "two" and sixty in the string exactly?
If my string has "sixt" it returns "6", and that should not happen, based on @Cary's answer.
You need to convert each element of hash.keys to a string, rather than converting the array hash.keys to a string, and you should use String#scan rather than String#match. You may also need to play around with the regex until it returns everything you want and nothing you don't want.
Let's first look at your example:
str = "I only have one, two or maybe sixty"
hash = {:one => 1, :two => 2, :six => 6, :sixty => 60}
We might consider constructing the regex with word breaks (\b) before and after each word we wish to match:
r0 = Regexp.union(hash.keys.map { |k| /\b#{k.to_s}\b/ })
#=> /(?-mix:\bone\b)|(?-mix:\btwo\b)|(?-mix:\bsix\b)|(?-mix:\bsixty\b)/
str.scan(r0)
#=> ["one", "two", "sixty"]
Without the word breaks, scan would return ["one", "two", "six"], as "sixty" in str would match "six". (Word breaks are zero-width. One before a string requires that the string be preceded by a non-word character or be at the beginning of the string. One after a string requires that the string be followed by a non-word character or be at the end of the string.)
Depending on your requirements, word breaks may not be sufficient or suitable. Suppose, for example (with hash above):
str = "I only have one, two, twenty-one or maybe sixty"
and we do not wish to match "twenty-one". However,
str.scan(r0)
#=> ["one", "two", "one", "sixty"]
One option would be to use a regex that demands that matches be preceded by whitespace or be at the beginning of the string, and be followed by whitespace or be at the end of the string:
r1 = Regexp.union(hash.keys.map { |k| /(?<=^|\s)#{k.to_s}(?=\s|$)/ })
str.scan(r1)
#=> ["sixty"]
(?<=^|\s) is a positive lookbehind; (?=\s|$) is a positive lookahead.
Well, that avoided the match of "twenty-one" (good), but we no longer matched "one" or "two" (bad) because of the comma following each of those words in the string.
Perhaps the solution here is to first remove punctuation, which allows us to then apply either of the above regexes:
str.tr('.,?!:;-','')
#=> "I only have one two twentyone or maybe sixty"
str.tr('.,?!:;-','').scan(r0)
#=> ["one", "two", "sixty"]
str.tr('.,?!:;-','').scan(r1)
#=> ["one", "two", "sixty"]
You may also want to change / at the end of the regex to /i to make the match insensitive to case. [1]
[1] Historical note for readers who want to know why 'a' is called lower case and 'A' is called upper case.
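To get from the matched words back to the hash values, which is what the question title asks for, the scan result can be mapped through the hash. A minimal sketch, where NUMBERS and number_values are names invented for this example:

```ruby
NUMBERS = { one: 1, two: 2, six: 6, sixty: 60 }.freeze

def number_values(str, table = NUMBERS)
  # Regexp.union escapes the words; .source embeds the bare pattern so that
  # the outer /i flag still applies to it.
  words   = Regexp.union(table.keys.map(&:to_s)).source
  pattern = /\b(?:#{words})\b/i
  str.scan(pattern).map { |word| table[word.downcase.to_sym] }
end

values = number_values("I only have one, two or maybe Sixty")
# values is [1, 2, 60]

none = number_values("sixt")
# none is []
```

The \b after the group forces the engine to backtrack from "six" to "sixty" when more letters follow, so "sixty" yields 60 and "sixt" yields nothing.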

How to do named capture in ruby

I want to name the capture of string that I get from scan. How to do it?
"555-333-7777".scan(/(\d{3})-(\d{3})-(\d{4})/).flatten #=> ["555", "333", "7777"]
Is it possible to turn it into something like this:
{:area => "555", :city => "333", :local => "7777" }
or
[["555","area"], [...]]
I tried
"555-333-7777".scan(/((?<area>)\d{3})-(\d{3})-(\d{4})/).flatten
but it returns
[]
You should use match with named captures, not scan
m = "555-333-7777".match(/(?<area>\d{3})-(?<city>\d{3})-(?<number>\d{4})/)
m # => #<MatchData "555-333-7777" area:"555" city:"333" number:"7777">
m[:area] # => "555"
m[:city] # => "333"
If you want an actual hash, you can use something like this:
m.names.zip(m.captures).to_h # => {"area"=>"555", "city"=>"333", "number"=>"7777"}
Or this (ruby 2.4 or later)
m.named_captures # => {"area"=>"555", "city"=>"333", "number"=>"7777"}
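Note that match returns only the first occurrence. If the string may contain several numbers, one possible approach is to call match repeatedly with a start position. The helper name all_named_matches is an assumption of this sketch.

```ruby
# Hypothetical helper: collects named captures for every match in the text.
def all_named_matches(text, re)
  results = []
  pos = 0
  while (m = re.match(text, pos))
    results << m.named_captures            # Ruby 2.4 or later
    # Always advance, even after an empty match, to avoid looping forever.
    pos = [m.end(0), m.begin(0) + 1].max
  end
  results
end

re  = /(?<area>\d{3})-(?<city>\d{3})-(?<local>\d{4})/
all = all_named_matches("555-333-7777 and 111-222-3333", re)
# all is [{"area"=>"555", "city"=>"333", "local"=>"7777"},
#         {"area"=>"111", "city"=>"222", "local"=>"3333"}]
```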
Something like this?
"555-333-7777" =~ /^(?<area>\d+)\-(?<city>\d+)\-(?<local>\d+)$/
Hash[$~.names.collect{|x| [x.to_sym, $~[x]]}]
=> {:area=>"555", :city=>"333", :local=>"7777"}
Bonus version:
Hash[[:area, :city, :local].zip("555-333-7777".split("-"))]
=> {:area=>"555", :city=>"333", :local=>"7777"}
In case you don't really need the hash, but just local variables:
if /(?<area>\d{3})-(?<city>\d{3})-(?<number>\d{4})/ =~ "555-333-7777"
puts area
puts city
puts number
end
How does it work?
You need to use =~ regex operator.
The regex (sadly) needs to be on the left. It doesn't work if you use string =~ regex.
Otherwise it is the same syntax ?<var> as with named_captures.
It is supported in Ruby 1.9.3!
Official documentation:
When named capture groups are used with a literal regexp on the
left-hand side of an expression and the =~ operator, the captured text
is also assigned to local variables with corresponding names.
A way to turn capture group names and their values into a hash is to use a regex with named captures, written (?<capture_name>...), and then access the $~ global "last match" variable.
regex_with_named_capture_groups = %r'(?<area>\d{3})-(?<city>\d{3})-(?<local>\d{4})'
"555-333-7777"[regex_with_named_capture_groups]
match_hash = $~.names.inject({}){|mem, capture| mem[capture] = $~[capture]; mem}
# => {"area"=>"555", "city"=>"333", "local"=>"7777"}
# If ActiveSupport is available
match_hash.symbolize_keys!
# => {area: "555", city: "333", local: "7777"}
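If ActiveSupport is not available, plain Ruby can do the same with Hash#transform_keys (Ruby 2.5 or later). The sketch below also handles input that does not match; PHONE and parse_phone are names made up for this example.

```ruby
# \A and \z anchor the pattern to the whole string.
PHONE = /\A(?<area>\d{3})-(?<city>\d{3})-(?<local>\d{4})\z/

def parse_phone(str)
  m = PHONE.match(str)
  return nil unless m                      # no match: nothing to convert
  m.named_captures.transform_keys(&:to_sym)
end

parsed = parse_phone("555-333-7777")
# parsed is {:area=>"555", :city=>"333", :local=>"7777"}

missing = parse_phone("not a number")
# missing is nil
```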
This alternative also works:
regex = /^(?<area>\d+)\-(?<city>\d+)\-(?<local>\d+)$/
m = "555-333-7777".match regex
m.named_captures
=> {"area"=>"555", "city"=>"333", "local"=>"7777"}
There are a LOT of ways to create named captures, many of which have been mentioned already. For the record though, we could have even used the originally posted code along with Multiple Assignment like so:
a, b, c = "555-333-7777".scan(/(\d{3})-(\d{3})-(\d{4})/).flatten
hash = {area: a, city: b, local: c}
#=> {:area=>"555", :city=>"333", :local=>"7777"}
OR
hash = {}
hash[:area], hash[:city], hash[:local] = "555-333-7777".scan(/(\d{3})-(\d{3})-(\d{4})/).flatten
hash
#=> {:area=>"555", :city=>"333", :local=>"7777"}
OR along with zip and optionally to_h:
[:area, :city, :local].zip "555-333-7777".scan(/(\d{3})-(\d{3})-(\d{4})/).flatten
#=> [[:area, "555"], [:city, "333"], [:local, "7777"]]
([:area, :city, :local].zip "555-333-7777".scan(/(\d{3})-(\d{3})-(\d{4})/).flatten).to_h
#=> {:area=>"555", :city=>"333", :local=>"7777"}

Ruby multiple string replacement

str = "Hello☺ World☹"
Expected output is:
"Hello:) World:("
I can do this: str.gsub("☺", ":)").gsub("☹", ":(")
Is there any other way to do this in a single function call? Something like:
str.gsub(['s1', 's2'], ['r1', 'r2'])
Since Ruby 1.9.2, String#gsub accepts hash as a second parameter for replacement with matched keys. You can use a regular expression to match the substring that needs to be replaced and pass hash for values to be replaced.
Like this:
'hello'.gsub(/[eo]/, 'e' => 3, 'o' => '*') #=> "h3ll*"
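'(0) 123-123.123'.gsub(/[(),. -]/, '') #=> "0123123123"
# The hyphen goes last in the class; written as )-, it would form a range and '-' itself would not be removed.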
In Ruby 1.8.7, you would achieve the same with a block:
dict = { 'e' => 3, 'o' => '*' }
'hello'.gsub(/[eo]/) do |match|
dict[match.to_s]
end #=> "h3ll*"
Set up a mapping table:
map = {'☺' => ':)', '☹' => ':(' }
Then build a regex:
re = Regexp.new(map.keys.map { |x| Regexp.escape(x) }.join('|'))
And finally, gsub:
s = str.gsub(re, map)
If you're stuck in 1.8 land, then:
s = str.gsub(re) { |m| map[m] }
You need the Regexp.escape in there in case anything you want to replace has a special meaning within a regex. Or, thanks to steenslag, you could use:
re = Regexp.union(map.keys)
and the quoting will be taken care of for you.
You could do something like this:
replacements = [ ["☺", ":)"], ["☹", ":("] ]
replacements.each {|replacement| str.gsub!(replacement[0], replacement[1])}
There may be a more efficient solution, but this at least makes the code a bit cleaner
Late to the party, but if you want to replace several different characters with a single one, you could use a regex:
string_to_replace.gsub(/_|,| /, '-')
In this example, gsub replaces underscores (_), commas (,) and spaces ( ) with a dash (-).
Another simple and easy-to-read way is the following:
str = '12 ene 2013'
map = {'ene' => 'jan', 'abr'=>'apr', 'dic'=>'dec'}
map.each {|k,v| str.sub!(k,v)}
puts str # '12 jan 2013'
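Two things are worth checking before using the loop approach: sub! replaces only the first occurrence of each key, and applying replacements one after another lets a later rule rewrite the output of an earlier one. A small sketch of the second effect, with made-up data:

```ruby
swap = { 'cat' => 'dog', 'dog' => 'cat' }
text = "cat chases dog"

# One gsub per rule: the second rule also rewrites what the first produced.
sequential = swap.reduce(text) { |s, (from, to)| s.gsub(from, to) }
# sequential is "cat chases cat"

# A single pass with a hash looks at each position of the original text once.
single_pass = text.gsub(Regexp.union(swap.keys), swap)
# single_pass is "dog chases cat"
```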
You can also use tr to replace multiple characters in a string at once.
E.g., replace "h" with "m" and "l" with "t":
"hello".tr("hl", "mt")
=> "metto"
It looks simple and neat, and is faster than gsub (though the difference is small):
require 'benchmark'
puts Benchmark.measure { "hello".tr("hl", "mt") }
0.000000 0.000000 0.000000 ( 0.000007)
puts Benchmark.measure{"hello".gsub(/[hl]/, 'h' => 'm', 'l' => 't') }
0.000000 0.000000 0.000000 ( 0.000021)
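A single call is too short to time reliably, so a comparison is usually repeated many times. A minimal sketch using Benchmark.realtime; the iteration count is arbitrary, and the resulting times depend on the machine and Ruby version, so none are shown here.

```ruby
require 'benchmark'

ITERATIONS = 50_000  # arbitrary count chosen for this sketch

tr_time = Benchmark.realtime do
  ITERATIONS.times { "hello".tr("hl", "mt") }
end

gsub_time = Benchmark.realtime do
  ITERATIONS.times { "hello".gsub(/[hl]/, 'h' => 'm', 'l' => 't') }
end

puts format("tr:   %.4f s", tr_time)
puts format("gsub: %.4f s", gsub_time)
```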
Riffing on naren's answer above, I'd go with
tr = {'a' => '1', 'b' => '2', 'z' => '26'}
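# join is needed: interpolating the array itself would also put quotes, commas and spaces into the class
mystring.gsub(/[#{tr.keys.join}]/, tr)
So
'zebraazzeebra'.gsub(/[#{tr.keys.join}]/, tr) returns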
"26e2r112626ee2r1"
