Ruby string char chunking - ruby

I have a string "wwwggfffw" and want to break it up into an array as follows:
["www", "gg", "fff", "w"]
Is there a way to do this with regex?

"wwwggfffw".scan(/((.)\2*)/).map(&:first)
scan is a little funny: it returns the whole match when the pattern has no groups, and only the captured subgroups when it does. We need a subgroup to require repetition of the same character (a (.) followed by a backreference to it), but we want the whole run back, not just the repeated letter. So we wrap the entire pattern in an outer group so that the whole match is captured too, and then extract just that first capture from each result with .map(&:first).
EDIT to explain the regexp ((.)\2*) itself:
( start group #1, consisting of
( start group #2, consisting of
. any one character
) and nothing else
\2 followed by the content of the group #2
* repeated any number of times (including zero)
) and nothing else.
So in wwwggfffw, (.) captures w into group #2; then \2* matches any number of additional w characters. This makes group #1 capture www.
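To see why the outer group is needed, it may help to compare what scan returns with and without it. This is a small sketch using only core String methods; the variable names are illustrative.

```ruby
str = "wwwggfffw"

# Only the inner group: scan returns just the captured character of each run.
only_inner = str.scan(/(.)\1*/)
# => [["w"], ["g"], ["f"], ["w"]]

# Outer group added: each result is [whole run, repeated character].
both_groups = str.scan(/((.)\2*)/)
# => [["www", "w"], ["gg", "g"], ["fff", "f"], ["w", "w"]]

# Keep only the first capture (the whole run).
runs = both_groups.map(&:first)
# => ["www", "gg", "fff", "w"]
```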

You can use back references, something like
'wwwggfffw'.scan(/((.)\2*)/).map{ |s| s[0] }
will work

Here's one that's not using regex but works well:
def chunk(str)
  chars = str.chars
  chars.inject([chars.shift]) do |arr, char|
    if arr[-1].include?(char)
      arr[-1] << char
    else
      arr << char
    end
    arr
  end
end
In my benchmarks it's faster than the regex answers here (with the example string you gave, at least).
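One edge case worth noting: with an empty string, chars.shift returns nil, so the method above returns [nil]. A possible variant that returns [] instead is sketched below with Enumerator#with_object. The name chunk_runs is hypothetical and not from the answer, and no claim is made here about its speed.

```ruby
# Hypothetical variant of the method above; the name chunk_runs is made up.
def chunk_runs(str)
  str.each_char.with_object([]) do |char, runs|
    if runs.last && runs.last.end_with?(char)
      runs.last << char   # same character as the current run: extend it
    else
      runs << char.dup    # first character or a different one: start a new run
    end
  end
end

chunk_runs("wwwggfffw") # => ["www", "gg", "fff", "w"]
chunk_runs("")          # => []
```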

Another non-regex solution, this one using Enumerable#slice_when, which made its debut in Ruby v.2.2:
str.each_char.slice_when { |a,b| a!=b }.map(&:join)
#=> ["www", "gg", "fff", "w"]
Another option is:
str.scan(Regexp.new(str.squeeze.each_char.map { |c| "(#{c}+)" }.join)).first
#=> ["www", "gg", "fff", "w"]
Here the steps are as follows
s = str.squeeze
#=> "wgfw"
a = s.each_char
#=> #<Enumerator: "wgfw":each_char>
This enumerator generates the following elements:
a.to_a
#=> ["w", "g", "f", "w"]
Continuing
b = a.map { |c| "(#{c}+)" }
#=> ["(w+)", "(g+)", "(f+)", "(w+)"]
c = b.join
#=> "(w+)(g+)(f+)(w+)"
r = Regexp.new(c)
#=> /(w+)(g+)(f+)(w+)/
d = str.scan(r)
#=> [["www", "gg", "fff", "w"]]
d.first
#=> ["www", "gg", "fff", "w"]
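One caveat, offered as an assumption about inputs the answer does not cover: the characters are interpolated into the pattern unescaped, so a string containing regex metacharacters such as + or ( can change the pattern's meaning or make it invalid. A hedged sketch that escapes each character with Regexp.escape follows; runs_via_dynamic_regex is a hypothetical helper name.

```ruby
# Hypothetical helper: same idea as above, but each character is escaped
# before being interpolated into the pattern.
def runs_via_dynamic_regex(str)
  return [] if str.empty? # an empty pattern would not yield an array of groups
  pattern = str.squeeze.each_char.map { |c| "(#{Regexp.escape(c)}+)" }.join
  str.scan(Regexp.new(pattern)).first
end

runs_via_dynamic_regex("wwwggfffw") # => ["www", "gg", "fff", "w"]
runs_via_dynamic_regex("a++b")      # => ["a", "++", "b"]
```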

Here's one more way of doing it without a regex:
'wwwggfffw'.chars.chunk(&:itself).map{ |s| s[1].join }
# => ["www", "gg", "fff", "w"]
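For completeness, Ruby 2.3 added Enumerable#chunk_while, which reads as the mirror image of slice_when: it keeps adjacent elements together while the block is true instead of cutting where it is true. A short sketch:

```ruby
str = "wwwggfffw"

# Keep neighbouring characters in the same chunk while they are equal.
runs = str.each_char.chunk_while { |a, b| a == b }.map(&:join)
# => ["www", "gg", "fff", "w"]
```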

Related

Is there a way to reverse individual scrambled words in a string without changing the words' order in Ruby?

I'm trying to reverse a string without using the built-in reverse method to get something like this:
input: "hello, world"
output: "world hello,"
I've been able to reverse the string to "dlrow ,olleh" so the words are in the order they should be, but I'm stuck on how to reverse the individual words.
Suppose
str = "Three blind mice"
Here are three ways you could obtain the desired result, "mice blind Three", without using the method Array#reverse.
Split string, add index, use Enumerable#sort_by to sort array by index, join words
str.split.each_with_index.sort_by { |_,i| -i }.map(&:first).join(' ')
The steps are as follows.
a = str.split
#=> ["Three", "blind", "mice"]
enum = a.each_with_index
#=> #<Enumerator: ["Three", "blind", "mice"]:each_with_index>
b = enum.sort_by { |_,i| -i }
#=> [["mice", 2], ["blind", 1], ["Three", 0]]
c = b.map(&:first)
#=> ["mice", "blind", "Three"]
c.join(' ')
#=> "mice blind Three"
We can see the elements that will be generated by enum and passed to sort_by by converting enum to an array:
enum.to_a
#=> [["Three", 0], ["blind", 1], ["mice", 2]]
A disadvantage of this method is that it sorts an array, which is a relatively expensive operation. The next two approaches do not share that weakness.
Split string, use Array#values_at to extract words by index, highest to lowest, join words
arr = str.split
arr.values_at(*(arr.size-1).downto(0).to_a).join(' ')
The steps are as follows.
arr = str.split
#=> ["Three", "blind", "mice"]
a = arr.size-1
#=> 2
b = a.downto(0).to_a
#=> [2, 1, 0]
c = arr.values_at(*b)
#=> ["mice", "blind", "Three"]
c.join(' ')
#=> "mice blind Three"
Use String#gsub to create an enumerator, chain to Enumerator#with_object, build string
str.gsub(/\w+/).with_object('') { |word,s|
s.prepend(s.empty? ? word : word + ' ') }
The steps are as follows.
enum1 = str.gsub(/\w+/)
#=> #<Enumerator: "Three blind mice":gsub(/\w+/)>
enum2 = enum1.with_object('')
#=> #<Enumerator: #<Enumerator: "Three blind mice":
# gsub(/\w+/)>:with_object("")>
enum2.each { |word,s| s.prepend(s.empty? ? word : word + ' ') }
#=> "mice blind Three"
When String#gsub is called on str without a block it returns an enumerator (see doc). The enumerator generates, and passes to with_object, matches of its argument, /\w+/; that is, words. At this point gsub no longer performs character replacement. When called without a block it is convenient to think of gsub as being named each_match. We can see the values that enum1 generates by converting it to an array (or execute Enumerable#entries on enum1):
enum1.to_a
#=> ["Three", "blind", "mice"]
Though Ruby has no such concept, it may be helpful to think of enum2 as a compound enumerator (study the return value for enum2 = enum1.with_object('') above). It will generate the following values, which it will pass to Enumerator#each:
enum2.to_a
#=> [["Three", ""], ["blind", ""], ["mice", ""]]
The second value of each of these elements is the initial value of the string that will be built and returned by each.
Let's now look at the first element being generated by enum2 and passed to the block:
word, s = enum2.next
#=> ["Three", ""]
This first step is called Array decomposition.
word
#=> "Three"
s #=> ""
The block calculation is then as follows.
s.empty?
#=> true
t = word
#=> "Three"
s.prepend(t)
#=> "Three"
s #=> "Three"
Now the second element is generated by enum2 and passed to the block
word, s = enum2.next
#=> ["blind", "Three"]
word
#=> "blind"
s #=> "Three"
s.empty?
#=> false
t = word + ' '
#=> "blind "
s.prepend(t)
#=> "blind Three"
Notice that the second element of the array returned by enum2.next, the current value of s, has been updated to "Three".
The processing of the third and final element generated by enum2 (["mice", "blind Three"]) is similar, resulting in the block returning the value of s, "mice blind Three".
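The same prepend-as-you-go idea can be sketched with an array accumulator instead of a string. This is an illustrative alternative, not part of the answer above, and reverse_words is a hypothetical name. Because it splits on whitespace rather than matching \w+, punctuation attached to a word is kept, which matches the "world hello," output asked for in the question.

```ruby
# Hypothetical helper: reverse word order without calling Array#reverse.
def reverse_words(str)
  # unshift puts each new word at the front, so later words end up first.
  str.split.each_with_object([]) { |word, acc| acc.unshift(word) }.join(' ')
end

reverse_words("Three blind mice") # => "mice blind Three"
reverse_words("hello, world")     # => "world hello,"
```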

How do I go backwards a letter?

Using next, I created a method that encrypts a password by advancing every letter of a string one letter forward:
def encryptor
  puts "Give me your password!"
  password = gets.chomp
  index = 0
  while index < password.length
    password[index] = password[index].next!
    index += 1
  end
  puts password
end
encryptor
I have to create a decrypt method that undoes that. In the end, these cases should pass:
encrypt("abc") should return "bcd"
encrypt("zed") should return "afe"
decrypt("bcd") should return "abc"
decrypt("afe") should return "zed"
I see that Ruby does not have a method to go backwards. I'm stuck with reversing letters. I tried to add an alphabet to index within the method, but I can't get it to do it.
Any help in the right direction would be greatly appreciated.
I know that you can use .next to advance in a string.
Well, kind of, but there are special cases you have to be aware of:
'z'.next #=> 'aa'
I did this successfully
Not quite, your encryptor maps "xyz" to "yzab".
I see that Ruby does not have this option to just go backwards.
Take this example:
'9'.next #=> '10'
'09'.next #=> '10'
As you can see, the mapping is not injective: both '9' and '09' are mapped to '10'. Because of this, there is no String#pred: what should '10'.pred return?
Now I'm completely stuck with reversing it a letter.
You could use tr, for both encryption and decryption:
'abc'.tr('abcdefghijklmnopqrstuvwxyz', 'zabcdefghijklmnopqrstuvwxy')
#=> 'zab'
tr also has a c1-c2 notation for character ranges, so it can be shortened to:
'abc'.tr('a-z', 'za-y')
#=> 'zab'
Or via Range#to_a, join and rotate:
from = ('a'..'z').to_a.join #=> "abcdefghijklmnopqrstuvwxyz"
to = ('a'..'z').to_a.rotate(-1).join #=> "zabcdefghijklmnopqrstuvwxy"
'abc'.tr(from, to)
#=> "zab"
Another option is to define two alphabets:
from = ('a'..'z').to_a
#=> ["a", "b", "c", ..., "x", "y", "z"]
to = from.rotate(-1)
#=> ["z", "a", "b", ..., "w", "x", "y"]
And create a hash via zip:
hash = from.zip(to).to_h
#=> {"a"=>"z", "b"=>"a", "c"=>"b", ..., "x"=>"w", "y"=>"x", "z"=>"y"}
Which can be passed to gsub:
'abc'.gsub(/[a-z]/, hash)
#=> "zab"
You can also build the regular expression programmatically via Regexp::union:
Regexp.union(hash.keys)
#=> /a|b|c|d|e|f|g|h|i|j|k|l|m|n|o|p|q|r|s|t|u|v|w|x|y|z/
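Putting the tr approach together for the cases in the question, a minimal sketch might look like this. It assumes lowercase ASCII letters only (anything else passes through unchanged), and the method names simply follow the question's encrypt/decrypt wording.

```ruby
# Shift each lowercase letter one step forward, wrapping z around to a.
def encrypt(text)
  text.tr('a-z', 'b-za')
end

# Shift each lowercase letter one step backward, wrapping a around to z.
def decrypt(text)
  text.tr('a-z', 'za-y')
end

encrypt("abc") # => "bcd"
encrypt("zed") # => "afe"
decrypt("bcd") # => "abc"
decrypt("afe") # => "zed"
```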
You can use the .next to do this as long as you test for z:
> 'abc'.split("").map { |ch| ch=='z' ? 'a' : ch.next }.join
=> "bcd"
> 'zed'.split("").map { |ch| ch=='z' ? 'a' : ch.next }.join
=> "afe"
Then for decrypt you can do:
> "bcd".split("").map { |ch| ch=='a' ? 'z' : (ch.ord-1).chr }.join
=> "abc"
> "afe".split("").map { |ch| ch=='a' ? 'z' : (ch.ord-1).chr }.join
=> "zed"
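The two special cases above (z going forward, a going backward) can be folded into one formula with modular arithmetic. The sketch below assumes lowercase ASCII letters and leaves other characters untouched; shift_letters is a hypothetical name.

```ruby
# Hypothetical helper: move every lowercase letter by `offset` positions,
# wrapping around the alphabet in either direction.
def shift_letters(text, offset)
  base = 'a'.ord
  text.gsub(/[a-z]/) do |ch|
    # Ruby's % always returns a non-negative result for a positive divisor,
    # so negative offsets wrap correctly.
    ((ch.ord - base + offset) % 26 + base).chr
  end
end

shift_letters("zed", 1)  # => "afe"
shift_letters("afe", -1) # => "zed"
```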

Ignoring capture group in Regex that is used for repeating the pattern

/((\w)\2)/ finds repeating letters. I was hoping to avoid the two dimensional array that is produced by ignoring the letter matching second capture group like this: /((?:\w)\2)/. It seems that's not possible. Any ideas why?
You don't need any capture groups:
str = [*'a+'..'z+', *'A+'..'Z+', *'0+'..'9+', '_+'].join('|')
#=> "a+|b+| ... |z+|A+|B+| ... |Z+|0+|1+| ... |9+|_+"
"aaabbcddd".scan(/#{str}/)
#=> ["aaa", "bb", "c", "ddd"]
but if you insist on having one:
"aaabbcddd".scan(/(#{str})/).flatten(1)
#=> ["aaa", "bb", "c", "ddd"]
Is this cheating? You did ask if it was possible.
If you mean you're using String#scan, you can post-process the result to return only the first items using Enumerable#map:
'helloo'.scan(/((\w)\2)/)
# => [["ll", "l"], ["oo", "o"]]
'helloo'.scan(/((\w)\2)/).map { |m| m[0] }
# => ["ll", "oo"]
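As for why /((?:\w)\2)/ does not work: a backreference needs a capture group to refer to, so the inner group has to stay capturing. One way to sidestep the nested array, sketched here as an alternative to post-processing, is to drop the outer group and call String#gsub with neither a block nor a replacement, which returns an enumerator over the whole matches.

```ruby
# Only one (capturing) group is needed for the backreference; gsub without a
# block or replacement enumerates the full matches rather than the groups.
doubled = 'helloo'.gsub(/(\w)\1/).to_a
# => ["ll", "oo"]
```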

Ruby search for word in string

Given input = "helloworld"
The output should be output = ["hello", "world"]
Given I have a method called is_in_dict? which returns true if the given string is a word in the dictionary
So far I tried:
ar = []
input.split("").each do |f|
  ar << f if is_in_dict? f
  # here need to check given char
end
How to achieve it in Ruby?
Instead of splitting the input into characters, you have to inspect all combinations, i.e. "h", "he", "hel", ... "helloworld", "e", "el" , "ell", ... "elloworld" and so on.
Something like this should work:
(0..input.size).to_a.combination(2).each do |a, b|
  word = input[a...b]
  ar << word if is_in_dict?(word)
end
#=> [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ar
#=> ["hello", "world"]
Or, using each_with_object, which returns the array:
(0..input.size).to_a.combination(2).each_with_object([]) do |(a, b), array|
  word = input[a...b]
  array << word if is_in_dict?(word)
end
#=> ["hello", "world"]
Another approach is to build a custom Enumerator:
class String
  def each_combination
    return to_enum(:each_combination) unless block_given?
    (0..size).to_a.combination(2).each do |a, b|
      yield self[a...b]
    end
  end
end
String#each_combination yields all combinations (instead of just the indices):
input.each_combination.to_a
#=> ["h", "he", "hel", "hell", "hello", "hellow", "hellowo", "hellowor", "helloworl", "helloworld", "e", "el", "ell", "ello", "ellow", "ellowo", "ellowor", "elloworl", "elloworld", "l", "ll", "llo", "llow", "llowo", "llowor", "lloworl", "lloworld", "l", "lo", "low", "lowo", "lowor", "loworl", "loworld", "o", "ow", "owo", "owor", "oworl", "oworld", "w", "wo", "wor", "worl", "world", "o", "or", "orl", "orld", "r", "rl", "rld", "l", "ld", "d"]
It can be used with select to easily filter specific words:
input.each_combination.select { |word| is_in_dict?(word) }
#=> ["hello", "world"]
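To try the substring approach end to end, something has to stand in for the dictionary. The sketch below assumes a tiny hard-coded word list; this is_in_dict? is a hypothetical stand-in for the asker's method, and dictionary_substrings is a made-up name. Note that a string of length n has n(n+1)/2 substrings (55 for "helloworld"), so the number of lookups grows quadratically.

```ruby
# Hypothetical stand-in for the asker's dictionary lookup.
def is_in_dict?(word)
  %w[hello world].include?(word)
end

# Enumerate every substring via pairs of cut positions, keep dictionary words.
def dictionary_substrings(input)
  (0..input.size).to_a.combination(2)
                 .map { |a, b| input[a...b] }
                 .select { |word| is_in_dict?(word) }
end

dictionary_substrings("helloworld") # => ["hello", "world"]
```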
This seems to be a task for recursion. In short, you want to take letters one by one until you get a word which is in the dictionary. This, however, does not guarantee that the result is correct, as the remaining letters may not form words ('hell' + 'oworld'?). This is what I would do:
def split_words(string)
  return [[]] if string == ''
  chars = string.chars
  word = ''
  (1..string.length).map do
    word += chars.shift
    next unless is_in_dict?(word)
    other_splits = split_words(chars.join)
    next if other_splits.empty?
    other_splits.map {|split| [word] + split }
  end.compact.inject([], :+)
end
split_words('helloworld') #=> [['hello', 'world']] No hell!
It will also give you all possible splits, so pages with urls like penisland can be avoided
split_words('penisland') #=> [['pen', 'island'], [<the_other_solution>]]
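The 'hell' + 'oworld' dead end can be reproduced with a small word list. The sketch below is a compact restatement of the same recursion, not the answer's exact code: segmentations is a hypothetical name, and the word list is passed in as a plain array rather than through is_in_dict?.

```ruby
# Hypothetical helper: return every way to split `string` into words that
# all appear in `words`.
def segmentations(string, words)
  return [[]] if string.empty? # one way to split nothing: the empty split
  (1..string.length).flat_map do |len|
    head = string[0, len]
    next [] unless words.include?(head)
    # A prefix only counts if the rest of the string can also be split.
    segmentations(string[len..-1], words).map { |rest| [head] + rest }
  end
end

words = %w[hell hello world]
segmentations("helloworld", words) # => [["hello", "world"]]
segmentations("hellworld", words)  # => [["hell", "world"]]
segmentations("helloworl", words)  # => []
```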

Ruby: Insert Multiple Values Into String

Suppose we have the string "aaabbbccc" and want to use String#insert to convert the string to "aaa<strong>bbb</strong>ccc". Is the following the best way to insert multiple values into a Ruby string using String#insert, or can multiple values be added simultaneously?
string = "aaabbbccc"
opening_tag = '<strong>'
opening_index = 3
closing_tag = '</strong>'
closing_index = 6
string.insert(opening_index, opening_tag)
closing_index = 6 + opening_tag.length # I don't really like this
string.insert(closing_index, closing_tag)
Is there a way to simultaneously insert multiple substrings into a Ruby string so the closing tag does not need to be offset by the length of the first substring that is added? I would like something like this one liner:
string.insert(3 => '<strong>', 6 => '</strong>') # => "aaa<strong>bbb</strong>ccc"
Let's have some fun. How about
class String
  def splice h
    self.each_char.with_index.inject('') do |accum,(c,i)|
      accum + h.fetch(i,'') + c
    end
  end
end
"aaabbbccc".splice(3=>"<strong>", 6=>"</strong>")
=> "aaa<strong>bbb</strong>ccc"
(you can encapsulate this however you want, I just like messing with built-ins because Ruby lets me)
How about inserting from right to left?
string = "aaabbbccc"
string.insert(6, '</strong>')
string.insert(3, '<strong>')
string # => "aaa<strong>bbb</strong>ccc"
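The right-to-left idea generalizes to any number of insertions: if the positions are processed from highest to lowest, earlier inserts never shift the positions still to come. A minimal sketch, assuming a hash of index => text like the one in the question; insert_all is a hypothetical helper, and it works on a copy so the original string is left alone.

```ruby
# Hypothetical helper: apply several String#insert calls, highest index first,
# so no index needs to be adjusted for text inserted earlier.
def insert_all(string, insertions)
  insertions.sort_by { |index, _text| -index }
            .each_with_object(string.dup) { |(index, text), result| result.insert(index, text) }
end

insert_all("aaabbbccc", { 3 => '<strong>', 6 => '</strong>' })
# => "aaa<strong>bbb</strong>ccc"
```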
opening_tag = '<strong>'
opening_index = 3
closing_tag = '</strong>'
closing_index = 6
string = "aaabbbccc"
string[opening_index...closing_index] =
opening_tag + string[opening_index...closing_index] + closing_tag
#=> "<strong>bbb</strong>"
string
#=> "aaa<strong>bbb</strong>ccc"
If your string is comprised of three groups of consecutive characters, and you'd like to insert the opening tag between the first two groups and the closing tag between the last two groups, regardless of the size of each group, you could do that like this:
def stuff_tags(str, tag)
  str.scan(/((.)\2*)/)
     .map(&:first)
     .insert( 1, "<#{tag}>")
     .insert(-2, "<\/#{tag}>")
     .join
end
stuff_tags('aaabbbccc', 'strong') #=> "aaa<strong>bbb</strong>ccc"
stuff_tags('aabbbbcccccc', 'weak') #=> "aa<weak>bbbb</weak>cccccc"
I will explain the regex used by scan, but first would like to show how the calculations proceed for the string 'aaabbbccc':
a = 'aaabbbccc'.scan(/((.)\2*)/)
#=> [["aaa", "a"], ["bbb", "b"], ["ccc", "c"]]
b = a.map(&:first)
#=> ["aaa", "bbb", "ccc"]
c = b.insert( 1, "<strong>")
#=> ["aaa", "<strong>", "bbb", "ccc"]
d = c.insert(-2, "<\/strong>")
#=> ["aaa", "<strong>", "bbb", "</strong>", "ccc"]
d.join
#=> "aaa<strong>bbb</strong>ccc"
We need two capture groups in the regex. The first (having the first left parenthesis) captures the string we want. The second captures the first character, (.). This is needed so that we can require that it be followed by zero or more copies of that character, \2*.
Here's another way this can be done:
def stuff_tags(str, tag)
  str.chars.chunk {|c| c}
     .map {|_,a| a.join}
     .insert( 1, "<#{tag}>")
     .insert(-2, "<\/#{tag}>")
     .join
end
The calculations of a and b above change to the following:
a = 'aaabbbccc'.chars.chunk {|c| c}
#=> #<Enumerator: #<Enumerator::Generator:0x000001021622d8>:each>
# a.to_a => [["a",["a","a","a"]],["b",["b","b","b"]],["c",["c","c","c"]]]
b = a.map {|_,a| a.join }
#=> ["aaa", "bbb", "ccc"]

Resources