Ruby regex capture groups in gsub - ruby

Say I want to switch each letter in a message with its place in the reverse alphabet. Why can't I seem to use the captured group and do it in one gsub?
Perhaps someone could explain in general how to use captured groups in gsub. Can the back references be bare (without quotes)? Can I use #{\1}?
def decode(message)
a = ('a'..'z').to_a
z = a.reverse
message.gsub!(/([[:alpha:]])/, z[a.index('\1')])
end
decode("the quick brown fox")

Remember that arguments to methods are evaluated immediately and the result is what gets passed to the method, so z[a.index('\1')] is computed once, before gsub! has matched anything. If you want the substitution to adapt to each match, use a block:
message.gsub!(/([[:alpha:]])/) { |m| z[a.index($1)] }
That employs a block that gets evaluated for each match.
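To address the more general question about back references, here is a minimal sketch of the two forms (the vowel patterns are made up for illustration). A replacement string may contain '\1', which gsub expands itself for every match, while anything that has to be computed from the match needs the block form.

```ruby
message = "the quick brown fox"

# Replacement string: gsub expands '\1' itself, once per match.
doubled = message.gsub(/([aeiou])/, '\1\1')

# String interpolation runs before gsub is called, so it can never see a match.
# When the replacement must be computed from the match, use a block and $1.
upcased = message.gsub(/([aeiou])/) { $1.upcase }

puts doubled  # thee quuiick broown foox
puts upcased  # thE qUIck brOwn fOx
```

A bare \1 outside a string is not valid Ruby, so the back reference always has to live inside a replacement string.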

Using gsub:
Your code was not working because '\1' is just a literal string at the point where a.index('\1') is evaluated; gsub has not run yet, so there is no match to refer to. This can be solved by using a block, which receives each match as its argument:
message.gsub(/[[:alpha:]]/) { |char| z[a.index(char)] }
Using tr:
A more efficient way to solve problems like this, where you are simply "replacing one set of characters with another set", is to instead use String#tr. This could be done as follows:
message.tr(a.join(''), z.join(''))
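As a sketch of how the tr approach might be wrapped up so that it also handles capital letters (the method name atbash is my own choice, not something from the question):

```ruby
# Hypothetical helper: swaps each letter with its mirror in the alphabet
# and leaves every other character untouched.
def atbash(message)
  lower = ('a'..'z').to_a.join
  upper = ('A'..'Z').to_a.join
  message.tr(lower + upper, lower.reverse + upper.reverse)
end

puts atbash("The quick brown fox")  # Gsv jfrxp yildm ulc
```

Because the mapping is symmetric, calling atbash on the result gives back the original message.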

One way is to manipulate ASCII values.
def code(message)
message.gsub(/[[:alpha:]]/) { |s| (((s < 'a') ? 155 : 219 ) - s.ord).chr }
end
coded = code("the quick brown fox")
#=> "gsv jfrxp yildm ulc"
code(coded)
#=> "the quick brown fox"
Note:
'A'.ord + 'Z'.ord
#=> 155
'a'.ord + 'z'.ord
#=> 219
Another is to use a hash:
a = ('a'..'z').to_a
Ch = a.zip(a.reverse).to_h
#=> {"a"=>"z", "b"=>"y",..., "y"=>"b", "z"=>"a"}
def code(message)
message.gsub(/[[:alpha:]]/, Ch)
end
coded = code("the quick brown fox")
#=> "gsv jfrxp yildm ulc"
code(coded)
#=> "the quick brown fox"

Related

Reversing characters in a string regardless of the number of spaces - Ruby

This question is from codewars
Complete the function that accepts a string parameter, and reverses each word in the string. All spaces in the string should be retained.
Here is my code that only works for a string with single spaces, but I can't seem to figure out how to add/subtract anything to it to make it work for a string with more than one space in-between each word.
def reverse_words(str)
str.split(" ").map(&:reverse!).join(" ")
end
Examples given:
('The quick brown fox jumps over the lazy dog.'), 'ehT kciuq nworb xof spmuj revo eht yzal .god')
('apple'), 'elppa')
('a b c d'), 'a b c d')
('double spaced words'), 'elbuod decaps sdrow')
I think the easiest option to tackle this is by using a regex.
def reverse_words(str)
str
.scan(/(\s*)(\S+)(\s*)/)
.map { |spacer1, word, spacer2| spacer1 + word.reverse + spacer2 }
.join
end
This searches the string for zero or more whitespaces captured by the first group. Followed by one or more non-whitespaces, captured by the second group. Followed by zero or more whitespaces captured in the third group. Mapping over the resulting array we can combine the spacers back with the reversed word and join the whole thing together.
The above results in the following output:
reverse_words('The quick brown fox jumps over the lazy dog.')
#=> "ehT kciuq nworb xof spmuj revo eht yzal .god"
reverse_words('apple')
#=> "elppa"
reverse_words('a b c d')
#=> "a b c d"
reverse_words('double spaced words')
#=> "elbuod decaps sdrow"
reverse_words(' foo bar ')
#=> " oof rab "
References:
String#scan
Array#map
Array#join
Regular expressions in Ruby
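A shorter variant of the same idea, as a sketch: let gsub match only the runs of non-whitespace and reverse each one in a block, so the whitespace is never touched at all. The method name reverse_each_word is made up here to avoid clashing with the code above.

```ruby
# Hypothetical alternative: reverse every run of non-whitespace characters
# in place; the spaces between them are not part of any match.
def reverse_each_word(str)
  str.gsub(/\S+/) { |word| word.reverse }
end

puts reverse_each_word("double  spaced  words")  # elbuod  decaps  sdrow
```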
Here you go:
irb(main):023:0> 'double spaced words'.split(//).reverse.join
=> "sdrow decaps elbuod"
Passing a regexp makes String#split keep the spaces. There are similar examples in the docs. Note that this reverses the whole string, so the order of the words is reversed as well.
Just to play with recursion, even if Johan Wentholt's answer is the best so far:
def part(string)
if string.count(" ") > 0
ary = string.partition(/\s{1,}/)
last = ary.pop
ary << part(last)
ary.flatten
else
[string] # wrap in an array so the caller can always call map
end
end
part(string).map(&:reverse).join
Well,
f = " Hello im the world"
ff = f.split #=> ["Hello", "im", "the", "world"]
ff.each do |a|
a.reverse! #=> ["olleH", "mi", "eht", "dlrow"]
end
ff.join(" ") #=> "olleH mi eht dlrow"

Stuck in Abbreviation implementation to ruby string

I want to convert all the (alphabetic) words in the string to their abbreviations like i18n does. In other words, I want to change "extraordinary" into "e11y" because there are 11 characters between the first and the last letter in "extraordinary". It works with a single word in the string. But how can I do the same for a multi-word string? And of course, if a word has 4 or fewer characters there is no point in abbreviating it.
class Abbreviator
def self.abbreviate(x)
x.gsub(/\w+/, "#{x[0]}#{(x.length-2)}#{x[-1]}")
end
end
Test.assert_equals( Abbreviator.abbreviate("banana"), "b4a", Abbreviator.abbreviate("banana") )
Test.assert_equals( Abbreviator.abbreviate("double-barrel"), "d4e-b4l", Abbreviator.abbreviate("double-barrel") )
Test.assert_equals( Abbreviator.abbreviate("You, and I, should speak."), "You, and I, s4d s3k.", Abbreviator.abbreviate("You, and I, should speak.") )
Your mistake is that your second parameter is a substitution string operating on x (the original entire string) as a whole.
Instead of using the form of gsub where the second parameter is a substitution string, use the form of gsub where the second parameter is a block (listed, for example, third on this page). Now you are receiving each substring into your block and can operate on that substring individually.
def short_form(str)
str.gsub(/[[:alpha:]]{5,}/) { |s| "%s%d%s" % [s[0], s.size-2, s[-1]] }
end
The regex reads, "match five or more alphabetic characters", so words of four letters or fewer are left alone, as the question requires.
short_form "abc" #=> "abc"
short_form "a-b-c" #=> "a-b-c"
short_form "cats" #=> "cats"
short_form "two-ponies-c" #=> "two-p4s-c"
short_form "Humpty-Dumpty, who sat on a wall, fell over"
#=> "H4y-D4y, who sat on a wall, fell over"
I would recommend something along the lines of this:
class Abbreviator
def self.abbreviate(x)
x.gsub(/\w+/) do |word|
# Skip the word unless it's long enough
next word unless word.length > 4
# Do the same I18n conversion you did before
"#{word[0]}#{(word.length-2)}#{word[-1]}"
end
end
end
The accepted answer isn't bad, but it can be made a lot simpler by not matching words that are too short in the first place:
def abbreviate(str)
str.gsub(/([[:alpha:]])([[:alpha:]]{3,})([[:alpha:]])/i) { "#{$1}#{$2.size}#{$3}" }
end
abbreviate("You, and I, should speak.")
# => "You, and I, s4d s3k."
Alternatively, we can use lookbehind and lookahead, which makes the Regexp more complex but the substitution simpler:
def abbreviate(str)
str.gsub(/(?<=[[:alpha:]])[[:alpha:]]{3,}(?=[[:alpha:]])/i, &:size)
end
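As a quick sanity check of the lookaround idea, here is a sketch that runs it against the expected results from the question. The method name abbreviate_middle is made up, and I use an explicit block with to_s rather than &:size, purely to make the conversion to a string visible.

```ruby
# Hypothetical variant: only the letters strictly between the first and last
# letter of a word are matched, and each such run is replaced by its length.
def abbreviate_middle(str)
  str.gsub(/(?<=[[:alpha:]])[[:alpha:]]{3,}(?=[[:alpha:]])/) { |middle| middle.size.to_s }
end

puts abbreviate_middle("banana")                     # b4a
puts abbreviate_middle("double-barrel")              # d4e-b4l
puts abbreviate_middle("You, and I, should speak.")  # You, and I, s4d s3k.
```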

Find a keyword in text and then print words leading up to a separate keyword

I am trying to write a program that looks at text provided by a user, searches for a keyword, and if it finds that keyword, will print it as well as any words that follow it until it encounters a separate keyword. For example:
Search for "I", print until "and"
User text: "I like fishing and running"
Returns: "I like fishing"
I have tried to loop through an array of the user's text using each_with_index, but could not access the index of words ahead of the word my loop is currently on. Any attempt to access other indices returns nil.
def search()
@text.each_with_index do |word, index|
if word == "I"
puts word + word[1]
end
end
end
Once I can print words of future indices, my next problem would be to print all words leading up to a key word that tells it to stop, which I imagine I could probably do with an if statement and break, but I'd be grateful for any ideas for this as well.
If you have any suggestions on how to make the above work, or any other solutions I would appreciate it.
str = "The quickest of the quick brown dogs jumped over the lazy fox"
word_include = "quick"
word_exclude = "lazy"
r = /
\b#{word_include}\b # match value of word_include between word boundaries
.*? # match any number of any character, lazily
(?=\b#{word_exclude}\b) # match value of word_exclude with word breaks
# in positive lookahead
/x # extended mode for regex def
#=> /\bquick\b.*?(?=\blazy\b)/
str[r]
#=>"quick brown dogs jumped over the "
Note that if:
str = "The quick brown lazy dog jumped over the lazy fox"
we would obtain:
str[r]
#=> "quick brown "
which is what we want. If however, we changed .*? in the regex to .*, making it non-lazy, we would obtain:
str[r]
#=> "quick brown lazy dog jumped over the "
Using an index here does not seem to be the right way.
_, key, post = "I like fishing and running".partition(/\bI\b/)
pre, key, _ = (key + post).partition(/\band\b/)
pre.strip # => "I like fishing"
def enclosed_words(sentence, start_word, end_word)
words = sentence.split
return '' unless words.include?(start_word) && words.include?(end_word)
words[words.index(start_word)...words.index(end_word)].join(' ')
end
enclosed_words('I like fishing and running', 'I', 'and') # => "I like fishing"
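If you would rather stay close to the original idea of walking through an array of words, a sketch with drop_while and take_while avoids index bookkeeping entirely. The method name words_between is made up for this example.

```ruby
# Hypothetical helper: skip words until start_word is seen, then keep words
# until stop_word is seen. If stop_word never appears, everything from
# start_word to the end is returned.
def words_between(text, start_word, stop_word)
  text.split
      .drop_while { |word| word != start_word }
      .take_while { |word| word != stop_word }
      .join(' ')
end

puts words_between("I like fishing and running", "I", "and")  # I like fishing
```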

Counting words in Ruby with some exceptions

Say that we want to count the number of words in a document. I know we can do the following:
text.each_line(){ |line| totalWords = totalWords + line.split.size }
Say, that I just want to add some exceptions, such that, I don't want to count the following as words:
(1) numbers
(2) standalone letters
(3) email addresses
How can we do that?
Thanks.
You can wrap this up pretty neatly:
text.each_line do |line|
total_words += line.split.reject do |word|
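word.match(/\A(\d+|\w|\S*@\S+\.\S+)\z/)
end.length
end
The three alternatives match a number, a single character, and (very roughly) an email address. Note that total_words has to be initialized to 0 before the loop.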
Remember Ruby strongly encourages the use of variables with names like total_words and not totalWords.
Assuming you can represent all the exceptions in a single regular expression regex_variable, you could do:
text.each_line(){ |line| totalWords = totalWords + line.split.count {|wrd| wrd !~ regex_variable } }
Your regular expression could look something like:
regex_variable = /\A\d+\z|\A[a-z]\z|\A[^@\s]+@(?:[-a-z0-9]+\.)+[a-z]{2,}\z/i
I don't claim to be a regex expert, so you may want to double check that, particularly the email validation part.
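Putting the pieces together, a self-contained sketch might look like the following. The constant and method names are my own, the sample sentence is invented, and the email pattern is only a rough approximation that assumes addresses of the form name@host.tld.

```ruby
# Hypothetical pattern for tokens that should not be counted:
# whole numbers, single letters, and things that look like email addresses.
EXCLUDED_TOKEN = /\A(?:\d+|[[:alpha:]]|[^@\s]+@[^@\s]+\.[^@\s]+)\z/

def count_words(text)
  text.split.count { |word| word !~ EXCLUDED_TOKEN }
end

puts count_words("Mail 3 copies to a friend at bob@example.com today")  # 6
```

Since split without arguments splits on any whitespace, including newlines, there is no need to go line by line unless the text is too large to hold in memory.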
In addition to the other answers, a little gem hunting came up with this:
WordsCounted Gem
Get the following data from any string or readable file:
Word count
Unique word count
Word density
Character count
Average characters per word
A hash map of words and the number of times they occur
A hash map of words and their lengths
The longest word(s) and its length
The most occurring word(s) and its number of occurrences.
Count individual strings for occurrences.
A flexible way to exclude words (or anything) from the count. You can pass a string, a regexp, an array, or a lambda.
Customisable criteria. Pass your own regexp rules to split strings if you prefer. The default regexp has two features:
Filters special characters but respects hyphens and apostrophes.
Plays nicely with diacritics (UTF and unicode characters): "São Paulo" is treated as ["São", "Paulo"] and not ["S", "", "o", "Paulo"].
Opens and reads files. Pass in a file path or a url instead of a string.
Have you ever started answering a question and found yourself wandering, exploring interesting, but tangential issues, or concepts you didn't fully understand? That's what happened to me here. Perhaps some of the ideas might prove useful in other settings, if not for the problem at hand.
For readability, we might define some helpers in the class String, but to avoid contamination, I'll use Refinements.
Code
module StringHelpers
refine String do
def count_words
remove_punctuation.split.count { |w|
!(w.is_number? || w.size == 1 || w.is_email_address?) }
end
def remove_punctuation
gsub(/[.!?,;:)](?:\s|$)|(?:^|\s)\(|\-|\n/,' ')
end
def is_number?
self =~ /\A-?\d+(?:\.\d+)?\z/
end
def is_email_address?
include?('#') # for testing only
end
end
end
module CountWords
using StringHelpers
def self.count_words_in_file(fname)
IO.foreach(fname).reduce(0) { |t,l| t+l.count_words }
end
end
Note that I've put using inside a module, which limits the refinement to that module's body. using can also be called at the top level of a file, where it stays active until the end of that file, but depending on the Ruby version it may not work in an interactive session such as irb.
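To illustrate the scoping, here is a small sketch with made-up names (Shout, Loud, shout). The refined method is callable inside the module that activates the refinement, but String itself is not changed for code outside it.

```ruby
# Hypothetical refinement adding String#shout.
module Shout
  refine String do
    def shout
      upcase + "!"
    end
  end
end

module Loud
  using Shout # active only until the end of this module body

  def self.greet(name)
    name.shout
  end
end

puts Loud.greet("hi")             # HI!
puts "hi".respond_to?(:shout)     # false, the refinement is not active here
```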
Example
Let's first informally check that the helpers are working correctly:
module CheckHelpers
using StringHelpers
s = "You can reach my dog, a 10-year-old golden, at fido#dogs.org."
p s = s.remove_punctuation
#=> "You can reach my dog a 10 year old golden at fido#dogs.org "
p words = s.split
#=> ["You", "can", "reach", "my", "dog", "a", "10",
# "year", "old", "golden", "at", "fido#dogs.org"]
p '123'.is_number? #=> 0
p '-123'.is_number? #=> 0
p '1.23'.is_number? #=> 0
p '123.'.is_number? #=> nil
p "fido#dogs.org".is_email_address? #=> true
p "fido(at)dogs.org".is_email_address? #=> false
p s.count_words #=> 9 (`'a'`, `'10'` and "fido#dogs.org" excluded)
s = "My cat, who has 4 lives remaining, is at abbie(at)felines.org."
p s = s.remove_punctuation
p s.count_words
end
All looks OK. Next, I'll put some text in a file:
FName = "pets"
text =<<_
My cat, who has 4 lives remaining, is at abbie(at)felines.org.
You can reach my dog, a 10-year-old golden, at fido#dogs.org.
_
File.write(FName, text)
#=> 125
and confirm the file contents:
File.read(FName)
#=> "My cat, who has 4 lives remaining, is at abbie(at)felines.org.\n
# You can reach my dog, a 10-year-old golden, at fido#dogs.org.\n"
Now, count the words:
CountWords.count_words_in_file(FName)
#=> 18 (9 in each line)
Note that there is at least one problem with the removal of punctuation. It has to do with the hyphen. Any idea what that might be?
Something like...?
def is_countable(word)
return false if word.size < 2
return false if word =~ /^[0-9]+$/
return false if is_an_email_address(word) # you need a gem for this...
return true
end
wordCount = text.split.count { |word| is_countable(word) }
Or, since I am jumping to the conclusion that you can just split your entire text into an array with split(), you might need:
wordCount = 0
text.each_line do |line|
line.split.each{|word| wordCount += 1 if is_countable(word) }
end

Ruby regex checks string for variations of pattern of same length

I was wondering how you construct the regular expression to check if the string has a variation of a pattern with the same length. Say the string is "door boor robo omanyte" how do I return the words that have the variation of [door]?
You can easily get all the possible words using Array#permutation. Then you can scan for them in provided string. Here:
possible_words = %w[d o o r].permutation.map &:join
# => ["door", "doro", "door", "doro", "droo", "droo", "odor", "odro", "oodr", "oord", "ordo", "orod", "odor", "odro", "oodr", "oord", "ordo", "orod", "rdoo", "rdoo", "rodo", "rood", "rodo", "rood"]
string = "door boor robo omanyte"
string.scan(Regexp.union(possible_words))
# => ["door"]
string = "door close rood example ordo"
string.scan(Regexp.union(possible_words))
# => ["door", "rood", "ordo"]
UPDATE
You can improve scan further by requiring word boundaries around each match. Here:
string = "doorrood example ordo"
string.scan(/\b(?:#{possible_words.join('|')})\b/)
# => ["ordo"]
NOTE
As Cary correctly pointed out in comments below, this process is quite inefficient if you intend to find permutation for a fairly large string. However it should work fine for OP's example.
If the comment I left on your question correctly interprets the question, you could do this:
str = "door sit its odor to"
str.split
.group_by { |w| w.chars.sort.join }
.values
.select { |a| a.size > 1 }
#=> [["door", "odor"], ["sit", "its"]]
This assumes all the letters are the same case.
If case is not important, just make a small change:
str = "dooR sIt itS Odor to"
str.split
.group_by { |w| w.downcase.chars.sort.join }
.values
.select { |a| a.size > 1 }
#=> [["dooR", "Odor"], ["sIt", "itS"]]
In my opinion the fastest way to check whether two words are variations of each other is
word_a.chars.sort == word_b.chars.sort
since both words then consist of exactly the same characters.
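Applied to the original question, that comparison could be used roughly like this. It is a sketch; the method name anagrams_of and the extra word "odor" in the sample text are my own additions.

```ruby
# Hypothetical helper: keep the words whose sorted letters equal
# the sorted letters of the target word.
def anagrams_of(target, text)
  key = target.chars.sort
  text.split.select { |word| word.chars.sort == key }
end

p anagrams_of("door", "door boor robo odor omanyte")  # ["door", "odor"]
```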
IMO, some kind of iteration is definitely necessary to build a regular expression that matches this. Not using a regular expression at all would be better still.
def variations_of_substr(str, sub)
# Creates regexp to match words with same length, and
# with same characters of str.
patt = "\\b" + ( [ "[#{sub}]{1}" ] * sub.size ).join + "\\b"
# Above alone won't be enough, characters in both words should
# match exactly.
str.scan( Regexp.new(patt) ).select do |m|
m.chars.sort == sub.chars.sort
end
end
variations_of_substr("door boor robo omanyte", "door")
# => ["door"]
