How to replace a character by regex in Ruby

How do I replace the letter 'b' with 'c' when it follows a run of the letter 'a' whose length is a multiple of 2?
For example:
ab => ab
aab => aac
aaab => aaab
aaaab => aaaac
aaaabaaabaab => aaaacaaabaac

Match a run of aa pairs followed by b, then rebuild the match from the captured run.
Regex: (?<!a)((?:a{2})+)b
Explanation:
(?<!a) is a negative lookbehind: the match may not start right after another a. Without it, the regex could start in the middle of an odd-length run and match its even-length tail.
((?:a{2})+)b matches an even number of a characters followed by b. The outer group captures the run of a's and is numbered \1.
Replacement: \1c, i.e. the first captured group followed by c.
Test String:
ab
aab
aaab
aaaab
aaaabaaabaab
After replacement:
ab
aac
aaab
aaaac
aaaacaaabaac
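In Ruby, the same regex and replacement can be applied with gsub. This is a minimal sketch; the constant and method names are made up for illustration and are not part of the answer above.

```ruby
# Even-length run of "a" (not preceded by another "a"), followed by "b".
EVEN_A_THEN_B = /(?<!a)((?:a{2})+)b/

# Hypothetical helper: swaps the trailing "b" for "c", keeping the run of a's (\1).
def replace_b_after_even_a(text)
  text.gsub(EVEN_A_THEN_B, '\1c')
end

%w[ab aab aaab aaaab aaaabaaabaab].each do |s|
  puts "#{s} => #{replace_b_after_even_a(s)}"
end
```

Single quotes around '\1c' keep Ruby from treating the backslash as a string escape.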

Related

why is \d+ not matching all digits?

I have the following regular expression:
REGEX = /^.+(\d+.+(?=AL|AK|AS|AZ|AR|CA|CO|CT|DE|DC|FM|FL|GA|GU|HI|ID|IL|IN|IA|KS|KY|LA|ME|MH|MD|MA|MI|MN|MS|MO|MT|NE|NV|NH|NJ|NM|NY|NC|ND|MP|OH|OK|OR|PW|PA|PR|RI|SC|SD|TN|TX|UT|VT|VI|VA|WA|WV|WI|WY)[A-Z]{2}[, ]+\d{5}(?:-\d{4})?).+/
I have the following string:
str = "fdsfd 8126 E Bowen AVE Bensalem, PA 19020-1642 dfdf"
Notice my capturing group begins with one or more digits that match the pattern. Yet this is what I get:
str =~ REGEX
$1
=> "6 E Bowen AVE Bensalem, PA 19020-1642"
Or
match = str.match(REGEX)
match[1]
=> "6 E Bowen AVE Bensalem, PA 19020-1642"
Why is it missing the first three digits, 812?
The regex below works properly:
REGEX = /^.+?(\d+.+(?=AL|AK|AS|AZ|AR|CA|CO|CT|DE|DC|FM|FL|GA|GU|HI|ID|IL|IN|IA|KS|KY|LA|ME|MH|MD|MA|MI|MN|MS|MO|MT|NE|NV|NH|NJ|NM|NY|NC|ND|MP|OH|OK|OR|PW|PA|PR|RI|SC|SD|TN|TX|UT|VT|VI|VA|WA|WV|WI|WY)[A-Z]{2}[, ]+\d{5}(?:-\d{4})?).+/
Note the question mark added after the first .+ near the beginning of the regex:
/^.+?(\d+...
    ^
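By default, your first .+ is greedy: it consumes as much of the string as it can and backtracks only as far as needed for the rest of the regex to match. Since \d+ is satisfied by a single digit, the .+ swallows 812 and leaves only the 6 for the capture group. Adding ? after the plus makes it lazy instead of greedy, so it stops at the first position where the rest of the pattern can match.
An alternative is to keep the leading part from matching digits at all, like this:
/^[^\d]+(\d+...
[^\d]+ matches one or more non-digit characters (\D+ is equivalent), so it has to stop before the first digit.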
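The difference between the three prefixes can be checked side by side. The sketch below uses shortened stand-ins for the original pattern: the state-abbreviation lookahead is left out for brevity, which is an assumption made only to keep the example readable, and the constant names are hypothetical.

```ruby
STR = "fdsfd 8126 E Bowen AVE Bensalem, PA 19020-1642 dfdf"

# Shortened versions of the pattern; only the prefix before the group differs.
TAIL      = '(\d+.+[A-Z]{2}[, ]+\d{5}(?:-\d{4})?).+'
GREEDY    = Regexp.new('^.+' + TAIL)     # greedy prefix eats leading digits
LAZY      = Regexp.new('^.+?' + TAIL)    # lazy prefix stops as early as possible
NON_DIGIT = Regexp.new('^[^\d]+' + TAIL) # prefix cannot contain digits at all

puts STR.match(GREEDY)[1]    # 6 E Bowen AVE Bensalem, PA 19020-1642
puts STR.match(LAZY)[1]      # 8126 E Bowen AVE Bensalem, PA 19020-1642
puts STR.match(NON_DIGIT)[1] # 8126 E Bowen AVE Bensalem, PA 19020-1642
```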

How do I insert a space before a number in a string if it matches a regular expression?

I want to insert a space before a number in a string if my string has a capital letter, one or more lower case letters, and then a number right after it. That is, if I have
Bdefg23
I want to insert a space between the "g" and the "23" making the above string
Bdefg 23
So this string would not get changed
BabcdD55
because there is a capital letter before "55". I tried this below
str.split(/([A-Z][a-z]+)/).delete_if(&:empty?).join(' ')
but it works too well. That is, if my string is
Ptannex..
it will turn it into
Ptannex ..
How can I adjust what I have to make it work for only the condition I outlined? Btw, I'm using Ruby 2.4.
You can always do it roughly this way:
%w[
  Bdefg23
  Ptannex95
  BigX901
  littleX902
  CS101
  xx900
].each do |example|
  puts '%s -> %s' % [
    example,
    example.sub(/\A([A-Z][a-z]+)(\d+)\z/, '\1 \2')
  ]
end
Which gives you output like:
Bdefg23 -> Bdefg 23
Ptannex95 -> Ptannex 95
BigX901 -> BigX901
littleX902 -> littleX902
CS101 -> CS101
xx900 -> xx900
You may use
s.sub(/\A(\p{Lu}\p{L}*\p{Ll})(\d+)\z/, '\1 \2')
Details:
\A - start of string
(\p{Lu}\p{L}*\p{Ll}) - Group 1:
  \p{Lu} - any Unicode uppercase letter
  \p{L}* - zero or more Unicode letters
  \p{Ll} - any Unicode lowercase letter
(\d+) - Group 2: one or more digits
\z - end of string
The \1 \2 replacement pattern replaces the whole match with the contents of Group 1, then a space, and then the contents of Group 2.
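A quick way to try the Unicode-aware pattern on the strings from the question is sketched below. The method and constant names are illustrative only, and the accented sample is an added test case, not one from the question.

```ruby
# Uppercase letter, any letters, a lowercase letter, then digits to the end.
CAP_WORD_THEN_DIGITS = /\A(\p{Lu}\p{L}*\p{Ll})(\d+)\z/

# Hypothetical helper: inserts a space between the word and the number.
def space_before_number(s)
  s.sub(CAP_WORD_THEN_DIGITS, '\1 \2')
end

# "\u00C9lan42" is "Élan42", included to exercise the Unicode classes.
["Bdefg23", "BabcdD55", "Ptannex..", "\u00C9lan42"].each do |s|
  puts "#{s} -> #{space_before_number(s)}"
end
```

Strings that do not match the whole pattern, such as BabcdD55 and Ptannex.., come back unchanged because sub only replaces when there is a match.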

How to select words that are made up of the same letter using regex?

I have a dictionary text file that contains some words that I don't want.
Example:
aa
aaa
aaaa
bb
b
bbb
etc
I want to use a regular expression to select these words and remove them. However,
what I have seems to be getting too long and there must be a more efficient approach.
Here is my code so far:
/^a{1,6}$|^b{1,6}$|^c{1,6}$|^d{1,6}$|^e{1,6}$|^f{1,6}$|^g{1,6}$|^[i]{2,3}$/
It seems that I have to do this for every letter. How could I do this more succinctly?
It's a lot easier to collapse the word down to unique letters and remove all of those with just one letter in them:
words = "aa aaa aaaa bb b bbb etc aab abcabc"
words.split(/\s+/).select do |word|
  word.chars.uniq.length > 1
end
# => ["etc", "aab", "abcabc"]
This splits your string into words, then selects only those words that contain more than one distinct character (.chars.uniq).
^([a-z])\1?\1?\1?\1?\1?$
Match any single letter, followed by up to five optional backreferences to that letter.
This might work too ({,5} is shorthand for {0,5}; Ruby accepts it, but not every regex engine does):
^([a-z])\1{,5}$
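Applied to a word list in Ruby, the backreference pattern could be used roughly as follows. This is a sketch: the names are hypothetical, and the quantifier is written as {0,5} so the pattern also reads the same in other engines.

```ruby
# One letter followed by up to five repeats of that same letter.
SAME_LETTER_WORD = /^([a-z])\1{0,5}$/

# Hypothetical helper: keeps only the words that do NOT match the pattern.
def drop_same_letter_words(words)
  words.reject { |word| word =~ SAME_LETTER_WORD }
end

p drop_same_letter_words(%w[aa aaa aaaa bb b bbb etc aab abcabc])
# => ["etc", "aab", "abcabc"]
```

In Ruby, ^ and $ anchor at line boundaries; for one word per string that is equivalent to \A and \z.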
Try this
\b([a-zA-Z])\1*\b
If you also want to include repeated digits or underscores (in addition to letters), use this:
\b([\w])\1*\b
Update:
To keep the single-letter word I from being removed:
(?i)ii+|\b((?i)[a-hj-z])\1*\b
(?i) is added above to make the match case-insensitive.
Demo:
https://regex101.com/r/gFUWE8/7
You can try with this regex:
\b([a-z])\1{0,}\b
and replace each match with an empty string.
Ruby code sample:
re = /\b([a-z])\1{0,}\b/m
str = 'aa aaa aaaa bb b bbb abc aa a pqaaa '
result = str.gsub(re,'')
puts result

Finding all groups of contiguous words in string

I need to find all groups of two contiguous words in a string, but only words that are 2-3 characters long. So far I've come up with this:
'toolong fee fi fo fum toolong verylong aa bb'.scan(/\b[a-z]{2,3}\s+\b[a-z]{2,3}/)
=> ["fee fi", "fo fum", "aa bb"]
But I want something like this:
=> ["fee fi", "fi fo", "fo fum", "aa bb"]
Any help greatly appreciated.
You need to use a lookahead together with a capturing group in order to get overlapping matches.
> 'toolong fee fi fo fum toolong verylong aa bb'.scan(/(?=\b([a-z]{2,3}\s+[a-z]{2,3})\b)/)
=> [["fee fi"], ["fi fo"], ["fo fum"], ["aa bb"]]
> 'toolong fee fi fo fum toolong verylong aa bb'.scan(/\b(?=([a-z]{2,3}\s+[a-z]{2,3})\b)/).flatten
=> ["fee fi", "fi fo", "fo fum", "aa bb"]
The logical way is to consume the first 2-3 letter word, then look ahead for the next one.
Since you want both words together, you'd capture each one and then join them after each match:
\b([a-z]{2,3})(?=(\s+[a-z]{2,3})\b)
Expanded:
\b
( [a-z]{2,3} )        # (1)
(?=
  (                   # (2 start)
    \s+
    [a-z]{2,3}
  )                   # (2 end)
  \b
)
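In Ruby, the capture-then-join approach might look like the sketch below. The constant and method names are made up for illustration. Because group 2 includes the whitespace, a plain join restores the original spacing.

```ruby
# Group 1 is the consumed word; group 2 (inside the lookahead) is the
# whitespace plus the following word, which is not consumed.
PAIR_BY_JOIN = /\b([a-z]{2,3})(?=(\s+[a-z]{2,3})\b)/

# Hypothetical helper: scan yields [group1, group2] per match; join glues them.
def word_pairs_by_join(text)
  text.scan(PAIR_BY_JOIN).map(&:join)
end

p word_pairs_by_join('toolong fee fi fo fum toolong verylong aa bb')
# => ["fee fi", "fi fo", "fo fum", "aa bb"]
```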
The next logical way (though not intuitive) is to look ahead for the combined two words, then consume the first one to advance the match position:
(?=\b(([a-z]{2,3})\s+[a-z]{2,3})\b)\2
This way lets you just grab group 1 without the need to join.
Expanded:
(?=
  \b
  (                     # (1 start)
    ( [a-z]{2,3} )      # (2)
    \s+
    [a-z]{2,3}
  )                     # (1 end)
  \b
)
\2
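A Ruby sketch of this lookahead-first variant follows; the names are hypothetical. Since scan returns both groups for each match, the helper keeps only group 1, the full pair.

```ruby
# The lookahead captures the whole pair (group 1) and its first word (group 2);
# \2 then consumes just the first word so the next match can start at the second.
PAIR_BY_LOOKAHEAD = /(?=\b(([a-z]{2,3})\s+[a-z]{2,3})\b)\2/

# Hypothetical helper: take group 1 from each [group1, group2] result.
def word_pairs_by_lookahead(text)
  text.scan(PAIR_BY_LOOKAHEAD).map(&:first)
end

p word_pairs_by_lookahead('toolong fee fi fo fum toolong verylong aa bb')
# => ["fee fi", "fi fo", "fo fum", "aa bb"]
```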

Elimination of extra characters in a word

I'm doing a project which requires removal of extra letters in a word.
If a letter occurs three or more times consecutively, we condense it to one letter:
-Happyyyyyy -> Happy
-awwwsum -> awsum
-cooool -> col
I'm using Ruby 1.8.7 to do this. How do I go about this?
Here's how you do it:
result = subject.gsub(
  /(.)     # Match a single character, capture it in group 1
  \1{2,}   # Match the same character 2 or more times, as many as possible
  /x,
  '\1')    # Replace with the one captured character
Result:
> subject = "happyyyy daaaaays!!!"
=> "happyyyy daaaaays!!!"
> result = subject.gsub(/(.)\1{2,}/, '\1')
=> "happy days!"
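Run against the three examples from the question, the same substitution behaves as follows. This is a small sketch with a hypothetical method name, not part of the original answer.

```ruby
# Hypothetical helper: collapses any run of three or more identical
# characters down to a single one; runs of two are left alone.
def condense_repeats(word)
  word.gsub(/(.)\1{2,}/, '\1')
end

%w[Happyyyyyy awwwsum cooool].each do |w|
  puts "#{w} -> #{condense_repeats(w)}"
end
# Happyyyyyy -> Happy
# awwwsum -> awsum
# cooool -> col
```

The double p in Happy survives because \1{2,} needs at least two repeats after the first character, i.e. three in a row.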
