Ruby regex: how to replace the nth instance of a match in a string

In my app I need to find all number substrings, scan each one, find the first one that falls within a range (such as between 5 and 15), and replace that instance with another string, "X".
My test string is s = "1 foo 100 bar 10 gee 1".
My initial pattern is any string of one or more digits, e.g. re = Regexp.new(/\d+/).
matches = s.scan(re) gives ["1", "100", "10", "1"].
If I want to replace the Nth match, and only the Nth match, with "X", how do I do it?
For example if I want to replace the third match "10" (matches[2]) I can't just say
s[matches[2]] = "X" because that does two replacements
"1 foo X0 bar X gee 1"
Any help would be appreciated!

String#gsub has a form that takes a block. It yields to the block for each match, and replaces the match with the result of the block. So:
first = true
"1 foo 100 bar 10 gee 1 12".gsub(/\d+/) do |digits|
  number = digits.to_i
  if number >= 5 && number <= 15 && first
    # do the replacement
    first = false
    'X'
  else
    # don't replace; i.e. replace with itself
    digits
  end
end
# => "1 foo 100 bar X gee 1 12"
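The same block form can be wrapped in a small method so the range and the replacement become parameters. This is only a sketch: the method name replace_first_in_range and its arguments are made up for illustration and are not part of the answer above.

```ruby
# Hypothetical helper: replaces only the first number that falls inside `range`.
def replace_first_in_range(str, range, replacement)
  replaced = false
  str.gsub(/\d+/) do |digits|
    if !replaced && range.cover?(digits.to_i)
      replaced = true
      replacement
    else
      digits # every other number is replaced with itself
    end
  end
end

replace_first_in_range("1 foo 100 bar 10 gee 1 12", 5..15, "X")
# => "1 foo 100 bar X gee 1 12"
```

Because gsub returns a new string, the original string is left unmodified.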

An alternative is to encode the number range directly in the regex using character classes (if the range is not too complicated):
>> s = "1 foo 100 bar 10 gee 1"
=> "1 foo 100 bar 10 gee 1"
>> s.sub(/(?<!\d)([5-9]|1[0-5])(?!\d)/, 'X')
=> "1 foo 100 bar X gee 1"
The negative lookarounds ensure that part of a longer digit sequence is not matched.
You can use \b instead of the lookarounds if the numbers cannot be part of words like abc12ef or 8foo.
([5-9]|1[0-5]) matches the numbers from 5 to 15.
Initially, the title led me to think that you wanted to replace the Nth occurrence; for example, N=2 means replace the second occurrence of any digit sequence. For that you can use this:
# here the number inside {} will be N-1
>> s.sub(/(\d+.*?){1}\K\d+/, 'X')
=> "1 foo X bar 10 gee 1"
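If the regex with \K feels hard to read, the block form of gsub can count matches instead. The following is a minimal sketch under that assumption; replace_nth_match is a hypothetical name, and n is 1-based.

```ruby
# Hypothetical helper: replaces the n-th (1-based) match of `pattern` only.
def replace_nth_match(str, pattern, n, replacement)
  seen = 0
  str.gsub(pattern) do |match|
    seen += 1
    seen == n ? replacement : match
  end
end

s = "1 foo 100 bar 10 gee 1"
replace_nth_match(s, /\d+/, 3, "X") # => "1 foo 100 bar X gee 1"
replace_nth_match(s, /\d+/, 2, "X") # => "1 foo X bar 10 gee 1"
```

When there are fewer than n matches, the string comes back unchanged.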

Related

Ruby four-digit number: how to check whether it contains two identical digits

How can I check whether a four-digit number contains at least two identical digits?
For example, 2432 contains two 2s. How do I detect that and then print a message saying that the number has two or more identical digits?
puts "enter four-digit number: "
four_digit_num = gets.chomp.to_i
You can do that as follows.
r = /(\d)\d*\1/
gets.match?(r) # when gets #=> "1231\n"
#=> true
gets.match?(r) # when gets #=> "1232\n"
#=> true
gets.match?(r) # when gets #=> "1234\n"
#=> false
We can write the regular expression in free-spacing mode to make it self-documenting.
r = /
  (\d)  # match a digit and save to capture group 1
  \d*   # match zero or more digits
  \1    # match the contents of capture group 1
/x      # specify free-spacing regex definition mode
See String#match?.
If you must begin with the integer
four_digit_num = gets.to_i
you could write
arr = four_digit_num.digits
arr.uniq.size < arr.size
or convert it to a string and apply the first method above:
four_digit_num.to_s.match?(r)
Test for unique digits:
four_digit_num = gets.to_i
digits = four_digit_num.digits # leading zeros will disappear
puts "2 or more same digits." if digits != digits.uniq
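Since converting to an integer drops leading zeros, one option is to keep the input as a string and validate it first. This is a sketch only; the method name repeated_digit? and the decision to raise on malformed input are assumptions, not something the question specifies.

```ruby
# Hypothetical helper: works on the raw input string so leading zeros survive.
def repeated_digit?(input)
  str = input.strip
  raise ArgumentError, "expected exactly four digits" unless str.match?(/\A\d{4}\z/)
  str.chars.uniq.size < str.size
end

repeated_digit?("2432\n") # => true
repeated_digit?("1234")   # => false
repeated_digit?("0012")   # => true ("0012".to_i would become 12 and lose both zeros)
```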
There can be up to two pairs of duplicates, so that's probably a case that needs to be handled. In any case, with Ruby 2.7 or later you can count the occurrences of each digit in your String with Enumerable#tally, which returns a Hash, like so:
def report_duplicates_in(digits)
  dupes = digits.to_s.chars.tally.select { _2 > 1 }
  puts "duplicate digits=>counts for #{digits}: #{dupes}" if dupes.any?
  [digits, dupes]
end
# test with some values
%w[1234 2432 2442].map do |four_digit_num|
  report_duplicates_in(four_digit_num)
end.to_h
This will print the following to standard output and return a Hash:
duplicate digits=>counts for 2432: {"2"=>2}
duplicate digits=>counts for 2442: {"2"=>2, "4"=>2}
#=> {"1234"=>{}, "2432"=>{"2"=>2}, "2442"=>{"2"=>2, "4"=>2}}

How to check if the first and last character of a word are the same in Ruby?

If I have a string that's a sentence, I want to check if the first and last letter of each word are the same and find which of the words have their first and last letter the same. For example:
sentence_one = "Label the bib numbers in red."
You could use a regex:
sentence_one = "Label the bib numbers in red"
sentence_one.scan(/(\b(\w)\w*(\2)\b)/i)
#=> [["Label", "L", "l"], ["bib", "b", "b"]]
\b is a word boundary and \w matches a word character (a letter, digit or underscore; you may have to adjust this). There are 3 captures: (1) the whole word, (2) the first letter and (3) the last letter. Using \2 requires the last letter to match the first, and the i flag makes that comparison case-insensitive.
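If only the words themselves are wanted, the first capture of each result can be picked out. A minimal sketch using the same pattern (the variable names are illustrative):

```ruby
sentence = "Label the bib numbers in red"
pattern  = /(\b(\w)\w*(\2)\b)/i

# scan returns one array of captures per match; the first capture is the whole word
words = sentence.scan(pattern).map(&:first)
# => ["Label", "bib"]
```

Note that this pattern needs at least two characters per word, so one-letter words are never returned.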
This will print out all words that start and end with the same letter (not case-sensitive):
sentence_one = "Label the bib numbers in red"
words = sentence_one.split(' ')
words.each do |word|
  if word[0].downcase == word[-1].downcase
    puts word
  end
end
sentence_one.scan(/\S+/).select{|s| s[0].downcase == s[-1].downcase}
# => ["Label", "bib"]
In a comment the OP asked how to obtain a count of the words having the desired property. I assume the desired property is that a word's first and last characters are the same, though possibly of different case. Here is one way to do that without producing an intermediate array whose elements would then be counted.
r = /
  \b              # match a word break
  (?:             # begin a non-capture group
    \p{Alpha}     # match a letter
    |             # or
    (\p{Alpha})   # match a letter in capture group 1
    \p{Alpha}*    # match zero or more letters
    \1            # match the contents of capture group 1
  )               # end the non-capture group
  \b              # match a word break
/ix               # case-indifferent and free-spacing regex definition modes
str = "How, now is that a brown cow?"
str.gsub(r).count
#=> 2
See String#gsub, in particular the case where there is only one argument and no block is provided.
Note
str.gsub(r).to_a
#=> ["that", "a"]
str.scan(r)
#=> [["t"], [nil]]
Sometimes it is awkward to use scan when the regular expression contains capture groups (see String#scan). Those problems often can be avoided by instead using gsub followed by to_a (or Enumerable#entries).
Just to add one more option, splitting into an array (and skipping one-letter words):
sentence_one = "Label the bib numbers in a red color"
sentence_one.split(' ').keep_if{ |w| w.end_with?(w[0].downcase) & (w.size > 1) }
#=> ["Label", "bib"]
sentence_one = "Label the bib numbers in red"
puts sentence_one.split(' ').count{|word| word[0] == word[-1]} # => 1

Regex pattern to see if a string contains a number in a range

I'm trying to match a number between 400 and 499 anywhere in a string. For example, both:
string = "[401] One too three"
string2 = "Yes 450 sir okay"
should match. Both:
string3 = "[123] Test"
string4 = "This is another string"
should fail.
What's the best way to write the regex? I wrote:
string =~ /\d{3}/
to see if the string contains a three digit integer. How can I see if that's within the range?
If you don't actually need the number afterwards, and just need a yes or no on whether the string contains a number in the range 400-499, you can check for:
1. the beginning of a line or a non-digit character, followed by
2. the digit '4', followed by
3. any two digits, followed by
4. the end of a line or a non-digit character
so you end up with a regex looking something like
regex = /(?:^|\D)4\d{2}(?:\D|$)/
or, using negative lookahead and lookbehind:
regex = /(?<!\d)4\d{2}(?!\d)/
Steps 1 and 4 are needed to rule out numbers such as 1400-1499 and 4000-4999 (and other numbers with more than three digits that have 400-499 buried somewhere in them). You can then use String#match? in newer Ruby versions (2.4 and later) to get a simple boolean:
string.match?(regex) # => true
string2.match?(regex) # => true
string3.match?(regex) # => false
string4.match?(regex) # => false
"1400".match?(regex) # => false
"400".match?(regex) # => true
"4000".match?(regex) # => false
"[1400]".match?(regex) # => false
"[400]".match?(regex) # => true
"[4000]".match?(regex) # => false
This is a fairly simple regex, with no need to pull out the match and convert it to an integer if you just need a yes or no.
def doit(str, rng)
  str.gsub(/-?\d+/).find { |s| rng.cover?(s.to_i) }
end
doit "[401] One too three", 400..499 #=> "401"
doit "Yes 450 sir okay", 400..499 #=> "450"
doit "Yes -450 sir okay", -499..400 #=> "-450"
doit "[123] Test", 400..499 #=> nil
doit "This is another string", 400..499 #=> nil
Recall that String#gsub returns an enumerator, when used with a single argument and no block. The enumerator merely generates matches and performs no substitutions. I've found a number of situations, as here, where this form of the method can be used to advantage.
If str may contain the representations of multiple integers within the specified range, and all of them are wanted, simply replace Enumerable#find with Enumerable#select:
"401, 532 and -126".gsub(/-?\d+/).select { |s| (-127..451).cover?(s.to_i) }
#=> ["401", "-126"]
I recommend using a general regex to first extract the number from each line. Then use ordinary code to check the range:
s = "[404] Yes sir okay"
data = s.match(/\[(\d+)\]/)
data.captures
num = data[1].to_i
if (num >= 400 && num < 500)
  print "Match"
else
  print "No Match"
end
The pattern I wrote should actually work to match any number in brackets, anywhere in the string.
Extract the digits with a regex, convert the capture group to integer and ask Ruby if they're between your bounds:
s = "[499] One too three"
lo = 400
hi = 499
puts s =~ /(\d{3})/ && $1.to_i.between?(lo, hi)
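One caveat with the one-liner above: /(\d{3})/ captures the first three consecutive digits it finds, so a string like "id 123 code 450" is tested against 123 only, and "id 4990" is tested against 499. A sketch that checks every complete digit run instead (number_in_range? is a hypothetical name):

```ruby
# Hypothetical helper: true if any whole run of digits lies between lo and hi.
def number_in_range?(str, lo, hi)
  str.scan(/\d+/).any? { |digits| digits.to_i.between?(lo, hi) }
end

number_in_range?("[499] One too three", 400, 499) # => true
number_in_range?("id 123 code 450", 400, 499)     # => true
number_in_range?("id 4990", 400, 499)             # => false
number_in_range?("This is another string", 400, 499) # => false
```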

Checking string with minimum 8 digits using regex

I have regex as follows:
/^(\d|-|\(|\)|\+|\s){12,}$/
This allows digits, -, (, ), + and whitespace. But I want to ensure the string contains at least 8 digits.
Some allowed strings are as follows:
(1323 ++24)233
24243434 43
++++43435++4554345 434
It should not allow strings like:
((((((1213)))
++++232+++
Use a lookahead at the start of your regex:
/^(?=(.*\d){8,})[\d\(\)\s+-]{8,}$/
  ---------------
         |
         |-> this checks for 8 or more digits
(?=(.*\d){8,}) is a zero-width lookahead that checks for zero or more characters (i.e. .*) followed by a digit (i.e. \d), repeated 8 or more times (i.e. {8,}).
(?=) is called zero-width because it doesn't consume any characters; it only checks.
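As a quick check, the regex above can be run against the sample strings from the question. The variable names below are illustrative only.

```ruby
at_least_8_digits = /^(?=(.*\d){8,})[\d\(\)\s+-]{8,}$/

allowed  = ["(1323 ++24)233", "24243434 43", "++++43435++4554345 434"]
rejected = ["((((((1213)))", "++++232+++"]

allowed.all?   { |s| s.match?(at_least_8_digits) } # => true
rejected.none? { |s| s.match?(at_least_8_digits) } # => true
```

In Ruby, ^ and $ match at line boundaries; when the whole string must be validated, \A and \z are the stricter anchors.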
To restrict it to between 8 and 14 digits you can use
/^(?=([^\d]*\d){8,14}[^\d]*$)[\d\(\)\s+-]{8,}$/
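A sketch of the same idea with whole-string anchors. This is a variation, not the regex above: it assumes \A and \z in place of ^ and $, and \D in place of [^\d].

```ruby
# Variation on the regex above: 8 to 14 digits, anchored to the whole string.
eight_to_14_digits = /\A(?=(?:\D*\d){8,14}\D*\z)[-\d()+\s]{8,}\z/

"24243434 43".match?(eight_to_14_digits)            # => true  (10 digits)
"++++43435++4554345 434".match?(eight_to_14_digits) # => false (15 digits)
"++++232+++".match?(eight_to_14_digits)             # => false (3 digits)
"12345678\nabc".match?(eight_to_14_digits)          # => false (letters on a second line)
```

The lookahead counts digits by requiring the rest of the string to consist of 8 to 14 groups of "non-digits, then one digit", followed only by non-digits.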
Here's a non-regular-expression solution:
numbers = ["(1323 ++24)233", "24243434 43" , "++++43435++4554345 434", "123 456_7"]
numbers.each do |number|
  count = 0
  number.each_char do |char|
    count += 1 if char.to_i.to_s == char
    break if count > 7
  end
  puts "#{count > 7}"
end
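There is no need for $, or for the "or more" part of {8,}, and it is unclear where {12,} comes from. A start anchor is still needed, though; without it the second condition only inspects the text after the match position, so a string such as "a12345678" would match at index 1.
The following makes the intention transparent.
r = /
  \A                 # Anchor at the start of the string
  (?=(?:.*\d){8})    # First condition: Eight digits
  (?!.*[^-\d()+\s])  # Second condition: Characters other than `[-\d()+\s]` should not be included.
/x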
resulting in:
"(1323 ++24)233" =~ r #=> 0
"24243434 43" =~ r #=> 0
"++++43435++4554345 434" =~ r #=> 0
"((((((1213)))" =~ r #=> nil
"++++232+++" =~ r #=> nil

Ruby: Conditional replace in String using gsub

Given an input string:
<m>1</m>
<m>2</m>
<m>10</m>
<m>11</m>
I would like to replace all values that are not equal to 1 with 5.
So the output String should look like:
<m>1</m>
<m>5</m>
<m>5</m>
<m>5</m>
I tried using:
gsub(/(<m>)([^1])(<\/m>)/, '\15\3')
But this will not replace 10 and 11.
#gsub can optionally take a block and will replace with the result of that block:
subject.gsub(/\d+/) { |m| m == '1' ? m : '5' }
Without a regexp, just because it's possible:
"1 2 10 11".split.map{|n| n=='1' ? n : '5'}.join(' ')
result = subject.gsub(/\b(?!1\b)\d+/, '5')
Explanation:
\b      # match at a word boundary (in this case, at the start of a number)
(?!     # assert that it's not possible to match
  1     #   1
  \b    #   if followed by a word boundary (= end of the number)
)       # end of lookahead assertion
\d+     # match any (integer) number
Edit:
If you just wish to replace numbers that are surrounded by <m> and </m> then you can use
result = subject.gsub(/<m>(?!1\b)\d+<\/m>/, '<m>5</m>')
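To compare the approaches side by side, here is a small sketch that runs them on the input from the question. It assumes subject holds the four lines joined by newlines; the variable names are illustrative.

```ruby
subject  = "<m>1</m>\n<m>2</m>\n<m>10</m>\n<m>11</m>"
expected = "<m>1</m>\n<m>5</m>\n<m>5</m>\n<m>5</m>"

with_block     = subject.gsub(/\d+/) { |m| m == '1' ? m : '5' }
with_lookahead = subject.gsub(/\b(?!1\b)\d+/, '5')
tags_only      = subject.gsub(/<m>(?!1\b)\d+<\/m>/, '<m>5</m>')

[with_block, with_lookahead, tags_only].all? { |result| result == expected } # => true
```

The first two variants replace every number in the string, while the last one only touches numbers wrapped in <m> tags.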
