Given an input string:
<m>1</m>
<m>2</m>
<m>10</m>
<m>11</m>
I would like to replace all values that are not equal to 1 with 5.
So the output String should look like:
<m>1</m>
<m>5</m>
<m>5</m>
<m>5</m>
I tried using:
gsub(/(<m>)([^1])(<\/m>)/, '\15\3')
But this does not replace 10 and 11, because [^1] matches only a single character.
#gsub can optionally take a block, in which case each match is replaced with the block's return value:
subject.gsub(/\d+/) { |m| m == '1' ? m : '5' }
Without a regexp, just because it's possible (this version works on a space-separated list of numbers rather than on the <m> elements):
"1 2 10 11".split.map{|n| n=='1' ? n : '5'}.join(' ')
result = subject.gsub(/\b(?!1\b)\d+/, '5')
Explanation:
\b # match at a word boundary (in this case, at the start of a number)
(?! # assert that it's not possible to match
1 # 1
\b # if followed by a word boundary (= end of the number)
) # end of lookahead assertion
\d+ # match any (integer) number
Edit:
If you only want to replace numbers that are enclosed in <m> and </m>, you can use
result = subject.gsub(/<m>(?!1\b)\d+<\/m>/, '<m>5</m>')
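A self-contained sketch that runs the three regex answers above side by side. It assumes the input is stored in a variable named `subject` (the name the answers use) and holds the question's sample data, one <m> element per line:

```ruby
subject = "<m>1</m>\n<m>2</m>\n<m>10</m>\n<m>11</m>"

# Block form: each run of digits is passed to the block,
# and the block's return value becomes the replacement
by_block = subject.gsub(/\d+/) { |m| m == '1' ? m : '5' }

# Lookahead form: skip a number that is exactly "1", replace any other number
by_lookahead = subject.gsub(/\b(?!1\b)\d+/, '5')

# Restricted form: only numbers that make up a whole <m>...</m> element
by_element = subject.gsub(/<m>(?!1\b)\d+<\/m>/, '<m>5</m>')

puts by_block
```

All three produce the desired output for this input; they differ only when numbers also appear outside <m> elements.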
I am busy working through some problems I have found on the net and I feel like this should be simple but I am really struggling.
Say you have the string 'AbcDeFg' and a second string 'HijKgLMnn'. I want to find the characters that appear in both strings, which in this case is 'g'.
Perhaps I didn't give enough information: I am doing Advent of Code, day 3. I just need help with the first part, where you are given a string of characters, split it in half, and compare the two halves. Basically, you have to find the character common to both halves. This is what I currently have:
file_data = File.read('Day_3_task1.txt')
arr = file_data.split("\n")
finals = []
arr.each do |x|
  len = x.length
  divided_by_two = len / 2
  second = x.slice!(divided_by_two..len).split('')
  first = x.split('')
  count = 0
  (0..len).each do |z|
    first.each do |y|
      if y == second[count]
        finals.push(y)
      end
    end
    count += 1
  end
end
finals = finals.uniq
Hope that helps in terms of clarity :)
Did you try converting both strings to arrays with the String#chars method and taking the intersection of those arrays?
Like this:
string_one = 'AbcDeFg'.chars
string_two = 'HijKgLMnn'.chars
string_one & string_two # => ["g"]
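Applied to the Advent of Code case described in the question, a minimal sketch might split each line in half and intersect the two halves. The helper name `common_chars` is hypothetical, the sample line is just the question's two strings trimmed to equal length, and reading the puzzle file is left as a comment:

```ruby
# Hypothetical helper: characters that appear in both halves of a line
def common_chars(line)
  half = line.length / 2
  first  = line[0...half].chars
  second = line[half..-1].chars
  # Array#& returns each common element once, in the order of the receiver
  first & second
end

p common_chars('AbcDgeFgHijKgLMnn'[0, 8] + 'ijKgLMnn')

# For the puzzle input, one result per line, e.g.:
# File.readlines('Day_3_task1.txt', chomp: true).map { |line| common_chars(line) }
```

Unlike the loop in the question, this keeps the common characters of each line separate instead of merging them into one global array.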
One way to do that is to use the method String#scan with the regular expression
rgx = /(.)(?!.*\1.*_)(?=.*_.*\1)/
I'm not advocating this approach. I merely thought some readers might find it interesting.
Suppose
str1 = 'AbcDgeFg'
str2 = 'HijKgLMnbn'
Now form the string
str = "#{str1}_#{str2}"
#=> "AbcDgeFg_HijKgLMnbn"
I've assumed the strings contain letters only, in which case they can be separated in str by any non-letter character. I've used an underscore. Naturally, if the strings could contain underscores, a different separator would have to be used.
We then compute
str.scan(rgx).flatten
#=> ["b", "g"]
Array#flatten is needed because
str.scan(rgx)
#=>[["b"], ["g"]]
The regular expression can be written in free-spacing mode to make it self-documenting:
rgx =
/
(.) # match any character, save to capture group 1
(?! # begin a negative lookahead
.* # match zero or more characters
\1 # match the contents of capture group 1
.* # match zero or more characters
_ # match an underscore
) # end the negative lookahead
(?= # begin a positive lookahead
.* # match zero or more characters
_ # match an underscore
.* # match zero or more characters
\1 # match the contents of capture group 1
) # end the positive lookahead
/x # invoke free-spacing regex definition mode
Note that if a character appears more than once in str1 and at least once in str2, the negative lookahead ensures that only its last occurrence in str1 is matched, which avoids returning duplicates.
Alternatively, one could write
str.gsub(rgx).to_a
This uses the (fourth) form of String#gsub, which takes a single argument and no block and returns an enumerator.
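Putting the pieces together, a runnable sketch using the example strings above. The last line cross-checks the regex result against the Array#& approach from the earlier answer:

```ruby
str1 = 'AbcDgeFg'
str2 = 'HijKgLMnbn'
# Join the strings with a separator assumed not to occur in either of them
str = "#{str1}_#{str2}"

rgx = /(.)(?!.*\1.*_)(?=.*_.*\1)/

via_scan = str.scan(rgx).flatten # scan returns one array per capture set
via_gsub = str.gsub(rgx).to_a    # enumerator over the whole matches

p via_scan
p via_gsub
p str1.chars & str2.chars
```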
The one exception is that the returned string cannot begin or end with a hyphen, and each odd digit may have only a single hyphen before and after it. For example:
def hyphenate(number)
# code
end
hyphenate(132237847) # should return "1-3-22-3-7-84-7"
"-1-3-22-3-7-84-7-" # incorrect because there is a hyphen before and after
# each beginning and ending odd digit respectively.
"1--3-22-3--7-84-7" # Also incorrect because there is more than one
# single hyphen before and after each odd digit
I suggest matching a non-word boundary \B (which matches a position between two digits) that is followed or preceded by an odd digit:
number.to_s.gsub(/\B(?=[13579])|\B(?<=[13579])/, '-')
Since the same position can't be matched twice, you avoid the problem of consecutive hyphens.
A simple way is to convert the number to a string, String#split the string on odd digits (using a group so that the odd digit delimiters get into the output), clean up the stray '' strings that String#split will produce, and put it back together with Array#join:
number.to_s.split(/([13579])/).reject(&:empty?).join('-')
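The two one-liners above can be wrapped in methods and checked against each other on a few edge cases (all-even digits, a single odd digit, consecutive odd digits). The method names `hyphenate_gsub` and `hyphenate_split` are hypothetical, chosen only for this sketch:

```ruby
# Regex approach: insert '-' at every position between two digits
# that is next to an odd digit
def hyphenate_gsub(number)
  number.to_s.gsub(/\B(?=[13579])|\B(?<=[13579])/, '-')
end

# Split approach: split on odd digits, keep them via the capture group,
# drop the empty strings, and rejoin with hyphens
def hyphenate_split(number)
  number.to_s.split(/([13579])/).reject(&:empty?).join('-')
end

[132237847, 2468, 7, 11, 1020].each do |n|
  puts "#{n}: #{hyphenate_gsub(n)} / #{hyphenate_split(n)}"
end
```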
def hyphenate(number)
  test_string = ''
  # Convert the number to a string, then iterate over each character
  number.to_s.each_char do |n|
    # Append an even digit as is; wrap an odd digit in hyphens
    test_string += n.to_i.even? ? n : "-#{n}-"
  end
  # Remove the first character of the string if it is a hyphen
  test_string = test_string[1..-1] if test_string.start_with?('-')
  # Remove the last character of the string if it is a hyphen
  test_string = test_string[0..-2] if test_string.end_with?('-')
  # Return the string with every double hyphen replaced by a single hyphen
  test_string.gsub('--', '-')
end
puts hyphenate(132237847)
This prints 1-3-22-3-7-84-7.
Here's another approach for taking a number and returning it in string form with its odd digits surrounded by hyphens:
def hyphenate(number)
  result = ""
  number.digits.reverse.each do |digit|
    result << (digit.odd? ? "-#{digit}-" : digit.to_s)
  end
  result.gsub("--", "-").gsub(/(^-|-$)/, "")
end
hyphenate(132237847)
# => "1-3-22-3-7-84-7"
Hope it helps!
I want to match character pairs in a string. Let's say the string is:
"zttabcgqztwdegqf". Both "zt" and "gq" are matching pairs of characters in the string.
The following code finds the "zt" matching pair, but not the "gq" pair:
#!/usr/bin/env ruby
string = "zttabcgqztwdegqf"
puts string.scan(/.{1,2}/).detect{ |c| string.count(c) > 1 }
The code only considers pairs at indices 0&1, 2&3, 4&5, ..., not 1&2, 3&4, 5&6, etc. These are the pairs it checks:
zt
ta
bc
gq
zt
wd
eg
qf
I'm not sure a regex is the best way to go, but I want to use Ruby for the solution.
You can do your search with a single regex:
puts string.scan(/(?=(.{2}).*\1)/)
Output
zt
gq
Regex breakdown
(?= # Start a lookahead
(.{2}) # Capture any two characters in group 1
.*\1 # Look further ahead in the string for another occurrence of group 1
) # Close the lookahead
Note
Putting all the checks inside the lookahead ensures that the regex engine does not consume the pair while validating it.
So it also works with overlapping pairs: for the string abcabc, the output will correctly be ab and bc.
Oddity
If the regex engine does not consume any characters, how does it reach the end of the string?
Internally, after a zero-length match, Onigmo (Ruby's regex engine) automatically advances one position. Most regex flavours behave this way, but in JavaScript, for example, a loop calling RegExp.prototype.exec must increment lastIndex manually.
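If you need an array rather than printed output, a small sketch (the method name `repeated_pairs` is hypothetical) flattens the one-element capture arrays and adds `uniq`, because a pair that occurs three or more times is reported at more than one position:

```ruby
# Each match is a zero-length lookahead, so scan returns one
# single-element array per position where a repeated pair starts
def repeated_pairs(string)
  string.scan(/(?=(.{2}).*\1)/).flatten.uniq
end

p repeated_pairs('zttabcgqztwdegqf')
p repeated_pairs('abcabc')
p repeated_pairs('ababab')
```

Without `uniq`, the last call would return "ab" twice, once for position 0 and once for position 2.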
str = "ztcabcgqzttwtcdegqf"
r = /
(.) # match any character in capture group 1
(?= # begin a positive lookahead
(.) # match any character in capture group 2
.+ # match >= 1 characters
\1 # match capture group 1
\2 # match capture group 2
) # close positive lookahead
/x # extended/free-spacing regex definition mode
str.scan(r).map(&:join)
#=> ["zt", "tc", "gq"]
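Note that `.+` requires at least one character between the two occurrences, so back-to-back repeats such as "ztzt" are not reported. A sketch comparing the answer's pattern with a variant that uses `.*` instead (the variant is a modification for illustration, not part of the answer):

```ruby
with_gap = /(.)(?=(.).+\1\2)/ # the answer's pattern, written on one line
no_gap   = /(.)(?=(.).*\1\2)/ # variant that also allows adjacent repeats

p 'ztzt'.scan(with_gap).map(&:join)
p 'ztzt'.scan(no_gap).map(&:join)
p 'ztcabcgqzttwtcdegqf'.scan(no_gap).map(&:join)
```

On the answer's sample string both patterns give the same result; they differ only for adjacent repeats.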
Here is one way to do this without using regex:
string = "zttabcgqztwdegqf"
p string.split('').each_cons(2).map(&:join).select {|i| string.scan(i).size > 1 }.uniq
#=> ["zt", "gq"]
I want to fetch the value 10 from video[10]. How can I do this with a regular expression?
I have tried /video\[(.*?)\]/, but this did not work.
It should be /video\[(.*)\]/.
/video\[(.*)\]/.match "video[10]"
#=> #<MatchData "video[10]" 1:"10">
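In Ruby, at least on the input from the question, the greedy and the non-greedy patterns capture the same thing; they differ only when a later "]" follows on the same line. A minimal sketch (the second test string is made up for illustration):

```ruby
greedy = /video\[(.*)\]/
lazy   = /video\[(.*?)\]/

# String#[] with a capture-group index returns just that group
p 'video[10]'[greedy, 1]
p 'video[10]'[lazy, 1]
p 'video[10] and [x]'[greedy, 1] # greedy runs to the last "]"
p 'video[10] and [x]'[lazy, 1]   # lazy stops at the first "]"
```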
r = /
\b # match a word boundary
video\[ # match the literal string "video["
(\d+) # match >= 1 digits in capture group 1
\] # match a right bracket
/x # extended/free-spacing mode
"Martha, where did you put video[10]?"[r,1]
#=> "10"
I found and tested a regex to validate a time string such as 11:30 AM:
^(1[0-2]|0?[1-9]):([0-5][0-9])(\s[A|P]M)\)?$
I understand most of it except the beginning:
(1[0-2]|0?[1-9])
Can someone explain what is going on? Does 1[0-2] mean there is a first digit that can be between 0 and 2? And I don't understand |0?[1-9].
(1[0-2]|0?[1-9])
| separates the regex into two parts, where
1[0-2]
matches 10, 11 or 12, and
0?[1-9]
matches 1 to 9, with an optional leading 0.
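To see the two alternatives in action, a sketch that tests only the hour part. The anchors \A and \z are added here (they are not in the original regex) so that the whole candidate string must match:

```ruby
hour = /\A(1[0-2]|0?[1-9])\z/ # hour part only, anchored to the whole string

candidates = %w[1 01 09 10 12 13 00 011]
p candidates.select { |h| h.match?(hour) }
```

"13", "00" and "011" are rejected because neither alternative can consume the entire string.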
I will explain by writing the regex in extended mode, which permits comments:
r = /
^ # match the beginning of the string
( # begin capture group 1
1 # match 1
[0-2] # match one of the characters 0,1,2
| # or
0? # optionally match a zero
[1-9] # match one of the characters between 1 and 9
) # end capture group 1
: # match a colon
( # begin capture group 2
[0-5] # match one of the characters between 0 and 5
[0-9] # match one of the characters between 0 and 9
) # end capture group 2
( # begin capture group 3
\s # match one whitespace character
[A|P] # match one of the characters A, | or P
M # match M
) # end capture group 3
\)? # optionally match a right parenthesis
$ # match the end of the string
/x # extended mode
As noticed by @Mischa, [A|P] is incorrect; it should be [AP]. That's because "|" is just an ordinary character within a character class.
Also, I think the regex would be improved by moving \s out of capture group 3. We therefore might write:
r = /^(1[0-2]|0?[1-9]):([0-5][0-9])\s([AP]M)\)?$/
It could be used thusly:
result = "11:39 PM" =~ r
if result
  puts "It's #{$2} minutes past #{$1}, #{ $3=='AM' ? 'anti' : 'post' } meridiem."
else
  # raise exception?
end
#=> It's 39 minutes past 11, post meridiem.
In words, the revised regex reads as follows:
match the beginning of the string.
match "10", "11", "12", or one of the digits "1" to "9", optionally preceded by a zero, and save the match to capture group 1.
match a colon.
match a digit between "0" and "5", then a digit between "0" and "9", and save the two digits to capture group 2.
match a whitespace character.
match "A", or "P", followed by "M", and save the two characters to capture group 3.
optionally match a right parenthesis.
match the end of the string.
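Note that in Ruby, ^ and $ match at the start and end of any line, not only of the whole string, so the revised regex r also accepts a multi-line string whose first line is a valid time. If the entire string must be a time, a sketch using \A and \z instead (the name `strict` is introduced here):

```ruby
r      = /^(1[0-2]|0?[1-9]):([0-5][0-9])\s([AP]M)\)?$/
strict = /\A(1[0-2]|0?[1-9]):([0-5][0-9])\s([AP]M)\)?\z/

input = "11:39 PM\nnot a time"
p input =~ r      # the first line matches, so this is 0
p input =~ strict # the whole string is not a time, so this is nil

if (m = strict.match('11:39 PM'))
  puts "It's #{m[2]} minutes past #{m[1]}, #{m[3] == 'AM' ? 'anti' : 'post'} meridiem."
end
```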