Ruby Regex: How to match pattern that follows another pattern?

I have ID numbers that should come after the text ID: so my file consists of
ID: A1234
ID: A1235
ID: A1236
etc. I want to match /[A-Z]*[0-9]+/ but only if it comes after the characters ID:. How would I add that to the regular expression but not make it return ID: as part of the result? I just want it to match the regex that follows ID:, because at the end of the file I have numbers and it's returning them, but those aren't ID numbers.

/ID:\s*([A-Z]*[0-9]+)/
The parentheses capture what is inside them, and you can then refer to the captured text (group 1) using backreferences. If you post some code showing how you're using the regex, I can try to add more detail.
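As a minimal sketch of how that capture group might be used (the sample text and variable names here are assumptions, since the question does not show its code), String#scan returns only the captured part when the pattern contains a group:

```ruby
# Hypothetical sample input: ID lines followed by unrelated trailing numbers.
text = <<~DATA
  ID: A1234
  ID: A1235
  ID: A1236
  9999
  42
DATA

# With one capture group, scan returns an array of one-element arrays,
# so flatten leaves just the captured IDs.
ids = text.scan(/ID:\s*([A-Z]*[0-9]+)/).flatten
p ids # => ["A1234", "A1235", "A1236"]

# For a single line, String#[] with a group index returns only the capture.
first_id = "ID: A1234"[/ID:\s*([A-Z]*[0-9]+)/, 1]
p first_id # => "A1234"
```

The trailing numbers are skipped because they are not preceded by ID:.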

Related

Extracting a substring from a string using `Regexp.new`

I have a string like this:
var = "Renewal Quote RQ00041233 (Payment Pending) Policy R38A014294-1"
I have to extract "Payment Pending" from that string using only the information included in another single string.
The following:
var[/\((.*)\)/, 1]
will extract what I want. I can include the string representation of the regex in the given string and construct the regular expression from it using Regexp.new, but I have no way to convey the 1 that is used as the second argument of [].
Without the second argument 1,
regex_string = '\((.*)\)'
var[Regexp.new(regex_string)]
fetches the string "(Payment Pending)" instead of the expected "Payment Pending".
Can someone help me?
Not sure what you are trying to do, but you can get rid of capturing groups using a different regex:
var[/(?<=\().*(?=\))/]
# => "Payment Pending"
or
var[Regexp.new('(?<=\().*(?=\))')]
# => "Payment Pending"
/\((.*)\)/ is just shorthand for Regexp.new('\((.*)\)').
String#[] takes a regex and a capture group as two separate arguments. var[/\((.*)\)/, 1] is var[Regex, 1].
The important thing to realize is 1 is passed to var[], not the regex.
re = Regexp.new('\((.*)\)')
match = var[re, 1]
Note: you might want to use a named capture group rather than a numbered one. It's very easy to accidentally include an extra capture group in a regex and shift the numbering.
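A rough sketch of the named-group variant, assuming the group name "status" (a name chosen here for illustration only, not something from the question):

```ruby
var = "Renewal Quote RQ00041233 (Payment Pending) Policy R38A014294-1"

# "status" is a hypothetical group name used only for this sketch.
regex_string = '\((?<status>.*)\)'
re = Regexp.new(regex_string)

# String#[] accepts a group name as its second argument.
by_name = var[re, "status"]
p by_name # => "Payment Pending"

# The same capture is available from the MatchData object.
match = re.match(var)
p match[:status] # => "Payment Pending"
```

The name still has to be passed separately from the pattern, so this makes the lookup more robust rather than removing the second argument.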
Assuming there are no nested parentheses in the string, one way to do that without using a regular expression is as follows.
instance_eval "var[(i=var.index('(')+1)..var.index(')',i)-1]"
#=> "Payment Pending"
See String#index, particularly the reference to the optional second argument, "offset".
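For readability, the same String#index approach could be spelled out step by step. This is only a sketch; the variable names start and finish are made up for the example:

```ruby
var = "Renewal Quote RQ00041233 (Payment Pending) Policy R38A014294-1"

# Index of the first character after the opening parenthesis.
start = var.index('(') + 1
# Search for ')' beginning at that offset, then step back one character.
finish = var.index(')', start) - 1

extracted = var[start..finish]
p extracted # => "Payment Pending"
```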

Ruby regex | Match enclosing brackets

I'm trying to create a regex pattern to match particular sets of text in my string.
Let's assume this is the string ^foo{bar}#Something_Else
I would like to match ^foo{} skipping entirely the content of the brackets.
So far I have figured out how to match everything with this regex, \^(\w)\{([^\}]+)}, but I really don't know how to ignore the text inside the curly brackets.
Does anyone have an idea? Thanks.
Update
This is the final solution:
puts script.gsub(/(\^\w+)\{([^}]+)(})/, '[BEFORE]\2[AFTER]')
Though I'd prefer this with fewer groups:
puts script.gsub(/\^\w+\{([^}]+)}/, '[BEFORE]\1[AFTER]')
Original answer
I need to replace the ^foo{} part with something else
Here is a way to do it with gsub:
s = "^foo{bar}#Something_Else"
puts s.gsub(/(.*)\^\w+\{([^}]+)}(.*)/, '\1SOMETHING ELSE\2\3')
See demo
The technique is the same: you capture the text you want to keep and just match text you want to delete, and use backreferences to restore the text you captured.
The regex matches:
(.*) - matches and captures into Group 1 as much text as possible from the start
\^\w+\{ - matches ^, 1 or more word characters, {
([^}]+) - matches and captures into Group 2 1 or more symbols other than }
} - matches the }
(.*) - and finally match and capture into Group 3 the rest of the string.
If you mean to match ^foo{} by a single match against a regex, it is impossible. A regex match only matches a substring of the original string. Since ^foo{} is not a substring of ^foo{bar}#Something_Else, you cannot match that with a single match.

Capturing groups don't work as expected with Ruby scan method

I need to get an array of floats (both positive and negative) from the multiline string. E.g.: -45.124, 1124.325 etc
Here's what I do:
text.scan(/(\+|\-)?\d+(\.\d+)?/)
Although it works fine on regex101 (capturing group 0 matches everything I need), it doesn't work in Ruby code.
Any ideas why it's happening and how I can improve that?
See scan documentation:
If the pattern contains no groups, each individual result consists of the matched string, $&. If the pattern contains groups, each individual result is itself an array containing one entry per group.
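To make the documented behaviour concrete, here is a small sketch using a sample string in the spirit of the question (the exact input is an assumption):

```ruby
text = " -45.124, 1124.325"

# With capture groups, each result is an array of the groups,
# not the full matched string.
with_groups = text.scan(/(\+|-)?\d+(\.\d+)?/)
p with_groups # => [["-", ".124"], [nil, ".325"]]

# A pattern without any groups yields the whole match for each result.
without_groups = text.scan(/-?\d+\.\d+/)
p without_groups # => ["-45.124", "1124.325"]
```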
You should remove capturing groups (if they are redundant), or make them non-capturing (if you just need to group a sequence of patterns to be able to quantify them), or use extra code/group in case a capturing group cannot be avoided.
In this scenario, the capturing group is used to quantify a pattern sequence, thus all you need to do is convert the capturing group into a non-capturing one by replacing all unescaped ( with (?: (there is only one occurrence here, since the sign alternation can be written as the character class [+-]):
text = " -45.124, 1124.325"
puts text.scan(/[+-]?\d+(?:\.\d+)?/)
See demo, output:
-45.124
1124.325
If you also need to match floats like .04, you can use [+-]?\d*\.?\d+.
There are cases when you cannot get rid of a capturing group, e.g. when the regex contains a backreference to a capturing group. In that case, you may a) declare a variable to store all matches and collect them inside a scan block, b) enclose the whole pattern in another capturing group and map the results to get the first item from each match, or c) call gsub with just a regex as its single argument, which returns an Enumerator, and use .to_a to get the array of matches:
text = "11234566666678"
# Variant a:
results = []
text.scan(/(\d)\1+/) { results << Regexp.last_match(0) }
p results # => ["11", "666666"]
# Variant b:
p text.scan(/((\d)\2+)/).map(&:first) # => ["11", "666666"]
# Variant c:
p text.gsub(/(\d)\1+/).to_a # => ["11", "666666"]
See this Ruby demo.
([+-]?\d+\.\d+)
This assumes there is a leading digit before the decimal point.
See the demo at Rubular.
If you need capture groups for a complex pattern match, but want the entire expression returned by .scan, this can work for you.
Suppose you want to get the image urls in this string perhaps from a markdown text with html image tags:
str = %(
Before
<img src="https://images.zenhubusercontent.com/11223344e051aa2c30577d9d17/110459e6-915b-47cd-9d2c-1842z4b73d71">
After
<img src="https://user-images.githubusercontent.com/111222333/75255445-f59fb800-57af-11ea-9b7a-a235b84bf150.png">).strip
You may have a regular expression defined to match just the urls, and maybe used a Rubular example like this to build/test your Regexp
image_regex =
/https\:\/\/(user-)?images.(githubusercontent|zenhubusercontent).com.*\b/
Now, if you don't need each sub-capture group but just the entire expression from your .scan, you can wrap the whole pattern inside a capture group and use it like this:
image_regex =
/(https\:\/\/(user-)?images.(githubusercontent|zenhubusercontent).com.*\b)/
str.scan(image_regex).map(&:first)
=> ["https://images.zenhubusercontent.com/11223344e051aa2c30577d9d17/110459e6-915b-47cd-9d2c-1842z4b73d71",
"https://user-images.githubusercontent.com/111222333/75255445-f59fb800-57af-11ea-9b7a-a235b84bf150.png"]
How does this actually work?
Since the pattern now has 3 capture groups, .scan alone returns an array of arrays, each holding one entry per capture group:
str.scan(image_regex)
=> [["https://images.zenhubusercontent.com/11223344e051aa2c30577d9d17/110459e6-915b-47cd-9d2c-1842z4b73d71", nil, "zenhubusercontent"],
["https://user-images.githubusercontent.com/111222333/75255445-f59fb800-57af-11ea-9b7a-a235b84bf150.png", "user-", "githubusercontent"]]
Since we only want the 1st (outer) capture group, we can just call .map(&:first)
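If the inner groups are only there for grouping, another option is to make them non-capturing so that .scan returns whole matches directly. The following is a sketch under that assumption; the shortened URLs are placeholders, not the ones from the example above:

```ruby
# Hypothetical shortened input; the URLs are placeholders for this sketch.
str = <<~HTML
  <img src="https://images.zenhubusercontent.com/abc/def">
  <img src="https://user-images.githubusercontent.com/123/example.png">
HTML

# (?:...) groups without capturing, so scan returns each full match as a string.
# %r{} avoids having to escape the forward slashes.
image_regex = %r{https://(?:user-)?images\.(?:githubusercontent|zenhubusercontent)\.com.*\b}

urls = str.scan(image_regex)
p urls
# => ["https://images.zenhubusercontent.com/abc/def",
#     "https://user-images.githubusercontent.com/123/example.png"]
```

With no capture groups left, the .map(&:first) step is not needed.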

Ruby Regex Group Replacement

I am trying to perform regular expression matching and replacement on the same line in Ruby. I have some libraries that manipulate strings in Ruby and add special formatting characters to it. The formatting can be applied in any order. However, if I would like to change the string formatting, I want to keep some of the original formatting. I'm using regex for that. I have the regular expression matching correctly what I need:
mystring.gsub(/[(\e\[([1-9]|[1,2,4,5,6,7,8]{2}m))|(\e\[[3,9][0-8]m)]*Text/, 'New Text')
However, what I really want is the matching from the first grouping found in:
(\e\[([1-9]|[1,2,4,5,6,7,8]{2}m))
to be appended to New Text and replaced as opposed to just New Text. I'm trying to reference the match in the form of
mystring.gsub(/[(\e\[([1-9]|[1,2,4,5,6,7,8]{2}m))|(\e\[[3,9][0-8]m)]*Text/, '\1' + 'New Text')
but my understanding is that \1 only works when using \d or \k. Is there any way to reference that specific capturing group in my replacement string? Additionally, since I am using an asterisk for the [], I know that this grouping could occur more than once. Therefore, I would like to have the last matching occurrence yielded.
My expected input/output with a sample is:
Input: "\e[1mHello there\e[34m\e[40mText\e[0m\e[0m\e[22m"
Output: "\e[1mHello there\e[40mNew Text\e[0m\e[0m\e[22m"
Input: "\e[1mHello there\e[44m\e[34m\e[40mText\e[0m\e[0m\e[22m"
Output: "\e[1mHello there\e[40mNew Text\e[0m\e[0m\e[22m"
So the last grouping is found and appended.
You can use the following regex with back-reference \\1 in the replacement:
reg = /(\\e\[(?:[0-9]{1,2}|[3,9][0-8])m)+Text/
mystring = "\\e[1mHello there\\e[34m\\e[40mText\\e[0m\\e[0m\\e[22m"
puts mystring.gsub(reg, '\\1New Text')
mystring = "\\e[1mHello there\\e[44m\\e[34m\\e[40mText\\e[0m\\e[0m\\e[22m"
puts mystring.gsub(reg, '\\1New Text')
Output of the IDEONE demo:
\e[1mHello there\e[40mNew Text\e[0m\e[0m\e[22m
\e[1mHello there\e[40mNew Text\e[0m\e[0m\e[22m
Mind that your input has a backslash \ that needs escaping in a regular string literal. To match it inside the regex, we use a double backslash, as we are looking for a literal backslash.
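If the strings actually contain real escape characters (as the "\e" in the question's double-quoted samples suggests) rather than a literal backslash followed by e, a variant along these lines might apply. This is a sketch under that assumption, with a simplified pattern for the numeric codes:

```ruby
# Assumption: the strings hold real ESC characters ("\e" in double quotes).
# \e inside the regex matches the ESC character itself.
reg = /(\e\[\d{1,2}m)+Text/

inputs = [
  "\e[1mHello there\e[34m\e[40mText\e[0m\e[0m\e[22m",
  "\e[1mHello there\e[44m\e[34m\e[40mText\e[0m\e[0m\e[22m"
]

# A repeated group keeps only its last iteration, so \1 is the final
# escape sequence before "Text". Single quotes keep '\1' a backreference.
outputs = inputs.map { |s| s.gsub(reg, '\1New Text') }
outputs.each { |s| puts s.inspect }
# Both lines print "\e[1mHello there\e[40mNew Text\e[0m\e[0m\e[22m"
```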

Rails 3 + regex - Replace part of a string, 1 occurrence

I'm new to Rails, and furthermore to regex. Been looking around, but I'm blocked...
I have a string like this :
Current: http://zs.domain.com/user_images/123456789/imageName_size.ext
Wanted: http://zs.domain.com/user_images/123456789/imageName.ext
I've managed to get to this :
http://a0.twimg.com/profile/1240267050/logo1.png
=> losing all occurrences with
picture.gsub!(/_([a-z0-9-]+)/, '')
or this :
http://a0.twimg.com/profile_images/1240267050/logo1
=> changing only the last occurrence, but losing the extension with
picture.gsub!(/_([a-z0-9-]+).(png|gif|jpg|jpeg)/, '')
You're almost there. The second parameter is the string with which the match will be replaced, and you can re-use matched groups from the match. This will do the trick:
picture.gsub!(/_([a-z0-9-]+).(png|gif|jpg|jpeg)/, '.\2')
To accommodate the additional conditions posed in the comment:
picture.gsub!(/_([^\/]+).(png|gif|jpg|jpeg)/, '.\2')
markijbema's answer will change the string
.../xxx_yyygifzzz/...,
into
.../xxx.gifzzz/....
In order to avoid that, you can do this:
picture.gsub!(/_[^\/]+(?=\.[^\.]+\z)/, '')
(?=...) is a positive lookahead: it requires the given context to follow the match, but that context is not included in the match.
\z describes the end of the string, so this regexp is safe to use when some intermediate directory includes a string like above.
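As a quick check of that pattern against the URL from the question, here is a sketch that uses the non-mutating sub (since only one occurrence is expected) in place of gsub!:

```ruby
# URL from the question; note the underscore in the "user_images" directory.
picture = "http://zs.domain.com/user_images/123456789/imageName_size.ext"

# "_size" is removed only because it is directly followed by the final
# ".ext" at the end of the string; the lookahead does not consume it.
result = picture.sub(/_[^\/]+(?=\.[^.]+\z)/, '')
p result # => "http://zs.domain.com/user_images/123456789/imageName.ext"
```

The underscore in user_images is left alone because nothing after it satisfies the lookahead before the next slash.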
