Regexp.union doesn't match a multiline string - Ruby

I'd like to match this content:
[default]
aws_secret_access_key = 69bbTs4LcLIRC5zEQxNxEF6FQJI92pdPJe8HHhoEzDnmtS6I
aws_access_key_id = K3YD33nX3u3jeTHWaSnpUw3S66SHpD5cSF
against this union:
aws_configuration_file_regex = Regexp.union [
/aws_access_key_id\s*=\s*(?<aws_access_key_id>.+)/,
/aws_secret_access_key\s*=\s*(?<aws_secret_access_key>.+)/
]
but it doesn't work as expected: only the first match is present in the result:
=> #<MatchData
"aws_secret_access_key = 69bbTs4LcLIRC5zEQxNxEF6FQJI92pdPJe8HHhoEzDnmtS6I"
aws_secret_access_key:"69bbTs4LcLIRC5zEQxNxEF6FQJI92pdPJe8HHhoEzDnmtS6I"
aws_access_key_id:nil>
How can I fix that? I'd like to keep the code as short as possible, i.e. without defining any methods.

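The problem is that Regexp.union builds an alternation, so it effectively means "match one or the other": a single match returns whichever alternative occurs first in the string and leaves the other named group nil. It might be best to first match one, then the other.
If you still want to match both in one go (as I see you have differently named groups), you have to concatenate the patterns instead and add the /m flag, which in Ruby makes . match newlines: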
# .*? bridges the gap between the two keys, //m lets . match newlines,
# and \S+ stops each value at the first whitespace (so no trailing "\n" is captured)
r = /aws_secret_access_key\s*=\s*(?<aws_secret_access_key>\S+).*?aws_access_key_id\s*=\s*(?<aws_access_key_id>\S+)/m
foo.match(r) # foo holds the file content =>
# #<MatchData
# "aws_secret_access_key = 69bbTs4LcLIRC5zEQxNxEF6FQJI92pdPJe8HHhoEzDnmtS6I\n aws_access_key_id = K3YD33nX3u3jeTHWaSnpUw3S66SHpD5cSF"
# aws_secret_access_key:"69bbTs4LcLIRC5zEQxNxEF6FQJI92pdPJe8HHhoEzDnmtS6I"
# aws_access_key_id:"K3YD33nX3u3jeTHWaSnpUw3S66SHpD5cSF">
However, note that this is not the most robust code in the world: it only matches when aws_secret_access_key appears before aws_access_key_id.
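A minimal sketch of the "match one, then the other" approach, assuming each value contains no whitespace. The heredoc stands in for the real file content. String#[] with a capture index returns just that group (or nil), so no method definitions are needed and the order of the lines no longer matters:

```ruby
content = <<~INI
  [default]
  aws_secret_access_key = 69bbTs4LcLIRC5zEQxNxEF6FQJI92pdPJe8HHhoEzDnmtS6I
  aws_access_key_id = K3YD33nX3u3jeTHWaSnpUw3S66SHpD5cSF
INI

# One lookup per key; \S+ assumes the values contain no whitespace
aws_secret_access_key = content[/aws_secret_access_key\s*=\s*(\S+)/, 1]
aws_access_key_id     = content[/aws_access_key_id\s*=\s*(\S+)/, 1]

p aws_secret_access_key # => "69bbTs4LcLIRC5zEQxNxEF6FQJI92pdPJe8HHhoEzDnmtS6I"
p aws_access_key_id     # => "K3YD33nX3u3jeTHWaSnpUw3S66SHpD5cSF"
```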

You want to do something like this:
((aws_access_key_id\s*=\s*(?<aws_access_key_id>.+))|(aws_secret_access_key\s*=\s*(?<aws_secret_access_key>.+)))*
Seems to work on http://rubular.com/
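This will collect all the keys, but as a series of separate matches, so you have to sort through them. I highly recommend trying it on rubular.com to see what is happening, and tuning the pattern further. Note that because of the trailing *, the pattern can also match the empty string, so String#match on this input returns an empty match at position 0.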
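If you want to keep the question's Regexp.union, one way to "sort through" the separate matches is to scan and fold them into a Hash. This is a sketch, not part of either answer; the heredoc content and the credentials name are illustrative. It relies on scan returning one array per match with one entry per named group (nil for the alternative that did not match):

```ruby
content = <<~INI
  [default]
  aws_secret_access_key = 69bbTs4LcLIRC5zEQxNxEF6FQJI92pdPJe8HHhoEzDnmtS6I
  aws_access_key_id = K3YD33nX3u3jeTHWaSnpUw3S66SHpD5cSF
INI

aws_configuration_file_regex = Regexp.union([
  /aws_access_key_id\s*=\s*(?<aws_access_key_id>.+)/,
  /aws_secret_access_key\s*=\s*(?<aws_secret_access_key>.+)/
])

# Pair each match's group values with the group names and keep the non-nil ones
names = aws_configuration_file_regex.names
credentials = content.scan(aws_configuration_file_regex).each_with_object({}) do |groups, acc|
  names.zip(groups).each { |name, value| acc[name] = value if value }
end

p credentials
```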

Related

Capturing groups don't work as expected with Ruby scan method

I need to get an array of floats (both positive and negative) from a multiline string, e.g. -45.124, 1124.325, etc.
Here's what I do:
text.scan(/(\+|\-)?\d+(\.\d+)?/)
Although it works fine on regex101 (capturing group 0 matches everything I need), it doesn't work in Ruby code.
Any ideas why it's happening and how I can improve that?
See the scan documentation:
If the pattern contains no groups, each individual result consists of the matched string, $&. If the pattern contains groups, each individual result is itself an array containing one entry per group.
You should remove capturing groups (if they are redundant), or make them non-capturing (if you just need to group a sequence of patterns to be able to quantify them), or use extra code/group in case a capturing group cannot be avoided.
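To see what the quoted documentation means in practice, this is what the question's pattern returns on a small sample (the sample string is illustrative). Each result is a [sign, fraction] pair, one entry per capturing group, rather than the whole number:

```ruby
text = " -45.124, 1124.325"

# Two capturing groups, so scan yields one two-element array per match
groups = text.scan(/(\+|\-)?\d+(\.\d+)?/)
p groups # => [["-", ".124"], [nil, ".325"]]
```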
In this scenario, the capturing groups are only used to quantify a pattern sequence, so all you need to do is make them non-capturing by replacing each unescaped ( with (?:. The sign alternation (\+|\-) can simply become the character class [+-], which leaves only one group to convert:
text = " -45.124, 1124.325"
puts text.scan(/[+-]?\d+(?:\.\d+)?/)
See demo, output:
-45.124
1124.325
If you also need to match floats like .04, you can use [+-]?\d*\.?\d+. See another demo.
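A quick check of that looser pattern on a sample multiline string (the sample, including .04 and +7, is made up for illustration). Since the question asks for floats, the matches are converted with to_f:

```ruby
text = "-45.124, 1124.325\n.04 and +7"

# No capturing groups, so scan returns plain strings;
# \d* lets the integer part be empty, which is what allows ".04"
numbers = text.scan(/[+-]?\d*\.?\d+/)
p numbers             # => ["-45.124", "1124.325", ".04", "+7"]
p numbers.map(&:to_f) # => [-45.124, 1124.325, 0.04, 7.0]
```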
There are cases when you cannot get rid of a capturing group, e.g. when the regex contains a backreference to a capturing group. In that case, you may a) declare a variable to store all matches and collect them inside a scan block, b) enclose the whole pattern in another capturing group and map the results to the first item of each match, or c) call gsub with a regex as its only argument, which returns an Enumerator, and use .to_a to get the array of matches:
text = "11234566666678"
# Variant a:
results = []
text.scan(/(\d)\1+/) { results << Regexp.last_match(0) }
p results # => ["11", "666666"]
# Variant b:
p text.scan(/((\d)\2+)/).map(&:first) # => ["11", "666666"]
# Variant c:
p text.gsub(/(\d)\1+/).to_a # => ["11", "666666"]
See this Ruby demo.
([+-]?\d+\.\d+)
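This assumes there is a leading digit before the decimal point and a fractional part after it (plain integers are not matched). Since the whole pattern is one capturing group, scan returns one-element arrays, so call .flatten on the result to get plain strings.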
see demo at Rubular
If you need capture groups for a complex pattern match, but want the entire expression returned by .scan, this can work for you.
Suppose you want to get the image URLs from this string, perhaps taken from Markdown text with HTML image tags:
str = %(
Before
<img src="https://images.zenhubusercontent.com/11223344e051aa2c30577d9d17/110459e6-915b-47cd-9d2c-1842z4b73d71">
After
<img src="https://user-images.githubusercontent.com/111222333/75255445-f59fb800-57af-11ea-9b7a-a235b84bf150.png">).strip
You may already have a regular expression that matches just the URLs, perhaps built and tested with a Rubular example like this:
image_regex =
/https\:\/\/(user-)?images.(githubusercontent|zenhubusercontent).com.*\b/
You don't need each sub-capture group, just the entire match from .scan, so you can wrap the whole pattern in a capture group and use it like this:
image_regex =
/(https\:\/\/(user-)?images.(githubusercontent|zenhubusercontent).com.*\b)/
str.scan(image_regex).map(&:first)
=> ["https://images.zenhubusercontent.com/11223344e051aa2c30577d9d17/110459e6-915b-47cd-9d2c-1842z4b73d71",
"https://user-images.githubusercontent.com/111222333/75255445-f59fb800-57af-11ea-9b7a-a235b84bf150.png"]
How does this actually work?
Since the pattern has 3 capture groups, .scan alone returns an Array of arrays: one inner array per match, with one entry per capture group:
str.scan(image_regex)
=> [["https://images.zenhubusercontent.com/11223344e051aa2c30577d9d17/110459e6-915b-47cd-9d2c-1842z4b73d71", nil, "zenhubusercontent"],
["https://user-images.githubusercontent.com/111222333/75255445-f59fb800-57af-11ea-9b7a-a235b84bf150.png", "user-", "githubusercontent"]]
Since we only want the 1st (outer) capture group, we can just call .map(&:first).
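Alternatively, when you don't need the sub-groups at all, you can make them non-capturing with (?:...), and scan then returns the whole matches directly. A sketch of that variant, repeating the str from above so it runs on its own; %r{} avoids escaping the slashes, and the dots are escaped so they only match a literal ".":

```ruby
str = %(
Before
<img src="https://images.zenhubusercontent.com/11223344e051aa2c30577d9d17/110459e6-915b-47cd-9d2c-1842z4b73d71">
After
<img src="https://user-images.githubusercontent.com/111222333/75255445-f59fb800-57af-11ea-9b7a-a235b84bf150.png">).strip

# (?:...) groups do not capture, so scan returns the whole match for each hit
image_regex = %r{https://(?:user-)?images\.(?:githubusercontent|zenhubusercontent)\.com.*\b}

urls = str.scan(image_regex)
p urls
```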

Method gsub does not work as expected

I want to replace "#" with "\40" in a string, but I'm not able to do so.
a = "srikanth#in.com"
a.gsub("#", "\40")
# => "srikanth in.com"
It's turning \40 into a space. Any idea how to implement this?
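Another solution (note that in the question "\40" is written in double quotes, where it is an octal escape for a space, so gsub never even sees a backslash):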
puts a.gsub("#") {"\\40"}
# => srikanth\40in.com
Passing "\\40" as the replacement string doesn't work either, because gsub treats a backslash followed by a digit as a back-reference to a capture group. From the docs:
If replacement is a String it will be substituted for the matched
text. It may contain back-references to the pattern’s capture groups
of the form \\d, where d is a group number ...
You can use gsub's hash syntax instead:
If the second argument is a Hash, and the matched text is one of its keys, the corresponding value is the replacement string.
Example:
a.gsub('#', '#' => '\\40')
#=> "srikanth\\40in.com"
Backslashes have a special meaning in the second argument of gsub: they refer to possibly matched regex groups. I tried escaping, but couldn't get it to work. It works this way, though (note that String#[]= replaces only the first occurrence):
s = "srikanth#in.com"
s['#'] = '\\40'
s # => "srikanth\\40in.com"
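For comparison, a sketch of the working gsub variants side by side, plus the question's original call. Escaping does work once the backslash is doubled again for gsub itself: a doubled backslash in a replacement string produces one literal backslash:

```ruby
a = "srikanth#in.com"

# The question's call: "\40" in double quotes is an octal escape for a space
original = a.gsub("#", "\40")

# Block form: the block's return value is inserted literally
via_block = a.gsub("#") { "\\40" }

# Hash form: the matched text is looked up and its value is used literally
via_hash = a.gsub("#", { "#" => "\\40" })

# String form: "\\\\40" is the string \\40, which gsub turns into \40
via_escape = a.gsub("#", "\\\\40")

puts original                        # srikanth in.com
puts via_block, via_hash, via_escape # each prints srikanth\40in.com
```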

Combine Regexp and set of values (hash/array) to compare if a string matches in ruby

I have the following pattern to check:
"MODEL_NAME"-"ID"."FORMAT_TYPE"
where, for example:
MODEL_NAME = [:product, :brand]
FORMAT_TYPE = [:jpg, :png]
First I wanted to check the string against a regexp like:
/^\w+-\d+.\w+$/
and I also have to check whether the parts of my string are in my arrays. I want something more flexible than:
/^(product|brand)-\d+.(jpg|png)$/
i.e. something I could drive from my arrays. What is a good way to do it?
/^(#{MODEL_NAME.join '|'})-\d+\.(#{FORMAT_TYPE.join '|'})$/
# => /^(product|brand)-\d+\.(jpg|png)$/
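A sketch that builds the same pattern with Regexp.union, which escapes each value, and anchors it with \A and \z, so that a newline in the input cannot let a second line slip past ^ and $. FILE_NAME_REGEX and the sample file names are hypothetical:

```ruby
MODEL_NAME  = [:product, :brand]
FORMAT_TYPE = [:jpg, :png]

# Regexp.union escapes metacharacters in each value; map(&:to_s) turns the symbols into strings
models  = Regexp.union(MODEL_NAME.map(&:to_s))
formats = Regexp.union(FORMAT_TYPE.map(&:to_s))

# \A and \z anchor to the whole string (^ and $ only anchor to line boundaries)
FILE_NAME_REGEX = /\A(#{models})-(\d+)\.(#{formats})\z/

p "product-42.jpg".match?(FILE_NAME_REGEX)            # => true
p "shoe-1.jpg".match?(FILE_NAME_REGEX)                # => false
p "product-1.jpg\nshoe-2.gif".match?(FILE_NAME_REGEX) # => false

# The three groups hold the model name, the ID and the format
if (m = FILE_NAME_REGEX.match("brand-7.png"))
  model, id, ext = m.captures
end
```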

Ruby Regexp group matching, assign variables on 1 line

I'm currently trying to split a string into multiple variables with a regexp. Example string:
ryan_string = "RyanOnRails: This is a test"
I've matched it with this regexp, with 3 groups:
ryan_group = ryan_string.scan(/(^.*)(:)(.*)/i)
Now to access each group I have to do something like this:
ryan_group[0][0] (first group) RyanOnRails
ryan_group[0][1] (second group) :
ryan_group[0][2] (third group) This is a test
This seems pretty ridiculous and it feels like I'm doing something wrong. I would expect to be able to do something like this:
g1, g2, g3 = ryan_string.scan(/(^.*)(:)(.*)/i)
Is this possible? Or is there a better way than how I'm doing it?
You don't want scan for this, as it makes little sense. Use String#match, which returns a MatchData object; you can then call #captures to get an Array of the captures. Something like this:
#!/usr/bin/env ruby
string = "RyanOnRails: This is a test"
one, two, three = string.match(/(^.*)(:)(.*)/i).captures
p one #=> "RyanOnRails"
p two #=> ":"
p three #=> " This is a test"
Be aware that if no match is found, String#match will return nil, so something like this might work better:
if match = string.match(/(^.*)(:)(.*)/i)
one, two, three = match.captures
end
Although scan makes little sense here, it still does the job; you just need to flatten the returned Array first:
one, two, three = string.scan(/(^.*)(:)(.*)/i).flatten
You could use match or =~ instead, which gives you a single match; you can then either access the match data the same way or just use the special match variables $1, $2, $3.
Something like:
if ryan_string =~ /(^.*)(:)(.*)/i
first = $1
third = $3
end
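You can name your captured matches. When a regex literal with named groups is on the left-hand side of =~, Ruby assigns each named group to a local variable: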
string = "RyanOnRails: This is a test"
/(?<one>^.*)(?<two>:)(?<three>.*)/i =~ string
puts one, two, three
This doesn't work if you reverse the order (string =~ regex): the regex literal must be on the left-hand side.
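If you'd rather not have the regex create local variables, the same named groups can be read from the MatchData instead. A sketch; the group names name and message are illustrative, and [^:]* plus \s* assume the name contains no colon and that the space after the colon should be dropped:

```ruby
ryan_string = "RyanOnRails: This is a test"

# Named groups stay on the MatchData; nothing is assigned to local variables automatically
if (m = ryan_string.match(/(?<name>[^:]*):\s*(?<message>.*)/))
  name    = m[:name]    # => "RyanOnRails"
  message = m[:message] # => "This is a test"
  p m.named_captures    # a Hash of group name => captured string
end
```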
You have to decide whether it is a good idea, but Ruby regexps can (automagically) define local variables for you!
I am not yet sure whether this feature is awesome or just totally crazy, but your regex can define local variables.
ryan_string = "RyanOnRails: This is a test"
/^(?<webframework>.*)(?<colon>:)(?<rest>.*)/ =~ ryan_string
# This defined three variables for you. Crazy, but true.
webframework # => "RyanOnRails"
puts "W: #{webframework} , C: #{colon}, R: #{rest}"
(Take a look at http://ruby-doc.org/core-2.1.1/Regexp.html and search for "local variable".)
Note:
As pointed out in a comment, I see that there is a similar and earlier answer to this question by #toonsend (https://stackoverflow.com/a/21412455). I do not think I was "stealing", but if you want to be fair with praises and honor the first answer, feel free :) I hope no animals were harmed.
scan() will find all non-overlapping matches of the regex in your string, so instead of returning an array of your groups like you seem to be expecting, it is returning an array of arrays.
You are probably better off using match(), and then getting the array of captures using MatchData#captures:
g1, g2, g3 = ryan_string.match(/(^.*)(:)(.*)/i).captures
However you could also do this with scan() if you wanted to:
g1, g2, g3 = ryan_string.scan(/(^.*)(:)(.*)/i)[0]

Regex to leave desired string remaining and others removed

In Ruby, what regex will strip out all but a desired string if present in the containing string? I know about /[^abc]/ for characters, but what about strings?
Say I have the string "group=4&type_ids[]=2&type_ids[]=7&saved=1" and want to keep only the pattern group=\d, if it is present, using only a regex?
Currently, I am splitting on & and then doing a select with matching condition =~ /group=\d/ on the resulting enumerable collection. It works fine, but I'd like to know the regex to do this more directly.
Simply:
part = str[/group=\d+/]
If you want only the numbers, then:
group_str = str[/group=(\d+)/,1]
If you want only the numbers as an integer, then:
group_num = str[/group=(\d+)/,1].to_i
Warning: String#[] will return nil if no match occurs, and blindly calling nil.to_i always returns 0.
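Building on that warning, one way to tell "no group at all" apart from group=0 is the safe-navigation operator &. (Ruby 2.3+), which skips to_i when String#[] returns nil. A sketch; the second sample string is made up:

```ruby
str = "group=4&type_ids[]=2&type_ids[]=7&saved=1"

# &. only calls to_i when String#[] found a match; otherwise the result stays nil
group_num = str[/group=(\d+)/, 1]&.to_i
p group_num # => 4

missing = "type_ids[]=2&saved=1"[/group=(\d+)/, 1]&.to_i
p missing   # => nil
```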
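You can try a substitution that keeps only the captured part (in Ruby this is String#sub rather than Perl's s/// operator; if there is no match, the string is returned unchanged):
str.sub(/.*(group=\d+).*/, '\1')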
Typically I wouldn't really worry too much about a complex regex. Simply break the string down into smaller parts and it becomes easier:
asdf = "group=4&type_ids[]=2&type_ids[]=7&saved=1"
asdf.split('&').select{ |q| q['group'] } # => ["group=4"]
Otherwise, you can use a regex in a bunch of different ways. Here are two I tend to use:
asdf.scan(/group=\d+/) # => ["group=4"]
asdf[/(group=\d+)/, 1] # => "group=4"
Try (note that match returns nil when there is no match, so [0] would then raise NoMethodError):
str.match(/group=\d+/)[0]
