Not grab group match in Ruby Regex - ruby

I am trying to break down the following string:
"#command Text1 #command2 Text2"
in Ruby. I want to take out "Text1" and "Text2" in an array. To do this I am using the scan method and using this:
text.scan(/#* (.*?)(#|$)/)
However, when run, the script is pulling the # symbol in the middle as a separate match (presumably because parentheses are used in Ruby to indicate what string you want to pull out of the input):
Text1
#
Text2
My question is, how can I pull out Text1 and Text2 bearing in mind the expression needs to stop matching at both "#" and the end of a string?

If you want a non-capturing group, use (?:...) instead of (...):
text.scan(/#* (.*?)(?:#|$)/)
As a sidenote, your regular expression looks like it might contain an error. Perhaps you meant this instead?
text.scan(/#\w+ (\w+)(?= #|$)/)
The difference is that your expression matches on " foo", which I guess is not intentional.
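As a rough illustration of the difference, the sketch below runs the capturing and non-capturing variants side by side on the question's sample string. The variable names are illustrative only. Note that scan returns one sub-array per match when the pattern contains groups, and that the lazy .*? keeps the space before the next "#".

```ruby
text = "#command Text1 #command2 Text2"

# Capturing group: scan returns both groups for every match, so the "#"
# (or an empty string at the end of the input) shows up in the result.
with_capture = text.scan(/#* (.*?)(#|$)/)

# Non-capturing group: only the first group is returned.
without_capture = text.scan(/#* (.*?)(?:#|$)/)

# Stricter pattern with a lookahead: requires "#name " before each word.
strict = text.scan(/#\w+ (\w+)(?= #|$)/).flatten

p with_capture     # [["Text1 ", "#"], ["Text2", ""]]
p without_capture  # [["Text1 "], ["Text2"]]
p strict           # ["Text1", "Text2"]
```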

text.scan(/#* (.*?)(?:#|$)/)

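In your regex, the parentheses around '#|$' are what create the extra capture, but they can't simply be swapped for a character class. Inside square brackets, '$' is a literal dollar sign rather than the end-of-string anchor, so the following stops only at '#' (or a literal '$') and misses the final "Text2":
text.scan(/#* (.*?)[#\$]/)
Square brackets match any one character listed within them, so they cannot express "end of string". To keep both stopping conditions without the extra match, make the group non-capturing with (?:#|$).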
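One thing worth checking with a character class here: inside square brackets, $ is a literal dollar sign, not an end-of-line anchor. The small sketch below (variable names are illustrative) compares the bracket version with the non-capturing group on the question's string.

```ruby
text = "#command Text1 #command2 Text2"

# Character class: needs an actual "#" or "$" character after the text,
# so the last piece (which runs to the end of the string) is not found.
class_result = text.scan(/#* (.*?)[#\$]/)

# Non-capturing alternation: "#" or the end-of-line anchor.
group_result = text.scan(/#* (.*?)(?:#|$)/)

p class_result  # [["Text1 "]]
p group_result  # [["Text1 "], ["Text2"]]
```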

Here's how I'd do it:
text.scan(/#[^\s]* ([^#]*)/)
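A quick sketch of what this pattern returns on the sample string. Because [^#]* runs up to the next "#", the first capture keeps its trailing space; the flatten and strip calls are my own addition for tidying the result, not part of the answer.

```ruby
text = "#command Text1 #command2 Text2"

# One sub-array per match, each holding the single capture group.
raw = text.scan(/#[^\s]* ([^#]*)/)

# Flatten to a plain array and drop surrounding whitespace.
words = raw.flatten.map(&:strip)

p raw    # [["Text1 "], ["Text2"]]
p words  # ["Text1", "Text2"]
```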

How does this regex look?
http://rubular.com/regexes/13264

Related

Regular expression to clean string

I'm struggling to figure out even where to start with this. I believe there is a regular expression to make this a fairly straightforward task. I want to trim off the extra asterisks in a string.
Example string:
test="AM*BE*3***LAST****~"
I would like it to trim asterisks off only the end that don't have repeating symbols. So the resulting value in the variable would be:
test="AM*BE*3***LAST~"
In Perl I was able to use this:
s/\*+~+/~/;
Is there something similar I can do in Ruby? I'm sure there is, just struggling to find it for some reason. Any help would be greatly appreciated.
You could use this regex:
/\*+~$/
Then use the gsub method to replace all matches with a tilde ~:
test = "AM*BE*3***LAST****~"
test.gsub!(/\*+~$/, '~')
# => "AM*BE*3***LAST~"
Or you could use this more flexible regex, which matches a run of asterisks followed by any number of non-asterisk characters up to the end of the line:
/\*+([^*]+)$/
Then use the first capture group ($1) as the replacement:
test.gsub(/\*+([^*]+)$/) { $1 }
Ruby's String class has the [] method, which lets us use regexp as a parameter. We can also assign to that, allowing us to do things like:
foo = "AM*BE*3***LAST****~"
foo[/\*+~+$/] = '~'
foo # => "AM*BE*3***LAST~"
That reuses the match pattern from your Perl search/replace. (I'm assuming you only want to match at the end of the line because of your examples. If it needs to be anywhere in the string remove the trailing $ from the pattern.)
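The approaches above behave differently when the pattern does not match, which may matter if some inputs have no trailing asterisks. A minimal sketch, assuming a hypothetical helper name trim_trailing_stars and a made-up second input:

```ruby
# Hypothetical helper (not from the answers above): collapse the run of
# asterisks that sits directly before the trailing tilde(s).
def trim_trailing_stars(str)
  str.sub(/\*+~+$/, '~')
end

cleaned   = trim_trailing_stars("AM*BE*3***LAST****~")
untouched = trim_trailing_stars("NO*STARS~")   # no match, returned as-is

# gsub! returns nil when nothing was replaced, so avoid chaining on it.
bang_result = "NO*STARS~".dup.gsub!(/\*+~$/, '~')

# String#[]= with a regexp raises IndexError when the pattern does not match.
assign_error =
  begin
    s = "NO*STARS~".dup
    s[/\*+~+$/] = '~'
    nil
  rescue IndexError => e
    e.class
  end

p cleaned       # "AM*BE*3***LAST~"
p untouched     # "NO*STARS~"
p bang_result   # nil
p assign_error  # IndexError
```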
You can test the regex on Rubular and refine it using the quick reference at the bottom of the page.
http://rubular.com/

gsub same pattern from a string

I have big problems with figuring out how regex works.
I want this text:
This is an example\e[213] text\e[123] for demonstration
to become this:
This is an example text for demonstration.
So this means that I want to remove all strings that begin with \e[ and end with ].
I just can't find a proper regex for this.
My current regex looks like this:
/.*?(\\e\[.*\])?.*/ig
But it doesn't work. I appreciate any help.
You only need to do this:
txt.gsub(/\\e\[[^\]]*\]/i, "")
There is no need to match what comes before or after with .*
The second problem is that you use .* to describe the content between the brackets. Since the * quantifier is greedy by default, it will match everything up to the last closing bracket on the same line.
One way to prevent this is to replace the dot with a negated character class that excludes the closing square bracket: [^\]]. That way you keep the advantage of a greedy quantifier.
gsub can do the global matching for you.
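To make the greedy behaviour concrete, here is a small comparison on the question's text. The variable names are mine; the single-quoted string keeps \e as a literal backslash followed by "e", which is what the \\e in the patterns expects.

```ruby
txt = 'This is an example\e[213] text\e[123] for demonstration'

# Greedy .* runs to the LAST "]" on the line and swallows " text" as well.
greedy = txt.gsub(/\\e\[.*\]/, '')

# Negated character class: stops at the first "]".
negated = txt.gsub(/\\e\[[^\]]*\]/, '')

# Lazy quantifier: also stops at the first "]".
lazy = txt.gsub(/\\e\[.*?\]/, '')

p greedy   # "This is an example for demonstration"
p negated  # "This is an example text for demonstration"
p lazy     # "This is an example text for demonstration"
```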
re = /\\e\[.+?\]/i
'This is an example\e[213] text\e[123] for demonstration'.gsub re, ''
=> "This is an example text for demonstration"
You can make the match non-greedy by using .+? in the regex:
puts 'This is an example\e[213] text\e[123] for demonstration'.gsub(/\\e\[.+?\]/, '')
This is an example text for demonstration
=> nil

Ruby regular expression for this string?

I'm trying to get the first word in this string: Basic (11/17/2011 - 12/17/2011)
So ultimately wanting to get Basic out of that.
Other example string: Premium (11/22/2011 - 12/22/2011)
The format is always "Single-word followed by parenthesized date range" and I just want the single word.
Use this:
str = "Premium (11/22/2011 - 12/22/2011)"
str.split.first # => "Premium"
With no argument, split divides the string on whitespace.
After that, first returns the first element.
You don't need regexp for that, you can just use
str.split(' ')[0]
I know you found the answer you needed, but in case anyone stumbles on this in the future, here is how to pull the needed value out of a larger string of unknown length:
word_you_need = s.slice(/(\b[a-zA-Z]*\b \(\d+\/\d+\/\d+ - \d+\/\d+\/\d+\))/).split[0]
This regular expression will match the first word without the trailing space:
"^\w+ ??"
If you really want a regex you can get the first group after using this regex:
(\w*) .*
"Single-word followed by parenthesized date range"
'word' and 'parenthesized date range' should be better defined,
since, by your requirement statement, they act as anchors and/or delimiters.
These raw regexes are just a general guess.
\w+(?=\s*\([^)]*\))
or
\w+(?=\s*\(\s*\d+(?:/\d+)*\s*-\s*\d+(?:/\d+)*\s*\))
Actually, all you need is:
s.split[0]
...or...
s.split.first
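For comparison, a short sketch of the options discussed in this thread. The \A\w+ variant and the longer embedded string are my own assumptions for illustration: the first presumes the word sits at the very start, the second shows the lookahead regex picking the word out of surrounding text.

```ruby
str = "Premium (11/22/2011 - 12/22/2011)"

# No regex at all: split on whitespace and take the first piece.
by_split = str.split.first

# Anchored regex (assumption: the word is at the start of the string).
by_anchor = str[/\A\w+/]

# Lookahead version: the word directly before a parenthesized group,
# even when it is embedded in a longer (made-up) string.
embedded = "Plan: Basic (11/17/2011 - 12/17/2011) renewed"
by_lookahead = embedded[/\w+(?=\s*\([^)]*\))/]

p by_split      # "Premium"
p by_anchor     # "Premium"
p by_lookahead  # "Basic"
```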

Ruby regex: match alternative expressions with quotes

In Ruby, I want to have a regex match either of two expressions with a single group in the result. I want the following results:
regex = /you tell me/
regex.match(%|My name is "Peter"|)[1]
=> "Peter"
regex.match(%|My name is 'Peter'|)[1]
=> "Peter"
Note that I want the 1st group to refer to just Peter with no quotes, and I want there to be exactly one group matched in either case. Just as an example, this would match the first case (only):
/^My name is "([^"]*)"$/
I'd like something similar to that. I happen to be using this for cucumber testing.
This regex might work for you
['"](\w+)['"]
It matches exactly one group. But it also allows unbalanced quotes, like 'Peter"
If you want to match only balanced quotes, then you can't do it with a single group (I'm afraid).
Anyhow, here's my take:
('|")(\w+)\1
It matches two groups and "Peter" is in the second one.
http://rubular.com/r/C78X0wwGej
(?=['"](\w+)['"])(?:"\1"|'\1')
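The last pattern can be checked against the question's requirements with a short sketch. As far as I can tell it works because the lookahead captures the word into group 1 without consuming anything, and the alternation then insists on matching quotes around that same word. The variable names and the unbalanced test string are illustrative.

```ruby
regex = /(?=['"](\w+)['"])(?:"\1"|'\1')/

double = regex.match(%|My name is "Peter"|)
single = regex.match(%|My name is 'Peter'|)
mixed  = regex.match(%|My name is 'Peter"|)   # unbalanced quotes

p double[1]  # "Peter"
p single[1]  # "Peter"
p mixed      # nil
```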

Replacing partial regex matches in place with Ruby

I want to transform the following text
This is a ![foto](foto.jpeg), here is another ![foto](foto.png)
into
This is a ![foto](/folder1/foto.jpeg), here is another ![foto](/folder2/foto.png)
In other words I want to find all the image paths that are enclosed between brackets (the text is in Markdown syntax) and replace them with other paths. The string containing the new path is returned by a separate real_path function.
I would like to do this using String#gsub in its block version. Currently my code looks like this:
re = /!\[.*?\]\((.*?)\)/
rel_content = content.gsub(re) do |path|
  real_path(path)
end
The problem with this regex is that it will match ![foto](foto.jpeg) instead of just foto.jpeg. I also tried other regexen like (?>\!\[.*?\]\()(.*?)(?>\)) but to no avail.
My current workaround is to split the path and reassemble it later.
Is there a Ruby regex that matches only the path inside the brackets and not all the contextual required characters?
Post-answers update: The main problem here is that Ruby's regexen have no way to specify zero-width lookbehinds. The most generic solution is to group the part of the regexp before the real matching part and the part after it, i.e. /(pre)(matching-part)(post)/, and reconstruct the full string afterwards.
In this case the solution would be
re = /(!\[.*?\]\()(.*?)(\))/
rel_content = content.gsub(re) do
  $1 + real_path($2) + $3
end
A quick solution (adjust as necessary):
s = 'This is a ![foto](foto.jpeg)'
s.sub!(/(!\[.*?\])\((.*?)\)/, '\1(/folder1/\2)')
p s # => "This is a ![foto](/folder1/foto.jpeg)"
You can always do it in two steps - first extract the whole image expression out and then second replace the link:
str = "This is a ![foto](foto.jpeg), here is another ![foto](foto.png)"
str.gsub(/\!\[[^\]]*\]\(([^)]*)\)/) do |image|
  image.gsub(/(?<=\()(.*)(?=\))/) do |link|
    "/a/new/path/" + link
  end
end
#=> "This is a ![foto](/a/new/path/foto.jpeg), here is another ![foto](/a/new/path/foto.png)"
I changed the first regex a bit, but you can use the same one you had before in its place. image is the image expression like ![foto](foto.jpeg), and link is just the path like foto.jpeg.
[EDIT] Clarification: Ruby does have lookbehinds (and they are used in my answer):
You can create lookbehinds with (?<=regex) for positive and (?<!regex) for negative, where regex is an arbitrary regular expression subject to the following condition: expressions in lookbehinds have to be fixed width due to limitations of the regex implementation, which means they can't include expressions with an unknown number of repetitions or alternations with different-width choices. If you try to do that, you'll get an error. (The restriction doesn't apply to lookaheads, though.)
In your case, the [foto] part has a variable width (foto can be any string) so it can't go into a lookbehind due to the above. However, lookbehind is exactly what we need since it's a zero-width match, and we take advantage of that in the second regex which only needs to worry about (fixed-length) compulsory open parentheses.
Obviously you can put real_path in from here, but I just wanted a test-able example.
I think that this approach is more flexible and more readable than reconstructing the string through the match group variables
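A possible single-pass alternative, not taken from the answers here: in Ruby 2.0 and later, \K discards everything matched so far from the reported match, which sidesteps the fixed-width lookbehind restriction for the variable-length ![...]( prefix. The real_path stub below is a hypothetical stand-in for the asker's function.

```ruby
# Hypothetical stand-in for the question's real_path function.
def real_path(path)
  "/a/new/path/" + path
end

str = "This is a ![foto](foto.jpeg), here is another ![foto](foto.png)"

# Everything before \K must match but is kept out of the replaced text;
# the lookahead checks for the closing parenthesis without consuming it.
result = str.gsub(/!\[[^\]]*\]\(\K[^)]*(?=\))/) { |link| real_path(link) }

p result
```

With this pattern the block receives only the path, so no reassembly of the surrounding Markdown is needed.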
In your block, use $1 to access the first capture group ($2 for the second and so on).
From the documentation:
In the block form, the current match string is passed in as a parameter, and variables such as $1, $2, $`, $&, and $' will be set appropriately. The value returned by the block will be substituted for the match on each call.
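A minimal sketch of the block form using named groups and Regexp.last_match instead of $1, $2 and $3, which some may find easier to read. The group names and the real_path stub are assumptions made for this example only.

```ruby
# Hypothetical stand-in for the question's real_path function.
def real_path(path)
  "/folder1/" + path
end

content = "This is a ![foto](foto.jpeg), here is another ![foto](foto.png)"
re = /(?<pre>!\[.*?\]\()(?<path>.*?)(?<post>\))/

rel_content = content.gsub(re) do
  m = Regexp.last_match   # MatchData for the current match (same as $~)
  m[:pre] + real_path(m[:path]) + m[:post]
end

p rel_content
```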
As a side note, some people think '\1' is unsuitable when the amount of matched text is not known in advance. For example, if you want to match and modify content in the middle, how do you protect the characters on both sides?
It's easy: put capture groups around the parts you want to keep.
For example, suppose I want to change a-ruby-porgramming-book-531070.png to a-ruby-porgramming-book.png, i.e. remove everything between the last "-" and the last ".".
I can use /.*(-.*?)\./ to match -531070. Now how should I replace it? Note that everything else has no definite format.
The answer is to put a group around the rest as well, then keep it in the replacement:
"a-ruby-porgramming-book-531070.png".sub(/(.*)(-.*?)\./, '\1.')
# => "a-ruby-porgramming-book.png"
If you want to add something before the matched content, you can use:
"a-ruby-porgramming-book-531070.png".sub(/(.*)(-.*?)\./, '\1-2019\2.')
# => "a-ruby-porgramming-book-2019-531070.png"
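As an alternative sketch (my own variation, not part of the answer), a lookahead can protect the extension without capturing it, so the replacement string needs no back-references at all. The pattern assumes the part to drop contains no "-" or "." and that the extension is whatever follows the last ".".

```ruby
name = "a-ruby-porgramming-book-531070.png"

# Group-based version from the answer: keep \1, drop \2, restore the dot.
with_groups = name.sub(/(.*)(-.*?)\./, '\1.')

# Lookahead version: match only "-531070" when it is directly followed
# by the final ".ext", and replace it with nothing.
with_lookahead = name.sub(/-[^-.]*(?=\.[^.]*\z)/, '')

p with_groups     # "a-ruby-porgramming-book.png"
p with_lookahead  # "a-ruby-porgramming-book.png"
```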
