Ruby: how to perform lazy regex matching? [duplicate]

This question already has answers here:
Capturing groups don't work as expected with Ruby scan method
(3 answers)
Closed 5 years ago.
This is a follow-up question to "Lazy (ungreedy) matching multiple groups using regex". I tried the method from that question, but without much success.
I grab a string from the GitLab API and try to extract all the repos. Each repo name follows the format "https://gitlab.example.com/foo/xxx.git".
So far, matching the fixed prefix works OK:
gitlab_str.scan(/\"https\:\/\/gitlab\.example\.com\/foo\//)
But adding a wildcard for the name is tricky. I used the method from the previous question:
gitlab_str.scan(/\"https\:\/\/gitlab\.example\.com\/foo\/(.*?)\.git\"/)
That question says to use (.*?) for lazy matching, but it doesn't seem to work.
Thanks a lot for the help.

If we have the following string:
gitlab_str = "\"https://gitlab.example.com/foo/xxx.git\""
The following RegEx will return [["xxx"]], which is expected:
gitlab_str.scan(/\"https\:\/\/gitlab\.example\.com\/foo\/(.*?)\.git\"/)
That is because the pattern contains the group (.*?). When a regex has capture groups, scan returns only what the parentheses captured, not the whole match.
If you want the whole matched string returned, just remove the parentheses:
gitlab_str.scan(/\"https\:\/\/gitlab\.example\.com\/foo\/.*?\.git\"/)
This will return:
["\"https://gitlab.example.com/foo/xxx.git\""]
It also works for multiple occurrences:
> gitlab_str = "\"https://gitlab.example.com/foo/xxx.git\" and \"https://gitlab.example.com/foo/yyy.git\""
> gitlab_str.scan(/\"https\:\/\/gitlab\.example\.com\/foo\/.*?\.git\"/)
=> ["\"https://gitlab.example.com/foo/xxx.git\"", "\"https://gitlab.example.com/foo/yyy.git\""]
Finally, if you want to remove the https:// part from the resulting matches, then just wrap everything but that part with () in the RegEx:
gitlab_str.scan(/\"https\:\/\/(gitlab\.example\.com\/foo\/.*?\.git)\"/)
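To make the difference concrete, here is a minimal sketch comparing the lazy and greedy forms on a two-repo string. The sample string is made up for illustration, and %r{...} is used only to avoid escaping every slash; the variable names are not from the question.

```ruby
# Hypothetical sample input; the real string would come from the GitLab API.
gitlab_str = '"https://gitlab.example.com/foo/xxx.git" and "https://gitlab.example.com/foo/yyy.git"'

# Lazy: (.*?) stops at the first .git" it can reach, so each repo matches separately.
# With one capture group, scan returns an array of one-element arrays; flatten unwraps them.
lazy_names = gitlab_str.scan(%r{"https://gitlab\.example\.com/foo/(.*?)\.git"}).flatten
# => ["xxx", "yyy"]

# Greedy: (.*) runs on to the last .git" in the string, so both URLs merge into one match.
greedy_names = gitlab_str.scan(%r{"https://gitlab\.example\.com/foo/(.*)\.git"}).flatten
# => ["xxx.git\" and \"https://gitlab.example.com/foo/yyy"]
```

So (.*?) is doing its job here. The surprise is usually the nested-array shape that scan returns when the pattern has a group.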

Related

Regex with Ruby gsub [duplicate]

This question already has answers here:
Reference - What does this regex mean?
(1 answer)
How to back reference "inner" selections ( () ) in a regular expression?
(3 answers)
Closed 4 years ago.
My goal is to replace spaces and "/" with '-' from the input:
name = "chard / pinot noir"
to get:
"chard-pinot-noir"
My first attempt is:
name.gsub(/ \/\ /, '-') #=> "chard-pinot noir"
My second attempt is:
name.gsub(/\/\s+/, '-') #=> "chard -pinot noir"
My third attempt is:
name.gsub(/\s+/, '-') #=> "chard-/-pinot-noir"
This article helped. The first group checks for a forward slash /, and contains a break. The second portion replaces a forward slash with '-'. However, the space remains. I believe \s matches whitespace, but I can't get it to work while simultaneously checking for the forward slash.
My question is: how can I get the desired result shown above, for varying strings, using either a regex or a Ruby helper? Is there a preferred way? Pros and cons?

If you don't know much about regex, you can do it this way:
name = "chard / pinot noir"
(name.split() - ["/"]).join("-")
=> "chard-pinot-noir"
I think the best way is to use a regex, as @Sagar Pandya described above:
name.gsub(/[\/\s]+/,'-')
=> "chard-pinot-noir"
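Since the question asks about varying strings, here is a small sketch that wraps the character-class approach in a helper. The name slugify and the extra strip call are my own additions, not part of the original answers.

```ruby
# Hypothetical helper name, not from the original post.
def slugify(name)
  # strip first so leading/trailing whitespace does not turn into hyphens,
  # then collapse each run of slashes and whitespace into a single "-"
  name.strip.gsub(/[\/\s]+/, '-')
end

slugify("chard / pinot noir")   # => "chard-pinot-noir"
slugify("  merlot/cabernet  ")  # => "merlot-cabernet"
```

The + after the character class is what makes " / " become one hyphen instead of three.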

extracting link from text [duplicate]

This question already has answers here:
How to extract URLs from text
(6 answers)
Closed 8 years ago.
I am trying to extract a link from a phrase. It could be anywhere (last, first, or in the middle), so I am using this regex:
link=text.scan(/(^| )(http.*)($| )/)
The problem is that when the link is in the middle, it captures the whole phrase up to the end.
What should I do?

It's because the .* after http is greedy. I suggest using lookarounds.
link=text.scan(/(?<!\S)(http\S+)(?!\S)/)
OR
link=text.scan(/(?<!\S)(http\S+)/)
Example:
> "http://bar.com foo http://bar.com bar http://bar.com".scan(/(?<!\S)http\S+(?!\S)/)
=> ["http://bar.com", "http://bar.com", "http://bar.com"]
(?<!\S) Negative lookbehind which asserts that the match is not preceded by a non-space character.
http\S+ Matches the substring http plus the following one or more non-space characters.
(?!\S) Negative lookahead which asserts that the match is not followed by a non-space character.
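A quick side-by-side of the original pattern and the lookaround version, as a sketch on a made-up sentence with the link in the middle:

```ruby
# Hypothetical sample text with the link in the middle of the phrase.
text = "see http://bar.com for details"

# Original pattern: the greedy .* swallows everything up to the end of the string,
# and each capture group shows up as a separate array element.
greedy = text.scan(/(^| )(http.*)($| )/)
# => [[" ", "http://bar.com for details", ""]]

# Lookaround pattern: \S+ stops at the first whitespace, and with no capture
# groups scan returns the matched strings directly.
links = text.scan(/(?<!\S)http\S+(?!\S)/)
# => ["http://bar.com"]
```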

Do all the links you are trying to match follow some simple pattern? We'd need to see more context to confidently provide a good solution to your problem.
For example, the regex:
link=text.scan(/http.*\.com/)
...might be good enough for the job (this assumes all links end in ".com"), but I can't say for sure without more information.
Or again, for example, perhaps you could use something like:
link=text.scan(/http[a-z.\/:]*/)
This assumes all links contain only lowercase letters, ".", "/" and ":".

Ruby Regex not matching what it should be [duplicate]

This question already has answers here:
How to match all occurrences of a regular expression in Ruby
(6 answers)
Closed 8 years ago.
I've got the following regex:
regex = /\$([a-zA-Z.]+)/
and the following query
query = "Show me the PE Ratio for $AAPL, $TSLA"
Now regex.match(query) should capture AAPL and TSLA, but instead I get the following:
#<MatchData "$AAPL" 1:"AAPL">
which is completely wrong. Anyone know why?
Note that this regex works fine on Rubular: http://rubular.com/r/j0maQHnVFF

In Ruby, the .match method returns only the first match. You need all of the matches, like the /g (global) flag gives you in other regex flavors.
You can use the scan method. It returns an array of all the matches (an array of capture arrays when the pattern has groups) or, if you pass it a block, yields each match to the block.
Code
query.scan(/\$([a-zA-Z.]+)/)
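Here is a short sketch of the three behaviours using the regex and query from the question. The variable names first, tickers and seen are only for illustration.

```ruby
regex = /\$([a-zA-Z.]+)/
query = "Show me the PE Ratio for $AAPL, $TSLA"

# match stops after the first hit
first = regex.match(query)
first[1]  # => "AAPL"

# scan collects every hit; with one capture group each hit is a one-element array
tickers = query.scan(regex).flatten
# => ["AAPL", "TSLA"]

# block form: the block receives the array of captures for each hit
seen = []
query.scan(regex) { |captures| seen << captures.first }
```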

Fixed it, needed to use .scan instead of .match

How can one write this gsub regex match? [duplicate]

This question already has an answer here:
Closed 10 years ago.
Possible Duplicate:
Perfect way to write a gsub for a regex match?
I am trying to write a gsub for a regex match, but I imagine there's a better way to do this.
My expression:
ref.gsub(ref.match(/settings(.*)/)[1], '')
This lets me take settings/animals and return just settings.
But what if settings doesn't match? Then match returns nil and my [1] fails, as expected.
So how can I write the above statement, given that sometimes settings won't match?

Use /(settings|)(.*)/; the first group will then return "settings", or an empty string if it is not present.
puts 'settings/123'.match(/(settings|)(.*)/)[1]
puts 'Xettings/123'.match(/(settings|)(.*)/)[1]
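As an alternative sketch (not from the original answer), two forms that never call [1] on nil: sub with a lookbehind, which leaves the string untouched when nothing matches, and String#[] with a regex, which returns nil instead of raising. The sample values are assumptions.

```ruby
ref   = "settings/animals"
other = "profile/animals"   # hypothetical input that does not contain "settings"

# Remove whatever follows "settings"; if there is no match, sub returns the string unchanged.
trimmed       = ref.sub(/(?<=settings).*/, '')    # => "settings"
trimmed_other = other.sub(/(?<=settings).*/, '')  # => "profile/animals"

# String#[] with a regex and a group index returns nil when there is no match.
tail       = ref[/settings(.*)/, 1]    # => "/animals"
tail_other = other[/settings(.*)/, 1]  # => nil
```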

Ruby Regular expression to match a url [duplicate]

This question already has answers here:
Closed 12 years ago.
Possible Duplicates:
Regex to match URL
regex to remove the webpage part of a url in ruby
I am looking for a regular expression that matches all the URLs in a file.
I tried many of the regular expressions I found by googling, but each fails in one case or another. My idea is to write one that checks for http or https at the beginning and matches everything until it sees a blank space.
Any ideas?
NOTE: I don't need to parse the URLs, only to erase all of them from the file or at least make them unreadable.

The standard URI library provides URI.regexp, a regular expression for URL strings.
require 'uri'
string.scan(URI.regexp)
http://ruby-doc.org/stdlib/libdoc/uri/rdoc/index.html

You can try this:
/https?:\/\/[\S]+/
The \S means any non-whitespace character.
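Since the goal is to erase the URLs rather than parse them, the same pattern can be passed to gsub. This is a rough sketch on a made-up string; the placeholder text and variable names are assumptions.

```ruby
# Hypothetical sample text standing in for the file contents.
text = "Docs at http://example.com/a?b=1 and https://example.org/x, plus plain words."

url_pattern = /https?:\/\/\S+/

# No capture groups, so scan returns the matched strings directly.
# Note that \S+ also swallows trailing punctuation such as the comma.
urls = text.scan(url_pattern)
# => ["http://example.com/a?b=1", "https://example.org/x,"]

# Replace every URL with a placeholder (use '' to delete them outright).
cleaned = text.gsub(url_pattern, '[link removed]')
# => "Docs at [link removed] and [link removed] plus plain words."
```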
