Regex with Ruby gsub [duplicate] - ruby

This question already has answers here:
Reference - What does this regex mean?
(1 answer)
How to back reference "inner" selections ( () ) in a regular expression?
(3 answers)
Closed 4 years ago.
My goal is to replace spaces and "/" with '-' in the input:
name = "chard / pinot noir"
to get:
"chard-pinot-noir"
My first attempt is:
name.gsub(/ \/\ /, '-') #=> "chard-pinot noir"
My second attempt is:
name.gsub(/\/\s+/, '-') #=> "chard -pinot noir"
My third attempt is:
name.gsub(/\s+/, '-') #=> "chard-/-pinot-noir"
This article helped. The first group checks for a forward slash /, and contains a break. The second portion replaces a forward slash with '-'. However, the space remains. I believe \s matches whitespace, but I can't get it to work while simultaneously checking for a forward slash.
My question is: how can I get the desired result shown above, for varying strings, using either a regex or Ruby helper methods? Is there a preferred way, and what are the pros and cons?

If you don't know much about regex, you can do it this way:
name = "chard / pinot noir"
(name.split() - ["/"]).join("-")
=> "chard-pinot-noir"
I think the best way is to use a regex, as @Sagar Pandya described above:
name.gsub(/[\/\s]+/,'-')
=> "chard-pinot-noir"
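To compare the two approaches on varying strings, here is a small sketch. The helper name slugify and the second sample string are assumptions made up for illustration; they are not from the question. It suggests one trade-off: the split-based version only drops a "/" that stands alone between whitespace, while the character-class regex also handles a "/" with no spaces around it.

```ruby
# Hypothetical helper for illustration; the name slugify is not from the posts above.
def slugify(name)
  # Replace every run of slashes and/or whitespace with a single hyphen.
  name.gsub(/[\/\s]+/, '-')
end

spaced  = slugify("chard / pinot noir")   # => "chard-pinot-noir"
compact = slugify("syrah/grenache")       # => "syrah-grenache"

# The split-based variant leaves an embedded "/" untouched,
# because "/" never becomes its own array element here.
split_compact = ("syrah/grenache".split - ["/"]).join("-")  # => "syrah/grenache"

puts spaced, compact, split_compact
```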

Related

Ruby: how to perform lazy regex matching? [duplicate]

This question already has answers here:
Capturing groups don't work as expected with Ruby scan method
(3 answers)
Closed 5 years ago.
This is a follow-up question to "Lazy (ungreedy) matching multiple groups using regex". I tried to use the method from that question, but without much success.
I grab a string from the GitLab API and try to extract all the repos. Each repo name follows the format "https://gitlab.example.com/foo/xxx.git".
So far, if I try this, it works OK.
gitlab_str.scan(/\"https\:\/\/gitlab\.example\.com\/foo\//)
But adding a wildcard for the name is tricky. I use the method from the previous question:
gitlab_str.scan(/\"https\:\/\/gitlab\.example\.com\/foo\/(.*?)\.git\"/)
It says to use (.*?) for lazy matching, but it doesn't seem to work.
Thanks a lot for the help.
If we have the following string:
gitlab_str = "\"https://gitlab.example.com/foo/xxx.git\""
The following RegEx will return [["xxx"]], which is expected:
gitlab_str.scan(/\"https\:\/\/gitlab\.example\.com\/foo\/(.*?)\.git\"/)
That is because of the (.*?). Note the parentheses: when the pattern contains a capture group, scan returns only what the group captured.
If you want to return the whole matched string, you can just remove the parentheses:
gitlab_str.scan(/\"https\:\/\/gitlab\.example\.com\/foo\/.*?\.git\"/)
This will return:
["\"https://gitlab.example.com/foo/xxx.git\""]
It also works for multiple occurrences:
> gitlab_str = "\"https://gitlab.example.com/foo/xxx.git\" and \"https://gitlab.example.com/foo/yyy.git\""
> gitlab_str.scan(/\"https\:\/\/gitlab\.example\.com\/foo\/.*?\.git\"/)
=> ["\"https://gitlab.example.com/foo/xxx.git\"", "\"https://gitlab.example.com/foo/yyy.git\""]
Finally, if you want to remove the https:// part from the resulting matches, then just wrap everything but that part with () in the RegEx:
gitlab_str.scan(/\"https\:\/\/(gitlab\.example\.com\/foo\/.*?\.git)\"/)
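As a rough sketch of the difference described above, the snippet below runs scan with and without a capture group on the two-URL sample string. The variable names are made up for illustration, and the patterns use %r{...} only so the slashes need no escaping; they are otherwise the same as the ones in this answer.

```ruby
gitlab_str = '"https://gitlab.example.com/foo/xxx.git" and "https://gitlab.example.com/foo/yyy.git"'

# With a capture group, scan returns one array of captures per match.
with_group = %r{"https://gitlab\.example\.com/foo/(.*?)\.git"}
repo_names = gitlab_str.scan(with_group).flatten   # => ["xxx", "yyy"]

# Without a group, scan returns the full matched substrings.
without_group = %r{"https://gitlab\.example\.com/foo/.*?\.git"}
full_matches = gitlab_str.scan(without_group)

puts repo_names.inspect
puts full_matches.inspect
```

Calling flatten is just a convenience here, since each inner array holds a single capture.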

Ruby regex include all alphabets, numbers and / [duplicate]

This question already has answers here:
How to escape all characters with special meaning in Regex
(2 answers)
Closed 7 years ago.
I know this might have been asked time and again, but I'm really stuck. I've got it to work for numbers and letters, but I have no idea how to include "/" as well.
This is what I have,
name.gsub!(/[^0-9A-Za-z]/, '')
So if name is "Cool Stuff *(#/" it returns "CoolStuff". I'd just like it to return "CoolStuff/".
The / is the delimiter of the regex literal, so it must be 'escaped' with a backslash (meaning the / is taken literally rather than as the end of the pattern). So you have:
name.gsub!(/[^0-9A-Za-z\/]/, '')
Also note that you can shorten the regex by making it case-insensitive: add an 'i' after the closing slash, which lets you drop either the A-Z or the a-z range. So you could have instead:
name.gsub!(/[^0-9A-Z\/]/i, '')
Hope this helps!
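A quick sketch to check the escaped version against the sample input from the question. It uses the non-mutating gsub so each variant starts from the same string; the variable names are illustrative only. The %r{...} form is an alternative literal syntax in which / does not need a backslash.

```ruby
name = "Cool Stuff *(#/"

# Escaped slash inside the negated character class.
escaped = name.gsub(/[^0-9A-Za-z\/]/, '')          # => "CoolStuff/"

# Case-insensitive variant with a single letter range.
case_insensitive = name.gsub(/[^0-9A-Z\/]/i, '')

# %r{...} literal: the slash needs no escaping here.
percent_literal = name.gsub(%r{[^0-9A-Za-z/]}, '')

puts escaped, case_insensitive, percent_literal
```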

extracting link from text [duplicate]

This question already has answers here:
How to extract URLs from text
(6 answers)
Closed 8 years ago.
I am trying to extract a link from a phrase. The link could be anywhere (first, middle or last), so I am using this regex:
link=text.scan(/(^| )(http.*)($| )/)
The problem is that when the link is in the middle, the match runs on to the end of the phrase.
What should I do ?
It's because the .* after http is greedy. I suggest using lookarounds.
link=text.scan(/(?<!\S)(http\S+)(?!\S)/)
OR
link=text.scan(/(?<!\S)(http\S+)/)
Example:
> "http://bar.com foo http://bar.com bar http://bar.com".scan(/(?<!\S)http\S+(?!\S)/)
=> ["http://bar.com", "http://bar.com", "http://bar.com"]
(?<!\S) Negative lookbehind which asserts that the match is not preceded by a non-space character.
http\S+ Matches the substring http plus the following one or more non-space characters.
(?!\S) Negative lookahead which asserts that the match is not followed by a non-space character.
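To see the greedy behaviour next to the lookaround version, here is a minimal sketch. The sample text and variable names are assumptions invented for illustration.

```ruby
text = "see http://bar.com for details and http://baz.org/page too"

# Original pattern: .* is greedy, so the second group runs to the end of the line.
greedy = text.scan(/(^| )(http.*)($| )/)
greedy_link = greedy.first[1]

# Lookaround pattern: \S+ stops at the first whitespace character.
links = text.scan(/(?<!\S)http\S+(?!\S)/)   # => ["http://bar.com", "http://baz.org/page"]

puts greedy_link
puts links.inspect
```

Because the lookaround pattern has no capture groups, scan returns a flat array of the matched links.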
Do all the links you are trying to match follow some simple pattern? We'd need to see more context to confidently provide a good solution to your problem.
For example, the regex:
link=text.scan(/http.*\.com/)
...might be good enough for the job (this assumes all links end in ".com"), but I can't say for sure without more information.
Or again, for example, perhaps you could use something like:
link=text.scan(/http[a-z.\/:]*/)
This assumes all links contain only lowercase letters, ".", "/" and ":".

Ruby Regex not matching what it should be [duplicate]

This question already has answers here:
How to match all occurrences of a regular expression in Ruby
(6 answers)
Closed 8 years ago.
I've got the following regex:
regex = /\$([a-zA-Z.]+)/
and the following query
query = "Show me the PE Ratio for $AAPL, $TSLA"
Now regex.match(query) should capture AAPL and TSLA, but instead I get the following:
#<MatchData "$AAPL" 1:"AAPL">
which is completely wrong. Anyone know why?
Note that this regex works fine on Rubular: http://rubular.com/r/j0maQHnVFF
In Ruby, the .match method returns only the first match. You need all the matches, similar to what the global /g flag gives you in some other regex flavors.
You can use the scan method. It either returns an array of all the matches or, if you pass it a block, passes each match to the block.
Code
query.scan(/\$([a-zA-Z.]+)/)
Fixed it; I needed to use .scan instead of .match.
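For reference, a short sketch of match versus scan using the regex and query from the question. The variable names first_match, tickers and seen are illustrative only. Note that when the pattern contains a capture group, scan yields arrays of captures rather than whole matches.

```ruby
regex = /\$([a-zA-Z.]+)/
query = "Show me the PE Ratio for $AAPL, $TSLA"

# match stops at the first occurrence.
first_match = regex.match(query)
first_ticker = first_match[1]            # => "AAPL"

# scan collects every occurrence; each element is an array of captures.
tickers = query.scan(regex).flatten      # => ["AAPL", "TSLA"]

# Block form: the block receives the captures of each match in turn.
seen = []
query.scan(regex) { |captures| seen << captures.first }

puts first_ticker
puts tickers.inspect
```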

What does the regex (\d{3})(?=\d) mean? [duplicate]

This question already has answers here:
Explanation of Lookaheads in This Regular Expression
(5 answers)
Closed 9 years ago.
I am new to regex, and I am trying to break down the regex so I can understand it better:
/(\d{3})(?=\d)/
I understand that (\d{3}) is capturing 3 digits, but unsure what the second portion is trying to capture.
What does ?= mean?
(?=\d) is a positive lookahead. The whole pattern means: match and capture 3 digits, but only where they are followed by another digit. That following digit is checked, not consumed.
So something like this will happen:
1234 => capture 123
123a => no match
(?=pat) - Positive lookahead assertion: ensures that the following characters match pat, but doesn't include those characters in the matched text
/(\d{3})(?=\d)/ - Here (\d{3}) captures 3 digits that are followed by a digit, but that last digit is not captured in the group.
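A small sketch of the behaviour described above, using the two sample inputs plus a longer one. The last line shows one place a pattern like this tends to appear, inserting thousands separators on a reversed string; that use is a guess for illustration, not something stated in the question.

```ruby
pattern = /(\d{3})(?=\d)/

with_digit_after  = "1234"[pattern]        # => "123"
with_letter_after = "123a"[pattern]        # => nil

# The lookahead does not consume the digit, so the next match can start on it.
groups = "1234567".scan(pattern)           # => [["123"], ["456"]]

# Illustrative use: add a comma after every 3 digits, working from the right.
with_commas = "1234567".reverse.gsub(pattern, '\1,').reverse   # => "1,234,567"

puts with_digit_after.inspect, with_letter_after.inspect
puts groups.inspect, with_commas
```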
Hope this will help!
