Regex group match if present - ruby

My input string is:
/234243/source_path/a/b/c.test
or something like:
/234243/source_path/a/b/c.test/check_w123
I want a regex that captures the substring starting with source and, when present, the substring starting with check, giving a result like:
source: source_path/a/b/c.test/
check: check_w123
That is what I get from a regex like /(?<source>source.*)(?<check>check.*)/, without ? on the last group, but that regex does not match the first input at all.
My regex is:
/(?<source>source.*)(?<check>check.*)?/
My result is:
source: source_path/a/b/c.test/check_w123
check: nil

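Just turn the .* inside the source group into its non-greedy form, .*?, and don't forget to add an end-of-line anchor. Without the anchor, the lazy .*? stops right after "source", because the optional check group is allowed to match nothing.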
(?<source>source.*?)(?<check>check.*)?$
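A quick way to see why both changes matter is to run three variants against the two sample inputs in Ruby: the greedy pattern from the question, the lazy pattern without an anchor, and the lazy pattern with $. The variable names are only illustrative. Note that in Ruby $ matches at the end of a line; \z would be needed if the input could contain newlines.

```ruby
inputs = [
  "/234243/source_path/a/b/c.test",
  "/234243/source_path/a/b/c.test/check_w123"
]

greedy   = /(?<source>source.*)(?<check>check.*)?/    # pattern from the question
lazy     = /(?<source>source.*?)(?<check>check.*)?/   # non-greedy, but no anchor
anchored = /(?<source>source.*?)(?<check>check.*)?$/  # non-greedy plus end anchor

inputs.each do |input|
  [greedy, lazy, anchored].each do |re|
    m = input.match(re)
    puts "#{re.inspect}: source=#{m[:source].inspect} check=#{m[:check].inspect}"
  end
end
```

The greedy .* swallows check_w123 and the optional group then matches nothing; the unanchored lazy version stops after just "source"; only the anchored lazy version has to keep expanding until either check... or the end of the line follows.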

(?<source>source.*?(?!.*\/))(?<check>check.*)?
Try this. See demo:
https://regex101.com/r/rU8yP6/11
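Running this lookahead version on both sample inputs in Ruby shows how it behaves: (?!.*\/) lets the lazy part of source stop only once no further / follows. On the second input that gives the requested split; on the first input, which has no check part, source therefore ends at the last slash.

```ruby
re = /(?<source>source.*?(?!.*\/))(?<check>check.*)?/

["/234243/source_path/a/b/c.test",
 "/234243/source_path/a/b/c.test/check_w123"].each do |input|
  m = input.match(re)
  puts "source=#{m[:source].inspect} check=#{m[:check].inspect}"
end
# source="source_path/a/b/" check=nil
# source="source_path/a/b/c.test/" check="check_w123"
```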

Related

Regular Expression find usage of word after "/" in URL

I am trying to parse URLs using Ruby and return the URLs that contain a given word after a "/" following the domain (.com, .org, etc.).
If I am trying to capture "questions" in a URL such as https://stackoverflow.com/questions, I also want to be able to capture https://stackoverflow.com/blah/questions. But I do not want to capture https://stackoverflow.com/queStioNs.
Currently my expression matches https://stackoverflow.com/questions but not URLs where "questions" comes after another "/" segment, or two, etc.
The end of my regular expression is \bquestions\b.
I tried doing ([a-zA-Z]+\W{1}+\bjob\b|\bjob\b) but this only gets me URLs with /questions and /blah/questions but not /blah/bleh/questions.
What am I doing wrong and how do I match what I need?
You don't actually need a regex for this; you can use the URI module instead:
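require 'uri'

urls = ['https://stackoverflow.com/blah/questions', 'https://stackoverflow.com/queStioNs']
urls.each do |url|
  # URI(...) parses the string; #path is the part after the host, e.g. "/blah/questions"
  the_path = URI(url).path
  # include? is case-sensitive, so "/queStioNs" is not printed
  puts the_path if the_path.include?('questions')
end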
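include? is a substring test, so it would also accept a path such as /myquestions. If "questions" has to be a whole path segment (an assumption about the requirement, not part of the answer above), one option is to split the path on / and compare segments. The helper name segment? is made up for this sketch; the comparison stays case-sensitive, so /queStioNs is still rejected.

```ruby
require 'uri'

# Hypothetical helper: true when one of the URL's path segments is exactly `word`.
def segment?(url, word)
  URI(url).path.split('/').include?(word)
end

urls = [
  'https://stackoverflow.com/questions',
  'https://stackoverflow.com/blah/questions',
  'https://stackoverflow.com/blah/bleh/questions',
  'https://stackoverflow.com/queStioNs'
]

urls.each { |url| puts "#{url}: #{segment?(url, 'questions')}" }
```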
I don't know whether there is a simpler way, but here is my solution:
regexp = '^(https|http)?:\/\/[\w]+\.(com|org|edu)(\/{1}[a-z]+)*$'
url = "https://stackoverflow.com/blah/questions"
match = url.match(regexp)
# the last capture group holds the final path segment, e.g. "/questions"
match[match.length - 1].gsub("/", "")
It will return 'questions'.
Update based on your comments below:
Use [\S]*(\/questions){1}$
Hope it helps :)
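Applied with grep to a list of URLs, the updated pattern [\S]*(\/questions){1}$ keeps every URL that ends in /questions, however many segments come before it, and drops the mixed-case one because the match is case-sensitive. The URL list is only sample data.

```ruby
urls = [
  'https://stackoverflow.com/questions',
  'https://stackoverflow.com/blah/questions',
  'https://stackoverflow.com/blah/bleh/questions',
  'https://stackoverflow.com/queStioNs'
]

# grep keeps the elements for which pattern === element is true
matches = urls.grep(/[\S]*(\/questions){1}$/)
puts matches
```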

How do I use gsub to search and replace using a regex?

I want to find the tags in a description string and turn them into anchor tags, but I am not able to get the value of the tag into the replacement.
My input is:
a = "this is a sample #tag and the string is having a #second tag too"
My output should be:
a = "this is a sample <a href='/tags/tag'>#tag</a> and the string is having a <a href='/tags/second'>#second</a> tag too"
So far I have managed some of it, but I am not able to achieve the final output. This pattern:
a.gsub(/#\S+/i, "<a href='/tags/\0'>\0</a>")
returns:
"this is a sample <a href='/tags/\u0000'>\u0000</a> and the string is having a <a href='/tags/\u0000'>\u0000</a> tag too"
What do I need to do differently?
You can do it like this:
a.gsub(/#(\S+)/, '<a href="/tags/\1">\0</a>')
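Your replacement doesn't work because inside double quotes the backslash itself must be escaped: "\0" is a NUL character (the \u0000 in your output), whereas "\\0" and "\\1" reach gsub as the back-references \0 and \1: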
a.gsub(/#(\S+)/, "<a href='/tags/\\1'>\\0</a>")
Note that the /i modifier is not needed here.
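The difference between the two escaping styles can be checked directly in Ruby: "\0" is a single NUL byte, while "\\1" and '\1' are both the two characters backslash and 1, which gsub reads as a back-reference.

```ruby
a = "this is a sample #tag and the string is having a #second tag too"

puts "\0".bytes.inspect   # a single NUL byte, not a back-reference
puts("\\1" == '\1')       # both are a backslash followed by "1"

result = a.gsub(/#(\S+)/, "<a href='/tags/\\1'>\\0</a>")
puts result
```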
You can also give gsub a block if you want to do something with the match from the regex:
a.gsub(/#(\S+)/i) { "<a href='/tags/#{$1}'>##{$1}</a>" }
$1 is a global variable that Ruby automatically sets to the first capture group of the current match.
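Because $1 is refreshed for every match, it can be combined with the block's parameter, which receives the whole match including the #. A sketch of the same replacement written that way (the parameter name hashtag is just illustrative):

```ruby
a = "this is a sample #tag and the string is having a #second tag too"

# The block receives the whole match ("#tag"); $1 holds the first capture ("tag").
result = a.gsub(/#(\S+)/) { |hashtag| "<a href='/tags/#{$1}'>#{hashtag}</a>" }
puts result
```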
Try this:
a.gsub(/(?<a>#\w+)/, '\k<a>')
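Named groups can also be referenced in a gsub replacement string with \k<name>. A minimal sketch, using a hypothetical group name tag that leaves the # outside the group so the bare word can go into the URL:

```ruby
a = "this is a sample #tag and the string is having a #second tag too"

# "\\k<tag>" in a double-quoted string becomes \k<tag>, which gsub
# replaces with the text captured by the named group "tag".
result = a.gsub(/#(?<tag>\w+)/, "<a href='/tags/\\k<tag>'>#\\k<tag></a>")
puts result
```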

Ruby: replace a given URL in an HTML string

In Ruby, I want to replace a given URL in an HTML string.
Here is my unsuccessful attempt:
escaped_url = url.gsub(/\//,"\/").gsub(/\./,"\.").gsub(/\?/,"\?")
path_regexp = Regexp.new(escaped_url)
html.gsub!(path_regexp, new_url)
Note: url is actually a Google Chart request URL I wrote, which will not have more special characters than /?|.=%:
The gsub method can take a string or a Regexp as its first argument; the same goes for gsub!. For example:
>> 'here is some ..text.. xxtextxx'.gsub('..text..', 'pancakes')
=> "here is some pancakes xxtextxx"
So you don't need to bother with a regex or escaping at all; just do a straight string replacement:
html.gsub!(url, new_url)
Or better, use an HTML parser to find the particular node you're looking for and do a simple attribute assignment.
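This also explains why the original attempt fails: inside double quotes "\/", "\." and "\?" are just /, . and ?, so the gsub chain leaves the URL unchanged, and Regexp.new then treats ? as a quantifier rather than a literal character. The sketch below uses made-up chart URLs (not a real Google Chart request) to show both effects and the plain string replacement.

```ruby
# Hypothetical URLs for illustration only.
url     = 'https://chart.example.com/chart?cht=p3&chs=250x100'
new_url = 'https://chart.example.com/chart?cht=p3&chs=400x200'
html    = %(<img src="#{url}">)

escaped_url = url.gsub(/\//, "\/").gsub(/\./, "\.").gsub(/\?/, "\?")
puts escaped_url == url                    # the "escaping" changed nothing
puts html.match?(Regexp.new(escaped_url))  # "t?" means an optional "t", so no match

# A String pattern is matched literally, so no escaping is needed.
puts html.gsub(url, new_url)
```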
I think you're looking for something like:
path_regexp = Regexp.new(Regexp.escape(url))
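Regexp.escape backslash-escapes regex metacharacters such as ., ? and |, so the resulting pattern matches the URL literally. A sketch with the same kind of hypothetical chart URL:

```ruby
# Hypothetical URLs for illustration only.
url     = 'https://chart.example.com/chart?cht=p3&chs=250x100'
new_url = 'https://chart.example.com/chart?cht=p3&chs=400x200'
html    = %(<img src="#{url}">)

path_regexp = Regexp.new(Regexp.escape(url))
puts path_regexp.source               # "." and "?" now carry a backslash
puts html.gsub(path_regexp, new_url)
```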

Use XPath to find the appropriate element based on the element value

I have the following XML snippet:
<ZMARA01 SEGMENT="1">
<CHARACTERISTICS_01>X,001,COLOR_ATTRIBUTE_FR,BRUN ÉCORCE,TMBR,French C</CHARACTERISTICS_01>
<CHARACTERISTICS_02>X,001,COLOR_ATTRIBUTE,Timber Brown,TMBR,Color Attr</CHARACTERISTICS_02>
</ZMARA01>
I am looking for an xpath expression that will match based on COLOR_ATTRIBUTE. It will not always be in CHARACTERISTIC_02. It could be CHARACTERISTIC_XX. Also I don't want to match COLOR_ATTRIBUTE_FR. I have been using this:
Transaction.Input_XML{/ZMAT/IDOC/E1MARAM/ZMARA01/*[starts-with(local-name(.), 'CHARACTERISTIC_')][contains(.,'COLOR_ATTRIBUTE')]}
This gets me mostly there, but it matches both COLOR_ATTRIBUTE and COLOR_ATTRIBUTE_FR.
Use:
contains(concat(',', ., ','), ',COLOR_ATTRIBUTE,')
This first surrounds the string value of the context node with commas, then simply tests whether the so-constructed string contains ',COLOR_ATTRIBUTE,'.
Thus we treat all cases (pattern at the start of the string, pattern at the end of the string, and pattern neither at the start nor at the end) in the same way.
If COLOR_ATTRIBUTE is guaranteed not to be in the first or last position, you could use [contains(.,',COLOR_ATTRIBUTE,')]; otherwise you could use something like [contains(.,'COLOR_ATTRIBUTE') and not(contains(.,'COLOR_ATTRIBUTE_FR'))].
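The comma-wrapping idea from the first answer can be checked outside XPath. This sketch applies the same test in plain Ruby to the two element values from the snippet (the helper name color_attribute? is made up for illustration): once the value is wrapped in commas, COLOR_ATTRIBUTE is found as a whole comma-separated field wherever it appears, while COLOR_ATTRIBUTE_FR is not.

```ruby
values = [
  'X,001,COLOR_ATTRIBUTE_FR,BRUN ÉCORCE,TMBR,French C',
  'X,001,COLOR_ATTRIBUTE,Timber Brown,TMBR,Color Attr'
]

# Same idea as contains(concat(',', ., ','), ',COLOR_ATTRIBUTE,'):
# surround the value with commas, then look for the field with its commas.
def color_attribute?(value)
  ",#{value},".include?(',COLOR_ATTRIBUTE,')
end

values.each { |v| puts "#{v}: #{color_attribute?(v)}" }
```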

Ruby regex match specific string with special conditions

I'm currently trying to parse a document into tokens with the help of regex.
Currently I'm trying to match the keywords in the document. For example I have the following document:
Func test()
Return blablaFuncblabla
EndFunc
The keywords that need to be matched are Func, Return and EndFunc.
I've come up with the following regex: (\s|^)(Func)(\s|$) to match the Func keyword, but it doesn't work exactly like I want: the whitespace is matched as well!
How can I match it without capturing the whitespace?
(?:\s|^)(Func)(?:\s|$)
?: makes a group non-capturing.
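In Ruby this can be checked on the sample document: with (?:...) the whitespace is still part of the overall match (m[0]), but the only capture group holds just Func. scan returns a single entry, because the Func inside blablaFuncblabla and EndFunc is not preceded by whitespace or a line start.

```ruby
doc = "Func test()\nReturn blablaFuncblabla\nEndFunc"

capturing     = /(\s|^)(Func)(\s|$)/
non_capturing = /(?:\s|^)(Func)(?:\s|$)/

p doc.match(capturing).captures   # three groups, whitespace included
m = doc.match(non_capturing)
p m[0]                            # whole match, trailing space included
p m.captures                      # only the keyword
p doc.scan(non_capturing)
```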
