Exclude a Substring using Regex in Java - negation

I want to take a string like src=' blah src='' blah and ignore the first src='
Expected results should be: blah src='' blah
I've tried: blah(?!:(src\\s*?=\\s*?))
I've seen other posts on here mentioning ^(...).*$ but I really don't understand how to apply that, or how to work with negation in general. The Java tutorial mentions [^abc], but can that be used for a whole expression too, not just single characters? e.g. [^src\\s=]

A simple "src='(.*)" should do the job unless you have more complicated cases:
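import java.util.regex.Matcher;
import java.util.regex.Pattern;
// Capture everything that follows the first src='
Pattern pattern = Pattern.compile("src='(.*)");
Matcher matcher = pattern.matcher("src=' blah src='' blah");
if (matcher.find()) {
    String result = matcher.group(1); // The extracted string: " blah src='' blah" (leading space included)
}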


URL rewriting using Tuckey

My project (we have Spring 3) needs to rewrite URLs from the form
localhost:8888/testing/test.htm?param1=val1&paramN=valN
to
localhost:8888/nottestinganymore/test.htm?param1=val1&paramN=valN
My current rule looks like:
<from>^/testing/(.*/)?([a-z0-9]*.htm.*)$</from>
<to type="passthrough">/nottestinganymore/$2</to>
But my query parameters are being doubled, so I am getting param1=val1,val1 and paramN=valN,valN...please help! This stuff is a huge pain.
To edit/add, we have use-query-string=true on the project and I doubt I can change that.
The regular expression needs some tweaking. Tuckey uses the Java regular expression engine unless specified otherwise, so the best way to deal with this is to write a small test case that confirms whether your regular expression is correct. For example, a slightly tweaked version of your regular expression with a test case is below.
@Test public void testRegularExpression()
{
String regexp = "/testing/(.*)([a-z0-9]*.htm.*)$";
String url = "localhost:8888/testing/test.htm?param1=val1&paramN=valN";
Pattern pattern = Pattern.compile(regexp);
Matcher matcher = pattern.matcher(url);
if (matcher.find())
{
System.out.println("$1 : " + matcher.group(1) );
System.out.println("$2 : " + matcher.group(2) );
}
}
The above will print the output as follows :
$1 : test
$2 : .htm?param1=val1&paramN=valN
You can modify the expression now to see what "groups" you want to extract from URL and then form the target URL.

How do I use gsub to search and replace using a regex?

I want to filter tags out of a description string, and want to make them into anchor tags. I am not able to return the value of the tag.
My input is:
a = "this is a sample #tag and the string is having a #second tag too"
My output should be:
a = "this is a sample <a href='/tags/tag'>#tag</a> and the string is having a <a href='/tags/second'>#second</a> tag too"
So far I am able to do some minor stuff but I am not able to achieve the final output. This pattern:
a.gsub(/#\S+/i, "<a href='/tags/\0'>\0</a>")
returns:
"this is a sample <a href='/tags/\u0000'>\u0000</a> and the string is having a <a href='/tags/\u0000'>\u0000</a> tag too"
What do I need to do differently?
You can do it like this:
a.gsub(/#(\S+)/, '<a href="/tags/\1">\0</a>')
The reason your replacement doesn't work is that backslashes must be doubled inside a double-quoted string (otherwise "\0" is read as a NUL character):
a.gsub(/#(\S+)/, "<a href='/tags/\\1'>\\0</a>")
Note that the /i modifier is not needed here.
Alternatively, give gsub a block if you want to do something with each match from the regex:
a.gsub(/#(\S+)/i) { "<a href='/tags/#{$1}'>##{$1}</a>" }
$1 is a special variable that Ruby automatically sets to the first capture group of the most recent match.
Try this:
a.gsub(/(?<a>#\w+)/, '<a href="/tags/\k<a>">\k<a></a>')
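The replacement styles discussed above can be compared side by side. This is a minimal sketch, assuming the desired link format is /tags/NAME with the visible text #NAME; the method names are made up for illustration.

```ruby
# Sample input taken from the question.
TEXT = "this is a sample #tag and the string is having a #second tag too"

# Single-quoted replacement: \1 and \0 reach gsub untouched as backreferences.
def link_tags_single(text)
  text.gsub(/#(\S+)/, '<a href="/tags/\1">\0</a>')
end

# Double-quoted replacement: each backslash must be doubled,
# otherwise "\0" becomes a NUL character before gsub ever sees it.
def link_tags_double(text)
  text.gsub(/#(\S+)/, "<a href=\"/tags/\\1\">\\0</a>")
end

# Block form: the block runs once per match and $1 holds the first capture group.
def link_tags_block(text)
  text.gsub(/#(\S+)/) { "<a href=\"/tags/#{$1}\">##{$1}</a>" }
end

puts link_tags_single(TEXT)
puts link_tags_double(TEXT)
puts link_tags_block(TEXT)
```

All three calls should print the same string; the block form is the most flexible if the tag name needs further processing (for example, URL-encoding) before it goes into the link.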

Ruby: replace a given URL in an HTML string

In Ruby, I want to replace a given URL in an HTML string.
Here is my unsuccessful attempt:
escaped_url = url.gsub(/\//,"\/").gsub(/\./,"\.").gsub(/\?/,"\?")
path_regexp = Regexp.new(escaped_url)
html.gsub!(path_regexp, new_url)
Note: url is actually a Google Chart request URL I wrote, which will not have more special characters than /?|.=%:
The gsub method can take a string or a Regexp as its first argument, same goes for gsub!. For example:
>> 'here is some ..text.. xxtextxx'.gsub('..text..', 'pancakes')
=> "here is some pancakes xxtextxx"
So you don't need to bother with a regex or escaping at all, just do a straight string replacement:
html.gsub!(url, new_url)
Or better, use an HTML parser to find the particular node you're looking for and do a simple attribute assignment.
I think you're looking for something like:
path_regexp = Regexp.new(Regexp.escape(url))
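To illustrate both suggestions, here is a small sketch. The URLs and HTML snippet are hypothetical stand-ins (the question's real chart URL is not shown), and the method names are invented for the example.

```ruby
# Hypothetical sample values; they only need to contain regex metacharacters.
OLD_URL = "http://example.com/chart?cht=p3&chl=a|b"
NEW_URL = "/images/chart.png"
HTML    = %(<img src="#{OLD_URL}">)

# A String pattern is matched literally, so no escaping is required.
def replace_literal(html, url, new_url)
  html.gsub(url, new_url)
end

# Regexp.escape backslash-escapes metacharacters such as . ? and |
# so the resulting Regexp matches the URL text literally.
def replace_escaped(html, url, new_url)
  html.gsub(Regexp.new(Regexp.escape(url)), new_url)
end

puts Regexp.escape("a.b?c|d")
puts replace_literal(HTML, OLD_URL, NEW_URL)
puts replace_escaped(HTML, OLD_URL, NEW_URL)
```

Both methods should yield the same HTML. The escaped-Regexp variant is mainly useful when the literal URL has to be embedded in a larger pattern.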

Is it possible to exclude some of the string used to match from Ruby regexp data?

I have a bunch of strings that look, for example, like this:
<option value="Spain">Spain</option>
And I want to extract the name of the country from inside.
The easiest way I could think of to do this in Ruby was to use a regular expression of this form:
country = line.match(/>(.+)</)
However, this returns >Spain<. So I did this:
line.match(/>(.+)</).to_s.gsub!(/<|>/,"")
Works well enough, but I'd be surprised if there isn't a more elegant way to do this. It seems like I should be able to use a regular expression to declare how to find the thing I want, without the enclosing strings that were used to match it becoming part of the data that gets returned.
Is there a conventional approach to this problem?
The right way to deal with that string is to use an HTML parser, for example:
country = Nokogiri::HTML('<option value="Spain">Spain</option>').at('option').text
And if you have several such strings, paste them together and use search:
html = '<option value="Spain">Spain</option><option value="Canada">Canada</option>'
countries = Nokogiri::HTML(html).search('option').map(&:text)
# ["Spain", "Canada"]
But if you must use a regex, then:
country = '<option value="Spain">Spain</option>'.match('>([^<]+)<')[1]
Keep in mind that match actually returns a MatchData object, and MatchData#to_s "returns the entire matched string."
But you can access the captured groups using MatchData#[]. And if you don't like counting, you could use a named capture group as well:
country = '<option value="Spain">Spain</option>'.match('>(?<name>[^<]+)<')['name']
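Two further regex-only variants may be worth knowing; they are sketched here as alternatives, not as the approach the answer above recommends. The method names are illustrative.

```ruby
LINE = '<option value="Spain">Spain</option>'

# String#[] with a regex and a group index returns just that capture
# (or nil when nothing matches), so no MatchData handling is needed.
def country_by_capture(line)
  line[/>([^<]+)</, 1]
end

# Lookbehind (?<=>) and lookahead (?=<) assert the surrounding > and <
# without making them part of the match itself.
def country_by_lookaround(line)
  line[/(?<=>)[^<]+(?=<)/]
end

puts country_by_capture(LINE)
puts country_by_lookaround(LINE)
```

The lookaround form is the closest to "declare the context without returning it"; the capture form is usually easier to read.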

Ruby regex match specific string with special conditions

I'm currently trying to parse a document into tokens with the help of regex.
Currently I'm trying to match the keywords in the document. For example I have the following document:
Func test()
Return blablaFuncblabla
EndFunc
The keywords that need to be matched are Func, Return and EndFunc.
I've come up with the following regex: (\s|^)(Func)(\s|$) to match the Func keyword, but it doesn't work exactly like I want: the whitespace is matched as well!
How can I match it without capturing the whitespaces?
(?:\s|^)(Func)(?:\s|$)
?: makes a group non-capturing.
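Note that a non-capturing group still consumes the whitespace, so it remains part of the overall match even though it is not captured. If the whitespace should not be matched at all, lookarounds are one option. The following Ruby sketch assumes keywords are delimited by whitespace or the start/end of the text; the constant and method names are illustrative.

```ruby
# The sample document from the question.
SOURCE = "Func test()\nReturn blablaFuncblabla\nEndFunc"

# (?<!\S) means "not preceded by a non-whitespace character" and
# (?!\S) means "not followed by one"; neither consumes any text.
KEYWORD = /(?<!\S)(?:EndFunc|Func|Return)(?!\S)/

# Returns every keyword found, in order of appearance.
def keywords(source)
  source.scan(KEYWORD)
end

# For comparison: with non-capturing groups the whitespace is not captured,
# but it is still included in the whole match (index 0).
def whole_match(source)
  source.match(/(?:\s|^)(Func)(?:\s|$)/)[0]
end

p keywords(SOURCE)
p whole_match(SOURCE)
```

Because the lookarounds consume nothing, adjacent keywords separated by a single space are both found, which the consuming version can miss.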
