regex selection - ruby

I have a string like this.
<p class='link'>try</p>bla bla</p>
I want to get only <p class='link'>try</p>
I have tried this.
/<p class='link'>[^<\/p>]+<\/p>/
But it doesn't work.
How can I do this?
Thanks,

If that is your string, and you want the text between those p tags, then this should work...
/<p\sclass='link'>(.*?)<\/p>/
The reason yours is not working is that you put <\/p> inside a negated character class. It does not match that sequence literally; it excludes each of those characters individually.
Of course, I am obliged to mention that there are better tools for parsing HTML fragments (such as an HTML parser).
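As a rough sketch of the lazy-capture approach in Ruby, using the sample string from the question (the variable names are only illustrative):

```ruby
# Sample string from the question.
html = "<p class='link'>try</p>bla bla</p>"

# Lazy (.*?) stops at the first closing </p> instead of the last one.
pattern = /<p\sclass='link'>(.*?)<\/p>/

match = html.match(pattern)
whole_tag  = match[0] # the full matched element
inner_text = match[1] # only the captured text between the tags

puts whole_tag  # <p class='link'>try</p>
puts inner_text # try
```

match[0] gives the whole element the asker wanted, while match[1] gives just the text inside it.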

'/<p[^>]+>([^<]+)<\/p>/'
will get you "try"

It looks like you used this block: [^<\/p>]+ intending to match anything except for </p>. Unfortunately, that's not what it does. A [] block matches any one of the characters inside, and a leading ^ negates it, so [^<\/p>]+ matches a run of characters that are none of <, /, p, or >. It cannot step over a single p or / in the text, so the pattern fails whenever the content between the tags contains one of those characters.
Alex's solution, using a non-greedy quantifier, is how I tend to approach this sort of problem.
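To illustrate how the negated class treats its contents as individual characters, here is a small Ruby sketch. The sample string is a made-up variant of the one in the question, with "tip" in place of "try" so that the text contains a p:

```ruby
# A negated class excludes single characters, not the sequence "</p>".
char_class = /<p class='link'>[^<\/p>]+<\/p>/
lazy       = /<p class='link'>.+?<\/p>/

# Hypothetical variant of the question's string; the text contains a "p".
with_p = "<p class='link'>tip</p>bla bla</p>"

# The class cannot consume the "p" in "tip", so the whole match fails.
class_result = with_p[char_class]

# The lazy quantifier accepts any character and stops at the first </p>.
lazy_result = with_p[lazy]

p class_result # nil
p lazy_result  # "<p class='link'>tip</p>"
```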

I tried to make one less specific to any particular tag.
(<[^/]+?\s+[^>]*>[^>]*>)
this returns:
<p class='link'>try</p>

Related

Ruby regex section to match multiline

So this is my code
convert = contents.gsub(/\\s1(.*?)(\n\\r.*?)?\n((?s)\\ms3(.*?)\\p)/, 'replacement code')
In the first bit, \\s1(.*?)(\n\\r.*?)?\n, I only want it to match a newline where I explicitly put one. But when searching for \\ms3(.*?)\\p I want it to pick up any newlines that are there. Unfortunately it looks like Ruby doesn't support this (?s) prefix. Is there any way of doing this?
thanks
(.*?) ==> ([\s\S]*?)
You can use this instead of the DOTALL modifier.
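A short Ruby sketch of that substitution, on a made-up sample string. It also shows Ruby's own m option, which is not mentioned above but makes . match newlines and can be scoped to one group with (?m:...):

```ruby
# Hypothetical sample: a \ms3 ... \p span that contains a newline.
text = "\\ms3 first line\nsecond line\\p"

# By default . does not match a newline in Ruby, so this finds nothing.
default_dot = text[/\\ms3(.*?)\\p/, 1]

# [\s\S] matches any character, newline included.
any_char = text[/\\ms3([\s\S]*?)\\p/, 1]

# Ruby's m option, scoped inline to one group, makes . match newlines too.
inline_m = text[/\\ms3((?m:.*?))\\p/, 1]

p default_dot # nil
p any_char    # " first line\nsecond line"
p inline_m    # " first line\nsecond line"
```

Scoping the option to a single group leaves the rest of the pattern untouched, so the earlier (.*?) parts still stop at a newline.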
convert = contents.gsub(/\\s1(.*?)(\n\\r.*?)?\n((\n*)\\ms3(.*?)\\p)/, 'replacement code')
This will capture any number (zero or more) of newlines before "\ms3". If that's not what you meant, please clarify what functionality you expect from (?s).

How to write Xpath expressions to distinguish between results?

I am new to XPath expressions and need help with an issue.
Consider the following Document :
<tbody><tr>
<td>By <strong>Bec</strong></td>
<td><strong>Great Support</strong></td>
</tr></tbody>
In this I have to find the text inside tags separately.
Following is my xpath expression:
//tbody//td//strong/text();
It evaluates output as expected:
Bec
Great Support
How can I write XPath expressions to distinguish between the results, i.e. Bec and Great Support?
It's rather unclear what you're trying to do, but the following should succeed in selecting them separately:
//tbody/tr/td[1]/strong
and
//tbody/tr/td[2]/strong
Note that the text() you had at the end is most likely not needed in this case.
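For what it's worth, here is a minimal Ruby sketch of those two expressions, assuming the REXML library is available (it ships with standard Ruby installs as a bundled gem). The variable names are only illustrative:

```ruby
require 'rexml/document'

# The document from the question.
xml = <<~XML
  <tbody><tr>
  <td>By <strong>Bec</strong></td>
  <td><strong>Great Support</strong></td>
  </tr></tbody>
XML

doc = REXML::Document.new(xml)

# Positional predicates pick the strong element in each cell separately.
first_cell  = REXML::XPath.first(doc, '//tbody/tr/td[1]/strong').text
second_cell = REXML::XPath.first(doc, '//tbody/tr/td[2]/strong').text

puts first_cell  # Bec
puts second_cell # Great Support
```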
Not sure I understand 100%, but if you're trying to get the text of the first and the second strong tags, you can use position (1 based index)
//tbody/tr/td[position()=1]/strong/text() //first text
//tbody/tr/td[position()=2]/strong/text() //second text
This solution only applies to the current sample though, where your strong tags are inside either the first or second td tag.
Not sure this is what you're looking for... anyway, assuming you're asking to retrieve a node based on its text you can look up for text content by doing something like:
//tbody//td//strong/text()[.="Bec"]
PS
in [.=""] the dot is an alias for self::node(), not text() (thanks JLRishe for pointing out the mistake).

XPATH Search and replace full words

Strange I can't find this information online, does anyone know how to search and replace a full word for something else?
For example:
<div class="cell_user" alt="hotel Review - Young couple">
I want to remove: "hotel Review - " so I am left with:
"Young couple"
It seems to be something to do with fn:replace but I cannot find a single example anywhere of someone using it!
Thanks for any advice
If you have a constant prefix or infix string as in your example you can always use a string function such as substring-after():
"substring-after(//div[@class = 'cell_user']/@alt, 'hotel Review - ')"
You may want to tweak the beginning of the XPath expression a little bit to be more selective depending on your context. Also see xsl substring-after usage.
If you know that the relevant part is always after a hyphen and there is always only a single hyphen in your string you may even write
"substring-after(//div[@class = 'cell_user']/@alt, '- ')"

Regexp to convert tags (similar to BBCode) to HTML

I have a set of strings with nested [quote] tags in the following format:
[quote name="John"]Some text. [quote name="Piter"]Inner quote.[/quote][/quote]
As you see it is not like ordinary BBCode. So I can't find a suitable regexp for gsub in Ruby to convert them to strings like this:
<blockquote>
<p>Some text.
<blockquote>
<p>Inner quote.</p>
<small>Piter</small>
</blockquote>
</p>
<small>John</small>
</blockquote>
Can anybody please help me with such a regexp?
I'm pretty sure that regexes fundamentally can't cope with nesting. What you could do is make it do a minimal match (e.g. only the inner quote levels), replace them, and then repeat as long as you have more matches. Once you've replaced a level it will just be HTML so will not match the regex any more.
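A minimal Ruby sketch of that repeat-until-done idea. The constant name, the method name, and the exact HTML layout (no line breaks or indentation) are assumptions made for illustration:

```ruby
# Matches only an innermost quote: the body may not contain another "[quote".
INNER_QUOTE = /\[quote name="([^"]*)"\]((?:(?!\[quote)[\s\S])*?)\[\/quote\]/

# Hypothetical helper: converts nested [quote] tags to HTML, innermost first.
def quotes_to_html(text)
  result = text.dup
  loop do
    # gsub! returns nil when nothing was replaced, which ends the loop.
    replaced = result.gsub!(INNER_QUOTE) do
      "<blockquote><p>#{$2}</p><small>#{$1}</small></blockquote>"
    end
    break unless replaced
  end
  result
end

input = '[quote name="John"]Some text. [quote name="Piter"]Inner quote.[/quote][/quote]'
puts quotes_to_html(input)
```

Each pass converts one nesting level. The converted HTML no longer contains "[quote", so the enclosing level becomes an innermost quote on the next pass.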

Ruby Regular Expressions: Matching if substring doesn't exist

I'm having an issue trying to capture a group on a string:
"type=gist\nYou need to gist this though\nbecause its awesome\nright now\n</code></p>\n\n<script src=\"https://gist.github.com/3931634.js\"> </script>\n\n\n<p><code>Not code</code></p>\n"
My regex currently looks like this:
/<code>([\s\S]*)<\/code>/
My goal is to get everything in between the code brackets. Unfortunately, it's matching up to the 2nd closing code bracket. Is there a way to match everything inside the code brackets up until the first occurrence of the ending code bracket?
All repetition quantifiers in regular expressions are greedy by default (matching as many characters as possible). Make the * ungreedy, like this:
/<code>([\s\S]*?)<\/code>/
But please consider using a DOM parser instead. Regex is just not the right tool to parse HTML.
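The difference between the two quantifiers can be seen in a short Ruby sketch. The sample string is a shortened, made-up stand-in for the one in the question:

```ruby
# Hypothetical sample with two separate code elements.
html = "<p><code>first</code></p>\n<p><code>second</code></p>"

# Greedy: * grabs as much as possible, so it runs to the last </code>.
greedy = html[/<code>([\s\S]*)<\/code>/, 1]

# Lazy: *? grabs as little as possible, so it stops at the first </code>.
lazy = html[/<code>([\s\S]*?)<\/code>/, 1]

p greedy # "first</code></p>\n<p><code>second"
p lazy   # "first"
```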
And I just learned that for going through multiple parts, calling scan on the string (str here)
str.scan(/<code>(.*?)<\/code>/m) { # the m option lets . match newlines
  puts $1
}
is a very nice way of going through all occurrences of code - but yes, getting a proper parser is better...
