XPATH Search and replace full words - xpath

Strangely, I can't find this information online. Does anyone know how to search for a full word and replace it with something else?
For example:
<div class="cell_user" alt="hotel Review - Young couple">
I want to remove: "hotel Review - " so I am left with:
"Young couple"
It seems to be something to do with fn:replace but I cannot find a single example anywhere of someone using it!
Thanks for any advice

If you have a constant prefix or infix string as in your example you can always use a string function such as substring-after():
"substring-after(//div[@class = 'cell_user']/@alt, 'hotel Review - ')"
You may want to tweak the beginning of the XPath expression a little bit to be more selective depending on your context. Also see xsl substring-after usage.
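For anyone doing the same thing from Python, here is a minimal sketch of the idea. It assumes the standard library's xml.etree.ElementTree, whose XPath subset handles the [@class='...'] predicate but has no substring-after() function, so the string step is done in plain Python. The SAMPLE document and the substring_after helper are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document built around the div from the question.
SAMPLE = '<root><div class="cell_user" alt="hotel Review - Young couple"/></root>'


def substring_after(text, separator):
    """Mimic XPath substring-after(): the text after the first separator, or '' if absent."""
    _, found, rest = text.partition(separator)
    return rest if found else ""


root = ET.fromstring(SAMPLE)
# ElementTree understands simple attribute predicates such as [@class='...'].
div = root.find(".//div[@class='cell_user']")
alt = div.get("alt")
cleaned = substring_after(alt, "hotel Review - ")
print(cleaned)
```

The helper takes any separator, so a shorter one works the same way as long as it occurs only once in the attribute value.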
If you know that the relevant part is always after a hyphen and there is always only a single hyphen in your string you may even write
"substring-after(//div[@class = 'cell_user']/@alt, '- ')"

Related

Weird thing in regex

While practising on rubular.com, I was trying to write a regular expression that checks whether a word starts with a non-consonant. My approach is to check whether the word begins with a vowel or a non-letter (including a digit or an underscore), or is the empty string.
I've found a strange behaviour:
My regex /^[aeiou_0-9\W]|^$/i matches the consonants k and s! I don't understand why.
Any ideas?
A link to example -> http://rubular.com/r/0zt0VPmcwr
This is very funny because you have stumbled across a bug specifically for just the letters k and s when using \W with /i (it's like a perfect storm).
Here is the link that explains the bug: https://bugs.ruby-lang.org/issues/4044
Perhaps this was patched in a later version of ruby, but if you don't feel like going through the hassle of going to a new version of ruby, then you can just explicitly make an inverted character class of all the consonants:
/^[^bcdfghjklmnpqrstvwxyz]|^$/i
Here is the rubular link: http://rubular.com/r/URgsWP3suQ
Edit:
So, something else I noticed is that your regex (and the first regex I provided above) matches only the first character of each word, whereas the regex below matches the whole word. I don't know if this makes a difference for you, but I felt it was worth pointing out. Compare the highlighting in the rubular link above with the one below: the link above highlights only the first character of each word, whereas the link below highlights whole words:
^[^bcdfghjklmnpqrstvwxyz].*|^$
http://rubular.com/r/IVJ03uOK4h
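The same two patterns can be tried outside rubular. This is only a sketch in Python's re module (not Ruby, so it says nothing about the Ruby bug itself); the helper names and the word list are made up for illustration. It shows the explicit consonant class and the difference between matching the first character and matching the whole word.

```python
import re

# Explicit "not a consonant" class, as in the workaround above.
FIRST_CHAR = re.compile(r"^[^bcdfghjklmnpqrstvwxyz]|^$", re.IGNORECASE)
# Same test, but the match extends over the rest of the word.
WHOLE_WORD = re.compile(r"^[^bcdfghjklmnpqrstvwxyz].*|^$", re.IGNORECASE)


def starts_with_non_consonant(word):
    """True if the word is empty or its first character is not a consonant."""
    return FIRST_CHAR.search(word) is not None


def matched_text(pattern, word):
    """Return the text the pattern matched, or None if there was no match."""
    match = pattern.search(word)
    return match.group(0) if match else None


for word in ["apple", "kiwi", "Sun", "_tmp", "9lives", ""]:
    print(repr(word), starts_with_non_consonant(word), repr(matched_text(WHOLE_WORD, word)))
```

If you only need a yes/no answer, the first pattern is enough; the second one matters when you want the matched text itself.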
It is a bug in Ruby regex in some versions. Select version 1.8.7 in the dropdown and you will see your regex works properly.
Edit. Check the docs at http://ruby-doc.org/core-2.1.5/Regexp.html. More specifically, in the metacharacters section:
/\W/ - A non-word character ([^a-zA-Z0-9_]). Please take a look at Bug #4044 if using /\W/ with the /i modifier.

Verify string in MVC validator using regularexpressions

I am trying to grasp the concept of Regular Expressions but seem to be missing something.
I want to ensure that someone enters a string that ends with .wav in a field. Should be a pretty simple Regular Expression.
I've tried this...
[RegularExpression(@"$.wav")]
but it seems to be incorrect. Any help is appreciated. Thanks!
$ is the anchor for the end of the string, so $.wav doesn't make any sense. You can't have any characters after the end of the string. Also, . has a special meaning for regex (it just means 'any character') so you need to escape it.
Try writing
\.wav$
If that doesn't work, try
.*\.wav$
(It depends on if the RegularExpression attribute wants to match the whole string, or just a part of it. .* means 'any character, 0 or more times')
Another thing you should consider is what to do with extra whitespace in the field. Users have a terrible habit of adding extra whitespace in inputs; it's why the various .Trim() functions are so important. Here, RegularExpressionAttribute might be evaluated before you can trim the input, so you might want to write this:
.*\.wav[\s]*$
The [\s]* section means 'any whitespace character (tabs, space, linebreak, etc) 0 or more times'.
You should read a tutorial on regex. It's not so hard to understand for simple problems like this. When I was learning I found this site pretty handy: http://www.regular-expressions.info/
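To see why the two variants above behave differently, here is a rough sketch using Python's re module rather than .NET: re.search stands in for "match part of the string" and re.fullmatch for "match the whole string". Which of the two the RegularExpression attribute corresponds to is not something this sketch settles. The constant and helper names are made up for illustration.

```python
import re

BROKEN = r"$.wav"           # the original attempt: a character after the end anchor
SUFFIX = r"\.wav$"          # literal ".wav" at the end
WHOLE = r".*\.wav$"         # anything, then literal ".wav" at the end
WHOLE_WS = r".*\.wav[\s]*$" # same, tolerating trailing whitespace


def matches_part(pattern, text):
    """Partial match: the pattern may match anywhere in the text."""
    return re.search(pattern, text) is not None


def matches_whole(pattern, text):
    """Whole-string match: the pattern must cover the entire text."""
    return re.fullmatch(pattern, text) is not None


for name in ["song.wav", "song.wav  ", "song.wave"]:
    print(repr(name),
          matches_part(BROKEN, name),
          matches_part(SUFFIX, name),
          matches_whole(SUFFIX, name),
          matches_whole(WHOLE, name),
          matches_whole(WHOLE_WS, name))
```

The point is that \.wav$ is enough when partial matches count, while a validator that insists on covering the whole input needs the leading .* as well.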

Ruby Regular Expressions: Matching if substring doesn't exist

I'm having an issue trying to capture a group on a string:
"type=gist\nYou need to gist this though\nbecause its awesome\nright now\n</code></p>\n\n<script src=\"https://gist.github.com/3931634.js\"> </script>\n\n\n<p><code>Not code</code></p>\n"
My regex currently looks like this:
/<code>([\s\S]*)<\/code>/
My goal is to get everything in between the code brackets. Unfortunately, it's matching up to the 2nd closing code bracket. Is there a way to match everything inside the code brackets up until the first occurrence of the ending code bracket?
All repetition quantifiers in regular expressions are greedy by default (matching as many characters as possible). Make the * ungreedy, like this:
/<code>([\s\S]*?)<\/code>/
But please consider using a DOM parser instead. Regex is just not the right tool to parse HTML.
And I just learned that for going through multiple parts,
string.scan(/<code>(.*?)<\/code>/m) {
  puts $1
}
is a very nice way of going through all occurrences of code (string is your input here, and the /m flag lets . match newlines too) - but yes, getting a proper parser is better...
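If it helps to see greedy and ungreedy side by side, here is a small sketch in Python's re module (the quantifier semantics are the same as in Ruby for this case). The HTML string is a shortened, made-up stand-in for the one in the question, and re.findall plays the role of scan.

```python
import re

# Hypothetical input with two separate <code> blocks.
HTML = "<p><code>first</code></p>\n<script></script>\n<p><code>Not code</code></p>\n"

# Greedy: * grabs as much as possible, so it runs to the LAST </code>.
greedy = re.search(r"<code>([\s\S]*)</code>", HTML).group(1)

# Ungreedy: *? grabs as little as possible, so it stops at the FIRST </code>.
lazy = re.search(r"<code>([\s\S]*?)</code>", HTML).group(1)

# All blocks, one capture per <code>...</code> pair.
all_blocks = re.findall(r"<code>([\s\S]*?)</code>", HTML)

print(repr(greedy))
print(repr(lazy))
print(all_blocks)
```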

Trouble using Xpath "starts with" to parse xhtml

I'm trying to parse a webpage to get posts from a forum.
The start of each message starts with the following format
<div id="post_message_somenumber">
and I only want to get the first one.
I tried xpath='//div[starts-with(@id, '"post_message_')]' in yql without success.
I'm still learning this. Does anyone have suggestions?
I think I have a solution that does not require dealing with namespaces.
Here is one that selects all matching div's:
//div[@id[starts-with(.,"post_message")]]
But you said you wanted just the "first one" (I assume you mean the first "hit" in the whole page?). Here is a slight modification that selects just the first matching result:
(//div[@id[starts-with(.,"post_message")]])[1]
These use the dot to represent the id's value within the starts-with() function. You may have to escape special characters in your language.
It works great for me in PowerShell:
# Load a sample xml document
$xml = [xml]'<root><div id="post_message_somenumber"/><div id="not_post_message"/><div id="post_message_somenumber2"/></root>'
# Run the xpath selection of all matching div's
$xml.selectnodes('//div[@id[starts-with(.,"post_message")]]')
Result:
id
--
post_message_somenumber
post_message_somenumber2
Or, for just the first match:
# Run the xpath selection of the first matching div
$xml.selectnodes('(//div[@id[starts-with(.,"post_message")]])[1]')
Result:
id
--
post_message_somenumber
I tried xpath='//div[starts-with(@id, '"post_message_')]' in yql without success I'm still learning this, anyone have suggestions
If the problem isn't due to the many nested apostrophes and the unclosed double-quote, then the most likely cause (we can only guess without being shown the XML document) is that a default namespace is used.
Specifying the names of elements that are in a default namespace is the most frequently asked question about XPath. If you search for "XPath default namespace" on SO or on the internet, you'll find many sources with the correct solution.
Generally, a special method must be called that binds a prefix (say "x:") to the default namespace. Then, in the XPath expression, every element name "someName" must be replaced by "x:someName".
Here is a good answer how to do this in C#.
Read the documentation of your language/xpath-engine how something similar should be done in your specific environment.
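As one concrete illustration of the prefix binding, here is a sketch in Python with the standard library's xml.etree.ElementTree. It assumes the page is XHTML with the usual default namespace; the PAGE string is made up. ElementTree's XPath subset has no starts-with(), so that part is done with str.startswith after selecting the divs.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical page: every element is in the default XHTML namespace.
PAGE = (
    '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
    '<div id="post_message_1">first</div>'
    '<div id="not_post_message">skip</div>'
    '<div id="post_message_2">second</div>'
    '</body></html>'
)

root = ET.fromstring(PAGE)

# Without a prefix, "div" means "div in no namespace", so nothing is found.
unprefixed = root.findall(".//div")

# Bind the prefix "x" to the default namespace and use x:div instead.
namespaces = {"x": XHTML_NS}
divs = root.findall(".//x:div", namespaces)

# Filter on the id attribute in Python, then take the first hit.
posts = [d for d in divs if d.get("id", "").startswith("post_message_")]
first = posts[0] if posts else None

print(len(unprefixed), [d.get("id") for d in posts], first.text)
```

The prefix itself is arbitrary; only the namespace URI it is bound to has to match the document.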
@FindBy(xpath = "//div[starts-with(@id,'expiredUserDetails') and contains(text(), 'Details')]")
private WebElementFacade ListOfExpiredUsersDetails;
This one matches every div on the page whose id starts with expiredUserDetails and whose text contains Details.

regex selection

I have a string like this.
<p class='link'>try</p>bla bla</p>
I want to get only <p class='link'>try</p>
I have tried this.
/<p class='link'>[^<\/p>]+<\/p>/
But it doesn't work.
How can I do this?
Thanks,
If that is your string, and you want the text between those p tags, then this should work...
/<p\sclass='link'>(.*?)<\/p>/
The reason yours is not working is that you put <\/p> inside a negated character class. It is not matched literally as a sequence; instead, each character (<, /, p and >) is excluded individually.
Of course, it is mandatory that I mention there are better tools for parsing HTML fragments (such as an HTML parser).
'/<p[^>]+>([^<]+)<\/p>/'
will get you "try"
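It looks like you used this block: [^<\/p>]+ intending to match anything except for </p>. Unfortunately, that's not what it does. A negated [^...] block matches any single character that is not one of those listed, so here it stops at the first <, /, p or >. With your sample, the /<p class='link'>[^<\/p>]+ part matches <p class='link'>try and the following <\/p> can still match, but as soon as the text between the tags contains one of the excluded characters (the letter p, for instance), the block stops early, the expected </p> does not follow, and there is no match.
Alex's solution, to use a non-greedy quantifier, is how I tend to approach this sort of problem.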
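Here is a quick way to check how the negated class and the non-greedy version behave, sketched with Python's re module (the slash needs no escaping there, so it is written as a plain /). The second input string is a made-up variation with the letter p between the tags.

```python
import re

# The original idea: a negated class listing the characters <, /, p and >.
NEGATED_CLASS = re.compile(r"<p class='link'>[^</p>]+</p>")
# The non-greedy alternative: stop at the first closing </p>.
LAZY = re.compile(r"<p\sclass='link'>(.*?)</p>")

SHORT = "<p class='link'>try</p>bla bla</p>"    # string from the question
WITH_P = "<p class='link'>help</p>bla bla</p>"  # hypothetical: contains the letter p

for text in (SHORT, WITH_P):
    negated = NEGATED_CLASS.search(text)
    lazy = LAZY.search(text)
    print(negated.group(0) if negated else None, "|", lazy.group(1))
```

In this sketch the negated class copes with "try" only because that word contains none of the excluded characters; the non-greedy pattern does not care what letters sit between the tags.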
I tried to make one less specific to any particular tag.
(<[^/]+?\s+[^>]*>[^>]*>)
this returns:
<p class='link'>try</p>

Resources