XPath selector for matching multiple classes [duplicate] - xpath

I've been searching for the past 30 minutes or so, but I can't seem to find an answer to how to create an XPath selector that will match multiple classes.
After reading this: How can I match on an attribute that contains a certain string?
The closest solution I can find is:
//div[contains(@class,'atag') and contains(@class,'btag')]
However, one of the comments suggests that it would also match:
<div class="Patagonia Halbtagsarbeit">
What XPath selector should I use to select a div with multiple classes?
Example:
<div class="fl badge bolded shadow">

I would suggest anchoring the XPath further up the tree so that it locates the div more specifically and cannot select other divs with the same classes instead. You can use Firebug's FirePath to get the absolute XPath.
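For the false positive raised in the question, a commonly used XPath 1.0 idiom is to pad the normalized class attribute with spaces and test for each class as a whole token. The sketch below is only an illustration: the sample markup and the helper names `has_classes` and `class_predicate` are hypothetical. It contrasts the substring test with a token test in plain Python and builds the matching XPath string.

```python
import xml.etree.ElementTree as ET

def has_classes(element, *wanted):
    """True if the element's class attribute holds every wanted token."""
    tokens = element.get("class", "").split()
    return all(name in tokens for name in wanted)

def class_predicate(*wanted):
    """Build an XPath 1.0 predicate that performs the same whole-token test."""
    padded = "concat(' ', normalize-space(@class), ' ')"
    return " and ".join(f"contains({padded}, ' {name} ')" for name in wanted)

# Hypothetical sample markup based on the divs quoted in the question.
doc = ET.fromstring(
    "<body>"
    '<div class="Patagonia Halbtagsarbeit">false positive</div>'
    '<div class="fl badge bolded shadow">other</div>'
    '<div class="btag x atag">wanted</div>'
    "</body>"
)

# Substring test, equivalent to contains(@class,'atag') and contains(@class,'btag').
substring_hits = [
    d.text for d in doc.iter("div")
    if "atag" in d.get("class", "") and "btag" in d.get("class", "")
]

# Whole-token test: only divs that really carry both classes.
token_hits = [d.text for d in doc.iter("div") if has_classes(d, "atag", "btag")]

# The same test as an XPath 1.0 expression, for engines that support functions.
xpath = f"//div[{class_predicate('atag', 'btag')}]"
print(substring_hits, token_hits, xpath, sep="\n")
```

Python's ElementTree does not evaluate XPath functions such as contains(), so the generated expression is meant for a full XPath 1.0 engine; the token test in Python only mirrors what it does.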

Related

How to make XPath text() query case insensitive? [duplicate]

I have a query and I'd like it to find any match on a page, regardless of whether the letters on the page are upper or lower case.
My query:
//*[contains(text(),'Deez')]
I've tried the solutions I've seen given to other similar questions but none have worked. My query uses text() which I've not seen in the other questions. Is that a problem?
With XPath 2.0 or greater, you can use upper-case():
//*[contains(upper-case(text()),'DEEZ')]
or lower-case():
//*[contains(lower-case(text()),'deez')]
or matches() with the case insensitive flag i (won't be the most efficient):
//*[matches(text(),'Deez', 'i')]
For XPath 1.0 and greater, you can use translate() to ensure that all the letters are upper or lower-case:
//*[contains(translate(text(), 'abcdefghijklmnopqrstuvwxyz', 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'),'DEEZ')]
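Typing both alphabets by hand is error-prone, so the translate() expression can be generated instead. The following is a minimal sketch, assuming ASCII letters only and a search term without apostrophes; `contains_ignore_case` and `translate_upper` are hypothetical helper names, and `translate_upper` is only a Python stand-in for what translate() does to the text.

```python
import string

LOWER = string.ascii_lowercase
UPPER = string.ascii_uppercase

def translate_upper(value):
    """Python stand-in for translate(value, 'a..z', 'A..Z') on ASCII letters."""
    return value.translate(str.maketrans(LOWER, UPPER))

def contains_ignore_case(node_expr, needle):
    """Build an XPath 1.0 test that ignores ASCII case.

    Assumes needle contains no apostrophe, because it is placed inside a
    single-quoted XPath string literal.
    """
    return (f"contains(translate({node_expr}, '{LOWER}', '{UPPER}'), "
            f"'{translate_upper(needle)}')")

# Expression for the query in the question.
query = f"//*[{contains_ignore_case('text()', 'Deez')}]"

# What the engine effectively compares once both sides are upper-cased.
matches = "DEEZ" in translate_upper("some dEeZ text")
print(query, matches)
```

Letters outside a-z are left untouched on both sides, so accented characters would still be compared case-sensitively.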

xpath normalize-space with contains [duplicate]

I have an xpath string //*[normalize-space() = "some sub text"]/text()/.. which works fine if the text I am finding is in a node which does not have multiple text sub nodes, but if it does then it won't work, so I am trying to combine it with contains() as follows: //*[contains(normalize-space(), "some sub text")]/text()/.. which does work, but it always returns the body and html tags as well as the p tag which contains the text. How can I change it so it only returns the p tag?
It depends exactly what you want to match.
The most likely scenario is that you want to match some text if it appears anywhere in the normalized string value of the element, possibly split across multiple text nodes at different levels: for example any of the following:
<p>some text</p>
<p>There was some text</p>
<p>There was <b>some text</b></p>
<p>There <b>was</b> some text</p>
<p>There was <b>some</b> <!--italic--> <i>text</i></p>
<p>There was <b>some</b> text</p>
If that's the case, then use //p[contains(normalize-space(.), "some text")].
As you point out, using //* with this predicate will also match ancestor elements of the relevant element. The simplest way to fix this is by using //p to say what element you are looking for. If you don't know what element you are looking for, then in XPath 3.0 you could use
innermost(//*[contains(normalize-space(.), "some text")])
but if you have the misfortune not to be using XPath 3.0, then you could do (//*[contains(normalize-space(.), "some text")])[last()], though this doesn't do quite the same thing if there are multiple paragraphs with the required content.
If you don't want to match all of the above, but want to be more selective, then you need to explain your requirements more clearly.
Either way, use of text() in a path expression is generally a code smell, except in the rare cases where you want to select text in an element only if it is not wrapped in other tags.
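To make the ancestor problem concrete, here is a rough Python approximation of the innermost() idea using only the standard library. It is a sketch under assumptions: the sample document is made up, `normalized_value` only approximates normalize-space(.), and `innermost_matches` is a hypothetical helper rather than an XPath engine feature.

```python
import xml.etree.ElementTree as ET

def normalized_value(element):
    """Approximate normalize-space(.): join descendant text, collapse whitespace."""
    return " ".join("".join(element.itertext()).split())

def innermost_matches(root, needle):
    """Elements whose normalized text contains needle while no child's text does."""
    return [
        el for el in root.iter()
        if needle in normalized_value(el)
        and not any(needle in normalized_value(child) for child in el)
    ]

# Hypothetical document with two of the paragraph shapes listed above.
html = ET.fromstring(
    "<html><body>"
    "<p>There was <b>some</b> <i>text</i></p>"
    "<p>There was <b>some text</b></p>"
    "<div>unrelated</div>"
    "</body></html>"
)

# Every element whose text contains the phrase, ancestors included.
all_tags = [el.tag for el in html.iter() if "some text" in normalized_value(el)]

# Only the deepest matching elements.
inner_tags = [el.tag for el in innermost_matches(html, "some text")]
print(all_tags, inner_tags)
```

Checking direct children is enough here, because any descendant that contains the phrase makes the child above it contain the phrase as well. Note that for the second paragraph the innermost match is the b element, not the p, which is why naming the element (//p) is the simpler fix when you know what you are looking for.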

In JSP, how to escape a string value for JavaScript first, then for HTML? [duplicate]

In a project using JSP and Spring, what would be the simplest way to escape an EL expression value for JavaScript first, then for HTML?
Imagine this use:
<a onclick='alert("${note}");'>Foo</a>
This is susceptible to XSS or plain syntax errors, as the variable value can contain quote, less-than and other characters.
What I came up with is:
<a onclick='alert("<c:out value="${null}"
><s:escapeBody htmlEscape="false" javaScriptEscape="true"
>${variable}</s:escapeBody></c:out>");'>Foo</a>
<!-- this escapes first the inner tags body for JS,
then c:out uses that (because its value attribute
is null, it uses what is in its own body) and
escapes it for HTML/XML -->
It is rather clumsy, so I'm looking for a more elegant way.
(note that using just <s:escapeBody htmlEscape="true" javaScriptEscape="true"> is incorrect as the tag escapes first for HTML and then for JS, so for example it would fail on the value of a"b)
To avoid getting clumsy code, I would suggest you create a small custom taglib which provides a function exactly for that. You can implement that very easily by using StringEscapeUtils in commons-lang.
Then you can write:
<a onclick='alert("${foo:escapeJsHtml(note)}");'>Foo</a>
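The order of operations such a function needs (JavaScript escaping first, HTML escaping second) can be illustrated outside JSP. The Python sketch below is not the Java taglib function itself; `escape_js_html` is a hypothetical name, and `json.dumps` is used here as a stand-in for a JavaScript string escaper. Unlike the JSP example, it emits the surrounding double quotes itself.

```python
import html
import json

def escape_js_html(value):
    """Escape for a JavaScript string literal first, then for an HTML attribute."""
    js_literal = json.dumps(value)              # quoted, backslash-escaped JS string
    return html.escape(js_literal, quote=True)  # then &, <, >, " and ' for HTML

# The problematic value mentioned in the question.
note = 'a"b'
escaped = escape_js_html(note)

# json.dumps already adds the quotes, so the template must not add its own.
link = f"<a onclick='alert({escaped});'>Foo</a>"
print(link)
```

When the browser decodes the attribute, it undoes the HTML layer and hands the JavaScript engine a valid string literal, which is the reverse of the order in which the escaping was applied.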

XPath on Wikipedia Summary

I'm currently trying to extract the blurb, or summary from any given Wikipedia page, using XPath. Now, there are many places online where this has already been done: http://jimblackler.net/blog/?p=13, How to use XPath or xgrep to find information in Wikipedia?.
But, when I try to use similar XPath expressions, on a variety of pages, the returned results are strange. For the sake of this question, let's assume I'm trying to retrieve the very first paragraph in the printable Wikipedia page on Boston: http://en.wikipedia.org/w/index.php?title=Boston&printable=yes.
When I try to use this expression /html/body/div[@id='content']/div[@id='bodyContent']//p, only the last four words of the paragraph, "in the United States.", are returned.
Actually, the expression used above could be simplified to //div/p, but the results are the same.
Strangely, the links I linked to previously seem to use similar methods and return great results; originally, I imagined this was due to Wikipedia changing the formatting of their pages in recent years, but honestly, I can't seem to find what's wrong with both the expressions.
Does anyone have any idea about this?
When I try to use this expression
/html/body/div[@id='content']/div[@id='bodyContent']//p, only the
last four words of the paragraph, "in the United States.", are
returned.
There are a few problems here:
1. The XML document is in a default namespace. Writing XPath expressions to select nodes in a document that is in a default namespace is the most frequently asked question about XPath; search for "XPath and default namespace". In short, any unprefixed name will most probably cause nothing to be selected. One must register the default namespace and associate a specific prefix with this namespace. Then any element name in the XPath expression must be written with this prefix. So, the expression above becomes:
/x:html/x:body/x:div[@id='content']/x:div[@id='bodyContent']//x:p
where the "x:" prefix is associated with the "http://www.w3.org/1999/xhtml" namespace.
2. Even the above expression doesn't select (only) the wanted node. In order to select only the first x:p from the above, the XPath expression must be specified as (note the brackets):
(/x:html/x:body/x:div[@id='content']/x:div[@id='bodyContent']//x:p)[1]
3. As you want the text of the paragraph, an easy way to do this is to use the standard XPath function string():
string((/x:html/x:body/x:div[@id='content']/x:div[@id='bodyContent']//x:p)[1])
When this XPath expression is evaluated, I get the text of the paragraph (for example, in the XPath Visualizer I wrote some years ago).
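The three points above can be tried with Python's standard library, whose ElementTree module supports a subset of XPath with prefix-to-namespace mappings. This is a minimal sketch, assuming a small made-up XHTML fragment in place of the real Wikipedia page; the prefix "x" is arbitrary.

```python
import xml.etree.ElementTree as ET

# Bind the prefix "x" to the XHTML namespace used by the document.
NS = {"x": "http://www.w3.org/1999/xhtml"}

# Hypothetical stand-in for the printable Wikipedia page.
page = ET.fromstring(
    '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
    '<div id="content"><div id="bodyContent">'
    '<p><b>Boston</b> is a city <a href="#">in the United States</a>.</p>'
    "<p>Second paragraph.</p>"
    "</div></div></body></html>"
)

# Unprefixed names select nothing, because every element is in the namespace.
unprefixed = page.findall("body/div[@id='content']/div[@id='bodyContent']//p")

# Prefixed names match; find() returns only the first p in document order.
first_p = page.find(
    "x:body/x:div[@id='content']/x:div[@id='bodyContent']//x:p", NS
)

# Rough equivalent of string(): concatenate all descendant text nodes.
summary = "".join(first_p.itertext())
print(len(unprefixed), summary)
```

The path is relative to the root html element, which is why it starts at x:body rather than /x:html.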

Trouble using Xpath "starts with" to parse xhtml

I'm trying to parse a webpage to get posts from a forum.
Each message starts with the following format:
<div id="post_message_somenumber">
and I only want to get the first one.
I tried xpath='//div[starts-with(@id, '"post_message_')]' in YQL without success.
I'm still learning this; does anyone have suggestions?
I think I have a solution that does not require dealing with namespaces.
Here is one that selects all matching div's:
//div[@id[starts-with(.,"post_message")]]
But you said you wanted just the "first one" (I assume you mean the first "hit" in the whole page?). Here is a slight modification that selects just the first matching result:
(//div[@id[starts-with(.,"post_message")]])[1]
These use the dot to represent the id's value within the starts-with() function. You may have to escape special characters in your language.
It works great for me in PowerShell:
# Load a sample xml document
$xml = [xml]'<root><div id="post_message_somenumber"/><div id="not_post_message"/><div id="post_message_somenumber2"/></root>'
# Run the xpath selection of all matching div's
$xml.selectnodes('//div[@id[starts-with(.,"post_message")]]')
Result:
id
--
post_message_somenumber
post_message_somenumber2
Or, for just the first match:
# Run the xpath selection of the first matching div
$xml.selectnodes('(//div[@id[starts-with(.,"post_message")]])[1]')
Result:
id
--
post_message_somenumber
I tried xpath='//div[starts-with(@id, '"post_message_')]' in yql without success I'm still learning this, anyone have suggestions
If the problem isn't due to the many nested apostrophes and the unclosed double-quote, then the most likely cause (we can only guess without being shown the XML document) is that a default namespace is used.
Specifying names of elements that are in a default namespace is the most frequently asked question in XPath. If you search for "XPath default namespace" on SO or on the internet, you'll find many sources with the correct solution.
Generally, a special method must be called that binds a prefix (say "x:") to the default namespace. Then, in the XPath expression, every element name "someName" must be replaced by "x:someName".
Here is a good answer on how to do this in C#.
Read the documentation of your language or XPath engine to see how something similar should be done in your specific environment.
@FindBy(xpath = "//div[starts-with(@id,'expiredUserDetails') and contains(text(), 'Details')]")
private WebElementFacade ListOfExpiredUsersDetails;
This one matches every div on the page whose id starts with expiredUserDetails and whose text contains Details.

Resources