How to get html elements with multiple css classes - html-agility-pack

I know how to get a list of DIVs with the same CSS class, e.g.
<div class="class1">1</div>
<div class="class1">2</div>
using the XPath //div[@class='class1']
But what if a div has multiple classes, e.g.
<div class="class1 class2">1</div>
What would the XPath look like then?

The expression you're looking for is:
//div[contains(@class, 'class1') and contains(@class, 'class2')]
I highly recommend XPath Visualizer, which can help you debug XPath expressions easily. It can be found here:
http://xpathvisualizer.codeplex.com/

According to this answer, which explains why you must avoid matching class names that merely contain the one you are looking for as a substring, the correct expression is:
//div[contains(concat(' ', normalize-space(@class), ' '), ' class1 ')
and contains(concat(' ', normalize-space(@class), ' '), ' class2 ')]
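To see the difference between the two answers, here is a minimal Python sketch that emulates both predicates by hand with the standard library's ElementTree (which has no contains() or concat() of its own). The sample markup, including the "class10" trap, is hypothetical, and `" ".join(s.split())` only roughly approximates normalize-space().

```python
import xml.etree.ElementTree as ET

# Hypothetical sample markup; must be well-formed XML for ElementTree.
HTML = """<root>
  <div class="class1 class2">both</div>
  <div class="class10 class2">substring trap</div>
  <div class="class1">only class1</div>
</root>"""


def naive_match(cls, name):
    # Mirrors XPath contains(@class, 'name'): a plain substring test.
    return name in cls


def token_match(cls, name):
    # Mirrors contains(concat(' ', normalize-space(@class), ' '), ' name '):
    # collapse whitespace, pad with spaces, then look for ' name '.
    padded = " " + " ".join(cls.split()) + " "
    return (" " + name + " ") in padded


def select(root, matcher, names):
    # Text of every div whose class attribute matches all of the names.
    return [
        div.text
        for div in root.iter("div")
        if all(matcher(div.get("class", ""), n) for n in names)
    ]


root = ET.fromstring(HTML)
print(select(root, naive_match, ["class1", "class2"]))  # ['both', 'substring trap']
print(select(root, token_match, ["class1", "class2"]))  # ['both']
```

The substring version wrongly accepts "class10 class2"; padding with spaces forces whole-token matches.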

There's a useful Python package called cssselect, which lxml exposes through lxml.cssselect:
from lxml.cssselect import CSSSelector
CSSSelector('div.gallery').path
Generates a usable XPath:
descendant-or-self::div[@class and contains(concat(' ', normalize-space(@class), ' '), ' gallery ')]
It's very similar to Flynn1179's answer.

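I think the expression you're looking for is
//div[starts-with(@class, "class1")]/text()
(Note that this only works when class1 is the first class listed, and it also matches values such as "class10".)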

With an XPath 3.1 processor you could also use contains-token() (it is not available in XPath 1.0):
//div[contains-token(@class, 'class_one') and contains-token(@class, 'class_two')]
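fn:contains-token() splits its input on whitespace and compares whole tokens, with the search token trimmed. As a rough, hypothetical Python analogue for a single attribute string (not an implementation of the full XPath function, which also accepts sequences and collations):

```python
def contains_token(value, token):
    # Rough analogue of XPath 3.1 fn:contains-token() for one string:
    # split the attribute on whitespace and require an exact token match.
    # str.split() never yields empty strings, so an empty token is False.
    return token.strip() in value.split()


def has_all_classes(class_attr, *names):
    # Mirrors contains-token(@class, 'a') and contains-token(@class, 'b').
    return all(contains_token(class_attr, n) for n in names)


print(has_all_classes("class_one class_two", "class_one", "class_two"))    # True
print(has_all_classes("class_one_x class_two", "class_one", "class_two"))  # False
```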

Related

XPath V1.0 contains() not specific enough

I have an application that requires me to find an XPath selector for an element and then see if that XPath can be simplified.
So if I have
<a class="abc def gh">
I may determine that the XPath
a[contains(@class, "abc")]
is specific enough. The problem is, it also selects items with class "abcxyz".
Is there a way to select items with ONLY class "abc"?
To be clear, I want to find items that have a class of "abc" or "abc def" but not "abcxyz".
Here's a more specific example because I believe neither of the answers so far works:
<div>
<span id="x" class="btnSalePriceLabel">Sale:</span>
<span id="y" class="btnSalePrice highlight">$20.40</span>
</div>
I want whatever XPath selector will select the 2nd span and not the first.
If I try
//span[@class and contains(concat(' ', normalize-space(@class), ' '), ' btnSalesPrice ')]
I get nothing selected. Likewise with
//span[contains(concat(' ', normalize-space(@class), ' '), ' btnSalesPrice ')]
Since the class attribute is multi-valued, you have to account for the spaces between the values with concat():
//a[contains(concat(' ', normalize-space(@class), ' '), ' abc ')]
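Applied to the span example from the question, the padded-token test picks out only the second span. Note that the markup uses the class btnSalePrice, while the queries in the question search for btnSalesPrice; a whole-token test only matches the exact spelling. A minimal Python sketch, emulating the XPath predicate by hand with the standard library's ElementTree (the helper name `has_class` is hypothetical):

```python
import xml.etree.ElementTree as ET

HTML = """<div>
<span id="x" class="btnSalePriceLabel">Sale:</span>
<span id="y" class="btnSalePrice highlight">$20.40</span>
</div>"""


def has_class(el, name):
    # Same test as contains(concat(' ', normalize-space(@class), ' '), ' name ').
    padded = " " + " ".join(el.get("class", "").split()) + " "
    return f" {name} " in padded


root = ET.fromstring(HTML)
hits = [s.get("id") for s in root.iter("span") if has_class(s, "btnSalePrice")]
misspelled = [s.get("id") for s in root.iter("span") if has_class(s, "btnSalesPrice")]
print(hits)        # ['y']
print(misspelled)  # []
```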
Note that CSS selectors have this ability to match specific class values built-in:
a.abc
I think you can see what is more concise and readable.
It is better to use CSS for these exact matches, especially with class attributes, in which case it would be:
a.abc
There are CSS-to-XPath converters in several languages (check this one for JavaScript, for example), and its translation would be:
descendant-or-self::a[@class and contains(concat(' ', normalize-space(@class), ' '), ' abc ')]

xpath search for class and text - combine two xpath selector

I know there are thousands of simple questions about XPath, but I don't understand how to combine two not-so-simple expressions.
My XML structure:
<div class="some-container">
<div class="btn btn-blue">
<div class="btn-text"><!-- Select by class -->
<span> <!-- Select by text-->
Download
</span>
</div>
</div>
</div>
Select by class
I managed to select the div by searching for the class:
//*/div[contains(concat(' ', @class, ' '), ' btn-text')]
Select by text
I know I can simply add /span to select the span as well, but I want to select it by its text.
For that use case I got this XPath (from here):
//*/text()[normalize-space(.)='Download']/parent::*
Both selectors work on their own, but I want to combine them:
Search for class "btn"
Search for text inside span that exactly matches
I tried to concatenate them like this, but it doesn't work (test example):
//div[contains(concat(' ', @class, ' '), ' btn-text')]/text()[normalize-space(.)='Download']/parent::*
Even if it worked, it would not select by the span tag.
Anyone who could help?
Your XPath is looking for a text node directly inside the div, but the text node you're looking for is inside a span. That's why it's not succeeding.
To get it to work, just change the XPath to look for the span and not the text node:
//div[contains(concat(' ', @class, ' '), ' btn-text ')]/span[normalize-space(.)='Download']
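The combined expression filters twice: first the div by class token, then its direct span children by normalized text. A minimal Python sketch of the same two steps with the standard library's ElementTree, which lacks these XPath functions, so both tests are written by hand (the helpers `has_class` and `normalize_space` are hypothetical; the markup is the question's, minus the HTML comments):

```python
import xml.etree.ElementTree as ET

HTML = """<div class="some-container">
<div class="btn btn-blue">
<div class="btn-text">
<span>
Download
</span>
</div>
</div>
</div>"""


def has_class(el, name):
    # contains(concat(' ', @class, ' '), ' name ') with whitespace collapsed.
    padded = " " + " ".join(el.get("class", "").split()) + " "
    return f" {name} " in padded


def normalize_space(text):
    # Rough analogue of normalize-space(): trim and collapse inner whitespace.
    return " ".join((text or "").split())


root = ET.fromstring(HTML)
matches = [
    span
    for div in root.iter("div")
    if has_class(div, "btn-text")
    for span in div.findall("span")  # direct span children, like /span
    if normalize_space("".join(span.itertext())) == "Download"
]
print(len(matches))  # 1
```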

Use xpath or xquery to show text in title attribute

I'd like to use XQuery (I believe) to output the text from the title attribute of an HTML element.
Example:
<div class="rating" title="1.0 stars">...</div>
I can use XPath to select the element, but it outputs the content between the div tags. I think I need XQuery to output the "1.0 stars" text from the title attribute.
There's got to be a way to do this. My Google skills are proving ineffective in finding an answer.
Thanks.
XPath: //div[@class='rating']/@title
This will give you the title text for every div with a class of "rating".
Addendum (following from comments below):
If the class attribute contains other class names in addition to "rating", you can use something like this:
//div[contains(concat(' ', normalize-space(@class), ' '), ' rating ')]/@title
(Hat tip to How can I match on an attribute that contains a certain string?).
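For a quick check outside XPath tooling, here is a minimal Python sketch using only the standard library's xml.etree.ElementTree. It assumes the markup is well-formed XML, and the second div is hypothetical sample data. ElementTree's limited XPath handles the exact-match form, but it has no contains() or concat(), so the multi-class case splits the class attribute into tokens in Python.

```python
import xml.etree.ElementTree as ET

HTML = """<root>
<div class="rating" title="1.0 stars">...</div>
<div class="rating big" title="4.5 stars">...</div>
</root>"""

root = ET.fromstring(HTML)

# Exact match, like //div[@class='rating']/@title.
exact = [div.get("title") for div in root.findall(".//div[@class='rating']")]

# Token match, like the concat()/normalize-space() version: "rating" must be
# one of the whitespace-separated class names.
token = [
    div.get("title")
    for div in root.iter("div")
    if "rating" in div.get("class", "").split()
]
print(exact)  # ['1.0 stars']
print(token)  # ['1.0 stars', '4.5 stars']
```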
You should use:
let $XML := <p><div class="rating" title="2.0 stars">sdfd</div><div class="rating" title="1.0 stars">sdfd</div></p>
for $title in $XML//@title
return
<p>{data($title)}</p>
to get output:
<p>2.0 stars</p>
<p>1.0 stars</p>

XPath Expressions - find elements without specifying the entire path

I am doing some screen scraping with a library that takes XPath expressions and noticed that several pages are similar, but different.
Is there a way to loosely say "get me divs that have class='mytarget' but exist as a child of a div with class = 'nav' and the exact path is unknown between nav and mytarget."
<div class="nav">
<div>
??????
<div class="mytarget"></div>
??????
</div>
</div>
Yes, using the descendant-or-self axis (//):
//div[@class='nav']//div[@class='mytarget']
Or, if there can be more than one class name on those elements, then this is even better:
//div[contains(concat(' ', @class, ' '), ' nav ')]//div[contains(concat(' ', @class, ' '), ' mytarget ')]
Warning: this can be very inefficient on large documents. You should use absolute paths wherever the structure is known. Only resort to // when the structure is unknown.
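Python's standard-library ElementTree understands this particular form, including a `//` in the middle of the path, so the behaviour can be checked directly. A minimal sketch with hypothetical markup, where one target is nested an unknown number of levels below nav and one lies outside it:

```python
import xml.etree.ElementTree as ET

HTML = """<root>
<div class="nav">
  <div>
    <section><div class="mytarget">deep</div></section>
    <div class="mytarget">shallow</div>
  </div>
</div>
<div class="mytarget">outside nav</div>
</root>"""

root = ET.fromstring(HTML)
# Each "//" means "at any depth below", as in
# //div[@class='nav']//div[@class='mytarget'].
found = root.findall(".//div[@class='nav']//div[@class='mytarget']")
print([d.text for d in found])  # ['deep', 'shallow']
```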
That's what "//" expressions are for. Something like:
//*[@class="nav"]//*[@class="mytarget"]
http://www.w3schools.com/xpath/xpath_syntax.asp

XPath to find elements that do not have an id or class

How can I get all tr elements without id attribute?
<tr id="name">...</tr>
<tr>...</tr>
<tr>...</tr>
Thanks
Pretty straightforward:
//tr[not(@id) and not(@class)]
That will give you all tr elements lacking both id and class attributes. If you want all tr elements lacking at least one of the two, use or instead of and:
//tr[not(@id) or not(@class)]
When an attribute or element test is used as a boolean in this way, it evaluates to true if the node exists (even if its value is empty) and to false if it is missing.
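ElementTree in the Python standard library has no not(), so this sketch spells out the same attribute-presence tests on `attrib` by hand. The table rows are hypothetical; the last row carries id="" to show that an empty attribute still counts as present.

```python
import xml.etree.ElementTree as ET

HTML = """<table>
<tr id="name"><td>a</td></tr>
<tr><td>b</td></tr>
<tr class="odd"><td>c</td></tr>
<tr id="" class="x"><td>d</td></tr>
</table>"""

root = ET.fromstring(HTML)
rows = list(root.iter("tr"))

# not(@id) and not(@class): neither attribute is present.
neither = [r.findtext("td") for r in rows
           if "id" not in r.attrib and "class" not in r.attrib]

# not(@id) or not(@class): at least one of the two is missing.
either = [r.findtext("td") for r in rows
          if "id" not in r.attrib or "class" not in r.attrib]

print(neither)  # ['b']
print(either)   # ['a', 'b', 'c']
```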
If you're looking for an element that has class a but doesn't have class b, you can do the following.
//*[contains(@class, 'a') and not(contains(@class, 'b'))]
Or, if you want to be sure not to match partial class names:
//*[contains(concat(' ', normalize-space(@class), ' '), ' some-class ') and
not(contains(concat(' ', normalize-space(@class), ' '), ' another-class '))]
Can you try //tr[not(@id)]?
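The token-based version can be sanity-checked with a short Python sketch that uses the standard library's ElementTree and compares whitespace-separated class names by hand. The markup is hypothetical and includes near-miss names ("some-classic", "another-class-x") that a substring test would mishandle.

```python
import xml.etree.ElementTree as ET

HTML = """<root>
<p class="some-class">keep</p>
<p class="some-class another-class">drop</p>
<p class="some-classic">not a match</p>
<p class="some-class another-class-x">keep too</p>
</root>"""


def classes(el):
    # Whitespace-separated tokens, as in concat(' ', normalize-space(@class), ' ').
    return el.get("class", "").split()


root = ET.fromstring(HTML)
picked = [
    el.text
    for el in root.iter()
    if "some-class" in classes(el) and "another-class" not in classes(el)
]
print(picked)  # ['keep', 'keep too']
```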
