Given any xpath query, can we say that there exists some CSS selector that will match the same elements?
As pointed out in the comments, the answer is an emphatic no.
The simplest proof is by counterexample, and the counterexample is a simple parent step: ./.. Since the parent of an arbitrary node in an XML document can be retrieved by XPath, but no parent selector exists in CSS, it is trivial to say that not every XPath expression has an equivalent CSS selector.
QED. Ipso facto. Lorum Ipsum.
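To make the counterexample concrete, here is a minimal sketch in Python. It assumes the standard library's xml.etree.ElementTree, which implements only a small subset of XPath, but that subset includes the .. parent step. The sample markup is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, not taken from the question.
root = ET.fromstring(
    "<body>"
    "<p>plain paragraph</p>"
    "<p>has <strong>emphasis</strong></p>"
    "<div><strong>also emphasis</strong></div>"
    "</body>"
)

# './/strong/..' walks down to every strong element, then one step back up.
parents = root.findall(".//strong/..")
parent_tags = [el.tag for el in parents]
print(parent_tags)  # ['p', 'div']
```

The path selects elements based on what they contain, which is exactly the direction the answer says a CSS selector cannot travel.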
Related
With XPath, how would you search for elements that only contain another specific element? For example, what expression would result in getting all <p> tags that contain <strong> elements within them?
<p>This is some text that <strong>contains another HTML element</strong></p>
In XPath you filter with square brackets. The bracketed expression is called a predicate.
To select all p elements that have a strong child element, use
//p[strong]
If you want to find all p elements that contain only strong elements and no other elements, add a second predicate:
//p[strong][count(*)=count(strong)]
The * stands for any element.
If, as in your example, you are only interested in p elements where the strong element is the last child node, use
//p[strong[not(following-sibling::node())]]
Predicates are the way to go.
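As a rough illustration of what the three predicates select, here is a sketch in Python. It assumes the standard library's xml.etree.ElementTree, which understands //p[strong] but not count() or the following-sibling axis, so the last two conditions are re-expressed as plain Python checks. The sample paragraphs and their id values are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample markup; the ids only exist to label the results.
root = ET.fromstring(
    "<body>"
    "<p id='a'>text <strong>bold</strong></p>"
    "<p id='b'><strong>bold</strong> then <em>more</em></p>"
    "<p id='c'>no markup</p>"
    "<p id='d'><strong>bold</strong> trailing text</p>"
    "</body>"
)

# //p[strong]: ElementTree supports this child-element predicate directly.
with_strong = root.findall(".//p[strong]")

# [count(*)=count(strong)]: every child element must be a strong.
only_strong = [p for p in with_strong
               if all(child.tag == "strong" for child in p)]

# strong[not(following-sibling::node())]: the last child element is a strong
# and no text node follows it (ElementTree stores that text in .tail).
strong_last = [p for p in with_strong
               if p[-1].tag == "strong" and not p[-1].tail]

ids_with_strong = [p.get("id") for p in with_strong]   # ['a', 'b', 'd']
ids_only_strong = [p.get("id") for p in only_strong]   # ['a', 'd']
ids_strong_last = [p.get("id") for p in strong_last]   # ['a']
```

With a full XPath 1.0 engine the three expressions can be used as written; the Python conditions are only stand-ins for the parts ElementTree does not parse.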
what expression would result in getting all <p> tags that contain <strong> elements within them?
If you only want to select p elements with direct strong children, you can use p[strong]; if you're looking for any descendants, use p[descendant::strong]. Both are relative paths, so in both cases the context node has to be the parent of the p elements.
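The difference between a direct child and any descendant can be sketched as follows. This is Python with the standard library's xml.etree.ElementTree; that module has no descendant:: axis inside predicates, so the second case is approximated with a nested find. The markup is a made-up example, and root plays the role of the context node.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: in 'b' the strong is nested inside an em.
root = ET.fromstring(
    "<body>"
    "<p id='a'><strong>x</strong></p>"
    "<p id='b'><em><strong>y</strong></em></p>"
    "<p id='c'>z</p>"
    "</body>"
)

# p[strong]: relative to the context node (body), direct strong children only.
direct = [p.get("id") for p in root.findall("p[strong]")]

# p[descendant::strong]: a strong at any depth below the p.
any_depth = [p.get("id") for p in root.findall("p")
             if p.find(".//strong") is not None]

print(direct)     # ['a']
print(any_depth)  # ['a', 'b']
```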
I am trying to find all div elements that have a widget-name attribute and a descendant span tag that has a title attribute.
This is what I am trying.
//div[@widget-name and descendant::span[@title]]
This seems to almost work but it is missing one element in the Nodes Collection it returns.
Never mind.
This is what I needed:
//div[@widget-name and descendant::span[@class='title']]
OK - take it back.
This is not the complete answer.
I am now trying to tweak this to where it returns all except where title is not equal to some text:
//div[@widget-name and descendant::span[@class='title' and [text()[contains(., '{someTextToKeep}'
Anyone see why this would be invalid XPath?
Final answer is:
//div[@widget-name and descendant::span[@class='title' and text()[not(contains(., 'someTextToKeep'))]]]
This XPath should return all div elements that:
have a widget-name attribute
have a descendant span element (using the abbreviated .// syntax) that:
has a class attribute with the value 'title'
contains the text 'someTextToKeep' (if you want to exclude spans with certain text, wrap the contains() in not()).
XPath:
//div[@widget-name and .//span[@class='title'][contains(.,'someTextToKeep')]]
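The conditions listed above can be checked one by one in a short Python sketch. It assumes the standard library's xml.etree.ElementTree, which handles the attribute predicates but not contains(), so the text test is done in Python on the span's string value. The helper name has_title_span and the sample widgets are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample markup with three candidate div elements.
root = ET.fromstring(
    "<body>"
    "<div widget-name='w1'><h2><span class='title'>someTextToKeep here</span></h2></div>"
    "<div widget-name='w2'><span class='title'>other text</span></div>"
    "<div><span class='title'>someTextToKeep</span></div>"
    "</body>"
)

def has_title_span(div, needle):
    """True if some descendant span with class 'title' contains needle."""
    for span in div.findall(".//span[@class='title']"):
        # contains(., needle) tests the element's full string value.
        if needle in "".join(span.itertext()):
            return True
    return False

# //div[@widget-name ...]: keep only div elements that carry the attribute.
matches = [div.get("widget-name")
           for div in root.findall(".//div[@widget-name]")
           if has_title_span(div, "someTextToKeep")]
print(matches)  # ['w1']
```

The second div fails the text test and the third has no widget-name attribute, so only the first one is returned.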
I'm looking through HTML documents for the text: "Required". What I need to find is the element that holds the text. For example:
<p>... Required</p>
I would want to get the element name, p.
However, it might not be in a <p> tag. It could be in any kind of tag, which is where this question differs from some of the other search text Stack Overflow questions.
Right now I'm using:
page.at(':contains("Required")')
but this only gets me the whole html element.
The problem you have is that the :contains pseudo-class matches any element that has the searched-for text anywhere in its descendants. You need to find the innermost element that contains such text. Since html is the ancestor of all elements, if the page contains the text anywhere then html will contain it, and so that will be the first matching element.
I’m not sure you can achieve this with CSS, but you can use XPath like this:
page.at_xpath('//*[text()[contains(., "Required")]]')
This finds the first element node that has a text() node as a child that contains Required. When you have that node (if it exists) you can then call name on it to give the name of the element.
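The difference between "contains the text somewhere below" and "has a text node child that contains it" can be sketched in Python. This is not Nokogiri; it assumes the standard library's xml.etree.ElementTree and re-implements the text() test by hand, because that module's XPath subset has no contains(). The helper names and the sample page are made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical page fragment.
root = ET.fromstring(
    "<html><body>"
    "<div><p>Name</p><span>Required <b>field</b></span></div>"
    "</body></html>"
)

def own_text_nodes(el):
    """Yield the text nodes that are direct children of el."""
    if el.text:
        yield el.text
    for child in el:
        if child.tail:  # text that follows a child still belongs to el
            yield child.tail

def first_with_own_text(root, needle):
    """Rough equivalent of //*[text()[contains(., needle)]], first match only."""
    for el in root.iter():  # document order
        if any(needle in text for text in own_text_nodes(el)):
            return el
    return None

# What a :contains-style test sees: every ancestor matches too.
containing = [el.tag for el in root.iter()
              if "Required" in "".join(el.itertext())]

hit = first_with_own_text(root, "Required")
print(containing)  # ['html', 'body', 'div', 'span']
print(hit.tag)     # span
```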
For CSS you can do:
page.at('[text()*="Required"]')
It's not real CSS though, or even a jQuery extra.
You should use CSS selectors:
page.css('p').text
There is a type of XPath like this
//div[span[a[@title='foo']]]
where it matches & returns the div (not the hyperlink) that contains a span, which in turn contains a hyperlink with title "foo".
Is there a CSS selector format equivalent to this?
I gave it a shot, trying to convert it to CSS, and if there is an equivalent, I don't know how to map it correctly.
No, there isn't. Selectors don't have the kind of predicate that XPath does, and there isn't a way to ascend an element's hierarchy from the deepest element (in this case your a[@title='foo']), i.e. there is no parent selector.
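For comparison, here is how the XPath side reaches the div, sketched in Python with the standard library's xml.etree.ElementTree. That module does not support nested predicates, so this sketch descends to the link and climbs back with two .. steps, which selects the same div elements for this sample. The markup and ids are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: only d1 has the div > span > a[title='foo'] shape.
root = ET.fromstring(
    "<body>"
    "<div id='d1'><span><a title='foo'>x</a></span></div>"
    "<div id='d2'><span><a title='bar'>y</a></span></div>"
    "<div id='d3'><a title='foo'>z</a></div>"
    "</body>"
)

# Go down div/span/a, filter on the attribute, then go up twice to the div.
divs = root.findall(".//div/span/a[@title='foo']/../..")
div_ids = [div.get("id") for div in divs]
print(div_ids)  # ['d1']
```

The two upward steps are the part that has no counterpart in the selectors described in the answer.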
What are the advantages / disadvantages of the two different selectors?
Should I use one over the other?
I think it's primarily a matter of user preference.
To select the first child of all <p> elements, you'd do:
$("//p/*[1]") in XPath
$$("p > *:first-child") in CSS
I prefer using XPath, but YMMV.
Note that, internally, all CSS selectors are converted to XPath. For example, the selector $$("#one") will be converted into $(".//*[@id='one']").
Just a few notes:
indexing starts from 1 in XPath, so it's //p/*[1]
the CSS selectors in Tritium allow you to prefix a selector with >, as in $$("> p > :first-child"); this will be converted into a scoped search (i.e., ./p/*[1])
because CSS selectors are (currently) dynamically converted into XPath, there's a slight performance hit compared to using straight XPath
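To give a feel for what such a conversion involves, here is a toy translator in Python. It is not Tritium's converter: css_to_xpath is a hypothetical helper that handles only tag names, #id, and the > child combinator, plus the leading > for a scoped search, and it uses the standard library's xml.etree.ElementTree to run the result.

```python
import re
import xml.etree.ElementTree as ET

# One selector step: an optional tag (or *) followed by an optional #id.
STEP = re.compile(r"(\*|[A-Za-z][\w-]*)?(?:#([\w-]+))?")

def css_to_xpath(selector):
    """Translate a tiny CSS subset to XPath (illustrative only)."""
    selector = selector.strip()
    scoped = selector.startswith(">")  # leading > means a scoped search
    if scoped:
        selector = selector[1:]
    steps = []
    for part in selector.split(">"):
        part = part.strip()
        match = STEP.fullmatch(part)
        if not part or match is None:
            raise ValueError(f"unsupported selector part: {part!r}")
        tag, element_id = match.groups()
        step = tag or "*"
        if element_id:
            step += f"[@id='{element_id}']"
        steps.append(step)
    return ("./" if scoped else ".//") + "/".join(steps)

# Hypothetical sample document.
root = ET.fromstring("<body><p id='one'><em>a</em></p><p><em>b</em></p></body>")

id_xpath = css_to_xpath("#one")          # .//*[@id='one']
scoped_xpath = css_to_xpath("> p > em")  # ./p/em
found = root.find(id_xpath)
scoped_texts = [em.text for em in root.findall(scoped_xpath)]
```

Even this small subset needs a parsing pass on every call, which is consistent with the note above that dynamic conversion costs a little compared to writing the XPath directly.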