XPath string to select a specific node with a specific attribute

I have the following HTML:
<div class=""postrow first"">
<h2 class=""title icon"">
This is the title
</h2>
<div class=""content"">
<div id=""post_message_1668079"">
<blockquote class=""postcontent restore "">
<div>Category</div>
line 1<br /> line2
</blockquote>
</div>
</div>
</div>
<div class=""postrow"">
<h2 class=""title icon"">
second title
</h2>
<div class=""content"">
<div id=""post_message_1668079"">
<blockquote class=""postcontent restore "">
<div>Category</div>
line 1<br /> line2
</blockquote>
</div>
</div>
</div>
What is the XPath string to select all divs whose class attribute is "postrow" or "postrow "?

This answer assumes that for each "", you actually have " in your document.
There are a number of alternative XPaths available to you. Here are just two:
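Using the union operator |:
//div[@class = "postrow"] | //div[@class = "postrow "]
Using starts-with (note that this also matches any other class value beginning with "postrow", such as "postrow first"):
//div[starts-with(@class, "postrow")]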
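To see how an exact match differs from a prefix match, here is a minimal sketch using Python's standard-library xml.etree.ElementTree. It rests on a few assumptions: the markup is a trimmed-down version of the question's HTML with the doubled quotes normalized, the <root> wrapper and variable names are illustrative, and because ElementTree implements only a subset of XPath (no starts-with() and no | operator), the prefix test is emulated in Python rather than evaluated as XPath.

```python
import xml.etree.ElementTree as ET

# Trimmed-down version of the question's markup (doubled quotes normalized).
# The <root> wrapper is added only so the snippet parses as one document.
SAMPLE = """
<root>
  <div class="postrow first"><h2 class="title icon">This is the title</h2></div>
  <div class="postrow"><h2 class="title icon">second title</h2></div>
</root>
"""

root = ET.fromstring(SAMPLE)

# Exact match, equivalent to //div[@class = "postrow"].
exact = root.findall(".//div[@class='postrow']")

# Emulates //div[starts-with(@class, "postrow")]; ElementTree's XPath subset
# has no starts-with(), so the prefix test is done in Python instead.
prefix = [d for d in root.iter("div") if d.get("class", "").startswith("postrow")]

print([d.get("class") for d in exact])   # ['postrow']
print([d.get("class") for d in prefix])  # ['postrow first', 'postrow']
```

With a full XPath 1.0 engine the two expressions from the answer can be passed in directly; the Python-side filter is only needed for the limited standard-library implementation.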

Related

Scrapy xpath select parent element based on text value in subelement and lacking of element

I want to select all elements article that don't contain a span element with class status and where the nested a element contains a href attribute which contains the text "rent.html".
I've managed to get the a element like so:
response.xpath('//article[@class="car"]//a[contains(@href,"rent.html")]')
But reading here and trying to select the first parent element article like so returns "data=0"
response.xpath('//article[@class="car"]//a[contains(@href,"rent.html")]//parent::article and not //article[@class="car"]//span[@class="status"]')
I also tried this.
response.xpath('//article[@class="car"][//a[contains(@href,"rent.html")]/article and not //article[@class="car"]//span[@class="status"]')')
I don't know what the expression is for my use case.
<article class="car">
<div>
<div class="container">
<a href="/34625030/rent.html">
</a>
</div>
</div>
</article>
<article class="car">
<div>
<div class="container">
<a href="/34625230/rent.html">
</a>
</div>
</div>
</article>
<article class="car">
<div>
<div class="container">
<a href="/12325230/buy.html">
</a>
</div>
</div>
</article>
<article class="car">
<div>
<div class="container">
<a href="/34632230/rent.html">
</a>
</div>
</div>
<span class="status">Rented</span>
</article>
This XPath expression will do the work:
"//article[not(.//span[@class='status'])][.//a[contains(@href,'rent.html')]]"
The entire command is:
response.xpath("//article[not(.//span[@class='status'])][.//a[contains(@href,'rent.html')]]")
Explanations:
Translating your requirements into XPath syntax.
"select all elements article" - //article
"that don't contain a span element with class status" - [not(.//span[@class='status'])]
" and where the nested a element contains a href attribute which contains the text "rent.html"" - [.//a[contains(@href,'rent.html')]]
I tested the XPath above on the shared sample XML and it worked properly.
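The two predicates can also be read as two independent checks applied to each article. The following is a rough sketch of that logic using only Python's standard library, not Scrapy. The sample is a shortened version of the markup in the question, and the <root> wrapper and the helper name available_rentals are hypothetical. ElementTree does not support not() or contains(), so both conditions are evaluated in Python.

```python
import xml.etree.ElementTree as ET

# Shortened version of the question's sample; <root> is an illustrative wrapper.
SAMPLE = """
<root>
  <article class="car"><div><a href="/34625030/rent.html"></a></div></article>
  <article class="car"><div><a href="/12325230/buy.html"></a></div></article>
  <article class="car">
    <div><a href="/34632230/rent.html"></a></div>
    <span class="status">Rented</span>
  </article>
</root>
"""

def available_rentals(root):
    """Return <article> elements that have a rent.html link and no status span."""
    matches = []
    for article in root.iter("article"):
        # Mirrors [not(.//span[@class='status'])]
        has_status = article.find(".//span[@class='status']") is not None
        # Mirrors [.//a[contains(@href,'rent.html')]]
        has_rent_link = any(
            "rent.html" in a.get("href", "") for a in article.iter("a")
        )
        if has_rent_link and not has_status:
            matches.append(article)
    return matches

root = ET.fromstring(SAMPLE)
hrefs = [m.find(".//a").get("href") for m in available_rentals(root)]
print(hrefs)  # ['/34625030/rent.html']
```

Both checks start from the article itself, which is what the leading dot in .//span and .//a expresses. Without the dot, // searches the whole document, which is why the attempts in the question did not filter per article.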

xpath: How to combine multiple conditions on different axes

I am trying to extract all links that meet these three conditions:
Must be part of <div data-test="cond1">
Must have a <a href="..." class="cond2">
Must not have a <img src="..." class="cond3">
The result should be "/product/1234".
<div data-test="test1">
<div>
<div data-test="cond1">
Link 1
<div class="test4">
<div class="test5">
<div class="test6">
<div class="test7">
<div class="test8">
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
<div data-test="test2">
<div>
<div data-test="cond1">
Link 2
<div class="test4">
<div class="test5">
<div class="test6">
<div class="test7">
<div class="test8">
<img src="bild.jpg" class="cond3">
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
I'm able to extract the links with the following xpath query.
//div[starts-with(@data-test,"cond")]/a[starts-with(@class,"cond")]/@href
(I know the first part is not really necessary. But better safe than sorry.)
But I'm still struggling to exclude the links whose container has a descendant img tag, and to add that condition to the query above.
This should do what you want:
//div[@data-test="cond1" and not(.//img[@class="cond3"])]
/a[@class="cond2"]
/@href
/product/1234
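For illustration, the same three conditions can be checked step by step with Python's standard library. This is only a sketch under stated assumptions: the sample below is hypothetical markup written for this example (the anchors, including the second href "/product/5678", are invented, the nesting is shortened, and the img tag is self-closed so it parses as XML). ElementTree lacks not(), so the exclusion is done in Python.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup for illustration; "/product/5678" is an invented value.
SAMPLE = """
<root>
  <div data-test="cond1">
    <a href="/product/1234" class="cond2">Link 1</a>
    <div class="test4"><div class="test8"></div></div>
  </div>
  <div data-test="cond1">
    <a href="/product/5678" class="cond2">Link 2</a>
    <div class="test4"><div class="test8"><img src="bild.jpg" class="cond3"/></div></div>
  </div>
</root>
"""

root = ET.fromstring(SAMPLE)
links = []
for div in root.iter("div"):
    # Condition 1: the container must be <div data-test="cond1">.
    if div.get("data-test") != "cond1":
        continue
    # Condition 3, mirroring not(.//img[@class="cond3"]): skip containers
    # that hold such an image anywhere below them.
    if div.find(".//img[@class='cond3']") is not None:
        continue
    # Condition 2, mirroring /a[@class="cond2"]/@href: direct <a> children only.
    links.extend(a.get("href") for a in div.findall("a[@class='cond2']"))

print(links)  # ['/product/1234']
```

The exclusion is tested on the container div rather than on the link, because the img is a descendant of the div and not of the a element.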

XPath how to find nested element on first level [duplicate]

I have the following XML:
<article>
<div class="class1">
<span>Article header 1</span>
<div>
<span>Date</span>
</div>
</div>
<div class="class2">
<span>Details</span>
<div class="class3">
<span>Number</span>
</div>
</div>
<div>
<span>Price 1</span>
</div>
<div class="class3">
<span>Footer 1</span>
<div>Footer details</div>
</div>
</article>
<article>
<div class="class1">
<span>Article header 2</span>
<div>
<span>Date</span>
</div>
</div>
<div>
<span>Price 2</span>
</div>
<div class="class2">
<span>Details</span>
<div class="class3">
<span>Number</span>
</div>
</div>
<div class="class3">
<span>Footer 2</span>
<div>Footer details</div>
</div>
</article>
I want to select only the divs that have no class attribute, and only those on the first nesting level.
In this case
<div>
<span>Price 1</span>
</div>
and
<div>
<span>Price 2</span>
</div>
Note that this div is the third child in the first article but the second child in the second article.
I tried to use
//div[not(@class)]
but it finds every class-less div in the article, not only those on the first nesting level.
You were close to the right expression.
This will select what you are looking for:
article/div[not(@class)]
The step article/div selects only div elements that are direct children of article. Unlike //, a single / does not descend to deeper levels.
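The difference between the descendant search and the child step can be checked with a small sketch in Python's standard-library ElementTree. The sample is a reduced version of the XML in the question, and the <root> wrapper and the helper first_span_text are illustrative. Since ElementTree has no not(), the missing-class test is done in Python.

```python
import xml.etree.ElementTree as ET

# Reduced version of the question's XML; <root> is an illustrative wrapper.
SAMPLE = """
<root>
  <article>
    <div class="class1"><span>Article header 1</span><div><span>Date</span></div></div>
    <div><span>Price 1</span></div>
  </article>
  <article>
    <div><span>Price 2</span></div>
    <div class="class3"><span>Footer 2</span><div>Footer details</div></div>
  </article>
</root>
"""

root = ET.fromstring(SAMPLE)

def first_span_text(div):
    """Label a div by its first child <span>, or by its own text if it has none."""
    span = div.find("span")
    return span.text if span is not None else div.text

# Descendant search, like //div[not(@class)]: nested class-less divs match too.
all_levels = [d for d in root.iter("div") if "class" not in d.attrib]

# Child step, like article/div[not(@class)]: direct children of <article> only.
first_level = [d for d in root.findall("article/div") if "class" not in d.attrib]

print([first_span_text(d) for d in all_levels])
# ['Date', 'Price 1', 'Price 2', 'Footer details']
print([first_span_text(d) for d in first_level])
# ['Price 1', 'Price 2']
```

Because the child step does not depend on position, it finds the price div whether it is the second or the third child of its article.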

How to select text node without preceding text in XPath?

<div class="a">
<div class="a random number of div wrapers">
<div>Random1<em>Median</em>
<div class="b">
<div class="c">Edit</div>
</div>
</div>
<div>Random2<em>Median</em></div>
<div>
<em>Median</em>
</div>
<div>Random3<em>Median</em></div>
<div>Random4<em>Median</em>
<div>Random4<em>Median</em></div>
</div>
</div>
<div class="a">
<div class="a random number of div wrapers">
<div>Random1<em>Median</em></div>
<div>Random2<em>Median</em></div>
<div>
<em>Median</em>
</div>
<div>Random3<em>Median</em>
<div class="b">
<div class="c">Edit</div>
</div>
</div>
<div>Random4<em>Median</em>
</div>
</div>
In this case, how can I use XPath to get the two nodes that contain 'Median' and have no text before them?
I prefer not using the index because the node position could be random.
Maybe try:
//*[.='Median'][not(preceding-sibling::text()[normalize-space()])]
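To make clear what the preceding-sibling::text() test looks at, here is a sketch that emulates it with Python's standard-library ElementTree, which has no preceding-sibling axis. The sample is a shortened, hypothetical version of the markup above, and the function name without_preceding_text is invented for this example. In ElementTree, text before the first child is stored in parent.text and text after a child in that child's tail.

```python
import xml.etree.ElementTree as ET

# Shortened, illustrative version of the markup in the question.
SAMPLE = """
<root>
  <div>Random1<em>Median</em></div>
  <div>
    <em>Median</em>
  </div>
  <div>Random3<em>Median</em></div>
</root>
"""

def without_preceding_text(root, wanted="Median"):
    """Return elements whose string value equals `wanted` and that have no
    non-blank text among their preceding siblings."""
    found = []
    for parent in root.iter():
        # Text directly inside the parent, before its first child element.
        seen_text = bool((parent.text or "").strip())
        for child in parent:
            if "".join(child.itertext()) == wanted and not seen_text:
                found.append(child)
            # Text following this child precedes every later sibling.
            if (child.tail or "").strip():
                seen_text = True
    return found

root = ET.fromstring(SAMPLE)
hits = without_preceding_text(root)
print([h.tag for h in hits])  # ['em']
```

The whitespace-only text around the em in the second div is ignored, which corresponds to the normalize-space() filter in the XPath expression.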

WinJS Repeater Only Binding First Property

I have a list of objects with two properties, and I want my repeater to display each of them. The first property should be inside an <h2> tag, and the second should be in an <h3> tag.
HTML:
<div class="dataColumns" data-win-control="WinJS.UI.Repeater">
<h2 data-win-bind="textContent: Title"></h2>
<h3 data-win-bind="textContent: SomeOtherProperty"></h3>
</div>
JS:
var columnData = new WinJS.Binding.List([
{ Title: 'First Title', SomeOtherProperty: 'First SomeOtherProperty' },
{ Title: '2nd Title', SomeOtherProperty: '2nd SomeOtherProperty' }]);
document.querySelector('.dataColumns').winControl.data = columnData;
Here is the actual output, as seen in the DOM Explorer:
<div class="dataColumns win-repeater win-disposable" data-win-control="WinJS.UI.Repeater">
<h2 class="win-disposable">First Title</h2>
<h2 class="win-disposable">2nd Title</h2>
</div>
Why is only the <h2> shown for each item?
Here is what I would have expected:
<div class="dataColumns win-repeater win-disposable" data-win-control="WinJS.UI.Repeater">
<h2 class="win-disposable">First Title</h2>
<h3 class="win-disposable">First SomeOtherProperty</h3>
<h2 class="win-disposable">2nd Title</h2>
<h3 class="win-disposable">2nd SomeOtherProperty</h3>
</div>
A Repeater template may have only one direct descendant element, as in the case below, where a single child div wraps both headings:
<div class="dataColumns" data-win-control="WinJS.UI.Repeater" >
<div>
<h2 data-win-bind="textContent: Title"></h2>
<h3 data-win-bind="textContent: SomeOtherProperty"></h3>
</div>
</div>
You may also consider using a WinJS.Binding.Template for the contents of the Repeater:
<div class="template" data-win-control="WinJS.Binding.Template">
<div>
<h2 data-win-bind="textContent: Title"></h2>
<h3 data-win-bind="textContent: SomeOtherProperty"></h3>
</div>
</div>
<div class="dataColumns" data-win-control="WinJS.UI.Repeater"
data-win-options="{template: select('.template')}">
</div>
