Scrapy xpath select parent element based on text value in subelement and lacking of element


I want to select all elements article that don't contain a span element with class status and where the nested a element contains a href attribute which contains the text "rent.html".
I've managed to get the a element like so:
response.xpath('//article[@class="car"]//a[contains(@href,"rent.html")]')
But after reading up on this, trying to select the first parent article element like so returns "data=0":
response.xpath('//article[@class="car"]//a[contains(@href,"rent.html")]//parent::article and not //article[@class="car"]//span[@class="status"]')
I also tried this:
response.xpath('//article[@class="car"][//a[contains(@href,"rent.html")]/article and not //article[@class="car"]//span[@class="status"]')')
I don't know what the right expression is for my use case.
<article class="car">
<div>
<div class="container">
<a href="/34625030/rent.html">
</a>
</div>
</div>
</article>
<article class="car">
<div>
<div class="container">
<a href="/34625230/rent.html">
</a>
</div>
</div>
</article>
<article class="car">
<div>
<div class="container">
<a href="/12325230/buy.html">
</a>
</div>
</div>
</article>
<article class="car">
<div>
<div class="container">
<a href="/34632230/rent.html">
</a>
</div>
</div>
<span class="status">Rented</span>
</article>

This XPath expression will do the job:
"//article[not(.//span[@class='status'])][.//a[contains(@href,'rent.html')]]"
The entire command is:
response.xpath("//article[not(.//span[@class='status'])][.//a[contains(@href,'rent.html')]]")
Explanation:
Translating your requirements into XPath syntax:
"select all elements article" - //article
"that don't contain a span element with class status" - [not(.//span[@class='status'])]
"and where the nested a element contains a href attribute which contains the text "rent.html"" - [.//a[contains(@href,'rent.html')]]
Each predicate starts with . so it is evaluated relative to the current article; a predicate starting with // would search the whole document instead.
I tested the XPath above on the shared sample markup and it worked as expected.
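As a quick way to sanity-check the two predicates outside Scrapy, here is a minimal sketch using Python's standard-library ElementTree. This is an assumption-laden stand-in, not the Scrapy API: ElementTree's XPath subset has no not() or contains(), so those two checks are mirrored in plain Python. The <root> wrapper and the rentable_articles helper are hypothetical names added for illustration.

```python
import xml.etree.ElementTree as ET

# Sample markup from the question, wrapped in a hypothetical <root>
# element so that it parses as a single document.
SAMPLE = """
<root>
  <article class="car"><div><div class="container"><a href="/34625030/rent.html"></a></div></div></article>
  <article class="car"><div><div class="container"><a href="/34625230/rent.html"></a></div></div></article>
  <article class="car"><div><div class="container"><a href="/12325230/buy.html"></a></div></div></article>
  <article class="car"><div><div class="container"><a href="/34632230/rent.html"></a></div></div><span class="status">Rented</span></article>
</root>
"""


def rentable_articles(root):
    """Return <article> elements that have a rent.html link and no span.status."""
    matches = []
    for article in root.iter("article"):
        # Mirrors the predicate [not(.//span[@class='status'])]
        has_status = article.find(".//span[@class='status']") is not None
        # Mirrors the predicate [.//a[contains(@href,'rent.html')]]
        has_rent_link = any(
            "rent.html" in a.get("href", "") for a in article.iter("a")
        )
        if has_rent_link and not has_status:
            matches.append(article)
    return matches


root = ET.fromstring(SAMPLE)
hrefs = [article.find(".//a").get("href") for article in rentable_articles(root)]
print(hrefs)  # ['/34625030/rent.html', '/34625230/rent.html']
```

In a Scrapy callback none of this is needed: the XPath string can be passed to response.xpath() directly, as shown in the answer.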

Related

XPath: Select any div that contains one or more descendant divs with a specific class

Assume that following HTML snippet exists somewhere in the <body> element of a web page:
<div id="root_1000" class="root bacon">
<ul>
<li id="item_1234567" class="active">
<div class="userpost author_4281">
<div>This text should be visible.</div>
</div>
<ul><li>Some item</li></ul>
</li>
</ul>
</div>
<div id="root_2000" class="root bacon">
<ul>
<li id="item_8675309" class="active">
<div class="userpost author_3333">
<div>
This text, and as the DIV.root that contains it, should be hidden.
</div>
</div>
<ul><li>Another item</li></ul>
</li>
</ul>
</div>
<div id="root_3000" class="root bacon">
<ul>
<li id="item_7654321" class="active">
<div class="userpost author_9877">
<div>This text should be visible.</div>
</div>
<ul><li>Yet another item</li></ul>
</li>
</ul>
</div>
So here's my question: what would the XPath syntax be to select the div.root that contains info posted by author #3333 (i.e. div[class~="author_3333"])?
The following XPath statement will properly match the div.userpost element associated with author #3333 that I want to hide, but does not include the <ul><li>Another item</li></ul> node, which I also need to hide:
.//div[contains(@class, 'author_3333')]
What I want to do is select the closest div.root ancestor associated with the node that my XPath statement matches. Any help would be greatly appreciated... thanks in advance!

You need to select the div that has the matching div as a descendant, something like:
//div[.//div[contains(@class, "author_3333")]]

You can use this XPath expression:
.//div[contains(@class, 'author_3333')]/ancestor::div[contains(@class,'root')][1]
Output is:
<div id="root_2000" class="root bacon">
<ul>
<li id="item_8675309" class="active">
<div class="userpost author_3333">
<div>
This text, and as the DIV.root that contains it, should be hidden.
</div>
</div>
<ul>
<li>Another item</li>
</ul>
</li>
</ul>
</div>
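To illustrate what ancestor::div[contains(@class,'root')][1] does, here is a rough sketch that walks up from the matched node to the nearest div.root. It uses Python's standard-library ElementTree, which has no ancestor axis, so a parent map is built by hand. The closest_root helper, the <body> wrapper, and the shortened sample markup are assumptions made for illustration; the sketch also matches whole class tokens rather than substrings, which is slightly stricter than contains().

```python
import xml.etree.ElementTree as ET

# Shortened version of the question's markup (two of the three roots).
SAMPLE = """
<body>
  <div id="root_1000" class="root bacon">
    <ul><li><div class="userpost author_4281"><div>visible</div></div></li></ul>
  </div>
  <div id="root_2000" class="root bacon">
    <ul><li>
      <div class="userpost author_3333"><div>hidden</div></div>
      <ul><li>Another item</li></ul>
    </li></ul>
  </div>
</body>
"""


def closest_root(doc, author_class):
    """Return the nearest div.root ancestor of the div carrying author_class."""
    # ElementTree elements do not know their parent, so build a lookup table.
    parent_of = {child: parent for parent in doc.iter() for child in parent}
    for div in doc.iter("div"):
        if author_class in div.get("class", "").split():
            node = parent_of.get(div)
            # Walk upwards until a div with the "root" class token is found.
            while node is not None:
                if node.tag == "div" and "root" in node.get("class", "").split():
                    return node
                node = parent_of.get(node)
    return None


doc = ET.fromstring(SAMPLE)
found = closest_root(doc, "author_3333")
print(found.get("id"))  # root_2000
```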

XPath: how to select elements that are related to other on the same level

The question is simple, but I don't have enough practice with this case :)
How can I get the price text from each "block" div, given that only blocks containing an item_promo element are needed?
<div class="block">
<div class="item_promo">item</div>
<div class="item_price">123</div>
</div>
<div class="block">
<div class="item_promo">item</div>
<div class="item_price">456</div>
</div>
<div class="block">
<div class="item_promo">item</div>
<div class="item_price">789</div>
</div>
<div class="block">
<div class="item">item</div>
<div class="item_price">222</div>
</div>
<div class="block">
<div class="item">item</div>
<div class="item_price">333</div>
</div>

You could use this XPath:
//div[@class='block']/*[@class='item_promo']/following-sibling::div[@class='item_price']/text()
This finds the elements whose class attribute is item_promo, moves to their following sibling div whose class attribute is item_price, and grabs its text.

This XPath,
//div[div/@class='item_promo']/div[@class='item_price']
will return those item_price class div elements with sibling item_promo class div elements:
<div class="item_price">123</div>
<div class="item_price">456</div>
<div class="item_price">789</div>
This will work regardless of label/price order.
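The order-independent variant can be sketched with Python's standard-library ElementTree as follows. This is only an illustration under assumptions: the <root> wrapper and the promo_prices helper are hypothetical, and the second block has been reordered on purpose (price before promo) to show that sibling order does not matter here.

```python
import xml.etree.ElementTree as ET

# Trimmed sample; the second block is deliberately reordered (an assumption
# added for illustration, not part of the original markup).
SAMPLE = """
<root>
  <div class="block"><div class="item_promo">item</div><div class="item_price">123</div></div>
  <div class="block"><div class="item_price">456</div><div class="item_promo">item</div></div>
  <div class="block"><div class="item">item</div><div class="item_price">222</div></div>
</root>
"""


def promo_prices(root):
    """Collect price text from blocks that contain an item_promo child."""
    prices = []
    for block in root.findall(".//div[@class='block']"):
        promo = block.find("div[@class='item_promo']")
        price = block.find("div[@class='item_price']")
        # Keep the price only when both children are present in the block.
        if promo is not None and price is not None:
            prices.append(price.text)
    return prices


root = ET.fromstring(SAMPLE)
prices = promo_prices(root)
print(prices)  # ['123', '456']
```

The following-sibling version from the first answer would miss the reordered block, which is the design difference between the two answers.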

Using xpath, I can't seem to be able to find a text node

I am building a web crawler for one site's comment section, and I have run into a problem: I can't seem to find the text node holding each comment's content. This is how the page's markup looks:
<div class="comments"> // this is the whole comments section
<div class="comment"> // this is where the p is located
<div class="comment-top">
<div class="comment-nr">208. PROTAS</div>
<div class="comment-info">
<div class="comment-time">2015-06-30 13:00</div>
<div class="comment-ip">IP: 178.250.32.165</div>
<div class="comment-vert1">
<a href="javascript:comr(24470645,'p')">
<img src="http://img.lrytas.lt/css2/img/com-good.jpg" alt="">
</a> <span id="cy_24470645"> </span>
</div>
<div class="comment-vert2">
<a href="javascript:comr(24470645,'m')">
<img src="http://img.lrytas.lt/css2/img/com-bad.jpg" alt="">
</a> <span id="cn_24470645"> </span>
</div>
</div>
</div>
<p class="text-13 no-intend">Test text</p> // I need to get this comments content
</div>
I tried a lot of XPath expressions, like:
*/div[contains(@class, "comment")]/p/text()
/p[contains(@class, "text-13 no-intend")]/text()
etc.
But I can't seem to locate it.
Would appreciate any help.

How about this:
//div[@class = 'comments']/div[@class = 'comment'][1]/p/text()
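A small sketch of the same path using Python's standard-library ElementTree, under a few assumptions: the markup is trimmed to the relevant parts and closed properly, ElementTree has no text() node test so the .text attribute is read instead, and the sketch collects every comment rather than only the first one (the [1] in the answer).

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed version of the question's markup.
SAMPLE = """
<div class="comments">
  <div class="comment">
    <div class="comment-top">
      <div class="comment-nr">208. PROTAS</div>
    </div>
    <p class="text-13 no-intend">Test text</p>
  </div>
</div>
"""

# The parsed root is the div.comments element itself, so the path starts
# at its div.comment children and then steps down to the <p> child.
comments = ET.fromstring(SAMPLE)
texts = [p.text for p in comments.findall("div[@class='comment']/p")]
print(texts)  # ['Test text']
```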

How to fill my html with json object?

I am new to Ajax.
I am trying to fetch values and insert them into an HTML snippet.
I have the HTML code and a JSON object that holds the values.
Now I want to show specific values in different parts of the HTML.
Here is my HTML code:
<div>
<div class="borb clearfix">
<div class="profileholder fleft">
<img src="images/users/1.png" class="userpic">
<div class="icon state green"></div>
</div>
<div class="remainder">
<div class="padl10">
<div class="username">Anurag Shivpuri</div>
<div class="desig">Cheif Information Officer</div>
<div class="loc">Credit Operation | Pune</div>
</div>
</div>
</div>
<ul class="userdata">
<li>
<span class="lbl">Employee Code :</span>
<span> 2007</span>
</li>
<li>
<span class="lbl">Role_Designation :</span>
<span> Senior HR</span>
</li>
<li>
<span class="lbl">Department :</span>
<span> HR</span>
</li>
<li>
<span class="lbl">Sub_Department :</span>
<span> Talent Acquisition</span>
</li>
<li>
<span class="lbl">Official E-mail Id :</span>
<span> atul.gupta@bajajfinserve.co.in</span>
</li>
<li>
<span class="lbl">Mobile No :</span>
<span> 9844333932</span>
</li>
</ul>
<div class="footbar">
Reward
Incentive
Movement
Leaves
LnD
</div>
When the Ajax call succeeds I get the values; now I want to fill them into my HTML code.
Please help me.

You can use a library (like knockout) to do that, or you can use jQuery to create the elements:
$("<div>").append("some text").appendTo("body"); //creates a div, append some text and...

If you already have a success event in your ajax, you could grab an empty DOM element and set its content.
document.getElementById('placeholder').innerHTML = jsonData.myProperty;

You should use innerHTML in javascript.
Set some ID on the element you need to update:
<li>
<span class="lbl">Employee Code :</span>
<span id='employeeCode'> 2007</span>
</li>
Then update it with your json value:
document.getElementById('employeeCode').innerHTML= yourJsonObject.employeeCodeValue;

Get anchor InnerText in a nested Divs

Html :
<div class="info">
<div class="title">
<div class="{DYNAMIC CLASS NAME}">
Text
</div>
</div>
</div>
<div class="info">
<div class="title">
<div class="{DYNAMIC CLASS NAME}">
Another Text
</div>
</div>
</div>
XPath :
DocumentNode.SelectNodes("//div[@class='info']/div[2]/a");
How can I get the anchor's InnerText value from the nested divs?
The third div's class name is dynamic.
Thanks.

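Using an index such as div[2] to reach a nested <div> is not correct: div[2] selects the second div child, not a div nested one level deeper. Use div/div instead, and leave the innermost div without a class test because its class name is dynamic. Try it this way:
DocumentNode.SelectNodes("//div[@class='info']/div[@class='title']/div/a");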
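The same path can be tried out in Python with the standard-library ElementTree. This is a sketch under stated assumptions: the sample markup in the question shows no anchor, so the <a href="#"> elements, the class names x1 and y2, and the <root> wrapper below are all hypothetical placeholders.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the anchors and the dynamic class names (x1, y2)
# are assumptions, since the question's sample does not show them.
SAMPLE = """
<root>
  <div class="info"><div class="title"><div class="x1"><a href="#">Text</a></div></div></div>
  <div class="info"><div class="title"><div class="y2"><a href="#">Another Text</a></div></div></div>
</root>
"""

root = ET.fromstring(SAMPLE)
# The innermost div has no class predicate, so any dynamic class name matches.
anchors = root.findall(".//div[@class='info']/div[@class='title']/div/a")
link_texts = [a.text.strip() for a in anchors]
print(link_texts)  # ['Text', 'Another Text']
```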
