Issues with preceding sibling/parent/ancestor - XPath

<div class='productHolder'>
<a href="https://ap.com" class="tea-time-with-ap">
<div class="aptime-8" dataInfo="name">Hammer</div>
<div class="aptime-9" dataInfo="price">$980</div>
</div>
</div>
</a>
</div>
Note: there are over 20 productHolder classes on the same page.
I am able to get the price data. How can I use parent or preceding-sibling to get the href?
I use the following code to get price:
rawPrice = response.xpath("//*[contains(text(),'$')]/text()")[counter].extract()
I've spent 2 hours trying to use preceding-sibling, parent, and even changing the code to use other values, but I run into issues elsewhere.
Any help is appreciated, cheers!

Were you looking for something like:
from io import StringIO
from lxml import etree
html = """
<div class='productHolder'>
<a href="https://ap.com" class="tea-time-with-ap">
<div class="aptime-8" dataInfo="name">Hammer</div>
<div class="aptime-9" dataInfo="price">$980</div>
</div>
</div>
</a>
</div>
"""
root = etree.parse(StringIO(html), etree.HTMLParser())
print(root.xpath('//*[contains(text(),"$")]/../@href')[0])
Result:
https://ap.com
Of course you can easily build from this:
item = root.xpath('//*[contains(text(),"$")]')
print(item[0].text)
print(item[0].xpath('../@href')[0])
Result:
$980
https://ap.com
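Since the page has many productHolder blocks, it may be easier to walk up from each price node than to track a counter. Here is a minimal sketch of that idea, assuming a cleaned-up, well-formed version of the markup. It uses the standard library's ElementTree instead of lxml or Scrapy, so only a subset of XPath is available; the second product, the URLs and the name price_href_pairs are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed version of the markup with two products.
PAGE = """
<root>
  <div class="productHolder">
    <a href="https://ap.com/hammer" class="tea-time-with-ap">
      <div class="aptime-8" dataInfo="name">Hammer</div>
      <div class="aptime-9" dataInfo="price">$980</div>
    </a>
  </div>
  <div class="productHolder">
    <a href="https://ap.com/saw" class="tea-time-with-ap">
      <div class="aptime-8" dataInfo="name">Saw</div>
      <div class="aptime-9" dataInfo="price">$45</div>
    </a>
  </div>
</root>
"""


def price_href_pairs(markup):
    """Return (price, href) tuples, one per price node."""
    root = ET.fromstring(markup)
    pairs = []
    # ".." steps from each price <div> up to its parent <a>.
    for link in root.findall(".//div[@dataInfo='price']/.."):
        price = link.find("./div[@dataInfo='price']").text
        pairs.append((price, link.get("href")))
    return pairs


pairs = price_href_pairs(PAGE)
print(pairs)
```

Because each href is read from the parent of its own price node, the price and the link always belong to the same product.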

Related

Parsing through response created with XPath

Using Scrapy, I want to extract some data from a well-formed HTML site. With XPath I am able to extract a list of items, but I am not able to extract data from the elements in that list, again using XPath.
All XPath's have been tested using XPather. I have tested the issue using a local file that contains the webpage, same issue.
Here goes:
# Get the webpage
fetch("https://www.someurl.com")
# The following gives me the expected items from the HTML
products = response.xpath("//*[@id='product-list-146620']/div/div")
The items are like this:
<div data-pageindex="1" data-guid="13157582" class="col ">
<div class="item item-card item-card--static">
<div class="item-card__inner">
<div class="item__image item__image--overlay">
<a href="/www.something.anywhere?ref_gr=9801" class="ratio_custom" style="padding-bottom:100%">
</a>
</div>
<div class="item__text-container">
<div class="item__name">
<a class="item__name-link" href="/c.aspx?ref_gr=9801">The text I want</a>
</div>
</div>
</div>
</div>
</div>
When I use the following XPath to extract "The text I want", I don't get anything:
XPATH_PRODUCT_NAME = "/div/div/div/div/div[contains(@class,'item__name')]/a/text()"
products[0].xpath(XPATH_PRODUCT_NAME).extract()
The output is empty, why?
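Your expression starts with /, which makes it absolute: it is evaluated from the document root rather than from products[0], so nothing matches. Start it with a dot to make it relative to the selected node. Try the following code.
XPATH_PRODUCT_NAME = ".//div[@class='item__name']/a[@class='item__name-link']/text()"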
products[0].xpath(XPATH_PRODUCT_NAME).extract()
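To illustrate the relative lookup outside Scrapy, here is a small sketch under stated assumptions: it uses the standard library's ElementTree rather than Scrapy selectors, and the markup is a trimmed, hypothetical version of the listing above (the second product is made up).

```python
import xml.etree.ElementTree as ET

# Trimmed, hypothetical version of the product list from the question.
LISTING = """
<div id="product-list-146620">
  <div>
    <div class="col">
      <div class="item">
        <div class="item__name">
          <a class="item__name-link" href="/c.aspx?ref_gr=9801">The text I want</a>
        </div>
      </div>
    </div>
    <div class="col">
      <div class="item">
        <div class="item__name">
          <a class="item__name-link" href="/c.aspx?ref_gr=9802">Another product</a>
        </div>
      </div>
    </div>
  </div>
</div>
"""

root = ET.fromstring(LISTING)

# Equivalent of the outer selection: one element per product column.
products = root.findall("./div/div")

# ".//" searches below each product only, never the whole document.
names = [
    product.find(".//div[@class='item__name']/a").text
    for product in products
]
print(names)
```

Each lookup is scoped to one product element, which is what the leading dot achieves in the Scrapy expression.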

Thymeleaf switch block returns incorrect value

I have a switch block in my thymeleaf page where I show an image depending on the reputation score of the user:
<h1>
<span th:text="#{user.reputation} + ${reputation}">Reputation</span>
</h1>
<div th:if="${reputation lt 0}">
<img th:src="@{/css/img/troll.png}"/>
</div>
<div th:if="${reputation} == 0">
<img th:src="@{/css/img/smeagol.jpg}"/>
</div>
<div th:if="${reputation gt 0} and ${reputation le 5}">
<img th:src="@{/css/img/samwise.png}"/>
</div>
<div th:if="${reputation gt 5} and ${reputation le 15}">
<img th:src="@{/css/img/frodo.png}"/>
</div>
<div th:if="${reputation gt 15}">
<img th:src="@{/css/img/gandalf.jpg}"/>
</div>
This block always shows smeagol (that is, reputation 0), even though the reputation of this user is 7.
EDIT:
I was wrong, the image showing was a rogue line:
<!--<img th:src="@{/css/img/smeagol.jpg}"/>-->
but I commented it out. Now there is no image showing.
EDIT2:
changed my comparators (see original post) and now I get the following error:
The value of attribute "th:case" associated with an element type "div" must not contain the '<' character.
EDIT3:
Works now, updated original post to working code
According to the documentation, Thymeleaf's switch statement works just like Java's - and the example suggests the same.
In other words: you cannot do
<th:block th:switch="${reputation}">
<div th:case="${reputation} < 0">
[...]
but would need to do
<th:block th:switch="${reputation}">
<div th:case="0">
[...]
which is not what you want.
Instead, you will have to use th:if, i.e. something like this:
<div th:if="${reputation lt 0}">
<img th:src="@{/css/img/troll.png}"/>
</div>
Change
<div th:case="0">
<img th:src="@{/css/img/smeagol.jpg}"/>
</div>
to
<div th:case="${reputation == 0}">
<img th:src="@{/css/img/smeagol.jpg}"/>
</div>

XPath Getting child elements from html

I am trying to find the XPath for only the children of a navigation bar. The path I am trying at the moment is //div[@class='navCol subMenus'], against this piece of HTML.
<div class="PrimaryNavigationContainer">
<div class="PrimaryNavigation">
<div class="Menu">
<div>
<span>Brands</span>
<div class="navCol">
<div>
<a class="NoLink unselectable"><span>Shop by Brand</span></a>
<div class="navCol subMenus">
<div>
<span>blah</span>
I have tried a number of XPath expressions, but none seem to bring up just the subcategories. Thank you for any help you can provide.

Watir: How to retrieve all HTML elements that match an attribute? (class, id, title, etc)

I have a page that is dynamically created and displays a list of products with their prices. Since it's dynamic, the same code is reused to create each product's information, so they share the tags and same classes. For instance:
<div class="product">
<div class="name">Product A</div>
<div class="details">
<span class="description">Description A goes here...</span>
<span class="price">$ 180.00</span>
</div>
</div>
<div class="product">
<div class="name">Product B</div>
<div class="details">
<span class="description">Description B goes here...</span>
<span class="price">$ 43.50</span>
</div>
</div>
<div class="product">
<div class="name">Product C</div>
<div class="details">
<span class="description">Description C goes here...</span>
<span class="price">$ 51.85</span>
</div>
</div>
And so on.
What I need to do with Watir is retrieve the text inside every span with class="price"; in this example: $ 180.00, $ 43.50 and $ 51.85.
I've been playing around with something like this:
@browser.span(:class, 'price').each do |row|
but it is not working.
I'm just starting to use loops in Watir. Your help is appreciated. Thank you!
You can use pluralized methods for retrieving collections - use spans instead of span:
@browser.spans(:class => "price")
This retrieves a span collection object that behaves much like a Ruby array, so you can use Ruby's #each as you tried, but I would use #map in this situation:
texts = @browser.spans(:class => "price").map do |span|
span.text
end
puts texts
I would use the Symbol#to_proc trick to shorten that code even more:
texts = @browser.spans(:class => "price").map(&:text)
puts texts

using variables in HtmlXPathSelectors

I am using Scrapy and have run into a few places where it would be nice to use variables, but I can't figure out how. For example, if I have some long string, it would be nice to store it in a variable long_string and then select on it: hxs.select('//div[@id=long_string]').
I'm sure this is supported by Scrapy and I just can't figure it out as it wouldn't make sense for you to always have to hard-code the string in.
Update:
So for the sample text below I want to extract the div where id="footer":
<div id="footer">
<div id="footer-menu">
<div class="region-footer-menu">
<div id="block-menu-menu-footer-menu" class="block-menu">
<div class="content">
<ul class="menu">
<li class="first leaf">FAQs</li>
<li class="leaf">Media</li>
<li class="leaf">Partners</li>
<li class="last leaf active-trail">Jobs</li>
</ul>
</div>
</div>
<div id="block-block-52" class="block block-block">
<div class="content">
<p>SUPPORT</p>
</div>
</div>
</div>
</div>
</div>
We initialize hxs = HtmlXPathSelector(response) for all the below segments.
The following code selects only the first div:
hxs.select('//div[@id=concat("foot","er")]')
This code selects nothing but gives no error:
hxs.select('//div[@id="foot"+"er"]')
Both of the below code segments select nothing and give no errors:
long_string = "foot"
hxs.select('//div[@id=concat(long_string,"er")]')
hxs.select('//div[@id=long_string]')
I would like to be able to do either of the bottom two methods and return the desired results.
Assuming + works for string concatenation in Scrapy, this should work:
hxs.select('//div[@id="' + long_string + '"]')
I'm not familiar with Scrapy, but I don't think you'll be able to select a div that doesn't exist.
Have you tried this?
hxs.select('//div[@id="' + long_string_variable + '"]')
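Both answers come down to the same point: the XPath engine never sees Python variables, so the value has to be put into the expression string before it is evaluated. A minimal sketch of that, assuming a reduced, hypothetical version of the footer markup and using the standard library's ElementTree instead of HtmlXPathSelector (find_div_by_id is an illustrative helper, not a Scrapy function):

```python
import xml.etree.ElementTree as ET

# Reduced, hypothetical version of the page from the question.
PAGE = """
<body>
  <div id="header"><p>TOP</p></div>
  <div id="footer">
    <div id="footer-menu"><p>SUPPORT</p></div>
  </div>
</body>
"""


def find_div_by_id(root, div_id):
    """Build the XPath string from a Python value, then evaluate it."""
    # Naive interpolation: reject quotes so the expression stays valid.
    if "'" in div_id:
        raise ValueError("div_id must not contain a single quote")
    return root.find(".//div[@id='{}']".format(div_id))


root = ET.fromstring(PAGE)
long_string = "foot"

# The concatenation happens in Python, before XPath is involved.
footer = find_div_by_id(root, long_string + "er")
# There is no div with id="foot", so this lookup finds nothing.
missing = find_div_by_id(root, long_string)

print(footer.get("id"), missing)
```

This also shows why the last two attempts in the question return nothing: inside the quoted expression, long_string is just part of the XPath text, and even when interpolated correctly, "foot" alone matches no id on the page.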

Resources