XPath selecting specific child element

I'm having a problem with XPath syntax in HTML. I want to select an element that is inside a div.
I have a div defined by an id: "popin".
In this div, I have a span whose id is "id_yes".
I can get the div with //DIV[contains(@id, 'popin')] but I fail to get the span element.
Do you have a solution?

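If you have the ID, you can use:
//span[@id="id_yes"]
If you want to be more specific, //div[@id="popin"]/span[@id="id_yes"]
That assumes your IDs are unique. Note that /span only matches a span that is a direct child of the div; if the span is nested deeper, use //div[@id="popin"]//span[@id="id_yes"] instead.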
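A minimal way to try both expressions outside a browser is Python's standard-library xml.etree.ElementTree, sketched below. Assumptions: the markup is a hypothetical reconstruction of the question's div and span, and because ElementTree implements only a subset of XPath 1.0 with relative paths, each leading // is written as .//.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup modelled on the question: a div with id "popin"
# containing a span with id "id_yes", plus an unrelated div for contrast.
HTML = """
<html><body>
  <div id="other"><span id="id_no">No</span></div>
  <div id="popin">
    <p>Confirm?</p>
    <span id="id_yes">Yes</span>
  </div>
</body></html>
"""

root = ET.fromstring(HTML.strip())

# Equivalent of //span[@id="id_yes"]: any span with that id, anywhere.
by_id = root.findall(".//span[@id='id_yes']")

# Equivalent of //div[@id="popin"]/span[@id="id_yes"]: the span must be a
# direct child of the div whose id is "popin".
scoped = root.findall(".//div[@id='popin']/span[@id='id_yes']")

print([s.text for s in by_id])
print([s.text for s in scoped])
```

Both lists contain the single "Yes" span here; the scoped form only differs when the same id (or a looser test) matches outside the popin div.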

Related

select element based on class and attribute value

I'm trying to use XPath to select an HTML tag based on its value.
Here is my HTML code:
<span class="yellowbird">Continue</span>
<span class="yellowbird">Stop</span>
I can select the span elements with a specific class value using
//span[contains(@class, 'yellowbird')]
However, I'm struggling to select only the element that contains the value "Continue".
This XPath expression will select any span element whose class attribute equals yellowbird and text equals Continue:
//span[@class='yellowbird' and text()='Continue']
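Here is the syntax I used to make this work with response.xpath in Scrapy:
//span[contains(@class, 'yellowbird')][1]//text()='Continue'
Note that this expression is a comparison, so it evaluates to true or false rather than selecting the span. To select the element itself, put the text test inside a predicate: //span[contains(@class, 'yellowbird')][text()='Continue']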
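The element-selecting form can be sketched with Python's standard-library ElementTree. Assumptions: the two spans are wrapped in a hypothetical root div so they parse as XML, and because ElementTree supports neither "and" inside a predicate nor the text() test, the condition is written as two chained predicates, with [.='Continue'] comparing the element's full text (available since Python 3.7).

```python
import xml.etree.ElementTree as ET

# The two spans from the question, wrapped in a root element so the
# snippet is well-formed XML.
HTML = """<div>
<span class="yellowbird">Continue</span>
<span class="yellowbird">Stop</span>
</div>"""

root = ET.fromstring(HTML)

# Chained predicates act like "and": first keep spans whose class is
# exactly "yellowbird", then keep those whose text equals "Continue".
matches = root.findall(".//span[@class='yellowbird'][.='Continue']")

print([(m.get("class"), m.text) for m in matches])
```

Only the "Continue" span survives both predicates; the "Stop" span passes the class test but fails the text test.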

XPath expression (Nokogiri) to get a tag's child element?

From my XML, I can get this:
<home>
<creditors>
<count>2</count>
</creditors>
</home>
Or even this:
<home>
<creditors>
<moreThan>2</moreThan>
</creditors>
</home>
Which XPath expression can I use to get "<count>2</count>" instead of only "2", or "<moreThan>2</moreThan>" instead of only "2"?
This XPath,
//creditors/count
will select all count child elements of all creditors elements in the XML document.
Update per OP's request in comments for a single XPath that selects both count and moreThan elements:
This XPath,
//creditors/*[self::count or self::moreThan]
will select all count or moreThan child elements of all creditors elements in the XML document.
Assuming that your XPath expression is OK, you just need to convert the element to a string:
doc.xpath("home/creditors/*").to_s
=> "<count>2</count>"
Check this with queries that return more than one element to make sure it is the behaviour you want.
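The same idea (select the element, then serialize it rather than reading its text) can be sketched in Python with the standard-library ElementTree, used here instead of Nokogiri. Assumptions: ElementTree has no self:: axis, so the single-expression answer is emulated by selecting .//creditors/* and filtering on the tag name in Python; the helper name creditor_children is hypothetical.

```python
import xml.etree.ElementTree as ET

# The two document shapes from the question, written without whitespace
# between tags: ElementTree serializes an element's tail text together
# with the element, so indentation would otherwise leak into the output.
DOCS = [
    "<home><creditors><count>2</count></creditors></home>",
    "<home><creditors><moreThan>2</moreThan></creditors></home>",
]

def creditor_children(xml_text):
    """Return the serialized count/moreThan children of every creditors element."""
    root = ET.fromstring(xml_text)
    # Stand-in for //creditors/*[self::count or self::moreThan]:
    # take every child of creditors, then keep only the two tag names.
    matches = [el for el in root.findall(".//creditors/*")
               if el.tag in ("count", "moreThan")]
    # tostring() returns the element's markup, not just its text "2".
    return [ET.tostring(el, encoding="unicode") for el in matches]

for doc in DOCS:
    print(creditor_children(doc))
```

Each call returns a list, so a document with several matching children yields several serialized elements instead of one concatenated string.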

XPath - Nested path scraping

I'm trying to perform HTML scraping of a webpage. I'd like to fetch the three alternate texts (alt, highlighted) from the three "img" elements.
I'm using the following code to extract the whole "img" element of slide-1:
from lxml import html
import requests
page = requests.get('sample.html')
tree = html.fromstring(page.content)
text_val = tree.xpath('//a[class="cover-wrapper"][id = "slide-1"]/text()')
print text_val
I'm not getting the alternate text values displayed; instead I get an empty list.
This is one possible XPath :
//div[@id='slide-1']/a[@class='cover-wrapper']/img/@alt
Explanation :
//div[@id='slide-1'] : this part finds the target <div> element by comparing the value of its id attribute. Notice the use of the @attribute_name syntax to reference an attribute in XPath. Missing the @ symbol changes the meaning of the selector: it then references a child element with that name instead of an attribute.
/a[@class='cover-wrapper'] : from each <div> element found by the previous part of the XPath, find the child element <a> whose class attribute value equals 'cover-wrapper'.
/img/@alt : then, from each such <a> element, find the child element <img> and return its alt attribute.
You might want to change the id filter to starts-with(@id, 'slide-') if you meant to return all 3 alt attributes in the screenshot.
Try this:
//a[@class="cover-wrapper"]/img/@alt
So I first select the a element whose class is cover-wrapper, then its img child, and then the alt attribute of that img.
To select the whole img element instead of only its alt attribute:
//a[@class="cover-wrapper"]/img
I think you want:
//div[@class="showcase-wrapper"][@id="slide-1"]/a/img/@alt
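The answers above can be checked with a small sketch using Python's standard-library ElementTree. Assumptions: the question's HTML was not included, so the three-slide markup below is hypothetical, built only from the class and id names in the answers; ElementTree cannot return attribute nodes or evaluate starts-with(), so the alt values are read with .get() and the id prefix is tested in Python.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup reconstructed from the class/id names in the answers.
HTML = """
<div class="showcase">
  <div class="showcase-wrapper" id="slide-1">
    <a class="cover-wrapper" href="#"><img src="1.jpg" alt="First cover"/></a>
  </div>
  <div class="showcase-wrapper" id="slide-2">
    <a class="cover-wrapper" href="#"><img src="2.jpg" alt="Second cover"/></a>
  </div>
  <div class="showcase-wrapper" id="slide-3">
    <a class="cover-wrapper" href="#"><img src="3.jpg" alt="Third cover"/></a>
  </div>
</div>
"""

root = ET.fromstring(HTML.strip())

# //div[@id='slide-1']/a[@class='cover-wrapper']/img/@alt
# Select the img elements, then read the alt attribute of each.
slide1_alts = [img.get("alt")
               for img in root.findall(".//div[@id='slide-1']/a[@class='cover-wrapper']/img")]

# Same path with starts-with(@id, 'slide-') emulated in Python:
# root.iter("div") walks every div in document order.
all_alts = [img.get("alt")
            for div in root.iter("div")
            if div.get("id", "").startswith("slide-")
            for img in div.findall("./a[@class='cover-wrapper']/img")]

print(slide1_alts)
print(all_alts)
```

With lxml, as used in the question, tree.xpath() accepts the full XPath 1.0 expressions from the answers directly, including /@alt and starts-with(), so this workaround is only needed with ElementTree.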

Need an XPath where the parent has multiple children, but I only need the parent's value

In the code below, the parent "div" has three children: "span", "script" and "span". But I need the value of the parent "div", which is "N/A". "N/A" is not in any attribute of the div; it is just the text of the parent "div".
<div class="ah-text-align-right ah-font-xsmall" style="">
<span id="_dcmanageinvestmentsportlet_WAR_ahdcmnginvportlet__FDROR_110hidden" style="display:none">
<script type="text/javascript">
<span class="ah-float-left">
N/A
</div>
To get the parent element, append a double dot .. (the parent step) to the child element's XPath.
To get the text of an element, you can use the XPath text() node test, but depending on the XPath implementation in the environment and code you use, it might be unavailable. Note that text() returns only the element's own text nodes (here, the one containing N/A, plus any whitespace between the children); the element's string value, for example string(.), also includes the text of all its descendants.
In your case, to find the parent of the span with class ah-float-left, the XPath would be something like:
//span[@class='ah-float-left']/..
To get the text of the parent, you'll need:
//span[@class='ah-float-left']/../text()
Note: looking up elements by class name may return a collection of elements, which in turn gives you a collection of parent elements and a collection of parent text nodes, which may not be what you want. I would recommend looking up the child element by id, since XHTML requires element ids to be unique. So an XPath for the parent div should rather look like this:
//span[@id='_dcmanageinvestmentsportlet_WAR_ahdcmnginvportlet__FDROR_110hidden']/..
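The parent step and the "own text only" behaviour of text() can be sketched with Python's standard-library ElementTree. Assumptions: the question's snippet has unclosed tags, so the markup below is a hypothetical well-formed version with placeholder child contents; ElementTree supports .. in paths but has no text() test, so the helper own_text_nodes (hypothetical name) rebuilds the element's own text nodes from .text and each child's .tail.

```python
import xml.etree.ElementTree as ET

# Hypothetical well-formed version of the question's markup; the child
# contents ("hidden", the script comment, "child text") are placeholders.
HTML = """<html><body>
<div class="ah-text-align-right ah-font-xsmall" style="">
  <span id="_dcmanageinvestmentsportlet_WAR_ahdcmnginvportlet__FDROR_110hidden" style="display:none">hidden</span>
  <script type="text/javascript">/* script body omitted */</script>
  <span class="ah-float-left">child text</span>
  N/A
</div>
</body></html>"""

root = ET.fromstring(HTML)

# //span[@class='ah-float-left']/.. : find the span, then step up to its parent.
parent = root.find(".//span[@class='ah-float-left']/..")

# //span[@id='...']/.. : the same parent, reached through the unique id.
SPAN_ID = "_dcmanageinvestmentsportlet_WAR_ahdcmnginvportlet__FDROR_110hidden"
same_parent = root.find(f".//span[@id='{SPAN_ID}']/..")

def own_text_nodes(elem):
    """Mimic elem/text(): the element's own text nodes, not its descendants' text."""
    # ElementTree stores text before the first child in elem.text and the
    # text following each child in that child's .tail.
    nodes = [elem.text] + [child.tail for child in elem]
    return [t for t in nodes if t is not None]

# Drop the whitespace-only nodes (indentation between the children).
value = " ".join(t.strip() for t in own_text_nodes(parent) if t.strip())

print(parent is same_parent)
print(value)
```

Both lookups reach the same div, and the collected value is "N/A" alone: the text inside the child span and script is excluded, which matches what text() (as opposed to string(.)) returns.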

XPath: exclude id from a div

I've got this XPath expression, which selects a div with the id product-details:
//DIV[@id='product-details']
Now the problem is that it also selects a div with the id price, like this: <div id="price"> £705</div>
Which expression excludes the line above from the container div?
Thanks
