XPath query to select all href attributes of <a> tags whose 'class' attribute equals a specified string - xpath

I don't know why the following query doesn't work:
//a/@href[@class='specified_string']

Try it the other way round:
//a[@class='specified_string']/@href
After all, class is an attribute of the <a> element, not an attribute of the href attribute.

An attribute cannot have attributes. Only elements can have attributes.
The original XPath expression:
//a/@href[@class='specified_string']
selects any href attribute of any a element, such that the href attribute has an attribute class whose value is 'specified_string'.
What you want is:
//a[@class='specified_string']/@href
that is: the href attribute of any a element that has a class attribute with value 'specified_string'.
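
To make the difference concrete, here is a minimal sketch in Python using the standard-library xml.etree.ElementTree module. The sample markup and the variable names are invented for illustration. ElementTree implements only a subset of XPath and cannot return attribute nodes, so the trailing /@href step is replaced by a call to .get('href') on each matched element.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample markup, invented for this illustration.
doc = ET.fromstring(
    "<body>"
    "<a class='specified_string' href='/one'>one</a>"
    "<a class='other' href='/two'>two</a>"
    "<a class='specified_string' href='/three'>three</a>"
    "</body>"
)

# The predicate belongs to the <a> element, because class is an attribute
# of <a>. ElementTree cannot select attribute nodes, so .get('href')
# stands in for the final /@href step of the full XPath expression.
hrefs = [a.get("href") for a in doc.findall(".//a[@class='specified_string']")]
print(hrefs)
```

With a full XPath engine the single expression //a[@class='specified_string']/@href does both steps at once.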

You basically say that you are looking for an attribute named href, whose attribute (this is the error) class should be equal to specified_string.
But you need to find the attribute href of an element a, whose attribute class is specified_string.
(ndim's answer overlapped mine)

There is no class attribute present in my anchor tag; I have href only. It is identified using //*[@href='value'], but //*a[@href='value'] is not working.

Related

select element based on class and attribute value

I'm trying to use Xpath in order to select an HTML tag based on its value
Here is my html code:
<span class="yellowbird">Continue</span>
<span class="yellowbird">Stop</span>
I can select the span elements with a specific class value using
//span[contains(@class, 'yellowbird')]
However I'm struggling to select only the element which contains the value "Continue"
This XPath expression will select any span element whose class attribute equals yellowbird and text equals Continue:
//span[@class='yellowbird' and text()='Continue']
Here is the syntax I used to make this work using request.xpath and scrapy
//span[contains(@class, 'yellowbird')][1]//text()='Continue'
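
As a rough illustration of combining a class test with a text test, the sketch below uses Python's standard-library xml.etree.ElementTree on the two span elements from the question, wrapped in a hypothetical <div> so the fragment has a single root. ElementTree supports neither contains() nor text(), so this is an approximation: the class is matched exactly, and the [.='Continue'] predicate (available from Python 3.7) compares the element's complete text.

```python
import xml.etree.ElementTree as ET

# The two spans from the question, wrapped in an assumed <div> root.
doc = ET.fromstring(
    "<div>"
    "<span class='yellowbird'>Continue</span>"
    "<span class='yellowbird'>Stop</span>"
    "</div>"
)

# Two chained predicates: exact class match, then exact text match.
# [.='Continue'] requires Python 3.7 or later.
matches = doc.findall(".//span[@class='yellowbird'][.='Continue']")
texts = [span.text for span in matches]
print(texts)
```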

XPath selecting specific child element

I have a problem with XPath syntax in HTML. I want to select an item that is inside a div.
I have a div identified by the id "popin".
In this div, I have a span whose id is "id_yes".
I can get the div with //DIV[contains(@id ,'popin')], but I failed to get the span element.
Do you have a solution?
If you have the ID, you can use:
//span[@id="id_yes"]
If you want to be more specific, //div[@id="popin"]/span[@id="id_yes"]
That is, assuming your IDs are unique.
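
A small sketch of both variants, using Python's standard-library xml.etree.ElementTree. The markup is a guess at what the page might look like (only the two ids come from the question), and it assumes the span is a direct child of the div.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup; only the ids "popin" and "id_yes" come from the question.
doc = ET.fromstring(
    "<body>"
    "<div id='popin'>"
    "<p>Confirm?</p>"
    "<span id='id_yes'>Yes</span>"
    "<span id='id_no'>No</span>"
    "</div>"
    "</body>"
)

# Look the span up by its id alone...
by_id = doc.find(".//span[@id='id_yes']")

# ...or anchor the search on the enclosing div first.
by_path = doc.find(".//div[@id='popin']/span[@id='id_yes']")

print(by_id.text, by_path.text)
```

If the ids are unique, both lookups return the same element.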

XPath - Nested path scraping

I'm trying to scrape the HTML of a webpage. I'd like to fetch the three alternate texts (the alt attributes, highlighted) from the three "img" elements.
I'm using the following code to extract the whole "img" element of slide-1.
from lxml import html
import requests
page = requests.get('sample.html')
tree = html.fromstring(page.content)
text_val = tree.xpath('//a[class="cover-wrapper"][id = "slide-1"]/text()')
print text_val
The alternate text values are not displayed; I get an empty list instead.
HTML Script used:
This is one possible XPath:
//div[@id='slide-1']/a[@class='cover-wrapper']/img/@alt
Explanation:
//div[@id='slide-1'] : This part finds the target <div> element by comparing the id attribute value. Notice the use of the @attribute_name syntax to reference an attribute in XPath. Omitting the @ symbol would change the meaning of the selector to reference a child element with the same name instead of an attribute.
/a[@class='cover-wrapper'] : from each <div> element found by the previous part of the XPath, find the child <a> elements whose class attribute equals 'cover-wrapper'
/img/@alt : then from each such <a> element, find the child <img> element and return its alt attribute
You might want to change the id filter to starts-with(@id,'slide-') if you meant to return all 3 alt attributes in the screenshot.
Try this:
//a[@class="cover-wrapper"]/img/@alt
So, I first select the a element whose class is cover-wrapper, then its child img element, and then the alt attribute of that img.
To find the whole image element:
//a[@class="cover-wrapper"]/img
I think you want:
//div[@class="showcase-wrapper"][@id="slide-1"]/a/img/@alt
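
The page's HTML is not reproduced here, so the sketch below runs against hypothetical markup that merely follows the structure described in the answers (a div per slide, an a with class cover-wrapper, and an img inside it). It uses Python's standard-library xml.etree.ElementTree, which cannot return attribute nodes, so the final /@alt step is replaced by .get('alt').

```python
import xml.etree.ElementTree as ET

# Assumed structure; the alt texts and file names are invented.
doc = ET.fromstring(
    "<section>"
    "<div class='showcase-wrapper' id='slide-1'>"
    "<a class='cover-wrapper' href='/one'><img src='a.png' alt='First'/></a>"
    "</div>"
    "<div class='showcase-wrapper' id='slide-2'>"
    "<a class='cover-wrapper' href='/two'><img src='b.png' alt='Second'/></a>"
    "</div>"
    "<div class='showcase-wrapper' id='slide-3'>"
    "<a class='cover-wrapper' href='/three'><img src='c.png' alt='Third'/></a>"
    "</div>"
    "</section>"
)

# Restrict the search to slide-1, then walk div -> a -> img.
slide_1_path = ".//div[@id='slide-1']/a[@class='cover-wrapper']/img"
slide_1_alts = [img.get("alt") for img in doc.findall(slide_1_path)]

# Drop the id filter to collect the alt text of every slide.
all_alts = [img.get("alt") for img in doc.findall(".//a[@class='cover-wrapper']/img")]

print(slide_1_alts)
print(all_alts)
```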

Retrieving a parent tag with a given attribute that contains a subelement by using XPath

How can I retrieve multiple divs (with a given class attribute "a") that contain a span tag with a class attribute "b" by using XPath?
<div class='a'>
<span class='b'/>
</div>
The structure of my XML is not defined so basically the span could be at any level of the div and the div itself could be at any level of the XML tree.
This should work:
//div[@class='a'][span/@class='b']
// means search anywhere if it starts the expression.
If the span is nested deeper in the div, use descendant::, which can be shortened to // again (strictly, // abbreviates /descendant-or-self::node()/):
//div[@class='a'][.//span/@class='b']
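
The difference between the child test and the descendant test can be sketched in Python with the standard-library xml.etree.ElementTree. ElementTree does not support nested path predicates such as [span/@class='b'], so this approximation selects the candidate divs with one path and applies the span test with a second, relative path. The id attributes are invented labels so the results are easy to read.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; the id values are labels added for this example.
doc = ET.fromstring(
    "<root>"
    "<div class='a' id='direct'><span class='b'/></div>"
    "<div class='a' id='nested'><p><span class='b'/></p></div>"
    "<div class='a' id='none'><span class='c'/></div>"
    "<div class='x' id='wrong'><span class='b'/></div>"
    "</root>"
)

candidates = doc.findall(".//div[@class='a']")

# span must be a direct child of the div (mirrors [span/@class='b']).
child_only = [
    d.get("id") for d in candidates
    if d.find("span[@class='b']") is not None
]

# span may sit at any depth below the div (mirrors [.//span/@class='b']).
any_depth = [
    d.get("id") for d in candidates
    if d.find(".//span[@class='b']") is not None
]

print(child_only)
print(any_depth)
```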

Use Nokogiri to get all nodes in an element that contain a specific attribute name

I'd like to use Nokogiri to extract all nodes in an element that contain a specific attribute name.
e.g., I'd like to find the 2 nodes that contain the attribute "blah" in the document below.
@doc = Nokogiri::HTML::DocumentFragment.parse <<-EOHTML
<body>
<h1 blah="afadf">Three's Company</h1>
<div>A love triangle.</div>
<b blah="adfadf">test test test</b>
</body>
EOHTML
I found this suggestion (below) at this website: http://snippets.dzone.com/posts/show/7994, but it doesn't return the 2 nodes in the example above. It returns an empty array.
# get elements with attribute:
elements = @doc.xpath("//*[@*[blah]]")
Thoughts on how to do this?
Thanks!
I found this here
elements = @doc.xpath("//*[@*[blah]]")
This is not a useful XPath expression. It says to give you all elements that have attributes that have child elements named 'blah'. And since attributes can't have child elements, this XPath will never return anything.
The DZone snippet is confusing in that when they say
elements = @doc.xpath("//*[@*[attribute_name]]")
the inner square brackets are not literal... they're there to indicate that you put in the attribute name. Whereas the outer square brackets are literal. :-p
They also have an extra * in there, after the @.
What you want is
elements = @doc.xpath("//*[@blah]")
This will give you all the elements that have an attribute named 'blah'.
You can use CSS selectors:
elements = @doc.css "[blah]"
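
The [@blah] predicate is not specific to Nokogiri. As a cross-check, here is a minimal sketch of the same query in Python with the standard-library xml.etree.ElementTree, run against the markup from the question (the variable names are arbitrary).

```python
import xml.etree.ElementTree as ET

# The same markup as in the question.
doc = ET.fromstring(
    "<body>"
    "<h1 blah='afadf'>Three's Company</h1>"
    "<div>A love triangle.</div>"
    "<b blah='adfadf'>test test test</b>"
    "</body>"
)

# [@blah] keeps every element that has a blah attribute, whatever its value.
elements = doc.findall(".//*[@blah]")
tags = [element.tag for element in elements]
print(tags)
```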
