I'm trying to use XPath to select an HTML element based on its text value.
Here is my HTML code:
<span class="yellowbird">Continue</span>
<span class="yellowbird">Stop</span>
I can select the span elements with a specific class value using
//span[contains(@class, 'yellowbird')]
However, I'm struggling to select only the element that contains the value "Continue".
This XPath expression will select any span element whose class attribute equals yellowbird and whose text equals Continue:
//span[@class='yellowbird' and text()='Continue']
Here is the syntax I used to make this work with response.xpath in Scrapy:
//span[contains(@class, 'yellowbird')][1]//text()='Continue'
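As a stdlib-only sketch of the class-plus-text selection, the snippet below uses Python's xml.etree.ElementTree. It implements only a small XPath subset (no contains() or text()), so the test is written as two chained predicates: [@class='yellowbird'] for the attribute and [.='Continue'] for the element's complete text content (Python 3.7+). The wrapping root element is an assumption added so the two spans parse as one document.

```python
import xml.etree.ElementTree as ET

# The two spans from the question, wrapped in a root element so they parse
# as a single XML document.
snippet = """
<root>
<span class="yellowbird">Continue</span>
<span class="yellowbird">Stop</span>
</root>
"""
root = ET.fromstring(snippet)

# [@class='yellowbird'] tests the attribute; [.='Continue'] tests the
# element's complete text content. Chained predicates act like "and".
matches = root.findall(".//span[@class='yellowbird'][.='Continue']")

for span in matches:
    print(span.text)  # Continue
```

In a full XPath 1.0 engine (lxml, Scrapy, the browser) the single predicate with and from the answer is the more direct form.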
I have a block of code like so:
<ul class="open-menu">
<span>
<li data-testid="menu-item" class="menu-item option">
<svg>...</svg>
<div>
<strong>Text Here</strong>
<small>...</small>
</div>
</li>
<li data-testid="menu-item" class="menu-item option">
<svg>...</svg>
<div>
<strong>Text</strong>
<small>...</small>
</div>
</li>
</span>
</ul>
I'm trying to select a menu item based on exact text like so in the dev tools:
$x('.//*[contains(@data-testid, "menu-item") and normalize-space() = "Text"]');
But this doesn't seem to be selecting the element. However, when I do:
$x('.//*[contains(@data-testid, "menu-item")]');
I can see both of the menu items.
UPDATE:
It seems that this works:
$x('.//*[contains(@class, "menu-item") and normalize-space() = "Text"]');
Not sure why using a class in this context works and not a data-testid. How can I get my xpath selector to work with my data-testid?
Why is this exact text selector not working?
The fact that both li elements are matched by the XPath expression when the condition normalize-space() = "Text" is omitted is a clue.
Called without an argument, normalize-space() works on the string value of the context node, i.e. all of the li's descendant text. It returns ... Text Here ... for the first li in the posted XML and ... Text ... for the second (or some other content in place of ..., from div/svg or div/small), causing normalize-space() = "Text" to fail.
In an update you say the same condition succeeds. This has nothing to do with using @class instead of @data-testid; it must be triggered by some content change.
How can I get my xpath selector to work with my data-testid?
By testing for an exact text match in the li's descendant strong
element,
.//*[@data-testid = "menu-item" and div/strong = "Text"]
which matches the second li. Making the test more robust is usually
in order, e.g.
.//*[contains(@data-testid,"menu-item") and normalize-space(div/strong) = "Text"]
Append /div/small or /descendant::small, for example, to the XPath
expression to extract just the small text.
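A stdlib-only sketch of both the diagnosis and the fix, using Python's xml.etree.ElementTree on the menu markup from the question (the ... placeholders are kept as literal text). ElementTree has no normalize-space() and no and inside predicates, so normalize_space below is a hypothetical helper that approximates XPath's normalize-space() on an element, and the fix is written as div[strong='Text']/.. (find the div whose strong child is exactly "Text", then step up to its li) rather than the exact expression above.

```python
import xml.etree.ElementTree as ET

# Menu markup from the question; "..." stands in for the real svg/small content.
snippet = """
<ul class="open-menu">
<span>
<li data-testid="menu-item" class="menu-item option">
<svg>...</svg>
<div>
<strong>Text Here</strong>
<small>...</small>
</div>
</li>
<li data-testid="menu-item" class="menu-item option">
<svg>...</svg>
<div>
<strong>Text</strong>
<small>...</small>
</div>
</li>
</span>
</ul>
"""
root = ET.fromstring(snippet)


def normalize_space(elem):
    """Hypothetical helper approximating XPath normalize-space() on an element:
    concatenate all descendant text, then collapse runs of whitespace."""
    return " ".join("".join(elem.itertext()).split())


items = root.findall(".//li[@data-testid='menu-item']")
for li in items:
    # The string value includes the svg/small text, so it never equals "Text".
    print(repr(normalize_space(li)))  # '... Text Here ...' then '... Text ...'

# Fix: test the strong element's text, then step back up to the li with "..".
selected = root.findall(".//li[@data-testid='menu-item']/div[strong='Text']/..")
print(len(selected))  # 1

# Extract just the small text of the matched item.
print(selected[0].find("div/small").text)  # ...
```

Note that [strong='Text'] is an exact comparison without whitespace normalization, which is why the XPath answer's normalize-space(div/strong) form is the more robust choice in a real XPath engine.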
data-testid="menu-item" matches both outer li elements, while the text content you are looking for is inside the inner strong element.
So, to locate the outer li element based on its data-testid attribute value and its inner strong element's text, you can use an XPath expression like this:
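//*[contains(@data-testid, "menu-item") and .//normalize-space() = "Text"]
Or
.//*[contains(@data-testid, "menu-item") and .//*[normalize-space() = "Text"]]
I have tested both expressions. Note that the first one uses a function call as a path step (.//normalize-space()), which is only valid in XPath 2.0 and later; in an XPath 1.0 engine such as the browser's $x or lxml, use the second form.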
I have a problem with XPath syntax in HTML. I want to select an element that is inside a div.
I have a div defined by the id "popin".
Inside this div there is a span whose id is "id_yes".
I can get the div with //DIV[contains(@id ,'popin')], but I fail to get the span element.
Do you have a solution?
If you have the ID, you can use:
//span[@id="id_yes"]
If you want to be more specific: //div[@id="popin"]/span[@id="id_yes"]
This assumes your IDs are unique.
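The question does not show its HTML, so the markup below is hypothetical: a div with id "popin" containing a span with id "id_yes". This Python sketch (stdlib xml.etree.ElementTree) shows the answer's direct-child path, plus a // variant that still works if the span is nested deeper inside the div.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup matching the question's description.
snippet = """
<body>
<div id="popin">
<p>Are you sure?</p>
<span id="id_yes">Yes</span>
</div>
<span id="other">Elsewhere</span>
</body>
"""
root = ET.fromstring(snippet)

# Direct child of the div, as in //div[@id="popin"]/span[@id="id_yes"].
child = root.find(".//div[@id='popin']/span[@id='id_yes']")

# Any descendant of the div, in case the span sits inside another element.
descendant = root.find(".//div[@id='popin']//span[@id='id_yes']")

print(child.text, descendant.text)  # Yes Yes
```

With a single slash the span must be an immediate child of the div; the double slash trades that precision for tolerance of extra wrapper elements.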
I want to select the div with class "bmBidderButtonText" and with "Low" as inner text, what should I do?
<div class="bmBidderButtonText"><div class="bmBidderButtonArrow"></div>Low</div>
<div class="bmBidderButtonText"><div class="bmBidderButtonArrow"></div>High</div>
//div[@class="bmBidderButtonText"] alone selects both divs, so how do I add the inner text "Low" as a condition in the XPath?
You can use . to reference the current context element, so implementing the additional criterion "...and with 'Low' as inner text" is as simple as adding and .='Low' to the predicate of your initial XPath:
//div[@class="bmBidderButtonText" and .="Low"]
Try the XPath below:
//div[@class='bmBidderButtonText'][text() ='Low']
Explanation: use the class attribute of the <div> tag together with text().
Use and:
//div[@class="bmBidderButtonText" and contains(., "Low")]
You can use contains() for this:
//div[contains(text(), 'Low')]
Additional resources:
Choosing Effective XPaths
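To see why .='Low' works here, this Python sketch (stdlib xml.etree.ElementTree, wrapped in an assumed root element) applies the equivalent chained predicates. It also shows where the text "Low" actually lives: it is not text of the outer div before its child but text after the empty arrow div, which ElementTree stores as that child's tail.

```python
import xml.etree.ElementTree as ET

# The two buttons from the question, wrapped so they parse as one document.
snippet = """
<root>
<div class="bmBidderButtonText"><div class="bmBidderButtonArrow"></div>Low</div>
<div class="bmBidderButtonText"><div class="bmBidderButtonArrow"></div>High</div>
</root>
"""
root = ET.fromstring(snippet)

# [.='Low'] compares the complete text content; the empty arrow div adds
# nothing, so only the first button matches.
low = root.findall(".//div[@class='bmBidderButtonText'][.='Low']")

# The outer div has no leading text (.text is None); "Low" is the tail
# of its inner arrow div.
print(len(low), low[0].text, low[0][0].tail)  # 1 None Low
```

In XPath terms, "Low" is still a text-node child of the outer div, which is why text()='Low' works as well.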
I'm trying to scrape a webpage. I'd like to fetch the three alternate texts (the alt attributes, highlighted in the screenshot) from the three "img" elements.
I'm using the following code to extract the whole "img" element of slide-1.
from lxml import html

# Parse the saved page from disk (requests.get() needs an http(s) URL, not a file name)
with open('sample.html', 'rb') as f:
    tree = html.fromstring(f.read())

text_val = tree.xpath('//a[class="cover-wrapper"][id = "slide-1"]/text()')
print(text_val)
I'm not getting the alternate text values; the result is an empty list.
HTML Script used:
This is one possible XPath :
//div[@id='slide-1']/a[@class='cover-wrapper']/img/@alt
Explanation :
//div[@id='slide-1'] : This part finds the target <div> element by comparing the id attribute value. Notice the use of the @attribute_name syntax to reference an attribute in XPath. Omitting the @ symbol changes the selector's meaning: it then references a child element with that name instead of an attribute.
/a[@class='cover-wrapper'] : from each <div> element found by the previous part of the XPath, find the child <a> element whose class attribute equals 'cover-wrapper'
/img/@alt : then, from each of those <a> elements, find the child <img> element and return its alt attribute
You might want to change the id filter to starts-with(@id,'slide-') if you meant to return all three alt attributes shown in the screenshot.
Try this:
//a[@class="cover-wrapper"]/img/@alt
This first selects the a element whose class is cover-wrapper, then its img child, and then the img's alt attribute.
To select the whole img element instead of its alt attribute, drop the /@alt step:
//a[@class="cover-wrapper"]/img
I think you want:
//div[@class="showcase-wrapper"][@id="slide-1"]/a/img/@alt
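The question's HTML was only given as a screenshot, so the markup in this sketch is hypothetical, reconstructed from the XPaths in the answers (three showcase-wrapper divs with ids slide-1 to slide-3, each holding an a.cover-wrapper and an img). It uses Python's stdlib xml.etree.ElementTree to show three things: why the question's class="..." test without @ matches nothing, how to read the alt values (ElementTree paths cannot end in /@alt, so the attribute is read with .get()), and a Python-side stand-in for starts-with(@id,'slide-'), which ElementTree does not support.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup reconstructed from the answers' XPaths.
snippet = """
<body>
<div class="showcase-wrapper" id="slide-1">
<a class="cover-wrapper"><img src="1.jpg" alt="First slide"/></a>
</div>
<div class="showcase-wrapper" id="slide-2">
<a class="cover-wrapper"><img src="2.jpg" alt="Second slide"/></a>
</div>
<div class="showcase-wrapper" id="slide-3">
<a class="cover-wrapper"><img src="3.jpg" alt="Third slide"/></a>
</div>
</body>
"""
root = ET.fromstring(snippet)

# Without "@", [class='cover-wrapper'] looks for a child *element* named
# class, which does not exist, so the result is empty (as in the question).
wrong = root.findall(".//a[class='cover-wrapper']")

# Select the img elements, then read the alt attribute with .get().
imgs = root.findall(".//div[@id='slide-1']/a[@class='cover-wrapper']/img")
slide1_alts = [img.get("alt") for img in imgs]

# No starts-with() in ElementTree: filter the div ids in Python instead.
all_alts = [
    img.get("alt")
    for div in root.iter("div")
    if div.get("id", "").startswith("slide-")
    for img in div.findall("a[@class='cover-wrapper']/img")
]

print(wrong, slide1_alts, all_alts)
```

With lxml, which supports full XPath 1.0, tree.xpath() can return the alt strings directly from the /@alt expressions above.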
Is there an xpath way to select a given attribute value?
For example I have an html document and want to select only "?ms=669601" :
<input type="button" value="تفاصيل" onclick="xmlreqGET("?ms=669601","jm1x");">
In your simple example, you could simply select that portion of the onclick attribute in the only input:
substring(input/@onclick, 12, 10)
In more complicated documents, try selecting first by @value (or some other, possibly unique, criterion):
substring(//input[@value='تفاصيل']/@onclick, 12, 10)
Or by targeting the input that contains part of the desired substring:
substring(//input[contains(@onclick, 'xmlreqGET(')]/@onclick, 12, 10)
Selecting the input element itself if its onclick attribute contains the target string:
//input[contains(#onclick, '?ms=669601')]
Note: Your input is not valid XML, due to nested double-quotes.
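A sketch of what substring(..., 12, 10) extracts, in Python with the stdlib xml.etree.ElementTree. The input is rewritten as well-formed XML (inner quotes escaped as &quot;, element self-closed), and xpath_substring is a hypothetical helper that mirrors XPath's 1-based substring() for positive integer arguments only. Because fixed offsets break as soon as the onclick text changes, the sketch also shows a pattern-based alternative; in pure XPath 1.0 the analogous approach is substring-before(substring-after(//input/@onclick, 'xmlreqGET("'), '"').

```python
import re
import xml.etree.ElementTree as ET

# The question's input as well-formed XML: nested quotes escaped, tag self-closed.
snippet = """
<form>
<input type="button" value="تفاصيل" onclick="xmlreqGET(&quot;?ms=669601&quot;,&quot;jm1x&quot;);"/>
</form>
"""
root = ET.fromstring(snippet)

onclick = root.find(".//input[@value='تفاصيل']").get("onclick")


def xpath_substring(s, start, length):
    """Hypothetical helper: XPath substring() for positive integer arguments.
    XPath positions are 1-based, so shift start down by one for slicing."""
    return s[start - 1:start - 1 + length]


fixed = xpath_substring(onclick, 12, 10)

# Less brittle than fixed offsets: take the first quoted argument of xmlreqGET.
match = re.search(r'xmlreqGET\("([^"]*)"', onclick)
by_pattern = match.group(1) if match else None

print(fixed, by_pattern)  # ?ms=669601 ?ms=669601
```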