xpath - grab content only if preceding div has certain word - xpath

I want to grab the textual content of this span class but only on the condition the word 'Country' is used in the code before it:
<li itemscope="" itemtype="http://data-vocabulary.org/Breadcrumb"><a href="/testurl.html"
itemprop="url" onclick="ta.setEvtCookie('Breadcrumbs', 'click', 'Country', 2, this.href); ">
<span itemprop="title">China</span></a><img src="http://imagepath.gif" class="fake class"
alt="">
Does anyone know how I can do this?
To be clear, if the xpath query sees the word 'Country' I want it to return the word 'China'.

Instead of checking the preceding element, check the parent element, because in the sample markup the span sits inside the element that contains the word 'Country':
//span[parent::a[contains(@onclick,'Country')]]
The XPath above selects each <span> element whose parent <a> element has an onclick attribute containing 'Country'.
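
As a rough illustration of the parent check described above, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. ElementTree implements only a subset of XPath (no contains() and no parent:: axis), so the substring test is done in Python instead. The markup is a tidied, well-formed copy of the question's snippet, and the variable names are made up for this example.

```python
import xml.etree.ElementTree as ET

# Well-formed version of the breadcrumb markup from the question
# (the trailing <img> is dropped because it is not needed here).
snippet = """
<li itemscope="" itemtype="http://data-vocabulary.org/Breadcrumb">
  <a href="/testurl.html" itemprop="url"
     onclick="ta.setEvtCookie('Breadcrumbs', 'click', 'Country', 2, this.href);">
    <span itemprop="title">China</span>
  </a>
</li>
"""
root = ET.fromstring(snippet)

# For every <a> whose onclick attribute contains 'Country',
# collect the text of its direct <span> children.
titles = [
    span.text
    for a in root.iter("a")
    if "Country" in a.get("onclick", "")
    for span in a.findall("span")
]
print(titles)  # ['China']
```

With a full XPath 1.0 engine, the single expression from the answer does the same job in one step.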

Related

xPath - Why is this exact text selector not working with the data test id?

I have a block of code like so:
<ul class="open-menu">
<span>
<li data-testid="menu-item" class="menu-item option">
<svg>...</svg>
<div>
<strong>Text Here</strong>
<small>...</small>
</div>
</li>
<li data-testid="menu-item" class="menu-item option">
<svg>...</svg>
<div>
<strong>Text</strong>
<small>...</small>
</div>
</li>
</span>
</ul>
I'm trying to select a menu item based on exact text like so in the dev tools:
$x('.//*[contains(@data-testid, "menu-item") and normalize-space() = "Text"]');
But this doesn't seem to be selecting the element. However, when I do:
$x('.//*[contains(@data-testid, "menu-item")]');
I can see both of the menu items.
UPDATE:
It seems that this works:
$x('.//*[contains(@class, "menu-item") and normalize-space() = "Text"]');
Not sure why using a class in this context works and not a data-testid. How can I get my xpath selector to work with my data-testid?
Why is this exact text selector not working
The fact that both li elements are matched by the XPath expression
if omitting the condition normalize-space() = "Text" is a clue.
normalize-space() returns ... Text Here ... for the first li
in the posted XML and ... Text ... for the second (or some other
content in place of ... from div/svg or div/small) causing
normalize-space() = "Text" to fail.
In an update you say the same condition succeeds. This has nothing to
do with using @class instead of @data-testid; it must be triggered
by some content change.
How can I get my xpath selector to work with my data-testid?
By testing for an exact text match in the li's descendant strong
element,
.//*[@data-testid = "menu-item" and div/strong = "Text"]
which matches the second li. Making the test more robust is usually
in order, e.g.
.//*[contains(@data-testid,"menu-item") and normalize-space(div/strong) = "Text"]
Append /div/small or /descendant::small, for example, to the XPath
expression to extract just the small text.
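
To make the difference concrete, here is a small Python sketch using the standard library's xml.etree.ElementTree. It compares the normalized text of each whole li with the text of its div/strong descendant. The markup is a simplified stand-in for the question's menu (the svg is omitted and the small contents "first" and "second" are invented), and normalize_space is a hypothetical helper that approximates the XPath function of the same name.

```python
import xml.etree.ElementTree as ET

snippet = """
<ul class="open-menu">
  <li data-testid="menu-item" class="menu-item option">
    <div>
      <strong>Text Here</strong>
      <small>first</small>
    </div>
  </li>
  <li data-testid="menu-item" class="menu-item option">
    <div>
      <strong>Text</strong>
      <small>second</small>
    </div>
  </li>
</ul>
"""
root = ET.fromstring(snippet)


def normalize_space(element):
    """Approximate XPath normalize-space() over an element's full text."""
    return " ".join("".join(element.itertext()).split())


items = [li for li in root.iter("li") if li.get("data-testid") == "menu-item"]

# The whole-element text includes the <small> content, so it never equals "Text".
whole = [normalize_space(li) for li in items]

# Testing only the div/strong descendant picks out the second item.
matches = [
    li for li in items
    if (li.findtext("div/strong") or "").strip() == "Text"
]
print(whole)
print(len(matches), matches[0].findtext("div/small"))
```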
data-testid="menu-item" matches both outer li elements, while the text you are looking for is inside the inner strong element.
So, to locate the outer li element based on its data-testid attribute value and the text of its inner strong element, you can use an XPath expression like this:
//*[contains(@data-testid, "menu-item") and .//normalize-space() = "Text"]
Or
.//*[contains(@data-testid, "menu-item") and .//*[normalize-space() = "Text"]]
I have tested both expressions, and they work correctly. Note that the first uses a function call as a path step, which is XPath 2.0+ syntax; an XPath 1.0 engine needs the second form.

XPATH - grab content of div after named element

There are a number of labels, I want to specify them in xpath and then grab the text after them, example:
<div class="info-row">
<div class="info-label"><span>Variant:</span></div>
<div class="info-content">
<p>750 ml</p>
</div>
</div>
So in this case, I want to say "after the span named 'Variant:', grab the p tag":
Result: 750 ml
I tried:
//span[text()='Variant:']/following-sibling::p
and variations of this but to no avail.
The following-sibling axis selects all siblings after the current node.
The span with the text 'Variant:' has no siblings, so you need to search the siblings of the span's parent div instead.
Here is an example that will work:
//span[text()='Variant:']/ancestor::div[@class="info-label"]/following-sibling::div/p
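
The same lookup can be sketched in Python with the standard library's xml.etree.ElementTree. ElementTree has no following-sibling axis, so this sketch swaps in a different but equivalent approach for this markup: find the row whose label span has the wanted text, then read the p inside that row's content div. The function name value_for_label is an assumption made for this example.

```python
import xml.etree.ElementTree as ET

snippet = """
<div class="info-row">
  <div class="info-label"><span>Variant:</span></div>
  <div class="info-content">
    <p>750 ml</p>
  </div>
</div>
"""
root = ET.fromstring(snippet)


def value_for_label(root, label):
    """Return the <p> text of the row whose label span equals `label`."""
    # root.iter() also yields the root itself, so a top-level row is covered.
    for row in root.iter("div"):
        if row.findtext("div[@class='info-label']/span") == label:
            return row.findtext("div[@class='info-content']/p")
    return None


print(value_for_label(root, "Variant:"))  # 750 ml
```

Starting from the shared row element avoids depending on the exact order of the label and content divs.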

What is Valid Xpath for link extract by div class name?

What is Valid Xpath for link extract by div class name?
Here is html code:
<div class="poster">
<a href="/title/tt2091935/mediaviewer/rm4278707200?ref_=tt_ov_i"> <img alt="Mr. Right Poster" title="Mr. Right Poster" src="http://ia.media-imdb.com/images/M/MV5BOTcxNjUyOTMwOV5BMl5BanBnXkFtZTgwMzUxMDk4NzE@._V1_UX182_CR0,0,182,268_AL_.jpg" itemprop="image">
</a> </div>
I want to know the exact XPath that returns the href link.
I tried //a/@href[@class='poster'] but it doesn't work.
The <div> contains the <a> so you can use that to navigate:
//div[@class='poster']/a/@href
Remember that the "poster" class is defined on the <div> not on the <a> so that's where you need to apply the predicate.
//div returns all <div> elements
[@class='poster'] is a predicate that filters by class
/a returns all <a> elements that are children of those <div>s
/@href gives us the attribute we want
Depending on the system you're using, you might need to wrap the whole expression in string() to get the attribute's value as a string rather than an attribute node.
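
Here is a minimal sketch of those steps in Python, using the standard library's xml.etree.ElementTree. ElementTree's XPath subset cannot select attributes directly, so the path stops at the <a> element and the href is read with get(). The wrapping <body>, the second div, and the self-closed <img> are additions made only so the sample parses as XML and shows the filtering.

```python
import xml.etree.ElementTree as ET

page = """
<body>
  <div class="poster">
    <a href="/title/tt2091935/mediaviewer/rm4278707200?ref_=tt_ov_i">
      <img alt="Mr. Right Poster" title="Mr. Right Poster" itemprop="image"/>
    </a>
  </div>
  <div class="other"><a href="/ignored">not a poster</a></div>
</body>
"""
root = ET.fromstring(page)

# The predicate sits on the <div>, then /a steps down to its child links.
hrefs = [a.get("href") for a in root.findall(".//div[@class='poster']/a")]
print(hrefs)  # ['/title/tt2091935/mediaviewer/rm4278707200?ref_=tt_ov_i']
```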

Xpath get text of nested item not working but css does

I'm making a crawler with Scrapy and wondering why my xpath doesn't work when my CSS selector does? I want to get the number of commits from this html:
<li class="commits">
<a data-pjax="" href="/samthomson/flot/commits/master">
<span class="octicon octicon-history"></span>
<span class="num text-emphasized">
521
</span>
commits
</a>
</li>
Xpath:
response.xpath('//li[@class="commits"]//a//span[@class="text-emphasized"]//text()').extract()
CSS:
response.css('li.commits a span.text-emphasized').css('::text').extract()
CSS returns the number (unescaped), but XPath returns nothing. Am I using the // for nested elements correctly?
The span's class attribute is "num text-emphasized", and @class="text-emphasized" compares against the whole attribute value, so it doesn't match. Use the contains function to test for text-emphasized alone:
response.xpath('//li[@class="commits"]//a//span[contains(@class, "text-emphasized")]//text()').extract()[0].strip()
Otherwise also include num:
response.xpath('//li[@class="commits"]//a//span[@class="num text-emphasized"]//text()').extract()[0].strip()
Also, I use extract()[0] to retrieve the first string returned by XPath and strip() to remove the surrounding whitespace, resulting in just the number.
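
The whole-value comparison can be reproduced outside Scrapy. The sketch below uses the standard library's xml.etree.ElementTree rather than Scrapy's selectors, so it is an analogy under that assumption and not the crawler code itself. It shows that an exact @class test on one token finds nothing, while the full value or a per-token check finds the span.

```python
import xml.etree.ElementTree as ET

snippet = """
<li class="commits">
  <a data-pjax="" href="/samthomson/flot/commits/master">
    <span class="octicon octicon-history"></span>
    <span class="num text-emphasized">
      521
    </span>
    commits
  </a>
</li>
"""
root = ET.fromstring(snippet)

# Exact comparison against a single token: the attribute value is
# "num text-emphasized", so nothing matches.
exact = root.findall(".//span[@class='text-emphasized']")

# Exact comparison against the full attribute value matches.
full = root.findall(".//span[@class='num text-emphasized']")

# Token-wise check, similar in spirit to the CSS selector span.text-emphasized.
by_token = [
    span for span in root.iter("span")
    if "text-emphasized" in span.get("class", "").split()
]

print(len(exact), full[0].text.strip(), by_token[0].text.strip())
```

Note that contains(@class, "text-emphasized") is a substring test, so it would also match a class such as "text-emphasized-2"; the token-wise check does not.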

How to write the single xpath when the text is in two lines

How to write the single xpath for this
<div class="col-lg-4 col-md-4 col-sm-4 profilesky"> <div class="career_icon">
<span> Boost </span> <br/>
Your Profile </div>
I can match each line separately using contains():
.//*[contains(text(),'Boost')]
.//*[contains(text(),'Your Profile')]
But I want a single XPath expression that matches both.
You can try it this way:
.//*[@class='career_icon' and contains(., 'Boost') and contains(., 'Your Profile')]
The XPath above checks for an element whose class attribute equals career_icon and whose body contains both the Boost and Your Profile texts.
Note that text() only looks at direct child text nodes. To check the entire text content of an element, simply use a dot (.).
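
The distinction between a single direct text node and the full text content can be seen in a short Python sketch with the standard library's xml.etree.ElementTree. Here div.text (the text before the first child) stands in for what contains(text(), ...) ends up testing in XPath 1.0, and itertext() stands in for the dot. That mapping is an approximation chosen for illustration.

```python
import xml.etree.ElementTree as ET

snippet = """
<div class="career_icon">
<span> Boost </span> <br/>
Your Profile </div>
"""
div = ET.fromstring(snippet)

# Text directly inside the div before its first child: only whitespace here.
first_text = div.text or ""

# Full text content of the div, including the span and the text after <br/>.
full_text = "".join(div.itertext())

print("Boost" in first_text)                                # False
print("Boost" in full_text and "Your Profile" in full_text)  # True
```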
You can combine several predicates just by writing them one after another, since they apply to the same element. Use . rather than text() so that the whole text content of the element is tested:
.//*[contains(., 'Boost')][contains(., 'Your Profile')]
