I want to get a class name like the following:
class="hostHostGrid0_body"
The integer between hostHostGrid and _body can change, but everything else must appear exactly as shown, in that order.
How can I achieve this?
In XPath 1.0 you can use this:
//*[starts-with(@class,'hostHostGrid') and substring-after(@class,'_') = 'body']
to select any element whose class attribute holds just that one class. It matches elements in any context, so it will match all three elements below:
<div class="hostHostGrid0_body">
<span class="hostHostGrid123_body"/>
<b class="hostHostGrid1_body">xxx</b>
</div>
Limitations: it doesn't restrict what comes between 'hostHostGrid' and '_body' to a number. It can be anything, including spaces (e.g. it will also match class="hostHostGrid xyz abc_body").
This one allows for the class occurring among other classes:
//*[contains(substring-before(@class,'_body'),'hostHostGrid')]
It will match:
<div class="other-class hostHostGrid0_body">
<span class="hostHostGrid123_body other-class"/>
<b class="hostHostGrid1_body">xxx</b>
</div>
(it also has the same limitations - will match anything between 'hostHostGrid' and '_body')
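Both expressions above accept anything between 'hostHostGrid' and '_body'. If the digits-only restriction matters, one option is to post-filter in the host language. The following is a minimal sketch in Python, assuming the markup is well-formed enough for the standard-library ElementTree parser; the helper name has_grid_body_class and the extra <i> element are made up for illustration. ElementTree's own XPath subset has no starts-with() or contains(), so the check is done with a regular expression on each class token instead:

```python
import re
import xml.etree.ElementTree as ET

# Sample markup adapted from the examples above, plus one <i> element
# (added for illustration) whose class should NOT match.
SAMPLE = """
<div class="other-class hostHostGrid0_body">
  <span class="hostHostGrid123_body other-class"/>
  <b class="hostHostGrid1_body">xxx</b>
  <i class="hostHostGrid xyz abc_body"/>
</div>
"""

# One whitespace-separated class token: 'hostHostGrid', digits only, '_body'.
TOKEN = re.compile(r"hostHostGrid\d+_body")


def has_grid_body_class(element):
    """Return True if any class token looks like hostHostGrid<N>_body."""
    tokens = element.get("class", "").split()
    return any(TOKEN.fullmatch(token) for token in tokens)


root = ET.fromstring(SAMPLE)
# root.iter() visits the root element and all its descendants in document order.
matches = [el.tag for el in root.iter() if has_grid_body_class(el)]
print(matches)  # ['div', 'span', 'b']
```

The <i> element is skipped because none of its individual class tokens has the required shape.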
I would like to show an example.
This is how the page looks:
<a class="aclass">
<div class="divclass"></div>
<div id="innerclass">
<span class="spanclass">Hello</span>
</div>
</a>
<a class="aclass">
<div class="divclass"></div>
<div id="innerclass">
<span class="spanclass">Pick Delivery Location</span>
</div>
</a>
I want to select anchor tags that have a child (direct or non-direct) span that has the text 'Hello'.
Right now, I do something like this:
//a[@class='aclass'][div/span[text() = 'Hello']]
I want to be able to select without having to select direct children (div in this case), like this:
//a[@class='aclass'][//span[text() = 'Hello']]
However, the second one finds all the anchor tags with the class 'aclass' rather than the one with the span with 'Hello' text.
I hope I worded my question clearly. Please feel free to edit if necessary.
In your attempt, // goes back to the root of the document. Effectively you are saying "give me the a elements for which there is a span with the text 'Hello' anywhere in the document", which is why you get them all.
What you need is the descendant axis:
//a[@class='aclass' and descendant::span[text() = 'Hello']]
Note I have joined the conditions with and, but two separate conditions would also work.
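The difference between searching from the document root and searching from the current element can be sketched in Python with the standard-library ElementTree. This is only an illustration: the <body> wrapper and the helper name anchors_with_span_text are assumptions added so the snippet parses and runs on its own, and the descendant test is written in Python because ElementTree supports only a subset of XPath.

```python
import xml.etree.ElementTree as ET

# The page from the question, wrapped in a hypothetical <body> root
# so that it parses as a single document.
PAGE = """
<body>
  <a class="aclass">
    <div class="divclass"></div>
    <div id="innerclass"><span class="spanclass">Hello</span></div>
  </a>
  <a class="aclass">
    <div class="divclass"></div>
    <div id="innerclass"><span class="spanclass">Pick Delivery Location</span></div>
  </a>
</body>
"""

root = ET.fromstring(PAGE)
anchors = root.findall(".//a[@class='aclass']")


def anchors_with_span_text(candidates, text):
    """Keep anchors that have a span with exactly `text` somewhere below them."""
    # a.iter('span') walks only this anchor's subtree, like descendant::span.
    return [a for a in candidates if any(s.text == text for s in a.iter("span"))]


hello = anchors_with_span_text(anchors, "Hello")

# The mistaken version tests the whole document instead of each anchor's
# subtree, so the condition is true for every anchor.
doc_wide = [a for a in anchors if any(s.text == "Hello" for s in root.iter("span"))]

print(len(hello), len(doc_wide))  # 1 2
```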
I saw the existing question with the same title but that was a different question.
Let's say that I want to find elements that have "conGraph" in the class. I have tried
//div[contains(@class,'conGraph')]
It correctly got
<div class='conGraph mr'>
but it also falsely got
<div class='conGraph_wrap'>
which is not the same class at all. For this case only, I could use 'conGraph ' and get away with it, but I would like to know the general solution for future use.
In short, I want to get elements whose class contains "word" like "word", "word word2" or "word3 word", etc, but not like "words" or "fake_word" or "sword". Is that possible?
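One option could be to use 4 conditions (the exact term plus 3 contains functions with whitespace support):
For the first condition, you search for the exact term in the attribute content. For the second, third and fourth, you specify the whitespace variants.
Be aware that these substring tests can still give false positives on other data: class='sword x' contains "word " and class='x words' contains " word". The usual robust form pads the attribute with spaces and tests for the padded word: contains(concat(' ', normalize-space(@class), ' '), ' word ').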
Data :
<div class='word'></div>
<div class='word word2'></div>
<div class='word word3'></div>
<div class='swords word'></div>
<div class='swords word words'></div>
<div class='words'></div>
<div class='fake_word'></div>
<div class='sword'></div>
XPath :
//div[@class="word" or contains(@class,"word ") or contains(@class," word") or contains(@class," word ")]
Output :
<div class='word'></div>
<div class='word word2'></div>
<div class='word word3'></div>
<div class='swords word'></div>
<div class='swords word words'></div>
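What the question asks for is membership of a whole word in the whitespace-separated class list. As a rough sketch of that idea outside XPath, the Python below splits the attribute into tokens and tests membership. The <root> wrapper, the helper name has_class, and the extra 'sword x' element are assumptions added for illustration; the other elements are the sample data above.

```python
import xml.etree.ElementTree as ET

# Sample data from above inside a hypothetical <root> wrapper, plus one
# extra element ('sword x') added to show a case substring tests get wrong.
DATA = """
<root>
  <div class='word'></div>
  <div class='word word2'></div>
  <div class='word word3'></div>
  <div class='swords word'></div>
  <div class='swords word words'></div>
  <div class='words'></div>
  <div class='fake_word'></div>
  <div class='sword'></div>
  <div class='sword x'></div>
</root>
"""


def has_class(element, name):
    """True if `name` is one of the whitespace-separated class tokens."""
    return name in element.get("class", "").split()


root = ET.fromstring(DATA)
matched = [div.get("class") for div in root.iter("div") if has_class(div, "word")]
for value in matched:
    print(value)
```

This prints the same five class values as the output listed above; 'sword x' is rejected because no token equals "word".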
<div class="from">
<span class="label">Reported by: Rhjj,
<span class="ocation">US</span>
</span> <span class="dat"> </span> </div>
Here I just want the output to be "Reported by: Rhjj". But when I use the XPath
//div[contains(@class,"from")]//span[contains(@class,"label")]
"US" also gets selected.
Is there any other way to select only "Reported by: Rhjj", other than using text() and substring-before on the comma? Even this is not consistent:
//div[contains(@class,"fromTime")]//span[contains(@class,"label")]/text()
The text you want is the first node under the span element with an attribute named class (note I've taken the names from the XML, not your code.). This works for the snippet of XML you've provided.
/div[@class="from"]/span[@class="label"]/node()[1]
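The same "first node under the span" idea can be tried out in Python with the standard-library ElementTree, where an element's .text attribute holds the text that appears before its first child element. This is only a sketch against the snippet as posted; the variable names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# The snippet from the question, unchanged.
SNIPPET = """
<div class="from">
<span class="label">Reported by: Rhjj,
<span class="ocation">US</span>
</span> <span class="dat"> </span> </div>
"""

root = ET.fromstring(SNIPPET)  # root is the <div class="from"> element

# Direct child <span class="label"> of the div.
label = root.find("./span[@class='label']")

# .text is the text before the nested <span>, so "US" is not included.
first_text = (label.text or "").strip()
print(first_text)              # Reported by: Rhjj,
print(first_text.rstrip(","))  # Reported by: Rhjj
```

Note that the first node still carries the trailing comma, so it has to be trimmed separately if it is not wanted.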
I'm making a crawler with Scrapy and wondering why my xpath doesn't work when my CSS selector does? I want to get the number of commits from this html:
<li class="commits">
<a data-pjax="" href="/samthomson/flot/commits/master">
<span class="octicon octicon-history"></span>
<span class="num text-emphasized">
521
</span>
commits
</a>
</li>
Xpath:
response.xpath('//li[@class="commits"]//a//span[@class="text-emphasized"]//text()').extract()
CSS:
response.css('li.commits a span.text-emphasized').css('::text').extract()
CSS returns the number (unescaped), but XPath returns nothing. Am I using the // for nested elements correctly?
Your predicate compares against the entire class attribute of the span, which here is "num text-emphasized", so an exact test for "text-emphasized" fails. Use the contains function to check that text-emphasized is present:
response.xpath('//li[@class="commits"]//a//span[contains(@class, "text-emphasized")]//text()').extract()[0].strip()
Otherwise, match the full attribute value, including num:
response.xpath('//li[@class="commits"]//a//span[@class="num text-emphasized"]//text()').extract()[0].strip()
Also, I use extract()[0] to retrieve the first string returned by XPath and strip() to remove the surrounding whitespace, leaving just the number.
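The exact-match behaviour of [@class="..."] can be checked without Scrapy. The sketch below uses Python's standard-library ElementTree as a stand-in (an assumption made so the example is self-contained; it is not the Scrapy selector API) and runs against the HTML from the question with the closing tag completed.

```python
import xml.etree.ElementTree as ET

# HTML from the question, with the final </li> completed so it parses.
HTML = """
<li class="commits">
  <a data-pjax="" href="/samthomson/flot/commits/master">
    <span class="octicon octicon-history"></span>
    <span class="num text-emphasized">
      521
    </span>
    commits
  </a>
</li>
"""

root = ET.fromstring(HTML)

# An attribute test compares the whole attribute value, so this finds nothing.
partial = root.findall(".//span[@class='text-emphasized']")

# Matching the complete value works.
full = root.findall(".//span[@class='num text-emphasized']")

# Token-based check, similar in spirit to the CSS selector span.text-emphasized.
by_token = [s for s in root.iter("span")
            if "text-emphasized" in s.get("class", "").split()]

number = full[0].text.strip()
print(len(partial), len(full), len(by_token), number)  # 0 1 1 521
```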
How can I write a single XPath for this?
<div class="col-lg-4 col-md-4 col-sm-4 profilesky"> <div class="career_icon">
<span> Boost </span> <br/>
Your Profile </div>
I am able to write it in two lines using the "contains" function.
.//*[contains(text(),'Boost')]
.//*[contains(text(),'Your Profile')]
But I want to write the XPath for this in a single line.
You can try this way :
.//*[@class='career_icon' and contains(., 'Boost') and contains(., 'Your Profile')]
The above XPath checks whether there is an element whose class attribute equals career_icon and whose body contains both the Boost and Your Profile texts.
Note that text() only looks at direct child text nodes. To check the entire text content of an element, including its descendants, simply use dot (.).
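The difference between an element's direct text nodes and its full text content can be seen in a small Python sketch using the standard-library ElementTree. The closing </div> for the outer element is an assumption added so the fragment parses; the variable names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Markup from the question; the last </div> is added here (an assumption)
# to close the outer element so the fragment is well-formed.
MARKUP = """
<div class="col-lg-4 col-md-4 col-sm-4 profilesky"> <div class="career_icon">
<span> Boost </span> <br/>
Your Profile </div>
</div>
"""

root = ET.fromstring(MARKUP)
icon = root.find(".//div[@class='career_icon']")

# Direct child text nodes: the text before the first child element, plus the
# text that follows each child (its "tail"). This is roughly what text() sees.
direct_texts = [icon.text or ""] + [child.tail or "" for child in icon]

# Full text content including descendants, roughly what '.' evaluates to.
full_text = "".join(icon.itertext())

print(any("Boost" in t for t in direct_texts))               # False
print("Boost" in full_text and "Your Profile" in full_text)  # True
```

"Boost" lives inside the nested span, so it never shows up among the direct text nodes of the career_icon element, while the full text content contains both strings.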
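You can also combine several predicates just by writing them one after another, since they all apply to the same element:
.//*[contains(text(),'Boost')][contains(text(),'Your Profile')]
(In XPath 1.0, contains(text(), ...) tests only the first direct text node of the element, so this form does not match the markup above, where Boost sits inside a nested span; use . instead of text() in that case.)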