How to find an XPath if the class contains a space

I have a site and need to extract some information from an element by its class, but the class attribute contains a space. How can I do it?
<div class ="product-item view-list " data-ht="2" data-pc="13">
Thank you for your help!

If you need to use XPath, this should work:
response.xpath('//div[contains(concat(" ", normalize-space(@class), " "), " product-item view-list ")]')
Or, you can use CSS:
response.css('div.product-item.view-list')

The elements below
<div class ="product-item view-list " data-ht="2" data-pc="13">
<div class ="product-item view-list" data-ht="2" data-pc="13">
are the same. The class attribute is a whitespace-separated list of class names, so the trailing space is not part of any class name. You just need to use
response.css(".product-item.view-list")
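To see why the trailing space does not matter, here is a minimal sketch using only Python's standard library. It assumes the two div snippets are written as self-closing tags so that they parse as XML, and class_tokens is a hypothetical helper name, not part of Scrapy.

```python
import xml.etree.ElementTree as ET

# The two variants from above, self-closed so they parse as XML (an assumption
# made for this sketch); only the trailing space in the class value differs.
with_space = ET.fromstring(
    '<div class ="product-item view-list " data-ht="2" data-pc="13"/>'
)
without_space = ET.fromstring(
    '<div class ="product-item view-list" data-ht="2" data-pc="13"/>'
)


def class_tokens(element):
    """Return the set of class names on an element, ignoring extra whitespace."""
    return set(element.get("class", "").split())


tokens_a = class_tokens(with_space)
tokens_b = class_tokens(without_space)
print(tokens_a == tokens_b)  # True: both hold the same two class names
```

Both elements carry the two class names product-item and view-list, which is what a selector like .product-item.view-list tests for.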

Related

Xpath insert "," after every li

I have a piece of markup and an XPath to export data from it. The markup is:
<div class="container-fluid">
<ul class="tags expandable">
<li><a class="search__link" href="domain.com">office</a></li>
<li><a class="search__link" href="domain.com">space</a></li>
</ul>
</div>
and the XPath is:
//ul[contains(concat(" ", normalize-space(@class), " "), " tags expandable ")]
This XPath exports the data like this: "office space"
but I want to insert a "," after each li, so that the export looks like this: "office, space,"
You should use something like this with XPath 1.0:
translate(normalize-space(//ul)," ",",")
returns "office,space".
To add the trailing comma:
concat(translate(normalize-space(//ul)," ",","),",")
returns "office,space,".
EDIT: With the website link you posted, use this one-liner to get what you want:
translate(normalize-space(//div[@class="container-fluid"]/ul)," ",",")
Output:
People,Space,Women,Friends,Communication,Group,Support,Community,Beautiful,Unity,Gender,Movement,Gathering,Copy,Rights,Feminist,Empower,Supporting,Copy,space,Empowering
It still needs some correction (the undesired "," between "Copy" and "space" has to be removed). If you need something fully automatic, and since you have to use XPath 1.0 (which has no replace function), you can try:
translate(normalize-space(translate(//div[@class="container-fluid"]/ul," ",""))," ",",")
Output ("Copy" and "space" are now merged):
People,Space,Women,Friends,Communication,Group,Support,Community,Beautiful,Unity,Gender,Movement,Gathering,Copy,Rights,Feminist,Empower,Supporting,Copyspace,Empowering
Otherwise, just use:
//ul[@class="tags expandable"]//a/text()
and add the commas in whatever programming language you are using.
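As a rough illustration of that last option, here is a sketch in Python using only the standard library. It assumes the markup from the question parses as XML; with Scrapy you would take the list of strings returned by the XPath instead and join it the same way.

```python
import xml.etree.ElementTree as ET

# Markup copied from the question.
markup = """
<div class="container-fluid">
<ul class="tags expandable">
<li><a class="search__link" href="domain.com">office</a></li>
<li><a class="search__link" href="domain.com">space</a></li>
</ul>
</div>
"""

root = ET.fromstring(markup.strip())

# ElementTree supports only a small XPath subset, so select the <a> elements
# and read their text, rather than ending the path with /text().
links = root.findall(".//ul[@class='tags expandable']//a")
tags = [link.text.strip() for link in links if link.text]

# Join with ", " and append the trailing comma the question asks for.
joined = ", ".join(tags) + ","
print(joined)  # office, space,
```

Joining in the host language also keeps multi-word tags such as "Copy space" intact, which the translate() approach cannot do.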

Finding the xpath of a class name with \n and spaces

This may be an easy question, I'm new to this.
I'm trying to get the data within this div
<div class="search-results-listings
" vocab="http://schema.org/" typeof="SearchResultsPage">
response.xpath("//div[@class='search-results-listings\n']")
and
response.xpath("//div[@class='search-results-listings\n ']")
both return empty lists.
You can use XPath's contains(), which does not depend on the exact whitespace around the class name:
response.xpath("//div[contains(@class, 'search-results-listings')]")
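If you want to check the idea outside of Scrapy, the following is a small sketch with Python's built-in html.parser. ClassFinder is a hypothetical helper written for this example, and the closing </div> is added so the snippet is complete. It splits the class attribute on whitespace, so the newline after the class name no longer matters.

```python
from html.parser import HTMLParser


class ClassFinder(HTMLParser):
    """Collect the attributes of every <div> that carries a given class name."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = wanted
        self.matches = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # split() with no arguments breaks on spaces, tabs and newlines alike.
        class_names = (attributes.get("class") or "").split()
        if tag == "div" and self.wanted in class_names:
            self.matches.append(attributes)


html = (
    '<div class="search-results-listings\n" '
    'vocab="http://schema.org/" typeof="SearchResultsPage"></div>'
)

finder = ClassFinder("search-results-listings")
finder.feed(html)
print(len(finder.matches))
```

Unlike contains(), this token-based check would not also match a class such as search-results-listings-wide.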

Use xpath or xquery to show text in title attribute

I'd like to use xquery (I believe) to output the text from the title attribute of an html element.
Example:
<div class="rating" title="1.0 stars">...</div>
I can use xpath to select the element, but it tries to output the info between the div tags. I think I need to use xquery to output the "1.0 stars" text from the title attribute.
There's gotta be a way to do this. My Google skills are proving ineffective in coming up with an answer.
Thanks.
XPath: //div[@class='rating']/@title
This will give you the title text for every div with a class of "rating".
Addendum (following from comments below):
If the class has other, additional text in it, in addition to "rating", then you can use something like this:
//div[contains(concat(' ', normalize-space(@class), ' '), ' rating ')]
(Hat tip to How can I match on an attribute that contains a certain string?).
You should use:
let $XML := <p><div class="rating" title="2.0 stars">sdfd</div><div class="rating" title="1.0 stars">sdfd</div></p>
for $title in $XML//@title
return
<p>{data($title)}</p>
to get output:
<p>2.0 stars</p>
<p>1.0 stars</p>
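For comparison, here is a sketch of the same attribute extraction in Python with the standard library, using the sample markup from the XQuery answer. ElementTree's XPath subset cannot end a path with /@title, so the attribute is read with get() on each matched element.

```python
import xml.etree.ElementTree as ET

# Sample markup taken from the XQuery example above.
markup = (
    '<p><div class="rating" title="2.0 stars">sdfd</div>'
    '<div class="rating" title="1.0 stars">sdfd</div></p>'
)

root = ET.fromstring(markup)

# Select every div whose class is exactly "rating", then read its title attribute.
titles = [div.get("title") for div in root.findall(".//div[@class='rating']")]
print(titles)  # ['2.0 stars', '1.0 stars']
```

With a full XPath 1.0 engine, //div[@class='rating']/@title returns the same values directly.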

XPath Expressions - find elements without specifying the entire path

I am doing some screen scraping with a library that takes XPath expressions and noticed that several pages are similar, but different.
Is there a way to loosely say "get me divs that have class='mytarget' but exist as a child of a div with class = 'nav' and the exact path is unknown between nav and mytarget."
<div class="nav">
<div>
??????
<div class="mytarget"></div>
??????
</div>
</div>
Yes, using the descendant-or-self axis (//):
//div[@class='nav']//div[@class='mytarget']
Or, if there can be more than one class name on those elements, then this is even better:
//div[contains(concat(' ', @class, ' '), ' nav ')]//
div[contains(concat(' ', @class, ' '), ' mytarget ')]
Warning: this can be very inefficient on large documents. You should use absolute paths wherever the structure is known. Only resort to // when the structure is unknown.
That's what "//" expressions are for. Something like:
//*[@class="nav"]//*[@class="mytarget"]
http://www.w3schools.com/xpath/xpath_syntax.asp
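A quick way to convince yourself that // skips over unknown intermediate levels is the sketch below, written with Python's standard library. The sample document is made up for this example: the <body> wrapper, the <span> filler and the second mytarget div outside nav are assumptions added to show what is and is not matched.

```python
import xml.etree.ElementTree as ET

# Hypothetical page: one mytarget nested at an unknown depth inside nav,
# and one mytarget outside of nav that should not be selected.
markup = """
<body>
  <div class="nav">
    <div>
      <span>filler</span>
      <div class="mytarget">inside nav</div>
    </div>
  </div>
  <div class="mytarget">outside nav</div>
</body>
"""

root = ET.fromstring(markup.strip())

# The second // means "any descendant of the nav div", however deep.
targets = root.findall(".//div[@class='nav']//div[@class='mytarget']")
texts = [target.text for target in targets]
print(texts)  # ['inside nav']
```

Only the div nested under nav is returned, even though the path between nav and mytarget was never spelled out.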

How to get html elements with multiple css classes

I know how to get a list of DIVs of the same css class e.g
<div class="class1">1</div>
<div class="class1">2</div>
using the XPath //div[@class='class1']
But what if a div has multiple classes, e.g.
<div class="class1 class2">1</div>
What would the XPath look like then?
The expression you're looking for is:
//div[contains(@class, 'class1') and contains(@class, 'class2')]
I highly suggest XPath visualizer, which can help you debug xpath expressions easily. It can be found here:
http://xpathvisualizer.codeplex.com/
According to this answer, which explains why it is important to make sure substrings of the class name that one is looking for are not included, the correct answer should be:
//div[contains(concat(' ', normalize-space(@class), ' '), ' class1 ')
and contains(concat(' ', normalize-space(@class), ' '), ' class2 ')]
There's a useful python package called cssselect.
from cssselect import CSSSelector
CSSSelector('div.gallery').path
Generates a usable XPath:
descendant-or-self::div[@class and contains(concat(' ', normalize-space(@class), ' '), ' gallery ')]
It's very similar to Flynn1179's answer.
I think the expression you're looking for is
//div[starts-with(@class, "class1")]/text()
You could also do this, if your processor supports XPath 3.1 (which provides contains-token):
//div[contains-token(@class, 'class_one') and contains-token(@class, 'class_two')]
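If you are stuck on XPath 1.0 and need this for several class names, the concat/normalize-space predicate gets tedious to type by hand. Here is a small sketch that builds the expression as a string. class_predicate and xpath_for_classes are hypothetical helper names, and the sketch assumes the class names contain no quote characters.

```python
def class_predicate(name):
    """Build an XPath 1.0 test for one whole class token (not a substring)."""
    return f"contains(concat(' ', normalize-space(@class), ' '), ' {name} ')"


def xpath_for_classes(tag, *names):
    """Build an XPath matching `tag` elements that carry every class in `names`.

    Assumes the class names contain no quote characters (no escaping is done).
    """
    if not names:
        return f"//{tag}"
    predicates = " and ".join(class_predicate(name) for name in names)
    return f"//{tag}[{predicates}]"


expr = xpath_for_classes("div", "class1", "class2")
print(expr)
```

The resulting string can be passed to any XPath 1.0 engine. Because each name is tested as a whole token, class1 will not match an element whose class is class10.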
