I've tried
.//*[@id='post-31']/div/div/div/a[1]
on this input:
<!-- language: lang-html -->
<div class="entry-content">
<div class="myaccount">
<div class="user-profile-links">
Purchase History
|
<a class="current" href="http://store.demoqa.com/products-page/your-account/?tab=edit_profile">Your Details</a>
|
Your Downloads
</div>
</div>
</div>
In the example you posted, no element has an id attribute, so an expression anchored on @id='post-31' has nothing to match, which would explain why it doesn't work. The XPath expression
/div/div/div/a[1]/@href
selects the href of the first a element.
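As a quick check of that reasoning, here is a minimal sketch using Python's standard-library xml.etree.ElementTree. That module supports only a subset of XPath and cannot return attribute nodes, so the href is read with .get() instead of /@href. The markup is the fragment from the question; the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

html = """
<div class="entry-content">
  <div class="myaccount">
    <div class="user-profile-links">
      Purchase History |
      <a class="current" href="http://store.demoqa.com/products-page/your-account/?tab=edit_profile">Your Details</a>
      | Your Downloads
    </div>
  </div>
</div>
"""

root = ET.fromstring(html)

# Nothing in this fragment carries id="post-31", so an expression
# anchored on that id has nothing to start from.
anchored = root.findall(".//*[@id='post-31']")

# Counterpart of /div/div/div/a[1]/@href: root already is the outer div,
# so the relative path only needs the two inner div steps.
first_link = root.find("./div/div/a[1]")
href = first_link.get("href") if first_link is not None else None

print(anchored)
print(href)
```

If the real page does wrap this fragment in an element with id="post-31", the original expression may still work there; the sketch only covers the fragment as posted.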
I want to select all elements article that don't contain a span element with class status and where the nested a element contains a href attribute which contains the text "rent.html".
I've managed to get the a element like so:
response.xpath('//article[@class="car"]//a[contains(@href,"rent.html")]')
But reading here and trying to select the first parent element article like so returns "data=0"
response.xpath('//article[@class="car"]//a[contains(@href,"rent.html")]//parent::article and not //article[@class="car"]//span[@class="status"]')
I also tried this.
response.xpath('//article[@class="car"][//a[contains(@href,"rent.html")]/article and not //article[@class="car"]//span[@class="status"]')')
I don't know what the expression is for my use case.
<article class="car">
<div>
<div class="container">
<a href="/34625030/rent.html">
</a>
</div>
</div>
</article>
<article class="car">
<div>
<div class="container">
<a href="/34625230/rent.html">
</a>
</div>
</div>
</article>
<article class="car">
<div>
<div class="container">
<a href="/12325230/buy.html">
</a>
</div>
</div>
</article>
<article class="car">
<div>
<div class="container">
<a href="/34632230/rent.html">
</a>
</div>
</div>
<span class="status">Rented</span>
</article>
This XPath expression will do the job:
"//article[not(.//span[@class='status'])][.//a[contains(@href,'rent.html')]]"
The entire command is:
response.xpath("//article[not(.//span[@class='status'])][.//a[contains(@href,'rent.html')]]")
Explanation:
Translating your requirements into XPath syntax.
"select all elements article" - //article
"that don't contain a span element with class status" - [not(.//span[@class='status'])]
" and where the nested a element contains a href attribute which contains the text "rent.html"" - [.//a[contains(@href,'rent.html')]]
I tested the XPath above on the shared sample XML and it worked properly.
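For readers without Scrapy at hand, the same two predicates can be mirrored with Python's standard library. This is only a sketch: xml.etree.ElementTree does not implement not() or contains(), so those checks are done in Python, and the wrapping root element and the helper name is_available_rental are assumptions added for the example.

```python
import xml.etree.ElementTree as ET

# The four articles from the question, wrapped in a hypothetical <root>
# element so the fragment parses as a single document.
doc = """
<root>
  <article class="car"><div><div class="container"><a href="/34625030/rent.html"></a></div></div></article>
  <article class="car"><div><div class="container"><a href="/34625230/rent.html"></a></div></div></article>
  <article class="car"><div><div class="container"><a href="/12325230/buy.html"></a></div></div></article>
  <article class="car"><div><div class="container"><a href="/34632230/rent.html"></a></div></div><span class="status">Rented</span></article>
</root>
"""

root = ET.fromstring(doc)


def is_available_rental(article):
    # Mirrors [not(.//span[@class='status'])]
    has_status = article.find(".//span[@class='status']") is not None
    # Mirrors [.//a[contains(@href, 'rent.html')]]
    has_rent_link = any("rent.html" in a.get("href", "") for a in article.iter("a"))
    return not has_status and has_rent_link


matches = [art for art in root.iter("article") if is_available_rental(art)]
hrefs = [art.find(".//a").get("href") for art in matches]
print(hrefs)  # ['/34625030/rent.html', '/34625230/rent.html']
```

The third article is dropped because its link points to buy.html, and the fourth because it contains the status span.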
I try to extract all links based on these three conditions:
Must be part of <div data-test="cond1">
Must have a <a href="..." class="cond2">
Must not have a <img src="..." class="cond3">
The result should be "/product/1234".
<div data-test="test1">
<div>
<div data-test="cond1">
Link 1
<div class="test4">
<div class="test5">
<div class="test6">
<div class="test7">
<div class="test8">
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
<div data-test="test2">
<div>
<div data-test="cond1">
Link 2
<div class="test4">
<div class="test5">
<div class="test6">
<div class="test7">
<div class="test8">
<img src="bild.jpg" class="cond3">
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
I'm able to extract the links with the following XPath query.
//div[starts-with(@data-test,"cond")]/a[starts-with(@class,"cond")]/@href
(I know the first part is not really necessary. But better safe than sorry.)
But I'm still struggling with excluding the links whose block contains a descendant img tag, and with how to add that condition to the query above.
This should do what you want:
//div[@data-test="cond1" and not(.//img[@class="cond3"])]
/a[@class="cond2"]
/@href
/product/1234
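The logic of that query can be sketched with Python's standard library. Note the assumptions: the anchors are missing from the markup as listed in the question, so the a elements and their href values below are reconstructed for illustration ("/product/1234" matches the stated expected result, "/product/5678" is a made-up placeholder), and the nesting is condensed. ElementTree has no not(), so that predicate is checked in Python.

```python
import xml.etree.ElementTree as ET

# Condensed version of the question's markup. The <a> elements and their
# href values are assumptions: "/product/1234" matches the expected result,
# "/product/5678" is a hypothetical placeholder for the second link.
doc = """
<root>
  <div data-test="test1"><div>
    <div data-test="cond1">
      <a href="/product/1234" class="cond2">Link 1</a>
      <div class="test4"><div class="test8"></div></div>
    </div>
  </div></div>
  <div data-test="test2"><div>
    <div data-test="cond1">
      <a href="/product/5678" class="cond2">Link 2</a>
      <div class="test4"><div class="test8"><img src="bild.jpg" class="cond3"/></div></div>
    </div>
  </div></div>
</root>
"""

root = ET.fromstring(doc)

hrefs = []
for div in root.iterfind(".//div[@data-test='cond1']"):
    # Mirrors not(.//img[@class="cond3"]): skip blocks with such an image
    if div.find(".//img[@class='cond3']") is not None:
        continue
    # Mirrors /a[@class="cond2"]/@href: direct child anchors only
    hrefs.extend(a.get("href") for a in div.findall("./a[@class='cond2']"))

print(hrefs)  # ['/product/1234']
```

The key point is that the img test sits in the predicate of the div, not of the a element, because the image is a descendant of the div rather than of the link.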
So, I am building a web crawler for a site's comment section, and I have run into a problem: I can't seem to find the text node that holds a comment's content. This is how the page's markup looks:
<div class="comments"> // this is the whole comments section
<div class="comment"> // this is where the p is located
<div class="comment-top">
<div class="comment-nr">208. PROTAS</div>
<div class="comment-info">
<div class="comment-time">2015-06-30 13:00</div>
<div class="comment-ip">IP: 178.250.32.165</div>
<div class="comment-vert1">
<a href="javascript:comr(24470645,'p')">
<img src="http://img.lrytas.lt/css2/img/com-good.jpg" alt="">
</a> <span id="cy_24470645"> </span>
</div>
<div class="comment-vert2">
<a href="javascript:comr(24470645,'m')">
<img src="http://img.lrytas.lt/css2/img/com-bad.jpg" alt="">
</a> <span id="cn_24470645"> </span>
</div>
</div>
</div>
<p class="text-13 no-intend">Test text</p> // I need to get this comments content
</div>
I tried a lot of XPath expressions, such as:
*/div[contains(@class, "comment")]/p/text()
/p[contains(@class, "text-13 no-intend")]/text()
etc.
But I can't seem to locate it.
I would appreciate any help.
How about this:
//div[@class = 'comments']/div[@class = 'comment'][1]/p/text()
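One likely reason the second attempt in the question fails is that a path starting with a single / only matches a p that is the document element, whereas // searches the whole tree. A minimal sketch with Python's standard library, assuming a trimmed copy of the markup (voting widgets omitted, outer div closed, and a hypothetical body wrapper added so it parses):

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's markup; the voting widgets are omitted and
# the outer <div class="comments"> is closed so the fragment parses.
doc = """
<body>
  <div class="comments">
    <div class="comment">
      <div class="comment-top">
        <div class="comment-nr">208. PROTAS</div>
      </div>
      <p class="text-13 no-intend">Test text</p>
    </div>
  </div>
</body>
"""

root = ET.fromstring(doc)

# Counterpart of //div[@class='comments']/div[@class='comment']/p/text():
# ElementTree has no text() step, so the text is read from each element.
path = ".//div[@class='comments']/div[@class='comment']/p"
texts = [p.text for p in root.iterfind(path)]
print(texts)  # ['Test text']
```

Dropping the [1] from the answer's expression, as done here, returns the text of every comment rather than only the first one.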
I have the following HTML:
<div class=""postrow first"">
<h2 class=""title icon"">
This is the title
</h2>
<div class=""content"">
<div id=""post_message_1668079"">
<blockquote class=""postcontent restore "">
<div>Category</div>
line 1<br /> line2
</blockquote>
</div>
</div>
</div>
<div class=""postrow"">
<h2 class=""title icon"">
second title
</h2>
<div class=""content"">
<div id=""post_message_1668079"">
<blockquote class=""postcontent restore "">
<div>Category</div>
line 1<br /> line2
</blockquote>
</div>
</div>
</div>
What is the XPath string to select all DIVs whose class attribute is "postrow" or "postrow "?
This answer assumes that for each "", you actually have " in your document.
There are a number of alternative XPaths available to you. Here are just two:
Using the union operator |:
//div[@class = "postrow"] | //div[@class = "postrow "]
Using starts-with (this also matches values such as "postrow first"):
//div[starts-with(@class, "postrow")]
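The two options do not select the same elements on the sample markup, which a small sketch makes visible. It uses Python's standard library and compares the class strings directly, since ElementTree has no starts-with(); the markup is a reduced copy of the question's HTML with normal quoting, and the root wrapper is an assumption.

```python
import xml.etree.ElementTree as ET

# Reduced version of the question's markup, with the doubled quotes
# written as normal attribute quotes.
doc = """
<root>
  <div class="postrow first"><h2 class="title icon">This is the title</h2></div>
  <div class="postrow"><h2 class="title icon">second title</h2></div>
</root>
"""

root = ET.fromstring(doc)
divs = list(root.iter("div"))

# Mirrors @class = "postrow" or @class = "postrow ": exact comparison
exact = [d.get("class") for d in divs if d.get("class") in ("postrow", "postrow ")]

# Mirrors starts-with(@class, "postrow"): prefix comparison
prefix = [d.get("class") for d in divs if d.get("class", "").startswith("postrow")]

print(exact)   # ['postrow']
print(prefix)  # ['postrow first', 'postrow']
```

The prefix test would also accept a class such as "postrows", so the exact comparison is the safer choice when only those two values should match.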
I have a page which compares 4 products at a time in a parallel tabular form, i.e. it lists the features of each of them one after another. Here is a sample page.
I wish to tag these features so that it becomes easier for search engines to interpret them. However, in all the examples given here, you have to list all the features of one product together in a single div. This causes a problem in my case, where the same feature is listed for all products together.
A typical example, as given, goes like this:
<div itemscope itemtype="http://schema.org/Offer">
<span itemprop="name">Blend-O-Matic</span>
<span itemprop="price">$19.95</span>
</div>
However, I would like it to be in this way :-
<div itemscope itemtype="http://schema.org/Offer">
<span itemprop="name">Blend-O-Matic</span> // Item 1
</div>
<div itemscope itemtype="http://schema.org/Offer">
<span itemprop="name">Blend-O-Matic2</span> // Item 2
</div>
Further followed by :-
<div itemscope itemtype="http://schema.org/Offer">
<span itemprop="price">$19.95</span> // Item 1
</div>
<div itemscope itemtype="http://schema.org/Offer">
<span itemprop="price">$21.95</span> // Item 2
</div>
So, in a nutshell, is there a way to tag an item with some identifier and then use that identifier to attach other details to the same item?
Please comment if my question is unclear!
Use itemref:
<div itemscope itemtype="http://schema.org/Offer" itemref="item1_price">
<span itemprop="name">Blend-O-Matic</span>
</div>
<div id="item1_price">
<span itemprop="price">$19.95</span>
</div>
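Roughly, a microdata consumer is expected to treat the elements named in itemref as if they were part of the item. The sketch below imitates that with Python's standard library; it is a simplification and not the full HTML microdata algorithm (it ignores nested items, for instance). The body wrapper, the itemscope="" spelling needed for XML parsing, and the helper name collect_props are assumptions of the example.

```python
import xml.etree.ElementTree as ET

# The markup from above, wrapped in a hypothetical <body>; itemscope is
# written as itemscope="" because ElementTree needs well-formed XML.
doc = """
<body>
  <div itemscope="" itemtype="http://schema.org/Offer" itemref="item1_price">
    <span itemprop="name">Blend-O-Matic</span>
  </div>
  <div id="item1_price">
    <span itemprop="price">$19.95</span>
  </div>
</body>
"""

root = ET.fromstring(doc)
by_id = {el.get("id"): el for el in root.iter() if el.get("id")}


def collect_props(item):
    """Gather itemprop values from the item's children and its itemref targets."""
    referenced = [by_id[i] for i in item.get("itemref", "").split() if i in by_id]
    props = {}
    for start in list(item) + referenced:
        for el in start.iter():
            if el.get("itemprop"):
                props[el.get("itemprop")] = (el.text or "").strip()
    return props


offer = root.find(".//div[@itemscope]")
print(collect_props(offer))  # {'name': 'Blend-O-Matic', 'price': '$19.95'}
```

The price ends up on the same Offer as the name even though the two spans live in different divs, which is the effect the question asks for.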
See results from Google Structured Data Testing Tool here
You might want to have a look at this for SERP. It shows how to have multiple products in an "ItemList":
http://scottgale.com/schema-org-markup-serp/2013/03/17/
Hth
PS: This works without error or issue on the Google Structured Data testing tool over at http://www.google.com/webmasters/tools/richsnippets
But, to be more realistic: you always have a WebPage itemtype, right?
If so, the markup looks roughly like this:
<div itemscope="" itemtype="http://schema.org/WebPage">
<div itemscope itemtype="http://schema.org/Offer" itemref="item1_price">
<span itemprop="name">Blend-O-Matic</span>
</div>
<div id="item1_price">
<span itemprop="price">$19.95</span>
</div>
</div>
See the google result
And this gives an error. If we add the same itemscope="" itemtype="http://schema.org/Offer" to the price block, we get one complete offer and one duplicate offer that contains only the price. Code:
<div itemscope="" itemtype="http://schema.org/WebPage">
<div itemscope="" itemtype="http://schema.org/Offer" itemref="item1_price">
<span itemprop="name">Blend-O-Matic</span>
</div>
<div itemscope="" itemtype="http://schema.org/Offer">
<span id="item1_price" itemprop="price">$19.95</span>
</div>
</div>
Google result
So, as I understand it, we need a different approach. Am I right?