Get what's behind "Location:" with XPath

I am trying to get "France", the text that comes after "Location:".
I wrote this XPath: //div[@class="vevent"]/div/div/span[text()="Location: "]. That's as far as I got, but how do I get the "France" that comes after it?
<div class="vevent">
<div style="float:left; padding-right: 20px;"><img alt="I’M YOUR DJ New Year Edition / 5th Anniversary Celebration" src="https://res.cloudinary.com/latindancecalendar/image/fetch/w_350,h_350/https%3A%2F%2Fi1.wp.com%2Flatindancecalendar.com%2Fdancecal%2Fwp-content%2Fuploads%2F50015933_2182298412088647_1361353647551676416_o.jpg%3Fresize%3D350%252C350%26ssl%3D1" width="200" height="200" scale="0"></div>
<div style="float:left;"><span class="dtstart"><span class="value-title" title="2019-12-27"></span></span><span><b>Friday, 27 December 2019</b></span>
<div><span class="location">Château Lafitte Yvrac ( Bordeaux ) - SCEA Chateau LAFITTE 41 Chemin du Loup, 33370 Yvrac, Aquitaine, France</span></div>
<div>Hosted by <b>I’M YOUR DJ – New Year Edition – Bordeaux</b></div>
<div><span>Location: </span>France</div>
<div><span class="eventpostviews"><i class="fas fa-map-marker-alt" style="padding-right: 5px;"></i>Map | 175 Views | <a style="color:#8F8F8F;" href="https://latindancecalendar.com/report-a-listing/?listing-url=https%3A%2F%2Flatindancecalendar.com%2Ffestivals%2Fim-your-dj-new-year-edition-5th-anniversary-celebration-2019%2F" rel="nofollow"><i class="fas fa-times" style="padding-right: 3px;font-size: 11px;"></i>Report Problem</a></span></div>
<div style="padding-top: 20px;">
<iframe src="//www.facebook.com/plugins/like.php?href=https%3A%2F%2Flatindancecalendar.com%2Ffestivals%2Fim-your-dj-new-year-edition-5th-anniversary-celebration-2019%2F&width&layout=button_count&action=like&size=large&show_faces=false&share=false&height=35&appId=532405580227185" scrolling="no" frameborder="0" style="border:none; overflow:hidden; height:35px;" allowtransparency="true"></iframe>
</div>
</div>
</div>

This should work:
//div[@class="vevent"]/div/div[span = "Location: "]/text()
It selects the div that contains a span with the string value "Location: " and then retrieves the text node directly within that div.
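The same selection can be tried outside the browser with Python's standard-library xml.etree.ElementTree, which supports only a small XPath subset. This is a minimal sketch, assuming a trimmed, well-formed copy of the markup (the real page is HTML and would normally need an HTML parser). The predicate div[span='Location: '] picks the div whose span child has exactly that text, and because ElementTree stores the text that follows a child element in the child's .tail, "France" is the span's tail.

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed stand-in for the "vevent" markup (assumption:
# the real page is HTML, not XML).
SNIPPET = """<div class="vevent">
  <div style="float:left;">
    <div>Hosted by <b>I'M YOUR DJ</b></div>
    <div><span>Location: </span>France</div>
  </div>
</div>"""

def location_after_label(root, label="Location: "):
    # ElementTree version of div[span = "Location: "]: a div with a span
    # child whose full text content is exactly the label.
    div = root.find(f".//div[span='{label}']")
    if div is None:
        return None
    span = div.find("span")
    # The text node after <span> inside the div is stored as span.tail.
    return (span.tail or "").strip()

print(location_after_label(ET.fromstring(SNIPPET)))
```

With lxml or in a browser, the XPath from the answer returns the same text node directly; the .tail lookup is only how ElementTree exposes it.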

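It's better to post the HTML as text, not as a picture.
And I guess you can use the following::text() axis from the span, for example:
//div[@class="vevent"]//span[text()="Location: "]/following::text()[1]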

Related

Error trying to get data using XPath on Google IMPORTXML function

I am trying to find the XPath to get 5 values of the following website: https://plataforma.penserico.com/dashboard/cp.pr?e=TRPL4
I want the values 7,59 2,04 1,81 7,60 7,59
For the first value I tried this command but I get #N/A:
=IMPORTXML("https://plataforma.penserico.com/dashboard/cp.pr?e=TRPL4";"//*[@id='j_idt104:0:j_idt109:1:chartPanel0']/div/span[1]")
The relevant piece of HTML looks like this:
<span id="j_idt104:0:j_idt109:1:chartPanel0">
<div class="c--anim-btn" style="color: #5DADE2;">
<span class="c-anim-btn">
7,59
</span>
<span>
<div style="font-size: 12px !important;">
<div style="width: 90%; left: 5%; position:relative;line-height:2em;white-space: nowrap;">
<div style="width:50%;float:left"><label class="idtri">1T:</label>2,04</div>
<div style="width:50%;float:right"><label class="idtri">2T:</label>1,81</div>
</div>
<div style="width: 90%; left: 5%; position:relative;line-height:2em;white-space: nowrap;">
<div style="width:50%;float:left"><label class="idtri">3T:</label>7,60</div>
<div style="width:50%;float:right"><label class="idtri">4T:</label>7,59</div>
</div>
</div>
</span>
</div></span>
What could the second parameter be to get the values I want?
Thank you
You have to replace your XPath with the following one to get the values:
//tr[.//span[.='P/L']]/td[2]//text()[parent::span[@class='c-anim-btn'] or parent::div][normalize-space()]
Output: (screenshot of the sheet, with the formula in cell C4)
EDIT: individual XPaths:
//tr[.//span[.='P/L']]/td[2]//text()[parent::span[@class='c-anim-btn']]
(//tr[.//span[.='P/L']]/td[2]//text()[parent::div][normalize-space()])[1]
(//tr[.//span[.='P/L']]/td[2]//text()[parent::div][normalize-space()])[2]
(//tr[.//span[.='P/L']]/td[2]//text()[parent::div][normalize-space()])[3]
(//tr[.//span[.='P/L']]/td[2]//text()[parent::div][normalize-space()])[4]
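The two kinds of text node these XPaths target can be seen in the snippet from the question: the first value is the text of span.c-anim-btn, and the other four are non-blank text nodes inside a div, each following a label. A minimal sketch with Python's xml.etree.ElementTree, assuming a well-formed copy of that snippet with the styles trimmed; the tr/P/L wrapper the answer anchors on is not part of the snippet, so it is left out. In ElementTree, the text after each label is that label's .tail.

```python
import xml.etree.ElementTree as ET

# Well-formed copy of the snippet from the question (styles trimmed).
SNIPPET = """<span id="j_idt104:0:j_idt109:1:chartPanel0">
<div class="c--anim-btn">
<span class="c-anim-btn">
7,59
</span>
<span>
<div>
<div>
<div><label class="idtri">1T:</label>2,04</div>
<div><label class="idtri">2T:</label>1,81</div>
</div>
<div>
<div><label class="idtri">3T:</label>7,60</div>
<div><label class="idtri">4T:</label>7,59</div>
</div>
</div>
</span>
</div></span>"""

def chart_values(root):
    # text()[parent::span[@class='c-anim-btn']]
    values = [root.find(".//span[@class='c-anim-btn']").text.strip()]
    # text()[parent::div][normalize-space()] picks the non-blank text nodes
    # inside the divs; in this snippet those are exactly the label tails.
    for label in root.iter("label"):
        if label.get("class") == "idtri" and (label.tail or "").strip():
            values.append(label.tail.strip())
    return values

print(chart_values(ET.fromstring(SNIPPET)))
```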

Identifying previous items using xpath in Selenium IDE

I want to check the checkbox based on the value in the text input labeled 'Field'. I have tried the following:
<tr>
<td>check</td>
<td>/label[text()="Field"]/../input[@value="6 1012 49817"]/preceding-sibling::label[text()="Private"]/../input</td>
<td></td>
</tr>
Here is the HTML:
<div class="wdg colShwHdeCls" id="divFormFieldPrivate-0" style="width: 82px;">
<input id="FormFieldPrivate-0" name="FormFieldPrivate-0" title="" style="" class="wdg colShwHdeCls" type="checkbox">
<label for="FormFieldPrivate-0">Private</label>
</div>
<div class="csLineBreak"> </div>
<div class="acI fldWd100 wdg colShwHdeOpn" id="divFormFieldId-0"><label for="FormFieldId-0">Field<a class="aut" title="Show selection list"></a>
<a style="display: inline-block; opacity: 0.0118143;" href="field/view?FieldId=" title="View this Field" class="acOptVw acLb acI"></a>
<a style="display: inline-block; opacity: 0.0118143;" href="field/edit?FieldId=" class="acEd acLb acI" title="Edit this Field"></a>
<a style="display: inline-block; opacity: 0.0118143;" href="field/add?FieldId=" class="acAd lightbox acI" title="Add a new Field"></a>
</label>
<br>
<span style="display:none;" id="FormFieldId-0-Old">6 1012 49817</span>
<input id="FormFieldId-0" name="FormFieldId-0" value="11955" type="hidden">
<input autocomplete="off" id="FormFieldId-0-Dsp" title="type three or more characters to see selection list" class="wdg csAutCpl csAutCplFld ui-autocomplete-input" value="6 1012 49817" type="text">
<span class="ui-helper-hidden-accessible" aria-live="polite" role="status"></span>
</div>
Suggestions are welcome, thanks ;-)
This is one possible way. The first part of the XPath finds the div element that contains a label with certain text ("Field" in this case) and an input with a certain value attribute:
//div[label[normalize-space(text())="Field"] and input[@value="6 1012 49817"]]
From that div, find the preceding sibling div that contains a label whose text equals "Private", then get that div's input child element:
/preceding-sibling::div[label[text()="Private"]]/input
So the entire XPath looks roughly like this:
//div[label[normalize-space(text())="Field"] and input[@value="6 1012 49817"]]/preceding-sibling::div[label[text()="Private"]]/input
xpathtester.com demo
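ElementTree in Python's standard library has no preceding-sibling axis, so the sketch below walks the parent's children by hand to show what that XPath does: find the div with a "Field" label and the matching input value, then look backwards through its earlier siblings for a div whose label is "Private" and return that div's input. Assumptions: a trimmed, well-formed copy of the HTML wrapped in a hypothetical <form> root, and that the nearest matching preceding sibling is the one wanted (preceding-sibling::div[...] in XPath returns every match).

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed copy of the question's HTML; <form> is a
# hypothetical wrapper so the divs share a parent.
SNIPPET = """<form>
<div id="divFormFieldPrivate-0">
  <input id="FormFieldPrivate-0" type="checkbox"/>
  <label for="FormFieldPrivate-0">Private</label>
</div>
<div class="csLineBreak"> </div>
<div id="divFormFieldId-0">
  <label for="FormFieldId-0">Field<a class="aut" title="Show selection list"></a>
  </label>
  <br/>
  <input id="FormFieldId-0-Dsp" type="text" value="6 1012 49817"/>
</div>
</form>"""

def private_checkbox_for(form, field_value):
    siblings = list(form)  # direct children of the common parent
    for i, div in enumerate(siblings):
        if div.tag != "div":
            continue
        # label[normalize-space(text())="Field"]: label.text is the first text node
        has_label = any((lbl.text or "").strip() == "Field" for lbl in div.findall("label"))
        # input[@value="..."]
        has_input = any(inp.get("value") == field_value for inp in div.findall("input"))
        if has_label and has_input:
            # preceding-sibling::div[label[text()="Private"]]/input,
            # checked from the nearest sibling backwards
            for prev in reversed(siblings[:i]):
                if prev.tag == "div" and any(lbl.text == "Private" for lbl in prev.findall("label")):
                    return prev.find("input")
    return None

checkbox = private_checkbox_for(ET.fromstring(SNIPPET), "6 1012 49817")
print(checkbox.get("id"))
```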

Scrapy and XPath issue with nested Xpaths

I'm trying to read Amazon products into Scrapy.
Starting from a random category using this XPath:
products = Selector(response).xpath('//div[@class="s-item-container"]')
for product in products:
    item = AmzItem()
    item['title'] = product.xpath('//a[@class="s-access-detail-page"]/@title').extract()[0]
    item['url'] = product.xpath('//a[@class="s-access-detail-page"]/@href').extract()[0]
    yield item
//div[@class="s-item-container"] returns all the divs with the products on one category page; that part is correct.
Now, how would I get the link to the product?
// stands for anywhere in the document
a[@class="..."] should select the link with the right class
But I get:
item['title'] = product.xpath('//a[@class="s-access-detail-page"]/@title').extract()[0]
exceptions.IndexError: list index out of range
So the list matching this XPath must be empty, but I don't understand why.
EDIT:
The HTML would look like that:
<div class="s-item-container" style="height: 343px;">
<div class="a-row a-spacing-base">
<div class="a-column a-span12 a-text-left">
<div class="a-section a-spacing-none a-inline-block s-position-relative">
<a class="a-link-normal a-text-normal" href="https://rads.stackoverflow.com/amzn/click/com/B0105S434A" rel="nofollow noreferrer"><img alt="Product Details" src="http://ecx.images-amazon.com/images/I/41%2BzrAY74UL._AA160_.jpg" onload="viewCompleteImageLoaded(this, new Date().getTime(), 24, false);" class="s-access-image cfMarker" height="160" width="160"></a>
<div class="a-section a-spacing-none a-text-center">
<div class="a-row a-spacing-top-mini">
<a class="a-size-mini a-link-normal a-text-normal" href="https://rads.stackoverflow.com/amzn/click/com/B0105S434A" rel="nofollow noreferrer">
<div class="a-box">
<div class="a-box-inner a-padding-mini"><span class="a-color-secondary">See more choices</span></div>
</div>
</a>
</div>
</div>
</div>
</div>
</div>
<div class="a-row a-spacing-mini">
<div class="a-row a-spacing-none">
<a class="a-link-normal s-access-detail-page a-text-normal" title="Harry Potter Gryffindor School Fancy Robe Cloak Costume And Tie (Size S)" href="https://rads.stackoverflow.com/amzn/click/com/B0105S434A" rel="nofollow noreferrer">
<h2 class="a-size-base a-color-null s-inline s-access-title a-text-normal">Harry Potter Gryffindor School Fancy Robe Cloak Costume And Tie (Size S)</h2>
</a>
</div>
<div class="a-row a-spacing-mini"><span class="a-size-small a-color-secondary">by </span><span class="a-size-small a-color-secondary">Legend</span></div>
</div>
<div class="a-row a-spacing-mini">
<div class="a-row a-spacing-none"><a class="a-size-small a-link-normal a-text-normal" href="http://www.amazon.com/gp/offer-listing/B0105S434A/ref=sr_1_21_olp?s=pet-supplies&ie=UTF8&qid=1435391788&sr=1-21&keywords=pet+supplies&condition=new"><span class="a-size-base a-color-price a-text-bold">$28.99</span><span class="a-letter-space"></span>new<span class="a-letter-space"></span><span class="a-color-secondary">(1 offer)</span><span class="a-letter-space"></span><span class="a-color-secondary a-text-strike"></span></a></div>
</div>
<div class="a-row a-spacing-none"><span name="B0105S434A">
<span class="a-declarative" data-action="a-popover" data-a-popover="{&quot;max-width&quot;:&quot;700&quot;,&quot;closeButton&quot;:&quot;false&quot;,&quot;position&quot;:&quot;triggerBottom&quot;,&quot;url&quot;:&quot;/review/widgets/average-customer-review/popover/ref=acr_search__popover?ie=UTF8&asin=B0105S434A&contextId=search&ref=acr_search__popover&quot;}"><i class="a-icon a-icon-star a-star-4"><span class="a-icon-alt">3.9 out of 5 stars</span></i><i class="a-icon a-icon-popover"></i></span></span>
<a class="a-size-small a-link-normal a-text-normal" href="https://rads.stackoverflow.com/amzn/click/com/B0105S434A" rel="nofollow noreferrer">48</a>
</div>
</div>
It should be:
# ------------- The dot makes the query relative to product
product.xpath('.//a[@class="s-access-detail-page"]/@title')
//a[@class="s-access-detail-page"] only matches when the attribute is exactly class="s-access-detail-page", because XPath compares strings, not meaning :) When an element has multiple classes, use the contains() function:
//a[contains(concat(' ', @class, ' '), " s-access-detail-page ")]/@title
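Both fixes can be checked offline with Python's standard-library xml.etree.ElementTree, which supports relative paths and exact [@attr='value'] tests but not contains(), so the class test is done in Python. A minimal sketch on a hypothetical two-product page (the titles and example.com URLs are made up): the exact-class query returns an empty list, consistent with the IndexError in the question, while a relative .//a plus a class-token test finds each product's own link. ElementTree is only a stand-in here; in Scrapy you would use the XPath from the answer.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-product page; titles and URLs are placeholders.
PAGE = """<div id="results">
  <div class="s-item-container">
    <a class="a-link-normal s-access-detail-page a-text-normal"
       title="Product A" href="https://example.com/a">A</a>
  </div>
  <div class="s-item-container">
    <a class="a-link-normal s-access-detail-page a-text-normal"
       title="Product B" href="https://example.com/b">B</a>
  </div>
</div>"""

def has_class(el, name):
    # Token match on the class list; for space-separated classes this is the
    # same test as contains(concat(' ', @class, ' '), ' name ').
    return name in el.get("class", "").split()

def extract_items(root):
    items = []
    for product in root.findall(".//div[@class='s-item-container']"):
        # './/a' is relative to this product, so each product yields its own link.
        for a in product.iterfind(".//a"):
            if has_class(a, "s-access-detail-page"):
                items.append({"title": a.get("title"), "url": a.get("href")})
                break
    return items

root = ET.fromstring(PAGE)
# An exact @class comparison matches nothing when the element has several classes.
print(root.findall(".//a[@class='s-access-detail-page']"))
print(extract_items(root))
```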

Looking for same xpath for grid's column text from two different pages

In our application, there is a grid on two pages. I want to get the column text from the grids, but the column text has slightly different HTML in each grid.
Page 1 grid HTML:
<div class="ngHeaderContainer" ng-style="headerStyle()" style="width: 598px; height: 30px;">
<div class="ngHeaderScroller" ng-style="headerScrollerStyle()" ng-header-row="" style="height: 30px;">
<div class="ngHeaderCell ng-scope col0 colt0" ng-class="col.colIndex()" ng-repeat="col in renderedColumns" ng-style="{ height: col.headerRowHeight }" style="height: 30px;">
<div class="ngVerticalBar ngVerticalBarVisible" ng-class="{ ngVerticalBarVisible: !$last }" ng-style="{height: col.headerRowHeight}" style="height: 30px;"> </div>
<div ng-header-cell="">
<div class="ngHeaderSortColumn " ng-class="{ 'ngSorted': !col.noSortVisible() }" ng-style="{'cursor': col.cursor}" style="cursor: pointer;" draggable="true">
<div class="ngHeaderText ng-binding colt0" ng-class="'colt' + col.index" ng-click="col.sort($event)">Request ID</div>
For this, I've written the XPath //div[@class='ngHeaderContainer']//div[@ng-header-cell='']//div[contains(@class,'ngHeaderText')]
Page 2 grid HTML:
<div class="ngHeaderContainer" ng-style="headerStyle()" style="width: 598px; height: 30px;">
<div class="ngHeaderScroller" ng-style="headerScrollerStyle()" ng-header-row="" style="height: 30px;">
<div class="ngHeaderCell ng-scope col0 colt0" ng-class="col.colIndex()" ng-repeat="col in renderedColumns" ng-style="{ height: col.headerRowHeight }" style="height: 30px;">
<div class="ngVerticalBar ngVerticalBarVisible" ng-class="{ ngVerticalBarVisible: !$last }" ng-style="{height: col.headerRowHeight}" style="height: 30px;"> </div>
<div ng-header-cell="">
<div class="ng-scope ng-binding" ng-click="onColumnClick( 3, 'select', $event)">
Request ID
<img class="" ng-click="onColumnClick( 3, 'delete', $event)" src="styles/images/common/delete.png" ng-show="true">
<img>
</div>
For this, I've written the XPath //div[@class='ngHeaderContainer']//div[@ng-header-cell='']/div
For the grid, I've written a class with a method that returns the column names. Since the XPath to reach the column name differs between the grids on the two pages, I can't use the same method for both.
Can someone please help me find an XPath that returns the column names of the grids on both pages?
This XPath will hopefully do it. I ran into a similar issue and took help from here. It should return both elements:
//*[contains(@class, 'ng-binding')]
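The contains() test above can be sketched with Python's xml.etree.ElementTree on trimmed, well-formed copies of the two header cells from the question. Two assumptions of mine: the search is limited to divs carrying ng-header-cell, so that other elements on the page that also have the ng-binding class are not picked up, and the column name is the element's own leading text with whitespace stripped (which skips the delete icon in the second grid).

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed versions of the two header cells from the question.
PAGE_1 = """<div class="ngHeaderContainer">
  <div ng-header-cell="">
    <div class="ngHeaderSortColumn ">
      <div class="ngHeaderText ng-binding colt0">Request ID</div>
    </div>
  </div>
</div>"""

PAGE_2 = """<div class="ngHeaderContainer">
  <div ng-header-cell="">
    <div class="ng-scope ng-binding">
      Request ID
      <img class="" src="styles/images/common/delete.png"/>
    </div>
  </div>
</div>"""

def column_names(root):
    names = []
    # Assumption: only look inside header cells (divs with an ng-header-cell attribute).
    for cell in root.findall(".//div[@ng-header-cell]"):
        for el in cell.iter():
            # contains(@class, 'ng-binding') is a plain substring test
            if "ng-binding" in el.get("class", ""):
                names.append((el.text or "").strip())
    return names

for page in (PAGE_1, PAGE_2):
    print(column_names(ET.fromstring(page)))
```

Because contains() is a substring test, it would also match a class such as ng-binding-extra; the concat(' ', @class, ' ') idiom is the stricter alternative.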

Selecting a specific div element with Xpath and Nokogiri?

I am relatively new to parsing and would like to get more practice. I want to parse the following URL: http://www.goodreads.com/quotes/tag/hard-work.
I want to grab all quotes tagged "hard-work". This is what the site code breaks down to:
<div class="content">
<div id="siteheader" class="uitext">
<div class="mainContentContainer ">
<div class="mainContent">
<div id="premiumAdTop">
<div class="mainContentFloat">
<div id="flashContainer"> </div>
<div id="connectPrompt" style="">
<img style="float: left; margin: -3px 5px 0px 0px" src="http://s.gr-assets.com/assets/quote/quote_tiny-566b7de5e1ac5becd0dd8b2856f59228.jpg" alt="quote">
<h1>Quotes About Hard Work</h1>
<div class="leftContainer">
<div class="mediumText">
<div class="quote mediumText ">
<div class="quoteDetails ">
<a class="leftAlignedImage" href="/author/show/3916262.Babe_Ruth">
<div class="quoteText">
“It's hard to beat a person who never gives up.”
<br>
―
Babe Ruth
</div>
Right now my code is:
require "rubygems"
require "open-uri"
require "nokogiri"
@page = Nokogiri::HTML(URI.open("http://goodreads.com/quotes"))
@div = @page.xpath("html/body/div[1]")
But the results aren't giving me the output that I want.
I think I ought to call the methods each and collect, but I just don't know how to get to the node that I want, which I believe is somewhere inside the <div id="connectPrompt"> block shown above.
Can anyone point me in the right direction, please? How deep into the nested divs do I have to go to get what I want?
You can use the XPath:
//div[@class = 'quoteText' and following-sibling::div[1][@class = 'quoteFooter' and .//a[@href and normalize-space() = 'hard-work']]]
to select all the div elements whose class is quoteText and whose first following sibling div has the class quoteFooter and contains a link with the text hard-work.
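To see what each part of that predicate checks, here is a rough Python equivalent using xml.etree.ElementTree, which has no following-sibling axis, so the sibling step is done by hand. The quoteFooter markup is an assumption (the question's HTML stops before it), and the second quote is a made-up placeholder used only to show the filtering. In Nokogiri itself, the XPath above can be passed directly to the parsed document's xpath method.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the quoteFooter structure and the second quote
# are assumptions, since the question's HTML is cut off before them.
PAGE = """<div class="quotes">
  <div class="quoteDetails">
    <div class="quoteText">“It's hard to beat a person who never gives up.”<br/>― Babe Ruth</div>
    <div class="quoteFooter">tags: <a href="/quotes/tag/hard-work">hard-work</a></div>
  </div>
  <div class="quoteDetails">
    <div class="quoteText">“Some other quote.”<br/>― Someone</div>
    <div class="quoteFooter">tags: <a href="/quotes/tag/other">other</a></div>
  </div>
</div>"""

def normalize_space(s):
    # Same idea as XPath normalize-space(): trim and collapse whitespace runs.
    return " ".join(s.split())

def tagged_quotes(root, tag="hard-work"):
    quotes = []
    for parent in root.iter():
        children = list(parent)
        for i, el in enumerate(children):
            if el.tag != "div" or el.get("class") != "quoteText":
                continue
            # following-sibling::div[1]: the first div after this one
            nxt = next((c for c in children[i + 1:] if c.tag == "div"), None)
            if nxt is None or nxt.get("class") != "quoteFooter":
                continue
            # .//a[@href and normalize-space() = 'hard-work']
            if any(a.get("href") is not None and normalize_space("".join(a.itertext())) == tag
                   for a in nxt.iter("a")):
                # The quote itself is the text before the <br/>.
                quotes.append((el.text or "").strip())
    return quotes

print(tagged_quotes(ET.fromstring(PAGE)))
```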
