Filter/Exclude XPath extraction via "pattern"

This is what I have to work with:
<div class="Pictures zoom">
<a title="Productname 1" class="zoomThumbActive" rel="{gallery: 'gallery1', smallimage: '/images/2.24198/little_one.jpeg', largeimage: '/images/76.24561/big-one-picture.jpeg'}" href="javascript:void(0)" style="border-width:inherit;">
<img title="Productname 1" src="/images/24.245/mini-doge-picture.jpeg" alt="" /></a>
<a title="Productname 1" rel="{gallery: 'gallery1', smallimage: '/images/2.24203/small_one.jpeg', largeimage: '/images/9.5664/very-big-one-picture.jpeg'}" href="javascript:void(0)" style="border-width:inherit;">
<img title="Productname 1" src="/images/22.999/this-picture-is-very-small.jpeg" alt="" /></a>
</div>
Using the following XPath:
/html//div[@class='Pictures zoom']/a/@rel
The output becomes:
{gallery: 'gallery1', smallimage: '/images/2.24198/little_one.jpeg', largeimage: '/images/76.24561/big-one-picture.jpeg'}
{gallery: 'gallery1', smallimage: '/images/2.24203/small_one.jpeg', largeimage: '/images/9.5664/very-big-one-picture.jpeg'}
Is it possible to filter the extraction so that, instead of the above, I get only these:
/images/76.24561/big-one-picture.jpeg
/images/9.5664/very-big-one-picture.jpeg
I only wish to keep everything between largeimage: ' and '}
Best regards,
Liu Kang

Use substring-before and substring-after to cut off the parts you do not want.
Using XPath 1.0, this can only be done for single results (so you cannot fetch all URLs contained in one document with a single XPath call). This query will return the first URL:
substring-before(substring-after((//@rel)[1], "largeimage: '"), "'")
XPath 2.0 allows you to run functions as axis steps. This query will return all URLs you're looking for as single tokens:
//@rel/substring-before(substring-after(., "largeimage: '"), "'")
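If only XPath 1.0 is available, one common workaround is to select every @rel attribute and then apply the same substring-after / substring-before cut in the host language, one attribute at a time. The sketch below uses Python's standard xml.etree.ElementTree. It assumes the markup is well-formed XML (the snippet is a trimmed copy of the question's HTML wrapped in a hypothetical html/body). The helper names are hypothetical; they mimic XPath's behaviour of returning an empty string when the separator is missing.

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed copy of the question's markup (html/body wrapper is hypothetical)
SNIPPET = """<html><body>
<div class="Pictures zoom">
  <a title="Productname 1" rel="{gallery: 'gallery1', smallimage: '/images/2.24198/little_one.jpeg', largeimage: '/images/76.24561/big-one-picture.jpeg'}" href="javascript:void(0)"><img src="/images/24.245/mini-doge-picture.jpeg" alt=""/></a>
  <a title="Productname 1" rel="{gallery: 'gallery1', smallimage: '/images/2.24203/small_one.jpeg', largeimage: '/images/9.5664/very-big-one-picture.jpeg'}" href="javascript:void(0)"><img src="/images/22.999/this-picture-is-very-small.jpeg" alt=""/></a>
</div>
</body></html>"""


def substring_after(s: str, sep: str) -> str:
    """Like XPath substring-after(): text after the first sep, or "" if sep is absent."""
    _, found, tail = s.partition(sep)
    return tail if found else ""


def substring_before(s: str, sep: str) -> str:
    """Like XPath substring-before(): text before the first sep, or "" if sep is absent."""
    head, found, _ = s.partition(sep)
    return head if found else ""


def large_images(xml_text: str) -> list:
    root = ET.fromstring(xml_text)
    urls = []
    # ElementTree's limited XPath; same path as /html//div[@class='Pictures zoom']/a
    for a in root.findall(".//div[@class='Pictures zoom']/a"):
        rel = a.get("rel", "")
        # Same cut as substring-before(substring-after(@rel, "largeimage: '"), "'")
        urls.append(substring_before(substring_after(rel, "largeimage: '"), "'"))
    return urls


print(large_images(SNIPPET))
# ['/images/76.24561/big-one-picture.jpeg', '/images/9.5664/very-big-one-picture.jpeg']
```

The per-attribute loop plays the role of XPath 2.0's `//@rel/substring-before(...)` step.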

Related

Sphinx anchor defined twice (singlehtml output)

I have a Sphinx project that uses figures and footnotes. I noticed that as soon as I add a caption to a figure, some ids in the rendered HTML are defined twice.
For example, consider a minimal project like this, with index.rst:

Project Example
===============

.. toctree::
   :maxdepth: 2
   :caption: Contents:

   inc

hello [#0]_ world:

We should expect that footnote 0 [#1]_ would have `id1`, and footnote 2 `id2`

.. [#0] Lorem Impsum.
.. [#1] Lorem Impsum.
And inc.rst:

Included
========

.. figure:: _static/cat.jpg
   :scale: 20%
   :align: center

   This is a caption
Running sphinx-build -M singlehtml "." "_build" renders:
<span id="document-inc"></span><section id="included">
<h2>Included<a class="headerlink" href="#included" title="Permalink to this headline">¶</a></h2>
<figure class="align-center" id="id1">
<a ...></a>
<figcaption>
<p><span class="caption-text">This is a caption</span><a class="headerlink" href="#id1" title="Permalink to this image">¶</a></p>
</figcaption>
</figure>
</section>
</div>
<p>We should expect that operator <a class="footnote-reference brackets" href="#id3" id="id1">2</a> would:</p>
<p>We should expect that operator <a class="footnote-reference brackets" href="#id4" id="id2">3</a> would:</p>
<dl class="footnote brackets">
<dt class="label" id="id3"><span class="brackets"><a class="fn-backref" href="#id1">2</a></span></dt>
<dd><p>Lorem Impsum.</p>
</dd>
<dt class="label" id="id4"><span class="brackets"><a class="fn-backref" href="#id2">3</a></span></dt>
<dd><p>Lorem Impsum.</p>
</dd>
</dl>
If I remove the caption, the figure opening HTML is rendered without id="id1", like this:
<figure class="align-center">
Is this a bug in Sphinx?
Can I tell Sphinx to give the figure a different id to avoid collisions?
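To confirm the collision in a built page rather than by eye, a small checker can count every id attribute. This is a sketch using Python's standard html.parser. The function name duplicate_ids is hypothetical, and SAMPLE is a trimmed copy of the singlehtml output shown in the question.

```python
from collections import Counter
from html.parser import HTMLParser


class IdCounter(HTMLParser):
    """Counts how often each id attribute value appears in an HTML document."""

    def __init__(self):
        super().__init__()
        self.ids = Counter()

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes
        for name, value in attrs:
            if name == "id" and value is not None:
                self.ids[value] += 1


def duplicate_ids(html_text: str) -> set:
    parser = IdCounter()
    parser.feed(html_text)
    parser.close()
    return {value for value, count in parser.ids.items() if count > 1}


# Trimmed copy of the singlehtml output from the question
SAMPLE = """
<span id="document-inc"></span><section id="included">
<figure class="align-center" id="id1">
<figcaption><p><a class="headerlink" href="#id1">link</a></p></figcaption>
</figure>
</section>
<p>We should expect that operator <a class="footnote-reference brackets" href="#id3" id="id1">2</a> would:</p>
<p>We should expect that operator <a class="footnote-reference brackets" href="#id4" id="id2">3</a> would:</p>
<dl class="footnote brackets">
<dt class="label" id="id3"></dt>
<dt class="label" id="id4"></dt>
</dl>
"""

print(duplicate_ids(SAMPLE))  # {'id1'}
```

For a real build, pass the file contents instead. For example, use `duplicate_ids(open("_build/singlehtml/index.html", encoding="utf-8").read())`. This path assumes the default output layout of `sphinx-build -M singlehtml "." "_build"`.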

XPATH start scraping after certain word

I am trying to get the location from this HTML using XPath. In human terms, what I want is: "when you see Location:, grab the next piece of text, then stop."
<td width="670">
<h1>Accor Vacation Club - SOLD</h1>
<h2>All Australia, Australia</h2>
<p class="property_number">Property ref: 002</p>
<h3 class="cl2">Description</h3><p class="xh-highlight">Resort: Accor Vacation Club. <br>Location: Australia. <br>Type of Ownership: Points. <br>Season: All. <br>Size of Unit: Studio. <br>Price: SOLD</p><p class="xh-highlight"> </p><p class="xh-highlight"><span style="font-size: 16pt">SOLD</span> </p>
<table width="100%" border="0" cellspacing="0" cellpadding="0" id="photorealestate">
<tbody><tr>
I got this far but can't seem to isolate that word:
//p[./preceding-sibling::h3[contains(., 'Description')]]
//p/text()[./preceding-sibling::h3[contains(., 'Description')]]
If you need "Australia" as the output, you can use the expression below:
substring-after(//text()[starts-with(., 'Location')], 'Location: ')
This selects the text node that starts with the word "Location" and returns the substring that follows "Location: ".
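The matched text node is "Location: Australia. ", so that expression actually returns "Australia. " with the trailing period and space. Wrapping it as substring-before(substring-after(//text()[starts-with(., 'Location')], 'Location: '), '.') trims that suffix. Python's standard ElementTree has no text() node test, so the sketch below walks the text nodes with itertext() and then applies the same two string functions. It assumes the paragraph has been made well-formed (<br/> instead of <br>), and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# The question's paragraph, with <br> closed as <br/> so it parses as XML
PARAGRAPH = (
    '<p class="xh-highlight">Resort: Accor Vacation Club. <br/>Location: Australia. '
    '<br/>Type of Ownership: Points. <br/>Season: All. <br/>Size of Unit: Studio. '
    '<br/>Price: SOLD</p>'
)


def substring_after(s: str, sep: str) -> str:
    _, found, tail = s.partition(sep)
    return tail if found else ""


def substring_before(s: str, sep: str) -> str:
    head, found, _ = s.partition(sep)
    return head if found else ""


def location_raw(xml_text: str) -> str:
    """Mirrors substring-after(//text()[starts-with(., 'Location')], 'Location: ')."""
    root = ET.fromstring(xml_text)
    # itertext() yields the <p>'s own text and each <br/>'s tail, i.e. its text nodes
    for text in root.itertext():
        if text.startswith("Location"):
            return substring_after(text, "Location: ")
    return ""  # XPath also yields "" when no text node matches


raw = location_raw(PARAGRAPH)
clean = substring_before(raw, ".")
print(repr(raw), repr(clean))  # 'Australia. ' 'Australia'
```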

How to write xpath

I want to get the text "+12345" from this HTML:
<p class="Test" ng-repeat="(k, wl) in partnerEditModel.td">
<span id="Test-update-12345" class="ng-binding">
+12345
<span class="err-message ng-binding">Error</span>
</span>
<a id="mibile" class="button" ng-click="remove(k)">
</p>
I have written //p[@class='Test']/span, but it matches "+12345Error", whereas I only want "+12345" (without "Error").
Could you please tell me how to write this XPath?
Try the XPath expression below to get only "+12345":
normalize-space(//span[@id="Test-update-12345"]/text()[1])
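This works because the span's string value also includes the nested error span's text. By contrast, text()[1] is only the first text node directly under the span (the "+12345" line plus its surrounding newlines and indentation), and normalize-space() strips that whitespace. The sketch below reproduces both values with Python's standard ElementTree. normalize_space is a hypothetical helper that approximates XPath's normalize-space(), and the snippet is a trimmed, well-formed copy of the question's HTML.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's markup; the <a> is self-closed so it parses as XML
SNIPPET = """<p class="Test">
  <span id="Test-update-12345" class="ng-binding">
    +12345
    <span class="err-message ng-binding">Error</span>
  </span>
  <a id="mibile" class="button"/>
</p>"""


def normalize_space(s: str) -> str:
    # Approximates XPath normalize-space(): trim, then collapse whitespace runs
    return " ".join(s.split())


root = ET.fromstring(SNIPPET)
span = root.find(".//span[@id='Test-update-12345']")

# text()[1]: only the text that precedes the nested <span>
first_text = normalize_space(span.text or "")
# String value of the whole span: its own text plus all descendant text
whole = normalize_space("".join(span.itertext()))

print(first_text)  # +12345
print(whole)       # +12345 Error
```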
Alternatively, try this in Robot Framework:
${var}= | Get Text | xpath=//span[span[@class='err-message ng-binding']]
Log | ${var}
Or try:
xpath=//p[@class='Test']/span[1]

WebDriver Capture Text by XPath

I am attempting to capture a line of text for an automated WebDriver test to use it in a comparison later on. However, I cannot find an XPath that will work with WebDriver. I have used the text() function before to capture text that is not in a tag, but in this instance that is not working. Here is the HTML, note that this text will never be the same, so I cannot use contains or similar functions.
<div id="content" class="center ui-content" data-role="content" role="main">
<div data-iscroll="scroller">
<div class="ui-corner-all ui-controlgroup ui-controlgroup-vertical" data-role="controlgroup">
<a class="ui-btn ui-corner-top ui-btn-hover-c" style="text-align: left" data-role="button" onclick="onDocumentClicked(21228772, "document.php?loan=********&folderseq=0&itemnum=21228772&pageCount=3&imageTypeName=1003 Application - Final&firstInitial=&lastName=")" href="#" data-corners="true" data-shadow="true" data-iconshadow="true" data-wrapperels="span" data-theme="c">
<span class="ui-btn-inner ui-corner-top">
<span class="ui-btn-text">
<img class="checkMark checkMark21228772 notViewedCompletely" width="15" height="15" title="You have not yet viewed this document." src="../images/white_dot.gif"/>
1003 Application - Final. (Jan 11 2012 5:04PM)
</span>
</span>
</a>
In this example, the text I am attempting to capture is: 1003 Application - Final. (Jan 11 2012 5:04PM)
I have inspected the element with Firebug and I have tried the following XPaths with no success.
html/body/div[1]/div[2]/div/div/a[1]/span/span
html/body/div[1]/div[2]/div/div/a[1]/span/span/text()
The WebDriver test is being written in C#.
You can either use this:
driver.FindElement(By.XPath("//div[@id='content']//span[@class='ui-btn-text']")).Text
or
var elem = driver.FindElement(By.Id("content"));
string text = string.Empty;
// FindElement throws NoSuchElementException instead of returning null, so use FindElements and check the count
var matches = elem.FindElements(By.XPath(".//span[@class='ui-btn-text']"));
if (matches.Count > 0) text = matches[0].Text;
I was able to solve this issue by removing the span tags from the XPath.
GetText("html/body/div[3]/div[2]/div/div/a[1]", SelectorType.XPath);
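In Python, the WebDriver code would look something like this (Selenium 4 syntax; the older find_element_by_xpath helper has been removed from recent Selenium releases):
from selenium.webdriver.common.by import By
driver.find_element(By.XPATH, "//span[@class='ui-btn-text']").text
But the locator may not be unique, since I can't see all of the code.
P.S. Try never to use absolute locators like html/body/div[1]/div[2]/div/div/a[1]/span/span; they break whenever the page structure changes.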
Approach:
1. Derive a CSS selector from the given DOM:
   css=#content div.ui-controlgroup > a[onclick*='onDocumentClicked'] > span > span
2. Use the C# library method to get the text.
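One reason the .../span/span/text() attempt fails is that WebDriver's FindElement can return only elements, not text nodes. The usual pattern is therefore to locate span.ui-btn-text and read its text. Outside a browser, the structure can be checked with Python's standard ElementTree: the wanted string is the tail text after the <img> inside that span. This is a sketch on a trimmed copy of the markup. The onclick attribute is dropped because its nested quotes are not valid XML. button_text is a hypothetical helper, and joining itertext() only approximates what WebDriver's Text property returns for visible text.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's markup (onclick and most attributes dropped)
SNIPPET = """<div id="content" class="center ui-content">
  <div data-iscroll="scroller">
    <div class="ui-corner-all ui-controlgroup ui-controlgroup-vertical">
      <a class="ui-btn ui-corner-top ui-btn-hover-c" href="#">
        <span class="ui-btn-inner ui-corner-top">
          <span class="ui-btn-text">
            <img class="checkMark" width="15" height="15" src="../images/white_dot.gif"/>
            1003 Application - Final. (Jan 11 2012 5:04PM)
          </span>
        </span>
      </a>
    </div>
  </div>
</div>"""


def button_text(xml_text: str) -> str:
    root = ET.fromstring(xml_text)
    # Relative locator instead of html/body/div[1]/div[2]/...
    span = root.find(".//span[@class='ui-btn-text']")
    if span is None:
        return ""
    # The label is the tail after <img/>; itertext() collects it along with the span's own text
    return " ".join("".join(span.itertext()).split())


print(button_text(SNIPPET))  # 1003 Application - Final. (Jan 11 2012 5:04PM)
```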

Get Text between two tags using nokogiri

My HTML structure is
<div class="line">
<h2>Header</h2>
<h3>Mailing Address</h3>
2349 Glorem ipsun lorem ipsum CA 95833<br>
<br>
Phone: 111-111-2111 Fax: 111-511-1111<br>
<a onfocus="blur()" target="_blank" href="">some text</a><br>
<a onfocus="blur()" target="_blank" href="">some address</a><br>
<div><p></p></div>
<h3>Contact(s)</h3>
</div>
The HTML page contains several <div class="line"> elements. For each div, I need to extract the phone and fax numbers into an array along with other data. I tried:
doc.css("div#ctl00_cphContent_divBrowseByMember").each do |div|
div.css("div.line").each do |line|
line.xpath('//text()[preceding-sibling::br and following-sibling::a]').text.strip
end
end
It returns nothing and ends in a timeout error.
If I try this instead:
line.xpath('//text()[preceding-sibling::br and following-sibling::a]')[0].text.strip
it returns the same phone and fax for every div. Please suggest another solution.
The easy way:
phone, fax = line.text.scan(/\d{3}-\d{3}-\d{4}/)
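The root cause in the question is that an XPath starting with // searches from the document root even when it is called on line. As a result, every div sees the same nodes. Using .//text(), or working on line.text as the answer does, keeps the search inside the current div. The same per-div scan is sketched below in Python with the standard library. It assumes the markup is well-formed, and the second div.line with its numbers is hypothetical sample data added to show that each div yields its own pair.

```python
import re
import xml.etree.ElementTree as ET

PHONE_RE = re.compile(r"\d{3}-\d{3}-\d{4}")

# First div.line is trimmed from the question; the second is hypothetical sample data
SNIPPET = """<div id="members">
<div class="line">
<h2>Header</h2>
<h3>Mailing Address</h3>
2349 Glorem ipsun lorem ipsum CA 95833<br/>
<br/>
Phone: 111-111-2111 Fax: 111-511-1111<br/>
<h3>Contact(s)</h3>
</div>
<div class="line">
<h2>Header</h2>
Phone: 222-222-3222 Fax: 222-622-2222<br/>
</div>
</div>"""


def phone_fax_per_line(xml_text: str) -> list:
    root = ET.fromstring(xml_text)
    rows = []
    # Relative search: each iteration looks only inside its own div.line
    for line in root.findall(".//div[@class='line']"):
        text = "".join(line.itertext())
        rows.append(PHONE_RE.findall(text))  # [phone, fax] when both are present
    return rows


print(phone_fax_per_line(SNIPPET))
```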
