Sphinx anchor defined twice (singlehtml output) - python-sphinx

I have a Sphinx project that uses figures and footnotes. I noticed that as soon as I add a caption to a figure, the ids rendered in the HTML are defined twice.
For example, consider a minimal project like this:
Project Example
===============

this is index.rst

.. toctree::
   :maxdepth: 2
   :caption: Contents:

   inc

hello [#0]_ world:

We should expect that footnote 0 [#1]_ would have `id1`, and footnote 2 `id2`

.. [#0] Lorem Impsum.
.. [#1] Lorem Impsum.
And inc.rst:
Included
========

.. figure:: _static/cat.jpg
   :scale: 20%
   :align: center

   This is a caption
Running sphinx-build -M singlehtml "." "_build" renders:
<span id="document-inc"></span><section id="included">
<h2>Included<a class="headerlink" href="#included" title="Permalink to this headline">¶</a></h2>
<figure class="align-center" id="id1">
<a ...></a>
<figcaption>
<p><span class="caption-text">This is a caption</span><a class="headerlink" href="#id1" title="Permalink to this image">¶</a></p>
</figcaption>
</figure>
</section>
</div>
<p>We should expect that operator <a class="footnote-reference brackets" href="#id3" id="id1">2</a> would:</p>
<p>We should expect that operator <a class="footnote-reference brackets" href="#id4" id="id2">3</a> would:</p>
<dl class="footnote brackets">
<dt class="label" id="id3"><span class="brackets"><a class="fn-backref" href="#id1">2</a></span></dt>
<dd><p>Lorem Impsum.</p>
</dd>
<dt class="label" id="id4"><span class="brackets"><a class="fn-backref" href="#id2">3</a></span></dt>
<dd><p>Lorem Impsum.</p>
</dd>
</dl>
If I remove the caption, the opening <figure> tag is rendered without id="id1", like this:
<figure class="align-center">
Is this a bug in Sphinx?
Can I tell Sphinx to use the next available id for the figure, to avoid collisions?
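The question has no accepted fix here, but the collision itself is easy to detect. As a diagnostic, a minimal sketch using only Python's standard library (it is not part of Sphinx, and IdCollector and duplicate_ids are hypothetical names) collects every id attribute in the generated page and reports the values that occur more than once. The sample is an abridged copy of the output shown above.

```python
from collections import Counter
from html.parser import HTMLParser


class IdCollector(HTMLParser):
    """Record every id attribute value in document order."""

    def __init__(self):
        super().__init__()
        self.ids = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes.
        for name, value in attrs:
            if name == "id" and value:
                self.ids.append(value)


def duplicate_ids(html):
    """Return the sorted list of id values that occur more than once."""
    parser = IdCollector()
    parser.feed(html)
    parser.close()
    return sorted(value for value, count in Counter(parser.ids).items() if count > 1)


# Abridged excerpt of the singlehtml output from the question.
sample = """
<figure class="align-center" id="id1"><figcaption><p>
<a class="headerlink" href="#id1">link</a></p></figcaption></figure>
<p><a class="footnote-reference brackets" href="#id3" id="id1">2</a></p>
<p><a class="footnote-reference brackets" href="#id4" id="id2">3</a></p>
<dl class="footnote brackets"><dt class="label" id="id3"></dt><dt class="label" id="id4"></dt></dl>
"""

print(duplicate_ids(sample))  # ['id1']
```

To check a real build, pass the page's contents to duplicate_ids; the location _build/singlehtml/index.html is an assumption about where this sphinx-build command writes the page.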

Related

How to get text which has no HTML tag

Here is the HTML:
<div class="ajaxcourseindentfix">
<h3>CPSC 353 - Introduction to Computer Security (3) </h3>
<hr>Security goals, security systems, access controls, networks and security, integrity, cryptography fundamentals, authentication. Attacks: software, network, website; management considerations, security standards in government and industry; security issues in requirements, architecture, design, implementation, testing, operation, maintenance, acquisition, and services.
<br>
<br>Prerequisite: CPSC 253U
<span style="display: none !important"> </span> or CPSC 254
<span style="display: none !important"> </span> and CPSC 351
<span style="display: none !important"> </span>
, declared major/minor in CPSC, CPEN, or CPEI
<br>
</div>
I need to fetch the following text from this HTML:
From Line 6 - or
From Line 7 - and
, declared major/minor in CPSC, CPEN, or CPEI
I am able to get the hrefs (course numbers such as CPSC 254) with the following XPath:
# This XPath gives me all the elements after the h3; I then iterate over them in my script.
//div[@class='ajaxcourseindentfix']/h3/following-sibling::text()[2]/following-sibling::*
Update
And then the text, with the following XPath:
# This XPath gives me all the text after the h3 tag.
//div[@class='ajaxcourseindentfix']/h3/following-sibling::text()[2]/following-sibling::text()
I need these course names/prerequisites in the same order as they appear at URL [1].
With this approach I get all the hrefs first and then all the text. Is there a better way to achieve this? I don't want to run two XPaths (hrefs first, then text) and afterwards combine the results to form the prerequisite string.
[1] http://catalog.fullerton.edu/ajax/preview_course.php?catoid=16&coid=99648&show
Try the code below to get the required output:
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, "html.parser")  # html holds the page source
div = soup.select("div.ajaxcourseindentfix")[0]
" ".join(div.stripped_strings).split("Prerequisite: ")[-1]
The output is
'CPSC 253U or CPSC 254 and CPSC 351 , declared major/minor in CPSC, CPEN, or CPEI'
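If BeautifulSoup is not available, the same idea (join every non-blank text node in document order, then keep what follows "Prerequisite: ") can be sketched with the standard library's html.parser. This swaps in a different parser than the answer uses; StrippedStrings and prerequisites are hypothetical names, and the HTML below is the question's markup with the course description shortened.

```python
from html.parser import HTMLParser


class StrippedStrings(HTMLParser):
    """Collect non-blank text nodes, similar in spirit to BeautifulSoup's stripped_strings."""

    def __init__(self):
        super().__init__()
        self.strings = []

    def handle_data(self, data):
        text = data.strip()
        if text:  # skip whitespace-only nodes such as the hidden <span> contents
            self.strings.append(text)


def prerequisites(html):
    parser = StrippedStrings()
    parser.feed(html)
    parser.close()
    # Text and link labels come out in document order, so one pass is enough.
    return " ".join(parser.strings).split("Prerequisite: ")[-1]


html = """<div class="ajaxcourseindentfix">
<h3>CPSC 353 - Introduction to Computer Security (3) </h3>
<hr>Security goals, security systems, access controls.
<br>
<br>Prerequisite: CPSC 253U
<span style="display: none !important"> </span> or CPSC 254
<span style="display: none !important"> </span> and CPSC 351
<span style="display: none !important"> </span>
, declared major/minor in CPSC, CPEN, or CPEI
<br>
</div>"""

print(prerequisites(html))
```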

How to write xpath

I want to get the text "+12345" from this HTML:
<p class="Test" ng-repeat="(k, wl) in partnerEditModel.td">
<span id="Test-update-12345" class="ng-binding">
+12345
<span class="err-message ng-binding">Error</span>
</span>
<a id="mibile" class="button" ng-click="remove(k)">
</p>
I have written //p[@class='Test']/span, but it matches "+12345Error", whereas I only want "+12345" (without "Error").
Could you please tell me how to write this XPath?
Try the XPath expression below to get only "+12345":
normalize-space(//span[@id="Test-update-12345"]/text()[1])
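The reason text()[1] works is that the outer span's first text node holds only "+12345"; the nested span contributes "Error" only to the element's full string value. A minimal sketch with Python's xml.etree.ElementTree makes the difference visible, assuming the snippet is simplified to well-formed XML (the unclosed <a> and the Angular attribute are dropped). Here span.text plays the role of text()[1] and itertext() the role of the element's string value.

```python
import xml.etree.ElementTree as ET

snippet = """<p class="Test">
<span id="Test-update-12345" class="ng-binding">
+12345
<span class="err-message ng-binding">Error</span>
</span>
</p>"""

root = ET.fromstring(snippet)
span = root.find("span[@id='Test-update-12345']")

# String value of the whole span: all descendant text nodes, as //p/span returns.
full = " ".join("".join(span.itertext()).split())
# Equivalent of normalize-space(text()[1]): only the text before the nested span.
first = " ".join(span.text.split())

print(full)   # +12345 Error
print(first)  # +12345
```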
Alternatively, try the code below in Robot Framework:
${var}= | Get Text | xpath=//span[span[@class='err-message ng-binding']]
Log | ${var}
Or try using:
xpath=//p[@class='Test']/span[1]

Filter/Exclude xPath extraction via "pattern"

This is what I have to work with:
<div class="Pictures zoom">
<a title="Productname 1" class="zoomThumbActive" rel="{gallery: 'gallery1', smallimage: '/images/2.24198/little_one.jpeg', largeimage: '/images/76.24561/big-one-picture.jpeg'}" href="javascript:void(0)" style="border-width:inherit;">
<img title="Productname 1" src="/images/24.245/mini-doge-picture.jpeg" alt="" /></a>
<a title="Productname 1" rel="{gallery: 'gallery1', smallimage: '/images/2.24203/small_one.jpeg', largeimage: '/images/9.5664/very-big-one-picture.jpeg'}" href="javascript:void(0)" style="border-width:inherit;">
<img title="Productname 1" src="/images/22.999/this-picture-is-very-small.jpeg" alt="" /></a>
</div>
Using the following XPath:
/html//div[@class='Pictures zoom']/a/@rel
The output becomes:
{gallery: 'gallery1', smallimage: '/images/2.24198/little_one.jpeg', largeimage: '/images/76.24561/big-one-picture.jpeg'}
{gallery: 'gallery1', smallimage: '/images/2.24203/small_one.jpeg', largeimage: '/images/9.5664/very-big-one-picture.jpeg'}
Is it possible to filter the extraction so that, instead of the above, I only get these:
/images/76.24561/big-one-picture.jpeg
/images/9.5664/very-big-one-picture.jpeg
I only wish to keep everything between largeimage: ' and '}
Best regards,
Liu Kang
Use substring-before and substring-after to cut off the parts you do not want.
Using XPath 1.0, this can only be done for single results (so you cannot fetch all URLs contained in one document with a single XPath call). This query will return the first URL:
substring-before(substring-after((//@rel)[1], "largeimage: '"), "'")
XPath 2.0 allows you to run functions as axis steps. This query will return all URLs you're looking for as single tokens:
//@rel/substring-before(substring-after(., "largeimage: '"), "'")
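Outside an XPath 2.0 engine, the same cut can be applied to every rel value in the host language. A minimal Python sketch, assuming the question's markup (trimmed to the attributes that matter): substring_after and substring_before are hypothetical helpers that mimic the XPath functions of the same name, including returning an empty string when the separator is absent (they assume a non-empty separator).

```python
import xml.etree.ElementTree as ET


def substring_after(s, sep):
    """Mimic XPath substring-after(): empty string if sep does not occur."""
    _, found, tail = s.partition(sep)
    return tail if found else ""


def substring_before(s, sep):
    """Mimic XPath substring-before(): empty string if sep does not occur."""
    head, found, _ = s.partition(sep)
    return head if found else ""


snippet = """<div class="Pictures zoom">
<a title="Productname 1" rel="{gallery: 'gallery1', smallimage: '/images/2.24198/little_one.jpeg', largeimage: '/images/76.24561/big-one-picture.jpeg'}" href="javascript:void(0)"><img src="/images/24.245/mini-doge-picture.jpeg" alt="" /></a>
<a title="Productname 1" rel="{gallery: 'gallery1', smallimage: '/images/2.24203/small_one.jpeg', largeimage: '/images/9.5664/very-big-one-picture.jpeg'}" href="javascript:void(0)"><img src="/images/22.999/this-picture-is-very-small.jpeg" alt="" /></a>
</div>"""

root = ET.fromstring(snippet)
# Same shape as the XPath 2.0 query: apply the function pair to every @rel.
urls = [substring_before(substring_after(a.get("rel"), "largeimage: '"), "'")
        for a in root.iter("a")]
print(urls)
```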

WebDriver Capture Text by XPath

I am attempting to capture a line of text in an automated WebDriver test so I can use it in a comparison later on. However, I cannot find an XPath that works with WebDriver. I have used the text() function before to capture text that is not inside a tag, but in this instance it does not work. Here is the HTML. Note that this text will never be the same, so I cannot use contains() or similar functions.
<div id="content" class="center ui-content" data-role="content" role="main">
<div data-iscroll="scroller">
<div class="ui-corner-all ui-controlgroup ui-controlgroup-vertical" data-role="controlgroup">
<a class="ui-btn ui-corner-top ui-btn-hover-c" style="text-align: left" data-role="button" onclick="onDocumentClicked(21228772, "document.php?loan=********&folderseq=0&itemnum=21228772&pageCount=3&imageTypeName=1003 Application - Final&firstInitial=&lastName=")" href="#" data-corners="true" data-shadow="true" data-iconshadow="true" data-wrapperels="span" data-theme="c">
<span class="ui-btn-inner ui-corner-top">
<span class="ui-btn-text">
<img class="checkMark checkMark21228772 notViewedCompletely" width="15" height="15" title="You have not yet viewed this document." src="../images/white_dot.gif"/>
1003 Application - Final. (Jan 11 2012 5:04PM)
</span>
</span>
</a>
In this example, the text I am attempting to capture is: 1003 Application - Final. (Jan 11 2012 5:04PM)
I have inspected the element with Firebug and I have tried the following XPaths with no success.
html/body/div[1]/div[2]/div/div/a[1]/span/span
html/body/div[1]/div[2]/div/div/a[1]/span/span/text()
The WebDriver test is being written in C#.
You can either use this:
driver.FindElement(By.XPath("//div[@id='content']//span[@class='ui-btn-text']")).Text
or
var elem = driver.FindElement(By.Id("content"));
// FindElement throws NoSuchElementException rather than returning null,
// so no null checks are needed; the span is a descendant of #content.
var textElem = elem.FindElement(By.XPath(".//span[@class='ui-btn-text']"));
string text = textElem.Text;
I was able to solve this issue by removing the span tags from the XPath.
GetText("html/body/div[3]/div[2]/div/div/a[1]", SelectorType.XPath);
In Python, the WebDriver code looks something like this:
from selenium.webdriver.common.by import By
driver.find_element(By.XPATH, "//span[@class='ui-btn-text']").text
But the locator may not be unique, because I can't see all of the code.
PS: Try never to use absolute locators like html/body/div[1]/div[2]/div/div/a[1]/span/span; any change to the page layout breaks them.
Approach:
Find a CSS selector from the given DOM.
Derived CSS: css=#content div.ui-controlgroup > a[onclick*='onDocumentClicked'] > span > span
Use the C# library method to get the text.

Get Text between two tags using nokogiri

My HTML structure is:
<div class="line">
<h2>Header</h2>
<h3>Mailing Address</h3>
2349 Glorem ipsun lorem ipsum CA 95833<br>
<br>
Phone: 111-111-2111 Fax: 111-511-1111<br>
<a onfocus="blur()" target="_blank" href="">some text</a><br>
<a onfocus="blur()" target="_blank" href="">some address</a><br>
<div><p></p></div>
<h3>Contact(s)</h3>
</div>
The HTML page contains several <div class="line"></div> elements. For each div I need to extract the phone and fax numbers into an array along with other data. I tried:
doc.css("div#ctl00_cphContent_divBrowseByMember").each do |div|
div.css("div.line").each do |line|
line.xpath('//text()[preceding-sibling::br and following-sibling::a]').text.strip
end
end
It returns nothing and times out.
If I try
line.xpath('//text()[preceding-sibling::br and following-sibling::a]')[0].text.strip
it returns the same phone and fax for every div. Please suggest another solution.
The easy way:
phone, fax = line.text.scan(/\d{3}-\d{3}-\d{4}/)
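One likely reason the XPath attempt returns the same numbers for every div is that an expression starting with // searches from the document root, not from the current line element. The regex answer avoids that by reading only line.text. A minimal Python sketch of the same per-div idea, using the standard library instead of Nokogiri: LineDivText and phones_and_faxes are hypothetical names, and the second div in the sample is invented to show that each block is scanned separately.

```python
import re
from html.parser import HTMLParser

PHONE = re.compile(r"\d{3}-\d{3}-\d{4}")


class LineDivText(HTMLParser):
    """Collect the text of each <div class="line"> separately, nested divs included."""

    def __init__(self):
        super().__init__()
        self.blocks = []  # one text string per div.line
        self.depth = 0    # div nesting depth inside the current div.line; 0 = outside

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self.depth:
            self.depth += 1
        elif "line" in (dict(attrs).get("class") or "").split():
            self.depth = 1
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.blocks[-1] += data


def phones_and_faxes(html):
    parser = LineDivText()
    parser.feed(html)
    parser.close()
    # Like Ruby's scan: every match in order, scoped to one div at a time.
    return [PHONE.findall(text) for text in parser.blocks]


html = """
<div class="line">
<h3>Mailing Address</h3>
2349 Glorem ipsun lorem ipsum CA 95833<br>
Phone: 111-111-2111 Fax: 111-511-1111<br>
<a onfocus="blur()" target="_blank" href="">some text</a><br>
<div><p></p></div>
<h3>Contact(s)</h3>
</div>
<div class="line">
Phone: 222-222-2222 Fax: 222-522-2222<br>
</div>
"""

for phone, fax in phones_and_faxes(html):
    print(phone, fax)
```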
