How to create a nested list in reStructuredText?

I am trying to create a properly nested list using the following code (following Sphinx and docutils docs):
1. X

  a. U
  b. V
  c. W

2. Y
3. Z
I expected this to produce two nested OLs, but I get the following output instead:
<ol class="arabic simple">
  <li>X</li>
</ol>
<blockquote>
  <div>
    <ol class="loweralpha simple">
      <li>U</li>
      <li>V</li>
      <li>W</li>
    </ol>
  </div>
</blockquote>
<ol class="arabic simple" start="2">
  <li>Y</li>
  <li>Z</li>
</ol>
What am I doing wrong? Is it not possible to get the following result?
<ol class="arabic simple">
  <li>X
    <ol class="loweralpha simple">
      <li>U</li>
      <li>V</li>
      <li>W</li>
    </ol>
  </li>
  <li>Y</li>
  <li>Z</li>
</ol>

Make sure the nested list is indented to the same level as the text of the parent list item (or three characters, whichever is greater), and separate it from the surrounding items with blank lines, like this:
1. X

   a. U
   b. V
   c. W

2. Y
3. Z
Then you'll get the output you expected.
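To make the indentation rule concrete, here is a small Python sketch of a hypothetical helper, nested_enumerated_list (it is not part of docutils or Sphinx), that writes reStructuredText source for a two-level list. It indents each child list to the column where the parent's text starts and surrounds it with blank lines. It assumes arabic numbers at the top level, lowercase letters below, and at most 26 children per item.

```python
from string import ascii_lowercase


def nested_enumerated_list(items):
    """Render (text, children) pairs as a two-level rST enumerated list.

    Top-level items use arabic numbers ("1."), children use lowercase
    letters ("a."). Children are indented to the column where the parent's
    text starts and are separated from the parent by blank lines.
    """
    lines = []
    for number, (text, children) in enumerate(items, start=1):
        marker = f"{number}. "
        lines.append(marker + text)
        if children:
            indent = " " * len(marker)  # align with the parent's text ("1. " -> 3)
            lines.append("")  # blank line before the nested list
            # zip() stops after 26 children; this sketch does not handle more.
            for letter, child in zip(ascii_lowercase, children):
                lines.append(f"{indent}{letter}. {child}")
            lines.append("")  # blank line after the nested list
    return "\n".join(lines)


print(nested_enumerated_list([("X", ["U", "V", "W"]), ("Y", []), ("Z", [])]))
```

Because the indent is derived from the marker width, an item such as "10. X" automatically gets four spaces of indentation for its children.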

If you want docutils (and therefore Sphinx) to take care of the numbering for you, use the auto-enumerator #. instead:
#. X
#. Y

   #. u
   #. v

#. Z

Related

How do I get the inner html content in this xpath expression?

I have some HTML code
<li><h3>Number Theory - Even Factors</h3>
<p lang="title">Number N = 2<sup>6</sup> * 5<sup>5</sup> * 7<sup>6</sup> * 10<sup>7</sup>; how many factors of N are even numbers?</p>
<ol class="xyz">
<li>1183</li>
<li>1200</li>
<li>1050</li>
<li>840</li>
</ol>
<ul class="exp">
<li class="grey fleft">
<span class="qlabs_tooltip_bottom qlabs_tooltip_style_33" style="cursor:pointer;">
<span>
<strong>Correct Answer</strong>
Choice (A).</br>1183
</span>
Correct answer
</span>
</li>
<li class="primary fleft">
Explanatory Answer
</li>
<li class="grey1 fleft">Factors - Even numbers</li>
<li class="orange flrt">Medium</li>
</ul>
</li>
In the HTML snippet above, I am trying to extract the contents of <p lang="title">. Notice that it uses <sup></sup> and <sub></sub> tags inside.
My XPath expression .//p[@lang="title"]/text() does not retrieve the contents of the sub and sup elements. How do I get the output below?
Desired Output
Number N = 2<sup>6</sup> * 5<sup>5</sup> * 7<sup>6</sup> * 10<sup>7</sup>; how many factors of N are even numbers?
XPath
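You can get the inner content with node(), as below:
//p[@lang="title"]/node()
Note that this returns a node-set (the text nodes plus the <sup> elements) rather than a single string, so you still have to serialize the nodes and join them to get the markup.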
Python
You can get the required innerHTML with the following Python code (BeautifulSoup 4):
from bs4 import BeautifulSoup

def innerHTML(element):
    "Function that receives element and returns its innerHTML"
    return element.decode_contents(formatter="html")

html = """<html>
<head>...
<body>...
Your HTML source code
..."""
soup = BeautifulSoup(html, "html.parser")
paragraph = soup.find('p', {"lang": "title"})
print(innerHTML(paragraph))
Output:
Number N = 2<sup>6</sup> * 5<sup>5</sup> * 7<sup>6</sup> * 10<sup>7</sup>; how many factors of N are even numbers?
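If BeautifulSoup is not available, the same idea (take the element's leading text, then each child serialized together with the text that follows it) can be sketched with Python's standard xml.etree.ElementTree. This is a substitute technique, not the answer's method. It assumes the fragment is well-formed XML; the full snippet from the question is not (it contains </br>), so only the relevant part is parsed here. inner_xml is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def inner_xml(element):
    """Return the markup between an element's start and end tags."""
    parts = [element.text or ""]  # text before the first child element
    for child in element:
        # ET.tostring serializes the child *and* its tail text, i.e. the
        # text that follows the child inside the parent (" * 5", ...).
        parts.append(ET.tostring(child, encoding="unicode"))
    return "".join(parts)


snippet = """<li><h3>Number Theory - Even Factors</h3>
<p lang="title">Number N = 2<sup>6</sup> * 5<sup>5</sup> * 7<sup>6</sup> * 10<sup>7</sup>; how many factors of N are even numbers?</p>
</li>"""

root = ET.fromstring(snippet)
paragraph = root.find(".//p[@lang='title']")  # ElementTree's limited XPath subset
print(inner_xml(paragraph))
```

For real-world HTML that is not well-formed, an HTML-tolerant parser such as BeautifulSoup remains the better choice.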

XPath in RSelenium for indexing list of values

Here is an example of html:
<li class="index i1"
<ol id="rem">
<div class="bare">
<h3>
<a class="tlt mhead" href="https://www.myexample.com">
<li class="index i2"
<ol id="rem">
<div class="bare">
<h3>
<a class="tlt mhead" href="https://www.myexample2.com">
I would like to get the value of every href in the a elements. What distinguishes the list items is the class of the first li, whose name changes: i1, i2, and so on.
So I keep a counter and change it each time I read a value:
i <- 1
stablestr <- "index "
myVal <- paste(stablestr , i, sep="")
However, I cannot even access the li element for a given myVal with this:
profile<-remDr$findElement(using = 'xpath', "//*/input[@li = myVal]")
profile$highlightElement()
or the href with this:
profile<-remDr$findElement(using = 'xpath', "/li[@class=myVal]/ol[@id='rem']/div[@id='bare']/h3/a[@class='tlt']")
profile$highlightElement()
Is there anything wrong with my XPath?
Your HTML structure is invalid. Your <li> tags are not closed properly, and it seems you are confusing <ol> with <li>. But for the sake of the question, I assume the structure is as you write, with properly closed <li> tags.
Then, constructing myVal is not right. It will yield "index 1" while you want "index i1". Use "index i" for stablestr.
Now for the XPath:
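//*/input[@li = myVal]
This is obviously wrong since there is no input in your XML. Also, you didn't prefix the variable with $. And finally, the * seems to be unnecessary. Try this:
//li[@class = $myVal]
Note that Selenium does not bind XPath variables, so from R you have to paste the value into the expression string instead, e.g. paste0("//li[@class='", myVal, "']").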
In your second XPath, there are also some errors:
/li[@class=myVal]/ol[@id='rem']/div[@id='bare']/h3/a[@class='tlt']
In @class=myVal the $ is missing; div[@id='bare'] should be div[@class='bare']; and the class of the a element is actually 'tlt mhead', not 'tlt'.
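The first two issues are easy to fix. The third one is trickier. You could use contains(@class, 'tlt'), but that would also match if the class is, e.g., tltt, which is probably not what you want. A stricter test is contains(concat(' ', normalize-space(@class), ' '), ' tlt '), which only matches tlt as a whole class name. Anyway, contains() might suffice for your use case. Also, start the path with // rather than /, because li is not the root element. Fixed XPath:
//li[@class=$myVal]/ol[@id='rem']/div[@class='bare']/h3/a[contains(@class, 'tlt')]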
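The loop from the question can be sketched in Python with the standard xml.etree.ElementTree module, which supports the simple attribute predicates used here (though not contains()). The markup is a hypothetical well-formed version of the question's HTML, and the f-string plays the role of paste()/paste0() in R, building the class value index i1, index i2, ... directly into the path. has_class is a hypothetical helper that performs the whole-token class test that the concat() idiom expresses in XPath.

```python
import xml.etree.ElementTree as ET

# Hypothetical well-formed version of the question's markup (closing tags added).
HTML = """<root>
<li class="index i1"><ol id="rem"><div class="bare"><h3>
  <a class="tlt mhead" href="https://www.myexample.com">first</a>
</h3></div></ol></li>
<li class="index i2"><ol id="rem"><div class="bare"><h3>
  <a class="tlt mhead" href="https://www.myexample2.com">second</a>
</h3></div></ol></li>
</root>"""


def has_class(element, name):
    """True if `name` is one whole token of the element's class attribute.

    Same effect as contains(concat(' ', normalize-space(@class), ' '), ' tlt ')
    in XPath: it matches 'tlt mhead' but not 'tltt'.
    """
    return name in element.get("class", "").split()


def collect_hrefs(root, count):
    hrefs = []
    for i in range(1, count + 1):
        my_val = f"index i{i}"  # the value the R code should build with paste()
        path = f".//li[@class='{my_val}']/ol[@id='rem']/div[@class='bare']/h3/a"
        for a in root.findall(path):
            if has_class(a, "tlt"):
                hrefs.append(a.get("href"))
    return hrefs


root = ET.fromstring(HTML)
print(collect_hrefs(root, 2))
```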

getSiblings() and sort_by error

I am working in Liferay with structure (XML) and template (FTL).
My problem is that I don't understand how to use ?sort_by together with getSiblings().
For example, this code does not work:
<ul id="emedia-categories">
<#list category?sort_by('linktext').getSiblings() as cat>
<li>
<a href="${cat.path.getData()}" title="${cat.title.getData()}">
<h3>
${cat.linktext.getData()}
</h3>
<img src="${cat.image.getData()}" alt="image-alt">
</a>
</li>
</#list>
</ul>
The error I get is the following:
Expected sequence. category evaluated instead to com.liferay.portal.freemarker.LiferayTemplateModel on line 2, column 16 in 14868#14904#131571.
What I want to achieve is to loop over all the data while sorting it by the string inside each cat.linktext, so that the result comes out like A, B, C, D, E...
instead of D, B, E, A, C...
This is my only working variant, but it does not sort on linktext at all; it just loops over the data in the order it was entered (probably by id):
<ul id="emedia-categories">
<#list category.getSiblings() as cat>
<li>
<a href="${cat.path.getData()}" title="${cat.title.getData()}">
<h3>
${cat.linktext.getData()}
</h3>
<img src="${cat.image.getData()}" alt="image-alt">
</a>
</li>
</#list>
</ul>
The error message is quite clear: you are trying to sort category, which is not a sequence (= a list or array).
You want to sort the siblings, which are a sequence (= a list), by the attribute linktext.data:
<#list category.siblings?sort_by(['linktext', 'data']) as cat>
...
</#list>
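The distinction between sorting one node and sorting a sequence, and the meaning of the path argument ['linktext', 'data'], can be sketched in Python. The dictionaries below are hypothetical stand-ins for the Liferay siblings, and sort_by_path is a hypothetical helper; it only mimics the idea behind FreeMarker's ?sort_by, which accepts a sequence of subvariable names to reach a nested value.

```python
# Hypothetical stand-ins for the siblings returned by category.getSiblings():
# each entry's "linktext" field holds its text under "data", mirroring
# cat.linktext.getData() in the template.
siblings = [
    {"linktext": {"data": "D"}},
    {"linktext": {"data": "B"}},
    {"linktext": {"data": "E"}},
    {"linktext": {"data": "A"}},
    {"linktext": {"data": "C"}},
]


def sort_by_path(sequence, path):
    """Sort a sequence by a nested key, similar to ?sort_by(['linktext', 'data'])."""

    def key(item):
        for name in path:  # walk down one subvariable per path element
            item = item[name]
        return item

    # The sort is applied to the whole list of siblings, not to one entry.
    return sorted(sequence, key=key)


ordered = sort_by_path(siblings, ["linktext", "data"])
print([entry["linktext"]["data"] for entry in ordered])
```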

Extracting contents from a list split across different divs

Consider the following html
<div id="relevantID">
<div class="column left">
<h1> Section-Header-1 </h1>
<ul>
<li>item1a</li>
<li>item1b</li>
<li>item1c</li>
<li>item1d</li>
</ul>
</div>
<div class="column">
<ul> <!-- Pay attention here -->
<li>item1e</li>
<li>item1f</li>
</ul>
<h1> Section-Header-2 </h1>
<ul>
<li>item2a</li>
<li>item2b</li>
<li>item2c</li>
<li>item2d</li>
</ul>
</div>
<div class="column right">
<h1> Section-Header-3 </h1>
<ul>
<li>item3a</li>
<li>item3b</li>
<li>item3c</li>
<li>item3d</li>
</ul>
</div>
</div>
My objective is to extract the items for each section header. Inconveniently, however, the designer of the webpage decided to break up the data into three columns, wrapping each in an additional div (with classes such as column right).
My current extraction method works like this. For the section headers, I use the following XPath (get all h1 elements within a div with a given id):
//div[@id="relevantID"]//h1
This returns a list of h1 elements. Looping over them, for each matched h1 element I apply an additional selector that looks up the next ul node and retrieves all its li nodes:
following-sibling::ul//li
But thanks to the designer's aesthetics, this fails in the one particular case I've marked in the HTML, where the items are split across two different column divs.
I could probably bypass this problem by stripping out the column divs entirely, but I don't think modifying the HTML to make a selector match is considered good practice (I haven't seen it needed in any of the examples I've browsed so far).
What would be a good way to extract data formatted like this? Full solutions are not necessary; hints/tips will do. Thanks!
The columns do frustrate use of following-sibling:: and preceding-sibling::, but you could instead use the following:: and preceding:: axis if the columns at least keep the list items in proper document order. (That is indeed the case in your example.)
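The following XPath will select all li items, regardless of column, occurring after the "Section-Header-1" h1 and before the "Section-Header-2" h1 in document order:
//div[@id='relevantID']//li[normalize-space(preceding::h1[1]) = 'Section-Header-1'
and normalize-space(following::h1[1]) = 'Section-Header-2']
The [1] predicates pick the nearest header on each side (preceding:: is a reverse axis, so preceding::h1[1] is the closest h1 before the li). Without them, normalize-space() uses the first h1 in document order, which happens to work for the first section but not for later ones.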
Specifically, it selects the following items from your example HTML:
<li>item1a</li>
<li>item1b</li>
<li>item1c</li>
<li>item1d</li>
<li>item1e</li>
<li>item1f</li>
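The idea behind this answer, assigning each li to the nearest h1 before it in document order regardless of which column div it sits in, can be sketched in Python with xml.etree.ElementTree, whose iter() walks a tree in document order. items_by_header is a hypothetical helper, and the markup is the question's HTML reformatted (it happens to be well-formed XML).

```python
import xml.etree.ElementTree as ET

HTML = """<div id="relevantID">
  <div class="column left">
    <h1> Section-Header-1 </h1>
    <ul><li>item1a</li><li>item1b</li><li>item1c</li><li>item1d</li></ul>
  </div>
  <div class="column">
    <ul><li>item1e</li><li>item1f</li></ul>
    <h1> Section-Header-2 </h1>
    <ul><li>item2a</li><li>item2b</li><li>item2c</li><li>item2d</li></ul>
  </div>
  <div class="column right">
    <h1> Section-Header-3 </h1>
    <ul><li>item3a</li><li>item3b</li><li>item3c</li><li>item3d</li></ul>
  </div>
</div>"""


def items_by_header(container):
    """Group li texts under the nearest preceding h1, in document order.

    The column divs do not matter because iter() visits the whole subtree
    in document order, the same order the preceding:: and following::
    axes are defined on.
    """
    sections = {}
    current = None
    for element in container.iter():
        if element.tag == "h1":
            current = element.text.strip()  # trim the surrounding spaces
            sections[current] = []
        elif element.tag == "li" and current is not None:
            sections[current].append(element.text.strip())
    return sections


root = ET.fromstring(HTML)
for header, items in items_by_header(root).items():
    print(header, items)
```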
You can combine following-sibling and preceding-sibling with the union operator | to also get the li elements that sit in the same column div before the h1. For example, for the second h1:
((//div[@id="relevantID"]//h1)[2]/preceding-sibling::ul//li) |
((//div[@id="relevantID"]//h1)[2]/following-sibling::ul//li)
Result:
<li>item1e</li>
<li>item1f</li>
<li>item2a</li>
<li>item2b</li>
<li>item2c</li>
<li>item2d</li>
As you're already selecting all h1 elements using //div[@id="relevantID"]//h1 and, as a second step, retrieving all li items for each h1 using following-sibling::ul//li, you could combine this into following-sibling::ul//li | preceding-sibling::ul//li.

XPath and negation searches

I have the following code sample inside a root element that declares a default namespace (xmlns):
<ol class="stan">
<li>Item one.</li>
<li>
<p>Paragraph one.</p>
<p>Paragraph two.</p>
</li>
<li>
<pre>Preformated one.</pre>
<p>Paragraph one.</p>
</li>
</ol>
I would like to perform a different operation on each <li> depending on the type of tag its first item resides in, or on whether there is no tag at all, as in the first <li> in the sample.
EDIT:
My logic in pursuing the task turns out to be incorrect.
How do I query a <li> that has no descendants as in the first list item?
I tried negation:
@doc.xpath("//xmlns:ol[@class='stan']//xmlns:li/xmlns:*[1][not(p|pre)]")
That gives me the exact opposite of what I think I am asking for.
I think I am making the expression more complicated than it needs to be, since I can't find the right solution.
UPDATE:
Navin Rawat has answered this one in the comments. The correct code would be:
@doc.xpath("//xmlns:ol[@class='stan']/xmlns:li[not(xmlns:*)]")
CORRECTION:
The correct question involves both an XPath search and a Nokogiri method.
Given the above XHTML code, how do I search for the first child element using XPath? And how do I use XPath in a conditional statement, e.g.:
@doc.xpath("//xmlns:ol[@class='stan']/xmlns:li").each do |e|
  if e.xpath("e has no descendants")
    perform task
  elsif e.xpath("e first descendant is <p>")
    perform second task
  elsif e.xpath("e first descendant is <pre>")
    perform third task
  end
end
I am not asking for complete code, just the parts in parentheses in the Nokogiri code above.
Pure XPath answer...
If you have the following XML:
<ol class="stan">
<li>Item one.</li>
<li>
<p>Paragraph one.</p>
<p>Paragraph two.</p>
</li>
<li>
<pre>Preformated one.</pre>
<p>Paragraph one.</p>
</li>
</ol>
and want to select the <li> elements that have no child element, like the first list item, use:
//ol/li[count(*)=0]
If you have namespace problems, please post the whole XML (with the root element and namespace declarations) so that we can help you deal with it.
EDIT: after our discussion, here is your final tested code :)
@doc.xpath("//xmlns:ol[@class='footnotes']/xmlns:li").each do |e|
  if e.xpath("count(*)=0")
    puts "No children"
  elsif e.xpath("count(*[1]/self::xmlns:p)=1")
    puts "First child is <p>"
  elsif e.xpath("count(*[1]/self::xmlns:pre)=1")
    puts "First child is <pre>"
  end
end
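For comparison, the same three-way check can be sketched in Python with xml.etree.ElementTree instead of Nokogiri. The sketch assumes the default namespace is the XHTML namespace http://www.w3.org/1999/xhtml (the question does not show the root element), and classify is a hypothetical helper. list(li) yields only child elements, which matches * in count(*).

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"  # assumption: the document's default namespace
NS = {"x": XHTML}

DOC = f"""<html xmlns="{XHTML}"><body>
<ol class="stan">
<li>Item one.</li>
<li><p>Paragraph one.</p><p>Paragraph two.</p></li>
<li><pre>Preformated one.</pre><p>Paragraph one.</p></li>
</ol>
</body></html>"""


def classify(li):
    """Describe an <li> by its first child element, or by having none."""
    children = list(li)  # child elements only; text nodes are not included
    if not children:
        return "no children"
    first = children[0].tag  # namespaced tags look like "{http://...}p"
    if first == f"{{{XHTML}}}p":
        return "first child is <p>"
    if first == f"{{{XHTML}}}pre":
        return "first child is <pre>"
    return "other"


root = ET.fromstring(DOC)
for li in root.findall(".//x:ol[@class='stan']/x:li", NS):
    print(classify(li))
```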
