Begin ordered list from 0 in Markdown - html-lists

I'm new to Markdown. I was writing something like:
# Table of Contents
0. Item 0
1. Item 1
2. Item 2
But that generates a list that starts with 1, effectively rendering something like:
# Table of Contents
1. Item 0
2. Item 1
3. Item 2
I want to start the list from zero. Is there an easy way to do that?
If not, I could simply rename all of my indices, but this is annoying when there are several items. Beginning a list from zero seems so natural to me, it's like beginning the index of an array from zero.

Simply: NO
Longer: YES, BUT
When you create an ordered list in Markdown, it is converted to an HTML ordered list, e.g.:
# Table of Contents
0. Item 0
1. Item 1
2. Item 2
Will create:
<h1>Table of Contents</h1>
<ol>
<li>Item 0</li>
<li>Item 1</li>
<li>Item 2</li>
</ol>
As you can see, the generated HTML carries no information about the starting number. If you want to start at a particular number, unfortunately, you have to write plain HTML:
<ol start="0">
<li>Item 0</li>
<li>Item 1</li>
<li>Item 2</li>
</ol>

You can use the HTML start attribute:
<ol start="0">
<li> item 1</li>
<li> item 2</li>
<li> item 3</li>
</ol>
It's currently supported in all browsers: Internet Explorer 5.5+, Firefox 1+, Safari 1.3+, Opera 9.2+, Chrome 2+
Optionally, you can use the type attribute for other numbering styles:
type="1" - decimal (default style)
type="a" - lower-alpha
type="A" - upper-alpha
type="i" - lower-roman
type="I" - upper-roman
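The following Python sketch is a rough model of what each type value displays. It maps an item's ordinal to its marker text. It assumes the CSS counter-style ranges that browsers apply to these styles:

- Alphabetic styles cover 1 and up.
- Roman styles cover 1 to 3999.
- Anything outside the range falls back to decimal.

Under that assumption, start="0" combined with type="a" should still show 0. The names list_marker, to_alpha and to_roman are hypothetical.

```python
# Roman numeral values, largest first, including the subtractive pairs.
ROMAN_NUMERALS = [
    (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
    (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
    (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
]


def to_roman(n: int) -> str:
    """Convert 1..3999 to an upper-case roman numeral."""
    parts = []
    for value, numeral in ROMAN_NUMERALS:
        count, n = divmod(n, value)
        parts.append(numeral * count)
    return "".join(parts)


def to_alpha(n: int) -> str:
    """Convert n >= 1 to lower-case letters: 1 -> a, 26 -> z, 27 -> aa."""
    letters = []
    while n > 0:
        n, rem = divmod(n - 1, 26)
        letters.append(chr(ord("a") + rem))
    return "".join(reversed(letters))


def list_marker(n: int, list_type: str = "1") -> str:
    """Marker text for ordinal n under an <ol type="..."> value (hypothetical helper)."""
    if list_type == "1":
        return str(n)
    if list_type in ("a", "A"):
        if n < 1:
            return str(n)  # outside the alphabetic range: fall back to decimal
        letters = to_alpha(n)
        return letters if list_type == "a" else letters.upper()
    if list_type in ("i", "I"):
        if not 1 <= n <= 3999:
            return str(n)  # outside the roman range: fall back to decimal
        numeral = to_roman(n)
        return numeral.lower() if list_type == "i" else numeral
    raise ValueError(f"unsupported type: {list_type!r}")


# Markers for <ol start="0" type="i"> with three items.
start = 0
print([list_marker(start + index, "i") for index in range(3)])  # ['0', 'i', 'ii']
```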

Via html: use <ol start="0">
Via CSS:
ol {
counter-reset: num -1; /* reset the counter to -1 (any counter name works) */
}
ol li {
list-style-type: none; /* hide the default numbers */
}
ol li::before {
counter-increment: num; /* increment before each item, so the first shows 0 */
content: counter(num) ". ";
}

Update: this depends on the implementation.
The current CommonMark specification takes a list's start number from its first item, so a list beginning with 0. should be rendered as <ol start="0">. Some implementations already support this, e.g. pandoc and markdown-it. For more details, see Babelmark.
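The CommonMark rule can be sketched in a few lines of Python. The number on the first item becomes the start attribute, which is omitted when it is 1, and the numbers on later items are ignored. This sketch has several limitations:

- It only handles a flat list of "N." or "N)" lines.
- render_ordered_list is a hypothetical name.
- Real parsers such as markdown-it also handle nesting, continuation lines and delimiter changes.

```python
import html
import re

# A CommonMark-style ordered list marker: 1-9 digits, then "." or ")", then a space.
ITEM_RE = re.compile(r"^(\d{1,9})[.)] (.*)$")


def render_ordered_list(markdown: str) -> str:
    """Render a flat Markdown ordered list to HTML (hypothetical helper).

    Only the first item's number matters: it becomes the start attribute
    when it is not 1; the numbers on later items are ignored.
    """
    items = []
    start = None
    for line in markdown.strip().splitlines():
        match = ITEM_RE.match(line)
        if match is None:
            raise ValueError(f"not a list item: {line!r}")
        if start is None:
            start = int(match.group(1))  # "007" becomes 7
        items.append(html.escape(match.group(2)))
    if start is None:
        raise ValueError("no list items found")

    open_tag = "<ol>" if start == 1 else f'<ol start="{start}">'
    body = "\n".join(f"<li>{item}</li>" for item in items)
    return f"{open_tag}\n{body}\n</ol>"


print(render_ordered_list("0. Item 0\n1. Item 1\n2. Item 2"))
```

With the question's input, the output opens with <ol start="0"> and contains the three items unchanged.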

Related

Unexpected pandoc behavior converting markdown list to html

Pandoc describes its behavior clearly here in the section "Compact and loose lists"
However the conversion of
# Test
- Item 1
- Item 2
    - Subitem 1
    - Subitem 2
results in
<h1 id="test">Test</h1>
<ul>
<li><p>Item 1</p></li>
<li><p>Item 2</p>
<ul>
<li>Subitem 1</li>
<li>Subitem 2</li>
</ul></li>
</ul>
My understanding is that the output should be
<h1 id="test">Test</h1>
<ul>
<li><p>Item 1</p></li>
<li>Item 2
<ul>
<li>Subitem 1</li>
<li>Subitem 2</li>
</ul></li>
</ul>
I'm using pandoc 2.10.1. Any thoughts?
This was changed in pandoc 2.7 in order to get pandoc's behavior more in line with that of CommonMark. The changelog contains this entry:
Markdown reader:
Improve tight/loose list handling (#5285). Previously the
algorithm allowed list items with a mix of Para and Plain, which
is never wanted.
The mentioned issue is #5285.
It seems that the documentation was not updated. This should be reported.
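The rule pandoc moved to can be illustrated with a deliberately simplified Python model. This is not pandoc's code:

- It treats a list as loose only when a blank line separates two top-level items. CommonMark also counts blank lines between blocks inside one item.
- It handles only "- " bullets with one level of nesting.
- The helper names split_items and render_list are hypothetical.

What it shows is that looseness belongs to the whole list: every item gets <p> or none does. So if the source has a blank line between Item 1 and Item 2, Item 2 is wrapped as well.

```python
def split_items(markdown: str):
    """Group '- ' bullet lines into top-level items (hypothetical helper).

    Returns the items and whether a blank line separated any two
    top-level items, which is what makes this simplified list loose.
    """
    items = []
    blank_seen = False
    loose = False
    for line in markdown.strip("\n").splitlines():
        if not line.strip():
            blank_seen = True
            continue
        if line.startswith("- "):
            if items and blank_seen:
                loose = True  # blank line between two top-level items
            items.append([line[2:].strip()])
        elif items:
            items[-1].append(line.strip())  # indented line: assumed to be a nested "- " bullet
        else:
            raise ValueError(f"expected a '- ' item, got {line!r}")
        blank_seen = False
    return items, loose


def render_list(markdown: str) -> str:
    """Render the simplified list: every top-level item gets <p>, or none does."""
    items, loose = split_items(markdown)
    rows = ["<ul>"]
    for head, *children in items:
        text = f"<p>{head}</p>" if loose else head
        nested = "".join(f"<li>{child[2:]}</li>" for child in children)
        sublist = f"<ul>{nested}</ul>" if children else ""
        rows.append(f"<li>{text}{sublist}</li>")
    rows.append("</ul>")
    return "\n".join(rows)


print(render_list("- Item 1\n\n- Item 2\n    - Subitem 1\n    - Subitem 2"))
```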

xpath nested ul list

I am banging my head against a wall here; it's probably something simple that I am missing.
I have a HTML un-ordered list (ul) like the following:
<ul>
<li>Elm 1</li>
<li>Elm 2 - with children
<ul>
<li>Nested Elm</li>
<li>Another Elm</li>
</ul>
</li>
</ul>
Using XPath (version 1.0, as supported by Scrapy), how would I get the text out of all the li elements, including the nested ones?
Thanks for any help!
If you need xpath, use response.xpath('//ul//li/text()').extract().
If you can use css, it is shorter: response.css('ul li::text').extract()
Try with a simple xpath selector:
from scrapy.selector import Selector
selector = Selector(text="""
<ul>
<li>Elm 1</li>
<li>Elm 2 - with children
<ul>
<li>Nested Elm</li>
<li>Another Elm</li>
</ul>
</li>
</ul>""")
print(selector.xpath('//li/text()').extract())
This outputs:
['Elm 1', 'Elm 2 - with children\n ', 'Nested Elm', 'Another Elm', '\n ']
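If Scrapy is not available, a similar extraction can be sketched with Python's standard-library ElementTree. This assumes the fragment is well-formed XML. ElementTree's limited XPath subset has no text() step, so the sketch collects each li's direct text nodes by hand from its .text and each child's .tail. The helper li_own_text is hypothetical. Unlike the XPath output above, it yields one stripped string per li.

```python
import xml.etree.ElementTree as ET

FRAGMENT = """<ul>
<li>Elm 1</li>
<li>Elm 2 - with children
<ul>
<li>Nested Elm</li>
<li>Another Elm</li>
</ul>
</li>
</ul>"""


def li_own_text(li):
    """Join the text nodes that are direct children of li (like li/text())."""
    parts = [li.text or ""]
    parts.extend(child.tail or "" for child in li)
    return "".join(parts).strip()


root = ET.fromstring(FRAGMENT)
# iter("li") walks the tree in document order, including nested <li> elements.
texts = [li_own_text(li) for li in root.iter("li")]
print(texts)  # ['Elm 1', 'Elm 2 - with children', 'Nested Elm', 'Another Elm']
```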

Extracting contents from a list split across different divs

Consider the following html
<div id="relevantID">
<div class="column left">
<h1> Section-Header-1 </h1>
<ul>
<li>item1a</li>
<li>item1b</li>
<li>item1c</li>
<li>item1d</li>
</ul>
</div>
<div class="column">
<ul> <!-- Pay attention here -->
<li>item1e</li>
<li>item1f</li>
</ul>
<h1> Section-Header-2 </h1>
<ul>
<li>item2a</li>
<li>item2b</li>
<li>item2c</li>
<li>item2d</li>
</ul>
</div>
<div class="column right">
<h1> Section-Header-3 </h1>
<ul>
<li>item3a</li>
<li>item3b</li>
<li>item3c</li>
<li>item3d</li>
</ul>
</div>
</div>
My objective is to extract the items under each section header. Inconveniently, however, the designer of the webpage decided to break up the data into three columns, adding extra div elements (with classes column left, column and column right).
My current method of extraction was using the xpath
For the section headers, I use the following XPath (get all h1 elements within a div with a given id):
//div[@id="relevantID"]//h1
The above returns a list of h1 elements. Looping over them, I apply an additional selector that, for each matched h1 element, looks up the next ul node and retrieves all its li nodes:
following-sibling::ul//li
But thanks to the designer's aesthetics, I am failing in the one particular case I've marked in the HTML file. Where the items are split across two different column divs.
I can probably bypass this problem by stripping out the column divs entirely, but I don't think modifying the html to make a selector match is considered good (I haven't seen it needed anywhere in the examples I've browsed so far).
What would be a good way to extract data that has been formatted like this? Full solutions are not necessary; hints/tips will do. Thanks!
The columns do frustrate use of following-sibling:: and preceding-sibling::, but you could instead use the following:: and preceding:: axis if the columns at least keep the list items in proper document order. (That is indeed the case in your example.)
The following XPath will select all li items, regardless of column, occurring after the "Section-Header-1" h1 and before the "Section-Header-2" h1 in document order (the [1] predicates pick the nearest h1 on each side):
//div[@id='relevantID']//li[normalize-space(preceding::h1[1]) = 'Section-Header-1'
and normalize-space(following::h1[1]) = 'Section-Header-2']
Specifically, it selects the following items from your example HTML:
<li>item1a</li>
<li>item1b</li>
<li>item1c</li>
<li>item1d</li>
<li>item1e</li>
<li>item1f</li>
You can combine following-sibling and preceding-sibling with the union operator | to also get any li elements that sit in the same div before the h1. For example, for the second h1:
((//div[@id="relevantID"]//h1)[2]/preceding-sibling::ul//li) |
((//div[@id="relevantID"]//h1)[2]/following-sibling::ul//li)
Result:
<li>item1e</li>
<li>item1f</li>
<li>item2a</li>
<li>item2b</li>
<li>item2c</li>
<li>item2d</li>
As you're already selecting all h1 elements using //div[@id="relevantID"]//h1 and, as a second step, retrieving all li items for each h1 using following-sibling::ul//li, you could combine the two into following-sibling::ul//li | preceding-sibling::ul//li.
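The document-order idea behind the following::/preceding:: answer can also be sketched outside XPath. Walk every element in document order and file each li under the most recent h1. The sketch below uses Python's standard-library ElementTree and assumes the page parses as well-formed XML. It wraps the question's markup in a body element, condensed onto fewer lines. The helpers items_by_header and normalize_space are hypothetical.

```python
import xml.etree.ElementTree as ET

PAGE = """<body><div id="relevantID">
<div class="column left"><h1> Section-Header-1 </h1>
<ul><li>item1a</li><li>item1b</li><li>item1c</li><li>item1d</li></ul></div>
<div class="column"><ul><li>item1e</li><li>item1f</li></ul>
<h1> Section-Header-2 </h1>
<ul><li>item2a</li><li>item2b</li><li>item2c</li><li>item2d</li></ul></div>
<div class="column right"><h1> Section-Header-3 </h1>
<ul><li>item3a</li><li>item3b</li><li>item3c</li><li>item3d</li></ul></div>
</div></body>"""


def normalize_space(text):
    """Rough equivalent of XPath normalize-space() for a single string."""
    return " ".join((text or "").split())


def items_by_header(container):
    """Map each h1's text to the li texts that follow it in document order."""
    groups = {}
    current = None
    # iter() yields the container and all descendants in document order,
    # so the column divs do not matter: only the order of h1 and li does.
    for element in container.iter():
        if element.tag == "h1":
            current = normalize_space(element.text)
            groups[current] = []
        elif element.tag == "li" and current is not None:
            groups[current].append(normalize_space(element.text))
    return groups


root = ET.fromstring(PAGE)
container = root.find(".//div[@id='relevantID']")
groups = items_by_header(container)
for header, items in groups.items():
    print(header, items)
```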

xpath for locating li with text does not work

Using the xpath //ul//li[contains(text(),"outer")] to find a li in the outer ul does not work
<ul>
<li>
<span> not unique text, </span>
<span> not unique text, </span>
outer ul li 1
<ul >
<li> inner ul li 1 </li>
<li> inner ul li 2 </li>
</ul>
</li>
<li>
<span> not unique text, </span>
<span> not unique text, </span>
outer ul li 2
<ul >
<li> inner ul li 1 </li>
<li> inner ul li 2 </li>
</ul>
</li>
</ul>
Any idea how to find a li with a specific text in the outer ul?
Thank you
This will work for you //ul//li[contains(.,"outer")]
I expect you only want to consider the text nodes that are direct children of the li, so you are right to use text() (contains(.,"outer") would also consider text from any descendants of the li).
Therefore try this:
//ul/li[text()[contains(.,'outer')]]
Running this with Saxon, the original XPath expression gives:
XPTY0004: A sequence of more than one item is not allowed as the first argument of
contains() ("", "", ...)
Now, I guess Selenium is probably using XPath 1.0 rather than XPath 2.0, and in 1.0 the contains() function has "first item semantics" - it converts its argument to a string, which if the argument is a node-set containing more than one node, involves considering only the first node. And the first text node is probably whitespace.
If you want to test whether some child text node contains "outer", use
//ul//li[text()[contains(.,"outer")]]
Another reason for switching to XPath 2.0...
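The difference between these expressions can be reproduced with Python's standard-library ElementTree, assuming the fragment is well-formed XML. ElementTree has no text() step, so the sketch collects an element's direct text nodes from its .text and each child's .tail. The helper text_children is hypothetical. Testing every direct text node finds both outer items. The XPath 1.0 first-node rule finds none, because each outer li's first text node is only whitespace.

```python
import xml.etree.ElementTree as ET

FRAGMENT = """<ul>
<li>
<span> not unique text, </span>
<span> not unique text, </span>
outer ul li 1
<ul><li> inner ul li 1 </li><li> inner ul li 2 </li></ul>
</li>
<li>
<span> not unique text, </span>
<span> not unique text, </span>
outer ul li 2
<ul><li> inner ul li 1 </li><li> inner ul li 2 </li></ul>
</li>
</ul>"""


def text_children(li):
    """Direct text-node children of li, in order (what li/text() selects)."""
    nodes = [li.text] + [child.tail for child in li]
    return [node for node in nodes if node]  # XPath has no empty text nodes


root = ET.fromstring(FRAGMENT)
all_li = list(root.iter("li"))

# Like li[text()[contains(., "outer")]]: test every direct text node.
any_node = [li for li in all_li if any("outer" in t for t in text_children(li))]

# Like XPath 1.0 li[contains(text(), "outer")]: only the first text node counts.
first_node = [li for li in all_li if "outer" in (text_children(li) or [""])[0]]

labels = [" ".join("".join(text_children(li)).split()) for li in any_node]
print(labels)           # ['outer ul li 1', 'outer ul li 2']
print(len(first_node))  # 0
```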
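For the issue above, this solution will work:
//ul//li[contains(.,"outer")]
"." selects the current (context) node, so contains() tests the li's whole string value, including the text of its descendants.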

XPath and negation searches

I have the following code sample in an xmlns root:
<ol class="stan">
<li>Item one.</li>
<li>
<p>Paragraph one.</p>
<p>Paragraph two.</p>
</li>
<li>
<pre>Preformated one.</pre>
<p>Paragraph one.</p>
</li>
</ol>
I would like to perform a different operation on the first item in <li> depending on the type of tag it resides in, or no tag, i.e. the first <li> in the sample.
EDIT:
My logic in pursuing the task turns out to be incorrect.
How do I query a <li> that has no descendants as in the first list item?
I tried negation:
@doc.xpath("//xmlns:ol[@class='stan']//xmlns:li/xmlns:*[1][not(p|pre)]")
That gives me the exact opposite for what I think I am asking for.
I think I am making the expression more complicated since I can't find the right solution.
UPDATE:
Navin Rawat has answered this one in the comments. The correct code would be:
@doc.xpath("//xmlns:ol[@class='stan']/xmlns:li[not(xmlns:*)]")
CORRECTION:
The correct question involves both an XPath search and a Nokogiri method.
Given the above xhtml code, how do I search for first descendant using xpath? And how do I use xpath in a conditional statement, e.g.:
@doc.xpath("//xmlns:ol[@class='stan']/xmlns:li").each do |e|
if e.xpath("e has no descendants")
perform task
elsif e.xpath("e first descendant is <p>")
perform second task
elsif e.xpath("e first descendant is <pre>")
perform third task
end
end
I am not asking for complete code. Just the part in parenthesis in the above Nokogiri code.
Pure XPath answer...
If you have the following XML :
<ol class="stan">
<li>Item one.</li>
<li>
<p>Paragraph one.</p>
<p>Paragraph two.</p>
</li>
<li>
<pre>Preformated one.</pre>
<p>Paragraph one.</p>
</li>
</ol>
And want to select <li> that has no child element as in the first list item, use :
//ol/li[count(*)=0]
If you have namespace problems, please post the whole XML (with the root element and namespace declarations) so that we can help you deal with them.
EDIT after our discussion, here is your final tested code :):
@doc.xpath("//xmlns:ol[@class='footnotes']/xmlns:li").each do |e|
if e.xpath("count(*)=0")
puts "No children"
elsif e.xpath("count(*[1]/self::xmlns:p)=1")
puts "First child is <p>"
elsif e.xpath("count(*[1]/self::xmlns:pre)=1")
puts "First child is <pre>"
end
end
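For comparison, the same three-way check is sketched below in Python with the standard-library ElementTree. It distinguishes an li with no child elements, one whose first child is p, and one whose first child is pre. It assumes the document uses the XHTML default namespace, and the x prefix plays the role of Nokogiri's xmlns prefix. The helper classify is hypothetical.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
SOURCE = f"""<html xmlns="{XHTML}"><body>
<ol class="stan">
<li>Item one.</li>
<li><p>Paragraph one.</p><p>Paragraph two.</p></li>
<li><pre>Preformated one.</pre><p>Paragraph one.</p></li>
</ol>
</body></html>"""


def classify(li):
    """Describe li by its first child element (hypothetical helper)."""
    children = list(li)  # child elements only; bare text is not an element
    if not children:
        return "No children"
    first_tag = children[0].tag  # namespaced, e.g. "{http://www.w3.org/1999/xhtml}p"
    if first_tag == f"{{{XHTML}}}p":
        return "First child is <p>"
    if first_tag == f"{{{XHTML}}}pre":
        return "First child is <pre>"
    return "Other first child"


root = ET.fromstring(SOURCE)
namespaces = {"x": XHTML}  # "x" stands in for Nokogiri's "xmlns" prefix
results = [classify(li) for li in root.iterfind(".//x:ol[@class='stan']/x:li", namespaces)]
for line in results:
    print(line)
```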

Resources