AsciiDoc list inside paragraph?

With AsciiDoc, if I write this:
p1
* l1
* l2
p2
It translates to HTML as, roughly:
<p>p1</p>
<ul><li>l1</li><li>l2</li></ul>
<p>p2</p>
Is there a way to write the equivalent of the following?
<p>p1
<ul><li>l1</li><li>l2</li></ul>
p2</p>

No, that can't be done.
The reason is that what you are asking for is invalid in HTML(5): <ul> tags cannot be nested inside <p> tags[*].
The answer to this related SO question gives the details, with references to the HTML specification.
(In a sense, your question is therefore a duplicate of that question.)
[*] Officially they can't, but you'll probably find that if you do it anyway, browsers will find a way to render it, probably by making p1 and p2 their own paragraphs.
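For illustration, here is roughly what a browser's parser builds from the desired markup (a sketch; the exact error recovery is parser-dependent). Feeding it
<p>p1
<ul><li>l1</li><li>l2</li></ul>
p2</p>
the parser implicitly closes the open <p> as soon as it sees <ul>, so the resulting DOM is approximately
<p>p1</p>
<ul><li>l1</li><li>l2</li></ul>
p2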

Related

Suppress cross reference hyperlink using exclamation mark

Prefixing the link with a ! suppresses the creation of a reference (e.g. :ref:`!no link` will simply be rendered as no link):
If you prefix the content with !, no reference/hyperlink will be created.
However, I can't think of any practical use for this. Why would I first create a reference and then not want to use it? It would be far easier to write plain text from the very beginning.
So - what is a typical use case of such a suppressed reference?
(Sphinx itself for instance doesn't use it in its docs.)
I can't think of any practical usage of this.
The use I can think of is this: suppose that before a build you wanted to "turn off" hyperlink generation for one given cross-reference (that appears multiple times). How would you do it?
Well, the simplest way might be a text editor's "find and replace", and arguably the least invasive edit is adding or removing the single character !. That way the length and structure of the cross-reference are kept in the source (and the title is still rendered in place). This can be convenient in several places, such as a table, where removing the whole cross-reference could misalign the source.
The most economical change possible would be turning this:
:ref:`a very long title <an.extremely.long.link.target>`
into this:
:ref:`!a very long title <an.extremely.long.link.target>`
The same could possibly be achieved programmatically using the Sphinx API, but a lot of Sphinx users are likely to prefer a text editing solution over a programmatic one.
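As a minimal sketch of that find-and-replace approach in script form (the docs/ path and file layout are assumptions, not anything from the question):

# toggle_ref.py: suppress one specific cross-reference everywhere by prepending "!"
from pathlib import Path

TARGET = ":ref:`a very long title <an.extremely.long.link.target>`"
SUPPRESSED = TARGET.replace(":ref:`", ":ref:`!")

for rst in Path("docs").rglob("*.rst"):
    text = rst.read_text(encoding="utf-8")
    if TARGET in text:
        rst.write_text(text.replace(TARGET, SUPPRESSED), encoding="utf-8")

Removing the ! again (to re-enable the link before publishing) is the same replacement in reverse.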

Inconsistent line spacing in RestructuredText document

I'm building RST files for my company's documentation. One irritating thing is that enumerated lists don't seem to have any consistency in line spacing.
Is there a simple way to solve this?
Robert
It's a well-known problem of docutils, the library on which Sphinx is built.
From Sphinx issue tracker on GitHub:
tk0miya wrote:
In my short investigation:
The behavior comes from docutils (base library of Sphinx).
In docutils.writers.html4css1.HTMLTranslator, docutils generates <p> tags if the list includes any items other than paragraphs and nested lists.
To fix this, set self.compact_simple in visit_list_item instead of visit_bullet_list and visit_enumerated_list.
But we have to understand why docutils checks the whole list.
Source: sphinx-doc/sphinx #2258 - Nested field lists inside list items cause unwanted space in HTML output
See related issues:
https://github.com/rtfd/sphinx_rtd_theme/issues/119
I'm unsure how to apply Paebbels' answer; however, I was able to get rid of the <p> tags by switching to the html4 writer, adding this line to my conf.py:
html4_writer = True
This obviously switches the output to the html4 writer, so you'll need to determine whether that is acceptable.
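To see the docutils behavior the quote describes, compare two reStructuredText lists (a made-up illustration). This list is "simple", so docutils emits no <p> inside the <li>s:

1. first item
2. second item

But in this list, one item contains more than a single paragraph:

1. first item
2. second item, followed by

   an extra paragraph

so docutils wraps every item of the whole list in <p> tags, and its line spacing visibly grows compared with the first list.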

Microdata for dictionary: can I use Yandex?

I would like to use microdata/microformats/etc. for the part of my website that is an online dictionary. Basically I just want to tag words and definitions to help search engines grab the most important data on every page belonging to the dictionary, and maybe have Google use them as "rich snippets" in the results page.
The main problem is that it's hard to find a dedicated vocabulary for words and definitions (no problem for recipes, movies and hotels, though), and I'm not sure whether I should use the "http://schema.org/Article" tree for my lexicographic work. (To my mind, it makes sense to tag something when it's specific enough.)
I have found something interesting at Yandex for words and encyclopedia entries, and I want to ask what to do with it. See here:
https://yandex.ru/support/webmaster/microdata/what-is-microdata.xml?lang=en
https://yandex.com/support/webmaster/microdata/term-definition-markup.xml
It looks like it is very close to what I need. But I'm sorry, I don't know what Yandex is... will it work with Google?
I'm asking here whether that page from Yandex is a working model, whether it is still in use, and what the pros and cons are. Will Google be able to use the specific vocabulary from Yandex and understand my Yandex-tagged data? Is it worth using that vocabulary for an online dictionary, or have I missed something of better use?
(http://webmaster.yandex.ru/vocabularies/term-def.xml, which should be the vocabulary URL, gives me a 404.)
One more question, please: am I allowed to duplicate the most important data in the header, with something like the following? (I believe I am, because Google's microdata testing tool proves able to extract the data from that code.)
<html itemscope itemtype="http://webmaster.yandex.ru/vocabularies/term-def.xml">
<meta itemprop="term" content="My term" />
<meta itemprop="definition" content="My definition" />
Just to mention that I was interested in, though not happy with, these closely related discussions:
https://webmasters.stackexchange.com/questions/55073/what-meta-tag-or-structured-data-should-i-use-for-a-dictionary-web-application
schema.org and an online dictionary
Yandex is Russia's version of Google, and typically they both recognize and honor each other's search engine result implementations.
These articles you are referencing are incredibly outdated; I recommend seeking out fresher sources, preferably ones where the term being defined uses the proper HTML element.
As for the Yandex URL that is 404ing: the Wayback Machine is your friend!
Back to fresher documentation/resources: in this case the correct element as of 2016-10-05 is the <dfn> element. I know you want added semantics, but semantics is the proper place to start. I'd follow that up by marking the entire dictionary up as a definition list (<dl>), placing each term, wrapped in a <dfn> element, into a <dt>, and the definition(s) of the term in the corresponding <dd>s.
I wouldn't waste time trying to find the perfect ontology here; implement the rel="tag" Microformat on all of the definitions, and you can always come back and add a more desirable one later.
I've written a blog post about this, but a much more valuable resource is HTML5 Doctor's Glossary implementation. More importantly, view its source: view-source:http://html5doctor.com/element-index/ (why Stack Overflow doesn't recognize the 'view-source' scheme is beyond me).
More References/Resources:
Microformats Definition Examples has some very interesting ideas/code snippets
Utilizing the Underused but Semantically Awesome Definition List - written prior to HTML5's redefinition of <dl>, but still relevant
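A minimal sketch of that recommendation (the term, definition and tag URL are invented for illustration):

<dl>
  <dt><dfn id="serendipity">serendipity</dfn></dt>
  <dd>The occurrence of events by chance in a happy or beneficial way.
      <a rel="tag" href="/tags/noun">noun</a></dd>
</dl>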

How do I merge or even disable footnote links in asciidoc fop

I've got a rather large AsciiDoc document that I translate dynamically to PDF for our developer guide. Since the doc often refers to Java classes that are documented in our developer guide, we converted them into links directly in the docs, e.g.:
In this block we create a new
https://www.codenameone.com/javadoc/com/codename1/ui/Form.html[Form]
named `hi`.
This works rather well for the most part and looks great in HTML, as every reference to a class leads directly to its JavaDoc, making the reference/guide process much simpler.
However, when we generate a PDF, some pages end up with something like this: [screenshot: the same link footnoted repeatedly on one page]
Normally I wouldn't mind a lot of footnotes, or even repeats from a previous page. However, in this case the footnote link to Container appears 3 times.
I could remove some of the links, but I'd rather not, since they make a lot of sense in the web version. Since I also have no idea where the page breaks will land, I'd rather not handle this manually.
This looks to me like a bug somewhere; if the link is the same, the footnote for it should only be generated once.
I'm fine with removing all link footnotes in the document if that is the price to pay, although I'd rather be able to do this on a case-by-case basis so that some links would remain printable.
Adding these two parameters to fo-pdf.xsl removes the footnotes:
<xsl:param name="ulink.footnotes" select="0"></xsl:param>
<xsl:param name="ulink.show" select="0"></xsl:param>
The first parameter disables footnotes, which makes the URLs reappear inline.
The second parameter removes the URLs from the text. Links remain active and clickable.
Setting either parameter back to a non-zero value re-enables the corresponding behavior.
Source:
http://docbook.sourceforge.net/release/xsl/1.78.1/doc/fo/ulink.show.html
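For context, a sketch of how those parameters sit in a customization layer such as fo-pdf.xsl; the import path to the stock DocBook FO stylesheet is an assumption and depends on your toolchain:

<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <!-- import the stock DocBook FO stylesheets, then override the two link parameters -->
  <xsl:import href="docbook-xsl/fo/docbook.xsl"/>
  <xsl:param name="ulink.footnotes" select="0"></xsl:param>
  <xsl:param name="ulink.show" select="0"></xsl:param>
</xsl:stylesheet>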
We were looking for something similar in a slightly different situation and didn't find a solution. We ended up writing a processor that just stripped away some of the links, e.g. every repeated link to the same URL within a section that started with '==='.
Not an ideal solution, but as far as I know it's the only way.
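For anyone facing the same situation, here is a rough sketch of such a processor in Python; the regex and the function name are ours, not from any standard tool:

import re

# matches AsciiDoc inline links of the form https://example.com/page[Link text]
LINK = re.compile(r'(https?://[^\s\[]+)\[([^\]]*)\]')

def strip_repeated_links(text):
    seen = set()
    out = []
    for line in text.splitlines():
        if line.startswith('==='):
            seen.clear()  # a new section starts: forget the URLs seen so far
        def replace(match):
            url, label = match.group(1), match.group(2)
            if url in seen:
                return label          # repeat within this section: keep only the text
            seen.add(url)
            return match.group(0)     # first occurrence: keep the full link
        out.append(LINK.sub(replace, line))
    return '\n'.join(out)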

How can I handle inconsistent markup?

I have a project where I have to scrape many URLs from many pages. I thought the structure of every page would remain the same, but sometimes it changes and breaks my code.
I need to extract, for example, the abstract of an article and its keywords, both of which are in separate <p> elements with the same class "marginB3". So I scraped a page and got two results, one for the abstract and the other for the keywords:
hxs = HtmlXPathSelector(response)
# both <p class="marginB3"> elements: the abstract first, the keywords second
lista = hxs.select('//p[@class="marginB3"]/text()')
self.abstracto = lista[0].extract()
self.keywords = lista[1].extract()
I then tried a third page, and a new <p> appeared with some additional information about the article, altering the structure. That makes things more complicated, since there are no ids, only classes. How can I tell which <p> holds the keywords, without ids, if each has its own <h2> above it:
<h2>Info</h2>
<p class="marginB3">a_url_I_want</p>
Can I do this differentiation by reading that <h2> and then the <p> below it?
You certainly can.
Try this:
# First <p>
hxs.select('//h2/following-sibling::p[@class="marginB3"][1]/text()').extract()
# Second <p>
hxs.select('//h2/following-sibling::p[@class="marginB3"][2]/text()').extract()
I am not an XPath expert, but I think you need to look at the following-sibling axis to catch the items after the <h2> tag.
In general, XPath does poorly when the document you are trying to parse isn't well marked up. At the risk of adding even more complexity, you could look at something like the BeautifulSoup module, which would allow a more procedural way of coping with inconsistent markup. XPath is a (mostly) declarative language, and declarative languages have a hard time coping with irregular input.
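If the heading text is stable, you can also anchor the selection on the heading itself rather than on document position; a sketch assuming the heading literally reads "Info", as in the question's snippet:

# the first <p class="marginB3"> following the <h2>Info</h2> heading
hxs.select('//h2[normalize-space(.)="Info"]/following-sibling::p[@class="marginB3"][1]/text()').extract()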
