Substitution in a file name with reStructuredText (Sphinx)? - python-sphinx

I want to create several files from a single template, which differ only by a variable name. For example :
(file1.rst):
.. |variable| replace:: 1
.. include:: template.rst
(template.rst) :
Variable |variable|
=====================
Image
-------
.. image:: ./images/|variable|-image.png
where of course I have an image called "./images/1-image.png". The substitution of "|variable|" by "1" works well in the title, but not in the image file name, and at compilation I get :
WARNING: image file not readable: ./images/|variable|-image.png
How can I get reST to make the substitution in the file name too? (If this changes anything, I am using Sphinx.)

There are two problems here: a substitution problem, and a parsing order problem.
For the first problem, the substitution reference |variable| cannot have adjacent characters (besides whitespace or maybe _ for hyperlinking) or else it won't parse as a substitution reference, so you need to escape it:
./images/\ |variable|\ -image.png
However, the second problem is waiting around the corner. While I'm not certain of the details, it seems reST is unable to parse substitutions inside other directives. I think it first parses the image directive, which puts it in the document tree and thus out of reach of the substitution mechanism. Similarly, I don't think it's possible to use a substitution to insert content intended to be parsed (e.g. .. |img1| replace::`.. image:: images/1-image.png`). This is all speculative based on some tests and my incomplete comprehension of the official documentation, so someone more knowledgeable can correct what I've said here.
I think you're aware of the actual image substitution directive (as opposed to text substitution), but I don't think it attains the generality you're aiming for (you'll still need a separate image directive for each value, apart from the |variable| substitution). In any case, it looks like this:
.. |img1| image:: images/1-image.png
Since you're using Sphinx, you can try creating your own directive extension (see this answer for information), but it won't solve the substitutions-inside-markup problem.
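Since reST will not substitute inside the image path, one workaround outside reST is to generate the .rst files from the template before the build. The following is a minimal sketch of that idea, not something the answer above proposes: the {variable} placeholder, the TEMPLATE text, and the render_page helper are all assumptions made for illustration.

```python
# Hypothetical template; "{variable}" is a str.format placeholder, not reST syntax.
TEMPLATE = """\
Variable {variable}
=====================

Image
-------

.. image:: ./images/{variable}-image.png
"""


def render_page(variable: str) -> str:
    """Return the reST source for one generated page."""
    return TEMPLATE.format(variable=variable)


# One generated page per value; writing the files to disk is left out here.
pages = {f"file{n}.rst": render_page(str(n)) for n in (1, 2, 3)}
```

Each entry in pages could then be written next to conf.py (for example with pathlib.Path(name).write_text(text)) before running the Sphinx build.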

You have to create a custom directive in this case, as Sphinx doesn't allow you to substitute image paths. You can subclass the Sphinx figure directive as follows and use it instead of the image directive.
from typing import TYPE_CHECKING, Any, Dict, List, cast
from docutils import nodes
from docutils.nodes import Node
from docutils.parsers.rst.directives import images
if TYPE_CHECKING:
    # For type annotation only
    from sphinx.application import Sphinx
class CustomFigure(images.Figure):
    """The figure directive which applies `:name:` option to the figure node
    instead of the image node.
    """
    def run(self) -> List[Node]:
        name = self.options.pop('name', None)
        path = self.arguments[0]  # e.g. ./images/variable-image.png
        # replace the placeholder 'variable' with the desired value
        self.arguments[0] = path.replace("variable", "string substitution")
        result = super().run()
        if len(result) == 2 or isinstance(result[0], nodes.system_message):
            return result
        assert len(result) == 1
        figure_node = cast(nodes.figure, result[0])
        if name:
            # set ``name`` to figure_node if given
            self.options['name'] = name
            self.add_name(figure_node)
        # copy lineno from image node
        if figure_node.line is None and len(figure_node) == 2:
            caption = cast(nodes.caption, figure_node[1])
            figure_node.line = caption.line
        return [figure_node]
def setup(app: "Sphinx") -> Dict[str, Any]:
    # register under a new name so the built-in figure directive stays untouched
    app.add_directive('customfigure', CustomFigure)
    return {
        'version': 'builtin',
        'parallel_read_safe': True,
        'parallel_write_safe': True,
    }
Save this code as a module (for example CustomFigure.py), add it to the extensions list in your project's conf.py, and use the customfigure directive across the Sphinx project instead of the image directive. See http://www.sphinx-doc.org/en/master/usage/extensions/index.html for how to add a custom extension to your Sphinx project.
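The directive above replaces the literal text "variable" in the path. If you would rather keep the |variable| notation from the question, the replacement step could look roughly like this. It is only a sketch: the SUBSTITUTIONS mapping and the expand_markers helper are hypothetical names, and where the values come from is left open.

```python
import re

# Hypothetical mapping of substitution names to values.
SUBSTITUTIONS = {"variable": "1"}

# Matches markers of the form |name|.
MARKER = re.compile(r"\|(\w+)\|")


def expand_markers(path: str, values: dict) -> str:
    """Replace each |name| marker in path; unknown names are left untouched."""
    return MARKER.sub(lambda m: values.get(m.group(1), m.group(0)), path)


expanded = expand_markers("./images/|variable|-image.png", SUBSTITUTIONS)
print(expanded)
```

Inside run(), the assignment would then read self.arguments[0] = expand_markers(self.arguments[0], SUBSTITUTIONS).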

Related

python-sphinx - Display only function signature with autodoc?

In Sphinx it is possible to include the signature of a function or method manually using the py:function (or py:method) directive:
.. py:function:: my_func(data, named=None, *args, **kwargs)
It is also possible to use autodoc directives to include and format the whole docstring of a function or method:
.. automethod:: my_func
I am wondering if there is a way of configuring autodoc to include and format only the signature, without the rest of the docstring, so that I don't have to do it manually.
autodoc-process-signature can be used here as well.
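def process_signature(app, what, name, obj, options, signature, return_annotation):
    # compute the replacement values here; passing them through unchanged is a no-op
    modified_signature = signature
    modified_return_annotation = return_annotation
    return modified_signature, modified_return_annotation
    # will be rendered to method(modified_signature) -> modified_return_annotation
def setup(app):
    app.connect("autodoc-process-signature", process_signature)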
http://www.sphinx-doc.org/en/master/_modules/sphinx/ext/autodoc.html
See autodoc's sphinx.ext.autodoc.between.
Return a listener that either keeps, or if exclude is True excludes, lines between lines that match the marker regular expression. If no line matches, the resulting docstring would be empty, so no change will be made unless keepempty is true.
If what is a sequence of strings, only docstrings of a type in what will be processed.

How to add a namespace to existing xml file

I want to open this file and get all elements that start with us-gaap.
ftp://ftp.sec.gov/edgar/data/916789/0001558370-15-001143.txt
To get elements I tried like this:
str = '<html><body><us-gaap:foo>foo</us-gaap:foo></body></html>'
doc = Nokogiri::XML(str)
doc.xpath('//us-gaap:*')
Nokogiri::XML::XPath::SyntaxError: Undefined namespace prefix: //us-gaap:*
from /Users/ironsand/.rbenv/versions/2.2.2/lib/ruby/gems/2.2.0/gems/nokogiri-1.6.7.2/lib/nokogiri/xml/searchable.rb:165:in `evaluate'
doc.namespaces returns {}, so I think I have to add namespace us-gaap.
There are some questions about "adding a namespace with Nokogiri", but they appear to be about how to create a new XML document, not how to add a namespace to an existing one.
How can I add a namespace to an existing document?
I know I can remove the namespaces with Nokogiri::XML::Document#remove_namespaces!, but I don't want to use it because it also removes necessary information.
You have asked an XY Problem. You think that the problem is that you need to add a missing namespace; the real problem is that the file you're trying to parse is not valid XML.
require 'nokogiri'
doc = Nokogiri.XML( IO.read('0001558370-15-001143.txt') )
doc.errors.length
#=> 5716
For example, the <ACCEPTANCE-DATETIME> 'element' opened on line 3 is never closed, and on line 16 there is a raw ampersand in the text:
STANDARD INDUSTRIAL CLASSIFICATION: ELECTRIC HOUSEWARES & FANS [3634]
which ought to be escaped as an entity.
However, the document has valid XML fragments within it! In particular, there is one XML document that defines xmlns:us-gaap namespace, from lines 27243-49312. Let's extract just that, using only the knowledge that the root element defines the namespace we want, and the assumptions that no element with the same name is nested within the document, and that the root element does not have an unescaped > character in any attribute. (These assumptions are valid for this file, but may not be valid for every XML file.)
txt = IO.read('0001558370-15-001143.txt')
gaap_finder = %r{(<(\w+) [^>]+xmlns:us-gaap\s*=.+?</\2>)}m
txt.scan(gaap_finder) do |xml,_|
  doc = Nokogiri.XML( xml )
  gaaps = doc.xpath('//us-gaap:*')
  p gaaps.length
  #=> 569
end
The code above handles the case where there may be more than one XML document in the txt file, though in this case there is only one.
Decoded, the gaap_finder regex says this:
%r{...}m — this is a regular expression (that allows slashes in it, unescaped) with "multiline mode", where a period will match newline characters
(...) — capture everything we find
< — start with a literal "less-than" symbol
(\w+) — find one or more word characters (the tag name), and save them
(a literal space) — the word characters must be followed by a space (important to avoid capturing the <xsd:xbrl ...> element in this file)
[^>]+ — followed by one or more characters that is NOT a "greater-than" symbol (to ensure that we stay in the same element that we started in)
xmlns:us-gaap\s*= — followed by this literal namespace declaration (which may have whitespace separating it from the equals sign)
.+? — followed by anything (as little as possible)...
</\2> — ...up until you see a closing tag with the same name as what we captured for the name of the starting tag
Because of the way scan works when the regex has capturing groups, each result is a two-element array, where the first element is the entire captured XML and the second element is the name of the tag that we captured (which we "discard" by assigning it to the _ variable).
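To see the pattern at work without the full filing, here is the same regular expression translated to Python's re module and run on a tiny made-up sample. This is only an illustration under assumptions: the sample text and the example.com namespace URIs are invented, and re.DOTALL is used as the counterpart of Ruby's m flag.

```python
import re

# Hypothetical sample standing in for the filing: stray text around one XML document.
sample = (
    "<ACCEPTANCE-DATETIME>20150226\n"
    "ELECTRIC HOUSEWARES & FANS [3634]\n"
    '<xbrl xmlns="http://example.com/x" xmlns:us-gaap="http://example.com/us-gaap">\n'
    "<us-gaap:Assets>100</us-gaap:Assets>\n"
    "</xbrl>\n"
    "trailing text\n"
)

# re.DOTALL plays the role of Ruby's /m: "." also matches newlines.
gaap_finder = re.compile(r"(<(\w+) [^>]+xmlns:us-gaap\s*=.+?</\2>)", re.DOTALL)

# With two capturing groups, findall returns (whole_xml, tag_name) tuples,
# much like Ruby's scan yields two-element arrays.
matches = gaap_finder.findall(sample)
for xml, tag_name in matches:
    print(tag_name, len(xml))
```

Note that [^>]+ needs at least one character between the space and xmlns:us-gaap, which is why the sample root element carries another attribute first.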
If you want to be less magic about your capturing, the text file format appears to always wrap each XML document in <XBRL>...</XBRL>. So, you could do this to process every XML file (there are seven, five of which do not happen to have any us-gaap namespaces):
txt = IO.read('0001558370-15-001143.txt')
xbrls = %r{(?<=<XBRL>).+?(?=</XBRL>)}m # find text inside <XBRL>…</XBRL>
txt.scan(xbrls) do |xml|
  doc = Nokogiri.XML( xml )
  if doc.namespaces["xmlns:us-gaap"]
    gaaps = doc.xpath('//us-gaap:*')
    p gaaps.length
  end
end
#=> 569
#=> 0 (for the XML Schema document that defines the namespace)
I couldn't figure out how to update an existing doc with a new namespace, but since Nokogiri will recognize namespaces on the root element, and those namespaces are, syntactically, just attributes, you can update the document with a new namespace declaration, serialize the doc to a string, and re-parse it:
str = '<html><body><us-gaap:foo>foo</us-gaap:foo></body></html>'
doc_without_ns = Nokogiri::XML(str)
doc_without_ns.root['xmlns:us-gaap'] = 'http://your/actual/ns/here'
doc = Nokogiri::XML(doc_without_ns.to_xml)
doc.xpath("//us-gaap:*")
# Returns [#<Nokogiri::XML::Element:0x3ff375583f9c name="foo" namespace=#<Nokogiri::XML::Namespace:0x3ff375583f24 prefix="us-gaap" href="http://your/actual/ns/here"> children=[#<Nokogiri::XML::Text:0x3ff375583768 "foo">]>]

How to add rst format in nodes for directive?

How can I use rst in nodes? For example, I want to output the included file about.rst:
class Foo(Directive):
    def run(self):
        return [
            nodes.Text("**adad**"),  # <-- Must be a bold text
            nodes.Text(".. include:: about.rst"),  # <-- Must include file
        ]
You can construct a ViewList of your raw rst data (one line per entry), get Sphinx to parse that content, and then return the nodes Sphinx gives you. The following worked for me:
from docutils import nodes
from docutils.parsers.rst import Directive  # sphinx.util.compat.Directive in old Sphinx versions
from docutils.statemachine import ViewList
from sphinx.util.nodes import nested_parse_with_titles
class Foo(Directive):
    def run(self):
        rst = ViewList()
        # Add the content one line at a time.
        # Second argument is the filename to report in any warnings
        # or errors, third argument is the line number.
        rst.append("**adad**", "fakefile.rst", 10)
        rst.append("", "fakefile.rst", 11)
        rst.append(".. include:: about.rst", "fakefile.rst", 12)
        # Create a node.
        node = nodes.section()
        node.document = self.state.document
        # Parse the rst.
        nested_parse_with_titles(self.state, rst, node)
        # And return the result.
        return node.children
def setup(app):
    app.add_directive('foo', Foo)
I had to do something similar for a project --- in lieu of any (easily found) relevant documentation I used the source of the inbuilt autodoc extension as a guide.
Adding text nodes whose content is formatted with rst syntax won't help. You need to create rst node objects to build the required rst element tree. Moreover, since the example tries to include another rst file, you would need to use nested parsing, as the actual content is not known in advance and can't be hardcoded.
In the run() method of an rst directive class, the self.state.nested_parse() method can be called. Its original purpose is to parse the content of the directive, like this:
# parse text content of this directive
# into anonymous node element (can't be used directly in the tree)
node = nodes.Element()
self.state.nested_parse(self.content, self.content_offset, node)
In your case, you could either open the about.rst file, parse it, and add the parsed node tree to the result node list, or you could simply run a nested parse on a string constant containing the include directive.
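The second option, running a nested parse on a string constant, might look like the sketch below. It uses plain docutils so it can be tried outside Sphinx; the InlineRst class and the inline-rst directive name are made up for this example, and only the bold text is parsed (an include line could be added to the list in the same way, provided the file exists).

```python
from docutils import nodes
from docutils.core import publish_doctree
from docutils.parsers.rst import Directive, directives
from docutils.statemachine import StringList


class InlineRst(Directive):
    """Hypothetical directive that parses a hard-coded reST snippet."""

    has_content = False

    def run(self):
        # Lines to parse; "source" is the name reported in warnings.
        lines = StringList(["**adad**"], source="inline-rst")
        container = nodes.Element()  # anonymous holder, not inserted into the tree
        self.state.nested_parse(lines, 0, container)
        return container.children


directives.register_directive("inline-rst", InlineRst)

# Parse a one-line document that uses the directive and dump the result.
doctree = publish_doctree(".. inline-rst::\n")
rendered = doctree.pformat()
print(rendered)
```

The printed tree should contain a strong node wrapping the text, rather than the literal asterisks.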

How can I configure the separator character used for :menuselection:?

I am using Sphinx to generate HTML documentation for my project. Under Inline Markup, the Sphinx documentation discusses :menuselection: for marking a sequence of menu selections using markup like:
:menuselection:`Start --> Programs`
This results in the following HTML:
<span class="menuselection">Start ‣ Programs</span>
i.e. the --> gets converted to the small triangle, which I've determined is U+2023, TRIANGULAR BULLET.
That's all well and good, but I'd like to use a different character instead of the triangle. I have searched the Sphinx package and the theme package (sphinx-bootstrap-theme) somewhat exhaustively for 'menuselection', the triangle character, and a few other things, but haven't turned up anything that does the substitution from --> to ‣ (nothing obvious to me, anyway). But something must be converting it between my .rst source and the html.
My question is: what, specifically is doing the conversion (sphinx core? HTML writer? Theme JS?)?
The conversion is done in the sphinx.roles.menusel_role() function. You can create your own version of this function with a different separator character and register it to be used.
Add the following to your project's conf.py:
from docutils import nodes, utils
from docutils.parsers.rst import roles
from sphinx.roles import _amp_re
def patched_menusel_role(typ, rawtext, text, lineno, inliner, options={}, content=[]):
    text = utils.unescape(text)
    if typ == 'menuselection':
        text = text.replace('-->', u'\N{RIGHTWARDS ARROW}') # Here is the patch
    spans = _amp_re.split(text)
    node = nodes.emphasis(rawtext=rawtext)
    for i, span in enumerate(spans):
        span = span.replace('&&', '&')
        if i == 0:
            if len(span) > 0:
                textnode = nodes.Text(span)
                node += textnode
            continue
        accel_node = nodes.inline()
        letter_node = nodes.Text(span[0])
        accel_node += letter_node
        accel_node['classes'].append('accelerator')
        node += accel_node
        textnode = nodes.Text(span[1:])
        node += textnode
    node['classes'].append(typ)
    return [node], []
# Use 'patched_menusel_role' function for processing the 'menuselection' role
roles.register_local_role("menuselection", patched_menusel_role)
When building html, make sure to make clean first so that the updated conf.py is re-parsed with the patch.
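The text handling inside the role can be tried in isolation. The sketch below reproduces only the string steps (separator replacement and accelerator splitting) without docutils; AMP_RE is my assumption of what the private _amp_re pattern looks like, and split_menu_text is a hypothetical helper, not part of Sphinx.

```python
import re

# Assumed to mirror Sphinx's accelerator pattern: a single "&" that is not
# doubled and not followed by whitespace.
AMP_RE = re.compile(r"(?<!&)&(?![&\s])")


def split_menu_text(text, separator="\N{RIGHTWARDS ARROW}"):
    """Swap "-->" for the separator, then split on accelerator markers."""
    text = text.replace("-->", separator)
    return [span.replace("&&", "&") for span in AMP_RE.split(text)]


spans = split_menu_text("Start --> &Programs")
# spans[0] is plain text; each later span starts with its accelerator letter.
print(spans)
```

Passing a different separator argument shows what the rendered text would look like with another character.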

Lxml or Xpath content print

I have the following function
def parseTitle(self, post):
    """
    Returns title string with spaces replaced by dots
    """
    return post.xpath('h2')[0].text.replace('.', ' ')
I would like to see the content of post. I have tried everything I can think of.
How can I properly debug the content? This is a movie website from which I'm ripping links and titles, and this function should parse the title.
I am sure H# does not exist; how can I print/debug this?
post is an lxml element tree object, isn't it?
So first, you could try:
import lxml.html  # or use lxml.etree instead of lxml.html
print(lxml.html.tostring(post))
If it isn't, you should create an element tree object from it:
post = lxml.html.fromstring(post)
Or maybe the problem is just that you should replace h2 with //h2?
Your question is not very explanatory.
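Putting those suggestions together, a small self-contained check could look like this. The markup is a made-up stand-in for one scraped post, so treat it as a sketch rather than a description of the real page.

```python
import lxml.html

# Hypothetical markup standing in for one scraped post.
post = lxml.html.fromstring('<div class="post"><h2>Some.Movie.Title</h2></div>')

# Serialize the element to see exactly what the XPath expression runs against.
print(lxml.html.tostring(post, pretty_print=True, encoding="unicode"))

headings = post.xpath("h2")     # direct children of post only
anywhere = post.xpath(".//h2")  # any descendant of post
title = headings[0].text.replace(".", " ") if headings else None
print(title)
```

If headings comes back empty while anywhere does not, the h2 is nested deeper than a direct child, which is the case the //h2 suggestion is aimed at.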

Resources