I am using Sphinx to generate HTML documentation for my project. Under Inline Markup, the Sphinx documentation discusses :menuselection: for marking a sequence of menu selections using markup like:
:menuselection:`Start --> Programs`
This results in the following HTML:
<span class="menuselection">Start ‣ Programs</span>
i.e. the --> gets converted to the small triangle, which I've determined is U+2023, TRIANGULAR BULLET.
That's all well and good, but I'd like to use a different character instead of the triangle. I have searched the Sphinx package and the theme package (sphinx-bootstrap-theme) somewhat exhaustively for 'menuselection', the triangle character, and a few other things, but haven't turned up anything that does the substitution from --> to ‣ (nothing obvious to me, anyway). But something must be converting it between my .rst source and the html.
My question is: what, specifically is doing the conversion (sphinx core? HTML writer? Theme JS?)?
The conversion is done in the sphinx.roles.menusel_role() function. You can create your own version of this function with a different separator character and register it to be used.
Add the following to your project's conf.py:
from docutils import nodes, utils
from docutils.parsers.rst import roles
from sphinx.roles import _amp_re
def patched_menusel_role(typ, rawtext, text, lineno, inliner, options={}, content=[]):
text = utils.unescape(text)
if typ == 'menuselection':
text = text.replace('-->', u'\N{RIGHTWARDS ARROW}') # Here is the patch
spans = _amp_re.split(text)
node = nodes.emphasis(rawtext=rawtext)
for i, span in enumerate(spans):
span = span.replace('&&', '&')
if i == 0:
if len(span) > 0:
textnode = nodes.Text(span)
node += textnode
continue
accel_node = nodes.inline()
letter_node = nodes.Text(span[0])
accel_node += letter_node
accel_node['classes'].append('accelerator')
node += accel_node
textnode = nodes.Text(span[1:])
node += textnode
node['classes'].append(typ)
return [node], []
# Use 'patched_menusel_role' function for processing the 'menuselection' role
roles.register_local_role("menuselection", patched_menusel_role)
When building html, make sure to make clean first so that the updated conf.py is re-parsed with the patch.
Related
I would like to create a sphinx role :config:`param` that appears as literal text config["param"] and at the same time is a link.
I've a basic running version:
def config_role(name, rawtext, text, lineno, inliner, options={}, content=[]):
rendered = nodes.Text('config["{}"]'.format(text))
rel_source = inliner.document.attributes['source'].split('/doc/', 1)[1]
levels = rel_source.count('/')
refuri = ('../' * levels +
'tutorials/introductory/customizing.html#config')
ref = nodes.reference(rawtext, rendered, refuri=refuri)
return [nodes.literal('', '', ref)], []
def setup(app):
app.add_role("config", config_role)
return {"parallel_read_safe": True, "parallel_write_safe": True}
However, there are two issues:
The target of the link is in another part of the documentation. Therefore I have to create the refuri explicitly. Is is possible to use the link anchor somehow as one would do with :ref:`config` in .rst?
nodes.reference has the unwanted effect that the double-quotes are replaced by typographic quotes. Since it's in a literal bock, I'd rather keep the plain double quotes. Is this somehow possible?
How I can use rst in nodes? For example I want to output icluded file about.rst
class Foo(Directive):
def run(self):
return [
nodes.Text("**adad**"), # <-- Must be a bold text
nodes.Text(".. include:: about.rst"), # <-- Must include file
]
You can construct a ViewList of your raw rst data (one line per entry), get Sphinx to parse that content, and then return the nodes Sphinx gives you. The following worked for me:
from docutils import nodes
from docutils.statemachine import ViewList
from sphinx.util.compat import Directive
from sphinx.util.nodes import nested_parse_with_titles
class Foo(Directive):
def run(self):
rst = ViewList()
# Add the content one line at a time.
# Second argument is the filename to report in any warnings
# or errors, third argument is the line number.
rst.append("**adad**", "fakefile.rst", 10)
rst.append("", "fakefile.rst", 11)
rst.append(".. include:: about.rst", "fakefile.rst", 12)
# Create a node.
node = nodes.section()
node.document = self.state.document
# Parse the rst.
nested_parse_with_titles(self.state, rst, node)
# And return the result.
return node.children
def setup(app):
app.add_directive('foo', Foo)
I had to do something similar for a project --- in lieu of any (easily found) relevant documentation I used the source of the inbuilt autodoc extension as a guide.
Adding text nodes with content formatted with rst syntax wouldn't help. You need to create rst node objects to build required rst element tree. Moreover since you try to include another rst file in the example, you would need to use nested parsing as the actual content is not known in advance and can't be hardcoded.
In run() method of rst directive class, self.state.nested_parse() method can be called. It's original purpose is to parse content of the directive like this:
# parse text content of this directive
# into anonymous node element (can't be used directly in the tree)
node = nodes.Element()
self.state.nested_parse(self.content, self.content_offset, node)
In your case you would either try to open abour.rst file, parse it and add
parsed node tree into the result node list or you can just try to run nested
parse on string constant with include directive.
I need to update multiple targets when a link is clicked.
This example builds a list of links.
When the link is clicked, the callback needs to populate two different parts of the .html file.
The actual application uses bokeh for plotting.
The user will click on a link, the 'linkDetails1' and 'linkDetails2' will hold the script and div return from calls to bokeh.component()
The user will click on a link, and the script, div returned from bokeh's component() function will populate the 'linkDetails'.
Obviously this naive approach does not work.
How can I make a list of links that when clicked on will populate two separate places in the .html file?
################################
#views/default/test.html:
{{extend 'layout.html'}}
{{=linkDetails1}}
{{=linkDetails2}}
{{=links}}
################################
# controllers/default.py:
def test():
"""
example action using the internationalization operator T and flash
rendered by views/default/index.html or views/generic.html
if you need a simple wiki simply replace the two lines below with:
return auth.wiki()
"""
d = dict()
links = []
for ii in range(5):
link = A("click on link %d"%ii, callback=URL('linkHandler/%d'%ii), )
links.append(["Item %d"%ii, link])
table = TABLE()
table.append([TR(*rows) for rows in links])
d["links"] = table
d["linkDetails1"] = "linkDetails1"
d["linkDetails2"] = "linkDetails2"
return d
def linkHandler():
import os
d = dict()
# request.url will be linked/N
ii = int(os.path.split(request.url)[1])
# want to put some information into linkDetails, some into linkDiv
# this does not work:
d = dict()
d["linkDetails1"] = "linkHandler %d"%ii
d["linkDetails2"] = "linkHandler %d"%ii
return d
I must admit that I'm not 100% clear on what you're trying to do here, but if you need to update e.g. 2 div elements in your page in response to a single click, there are a couple of ways to accomplish that.
The easiest, and arguably most web2py-ish way is to contain your targets in an outer div that's a target for the update.
Another alternative, which is very powerful is to use something like Taconite [1], which you can use to update multiple parts of the DOM in a single response.
[1] http://www.malsup.com/jquery/taconite/
In this case, it doesn't look like you need the Ajax call to return content to two separate parts of the DOM. Instead, both elements returned (the script and the div elements) can simply be placed inside a single parent div.
# views/default/test.html:
{{extend 'layout.html'}}
<div id="link_details">
{{=linkDetails1}}
{{=linkDetails2}}
</div>
{{=links}}
# controllers/default.py
def test():
...
for ii in range(5):
link = A("click on link %d" % ii,
callback=URL('default', 'linkHandler', args=ii),
target="link_details")
...
If you provide a "target" argument to A(), the result of the Ajax call will go into the DOM element with that ID.
def linkHandler():
...
content = CAT(SCRIPT(...), DIV(...))
return content
In linkHandler, instead of returning a dictionary (which requires a view in order to generate HTML), you can simply return a web2py HTML helper, which will automatically be serialized to HTML and then inserted into the target div. The CAT() helper simply concatenates other elements (in this case, your script and associated div).
I want to create several files from a single template, which differ only by a variable name. For example :
(file1.rst):
.. |variable| replace:: 1
.. include template.rst
(template.rst) :
Variable |variable|
=====================
Image
-------
.. image:: ./images/|variable|-image.png
where of course I have an image called "./images/1-image.png". The substitution of "|variable|" by "1" works well in the title, but not in the image file name, and at compilation I get :
WARNING: image file not readable: ./images/|variable|-image.png
How can I get reST to make the substitution in the variable name too? (if this changes anything, am using Sphinx).
There are two problems here: a substitution problem, and a parsing order problem.
For the first problem, the substitution reference |variable| cannot have adjacent characters (besides whitespace or maybe _ for hyperlinking) or else it won't parse as a substitution reference, so you need to escape it:
./images/\ |variable|\ -image.png
However, the second problem is waiting around the corner. While I'm not certain of the details, it seems reST is unable to parse substitutions inside other directives. I think it first parses the image directive, which puts it in the document tree and thus out of reach of the substitution mechanism. Similarly, I don't think it's possible to use a substitution to insert content intended to be parsed (e.g. .. |img1| replace::`.. image:: images/1-image.png`). This is all speculative based on some tests and my incomplete comprehension of the official documentation, so someone more knowledgeable can correct what I've said here.
I think you're aware of the actual image substitution directive (as opposed to text substitution), but I don't think it attains the generality you're aiming for (you'll still need a separate directive for the image as from the |variable|), but in any case it looks like this:
.. |img1| image:: images/1-image.png
Since you're using Sphinx, you can try creating your own directive extension (see this answer for information), but it won't solve the substitutions-inside-markup problem.
You have to create a custom directive in this case as Sphinx doesn't allow you to substitute image paths. You can change Sphinx figure directive as follows and use it instead of the image directive.
from typing import Any, Dict, List, Tuple
from typing import cast
from docutils import nodes
from docutils.nodes import Node, make_id, system_message
from docutils.parsers.rst import directives
from docutils.parsers.rst.directives import images, html, tables
from sphinx import addnodes
from sphinx.directives import optional_int
from sphinx.domains.math import MathDomain
from sphinx.util.docutils import SphinxDirective
from sphinx.util.nodes import set_source_info
if False:
# For type annotation
from sphinx.application import Sphinx
class CustomFigure(images.Figure):
"""The figure directive which applies `:name:` option to the figure node
instead of the image node.
"""
def run(self) -> List[Node]:
name = self.options.pop('name', None)
path = self.arguments[0] #path = ./images/variable-image.png
#replace 'variable' from th.e given value
self.argument[0] = path.replace("variable", "string substitution")
result = super().run()
if len(result) == 2 or isinstance(result[0], nodes.system_message):
return result
assert len(result) == 1
figure_node = cast(nodes.figure, result[0])
if name:
# set ``name`` to figure_node if given
self.options['name'] = name
self.add_name(figure_node)
# copy lineno from image node
if figure_node.line is None and len(figure_node) == 2:
caption = cast(nodes.caption, figure_node[1])
figure_node.line = caption.line
return [figure_node]
def setup(app: "Sphinx") -> Dict[str, Any]:
directives.register_directive('figure', Figure)
return {
'version': 'builtin',
'parallel_read_safe': True,
'parallel_write_safe': True,
}
You can add this CustomFigure.py directive in the conf.py of the project and use the customfigure directive across Sphinx project instead of the Image directive. Refer http://www.sphinx-doc.org/en/master/usage/extensions/index.html to add a custom directive to your Sphinx project.
I'm using XPath with Scrapy to scrape data off of a movie website BoxOfficeMojo.com.
As a general question: I'm wondering how to select certain child nodes of one parent node all in one Xpath string.
Depending on the movie web page from which I'm scraping data, sometimes the data I need is located at different children nodes, such as whether or not there is a link or not. I will be going through about 14000 movies, so this process needs to be automated.
Using this as an example. I will need actor/s, director/s and producer/s.
This is the Xpath to the director: Note: The %s corresponds to a determined index where that information is found - in the action Jackson example director is found at [1] and actors at [2].
//div[#class="mp_box_content"]/table/tr[%s]/td[2]/font/text()
However, would a link exist to a page on the director, this would be the Xpath:
//div[#class="mp_box_content"]/table/tr[%s]/td[2]/font/a/text()
Actors are a bit more tricky, as there <br> included for subsequent actors listed, which may be the children of an /a or children of the parent /font, so:
//div[#class="mp_box_content"]/table/tr[%s]/td[2]/font//a/text()
Gets all most all of the actors (except those with font/br).
Now, the main problem here, I believe, is that there are multiple //div[#class="mp_box_content"] - everything I have works EXCEPT that I also end up getting some digits from other mp_box_content. Also I have added numerous try:, except: statements in order to get everything (actors, directors, producers who both have and do not have links associated with them). For example, the following is my Scrapy code for actors:
actors = hxs.select('//div[#class="mp_box_content"]/table/tr[%s]/td[2]/font//a/text()' % (locActor,)).extract()
try:
second = hxs.select('//div[#class="mp_box_content"]/table/tr[%s]/td[2]/font/text()' % (locActor,)).extract()
for n in second:
actors.append(n)
except:
actors = hxs.select('//div[#class="mp_box_content"]/table/tr[%s]/td[2]/font/text()' % (locActor,)).extract()
This is an attempt to cover for the facts that: the first actor may not have a link associated with him/her and subsequent actors do, the first actor may have a link associated with him/her but the rest may not.
I appreciate the time taken to read this and any attempts to help me find/address this problem! Please let me know if any more information is needed.
I am assuming you are only interested in textual content, not the links to actors' pages etc.
Here is a proposition using lxml.html (and a bit of lxml.etree) directly
First, I recommend you select td[2] cells by the text content of td[1], with expressions like .//tr[starts-with(td[1], "Director")]/td[2] to account for "Director", or "Directors"
Second, testing various expressions with or without <font>, with or without <a> etc., makes code difficult to read and maintain, and since you're interested only in the text content, you might as well use string(.//tr[starts-with(td[1], "Actor")]/td[2]) to get the text, or use lxml.html.tostring(e, method="text", encoding=unicode) on selected elements
And for the <br> issue for multiple names, the way I do is generally modify the lxml tree containing the targetted content to add a special formatting character to <br> elements' .text or .tail, for example a \n, with one of lxml's iter() functions. This can be useful on other HTML block elements, like <hr> for example.
You may see better what I mean with some spider code:
from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
import lxml.etree
import lxml.html
MARKER = "|"
def br2nl(tree):
for element in tree:
for elem in element.iter("br"):
elem.text = MARKER
def extract_category_lines(tree):
if tree is not None and len(tree):
# modify the tree by adding a MARKER after <br> elements
br2nl(tree)
# use lxml's .tostring() to get a unicode string
# and split lines on the marker we added above
# so we get lists of actors, producers, directors...
return lxml.html.tostring(
tree[0], method="text", encoding=unicode).split(MARKER)
class BoxOfficeMojoSpider(BaseSpider):
name = "boxofficemojo"
start_urls = [
"http://www.boxofficemojo.com/movies/?id=actionjackson.htm",
"http://www.boxofficemojo.com/movies/?id=cloudatlas.htm",
]
# locate 2nd cell by text content of first cell
XPATH_CATEGORY_CELL = lxml.etree.XPath('.//tr[starts-with(td[1], $category)]/td[2]')
def parse(self, response):
root = lxml.html.fromstring(response.body)
# locate the "The Players" table
players = root.xpath('//div[#class="mp_box"][div[#class="mp_box_tab"]="The Players"]/div[#class="mp_box_content"]/table')
# we have only one table in "players" so the for loop is not really necessary
for players_table in players:
directors_cells = self.XPATH_CATEGORY_CELL(players_table,
category="Director")
actors_cells = self.XPATH_CATEGORY_CELL(players_table,
category="Actor")
producers_cells = self.XPATH_CATEGORY_CELL(players_table,
category="Producer")
writers_cells = self.XPATH_CATEGORY_CELL(players_table,
category="Producer")
composers_cells = self.XPATH_CATEGORY_CELL(players_table,
category="Composer")
directors = extract_category_lines(directors_cells)
actors = extract_category_lines(actors_cells)
producers = extract_category_lines(producers_cells)
writers = extract_category_lines(writers_cells)
composers = extract_category_lines(composers_cells)
print "Directors:", directors
print "Actors:", actors
print "Producers:", producers
print "Writers:", writers
print "Composers:", composers
# here you should of course populate scrapy items
The code can be simplified for sure, but I hope you get the idea.
You can do similar things with HtmlXPathSelector of course (with the string() XPath function for example), but without modifying the tree for <br> (how to do that with hxs?) it works only for non-multiple names in your case:
>>> hxs.select('string(//div[#class="mp_box"][div[#class="mp_box_tab"]="The Players"]/div[#class="mp_box_content"]/table//tr[contains(td, "Director")]/td[2])').extract()
[u'Craig R. Baxley']
>>> hxs.select('string(//div[#class="mp_box"][div[#class="mp_box_tab"]="The Players"]/div[#class="mp_box_content"]/table//tr[contains(td, "Actor")]/td[2])').extract()
[u'Carl WeathersCraig T. NelsonSharon Stone']