How to include the source line number everywhere in html output in Sphinx? - python-sphinx

Let's say I'm writing a custom editor for my RestructuredText/Sphinx stuff, with "live" html output preview. Output is built using Sphinx.
The source files are pure RestructuredText. No code there.
One desirable feature would be that right-clicking on some part of the preview opens the editor at the correct line of the source file.
To achieve that, one way would be to put that line number in every tag of the html file, for example using classes (e.g., class = "... lineno-124"). Or use html comments.
Note that I don't want to add more content to my source files, just that the line number be included everywhere in the output.
An approximate line number would be enough.
Does anyone know how to do this in Sphinx, this way or another?

I decided to add <a> tags with a specific class "lineno lineno-nnn" where nnn is the line number in the RestructuredText source.
The directive .. linenocomment:: nnn is inserted before each new block of unindented text in the source, before the actual parsing (using a 'source-read' event hook).
linenocomment is a custom directive that pushes the <a> tag at build time.
Half a solution is still a solution...
import docutils.nodes as dn
from docutils.parsers.rst import Directive

class linenocomment(dn.General, dn.Element):
    pass

def visit_linenocomment_html(self, node):
    self.body.append(self.starttag(node, 'a', CLASS="lineno lineno-{}".format(node['lineno'])))

def depart_linenocomment_html(self, node):
    self.body.append('</a>')

class LineNoComment(Directive):
    required_arguments = 1
    optional_arguments = 0
    has_content = False
    add_index = False

    def run(self):
        node = linenocomment()
        node['lineno'] = self.arguments[0]
        return [node]

def insert_line_comments(app, docname, source):
    print(source)
    new_source = []
    last_line_empty = True
    lineno = 0
    for line in source[0].split('\n'):
        if line.strip() == '':
            last_line_empty = True
            new_source.append(line)
        elif line[0].isspace():
            new_source.append(line)
            last_line_empty = False
        elif not last_line_empty:
            new_source.append(line)
        else:
            last_line_empty = False
            new_source.append('.. linenocomment:: {}'.format(lineno))
            new_source.append('')
            new_source.append(line)
        lineno += 1
    source[0] = '\n'.join(new_source)
    print(source)

def setup(app):
    app.add_node(linenocomment, html=(visit_linenocomment_html, depart_linenocomment_html))
    app.add_directive('linenocomment', LineNoComment)
    app.connect('source-read', insert_line_comments)
    return {
        'version': 0.1
    }
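For completeness, here is how such an extension is typically enabled in conf.py (a sketch; the _ext directory and the module name linenocomment_ext are my assumptions, not part of the answer):

# conf.py -- assuming the code above is saved as _ext/linenocomment_ext.py
import os
import sys

sys.path.insert(0, os.path.abspath('_ext'))

extensions = ['linenocomment_ext']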

Related

MyST-Parser: Auto linking / linkifying references to bug tracker issues

I use Sphinx with MyST-Parser for Markdown, and I want GitHub- or GitLab-style auto-linking (linkifying) for references.
Is there a way to have MyST render the reference:
#346
In docutils-speak, this is a Text node (example)
And behave as if it was:
[#346](https://github.com/vcs-python/libvcs/pull/346)
So when rendered it'd be like:
#346
Not the custom role:
{issue}`1` <- Not this
Another example: Linkifying the reference #user to a GitHub, GitLab, StackOverflow user.
What I'm currently doing (and why it doesn't work)
Right now I'm using the canonical solution docutils offers: custom roles.
I use sphinx-issues (PyPI), and it does just that. It uses a Sphinx setting variable, issues_github_path, to build the URL:
e.g. in Sphinx configuration conf.py:
issues_github_path = 'vcs-python/libvcs'
reStructuredText:
:issue:`346`
MyST-Parser:
{issue}`346`
Why custom roles don't work
Sadly, those aren't bi-directional with GitHub/GitLab/tools. If you copy/paste MyST-Parser -> GitHub/GitLab or preview it directly, it looks very bad:
Example of CHANGES:
Example issue: https://github.com/vcs-python/libvcs/issues/363
What we want is to just be able to copy markdown including #347 to and from.
Does a solution already exist?
Are there any projects out there of docutils or sphinx plugins to turn #username or #issues into links?
sphinx (at least) can demonstrably do so for custom roles - as seen in sphinx-issues' usage of issues_github_path - by using project configuration context.
MyST-Parser has a linkify extension which uses linkify-it-py
This can turn a bare https://www.google.com into a clickable link without needing to write <https://www.google.com>.
Therefore, there may already be a tool out there.
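For reference, the linkify extension is enabled through MyST's extension list in conf.py (per the MyST-Parser documentation; the linkify-it-py package must be installed):

# conf.py
extensions = ["myst_parser"]
myst_enable_extensions = ["linkify"]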
Can it be done through the API?
The toolchain for myst, sphinx and docutils is robust. This is a special case.
This needs to be done at the Text node level. Custom role won't work - as stated above - since it'll create markdown that can't be copied between GitLab and GitHub issues trivially.
The stack:
MyST-Parser API (Markdown-it-py API) > Sphinx APIs (MySTParser + Sphinx) > Docutils API
At the time of writing, I'm using Sphinx 4.3.2, MyST-Parser 0.17.2, and docutils 0.17.1 on python 3.10.2.
Notes
For the sake of an example, I'm using an open source project of mine that is facing this issue.
This is only about autolinking issues or usernames - things that'd easily be mappable to URLs. autodoc code-linking is out of scope.
There is a (defunct) project that does this: sphinxcontrib-issuetracker.
I've rebooted it:
conf.py:
import sys
from pathlib import Path
cwd = Path(__file__).parent
project_root = cwd.parent
sys.path.insert(0, str(project_root))
sys.path.insert(0, str(cwd / "_ext"))
extensions = [
    "link_issues",
]
# issuetracker
issuetracker = "github"
issuetracker_project = "cihai/unihan-etl" # e.g. for https://github.com/cihai/unihan-etl
_ext/link_issues.py:
"""Issue linking w/ plain-text autolinking, e.g. #42
Credit: https://github.com/ignatenkobrain/sphinxcontrib-issuetracker
License: BSD
Changes by Tony Narlock (2022-08-21):
- Type annotations
mypy --strict, requires types-requests, types-docutils
Python < 3.10 require typing-extensions
- TrackerConfig: Use dataclasses instead of typing.NamedTuple and hacking __new__
- app.warn (removed in 5.0) -> Use Sphinx Logging API
https://www.sphinx-doc.org/en/master/extdev/logging.html#logging-api
- Add PendingIssueXRef
Typing for tracker_config and precision
- Add IssueTrackerBuildEnvironment
Subclassed / typed BuildEnvironment with .tracker_config
- Just GitHub (for demonstration)
"""
import dataclasses
import re
import sys
import time
import typing as t
import requests
from docutils import nodes
from sphinx.addnodes import pending_xref
from sphinx.application import Sphinx
from sphinx.config import Config
from sphinx.environment import BuildEnvironment
from sphinx.transforms import SphinxTransform
from sphinx.util import logging
if t.TYPE_CHECKING:
if sys.version_info >= (3, 10):
from typing import TypeGuard
else:
from typing_extensions import TypeGuard
logger = logging.getLogger(__name__)
GITHUB_API_URL = "https://api.github.com/repos/{0.project}/issues/{1}"
class IssueTrackerBuildEnvironment(BuildEnvironment):
tracker_config: "TrackerConfig"
issuetracker_cache: "IssueTrackerCache"
github_rate_limit: t.Tuple[float, bool]
class Issue(t.NamedTuple):
id: str
title: str
url: str
closed: bool
IssueTrackerCache = t.Dict[str, Issue]
#dataclasses.dataclass
class TrackerConfig:
project: str
url: str
"""
Issue tracker configuration.
This class provides configuration for trackers, and is passed as
``tracker_config`` arguments to callbacks of
:event:`issuetracker-lookup-issue`.
"""
def __post_init__(self) -> None:
if self.url is not None:
self.url = self.url.rstrip("/")
#classmethod
def from_sphinx_config(cls, config: Config) -> "TrackerConfig":
"""
Get tracker configuration from ``config``.
"""
project = config.issuetracker_project or config.project
url = config.issuetracker_url
return cls(project=project, url=url)
class PendingIssueXRef(pending_xref):
tracker_config: TrackerConfig
class IssueReferences(SphinxTransform):
default_priority = 999
def apply(self) -> None:
config = self.document.settings.env.config
tracker_config = TrackerConfig.from_sphinx_config(config)
issue_pattern = config.issuetracker_issue_pattern
title_template = None
if isinstance(issue_pattern, str):
issue_pattern = re.compile(issue_pattern)
for node in self.document.traverse(nodes.Text):
parent = node.parent
if isinstance(parent, (nodes.literal, nodes.FixedTextElement)):
# ignore inline and block literal text
continue
if isinstance(parent, nodes.reference):
continue
text = str(node)
new_nodes = []
last_issue_ref_end = 0
for match in issue_pattern.finditer(text):
# catch invalid pattern with too many groups
if len(match.groups()) != 1:
raise ValueError(
"issuetracker_issue_pattern must have "
"exactly one group: {0!r}".format(match.groups())
)
# extract the text between the last issue reference and the
# current issue reference and put it into a new text node
head = text[last_issue_ref_end : match.start()]
if head:
new_nodes.append(nodes.Text(head))
# adjust the position of the last issue reference in the
# text
last_issue_ref_end = match.end()
# extract the issue text (including the leading dash)
issuetext = match.group(0)
# extract the issue number (excluding the leading dash)
issue_id = match.group(1)
# turn the issue reference into a reference node
refnode = PendingIssueXRef()
refnode["refdomain"] = None
refnode["reftarget"] = issue_id
refnode["reftype"] = "issue"
refnode["trackerconfig"] = tracker_config
reftitle = title_template or issuetext
refnode.append(
nodes.inline(issuetext, reftitle, classes=["xref", "issue"])
)
new_nodes.append(refnode)
if not new_nodes:
# no issue references were found, move on to the next node
continue
# extract the remaining text after the last issue reference, and
# put it into a text node
tail = text[last_issue_ref_end:]
if tail:
new_nodes.append(nodes.Text(tail))
# find and remove the original node, and insert all new nodes
# instead
parent.replace(node, new_nodes)
def is_issuetracker_env(
env: t.Any,
) -> "TypeGuard['IssueTrackerBuildEnvironment']":
return hasattr(env, "issuetracker_cache") and env.issuetracker_cache is not None
def lookup_issue(
app: Sphinx, tracker_config: TrackerConfig, issue_id: str
) -> t.Optional[Issue]:
"""
Lookup the given issue.
The issue is first looked up in an internal cache. If it is not found, the
event ``issuetracker-lookup-issue`` is emitted. The result of this
invocation is then cached and returned.
``app`` is the sphinx application object. ``tracker_config`` is the
:class:`TrackerConfig` object representing the issue tracker configuration.
``issue_id`` is a string containing the issue id.
Return a :class:`Issue` object for the issue with the given ``issue_id``,
or ``None`` if the issue wasn't found.
"""
env = app.env
if is_issuetracker_env(env):
cache: IssueTrackerCache = env.issuetracker_cache
if issue_id not in cache:
issue = app.emit_firstresult(
"issuetracker-lookup-issue", tracker_config, issue_id
)
cache[issue_id] = issue
return cache[issue_id]
return None
def lookup_issues(app: Sphinx, doctree: nodes.document) -> None:
"""
Lookup issues found in the given ``doctree``.
Each issue reference in the given ``doctree`` is looked up. Each lookup
result is cached by mapping the referenced issue id to the looked up
:class:`Issue` object (an existing issue) or ``None`` (a missing issue).
The cache is available at ``app.env.issuetracker_cache`` and is pickled
along with the environment.
"""
for node in doctree.traverse(PendingIssueXRef):
if node["reftype"] == "issue":
lookup_issue(app, node["trackerconfig"], node["reftarget"])
def make_issue_reference(issue: Issue, content_node: nodes.inline) -> nodes.reference:
"""
Create a reference node for the given issue.
``content_node`` is a docutils node which is supposed to be added as
content of the created reference. ``issue`` is the :class:`Issue` which
the reference shall point to.
Return a :class:`docutils.nodes.reference` for the issue.
"""
reference = nodes.reference()
reference["refuri"] = issue.url
if issue.title:
reference["reftitle"] = issue.title
if issue.closed:
content_node["classes"].append("closed")
reference.append(content_node)
return reference
def resolve_issue_reference(
app: Sphinx, env: BuildEnvironment, node: PendingIssueXRef, contnode: nodes.inline
) -> t.Optional[nodes.reference]:
"""
Resolve an issue reference and turn it into a real reference to the
corresponding issue.
``app`` and ``env`` are the Sphinx application and environment
respectively. ``node`` is a ``pending_xref`` node representing the missing
reference. It is expected to have the following attributes:
- ``reftype``: The reference type
- ``trackerconfig``: The :class:`TrackerConfig`` to use for this node
- ``reftarget``: The issue id
- ``classes``: The node classes
References with a ``reftype`` other than ``'issue'`` are skipped by
returning ``None``. Otherwise the new node is returned.
If the referenced issue was found, a real reference to this issue is
returned. The text of this reference is formatted with the :class:`Issue`
object available in the ``issue`` key. The reference title is set to the
issue title. If the issue is closed, the class ``closed`` is added to the
new content node.
Otherwise, if the issue was not found, the content node is returned.
"""
if node["reftype"] != "issue":
return None
issue = lookup_issue(app, node["trackerconfig"], node["reftarget"])
if issue is None:
return contnode
else:
classes = contnode["classes"]
conttext = str(contnode[0])
formatted_conttext = nodes.Text(conttext.format(issue=issue))
formatted_contnode = nodes.inline(conttext, formatted_conttext, classes=classes)
assert issue is not None
return make_issue_reference(issue, formatted_contnode)
return None
def init_cache(app: Sphinx) -> None:
if not hasattr(app.env, "issuetracker_cache"):
app.env.issuetracker_cache: "IssueTrackerCache" = {} # type: ignore
return None
def check_project_with_username(tracker_config: TrackerConfig) -> None:
if "/" not in tracker_config.project:
raise ValueError(
"username missing in project name: {0.project}".format(tracker_config)
)
HEADERS = {"User-Agent": "sphinxcontrib-issuetracker v{0}".format("1.0")}
def get(app: Sphinx, url: str) -> t.Optional[requests.Response]:
"""
Get a response from the given ``url``.
``url`` is a string containing the URL to request via GET. ``app`` is the
Sphinx application object.
Return the :class:`~requests.Response` object on status code 200, or
``None`` otherwise. If the status code is not 200 or 404, a warning is
emitted via ``app``.
"""
response = requests.get(url, headers=HEADERS)
if response.status_code == requests.codes.ok:
return response
elif response.status_code != requests.codes.not_found:
msg = "GET {0.url} failed with code {0.status_code}"
logger.warning(msg.format(response))
return None
def lookup_github_issue(
app: Sphinx, tracker_config: TrackerConfig, issue_id: str
) -> t.Optional[Issue]:
check_project_with_username(tracker_config)
env = app.env
if is_issuetracker_env(env):
# Get rate limit information from the environment
timestamp, limit_hit = getattr(env, "github_rate_limit", (0, False))
if limit_hit and time.time() - timestamp > 3600:
# Github limits applications hourly
limit_hit = False
if not limit_hit:
url = GITHUB_API_URL.format(tracker_config, issue_id)
response = get(app, url)
if response:
rate_remaining = response.headers.get("X-RateLimit-Remaining")
assert rate_remaining is not None
if rate_remaining.isdigit() and int(rate_remaining) == 0:
logger.warning("Github rate limit hit")
env.github_rate_limit = (time.time(), True)
issue = response.json()
closed = issue["state"] == "closed"
return Issue(
id=issue_id,
title=issue["title"],
closed=closed,
url=issue["html_url"],
)
else:
logger.warning(
"Github rate limit exceeded, not resolving issue {0}".format(issue_id)
)
return None
BUILTIN_ISSUE_TRACKERS: t.Dict[str, t.Any] = {
"github": lookup_github_issue,
}
def init_transformer(app: Sphinx) -> None:
if app.config.issuetracker_plaintext_issues:
app.add_transform(IssueReferences)
def connect_builtin_tracker(app: Sphinx) -> None:
if app.config.issuetracker:
tracker = BUILTIN_ISSUE_TRACKERS[app.config.issuetracker.lower()]
app.connect(str("issuetracker-lookup-issue"), tracker)
def setup(app: Sphinx) -> t.Dict[str, t.Any]:
app.add_config_value("mybase", "https://github.com/cihai/unihan-etl", "env")
app.add_event(str("issuetracker-lookup-issue"))
app.connect(str("builder-inited"), connect_builtin_tracker)
app.add_config_value("issuetracker", None, "env")
app.add_config_value("issuetracker_project", None, "env")
app.add_config_value("issuetracker_url", None, "env")
# configuration specific to plaintext issue references
app.add_config_value("issuetracker_plaintext_issues", True, "env")
app.add_config_value(
"issuetracker_issue_pattern",
re.compile(
r"#(\d+)",
),
"env",
)
app.add_config_value("issuetracker_title_template", None, "env")
app.connect(str("builder-inited"), init_cache)
app.connect(str("builder-inited"), init_transformer)
app.connect(str("doctree-read"), lookup_issues)
app.connect(str("missing-reference"), resolve_issue_reference)
return {
"version": "1.0",
"parallel_read_safe": True,
"parallel_write_safe": True,
}
Mirrors
https://gist.github.com/tony/05a3043d97d37c158763fb2f6a2d5392
https://github.com/ignatenkobrain/sphinxcontrib-issuetracker/issues/25
Mypy users
mypy --strict docs/_ext/link_issues.py works as of mypy 0.971
If you use mypy: pip install types-docutils types-requests
Install:
https://pypi.org/project/types-docutils/
https://pypi.org/project/types-requests/
https://pypi.org/project/typing-extensions/ (Python <3.10)
Example
via unihan-etl#261 / v0.17.2 (source, view, but page may be outdated)

Keep custom code block attributes in pandoc when converting to Markdown

I am converting an org file to Markdown (specifically commonmark). I am adding a custom attribute to my code blocks, which the commonmark writer does not support, and strips them from the code block during conversion. I am trying to find a way to keep my custom attributes.
This is what I have:
#+begin_src python :hl_lines "2"
def some_function():
    print("foo bar")
    return
#+end_src
This is what I want in my .md file:
``` python hl_lines="2"
def some_function():
    print("foo bar")
    return
```
After doing some research, I think a filter can solve my issue: I am now playing with panflute, a python lib for writing pandoc filters.
I found some relevant questions, but they apply to other conversions (rST -> html, rST -> LaTeX), and I don't know enough Lua to translate the code to Python and adapt it to the org -> md conversion.
Thanks for any help.
I was able to write a script, posting it here for future Python-based questions about pandoc filters.
The filter below requires panflute, but there are other libs for pandoc filters in Python.
import panflute


def keep_attributes_markdown(elem, doc, format="commonmark"):
    """Keep custom attributes specified in code block headers when exporting to Markdown"""
    if type(elem) == panflute.CodeBlock:
        language = "." + elem.classes[0]
        attributes = ""
        attributes = " ".join(
            [key + "=" + value for key, value in elem.attributes.items()]
        )
        header = "``` { " + " ".join([language, attributes]).strip() + " }"
        panflute.debug(header)
        code = elem.text.strip()
        footer = "```"
        content = [
            panflute.RawBlock(header, format=format),
            panflute.RawBlock(code, format=format),
            panflute.RawBlock(footer, format=format),
        ]
        return content


def main(doc=None):
    return panflute.run_filter(keep_attributes_markdown, doc=doc)


if __name__ == "__main__":
    main()
You can now run the following command:
pandoc --from=org --to=commonmark --filter=/full/path/to/keep_attributes_markdown.py --output=target_file.md your_file.org
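If you want to inspect what the filter emits without writing the target file, one way (my sketch, not from the answer) is to run the filter stage by hand: a pandoc filter reads the JSON AST on stdin and receives the output format as its first argument, so the pipeline can be split up like this:
pandoc --from=org --to=json your_file.org | python keep_attributes_markdown.py commonmark | pandoc --from=json --to=commonmark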

Declare additional dependency to sphinx-build in an extension

TL;DR: From a Sphinx extension, how do I tell sphinx-build to treat an additional file as a dependency? In my immediate use case, this is the extension's source code, but the question could equally apply to some auxiliary file used by the extension.
I'm generating documentation with Sphinx using a custom extension. I'm using sphinx-build to build the documentation. For example, I use this command to generate the HTML (this is the command in the makefile generated by sphinx-quickstart):
sphinx-build -b html -d _build/doctrees . _build/html
Since my custom extension is maintained together with the source of the documentation, I want sphinx-build to treat it as a dependency of the generated HTML (and LaTeX, etc.). So whenever I change my extension's source code, I want sphinx-build to regenerate the output.
How do I tell sphinx-build to treat an additional file as a dependency? That is not mentioned in the toctree, since it isn't part of the source. Logically, this should be something I do from my extension's setup function.
Sample extension (my_extension.py):
from docutils import nodes
from docutils.parsers.rst import Directive

class Foo(Directive):
    def run(self):
        node = nodes.paragraph(text='Hello world\n')
        return [node]

def setup(app):
    app.add_directive('foo', Foo)
Sample source (index.rst):
.. toctree::
   :maxdepth: 2

.. foo::
Sample conf.py (basically the output of sphinx-quickstart plus my extension):
import sys
import os
sys.path.insert(0, os.path.abspath('.'))
extensions = ['my_extension']
templates_path = ['_templates']
source_suffix = '.rst'
master_doc = 'index'
project = 'Hello directive'
copyright = '2019, Gilles'
author = 'Gilles'
version = '1'
release = '1'
language = None
exclude_patterns = ['_build']
pygments_style = 'sphinx'
todo_include_todos = False
html_theme = 'alabaster'
html_static_path = ['_static']
htmlhelp_basename = 'Hellodirectivedoc'
latex_elements = {
}
latex_documents = [
    (master_doc, 'Hellodirective.tex', 'Hello directive Documentation',
     'Gilles', 'manual'),
]
man_pages = [
    (master_doc, 'hellodirective', 'Hello directive Documentation',
     [author], 1)
]
texinfo_documents = [
    (master_doc, 'Hellodirective', 'Hello directive Documentation',
     author, 'Hellodirective', 'One line description of project.',
     'Miscellaneous'),
]
Validation of a solution:
Run make html (or sphinx-build as above).
Modify my_extension.py to replace Hello world by Hello again.
Run make html again.
The generated HTML (_build/html/index.html) must now contain Hello again instead of Hello world.
It looks like the note_dependency method in the build environment API should do what I want. But when should I call it? I tried various events but none seemed to hit the environment object in the right state. What did work was to call it from a directive.
import os

from docutils import nodes
from docutils.parsers.rst import Directive
import sphinx.application

class Foo(Directive):
    def run(self):
        self.state.document.settings.env.note_dependency(__file__)
        node = nodes.paragraph(text='Hello done\n')
        return [node]

def setup(app):
    app.add_directive('foo', Foo)
If a document contains at least one foo directive, it'll get marked as stale when the extension that introduces this directive changes. This makes sense, although it could get tedious if an extension adds many directives or makes different changes. I don't know if there's a better way.
Inspired by Luc Van Oostenryck's autodoc-C.
As far as I know, app.env.note_dependency can be called within the doctree-read event to add any file as a dependency of the document currently being read.
So in your use case, I assume this would work:
from sphinx.application import Sphinx
import docutils.nodes as nodes

def on_doctree_read(app: Sphinx, doctree: nodes.document) -> None:
    # Mark this extension's own source file as a dependency of the document being read
    app.env.note_dependency(__file__)

def setup(app: Sphinx):
    app.connect("doctree-read", on_doctree_read)

How to stop ImageMagick in Ruby (Rmagick) evaluating an # sign in text annotation

In an app I recently built for a client, the following code resulted in the variable @nameText being evaluated, which then produced the error 'no text' (since the referenced variable doesn't exist).
To get around this I used gsub, as per the example below. Is there a way to tell Magick not to evaluate the string at all?
require 'RMagick'
@image = Magick::Image.read( '/path/to/image.jpg' ).first
@nameText = '#SomeTwitterUser'

@text = Magick::Draw.new
@text.font_family = 'Futura'
@text.pointsize = 22
@text.font_weight = Magick::BoldWeight

# Causes error 'no text'...
# @text.annotate( @image, 0,0,200,54, @nameText )

@text.annotate( @image, 0,0,200,54, @nameText.gsub('#', '\#') )
This is the C code from RMagick that is returning the error:
// Translate & store in Draw structure
draw->info->text = InterpretImageProperties(NULL, image, StringValuePtr(text));
if (!draw->info->text)
{
    rb_raise(rb_eArgError, "no text");
}
It is the call to InterpretImageProperties that is modifying the input text - but it is not Ruby, or a Ruby instance variable that it is trying to reference. The function is defined here in the Image Magick core library: http://www.imagemagick.org/api/MagickCore/property_8c_source.html#l02966
Look a bit further down, and you can see the code:
/* handle a '#' replace string from file */
if (*p == '#')
  {
    p++;
    if (*p != '-' && (IsPathAccessible(p) == MagickFalse) )
      {
        (void) ThrowMagickException(&image->exception,GetMagickModule(),
          OptionError,"UnableToAccessPath","%s",p);
        return((char *) NULL);
      }
    return(FileToString(p,~0,&image->exception));
  }
In summary, this is a core library feature which will attempt to load text from a file (named SomeTwitterUser in your case; I have confirmed this - try it!), and your work-around is probably the best you can do.
For efficiency, and minimal changes to input strings, you could rely on the selectivity of the library code and only modify the string if it starts with #:
@text.annotate( @image, 0,0,200,54, @name_string.gsub( /^#/, '\#') )

How can I include a YAML file inside another?

So I have two YAML files, "A" and "B" and I want the contents of A to be inserted inside B, either spliced into the existing data structure, like an array, or as a child of an element, like the value for a certain hash key.
Is this possible at all? How? If not, any pointers to a normative reference?
No, standard YAML does not include any kind of "import" or "include" statement.
Your question does not ask for a Python solution, but here is one using PyYAML.
PyYAML allows you to attach custom constructors (such as !include) to the YAML loader. I've included a root directory that can be set so that this solution supports relative and absolute file references.
Class-Based Solution
Here is a class-based solution, that avoids the global root variable of my original response.
See this gist for a similar, more robust Python 3 solution that uses a metaclass to register the custom constructor.
import yaml
import os

class Loader(yaml.SafeLoader):
    def __init__(self, stream):
        self._root = os.path.split(stream.name)[0]
        super(Loader, self).__init__(stream)

    def include(self, node):
        filename = os.path.join(self._root, self.construct_scalar(node))
        with open(filename, 'r') as f:
            return yaml.load(f, Loader)

Loader.add_constructor('!include', Loader.include)
An example:
foo.yaml
a: 1
b:
    - 1.43
    - 543.55
c: !include bar.yaml
bar.yaml
- 3.6
- [1, 2, 3]
Now the files can be loaded using:
>>> with open('foo.yaml', 'r') as f:
...     data = yaml.load(f, Loader)
>>> data
{'a': 1, 'b': [1.43, 543.55], 'c': [3.6, [1, 2, 3]]}
For Python users, you can try pyyaml-include.
Install
pip install pyyaml-include
Usage
import yaml
from yamlinclude import YamlIncludeConstructor
YamlIncludeConstructor.add_to_loader_class(loader_class=yaml.FullLoader, base_dir='/your/conf/dir')
with open('0.yaml') as f:
data = yaml.load(f, Loader=yaml.FullLoader)
print(data)
Consider we have such YAML files:
├── 0.yaml
└── include.d
    ├── 1.yaml
    └── 2.yaml
1.yaml 's content:
name: "1"
2.yaml 's content:
name: "2"
Include files by name
On top level:
If 0.yaml was:
!include include.d/1.yaml
We'll get:
{"name": "1"}
In mapping:
If 0.yaml was:
file1: !include include.d/1.yaml
file2: !include include.d/2.yaml
We'll get:
file1:
    name: "1"
file2:
    name: "2"
In sequence:
If 0.yaml was:
files:
- !include include.d/1.yaml
- !include include.d/2.yaml
We'll get:
files:
    - name: "1"
    - name: "2"
ℹ Note:
File name can be either absolute (like /usr/conf/1.5/Make.yml) or relative (like ../../cfg/img.yml).
Include files by wildcards
File name can contain shell-style wildcards. Data loaded from the file(s) found by wildcards will be set in a sequence.
If 0.yaml was:
files: !include include.d/*.yaml
We'll get:
files:
    - name: "1"
    - name: "2"
ℹ Note:
For Python>=3.5, if recursive argument of !include YAML tag is true, the pattern “**” will match any files and zero or more directories and subdirectories.
Using the “**” pattern in large directory trees may consume an inordinate amount of time because of recursive search.
In order to enable recursive argument, we shall write the !include tag in Mapping or Sequence mode:
Arguments in Sequence mode:
!include [tests/data/include.d/**/*.yaml, true]
Arguments in Mapping mode:
!include {pathname: tests/data/include.d/**/*.yaml, recursive: true}
Includes are not directly supported in YAML as far as I know; you will have to provide a mechanism yourself. However, this is generally easy to do.
I have used YAML as a configuration language in my python apps, and in this case often define a convention like this:
>>> main.yml <<<
includes: [ wibble.yml, wobble.yml]
Then in my (python) code I do:
import yaml
cfg = yaml.load(open("main.yml"))
for inc in cfg.get("includes", []):
    cfg.update(yaml.load(open(inc)))
The only downside is that variables in the includes will always override the variables in main, and there is no way to change that precedence by changing where the "includes:" statement appears in the main.yml file.
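If you need main.yml to win instead, one workaround (my sketch, not part of the original convention) is to apply the included files first and the main file last:

import yaml

main = yaml.safe_load(open("main.yml"))
cfg = {}
for inc in main.get("includes", []):
    cfg.update(yaml.safe_load(open(inc)))
cfg.update(main)  # values from main.yml now override the included files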
On a slightly different point, YAML doesn't support includes because it's not really designed exclusively as a file-based markup. What would an include mean if you got it in a response to an AJAX request?
The YML standard does not specify a way to do this. And this problem does not limit itself to YML. JSON has the same limitations.
Many applications which use YML or JSON based configurations run into this problem eventually. And when that happens, they make up their own convention.
e.g. for swagger API definitions:
$ref: 'file.yml'
e.g. for docker compose configurations:
services:
  app:
    extends:
      file: docker-compose.base.yml
Alternatively, if you want to split up the content of a yml file in multiple files, like a tree of content, you can define your own folder-structure convention and use an (existing) merge script.
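As an illustration of that folder-structure convention (my sketch; the layout and the load_tree helper are hypothetical, not an existing merge script):

import os

import yaml

def load_tree(root):
    """Merge every YAML file directly under `root` into one dict keyed by file name."""
    tree = {}
    for name in sorted(os.listdir(root)):
        if name.endswith((".yml", ".yaml")):
            key = os.path.splitext(name)[0]
            with open(os.path.join(root, name)) as f:
                tree[key] = yaml.safe_load(f)
    return tree

# A config/ folder containing database.yml and logging.yml becomes
# {"database": {...}, "logging": {...}}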
Expanding on @Josh_Bode's answer, here's my own PyYAML solution, which has the advantage of being a self-contained subclass of yaml.Loader. It doesn't depend on any module-level globals, or on modifying the global state of the yaml module.
import yaml, os

class IncludeLoader(yaml.Loader):
    """
    yaml.Loader subclass handles "!include path/to/foo.yml" directives in config
    files. When constructed with a file object, the root path for includes
    defaults to the directory containing the file, otherwise to the current
    working directory. In either case, the root path can be overridden by the
    `root` keyword argument.

    When an included file F contains its own !include directive, the path is
    relative to F's location.

    Example:
        YAML file /home/frodo/one-ring.yml:
            ---
            Name: The One Ring
            Specials:
                - resize-to-wearer
            Effects:
                - !include path/to/invisibility.yml

        YAML file /home/frodo/path/to/invisibility.yml:
            ---
            Name: invisibility
            Message: Suddenly you disappear!

        Loading:
            data = IncludeLoader(open('/home/frodo/one-ring.yml', 'r')).get_data()

        Result:
            {'Effects': [{'Message': 'Suddenly you disappear!', 'Name':
            'invisibility'}], 'Name': 'The One Ring', 'Specials':
            ['resize-to-wearer']}
    """
    def __init__(self, *args, **kwargs):
        super(IncludeLoader, self).__init__(*args, **kwargs)
        self.add_constructor('!include', self._include)
        if 'root' in kwargs:
            self.root = kwargs['root']
        elif isinstance(self.stream, file):
            self.root = os.path.dirname(self.stream.name)
        else:
            self.root = os.path.curdir

    def _include(self, loader, node):
        oldRoot = self.root
        filename = os.path.join(self.root, loader.construct_scalar(node))
        self.root = os.path.dirname(filename)
        data = yaml.load(open(filename, 'r'))
        self.root = oldRoot
        return data
With Yglu, you can import other files like this:
A.yaml
foo: !? $import('B.yaml')
B.yaml
bar: Hello
$ yglu A.yaml
foo:
  bar: Hello
As $import is a function, you can also pass an expression as argument:
dep: !- b
foo: !? $import($_.dep.toUpper() + '.yaml')
This would give the same output as above.
Disclaimer: I am the author of Yglu.
Standard YAML 1.2 doesn't natively include this feature. Nevertheless, many implementations provide some extension to do so.
I present a way of achieving it with Java and snakeyaml:1.24 (a Java library to parse/emit YAML files) that allows creating a custom YAML tag to achieve the following goal (you will see I'm using it to load test suites defined in several YAML files, and that I made it work as a list of includes for a target test: node):
# ... yaml prev stuff
tests: !include
  - '1.hello-test-suite.yaml'
  - '3.foo-test-suite.yaml'
  - '2.bar-test-suite.yaml'
# ... more yaml document
Here is the one-class Java that allows processing the !include tag. Files are loaded from classpath (Maven resources directory):
/**
 * Custom YAML loader. It adds support to the custom !include tag which allows splitting a YAML file across several
 * files for a better organization of YAML tests.
 */
@Slf4j // <-- This is a Lombok annotation to auto-generate logger
public class MyYamlLoader {

    private static final Constructor CUSTOM_CONSTRUCTOR = new MyYamlConstructor();

    private MyYamlLoader() {
    }

    /**
     * Parse the only YAML document in a stream and produce the Java Map. It provides support for the custom !include
     * YAML tag to split YAML contents across several files.
     */
    public static Map<String, Object> load(InputStream inputStream) {
        return new Yaml(CUSTOM_CONSTRUCTOR)
                .load(inputStream);
    }

    /**
     * Custom SnakeYAML constructor that registers custom tags.
     */
    private static class MyYamlConstructor extends Constructor {

        private static final String TAG_INCLUDE = "!include";

        MyYamlConstructor() {
            // Register custom tags
            yamlConstructors.put(new Tag(TAG_INCLUDE), new IncludeConstruct());
        }

        /**
         * The actual include tag construct.
         */
        private static class IncludeConstruct implements Construct {

            @Override
            public Object construct(Node node) {
                List<Node> inclusions = castToSequenceNode(node);
                return parseInclusions(inclusions);
            }

            @Override
            public void construct2ndStep(Node node, Object object) {
                // do nothing
            }

            private List<Node> castToSequenceNode(Node node) {
                try {
                    return ((SequenceNode) node).getValue();
                } catch (ClassCastException e) {
                    throw new IllegalArgumentException(String.format("The !import value must be a sequence node, but " +
                            "'%s' found.", node));
                }
            }

            private Object parseInclusions(List<Node> inclusions) {
                List<InputStream> inputStreams = inputStreams(inclusions);
                try (final SequenceInputStream sequencedInputStream =
                             new SequenceInputStream(Collections.enumeration(inputStreams))) {
                    return new Yaml(CUSTOM_CONSTRUCTOR)
                            .load(sequencedInputStream);
                } catch (IOException e) {
                    log.error("Error closing the stream.", e);
                    return null;
                }
            }

            private List<InputStream> inputStreams(List<Node> scalarNodes) {
                return scalarNodes.stream()
                        .map(this::inputStream)
                        .collect(toList());
            }

            private InputStream inputStream(Node scalarNode) {
                String filePath = castToScalarNode(scalarNode).getValue();
                final InputStream is = getClass().getClassLoader().getResourceAsStream(filePath);
                Assert.notNull(is, String.format("Resource file %s not found.", filePath));
                return is;
            }

            private ScalarNode castToScalarNode(Node scalarNode) {
                try {
                    return ((ScalarNode) scalarNode);
                } catch (ClassCastException e) {
                    throw new IllegalArgumentException(String.format("The value must be a scalar node, but '%s' found" +
                            ".", scalarNode));
                }
            }
        }
    }
}
Unfortunately YAML doesn't provide this in its standard.
But if you are using Ruby, there is a gem providing the functionality you are asking for by extending the ruby YAML library:
https://github.com/entwanderer/yaml_extend
Here are some examples for your reference.
import yaml

main_yaml = """
Package:
  - !include _shape_yaml
  - !include _path_yaml
"""

_shape_yaml = """
# Define
Rectangle: &id_Rectangle
  name: Rectangle
  width: &Rectangle_width 20
  height: &Rectangle_height 10
  area: !product [*Rectangle_width, *Rectangle_height]

Circle: &id_Circle
  name: Circle
  radius: &Circle_radius 5
  area: !product [*Circle_radius, *Circle_radius, pi]

# Setting
Shape:
  property: *id_Rectangle
  color: red
"""

_path_yaml = """
# Define
Root: &BASE /path/src/

Paths:
  a: &id_path_a !join [*BASE, a]
  b: &id_path_b !join [*BASE, b]

# Setting
Path:
  input_file: *id_path_a
"""


# define custom tag handler
def yaml_import(loader, node):
    other_yaml_file = loader.construct_scalar(node)
    return yaml.load(eval(other_yaml_file), Loader=yaml.SafeLoader)


def yaml_product(loader, node):
    import math
    list_data = loader.construct_sequence(node)
    result = 1
    pi = math.pi
    for val in list_data:
        result *= eval(val) if isinstance(val, str) else val
    return result


def yaml_join(loader, node):
    seq = loader.construct_sequence(node)
    return ''.join([str(i) for i in seq])


def yaml_ref(loader, node):
    ref = loader.construct_sequence(node)
    return ref[0]


def yaml_dict_ref(loader: yaml.loader.SafeLoader, node):
    dict_data, key, const_value = loader.construct_sequence(node)
    return dict_data[key] + str(const_value)


def main():
    # register the tag handler
    yaml.SafeLoader.add_constructor(tag='!include', constructor=yaml_import)
    yaml.SafeLoader.add_constructor(tag='!product', constructor=yaml_product)
    yaml.SafeLoader.add_constructor(tag='!join', constructor=yaml_join)
    yaml.SafeLoader.add_constructor(tag='!ref', constructor=yaml_ref)
    yaml.SafeLoader.add_constructor(tag='!dict_ref', constructor=yaml_dict_ref)

    config = yaml.load(main_yaml, Loader=yaml.SafeLoader)

    pk_shape, pk_path = config['Package']
    pk_shape, pk_path = pk_shape['Shape'], pk_path['Path']

    print(f"shape name: {pk_shape['property']['name']}")
    print(f"shape area: {pk_shape['property']['area']}")
    print(f"shape color: {pk_shape['color']}")
    print(f"input file: {pk_path['input_file']}")


if __name__ == '__main__':
    main()
output
shape name: Rectangle
shape area: 200
shape color: red
input file: /path/src/a
Update 2
And you can combine them, like this:
# xxx.yaml
CREATE_FONT_PICTURE:
  PROJECTS:
    SUNG: &id_SUNG
      name: SUNG
      work_dir: SUNG
      output_dir: temp
      font_pixel: 24

  DEFINE: &id_define !ref [*id_SUNG]  # you can use config['CREATE_FONT_PICTURE']['DEFINE'][name, work_dir, ... font_pixel]

AUTO_INIT:
  basename_suffix: !dict_ref [*id_define, name, !product [5, 3, 2]]  # SUNG30

# ↓ This is not correct.
# basename_suffix: !dict_ref [*id_define, name, !product [5, 3, 2]]  # It will build by Deep-level. id_define is Deep-level: 2. So you must put it after 2. otherwise, it can't refer to the correct value.
With Symfony, its handling of yaml will indirectly allow you to nest yaml files. The trick is to make use of the parameters option. eg:
common.yml
parameters:
    yaml_to_repeat:
        option: "value"
        foo:
            - "bar"
            - "baz"
config.yml
imports:
    - { resource: common.yml }
whatever:
    thing: "%yaml_to_repeat%"
    other_thing: "%yaml_to_repeat%"
The result will be the same as:
whatever:
    thing:
        option: "value"
        foo:
            - "bar"
            - "baz"
    other_thing:
        option: "value"
        foo:
            - "bar"
            - "baz"
I think the solution used by @maxy-B looks great. However, it didn't succeed for me with nested inclusions. For example, if config_1.yaml includes config_2.yaml, which includes config_3.yaml, there was a problem with the loader. However, if you simply point the new loader class to itself on load, it works! Specifically, if we replace the old _include function with the very slightly modified version:
def _include(self, loader, node):
    oldRoot = self.root
    filename = os.path.join(self.root, loader.construct_scalar(node))
    self.root = os.path.dirname(filename)
    data = yaml.load(open(filename, 'r'), Loader=IncludeLoader)
    self.root = oldRoot
    return data
Upon reflection I agree with the other comments, that nested loading is not appropriate for yaml in general as the input stream may not be a file, but it is very useful!
Based on previous posts:
import functools
import os

import yaml

class SimYamlLoader(yaml.SafeLoader):
    '''
    Simple custom yaml loader that supports include, e.g:
    main.yaml:
        - !include file1.yaml
        - !include dir/file2.yaml
    '''
    def __init__(self, stream):
        self.root = os.path.split(stream.name)[0]
        super().__init__(stream)

def _include(loader, node):
    filename = os.path.join(loader.root, loader.construct_scalar(node))
    with open(filename, 'r') as f:
        return yaml.load(f, SimYamlLoader)

SimYamlLoader.add_constructor('!include', _include)

# example:
with open('main.yaml', 'r') as f:
    lists = yaml.load(f, SimYamlLoader)

# if you want to merge the lists
data = functools.reduce(
    lambda x, y: x if y is None else {**x, **dict(y)}, lists, {})
# python 3.10+: lambda x, y: x if y is None else x | dict(y), lists, {})
Maybe this could inspire you; it tries to align with the jenkins-job-builder (jjb) conventions:
https://docs.openstack.org/infra/jenkins-job-builder/definition.html#inclusion-tags
- job:
    name: test-job-include-raw-1
    builders:
      - shell:
          !include-raw: include-raw001-hello-world.sh
Adding on to @Joshbode's initial answer above, I modified the snippet a little to support UNIX-style wildcard patterns.
I haven't tested it on Windows, though. I was facing an issue where an array in a large YAML file had to be split across multiple files for easy maintenance, and I was looking for a way to refer to multiple files within the same array of the base YAML. Hence the solution below. It does not support recursive references; it only supports wildcards at a given directory level referenced in the base YAML.
import yaml
import os
import glob

# Base code taken from below link :-
# Ref: https://stackoverflow.com/a/9577670
class Loader(yaml.SafeLoader):
    def __init__(self, stream):
        self._root = os.path.split(stream.name)[0]
        super(Loader, self).__init__(stream)

    def include(self, node):
        consolidated_result = None
        filename = os.path.join(self._root, self.construct_scalar(node))

        # Below section is modified for supporting UNIX wildcard patterns
        filenames = glob.glob(filename)

        # Just to ensure the order of files considered are predictable
        # and easy to debug in case of errors.
        filenames.sort()
        for file in filenames:
            with open(file, 'r') as f:
                result = yaml.load(f, Loader)
                if isinstance(result, list):
                    if not isinstance(consolidated_result, list):
                        consolidated_result = []
                    consolidated_result += result
                elif isinstance(result, dict):
                    if not isinstance(consolidated_result, dict):
                        consolidated_result = {}
                    consolidated_result.update(result)
                else:
                    consolidated_result = result
        return consolidated_result

Loader.add_constructor('!include', Loader.include)
Usage
a:
    !include a.yaml

b:
    # All yamls included within b folder level will be consolidated
    !include b/*.yaml
Combining other answers, here is a short solution that avoids subclassing Loader, and it works with any loader operating on files:
import json
from pathlib import Path
from typing import Any

import yaml

def yaml_include_constructor(loader: yaml.BaseLoader, node: yaml.Node) -> Any:
    """Include file referenced with !include node"""
    # noinspection PyTypeChecker
    fp = Path(loader.name).parent.joinpath(loader.construct_scalar(node)).resolve()
    fe = fp.suffix.lstrip(".")
    with open(fp, 'r') as f:
        if fe in ("yaml", "yml"):
            return yaml.load(f, type(loader))
        elif fe in ("json", "jsn"):
            return json.load(f)
        else:
            return f.read()

def main():
    loader = yaml.SafeLoader  # Works with any loader
    loader.add_constructor("!include", yaml_include_constructor)

    with open(...) as f:
        yml = yaml.load(f, loader)
The # noinspection PyTypeChecker comment is there to suppress the type-checker warning "Expected type 'ScalarNode', got 'Node' instead" when passing node: yaml.Node to loader.construct_scalar().
This solution fails if the yaml.load input stream is not a file stream, as loader.name does not contain the path in that case:
class Reader(object):
    ...
    def __init__(self, stream):
        ...
        if isinstance(stream, str):
            self.name = "<unicode string>"
            ...
        elif isinstance(stream, bytes):
            self.name = "<byte string>"
            ...
        else:
            self.name = getattr(stream, 'name', "<file>")
            ...
In my use case, I know that only YAML files will be included, so the solution can be simplified further:
def yaml_include_constructor(loader: yaml.Loader, node: yaml.Node) -> Any:
    """Include YAML file referenced with !include node"""
    with open(Path(loader.name).parent.joinpath(loader.construct_yaml_str(node)).resolve(), 'r') as f:
        return yaml.load(f, type(loader))

Loader = yaml.SafeLoader  # Works with any loader
Loader.add_constructor("!include", yaml_include_constructor)

def main():
    with open(...) as f:
        yml = yaml.load(f, Loader=Loader)
or even one-liner using lambda:
Loader = yaml.SafeLoader  # Works with any loader
Loader.add_constructor("!include",
                       lambda l, n: yaml.load(Path(l.name).parent.joinpath(l.construct_scalar(n)).read_text(), type(l)))
Probably it was not supported when the question was asked, but you can import another YAML file into one:
imports: [/your_location_to_yaml_file/Util.area.yaml]
Though I don't have any online reference, this works for me.
