How do i get Asciidoc(tor) to generate eg. a nice overall function description out of several code comments and some code, including the function signature, without butchering my code with tags?
AFAIK Asciidoc only supports external includes in its Asciidoc file via surrounding tags in the code like
# tag::mytag[]
<CODE TO INCLUDE HERE>
# end::mytag[]
which would be quite noisy around every describing comment within a single function body and around every function signature.
Maybe there is an exotic, less verbose way like marking the single line comments like #! and single line tags that tells Asciidoctor to read only a single line relative to these tags.
Consider this tiny example.
def uber_func(to_uber: str) -> str:
"""
This is an overall description. Delivers some context.
"""
# Trivial code here
# To uber means <include code below>
result = to_uber + " IS SOOO " + to_uber + "!!!"
# Trivial code here
# Function only returns upper case.
return result.upper()
My naive Asciidoc approach to include all meaningfull comments, the docstring and the function signature from the code above would look awefull, plus, Asciidoc doesn't recognize and remove comment marks, so the resulting documentation might not be so pretty too.
Instead of this very ugly
# tag::uber_func[]
def uber_func(to_uber: str) -> str:
"""
This is an overall description. Delivers some context.
"""
# end::uber_func[]
# Trivial code here
# tag::uber_func[]
# To uber means
result = to_uber + " IS SOOO " + to_uber + "!!!"
# end::uber_func[]
# Trivial code here
# tag::uber_func[]
# Function only returns upper case.
# end::uber_func[]
return result.upper()
I would like to use some thing like (pseudo):
def uber_func(to_uber: str) -> str:
# tag::uber_func[readline:-1,ignore-comment-marks,doc-comment:#!]
#! This is an overall description. Delivers some context.
# Trivial code here
#! To uber means
# tag::uber_func[readline:+1]
result = to_uber + " IS SOOO " + to_uber + "!!!"
# Trivial code here
#! Function only returns upper case.
return result.upper()
# end::uber_func[]
I think the general issue is, that Asciidoc is merely a text formatting tool, which means, if i want it to generate a structured documentation mostly from my code, i would need to provide this structe in my code and in my .adoc file.
Documentation generators like Doxygen on the other side recognize this structure and the documenting comments automatically.
I value this feature very much, that some generators allow you to write code and pretty documentation side by side, which lowers the overall effort alot.
If Asciidoc doesn't allow me to do this in a reasonable way, i will have look for something else.
I think you would have to write a scraper that puts the comments into a structure, then pull that structure into your AsciiDoc. This way the comments can be internally formatted with AsciiDoc markup, and you can output it in Asciidoctor-generated documents, but you won't need Asciidoctor to read the source files directly.
I would try a system of using one # for non-publishing comments and ## for ones you wish to publish, or vice versa, or append a # to the ones that are for docs publishing. As well as those denoted by the """ notation. Then your scraper can read the block name (uber_func or whatever portion is important) and then scrape the keeper comments and all the literal code, arranging them all in a file. The below file has seen most comments tagged as text, non-keeper comments dropped, and non-comment content as code:
# tag::function__uber_func[]
# tag::function__uber_func_form[]
uber_func(to_uber: str) -> str:
# end::function__uber_func_form[]
# tag::function__uber_func_desc[]
This is an overall description. Delivers some context.
# end::function__uber_func_desc[]
# tag::function__uber_func_body[]
# tag::function__uber_func_text[]
To uber means
# end::function__uber_func_text[]
# tag::function__uber_func_code[]
----
result = to_uber + " IS SOOO " + to_uber + "!!!"
----
# end::function__uber_func_code[]
# tag::function__uber_func_text[]
Function only returns upper case.
# end::function__uber_func_text[]
# tag::function__uber_func_code[]
----
return result.upper()
----
# end::function__uber_func_code[]
# end::function__uber_func[]
I know this looks hideous, but it is super useful to an AsciiDoc template. For instance, use just:
uber_func::
include::includes/api-stuff.adoc[tags="function__uber_func_form"]
+
include::includes/api-stuff.adoc[tags="function__uber_func_desc"]
+
include::includes/api-stuff.adoc[tags="function__uber_func_body"]
This would be even better if you parse it to a data format (like JSON or YAML) and then press it into AsciiDoc template dynamically. But you could maintain something like the above if it was not too massive. At a certain size (20+ such records?) you need an intermediary datasource (an ephemeral data file produced by the scraping), and at a certain larger scale (> 100 code blocks/endpoints?), you likely need a system that specializes in API documentation, such as Doxygen, et al.
Related
I am trying to document a Python class that has some class-level member variables, but I could not get it appropriately documented using reST/Sphinx.
The code is this:
class OSM:
"""Some blah and examples"""
url = 'http://overpass-api.de/api/interpreter' # URL of the Overpass API
sleep_time = 10 # pause between successive queries when assembling OSM dataset
But I get this output (see the green circled area, where I would like to have some text describing both variables, as above).
I apologize for the blurring, but part of the example is somewhat sensitive
You have several options to document class level variables.
Put a comment that starts with #: before the variable, or on the same line. (Uses only autodoc.)
Perhaps the easiest choice. If needed you can customize the value using :annotation: option. If you want to type-hint the value use #: type:.
Put a docstring after the variable.
Useful if the variable requires extensive documentation.
For module data members and class attributes, documentation can either be put into a comment with special formatting (using a #: to start the comment instead of just #), or in a docstring after the definition. Comments need to be either on a line of their own before the definition, or immediately after the assignment on the same line. The latter form is restricted to one line only.
Document the variable in the class docstring. (Uses sphinx-napoleon extension, shown in the example.)
This has the drawback that the variable's value will be omitted. Since it's a class level variable your IDE's static type checker may complain if you don't prefix the variable with cls. or class_name.. The distinction is however convenient because instance variables can also be documented in the class docstring.
The following example shows all three options. The .rst has additional complexity to illustrate the neededautodoc functionalities. Type hints were included in all cases but can also be omitted.
class OSM:
"""Some blah and examples"""
#: str: URL of the Overpass API.
url = 'http://overpass-api.de/api/interpreter'
#: int: pause between successive queries when assembling OSM dataset.
sleep_time = 10
class OSM2:
"""Some blah and examples.
Attributes:
cls.url (str): URL of the Overpass API.
"""
url = 'http://overpass-api.de/api/interpreter'
sleep_time = 10
"""str: Docstring of sleep_time after the variable."""
Corresponding .rst
OSM module
==========
.. automodule:: OSM_module
:members:
:exclude-members: OSM2
.. autoclass:: OSM2
:no-undoc-members:
:exclude-members: sleep_time
.. autoattribute:: sleep_time
:annotation: = "If you want to specify a different value from the source code."
The result:
This also works, if you're ok with suppressing a silly linter rule.
class OSM2:
sleep_time = 10; """str: Inline docstring of sleep_time after the variable."""
class torch.FloatStorage[source]
byte()
Casts this storage to byte type
char()
Casts this storage to char type
Im trying to get some documentation done, i have managed to to get the format like the one shown above, But im not sure how to give that link of source code which is at the end of that function!
The link takes the person to the file which contains the code,But im not sure how to do it,
This is achieved thanks to one of the builtin sphinx extension.
The one you are looking for in spinx.ext.viewcode. To enable it, add the string 'sphinx.ext.viewcode' to the list extensions in your conf.py file.
In summary, you should see something like that in conf.py
extensions = [
# other extensions that you might already use
# ...
'sphinx.ext.viewcode',
]
I'd recommend looking at the linkcode extension too. Allows you to build a full HTTP link to the code on GitHub or such like. This is sometimes a better option that including the code within the documentation itself. (E.g. may have stronger permission on it than the docs themselves.)
You write a little helper function in your conf.py file, and it does the rest.
What I really like about linkcode is that it creates links for enums, enum values, and data elements, which I could not get to be linked with viewcode.
I extended the link building code to use #:~:text= to cause the linked-to page to scroll to the text. Not perfect, as it will only scroll to the first instance, which may not always be correct, but likely 80~90% of the time it will be.
from urllib.parse import quote
def linkcode_resolve(domain, info):
# print(f"domain={domain}, info={info}")
if domain != 'py':
return None
if not info['module']:
return None
filename = quote(info['module'].replace('.', '/'))
if not filename.startswith("tests"):
filename = "src/" + filename
if "fullname" in info:
anchor = info["fullname"]
anchor = "#:~:text=" + quote(anchor.split(".")[-1])
else:
anchor = ""
# github
result = "https://<github>/<user>/<repo>/blob/master/%s.py%s" % (filename, anchor)
# print(result)
return result
While building a blog using django I realized that it would be extremely practical to store the text of an article and all the related informations (title, author, etc...) together in a human-readable file format, and then charge those files on the database using a simple script.
Now that said, YAML caught my attention for his readability and ease of use, the only downside of the YAML syntax is the indentation:
---
title: Title of the article
author: Somebody
# Other stuffs here ...
text:|
This is the text of the article. I can write whatever I want
but I need to be careful with the indentation...and this is a
bit boring.
---
I believe that's not the best solution (especially if the files are going to be written by casual users). A format like this one could be much better
---
title: Title of the article
author: Somebody
# Other stuffs here ...
---
Here there is the text of the article, it is not valid YAML but
just plain text. Here I could put **Markdown** or <html>...or whatever
I want...
Is there any solution? Preferably using python.
Other file formats propositions are welcome as well!
Unfortunately this is not possible, what one would think could work is using | for a single scalar in the separate document:
import ruamel.yaml
yaml_str = """\
title: Title of the article
author: Somebody
---
|
Here there is the text of the article, it is not valid YAML but
just plain text. Here I could put **Markdown** or <html>...or whatever
I want...
"""
for d in ruamel.yaml.load_all(yaml_str):
print(d)
print('-----')
but it doesn't because | is the block indentation indicator. And although at the top level an indentation of 0 (zero) would easily work, ruamel.yaml (and PyYAML) don't allow this.
It is however easy to parse this yourself, which has the advantage over using the front matter package that you can use YAML 1.2 and are not restricted to using YAML 1.1 because of frontmaker using the PyYAML. Also note that I used the more appropriate end of document marker ... to separate YAML from the markdown:
import ruamel.yaml
combined_str = """\
title: Title of the article
author: Somebody
...
Here there is the text of the article, it is not valid YAML but
just plain text. Here I could put **Markdown** or <html>...or whatever
I want...
"""
with open('test.yaml', 'w') as fp:
fp.write(combined_str)
data = None
lines = []
yaml_str = ""
with open('test.yaml') as fp:
for line in fp:
if data is not None:
lines.append(line)
continue
if line == '...\n':
data = ruamel.yaml.round_trip_load(yaml_str)
continue
yaml_str += line
print(data['author'])
print(lines[2])
which gives:
Somebody
I want...
(the round_trip_load allows dumping with preservation of comments, anchor names etc).
I found Front Matter does exactly what I want to do.
There is also a python package.
I have a file of a few hundred megabytes containing strings:
str1 x1 x2\n
str2 xx1 xx2\n
str3 xxx1 xxx2\n
str4 xxxx1 xxxx2\n
str5 xxxxx1 xxxxx2
where x1 and x2 are some numbers. How big the numbers x(...x)1 and x(...x)2 are is unknown.
Each line has in "\n" in it. I have a list of strings str2 and str4.
I want to find the corresponding numbers for those strings.
What I'm doing is pretty straightforward (and, probably, not efficient performance-wise):
source_str = read_from_file() # source_str contains all file content of a few hundred Megabyte
str_to_find = [str2, str4]
res = []
str_to_find.each do |x|
index = source_str.index(x)
if index
a = source_str[index .. index + x.length] # a contains "str2"
#?? how do I "select" xx1 and xx2 ??
# and finally...
# res << num1
# res << num2
end
end
Note that I can't apply source_str.split("\n") due to the error ArgumentError: invalid byte sequence in UTF-8 and I can't fix it by changing a file in any way. The file can't be changed.
You want to avoid reading a hundred of megabytes into memory, as well as scanning them repeatedly. This has the potential of taking forever, while clogging the machine's available memory.
Try to re-frame the problem, so you can treat the large input file as a stream, so instead of asking for each string you want to find "does it exist in my file?", try asking for each line in the file "does it contain a string I am looking for?".
str_to_find = [str2, str4]
numbers = []
File.foreach('foo.txt') do |li|
columns = li.split
numbers += columns[2] if str_to_find.include?(columns.shift)
end
Also, read again #theTinMan's answer regarding the file encoding - what he is suggesting is that you may be able fine-tune the reading of the file to avoid the error, without changing the file itself.
If you have a very large number of items in str_to_find, I'd suggest that you use a Set instead of an Array for better performance:
str_to_find = [str1, str2, ... str5000].to_set
If you want to find a line in a text file, which it sounds like you are reading, then read the file line-by-line.
The IO class has the foreach method, which makes it easy to read a file line-by-line, which also makes it possible to easily locate lines that contain the particular string you want to find.
If you had your source input file saved as "foo.txt", you could read it using something like:
str2 = 'some value'
str4 = 'some other value'
numbers = []
File.foreach('foo.txt') do |li|
numbers << li.split[2] if li[str2] || li[str2]
end
At the end of the loop numbers should contain the numbers you want.
You say you're getting an encoding error, but you don't give us any clue what the characters are that are causing it. Without that information we can't really help you fix that problem except to say you need to tell Ruby what the file encoding is. You can do that when the file is opened; You'd properly set the open_args to whatever the encoding should be. Odds are good it should be an encoding of ISO-8859-1 or Win-1252 since those are very common with Windows machines.
I have to find a list of values, iterating through each line doesn't seem sensible because I'd have to iterate for each value over and over again.
We can only work with the examples you give us. Since that wasn't clearly explained in your question you got an answer based on what was initially said.
Ruby's Regexp has the tools necessary to make this work, but to do it correctly requires taking advantage of Perl's Regexp::Assemble library, since Ruby has nothing close to it. See "Is there an efficient way to perform hundreds of text substitutions in ruby?" for more information.
Note that this will allow you to scan through a huge string in memory, however that is still not a good way to process what you are talking about. I'd use a database instead, which are designed for this sort of task.
I am creating a grammar corrector app. You input slang and it returns a formal English correction. All the slang words that are supported are kept inside arrays. I created a method that looks like this for when a slang is entered that is not supported.
def addtodic(lingo)
print"\nCorrection not supported. Please type a synonym to add '#{lingo}' the dictionary: "
syn = gets.chomp
if $hello.include?("#{syn}")
$hello.unshift(lingo)
puts"\nCorrection: Hello.\n"
elsif $howru.include?("#{syn}")
$howru.unshift(lingo)
puts"\nCorrection: Hello. How are you?\n"
end
end
This works, but only until the application is closed. how can I make this persist so that it amends the source code as well? If I cannot, how would I go about creating an external file that holds all of the cases and referencing that in my source code?
You will want to load and store your arrays in a external file.
How to store arrays in a file in ruby? is relevant to what you are trying to do.
Short example
Suppose you have a file that has one slang phrase per line
% cat hello.txt
hi
hey
yo dawg
The following script will read the file into an array, add a term, then write the array to a file again.
# Read the file ($/ is record separator)
$hello = File.read('hello.txt').split $/
# Add a term
$hello.unshift 'hallo'
# Write file back to original location
open('hello.txt', 'w') { |f| f.puts $hello.join $/ }
The file will now contain an extra line with the term you just added.
% cat hello.txt
hallo
hi
hey
yo dawg
This is just one simple way of storing an array to file. Check the link at the beginning of this answer for other ways (which will work better for less trivial examples).