Using asciidoc, is it possible to tag a single word as source code instead of an entire line? - documentation-generation

I am currently writing a document describing the project I am working on, using asciidoc plugin for Intellij. Basically, I know that this works :
[source, java]
int i = 0;
Output will be :
int i = 0;
with the grey rectangle, as expected, to specify it's a sample code.
But can you, like StackOverlow does, apply this tag to a single word, without newline needed?
Basically, can you have the grey rectangle on a single word without newline needed, like in the following example :
"This value will be stored in Myobject" ?
Do not hesitate if it is not clear enough, this question is kind of hard to ask.

Inline code highlighting is possible by enclosing your text / words in single or double backticks:
This value will be stored in `Myobject`.
This value will be stored in ``Myobject``.
Result:
This value will be stored in Myobject.
See also the docs: https://asciidoctor.org/docs/asciidoc-syntax-quick-reference/#source-code

Related

Should arguments to a custom directive be escaped?

I have created a custom directive for a documentation project of mine which is built using Sphinx and reStructuredText. The directive is used like this:
.. xpath-try:: //xpath[#expression="here"]
This will render the XPath expression as a simple code block, but with the addition of a link that the user can click to execute the expression against a sample XML document and view the matches (example link, example rendered page).
My directive specifies that it does not have content, takes one mandatory argument (the xpath expression) and recognises a couple of options:
class XPathTryDirective(Directive):
has_content = False
required_arguments = 1
optional_arguments = 0
final_argument_whitespace = True
option_spec = {
'filename': directives.unchanged,
'ns_args': directives.unchanged,
}
def run(self):
xpath_expr = self.arguments[0]
node = xpath_try(xpath_expr, xpath_expr)
...
return [node]
Everything seems to be working exactly as intended except that if the XPath expression contains a * then the syntax highlighting in my editor (gVim) gets really messed up. If I escape the * with a backslash, then that makes my editor happy, but the backslash comes through in the output.
My questions are:
Are special characters in an argument to a directive supposed to be escaped?
If so, does the directive API provide a way to get the unescaped version?
Or is it working fine and the only problem is my editor is failing to highlight things correctly?
It may seem like a minor concern but as I'm a novice at rst, I find the highlighting to be very helpful.
Are special characters in an argument to a directive supposed to be escaped?
No, I think that there is no additional processing performed on arguments of rst directives. Which matches your observation: whatever you specify as an argument of the directive, you are able to get directly via self.arguments[0].
Or is it working fine and the only problem is my editor is failing to highlight things correctly?
Yes, this seems to be the case. Character * is used for emphasis/italics in rst and it gets more attention during syntax highlighting for some reason.
This means that the solution here would be to tweak or fix vim syntax file for restructuredtext.

Detect cursor is over comment or string in Scintilla NET

Is there any built in function in Scintilla.NET to detect the cursor is over a comment or string? I'd want to avoid the autocompletion to work when the user is typing comments or strings.
I'm aware I can scan the whole text backwards, searching for //, /* */ and pairs of " " but I'm almost sure there must be a built-in function to do that.
Thanks!
If you're using a lexer, you can get the style number at the current caret postion and check to see if it corresponds with a string or comment. The Scintilla API for retrieving the style number is:
SCI_GETSTYLEAT(int pos)
The Scintilla.NET documentation states there are already some convenience APIs for detecting comments:
ScintillaNET.Scintilla.PositionIsOnComment(System.Int32)
ScintillaNET.Scintilla.PositionIsOnComment(System.Int32,ScintillaNET.Lexer)
But there does not seem to be anything equivalent for strings - so it looks like you'll have to roll your own by using the above Scintilla message with one of the ScintillaNET.Scintilla.SendMessageDirect() methods.

Render non english characters in asciidoctor-pdf

I am trying to write documentation with asciidoctor-pdf and I need to use characters like : ă,â,î,ş,ţ. The pdf output is rendered but the mentioned characters are rendered empty. I am not sure how to handle the issue.
For example:
I wrote this code:
= Document Title
Doc Writer <doc#example.com>
:doctype: book
:source-highlighter: coderay
:listing-caption: Listing
// Uncomment next line to set page size (default is Letter)
//:pdf-page-size: A4
A simple http://asciidoc.org[AsciiDoc] document.
== Introducţie
A paragraph followed by a simple list with square bullets.
And the result was the word Introducţie rendered as Introduc ie and finally the error:
/usr/local/rvm/gems/ruby-2.2.2/gems/pdf-core-0.2.5/lib/pdf/core/pdf_object.rb:55: warning: regexp match /.../n against to UTF-8 string
Can be a system encoding configuration problem?
Do I need to set different encoding configuration in ruby?
Thank you.
I think that if you want to be sure, you can always use the decimal entity references form. For the latin small Letter T with cedilla it is: ţ
Check this table for the complete list:
List of Unicode characters
In addition, if you want to use this special char in a title, there was an issue with it:
Section id with characters outside of Windows-1252 encoding causes warning
It seems to be fixed now, but I did not verify it.
One of possible ways to write such special characters in titles is to declare them in preamble of your asciidoc document, for example,
:t-cedil: ţ
and to call it in the main text
== pass:normal[Test-{t-cedil}]
So your title will look like
Test-ţ

Preserving whitespace / line breaks with REXML

I'm using Ruby 1.9.3 and REXML to parse an XML document, make a few changes (additions/subtractions), then re-output the file. Within this file is a block that looks like this:
<someElement>
some.namespace.something1=somevalue1
some.namespace.something2=somevalue2
some.namespace.something3=somevalue3
</someElement>
The problem is that after re-writing the file, this block always ends up looking like this:
<someElement>
some.namespace.something1=somevalue1
some.namespace.something2=somevalue2 some.namespace.something3=somevalue3
</someElement>
The newline after the second value (but never the first!) has been lost and turned into a space. Later, some other code which I have no control or influence over will be reading this file and depending on those newlines to properly parse the content. Generally in this situation i'd use a CDATA to preserve the whitespace, but this isn't an option as the code that parses this data later is not expecting one - it's essential that the inner text of this element is preserved exactly as-is.
My read/write code looks like this:
xmlFile = File.open(myFile)
contents = xmlFile.read
xmlDoc = REXML::Document.new(contents, { :respect_whitespace => :all })
xmlFile.close
{perform some tasks}
out = ""
xmlDoc.write(out, 2)
File.open(filePath, "w"){|file| file.puts(out)}
I'm looking for a way to preserve the whitespace of text between elements when reading/writing a file in this manner using REXML. I've read a number of other questions here on stackoverflow on this subject, but none that quite replicate this scenario. Any ideas or suggestions are welcome.
I get correct behavior by removing the indent (second) parameter to Document.write():
#xmlDoc.write(out, 2)
xmlDoc.write(out)
That seems like a bug in Document.write() according to my reading of the docs, but if you don't really need to set the indentation, then leaving that off should solve yor problem.

Putting spaces back into a string of text with unreliable space information

I need to parse some text from pdfs but the pdf formatting results in extremely unreliable spacing. The result is that I have to ignore the spaces and have a continuous stream of non-space characters.
Any suggestions on how to parse the string and put spaces back into the string by guessing?
I'm using ruby. Or should I say I'musingruby?
Edit: I've pulled the text out using pdf-reader. Some of the pdf files are nicely formatted and some are not. An example of text mixed with positioning:
.7aspe-5.5cts-715.1o0.6f-708.5f-0.4aces-721.4that-716.3are-720.0i-1.8mportant-716.3in-713.9soc-5.5i-1.8alcommunica6.6tion6.3.-711.6Althoug6.3h-708.1m-1.9od6.3els-709.3o6.4f-702.8f5.4ace-707.9proc6.6essing-708.2haveproposed-611.2ways-615.5to-614.7deal-613.2with-613.0these-613.9diff10.4erent-613.7tasks,-611.9it-617.1remainsunclear-448.0how-450.7these-443.2mechanisms-451.7might-446.7be-447.7implemented-447.2in-450.3visualOne-418.9model-418.8of-417.3human-416.4face-421.9processing-417.5proposes-422.7that-419.8informa-tion-584.5is-578.0processed-586.1in-583.1specialised-584.7modules-577.0(Breen-584.4et-582.9al.,-582.32002;Bruce-382.1and-384.0Y92.0oung,-380.21986;-379.2Haxby-379.9et-380.5al.,-
and if I print just string data (I added returns at the end of each line to keep it from
messing up the layout here:
'Distinctrepresentationsforfacialidentityandchangeableaspectsoffacesinthehumantemporal
lobeTimothyJ.Andrews*andMichaelP.EwbankDepartmentofPsychology,WolfsonResearchInstitute,
UniversityofDurham,UKReceived23December2003;revised26March2004;accepted27July2004Availab
leonline14October2004Theneuralsystemunderlyingfaceperceptionmustrepresenttheunchanging
featuresofafacethatspecifyidentity,aswellasthechangeableaspectsofafacethatfacilitates
ocialcommunication.However,thewayinformationaboutfacesisrepresentedinthebrainremainsc
ontroversial.Inthisstudy,weusedfMRadaptation(thereductioninfMRIactivitythatfollowsthe
repeatedpresentationofidenticalimages)toaskhowdifferentface-andobject-selectiveregionsofvisualcortexcontributetospecificaspectsoffaceperception'
The data is spit out by callbacks so if I print each string as it is returned it looks like this:
'The
-571.3
neural
-573.7
system
-577.4
underly
13.9
ing
-577.2
face
-573.0
perc
13.7
eption
-574.9
must
-572.1
repr
20.8
esent
-577.0
the
unchangin
14.4
g
-538.5
featur
16.5
es
-529.5
of
-536.6
a
-531.4
face
'
On examination it looks like the true spaces are large negative numbers < -300 and the false spaces are much smaller positive numbers. Thanks guys. Just getting to the point where i am asking the question clearly helped me answer it!
Hmmmm... I'd have to say that guessing is never a good idea. Looking at the problem root cause and solving that is the answer, anything else is a kludge.
If the spacing is unreliable from the PDF, how is it unreliable? The PDF viewer needs to be able to reliably space the text so the data is there somewhere, you just need to find it.
EDIT following comment:
The idea of parsing the file using a dictionary (your only other option really, apart from randomly inserting spaces and hoping for the best) and inserting spaces at identified word boundaries (a real problem when dealing with punctuation, plurals that don't alter the base word i.e. plural, etc) would, I believe, be a much greater programming challenge than correctly parsing the PDF in the first place. After all, PDF is clearly defined whereas English is somewhat wooly.
Why not look down the route of existing solutions like ps2ascii in linux, call the function from your Ruby and pick up the result.
PDF doesn't only store spaces as space characters, but also uses layout commands for spacing (so it doesn't print a space, but moves the "pen" to the right). Perhaps you should have a look at the PDF reference (the big PDF on the bottom of the site), Chapter 9 "Text" should be what you're looking for.
EDIT: After reading your comment to Lazarus' answer, this doesn't seem to be what you're looking for. I think you should try to get a word list from somewhere and try to split your text using it. A good strategy would be to do that using recursion, because for example:
"meandyou"
The first word could be "me" or "mean", but if you try "mean", "dyou" doesn't make sense, so it will be "me", same for the next word that could be "a" or "an" or "and", only "and" makes sense.
If it were me I'd go back to the source PDFs and try a different method of extracting the text, such as iText (for Java) or maybe some kind of PDF-to-HTML to text conversion software method.

Resources