Blank as last character in a monospace formatted text - asciidoc

Writing documentation, I have some small text examples to give, which should be formatted as monospace text using backticks. These text snippets always contain a blank as the last character, which must not be omitted, as it is significant. Unfortunately, Asciidoctor omits it.
For example, `: ` should be rendered as ": ", but instead it is rendered as ":".
So what does the markup have to look like to get a colon followed by a blank rendered as preformatted text?

There's the built-in attribute {nbsp} for non-breaking space in Asciidoctor, which can be used for a blank.
So `:{nbsp}` is rendered as ": ".
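For example, a sentence in the documentation might be written like this in the AsciiDoc source (the wording here is just an illustration):
Type the prefix `:{nbsp}` before the value and press Enter.
Asciidoctor substitutes {nbsp} with a non-breaking space inside the monospace span, so the trailing blank survives in the rendered output.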

Related

Why does Sphinx+MyST put spaces inside HTML tags in the generated output?

I'm new to MyST and Sphinx, so perhaps this has an obvious explanation, but it is not obvious to me. Whenever I use any construct in my Markdown source, such as boldface or italics or even an HTML element such as <span>, the result in the formatted HTML output contains spaces inside the HTML tags. For example, an input such as
* **Barcode**: the barcode identifying the item
* **Title**: the title of the item.
* **Author**: the author of the item.
produces rendered output with an unwanted space before the ":". Inspecting the HTML reveals that the bold-faced element contains unexpected space characters inside the <strong>...</strong> tags. Why is this happening? What am I doing wrong? More importantly, how do I make it stop?
I'm using Sphinx 3.4.3 with myst-parser 0.13.3. My conf.py defines extensions as 'myst_parser', 'sphinx.ext.autodoc', 'sphinx.ext.autosectionlabel', and 'sphinx.ext.napoleon'.

How do I link and embed a UTF8 encoded text file in an MS-Word document?

I would like to include the contents of a UTF-8 text file in an MS Word document as a link. This works for an ANSI-encoded file using the field:
{INCLUDETEXT "path\file.txt" \c ansitext \* MERGEFORMAT}
Is there a directive akin to \c ansitext for UTF-8 files? \c utf8 and \c utf8text do not appear to work.
If I do not give any directive, Word recognizes that the file is UTF-8, but a dialog pops up requiring me to confirm this each time the file needs updating, which I want to avoid.
There is a directive ( \c Unicode ) but unfortunately using it does not actually eliminate the character encoding pop-up, even when the Unicode text starts with a BOM (Byte Order Mark), which is in any case discouraged by the Unicode standard.
So although that answers the question actually asked, it doesn't solve the problem. Nor, according to the discussion in comments to the Question, would any of the following solve the problem for the OP, but they might help others.
According to the ISO 29500 standard that describes .docx documents, INCLUDETEXT is supposed to have an \e switch that lets you specify an encoding. But, according to Microsoft's standard document [MS-OI29500].pdf, Word ignores any \e switch.
As far as I am aware the only way to avoid that pop-up when the included text is in Unicode format (UTF-8) is to set a value in the Windows Registry that tells Word the default encoding for text files.
The problem is that this setting affects how Word handles all text files it opens, whether through the File Open dialog or an INCLUDETEXT field.
To create the setting, you need to navigate to the following Registry location, e.g. for Word 2016/2019 it would be
HKEY_CURRENT_USER\Software\Microsoft\Office\16.0\Word\Options
and for Word 2010 it would be
HKEY_CURRENT_USER\Software\Microsoft\Office\14.0\Word\Options
Then add a DWORD value called DefaultCPG and set its value to the code page you want to be the default. For UTF-8, that's decimal 65001.
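If you prefer the command line, the same value can be created with reg.exe (a sketch assuming Word 2016/2019; adjust the 16.0 segment for other versions):
reg add "HKCU\Software\Microsoft\Office\16.0\Word\Options" /v DefaultCPG /t REG_DWORD /d 65001 /f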
If you have control over the format of the file to be included, you could consider using a format that wouldn't trigger the encoding pop-up. That leads to another set of problems: e.g. if you used HTML, you would probably have to deal with HTML special characters such as &, with whitespace, and with RTL characters (which Word seems to reverse). But the following HTML "framework" is enough to insert a text chunk without additional paragraph marks and so on:
<html>
<meta charset="UTF-8">
<body>
<a name="x">your text</a>
</body>
</html>
In the INCLUDETEXT field, you then use the "x" to indicate the subset you want to include, e.g.
{INCLUDETEXT "path\file.htm" x \c HTML}
The HTML coding <a name="something"> is deprecated in HTML 5, but Word only understands the earlier HTML convention.

Are there cases of editing HTML output by Aspose.Word with CKEditor?

I am having a problem where text edited in CKEditor is not output to Word because it has inherited the "-aw-import:ignore" attribute.
A tag with this attribute conveys information from the original Word document when converting from HTML back to Word and, like a meta tag, is not itself output to Word.
If text entered in CKEditor inherits this attribute, it is mistakenly not output to Word.
Aspose.Words writes this "-aw-import:ignore" only when it needs to make certain elements visible in HTML that would otherwise be collapsed and hidden by web browsers e.g. empty paragraphs, space sequences, etc.
Currently we mark only the following elements with “-aw-import:ignore”:
Sequences of spaces and non-breaking spaces that are used to simulate padding on native list item (<li>) elements.
Non-breaking spaces that are used to prevent empty paragraphs from collapsing.
However, note that this list is not fixed and we may add more cases to it in the future.
Also, please note that Aspose.Words writes &#xa0; instead of &nbsp;, because &nbsp; is not defined in XML. And by default Aspose.Words generates XHTML documents (i.e. HTML documents that comply with XML rules).
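For example, an empty paragraph typically comes out of the HTML export looking roughly like this (an illustrative sketch, not verbatim Aspose.Words output):
<p style="margin:0pt"><span style="-aw-import:ignore">&#xa0;</span></p>
If CKEditor then copies that span's style onto text typed into the paragraph, the re-import treats the new text as import-only markup and drops it, which is the behaviour described in the question.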
I work with Aspose as Developer Evangelist.
Please find below the list of custom styles that Aspose.Words uses to save extra information in the output HTML; usually this information is used for the Aspose.Words-HTML-Aspose.Words round-trip. We will add a description of these entities to the documentation as soon as possible.
-aw-comment-author
-aw-comment-datetime
-aw-comment-initial
-aw-comment-start
-aw-comment-end
-aw-footnote-type
-aw-footnote-numberstyle
-aw-footnote-startnumber
-aw-footnote-isauto
-aw-headerfooter-type
-aw-bookmark-start
-aw-bookmark-end
-aw-different-first-page
-aw-tabstop-align
-aw-tabstop-pos
-aw-tabstop-leader
-aw-field-code
-aw-wrap-type
-aw-left-pos
-aw-top-pos
-aw-rel-hpos
-aw-rel-vpos
-aw-revision-author
-aw-revision-datetime

Ruby/Regex: Dealing with strings containing forward slashes and parentheses using gsub and regex

Hi, I am using Watir to click through some links. I go to a page, click a link based on its text, and then do it again, clicking a new link. I am locating the links by their text (it is the only way I can, given their HTML) and need to match the text I pulled from the page to the link. The text that I get contains some extra text that is not part of the link, so I need to gsub it out. Here is my issue:
String: text = "Nuclear Launch Codes (Levels One/Two)"
Link: Nuclear Launch Codes (Levels One/Two) Blah Blah Blah
Because the links do not always have the exact text, I need to locate them like so: /#{text}/
The problem is that this returns "Nuclear Launch Codes (Levels One\/Two)"
I thought I would gsub out the first parenthesis and everything after it, but I need to keep that part, because I can also have Nuclear Launch Codes (Levels Four/Five)
Is there any way to modify the string so it matches the link while ignoring the rest of the link text?
If I understand you correctly, try:
/#{Regexp.escape(text)}/
Or equivalently, if you prefer:
Regexp.new(Regexp.escape(text))
This will automatically escape parentheses, slashes and so on in the text so they are not treated as special regexp characters.
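A minimal sketch of the idea (the Watir call at the end is commented out and assumes an already-open Watir::Browser named browser):
text = "Nuclear Launch Codes (Levels One/Two)"
pattern = /#{Regexp.escape(text)}/
# Parentheses and the slash are now literal characters, so the pattern still
# matches a link whose text merely starts with the original string:
"Nuclear Launch Codes (Levels One/Two) Blah Blah Blah" =~ pattern  # => 0
# browser.link(text: pattern).click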

How to remove link's text from html table

I want to extract the plain text from an HTML table without the text of the hyperlinks (footnote markers) inside the cells.
However, when I get the text with cell.text, it also includes the unwanted hyperlink text:
"\n central tendency1 \n "
I expected to get
"central tendency"
So I tried cell.text.strip.downcase.gsub!(/\d/, ""),
but that gsub also removes digits that belong to other content I need to keep.
Is there any way to grab the text from the HTML, excluding the hyperlink text?
Here is the link to the HTML I need to parse.
You can remove all the links before converting to text with nokogiri:
table = doc.css(".page table")[0]
table.css("a").each(&:remove)
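A self-contained sketch of the whole flow (the markup and document below are invented for illustration; the ".page table" selector is from the snippet above):
require 'nokogiri'
html = '<div class="page"><table><tr><td>central tendency<a href="#fn1">1</a></td></tr></table></div>'
doc = Nokogiri::HTML(html)
table = doc.css(".page table")[0]
table.css("a").each(&:remove)              # drop the footnote-marker links
table.css("td").first.text.strip.downcase  # => "central tendency"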
Edit: Alternatively, you can use a regexp that removes a digit only at the end of the string and only if it is preceded by a word character; that seems to work in this specific case but cannot be relied upon to work in similar cases:
cell.text.strip.downcase.gsub(/(?<=\w)\d$/, "")
