Asciidoc: How do I preserve both spacing and formatting - asciidoc

I need to produce the following output in asciidoc format, exactly as shown, including the spacing (it's VERY important that the columns line up), the font colouring and formatting and highlighting, and keep the asterisks at the start of the line.
As a code block, I can get the spacing OK, keep all the special characters, but loose all the formatting. And the formatting is as important as the text, if not here precisely, but in many other places. No sense in having copy referring to the highlighted element in a block of text that is not capable of showing highlighting.
aciextrtr# show ip bgp vrf BGP-FXX
...<output omitted>...
Network Next Hop Metric LocPrf Weight Path
*>l1.F.0.XX/32 0.0.0.0 100 32768 i
*>l10.FXX.3.0/24 0.0.0.0 100 32768 i
*>l10.FXX.20.0/24 0.0.0.0 100 32768 i
Curiously, the BEST I've been able to discover is the [verse] block , but it has ridiculously wide spacing between lines, and I had to escape a few special characters (* and the ellipses).
As a kludge, perhaps there is a way of re-defining [verse] so that it doesn't leave huge gaps between lines, but hopefully there is a way defining my own model based on verse that does what I want.
Curiously, Visual Studio code renders it more accurately than https://asciidoclive.com/edit/scratch/1 - I'd paste the source here but of course it won't render properly in this environment (strips back-ticks etc)
TIA

You can use a literal block to specify a monospace font and keep the lines together. Use the subs attribute to re-enable inline formatting.
For example:
[subs="+quotes"]
----
... <ellipses are not substituted> ...
*>l1.[.red]##F##.0.XX/32 #0.0.0.0# 100 32768
----

Related

How can I refactor an existing source code file to normalize all use of tab?

Sometimes I find myself editing a C source file which sees both use of tab as four spaces, and regular tab.
Is there any tool that attempts to parse the file and "normalize" this, i.e. convert all occurrences of four spaces to regular tab, or all occurrences of tab to four spaces, to keep it consistent?
I assume something like this can be done even with just a simple vim one-liner?
There's :retab and :retab! which can help, but there are caveats.
It's easier if you're using spaces for indentation, then just set 'expandtab' and execute :retab, then all your tabs will be converted to spaces at the appropriate tab stops (which default to 8.) That's easy and there are no traps in this method!
If you want to use 4 space indentation, then keep 'expandtab' enabled and set 'softtabstop' to 4. (Avoid modifying the 'tabstop' option, it should always stay at 8.)
If you want to do the inverse and convert to tabs instead, you could set 'noexpandtab' and then use :retab! (which will also look at sequences of spaces and try to convert them back to tabs.) The main problem with this approach is that it won't just consider indentation for conversion, but also sequences of spaces in the middle of lines, which can cause the operation to affect strings inside your code, which would be highly undesirable.
Perhaps a better approach for replacing spaces with tabs for indentation is to use the following substitute command:
:%s#^\s\+#\=repeat("\t", indent('.') / &tabstop).repeat(" ", indent('.') % &tabstop)#
Yeah it's a mouthful... It's matching whitespace at the beginning of the lines, then using the indent() function to find the total indentation (that function calculates indentation taking tab stops in consideration), then dividing that by the 'tabstop' to decide how many tabs and how many spaces a specific line needs.
If this command works for you, you might want to consider adding a mapping or :command for it, to keep it handy. For example:
command! -range=% Retab <line1>,<line2>s#^\s\+#\=repeat("\t", indent('.') / &tabstop).repeat(" ", indent('.') % &tabstop)
This also allows you to "Retab" a range of the file, including one you select with a visual selection.
Finally, one last alternative to :retab is that to ask Vim to "reformat" your code completely, using the = command, which will use the current 'indentexpr' or other indentation configurations such as 'cindent' to completely reindent the block. That typically respects your 'noexpandtab' and 'smarttabstop' options, so it use tabs and spaces for indentation consistently. The downside of this approach is that it will completely reformat your code, including changing indentation in places. The upside is that it typically has a semantic understanding of the language and will be able to take that in consideration when reindenting the code block.

GS1-128 barcode with ZPL does not put the AI in ()

i was expecting this command
^FO15,240^BY3,2:1^BCN,100,Y,N,Y,^FD>:>842011118888^FS
to generate a
(420) 11118888
interpretation line, instead it generates
~n42011118888
anyone have idea how to generate the expected output?
TIA!
Joey
If the firmware is up to date, D mode can be used.
^BCo,h,f,g,e,m
^XA
^FO15,240
^BY3,2:1
^BCN,100,Y,N,Y,D
^FD(420)11118888^FS
^XZ
D = UCC/EAN Mode (x.11.x and newer firmware)
This allows dealing with UCC/EAN with and without chained
application identifiers. The code starts in the appropriate subset
followed by FNC1 to indicate a UCC/EAN 128 bar code. The printer
automatically strips out parentheses and spaces for encoding, but
prints them in the human-readable section. The printer automatically
determines if a check digit is required, calculate it, and print it.
Automatically sizes the human readable.
The ^BC command's "interpretation line" feature does not support auto-insertion of the parentheses. (I think it's safe to assume this is partly because it has no way of determining what your data identifier is by just looking at the data provided - it could be 420, could be 4, could be any other portion of the data starting from the first character.)
My recommendation is that you create a separate text field which handles the logic for the parentheses, and place it just above or below the barcode itself. This is the way I've always approached these in the past - I prefer this method because I have direct control over the font, font size, and formatting of the interpretation line.

Why do xterm's docs call ' ' a control character?

I'm writing a parser for ANSI escape codes using xterm's docs as a guideline. Under the list of single character functions, they include:
SP Space.
Now, for most of the single character functions, I understand the purpose: BEL, for example, is going to require some special help from your terminal emulator to process, and TAB is likely to be involved in autocompletion rather than being printed as a normal character.
I can't imagine any situation where SP would need to be treated as anything other than a literal space character, so I'm considering dropping the SP control code from my parser. Would I risk anything by doing so? Is there a use for SP in the console that I'm not aware of?
Space isn't a "control" character. In ASCII, the control characters are codes 0 to 31 (space is 32), and 127 (DEL). The POSIX locale uses the same data, not coincidentally.
They are called control characters, because they allow the host (computer) to control (tell) the terminal to perform functions rather than simply print text:
A space is actually "printing" in this regard because (like all of the other ASCII characters), it advances the carriage position by one column. In the C language of course, a space is treated as non-graphic, which is a different shade of meaning. "Graphic" characters are visible.
In contrast, a TAB requires the terminal to do something special: move the carriage position by an amount that depends on where it happens to be at the moment.
"Carriage position" of course refers to printing terminals (such as those on which Unix was originally developed), or typewriters. The "carriage" (noun) is the mechanism which moved left/right to allow the terminal (or typewriter) to print at different positions along the line. "Carriage controls" in turn refer to the control characters which move the carriage left and right (other than as a side-effect of printing individual characters). It's obvious if you have ever used a typewriter...
In XTerm Control Sequences, SP is shown for clarity (to be able to reuse that name in other places, e.g., where a 32 is actually part of a control sequence). That wording was added in patch #25 to support the description of the group of controls S7C1T, S8C1T, and DECSCL — setting ANSI conformance level, none of which fall within ECMA-48.
A quick check shows 8 control sequences containing a space (which happens to be a valid intermediate byte, per ECMA-48, just like semicolon, which is visually distinct and does not require a name in the control sequences descriptions — you might find the PDF clearer than the HTML). None of those sequences are used in the obscure sense referred to in ECMA-48:
ECMA 48 section 6.1.1 is talking about overstriking one character on another to render a mixture of the two. This is very rare in video terminals, but assumed in most printing devices. The closest to this in a terminfo description might be ul (underline character overstrikes), and reviewing the few possibilities, some of those appear to be incorrect. xterm doesn't do that.
ECMA 48 section 8.3.140 in its comment about "character escapement" is referring to proportional fonts or variable-width character pitch (again, very rare in video terminals, but implemented in some printing devices). There are a few terminfo capabilities referring to pitch, and all of those are marked as "printer support". ncurses has one entry (att5310) using the cpi capability.
So: if you are referring to xterm's documentation, it is unlikely that you intend your parser for any other use than for video terminals. But if you intend it to be more general, then reading about printers would be a good way to improve your application.
ECMA 48 sheds some light on this.
tl;dr:
Some terminals may choose to differentiate between erased characters and space characters.
In terminals with variable width fonts, SP can be considered a control character that introduces a configurable amount of horizontal spacing.
Neither is really relevant today, so you're entirely free to just treat as just another character.
ECMA 48 section 6.1.1:
Depending on the implementation, there may or may not be a distinction between a character position in
the erased state and a character position imaging SPACE
ECMA 48 section 8.3.140:
SSW is used to establish for subsequent text the character escapement associated with the character
SPACE. The established escapement remains in effect until the next occurrence of SSW in the data
stream or until it is reset to the default value by a subsequent occurrence of CARRIAGE RETURN/LINE
FEED (CR/LF), CARRIAGE RETURN/FORM FEED (CR/FF), or of NEXT LINE (NEL) in the data
stream, see annex C.

Terminal overwriting same line when too long

In my terminal, when I'm typing over the end of a line, rather than start a new line, my new characters overwrite the beginning of the same line.
I have seen many StackOverflow questions on this topic, but none of them have helped me. Most have something to do with improperly bracketed colors, but as far as I can tell, my PS1 looks fine.
Here it is below, generated using bash -x:
PS1='\[\033[01;32m\]\w \[\033[1;36m\]☔︎ \[\033[00m\] '
Yes, that is in fact an umbrella with rain; I have my Bash prompt update with the weather using a script I wrote.
EDIT:
My BashWeather script actually can put any one of a few weather characters, so it would be great if we could solve for all of these, or come up with some other solution:
☂☃☽☀︎☔︎
If the umbrella with rain is particularly problematic, I can change that to the regular umbrella without issue.
The symbol being printed ☔︎ consists of two Unicode codepoints: U+2614 (UMBRELLA WITH RAIN DROPS) and U+FE0E (VARIATION SELECTOR-15). The second of these is a zero-length qualifier, which is intended to enforce "text style", as opposed to "emoji style", on the preceding symbol. If you're viewing this with a font can distinguish the two styles, the following might be the emoji version: ☔︉ Otherwise, you can see a table of text and emoji variants in Working Group document N4182 (the umbrella is near the top of page 3).
In theory, U+FE0E should be recognized as a zero-length codepoint, like any other combining character. However, it will not hurt to surround the variant selector in PS1 with the "non-printing" escape sequence \[…\].
It's a bit awkward to paste an isolated variant selector directly into a file, so I'd recommend using bash's unicode-escape feature:
WEATHERCHAR=$'\u2614\[\ufe0e\]'
#...
PS1=...${WEATHERCHAR}...
Note that \[ and \] are interpreted before parameter expansion, so WEATHERCHAR as defined above cannot be dynamically inserted into the prompt. An alternative would be to make the dynamically-inserted character just the $'\u2614' umbrella (or whatever), and insert the $'\[\ufe0e\]' in the prompt template along with the terminal color codes, etc.
Of course, it is entirely possible that the variant indicator isn't needed at all. It certainly makes no useful difference on my Ubuntu system, where the terminal font I use (Deja Vu Sans Mono) renders both variants with a box around the umbrella, which is simply distracting, while the fonts used in my browser seem to render the umbrella identically with and without variants. But YMMV.
This almost works for me, so should probably not be considered a complete solution. This is a stripped down prompt that consists of only an umbrella and a space:
PS1='\342\230\[\224\357\270\] '
I use the octal escapes for the UTF-8 encoding of the umbrella character, putting the last three bytes inside \[...\] so that bash doesn't think they take up space on the screen. I initially put the last four bytes in, but at least in my terminal, there is a display error where the umbrella is followed by an extra character (the question-mark-in-a-diamond glyph for missing characters), so the umbrella really does occupy two spaces.
This could be an issue with bash and 5-byte UTF-8 sequences; using a character with a 4-byte UTF-encoding poses no problem:
# U+10400 DESERET CAPITAL LETTER LONG I
# (looks like a lowercase delta)
PS1='\360\220\220\200 '

Least used delimiter character in normal text < ASCII 128

For coding reasons which would horrify you (I'm too embarrassed to say), I need to store a number of text items in a single string.
I will delimit them using a character.
Which character is best to use for this, i.e. which character is the least likely to appear in the text? Must be printable and probably less than 128 in ASCII to avoid locale issues.
I would choose "Unit Separator" ASCII code "US": ASCII 31 (0x1F)
In the old, old days, most things were done serially, without random access. This meant that a few control codes were embedded into ASCII.
ASCII 28 (0x1C) File Separator - Used to indicate separation between files on a data input stream.
ASCII 29 (0x1D) Group Separator - Used to indicate separation between tables on a data input stream (called groups back then).
ASCII 30 (0x1E) Record Separator - Used to indicate separation between records within a table (within a group). These roughly map to a tuple in modern nomenclature.
ASCII 31 (0x1F) Unit Separator - Used to indicate separation between units within a record. The roughly map to fields in modern nomenclature.
Unit Separator is in ASCII, and there is Unicode support for displaying it (typically a "us" in the same glyph) but many fonts don't display it.
If you must display it, I would recommend displaying it in-application, after it was parsed into fields.
Assuming for some embarrassing reason you can't use CSV I'd say go with the data. Take some sample data, and do a simple character count for each value 0-127. Choose one of the ones which doesn't occur. If there is too much choice get a bigger data set. It won't take much time to write, and you'll get the answer best for you.
The answer will be different for different problem domains, so | (pipe) is common in shell scripts, ^ is common in math formulae, and the same is probably true for most other characters.
I personally think I'd go for | (pipe) if given a choice but going with real data is safest.
And whatever you do, make sure you've worked out an escaping scheme!
When using different languages, this symbol: ¬
proved to be the best. However I'm still testing.
Probably | or ^ or ~ you could also combine two characters
You said "printable", but that can include characters such as a tab (0x09) or form feed (0x0c). I almost always choose tabs rather than commas for delimited files, since commas can sometimes appear in text.
(Interestingly enough the ascii table has characters GS (0x1D), RS (0x1E), and US (0x1F) for group, record, and unit separators, whatever those are/were.)
If by "printable" you mean a character that a user could recognize and easily type in, I would go for the pipe | symbol first, with a few other weird characters (# or ~ or ^ or \, or backtick which I can't seem to enter here) as a possibility. These characters +=!$%&*()-'":;<>,.?/ seem like they would be more likely to occur in user input. As for underscore _ and hash # and the brackets {}[] I don't know.
How about you use a CSV style format? Characters can be escaped in a standard CSV format, and there's already a lot of parsers already written.
Can you use a pipe symbol? That's usually the next most common delimiter after comma or tab delimited strings. It's unlikely most text would contain a pipe, and ord('|') returns 124 for me, so that seems to fit your requirements.
For fast escaping I use stuff like this:
say you want to concatinate str1, str2 and str3
what I do is:
delimitedStr=str1.Replace("#","#a").Replace("|","#p")+"|"+str2.Replace("#","#a").Replace("|","#p")+"|"+str3.Replace("#","#a").Replace("|","#p");
then to retrieve original use:
splitStr=delimitedStr.Split("|".ToCharArray());
str1=splitStr[0].Replace("#p","|").Replace("#a","#");
str2=splitStr[1].Replace("#p","|").Replace("#a","#");
str3=splitStr[2].Replace("#p","|").Replace("#a","#");
note: the order of the replace is important
its unbreakable and easy to implement
Pipe for the win! |
We use ascii 0x7f which is pseudo-printable and hardly ever comes up in regular usage.
Well it's going to depend on the nature of your text to some extent but a vertical bar 0x7C doesn't crop up in text very often.
I don't think I've ever seen an ampersand followed by a comma in natural text, but you can check the file first to see if it contains the delimiter, and if so, use an alternative. If you want to always be able to know that the delimiter you use will not cause a conflict, then do a loop checking the file for the delimiter you want, and if it exists, then double the string until the file no longer has a match. It doesn't matter if there are similar strings because your program will only look for exact delimiter matches.
This can be good or bad (usually bad) depending on the situation and language, but keep mind mind that you can always Base64 encode the whole thing. You then don't have to worry about escaping and unescaping various patterns on each side, and you can simply seperate and split strings based on a character which isn't used in your Base64 charset.
I have had to resort to this solution when faced with putting XML documents into XML properties/nodes. Properties can't have CDATA blocks in them at all, and nodes escaped as CDATA obviously cannot have further CDATA blocks inside that without breaking the structure.
CSV is probably a better idea for most situations, though.
Both pipe and caret are the obvious choices. I would note that if users are expected to type the entire response, caret is easier to find on any keyboard than is pipe.
I've used double pipe and double caret before. The idea of a non printable char works if your not hand creating or modifying the file. For quick random access file storage and retrieval field width is used. You don't even have to read the file.. your literally pulling from the file by reference. This is how databases do some storage.. but they also manage the spaces between records and such. And it introduced the problem of max data element width. (Index attach a header which is used to define the width of each element and it's data type in the original old days.. later they introduced compression with remapping chars. This allows for a text file to get about 1/8 the size in transmission.. variable length char encoding for the win
make it dynamic : )
announce your control characters in the file header
for example
delimiter: ~
escape: \
wrapline: $
width: 19
hello world~this i$
s \\just\\ a sampl$
e text~$someVar$~h$
ere is some \~\~ma$
rkdown strikethrou$
gh\~\~ text
would give the strings
hello world
this is \just\ a sample text
$someVar$
here is some ~~markdown strikethrough~~ text
i have implemented something similar:
a plaintar text container format,
to escape and wrap utf16 text in ascii,
as an alternative to mime multipart messages.
see https://github.com/milahu/live-diff-html-editor

Resources