asciidoc: Is there a way to get around the problem of lines beginning with [colour] attributes ending with ] not displaying in asciidoc?


My target asciidoc text is this:
[red]#Some prompt[x]# Make sure the option is [checked]
But it won't display in asciidoc
On further investigation, I found that any line beginning with a [colour] in square brackets, and ending in a right-bracket is similarly not displayed.
Now, in this case, I've got around the problem by putting the whole prompt section in bold, like this:
*[red]#Some prompt[x]#* Make sure the option is [checked]
but this is not ideal. Adding a period after the final close bracket \] also AVOIDS the problem - but in my use case I didn't like it.
I'd like to know if there is a better way. So far I've tried:
Escaping the leading open bracket \[
Escaping the final close bracket \]
Removing the [x] in the middle, thinking the additional brackets in the middle may influence the outcome
but none of these has worked.
So my question is:
Is there a way to get around the problem of lines beginning with [colour] attributes ending with ] not displaying in asciidoc?

It seems to me that a line which begins with an opening bracket and ends with a closing bracket is being interpreted as a block attribute line.
There are a number of ways you can mitigate this.
Use a character replacement attribute. There are many built-in attributes, or you can easily define your own.
For example:
[.red]#Some prompt[x]# Make sure the option is [checked{endsb}
Use one of the inline pass-through syntaxes, for example ++:
[.red]#Some prompt[x]# Make sure the option is [checked++]++
Prevent the first opening bracket from being the first character of the line. This also uses a built-in attribute, and the markup needs to be changed to unconstrained.
For example:
{empty}[.red]##Some prompt[x]## Make sure the option is [checked]
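To check a document for lines at risk before rendering, the diagnosis above can be turned into a rough heuristic. The sketch below is in Python and is only an approximation: the pattern and the helper names (looks_like_block_attribute_line, protect_trailing_bracket) are made up for illustration, and the real parser's rule for block attribute lines is likely stricter than "starts with [ and ends with ]".

```python
import re

# Heuristic only: treat any line that starts with "[" and ends with "]"
# as a candidate for being read as a block attribute line.
BLOCK_ATTR_LIKE = re.compile(r"^\[.*\]\s*$")


def looks_like_block_attribute_line(line: str) -> bool:
    """Return True if the line has the suspicious "[ ... ]" shape."""
    return bool(BLOCK_ATTR_LIKE.match(line))


def protect_trailing_bracket(line: str) -> str:
    """Replace the final "]" with the {endsb} attribute reference if the line is at risk."""
    if looks_like_block_attribute_line(line):
        stripped = line.rstrip()
        return stripped[:-1] + "{endsb}"
    return line


risky = "[red]#Some prompt[x]# Make sure the option is [checked]"
print(looks_like_block_attribute_line(risky))
print(protect_trailing_bracket(risky))
```

This applies the first workaround (the {endsb} replacement) automatically; lines that do not have the suspicious shape are returned unchanged.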

Related

How can I refactor an existing source code file to normalize all use of tab?

Sometimes I find myself editing a C source file which sees both use of tab as four spaces, and regular tab.
Is there any tool that attempts to parse the file and "normalize" this, i.e. convert all occurrences of four spaces to regular tab, or all occurrences of tab to four spaces, to keep it consistent?
I assume something like this can be done even with just a simple vim one-liner?

There's :retab and :retab! which can help, but there are caveats.
It's easier if you're using spaces for indentation, then just set 'expandtab' and execute :retab, then all your tabs will be converted to spaces at the appropriate tab stops (which default to 8.) That's easy and there are no traps in this method!
If you want to use 4 space indentation, then keep 'expandtab' enabled and set 'softtabstop' to 4. (Avoid modifying the 'tabstop' option, it should always stay at 8.)
If you want to do the inverse and convert to tabs instead, you could set 'noexpandtab' and then use :retab! (which will also look at sequences of spaces and try to convert them back to tabs.) The main problem with this approach is that it won't just consider indentation for conversion, but also sequences of spaces in the middle of lines, which can cause the operation to affect strings inside your code, which would be highly undesirable.
Perhaps a better approach for replacing spaces with tabs for indentation is to use the following substitute command:
:%s#^\s\+#\=repeat("\t", indent('.') / &tabstop).repeat(" ", indent('.') % &tabstop)#
Yeah it's a mouthful... It's matching whitespace at the beginning of the lines, then using the indent() function to find the total indentation (that function calculates indentation taking tab stops into consideration), then dividing that by the 'tabstop' to decide how many tabs and how many spaces a specific line needs.
If this command works for you, you might want to consider adding a mapping or :command for it, to keep it handy. For example:
command! -range=% Retab <line1>,<line2>s#^\s\+#\=repeat("\t", indent('.') / &tabstop).repeat(" ", indent('.') % &tabstop)
This also allows you to "Retab" a range of the file, including one you select with a visual selection.
Finally, one last alternative to :retab is to ask Vim to "reformat" your code completely, using the = command, which will use the current 'indentexpr' or other indentation configurations such as 'cindent' to completely reindent the block. That typically respects your 'noexpandtab' and 'softtabstop' options, so it uses tabs and spaces for indentation consistently. The downside of this approach is that it will completely reformat your code, including changing indentation in places. The upside is that it typically has a semantic understanding of the language and will be able to take that into consideration when reindenting the code block.
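For anyone who wants to see the arithmetic behind the substitute command outside of Vim, here is a minimal Python sketch of the same idea. It is an illustration under assumptions, not a replacement for :retab: the function names are made up, and the tab stop width is passed in explicitly instead of being read from 'tabstop'.

```python
def indent_width(line: str, tabstop: int = 8) -> int:
    """Display width of the leading whitespace, advancing tabs to the next tab stop."""
    col = 0
    for ch in line:
        if ch == "\t":
            col += tabstop - (col % tabstop)
        elif ch == " ":
            col += 1
        else:
            break
    return col


def retab_indent(line: str, tabstop: int = 8) -> str:
    """Rewrite only the leading whitespace as tabs plus leftover spaces."""
    width = indent_width(line, tabstop)
    body = line.lstrip(" \t")
    return "\t" * (width // tabstop) + " " * (width % tabstop) + body


print(repr(retab_indent("          y")))
print(repr(retab_indent("    a    b", tabstop=4)))
```

Because only the leading whitespace is touched, runs of spaces in the middle of a line (for example inside string literals) are left alone, which is the point of preferring this over :retab!.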

Terminal overwriting same line when too long

In my terminal, when I'm typing over the end of a line, rather than start a new line, my new characters overwrite the beginning of the same line.
I have seen many StackOverflow questions on this topic, but none of them have helped me. Most have something to do with improperly bracketed colors, but as far as I can tell, my PS1 looks fine.
Here it is below, generated using bash -x:
PS1='\[\033[01;32m\]\w \[\033[1;36m\]☔︎ \[\033[00m\] '
Yes, that is in fact an umbrella with rain; I have my Bash prompt update with the weather using a script I wrote.
EDIT:
My BashWeather script actually can put any one of a few weather characters, so it would be great if we could solve for all of these, or come up with some other solution:
☂☃☽☀︎☔︎
If the umbrella with rain is particularly problematic, I can change that to the regular umbrella without issue.

The symbol being printed ☔︎ consists of two Unicode codepoints: U+2614 (UMBRELLA WITH RAIN DROPS) and U+FE0E (VARIATION SELECTOR-15). The second of these is a zero-length qualifier, which is intended to enforce "text style", as opposed to "emoji style", on the preceding symbol. If you're viewing this with a font that can distinguish the two styles, the following might be the emoji version: ☔︉ Otherwise, you can see a table of text and emoji variants in Working Group document N4182 (the umbrella is near the top of page 3).
In theory, U+FE0E should be recognized as a zero-length codepoint, like any other combining character. However, it will not hurt to surround the variant selector in PS1 with the "non-printing" escape sequence \[…\].
It's a bit awkward to paste an isolated variant selector directly into a file, so I'd recommend using bash's unicode-escape feature:
WEATHERCHAR=$'\u2614\[\ufe0e\]'
#...
PS1=...${WEATHERCHAR}...
Note that \[ and \] are interpreted before parameter expansion, so WEATHERCHAR as defined above cannot be dynamically inserted into the prompt. An alternative would be to make the dynamically-inserted character just the $'\u2614' umbrella (or whatever), and insert the $'\[\ufe0e\]' in the prompt template along with the terminal color codes, etc.
Of course, it is entirely possible that the variant indicator isn't needed at all. It certainly makes no useful difference on my Ubuntu system, where the terminal font I use (Deja Vu Sans Mono) renders both variants with a box around the umbrella, which is simply distracting, while the fonts used in my browser seem to render the umbrella identically with and without variants. But YMMV.
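If you want to confirm what is actually in the prompt string, a short Python check can list the code points and build the bracketed fragment. This is only a sketch: prompt_fragment is a made-up helper name, and it assumes the weather character is a single base code point followed by U+FE0E.

```python
import unicodedata

symbol = "\u2614\ufe0e"  # umbrella followed by the text-style variation selector

# List each code point with its official Unicode name.
for ch in symbol:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")


def prompt_fragment(base: str, selector: str = "\ufe0e") -> str:
    r"""Return the base character followed by the selector wrapped in \[ and \]."""
    return base + r"\[" + selector + r"\]"


fragment = prompt_fragment("\u2614")
print(repr(fragment))
```

The resulting fragment is meant to be placed in the PS1 template itself, alongside the color codes, rather than expanded from a variable each time the prompt is drawn.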

This almost works for me, so should probably not be considered a complete solution. This is a stripped down prompt that consists of only an umbrella and a space:
PS1='\342\230\[\224\357\270\] '
I use the octal escapes for the UTF-8 encoding of the umbrella character, putting the last three bytes inside \[...\] so that bash doesn't think they take up space on the screen. I initially put the last four bytes in, but at least in my terminal, there is a display error where the umbrella is followed by an extra character (the question-mark-in-a-diamond glyph for missing characters), so the umbrella really does occupy two spaces.
This could be an issue with bash and 5-byte UTF-8 sequences; using a character with a 4-byte UTF-8 encoding poses no problem:
# U+10400 DESERET CAPITAL LETTER LONG I
# (looks like a lowercase delta)
PS1='\360\220\220\200 '
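When writing octal escapes by hand, it may help to generate them and count the bytes first. The Python sketch below does that; octal_escapes is a made-up helper, and it assumes the prompt character is U+2614 followed by U+FE0E as described in the other answer. Under that assumption the pair encodes to six bytes (three per code point), so a hand-written sequence with fewer bytes would end in a truncated character.

```python
def octal_escapes(text: str) -> str:
    """Render the UTF-8 bytes of text as backslash-octal escapes."""
    return "".join(f"\\{byte:03o}" for byte in text.encode("utf-8"))


umbrella_text = "\u2614\ufe0e"  # umbrella plus VARIATION SELECTOR-15
deseret = "\U00010400"          # DESERET CAPITAL LETTER LONG I

print(octal_escapes(umbrella_text), len(umbrella_text.encode("utf-8")))
print(octal_escapes(deseret), len(deseret.encode("utf-8")))
```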

NP++: Regular expression

I have a text with many expressions like this <.....>, e.g.:
<..> Text1 <.sdfdsvd> Text 2 <....dgdfg> Text3 <...something> Text4
How can I now remove all brackets <...> together with the commands/text between them? The other "real" text outside the brackets (like Text1, Text 2 above) should not be touched.
I tried with the regular expression:
<.*>
But this also matches a block like this, including the text in between:
<..> Text1 <.sdfdsvd>
My second try was to search for all expressions <.> without a third bracket between these two, so I tried:
<.*[^>^<]>
But that does not work either, no change in behavior. How to construct the needed expression correctly?

This works in Notepad++:
Find what: <[^>]+?>
Replace with: nothing
Try it out: http://regex101.com/r/lC9mD4
There are a few problems with your attempt: <.*[^>^<]>
.* matches all characters up through the final possible match, so everything from the first < to the last > is consumed as one match and the text in between is swallowed. This is called greedy. In my solution, I have changed it to lazy (also called reluctant), which goes only up to the first possible match: .*?...although I apply this to the character class itself: [^>]+?.
[^>^<] is incorrect for two reasons, one small, one big. The small reason is that the first caret ^ says "do not match any of the following characters", and the characters following it are >, ^, and <. So you are saying you don't want to match the caret character, which is incorrect (but not harmful). The larger problem is that this is attempting to match exactly one character, when it needs to be one or more, which is signified by the plus sign: [^><]+.
Otherwise, your attempt is not that far off from my solution.
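The difference between the greedy and lazy forms is easy to see on the sample line from the question. The sketch below uses Python's re module rather than Notepad++ itself, on the assumption that these simple patterns behave the same way in both engines.

```python
import re

sample = "<..> Text1 <.sdfdsvd> Text 2 <....dgdfg> Text3 <...something> Text4"

# Greedy: runs from the first "<" to the last ">" and swallows the text in between.
greedy = re.sub(r"<.*>", "", sample)

# Lazy: stops at the first ">" after each "<".
lazy = re.sub(r"<.*?>", "", sample)

# Negated character class: can never cross a ">", so each tag is matched on its own.
negated = re.sub(r"<[^>]+?>", "", sample)

print(repr(greedy))
print(repr(lazy))
print(repr(negated))
```

Note that the spaces on either side of each removed tag remain, so the result contains double spaces.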

This seems to work:
<[^\s]*>
It looks for a left bracket, then anything that isn't whitespace between the brackets, then a right bracket. It would need some adjusting if there's whitespace between the brackets (<text1 text2>), though, and at that point a modification of one of your attempts would work better:
<[^<^>]*>
This one looks for a left bracket, then anything that isn't a left bracket or right bracket, then a right bracket.
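A quick way to see the whitespace limitation is to run both patterns over a line that has a space inside one of the tags. This is a Python sketch with a made-up sample line; Notepad++ is assumed to treat these patterns the same way.

```python
import re

sample = "<cmd one> Text1 <cmd2> Text2"

# Fails on "<cmd one>" because the space is not allowed between the brackets.
no_whitespace = re.sub(r"<[^\s]*>", "", sample)

# Accepts anything except "<", "^" and ">" between the brackets, spaces included.
no_brackets = re.sub(r"<[^<^>]*>", "", sample)

print(repr(no_whitespace))
print(repr(no_brackets))
```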

Try <.*?>. If you don't use the "?", the regular expression will try to find the longest string that matches. Using "*?" forces it to find the shortest.

Matching an unescaped balanced pair of delimiters

How can I match a balanced pair of delimiters not escaped by backslash (that is itself not escaped by a backslash) (without the need to consider nesting)? For example with backticks, I tried this, but the escaped backtick is not working as escaped.
regex = /(?!<\\)`(.*?)(?!<\\)`/
"hello `how\` are` you"
# => $1: "how\\"
# expected "how\\` are"
And the regex above does not consider a backslash that is escaped by a backslash and is in front of a backtick, but I would like to.
How does StackOverflow do this?
The purpose of this is not very complicated. I have documentation texts, which include the backtick notation for inline code just like StackOverflow, and I want to display that in an HTML file with the inline code decorated with some span material. There would be no nesting, but escaped backticks or escaped backslashes may appear anywhere.

Lookbehind is the first thing everyone thinks of for this kind of problem, but it's the wrong tool, even in flavors like .NET that support unrestricted lookbehinds. You can hack something up, but it's going to be ugly, even in .NET. Here's a better way:
`[^`\\]*(\\.[^`\\]*)*`
The first part starts from the opening delimiter and gobbles up anything that's not the delimiter or a backslash. If the next character is a backslash, it consumes that and the character following it, whatever it may be. It could be the delimiter character, another backslash, or anything else, it doesn't matter.
It repeats those steps as many times as necessary, and when neither [^`\\] nor \\. can match, the next character must be the closing delimiter. Or the end of the string, but I'm assuming the input is well formed. But if it's not well formed, this regex will fail very quickly. I mention that because of this other approach I see a lot:
`(?:[^`\\]+|\\.)*`
This works fine on well-formed input, but what happens if you remove the last backtick from your sample input?
"hello `how\` are you"
According to RegexBuddy, after encountering the first backtick, this regex performed 9,252 distinct operations (or steps) before it could give up and report failure; mine failed in ten steps.
EDIT To extract just the part inside the delimiters, wrap that part in a capturing group. You'll still have to remove the backslashes manually.
`([^`\\]*(?:\\.[^`\\]*)*)`
I also changed the other group to non-capturing, which I should have done from the start. I don't avoid capturing religiously, but if you are using them to capture stuff, any other groups you use should be non-capturing.
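Here is the capturing version applied to the sample from the question, as a Python sketch (the question uses Ruby, but the pattern itself is the same). The unescape step is an assumption about what "remove the backslashes manually" would look like: it drops each backslash and keeps the character it escaped.

```python
import re

# Opening backtick, then runs of ordinary characters interleaved with
# backslash-escaped characters, then the closing backtick.
INLINE_CODE = re.compile(r"`([^`\\]*(?:\\.[^`\\]*)*)`")

text = "hello `how\\` are` you"
match = INLINE_CODE.search(text)
captured = match.group(1)

# Drop each escaping backslash, keeping the escaped character.
unescaped = re.sub(r"\\(.)", r"\1", captured)

# With the closing backtick removed there is nothing to match.
malformed = INLINE_CODE.search("hello `how\\` are you")

print(repr(captured))
print(repr(unescaped))
print(malformed)
```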
EDIT I think I've been reading too much into the question. On StackOverflow, if you want to include literal backticks in an inline-code segment or a comment, you use three backticks as the delimiter, not just one. Since there's no need to escape backticks, you can ignore backslashes as well. Your regex could turn out to be as simple as this:
```(.*?)```
Dealing with the possibility of false delimiters, you use the same basic technique:
```([^`]*(?:`(?!``)[^`]*)*)```
Is this what you're after?
By the way, this answer doesn't contradict @nneonneo's comment above. This answer doesn't consider the context in which the match is taking place. Is it in the source code of a program or web page? If it is, did the match occur inside a comment or a string literal? How do I even know the first backtick I found wasn't escaped? Regexes don't know anything about the context in which they operate; that's what parsers are for.
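The triple-backtick variant can be tried out the same way. In this Python sketch the three-backtick delimiter is built by string repetition purely so that it never appears literally in the listing; the sample text is made up.

```python
import re

TICKS = "`" * 3  # the three-backtick delimiter

# Between the delimiters: non-backtick runs, plus any single backtick
# that is not the start of a closing delimiter.
FENCED = re.compile(TICKS + r"([^`]*(?:`(?!``)[^`]*)*)" + TICKS)

text = "see " + TICKS + "a ` b" + TICKS + " here"
match = FENCED.search(text)

print(repr(match.group(1)))
```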

If you don't need nesting, regexes can indeed be a proper tool. Lexers of programming languages, for instance, use regexes to tokenize strings, and strings usually allow their own delimiters as an escaped content. Anything more complicated than that will probably need a full-blown parser though.
The "general formula" is to match an escaped character (\\.) or any character that's valid as content but doesn't need to be escaped ([^{list of invalid chars}]). A "naïve" solution would be joining them with or (|), but for a more efficient variant see @AlanMoore's answer.
The complete example is shown below, in two variants: the first assumes that backslashes should only be used for escaping inside the string, the second assumes that a backslash anywhere in the text escapes the next character.
`((?:\\.|[^`\\])*)`
(?:\\.|[^`\\])*`((?:\\.|[^`\\])*)`
Working examples here and here. However, as @nneonneo commented (and I endorsed), regexes are not meant to do a complete parse, so you'd better keep things simple if you want them to work out right. (Do you want to find a token in the text, or do you want to delimit it already knowing where it starts? The answer to that question is important in deciding which strategy works best for your case.)
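The practical difference between the two variants shows up when an escaped backtick appears before the code span. The Python sketch below uses a made-up sample and assumes the second variant is anchored at the start of the text (via re.match), since it relies on consuming everything before the opening delimiter.

```python
import re

# Variant 1: backslashes only matter between the delimiters.
INSIDE_ONLY = re.compile(r"`((?:\\.|[^`\\])*)`")

# Variant 2: a backslash anywhere escapes the next character, so the
# text before the opening delimiter is consumed with the same rule.
ANYWHERE = re.compile(r"(?:\\.|[^`\\])*`((?:\\.|[^`\\])*)`")

text = r"a \` b `code` c"

first = INSIDE_ONLY.search(text).group(1)   # starts at the escaped backtick
second = ANYWHERE.match(text).group(1)      # skips the escaped backtick

print(repr(first))
print(repr(second))
```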

Block Indent Regex

I'm having problems with a regexp.
I'm trying to implement a regex to select just the tab-indented blocks, but I can't find a way to make it work:
Example:
INDENT(1)
	INDENT(2)
		CONTENT(a)
		CONTENT(b)
	INDENT(3)
		CONTENT(c)
So I need blocks like:
	INDENT(2)
		CONTENT(a)
		CONTENT(b)
AND
	INDENT(3)
		CONTENT(c)
How I can do this?
Thanks, that's almost it. Here is my original need:
table
	tr
		td
			"joao"
			"joao"
		td
			"marcos"
I need the "td" blocks separated. Could I adapt your example to that?

It depends on exactly what you are trying to do, but maybe something like this:
^(\t+)(\S.*)\n(?:\1\t.*\n)*
Working example: http://www.rubular.com/r/qj3WSWK9JR
The pattern searches for:
^(\t+)(\S.*)\n - a line that begins with a tab (I've also captured the first line in a group, just to see the effect), followed by
(?:\1\t.*\n)* - lines with more tabs.
Similarly, you can use ^( +)(\S.*)\n(?:\1 .*\n)* for spaces (example). Mixing spaces and tabs may be a little problematic though.
For the updated question, consider using ^(\t{2,})(\S.*)\n(?:\1\t.*\n)*, for at least 2 tabs at the beginning of the line.
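Both patterns can be tried in Python as well as in Ruby. The sketch below assumes the samples from the question are indented with one tab per nesting level and that every line, including the last, ends with a newline (the pattern requires it).

```python
import re

# A tab-indented header line followed by lines indented at least one tab deeper.
BLOCK = re.compile(r"^(\t+)(\S.*)\n(?:\1\t.*\n)*", re.MULTILINE)

# Same idea, but the header line must start with at least two tabs.
TD_BLOCK = re.compile(r"^(\t{2,})(\S.*)\n(?:\1\t.*\n)*", re.MULTILINE)

sample = (
    "INDENT(1)\n"
    "\tINDENT(2)\n"
    "\t\tCONTENT(a)\n"
    "\t\tCONTENT(b)\n"
    "\tINDENT(3)\n"
    "\t\tCONTENT(c)\n"
)

table = (
    "table\n"
    "\ttr\n"
    "\t\ttd\n"
    '\t\t\t"joao"\n'
    '\t\t\t"joao"\n'
    "\t\ttd\n"
    '\t\t\t"marcos"\n'
)

blocks = [m.group(0) for m in BLOCK.finditer(sample)]
td_blocks = [m.group(0) for m in TD_BLOCK.finditer(table)]

for block in blocks + td_blocks:
    print(repr(block))
```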

You could use the following regex to get the groups...
[^\s]*.*\r\n(?:\s+.*\r*\n*)*
This requires that the first line of each block not begin with whitespace.
