Dealing with unescaped strings

Dealing with unescaped strings - autocad

I wrote an alternative fonction to open AutoCAD drawings. However, AutoCAD made it really hard to change how a document is opened when it is ran from Windows Explorer (double click the file with file association). The only method I found is to change a registry key which is "OpenDdeExec". There is a supplied argument (%1) that gives me a unescaped path to the file to open.
I need to ignore the escaping in path or replace the backslashes with double backslashes before it gets parsed as being special characters. In C#, you could do something like string s = #"I\Like random\backslashes"; and backslashes would be taken as the actualy backslashe character. In lisp, the only equivalence I found is quote which has a weird behavior (since it's normal use isn't exactly what I'm trying to acheive).
If I write something like (quote (I\Like random\backslashes)), the outcome will be (I\\Like random\\backslashes) which is ALMOST what I need. However, I have to get rid of the parenthesis. Any idea how I can acheive this?
Note: Doing this (quote I\Like random\backslashes) will break due to the space. It would, however, work on (quote I\Like\backslashes). This would output I\\Like\\backslashes just like I want.
This is the current OpenDdeExec with the described issue:
(OPENFROMSHELL (QUOTE (%1)))

This is unfortunately not possible with AutoCAD's limited LISP.

Related

What does the "%" mean in tcl?

In a situation like this for example:
[% $create_port %]
or [list [% $RTL_LIST %]]
I realized it had to do with the brackets, but what confuses me is that sometimes it is used with the brackets and variable followed, and sometimes you have brackets with variables inside without the %.
So i'm not sure what it is used for.
Any help is appreciated.

% is not a metacharacter in the Tcl language core, but it still has a few meanings in Tcl. In particular, it's the modulus operator in expr and a substitution field specifier in format, scan, clock format and clock scan. (It's also the default prompt character, and I have a trivial pass-through % command in my ~/.tclshrc to make cut-n-pasting code easier, but nobody else in the world needs to follow my lead there!)
But the code you have written does not appear to be any of those (because it would be a syntax error in all of the commands I've mentioned). It looks like it is some sort of directive processing scheme (with the special sequences being [% and %], with the brackets) though not one I recognise such as doctools or rivet. Because a program that embeds a Tcl interpreter could do an arbitrary transformation to scripts before executing them, it's extremely difficult to guess what it might really be.

Terminal overwriting same line when too long

In my terminal, when I'm typing over the end of a line, rather than start a new line, my new characters overwrite the beginning of the same line.
I have seen many StackOverflow questions on this topic, but none of them have helped me. Most have something to do with improperly bracketed colors, but as far as I can tell, my PS1 looks fine.
Here it is below, generated using bash -x:
PS1='\[\033[01;32m\]\w \[\033[1;36m\]☔︎ \[\033[00m\] '
Yes, that is in fact an umbrella with rain; I have my Bash prompt update with the weather using a script I wrote.
EDIT:
My BashWeather script actually can put any one of a few weather characters, so it would be great if we could solve for all of these, or come up with some other solution:
☂☃☽☀︎☔︎
If the umbrella with rain is particularly problematic, I can change that to the regular umbrella without issue.

The symbol being printed ☔︎ consists of two Unicode codepoints: U+2614 (UMBRELLA WITH RAIN DROPS) and U+FE0E (VARIATION SELECTOR-15). The second of these is a zero-length qualifier, which is intended to enforce "text style", as opposed to "emoji style", on the preceding symbol. If you're viewing this with a font can distinguish the two styles, the following might be the emoji version: ☔︉ Otherwise, you can see a table of text and emoji variants in Working Group document N4182 (the umbrella is near the top of page 3).
In theory, U+FE0E should be recognized as a zero-length codepoint, like any other combining character. However, it will not hurt to surround the variant selector in PS1 with the "non-printing" escape sequence \[…\].
It's a bit awkward to paste an isolated variant selector directly into a file, so I'd recommend using bash's unicode-escape feature:
WEATHERCHAR=$'\u2614\[\ufe0e\]'
#...
PS1=...${WEATHERCHAR}...
Note that \[ and \] are interpreted before parameter expansion, so WEATHERCHAR as defined above cannot be dynamically inserted into the prompt. An alternative would be to make the dynamically-inserted character just the $'\u2614' umbrella (or whatever), and insert the $'\[\ufe0e\]' in the prompt template along with the terminal color codes, etc.
Of course, it is entirely possible that the variant indicator isn't needed at all. It certainly makes no useful difference on my Ubuntu system, where the terminal font I use (Deja Vu Sans Mono) renders both variants with a box around the umbrella, which is simply distracting, while the fonts used in my browser seem to render the umbrella identically with and without variants. But YMMV.

This almost works for me, so should probably not be considered a complete solution. This is a stripped down prompt that consists of only an umbrella and a space:
PS1='\342\230\[\224\357\270\] '
I use the octal escapes for the UTF-8 encoding of the umbrella character, putting the last three bytes inside \[...\] so that bash doesn't think they take up space on the screen. I initially put the last four bytes in, but at least in my terminal, there is a display error where the umbrella is followed by an extra character (the question-mark-in-a-diamond glyph for missing characters), so the umbrella really does occupy two spaces.
This could be an issue with bash and 5-byte UTF-8 sequences; using a character with a 4-byte UTF-encoding poses no problem:
# U+10400 DESERET CAPITAL LETTER LONG I
# (looks like a lowercase delta)
PS1='\360\220\220\200 '

Matching an unescaped balanced pair of delimiters

How can I match a balanced pair of delimiters not escaped by backslash (that is itself not escaped by a backslash) (without the need to consider nesting)? For example with backticks, I tried this, but the escaped backtick is not working as escaped.
regex = /(?!<\\)`(.*?)(?!<\\)`/
"hello `how\` are` you"
# => $1: "how\\"
# expected "how\\` are"
And the regex above does not consider a backslash that is escaped by a backslash and is in front of a backtick, but I would like to.
How does StackOverflow do this?
The purpose of this is not much complicated. I have documentation texts, which include the backtick notation for inline code just like StackOverflow, and I want to display that in an HTML file with the inline code decorated with some span material. There would be no nesting, but escaped backticks or escaped backslashes may appear anywhere.

Lookbehind is the first thing everyone thinks of for this kind of problem, but it's the wrong tool, even in flavors like .NET that support unrestricted lookbehinds. You can hack something up, but it's going to be ugly, even in .NET. Here's a better way:
`[^`\\]*(\\.[^`\\]*)*`
The first part starts from the opening delimiter and gobbles up anything that's not the delimiter or a backslash. If the next character is a backslash, it consumes that and the character following it, whatever it may be. It could be the delimiter character, another backslash, or anything else, it doesn't matter.
It repeats those steps as many times as necessary, and when neither [^`\\] nor \\. can match, the next character must be the closing delimiter. Or the end of the string, but I'm assuming the input is well formed. But if it's not well formed, this regex will fail very quickly. I mention that because of this other approach I see a lot:
`(?:[^`\\]+|\\.)*`
This works fine on well-formed input, but what happens if you remove the last backtick from your sample input?
"hello `how\` are you"
According to RegexBuddy, after encountering the first backtick, this regex performed 9,252 distinct operations (or steps) before it could give up and report failure; mine failed in ten steps.
EDIT To extract just the par inside the delimiters, wrap that part in a capturing group. You'll still have to remove the backslashes manually.
`([^`\\]*(?:\\.[^`\\]*)*)`
I also changed the other group to non-capturing, which I should have done from the start. I don't avoid capturing religiously, but if you are using them to capture stuff, any other groups you use should be non-capturing.
EDIT I think I've been reading too much into the question. On StackOverflow, if you want to include literal backticks in an inline-code segment or a comment, you use three backticks as the the delimiter, not just one. Since there's no need to escape backticks, you can ignore backslashes as well. Your regex could turn out to be as simple as this:
```(.*?)```
Dealing with the possibility of false delimiters, you use the same basic technique:
```([^`]*(?:`(?!``)[^`]*)*)```
Is this what you're after?
By the way, this answer doesn't contradict #nneonneo's comment above. This answer doesn't consider the context in which the match is taking place. Is it in the source code of a program or web page? If it is, did the match occur inside a comment or a string literal? How do I even know the first backtick I found wasn't escaped? Regexes don't know anything about the context in which they operate; that's what parsers are for.

If you don't need nesting, regexes can indeed be a proper tool. Lexers of programming languages, for instance, use regexes to tokenize strings, and strings usually allow their own delimiters as an escaped content. Anything more complicated than that will probably need a full-blown parser though.
The "general formula" is to match an escaped character (\\.) or any character that's valid as content but don't need to be escaped ([^{list of invalid chars}]). A "naïve" solution would be joining them with or (|), but for a more efficient variant see #AlanMoore's answer.
The complete example is shown below, in two variants: the first assumes than backslashes should only be used for escaping inside the string, the second assumes that a backslash anywhere in the text escapes the next character.
`((?:\\.|[^`\\])*)`
(?:\\.|[^`\\])*`((?:\\.|[^`\\])*)`
Working examples here and here. However, as #nneonneo commented (and I endorsed), regexes are not meant to do a complete parse, so you'd better keep things simple if you want them to work out right (do you want to find a token in the text, or do you want to delimit it already knowing where it starts? The answer to that question is important to decide which strategy works best for your case).

How to escape unicode characters in bash prompt correctly

I have a specific method for my bash prompt, let's say it looks like this:
CHAR="༇ "
my_function="
prompt=\" \[\$CHAR\]\"
echo -e \$prompt"
PS1="\$(${my_function}) \$ "
To explain the above, I'm builidng my bash prompt by executing a function stored in a string, which was a decision made as the result of this question. Let's pretend like it works fine, because it does, except when unicode characters get involved
I am trying to find the proper way to escape a unicode character, because right now it messes with the bash line length. An easy way to test if it's broken is to type a long command, execute it, press CTRL-R and type to find it, and then pressing CTRL-A CTRL-E to jump to the beginning / end of the line. If the text gets garbled then it's not working.
I have tried several things to properly escape the unicode character in the function string, but nothing seems to be working.
Special characters like this work:
COLOR_BLUE=$(tput sgr0 && tput setaf 6)
my_function="
prompt="\\[\$COLOR_BLUE\\] \"
echo -e \$prompt"
Which is the main reason I made the prompt a function string. That escape sequence does NOT mess with the line length, it's just the unicode character.

The \[...\] sequence says to ignore this part of the string completely, which is useful when your prompt contains a zero-length sequence, such as a control sequence which changes the text color or the title bar, say. But in this case, you are printing a character, so the length of it is not zero. Perhaps you could work around this by, say, using a no-op escape sequence to fool Bash into calculating the correct line length, but it sounds like that way lies madness.
The correct solution would be for the line length calculations in Bash to correctly grok UTF-8 (or whichever Unicode encoding it is that you are using). Uhm, have you tried without the \[...\] sequence?
Edit: The following implements the solution I propose in the comments below. The cursor position is saved, then two spaces are printed, outside of \[...\], then the cursor position is restored, and the Unicode character is printed on top of the two spaces. This assumes a fixed font width, with double width for the Unicode character.
PS1='\['"`tput sc`"'\] \['"`tput rc`"'༇ \] \$ '
At least in the OSX Terminal, Bash 3.2.17(1)-release, this passes cursory [sic] testing.
In the interest of transparency and legibility, I have ignored the requirement to have the prompt's functionality inside a function, and the color coding; this just changes the prompt to the character, space, dollar prompt, space. Adapt to suit your somewhat more complex needs.

#tripleee wins it, posting the final solution here because it's a pain to post code in comments:
CHAR="༇"
my_function="
prompt=\" \\[`tput sc`\\] \\[`tput rc`\\]\\[\$CHAR\\] \"
echo -e \$prompt"
PS1="\$(${my_function}) \$ "
The trick as pointed out in #tripleee's link is the use of the commands tput sc and tput rc which save and then restore the cursor position. The code is effectively saving the cursor position, printing two spaces for width, restoring the cursor position to before the spaces, then printing the special character so that the width of the line is from the two spaces, not the character.

(Not the answer to your problem, but some pointers and general experience related to your issue.)
I see the behaviour you describe about cmd-line editing (Ctrl-R, ... Cntrl-A Ctrl-E ...) all the time, even without unicode chars.
At one work-site, I spent the time to figure out the diff between the terminals interpretation of the TERM setting VS the TERM definition used by the OS (well, stty I suppose).
NOW, when I have this problem, I escape out of my current attempt to edit the line, bring the line up again, and then immediately go to the 'vi' mode, which opens the vi editor. (press just the 'v' char, right?). All the ease of use of a full-fledged session of vi; why go with less ;-)?
Looking again at your problem description, when you say
my_function="
prompt=\" \[\$CHAR\]\"
echo -e \$prompt"
That is just a string definition, right? and I'm assuming your simplifying the problem definition by assuming this is the output of your my_function. It seems very likely in the steps of creating the function definition, calling the function AND using the values returned are a lot of opportunities for shell-quoting to not work the way you want it to.
If you edit your question to include the my_function definition, and its complete use (reducing your function to just what is causing the problem), it may be easier for others to help with this too. Finally, do you use set -vx regularly? It can help show how/wnen/what of variable expansions, you may find something there.
Failing all of those, look at Orielly termcap & terminfo. You may need to look at the man page for your local systems stty and related cmds AND you may do well to look for user groups specific to you Linux system (I'm assuming you use a Linux variant).
I hope this helps.

Search and replace Greek letters in notebook

I often use Greek letters in my calculations. Is there any way, to replace all occurrences of say ø with µ? From a computational/mathematical standpoint, it makes no difference what the variable name is. But sometimes, we are conditioned to use certain variables and become used to them. So if I happen to use an odd variable and need to share my notebook with a colleague, I'd like to change the variable before sending it. Is there an efficient way to search and replace Greek letters in mma?

Using Mathematica 7 or later, you can use the "Find" dialog to do this. Type \[Phi] as the search string and \[Mu] as the replacement string. This may also work in Mathematica 6 or earlier, but I don't have those versions at hand at the moment to try it.
See the "Listing of Named Characters" in the Mathematica help for the escape codes that you can use.

The find and replace dialog should work for this.
Assuming version 8, you can either use long names (\[OSlash]) to input the names, or shortcuts (shift-esc o / shift-esc).
(shift-esc being necessary because plain old esc in the find dialog will dismiss the dialog.)
In earlier versions the long name method should work. (The long name won't collapse into the character, but after finding/replacing it's all fine.)

In mma7, one possibility that is sometimes handy is Use Selection for Find. That is, select the greek letter (or whatever you want to replace), then (from the EDIT menu) -> Find -> Use Selection for Find.
When the Search/Replace dialog box is now invoked, the \ [phi] (for example) will be in the Find dialog box. On a Macintosh, the shortcut is command E (followed by command F). Also works for\ [CapitalDelta] etc.

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio