I'm trying to write a bash completion function that works for strings containing spaces and punctuation, even quotes may be in there.
I extract these strings with sed from files, so I have them as several lines of text, each containing one target string for completion. However, whichever way I pass this to compgen -W, I only get completion for the individual words.
There is a variable, COMP_WORDBREAKS, that defines which characters are treated as word separators. By tuning this variable you may be able to achieve what you want.
From bash(1):
COMP_WORDBREAKS — The set of characters that the Readline library treats as word separators when performing word completion. If COMP_WORDBREAKS is unset, it loses its special properties, even if it is subsequently reset.
I am using the Tcl_StringCaseMatch function in C++ code for string pattern matching. Everything works fine until the input pattern or string contains [] brackets. For example:
str1 = pq[0]
pattern = pq[*]
Tcl_StringCaseMatch is not working, i.e. it returns false for the above inputs.
How can I avoid [] being treated specially in pattern matching?
The problem is that [] are special characters in pattern matching. You need to escape them with a backslash to have them treated as plain characters:
pattern= "pq\\[*\\]"
I don't think this should affect the string as well. The reason for double slashing is you want to pass the backslash itself to the TCL engine.
For the casual reader:
[] have a special meaning in TCL in general, beyond the pattern matching role they take here - "run command" (like `` or $() in shells), but [number] will have no effect, and the brackets are treated normally - thus the string str1 does not need escaping here.
For extra confusion:
TCL will interpret ] with no preceding [ as a normal character by default. I feel that's getting too confusing, and would rather TCL complained about unbalanced brackets. As OP mentions though, this allows you to forgo the final two backslashes and use "pq\\[*]". I dislike this, and would rather make it obvious that both are treated as plain characters and not the usual TCL way, but to each his/her own.
For a gradle script, I am composing strings that will be used as command line for a subsequent gradle Test-task. One of the strings is the user's password, which eventually will be passed to the called (exec'ed) "java ..." call using the JVM's -D option, e.g. -Dpassword=foobar.
What complicates things here is that this password can (and of course should) contain special characters, which may interfere with the use of the string as a command line. In other words: I need to escape special characters (which is OS-specific). :-(
Now to my actual question:
I want to use the String.replaceAll method, i.e. replaceAll(list_of_special characters, EscapeCharacter + Ref_to_matched_character),
e.g. simplified something like replaceAll("[#$%^&]", "^$1")
'^' meaning the escape character and '$1' meaning the matched character here.
Is that possible, i.e. can one refer to the matched pattern in the second argument of replaceAll?
yes, it's possible
'a#b$c'.replaceAll('([#$%^&])', '^$1')
returns
a^#b^$c
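For comparison, here is a minimal sketch of the same idea in Python rather than Groovy. The function name escape_for_cmd and its parameters are made up for illustration, and the set of special characters is just the simplified one from the question, not a complete list for any particular shell.

```python
import re


def escape_for_cmd(text, specials="#$%^&", escape_char="^"):
    """Prefix every character listed in `specials` with `escape_char`."""
    # re.escape makes each listed character safe to use inside a character class.
    pattern = "[" + re.escape(specials) + "]"
    # m.group(0) is the matched character, the counterpart of $0 / $1 in replaceAll.
    return re.sub(pattern, lambda m: escape_char + m.group(0), text)


print(escape_for_cmd("a#b$c"))  # a^#b^$c
```

Using a function as the replacement sidesteps the question of which characters are special in the replacement string itself.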
Thanks for the responses and the reviews improving readability. Meanwhile I got my expression working. For those interested:
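// handles the following: `~!@#$%^&*()_+-={}|[]\:;"'<>?,./
// the hyphen is escaped so that +-= is not read as a character range (which would also match digits)
escaped = original.replaceAll('[~!@#\\$\\%\\^\\&\\*\\(\\)_\\+\\-={}\\|\\[\\]\\\\:;\"\\\'<>\\?,\\./]', '^$0') // for Windows - cmd.exe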
I want to parse the following word in a shell script:
VERSION=METER1.2.1
Here I want to split it into two words:
WORD1=METER
WORD2=1.2.1
Can you help me parse it?
Far more efficient than using external tools such as sed is bash's built-in parameter expansion support. For instance, if you want the name variable to contain everything before the first digit, and the numbers variable to contain everything after the last alphabetic character:
version=METER1.2.1
name=${version%%[0-9]*}
numbers=${version##*[[:alpha:]]}
To understand this, see the BashFAQ entry on string manipulation in general, or the BashFAQ entry on parameter expansion in particular.
Does the newline sequence or even all escape sequences in a file count as only one character, even though it's written \n?
After I split a one-line file into multiple lines, only one character per line was added according to the wc -m output in the terminal.
\n is a way of representing a newline character in various languages and programs but as the name suggests, a newline is only stored in a file as a single character.
The backslash helps both computers and humans to realise you are referring to a newline character without you having to actually type one, which would be confusing in a lot of instances.
The \n notation is usually used for a single character. Use a hexdump to see the actual bytes, for example xxd.
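The difference between the two-character notation and the single stored character can be checked with a small sketch like the following, written in Python as an assumption (the thread itself only mentions wc and xxd). The helper name describe is made up for illustration.

```python
def describe(s):
    """Return the character count and the UTF-8 bytes of s as hex pairs."""
    data = s.encode("utf-8")
    return len(s), " ".join(f"{b:02x}" for b in data)


newline_text = "ab\ncd"   # \n is interpreted as one newline character
literal_text = r"ab\ncd"  # raw string: a backslash followed by the letter n

print(describe(newline_text))  # (5, '61 62 0a 63 64')
print(describe(literal_text))  # (6, '61 62 5c 6e 63 64')
```

The byte 0a is the newline itself; 5c 6e is a literal backslash followed by n.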
How can I match a balanced pair of delimiters not escaped by backslash (that is itself not escaped by a backslash) (without the need to consider nesting)? For example with backticks, I tried this, but the escaped backtick is not working as escaped.
regex = /(?!<\\)`(.*?)(?!<\\)`/
"hello `how\` are` you"
# => $1: "how\\"
# expected "how\\` are"
And the regex above does not consider a backslash that is escaped by a backslash and is in front of a backtick, but I would like to.
How does StackOverflow do this?
The purpose of this is not very complicated. I have documentation texts that use the backtick notation for inline code, just like StackOverflow, and I want to display them in an HTML file with the inline code wrapped in some span markup. There would be no nesting, but escaped backticks or escaped backslashes may appear anywhere.
Lookbehind is the first thing everyone thinks of for this kind of problem, but it's the wrong tool, even in flavors like .NET that support unrestricted lookbehinds. You can hack something up, but it's going to be ugly, even in .NET. Here's a better way:
`[^`\\]*(\\.[^`\\]*)*`
The first part starts from the opening delimiter and gobbles up anything that's not the delimiter or a backslash. If the next character is a backslash, it consumes that and the character following it, whatever it may be. It could be the delimiter character, another backslash, or anything else, it doesn't matter.
It repeats those steps as many times as necessary, and when neither [^`\\] nor \\. can match, the next character must be the closing delimiter. Or the end of the string, but I'm assuming the input is well formed. But if it's not well formed, this regex will fail very quickly. I mention that because of this other approach I see a lot:
`(?:[^`\\]+|\\.)*`
This works fine on well-formed input, but what happens if you remove the last backtick from your sample input?
"hello `how\` are you"
According to RegexBuddy, after encountering the first backtick, this regex performed 9,252 distinct operations (or steps) before it could give up and report failure; mine failed in ten steps.
EDIT To extract just the part inside the delimiters, wrap that part in a capturing group. You'll still have to remove the backslashes manually.
`([^`\\]*(?:\\.[^`\\]*)*)`
I also changed the other group to non-capturing, which I should have done from the start. I don't avoid capturing religiously, but if you are using groups to capture stuff, any other groups you use should be non-capturing.
EDIT I think I've been reading too much into the question. On StackOverflow, if you want to include literal backticks in an inline-code segment or a comment, you use three backticks as the delimiter, not just one. Since there's no need to escape backticks, you can ignore backslashes as well. Your regex could turn out to be as simple as this:
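As a rough sketch of how the capturing version might be used, here it is in Python (an assumption; the question's own snippet looks like Ruby, where the same pattern should work). The names INLINE_CODE and inline_code_spans are made up for illustration, and the second substitution is the "remove the backslashes manually" step.

```python
import re

# Opening backtick, then the unrolled loop from the answer, then closing backtick.
INLINE_CODE = re.compile(r"`([^`\\]*(?:\\.[^`\\]*)*)`")


def inline_code_spans(text):
    """Return the content of each backtick span with backslash escapes removed."""
    return [
        re.sub(r"\\(.)", r"\1", match.group(1))  # drop the escaping backslashes
        for match in INLINE_CODE.finditer(text)
    ]


print(inline_code_spans(r"hello `how\` are` you"))  # ['how` are']
print(inline_code_spans(r"hello `how\` are you"))   # [] (no closing backtick)
```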
```(.*?)```
To deal with the possibility of false delimiters, you use the same basic technique:
```([^`]*(?:`(?!``)[^`]*)*)```
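A quick way to try the triple-backtick pattern is sketched below in Python (an assumption; the lookahead syntax is the same in most flavors). The names TRIPLE and triple_spans are made up for illustration.

```python
import re

# A lone backtick is allowed inside the span only when it does not start
# a run of three backticks, which is what the (?!``) lookahead checks.
TRIPLE = re.compile(r"```([^`]*(?:`(?!``)[^`]*)*)```")


def triple_spans(text):
    """Return the content of each triple-backtick span found in text."""
    return [match.group(1) for match in TRIPLE.finditer(text)]


print(triple_spans("use ```a ` b``` here"))  # ['a ` b']
```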
Is this what you're after?
By the way, this answer doesn't contradict @nneonneo's comment above. This answer doesn't consider the context in which the match is taking place. Is it in the source code of a program or web page? If it is, did the match occur inside a comment or a string literal? How do I even know the first backtick I found wasn't escaped? Regexes don't know anything about the context in which they operate; that's what parsers are for.
If you don't need nesting, regexes can indeed be a proper tool. Lexers of programming languages, for instance, use regexes to tokenize strings, and strings usually allow their own delimiters as escaped content. Anything more complicated than that will probably need a full-blown parser though.
The "general formula" is to match an escaped character (\\.) or any character that's valid as content but doesn't need to be escaped ([^{list of invalid chars}]). A "naïve" solution would be joining them with or (|), but for a more efficient variant see @AlanMoore's answer.
The complete example is shown below, in two variants: the first assumes that backslashes are only used for escaping inside the string, the second assumes that a backslash anywhere in the text escapes the next character.
`((?:\\.|[^`\\])*)`
(?:\\.|[^`\\])*`((?:\\.|[^`\\])*)`
Working examples here and here. However, as @nneonneo commented (and I endorsed), regexes are not meant to do a complete parse, so you'd better keep things simple if you want them to work out right. Do you want to find a token in the text, or do you want to delimit it already knowing where it starts? The answer to that question is important in deciding which strategy works best for your case.
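The difference between the two variants above can be sketched in Python (an assumption; the patterns themselves are unchanged). The names INSIDE_ONLY and ANYWHERE and the sample text are made up for illustration. The first variant is searched for anywhere in the text, the second is matched from the very start so that escapes before the opening backtick are consumed too.

```python
import re

# Variant 1: backslashes only matter between the delimiters.
INSIDE_ONLY = re.compile(r"`((?:\\.|[^`\\])*)`")
# Variant 2: a backslash anywhere escapes the next character; must match from the start.
ANYWHERE = re.compile(r"(?:\\.|[^`\\])*`((?:\\.|[^`\\])*)`")

sample = r"say \` then `code\` here` end"

# Variant 1 mistakes the escaped backtick in the prose for an opening delimiter.
print(INSIDE_ONLY.search(sample).group(1))  # ' then '
# Variant 2 skips it and captures the intended span, escapes still included.
print(ANYWHERE.match(sample).group(1))      # code\` here
```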