To give context, I'm trying to create a simple version of bash, and for that I need to mimic the way bash parses content with multiple sets of single and double quotes. I can't figure out the overall procedure by which bash handles quotes inside quotes. I noticed some repeated patterns but still don't have the full picture.
For instance this example:
$ "'"ls"'"
evaluates to:
$ 'ls'
or even this uglier example:
$ "'"'"""'"ls"'"""'"'"
evaluates to:
$ '"""ls"""'
I noticed the following patterns arise:
if the count of wrapping quotes is even, it evaluates to what's inside the inverse quotes, exclusively
if the count of wrapping quotes is odd, it evaluates to what's inside the inverse quotes, inclusively.
For example even wrapping quotes:
$ ""'ls'""
evaluates to what's inside the inverse quotes (single quotes) without the single quotes themselves, evaluation:
$ ls
Or for odd count of wrapper quotes:
$ '''"ls"'''
it evaluates to content of double quotes inclusively:
$ "ls" : command not found.
Still I don't get the full picture of how this parsing pattern for more complex quotes inside quotes is done.
Quotes are processed sequentially, looking for matching closing quotes. Everything between the starting and ending quote becomes a single string. Nested quotes have no special meaning.
"'"ls"'"
When it processes the first ", it scans looking for the next " that ends the string, which contains '.
Then it scans the fixed string ls.
When it processes the " after ls, it scans looking for the next ", resulting in another string '.
These are all concatenated, resulting in 'ls'.
"'"'"""'"ls"'"""'"'"
"'" is the string '.
'"""' is the string """
"ls" is the string ls
'"""' is the string """
"'" is the string '.
Concatenating them all together produces '"""ls"""'
""'ls'""
"" is an empty string. 'ls' is the string ls. "" is another empty string. Concatenating them together produces ls.
'''"ls"'''
'' is an empty string. '"ls"' is the string "ls" (containing literal double quotes). '' is an empty string. Concatenating them produces "ls". Since there's no command with that name (including the literal double quotes), you get an error.
There are differences between single and double quotes, but they don't affect any of the examples you posted. See Difference between single and double quotes in Bash
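The left-to-right scan described above can be sketched as a small state machine. This is only a rough illustration, not how bash is actually implemented: the function name unquote is made up, it does no $ or backtick expansion, it ignores backslashes outside quotes, and inside double quotes it only lets a backslash escape " and \. The same three-state loop should carry over to C fairly directly.

```bash
# unquote WORD: print WORD with quote removal applied (hypothetical helper).
# Simplifications: no expansion, no unquoted-backslash handling.
unquote() {
    rest=$1
    out=
    state=plain                      # plain | single | double
    while [ -n "$rest" ]; do
        c=${rest%"${rest#?}"}        # first character of $rest
        rest=${rest#?}               # drop that character
        case $state:$c in
            plain:\')  state=single ;;   # opening single quote
            plain:\")  state=double ;;   # opening double quote
            single:\') state=plain ;;    # closing single quote
            double:\") state=plain ;;    # closing double quote
            double:\\)
                next=${rest%"${rest#?}"}
                case $next in
                    \"|\\) out=$out$next; rest=${rest#?} ;;  # escaped " or \
                    *)     out=$out$c ;;                     # backslash stays
                esac ;;
            *) out=$out$c ;;             # ordinary character, copied as-is
        esac
    done
    if [ "$state" != plain ]; then
        echo "unquote: unterminated $state quote" >&2
        return 1
    fi
    printf '%s\n' "$out"
}

# Feed the four examples from the question through it.
while IFS= read -r word; do
    unquote "$word"
done <<'EOF'
"'"ls"'"
"'"'"""'"ls"'"""'"'"
""'ls'""
'''"ls"'''
EOF
# prints:
#   'ls'
#   '"""ls"""'
#   ls
#   "ls"
```

Note that there is no counting of quotes anywhere: the "even/odd" pattern falls out of the fact that an opening quote always pairs with the very next quote of the same kind.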
In a shell script I would like to quote this string of special characters \'%"\"'\ How do I escape the quotes/backslashes inside the string ?
If you use single quotes around the whole string, then the only thing you need to worry about is replacing every ' with '"'"':
$ string='\'"'"'%"\"'"'"'\'
$ echo "$string"
\'%"\"'\
This means:
' close the previous single-quoted string
"'" a new double-quoted string, containing a single quote
' open a new single-quoted string
The shell concatenates adjacent strings, so you get a single quote where you want it. You can replace the middle part by \' but personally I think that's more confusing!
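If you need to do this for arbitrary strings, the replacement can be automated. The following is a minimal sketch: the function name sq_quote is made up for illustration, and it relies on sed being available. Because it uses command substitution, trailing newlines in the input would be lost.

```bash
# sq_quote STRING: print STRING wrapped in single quotes, with every
# embedded ' replaced by '"'"' (hypothetical helper).
sq_quote() {
    printf "'%s'\n" "$(printf '%s' "$1" | sed "s/'/'\"'\"'/g")"
}

string='\'"'"'%"\"'"'"'\'          # the string  \'%"\"'\
quoted=$(sq_quote "$string")
printf '%s\n' "$quoted"            # '\'"'"'%"\"'"'"'\'

# Round trip: let the shell parse the quoted form again.
eval "copy=$quoted"
printf '%s\n' "$copy"              # \'%"\"'\
```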
I have to implement a minishell written in C for my school homework, and I am currently working on parsing the user input. I have a question about single and double quotes.
How are they parsed? Many combinations are possible as soon as there are more than two single or double quotes.
Let's say I have: ""hello"".
If I echo this, bash outputs hello. Is this because bash interpreted "hello" inside "", or because bash interpreted it as "", then hello, then ""?
How does bash parse multiple chained quotes?
" "Hello" ": let's say a = Hello. Does bash see "a" or ""a""?
Bash parses single- and double-quotes slightly differently. Single-quotes are simpler, so I'll cover them first.
A single-quoted string (or single-quoted section of a string -- I'll get to that) runs from a single-quote to the next single-quote. Anything other than a single-quote (including double-quotes, backslashes, newlines, etc) is just a literal character in the string. But the next single-quote ends the single-quoted section. There is no way to put a single-quote inside a single-quoted string, because the next single-quote will end the single-quoted section.
You can have differently-quoted sections within a single "word" (or string, or whatever you want to call it). So, for example, ''hello'' will be parsed as a zero-length single-quoted section, the unquoted section hello, then another zero-length single-quoted section. Since there's no whitespace between them, they're all treated as part of the same word (and since the single-quoted sections are zero-length, they have no effect at all on the resulting word/string).
Double-quotes are slightly different, in that some characters within them retain their special meanings. For example, $ can introduce variable or command substitution, etc (although the result won't be subject to word-splitting like it would be without the double-quotes). Backslashes also function as escapes inside double-quotes, so \$ will be treated as a literal dollar-sign, not as the start of a variable expansion or anything. Other characters that can have their special meaning removed by backslash-escaping are backslashes themselves, and... double-quotes! You can include a double-quote in a double-quoted string by escaping it. The double-quoted section ends at the next non-escaped double-quote.
So compare:
echo ""hello"" # just prints hello
echo "\"hello\"" # prints "hello", because the escaped
# quotes are part of the string
echo "$PATH" # prints the value of the PATH variable
echo "\$PATH" # prints $PATH
echo ""'$PATH'"" # prints $PATH, because it's in
# single-quotes (with zero-length
# double-quoted sections on each side)
Also, single- and double-quotes have no special meaning within the other type of quote. So:
echo "'hello'" # prints 'hello', because the single-quotes
# are just ordinary characters in a
# double-quoted string
echo '"hello"' # similarly, prints "hello"
echo "'$PATH'" # prints the PATH variable with
# single-quotes around it (because
# $variable expands in double-quotes)
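To see where one word ends and the next begins, it can help to print each argument in brackets. This is just a small sketch; show_args and the variable greeting are made-up names, and greeting stands in for PATH so the output is predictable.

```bash
# show_args: print every argument on its own line, wrapped in [ ].
show_args() {
    for arg in "$@"; do
        printf '[%s]\n' "$arg"
    done
}

greeting=Hello   # hypothetical variable used instead of PATH

show_args ""hello"" " "hello" " "\"hello\"" ""'$greeting'"" "'$greeting'"
# prints:
#   [hello]
#   [ hello ]
#   ["hello"]
#   [$greeting]
#   ['Hello']
```

The second argument answers the " "Hello" " case from the question: it is a double-quoted space, the unquoted hello, and another double-quoted space, all joined into one word.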
I have read a ton of pages including the bash manual, but still find the "non-obvious" use of backslashes confusing.
If I do:
echo \*
it prints a single asterisk. This is normal, as I am escaping the asterisk, making it literal.
If I do:
echo \\*
it prints \*
This also seems normal, the first backslash escapes the second.
If I do
echo `echo \\*`
It prints the contents of the directory. But in my mind it should print the same as echo \\*, because the inner command's output is substituted and passed to the outer echo. I understand this is the non-obvious use of backslashes everyone talks about, but I am struggling to understand WHY it happens.
Also the bash manual says
When the old-style backquote form of substitution is used, backslash retains its literal meaning except when followed by ‘$’, ‘`’, or ‘\’.
But it doesn't define what the "literal meaning on backslash" is. Is it as an escape character, a continuation character, or just literally a backslash character?
Also, it says it retains its literal meaning, except when followed by ... So when it's followed by one of those three characters what does it do? Does it only escape those three characters?
This is mostly for historical interest since `...` command substitution has been superseded by the cleaner $(...) form. No new script should ever use backticks.
Here's how you evaluate a $(command) substitution
Run the command
Here's how you evaluate a `string` command substitution:
Determine the span of the string, from the opening backtick to the closing unescaped backtick (behavior is undefined if this backtick is inside a string literal: the shell will typically either treat it as literal backtick or as a closing backtick depending on its parser implementation)
Unescape the string by removing backslashes that come before one of the three characters dollar, backtick or backslash. This following character is then inserted literally into the command. A backslash followed by any other character will be left alone.
E.g. Hello\\ World will become Hello\ World, because the \\ is replaced with \
Hello\ World will also become Hello\ World, because the backslash is followed by a character other than one of those three, and therefore retains its literal meaning of just being a backslash
\\\* will become \\* since the \\ will become just \ (since backslash is one of the three), and the \* will remain \* (since asterisk is not)
Evaluate the result as a shell command (this includes following all regular shell escaping rules on the result of the now-unescaped command string)
So to evaluate echo `echo \\*`:
Determine the span of the string, here echo \\*
Unescape it according to the backtick quoting rules: echo \*
Evaluate it as a command, which runs echo to output a literal *
Since the result of the substitution is unquoted, the output will undergo:
Word splitting: * becomes * (since it's just one word)
Pathname expansion on each of the words, so * becomes bin Desktop Downloads Photos public_html according to files in the current directory
Note in particular that this was not the same as replacing the backtick command with its output and rerunning the result. For example, we did not consider escapes, quotes and expansions in the output, which a simple text-based macro expansion would have.
Pass each of these as arguments to the next command (also echo): echo bin Desktop Downloads Photos public_html
The result is a list of files in the current directory.
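The unescape step on its own can be tried out with a one-line sed substitution. This is only a sketch of that single step, not of how any shell implements it; the function name backtick_unescape is made up.

```bash
# backtick_unescape STRING: remove a backslash only when it comes directly
# before $, ` or \ ; every other backslash is left alone (hypothetical helper).
backtick_unescape() {
    printf '%s\n' "$1" | sed 's/\\\([$`\\]\)/\1/g'
}

backtick_unescape 'echo \\*'        # echo \*
backtick_unescape 'Hello\\ World'   # Hello\ World
backtick_unescape 'Hello\ World'    # Hello\ World
backtick_unescape '\\\*'            # \\*
```

The first line is the echo `echo \\*` case: after this step the inner command is echo \*, which is why the inner echo prints a bare * for the outer command line to glob.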
How do I escape characters in linux using the sed command?
I want to print something like this
echo hey$ya
But I'm just receiving a
hey
how can escape the $ character?
The reason you are only seeing "hey" echoed is that, because of the $, the shell tries to expand a variable called ya. Since no such variable exists, it expands to an empty string (basically it disappears).
You can use single quotes, they prevent variable expansion :
echo 'hey$ya'
You can also escape the character :
echo hey\$ya
Strings can also be enclosed in double quotes (e.g. echo "hey$ya"), but these do not prevent expansion, all they do is keep the whole expression as a single string instead of allowing word splitting to separate words in separate arguments for the command being executed. Using double quotes would not work in your case.
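The point about double quotes and word splitting can be checked by counting arguments. A small sketch, assuming the default IFS; count_args is a made-up helper and the value given to ya is only there to contain a space.

```bash
# count_args: print how many arguments were received.
count_args() { echo "$#"; }

unset ya
count_args hey$ya      # 1  (ya is unset, the word is just hey)

ya='a b'               # hypothetical value containing a space
count_args hey$ya      # 2  (expanded, then split into heya and b)
count_args "hey$ya"    # 1  (expanded, kept as the single word "heya b")
count_args 'hey$ya'    # 1  (no expansion at all, literal hey$ya)
```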
\ is the escape character. So your example would be:
~ » echo hey\$ya
hey$ya
~ »
I've got this in a for loop in a bash script:
xpath $f '//bad/objdesc/desc[$i]' > $f.$i.xml
$i is the counter.
It doesn't work.
How do I refer to $i properly in the brackets of the desc element?
Thanks.
Use double quotes instead, so that the variable gets expanded instead of being treated literally as $i:
xpath "$f" "//bad/objdesc/desc[$i]" > "$f.$i.xml"
From the bash manual:
3.1.2.2 Single Quotes
Enclosing characters in single quotes (‘'’) preserves the literal value of each character within the quotes. A single quote may not occur between single quotes, even when preceded by a backslash.
3.1.2.3 Double Quotes
Enclosing characters in double quotes (‘"’) preserves the literal value of all characters within the quotes, with the exception of ‘$’, ‘`’, ‘\’, and, when history expansion is enabled, ‘!’. The characters ‘$’ and ‘`’ retain their special meaning within double quotes (see Shell Expansions). The backslash retains its special meaning only when followed by one of the following characters: ‘$’, ‘`’, ‘"’, ‘\’, or newline. Within double quotes, backslashes that are followed by one of these characters are removed. Backslashes preceding characters without a special meaning are left unmodified. A double quote may be quoted within double quotes by preceding it with a backslash. If enabled, history expansion will be performed unless an ‘!’ appearing in double quotes is escaped using a backslash. The backslash preceding the ‘!’ is not removed.
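The difference can be checked without running xpath at all, by printing the query string the shell would hand over. A minimal sketch; the value of i is made up, and the third form is an extra variant that keeps the XPath in single quotes and switches to double quotes only for the variable.

```bash
i=2   # hypothetical loop counter value

single='//bad/objdesc/desc[$i]'       # $i stays literal
double="//bad/objdesc/desc[$i]"       # $i is expanded
mixed='//bad/objdesc/desc['"$i"']'    # adjacent quoted sections are concatenated

printf '%s\n' "$single" "$double" "$mixed"
# prints:
#   //bad/objdesc/desc[$i]
#   //bad/objdesc/desc[2]
#   //bad/objdesc/desc[2]
```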
Here's a much more efficient alternate approach, for the specific indexing-into-a-list case.
counter=1
# Each value is terminated by \x03, so multiline values survive intact
while IFS='' read -r -d $'\x03' line; do
    printf '%s' "$line" >"$f.$(( counter++ )).xml"
done < <(xmlstarlet sel -t -m //bad/objdesc/desc -v . -o $'\x03' "$f")
This runs xmlstarlet (a more capable alternative to xpath) only once, and retrieves the entire set of values (even should they be multiline) in a single pass, writing each one to its own numbered file.
It has the caveat that values containing the literal ETX character (\x03) within their contents will be split into multiple values on that character.
A word on safety:
Expanding query parameters -- whether XPath, SQL, or otherwise -- within double quotes in bash opens you up to query injection attacks. Let's say you had a variable with a customer ID, and you were looking up account balances; if you used xpath "//record[customer='$id' and record='$rec']", a customer who could provide literal values for $rec could also look up content belonging to customers with a different ID by escaping the quotes.
The safe alternative is to use a tool that lets you pass your query in single-quotes, and pass your parameters out-of-band, as with a --var name="$value" parameter. Until it has this capability, I cannot advise the Perl xpath tool for general use.
Should someone other than the original poster of the question be considering using the xpath tool with user-provided or untrusted data, please keep the above in mind.