Exactly how do backslashes work within backticks? - bash

From the Bash FAQ:
Backslashes (\) inside backticks are handled in a non-obvious manner:
$ echo "`echo \\a`" "$(echo \\a)"
a \a
$ echo "`echo \\\\a`" "$(echo \\\\a)"
\a \\a
But the FAQ does not break down the parsing rules that lead to this difference. The only relevant quote from man bash I found was:
When the old-style backquote form of substitution is used, backslash retains its literal meaning except when followed by $, `, or \.
The "$(echo \\a)" and "$(echo \\\\a)" cases are easy enough: backslash, the escape character, is escaping itself into a literal backslash. Thus every instance of \\ becomes \ in the output. But I'm struggling to understand the analogous logic for the backtick cases. What is the underlying rule and how does the observed output follow from it?
Finally, a related question... If you don't quote the backticks, you get a "no match" error:
$ echo `echo \\\\a`
-bash: no match: \a
What's happening in this case?
update
Re: my main question, I have a theory for a set of rules that explains all the behavior, but still don't see how it follows from any of the documented rules in bash. Here are my proposed rules....
Inside backticks, a backslash in front of a character simply returns that character. I.e., a single backslash has no effect. This is true for all characters except backslash itself and backticks. In the case of backslash itself, \\ becomes an escaping backslash: it escapes the character that follows it.
Let's see how this plays out in an example:
a=xx
echo "`echo $a`" # prints the value of $a
echo "`echo \$a`" # single backslash has no effect: equivalent to above
echo "`echo \\$a`" # escaping backslash makes $ literal
prints:
xx
xx
$a
Let's analyze the original examples from this perspective:
echo "`echo \\a`"
Here the \\ produces an escaping backslash, but when we "escape" a we just get back a, so it prints a.
echo "`echo \\\\a`"
Here the first pair \\ produces an escaping backslash which is applied to \, producing a literal backslash. That is, the first 3 \\\ become a single literal \ in the output. The remaining \a just produces a. Final result is \a.

The logic is quite simple. Let's look at the bash source code (4.4) itself:
subst.c:9273
case '`': /* Backquoted command substitution. */
{
t_index = sindex++;
temp = string_extract(string, &sindex, "`", SX_REQMATCH);
/* The test of sindex against t_index is to allow bare instances of
` to pass through, for backwards compatibility. */
if (temp == &extract_string_error || temp == &extract_string_fatal)
{
if (sindex - 1 == t_index)
{
sindex = t_index;
goto add_character;
}
last_command_exit_value = EXECUTION_FAILURE;
report_error(_("bad substitution: no closing \"`\" in %s"), string + t_index);
free(string);
free(istring);
return ((temp == &extract_string_error) ? &expand_word_error
: &expand_word_fatal);
}
if (expanded_something)
*expanded_something = 1;
if (word->flags & W_NOCOMSUB)
/* sindex + 1 because string[sindex] == '`' */
temp1 = substring(string, t_index, sindex + 1);
else
{
de_backslash(temp);
tword = command_substitute(temp, quoted);
temp1 = tword ? tword->word : (char *)NULL;
if (tword)
dispose_word_desc(tword);
}
FREE(temp);
temp = temp1;
goto dollar_add_string;
}
As you can see, it calls de_backslash(temp) on the string, which modifies the string in place. The code of that function is below:
subst.c:1607
/* Remove backslashes which are quoting backquotes from STRING. Modifies
STRING, and returns a pointer to it. */
char *
de_backslash(string) char *string;
{
register size_t slen;
register int i, j, prev_i;
DECLARE_MBSTATE;
slen = strlen(string);
i = j = 0;
/* Loop copying string[i] to string[j], i >= j. */
while (i < slen)
{
if (string[i] == '\\' && (string[i + 1] == '`' || string[i + 1] == '\\' ||
string[i + 1] == '$'))
i++;
prev_i = i;
ADVANCE_CHAR(string, slen, i);
if (j < prev_i)
do
string[j++] = string[prev_i++];
while (prev_i < i);
else
j = i;
}
string[j] = '\0';
return (string);
}
The function does one simple thing: if there is a \ character and the next character is \, a backtick, or $, it skips this \ and copies the next character.
Converted to Python for simplicity:
text = r"\\\\$a"
slen = len(text)
i = 0
data = ""
while i < slen:
    # Skip a backslash that is followed by `, \ or $ (the bounds check
    # stands in for the NUL terminator the C code relies on)
    if text[i] == '\\' and i + 1 < slen and text[i + 1] in '`\\$':
        i += 1
    # Copy the current character to the output
    data += text[i]
    i += 1
print(data)
The output of this is \\$a. Now let's test the same in bash:
$ a=xxx
$ echo "$(echo \\$a)"
\xxx
$ echo "`echo \\\\$a`"
\xxx
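For readers who would rather stay in the shell, the same unescaping step can be approximated with sed. This is only a sketch of what de_backslash does to the text between the backticks, not how bash implements it, and the function name unescape_backticks is made up for illustration.

```sh
#!/bin/sh
# Hypothetical helper: mimic de_backslash() by dropping a backslash
# that is followed by a backslash, a dollar sign, or a backquote.
unescape_backticks() {
    printf '%s\n' "$1" | sed 's/\\\([\\$`]\)/\1/g'
}

# Show what the inner command looks like after the unescaping step.
unescape_backticks 'echo \\a'
unescape_backticks 'echo \\\\a'
unescape_backticks 'echo \\\\$a'
unescape_backticks 'echo \$a'
```

Each call prints the command line that the command substitution would then execute, so the four calls should print echo \a, echo \\a, echo \\$a and echo $a.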

Did some more research to find the reference and rule of what is happening. From the GNU Bash Reference Manual it states
When the old-style backquote form of substitution is used, backslash
retains its literal meaning except when followed by ‘$’, ‘`’, or ‘\’.
The first backquote not preceded by a backslash terminates the command
substitution. When using the $(command) form, all characters between
the parentheses make up the command; none are treated specially.
In other words, \\, \$, and \` inside `` are processed by the CLI parser before the command substitution. Everything else is passed to the command substitution for processing.
Let's step through each example from the question. After the # I put how the command substitution was processed by the CLI parser before `` or $() is executed.
Your first example explained.
$ echo "`echo \\a`" # echo \a
a
$ echo "$(echo \\a)" # echo \\a
\a
Your second example explained:
$ echo "`echo \\\\a`" # echo \\a
\a
$ echo "$(echo \\\\a)" # echo \\\\a
\\a
Your third example:
$ a=xx
$ echo "`echo $a`" # echo $a
xx
$ echo "`echo \$a`" # echo $a
xx
$ echo "`echo \\$a`" # echo \$a
$a
Your third example using $()
$ echo "$(echo $a)" # echo $a
xx
$ echo "$(echo \$a)" # echo \$a
$a
$ echo "$(echo \\$a)" # echo \\$a
\xx
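The walkthrough can be checked with a short script. This is a sketch rather than part of the original answers: it uses printf '%s\n' instead of echo, because echo's own backslash handling differs between shells, and it stores the results in variables (the names old1, new1, old2, new2 are arbitrary) so that word splitting and globbing play no part.

```sh
#!/bin/sh
a=xx

# Four backslashes before "a"
old1=`printf '%s\n' \\\\a`     # inner command receives: printf '%s\n' \\a
new1=$(printf '%s\n' \\\\a)    # inner command receives: printf '%s\n' \\\\a

# Two backslashes before "$a"
old2=`printf '%s\n' \\$a`      # inner command receives: printf '%s\n' \$a
new2=$(printf '%s\n' \\$a)     # inner command receives: printf '%s\n' \\$a

printf '%s\n' "$old1" "$new1" "$old2" "$new2"
```

Run under bash or dash, this should print \a, \\a, $a and \xx on separate lines, matching the outputs listed above.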

Related

sed Capital_Case not working

I'm trying to convert a string that has either - (hyphen) or _ (underscore) to Capital_Case string.
#!/usr/bin/env sh
function cap_case() {
[ $# -eq 1 ] || return 1;
_str=$1;
_capitalize=${_str//[-_]/_} | sed -E 's/(^|_)([a-zA-Z])/\u\2/g'
echo "Capitalize:"
echo $_capitalize
return 0
}
read string
echo $(cap_case $string)
But I don't get anything out.
First I am replacing any occurrence of - and _ with _ ${_str//[-_]/_}, and then I pipe that string to sed which finds the first letter, or _ as the first group, and then the letter after the first group in the second group, and I want to uppercase the found letter with \u\2. I tried with \U\2 but that didn't work as well.
I want the string some_string to become
Some_String
And string some-string to become
Some_String
I'm on a mac, using zsh if that is helpful.
EDIT: More generic solution here to make each field's first letter Capital.
echo "some_string_other" | awk -F"_" '{for(i=1;i<=NF;i++){$i=toupper(substr($i,1,1)) substr($i,2)}} 1' OFS="_"
Following awk may help you.
echo "some_string" | awk -F"_" '{$1=toupper(substr($1,1,1)) substr($1,2);$2=toupper(substr($2,1,1)) substr($2,2)} 1' OFS="_"
Output will be as follows.
echo "some_string" | awk -F"_" '{$1=toupper(substr($1,1,1)) substr($1,2);$2=toupper(substr($2,1,1)) substr($2,2)} 1' OFS="_"
Some_String
This being zsh, you don't need sed (or even a function, really):
$ s=some-string-bar
$ print ${(C)s:gs/-/_}
Some_String_Bar
The (C) flag capitalizes words (where "words" are defined as sequences of alphanumeric characters separated by other characters); :gs/-/_ replaces hyphens with underscores.
If you really want a function, it's cap_case () { print ${(C)1:gs/-/_} }.
pure bash:
#!/bin/bash
camel_case(){
local d display string
declare -a strings # = scope local
[ "$2" ] && d="$2" || d=" " # optional output delimiter
ifs_ini="$IFS"
IFS+='_-' # we keep initial IFS
strings=( "$1" ) # array
for string in ${strings[@]} ; do
display+="${string^}$d"
done
echo "${display%$d}"
IFS="$ifs_ini"
}
camel_case "some-string_here" "_"
camel_case "some-string_here some strings here" "+"
camel_case "some-string_here some strings here"
echo "$BASH_VERSION"
exit
output:
Some_String_Here
Some+String+Here+Some+Strings+Here
Some String Here Some Strings Here
4.4.18(1) release
You can try this with GNU sed:
echo 'some_other-string' | sed -E 's/(^.)/\u&/;s/[_-](.)/_\u\1/g'
Explanation:
s/(^.)/\u&/
(^.) matches the first character, and \u& puts the match in upper case.
s/[_-](.)/_\u\1/g
[_-](.) captures a character preceded by _ or - and replaces both with _ followed by the captured character in upper case.
The g at the end tells sed to make the replacement for every character that meets the criteria.
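You didn't assign to _capitalize in the shell that runs your function: the assignment is the first command of a pipeline, so it runs in a subshell, writes nothing to the pipe, and sed receives empty input. The variable is still unset when you echo it.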
You probably meant
_capitalize=$(<<<"${_str//[-_]/_}" sed -E 's/(^|_)([a-zA-Z])/\1\u\2/g')
Note also that ${//} isn't standard shell, so you really ought to specify an interpreter other than sh.
A simpler approach would be simply:
#!/bin/sh
cap_case() {
printf "Capitalize: "
echo "$*" | sed -e 'y/-/_/' -e 's/\(^\|_\)[[:alpha:]]/\U&/g'
}
echo $(cap_case "snake_case")
Note that the \u / \U replacement is a GNU extension to sed - if you're using a non-GNU implementation, check whether it supports this feature.
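The lost assignment from the question's function is easy to reproduce on its own. A minimal sketch (the variable names v and kept are made up) showing that an assignment used as one stage of a pipeline does not reach the calling shell, while command substitution does:

```sh
#!/bin/sh
v=before

# The assignment is one stage of a pipeline, so it runs in a subshell:
# it prints nothing for cat to read and the parent's v is untouched.
v=after | cat
kept=$v
printf 'after pipeline: %s\n' "$kept"

# Capturing the pipeline's output with command substitution does work.
v=$(printf '%s\n' some_string | tr '[:lower:]' '[:upper:]')
printf 'after substitution: %s\n' "$v"
```

The first printf should still report before, and the second should report SOME_STRING.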

Howto split a string on a multi-character delimiter in bash?

Why doesn't the following bash code work?
for i in $( echo "emmbbmmaaddsb" | split -t "mm" )
do
echo "$i"
done
expected output:
e
bb
aaddsb
The recommended tool for character substitution is sed's s/regexp/replacement/ command, which replaces one occurrence of the regexp, or s/regexp/replacement/g, which replaces all of them; you do not even need a loop or variables.
Pipe your echo output to sed and substitute the newline character \n for the characters mm (\n in the replacement is understood by GNU sed; other implementations may differ):
echo "emmbbmmaaddsb" | sed 's/mm/\n/g'
The output is:
e
bb
aaddsb
Since you're expecting newlines, you can simply replace all instances of mm in your string with a newline. In pure native bash:
in='emmbbmmaaddsb'
sep='mm'
printf '%s\n' "${in//$sep/$'\n'}"
If you wanted to do such a replacement on a longer input stream, you might be better off using awk, as bash's built-in string manipulation doesn't scale well to more than a few kilobytes of content. The gsub_literal shell function (backending into awk) given in BashFAQ #21 is applicable:
# Taken from http://mywiki.wooledge.org/BashFAQ/021
# usage: gsub_literal STR REP
# replaces all instances of STR with REP. reads from stdin and writes to stdout.
gsub_literal() {
# STR cannot be empty
[[ $1 ]] || return
# string manip needed to escape '\'s, so awk doesn't expand '\n' and such
awk -v str="${1//\\/\\\\}" -v rep="${2//\\/\\\\}" '
# get the length of the search string
BEGIN {
len = length(str);
}
{
# empty the output string
out = "";
# continue looping while the search string is in the line
while (i = index($0, str)) {
# append everything up to the search string, and the replacement string
out = out substr($0, 1, i-1) rep;
# remove everything up to and including the first instance of the
# search string from the line
$0 = substr($0, i + len);
}
# append whatever is left
out = out $0;
print out;
}
'
}
...used, in this context, as:
gsub_literal "mm" $'\n' <your-input-file.txt >your-output-file.txt
A more general example, which does not replace the multi-character delimiter with a single-character delimiter, is given below.
Using parameter expansions (from a comment by @gniourf_gniourf):
#!/bin/bash
str="LearnABCtoABCSplitABCaABCString"
delimiter=ABC
s=$str$delimiter
array=();
while [[ $s ]]; do
array+=( "${s%%"$delimiter"*}" );
s=${s#*"$delimiter"};
done;
declare -p array
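The same parameter-expansion loop also works without arrays, which makes it usable from plain POSIX sh. A minimal sketch, with a made-up function name split_on, that prints one piece per line instead of filling an array:

```sh
#!/bin/sh
# Hypothetical helper: split_on STRING DELIMITER
# Prints each piece of STRING on its own line, using only POSIX
# parameter expansion (no arrays, no external commands).
# Note: POSIX sh has no "local", so the variable s is global here.
split_on() {
    [ -n "$2" ] || return 1            # an empty delimiter would loop forever
    s=$1$2                             # append a delimiter so the last piece is handled like the others
    while [ -n "$s" ]; do
        printf '%s\n' "${s%%"$2"*}"    # everything before the first delimiter
        s=${s#*"$2"}                   # drop that piece and its delimiter
    done
}

split_on emmbbmmaaddsb mm
```

For the string from the question this should print e, bb and aaddsb on separate lines.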
A cruder way:
#!/bin/bash
# main string
str="LearnABCtoABCSplitABCaABCString"
# delimiter string
delimiter="ABC"
#length of main string
strLen=${#str}
#length of delimiter string
dLen=${#delimiter}
#iterator for length of string
i=0
#length tracker for ongoing substring
wordLen=0
#starting position for ongoing substring
strP=0
array=()
while [ "$i" -lt "$strLen" ]; do
if [ "$delimiter" = "${str:$i:$dLen}" ]; then
array+=("${str:$strP:$wordLen}")
strP=$(( i + dLen ))
wordLen=0
i=$(( i + dLen ))
fi
i=$(( i + 1 ))
wordLen=$(( wordLen + 1 ))
done
array+=("${str:$strP:$wordLen}")
declare -p array
Reference - Bash Tutorial - Bash Split String
With awk you can use gsub() to replace all regex matches.
As in your question, to replace all substrings of two or more 'm' chars with a new line, run:
echo "emmbbmmaaddsb" | awk '{ gsub(/mm+/, "\n" ); print; }'
e
bb
aaddsb
The ‘g’ in gsub() stands for “global,” which means replace everywhere.
You can also print just the Nth piece, for example:
echo "emmbbmmaaddsb" | awk '{ gsub(/mm+/, " " ); print $2; }'
bb

Is there any csh alternative for printf %q of bash?

Bash's builtin command printf supports the %q format string, which escapes the content of a variable for shell input.
I have tried some options: :q only escaped spaces, and GNU printf does not support %q.
Currently, I use below code:
set valq = `echo $val:q | bash -c 'read q;printf %q "$q"'`
/path/to/executable $valq
I do not like having a csh script depend on bash. Is there a csh-native solution for this?
Thanks.
Here is some test code that illustrates the problem I ran into.
wrapper.csh
#!/bin/csh -f
set i = 1
set tst1 = ""
set tst2 = ""
while ( $i <= $#argv )
set arg = "$argv[$i]"
set tst1 = ($tst1:q $arg:q)
set arg2 = `echo $arg:q | bash -c 'read q;printf %q "$q"'`
set tst2 = "$tst2:q $arg2:q"
@ i = $i + 1
end
echo "====case 1===="
./test.csh $tst1:q
./test.csh $tst1
./test.csh $tst2
echo "====case 2===="
csh -cf "./test.csh $tst1"
csh -cf "./test.csh $tst1:q"
csh -cf "./test.csh $tst2"
test.csh
#!/bin/csh -f
echo -n "TEST ARG:"
set i = 1
while ($i <= $#argv)
echo -n "${i}:$argv[$i] "
@ i = $i + 1
end
echo
Test Results 1:
>./wrapper.csh "a ()" b c
====case 1====
TEST ARG:1:a () 2:b 3:c
TEST ARG:1:a 2:() 3:b 4:c
TEST ARG:1:a\ 2:\(\) 3:b 4:c
====case 2====
Badly placed ()'s.
Badly placed ()'s.
TEST ARG:1:a () 2:b 3:c
Test Results 2:
bash>./wrapper.csh "'\"a ()" b c
csh>./wrapper.csh "'"'"'"a ( ) " b c
====case 1====
TEST ARG:1:'"a () 2:b 3:c
TEST ARG:1:'"a 2:() 3:b 4:c
TEST ARG:1:\'\"a\ 2:\(\) 3:b 4:c
====case 2====
Unmatched '.
Unmatched '.
TEST ARG:1:'"a () 2:b 3:c
Summary for the test:
If a command is called directly inside csh, then $val:q is the proper usage.
If a command is passed as an argument (as in csh -cf "..."), then printf %q is the proper usage.
Just use /path/to/executable "$val".
Update
If variables are expanded within " (as in csh -cf "test.csh $tst1") and if special characters and multiple words are to be preserved, the words must indeed be quoted. But the special printf of bash isn't indispensable for this; we could do it e. g. with:
set tst1q=`printf " '%s'" $tst1:q`
csh -cf "test.csh $tst1q"
(the normal printf without %q).
Update
To allow both " and ', you can first set
set s='s/[] "$&-*;<>?`|~[]/\\&/g'
and then replace bash -c 'read q;printf %q "$q"' with sed "$s" in wrapper.csh.
The regular expression
[] "$&-*;<>?`|~[]
is a bracket expression, a list of characters enclosed in []. It matches a single character, which the replacement \\& prefixes with a backslash (the special character & refers to the matched character). Within the list, &-* is a range that covers the ASCII characters & ' ( ) *. I didn't include the characters , and ^ (they are escaped by printf %q, but that's not needed in csh), while I included ~ (which isn't escaped by printf %q, but needs to be in csh; try wrapper.csh "~").
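The effect of this sed expression can be tried outside csh. The following is only an illustrative sketch written for a POSIX shell, not csh; the function name csh_quote is made up, and LC_ALL=C is an added assumption so that the &-* range is read as plain ASCII.

```sh
#!/bin/sh
# Hypothetical helper: csh_quote STRING
# Backslash-escapes every character matched by the bracket expression.
# LC_ALL=C keeps the &-* range meaning exactly the ASCII run & ' ( ) *.
csh_quote() {
    printf '%s\n' "$1" | LC_ALL=C sed 's/[] "$&-*;<>?`|~[]/\\&/g'
}

csh_quote 'a ()'
csh_quote "'\"a ()"
csh_quote '~'
```

The first two calls should reproduce the escaped arguments seen in the test results (a\ \(\) and \'\"a\ \(\)), and the third should print \~.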

Replacing quotation marks with "``" and "''"

I have a document containing many " marks, but I want to convert it for use in TeX.
TeX uses 2 ` marks for the opening quote mark, and 2 ' marks for the closing quote mark.
I only want to make changes when " appears on a single line an even number of times (e.g. there are 2, 4, or 6 "'s on the line). For example:
"This line has 2 quotation marks."
--> ``This line has 2 quotation marks.''
"This line," said the spider, "Has 4 quotation marks."
--> ``This line,'' said the spider, ``Has 4 quotation marks.''
"This line," said the spider, must have a problem, because there are 3 quotation marks."
--> (unchanged)
My sentences never break across lines, so there is no need to check on multiple lines.
There are few quotes with single quotes, so I can manually change those.
How can I convert these?
This is my one-liner, which works for me:
awk -F\" '{if((NF-1)%2==0){res=$0;for(i=1;i<NF;i++){to="``";if(i%2==0){to="'\'\''"}res=gensub("\"", to, 1, res)};print res}else{print}}' input.txt >output.txt
And here is a long version of this one-liner with comments:
BEGIN {
FS = "\"" # set the field separator to a double quote before any line is read
}
{
if ((NF-1) % 2 == 0) { # if the number of double quotes in the line is even
res = $0 # save the original line in res
for (i = 1; i < NF; i++) { # for each double quote
to = "``" # an opening quote is replaced by ``
if (i % 2 == 0) { # a closing quote is replaced by ''
to = "''"
}
# replace the first remaining " in res with to (gensub is a gawk extension)
res = gensub("\"", to, 1, res)
}
print res # print the converted line
} else {
print # print the original line when there is nothing to change
}
}
You may run this script by:
awk -f replace-quotes.awk input.txt >output.txt
Here's my one-liner using repeated sed's:
cat file.txt | sed -e 's/"\([^"]*\)"/`\1`/g' | sed '/"/s/`/\"/g' | sed -e 's/`\([^`]*\)`/``\1'\'''\''/g'
(note: it won't work correctly if there are already back-ticks (`) in the file but otherwise should do the trick)
EDIT:
Removed back-tick bug by simplifying, now works for all cases:
cat file.txt | sed -e 's/"\([^"]*\)"/``\1'\'\''/g' | sed '/"/s/``/"/g' | sed '/"/s/'\'\''/"/g'
With comments:
cat file.txt # read file
| sed -e 's/"\([^"]*\)"/``\1'\'\''/g' # initial replace
| sed '/"/s/``/"/g' # revert `` to " on lines with extra "
| sed '/"/s/'\'\''/"/g' # revert '' to " on lines with extra "
Using awk
awk '{n=gsub("\"","&")}!(n%2){while(n--){n%2?Q=q:Q="`";sub("\"",Q Q)}}1' q=\' in
Explanation
awk '{
n=gsub("\"","&") # set n to the number of quotes in the current line
}
!(n%2){ # if there are even number of quotes
while(n--){ # as long as we have double-quotes
n%2?Q=q:Q="`" # alternate Q between a backtick and single quote
sub("\"",Q Q) # replace the next double quote with two of whatever Q is
}
}1 # print out all other lines untouched'
q=\' in # set the q variable to a single quote and pass the file 'in' as input
Using sed
sed '/^\([^"]*"[^"]*"[^"]*\)*$/s/"\([^"]*\)"/``\1'\'\''/g' in
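The even-count guard in this sed command can be checked against the three sample lines from the question. A small sketch that wraps the command in a function (the name tex_quotes is made up) and feeds it a here-document:

```sh
#!/bin/sh
# Hypothetical wrapper around the sed command: reads lines on stdin and
# converts "..." pairs to ``...'' only on lines whose quote count is even.
tex_quotes() {
    sed '/^\([^"]*"[^"]*"[^"]*\)*$/s/"\([^"]*\)"/``\1'\'\''/g'
}

tex_quotes <<'EOF'
"This line has 2 quotation marks."
"This line," said the spider, "Has 4 quotation marks."
"This line," said the spider, has 3 quotation marks."
EOF
```

The first two lines should come out with `` and '' in place of the double quotes, and the third should be printed unchanged.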
This might work for you:
sed 'h;s/"\([^"]*\)"/``\1''\'\''/g;/"/g' file
Explanation:
Make a copy of the original line h
Replace pairs of "'s s/"\([^"]*\)"/``\1''\'\''/g
Check for odd " and if found revert to original line /"/g
