Running sed on string, benefit of using "echo" + "pipe" over "<<<" [duplicate] - bash

This question already has an answer here:
What's the difference between "here string" and echo + pipe
(1 answer)
Closed 6 years ago.
Commonly I see people manipulating strings using sed as follows:
echo "./asdf" | sed -n -e "s%./%%p"
I recently learned I can also do:
sed -n -e "s%./%%p" <<< "./asdf"
Is there a reason to avoid the latter?
For instance, is it bash-specific behaviour?

How should I trim ./ from the beginning of a path (or perform other simple string manipulations)?
Bash's built-in syntax for this is called parameter expansion. ${s#./} expands to the value of $s with any leading ./ trimmed, entirely inside the shell, with no subprocess or other overhead. BashFAQ #100 covers many additional string manipulation operations.
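For instance (the % form trims from the end the same way the # form trims from the beginning):
$ s="./asdf"
$ echo "${s#./}"
asdf
$ p="dir/file.txt"
$ echo "${p%.txt}"
dir/file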
What are the differences between echo "$s" | ... and ... <<<"$s"?
Portability
As you've noted, <<< is not available in POSIX sh; this is a ksh extension also available in bash and zsh.
That said, if you need portability, the multiline equivalent is not far away:
... <<EOF
$s
EOF
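Applied to the example from the question, the portable version looks like:
sed -n -e "s%./%%p" <<EOF
./asdf
EOF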
Disk usage
As currently implemented by bash (and as an implementation detail subject to change), <<< creates a temporary file, populates it, and redirects from it. If your TMPDIR is not on an in-memory filesystem, this may be slower, or may generate churn.
Process overhead
A pipeline, as in echo foo | ..., creates a subshell -- it forks off a completely new process, responsible for running echo and then exiting. When you're running result=$(echo "$s" | ...), then that pipeline is itself in a subshell of your parent shell, and that shell has its output read by the parent.
Modern unixlikes go to significant effort to make fork()ing off a subprocess as low-overhead as possible, but even then the cost can add up when an operation is done in a loop -- and on platforms such as Cygwin it can be even more significant.
echo bugs
Last but not least -- <<<"$s" will represent the contents of the variable s precisely, except that it appends a trailing newline. By contrast, echo has a great deal of leeway in its specified behavior: it can honor backslash expansions or not, depending on compliance with the optional XSI extensions to the standard (and the presence or absence of the widespread but entirely noncompliant -e extension, and/or runtime flags that disable it); the ability to avoid adding a trailing newline with -n is not guaranteed by the standard; &c. Even if you're using a pipeline, it's better to use printf:
# emit *exactly* the contents of "$s", with no newline added
printf '%s' "$s" | ...
# emit the contents of "$s", with an added trailing newline
printf '%s\n' "$s" | ...
# emit the contents of "$s", with '\t', '\n', '\b' &c replaced, and no added newline
printf '%b' "$s" | ...

Using sed at all is not desirable if it can be helped (see Charles Duffy's answer); put the string in a variable and let the shell do it with POSIX-compatible parameter expansion.
$ s="./asdf"
$ echo "${s#./}"
asdf

I think there are two things at play here.
<<< vs. pipelines
sed (or other external command) vs parameter expansion
If you can do something with expansion, it is very likely it will be much quicker, as it saves an external command being launched.
However, not everything can be done with expansion. So you may have to use an external command and use as input something you have in a variable. In this case, you will have to make your choice based on portability considerations. As for performance, if it matters, you should probably test in your context what performs best.
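For instance, a rough micro-benchmark along these lines (the loop count and commands are only illustrative; measure your real workload):
time for i in {1..1000}; do echo "./asdf" | sed -e 's%^\./%%' >/dev/null; done
time for i in {1..1000}; do s="./asdf"; t=${s#./}; done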

Related

How do I name bash script arguments based on which number they are [duplicate]

Seems that the recommended way of doing indirect variable setting in bash is to use eval:
var=x; val=foo
eval $var=$val
echo $x # --> foo
The problem is the usual one with eval:
var=x; val=1$'\n'pwd
eval $var=$val # bad output here
(and since it is recommended in many places, I wonder just how many scripts are vulnerable because of this...)
In any case, the obvious solution of using (escaped) quotes doesn't really work:
var=x; val=1\"$'\n'pwd\"
eval $var=\"$val\" # fail with the above
The thing is that bash has indirect variable reference baked in (with ${!foo}), but I don't see any such way to do indirect assignment -- is there any sane way to do this?
For the record, I did find a solution, but this is not something that I'd consider "sane"...:
eval "$var='"${val//\'/\'\"\'\"\'}"'"
A slightly better way, avoiding the possible security implications of using eval, is
declare "$var=$val"
Note that declare is a synonym for typeset in bash. The typeset command is more widely supported (ksh and zsh also use it):
typeset "$var=$val"
In modern versions of bash, one should use a nameref.
declare -n ref="$var"   # ref becomes a nameref to the variable whose name is in $var
ref=$val
It's safer than eval, but still not perfect.
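A sketch of typical nameref usage inside a function (assign and _target are illustrative names, not anything standardized):
assign() {
    declare -n _target=$1   # _target becomes a nameref to the variable named by $1
    _target=$2
}
assign x 'foo bar'
echo "$x"   # --> foo bar
One residual imperfection: if the caller happens to pass the name _target itself, bash reports a circular name reference.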
Bash has an extension to printf that saves its result into a variable:
printf -v "${VARNAME}" '%s' "${VALUE}"
This prevents all possible escaping issues.
If you use an invalid identifier for $VARNAME, the command will fail and return status code 2:
$ printf -v ';;;' '%s' foobar; echo $?
bash: printf: `;;;': not a valid identifier
2
eval "$var=\$val"
The argument to eval should always be a single string enclosed in either single or double quotes. All code that deviates from this pattern has some unintended behavior in edge cases, such as file names with special characters.
When the argument to eval is expanded by the shell, the $var is replaced with the variable name, and the \$ is replaced with a simple dollar. The string that is evaluated therefore becomes:
varname=$val
This is exactly what you want.
Generally, all expressions of the form $varname should be enclosed in double quotes, to prevent accidental expansion of filename patterns like *.c.
There are only two places where the quotes may be omitted since they are defined to not expand pathnames and split fields: variable assignments and case. POSIX 2018 says:
Each variable assignment shall be expanded for tilde expansion, parameter expansion, command substitution, arithmetic expansion, and quote removal prior to assigning the value.
This list of expansions is missing the pathname expansion and the field splitting. Sure, that's hard to see from reading this sentence alone, but that's the official definition.
Since this is a variable assignment, the quotes are not needed here. They don't hurt, though, so you could also write the original code as:
eval "$var=\"the value is \$val\""
Note that the second dollar is escaped using a backslash, to prevent it from being expanded in the first run. What happens is:
eval "$var=\"the value is \$val\""
The argument to the command eval is sent through parameter expansion and unescaping, resulting in:
varname="the value is $val"
This string is then evaluated as a variable assignment, which assigns the following value to the variable varname:
the value is value
The main point is that the recommended way to do this is:
eval "$var=\$val"
with the RHS referenced indirectly too. Since eval is used in the same environment, it will have $val bound, so deferring the expansion works, and by the time it is expanded it is just a plain variable reference. Since the $val variable has a known name, there are no issues with quoting, and it could have even been written as:
eval $var=\$val
But since it's better to always add quotes, the former is better, or
even this:
eval "$var=\"\$val\""
A better alternative in bash that was mentioned for the whole thing that
avoids eval completely (and is not as subtle as declare etc):
printf -v "$var" "%s" "$val"
Though this is not a direct answer to what I originally asked...
Newer versions of bash support something called "parameter transformation", documented in a section of the same name in bash(1).
"${value#Q}" expands to a shell-quoted version of "${value}" that you can re-use as input.
Which means the following is a safe solution:
eval "${varname}=${value@Q}"
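To see what the transformation produces (requires bash 4.4 or newer):
$ value=$'echo injected\npwd'
$ echo "${value@Q}"
$'echo injected\npwd'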
Just for completeness I also want to suggest the possible use of the bash built-in read. I've also made corrections regarding -d '' based on socowi's comments.
But much care needs to be exercised when using read to ensure the input is sanitized (-d '' reads until null termination and printf "...\0" terminates the value with a null), and that read itself is executed in the main shell where the variable is needed and not a sub-shell (hence the < <( ... ) syntax).
var=x; val=foo0shouldnotterminateearly
read -d'' -r "$var" < <(printf "$val\0")
echo $x # --> foo0shouldnotterminateearly
echo ${!var} # --> foo0shouldnotterminateearly
I tested this with \n, \t, \r, spaces, 0, etc., and it worked as expected on my version of bash.
The -r will avoid escaping \, so if you had the characters "\" and "n" in your value and not an actual newline, x will contain the two characters "\" and "n" also.
This method may not be as aesthetically pleasing as the eval or printf solutions, and would be more useful if the value is coming in from a file or other input file descriptor:
read -d '' -r "$var" < <( cat "$file" )
And here are some alternative suggestions for the < <() syntax:
read -d '' -r "$var" <<< "$val"$'\0'
read -d '' -r "$var" < <(printf "$val") # Apparently I didn't even need the \0; the printf process ending was enough to trigger the read to finish.
read -d '' -r "$var" <<< $(printf "$val")
read -d '' -r "$var" <<< "$val"
read -d '' -r "$var" < <(printf "$val")
Yet another way to accomplish this, without eval, is to use "read":
INDIRECT=foo
read -d '' -r "${INDIRECT}" <<<"$(( 2 * 2 ))"
echo "${foo}" # outputs "4"

Bash/Shell: Why am I getting the wrong output for if-else statements? [duplicate]

I'm writing a shell script that should be somewhat secure, i.e., does not pass secure data through parameters of commands and preferably does not use temporary files. How can I pass a variable to the standard input of a command?
Or, if it's not possible, how can I correctly use temporary files for such a task?
Passing a value to standard input in Bash is as simple as:
your-command <<< "$your_variable"
Always make sure you put quotes around variable expressions!
Be cautious: this will probably work only in bash, and will not work in sh.
Simple, but error-prone: using echo
Something as simple as this will do the trick:
echo "$blah" | my_cmd
Do note that this may not work correctly if $blah contains -n, -e, -E etc; or if it contains backslashes (bash's copy of echo preserves literal backslashes in absence of -e by default, but will treat them as escape sequences and replace them with corresponding characters even without -e if optional XSI extensions are enabled).
More sophisticated approach: using printf
printf '%s\n' "$blah" | my_cmd
This does not have the disadvantages listed above: all possible C strings (strings not containing NULs) are printed unchanged.
(cat <<END
$passwd
END
) | command
The cat is not really needed, but it helps to structure the code better and allows you to use more commands in parentheses as input to your command.
Note that echo "$var" | command operations mean that standard input is limited to the line(s) echoed. If you also want the terminal to be connected, then you'll need to be fancier:
{ echo "$var"; cat - ; } | command
( echo "$var"; cat - ) | command
This means that the first line(s) will be the contents of $var but the rest will come from cat reading its standard input. If the command does not do anything too fancy (try to turn on command line editing, or run like vim does) then it will be fine. Otherwise, you need to get really fancy - I think expect or one of its derivatives is likely to be appropriate.
The command line notations are practically identical - but the second semi-colon is necessary with the braces whereas it is not with parentheses.
This robust and portable way has already appeared in comments. It should be a standalone answer.
printf '%s' "$var" | my_cmd
or
printf '%s\n' "$var" | my_cmd
Notes:
It's better than echo, reasons are here: Why is printf better than echo?
printf "$var" is wrong. The first argument is format where various sequences like %s or \n are interpreted. To pass the variable right, it must not be interpreted as format.
Usually variables don't contain trailing newlines. The former command (with %s) passes the variable as it is. However tools that work with text may ignore or complain about an incomplete line (see Why should text files end with a newline?). So you may want the latter command (with %s\n) which appends a newline character to the content of the variable. Non-obvious facts:
Here string in Bash (<<<"$var" my_cmd) does append a newline.
Any method that appends a newline results in non-empty stdin of my_cmd, even if the variable is empty or undefined.
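Both facts are easy to observe with wc -c (byte counts; output padding may vary by platform):
$ var=
$ printf '%s' "$var" | wc -c
0
$ wc -c <<< "$var"
1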
I liked Martin's answer, but it has some problems depending on what is in the variable. This
your-command <<< """$your_variable"""
is better if your variable contains " or !.
As per Martin's answer, there is a Bash feature called Here Strings (which are themselves a variant of the more widely supported Here Documents feature):
3.6.7 Here Strings
A variant of here documents, the format is:
<<< word
The word is expanded and supplied to the command on its standard
input.
Note that Here Strings would appear to be Bash-only, so, for improved portability, you'd probably be better off with the original Here Documents feature, as per PoltoS's answer:
( cat <<EOF
$variable
EOF
) | cmd
Or, a simpler variant of the above:
(cmd <<EOF
$variable
EOF
)
You can omit ( and ), unless you want to have this redirected further into other commands.
Try this:
echo "$variable" | command
If you came here from a duplicate, you are probably a beginner who tried to do something like
"$variable" >file
or
"$variable" | wc -l
where you obviously meant something like
echo "$variable" >file
echo "$variable" | wc -l
(Real beginners also forget the quotes; usually use quotes unless you have a specific reason to omit them, at least until you understand quoting.)

removing backslash with tr

So I'm removing special characters from filenames and replacing them with spaces. I have it all working apart from files with single backslashes contained therein.
Note these files are created in the Finder on OS X
old_name="testing\this\folder"
new_name=$(echo $old_name | tr '<>:\\#%|?*' ' ');
This results in new_name being "testing hisolder"
How can I just removed the backslashes and not the preceding character?
This results in new_name being "testing hisolder"
This string looks like the result of echo -e "testing\this\folder", because \t and \f are actually replaced with the tabulation and form feed control characters.
Maybe you have an alias like alias echo='echo -e', or maybe the implementation of echo in your version of the shell interprets backslash escapes:
POSIX does not require support for any options, and says that the
behavior of ‘echo’ is implementation-defined if any STRING contains a
backslash or if the first argument is ‘-n’. Portable programs can use
the ‘printf’ command if they need to omit trailing newlines or output
control characters or backslashes.
(from the info page)
So you should use printf instead of echo in new software. In particular, echo $old_name should be replaced with printf %s "$old_name".
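With that substitution, the pipeline from the question behaves as intended:
old_name="testing\this\folder"
new_name=$(printf '%s' "$old_name" | tr '<>:\\#%|?*' ' ')
echo "$new_name"   # --> testing this folder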
There is a good explanation in this discussion, for instance.
No need for printf
As @mklement0 suggested, you can avoid the pipe by means of the Bash here string:
tr '<>:\\#%|?*' ' ' <<<"$old_name"
Ruslan's excellent answer explains why your command may not be working for you and offers a robust, portable solution.
tl;dr:
You probably ran your code with sh rather than bash (even though on macOS sh is Bash in disguise), or you had shell option xpg_echo explicitly turned on.
Use printf instead of echo for portability.
In Bash, with the default options and using the echo builtin, your command should work as-is (except that you should double-quote $old_name for robustness), because echo by default does not expand escape sequences such as \t in its operands.
However, Bash's echo can be made to expand control-character escape sequences:
explicitly, by executing shopt -s xpg_echo
implicitly, if you run Bash as sh or with the --posix option (which, among other options and behavior changes, activates xpg_echo)
Thus, your symptom may have been caused by running your code from a script with shebang line #!/bin/sh, for instance.
However, if you're targeting sh, i.e., if you're writing a portable script, then echo should be avoided altogether for the very reason that its behavior differs across shells and platforms - see Ruslan's printf solution.
As an aside: perhaps a more robust approach to your tr command is a whitelisting approach: stating only the characters that are explicitly allowed in your result, and excluding others with the -C option:
old_name='testing\this\folder'
new_name=$(printf '%s' "$old_name" | tr -C '[:alnum:]_-' ' ')
That way, any characters that aren't either letters, numbers, _, or - are replaced with a space.
With Bash, you can use parameter expansion:
$ old_name="testing\this\folder"
$ new_name=${old_name//[<>:\\#%|?*]/ }
$ echo $new_name
testing this folder
For more, please refer to the Bash manual on shell parameter expansion.
I think your test case is missing proper escaping for \, so you're not really testing the case of a backslash contained in a string.
This worked for me:
old_name='testing\\this\\folder'
new_name=$(echo $old_name | tr '<>:\\#%|?*' ' ');
echo $new_name
# testing this folder

Correctly allow word splitting of command substitution in bash

I write, maintain and use a healthy amount of bash scripts. I would consider myself a bash hacker and strive to someday be a bash ninja ( need to learn more awk first ). One of the most important features/frustrations of bash to understand is how quotes, and subsequent parameter expansion, work. This is well documented, and for a good reason; many pitfalls, bugs and newbie-traps exist in the mysterious world of quoted parameter expansion and word splitting. For this reason, the advice is to "Double quote everything," but what if I want word splitting to occur?
In multiple style guides I can not find an example of safe and proper use of word splitting after command substitution.
What is the correct way to use unquoted command substitution?
Example:
I don't need help getting this command working, but it seems to be a violation of established patterns, if you would like to give feedback on this command, please keep it in comments
docker stats $(docker ps | awk '{print $NF}' | grep -v NAMES)
The command substitute returns output such as:
container-1 container-3 excitable-newton
This one-liner uses the command substitution to spit out the names of each of my running docker containers and then feeds them, with word splitting, as separate inputs to the docker stats command, which takes an arbitrary-length list of container names and gives back some info about them.
If I used:
docker stats "$(docker ps | awk '{print $NF}' | grep -v NAMES)"
There would be one string of newline separated container names passed to docker stats.
This seems like a perfect example of when I would want word splitting, but shellcheck disagrees, is this somehow unsafe? Is there an established pattern for using word-splitting after expansion or substitution?
The safe way to capture output from one command and pass it to another is to temporarily capture the output in an array. This allows splitting on arbitrary delimiters and prevents unintentional splitting or globbing while capturing output as more than one string to be passed on to another command.
If you want to read a space-separated string into an array, use read -a:
read -r -a names < <(docker ps | awk '{print $NF}' | grep -v NAMES)
printf 'Found name: %s\n' "${names[@]}"
Unlike the unquoted-expansion approach, this doesn't expand globs. Thus, foo[bar] can't be replaced with a filesystem entry named foob, or with an empty string if no such filesystem entry exists and the nullglob shell option is set. (Likewise, * will no longer be replaced with a list of files in the current directory).
To go into detail regarding behavior: read -r -a reads up to a delimiter passed as the first character of the option argument following -d (if given), or a NUL if that option argument is 0 bytes, and splits the results into fields based on characters within IFS -- a set which, by default, contains the newline, the tab, and the space; it then assigns those split results to an array.
This behavior does not meaningfully vary based on shell-local configuration, except for IFS, which can be modified scoped to the single command.
mapfile -t and readarray -t are similarly consistent in behavior, and likewise recommended if portability constraints do not prevent their use.
By contrast, array=( $string ) is much more dependent on the shell's configuration and settings, and will behave badly if the shell's configuration is left at defaults:
When using array=( $string ), if set -f is not set, each word created by splitting $string is evaluated as a glob, with further variance in behavior depending on the shopt settings nullglob (which would cause a pattern which didn't expand to any contents to result in an empty set, rather than the default of expanding to the glob expression itself), failglob (which would cause a pattern which didn't expand to any contents to result in a failure), extglob, dotglob and others.
When using array=( $string ), the value of IFS used for the split operation cannot be easily and reliably altered in a manner scoped to this single operation. By contrast, one can run IFS=: read to force read to split only on :s without modifying the value of IFS outside the scope of that single value; no equivalent for array=( $string ) exists without storing and re-setting IFS (which is an error-prone operation; some common idioms [such as assignment to oIFS or a similar variable name] operate contrary to intent in common scenarios, such as failing to reproduce an unset or empty IFS at the end of the block to which the temporary modification is intended to apply).
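For instance, a minimal sketch of that scoped-IFS idiom:
IFS=: read -r user pass uid rest <<< 'root:x:0:0:root:/root:/bin/bash'
echo "$uid"   # --> 0, and the global IFS is untouched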
Thanks to @I'L'I pointing to an example of a valid exception to the "Quote Everything" rule, my code does appear to be an exception to the rule.
In my particular use case, using docker container names, the risk of accidental globbing or expansion is low due to the constraints on container names. However, @Charles Duffy provided a surefire and safe way to word-split one command's output before feeding it into the next command: reading the first output into an array using the bash built-in read (I found readarray better suited my case).
readarray -t names < <(docker ps | awk '{print $NF}' | grep -v NAMES)
docker stats "${names[#]}"
This pattern allows for the output from the first command to be fed to the second command as properly split, separate arguments while avoiding unwanted globbing or splitting. Unfortunately my slick one-liner will perish in favor of safety.

Getting quoted-dollar-at ( "$@" ) behaviour for other variable expansion?

The shell has a great feature, where it'll preserve argument quoting across variable expansion when you use "$@", such that the script:
for f in "$@"; do echo "$f"; done
when invoked with arguments:
"with spaces" '$and $(metachars)'
will print, literally:
with spaces
$and $(metachars)
This isn't the normal behaviour of expansion of a quoted string, it seems to be a special case for "$@".
Is there any way to get this behaviour for other variables? In the specific case I'm interested in, I want to safely expand $SSH_ORIGINAL_COMMAND in a command= specifier in a restricted public key entry, without having to worry about spaces in arguments, metacharacters, etc.
"$SSH_ORIGINAL_COMMAND" expands like "$*" would, i.e. a naïve expansion that doesn't add any quoting around separate arguments.
Is the information required for "$@"-style expansion simply not available to the shell in this case, by the time it gets the env var SSH_ORIGINAL_COMMAND? So I'd instead need to convince sshd to quote the arguments?
The answer to this question is making me wonder if it's possible at all.
You can get similar "quoted dollar-at" behavior for arbitrary arrays using "${YOUR_ARRAY_HERE[@]}" syntax for bash arrays. Of course, that's no complete answer, because you still have to break the string into multiple array elements according to the quotes.
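For example, reusing the arguments from the question with an array:
args=("with spaces" '$and $(metachars)')
for f in "${args[@]}"; do echo "$f"; done
# with spaces
# $and $(metachars)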
One thought was to use bash -x, which renders expanded output, but only if you actually run the command; it doesn't work with -n, which prevents you from actually executing the commands in question. Likewise you could use eval or bash -c along with set -- to manage the quote removal, performing expansion on the outer shell and quote removal on the inner shell, but that would be extremely hard to bulletproof against executing arbitrary code.
As an end run, use xargs instead. xargs handles single and double quotes. This is a very imperfect solution, because xargs treats backslash-escaped characters very differently than bash does and fails entirely to handle semicolons and so forth, but if your input is relatively predictable it gets you most of the way there without forcing you to write a full shell parser.
SSH_ORIGINAL_COMMAND='foo "bar baz" $quux'
# Build out the parsed array.
# Bash 4 users may be able to do this with readarray or mapfile instead.
# You may also choose to null-terminate if newlines matter.
COMMAND_ARRAY=()
while IFS= read -r line; do
    COMMAND_ARRAY+=("$line")
done < <(xargs -n 1 <<< "$SSH_ORIGINAL_COMMAND")
# Demonstrate working with the array.
N=0
for arg in "${COMMAND_ARRAY[#]}"; do
echo "COMMAND_ARRAY[$N]: $arg"
((N++))
done
Output:
COMMAND_ARRAY[0]: foo
COMMAND_ARRAY[1]: bar baz
COMMAND_ARRAY[2]: $quux
