printf, ignoring excess arguments? - bash

I noticed today that Bash's printf has a -v option:
-v var assign the output to shell variable VAR rather than
display it on the standard output
If I invoke it like this, it works:
$ printf -v var "Hello world"
$ printf "$var"
Hello world
When the input comes from a pipe, it does not work:
$ grep "Hello world" test.txt | xargs printf -v var
-vprintf: warning: ignoring excess arguments, starting with `var'
$ grep "Hello world" test.txt | xargs printf -v var "%s"
-vprintf: warning: ignoring excess arguments, starting with `var'

xargs will invoke /usr/bin/printf (or wherever that binary is installed on your system). It will not invoke bash's builtin function. And only a builtin (or sourcing a script or similar) can modify the shell's environment.
Even if it could call bash's builtin, the xargs in your example runs in a subshell. The subshell cannot modify its parent's environment anyway. So what you're trying cannot work.
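You can see the two printfs for yourself, and reproduce the warning without xargs, like this (the path of the binary may differ on your system):
$ type -a printf
printf is a shell builtin
printf is /usr/bin/printf
$ /usr/bin/printf -v var "Hello world"
-vprintf: warning: ignoring excess arguments, starting with `var'
The external printf has no -v option, so it treats -v as the format string and complains about the leftover arguments, which is exactly the message xargs produced for you.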
A few options I see, if I understand your example correctly. Sample data:
$ cat input
abc other stuff
def ignored
cba more stuff
Simple variable (a bit tricky depending on what exactly you want):
$ var=$(grep a input)
$ echo $var
abc other stuff cba more stuff
$ echo "$var"
abc other stuff
cba more stuff
With an array, if you want the individual words in the array:
$ var=($(grep a input))
$ echo "${var[0]}"-"${var[1]}"
abc-other
Or if you want the whole lines in each array element:
$ IFS=$'\n' var=($(grep a input)) ; unset IFS
$ echo "${var[0]}"-"${var[1]}"
abc other stuff-cba more stuff
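If your bash has mapfile (bash 4.0 or newer), you can also read each line straight into an array element without touching IFS; a sketch with the same sample data:
$ mapfile -t var < <(grep a input)
$ echo "${var[0]}"-"${var[1]}"
abc other stuff-cba more stuff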

There are two printfs: one is a shell builtin, which is invoked if you just run printf, and the other is a regular binary, usually /usr/bin/printf. The latter doesn't take a -v argument, hence the error message. Since printf is an argument to xargs here, the binary is run, not the shell builtin. Additionally, since it's at the receiving end of a pipeline, it is run as a subprocess. Variables can only be inherited from parent to child process, not the other way around, so even if the printf binary could modify the environment, the change wouldn't be visible to the parent process. So there are two reasons why your command cannot work. But you can always do var=$(something | bash -c 'some operation using builtin printf').
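One concrete way to get the same effect is to capture the pipeline's output first and only then format it with the builtin in the current shell; a sketch (GNU grep's -m 1 just stops after the first matching line):
$ line=$(grep -m 1 "Hello world" test.txt)
$ printf -v var '%-25s' "$line"
$ echo "${#var}"
25
Here printf -v runs in the parent shell, so var (padded to 25 characters) is available afterwards.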

Mat gives an excellent explanation of what's going on and why.
If you want to iterate over the output of a command and set a variable to successive values using Bash's sprintf-style printf feature (-v), you can do it like this:
grep "Hello world" test.txt | xargs bash -c 'printf -v var "%-25s" "$#"; do_something_with_formatted "$var"' _ {} \;

Related

Bash get the command that is piping into a script

Take the following example:
ls -l | grep -i readme | ./myscript.sh
What I am trying to do is get ls -l | grep -i readme as a string variable in myscript.sh. So essentially I am trying to get the whole command before the last pipe to use inside myscript.sh.
Is this possible?
No, it's not possible.
At the OS level, pipelines are implemented with the mkfifo(), dup2(), fork() and execve() syscalls. This doesn't provide a way to tell a program what the commands connected to its stdin are. Indeed, there's not guaranteed to be a string representing a pipeline of programs being used to generate stdin at all, even if your stdin really is a FIFO connected to another program's stdout; it could be that that pipeline was generated by programs calling execve() and friends directly.
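You can see how little information survives on Linux: all a script can learn about its stdin is that it is an anonymous pipe (assuming /proc is mounted; the inode number will vary):
$ ls -l | grep -i readme | readlink /proc/self/fd/0
pipe:[123456]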
The best available workaround is to invert your process flow.
It's not what you asked for, but it's what you can get.
#!/usr/bin/env bash
printf -v cmd_str '%q ' "$@" # generate a shell command representing our arguments
while IFS= read -r line; do
  printf 'Output from %s: %s\n' "$cmd_str" "$line"
done < <("$@") # actually run those arguments as a command, and read from it
...and then have your script start the things it reads input from, rather than receiving them on stdin.
...thereafter, ./yourscript ls -l, or ./yourscript sh -c 'ls -l | grep -i readme'. (Of course, never use this except as an example; see ParsingLs).
It can't be done generally, but using the history command in bash it can maybe sort of be done, provided certain conditions are met:
history has to be turned on.
Only one shell has been running or accepting new commands (or, failing that, running myscript.sh) since the start of myscript.sh.
Since command lines with leading spaces are, by default, not saved to the history, the invoking command for myscript.sh must have no leading spaces; or that default must be changed -- see Get bash history to remember only the commands run with space prefixed.
The invoking command needs to end with a &, because without it the new command line wouldn't be added to the history until after myscript.sh was completed.
The script needs to be a bash script, (it won't work with /bin/dash), and the calling shell needs a little prep work. Sometime before the script is run first do:
shopt -s histappend
PROMPT_COMMAND="history -a; history -n"
...this makes the bash history heritable. (Code swiped from unutbu's answer to a related question.)
Then myscript.sh might go:
#!/bin/bash
history -w
printf 'calling command was: %s\n' \
"$(history | rev |
grep "$0" ~/.bash_history | tail -1)"
Test run:
echo googa | ./myscript.sh &
Output, (minus the "&" associated cruft):
calling command was: echo googa | ./myscript.sh &
The cruft can be halved by changing "&" to "& fg", but the resulting output won't include the "fg" suffix.
I think you should pass it as one string parameter like this
./myscript.sh "$(ls -l | grep -i readme)"
I think that it is possible; have a look at this example:
#!/bin/bash
result=""
while read line; do
result=$result"${line}"
done
echo $result
Now run this script using a pipe, for example:
ls -l /etc | ./script.sh
I hope that will be helpful for you :)

grep output different in bash script

I am creating a bash script that will simply use grep to look through a bunch of logs for a certain string.
Something interesting happens though.
For the purposes of testing, all of the log files are named test1.log, test2.log, test3.log, etc.
When using the grep command:
grep -oHnR TEST Logs/test*
The output contains all instances from all files in the folder as expected.
But when the same command is contained in the bash script below:
#!/bin/bash
#start
grep -oHnR $1 $2
#end
The output displays the instances from only 1 file.
When running the script I am using the following command:
bash test.bash TEST Logs/test*
Here is an example of the expected output (what occurs when simply using grep):
Logs/test2.log:8:TEST
Logs/test2.log:20:TEST
Logs/test2.log:41:TEST
Logs/test.log:2:TEST
Logs/test.log:18:TEST
and here is an example of the output received when using the bash script:
Logs/test2.log:8:TEST
Logs/test2.log:20:TEST
Logs/test2.log:41:TEST
Can someone explain to me why this happens?
When you call the line
bash test.bash TEST Logs/test*
this will be translated by the shell to
bash test.bash TEST Logs/test1.log Logs/test2.log Logs/test3.log Logs/test4.log
(if you have four log files).
The command line parameters TEST, Logs/test1.log, Logs/test2.log, etc. will be given the names $1, $2, $3, etc.; $1 will be TEST, $2 will be Logs/test1.log.
By using only $2, you ignore the remaining parameters and pass just one log file to grep.
A correct version would be this:
#!/bin/bash
#start
grep -oHnR "$#"
#end
This will pass all the parameters properly and also take care of nastinesses like spaces in file names (your version would have had trouble with these).
To understand what's happening, you can use a simpler script:
#!/bin/bash
echo $1
echo $2
That outputs the first two arguments, as you asked for.
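Running that simpler script with the same arguments makes the problem visible. Assuming it is saved as args.bash (a name made up for this example) and that Logs/test.log sorts first in the glob expansion, you would see something like:
$ bash args.bash TEST Logs/test*
TEST
Logs/test.log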
You want to use the first argument, and then use all the rest as input files. So use shift like this:
#!/bin/bash
search=$1
shift
echo "$1"
echo "$#"
Notice also the use of double quotes.
In your case, because you want the search string and the filenames to be passed to grep in the same order, you don't even need to shift:
#!/bin/bash
grep -oHnR -e "$@"
(I added the -e in case the search string begins with -)
The unquoted * is expanded by the shell (globbing) when you call the script.
Using set -x in the script to print the commands as they run makes this clearer.
$ ./greptest.sh TEST test*
++ grep -oHnR TEST test1.log
$ ./greptest.sh TEST "test*"
++ grep -oHnR TEST test1.log test2.log test3.log
In the first case, the calling shell expands the * into the list of file names before the script runs, so only the first of them ends up in $2; in the second case the literal test* reaches the script, and because $2 is unquoted there, the glob is expanded inside the script, so grep sees every matching file. In the first case you actually have more than 2 args (each expanded filename becomes its own argument); adding echo $# to the script shows this too:
$ ./greptest.sh TEST test*
++ grep -oHnR TEST test1.log
++ echo 4
4
$ ./greptest.sh TEST "test*"
++ grep -oHnR TEST test1.log test2.log test3.log
++ echo 2
2
You probably want to escape the wildcard on your bash invocation:
bash test.bash TEST Logs/test\*
That way the wildcard reaches the script unexpanded, and because $2 is unquoted inside the script, the glob is expanded there instead, so grep sees all the matching files. Otherwise the calling shell expands it to every file in the Logs dir whose name starts with test, and your script passes only the first one on to grep.
Alternatively, change your script to allow more than one file on the command line:
#!/bin/bash
hold=$1
shift
grep -oHnR "$hold" "$@"

Either getting original return value from xargs or simulate xargs

I am working with bash. I have a file F containing the command-line arguments for a Java program, and I need to store both outputs of the Java program, i.e., the output on standard output and the exit value. Storing the standard output works via
cat F | xargs java program > Output
But xargs does not give access to the exit-code of the Java program.
So well, I split it, running the program twice, once for standard output, once for the exit code --- but getting the exit code and running it correctly seems impossible. One might try
java program $(cat F)
but that doesn't work if F contains for example " ", namely one command-line argument for program which is a space. The problem is the expansion of the argument $(cat F).
Now I don't see a way to get around that problem? I don't want "$(cat F)", since I want that $(cat F) expands into many strings --- but I don't want further expansion of these strings.
If on the other hand there would be a better xargs, giving access to the original exit value, that would solve the problem, but I am not aware of that.
Does this do what you want?
cat F | xargs bash -c 'java program "$@"; echo "Returned: $?"' - > Output
Or, as rici correctly points out, avoid the UUOC:
xargs bash -c 'java program "$@"; echo "Returned: $?"' - < F > Output
Alternatively something like (though I haven't thought through all the ramifications of doing this so there may be a reason this is a bad idea).
{ sed -e 's/^/java program /' F | bash -s; echo "Returned $?"; } > Output
This lets you store the return code in a variable, which the xargs versions do not (at least not outside the xargs-spawned shell).
sed -e 's/^/java program /' F | bash -s > Output; ret=$?
To use a ${program} shell variable just expand it directly.
xargs bash -c 'java '"${program}"' "$@"; echo "Returned: $?"' - < F > Output
sed -e 's/^/java '"${program}"' /' F | bash -s > Output; ret=$?
Just beware of characters that are "magic" in the replacement of the s/// command.
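If you really want the exit status back out of the xargs variants as well, one workaround (a sketch, using a hypothetical temporary file and the made-up names rc_file/RC_FILE) is to have the inner shell write the status somewhere the parent can read afterwards:
rc_file=$(mktemp)
RC_FILE="$rc_file" xargs bash -c 'java program "$@"; echo "$?" > "$RC_FILE"' - < F > Output
ret=$(cat "$rc_file"); rm -f "$rc_file"
The RC_FILE variable is set in the environment of xargs only, and the bash that xargs spawns inherits it.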
I'm afraid the question is really not very clear, so I will make the assumptions here explicit:
The file F has one argument per line, with all whitespace other than newline characters being significant, and with no need to replace backslash escapes such as \t.
You only need to invoke the java program once, with all of the arguments.
You need the exit status to be preserved.
This can be done quite easily in bash by reading F into an array with mapfile:
# Invoked as: do_the_work program < F > output
do_the_work() {
  local -a args
  mapfile -t args
  java "$@" "${args[@]}"
}
The status return of that function is precisely the status return of the java executable, so you could capture it immediately after the call:
do_the_work my_program
rc=$?
For convenience, the function allows you to also specify arguments on the command line; it uses "$@" to pass all the command-line arguments before passing the arguments read from stdin.
If you have GNU Parallel (and do not mind extra output on STDERR):
cat F | parallel -Xj1 --halt 1 java program > Output
echo $?

Why does cat exit a shell script, but only when it's fed by a pipe?

Why does cat exit a shell script, but only when it's fed by a pipe?
Case in point, take this shell script called "foobar.sh":
#! /bin/sh
echo $#
echo $@
cat $1
sed -e 's|foo|bar|g' $1
And a text file called "foo.txt" which contains only one line:
foo
Now if I type ./foobar.sh foo.txt on the command line, then I'll get this expected output:
1
foo.txt
foo
bar
However if I type cat foo.txt | ./foobar.sh then surprisingly I only get this output:
0
foo
I don't understand. If the number of arguments reported by $# is zero, then how can cat $1 still return foo? And, that being the case, why doesn't sed -e 's|foo|bar|g' $1 return anything since clearly $1 is foo?
This seems an awful lot like a bug, but I'm assuming it's magic instead. Please explain!
UPDATE
Based on the given answer, the following script gives the expected output, assuming a one-line foo.txt:
#! /bin/sh
if [ $# ]
then
yay=$(cat $1)
else
read yay
fi
echo $yay | cat
echo $yay | sed -e 's|foo|bar|g'
No, $1 is not "foo". $1 is empty, i.e., undefined/nothing.
Unlike a programming language, variables in the shell are quite dumbly and literally replaced, and the resulting commands textually executed (well, sorta kinda). In this case, "cat $1" becomes just "cat ", which will take input from stdin. That's terribly convenient to your execution since you've kindly provided "foo" on stdin via your pipe!
See what's happening?
sed likewise will read from stdin, but is already on end of stream, so exits.
When you don't give an argument to cat, it reads from stdin. When $1 isn't set, cat $1 is the same as a plain cat, which reads the text you piped in (cat foo.txt).
Then the sed command runs, and same as cat, it reads from stdin because it has no filename argument. cat has already consumed all of stdin. There's nothing left to read, so sed quits without printing anything.
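If you want the script to behave the same whether it is given a filename or fed by a pipe, one common trick is to read the input once up front, using - to mean stdin; a minimal sketch:
#! /bin/sh
input=$(cat "${1:--}")   # "-" makes cat read stdin when no argument is given
printf '%s\n' "$input"
printf '%s\n' "$input" | sed -e 's|foo|bar|g'
Both ./foobar.sh foo.txt and cat foo.txt | ./foobar.sh then print foo followed by bar.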

Passing multiple arguments to a UNIX shell script

I have the following (bash) shell script, that I would ideally use to kill multiple processes by name.
#!/bin/bash
kill `ps -A | grep $* | awk '{ print $1 }'`
However, while this script works if one argument is passed:
end chrome
(the name of the script is end)
it does not work if more than one argument is passed:
$end chrome firefox
grep: firefox: No such file or directory
What is going on here?
I thought the $* passes multiple arguments to the shell script in sequence. I'm not mistyping anything in my input - and the programs I want to kill (chrome and firefox) are open.
Any help is appreciated.
Remember what grep does with multiple arguments - the first is the word to search for, and the remainder are the files to scan.
Also remember that $*, "$*", and $@ all lose track of white space in arguments, whereas the magical "$@" notation does not.
So, to deal with your case, you're going to need to modify the way you invoke grep. You either need to use grep -F (aka fgrep) with options for each argument, or you need to use grep -E (aka egrep) with alternation. In part, it depends on whether you might have to deal with arguments that themselves contain pipe symbols.
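For instance, inside the end script the grep -E route might be assembled like this (a sketch; it breaks as soon as a name contains regex metacharacters, which is part of why the loop below is safer):
pattern=$(IFS='|'; printf '%s' "$*")    # e.g. chrome|firefox
kill $(ps -A | grep -E "$pattern" | awk '{ print $1 }')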
It is surprisingly tricky to do this reliably with a single invocation of grep; you might well be best off tolerating the overhead of running the pipeline multiple times:
for process in "$#"
do
kill $(ps -A | grep -w "$process" | awk '{print $1}')
done
If the overhead of running ps multiple times like that is too painful (it hurts me to write it - but I've not measured the cost), then you probably do something like:
case $# in
(0) echo "Usage: $(basename $0 .sh) procname [...]" >&2; exit 1;;
(1) kill $(ps -A | grep -w "$1" | awk '{print $1}');;
(*) tmp=${TMPDIR:-/tmp}/end.$$
trap "rm -f $tmp.?; exit 1" 0 1 2 3 13 15
ps -A > $tmp.1
for process in "$#"
do
grep "$process" $tmp.1
done |
awk '{print $1}' |
sort -u |
xargs kill
rm -f $tmp.1
trap 0
;;
esac
The use of plain xargs is OK because it is dealing with a list of process IDs, and process IDs do not contain spaces or newlines. This keeps the simple code for the simple case; the complex case uses a temporary file to hold the output of ps and then scans it once per process name in the command line. The sort -u ensures that if some process happens to match all your keywords (for example, grep -E '(firefox|chrome)' would match both), only one signal is sent.
The trap lines etc ensure that the temporary file is cleaned up unless someone is excessively brutal to the command (the signals caught are HUP, INT, QUIT, PIPE and TERM, aka 1, 2, 3, 13 and 15; the zero catches the shell exiting for any reason). Any time a script creates a temporary file, you should have similar trapping around the use of the file so that it will be cleaned up if the process is terminated.
If you're feeling cautious and you have GNU Grep, you might add the -w option so that the names provided on the command line only match whole words.
All the above will work with almost any shell in the Bourne/Korn/POSIX/Bash family (you'd need to use backticks with strict Bourne shell in place of $(...), and the leading parenthesis on the conditions in the case are also not allowed with Bourne shell). However, you can use an array to get things handled right.
n=0
unset args # Force args to be an empty array (it could be an env var on entry)
for i in "$#"
do
args[$((n++))]="-e"
args[$((n++))]="$i"
done
kill $(ps -A | fgrep "${args[#]}" | awk '{print $1}')
This carefully preserves spacing in the arguments and uses exact matches for the process names. It avoids temporary files. The code shown doesn't validate for zero arguments; that would have to be done beforehand. Or you could add a line args[0]='/collywobbles/' or something similar to provide a default - non-existent - command to search for.
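With end chrome firefox, the array expands so that the final line effectively runs (a sketch of the expansion):
kill $(ps -A | fgrep -e chrome -e firefox | awk '{print $1}')
fgrep with multiple -e options matches a line if it contains any of the patterns, so both sets of processes get signalled.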
To answer your question, what's going on is that $* expands to a parameter list, and so the second and later words look like files to grep(1).
To process them in sequence, you have to do something like:
for i in $*; do
echo $i
done
Usually, "$#" (with the quotes) is used in place of $* in cases like this.
See man sh, and check out killall(1), pkill(1), and pgrep(1) as well.
Look into pkill(1) instead, or killall(1) as khachik comments.
$* should be rarely used. I would generally recommend "$@". Shell argument parsing is relatively complex and easy to get wrong. Usually the way you get it wrong is to end up having things evaluated that shouldn't be.
For example, if you typed this:
end '`rm foo`'
you would discover that if you had a file named 'foo' you don't anymore.
Here is a script that will do what you are asking to have done. It fails if any of the arguments contain '\n' or '\0' characters:
#!/bin/sh
kill $(ps -A | fgrep -e "$(for arg in "$@"; do echo "$arg"; done)" | awk '{ print $1; }')
I vastly prefer $(...) syntax for doing what backtick does. It's much clearer, and it's also less ambiguous when you nest things.
