Using xargs to assign stdin to a variable - bash

All that I really want to do is make sure everything in a pipeline succeeded and assign the last stdin to a variable. Consider the following dumbed down scenario:
x=`exit 1|cat`
When I run declare -a, I see this:
declare -a PIPESTATUS='([0]="0")'
I need some way to notice the exit 1, so I converted it to this:
exit 1|cat|xargs -I {} x={}
And declare -a gave me:
declare -a PIPESTATUS='([0]="1" [1]="0" [2]="0")'
That is what I wanted, so I tried to see what would happen if the exit 1 didn't happen:
echo 1|cat|xargs -I {} x={}
But it fails with:
xargs: x={}: No such file or directory
Is there any way to have xargs assign {} to x? What about other methods of having PIPESTATUS work and assigning the stdin to a variable?
Note: these examples are dumbed down. I'm not really doing an exit 1, echo 1 or a cat, but used these commands to simplify so we can focus on my particular issue.

When you use backticks (or the preferred $()) you're running those commands in a subshell. The PIPESTATUS you're getting is for the assignment rather than the piped commands in the subshell.
When you use xargs, it knows nothing about the shell so it can't make variable assignments.
Try set -o pipefail then you can get the status from $?.
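A minimal sketch of that, reusing the dumbed-down pipeline from the question:
set -o pipefail
x=$(exit 1 | cat)    # the pipeline inside $() now reports the failing left-hand side
echo $?              # prints 1 instead of 0
set +o pipefail      # optionally restore the default afterwards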

xargs is run in a child process, as are all the commands you call, so they can't affect the environment of your shell.
You might be able to do something with named pipes (mkfifo), or possibly bash's read builtin?
EDIT:
Maybe just redirect the output to a file, then you can use PIPESTATUS:
command1 | command2 | command3 >/tmp/tmpfile
## Examine PIPESTATUS
X=$(cat /tmp/tmpfile)
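Spelling that out a bit (command1, command2 and command3 are placeholders, as above):
command1 | command2 | command3 >/tmp/tmpfile
status=("${PIPESTATUS[@]}")    # copy it right away; the next command overwrites PIPESTATUS
if [ "${status[0]}" -ne 0 ]; then
    echo "command1 failed with status ${status[0]}" >&2
fi
X=$(cat /tmp/tmpfile)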

How about ...
read x <<<"$(echo 1)"
read x < <(echo 1)
echo "$x"

Why not just populate a new array?
IFS=$'\n' read -r -d '' -a result < <(echo a | cat | cat; echo "PIPESTATUS='${PIPESTATUS[*]}'" )
IFS=$'\n' read -r -d '' -a result < <(echo a | exit 1 | cat; echo "PIPESTATUS='${PIPESTATUS[*]}'" )
echo "${#result[@]}"
echo "${result[@]}"
echo "${result[0]}"
echo "${result[1]}"
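Since the last array element carries the statuses, a rough sketch of pulling them back out (variable names here are just for illustration):
statusline=${result[${#result[@]}-1]}       # e.g. PIPESTATUS='0 1 0'
statusline=${statusline#PIPESTATUS=\'}      # strip the label and the opening quote
statusline=${statusline%\'}                 # strip the closing quote
read -r -a statuses <<<"$statusline"
[ "${statuses[1]}" -eq 0 ] || echo "second command in the pipeline failed" >&2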

There are already a few helpful solutions. It turns out that I actually had an example that matches the question as framed above; close-enough anyway.
Consider this:
XX=$(ls -l *.cpp | wc -l | xargs -I{} echo {})
echo $XX
3
Meaning that I had 3 .cpp files in my working directory. Now $XX is 3 and I can make use of that result in my script. It is contrived, because I don't actually need the xargs in this example. It works though.
In the example from the question ...
x=`exit 1|cat`
I don't think that will give you what was specified; exit will quit the sub-shell before the cat even gets a mention. Also on that note, given a PIPESTATUS like
declare -a PIPESTATUS='([0]="0")'
I might start with something like
x=$?
x now has the status from the last command.

Assign each line of input to an array, e.g. all python files in a directory
declare -a pyFiles=($(ls -l *.py | awk '{print $9}'))
where $9 is the ninth field of ls -l, corresponding to the filename
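If all you need is the list of .py file names, a glob avoids parsing ls output altogether (the ls/awk approach breaks on names containing spaces); a minimal alternative:
shopt -s nullglob      # expand to an empty list instead of a literal *.py when nothing matches
pyFiles=(*.py)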

Related

xargs output buffering -P parallel

I have a bash function that I call in parallel using xargs -P like so:
echo ${list} | xargs -n 1 -P 24 -I# bash -l -c 'myAwesomeShellFunction #'
Everything works fine, but the output is messed up for obvious reasons (no buffering).
I'm trying to figure out a way to buffer the output effectively. I was thinking I could use awk, but I'm not good enough to write such a script and I can't find anything worthwhile on Google. Can someone help me write this "output buffer" in sed or awk? Nothing fancy, just accumulate output and spit it out after the process terminates. I don't care about the order in which the shell functions execute, I just need their output buffered... Something like:
echo ${list} | xargs -n 1 -P 24 -I# bash -l -c 'myAwesomeShellFunction # | sed -u ""'
P.S. I tried to use stdbuf as per
https://unix.stackexchange.com/questions/25372/turn-off-buffering-in-pipe but it did not work; I specified buffering on o and e but the output is still unbuffered:
echo ${list} | xargs -n 1 -P 24 -I# stdbuf -i0 -oL -eL bash -l -c 'myAwesomeShellFunction #'
Here's my first attempt; it only captures the first line of output:
$ bash -c "echo stuff;sleep 3; echo more stuff" | awk '{while (( getline line) > 0 )print "got ",$line;}'
$ got stuff
This isn't quite atomic if your output is longer than a page (4kb typically), but for most cases it'll do:
xargs -P 24 bash -c 'for arg; do printf "%s\n" "$(myAwesomeShellFunction "$arg")"; done' _
The magic here is the command substitution: $(...) creates a subshell (a fork()ed-off copy of your shell), runs the code ... in it, and then reads that in to be substituted into the relevant position in the outer script.
Note that we don't need -n 1 (if you're dealing with a large number of arguments -- for a small number it may improve parallelization), since we're iterating over as many arguments as each of your 24 parallel bash instances is passed.
If you want to make it truly atomic, you can do that with a lockfile:
# generate a lockfile, arrange for it to be deleted when this shell exits
lockfile=$(mktemp -t lock.XXXXXX); export lockfile
trap 'rm -f "$lockfile"' 0
xargs -P 24 bash -c '
  for arg; do
    {
      output=$(myAwesomeShellFunction "$arg")
      flock -x 99
      printf "%s\n" "$output"
    } 99>"$lockfile"
  done
' _

xargs not working with built in shell functions

I am trying to speed up the processing of a database, so I migrated to xargs. But I'm seriously stuck: piping a list of arguments to xargs does not work when the command invoked by xargs is a shell function rather than a real executable. I can't figure out why. Here is my code:
#!/bin/bash
list='foo
bar'
test(){
    echo "$1"
}
echo "$list" | tr '\012' '\000' | xargs -0 -n1 -I '{}' 'test' {}
So there is no output at all. And test function never gets executed. But if I replace "test" in the "xargs" command with "echo" or "printf" it works fine.
You can't pass a shell function to xargs directly, but you can invoke a shell.
printf 'foo\0bar\0' |
xargs -r -0 sh -c 'for f; do echo "$f"; done' _
The stuff inside sh -c '...' can be arbitrarily complex; if you really wanted to, you could declare and then use your function. But since it's simple and nonrecursive, I just inlined the functionality.
The dummy underscore parameter is because the first argument after sh -c 'script' is used to populate $0.
Because your question seems to be about optimization, I imagine you don't want to spawn a separate shell for every item passed to xargs -- if you did, nothing would get faster. So I put in the for loop and took out the -I etc arguments to xargs.
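If you really do want to reuse a shell function rather than inline its body, one bash-specific sketch (export -f only works when the child shell is also bash; myfunc here stands in for whatever your function is):
myfunc() { echo "$1"; }
export -f myfunc
printf 'foo\0bar\0' |
  xargs -r -0 bash -c 'for f; do myfunc "$f"; done' _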
xargs takes an executable as an argument (including custom scripts) rather than a function defined in the environment.
Either move your code to a script or use xargs to pass arguments to an external command.
Change from:
echo "$list" | tr '\012' '\000' | xargs -0 -n1 -I '{}' 'test' {}
To:
export -f test
echo "$list" | tr '\012' '\000' | xargs -0 -n1 -I '{}' bash -c 'test {}'
I've seen a solution from 'jac' on the bbs.archlinux.org website that uses a primary and a secondary (slave) pair of scripts that is very efficient. Instead of an internal function that would normally accept a single $1 parameter, the primary sends a list of parameters to its secondary, where a while-loop handles each member of the list as consecutive $1 values. Here's a sample pair I'm using to apply the 'file' command to a bunch of executables, which in my case all begin with "em" in the filename. Make changes as necessary:
#!/bin/bash
# primary: showfil
ls -l em* | grep '^-rwx' | awk '{$1=$2=$3=$4=$5=$6=$7=$8=""; print $0}' | xargs -I% ~/showfilf "%"
~/showfilf fixmstr spisort trc
exit 0
#!/bin/bash
# secondary: showfilf
myarch=$(uname -s | grep 'arwin')
while [[ -n "$1" ]]; do
    if [ -x "$1" ]; then
        if [ -n "$myarch" ]; then
            file "./$1"
        else
            myfile=$(file "./$1" | awk '{print $1" "$3" "$10" "$11" "$12}')
            myfile=${myfile%(uses}
            myfile=${myfile%for}
            echo "$myfile"
        fi
    fi
    shift
done
exit 0
This code works on Darwin (Mac) and Linux, and probably other systems. The 'grep' in the primary retains only executable files, not directories or symlinks. The 'awk' eliminates the first eight fields of 'ls' and retains just the filename, which is passed to 'xargs', which builds a list of quoted filenames to send to 'showfilf'. There's a separate invocation of 'showfilf' with three other filenames in the list. 'showfilf' has a while-loop which processes the list. Note that there is system-dependent code here, determined by 'uname -s' and 'grep'. Lastly, make these scripts executable and place them on your $PATH, such as $HOME. If your $PATH doesn't include your $HOME, I recommend you modify it in your .bashrc or .bash_login with something like this: export PATH=$PATH:$HOME

Access $? Variable with a piped statement?

I have some code whose $? exit status I would like to check.
VARIABLE=`grep "searched_string" test.log | sed 's/searched/found/'`
Is there any way to test if this entire line (rather than just the sed command) was completed successfully? If I try the following code right after it:
if [ "$?" -ne 0 ]
then
echo 1
exit
fi
it doesn't trigger even when the grep part of the statement fails.
Could someone show how to resolve this issue?
Use
echo ${PIPESTATUS[@]}
which will print out the array of exit statuses of all the commands in the most recent pipeline.
$ ls | grep . | wc -l
28
$ echo ${PIPESTATUS[@]}
0 0 0
but
$ ls | grep nonexistentfilename | wc -l
0
$ echo ${PIPESTATUS[@]}
0 1 0 #the grep returns 1 - pattern not found
or
$ ls nonexistentfilename | grep somegibberish | wc -l
ls: nonexistentfilename: No such file or directory
0
$ echo ${PIPESTATUS[@]}
1 1 0 #both ls and grep fail
For the exit status of one specific command, index into the array:
echo ${PIPESTATUS[1]} #for the grep
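For example, to branch on whether the grep in that pipeline matched anything:
ls | grep nonexistentfilename | wc -l
if [ "${PIPESTATUS[1]}" -ne 0 ]; then
    echo "grep found no match (or failed)" >&2
fi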
There is also
set -o pipefail
from the docs
pipefail
If set, the return value of a pipeline is the value of the
last (rightmost) command to exit with a non-zero status, or zero if
all commands in the pipeline exit successfully. This option is
disabled by default.
$ ls nonexistentfile | wc -c
ls: nonexistentfile: No such file or directory
0
$ echo $?
0
$ set -o pipefail
$ ls nonexistentfile | wc -c
ls: nonexistentfile: No such file or directory
0
$ echo $?
1
EDIT based on the comment:
You probably tried the following:
VARIABLE=$(grep "searched_string" test.log | sed 's/searched/found/')
echo "${PIPESTATUS[@]}"
Of course, this can't work, because the whole $(...) part runs in a subshell (another process), and therefore any variable created there is lost when the subshell exits (at the closing parenthesis).
You should put the whole PIPESTATUS mechanism into $(...) like next:
variable=$(
grep "searched_string" test.log | sed 's/searched/found/'
# do something with PIPESTATUS
# you should not echo anything to stdout (because it will be captured into $variable)
# you can echo to stderr instead - e.g.
echo "=${PIPESTATUS[@]}=" >&2
)
Also, the second line of the comment is a solution, e.g.:
var_with_status=$(command1 | command2; echo ":DELIMITER:${PIPESTATUS[@]}")
Now $var_with_status will contain not only the result of command1 | command2 but the PIPESTATUS too, separated by a unique delimiter, so you can extract it.
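A sketch of that extraction (the delimiter is arbitrary; pick something that cannot appear in the real output):
var_with_status=$(grep "searched_string" test.log | sed 's/searched/found/'; echo ":DELIMITER:${PIPESTATUS[@]}")
output=${var_with_status%:DELIMITER:*}       # everything before the delimiter
statuses=${var_with_status##*:DELIMITER:}    # e.g. "1 0"
read -r grep_status sed_status <<<"$statuses"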
Also, set -o pipefail will indicate a failure, if you don't need to know exactly where in the pipeline it happened.
You can also write the PIPESTATUS to some temp file (in the subshell); the parent can then read it and delete the temp file.
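A rough sketch of the temp-file variant (the mktemp name is arbitrary):
statusfile=$(mktemp)
variable=$(grep "searched_string" test.log | sed 's/searched/found/'; echo "${PIPESTATUS[@]}" >"$statusfile")
read -r grep_status sed_status <"$statusfile"
rm -f "$statusfile"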
It is also possible to print the PIPESTATUS to a different file descriptor in the subshell and read that descriptor in the parent shell, but...
... beware of falling into the XY problem, where you build an extremely complicated script only because you don't want to change the logic of the processing.
e.g. you can always break your script into safe parts, like:
var1=$(grep 'str' test.log)
#check `$var1` and do something with the error indicated by `$?`
var2=$(sed '....' <<<"$var1")
#check `$var2` and do something with the error indicated by `$?`
#and so on
simple enough?
So, ask yourself: do you really need to go to this trouble to get the PIPESTATUS out of a subshell?
PS: don't use uppercase variable names; they can collide with environment variables and cause hard-to-debug problems.

Get the name of the caller script in bash script

Let's assume I have 3 shell scripts:
script_1.sh
#!/bin/bash
./script_3.sh
script_2.sh
#!/bin/bash
./script_3.sh
the problem is that in script_3.sh I want to know the name of the caller script.
so that I can respond differently to each caller I support
please don't assume I'm asking about $0, because $0 will echo script_3 every time, no matter who the caller is
here is an example input with expected output
./script_1.sh should echo script_1
./script_2.sh should echo script_2
./script_3.sh should echo user_name or root or anything to distinguish between the 3 cases?
Is that possible? and if possible, how can it be done?
this is going to be added to a modified rm script... so when I call rm it does something special, and when git or any other CLI tool uses rm it is not affected by the modification
Based on @user3100381's answer, here's a much simpler command to get the same thing, which I believe should be fairly portable:
PARENT_COMMAND=$(ps -o comm= $PPID)
Replace comm= with args= to get the full command line (command + arguments). The = alone is used to suppress the headers.
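For example, inside script_3.sh the two forms might print something like this (the exact output depends on how the caller was started):
ps -o comm= $PPID    # e.g. bash
ps -o args= $PPID    # e.g. /bin/bash ./script_1.sh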
See: http://pubs.opengroup.org/onlinepubs/009604499/utilities/ps.html
In case you are sourcing the script instead of calling/executing it, no new process is forked, and thus the solutions with ps won't work reliably.
Use the bash built-in caller in that case.
$ cat h.sh
#! /bin/bash
function warn_me() {
    echo "$@"
    caller
}
$
$ cat g.sh
#!/bin/bash
source h.sh
warn_me "Error: You did not do something"
$
$ . g.sh
Error: You did not do something
g.sh
$
The $PPID variable holds the parent process ID. So you could parse the output from ps to get the command.
#!/bin/bash
PARENT_COMMAND=$(ps $PPID | tail -n 1 | awk "{print \$5}")
Based on @J.L.'s answer, with more in-depth explanations; this works for Linux:
cat /proc/$PPID/comm
gives you the name of the command of the parent PID.
If you prefer the command with all its options, then:
cat /proc/$PPID/cmdline
Explanations:
$PPID is defined by the shell; it's the PID of the parent process.
In /proc/ there is one directory per PID of each running process (Linux). So if you cat /proc/$PPID/comm, you echo the command name of that PID.
Check man proc
A couple of useful things are kept in /proc/$PPID here:
/proc/*some_process_id*/exe A symlink to the executable that is running as *some_process_id*
/proc/*some_process_id*/cmdline A file containing the command line of *some_process_id*, with null-byte-separated arguments
So, a slight simplification:
sed 's/\x0/ /g' "/proc/$PPID/cmdline"
If you have /proc:
$(cat /proc/$PPID/comm)
Declare this:
PARENT_NAME=`ps -ocomm --no-header $PPID`
Thus you'll get a nice variable $PARENT_NAME that holds the parent's name.
You can simply use the command below to avoid calling cut/awk/sed:
ps --no-headers -o command $PPID
If you only want the parent command itself and none of its arguments, you can use:
ps --no-headers -o command $PPID | cut -d' ' -f1
You could pass in a variable to script_3.sh to determine how to respond...
script_1.sh
#!/bin/bash
./script_3.sh script1
script_2.sh
#!/bin/bash
./script_3.sh script2
script_3.sh
#!/bin/bash
if [ "$1" == 'script1' ] ; then
    echo "we were called from script1!"
elif [ "$1" == 'script2' ] ; then
    echo "we were called from script2!"
fi

What is the equivalent to xargs -r under OS X

Is there any equivalent under OS X to xargs -r under Linux? I'm trying to find a way to interrupt a pipe if there's no data.
For instance imagine you do the following:
touch test
cat test | xargs -r echo "content: "
That doesn't yield any result because xargs interrupts the pipe.
Is there either some hidden xargs option or something else to achieve the same result under OSX?
The POSIX standard for xargs mandates that the command be executed once, even if there are no arguments. This is a nuisance, which is why GNU xargs has the -r option. Unfortunately, neither BSD (MacOS X) nor the other mainstream Unix versions (AIX, HP-UX, Solaris) support it.
If it is crucial to you, obtain and install GNU xargs somewhere that your environment will find it, without affecting the system (so don't replace /usr/bin/xargs unless you're a braver man than I am — but /usr/local/bin/xargs might be OK, or $HOME/bin/xargs, or …).
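One way to do that on macOS, assuming you use Homebrew, is the findutils package, which installs the GNU tools with a g prefix:
brew install findutils
touch test
cat test | gxargs -r echo "content: "    # prints nothing instead of a bare "content: "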
You can use test or [:
if [ -s test ] ; then cat test | xargs echo content: ; fi
There is no standard way to determine if the xargs you are running is GNU or not. I set $gnuxargs to either "true" or "false" and then have a function that replaces xargs and does the right thing.
On Linux, FreeBSD and MacOS this script works for me. The POSIX standard for xargs mandates that the command be executed once, even if there are no arguments. FreeBSD and MacOS X violate this rule, so they don't need -r. GNU finds that behaviour annoying and adds -r. This script does the right thing and can be enhanced if you find a version of Unix that handles it some other way.
#!/bin/bash
gnuxargs=$(xargs --version 2>&1 |grep -s GNU >/dev/null && echo true || echo false)
function portable_xargs_r() {
    if $gnuxargs ; then
        cat - | xargs -r "$@"
    else
        cat - | xargs "$@"
    fi
}
echo 'this' > foo
echo '=== Expect one line'
portable_xargs_r <foo echo "content: "
echo '=== DONE.'
cat </dev/null > foo
echo '=== Expect zero lines'
portable_xargs_r <foo echo "content: "
echo '=== DONE.'
Here's a quick and dirty xargs-r using a temporary file.
#!/bin/sh
t=$(mktemp -t xargsrXXXXXXXXX) || exit
trap 'rm -f $t' EXIT HUP INT TERM
cat >"$t"
test -s "$t" || exit
exec xargs "$#" <"$t"
with POSIX xargs¹, to avoid running the-command when the input is empty, you could use moreutils's ifne (for if not empty):
... | ifne xargs ... the-command ...
Or use a sh wrapper that checks the number of arguments:
... | xargs ... sh -c '[ "$#" -eq 0 ] || exec the-command ... "$@"' sh
¹ though one can hardly use xargs POSIXly, as it doesn't support -0, has unspecified behaviour when the input is non-text (like for filenames, which on most systems are not guaranteed to be text except in the POSIX locale), parses its input in a very arcane, locale-dependent way, and doesn't give any guarantee if any word is more than 255 bytes long!
You could make sure that the input always has at least one line. This may not always be possible, but you'd be surprised how many creative ways this can be done.
A typical use case looks like:
find . -print0 | xargs -r -0 grep PATTERN
Some versions of xargs do not have an -r flag. In that case, you can supply /dev/null as the first filename so that grep is never handed an empty list of filenames. Since the pattern will never be found in /dev/null, this won't affect the output:
find . -print0 | xargs -0 grep PATTERN /dev/null
You can test if the stream has any content:
cat test | {
    if IFS= read -r tmp; then               # try reading anything
        { printf "%s\n" "$tmp"; cat; } |    # re-emit the first line, then the rest of the input
            xargs echo "content: "          # ...and hand all of it to xargs
    fi                                      # otherwise just do nothing
}
# or with a function
# TODO: one could even add the check from `portable_xargs_r` in the other answer and call `xargs -r` when available.
xargs_r() {
    if IFS= read -r tmp; then
        { printf "%s\n" "$tmp"; cat; } | xargs "$@"
    fi
}
cat test | xargs_r echo "content: "
This method runs the check inside the pipe, in a subshell, so it can effectively be used in a complicated pipe setup.
