Shell Scripting: Using xargs to execute parallel instances of a shell function - bash

I'm trying to use xargs in a shell script to run parallel instances of a function I've defined in the same script. The function times the fetching of a page, and so it's important that the pages are actually fetched concurrently in parallel processes, and not in background processes (if my understanding of this is wrong and there's negligible difference between the two, just let me know).
The function is:
function time_a_url ()
{
oneurltime=$($time_command -p wget -p $1 -O /dev/null 2>&1 1>/dev/null | grep real | cut -d" " -f2)
echo "Fetching $1 took $oneurltime seconds."
}
How does one do this with an xargs pipe in a form that can take number of times to run time_a_url in parallel as an argument? And yes, I know about GNU parallel, I just don't have the privilege to install software where I'm writing this.

Here's a demo of how you might be able to get your function to work:
$ f() { echo "[$#]"; }
$ export -f f
$ echo -e "b 1\nc 2\nd 3 4" | xargs -P 0 -n 1 -I{} bash -c f\ \{\}
[b 1]
[d 3 4]
[c 2]
The keys to making this work are to export the function so the bash that xargs spawns will see it and to escape the space between the function name and the escaped braces. You should be able to adapt this to work in your situation. You'll need to adjust the arguments for -P and -n (or remove them) to suit your needs.
You can probably get rid of the grep and cut. If you're using the Bash builtin time, you can specify an output format using the TIMEFORMAT variable. If you're using GNU /usr/bin/time, you can use the --format argument. Either of these will allow you to drop the -p also.
You can replace this part of your wget command: 2>&1 1>/dev/null with -q. In any case, you have those reversed. The correct order would be >/dev/null 2>&1.

On Mac OS X:
xargs: max. processes must be >0 (for: xargs -P [>0])
f() { echo "[$#]"; }
export -f f
echo -e "b 1\nc 2\nd 3 4" | sed 's/ /\\ /g' | xargs -P 10 -n 1 -I{} bash -c f\ \{\}
echo -e "b 1\nc 2\nd 3 4" | xargs -P 10 -I '{}' bash -c 'f "$#"' arg0 '{}'

If you install GNU Parallel on another system, you will see the functionality is in a single file (called parallel).
You should be able to simply copy that file to your own ~/bin.

Related

xargs output buffering -P parallel

I have a bash function that i call in parallel using xargs -P like so
echo ${list} | xargs -n 1 -P 24 -I# bash -l -c 'myAwesomeShellFunction #'
Everything works fine but output is messed up for obvious reasons (no buffering)
Trying to figure out a way to buffer output effectively. I was thinking I could use awk, but I'm not good enough to write such a script and I can't find anything worthwhile on google? Can someone help me write this "output buffer" in sed or awk? Nothing fancy, just accumulate output and spit it out after process terminates. I don't care the order that shell functions execute, just need their output buffered... Something like:
echo ${list} | xargs -n 1 -P 24 -I# bash -l -c 'myAwesomeShellFunction # | sed -u ""'
P.s. I tried to use stdbuf as per
https://unix.stackexchange.com/questions/25372/turn-off-buffering-in-pipe but did not work, i specified buffering on o and e but output still unbuffered:
echo ${list} | xargs -n 1 -P 24 -I# stdbuf -i0 -oL -eL bash -l -c 'myAwesomeShellFunction #'
Here's my first attempt, this only captures first line of output:
$ bash -c "echo stuff;sleep 3; echo more stuff" | awk '{while (( getline line) > 0 )print "got ",$line;}'
$ got stuff
This isn't quite atomic if your output is longer than a page (4kb typically), but for most cases it'll do:
xargs -P 24 bash -c 'for arg; do printf "%s\n" "$(myAwesomeShellFunction "$arg")"; done' _
The magic here is the command substitution: $(...) creates a subshell (a fork()ed-off copy of your shell), runs the code ... in it, and then reads that in to be substituted into the relevant position in the outer script.
Note that we don't need -n 1 (if you're dealing with a large number of arguments -- for a small number it may improve parallelization), since we're iterating over as many arguments as each of your 24 parallel bash instances is passed.
If you want to make it truly atomic, you can do that with a lockfile:
# generate a lockfile, arrange for it to be deleted when this shell exits
lockfile=$(mktemp -t lock.XXXXXX); export lockfile
trap 'rm -f "$lockfile"' 0
xargs -P 24 bash -c '
for arg; do
{
output=$(myAwesomeShellFunction "$arg")
flock -x 99
printf "%s\n" "$output"
} 99>"$lockfile"
done
' _

Bash - how do i output line and then pipe line to another command side by side? [duplicate]

cat a.txt | xargs -I % echo %
In the example above, xargs takes echo % as the command argument. But in some cases, I need multiple commands to process the argument instead of one. For example:
cat a.txt | xargs -I % {command1; command2; ... }
But xargs doesn't accept this form. One solution I know is that I can define a function to wrap the commands, but I want to avoid that because it is complex. Is there a better solution?
cat a.txt | xargs -d $'\n' sh -c 'for arg do command1 "$arg"; command2 "$arg"; ...; done' _
...or, without a Useless Use Of cat:
<a.txt xargs -d $'\n' sh -c 'for arg do command1 "$arg"; command2 "$arg"; ...; done' _
To explain some of the finer points:
The use of "$arg" instead of % (and the absence of -I in the xargs command line) is for security reasons: Passing data on sh's command-line argument list instead of substituting it into code prevents content that data might contain (such as $(rm -rf ~), to take a particularly malicious example) from being executed as code.
Similarly, the use of -d $'\n' is a GNU extension which causes xargs to treat each line of the input file as a separate data item. Either this or -0 (which expects NULs instead of newlines) is necessary to prevent xargs from trying to apply shell-like (but not quite shell-compatible) parsing to the stream it reads. (If you don't have GNU xargs, you can use tr '\n' '\0' <a.txt | xargs -0 ... to get line-oriented reading without -d).
The _ is a placeholder for $0, such that other data values added by xargs become $1 and onward, which happens to be the default set of values a for loop iterates over.
You can use
cat file.txt | xargs -i sh -c 'command {} | command2 {} && command3 {}'
{} = variable for each line on the text file
With GNU Parallel you can do:
cat a.txt | parallel 'command1 {}; command2 {}; ...; '
For security reasons it is recommended you use your package manager to
install. But if you cannot do that then you can use this 10 seconds
installation.
The 10 seconds installation will try to do a full installation; if
that fails, a personal installation; if that fails, a minimal
installation.
$ (wget -O - pi.dk/3 || lynx -source pi.dk/3 || curl pi.dk/3/ || \
fetch -o - http://pi.dk/3 ) > install.sh
$ sha1sum install.sh | grep 883c667e01eed62f975ad28b6d50e22a
12345678 883c667e 01eed62f 975ad28b 6d50e22a
$ md5sum install.sh | grep cc21b4c943fd03e93ae1ae49e28573c0
cc21b4c9 43fd03e9 3ae1ae49 e28573c0
$ sha512sum install.sh | grep da012ec113b49a54e705f86d51e784ebced224fdf
79945d9d 250b42a4 2067bb00 99da012e c113b49a 54e705f8 6d51e784 ebced224
fdff3f52 ca588d64 e75f6033 61bd543f d631f592 2f87ceb2 ab034149 6df84a35
$ bash install.sh
I prefer style which allows dry run mode (without | sh) :
cat a.txt | xargs -I % echo "command1; command2; ... " | sh
Works with pipes too:
cat a.txt | xargs -I % echo "echo % | cat " | sh
This is just another approach without xargs nor cat:
while read stuff; do
command1 "$stuff"
command2 "$stuff"
...
done < a.txt
This seems to be the safest version.
tr '[\n]' '[\0]' < a.txt | xargs -r0 /bin/bash -c 'command1 "$#"; command2 "$#";' ''
(-0 can be removed and the tr replaced with a redirect (or the file can be replaced with a null separated file instead). It is mainly in there since I mainly use xargs with find with -print0 output) (This might also be relevant on xargs versions without the -0 extension)
It is safe, since args will pass the parameters to the shell as an array when executing it. The shell (at least bash) would then pass them as an unaltered array to the other processes when all are obtained using ["$#"][1]
If you use ...| xargs -r0 -I{} bash -c 'f="{}"; command "$f";' '', the assignment will fail if the string contains double quotes. This is true for every variant using -i or -I. (Due to it being replaced into a string, you can always inject commands by inserting unexpected characters (like quotes, backticks or dollar signs) into the input data)
If the commands can only take one parameter at a time:
tr '[\n]' '[\0]' < a.txt | xargs -r0 -n1 /bin/bash -c 'command1 "$#"; command2 "$#";' ''
Or with somewhat less processes:
tr '[\n]' '[\0]' < a.txt | xargs -r0 /bin/bash -c 'for f in "$#"; do command1 "$f"; command2 "$f"; done;' ''
If you have GNU xargs or another with the -P extension and you want to run 32 processes in parallel, each with not more than 10 parameters for each command:
tr '[\n]' '[\0]' < a.txt | xargs -r0 -n10 -P32 /bin/bash -c 'command1 "$#"; command2 "$#";' ''
This should be robust against any special characters in the input. (If the input is null separated.) The tr version will get some invalid input if some of the lines contain newlines, but that is unavoidable with a newline separated file.
The blank first parameter for bash -c is due to this: (From the bash man page) (Thanks #clacke)
-c If the -c option is present, then commands are read from the first non-option argument com‐
mand_string. If there are arguments after the command_string, the first argument is assigned to $0
and any remaining arguments are assigned to the positional parameters. The assignment to $0 sets
the name of the shell, which is used in warning and error messages.
One thing I do is to add to .bashrc/.profile this function:
function each() {
while read line; do
for f in "$#"; do
$f $line
done
done
}
then you can do things like
... | each command1 command2 "command3 has spaces"
which is less verbose than xargs or -exec. You could also modify the function to insert the value from the read at an arbitrary location in the commands to each, if you needed that behavior also.
Another possible solution that works for me is something like -
cat a.txt | xargs bash -c 'command1 $#; command2 $#' bash
Note the 'bash' at the end - I assume it is passed as argv[0] to bash. Without it in this syntax the first parameter to each command is lost. It may be any word.
Example:
cat a.txt | xargs -n 5 bash -c 'echo -n `date +%Y%m%d-%H%M%S:` ; echo " data: " $#; echo "data again: " $#' bash
My current BKM for this is
... | xargs -n1 -I % perl -e 'system("echo 1 %"); system("echo 2 %");'
It is unfortunate that this uses perl, which is less likely to be installed than bash; but it handles more input that the accepted answer. (I welcome a ubiquitous version that does not rely on perl.)
#KeithThompson's suggestion of
... | xargs -I % sh -c 'command1; command2; ...'
is great - unless you have the shell comment character # in your input, in which case part of the first command and all of the second command will be truncated.
Hashes # can be quite common, if the input is derived from a filesystem listing, such as ls or find, and your editor creates temporary files with # in their name.
Example of the problem:
$ bash 1366 $> /bin/ls | cat
#Makefile#
#README#
Makefile
README
Oops, here is the problem:
$ bash 1367 $> ls | xargs -n1 -I % sh -i -c 'echo 1 %; echo 2 %'
1
1
1
1 Makefile
2 Makefile
1 README
2 README
Ahh, that's better:
$ bash 1368 $> ls | xargs -n1 -I % perl -e 'system("echo 1 %"); system("echo 2 %");'
1 #Makefile#
2 #Makefile#
1 #README#
2 #README#
1 Makefile
2 Makefile
1 README
2 README
$ bash 1369 $>
Try this:
git config --global alias.all '!f() { find . -d -name ".git" | sed s/\\/\.git//g | xargs -P10 -I{} git --git-dir={}/.git --work-tree={} $1; }; f'
It runs ten threads in parallel and does what ever git command you want to all repos in the folder structure. No matter if the repo is one or n levels deep.
E.g: git all pull
I have good idea to solve the problem.
Only write a comman mcmd, then you can do
find . -type f | xargs -i mcmd echo {} ## cat {} #pipe sed -n '1,3p'
The mcmd content as follows:
echo $* | sed -e 's/##/\n/g' -e 's/#pipe/|/g' | csh

xargs not working with built in shell functions

I am trying to speed up the processing of a database. I migrated towards xargs. But I'm seriously stuck. Piping a list of arguments to xargs does not work if the command invoked by xargs isn't a built in. I can't figure out why. Here is my code:
#!/bin/bash
list='foo
bar'
test(){
echo "$1"
}
echo "$list" | tr '\012' '\000' | xargs -0 -n1 -I '{}' 'test' {}
So there is no output at all. And test function never gets executed. But if I replace "test" in the "xargs" command with "echo" or "printf" it works fine.
You can't pass a shell function to xargs directly, but you can invoke a shell.
printf 'foo\0bar\0' |
xargs -r -0 sh -c 'for f; do echo "$f"; done' _
The stuff inside sh -c '...' can be arbitrarily complex; if you really wanted to, you could declare and then use your function. But since it's simple and nonrecursive, I just inlined the functionality.
The dummy underscore parameter is because the first argument after sh -c 'script' is used to populate $0.
Because your question seems to be about optimization, I imagine you don't want to spawn a separate shell for every item passed to xargs -- if you did, nothing would get faster. So I put in the for loop and took out the -I etc arguments to xargs.
xargs takes an executable as an argument (including custom scripts) rather than a function defined in the environment.
Either move your code to a script or use xargs to pass arguments to an external command.
Change from:
echo "$list" | tr '\012' '\000' | xargs -0 -n1 -I '{}' 'test' {}
To:
export -f test
echo "$list" | tr '\012' '\000' | xargs -0 -n1 -I '{}' sh -c 'test {}'
I've seen a solution from 'jac' on the bbs.archlinux.org web-site that uses a primary and secondary (slave) pair of scripts that are very efficients. Instead of an internal 'function' that normally would accept a single $1 parameter, the primary sends a list of parameters to its secondary where a while-loop handles each member of the list as consecutive $1 values. Here's a sample pair I'm using to apply the 'file' command to a bunch of executables, which in my case all begin with "em" in the filename. Make changes as necessary:
#!/bin/bash
# primary: showfil
ls -l em* | grep '^-rwx' | awk '{$1=$2=$3=$4=$5=$6=$7=$8=""; print $0}' | xargs -I% ~/showfilf "%"
~/showfilf fixmstr spisort trc
exit 0
#!/bin/bash
# secondary: showfilf
myarch=$(uname -s | grep 'arwin')
while [[ -n "$1" ]]; do
if [ -x "$1" ]; then
if [ -n "$myarch" ]; then
file "./$1"
else
myfile=$(file "./$1" | awk '{print $1" "$3" "$10" "$11" "$12}')
myfile=${myfile%(uses}
myfile=${myfile%for}
echo "$myfile"
fi
fi
shift
done
exit 0
This code works on Darwin (Mac) and Linux, and probably other systems. The 'grep' in the primary retains only executable files, not directories or symlinks. The 'awk' eliminates the first eight fields of 'ls' and retains just the filename,which is passed to 'xargs', which builds a list of quoted filenames to send to 'showfilf'. There's a separate invocation of 'showfilf' with three other filenames in the list. 'showfilf' has a while-loop which processes the list. Note that there is system-dependent code here, determined by 'uname -s' and 'grep'. Lastly, make these scripts executable, and place them on your $PATH, such as $HOME. If your $PATH doesn't include your $HOME, I recommend you modify it in your .bashrc or .bash_login something like this: export PATH=$PATH:$HOME

How to correctly wrap multiple command calls in bash?

My problem can be summed up by making this simple command works :
nice -n 10 "ls|xargs -I% echo \"%\""
Which fails :
nice: ls|xargs -I% echo "%": No such file or directory
Removing the quotes makes it works, but my point is to wrap multiple quoted commands into one to do something more complex like :
ftphost="192.168.1.1"
dirinputtopush="/tmp/archivedir/"
ftpoutputdir="mydir/"
nice -n 19 ls $dirinputtopush | xargs -I% "lftp $ftphost -e \"mirror -R $dirinputtopush% $ftpoutputdirrecent ;quit\"; sleep 10"
Try using nice -n 10 bash -c 'your; commands | or_complex pipelines' as command. This way bash is the binary and the string after -c contains a sequence interpreted by bash so it can contain pipelines, loops etc. Watch out for proper quoting. You need to do it this way because nice expects a binary, not expressions interpreted by the shell. In contrast, shell builtins such as time (but not /usr/bin/time which is a separate binary) will accept shell expressions as the command to execute. They can because they're built into the shell. nice is not, so it requires a binary to execute.
Children inherit nice value:
nice -n 10 bash -c 'ls | xargs -I% echo %'
Nice each command separately:
nice -n 10 ls | nice -n 10 xargs -I% echo %

What is the equivalent to xargs -r under OsX

Are they any equivalent under OSX to the xargs -r under Linux ? I'm trying to find a way to interupt a pipe if there's no data.
For instance imagine you do the following:
touch test
cat test | xargs -r echo "content: "
That doesn't yield any result because xargs interrupts the pipe.
Is there either some hidden xargs option or something else to achieve the same result under OSX?
The POSIX standard for xargs mandates that the command be executed once, even if there are no arguments. This is a nuisance, which is why GNU xargs has the -r option. Unfortunately, neither BSD (MacOS X) nor the other mainstream Unix versions (AIX, HP-UX, Solaris) support it.
If it is crucial to you, obtain and install GNU xargs somewhere that your environment will find it, without affecting the system (so don't replace /usr/bin/xargs unless you're a braver man than I am — but /usr/local/bin/xargs might be OK, or $HOME/bin/xargs, or …).
You can use test or [:
if [ -s test ] ; then cat test | xargs echo content: ; fi
There is no standard way to determine if the xargs you are running is GNU or not. I set $gnuargs to either "true" or "false" and then have a function that replaces xargs and does the right thing.
On Linux, FreeBSD and MacOS this script works for me. The POSIX standard for xargs mandates that the command be executed once, even if there are no arguments. FreeBSD and MacOS X violate this rule, thus don't need "-r". GNU finds it annoying, and adds -r. This script does the right thing and can be enhanced if you find a version of Unix that does it some other way.
#!/bin/bash
gnuxargs=$(xargs --version 2>&1 |grep -s GNU >/dev/null && echo true || echo false)
function portable_xargs_r() {
if $gnuxargs ; then
cat - | xargs -r "$#"
else
cat - | xargs "$#"
fi
}
echo 'this' > foo
echo '=== Expect one line'
portable_xargs_r <foo echo "content: "
echo '=== DONE.'
cat </dev/null > foo
echo '=== Expect zero lines'
portable_xargs_r <foo echo "content: "
echo '=== DONE.'
Here's a quick and dirty xargs-r using a temporary file.
#!/bin/sh
t=$(mktemp -t xargsrXXXXXXXXX) || exit
trap 'rm -f $t' EXIT HUP INT TERM
cat >"$t"
test -s "$t" || exit
exec xargs "$#" <"$t"
with POSIX xargs¹, to avoid running the-command when the input is empty, you could use moreutils's ifne (for if not empty):
... | ifne xargs ... the-command ...
Or use a sh wrapper that checks the number of arguments:
... | xargs ... sh -c '[ "$#" -eq 0 ] || exec the-command ... "$#"' sh
¹ though one can hardly use xargs POSIXly as it doesn't support -0, has unspecified behaviour when the input is non-text (like for filenames which on most systems are not guaranteed to be text except in the POSIX locale), parses its input in a very arcane way and that is locale-dependant, and doesn't give any guarantee if any word is more than 255 bytes long!
You could make sure that the input always has at least one line. This may not always be possible, but you'd be surprised how many creative ways this can be done.
A typical use case looks like:
find . -print0 | xargs -r -0 grep PATTERN
Some versions of xargs do not have an -r flag. In that case, you can supply /dev/null as the first filename so that grep is never handed an empty list of filenames. Since the pattern will never be found in /dev/null, this won't affect the output:
find . -print0 | xargs -0 grep PATTERN /dev/null
You can test if the stream has any content:
cat test | { if IFS= read -r tmp; then { printf "%s\n" "$tmp"; cat; } | xargs echo "content: "; fi; }
# ^^^ - otherwise just do nothing
# ^^^^^^^^^^^^^^^^^^^^^^^ - to xargs
# ^^^ - and the rest of input
# ^^^^^^^^^^^^^^^^^^^^^^ - redirect first line
# ^^^^^^^^^^^^^^^^^^^ - try reading anything
# or with a function
# even TODO: add the check of `portable_xargs_r` in the other answer and call `xargs -r` when available.
xargs_r() {
if IFS= read -r tmp; then
{ printf "%s\n" "$tmp"; cat; } | xargs "$#"
fi
}
cat test | xargs_r echo "content: "
This method runs the check inside the pipe inside the subshell, so it effectively can be used in a complicated pipe setup.

Resources