Get xargs to word-split placeholder {} - bash

(Though word splitting has a specific definition in Bash, in this post it means to split on spaces or tabs.)
Demonstrating the question using this input to xargs,
$ cat input.txt
LineOneWithOneArg
LineTwo WithTwoArgs
LineThree WithThree Args
LineFour With Double Spaces
and this Bash command to echo the arguments passed to it,
$ bash -c 'IFS=,; echo "$*"' arg0 arg1 arg2 arg3
arg1,arg2,arg3
notice how xargs -L1 word-splits each line into multiple arguments.
$ xargs <input.txt -L1 bash -c 'IFS=,; echo "$*"' arg0
LineOneWithOneArg
LineTwo,WithTwoArgs
LineThree,WithThree,Args
LineFour,With,Double,Spaces
However, xargs -I{} expands the whole line into {} as a single argument.
$ xargs <input.txt -I{} bash -c 'IFS=,; echo "$*"' arg0 {}
LineOneWithOneArg
LineTwo WithTwoArgs
LineThree WithThree Args
LineFour With Double Spaces
While this is perfectly reasonable behavior for the vast majority of cases, there are times when word-splitting behavior (the first xargs example) is preferred.
And while xargs -L1 can be seen as a workaround, it can only be used to place arguments at the end of the command line, making it impossible to express
$ xargs -I{} command first-arg {} last-arg
with xargs -L1. (That is of course, unless command is able to accept arguments in a different order, as is the case with options.)
Is there any way to get xargs -I{} to word-split each line when expanding the {} placeholder?

Sort of.
echo -e "1\n2 3" | xargs sh -c 'echo a "$@" b' "$0"
Outputs:
a 1 2 3 b
ref: https://stackoverflow.com/a/35612138/1563960
Also:
echo -e "1\n2 3" | xargs -L1 sh -c 'echo a "$@" b' "$0"
Outputs:
a 1 b
a 2 3 b
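Applied to the question's `command first-arg {} last-arg` shape, the same idea can be sketched as follows. This is only an illustration: the bracket-printing `printf` stands in for the real command, and the trailing `sh` is a placeholder that fills `$0` of the inline script (the answer above passes `"$0"` for the same purpose).

```shell
# xargs -L1 word-splits each input line; the words arrive as "$@" in the
# inline script, which places them between two fixed arguments.
result=$(printf '%s\n' 'LineOneWithOneArg' 'LineTwo WithTwoArgs' |
  xargs -L1 sh -c 'printf "[%s]" first-arg "$@" last-arg; echo' sh)

# Each bracket pair is one argument as the command received it.
printf '%s\n' "$result"
```

The second input line shows up as two separate arguments between `first-arg` and `last-arg`, which is the placement that plain `-L1` cannot express on its own.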

Related

Cannot get bash function call to work inside another bash call with xargs in it [duplicate]

I am trying to use xargs to call a more complex function in parallel.
#!/bin/bash
echo_var(){
echo $1
return 0
}
seq -f "n%04g" 1 100 |xargs -n 1 -P 10 -i echo_var {}
exit 0
This returns the error
xargs: echo_var: No such file or directory
Any ideas on how I can use xargs to accomplish this, or any other solution(s) would be welcome.
Exporting the function should do it (untested):
export -f echo_var
seq -f "n%04g" 1 100 | xargs -n 1 -P 10 -I {} bash -c 'echo_var "$@"' _ {}
You can use the builtin printf instead of the external seq:
printf "n%04g\n" {1..100} | xargs -n 1 -P 10 -I {} bash -c 'echo_var "$@"' _ {}
Also, using return 0 and exit 0 like that masks any error value that might be produced by the command preceding it. Also, if there's no error, it's the default and thus somewhat redundant.
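A small sketch of that masking effect, using two hypothetical functions (`masked` and `unmasked` are names made up for this illustration) that both end with a failing command:

```shell
# Both functions run a failing command (false) as their last real step.
masked()   { false; return 0; }  # the explicit return 0 hides the failure
unmasked() { false; }            # the function returns the status of false

# Capture each exit status without aborting if the caller uses set -e.
masked_status=0;   masked   || masked_status=$?
unmasked_status=0; unmasked || unmasked_status=$?

echo "masked: $masked_status, unmasked: $unmasked_status"
```

The first function reports success even though its body failed, while the second passes the failure on to the caller.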
@phobic mentions that the Bash command could be simplified to
bash -c 'echo_var "{}"'
moving the {} directly inside it. But it's vulnerable to command injection as pointed out by @Sasha.
Here is an example why you should not use the embedded format:
$ echo '$(date)' | xargs -I {} bash -c 'echo_var "{}"'
Sun Aug 18 11:56:45 CDT 2019
Another example of why not:
echo '\"; date\"' | xargs -I {} bash -c 'echo_var "{}"'
This is what is output using the safe format:
$ echo '$(date)' | xargs -I {} bash -c 'echo_var "$@"' _ {}
$(date)
This is comparable to using parameterized SQL queries to avoid injection.
I'm using date in a command substitution or in escaped quotes here instead of the rm command used in Sasha's comment since it's non-destructive.
Using GNU Parallel it looks like this:
#!/bin/bash
echo_var(){
echo $1
return 0
}
export -f echo_var
seq -f "n%04g" 1 100 | parallel -P 10 echo_var {}
exit 0
If you use version 20170822 you do not even have to export -f as long as you have run this:
. `which env_parallel.bash`
seq -f "n%04g" 1 100 | env_parallel -P 10 echo_var {}
Something like this should work also:
function testing() { sleep $1 ; }
echo {1..10} | xargs -n 1 | xargs -I@ -P4 bash -c "$(declare -f testing) ; testing @ ; echo @ "
Maybe this is bad practice, but if you are defining functions in a .bashrc or other script, you can wrap the file or at least the function definitions with a setting of allexport:
set -o allexport
function funcy_town {
echo 'this is a function'
}
function func_rock {
echo 'this is a function, but different'
}
function cyber_func {
echo 'this function does important things'
}
function the_man_from_funcle {
echo 'not gonna lie'
}
function funcle_wiggly {
echo "at this point I'm doing it for the funny names"
}
function extreme_function {
echo 'goodbye'
}
set +o allexport
Seems I can't make comments :-(
I was wondering about the focus on
bash -c 'echo_var "$@"' _ {}
vs
bash -c 'echo_var "{}"'
The 1st substitutes the {} as an arg to bash while the 2nd as an arg to the function. The fact that example 1 doesn't expand the $(date) is simply a side effect.
If you don't want the function's args expanded, use single quotes rather than double quotes around {}. To avoid messy nesting, use double quotes for the outer string.
$ echo '$(date)' | xargs -0 -L1 -I {} bash -c 'printit "{}"'
Fri 11 Sep 17:02:24 BST 2020
$ echo '$(date)' | xargs -0 -L1 -I {} bash -c "printit '{}'"
$(date)
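One caveat worth illustrating: single-quoting the embedded {} stops $(...) from expanding, but an input that itself contains a single quote can still close the quoting and run code. The sketch below is an illustration only; the `payload` string is made up, `echo INJECTED` stands in for a harmful command, and plain `sh` with `printf` replaces the `printit` function.

```shell
# A made-up input containing single quotes.
payload="x'; echo INJECTED; echo 'y"

# Embedded form: the payload's own quotes end the quoting, so part of the
# input is executed as shell code.
embedded=$(printf '%s\0' "$payload" | xargs -0 -I{} sh -c "printf '%s\n' '{}'")

# Argument form: the payload arrives as $1 and is never parsed as code.
as_arg=$(printf '%s\0' "$payload" | xargs -0 -I{} sh -c 'printf "%s\n" "$1"' sh {})

printf 'embedded form printed:\n%s\n' "$embedded"
printf 'argument form printed:\n%s\n' "$as_arg"
```

Only the argument form prints the input unchanged, which is why passing {} as a separate argument is the safer pattern regardless of the quoting style used inside the string.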

calling shell function using parallel with list of quoted filenames as input

Using Bash.
I have an exported shell function which I want to apply to many files.
Normally I would use xargs, but the syntax like this (see here) is too ugly for use.
...... | xargs -n 1 -P 10 -I {} bash -c 'echo_var "$@"' _ {}
In that discussion, parallel had an easier syntax:
..... | parallel -P 10 echo_var {}
Now I have run into the following problem: the list of files to which I want to apply my function is a list of files on one line, each quoted and separated by spaces thus:
"file 1" "file 2" "file 3".
How can I feed this space-separated, quoted list into parallel?
I can replicate the list using echo for testing.
e.g.
echo '"file 1" "file 2" "file 3"'|parallel -d " " my_function {}
but I can't get this to work.
How can I fix it?
You have to choose a unique separator.
echo 'file 1|file 2|file 3' | xargs -d "|" -n1 bash -c 'my_function "$@"' --
echo 'file 1^file 2^file 3' | parallel -d "^" my_function
The safest is to use zero byte as the separator:
echo -e 'file 1\x00file 2\x00file 3' | xargs -0 -n1 bash -c 'my_function "$@"' --
printf "%s\0" 'file 1' 'file 2' 'file 3' | parallel -0 my_function
The best is to store your elements inside a bash array and use a zero separated stream to process them:
files=("file 1" "file 2" "file 3")
printf "%s\0" "${files[@]}" | xargs -0 -n1 bash -c 'my_function "$@"' --
printf "%s\0" "${files[@]}" | parallel -0 my_function
Note that empty arrays will run the function without any arguments. It's sometimes preferable to use the -r (--no-run-if-empty) option so the function is not run when the input is empty. --no-run-if-empty is supported by parallel and is a GNU extension in xargs (xargs on BSD and on OSX does not have --no-run-if-empty).
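As an alternative to relying on -r, the empty case can be checked in the shell before the pipeline starts. The following is a minimal Bash sketch under stated assumptions: `process_files` is a hypothetical wrapper introduced here, and `my_function` is a stand-in that only prints its argument.

```shell
#!/usr/bin/env bash
# Stand-in for the real exported function from the question.
my_function() { printf 'got <%s>\n' "$1"; }
export -f my_function

# Hypothetical wrapper: run my_function once per argument via xargs,
# and do nothing at all when no arguments are given.
process_files() {
  (( $# > 0 )) || return 0
  printf '%s\0' "$@" | xargs -0 -n1 bash -c 'my_function "$@"' --
}

files=("file 1" "file 2" "file 3")
process_files "${files[@]}"
```

Because the check happens before printf runs, the behaviour for an empty list does not depend on which xargs implementation is installed.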
Note: xargs by default parses ', " and \. This is why the following is possible and will work:
echo '"file 1" "file 2" "file 3"' | xargs -n1 bash -c 'my_function "$@"' --
echo "'file 1' 'file 2' 'file 3'" | xargs -n1 bash -c 'my_function "$@"' --
echo 'file\ 1 file\ 2 file\ 3' | xargs -n1 bash -c 'my_function "$@"' --
And it can result in some strange things, so remember to almost always specify -d option to xargs:
$ # note \x replaced by single x
$ echo '\\a\b\c' | xargs
\abc
$ # quotes are parsed and need to match
$ echo 'abc"def' | xargs
xargs: unmatched double quote; by default quotes are special to xargs unless you use the -0 option
$ echo "abc'def" | xargs
xargs: unmatched single quote; by default quotes are special to xargs unless you use the -0 option
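When the quotes in the input do match, the default quote parsing described above can also be used to turn the quoted one-line list into one item per line. This is a rough sketch, assuming the names contain no newlines or backslashes and do not look like options to echo; the variable names are made up for the example.

```shell
# The one-line, quoted list from the question.
list='"file 1" "file 2" "file 3"'

# xargs strips the quotes; with -n1 its default command (echo) prints
# each item on its own line.
items=$(printf '%s\n' "$list" | xargs -n1)

printf '%s\n' "$items"
```

The resulting lines could then be fed to a tool that reads one argument per line, such as the `parallel my_function` invocations shown in this thread.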
xargs is a portable tool available almost everywhere, while parallel is a GNU program, which has to be installed separately.
The problem boils down to this: the values can contain spaces, and space is also the value separator. So we need something that can parse the input into separate values that may contain spaces. Since they are Bash-quoted, the obvious choice is to use Bash to unquote the values.
You have several options:
(echo "file 1";
echo "file 2";
echo "file \"name\" \$(3)") | parallel my_function
printf "%s\n" "file 1" "file 2" "file \"name\" \$(3)" |
parallel my_function
If the input is in a variable:
var='"file 1" "file 2" "file \"name\" \$(3)"'
eval 'printf "%s\n" '"$var" |
parallel my_function
Or you can convert the variable to an array:
var='"file 1" "file 2" "file \"name\" \$(3)"'
eval arr=("$var")
And if the input is in an array:
parallel my_function ::: "${arr[@]}"

Use argument twice from standard output pipelining

I have a command line tool which receives two arguments:
TOOL arg1 -o arg2
I would like to invoke it with the same value for both arg1 and arg2, and to make that easy, I thought I would do:
each <arg1_value> | TOOL $1 -o $1
but that doesn't work, $1 is not replaced, but is added once to the end of the commandline.
An explicit example, performing:
cp fileA fileA
returns an error fileA and fileA are identical (not copied)
While performing:
echo fileA | cp $1 $1
returns the following error:
usage: cp [-R [-H | -L | -P]] [-fi | -n] [-apvX] source_file target_file
cp [-R [-H | -L | -P]] [-fi | -n] [-apvX] source_file ... target_directory
any ideas?
If you want to use xargs, the [-I] option may help:
-I replace-str
Replace occurrences of replace-str in the initial-arguments with names read from standard input. Also, unquoted blanks do not terminate input items; instead the separator is the newline character. Implies -x and -L 1.
Here is a simple example:
mkdir test && cd test && touch tmp
ls | xargs -I '{}' cp '{}' '{}'
Returns an Error cp: tmp and tmp are the same file
The xargs utility will duplicate its input stream to replace all placeholders in its argument if you use the -I flag:
$ echo hello | xargs -I XXX echo XXX XXX XXX
hello hello hello
The placeholder XXX (may be any string) is replaced with the entire line of input from the input stream to xargs, so if we give it two lines:
$ printf "hello\nworld\n" | xargs -I XXX echo XXX XXX XXX
hello hello hello
world world world
You may use this with your tool:
$ generate_args | xargs -I XXX TOOL XXX -o XXX
Where generate_args is a script, command or shell function that generates arguments for your tool.
The reason
each <arg1_value> | TOOL $1 -o $1
did not work, apart from each not being a command that I recognise, is that $1 expands to the first positional parameter of the current shell or function.
The following would have worked:
set - "arg1_value"
TOOL "$1" -o "$1"
because that sets the value of $1 before calling your tool.
You can re-run a shell to perform variable expansion, with sh -c. The -c option takes an argument which is the command to run in a shell, performing expansion. The following arguments of sh are assigned to $0, $1, and so on, for use inside the -c command. For example:
sh -c 'echo $1, i repeat: $1' foo bar baz will execute echo $1, i repeat: $1 with $1 set to bar ($0 is set to foo and $2 to baz), finally printing bar, i repeat: bar
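The mapping of the extra arguments onto $0, $1 and $2 can be checked with a short sketch like this one; the words foo, bar and baz are just the sample values from the paragraph above, and `out` is a variable introduced for the example.

```shell
# Arguments after the -c string fill $0, $1, $2 of the inline script.
# The single quotes keep the outer shell from expanding $1 itself.
out=$(sh -c 'echo "$1, I repeat: $1 (\$0=$0, \$2=$2)"' foo bar baz)

printf '%s\n' "$out"
```

Note that the first extra argument lands in $0, not $1, which is why examples in this thread pass a filler such as `_` or `--` in that position.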
The $1, $2, ... $N parameters are only visible inside a shell script or function, where they hold the arguments passed to it, so they won't work the way you want them to here. A pipe connects one command's stdout to the next command's stdin, which is not what you are looking for either.
If you just want a one-liner, use something like
ARG1=hello && tool $ARG1 $ARG1
Using GNU parallel to use STDIN four times, to print a multiplication table:
seq 5 | parallel 'echo {} \* {} = $(( {} * {} ))'
Output:
1 * 1 = 1
2 * 2 = 4
3 * 3 = 9
4 * 4 = 16
5 * 5 = 25
One could encapsulate the tool using awk:
$ echo arg1 arg2 | awk '{ system("echo TOOL " $1 " -o " $2) }'
TOOL arg1 -o arg2
Remove the echo within the system() call and TOOL should be executed in accordance with requirements:
echo arg1 arg2 | awk '{ system("TOOL " $1 " -o " $2) }'
Double up the data from a pipe, and feed it to a command two at a time, using sed and xargs:
seq 5 | sed p | xargs -L 2 echo
Output:
1 1
2 2
3 3
4 4
5 5
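Combined with the sh -c technique from the earlier answer, the doubled lines can be put into the `TOOL arg -o arg` shape from the question. This is only a sketch: `echo TOOL` stands in for the real tool, fileA and fileB are sample values, and the trailing `sh` is a filler for $0.

```shell
# sed p prints every line twice; xargs -L 2 hands each pair of lines to
# the inline script as $1 and $2.
pairs=$(printf '%s\n' fileA fileB | sed p |
  xargs -L 2 sh -c 'echo TOOL "$1" -o "$2"' sh)

printf '%s\n' "$pairs"
```

Dropping the `echo` inside the inline script would run the tool itself instead of printing the command line.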

Resources