calling shell function using parallel with list of quoted filenames as input - bash

Using Bash.
I have an exported shell function which I want to apply to many files.
Normally I would use xargs, but the syntax it needs (see here) is too ugly to use:
...... | xargs -n 1 -P 10 -I {} bash -c 'echo_var "$@"' _ {}
In that discussion, parallel had an easier syntax:
..... | parallel -P 10 echo_var {}
Now I have run into the following problem: the list of files to which I want to apply my function is a list of files on one line, each quoted and separated by spaces thus:
"file 1" "file 2" "file 3".
How can I feed this space-separated, quoted list into parallel?
I can replicate the list using echo for testing.
e.g.
echo '"file 1" "file 2" "file 3"'|parallel -d " " my_function {}
but I can't get this to work.
How can I fix it?

How can I fix it?
You have to choose a unique separator.
echo 'file 1|file 2|file 3' | xargs -d "|" -n1 bash -c 'my_function "$@"' --
echo 'file 1^file 2^file 3' | parallel -d "^" my_function
The safest is to use zero byte as the separator:
printf 'file 1\0file 2\0file 3\0' | xargs -0 -n1 bash -c 'my_function "$@"' --
printf "%s\0" 'file 1' 'file 2' 'file 3' | parallel -0 my_function
Best of all, store your elements in a bash array and pass them on as a NUL-separated stream:
files=("file 1" "file 2" "file 3")
printf "%s\0" "${files[@]}" | xargs -0 -n1 bash -c 'my_function "$@"' --
printf "%s\0" "${files[@]}" | parallel -0 my_function
Note that an empty array still causes the function to run once. It is sometimes preferable to use the -r (--no-run-if-empty) option so that the function is not run when the input is empty. parallel supports --no-run-if-empty, and in xargs it is a GNU extension (xargs on BSD and macOS does not have --no-run-if-empty).
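As a minimal end-to-end sketch of the array approach, the snippet below uses only xargs, so it does not need parallel to be installed. The body of my_function is a hypothetical stand-in that prints its argument in brackets; the real function would take its place.

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for the real function: print the argument in brackets.
my_function() { printf '[%s]\n' "$1"; }
export -f my_function   # make the function visible to the child bash

files=("file 1" "file 2" "file 3")

# One NUL-terminated record per array element; -n1 starts one bash per record.
# There is no -P here, so the output order matches the input order.
result=$(printf '%s\0' "${files[@]}" | xargs -0 -n1 bash -c 'my_function "$@"' --)
printf '%s\n' "$result"
```

Adding -P to the xargs call would run the invocations concurrently, at the cost of a non-deterministic output order.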
Note: xargs by default parses ', " and \. This is why the following is possible and will work:
echo '"file 1" "file 2" "file 3"' | xargs -n1 bash -c 'my_function "$@"' --
echo "'file 1' 'file 2' 'file 3'" | xargs -n1 bash -c 'my_function "$@"' --
echo 'file\ 1 file\ 2 file\ 3' | xargs -n1 bash -c 'my_function "$@"' --
This parsing can produce surprising results, so remember to almost always pass the -d option to xargs:
$ # note \x replaced by single x
$ echo '\\a\b\c' | xargs
\abc
$ # quotes are parsed and need to match
$ echo 'abc"def' | xargs
xargs: unmatched double quote; by default quotes are special to xargs unless you use the -0 option
$ echo "abc'def" | xargs
xargs: unmatched single quote; by default quotes are special to xargs unless you use the -0 option
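For comparison, here is a small sketch that feeds the same troublesome strings through xargs -0, where quotes and backslashes are taken literally. It assumes tr is used to turn each newline into a NUL byte; the angle brackets only mark where each argument starts and ends.

```bash
# Each input line becomes one NUL-terminated record, so xargs does no quote parsing.
literal=$(printf '%s\n' 'abc"def' "abc'def" '\\a\b\c' |
  tr '\n' '\0' |
  xargs -0 -n1 printf '<%s>\n')
printf '%s\n' "$literal"
```

Every string comes out unchanged, including the unmatched quotes that make plain xargs fail.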
xargs is a portable tool available almost everywhere, while parallel is a GNU program that has to be installed separately.

The problem boils down to this: the values can contain spaces, and space is also the value separator. So we need something that can parse the input into separate values that contain spaces. Since the values are bash-quoted, the obvious choice is to use bash to unquote them.
You have several options:
(echo "file 1";
echo "file 2";
echo "file \"name\" \$(3)") | parallel my_function
printf "%s\n" "file 1" "file 2" "file \"name\" \$(3)" |
parallel my_function
If the input is in a variable:
var='"file 1" "file 2" "file \"name\" \$(3)"'
eval 'printf "%s\n" '"$var" |
parallel my_function
Or you can convert the variable to an array:
var='"file 1" "file 2" "file \"name\" \$(3)"'
eval "arr=($var)"
And if the input is in an array:
parallel my_function ::: "${arr[@]}"
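A minimal sketch of the variable-to-array conversion, using the simple list from the question. Because eval re-parses the string as shell code, this is only reasonable when the contents of var come from a trusted source; the listing variable and the angle brackets are there just to make the element boundaries visible.

```bash
#!/usr/bin/env bash
var='"file 1" "file 2" "file 3"'

# eval re-parses $var as shell words, so only do this with trusted input:
# anything like $(...) inside $var would be executed.
eval "arr=($var)"

count=${#arr[@]}
listing=$(printf '<%s>\n' "${arr[@]}")
printf '%d elements\n%s\n' "$count" "$listing"
```

Once the values are in arr, they can be handed to parallel after ::: or written out as a NUL-separated stream for xargs -0.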

Related

Make the bash script faster

I have a fairly large list of websites in "file.txt" and want to check whether each site in the list contains the words "Hello World!", using a loop and curl.
For example, "file.txt" contains:
blabla.com
blabla2.com
blabla3.com
This is my code:
#!/bin/bash
put() {
  printf "list : "
  read list
  run=$(cat $list)
}
put
scan_list() {
  for run in $(cat $list);do
    if [[ $(curl -skL ${run}) =~ "Hello World!" ]];then
      printf "${run} Hello World! \n"
    else
      printf "${run} No Hello:( \n"
    fi
  done
}
scan_list
This takes a long time. Is there a way to make the checking process faster?
Use xargs:
% tr '\12' '\0' < file.txt | \
    xargs -0 -r -n 1 -t -P 3 sh -c '
      if curl -skL "$1" | grep -q "Hello World!"; then
        echo "$1 Hello World!"
        exit
      fi
      echo "$1 No Hello:("
    ' _
Use tr to convert the newlines in file.txt to nulls (\0).
Pass the result through xargs with the -0 option so that it splits on nulls.
The -r option prevents the command from being run if the input is empty. It is a GNU extension, so on macOS or *BSD you may need to check that file.txt is not empty before running.
The -n 1 option permits only one entry per execution.
The -t option is for debugging; it prints each command before it is run.
The -P 3 option allows 3 commands to run in parallel.
Using sh -c with a single-quoted multi-line command, we refer to each entry from the file as $1.
The _ fills in the $0 argument, so our entries are $1.
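Where -r is not available, the emptiness check mentioned above can be sketched with the shell's -s test. In this sketch the list file is a hypothetical temporary file created just for the demonstration, and echo stands in for the real curl check so that it runs offline.

```bash
# Hypothetical list file, created here only so the sketch is self-contained.
list=$(mktemp) || exit 1
printf 'blabla.com\nblabla2.com\n' > "$list"

if [ -s "$list" ]; then   # -s: the file exists and is not empty
  # echo stands in for the real curl check so the sketch runs offline.
  checked=$(tr '\n' '\0' < "$list" | xargs -0 -n 1 sh -c 'echo "would check $1"' _)
else
  checked="nothing to check"
fi
printf '%s\n' "$checked"
rm -f "$list"
```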

How to extract code into a function when using xargs -P?

At first, I wrote this code, and it runs well.
# version1
all_num=10
thread_num=5
a=$(date +%H%M%S)
seq 1 ${all_num} | xargs -n 1 -I {} -P ${thread_num} sh -c 'echo abc{}'
b=$(date +%H%M%S)
echo -e "startTime:\t$a"
echo -e "endTime:\t$b"
Now I want to extract the code into a function, but it does not work. How can I fix it?
get_file(i){
echo "abc"+i
}
all_num=10
thread_num=5
a=$(date +%H%M%S)
seq 1 ${all_num} | xargs -n 1 -I {} -P ${thread_num} sh -c "$(get_file {})"
b=$(date +%H%M%S)
echo -e "startTime:\t$a"
echo -e "endTime:\t$b"
Because /bin/sh isn't guaranteed to support either printing text that defines your function when evaluated, or exporting functions through the environment, we need to do this the hard way: duplicating the text of the function inside the copy of sh started by xargs.
Other questions on this site already describe how to accomplish this with bash, which is considerably easier. See, for example, How can I use xargs to run a function in a command substitution for each match?
#!/bin/sh
all_num=10
thread_num=5
batch_size=1 # but with a larger all_num, turn this up to start fewer copies of sh
a=$(date +%H%M%S) # warning: this is really inefficient
seq 1 ${all_num} | xargs -n "${batch_size}" -P "${thread_num}" sh -c '
get_file() { i=$1; echo "abc ${i}"; }
for arg do
get_file "$arg"
done
' _
b=$(date +%H%M%S)
printf 'startTime:\t%s\n' "$a"
printf 'endTime:\t%s\n' "$b"
Note:
echo -e is not guaranteed to work with /bin/sh. Moreover, for a shell to be truly compliant, echo -e is required to write -e to its output. See Why is printf better than echo? on UNIX & Linux Stack Exchange, and the APPLICATION USAGE section of the POSIX echo specification.
Putting {} in a sh -c '...{}...' position is a Really Bad Idea. Consider the case where you're passed in a filename that contains $(rm -rf ~)'$(rm -rf ~)' -- it can't be safely inserted in an unquoted context, or a double-quoted context, or a single-quoted context, or a heredoc.
Note that seq is also nonstandard and not guaranteed to be present on all POSIX-compliant systems. i=1; while [ "$i" -le "$all_num" ]; do echo "$i"; i=$((i + 1)); done is an alternative that works on all POSIX systems and, like seq 1 ${all_num}, counts from 1 to all_num.
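The difference between substituting {} into the script text and passing it as a separate argument can be seen with a harmless payload. In this sketch the payload is $(echo INJECTED) rather than anything destructive; the variable names are chosen only for illustration.

```bash
payload='$(echo INJECTED)'

# Unsafe: {} is pasted into the script text, so sh runs the command substitution.
unsafe=$(printf '%s\n' "$payload" | xargs -I {} sh -c 'echo {}')

# Safe: {} is passed as a separate argument and only ever read as data via "$1".
safe=$(printf '%s\n' "$payload" | xargs -I {} sh -c 'printf "%s\n" "$1"' _ {})

printf 'unsafe: %s\nsafe:   %s\n' "$unsafe" "$safe"
```

The unsafe variant prints INJECTED because the payload was executed, while the safe variant prints the payload text unchanged.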

Why am I not getting a value when I call a function within another in a bash script?

I have a function that generates a random file name
#generate random file names
get_rand_filename() {
  if [ "$ASCIIONLY" == "1" ]; then
    for ((i=0; i<$((MINFILENAMELEN+RANDOM%MAXFILENAMELEN)); i++)) {
      printf \\$(printf '%03o' ${AARR[RANDOM%aarrcount]});
    }
  else
    # no need to escape double quotes for filename
    cat /dev/urandom | tr -dc '[ -~]' | tr -d '[$></~:`\\]' | head -c$((MINFILENAMELEN+RANDOM%MAXFILENAMELEN)) #| sed 's/\(["]\)/\\\1/g'
  fi
  printf "%s" $FILEEXT
}
export -f get_rand_filename
When I call it from within another function
cf(){
fD=$1
echo "the target dir recieved is " $fD
CFILE="$(get_rand_filename)"
echo "the file name is "$CFILE
}
export -f cf
when I call
echo "$targetdir" | xargs -0 sh -c 'cf $1' sh
I only get the FILEEXT (no random file name).
when I call
cf "$targetdir"
I get a valid result
I need to be able to handle spaces in the $targetdir and file name string.
echo "$targetdir" | xargs -0 sh -c 'cf $1' sh
You should invoke bash rather than sh. Function exporting is a bash feature.
$ foo() { echo bar; }
$ export -f foo
$ sh -c 'foo'
sh: 1: foo: not found
$ bash -c 'foo'
bar
Also, get rid of the -0 option since the input isn't NUL-separated. Use -d'\n' instead. And quote "$1" for robustness.
echo "$targetdir" | xargs -d'\n' bash -c 'cf "$1"' bash
Actually, you could use -0 if you change the input format.
printf '%s\0' "$targetdir" | xargs -0 bash -c 'cf "$1"' bash
For what it's worth, mktemp creates random temporary files, and does it safely. It makes sure the file doesn't already exist and then creates it to prevent anybody else from snatching up the name in the split second between the name being generated and it being returned to the caller.
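A minimal sketch of the mktemp approach, assuming a target directory whose name contains a space. The parent directory and the "target dir" name are hypothetical and are created here only so that the example is self-contained.

```bash
# Hypothetical target directory with a space in its name, created for the demo.
parent=$(mktemp -d) || exit 1
targetdir="$parent/target dir"
mkdir "$targetdir"

# The trailing X's are replaced with random characters, and the file is
# created by mktemp itself, so the name cannot be taken by anyone else.
CFILE=$(mktemp "$targetdir/XXXXXXXX") || exit 1
printf 'created: %s\n' "$CFILE"
```

Quoting "$targetdir" and "$CFILE" everywhere is what keeps the space in the directory name from splitting the path.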

Get xargs to word-split placeholder {}

(Though word splitting has a specific definition in Bash, in this post it means to split on spaces or tabs.)
Demonstrating the question using this input to xargs,
$ cat input.txt
LineOneWithOneArg
LineTwo WithTwoArgs
LineThree WithThree Args
LineFour With Double Spaces
and this Bash command to echo the arguments passed to it,
$ bash -c 'IFS=,; echo "$*"' arg0 arg1 arg2 arg3
arg1,arg2,arg3
notice how xargs -L1 word-splits each line into multiple arguments.
$ xargs <input.txt -L1 bash -c 'IFS=,; echo "$*"' arg0
LineOneWithOneArg
LineTwo,WithTwoArgs
LineThree,WithThree,Args
LineFour,With,Double,Spaces
However, xargs -I{} expands the whole line into {} as a single argument.
$ xargs <input.txt -I{} bash -c 'IFS=,; echo "$*"' arg0 {}
LineOneWithOneArg
LineTwo WithTwoArgs
LineThree WithThree Args
LineFour With Double Spaces
While this is perfectly reasonable behavior for the vast majority of cases, there are times when word-splitting behavior (the first xargs example) is preferred.
And while xargs -L1 can be seen as a workaround, it can only be used to place arguments at the end of the command line, making it impossible to express
$ xargs -I{} command first-arg {} last-arg
with xargs -L1. (That is of course, unless command is able to accept arguments in a different order, as is the case with options.)
Is there any way to get xargs -I{} to word-split each line when expanding the {} placeholder?
Sort of.
echo -e "1\n2 3" | xargs sh -c 'echo a "$@" b' "$0"
Outputs:
a 1 2 3 b
ref: https://stackoverflow.com/a/35612138/1563960
Also:
echo -e "1\n2 3" | xargs -L1 sh -c 'echo a "$@" b' "$0"
Outputs:
a 1 b
a 2 3 b
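To confirm that each word really arrives as its own argument between a fixed first and last argument, here is a small sketch using two lines from the question's input. printf "<%s>" wraps every argument in brackets; first-arg and last-arg are the placeholders from the question, and _ simply fills $0.

```bash
# printf "<%s>" prints every argument in its own pair of brackets,
# which makes the argument boundaries visible.
split=$(printf '%s\n' 'LineOneWithOneArg' 'LineTwo WithTwoArgs' |
  xargs -L1 sh -c 'printf "<%s>" first-arg "$@" last-arg; echo' _)
printf '%s\n' "$split"
```

The second line of output shows LineTwo and WithTwoArgs in separate brackets, sitting between the fixed first and last arguments.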

Why can't a shell for expression parse an xargs parameter correctly?

I have a blacklist that stores a list of tag IDs, e.g. 1-3,7-9, which actually represents 1,2,3,7,8,9. I can expand it with the shell loop below:
for i in {1..3,7..9}; do for j in {$i}; do echo -n "$j,"; done; done
1,2,3,7,8,9
But first I need to convert - to ..
echo -n "1-3,7-9" | sed 's/-/../g'
1..3,7..9
Then I pass it to the for expression as a parameter:
echo -n "1-3,7-9" | sed 's/-/../g' | xargs -I @ for i in {@}; do for j in {$i}; do echo -n "$j,"; done; done
zsh: parse error near `do'
echo -n "1-3,7-9" | sed 's/-/../g' | xargs -I @ echo @
1..3,7..9
But the for expression cannot parse it correctly. Why is that?
Because you didn't do anything to stop the outermost shell from picking up the special keywords and characters (do, for, $, etc.) that you mean to be run by xargs.
xargs isn't a shell built-in; it gets the command line you want it to run for each element on stdin from its arguments. Just as with any other program, if you want ; or any other sequence that is special to bash in an argument, you need to escape it somehow.
It seems to me that what you really want here is to invoke a command (your nested for loops) in a subshell for each input element.
I've come up with this; it seems to do the job:
echo -n "1-3,7-9" \
| sed 's/-/../g' \
| xargs -I @ \
bash -c "for i in {@}; do for j in {\$i}; do echo -n \"\$j,\"; done; done;"
which gives:
{1..3},{7..9},
You could use the shell commands below to achieve this:
# macOS: the newline in sed needs special treatment
echo "1-3,7-9" | sed -e 's/-/../g' -e $'s/,/\\\n/g' | xargs -I# echo 'for i in {#}; do echo -n "$i,"; done' | bash
1,2,3,7,8,9,%
#Linux
echo "1-3,7-9" | sed -e 's/-/../g' -e 's/,/\n/g' | xargs -I# echo 'for i in {#}; do echo -n "$i,"; done' | bash
1,2,3,7,8,9,
But this approach is a little complicated; awk may be more intuitive:
# awk
echo "1-3,7-9,11,13-17" | awk '{n=split($0,a,","); for(i=1;i<=n;i++){m=split(a[i],a2,"-");for(j=a2[1];j<=a2[m];j++){print j}}}' | tr '\n' ','
1,2,3,7,8,9,11,13,14,15,16,17,%
echo -n "1-3,7-9" | perl -ne 's/-/../g;$,=",";print eval $_'
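The same expansion can also be sketched in plain bash without xargs, eval, or a second shell. The function name expand_ranges is hypothetical, and the sketch assumes the input is a well-formed list of non-negative decimal numbers and lo-hi ranges without leading zeros.

```bash
#!/usr/bin/env bash
# Hypothetical helper: expand "1-3,7-9,11" into "1,2,3,7,8,9,11".
expand_ranges() {
  local spec=$1 part lo hi i out=()
  local IFS=,                 # split the spec on commas, and join with commas below
  for part in $spec; do
    lo=${part%-*}             # text before the dash (the whole item if no dash)
    hi=${part#*-}             # text after the dash (the whole item if no dash)
    for ((i = lo; i <= hi; i++)); do
      out+=("$i")
    done
  done
  echo "${out[*]}"            # "${out[*]}" joins with the first character of IFS
}

expand_ranges "1-3,7-9,11,13-17"
```

Unlike the brace-expansion variants, this produces no trailing comma and never hands the input to a shell parser.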

Resources