Passing arguments to parallel processing in xargs - bash

What is the purpose of xargs_tasks in the following command?
cat source.ndjson | tr '\n' '\0' | xargs -0 -n 1 -P 8 sh -c './theprocess.py $1' xargs_tasks >> errors.log 2>&1
Namely, after reading the man page details of the -c option to sh, I found that you can get the same results by running:
cat source.ndjson | tr '\n' '\0' | xargs -0 -n 1 -P 8 sh -c './theprocess.py $0' >> errors.log 2>&1
That is, with -c, any arguments that come after the quoted string are assigned to $0, $1, and so forth. In the first command xargs_tasks becomes $0, and the arguments that xargs passes one at a time become $1. Hence my second command behaves the same, because it references $0 directly instead of $1.
My guess was that xargs_tasks gives you a short string to filter with in something like htop, but that's a stretch.

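When you use the -c option to sh, the first argument after the command string is assigned to $0, and the arguments after it become $1, $2, etc.
$0 is supposed to contain the name of the shell or script (the shell uses it, for example, as the prefix of its error messages), while real arguments start at $1. So xargs_tasks is just a placeholder that keeps the first actual argument out of $0. This also matters for "$@" and for loops such as for arg; do ...; done, which start at $1 and never include $0.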
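A minimal sketch of the same mechanism, assuming a POSIX sh and an xargs that supports -0 (GNU and BSD xargs do). The helper name run_each and the sample records are made up for illustration:

```shell
#!/bin/sh
# run_each (hypothetical helper): one sh -c per NUL-terminated record.
# "xargs_tasks" fills $0, so each record arrives intact in "$1".
run_each() {
  xargs -0 -n 1 sh -c 'printf "0=%s 1=%s\n" "$0" "$1"' xargs_tasks
}

# Records containing spaces stay whole because "$1" is quoted.
printf '%s\0' 'line one' 'line two' | run_each
# 0=xargs_tasks 1=line one
# 0=xargs_tasks 1=line two
```

Dropping the placeholder and using $0, as in the question's second command, also works, but the data then appears as the "program name" in any error message from the inner shell, and "$@" will not include it.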

Related

Passing arrays as command line arguments with xargs

I have the following two scripts:
#script1.sh:
#!/bin/bash
this_chunk=(1 2 3 4)
printf "%s\n" "${this_chunk[@]}" | ./script2.sh
#script2.sh:
#!/bin/bash
while read -r arr
do
echo "--$arr"
done
When I execute script1.sh, the output is as expected:
--1
--2
--3
--4
which shows that I was able to pipe the elements of the array this_chunk as arguments to script2.sh. However, if I change the line calling script2.sh to
printf "%s\n" "${this_chunk[@]}" | xargs ./script2.sh
there is no output. My question is, how to pass the array this_chunk using xargs, rather than simple piping? The reason is that I will have to deal with large arrays and thus long argument lists which will be a problem with piping.
Edit:
Based on the answers and comments, this is the correct way to do it:
#script1.sh
#!/bin/bash
this_chunk=(1 2 3 4)
printf "%s\0" "${this_chunk[@]}" | xargs -0 ./script2.sh
#script2.sh
#!/bin/bash
for i in "$@"; do
echo "$i"
done
how to pass the array this_chunk using xargs
Note that by default xargs interprets quotes (' and ") and backslashes in its input. To disable that interpretation, either preprocess the data or, better, use GNU xargs with the -d '\n' option. The -d option is not part of POSIX xargs.
printf "%s\n" "${this_chunk[@]}" | xargs -d '\n' ./script2.sh
That said, with GNU xargs prefer NUL-terminated streams, which also preserve newlines inside elements:
printf "%s\0" "${this_chunk[@]}" | xargs -0 ./script2.sh
Your script ./script2.sh ignores its command-line arguments, and GNU xargs starts the command with standard input redirected from /dev/null. read -r arr therefore hits end of file immediately, so your script prints nothing, as expected. (Note that in POSIX xargs, the result is unspecified when the spawned process tries to read from stdin.)
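A self-contained bash sketch of the working approach: the body of script2.sh is inlined with sh -c (script2 is only the $0 placeholder), and print_args is a made-up helper name. Elements containing spaces survive because they travel NUL-terminated and are read back from "$@":

```shell
#!/bin/bash
# print_args (hypothetical): plays the role of "xargs -0 ./script2.sh",
# printing each command-line argument on its own line with a "--" prefix.
print_args() {
  xargs -0 sh -c 'for i in "$@"; do printf "%s\n" "--$i"; done' script2
}

this_chunk=(1 2 "three four" 5)
printf '%s\0' "${this_chunk[@]}" | print_args
# --1
# --2
# --three four
# --5
```

If the array exceeds the system's argument-length limit, xargs splits it across several invocations of the command, which is why the receiving script should loop over "$@" rather than assume it sees every element in one call.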

Basename not working with xargs place holder

I am trying to use basename utility in xargs piped from printf as below:
printf "%s" "$ACTUAL_FILES" | xargs -d ' ' -i printf "%s\n" "$(basename {})"
Here $ACTUAL_FILES is an array of absolute file paths, each delimited with a space.
With the above snippet I am trying to print filename without path in each line. But the output I am getting is same as in $ACTUAL_FILES with each element in new line.
I know that we can achieve this with bash sub shell and echo with xargs, but I was informed to use printf with xargs.
How can I use basename or any other utility to get the filename?
You need to strip the path after xargs has run (I write your variable in lowercase):
printf "%s" "${actual_files}" | xargs -d ' ' -i printf "%s\n" "{}" | sed 's#.*/##'
Processing is easier if you start by replacing the spaces with newlines:
tr ' ' '\n' <<< "${actual_files}"| sed 's#.*/##'
You can avoid tr with
grep -Eo "[^/]*( |$)" <<< "${actual_files}"
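Your $(basename {}) is expanded by your own shell before xargs even starts: basename {} just prints {}, so xargs receives printf "%s\n" {} and prints every path unchanged. xargs itself passes arguments directly to the command, with no shell in between, so it cannot perform a command substitution ($(...)) for each item. You can either preprocess the items before sending them to xargs, or let xargs start a shell in which you can use all the shell features. Pass each item to that shell as a positional parameter ($1, with _ filling $0) rather than pasting {} into the script text, so that unusual file names cannot be interpreted as shell code:
printf "%s" "$ACTUAL_FILES" | xargs -d ' ' -n 1 bash -c 'printf "%s\n" "$(basename "$1")"' _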
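If you can use it, GNU parallel offers many more features for building a command. It splits its input on newlines by default, so tell it to split on spaces with -d ' ', and add -k to keep the output in input order:
printf "%s" "$ACTUAL_FILES" | parallel -k -d ' ' -q printf "%s\n" "{/}"
Here {/} is replaced by the basename of each input item, and -q preserves the quoting of the command, so "%s\n" reaches printf intact.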
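The following sketch shows why the original command prints the paths unchanged and how a per-item shell fixes it. It assumes GNU xargs (for -d); the helper name basenames and the sample paths are made up:

```shell
#!/bin/sh
# The substitution runs once, in this shell, before xargs is involved:
basename {}        # prints "{}"

# basenames (hypothetical): split a space-separated list of paths and
# print one basename per line; each path reaches the inner shell as "$1".
basenames() {
  printf '%s' "$1" | xargs -d ' ' -n 1 sh -c 'basename "$1"' _
}

basenames "/tmp/a/one.txt /var/log/two.log"
# one.txt
# two.log
```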

Curl and xargs in piped commands

I want to process an old database where passwords are stored in plain text (comma-separated; the password is the 5th field of the CSV file the database was exported to) and crypt them for later use by DokuWiki. Here is my bash command (grep and sed are there to extract the crypted password from the curl output):
cat users.csv | awk 'FS="," { print $4 }' | xargs -l bash -c 'curl -s --data-binary "pass1=$0&pass2=$0" "https://sprhost.com/tools/SMD5.php" -o - ' | xargs | grep -o '<tt.*tt>' | sed -e 's/tt//g' | sed -e 's/<[^>]*>//g'
I get the following message from xargs:
xargs: unmatched single quote; by default quotes are special to xargs unless you use the -0 option
Only the first line of the file is processed, and nothing happens after that.
Using the -0 option and playing around with quotes doesn't help. Where am I going wrong in this command line? Maybe a more advanced language would be better suited to this.
Thanks for your help, LM
In general, when such a long pipeline goes wrong, it is better to split it up and check each stage on its own. Going through your pipe:
cat users.csv |
Nothing unexpected there.
awk 'FS="," { print $4 }' |
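You probably wanted awk 'BEGIN {FS=","} { print $4 }' (or simply awk -F, '{ print $4 }'). As written, FS="," is a pattern: the assignment only runs while the first record is being processed, after that record has already been split on whitespace, so the first line is not split on commas. Try the first two commands of the pipe on their own and check that they produce the correct output.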
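A small check of the difference, using a made-up two-line CSV whose 4th field is a password; the function names buggy and fixed are only illustrative. The expected output assumes a POSIX-conforming awk such as gawk or mawk, where a new FS applies only from the next record on:

```shell
#!/bin/sh
# Hypothetical sample in the layout user,x,y,password
sample='u1,x,y,p1
u2,x,y,p2'

# FS="," used as a pattern: the first record was already split on blanks.
buggy() { printf '%s\n' "$sample" | awk 'FS="," { print $4 }'; }

# FS set before any input is read.
fixed() { printf '%s\n' "$sample" | awk -F',' '{ print $4 }'; }

buggy    # an empty line, then p2
fixed    # p1, then p2
```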
xargs -l bash -c 'curl -s --data-binary "pass1=$0&pass2=$0" "https://sprhost.com/tools/SMD5.php" -o - ' |
Nothing wrong there, although there might be better ways to do an MD5 hash.
xargs |
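What is this xargs doing in the pipe? It reads curl's HTML output and parses the quotes in it, so an apostrophe in the returned page is most likely what triggers the "unmatched single quote" error. It should be removed.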
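A quick way to reproduce that error, assuming GNU xargs and a made-up line of HTML containing an apostrophe:

```shell
#!/bin/sh
# Hypothetical line of HTML such as a web page might return.
line="<p>Don't forget the hash</p>"

# xargs parses quotes by default, so the lone apostrophe is fatal
# (GNU xargs reports "unmatched single quote" on stderr and exits non-zero).
printf '%s\n' "$line" | xargs || echo "xargs failed"

# Without the extra xargs, the line reaches grep unchanged.
printf '%s\n' "$line" | grep -o '<p>.*</p>'
```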
grep -o '<tt.*tt>' |
Note that this will produce two lines:
<tt>$1$17ab075e$0VQMuM3cr5CtElvMxrPcE0</tt>
<tt><your_docuwiki_root>/conf/users.auth.php</tt>
which is probably not what you expected.
sed -e 's/tt//g' |
sed -e 's/<[^>]*>//g'
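which will remove the HTML tags. Note, though, that s/tt//g also deletes any "tt" that happens to occur inside the hash itself, whereas
sed 's/<tt>//;s/<.tt>//'
removes only the tags, in a single command.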
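A sketch of that extraction on a made-up HTML fragment modelled on the two <tt> lines above; the page structure and the path are assumptions, and extract_hash is a hypothetical helper. It uses [^<]* instead of .* so that grep -o cannot run across several <tt> elements on one line, and head -n 1 to keep only the hash:

```shell
#!/bin/sh
# Hypothetical HTML returned by the hashing page.
html='<p>Hash: <tt>$1$17ab075e$0VQMuM3cr5CtElvMxrPcE0</tt></p>
<p>Paste it into <tt>/srv/wiki/conf/users.auth.php</tt></p>'

# extract_hash (hypothetical): print the content of the first <tt> element.
extract_hash() {
  printf '%s\n' "$html" | grep -o '<tt>[^<]*</tt>' | sed 's/<tt>//;s/<.tt>//' | head -n 1
}

extract_hash    # $1$17ab075e$0VQMuM3cr5CtElvMxrPcE0
```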
So I'd say a wrong awk and an xargs too many.

Bash code error unexpected syntax error

I am not sure why I am getting the "syntax error near unexpected token `('" error:
#!/bin/bash
DirBogoDict=$1
BogoFilter=/home/nikhilkulkarni/Downloads/bogofilter-1.2.4/src/bogofilter
echo "spam.."
for i in 'cat full/index |fgrep spam |awk -F"/" '{if(NR>1000)print$2"/"$3}'|head -500'
do
cat $i |$BogoFilter -d $DirBogoDict -M -k 1024 -v
done
echo "ham.."
for i in 'cat full/index | fgrep ham | awk -F"/" '{if(NR>1000)print$2"/"$3}'|head -500'
do
cat $i |$BogoFilter -d $DirBogoDict -M -k 1024 -v
done
Error:
./score.bash: line 7: syntax error near unexpected token `('
./score.bash: line 7: `for i in 'cat full/index |fgrep spam |awk -F"/" '{if(NR>1000)print$2"/"$3}'|head -500''
Uh, because you have massive syntax errors.
The immediate problem is that the single quote before cat is closed by the single quote that was meant to open the Awk script, so the Awk script itself ends up outside any quotes and is exposed to the shell, which of course cannot parse it as shell code.
Presumably you want to use backticks instead of single quotes, although you should actually not read input with for.
With a fair bit of refactoring, you might want something like
for type in spam ham; do
awk -F"/" -v type="$type" '$0 ~ type && ++n > 1000 && i++ < 500 {
print $2"/"$3 }' full/index |
xargs "$BogoFilter" -d "$DirBogoDict" -M -k 1024 -v
done
This refactors the useless cat | grep | awk | head into a single Awk script, and avoids the silly loop over each output line. The counter n counts only the matching lines, just as NR did after fgrep in the original pipeline (NR in a single Awk script would count every line of the index). I assume bogofilter can read file name arguments; if not, you will need to refactor the xargs slightly. If you can pipe all the files in one go, try
... xargs cat | "$BogoFilter" -d "$DirBogoDict" -M -k 1024 -v
or if you really need to pass in one at a time, maybe
... xargs sh -c 'for f; do "$BogoFilter" -d "$DirBogoDict" -M -k 1024 -v <"$f"; done' _
... in which case you will need to export the variables BogoFilter and DirBogoDict to expose them to the subshell (or just inline them -- why do you need them to be variables in the first place? Putting command names in variables is particularly weird; just update your PATH and then simply use the command's name).
In general, if you find yourself typing the same commands more than once, you should think about how to avoid that. This is called the DRY (Don't Repeat Yourself) principle.
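The selection logic of the single Awk script can be checked on its own. This sketch assumes a hypothetical index layout of "label ../dir/file" lines and makes the thresholds parameters (select_files is a made-up helper) so that a tiny sample suffices; with SKIP=1000 and COUNT=500 it corresponds to fgrep TYPE | awk 'NR>1000' | head -500, except that ~ does a regex match rather than fgrep's fixed-string match. It counts matching lines with its own counter, because NR in a single Awk script would count every line. Its output would then be piped to xargs as in the loop:

```shell
#!/bin/sh
# select_files TYPE SKIP COUNT FILE (hypothetical helper): among the lines that
# match TYPE, skip the first SKIP, then print field2/field3 of the next COUNT.
# ++n counts matching lines only, like NR after fgrep in the original script.
select_files() {
  awk -F'/' -v type="$1" -v skip="$2" -v count="$3" \
    '$0 ~ type && ++n > skip && i++ < count { print $2 "/" $3 }' "$4"
}

# Hypothetical index lines on stdin ("-"), with small thresholds.
printf '%s\n' 'spam ../data/001' 'ham ../data/002' 'spam ../data/003' \
  'spam ../data/004' 'ham ../data/005' 'spam ../data/006' |
  select_files spam 1 2 -    # data/003, then data/004
```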
The syntax error is due to bad quoting. The expression whose output you want to loop over should be in command substitution syntax ($(...) or backticks), not single quotes.

pipe tail output into another script

I am trying to pipe the output of a tail command into another bash script to process:
tail -n +1 -f your_log_file | myscript.sh
However, when I run it, $1 inside myscript.sh is never set. What am I missing? How do I pipe the output so that it becomes the input parameter of the script?
PS - I want tail to run forever and continue piping each individual line into the script.
Edit
For now, the entire content of myscript.sh is:
echo $1;
Generally, here is one way to handle standard input in a script:
#!/bin/bash
while IFS= read -r line; do
printf '%s\n' "$line"
done
That is a very rough bash equivalent to cat. It does demonstrate a key fact: each command inside the script inherits its standard input from the shell, so you don't really need to do anything special to get access to the data coming in. read takes its input from the shell, which (in your case) is getting its input from the tail process connected to it via the pipe.
As another example, consider this script; we'll call it 'mygrep.sh'.
#!/bin/bash
grep "$1"
Now the pipeline
some-text-producing-command | ./mygrep.sh bob
behaves identically to
some-text-producing-command | grep bob
$1 is set if you call your script like this:
./myscript.sh foo
Then $1 has the value "foo".
The positional parameters and standard input are separate; you could do this
tail -n +1 -f your_log_file | myscript.sh foo
Now standard input is still coming from the tail process, and $1 is still set to 'foo'.
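A self-contained sketch of that separation; myscript is a hypothetical shell function standing in for myscript.sh, and printf stands in for tail:

```shell
#!/bin/sh
# myscript (hypothetical stand-in for myscript.sh):
# $1 comes from the command line, the lines come from standard input.
myscript() {
  while IFS= read -r line; do
    printf '%s: %s\n' "$1" "$line"
  done
}

printf 'first line\nsecond line\n' | myscript foo
# foo: first line
# foo: second line
```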
Perhaps you were confusing this with awk, where $1 is the first field of each input line?
tail -n +1 -f your_log_file | awk '{
print $1
}'
would print the first column from the output of the tail command.
In the shell, a similar effect can be achieved with:
tail -n +1 -f your_log_file | while read -r first junk; do
echo "$first"
done
Alternatively, you could put the whole while ... done loop inside myscript.sh.
Piping connects the output (stdout) of one process to the input (stdin) of another process. stdin is not the same thing as the arguments sent to a process when it starts.
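What you want to do is convert the lines in the output of your first process into arguments for the second process. This is exactly what the xargs command is for.
Put xargs between the two commands and let it run the script, instead of piping into the script. With -n 1, xargs starts myscript.sh once per line, as soon as that line arrives; -d '\n' (GNU-specific) keeps each whole line, spaces and quotes included, as a single argument:
tail -n +1 -f your_log_file | xargs -d '\n' -n 1 ./myscript.sh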
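A finite-input sketch of running a script once per line with xargs, assuming GNU xargs. The inline sh -c script stands in for myscript.sh (myscript is only its $0 placeholder), per_line is a made-up helper name, and printf stands in for tail -f. With -n 1 the command runs once per line; without it, xargs keeps collecting lines into one long argument list, and with a never-ending tail -f it may wait a long time before running anything:

```shell
#!/bin/sh
# per_line (hypothetical): run the "script" once per input line,
# passing the whole line as its first argument.
per_line() {
  xargs -d '\n' -n 1 sh -c 'printf "arg: %s\n" "$1"' myscript
}

printf 'error one\nerror two\n' | per_line
# arg: error one
# arg: error two
```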
