appending file contents as parameter for unix shell command - bash

I'm looking for a unix shell command to append the contents of a file as the parameters of another shell command. For example:
command << commandArguments.txt

xargs was built specifically for this:
cat commandArguments.txt | xargs mycommand
If you have multiple lines in the file, you can use xargs -L1 -P10 to run ten copies of your command at a time, in parallel.
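As a rough illustration of the two modes mentioned above (one invocation for everything versus one invocation per line), here is a small sketch. The temporary file stands in for commandArguments.txt and "echo mycommand" stands in for the real command; both are placeholders, not part of the original question.

```shell
# Placeholder input file standing in for commandArguments.txt.
args_file=$(mktemp)
printf '%s\n' '-a -b' 'file1 file2' > "$args_file"

# Default: every whitespace-separated word becomes an argument of ONE invocation.
all_at_once=$(xargs echo mycommand < "$args_file")

# -L 1: one invocation per input line.
per_line=$(xargs -L 1 echo mycommand < "$args_file")

printf '%s\n' "$all_at_once" "$per_line"
rm -f "$args_file"
```

Redirecting the file into xargs avoids the extra cat process; the result is the same as with the pipe.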

xargs takes its standard input and formats it as positional parameters for a shell command. It was originally meant to deal with short command line limits, but it is useful for other purposes as well.
For example, within the last minute I've used it to connect to 10 servers in parallel and check their uptimes:
echo server{1..10} | tr ' ' '\n' | xargs -n 1 -P 50 -I ^ ssh ^ uptime
Some interesting aspects of this command pipeline:
The names of the servers to connect to were taken from the incoming pipe
The tr is needed to put each name on its own line. By default xargs splits its input on any whitespace, but with -I it treats each input line as a single item
The -n option controls how many incoming arguments are used per command invocation. -n 1 says make a new ssh process for each argument. (-I already implies one item per invocation, and GNU xargs treats -n and -I as mutually exclusive, so -n 1 is redundant here.)
By default, the parameters are appended to the end of the command. With -I, one can specify a token (^) that will be replaced with the argument instead.
The -P controls how many child processes to run concurrently, greatly widening the space of interesting possibilities.
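The same pipeline shape can be tried safely without any servers. In this sketch echo stands in for ssh, the host names server1..server3 are made up, and {} is used as the replacement token instead of ^; the output is sorted only because parallel jobs can finish in any order.

```shell
# echo prints the command that would be run; no real ssh connection is made.
out=$(printf '%s\n' server1 server2 server3 |
      xargs -P 3 -I '{}' echo 'ssh {} uptime' |
      sort)

printf '%s\n' "$out"
```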

command `cat commandArguments.txt`
Using backticks substitutes the output of the enclosed command into the outer command line, where the shell then splits it into separate arguments on whitespace.
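A small sketch of what the shell does with the substituted text, assuming the default IFS. count_args is a hypothetical helper that only reports how many arguments it received, and $(...) is the modern equivalent of backticks.

```shell
# Hypothetical helper: prints the number of arguments it was given.
count_args() { echo "$#"; }

args_file=$(mktemp)
printf '%s\n' 'one two' 'three' > "$args_file"

# Unquoted substitution: the file contents are split on whitespace.
unquoted=$(count_args $(cat "$args_file"))

# Quoted substitution: the whole file becomes a single argument.
quoted=$(count_args "$(cat "$args_file")")

echo "unquoted=$unquoted quoted=$quoted"
rm -f "$args_file"
```

Because of this splitting, the backtick form cannot preserve arguments that contain spaces, and very large files can exceed the command line length limit, which is where xargs helps.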

Related

Joining every group of N lines into one with bash

I would like to join every group of N lines in the output of another command using bash.
Are there any standard Linux commands I can use to achieve this?
Example:
./command
46.219464 0.000993
17.951781 0.002545
15.770583 0.002873
87.431820 0.000664
97.380751 0.001921
25.338819 0.007437
Desired output:
46.219464 0.000993 17.951781 0.002545
15.770583 0.002873 87.431820 0.000664
97.380751 0.001921 25.338819 0.007437
If your output has a consistent number of fields, you can use xargs -n N to group N elements per line:
$ ...command... | xargs -n4
46.219464 0.000993 17.951781 0.002545
15.770583 0.002873 87.431820 0.000664
97.380751 0.001921 25.338819 0.007437
From man xargs:
-n max-args, --max-args=max-args
Use at most max-args arguments per command line. Fewer than max-args
arguments will be used if the size (see the -s option) is exceeded,
unless the -x option is given, in which case xargs will exit.
Seems like you're trying to join every two lines with the delimiter \t(tab). If yes then you could try the below paste command,
command | paste -d'\t' - -
If you want space as delimiter then use -d<space>,
command | paste -d' ' - -
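Both approaches can be compared side by side. In this sketch the sample function is a stand-in for ./command and prints the first four lines of the example data; for whitespace-separated fields without quotes, the two results should be identical.

```shell
# Stand-in for ./command, printing two fields per line.
sample() {
  printf '%s\n' '46.219464 0.000993' '17.951781 0.002545' \
                '15.770583 0.002873' '87.431820 0.000664'
}

# xargs regroups individual fields: 4 fields per output line.
with_xargs=$(sample | xargs -n 4)

# paste regroups whole lines: 2 input lines per output line, joined by a space.
with_paste=$(sample | paste -d ' ' - -)

printf '%s\n' "$with_xargs"
```

The difference matters when the number of fields per line varies: xargs counts fields, while paste counts lines.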

Pass Every Line of Input as stdin for Invocation of Utility

I have a file containing valid xmls (one per line) and I want to execute a utility (xpath) on each line one by one.
I tried xargs but that doesn't seem to have an option to pass the line as stdin :-
% cat <xmls-file> | xargs -p -t -L1 xpath -p "//Path/to/node"
Cannot open file '//Path/to/node' at /System/Library/Perl/Extras/5.12/XML/XPath.pm line 53.
I also tried parallel --spreadstdin but that doesn't seem to work either :-
% cat <xmls-file> | parallel --spreadstdin xpath -p "//Path/to/node"
junk after document element at line 2, column 0, byte 1607
If you want every line of a file to be split off and made stdin for a utility
you could use a while loop in bash shell:
cat xmls-file | while IFS= read -r f   # read each line, unmodified, into $f
do ( printf '%s\n' "$f" > /tmp/input$$;   # write the line to a temp file
xpath -p "//Path/to/node" </tmp/input$$
rm -f /tmp/input$$
);
done
The $$ appends the process id number, creating a unique name
I assume xmls-file contains, on each line, what you want iterated into $f and that you want this as stdin for a command line, not as a parameter to the command.
On the other hand, your specification may be incorrect and maybe instead you need each line
to be part of a command. In that case, delete the echo and rm lines, and change the xpath command to include $f wherever the line from the file is needed.
I've not done much XML so the do command may need to be edited.
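A variant of the same loop that skips the temporary file by piping each line straight into the utility. This is only a sketch: tr stands in for xpath so that it runs anywhere, and the two-line input file is made up.

```shell
# Made-up input: one small XML document per line.
xml_file=$(mktemp)
printf '%s\n' '<a>1</a>' '<b>2</b>' > "$xml_file"

# Each line becomes the complete stdin of one invocation of the utility.
# tr is a placeholder for the real command (xpath in the question).
out=$(while IFS= read -r line; do
        printf '%s\n' "$line" | tr 'a-z' 'A-Z'
      done < "$xml_file")

printf '%s\n' "$out"
rm -f "$xml_file"
```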
You are very close with the GNU Parallel version; only -n1 missing:
cat <xmls-file> | parallel -n1 --spreadstdin xpath -p "//Path/to/node"

xargs input involving spaces

I am working on a Mac using OSX and I'm using bash as my shell. I have a script that goes something to the effect of:
VAR1="pass me into parallel please!"
VAR2="oh me too, and there's actually a lot of us, but its best we stay here too"
printf "%s\n" {0..249} | xargs -0 -P 8 -n 1 . ./parallel.sh
I get the error: xargs: .: Permission denied. The purpose is to run another script in parallel (called parallel.sh) which gets fed the numbers 0-249. Additionally I want to make sure that parallel.sh can see and use VAR1 and VAR2. But when I try to source the script with . ./parallel.sh, xargs doesn't like that. The point of sourcing is that the script has other variables I wish parallel.sh to have access to.
I have read something about using print0 since xargs separates its inputs by spaces, but I really didn't understand what -print0 does and how to use it. Thanks for any help you guys can offer.
If you want the several processes running the script, then they can't be part of the parent process and therefore they can't access the exact same variables. However, if you export your variables, then each process can get a copy of them:
export VAR1="pass me into parallel please!"
export VAR2="oh me too, and there's actually a lot of us, but its best we stay here too"
printf "%s\n" {0..249} | xargs -P 8 -n 1 ./parallel.sh
Now you can just drop the extra dot since you aren't sourcing the parallel.sh script, you are just running it.
Also there is no need to use -0 since your input is just a series of numbers, one on each line.
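The effect of export can be checked directly. In this sketch the variable names DEMO_EXPORTED and DEMO_LOCAL are made up, and an inline sh -c script stands in for parallel.sh.

```shell
DEMO_LOCAL="only in this shell"          # not exported: children do not see it
export DEMO_EXPORTED="visible"           # exported: copied into every child

# The inline script plays the role of parallel.sh; $1 is the number from xargs.
out=$(printf '%s\n' 0 1 |
      xargs -n 1 sh -c 'echo "$1:${DEMO_EXPORTED:-unset}:${DEMO_LOCAL:-unset}"' sh)

printf '%s\n' "$out"
```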
To avoid the space problem I'd use the newline character as the separator for xargs with the -d option (note that -d is a GNU xargs option):
xargs -d '\n' ...
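Where -d is not available, one possible workaround is to convert newlines to NUL characters and use -0 instead. The sketch below uses two made-up input lines containing spaces and compares the default splitting with the NUL-delimited form.

```shell
# Default splitting: each word is a separate argument, so 4 invocations.
default_count=$(printf '%s\n' 'first item' 'second item' |
                xargs -n 1 echo | wc -l | tr -d ' ')

# NUL-delimited: each original line stays intact as one argument.
nul_out=$(printf '%s\n' 'first item' 'second item' |
          tr '\n' '\0' | xargs -0 -n 1 echo)

echo "default invocations: $default_count"
printf '%s\n' "$nul_out"
```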
I think you have a permission issue. Try giving the file "parallel.sh" execute permission.
The command works fine for me:
Kaizen ~/so_test $ printf "%s\n" {0..4} | xargs -0 -P 8 -n 1 echo
0
1
2
3
4
man find :
-print0
True; print the full file name on the standard output, followed by a
null character (instead of the newline character that -print uses).
This allows file names that contain newlines or other types of white space to be correctly interpreted by programs that process the find
output. This option corresponds to the -0 option of xargs.
For how to use -print0, see this related Stack Overflow question:
Capturing output of find . -print0 into a bash array
The issue of passing arguments is related to xargs's interpretation of white space. From the xargs man page:
-0 Change xargs to expect NUL (``\0'') characters as separators, instead of spaces and newlines.
The issue of environment variables can be solved by using export to make the variables available to subprocesses:
say.sh
echo "$1 $V"
result
bash$ export V=whatevs
bash$ printf "%s\n" {0..3} | xargs -P 8 -n 1 ./say.sh
1 whatevs
2 whatevs
0 whatevs
3 whatevs

Change text in argument for xargs (or GNU Parallel)

I have a program that I can run in two ways: single-end or paired-end mode. Here's the syntax:
program <output-directory-name> <input1> [input2]
Where the output directory and at least one input is required. If I wanted to run this on three files, say, sample A, B, and C, I would use something like find with xargs or parallel:
user#host:~/single$ ls
sampleA.txt sampleB.txt sampleC.txt
user#host:~/single$ find . -name "sample*" | xargs -i echo program {}-out {}
program ./sampleA.txt-out ./sampleA.txt
program ./sampleB.txt-out ./sampleB.txt
program ./sampleC.txt-out ./sampleC.txt
user#host:~/single$ find . -name "sample*" | parallel --dry-run program {}-out {}
program ./sampleA.txt-out ./sampleA.txt
program ./sampleB.txt-out ./sampleB.txt
program ./sampleC.txt-out ./sampleC.txt
But when I want to run the program in "paired-end" mode, I need to give it two inputs. These are related files, but they can't simply be concatenated - you have to run the program with both as inputs. Files are named sensibly, e.g., sampleA_1.txt and sampleA_2.txt.
I want to be able to create this easily on the command line with something like xargs (or preferably parallel):
user#host:~/paired$ ls
sampleA_1.txt sampleB_1.txt sampleC_1.txt
sampleA_2.txt sampleB_2.txt sampleC_2.txt
user#host:~/paired$ find . -name "sample*_1.txt" | sed/awk? | parallel ?
program ./sampleA-out ./sampleA_1.txt ./sampleA_2.txt
program ./sampleB-out ./sampleB_1.txt ./sampleB_2.txt
program ./sampleC-out ./sampleC_1.txt ./sampleC_2.txt
Ideally, the command would strip off the _1.txt to create the output directory name (sampleA-out, etc), but I really need to be able to take that argument and change the _1 to a _2 for the second input.
I know this is dead simple with a script - I did this in Perl with a quick regular expression substitution. But I would love to be able to do this with a quick one-liner.
Thanks in advance.
I did this in Perl with a quick regular expression substitution. But I would love to be able to do this with a quick one-liner.
Perl has one-liners, too, just as sed and awk do. You can write:
find . -name "sample*_1.txt" | perl -pe 's/_1\.txt$//' | parallel program {}-out {}_1.txt {}_2.txt
(The -e flag means "the next argument is the program text"; the -p flag means "the program should be run in a loop: for each line of input, set $_ to that line, then run the program, then print $_".)
With sed and xargs you could do something like this:
find . -name "sample*_1.txt" | sed -n 's/_1\..*$//;h;s/$/-out/p;g;s/$/_1.txt/p;g;s/$/_2.txt/p' | xargs -L 3 echo program
I.e.: sed creates the three arguments, one per line, and xargs -L 3 composes command lines from three lines each.
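To see the sed step in isolation, the sketch below feeds it two made-up file names with printf instead of find. It uses the -out suffix so the directory names match the ones requested in the question.

```shell
# For each input name, sed prints three lines: output dir, first input, second input.
# h saves the stripped base name; g restores it before each new suffix is added.
out=$(printf '%s\n' ./sampleA_1.txt ./sampleB_1.txt |
      sed -n 's/_1\..*$//;h;s/$/-out/p;g;s/$/_1.txt/p;g;s/$/_2.txt/p' |
      xargs -L 3 echo program)

printf '%s\n' "$out"
```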
Assuming you always have exactly 2 files in your directory for each pair and assuming they get sorted the right way by find (this you can ensure by piping results of find through sort), maybe xargs -L 2 would do the job. This tells xargs to place 2 consecutive incoming lines on each command line it executes.
A shorter version:
parallel --xapply program {1.}.out {1} {2} :::: <(ls *_1.txt) <(ls *_2.txt)
but this only works if every _1.txt has a matching _2.txt and vice versa.
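For comparison, the renaming itself can also be done with plain shell parameter expansion. This is an alternative sketch, not taken from any of the answers above; the file names are hard-coded examples and echo only prints the command that would be run.

```shell
out=$(for first in ./sampleA_1.txt ./sampleB_1.txt; do
        base=${first%_1.txt}          # strip the _1.txt suffix
        echo program "$base-out" "$first" "${base}_2.txt"
      done)

printf '%s\n' "$out"
```

In real use the list would come from a glob such as ./sample*_1.txt, and the loop runs the jobs one after another rather than in parallel.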

To understand xargs better

I want to understand the use of xargs man in Rampion's code:
screen -t man /bin/sh -c 'xargs man || read'
Thanks to Rampion: we do not need cat!
Why do we need xargs in the command?
I understand the xargs -part as follows
cat nothing to xargs
xargs makes a list of man -commands
I have had an idea that xargs makes a list of commands. For instance,
find . -type f -print0 | xargs -0 grep masi
is the same as a list of commands:
find fileA AND grep masi in it
find fileB AND grep masi in it
and so on for fileC, fileD, ...
No, I don't cat nothing. I cat whatever input I get after I run the command. cat is actually extraneous here, so let's ignore it.
xargs man waits on user input. Which is necessary. Since in the script you grabbed that from, I can't paste in the argument for man until after I create the window. So the command that runs in the window needs to wait for me to give it something, before it tries to run man.
If we just ran screen /bin/sh -c 'man || read', it would always complain "What manual page do you want?" since we never told it.
xargs gathers arguments from stdin and executes the command given with those arguments.
So cat is waiting for something to be typed, and then xargs is running man with that input.
xargs is useful if you have a lot of files to process; I often use it with output from find.
xargs will stuff as many arguments as it can onto the command line.
It's great for doing something like
find . -name '*.o' -print | xargs rm
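The batching behaviour can be observed by letting each invocation report how many arguments it received. In this sketch the inline sh -c script is a stand-in for the real command, and the five input items are made up.

```shell
# Default: all five items fit on one command line, so there is one invocation.
one_batch=$(printf '%s\n' a b c d e | xargs sh -c 'echo "$# args"' sh)

# -n 2 caps each invocation at two arguments, so there are three invocations.
small_batches=$(printf '%s\n' a b c d e | xargs -n 2 sh -c 'echo "$# args"' sh)

echo "$one_batch"
printf '%s\n' "$small_batches"
```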
The cat command does not operate on nothing; it operates on standard input, up until it is told that the input is ended. As Rampion notes, the cat command is not necessary here, but it is operating on its implicit input (standard input), not on nothing.
The xargs command reads the output from cat, and groups the information into arguments to the man command specified as its (only) argument. When it reaches a limit (configurable on the command line), it will execute the man command.
The find ... -print0 | xargs -0 ... idiom deals with file names that contain awkward characters such as blanks, tabs and newlines. The find command prints each filename followed by an ASCII NUL ('\0'); this is one of two characters that cannot appear in a simple file name - the other being '/' (which appears in path names, of course, but not in simple file names). It is not directly equivalent to the sequence you provide; xargs groups collections of file names into a single argument list, up to a size limit. If the names are short enough (they usually are), then there will be fewer executions of grep than there are file names.
Note, too, that grep only prints the file name where the material is found if it has more than one file to search -- or if it supports an option so that it always prints the file names and the option is used: '-H' is a GNU extension to grep that does this. The portable way to ensure that the file names always appear is to list /dev/null as the first file (so 'xargs grep something /dev/null'); it doesn't take long to search /dev/null.
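A minimal sketch of the /dev/null trick, using a temporary directory with a single made-up file whose name contains a space. It assumes a find and xargs that support -print0 and -0.

```shell
demo_dir=$(mktemp -d)
printf 'masi\n' > "$demo_dir/only file.txt"

# One file to search: grep prints just the matching line.
without_name=$(find "$demo_dir" -type f -print0 | xargs -0 grep masi)

# /dev/null counts as a second file, so grep prefixes each match with its file name.
with_name=$(find "$demo_dir" -type f -print0 | xargs -0 grep masi /dev/null)

printf '%s\n' "$without_name" "$with_name"
rm -rf "$demo_dir"
```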

Resources