Piping output to executable invokes the executable multiple times - bash

I'm piping a command's output to be used as arguments for an executable:
command | xargs -d '\n' "executable"
When the command yields sufficiently many lines of output, I can see that the executable is run multiple times, with a different subset of the lines each run. This is problematic because the state in the executable assumes that each invocation is independent of the next.
Is it possible to force the "command" to feed the entire output in a single go to the executable?

Don't use xargs, use $(...) to substitute the output into the command line.
IFS=$'\n' # this is analogous to -d '\n' in xargs
set -o noglob # prevent wildcard expansion when substituting command output
executable $(command)
However, this could get an error if the output of command is too long. xargs splits it up into multiple invocations to prevent this. But if you really require everything to be in one invocation, the error is the way to tell that this isn't possible, and prevents the incorrect results due to multiple invocations.
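As a side note (not part of the original answer), the IFS and noglob changes above affect the rest of the shell session; a minimal sketch that keeps them local is to run the substitution in a subshell:
(
  IFS=$'\n'        # split the substituted output on newlines only
  set -o noglob    # prevent wildcard expansion of the substituted words
  executable $(command)
)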

Is it possible to force the "command" to feed the entire output in a single go to the executable?
Yes and no.
To run the executable only once, you can use
command | bash -c 'mapfile -t a; executable "${a[@]}"'
However, this might fail if you exceed ARG_MAX of your system. A program invocation together with its arguments and environment variables must be smaller than ARG_MAX bytes. (On Linux there is even an additional restriction limiting the size of each single argument). There is no way around this.
You can check your ARG_MAX using getconf ARG_MAX or xargs --show-limits < /dev/null; typical values for various systems are also collected in online comparison lists.
If you are barely over the maximum and don't need environment variables, you can clear the environment to make some space.
command | env -i bash -c 'mapfile -t a; executable "${a[@]}"'
Other than that, there is no way around running executable multiple times, or modifying it, preferably so that it reads lines from stdin instead of taking arguments. That way you can write
command | executable
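If executable happened to be a shell script you control, a minimal sketch of reading lines from stdin could look like the following (handle_line is a hypothetical per-line handler):
while IFS= read -r line; do    # read one line at a time, preserving whitespace and backslashes
  handle_line "$line"          # hypothetical function processing a single line
done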

Related

How to count number of lines in a file in a Makefile?

I have a Makefile in which I have a text file and a certain number. I would like to compare the number of lines in that text file to see if it is equal to or greater than that number, but haven't had success using the wc bash command.
I've tried using wc on the text file, but $(wc -l < filename.txt) always evaluates as empty, making ifeq ($(wc -l < filename.txt), number) error out. I've tried using the bash syntax for if statements, but that didn't work either.
e.g.
TEST="$(wc -l < missing_test_line_count.txt)"
TEST1=$(wc -l missing_test_line_count.txt)
TEST2=`wc -l missing_test_line_count.txt`
TEST3="$(wc -l missing_test_line_count.txt)"
Doing @echo $(value TEST) for any of these variables (TEST through TEST3) results in empty output.
I've also tried to put the number of lines in the text file into another text file (e.g. linecount.txt with a single line that says '30'). Then I tried to get that file content stored into a variable so I could compare it to the number, but that hasn't worked because I cannot define the variable at the beginning.
Any suggestions? Is this possible, or do I have to write a script separately? Would like to do it within the Makefile if possible.
First of all, you should specify that you need GNU make's syntax (your reference to ifeq is the only clue to that).
Back to your actual question, there is no wc function in gmake itself. What you're trying to do is to execute an external command. You do that with a shell function:
TEST= $(shell wc -l < missing_test_line_count.txt)
In BSD make, the same is achieved with the != assignment:
TEST!= wc -l < missing_test_line_count.txt
To run a shell command as part of a variable expansion in a GNU makefile, you need to use the shell function:
TEST:=$(shell wc -l < missing_test_line_count.txt)
will run the command in question when this line is read in the Makefile, setting TEST to the result. Alternatively you can use
TEST=$(shell wc -l < missing_test_line_count.txt)
which will run the command each time the $(TEST) variable is expanded.
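A small illustration of the difference, assuming GNU make (date is used here only to make the evaluation time visible):
NOW_ONCE := $(shell date +%s)   # evaluated once, when the makefile is parsed
NOW_EACH  = $(shell date +%s)   # re-evaluated every time $(NOW_EACH) is expanded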
Note that this function (and functions in general) is a GNU Make extension. Plain POSIX make does not support it.
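Putting the pieces together for the original goal (comparing the line count to a number), a minimal GNU make sketch might look like this; MIN is a hypothetical threshold:
LINES := $(shell wc -l < missing_test_line_count.txt)
MIN := 30

ifeq ($(shell test $(LINES) -ge $(MIN) && echo yes),yes)
$(info file has at least $(MIN) lines)
else
$(error file has fewer than $(MIN) lines)
endif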

Replication and expansion of program flags in BASH script

I am working with a program that combines individual files, and I am incorporating this program into a BASH pipeline that I'm putting together. The program requires a flag for each file, like so:
program -V file_1.g.vcf -V file_2.g.vcf -V file_3.g.vcf -O combined_output.g.vcf
In order to allow the script to work with any number of samples, I would like to read the individual files names within a directory, and expand the path for each file after a '-V' flag.
I have tried adding the file paths to a variable with the following, but have not had success with proper expansion:
GVCFS=('-V' `ls gvcfs/*.g.vcf`)
Any help is greatly appreciated!
You can do this by using a loop to populate an array with the options:
options=()
for file in gvcfs/*.g.vcf; do # Don't parse ls, just use a direct wildcard expression
options+=(-V "${file##*/}") # If you want the full path, leave off ##*/
done
program "${options[#]}" -O combined_output.g.vcf
printf can help:
options=( $(printf -- "-V %s " gvcfs/*.g.vcf ) )
Though this will not deal gracefully with whitespace in filenames.
Also consider realpath to generate absolute filenames.
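A minimal sketch combining the loop from the first answer with realpath (the gvcfs/ directory name is taken from the question):
options=()
for file in gvcfs/*.g.vcf; do
  options+=(-V "$(realpath "$file")")   # absolute path for each input file
done
program "${options[@]}" -O combined_output.g.vcf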

compare process list before and after running bash

Trying to compare the process list before and after running a bash script of tests. Having trouble, since ps returns 1, and I'm not sure how to compare the before and after when I have them.
Ideally, it would look something like this. Forgive the crude pseudo-code:
run-tests:
ps -x
export before=$?
# run tests and scripts
ps -x
export after=$?
# compare before and after
Suggestions and advice appreciated.
I'm assuming you want to count the number of running processes before and after (your question wasn't overly clear on that). If so, you can pipe ps into wc:
export before=`ps --no-headers | wc -l`
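For completeness, a plain-bash sketch of the counting approach (outside a makefile, so a single $ is fine):
before=$(ps --no-headers | wc -l)
# run tests and scripts here
after=$(ps --no-headers | wc -l)
echo "process count went from $before to $after"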
-- EDIT ---
I reread the question, and it may be that you're looking for the actual processes that differ. If that's the case, then, you can capture the output in variables and compare those:
target:
@before=$$(ps --no-headers); \
run test; \
after=$$(ps --no-headers); \
echo "differing processes:"; \
comm -3 <(echo "$$before") <(echo "$$after")
A few quick notes on this: I concatenated all the lines using \'s as you mentioned you used makefiles, and the scope of a variable is only the recipe line in which it's defined. By concatenating the lines, the variables have a scope of the whole recipe.
I used double $$ as your original post suggested a makefile, and a makefile $$ will expand to a single $ in the bash code to be run.
Doing var=$(command) in bash assigns var the output of command.
I used <(...) (process substitution), which is specific to bash. This lets you treat the output of a command as a file, without having to actually create a file. Notice that I put quotes around the variables; this is required, otherwise bash will collapse the newlines when expanding the variables.
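One caveat not mentioned above: comm assumes its inputs are sorted, so for reliable results it may be safer to sort both snapshots first (in a makefile recipe the $ would again need to be doubled):
comm -3 <(sort <<<"$before") <(sort <<<"$after")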

Difference in pipe and file redirection - BASH

Redirection is used to redirect stdout/stdin/stderr!
Ex: ls > log.txt.
Pipes are used to give the output of a command as input to another command.
Ex: ls | grep file.txt
Why exactly are there two operators for what looks like the same thing?
Why not just write ls > grep to pass the output along? Isn't that just a kind of redirection as well?
I realize the Linux philosophy is "do one thing and do it well", so there has to be a logical reason that I'm missing.
You do need a differentiating syntax feature - and using > vs. | will do just fine.
If you used > in both scenarios, how would you know whether
ls > grep
is trying to write to a file named grep or send input to the grep command?
grep is perhaps not the best example, as you may then be tempted to disambiguate by the presence of grep's mandatory arguments; however, (optionally) argument-less commands do exist, such as column.
that other guy offers another example in the comments: test may refer to a test output file or to the argument-less invocation of the standard test command.
Another way of looking at it:
Your suggestion is essentially to use > as a generic send-output-somewhere operator, irrespective of the type of target (file vs. command).
However, that only shifts the need for disambiguation, and then you have to disambiguate when specifying the target - is it a file to output to or a command to run?
Given that the shell also has an implicit disambiguation feature when it comes to the first token of a simple command - foo [...] only ever invokes a command - differentiating at the level of the operator - > for outputting to files, | for sending to commands - is the sensible choice.
This would actually make > do two things, open a file or run a new program, depending on what the operand is. (Ignoring the ambiguity when the argument is the name of an executable file: do we overwrite it or run it?)
bash and some other shells provide additional syntax (process substitution) that does technically replace the need for |, although not in a way that you would choose to use it over a pipe. For instance, you can write
ls > >(grep regex)
>(...) is treated as the "name" of a file (in fact, you can run echo >(true) to see what that file name is), whose contents are provided to the enclosed command as input. So now, instead of a single operator | that handles connecting output from A to the input of B, you have one operator > to redirect output, and another operator to redirect input.
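For instance, on Linux that file name is typically a path under /dev/fd (the exact number varies by system and shell):
$ echo >(true)
/dev/fd/63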
It's also symmetrical:
grep regex < <(ls)
# or grep regex <(ls), since grep can read from standard input or a named file
<(...) is the "name" of an input file whose contents come from the output of the enclosed command.
The benefit of process substitution (and its underlying basis, named pipes) is when you want one process to write to many processes:
command1 | tee >(command2) >(command3) >(command4)
or for one process to read from many processes:
diff <(command1) <(command2)
They are not doing the same job. If you were to take that example:
ls > grep
This is taking the output of ls and writing it to a file called grep.
Now if you were to do something like:
ls | grep '\.txt$'
This will take the output of ls and grep for any .txt files. The two commands in no way produce the same outcome.

How do I use line output from a command as separate file names in bash?

I am trying to cleanly, without errors/quirks, open multiple files via command line in the vim text editor. I am using bash as my shell. Specifically, I want to open the first 23 files in the current working directory. My initial command was:
$ ls | head -23 | xargs vim
But when I do this, I get the following error message before all the files open:
Vim: Warning: Input is not from a terminal
and no new text is shown in the terminal after vim exits. I have to blindly type reset in order to get a normal terminal back (short of opening a new one).
This seems to be discussed here: using xargs vim with gnu screen, and: Why does "locate filename | xargs vim" cause strange terminal behaviour?
Since the warning occurs, xargs seems to be a no-no with vim. (Unless you do some convoluted thing using subshells and input/output redirection which I'm not too interested in as a frequent command. And using an alias/function... meh.)
The solution seemed to be to use bash's command substitution. So I tried:
$ vim $(ls | head -23)
But the files have spaces and parentheses in them, in this format:
surname, firstname(email).txt
So what the shell then does, which is also the result in the xargs case, is provide surname, and firstname(email).txt as two separate command arguments, leaving me in vim with at least twice the number of files I wanted to open, and none of the files I actually wanted to open.
So, I figure I need to escape the file names somehow. I tried to quote the command:
$ vim "$(ls | head -23)"
Then the shell concatenates the entire output from the substitution and provides that as a single command argument, so I'm left with a super-long file name which is also not what I want.
I've also tried to work with the -exec, -print, -print0 and -printf options of find, various things with arrays in bash, and probably some things I can't remember. I'm at a loss right now.
What can I do to use file names that come from a command, as separate command arguments and shell-quoted so they actually work?
Thanks for any and all help!
Here's an array-based solution:
fileset=(*)
vim "${fileset[@]:0:23}"
The ${fileset[@]:0:23} expansion slices the array to its first 23 elements.
xargs -a <(ls | head -23) -d '\n' vim
-a tells xargs to read arguments from the named file instead of stdin, and <(...) lets us pass the output of the ls/head pipeline where a filename is expected. Because xargs no longer consumes stdin, vim's standard input stays connected to the terminal, which avoids the warning.
