Redirecting the input of two commands via the pipe operator in Bash

I have not learned Bash in a formal way, so please do give me suggestions for a more descriptive question title.
Instead of creating a temporary file whose lifespan is limited to that of the command that uses it (here, command), as in:
zcat input.txt.gz > input.txt
command input.txt
rm input.txt
we can avoid it as follows:
zcat input.txt.gz | command -
Now my question is whether this is possible with two inputs. I wish to avoid creating two temporary files, as in:
zcat input1.txt.gz > input1.txt
zcat input2.txt.gz > input2.txt
command input1.txt input2.txt
rm input1.txt input2.txt
I am guessing that the following removes the need for one of the two temporary files:
zcat input1.txt.gz > input1.txt
zcat input2.txt.gz | command input1.txt -
rm input1.txt
but I wonder if there is a way to completely avoid creating the temporary file.
I hope my question was clear enough. Though I used zcat as an example, the solution I am looking for should be more general. Thanks in advance.

If you're trying to combine the output of multiple commands into a single pipe, use a subshell:
(cat file1.txt; cat file2.txt) | nl
If you want to use the output of a command as a filename for a command, use process substitution:
diff <(zcat file1.gz) <(zcat file2.gz)
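Applied to the original zcat example, the second form answers the question directly. As a hedged sketch (it assumes command accepts ordinary filename arguments and simply reads each file), no temporary file is ever created; the shell hands command pseudo-files such as /dev/fd/63:
command <(zcat input1.txt.gz) <(zcat input2.txt.gz)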

Command substitution might get you what you want:
command "$(zcat input1.txt.gz)" "$(zcat input2.txt.gz)"
Note that $( ) is command substitution, not a subshell: it pastes the output of the two zcat commands into the command line. This only works as long as command expects its input as argument strings rather than as filenames, and it is fragile for large or binary data.

Related

Bash: buffer entire stdin, then output

I need to modify a file in-place using a program prog that doesn't support it.
prog $file > $file.temp
cat $file.temp > $file
rm $file.temp
I want to do this in a single step, without temp files. This looks good but won't work:
cat <(prog $1) > $1
It would work if I had a way of buffering the contents of a pipe (blocking until the write end closes), eg:
cat <(prog $1 | buffer_until_close) > $1
How can I do this, or achieve the desired syntax some other way?
It would work if I had a way of buffering the contents of a pipe (blocking until the write end closes), eg:
cat <(prog $1 | buffer_until_close) > $1
No, it wouldn't. The redirection of stdout (> $1) is performed before any program is started, and as soon as the shell sets up the redirection, it truncates the output file; prog would therefore read an already-empty file.
However, as mentioned in the comments, sponge will work:
prog $1 | sponge $1
sponge is found in the moreutils package, which most Linux distributions ship in their repositories (it is usually not installed by default).
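If sponge isn't available, a minimal pure-bash approximation is to buffer the output in a variable first. A hedged sketch: it assumes prog produces text that fits in memory, and command substitution normalizes trailing newlines:
out=$(prog "$1") && printf '%s\n' "$out" > "$1"
The && matters here: if prog fails, the original file is left untouched.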

In bash, is there a way to have multiple pipes to one process?

For example, if I want to do a diff of two files after preprocessing both of them with sed, is there any way to do this without temporary files?
I have tried things like this and (as I expected) it did not work:
(sed "$expr" file1; sed "$expr" file2) | diff - -
I was thinking there might be a way to create pipes explicitly or something.
Try doing this :
diff <(sed "$expr" file1) <(sed "$expr" file2)
This uses Process Substitution. <( ) is replaced by a filename (a FIFO or an entry under /dev/fd); reading that file yields the output of the command inside, while the companion form >( ) works the other way around (writing to the file feeds the command's input). Often used in combination with file redirection:
cmd1 2> >(cmd2)
See
http://mywiki.wooledge.org/ProcessSubstitution
http://mywiki.wooledge.org/BashFAQ/024
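As for creating pipes explicitly, which the question hints at: process substitution is essentially doing that for you. A hedged sketch of the manual equivalent using named pipes (mkfifo):
tmpdir=$(mktemp -d)
mkfifo "$tmpdir/f1" "$tmpdir/f2"
sed "$expr" file1 > "$tmpdir/f1" &   # writers go in the background, because
sed "$expr" file2 > "$tmpdir/f2" &   # opening a FIFO blocks until both ends are open
diff "$tmpdir/f1" "$tmpdir/f2"
rm -r "$tmpdir"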

Use pipe of commands as argument for diff

I am having trouble with this simple task:
cat file | grep -E '^[0-9]+$' > file_grep
diff file file_grep
Problem is, I want to do this without file_grep
I have tried:
diff file `cat file | grep -E ^[0-9]+$`
and
diff file "`cat file | grep -E ^[0-9]+$`"
and a few other combinations :-) but I can't get it to work.
I always get an error, because diff receives extra arguments: the content of the file as filtered by grep, rather than a filename.
Something similar has always worked for me when I wanted to echo command output from within a script, like this (using backticks):
echo `ls`
Thanks
If you're using bash:
diff file <(grep -E '^[0-9]+$' file)
The <(COMMAND) sequence expands to the name of a pseudo-file (such as /dev/fd/63) from which you can read the output of the command.
But for this particular case, ruakh's solution is simpler. It takes advantage of the fact that - as an argument to diff causes it to read its standard input. The <(COMMAND) syntax becomes more useful when both arguments to diff are command output, such as:
diff <(this_command) <(that_command)
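A quick way to see what the shell actually substitutes (the exact name varies by system; /dev/fd/63 is typical on Linux):
echo <(true)
ls -l <(true)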
The simplest approach is:
grep -E '^[0-9]+$' file | diff file -
The hyphen - as the filename is a specific notation that tells diff "use standard input"; it's documented in the diff man-page. (Most of the common utilities support the same notation.)
The reason that backticks don't work is that they capture the output of a command and pass it as an argument. For example, this:
cat `echo file`
is equivalent to this:
cat file
and this:
diff file "`cat file | grep -E ^[0-9]+$`"
is equivalent to something like this:
diff file "123
234
456"
That is, it actually tries to pass 123, 234, and 456 (with the newlines between them) as a single filename, rather than as the contents of a file. Technically, you could achieve the latter by using Bash's "process substitution" feature, which actually creates a sort of temporary file:
diff file <(cat file | grep -E '^[0-9]+$')
but in your case it's not needed, because of diff's support for -.
grep -E '^[0-9]+$' file | diff - file
where - means "read from standard input".
Try process substitution:
$ diff file <(grep -E "^[0-9]+$" file)
From the bash manpage:
Process Substitution
Process substitution is supported on systems that support named pipes (FIFOs) or the /dev/fd method of naming open files. It takes the form of <(list) or >(list). The process list is run with its input or output connected to a FIFO or some file in /dev/fd. The name of this file is passed as an argument to the current command as the result of the expansion. If the >(list) form is used, writing to the file will provide input for list. If the <(list) form is used, the file passed as an argument should be read to obtain the output of list.
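The <(list) form is what the answers here use; as a small hedged illustration of the >(list) direction, where writing into the substitution feeds the inner command (line_count.txt is a name chosen just for this example):
ls | tee >(wc -l > line_count.txt)
tee writes the listing to stdout and also into the >(...) pseudo-file, so wc -l counts the lines and stores the result in line_count.txt.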
In bash, the syntax is
diff file <(cat file | grep -E '^[0-9]+$')

How do I read the first line of a file using cat?

You don't need cat.
head -1 file
will work fine.
You don't, use head instead.
head -n 1 file.txt
There are many different ways:
sed -n 1p file
head -n 1 file
awk 'NR==1' file
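For large files, note that the sed and awk one-liners above keep reading after printing the first line; hedged variants that quit immediately:
sed -n '1{p;q;}' file
awk 'NR==1 {print; exit}' file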
You could use cat file.txt | head -1, but it would probably be better to use head directly, as in head -1 file.txt.
This may not be possible with cat. Is there a reason you have to use cat?
If you simply need to do it with a bash command, this should work for you:
head -n 1 file.txt
cat alone may not be possible, but if you don't want to use head, this works:
cat file.txt | awk 'NR == 1'
I'm surprised that this question has been around as long as it has, and nobody has provided the pre-mapfile built-in approach yet.
IFS= read -r first_line <file
...puts the first line of the file in the variable expanded by "$first_line", easy as that.
Moreover, because read is built into bash and this usage requires no subshell, it's significantly more efficient than approaches involving subprocesses such as head or awk.
You don't need any external command if you have bash v4+:
< file.txt mapfile -n1 && echo "${MAPFILE[0]}"
or, if you really want cat, keep mapfile and echo in the same pipeline element; each part of a pipeline runs in its own subshell, so a bare cat file.txt | mapfile -n1 would set MAPFILE in a subshell and lose it:
cat file.txt | { mapfile -n1 && echo "${MAPFILE[0]}"; }
:)
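A hedged aside: in a script you can instead enable bash's lastpipe option, which runs the last element of a pipeline in the current shell (it takes effect only when job control is off, the default for non-interactive shells), so MAPFILE survives the pipeline:
shopt -s lastpipe
cat file.txt | mapfile -n1
echo "${MAPFILE[0]}"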
Use the command below to get the first row of a CSV file, or of a file in any other format:
head -1 FileName.csv
There are plenty of good answers to this question already. I'll just drop another one into the basket, in case you wish to do it with lolcat:
lolcat FileName.csv | head -n 1
Adding one more obnoxious alternative to the list:
perl -pe'$.<=1||last' file
# or
perl -pe'$.<=1||last' < file
# or
cat file | perl -pe'$.<=1||last'

Concatenating multiple text files into a single file in Bash

What is the quickest and most pragmatic way to combine all *.txt files in a directory into one large text file?
Currently I'm using Windows with Cygwin, so I have access to Bash.
A Windows shell command would be nice too, but I doubt there is one.
This appends the output to all.txt
cat *.txt >> all.txt
This overwrites all.txt
cat *.txt > all.txt
Just remember, for all the solutions given so far, the shell decides the order in which the files are concatenated. For Bash, IIRC, that's alphabetical order. If the order is important, you should either name the files appropriately (01file.txt, 02file.txt, etc...) or specify each file in the order you want it concatenated.
$ cat file1 file2 file3 file4 file5 file6 > out.txt
The Windows shell command type can do this:
type *.txt > outputfile.txt
The type command also writes the file names to stderr, which is not captured by the > redirect operator (but will show up on the console).
You can use Windows shell copy to concatenate files.
C:\> copy *.txt outputfile
From the help:
To append files, specify a single file for destination, but multiple files for source (using wildcards or file1+file2+file3 format).
Be careful, because none of these methods works with a very large number of files: the expanded argument list can exceed the system's ARG_MAX limit, whereas a loop invokes cat once per file. Personally, I used this line:
for i in $(ls | grep ".txt");do cat $i >> output.txt;done
EDIT: As someone said in the comments, you can replace $(ls | grep ".txt") with $(ls *.txt)
EDIT: thanks to #gnourf_gnourf expertise, a glob is the correct way to iterate over files in a directory. Consequently, blasphemous expressions like $(ls | grep ".txt") must be replaced by *.txt (see the article here).
Good solution:
for i in *.txt; do cat "$i" >> output.txt; done
How about this approach?
find . -type f -name '*.txt' -exec cat {} + >> output.txt
The most pragmatic way with the shell is the cat command. Other ways include:
awk '1' *.txt > all.txt
perl -ne 'print;' *.txt > all.txt
type [source folder]\*.[file extension] > [destination folder]\[file name].[file extension]
For example:
type C:\*.txt > C:\1\all.txt
That will take all the .txt files in the C:\ folder and save their contents in the C:\1 folder under the name all.txt.
Or:
type [source folder]\* > [destination folder]\[file name].[file extension]
For example:
type C:\* > C:\1\all.txt
That will take all the files present in the folder and put their contents in C:\1\all.txt.
You can do it like this:
cat [directory_path]/**/*.[h,m] > test.txt
Note that ** requires bash's globstar option (shopt -s globstar) to recurse into subdirectories, and [h,m] is a character class matching a single h or m. If you instead use brace expansion {h,m} to include the extensions, there is a sequencing problem: *.{h,m} expands to *.h *.m, so all .h files are concatenated before all .m files rather than in directory order.
The most upvoted answers will fail if the file list is too long.
A more portable solution would be using fd
fd -e txt -d 1 -X awk 1 > combined.txt
-d 1 limits the search to the current directory. If you omit this option then it will recursively find all .txt files from the current directory.
-X (otherwise known as --exec-batch) executes a command (awk 1 in this case) for all the search results at once.
Note, fd is not a "standard" Unix program, so you will likely need to install it
When you run into the problem where it cats all.txt into all.txt, you can check whether all.txt already exists and remove it first, like this:
[ -e all.txt ] && rm all.txt
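A hedged alternative that sidesteps the self-inclusion problem entirely: write the result to a path the glob cannot match, then move it into place:
cat *.txt > /tmp/all.txt && mv /tmp/all.txt all.txt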
All of that is nasty...
ls | grep '\.txt$' | while read -r file; do cat "$file" >> ./output.txt; done
Easy stuff. (Note that parsing ls still breaks on unusual filenames; the glob loop above is safer.)
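For completeness, a hedged sketch that handles arbitrary filenames (spaces, even newlines) and excludes the output file itself:
find . -maxdepth 1 -name '*.txt' ! -name output.txt -print0 | xargs -0 cat > output.txt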
