Why does using the `>` operator delete the contents of the file? [duplicate] - bash

This question already has answers here:
Is it OK to use the same input file as output of a piped command?
(3 answers)
Closed last year.
File1:
foo
bar
baz
File2:
bar.patch
grep -f file1 file2 > file1
Expected result: file1 should contain bar.patch; instead, it is empty.
How is grep processing the input file such that I cannot redirect to it?

Redirection with > happens before the command is executed, hence grep sees an empty file and returns nothing. If you really want to write to the same file, you can use the sponge command (part of the moreutils package). It buffers its input and writes to the output file only after reaching the end of the data. Example:
grep -f file1 file2 | sponge file1
Check the bash manual for details about redirection:
Before a command is executed, its input and output may be redirected using a special notation interpreted by the shell. Redirection allows commands’ file handles to be duplicated, opened, closed, made to refer to different files, and can change the files the command reads from and writes to. Redirection may also be used to modify file handles in the current shell execution environment. The following redirection operators may precede or appear anywhere within a simple command or may follow a command. Redirections are processed in the order they appear, from left to right.
Or just redirect to a temporary file and then mv it over file1:
grep -f file1 file2 > file3
mv -f file3 file1
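The ordering can be demonstrated directly: the shell opens (and truncates) the target before the command ever runs. A minimal sketch, using throwaway files under /tmp (the paths are illustrative):

```shell
#!/bin/sh
# Create a file, then redirect a command that reads it back into it.
printf 'foo\nbar\nbaz\n' > /tmp/demo.txt
grep bar /tmp/demo.txt > /tmp/demo.txt || true  # shell truncates demo.txt before grep runs
wc -c < /tmp/demo.txt                           # prints 0: the file was emptied

# The temporary-file pattern from above:
printf 'foo\nbar\nbaz\n' > /tmp/demo.txt
grep bar /tmp/demo.txt > /tmp/demo.new && mv -f /tmp/demo.new /tmp/demo.txt
cat /tmp/demo.txt                               # prints: bar
```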

Related

pipe output of grep as an argument to a bash script

script.sh takes two files as arguments, and it calls a python script that opens them and does further processing on these files:
script.sh file1.txt file2.txt
I would like to remove all lines from file1.txt that start with #.
grep -v '^#' file1.txt
I tried the following, but it fails:
script.sh $(grep -v '^#' file1.txt) file2.txt
How can I pipe the output of grep as the first file argument of script.sh?
script.sh <(grep -v '^#' file1.txt) file2.txt
Command substitution as you had it, $(), will insert the output from the grep command verbatim as arguments to script.sh. In contrast, to quote the bash man page:
Process substitution allows a process's input or output to be referred to using a filename. It takes the form of <(list) or >(list). The process list is run asynchronously, and its input or output appears as a filename. This filename is passed as an argument to the current command as the result of the expansion.
and
If the <(list) form is used, the file passed as an argument should be read to obtain the output of list.
So, my understanding is that when you use $(), the output of your grep command is substituted into the final statement with a result something like:
script.sh contents of file1 without comments \n oops newline etc file2.txt
But when you use <(), the grep command is run and the output goes into a named pipe or temporary file or some equivalent, and the name of that "file" is substituted into your statement, so you end up with something along the lines of
script.sh /dev/fd/temp_thing_containing_grep_results file2.txt
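The difference between the two expansions can be seen with a toy command standing in for the grep (a sketch; printf here represents any command producing multi-line output):

```shell
#!/bin/bash
# Command substitution pastes the output itself into the argument list,
# after word splitting:
echo $(printf 'a\nb\n')    # prints: a b   (two separate arguments)

# Process substitution passes a filename; reading that file yields the output:
echo <(printf 'a\nb\n')    # prints something like: /dev/fd/63
cat <(printf 'a\nb\n')     # prints a and b on separate lines
```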

How to remove lines from the output of a command in a bash script [duplicate]

This question already has answers here:
Get last line of shell output as a variable
(2 answers)
Ignoring the first line of stderr and keeping stdout intact
(1 answer)
Cronjob - How to output stdout, and ignore stderr
(3 answers)
Closed 4 years ago.
I am trying to run a command that prints multiple lines (e.g. ldapwhoami), and I'd like to write a bash script that prints only the last line of its output instead of all of it.
I tried the following
#/bin/bash
$(ldapwhoami | sed 1d2d3d)
but it doesn't seem to work, any help would be appreciated.
To print only the final line use tail:
ldapwhoami | tail -n 1
To delete the first three lines with sed, change your command to:
ldapwhoami | sed '1d;2d;3d;'
Note the semicolons and the quotes.
Also possible with awk
ldapwhoami | awk 'NR > 3'
The above assumes that all output goes to standard output. In unix though there are two output streams that are connected to each process, the standard output (denoted with 1 - that is used for the output of the program), and the standard error (denoted with 2 - that is used for any diagnostic/error messages). The reason for this separation is that it is often desirable not to "pollute" the output with diagnostic messages, if it is processed by another script.
So for commands that generate output on both streams, if we want to capture both, we redirect the standard error to the standard output using 2>&1, like this:
ldapwhoami 2>&1 | tail -n 1
(for awk and sed the same syntax is used)
In bash, the above may be written using shorthand form as
ldapwhoami |& tail -n 1
If all you need is the standard output, and you don't care about standard error, you can redirect it to /dev/null
ldapwhoami 2> /dev/null
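The two streams can be exercised with a small helper function (a sketch; emit is a made-up name):

```shell
#!/bin/bash
# emit writes one line to each stream.
emit() { echo "out line"; echo "err line" >&2; }

emit 2>/dev/null       # stderr discarded: prints only "out line"
emit 2>&1 | grep err   # stderr merged into stdout, so the pipe sees it: prints "err line"
emit |& grep err       # bash shorthand for the same merge
```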

How to store the command output in variable and redirect to file in a same line?

a=`cat /etc/redhat-release | awk '{print $2}' > /tmp/a.txt`
The above command is not redirecting the output to file.
A command substitution captures stdout of the command contained. When you redirect that output to a file, it's no longer on stdout, so it's no longer captured.
Use tee to create two copies -- one in a file, one on stdout.
a=$(awk '{print $2}' </etc/redhat-release | tee /tmp/a)
Note also:
cat shouldn't be used when it isn't needed: giving awk a direct handle on the input file saves an extra process, allows a direct read from a file rather than a FIFO, and follows a practice that yields much larger efficiency gains with programs like sort, shuf, tail, or wc -c, which can use more efficient algorithms when reading from a seekable file.
The modern (and standard-compliant, since 1991) syntax for command substitution is $(...). It nests better than the ancient backtick syntax it replaces, and use of backslashes within is less confusing.
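A self-contained sketch of the tee pattern, substituting a temporary file for /etc/redhat-release (the paths and file contents are illustrative):

```shell
#!/bin/sh
printf 'CentOS Linux release 8.4\n' > /tmp/release.txt

# tee duplicates the stream: one copy goes to the file,
# the other to stdout, where $() captures it.
a=$(awk '{print $2}' </tmp/release.txt | tee /tmp/a.txt)
echo "$a"          # prints: Linux
cat /tmp/a.txt     # prints: Linux
```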

Echoing awk output to file to remove duplicates has strange output

I made a small shell script to try to remove duplicate entries (lines) from a text file. When the script is run on a file with three identical lines, strange output occurs.
The shell script is run on an Ubuntu distribution.
The contents of my text file:
one
one
one
The script I am running to remove duplicates:
echo -e $(awk '!a[$0]++' /test/test.txt) > /test/test.txt
The awk is intended to delete duplicates, while the echo is intended to write the result back to the file.
Upon running my script, I receive the following output in the file:
one
one
It should also be noted that there is an additional newline after the second line, and a space at the start of the second line.
Writing to a file at the same time that you are reading from it usually leads to disaster.
If you have GNU awk, then use the -i inplace option:
$ cat text
one
one
one
$ gawk -i inplace '!a[$0]++' text
$ cat text
one
If you have BSD awk, then use:
awk '!a[$0]++' text >tmp && mv tmp text
Alternatively, if you have sponge installed:
awk '!a[$0]++' text | sponge text
sponge does not update the file until the pipeline has finished reading and processing it.
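The temporary-file variant can be checked end to end (a sketch using an illustrative /tmp path):

```shell
#!/bin/sh
printf 'one\none\none\n' > /tmp/text
# a[$0]++ counts occurrences; !a[$0]++ is true only the first time a line
# is seen, so each distinct line is printed once.
awk '!a[$0]++' /tmp/text > /tmp/text.tmp && mv /tmp/text.tmp /tmp/text
cat /tmp/text    # prints: one
```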

Use pipe of commands as argument for diff

I am having trouble with this simple task:
cat file | grep -E ^[0-9]+$ > file_grep
diff file file_grep
Problem is, I want to do this without file_grep
I have tried:
diff file `cat file | grep -E ^[0-9]+$`
and
diff file "`cat file | grep -E ^[0-9]+$`"
and a few other combinations :-) but I can't get it to work.
I always get an error, because diff receives an extra argument: the content of file as filtered by grep.
Something similar always worked for me, when I wanted to echo command outputs from within a script like this (using backtick escapes):
echo `ls`
Thanks
If you're using bash:
diff file <(grep -E '^[0-9]+$' file)
The <(COMMAND) sequence expands to the name of a pseudo-file (such as /dev/fd/63) from which you can read the output of the command.
But for this particular case, ruakh's solution is simpler. It takes advantage of the fact that - as an argument to diff causes it to read its standard input. The <(COMMAND) syntax becomes more useful when both arguments to diff are command output, such as:
diff <(this_command) <(that_command)
The simplest approach is:
grep -E '^[0-9]+$' file | diff file -
The hyphen - as the filename is a specific notation that tells diff "use standard input"; it's documented in the diff man-page. (Most of the common utilities support the same notation.)
The reason that backticks don't work is that they capture the output of a command and pass it as an argument. For example, this:
cat `echo file`
is equivalent to this:
cat file
and this:
diff file "`cat file | grep -E ^[0-9]+$`"
is equivalent to something like this:
diff file "123
234
456"
That is, it actually tries to pass 123, 234, and 456 (joined by newlines) as a single filename, rather than as the contents of a file. Technically, you could achieve the latter by using Bash's "process substitution" feature, which actually creates a sort of temporary file:
diff file <(cat file | grep -E '^[0-9]+$')
but in your case it's not needed, because of diff's support for -.
grep -E '^[0-9]+$' file | diff - file
where - means "read from standard input".
Try process substitution:
$ diff file <(grep -E "^[0-9]+$" file)
From the bash manpage:
Process Substitution
Process substitution is supported on systems that support named pipes (FIFOs) or the /dev/fd method of naming open files. It takes the form of <(list) or >(list). The process list is run with its input or output connected to a FIFO or some file in /dev/fd. The name of this file is passed as an argument to the current command as the result of the expansion. If the >(list) form is used, writing to the file will provide input for list. If the <(list) form is used, the file passed as an argument should be read to obtain the output of list.
In bash, the syntax is
diff file <(cat file | grep -E '^[0-9]+$')
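Putting the stdin notation together with sample data (a sketch; the file path and contents are illustrative):

```shell
#!/bin/sh
printf '123\nabc\n456\n' > /tmp/file
# Keep only the all-digit lines, then diff the original against them via stdin.
grep -E '^[0-9]+$' /tmp/file | diff /tmp/file - || true
# diff reports "abc" as present only in /tmp/file, e.g.:
#   2d1
#   < abc
```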
