Overwriting a file in bash in one pipe - bash

The result of cat file1 | cat > file1 is an empty file1.
The result of head file1 | cat > file1 is an empty file1 as well.
Of course, a real pipeline would have more steps - tools and operations in between.
Is there any way to save the transformed content back to the same file?
The real case is source .env && cat file1 | envsubst > file1

Thank you, but the correct answer is:
cat file1 | envsubst | sponge file1
This (from Veda) is the wrong usage of sponge:
cat file1 | envsubst | sponge | cat > file1

The only option is something like (head file1 | cat > tmp_file) && mv tmp_file file1. In other words, you have to write to a temporary file and then replace the original file with the temporary.
Fundamentally, bash has to open all the files and pipes before it starts exec'ing each of the stages in the pipeline. Once it has opened your output file for overwriting, it is, er, overwritten.
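You can see that it is the redirection itself, not the command, that destroys the contents (demo.txt here is just a throwaway scratch file):
$ printf 'hello\n' > demo.txt
$ > demo.txt        # a redirection with no command at all still truncates the file
$ cat demo.txt      # prints nothing - the contents are already gone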

Sponge does that for you. You need to install it separately though; it is not usually included in the OS.
cat test | sponge | cat > test
will leave you with the contents that were originally in the file "test".
To get sponge on Ubuntu or Red Hat, you need to install the moreutils package.
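Applied to the envsubst case from the original question, a minimal sketch (assuming .env actually exports the variables envsubst should substitute) would be:
$ source .env
$ envsubst < file1 | sponge file1
sponge reads all of its input before it opens file1 for writing, so the file is not clobbered before envsubst has read it.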


awk command inside for loop to read and write multiple files [duplicate]

I am learning awk and I would like to know if there is an option to write changes to the file, similar to sed, where I would use the -i option to save modifications to a file.
I do understand that I could use redirection to write changes. However, is there an option in awk to do that?
GNU Awk 4.1.0 (released in 2013) and later has an option for "inplace" file editing:
[...] The "inplace" extension, built using the new facility, can be used to simulate the GNU "sed -i" feature. [...]
Example usage:
$ gawk -i inplace '{ gsub(/foo/, "bar") }; { print }' file1 file2 file3
To keep the backup:
$ gawk -i inplace -v INPLACE_SUFFIX=.bak '{ gsub(/foo/, "bar") }
> { print }' file1 file2 file3
Unless you have GNU awk 4.1.0 or later...
You won't have an option like sed's -i, so instead do:
$ awk '{print $0}' file > tmp && mv tmp file
Note: the -i is not magic, it also creates a temporary file; sed just handles it for you.
As of GNU awk 4.1.0...
GNU awk added this functionality in version 4.1.0 (released 10/05/2013). It is not as straightforward as just giving the -i option, as described in the release notes:
The new -i option (from xgawk) is used for loading awk library files. This differs from -f in that the first non-option argument
is treated as a script.
You need to use the bundled inplace.awk include file to invoke the extension properly like so:
$ cat file
123 abc
456 def
789 hij
$ gawk -i inplace '{print $1}' file
$ cat file
123
456
789
The variable INPLACE_SUFFIX can be used to specify the extension for a backup file:
$ gawk -i inplace -v INPLACE_SUFFIX=.bak '{print $1}' file
$ cat file
123
456
789
$ cat file.bak
123 abc
456 def
789 hij
I am happy this feature has been added, but to me the implementation isn't very awkish, as the power comes from the conciseness of the language, and -i inplace is 8 characters too long, IMO.
See the manual for the official word.
Just a little hack that works:
echo "$(awk '{awk code}' file)" > file
@sudo_O has the right answer.
This can't work:
someprocess < file > file
The shell performs the redirections before handing control over to someprocess (see Redirections in the Bash manual). The > redirection truncates the file to zero size (see Redirecting Output). Therefore, by the time someprocess gets launched and wants to read from the file, there is no data for it to read.
An alternative is to use sponge:
awk '{print $0}' your_file | sponge your_file
where you replace '{print $0}' with your awk script and your_file with the name of the file you want to edit in place.
sponge absorbs the entire input before saving it to the file.
The following won't work, because the unquoted command substitution goes through word splitting and the newlines in awk's output are lost:
echo $(awk '{awk code}' file) > file
This should work, since the quotes preserve the newlines:
echo "$(awk '{awk code}' file)" > file
In case you want an awk-only solution without creating a temporary file, usable with versions other than gawk 4.1.0+:
awk '{a[b++]=$0} END {for(c=0;c<b;c++)print a[c]>ARGV[1]}' file

tee bash command with redirection

I have the following file:
file1.txt
geek
for
geeks
I am using the tee command to perform two operations on the output. My question is about the redirection character after the first tee. I want to get the first column of file1.txt and
write it to file2.txt. When I run the following command, I don't receive an error but it does not give me the first column:
wc -l file1.txt |tee awk '{print $1}' - > file2.txt | sed 's/4/6/g' > file3.txt
However, the following command works as expected. What is the > doing here?
wc -l file1.txt |tee >(awk '{print $1}' - > file2.txt) | sed 's/4/6/g' > file3.txt
tee awk '{print $1}' - > file2.txt
does:
executes tee with 3 arguments: awk, '{print $1}', and -.
tee will create a file named awk, another file named '{print $1}', and yet another file named -.
Then the output of tee will be redirected to file2.txt.
tee will duplicate its input to those 3 files and send its own output to file2.txt.
Consequently, | sed will receive no input, because the output of tee has been redirected to file2.txt and nothing comes through the pipe.
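You can see how the shell splits those words into separate arguments by printing them one per line (printf here merely stands in for tee):
$ printf 'arg: %s\n' awk '{print $1}' -
arg: awk
arg: {print $1}
arg: -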
tee >(awk '{print $1}' - > file2.txt)
does:
>(...)
Runs awk with two arguments, '{print $1}' and -
  - '{print $1}' is interpreted as the script
  - - is interpreted as stdin (and could be omitted)
  - then the output of awk is redirected to file2.txt
Then bash creates a fifo or a /dev/fd/something file
The reading end of that file is connected to the stdin of the awk process
And the >(awk ...) is substituted with the filename of that file, most probably /dev/fd/something
tee >(...)
executes tee with one argument, like tee /dev/fd/something
The /dev/fd/something is connected to the awk process on the other side
So tee writes to /dev/fd/something and awk reads the data from its stdin on the other side
The output of tee itself goes through the pipe to | sed
What is the > doing here?
The first occurrence is used to introduce a process substitution. The second occurrence is used to redirect the output of the awk command to a file named file2.txt. The third occurrence is used to redirect the output of the sed command to a file named file3.txt.
Here, process substitution is used to capture output that would normally go to a file.
The Bash syntax for writing to a process is >(command)
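If you are curious what filename the process substitution turns into, you can echo one; on Linux it is usually a /dev/fd path (the exact number varies):
$ echo >(true)
/dev/fd/63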

How to Write A Second Column in Bash in an Existing txt file

I need to extract the ID name of a parent directory and put that in a tab-delimited text file. Then I need to extract the names of the contents of that folder and put them in the same row as that ID name I first extracted. Essentially, Column 1 should list the directory name from the parent path, Column 2 should list the name of the first file in that directory, Column 3 should be the name of the next file, and so on and so forth.
/path/to/folder/ID/
pwd | xargs echo | awk -F "/" '{print $n; exit}' >> Text.txt
where 'n' is the location of the desired parent folder (in this case, ID). This works fine, and writes something like "ID001" to my Text.txt file.
I try the same little hack again, using my pwd as my input to xargs, listing out the contents of that folder, and writing the names to my Text.txt file:
pwd | xargs echo | awk -F "/" '{print $7; exit}' >> Text.txt | pwd | xargs echo | xargs ls | xargs echo >> Text.txt
But instead of
ID001 file1 file2
I get
file1 file2
ID001
Which is mostly to be expected, given the commands. I am confused as to why my file names are being appended to the first row and not to the last row. The only related article I could find was this for writing a specific column to a CSV, but it wasn't quite what I was looking for.
This find plus awk pipeline MAY be what you're trying to do:
$ ls tmp
a b
$ find tmp -print | awk '{sub("^[^/]+/",""); printf "%s%s", sep, $0; sep="\t"} END{print ""}'
tmp a b
YMMV if your file names contain tabs or newlines of course.
You probably want to do that as multiple commands, for ease of understanding.
You can put the commands in a bash script.
Example scenario
$ pwd
/Users/pa357856/test/tmp/foo
$ ls
file1.txt file2.txt
commands -
$ parentDIR=`pwd | xargs echo | awk -F "/" '{print $6}'`
$ filesList=`ls`
$ echo "$parentDIR" "$filesList" >> test.txt
Result -
$ cat test.txt
foo file1.txt file2.txt
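If you would rather skip awk entirely, a minimal sketch of the same idea using basename and a loop (assuming you run it from inside the ID directory and the file names contain no tabs or newlines):
$ cd /path/to/folder/ID
$ printf '%s' "$(basename "$PWD")" >> Text.txt
$ for f in *; do printf '\t%s' "$f" >> Text.txt; done
$ printf '\n' >> Text.txt
This appends one tab-delimited row per run: the directory name first, then each file name in its own column.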

Shell, grep for list of patterns

I have two csv files a.csv and b.csv. I 'cut' one column from the a.csv file and now I want to grep for each one of the strings from this column in the second file b.csv.
Can someone please help me in writing a shell script for this?
You want the -f (and likely -F and possibly -w) flags to grep for this sort of task.
$ cut ... a.csv > tmp
$ grep -Ff tmp b.csv
You can do this without the temporary file on shells that support process substitution.
$ grep -Ff <(cut ... a.csv) b.csv
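For example, if the column you want is the first comma-separated field of a.csv (adjust -d and -f to match your layout), something like this should work:
$ grep -Fwf <(cut -d, -f1 a.csv) b.csv
-F treats each pattern as a fixed string rather than a regex, and -w only matches whole words, which cuts down on accidental substring matches.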

Combining flat file modification and concatenating steps

I perform this operation very often, and I am looking for a shortcut. Is there any way I can do the following without having to write to a temp file?
cut -k 3-5 file1 > temp1
cat temp1 file2 | sort > outfile
Thanks!
Like this:
cut -k 3-5 file1 | cat - file2 | sort > outfile
There may be ancient versions of cat which do not take - to mean standard input.
Just do them in sequence:
(cut -k 3-5 file1; cat file2) | sort > outfile
This has the added advantage of working in any Bourne-based shell without requiring bash- or zsh-specific features.
This should do it:
cat <(cut -k 3-5 file1) file2 | sort > outfile
