Reason for this bash redirection behaviour - bash

Why does echo a > file1 > file2 create both files but only write to file2? (file1 is empty.)

Because I/O redirections are processed from left to right. The sequence of actions is:
Open file1 for writing (creating it if it doesn't exist).
Redirect stdout to file1.
Open file2 for writing (creating it if it doesn't exist).
Redirect stdout to file2.
Run echo a.
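The sequence above can be checked directly. This is a minimal sketch that assumes a scratch directory created with mktemp; the workdir variable is only illustrative and not part of the question.

```shell
# Work in a throwaway directory so no real files are touched.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Both redirections are processed before echo runs.
echo a > file1 > file2

# file1 exists but is empty; file2 holds the output.
wc -c file1 file2
cat file2
```

Both files are opened (and therefore created) by the shell, but by the time echo starts, stdout refers only to file2.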

Related

Concatenate awk-output, string, and text file

I have the following two tab-separated files in my current directory.
a.tsv
do not use this line
but this one
and that too
b.tsv
three fields here
not here
For each tsv file there is an associated txt file in the same directory, with the same filename but different suffix.
a.txt
This is the a-specific text.
b.txt
Text associated to b.
For each pair of files I want to create a new file with the same name but the suffix _new.txt. The new files should contain all lines from the respective tsv file that contain exactly 3 fields, afterwards the string \n####\n, and then the whole content of the respective txt file. Thus, the following output files should be created.
Desired output
a_new.txt
but this one
and that too
####
This is the a-specific text.
b_new.txt
three fields here
####
Text associated to b.
Working, but bad solution
for file in ./*.tsv
do awk -F'\t' 'NF==3' $file > ${file//.tsv/_3_fields.tsv}
done
for file in ./*_3_fields.tsv
do cat $file <(printf "\n####\n") ${file//_3_fields.tsv/.txt} > ${file//_3_fields.tsv/_new.txt}
done
Non-working code
I'd like to get the result with one script, and avoid creating the intermediate file with the suffix _3_fields.tsv.
I tried command substitution as follows:
for file in ./*.tsv
do cat <<< $(awk -F'\t' 'NF==3' $file) <(printf "\n####\n") ${file//.tsv/.txt} > ${file//.tsv/_new.txt}
done
But this doesn't write the awk-processed part into the new files.
Yet the command substitution seems to work if I write only the awk-processed part into the new file, as follows:
for file in ./*.tsv; do cat <<< $(awk -F'\t' 'NF==3' $file) > ${file//.tsv/_new.txt}; done
I'd be interested in why the second-to-last snippet doesn't work as expected, and what a good solution for the task would be.
Maybe you wanted to redirect a sequence of commands:
for file in ./*.tsv
do
{
awk -F'\t' 'NF==3' "$file"
printf "\n####\n"
cat "${file//.tsv/.txt}"
} > "${file//.tsv/_new.txt}"
done
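Note that the space after the opening brace and the semicolon or newline before the closing brace are required.
It also seems you are confusing command substitution $() with process substitution <() or >(). Also, <<< (a here-string) redirects a string to standard input, whereas < redirects a file. In your attempt, cat was given file arguments, so it never read the standard input supplied by <<<; that is why the awk output was missing.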
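To see the difference between a here-string and process substitution when cat also receives file arguments, here is a small sketch. It needs bash, and the file name note.txt and the variable names are made up for illustration.

```shell
workdir=$(mktemp -d) && cd "$workdir" || exit 1
printf 'from file\n' > note.txt

# cat does not read standard input when it is given file operands,
# so the here-string content is silently dropped.
ignored=$(cat note.txt <<< "from stdin")

# "-" tells cat to read standard input at that position.
with_dash=$(cat - note.txt <<< "from stdin")

# Process substitution hands the command's output to cat as a file name.
with_procsub=$(cat <(echo "from command") note.txt)

printf '%s\n---\n%s\n---\n%s\n' "$ignored" "$with_dash" "$with_procsub"
```

The first call prints only the file content, which mirrors the missing awk output in the question; the other two print the extra line first and then the file.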

List files in a directory without extension in bash and save to another file

I am trying to list the filenames in a directory that match a specific extension, .bam, but only the filename without the .bam. I guess I could also strip off the extensions from all the files and sort -u, but I am not sure. Thank you :).
files in directory
file1.bam
file1.vcf
file2.bam
file2.vcf
file3.bam
file3.vcf
bash
for i in *.bam; do echo "${i%.bam}"; done
file1
file2
file3
desired output saved to file
file1
file2
file3
The only part you seem to be missing is where you write the output of the for loop to a file.
for i in *.bam; do
echo "${i%.bam}"
done > results.txt
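As an alternative to the loop, the suffix can be stripped from all matches in one expansion. This is only a sketch: the array name bams is hypothetical, and the touch line just recreates the example files from the question in a scratch directory.

```shell
workdir=$(mktemp -d) && cd "$workdir" || exit 1
touch file1.bam file1.vcf file2.bam file2.vcf file3.bam file3.vcf

# Collect the matches in an array, then strip ".bam" from every element.
bams=(*.bam)
printf '%s\n' "${bams[@]%.bam}" > results.txt

cat results.txt
```

If no .bam file exists, the pattern stays as the literal string *.bam unless nullglob is set, so both this and the loop would write a bogus entry in that case.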

overwrite contents of a file: alternative to `>`?

I often find myself stringing together a series of shell commands, ultimately with the goal to replace the contents of a file. However, when using > it opens the original file for writing, so you lose all the contents.
For lack of a better term, is there a "lazy evaluation" version of > that will wait until all the previous commands have been executed before opening the file for writing?
Currently I'm using:
somecommand file.txt | ... | ... > tmp.txt && rm file.txt && mv tmp.txt file.txt
Which is quite ugly.
sponge will help here:
(Quoting from the manpage)
NAME
sponge - soak up standard input and write to a file
SYNOPSIS
sed '...' file | grep '...' | sponge file
DESCRIPTION
sponge reads standard input and writes it out to the specified file.
Unlike a shell redirect, sponge soaks up all its input before opening
the output file. This allows constructing pipelines that read from and
write to the same file.
It also creates the output file atomically by renaming a temp file into
place, and preserves the permissions of the output file if it already
exists. If the output file is a special file or symlink, the data will
be written to it.
If no output file is specified, sponge outputs to stdout.
See also: Can I read and write to the same file in Linux without overwriting it? on unix.SE
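Where sponge is not installed, the temp-file approach from the question can be wrapped in a small function. This is a rough sketch, not a replacement for sponge: overwrite is a hypothetical helper name, it needs bash (for local), and unlike sponge it does not preserve the permissions of the original file.

```shell
# Hypothetical helper: buffer stdin in a temp file next to the target,
# then rename it over the target once all input has been read.
overwrite() {
    local target=$1 tmp
    tmp=$(mktemp "${target}.XXXXXX") || return 1
    if cat > "$tmp"; then
        mv -- "$tmp" "$target"
    else
        rm -f -- "$tmp"
        return 1
    fi
}

workdir=$(mktemp -d) && cd "$workdir" || exit 1
printf 'b\na\nc\n' > file.txt

# file.txt is only replaced after sort has finished reading it.
sort file.txt | overwrite file.txt
cat file.txt
```

Note that the helper cannot see whether an earlier command in the pipeline failed, so the file is replaced with whatever output arrived.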

Two redirection operators in one command

Please explain the output of this shell command:
ls >file1 >file2
Why does the output go to file2 instead of file1?
bash only allows one redirection per file descriptor. If multiple redirections are provided, like in your example, they are processed from left to right, with the last one being the only one that takes effect. (Notice, though, that each file will still be created, or truncated if already in existence; the others just won't be used by the process.)
Some shells (like zsh) have an option to allow multiple redirections. In bash, you can simulate this with tee:
ls | tee file1 file2 > /dev/null
tee writes its input to each named file and also to its standard output (discarded here by > /dev/null).
If the shell finds multiple redirections of any output, it will redirect it to the last file given, in your case file2, since redirections are evaluated from left to right.
While it works, you should not do something like that!
You first redirected the STDOUT stream to go to file1 but then immediately redirected it to go to file2. So, the output goes to file2.
Every process initially has three file descriptors: 0 for STDIN, 1 for STDOUT and 2 for STDERR. >file1 means: open file1 and make file descriptor 1 refer to it. Note that the process about to be executed doesn't care where its STDOUT ends up. It just writes to whatever file descriptor 1 refers to.
For a more technical description of how this works, see this answer.
The > redirect operator is short for a stdout redirection (1>). Since the redirections are processed left to right, the last stdout redirection is the one in effect when ls runs.
ls 1> file1 1>file2
is equivalent to
ls >file1 >file2
If you're trying to redirect stderr, use
ls > file1 2> file2
0 = stdin
1 = stdout
2 = stderr
Try this; you'll notice that file2 receives the stderr message.
ls ------ > file1 2> file2
Then try these; in both cases the output is written to stdout and goes to file1.
ls >file1 2>file2
ls 1>file1 2>file2
Because the first redirection is overridden by the second. Note, though, that an empty file1 is still created, since file1 was opened for output.
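To watch stdout and stderr land in different files in a single run, one option is to give ls one name that exists and one that does not. A small sketch, where present.txt and missing.txt are made-up names:

```shell
workdir=$(mktemp -d) && cd "$workdir" || exit 1
touch present.txt

# One name exists and one does not, so ls writes to both streams.
# "|| true" only hides the non-zero exit status caused by the missing file.
ls present.txt missing.txt > file1 2> file2 || true

echo "file1 (stdout):"; cat file1
echo "file2 (stderr):"; cat file2
```

file1 receives the listing and file2 receives the error message, whose exact wording depends on the ls implementation.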

Redirection operator in UNIX

Suppose I have three files, file1, file2 and file3, each with some content.
Now when I run cat file1 > file2 >file3 at the shell prompt,
the content of file1 is copied to file3 and file2 becomes empty.
Similarly, when I run cat > file1 > file2 > file3,
it asks for input, and this input is stored in file3 while both file1 and file2 are empty.
Also, for cat > file1 > file2 < file3, the contents of file3 are copied to file2 and file1 is empty.
Can someone please explain what is happening? I am new to UNIX. Also, is there any website where I can learn more about these redirection operators?
Thanks
Consider how the shell processes each part of the command as it parses it:
cat file1 > file2 >file3
cat file1: prepare a new process with the cat program image with argument file1. (given 1 or more arguments, cat will read from each argument as a file and write to its output file descriptor)
> file2: change the new process' output file descriptor to write to file2 instead of the current output sink (initially the console for an interactive shell) - create file2 if necessary.
> file3: change the new process' output file descriptor to write to file3 instead of the current output sink (was file2) - create file3 if necessary
End of command: Spawn the new process
So in the end, file2 is created, but unused. file3 gets the data.
cat > file1 > file2 > file3
cat: prepare a new process with the cat program/image with no arguments. (given no arguments, cat will read from its input file descriptor and write to its output file descriptor)
> file1: change the new process' output file descriptor to write to file1 instead of the current output sink (initially the console for an interactive shell) - create file1 if necessary.
> file2: change the new process' output file descriptor to write to file2 instead of the current output sink (was file1) - create file2 if necessary.
> file3: change the new process' output file descriptor to write to file3 instead of the current output sink (was file2) - create file3 if necessary.
End of command: Spawn the new process
So in the end, file1 and file2 are created, but unused. file3 gets the data. cat waits for input on its input device (the console device as default for an interactive shell). Any input that cat receives will go to its output device (which ends up being file3 by the time the shell finished processing the command and invoked cat).
cat > file1 > file2 < file3
cat: prepare a new process with the cat program/image with no arguments. (given no arguments, cat will read from its input file descriptor and write to its output file descriptor)
> file1: change the new process' output file descriptor to write to file1 instead of the current output sink (initially the console for an interactive shell) - create file1 if necessary.
> file2: change the new process' output file descriptor to write to file2 instead of the current output sink (was file1) - create file2 if necessary.
< file3: change the new process' input file descriptor to read from file3 instead of the current input source (initially the console for an interactive shell)
End of command: Spawn the new process
So in the end, file1 is created, but unused. file2 gets the data. cat reads from its input device (which was set to file3 by the time the shell finished processing the command and invoked cat). Any input that cat receives will go to its output device (which ends up being file2 by the time the shell finished processing the command and invoked cat).
--
Note that in the first example, cat is the one who processes/opens file1. The shell simply passed the word file1 to the program as an argument. However, the shell opened/created file2 and file3. cat knew nothing about file3 and has no idea where the stuff it was writing to its standard output was going.
In the other 2 examples, the shell opened all the files. cat knew nothing about any files. cat had no idea where its standard input was coming from and where its standard output was going to.
Per @Sorpigal's comment - the Bash manual has some good descriptions of what the different redirection operators do. Much of it is the same across different Unix shells to varying degrees, but consult your specific shell manual/manpage to confirm. Thanks @Sorpigal.
http://gnu.org/software/bash/manual/html_node/Redirections.html
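The third case from the question can be reproduced non-interactively. A minimal sketch, assuming a scratch directory and placeholder file contents:

```shell
workdir=$(mktemp -d) && cd "$workdir" || exit 1
echo "old 1" > file1
echo "old 2" > file2
echo "data"  > file3

# The shell opens (and truncates) file1 and file2, and opens file3 for reading;
# cat itself receives no arguments at all.
cat > file1 > file2 < file3

wc -c file1 file2 file3
cat file2
```

file1 ends up empty because it was truncated and then abandoned, file2 holds the copy of file3, and file3 is unchanged.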
You can redirect standard input (<), standard output (1> or >), error output (2>), or both outputs (&>), but you can only redirect 1:1; you can't redirect one output into two different files.
What you are looking for is the tee utility.
If you don't want to lose the original content, you should use the append operator >> instead. (<< is not an append operator; it starts a here-document, which feeds inline text to a command's input.) You can read more here.
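A short sketch of both suggestions, tee for writing one output to several files and >> for appending. The file names copy1, copy2 and log.txt are placeholders.

```shell
workdir=$(mktemp -d) && cd "$workdir" || exit 1
echo "existing line" > log.txt

# tee copies its input to every named file; its own stdout is discarded here.
echo "both" | tee copy1 copy2 > /dev/null

# >> appends to the file instead of truncating it first.
echo "appended line" >> log.txt

cat copy1 copy2 log.txt
```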

Resources