Why does a bash redirection happen *before* the command starts, if the ">" is after the command? - bash

Recently I tried to list all of the images in a directory I had (several hundred) and put their names into a file. I used a very simple command:
ls > "image_names.txt"
Out of curiosity I looked inside the file and noticed that image_names.txt itself was listed in it. Then I realized the order of operations was not what I thought. I had read the command left to right, as two separate steps:
ls (first, list all the file names)
> "image_names.txt" (then create this file and send the listing into it)
Why is the file created first and the directory listed afterwards, despite the ls command coming first?

When you use output redirection, the shell needs a place to put your output (suppose it were very long: it could otherwise be lost when the command terminates, or exhaust all working memory), so the first step is to open the output file for streaming the executed command's stdout into it.
This is especially important to know in this kind of command
cat a.txt | grep "foo" > a.txt
since a.txt is opened first, and not in append mode, it will be truncated, meaning there is no input for cat. So the behaviour you might expect, that the lines are filtered from a.txt and written back to replace a.txt, will not actually happen. Instead you will just lose the contents of a.txt.
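If you do want to filter a file "in place", a common workaround is to write to a temporary file and only replace the original once the filter has succeeded. A sketch, using the a.txt from the example above:

```shell
# Create a sample a.txt, then filter it "in place" via a temporary file.
printf 'foo bar\nbaz\nfoo qux\n' > a.txt
grep "foo" a.txt > a.txt.tmp && mv a.txt.tmp a.txt
cat a.txt    # only the lines containing "foo" remain
```

The && ensures the original file is only replaced if grep succeeded, so a failure leaves a.txt untouched.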

Because the redirection > "image_names.txt" is performed before the ls command runs.

Related

Pipe command in Bash

The pipe command shows its results properly. When I try to use it with cat or > it doesn't show the output.
I have tried running the command with different spacing but it didn't help:
sort spiderman.txt | cat > superman.txt
sort spiderman.txt | > superman.txt
In the first command above, cat is not showing its output (it is not showing the contents of superman.txt); however, if I run the cat command separately, it shows the contents.
In the second command, nothing happens to superman.txt.
Ideally it should have replaced the contents of superman.txt with the sorted contents of spiderman.txt, but nothing happens.
If you're trying simple output redirection you shouldn't pipe (|), just redirect (>):
sort spiderman.txt > superman.txt
If you want to show the content as well as redirect to a file - perhaps what you're looking for is tee?
sort spiderman.txt | tee superman.txt
Description:
The tee utility copies standard input to standard output, making a copy in zero or more files. The output is unbuffered.
> superman.txt (with no command) is processed as follows:
1. superman.txt is opened for writing and truncated.
2. The output redirection is removed from the current command.
3. Since there is nothing left, the empty command is treated as having run and exited successfully. Nothing actually reads from the pipe or writes to superman.txt.
cat is necessary as a command which does read from standard input and writes to standard output.
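The truncation step can be observed directly with a bare redirection and no command at all (a small demonstration with a throwaway file name):

```shell
printf 'old contents\n' > superman.txt
wc -c < superman.txt   # 13 bytes
> superman.txt         # no command at all: the file is still opened and truncated
wc -c < superman.txt   # 0 bytes
```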
It sometimes seems a little odd to me that more shells don't provide a minimal built-in that simply copies input to output with no frills, to avoid otherwise having to fork and exec cat. (I should say "any" rather than "more", as I'm not aware of any shell that does. zsh might, if I bothered to search through the documentation to find it.)
(Some shells will optimize away an extra fork when processing a command line; bash is not one of them, though. It forks once to create a process for the write end of the pipe, then forks again to run cat. I believe ksh would simply exec cat directly instead of unnecessarily forking, in which case a built-in cat is less necessary.)

bash while loop through a file doesn't end when the file is deleted

I have a while loop in a bash script:
Example:
while read LINE
do
echo $LINE >> $log_file
done < ./sample_file
My question is: why doesn't the loop end when I delete sample_file while the script is running? I can see that log_file keeps updating. How does the loop continue when there is no input?
In unix, a file isn't truly deleted until the last directory entry for it is removed (e.g. with rm) and the last open file handle for it is closed. See this question (especially MarkR's answer) for more info. In the case of your script, the file is opened as stdin for the while read loop, and until that loop exits (or closes its stdin), rming the file will not actually delete it off disk.
You can see this effect pretty easily if you want. Open three terminal windows. In the first, run the command cat >/tmp/deleteme. In the second, run tail -f /tmp/deleteme. In the third, after running the other two commands, run rm /tmp/deleteme. At this point, the file has been unlinked, but both the cat and tail processes have open file handles for it, so it hasn't actually been deleted. You can prove this by typing into the first terminal window (running cat), and every time your hit return, tail will see the new line added to the file and display it in the second window.
The file will not actually be deleted until you end those two commands (Control-D will end cat, but you need Control-C to kill tail).
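The same effect can be reproduced in a single shell session by holding a file descriptor open across the rm (a minimal sketch):

```shell
printf 'still here\n' > /tmp/deleteme
exec 3< /tmp/deleteme   # open the file on descriptor 3
rm /tmp/deleteme        # removes the name, but the open descriptor keeps the data
cat <&3                 # prints: still here
exec 3<&-               # close the descriptor; only now is the data freed
```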
See "Why file is accessible after deleting in unix?" for an excellent explanation of what you are observing here.
In short...
Underlying rm and any other command that may appear to delete a file
there is the system call unlink. And it's called unlink, not remove or
deletefile or anything similar, because it doesn't remove a file. It
removes a link (a.k.a. directory entry) which is an association
between a file and a name in a directory.
You can use the truncate command (e.g. truncate -s 0 sample_file) to destroy the actual contents (or shred if you need to be more secure), which would immediately halt the execution of your example loop.
The moment the shell starts the while loop, sample_file has been opened for reading, and after that point it does not matter whether the file's directory entry still exists.
Test script:
$ cat test.sh
#!/bin/bash
while read line
do
echo "$line"
sleep 1
done < data_file
Test file:
$ seq 1 10 > data_file
Now run the script in one terminal, and in another terminal delete data_file; you will still see the numbers 1 to 10 printed by the script.
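You can also reproduce this in a single terminal by running the loop in the background and deleting the file right away (a sketch based on the test script above):

```shell
seq 1 5 > data_file
( while read line; do echo "$line"; sleep 0.2; done < data_file ) &
sleep 0.5      # give the background loop time to open data_file
rm data_file   # the loop's open descriptor keeps the contents readable
wait           # the background loop still prints 1 through 5
```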

Is there a way to save output from bash commands to a "file/variable" in bash without creating a file in your directory

I'm writing commands that do something like ./script > output.txt so that I can use the files in later scripts like ./script2 output.txt otherFile.txt > output2.txt. I remove them all at the end of the script, but when I'm testing certain things or debugging it's tricky to search through all my sub directories and files which have been created in the script.
Is the best option just to create a hidden file?
As always, there are numerous ways to do so. If you want to avoid files altogether, you can save the output (STDOUT) of a command in a variable and pass it to the next command as a file using the <() operator:
output=$(cat /usr/include/stdio.h)
cat <(echo "$output")
Alternatively, you can do so in a single command line:
cat <(cat /usr/include/stdio.h)
This assumes that the next command strictly requires a file for input.
I tend to avoid temporary files whenever possible, since they require a cleanup step that must run in all cases; I only resort to them when large amounts of data have to be processed.
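When a temporary file is genuinely needed (say, for large outputs), mktemp together with a trap keeps it out of your working directory and guarantees cleanup even on failure. A sketch, where ./script and ./script2 stand in for the hypothetical commands from the question:

```shell
#!/bin/bash
# ./script and ./script2 are placeholders for your own commands.
tmp=$(mktemp)                  # created under $TMPDIR (usually /tmp), not in your project
trap 'rm -f "$tmp"' EXIT       # cleanup runs on normal exit, on errors, and on Ctrl-C
./script > "$tmp"
./script2 "$tmp" otherFile.txt > output2.txt
```

Because the trap fires on any exit path, you no longer need an explicit removal step at the end of the script.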

Infinite loop when redirecting file to itself from Unix cat

I'm trying to concatenate a license to the top of my built sources. I'm using GNU Make. In one of my rules, I have:
cat src/license.txt build/3d-tags.js > build/3d-tags.js
But this seems to be causing an infinite loop. When I kill the cat command, I see that build/3d-tags.js is just the contents of src/license.txt over and over again. What's going on? I would have expected the two files to be concatenated together, and the resulting output from cat to be redirected back into build/3d-tags.js. I'm not looking to append. I'm on OSX, in case the issue is related to GNU cat vs BSD cat.
The shell launches cat as a subprocess. The output redirection (>) is inherited by that subprocess as its stdout (file descriptor 1). Because the subprocess has to inherit the file descriptor at its creation, it follows that the shell has to open the output file before launching the subprocess.
So, the shell opens build/3d-tags.js for writing. Furthermore, since you're not appending (>>), it truncates the file. Remember, this happens before cat has even been launched. At this point, it's impossible to achieve what you wanted because the original contents of build/3d-tags.js is already gone, and cat hasn't even been launched yet.
Then, when cat is launched, it opens the files named in its arguments. The timing and order in which it opens them isn't terribly important. It opens them both for reading, of course. It then reads from src/license.txt and writes to its stdout. This writing goes to build/3d-tags.js. At this point, it's the only content in that file because it was truncated before.
cat then reads from build/3d-tags.js. It finds the content that was just written there, which is what cat previously read from src/license.txt. It writes that content to the end of the file. It then goes back and tries to read some more. It will, of course, find more to read, because it just wrote more data to the end of the file. It reads this remaining data and writes it to the file. And on and on.
In order for cat to work as you hoped (even ignoring the shell redirection obliterating the contents of build/3d-tags.js), it would have to read and keep in memory the entire contents of build/3d-tags.js, no matter how big it was, so that it could write it after it wrote the contents of src/license.txt.
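The fact that the truncation happens before the command even runs is easy to demonstrate with a command that writes nothing at all (a small sketch):

```shell
printf 'hello\n' > demo.txt
wc -c < demo.txt    # 6
true > demo.txt     # true writes nothing, yet the shell truncates demo.txt first
wc -c < demo.txt    # 0
```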
Probably the best way to achieve what you want is something like this:
cat src/license.txt build/3d-tags.js > build/3d-tags.js.new && mv build/3d-tags.js.new build/3d-tags.js || rm -f build/3d-tags.js.new
That is: concatenate the two files to a new file; if that succeeds, move the new file to the original file name (replacing the original); if either step fails, remove the temporary "new" file so as to not leave junk around.

Diff output from two programs without temporary files

Say I have two programs a and b that I can run with ./a and ./b.
Is it possible to diff their outputs without first writing to temporary files?
Use <(command) to pass one command's output to another program as if it were a file name. Bash connects the command's output to a pipe and passes a file name like /dev/fd/63 to the outer command.
diff <(./a) <(./b)
Similarly you can use >(command) if you want to pipe something into a command.
This is called "Process Substitution" in Bash's man page.
Adding to both the answers, if you want to see a side by side comparison, use vimdiff:
vimdiff <(./a) <(./b)
One option would be to use named pipes (FIFOs):
mkfifo a_fifo b_fifo
./a > a_fifo &
./b > b_fifo &
diff a_fifo b_fifo
... but John Kugelman's solution is much cleaner.
For anyone curious, this is how you perform process substitution in the fish shell:
Bash:
diff <(./a) <(./b)
Fish:
diff (./a | psub) (./b | psub)
Unfortunately the implementation in fish is currently deficient; fish will either hang or use a temporary file on disk. You also cannot use psub for output from your command.
Adding a little more to the already good answers (helped me!):
The docker command writes its help text to stderr (file descriptor 2).
I wanted to see if docker attach and docker attach --help gave the same output
$ docker attach
$ docker attach --help
Having just typed those two commands, I did the following:
$ diff <(!-2 2>&1) <(!! 2>&1)
- !! is the same as !-1, which means "run the last command" (the one just before this)
- !-2 means "run the command two before this one"
- 2>&1 sends file descriptor 2 (stderr) to the same place as file descriptor 1 (stdout)
Hope this has been of some use.
For zsh, using =(command) automatically creates a temporary file and replaces =(command) with the path of the file itself. (Ordinary command substitution, $(command), is instead replaced with the output of the command.)
This zsh feature is very useful and can be used like so to compare the output of two commands using a diff tool, for example Beyond Compare:
bcomp =(ulimit -Sa | sort) =(ulimit -Ha | sort)
For Beyond Compare, note that you must use bcomp for the above (instead of bcompare) since bcomp launches the comparison and waits for it to complete. If you use bcompare, that launches comparison and immediately exits due to which the temporary files created to store the output of the commands disappear.
Read more here: http://zsh.sourceforge.net/Intro/intro_7.html
Also notice this:
Note that the shell creates a temporary file, and deletes it when the command is finished.
and the following, which explains the difference between <(...) and =(...):
If you read zsh's man page, you may notice that <(...) is another form of process substitution which is similar to =(...). There is an important difference between the two. In the <(...) case, the shell creates a named pipe (FIFO) instead of a file. This is better, since it does not fill up the file system; but it does not work in all cases. In fact, if we had replaced =(...) with <(...) in the examples above, all of them would have stopped working except for fgrep -f <(...). You can not edit a pipe, or open it as a mail folder; fgrep, however, has no problem with reading a list of words from a pipe. You may wonder why diff <(foo) bar doesn't work, since foo | diff - bar works; this is because diff creates a temporary file if it notices that one of its arguments is -, and then copies its standard input to the temporary file.
