I've weird problem with my crontab job. My crontab job does the following:
program > file
Sometimes however file gets filled with random data that I can't explain.
I wonder if it could be the previous crontab job that's taking longer to run and it somehow mixes its results in file with current crontab job?
Overall my question is: is > operation atomic? Meaning if two programs do > file, then the last one to finish will have its data in file?
No, it's not atomic. Not even a little bit atomic.
The redirection does two things:
It opens the file by name, creating it if necessary.
It truncates the file.
After that, the utility is started, with its stdin assigned to the opened file.
If two scripts do that more or less at the same time, they will both end up writing the same file, but since they will have independent file descriptors, each process will overwrite the other process's output, resulting in a grand interleaving of bytes, some from one process and some from the other.
Another common race condition relates to the fact that the file is truncated (by the shell) before the utility starts executing. Consequently, even if the utility only writes a single line to the file, it is possible that a concurrent utility which reads the file will find that it is empty.
It's not atomic. You can verify it easily yourself:
( { echo a ; sleep 3; echo b ; } > 1) &
( { echo c ; sleep 1 ; echo d ; } > 1 )&
sleep 5 ; cat 1
bash > is done with an open with the flags (for bash 4.3.30 at least) O_TRUNC | O_WRONLY | O_CREAT (from line 707 of make_cmd.c)
So each will truncate the file, and write to it. If a previous process still had an open file handle it would continue writing, and at its seek position into the file, being unaware that another process had truncated the file.
Related
We have a large number of files in a directory which need to be processed by a program called process, which takes two aguments infile and outfile.
We want to name the outfile after infile and add a suffix. E.g. for processing a single file, we would do:
process somefile123 somefile123-processed
How can we process all files at once from the command line?
(This is in a bash command line)
As #Cyrus says in the comments, the usual way to do that would be:
for f in *; do process "$f" "${f}-processed" & done
However, that may be undesirable and lead to a "thundering herd" of processes if you have thousands of files, so you might consider GNU Parallel which is:
more controllable,
can give you progress reports,
is easier to type
and by default runs one process per CPU core to keep them all busy. So, that would become:
parallel process {} {}-processed ::: *
I'm working with Bash script and meeting such a situation:
one bash script will write things into a file, and the other bash script will read things from the same file.
In this case, is lockfile necessary? I think I don't need to use lockfile because there are only one reading process and only one writing process but I'm not sure.
Bash write.sh:
#!/bin/bash
echo 'success' > tmp.log
Bash read.sh:
#!/bin/bash
while :
do
line=$(head -n 1 ./tmp.log)
if [[ "$line" == "success" ]]; then
echo 'done'
break
else
sleep 3
fi
done
BTW, the write.sh could write several key words, such as success, fail etc.
While many programmers ignore this, you can potentially run into a problem because writing to the file is not atomic. When the writer does
echo success > tmp.log
it could be split into two (or more) parts: first it writes suc, then it writes cess\n.
If the reader executes between those steps, it might get just suc rather than the whole success line. Using a lockfile would prevent this race condition.
This is unlikely to happen with short writes from a shell echo command, which is why most programmers don't worry about it. However, if the writer is a C program using buffered output, the buffer could be flushed at arbitrary times, which would likely end with a partial line.
Also, since the reader is reading the file from the beginning each time, you don't have to worry about starting the read where the previous one left off.
Another way to do this is for the writer to write into a file with a different name, then rename the file to what the reader is looking for. Renaming is atomic, so you're guaranteed to read all of it or nothing.
At least from your example, it doesn't look like read.sh really cares about what gets written to tmp.log, only that write.sh has created the file. In that case, all read.sh needs to check is that the file exists.
write.sh can simply be
: > tmp.log
and read.sh becomes
until [ -e tmp.log ]; do
sleep 3
done
echo "done"
I have a while loop in a bash script:
Example:
while read LINE
do
echo $LINE >> $log_file
done < ./sample_file
My question is why when I delete the sample_file while the script is running the loop doesn't end and I see that the log_file is updating? How the loop is continuing while there is no input?
In unix, a file isn't truly deleted until the last directory entry for it is removed (e.g. with rm) and the last open file handle for it is closed. See this question (especially MarkR's answer) for more info. In the case of your script, the file is opened as stdin for the while read loop, and until that loop exits (or closes its stdin), rming the file will not actually delete it off disk.
You can see this effect pretty easily if you want. Open three terminal windows. In the first, run the command cat >/tmp/deleteme. In the second, run tail -f /tmp/deleteme. In the third, after running the other two commands, run rm /tmp/deleteme. At this point, the file has been unlinked, but both the cat and tail processes have open file handles for it, so it hasn't actually been deleted. You can prove this by typing into the first terminal window (running cat), and every time your hit return, tail will see the new line added to the file and display it in the second window.
The file will not actually be deleted until you end those two commands (Control-D will end cat, but you need Control-C to kill tail).
See "Why file is accessible after deleting in unix?" for an excellent explanation of what you are observing here.
In short...
Underlying rm and any other command that may appear to delete a file
there is the system call unlink. And it's called unlink, not remove or
deletefile or anything similar, because it doesn't remove a file. It
removes a link (a.k.a. directory entry) which is an association
between a file and a name in a directory.
You can use the function truncate to destroy the actual contents (or shred if you need to be more secure), which would immediately halt the execution of your example loop.
The moment shell executes the while loop, the sample_file contents have been read, and it does not matter whether the file exists or not after that point.
Test script:
$ cat test.sh
#!/bin/bash
while read line
do
echo $line
sleep 1
done < data_file
Test file:
$ seq 1 10 > data_file
Now, in one terminal you run the script, in another terminal, you go and delete the file data_file, you would still see the 1 to 10 numbers printed by the script.
i spent the better part of the day looking for a solution to this problem and i think i am nearing the brink ... What i need to do in bash is: write 1 script that will periodicly read your inputs and write them into a file and second script that will periodicly print out the complete file BUT only when something new gets written in, meaning it will never write 2 same outputs 1 after another. 2 scripts need to comunicate by the means of a lock, meaning script 1 will lock a file so that script 2 cant print anything out of it, then script 1 will write something new into that file and unlock it ( and then script 2 can print updated file ).
The only hints we got was the usage of flock and lockfile - didnt get any hints on how to use them, exept that problem MUST be solved by flock or lockfile.
edit: When i said i was looking for a solution i ment i tried every single combination of flock with those flags and i just couldnt get it to work.
I will write pseudo code of what i want to do. A thing to note here is that this pseudocode is basicly the same as it is done in C .. its so simple, i dont know why everything has to be so complicated in bash.
script 1:
place a lock on file text.txt ( no one else can read it or write to it)
read input
place that input into file ( not deleting previous text )
remove lock on file text.txt
repeat
script 2:
print out complete text.txt ( but only if it is not locked, if it is locked obviously you cant)
repeat
And since script 2 is repeating all the time, it should print the complete text.txt ONLY when something new was writen to it.
I have about 100 other commands like flock that i have to learn in a very short time and i spent 1 day only for 1 of those commands. It would be kind of you to at least give me a hint. As for man page ...
I tried to do something like flock -x text.txt -c read > text.txt, tried every other combination also, but nothing works. It takes only 1 command, wont accept arguments. I dont even know why there is an option for command. I just want it to place a lock on file, write into it and then unlock it. In c it only takes flock("text.txt", ..).
Let's look at what this does:
flock -x text.txt -c read > text.txt
First, it opens test.txt for write (and truncates all contents) -- before doing anything else, including calling flock!
Second, it tells flock to get an exclusive lock on the file and run the command read.
However, read is a shell builtin, not an external command -- so it can't be called by a non-shell process at all, mooting any effect that it might otherwise have had.
Now, let's try using flock the way the man page suggests using it:
{
flock -x 3 # grab a lock on file descriptor #3
printf "Input to add to file: " # Prompt user
read -r new_input # Read input from user
printf '%s\n' "$new_input" >&3 # Write new content to the FD
} 3>>text.txt # do all this with FD 3 open to text.txt
...and, on the read end:
{
flock -s 3 # wait for a read lock
cat <&3 # read contents of the file from FD 3
} 3<text.txt # all of this with text.txt open to FD 3
You'll notice some differences from what you were trying before:
The file descriptor used to grab the lock is in append mode (when writing to the end), or in read mode (when reading), so you aren't overwriting the file before you even grab the lock.
We're running the read command (which, again, is a shell builtin, and so can only be run directly by the shell) by the shell directly, rather than telling the flock command to invoke it via the execve syscall (which is, again, impossible).
I have a program that users can run using the command line. Once running, it receives and processes commands from the keyboard. It's possible to start the program with input from disk like so: $ ./program < startScript.sh. However, the program quits as soon as the script finishes running. Is there a way to start the program with input from disk using < that does not quit the program and allows additional keyboard input to be read?
(cat foo.txt; cat) | ./program
I.e., create a subshell (that's what the parentheses do) which first outputs the contents of foo.txt and after that just copies whatever the user types. Then, take the combined output of this subshell and pipe it into stdin of your program.
Note that this also works for other combinations. Say you want to start a program that always asks the same questions. The best approach would be to use "expect" and make sure the questions didn't change, but for a quick workaround, you can also do something like this:
(echo y; echo $file; echo n) | whatever
Use system("pause")(in bash it's just "pause") in your program so that it does not exit immediatly.
There are alternatives such as
dummy read
infinite loop
sigsuspend
many more
Why not try something like this
BODY=`./startScript.sh`
if [ -n "$BODY" ]
then cat "$BODY" |./program
fi
That depends on how the program is coded. This cannot be achieved from writing code in startScript.sh, if that is what you're trying to achieve.
What you could do is write a callingScript.sh that asks for the input first and then calls the program < startScript.sh.