Lock a file in bash using flock and lockfile - bash

i spent the better part of the day looking for a solution to this problem and i think i am nearing the brink ... What i need to do in bash is: write 1 script that will periodicly read your inputs and write them into a file and second script that will periodicly print out the complete file BUT only when something new gets written in, meaning it will never write 2 same outputs 1 after another. 2 scripts need to comunicate by the means of a lock, meaning script 1 will lock a file so that script 2 cant print anything out of it, then script 1 will write something new into that file and unlock it ( and then script 2 can print updated file ).
The only hints we got was the usage of flock and lockfile - didnt get any hints on how to use them, exept that problem MUST be solved by flock or lockfile.
edit: When i said i was looking for a solution i ment i tried every single combination of flock with those flags and i just couldnt get it to work.
I will write pseudo code of what i want to do. A thing to note here is that this pseudocode is basicly the same as it is done in C .. its so simple, i dont know why everything has to be so complicated in bash.
script 1:
place a lock on file text.txt ( no one else can read it or write to it)
read input
place that input into file ( not deleting previous text )
remove lock on file text.txt
repeat
script 2:
print out complete text.txt ( but only if it is not locked, if it is locked obviously you cant)
repeat
And since script 2 is repeating all the time, it should print the complete text.txt ONLY when something new was writen to it.
I have about 100 other commands like flock that i have to learn in a very short time and i spent 1 day only for 1 of those commands. It would be kind of you to at least give me a hint. As for man page ...
I tried to do something like flock -x text.txt -c read > text.txt, tried every other combination also, but nothing works. It takes only 1 command, wont accept arguments. I dont even know why there is an option for command. I just want it to place a lock on file, write into it and then unlock it. In c it only takes flock("text.txt", ..).

Let's look at what this does:
flock -x text.txt -c read > text.txt
First, it opens test.txt for write (and truncates all contents) -- before doing anything else, including calling flock!
Second, it tells flock to get an exclusive lock on the file and run the command read.
However, read is a shell builtin, not an external command -- so it can't be called by a non-shell process at all, mooting any effect that it might otherwise have had.
Now, let's try using flock the way the man page suggests using it:
{
flock -x 3 # grab a lock on file descriptor #3
printf "Input to add to file: " # Prompt user
read -r new_input # Read input from user
printf '%s\n' "$new_input" >&3 # Write new content to the FD
} 3>>text.txt # do all this with FD 3 open to text.txt
...and, on the read end:
{
flock -s 3 # wait for a read lock
cat <&3 # read contents of the file from FD 3
} 3<text.txt # all of this with text.txt open to FD 3
You'll notice some differences from what you were trying before:
The file descriptor used to grab the lock is in append mode (when writing to the end), or in read mode (when reading), so you aren't overwriting the file before you even grab the lock.
We're running the read command (which, again, is a shell builtin, and so can only be run directly by the shell) by the shell directly, rather than telling the flock command to invoke it via the execve syscall (which is, again, impossible).

Related

Bash differences in changes to running scripts

I'm scratching my head about two seemingly different behaviors of bash when editing a running script.
This is not a place to discuss WHY one would do this (you probably shouldn't). I would only like to try to understand what happens and why.
Example A:
$ echo "echo 'echo hi' >> script.sh" > script.sh
$ cat script.sh
echo 'echo hi' >> script.sh
$ chmod +x script.sh
$ ./script.sh
hi
$ cat script.sh
echo 'echo hi' >> script.sh
echo hi
The script edits itself, and the change (extra echo line) is directly executed. Multiple executions lead to more lines of "hi".
Example B:
Create a script infLoop.sh and run it.
$ cat infLoop.sh
while true
do
x=1
echo $x
done
$ ./infLoop.sh
1
1
1
...
Now open a second shell and edit the file changing the value of x. E.g. like this:
$ sed --in-place 's/x=1/x=2/' infLoop.sh
$ cat infLoop.sh
while true
do
x=2
echo $x
done
However, we observe that the output in the first terminal is still 1. Doing the same with only one terminal, interrupting infLoop.sh through Ctrl+Z, editing, and then continuing it via fg yields the same result.
The Question
Why does the change in example A have an immediate effect but the change in example B not?
PS: I know there are questions out there showing similar examples but none of those I saw have answers explaining the difference between the scenarios.
There are actually two different reasons that example B is different, either one of which is enough to prevent the change from taking effect. They're due to some subtleties of how sed and bash interact with files (and how unix-like OSes treat files), and might well be different with slightly different programs, etc.
Overall, I'd say this is a good example of how hard it is to understand & predict what'll happen if you modify a file while also running it (or reading etc from it), and therefore why it's a bad idea to do things like this. Basically, it's the computer equivalent of sawing off the branch you're standing on.
Reason 1: Despite the option's name, sed --in-place does not actually modify the existing file in place. What it actually does is create a new file with a temporary name, then when it's finished that it deletes the original and renames the new file into its place. It has the same name, but it's not actually the same file. You can tell this by looking at the file's inode number with ls -li:
$ ls -li infLoop.sh
88 -rwxr-xr-x 1 pi pi 39 Aug 4 22:04 infLoop.sh
$ sed --in-place 's/x=1/x=2/' infLoop.sh
$ ls -li infLoop.sh
4073 -rwxr-xr-x 1 pi pi 39 Aug 4 22:05 infLoop.sh
But bash still has the old file open (strictly speaking, it has an open file handle pointing to the old file), so it's going to continue getting the old contents no matter what changed in the new file.
Note that this doesn't apply to all programs that edit files. vim, for example, will rewrite the contents of existing files (unless file permissions forbid it, in which case it switches to the delete&replace method). Appending with >> will always append to the existing file rather than creating a new one.
(BTW, if it seems weird that bash could have a file open after it's been "deleted", that's just part of how unix-like OSes treat their files. Files are not truly deleted until their last directory entry is removed and the last open file handle referring to them is closed. Some programs actually take advantage of this for security by opening(/creating) a file and then immediately "deleting" it, so that the open file handle is the only way to reach the file.)
Reason 2: Even if you used a tool that actually modified the existing file in place, bash still wouldn't see the change. The reason for this is that bash reads from the file (parsing as it goes) until it has something it can execute, runs that, then goes back and reads more until it has another executable chunk, etc. It does not go back and re-read the same chunk to see if it's changed, so it'll only ever notice changes in parts of the file it hasn't read yet.
In example B, it has to read and parse the entire while true ... done loop before it can start executing it. Therefore, changes in that part (or possibly before it) will not be noticed once the loop has been read & started executing. Changes in the file after the loop would be noticed after the loop exited (if it ever did).
See my answer to this question (and Don Hatch's comment on it) for more info about this.

Is lockfile necessary for reading and writing the same file of two processes

I'm working with Bash script and meeting such a situation:
one bash script will write things into a file, and the other bash script will read things from the same file.
In this case, is lockfile necessary? I think I don't need to use lockfile because there are only one reading process and only one writing process but I'm not sure.
Bash write.sh:
#!/bin/bash
echo 'success' > tmp.log
Bash read.sh:
#!/bin/bash
while :
do
line=$(head -n 1 ./tmp.log)
if [[ "$line" == "success" ]]; then
echo 'done'
break
else
sleep 3
fi
done
BTW, the write.sh could write several key words, such as success, fail etc.
While many programmers ignore this, you can potentially run into a problem because writing to the file is not atomic. When the writer does
echo success > tmp.log
it could be split into two (or more) parts: first it writes suc, then it writes cess\n.
If the reader executes between those steps, it might get just suc rather than the whole success line. Using a lockfile would prevent this race condition.
This is unlikely to happen with short writes from a shell echo command, which is why most programmers don't worry about it. However, if the writer is a C program using buffered output, the buffer could be flushed at arbitrary times, which would likely end with a partial line.
Also, since the reader is reading the file from the beginning each time, you don't have to worry about starting the read where the previous one left off.
Another way to do this is for the writer to write into a file with a different name, then rename the file to what the reader is looking for. Renaming is atomic, so you're guaranteed to read all of it or nothing.
At least from your example, it doesn't look like read.sh really cares about what gets written to tmp.log, only that write.sh has created the file. In that case, all read.sh needs to check is that the file exists.
write.sh can simply be
: > tmp.log
and read.sh becomes
until [ -e tmp.log ]; do
sleep 3
done
echo "done"

bash while loop through a file doesn't end when the file is deleted

I have a while loop in a bash script:
Example:
while read LINE
do
echo $LINE >> $log_file
done < ./sample_file
My question is why when I delete the sample_file while the script is running the loop doesn't end and I see that the log_file is updating? How the loop is continuing while there is no input?
In unix, a file isn't truly deleted until the last directory entry for it is removed (e.g. with rm) and the last open file handle for it is closed. See this question (especially MarkR's answer) for more info. In the case of your script, the file is opened as stdin for the while read loop, and until that loop exits (or closes its stdin), rming the file will not actually delete it off disk.
You can see this effect pretty easily if you want. Open three terminal windows. In the first, run the command cat >/tmp/deleteme. In the second, run tail -f /tmp/deleteme. In the third, after running the other two commands, run rm /tmp/deleteme. At this point, the file has been unlinked, but both the cat and tail processes have open file handles for it, so it hasn't actually been deleted. You can prove this by typing into the first terminal window (running cat), and every time your hit return, tail will see the new line added to the file and display it in the second window.
The file will not actually be deleted until you end those two commands (Control-D will end cat, but you need Control-C to kill tail).
See "Why file is accessible after deleting in unix?" for an excellent explanation of what you are observing here.
In short...
Underlying rm and any other command that may appear to delete a file
there is the system call unlink. And it's called unlink, not remove or
deletefile or anything similar, because it doesn't remove a file. It
removes a link (a.k.a. directory entry) which is an association
between a file and a name in a directory.
You can use the function truncate to destroy the actual contents (or shred if you need to be more secure), which would immediately halt the execution of your example loop.
The moment shell executes the while loop, the sample_file contents have been read, and it does not matter whether the file exists or not after that point.
Test script:
$ cat test.sh
#!/bin/bash
while read line
do
echo $line
sleep 1
done < data_file
Test file:
$ seq 1 10 > data_file
Now, in one terminal you run the script, in another terminal, you go and delete the file data_file, you would still see the 1 to 10 numbers printed by the script.

Is bash > redirection atomic?

I've weird problem with my crontab job. My crontab job does the following:
program > file
Sometimes however file gets filled with random data that I can't explain.
I wonder if it could be the previous crontab job that's taking longer to run and it somehow mixes its results in file with current crontab job?
Overall my question is: is > operation atomic? Meaning if two programs do > file, then the last one to finish will have its data in file?
No, it's not atomic. Not even a little bit atomic.
The redirection does two things:
It opens the file by name, creating it if necessary.
It truncates the file.
After that, the utility is started, with its stdin assigned to the opened file.
If two scripts do that more or less at the same time, they will both end up writing the same file, but since they will have independent file descriptors, each process will overwrite the other process's output, resulting in a grand interleaving of bytes, some from one process and some from the other.
Another common race condition relates to the fact that the file is truncated (by the shell) before the utility starts executing. Consequently, even if the utility only writes a single line to the file, it is possible that a concurrent utility which reads the file will find that it is empty.
It's not atomic. You can verify it easily yourself:
( { echo a ; sleep 3; echo b ; } > 1) &
( { echo c ; sleep 1 ; echo d ; } > 1 )&
sleep 5 ; cat 1
bash > is done with an open with the flags (for bash 4.3.30 at least) O_TRUNC | O_WRONLY | O_CREAT (from line 707 of make_cmd.c)
So each will truncate the file, and write to it. If a previous process still had an open file handle it would continue writing, and at its seek position into the file, being unaware that another process had truncated the file.

Locking Files in Bash

I have a Problem to find a good concept on locking files in bash,
Basically I want to achieve the following:
Lock File
Read in the data in the file (multiple times)
Do stuff with the data.
Write new stuff to the file (not necessarily to the end)
Unlock that file
Doing this with flock seems not possible to me, because the file descriptor will just move once to the end of the file.
Also creating a Tempfile fails, because I might overwrite already read lines which is also not possible.
Edit:
Also note that other scripts I do not control might try to write to that file.
So my question is how can I create a lock in step 1 so it will span over steps 2,3,4 till I unlock it again in step 5?
You can do this with the flock utility. You just need to get flock to use a separate read-only file descriptor, i.e. open the file twice. E.g. to sort a file using a intermediate temporary file:
(
flock -x -w 10 100 || exit 1
tmp=$(mktemp)
sort <"$file" >"$tmp"
cat "$tmp" > "$file"
rm -f "$tmp"
) 100<"$file"
flock will issue the flock() system call for your file and block if it is already locked. If the timeout is exceeded then the script will just abort with an error code.

Resources