linux inotify returns same watch descriptor as before - linux-kernel

I have a program which uses the Linux inotify syscall to monitor files generated in a folder.
The program monitors the file sizes, so it uses the IN_MODIFY flag for the file. Since the file could be written at a fast rate, and we don't want the inotify queue to overflow, the mask also includes IN_ONESHOT, which makes inotify delete the watch when it sends an event for the file modification. The program then adds a watch again, and the process repeats, as described in the following loop.
eventLoop:
1. Program adds a watch on a file and gets a watch descriptor (say a0).
2. File gets modified; program gets an event for watch descriptor a0.
3. (Since IN_ONESHOT was used, the watch is auto-deleted by inotify now.)
4. Program handles the modify (0x2) event and does its logic.
5. Program adds a watch again on the same file and gets a new watch descriptor (say a1). (This step is the same as step 1.)
While testing, it was observed that after some iterations of the above loop, the watch descriptor returned in step 5 is the same as the one returned in step 1. Once this happens, no more events are received for file changes; the loop essentially halts.
To find out more about the behavior, I ran the same program under strace. The same problem occurs, but only after a much longer time: where it previously appeared within 2 minutes, under strace it takes 3-4 hours. But it does occur.
Is this a known issue? Am I doing something wrong in my code? (The code is in Golang, running in a container on a Kubernetes cluster.)
Just the fact that the problem takes longer to appear under strace makes me wonder whether the problem lies inside inotify. (I am figuring out how to trace the kernel, but I may not succeed in that.)
Update
I modified the program to retry if the watch descriptor returned by inotify is the same as the old one. Adding this retry in a loop makes the program continue with the expected behavior.
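
For reference, a minimal C sketch of the loop above, including the retry workaround from the update (the asker's program is in Go; the path here is a placeholder):

#include <stdio.h>
#include <unistd.h>
#include <sys/inotify.h>

int main(void)
{
    int fd = inotify_init1(IN_CLOEXEC);
    if (fd < 0) { perror("inotify_init1"); return 1; }

    const char *path = "/tmp/watched-file";   /* placeholder path */
    int wd = inotify_add_watch(fd, path, IN_MODIFY | IN_ONESHOT);

    char buf[4096];
    for (;;) {
        ssize_t len = read(fd, buf, sizeof(buf));   /* blocks for one event */
        if (len <= 0) break;
        /* ... handle the IN_MODIFY event, e.g. stat() the file size ... */

        /* IN_ONESHOT removed the watch, so re-add it.  The retry below is
           the workaround from the update: retry while the kernel hands back
           the same watch descriptor it just expired. */
        int old = wd;
        do {
            wd = inotify_add_watch(fd, path, IN_MODIFY | IN_ONESHOT);
        } while (wd == old);
    }
    return 0;
}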

Related

QProcess finished Signal

We have a QProcess that runs a bash script. The script finishes properly and produces the expected output, but the finished signal takes a very long time (minutes) afterward to emit. Basically, our script is generating an encrypted tarball from a list of files fed as an argument. The final bundle is sitting there on disk, intact, but QProcess takes a very long time to return. This is preventing our UI from moving on to the next task, because we need to ensure programmatically that the script has run to completion, instead of through inspection. We're not doing anything other than
connect(myProcess, SIGNAL(finished(int, QProcess::ExitStatus)), this, SLOT(tidyUp()));
myProcess->start("myScript.sh");   // script name is a placeholder
We can monitor the size of the file with Qt, and we have an estimate of its final size based on the file list we feed the script, but the script hangs around for a very long time after the file has reached its estimated size. We've inserted sync statements, but that doesn't seem to have any effect. When the script is run on the command line, the file grows, and the script stops as soon as it reaches its final size.
Why is QProcess not sending its finished signal immediately after the script completes?
We would very much like to attach a progress bar indicating percentage of file size produced, or give some other indication of progress, but we're stumped by this behavior. We've tried using both a worker thread moved to a QThread, and running the QProcess directly in a busy loop, calling processEvents(), to no avail.
Turns out this was a problem with the commands I had in my pipe. GPG plunks out the file fairly quickly (with quite variable timing, though) but then often spends quite a lot of time idling/working after the file itself has reached its final size. I'm not sure what it's doing, or why it only does so on some runs of the same content, but eventually it finishes, the script completes, and I get my finished() signal delivered. I may have to put a more elaborate progress bar in place that switches to a busy indicator if the file size hasn't changed for a while, but it appears that Qt is working as expected here after all.

What happens if another process tries to write to a flock(2)'d file?

Specifically, if the following events take place in the given order:
1. Process 1 opens a file in append mode.
2. Process 2 opens the same file in append mode.
3. Process 2 gets an exclusive lock using flock(2) on the file descriptor.
4. Process 1 attempts to write to the file.
What happens?
Will the write return immediately with a code indicating failure? Will it hang until the lock is released, then write and return success? Does the behavior vary by kernel? It seems odd that the documentation doesn't cover this case.
(I could write a couple of processes to test it on my system, but I don't know whether my test would be representative of the general case, and if anyone does know, I can anticipate this answer saving a lot of other people a lot of time.)
The write proceeds as normal. flock(2) provides advisory locking: locking a file exclusively only prevents others from getting a shared or exclusive lock on the same file. Calls other than flock(2) are not affected.
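
A quick way to confirm this on your own system is the sketch below, which mirrors the scenario above with a fork ("demo.txt" is a placeholder; the parent plays process 2, the child plays process 1):

#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/file.h>
#include <sys/wait.h>

int main(void)
{
    /* "Process 2": open in append mode and take the exclusive lock. */
    int fd = open("demo.txt", O_WRONLY | O_CREAT | O_APPEND, 0644);
    flock(fd, LOCK_EX);

    if (fork() == 0) {
        /* "Process 1": opens in append mode, never calls flock(). */
        int fd2 = open("demo.txt", O_WRONLY | O_APPEND);
        ssize_t n = write(fd2, "hello\n", 6);
        printf("write returned %zd\n", n);   /* succeeds: the lock is advisory */
        _exit(0);
    }
    wait(NULL);
    flock(fd, LOCK_UN);
    return 0;
}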

Maximum size of pipe used by CreateProcess

I'm currently using this example as a guide to redirect standard error of a child process launched by CreateProcess.
However, unlike the example, I'm currently waiting until the process finishes (checking GetExitCodeProcess), closing the pipe, and then reading the error if a non-zero return code comes back.
However, I've since read that if the pipe fills up, the child process will block until the pipe is drained. The reason I'm not currently reading from the pipe during execution is that the ReadFile call blocks while the process runs (standard error is only output at the end), so I can't pump the message queue, and the GUI "ghosts" and gets marked as not responding.
I can't find any reference to how big the pipe is by default (although I can set a size myself). Is this something I need to worry about, given that I'm buffering the output into a string variable for later use anyway? (I.e., it would need to fit into the available memory for the process, so it has a hard limit there; it's not going to a file like most of the examples have.)
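
When CreatePipe's nSize argument is 0, the system picks a default buffer size (commonly reported as around 4 KB, though the exact value is not documented), so a chatty child can fill it well before it exits. A hedged sketch of the usual fix follows: drain the pipe on a worker thread while the GUI thread keeps pumping messages. hStdErrRead and the fixed-size errBuf are placeholder names, not part of the linked example.

#include <windows.h>
#include <string.h>

static HANDLE hStdErrRead;     /* read end of the stderr pipe from CreatePipe */
static char   errBuf[65536];   /* accumulated stderr (sketch only) */
static DWORD  errLen;

static DWORD WINAPI DrainStderr(LPVOID unused)
{
    char chunk[4096];
    DWORD got;
    /* ReadFile fails with ERROR_BROKEN_PIPE once the child has exited and
       the parent's copy of the write handle has been closed. */
    while (ReadFile(hStdErrRead, chunk, sizeof(chunk), &got, NULL) && got) {
        DWORD room = (DWORD)sizeof(errBuf) - errLen;
        if (got > room) got = room;
        memcpy(errBuf + errLen, chunk, got);
        errLen += got;
    }
    return 0;
}

/* After CreateProcess succeeds (and after closing the parent's copy of the
   write handle), start the drain thread and wait/pump as before:
   HANDLE t = CreateThread(NULL, 0, DrainStderr, NULL, 0, NULL); */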

Controlling an interactive command-line utility from a Cocoa app - trouble with ptys

What I'm trying to do
My Cocoa app needs to run a bunch of command-line programs. Most of these are non-interactive, so I launch them with some command-line arguments, they do their thing, output something and quit. One of the programs is interactive, so it outputs some text and a prompt to stdout and then expects input on stdin and this keeps going until you send it a quit command.
What works
The non-interactive programs, which just dump a load of data to stdout and then terminate, are comparatively trivial:
Create NSPipes for stdout/stdin/stderr
Launch NSTask with those pipes
Then, either
Get the NSFileHandle for the other end of the pipe to read all data until the end of the stream, and process it in one go when the task ends
or
Get the -fileDescriptor from the NSFileHandle of the other end of each output pipe.
Set the file descriptor to use non-blocking mode
Create a GCD dispatch source with each of those file descriptors using dispatch_source_create(DISPATCH_SOURCE_TYPE_READ, ...
Resume the dispatch source and handle the data it throws at you using read()
Keep going until the task ends and the pipe file descriptor reports EOF (read() returns 0 bytes); a sketch of this setup follows after this list
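
For concreteness, here is a minimal sketch of that dispatch-source setup (the function name is mine; error handling and source lifetime management are omitted):

#include <dispatch/dispatch.h>
#include <fcntl.h>
#include <unistd.h>

static void watchReadEnd(int fd)
{
    fcntl(fd, F_SETFL, O_NONBLOCK);   /* non-blocking mode, as in step 2 */

    dispatch_source_t src = dispatch_source_create(
        DISPATCH_SOURCE_TYPE_READ, fd, 0,
        dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0));

    dispatch_source_set_event_handler(src, ^{
        char buf[4096];
        ssize_t n = read(fd, buf, sizeof(buf));
        if (n > 0) {
            /* hand the bytes off to the parser / UI */
        } else if (n == 0) {
            dispatch_source_cancel(src);   /* EOF: the task has ended */
        }
    });
    dispatch_resume(src);
}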
What doesn't work
Either approach completely breaks down for interactive tools. Obviously I can't wait until the program exits because it's sitting at a command prompt and never will exit unless I tell it to. On the other hand, NSPipe buffers the data, so you receive it in buffer-sized chunks, unless the CLI program happens to flush the pipe explicitly, which the one in my case does not. The initial command prompt is much smaller than the buffer size, so I don't receive anything, and it just sits there. So NSPipe is also a no-go.
After some research, I determined that I needed to use a pseudo-terminal (pty) in place of the NSPipe. Unfortunately, I've had nothing but trouble getting it working.
What I've tried
Instead of the stdout pipe, I create a pty like so:
#include <termios.h>
#include <util.h>   // openpty() on macOS (<pty.h> on Linux); bzero() needs <strings.h>

int masterFD = -1, slaveFD = -1;
struct termios termp;
bzero(&termp, sizeof(termp));   // caution: an all-zero termios disables all terminal processing
int res = openpty(&masterFD, &slaveFD, NULL, &termp, NULL);
This gives me two file descriptors; I hand the slaveFD over to an NSFileHandle, which gets passed to the NSTask for either just stdout or both stdout and stdin. Then I try to do the usual asynchronous reading from the master side.
If I run the program I'm controlling in a Terminal window, it starts off by outputting 2 lines of text, one 18 bytes long including the newline, one 22 bytes and with no newline for the command prompt. After those 40 bytes it waits for input.
If I just use the pty for stdout, I receive 18 bytes of output (exactly one line, ending in newline) from the controlled program, and no more. Everything just sits there after the initial 18 bytes, no more events - the GCD event source's handler doesn't get called.
If I also use the pty for stdin, I usually receive 19 bytes of output (the aforementioned line plus one character from the next line) and then the controlled program dies immediately. If I wait a little before attempting to read the data (or scheduling noise causes a small pause), I actually get the whole 40 bytes before the program again dies instantly.
An additional dead end
At one point I was wondering if my async reading code was flawed, so I re-did everything using NSFileHandle and its -readInBackgroundAndNotify method. This behaved the same as when using GCD. (I originally picked GCD over the NSFileHandle API because there doesn't appear to be any async writing support in NSFileHandle.)
Questions
Having arrived at this point after well over a day of futile attempts, I could do with some kind of help. Is there some fundamental problem with what I'm trying to do? Why does hooking up stdin to the pty terminate the program? I'm not closing the master end of the pty, so it shouldn't be receiving EOF. Leaving aside stdin, why am I only getting one line's worth of output? Is there a problem with the way I'm performing I/O on the pty's file descriptor? Am I using the master and slave ends correctly - master in the controlling process, slave in the NSTask?
What I haven't tried
So far I have only performed non-blocking (asynchronous) I/O on pipes and ptys. The only thing I can think of is that the pty simply doesn't support that. (If so, why does fcntl(fd, F_SETFL, O_NONBLOCK); succeed, though?) I can try doing blocking I/O on background threads instead and send messages to the main thread, as sketched below. I was hoping to avoid having to deal with multithreading, but considering how broken all these APIs seem to be, it can't be any more time-consuming than trying yet another permutation of async I/O. Still, I'd love to know what exactly I'm doing wrong.
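
For what it's worth, the blocking-I/O fallback is straightforward; a sketch of the reader side (thread function only; launch it with pthread_create, passing the pty's master fd, and swap the fwrite for the real UI update):

#include <dispatch/dispatch.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

static void *ptyReader(void *arg)
{
    int fd = (int)(intptr_t)arg;   /* master side of the pty */
    char buf[1024];
    ssize_t n;
    while ((n = read(fd, buf, sizeof(buf))) > 0) {
        char *copy = malloc(n);    /* buf is reused, so copy the chunk */
        memcpy(copy, buf, n);
        dispatch_async(dispatch_get_main_queue(), ^{
            /* copy and n were captured by value when this block was created */
            fwrite(copy, 1, n, stdout);   /* stand-in for the real consumer */
            free(copy);
        });
    }
    return NULL;   /* read() returned 0 (EOF) or -1 (error) */
}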
The problem is likely that the stdio library inside the command-line program is buffering output. The output will only appear in the read pipe when the program flushes it, either because it writes a "\n" via the stdio library, or calls fflush(), or the buffer gets full, or it exits (which causes the stdio library to automatically flush any output still buffered), or possibly some other conditions. If those printf strings were "\n"-terminated, then you MIGHT get the output quicker. That's because there are three output buffering styles: unbuffered, line-buffered (\n causes a flush), and block-buffered (when the output buffer gets full, it's auto-flushed).
stdout is line-buffered by default if the output file descriptor is a tty (or pty); otherwise, it is block-buffered. stderr is unbuffered by default. The setvbuf() function is used to change the buffering mode. These are all standard BSD UNIX (and maybe general UNIX) behaviors I've described here.
NSTask does not do any setting up of ttys/ptys for you. It wouldn't help in this case anyway, since the printfs aren't printing a trailing \n.
Now, the problem is that the setvbuf() needs to be executed inside the command-line program. Unless (1) you have the source to the command-line program and can modify it and use that modified program, or (2) the command-line program has a feature that allows you to tell it not to buffer its output [i.e., to call setvbuf() itself], there's no way to change this that I know of. The parent simply cannot affect the subprocess in this way, either to force flushing at certain points or to change the stdio buffering behavior, unless the command-line utility has those features built in (which would be rare).
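
If option (1) applies and you can rebuild the tool, the change is a single call early in main(), as in this sketch (_IONBF trades throughput for immediacy; _IOLBF matches what a tty would give you):

#include <stdio.h>

int main(void)
{
    setvbuf(stdout, NULL, _IONBF, 0);   /* unbuffered: every printf reaches the pipe immediately */
    /* or: setvbuf(stdout, NULL, _IOLBF, 0);  line-buffered, as on a tty */

    printf("prompt> ");   /* now visible to the parent even without a trailing \n */
    return 0;
}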
Source: Re: NSTask, NSPipe's and interactive UNIX command

Difference between CFRunLoopRemoveSource and CFRunLoopSourceInvalidate

I was debugging a crash in my HID driver code on the Mac and found that the crash happened in the CFRunLoop. In the driver code I open the USB handles for the devices matching the VID and PID of my HID device, set up an interrupt callback for each using the setInterruptReportHandlerCallback function, and then add it to the CFRunLoop using CFRunLoopAddSource. In my close-handles call I freed them up using CFRunLoopRemoveSource and then a CFRelease on the CFRunLoopSourceRef.
The problem occurs when I open the handles, wait for a while (5 ms), and then close the handles, in a loop.
When I searched for the problem I came across a link describing a problem similar to mine (http://lists.apple.com/archives/usb/.../msg00099.html), where they had used CFRunLoopSourceInvalidate instead of the RemoveSource call. When I changed my close-handles call to invalidate the source, it fixed my crash. I wanted to know what the difference between the two calls is, and why this change fixed my crash.
Thanks
jbsp72
First, let me thank you. I typed CFRunLoopRemoveSource into Google and found your message, which describes exactly the problem I was trying to solve, and your solution of calling CFRunLoopSourceInvalidate instead also solves my problem.
Now, the difference between CFRunLoopRemoveSource and CFRunLoopSourceInvalidate is:
CFRunLoopRemoveSource removes the source from the specific run loop you specify.
CFRunLoopSourceInvalidate renders the source invalid, and will remove it from all the run loops to which it was added.
Now, the crash, which I suspect is the same as the one I got, happens because the run loop the source was added to has disappeared, and trying to remove the source from it crashes. In my case it was actually an infinite loop in __spin_lock.
Now, how can a run loop disappear? Run loops are tied to threads: when you create a new thread, you automatically get a new run loop, and when a thread ends, its run loop disappears with it. Here, the thread whose run loop the source was added to has exited, and subsequently removing the source from that run loop results in the crash.
The reason invalidating the source solves the problem is that it removes the source from all the run loops it was added to, ignoring run loops that no longer exist.
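
In code, the teardown that avoids the crash looks like this (a sketch; runLoopSource stands for whatever CFRunLoopSourceRef you created during setup):

#include <CoreFoundation/CoreFoundation.h>

static void closeHandles(CFRunLoopSourceRef runLoopSource)
{
    if (runLoopSource) {
        CFRunLoopSourceInvalidate(runLoopSource);   /* instead of CFRunLoopRemoveSource() */
        CFRelease(runLoopSource);                   /* balance the create/retain from setup */
    }
}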
