Why doesn't ParseMultipartForm leak file handles?

Why doesn't ParseMultipartForm leak file handles? - go

I’ve got an http endpoint which calls net/http.(*Request).FormFile to read a file uploaded. I noticed the returned *multipart.File is never closed with Close(). That is fine for small files as it is a no-op, but It appears that https://golang.org/src/net/http/request.go#L1369 r.ParseMultipartForm will copy the file out of memory and into a temp file if the file is larger than 32MB. You can see the os.Open call here: https://golang.org/src/mime/multipart/formdata.go?s=3614:3656#L146
AFAICT this would leak file handles, but when I examine the process, I do not see leaking file handles. Where is this cleaned up?
UPDATE: Here is a complete program for testing: https://play.golang.org/p/79_kt46t1PQ

The multipart.Form declares a RemoveAll() method, which calls os.Remove(fh.tmpfile).
This method is called either on an error during multipart.Reader.ReadForm or after the request has been served:
if w.req.MultipartForm != nil {
w.req.MultipartForm.RemoveAll()
}
The above is located at https://golang.org/src/net/http/server.go#L1957
Edit:
Actually you might be on to something. There is an open issue mentioning this. In the issue it is noted that RemoveAll gets called at the end of the request, but the files are reportedly still there.
As you also commented, this might be related to the os.Remove call being implemented (on Unix) with unlink syscall, which:
If the name was the last link to a file but any processes still
have the file open, the file will remain in existence until the
last file descriptor referring to it is closed.
So to wrap up, I think you are supposed to call Close() yourself on the multipart.File, as suggested in the same thread:
file, _, _ := r.FormFile("file")
defer file.Close()
It could be a documentation bug, though some could argue that closing files after usage is obvious, though in this particular case, the docs might be more explicit. Anyway, at this point I guess you could ask for further clarification on the linked issue.

Related

Is Close() needed for a file opened by os.Open()? [duplicate]

This question already has answers here:
Is it necessary to close files after reading (only) in any programming language?
(3 answers)
Does the file need to be closed?
(1 answer)
Closed 2 years ago.
It seems that os.Open() open read-only files. So I think there is no need to Close() it. The doc is not clear on this. Is my understanding correct?
https://golang.org/pkg/os/#Open

In general, you should always close the files you open. In a long running program, you may exhaust all available file handles if you do not close your files. That said, the Go garbage collector closes open files, so depending on your exact situation leaving files open may not be a big deal.

There is a limit to how many filehandles a process can have open at once, the limit is determined by your environment, so it's important to close them.
In addition, Windows file locking is complicated; if you hold a file open it may not be able to be written to or deleted.
Unless you're returning the open filehandle, I'd advise to always match an open with a defer file.Close()

Close releases resources that are independent of the read/write status of the file. Close the file when you are done with it.

Your best bet is to always use defer file.Close(). This function is invoked for cleanup purposes, and also releases resources that are indirectly related to the I/O operation itself.
This also holds true to HTTP/s response bodies and any data type that implements the Reader interface.

Ruby file handle management (too many open files)

I am performing very rapid file access in ruby (2.0.0 p39474), and keep getting the exception Too many open files
Having looked at this thread, here, and various other sources, I'm well aware of the OS limits (set to 1024 on my system).
The part of my code that performs this file access is mutexed, and takes the form:
File.open( filename, 'w'){|f| Marshal.dump(value, f) }
where filename is subject to rapid change, depending on the thread calling the section. It's my understanding that this form relinquishes its file handle after the block.
I can verify the number of File objects that are open using ObjectSpace.each_object(File). This reports that there are up to 100 resident in memory, but only one is ever open, as expected.
Further, the exception itself is thrown at a time when there are only 10-40 File objects reported by ObjectSpace. Further, manually garbage collecting fails to improve any of these counts, as does slowing down my script by inserting sleep calls.
My question is, therefore:
Am I fundamentally misunderstanding the nature of the OS limit---does it cover the whole lifetime of a process?
If so, how do web servers avoid crashing out after accessing over ulimit -n files?
Is ruby retaining its file handles outside of its object system, or is the kernel simply very slow at counting 'concurrent' access?
Edit 20130417:
strace indicates that ruby doesn't write all of its data to the file, returning and releasing the mutex before doing so. As such, the file handles stack up until the OS limit.
In an attempt to fix this, I have used syswrite/sysread, synchronous mode, and called flush before close. None of these methods worked.
My question is thus revised to:
Why is ruby failing to close its file handles, and how can I force it to do so?

Use dtrace or strace or whatever equivalent is on your system, and find out exactly what files are being opened.
Note that these could be sockets.
I agree that the code you have pasted does not seem to be capable of causing this problem, at least, not without a rather strange concurrency bug as well.

How do Ruby Directories work?

Dir-s seem awkward as compared to File-s. Many of the methods are similar to IO methods, but a Dir doesn't inherit from IO. For example, tell in the IO docs reads:
Returns the current offset (in bytes) of ios.
When read-ing and tell-ing through a normal Dir, I get large numbers like 346723732 and 422823816. I was originally expecting these integers to be more "array-like" and just be a simple range.
Are these the bytes of the files contained in the Dir?
If not, is there any meaning to the numbers returned like IO#tell?
Also why do Dir-s have an open and close function if they are not Streams?
Is it still just as important to close a Dir as a normal IO?
Any general explanation of how a Ruby Dir works would be appreciated.
update Another confusing part: if Dirs are not IOs, why does close raise an IOerror?
Closes the directory stream. Any further attempts to access dir will raise an IOError.
Also notice that in the documentation it considers it a "directory stream". So this brings up the question again of are they streams or not and if not, why the naming convention?

The docs for Dir#tell say:
Returns the current position in dir.
without specifying what the position means. What the returned value signifying is likely to vary based on the OS that you're using and possibly the type of the file system that contains the directory. That value should be treated as opaque, don't try to interpret it in any way. The only purpose it serves is for being able to send that value back to the OS such as by calling Dir#seek.
Directories are not just a giant file. More typically they just map from a file name to information about where the data for the file is contained.
You should not (and as far as I'm aware cannot) write to directories yourself.

So after some IRC chat here's the conclusion I've come to:
The Dir object is NOT an IO
Dir Does not inherit from the IO class and is only readable. Still not sure why an IOError is raised on #close.
An opened Dir IS a stream however
Objects of class Dir are directory streams representing directories in the underlying file system.
Also if you check the source for Dir#close You will see that it calls the C function dirclose. man dirclose prints:
The closedir() function closes the directory stream associated with
dirp. A successful call to closedir() also closes the underlying file
descriptor associated with dirp. The directory stream descriptor dirp
is not available after this call.
...with dirp being a param.
So yes, instantiated Dirs will open a stream and yes, Dirs will use a file descriptor and need to be closed if you do not want to rely on garbage collection.
Big thanks to injekt and others on #ruby-lang irc!

How to identify file being closed is modified or created in action KAUTH_FILEOP_CLOSE from Mac KEXT

Observed that FWRITE or KAUTH_FILEOP_CLOSE_MODIFIED is not consistenly set in action KAUTH_FILEOP_CLOSE during file modification or file copy.
My usecase is - I am trying to figure out whether the file being closed is modified file or newly created file. I want to ignore files that are not modified.
As per documentation, I am checking for KAUTH_FILEOP_CLOSE_MODIFIED flag when the file action is KAUTH_FILEOP_CLOSE. Most of the time, I have observed KAUTH_FILEOP_CLOSE_MODIFIED is not set when file is copied from one location to other or file is modified.
I also observed that FWRITE flag is set, but not consistently for modified or copied files. I am just wondering why the behavior is so inconsistent.
Another way I was thinking was to rely on vnode actions KAUTH_VNODE_WRITE_DATA, But I have observed that there KAUTH_VNODE_WRITE_DATA multiple calls comes after KAUTH_FILEOP_CLOSE and even when file is not modified.
Any idea why such behavior exist?
Thanks in advance.
Regards,
Rupesh

KAuth and especially KAUTH_FILEOP_CLOSE_MODIFIED is buggy, and I already reported some problems related to it to Apple (a long time ago):
Events happening on file descriptor inherited from parent process seem to not trigger call to the KAuth callback at all. (See http://www.openradar.me/8898118)
The flag KAUTH_FILEOP_CLOSE_MODIFIED is not specified when the given file has transparent zlib compression enabled. (See http://www.openradar.me/23029109)
That said, I am quite confident that (as of 10.5.x, 10.6.x, 10.7.x) the callbacks are always called directly from the kernel thread doing the syscall. For example when open(2) is called, it calls the kauth callbacks for the vnode context and then (if allowed by return value) calls the VFS driver to realize the operation. The fileop (KAUTH_FILEOP_CLOSE) works also from the same thread but is just called after the closing itself.
Hence I don't think KAUTH_VNODE_WRITE_DATA can come after KAUTH_FILEOP_CLOSE for the same event.
Either you have a bug in your code, or it is another event (e.g. next open of the same file after it was closed in the same or other process.)
Still there are some traps you must be aware of:
Any I/O which is performed by kernel itself (including other kexts) does not trigger the kauth callbacks at all.
If there are multiple callbacks for the vnode context (e.g. from multiple Kexts), kernel then calls them one by one for every event. However as soon as some of them returns KAUTH_RESULT_ALLOW or KAUTH_RESULT_DENY, it finally decides and the rest of the callbacks is not called. I.e. all callbacks are called only if all of them but the last return KAUTH_RESULT_DEFER. (AFAIK, for fileop this is not true, because in this case the return value is completely ignored.)

File Unlocking and Deleting as single operation

Please note this is not duplicate of File r/w locking and unlink. (The difference - platform. Operations of files like locking and deletion have totally different semantics, thus the sultion would be different).
I have following problem. I want to create a file system based session storage where each session data is stored in simple file named with session ids.
I want following API: write(sid,data,timeout), read(sid,data,timeout), remove(sid)
where sid==file name, Also I want to have some kind of GC that may remove all timed-out sessions.
Quite simple task if you work with single process but absolutly not trivial when working with multiple processes or even over shared folders.
The simplest solution I thought about was:
write/read:
hanlde=CreateFile
LockFile(handle)
read/write data
UnlockFile(handle)
CloseHanlde(handle)
GC (for each file in directory)
hanlde=CreateFile
LockFile(handle)
check if timeout occured
DeleteFile
UnlockFile(handle)
CloseHanlde(handle)
But AFIAK I can't call DeleteFile on opended locked file (unlike in Unix where file locking is
not mandatory and unlink is allowed for opened files.
But if I put DeleteFile outside of Locking loop bad scenario may happen
GC - CreateFile/LockFile/Unlock/CloseHandle,
write - oCreateFile/LockFile/WriteUpdatedData/Unlock/CloseHandle
GC - DeleteFile
Does anybody have an idea how such issue may be solved? Are there any tricks that allow
combine file locking and file removal or make operation on file atomic (Win32)?
Notes:
I don't want to use Database,
I look for a solution for Win32 API for NT 5.01 and above
Thanks.

I don't really understand how this is supposed to work. However, deleting a file that's opened by another process is possible. The process that creates the file has to use the FILE_SHARE_DELETE flag for the dwShareMode argument of CreateFile(). A subsequent DeleteFile() call will succeed. The file doesn't actually get removed from the file system until the last handle on it is closed.

You currently have data in the record that allows the GC to determine if the record is timed out. How about extending that housekeeping info with a "TooLateWeAlreadyTimedItOut" flag.
GC sets TooLateWeAlreadyTimedItOut = true
Release lock
<== writer comes in here, sees the "TooLate" flag and so does not write
GC deletes
In other words we're using a kind of optimistic locking approach. This does require some additional complexity in the Writer, but now you're not dependent upon any OS-specifc wrinkles.
I'm not clear what happens in the case:
GC checks timeout
GC deletes
Writer attempts write, and finds no file ...
Whatever you have planned for this case can also be used in the "TooLate" case
Edited to add:
You have said that it's valid for this sequence to occur:
GC Deletes
(Very slightly later) Writer attempts a write, sees no file, creates a new one
The writer can treat "tooLate" flag as a identical to this case. It just creates a new file, with a different name, use a version number as a trailing part of it's name. Opening a session file the first time requires a directory search, but then you can stash the latest name in the session.
This does assume that there can only be one Writer thread for a given session, or that we can mediate between two Writer threads creating the file, but that must be true for your simple GC/Writer case to work.

For Windows, you can use the FILE_FLAG_DELETE_ON_CLOSE option to CreateFile - that will cause the file to be deleted when you close the handle. But I'm not sure that this satisfies your semantics (because I don't believe you can clear the delete-on-close attribute.
Here's another thought. What about renaming the file before you delete it? You simply can't close the window where the write comes in after you decided to delete the file but what if you rename the file before deleting it? Then when the write comes in it'll see that the session file doesn't exist and recreate it.
The key thing to keep in mind is that you simply can't close the window in question. IMHO there are two solutions:
Adding a flag like djna mentioned or
Require that a per-session named mutex be acquired which has the unfortunate side effect of serializing writes on the session.
What is the downside of having a TooLate flag? In other words, what goes wrong if you delete the file prematurely? After all your system has to deal with the file not being present...

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio