Can "\Device\NamedPipe\\Win32Pipes" handles cause "Too many open files" error? - windows

Continuing from this question:
When I try to fopen a file on Windows, I get a "Too many open files" error. I tried to analyze how many open files I have, and it does not seem like too many.
But when I ran Process Explorer, I noticed that I have many open handles with similar names: "\Device\NamedPipe\Win32Pipes.00000590.000000e2", "\Device\NamedPipe\Win32Pipes.00000590.000000e3", etc. I see that the number of these handles is exactly equal to the number of iterations my program executed before it returned "Too many open files" and stopped.
I am looking for an answer: what are these handles, and could they actually cause the "Too many open files" error?
In my program I am loading files from a remote drive and creating TCP/IP connections. Could one of these operations create these handles?

Are you remembering to fclose() each file every time through the loop? (See the fclose() call in the snippet below.)
If not, you are leaking open file handles.
for (i = 0; i < lotsOfIterations; i++)
{
    FILE *fp;
    fp = fopen(filename[i], "r");
    if (fp != NULL)
    {
        /* ... do work, etc. ... */
        fclose(fp); /* finished with this file handle (add this line!) */
    }
}
However, if your intent is to have a lot of file handles open at once, then the other thing to be aware of is that the C runtime typically limits the number of file handles you can have open at any one time (for example, the Microsoft CRT allows 512 simultaneously open stdio streams by default, a limit that can be raised with _setmaxstdio). This number is typically much lower than what the operating system can provide. To use the OS-provided file handles you will need to use the Win32/Win64 API functions:
CreateFile
ReadFile
WriteFile
GetFileSize
CloseHandle
OS-provided file handles are of type HANDLE, not FILE *.
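For illustration, here is a minimal sketch of reading a file through those Win32 calls (the file name is just an example, and error handling is abbreviated):

#include <windows.h>

int main(void)
{
    HANDLE h = CreateFileA("example.txt", GENERIC_READ, FILE_SHARE_READ,
                           NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if (h == INVALID_HANDLE_VALUE)
        return 1;

    char buf[4096];
    DWORD bytesRead = 0;
    while (ReadFile(h, buf, sizeof(buf), &bytesRead, NULL) && bytesRead > 0)
    {
        /* ... process 'bytesRead' bytes of buf ... */
    }

    CloseHandle(h); /* HANDLEs are closed with CloseHandle, not fclose */
    return 0;
}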

Related

Windows (ReFS,NTFS) file preallocation hint

Assume I have multiple processes writing large files (20 GB+). Each process is writing its own file, and assume that a process writes x MB at a time, then does some processing, then writes x MB again, etc.
What happens is that this write pattern causes the files to be heavily fragmented, since blocks from different files get allocated next to each other on the disk as the writes arrive, so each file's own blocks end up scattered.
Of course it is easy to work around this issue by using SetEndOfFile to "preallocate" the file when it is opened and then setting the correct size before it is closed. But an application accessing these files remotely, which is able to parse these in-progress files, then sees zeros at the end of the file and takes much longer to parse it.
I do not have control over this reading application, so I can't optimize it to take the zeros at the end into account.
Another dirty fix would be to run defragmentation more often, run Sysinternals' contig utility, or even implement a custom "defragmenter" which would process my files and consolidate their blocks together.
A more drastic solution would be to implement a minifilter driver which would report a "fake" file size.
But obviously both solutions listed above are far from optimal. So I would like to know if there is a way to give the filesystem a file size hint so it "reserves" consecutive space on the drive, but still reports the right file size to applications.
Otherwise, writing larger chunks at a time obviously helps with fragmentation, but does not solve the issue.
EDIT:
Since the usefulness of SetEndOfFile in my case seems to be disputed, I made a small test:
#include <windows.h>
#include <cstdio>
#include <iostream>

int main()
{
    LARGE_INTEGER size;
    LARGE_INTEGER a;
    char buf = 'A';
    DWORD written = 0;
    DWORD tstart;

    std::cout << "creating file\n";
    tstart = GetTickCount();
    HANDLE f = CreateFileA("e:\\test.dat", GENERIC_ALL, FILE_SHARE_READ,
                           NULL, CREATE_ALWAYS, 0, NULL);
    size.QuadPart = 100000000LL;
    SetFilePointerEx(f, size, &a, FILE_BEGIN);
    SetEndOfFile(f);
    printf("file extended, elapsed: %d\n", GetTickCount() - tstart);
    getchar();

    printf("writing 'A' at the end\n");
    tstart = GetTickCount();
    SetFilePointer(f, -1, NULL, FILE_END);
    WriteFile(f, &buf, 1, &written, NULL);
    printf("written: %d bytes, elapsed: %d\n", written, GetTickCount() - tstart);

    CloseHandle(f);
    return 0;
}
While the application was waiting for a keypress after SetEndOfFile, I examined the on-disk NTFS structures:
NTFS had indeed allocated clusters for my file. However, the unnamed DATA attribute had a StreamDataSize of 0.
Sysinternals DiskView also confirmed that clusters were allocated.
After pressing Enter to let the test continue (and waiting for quite some time, since the file was created on a slow USB stick), the StreamDataSize field was updated.
Since I wrote 1 byte at the end, NTFS now really had to zero everything, so SetEndOfFile does indeed help with the issue that I am "fretting" about.
I would appreciate it very much if answers/comments also provided an official reference to back up the claims being made.
Oh and the test application outputs this in my case:
creating file
file extended, elapsed: 0
writing 'A' at the end
written: 1 bytes, elapsed: 21735
Also, for the sake of completeness, here is an example of how the DATA attribute looks when setting FileAllocationInfo (note that I created a new file for this picture).
Windows file systems maintain two public sizes for file data, which are reported in the FileStandardInformation:
AllocationSize - a file's allocation size in bytes, which is typically a multiple of the sector or cluster size.
EndOfFile - a file's absolute end of file position as a byte offset from the start of the file, which must be less than or equal to the allocation size.
Setting an end of file that exceeds the current allocation size implicitly extends the allocation. Setting an allocation size that's less than the current end of file implicitly truncates the end of file.
Starting with Windows Vista, we can manually extend the allocation size without modifying the end of file via SetFileInformationByHandle: FileAllocationInfo. You can use Sysinternals DiskView to verify that this allocates clusters for the file. When the file is closed, the allocation gets truncated to the current end of file.
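As a minimal sketch of that call (the path and the 1 GiB size are illustrative, and error handling is abbreviated):

#include <windows.h>

int main(void)
{
    HANDLE h = CreateFileW(L"C:\\test\\big.dat", GENERIC_WRITE, 0, NULL,
                           CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
    if (h == INVALID_HANDLE_VALUE)
        return 1;

    /* reserve ~1 GiB of clusters without changing EndOfFile */
    FILE_ALLOCATION_INFO info;
    info.AllocationSize.QuadPart = 1LL << 30;
    SetFileInformationByHandle(h, FileAllocationInfo, &info, sizeof(info));

    /* ... write the file incrementally; readers still see the true size ... */

    CloseHandle(h); /* any unused allocation is truncated back to EOF on close */
    return 0;
}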
If you don't mind using the NT API directly, you can also call NtSetInformationFile: FileAllocationInformation. Or even set the allocation size at creation via NtCreateFile.
FYI, there's also an internal ValidDataLength size, which must be less than or equal to the end of file. As a file grows, the clusters on disk are lazily initialized. Reading beyond the valid region returns zeros. Writing beyond the valid region extends it by initializing all clusters up to the write offset with zeros. This is typically where we might observe a performance cost when extending a file with random writes. We can set the FileValidDataLengthInformation to get around this (e.g. SetFileValidData), but it exposes uninitialized disk data and thus requires SeManageVolumePrivilege. An application that utilizes this feature should take care to open the file exclusively and ensure the file is secure in case the application or system crashes.
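And a rough sketch of the valid-data-length approach, assuming the account actually holds SeManageVolumePrivilege (e.g. an administrator) and with error handling abbreviated:

#include <windows.h>

/* Enable SeManageVolumePrivilege for the current process. */
static BOOL EnableManageVolumePrivilege(void)
{
    HANDLE tok;
    TOKEN_PRIVILEGES tp;

    if (!OpenProcessToken(GetCurrentProcess(), TOKEN_ADJUST_PRIVILEGES, &tok))
        return FALSE;
    LookupPrivilegeValueW(NULL, L"SeManageVolumePrivilege", &tp.Privileges[0].Luid);
    tp.PrivilegeCount = 1;
    tp.Privileges[0].Attributes = SE_PRIVILEGE_ENABLED;
    AdjustTokenPrivileges(tok, FALSE, &tp, 0, NULL, NULL);
    CloseHandle(tok);
    return GetLastError() == ERROR_SUCCESS;
}

int main(void)
{
    HANDLE h = CreateFileW(L"C:\\test\\big.dat", GENERIC_WRITE, 0, NULL,
                           CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
    LARGE_INTEGER size;
    size.QuadPart = 1LL << 30;

    SetFilePointerEx(h, size, NULL, FILE_BEGIN);
    SetEndOfFile(h);                        /* extend EOF first */
    if (EnableManageVolumePrivilege())
        SetFileValidData(h, size.QuadPart); /* skip the zero-fill; exposes stale disk data */
    CloseHandle(h);
    return 0;
}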

Is writing to a Unix file through a shell script synchronized?

I have a requirement where many threads will call the same shell script to perform some work, and then write their output (each a single line of text) to a common text file.
Since many threads will try to write data to the same file, my question is whether Unix provides a default locking mechanism so that they cannot all write at the same time.
Performing a short single write to a file opened for append is mostly atomic; you can get away with it most of the time (depending on your filesystem), but if you want to be guaranteed that your writes won't interrupt each other, or to write arbitrarily long strings, or to be able to perform multiple writes, or to perform a block of writes and be assured that their contents will be next to each other in the resulting file, then you'll want to lock.
While not part of POSIX (unlike the C library call for which it's named), the flock tool provides the ability to perform advisory locking ("advisory" -- as opposed to "mandatory" -- meaning that other potential writers need to voluntarily participate):
(
    flock -x 99 || exit   # lock the file descriptor
    echo "content" >&99   # write content to that locked FD
) 99>>/path/to/shared-file
The use of file descriptor #99 is completely arbitrary -- any unused FD number can be chosen. Similarly, one can safely put the lock on a different file than the one to which content is written while the lock is held.
The advantage of this approach over several conventional mechanisms (such as using exclusive creation of a file or directory) is automatic unlock: If the subshell holding the file descriptor on which the lock is held exits for any reason, including a power failure or unexpected reboot, the lock will be automatically released.
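For reference, here is the same pattern in C, as a minimal sketch around the flock(2) call after which the tool is named (the helper name and the 0644 mode are illustrative):

#include <fcntl.h>    /* open, O_WRONLY, O_APPEND, O_CREAT */
#include <string.h>   /* strlen */
#include <sys/file.h> /* flock */
#include <unistd.h>   /* write, close */

/* Append one line under an exclusive advisory lock. */
int append_locked(const char *path, const char *line)
{
    int fd = open(path, O_WRONLY | O_APPEND | O_CREAT, 0644);
    if (fd < 0)
        return -1;
    if (flock(fd, LOCK_EX) != 0) { /* blocks until the lock is granted */
        close(fd);
        return -1;
    }
    ssize_t n = write(fd, line, strlen(line));
    close(fd);                     /* closing the descriptor releases the lock */
    return (n < 0) ? -1 : 0;
}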
my question is whether unix provides a default locking mechanism so that all can not write at the same time.
In general, no. At least not something that's guaranteed to work. But there are other ways to solve your problem, such as lockfile, if you have it available:
Examples
Suppose you want to make sure that access to the file "important" is serialised, i.e., no more than one program or shell script should be allowed to access it. For simplicity's sake, let's suppose that it is a shell script. In this case you could solve it like this:
...
lockfile important.lock
...
access_"important"_to_your_hearts_content
...
rm -f important.lock
...
Now if all the scripts that access "important" follow this guideline, you will be assured that at most one script will be executing between the 'lockfile' and the 'rm' commands.
But there's actually a better way, if you can use C or C++: use the low-level open call to open the file in append mode, and call write() to write your data, with no locking necessary. Per the write() man page:
If the O_APPEND flag of the file status flags is set, the file offset
shall be set to the end of the file prior to each write and no
intervening file modification operation shall occur between changing
the file offset and the write operation.
Like this:
#include <fcntl.h>   /* open, O_WRONLY, O_APPEND */
#include <string.h>  /* strlen */
#include <unistd.h>  /* write */

// process-wide global file descriptor
int outputFD = open(fileName, O_WRONLY | O_APPEND, 0600);
.
.
.
// write a string to the file
ssize_t writeToFile(const char *data)
{
    return write(outputFD, data, strlen(data));
}
In practice, you can write anything to the file - it doesn't have to be a NUL-terminated character string.
That's supposed to be atomic for writes up to PIPE_BUF bytes, which is usually something like 512, 4096, or 5120 bytes. Some Linux filesystems apparently don't implement that properly, so you may in practice be limited to about 1 KiB on those filesystems.

LockFileEx returns success, but seems to have no effect

I'm trying to lock a file, because it is sitting on a network drive, and multiple instances of a program from multiple computers need to edit it. To prevent damage, I intend to set it up so that only one of the instances has rights to it at a time.
I implemented a lock which would theoretically lock the first 100 bytes of the file against any access. I'm using Qt with its own file handling, but it has a method of returning a generic file handle.
QFile file(path);
HANDLE handle = (HANDLE)_get_osfhandle(file.handle());

OVERLAPPED ov1;
memset(&ov1, 0, sizeof(ov1));
ov1.Offset = 0;
ov1.OffsetHigh = 0;

if (handle == INVALID_HANDLE_VALUE)
{
    // error
    return;
}

LockFileEx(handle, LOCKFILE_FAIL_IMMEDIATELY | LOCKFILE_EXCLUSIVE_LOCK, 0, 100, 0, &ov1);
qDebug() << file.readLine();
LockFileEx() returns 1, so it seems to have succeeded. However, if I run the program in multiple instances, all of them can read and print the first line of the file. More than that, I can freely edit the file with any text editor.
The file being on a network drive is not the issue, as it behaves the same way with a local file.
The problem was that, although the program does not terminate, the QFile variable was local, so when the function returned, the destructor of the QFile was called and it released the file. The OS then appears to have released the lock.
If my QFile outlives the scope, everything works just fine. A minor surprise is that, while I expected the file to be locked against reading, external programs do have read-only access to it. That is not a problem, since my program can check whether it can create a lock and detect failure to do so. This means the intended mutex functionality works.
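As a sketch of the working pattern in plain Win32 (the wrapper name is hypothetical): keep the handle and the lock alive together, and release them together.

#include <windows.h>

// Hypothetical wrapper: the lock lives exactly as long as this object.
struct FileRegionLock
{
    HANDLE h = INVALID_HANDLE_VALUE;
    bool locked = false;

    bool acquire(const wchar_t *path)
    {
        h = CreateFileW(path, GENERIC_READ | GENERIC_WRITE,
                        FILE_SHARE_READ | FILE_SHARE_WRITE, NULL,
                        OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
        if (h == INVALID_HANDLE_VALUE)
            return false;
        OVERLAPPED ov = {};
        locked = LockFileEx(h, LOCKFILE_FAIL_IMMEDIATELY | LOCKFILE_EXCLUSIVE_LOCK,
                            0, 100, 0, &ov) != 0;
        return locked;
    }

    ~FileRegionLock()
    {
        OVERLAPPED ov = {};
        if (locked)
            UnlockFileEx(h, 0, 100, 0, &ov);
        if (h != INVALID_HANDLE_VALUE)
            CloseHandle(h); // closing the handle would drop the lock anyway
    }
};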

file_operations question: how do I know if a process that opened a file for writing has decided to close it?

I'm currently writing a simple "multicaster" module.
Only one process can open a proc filesystem file for writing, and the rest can open it for reading.
To do so I use the inode_operations .permission callback; I check the operation, and when I detect someone opening the file for writing I set a flag ON.
I need a way to detect when the process that opened the file for writing has closed it, so I can set the flag OFF and someone else can open it for writing.
Currently, when someone opens the file for writing, I save the current->pid of that process, and when the .close callback is called I check whether that process is the one I saved earlier.
Is there a better way to do that, without saving the pid? Perhaps checking the files that the current process has open and their permissions...
Thanks!
No, it's not safe. Consider a few scenarios:
Process A opens the file for writing, and then fork()s, creating process B. Now both A and B have the file open for writing. When Process A closes it, you set the flag to 0 but process B still has it open for writing.
Process A has multiple threads. Thread X opens the file for writing, but Thread Y closes it. Now the flag is stuck at 1. (Remember that ->pid in kernel space is actually the userspace thread ID).
Rather than doing things at the inode level, you should be doing things in the .open and .release methods of your file_operations struct.
Your inode's private data should contain a struct file *current_writer;, initialised to NULL. In the file_operations.open method, if it's being opened for write then check the current_writer; if it's NULL, set it to the struct file * being opened, otherwise fail the open with EPERM. In the file_operations.release method, check if the struct file * being released is equal to the inode's current_writer - if so, set current_writer back to NULL.
PS: Bandan is also correct that you need locking, but using the inode's existing i_mutex should suffice to protect the current_writer.
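A minimal sketch of that approach (assuming an older kernel where the inode lock is still named i_mutex; newer kernels use i_rwsem. Keeping the state in i_private is also this sketch's assumption):

#include <linux/fs.h>
#include <linux/mutex.h>

struct mc_state {
    struct file *current_writer; /* protected by inode->i_mutex */
};

static int mc_open(struct inode *inode, struct file *filp)
{
    struct mc_state *st = inode->i_private;
    int ret = 0;

    if (!(filp->f_mode & FMODE_WRITE))
        return 0; /* readers are always admitted */

    mutex_lock(&inode->i_mutex);
    if (st->current_writer)
        ret = -EPERM;           /* someone already holds write access */
    else
        st->current_writer = filp;
    mutex_unlock(&inode->i_mutex);
    return ret;
}

static int mc_release(struct inode *inode, struct file *filp)
{
    struct mc_state *st = inode->i_private;

    mutex_lock(&inode->i_mutex);
    if (st->current_writer == filp) /* compare struct file *, not pids */
        st->current_writer = NULL;
    mutex_unlock(&inode->i_mutex);
    return 0;
}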
I hope I understood your question correctly: when someone wants to write to your proc file, you set a variable called flag to 1 and also save the current->pid in a global variable. Then, when any close() entry point is called, you check the current->pid of the close() instance and compare it with your saved value. If it matches, you turn flag off. Right?
Consider this situation: process A wants to write to your proc resource, so you check the permission callback. You see that flag is 0, so you can set it to 1 for process A. But at that moment, the scheduler finds that process A has used up its time share and chooses a different process to run (flag is still 0!). After some time, process B comes along wanting to write to your proc resource as well, checks that the flag is 0, sets it to 1, and goes about writing to the file. Unfortunately, at this moment process A gets scheduled to run again, and since it thinks that flag is 0 (remember, before the scheduler preempted it, flag was 0), it sets the flag to 1 and goes about writing to the file too. End result: the data in your proc resource gets corrupted.
You should use a proper locking mechanism provided by the kernel for this type of operation. Based on your requirements, I think RCU is the best fit: have a look at the RCU locking mechanism.

Change default Console I/O functions handle

Is it possible to somehow change the handle that the standard I/O functions use on Windows? The preferred language is C++. If I understand it right, by selecting a console project the compiler just pre-allocates a console for you and points all standard I/O functions at its handle. So what I want to do is let one console app actually write into another app's console buffer. I thought I could get the first app's console handle, pass it to the second app through a file (I don't know much about interprocess communication, and this seems easy), and then somehow use, for example, printf with the first app's handle. Can this be done? I know how to get the console handle, but I have no idea how to redirect printf to that handle. It's just a study-purpose project to understand more about how the OS works behind this. I am interested in how printf knows which console it is associated with.
If I understand you correctly, it sounds like you want the Windows API function AttachConsole(pid), which attaches the current process to the console owned by the process whose PID is pid.
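A minimal sketch of that approach (the PID argument handling is illustrative):

#include <windows.h>
#include <cstdio>
#include <cstdlib>

int main(int argc, char **argv)
{
    DWORD pid = (argc > 1) ? strtoul(argv[1], NULL, 10) : 0;

    FreeConsole();            // detach from our own console first
    if (!AttachConsole(pid))  // attach to the console owned by 'pid'
        return 1;

    // re-bind the CRT's stdout to the newly attached console
    freopen("CONOUT$", "w", stdout);
    printf("hello from another process's console\n");
    return 0;
}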
If I understand you correctly, you can find the source code of the application you want to write at http://msdn.microsoft.com/en-us/library/ms682499%28VS.85%29.aspx. This example shows how to write to the stdin of another application and read its stdout.
For general understanding: the compiler doesn't "pre-allocate a console for you". The compiler uses the standard C/C++ libraries, which write to the standard output. So if you use, for example, printf(), the code that gets executed in the end will look like this:
void Output(PCWSTR pszwText, UINT uTextLength) // uTextLength is the length in characters
{
    DWORD n;
    UINT uCodePage = GetOEMCP(); // CP_OEMCP, CP_THREAD_ACP, CP_ACP
    PSTR pszText = (PSTR)_alloca(uTextLength);

    // the console typically does not use UNICODE, so convert first
    if (WideCharToMultiByte(uCodePage, 0, pszwText, uTextLength,
                            pszText, uTextLength, NULL, NULL) != (int)uTextLength)
        return;

    WriteFile(GetStdHandle(STD_OUTPUT_HANDLE), pszText, uTextLength, &n, NULL);
    //_tprintf(TEXT("%.*ls"), uTextLength, pszText);
    //_puttchar();
    //fwrite(pszText, sizeof(TCHAR), uTextLength, stdout);
    //_write(
}
So if one changes the value of STD_OUTPUT_HANDLE, all output will go to a file, a pipe, and so on. If the program used the WriteConsole function instead of WriteFile, such redirection would not work, but the standard C/C++ library doesn't do that.
If you want to redirect stdout not in a child process but in the current process, you can call SetStdHandle() directly (see http://msdn.microsoft.com/en-us/library/ms686244%28VS.85%29.aspx).
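For example, a minimal sketch of redirecting the process's standard output handle to a file (the file name is illustrative; note this affects code that calls GetStdHandle afterwards, not a stdout FILE* the CRT has already set up):

#include <windows.h>

int main(void)
{
    HANDLE h = CreateFileA("output.log", GENERIC_WRITE, FILE_SHARE_READ, NULL,
                           CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
    if (h == INVALID_HANDLE_VALUE)
        return 1;

    SetStdHandle(STD_OUTPUT_HANDLE, h);

    // anything written via GetStdHandle(STD_OUTPUT_HANDLE) now lands in the file
    DWORD n;
    WriteFile(GetStdHandle(STD_OUTPUT_HANDLE), "hello\r\n", 7, &n, NULL);
    CloseHandle(h);
    return 0;
}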
The "allocating of console" do a loader of operation system. It looks the word of binary EXE file (in the Subsystem part of IMAGE_OPTIONAL_HEADER see http://msdn.microsoft.com/en-us/library/ms680339%28VS.85%29.aspx) and if the EXE has 3 on this place (IMAGE_SUBSYSTEM_WINDOWS_CUI), than it use console of the parent process or create a new one. One can change a little this behavior in parameters of CreateProcess call (but only if you start child process in your code). This Subsystem flag of the EXE you define with respect of linker switch /subsystem (see http://msdn.microsoft.com/en-us/library/fcc1zstk%28VS.80%29.aspx).
If you want to redirect printf to a handle (FILE*), just do
fprintf(handle, "...");
For example, replicating printf with fprintf:
fprintf(stdout, "...");
Or for error reporting:
fprintf(stderr, "FATAL: %s fails", "smurf");
This is also how you write to files. fprintf(file, "Blah.");
