Writing last N bytes to file opened with FILE_FLAG_NO_BUFFERING - winapi

When writing lots of sequential data to disk, I found it useful to keep an internal 4 MB buffer, and when opening the file for writing I specify FILE_FLAG_NO_BUFFERING so that my internal buffer is used instead of the system cache.
But that also creates a requirement to write in full sector blocks (512 bytes on my machine).
How do I write the last N<512 bytes to disk?
Is there some flag to WriteFile to allow this?
Do I pad them with extra NUL characters and then truncate the file size down to the correct value?
(With SetFileValidData or similar?)
For those wondering why I am trying this approach: our application logs a lot. A dedicated log thread formats the log entries and writes them to disk. At the highest level of detail we may log more per second than the disk system can handle (usually noticed at customers whose SAN systems are not well tuned).
So the goal is to write as much of the log as possible, but also to notice when we start to overload the system and then hold back a bit, for example by reducing the level of detail.
Hence the idea of filling a big memory block and handing it to the OS in one go, hoping to reduce the overhead.
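For context, here is a minimal sketch of the setup described above (the file name, buffer size and helper name are illustrative, not from the question). With FILE_FLAG_NO_BUFFERING the write length and file offset must be multiples of the sector size, and the buffer address should be sector-aligned as well; VirtualAlloc returns page-aligned memory, which satisfies the alignment requirement.

#include <windows.h>

// Open an unbuffered log file and allocate a large, page-aligned staging buffer.
bool OpenUnbufferedLog(HANDLE& hFile, void*& buffer)
{
    const DWORD kBufferSize = 4 * 1024 * 1024;      // 4 MB staging buffer

    hFile = ::CreateFileW(L"app.log", GENERIC_WRITE, FILE_SHARE_READ, NULL,
                          CREATE_ALWAYS, FILE_FLAG_NO_BUFFERING, NULL);
    if (hFile == INVALID_HANDLE_VALUE)
        return false;

    buffer = ::VirtualAlloc(NULL, kBufferSize, MEM_RESERVE | MEM_COMMIT,
                            PAGE_READWRITE);
    if (buffer == NULL)
    {
        ::CloseHandle(hFile);
        hFile = INVALID_HANDLE_VALUE;
        return false;
    }
    // Fill 'buffer' with formatted log text and hand whole sector-multiple
    // blocks of it to WriteFile; only the final partial block needs the
    // padding-and-truncate trick discussed in the answer below.
    return true;
}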

As the comments suggest, writing files this way is probably not the best solution for real-world situations. But if FILE_FLAG_NO_BUFFERING is used,
SetFileInformationByHandle is the way to mark the file as shorter than a whole number of blocks.
size_t data_len = str.size();
size_t len_last_block = data_len % BLOCKSIZE;
size_t padding_to_fill_block = (len_last_block == 0) ? 0 : (BLOCKSIZE - len_last_block);
str.append(padding_to_fill_block, '\0');   // pad up to a whole number of blocks
DWORD bytes_written = 0;
::WriteFile(hFile, str.data(), static_cast<DWORD>(data_len + padding_to_fill_block), &bytes_written, NULL);
m_filesize += bytes_written;

// Trim the padding off again so the file reports its real length.
FILE_END_OF_FILE_INFO end_of_file;
end_of_file.EndOfFile.QuadPart = m_filesize - padding_to_fill_block;
if (!::SetFileInformationByHandle(hFile, FileEndOfFileInfo, &end_of_file, sizeof(end_of_file)))
{
    DWORD err = ::GetLastError();
    // handle/log the error
}

Related

Using SetFilePointer to change the location to write in the sector doesn't work?

I'm using SetFilePointer to rewrite the second half of the MBR. It's a user-mode application, and I opened a handle to PhysicalDrive.
At first I tried to set the size parameter in WriteFile to 256, but WriteFile failed with INVALID_PARAMETER. Based on other questions here, it seems this is because writes must be a multiple of the sector size when the handle refers to PhysicalDrive.
Then I tried to set the file pointer to 256 and write 512 bytes. Both calls return no error, but for some unknown reason the data is written from the beginning of the sector, as if SetFilePointer had no effect, even though its return value is OK and it returns 256.
So my questions are:
Why does the write size have to be a multiple of the sector size when the handle is PhysicalDrive? Which other device handles behave like this?
Why, when I set the file pointer to 256, does WriteFile still write from the start of the sector?
Isn't this really redundant? Even if I want to change a single byte, I have to read the entire sector, change that byte, and write the whole sector back instead of just writing one byte; that seems like ten times the overhead. Isn't there a faster way to write a few bytes within a sector?
I think you are mixing up the file system and the storage (block device). The file system sits above the storage device stack. If your code obtains a handle to a file system device, you can write byte by byte. But if you are accessing the storage device stack, you can only write sector by sector (or in multiples of the block size).
Directly writing to a block device is definitely slow, as you discovered. However, in most cases people just talk to file systems. Most file system drivers maintain a cache and use read and write algorithms to improve performance.
I can't comment on the file-pointer-based offset before seeing the actual code, but I would guess it is either not sector aligned or not used at all.
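To make the read-modify-write cycle concrete, here is a hedged sketch of what patching part of the MBR has to look like on a raw disk handle (the function and parameter names are made up; it assumes a 512-byte sector, requires administrator rights, and newer Windows versions may refuse writes to disk regions that belong to a mounted volume):

#include <windows.h>
#include <string.h>

// Replace bytes 256..511 of sector 0 by rewriting the whole sector.
bool PatchSecondHalfOfMbr(const BYTE* newHalf /* 256 bytes, hypothetical input */)
{
    HANDLE hDisk = ::CreateFileW(L"\\\\.\\PhysicalDrive0",
                                 GENERIC_READ | GENERIC_WRITE,
                                 FILE_SHARE_READ | FILE_SHARE_WRITE,
                                 NULL, OPEN_EXISTING, 0, NULL);
    if (hDisk == INVALID_HANDLE_VALUE)
        return false;

    BYTE sector[512];
    DWORD transferred = 0;
    bool ok = false;

    // Read the whole first sector (offset 0, length = one sector).
    if (::ReadFile(hDisk, sector, sizeof(sector), &transferred, NULL) &&
        transferred == sizeof(sector))
    {
        // Modify only the part we care about, in memory.
        memcpy(sector + 256, newHalf, 256);

        // Seek back to the start of the sector and write the full sector.
        LARGE_INTEGER zero = {};
        if (::SetFilePointerEx(hDisk, zero, NULL, FILE_BEGIN) &&
            ::WriteFile(hDisk, sector, sizeof(sector), &transferred, NULL) &&
            transferred == sizeof(sector))
        {
            ok = true;
        }
    }
    ::CloseHandle(hDisk);
    return ok;
}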

Windows (ReFS,NTFS) file preallocation hint

Assume I have multiple processes writing large files (20 GB+). Each process is writing its own file, and assume that the process writes x MB at a time, then does some processing and writes x MB again, and so on.
What happens is that this write pattern causes the files to become heavily fragmented, since the blocks of the different files get allocated next to one another on the disk.
Of course it is easy to work around this issue by using SetEndOfFile to "preallocate" the file when it is opened and then set the correct size before it is closed. But an application accessing these files remotely, which is able to parse these in-progress files, then sees zeros at the end of the file and takes much longer to parse it.
I do not have control over this reading application, so I can't make it account for the zeros at the end.
Another dirty fix would be to run defragmentation more often, run Sysinternals' contig utility, or even implement a custom "defragmenter" which would process my files and consolidate their blocks.
A more drastic solution would be to implement a minifilter driver which would report a "fake" file size.
Both of these are obviously far from optimal. So I would like to know whether there is a way to give the file system a file-size hint so that it "reserves" consecutive space on the drive, but still reports the real file size to applications.
Writing larger chunks at a time also helps with fragmentation, of course, but it does not solve the issue.
EDIT:
Since the usefulness of SetEndOfFile in my case seems to be disputed I made a small test:
LARGE_INTEGER size;
LARGE_INTEGER a;
char buf='A';
DWORD written=0;
DWORD tstart;
std::cout << "creating file\n";
tstart = GetTickCount();
HANDLE f = CreateFileA("e:\\test.dat", GENERIC_ALL, FILE_SHARE_READ, NULL, CREATE_ALWAYS, 0, NULL);
size.QuadPart = 100000000LL;
SetFilePointerEx(f, size, &a, FILE_BEGIN);
SetEndOfFile(f);
printf("file extended, elapsed: %d\n",GetTickCount()-tstart);
getchar();
printf("writing 'A' at the end\n");
tstart = GetTickCount();
SetFilePointer(f, -1, NULL, FILE_END);
WriteFile(f, &buf,1,&written,NULL);
printf("written: %d bytes, elapsed: %d\n",written,GetTickCount()-tstart);
While the application waits for a keypress after SetEndOfFile, I examined the on-disk NTFS structures:
The image shows that NTFS has indeed allocated clusters for my file. However, the unnamed DATA attribute has a StreamDataSize of 0.
Sysinternals DiskView also confirms that clusters were allocated.
After pressing Enter to let the test continue (and waiting quite some time, since the file was created on a slow USB stick), the StreamDataSize field was updated.
Since I wrote 1 byte at the end, NTFS now really had to zero everything, so SetEndOfFile does indeed help with the issue I am "fretting" about.
I would appreciate it very much that answers/comments also provide an official reference to back up the claims being made.
Oh and the test application outputs this in my case:
creating file
file extended, elapsed: 0
writing 'A' at the end
written: 1 bytes, elapsed: 21735
Also, for the sake of completeness, here is an example of how the DATA attribute looks when setting FileAllocationInfo (note that I created a new file for this picture).
Windows file systems maintain two public sizes for file data, which are reported in the FileStandardInformation:
AllocationSize - a file's allocation size in bytes, which is typically a multiple of the sector or cluster size.
EndOfFile - a file's absolute end of file position as a byte offset from the start of the file, which must be less than or equal to the allocation size.
Setting an end of file that exceeds the current allocation size implicitly extends the allocation. Setting an allocation size that's less than the current end of file implicitly truncates the end of file.
Starting with Windows Vista, we can manually extend the allocation size without modifying the end of file via SetFileInformationByHandle: FileAllocationInfo. You can use Sysinternals DiskView to verify that this allocates clusters for the file. When the file is closed, the allocation gets truncated to the current end of file.
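A minimal sketch of that call, assuming an already-open writable handle (the helper name is made up):

#include <windows.h>

// Reserve clusters for the file without changing its reported size (EndOfFile).
// The extra allocation is released again when the handle is closed.
bool PreallocateHint(HANDLE hFile, LONGLONG bytes)
{
    FILE_ALLOCATION_INFO info = {};
    info.AllocationSize.QuadPart = bytes;
    return ::SetFileInformationByHandle(hFile, FileAllocationInfo,
                                        &info, sizeof(info)) != FALSE;
}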
If you don't mind using the NT API directly, you can also call NtSetInformationFile: FileAllocationInformation. Or even set the allocation size at creation via NtCreateFile.
FYI, there's also an internal ValidDataLength size, which must be less than or equal to the end of file. As a file grows, the clusters on disk are lazily initialized. Reading beyond the valid region returns zeros. Writing beyond the valid region extends it by initializing all clusters up to the write offset with zeros. This is typically where we might observe a performance cost when extending a file with random writes. We can set the FileValidDataLengthInformation to get around this (e.g. SetFileValidData), but it exposes uninitialized disk data and thus requires SeManageVolumePrivilege. An application that utilizes this feature should take care to open the file exclusively and ensure the file is secure in case the application or system crashes.
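And a hedged sketch of the SetFileValidData path (the helper name is made up): the process must hold and enable SeManageVolumePrivilege, and the file must already have been extended, e.g. with SetEndOfFile, to at least the requested length.

#include <windows.h>

bool ExtendValidData(HANDLE hFile, LONGLONG validLength)
{
    // Enable SeManageVolumePrivilege on the current process token.
    HANDLE hToken = NULL;
    if (!::OpenProcessToken(::GetCurrentProcess(), TOKEN_ADJUST_PRIVILEGES, &hToken))
        return false;

    TOKEN_PRIVILEGES tp = {};
    tp.PrivilegeCount = 1;
    tp.Privileges[0].Attributes = SE_PRIVILEGE_ENABLED;
    if (!::LookupPrivilegeValueW(NULL, L"SeManageVolumePrivilege",
                                 &tp.Privileges[0].Luid))
    {
        ::CloseHandle(hToken);
        return false;
    }
    ::AdjustTokenPrivileges(hToken, FALSE, &tp, 0, NULL, NULL);
    ::CloseHandle(hToken);

    // Exposes whatever is on disk in the newly "valid" region, so the caller
    // should keep the file opened exclusively and treat it as sensitive.
    return ::SetFileValidData(hFile, validLength) != FALSE;
}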

MiniFilter Driver - modify a file bytes on IRP_MJ_CLOSE and IRP_MJ_CREATE

I'd like to change a file when it is closed and reverse the change when it is opened.
It's kind of like an encryption driver, except I don't want to encrypt the file.
I've created a new "Filter Driver: Filesystem Mini-Filter" project with WDK8 in Visual Studio 2012 and registered PreCreate, PostCreate, PreClose and PostClose as callback functions.
For example, on IRP_MJ_CLOSE of a file whose bytes are {72,101,108,108,111} ("Hello"), I want the file to look like this on the hard disk after the PostClose function:
{10,11,12,72,101,108,108,111}.
I suspect it is not as easy as just:
FLT_PREOP_CALLBACK_STATUS
PreClose (
    _Inout_ PFLT_CALLBACK_DATA Data,
    _In_ PCFLT_RELATED_OBJECTS FltObjects,
    _Flt_CompletionContext_Outptr_ PVOID *CompletionContext
    )
{
    //...
    //some if statement...
    {
        Data->Iopb->Parameters.Write.WriteBuffer = newBfr;
        Data->Iopb->Parameters.Write.Length = newLen;
    }
    //...
    return FLT_PREOP_SUCCESS_WITH_CALLBACK;
}
I'd like some guidance on the subject.
Also, what is the best way to debug this? I haven't found a way to print to the Windows 7 debug output.
Thanks!
gfgqtmakia.
EDIT: I've read http://code.msdn.microsoft.com/windowshardware/swapBuffer-File-System-6b7e6e2d but I don't think it'll help me because it is for read/write, which I don't want to deal with.
EDIT2: Or maybe I should make my changes in PreCreate and PostClose, when the file is on the hard drive and not in the middle of an IRP, so that I won't need to deal with buffers "on the fly" but can work on the disk instead?
You will have to write something like the swap-buffers sample. Modifying file data in PostCreate/PreClose would not be a good idea.
A few reasons:
Firstly, in PostCreate/PreClose you shouldn't be accessing Data->Iopb->Parameters.Write.WriteBuffer. That is valid only in IRP_MJ_WRITE. You can use FltWriteFile to write data to a file.
The Windows kernel may not write file data to disk immediately in/after IRP_MJ_CLOSE. Think about the page cache.
There are many complexities like paging I/O, direct I/O etc. that need to be handled properly.
Another major thing I notice is that you will also change the file size (as stated in your question, the actual data length is 5 bytes while you will update it to 8 bytes). This is very difficult to manage. It is never recommended to change the file size in a minifilter/file system driver.
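For illustration only, here is a heavily simplified sketch of the buffer-swap idea in a PreWrite callback, loosely modeled on the SwapBuffers sample. It ignores MDL-described buffers, paging I/O, fast I/O, user-buffer probing and the PostWrite cleanup that a real filter must implement; kHeader and the pool tag are made-up names.

// Prepend a small header to every write by swapping in a larger buffer.
static const UCHAR kHeader[3] = { 10, 11, 12 };

FLT_PREOP_CALLBACK_STATUS
PreWrite (
    _Inout_ PFLT_CALLBACK_DATA Data,
    _In_ PCFLT_RELATED_OBJECTS FltObjects,
    _Flt_CompletionContext_Outptr_ PVOID *CompletionContext
    )
{
    ULONG oldLen = Data->Iopb->Parameters.Write.Length;
    ULONG newLen = oldLen + sizeof(kHeader);

    PUCHAR newBuf = (PUCHAR)FltAllocatePoolAlignedWithTag(FltObjects->Instance,
                                                          NonPagedPool,
                                                          newLen,
                                                          'wSbF');
    if (newBuf == NULL)
    {
        return FLT_PREOP_SUCCESS_NO_CALLBACK;   // let the write through unchanged
    }

    // Prepend the header, then copy the caller's data behind it.
    RtlCopyMemory(newBuf, kHeader, sizeof(kHeader));
    RtlCopyMemory(newBuf + sizeof(kHeader),
                  Data->Iopb->Parameters.Write.WriteBuffer,
                  oldLen);

    // Swap the buffer and length seen by the file system below us.
    Data->Iopb->Parameters.Write.WriteBuffer = newBuf;
    Data->Iopb->Parameters.Write.Length = newLen;
    FltSetCallbackDataDirty(Data);

    *CompletionContext = newBuf;                // free with FltFreePoolAlignedWithTag in PostWrite
    return FLT_PREOP_SUCCESS_WITH_CALLBACK;
}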

Coding for Input/Output: Speed or Memory priority?

I am currently writing a simple piece of IO parsing code, and I am in a dilemma as to how I should code it.
This is the case of a web application, where this particular parsing function may be called multiple times within a second by several users.
Assume that the file size is more than 2 MB and that hardware IO delays are 5 ms for each call.
First Case: Memory
The first case would be to code for memory, at the expense of speed. The function takes in small parts of the file and parses them part by part, using more iterations but less memory.
Pseudo-code:
function parser() {
Open file and put into handle variable fHandle
while (file position not past EOF) {
read 1024 bytes from file using fHandle into variable data
process(data)
}
Close file using handle fHandle
}
Second Case: Speed
The second case would be to code for speed, at the expense of memory usage. The function will load the entire file content into memory and parse it directly.
Pseudo-code:
function parser() {
read entire file and store into variable data
declare parsing position variable and set to 0
while (parsing position not past data length) {
get position of next token and store into variable pos
process( substring from current position to pos of data )
}
}
Note: when reading the entire file, we use the library's built-in function that reads the whole file in one call; no read loop is written on the developer's end.
Third Case: End-user choice
Would it then be advisable to write both, and have the function detect at run time whether memory is abundant? If there is a lot of free memory, the function uses the memory-intensive version.
Pseudo-code:
function parser() {
if (memory is too little) {
Open file and put into handle variable fHandle
while (file position not past EOF) {
read 1024 bytes from file using fHandle into variable data
process(data)
}
Close file using handle fHandle
} else {
read entire file and store into variable data
declare parsing position variable and set to 0
while (parsing position not past data length) {
get position of next token and store into variable pos
process( substring from current position to pos of data )
}
}
}
Use asynchronous I/O (or a second thread), and process one chunk of data while the drive's busy fetching the next chunk. Best of both worlds.
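As a rough sketch of that idea with Win32 overlapped I/O (the chunk size, file name and the process() callback are placeholders, and error handling is minimal): one chunk is parsed while the read for the next chunk is already in flight.

#include <windows.h>
#include <vector>

void process(const char* data, DWORD len);        // placeholder for the parsing step

// Issue an overlapped read at the given offset; returns false if nothing was
// queued (typically end of file).
static bool IssueRead(HANDLE f, char* buf, DWORD len, LONGLONG off, OVERLAPPED* ov)
{
    ov->Offset     = (DWORD)(off & 0xFFFFFFFF);
    ov->OffsetHigh = (DWORD)(off >> 32);
    if (::ReadFile(f, buf, len, NULL, ov))
        return true;                              // completed synchronously
    return ::GetLastError() == ERROR_IO_PENDING;  // queued; anything else is EOF/error
}

bool ParseFileOverlapped(const wchar_t* path)
{
    HANDLE f = ::CreateFileW(path, GENERIC_READ, FILE_SHARE_READ, NULL,
                             OPEN_EXISTING, FILE_FLAG_OVERLAPPED, NULL);
    if (f == INVALID_HANDLE_VALUE)
        return false;

    const DWORD kChunk = 1 << 20;                 // 1 MB per chunk
    std::vector<char> buf[2] = { std::vector<char>(kChunk), std::vector<char>(kChunk) };
    OVERLAPPED ov[2] = {};
    ov[0].hEvent = ::CreateEventW(NULL, TRUE, FALSE, NULL);
    ov[1].hEvent = ::CreateEventW(NULL, TRUE, FALSE, NULL);

    LONGLONG offset = 0;
    int cur = 0;
    bool pending[2] = { IssueRead(f, buf[0].data(), kChunk, 0, &ov[0]), false };

    while (pending[cur])
    {
        // Start fetching the next chunk into the other buffer...
        int nxt = 1 - cur;
        pending[nxt] = IssueRead(f, buf[nxt].data(), kChunk, offset + kChunk, &ov[nxt]);

        // ...and process the current chunk while that read is in flight.
        DWORD got = 0;
        if (!::GetOverlappedResult(f, &ov[cur], &got, TRUE) || got == 0)
            break;                                // error or end of file
        process(buf[cur].data(), got);

        offset += kChunk;
        cur = nxt;
    }

    // Make sure any read still in flight finishes before the buffers go away.
    DWORD ignored = 0;
    if (pending[1 - cur])
        ::GetOverlappedResult(f, &ov[1 - cur], &ignored, TRUE);

    ::CloseHandle(ov[0].hEvent);
    ::CloseHandle(ov[1].hEvent);
    ::CloseHandle(f);
    return true;
}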
If you need to read the full file either way and it fits into memory without issue, then read it from memory. Will it be the same file every time, or some small set of files? Cache them in memory.
If the input for your parsing comes from I/O, as it usually does, any good parsing technology, like recursive-descent, will be I/O bound.
In other words, the average time to get a character from the I/O should exceed the average time spent processing it, by a healthy factor.
So it really doesn't matter very much.
The only difference will be in how much working storage you glom onto, which is not usually a big deal.

How to know what amount of memory I'm using in a process? win32 C++

I'm using Win32 C++ in CodeGear Builder 2009; the target is Windows XP Embedded.
I found the PROCESS_MEMORY_COUNTERS_EX struct and have created a simple function to return the memory consumption of my process:
SIZE_T TForm1::ProcessPrivatBytes( DWORD processID )
{
    SIZE_T lRetval = 0;
    HANDLE hProcess;
    PROCESS_MEMORY_COUNTERS_EX pmc;

    hProcess = OpenProcess( PROCESS_QUERY_INFORMATION | PROCESS_VM_READ,
                            FALSE, processID );
    if (NULL == hProcess)
    {
        lRetval = 1;
    }
    else
    {
        if ( GetProcessMemoryInfo( hProcess, (PROCESS_MEMORY_COUNTERS*)&pmc, sizeof(pmc) ) )
        {
            lRetval = pmc.WorkingSetSize;   // physical memory currently in the working set
            lRetval = pmc.PrivateUsage;     // private (non-shared) commit charge
        }
        CloseHandle( hProcess );
    }
    return lRetval;
}
//---------------------------------------------------------------------------
Do I have to use lRetval = pmc.WorkingSetSize or lRetval = pmc.PrivateUsage?
PrivateUsage is what I see in perfmon,
but what exactly is WorkingSetSize?
I want to see every byte I allocate reflected in the counter as soon as I allocate it. Is this possible?
regards
jvdn
This is a much tougher question than you probably realized. The reason is that Windows shares most executable code (especially the code that makes up most of Windows itself) between processes. For example, there's normally ONE copy of kernel32.dll loaded into memory, but it'll normally be mapped into every process. Do you consider that part of the memory your process is "using" or not?
Private memory is what's unique to that particular process. This can be somewhat misleading too. Since the executable for your process could potentially be shared with another process (i.e. two instances of your program could be run), that's not counted as part of the private memory, even if (as is often the case) there's only one instance of it running.
The working set size is about 99.999% meaningless. What it returns is whatever has been set as the preferred working set size for the process. You can adjust that with SetProcessWorkingSetSize(). Windows has a working set trimmer that attempts to trim down working sets. If memory serves, it uses the working set size to guess at whether it's worth trying to trim the working set of this process -- i.e. if its current working set is larger than the working set size was set to, it tries to trim it down. Otherwise, it (mostly) leaves it alone.
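For reference, a small illustration of the API mentioned above (the values are arbitrary):

#include <windows.h>

// Suggest a 2 MB minimum / 32 MB maximum working set for the current process.
bool SuggestWorkingSet()
{
    return SetProcessWorkingSetSize(GetCurrentProcess(),
                                    2 * 1024 * 1024,            // minimum, bytes
                                    32 * 1024 * 1024) != FALSE;  // maximum, bytes
}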
Chances are that nothing you do will show you every byte you allocate as you allocate it, though. Calling Windows to allocate memory is fairly slow, so what's normally done is that the run-time library allocates a fairly big chunk of memory from Windows. When you allocate memory, the run-time library gives you a piece of that big chunk. Only when that chunk is gone does it go back to Windows and ask for more.
