Efficient way to send files across processes - winapi

How can I efficiently send a file from my own process to a program such as Photoshop, Word, or Paint?
I do not want to save the whole file to disk and then launch the program with the file as a startup parameter using CreateProcess, ShellExecute, etc.
Maybe the only way out is memory-mapped files?
Maybe I should look to COM, IPC, Pipes?

You cannot tell these programs that your file data is actually a memory-mapped file. That really doesn't matter, though: files are already memory-mapped by default, and much more efficiently than with an MMF, because file data is stored in RAM and doesn't take any space in the paging file.
The file system cache takes care of that. Think of it as a large RAM disk without actually having to pay for the RAM. This works so well that there never was a need for these programs to do anything other than accept their input from a file.
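A minimal Python sketch of the plain-file hand-off this answer argues for. The helper name `hand_off` and the shape of `command` are hypothetical; the sketch waits for the program to exit before deleting the file, which suits a converter-style tool but not an interactive editor, where you would start the program without waiting and clean up later.

```python
import os
import subprocess
import tempfile


def hand_off(data, command, suffix=".bin"):
    """Write data to a temporary file, run command with the file's path
    appended as the last argument, and return the program's stdout."""
    fd, path = tempfile.mkstemp(suffix=suffix)
    try:
        with os.fdopen(fd, "wb") as f:
            # A buffered write like this normally goes to the file
            # system cache before it reaches the disk.
            f.write(data)
        done = subprocess.run(command + [path], capture_output=True, check=True)
        return done.stdout
    finally:
        os.remove(path)
```

If the receiving program reads the file right away, it will usually be served from the cache rather than from the disk.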

Related

read/write to a disk without a file system

I would like to know if anybody has any experience writing data directly to disk without a file system - in a similar way that data would be written to a magnetic tape. In particular I would like to know if/how data is written in blocks, and whether a certain blocksize needs to be specified (like it does when writing to tape), and if there is a disk equivalent of a tape file mark, which separates the archives written to a tape.
We are creating a digital archive for over 1 PB of data, and we want redundancy built in to the system in as many levels as possible (by storing multiple copies using different storage media, and storage formats). Our current system works with tapes, and we have a mechanism for storing the block offset of each archive on each tape so we can restore it.
We'd like to extend the system to work with disk volumes without having to change much of the logic. Another advantage of not having a file system is that the solution would be portable across Operating Systems.
Note that the ability to browse the files on disk is not important in this application, since we are considering this for an archival copy of data which is not accessed independently. Also note that we would already have an index of the files stored in the application database, which we also write to the end of the tape/disk when it is almost full.
EDIT 27/05/2020: It seems that accessing the disk device as a raw/character device is what I'm looking for.
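A rough Python sketch of the tape-like layout described above: each archive starts on a block boundary, and the application keeps its own index of (start block, length), since a disk has no file marks. `BLOCK_SIZE` and all function names are assumptions for illustration, and the demo writes to an ordinary temporary file; a real raw device would need its own path, elevated privileges, and possibly sector-aligned I/O.

```python
import tempfile

BLOCK_SIZE = 4096  # assumed; pick a multiple of the device's sector size


def blocks_for(length):
    """Number of whole blocks needed to hold length bytes."""
    return -(-length // BLOCK_SIZE)


def write_archive(dev, start_block, payload):
    """Write payload at start_block, zero-padded to a whole number of
    blocks, and return the next free block."""
    nblocks = blocks_for(len(payload))
    dev.seek(start_block * BLOCK_SIZE)
    dev.write(payload.ljust(nblocks * BLOCK_SIZE, b"\0"))
    return start_block + nblocks


def read_archive(dev, start_block, length):
    """Read back length bytes of the archive that starts at start_block."""
    dev.seek(start_block * BLOCK_SIZE)
    return dev.read(blocks_for(length) * BLOCK_SIZE)[:length]


def demo():
    archives = [b"first archive", b"x" * 5000, b"third"]
    with tempfile.TemporaryFile() as dev:  # stand-in for a raw device
        index, next_block = [], 0
        for payload in archives:
            index.append((next_block, len(payload)))
            next_block = write_archive(dev, next_block, payload)
        restored = [read_archive(dev, blk, n) for blk, n in index]
    return index, restored
```

The index plays the role of the stored block offsets in the tape system, so the restore logic only needs a start block and a length.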

Can I have a memory mapped file, mapped to two or more processes at the same time (windows)?

I need to have two processes share information through a memory mapped file. One of them is only going to read from the file and the other is only going to write to it.
Is it OK for me just to leave the file always mapped to those two processes? I am currently:
Mapping the file into the writer process
Writing
Unmapping the file
Mapping the file into the reader process
Reading
Unmapping the file
And repeating over and over every time I need the processes to share information. My concern is that all these calls to map and unmap may be expensive. Should I keep the file mapped to both processes all the time? I could regulate the access to the shared memory through mutexes.
What is the best way to do this kind of task?
You don't need to unmap the file after reading or writing at all. Windows guarantees that the data "visible" through the mapping is the same in both processes, as long as the file is local and both processes run on the same computer.
If you need to do this repeatedly, then maintain the mapping. Don't prematurely optimize. (If you do find there are problems, you can go back and fix them at that time.)
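As a rough illustration of keeping the mapping open, the Python sketch below creates a writer view and a read-only view of the same file once and reuses them for several exchanges. For brevity both views live in one process here, standing in for the two processes in the question; the file name, `SIZE`, and `demo` are made up for the example, and the mutex the question mentions is left out.

```python
import mmap
import os
import tempfile

SIZE = 4096  # assumed size of the shared region


def demo():
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "shared.bin")
        with open(path, "wb") as f:
            f.write(b"\0" * SIZE)  # the file must already have the mapped size

        with open(path, "r+b") as wf, open(path, "rb") as rf:
            # Map once, then keep both views for the whole session.
            writer = mmap.mmap(wf.fileno(), SIZE)
            reader = mmap.mmap(rf.fileno(), SIZE, access=mmap.ACCESS_READ)
            try:
                seen = []
                for i in range(3):
                    msg = b"message %d" % i
                    writer[:len(msg)] = msg               # writer side
                    seen.append(bytes(reader[:len(msg)]))  # reader side
                return seen
            finally:
                reader.close()
                writer.close()
```

In a real two-process setup each process would hold only its own view, and some synchronization would still be needed so the reader never sees a half-written message.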

Opening a custom file on-demand

I have a custom file type that is implemented in sections, with a header that shows the offset and length of each section within the file.
Currently, whenever I want to interact with the file, I must either load and parse the entire thing up front, or else pick only the sections that I need and load just them.
What I would like to do is to achieve a hybrid approach where each of the sections is loaded on-demand.
It seems however that doing this has a lot of potential downsides in terms of leaving filesystem handles open for longer than I would like and the additional code complexity that I would incur.
Are there any standard patterns for this sort of thing? It seems that my options are to:
1. Just load the entire file and stop grousing about the cycles/memory wasted
2. Load the entire file into memory as raw bytes and then satisfy any requests for unloaded sections from the memory buffer rather than disk. This saves me the cost of parsing the unneeded sections and requires less memory (since the disk representation is much more compact than the object model around it), but still means that I waste memory for sections that I never end up loading.
3. Load whatever sections I need right away and close the file but hold onto the source location of the file. Then if another section is requested, re-open the file and load the data. In this case I could get strange results if the underlying file is changed.
4. Same as the above but leave a file handle open (perhaps allowing read sharing).
5. Load the file using Memory-Mapped IO and leave a view on the file open.
Any thoughts?
If possible, mmap-ing the whole file is usually the easiest thing to do if you have a random-access pattern. This way you just delegate the loading/unloading issue to the OS, and you get options 1 and 2 for free.
If you have very special access patterns, you can even use something like fadvise() (I don't know the exact Win32 equivalent) to tell the OS your access intent.
If your file is larger than 2 GB, you can either go the 64-bit way or mmap() parts of the file on demand.
If the file is relatively small, mmap-ing the entire file is good enough. If the file is large, you could leave a mmap view open, and just move it around the file and resize it to view each section when needed.
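A small Python sketch of the mmap approach for a sectioned file. The on-disk layout used here (a 32-bit section count followed by 64-bit offset/length pairs) is invented for the example and is not the asker's actual format; `write_sections` and `SectionFile` are likewise hypothetical names.

```python
import mmap
import struct

COUNT = struct.Struct("<I")   # hypothetical header: number of sections
ENTRY = struct.Struct("<QQ")  # then one (offset, length) pair per section


def write_sections(path, sections):
    """Write a file in the made-up layout: header, table, section data."""
    offset = COUNT.size + ENTRY.size * len(sections)
    table = []
    for data in sections:
        table.append((offset, len(data)))
        offset += len(data)
    with open(path, "wb") as f:
        f.write(COUNT.pack(len(sections)))
        for entry in table:
            f.write(ENTRY.pack(*entry))
        for data in sections:
            f.write(data)


class SectionFile:
    """Maps the whole file once and copies a section out on first use."""

    def __init__(self, path):
        self._file = open(path, "rb")
        self._map = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)
        (count,) = COUNT.unpack_from(self._map, 0)
        self._table = [
            ENTRY.unpack_from(self._map, COUNT.size + i * ENTRY.size)
            for i in range(count)
        ]
        self.loaded = {}  # section index -> bytes, filled on demand

    def section(self, i):
        if i not in self.loaded:
            offset, length = self._table[i]
            # Only now are the pages of this section actually touched.
            self.loaded[i] = bytes(self._map[offset:offset + length])
        return self.loaded[i]

    def close(self):
        self._map.close()
        self._file.close()
```

Only the header and the sections actually requested get paged in; the OS decides what stays in memory, so the code carries no explicit load/unload logic beyond the small cache.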

Custom Prefetch

Are there any programmatic techniques, portable or specific to NT and Linux, that make a number of large files load faster? I am after an 'ahead of time' (a priori, or whatever you prefer to call it) mechanism that I can control in code on the two OSes in question.
Each file has to be processed in full, i.e. its entire contents, sequentially. The aim is to speed up some batch file processing.
I don't know about NT, but one option on Linux would be to use madvise with the MADV_WILLNEED flag shortly before you actually need the next file to start reading it in early.
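A hedged Python sketch of that idea: map the next file and issue MADV_WILLNEED before starting work on the current one. `map_hinted` and `process_all` are hypothetical helpers; `mmap.madvise` requires Python 3.8+ and a platform that has madvise(), so the hint is skipped where it is unavailable. The sketch assumes non-empty files, since an empty file cannot be mapped.

```python
import mmap


def map_hinted(path):
    """Map a file read-only and, where supported, ask the kernel to
    start reading it in ahead of time."""
    with open(path, "rb") as f:
        m = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    if hasattr(m, "madvise") and hasattr(mmap, "MADV_WILLNEED"):
        m.madvise(mmap.MADV_WILLNEED)
    return m


def process_all(paths, process):
    """Call process(mapping) for each file, hinting the next file
    before working on the current one."""
    results = []
    upcoming = map_hinted(paths[0]) if paths else None
    for i in range(len(paths)):
        current = upcoming
        upcoming = map_hinted(paths[i + 1]) if i + 1 < len(paths) else None
        with current:  # unmaps the file once it has been processed
            results.append(process(current))
    return results
```

Whether the hint helps depends on how much I/O bandwidth is left while the current file is being processed.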
Alternately, a more portable option would be to simply manually do readahead in a separate thread from your buffer-processing thread - that is, read data in to fill an X MB buffer in thread A, process it as fast as you can in thread B.
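The two-thread arrangement could look roughly like this in Python. The generator name `read_ahead`, the chunk size, and the queue depth are all assumptions; an error in the reader thread simply ends the stream early in this sketch, which a real implementation would want to report.

```python
import queue
import threading


def read_ahead(paths, chunk_size=1 << 20, max_chunks=8):
    """Yield (path, chunk) pairs while a background thread keeps up to
    max_chunks chunks buffered ahead of the consumer."""
    buffered = queue.Queue(maxsize=max_chunks)
    done = object()  # sentinel marking the end of the stream

    def reader():  # "thread A": fills the buffer
        try:
            for path in paths:
                with open(path, "rb") as f:
                    while True:
                        chunk = f.read(chunk_size)
                        if not chunk:
                            break
                        buffered.put((path, chunk))  # blocks when full
        finally:
            buffered.put(done)

    worker = threading.Thread(target=reader, daemon=True)
    worker.start()
    while True:  # "thread B": the caller consumes from here
        item = buffered.get()
        if item is done:
            break
        yield item
    worker.join()
```

A caller would loop with `for path, chunk in read_ahead(paths): ...` and do its processing in the loop body, so that disk reads for later chunks overlap with the processing of earlier ones.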
I am not aware of a Win32 (NT) API similar to madvise().
However, I would suggest an approach.
First, pass the Win32 flag FILE_FLAG_SEQUENTIAL_SCAN to CreateFile(). This will allow the Windows operating system to perform better buffering of the file once you have opened it.
With FILE_FLAG_SEQUENTIAL_SCAN, your file parser may operate more quickly once the file is in memory. Unlike madvise() on Linux, the file will not begin loading into memory any earlier due to the use of the Win32 flag.
Next, we need to trigger the file to begin loading. Asynchronously read the first page of the file by calling ReadFileEx() with an OVERLAPPED structure and a FileIOCompletionRoutine function.
Your FileIOCompletionRoutine can simply return, or you can set the event in the overlapped structure -- see the MSDN documentation for ReadFileEx for details.
Since it would not be a critical failure if the pre-fetch hasn't completed when you actually read from the file, the easiest implementation would be to "fire and forget" -- execute the overlapped file read and then never check the result of it. Be sure that you read the data into valid buffers, though!
If you perform this operation for a file while reading the previous file, the result should be that the next file will commence paging in.
Be aware that this may slow your performance. As the next file begins to page in, the disk I/O to access that file will compete with disk I/O for the file you are currently parsing. If the two files are physically distant from each other on the same disk, the result of pre-fetching might be additional delay as the drive head seeks. Although modern drives have huge buffers which mitigate this, queuing the first page of a new file is likely to cause a head seek.
bdonlan's suggestion of a 'pre-fetch' thread which loads the files asynchronously from the processing would be a workable solution for Win32, also.

Get sector location of a file

Based on a file name, or a file handle, is there a Win-API method of determining what physical sector the file starts on?
You can get file cluster allocation by sending FSCTL_GET_RETRIEVAL_POINTERS using DeviceIoControl.
You'd have to read the allocation table directly.
I suspect that there is no such function.
Even if you know where the file starts, what good would it do? The rest of the file could be anywhere as soon as the file is larger than a single sector due to fragmentation.
You would probably need to develop deeper understanding of the file system involved and read the necessary information from the file allocation table or such mechanism.
No. Why? Because a file system is an abstraction of physical hardware. You don't need to know if you're on a RAM disk, hard drive, CD, or network drive, or if your data is compressed or encrypted -- Windows takes care of these little details for you.
You can always open the physical disk, but you'd need knowledge of the file system used.
What are you trying to accomplish with this?
