Creating a bomb-proof worker process (on Windows)

I am writing a PDF viewer that uses various libraries written in C. This C code is potentially easy to exploit, and there are just too many lines to check, so I have to assume that it may contain exploitable bugs.
The thing is that the C code is quite straightforward. A stream of bytes goes in at one end, and a bitmap (also a stream of bytes) comes out at the other.
Inspired by Google Chrome, I am thinking of creating a separate process that does the decoding and page rendering. Ideally this process would have absolutely no rights to do anything except read the one input stream it has and write a stream of bytes (some uncompressed bitmap) at the other end.
What I think the process should not be able to do is:
access the disk in any way
open sockets
use more than a limited amount of memory
access shared memory with other processes
load other DLLs
... anything else?
Is that possible? Is this described somewhere?
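To make the shape of the idea concrete, here is a minimal sketch of the process boundary only, written in Python for brevity rather than in C. The decoder is a hypothetical stand-in (it just reverses its input), and the names `WORKER_SRC` and `render_in_worker` are made up for illustration. Note that this does not restrict what the child may do; it only shows a worker whose sole connection to the parent is a pair of pipes, with a timeout and an exit-code check. The actual restrictions would still have to come from the operating system.

```python
import subprocess
import sys

# Hypothetical stand-in for the untrusted decoder: it reads all of stdin and
# writes a fake "bitmap" (here simply the reversed input) to stdout.
WORKER_SRC = (
    "import sys\n"
    "data = sys.stdin.buffer.read()\n"
    "sys.stdout.buffer.write(data[::-1])\n"
)


def render_in_worker(document: bytes, timeout_s: float = 10.0) -> bytes:
    """Run the decoder in a child process connected only by stdin/stdout pipes.

    Raises subprocess.TimeoutExpired if the worker hangs and
    subprocess.CalledProcessError if it exits with a non-zero code.
    """
    result = subprocess.run(
        [sys.executable, "-I", "-c", WORKER_SRC],  # -I: isolated mode
        input=document,
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        timeout=timeout_s,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    print(render_in_worker(b"abc"))
```

The point of the design is that a crash or a hang in the decoder surfaces in the viewer as an exception, not as a corrupted viewer process.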

If you have the source code, you can check that it doesn't do any of the things listed.
Limiting available memory is a bit more difficult. You may, however, use SetProcessWorkingSetSize, although that only bounds the working set (resident physical memory), not how much the process can allocate.
Also, after you've built the executable you can check its DLL import table (with Dependency Walker) to ensure it doesn't import any file or socket functions.

This isn't really possible. Ultimately any potential exploit code will be running with whatever privileges this process runs with. If you run it as a standard user then you will limit the damage that could be done, but your best bet is to just fix the code as much as possible.

Related

Windows: redirect ReadFile to run process and pipe its stdout

I was wondering how hard it would be to create a set-up under Windows where a regular ReadFile on certain files is redirected by the file system to actually run (e.g. ShellExecute) those files, and the new process' stdout is then used as the file content streamed back to the caller of ReadFile...
What I envision the set-up to look like, is that you can configure it to denote a certain folder as 'special', and that this extra functionality is then only available on that folder's content (so it doesn't need to be disk-wide). It might be accessible under a new drive letter, or a path parallel to the source folder; the location it is hooked up to is irrelevant to me.
To those of you that wonder if this is a classic xy problem: it might very well be ;) It's just that this idea has intrigued me, and I want to know what possibilities there are. In my particular case I want to employ it to #include content in my C++ code base, where the actual content included is being made up on the spot, different on each compile round. I could of course also create a script to create such content to include, call it as a pre-build step and leave it at that, but why choose the easy route.
Maybe there are already ready-made solutions for this? I did an extensive Google search for it, but came up empty-handed. But then I'm not sure I already know all the keywords involved to do a good search...
When coding up something myself, I think a minifilter driver might be needed intercepting ReadFile calls, but then it must at that spot run usermode apps from kernel space - not a happy marriage I assume. Or use an existing file system driver framework that allows for usermode parts, but I found the price of existing solutions to be too steep for my taste (several thousand dollars).
And I also assume that a standard file system (minifilter) driver might be required to return a consistent file size for such files, although the actual data size returned through ReadFile would of course differ on each call. Not to mention negating any buffering that takes place.
All in all I think that a create-it-yourself solution will take quite some effort, especially when you have never done Windows driver development in your life :) Although I see myself quite capable of learning up on it, the time invested will be prohibitive I think.
Another approach might be to hook ReadFile calls from the process doing the ReadFile, via IAT hooking or via code injection. But I want this solution to work more 'out of the box', i.e. all ReadFile requests for these special files trigger the correct behavior, regardless of origin. In my case I'd need to intercept my C++ compiler's (G++) behavior, but that one is called on the fly by the IDE, so I see no easy way to detect its startup and hook it up quickly before it does its ReadFile calls. And besides, I only want certain files to be special in this regard; intercepting all ReadFile calls for a certain process is overkill.
You want something like FUSE (which I have used with profit many times), but for Windows. Apparently there's Dokan; I've never used it, but it seems to be well known enough (and, at the very least, it can be used as an inspiration to see "how it's done").

Can a read() by one process see a partial write() by another?

If one process does a write() of size (and alignment) S (e.g. 8KB), then is it possible for another process to do a read (also of size and alignment S and the same file) that sees a mix of old and new data?
The writing process adds a checksum to each data block, and I'd like to know whether I can use a reading process to verify the checksums in the background. If the reader can see a partial write, then it will falsely indicate corruption.
What standards or documents apply here? Is there a portable way to avoid problems here, preferably without introducing lots of locking?
When a function is guaranteed to complete without there being any chance of any other process/thread/anything seeing things in a half finished state, it's said to be atomic. It either has or hasn't happened, there is no part way. While I can't speak to Windows, there are very few file operations in POSIX (which is what Linux/BSD/etc attempt to stick to) that are guaranteed to be atomic. Reading and writing are not guaranteed to be atomic.
While it would be pretty unlikely for you to write 2 bytes to a file and another process only see one of those bytes written, if by dumb luck your write straddled two different pages in memory and the VM system had to do something to prepare the second page, it's possible you'd see one byte without the other in a second process. Usually if things are page aligned in your file, they will be in memory, but again you can't rely on that.
Here's a list someone made of what is atomic in POSIX, which is pretty short, and I can't vouch for its accuracy. (I can't think of why unlink isn't listed, for example.)
I'd also caution you against testing what appears to work and running with it: the moment you start accessing files over a network file system (NFS on Unix, or SMB mounts in Windows), a lot of things that seemed to be atomic before no longer are.
If you want to have a second process calculating checksums while a first process is writing the file, you may want to open a pipe between the two and have the first process write a copy of everything down the pipe to the checksumming process. That may be faster than dealing with locking.
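A minimal sketch of that pipe idea, in Python for brevity. Everything specific here is an assumption made for illustration: the 8 KB block size comes from the question, but the CRC32 trailer, the record layout, and the names `VERIFIER_SRC` and `write_with_verifier` are hypothetical. The writer appends a checksum to each block, writes the record to the file, and sends a copy down a pipe to a child process that verifies it, so the verifier never reads the file and cannot observe a partial write.

```python
import struct
import subprocess
import sys
import zlib

BLOCK = 8192  # block size from the question; each record is block + 4-byte CRC32

# Source of the verifying process: it reads whole records from its stdin pipe
# and prints how many it saw and how many failed the checksum.
VERIFIER_SRC = r"""
import struct, sys, zlib
BLOCK = 8192
total = bad = 0
while True:
    rec = sys.stdin.buffer.read(BLOCK + 4)
    if len(rec) < BLOCK + 4:
        break
    data = rec[:BLOCK]
    (crc,) = struct.unpack("<I", rec[BLOCK:])
    total += 1
    bad += zlib.crc32(data) != crc
print(total, bad)
"""


def write_with_verifier(path, blocks):
    """Write checksummed blocks to `path`, mirroring each one to a verifier process."""
    child = subprocess.Popen(
        [sys.executable, "-c", VERIFIER_SRC],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
    )
    with open(path, "wb") as f:
        for data in blocks:
            record = data + struct.pack("<I", zlib.crc32(data))
            f.write(record)            # the real data file
            child.stdin.write(record)  # the copy for the verifier
    out, _ = child.communicate()       # flushes and closes the pipe, waits for exit
    total, bad = map(int, out.split())
    return total, bad
```

Calling `write_with_verifier("data.bin", blocks)` with a list of 8192-byte blocks returns a `(total, bad)` pair reported by the verifier.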

LoadLibrary from offset in a file

I am writing a scriptable game engine, for which I have a large number of classes that perform various tasks. The size of the engine is growing rapidly, and so I thought of splitting the large executable up into dll modules so that only the components that the game writer actually uses can be included. When the user compiles their game (which is to say their script), I want the correct dll's to be part of the final executable. I already have quite a bit of overlay data, so I figured I might be able to store the dll's as part of this block. My question boils down to this:
Is it possible to trick LoadLibrary to start reading the file at a certain offset? That would save me from having to either extract the dll into a temporary file which is not clean, or alternatively scrapping the automatic inclusion of dll's altogether and simply instructing my users to package the dll's along with their games.
Initially I thought of going for the "load dll from memory" approach but rejected it on grounds of portability and simply because it seems like such a horrible hack.
Any thoughts?
Kind regards,
Philip Bennefall
You are trying to solve a problem that doesn't exist. Loading a DLL doesn't actually require any physical memory. Windows creates a memory mapped file for the DLL content. Code from the DLL only ever gets loaded when your program calls that code. Unused code doesn't require any system resources beyond reserved memory pages. You have 2 billion bytes' worth of that on a 32-bit operating system, and you have to write a lot of code to consume them all: 50 megabytes of machine code is already a very large program.
The memory mapping is also the reason you cannot make LoadLibrary() do what you want to do. There is no realistic scenario where you need to.
Look into the linker's /DELAYLOAD option to improve startup performance.
I think every solution to this task is a "horrible hack" and nothing more.
The simplest way I can see is to create your own virtual drive that presents a custom file system and redirects access from the one real file (the bundle of your libraries) to multiple separate DLLs. TrueCrypt, for example, does something similar (it's open source). Then you can use the LoadLibrary function without changes.
But the only right way I see is to change your task and not use this approach. I think you need to create your own script interpreter and compiler, using structures, pointers and so on.
The main thing is that I don't understand what benefit you get from using libraries. I think compiled code these days doesn't weigh much and packs very well. Any other resources can be loaded dynamically on first use. All you need to do is organize the working cycles of all components of the script engine in the right way.

Most efficient way to send images across processes

Goal
Pass images generated by one process efficiently and at very high speed to another process. The two processes run on the same machine and on the same desktop. The operating system may be WinXP, Vista or Win7.
Detailed description
The first process is solely for controlling the communication with a device which produces the images. These images are about 500x300px in size and may be updated up to several hundred times per second. The second process needs these images to process them. The first process uses a third party API to paint the images from the device to an HDC. This HDC has to be provided by me.
Note: There is already a connection open between the two processes. They are communicating via anonymous pipes and share memory mapped file views.
Thoughts
How would I achieve this goal with as little work as possible? And I mean both work for the computer and me (of course ;)). I am using Delphi, so maybe there is some component available for doing this? I think I could always paint to any image component's HDC, save the content to a memory stream, copy the contents via the memory mapped file, unpack it on the other side and paint it there to the destination HDC. I also read about an IPicture interface which can be used to marshal images. I need it as quick as possible, so the less overhead the better. I don't want the machine to be stressed just by copying some images.
What are your ideas? I appreciate every thought on this!
Use a Memory Mapped File.
For a Delphi reference see Memory-mapped Files in Delphi and Shared Memory in Delphi.
For a more versatile approach you can look at using pipes or sending bitmap data via TCP. This would allow you to distribute the image data between nodes more easily, if necessary.
Use shared memory to pass the image data, and something else (named pipes, sockets, ...) to coordinate the handover.
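As a rough illustration of the shared-memory half of that advice, here is a minimal sketch in Python (the thread itself is about Delphi, so treat this as language-neutral pseudocode that happens to run). The header layout, the 32-bit pixel format, and the function names are all assumptions made for the example. Only the 500x300 frame size comes from the question. The coordination channel (pipe, socket, ...) is deliberately left out: the caller is expected to tell the reader when a frame is ready.

```python
import struct
from multiprocessing import shared_memory

# Assumed layout: a small header followed by raw pixel data.
HEADER = struct.Struct("<III")    # frame sequence number, width, height
WIDTH, HEIGHT, BPP = 500, 300, 4  # size from the question; 4 bytes/pixel assumed


def create_frame_buffer():
    """Producer side: allocate one shared block big enough for header + frame."""
    size = HEADER.size + WIDTH * HEIGHT * BPP
    return shared_memory.SharedMemory(create=True, size=size)


def write_frame(shm, seq, pixels):
    """Producer side: copy the pixels in first, then publish the header."""
    shm.buf[HEADER.size:HEADER.size + len(pixels)] = pixels
    HEADER.pack_into(shm.buf, 0, seq, WIDTH, HEIGHT)


def read_frame(name):
    """Consumer side: attach by name and copy the current frame out."""
    shm = shared_memory.SharedMemory(name=name)
    try:
        seq, width, height = HEADER.unpack_from(shm.buf, 0)
        start = HEADER.size
        pixels = bytes(shm.buf[start:start + width * height * BPP])
        return seq, width, height, pixels
    finally:
        shm.close()
```

The producer passes `shm.name` to the consumer once over the existing channel; after that only a short "frame ready" message has to travel per frame. The producer should call `shm.close()` and `shm.unlink()` when it is done.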
In some cases, you can pass HBITMAP handles across processes. I've seen it done before (yes, on XP/Vista), and was surprised as everyone else on the team when one of my co-workers showed me.
If memory serves me correctly, I believe it will work if the HBITMAP was allocated with one of the GDI functions (CreateBitmap, CreateCompatibleBitmap, CreateDIBitmap, etc.). HBITMAP handles created by LoadBitmap will not work, as it's just a pointer to an in-proc resource.
That, and I think when you share the HBITMAP across to the other process, don't try to do anything special with it other than normal BitBlt operations.
At least that's what I remember. We got lucky because our graphic libraries were already written to manage all images as HBITMAPs.
YMMV
OK, it seems as if memory mapped files and pipes are the right way to go. That is not too bad because the two processes already share an MMF and two pipes (for bidirectional communication). The only thing left to solve was how to pass the data with as few copy operations as possible.
The design which works quite well looks as follows (sequential flow):
Process 1 (wants image)
give signal to process 2 (via pipe 1) to store image in shared memory
go to sleep and wait for response (blocking read from pipe 2)
Process 2 (provides images)
on signal (via pipe 1) wake up and tell hardware device to paint to HDC 1 (this is backed by shared memory, see below)
give signal to process 1 (via pipe 2)
go to sleep and wait for new job (via pipe 1)
Process 1 (wants image)
on signal (via pipe 2) wake up and paint from shared memory to destination HDC 2
Now for the image transfer via shared memory (my goal was to use not more than one additional copy operation):
Process 2 creates an HBITMAP via CreateDIBSection, passing it the handle of the file mapping and the offset of the mapped view, so the image data lives in the shared memory. This HBITMAP is selected into HDC 1 (which is also created by process 2) and is used from then on by process 2.
Process 1 uses StretchDIBits with a pointer to the mapped view's memory (as described here). This seems to be the only function for getting bits from memory directly into another HDC (in this case HDC 2). Other functions would copy them first into some intermediary buffer before you could transfer them from there to the final HDC.
So in the end it seems that about twice as much data has to be moved as the image itself contains. But I think this is as good as it gets, unless sharing GDI handles between processes were possible.
Note: I used pipes instead of signals because I need to transfer some additional data, too.
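The sequential flow above can be sketched roughly as follows, in Python rather than Delphi. This is only an illustration under loud assumptions: a thread stands in for process 2, a `bytearray` stands in for the shared memory view, a caller-supplied `render` function stands in for the device painting into HDC 1, and the one-byte `R`/`D` messages are invented for the example.

```python
import os
import threading


def provider(req_r, resp_w, frame, render):
    """Process 2: wait for a request, render into the shared buffer, reply."""
    while True:
        cmd = os.read(req_r, 1)   # blocks until a request arrives (pipe 1)
        if cmd != b"R":           # empty read means the requester closed the pipe
            break
        render(frame)             # stand-in for the device painting to HDC 1
        os.write(resp_w, b"D")    # signal "done" (pipe 2)


def request_frame(req_w, resp_r, frame):
    """Process 1: ask for a frame, sleep until it is ready, then copy it out."""
    os.write(req_w, b"R")
    if os.read(resp_r, 1) != b"D":
        raise RuntimeError("provider went away")
    return bytes(frame)           # stand-in for painting to the destination HDC 2


def demo():
    req_r, req_w = os.pipe()      # pipe 1: process 1 -> process 2
    resp_r, resp_w = os.pipe()    # pipe 2: process 2 -> process 1
    frame = bytearray(4)          # stand-in for the shared memory view
    counter = iter(range(1, 256))

    def render(buf):
        buf[:] = bytes([next(counter)]) * len(buf)

    worker = threading.Thread(target=provider, args=(req_r, resp_w, frame, render))
    worker.start()
    frames = [request_frame(req_w, resp_r, frame) for _ in range(3)]
    os.close(req_w)               # end of requests: provider sees EOF and exits
    worker.join()
    for fd in (req_r, resp_r, resp_w):
        os.close(fd)
    return frames
```

Because the requester blocks until the reply arrives, the two sides never touch the buffer at the same time, which is why no extra locking is needed in this strictly sequential design.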
As I see it, you have two options:
Pass only the image handle / pointer to other process, so both processes work only on one collection of images.
Copy the image content to other process and work on a copy from then on.
Which approach is best depends on your design. The best tools for either approach would be memory mapped files or named pipes. These are the fastest you can get. Memory mapped files are probably the fastest form of inter-process communication, but have the downside that there is no "client-server" paradigm built into them, so you have to synchronize access to the MMF yourself. Named pipes, on the other hand, are almost as fast and have the client-server paradigm built right into them. The difference in speed comes mainly from that.
Now, because of the sheer speed of the updates, the first approach could be better, but then you have to watch out for synchronization between processes, so that they do not read and write a single image at the same time. Some sort of caching or other smart techniques could also be used to reduce your traffic to a minimum. When facing such a high level of communication, it is always advisable to look for ways of reducing that level if possible.
For a very fast implementation of IPC based on named pipes you can use my IPC implementation. It is message oriented, so you do not have to worry about the technical details of pipes. It also uses a thread pool behind the scenes and has minimal additional overhead.
You can stress test it and see for yourself (a typical message takes 0.1 ms for full client-server request-response cycle).

Fastest way to pass a file's contents from Kernel to User mode?

I'll try to be brief, but fully descriptive:
This is Windows-specific. Using the Windows Driver Development Kit (DDK).
I am writing a Kernel Mode Driver (KMD) for the first time, having no prior experience in Kernel Mode. I am playing around currently with the "scanner" mini-filter sample which comes with the DDK, and expanding upon it for practice. The "scanner" mini-filter is a basic outline for a generic "anti-virus" type scanning driver which hooks file creates/closes and operates on the associated file to scan for a "bad word" before approving/denying the requested operation.
The end goal is to scan the file with the user-mode application when it is opened, deciding whether or not the mini-filter should allow the operation to complete, without noticeable slow-down to the process or user which is attempting to open the file. I will also want to scan the entire file again when a save is attempted to decide whether or not to allow the save to complete successfully or deny the save. The mini-filter sample lays out the groundwork for how to hook these calls, but is a bit weak in the actual "scanning" portion.
I am looking at expanding the sample to scan the entire file that has been opened, such as to generate a hash, rather than just the first 1k (the sample's limit). I have modified the sample to read the entirety of the file and send it using the same mechanisms within the original sample. This method uses FltReadFile to read the file within the KMD and FltSendMessage to send the buffer to the user-mode component. The user-mode application is using GetQueuedCompletionStatus to grab the notifications from the KMD and process the buffers.
However, I'm noticing that this process seems to be pretty slow compared to a normal open/read in C++ using the standard library (fstream). This method is taking approximately 4-8 times longer than simply opening and reading the file in a simple C++ user app. I have adjusted buffer sizes to see if it makes for a noticeable improvement, and while it can help slightly, the benefits have not appeared to be very significant.
Since I am looking to scan files in 'real-time', this rate of transfer is highly disappointing and prohibitive. Is there a faster way to transfer a file's contents from a Kernel-Mode Driver to a User-Mode Application?
I can suggest several solutions:
Use DeviceIoControl with the METHOD_OUT_DIRECT transfer type to pass large amounts of data.
Create a memory section and map it into your process (remember about the limited address space on 32-bit platforms).
Pass the file path to your application and open it there.
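For the third suggestion, the user-mode side reduces to an ordinary chunked read once it has the path. A minimal sketch in Python, assuming the goal is the hash mentioned in the question. SHA-256, the 1 MiB chunk size, and the name `hash_file` are arbitrary choices for the example, and how the driver delivers the path is not shown.

```python
import hashlib


def hash_file(path, chunk_size=1 << 20):
    """Hash a file in fixed-size chunks so memory use stays bounded."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read until it returns b"" (EOF).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

With this design the driver only has to send a short path string per request, and the bulk data never crosses the kernel/user message channel.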

Resources