Hooking disk write and getting the data being written - winapi

Is there a way to hook disk writes by a specific application and get the data being written, aside from reading the data after the write or reading process memory and searching for data? It's important for me to get the data before it can be tampered with on the disk. Thanks in advance!

Too little reputation to comment, sorry.
I would have said (to echo Raymond) that minifilter drivers would fit your requirements nicely.
Microsoft docs
FltGetRequestorProcessId should allow you to filter by process.
You will still see every request come through; just match the PID you are interested in. If the request is not from your process, return FLT_PREOP_SUCCESS_NO_CALLBACK from the pre-operation callback and you will not be called back about that request again.

Related

Save file from POST request to disk without storing in memory with Python's BaseHTTPServer

I'm writing an HTTP server in Python 2 with BaseHTTPServer, and it needs to accept multiple connections at the same time; on each connection the user can send a large file through a POST request. However, my understanding is that the whole request will be stored in the server's memory before being processed, and multiple files uploaded at the same time can exceed the amount of memory on the server. Is there any way to stream the file/request directly to a file on disk instead of storing it in memory?
BaseHTTPServer doesn't come with a POST handler out of the box, so you'll have to implement it yourself or find an implementation that works for you. (These are easy to search for; here's one I found that looked straightforward.)
Your question is similar to this question about limiting the maximum size of a POST; the answer points out that you'll need to read through all that data anyway in order to ensure proper browser functionality. The comments to that answer suggest other techniques ("e.g. AJAX and realtime notifications via WebSocket" — dmitry-nedbaylo).

Redis memory management - clear based on key, database or instance

I am very new to Redis. I've implemented caching in our application and it works nicely. I want to store two main data types: a directory listing and file content. It's not really relevant, but this will cache files served up via WebDAV.
I want the file structure to remain almost forever. The file content needs to be cached for a short time only. I have set up my expiry/TTL to reflect this.
When the server reaches memory capacity, is it possible to prioritise certain cached items over others? I.e. flush a key, flush a whole database, or flush a whole instance of Redis.
I want to keep my directory listing and flush the file content when memory begins to be an issue.
EDIT: This article seems to describe what I need. I think I will need to use volatile-ttl. My file content will have a much shorter TTL set, so this should in theory clear that first. If anyone has any other helpful advice I would love to hear it, but for now I am going to implement this.
The article linked above describes what I needed. I have implemented volatile-ttl as my maxmemory policy.
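For reference, the setup described above comes down to two redis.conf directives plus how the keys are written (the 100mb limit and the key names are arbitrary examples; directive names are from the Redis configuration docs):

```
maxmemory 100mb
maxmemory-policy volatile-ttl
```

```
SET   dir:/docs "...listing..."           # no TTL: never evicted under volatile-ttl
SETEX file:/docs/a.txt 300 "...content..." # 5-minute TTL: evicted first under pressure
```

Under volatile-ttl, Redis only evicts keys that have an expiry set, preferring those with the shortest remaining TTL, so directory listings stored without a TTL (or with a very long one) survive while short-lived file content is cleared first.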

Is basic_managed_mapped_file.flush() immediately written to disk?

I use basic_managed_mapped_file and I want to backup the file while the program is running.
How can I make sure the data is written to disk for the backup?
The answer is logically "yes": the operating system will make sure the data is written, I believe even if your process were to crash right afterwards.
However, if you
- must be sure that the data hits the disk before doing anything else,
- need to ensure that data hits the disk in a particular order (e.g. journaling/intent logging), or
- need to be sure that data is safely written in the face of e.g. power failure,
then you will need to add a disk sync call on most OSes. If you require this level of detail (and worse, in portable fashion), the topic quickly becomes hard, and I defer to
"Eat My Data: How Everybody Gets File IO Wrong" by Stewart Smith
I've also mirrored the video/slides just in case (see here)

Using boost's shared memory

I would like to use Boost's shared memory services to do the following. I have begun studying the documentation, but as an aid to that I was hoping someone could provide an example that is close to what I want to do.
Here it is:
Process A will write messages to a buffer area. It will also maintain a map, mapping message ID to information regarding the start location and size of the message in the buffer. Some locking mechanism, possibly a scoped lock, will control access to the map and buffer area.
Process B will read these messages based upon message ID.
Thanks in advance for any constructive suggestions.
Have you looked at the Interprocess - message queue documentation?
It doesn't do exactly what you're asking for, as far as giving each message an ID and such, but you don't go into detail as to why that is necessary. Since there are only two processes, would it work to simply copy the data over to process B?

Trying to write a program / library like LogParser - How does it work internally?

LogParser isn't open source and I need this functionality for an open source project I'm working on.
I'd like to write a library that allows me to query huge (mostly IIS) log files, preferably with Linq.
Do you have any links that could help me? How does a program like LogParser work so fast? How does it handle memory limitations?
It probably processes the information in the log as it reads it. This means it (the library) doesn't have to allocate a huge amount of memory to store the information: it can read a chunk, process it, and throw it away. This is a common and very effective way to process data.
You could, for example, work line by line and parse each line. For the actual parsing you can write a state machine or, if the requirements allow it, use a regex.
Another approach would be a state machine that both reads and parses the data. If for some reason a log entry spans more than one line this might be needed.
Some state machine related links:
A very simple state machine written in C: http://snippets.dzone.com/posts/show/3793
A lot of Python-related code, but some sections are universally applicable: http://www.ibm.com/developerworks/library/l-python-state.html
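The line-by-line approach above, sketched for the W3C extended format that IIS writes by default (the field names come from the log's own #Fields: directive, so nothing about the layout is hard-coded; the sample lines in the usage below are invented):

```python
def parse_iis_log(lines):
    """Lazily yield one dict per log entry; only one line is ever in memory.

    A tiny two-state machine: '#Fields:' directives update the current
    column layout, '#' comment lines are skipped, everything else is a
    data row split against the most recent layout.
    """
    fields = []
    for line in lines:
        line = line.strip()
        if line.startswith('#Fields:'):
            fields = line.split()[1:]   # e.g. ['date', 'time', 'cs-method', ...]
        elif line and not line.startswith('#'):
            yield dict(zip(fields, line.split()))
```

Because the parser is a generator, LINQ-style queries fall out naturally as generator expressions, e.g. `(e for e in parse_iis_log(f) if e['sc-status'] == '500')` over an open file handle, without ever loading the whole log.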
If your aim is to query IIS log data with LINQ, then I suggest you move the raw IIS log data to a database and query the database using LINQ. This blog post might help:
http://getsrirams.blogspot.in/2012/07/migrate-iislog-data-to-sqlce-4-database.html
