Software for visualizing an arbitrary part of a process's memory in real time (debugging)

In my work I faced the following task: visualize some process memory content in real time. The main idea is to read an arbitrary part of a remote process's memory, represent it as an image, show it in a separate window, and then repeat these actions at some interval, resulting in a dynamic visualization of the memory content. For example, this would be useful for viewing framebuffers/textures located in process memory. Do any tools exist for this purpose? Thanks.

I did not find any existing utilities, so I created my own tool.
This is mem2pix, a program that visualizes part of a remote process's memory in real time and supports many pixel format types. It currently works on both Windows and Linux.
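The core loop is simple in principle: open the target process, read the chosen region, and reinterpret the bytes under a pixel format. Below is a minimal Windows-only C++ sketch of the idea (the PID, address, size, and fixed RGBA32 format are illustrative assumptions, not mem2pix's actual code):

    // Minimal sketch: periodically read a region of a remote process and
    // treat it as a W x H RGBA32 image. Rendering is omitted; a real tool
    // would blit 'pixels' to a window on every iteration.
    #include <windows.h>
    #include <cstdint>
    #include <vector>

    int main()
    {
        DWORD pid = 1234;                    // target process id (assumed)
        uintptr_t address = 0x10000000;      // suspected framebuffer (assumed)
        const int W = 256, H = 256;

        HANDLE proc = OpenProcess(PROCESS_VM_READ, FALSE, pid);
        if (!proc) return 1;

        std::vector<uint32_t> pixels(W * H); // one uint32_t per RGBA32 pixel
        for (;;) {
            SIZE_T got = 0;
            ReadProcessMemory(proc, (LPCVOID)address, pixels.data(),
                              pixels.size() * sizeof(uint32_t), &got);
            // ... draw 'pixels' to a window here ...
            Sleep(33);                       // ~30 refreshes per second
        }
    }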

Related

Is there any way to intercept I/O operations from a Windows application so I can grab the data in real time in a different application

We have a Windows application that interfaces with a sensor array. It reads out the array of 37 elements 10 times per second and appends the set of 35 32-bit integers and 2 16-bit integers to a CSV file in the Documents folder on the C: drive.
We have neither the application source, nor access to the developer (who left the company a couple of years ago), nor specifications for the array-to-system protocols. We now want to perform real-time analysis of the data, but all of the communication between the code and the array is effectively a black box.
I'm not a Windows system programmer, but a million years (in IT time) ago I was a designer for IBM OS/360, so I have a basic understanding of file system structures, and it seems to me that it should be possible to somehow intercept file "open" and "write" calls to the OS and perform near-real-time analysis. Any good ideas how to do it? Preferably explained in terms that an 80-year-old who only dabbles in Python and C/C++ would comprehend? I've thought of a disassembler, or executing in a debugging environment that might be able to trap the I/O calls and pass control to an analysis routine, but I have no idea what tools might be available these days in the Windows environment.
By the way, one other thing occurred to me: the app also outputs a plot of the data from each sensor. I'm not sure if that's something we could get at.
There's no standard/supported way to hook into file I/O operations.
For this specific problem, the ideal solution will likely be to use ReadDirectoryChangesW to watch the file for changes and read them out between updates; 100 ms should be more than enough time to pull out the data, unless it's on a network drive or similar. This obviously won't work if the application prevents you from reading the file between writes, though.
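A minimal sketch of that watcher (the directory path is an assumption, and real code should walk every FILE_NOTIFY_INFORMATION entry in the buffer rather than just the first):

    // Watch the Documents folder; re-read the CSV's tail on each change.
    #include <windows.h>
    #include <cstdio>

    int main()
    {
        HANDLE dir = CreateFileA("C:\\Users\\me\\Documents", FILE_LIST_DIRECTORY,
                                 FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE,
                                 nullptr, OPEN_EXISTING,
                                 FILE_FLAG_BACKUP_SEMANTICS, nullptr);
        if (dir == INVALID_HANDLE_VALUE) return 1;

        alignas(DWORD) BYTE buf[4096];
        DWORD bytes;
        // Synchronous form: each call blocks until something changes.
        while (ReadDirectoryChangesW(dir, buf, sizeof(buf), FALSE,
                                     FILE_NOTIFY_CHANGE_LAST_WRITE | FILE_NOTIFY_CHANGE_SIZE,
                                     &bytes, nullptr, nullptr))
        {
            auto* info = (FILE_NOTIFY_INFORMATION*)buf;
            wprintf(L"changed: %.*s\n",
                    (int)(info->FileNameLength / sizeof(WCHAR)), info->FileName);
            // ... open the CSV, seek to the previous end-of-file, parse new rows ...
        }
        return 0;
    }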
In an absolute worst-case scenario, you can hook the application's writes by injecting a DLL into the process that, on load, overwrites the first instructions of WriteFile (or whatever it's using to write) in kernel32.dll with a hook. You can read more about this process here.
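For illustration only, here is roughly what such a hook looks like once the DLL is inside the process. This is a bare 32-bit sketch (the 5-byte JMP patch assumes an x86 hot-patchable prologue, and the restore/re-patch dance is not thread-safe; production code would use a proper trampoline or a library such as Microsoft Detours):

    // Injected DLL sketch: redirect kernel32!WriteFile to our function.
    #include <windows.h>
    #include <cstring>

    static BYTE  g_origBytes[5];     // saved first 5 bytes of WriteFile
    static BYTE* g_target = nullptr; // address of WriteFile in kernel32

    static void WriteJump(BYTE* from, void* to)
    {
        DWORD old;
        VirtualProtect(from, 5, PAGE_EXECUTE_READWRITE, &old);
        from[0] = 0xE9;              // E9 = relative near JMP
        *(INT32*)(from + 1) = (INT32)((BYTE*)to - from - 5);
        VirtualProtect(from, 5, old, &old);
    }

    BOOL WINAPI HookedWriteFile(HANDLE h, LPCVOID buf, DWORD len,
                                LPDWORD written, LPOVERLAPPED ov)
    {
        // ... inspect buf/len here, e.g. copy the CSV row to the analyzer ...

        // Call through: restore the original bytes, call, re-install the hook.
        // NOTE: racy if other threads call WriteFile concurrently.
        DWORD old;
        VirtualProtect(g_target, 5, PAGE_EXECUTE_READWRITE, &old);
        memcpy(g_target, g_origBytes, 5);
        VirtualProtect(g_target, 5, old, &old);
        BOOL ok = WriteFile(h, buf, len, written, ov);
        WriteJump(g_target, (void*)HookedWriteFile);
        return ok;
    }

    BOOL APIENTRY DllMain(HMODULE, DWORD reason, LPVOID)
    {
        if (reason == DLL_PROCESS_ATTACH) {
            g_target = (BYTE*)GetProcAddress(
                GetModuleHandleA("kernel32.dll"), "WriteFile");
            memcpy(g_origBytes, g_target, 5);
            WriteJump(g_target, (void*)HookedWriteFile);
        }
        return TRUE;
    }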

Is it possible to associate data with a running process?

As the title says, I want to associate a random bit of data (a ULONG) with a running process on the local machine. I want that data persisted with the process it's associated with, not the process that's reading and writing the data. Is this possible in Win32?
Yes, but it can be tricky. You can't access an arbitrary memory address of another process, and you can't count on shared memory because you want to do it with an arbitrary process.
The tricky way
What you can do is to create a window (with a special and known name) inside the process you want to decorate. See the end of the post for an alternative solution without windows.
First of all you have to get a handle to the process with OpenProcess.
Allocate memory with VirtualAllocEx in the other process to hold a short method that will create a (hidden) window with a special known name.
Copy that function from your own code with WriteProcessMemory.
Execute it with CreateRemoteThread.
Now you need a way to identify and read back this memory from a process other than the one that created it. For that you can simply find the window with the known name, and you have your holder for a small chunk of data (see the reader sketch below).
Please note that this technique may be used to inject code into another process, so some antivirus software may warn about it.
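A sketch of the read-back side, under the assumption that the injected stub names the hidden window with a known prefix plus the PID and stores the ULONG in a window property (e.g. via SetPropA on the writer side):

    // Reader sketch: find the tagged process's hidden window by its known
    // title and pull the ULONG back out of a window property.
    #include <windows.h>
    #include <cstdio>

    int main()
    {
        // "ProcTag_1234" (prefix + PID) and the "ProcTagData" property name
        // are an assumed convention shared by writer and reader.
        HWND hwnd = FindWindowA(nullptr, "ProcTag_1234");
        if (!hwnd) { puts("tagged window not found"); return 1; }

        ULONG value = (ULONG)(ULONG_PTR)GetPropA(hwnd, "ProcTagData");
        printf("associated data: %lu\n", value);
        return 0;
    }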
Final notes
If Address Space Layout Randomization is disabled you may not need to inject code into the process's memory: because system DLLs are then mapped at the same base address in every process, you can call CreateRemoteThread with the local address of a Windows kernel function that takes the right parameters (for example LoadLibrary). You can't do this with native applications (those not linked to kernel32.dll).
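For reference, the classic shape of that call (a hedged sketch; here the remote thread simply runs LoadLibraryA on a path we first copy into the target, which is the most common use of the trick):

    // Classic DLL-injection sketch: put the DLL path into the target process,
    // then start a remote thread at LoadLibraryA. Works because kernel32 is
    // mapped at the same base in every process (when ASLR doesn't vary it).
    #include <windows.h>
    #include <cstring>

    bool InjectDll(DWORD pid, const char* dllPath)
    {
        HANDLE proc = OpenProcess(PROCESS_ALL_ACCESS, FALSE, pid);
        if (!proc) return false;

        SIZE_T len = strlen(dllPath) + 1;
        void* remote = VirtualAllocEx(proc, nullptr, len,
                                      MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
        WriteProcessMemory(proc, remote, dllPath, len, nullptr);

        // LoadLibraryA's signature is close enough to LPTHREAD_START_ROUTINE:
        // one pointer argument, pointer-sized return value.
        auto start = (LPTHREAD_START_ROUTINE)GetProcAddress(
            GetModuleHandleA("kernel32.dll"), "LoadLibraryA");
        HANDLE thread = CreateRemoteThread(proc, nullptr, 0, start,
                                           remote, 0, nullptr);
        if (thread) {
            WaitForSingleObject(thread, INFINITE);
            CloseHandle(thread);
        }
        CloseHandle(proc);
        return thread != nullptr;
    }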
You can't inject into system processes unless your process has debug privileges (enabled via AdjustTokenPrivileges).
As an alternative to the fake window, you may create a suspended thread whose local variable, TLS slot, or stack entry is used as the data chunk. To find this thread you have to give it a name using, for example, this technique (but it's seldom applicable).
The naive way
A poor man's solution (but probably much easier to implement and in some ways even more robust) can be to use NTFS Alternate Data Streams (ADS) to hide a small data file for each process you want to monitor. (Of course, the ADS is attached to the process's image file, so it's not applicable to services and rundll-hosted processes unless you make it much more complicated.)
Iterate all processes and for each one create an ADS with a known name (and the process ID).
Inside it you have to store the system startup time and all the data you need.
To read back that information:
Iterate all processes and check for that ADS; read it and compare the system startup time (if they mismatch, you found a widowed ADS and it should be deleted).
Of course you have to take care of these widows, so you may need to check for them periodically. You can also avoid this by storing ALL these small chunks of data in a well-known location; your "reader" can then check them all each time, deleting files no longer associated with a running process.
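A sketch of the writer side of this scheme (the ":procdata" stream name is an assumed convention; ADS requires NTFS, and the reader simply opens the same stream path with GENERIC_READ):

    // Tag a process's image file with a small named stream holding the boot
    // time (for widow detection) followed by the payload.
    #include <windows.h>
    #include <cstdio>

    bool WriteTag(const char* imagePath, ULONG value, FILETIME bootTime)
    {
        char streamPath[MAX_PATH + 16];
        snprintf(streamPath, sizeof(streamPath), "%s:procdata", imagePath);

        HANDLE h = CreateFileA(streamPath, GENERIC_WRITE, 0, nullptr,
                               CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, nullptr);
        if (h == INVALID_HANDLE_VALUE) return false;

        DWORD written;
        WriteFile(h, &bootTime, sizeof(bootTime), &written, nullptr);
        WriteFile(h, &value, sizeof(value), &written, nullptr);
        CloseHandle(h);
        return true;
    }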

5GB file to read

I have a design question. I have a 3-4 GB data file, ordered by time stamp. I am trying to figure out what the best way is to deal with this file.
I was thinking of reading this whole file into memory, then transmitting this data to different machines and then running my analysis on those machines.
Would it be wise to upload this into a database before running my analysis?
I plan to run my analysis on different machines, so doing it through a database would be easier, but if I increase the number of machines running my analysis, the database might get too slow.
Any ideas?
Update:
I want to process the records one by one. Basically I am trying to run a model on timestamped data, but I have various models, so I want to distribute the work so that the whole process runs overnight every day. I want to make sure that I can easily increase the number of models without degrading system performance, which is why I am planning to distribute the data to all the machines running the models (each machine will run a single model).
You can also access the file on the hard disk directly, reading a small chunk at a time. Java has RandomAccessFile for this, but the same concept is available in other languages as well.
Whether you want to load the data into a database for analysis should be governed purely by the requirements. If you can read the file and keep processing it as you go, there is no need to store it in a database. But if your analysis requires data from many different areas of the file, then a database would be a good idea.
You do not need the whole file in memory, just the data you need for analysis. You can read every line and store only the needed parts of it, plus the offset where the line starts in the file, so you can find it later if you need more data from that line.
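A small sketch combining both suggestions: one sequential pass keeps only the field needed for analysis plus each line's byte offset, so any full record can be fetched again later with a seek (the file name and the "timestamp is the first CSV column" assumption are illustrative):

    #include <fstream>
    #include <string>
    #include <vector>

    int main()
    {
        std::ifstream in("data.csv", std::ios::binary); // binary: stable offsets
        std::vector<std::streampos> offsets;            // where each line starts
        std::vector<std::string> keys;                  // the needed part only

        std::string line;
        while (true) {
            std::streampos pos = in.tellg();
            if (!std::getline(in, line)) break;
            offsets.push_back(pos);
            keys.push_back(line.substr(0, line.find(','))); // e.g. the timestamp
        }

        // Later: jump straight back to record i for its full contents.
        std::size_t i = 42;
        if (i < offsets.size()) {
            in.clear();                                  // clear the EOF state
            in.seekg(offsets[i]);
            std::getline(in, line);                      // full record i
        }
        return 0;
    }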
Would it be wise to upload this into a database before running my analysis?
Yes.
I plan to run my analysis on different machines, so doing it through a database would be easier, but if I increase the number of machines to run my analysis on, the database might get too slow.
Don't worry about it, it will be fine. Just introduce a marker so the rows processed by each computer are identified.
I'm not sure I fully understand all of your requirements, but if you need to persist the data (refer to it more than once), then a database is the way to go. If you just need to process portions of these output files and trust the results, you can do it on the fly without storing any contents.
Only store the data you need, not everything in the files.
Depending on the analysis needed, this sounds like a textbook case for using MapReduce with Hadoop. It will support your requirement of adding more machines in the future. Have a look at the Hadoop wiki: http://wiki.apache.org/hadoop/
Start with the overview, get the standalone setup working on a single machine, and try doing a simple analysis on your file (e.g. start with a "grep" or something). There is some assembly required but once you have things configured I think it could be the right path for you.
I had a similar problem recently, and just as lalit mentioned, I used the RandomAccessFile reader against my file located on the hard disk.
In my case I only needed read access to the file, so I launched a bunch of threads, each starting at a different point in the file. That got the job done and really improved my throughput, since each thread could spend a good amount of time blocked on processing while other threads were reading the file.
A program like the one I mentioned should be very easy to write, just try it and see if the performance is what you need.
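A sketch of that multi-threaded layout in C++ (the answer used Java's RandomAccessFile; here each thread opens its own handle and seeks to its own slice; note that a real reader must align slice boundaries to record boundaries, which is omitted):

    #include <algorithm>
    #include <fstream>
    #include <thread>
    #include <vector>

    static void processSlice(const char* path,
                             std::streamoff begin, std::streamoff end)
    {
        std::ifstream in(path, std::ios::binary);  // each thread: own handle
        in.seekg(begin);
        std::vector<char> buf(1 << 20);            // 1 MiB chunks
        std::streamoff pos = begin;
        while (pos < end && in) {
            std::streamoff want = std::min<std::streamoff>(
                (std::streamoff)buf.size(), end - pos);
            in.read(buf.data(), want);
            std::streamsize got = in.gcount();
            if (got <= 0) break;
            // ... run the model on buf[0 .. got) ...
            pos += got;
        }
        // NOTE: real code must align 'begin'/'end' to record boundaries,
        // e.g. by scanning forward to the next newline. Omitted here.
    }

    int main()
    {
        const char* path = "data.csv";                  // assumed file name
        std::streamoff size = 4LL * 1024 * 1024 * 1024; // ~4 GB file
        const int n = 4;                                // worker threads
        std::vector<std::thread> workers;
        for (int i = 0; i < n; ++i)
            workers.emplace_back(processSlice, path,
                                 size * i / n, size * (i + 1) / n);
        for (auto& t : workers) t.join();
        return 0;
    }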

Most efficient way to send images across processes

Goal
Pass images generated by one process efficiently and at very high speed to another process. The two processes run on the same machine and on the same desktop. The operating system may be WinXP, Vista and Win7.
Detailed description
The first process is solely for controlling the communication with a device which produces the images. These images are about 500x300 px in size and may be updated up to several hundred times per second. The second process needs these images in order to process them. The first process uses a third-party API to paint the images from the device to an HDC, which has to be provided by me.
Note: There is already a connection open between the two processes. They are communicating via anonymous pipes and share memory mapped file views.
Thoughts
How would I achieve this goal with as little work as possible? And I mean both work for the computer and for me (of course ;)). I am using Delphi, so maybe there is a component available for doing this? I think I could always paint to any image component's HDC, save the content to a memory stream, copy the contents via the memory-mapped file, unpack it on the other side, and paint it there to the destination HDC. I also read about an IPicture interface which can be used to marshal images. I need it as quick as possible, so the less overhead the better. I don't want the machine to be stressed just by copying some images.
What are your ideas? I appreciate every thought on this!
Use a Memory Mapped File.
For a Delphi reference see Memory-mapped Files in Delphi and Shared Memory in Delphi.
For a more versatile approach you can look at using pipes or sending bitmap data via TCP. This would allow you to distribute the image data between nodes more easily, if necessary.
Use shared memory to pass the image data, and something else (named pipes, sockets, ...) to coordinate the handover.
In some cases, you can pass HBITMAP handles across processes. I've seen it done before (yes, on XP/Vista), and was surprised as everyone else on the team when one of my co-workers showed me.
If memory serves me correctly, I believe it will work if the HBITMAP was allocated with one of the GDI functions (CreateBitmap, CreateCompatibleBitmap, CreateDIBitmap, etc.). HBITMAP handles created by LoadBitmap will not work, as they are just pointers to an in-proc resource.
That, and I think when you share the HBITMAP across to the other process, don't try to do anything special with it other than normal BitBlt operations.
At least that's what I remember. We got lucky because our graphic libraries were already written to manage all images as HBITMAPs.
YMMV
OK, it seems as if memory-mapped files and pipes are the right way to go. That is not too bad, because the two processes already share an MMF and two pipes (for bidirectional communication). The only thing left to solve was how to pass the data with as few copy operations as possible.
The design which works quite well looks as follows (sequential flow):
Process 1 (wants image)
give signal to process 2 (via pipe 1) to store image in shared memory
go to sleep and wait for response (blocking read from pipe 2)
Process 2 (provides images)
on signal (via pipe 1) wake up and tell hardware device to paint to HDC 1 (this is backed by shared memory, see below)
give signal to process 1 (via pipe 2)
go to sleep and wait for new job (via pipe 1)
Process 1 (wants image)
on signal (via pipe 2) wake up and paint from shared memory to destination HDC 2
Now for the image transfer via shared memory (my goal was to use not more than one additional copy operation):
Process 2 creates an HBITMAP via CreateDIBSection, providing the handle of the file mapping and the offset of the mapped view; thus the image data lives in shared memory. The resulting HBITMAP is selected into HDC 1 (which is also created by process 2) and is used by process 2 from then on.
Process 1 uses StretchDIBits with a pointer to the mapped view's memory (as described here). This seems to be the only function for getting bits from memory directly into another HDC (in this case HDC 2). Other functions would copy them first into some intermediary buffer before you could transfer them from there to the final HDC.
So in the end the amount of bits transferred is about twice the original image size. But I think this is as good as it gets, unless sharing GDI handles between processes were possible.
Note: I used pipes instead of signals because I need to transfer some additional data, too.
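In Win32 terms, process 2's side of that design looks roughly like the following C++ sketch (the section name, image size, and 32-bpp format are assumptions):

    // Process 2: create HDC 1 whose bitmap bits live in a named shared
    // section, so process 1 can map the very same memory.
    #include <windows.h>

    int main()
    {
        const int W = 500, H = 300;
        const DWORD bytes = W * H * 4;             // 32 bpp
        HANDLE section = CreateFileMappingA(INVALID_HANDLE_VALUE, nullptr,
                                            PAGE_READWRITE, 0, bytes,
                                            "Local\\FrameSection");

        BITMAPINFO bmi = {};
        bmi.bmiHeader.biSize        = sizeof(bmi.bmiHeader);
        bmi.bmiHeader.biWidth       = W;
        bmi.bmiHeader.biHeight      = -H;          // negative: top-down rows
        bmi.bmiHeader.biPlanes      = 1;
        bmi.bmiHeader.biBitCount    = 32;
        bmi.bmiHeader.biCompression = BI_RGB;

        void* bits = nullptr;
        HBITMAP bmp = CreateDIBSection(nullptr, &bmi, DIB_RGB_COLORS,
                                       &bits, section, 0); // bits in section

        HDC hdc1 = CreateCompatibleDC(nullptr);
        SelectObject(hdc1, bmp);
        // ... hand hdc1 to the device API, then signal process 1 via the pipe.

        // Process 1 (sketch): map the same section and blit straight to HDC 2:
        //   void* view = MapViewOfFile(
        //       OpenFileMappingA(FILE_MAP_READ, FALSE, "Local\\FrameSection"),
        //       FILE_MAP_READ, 0, 0, bytes);
        //   StretchDIBits(hdc2, 0, 0, W, H, 0, 0, W, H,
        //                 view, &bmi, DIB_RGB_COLORS, SRCCOPY);
        return 0;
    }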
As I see it, you have two options:
Pass only the image handle / pointer to other process, so both processes work only on one collection of images.
Copy the image content to other process and work on a copy from then on.
Which approach is best depends on your design. The best tools for both approaches are memory-mapped files or named pipes; these are the fastest you can get. Memory-mapped files are probably the fastest form of inter-process communication, but they have the downside that there is no client-server paradigm built into them, so you have to synchronize access to the MMF yourself. Named pipes, on the other hand, are almost as fast but have the client-server paradigm built right in. The difference in speed comes mainly from that.
Now, because of the sheer speed of the updates, the first approach could be better, but then you have to watch out for synchronization between the processes so they do not read/write a single image at the same time. Also, some sort of caching or other smart techniques could be used to reduce the traffic to a minimum. When facing such a high level of communication, it is always advisable to look for ways of reducing it if possible.
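For reference, the built-in client-server handshake of named pipes looks like this in bare Win32 (C++ here, but the calls map directly to Delphi's Windows unit; the pipe name and message contents are assumptions):

    // Minimal message-mode named-pipe server: one request, one reply.
    #include <windows.h>
    #include <cstdio>

    int main()
    {
        HANDLE pipe = CreateNamedPipeA("\\\\.\\pipe\\ImageJobs",
                                       PIPE_ACCESS_DUPLEX,
                                       PIPE_TYPE_MESSAGE | PIPE_READMODE_MESSAGE | PIPE_WAIT,
                                       1, 4096, 4096, 0, nullptr);
        ConnectNamedPipe(pipe, nullptr);           // blocks until a client connects

        char msg[4096];
        DWORD got;
        ReadFile(pipe, msg, sizeof(msg), &got, nullptr); // one whole message
        printf("request: %.*s\n", (int)got, msg);

        const char reply[] = "frame-ready";
        DWORD sent;
        WriteFile(pipe, reply, sizeof(reply), &sent, nullptr);
        DisconnectNamedPipe(pipe);
        CloseHandle(pipe);
        return 0;
    }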
For a very fast implementation of IPC based on named pipes, you can use my IPC implementation. It is message-oriented, so you do not have to worry about pipe technicalities. It also uses a thread pool behind the scenes and has minimal additional overhead.
You can stress test it and see for yourself (a typical message takes 0.1 ms for full client-server request-response cycle).

Fastest way to pass a file's contents from Kernel to User mode?

I'll try to be brief, but fully descriptive:
This is Windows-specific. Using the Windows Driver Development Kit (DDK).
I am writing a Kernel Mode Driver (KMD) for the first time, having no prior experience in Kernel Mode. I am playing around currently with the "scanner" mini-filter sample which comes with the DDK, and expanding upon it for practice. The "scanner" mini-filter is a basic outline for a generic "anti-virus" type scanning driver which hooks file creates/closes and operates on the associated file to scan for a "bad word" before approving/denying the requested operation.
The end goal is to scan the file with the user-mode application when it is opened, deciding whether or not the mini-filter should allow the operation to complete, without noticeable slow-down to the process or user which is attempting to open the file. I will also want to scan the entire file again when a save is attempted to decide whether or not to allow the save to complete successfully or deny the save. The mini-filter sample lays out the groundwork for how to hook these calls, but is a bit weak in the actually "scanning" portion.
I am looking at expanding the sample to scan the entire file that has been opened, such as to generate a hash, rather than just the first 1k (the sample's limit). I have modified the sample to read the entirety of the file and send it using the same mechanisms within the original sample. This method uses FltReadFile to read the file within the KMD and FltSendMessage to send the buffer to the user-mode component. The user-mode application is using GetQueuedCompletionStatus to grab the notifications from the KMD and process the buffers.
However, I'm noticing that this process seems to be pretty slow compared to a normal open/read in C++ using the standard library (fstream). This method takes approximately 4-8 times longer than simply opening and reading the file in a plain C++ user application. I have adjusted buffer sizes to see whether that makes a noticeable improvement, and while it helps slightly, the benefits are not significant.
Since I am looking to scan files in 'real-time', this rate of transfer is highly disappointing and prohibitive. Is there a faster way to transfer a file's contents from a Kernel-Mode Driver to a User-Mode Application?
I can suggest several solutions:
Use DeviceIoControl with the METHOD_OUT_DIRECT transfer type to pass large amounts of data (see the sketch after this list).
Create a memory section and map it into your process (remember the limited address space on 32-bit platforms).
Pass the file path to your application and open the file there.
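A hypothetical user-mode sketch of the first option (the IOCTL code, device name, and buffer size are invented for illustration; with METHOD_OUT_DIRECT the I/O manager locks the caller's output buffer and hands the driver an MDL, avoiding an extra copy):

    #include <windows.h>
    #include <winioctl.h>
    #include <cstdio>

    // Made-up IOCTL matching a made-up driver contract.
    #define IOCTL_SCANNER_GET_DATA \
        CTL_CODE(FILE_DEVICE_UNKNOWN, 0x800, METHOD_OUT_DIRECT, FILE_ANY_ACCESS)

    int main()
    {
        HANDLE dev = CreateFileA("\\\\.\\MyScanner",
                                 GENERIC_READ | GENERIC_WRITE,
                                 0, nullptr, OPEN_EXISTING, 0, nullptr);
        if (dev == INVALID_HANDLE_VALUE) return 1;

        static BYTE buffer[1 << 20];               // 1 MiB direct-I/O buffer
        DWORD got = 0;
        if (DeviceIoControl(dev, IOCTL_SCANNER_GET_DATA,
                            nullptr, 0, buffer, sizeof(buffer), &got, nullptr))
            printf("received %lu bytes\n", got);
        CloseHandle(dev);
        return 0;
    }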
