Iterating over WDM device stack

Iterating over WDM device stack - windows

As I understand, one can iterate the device stack of WDM devices only from the bottoms up, because DEVICE_OBJECT has an AttachedDevice member (but not a LowerDevice member). Luckily, the AddDevice callback receives the PhysicalDeviceObject so you can iterate over the entire stack.
From within my filter driver I'm trying to determine whether I'm already filtering a certain device object. (Let's say I have a legit reason for this. Bear with me.) My idea was to go over every DEVICE_OBJECT in the stack and compare its DriverObject member to mine.
Judging from the existence of IoGetAttachedDeviceReference, I assume just accessing AttachedDevice isn't a safe thing to do, for the risk of the device suddenly going away. However, IoGetAttachedDeviceReference brings me straight to the top of the stack, which is no good for me.
So, is there a safe way to iterate over a device stack?

Correct, you can't safely walk the AttachedDevice chain unless you can somehow guarantee that the stack will not be torn down (e.g. if you have an active file object referencing the stack). On Win2K this is pretty much your only option.
On XP and later, the preferred method is actually to walk from the top of the stack down. You can do this by calling IoGetAttachedDeviceReference and then calling IoGetLowerDeviceObject.
-scott

Related

In my App memory graph, there are instances of dispatch_group leaked but I don't use that technology explicitly. Any possible suggestion?

In my macOS App with Mixed Objective-C/Swift, in the Xcode memory graph, there are instances of dispatch_group leaked:
I am a bit familiar with GCD and I use it in my project, but I don't use dispatch_groups explicitly in my code. I have thought that it could be some indirect usage of it when I call other GCD APIs like dispatch_async. I was wondering if there is somebody that can help me track this issue. Thanks for your attention.

In order to diagnose this, you want to know (a) what is keeping a strong reference to them; and (b) where these objects were instantiated. Unfortunately, unlike many objects, dispatch groups might not tell you much about the former (though your memory addresses suggest that there might be some object keeping a reference to them), but we can use the “Malloc Stack” feature to answer the latter.
So, edit your scheme (“Product” » “Scheme” » “Edit Scheme” or press command-<) and temporarily turn on this feature:
You can then click on the object in question and the panel on the right might show you illuminating stack information about where the object was allocated:
Now, in this case, in viewDidLoad I manually instantiated six dispatch groups, performed an enter, but not a leave, which is why these objects are still in memory. This is a common source of dispatch groups lingering in memory.
As you look at the stack trace, focus first on entries in your codebase (in white, rather than gray, in the stack trace). And if you click on your code in the stack trace, it will even jump you to the relevant line of code. But even if it is not in your code, often the stack trace will give you insights where these dispatch groups were created, and you can start your research there.
And remember, when you are done with your diagnostics, make sure to turn off the “Malloc Stack” feature.

What's the correct way to CheckDeviceState in DirectX11?

I have built the DX11VideoRenderer sample (a replacement for EVR that uses DirectX11 instead of EVR's DirectX9), and it's working. Problem is, it's not working very well. It's using twice the CPU time that the EVR does for the same videos (more on this in the next question).
Since I've got the source, I decided to profile it to see what's going on. (Among other things) this led me to:
HRESULT DX11VideoRenderer::CPresenter::CheckDeviceState(BOOL* pbDeviceChanged)
I'm not much of a DirectX expert (actually, I'm not one at all), but it seems likely that window handles can invalidate as monitors get unplugged, windows get FullScreened, closed, etc so a function like this makes perfect sense to me.
However.
When I look at the code for CheckDeviceState, the first thing it does is call SetVideoMonitor, which seems odd.
SetVideoMonitor looks like the routine you call when you first initialize the presenter (or change the target window), not something you'd call repeatedly to "Check" the device state.
Indeed, SetVideoMonitor calls TerminateDisplaySystem, followed by InitializeDisplaySystem. I could see doing this once at startup, but those functions are being called once per frame. That can't be right.
I can comment out the call to SetVideoMonitor in CheckDeviceState (or actually all of CheckDeviceState), and the code continues to function correctly (it's predictably a bit faster). But then I'm not checking the device state anymore.
Trying to figure out the proper way to check for state changes in DX11 brought me here which talks about just checking the return codes for IDXGISwapChain::Present and ResizeBuffers. Is that how this should be done? Because that makes it seem like this whole routine is some leftover from DX9 (where it still would have been poorly implemented).
What's the correct way to check the device state in DX11? Is this even a thing anymore?

How to track/find out which userdata are GC-ed at certain time?

I've written an app in LuaJIT, using a third-party GUI framework (FFI-based) + some additional custom FFI calls. The app suddenly loses part of its functionality at some point soon after being run, and I'm quite confident it's because of some unpinned objects being GC-ed. I assume they're only referenced from the C world1, so Lua GC thinks they're unreferenced and can free them. The problem is, I don't know which of the numerous userdata are unreferenced (unpinned) on Lua side?
To confirm my theory, I've run the app with GC disabled, via:
collectgarbage 'stop'
and lo, with this line, the app works perfectly well long past the point where it got broken before. Obviously, it's an ugly workaround, and I'd much prefer to have the GC enabled, and the app still working correctly...
I want to find out which unpinned object (userdata, I assume) gets GCed, so I can pin it properly on Lua side, to prevent it being GCed prematurely. Thus, my question is:
(How) can I track which userdata objects got collected when my app loses functionality?
One problem is, that AFAIK, the LuaJIT FFI already assigns custom __gc handlers, so I cannot add my own, as there can be only one per object. And anyway, the framework is too big for me to try adding __gc in each and every imaginable place in it. Also, I've already eliminated the "most obviously suspected" places in the code, by removing local from some variables — thus making them part of _G, so I assume not GC-able. (Or is that not enough?)
1 Specifically, WinAPI.

For now, I've added some ffi.gc() handlers to some of my objects (printing some easily visible ALL-CAPS messages), then added some eager collectgarbage() calls to try triggering the issue as soon as possible:
ffi.gc(foo, function()
print '\n\nGC FOO !!!\n\n'
end)
[...]
collectgarbage()
And indeed, this exposed some GCing I didn't expect. Specifically, it led me to discover a note in luajit's FFI docs, which is most certainly relevant in my case:
Please note that [C] pointers [...] are not followed by the garbage collector. So e.g. if you assign a cdata array to a pointer, you must keep the cdata object holding the array alive [in Lua] as long as the pointer is still in use.

Extending functionality of existing program I don't have source for

I'm working on a third-party program that aggregates data from a bunch of different, existing Windows programs. Each program has a mechanism for exporting the data via the GUI. The most brain-dead approach would have me generate extracts by using AutoIt or some other GUI manipulation program to generate the extractions via the GUI. The problem with this is that people might be interacting with the computer when, suddenly, some automated program takes over. That's no good. What I really want to do is somehow have a program run once a day and silently (i.e. without popping up any GUIs) export the data from each program.
My research is telling me that I need to hook each application (assume these applications are always running) and inject a custom DLL to trigger each export. Am I remotely close to being on the right track? I'm a fairly experienced software dev, but I don't know a whole lot about reverse engineering or hooking. Any advice or direction would be greatly appreciated.
Edit: I'm trying to manage the availability of a certain type of professional. Their schedules are stored in proprietary systems. With their permission, I want to install an app on their system that extracts their schedule from whichever system they are using and uploads the information to a central server so that I can present that information to potential clients.

I am aware of four ways of extracting the information you want, both with their advantages and disadvantages. Before you do anything, you need to be aware that any solution you create is not guaranteed and in fact very unlikely to continue working should the target application ever update. The reason is that in each case, you are relying on an implementation detail instead of a pre-defined interface through which to export your data.
Hooking the GUI
The first way is to hook the GUI as you have suggested. What you are doing in this case is simply reading off from what an actual user would see. This is in general easier, since you are hooking the WinAPI which is clearly defined. One danger is that what the program displays is inconsistent or incomplete in comparison to the internal data it is supposed to be representing.
Typically, there are two common ways to perform WinAPI hooking:
DLL Injection. You create a DLL which you load into the other program's virtual address space. This means that you have read/write access (writable access can be gained with VirtualProtect) to the target's entire memory. From here you can trampoline the functions which are called to set UI information. For example, to check if a window has changed its text, you might trampoline the SetWindowText function. Note every control has different interfaces used to set what they are displaying. In this case, you are hooking the functions called by the code to set the display.
SetWindowsHookEx. Under the covers, this works similarly to DLL injection and in this case is really just another method for you to extend/subvert the control flow of messages received by controls. What you want to do in this case is hook the window procedures of each child control. For example, when an item is added to a ComboBox, it would receive a CB_ADDSTRING message. In this case, you are hooking the messages that are received when the display changes.
One caveat with this approach is that it will only work if the target is using or extending WinAPI controls.
Reading from the GUI
Instead of hooking the GUI, you can alternatively use WinAPI to read directly from the target windows. However, in some cases this may not be allowed. There is not much to do in this case but to try and see if it works. This may in fact be the easiest approach. Typically, you will send messages such as WM_GETTEXT to query the target window for what it is currently displaying. To do this, you will need to obtain the exact window hierarchy containing the control you are interested in. For example, say you want to read an edit control, you will need to see what parent window/s are above it in the window hierarchy in order to obtain its window handle.
Reading from memory (Advanced)
This approach is by far the most complicated but if you are able to fully reverse engineer the target program, it is the most likely to get you consistent data. This approach works by you reading the memory from the target process. This technique is very commonly used in game hacking to add 'functionality' and to observe the internal state of the game.
Consider that as well as storing information in the GUI, programs often hold their own internal model of all the data. This is especially true when the controls used are virtual and simply query subsets of the data to be displayed. This is an example of a situation where the first two approaches would not be of much use. This data is often held in some sort of abstract data type such as a list or perhaps even an array. The trick is to find this list in memory and read the values off directly. This can be done externally with ReadProcessMemory or internally through DLL injection again. The difficulty lies mainly in two prerequisites:
Firstly, you must be able to reliably locate these data structures. The problem with this is that code is not guaranteed to be in the same place, especially with features such as ASLR. Colloquially, this is sometimes referred to as code-shifting. ASLR can be defeated by using the offset from a module base and dynamically getting the module base address with functions such as GetModuleHandle. As well as ASLR, a reason that this occurs is due to dynamic memory allocation (e.g. through malloc). In such cases, you will need to find a heap address storing the pointer (which would for example be the return of malloc), dereference that and find your list. That pointer would be prone to ASLR and instead of a pointer, it might be a double-pointer, triple-pointer, etc.
The second problem you face is that it would be rare for each list item to be a primitive type. For example, instead of a list of character arrays (strings), it is likely that you will be faced with a list of objects. You would need to further reverse engineer each object type and understand internal layouts (at least be able to determine offsets of primitive values you are interested in in terms of its offset from the object base). More advanced methods revolve around actually reverse engineering the vtable of objects and calling their 'API'.
You might notice that I am not able to give information here which is specific. The reason is that by its nature, using this method requires an intimate understanding of the target's internals and as such, the specifics are defined only by how the target has been programmed. Unless you have knowledge and experience of reverse engineering, it is unlikely you would want to go down this route.
Hooking the target's internal API (Advanced)
As with the above solution, instead of digging for data structures, you dig for the internal API. I briefly covered this with when discussing vtables earlier. Instead of doing this, you would be attempting to find internal APIs that are called when the GUI is modified. Typically, when a view/UI is modified, instead of directly calling the WinAPI to update it, a program will have its own wrapper function which it calls which in turn calls the WinAPI. You simply need to find this function and hook it. Again this is possible, but requires reverse engineering skills. You may find that you discover functions which you want to call yourself. In this case, as well as being able to locate the location of the function, you have to reverse engineer the parameters it takes, its calling convention and you will need to ensure calling the function has no side effects.
I would consider this approach to be advanced. It can certainly be done and is another common technique used in game hacking to observe internal states and to manipulate a target's behaviour, but is difficult!
The first two methods are well suited for reading data from WinAPI programs and are by far easier. The two latter methods allow greater flexibility. With enough work, you are able to read anything and everything encapsulated by the target but requires a lot of skill.
Another point of concern which may or may not relate to your case is how easy it will be to update your solution to work should the target every be updated. With the first two methods, it is more likely no changes or small changes have to be made. With the second two methods, even a small change in source code can cause a relocation of the offsets you are relying upon. One method of dealing with this is to use byte signatures to dynamically generate the offsets. I wrote another answer some time ago which addresses how this is done.
What I have written is only a brief summary of the various techniques that can be used for what you want to achieve. I may have missed approaches, but these are the most common ones I know of and have experience with. Since these are large topics in themselves, I would advise you ask a new question if you want to obtain more detail about any particular one. Note that in all of the approaches I have discussed, none of them suffer from any interaction which is visible to the outside world so you would have no problem with anything popping up. It would be, as you describe, 'silent'.
This is relevant information about detouring/trampolining which I have lifted from a previous answer I wrote:
If you are looking for ways that programs detour execution of other
processes, it is usually through one of two means:
Dynamic (Runtime) Detouring - This is the more common method and is what is used by libraries such as Microsoft Detours. Here is a
relevant paper where the first few bytes of a function are overwritten
to unconditionally branch to the instrumentation.
(Static) Binary Rewriting - This is a much less common method for rootkits, but is used by research projects. It allows detouring to be
performed by statically analysing and overwriting a binary. An old
(not publicly available) package for Windows that performs this is
Etch. This paper gives a high-level view of how it works
conceptually.
Although Detours demonstrates one method of dynamic detouring, there
are countless methods used in the industry, especially in the reverse
engineering and hacking arenas. These include the IAT and breakpoint
methods I mentioned above. To 'point you in the right direction' for
these, you should look at 'research' performed in the fields of
research projects and reverse engineering.

make_request and queue limits

I'm writing a linux kernel module that emulates a block device.
There are various calls that can be used to tell the block size to the kernel, so it aligns and sizes every request toward the driver accordingly. This is well documented in the "Linux Device Drives 3" book.
The book describes two methods of implementing a block device: using a "request" function, or using a "make_request" function.
It is not clear, whether the queue limit calls apply when using the minimalistic "make_request" approach (which is also the more efficient one if the underlying device is has really no benefit from sequential over random IO, which is the case with me).
I would really like to get the kernel to talk to me using 4K block sizes, but I see smaller bio-s hitting my make_request function.
My question is that should the blk_queue_limit_* affect the bio size when using make_request?
Thank you in advance.

I think I've found enough evidence in the kernel code that if you use make_request, you'll get correctly sized and aligned bios.
The answer is:
You must call blk_queue_make_request first, because it sets queue limits to defaults. After this, set queue limits as you'd like.
It seems that every part of the kernel submitting bios are do check for validity, and it's up to the submitter to do these checks. I've found incomplete validation in submit_bio and generic_make_request. But as long as no one does tricks, it's fine.
Since it's a policy to submit correct bio's, but it's up to the submitter to take care, and no one in the middle does, I think I have to implement explicit checks and fail the wrong bio-s. Since it's a policy, it's fine to fail on violation, and since it's not enforced by the kernel, it's a good thing to do explicit checks.
If you want to read a bit more on the story, see http://tlfabian.blogspot.com/2012/01/linux-block-device-drivers-queue-and.html.

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio