How to avoid memory leaks using ShellExecuteEx? - winapi

Minimal, Complete, and Verifiable example:
Visual Studio 2017 Pro 15.9.3
Windows 10 "1803" (17134.441) x64
Environment variable OANOCACHE set to 1.
Data/Screenshots shown for a 32 bits Unicode build.
UPDATE: Exact same behavior on another machine with Windows 10 "1803" (17134.407)
UPDATE: ZERO leaks on an old Laptop with Windows Seven
UPDATE: Exact same behavior (leaks) on another machine with W10 "1803" (17134.335)
#include <windows.h>
#include <cstdio>
int main() {
getchar();
CoInitializeEx( NULL, COINIT_APARTMENTTHREADED | COINIT_DISABLE_OLE1DDE );
printf( "Launching and terminating processes...\n" );
for ( size_t i = 0; i < 64; ++i ) {
SHELLEXECUTEINFO sei;
memset( &sei, 0, sizeof( sei ) );
sei.cbSize = sizeof( sei );
sei.lpFile = L"iexplore.exe";
sei.lpParameters = L"about:blank";
sei.fMask = SEE_MASK_FLAG_NO_UI | SEE_MASK_NOCLOSEPROCESS | SEE_MASK_NOASYNC;
BOOL bSuccess = ShellExecuteEx( &sei );
if ( bSuccess == FALSE ) {
printf( "\nShellExecuteEx failed with Win32 code %d and hInstApp %d. Exiting...\n",
GetLastError(), (int)sei.hInstApp );
CoUninitialize();
return 0;
} // endif
printf( "%d", (int)GetProcessId( sei.hProcess ) );
Sleep( 1000 );
bSuccess = TerminateProcess( sei.hProcess, 0 );
if ( bSuccess == FALSE ) {
printf( "\nTerminateProcess failed with Win32 code %d. Exiting...\n",
GetLastError() );
CloseHandle( sei.hProcess );
CoUninitialize();
return 0;
} // endif
DWORD dwRetCode = WaitForSingleObject( sei.hProcess, 5000 );
if ( dwRetCode != WAIT_OBJECT_0 ) {
printf( "\nWaitForSingleObject failed with code %x. Exiting...\n",
dwRetCode );
CloseHandle( sei.hProcess );
CoUninitialize();
return 0;
} // endif
CloseHandle( sei.hProcess );
printf( "K " );
Sleep( 1000 );
} // end for
printf( "\nDone!" );
CoUninitialize();
getchar();
} // main
The code use ShellExecuteEx to launch, in a loop, 64 instances of Internet Explorer with the about:blank URL. The SEE_MASK_NOCLOSEPROCESS is used to be able to then use the TerminateProcess API.
I notice two kinds of leaks:
Handles leaks: launching Process Explorer when the loop is finished but the program still running, I see several blocks of 64 handles (process handles, and registries handles for various keys)
Memory leaks: attaching the visual C++ 2017 debugger to the program, before the loop, I took a first Heap Snapshot, and a second one after the loop.I see 64 blocs of 8192 bytes, coming from windows.storage.dll!CInvokeCreateProcessVerb::_BuildEnvironmentForNewProcess()
You can read some information about the handles leaks here: ShellExecute leaks handles
Here are some screenshots:
First, the PID launched and terminated:
Second: the same pids, as seen in Process Explorer:
Process Explorer also shows 64*3 open registry handles, for HKCR\.exe, HKCR\exefile and HKCR\exefile\shell\open.
One of the 64 leaked "Environment" (8192 bytes and the callstack):
Last: a screen shot of Process Explorer, showing the "Private Bytes" during the execution of the MCVE modified with a 1024 loop counter. The running time is approximately 36 minutes, the PV start at 1.1 Mo (before CoInitializeEx) and end at 19 Mo (after CoUninitialize). The value then stabilizes at 18.9
What am I doing wrong?
Do I see leaks where there are none?

this is windows bug, in version 1803. minimal code for reproduce:
if (0 <= CoInitialize(0))
{
SHELLEXECUTEINFO sei = {
sizeof(sei), 0, 0, 0, L"notepad.exe", 0, 0, SW_SHOW
};
ShellExecuteEx( &sei );
CoUninitialize();
}
after execute this code, can view handles for notepad.exe process and first thread - this handles of course must not exist (be closed), not closed keys
\REGISTRY\MACHINE\SOFTWARE\Classes\.exe
\REGISTRY\MACHINE\SOFTWARE\Classes\exefile
also private memory leaks exist in process after this call.
of course this bug cause permanent resource leaks in explorer.exe and any process, which use ShellExecute[Ex]
exactly research of this bug - here
The underlying issue here appears to be in windows.storage.dll. In
particular, the CInvokeCreateProcessVerb object is never
destroyed, because the associated reference count never reaches 0.
This leaks all of the objects associated with
CInvokeCreateProcessVerb, including 4 handles and some memory.
The reason the reference count never reaches 0 appears to be related
to the argument change for ShellDDEExec::InitializeByShellInternal
from Windows 10 1709 to 1803, executed by
CInvokeCreateProcessVerb::Launch().
more concrete here we have cyclic reference of an object (CInvokeCreateProcessVerb) to itself.
more concrete error inside method CInvokeCreateProcessVerb::Launch() which call from self
HRESULT ShellDDEExec::InitializeByShellInternal(
IAssociationElement*,
CreateProcessMethod,
PCWSTR,
STARTUPINFOEXW*,
IShellItem2*,
IUnknown*, // !!!
PCWSTR,
PCWSTR,
PCWSTR);
with wrong 6 argument. the CInvokeCreateProcessVerb class containing internal ShellDDEExec sub-object. in windows 1709 CInvokeCreateProcessVerb::Launch() pass pointer to static_cast<IServiceProvider*>(pObj) in place 6 argument to ShellDDEExec::InitializeByShellInternal where pObj is point to instance of CBindAndInvokeStaticVerb class. but in 1803 version here passed pointer to static_cast<IServiceProvider*>(this) - so pointer to self. the InitializeByShellInternal store this pointer inside self and add reference to it. note that ShellDDEExec is sub-object of CInvokeCreateProcessVerb. so destructor of ShellDDEExec will not be called until destructor of CInvokeCreateProcessVerb not be called. but destructor of CInvokeCreateProcessVerb will be not called until it reference count reach 0. but this not happens until ShellDDEExec not release self pointer to CInvokeCreateProcessVerb which will be only inside it destructor ..
may be this more visible in pseudo code
class ShellDDEExec
{
CComPtr<IUnknown*> _pUnk;
HRESULT InitializeByShellInternal(..IUnknown* pUnk..)
{
_pUnk = pUnk;
}
};
class CInvokeCreateProcessVerb : CExecuteCommandBase, IServiceProvider /**/
{
IServiceProvider* _pVerb;//point to static_cast<IServiceProvider*>(CBindAndInvokeStaticVerb*)
ShellDDEExec _exec;
TRYRESULT CInvokeCreateProcessVerb::Launch()
{
// in 1709
// _exec.InitializeByShellInternal(_pVerb);
// in 1803
_exec.InitializeByShellInternal(..static_cast<IServiceProvider*>(this)..); // !! error !!
}
};
ShellDDEExec::_pUnk hold pointer to containing object CInvokeCreateProcessVerb this pointer will be released only inside CComPtr destructor, called from ShellDDEExec destructor. called from CInvokeCreateProcessVerb destructor, called when reference count became 0, but this never happens because extra reference hold ShellDDEExec::_pUnk
so object store referenced pointer to self. after this reference count to CInvokeCreateProcessVerb never reaches 0

Related

NtCreateProcess(Ex) - Can I have a child process inherit the parents address space while running under a different process name?

I am calling NtCreateProcessEx with the section handle argument set to NULL in order to create a user mode process that is initialized with a copy of the parents address space.
I want the child process to run under a different image name other than the one of the parent process.
Is this even possible?
Here's my call to NtCreateProcessEx:
HANDLE fileHandle;
OBJECT_ATTRIBUTES ObjectAttributes = { 0 };
UNICODE_STRING InputString;
RtlInitUnicodeString( &InputString, L"C:\\Users\\user\\Documents\\codeblocks_projects\\test\\bin\\Release\\test.exe" );
ObjectAttributes.Length = sizeof( OBJECT_ATTRIBUTES );
ObjectAttributes.ObjectName = &InputString;
NTSTATUS status = NtCreateProcessEx( &fileHandle, PROCESS_QUERY_INFORMATION, &ObjectAttributes, GetCurrentProcess(), PS_INHERIT_HANDLES, NULL, NULL, NULL, FALSE );
printf_s( "%x\n", status );
Status is 0xC0000033 - STATUS_OBJECT_NAME_INVALID, if I don't pass any object attributes, the call works fine.
What am I missing here?
My guess is that this not only poorly documented; it is also impossible. At least, it seems effectively impossible nowadays, perhaps due to security concerns.
The documentation for NtOpenProcess indicates that even identifying a process by name hasn't been possible since Vista:
https://learn.microsoft.com/en-us/windows-hardware/drivers/ddi/ntddk/nf-ntddk-ntopenprocess
I just tried an alternative approach using NtSetInformationProcess:
#define PHNT_NO_INLINE_INIT_STRING
#include <phnt_windows.h>
#include <phnt.h>
#include <stdio.h>
int main() {
NTSTATUS status;
HANDLE handle;
status
= NtCreateProcess(&handle,
PROCESS_ALL_ACCESS,
NULL,
NtCurrentProcess(),
/*InheritObjectTable=*/TRUE,
NULL,
NULL,
NULL
);
UNICODE_STRING newName;
RtlInitUnicodeString(&newName, L"dummy.exe");
status
= NtSetInformationProcess(
handle,
ProcessImageFileName,
&newName,
sizeof newName
);
// fails with 0xC0000003, STATUS_INVALID_INFO_CLASS
printf("status: %x\n", status);
// pause to observe the new "zombie child" in Process Explorer
printf("sleeping...\n");
Sleep(5000);
return 0;
}
As noted in the code, NtSetInformationProcess fails with STATUS_INVALID_INFO_CLASS, even though this is allowed by the corresponding NtQueryInformationProcess:
https://learn.microsoft.com/en-us/windows/win32/api/winternl/nf-winternl-ntqueryinformationprocess
Looking at the ReactOS source, this seems to be a deliberate omission. I also found this:
https://reactos.org/pipermail/ros-diffs/2004-November/002066.html
As of this writing, the above reads "don't allow the ProcessImageFileName information class for NtSetInformationProcess() anymore".
All this suggests it may have been possible in the past. After all, how was exec() in the old POSIX subsystem implemented? After fork() created an initial thread in the new process and set the thread's context to be a copy of the parent's, the child might then call exec(), whereupon the POSIX implementation probably destroyed and re-created the child process from within. At some point, the subsystem would have to set the ProcessImageFileName somehow.

Force a DLL to be loaded above 2GB (0x80000000) in a 32-bit process on Windows

To test a corner case in our debugger, I need to come up with a program which has a DLL loaded above 2GB (0x80000000). Current test case is a multi-GB game which loads >700 DLLs, and I'd like to have something simpler and smaller. Is there a way to achieve it reliably without too much fiddling? I assume I need to use /LARGEADDRESSAWARE and somehow consume enough of the VA space to bump the new DLLs above 2GB but I'm fuzzy on the details...
Okay, it took me a few tries but I managed to come up with something working.
// cl /MT /Ox test.cpp /link /LARGEADDRESSAWARE
// occupy the 2 gigabytes!
#define ALLOCSIZE (64*1024)
#define TWOGB (2*1024ull*1024*1024)
#include <windows.h>
#include <stdio.h>
int main()
{
int nallocs = TWOGB/ALLOCSIZE;
for ( int i = 0; i < nallocs+200; i++ )
{
void * p = VirtualAlloc(NULL, ALLOCSIZE, MEM_RESERVE, PAGE_NOACCESS);
if ( i%100 == 0)
{
if ( p != NULL )
printf("%d: %p\n", i, p);
else
{
printf("%d: failed!\n", i);
break;
}
}
}
printf("finished VirtualAlloc. Loading a DLL.\n");
//getchar();
HMODULE hDll = LoadLibrary("winhttp");
printf("DLL base: %p.\n", hDll);
//getchar();
FreeLibrary(hDll);
}
On Win10 x64 produces:
0: 00D80000
100: 03950000
200: 03F90000
[...]
31800: 7FBC0000
31900: 00220000
32000: 00860000
32100: 80140000
32200: 80780000
32300: 80DC0000
32400: 81400000
32500: 81A40000
32600: 82080000
32700: 826C0000
32800: 82D00000
32900: 83340000
finished VirtualAlloc. Loading a DLL.
DLL base: 83780000.
for your own DLL you need set 3 linker option:
/LARGEADDRESSAWARE
/DYNAMICBASE:NO
/BASE:"0x********"
note that link.exe allow only image full located bellow 3GB (0xC0000000) for 32-bit image. in other word, he want that ImageBase + ImageSize <= 0xC0000000
so say /BASE:0xB0000000 will be ok, /BASE:0xBFFF0000 only if your image size <= 0x10000 and for /BASE:0xC0000000 and higher we always got error LNK1249 - image exceeds maximum extent with base address address and size size
also EXE mandatory must have /LARGEADDRESSAWARE too, because are all 4GB space available for wow64 process based only on EXE options.
if we want do this for external DLL - here question more hard. first of all - are this DLL can correct handle this situation (load base > 0x80000000) ? ok. let test this. any api (including most low level LdrLoadDll) not let specify base address, for DLL load. here only hook solution exist.
when library loaded, internal always called ZwMapViewOfSection and it 3-rd parameter BaseAddress - Pointer to a variable that receives the base address of the view. if we set this variable to 0 - system yourself select loaded address. if we set this to specific address - system map view (DLL image in our case) only at this address, or return error STATUS_CONFLICTING_ADDRESSES.
working solution - hook call to ZwMapViewOfSection and replace value of variable, to which point BaseAddress. for find address > 0x80000000 we can use VirtualAlloc with MEM_TOP_DOWN option. note - despite ZwMapViewOfSection also allow use MEM_TOP_DOWN in AllocationType parameter, here it will be have not needed effect - section anyway will be loaded by preferred address or top-down from 0x7FFFFFFF not from 0xFFFFFFFF. but with VirtualAlloc the MEM_TOP_DOWN begin search from 0xFFFFFFFF if process used 4Gb user space. for know - how many memory need for section - we can call ZwQuerySection with SectionBasicInformation - despite this is undocumented - for debug and test - this is ok.
for hook of course can be used some detour lib, but possible do hook with DRx breakpoint - set some Drx register to NtMapViewOfSection address. and set AddVectoredExceptionHandler - which handle exception. this will be perfect work, if process not under debugger. but under debugger it break - most debuggers alwas stop under single step exception and usually no option not handle it but pass to application. of course we can start program not under debugger, and attach it later - after dll load. or possible do this task in separate thread and hide this thread from debugger. disadvantage here - that debugger not got notify about dll load in this case and not load symbols for this. however for external (system dll) for which you have not src code - this in most case can be not a big problem. so solution exit, if we can implement it ). possible code:
PVOID pvNtMapViewOfSection;
LONG NTAPI OnVex(::PEXCEPTION_POINTERS ExceptionInfo)
{
if (ExceptionInfo->ExceptionRecord->ExceptionCode == STATUS_SINGLE_STEP &&
ExceptionInfo->ExceptionRecord->ExceptionAddress == pvNtMapViewOfSection)
{
struct MapViewOfSection_stack
{
PVOID ReturnAddress;
HANDLE SectionHandle;
HANDLE ProcessHandle;
PVOID *BaseAddress;
ULONG_PTR ZeroBits;
SIZE_T CommitSize;
PLARGE_INTEGER SectionOffset;
PSIZE_T ViewSize;
SECTION_INHERIT InheritDisposition;
ULONG AllocationType;
ULONG Win32Protect;
} * stack = (MapViewOfSection_stack*)(ULONG_PTR)ExceptionInfo->ContextRecord->Esp;
if (stack->ProcessHandle == NtCurrentProcess())
{
SECTION_BASIC_INFORMATION sbi;
if (0 <= ZwQuerySection(stack->SectionHandle, SectionBasicInformation, &sbi, sizeof(sbi), 0))
{
if (PVOID pv = VirtualAlloc(0, (SIZE_T)sbi.Size.QuadPart, MEM_RESERVE|MEM_TOP_DOWN, PAGE_NOACCESS))
{
if (VirtualFree(pv, 0, MEM_RELEASE))
{
*stack->BaseAddress = pv;
}
}
}
}
// RESUME_FLAG ( 0x10000) not supported by xp, but anyway not exist 64bit xp
ExceptionInfo->ContextRecord->EFlags |= RESUME_FLAG;
return EXCEPTION_CONTINUE_EXECUTION;
}
return EXCEPTION_CONTINUE_SEARCH;
}
struct LOAD_DATA {
PCWSTR lpLibFileName;
HMODULE hmod;
ULONG dwError;
};
ULONG WINAPI HideFromDebuggerThread(LOAD_DATA* pld)
{
NtSetInformationThread(NtCurrentThread(), ThreadHideFromDebugger, 0, 0);
ULONG dwError = 0;
HMODULE hmod = 0;
if (PVOID pv = AddVectoredExceptionHandler(TRUE, OnVex))
{
::CONTEXT ctx = {};
ctx.ContextFlags = CONTEXT_DEBUG_REGISTERS;
ctx.Dr7 = 0x404;
ctx.Dr1 = (ULONG_PTR)pvNtMapViewOfSection;
if (SetThreadContext(GetCurrentThread(), &ctx))
{
if (hmod = LoadLibraryW(pld->lpLibFileName))
{
pld->hmod = hmod;
}
else
{
dwError = GetLastError();
}
ctx.Dr7 = 0x400;
ctx.Dr1 = 0;
SetThreadContext(GetCurrentThread(), &ctx);
}
else
{
dwError = GetLastError();
}
RemoveVectoredExceptionHandler(pv);
}
else
{
dwError = GetLastError();
}
pld->dwError = dwError;
return dwError;
}
HMODULE LoadLibHigh(PCWSTR lpLibFileName)
{
BOOL bWow;
HMODULE hmod = 0;
if (IsWow64Process(GetCurrentProcess(), &bWow) && bWow)
{
if (pvNtMapViewOfSection = GetProcAddress(GetModuleHandle(L"ntdll"), "NtMapViewOfSection"))
{
LOAD_DATA ld = { lpLibFileName };
if (IsDebuggerPresent())
{
if (HANDLE hThread = CreateThread(0, 0, (PTHREAD_START_ROUTINE)HideFromDebuggerThread, &ld, 0, 0))
{
WaitForSingleObject(hThread, INFINITE);
CloseHandle(hThread);
}
}
else
{
HideFromDebuggerThread(&ld);
}
if (!(hmod = ld.hmod))
{
SetLastError(ld.dwError);
}
}
}
else
{
hmod = LoadLibrary(lpLibFileName);
}
return hmod;
}

Walking the NTFS Change Journal on Windows 10

I'm trying to read the NTFS Change Journal but I've noticed that all the example code I can find fails on Windows 10, even though it works on Windows 7.
For example, Microsofts own example Walking a Buffer of Change Journal Records works on Windows 7 but when I run the same code on Windows 10 I get an error 87 (The parameter is incorrect) when I call DeviceIoControl with FSCTL_READ_USN_JOURNAL (Note that the earlier call to DeviceIoControl with FSCTL_QUERY_USN_JOURNAL completes successfully and return valid data.).
I've even taken the EXE compiled and working on Windows 7 and copied it to the Windows 10 machine and it still fails, so I believe that Windows 10 may be more strict on parameter validation or something like that?
I am running the code as Administrator, so that's not the issue.
I can't find any other references to this problem, but I get the same issue if I take other peoples example code and attempt to run it on Windows 10.
Edit:
The code itself:
#include <Windows.h>
#include <WinIoCtl.h>
#include <stdio.h>
#define BUF_LEN 4096
void main()
{
HANDLE hVol;
CHAR Buffer[BUF_LEN];
USN_JOURNAL_DATA JournalData;
READ_USN_JOURNAL_DATA ReadData = {0, 0xFFFFFFFF, FALSE, 0, 0};
PUSN_RECORD UsnRecord;
DWORD dwBytes;
DWORD dwRetBytes;
int I;
hVol = CreateFile( TEXT("\\\\.\\c:"),
GENERIC_READ | GENERIC_WRITE,
FILE_SHARE_READ | FILE_SHARE_WRITE,
NULL,
OPEN_EXISTING,
0,
NULL);
if( hVol == INVALID_HANDLE_VALUE )
{
printf("CreateFile failed (%d)\n", GetLastError());
return;
}
if( !DeviceIoControl( hVol,
FSCTL_QUERY_USN_JOURNAL,
NULL,
0,
&JournalData,
sizeof(JournalData),
&dwBytes,
NULL) )
{
printf( "Query journal failed (%d)\n", GetLastError());
return;
}
ReadData.UsnJournalID = JournalData.UsnJournalID;
printf( "Journal ID: %I64x\n", JournalData.UsnJournalID );
printf( "FirstUsn: %I64x\n\n", JournalData.FirstUsn );
for(I=0; I<=10; I++)
{
memset( Buffer, 0, BUF_LEN );
if( !DeviceIoControl( hVol,
FSCTL_READ_USN_JOURNAL,
&ReadData,
sizeof(ReadData),
&Buffer,
BUF_LEN,
&dwBytes,
NULL) )
{
printf( "Read journal failed (%d)\n", GetLastError());
return;
}
dwRetBytes = dwBytes - sizeof(USN);
// Find the first record
UsnRecord = (PUSN_RECORD)(((PUCHAR)Buffer) + sizeof(USN));
printf( "****************************************\n");
// This loop could go on for a long time, given the current buffer size.
while( dwRetBytes > 0 )
{
printf( "USN: %I64x\n", UsnRecord->Usn );
printf("File name: %.*S\n",
UsnRecord->FileNameLength/2,
UsnRecord->FileName );
printf( "Reason: %x\n", UsnRecord->Reason );
printf( "\n" );
dwRetBytes -= UsnRecord->RecordLength;
// Find the next record
UsnRecord = (PUSN_RECORD)(((PCHAR)UsnRecord) +
UsnRecord->RecordLength);
}
// Update starting USN for next call
ReadData.StartUsn = *(USN *)&Buffer;
}
CloseHandle(hVol);
}
I've managed to work out what the problem is.
The example Microsoft code creates a local variable defined as READ_USN_JOURNAL_DATA, which is defined as:
#if (NTDDI_VERSION >= NTDDI_WIN8)
typedef READ_USN_JOURNAL_DATA_V1 READ_USN_JOURNAL_DATA, *PREAD_USN_JOURNAL_DATA;
#else
typedef READ_USN_JOURNAL_DATA_V0 READ_USN_JOURNAL_DATA, *PREAD_USN_JOURNAL_DATA;
#endif
On my systems (Both the Win10 and Win7 systems) this evaluates to READ_USN_JOURNAL_DATA_V1.
Looking at the difference betweem READ_USN_JOURNAL_DATA_V0 and READ_USN_JOURNAL_DATA_V1 we can see that V0 is defined as:
typedef struct {
USN StartUsn;
DWORD ReasonMask;
DWORD ReturnOnlyOnClose;
DWORDLONG Timeout;
DWORDLONG BytesToWaitFor;
DWORDLONG UsnJournalID;
} READ_USN_JOURNAL_DATA_V0, *PREAD_USN_JOURNAL_DATA_V0;
and the V1 version is defined as:
typedef struct {
USN StartUsn;
DWORD ReasonMask;
DWORD ReturnOnlyOnClose;
DWORDLONG Timeout;
DWORDLONG BytesToWaitFor;
DWORDLONG UsnJournalID;
WORD MinMajorVersion;
WORD MaxMajorVersion;
} READ_USN_JOURNAL_DATA_V1, *PREAD_USN_JOURNAL_DATA_V1;
Note the new Min and Max Major version members.
So, the Microsoft code is defining a local variable called ReadData that is actually a V1 structure, yet it appears to fill in the data assuming it's a V0 structure. i.e. it doesn't set the Min and Max elements.
It appears that Win7 is fine with this but Win10 rejects it and returns error 87 (The parameter is incorrect).
Sure enough if I explicitly define the ReadData variable to be a READ_USN_JOURNAL_DATA_V0 then the code works on Win7 and Win10, whereas if I explicitly define it as a READ_USN_JOURNAL_DATA_V1 then it continues to work on Win7 but not on Win10.
The strange thing is that the API documentation for READ_USN_JOURNAL_DATA_V1 structure states that it's only supported from Windows 8 on, so it's odd that it works on Windows 7 at all. I guess it's just interpreting it as a READ_USN_JOURNAL_DATA_V0 structure given that V1 version is an extension of the V0 structure. If so then it must be ignoring the size parameter that is passed into DeviceIOControl.
Anyway, all working now. I hope someone finds this a useful reference in the future.
I came across this exact same issue with the sample code as the OP. At the start of the sample code, you will see a partial initialization of the structure at the point of declaration. A little later in the code, right before the offending call, there is a line that assigns the UsnJournalID into the read data structure.
For Windows 10, though, the other two members of the V1 structure are not initialized. I initialized them right after the UsnJournalID's initialization with:
#if (NTDDI_VERSION >= NTDDI_WIN8)
ReadData.MinMajorVersion = JournalData.MinSupportedMajorVersion;
ReadData.MaxMajorVersion = JournalData.MaxSupportedMajorVersion;
#endif
When I ran my code after doing this, it worked correctly without the error code. I remember reading in the Volume Management API discussion that the Min and Max versions needed to be set. I forgot exactly where, because I've been reading and testing this stuff for a couple days.
Anyway, I hope that clarifies the issue for anyone coming after me.
Replace the READ_USN_JOURNAL_DATA structure with READ_USN_JOURNAL_DATA_V0 data structure and initialize it.
This worked for me
READ_USN_JOURNAL_DATA_V0 ReadData;
ZeroMemory(&ReadData, sizeof(ReadData));
ReadData.ReasonMask = 0xFFFFFFFF;

Using IRPs for I/O on device object returned by IoGetDeviceObjectPointer()

Can one use IoCallDriver() with an IRP created by IoBuildAsynchronousFsdRequest() on a device object returned by IoGetDeviceObjectPointer()? What I have currently fails with blue screen (BSOD) 0x7E (unhandled exception), which when caught shows an Access Violation (0xc0000005). Same code worked when the device was stacked (using the device object returned by IoAttachDeviceToDeviceStack()).
So what I have is about the following:
status = IoGetDeviceObjectPointer(&device_name, FILE_ALL_ACCESS, &FileObject, &windows_device);
if (!NT_SUCCESS(status)) {
return -1;
}
offset.QuadPart = 0;
newIrp = IoBuildAsynchronousFsdRequest(io, windows_device, buffer, 4096, &offset, &io_stat);
if (newIrp == NULL) {
return -1;
}
IoSetCompletionRoutine(newIrp, DrbdIoCompletion, bio, TRUE, TRUE, TRUE);
status = ObReferenceObjectByPointer(newIrp->Tail.Overlay.Thread, THREAD_ALL_ACCESS, NULL, KernelMode);
if (!NT_SUCCESS(status)) {
return -1;
}
status = IoCallDriver(bio->bi_bdev->windows_device, newIrp);
if (!NT_SUCCESS(status)) {
return -1;
}
return 0;
device_name is \Device\HarddiskVolume7 which exists according to WinObj.exe .
buffer has enough space and is read/writable. offset and io_stat are on stack (also tried with heap, didn't help). When catching the exception (SEH exception) it doesn't blue screen but shows an access violation as reason for the exception. io is IRP_MJ_READ.
Do I miss something obvious? Is it in general better to use IRPs than the ZwCreateFile / ZwReadFile / ZwWriteFile API (which would be an option, but isn't that slower?)? I also tried a ZwCreateFile to have an extra reference, but this also didn't help.
Thanks for any insights.
you make in this code how minimum 2 critical errors.
can I ask - from which file you try read (or write) data ? from
FileObject you say ? but how file system driver, which will handle
this request know this ? you not pass any file object to newIrp.
look for IoBuildAsynchronousFsdRequest - it have no file object
parameter (and impossible get file object from device object - only
visa versa - because on device can be multiple files open). so it
and can not be filled by this api in newIrp. you must setup it
yourself:
PIO_STACK_LOCATION irpSp = IoGetNextIrpStackLocation( newIrp );
irpSp->FileObject = FileObject;
I guess bug was exactly when file system try access FileObject
from irp which is 0 in your case. also read docs for
IRP_MJ_READ - IrpSp->FileObject -
Pointer to the file object that is associated with DeviceObject
you pass I guess local variables io_stat (and offset) to
IoBuildAsynchronousFsdRequest. as result io_stat must be valid
until newIrp is completed - I/O subsystem write final result to it
when operation completed. but you not wait in function until request
will be completed (in case STATUS_PENDING returned) but just exit
from function. as result later I/O subsystem, if operation completed
asynchronous, write data to arbitrary address &io_stat (it became
arbitrary just after you exit from function). so you need or check
for STATUS_PENDING returned and wait in this case (have actually
synchronous io request). but more logical use
IoBuildSynchronousFsdRequest in this case. or allocate io_stat
not from stack, but say in your object which correspond to file. in
this case you can not have more than single io request with this
object at time. or if you want exactly asynchronous I/O - you can do
next trick - newIrp->UserIosb = &newIrp->IoStatus. as result you
iosb always will be valid for newIrp. and actual operation status
you check/use in DrbdIoCompletion
also can you explain (not for me - for self) next code line ?:
status = ObReferenceObjectByPointer(newIrp->Tail.Overlay.Thread, THREAD_ALL_ACCESS, NULL, KernelMode);
who and where dereference thread and what sense in this ?
Can one use ...
we can use all, but with condition - we understand what we doing and deep understand system internally.
Is it in general better to use IRPs than the ZwCreateFile / ZwReadFile
/ ZwWriteFile API
for performance - yes, better. but this require more code and more complex code compare api calls. and require more knowledge. also if you know that previous mode is kernel mode - you can use NtCreateFile, NtWriteFile, NtReadFile - this of course will be bit slow (need every time reference file object by handle) but more faster compare Zw version
Just wanted to add that the ObReferenceObjectByPointer is needed
because the IRP references the current thread which may exit before
the request is completed. It is dereferenced in the Completion
Routine. Also as a hint the completion routine must return
STATUS_MORE_PROCESSING_REQUIRED if it frees the IRP (took me several
days to figure that out).
here you make again several mistakes. how i understand you in completion routine do next:
IoFreeIrp(Irp);
return StopCompletion;
but call simply call IoFreeIrp here is error - resource leak. i advice you check (DbgPrint) Irp->MdlAddress at this point. if you read data from file system object and request completed asynchronous - file system always allocate Mdl for access user buffer in arbitrary context. now question - who free this Mdl ? IoFreeIrp - simply free Irp memory - nothing more. you do this yourself ? doubt. but Irp is complex object, which internally hold many resources. as result need not only free it memory but call "destructor" for it. this "destructor" is IofCompleteRequest. when you return StopCompletion (=STATUS_MORE_PROCESSING_REQUIRED) you break this destructor at very begin. but you must latter again call IofCompleteRequest for continue Irp (and it resources) correct destroy.
about referencing Tail.Overlay.Thread - what you doing - have no sense:
It is dereferenced in the Completion Routine.
but IofCompleteRequest access Tail.Overlay.Thread after it
call your completion routine (and if you not return
StopCompletion). as result your reference/dereference thread lost
sense - because you deference it too early, before system
actually access it.
also if you return StopCompletion and not more call
IofCompleteRequest for this Irp - system not access
Tail.Overlay.Thread at all. and you not need reference it in this
case.
and exist else one reason, why reference thread is senseless. system
access Tail.Overlay.Thread only for insert Apc to him - for call
final part (IopCompleteRequest) of Irp destruction in original
thread context. really this need only for user mode Irp's requests,
where buffers and iosb located in user mode and valid only in
context of process (original thread ). but if thread is terminated -
call of KeInsertQueueApc fail - system not let insert apc to
died thread. as result IopCompleteRequest will be not called and
resources not freed.
so you or dereference Tail.Overlay.Thread too early or you not need do this at all. and reference for died thread anyway not help. in all case what you doing is error.
you can try do next here:
PETHREAD Thread = Irp->Tail.Overlay.Thread;
IofCompleteRequest(Irp, IO_NO_INCREMENT);// here Thread will be referenced
ObfDereferenceObject(Thread);
return StopCompletion;
A second call to IofCompleteRequest causes the I/O manager to resume calling the IRP's completion. here io manager and access Tail.Overlay.Thread insert Apc to him. and finally you call ObfDereferenceObject(Thread); already after system access it and return StopCompletion for break first call to IofCompleteRequest. look like correct but.. if thread already terminated, how i explain in 3 this will be error, because KeInsertQueueApc fail. for extended test - call IofCallDriver from separate thread and just exit from it. and in completion run next code:
PETHREAD Thread = Irp->Tail.Overlay.Thread;
if (PsIsThreadTerminating(Thread))
{
DbgPrint("ThreadTerminating\n");
if (PKAPC Apc = (PKAPC)ExAllocatePool(NonPagedPool, sizeof(KAPC)))
{
KeInitializeApc(Apc, Thread, 0, KernelRoutine, 0, 0, KernelMode, 0);
if (!KeInsertQueueApc(Apc, 0, 0, IO_NO_INCREMENT))
{
DbgPrint("!KeInsertQueueApc\n");
ExFreePool(Apc);
}
}
}
PMDL MdlAddress = Irp->MdlAddress;
IofCompleteRequest(Irp, IO_NO_INCREMENT);
ObfDereferenceObject(Thread);
if (MdlAddress == Irp->MdlAddress)
{
// IopCompleteRequest not called due KeInsertQueueApc fail
DbgPrint("!!!!!!!!!!!\n");
IoFreeMdl(MdlAddress);
IoFreeIrp(Irp);
}
return StopCompletion;
//---------------
VOID KernelRoutine (PKAPC Apc,PKNORMAL_ROUTINE *,PVOID *,PVOID *,PVOID *)
{
DbgPrint("KernelRoutine(%p)\n", Apc);
ExFreePool(Apc);
}
and you must got next debug output:
ThreadTerminating
!KeInsertQueueApc
!!!!!!!!!!!
and KernelRoutine will be not called (like and IopCompleteRequest) - no print from it.
so what is correct solution ? this of course not documented anywhere, but based on deep internal understand. you not need reference original thread. you need do next:
Irp->Tail.Overlay.Thread = KeGetCurrentThread();
return ContinueCompletion;
you can safe change Tail.Overlay.Thread - if you have no any pointers valid only in original process context. this is true for kernel mode requests - all your buffers in kernel mode and valid in any context. and of course you not need break Irp destruction but continue it. for correct free mdl and all irp resources. and finally system call IoFreeIrp for you.
and again for iosb pointer. how i say pass local variable address, if you exit from function before irp completed (and this iosb accessed) is error. if you break Irp destruction, iosb will be not accessed of course, but in this case much better pass 0 pointer as iosb. (if you latter something change and iosb pointer will be accessed - will be the worst error - arbitrary memory corrupted - with unpredictable effect. and research crash of this will be very-very hard). but if you completion routine - you not need separate iosb at all - you have irp in completion and can direct access it internal iosb - for what you need else one ? so the best solution will be do next:
Irp->UserIosb = &Irp->IoStatus;
full correct example how read file asynchronous:
NTSTATUS DemoCompletion (PDEVICE_OBJECT /*DeviceObject*/, PIRP Irp, BIO* bio)
{
DbgPrint("DemoCompletion(p=%x mdl=%p)\n", Irp->PendingReturned, Irp->MdlAddress);
bio->CheckResult(Irp->IoStatus.Status, Irp->IoStatus.Information);
bio->Release();
Irp->Tail.Overlay.Thread = KeGetCurrentThread();
return ContinueCompletion;
}
VOID DoTest (PVOID buf)
{
PFILE_OBJECT FileObject;
NTSTATUS status;
UNICODE_STRING ObjectName = RTL_CONSTANT_STRING(L"\\Device\\HarddiskVolume2");
OBJECT_ATTRIBUTES oa = { sizeof(oa), 0, &ObjectName, OBJ_CASE_INSENSITIVE };
if (0 <= (status = GetDeviceObjectPointer(&oa, &FileObject)))
{
status = STATUS_INSUFFICIENT_RESOURCES;
if (BIO* bio = new BIO(FileObject))
{
if (buf = bio->AllocBuffer(PAGE_SIZE))
{
LARGE_INTEGER ByteOffset = {};
PDEVICE_OBJECT DeviceObject = IoGetRelatedDeviceObject(FileObject);
if (PIRP Irp = IoBuildAsynchronousFsdRequest(IRP_MJ_READ, DeviceObject, buf, PAGE_SIZE, &ByteOffset, 0))
{
Irp->UserIosb = &Irp->IoStatus;
Irp->Tail.Overlay.Thread = 0;
PIO_STACK_LOCATION IrpSp = IoGetNextIrpStackLocation(Irp);
IrpSp->FileObject = FileObject;
bio->AddRef();
IrpSp->CompletionRoutine = (PIO_COMPLETION_ROUTINE)DemoCompletion;
IrpSp->Context = bio;
IrpSp->Control = SL_INVOKE_ON_CANCEL|SL_INVOKE_ON_ERROR|SL_INVOKE_ON_SUCCESS;
status = IofCallDriver(DeviceObject, Irp);
}
}
bio->Release();
}
ObfDereferenceObject(FileObject);
}
DbgPrint("DoTest=%x\n", status);
}
struct BIO
{
PVOID Buffer;
PFILE_OBJECT FileObject;
LONG dwRef;
void AddRef()
{
InterlockedIncrement(&dwRef);
}
void Release()
{
if (!InterlockedDecrement(&dwRef))
{
delete this;
}
}
void* operator new(size_t cb)
{
return ExAllocatePool(PagedPool, cb);
}
void operator delete(void* p)
{
ExFreePool(p);
}
BIO(PFILE_OBJECT FileObject) : FileObject(FileObject), Buffer(0), dwRef(1)
{
DbgPrint("%s<%p>(%p)\n", __FUNCTION__, this, FileObject);
ObfReferenceObject(FileObject);
}
~BIO()
{
if (Buffer)
{
ExFreePool(Buffer);
}
ObfDereferenceObject(FileObject);
DbgPrint("%s<%p>(%p)\n", __FUNCTION__, this, FileObject);
}
PVOID AllocBuffer(ULONG NumberOfBytes)
{
return Buffer = ExAllocatePool(PagedPool, NumberOfBytes);
}
void CheckResult(NTSTATUS status, ULONG_PTR Information)
{
DbgPrint("CheckResult:status = %x, info = %p\n", status, Information);
if (0 <= status)
{
if (ULONG_PTR cb = min(16, Information))
{
char buf[64], *sz = buf;
PBYTE pb = (PBYTE)Buffer;
do sz += sprintf(sz, "%02x ", *pb++); while (--cb); sz[-1]= '\n';
DbgPrint(buf);
}
}
}
};
NTSTATUS GetDeviceObjectPointer(POBJECT_ATTRIBUTES poa, PFILE_OBJECT *FileObject )
{
HANDLE hFile;
IO_STATUS_BLOCK iosb;
NTSTATUS status = IoCreateFile(&hFile, FILE_READ_DATA, poa, &iosb, 0, 0,
FILE_SHARE_VALID_FLAGS, FILE_OPEN, FILE_NO_INTERMEDIATE_BUFFERING, 0, 0, CreateFileTypeNone, 0, 0);
if (0 <= (status))
{
status = ObReferenceObjectByHandle(hFile, 0, *IoFileObjectType, KernelMode, (void**)FileObject, 0);
NtClose(hFile);
}
return status;
}
and output:
BIO::BIO<FFFFC000024D4870>(FFFFE00001BAAB70)
DoTest=103
DemoCompletion(p=1 mdl=FFFFE0000200EE70)
CheckResult:status = 0, info = 0000000000001000
eb 52 90 4e 54 46 53 20 20 20 20 00 02 08 00 00
BIO::~BIO<FFFFC000024D4870>(FFFFE00001BAAB70)
the eb 52 90 4e 54 46 53 read ok

QueueUserAPC fails with invalid handle?

I'm trying to figure out why QueueUserAPC fails, the error code is 6, which is ERROR_INVALID_HANDLE
The DLL exists, the OpenThread works too,
Attached source code,
int _tmain(int argc, _TCHAR* argv[])
{
DWORD pid;
vector<DWORD> tids;
if (FindProcess(L"calc.exe", pid, tids)) {
HANDLE hProcess = ::OpenProcess(PROCESS_VM_WRITE | PROCESS_VM_OPERATION, FALSE, pid);
auto p = ::VirtualAllocEx(hProcess, nullptr, 1 << 12, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
wchar_t buffer[] = L"c:\\msgbox.dll";
::WriteProcessMemory(hProcess, p, buffer, sizeof(buffer), nullptr);
for (const auto& tid : tids) {
HANDLE hThread = ::OpenThread(THREAD_SET_CONTEXT, FALSE, tid);
if (hThread != INVALID_HANDLE_VALUE) {
::QueueUserAPC((PAPCFUNC)::GetProcAddress(GetModuleHandle(L"kernel32"), "LoadLibraryW"), hThread, (ULONG_PTR)p);
printf ("QueueUserAPC: %d\n", GetLastError());
} else {
printf ("OpenThread failed (%d): %d\n", tid, GetLastError());
}
}
::VirtualFreeEx(hProcess, p, 0, MEM_RELEASE | MEM_DECOMMIT);
} else {
puts ("process not found");
}
return 0;
}
OpenThread -
If the function fails, the return value is NULL.
so condition
if (hThread != INVALID_HANDLE_VALUE)
is error. need
if (hThread != NULL)// or if (hThread)
I guess that in your case hThread == 0
also need understand that:
When a user-mode APC is queued, the thread is not directed to call the
APC function unless it is in an alertable state.
really even if you use correct thread id/handle - your code will be no effect - APC not executed. insert APC have sense only if you know what thread is doing, that he wait for APC.
one is point, when QueueUserAPC worked - if you yourself call CreateProcess with CREATE_SUSPENDED and then call QueueUserAPC (in xp before call QueueUserAPC you must call GetThreadContext because xp at thread start insert also internal insert APC to it (to LdrInitializeThunk) - and if you just (without arbitrary wait or GetThreadContext(this is exactly wait) call QueueUserAPC - your APC can inserted before system APC - as result process begin executed not from LdrInitializeThunk but from your func and crashed. begin from vista this problem is gone and we can not use this hack) .
this will be worked because just before call exe entry point ntdll always call ZwTestAlert - this call check - are exist APC in thread queue and execute it. as result your APC will execute after all DLL in process got DLL_PROCESS_ATTACH, TLS initialized, etc.. but just before exe entry point begin executed. in context of first thread in process. for this case use QueueUserAPC the best injection way

Resources