Does an access violation exception happen before or after the offending memory is written? - interop

I am seeing some access violations in a C# app that calls a C++ DLL (cdecl calling convention).
On the stack trace dump I am seeing some bad memory locations:
2aabe80c 00020000 someCdll!somefunction(
short * X_data = 0x00000001,
int * X_sizes = 0x00030002,
short * Y_data = 0x0000002b,
int * Y_sizes = 0x0e7115e8,
short * T_data = 0x00000000,
struct someStruct * some_data)+0x268
and getting an access violation exception.
Short * X_data = 0x00000001 looks invalid.
Is it possible that this function changed this and then caused the access violation, or did something else make the change and this function tries to write but gets the access violation before it actually changes the memory?
Or is WinDbg just giving me bogus data?
Thanks,
Jason

This was caused by a classic buffer overflow in unmanaged code. An array pointer was loaded into a register and then a loop took care of the rest overwriting all of my stack variables, which made it look like the code was in a different state than it was when it crashed.
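As a rough illustration of why the dumped arguments looked bogus, here is a minimal sketch in C. The struct fake_frame, runaway_fill and demo_neighbour_clobbered are hypothetical names invented for this example; the struct only stands in for a real stack frame so that the demonstration stays within defined behavior. The point it tries to show is that an unchecked fill loop overwrites every neighbouring value it can reach, and a real overrun would only fault later, once it runs into memory that cannot be written.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a stack frame: a small local buffer followed by
   a value that plays the role of a saved argument such as Y_sizes. */
struct fake_frame {
    short     buffer[4];
    uintptr_t saved_arg;
};

/* Writes `count` bytes starting at the buffer without checking the buffer's
   size, which is the bug being illustrated. The caller keeps `count` within
   sizeof(struct fake_frame) so that this demo itself stays in bounds. */
void runaway_fill(struct fake_frame *frame, size_t count)
{
    unsigned char *p = (unsigned char *)frame + offsetof(struct fake_frame, buffer);
    size_t i;

    for (i = 0; i < count; ++i) {
        p[i] = 0x01;
    }
}

/* Returns 1 if the value next to the buffer was overwritten by the fill. */
int demo_neighbour_clobbered(void)
{
    struct fake_frame frame;

    memset(&frame, 0, sizeof frame);
    frame.saved_arg = (uintptr_t)0x0e7115e8u;

    /* More bytes than frame.buffer holds, but no more than the whole struct. */
    runaway_fill(&frame, sizeof frame);

    return frame.saved_arg != (uintptr_t)0x0e7115e8u;
}
```

In a real overrun the loop is not bounded by the struct, so by the time the exception is raised the debugger is showing already-corrupted stack contents rather than the values the function was called with.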

Related

CUDA dynamic parallelism: Access child kernel results in global memory

I am currently trying my first dynamic parallelism code in CUDA. It is pretty simple. In the parent kernel I am doing something like this:
int aPayloads[32];
// Compute aPayloads start values here
int* aGlobalPayloads = nullptr;
cudaMalloc(&aGlobalPayloads, (sizeof(int) *32));
cudaMemcpyAsync(aGlobalPayloads, aPayloads, (sizeof(int)*32), cudaMemcpyDeviceToDevice);
mykernel<<<1, 1>>>(aGlobalPayloads); // Modifies data in aGlobalPayloads
cudaDeviceSynchronize();
// Access results in payload array here
Assuming that I do things right so far, what is the fastest way to access the results in aGlobalPayloads after kernel execution? (I tried cudaMemcpy() to copy aGlobalPayloads back to aPayloads but cudaMemcpy() is not allowed in device code).
You can directly access the data in aGlobalPayloads from your parent kernel code, without any copying:
mykernel<<<1, 1>>>(aGlobalPayloads); // Modifies data in aGlobalPayloads
cudaDeviceSynchronize();
int myval = aGlobalPayloads[0];
A couple of notes:
1. I'd encourage careful error checking (read the whole accepted answer here). You do it in device code the same way as in host code.
2. The programming guide states: "May not pass in local or shared memory pointers". Your usage of aPayloads is a local memory pointer.
If for some reason you want that data to be explicitly put back in your local array, you can use in-kernel memcpy for that:
memcpy(aPayloads, aGlobalPayloads, sizeof(int)*32);
int myval = aPayloads[0]; // retrieves the same value
(that is also how I would fix the issue I mention in item 2 - use in-kernel memcpy)

How to set memory region's protection in kernel mode under Windows 7

Essentially I am looking for a function that could do for kernel mode what VirtualProtect does for user mode.
I am allocating memory using a logic exemplified by the following simplified code.
PMDL mdl = MmAllocatePagesForMdl
(
LowAddress,
HighAddress,
SkipAddress,
size
);
ULONG flags = NormalPagePriority | MdlMappingNoExecute | MdlMappingNoWrite;
PVOID ptr = MmGetSystemAddressForMdlSafe
(
mdl,
flags
);
The MdlMappingNoExecute and MdlMappingNoWrite flags will have effect only on Win8+.
Moreover, using only MmGetSystemAddressForMdlSafe I cannot assign for example NoAccess protection for the memory region.
Are there any additional or alternative APIs I could use so that I can modify the page protection of the allocated memory?
A hack would do too since currently this functionality would not be in use in production code.
C:\Windows\System32>dumpbin /exports ntdll.dll | find "Protect"
391 17E 0004C030 NtProtectVirtualMemory
1077 42C 000CE8F0 RtlProtectHeap
1638 65D 0004C030 ZwProtectVirtualMemory
I think you can call Zw functions from kernel mode, and the args are generally the same as for the corresponding Nt functions. And while ZwProtectVirtualMemory is undocumented, there is a documented ZwAllocateVirtualMemory that accepts protection flags.
Another approach might be to allocate and protect virtual memory in user-mode, pass the buffer down to your driver, then create the corresponding MDL there.
The code I currently ended up using is below.
All used APIs are official.
Here I create another mdl for subrange of the allocated memory and change protection of that subrange.
If you trip over memory protected with the method below, then:
at IRQL < DISPATCH_LEVEL you will get a PAGE_FAULT_IN_NONPAGED_AREA fault (Invalid system memory was referenced. This cannot be protected by try-except, it must be protected by a Probe. Typically the address is just plain bad or it is pointing at freed memory.)
at IRQL == DISPATCH_LEVEL you will get a DRIVER_IRQL_NOT_LESS_OR_EQUAL fault (An attempt was made to access a pageable (or completely invalid) address at an interrupt request level (IRQL) that is too high. This is usually caused by drivers using improper addresses.)
Note that changing the protection might fail if the subrange is part of a large page allocation. The status will then likely be STATUS_NOT_SUPPORTED.
Large page allocations can happen if the originally allocated memory region's size and alignment (which depends on the SkipAddress variable in the question) are suitable and some additional preconditions, with which I am not familiar, are fulfilled (perhaps starting from a certain OS version).
PMDL guard_mdl = IoAllocateMdl
(
NULL,
PAGE_SIZE * guardPageCount,
FALSE,
FALSE,
NULL
);
if (guard_mdl)
{
IoBuildPartialMdl
(
mdl,
guard_mdl,
(PVOID)(0), // **offset** from the beginning of allocated memory ptr
PAGE_SIZE * guardPageCount
);
status = MmProtectMdlSystemAddress
(
guard_mdl,
PAGE_NOACCESS
);
}

Writing to .text section of a user process from kernel space

I'm writing a kernel space component for a research project which requires me to intercept and checkpoint a user space process at different points in its execution (specific instructions.) For various reasons I cannot modify the user-space program or ptrace that process.
To accomplish this goal I'm attempting to insert a breakpoint (INT 3 instruction) in the user-space process at the point I need to checkpoint it, and then intercept the SIGTRAP in kernel space. Unfortunately, I can't seem to figure out how to properly modify the read-only text section of the user-space code from the kernel space of that process. I'm currently attempting to use the get_user_pages API to force the pages writable and modify them, but the text data doesn't seem to change. The relevant portions of the code I'm attempting to use are below. user_addr is the user-space address to insert a breakpoint at (unsigned long); page is a struct page *.
char *addr;
unsigned long aligned_user_addr = user_addr & ~((unsigned long)PAGE_SIZE - 1);
down_read(&current->mm->mmap_sem);
rc = get_user_pages(current, current->mm, aligned_user_addr,
1, 1, 1, &page, &vma);
up_read(&current->mm->mmap_sem);
BUG_ON(rc != 1);
addr = kmap(page);
BUG_ON(!addr);
offs = user_addr % PAGE_SIZE;
/* NOTE: INT3_INSTR is defined to be 0xCC */
addr[offs] = INT3_INSTR;
BUG_ON(addr[offs] != INT3_INSTR); // Assertion fails
set_page_dirty(page);
kunmap(page);
page_cache_release(page);
I'm hoping someone with more kernel knowledge and experience will be able to tell me what I'm doing wrong, or the proper way to go about accomplishing my task.
Thank you for your help.
It turns out that my issue was actually with C sign extension. INT3_INSTR was defined as:
#define INT3_INSTR 0xCC
Which makes it an integer, and the line:
BUG_ON(addr[offs] != INT3_INSTR);
evaluated addr[offs] as a signed char (plain char is signed on this platform). In C, when a signed char is compared to an int, it is promoted to int, and since it is signed it is sign-extended if its MSB is 1. As 0xCC's MSB is 1, the comparison always evaluated to:
BUG_ON(0xFFFFFFCC != 0xCC);
This condition is always true, so the assertion always fired even though the byte had been written. Changing addr to an unsigned char * resolves the issue, and then the above code works.
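The promotion behavior can be reproduced outside the kernel with a small sketch. The helper names mismatch_signed and mismatch_unsigned are made up for this illustration, and signed char is spelled out explicitly because the signedness of plain char varies between platforms. Converting 0xCC to signed char is implementation-defined in older C standards; on the usual two's complement targets it yields -52.

```c
#include <assert.h>

#define INT3_INSTR 0xCC   /* an int constant with the value 204 */

/* Mirrors the failing check: the stored byte is promoted to int with sign
   extension (typically -52), so it never equals 204. Returns 1 on mismatch. */
int mismatch_signed(void)
{
    signed char byte = (signed char)INT3_INSTR;
    return byte != INT3_INSTR;
}

/* Same check through an unsigned char: the byte is promoted to the int 204,
   so the comparison behaves as intended. Returns 0 (no mismatch). */
int mismatch_unsigned(void)
{
    unsigned char byte = (unsigned char)INT3_INSTR;
    return byte != INT3_INSTR;
}
```

An alternative that keeps the char * type would be to cast at the comparison, for example comparing (unsigned char)addr[offs] against INT3_INSTR.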

Stack Overflow in C function call - MS Visual C++ 2010 Express

I have written a function in C, which, when called, immediately results in a stack overflow.
Prototype:
void dumpOutput( Settings *, char **, FILE * );
Calling line:
dumpOutput( stSettings, sInput, fpOut );
At the time of calling it, stSettings is already a pointer to Settings structure, sInput is a dynamically allocated 2D array and fpOut is a FILE *. It reaches all the way to the calling line without any errors, no memory leaks etc.
The actual function is rather lengthy and I think it's not worth sharing here, as the overflow occurs just as the code enters the function (the prologue, I think).
I have tried calling the same function directly from main() with dummy variables for checking if there are any problems with passed arguments but it still throws the stack overflow condition.
The error arises from chkstk.asm when the function is called. This asm file (according to the comments present in it) tries to probe the stack to check / allocate the memory for the called function. It just keeps jumping to the "Find next lower page and probe" part until the stack overflow occurs.
The local variables in dumpOutput are not memory beasts either, just 6 integers and 2 pointers.
The memory used by the code at the point of entering this function is 60,936K, which increases to 61,940K at the point when the stack overflow occurs. Most of this memory goes into sInput. Is this the cause of the error? I don't think so, because only its pointer is being passed. Secondly, I fail to understand why dumpOutput is trying to allocate 1004K of memory on the stack.
I am totally at a loss here. Any help will be highly appreciated.
Thanks in advance.
By design, it is _chkstk()'s job to generate a stack overflow exception. You can diagnose it by looking at the generated machine code. After you step into the function, right-click the edit window and click Go To Disassembly. You ought to see something similar to this:
003013B0 push ebp
003013B1 mov ebp,esp
003013B3 mov eax,1000D4h ; <== here
003013B8 call #ILT+70(__chkstk) (30104Bh)
The value passed through the EAX register is the important one, that's the amount of stack space your function needs. Chkstk then verifies it is actually available by probing the pages of stack. If you see it repeatedly looping then the value for EAX in your code is high. Like mine, it is guaranteed to consume all bytes of the stack. And more. Which is what it protects against, you normally get an access violation exception. But there's no guarantee, your code may accidentally write to a mapped page that belongs to, say, the heap. Which would produce an incredibly difficult to diagnose bug. Chkstk() helps you find these bugs before you blow your brains out in frustration.
I simply did it with this little test function:
void test()
{
char kaboom[1024*1024];
}
We can't see yours, but the exception says that you either have a large array as a local variable or you are passing a large value to _alloca(). Fix by allocating that array from the heap instead.
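A minimal sketch of that fix, assuming the function needs a scratch buffer of about 1 MB like the test function above. The name process_with_heap_buffer and the size constant are placeholders for this example, not taken from the asker's code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define WORK_BUFFER_SIZE (1024u * 1024u)   /* same size as kaboom[] above */

/* Uses a heap allocation instead of a large local array, so the function's
   stack frame stays small and __chkstk has nothing large to probe.
   Returns 0 on success, -1 if the allocation failed. */
int process_with_heap_buffer(void)
{
    int result;
    char *buffer = malloc(WORK_BUFFER_SIZE);

    if (buffer == NULL) {
        return -1;
    }

    memset(buffer, 0, WORK_BUFFER_SIZE);
    /* ... the real work on buffer would go here ... */
    result = (buffer[WORK_BUFFER_SIZE - 1] == 0) ? 0 : -1;

    free(buffer);
    return result;
}
```

The trade-off is that the buffer now has to be freed on every exit path, which the single return after free() keeps simple here.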
Most likely a stack corruption or recursion error but it's hard to answer without seeing any code

Transfer a pointer through boost::interprocess::message_queue

What I am trying to do is have application A send application B a pointer to an object which A has allocated on shared memory ( using boost::interprocess ). For that pointer transfer I intend to use boost::interprocess::message_queue. Obviously a direct raw pointer from A is not valid in B so I try to transfer an offset_ptr allocated on the shared memory. However that also does not seem to work.
Process A does this:
typedef offset_ptr<MyVector> MyVectorPtr;
MyVectorPtr * myvector;
myvector = segment->construct<MyVectorPtr>( boost::interprocess::anonymous_instance )();
*myvector = segment->construct<MyVector>( boost::interprocess::anonymous_instance )(*alloc_inst_vec);
// myvector gets filled with data here
//Send on the message queue
mq->send(myvector, sizeof(MyVectorPtr), 0);
Process B does this:
// Create a "buffer" on this side of the queue
MyVectorPtr * myvector;
myvector = segment->construct<MyVectorPtr>( boost::interprocess::anonymous_instance )();
mq->receive( myvector, sizeof(MyVectorPtr), recvd_size, priority);
As I see it, this does a bitwise copy of the offset pointer, which invalidates it in process B. How do I do this right?
It seems you can address it as described in this post on the boost mailing list.
I agree there is some awkwardness here and offset_ptr doesn't really work for what you are trying to do. offset_ptr is useful if the pointer itself is stored inside of another class/struct which also is allocated in your shared memory segment, but generally you have some top-level item which is not a member of some object allocated in shared memory.
You'll notice the offset_ptr example kind of glosses over this: it just has a comment "Communicate list to other processes" with no details. In some cases you may have a single named top-level object and that name can be how you communicate it, but if you have an arbitrary number of top-level objects to communicate, it seems like just sending the offset from the shared memory's base address is the best you can do.
You calculate the offset on the sending end, send it, and then add it to the base address on the receiving end. If you want to be able to send nullptr as well, you could do like offset_ptr does and agree that 1 is an offset that is sufficiently unlikely to be used, or pick another unlikely sentinel value.
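The offset arithmetic itself can be sketched in plain C, independent of Boost. The names to_offset, from_offset, NULL_OFFSET and demo_round_trip are hypothetical, and the two local arrays only simulate one segment mapped at two different addresses; in the real setup the base would be whatever address each process got for its mapping, and the integer offset is what would be sent through the queue.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Sentinel for "null", following the convention described above. It assumes
   no real object ever lives at byte offset 1 of the segment. */
#define NULL_OFFSET ((uint64_t)1)

/* Sender side: turn a pointer inside the segment into a base-relative offset. */
uint64_t to_offset(const void *base, const void *ptr)
{
    if (ptr == NULL) {
        return NULL_OFFSET;
    }
    return (uint64_t)((const char *)ptr - (const char *)base);
}

/* Receiver side: turn the offset back into a pointer using the local base. */
void *from_offset(void *base, uint64_t offset)
{
    if (offset == NULL_OFFSET) {
        return NULL;
    }
    return (char *)base + offset;
}

/* Simulates the same segment contents appearing at two different addresses.
   Returns 1 if the value is found again through the rebased pointer. */
int demo_round_trip(void)
{
    int mapping_a[16] = {0};
    int mapping_b[16];
    uint64_t offset;
    int *rebased;

    mapping_a[5] = 42;
    offset = to_offset(mapping_a, &mapping_a[5]);

    /* Stand-in for the second process seeing the same bytes elsewhere. */
    memcpy(mapping_b, mapping_a, sizeof mapping_a);

    rebased = from_offset(mapping_b, offset);
    return rebased == &mapping_b[5] && *rebased == 42;
}
```

A fixed-width integer is used for the offset so that both processes agree on the message size regardless of how each was built.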
