How can I locate a process' global and stack areas in Win32? - winapi

How can I locate which areas of memory of a Win32 process contain the global data and the stack data for each thread?

There is no API (that I know of) to do this. But if you have a DLL in the process, you will get DLL_PROCESS_ATTACH/DLL_THREAD_ATTACH notifications in DllMain as each thread is created. Because DllMain is called on the new thread, you can record the thread ID and the address of a stack object for that thread in a table that you maintain. Don't try to do a lot of work in DllMain; just record the stack location and return.
You can then use VirtualQuery to turn the address of a variable on each thread's stack into a virtual allocation range, which should give you the base address of the stack (remember that stacks grow from high addresses to low addresses). The default allocation size for a stack is 1 MB, which can be overridden by a linker switch or by the thread creator, but a stack must be contiguous. So what you get back from VirtualQuery will be the full stack at that point in time.
As for the heap location: there can be multiple locations for the heap, but in general, if you want to assume a single contiguous heap location, use HeapAlloc to get the address of a heap object and then VirtualQuery to determine the range of pages for that section of the heap.
Alternatively, you can use VirtualQuery on the hModule for the EXE and for each DLL, and then assume that anything that is read-write and isn't a stack or a module is part of the heap. Note that this will be true in most processes, but may not be true in some, because an application can call VirtualAlloc or CreateFileMapping directly, resulting in valid data pointers that are not from either stack or heap.
Use EnumProcessModules to get the list of modules loaded into a process.
VirtualQuery basically takes an arbitrary address and returns the base address of the collection of pages that the address belongs to, as well as the page protections. So it's good for going from a specific pointer to the 'type' of allocation it belongs to.

Take the address of variables allocated in the memory regions you are interested in. What you do with the addresses when you have them is another question entirely.
You can also objdump -h (I think it's -h, might be -x) to list the section addresses, including data sections.

Global Data
By "Global" I'm going to assume you mean all the data that is not dynamically allocated using new, malloc, HeapAlloc, VirtualAlloc etc - the data that you may declare in your source code that is outside of functions and outside of class definitions.
You can locate these by loading each DLL as a PE File in a PE file reader and determining the locations of the .data and .bss sections (these may have different names for different compilers). You need to do this for each DLL. That gives you the general locations for this data for each DLL. Then, if you have debugging information, or failing that, a MAP file, you can map the DLL addresses against the debug info/mapfile info to get names and exact locations for each variable.
You may find the PE Format DLL helps you perform this task much easier than writing the code to query the PE file yourself.
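The section lookup described above can be sketched in portable C. This is a minimal sketch, assuming the whole PE image (read from disk, or the module as mapped in memory) is available in a byte buffer. The names find_pe_section and section_range are hypothetical and were made up for this example. The field offsets follow the published PE layout.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read little-endian integers from a byte buffer (PE files are little-endian). */
static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t rd32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical result type for this sketch. */
typedef struct {
    uint32_t rva;          /* VirtualAddress: offset from the module base */
    uint32_t virtual_size; /* VirtualSize: size of the section once mapped */
} section_range;

/* Look up a section (for example ".data" or ".bss") in a PE image held in buf.
   Returns 1 and fills *out on success, 0 if the image is malformed or the
   section is absent. */
int find_pe_section(const uint8_t *buf, size_t len, const char *name,
                    section_range *out)
{
    if (len < 0x40 || buf[0] != 'M' || buf[1] != 'Z') return 0;
    size_t pe = rd32(buf + 0x3C);                /* e_lfanew */
    if (pe > len || len - pe < 24) return 0;
    if (memcmp(buf + pe, "PE\0\0", 4) != 0) return 0;

    uint16_t nsect   = rd16(buf + pe + 4 + 2);   /* NumberOfSections */
    uint16_t opt_len = rd16(buf + pe + 4 + 16);  /* SizeOfOptionalHeader */
    size_t table = pe + 24 + opt_len;            /* first IMAGE_SECTION_HEADER */

    char padded[8] = {0};                        /* names are 8 bytes, NUL padded */
    for (size_t i = 0; i < 8 && name[i] != '\0'; i++) padded[i] = name[i];

    for (uint16_t i = 0; i < nsect; i++) {
        size_t hdr = table + (size_t)i * 40;     /* each header is 40 bytes */
        if (hdr > len || len - hdr < 40) return 0;
        if (memcmp(buf + hdr, padded, 8) == 0) {
            out->virtual_size = rd32(buf + hdr + 8);
            out->rva          = rd32(buf + hdr + 12);
            return 1;
        }
    }
    return 0;
}
```

Adding the returned rva to the module's load address gives the in-process location of the section. As noted above, section names vary between compilers, so the names to search for are an assumption you would need to check per toolchain.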
Thread Stacks
Enumerate the threads in the application using ToolHelp32 (or PSAPI library if on Windows NT 4). For each thread, get the thread context and read the ESP register (RSP for x64). Now do a VirtualQuery on the address in the ESP/RSP register read from each context. The 1MB (default value) region around that address (start at mbi.AllocationBase and work upwards 1MB) is the stack location. Note that the stack size may not be 1MB, you can query this from the PE header of the DLL/EXE that started the thread if you wish.
EDIT, Fix typo where I swapped some register names, thanks #interjay

Related

What happens in the kernel when the process accesses an address just allocated with brk/sbrk?

This is actually a theoretical question about memory management. Since different operating systems implement things differently, I'll have to relieve my thirst for knowledge asking how things work in only one of them :( Preferably the open source and widely used one: Linux.
Here is the list of things I know in the whole puzzle:
malloc() is user space. libc is responsible for the syscall job (calling brk/sbrk/mmap...). It gets big chunks of memory, described by ranges of virtual addresses. The library slices these chunks up and uses them to satisfy the user application's requests.
I know what brk/sbrk syscalls do. I know what 'program break' means. These calls basically push the program break offset. And this is how libc gets its virtual memory chunks.
Now that the user application has a new virtual address to manipulate, it simply writes some value to it, like: *allocated_integer = 5;. OK. Now what? If brk/sbrk only updates offsets in the process' entry in the process table, or whatever, how is the physical memory actually allocated?
I know about virtual memory, page tables, page faults, etc. But I wanna know exactly how these things are related to this situation that I depicted. For example: is the process' page table modified? How? When? A page fault occurs? When? Why? With what purpose? When is this 'buddy algorithm' called, and this free_area data structure accessed? (http://www.tldp.org/LDP/tlk/mm/memory.html, section 3.4.1 Page Allocation)
Well, after finally finding an excellent guide (http://duartes.org/gustavo/blog/post/how-the-kernel-manages-your-memory/) and some hours digging the Linux kernel, I found the answers...
Indeed, brk only pushes the virtual memory area.
When the user application hits *allocated_integer = 5;, a page fault occurs.
The page fault routine will search for the virtual memory area responsible for the address and then call the page table handler.
The page table handler goes through each level (2 levels in x86 and 4 levels in x86_64), allocating entries if they're not present (2nd, 3rd and 4th), and then finally calls the real handler.
The real handler actually calls the function responsible for allocating page frames.
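The sequence above can be observed from user space. The following is a minimal sketch for Linux, assuming glibc's sbrk() and getrusage() are available. The function name faults_on_first_touch is made up for this example. It grows the break, writes to each new page once, and reports how many minor page faults those writes caused.

```c
#define _DEFAULT_SOURCE   /* exposes sbrk() in glibc */
#include <stddef.h>
#include <stdint.h>
#include <sys/resource.h>
#include <unistd.h>

/* Grow the program break by `pages` pages, write one byte to each new page,
   and return how many minor page faults the writes caused.
   Returns -1 if sysconf(), sbrk() or getrusage() fails. */
long faults_on_first_touch(size_t pages)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0) return -1;

    /* sbrk only moves the break; no page frames are allocated yet. */
    void *raw = sbrk((intptr_t)(pages * (size_t)page));
    if (raw == (void *)-1) return -1;
    volatile char *mem = raw;

    struct rusage before, after;
    if (getrusage(RUSAGE_SELF, &before) != 0) return -1;

    /* The first write to a page traps into the kernel's page fault handler,
       which allocates a frame and fills in the page table entry. */
    for (size_t i = 0; i < pages; i++)
        mem[i * (size_t)page] = 5;

    if (getrusage(RUSAGE_SELF, &after) != 0) return -1;
    return after.ru_minflt - before.ru_minflt;
}
```

On a typical Linux system the result is expected to be roughly one fault per touched page, but the exact count depends on kernel settings such as transparent huge pages, so treat the number as indicative only.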

How does a PE file get mapped into memory?

So I have been researching the PE format for the last couple of days, and I still have a couple of questions:
Does the data section get mapped into the process' memory, or does the program read it from the disk?
If it does get mapped into its memory, how can the process acquire the offset of the section? (And other sections)
Is there any way to get the entry point of a process that has already been mapped into memory, without touching the file on disk?
Does the data section get mapped into the process' memory
Yes. The file-backed mapping is unlikely to survive for very long, though, because the program is apt to write to that section. A write triggers a copy-on-write page copy, after which the page is backed by the paging file instead of the PE file.
how can the process acquire the offset of the section?
The linker already calculated the offsets of variables in the section. The image might be relocated, which is common for DLLs whose preferred base address is already in use when the DLL gets loaded. In that case the loader uses the relocation table in the PE file to patch the addresses in the code. The pages that contain such patched code get the same treatment as the data section: they are no longer backed by the PE file and cannot be shared between processes.
Is there any way to get the entry point of a process
The entire PE file gets mapped to memory, including its headers. So you can certainly read IMAGE_OPTIONAL_HEADER.AddressOfEntryPoint from memory without reading the file. Do keep in mind that it is painful to do this for another process, since you don't have direct access to its virtual address space. You'd have to use ReadProcessMemory(), which is little joy and unlikely to be faster than reading the file, because the file is pretty likely to be present in the file system cache. The Address Space Layout Randomization feature is apt to give you a headache, as it is designed to make this kind of thing hard.
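Reading the entry point from the mapped headers can be sketched as follows. This is a minimal sketch in portable C. The function name pe_entry_point_rva is made up for this example, and the caller is assumed to supply a pointer to the start of the image plus the number of readable bytes. Inside your own process the pointer could be the HMODULE of the module, which is its base address.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a little-endian 32-bit value from a byte buffer. */
static uint32_t ep_rd32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Fetch AddressOfEntryPoint (an RVA) from a PE image whose headers start at
   base. Returns 1 and stores the RVA in *rva on success, 0 if the buffer
   does not look like a PE image. */
int pe_entry_point_rva(const uint8_t *base, size_t len, uint32_t *rva)
{
    if (len < 0x40 || base[0] != 'M' || base[1] != 'Z') return 0;
    size_t pe = ep_rd32(base + 0x3C);            /* e_lfanew */
    if (pe > len || len - pe < 24 + 20) return 0;
    if (memcmp(base + pe, "PE\0\0", 4) != 0) return 0;

    /* The optional header follows the 4-byte signature and the 20-byte file
       header. AddressOfEntryPoint sits at offset 16 in both PE32 and PE32+. */
    *rva = ep_rd32(base + pe + 24 + 16);
    return 1;
}
```

The value is relative to the load address, so the absolute entry point is the module base plus the returned RVA. That also means ASLR does not change the RVA itself, only the base it is added to.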
Does the data section get mapped into the process' memory, or does the program read it from the disk?
It's mapped into process' memory.
If it does get mapped into its memory, how can the process acquire the offset of the section? (And other sections)
By means of a relocation table: every reference to a global object (data or function) from the executable code, that uses direct addressing, has an entry in this table so that the loader patches the code, fixing the original offset. Note that you can make a PE file without relocation section, in which case all data and code sections have a fixed offset, and the executable has a fixed entry point.
Is there any way to get the entry point of a process that has already been mapped into memory, without touching the file on disk?
Not sure, but if by "not touching" you mean not even reading the file, then you may figure it out by walking up the stack.
Yes, all sections that are described in the PE header get mapped into memory. The IMAGE_SECTION_HEADER struct tells the loader how to map it (the section can for example be much bigger in memory than on disk).
I'm not quite sure if I understand what you are asking. Do you mean how does code from the code section know where to access data in the data section? If the module loads at the preferred load address then the addresses that are generated statically by the linker are correct, otherwise the loader fixes the addresses with relocation information.
Yes, the Windows loader also loads the PE header into memory at the base address of the module. There you can find all the info that was in the file's PE header, including the entry point.
I can recommend this article for everything about the PE format, especially on relocations.
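The relocation fix-up that the answers above describe can be sketched like this. It is a simplified sketch in portable C, not the actual Windows loader code. The function name apply_reloc_block is made up for this example, and it handles only the 32-bit HIGHLOW entry type (value 3) plus ABSOLUTE padding entries (value 0).

```c
#include <stddef.h>
#include <stdint.h>

static uint16_t rel_rd16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t rel_rd32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
static void rel_wr32(uint8_t *p, uint32_t v) {
    p[0] = (uint8_t)v;         p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16); p[3] = (uint8_t)(v >> 24);
}

/* Apply one base relocation block to a mapped image.
   image, image_len : the mapped module
   block, block_len : one block from .reloc (8-byte header + 16-bit entries)
   delta            : actual load address minus the preferred ImageBase
   Returns the number of patched addresses, or -1 on malformed input or an
   entry type this sketch does not handle. */
int apply_reloc_block(uint8_t *image, size_t image_len,
                      const uint8_t *block, size_t block_len, uint32_t delta)
{
    if (block_len < 8) return -1;
    uint32_t page_rva = rel_rd32(block);      /* RVA of the page being patched */
    uint32_t size     = rel_rd32(block + 4);  /* SizeOfBlock, header included */
    if (size < 8 || size > block_len) return -1;

    int patched = 0;
    for (uint32_t off = 8; off + 2 <= size; off += 2) {
        uint16_t entry = rel_rd16(block + off);
        unsigned type  = entry >> 12;               /* high 4 bits: type */
        uint32_t target = page_rva + (entry & 0x0FFF); /* low 12 bits: offset */
        if (type == 0) continue;                    /* ABSOLUTE: padding only */
        if (type != 3) return -1;                   /* only HIGHLOW handled */
        if (target > image_len || image_len - target < 4) return -1;
        /* Add the load delta to the absolute address stored in the image. */
        rel_wr32(image + target, rel_rd32(image + target) + delta);
        patched++;
    }
    return patched;
}
```

If the module loads at its preferred address, delta is zero and nothing needs patching, which matches the point made above about statically generated addresses already being correct.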
Does the data section get mapped into the process' memory, or does the
program read it from the disk?
Yes. Before the operating system's dynamic loader (on Windows or Linux) can execute anything, it must be mapped into memory.
If it does get mapped into its memory, how can the process acquire the
offset of the section? ( And other sections )
A PE file has a well-defined structure, which the loader parses to obtain the relative virtual address of each section with respect to ImageBase. If ASLR (address space layout randomization) is active on the system, the loader also has to use the relocation information to resolve those offsets.
Is there any way to get the entry point of a process that has already
been mapped into memory, without touching the file on disk?
No. To compute the OEP, the operating system's loader adds the ImageBase and AddressOfEntryPoint values from the optional header, and when address randomization is enabled it also uses the relocation table to resolve addresses. So we can't do anything without parsing the PE file on disk.

windows memory segmentation & Ollydbg

A few questions about Windows memory segmentation.
Every process in Windows has its own virtual memory. Does that mean that each process has its own task
(I mean its own Task descriptor or Task gate)?
I opened a simple exe with OllyDbg and saw that each CALL instruction to a DLL function takes me to the jump table. The jump table has jump instructions into the DLLs, like this one:
JMP DWORD PTR DS:[402058]
My question is: why does it use the data segment and not the CS selector for the base address?
If I open the memory map and look at what is stored at 402058, I find that it contains resources.
If I understand correctly, the addresses of the DLL functions are stored in DS?
I noticed that the memory map is organized by owner. Shouldn't it be organized by segment, with all the code in CS, data in DS, etc.?
thank you
1.
A process has its own virtual address space.
I do not understand what you're referring to as "Task descriptor or Task gate", but the Windows operating system holds a descriptor for each process, called the Process Control Block, that contains information about the process (such as identification, access tokens, execution state, virtual memory mapping, etc).
A Task is a logical unit that can be used to manage a single process, or multiple processes.
Job -> Tasks
Task -> Processes
Process -> Threads
2.
In the case you mentioned, which is common for compilers, the program uses the .DATA section to store the jump table after loading the function addresses.
The reason why this happens in the first place is because the compiler cannot know the DLL base address at compile-time, therefore the address has to be fixed at load-time to point to the function. This is known as Relocation.
In order to maintain the jump table separately from the code, compilers store it in the .DATA section. This way, we can also give it write permissions (usually the .DATA segment has write permissions) and modify it as necessary without sacrificing stability and security.
3.
Each module loaded in the process' virtual address space contains its own sections. That's why you see a different set of .text, .data, .reloc etc. for each module. The "Owner" column is the module name.
P.S. Please ask one question per post. That way it will be easily accessible to other users after you get answered, and each question will likely get more accurate answers.

Sharing GlobalAlloc() memory from DLL to multiple Win32 applications

I want to move my caching library to a DLL and allow multiple applications to share a single pointer allocated within the DLL using GlobalAlloc(). How could I accomplish this, and would it result in a significant performance decrease?
You could certainly do this and there won't be any performance implication for a single pointer.
Rather than use GlobalAlloc, a legacy API, you should opt for a different shared heap. For example the simplest to use is the COM allocator, CoTaskMemAlloc. Or you can use HeapAlloc passing the process heap obtained by GetProcessHeap.
For example, and neglecting to show error checking:
void *mem = HeapAlloc(GetProcessHeap(), HEAP_ZERO_MEMORY, size);
Note that you only need to worry about heap sharing if you expect the memory to be deallocated in a different module from where it was created. If your DLL both creates and destroys the memory then you can use plain old malloc. Because all modules live in the same process address space, memory allocated by any module in that process, can be used by any other module.
Update
On first reading of the question I failed to pick up on the possibility that you may want multiple processes to have access to the same memory. If that's what you need, then it is only possible with memory-mapped files, or perhaps with some form of IPC.

Is there any way to determine what type of memory the segments returned by VirtualQuery() are?

Greetings,
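I'm able to walk a process's memory map using logic like this:
MEMORY_BASIC_INFORMATION mbi;
char *lpAddress = NULL;
/* VirtualQuery returns 0 once lpAddress is past the end of the user address space */
while (VirtualQuery(lpAddress, &mbi, sizeof(mbi))) {
    fprintf(fptr, "Mem base:%-18p start:%-18p Size:%-10llx Type:%-10lx State:%-10lx\n",
        mbi.AllocationBase,
        mbi.BaseAddress,
        (unsigned long long)mbi.RegionSize,
        (unsigned long)mbi.Type, (unsigned long)mbi.State);
    /* Advance with pointer arithmetic; casting to unsigned int would truncate addresses on 64-bit Windows */
    lpAddress = (char *)mbi.BaseAddress + mbi.RegionSize;
}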
I'd like to know if a given segment is used for static allocation, stack, and/or heap and/or other?
Is there any way to determine that?
I'm curious, what do you plan on doing with this information?
There is a windbg extension, !address, which can get you this information, if you don't need code to do it. Scripting the debugger will probably be much more reliable to get this info.
VirtualQuery can't return this information to you on its own, since it has no idea why user mode code requested the memory. You need to use it with other information sources to get this info, and there may still be some error cases.
First, you should filter only by MEM_PRIVATE memory . . . heap, stack, and static allocations (provided they've been modified) should fall within that range.
Static allocations (globals, etc.) should be at an address with a loaded module. You can use PSAPI to determine if the address is within a loaded module, for example, calling EnumProcessModules and then GetModuleInformation.
For stack values, you can use the toolhelp API to determine if the memory location is in a stack. Call CreateToolhelp32Snapshot with TH32CS_SNAPTHREAD to get the threads in the target process, then call GetThreadContext on each and check if the resulting stack pointer is within the segment.
I don't know of a good way to walk heaps from outside the process. Toolhelp snaps a heap list but doesn't give you a good set of bounds for the heap memory. From within the process, you can use GetProcessHeaps to walk the list of heaps, and then call HeapValidate to determine if the memory location is within the heap.
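The decision order described above can be sketched as a small classifier. This is a minimal sketch in portable C, assuming the inputs have already been gathered: module ranges (for example from EnumProcessModules and GetModuleInformation) and one stack pointer per thread (for example from GetThreadContext). The types mem_range and region_kind and the function classify_region are made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical types for this sketch. */
typedef struct { uintptr_t base; size_t size; } mem_range;
typedef enum { REGION_MODULE, REGION_STACK, REGION_OTHER } region_kind;

/* True if addr lies inside [r.base, r.base + r.size). */
static int in_range(mem_range r, uintptr_t addr)
{
    return addr >= r.base && addr - r.base < r.size;
}

/* Classify one region reported by a memory walk.
   modules    : address ranges of the loaded modules
   stack_ptrs : current stack pointer of each thread
   A region that starts inside a module is treated as static data or code,
   a region that contains a thread's stack pointer as stack, and anything
   else as heap or other private memory. */
region_kind classify_region(mem_range region,
                            const mem_range *modules, size_t nmodules,
                            const uintptr_t *stack_ptrs, size_t nthreads)
{
    for (size_t i = 0; i < nmodules; i++)
        if (in_range(modules[i], region.base)) return REGION_MODULE;
    for (size_t i = 0; i < nthreads; i++)
        if (in_range(region, stack_ptrs[i])) return REGION_STACK;
    return REGION_OTHER;
}
```

REGION_OTHER deliberately lumps heap memory together with direct VirtualAlloc and file mapping allocations, since telling those apart needs the heap-walking step that is hard to do from outside the process.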

Resources