Problems reading memory while analyzing Windows kernel crash dumps

While analyzing Windows kernel crash dumps with WinDbg, I have often run into the problem of WinDbg not being able to read some memory locations. Recently, while analyzing a kernel crash dump (a minidump file), I observed that of six stack variables (including two parameters), WinDbg successfully dumped the values of four, but it returned an error for the other two. I could not understand this, because all six variables were part of the same stack frame.
In addition, I noticed that when I tried to dump a global data structure, WinDbg returned an error indicating "Unable to read memory at address 0xfffff801139c50d0". I could not understand why WinDbg could not read a variable that had been defined globally in my driver.
I had loaded the symbols properly, including the PDB file of my driver, and WinDbg did not report any symbol-related errors.
I want to understand the reason for this behavior. Why does WinDbg fail to read the values of local and global variables? Can someone explain this behavior?

Assuming you already have access to private symbols, this error is commonly caused by code optimization in the driver: the PDBs do not contain enough information to determine the correct location of variables at all times.
Use !lmi <module name> and check whether the Characteristics field contains "perf" to determine whether the code is optimized.
As advised in Debugging Performance Optimized Code: The resulting optimization reduces paging (and page faults) and increases spatial locality between code and data. It addresses a key performance bottleneck that would be introduced by poor positioning of the original code. A component that has gone through this optimization may have its code or data blocks within a function moved to different locations in the binary.
In modules that have been optimized by these techniques, the locations of code and data blocks will often be found at memory addresses different than the locations where they would reside after normal compilation and linking. Furthermore, functions may have been split into many non-contiguous blocks, in order that the most commonly-used code paths can be located close to each other on the same pages.
Therefore, a function (or any symbol) plus an offset will not necessarily have the same meaning it would have in non-optimized code. The rule of thumb when working with performance-optimized code is simply that you cannot perform reliable address arithmetic on it.
You need to check the output of dv /V to determine where the debugger is actually looking for locals, and confirm this is correct.
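As a minimal sketch (the module name and frame number are made up), checking this in the debugger might look like:

    0: kd> !lmi mydriver ; $$ look for "perf" in the Characteristics field
    0: kd> .frame 2      ; $$ select the frame that owns the variables
    0: kd> dv /V         ; $$ show the address or register the debugger reads each local from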

Related

How to reserve a range of memory in the data section (RAM) and prevent the heap/stack of the same application from using that memory?

I want to reserve/allocate a range of memory in RAM, and the same application should not overwrite or use that range for heap/stack storage. How do I allocate a range of memory in RAM that is protected from stack/heap overwrites?
I thought about adding (or allocating) an array in the application itself to reserve the memory, but it is optimized out by the compiler because it is not referenced anywhere in the application.
I am using ARM GNU toolchain for compiling.
There are several solutions to this problem. Listed from best to worst:
Use the linker
Annotate the variable
Global scope
Volatile (maybe)
Linker script
You can obviously use a linker file to do this. It is the proper tool for the job. Pass the linker the --verbose parameter to see what the default script is. You may then modify it to precisely reserve the memory.
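As a rough sketch (the origins and lengths are made up and must match your part's actual memory map), the reserved region might look like this in the script:

    MEMORY
    {
        RAM      (rwx) : ORIGIN = 0x20000000, LENGTH = 60K
        RESERVED (rw)  : ORIGIN = 0x2000F000, LENGTH = 4K
    }
    /* Nothing is ever placed in RESERVED, so sections, heap and stack
       (which all live in RAM) cannot reach it. */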
Variable Attributes
With more recent versions of gcc, the used attribute will also do what you want; most modern gcc versions support it. It is also significantly easier than the linker script, but only the linker script gives precise control over the position of the hole in a reliable manner.
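A minimal sketch in C (the array size and section name are arbitrary):

    /* 4 KiB hole; 'used' keeps the array even though nothing references it,
       and 'section' lets a linker script place it at a known address. */
    static unsigned char reserved_area[4096]
        __attribute__((used, section(".reserved_area")));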
Global scope
You may also give your array global scope and the compiler should not eliminate it. This may not be true if you use link time optimization.
Volatile
Theoretically, a compiler may eliminate a static volatile array. volatile really comes into play when you have code that accesses the array: it modifies the access behavior so that the compiler never caches accesses to that range (see Dr. Dobbs on volatile). At the least, the behavior is unclear to me and I would not recommend this method; it may work with some versions (and optimization levels) of the compiler and not others.
Limitations
Also, the linker option --gc-sections can eliminate space reserved with either the global-scope or the volatile method, as the symbol may not be annotated in any way in the object format; see KEEP in the linker script.
Only the linker script can definitively prevent overwrites by the stack. You need to position the top of the stack before your reserved area. Typically the heap grows up and the stack grows down, so the two collide with each other. This is particular to your environment/C library (for instance, newlib is the typical ARM bare-metal library); looking at the linker file will give the best clue to this.
My guess is that you want a fallow area reserved for some sort of debugging information in the event of a system crash? A more explicit explanation of your problem would be helpful. You don't seem to be concerned with the position of the memory, so I guess this is not hardware related.

Reading huge files using Memory Mapped Files

I see many articles suggesting not to map huge files with mmap, so that the virtual address space isn't taken up entirely by the mapping.
How does that change with 64 bit process where the address space dramatically increases?
If I need to randomly access a file, is there a reason not to map the whole file at once? (a file of dozens of GBs)
On 64-bit, go ahead and map the file.
One thing to consider, based on Linux experience: if the access is truly random and the file is much bigger than you can expect to cache in RAM (so the chances of hitting a page again are slim), then it can be worth specifying MADV_RANDOM to madvise, to stop the steady, pointless accumulation of touched file pages that swaps other, actually useful stuff out. No idea what the Windows equivalent API is, though.
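A minimal sketch on Linux (error handling abbreviated; how you obtain the path is up to you):

    #include <fcntl.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    /* Map an entire file read-only and hint that access will be random. */
    static void *map_whole_file(const char *path, size_t *len_out)
    {
        int fd = open(path, O_RDONLY);
        if (fd < 0)
            return NULL;

        struct stat st;
        if (fstat(fd, &st) != 0) {
            close(fd);
            return NULL;
        }

        void *base = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        close(fd);                       /* the mapping keeps the file referenced */
        if (base == MAP_FAILED)
            return NULL;

        /* Expect random access: the kernel can skip read-ahead for this mapping. */
        madvise(base, st.st_size, MADV_RANDOM);

        *len_out = (size_t)st.st_size;
        return base;
    }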
There's a reason to think carefully about using memory-mapped files, even on a 64-bit platform (where virtual address space size is not an issue). It's related to the (potential) error handling.
When reading the file "conventionally", any I/O error is reported by the appropriate function's return value; the rest of the error handling is up to you.
OTOH, if the error arises during the implicit I/O (resulting from a page fault and the attempt to load the needed portion of the file into the appropriate memory page), the error-handling mechanism depends on the OS.
In Windows the error handling is performed via SEH, so-called "structured exception handling". The exception propagates to user mode (the application's code), where you have a chance to handle it properly. Proper handling requires you to compile with the appropriate exception-handling settings in the compiler (to guarantee the invocation of destructors, if applicable).
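A sketch of what that handling might look like with MSVC (the helper name is made up):

    #include <windows.h>

    /* Returns TRUE and stores the byte on success, FALSE if the page-in failed. */
    BOOL read_mapped_byte(const unsigned char *view, size_t offset, unsigned char *out)
    {
        __try {
            *out = view[offset];            /* may fault the page in from the file */
            return TRUE;
        }
        __except (GetExceptionCode() == EXCEPTION_IN_PAGE_ERROR
                      ? EXCEPTION_EXECUTE_HANDLER : EXCEPTION_CONTINUE_SEARCH) {
            return FALSE;                   /* I/O error during the implicit read */
        }
    }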
I don't know how the error handling is performed in unix/linux though.
P.S. I don't say don't use memory-mapped files. I say do this carefully
One thing to be aware of is that creating the mapping requires a big contiguous chunk of (virtual) address space; on a 32-bit system this particularly sucks, because on a loaded system getting a long run of contiguous address space is unlikely and the mapping will fail. On a 64-bit system this is much easier, as the upper bound of 64 bits is... huge.
If you are running code in controlled environments (e.g. 64-bit server environments you are building yourself and know to run this code just fine) go ahead and map the entire file and just deal with it.
If you are trying to write general-purpose code that will be in software that could run on any number of configurations, you'll want to stick to a smaller, chunked mapping strategy: for example, mapping large files as collections of 1 GB chunks and having an abstraction layer that takes operations like read(offset) and converts them to an offset in the right chunk before performing the op.
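A rough sketch of the offset arithmetic, assuming fixed-size chunks (the chunk size and names are made up):

    #include <stddef.h>
    #include <stdint.h>

    #define CHUNK_SIZE ((uint64_t)1 << 30)       /* 1 GB chunks, arbitrary */

    /* Hypothetical table of base addresses of the currently mapped chunks. */
    extern unsigned char *chunk_base[];

    /* Translate a file offset into a pointer inside the right chunk. */
    static const unsigned char *ptr_for_offset(uint64_t file_offset)
    {
        size_t   chunk  = (size_t)(file_offset / CHUNK_SIZE);
        uint64_t within = file_offset % CHUNK_SIZE;
        return chunk_base[chunk] + within;
    }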
Hope that helps.

Forcing Windows to load DLL's at places so that memory is minimally fragmented

My application needs lots of memory and big data structures in order to perform its work.
Often the application needs more than 1 GB of memory, and in some cases my customers really need to use the 64-bit version of the application because they have several gigabytes of memory.
In the past, I could easily explain to the user that if memory usage reached 1.6 to 1.7 GB, the application was 'out of memory' or really close to it, and that they needed to reduce their memory usage or move to the 64-bit version.
Over the last year I have noticed that the application often runs out of memory when it is only using about 1 GB. After some investigation, it appeared that the cause of this problem is memory fragmentation. I used VMMap (a SysInternals utility) to look at the memory usage of my application and saw something like this:
The orange areas are memory allocated by my application. The purple areas are executable code.
As you can see in the bottom half of the image, the purple areas (which are the DLLs) are loaded at many different addresses, fragmenting my address space. This isn't really a problem if my customer does not have a lot of data, but if my customer has data sets that take more than 1 GB and a part of the application needs a big block of memory (e.g. 50 MB), an allocation can fail, causing my application to crash.
Most of my data structures are STL-based and often don't require big chunks of contiguous memory, but in some cases (e.g. very big strings), it is really needed to have a contiguous block of memory. Unfortunately, it is not always possible to change the code so that it doesn't need such a contiguous block of memory.
The questions are:
How can I influence where DLLs are loaded in memory, without explicitly running REBASE on all the DLLs on the customer's computer, and without loading all DLLs explicitly?
Is there a way to specify load addresses of DLLs in your application's manifest file?
Or is there a way to tell Windows (via the manifest file?) not to scatter the DLLs around (I think this scattering is called ASLR)?
Of course the best solution is one that I can influence from within my application's manifest file, since I rely on the automatic/dynamic loading of DLL's by Windows.
My application is a mixed mode (managed+unmanaged) application, although the major part of the application is unmanaged.
Any suggestions?
First, virtual address space fragmentation does not necessarily cause an out-of-memory condition. That would be the case only if your application had to allocate contiguous memory blocks of a corresponding size; otherwise the impact of the fragmentation should be minor.
You say most of your data is "STL-based", but if for example you allocate a huge std::vector you'll need a contiguous memory block.
AFAIK there is no way to specify the preferred mapping address of a DLL when it is loaded, so there are only two options: either rebase it (the DLL file), or implement the DLL loading yourself (which is not trivial, of course).
Usually you don't need to rebase the standard Windows API DLLs; they are mapped into your address space very tightly. Fragmentation may come from third-party DLLs (Windows hooks, antivirus injections, etc.).
You cannot do this with a manifest; it must be done with the linker's /BASE option (Linker + Advanced + Base Address in the IDE). The most flexible way is to use the /BASE:@filename,key syntax so that the linker reads the base address from a text file.
The best way to fill that text file is from the Debug + Windows + Modules window. Load the Release build of your program in the debugger and load up the whole shebang. Debug + Break All, bring up the window, and copy-paste it into the text file. Edit it to match the required format, calculating load addresses from the Address column. Leave enough space between the DLLs so that you don't constantly have to tweak the text file.
If you are able to execute some of your own code before the libraries in question are loaded, you could reserve yourself a nice big chunk of address space ahead of time to allocate from.
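A sketch of what that early reservation might look like (the 512 MB size is arbitrary):

    #include <windows.h>

    /* Reserve (but do not commit) a large block of address space early in
       start-up, before DLL loads get a chance to fragment it. */
    static void *g_reserved;

    void reserve_big_block(void)
    {
        const SIZE_T size = 512u * 1024 * 1024;   /* 512 MB, arbitrary */
        g_reserved = VirtualAlloc(NULL, size, MEM_RESERVE, PAGE_NOACCESS);
        /* Later, commit pieces with VirtualAlloc(addr, len, MEM_COMMIT, PAGE_READWRITE),
           or release the whole block with VirtualFree(g_reserved, 0, MEM_RELEASE). */
    }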
Otherwise, you need to identify the DLLs responsible, to determine why they are being loaded. For example, are they part of .NET, the language's runtime library, your own code, or third party libraries you are using?
For your own code the most sensible solution is probably to use static instead of dynamic linking. This should also be possible for the language runtime and may be possible for third party libraries.
For third party libraries, you can change from using implicit loading to explicit loading, so that loading only takes place after you've reserved your chunk of address space.
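For example, a hypothetical ThirdParty.dll switched to explicit loading might look roughly like this (the DLL and export names are made up):

    #include <windows.h>

    typedef int (WINAPI *third_party_fn_t)(int);

    /* Load the DLL only after our big block of address space has been reserved. */
    int call_third_party(int arg)
    {
        HMODULE mod = LoadLibrary(TEXT("ThirdParty.dll"));          /* hypothetical DLL */
        if (!mod)
            return -1;

        third_party_fn_t fn =
            (third_party_fn_t)GetProcAddress(mod, "ThirdPartyFunction");  /* hypothetical export */
        int result = fn ? fn(arg) : -1;

        FreeLibrary(mod);
        return result;
    }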
I don't know if there's anything you can do about .NET libraries; but since most of your code is unmanaged, it might be possible to eliminate the managed components in order to get rid of .NET. Or perhaps you could split the .NET parts into a separate process.

Estimating the memory size of a software

I'm working on the development of Boot that will be embedded in a PROM chip for a project.
I was tasked with making an estimation of the final memory size that the software will probably take but I've never done this before.
I searched a bit around and I'm thinking about doing the following:
Counting all the variables; this size goes directly into the total
Estimating the number of lines of code each function will take (the code hasn't been written yet)
Finding an approximate number of asm instructions per C line
Total size = total lines of code * average asm instructions per C line * 4 bytes (32-bit instructions)
My approach could very well be bogus; I hope someone will be able to help.
In principle, you are on the right track.
You need to distinguish between several types of memory footprint:
Stack
Dynamic memory (malloc, new, etc.)
Initialised variables
Un-initialised variables
Code
Stack is mostly impacted by recursion, local variables and function parameters.
Dynamic memory (heap) is obvious and also probably not relevant to you - so I'll ignore it for now.
Initialised variables are interesting since you need to count them twice - once for the program footprint on the PROM (similar to code and constants) and once for the RAM footprint.
Un-initialised variables obviously go into RAM, and counting their size is almost good enough (you also need to consider alignment and padding).
The hardest thing to estimate is the code, or what goes into the PROM. You need to count constants and local variables as well as the code. The code itself is more or less what you suspect (after adding padding, alignment, function-call overhead, interrupt vector initialisation, etc.), but many things can make it larger than expected, such as inline functions, library functions (many seemingly trivial operations involve such functions), casts, etc.
One way of answering the question would be from experience, or by assessment of existing code with similar functionality. However, a number of factors affect code size:
Target architecture and instruction set.
Compiler and compiler options used.
Library code usage.
Capability of development staff.
Required functionality.
The "development of Boot" tells us nothing about the requirements or functionality of your boot process. This will have the greatest affect on code size. As an example of how target can make a difference, 8-bit targets typically have greater code density, but generate more code for arithmetic on larger data types, while on say an ARM target where you can select between Thumb and ARM instruction sets, the code density will change significantly.
If you have no prior experience or representative code base to work from, then I suggest you perform a few experiments to get some metrics you can work with:
Build an empty application - just an empty main() function if C or C++; that will give you the basic fixed overhead of the runtime start-up.
If you are using library code, that will probably take a significant amount of space; add dummy calls to all library interfaces you will make use of in the final application, that will tell you how much code will be taken up by library code (assuming the library code is not in-lined).
Thereafter it will depend on functionality; you might implement a subset of the required functionality, and then estimate what proportion of the final build that might constitute.
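As a trivial sketch of the first experiment (build it with your real compiler and options, then read the text/data/bss totals from the map file or the toolchain's size utility):

    /* empty.c - measures the fixed overhead of the runtime start-up. */
    int main(void)
    {
        return 0;
    }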
Regarding your suggestions, remember that variables do not occupy space in ROM, though any constant initialisers will do so. Typically a boot-loader can use all available RAM because the application start-up will re-establish a new runtime environment for itself, discarding the boot-loader environment and variables.
If you were to provide details of functionality and target, you may be able to leverage the experience of the community in estimating the required resources. For example, I might be able to tell you (from experience) that a boot-loader with support for Flash programming that loads via a UART using the XMODEM protocol, on an ARM7 using the ARM instruction set, will fit in 4 KB; that adding support for loading via SD card may add a further 6 KB; and that a USB virtual COM port may add a further 4 KB. However, your requirements are possibly unique and you will have to determine the resource load for yourself somehow.

Why code segment is common for different instances of same program

I wanted to know why the code segment is common for different instances of the same program.
For example: consider program P1.exe running; if another copy of P1.exe is running, the code segment is common to both running instances. Why is that?
If the code segment in question is loaded from a DLL, it might be the operating system being clever and re-using the already loaded library. This is one of the core points of using dynamically loaded library code, it allows the code to be shared across multiple processes.
Not sure if Windows is clever enough to do this with the code sections of regular EXE files, but it would make sense if possible.
It could also be virtual memory fooling you; two processes can look like they have the same thing on the same address, but that address is virtual, so they really are just showing mappings of physical memory.
Code is typically read-only, so it would be wasteful to make multiple copies of it.
Also, Windows (at least, I can't speak for other OSes at this level) uses the paging infrastructure to page code in and out directly from the executable file, as if it were a paging file. Since you are dealing with the same executable, it is paging from the same location to the same location.
Self-modifying code is effectively no longer supported by modern operating systems. Generating new code is possible (by setting the correct flags when allocating memory) but this is separate from the original code segment.
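On Windows, for example, new executable memory is typically obtained roughly like this (a sketch; error handling omitted):

    #include <windows.h>

    /* Allocate a buffer the process may execute from; it is completely
       separate from the (read-only, shared) code section of the EXE. */
    void *alloc_code_buffer(SIZE_T size)
    {
        void *p = VirtualAlloc(NULL, size, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
        /* ... write the generated instructions into p ... */
        DWORD old;
        VirtualProtect(p, size, PAGE_EXECUTE_READ, &old);    /* then make it executable */
        FlushInstructionCache(GetCurrentProcess(), p, size);
        return p;
    }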
The code segment is (supposed to be) static (does not change) so there is no reason not to use it for several instances.
Just to start at a basic level: segmentation is just one way to implement memory isolation and partitioning; paging is another. For the most part, anything you can achieve via segmentation, you can achieve via paging. As such, most modern operating systems on x86 forego segmentation altogether, relying completely on the paging facilities.
Because of this, all processes will usually be running under the trivial segment (Base = 0, Limit = 4 GB, Privilege level = 3), which means the code/data segment registers play no real part in determining the physical address and are just used to set the privilege level of the process. All processes usually run at the same privilege level, so they should all have the same value in the segment register.
Edit
Maybe I misinterpreted the question. I thought the question author was asking why both processes have the same value in the code segment register.
