Please help if you can.
I need to debug access to a variable in a kernel module through a mechanism that prints the stack each time the variable is accessed. I know the easiest way is a hardware breakpoint (also known as a watchpoint), https://github.com/torvalds/linux/blob/master/kernel/events/hw_breakpoint.c which uses a processor debug register: the monitored address is written to the register, and the processor generates an interrupt that can be handled by a callback function that does whatever you need.
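For reference, here is roughly how that API is used on kernels that have it (a sketch based on samples/hw_breakpoint/data_breakpoint.c in the kernel tree; "my_variable" is a placeholder for the symbol I want to watch):

    /* Sketch based on samples/hw_breakpoint/data_breakpoint.c;
     * "my_variable" is a placeholder for the symbol being watched. */
    #include <linux/kernel.h>
    #include <linux/module.h>
    #include <linux/kallsyms.h>
    #include <linux/perf_event.h>
    #include <linux/hw_breakpoint.h>

    static struct perf_event * __percpu *wp;

    static void wp_handler(struct perf_event *bp,
                           struct perf_sample_data *data,
                           struct pt_regs *regs)
    {
            dump_stack();   /* print the stack of whoever touched it */
    }

    static int __init wp_init(void)
    {
            struct perf_event_attr attr;

            hw_breakpoint_init(&attr);
            attr.bp_addr = kallsyms_lookup_name("my_variable");
            attr.bp_len  = HW_BREAKPOINT_LEN_4;
            attr.bp_type = HW_BREAKPOINT_W | HW_BREAKPOINT_R;

            wp = register_wide_hw_breakpoint(&attr, wp_handler, NULL);
            if (IS_ERR((void __force *)wp))
                    return PTR_ERR((void __force *)wp);
            return 0;
    }

    static void __exit wp_exit(void)
    {
            unregister_wide_hw_breakpoint(wp);
    }

    module_init(wp_init);
    module_exit(wp_exit);
    MODULE_LICENSE("GPL");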
But unfortunately my project uses a Linux kernel older than the version in which this functionality was added, and I cannot upgrade the kernel due to project limitations.
I found that the kernel tree has watch.c for MIPS:
https://github.com/torvalds/linux/blob/master/arch/mips/kernel/watch.c
Do these functions provide functionality similar to hw breakpoints, or not? I cannot find documentation for them.
The root cause of my need is that something indirectly corrupts a pointer in a kernel module, and the result is an "unaligned memory access" kernel crash.
So perhaps there are other debugging techniques that could help find the part of the code doing this, as an alternative to breakpoints (watchpoints)?
Thanks a lot in advance for any helpful information.
I wrote a kernel-mode driver in C. When I examined it using Dependency Walker, I saw that it depends on some NT*.dll files and HAL.dll.
I have several questions:
When does the OS load these DLLs? I thought the kernel is responsible for loading DLLs; in that case, how can a driver load a DLL if it is already in kernel mode?
Why don't the standard C dependencies show up, like ucrtbase, concrt, vcruntime, msvcp, etc.? Would it be possible for a driver to have these dependencies and still function?
(A continuation of the last question.) If Windows will still load DLLs even in kernel mode, I don't see why drivers cannot be written in (MS) C++.
Thanks,
Most of the APIs used in a driver are exported from ntoskrnl.exe.
Your driver is actually a "kernel module" that is part of a process, just like the modules in Ring3.
The driver's "process" is "System", with a PID of 4, which you can see in Task Manager.
ntoskrnl.exe and HAL.dll are modules in "System"; they are loaded at system startup, while other modules (such as your drivers) are loaded at time of use.
You can write and load "driver DLLs", but I haven't done so myself, so I can't answer that part.
Ring3 modules are not loaded into the kernel, so you can't call many common Ring3 APIs, but Microsoft has provided kernel-mode alternatives for most of them; a small sketch follows.
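For example, instead of malloc/free a driver uses the kernel pool allocator (a minimal sketch; 'gTyM' is just an arbitrary four-byte pool tag used by diagnostic tools):

    #include <ntddk.h>

    /* Sketch: the kernel-mode counterpart of malloc/free. */
    VOID PoolExample(VOID)
    {
        PVOID buf = ExAllocatePoolWithTag(NonPagedPool, 256, 'gTyM');

        if (buf != NULL) {
            RtlZeroMemory(buf, 256);       /* use the buffer... */
            ExFreePoolWithTag(buf, 'gTyM');
        }
    }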
You can't load a Ring3 module directly into the kernel and call its exported functions. There may be some very complicated methods or tricks to do this, but it's really not necessary.
You can write drivers in C++, but Microsoft does not officially recommend it at this time, because you will run into many problems, such as:
Constructors and destructors of global variables are not called automatically.
You can't use the C++ standard library directly.
You can't use new and delete directly; they need to be overridden.
C++ exceptions can't be used directly, and supporting them manually consumes a lot of stack space. Ring0 driver stacks are usually much smaller than Ring3 application stacks, so this may cause a BSOD.
Fortunately:
Some clever people have solved most of these problems, such as the automatic calling of constructors and destructors and the use of standard libraries.
GitHub Project Link (but I still don't recommend using standard libraries in the kernel unless necessary, because they are too complex and large and can lead to unanticipated issues).
My friend told me that Microsoft seems to have a small team currently trying to make drivers support C++, but I haven't had time to confirm this claim.
I have an embedded project which runs on a 68332 target (68k family). There is no OS on the target. We have a custom simulator that allows our code to execute within Windows; the simulator is completely within our control to modify. Basically the simulator executes the machine code, which isn't very good when you need to debug. What I would really like is to interface a debugger so we can debug at the source level rather than at the machine/assembly level. Has anyone ever done such a thing? Is there a spec that debuggers support? Would something like gdb work for this? Any advice is appreciated.
This is not necessarily an answer to your question: I'm not familiar with hooking up an existing third-party debugger to a program executing inside a VM, so I can't advise on that.
However, you control the source of your simulator, so you can try implementing an interface (maybe a local socket, etc.) through which the simulator keeps reporting status information about the executing code, and link that up with source files by reading debug information from a generated debugging database. You'd likely have to support reading the debugging format of the compiler that compiles your 68k code, then use that information to map assembly instructions back to source lines.
This way you're effectively implementing a debugger, but since you already have the simulator (a VM, really), that's probably not too much extra work: the simulator already has all the state information about the executing 68k code; you just need a way to temporarily pause execution and extract state information during the pause. Stepping through code after that is probably a trivial repeat of these steps.
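As for the "is there a spec" part of the question: GDB's remote serial protocol is one widely implemented interface, and if your simulator speaks it over a socket, a gdb built for m68k targets can attach with "target remote". Below is a minimal, hypothetical sketch of just the packet framing; recv_byte() and send_byte() are placeholders for the simulator's socket I/O:

    /* Hypothetical sketch of GDB remote-serial-protocol framing. Packets
     * look like $<payload>#<2-hex-digit checksum>, acknowledged with '+'. */
    #include <stdio.h>
    #include <stddef.h>

    extern int  recv_byte(void);    /* assumed: blocking read, -1 on EOF */
    extern void send_byte(char c);  /* assumed: write one byte */

    static void send_packet(const char *payload)
    {
        unsigned sum = 0;
        char tail[3];

        send_byte('$');
        for (const char *p = payload; *p; p++) {
            sum += (unsigned char)*p;
            send_byte(*p);
        }
        snprintf(tail, sizeof tail, "%02x", sum & 0xffu);
        send_byte('#');
        send_byte(tail[0]);
        send_byte(tail[1]);
    }

    static int recv_packet(char *buf, size_t len)
    {
        int c;
        size_t i = 0;

        while ((c = recv_byte()) != '$')      /* wait for packet start */
            if (c < 0)
                return -1;
        while ((c = recv_byte()) != '#' && i + 1 < len)
            buf[i++] = (char)c;
        buf[i] = '\0';
        recv_byte();                          /* two checksum digits, */
        recv_byte();                          /* not verified here    */
        send_byte('+');                       /* acknowledge          */
        return 0;
    }

    /* A stub's main loop then dispatches on buf[0]: 'g' = read registers,
     * 'm' = read memory, 'c' = continue, 's' = single-step, and so on. */

The payoff of going this route instead of a fully custom interface is that you get gdb's source-level UI for free once the stub answers the register, memory, and stepping packets.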
A device driver for a camera is provided to us as a .so library file by the vendor.
Only a header file with the APIs is available, which lists the functions we can use to work with the device. Our application is linked with the vendor's .so file and uses the provided interface functions.
When we wanted to measure the time our application spends handling different tasks, we added the GCC -pg flag and compiled and built the application.
But we found that with the executable built with -pg, we observe random failures in the camera image-acquisition functions. Since we only have the .so library file, we do not know what is going wrong inside those functions.
So, in general, I want to understand the possible reasons for such a failure mode. Any pointers or documents explaining what happens inside profiling and its side effects would be appreciated.
This answer is a helpful overview of how the gcc -pg profiler actually works. The take-home point mostly concerns changes to timing: if your library is time-sensitive in any way, the profiler's overhead might change how long parts of the code take to execute, perhaps violating some constraint.
If you look at the gprof documentation, it explains the implementation details:
Profiling works by changing how every function in your program is compiled so that when it is called, it will stash away some information about where it was called from. From this, the profiler can figure out what function called it, and can count how many times it was called. This change is made by the compiler when your program is compiled with the `-pg' option, which causes every function to call mcount (or _mcount, or __mcount, depending on the OS and compiler) as one of its first operations.
So the timing of your application would change quite a bit when you turn on -pg.
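You can see this instrumentation yourself by compiling a trivial file with -pg and inspecting the generated assembly (a sketch; the exact hook symbol and calling sequence vary by platform, compiler, and flags):

    /* foo.c -- compile with `gcc -pg -S foo.c` and inspect foo.s: each
     * function prologue gains a call to the profiling hook (mcount,
     * _mcount, __mcount, or __fentry__, depending on platform/flags). */
    int add(int a, int b)
    {
        /* with -pg the compiler emits, roughly,
         *     call mcount
         * before the body below -- this hook runs on EVERY call, which
         * is where the timing distortion comes from */
        return a + b;
    }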
If you would like to instrument your code without significantly affecting the timing, look at oprofile; it does not impose as significant an overhead as gprof does.
Another fairly recent option that makes a good lightweight profiler is perf.
Profiling tools are useful primarily for understanding the CPU-bound pieces of your library/application and can help you optimize those critical pieces. Most of the time they identify some culprit function/method that wastes CPU cycles, so do not use them as the sole tool for debugging any and all issues.
Most vendor libraries also provide ways to turn on extra debugging or to dump extra information at runtime: environment variables, log files, /proc or /sys interfaces for drivers, etc., and sometimes even tools to raise the debug level at runtime. See if you can leverage these.
If the library/driver has well-defined APIs, run unit tests on them instead of trying to debug the whole application you've built.
If you find that a certain unit test fails, send its source code to your vendor and ask them to fix the bug. If it is not a bug, your vendor will at least point you toward the right set of APIs or the semantics to use.
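For example (purely hypothetical names: camera_open, camera_acquire_frame, and camera_close stand in for whatever the vendor header actually declares), a minimal test that exercises only the failing call might look like this:

    /* Hypothetical unit test isolating one vendor call. */
    #include <assert.h>
    #include <stdio.h>
    #include "camera_api.h"   /* assumed vendor header name */

    int main(void)
    {
        void *h = camera_open(0);
        assert(h != NULL);

        for (int i = 0; i < 1000; i++) {      /* hammer the failing call */
            if (camera_acquire_frame(h) != 0) {
                fprintf(stderr, "acquire failed at iteration %d\n", i);
                return 1;
            }
        }

        camera_close(h);
        puts("ok");
        return 0;
    }

Building the same test once with and once without -pg would tell you whether the profiling instrumentation alone is enough to trigger the failure.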
Breakpoints are one of the coolest features supported by most popular debuggers like GDB. But how does a breakpoint work? What code modifications does the compiler make to achieve it? Are there any special hardware features used to support breakpoints?
The compiler does not need to "modify" the binary in any way to support breakpoints. However, it is important that:
The compiler includes enough information in the executable (not in the code itself, but in special sections of the same file) so that the debugger can relate the source the user wants to debug to the machine code. One typical thing the debugger needs to know to set breakpoints (unless you specify addresses directly) is where (at which address) functions and source lines start within the machine code.
The code is not optimized by the compiler in any way that makes it impossible to relate source and machine code. Typically you will want to debug code that was not optimized, or code where only carefully selected optimizations were performed.
The rest of the work is then performed by the debugger itself.
Software breakpoints don't necessarily need special hardware features. Here the debugger relies on modifying the original binary (the copy that is loaded into memory). When you set a breakpoint, the debugger places a special instruction at the breakpoint's location. This special instruction needs to let the debugger detect when it executes: it can be an instruction that causes some kind of interrupt/exception the debugger can hook, or an instruction that hands control to a debug unit. If this runs under an OS, the OS needs to support modifying a running program (with something like ptrace peek/poke). The downside of SW breakpoints is that the debugger must be able to modify the running program, which is not possible if the program runs from some kind of read-only memory (quite common in the embedded world).
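As a concrete illustration on x86 Linux, a debugger plants a software breakpoint roughly like this (a sketch; error handling and errno checks are elided):

    /* Sketch: planting an x86 software breakpoint via ptrace. The byte
     * at addr is saved and overwritten with 0xCC (the INT3 opcode). */
    #include <sys/ptrace.h>
    #include <sys/types.h>
    #include <stdint.h>

    static long set_breakpoint(pid_t pid, uintptr_t addr, long *saved)
    {
        long word = ptrace(PTRACE_PEEKTEXT, pid, (void *)addr, NULL);

        *saved = word;                             /* remember original */
        long patched = (word & ~0xffL) | 0xcc;     /* low byte -> INT3  */
        return ptrace(PTRACE_POKETEXT, pid, (void *)addr, (void *)patched);
    }

    /* When the child hits INT3 it traps: the debugger (blocked in waitpid)
     * sees SIGTRAP, writes *saved back, rewinds the instruction pointer
     * by one byte, single-steps the original instruction, and re-arms. */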
Hardware breakpoints (which need to be supported by the CPU) implement similar behavior without modifying the program binary. This is CPU specific, but usually you can at least define a program address at which execution should hit a breakpoint. The CPU continuously compares the current PC with these breakpoint addresses and breaks execution once one of them matches. The number of these breakpoints is always limited.
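On x86 Linux, for instance, a ptrace-based debugger can arm one of the four debug registers roughly like this (a sketch; the DR7 bit layout is per the Intel manuals):

    /* Sketch: arming x86 hardware breakpoint 0 on a ptraced child through
     * the debug registers exposed in struct user. */
    #include <sys/ptrace.h>
    #include <sys/types.h>
    #include <sys/user.h>
    #include <stddef.h>
    #include <stdint.h>

    static int set_hw_exec_breakpoint(pid_t pid, uintptr_t addr)
    {
        /* DR0 holds the address to match */
        if (ptrace(PTRACE_POKEUSER, pid,
                   (void *)offsetof(struct user, u_debugreg[0]),
                   (void *)addr) == -1)
            return -1;

        /* DR7 bit 0 locally enables DR0; zero condition/length bits mean
         * "break on instruction execution at this exact address" */
        return (int)ptrace(PTRACE_POKEUSER, pid,
                           (void *)offsetof(struct user, u_debugreg[7]),
                           (void *)1L);
    }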
To set a breakpoint, we first have to add some special information to the binary; we use the -g flag while compiling the C source files to include it. A software debugger actually uses this info to place breakpoints. The best example of hardware breakpoint support I have experienced is in VxWorks.
Basically, at a breakpoint the processor halts, so internally any mechanism that raises a processor exception can be used to implement a software breakpoint. A hardware breakpoint instead works by matching an address stored in hardware registers to raise the exception; it is very powerful, but heavily architecture dependent.
A very good explanation is here:
What is the difference between hardware and software breakpoints?
A good intro with processor-related information is given here:
http://processors.wiki.ti.com/index.php/How_Do_Breakpoints_Work
I am just trying to understand the differences between patching into the kernel and writing a driver.
It is my understanding that a kernel-mode driver can do anything the kernel can do, and is similar in some ways to a Linux module.
Why, then, were AV makers so upset when Microsoft stopped them from patching the Windows kernel?
What kind of things can you do through kernel patching that you can't do through a driver?
In this context, patching the kernel means modifying its (undocumented) internal structures in order to achieve some functionality, typically hooking various functions (e.g. opening a file). You are not supposed to go messing around with internal kernel structures that do not belong to you. In the past, Microsoft did not provide official hooks for some things, so security companies reverse engineered the internals and hooked the kernel directly. More recently, Microsoft has provided official hooks for some of these things, so the need to hook the kernel directly is not as strong.
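As an illustration of the official-hook route, a driver can register the documented process-creation callback instead of patching anything (a minimal sketch; a real driver must also unregister the callback, with Remove = TRUE, in its unload routine):

    /* Sketch: using a documented kernel callback instead of hooking. */
    #include <ntddk.h>

    static VOID ProcessNotify(HANDLE ParentId, HANDLE ProcessId,
                              BOOLEAN Create)
    {
        DbgPrint("process %p %s (parent %p)\n",
                 ProcessId, Create ? "created" : "exited", ParentId);
    }

    NTSTATUS DriverEntry(PDRIVER_OBJECT DriverObject,
                         PUNICODE_STRING RegistryPath)
    {
        UNREFERENCED_PARAMETER(DriverObject);
        UNREFERENCED_PARAMETER(RegistryPath);
        return PsSetCreateProcessNotifyRoutine(ProcessNotify, FALSE);
    }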
It's true that a kernel-mode driver can do anything the kernel can do - after all, they both run in ring 0. The key question here is: how difficult is it? Patching things relies on internal details that may change between different kernel releases. For example, the system call number of NtTerminateProcess will change between versions, so a driver which hooks the SSDT will break between versions (although the system call number can be obtained through other means). Reading or modifying fields of internal structures such as EPROCESS or ETHREAD is risky as well, because again, these structures change between versions. None of this is impossible for a driver to do, but it's hard.
If an official interface is provided for hooking, Microsoft can guarantee compatibility between versions as well as being able to control who can do what (e.g. only signed drivers can use the object manager callbacks). However, Microsoft can't do this for everything, because some things are just implementation details that drivers shouldn't know about.