How to find stuff in the kernel - linux-kernel

I'm doing various tasks on the linux kernel, and I end up reading source code from time to time. I haven't really needed to change the kernel yet (I'm good with so called "Loadable Kernel Modules") so I didn't download the source of the kernel, just using http://lxr.free-electrons.com/ . And quite a lot I find myself finding a function that has many implementations, and start guessing which one is the one I need.
For example, I looked at the file Linux/virt/kvm/kvm_main.c at line 496 is a call to list_add, a click on it gives me two options: drivers/gpu/drm/radeon/mkregtable.c, line 84 and include/linux/list.h, line 60 - It's quite clear that kvm will not send my to something under "gpu" but this is not always the case. I have looked at the includes of the file - was not much help.
So my questions: Given a file from the kernel, and a function call at line ###, what is the nicest way to find where one function call actually continues?
(I'll be happy to hear also about ways that don't include the website and\or require me to download the source code)

There are many things in kernel that are #define'd or typedef'd or functions mapped inside structs (the fop struct in the drivers). So, there's no easy way to browse the kernel source. lxr site helps you but it can't go any further when you encounter any of the above data structs. The same is with using cscope/ctags. The best way though, despite you explicitly mentioning against it, is to download the source and browse through it.
Another method would be to use kgdb and inspect the code function by function, but that requires you to have some knowledge of the functions where you want to step in or not, to save a lot of time. And last but not the least, increase the kernel log level, and print the logs that are accessible through dmesg. But these all require you to have a kernel source.

Related

Suspend program execution if syscall with specific parameters called (GDB / strace)

Is there a straigtforward way with ready-at-hand tooling to suspend a traced process' execution when a certain syscalls are called with specific parameters? Specifically I want to suspend program execution whenever
stat("/${SOME_PATH}")
or
readlink("/${SOME_PATH}")
are called. I aim to then attach a debugger, so that I can identify which of the hundreds of shared objects that are linked into the process is trying to access that specific path.
strace shows me the syscalls alright, and gdb does the rest. The question is, how to bring them together. This surely can be solved with custom glue-scripting, but I'd rather use a clean solution.
The problem at hand is a 3rd party toolsuite which is available only in binary form and which distribution package completely violates the LSB/FHS and good manners and places shared objects all over the filesystem, some of which are loaded from unconfigurable paths. I'd like to identify which modules of the toolsuite try to do this and either patch the binaries or to file an issue with the vendor.
This is the approach that I use for similar condition in windows debugging. Even though I think it should be possible for you too, I have not tried it with gdb in linux.
When you attached your process, set breakpoint on your system call which is for example stat in your case.
Add a condition based on esp to your breakpoint. For example you want to check stat("/$te"). value at [esp+4] should point to address of string which in this case is "/$te". Then add a condition like: *(uint32_t*)[esp+4] == "/$te". It seems that you can use strcmp() in your condition too as described here.
I think something similar to this should work for you too.

Windows: redirect ReadFile to run process and pipe it's stdout

I was wondering how hard it would be to create a set-up under Windows where a regular ReadFile on certain files is being redirected by the file system to actually run (e.g. ShellExecute) those files, and then the new process' stdout is being used as the file content streamed out to the ReadFile call to the callee...
What I envision the set-up to look like, is that you can configure it to denote a certain folder as 'special', and that this extra functionality is then only available on that folder's content (so it doesn't need to be disk-wide). It might be accessible under a new drive letter, or a path parallel to the source folder; the location it is hooked up to is irrelevant to me.
To those of you that wonder if this is a classic xy problem: it might very well be ;) It's just that this idea has intrigued me, and I want to know what possibilities there are. In my particular case I want to employ it to #include content in my C++ code base, where the actual content included is being made up on the spot, different on each compile round. I could of course also create a script to create such content to include, call it as a pre-build step and leave it at that, but why choose the easy route.
Maybe there are already ready-made solutions for this? I did an extensive Google search for it, but came out empty handed. But then I'm not sure I already know all the keywords involved to do a good search...
When coding up something myself, I think a minifilter driver might be needed intercepting ReadFile calls, but then it must at that spot run usermode apps from kernel space - not a happy marriage I assume. Or use an existing file system driver framework that allows for usermode parts, but I found the price of existing solutions to be too steep for my taste (several thousand dollars).
And I also assume that a standard file system (minifilter) driver might be required to return a consistent file size for such files, although the actual data size returned through ReadFile would of course differ on each call. Not to mention negating any buffering that takes place.
All in all I think that a create-it-yourself solution will take quite some effort, especially when you have never done Windows driver development in your life :) Although I see myself quite capable of learning up on it, the time invested will be prohibitive I think.
Another approach might be to hook ReadFile calls from the process doing the ReadFile - via IAT hooking, or via code injection. But I want this solution to more work 'out-of-the-box', i.e. all ReadFile requests for these special files trigger the correct behavior, regardless of origin. In my case I'd need to intercept my C++ compiler (G++) behavior, but that one is called on the fly by the IDE, so I see no easy way to detect it's startup and hook it up quickly before it does it's ReadFiles. And besides, I only want certain files to be special in this regard; intercepting all ReadFiles for a certain process is overkill.
You want something like FUSE (which I used with profit many times), but for Windows. Apparently there's Dokan, I've never used it but seems to be well known enough (and, at very least, can be used as an inspiration to see "how it's done").

What is the ".scatterload" file in macOS?

I've downloaded Apple's TextEdit example app (here) and I'm a bit puzzled by one thing I see there: the TextEdit.scatterload file. It contains a list of functions and methods. My guess is that it provides information to the linker as to which functions/methods will be needed, and in what order, when the app launches, and that this is used to order the binary generated by the linker for maximum efficiency. Oddly, I seem to be unable to find any information whatsoever about this file through Google. So. First of all, is my guess as to the function of this file correct? And second, if so, can I generate a .scatterload file for my own macOS app, to make it launch faster? How would I do that? Seems like a good idea! (I am using Objective-C, but perhaps this question is not specific to that, so I'm not going to tag for it here.)
Scatter loading refers to a way to organize the mapping of your code in memory by specifying which part of code must be near which one, etc. This is to optimize page faults, etc.
You can read about it here Improving locality of reference (HTML)
or here Improving locality of reference (PDF).
.scatterload file is used by the linker to position code in memory layout of the executable.
Except if your app really need tight performance tuning, I would not encourage you to have a look at this.

How to detect who's issuing a wrong kfree

I am suspecting a double kfree in my kernel code. Basically, I have a data structure that is kzalloced and kfreed in a module. I notice that the same address is allocated and then allocated again without being freed in the module.
I would like to know what technique should I employ in finding where the wrong kfree is issued.
1.
Yes, kmemleak is an excellent tool, especially suitable for system-wide analysis.
Note that if you are going to use it to analyze a kernel module, you may need to save the addresses of the ELF sections containing the code of the module (.text, .init.text, ...) when the module is loaded. This may help you decipher the call stacks in the kmemleak's report. It usually makes sense to ask kmemleak to produce a report after the module has been unloaded but kmemleak cannot resolve the addresses at that time.
While a module is loaded, the addresses fo its sections can be found in the files in /sys/module/<module_name>/sections/.
After you have found the section each code address in the report belongs to and the corresponding offset into that section, you can use objdump, gdb, addr2line or a similar tool to obtain more detailed information about where the event of interest occurred.
2.
Besides that, if you are working on an x86 system and you would like to analyze a single kernel module, you can also use KEDR LeakCheck tool.
Unlike kmemleak, most of the time, it is not required to rebuild the kernel to be able to use KEDR.
The instructions on how to build and use KEDR are here. A simple example of how LeakCheck can be used is described in "Detecting Memory Leaks" section.
Have you tried enabling the kmemleak detection code?
See Documentation/kmemleak.txt for details.

LoadLibrary from offset in a file

I am writing a scriptable game engine, for which I have a large number of classes that perform various tasks. The size of the engine is growing rapidly, and so I thought of splitting the large executable up into dll modules so that only the components that the game writer actually uses can be included. When the user compiles their game (which is to say their script), I want the correct dll's to be part of the final executable. I already have quite a bit of overlay data, so I figured I might be able to store the dll's as part of this block. My question boils down to this:
Is it possible to trick LoadLibrary to start reading the file at a certain offset? That would save me from having to either extract the dll into a temporary file which is not clean, or alternatively scrapping the automatic inclusion of dll's altogether and simply instructing my users to package the dll's along with their games.
Initially I thought of going for the "load dll from memory" approach but rejected it on grounds of portability and simply because it seems like such a horrible hack.
Any thoughts?
Kind regards,
Philip Bennefall
You are trying to solve a problem that doesn't exist. Loading a DLL doesn't actually require any physical memory. Windows creates a memory mapped file for the DLL content. Code from the DLL only ever gets loaded when your program calls that code. Unused code doesn't require any system resources beyond reserved memory pages. You have 2 billion bytes worth of that on a 32-bit operating system. You have to write a lot of code to consume them all, 50 megabytes of machine code is already a very large program.
The memory mapping is also the reason you cannot make LoadLibrary() do what you want to do. There is no realistic scenario where you need to.
Look into the linker's /DELAYLOAD option to improve startup performance.
I think every solution for that task is "horrible hack" and nothing more.
Simplest way that I see is create your own virtual drive that present custom filesystem and hacks system access path from one real file (compilation of your libraries) to multiple separate DLL-s. For example like TrueCrypt does (it's open-source). And than you may use LoadLibrary function without changes.
But only right way I see is change your task and don't use this approach. I think you need to create your own script interpreter and compiler, using structures, pointers and so on.
The main thing is that I don't understand your benefit from use of libraries. I think any compiled code in current time does not weigh so much and may be packed very good. Any other resources may be loaded dynamically at first call. All you need to do is to organize the working cycles of all components of the script engine in right way.

Resources