Better understanding of the vmcore collection process - linux-kernel

I need to modify the kernel vmcore collection process, but I cannot seem to understand a few things:
What is the "core_collector" thing referenced here? Is it a script, a binary, or something else?
Where are the sources for the crash-dump-capture kernel?
Much like /sbin/makedumpfile is added to the crash-dump-capture kernel environment, how can I add another script and make sure it is run before makedumpfile?
I use Rocky Linux.

Related

How can I remove the need of wpcap.dll in my go program?

I use gopacket in my program. On Linux, it runs perfectly.
But on Windows the whole program crashes if I have not installed WinPcap beforehand.
My plan was to check whether WinPcap is installed and, if not, to inform the user that it is needed to use all of the features.
But I never get to that point. I can't use gopacket at all if WinPcap is not available, not even a single line of its code (=> crash).
Does anyone have an idea how I can solve this? I do not actually need gopacket. My plan was: if it is installed, great; if not, never mind, do other things.
But now I have two choices: remove gopacket entirely, or find a way to start my program without needing wpcap.dll, at least to tell the user that it is required.
Please help me :(
You're wrong in that you are «not [using] a single line of code of it»: it's not hard to see that its Windows-specific code calls into wpcap.dll.
What is more fun is that its Unix-specific code calls into libpcap.so, which means it works on your local system simply because you have the libpcap package installed (or whatever it's named on your system).
All this means that your program currently is not really portable anyway (I mean, in the sense you supposedly think it is portable).
You can run something like
$ ldd ./yourbinary
and see it printing a reference to libpcap.so of some version.
There are several ways to solve this.
The easiest is to just try shipping wpcap.dll with your binary. By default, Windows looks for DLLs in the directory the application was loaded from before searching elsewhere. Since gopacket uses cgo, the loader tries to resolve wpcap.dll at application startup, so the application has no chance to change its working directory (or run any check of its own) before the library is looked up and linked in.
A more complicated approach is to make (or obtain) a static version of the WinPcap library (remember that a DLL is a library, just a special form of one) and then jump through the hoops of building gopacket so that it picks up that static library.
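The asker's original plan (probe for the library first, fall back gracefully if it is missing) boils down to loading the library at run time instead of having the loader resolve it at startup. As a language-neutral illustration of that idea only, here is a minimal Python sketch using ctypes. It is not gopacket code: load_optional is a hypothetical helper, and the names "wpcap" and "pcap" are assumptions about how the library is called on each platform.

```python
import ctypes
import ctypes.util


def load_optional(name):
    """Return a handle to the shared library `name`, or None if it is missing."""
    path = ctypes.util.find_library(name)
    if path is None:
        return None
    try:
        return ctypes.CDLL(path)
    except OSError:
        # Found on disk but not loadable (wrong architecture, broken install).
        return None


# "wpcap" (Windows) and "pcap" (Unix) are assumed library names.
pcap = None
for candidate in ("wpcap", "pcap"):
    pcap = load_optional(candidate)
    if pcap is not None:
        break

if pcap is None:
    print("packet capture library not found; capture features are disabled")
else:
    print("packet capture library loaded; capture features are available")
```

The same pattern only helps a Go program if the binary does not link the DLL at startup, which, per the answer above, the cgo-based binding does.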
Install Npcap in "WinPcap API-compatible Mode".

Boot linux kernel to terminal

I have a project in mind and for that I require the kernel to boot up and bring me to a console window so that I can start working. [later I'll automate the process].
How do I accomplish it?
Well, I have downloaded the latest stable kernel source from kernel.org and I have tried editing the init/main.c file. But I have no idea what in the world was going on in that file [noob ^n].
Hence, I post this question for an answer.
I require the kernel to boot up and bring me to a console window so that I can start working.
The kernel doesn't do much by itself. In fact, it's unlikely you want to alter "main" in the kernel.
If you want to "run" the kernel, you'll also need a root filesystem and some user-space programs. If you want a minimal userland, you can use "busybox". Even better, buildroot will help you create a minimal userland + kernel.
You can even combine your root filesystem plus the kernel into a single binary. At runtime, it will uncompress userland into a ramdisk and run entirely from RAM. See initramfs. This is super-helpful for embedded systems. A minimal kernel+root filesystem can be around 1MB.
See the link below:
http://balau82.wordpress.com/2010/03/27/busybox-for-arm-on-qemu/

Changing linux kernel system call number

I wanted to build my own custom kernel with a different syscall table. (same syscalls but in different position/numbers)
I was working on kernel 3.2.29.
Changing the kernel was quite easy:
1) changing the syscall position in arch/x86/kernel/syscall_table_32.S
2) changing the syscall macro number in arch/x86/include/asm/unistd_32.h
3) compiling and installing the new kernel
I switched the syscalls around: sys_open took the place and number of sys_read, and vice versa.
I figured that if I compile glibc with the modified kernel headers, I could have a running system, but unfortunately, it wasn't enough and my system won't boot.
Am I missing something? What else do I need to do in order to have a running system?
The steps I have taken are:
1) building and installing the kernel as described in my question
2) extracting the new kernel headers using make headers_install INSTALL_HDR_PATH=[path]
3) building glibc with the parameter --with-headers=[path/include]
4) I used a live cd to access the file system externally in order to install the new glibc, using the make install install_root=[the original file system] (so the system won't break during the install)
I hope that the new glibc was built properly, but I am not sure.
After that, when booting the system, the boot stops in the (initramfs) shell screen:
I guess I need to rebuild the initrd, but how do I compile it according to the new syscall table?
You will have to rebuild everything. Even if all your binaries are dynamically linked, it is possible that the old syscalls were inlined into the binary because many of the C functions are just return syscall(__NR_somecall,...).
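To make the "number is baked into the caller" point concrete, here is a small sketch that invokes getpid by its raw number through libc's syscall() wrapper. The numbers in the table are the stock 64-bit Linux values for x86_64 and aarch64; raw_getpid is a hypothetical helper, and the sketch deliberately does nothing on other platforms.

```python
import ctypes
import os
import platform

# getpid's number in two stock 64-bit Linux syscall tables.
GETPID_NR = {"x86_64": 39, "aarch64": 172}


def raw_getpid():
    """Call getpid by hard-coded number, or return None on other platforms."""
    nr = GETPID_NR.get(platform.machine())
    is_64bit = ctypes.sizeof(ctypes.c_void_p) == 8
    if platform.system() != "Linux" or nr is None or not is_64bit:
        return None
    libc = ctypes.CDLL(None)  # the C library already loaded into this process
    libc.syscall.restype = ctypes.c_long
    libc.syscall.argtypes = [ctypes.c_long]
    return libc.syscall(nr)


pid = raw_getpid()
if pid is not None:
    # The two are expected to agree on a stock kernel.
    print(pid == os.getpid())
```

The literal number here plays the role of __NR_somecall in a compiled binary: nothing in the caller changes when the kernel's table is reordered, so on a kernel with a scrambled table the same number would reach a different handler.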
You could do this manually, but it could be difficult to keep the toolchains straight unless you use a cross compilable toolchain like buildroot, aboriginal or similar. Pick whichever best suits you (I prefer Rob Landley's aboriginal - http://landley.net/aboriginal/ )
Then, to make your initrd, just expand the old one using {z,bz,xz}cat oldinit.rd | cpio -id; rm oldinit.rd. Replace the old kernel modules, libs and binaries with the new ones, then cpio and compress it back (cpio needs the -H newc option). Alternatively, you can rebuild your kernel and point its initramfs source at that directory, but I wouldn't recommend that if your initrd may need to change frequently, for instance if you are testing out a whole new syscall structure and have to debug a lot.
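Which of zcat, bzcat or xzcat applies depends on how the old initrd was compressed, and that can be read off the first few bytes of the file. A minimal sketch, assuming the image is a single compressed cpio archive; the helper names are hypothetical and the file name oldinit.rd is taken from the command above.

```python
# Leading magic bytes of each compression format, mapped to the matching tool.
MAGIC_TO_TOOL = (
    (b"\x1f\x8b", "zcat"),        # gzip
    (b"BZh", "bzcat"),            # bzip2
    (b"\xfd7zXZ\x00", "xzcat"),   # xz
)


def pick_decompressor(header):
    """Return the tool name matching the leading bytes, or None if unknown."""
    for magic, tool in MAGIC_TO_TOOL:
        if header.startswith(magic):
            return tool
    return None


def decompressor_for(path):
    """Read the first bytes of the file at `path` and pick a tool for it."""
    with open(path, "rb") as f:
        return pick_decompressor(f.read(6))
```

Calling decompressor_for("oldinit.rd") returns the tool to put in front of cpio -id. A result of None means the header was not recognised, for example because the archive is an uncompressed cpio image.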
Scrambling the system call numbers is really going to hurt. You'll at least need to rebuild all of the statically linked binaries on your system and your initrd (if you use one).
You haven't said at what point the boot fails, but even if the kernel comes up it is likely that the critical programs contained in the initrd compressed ramdisk would fail because they have the original syscall numbers hard-coded. You will need to rebuild and repackage those as well.
You might consider first replacing init with a static hello-world type of program to verify that your kernel can support a userspace at all; then look into the details of making all the complexity of a modern linux userspace match.
You need to learn to read the kernel panic dump message and show us what the kernel panic says. Without this information, people can hardly help you or give you useful suggestions.

Use named pipes (or something else) as in-memory files

I've been going through the WinAPI documentation for a while, but I don't seem to be able to find an answer. What I'm trying to achieve is to give a program a file name that it can open and work with as if it were a normal file on disk. But I want this object to be in memory.
I tried using named pipes and they work in some situations, but not always. I create a named pipe and pass it to the child process as a regular file. When the process exits I collect the data from the pipe.
program.exe \\.\pipe\input_pipe
I ran into some limitations, though. One of them is that pipes are not seekable. The second is that they must be opened with exactly the right permissions. And the third one I found is that you cannot pre-put any data into a duplex pipe before it has been opened on the other end. Is there any way to overcome those limitations of named pipes?
Or maybe there is some other kind of object that could be opened with CreateFile and then accessed with ReadFile and WriteFile. So far the only solution I see is to create a file system driver and implement all the functionality myself.
Just to make it clear I wanted to point out that I cannot change the child program I'm running. The main idea is to give that program something that it would think is a normal file.
UPDATE: I'm not looking for a solution that involves installation of any external software.
Memory-mapped files would allow you to do what you want.
EDIT:
On rereading the question - since the receiving program already uses CreateFile/ReadFile/WriteFile and cannot be modified, this will not work. I cannot think of a way to do what OP wants outside of third-party or self-written RAMDisk solution.
The simplest solution might be, as you seem to suggest, using a Ramdisk to make a virtual drive mapped to memory. Then obviously, any files you write to or read from that virtual drive will be completely contained in RAM (assuming it doesn't get paged to disk).
I've done that a few times myself to speed up a process that was entirely disk-bound.
Call CreateFile with FILE_ATTRIBUTE_TEMPORARY, and probably FILE_FLAG_DELETE_ON_CLOSE as well.
The system will then try to keep the file's data in the cache and avoid writing it to disk unless it runs low on physical memory.

I need to find the point in my userland code that crash my kernel

I have a big system that makes my machine crash hard. When I boot back up, I don't even have a coredump. If I log every line that gets executed until my system goes down, I will find that evil code.
Can I log every source code line in GDB to a file?
UPDATE:
OK, I found the bug. It was nasty. The application I started did not take the system down. After learning about coredump inspection with mdb, and some gdb stepping, I found out that the system call causing the dump was not implemented. Updating the system to the latest kernel will fix my problem. Thanks to all of you.
MY LESSON:
Make sure you know which process causes the coredump. It's not always the one you started.
Sounds like a tricky little problem.
I often try to eliminate as many possible suspects as I can by commenting out large chunks of code, configuring the system to not run certain pieces (if it allows you to do that) etc. This amounts to doing an ad-hoc binary search on the problem, and is a surprisingly effective way of zooming in on offending code relatively quickly.
A potential problem with logging is that the log might not hit the disk before the system locks up - if you don't get a core dump, you might not get the log.
Speaking of core dumps, make sure you don't have a limit on your core dump size (man ulimit.)
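The limit that ulimit reports can also be inspected and raised from a launcher process, since child processes inherit it. A minimal sketch using Python's resource module; raise_core_limit is a hypothetical helper, and it can only raise the soft limit up to the existing hard limit.

```python
import resource


def raise_core_limit():
    """Raise the soft core-size limit to the hard limit; return (soft, hard)."""
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    if soft != hard:
        resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)


soft, hard = raise_core_limit()
unlimited = soft == resource.RLIM_INFINITY
print("core size limit:", "unlimited" if unlimited else soft)
```

If the hard limit itself is 0, core dumps stay disabled and the limit has to be changed at a higher level (for example in the shell or login configuration) before the application is started.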
You could try to obtain a list of all the functions in your code using objdump, process it a little bit and create a bunch of GDB trace statements on those functions - basically creating a GDB script automatically. If that turns out to be overkill, then a binary search on the code using tracepoints can also help you zoom in on the problem.
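A rough sketch of that automation is shown below: it pulls function names out of objdump -t output and emits a GDB command file that logs each function entry and continues. Note that it uses ordinary breakpoints with attached commands rather than GDB tracepoints, and that the helper names, the log file name and the sample symbol table are all made up for illustration.

```python
def function_names(objdump_text):
    """Extract names of functions in .text from `objdump -t` output."""
    names = []
    for line in objdump_text.splitlines():
        parts = line.split()
        # Symbol lines look like: ADDRESS FLAGS... SECTION SIZE NAME
        if len(parts) >= 5 and "F" in parts[1:-3] and ".text" in parts:
            names.append(parts[-1])
    return names


def gdb_trace_script(names, logfile="gdb_trace.log"):
    """Build a GDB command file that logs every entry into the given functions."""
    lines = [
        "set pagination off",
        f"set logging file {logfile}",
        "set logging on",  # newer GDB releases spell this `set logging enabled on`
    ]
    for name in names:
        lines += [
            f"break {name}",
            "commands",
            "silent",
            f"echo entered {name}\\n",
            "continue",
            "end",
        ]
    lines.append("run")
    return "\n".join(lines) + "\n"


# Invented sample of `objdump -t` output: two functions and one data object.
SAMPLE = """\
0000000000001139 g     F .text\t0000000000000016              main
0000000000001150 l     F .text\t0000000000000020              helper
0000000000004010 g     O .data\t0000000000000004              counter
"""

print(gdb_trace_script(function_names(SAMPLE)))
```

The output can be saved to a file and passed to gdb with -x. The caveat from the logging remark above still applies: the last few log lines may not reach the disk before the machine goes down.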
And don't panic. You're smarter than the bug - you'll find it.
You can not reasonably track every line of your source using GDB (too slow). Besides, a system crash is most likely a result of a system call, and libc is probably doing the system call on your behalf. Even if you find the line of the application that caused OS crash, you still don't really know anything.
You should start by clarifying which OS is crashing. For Linux, you can try the following approaches:
strace -fo trace.out /path/to/app
After reboot, trace.out will contain syscalls the application was doing just before the crash. If you are lucky, you'll see the last syscall-of-death, but I wouldn't count on it.
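After the reboot, the tail of trace.out is the interesting part, and its last line may be cut off mid-write. A small sketch for pulling out the last few syscalls, assuming the PID-prefixed line format that strace writes when -f and -o are combined; last_syscalls is a hypothetical helper and the sample trace is invented.

```python
import re

# With -f and -o, strace prefixes each line with the PID of the caller.
SYSCALL_LINE = re.compile(r"^(\d+)\s+(\w+)\(")


def last_syscalls(trace_text, count=5):
    """Return the last `count` (pid, syscall name) pairs found in the trace."""
    calls = []
    for line in trace_text.splitlines():
        match = SYSCALL_LINE.match(line)
        if match:  # skips "<... resumed>" lines and a truncated final line
            calls.append((int(match.group(1)), match.group(2)))
    return calls[-count:]


# Invented sample: the final line is truncated, as it might be after a crash.
SAMPLE = """\
4242  openat(AT_FDCWD, "/dev/foo", O_RDWR) = 3
4243  read(0,  <unfinished ...>
4242  ioctl(3, 0x1234, 0x7ffd0000
4243  <... read resumed> "x", 1) = 1
4242  wri"""

for pid, name in last_syscalls(SAMPLE, count=2):
    print(pid, name)
```

A call that appears without a return value (like the ioctl in the sample) never completed, which makes it a reasonable first suspect.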
Alternatively, try to reproduce the crash on User-Mode Linux, or on a kernel with KGDB compiled in.
These will tell you where the problem in the kernel is. Finding the matching system call in your application will likely be trivial.
Please clarify your problem: What part of the system is crashing?
Is it an application?
If so, which application? Is this an application which you have written yourself? Is this an application you have obtained from elsewhere? Can you obtain a clean interrupt if you use a debugger? Can you obtain a backtrace showing which functions are calling the section of code which crashes?
Is it a new hardware driver?
Is it based on an older driver? If so, what has changed? Is it based on a manufacturer's data sheet? Is that data sheet the latest and most correct?
Is it somewhere in the kernel? Which kernel?
What is the OS? I assume it is linux, seeing that you are using the GNU debugger. But of course, that is not necessarily so.
You say you have no coredump. Have you enabled coredumps on your machine? Most systems these days do not have coredumps enabled by default.
Regarding logging GDB output, you may have some success, but it depends where the problem is whether or not you will have the right output logged before the system crashes. There is plenty of delay in writing to disk. You may not catch it in time.
I'm not familiar with the gdb way of doing this, but with windbg the way to go is to have a debugger attached to the kernel and control the debugger remotely over a serial cable (or firewire) from a second debugger. I'm pretty sure gdb has similar capabilities; I quickly found some hints here: http://www.digipedia.pl/man/gdb.4.html

Resources