I am debugging a problem where the reboot command sometimes just does not boot.
Very similar to https://e2e.ti.com/support/processors-group/processors/f/processors-forum/1086480/am3352-linux-reboot-command-hangs-for-10-minutes-while-booting-down-then-succeeds but not the same.
Can "tracefs" be used for this? I ask because in all the examples I have seen, the trace buffer is stored in /sys/kernel/tracing/trace, and this file is dumped later.
Since the issue is in the reboot sequence, I will not have the option to dump this.
Is there a way to have the trace buffer printed directly to the console?
I looked at the tp_printk kernel cmdline option. However, when I enable the "function_graph" tracer, I still see the output in the trace file but not on the console.
Could you let me know which parameters need to be set to have the tracing subsystem print the trace to the console?
Thanks
I am trying to analyze a core dump using dotnet-dump tool via cmd:
tmp>dotnet-dump analyze core.2293
Loading core dump: core.2293 ...
Ready to process analysis commands. Type 'help' to list available commands or 'help [command]' to get detailed help on a command.
Type 'quit' or 'exit' to exit the session.
As the documentation says, this brings up an interactive session that accepts a variety of commands for getting debug info.
In my case, every command fails with a message like this:
> pe -lines
Failed to load data access module, 0x80004002
Can not load or initialize mscordaccore.dll. The target runtime may not be initialized.
For more information see https://go.microsoft.com/fwlink/?linkid=2135652
>
P.S. The link above doesn't help much.
Do you have any suggestions on how to fix it?
Solved. The limitation is that process dumps are not portable across operating systems: a dump collected on Linux cannot be analyzed on Windows, and vice versa.
The dump was collected on a Linux machine, and I was trying to analyze it on a Windows machine.
To analyze it properly, you need a Linux environment. In my case, I created a Docker container from an sdk:alpine image.
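A quick way to see which platform a dump came from, before opening it, is to look at the first bytes of the file. This is only a sketch and not part of dotnet-dump: classify_dump_header and dump_kind are hypothetical names. It assumes that a Linux dump is an ELF core file (magic 0x7F 'E' 'L' 'F' with e_type ET_CORE = 4) and that a Windows dump is a minidump (signature "MDMP").

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical classification of a dump file by its header bytes. */
typedef enum { DUMP_UNKNOWN, DUMP_ELF_CORE, DUMP_WINDOWS_MINIDUMP } dump_kind;

dump_kind classify_dump_header(const unsigned char *buf, size_t len)
{
    assert(buf != NULL);

    /* Windows minidumps start with the ASCII signature "MDMP". */
    if (len >= 4 && memcmp(buf, "MDMP", 4) == 0)
        return DUMP_WINDOWS_MINIDUMP;

    /* ELF files start with 0x7F 'E' 'L' 'F'; e_type is the 16-bit field
     * at offset 16, stored in the byte order given by buf[5]
     * (1 = little-endian, 2 = big-endian). ET_CORE is 4. */
    if (len >= 18 && memcmp(buf, "\x7f" "ELF", 4) == 0) {
        unsigned e_type = (buf[5] == 2)
            ? ((unsigned)buf[16] << 8) | buf[17]
            : ((unsigned)buf[17] << 8) | buf[16];
        if (e_type == 4)
            return DUMP_ELF_CORE;
    }
    return DUMP_UNKNOWN;
}
```

Read the first 18 bytes of the dump with fread and pass them in. An ELF core result means the file needs a Linux environment for analysis.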
I'm getting a lovely BSOD on bootup (STOP: 0x0000007E) from a driver I'm writing, and would like to load up the memory dump for analysis. However, it's not getting dumped anywhere. Everything is set up correctly in the Startup and Recovery settings, but I get no dump file, and nothing in the event log states that a dump has taken place. It looks like a dump is not even occurring...
I know the exact line of code causing it (a call to IoAttachDevice()), but am not sure why, and would like to view the DbgPrint output to see where exactly it's failing. Could Windows possibly be crashing before the dumping functionality is set up? If so, how do I get access to the state of the machine when the failure occurs?
UPDATE: Other possibly useful information: I'm running Windows XP through VirtualBox on a Linux host.
I don't know why you're not getting a dump file, but if you have ready access to the machine, attach a kernel debugger to it and reproduce the error. You'll be left with the machine sitting in the debugger, ready to go (you can also have the debugger produce the dump file for you if you want to debug offline).
Right-click on "My Computer" and select "Properties". On the "Advanced" tab, under "Startup and Recovery", click "Settings", then select "Kernel memory dump" or "Complete memory dump".
What's the start setting of your driver? If it starts too early in the boot order, the filesystem might not be remounted read-write yet, and therefore there's no place for a dump to go.
Drivers under development shouldn't generally be set to auto-start until you've gotten the driver stable when loaded later. Of course you eventually need to set it to auto-start so you can verify it works correctly, but that comes later.
I'm trying to debug an application on an embedded device running an old version of Linux/Qtopia. I asked for help on the Qt forums, but the people there don't know about old software and embedded systems. I'd really like some help with debugging strategies.
My program will crash after the main window has been constructed, i.e. some time into the event loop. But depending on the order of functions in the constructor, sometimes it will run only from the console and sometimes it will only run from the icon. Despite my best efforts I can't narrow down what is causing the problem.
There is no seg fault or signal but my program does not continue and the destructor does not get called. It seems to me that one of the first things that would happen in the event loop is a resize event and when this is called could vary if you ran from the console or icon. Also, the various widgets in my GUI would be initialised and drawn so that is also a potential source of error, if I haven't set up something properly.
My debugging options are limited as the area where the crash actually occurs is not under my control. I tried logging to a file and printing to stderr but this was no help. When I got to the state where it runs from the icon but not console, I tried running in gdb and strace but it ran OK - the classic problem of debug software initialising differently.
My next thought is to try to force a core dump and then analyse that. How do I force a core dump? Is there a better strategy?
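On forcing a core dump from inside the program: one common approach on Linux is to raise the core-file size limit and then call abort(). The sketch below assumes a POSIX system with getrlimit/setrlimit. enable_core_dumps and dump_core_now are hypothetical helper names, and whether a core file is actually written also depends on the system configuration (for example the hard limit and the current directory being writable).

```c
#include <stdlib.h>
#include <sys/resource.h>

/* Raise the soft core-file size limit to the hard limit.
 * Returns 0 on success, -1 on failure. */
int enable_core_dumps(void)
{
    struct rlimit lim;

    if (getrlimit(RLIMIT_CORE, &lim) != 0)
        return -1;
    lim.rlim_cur = lim.rlim_max;
    return setrlimit(RLIMIT_CORE, &lim) == 0 ? 0 : -1;
}

/* Hypothetical hook: call this at the point where the program appears stuck. */
void dump_core_now(void)
{
    (void)enable_core_dumps();  /* best effort; abort even if this fails */
    abort();                    /* raises SIGABRT; default action is to dump core */
}
```

If you cannot change the program, sending SIGABRT or SIGQUIT to the running process from another shell should have the same effect, since the default action for both signals is to terminate with a core dump.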
Logging to a file or to a communication port (serial port, etc.) is probably the simplest way to see what is happening while keeping the normal runtime environment (i.e. not running in a debugger).
You say that logging to a file and printing to stderr was no help. Why not? Are you printing relevant debugging information to the file? Are you using the Linux/Qtopia sources and adding debug logging?
Assuming you have sources for all of the code you are running, it should be just a matter of adding debug logging in the right places to pinpoint where the problem is occurring.
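One thing to rule out when file logging "was no help" is buffering: with stdio, the last lines before a hang or crash can still be sitting in the process's buffer and never reach the file. A minimal sketch of a logger that flushes after every line is shown below; log_open and log_line are hypothetical names, and this is only an assumption about what might have gone wrong, not a diagnosis.

```c
#include <stdarg.h>
#include <stdio.h>

static FILE *log_fp;  /* kept open for the lifetime of the program */

/* Open the log file in append mode. Returns 0 on success, -1 on failure. */
int log_open(const char *path)
{
    log_fp = fopen(path, "a");
    return log_fp != NULL ? 0 : -1;
}

/* Write one printf-style line and flush it immediately, so the line
 * has left the stdio buffer before the next statement runs. */
int log_line(const char *fmt, ...)
{
    va_list args;
    int written;

    if (log_fp == NULL)
        return -1;
    va_start(args, fmt);
    written = vfprintf(log_fp, fmt, args);
    va_end(args);
    if (written < 0 || fputc('\n', log_fp) == EOF)
        return -1;
    return fflush(log_fp) == 0 ? 0 : -1;
}
```

Calling log_line("entering resizeEvent") at the top of each suspect handler then shows the last point reached. If the whole device hangs rather than just the process, pushing the data to storage with fsync as well may be needed.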
I'm doing some kernel modification and am trying to get printk to output information to the console. Whatever kernel log level I pass, nothing shows up on the console, even at the highest-priority log levels.
I checked and the current log configuration for printk is 4 4 1 7.
It is written to the logs properly each time. I can use dmesg | less and see it appended to the log. But I can't get printk to show it on the console.
I'm not sure that it matters but I use SSH to connect to a remote machine where the modified kernel exists.
I've tried SSH from gnome-terminal and from PuTTY on Windows. Neither changes anything. The printk output still shows up in the server's logs, but not on my console.
Any way to get it to the console? What could be going wrong given that I've tried every log level and none work? THANKS!
I believe that printk only logs to the physical consoles. If you want to monitor the kernel output via arbitrary ttys, you will need to use tail to monitor a file being written to by syslog, or an application such as xconsole, which specifically monitors /dev/console for messages.
Just to make sure, you are in init level 3 (text mode) aren't you? If you have run startx and are working in graphical mode, you will not see stuff on the terminal.
I believe some variants of syslog support this without doing kernel modifications, perhaps by logging to /dev/console. Is there any particular reason you're trying to modify the kernel to do this? I'd guess there's an easier way.
Some distros patch out printk so it doesn't show up (Red Hat was first, Ubuntu does it too afaik) - you're probably hitting this.
If it's for debugging, just tail /var/log/messages. If you need stable output from your kernel module, create a char device or a file under /proc and have a userland process read from there.
Try using
dmesg -wH &
to make all kernel messages that go to dmesg (and, depending on your /proc/sys/kernel/printk log level and the level of your message, to the virtual terminals such as Ctrl+Alt+F1) also appear in your SSH or GUI console: Konsole, Terminal, or whatever you are using. If you only need to monitor specific messages:
dmesg -wH | grep ERR &
I'm using it to monitor for the "ERROR" messages like
printk(KERN_EMERG "ERROR!\n");
that I printk from my driver
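For reference, the four numbers in /proc/sys/kernel/printk (4 4 1 7 in the question) are, in order, the current console log level, the default message log level, the minimum console log level and the default console log level. A message reaches the kernel console only if its level is numerically lower than the first value. The sketch below parses that line and applies the rule; printk_levels, parse_printk_levels and reaches_console are hypothetical names used for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* The four fields of /proc/sys/kernel/printk, in file order. */
typedef struct {
    int console_loglevel;          /* messages below this level go to the console */
    int default_message_loglevel;  /* level used when printk gives none */
    int minimum_console_loglevel;  /* lowest value console_loglevel may take */
    int default_console_loglevel;  /* boot-time default for console_loglevel */
} printk_levels;

/* Parse a line such as "4 4 1 7". Returns 0 on success, -1 otherwise. */
int parse_printk_levels(const char *text, printk_levels *out)
{
    assert(text != NULL && out != NULL);
    int fields = sscanf(text, "%d %d %d %d",
                        &out->console_loglevel,
                        &out->default_message_loglevel,
                        &out->minimum_console_loglevel,
                        &out->default_console_loglevel);
    return fields == 4 ? 0 : -1;
}

/* Kernel levels run from 0 (KERN_EMERG) to 7 (KERN_DEBUG). */
int reaches_console(const printk_levels *levels, int message_level)
{
    return message_level < levels->console_loglevel;
}
```

With 4 4 1 7, only levels 0 to 3 (KERN_EMERG through KERN_ERR) pass the filter. Even then they go to the kernel console, not to an SSH session, which is why following dmesg as shown above is needed to see them remotely.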
Well, fortunately I haven't written many applications that cause a BSOD, but I wonder about the usefulness of the information on this screen. Does it contain anything that could help me find the error in my code? If so, what do I need, exactly?
And then, the system restarts and probably has written some error log or other information to the system somewhere. Where is it, what does it contain and how do I use it to improve my code?
I did get BSODs regularly in the past, when I was interacting with a PBX system whose driver documentation was simply absent, so I had to do some trial-and-error coding. Fortunately, I now work for a different company and don't see any BSODs as a result of my code.
If you want a fairly easy way to find out what caused an OS crash that will work ~90% of the time - assuming you have a crash dump available - then try the following:
Download WinDbg as part of the Debugging tools for Windows package. Note, you only need to install the component called Debugging Tools for Windows.
Run WinDbg
Select "Open Crash Dump" from the file menu
When the dump file has loaded, type !analyze -v and press Enter
WinDbg will do an automated analysis of the crash and will provide a huge amount of information on the system state at the time of the crash. It will usually be able to tell you which module was at fault and what type of error caused the crash. You should also get a stack trace that may or may not be helpful to you.
Another useful command is kb, which prints out a stack trace. In that list, look for a line that contains .sys. This is normally the driver which caused the crash.
Note that you will have to configure symbols in WinDbg if you want the stack trace to give you function names. To do this:
Create a folder such as C:\symbols
In WinDbg, open File -> Symbol File Path
Add: SRV*C:\symbols*http://msdl.microsoft.com/download/symbols
This will cache symbol files from Microsoft's servers.
If the automated analysis is not sufficient then there are a variety of commands that WinDbg provides to enable you to work out exactly what was happening at the time of the crash. The help file is a good place to start in this scenario.
Generally speaking, you cannot cause an OS crash or bug check from within your application code. That said, if you are looking for general tips, I recommend the NTDebugging blog. Most of the stuff is way over my head.
When the OS crashes, it writes a kernel dump file; how much information it contains depends on the configured dump settings. You can load the dump file in WinDbg or another debugger. WinDbg has the useful !analyze command, which examines the dump file and gives you hints about the bucket the crash fell into and the possible culprits. Also check the WinDbg documentation for the general cause of the bug check and what you can do to resolve it.