Character device driver hangs the system - how to avoid? - linux-kernel

I'm writing a simple writable character device driver (2.6.32-358.el6.x86_64, under VirtualBox), and since it's not mature yet, it tends to crash/freeze (segfaults, infinite loops).
I'm testing it like this: $> echo "some data" > /dev/my_dev, and if crash/freeze occurs, the whole system (VirtualBox) freezes. I tried to move all the work to another kernel thread to avoid the system-wide freeze, but it doesn't help.
Is it possible to "isolate" such a crash/freeze, so that I'd be able to kill the process, in whose context the kernel module runs?

The module runs in kernel context. That's why debugging it is difficult and bugs can easily crash the system. Infinite loop is not really an issue as it just slows the system down, but doesn't cause a crash. Writing to the wrong memory region however is fatal.
If you are lucky, you would get a kernel oops before the freeze. If you test your code in one of the TTYs, rather than the GUI, then you might immediately see the oops (kernel BUG log) on the screen which you can study and might be helpful to you.
In my experience however, it's best to write and test the kernel-independent code in user-space, probably with mock functions and test it heavily, run valgrind on it, and make sure it doesn't have bugs. Then use it in kernel space. You'd be surprised at how much of a kernel module's code may in fact not need kernel context at all. Of course this very much depends on the functionality of the kernel module.
To actually debug the code in kernel space, there are tools which I have never used, such as kgdb. What I do myself usually is a mixture of printks and binary search. That is, if the crash is so severe that the kernel oops is not shown at all. First, I put printk (possibly with a delay after) in different places to see which parts of the code are reached before the oops. tail -f /var/log/messages comes in handy. Then, I do binary search; disable half of the code to see if the crash occurs. If not, possibly the problem is in the second half. If it occurs, surely the problem is in the first half. Repeat!
The ultimate way of writing a bug-free kernel module is to write code that doesn't have bugs in the first place. Of course, this is rarely possible, but if you write clean and undefined-behavior-free C code and write very concise functions whose correctness is obvious and you pay attention to the boundaries of arrays, it's not that hard.

Related

Leak_DefinitelyLost can valgrind points out where is the last time the address was on the stack?

I'm having a leak which is very hard to detect;
Can valgrind tell me which is the last call where address was accessible? and what were the values of the variables? I use Clion, can it just break when it happens?
There is no "instantaneous" detection of leaks functionality in valgrind/memcheck
that reports a leak exactly at the time the last pointer to a block is lost.
There was an experimental tool that tried to do that, but it was never considered for integration in valgrind, due to various difficulties to make this work properly.
If your leak is easy to reproduce, you can run your application under valgrind +
gdb/vgdb. You can then add breaks at various points in your program, and then
use monitor commands such as "leak_check" or "who_points_at" to check if the leak already happened. By refining the locations where to put a break, this might help to find when the last pointer to a block is lost.
See e.g. https://www.valgrind.org/docs/manual/mc-manual.html#mc-manual.monitor-commands for more info.

Creating deliberately dirty heap in gdb?

I found that trying to debug accidentally uninitialized data in gdb can be annoying. The program will crash when directly executed from the command line, but not while under inspection in gdb. It seems like gdb's heap is often clean (all zeroes), whereas from the command line, clearly not.
Is there a reason for this? If so, can I deliberately tell gdb or gcc to dirty the heap? IE, is there way to specify a "debug" allocator that will always give random data to malloc() and new? I imagine this might involve a special libc? Obviously if there was a way to do this without changing the linker options would be great so that the release version is as similar as possible to the debug version.
I'm currently using MinGW-w64 (gcc 4.7 based), but I'd be interested in a general answer.
The Linux way of doing this would be to use valgrind. On Mac OS X there are environment variables that control allocation debugging, see the Mac OS X man page for malloc. Valgrind support for Mac OS X is starting to appear but 10.8 support is not complete as of me writing this.
As you're using MinGW-w64 I am assuming you're using Windows. It seems like this SO question talks about alternatives to valgrind on Windows. One solution would be to run your app in Wine on a Linux box under valgrind.
If your program is running under valgrind, it is not directly running on a CPU. Valgrind is simulating every instruction, hence you can't simply attach a debugger to it. To get this to work you need to use the valgrind GDB server, see this page for more details.
Another approach would be to use calloc instead of malloc, which would zero your heap allocations. This doesn't give you a deliberately dirty heap but at least gives you consistent behaviour with or without a debugger.
Yes, GDB zeroes out everything, this is both useful and very annoying. Useful, insofar as everything is guaranteed to be in a well-defined state (no random values in memory, just zero). Which means, in theory, no nasty surprises while debugging.
In practice, and this is where it gets annoying, the theory sometimes fails spectacularly. The infamous "works fine, but crashes in debugger!?!" or "works fine in debugger, but crashes otherwise?!" issues are an example of this. Usually, this is a combination of an uninitialized pointer with a well-intended if(ptr != NULL) somewhere, which totally blows up for "no good reason" because the debugger initializes memory to zero, so the test fails to do what you intended.
About your question on deliberately garbling data allocated by malloc, GCC supports malloc hooks (see docs here and question here on SO).
This lets you, in a very easy and unintrusive manner, redirect all calls to malloc to a function of your own. From there you can call the real malloc and fill the allocated block with garbage (or some invalid-pointer magic value like DEADBEEF), if you wish to do so.
As for operator new, this happens to be a wrapper around malloc (that's an implementation detail, but malloc hooks are non-portable already, so relying on that won't make things worse), therefore malloc hooking should already deal with this, too.

How to debug a program without a debugger? [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 9 years ago.
Interview question-
Often its pretty easier to debug a program once you have trouble with your code.You can put watches,breakpoints and etc.Life is much easier because of debugger.
But how to debug a program without a debugger?
One possible approach which I know is simply putting print statements in your code wherever you want to check for the problems.
Are there any other approaches other than this?
As its a general question, its not restricted to any specific language.So please share your thoughts on how you would have done it?
EDIT- While submitting your answer, please mention a useful resource (if you have any) about any concept. e.g. Logging
This will be lot helpful for those who don't know about it at all.(This includes me, in some cases :)
UPDATE: Michal Sznajderhas put a real "best" answer and also made it a community wiki.Really deserves lots of up votes.
Actually you have quite a lot of possibilities. Either with recompilation of source code or without.
With recompilation.
Additional logging. Either into program's logs or using system logging (eg. OutputDebugString or Events Log on Windows). Also use following steps:
Always include timestamp at least up to seconds resolution.
Consider adding thread-id in case of multithreaded apps.
Add some nice output of your structures
Do not print out enums with just %d. Use some ToString() or create some EnumToString() function (whatever suits your language)
... and beware: logging changes timings so in case of heavily multithreading you problems might disappear.
More details on this here.
Introduce more asserts
Unit tests
"Audio-visual" monitoring: if something happens do one of
use buzzer
play system sound
flash some LED by enabling hardware GPIO line (only in embedded scenarios)
Without recompilation
If your application uses network of any kind: Packet Sniffer or I will just choose for you: Wireshark
If you use database: monitor queries send to database and database itself.
Use virtual machines to test exactly the same OS/hardware setup as your system is running on.
Use some kind of system calls monitor. This includes
On Unix box strace or dtrace
On Windows tools from former Sysinternals tools like http://technet.microsoft.com/en-us/sysinternals/bb896645.aspx, ProcessExplorer and alike
In case of Windows GUI stuff: check out Spy++ or for WPF Snoop (although second I didn't use)
Consider using some profiling tools for your platform. It will give you overview on thing happening in your app.
[Real hardcore] Hardware monitoring: use oscilloscope (aka O-Scope) to monitor signals on hardware lines
Source code debugging: you sit down with your source code and just pretend with piece of paper and pencil that you are computer. Its so called code analysis or "on-my-eyes" debugging
Source control debugging. Compare diffs of your code from time when "it" works and now. Bug might be somewhere there.
And some general tips in the end:
Do not forget about Text to Columns and Pivot Table in Excel. Together with some text tools (awk, grep or perl) give you incredible analysis pack. If you have more than 32K records consider using Access as data source.
Basics of Data Warehousing might help. With simple cube you may analyse tons of temporal data in just few minutes.
Dumping your application is worth mentioning. Either as a result of crash or just on regular basis
Always generate you debug symbols (even for release builds).
Almost last but not least: most mayor platforms has some sort of command line debugger always built in (even Windows!). With some tricks like conditional debugging and break-print-continue you can get pretty good result with obscure bugs
And really last but not least: use your brain and question everything.
In general debugging is like science: you do not create it you discover it. Quite often its like looking for a murderer in a criminal case. So buy yourself a hat and never give up.
First of all, what does debugging actually do? Advanced debuggers give you machine hooks to suspend execution, examine variables and potentially modify state of a running program. Most programs don't need all that to debug them. There are many approaches:
Tracing: implement some kind of logging mechanism, or use an existing one such as dtrace(). It usually worth it to implement some kind of printf-like function that can output generally formatted output into a system log. Then just throw state from key points in your program to this log. Believe it or not, in complex programs, this can be more useful than raw debugging with a real debugger. Logs help you know how you got into trouble, while a debugger that traps on a crash assumes you can reverse engineer how you got there from whatever state you are already in. For applications that you use other complex libraries that you don't own that crash in the middle of them, logs are often far more useful. But it requires a certain amount of discipline in writing your log messages.
Program/Library self-awareness: To solve very specific crash events, I often have implemented wrappers on system libraries such as malloc/free/realloc which extensions that can do things like walk memory, detect double frees, attempts to free non-allocated pointers, check for obvious buffer over-runs etc. Often you can do this sort of thing for your important internal data types as well -- typically you can make self-integrity checks for things like linked lists (they can't loop, and they can't point into la-la land.) Even for things like OS synchronization objects, often you only need to know which thread, or what file and line number (capturable by __FILE__, __LINE__) the last user of the synch object was to help you work out a race condition.
If you are insane like me, you could, in fact, implement your own mini-debugger inside of your own program. This is really only an option in a self-reflective programming language, or in languages like C with certain OS-hooks. When compiling C/C++ in Windows/DOS you can implement a "crash-hook" callback which is executed when any program fault is triggered. When you compile your program you can build a .map file to figure out what the relative addresses of all your public functions (so you can work out the loader initial offset by subtracting the address of main() from the address given in your .map file). So when a crash happens (even pressing ^C during a run, for example, so you can find your infinite loops) you can take the stack pointer and scan it for offsets within return addresses. You can usually look at your registers, and implement a simple console to let you examine all this. And voila, you have half of a real debugger implemented. Keep this going and you can reproduce the VxWorks' console debugging mechanism.
Another approach, is logical deduction. This is related to #1. Basically any crash or anomalous behavior in a program occurs when it stops behaving as expected. You need to have some feed back method of knowing when the program is behaving normally then abnormally. Your goal then is to find the exact conditions upon which your program goes from behaving correctly to incorrectly. With printf()/logs, or other feedback (such as enabling a device in an embedded system -- the PC has a speaker, but some motherboards also have a digital display for BIOS stage reporting; embedded systems will often have a COM port that you can use) you can deduce at least binary states of good and bad behavior with respect to the run state of your program through the instrumentation of your program.
A related method is logical deduction with respect to code versions. Often a program was working perfectly at one state, but some later version is not longer working. If you use good source control, and you enforce a "top of tree must always be working" philosophy amongst your programming team, then you can use a binary search to find the exact version of the code at which the failure occurs. You can use diffs then to deduce what code change exposes the error. If the diff is too large, then you have the task of trying to redo that code change in smaller steps where you can apply binary searching more effectively.
Just a couple suggestions:
1) Asserts. This should help you work out general expectations at different states of the program. As well familiarize yourself with the code
2) Unit tests. I have used these at times to dig into new code and test out APIs
One word: Logging.
Your program should write descriptive debug lines which include a timestamp to a log file based on a configurable debug level. Reading the resultant log files gives you information on what happened during the execution of the program. There are logging packages in every common programming language that make this a snap:
Java: log4j
.Net: NLog or log4net
Python: Python Logging
PHP: Pear Logging Framework
Ruby: Ruby Logger
C: log4c
I guess you just have to write fine-grain unit tests.
I also like to write a pretty-printer for my data structures.
I think the rest of the interview might go something like this...
Candidate: So you don't buy debuggers for your developers?
Interviewer: No, they have debuggers.
Candidate: So you are looking for programmers who, out of masochism or chest thumping hamartia, make things complicated on themselves even if they would be less productive?
Interviewer: No, I'm just trying to see if you know what you would do in a situation that will never happen.
Candidate: I suppose I'd add logging or print statements. Can I ask you a similar question?
Interviewer: Sure.
Candidate: How would you recruit a team of developers if you didn't have any appreciable interviewing skill to distinguish good prospects based on relevant information?
Peer review. You have been looking at the code for 8 hours and your brain is just showing you what you want to see in the code. A fresh pair of eyes can make all the difference.
Version control. Especially for large teams. If somebody changed something you rely on but did not tell you it is easy to find a specific change set that caused your trouble by rolling the changes back one by one.
On *nix systems, strace and/or dtrace can tell you an awful lot about the execution of your program and the libraries it uses.
Binary search in time is also a method: If you have your source code stored in a version-control repository, and you know that version 100 worked, but version 200 doesn't, try to see if version 150 works. If it does, the error must be between version 150 and 200, so find version 175 and see if it works... etc.
use println/log in code
use DB explorer to look at data in DB/files
write tests and put asserts in suspicious places
More generally, you can monitor side effects and output of the program, and trigger certain events in the program externally.
A Print statement isn't always appropriate. You might use other forms of output such as writing to the Event Log or a log file, writing to a TCP socket (I have a nice utility that can listen for that type of trace from my program), etc.
For programs that don't have a UI, you can trigger behavior you want to debug by using an external flag such as the existence of a file. You might have the program wait for the file to be created, then run through a behavior you're interested in while logging relevant events.
Another file's existence might trigger the program's internal state to be written to your logging mechanism.
like everyone else said:
Logging
Asserts
Extra Output
&
your favorite task manager or process
explorer
links here and here
Another thing I have not seen mentioned here that I have had to use quite a bit on embedded systems is serial terminals.
You can cannot a serial terminal to just about any type of device on the planet (I have even done it to embedded CPUs for hydraulics, generators, etc). Then you can write out to the serial port and see everything on the terminal.
You can get real fancy and even setup a thread that listens to the serial terminal and responds to commands. I have done this as well and implemented simple commands to dump a list, see internal variables, etc all from a simple 9600 baud RS-232 serial port!
Spy++ (and more recently Snoop for WPF) are tremendous for getting an insight into Windows UI bugs.
A nice read would be Delta Debugging from Andreas Zeller. It's like binary search for debugging

How Does AQTime Do It?

I've been testing out the performance and memory profiler AQTime to see if it's worthwhile spending those big $$$ for it for my Delphi application.
What amazes me is how it can give you source line level performance tracing (which includes the number of times each line was executed and the amount of time that line took) without modifying the application's source code and without adding an inordinate amount of time to the debug run.
The way that they do this so efficiently makes me think there might be some techniques/technologies used here that I don't know about that would be useful to know about.
Do you know what kind of methods they use to capture the execution line-by-line without code changes?
Are there other profiling tools that also do non-invasive line-by-line checking and if so, do they use the same techniques?
I've made an open source profiler for Delphi which does the same:
http://code.google.com/p/asmprofiler/
It's not perfect, but it's free :-). Is also uses the Detour technique.
It stores every call (you must manual set which functions you want to profile),
so it can make an exact call history tree, including a time chart (!).
This is just speculation, but perhaps AQtime is based on a technology that is similar to Microsoft Detours?
Detours is a library for instrumenting
arbitrary Win32 functions on x86, x64,
and IA64 machines. Detours intercepts
Win32 functions by re-writing the
in-memory code for target functions.
I don't know about Delphi in particular, but a C application debugger can do line-by-line profiling relatively easily - it can load the code and associate every code path with a block of code. Then it can break on all the conditional jump instructions and just watch and see what code path is taken. Debuggers like gdb can operate relatively efficiently because they work through the kernel and don't modify the code, they just get informed when each line is executed. If something causes the block to be exited early (longjmp), the debugger can hook that and figure out how far it got into the blocks when it happened and increment only those lines.
Of course, it would still be tough to code, but when I say easily I mean that you could do it without wasting time breaking on each and every instruction to update a counter.
The long-since-defunct TurboPower also had a great profiling/analysis tool for Delphi called Sleuth QA Suite. I found it a lot simpler than AQTime, but also far easier to get meaningful result. Might be worth trying to track down - eBay, maybe?

Question about g++ generated code

Dear g++ hackers, I have the following question.
When some data of an object is overwritten by a faulty program, why does the program eventually fail on destruction of that object with a double free error? How does it know if the data is corrupted or not? And why does it cause double free?
It's usually not that the object's memory is overwritten, but some part of the memory outside of the object. If this hits malloc's control structures, free will freak out once it accesses them and tries to do weird things based on the corrupted structure.
If you'd really only overwrite object memory with silly stuff, there's no way malloc/free would know. Your program might crash, but for other reasons.
Take a look at valgrind. It's a tool that emulates the CPU and watches every memory access for anomalies (like trying to overwrite malloc's control structures). It's really easy to use, most of the time you just start your program inside valgrind by prepending valgrind on the shell, and it saves you a lot of pain.
Regarding C++: always make sure that you use new in conjunction with delete and, respectively, new[] in conjunction with delete[]. Never mix them up. Bad things will happen, often similar to what you are describing (but valgrind would warn you).

Resources