Linux Page poisoning - memory-management

I am working on ARM Linux. When CONFIG_PAGE_POISONING is enabled, pages are filled with a poison byte pattern after free_pages(), and the pattern is verified before alloc_pages().
This helps me identify bit flips or page memory corruption, because the poison byte pattern is checked before a new page is allocated. But how do I identify the culprit? I searched on Google but could not find an answer.

I know it's an old question, but I've just had a similar issue, and it took me a while to debug it. So I will recommend two tools that were very useful to me (I used both of them simultaneously):
First tool: KMEMLEAK
In order to enable it you must enable it in your kernel configuration:
CONFIG_HAVE_DEBUG_KMEMLEAK=y
CONFIG_DEBUG_KMEMLEAK=y
And if you get this kernel log:
kmemleak: Kernel memory leak detector disabled
kmemleak: Early log buffer exceeded (xxxx), please increase DEBUG_KMEMLEAK_EARLY_LOG_SIZE
Then I also suggest adding this to your kernel configuration:
CONFIG_DEBUG_KMEMLEAK_EARLY_LOG_SIZE=4096
In addition I think you need to add kmemleak=on to the boot arguments (because I'm not sure it's enabled by default).
After all that, the KMEMLEAK tool is ready to run.
Now I suggest taking a look at the examples in the links below. They helped me understand how to use the API and how to read its logs:
KMEMLEAK first example
KMEMLEAK second example
Second tool: SLUB_DEBUG
SLUB_DEBUG is very useful for finding memory corruption caused by use-after-free, double-free and buffer-overrun errors.
In order to enable it you should update your kernel configuration:
CONFIG_SLUB_DEBUG=y
CONFIG_SLUB_DEBUG_ON=y
CONFIG_PAGE_POISONING=y
And I also suggest adding:
CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y
CONFIG_DEBUG_PAGEALLOC=y
CONFIG_PAGE_OWNER=y
In addition, you will probably need to add the following flags to the boot arguments: page_poison=on and slub_debug=FZP. If you have also set CONFIG_PAGE_OWNER, you also need page_owner=on.
After all the debug configuration is set, have a look at the following example. It's clear and useful.
SLUB_DEBUG example
Good Luck finding your bug !!!

Related

How can I track an event across multiple resources in gem5?

I would like to know if there is a proper method to track memory accesses
across multiple resources at once. For example I set up a simple dual core CPU
by advancing the simple.py from learning gem5 (I just added another
TimingSimpleCPU and made the port connections).
I took a look at the different debug options and found for example the
MemoryAccess flag (and others), but this seemed to only show the accesses at
the DRAM or one other resource component.
Nevertheless I imagine a way to track events across CPU, bus and finally memory.
Does this feature already exist?
What can I try next? Is it an idea to add my own --debug-flag, or can I work
with the TraceCPU for my specific use?
I haven't worked much with gem5 yet, so I'm not sure how to achieve this. So far I have only run in SE mode; would FS mode be a solution?
Finally I also found the TraceCPUData flag in the --debug-flags, but running
this with my config script created no output (like many other flags btw. ...).
It seems that this is a --debug-flag for the TraceCPU, what kind of output does this flag create and can it help me?

How to implement new instruction in linux KVM at unused x86 opcode

As part of understanding virtualization, I am trying to extend KVM and define a new instruction. The instruction will use previously unused opcodes.
ref- ref.x86asm.net/coder32.html.
Now, let's say there is an instruction like 'CPUID' (which causes a VM exit), and I want to add a new instruction, say 'NEWCPUID', which is similar to 'CPUID' in privilege and is trapped by the hypervisor, but differs in its implementation.
After going through some online resources, I was able to understand how to define new system calls, but I am not sure about which all files in linux source code do I need to add the code for NEWCPUID? Is there a better way than only relying on 'find' command?
I am facing below challenges:
1. Which all places in linux source code do I need to add code?
2. Not sure how this new instruction can be mapped to a previously unused opcode?
As I am completely new to this field and willing to learn, can someone explain to me in short how to go about this task? I need the right direction to achieve this. If there is a reference, tutorial or blog describing the process, it would be a great help!
Here are answers to some of your questions:
... but I am not sure about which all files in linux source code do I need to add the code for NEWCPUID?
A - The right place to add emulation for KVM is arch/x86/kvm/emulate.c. Take a look at how opcode_table[] is defined and at the hooks to the functions that its entries execute. The basic idea is that the guest executes an undefined instruction such as "db 0xunused"; this results in an exit since the instruction is undefined. In KVM, you look at the rip from the VMCS/VMCB and determine whether it's an instruction KVM knows about (such as NEWCPUID), and then KVM calls x86_emulate_instruction().
...Is there a better way than only relying on 'find' command?
A - Yes, pick an example system call and then use a symbol cross reference such as cscope.
... in short how to go about this task?
A - As I mentioned in 1, first of all find a way for the guest to attempt to execute this unused opcode (such as the db trick). I think the assembler will try to reject unknown opcodes. So that is the first step. Second, check whether your instruction causes a VM exit. For this, you can use tracing. Tracing emits a lot of output, so you have to use some filter options. If tracing is overwhelming, simply printk something in vmx_handle_exit (vmx.c). Finally, find a way to hook in your custom function from here. KVM already has handle_exception() to handle guest exceptions; that would be a good place to insert your custom function. See how this function calls emulate_instruction to emulate an exception to be injected into the guest.
I have deliberately skipped some of the questions since I consider them essential to figure out yourself in the process of learning. BTW, I think this may not be the best way to understand virtualization. A better way might be to write your own userspace hypervisor that utilizes KVM services via /dev/kvm, or maybe just a standalone hypervisor.

More specific OpenGL error information

Is there a way to retrieve more detailed error information when OpenGL has flagged an error? I know there isn't in core OpenGL, but is there perhaps some common extension or platform- or driver-dependent way or anything at all?
My basic problem is that I have a game (written in Java with JOGL), and when people have trouble with it, which they do on certain hardware/software configurations, it can be quite hard to trace down where the root of the problem lies. For performance reasons, I can't keep calling glGetError for each command but only do so at a few points in the program, so it's hard to even find which command flagged the error to begin with. Even if I could, however, the extremely general error codes that OpenGL has don't really tell me all that much about what happened (seeing as how the manpages on the commands even describe how the various error codes are reused for sometimes quite a few different actual error conditions).
It would be tremendously helpful if there were a way to find out what OpenGL command actually flagged the error, and also more details about the error that was flagged (like, if I get GL_INVALID_VALUE, what value to what argument was invalid and why?).
It seems a bit strange that drivers wouldn't provide this information, even if in a completely custom way, but looked as I have, I sure haven't found any way to find it. If it really is that they don't, is there any good reason for why that is so?
Actually, there is a feature in core OpenGL that will give you detailed debug information. But you are going to have to set your minimum version requirement pretty high to have this as a core feature.
Nevertheless, see this article -- even though it only went core in OpenGL 4.3, it existed in extension form for quite some time and it does not require any special hardware feature. So for the most part all you really need is a recent driver from NV or AMD.
I have an example of how to use this extension in an answer I wrote a while back, complete with a few utility functions to make the output easier to read. It is written in C, so I do not know how helpful it will be, but you might find something useful.
Here is the sort of output you can expect from this extension (AMD Catalyst):
OpenGL Error:
=============
Object ID: 102
Severity: Medium
Type: Performance
Source: API
Message: glDrawElements uses element index type 'GL_UNSIGNED_BYTE' that is not
optimal for the current hardware configuration; consider using
'GL_UNSIGNED_SHORT' instead.
Not only will it give you error information, but it will even give you things like performance warnings for doing something silly like using 8-bit vertex indices (which desktop GPUs do not like).
To answer another one of your questions, if you set the debug output to synchronous and install a breakpoint in your debug callback you can easily make any debugger break on an OpenGL error. If you examine the callstack you should be able to quickly identify exactly what API call generated most errors.
Here are some suggestions.
According to the man pages, glGetError returns the value of the error flag and then resets it to GL_NO_ERROR. I would use this property to track down your bug - if nothing else you can switch up where you call it and do a binary search to find where the error occurs.
I doubt calling glGetError will give you a performance hit. All it does is read back an error flag.
If you don't have the ability to test this on the specific hardware/software configurations those people have, it may be tricky. OpenGL drivers are implemented for specific devices, after all.
glGetError is good for basically saying that the previous line screwed up. That should give you a good starting point - you can look up in the man pages why that function will throw the error, rather than trying to figure it out based on its enum name.
There are other specific status functions to call, such as glGetProgramiv and glCheckFramebufferStatus, that you may want to check, as glGetError doesn't check for every type of error. That is, just because it reads clean doesn't mean another error didn't happen.

How to detect who's issuing a wrong kfree

I am suspecting a double kfree in my kernel code. Basically, I have a data structure that is kzalloced and kfreed in a module. I notice that the same address is allocated and then allocated again without being freed in the module.
I would like to know what technique should I employ in finding where the wrong kfree is issued.
1.
Yes, kmemleak is an excellent tool, especially suitable for system-wide analysis.
Note that if you are going to use it to analyze a kernel module, you may need to save the addresses of the ELF sections containing the code of the module (.text, .init.text, ...) when the module is loaded. This may help you decipher the call stacks in kmemleak's report. It usually makes sense to ask kmemleak to produce a report after the module has been unloaded, but kmemleak cannot resolve the addresses at that time.
While a module is loaded, the addresses of its sections can be found in the files in /sys/module/<module_name>/sections/.
After you have found the section each code address in the report belongs to and the corresponding offset into that section, you can use objdump, gdb, addr2line or a similar tool to obtain more detailed information about where the event of interest occurred.
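The lookup described above can be sketched in a few lines of Python. This is only an illustration: the helper names `parse_section_bases` and `resolve`, the section base addresses and the sample code address are all hypothetical, and the sketch assumes an address belongs to the section with the closest base at or below it (the real section sizes are not checked).

```python
import bisect


def parse_section_bases(saved):
    """Turn {'.text': '0xbf000000', ...} (strings as saved from sysfs) into a sorted list."""
    return sorted((int(value, 16), name) for name, value in saved.items())


def resolve(address, bases):
    """Return (section, offset) for the closest section base at or below address."""
    starts = [start for start, _ in bases]
    index = bisect.bisect_right(starts, address) - 1
    if index < 0:
        return None  # address lies below every saved section
    start, name = bases[index]
    return name, address - start


# Hypothetical values, saved while the module was still loaded.
saved = {".text": "0xbf000000", ".init.text": "0xbf004000", ".data": "0xbf008000"}
bases = parse_section_bases(saved)

section, offset = resolve(0xBF000124, bases)
print(f"{section}+{offset:#x}")  # prints ".text+0x124"
```

The resulting section name and offset are what you would then hand to objdump, gdb or addr2line together with the module's object file.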
2.
Besides that, if you are working on an x86 system and you would like to analyze a single kernel module, you can also use the KEDR LeakCheck tool.
Unlike with kmemleak, most of the time you do not need to rebuild the kernel to be able to use KEDR.
The instructions on how to build and use KEDR are here. A simple example of how LeakCheck can be used is described in the "Detecting Memory Leaks" section.
Have you tried enabling the kmemleak detection code?
See Documentation/kmemleak.txt for details.

How to debug a program without a debugger? [closed]

Interview question-
Often it's pretty easy to debug a program once you have trouble with your code. You can set watches, breakpoints, etc. Life is much easier because of the debugger.
But how to debug a program without a debugger?
One possible approach which I know is simply putting print statements in your code wherever you want to check for the problems.
Are there any approaches other than this?
As it's a general question, it's not restricted to any specific language. So please share your thoughts on how you would have done it.
EDIT- While submitting your answer, please mention a useful resource (if you have any) about any concept. e.g. Logging
This will be a lot of help for those who don't know about it at all. (This includes me, in some cases :)
UPDATE: Michal Sznajder has posted a real "best" answer and also made it a community wiki. It really deserves lots of upvotes.
Actually you have quite a lot of possibilities. Either with recompilation of source code or without.
With recompilation.
Additional logging. Either into the program's logs or using system logging (e.g. OutputDebugString or the Event Log on Windows). Also use the following steps:
Always include a timestamp with at least one-second resolution.
Consider adding the thread ID in the case of multithreaded apps.
Add some nice output of your structures.
Do not print out enums with just %d. Use some ToString() or create some EnumToString() function (whatever suits your language).
... and beware: logging changes timings, so in heavily multithreaded code your problems might disappear.
More details on this here.
Introduce more asserts
Unit tests
"Audio-visual" monitoring: if something happens do one of
use buzzer
play system sound
flash some LED by enabling hardware GPIO line (only in embedded scenarios)
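The logging advice above (timestamp, thread ID, readable enum names) can be sketched with Python's standard logging module. The `State` enum, the logger name and the helper names `make_logger` and `describe` are made up for this illustration; any language's logging package would do the same job.

```python
import enum
import logging
import threading


class State(enum.Enum):  # hypothetical enum, for illustration only
    IDLE = 0
    RUNNING = 1


def make_logger(stream=None):
    """Build a logger whose lines carry a timestamp and the thread identity."""
    logger = logging.getLogger("debug-sketch")
    logger.setLevel(logging.DEBUG)
    logger.propagate = False
    handler = logging.StreamHandler(stream)  # defaults to stderr when stream is None
    handler.setFormatter(logging.Formatter(
        "%(asctime)s [%(threadName)s/%(thread)d] %(levelname)s %(message)s"))
    logger.handlers = [handler]
    return logger


def describe(state):
    """Log-friendly enum text such as 'RUNNING(1)' instead of a bare number."""
    return f"{state.name}({state.value})"


log = make_logger()


def worker():
    log.debug("state changed to %s", describe(State.RUNNING))


thread = threading.Thread(target=worker, name="worker-1")
thread.start()
thread.join()
```

Each line then shows when the event happened and on which thread, which is usually the first thing you need when reading logs from a multithreaded program.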
Without recompilation
If your application uses a network of any kind: use a packet sniffer, or I will just choose for you: Wireshark
If you use a database: monitor the queries sent to the database and the database itself.
Use virtual machines to test exactly the same OS/hardware setup as your system is running on.
Use some kind of system calls monitor. This includes
On Unix box strace or dtrace
On Windows, the former Sysinternals tools such as http://technet.microsoft.com/en-us/sysinternals/bb896645.aspx, ProcessExplorer and the like
In case of Windows GUI stuff: check out Spy++ or, for WPF, Snoop (although I haven't used the second one)
Consider using some profiling tools for your platform. They will give you an overview of what is happening in your app.
[Real hardcore] Hardware monitoring: use an oscilloscope (aka O-Scope) to monitor signals on hardware lines
Source code debugging: you sit down with your source code and, with a piece of paper and a pencil, pretend that you are the computer. It's so-called code analysis or "on-my-eyes" debugging
Source control debugging. Compare diffs of your code from time when "it" works and now. Bug might be somewhere there.
And some general tips in the end:
Do not forget about Text to Columns and Pivot Table in Excel. Together with some text tools (awk, grep or perl) they give you an incredible analysis pack. If you have more than 32K records, consider using Access as the data source.
Basics of Data Warehousing might help. With simple cube you may analyse tons of temporal data in just few minutes.
Dumping your application is worth mentioning. Either as a result of crash or just on regular basis
Always generate your debug symbols (even for release builds).
Almost last but not least: most major platforms have some sort of command line debugger built in (even Windows!). With some tricks like conditional debugging and break-print-continue you can get pretty good results with obscure bugs
And really last but not least: use your brain and question everything.
In general debugging is like science: you do not create it, you discover it. Quite often it's like looking for a murderer in a criminal case. So buy yourself a hat and never give up.
First of all, what does debugging actually do? Advanced debuggers give you machine hooks to suspend execution, examine variables and potentially modify state of a running program. Most programs don't need all that to debug them. There are many approaches:
Tracing: implement some kind of logging mechanism, or use an existing one such as dtrace(). It is usually worth it to implement some kind of printf-like function that can write generally formatted output to a system log. Then just throw state from key points in your program into this log. Believe it or not, in complex programs this can be more useful than raw debugging with a real debugger. Logs help you know how you got into trouble, while a debugger that traps on a crash assumes you can reverse engineer how you got there from whatever state you are already in. For applications that use complex libraries you don't own, and that crash in the middle of those libraries, logs are often far more useful. But it requires a certain amount of discipline in writing your log messages.
Program/Library self-awareness: To solve very specific crash events, I often have implemented wrappers on system libraries such as malloc/free/realloc with extensions that can do things like walk memory, detect double frees, detect attempts to free non-allocated pointers, check for obvious buffer over-runs, etc. Often you can do this sort of thing for your important internal data types as well -- typically you can make self-integrity checks for things like linked lists (they can't loop, and they can't point into la-la land.) Even for things like OS synchronization objects, often you only need to know which thread, or what file and line number (capturable by __FILE__, __LINE__) the last user of the synch object was to help you work out a race condition.
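The bookkeeping behind such a wrapper can be sketched as follows. This is a Python illustration of the idea only, not a real malloc/free replacement: `TrackingAllocator`, `DoubleFreeError`, the integer "addresses" and the `site` strings (standing in for __FILE__/__LINE__) are all hypothetical.

```python
class DoubleFreeError(Exception):
    """Raised when an address is freed a second time."""


class TrackingAllocator:
    """Records where every block was allocated and freed."""

    def __init__(self):
        self._next_address = 0x1000  # fake addresses, never reused in this sketch
        self._live = {}              # address -> allocation site
        self._freed = {}             # address -> site of the first free

    def alloc(self, size, site):
        if size <= 0:
            raise ValueError("size must be positive")
        address = self._next_address
        self._next_address += size
        self._live[address] = site
        return address

    def free(self, address, site):
        if address in self._live:
            del self._live[address]
            self._freed[address] = site
            return
        if address in self._freed:
            raise DoubleFreeError(
                f"{address:#x} freed at {site}, already freed at {self._freed[address]}")
        raise ValueError(f"{address:#x} freed at {site} was never allocated")

    def leaks(self):
        """Blocks that are still allocated, with their allocation sites."""
        return dict(self._live)


heap = TrackingAllocator()
block = heap.alloc(64, "init.c:10")
heap.free(block, "teardown.c:42")
try:
    heap.free(block, "error_path.c:7")
except DoubleFreeError as exc:
    print(exc)  # prints "0x1000 freed at error_path.c:7, already freed at teardown.c:42"
```

Recording the site of the first free is the important design choice: the report then names both offenders instead of only the second one.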
If you are insane like me, you could, in fact, implement your own mini-debugger inside of your own program. This is really only an option in a self-reflective programming language, or in languages like C with certain OS-hooks. When compiling C/C++ in Windows/DOS you can implement a "crash-hook" callback which is executed when any program fault is triggered. When you compile your program you can build a .map file to figure out the relative addresses of all your public functions (so you can work out the loader initial offset by subtracting the address of main() from the address given in your .map file). So when a crash happens (even pressing ^C during a run, for example, so you can find your infinite loops) you can take the stack pointer and scan it for offsets within return addresses. You can usually look at your registers, and implement a simple console to let you examine all this. And voila, you have half of a real debugger implemented. Keep this going and you can reproduce the VxWorks' console debugging mechanism.
Another approach is logical deduction. This is related to #1. Basically any crash or anomalous behavior in a program occurs when it stops behaving as expected. You need to have some feedback method of knowing when the program is behaving normally and when it is behaving abnormally. Your goal then is to find the exact conditions upon which your program goes from behaving correctly to incorrectly. With printf()/logs, or other feedback (such as enabling a device in an embedded system -- the PC has a speaker, but some motherboards also have a digital display for BIOS stage reporting; embedded systems will often have a COM port that you can use) you can deduce at least binary states of good and bad behavior with respect to the run state of your program through the instrumentation of your program.
A related method is logical deduction with respect to code versions. Often a program was working perfectly at one state, but some later version is no longer working. If you use good source control, and you enforce a "top of tree must always be working" philosophy amongst your programming team, then you can use a binary search to find the exact version of the code at which the failure occurs. You can then use diffs to deduce what code change exposes the error. If the diff is too large, then you have the task of trying to redo that code change in smaller steps where you can apply binary searching more effectively.
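The binary search over versions can be sketched like this. It assumes versions are numbered with integers, that there is exactly one transition from working to broken, and that you can supply a check for a given version; `first_bad_version` and `is_good` are hypothetical names, and the regression at version 137 is invented for the demonstration.

```python
def first_bad_version(good, bad, is_good):
    """Return the first failing version in (good, bad].

    Assumes version `good` works, version `bad` fails, and every version
    after the first failing one also fails.
    """
    while bad - good > 1:
        middle = (good + bad) // 2
        if is_good(middle):
            good = middle   # the breakage happened after `middle`
        else:
            bad = middle    # the breakage happened at or before `middle`
    return bad


tested = []


def is_good(version):
    """Stand-in for 'check out this version, build it, run the failing test'."""
    tested.append(version)
    return version < 137  # hypothetical: the regression went in at version 137


print(first_bad_version(100, 200, is_good))  # prints 137
print(len(tested))                           # prints 7
```

Seven builds instead of up to a hundred is the whole point; version control tools such as git offer the same search as a built-in bisect command.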
Just a couple suggestions:
1) Asserts. These should help you work out general expectations at different states of the program, as well as familiarize yourself with the code
2) Unit tests. I have used these at times to dig into new code and test out APIs
One word: Logging.
Your program should write descriptive debug lines which include a timestamp to a log file based on a configurable debug level. Reading the resultant log files gives you information on what happened during the execution of the program. There are logging packages in every common programming language that make this a snap:
Java: log4j
.Net: NLog or log4net
Python: Python Logging
PHP: Pear Logging Framework
Ruby: Ruby Logger
C: log4c
I guess you just have to write fine-grain unit tests.
I also like to write a pretty-printer for my data structures.
I think the rest of the interview might go something like this...
Candidate: So you don't buy debuggers for your developers?
Interviewer: No, they have debuggers.
Candidate: So you are looking for programmers who, out of masochism or chest thumping hamartia, make things complicated on themselves even if they would be less productive?
Interviewer: No, I'm just trying to see if you know what you would do in a situation that will never happen.
Candidate: I suppose I'd add logging or print statements. Can I ask you a similar question?
Interviewer: Sure.
Candidate: How would you recruit a team of developers if you didn't have any appreciable interviewing skill to distinguish good prospects based on relevant information?
Peer review. You have been looking at the code for 8 hours and your brain is just showing you what you want to see in the code. A fresh pair of eyes can make all the difference.
Version control. Especially for large teams. If somebody changed something you rely on but did not tell you, it is easy to find the specific change set that caused your trouble by rolling the changes back one by one.
On *nix systems, strace and/or dtrace can tell you an awful lot about the execution of your program and the libraries it uses.
Binary search in time is also a method: If you have your source code stored in a version-control repository, and you know that version 100 worked, but version 200 doesn't, try to see if version 150 works. If it does, the error must be between version 150 and 200, so find version 175 and see if it works... etc.
use println/log in code
use DB explorer to look at data in DB/files
write tests and put asserts in suspicious places
More generally, you can monitor side effects and output of the program, and trigger certain events in the program externally.
A Print statement isn't always appropriate. You might use other forms of output such as writing to the Event Log or a log file, writing to a TCP socket (I have a nice utility that can listen for that type of trace from my program), etc.
For programs that don't have a UI, you can trigger behavior you want to debug by using an external flag such as the existence of a file. You might have the program wait for the file to be created, then run through a behavior you're interested in while logging relevant events.
Another file's existence might trigger the program's internal state to be written to your logging mechanism.
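A minimal sketch of the flag-file idea, assuming the program has a main loop that can poll a directory once per iteration. The function name `poll_flags`, the flag file name `dump_state` and the sample state are hypothetical.

```python
import json
import os
import tempfile


def poll_flags(flag_dir, state, log):
    """Check once for a flag file; call this from the program's main loop."""
    dump_flag = os.path.join(flag_dir, "dump_state")  # hypothetical file name
    if os.path.exists(dump_flag):
        os.remove(dump_flag)  # consume the flag so the dump happens only once
        log(json.dumps(state, sort_keys=True))
        return True
    return False


messages = []
with tempfile.TemporaryDirectory() as flag_dir:
    state = {"queue_length": 3, "mode": "idle"}
    poll_flags(flag_dir, state, messages.append)  # no flag yet: nothing is logged
    # What an operator would do from a shell: touch <flag_dir>/dump_state
    open(os.path.join(flag_dir, "dump_state"), "w").close()
    poll_flags(flag_dir, state, messages.append)  # flag present: state is dumped

print(messages)  # prints ['{"mode": "idle", "queue_length": 3}']
```

Deleting the flag after handling it keeps the trigger one-shot, so the program does not dump its state on every loop iteration.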
Like everyone else said:
Logging
Asserts
Extra Output
&
your favorite task manager or process explorer
links here and here
Another thing I have not seen mentioned here that I have had to use quite a bit on embedded systems is serial terminals.
You can connect a serial terminal to just about any type of device on the planet (I have even done it to embedded CPUs for hydraulics, generators, etc). Then you can write out to the serial port and see everything on the terminal.
You can get real fancy and even set up a thread that listens to the serial terminal and responds to commands. I have done this as well and implemented simple commands to dump a list, see internal variables, etc., all from a simple 9600 baud RS-232 serial port!
Spy++ (and more recently Snoop for WPF) are tremendous for getting an insight into Windows UI bugs.
A nice read would be Delta Debugging from Andreas Zeller. It's like binary search for debugging
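To give a feel for the idea, here is a simplified sketch that loosely follows the ddmin scheme from Zeller's work; it is not his exact algorithm. It assumes the failing input can be treated as a list and that `fails` is a deterministic check you provide; both names and the sample failure condition are made up.

```python
def ddmin(items, fails):
    """Shrink `items` to a smaller list for which fails() still returns True."""
    assert fails(items), "the starting input must reproduce the failure"
    n = 2  # number of chunks the input is currently split into
    while len(items) >= 2:
        chunk = max(len(items) // n, 1)
        subsets = [items[i:i + chunk] for i in range(0, len(items), chunk)]
        reduced = False
        for index, subset in enumerate(subsets):
            complement = [x for j, s in enumerate(subsets) if j != index for x in s]
            if fails(subset):          # one chunk alone still fails: keep only it
                items, n, reduced = subset, 2, True
                break
            if fails(complement):      # failure survives without this chunk: drop it
                items, n, reduced = complement, max(n - 1, 2), True
                break
        if not reduced:
            if n >= len(items):
                break                  # already at single-element granularity
            n = min(n * 2, len(items)) # try again with finer chunks
    return items


# Hypothetical failure: the program only crashes when inputs 3 and 6 are both present.
def crashes(inputs):
    return 3 in inputs and 6 in inputs


print(ddmin(list(range(1, 9)), crashes))  # prints [3, 6]
```

Where binary search over versions narrows down when a bug appeared, this narrows down which part of the input (or which set of changes) is needed to trigger it.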
