How to monitor memory usage of all processes in Linux? - embedded-linux

I'm developing a program that runs on embedded Linux (Debian Buster), and I found that it sometimes has performance issues. After some debugging, I suspect the issue might not be in my program itself. Instead, the OS seems to start swapping at some point, and my program gets swapped out to disk.
Therefore, I used the code here to verify. It turns out my program occupies much less physical memory after about 500 seconds, which matches the hypothesis.
Now I want to find out which process suddenly takes a lot of memory at that point, but I don't know how.
Is there any way to keep monitoring the memory usage of all processes (or the top 10) on the system and dump it to a log file? Any tools or commands would be good.
Thanks.

I'm developing a program running on embedded Linux
It would be helpful if you could specify which embedded Linux you are working on.
Based on that, someone could suggest suitable tools.
For Linux, I would say, you could use:
top -p [PID]
You can get the PID with:
ps [options]
Is there some problem with using the command line for this?
dump to a log file
You can redirect the terminal output to a log file with >> (optionally filtering it through grep first); the redirection creates the file for you, so creating it with touch beforehand is not needed.
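For example, here is a minimal sketch of such a logging loop (the interval, the log path, and the availability of procps ps with --sort are assumptions; adjust for your system):

#!/bin/sh
# Append a timestamp and the top 10 memory consumers (by RSS) every 10 seconds.
INTERVAL=10
LOGFILE=/tmp/memlog.txt
while true; do
    date >> "$LOGFILE"
    ps -eo pid,comm,rss,vsz --sort=-rss | head -n 11 >> "$LOGFILE"
    sleep "$INTERVAL"
done

On Debian Buster the procps version of ps supports --sort; on a stripped-down BusyBox system, top -b -n 1 piped through head is an alternative.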

Related

Monitor what processes are writing to a file on Windows from Go

I am writing a Go app targeted at macOS and Windows that needs to monitor which processes write to a file at a given path. More specifically, I need to verify that only one specific process writes to the file for the duration that my program is running. On macOS, I can monitor the file via the built-in fs_usage command. Does anyone have an idea how to achieve equivalent monitoring on at least Windows 10 and later, without requiring the user to install any additional software?
Note that I don't expect for there to exist a pure Go solution and I don't mind interoperating to achieve the desired result.

Using VMMap in a batch script

I am doing some analysis work on some software we are running where I work. The software seems to have memory issues somewhere along the line, which are proving difficult to track down. We have decided to use Sysinternals VMMap to track the memory being used by the software.
We have VMMap exporting the usage every 20 seconds, using the Windows scheduler to launch a batch script which pulls back the target process PID and launches VMMap with it. The process runs for a while, with output appearing in the output directory, but after a while it stops. The Windows scheduler reports the job ran fine and starts another instance when the trigger is met, once again with no output.
After a bit of investigation, it looks like VMMap is failing to open the process and is trying to report an error through its GUI. Since we are running in batch, we cannot see this error to dismiss it. This causes numerous processes to be spawned that don't actually do anything.
Has anyone come across this issue when using VMMap, or know of anything that may help? I am thinking there may be some flag I can pass which suppresses messages or maybe some way I can handle it in the batch but Google hasn't helped nor has the Sysinternals forum. Any help would be really appreciated.
VMMap is a GUI tool, so trying to capture its output in an automated way will be difficult. Instead, try another Sysinternals tool, Handle, which captures much of the same information but reports it on the command line, where it can be captured far more easily. Alternatively, don't run VMMap in an auto-repeating way; instead, have your script detect the error or the missing expected output and stop, so the GUI error can be examined.
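For instance, assuming Handle's -p switch and a placeholder PID of 1234, a scripted capture could look like:

rem List the open handles of process 1234 and capture them to a file
handle.exe -accepteula -p 1234 > out\handles_1234.txt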
All Sysinternals tools pop up a consent dialog the first time they are started on a new machine, to accept their license. I suspect you deployed the tool to a production machine and it was trying to show the consent dialog, but nobody was there to press OK.
They basically create a registry key on the machine, which you can set yourself if you need a fully automated deployment, or you can run the tool once on the target machine as the user in question.
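A sketch of both options (the -accepteula switch is standard across Sysinternals tools; the VMMap capture arguments mirror the setup described in the question, and the PID and paths are placeholders):

rem Option 1: accept the EULA on the command line at launch
vmmap.exe -accepteula -p 1234 out\snapshot.mmp

rem Option 2: pre-create the registry value the consent dialog would set
reg add HKCU\Software\Sysinternals\VMMap /v EulaAccepted /t REG_DWORD /d 1 /f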

How can I find whether a process is in deadlock or is waiting for I/O

Asked by an Interviewer:
How can we find out whether an application has become unresponsive due to a deadlock or due to waiting on some I/O?
Can anybody suggest a general way of doing this, or any OS-specific ways of doing it?
This is an OS-related question, I believe, so I am not tagging any language here.
EDIT: I would like to know about the techniques and the APIs as well, so that I can run a monitoring program if I wish.
On Linux I would use sar -u 1. If the %iowait column is high, then the application is probably waiting for I/O.
On Windows you can attach WinDbg and then execute !analyze -v -hang which will work out which thread is waiting on I/O. (The only time I used this I got lucky and it was an open call which was waiting, so I got to find out the file name very quickly.)
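On Linux you can also check a specific process directly: a task stuck in uninterruptible sleep (state D) is almost always blocked on I/O. A small sketch, with $PID as a placeholder:

# System-wide CPU/iowait, one sample per second, 30 samples (sysstat package)
sar -u 1 30

# Per-process: state D means uninterruptible sleep (usually I/O);
# the wchan column shows the kernel function the task is sleeping in
ps -o pid,state,wchan:32,comm -p "$PID"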
The answer is that there are many possible designs and solutions.
If your application uses open() with lockf() or flock() to lock a resource, then the next time another process (or the same process) attempts to flock() the same file, it will block.
If you request a non-blocking lock with LOCK_NB (see man 2 flock on Ubuntu) and the call returns with an EWOULDBLOCK error, you can deduce that the file is locked.
To identify all the locked files in the OS, one way is to run lsof to see all the open files; from the file name, and using fcntl(), you can identify the types of locks held.
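As a quick illustration from the shell (using the flock(1) utility from util-linux; the lock file path is a placeholder):

# Try to take an exclusive lock without blocking; a non-zero exit
# status means another process already holds the lock
flock -n /var/lock/demo.lock -c 'echo got the lock' || echo "already locked"

# System-wide view of the advisory locks currently held
cat /proc/locks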
Many alternative designs are possible: for example, Oracle Database has a concept called the waiter list, which lists all the waiters waiting on existing locked records. Because of this sophisticated design, automatic deadlock detection is also possible.
http://www.dba-oracle.com/t_deadlock.htm
Other techniques are described in general OS courses:
http://lovingod.host.sk/tanenbaum/Recovery-from-Deadlock.html
On Linux you can attach gdb to a running process. It will stop the process at the point where it is running, and with bt you'll get the backtrace. You can also get info on all running threads, switch between them, and look at the backtrace of each using info threads; thread N; bt.
Another very useful tool on Linux is strace, which traces system calls; you can attach it to running processes as well. The -c option shows you profiling information about the system calls made by the program.
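Both can be run non-interactively from a script; a sketch, with $PID as a placeholder:

# Dump the backtrace of every thread of a running process, then detach
gdb -p "$PID" -batch -ex 'thread apply all bt'

# Attach strace and print a syscall count/time summary (stop with Ctrl-C)
strace -c -p "$PID"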

Sample a process on Mac OS X from a C/C++ program

The Sample Process feature in Activity Monitor is quite a useful thing. However, I need to do the same thing (take samples of a certain process) from another running process (C/C++) or from the command line.
Is there any way to do this? I have been googling for this for a few days without any luck.
There is a command-line utility sample.
Example:
sample Safari -file /dev/stdout
It will give exactly the same output as Activity Monitor.
There are a few command-line applications that come in handy: sample and top.
If you want to write your own program, you can use the sysctl system call to get such information. However, it's quite tedious.
I would recommend installing the procfs file system (built with MacFUSE). This creates a new "directory" at /proc that contains a lot of useful information for each application (e.g. memory usage, CPU usage, locks, open files, sockets, threads, etc.). The site gives a sample of how it can be accessed. You can then simply script your access to those files.

I need to find the point in my userland code that crash my kernel

I have a big system that makes my machine crash hard. When I boot back up, I don't even have a coredump. If I could log every line that gets executed until the system goes down, I would find that evil code.
Can I log every source code line in GDB to a file?
UPDATE:
OK, I found the bug. It was nasty. The application I started did not take the system down. After learning about coredump inspection with mdb, and some gdb stepping, I found out that the system call causing the dump was not implemented. Updating the system to the latest kernel will fix my problem. Thanks to all of you.
MY LESSON:
Make sure you know which process causes the coredump. It's not always the one you started.
Sounds like a tricky little problem.
I often try to eliminate as many possible suspects as I can by commenting out large chunks of code, configuring the system to not run certain pieces (if it allows you to do that) etc. This amounts to doing an ad-hoc binary search on the problem, and is a surprisingly effective way of zooming in on offending code relatively quickly.
A potential problem with logging is that the log might not hit the disk before the system locks up - if you don't get a core dump, you might not get the log.
Speaking of core dumps, make sure you don't have a limit on your core dump size (see man ulimit).
You could try to obtain a list of all the functions in your code using objdump, process it a little bit and create a bunch of GDB trace statements on those functions - basically creating a GDB script automatically. If that turns out to be overkill, then a binary search on the code using tracepoints can also help you zoom in on the problem.
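A rough sketch of that automatic approach (the binary name is a placeholder; gdb's rbreak command with a wildcard is a simpler alternative):

# Emit a gdb command file with a silent, continuing breakpoint
# on every text symbol defined in the binary
nm --defined-only ./myapp | awk '$2 == "T" {
    print "break " $3
    print "commands\nsilent\nbt 1\ncontinue\nend"
}' > trace.gdb
gdb -x trace.gdb ./myapp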
And don't panic. You're smarter than the bug - you'll find it.
You cannot reasonably trace every line of your source using GDB (far too slow). Besides, a system crash is most likely the result of a system call, and libc is probably making the system call on your behalf. Even if you find the line of the application that caused the OS crash, you still don't really know anything.
You should start by clarifying which OS is crashing. For Linux, you can try the following approaches:
strace -fo trace.out /path/to/app
After reboot, trace.out will contain syscalls the application was doing just before the crash. If you are lucky, you'll see the last syscall-of-death, but I wouldn't count on it.
Alternatively, try to reproduce the crash on User-Mode Linux, or on a kernel with KGDB compiled in.
These will tell you where the problem in the kernel is. Finding the matching system call in your application will likely be trivial.
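For the KGDB route, a minimal sketch (the serial device and baud rate are assumptions; this requires a kernel built with CONFIG_KGDB_SERIAL_CONSOLE):

# Kernel boot parameters on the target: stage a debugger on ttyS0 and wait
kgdboc=ttyS0,115200 kgdbwait

# On the development host, attach gdb to the matching vmlinux over serial
gdb ./vmlinux -ex 'target remote /dev/ttyS0'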
Please clarify your problem: What part of the system is crashing?
Is it an application?
If so, which application? Is this an application which you have written yourself? Is this an application you have obtained from elsewhere? Can you obtain a clean interrupt if you use a debugger? Can you obtain a backtrace showing which functions are calling the section of code which crashes?
Is it a new hardware driver?
Is it based on an older driver? If so, what has changed? Is it based on a manufacturer's data sheet? Is that data sheet the latest and most correct?
Is it somewhere in the kernel? Which kernel?
What is the OS? I assume it is Linux, seeing that you are using the GNU debugger. But of course, that is not necessarily so.
You say you have no coredump. Have you enabled coredumps on your machine? Most systems these days do not have coredumps enabled by default.
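For reference, a sketch of enabling core dumps for a shell session and checking where the kernel writes them (persistent settings belong in limits.conf or your init system's configuration):

# Remove the core-size limit for processes started from this shell
ulimit -c unlimited

# See the pattern/location the kernel uses for core files
cat /proc/sys/kernel/core_pattern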
Regarding logging GDB output, you may have some success, but it depends where the problem is whether or not you will have the right output logged before the system crashes. There is plenty of delay in writing to disk. You may not catch it in time.
I'm not familiar with the gdb way of doing this, but with WinDbg the way to go is to have a debugger attached to the kernel and to control it remotely over a serial cable (or FireWire) from a second machine. I'm pretty sure gdb has similar capabilities; I could quickly find some hints here: http://www.digipedia.pl/man/gdb.4.html
