system profiling - usage information of shared libraries - linux-kernel

Is there any way to know which library files are being used by which processes (or by how many processes) over some period of time?
Can VTune, perf, or OProfile be used for this?

At any moment, you can list all shared libraries in the memory map of a particular process:
cat /proc/<pid>/maps | grep <name of library>
You can also list the running processes that currently have a particular shared library open:
lsof <path-to-shared-library-file>

Is there any way to know which library files are being used by which processes (or by how many processes)?
You can take a snapshot by cat /proc/*/maps > /tmp/snapshot and then use grep and wc to answer your question.
If you want to monitor the system for some period of time, you could take the snapshot every second or so.
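The snapshot idea can also be scripted directly. A minimal sketch in Python (the function name is made up) that counts, in one pass over /proc, how many processes currently have a given library mapped; call it periodically to monitor over time:

```python
import glob

def count_library_users(library_name):
    """Count processes whose memory map contains the given library name."""
    users = 0
    for maps_path in glob.glob("/proc/[0-9]*/maps"):
        try:
            with open(maps_path) as maps:
                if any(library_name in line for line in maps):
                    users += 1
        except OSError:
            # The process exited in the meantime, or we lack permission.
            continue
    return users
```

Note that without root you will only see the maps of your own processes; the lsof approach above has the same limitation.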
Can VTune, perf, or OProfile be used for this?
You can do perf record -a, then perf script -D and look for PERF_RECORD_MMAP events.
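Turning that raw dump into counts is then a small filtering job over the PERF_RECORD_MMAP lines. A sketch in Python; the regex is deliberately permissive because the exact field layout varies between perf versions, and the sample lines in the test are made up:

```python
import re
from collections import defaultdict

# perf script -D prints one line per mmap event, roughly:
#   ... PERF_RECORD_MMAP <pid>/<tid>: [<addr>(<len>) @ <pgoff>]: <prot> <filename>
# MMAP2 events carry extra fields, but the PID and trailing filename stay put.
MMAP_LINE = re.compile(r"PERF_RECORD_MMAP2? (\d+)/\d+.*\s(\S+)$")

def pids_per_mapped_file(dump_lines):
    """Map each mmap'ed file to the set of PIDs that mapped it."""
    users = defaultdict(set)
    for line in dump_lines:
        match = MMAP_LINE.search(line.rstrip())
        if match:
            users[match.group(2)].add(int(match.group(1)))
    return users
```

Feed it the lines of the perf script -D output; len(users[lib]) then answers the "how many processes" part of the question.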

Query GPU memory usage and/or user by PID

I have a list of PIDs of processes running on different GPUs. I want to get the used GPU memory of each process based on its PID. nvidia-smi yields the information I want; however, I don't know how to grep it, as the output is sophisticated. I have already looked for how to do it, but I have not found any straightforward answers.
While the default output of nvidia-smi is "sophisticated", or rather formatted for humans instead of scripts, the command provides plenty of options for scripted use. The ones most fitting for your use case seem to be --query-compute-apps=pid,used_memory, specifying the information you need, and --format=csv,noheader,nounits, specifying minimal, machine-readable output.
So the resulting command is
nvidia-smi --query-compute-apps=pid,used_memory --format=csv,noheader,nounits
I recommend taking a look at man nvidia-smi for further information and options.
nvidia-smi --query-compute-apps=pid,used_memory,gpu_bus_id --format=csv
The gpu_bus_id field will help you if you have multiple GPUs.
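The noheader,nounits variant is then trivial to post-process. A sketch in Python that sums used memory per GPU from that CSV output (the sample figures below are made up):

```python
import csv
import io
from collections import defaultdict

def used_memory_per_gpu(csv_text):
    """Sum used_memory (MiB) per GPU bus id from the output of:
    nvidia-smi --query-compute-apps=pid,used_memory,gpu_bus_id \
               --format=csv,noheader,nounits
    """
    totals = defaultdict(int)
    for row in csv.reader(io.StringIO(csv_text), skipinitialspace=True):
        if not row:
            continue
        _pid, used_mib, bus_id = row
        totals[bus_id] += int(used_mib)
    return dict(totals)

# Made-up sample in the shape that command produces:
sample = ("1234, 501, 00000000:01:00.0\n"
          "5678, 1024, 00000000:01:00.0\n"
          "9012, 256, 00000000:02:00.0\n")
```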

Getting each of the CPU cores usage via terminal in MacOS

I need to get the list of each of CPU cores usage with one command in macOS terminal.
I have been searching the web for a few hours, but all I was able to find were two approaches, neither of which is what I am looking for.
The first is the htop command. As I understand it, it prints each core's load on screen, but I was not able to extract this information with a single grep command.
I tried looking at the htop source code, but could not work out how it gets the per-core usage information.
Another solution that I found involves the usage of
ps -A -o %cpu | awk '{s+=$1} END {print s "%"}'
The result is one number that represents the overall CPU usage. If I am correct, the output of the macOS ps command used here does not report which core each process is running on, so it is not possible to use that approach for my task.
I hope that it is possible to get such results on macOS.
Nope, this is how you do it on Mac:
Put the Activity Monitor in the Dock
Right click on the icon > Monitors

can "perf record" or "perf-record" sample child processes?

Assume I have a harness binary which spawns different benchmarks according to a command-line option. I am really interested in sampling these benchmarks.
I have 3 options:
change the harness binary to spawn a perf record child process which runs the benchmarks and does the sampling
just do perf record $harness-binary, hoping it will sample the child processes too.
perf record -a $harness-binary, which would do a "System-wide collection from all CPUs." This requires root access and is therefore not feasible in my case.
Approach #2 is clean if perf-record really samples the child process. Can somebody help to confirm if this is the case? Pointers to documents or perf code would be highly appreciated.
If approach #2 is feasible and the benchmarks are much more CPU-intensive than the harness, I think the quality of the benchmark sampling should be reasonably good, right?
Thanks
perf record without the -a option records the target process and everything forked from it (and threads cloned) after recording starts. With perf record ./program it will profile all child processes too, and with perf record -p $PID (attaching to the already running $PID) it will profile the target process plus any child processes started after attaching. Profiling inheritance is enabled by default (in the code: attr->inherit = !opts->no_inherit;) and can be disabled with the -i option; it is also disabled by -t and --per-thread.
This inheritance is like in perf stat: https://perf.wiki.kernel.org/index.php/Tutorial
Counting and inheritance
By default, perf stat counts for all threads of the process and subsequent child processes and threads. This can be altered using the -i option. It is not possible to obtain a count breakdown per-thread or per-process.
And -i option is there for perf record too: http://man7.org/linux/man-pages/man1/perf-record.1.html
-i, --no-inherit
Child tasks do not inherit counters.
perf report can then filter the events for a particular PID out of the collected, combined perf.data file.

How to fetch list of running processes on a system and sort them by various parameters

I use htop to view information about the processes currently running on my OS X machine, and to sort them by CPU, memory usage, etc.
Is there any way to fetch the output of htop programmatically in Ruby? I would also like to be able to use the API to sort the processes by various parameters such as CPU and memory usage.
I can do IO.popen('ps -a') and parse the output, but want to know if there is a better way than directly parsing the output of a system command run programmatically.
Check out sys-proctable:
require 'sys/proctable'
Sys::ProcTable.ps
To sort by starttime:
Sys::ProcTable.ps.sort_by(&:starttime)
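If pulling in a gem is not an option, the IO.popen route the question mentions works fine too; the trick is to ask ps for a fixed, header-less format so the parsing stays trivial. A sketch of that idea in Python (the same ps flags apply equally from Ruby):

```python
import subprocess

def processes_by_cpu():
    """Return (pid, %cpu, command) tuples sorted by CPU usage, highest first.

    The `=` suffix on each -o column suppresses the header line; these
    flags work on both macOS and Linux.
    """
    output = subprocess.run(
        ["ps", "-A", "-o", "pid=", "-o", "pcpu=", "-o", "comm="],
        capture_output=True, text=True, check=True,
    ).stdout
    rows = []
    for line in output.splitlines():
        pid, pcpu, command = line.split(None, 2)
        rows.append((int(pid), float(pcpu), command))
    return sorted(rows, key=lambda row: row[1], reverse=True)
```

Sorting by another parameter is just a different key (e.g. key on a `rss=` column for memory).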

gcc or javac slow at first startup

Can anyone explain why, on Linux, when I start gcc or javac after some period of inactivity, it takes a while for them to start? Subsequent invocations are way faster. Is there a way to ensure quick startup always? (This requirement may seem strange, but it is necessary in my case.) Ubuntu, by the way.
Most likely, it's the time it takes for code pages to fault in. There are a few ways to avoid this delay if you really have to. The simplest would be to run gcc periodically. Another would be to install gcc to a RAM disk.
Another approach would be to make a list of which files are involved and then write a simple program to lock all those files into memory. You can use something like:
strace -f gcc *rest of gcc command* 2>&1 | grep open | grep -v -- -1
Use a GCC command line that's typical of how you are using GCC.
You'll find libraries and binaries being opened in there. Make a full list in a file. Then write a program that calls mlockall(MCL_FUTURE) then reads in filenames from the file. For each file, mmap it into memory and read each byte. Then have the program just sleep forever (or until killed).
This will have the effect of forcing every page of every file in memory. You should check the total size of all these files and make sure it's not a significant fraction of the amount of memory you actually have!
By the way, there used to be something called a sticky bit that did something like this. If by some chance your platform supports it, just set it on all the files used. (Although it traditionally caused the files to be saved to swap, which on a modern system won't make things any faster.)
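The lock-and-touch program described above can be sketched in a few lines; here in Python with ctypes rather than C, and with made-up function names. mlockall needs CAP_IPC_LOCK (or a generous RLIMIT_MEMLOCK), so the sketch treats its failure as non-fatal:

```python
import ctypes
import mmap
import os

MCL_CURRENT, MCL_FUTURE = 1, 2  # Linux values for the mlockall flags

def lock_all_memory():
    """Ask the kernel to pin current and future mappings in RAM (best effort)."""
    libc = ctypes.CDLL(None, use_errno=True)
    if libc.mlockall(MCL_CURRENT | MCL_FUTURE) != 0:
        print("mlockall failed; pages stay cached but may still be evicted")

def prefault(paths):
    """mmap each file and touch one byte per page; returns pages touched."""
    touched = 0
    for path in paths:
        size = os.path.getsize(path)
        if size == 0:
            continue
        with open(path, "rb") as f:
            with mmap.mmap(f.fileno(), 0, prot=mmap.PROT_READ) as mapping:
                for offset in range(0, size, mmap.PAGESIZE):
                    mapping[offset]  # reading the byte faults the page in
                    touched += 1
    return touched
```

Call lock_all_memory(), feed prefault() the file list gathered with strace, then sleep forever. As noted above, check the total size of the files against your available RAM first.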
