Debugging file descriptor leak (in kernel?) - linux-kernel

I am working in a relatively large code base where I am seeing a file descriptor leak: after I run certain programs, other processes start complaining that they cannot open files.
Although this normally takes about 6 days to show up, I can reproduce the problem in 3-4 hours by reducing the value in /proc/sys/fs/file-max to 9000.
There are many processes running at any moment. I have been able to pinpoint a couple of processes that could be causing the leak. However, I don't see any file descriptor leak either through lsof or through /proc/<pid>/fd.
If I kill the processes (they communicate with each other) that I suspect of leaking, the leak goes away and the FDs are released.
Running cat /proc/sys/fs/file-nr in a while(1) loop shows the leak. However, I don't see any leak in any individual process.
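The polling loop is nothing fancy - a minimal sketch (per proc(5), the three fields of file-nr are the number of allocated file handles, the number allocated but unused, and the file-max limit):
#!/bin/bash
# watch the system-wide file handle counters; a steady climb in the first
# field with no matching growth in any /proc/<pid>/fd is what I am seeing
while true
do
    read allocated unused max < /proc/sys/fs/file-nr
    echo "$(date '+%T') allocated=$allocated unused=$unused max=$max"
    sleep 5
done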
Here is a script I wrote to detect that the leak is happening:
#!/bin/bash
if [ "$#" != "2" ]; then
    name=`basename $0`
    echo "Usage : $name <per-process fd threshold> <check interval in seconds>"
    exit 1
fi
fd_threshold=$1
check_interval=$2
total_num_desc=0
touch pid_monitor.txt
nowdate=`date`
echo "=================================================================================================================================" >> pid_monitor.txt
echo "****************************************MONITORING STARTS AT $nowdate***************************************************" >> pid_monitor.txt
while true
do
    for x in `ps -ef | awk '{ print $2 }'`
    do
        if [ "$x" != "PID" ]; then
            # count only the fd symlinks so the "total" line printed by ls -l is not included
            num_fd=`ls -l /proc/$x/fd 2>/dev/null | grep -c '^l'`
            pname=`cat /proc/$x/cmdline 2>/dev/null`
            total_num_desc=`expr $total_num_desc + $num_fd`
            if [ $num_fd -gt $fd_threshold ]; then
                echo "Process name $pname($x) and number of open descriptors = $num_fd" >> pid_monitor.txt
            fi
        fi
    done
    total_nr_desc=`cat /proc/sys/fs/file-nr`
    lsof_desc=`lsof | wc -l`
    nowdate=`date`
    echo "$nowdate : total open descriptors (per-process sum) = $total_num_desc lsof desc = $lsof_desc file-nr = $total_nr_desc" >> pid_monitor.txt
    total_num_desc=0
    sleep $check_interval
done
./monitor.fd.sh 500 2 &
tail -f pid_monitor.txt
As I mentioned earlier, I don't see any leak in /proc/<pid>/fd for any pid, but the leak is definitely happening and the system is running out of file descriptors.
I suspect something in the kernel is leaking. Linux kernel version 2.6.23.
My questions are as follows:
Will 'ls /proc/<pid>/fd' list descriptors opened by any library linked into the process with that pid? If not, how do I determine whether the leak is in a library I am linking against?
How do I confirm whether the leak is in userspace or in the kernel?
If the leak is in the kernel, what tools can I use to debug it?
Any other tips you can give me.
Thanks for going through the question patiently.
Would really appreciate any help.

Found the solution to the problem.
There was a shared memory attach happening in some function, and that function was getting called every 30 seconds. The shared memory was never detached, hence the descriptor leak. I guess /proc/<pid>/fd doesn't show a shared memory attach as a descriptor, which is why my script was not able to catch the leak.
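In hindsight, assuming SysV shared memory (shmget/shmat), a simpler way to spot this would have been to watch the segments' attach counts instead of the per-process fd tables:
# nattch grows on every shmat() and only drops on shmdt() or process exit,
# so a segment whose attach count climbs every 30 seconds points at the missing shmdt()
watch -n 30 ipcs -m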

Which processes start complaining? And what is the error you see? What is the output of your monitoring script?
To open a file you need two things: a file descriptor and a struct file (the open file description). The file descriptor is what userspace uses; inside the kernel it is used to look up the struct file. It's not clear to me which of the two you are leaking.

Related

Watching folder for changes - bash efficiency

I've got this:
#!/bin/bash
while :
do
    SUM=$(tree | md5sum)
    if [ "$SUMOLD" != "$SUM" ]; then
        # do something here
        SUMOLD=$SUM
        sleep 1
    fi
done
Which works just fine. But, the problem is that it consumes 50% of the CPU, a Core2 Duo T8300. Why is that? How to improve the efficiency?
This is a job for inotifywait. inotify is Linux's event-based system for monitoring files and directories for changes. Goodbye polling loops!
NAME
inotifywait - wait for changes to files using inotify
SYNOPSIS
inotifywait [-hcmrq] [-e <event> ] [-t <seconds> ] [--format <fmt> ]
[--timefmt <fmt> ] <file> [ ... ]
DESCRIPTION
inotifywait efficiently waits for changes to files using Linux's inotify(7) interface. It is suitable for waiting for changes to
files from shell scripts. It can either exit once an event occurs, or continually execute and output events as they occur.
Here's how you could write a simple loop which detects whenever files are added, modified, or deleted from a directory:
inotifywait -mq /dir | while read event; do
    echo "something happened in /dir: $event"
done
Take a look at the man page for more options. If you only care about modifications and want to ignore files simply being read, you could use -e to limit the types of events.
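For example, a sketch that limits the watch to writes, creations and deletions (event names as documented in inotifywait(1)):
# -m output is "<watched path> <comma-separated events> <filename>", so read three fields
inotifywait -mq -e close_write -e create -e delete /dir | while read dir events file; do
    echo "$events on $dir$file"
done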
While an OS-specific solution like inotify would be better, you can dramatically improve your script by moving the sleep out of the if statement:
#!/bin/bash
while :
do
    SUM=$(tree | md5sum)
    if [ "$SUMOLD" != "$SUM" ]; then
        # do something here
        SUMOLD=$SUM
        # move sleep from here
    fi
    sleep 1 # to here
done
CPU usage should drop dramatically now that you're always checking once per second, instead of as often as possible when there are no changes. You can also replace tree with find, and sleep for longer between each check.
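A variant of that idea - a sketch assuming GNU find, with the directory and the 5-second interval as placeholders:
#!/bin/bash
# same polling approach, but hashing file paths and modification times from find
SUMOLD=""
while :
do
    SUM=$(find . -type f -printf '%T@ %p\n' 2>/dev/null | md5sum)
    if [ "$SUMOLD" != "$SUM" ]; then
        # do something here
        SUMOLD=$SUM
    fi
    sleep 5
done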
The reason is that the script is running continuously, even though sleep is called. My recommendation is to launch your script using inotifywait or, as an alternative, the watch command, and avoid the while loop altogether.
See: http://linux.die.net/man/1/watch
An example taken from the MAN pages:
To watch the contents of a directory change, you could use
watch -d ls -l
watch will launch your command periodically, so your script does not have to run continuously.

out of memory kernel crash

I am facing an out-of-memory (OOM) issue on my system. Under this condition, the Linux OOM killer kills a process (the "bad" process, chosen by a specific heuristic) to free up memory.
I want to print the memory and process stats just before this condition happens.
mm/oom_kill.c contains the function out_of_memory(). I wanted to print my stats just before this function goes ahead with killing the "bad" process. For this I wrote the following bash script:
#!/bin/bash
# Script to print process related info
echo "Vmstat " > OOM_memresults
vmstat >> OOM_memresults
echo >> OOM_memresults
echo "SLABINFO" >> OOM_memresults
cat /proc/slabinfo >> OOM_memresults
echo >> OOM_memresults
echo "Status of process getting killed" >> OOM_memresults
cat /proc/$1/status >> OOM_memresults
Now the problem I am facing is finding a way to call this script from the kernel code.
I cannot use system("scriptname"), since the system() function is not available in the Linux kernel, and we cannot use exec and its variants either.
Any ideas how I can call this script from the kernel code, or any other way I can print the process and memory related info at any instant from the kernel code?
The "current" macro gives information about the currently running process and its task_struct, but it is very difficult to pull any useful info out of it.
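One userspace fallback I can think of (not the in-kernel hook I am asking about) would be to poll /proc/meminfo and take the same snapshot whenever free memory drops below a threshold; a rough sketch, with the threshold as a placeholder:
#!/bin/bash
# take the snapshot from the script above whenever MemFree falls below a limit (in kB)
THRESHOLD_KB=${1:-51200}
while true
do
    free_kb=$(awk '/^MemFree:/ { print $2 }' /proc/meminfo)
    if [ "$free_kb" -lt "$THRESHOLD_KB" ]; then
        {
            echo "=== low memory at $(date), MemFree=${free_kb} kB ==="
            vmstat
            cat /proc/slabinfo
        } >> OOM_memresults
    fi
    sleep 1
done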

How can I have output from one named pipe fed back into another named pipe?

I'm adding some custom logging functionality to a bash script, and can't figure out why it won't take the output from one named pipe and feed it back into another named pipe.
Here is a basic version of the script (http://pastebin.com/RMt1FYPc):
#!/bin/bash
PROGNAME=$(basename $(readlink -f $0))
LOG="$PROGNAME.log"
PIPE_LOG="$PROGNAME-$$-log"
PIPE_ECHO="$PROGNAME-$$-echo"
# program output to log file and optionally echo to screen (if $1 is "-e")
log () {
    if [ "$1" = '-e' ]; then
        shift
        $@ > $PIPE_ECHO 2>&1
    else
        $@ > $PIPE_LOG 2>&1
    fi
}
# create named pipes if not exist
if [[ ! -p $PIPE_LOG ]]; then
    mkfifo -m 600 $PIPE_LOG
fi
if [[ ! -p $PIPE_ECHO ]]; then
    mkfifo -m 600 $PIPE_ECHO
fi
# cat pipe data to log file
while read data; do
    echo -e "$PROGNAME: $data" >> $LOG
done < $PIPE_LOG &
# cat pipe data to log file & echo output to screen
while read data; do
    echo -e "$PROGNAME: $data"
    log echo $data # this doesn't work
    echo -e $data > $PIPE_LOG 2>&1 # and neither does this
    echo -e "$PROGNAME: $data" >> $LOG # so I have to do this
done < $PIPE_ECHO &
# clean up temp files & pipes
clean_up () {
    # remove named pipes
    rm -f $PIPE_LOG
    rm -f $PIPE_ECHO
}
# execute "clean_up" on exit
trap "clean_up" EXIT
log echo "Log File Only"
log -e echo "Echo & Log File"
I thought the two commands marked above ("log echo $data" and "echo -e $data > $PIPE_LOG") would take the $data from $PIPE_ECHO and feed it back into $PIPE_LOG. But it doesn't work. Instead I have to send that output directly to the log file, without going through $PIPE_LOG.
Why is this not working as I expect?
EDIT: I changed the shebang to "bash". The problem is the same, though.
SOLUTION: A.H.'s answer helped me understand that I wasn't using named pipes correctly. I have since solved my problem by not even using named pipes. That solution is here: http://pastebin.com/VFLjZpC3
It seems to me that you do not understand what a named pipe really is. A named pipe is not one stream like a normal pipe. It is a series of normal pipes, because a named pipe can be closed, and a close on the producer side might show up as a close on the consumer side.
The "might" part is this: the consumer will read data until there is no more data. "No more data" means that at the time of the read call no producer has the named pipe open. This means that multiple producers can feed one consumer only when there is no point in time without at least one producer. Think of it as a door which closes automatically: if there is a steady stream of people keeping the door open, either by handing the doorknob to the next one or by squeezing through it at the same time, the door stays open. But once the door is closed, it stays closed.
A little demonstration should make the difference a little clearer:
Open three shells. First shell:
1> mkfifo xxx
1> cat xxx
no output is shown because cat has opened the named pipe and is waiting for data.
Second shell:
2> cat > xxx
no output, because this cat is a producer which keeps the named pipe open until we tell it to close explicitly.
Third shell:
3> echo Hello > xxx
3>
This producer immediately returns.
First shell:
Hello
The consumer received the data, wrote it and - since one producer still keeps the door open - continues to wait.
Third shell
3> echo World > xxx
3>
First shell:
World
The consumer received the data, wrote it and - since one producer still keeps the door open - continues to wait.
Second Shell: write into the cat > xxx window:
And good bye!
(control-d key)
2>
First shell
And good bye!
1>
The ^D key closed the last producer, the cat > xxx, and hence the consumer exits also.
For your case this means:
Your log function will try to open and close the pipes multiple times. Not a good idea.
Both your while loops exit earlier than you think. (Check this with: ( while ... done < $PIPE_X; echo FINISHED ) & )
Depending on the scheduling of your various producers and consumers, the door might slam shut sometimes and sometimes not - you have a race condition built in. (For testing you can add a sleep 1 at the end of the log function.)
Your "test cases" only try each possibility once - try them multiple times (you will block, especially with the sleeps), because your producer might not find any consumer.
So I can explain the problems in your code but I cannot tell you a solution because it is unclear what the edges of your requirements are.
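One common workaround for the "closing door" problem - sketched here with placeholder names rather than as a fix to your exact script - is to keep a dummy writer file descriptor open for the lifetime of the logger, so the reader never sees end-of-file between real writers:
#!/bin/bash
# minimal sketch of the "dummy writer" workaround (pipe and log names are placeholders)
PIPE_LOG=./demo-log-pipe
LOG=./demo.log
[ -p "$PIPE_LOG" ] || mkfifo -m 600 "$PIPE_LOG"
# reader: keeps consuming until the pipe has no writers left at all
while read data; do
    echo "logger: $data" >> "$LOG"
done < "$PIPE_LOG" &
exec 3> "$PIPE_LOG"       # dummy writer keeps the "door" open between real writers
echo "first message"  > "$PIPE_LOG"
echo "second message" > "$PIPE_LOG"
exec 3>&-                 # closing the dummy writer lets the reader see EOF and finish
wait
rm -f "$PIPE_LOG"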
It seems the problem is in the "cat pipe data to log file" part.
Let's see: you use a "&" to put the loop in the background; I guess you mean it to run in parallel with the second loop.
But the problem is that you don't even need the "&", because as soon as no more data is available in the fifo, the while..read stops (though you've got to have some data at first for the first read to work). The next read doesn't hang if no more data is available (which would pose another problem: how does your program stop?).
I guess the while read checks if more data is available in the file before doing the read and stops if it's not the case.
You can check with this sample:
mkfifo foo
while read data; do echo $data; done < foo
This script will hang, until you write anything from another shell (or bg the first one). But it ends as soon as a read works.
Edit:
I've tested on RHEL 6.2 and it behaves as you say (i.e. badly!).
The problem is that, after running the script (let's say script "a"), you've got an "a" process remaining. So, yes, in some way the script hangs as I wrote before (not as stupid an answer as I thought :) ). Except if you write only one log (be it log-file-only or echo) - in that case it works.
(It's the read loop from PIPE_ECHO that hangs when writing to PIPE_LOG and leaves a process running each time).
I've added a few debug messages, and here is what I see:
only one line is read from PIPE_LOG and after that, the loop ends
then a second message is sent to the PIPE_LOG (after being received from the PIPE_ECHO), but the process no longer reads from PIPE_LOG => the write hangs.
When you ls -l /proc/[pid]/fd, you can see that the fifo is still open (but deleted).
In fact, the script exits and removes the fifos, but there is still one process using them.
If you don't remove the log fifo during cleanup and then cat it, that will free the hanging process.
Hope it will help...

How to get a full copy of a dump file while it is still being created?

Every hour the main machine takes a minute to produce a dump file of about 100 MB.
A backup machine copies that, using scp, also hourly.
Both actions are triggered by cron to start at the same minutes past the hour.
The copy often contains only part of the dump file.
Although I could change the cron on the backup machine to happen 5 minutes later, that rather smells of cheating.
What is the correct way around this problem?
Leverage fuser to determine if the file is in-use; something like:
#!/bin/sh
MYFILE=myfile #...change as necessary
while true
do
    if [ -z "$(/sbin/fuser ${MYFILE} 2> /dev/null)" ]; then
        break
    fi
    echo "...waiting..."
    sleep 10 #...adjust as necessary
done
echo "Process the file..."
Assuming you can modify the source code of the programs, you can have the "dumper" output a second file indicating that it is done dumping.
1. Dumper deletes signal file
2. Dumper dumps file
3. Dumper creates signal file
4. Backup program polls periodically until the signal file is created
5. Backup program deletes signal file
6. Backup program has complete file
Not the best form of IPC, but as it's on another machine... I suppose you could also open a network connection to do the polling; a sketch of the backup side follows.
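A rough sketch of the backup machine's side of that scheme, assuming ssh access, with host and path names as placeholders:
#!/bin/sh
MAIN=main-host                        # ...placeholders, change as necessary
DUMP=/var/dumps/hourly.dump
SIGNAL=/var/dumps/hourly.dump.done
# step 4: poll until the dumper has created the signal file
while ! ssh "$MAIN" test -e "$SIGNAL"
do
    sleep 30
done
# steps 5 and 6: clear the signal file, then fetch the now-complete dump
ssh "$MAIN" rm -f "$SIGNAL"
scp "$MAIN:$DUMP" /backups/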
Very quick and very dirty:
my_file=/my/filename
while [[ -n "$(find $my_file)" && -n "$(find $my_file -mmin -1)" ]] ; do
    echo "File '$my_file' exists, but has been recently modified. Waiting..."
    sleep 10
done
scp ... # if the file does not exist at all, we don't wait for it, but let scp fail
Try to copy the file from the main node to the backup node in one script:
1. dump the file
2. scp it to the backup node
Then it's guaranteed that the dump is finished before the scp call.
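A minimal sketch of that combined job, run hourly from cron on the main machine (the dump command and paths are placeholders):
#!/bin/sh
# scp only starts once the dump command has finished successfully
/usr/local/bin/make_dump /var/dumps/hourly.dump \
    && scp /var/dumps/hourly.dump backup-host:/backups/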

Shell script that continuously checks a text file for log data and then runs a program

I have a Java program that often stops due to errors, which are logged in a .log file. What would be a simple shell script to detect a particular text in the last/latest line, say
[INFO] Stream closed
and then run the following command
java -jar xyz.jar
This should keep happening forever (checking every two minutes or so), because xyz.jar writes the log file.
The text "Stream closed" can appear many times in the log file. I only want to take action when it appears in the last line.
How about
while true
do
    sleep 120
    tail -1 logfile | grep -qF "[INFO] Stream closed"
    if [[ $? -eq 0 ]]
    then
        java -jar xyz.jar &
    fi
done
There may be a condition where the tailed "Stream closed" line is not really the final log entry and the process is still writing messages. We can avoid this by checking whether the process is still alive. If the process has exited and the last log line is "Stream closed", then we need to restart the application.
#!/bin/bash
java -jar xyz.jar &
PID=$!
while true
do
    # restart only if the process has exited and the last log line reports the closed stream
    if ! kill -0 $PID 2>/dev/null && tail -1 logfile | grep -qF "Stream closed"; then
        java -jar xyz.jar &
        PID=$!
    fi
    sleep 20
done
I would prefer checking whether the corresponding process is still running and restarting the program on that event. There might be other errors that cause the process to stop. You can use a cron job to periodically (like every minute) perform such a check.
Also, you might want to improve your java code so that it does not crash that often (if you have access to the code).
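For example, an assumed crontab entry (script name and log path are placeholders) that performs the check every minute:
* * * * * /usr/local/bin/check_xyz.sh >> /var/log/check_xyz.log 2>&1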
I solved this using a watchdog script that checks directly (with grep) whether the program(s) are running. By calling the watchdog every minute (from cron under Ubuntu), I basically guarantee (programs and environment are VERY stable) that no program will stay offline for more than 59 seconds.
This script checks a list of programs whose names are kept in an array, sees whether each one is running, and, if not, starts it.
#!/bin/bash
#
# watchdog
#
# Run as a cron job to keep an eye on what_to_monitor which should always
# be running. Restart what_to_monitor and send notification as needed.
#
# This needs to be run as root or a user that can start system services.
#
# Revisions: 0.1 (20100506), 0.2 (20100507)

# first prog to check
NAME[0]=soc_gt2
# 2nd
NAME[1]=soc_gt0
# 3rd, etc etc
NAME[2]=soc_gp00
# START=/usr/sbin/$NAME
NOTIFY=you@gmail.com
NOTIFYCC=you2@mail.com
GREP=/bin/grep
PS=/bin/ps
NOP=/bin/true
DATE=/bin/date
MAIL=/bin/mail
RM=/bin/rm

for nameTemp in "${NAME[@]}"; do
    $PS -ef | $GREP -v grep | $GREP $nameTemp >/dev/null 2>&1
    case "$?" in
        0)
            # It is running in this case so we do nothing.
            echo "$nameTemp is RUNNING OK. Relax."
            $NOP
            ;;
        1)
            echo "$nameTemp is NOT RUNNING. Starting $nameTemp and sending notices."
            START=/usr/sbin/$nameTemp
            $START >/dev/null 2>&1 &
            NOTICE=/tmp/watchdog.txt
            echo "$nameTemp was not running and was started on `$DATE`" > $NOTICE
            # $MAIL -n -s "watchdog notice" -c $NOTIFYCC $NOTIFY < $NOTICE
            $RM -f $NOTICE
            ;;
    esac
done
exit
I do not use the log verification, though you could easily incorporate that into your own version (just swap the grep for a log check, for example).
If you run it from the command line (or PuTTY, if you are remotely connected), you will see what was working and what wasn't. I have been using it for months now without a hiccup. Just call it whenever you want to see what's working (regardless of it running under cron).
You could also place all your critical programs in one folder, do a directory listing, and check whether every file in that folder has a program running under the same name. Or read a text file line by line, with every line corresponding to a program that is supposed to be running (sketched below). Etc. etc.
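A sketch of that last variant, with the list file name as a placeholder:
#!/bin/bash
# restart anything listed in the file that is not currently running
# (/etc/watchdog-programs.txt is a placeholder; one program name per line)
while read -r prog; do
    [ -z "$prog" ] && continue
    if ! pgrep -x "$prog" >/dev/null; then
        echo "$prog is NOT RUNNING. Starting it."
        "/usr/sbin/$prog" >/dev/null 2>&1 &
    fi
done < /etc/watchdog-programs.txt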
A good way is to use the awk command:
tail -f somelog.log | awk '/\[INFO\] Stream closed/ { system("java -jar xyz.jar") }'
This continually monitors the log stream, and when the regular expression matches, it fires off whatever system command you have set, which can be anything you would type into a shell.
If you really want to be thorough, you can put that line into a .sh file and run that .sh file from a process monitoring daemon like upstart to ensure that it never dies.
Nice and clean =D
