Bash file descriptor leak

I get a file descriptor leak when running the following code:
function get_fd_count() {
    local fds
    cd /proc/$$/fd; fds=( * ) # avoid a StackOverflow source colorizer bug
    echo "${#fds[@]}"
}
function fd_leak_func() {
    while : ; do
        echo ">> Current FDs: $(get_fd_count)"
        read retval new_state < <(set +e; new_state=$(echo foo); retval=$?; printf "%d %s\n" $retval $new_state)
    done
}
fd_leak_func
Tested on both 3.2.25 and 4.0.28.
This only happens when the loop runs inside a function; every time we return to top-level context, the extra file descriptors are closed.
Is this intended behavior? More to the point, are workarounds available?
Followup: After reporting to the bash-bug mailing list, this was confirmed as a bug. Chet indicated that a fix will be included in the next release (as of 4/17/2010).

Here's a simplified example:
$ fd_leaker() { while :; do read a < <(pwd); c=(/proc/$$/fd/*); c=${#c[@]}; echo $c; done; }
$ fd_leaker
This one is not fixed by using /bin/true, but it's mostly fixed by using (exit 0). However, I get "bash: echo: write error: Interrupted system call" errors with that "fix", or if I use /bin/pwd instead of the builtin pwd.
It also seems to be specific to read. I tried grep . < <(pwd) > /dev/null and it worked properly. When I tried while read a; do :; done < <(pwd), the descriptors did not leak either.
The extra file descriptors appear in the form of:
lr-x------ 1 user user 64 2010-04-15 19:26 39 -> pipe:[8357879]
I really don't think the runaway creation of them is intended; after all, there's nothing recursive going on. And I really don't see why adding something to the loop should fix things.

Putting a /bin/true at the end of the loop fixes it, but I don't know why or how, or why it happens in the first place.
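For reference, here is what that workaround looks like applied to fd_leak_func from the question (a sketch; /bin/true is the external command, not the true builtin):
function fd_leak_func() {
    while : ; do
        echo ">> Current FDs: $(get_fd_count)"
        read retval new_state < <(set +e; new_state=$(echo foo); retval=$?; printf "%d %s\n" $retval $new_state)
        /bin/true # forking an external command at the end of the loop body avoids the leak on affected versions
    done
}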

It seems to be fixed in bash-4.2; I tested with bash-4.2.28 in particular.

Bash completion scripting - getting a "transparent proxy"-like behaviour

I am trying to write a simple Bash completion script for a program that runs its arguments as a command. A good example of this kind of program is the prime-run script provided by the nvidia-prime package:
#!/bin/bash
__NV_PRIME_RENDER_OFFLOAD=1 __VK_LAYER_NV_optimus=NVIDIA_only __GLX_VENDOR_LIBRARY_NAME=nvidia "$@"
This script sets a few environment variables that instruct the prime driver to use the Nvidia dGPU on a hybrid system. The first argument is treated as the command, and all trailing arguments are passed through. So, for example, you can run prime-run code . and VSCode will start in the current directory using the dGPU.
Therefore, from a completion-script POV, what we want is basically to complete as if the prime-run token weren't there (hence the "transparent proxy"-like behaviour). To give a rather contrived example:
> prime-run journalc<TAB>
(completes journalctl)
> prime-run journalctl --us<TAB>
(completes --user)
However I am finding this surprisingly difficult in Bash (not that I know how in other shells). So the question is simple: is it possible and if so how?
Ideas I've (hopelessly) had
The simple complete -A command prime-run: the first argument gets completed as a command as expected (let's call it foo), but the following arguments are also completed as commands rather than as arguments to foo
Use some combination of compgen and complete -p to invoke the completion function of foo, but AFAIK the completion function for any given foo is defined locally and thus cannot be called directly
TL;DR
bash-completion provides a function named _command_offset (permalink), which is exactly what I need.
# A meta-command completion function for commands like sudo(8), which need to
# first complete on a command, then complete according to that command's own
# completion definition.
Keep reading if you are interested in how I got here.
So I was daydreaming the other day, when it hit me - doesn't sudo basically have the exact same behaviour I want? So the task became simple - reverse engineer the completion script for sudo. Source available here: permalink.
Turns out, most of the code has to do with completing the various options, so it's safe to simply throw most of it out:
L 8-11, 50-52: Related to sudo's edit mode. Safe to ditch.
L 19-24, 27-39, 43-49: These complete sudo's options. Safe to ditch.
So we're left with this:
_sudo()
{
    local cur prev words cword split
    _init_completion -s || return

    for ((i = 1; i <= cword; i++)); do
        if [[ ${words[i]} != -* ]]; then
            local PATH=$PATH:/sbin:/usr/sbin:/usr/local/sbin
            local root_command=${words[i]}
            _command_offset $i
            return
        fi
    done

    $split && return
} &&
    complete -F _sudo sudo sudoedit
The for and if blocks are there to deal with sudo's options that precede the "guest command". Safe to ditch (after replacing all $i with 1).
The variable $split is only referenced in _init_completion (permalink), and it seems to be used for handling different argument styles (--foo=bar vs. --foo bar). Same with the -s flag. Irrelevant.
Appending to $PATH and setting $root_command have to do with privilege escalation. Only relevant to sudo.
So after the dust has cleared, by process of elimination, I ended up with this simple chunk of code:
_my-script()
{
    local cur prev words cword
    _init_completion || return

    _command_offset 1
} && complete -F _my-script my-script
Declaring these four local variables and calling _init_completion is standard for all completion scripts, so really it's as simple as one command. Of course someone had to write the massively complex _command_offset function, so lucky me I guess?
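Applied to the prime-run example from the question, the same pattern would presumably look like this (an untested sketch; it assumes the bash-completion package is loaded so that _init_completion and _command_offset exist):
_prime_run()
{
    local cur prev words cword
    _init_completion || return

    _command_offset 1
} && complete -F _prime_run prime-run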
Anyways, thank you for reading the story of me messing around and hopefully this will be helpful to some other person in the future.

Bash: graceful function death on error

I'm trying to find a way to emulate the behavior of set -e in a function, but only within the scope of that function.
Basically, I want a function where if any simple command would trigger set -e it returns 1 up one level. The goal is to isolate sets of risky jobs into functions so that I can gracefully handle them.
If you want any failing command to return 1, you can achieve that by following each command with || return 1.
For instance:
false || return 1 # This will always return 1
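Applied to a whole function of risky steps, that pattern might look like this (a minimal sketch; src, workdir and tmpdir are hypothetical placeholders):
risky_jobs() {
    mkdir -p "$workdir" || return 1
    cp "$src" "$workdir"/ || return 1
    rm -rf "$tmpdir" || return 1
}

if ! risky_jobs; then
    echo "risky_jobs failed" >&2 # handle the failure gracefully here
fi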
I am a big fan of never letting any command fail without explicit handling. For my scripts, I use an exception-handling technique where errors are propagated by a mechanism other than return codes, and all errors are trapped (with bash traps). Any command with a non-zero return code automatically means an improperly handled situation or a bug, and I prefer my scripts to fail as soon as such a situation occurs.
Caution: I highly advise against using this technique. If you run the function in a subshell environment, you almost get the behavior you desire. Consider:
#!/bin/bash
foo() ( # Use parens to get a sub-shell
    set -e # Does not impact the main script
    echo This is executed
    false
    echo This should *not* be executed
)
foo # Function call fails, returns 1
echo return: $?
# BUT: this is a good reason to avoid this technique
if foo; then # set -e is ignored inside the function here
    echo Foo returned 0!!
else
    echo fail
fi
false # Demonstrates that set -e is not set for the script
echo ok
Seems like you are looking for "nested exceptions" somewhat like what Java gives. For your requirement of scoping it, how about doing a set -e at the beginning of the function and making sure to run set +e before returning from it?
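A rough sketch of that suggestion (step_one and step_two are hypothetical stand-ins for the risky work):
risky() {
    set -e
    step_one # hypothetical command
    step_two # hypothetical command
    set +e
}
Note that if a command fails before the set +e line is reached, set -e still terminates the whole script rather than just the function, so this only approximates per-function scoping.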
Another idea, which is not efficient or convenient, is to call your function in a subshell:
# some code
(set -e; my_function)
if [[ $? -ne 0 ]]; then
    : # the function didn't succeed...
fi
# more code
In any case, please be aware that set -e is not the greatest way to handle errors in a shell script. There are way too many issues making it quite unreliable. See these related posts:
What does set -e mean in a bash script?
Error handling in Bash
The approach I take for large scripts that need to exist for a long time in a production environment is:
create a library of functions to do all the standard stuff
the library will have a wrapper around each standard action (say, mv, cp, mkdir, ln, rm, etc.) that would validate the arguments carefully and also handle exceptions
upon exception, the wrapper exits with a clear error message
the exit itself could be a library function, somewhat like this:
# library of common functions

trap '_error_handler' ERR
trap '_exit_handler' EXIT
trap '_int_handler' SIGINT

_error_handler() {
    : # appropriate code
}

# other handlers go here...

exit_if_error() {
    error_code=${1:-0}
    error_message=${2:-"Unknown error"}

    [[ $error_code == 0 ]] && return 0 # it is all good

    # this can be enhanced to print out the "stack trace"
    >&2 printf "%s\n" "$error_message"

    # out of here
    my_exit $error_code
}

my_exit() {
    exit_code=${1:-0}
    _global_graceful_exit=1 # this can be checked by the "EXIT" trap handler
    exit $exit_code
}

# simple wrapper for cp
my_cp() {
    # add code to check arguments more effectively
    cp "$1" "$2"
    exit_if_error $? "cp of '$1' to '$2' failed"
}
# main code
source /path/to/library.sh
...
my_cp file1 file2
# clutter-free code
This, along with effective use of trap to take action on ERR and EXIT events, would be a good way to write reliable shell scripts.
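As an illustration of the ERR trap, a minimal _error_handler could report the failing command and location using bash's built-in variables (a sketch, not the actual library code referred to above):
_error_handler() {
    local exit_code=$?
    >&2 printf "ERROR: '%s' failed with status %d (line %s of %s)\n" \
        "$BASH_COMMAND" "$exit_code" "${BASH_LINENO[0]}" "${BASH_SOURCE[1]:-$0}"
}
trap '_error_handler' ERR
Remember that the ERR trap is not inherited by shell functions unless set -o errtrace (set -E) is enabled.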
Doing more research, I found a solution I rather like in Google's Shell Style Guide. There are some seriously interesting suggestions here, but I think I'm going to go with this for readability:
if ! mv "${file_list}" "${dest_dir}/" ; then
    echo "Unable to move ${file_list} to ${dest_dir}" >&2
    exit "${E_BAD_MOVE}"
fi

bash: call function with variables vs function with arguments

Let's take, for example, the following function:
version 1 - with variables:
backup () {
    for arname in `arname_f`
    do
        slapcat -b "$setnames" -l "$bkdir"/"$ardate"_"$arname".ldif || exit 1
    done
}
and run it simply with:
backup;
version 2 - with positional arguments:
backup () {
    for arname in `arname_f`
    do
        slapcat -b "$1" -l "$2"/"$3"_"$4".ldif || exit 1
    done
}
and run it like this:
backup $setnames $bkdir $ardate $arname;
Is there any difference between these two approaches?
The question has nothing to do with Bash as such.
#1 is an example of the "spaghetti" coding style (global variables), hated by most professionals and simply sane people. It will eventually cause a major problem when someone somewhere changes a variable and the function starts misbehaving, and you won't have a clue of who or what changed what, or where.
#2 is close to how I would do it. Though, of course, there may well be a valid reason to prefer #1; it depends.
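For what it's worth, a version along the lines of #2, with the values passed in once and given local names (a sketch combining the loop from version 1 with explicit parameters), could look like this:
backup () {
    local setnames=$1 bkdir=$2 ardate=$3
    local arname
    for arname in `arname_f`
    do
        slapcat -b "$setnames" -l "$bkdir"/"$ardate"_"$arname".ldif || exit 1
    done
}

backup "$setnames" "$bkdir" "$ardate"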

Bash: Slow redirection and filter

I have a bash script that calls a program which generates a humongous amount of output. A lot of this data comes from a Python package that I have not created and whose output I can't really control, nor does it interest me.
I tried to filter the output generated by that external Python package and redirect the "cleaned" output to a log file. If I used regular pipes and grep expressions, I lost many chunks of information. I read that this is something that can actually happen with redirections (1 and 2).
In order to fix that, I made the redirections like this:
#!/bin/bash
regexTxnFilterer="\[txn\.-[[:digit:]]+\]"
regexThreadPoolFilterer="\[paste\.httpserver\.ThreadPool\]"
bin/paster serve --reload --pid-file="/var/run/myServer//server.pid" parts/etc/debug.ini 2>&1 < "/dev/null" | while IFS='' read -r thingy ; do
    if [[ ! "$thingy" =~ $regexTxnFilterer ]] && [[ ! "$thingy" =~ $regexThreadPoolFilterer ]]; then
        echo "$thingy" >> "/var/log/myOutput.log"
    fi
done
This doesn't lose any information (at least not that I could tell) and filters out the strings I don't need (using the two regular expressions above).
The issue is that it has rendered the application (the bin/paster thing I'm executing) unbearably slow. Is there any way to achieve the same effect but with a better performance?
Thank you in advance!
Update #2012-04-13: As shellter pointed out in one of the comments to this question, it may be useful to provide examples of the outputs I want to filter. Here's a bunch of them:
2012-04-13 19:30:37,996 DEBUG [txn.-1220917568] new transaction
2012-04-13 19:30:37,997 DEBUG [txn.-1220917568] commit <zope.sqlalchemy.datamanager.SessionDataManager object at 0xbf4062c>
2012-04-13 19:30:37,997 DEBUG [txn.-1220917568] commit
Starting server in PID 18262.
2012-04-13 19:30:38,292 DEBUG [paste.httpserver.ThreadPool] Started new worker -1269716112: Initial worker pool
2012-04-13 19:33:08,158 DEBUG [txn.-1244144784] new transaction
2012-04-13 19:33:08,158 DEBUG [txn.-1244144784] commit
2012-04-13 19:32:06,980 DEBUG [paste.httpserver.ThreadPool] Added task (0 tasks queued)
2012-04-13 19:32:06,980 INFO [paste.httpserver.ThreadPool] kill_hung_threads status: 10 threads (0 working, 10 idle, 0 starting) ave time N/A, max time 0.00sec, killed 0 workers
There are a few more kinds of messages involving the ThreadPool, but I couldn't catch any others.
For one thing -- you're reopening the log file every time you want to append a line. That's silly.
Instead of this:
while ...; do
    echo "foo" >>filename
done
Do this (which opens the output file on a new, non-stdout file handle, such that you still have a clear line to stdout should you wish to write to it):
exec 4>>filename
while ...; do
    echo "foo" >&4
done
It's also possible to redirect stdout for the whole loop:
while ...; do
    echo "foo"
done >filename
...notably, this will impact more than just the "echo" line, and thus have slightly different semantics from the original.
Or, better yet -- Configure the Python logging module to filter output to only what you care about, and don't bother with shell-script postprocessing at all.
If the version of Paste you're using is sufficiently similar to modern Pyramid, you can put this in your ini file (currently parts/etc/debug.ini):
[logger_paste.httpserver.ThreadPool]
level = INFO
[logger_txn]
level = INFO
...and anything below INFO level (including the DEBUG messages) will be excluded.
It may be faster to use a grep-based solution for this:
#!/bin/bash
regexTxnFilterer="\[txn\.-[[:digit:]]+\]"
regexThreadPoolFilterer="\[paste\.httpserver\.ThreadPool\]"
bin/paster serve --reload --pid-file="/var/run/myServer//server.pid" parts/etc/debug.ini 2>&1 < "/dev/null" | grep -Evf <(echo "$regexTxnFilterer"; echo "$regexThreadPoolFilterer") >> "/var/log/myOutput.log"
Your loop may be slow because the echo "$thingy" >> "/var/log/myOutput.log" line is opening and closing the log file every time it executes. I wouldn't expect there to be a big performance difference between grep's regex matching and bash's, but if there was it wouldn't surprise me.
Late Edit
There's a far simpler way to fix the performance issue caused by opening/closing the output once per line. Why this didn't occur to me before, I have no idea. Just move the >> to outside your loop:
#!/bin/bash
regexTxnFilterer="\[txn\.-[[:digit:]]+\]"
regexThreadPoolFilterer="\[paste\.httpserver\.ThreadPool\]"
bin/paster serve --reload --pid-file="/var/run/myServer//server.pid" parts/etc/debug.ini 2>&1 < "/dev/null" | while IFS='' read -r thingy ; do
    if [[ ! "$thingy" =~ $regexTxnFilterer ]] && [[ ! "$thingy" =~ $regexThreadPoolFilterer ]]; then
        echo "$thingy"
    fi
done >> "/var/log/myOutput.log"
I can't see any compelling reason why this would be either faster or slower than the grep solution, but it's a lot closer to the original code and a little less cryptic.

How to deal with NFS latency in shell scripts

I'm writing shell scripts where quite regularly some stuff is written to a file, after which an application is executed that reads that file. I find that across our company the network latency varies vastly, so a simple sleep 2, for example, will not be robust enough.
I tried to write a (configurable) timeout loop like this:
waitLoop()
{
    local timeout=$1
    local test="$2"
    if ! $test
    then
        local counter=0
        while ! $test && [ $counter -lt $timeout ]
        do
            sleep 1
            ((counter++))
        done
        if ! $test
        then
            exit 1
        fi
    fi
}
This works for test="[ -e $somefilename ]". However, testing existence is not enough, I sometimes need to test whether a certain string was written to the file. I tried
test="grep -sq \"^sometext$\" $somefilename", but this did not work. Can someone tell me why?
Are there other, less verbose options to perform such a test?
You can set your test variable this way:
test=$(grep -sq "^sometext$" $somefilename)
The reason your grep isn't working is that quotes are really hard to pass in arguments. You'll need to use eval:
if ! eval $test
I'd say the way to check for a string in a text file is grep.
What's your exact problem with it?
Also you might adjust your NFS mount parameters, to get rid of the root problem. A sync might also help. See NFS docs.
If you're wanting to use waitLoop in an "if", you might want to change the "exit" to a "return", so the rest of the script can handle the error situation (there's not even a message to the user about what failed before the script dies otherwise).
The other issue is using "$test" to hold a command means you don't get shell expansion when actually executing, just evaluating. So if you say test="grep \"foo\" \"bar baz\"", rather than looking for the three letter string foo in the file with the seven character name bar baz, it'll look for the five char string "foo" in the nine char file "bar baz".
So you can either decide you don't need the shell magic, and set test='grep -sq ^sometext$ somefilename', or you can get the shell to handle the quoting explicitly with something like:
if /bin/sh -c "$test"
then
...
Try using the file modification time to detect when it is written without opening it. Something like
old_mtime=`stat --format="%Z" file`
# Write to file.
new_mtime=$old_mtime
while [[ "$old_mtime" -eq "$new_mtime" ]]; do
sleep 2;
new_mtime=`stat --format="%Z" file`
done
This won't work, however, if multiple processes try to access the file at the same time.
I just had the exact same problem. I used a similar approach to the timeout wait that you include in your OP; however, I also included a file-size check. I reset my timeout timer if the file had increased in size since it was last checked. The files I'm writing can be a few gigs, so they take a while to write across NFS.
This may be overkill for your particular case, but I also had my writing process calculate a hash of the file after it was done writing. I used md5, but something like crc32 would work, too. This hash was broadcast from the writer to the (multiple) readers, and the reader waits until a) the file size stops increasing and b) the (freshly computed) hash of the file matches the hash sent by the writer.
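A reader-side sketch of that idea might look like the following (the file path is hypothetical, and expected_hash stands for whatever value the writer broadcast):
file=/path/to/incoming.dat   # hypothetical path
expected_hash=$1             # hash value received from the writer

prev_size=-1
while :; do
    size=$(stat --format="%s" "$file" 2>/dev/null || echo 0)
    if [ "$size" -eq "$prev_size" ] && \
       [ "$(md5sum "$file" 2>/dev/null | cut -d' ' -f1)" = "$expected_hash" ]; then
        break   # size stopped growing and the hash matches: the writer is done
    fi
    prev_size=$size
    sleep 2
done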
We have a similar issue, but for different reasons. We are reading a file, which is sent to an SFTP server. The machine running the script is not the SFTP server.
What I have done is set it up in cron (although a loop with a sleep would work too) to do a cksum of the file. When the old cksum matches the current cksum (the file has not changed for the determined amount of time) we know that the writes are complete, and transfer the file.
Just to be extra safe, we never overwrite a local file before making a backup, and only transfer at all when the remote file has two cksums in a row that match, and that cksum does not match the local file.
If you need code examples, I am sure I can dig them up.
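For illustration, the periodic check could look something like this sketch (paths are hypothetical; it keeps the previous cksum in a state file and only acts once two consecutive checksums match):
file=/path/to/incoming/file    # hypothetical path
state=/tmp/incoming.cksum      # remembers the previous checksum

current=$(cksum "$file" | awk '{print $1, $2}')
previous=$(cat "$state" 2>/dev/null)

if [ -n "$current" ] && [ "$current" = "$previous" ]; then
    echo "file unchanged since last check; safe to transfer"
    # transfer the file here
else
    printf '%s\n' "$current" > "$state"
fi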
The shell was splitting your predicate into words. Grab it all with "$@" as in the code below:
#! /bin/bash
waitFor()
{
    local tries=$1
    shift
    local predicate="$@"
    while [ $tries -ge 1 ]; do
        (( tries-- ))
        if $predicate >/dev/null 2>&1; then
            return
        else
            [ $tries -gt 0 ] && sleep 1
        fi
    done
    exit 1
}
pred='[ -e /etc/passwd ]'
waitFor 5 $pred
echo "$pred satisfied"
rm -f /tmp/baz
(sleep 2; echo blahblah >>/tmp/baz) &
(sleep 4; echo hasfoo >>/tmp/baz) &
pred='grep ^hasfoo /tmp/baz'
waitFor 5 $pred
echo "$pred satisfied"
Output:
$ ./waitngo
[ -e /etc/passwd ] satisfied
grep ^hasfoo /tmp/baz satisfied
Too bad the typescript isn't as interesting as watching it in real time.
Ok...this is a bit whacky...
If you have control over the file: you might be able to create a 'named pipe' here.
So (depending on how the writing program works) you can monitor the file in a synchronized fashion.
At its simplest:
Create the named pipe:
mkfifo file.txt
Set up the sync'd receiver:
while :
do
    process.sh < file.txt
done
Create a test sender:
echo "Hello There" > file.txt
The 'process.sh' is where your logic goes: this will block until the sender has written its output. In theory the writer program won't need modifying....
WARNING: if the receiver is not running for some reason, you may end up blocking the sender!
Not sure it fits your requirement here, but might be worth looking into.
Or, to avoid the synchronization, try 'lsof'?
http://en.wikipedia.org/wiki/Lsof
Assuming that you only want to read from the file when nothing else is writing to it (i.e., the writing process has finished), you could check whether nothing else has a file handle to it.
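A sketch of that lsof-based check, which simply waits until no process has the file open any more (the path is hypothetical):
file=/path/to/shared/file    # hypothetical path
while lsof "$file" >/dev/null 2>&1; do
    sleep 1                  # some process still has the file open
done
# no process has the file open any more; it should be safe to read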
