Unexpected behaviour in bash redirection - bash

I found a strange and totally unexpected behaviour while working with redirection in bash, and even though I managed to work around it, I'd like to know why it happens.
If I run the command { echo wtf > /dev/stdout ; } >> wtf.txt N times, I expect to see the file filled with N "wtf" lines. What I find instead is a single line.
My guess is that, since the inner command opens /dev/stdout in truncate mode, the truncation also hits the outer file (wtf.txt), which is therefore completely erased each time. I'd like to know whether someone can explain this better, and whether this is the correct behaviour or a bug.
To be clear, the command I actually used was a different one, but the echo example is simpler to understand. The original was a command that needs an output file as an argument, and since I wanted the output on stdout I passed /dev/stdout as that argument. The same behaviour can be reproduced with openssl rand -hex 4 -out /dev/stdout >> wtf.txt.
Finally, I worked around the problem by delegating the append operation to tee: { echo wtf > /dev/stdout; } | tee -a wtf.txt > /dev/null

You can check what happens using strace:
strace -o wtf-trace.txt -ff bash -c '{ (echo wtf) > /dev/stdout; } >> wtf.txt'
This generates two files, in my case wtf-trace.txt.12889 and wtf-trace.txt.12890. Here is what happens. Process 1 (>> wtf.txt):
open("wtf.txt", O_WRONLY|O_CREAT|O_APPEND, 0666) = 3
dup2(3, 1) = 1
close(3) = 0
clone(child_stack=0, .................) = 12890
wait4(-1, [{WIFEXITED(s) .............) = 12890
exit_group(0) = ?
The first process opens or creates "wtf.txt" for appending and gets FD 3. It then duplicates FD 3 onto FD 1 and closes FD 3. At this point it forks (clone), waits for the child to exit, and exits itself.
The second process ({ echo wtf > /dev/stdout; }) inherits the file as FD 1 (stdout) and does:
open("/dev/stdout", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 3
dup2(3, 1) = 1
close(3) = 0
fstat(1, {st_mode=S_IFREG|0664, st_size=0, ...}) = 0
write(1, "wtf\n", 4) = 4
exit_group(0) = ?
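As you can see, it opens /dev/stdout (note O_TRUNC) and gets FD 3, uses dup2 to copy FD 3 onto FD 1, closes FD 3, checks FD 1 and finds a file of size 0 (st_size=0), writes to it and exits. Since /dev/stdout here resolves to wtf.txt itself (on Linux it is a link to /proc/self/fd/1), that O_TRUNC open empties the file on every run, so only the last "wtf" survives.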
If you use | cat >> wtf.txt instead, the second process gets its FD 1 connected to a pipe, which cannot be seeked or truncated...
NB: I show only the relevant lines of the files strace generated.
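The strace walk-through can be condensed into a small C model. It assumes Linux semantics, where /dev/stdout is a link to /proc/self/fd/1 and opening it opens the underlying file again with the new flags; the sketch therefore reopens the path directly with O_TRUNC rather than going through /dev/stdout. The names run_once, count_lines and demo are illustrative, not taken from bash.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* One run of `{ echo wtf > /dev/stdout; } >> path`, modelled with plain syscalls.
 * The outer ">>" opens `path` with O_APPEND (bash then dup2()s it onto FD 1).
 * If trunc_reopen is set, we model "> /dev/stdout" on Linux: a second open()
 * of the same file with O_TRUNC, which empties it before "wtf\n" is written.
 * Otherwise we write through the O_APPEND descriptor, roughly what `tee -a` does. */
static int run_once(const char *path, int trunc_reopen)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0666);
    if (fd < 0)
        return -1;
    if (trunc_reopen) {
        int again = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0666);
        close(fd);
        if (again < 0)
            return -1;
        fd = again;
    }
    ssize_t n = write(fd, "wtf\n", 4);
    close(fd);
    return n == 4 ? 0 : -1;
}

/* Count newline characters in a file. */
static long count_lines(const char *path)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    long lines = 0;
    int c;
    while ((c = fgetc(f)) != EOF)
        if (c == '\n')
            lines++;
    fclose(f);
    return lines;
}

/* Run the command `runs` times against a fresh temp file; return its line count. */
static long demo(int trunc_reopen, int runs)
{
    char path[] = "/tmp/wtf-demo-XXXXXX";
    int tmp = mkstemp(path);
    assert(tmp >= 0);
    close(tmp);
    for (int i = 0; i < runs; i++)
        if (run_once(path, trunc_reopen) != 0)
            break;
    long lines = count_lines(path);
    unlink(path);
    return lines;
}
```

Calling demo(1, 5) leaves a single line, matching the single "wtf" observed, while demo(0, 5) leaves five lines, as with the tee -a workaround.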

Related

shell closes file descriptor num 19

While debugging my application, I found strange behaviour in the shell interpreter (/bin/sh on Solaris, /bin/dash on Debian): when forking, the shell closes file descriptor number 19 (decimal). In my case this closes the socket pair used for communication between processes.
Looking at the shell sources, I found the following (abridged for brevity):
/* used for input and output of shell */
#define INIO 19
and
if (input > 0) {
    Ldup(input, INIO);
    input = INIO;
}
...
static void
Ldup(int fa, int fb)
{
    if (fa >= 0) {
        if (fa != fb) {
            close(fb);
            fcntl(fa, 0, fb); /* normal dup */
            close(fa);
        }
        fcntl(fb, 2, 1); /* autoclose for fb */
    }
}
So the net effect is that whatever was on FD INIO (19) is simply closed: the shell moves its input onto 19 and marks it close-on-exec.
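To see the mechanism in isolation, here is Ldup() restated with symbolic fcntl() constants (on Linux and Solaris F_DUPFD is 0, F_SETFD is 2 and FD_CLOEXEC is 1), plus a hypothetical demo() that parks a pipe on descriptor 19 the way the application parked its socket pair there. This is a sketch of the excerpt's logic, not the shell's full code path.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Ldup() from the excerpt with the magic numbers spelled out. */
static void ldup(int fa, int fb)
{
    if (fa >= 0) {
        if (fa != fb) {
            close(fb);                  /* whatever the user had on fb is gone */
            fcntl(fa, F_DUPFD, fb);     /* lowest free fd >= fb, i.e. fb itself */
            close(fa);
        }
        fcntl(fb, F_SETFD, FD_CLOEXEC); /* fb will not survive exec() */
    }
}

/* Put a pipe's write end on `fd`, then let ldup() move another file onto it.
 * Returns 1 if `fd` now refers to the other file, is close-on-exec, and the
 * pipe's reader sees EOF because its only writer was closed. */
static int demo(int fd)
{
    int p[2];
    int rc = pipe(p);
    assert(rc == 0);
    dup2(p[1], fd);                  /* the application's channel lives on `fd` */
    close(p[1]);

    int input = open("/dev/null", O_RDONLY);   /* stand-in for the shell's input */
    assert(input >= 0);
    struct stat want, got;
    fstat(input, &want);

    ldup(input, fd);

    char c;
    int ok = fstat(fd, &got) == 0
          && got.st_dev == want.st_dev && got.st_ino == want.st_ino
          && (fcntl(fd, F_GETFD) & FD_CLOEXEC) != 0
          && read(p[0], &c, 1) == 0;
    close(fd);
    close(p[0]);
    return ok;
}
```

demo(19) models the reported symptom: the peer reading from the channel sees end-of-file as soon as the shell reuses descriptor 19.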
Simple test for reproducing:
$ exec 19>&1
$ echo aaa >&19
aaa
$ bash -c 'echo aaa >&19'
aaa
$ dash -c 'echo aaa >&19'
dash: 1: Syntax error: Bad fd number
$ ksh -c 'echo aaa >&19'
aaa
The questions are:
What are the reasons for this strange behavior?
What is wrong with file descriptor 19 ?
19 is special because (long ago), the maximum number of open files was 20, e.g.,
#define _NFILE 20
in stdio.h
In POSIX, you may see other symbols such as OPEN_MAX via the sysconf interface.
File descriptors count from 0 and are normally assigned in increasing order, so the "last possible" file descriptor would have been 19.
If there was an unused file descriptor, making it last would "work".
Both Solaris sh (in particular up through Solaris 10) and dash date back a while, and the detail you noticed probably was not breaking any legacy shell scripts that mattered (much).
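POSIX requires open() and dup() to return the lowest-numbered descriptor not currently open, which is why descriptors fill up from 0 and why 19 was the slot a program was least likely to reach in a 20-entry table. A small check of that rule, plus the modern limit via sysconf(_SC_OPEN_MAX), might look like this (function names are illustrative):

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Open two descriptors, free the lower one, and check that the next
 * open() refills that hole rather than taking a higher number. */
static int lowest_free_is_reused(void)
{
    int a = open("/dev/null", O_RDONLY);
    int b = open("/dev/null", O_RDONLY);
    assert(a >= 0 && b >= 0);
    close(a);                               /* leave a hole below b */
    int c = open("/dev/null", O_RDONLY);    /* should land in that hole */
    int reused = (c == a);
    close(b);
    close(c);
    return reused;
}

/* The per-process limit today; historically _NFILE was 20.
 * sysconf() may return -1 if the limit is indeterminate. */
static long open_max(void)
{
    return sysconf(_SC_OPEN_MAX);
}
```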

Process#spawn results in "No such file or directory - open (Errno::ENOENT)"

Is it just me or is there a bug in Ruby's Process#spawn when redirecting output to a file? The documentation says:
redirection:
key:
FD : single file descriptor in child process
[FD, FD, ...] : multiple file descriptor in child process
value:
FD : redirect to the file descriptor in parent process
string : redirect to file with open(string, "r" or "w")
[string] : redirect to file with open(string, File::RDONLY)
[string, open_mode] : redirect to file with open(string, open_mode, 0644)
[string, open_mode, perm] : redirect to file with open(string, open_mode, perm)
[:child, FD] : redirect to the redirected file descriptor
:close : close the file descriptor in child process
FD is one of follows
:in : the file descriptor 0 which is the standard input
:out : the file descriptor 1 which is the standard output
:err : the file descriptor 2 which is the standard error
So I should be able to have the key as [:out, :err] and the value as a string representing a filename, but:
% ruby -e 'spawn("ps -ef", [:out, :err] => "foo")'
-e:1:in `spawn': No such file or directory - open (Errno::ENOENT)
from -e:1:in `<main>'
(BTW I've tested with 2.1.2 and 2.0.0, and there doesn't seem to be any difference.)
If I first create the file, the behaviour changes:
% touch foo
% ruby -e 'spawn("ps -ef", [:out, :err] => "foo")'
However nothing is written to the file, and removing :err from the redirection key proves that something went wrong when setting up the redirection:
% cat foo
% ruby -e 'spawn("ps -ef", [:out] => "foo")' 2>err
% cat err
ps: write error: Bad file descriptor
It's fairly easy to work around this weirdness and get the behaviour I want, e.g.
% rm -f foo
% ruby -e 'Process.waitpid(spawn("ps -ef", [:out, :err] => ["foo", "w"]))'
% wc foo
503 4422 36470 foo
or
% rm -f foo
% ruby -e 'Process.waitpid(spawn("ps -ef", :out => "foo", :err => :out))'
% wc foo
503 4422 36470 foo
but since I've wasted 30 minutes due to the documentation apparently not matching the actual behaviour, I'd love to know if it's a genuine bug or just my misunderstanding.
My suspicion is that when using the form which groups multiple FDs within a single Array, Ruby stops attempting to auto-detect whether it should open the FDs in read or write mode, and just opens in read mode. Then this fails when the process tries to write to it. This seems to be confirmed by the lack of any error or even warning from Ruby when you attempt to mix input/output FDs within the same Array:
% yes | head -n3 > foo
% ruby -e 'Process.waitpid(spawn("wc", [:in, :out] => "foo"))'
wc: write error
% cat foo
y
y
y
% yes | head -n3 > foo
% ruby -e 'Process.waitpid(spawn("wc", [:in, :out] => ["foo"]))'
wc: write error
% cat foo
y
y
y
But I'm sure other people can offer better explanations.
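The suspicion can be checked at the level of open(2) and write(2), independently of Ruby. This sketch does not show what Process#spawn does internally; it only shows that the symptoms above are what you would expect if the file were opened O_RDONLY: ENOENT when the file does not exist, and EBADF ("Bad file descriptor") when the child writes to a descriptor opened read-only. The function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Open `path` with `flags`, as a redirection would, then try to write
 * through it. Returns 0 on success or the errno of the failing call. */
static int open_and_write(const char *path, int flags)
{
    int fd = open(path, flags, 0644);
    if (fd < 0)
        return errno;
    int err = 0;
    if (write(fd, "x\n", 2) < 0)
        err = errno;               /* saved before close() can touch errno */
    close(fd);
    return err;
}

/* Run three cases against a temp path; results go into out[0..2]. */
static void demo(int out[3])
{
    char path[] = "/tmp/spawn-demo-XXXXXX";
    int fd = mkstemp(path);
    assert(fd >= 0);
    close(fd);
    unlink(path);                                          /* start with no file */
    out[0] = open_and_write(path, O_RDONLY);                /* missing file: ENOENT */
    out[1] = open_and_write(path, O_WRONLY | O_CREAT | O_TRUNC); /* like ["foo", "w"] */
    out[2] = open_and_write(path, O_RDONLY);                /* file exists: EBADF */
    unlink(path);
}
```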

Self-logging: Busybox shell script that logs stdout to a file

My question relates to an answer posted by jbarlow to the following question:
redirect COPY of stdout to log file from within bash script itself
I used the suggested script as listed below. I have to use this because I don't have access to full bash (as jbarlow points out) because I'm using a buildroot version of busybox.
#!/bin/sh
if [ "$SELF_LOGGING" != "1" ]
then
    PIPE=tmp.fifo
    mkfifo $PIPE
    # Keep PID of this process
    SELF_LOGGING=1 sh "$0" "$@" >$PIPE &
    PID=$!
    tee logfile <$PIPE &
    # Safe to rm pipe because the child processes have references to it
    rm $PIPE
    wait $PID
    # Return same error code as original process
    exit $?
fi
The issue I'm finding is that the script can sometimes freeze. For example, an strace of a frozen script using the above code looks like:
Process 29750 attached - interrupt to quit
open("/tmp/tmp.fifo", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
write(2, "/usr/bin/runStuff", 24) = 24
write(2, ": ", 2) = 2
write(2, "line ", 5) = 5
write(2, "45", 2) = 2
write(2, ": ", 2) = 2
write(2, "can't open ", 11) = 11
write(2, "/tmp/tmp.fifo", 21) = 21
write(2, ": ", 2) = 2
write(2, "no such file", 12) = 12
write(2, "\n", 1) = 1
stat64("/sbin/tee", 0xbff7c20c) = -1 ENOENT (No such file or directory)
stat64("/usr/sbin/tee", 0xbff7c20c) = -1 ENOENT (No such file or directory)
stat64("/bin/tee", 0xbff7c20c) = -1 ENOENT (No such file or directory)
stat64("/usr/bin/tee", {st_mode=S_IFREG|0755, st_size=18956, ...}) = 0
_exit(1) = ?
Process 29750 detached
What it looks like (to me, with limited knowledge in this) is that tee is ending and the parent script doesn't die. Is that correct? If so, shouldn't the lack of a readable file cause the script to end? tee is backgrounded, so obviously that has no control over the parent.
As background, there's another process that repeatedly calls this if it dies. So it's possible that using the same file is causing a lockup situation. Or maybe the rm is happening before the fifo is created?
I've considered using 'read' with a timeout, but there can be situations where nothing is logged for hours at a time.
Can the script be modified so that this doesn't lock up and the script will die if/when one of the ends of the fifo dies?
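One FIFO rule is likely relevant here: a process opening a FIFO for writing cannot finish open() until some process has it open for reading. If rm $PIPE runs before the backgrounded tee has opened the FIFO (the strace above shows tee failing with ENOENT), no reader can ever open it, so the backgrounded writer, and with it wait $PID, may block forever. The sketch below makes the rule observable without hanging by using O_NONBLOCK, which turns the wait into an immediate ENXIO; the paths and function names are illustrative.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Non-blocking writer open: ENXIO means "no reader yet"; a blocking
 * open(O_WRONLY) would simply wait at this point instead. */
static int try_open_writer(const char *fifo)
{
    int fd = open(fifo, O_WRONLY | O_NONBLOCK);
    if (fd < 0)
        return errno;
    close(fd);
    return 0;
}

/* out[0]: writer open with no reader; out[1]: writer open once a reader exists. */
static void demo(int out[2])
{
    char dir[] = "/tmp/fifo-demo-XXXXXX";
    char *d = mkdtemp(dir);
    assert(d != NULL);
    char fifo[64];
    snprintf(fifo, sizeof fifo, "%s/tmp.fifo", d);
    int rc = mkfifo(fifo, 0600);
    assert(rc == 0);

    out[0] = try_open_writer(fifo);                  /* no reader: ENXIO */
    int reader = open(fifo, O_RDONLY | O_NONBLOCK);  /* like `tee <$PIPE` */
    assert(reader >= 0);
    out[1] = try_open_writer(fifo);                  /* reader present: succeeds */

    close(reader);
    unlink(fifo);
    rmdir(d);
}
```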

Why does there appear text on my command line even though I've redirected both STDOUT and STDERR to /dev/null?

I'm trying to unmount an encfs filesystem from a script, but no matter what I try I can't prevent the fuse error below from appearing on screen or in crontab emails.
# exec 3>&1 1>/dev/null 4>&2 2>/dev/null; setsid fusermount -u /data/encfs; exec 1>&3 2>&4 3>&- 4>&-
# fuse failed. Common problems:
- fuse kernel module not installed (modprobe fuse)
- invalid options -- see usage message
The error itself I have to live with. The unmount is successful; the error is bogus and due to a bug that is long gone in modern versions of fuse. I'm stuck with the older version because I'm on special hardware running a semi-ancient version of Debian.
What annoys me is that I cannot tell the system to toss the nonsense message into /dev/null.
How does the message even reach my screen after I've used both setsid and redirects in my best efforts to prevent it?
EDIT:
# exec 3>&1 1>/dev/null 4>&2 2>/dev/null; setsid fusermount -u /data/encfs > /dev/null 2>&1; EXIT=$?; exec 1>&3 2>&4 3>&- 4>&-; echo $EXIT
0
# fuse failed. Common problems:
- fuse kernel module not installed (modprobe fuse)
- invalid options -- see usage message
I've even tried things like:
perl -e "`fusermount -u /data/encfs`"
But the error remains the same.
My /etc/syslog.conf:
auth,authpriv.* -/var/log/auth.log
*.*;auth,authpriv,cron.none -/var/log/syslog
cron.* -/var/log/cron.log
daemon.* -/var/log/daemon.log
kern.* -/var/log/kern.log
lpr.* -/var/log/lpr.log
mail.* -/var/log/mail.log
user.* -/var/log/user.log
*.=debug;\
auth,authpriv.none;\
mail.none -/var/log/debug
*.=info;*.=notice;*.=warn;\
auth,authpriv.none;\
cron,daemon.none;\
mail.none -/var/log/messages
EDIT2:
I don't think fusermount is the program actually generating the text. It pokes something else that does:
# strace -o ~/trash/strace.txt fusermount -u /data/encfs; EXIT=$?; echo $EXIT; grep write ~/trash/strace.txt
fuse failed. Common problems:
- fuse kernel module not installed (modprobe fuse)
- invalid options -- see usage message
0
write(5, "/dev/hdc1 / ext3 rw,noatime 0 0\n", 32) = 32
write(5, "proc /proc proc rw 0 0\n", 23) = 23
write(5, "devpts /dev/pts devpts rw 0 0\n", 30) = 30
write(5, "sysfs /sys sysfs rw 0 0\n", 24) = 24
write(5, "tmpfs /ramfs ramfs rw 0 0\n", 26) = 26
write(5, "tmpfs /USB tmpfs rw,size=16k 0 0"..., 33) = 33
write(5, "/dev/c/c /c ext3 rw,noatime,acl,"..., 65) = 65
write(5, "nfsd /proc/fs/nfsd nfsd rw 0 0\n", 31) = 31
write(5, "usbfs /proc/bus/usb usbfs rw 0 0"..., 33) = 33
write(5, "//localhost/smb /root/folder"..., 55) = 55
If I let strace log to stdout I get the error message in the middle of the umount system call:
# strace fusermount -u /data/encfs
execve("/usr/bin/fusermount", ["fusermount", "-u", "/data/encfs"], [/* 16 vars */]) = 0
[... abbreviating ...]
close(5) = 0
munmap(0x20020000, 16384) = 0
profil(0, 0, 0x2010c168, 0x4) = 0
umount("/data/encfs", 0fuse failed. Common problems:
- fuse kernel module not installed (modprobe fuse)
- invalid options -- see usage message
) = 0
profil(0, 0, 0x1177c, 0x20179f98) = 0
stat64("/etc/mtab", {st_mode=S_IFREG|0644, st_size=407, ...}) = 0
ftime(0x13840) = 0
Use strace on the command. It will show you details about what's going on, including the number of the descriptor to which the message is written:
strace fusermount -u /data/encfs
If the message comes from fusermount, you should see a line like
write(0, "- fuse kernel module not installed (modprobe fuse)\n")
somewhere in the output. The number (not necessarily 0) is the file descriptor to which the message is written. Redirecting the descriptor with that number should get rid of the message:
fusermount -u /data/encfs 0>/dev/null
Figured it out.
The error message does not come from fusermount; it comes from the process started by the encfs mount command, which still writes to the terminal it was started on when fusermount unmounts the filesystem.
Doing it like this fixes the problem:
# encfs --extpass=echo_key.sh /data/.encfs /data/encfs 2>/dev/null; sleep 3; fusermount -u /data/encfs
#
Feels so obvious now that I know...
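Why the redirections in the question could not help: a process keeps the descriptors it inherited when it was started, and redirecting another process's copy later does not reach into it. The sketch below assumes, as the fix suggests, that the message is printed by the long-running encfs process; a forked child stands in for that daemon, and a temp file stands in for the terminal. Names and the message text are illustrative.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns the number of bytes the child managed to write to the original file. */
static long demo(void)
{
    char path[] = "/tmp/inherit-demo-XXXXXX";
    int err_fd = mkstemp(path);      /* plays the terminal the daemon started on */
    assert(err_fd >= 0);
    int go[2];
    int rc = pipe(go);
    assert(rc == 0);

    pid_t pid = fork();
    assert(pid >= 0);
    if (pid == 0) {
        /* child = the long-running daemon; err_fd is its inherited stderr */
        char c;
        close(go[1]);
        if (read(go[0], &c, 1) != 1)   /* wait until the parent "unmounts" */
            _exit(1);
        if (write(err_fd, "fuse failed\n", 12) != 12)
            _exit(1);
        _exit(0);
    }

    /* parent = the later shell: it redirects its own copy to /dev/null ... */
    close(go[0]);
    int null = open("/dev/null", O_WRONLY);
    assert(null >= 0);
    dup2(null, err_fd);
    close(null);

    /* ... then triggers the child, which still writes to the original file */
    ssize_t n = write(go[1], "x", 1);
    assert(n == 1);
    close(go[1]);
    waitpid(pid, NULL, 0);

    struct stat st;
    int s = stat(path, &st);
    assert(s == 0);
    unlink(path);
    close(err_fd);
    return (long)st.st_size;
}
```

demo() should report all 12 bytes as delivered, which is why the fix has to redirect encfs's own stderr at mount time.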

Difference between "test -a file" and "test file -ef file"

QNX (Neutrino 6.5.0) uses an open source implementation of ksh as its shell. A lot of the provided scripts, including the system startup scripts, use constructs such as
if ! test /dev/slog -ef /dev/slog; then
# do something
fi
to check whether a resource manager exists in the filesystem. I've searched and could only find very dry explanations saying that -ef checks whether the two arguments are in fact the same file. Since the same filename is given twice, it seems to reduce to checking that the file exists.
I have checked the behaviour of test -a and test -e (according to the various docs I've read, both check for the existence of any type of file), and they seem to work too.
Is there any difference in the checks performed between -ef and -a/-e? Is using -ef some kind of attempt to protect against a race condition in the existence of the file?
Reviewing the strace on Ubuntu Linux's copy of ksh reveals no substantial differences. One call to stat vs two.
$ strace test /tmp/tmp.geLaoPkXXC -ef /tmp/tmp.geLaoPkXXC
showed this:
mmap(NULL, 7220736, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f11dc80b000
close(3) = 0
stat("/tmp/tmp.geLaoPkXXC", {st_mode=S_IFREG|0600, st_size=0, ...}) = 0
stat("/tmp/tmp.geLaoPkXXC", {st_mode=S_IFREG|0600, st_size=0, ...}) = 0
close(1) = 0
close(2) = 0
...whereas
$ strace test -a /tmp/tmp.geLaoPkXXC
showed this:
fstat(3, {st_mode=S_IFREG|0644, st_size=7220736, ...}) = 0
mmap(NULL, 7220736, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f6b49e2b000
close(3) = 0
stat("/tmp/tmp.geLaoPkXXC", {st_mode=S_IFREG|0600, st_size=0, ...}) = 0
close(1) = 0
close(2) = 0
One stat vs two.
$ ksh --version
version sh (AT&T Research) 93u 2011-02-08
strace alone doesn't show how the result of stat is used, so the difference has to be found in the source code of test:
/* code for -ef */
return (stat (argv[op - 1], &stat_buf) == 0
        && stat (argv[op + 1], &stat_spare) == 0
        && stat_buf.st_dev == stat_spare.st_dev
        && stat_buf.st_ino == stat_spare.st_ino);

/* code for -e/-a */
case 'a':                       /* file exists in the file system? */
case 'e':
  return stat (argv[pos - 1], &stat_buf) == 0;
So if both names are the same, and two stat() calls on the same name return the same result, then test -a/-e file is equivalent to test file -ef file. The first condition holds by construction, and the comments by tinman confirm the second.
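The two excerpts can be turned into a compilable sketch (test_e, test_ef and demo are hypothetical names). It also makes the one practical difference visible: -ef makes two separate stat() calls, so if the file were replaced between them the inode comparison could fail even though the name exists. If anything, -ef is therefore slightly weaker against races than -e, not stronger.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* test -e FILE (and -a): a single stat() */
static int test_e(const char *file)
{
    struct stat st;
    return stat(file, &st) == 0;
}

/* test FILE1 -ef FILE2: two stat() calls, then compare device and inode */
static int test_ef(const char *a, const char *b)
{
    struct stat sa, sb;
    return stat(a, &sa) == 0 && stat(b, &sb) == 0
        && sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}

/* Evaluate both tests on a temp file, before and after removing it. */
static void demo(int out[4])
{
    char path[] = "/tmp/ef-demo-XXXXXX";
    int fd = mkstemp(path);
    assert(fd >= 0);
    close(fd);
    out[0] = test_e(path);
    out[1] = test_ef(path, path);
    unlink(path);
    out[2] = test_e(path);
    out[3] = test_ef(path, path);
}
```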
