While debugging my application I found strange behaviour in the shell interpreter (/bin/sh on Solaris, /bin/dash on Debian): when the shell fork()s, it closes file descriptor 19 (decimal). In my case this leads to the closing of a communication socket pair between processes.
Looking at the shell sources I found the following (abridged for brevity):
/* used for input and output of shell */
#define INIO 19
and
if (input > 0) {
    Ldup(input, INIO);
    input = INIO;
}
...
static void
Ldup(int fa, int fb)
{
    if (fa >= 0) {
        if (fa != fb) {
            close(fb);
            fcntl(fa, 0, fb); /* normal dup */
            close(fa);
        }
        fcntl(fb, 2, 1); /* autoclose for fb */
    }
}
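For reference, the magic numbers in those fcntl() calls map to standard symbolic names: 0 is F_DUPFD, 2 is F_SETFD, and the 1 passed with it is FD_CLOEXEC. A sketch of the same function with the names spelled out (my rewrite, not shell source):

#include <fcntl.h>
#include <unistd.h>

static void
Ldup(int fa, int fb)
{
    if (fa >= 0) {
        if (fa != fb) {
            close(fb);
            fcntl(fa, F_DUPFD, fb);         /* dup fa to lowest free fd >= fb */
            close(fa);
        }
        fcntl(fb, F_SETFD, FD_CLOEXEC);     /* close fb across exec */
    }
}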
So the net effect is simply the closing of FD number INIO (19).
A simple test to reproduce it:
$ exec 19>&1
$ echo aaa >&19
aaa
$ bash -c 'echo aaa >&19'
aaa
$ dash -c 'echo aaa >&19'
dash: 1: Syntax error: Bad fd number
$ ksh -c 'echo aaa >&19'
aaa
The questions are:
What are the reasons for this strange behavior?
What is wrong with file descriptor 19 ?
19 is special because, long ago, the maximum number of open files was 20, e.g.,
#define _NFILE 20
in stdio.h
In POSIX, you may see other symbols such as OPEN_MAX via the sysconf interface.
File descriptors count from 0 and are normally assigned in increasing order, so the "last possible" file descriptor would have been 19. If there was an unused file descriptor, making it the last one would "work".
Both Solaris sh (in particular up through Solaris 10) and dash date back a while, and the detail you noticed probably was not breaking any legacy shell scripts that mattered (much).
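As a quick check on a modern system, the run-time limit can be queried through sysconf(); a minimal sketch:

#include <stdio.h>
#include <unistd.h>

int main(void)
{
    /* OPEN_MAX need not be a compile-time constant anymore;
       sysconf() reports the per-process limit at run time. */
    printf("max open files: %ld\n", sysconf(_SC_OPEN_MAX));
    return 0;
}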
I wanted to understand bash builtins. Hence the following questions:
When I think of the term builtin, I imagine that the bash executable has a function defined in its symbol table that other parts of the executable can call without actually having to fork. Is this what builtin means?
I also see that some builtins have a separate executable. For instance, type [ returns "[ is a shell builtin". But then I also see an executable named /usr/bin/[. Is it correct to say that the same code is available through two executables: one through the bash program and another through /usr/bin/[?
Loosely speaking, the program version of the built-ins is used when a shell interpreter is not available or not needed. Let's explain it in more detail...
When you run a shell script, the interpreter recognizes the built-ins and will not fork/exec, but merely call the function corresponding to the built-in. Even if you call one from a C/C++ executable through system(), the latter launches a shell first and then makes the spawned shell run the built-in.
Here is an example program, which runs echo message via the system() library service:
#include <stdlib.h>
int main(void)
{
system("echo message");
return 0;
}
Compile it and run it:
$ gcc msg.c -o msg
$ ./msg
message
Running the latter under strace with the -f option shows the involved processes. The main program is executed:
$ strace -f ./msg
execve("./msg", ["./msg"], 0x7ffef5c99838 /* 58 vars */) = 0
Then, system() triggers a fork(), which is actually a clone() system call. The child process #5185 is launched:
clone(child_stack=0x7f7e6d6cbff0, flags=CLONE_VM|CLONE_VFORK|SIGCHLD
strace: Process 5185 attached
<unfinished ...>
The child process executes /bin/sh -c "echo message". The latter shell calls the echo built-in to display the message on the screen (write() system call):
[pid 5185] execve("/bin/sh", ["sh", "-c", "echo message"], 0x7ffdd0fafe28 /* 58 vars */ <unfinished ...>
[...]
[pid 5185] write(1, "message\n", 8message
) = 8
[...]
+++ exited with 0 +++
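For reference, here is a rough sketch of what system() does under the hood (signal handling omitted), which matches the trace above:

#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int my_system(const char *cmd)
{
    pid_t pid = fork();
    int status;

    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: hand the command to the shell, as in the trace. */
        execl("/bin/sh", "sh", "-c", cmd, (char *)NULL);
        _exit(127);  /* only reached if execl() fails */
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return status;
}

int main(void)
{
    return my_system("echo message") == -1;
}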
The program version of the built-ins is useful when you need them from a C/C++ executable without an intermediate shell, for the sake of performance. For instance, when you call them through the execv() function.
Here is an example program which does the same thing as the preceding example but with execv() instead of system():
#include <unistd.h>
int main(void)
{
char *av[3];
av[0] = "/bin/echo";
av[1] = "message";
av[2] = NULL;
execv(av[0], av);
return 0;
}
Compile and run it to see that we get the same result:
$ gcc msg1.c -o msg1
$ ./msg1
message
Let's run it under strace to get the details. The output is shorter because no sub-process is involved to execute an intermediate shell. The actual /bin/echo program is executed instead:
$ strace -f ./msg1
execve("./msg1", ["./msg1"], 0x7fffd5b22ec8 /* 58 vars */) = 0
[...]
execve("/bin/echo", ["/bin/echo", "message"], 0x7fff6562fbf8 /* 58 vars */) = 0
[...]
write(1, "message \1\n", 10message
) = 10
[...]
exit_group(0) = ?
+++ exited with 0 +++
Of course, if the program is supposed to do additional things, a simple call to execv() is not sufficient, as the calling program replaces itself with the /bin/echo program. A more elaborate program would fork and execute the latter, but without the need to run an intermediate shell:
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

int main(void)
{
    if (fork() == 0) {
        char *av[3];
        av[0] = "/bin/echo";
        av[1] = "message";
        av[2] = NULL;
        execv(av[0], av);
        _exit(1); /* Only reached if execv() fails */
    }
    wait(NULL);
    // Do additional things before ending
    return 0;
}
Compile and run it under strace to see that the intermediate child process executes the /bin/echo program without the need of an intermediate shell:
$ gcc msg2.c -o msg2
$ ./msg2
message
$ strace -f ./msg2
execve("./msg2", ["./msg2"], 0x7ffc11a5e228 /* 58 vars */) = 0
[...]
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLDstrace: Process 5703 attached
, child_tidptr=0x7f8e0b6e0810) = 5703
[pid 5703] execve("/bin/echo", ["/bin/echo", "message"], 0x7ffe656a9d08 /* 58 vars */ <unfinished ...>
[...]
[pid 5703] write(1, "message\n", 8message
) = 8
[...]
[pid 5703] +++ exited with 0 +++
<... wait4 resumed>NULL, 0, NULL) = 5703
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=5703, si_uid=1000, si_status=0, si_utime=0, si_stime=0} ---
exit_group(0) = ?
+++ exited with 0 +++
the bash executable has a function defined in its symbol table
There are builtins that are compiled into the Bash executable itself. You can also load builtins dynamically from a separate shared library at run time.
can access without actually having to fork
Yes.
Is it correct to say that the same code is available through two executables: one through bash executable and another through /usr/bin/[?
No, it is different source code. One is a Bash builtin and the other is a standalone program. There is also different behavior in grey areas:
$ printf "%q\n" '*'
\*
$ /bin/printf "%q\n" '*'
'*'
$ time echo 1
1
real 0m0.000s
user 0m0.000s
sys 0m0.000s
$ /bin/time echo 1
1
0.00user 0.00system 0:00.00elapsed 50%CPU (0avgtext+0avgdata 2392maxresident)k
64inputs+0outputs (1major+134minor)pagefaults 0swaps
$ [ -v a ]
$ /bin/[ -v a ]
/bin/[: ‘-v’: unary operator expected
How to create a command-line tool that works in a pipe? (Xcode, Mac OS X) Hello, I want to create a command-line tool with Xcode (Mac OS X) that can be used in a pipe, after "|".
I know that by using xargs we can turn data stored in stdin into arguments.
I would like to understand how to create a pipeable command-line tool. Thank you for your answers.
For example, suppose we define the main function that should receive the arguments to execute. In a rudimentary way we will write (in C):
int main(int argc, char **argv)
{
    char buf[512] = "";
    char buf1[512] = "";
    int f;
The first element of the argument array always contains the name of the command line you are creating, e.g. argv[0] is "echo", and the next element is the argument you wish to use, here for echo argv[1] is "the sun shines". Of course, that is only useful if the command is able to receive input after "|" (pipe), for example: echo "the sun is shining" | echo; but echo does not use the pipe.
In our main function, we elementarily check whether argv[1] contains an argument:

if (argv[1] == NULL)
{
Here we arbitrarily assume that argv[1] is empty. As a reminder, our command line sits after a pipe "|". If argv[1] is empty, we need to find an argument elsewhere in order to keep using our command line; for this we accept the postulate that if the command line is placed after a "|" pipe, the output of the previous command is not empty. Hence what follows:

f = open("/dev/stdin", O_RDONLY);
read(f, buf, sizeof(buf));
memcpy(buf1, buf, sizeof(buf));
Now we have opened the "stdin" stream, which necessarily contains the output of the previous command. We use two buffers, buf[512] and buf1[512], because we cannot fill argv[1]. Now that we have our arguments in a buffer, we can carry on with the execution of the command as if the data had arrived in argv[1].
In Objective-C, or simply in C, to give a command-line tool the virtue of being usable in a pipeline after "|" following another command, you have to redirect "stdin" towards the input of the command line as if it were an argument ("argv"). From there, with a little programming in "C", you should be able to make a command-line tool created with Xcode usable after "|".
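Putting the pieces together, here is a minimal self-contained sketch of such a filter (my own illustration, not code from the question): it uses argv[1] when present and falls back to reading stdin when placed after a pipe.

#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
    char buf[512] = "";

    if (argc > 1) {
        /* Argument given directly on the command line. */
        strncpy(buf, argv[1], sizeof(buf) - 1);
    } else {
        /* No argument: assume we are after a pipe and read stdin. */
        if (fgets(buf, sizeof(buf), stdin) == NULL)
            return 1;
        buf[strcspn(buf, "\n")] = '\0';  /* strip the trailing newline */
    }

    printf("received: %s\n", buf);
    return 0;
}

It works both as ./tool "text" and as echo text | ./tool.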
As "is known", a script my-script-file which starts with
#!/path/to/interpreter -arg1 val1 -arg2 val2
is executed by exec calling /path/to/interpreter with 2(!) arguments:
-arg1 val1 -arg2 val2
my-script-file
(and not, as one might naively expect, with 5 arguments
-arg1
val1
-arg2
val2
my-script-file
as has been explained in many previous questions, e.g.,
https://stackoverflow.com/a/4304187/850781).
My problem is from the POV of an interpreter developer, not script writer.
How do I detect from inside the interpreter executable that I was called from shebang as opposed to the command line?
Then I will be able to decide whether I need to split my first argument
by space to go from "-arg1 val1 -arg2 val2" to ["-arg1", "val1", "-arg2", "val2"] or not.
The main issue here is script files named with spaces in them.
If I always split the 1st argument, I will fail like this:
$ my-interpreter "weird file name with spaces"
my-interpreter: "weird": No such file or directory
On Linux, with GNU libc or musl libc, you can use the aux-vector to distinguish the two cases.
Here is some sample code:
#define _GNU_SOURCE 1
#include <stdio.h>
#include <errno.h>
#include <sys/auxv.h>
#include <sys/stat.h>
int
main (int argc, char* argv[])
{
printf ("argv[0] = %s\n", argv[0]);
/* https://www.gnu.org/software/libc/manual/html_node/Error-Messages.html */
printf ("program_invocation_name = %s\n", program_invocation_name);
/* http://man7.org/linux/man-pages/man3/getauxval.3.html */
printf ("auxv[AT_EXECFN] = %s\n", (const char *) getauxval (AT_EXECFN));
/* Determine whether the last two are the same. */
struct stat statbuf1, statbuf2;
if (stat (program_invocation_name, &statbuf1) >= 0
&& stat ((const char *) getauxval (AT_EXECFN), &statbuf2) >= 0)
printf ("same? %d\n", statbuf1.st_dev == statbuf2.st_dev && statbuf1.st_ino == statbuf2.st_ino);
}
Result for a direct invocation:
$ ./a.out
argv[0] = ./a.out
program_invocation_name = ./a.out
auxv[AT_EXECFN] = ./a.out
same? 1
Result for an invocation through a script that starts with #!/home/bruno/a.out:
$ ./a.script
argv[0] = /home/bruno/a.out
program_invocation_name = /home/bruno/a.out
auxv[AT_EXECFN] = ./a.script
same? 0
This approach is, of course, highly unportable: only Linux has the getauxval function. And there are surely cases where it does not work well.
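If the two names differ, the interpreter knows it was started via a shebang and can split its first argument. A hypothetical sketch of the splitting step itself (the MAX_ARGS limit and the whitespace set are my own choices):

#include <stdio.h>
#include <string.h>

#define MAX_ARGS 64

int main(int argc, char **argv)
{
    char *args[MAX_ARGS];
    int n = 0;

    if (argc > 1) {
        /* Destructively split "-arg1 val1 -arg2 val2" on blanks. */
        for (char *tok = strtok(argv[1], " \t");
             tok != NULL && n < MAX_ARGS;
             tok = strtok(NULL, " \t"))
            args[n++] = tok;
    }

    for (int i = 0; i < n; i++)
        printf("arg %d: %s\n", i, args[i]);
    return 0;
}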
I found a strange and totally unexpected behaviour while working with redirection in bash, and even if I managed to work around it, I'd like to know why it happens.
If I run the command { echo wtf > /dev/stdout ; } >> wtf.txt N times, I expect to see the file filled with N "wtf" lines. What I found in the file is a single line.
I think that since the first command opens /dev/stdout in truncate mode, the mode is inherited by the second file descriptor (wtf.txt), which is then completely erased, but I'd like to know if some of you can explain it better, and whether this is the correct behaviour or a bug.
Just to be clear, the command I used was a different one, but the echo example is simpler to understand. The original command needs an output file as an argument, and since I want the output on stdout I passed /dev/stdout as the argument. The same behaviour can be verified with the command openssl rand -hex 4 -out /dev/stdout >> wtf.txt.
Finally, I managed to fix the problem by delegating the append operation to tee in the following way: { echo wtf > /dev/stdout ; } | tee -a wtf.txt > /dev/null
You can check what happens using strace:
strace -o wtf-trace.txt -ff bash -c '{ (echo wtf) > /dev/stdout; } >> wtf.txt'
This will generate two trace files, wtf-trace.txt.12889 and wtf-trace.txt.12890 in my case. The first process performs the >> wtf.txt redirection:
open("wtf.txt", O_WRONLY|O_CREAT|O_APPEND, 0666) = 3
dup2(3, 1) = 1
close(3) = 0
clone(child_stack=0, .................) = 12890
wait4(-1, [{WIFEXITED(s) .............) = 12890
exit_group(0) = ?
The first process opens or creates "wtf.txt" for appending and gets FD 3. After that it duplicates FD 3 onto FD 1 and closes FD 3. At this point it forks (clone), waits for the child to exit and exits itself.
The second process, { echo wtf > /dev/stdout }, inherits the file via FD 1 (stdout) and does:
open("/dev/stdout", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 3
dup2(3, 1) = 1
close(3) = 0
fstat(1, {st_mode=S_IFREG|0664, st_size=0, ...}) = 0
write(1, "wtf\n", 4) = 4
exit_group(0) = ?
As you can see, it opens /dev/stdout (note O_TRUNC) and gets FD 3, dup2()s FD 3 onto FD 1, closes FD 3, checks FD 1 and sees a file with a size of 0 (st_size=0), writes to it and exits.
If you use | cat >> instead, the second process gets its FD 1 connected to a pipe, which is neither seekable nor truncatable...
NB: I show only the relevant lines of the files strace generated.
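The same effect can be reproduced outside the shell with a minimal C sketch that mirrors the traced system calls (run it as ./a.out >> wtf.txt; this assumes Linux's /dev/stdout semantics):

#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    /* Reopen /dev/stdout, i.e. the file FD 1 already points at.
       Note O_TRUNC: this truncates wtf.txt even though the
       inherited descriptor was opened for appending. */
    int fd = open("/dev/stdout", O_WRONLY | O_CREAT | O_TRUNC, 0666);
    if (fd < 0)
        return 1;
    dup2(fd, 1);   /* the same dup2(3, 1) the shell performs */
    close(fd);
    write(1, "wtf\n", 4);
    return 0;
}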
A recent vulnerability, CVE-2014-6271, in how Bash interprets environment variables was disclosed. The exploit relies on Bash parsing some environment variable declarations as function definitions, but then continuing to execute code following the definition:
$ x='() { echo i do nothing; }; echo vulnerable' bash -c ':'
vulnerable
But I don't get it. There's nothing I've been able to find in the Bash manual about interpreting environment variables as functions at all (except for inheriting functions, which is different). Indeed, a proper named function definition is just treated as a value:
$ x='y() { :; }' bash -c 'echo $x'
y() { :; }
But a corrupt one prints nothing:
$ x='() { :; }' bash -c 'echo $x'
$ # Nothing but newline
The corrupt function is unnamed, and so I can't just call it. Is this vulnerability a pure implementation bug, or is there an intended feature here, that I just can't see?
Update
Per Barmar's comment, I hypothesized the name of the function was the parameter name:
$ n='() { echo wat; }' bash -c 'n'
wat
Which I could swear I tried before, but I guess I didn't try hard enough. It's repeatable now. Here's a little more testing:
$ env n='() { echo wat; }; echo vuln' bash -c 'n'
vuln
wat
$ env n='() { echo wat; }; echo $1' bash -c 'n 2' 3 -- 4
wat
…so apparently the args are not set at the time the exploit executes.
Anyway, the basic answer to my question is, yes, this is how Bash implements inherited functions.
This seems like an implementation bug.
Apparently, the way exported functions work in bash is that they use specially-formatted environment variables. If you export a function:
f() { ... }
it defines an environment variable like:
f='() { ... }'
What's probably happening is that when the new shell sees an environment variable whose value begins with (), it prepends the variable name and executes the resulting string. The bug is that this includes executing anything after the function definition as well.
The fix described is apparently to parse the result to see if it's a valid function definition. If not, it prints the warning about the invalid function definition attempt.
This article confirms my explanation of the cause of the bug. It also goes into a little more detail about how the fix resolves it: not only do they parse the values more carefully, but variables that are used to pass exported functions follow a special naming convention. This naming convention is different from that used for the environment variables created for CGI scripts, so an HTTP client should never be able to get its foot into this door.
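To make the environment-variable vector concrete, here is a minimal C sketch of what a caller (a CGI server, say) effectively does; on a pre-patch bash it prints "vulnerable", while a patched one only emits the import warning:

#include <unistd.h>

int main(void)
{
    /* Pass a function-style variable in the environment of a child bash. */
    char *envp[] = { "x=() { echo i do nothing; }; echo vulnerable", NULL };
    char *argv[] = { "/bin/bash", "-c", ":", NULL };
    execve(argv[0], argv, envp);
    return 1;  /* only reached if execve() fails */
}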
The following:
x='() { echo I do nothing; }; echo vulnerable' bash -c 'typeset -f'
prints
vulnerable
x ()
{
echo I do nothing
}
declare -fx x
It seems that Bash, after having parsed the x=..., discovered it as a function, exported it (hence the declare -fx x), and allowed the execution of the command after the declaration:
echo vulnerable
x='() { x; }; echo vulnerable' bash -c 'typeset -f'
prints:
vulnerable
x ()
{
x
}
and running x:
x='() { x; }; echo Vulnerable' bash -c 'x'
prints
Vulnerable
Segmentation fault: 11
It segfaults because of the infinite recursive calls.
It doesn't override an already defined function:
$ x() { echo Something; }
$ declare -fx x
$ x='() { x; }; echo Vulnerable' bash -c 'typeset -f'
prints:
x ()
{
echo Something
}
declare -fx x
i.e. x remains the previously (correctly) defined function.
For the Bash 4.3.25(1)-release the vulnerability is closed, so
x='() { echo I do nothing; }; echo Vulnerable' bash -c ':'
prints
bash: warning: x: ignoring function definition attempt
bash: error importing function definition for `x'
but, what is strange (at least for me),
x='() { x; };' bash -c 'typeset -f'
STILL PRINTS
x ()
{
x
}
declare -fx x
and the
x='() { x; };' bash -c 'x'
segfaults too, so it STILL accepts the strange function definition...
I think it's worth looking at the Bash code itself. The patch gives a bit of insight as to the problem. In particular,
*** ../bash-4.3-patched/variables.c 2014-05-15 08:26:50.000000000 -0400
--- variables.c 2014-09-14 14:23:35.000000000 -0400
***************
*** 359,369 ****
strcpy (temp_string + char_index + 1, string);
! if (posixly_correct == 0 || legal_identifier (name))
! parse_and_execute (temp_string, name, SEVAL_NONINT|SEVAL_NOHIST);
!
! /* Ancient backwards compatibility. Old versions of bash exported
! functions like name()=() {...} */
! if (name[char_index - 1] == ')' && name[char_index - 2] == '(')
! name[char_index - 2] = '\0';
if (temp_var = find_function (name))
--- 364,372 ----
strcpy (temp_string + char_index + 1, string);
! /* Don't import function names that are invalid identifiers from the
! environment, though we still allow them to be defined as shell
! variables. */
! if (legal_identifier (name))
! parse_and_execute (temp_string, name, SEVAL_NONINT|SEVAL_NOHIST|SEVAL_FUNCDEF|SEVAL_ONECMD);
if (temp_var = find_function (name))
When Bash exports a function, it shows up as an environment variable, for example:
$ foo() { echo 'hello world'; }
$ export -f foo
$ cat /proc/self/environ | tr '\0' '\n' | grep -A1 foo
foo=() { echo 'hello world'
}
When a new Bash process finds a function defined this way in its environment, it evaluates the code in the variable using parse_and_execute(). For normal, non-malicious code, executing it simply defines the function in Bash and moves on. However, because it's passed to a generic execution function, Bash will correctly parse and execute additional code defined in that variable after the function definition.
You can see that in the new code, a flag called SEVAL_ONECMD has been added that tells Bash to only evaluate the first command (that is, the function definition), and SEVAL_FUNCDEF to only allow function definitions.
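As an aside, the inspection of the exported-function variables can also be done from a small C program by walking environ instead of reading /proc/self/environ (a sketch; run it from a shell where a function was exported with export -f):

#include <stdio.h>

extern char **environ;

int main(void)
{
    for (char **e = environ; *e != NULL; e++)
        puts(*e);
    return 0;
}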
In regard to your question about documentation: notice in the command-line documentation for the env command below that a study of the syntax shows env is working as documented.
There are, optionally, four possible parts:
An optional hyphen as a synonym for -i (for backward compatibility I assume)
Zero or more NAME=VALUE pairs. These are the variable assignment(s) which could include function definitions.
Note that no semicolon (;) is required between or following the assignments.
The last argument(s) can be a single command followed by its argument(s). It will run with whatever permissions have been granted to the login being used. Security is controlled by restricting permissions on the login user and setting permissions on user-accessible executables such that users other than the executable's owner can only read and execute the program, not alter it.
[ spot@LX03:~ ] env --help
Usage: env [OPTION]... [-] [NAME=VALUE]... [COMMAND [ARG]...]
Set each NAME to VALUE in the environment and run COMMAND.
-i, --ignore-environment start with an empty environment
-u, --unset=NAME remove variable from the environment
--help display this help and exit
--version output version information and exit
A mere - implies -i. If no COMMAND, print the resulting environment.
Report env bugs to bug-coreutils@gnu.org
GNU coreutils home page: <http://www.gnu.org/software/coreutils/>
General help using GNU software: <http://www.gnu.org/gethelp/>
Report env translation bugs to <http://translationproject.org/team/>