Carefully mimicking argv[0] with bash

I'm trying to write a bash wrapper script that very carefully mimics the value of argv[0]/$0. I'm using exec -a to execute a separate program with the wrapper's argv[0] value. I'm finding that sometimes bash's $0 doesn't give the same value I'd get in a C-program's argv[0]. Here's a simple test program that demonstrates the difference in both C and bash:
#include <stdio.h>

int main(int argc, char *argv[])
{
    printf("Argv[0]=%s\n", argv[0]);
    return 0;
}
and
#!/bin/bash
echo \$0=$0
When running these programs with the full (absolute or relative) path to the binary, they behave the same:
$ /path/to/printargv
Argv[0]=/path/to/printargv
$ /path/to/printargv.sh
$0=/path/to/printargv.sh
$ to/printargv
Argv[0]=to/printargv
$ to/printargv.sh
$0=to/printargv.sh
But when invoking them as if they are in the path, I get different results:
$ printargv
Argv[0]=printargv
$ printargv.sh
$0=/path/to/printargv.sh
Two questions:
1) Is this intended behavior that can be explained, or is this a bug?
2) What's the "right" way to achieve the goal of carefully mimicking argv[0]?
edit: typos.

What you're seeing here is the documented behaviour of bash and execve (at least, it is documented on Linux and FreeBSD; I presume that other systems have similar documentation), and reflects the different ways that argv[0] is constructed.
Bash (like any other shell) constructs argv from the provided command line, after having performed the various expansions, resplit words as necessary, and so on. The end result is that when you type
printargv
argv is constructed as { "printargv", NULL } and when you type
to/printargv
argv is constructed as { "to/printargv", NULL }. So no surprises there.
(In both cases, had there been command line arguments, they would have appeared in argv starting at position 1.)
But the execution path diverges at that point. When the first word in the command line includes a /, then it is considered to be a filename, either relative or absolute. The shell does no further processing; it simply calls execve with the provided filename as its filename argument and the argv array constructed previously as its argv argument. In this case, argv[0] precisely corresponds to the filename.
But when the command has no slashes:
printargv
the shell does a lot more work:
First, it checks to see if the name is a user-defined shell function. If so, it executes it, with $1...$n taken from the argv array already constructed. ($0 continues to be argv[0] from the script invocation, though.)
Then, it checks to see if the name is a built-in bash command. If so, it executes it. How built-ins interact with command-line arguments is out of scope for this answer, and is not really user-visible.
Finally, it attempts to find the external utility corresponding with the command, by searching through the components of $PATH and looking for an executable file. If it finds one, it calls execve, giving it the path that it found as the filename argument, but still using the argv array consisting of the words from the command. So in this case, filename and argv[0] do not correspond.
So, in both cases, the shell ends up calling execve, providing a filepath (possibly relative) as the filename argument and the word-split command as the argv argument.
If the indicated file is an executable image, there is nothing more to say, really. The image is loaded into memory, and its main is called with the provided argv vector. argv[0] will be a single word or a relative or absolute path, depending only on what was originally typed.
But if the indicated file is a script, the loader will produce an error and execve will check to see if the file starts with a shebang (#!). (If there is no shebang line, execve itself fails with ENOEXEC; POSIX then requires the shell, as well as the execlp and execvp library functions, to try running the file as a script with the system shell, as though it had #!/bin/sh as a shebang line.)
Here's the documentation for execve on Linux:
An interpreter script is a text file that has execute permission enabled and whose first line is of the form:
#! interpreter [optional-arg]
The interpreter must be a valid pathname for an executable file. If the filename argument of execve() specifies an interpreter script, then interpreter will be invoked with the following arguments:
interpreter [optional-arg] filename arg...
where arg... is the series of words pointed to by the argv argument of execve(), starting at argv[1].
Note that in the above, the filename argument is the filename argument to execve. Given the shebang line #!/bin/bash we now have either
/bin/bash to/printargv # If the original invocation was to/printargv
or
/bin/bash /path/to/printargv # If the original invocation was printargv
Note that argv[0] has effectively disappeared.
bash then runs the script in the file. Prior to executing the script, it sets $0 to the filename argument it was given, in our example either to/printargv or /path/to/printargv, and sets $1...$n to the remaining arguments, which were copied from the command-line arguments in the original command line.
In summary, if you invoke the command using a filename with no slashes:
If the filename contains an executable image, it will see argv[0] as the command name as typed.
If the filename contains a bash script with a shebang line, the script will see $0 as the actual path to the script file.
If you invoke the command using a filename with slashes, in both cases it will see argv[0] as the filename as typed (which might be relative, but will obviously always have a slash).
On the other hand, if you invoke a script by invoking the shell interpreter explicitly (bash printargv), the script will see $0 as the filename as typed, which not only might be relative but also might not have a slash.
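The cases in this summary can be reproduced with a throwaway script. This is only a sketch: demo_dir and the three function names are made up for illustration, and it assumes bash is installed at /bin/bash and that the temporary directory allows executing files.

```shell
#!/bin/sh
# Sketch: build a throwaway script, then invoke it three different ways.
# demo_dir and the function names are hypothetical, for illustration only.
demo_dir=$(mktemp -d)
trap 'rm -rf "$demo_dir"' EXIT

cat > "$demo_dir/printargv.sh" <<'EOF'
#!/bin/bash
echo "\$0=$0"
EOF
chmod +x "$demo_dir/printargv.sh"

# 1. No slash: the shell searches PATH and hands the path it found to execve,
#    so the script sees the full path in $0.
by_path_search() { ( PATH="$demo_dir:$PATH"; printargv.sh ); }

# 2. With a slash: the name as typed is passed straight to execve.
by_relative_path() { ( cd "$demo_dir" && ./printargv.sh ); }

# 3. Explicit interpreter: bash sees the name as typed, slash or not.
by_interpreter() { ( cd "$demo_dir" && bash printargv.sh ); }

by_path_search
by_relative_path
by_interpreter
```

Only the first call should print a path that was never typed; the other two echo back exactly what was on the command line.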
All that means that you can only "carefully mimic argv[0]" if you know what form of invoking the script you wish to mimic. (It also means that the script should never rely on the value of argv[0], but that's a different topic.)
If you are doing this for unit testing, you should provide an option to specify what value to provide as argv[0]. Many shell scripts which attempt to analyze $0 assume that it is a filepath. They shouldn't do that, since it might not be, but there it is. If you want to smoke those utilities out, you'll want to supply some garbage value as $0. Otherwise, your best bet as a default is to provide a path to the scriptfile.
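As a rough sketch of such an option, the helper below lets the caller override the value passed as argv[0] and falls back to the wrapper's own $0 otherwise. The function name mimic_exec and the variable ARGV0_OVERRIDE are invented for this example; exec -a is a bash extension, so the helper delegates to bash explicitly.

```shell
#!/bin/sh
# mimic_exec TARGET [ARGS...]
# Runs TARGET with argv[0] set to $ARGV0_OVERRIDE if that (hypothetical)
# variable is set and non-empty, otherwise to this script's own $0.
mimic_exec() {
    target=$1
    shift
    # Inside the bash -c snippet, $0 is the desired argv[0] and "$@" is
    # the target followed by its arguments.
    bash -c 'exec -a "$0" "$@"' "${ARGV0_OVERRIDE:-$0}" "$target" "$@"
}

# Example: make an inner bash report the argv[0] it was given.
ARGV0_OVERRIDE=garbage
mimic_exec bash -c 'echo "argv[0] seen by target: $0"'
```

In a wrapper that is itself a bash script, the same idea reduces to a single exec -a line with the chosen name, the real program, and "$@".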

Related

Using a parameter with exec in Ruby

I'm trying to call exec with a parameter that I pass on the command line, but I don't know how to do it.
exec('ls -l #{argv[1]}')
argv[1] is the parameter I pass on the command line, but it doesn't do anything.
Unless you need your command to be executed by a shell (for example, because you redirect to or from a file), you can pass a list of arguments to exec:
exec 'ls', '-l', ARGV[1]
You're aware that exec replaces the running ruby process? Do you want system instead?
https://ruby-doc.org/core-2.5.0/Process.html#method-c-exec
https://ruby-doc.org/core-2.5.0/Kernel.html#method-i-system
There are several small issues in your code:
variables are not interpolated in single-quoted strings; this means your script always tries to execute ls -l #{argv[1]} (exactly as it is written here).
there is no variable, constant or method of class Object that is named argv; there is a global constant named ARGV that contains the command line arguments.
ARGV does not contain the script name (that is stored separately, in $0) but only the positional parameters; consequently, the first argument on the command line is stored at index 0, not 1.
Putting together all of the above, your script should be:
exec("ls -l #{ARGV[0]}")

Bash's 'hash' command always succeeds when trying to check existence of a command with a slash in the name

Consider the following script:
#!/bin/bash
hash ./a.sh && echo ./a.sh exists
hash foo && echo foo exists
hash bar/foo && echo bar/foo exists
bar/foo
It tries to check whether different commands exist, namely ./a.sh, foo and bar/foo (i.e. a foo executable inside the bar directory). Afterwards, it tries to run the bar/foo command. My output is:
./a.sh exists
./a.sh: line 3: hash: foo: not found
bar/foo exists
./a.sh: line 5: bar/foo: No such file or directory
The first two lines are expected, as well as the last one. However, the third line says that the hash command did not fail for bar/foo, which is strange.
I had thought that using "hash" is preferable for testing the existence of commands which the script is about to use. At least, it's mentioned as a possible alternative in this SO answer. It turns out it does not work very well for commands which are relative paths (I haven't tested with absolute paths). Why is that? type works better, but I considered them to be mostly synonymous.
Refer to bash's documentation on how commands are looked up and executed:
3.7.2 Command Search and Execution
After a command has been split into words, if it results in a simple command and an optional list of arguments, the following actions are taken.
1. If the command name contains no slashes, the shell attempts to locate it. If there exists a shell function by that name, that function is invoked as described in Shell Functions.
2. If the name does not match a function, the shell searches for it in the list of shell builtins. If a match is found, that builtin is invoked.
3. If the name is neither a shell function nor a builtin, and contains no slashes, Bash searches each element of $PATH for a directory containing an executable file by that name. Bash uses a hash table to remember the full pathnames of executable files to avoid multiple PATH searches (see the description of hash in Bourne Shell Builtins). A full search of the directories in $PATH is performed only if the command is not found in the hash table. If the search is unsuccessful, the shell searches for a defined shell function named command_not_found_handle. If that function exists, it is invoked with the original command and the original command’s arguments as its arguments, and the function’s exit status becomes the exit status of the shell. If that function is not defined, the shell prints an error message and returns an exit status of 127.
4. If the search is successful, or if the command name contains one or more slashes, the shell executes the named program in a separate execution environment. ...
In short, look-up and hashing are performed only for commands that do not contain slashes. If a command looks like a path (i.e. contains a slash), it is assumed to refer to an executable file at that path, and the complex look-up procedure is not needed. As a result, hash treats arguments with slashes as if they resolve to themselves and exits with a success status unconditionally (that is, without checking that the named file actually exists and is executable).
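Given that, one way to get a check that behaves sensibly for both kinds of names is to branch on whether the name contains a slash. This is a minimal sketch, not a drop-in replacement for hash or type; command_exists is a made-up helper name.

```shell
#!/bin/sh
# command_exists NAME: hypothetical helper that succeeds only if NAME
# looks runnable.
command_exists() {
    case $1 in
        */*)
            # Names with a slash bypass PATH lookup, so test the file itself.
            [ -f "$1" ] && [ -x "$1" ]
            ;;
        *)
            # No slash: let the shell do its usual function/builtin/PATH lookup.
            command -v "$1" >/dev/null 2>&1
            ;;
    esac
}

command_exists bar/foo && echo "bar/foo exists"
command_exists sh && echo "sh exists"
```

With no bar/foo file present, only the second line prints anything.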

Bash - Special characters on command line [duplicate]

I'm looking for a way (other than ".", '.', \.) to use bash (or any other Linux shell) while preventing it from parsing parts of the command line. The problem seems to be unsolvable:
How to interpret special characters in command line argument in C?
In theory, a simple switch would suffice (e.g. -x ... telling the shell that the string ... won't be interpreted), but it apparently doesn't exist. I wonder whether there is a workaround, hack or idea for solving this problem. The original problem is a script or alias for a program taking YouTube URLs (which may contain special characters such as &) as arguments. This problem is even more difficult: expanding "$1" while preventing the shell from interpreting the expanded string; essentially, expanding "$1" without interpreting its result.
Use a here-document:
myprogramm <<'EOF'
https://www.youtube.com/watch?v=oT3mCybbhf0
EOF
If you wrap the starting EOF in single quotes, bash won't interpret any special chars in the here-doc.
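For the here-document to help, the program has to read the URL from standard input rather than from its arguments. A minimal sketch of the receiving side, with read_url as a made-up stand-in for the real program and an &t=42 suffix added to the URL purely to show a special character surviving:

```shell
#!/bin/sh
# read_url: hypothetical stand-in for a program that takes the URL on stdin
# instead of as a command-line argument.
read_url() {
    IFS= read -r url          # -r keeps backslashes, IFS= keeps whitespace
    printf 'url=%s\n' "$url"
}

# The quoted 'EOF' delimiter turns off all expansion inside the here-doc.
read_url <<'EOF'
https://www.youtube.com/watch?v=oT3mCybbhf0&t=42
EOF
```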
Short answer: you can't do it, because the shell parses the command line (and interprets things like "&") before it even gets to the point of deciding your script/alias/whatever is what will be run, let alone the point where your script has any control at all. By the time your script has any influence in the process, it's far too late.
Within a script, though, it's easy to avoid most problems: wrap all variable references in double-quotes. For example, rather than curl -o $outputfile $url you should use curl -o "$outputfile" "$url". This will prevent the shell from applying any parsing to the contents of the variable(s) before they're passed to the command (/other script/whatever).
But when you run the script, you'll always have to quote or escape anything passed on the command line.
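The effect of the double quotes inside a script is easy to see by counting arguments. In this small sketch, count_args and value are names invented for the demonstration, and the default IFS is assumed.

```shell
#!/bin/sh
# count_args: hypothetical helper that reports how many arguments it received.
count_args() { echo "$#"; }

value='first second'

count_args $value      # unquoted: the shell splits the value, prints 2
count_args "$value"    # quoted: passed through as one argument, prints 1
```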
Your spec still isn't very clear. As far as I can tell, the problem is that you want to completely reinvent how the shell handles arguments. So… you'll have to write your own shell. The basics aren't even that difficult. Here's pseudo-code:
while true:
    print prompt
    read input
    command = (first input)
    args = (argparse (rest input))
    child_pid = fork()
    if child_pid == 0: // We are inside child process
        exec(command, args) // See variety of `exec` family functions in posix
    else: // We are inside parent process and child_pid is actual child pid
        wait(child_pid) // See variety of `wait` family functions in posix
Your question basically boils down to how that "argparse" function is implemented. If it's just an identity function, then you get no expansion at all. Is that what you want?
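To make the "identity" case concrete, here is a rough shell rendition of that loop in which the only processing is splitting each line on whitespace. It is a sketch under simplifying assumptions: mini_shell is a made-up name, there is no prompt, and a command that reads stdin would swallow the remaining input lines.

```shell
#!/bin/sh
# mini_shell: reads commands from stdin and runs them with NO expansion
# beyond splitting on whitespace.
mini_shell() {
    while IFS= read -r line; do
        [ -n "$line" ] || continue
        (
            set -f            # disable globbing so * and ? stay literal
            set -- $line      # split on whitespace only; no quotes, no $VAR
            exec "$@"         # replace the child with the command
        )                     # the parent waits for the subshell here
    done
}

# The command receives $HOME, * and "quoted" exactly as typed.
printf '%s\n' 'echo $HOME * "quoted"' | mini_shell
```

The subshell plays the role of fork() in the pseudo-code, and the implicit wait for it replaces the explicit wait(child_pid).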

Ruby system method arguments

I'm quite confused reading the doc of Ruby's system method here. I'm not sure what are commands and what are options. What do I do if I want to execute the following?
wget -pk -nd -P /public/google www.google.com
For security reasons, I'd like to use one of the versions that uses no shell (the second and third forms in the URL I gave, rather than the first)
Consider the examples:
system("echo *")
system("echo", "*")
The first one passes the string 'echo *' to the shell to be parsed and executed; that's why system('echo *') produces the same output as saying echo * from the shell prompt: you get a list of files in the current directory. The corresponding argument form is:
commandline : command line string which is passed to the standard shell
The second one bypasses the shell entirely. It will look for echo in the PATH and then execute it with the string '*' as its argument. Since it is the shell that expands wildcards (at least on unixy systems) and no shell is involved here, the * stays a literal * and you'll see * as the output. The corresponding argument form here is:
cmdname, arg1, ... : command name and one or more arguments (no shell)
The third form:
[cmdname, argv0], arg1, ... : command name, argv[0] and zero or more arguments (no shell)
is used when you want to execute cmdname but have it show up with a different name in ps listings and such. You can see this in action by opening two terminals. Open up irb in one of them and say:
system('sleep', '10')
then quickly switch to the other and look at the ps listing. You should see sleep 10 in there. But, if you give this to irb:
system(['sleep', 'pancakes'], '10')
and check the ps listing, you'll see pancakes 10. Similar two-terminal tricks will show you a shell -c sleep 10 if you say system('sleep 10').
If you supply a Hash as the first argument, then that Hash is used as the environment variables for the spawned process. If you supply a Hash as the final argument, then that Hash is used as options; further documentation on the arguments is, as noted in the system documentation, available under Kernel#spawn.

Ruby. The strange argv[0] element in the array being an argument of the spawn method

As is known, Ruby's Kernel#spawn method executes the specified command and returns its pid. The method can accept a whole command line as a single argument, a command name followed by any number of the command's arguments, or an array where the first element is the command itself and the second is, according to the documentation, the strange variable argv[0]. As it turns out, this variable has nothing to do with Ruby's ARGV[0].
What is this variable? What does it contain?
Thanks.
Debian GNU/Linux 6.0.2;
Ruby 1.9.3-p0.
I don't think it's a variable at all.
When executing a command (in the general case), the arguments go into argv[1] to argv[n]. The name of the command executed can be found in argv[0]. (For Ruby applications, the arguments will be placed in ARGV; for C applications, they can be accessed using the argc and argv arguments to main.)
By default, argv[0] will be the same as the command started. However, if you use the following form:
exec(["alpha", "beta"])
The program alpha will be executed, but its argv[0] will be beta.

Resources