Ruby system method arguments - ruby

I'm quite confused reading the doc of Ruby's system method here. I'm not sure what are commands and what are options. What do I do if I want to execute the following?
wget -pk -nd -P /public/google www.google.com
For security reasons, I'd like to use one of the versions that uses no shell (the second and third forms in the URL I gave, rather than the first)

Consider the examples:
system("echo *")
system("echo", "*")
The first one passes the string 'echo *' to the shell to be parsed and executed; that's why system('echo *') produces the same output as saying echo * from the shell prompt: you get a list of files in the current directory. The corresponding argument form is:
commandline : command line string which is passed to the standard shell
The second one bypasses the shell entirely. It will look for echo in the PATH and then execute it with the string '*' as its argument. Since the shell expands wildcards (at least on unixy systems), the * will stay as a simple * and you'll see * as the output. The corresponding argument form here is:
cmdname, arg1, ... : command name and one or more arguments (no shell)
The third form:
[cmdname, argv0], arg1, ... : command name, argv[0] and zero or more arguments (no shell)
is used when you want to execute cmdname but have it show up with a different name in ps listings and such. You can see this in action by opening two terminals. Open up irb in one of them and say:
system('sleep', '10')
then quickly switch to the other and look at the ps listing. You should see sleep 10 in there. But, if you give this to irb:
system(['sleep', 'pancakes'], '10')
and check the ps listing, you'll see pancakes 10. Similar two-terminal tricks will show you a shell -c sleep 10 if you say system('sleep 10').
If you supply a Hash as the first argument, then that Hash is used as the environment variables for the spawned process. If you supply a Hash as the final argument, then that Hash is used as options; further documentation on the arguments is, as noted in the system documentation, available under Kernel#spawn.

Related

Meaning of single character commands preceded by a hyphen [duplicate]

What are the differences between these terms: "option", "argument", and "parameter"? In man pages these terms often seem to be used interchangeably.
A command is split into an array of strings named arguments. Argument 0 is (normally) the command name, argument 1, the first element following the command, and so on. These arguments are sometimes called positional parameters.
$ ls -la /tmp /var/tmp
arg0 = ls
arg1 = -la
arg2 = /tmp
arg3 = /var/tmp
An option is a documented1 type of argument modifying the behavior of a command, e.g. -l commonly means "long", -v verbose. -lv are two options combined in a single argument. There are also long options like --verbose (see also Using getopts to process long and short command line options). As their name suggests, options are usually optional. There are however some commands with paradoxical "mandatory options".
$ ls -la /tmp /var/tmp
option1= -l
option2= -a
A parameter is an argument that provides information to either the command or one of its options, e.g. in -o file, file is the parameter of the -o option. Unlike options, whose possible values are hard coded in programs, parameters are usually not, so the user is free to use whatever string suits his/her needs. Should you need to pass a parameter that looks like an option but shouldn't be interpreted as such, you can separate it from the beginning of the command line with a double dash: --2.
$ ls -la /tmp /var/tmp
parameter1= /tmp
parameter2= /var/tmp
$ ls -l -- -a
option1 = -l
parameter1 = -a
A shell parameter is anything that store a value in the context of the shell. This includes positional parameters (e.g. $1, $2...), variables (e.g. $foo, $bar...) and special character ones (e.g. $#)
Finally, there are subcommands, also known as functions / (low-level) commands, which are used with "metacommands" that embed multiple separate commands, like busybox, git, apt-get, openssl, and the likes. With them, you might have global options preceeding the subcommand, and subcommand specific options that follow the subcommand. Unlike parameters, the list of possible subcommands is hardcoded in the command itself. e.g.:
$ busybox ls -l
command = busybox
subcommand = ls
subcommand option1 = -l
$ git --git-dir=a.git --work-tree=b -C c status -s
command = git
command option1 = --git-dir=a.git
command option2 = --work-tree=b
command option3 = -C c
subcommand = status
subcommand option1 = -s
Note that some commands like test, tar, dd and find have more complex argument parsing syntax than the ones described previously and can have some or all of their arguments parsed as expressions, operands, keys and similar command specific components.
Note also that optional variable assignments and redirections, despite being processed by the shell for tilde expansion, parameter expansion, command substitution, arithmetic expansion, and quote removal like other command line parameters are not taken into account in my reply because they have disappeared when the command is actually called and passed its arguments.
1 I should have written usually documented because of course, undocumented options are still options.
2 The double dash feature need to be implemented by the program though.
The man page for a typical Unix command often uses the terms argument, option and parameter. At the lowest level, we have argument and everything is an argument, including the (filesystem path to the) command itself.
In a shell script you access arguments using the special variables $0 .. $n. Other languages have similar ways to access them (commonly through an array with a name like argv).
Arguments may be interpreted as options if you wish. How this is done is implementation-specific. You can either roll your own, for exampe a shell (such as bash) script can use provided getopts or getopt commands.
These typically define an option as an argument beginning with a hyphen (-) and some options may use proceeding arguments as its parameters. More capable parsers (e.g getopt) support mixing short-form (-h) and long-form (--help) options.
Typically, most options take zero or one parameter. Such parameters are also sometimes called values.
The supported options are coded in the program code (e.g in the invocation of getopts within a shell script). Any remaining arguments after the options have been consumed are commonly called positional parameters when the order in which they are given is significant (this is in contrast to options which usually can be given in any order).
Again, the script defines what the positional parameters are by how it consumes and uses them.
So a typical command
$ ls -I README -l foo 'bar car' baz
has seven arguments: /usr/bin/ls, -I, README, -l, foo, bar car, and baz accessible as $0 thru $6. The -l and -I are interpreted as options, the latter having a parameter (or value) of README. What remains are positional parameters (foo, bar car and baz).
Option parsing may alter the argument list by removing those it consumes (e.g using shift or set) so that only the positional parameters remain and are thereafter accessible as $1 .. $n.
Since the question is tagged "bash", I looked for relevant sections in the Bash manual. I list these as quoted passages below together with my own one sentence summaries.
Arguments
Everything following the command is an argument.
A simple shell command such as echo a b c consists of the command itself followed by arguments, separated by spaces.
A simple command is the kind of command encountered most often. It’s just a sequence of words separated by blanks, terminated by one of the shell’s control operators (see Definitions). The first word generally specifies a command to be executed, with the rest of the words being that command’s arguments.
Parameters
Arguments are referred to as parameters during function execution.
When a function is executed, the arguments to the function become the positional parameters during its execution
A parameter is an entity that stores values. It can be a name, a number, or one of the special characters listed below. A variable is a parameter denoted by a name.
A positional parameter is a parameter denoted by one or more digits, other than the single digit 0. Positional parameters are assigned from the shell’s arguments when it is invoked, and may be reassigned using the set builtin command. Positional parameter N may be referenced as ${N}, or as $N when N consists of a single digit.
Options
There is no dedicated section to defining what an option is, but they are referred to as hyphen-prefixed characters throughout the manual.
The -p option changes the output format to that specified by POSIX

Script takes only first part of double quotes

Yesterday I asked a similar question about escaping double quotes in env variables, although It didn't solve my problem (Probably because I didn't explain good enough) so I would like to specify more.
I'm trying to run a script (Which I know is written in Perl), although I have to use it as a black box because of permissions issue (so I don't know how the script works). Lets call this script script_A.
I'm trying to run a basic command in Shell: script_A -arg "date time".
If I run from the command line, it's works fine, but If I try to use it from a bash script or perl scrip (for example using the system operator), it will take only the first part of the string which was sent as an argument. In other words, it will fail with the following error: '"date' is not valid..
Example to specify a little bit more:
If I run from the command line (works fine):
> script_A -arg "date time"
If I run from (for example) a Perl script (fails):
my $args = $ENV{SOME_ENV}; # Assume that SOME_ENV has '-arg "date time"'
my $cmd = "script_A $args";
system($cmd");
I think that the problem comes from the environment variable, but I can't use the one quote while defining the env variable. For example, I can't use the following method:
setenv SOME_ENV '-arg "date time"'
Because it fails with the following error: '"date' is not valid.".
Also, I tried to use the following method:
setenv SOME_ENV "-arg '"'date time'"'"
Although now the env variable will containe:
echo $SOME_ENV
> -arg 'date time' # should be -arg "date time"
Another note, using \" fails on Shell (tried it).
Any suggestions on how to locate the reason for the error and how to solve it?
The $args, obtained from %ENV as you show, is a string.
The problem is in what happens to that string as it is manipulated before arguments are passed to the program, which needs to receive strings -arg and date time
If the program is executed in a way that bypasses the shell, as your example is, then the whole -arg "date time" is passed to it as its first argument. This is clearly wrong as the program expects -arg and then another string for its value (date time)
If the program were executed via the shell, what happens when there are shell metacharacters in the command line (not in your example), then the shell would break the string into words, except for the quoted part; this is how it works from the command line. That can be enforced with
system('/bin/tcsh', '-c', $cmd);
This is the most straightforward fix but I can't honestly recommend to involve the shell just for arguments parsing. Also, you are then in the game of layered quoting and escaping, what can get rather involved and tricky. For one, if things aren't right the shell may end up breaking the command into words -arg, "date, time"
How you set the environment variable works
> setenv SOME_ENV '-arg "date time"'
> perl -wE'say $ENV{SOME_ENV}' #--> -arg "date time" (so it works)
what I believe has always worked this way in [t]csh.
Then, in a Perl script: parse this string into -arg and date time strings, and have the program is executed in a way that bypasses the shell (if shell isn't used by the command)
my #args = $ENV{SOME_ENV} =~ /(\S+)\s+"([^"]+)"/; #"
my #cmd = ('script_A', #args);
system(#cmd) == 0 or die "Error with system(#cmd): $?";
This assumes that SOME_ENV's first word is always the option's name (-arg) and that all the rest is always the option's value, under quotes. The regex extracts the first word, as consecutive non-space characters, and after spaces everything in quotes.† These are program's arguments.
In the system LIST form the program that is the first element of the list is executed without using a shell, and the remaining elements are passed to it as arguments. Please see system for more on this, and also for basics of how to investigate failure by looking into $? variable.
It is in principle advisable to run external commands without the shell. However, if your command needs the shell then make sure that the string is escaped just right to to preserve quotes.
Note that there are modules that make it easier to use external commands. A few, from simple to complex: IPC::System::Simple, Capture::Tiny, IPC::Run3, and IPC::Run.
I must note that that's an awkward environment variable; any way to ogranize things otherwise?
† To make this work for non-quoted arguments as well (-arg date) make the quote optional
my #args = $ENV{SOME_ENV} =~ /(\S+)\s+"?([^"]+)/;
where I now left out the closing (unnecessary) quote for simplicity

Using a parameter with exec in Ruby

I'm trying to execute the command exec when I give it a parameter by console, but I don´t know how to make it.
exec('ls -l #{argv[1]}')
Argv[1] is the parameter I pass by console but it doesn´t do anything.
Unless you need your command to be executed by a shell (such as, you redirect to/from a file), you can pass a list of arguments to exec:
exec 'ls', '-l', ARGV[1]
You're aware that exec replaces the running ruby process? Do you want system instead?
https://ruby-doc.org/core-2.5.0/Process.html#method-c-exec
https://ruby-doc.org/core-2.5.0/Kernel.html#method-i-system
There are several small issues in your code:
variables are not interpolated in single-quote strings; this means your script always tries to execute ls -l #{argv[1]} (as it is written here).
there is no variable, constant or method of class Object that is named argv; there is a global constant named ARGV that contains the command line arguments.
ARGV does not contain the script name (it is stored in a separate property) but only its positional parameters; consequently, the first argument in the command line is stored at index 0, not 1.
Putting together all of the above, your script should be:
exec("ls -l #{ARGV[0]}")

Difference between terms: "option", "argument", and "parameter"?

What are the differences between these terms: "option", "argument", and "parameter"? In man pages these terms often seem to be used interchangeably.
A command is split into an array of strings named arguments. Argument 0 is (normally) the command name, argument 1, the first element following the command, and so on. These arguments are sometimes called positional parameters.
$ ls -la /tmp /var/tmp
arg0 = ls
arg1 = -la
arg2 = /tmp
arg3 = /var/tmp
An option is a documented1 type of argument modifying the behavior of a command, e.g. -l commonly means "long", -v verbose. -lv are two options combined in a single argument. There are also long options like --verbose (see also Using getopts to process long and short command line options). As their name suggests, options are usually optional. There are however some commands with paradoxical "mandatory options".
$ ls -la /tmp /var/tmp
option1= -l
option2= -a
A parameter is an argument that provides information to either the command or one of its options, e.g. in -o file, file is the parameter of the -o option. Unlike options, whose possible values are hard coded in programs, parameters are usually not, so the user is free to use whatever string suits his/her needs. Should you need to pass a parameter that looks like an option but shouldn't be interpreted as such, you can separate it from the beginning of the command line with a double dash: --2.
$ ls -la /tmp /var/tmp
parameter1= /tmp
parameter2= /var/tmp
$ ls -l -- -a
option1 = -l
parameter1 = -a
A shell parameter is anything that store a value in the context of the shell. This includes positional parameters (e.g. $1, $2...), variables (e.g. $foo, $bar...) and special character ones (e.g. $#)
Finally, there are subcommands, also known as functions / (low-level) commands, which are used with "metacommands" that embed multiple separate commands, like busybox, git, apt-get, openssl, and the likes. With them, you might have global options preceeding the subcommand, and subcommand specific options that follow the subcommand. Unlike parameters, the list of possible subcommands is hardcoded in the command itself. e.g.:
$ busybox ls -l
command = busybox
subcommand = ls
subcommand option1 = -l
$ git --git-dir=a.git --work-tree=b -C c status -s
command = git
command option1 = --git-dir=a.git
command option2 = --work-tree=b
command option3 = -C c
subcommand = status
subcommand option1 = -s
Note that some commands like test, tar, dd and find have more complex argument parsing syntax than the ones described previously and can have some or all of their arguments parsed as expressions, operands, keys and similar command specific components.
Note also that optional variable assignments and redirections, despite being processed by the shell for tilde expansion, parameter expansion, command substitution, arithmetic expansion, and quote removal like other command line parameters are not taken into account in my reply because they have disappeared when the command is actually called and passed its arguments.
1 I should have written usually documented because of course, undocumented options are still options.
2 The double dash feature need to be implemented by the program though.
The man page for a typical Unix command often uses the terms argument, option and parameter. At the lowest level, we have argument and everything is an argument, including the (filesystem path to the) command itself.
In a shell script you access arguments using the special variables $0 .. $n. Other languages have similar ways to access them (commonly through an array with a name like argv).
Arguments may be interpreted as options if you wish. How this is done is implementation-specific. You can either roll your own, for exampe a shell (such as bash) script can use provided getopts or getopt commands.
These typically define an option as an argument beginning with a hyphen (-) and some options may use proceeding arguments as its parameters. More capable parsers (e.g getopt) support mixing short-form (-h) and long-form (--help) options.
Typically, most options take zero or one parameter. Such parameters are also sometimes called values.
The supported options are coded in the program code (e.g in the invocation of getopts within a shell script). Any remaining arguments after the options have been consumed are commonly called positional parameters when the order in which they are given is significant (this is in contrast to options which usually can be given in any order).
Again, the script defines what the positional parameters are by how it consumes and uses them.
So a typical command
$ ls -I README -l foo 'bar car' baz
has seven arguments: /usr/bin/ls, -I, README, -l, foo, bar car, and baz accessible as $0 thru $6. The -l and -I are interpreted as options, the latter having a parameter (or value) of README. What remains are positional parameters (foo, bar car and baz).
Option parsing may alter the argument list by removing those it consumes (e.g using shift or set) so that only the positional parameters remain and are thereafter accessible as $1 .. $n.
Since the question is tagged "bash", I looked for relevant sections in the Bash manual. I list these as quoted passages below together with my own one sentence summaries.
Arguments
Everything following the command is an argument.
A simple shell command such as echo a b c consists of the command itself followed by arguments, separated by spaces.
A simple command is the kind of command encountered most often. It’s just a sequence of words separated by blanks, terminated by one of the shell’s control operators (see Definitions). The first word generally specifies a command to be executed, with the rest of the words being that command’s arguments.
Parameters
Arguments are referred to as parameters during function execution.
When a function is executed, the arguments to the function become the positional parameters during its execution
A parameter is an entity that stores values. It can be a name, a number, or one of the special characters listed below. A variable is a parameter denoted by a name.
A positional parameter is a parameter denoted by one or more digits, other than the single digit 0. Positional parameters are assigned from the shell’s arguments when it is invoked, and may be reassigned using the set builtin command. Positional parameter N may be referenced as ${N}, or as $N when N consists of a single digit.
Options
There is no dedicated section to defining what an option is, but they are referred to as hyphen-prefixed characters throughout the manual.
The -p option changes the output format to that specified by POSIX

Command substitution in shell script without globbing

Consider this little shell script.
# Save the first command line argument
cmd="$1"
# Execute the command specified in the first command line argument
out=$($cmd)
# Do something with the output of the specified command
# Here we do a silly thing, like make the output all uppercase
echo "$out" | tr -s "a-z" "A-Z"
The script executes the command specified as the first argument, transforms the output obtained from that command and prints it to standard output. This script may be executed in this manner.
sh foo.sh "echo select * from table"
This does not do what I want. It may print something like the following,
$ sh foo.sh "echo select * from table"
SELECT FILEA FILEB FILEC FROM TABLE
if fileA, fileB and fileC is present in the current directory.
From a user perspective, this command is reasonable. The user has quoted the * in the command line argument, so the user doesn't expect the * to be globbed. But my script astonishes the user by using this argument in a command substitution which causes globbing of * as seen in the above output.
I want the output to be the following instead.
SELECT * FROM TABLE
The entire text in cmd actually comes from command line arguments to the script so I would like to preserve any * symbol present in the argument without globbing them.
I am looking for a solution that works for any POSIX shell.
One solution I have come up with is to disable globbing with set -o noglob just before the command substitution. Here is the complete code.
# Save the first command line argument
cmd="$1"
# Execute the command specified in the first command line argument
set -o noglob
out=$($cmd)
# Do something with the output of the specified command
# Here we do a silly thing, like make the output all uppercase
echo "$out" | tr -s "a-z" "A-Z"
This does what I expect.
$ sh foo.sh "echo select * from table"
SELECT * FROM TABLE
Apart from this, is there any other concept or trick (such as a quoting mechanism) I need to be aware of to disable globbing only within a command substitution without having to use set -o noglob.
I am not against set -o noglob. I just want to know if there is another way. You know, globbing can be disabled for normal command line arguments just by quoting them, so I was wondering if there is anything similar for command substiution.
If I understand correctly, you want the user to provide a shell command as a command-line argument, which will be executed by the script, and is expected to produce an SQL string, which will be processed (upper-cased) and echoed to stdout.
The first thing to say is that there is no point in having the user provide a shell command that the script just blindly executes. If the script applied some kind of modification/preprocessing of the command before it executed it then perhaps it could make sense, but if not, then the user might as well execute the command himself and pass the output to the script as a command-line argument, or via stdin.
But that being said, if you really want to do it this way, then there are two things that need to be said. Firstly, this is the proper form to use:
out=$(eval "$cmd");
A fairly advanced understanding of the shell grammer and expansion rules would be required to fully understand the rationale for using the above syntax, but basically executing $cmd and executing eval "$cmd" have subtle differences that render the $cmd form inappropriate for executing a given shell command string.
Just to give some detail that will hopefully clarify the above point, there are seven kinds of expansion that are performed by the shell in the following order when processing input: (1) brace expansion, (2) tilde expansion, (3) parameter and variable expansion, (4) arithmetic expansion, (5) command substitution, (6) word splitting, and (7) pathname expansion. Notice that variable expansion happens somewhat in the middle of that sequence, and thus the variable-expanded shell command (which was provided by the user) will not receive the benefit of the prior expansion types. Other issues are that leading variable assignments, pipelines, and command list tokens will not be executed correctly under the $cmd form, because they are parsed and processed prior to variable expansion (actually prior to all expansions) as well.
By running the command through eval, properly double-quoted, you ensure that the full shell parsing/processing/execution algorithm will be applied to the shell command string that was given by the user of your script.
The second thing to say is this: If you try the above proper form in your script, you will find that it has not solved your problem. You will still get SELECT FILEA FILEB FILEC FROM TABLE as output.
The reason is this: Since you've decided you want to accept an arbitrary shell command from the user of your script, it is now the user's responsibility to properly quote all metacharacters that may be embedded in that piece of code. It does not make sense for you to accept a shell command as a command-line argument, but somehow change the processing rules for shell commands so that certain metacharacters will no longer be metacharacters when the given shell command is executed. Actually, you could do something like that, perhaps using set -o noglob as you discovered, but then that must become a contract between the script and the user of the script; the user must be made aware of exactly what the precise processing rules will be when the command is executed so that he can properly use the script.
Under this design, the user could call the script as follows (notice the extra layer of quoting for the shell command string evaluation; could alternatively backslash-escape just the asterisk):
$ sh foo.sh "echo 'select * from table'";
I'd like to return to my earlier comment about the overall design; it doesn't really make sense to do it this way. It makes more sense to take the text-to-process itself, not a shell command that is expected to produce the text-to-process.
Here is how that could be done:
## take the text-to-process via a command-line argument
sql="$1";
## process and echo it
echo "$sql"| tr a-z A-Z;
(I also removed the -s option of tr, which really doesn't make sense here.)
Notice that the script is simpler now, and usage is also simpler:
$ sh foo.sh 'select * from table';

Resources