How to predictably run shell script in unknown user environment? - bash

Summary
How can I guarantee that my shell scripts will do what I expect, regardless of the environment?
(Let's assume that people have alias'd and function'd everything they can, but that they haven't touched any system binaries eg. /bin/ls)
Explanation
I am distributing shell scripts as part of an app. These shell scripts are executed in the user's environment - this cannot be changed.
This means users may have aliases for anything and functions redefining "standard" behavior. There have already been a few cases when normal shell keywords have been redefined (eg. local), causing unexpected side effects and crashes.
The only tokens that cannot be defined as functions are as follows:
Bash:
! [[ ]] case coproc do done elif else esac fi for function if in select then time until while { }
ZSH:
! [[ case coproc do done elif else end esac fi for foreach function if nocorrect repeat select then time until while { }
I am aware that:
You can escape a word to skip alias lookup
You can use builtin to always run a builtin
You can use command to always run a command
However, builtin and command can be redefined, so \builtin <command> may not always do what I expect.

Aliases are not expanded in bash scripts (unless you explicitly request this), and functions are usually not inherited by child processes. The caller of your script just has to avoid sourcing it. Problems could be environment variables and file handles.
It is difficult to make a script completely self-contained. For instance, I have seen cases where even standard programs (ls, cat, ...) are stored in different locations, which means that if you set up your own PATH and don't know anything about the target platform, you have to apply some heuristics (searching a list of "commonly known directories") and hope that your search is correct.
A more reliable way would be to require the user of the script to provide a certain minimal configuration (typically containing the basic definition for a PATH) and pass this configuration as a parameter to your script.
There is one problem, pointed out in the comment by Renaud Pacalet: bash allows functions to be exported (using export -f), so in bash you would have to find out which functions exist and explicitly remove their definitions (similarly to what you would do with environment variables). However, I see that you have tagged your question with both bash and zsh; if you don't mind which scripting language you use, writing the script in zsh would perhaps be better, because zsh does not have exported functions.
One point to keep in mind is that every shell, bash and zsh alike, processes certain files on startup, before the commands in your script have any chance to run. For instance, no matter how you start zsh, it will always process /etc/zshenv. So if your script at some point invokes a zsh child script, that would run /etc/zshenv again.
Of course, those startup files could set up functions, and in zsh, aliases are (AFAIK) even expanded inside scripts. The strategy would therefore be to initially loop over your environment variables, the currently defined functions, and the currently defined aliases (in zsh), and remove those definitions. Then you set up your own definitions (functions, variables).
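The "remove the definitions first" strategy could look roughly like the following in bash. This is only a sketch: sanitize_environment is a hypothetical helper name, and it assumes that unset, unalias and declare themselves have not been replaced, which is exactly the weak point raised in the question.

```shell
#!/usr/bin/env bash
# Sketch: drop inherited aliases and functions before doing real work.
# Assumption: unset, unalias and declare are still the real builtins.

# Simulate a hostile environment (demonstration only).
ls() { echo "hijacked ls"; }

sanitize_environment() {
    # Remove every alias; the leading backslash skips alias lookup.
    \unalias -a 2>/dev/null

    # "declare -F" prints lines such as "declare -f name";
    # the third field is the function name.
    local fn
    while read -r _ _ fn; do
        [ "$fn" = "sanitize_environment" ] || \unset -f -- "$fn"
    done < <(declare -F)
}

# Run the demo in a subshell so the calling shell is left untouched.
ls_after=$(sanitize_environment; declare -F ls >/dev/null && echo function || echo gone)
echo "ls function after cleanup: $ls_after"

unset -f ls   # clean up the demo function
```

In a real script the helper would be called directly at the top of the script rather than inside a command substitution, followed by setting PATH to a known value.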

Related

Bash script ignores positional arguments after first time used

I noticed that my script was ignoring my positional arguments in old terminal tabs, but working on recently created ones, so I decided to reduce it to the following:
TAG=test
while getopts 't:' c
do
case $c in
t)
TAG=$OPTARG
;;
esac
done
echo $TAG
And running the script I have:
~ source my_script
test
~ source my_script -t "test2"
test2
~ source my_script -t "test2"
test
I thought it could be that c was a specially used variable elsewhere, but after changing it to other names I had the exact same problem. I also tried adding a .sh extension to the file to see if that was the problem, but nothing worked.
Am I doing something wrong ? And why does it work the first time, but not the subsequent attempts ?
I am on MacOS and I use zsh.
Thank you very much.
The problem is that you're using source to run the script (the . command does the same thing). This makes it run in your current (interactive) shell (rather than a subprocess, like scripts normally do). This means it uses the same variables as the current shell, which is necessary if you want it to change those variables, but it can also have weird effects if you're not careful.
In this case, the problem is that getopts uses the variable OPTIND to keep track of where it is in the argument list (so it doesn't process the same argument twice). The first time you run the script with -t test2, getopts processes those arguments, and leaves OPTIND set to 3 (meaning that it's already done the first two arguments, "-t" and "test2"). The second time you run it with options, it sees that OPTIND is set to 3, so it thinks it's already processed both arguments and just exits the loop.
One option is to add unset OPTIND before the while getopts loop, to reset the count and make it start from the beginning each time.
But unless there's some reason for this script to run in the current shell, it'd be better to make it a standard shell script and have it run as a subprocess. To do this:
Add a "shebang" line as the first line of the script. To make the script run in bash, that'd be either #!/bin/bash or #!/usr/bin/env bash. For zsh, use #!/bin/zsh or #!/usr/bin/env zsh. Since the script runs in a separate shell process, you can run bash scripts from zsh or zsh scripts from bash, or whatever.
Add execute permission to the script file with chmod +x my_script (or whatever the file's actual name is).
Run the script with ./my_script (note the lack of a space between . and /), or by giving the full path to the script, or by putting the script in some directory in your PATH (the directories that're automatically searched for commands) and just running my_script. Do NOT run it with the bash, sh, zsh etc commands; these override the shebang and therefore can cause confusion.
Note: adding ".sh" to the filename is not recommended; it does nothing useful, and makes the script less convenient to run since you have to type in the extension every time you run it.
Also, a couple of recommendations: there are a bunch of all-caps variable names with special meanings (like PATH and OPTIND), so unless you want one of those special meanings, it's best to use lower- or mixed-case variable names (e.g. tag instead of TAG). Also, double-quoting variable references (e.g. echo "$tag" instead of echo $tag) avoids a lot of weird parsing headaches. Run your scripts through shellcheck.net; it's good at spotting common mistakes like this.
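The OPTIND behaviour described above can be reproduced without sourcing anything by calling a function twice in the same shell. The following is a minimal bash sketch; parse_tag is a hypothetical helper, and it resets the counter with OPTIND=1 instead of unset OPTIND.

```shell
#!/usr/bin/env bash
# Sketch: getopts keeps its position in OPTIND between calls,
# so code that may run more than once in one shell should reset it.

parse_tag() {
    local c
    tag=test
    OPTIND=1            # start from the first argument on every call
    while getopts 't:' c; do
        case $c in
            t) tag=$OPTARG ;;
        esac
    done
}

parse_tag -t test2; first=$tag
parse_tag -t test2; second=$tag
echo "first=$first second=$second"
```

Removing the OPTIND=1 line makes the second call fall back to the default value, which is the symptom from the question.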

Script runs when executed but fails when sourced

Original Title: Indirect parameter substitution breaks when the script is sourced (zsh)
zsh 5.7.1 (x86_64-apple-darwin19.0)
GNU bash, version 4.4.20(1)-release (x86_64-pc-linux-gnu)
I’m developing a shell script on a Mac and I’m trying to keep it portable between bash & zsh, so array indexing is a consideration. I know that I can set KSH_ARRAYS to get indexing to start at 0, but I decided to query the OS for the shell that’s in use and set the start index accordingly, which led to the issue described below.
It made sense (to me anyway!) to use indirect expansion, which is what led to the problem. Consider the script indirect.sh:
#! /bin/bash
declare -r ARRAY_START_BASH=0
declare -r ARRAY_START_ZSH=1
declare -r SHELL_BASH=0
declare -r SHELL_ZSH=1
# Indirect expansion is used to reference the values of the variables declared
# in this case statement e.g. ${!ARRAY_START}
case $(basename $SHELL) in
"bash" )
declare -r SHELL_ID=SHELL_BASH
declare -r ARRAY_START=ARRAY_START_BASH
;;
"zsh" )
declare -r SHELL_ID=SHELL_ZSH
declare -r ARRAY_START=ARRAY_START_ZSH
;;
* )
return 1
;;
esac
echo "Shell ID: ${!SHELL_ID} Index arrays from: ${!ARRAY_START}"
It works fine when run from the command line while in the same directory:
<my home> ~ % echo "$(./indirect.sh)"
Shell ID: 1 Index arrays from: 1
Problems arise when I source the script:
<my home> ~ % echo "$(. ~/indirect.sh)"
/Users/<me>/indirect.sh:28: bad substitution
I don’t understand why sourcing the script changes the behavior of the parameter expansion.
Is this expected behavior? If so, I’d be grateful if someone could explain it and hopefully, offer a work around.
The problem described in the original post has nothing to do with indirect expansion. The difference in behavior is a result of different shells being invoked depending on whether the script is “executed” or “sourced”. These differences reveal the basic flaw in deriving the shell from the $SHELL variable that underpins the script's design. If the shell defined in $SHELL does not match the shebang, the script will fail either when sourced or executed. An explanation follows.
Indirect expansion doesn’t offer value in the given scenario because values could just as easily be assigned directly. They’ll have to be assigned that way regardless given the different syntax used for indirect expansion between shells. In fact, other syntax differences between shells make the entire premise for detecting the shell moot! However, putting that aside, the difference in behavior is a result of different shells being invoked based on whether the script is “executed” or “sourced”. The behavior of sourcing is well documented with numerous explanations on the web, but for context here’s how it works:
Executing a Script
Use the “./“ syntax to execute a script.
When run this way, the script executes in a sub-shell. Any changes the
script makes to its shell are applied to the sub-shell, not the shell
in which the script was launched, so those changes are lost when the
shell exits because the sub-shell in which it executed is destroyed as
well. For example, if the script changes the working directory, it
does so in the sub-shell. The working directory of the main shell that
launched the script is unchanged when the script terminates. If you
want to make changes to the shell in which the script was launched, it
must be sourced.
Sourcing a Script
Use the “source “ syntax to source a
script. When run this way, the script essentially becomes an argument
for the source command, which handles invoking the appropriate
execution. Some shells (e.g. ksh) use a single period “.” instead of
“source”.
When a script is executed with the “./“ syntax, the shebang at the top of the file is used to determine which shell to use. When a script is sourced, the shebang is ignored and the shell in which the script is launched is used instead. Also note that the period that appears in the “./“ command syntax used to execute a script, is not related to the period that’s occasionally used as an alias for the source command.
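The difference between executing and sourcing can be seen with a throwaway script that only assigns a variable. This is a sketch assuming a writable temporary directory that allows executing files; the file name setvar and the variable demo_value are made up for the example.

```shell
#!/usr/bin/env bash
# Sketch: an executed script changes a child process,
# a sourced script changes the current shell.

workdir=$(mktemp -d)
printf '%s\n' '#!/bin/sh' 'demo_value=changed' > "$workdir/setvar"
chmod +x "$workdir/setvar"

demo_value=original

"$workdir/setvar"            # executed: runs in a separate process
after_exec=$demo_value

. "$workdir/setvar"          # sourced: runs in this shell, shebang ignored
after_source=$demo_value

echo "after executing: $after_exec"
echo "after sourcing:  $after_source"
rm -rf "$workdir"
```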
The script in the post uses bash in the shebang statement, so it works when executed because it’s run using bash. When it’s sourced from zsh, it encounters the incorrect indirect expansion syntax:
"${!A_VAR}"
The correct syntax is:
"${(P)A_VAR}"
However, correcting the syntax won’t help because it will then fail when executed. The shebang will invoke bash and the syntax will be wrong again. That renders indirection useless for accessing a variable designed to indicate the shell in use. More importantly, a design based on querying an environment variable for the shell is flawed due to differences in the shell that’s ultimately used depending on whether the script is executed or sourced.
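One way to avoid asking $SHELL at all is to probe the array behaviour of whichever shell is actually interpreting the code. This is not the approach from the question or the answer, just a sketch of an alternative; the variable names are made up, and it assumes a shell with bash/zsh-style arrays.

```shell
#!/usr/bin/env bash
# Sketch: detect the first array index by testing it, not by naming the shell.
# The same lines give 0 in bash and 1 in zsh (0 again if KSH_ARRAYS is set).

probe=(first second)
if [ "${probe[1]}" = "first" ]; then
    array_start=1
else
    array_start=0
fi
unset probe

items=(a b c)
first_item=${items[$array_start]}
echo "arrays start at $array_start, first item is $first_item"
```

Because the probe runs in the interpreting shell, it gives the same answer whether the file is executed or sourced.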
To add to your answer (what I'm going to say is too long for a comment): I cannot think of any application where your script would be useful if not sourced. Actually, I came across the need for such a script myself on exactly one occasion:
I use not only zsh as my interactive shell but sometimes also bash, so I have written my .zshrc and .bashrc to set up everything (including defining variables and shell functions for interactive use). In order to save work, I try to put code which works under both bash and zsh into a single file (say: .commonrc), and my .zshrc and .bashrc have inside them a
source .commonrc
While many things are so different in bash and zsh that I can't put them into .commonrc, some can go there, provided I do some tweaking. One cause of headaches is obviously the different indexing of arrays, which you are apparently trying to solve. So I have a similar feature too. However, I don't need a case construct for this. Instead, my .bashrc looks like this (using your naming of the variables):
...
declare -r ARRAY_START=0
source .commonrc
...
and my .zshrc looks like this:
...
declare -r ARRAY_START=1
source .commonrc
...
Since it does not happen that the .bashrc is run from a zsh and vice versa, I don't need to query what kind of shell I have.

Bash what bash alias actually is? [duplicate]

I'm surprised this hasn't been asked before, but…
What is the difference between
alias ⇢ alias EXPORT='alias'
function ⇢ function exporter() { echo $EXPORT }
and
export ⇢ export ALIAS='export'
and for that matter...
alias export=$(function) (j/k)
in bash (zsh, et al.)
Specifically, I'd be most interested in knowing the lexical/practical difference between
alias this=that
and
export that=this
I have both forms... all over the place - and would prefer to stop arbitrarily choosing one, over the other. 😂
I'm sure there is a great reference to a "scopes and use-cases for unix shells", somewhere... but thought I'd post the question here, in the name of righteous-canonicalicism.
You're asking about two very different categories of things: aliases and functions define things that act like commands; export marks a variable to be exported to child processes. Let me go through the command-like things first:
An alias (alias ll='ls -l') defines a shorthand for a command. They're intended for interactive use (they're actually disabled by default in shell scripts), and are simple but inflexible. For example, any arguments you specify after the alias simply get tacked onto the end of the command; if you wanted something like alias findservice='grep "$1" /etc/services', you can't do it, because $1 doesn't do anything useful here.
A function is like a more flexible, more powerful version of an alias. Functions can take & process arguments, contain loops, conditionals, here-documents, etc... Basically, anything you could do with a shell script can be done in a function. Note that the standard way to define a function doesn't actually use the keyword function, just parentheses after the name. For example: findservice() { grep "$1" /etc/services; }
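A small sketch of the findservice function from the paragraph above. The optional second parameter (the file to search) is an addition made only so the example can run against a temporary file instead of /etc/services.

```shell
#!/usr/bin/env bash
# Sketch: a function receives its arguments as $1, $2, ...
# and can place them anywhere in the command.

findservice() {
    # $2 is a hypothetical extra parameter; it defaults to /etc/services.
    grep -- "$1" "${2:-/etc/services}"
}

sample=$(mktemp)
printf 'ssh 22/tcp\nhttp 80/tcp\n' > "$sample"

match=$(findservice http "$sample")
echo "$match"
rm -f "$sample"
```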
Ok, now on to shell variables. Before I get to export, I need to talk about unexported variables. Basically, you can define a variable to have some (text) value, and then if you refer to the variable by $variablename it'll be substituted into the command. This differs from an alias or function in two ways: an alias or function can only occur as the first word in the command (e.g. ll filename will use the alias ll, but echo ll will not), and variables must be explicitly invoked with $ (echo $foo will use the variable foo, but echo foo will not). More fundamentally, aliases and functions are intended to contain executable code (commands, shell syntax, etc), while variables are intended to store non-executable data.
(BTW, you should almost always put variable references inside double-quotes -- that is, use echo "$foo" instead of just echo $foo. Without double-quotes the variable's contents get parsed in a somewhat weird way that tends to cause bugs.)
There are also some "special" shell variables, that are automatically set by the shell (e.g. $HOME), or influence how the shell behaves (e.g. $PATH controls where it looks for executable commands), or both.
An exported variable is available both in the current shell, and also passed to any subprocesses (subshells, other commands, whatever). For example, if I do LC_ALL=en_US.UTF-8, that tells my current shell to use the "en_US.UTF-8" locale settings. On the other hand, if I did export LC_ALL=en_US.UTF-8 that would tell the current shell and all subprocesses and commands it executes to use that locale setting.
Note that a shell variable can be marked as exported separately from defining it, and once exported it stays exported. For example, $PATH is (as far as I know) always exported, so PATH=/foo:/bar has the same effect as export PATH=/foo:/bar (although the latter may be preferred just in case $PATH somehow wasn't already exported).
It's also possible to export a variable to a particular command without defining it in the current shell, by using the assignment as a prefix for the command. For example LC_ALL=en_US.UTF-8 sort filename will tell the sort command to use the "en_US.UTF-8" locale settings, but not apply that to the current shell (or any other commands).
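The three cases (plain variable, exported variable, assignment as a command prefix) can be compared by asking a child sh process what it sees. This is a sketch with made-up variable names, assuming none of them is already set in the environment.

```shell
#!/usr/bin/env bash
# Sketch: which variables reach a child process?

demo_plain=one                 # shell variable only
export demo_exported=two       # shell variable and environment variable

# The child sees only the exported one.
child_view=$(sh -c 'echo "${demo_plain-unset} ${demo_exported-unset}"')

# A prefix assignment is visible to that one command only.
prefix_view=$(demo_prefixed=three sh -c 'echo "$demo_prefixed"')
after_prefix=${demo_prefixed-unset}

echo "child sees:          $child_view"
echo "prefixed child sees: $prefix_view"
echo "current shell after: $after_prefix"
```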
TL;DR:
The shell evaluation order (per POSIX) for the entities in your question is:
aliases --> variables --> command substitutions --> special built-ins --> functions --> regular built-ins
Aliases do not persist across subshells, but variables (and in Bash, functions) can be made to do so with the export command.
Regular built-ins can be overridden by writing functions that have the same name as the regular built-in (since functions expand before regular built-ins). (NOTE: If you're trying to add functionality to the regular built-in, call the built-in with command in your function definition so you don't accidentally create a recursive function.)
Variables can be made readonly with the (special built-in) readonly command, but aliases cannot.
USE CASES:
Export a variable if you need to use a variable across subshells.
Make a variable readonly if you don't want it changed for the life of the parent shell (once performed, this cannot be undone with unset; you must restart the parent shell).
If you want to override or add functionality to a regular built-in, use a function.
NOTE: If you want to be sure that you're using a special or regular built-in and not someone else's function, use builtin the_builtin, or if the shell doesn't support the builtin command, use the POSIX command command -p the_builtin, where the -p switch tells command to use the $PATH that ships with the shell by default (in case the user has overridden PATH).
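Wrapping a regular built-in in a function of the same name, as mentioned in the TL;DR, might look like this. It is a sketch only: the counter variable cd_count is invented for the example, and command is used inside the function so that it calls the real cd rather than itself.

```shell
#!/usr/bin/env bash
# Sketch: add behaviour to the cd built-in without recursing.

cd_count=0
cd() {
    command cd "$@" || return      # "command" bypasses the function lookup
    cd_count=$((cd_count + 1))
}

start_dir=$PWD
cd /
cd "$start_dir"

unset -f cd                        # remove the wrapper again
echo "wrapper ran $cd_count times"
```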
NOTE: A variable can be made to act like an alias that also persists across subshells and cannot be changed. For example,
#! /bin/sh
my_cmd='ls -al'
export my_cmd
readonly my_cmd
will act like
#! /bin/sh
alias my_cmd='ls -al'
so long as
my_cmd is used without double-quotes (i.e. ${my_cmd}, NOT "${my_cmd}") so it isn't treated as a single string, and
IFS is the standard space-tab-newline and not switched to something else, so that the unquoted expansion of my_cmd is word-split on whitespace and each part is treated as a separate token (otherwise it will be evaluated as a single string).
Each shell (e.g. bash, zsh, ksh, yash, etc.) is a bit different, so be sure to review the reference manual for it (they each implement POSIX in a unique way, or sometimes not at all).

Access local shell variables in vim

In vim I can access my bash environment variables such as $PWD and $PATH. I would like to know how to access my temporary shell variables in vim too.
For example, suppose I was in my terminal and define a variable foo="bar". Then I enter vim and try to access this variable with the following command :!echo $foo, but it does not recognize this variable. From my understanding, vim starts a new shell each time a bash command is invoked and then closes it immediately after. Is there a way to use the same shell in vim that my local variable foo was defined in?
No, you can't interact with the parent shell from a subprocess it spawned (without that shell's active participation, which isn't reasonably/practically available in the scenario at hand) -- but you can export your variables to make them accessible to new shells started in child processes.
Running
set -a
...will make any variable defined going forward be automatically exported to the environment, even without an explicit export command.
Since (unlike the C system() function) vim's system() honors the SHELL environment variable, if SHELL=/bin/bash (or :set shell=/bin/bash has been run in vim), you can also invoke exported functions from vim. That is, if you define the function and export it as follows:
foo() { echo "bar"; }
export -f foo
...then you can invoke it with !foo from inside vim.
Even then, however, this is running in a new, transient shell instance, not the original parent process.
Explanation
Environment variables and shell variables are two entirely different concepts, but as we manipulate them in a similar way in bash, it's easy to get confused.
Whenever a process is created (by fork), it may include an environment, given by its parent at fork-time. The child process may then access and modify its content. How this is done as a user depends on the program :
In vim, you can access an environment variable like this : :echo $foo
In bash, you can access it like this : $ echo "$foo"
In most programming languages, you can access it with a syntax coherent with the rest of the language, such as ENV['foo'] in ruby
On the other hand, a program may allocate memory for any internal use, but notably, it will quite often define and use variables. Once again, this depends on the program :
In vim, you would use the :let command to assign an internal variable
In bash, you would assign a variable with $ foo='bar', and then read it with $ echo "$foo"
In most programming languages, you have a variation of the foo='bar' syntax, sometimes with type declarations, etc
As you can see, bash uses the same syntax to read an environment variable and one of its own private variables, which can lead to some confusion.
When you execute vim from your bash shell, the environment is copied over from the parent process (bash) to the child (vim), but the private memory of bash (including the variables you may have defined) is not.
Thus, accessing them from the child process would require some inter-process communication mechanism, between parent and child. While technically doable, this option is not implemented in bash nor vim.
Solution
In order for your variable to be accessible from vim (or any forked process, for that matter), you need it to be present in the environment of your vim process.
Several options to do that :
$ export foo='bar' : This will mark your variable for export to the environment of subsequently executed commands. That's what you want in most cases.
$ foo='bar' vim : This adds your variable to the environment of this vim command. Very useful for troubleshooting, or for one-liners.
$ set -a : As you can see in the bash manpage, this marks every subsequent definition for export to the environment of subsequent commands. It's essentially equivalent to prepending export to every subsequent definition.
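The effect of set -a can be checked with a child process standing in for vim. This is a sketch with made-up variable names; sh -c plays the role of the program started from the shell.

```shell
#!/usr/bin/env bash
# Sketch: variables assigned while "set -a" is active are exported automatically.

set -a
demo_auto=bar          # exported because allexport is on
set +a
demo_later=baz         # ordinary shell variable, not exported

seen=$(sh -c 'echo "${demo_auto-unset} ${demo_later-unset}"')
echo "child process sees: $seen"
```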
To go further
The question uses the :!echo $foo syntax to display the value of foo, which is yet another usecase. The ! here is actually an escape sequence that allows you to execute a shell command from vim.
However, vim cannot execute anything in the parent shell (the one you executed the vim command in), so it creates a new bash shell in a child process, executes echo in it, and displays the result.
In the current case, the result is mostly the same, but it could easily be misleading in other situations, so it's important to understand what is happening here.
There is another vim syntax, using expand, that allows one to lookup variables : :echo expand("$foo")
It however works entirely differently.
If no internal variable named foo exists, vim will invoke a shell to look it up (similarly to what ! would do).
This option is way slower than an environment lookup, and not recommended for most usecases.
If you want to use a value from your shell on the :substitute command, there's actually a way to do it.
I don't know if it solves your need but here we go.
Let's say we want to substitute Mydir by your PWD:
:s/Mydir/\=expand($PWD)/g

Do subshells started by () read startup files?

Parentheses are used in shell to group commands, executing them in a subshell, so that they don't affect parent shell environment.
Now, I wonder whether these spawned subshells read init files, like any other shell.
From direct experience I'd say they don't.
But I don't find any place where that is stated.
Also, is this different for different types of shells?
In general, the behaviour of shells is not particularly well-defined since the only applicable standard was essentially the result of reverse-engineering the behaviour of various commonly-used shells. Nonetheless, there is an expectation that shells will converge to the standard, albeit with extensions.
Having said that, here's what Posix says about (...):
(compound-list)
Execute compound-list in a subshell environment.
And a subshell environment:
A subshell environment shall be created as a duplicate of the shell environment, except that signal traps that are not being ignored shall be set to the default action. Changes made to the subshell environment shall not affect the shell environment. Command substitution, commands that are grouped with parentheses, and asynchronous lists shall be executed in a subshell environment. Additionally, each command of a multi-command pipeline is in a subshell environment; as an extension, however, any or all commands in a pipeline may be executed in the current environment. All other commands shall be executed in the current shell environment.
The take-away here is that the subshell environment is a "duplicate of the shell environment", and not a new shell; the only difference is the specific exception for signal traps. So it is pretty clearly not expected that the subshell will undergo reinitialization, such as rereading startup files.
Posix only provides one requirement for start-up files, which is documented in Section 4 in the description of the sh utility:
ENV
This variable, when and only when an interactive shell is invoked, shall be subjected to parameter expansion by the shell, and the resulting value shall be used as a pathname of a file containing shell commands to execute in the current environment.
Most shells implement a richer set of start-up files with specific names, so that the ENV variable may not be necessary. So the fact that Posix states "when and only when an interactive shell is invoked" is only indicative, but I think it is a good indication.
When a subshell is started, it is just a child resulting from a fork(), so it inherits everything from its parent and doesn't need to read the config files, whose effects it already has.
Conversely, when a shell is started via exec(), it loses its internal state (unexported variables, functions, aliases) and keeps only process-level attributes such as the PID, open file descriptors, and the exported environment, so it has to read the config files again.
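The difference between a ( ... ) subshell and a freshly started shell can be observed with an unexported variable and an unexported function. This is a sketch assuming bash is on PATH; demo_local and demo_greet are made-up names.

```shell
#!/usr/bin/env bash
# Sketch: a subshell is a copy of the current shell,
# a new shell process starts from the environment only.

demo_local=inside              # plain shell variable, not exported
demo_greet() { echo hi; }      # function, not exported

# ( ... ) subshell: sees both, nothing is re-read or reinitialised.
in_subshell=$( ( echo "${demo_local-unset} $(type -t demo_greet || echo none)" ) )

# New bash process: sees neither.
in_new_shell=$(bash -c 'echo "${demo_local-unset} $(type -t demo_greet || echo none)"')

echo "subshell:  $in_subshell"
echo "new shell: $in_new_shell"
```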

Resources