Shell-independent way of setting an environment variable - bash

I need to make a script that can modify an environment variable of the calling shell. To allow the script to modify the environment variable, I'm running it with source <script>, and I want both bash and tcsh to be able to use the same script.
I'm hitting the fact that tcsh and bash have different if syntax, so I can't even switch between the two inside the script. What is the best way to handle setting the environment variable?

Ok, you got me. I did some experimentation, and you might actually be able to do this with one script. (Update: I way overcomplicated the original; here's a much better solution that also works in zsh.)
What you're trying to create is a bash/tcsh polyglot (we'll assume for now that you don't want to support any other shells). I'll put the actual polyglot here, then some explanation and caveats afterwards:
if ( : != : ) then
    echo "In a POSIX shell or zsh or ksh"
else
    echo "In tcsh"
    alias fi :
endif
fi
The first line is really the interesting bit in this polyglot.
In POSIX sh, it creates a subshell to run the command : with two arguments, != and :. : always returns true, so the first branch of the if-statement is executed. (Usually a semicolon is used after the condition in an if-statement, but in fact a close-paren works too, since both are control operators, which can be used to end a simple command – the condition in an if-statement is really a list, but that degenerates to a simple command, going by the Bash manual.)
In tcsh, it compares the string : with the string : – since they are equal, and we were testing for inequality, it executes the second branch.
The last line of the second (tcsh) branch just ensures that tcsh won't complain that the final fi isn't a command. There's no need for a similar alias in the first branch, because the endif is still in the second branch of the if-statement as far as a POSIX shell is concerned.
With regard to caveats, you're somewhat limited in what you can actually do in the POSIX shell section: for example, you can't define any functions with the POSIX syntax (foo() {...}), since tcsh will complain about the parentheses, although the Bash syntax (function foo {...}) works. I assume there are similar limitations in the tcsh section.
This polyglot also doesn't work in fish, though it does work in zsh. (That's why the condition is : != : rather than something like : == '' – in zsh, == expands to the path to the command =, which doesn't exist.) It also appears to work in ksh (though at this point it's turning into less of a polyglot, more of a "is this shell csh" program...)
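Applied to the original question, a minimal sketch might look like the following (MY_VAR and its value are placeholders, not from the original post; the same polyglot trick dispatches between export and setenv):
if ( : != : ) then
    # POSIX shells, zsh, ksh
    MY_VAR="some value"
    export MY_VAR
else
    # tcsh
    setenv MY_VAR "some value"
    alias fi :
endif
fi
This keeps the limitation noted above: each branch must at least parse cleanly in the other shell family.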

I hate to write an answer that does little more than expand on the comment made by @Ash on the original question. But I felt it important to note that you need to consider not just POSIX 1003 shells like bash and classic shells like csh/tcsh; you also need to consider modern alternatives like fish, which is compatible with neither of those shells.
As @Ash noted, the solution is to use "bridge" code for each of the invoking shells, which maps the information into the syntax appropriate for the invoking shell.
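A hedged sketch of that bridge approach (file names and the variable are illustrative, not from the original comment): keep one small file per shell family, each setting the same variable in its own syntax, and have each shell's startup file source the matching one.
# env.sh - sourced from .bashrc / .zshrc / other POSIX-style shells
MY_TOOL_HOME=/opt/mytool
export MY_TOOL_HOME

# env.csh - sourced from .cshrc / .tcshrc
setenv MY_TOOL_HOME /opt/mytool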

Related

How to predictably run shell script in unknown user environment?

Summary
How can I guarantee that my shell scripts will do what I expect, regardless of the environment?
(Let's assume that people have alias'd and function'd everything they can, but that they haven't touched any system binaries, e.g. /bin/ls)
Explanation
I am distributing shell scripts as part of an app. These shell scripts are executed in the user's environment - this cannot be changed.
This means users may have aliases for anything and functions redefining "standard" behavior. There have already been a few cases where normal shell keywords were redefined (e.g. local), causing unexpected side effects and crashes.
The only tokens that cannot be defined as functions are as follows:
Bash:
! [[ ]] case coproc do done elif else esac fi for function if in select then time until while { }
ZSH:
! [[ case coproc do done elif else end esac fi for foreach function if nocorrect repeat select then time until while { }
I am aware that:
You can escape a word to skip alias lookup
You can use builtin to always run a builtin
You can use command to always run a command
However, builtin and command can be redefined, so \builtin <command> may not always do what I expect.
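For reference, the three mechanisms above look like this in bash (a sketch; ls stands in for any command that might be aliased or wrapped in a function):
\ls              # backslash skips alias expansion only
builtin cd /tmp  # runs the cd builtin even if a cd function or alias exists
command ls       # skips functions and aliases; uses a builtin or PATH lookup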
Aliases are not expanded in bash scripts (unless you explicitly request it), and functions are usually not inherited by child processes. The caller of your script just has to avoid sourcing it. The remaining problem areas are environment variables and file handles.
It is difficult to make a script completely self-contained. For instance, I have seen cases where even standard programs (ls, cat, ...) are stored in different locations, which means that if you set up your own PATH and don't know anything about the target platform, you have to apply some heuristics (searching a list of commonly known directories) and hope that your search is correct.
A more reliable way would be to require the user of the script to provide a certain minimal configuration (typically containing at least a basic definition of PATH) and pass this configuration as a parameter to your script.
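A minimal sketch of that idea (file and variable names are hypothetical):
#! /bin/sh
# Usage: myscript.sh /path/to/minimal.conf
# The caller must supply a config file defining at least PATH.
conf=$1
[ -n "$conf" ] && [ -r "$conf" ] || { echo "usage: $0 config-file" >&2; exit 2; }
. "$conf"
export PATH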
There is one problem, pointed out in the comment by Renaud Pacalet: bash allows functions to be exported (using export -f), so in bash you would have to find out which functions exist and explicitly remove their definitions (similarly to what you would do with environment variables). However, I see that you have tagged your question with bash and zsh, and if you don't mind which scripting language you use, writing the script in zsh would perhaps be better, because zsh does not have exported functions.
One point to keep in mind is that every shell, bash and zsh alike, processes certain files on startup, before the commands in your script have any chance to run. For instance, no matter how you start your zsh, it will always process /etc/zshenv, and if your script at some point invokes a zsh child script, that child will again run /etc/zshenv.
Of course, those startup files could set up functions, and in zsh, aliases are (AFAIK) even expanded inside scripts. The strategy would therefore be to initially loop over your environment variables, the currently defined functions, and (in zsh) the currently defined aliases, and remove those definitions. Then you set up your own definitions (functions, variables).
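In zsh, that cleanup could be sketched roughly as follows (best-effort only: in principle the builtins used here could themselves have been shadowed by the startup files):
# Drop all function definitions; zsh tracks them in the $functions map.
unset -f ${(k)functions} 2> /dev/null
# Drop all aliases.
unalias -a
# Reset PATH to a known minimal value, then define your own functions.
PATH=/usr/bin:/bin
export PATH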

Script runs when executed but fails when sourced

Original Title: Indirect parameter substitution breaks when the script is sourced (zsh)
zsh 5.7.1 (x86_64-apple-darwin19.0)
GNU bash, version 4.4.20(1)-release (x86_64-pc-linux-gnu)
I’m developing a shell script on a Mac and I’m trying to keep it portable between bash & zsh, so array indexing is a consideration. I know that I can set KSH_ARRAYS to get indexing to start at 0, but I decided to query the OS for the shell that’s in use and set the start index accordingly, which led to the issue described below.
It made sense (to me anyway!) to use indirect expansion, which is what led to the problem. Consider the script indirect.sh:
#! /bin/bash
declare -r ARRAY_START_BASH=0
declare -r ARRAY_START_ZSH=1
declare -r SHELL_BASH=0
declare -r SHELL_ZSH=1
# Indirect expansion is used to reference the values of the variables declared
# in this case statement e.g. ${!ARRAY_START}
case $(basename $SHELL) in
    "bash" )
        declare -r SHELL_ID=SHELL_BASH
        declare -r ARRAY_START=ARRAY_START_BASH
        ;;
    "zsh" )
        declare -r SHELL_ID=SHELL_ZSH
        declare -r ARRAY_START=ARRAY_START_ZSH
        ;;
    * )
        return 1
        ;;
esac
echo "Shell ID: ${!SHELL_ID} Index arrays from: ${!ARRAY_START}"
It works fine when run from the command line while in the same directory:
<my home> ~ % echo "$(./indirect.sh)"
Shell ID: 1 Index arrays from: 1
Problems arise when I source the script:
<my home> ~ % echo "$(. ~/indirect.sh)"
/Users/<me>/indirect.sh:28: bad substitution
I don’t understand why sourcing the script changes the behavior of the parameter expansion.
Is this expected behavior? If so, I’d be grateful if someone could explain it and hopefully, offer a work around.
The problem described in the original post has nothing to do with indirect expansion. The difference in behavior is a result of different shells being invoked depending on whether the script is “executed” or “sourced”. These differences reveal the basic flaw in deriving the shell from the $SHELL variable that underpins the script's design. If the shell defined in $SHELL does not match the shebang, the script will fail either when sourced or executed. An explanation follows.
Indirect expansion doesn’t offer value in the given scenario because the values could just as easily be assigned directly. They’ll have to be assigned that way regardless, given the different indirect-expansion syntax used by the two shells. In fact, other syntax differences between the shells make the entire premise for detecting the shell moot! However, putting that aside, the difference in behavior is a result of different shells being invoked depending on whether the script is “executed” or “sourced”. The behavior of sourcing is well documented, with numerous explanations on the web, but for context here’s how it works:
Executing a Script
Use the "./" syntax to execute a script. When run this way, the script executes in a sub-shell. Any changes the script makes to its shell apply to the sub-shell, not to the shell in which the script was launched, so those changes are lost when the script terminates, because the sub-shell in which it executed is destroyed as well. For example, if the script changes the working directory, it does so in the sub-shell; the working directory of the main shell that launched the script is unchanged when the script terminates. If you want to make changes to the shell in which the script was launched, it must be sourced.
Sourcing a Script
Use the "source" syntax to source a script. When run this way, the script essentially becomes an argument for the source command, which handles invoking the appropriate execution. Some shells (e.g. ksh) use a single period "." instead of "source".
When a script is executed with the "./" syntax, the shebang at the top of the file determines which shell to use. When a script is sourced, the shebang is ignored and the shell in which the script is launched is used instead. Also note that the period that appears in the "./" syntax used to execute a script is not related to the period that's occasionally used as an alias for the source command.
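The difference is easy to demonstrate with a tiny script (a hypothetical example; the version strings shown are the ones from the question):
$ cat which-shell.sh
#! /bin/bash
echo "bash=${BASH_VERSION-unset} zsh=${ZSH_VERSION-unset}"

$ ./which-shell.sh        # executed: the shebang selects bash
bash=4.4.20(1)-release zsh=unset
$ source which-shell.sh   # sourced from zsh: the shebang is ignored
bash=unset zsh=5.7.1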
The script in the post uses bash in the shebang statement, so it works when executed because it’s run using bash. When it’s sourced from zsh, it encounters the incorrect indirect expansion syntax:
"${!A_VAR}"
The correct syntax is:
"${(P)A_VAR}"
However, correcting the syntax won’t help because it will then fail when executed. The shebang will invoke bash and the syntax will be wrong again. That renders indirection useless for accessing a variable designed to indicate the shell in use. More importantly, a design based on querying an environment variable for the shell is flawed due to differences in the shell that’s ultimately used depending on whether the script is executed or sourced.
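If the goal is simply to pick the right array base in whichever shell is actually interpreting the file, a common alternative (not from the original post) is to test the shell-specific version variables instead of $SHELL; this works the same whether the file is executed or sourced:
if [ -n "${ZSH_VERSION-}" ]; then
    ARRAY_START=1    # the interpreting shell is zsh
elif [ -n "${BASH_VERSION-}" ]; then
    ARRAY_START=0    # the interpreting shell is bash
fi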
To add to your answer (what I'm going to say is too long for a comment), I cannot think of any application where your script could be useful if not sourced. Actually, I came across the need for such a script myself on exactly one occasion:
Since I use not only zsh but sometimes also bash as my interactive shell, I have written my .zshrc and .bashrc to set up everything (including defining variables and shell functions for interactive use). To save work,
I try to put code which works under both bash and zsh into a single file (say: .commonrc), and my .zshrc and .bashrc have inside them a
source .commonrc
While many things are so different in bash and zsh that I can't put them into .commonrc, some can be shared, provided I do some tweaking. One cause of headaches is obviously the different indexing of arrays, which you are seemingly trying to solve. So I have a similar feature too. However, I don't need a case construct for this. Instead, my .bashrc looks like this (using your naming of the variables):
...
declare -r ARRAY_START=0
source .commonrc
...
and my .zshrc looks like this:
...
declare -r ARRAY_START=1
source .commonrc
...
Since it never happens that .bashrc is run from zsh or vice versa, I don't need to query what kind of shell I have.

prevent script injection when spawning command line with input arguments from external source

I've got a Python script that wraps a bash command line tool and gets its variables from an external source (environment variables). Is there any way to perform some sort of escaping to prevent a malicious user from executing bad code via one of those parameters?
For example, if the script looks like this
/bin/sh
/usr/bin/tool ${VAR1} ${VAR2}
and someone set VAR2 as follows
export VAR2=123 && \rm -rf /
then the tool invocation may not treat VAR2 as pure input, and the rm command may be performed.
Is there any way to make the variable non-executable and pass the string as-is to the command line tool as input?
The correct and safe way to pass the values of variables VAR1 and VAR2 as arguments to /usr/bin/tool is:
/usr/bin/tool -- "$VAR1" "$VAR2"
The quotes prevent any special treatment of separator or pattern matching characters in the strings.
The -- should prevent the variable values being treated as options if they begin with - characters. You might have to do something else if tool is badly written and doesn't accept -- to terminate command line options.
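As a hypothetical illustration of why -- matters:
VAR1='-rf'
/usr/bin/tool "$VAR1"      # tool may parse -rf as options
/usr/bin/tool -- "$VAR1"   # -rf is now passed as an ordinary argument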
See Quotes - Greg's Wiki for excellent information about quoting in shell programming.
Shellcheck can detect many cases where quotes are missing. It's available as either an online tool or an installable program. Always use it if you want to eliminate many common bugs from your shell code.
The curly braces in the line of code in the question are completely redundant, as they usually are. Some people mistakenly think that they act as quotes. To understand their use, see When do we need curly braces around shell variables?.
I'm guessing that the /bin/sh in the question was intended to be a #! /bin/sh shebang. Since the question was tagged bash, note that #! /bin/sh should not be used with code that includes Bashisms. /bin/sh may not be Bash, and even if it is Bash it behaves differently when invoked as /bin/sh rather than /bin/bash.
Note that even if you forget the quotes the line of code in the question will not cause commands (like rm -rf /) embedded in the variable values to be run at that point. The danger is that badly-written code that uses the variables will create and run commands that include the variable values in unsafe ways. See should I avoid bash -c, sh -c, and other shells' equivalents in my shell scripts? for an explanation of (only) some of the dangers.
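For example (a sketch, with sh -c standing in for any code that rebuilds a command line out of variable values):
# Unsafe: the values are spliced into a string that is re-parsed as shell code.
sh -c "/usr/bin/tool $VAR1 $VAR2"
# Safe: the values are passed as data and never re-parsed.
/usr/bin/tool -- "$VAR1" "$VAR2"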
To best avoid injections, consider switching to [t]csh.
Unlike the Bourne shells, the C shell is "limited", which pushes one to take different, safer paths when writing scripts. The "limitations" imposed by the C shell make it one of the most reliable shells to work with.
(E.g.: nesting is minimal to impossible, which prevents injections at all costs; there are better ways to achieve what one wants.)

'which' command is incorrect

I have a shell script in my home directory called "echo". I added my home directory to my path, so that this echo would replace the other one.
To do this, I used: export PATH=/home/me:$PATH
When I do which echo, it shows the one I want: /home/me/echo.
But when I actually run something like echo asdf, it uses the system echo.
Am I doing something wrong?
which is an external command, so it doesn't have access to your current shell's built-in commands, functions, or aliases. In fact, at least on my system, /usr/bin/which is a shell script, so you can examine it and see how it works.
If you want to know how your shell will interpret a command, use type rather than which. If you're using bash, type -a will print all possible meanings in order of precedence. Consult your shell's documentation for details.
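With the PATH from the question, bash's output would look something like this (exact paths will vary on your system):
$ type -a echo
echo is a shell builtin
echo is /home/me/echo
echo is /bin/echo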
For most shells, built-in commands take precedence over commands in your $PATH. The whole point of having a built-in echo, for example, is that it's faster than loading /bin/echo into memory.
If you want your own echo command to override the shell's built-in echo, you can define it as a shell function.
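For example, a minimal sketch using the path from the question:
echo() {
    # Forward all arguments to the script in the home directory.
    /home/me/echo "$@"
}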
On the other hand, overriding the built-in echo command doesn't strike me as a good idea in the first place. If it behaves the same as the built-in echo, there's not much point. If it doesn't, then it could break scripts that use echo expecting it to work a certain way. If possible, I suggest giving your command a different name. If it's an enhanced version of echo, you could even call it Echo.
It is likely using the shell's builtin.
If you want the one in your path you can do
`which echo` asdf
From this little article that explains the rules, here's a list in descending order of precedence:
Aliases
Shell functions
Shell builtin commands
Hash tables
PATH variable
echo is a shell builtin command (at least in bash) and PATH has the lowest priority. I guess you'll need to create a function or an alias.

Code that is a no-op in bash but stops with an error message in csh?

I am working with someone on a data analysis project and we frequently document the steps we perform by putting them into small shell scripts. The problem is that I use bash and the other person uses csh. The other person has a habit of using source to run these scripts instead of executing them directly (this habit probably dates back to times when spawning an extra shell was an extravagant waste of resources, so it's probably too entrenched to change). I want my scripts (which are, of course, bash scripts) to simply stop with a message reminding the user to run them with bash instead of csh when this person sources them from within csh. At the same time, I would like them to continue to function as bash scripts.
So is there some code I can put at the beginning of my scripts that is a no-op in bash but will signal an error and cancel the execution of the rest of the file (but not kill the shell itself) when sourced from cshell?
This is harder than I thought, due to csh's ancient flavor of variable substitution. However, $?BASH_VERSION expands to 0 (not set) in csh, and in bash to 0BASH_VERSION (or whatever the last command's return value was, followed by the literal text BASH_VERSION). So,
test "$?BASH_VERSION" = 0 && exit 1
should do the trick.
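Since the goal was to print a reminder, the same trick extends naturally (a sketch; the behavior of exit in a sourced csh script may vary between csh variants):
test "$?BASH_VERSION" = 0 && echo "Run this script with bash; do not source it from csh." && exit 1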
This is not easy, as you cannot assign variables the same way or run if statements the same way.
You can use csh's meagre string parsing skills against itself. The following executes cleanly in all shells, including KSH, BASH, ZSH, CSH and SH, on all platforms that I tested it on (Linux, AIX, HP-UX, Solaris):
test '\\' = "\\" && echo "CSH detected"
The idea used here is that backslashes are not special in double-quoted strings in csh, so '\\' and "\\" compare equal there, whereas all other shells reduce "\\" to a single backslash and therefore see the two strings as different.
However, that is only half an answer: what do you do if you don't want to simply exit the script when the 'wrong' shell is detected? Well, you may want to have a Bournish sh part and a csh part in your script.
If you can keep the csh code limited to code that does not use single quotes, the following will work everywhere:
test '\\' = "\\" && goto csh
# Just skip the block containing the csh code. Again we use csh's meagre string parsing capabilities against it.
false || csh_code_block='
csh:
... csh code goes here ...
exit 0
'
... sh code goes here ...
If you are not worried about HP-UX's csh (which seems a little better than the others at parsing), you could replace the multi-line single-quoted command with a 'HERE' document (<<CSH_BLOCK ... CSH_BLOCK). You can't just reverse the order either, as the goto statement doesn't like all the syntax it skips over.
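That variant might look like the following (a sketch of the suggestion above; untested on HP-UX's csh for the reason just given):
test '\\' = "\\" && goto csh

: <<'CSH_BLOCK'
csh:
    ... csh code goes here ...
    exit 0
CSH_BLOCK

... sh code goes here ...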
