I have a shell script from 2011, the purpose of which is to run on different unix systems.
The script defines certain variables, and I don't understand the logic behind this; I would like to know why it is done. For example:
instead of using echo or grep directly in the script, these variables are defined as follows:
ECHO="/bin/echo"
GREP="/bin/grep" (for linux)
For Solaris or other systems, the corresponding path is defined as well.
They are then used as ${ECHO} "something out"
What is the purpose of this practice, and why can I not use the commands directly?
As others have pointed out, it is unlikely that those lines are correct, more likely they should be:
ECHO="/bin/echo"
GREP="/bin/grep" # (for linux)
Assuming that they are correct, code like this used to be commonly seen in shell scripts (not mine, I might add). You don't see many people using these any more.
echo: ksh (Korn shell, which used to be the shell of choice), csh (C-shell, default shell on Sun) and sh (Bourne shell before it was POSIX) all had their own built-in versions of echo that were slightly different (mostly around the -n argument). Therefore the stand-alone program /bin/echo was sometimes used for portability. There is a performance price to pay for that.
grep and others: It used to be commonly recommended that the full path name for external programs should be set in a script. The main reason was security. In theory a user could provide their own version in a local directory and change their PATH variable. PATH, and all other environment variables, is still considered a security risk by many. A secondary reason was the performance overhead of searching the directories of $PATH - this was before the days of tracked aliases (ksh) or hashing (bash).
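For readers whose concern is a tampered PATH, one alternative to hardcoding every tool is to reset PATH once at the top of the script. This is only a minimal sketch, assuming getconf is installed; the variable name safe_path and the /bin:/usr/bin fallback are illustrative choices, not taken from the script in the question:

```shell
# Ask the system for a PATH that finds the standard utilities.
# "command -p" does the lookup with a default PATH instead of the caller's.
# safe_path and the fallback value are assumptions made for this sketch.
safe_path=$(command -p getconf PATH 2>/dev/null)
[ -n "$safe_path" ] || safe_path=/bin:/usr/bin

PATH=$safe_path
export PATH
# From here on, plain names such as grep resolve through these directories only.
```

With this in place the rest of the script can call grep, awk and so on by their plain names.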
I'll repeat that I don't subscribe to all these views, and I have had arguments over the years with those who do, however that is the explanation. In my opinion this practice causes more problems than it solves.
EDIT: the practices I mention go back to the 1970s and 80s. Why would they be in a script from 2011? Possibly because "we always do that", a.k.a. "company policy", i.e. no one knows or cares why, we just do. Alternatively it could be a case of copy n'paste from an old web-site or book, or even someone who believes this is a good idea.
There is no good reason whatsoever for this practice.
It reduces scripts' portability (by requiring modification when moving to any system with different binary locations), reduces performance (by suppressing use of shell builtins where available), and (as PATH lookups are cached) does not significantly improve runtime performance by saving lookup costs.
One caveat: On some systems, /bin/ is not the canonical location for POSIX tools; for instance, /usr/xpg/bin/sh would be the location for POSIX sh, and /usr/xpg/bin/awk would be the location for POSIX awk, on some ancient SunOS systems.
The wrong way to enforce use of POSIX-compliant tools on such a system is to hardcode these paths in variables defined at the top of the script.
The right way to enforce use of POSIX-compliant tools on such a system is simply to specify a PATH that puts /usr/xpg/bin before /bin. For instance, a script can specify [ -d /usr/xpg/bin ] && PATH=/usr/xpg/bin:$PATH, and serve this purpose thus.
Alternately, assume that one wishes to use GNU find. Instead of setting a FIND variable at the top of a script, one can specify a wrapper as needed, falling through to the default behavior of using the standard find command if no renamed alternative exists:
# use GNU find if under a name other than "find"
if type gnufind >/dev/null 2>&1; then
find() { gnufind "$@"; }
elif type gfind >/dev/null 2>&1; then
find() { gfind "$@"; }
fi
Limiting this answer to the echo part.
Probably this was an attempt to make the program portable, but it was a futile one.
The echo command itself is always unportable if the arguments can contain a backslash or the first argument is -n. POSIX says in these cases the behavior will be implementation-dependent.
Source: https://www.gnu.org/software/coreutils/manual/html_node/echo-invocation.html#echo-invocation
Both dash and bash claim POSIX compliance, but echo 'a\nb' will lead to a different result. And both are correct. I would not rely on the hope that all stand-alone echo programs on the planet just happen to choose the same implementation either.
The easiest way to get the code really portable for any argument is to use printf instead of echo.
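As a rough illustration of the printf approach, here is a small wrapper; say is a hypothetical helper name used only for this sketch:

```shell
# say: print all arguments literally, followed by a newline.
# (say is a hypothetical name; any name that is not echo would do.)
say() {
    printf '%s\n' "$*"
}

say -n          # prints "-n" instead of treating it as an option
say 'a\nb'      # prints the backslash and the n literally
```

Because the format string is fixed and the data is passed as an argument, neither a leading -n nor a backslash in the data changes the behavior.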
If you really wanted to call the external echo program instead of the built-in, because you are confident that your code will never be run on a system with a different implementation choice, env echo (or the full path) would be a way to do it; command echo only bypasses shell functions and still runs the built-in.
Summary
How can I guarantee that my shell scripts will do what I expect, regardless of the environment?
(Let's assume that people have alias'd and function'd everything they can, but that they haven't touched any system binaries eg. /bin/ls)
Explanation
I am distributing shell scripts as part of an app. These shell scripts are executed in the user's environment - this cannot be changed.
This means users may have aliases for anything and functions redefining "standard" behavior. There have already been a few cases when normal shell keywords have been redefined (eg. local), causing unexpected side effects and crashes.
The only tokens that cannot be defined as functions are as follows:
Bash:
! [[ ]] case coproc do done elif else esac fi for function if in select then time until while { }
ZSH:
! [[ case coproc do done elif else end esac fi for foreach function if nocorrect repeat select then time until while { }
I am aware that:
You can escape a word to skip alias lookup
You can use builtin to always run a builtin
You can use command to always run a command
However, builtin and command can be redefined, so \builtin <command> may not always do what I expect.
Aliases are not expanded in bash scripts (unless you explicitly request this), and functions are usually not inherited by child processes. The caller of your script just has to avoid sourcing it. Problems could be environment variables and file handles.
It is difficult to make a script completely self-contained. For instance, I have seen cases where even standard programs (ls, cat, ...) are stored in different locations, which means that if you set up your own PATH and don't know anything about the target platform, you have to apply some heuristics (searching a list of "commonly known directories") and hope that your search is correct.
A more reliable way would be to require from the user of the script to provide a certain minimal configuration (typically containing the basic definition for a PATH) and pass this configuration as parameter to your script.
There is one problem, pointed out in the comment by Renaud Pacalet: bash allows functions to be exported (using export -f), so in bash you would have to find out which functions exist and explicitly remove their definitions (similarly to what you would do with environment variables). However, I see that you have tagged your question with bash and zsh; if you don't mind which scripting language you use, writing the script in zsh would perhaps be better, because zsh does not have exported functions.
One point to keep in mind is that every shell, bash and zsh alike, processes certain files on startup, before the commands in your script have any chance to run. For instance, no matter how you start zsh, it will always process /etc/zshenv. If your script at some point invokes a zsh child script too, that child would again run /etc/zshenv.
Of course, those startup files could set up functions, and in zsh, aliases are (AFAIK) even expanded inside scripts. The strategy would therefore be to initially loop over your environment variables, the currently defined functions, and the currently defined aliases (in zsh), and remove those definitions. Then you set up your own definitions (functions, variables).
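A minimal sketch of the "remove the definitions first" strategy could look like this. reset_commands is a hypothetical helper name, and the sketch assumes that unalias and unset themselves have not been replaced by functions (the leading backslash only protects against aliases):

```shell
# reset_commands: hypothetical helper for this sketch.
# Removes every alias, then any function that shadows one of the named commands.
reset_commands() {
    \unalias -a 2>/dev/null || :
    for _name in "$@"; do
        \unset -f "$_name" 2>/dev/null || :
    done
    \unset _name
}

# Simulate a hostile environment, then clean it up.
grep() { echo "hijacked"; }
reset_commands grep ls cat
echo "match" | grep match      # runs the real grep again
```

This only covers names you list explicitly; it does not discover unknown functions, which is the part that needs shell-specific code.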
I have been working on a few scripts on CentOS 7 and sometimes I see:
#!/bin/sh -
on the first line. Looking at the man page for sh I see the following under the Special Parameters
- Expands to the current option flags as specified upon invocation,
by the set builtin command, or those set by the shell
itself (such as the -i option).
What exactly does this mean? When do I need to use this special parameter option??
The documentation you are reading has nothing to do with the command line you're looking at: it's referring to special variables. In this case, if you run echo $- you will see "the current option flags as specified upon invocation...".
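To make the special parameter concrete, here is a small sketch that inspects $-; shell_mode is a hypothetical helper name:

```shell
# shell_mode: hypothetical helper that inspects the option flags held in $-.
# The flag i is present when the shell is interactive.
shell_mode() {
    case $- in
        *i*) echo "interactive" ;;
        *)   echo "non-interactive" ;;
    esac
}

echo "current flags: $-"
shell_mode
```

The exact flag letters printed depend on the shell and on how it was started.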
If you take a look at the OPTIONS part of the bash man page, you will find:
-- A -- signals the end of options and disables further option processing.
Any arguments after the -- are treated as filenames and arguments. An
argument of - is equivalent to --.
In other words, an argument of - simply means "there are no other options after this argument".
You often see this used in situations in which you want to avoid filenames starting with - accidentally being treated as command options: for example, if there is a file named -R in your current directory, running ls * will in fact behave as ls -R and produce a recursive listing, while ls -- * will not treat the -R file specially.
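The -R case can be reproduced safely in a scratch directory. This is a sketch only: list_safely and demo_dir are hypothetical names, and mktemp -d is assumed to be available:

```shell
# list_safely: hypothetical helper that lists a directory's entries by name,
# using "--" so that no filename can be mistaken for an option.
list_safely() {
    ( cd -- "$1" && ls -d -- * )
}

demo_dir=$(mktemp -d)       # scratch directory for the demonstration
: > "$demo_dir/-R"          # create a file literally named "-R"
list_safely "$demo_dir"     # prints the name -R; no recursive listing happens
rm -rf -- "$demo_dir"
```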
The single dash when used in the #! line is meant as a security precaution. You can read more about that here.
/bin/sh is an executable representing the system shell. Actually, it is usually implemented as a symbolic link pointing to the executable for whichever shell is the system shell. The system shell is kind of the default shell that system scripts should use. In Linux distributions, for a long time this was usually a symbolic link to bash, so much so that it has become somewhat of a convention to always link /bin/sh to bash or a bash-compatible shell. However, in the last couple of years Debian (and Ubuntu) decided to switch the system shell from bash to dash - a similar shell - breaking with a long tradition in Linux (well, GNU) of using bash for /bin/sh. Dash is seen as a lighter, and much faster, shell which can be beneficial to boot speed (and other things that require a lot of shell scripts, like package installation scripts).
Dash is fairly well compatible with bash, being based on the same POSIX standard. However, it doesn't implement the bash-specific extensions. There are scripts in existence that use #!/bin/sh (the system shell) as their shebang, but which require bash-specific extensions. This is currently considered a bug that should be fixed by Debian and Ubuntu, who require /bin/sh to be able to work when pointed to dash.
Even though Ubuntu's system shell is pointing to dash, your login shell as a user continues to be bash at this time. That is, when you log in to a terminal emulator anywhere in Linux, your login shell will be bash. Speed of operation is not so much a problem when the shell is used interactively, and users are familiar with bash (and may have bash-specific customization in their home directory).
I need to make a script which can modify an environment variable of the calling shell. To allow the script to modify the environment variable I'm using source <script> and I want both bash and tcsh to be able to use the same script.
I'm hitting the fact that tcsh and bash have different if syntax so I can't even switch between the two inside the script. What is the best way to handle setting the environment variable?
Ok, you got me. I did some experimentation, and you might actually be able to do this with one script. (Update: I way overcomplicated the original, here's a much better solution that also works in zsh.)
What you're trying to create is a bash/tcsh polyglot (we'll assume for now that you don't want to support any other shells). I'll put the actual polyglot here, then some explanation and caveats afterwards:
if ( : != : ) then
echo "In a POSIX shell or zsh or ksh"
else
echo "In tcsh"
alias fi :
endif
fi
The first line is really the interesting bit in this polyglot.
In POSIX sh, it creates a subshell to run the command : with two arguments, != and :. : always returns true, so the first branch of the if-statement is executed. (Usually a semicolon is used after the condition in an if-statement, but in fact a close-paren works too, since both are control operators, which can be used to end a simple command – the condition in an if-statement is really a list, but that degenerates to a simple command, going by the Bash manual.)
In tcsh, it compares the string : with the string : – since they are equal, and we were testing for inequality, it executes the second branch.
The last line of the second (tcsh) branch just ensures that tcsh won't complain that the final fi isn't a command. There's no need for a similar alias in the first branch, because the endif is still in the second branch of the if-statement as far as a POSIX shell is concerned.
With regard to caveats, you're somewhat limited in what you can actually do in the POSIX shell section: for example, you can't define any functions with the POSIX syntax (foo() {...}), since tcsh will complain about the parentheses, although the Bash syntax (function foo {...}) works. I assume there are similar limitations in the tcsh section.
This polyglot also doesn't work in fish, though it does work in zsh. (That's why the condition is : != : rather than something like : == '' – in zsh, == expands to the path to the command =, which doesn't exist.) It also appears to work in ksh (though at this point it's turning into less of a polyglot, more of a "is this shell csh" program...)
I hate to write an answer that does little more than expand on the comment made by @Ash to the original question. But I felt it important to note that you need to consider not just POSIX 1003 shells like bash and classic shells like csh/tcsh. You also need to consider modern alternatives like fish, which is not compatible with either of those shells.
As @Ash noted, the solution is to use "bridge" code for each of the invoking shells, which maps the information into the syntax appropriate for the invoking shell.
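One way to picture such bridge code is a single script that prints the assignment in whichever syntax the caller asks for, and lets the caller eval the result. This is a sketch under assumptions: emit_env is a hypothetical helper, and the simple single-quoting breaks if the value itself contains a single quote:

```shell
# emit_env: hypothetical helper that prints one environment assignment.
# $1: syntax family ("sh" or "csh"), $2: variable name, $3: value
emit_env() {
    case $1 in
        sh)  printf "export %s='%s'\n" "$2" "$3" ;;
        csh) printf "setenv %s '%s';\n" "$2" "$3" ;;
        *)   printf 'emit_env: unknown syntax %s\n' "$1" >&2; return 1 ;;
    esac
}

emit_env sh  FOO bar    # prints: export FOO='bar'
emit_env csh FOO bar    # prints: setenv FOO 'bar';
```

If this lived in a script called setvars (a made-up name), a bash user would run eval "$(./setvars sh)" and a tcsh user would run eval `./setvars csh`, so neither shell ever has to parse the other's syntax.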
I was writing some code, navigating my computer (OSX 10.11.6) via the command line, like I always do, and I made a typo! Instead of typing:
cd USB
I typed
Cd USB
Nothing happened, but it didn't register as an invalid command. Perplexed by this, I did some investigating: I checked the man entry. There was no entry. I found the source file (/usr/bin/Cd) using which Cd, and then cated it:
#!/bin/sh
# $FreeBSD: src/usr.bin/alias/generic.sh,v 1.2 2005/10/24 22:32:19 cperciva Exp $
# This file is in the public domain.
builtin `echo ${0##*/} | tr \[:upper:] \[:lower:]` ${1+"$@"}
What is this, and why is it here? How does it relate to freeBSD?
Any help would be amazing, thanks!
macOS uses a case-insensitive filesystem by default[1], which can be misleading at times:
which Cd is effectively the same as which cd and which CD in terms of returning the (effectively) same file path.
Confusingly, even though all 3 commands refer to the same file, they do so in a case-preserving manner, misleadingly suggesting that the actual case of the filename is whatever you specified.
As a workaround, you can see the true case of the filename if you employ globbing (filename expansion):
$ ls "$(which Cd)"* # could match additional files, but the one of interest is among them
/usr/bin/cd # true case of the filename
Bash (the macOS default shell) is internally case-sensitive.
That is, it recognizes cd as builtin cd (its built-in directory-changing command).
By contrast, it does NOT recognize Cd as that, due to the difference in case.
Given that it doesn't recognize Cd as a builtin, it goes looking for an external utility (in the $PATH), and that is when it finds /usr/bin/cd.
/usr/bin/cd is implemented as a shell script, which is mostly useless, because as an external utility it cannot affect the shell's state, so its attempts to change the directory are simply quietly ignored.
(Keith Thompson points out in a comment that you can use it to test whether a given directory can be changed to, because the script's exit code will reflect that).
Matt's answer provides history behind the inclusion of the script in FreeBSD and OSX (which mostly builds on FreeBSD), but it's worth taking a closer look at the rationale (emphasis mine):
From the POSIX spec:
However, all of the standard utilities, including the regular built-ins in the table, but not the special built-ins described in Special Built-In Utilities, shall be implemented in a manner so that they can be accessed via the exec family of functions as defined in the System Interfaces volume of POSIX.1-2008 and can be invoked directly by those standard utilities that require it (env, find, nice, nohup, time, xargs).
In essence, the above means: regular built-ins must (also) be callable stand-alone, as executables (whether as scripts or binaries), not just as built-ins from within the shell.
The cited regular built-ins table comprises these utilities:
alias bg cd command false fc fg getopts jobs kill newgrp pwd read true umask unalias wait
Note: special built-in utilities are by definition shell-internal only, and their behavior differs from regular built-in utilities.
As such, to be formally POSIX-compliant an OS must indeed provide cd as an external utility.
At the same time, the POSIX spec. does have awareness that at least some of these regular built-ins - notably cd - only make sense as a built-in:
"Since cd affects the current shell execution environment, it is always provided as a shell regular built-in." - http://pubs.opengroup.org/onlinepubs/9699919799/utilities/cd.html
Among the regular built-in utilities listed, some make sense both as a built-in and as an external utility:
For instance kill needs to be a built-in in order to kill jobs (which are a shell-internal concept), but it is also useful as an external utility, so as to kill processes by PID.
However, among the regular built-in utilities listed, the following never make sense as external utilities, as far as I can tell (do tell me if you disagree), even though POSIX mandates their presence:
alias bg cd command fc fg getopts jobs read umask unalias
Tip of the hat to Matt for helping to complete the list; he also points out that the hash built-in, even though it's not a POSIX utility, also has a pointless script implementation.
[1] As Dave Newton points out in a comment, it is possible to format HFS+, the macOS filesystem, in a case-sensitive manner (even though most people stick with the case-insensitive default). Based on the answer Dave links to, the following command will tell you whether your macOS filesystem is case-insensitive or not:
diskutil info / | grep -iq '^\s*Name.*case-sensitive*' && echo "case-SENSITIVE" || echo "case-INsensitive"
What is this?
The script itself is a portable way to convert a command name, even with random upper casing, into the equivalent shell builtin, based on the file name of the exec path (that is, any part of the string after the final / in the $0 variable). The script then runs the builtin command with the same arguments.
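The name conversion on its own can be tried out like this; builtin_name is a hypothetical helper that takes the path as an argument instead of reading $0:

```shell
# builtin_name: hypothetical helper mirroring the script's name handling.
# ${1##*/} strips everything up to the last slash; tr lowercases the rest.
builtin_name() {
    printf '%s\n' "${1##*/}" | tr '[:upper:]' '[:lower:]'
}

builtin_name /usr/bin/Cd    # prints "cd"
builtin_name CD             # prints "cd"
```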
As OSX file systems are case insensitive by default, /usr/bin/cd converts running Cd, CD, cD and any form of cd with a / fs path (like /usr/bin/cd) back to the shell builtin command cd. This is largely useless in a script as cd only affects the current shell it is running in, which immediately closes when the script ends.
How does it relate to freeBSD?
A similar file exists in FreeBSD, which Apple adapted to do case conversion. Mac file systems by default are case insensitive (but case preserving).
The $FreeBSD: src/usr.bin/alias/generic.sh,v 1.2 2005/10/24 22:32:19 cperciva Exp $ header is the source information in the file.
Most of the underlying OSX system comes directly from FreeBSD or was based on it. The Windowing system on top of this and the Cocoa app layer is where OSX becomes truly Apple. Some of the lower level Apple bits have even made it back into FreeBSD like Clang and LLVM compiler.
Why is it here?
The earlier FreeBSD svn commits shed a bit of light:
A little bit more thought has resulted in a generic script which can
implement any of the useless POSIX-required ``regular shell builtin''
utilities...
Although most builtins aren't very useful when run in a new shell via a script, this compliance script was used for the commands alias bg cd command fc fg getopts hash jobs read type ulimit umask unalias wait. POSIX compliance is fun!
As I recall, MacOS uses a case-insensitive file system by default. The command you saw as /usr/bin/Cd is actually /usr/bin/cd, but it can be referred to by either name.
You can see this by typing
ls /usr/bin/ | grep -i cd
Normally cd is a builtin command in the shell. As you know, it changes the current directory. An external cd command is nearly useless -- but it still exists.
It can be used to detect whether it's possible to change to a specified directory without actually affecting the working directory of your current process.
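The same kind of check can be sketched without relying on /usr/bin/cd being installed, by running the builtin cd in a subshell. This is an equivalent technique rather than the script itself, and can_cd is a hypothetical name:

```shell
# can_cd: hypothetical helper. The cd runs in a subshell, so the caller's
# working directory is never changed; only the exit status is kept.
can_cd() {
    ( cd -- "$1" ) >/dev/null 2>&1
}

if can_cd /; then echo "/ is reachable"; fi
can_cd /no/such/dir || echo "cannot enter /no/such/dir"
```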
Your shell (probably bash) tends to assume case-sensitive command names. The builtin command can only be referred to as cd, but since it's able to open the script file named /usr/bin/Cd, it can find and execute it.
I have a shell script in my home directory called "echo". I added my home directory to my path, so that this echo would replace the other one.
To do this, I used: export PATH=/home/me:$PATH
When I do which echo, it shows the one I want. /home/me/echo
But when I actually do something like echo asdf it uses the system echo.
Am I doing something wrong?
which is an external command, so it doesn't have access to your current shell's built-in commands, functions, or aliases. In fact, at least on my system, /usr/bin/which is a shell script, so you can examine it and see how it works.
If you want to know how your shell will interpret a command, use type rather than which. If you're using bash, type -a will print all possible meanings in order of precedence. Consult your shell's documentation for details.
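For example (the exact wording of the output varies between shells, so none is shown here):

```shell
# Ask the shell itself how it would resolve each name.
type echo      # typically reports that echo is a shell builtin
type ls        # typically reports a path, or an alias in an interactive shell
```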
For most shells, built-in commands take precedence over commands in your $PATH. The whole point of having a built-in echo, for example, is that it's faster than loading /bin/echo into memory.
If you want your own echo command to override the shell's built-in echo, you can define it as a shell function.
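A minimal sketch of such a function follows. The "custom:" prefix only makes the override visible; to run the script from the question, the function body would instead be /home/me/echo "$@":

```shell
# A function named echo takes precedence over the builtin of the same name.
echo() {
    printf 'custom: %s\n' "$*"
}

echo asdf            # prints "custom: asdf"
command echo asdf    # bypasses the function; prints "asdf"
```

Running unset -f echo removes the override again.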
On the other hand, overriding the built-in echo command doesn't strike me as a good idea in the first place. If it behaves the same as the built-in echo, there's not much point. If it doesn't, then it could break scripts that use echo expecting it to work a certain way. If possible, I suggest giving your command a different name. If it's an enhanced version of echo, you could even call it Echo.
It is likely using the shell's builtin.
If you want the one in your path you can do
`which echo` asdf
From this little article that explains the rules, here's a list in descending order of precedence:
Aliases
Shell functions
Shell builtin commands
Hash tables
PATH variable
echo is a shell builtin command (at least in bash) and PATH has the lowest priority. I guess you'll need to create a function or an alias.