For portable shell scripts without long options, can getopt always be used? - shell

I need to write POSIX shell scripts for many platforms and just discovered that at least one of them does not have getopts but it does have getopt.
Is getopt supported everywhere? If not, where is it not?
I don't have any 'long options'. I do have options which take no argument, others which take an integer or a string (usually a path without spaces), and other positional arguments (all placed after the options since I was previously relying on getopts).
As long as I don't need long options, can I always rely on getopt? If not, when not?
Edit: I read a quote from a FAQ about how we should just forget getopt even exists, but it was followed by an answer that appeared to disprove the rationale for the quote.
Stéphane Chazelas wrote (emphasis mine) "getopt is a traditional command that comes from System V long before Linux was ever released. getopt was never standardised. None of POSIX, Unix, or Linux (LSB) ever standardized the getopt command."
Is there a way to use getopt on all three? Like a minimal feature set which is common to all three?
Thanks!

The argument against getopt is against versions not from util-linux (i.e. "traditional versions of getopt" from the given Bash FAQ link).
The answer you linked to misses that context (I'd almost argue it misses it intentionally, since the quoted snippet starts immediately after the crucially important context word "traditional", and right after the sentence that explains what "traditional" means in that context).
util-linux getopt handles the problem cases (empty arguments and arguments containing whitespace); traditional getopt does not, and that difference makes getopt as a whole non-portable.
I cannot speak to the general portability of getopt beyond that, but I would expect its basic functionality to work just about everywhere (and, more to the point, unless you know your code is going to run in "obscure" environments, it likely isn't going to).
That being said, the non-getopt solutions that should be entirely portable are not particularly complicated, and they should be able to handle everything you care to write the code for.
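For instance, here is a minimal sketch of such a hand-rolled parser in plain POSIX sh. The option names (-v for a flag, -n for an integer, -p for a path) and variable names are made up for illustration, matching the kinds of options described in the question:
#!/bin/sh
verbose=0 count= path=
while [ $# -gt 0 ]; do
    case $1 in
        -v) verbose=1 ;;                              # option with no argument
        -n) count=$2; shift ;;                        # option taking an integer
        -p) path=$2; shift ;;                         # option taking a path
        --) shift; break ;;                           # explicit end of options
        -*) echo "unknown option: $1" >&2; exit 1 ;;
        *)  break ;;                                  # first positional argument reached
    esac
    shift
done
# "$@" now holds only the positional arguments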

Related

What is the rationale behind variable assignment without space in bash script

I am trying to write an automated process for AWS that requires some JSON processing and other things in a bash script. I am following a few blogs on bash scripting and I found this:
a=b
with the following note:
There is no space on either side of the equals ( = ) sign. We also leave off the $ sign from the beginning of the variable name when setting it
This is ugly and very difficult to read. Compared to other scripting languages, it is easy for a user to make a mistake when writing a bash script by leaving a space in between. I think everyone likes to write clean and readable code; this restriction is surely bad for readability.
Can you explain why? An explanation with examples would be highly appreciated.
It's because otherwise the syntax would be ambiguous. Consider this command line:
cat = foo
Is that an assignment to the variable cat, or running the command cat with the arguments "=" and "foo"? Note that "=" and "foo" are both perfectly legal filenames, and therefore reasonable things to run cat on. Shell syntax settles this in favor of the command interpretation, so to avoid this interpretation you need to leave out the spaces. cat =foo has the same problem.
On the other hand, consider:
var= cat
Is that the command cat run with the variable var set to the empty string (i.e. a shorthand for var='' cat), or an assignment to the shell variable var? Again, the shell syntax favors the command interpretation so you need to avoid the temptation to add spaces.
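To summarize the cases above side by side (the comments describe how the shell parses each line):
cat = foo     # the command cat, run with the two arguments '=' and 'foo'
cat=foo       # an assignment of the string 'foo' to the variable cat
var= cat      # the command cat, run with var set to '' in its environment
var=cat       # an assignment of the string 'cat' to the variable var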
There are many places in shell syntax where spaces are important delimiters. Another commonly-messed-up place is in tests, where if you leave out any of the spaces in:
if [ "$foo" = "$bar" ]
...it will lead to a different meaning, which might cause an error, or might just silently do the wrong thing.
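For instance, with illustrative values, dropping the spaces silently turns the comparison into a test that is always true, because the whole expression collapses into one word and a single-word test just checks for a non-empty string:
foo=a bar=b
[ "$foo"="$bar" ] && echo "always true"   # one word: 'a=b', a non-empty string
[ "$foo" = "$bar" ] && echo "equal"       # three words: an actual comparison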
What I'm getting at is that shell syntax does not allow you to arbitrarily add or remove spaces to improve readability. Don't even try, you'll just break things.
What you need to understand is that the shell language and syntax is old. Really old. The first version of the UNIX shell with variables was the Bourne shell which was designed and implemented in 1977. Back then, there were few precedents. (AFAIK, just the Thompson shell, which didn't support variables according to the manual entry.)
The rationale for the design decisions in the 1970s is ... lost in the mists of time. The design decisions were made by Steve Bourne and colleagues working at Bell Labs on v6 UNIX. They probably had no idea that their decisions would still be relevant 40+ years later.
The Bourne shell was designed to be general purpose and simple to use ... compared with the alternative of writing programs in C. And small. It was an outstanding success in those terms.
However, any language that is successful has the "problem" that it gets widely adopted. And that makes it more difficult to fix any issues (real or perceived) that may arise. Any proposal to change a language needs to be balanced against the impact of that change on existing users / uses of the language. You don't want to break existing programs or scripts.
Irrespective of arguments about whether spaces around = should be allowed in a shell variable assignment, changing this would break millions of shell scripts. It is just not going to happen.
Of course, Linux (and UNIX before it) allows you to design and implement your own shell. You could (in theory) replace the default shell. It is just a lot of work.
And there is nothing stopping you from writing your scripts in another scripting language (e.g. Python, Ruby, Perl, etc) or designing and implementing your own scripting language.
In summary:
We cannot know for sure why they designed the shell with this syntax for variable assignment, but it is moot anyway.
Reference:
Evolution of shells in Linux: a history of shells.
It prevents ambiguity in a lot of cases. Otherwise, if you have a statement foo = bar, it could then either mean run the foo program with = and bar as arguments, or set the foo variable to bar. When you require that there are no spaces, now you've limited ambiguity to the case where a program name contains an equals sign, which is basically unheard of.
I agree with @StephenC, and here's some more context with sources:
Unix v6 from 1975 did not have an environment; there was just an exec syscall that took a program and a string array of arguments. The system sh, written by Thompson, did not support variables, only single-digit numbered arguments like $1 (probably why $12 to this day is interpreted as ${1}2).
Unix v7 from 1979, emboldened by advances in hardware, added a ton of features including a second string array to the exec call. The man page described it like this, which is still how it works to this day:
An array of strings called the environment is made available by exec(2) when a process begins. By convention these strings have the form name=value
The system sh, now written by Bourne, worked much like v6 shell, but now allowed you to specify these environment strings in the same format in front of commands (because which other format would you use?). The simplistic parser essentially split words by spaces, and flagged a word as destined for a variable if it contained a = and all preceding characters had been alphanumeric.
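That prefix-assignment behavior is still exactly how it works; here is a quick way to see it (GREETING is just an illustrative name):
GREETING=hello env | grep '^GREETING='   # the child process sees it in its environment
echo "${GREETING:-unset}"                # prints 'unset': the shell's own variable was never set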
Thanks to Unix v7's incredible popularity, forks and clones copied a lot of things including this behavior, and that's what we're still seeing today.

Is Bash an interpreted language?

From what I've read so far, bash seems to fit the definition of an interpreted language:
it is not compiled into a lower-level format
every statement ends up calling a subroutine / set of subroutines already translated into machine code (i.e. echo foo calls a precompiled executable)
the interpreter itself, bash, has already been compiled
However, I could not find a reference to bash on Wikipedia's page for interpreted languages, or by extensive searches on Google. I've also found a page on Programmers Stack Exchange that seems to imply that bash is not an interpreted language; if it's not, then what is it?
Bash is definitely interpreted; I don't think there's any reasonable question about that.
There might possibly be some controversy over whether it's a language. It's designed primarily for interactive use, executing commands provided by the operating system. For a lot of that particular kind of usage, if you're just typing commands like
echo hello
or
cp foo.txt bar.txt
it's easy to think that it's "just" for executing simple commands. In that sense, it's quite different from interpreted languages like Perl and Python which, though they can be used interactively, are mainly used for writing scripts (interpreted programs).
One consequence of this emphasis is that its design is optimized for interactive use. Strings don't require quotation marks, most commands are executed immediately after they're entered, most things you do with it will invoke external programs rather than built-in features, and so forth.
But as we know, it's also possible to write scripts using bash, and bash has a lot of features, particularly flow control constructs, that are primarily for use in scripts (though they can also be used on the command line).
Another distinction between bash and many scripting languages is that a bash script is read, parsed, and executed in order. A syntax error in the middle of a bash script won't be detected until execution reaches it. A Perl or Python script, by contrast, is parsed completely before execution begins. (Things like eval can change that, but the general idea is valid.) This is a significant difference, but it doesn't mark a sharp dividing line. If anything it makes Perl and Python more similar to compiled languages.
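A throwaway two-line script (run it with bash) makes this visible; the first line's output appears before the syntax error on the second line is reported:
echo "this line already ran"
if then    # syntax error: bash only complains when parsing reaches this line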
Bottom line: Yes, bash is an interpreted language. Or, perhaps more precisely, bash is an interpreter for an interpreted language. (The name "bash" usually refers to the shell/interpreter rather than to the language that it interprets.) It has some significant differences from other interpreted languages that were designed from the start for scripting, but those differences aren't enough to remove it from the category of "interpreted languages".
Bash is an interpreter according to the GNU Bash Reference Manual:
Bash is the shell, or command language interpreter, for the GNU operating system.

Short/long options with option argument - is this some sort of convention? [duplicate]

This question already has answers here:
What is the general syntax of a Unix shell command?
(4 answers)
Closed 7 years ago.
It seems that most (a lot of) commands implement option arguments like this:
if a short option requires an option argument, the option is separated by a space from the option argument, e.g.
$ head -n 10
if a long option requires an option argument, the option is separated by a = from the option argument, e.g.
$ head --lines=10
Is this some sort of convention, and if yes, where can I find it? Besides, what's the reasoning?
Why e.g. is it not
$ head --lines 10
?
The short option rationale is documented in the POSIX Utility Conventions. Most option parsers allow the value to be 'attached' to the letter (-n10), mainly because of extensive historical precedent.
The long option rationale is specified by GNU in their Coding Standards and in the manual page for getopt_long().
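As a concrete illustration with GNU coreutils head (file.txt is a stand-in name): getopt_long() accepts the argument of a long option either after '=' or as the next word, so all four of these forms are equivalent:
$ head -n 10 file.txt        # short option, argument as the next word
$ head -n10 file.txt         # short option, argument attached
$ head --lines=10 file.txt   # long option, argument after '='
$ head --lines 10 file.txt   # long option, argument as the next word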
Once upon a long time ago, in a StackOverflow of long ago, there was a question about command option styles. Not perhaps a good question, but I think the answers rescued it (but I admit to bias). Anyway, it has since been deleted, so I'm going to resuscitate my answer here because (a) it was a painful process to rediscover the answer and (b) it has useful information in it related to options.
How many different types of options do you recognize? I can think of many, including:
Single-letter options preceded by single dash, groupable when there is no argument, argument can be attached to option letter or in next argument (many, many Unix commands; most POSIX commands).
Single-letter options preceded by single dash, grouping not allowed, arguments must be attached (RCS).
Single-letter options preceded by single dash, grouping not allowed, arguments must be separate (pre-POSIX SCCS, IIRC).
Multi-letter options preceded by single dash, arguments may be attached or in next argument (X11 programs).
Multi-letter options preceded by single dash, may be abbreviated (Atria Clearcase).
Multi-letter options preceded by single plus (obsolete).
Multi-letter options preceded by double dash; arguments may follow '=' or be separate (GNU utilities).
Options without prefix/suffix, some names have abbreviations or are implied, arguments must be separate. (AmigaOS Shell, added by porneL)
Options taking an optional argument sometimes must be attached, sometimes must follow an '=' sign. POSIX doesn't support optional arguments meaningfully (the POSIX getopt() only allows them for the last option on the command line).
All sensible option systems use an option consisting of double-dash ('--') alone to mean "end of options": the following arguments are "non-option arguments" (usually file names) even if they start with a dash. (I regard supporting this notation as an imperative.) Note that if you have a command cmd with an option -f that expects an argument, and you invoke it with -- in place of the argument (cmd -f -- -other), many versions of getopt() will treat the -- as the file name argument for -f and then parse -other as regular options. That is, -- does not terminate the options if it has to be interpreted as an argument to another option.
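For example, with a file whose name starts with a dash (-draft.txt is just an illustrative name):
touch -- -draft.txt    # create a file literally named '-draft.txt'
rm -draft.txt          # fails: rm tries to parse '-draft.txt' as options
rm -- -draft.txt       # works: '--' marks the end of options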
Many but not all programs accept single dash as a file name to mean standard input (usually) or standard output (occasionally). Sometimes, as with GNU 'tar', both can be used in a single command line:
tar -cf - -T - | ...
The first solo dash means 'write to stdout'; the second means 'read file names from stdin'.
Some programs use other conventions — that is, options not preceded by a dash. Many of these are from the oldest days of Unix. For example, 'tar' and 'ar' both accept options without a dash, so:
tar cvzf /tmp/somefile.tgz some/directory
The dd command uses opt=value exclusively:
dd if=/some/file of=/another/file bs=16k count=200
Some programs allow you to interleave options and other arguments completely; the C compiler, make and the GNU utilities run without POSIXLY_CORRECT in the environment are examples. Many programs expect the options to precede the other arguments.
Modern programs such as git increasingly seem to use a base command name (git) followed by a sub-command (commit) followed by options (-m "Commit message"). This was presaged by the sccs interface to the SCCS commands, and then by cvs, and is used by svn too (and they are all version control systems). However, other big suites of commands adopt similar styles when it seems appropriate.
I don't have strong preferences between the different systems. When there are few enough options, then single letters with mnemonic value are convenient. GNU supports this, but recommends backing it up with multi-letter options preceded by a double-dash.
There are some things I do object to. One of the worst is the same option letter being used with different meanings depending on what other option letters have preceded it. In my book, that's a no-no, but I know of software where it is done.
Another objectionable behaviour is inconsistency in style of handling arguments (especially for a single program, but also within a suite of programs). Either require attached arguments or require detached arguments (or allow either), but do not have some options requiring an attached argument and others requiring a detached argument. And be consistent about whether '=' may be used to separate the option and the argument.
As with many, many (software-related) things — consistency is more important than the individual decisions.
Whatever you do, please, read the TAOUP's Command-Line Options and consider Standards for Command Line Interfaces. (Added by J F Sebastian — thanks; I agree.)

Is there any shell script and/or Makefile static code analyser?

Or how can I ensure the reliability of my Makefiles/scripts?
Update: by shell scripts I mean the sh dialect (bash, zsh, whatever); by Makefiles I mean GNU make. I know they are different beasts, but they have much in common.
P. S. Yeah, I know, static code analysis can't verify all possible cases, and I need to write my Makefiles and shell scripts in a way that is reliable. I just need a tool that will tell me when I use bad practices, whether I forgot about them or didn't notice them in a big script. Not to fix errors for me, just to take a second look.
For sh scripts, ShellCheck will do some static analysis checks, like detecting when variable modifications are hidden by subshells, when you accidentally use [ $foo=bar ] or when you neglect to quote variables that could contain spaces. It also comments on some stylistic issues like useless use of cat or using sed when you could use parameter expansion.
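For instance, given a short script exhibiting the kinds of problems mentioned above (fragile.sh and its contents are made up for illustration), running shellcheck on it reports each offending line along with a warning code and a suggested fix:
$ cat fragile.sh
#!/bin/sh
if [ $1=yes ]; then    # missing spaces around '=': this test is always true
    cp $2 /tmp         # unquoted variable: breaks on paths containing spaces
fi
$ shellcheck fragile.sh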

Command substitution: backticks or dollar sign / paren enclosed? [duplicate]

This question already has answers here:
What is the difference between $(command) and `command` in shell programming?
(6 answers)
Closed 8 years ago.
What's the preferred way to do command substitution in bash?
I've always done it like this:
echo "Hello, `whoami`."
But recently, I've often seen it written like this:
echo "Hello, $(whoami)."
What's the preferred syntax, and why? Or are they pretty much interchangeable?
I tend to favor the first, simply because my text editor seems to know what it is, and does syntax highlighting appropriately.
I read here that escaped characters act a bit differently in each case, but it's not clear to me which behavior is preferable, or if it just depends on the situation.
Side question: Is it bad practice to use both forms in one script, for example when nesting command substitutions?
There are several questions/issues here, so I'll repeat each section of the poster's text, block-quoted, and followed by my response.
What's the preferred syntax, and why? Or are they pretty much interchangeable?
I would say that the $(some_command) form is preferred over the `some_command` form. The second form, using a pair of backquotes (the "`" character, also called a backtick and a grave accent), is the historical way of doing it. The first form, using dollar sign and parentheses, is a newer POSIX form, which means it's probably a more standard way of doing it. In turn, I'd think that that means it's more likely to work correctly with different shells and with different *nix implementations.
Another reason given for preferring the first (POSIX) form is that it's easier to read, especially when command substitutions are nested. Plus, with the backtick form, the backtick characters have to be backslash-escaped in the nested (inner) command substitutions.
With the POSIX form, you don't need to do that.
As far as whether they're interchangeable, well, I'd say that, in general, they are interchangeable, apart from the exceptions you mentioned for escaped characters. However, I don't know and cannot say whether all modern shells and all modern *nixes support both forms. I doubt that they do, especially older shells/older *nixes. If I were you, I wouldn't depend on interchangeability without first running a couple of quick, simple tests of each form on any shell/*nix implementations that you plan to run your finished scripts on.
I tend to favor the first, simply because my text editor seems to know what it is, and does syntax highlighting appropriately.
It's unfortunate that your editor doesn't seem to support the POSIX form; maybe you should check to see if there's an update to your editor that supports the POSIX way of doing it. Long shot maybe, but who knows? Or, maybe you should even consider trying a different editor.
GGG, what text editor are you using???
I read here that escaped characters act a bit differently in each case, but it's not clear to me which behavior is preferable, or if it just depends on the situation.
I'd say that it depends on what you're trying to accomplish; in other words, whether you're using escaped characters along with command substitution or not.
Side question: Is it bad practice to use both forms in one script, for example when nesting command substitutions?
Well, it might make the script slightly easier to READ (typographically speaking), but harder to UNDERSTAND! Someone reading your script (or YOU, reading it six months later!) would likely wonder why you didn't just stick to one form or the other--unless you put some sort of note about why you did this in the comments. Plus, mixing both forms in one script would make that script less likely to be portable: In order for the script to work properly, the shell that's executing it has to support BOTH forms, not just one form or the other.
For making a shell script understandable, I'd personally prefer sticking to one form or the other throughout any one script, unless there's a good technical reason to do otherwise. Moreover, I'd prefer the POSIX form over the older form; again, unless there's a good technical reason to do otherwise.
For more on the topic of command substitution, and the two different forms for doing it, I suggest you refer to the section on command substitution in the O'Reilly book "Classic Shell Scripting," second edition, by Robbins and Beebe. In that section, the authors state that the POSIX form for command substitution "is recommended for all new development." I have no financial interest in this book; it's just one I have (and love) on shell scripting, though it's more for intermediate or advanced shell scripting, and not really for beginning shell scripting.
-B.
You can read about the differences in the bash manual. In most cases, they are interchangeable.
One thing to mention is that you should escape the backquotes when nesting commands:
$ echo $(echo hello $(echo word))
hello word
$ echo `echo hello \`echo word\``
hello word
The backticks are compatible with ancient shells, and so scripts that need to be portable (such as GNU autoconf snippets) should prefer them.
The $() form is a little easier on the eyes, esp. after a few levels of escaping.

Resources