I am trying to learn some Bash to maybe get a job working with computers one day.
To improve my clarity and disciple writing my self-learning code, I am trying to adhere to a set of consistent "guiding principles".
As I roll my own "guidelines" I obviously ask myself: should I not be using an established standard instead?
I could not find one such "authoritative" reference for Bash, similar to what these other languages have:
Java (http://www.oracle.com/technetwork/java/javase/documentation/index-137868.html)
How to Write Doc Comments for the Javadoc Tool
Java (http://www.oracle.com/technetwork/java/javase/documentation/codeconvtoc-136057.html) Code Conventions for the Java Programming Language
Java (https://code.google.com/p/java-coding-standards/wiki/Introduction) Google Java coding standards
Python (http://www.python.org/dev/peps/pep-0008/) PEP 8 -- Style Guide for Python Code
Python (http://google-styleguide.googlecode.com/svn/trunk/pyguide.html) Google Python Style Guide
Is there a link with a similar document for Bash which has good reason for being used?
Here is the type of stuff I was putting together on my own... but I think, especially as a beginner, I should be using guidelines written by experts rather than trying to come up with my own, as they would not be based on much experience, insight, practicality, knowledge of common patterns/anti-patterns, etc.
You may dispute the validity of such documents in general, but some people must like them for the web to have such prominent examples online as the ones I mention in the bullet-list above..
################################################################################
# Coding conventions
#
# - Prefer lines of 80 characters of length or less
#
# - Perform arithmetic operations and numeric comparisons within "(( ))" blocks
# e.g. if ((42<=24+24)), ((3**3==27))
#
# - Reference variables by name, not expansion, within arithmetic evaluation
# e.g. ((i++)) rather than (($i++)), ((v+=42)) rathern than v=$(($v+42))
#
# - Prefer "[[" to "[" for conditional expressions
#
# - Prefer "[[ $s ]]" to "[[ -n $s ]]" when checking for empty strings
#
# - Document each function with at least a summary sentence. This should not
# exceed the preferred line length, be written in third person, end with a
# period and concisely describe the general utility of the function
#
# ...
# ...
# ...
#
################################################################################
The best bash style guide I have come across is from Google.
There is also some additional advice from the Chromium project.
For learning bash, the Apple Developer Shell Scripting primer is excellent.
Using a style guide for bash is rather sensible, as it is full of tricks and unexpected and difficult to diagnose quirks. Thus, if you follow the same style all the time, you should only fall for each of those tricks once.
My shell scripting standards
Prefer portability, but don't sacrifice security and whitespace awareness to it.
Prefer builtins over external commands.
But, use fast external commands to process very large inputs.
Avoid unnecessary subshells and pipelines.
Don't preoptimize.
Learn the rules of quoting. Then, use quotes.
Use functions to improve readability and control scope.
Don't give scripts silly file extensions.
Never change directory without checking that it worked.
Eschew hobgoblins
A foolish consistency is the hobgoblin of little minds, adored by
little statesmen and philosophers and divines.
– Ralph Waldo Emerson
When to ignore the portability rule
Use -execdir when appropriate with find.
Use null separators when the toolset allows to avoid accidentally wordsplitting on whitespace.
Learn all the glob extensions and use them.
If your target systems all have BASH, don't bend over backwards to be POSIXLY_STRICT.
Use here strings.
Related
For context, i'm trying to create an overly simplified version of bash, not like a bash full script interpreter, just a series of commands and operators (|,||, &&, <, >, <<, >>,$, $?) small interpreter, The mental model which i used in a nutshell is:
Lexer + Expander: in the first stage i used a simple state machine to lex and store data (commands, arguments, redirection files etc.) and lex input into tokens, i expand env variables and i handle lexical errors too.(as simple as checking finite states of valid characters).
Parser: in the second i stage i intend to create an AST out of the tokens + data, and handle parsing errors.
Executor: Finally i'll execute the AST.
No i'm at the parser stage, and i'm trying to think about how might i handle parsing errors, now the thought i had is out of the possible range of valid statements, it seems very difficult to check the validity of such an input cause the range is too big or at least that's what i think, and i'm sure there's some generalized solution for the problem, why i'm sure? because bash have done it.
For example this statement:
$ < $FILE || && > outfile
From the lexer point of view it's all bright and shiny, but it's surely not a valid input from the parser's perspective. Now one possible solution to this is to check whether there's a command token in the input if not then invalid. but what about this one:
$ || ls > $FILE && cat < $FILE
Again all valid lexeme, but unparsable statement, maybe that too could be checked against "if the line start with an OR or AND token error.".
Now the specific question is how bash exactly parse these combination of commands and operators, either there's some sort of more generalized solution or i'm left with an if&else error checking against inputs that i think is invalid. which honestly seems stupid and cumbersome.
Most of the complexity of shell parsing is in the tokenisation, although you certainly don't need to worry about all of the complications which have crept in over the years. The grammar itself is pretty simple; it's designed to be parsed by a parser generated with a tool like Bison (or some other yacc derivative), and that's precisely how Bash works.
The various syntactic rules recognised by Bash are scattered throughout the Bash manual, but the grammar is based on the standard shell grammar specified in the Posix standard, which is probably an easier starting point. In that document, the grammar is included as what is basically a Yacc input file (without any of the semantic actions necessary for an actual implementation); you can find it at the end of section 2.10. Make sure to read the initial part of that section, though, because it contains important information about how tokens are classified. Also, take note of section 2.3, token recognition.
Between these two sections you'll find a precise description of shell quoting rules and the various expansions which are done prior to parsing (or, better said, intermingled with parsing because command substitution makes the whole process recursive.) You might not want to absorb all of that on a first reading, although it will also help you be more effective in your use of the shell.
Bash implements a lot more features, but probably most or all of them go beyond your needs.
#choroba has the right idea - to understand exactly how Bash parses scripts you need to look at the source of Bash. There are basically fractal rules of thumb for how Bash works in increasingly complex cases, and any description short enough to fit in a SO response is probably not detailed enough to give you the full picture.
I am trying to write an automate process for AWS that requires some JSON processing and other things in bash script. I am following a few blogs for bash script and I found this:
a=b
with the following note:
There is no space on either side of the equals ( = ) sign. We
also leave off the $ sign from the beginning of the variable name when
setting it
This is ugly and very difficult to read and comparing to other scripting languages, it is easy for user to make a mistake when writing a bash script by leaving space in between. I think everyone like to write clean and readable code, this restriction for sure is bad for code readability.
Can you explain why? explanation with examples are highly appreciated.
It's because otherwise the syntax would be ambiguous. Consider this command line:
cat = foo
Is that an assignment to the variable cat, or running the command cat with the arguments "=" and "foo"? Note that "=" and "foo" are both perfectly legal filenames, and therefore reasonable things to run cat on. Shell syntax settles this in favor of the command interpretation, so to avoid this interpretation you need to leave out the spaces. cat =foo has the same problem.
On the other hand, consider:
var= cat
Is that the command cat run with the variable var set to the empty string (i.e. a shorthand for var='' cat), or an assignment to the shell variable var? Again, the shell syntax favors the command interpretation so you need to avoid the temptation to add spaces.
There are many places in shell syntax where spaces are important delimiters. Another commonly-messed-up place is in tests, where if you leave out any of the spaces in:
if [ "$foo" = "$bar" ]
...it will lead to a different meaning, which might cause an error, or might just silently do the wrong thing.
What I'm getting at is that shell syntax does not allow you to arbitrarily add or remove spaces to improve readability. Don't even try, you'll just break things.
What you need to understand is that the shell language and syntax is old. Really old. The first version of the UNIX shell with variables was the Bourne shell which was designed and implemented in 1977. Back then, there were few precedents. (AFAIK, just the Thompson shell, which didn't support variables according to the manual entry.)
The rationale for the design decisions in the 1970's are ... lost in the mists of time. The design decisions were made by Steve Bourne and colleagues working at Bell Labs on v6 UNIX. They probably had no idea that their decisions would still be relevant 40+ years later.
The Bourne shell was designed to be general purpose and simple to use ... compared with the alternative of writing programs in C. And small. It was an outstanding success in those terms.
However, any language that is successful has the "problem" that it gets widely adopted. And that makes it more difficult to fix any issues (real or perceived) that may arise. Any proposal to change a language needs to be balanced against the impact of that change on existing users / uses of the language. You don't want to break existing programs or scripts.
Irrespective of arguments about whether spaces around = should be allowed in a shell variable assignment, changing this would break millions of shell scripts. It is just not going to happen.
Of course, Linux (and UNIX before it) allow you to design and implement your own shell. You could (in theory) replace the default shell. It is just a lot of work.
And there is nothing stopping you from writing your scripts in another scripting language (e.g. Python, Ruby, Perl, etc) or designing and implementing your own scripting language.
In summary:
We cannot know for sure why they designed the shell with this syntax for variable assignment, but it is moot anyway.
Reference:
Evolution of shells in Linux: a history of shells.
It prevents ambiguity in a lot of cases. Otherwise, if you have a statement foo = bar, it could then either mean run the foo program with = and bar as arguments, or set the foo variable to bar. When you require that there are no spaces, now you've limited ambiguity to the case where a program name contains an equals sign, which is basically unheard of.
I agree with #StephenC, and here's some more context with sources:
Unix v6 from 1975 did not have an environment, there was just a exec syscall that took a program and a string array of arguments. The system sh, written by Thompson, did not support variables, only single digit numbered arguments like $1 (probably why $12 to this day is interpreted as ${1}2)
Unix v7 from 1979, emboldened by advances in hardware, added a ton of features including a second string array to the exec call. The man page described it like this, which is still how it works to this day:
An array of strings called the environment is made available by exec(2) when a process begins. By convention these strings have the form name=value
The system sh, now written by Bourne, worked much like v6 shell, but now allowed you to specify these environment strings in the same format in front of commands (because which other format would you use?). The simplistic parser essentially split words by spaces, and flagged a word as destined for a variable if it contained a = and all preceding characters had been alphanumeric.
Thanks to Unix v7's incredible popularity, forks and clones copied a lot of things including this behavior, and that's what we're still seeing today.
Looking for a command line code formatter that can be used for bash code. It must be configurable and preferably usable from command line.
I have a big project in bash, which I need to use Q in mind for. So far I am happy with a program written in python by Paul Lutus (a remake of his previous version in Ruby).
See http://arachnoid.com/python/beautify_bash_program.html (also cloned here https://github.com/ewiger/beautify_bash).
but I would like to learn any serious alternative to this tool if it exists. Requirements: it should provide robust enough performance and behavior of treating/parsing rather complicated code.
PS I believe full parsing of bash code is generally complicated because there exists no official language grammar (but please correct me if I am wrong about it).
You could give shfmt a try. It implements its own shell parser including Bash support, so it's more robust than plaintext-based tools.
And both the parser and printer are available as Go packages, so it should be easy to write a 20-line Go program to manipulate or play with shell code.
Please note that I'm the author, so the advice may be a bit biased :)
you can script vim to do: "gg=G" that means "indent all the file"
I discovered that the type builtin will print functions in a formatted manner.
#/usr/bin/env bash
source <(cat <(echo 'wrapper() {') - <(echo '}'));
type wrapper | tail -n +4 | head -n -1 | sed 's/^ //g'
https://github.com/bas080/flush
On the contrary the shell does have a rigorous grammar.
It is described both in English in the ISO standard and documentation for Bash and other shells, and in formal terms in the shell.y file in the Bash source tree.
What makes it "hard" is that where one normally thinks of, say, a quoted string as single lexical token, In the shell every meta character is a separate lexical token, so the meaning of a character can change depending on its grammatical context.
So the parsing tokens do not align with the "shell words" that a user thinks of, and a simple quoted string is at least 3 tokens.
The implementations typically take some shortcuts involving using multiple lexical analysers chosen by whether the grammar is inside quotes, inside numeric context, or outside both.
This question already has answers here:
What is the difference between $(command) and `command` in shell programming?
(6 answers)
Closed 8 years ago.
What's the preferred way to do command substitution in bash?
I've always done it like this:
echo "Hello, `whoami`."
But recently, I've often seen it written like this:
echo "Hello, $(whoami)."
What's the preferred syntax, and why? Or are they pretty much interchangeable?
I tend to favor the first, simply because my text editor seems to know what it is, and does syntax highlighting appropriately.
I read here that escaped characters act a bit differently in each case, but it's not clear to me which behavior is preferable, or if it just depends on the situation.
Side question: Is it bad practice to use both forms in one script, for example when nesting command substitutions?
There are several questions/issues here, so I'll repeat each section of the poster's text, block-quoted, and followed by my response.
What's the preferred syntax, and why? Or are they pretty much interchangeable?
I would say that the $(some_command) form is preferred over the `some_command` form. The second form, using a pair of backquotes (the "`" character, also called a backtick and a grave accent), is the historical way of doing it. The first form, using dollar sign and parentheses, is a newer POSIX form, which means it's probably a more standard way of doing it. In turn, I'd think that that means it's more likely to work correctly with different shells and with different *nix implementations.
Another reason given for preferring the first (POSIX) form is that it's easier to read, especially when command substitutions are nested. Plus, with the backtick form, the backtick characters have to be backslash-escaped in the nested (inner) command substitutions.
With the POSIX form, you don't need to do that.
As far as whether they're interchangeable, well, I'd say that, in general, they are interchangeable, apart from the exceptions you mentioned for escaped characters. However, I don't know and cannot say whether all modern shells and all modern *nixes support both forms. I doubt that they do, especially older shells/older *nixes. If I were you, I wouldn't depend on interchangeability without first running a couple of quick, simple tests of each form on any shell/*nix implementations that you plan to run your finished scripts on.
I tend to favor the first, simply because my text editor seems to know what it is, and does syntax highlighting appropriately.
It's unfortunate that your editor doesn't seem to support the POSIX form; maybe you should check to see if there's an update to your editor that supports the POSIX way of doing it. Long shot maybe, but who knows? Or, maybe you should even consider trying a different editor.
GGG, what text editor are you using???
I read here that escaped characters act a bit differently in each case, but it's not clear to me which behavior is preferable, or if it just depends on the situation.
I'd say that it depends on what you're trying to accomplish; in other words, whether you're using escaped characters along with command substitution or not.
Side question: Is it bad practice to use both forms in one script, for example when nesting command substitutions?
Well, it might make the script slightly easier to READ (typographically speaking), but harder to UNDERSTAND! Someone reading your script (or YOU, reading it six months later!) would likely wonder why you didn't just stick to one form or the other--unless you put some sort of note about why you did this in the comments. Plus, mixing both forms in one script would make that script less likely to be portable: In order for the script to work properly, the shell that's executing it has to support BOTH forms, not just one form or the other.
For making a shell script understandable, I'd personally prefer sticking to one form or the other throughout any one script, unless there's a good technical reason to do otherwise. Moreover, I'd prefer the POSIX form over the older form; again, unless there's a good technical reason to do otherwise.
For more on the topic of command substitution, and the two different forms for doing it, I suggest you refer to the section on command substitution in the O'Reilly book "Classic Shell Scripting," second edition, by Robbins and Beebe. In that section, the authors state that the POSIX form for command substitution "is recommended for all new development." I have no financial interest in this book; it's just one I have (and love) on shell scripting, though it's more for intermediate or advanced shell scripting, and not really for beginning shell scripting.
-B.
You can read the differences from bash manual. At most case, they are interchangeable.
One thing to mention is that you should escape backquote to nest commands:
$ echo $(echo hello $(echo word))
hello word
$ echo `echo hello \`echo word\``
hello word
The backticks are compatible with ancient shells, and so scripts that need to be portable (such as GNU autoconf snippets) should prefer them.
The $() form is a little easier on the eyes, esp. after a few levels of escaping.
I know the question is very subjective. But I cannot form the question in a much more better manner. I would appreciate some guidance.
I often as a developer feel how easier it would have been for me if I could have some tools for doing some reasonably complex task on a log file , or a set of source files, or some data set etc.
Clearly, when the same type of task needs to be done repetitively and when speed is critical, I can think of writing it in C++/Java.
But most of the times, it is some kind of text processing or file searching activity that I want to do only once just to perform a quick check or to do some preliminary analysis etc. In such cases, I would be better off doing the task manually rather than writing in C++/Java. But I could surely doing it in seconds if I knew some language like Bash/Python/Perl/Ruby/Sed/Awk.
I know this whole question is subjective and there is no objective definite answer, but in general what does the developer community feel as a whole? What subset of these languages should I know so that I can do all these kinds of tasks easily and improve my productivity.
Would Perl be a good choice?
It is a super set of Sed/Awk, plus it allows to write terse code. I can get done with fewer lines of code. It is neither readable nor easily maintainable, but I never wanted those features anyway.
The only thing that bothers me is the negative publiciity that Perl has got lately and it has been criticized by the Ruby/Python community a lot. Also, I am not sure if it can replace bash scripting totally.
If not, then is Perl+Bash a good combination for these kind of tasks?
I tend to do a lot of processing with ruby. It has all the functionality of perl, but I find it to be a little more readable. Both perl and ruby support the -n, -e, and -p options.
-e 'command' one line of script. Several -e's allowed. Omit [programfile]
-n assume 'while gets(); ... end' loop around your script
-p assume loop like -n but print line also like sed
For example in ruby
seq 1 4 | ruby -ne 'BEGIN{ $product = 1 }; $product *= $_.to_i; END { puts $product }'
24
Which is very similar to perl
seq 1 4 | perl -ne 'BEGIN{ $product = 1 }; $product *= $_; END { print $product }'
24
In Python, the same would look like this:
seq 1 4 | python -c 'import sys; print reduce(lambda x,y : int(x)*int(y), sys.stdin.read().splitlines(True))'
24
While it's possible to do the above in bash/awk/sed, you'll be limited by their lack of more advanced features.
Python is more expressive and readable than bash, but requires more setup: import os and so on. For simple tasks, bash is quicker -- which is the most important for this. And don't underestimate the power of input/output redirection in bash!
I would use Perl over a bash/sed/awk combination. Why ?
You only have the one executable, rather than spawning off multiple executables to do work.
You can make use of a wide range of Perl modules to do most anything (see CPAN for the modules available)
In fact I would recommend any scripting language over the shell/awk/sed combination, for the same reasons. I don't have a problem with sed/awk per se, but as your required solutions become more complex/lengthy, I find the more powerful scripting languages more scalable, and (to some degree) refactorable for re-use.
I find Python+Bash a very nice combo.
I usually use Python because it's very readable and maintainable. And because there are lots of online documentation available.
Btw, I suggest you to read http://www.ibm.com/developerworks/aix/library/au-python/
With shell scripting, all you ever need to know is a bit bash/sh and a lot of awk. Bash for calling your commands, and awk for processing. Some of the unix tools below, contrary to the fact that many people use them, are not necessary because awk can do their functions.
1) cut
2) sed
3) wc
4) (e)grep
5) cat
6) head
7) etc..
and a few others whose functions overlap. In the end, your script will not cluttered with redundant tools and slow down your script.
Perl/Python are very useful sysadmin tools as well. Both of them do similar things and have libraries that help in your sysadmin tasks. The only significant difference is, aesthetically speaking, the appearance of your code written in them.
You can learn about Ruby if you want, but in terms of sysadmin, I would say go for Perl/Python instead.
In the time it took you to write those few paragraphs, you could have already learned enough Python to make your life significantly better.
Anyone who already knows C++ or Java can become productive in Python in about 4 hours. Just read the tutorial.
My first port of call is bash with sed to provide regular expression processing. You can do a lot with a bash for loop, grep and some regular expressions.
It's worth learning regular expressions if you don't already know them. An editor which lets you use them (like vi) is extremely useful when manipulating files (e.g. you have a set of data extracted from a logfile, and you need to turn it into a set of SQL statements for example).
If it takes me more than a few minutes to figure out how to do whatever parsing task I'm trying to do in bash/sed, I usually end up using perl instead. As suggested by ikkebr, python is probably as good as (or better than) perl; I just had the misfortune to learn perl first, so am much more familiar with it - if I was to start again, I'd learn python instead I think.