I just started learning UNIX, and out of curiosity I want to know whether UNIX shell scripts are compiled or interpreted by the shell. My guess is interpretation, but I am not sure.
They are 100% interpreted. Pretty much any time you hear the word "script" you can assume that it's interpreted.
There exist compilers for the shell language, e.g. the proprietary CCsh: http://www.comeaucomputing.com/faqs/ccshfaq.html
The mainstream implementations of the shell do not compile whatsoever.
The shell language is designed with a complete disregard for the possibility of compiling.
I am trying to write an automated process for AWS that requires some JSON processing and other things in a bash script. I am following a few blogs on bash scripting, and I found this:
a=b
with the following note:
There is no space on either side of the equals ( = ) sign. We also leave off the $ sign from the beginning of the variable name when setting it.
This is ugly and very difficult to read, and compared to other scripting languages, it makes it easy for users to make a mistake by leaving a space in between when writing a bash script. I think everyone likes to write clean and readable code, and this restriction is surely bad for code readability.
Can you explain why? Explanations with examples are highly appreciated.
It's because otherwise the syntax would be ambiguous. Consider this command line:
cat = foo
Is that an assignment to the variable cat, or running the command cat with the arguments "=" and "foo"? Note that "=" and "foo" are both perfectly legal filenames, and therefore reasonable things to run cat on. Shell syntax settles this in favor of the command interpretation, so to avoid this interpretation you need to leave out the spaces. cat =foo has the same problem.
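A small bash sketch of the ambiguity described above. It assumes there is no command named "a" on the PATH, and it creates two throwaway files named "=" and "foo" in a temporary directory from mktemp -d; the names a, workdir and catout are only illustrative.

```shell
#!/usr/bin/env bash
# No spaces around "=": an assignment in the current shell.
a=b
echo "a is: $a"                # a is: b

# With spaces, "a" is the command name and "=" and "b" are its arguments.
# There is normally no command called "a", so this fails.
if ! a = b 2>/dev/null; then
  echo "'a = b' was run as a command named a, and it failed"
fi

# "cat = foo" likewise runs cat on two files, named "=" and "foo".
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'from =\n' > '='
printf 'from foo\n' > foo
catout=$(cat = foo)
printf '%s\n' "$catout"        # prints "from =" and then "from foo"
```

The shell never has to guess here: a word of the form name=value at the start of a command is an assignment, and anything else is a command name.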
On the other hand, consider:
var= cat
Is that the command cat run with the variable var set to the empty string (i.e. a shorthand for var='' cat), or an assignment to the shell variable var? Again, the shell syntax favors the command interpretation so you need to avoid the temptation to add spaces.
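A minimal bash sketch of the "var= cmd" form, using sh -c as a stand-in command instead of cat so that it prints what it sees; the variable after_prefix is only illustrative.

```shell
#!/usr/bin/env bash
var=outer

# "var= cmd" runs cmd with var set to the empty string in cmd's
# environment only; it does not change var in the current shell.
var= sh -c 'printf "inside the command: [%s]\n" "$var"'   # inside the command: []
after_prefix=$var
echo "back in the shell:  [$after_prefix]"                 # back in the shell:  [outer]

# Without the space this is an ordinary assignment: var becomes "cat".
var=cat
echo "after var=cat:      [$var]"                          # after var=cat:      [cat]
```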
There are many places in shell syntax where spaces are important delimiters. Another commonly-messed-up place is in tests, where if you leave out any of the spaces in:
if [ "$foo" = "$bar" ]
...it will lead to a different meaning, which might cause an error, or might just silently do the wrong thing.
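A short bash sketch of how the spacing in a test changes its meaning; the values apple and orange and the result variables r1 to r3 are made up for illustration.

```shell
#!/usr/bin/env bash
foo=apple
bar=orange

# Correct: "[" gets the separate arguments apple, =, orange and ].
if [ "$foo" = "$bar" ]; then r1=equal; else r1=different; fi
echo "with spaces:        $r1"   # with spaces:        different

# No spaces around "=": the single word "apple=orange" is tested, and a
# one-argument test is true for any non-empty string, so this is silently wrong.
if [ "$foo"="$bar" ]; then r2=equal; else r2=different; fi
echo "no spaces around =: $r2"   # no spaces around =: equal

# No space after "[": bash looks for a command literally named "[apple".
if ["$foo" = "$bar" ] 2>/dev/null; then r3=equal; else r3=error; fi
echo "no space after [:   $r3"   # no space after [:   error
```

The second case is the dangerous one: nothing fails, the script just takes the wrong branch.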
What I'm getting at is that shell syntax does not allow you to arbitrarily add or remove spaces to improve readability. Don't even try, you'll just break things.
What you need to understand is that the shell language and syntax is old. Really old. The first version of the UNIX shell with variables was the Bourne shell which was designed and implemented in 1977. Back then, there were few precedents. (AFAIK, just the Thompson shell, which didn't support variables according to the manual entry.)
The rationale for the design decisions made in the 1970s is ... lost in the mists of time. The design decisions were made by Steve Bourne and colleagues working at Bell Labs on v6 UNIX. They probably had no idea that their decisions would still be relevant 40+ years later.
The Bourne shell was designed to be general purpose and simple to use ... compared with the alternative of writing programs in C. And small. It was an outstanding success in those terms.
However, any language that is successful has the "problem" that it gets widely adopted. And that makes it more difficult to fix any issues (real or perceived) that may arise. Any proposal to change a language needs to be balanced against the impact of that change on existing users / uses of the language. You don't want to break existing programs or scripts.
Irrespective of arguments about whether spaces around = should be allowed in a shell variable assignment, changing this would break millions of shell scripts. It is just not going to happen.
Of course, Linux (and UNIX before it) allows you to design and implement your own shell. You could (in theory) replace the default shell. It is just a lot of work.
And there is nothing stopping you from writing your scripts in another scripting language (e.g. Python, Ruby, Perl, etc) or designing and implementing your own scripting language.
In summary:
We cannot know for sure why they designed the shell with this syntax for variable assignment, but it is moot anyway.
Reference:
Evolution of shells in Linux: a history of shells.
It prevents ambiguity in a lot of cases. Otherwise, if you have a statement foo = bar, it could then either mean run the foo program with = and bar as arguments, or set the foo variable to bar. When you require that there are no spaces, now you've limited ambiguity to the case where a program name contains an equals sign, which is basically unheard of.
I agree with @StephenC, and here's some more context with sources:
Unix v6 from 1975 did not have an environment; there was just an exec syscall that took a program and a string array of arguments. The system sh, written by Thompson, did not support variables, only single-digit numbered arguments like $1 (probably why $12 to this day is interpreted as ${1}2).
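The $12 behavior mentioned above is easy to check in bash; this sketch just sets twelve positional parameters with set -- and expands them both ways.

```shell
#!/usr/bin/env bash
# Give the script twelve positional parameters: $1=a ... $12=l.
set -- a b c d e f g h i j k l

echo "\$12   -> $12"     # $12   -> a2   ($1 followed by a literal "2")
echo "\${12} -> ${12}"   # ${12} -> l
```

Braces are required to reach the tenth and later parameters.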
Unix v7 from 1979, emboldened by advances in hardware, added a ton of features including a second string array to the exec call. The man page described it like this, which is still how it works to this day:
An array of strings called the environment is made available by exec(2) when a process begins. By convention these strings have the form name=value
The system sh, now written by Bourne, worked much like v6 shell, but now allowed you to specify these environment strings in the same format in front of commands (because which other format would you use?). The simplistic parser essentially split words by spaces, and flagged a word as destined for a variable if it contained a = and all preceding characters had been alphanumeric.
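A bash sketch of the name=value prefix form that descends from this v7 behavior. It assumes GREETING is not otherwise in use (the script unsets it first), and that there is no command named "x-y=1"; both names are hypothetical.

```shell
#!/usr/bin/env bash
unset GREETING

# name=value words in front of a command are added to that command's
# environment as "name=value" strings.
GREETING=hello env | grep '^GREETING='    # GREETING=hello

# The variable exists only for that one command.
echo "afterwards: [${GREETING-unset}]"    # afterwards: [unset]

# A word only counts as an assignment if the text before "=" is a valid name.
# "x-y" contains "-", so bash tries to run a command literally named "x-y=1".
if ! x-y=1 2>/dev/null; then
  echo "'x-y=1' was run as a command, not treated as an assignment"
fi
```

Modern bash is a little stricter than the simplified v7 rule: a name may contain letters, digits and underscores but must not start with a digit.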
Thanks to Unix v7's incredible popularity, forks and clones copied a lot of things including this behavior, and that's what we're still seeing today.
From what I've read so far, bash seems to fit the definition of an interpreted language:
it is not compiled into a lower-level format
every statement ends up calling a subroutine / set of subroutines already translated into machine code (e.g. ls foo runs a precompiled executable, and builtins such as echo run code compiled into bash itself)
the interpreter itself, bash, has already been compiled
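To see which commands are built into bash and which are separate executables, bash's type -t can be asked directly. The output shown is what one would typically expect on a Linux system running this as a script (where aliases are not active); the ls line in particular depends on the PATH.

```shell
#!/usr/bin/env bash
# "type -t" prints how bash would resolve each name.
for name in echo cd '[' ls; do
  printf '%-5s %s\n' "$name" "$(type -t "$name")"
done
# Typical output:
#   echo  builtin
#   cd    builtin
#   [     builtin
#   ls    file
```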
However, I could not find a reference to bash on Wikipedia's page for interpreted languages, or by extensive searches on Google. I've also found a page on Programmers Stack Exchange that seems to imply that bash is not an interpreted language; if it's not, then what is it?
Bash is definitely interpreted; I don't think there's any reasonable question about that.
There might possibly be some controversy over whether it's a language. It's designed primarily for interactive use, executing commands provided by the operating system. For a lot of that particular kind of usage, if you're just typing commands like
echo hello
or
cp foo.txt bar.txt
it's easy to think that it's "just" for executing simple commands. In that sense, it's quite different from interpreted languages like Perl and Python which, though they can be used interactively, are mainly used for writing scripts (interpreted programs).
One consequence of this emphasis is that its design is optimized for interactive use. Strings don't require quotation marks, most commands are executed immediately after they're entered, most things you do with it will invoke external programs rather than built-in features, and so forth.
But as we know, it's also possible to write scripts using bash, and bash has a lot of features, particularly flow control constructs, that are primarily for use in scripts (though they can also be used on the command line).
Another distinction between bash and many scripting languages is that a bash script is read, parsed, and executed in order. A syntax error in the middle of a bash script won't be detected until execution reaches it. A Perl or Python script, by contrast, is parsed completely before execution begins. (Things like eval can change that, but the general idea is valid.) This is a significant difference, but it doesn't mark a sharp dividing line. If anything it makes Perl and Python more similar to compiled languages.
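A sketch of the "parsed and executed in order" behavior: it writes a throwaway script (via mktemp) whose third line is a deliberate syntax error, runs it, and then checks the same file with bash -n, which parses without executing. The file contents and variable names are illustrative only.

```shell
#!/usr/bin/env bash
# Write a small script whose third line contains a syntax error.
script=$(mktemp)
cat > "$script" <<'EOF'
echo "line 1 runs"
echo "line 2 runs"
if then
echo "never reached"
EOF

# Running it: the first two commands execute before bash hits the error.
output=$(bash "$script" 2>/dev/null)
status=$?
printf '%s\n' "$output"        # line 1 runs, then line 2 runs
echo "exit status: $status"    # non-zero

# "bash -n" reads and parses the whole file without executing anything.
if ! bash -n "$script" 2>/dev/null; then
  echo "bash -n reported the syntax error without running any command"
fi
rm -f "$script"
```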
Bottom line: Yes, bash is an interpreted language. Or, perhaps more precisely, bash is an interpreter for an interpreted language. (The name "bash" usually refers to the shell/interpreter rather than to the language that it interprets.) It has some significant differences from other interpreted languages that were designed from the start for scripting, but those differences aren't enough to remove it from the category of "interpreted languages".
Bash is an interpreter according to the GNU Bash Reference Manual:
Bash is the shell, or command language interpreter, for the GNU operating system.
Does anybody know of a Haskell library which can parse arbitrary Bash scripts?
A cursory search of Hackage indicates that there's a package called bash for writing scripts, but I don't see anything for parsing them.
Basically I've just had a large collection of Bash scripts dumped on me, and I'd like to do some code analysis on it. But the first stage is obviously to be able to parse this stuff.
I don't know Bash very well personally. I suppose I could sit down and wade through the voluminous man page to get the complete BNF grammar for it. (I imagine it's very complex, given the shell's long and backwards-compatible history.) I was just wondering whether somebody else has already done this work for me...
Perhaps extend language-sh.
Language.Sh is a collection of modules for parsing and manipulating expressions in shell grammar. This is part of a larger project, shsh.
Please note that the API is somewhat unstable until we reach version 1.0.
Or how can I ensure reliability of my Makefiles/scripts?
Update: by shell scripts I mean the sh dialect (bash, zsh, whatever), and by Makefiles I mean GNU make. I know they are different beasts, but they have a lot in common.
P.S. Yes, I know static code analysis can't verify all possible cases, and that I need to write my Makefiles and shell scripts in a way that is reliable. I just need a tool that will tell me when I use bad practices, when I forget about them, or when I didn't notice them in a big script. Not to fix errors for me, just to take a second look.
For sh scripts, ShellCheck will do some static analysis checks, like detecting when variable modifications are hidden by subshells, when you accidentally use [ $foo=bar ] or when you neglect to quote variables that could contain spaces. It also comments on some stylistic issues like useless use of cat or using sed when you could use parameter expansion.
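A bash sketch of two of the problems mentioned above, run directly so their effect is visible. One would expect shellcheck to warn about both the pipeline counter and the unquoted $file, but its exact messages are not reproduced here; the variable names are illustrative.

```shell
#!/usr/bin/env bash
# Pitfall 1: each part of a pipeline runs in a subshell (bash's default,
# without "shopt -s lastpipe"), so the loop's changes to count are lost.
count=0
printf 'a\nb\nc\n' | while read -r _; do count=$((count + 1)); done
pipe_count=$count
echo "count after pipeline:    $pipe_count"    # 0

# Fix: redirect from a process substitution so the loop runs in this shell.
count=0
while read -r _; do count=$((count + 1)); done < <(printf 'a\nb\nc\n')
redir_count=$count
echo "count after redirection: $redir_count"   # 3

# Pitfall 2: an unquoted expansion is split on whitespace.
file="my report.txt"
set -- $file
unquoted_words=$#
set -- "$file"
quoted_words=$#
echo "unquoted: $unquoted_words words, quoted: $quoted_words word"   # unquoted: 2 words, quoted: 1 word
```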
Do you know if there's any tool for compiling bash scripts?
It doesn't matter if that tool is just a translator (for example, something that converts a bash script to a C program), as long as the translated result can be compiled.
I'm looking for something like shc (it's just an example -- I know that shc doesn't work as a compiler). Are there any other similar tools?
A Google search brings up CCsh, but it will set you back $50 per machine for a license.
The documentation says that CCsh compiles Bourne Shell (not bash ...) scripts to C code and that it understands how to replicate the functionality of 50-odd standard commands, avoiding the need to fork them.
But CCsh is not open source, so if it doesn't do what you need (or expect) you won't be able to look at the source code to figure out why.
I don't think you're going to find anything, because you can't really "compile" a shell script. You could write a simple script that converts all lines to calls to system(3), then "compile" that as a C program, but this wouldn't have a major performance boost over anything you're currently using, and might not handle variables correctly. Don't do this.
The problem with "compiling" a shell script is that shell scripts just call external programs.
In theory you could actually get a good performance boost.
Think of all the
if [ x"$MYVAR" = x"TheResult" ]; then echo "TheResult Happened"; fi
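(Note the invocation of test, then echo, as well as the interpreting that has to be done. In bash, test/[ and echo are builtins, but every command that is not a builtin costs a process launch and a PATH search.)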
which could be replaced by
if (strcmp(myvar, "TheResult") == 0) puts("TheResult Happened");
In C there is no process launching and no PATH searching. Lots of goodness.