How to get bash arguments with leading pound-sign (octothorpe) - bash

I need to process an argument to a bash script that might or might not have a leading pound sign (octothorpe). The simplest example is:
echo #1234
which prints nothing.
Presumably this is because the shell parses the line as a command and treats everything from the # onward as a comment.
$#, $*, etc. do not work. getopts does not seem to address this sort of thing.
Suggestions welcome

This is completely impossible, because the "argument" in question is parsed as a comment and never passed to the command at all.
Keep in mind that programs in C have the following calling convention for their main function:
int main(int argc, char *argv[])
This means that programs are passed a list of individual, separate arguments, not a single string that isn't yet parsed. The original string from which that vector of arguments was parsed is not given to the invoked program at all; often, no "original string" even exists. Consequently, a program that was invoked has no way to "unring the bell" and go back from the parsed list of strings to the original string from which it was generated.
Consequently, if your script is invoked as an external command (as opposed to a shell function), the operating system starts the shell that runs it via the execve syscall. That syscall takes three arguments: (1) the file to execute; (2) the argument vector to pass it (which is to say, the aforementioned list of individual C strings); and (3) a list of environment variables. There is no argument for an unparsed shell command line, so no such content is available to the subprocess.
Train your users to use appropriate quoting. All of the below will have completely indistinguishable behavior, insofar as yourscript is concerned:
yourscript '#1234' # single quotes prevent content from being parsed as shell syntax
yourscript ''#1234 # "#" only begins a comment at the start of a word
yourscript '#'1234 # note that shell quoting is character-by-character
yourscript \#1234 # ...so either quoting or escaping only that single character suffices.
...any of the above will pass an argv containing (in C syntax) { "yourscript", "#1234", NULL }
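To see this from the script's side, here is a minimal sketch of a hypothetical yourscript that prints the first argument it receives:
#!/bin/bash
# yourscript (hypothetical): print the first argument exactly as received
printf 'argv[1] = %s\n' "$1"
Any of the four invocations above prints argv[1] = #1234.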

Related

How can I adjust my bash function such that I can omit the double-quotes?

Throughout the day, I type something like this frequently:
git stash push -u -m "some phrase as a message"
I would prefer to type instead:
stpu some phrase as a message
So with help from this answer, I created a function in my ~/.bashrc:
function stpu() {
  git stash push -u -m "${@}"
}
Now I'm able to type stpu "some phrase as a message", which is pretty close to what I want.
How can I adjust my function such that I can omit the double-quotes?
I've tried many different variations (adding more double-quotes that are escaped, adding single-quotes, etc) but haven't gotten it to work.
You can sometimes omit the quotes if you use "$*" instead of "$@".
This will concatenate all your arguments into a single string, separated by spaces (by default; if IFS has been overridden, its first character is used instead). -m expects a single string to follow it (rather than a separate argument per word), so this is exactly what it wants.
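A minimal sketch of the adjusted function:
function stpu() {
  # "$*" joins all arguments into one word, separated by the first character of IFS
  git stash push -u -m "$*"
}
With that, typing stpu some phrase as a message passes the whole phrase to -m as one argument.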
This is not reliable, and it's better to just use the quotes.
Security
Consider, as an example, wanting to use the commit message Make $(rm -rf ~) safe in an argument name for a security fix. If this string is unquoted (or double-quoted), the command substitution is executed before your function is ever started (which makes sense: a function can't be called until after its argument list is known), so there's nothing your function can do to prevent it. In this context, using single quotes to prevent the command substitution from taking place is the correct and safe practice.
(To single-quote a string that contains single quotes, consider using ANSI C-like strings: $'I\'m a single-quoted string that contains a single quote')
Correctness
Or, as another example, the message Process only files matching *.csv: if it's not quoted, the *.csv can be replaced with a list of the CSV files that exist in the directory where you ran the command. Again, this happens before your function is ever started, so nothing inside the function can prevent it.
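A quick way to see both behaviors (hypothetical session; assumes the current directory contains a.csv and b.csv):
stpu Process only files matching *.csv
# the function receives: Process only files matching a.csv b.csv
stpu 'Process only files matching *.csv'
# the function receives the message exactly as written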

how to skip first argument when running bash script?

Say I have a bash script with two optional arguments
How would I be able to run the script providing an input for the second argument, but not the first argument?
The shell's argument list is just a sequence of strings. There is no way for the first string to be undefined while the second is defined. But if you have control over the program, or the person who wrote it anticipated this scenario, perhaps it supports passing in an empty first argument, or a specific string which is interpreted as "undefined".
To pass in an empty string, write two adjacent quotes; the shell removes them (as part of its normal quote handling) before the argument is passed on to the program you are running.
program '' second third fourth
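On the receiving side, a script can then treat an empty first argument as "not provided". A minimal sketch (the default value is purely illustrative):
#!/bin/bash
# treat an empty or missing $1 as "not provided"
first=${1:-default}   # ${1:-...} substitutes when $1 is unset or empty
second=$2
printf 'first=%s second=%s\n' "$first" "$second"
Running program '' second prints first=default second=second.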
A common related convention is to let a lone dash (-) or double dash (--) signify "an option which isn't an option".
program -- second third fourth
If you have control over the command and its argument handling (and the interface isn't already cemented because other people's programs depend on the current behavior), a better design would be to make the optional argument truly optional, e.g. by making the first argument a dash option.
program --uppercase string of arguments
program --lowercase STRING OF SHOUTING
program The arguments will be passed through WITHOUT case conversion
The implementation is straightforward:
toupper () { tr '[:lower:]' '[:upper:]'; }
tolower () { tr '[:upper:]' '[:lower:]'; }
# pick a filter based on the first argument, consuming the option if present
case $1 in
--uppercase) shift; filter=toupper;;
--lowercase) shift; filter=tolower;;
*) filter=cat;;
esac
# feed the remaining arguments, joined with spaces, to the chosen filter
"$filter" <<<"$*"
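For instance (assuming the snippet above is saved as an executable named program):
./program --uppercase string of arguments
# => STRING OF ARGUMENTS
./program The arguments will be passed through WITHOUT case conversion
# => The arguments will be passed through WITHOUT case conversion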
If the behavior is cemented, a way forward is to create a command with a different name with the same core behavior but with better command-line semantics, and eventually phase out the old version with the clumsy argument handling.

Bash - Why does $VAR1=FOO or 'VAR=FOO' (with quotes) return command not found?

For each of two examples below I'll try to explain what result I expected and what I got instead. I'm hoping for you to help me understand why I was wrong.
1)
VAR1=VAR2
$VAR1=FOO
result: -bash: VAR2=FOO: command not found
In the second line, $VAR1 gets expanded to VAR2, but why does Bash interpret the resulting VAR2=FOO as a command name rather than a variable assignment?
2)
'VAR=FOO'
result: -bash: VAR=FOO: command not found
Why do the quotes make Bash treat the variable assignment as a command name?
Could you please describe, step by step, how Bash processes my two examples?
How best to indirectly assign variables is adequately answered in other Q&A entries in this knowledgebase. Among those:
Indirect variable assignment in bash
Saving function output into a variable named in an argument
If that's what you actually intend to ask, then this question should be closed as a duplicate. I'm going to make a contrary assumption and focus on the literal question -- why your other approaches failed -- below.
What does the POSIX sh language specify as a valid assignment? Why does $var1=foo or 'var=foo' fail?
Background: On the POSIX sh specification
The POSIX shell command language specification is very specific about what constitutes an assignment, as quoted below:
4.21 Variable Assignment
In the shell command language, a word consisting of the following parts:
varname=value
When used in a context where assignment is defined to occur and at no other time, the value (representing a word or field) shall be assigned as the value of the variable denoted by varname.
The varname and value parts shall meet the requirements for a name and a word, respectively, except that they are delimited by the embedded unquoted equals-sign, in addition to other delimiters.
Also, from section 2.9.1, on Simple Commands, with emphasis added:
1. The words that are recognized as variable assignments or redirections according to Shell Grammar Rules are saved for processing in steps 3 and 4.
2. The words that are not variable assignments or redirections shall be expanded. If any fields remain following their expansion, the first field shall be considered the command name and remaining fields are the arguments for the command.
3. Redirections shall be performed as described in Redirection.
4. Each variable assignment shall be expanded for tilde expansion, parameter expansion, command substitution, arithmetic expansion, and quote removal prior to assigning the value.
Also, from the grammar:
If all the characters preceding '=' form a valid name (see the Base Definitions volume of IEEE Std 1003.1-2001, Section 3.230, Name), the token ASSIGNMENT_WORD shall be returned. (Quoted characters cannot participate in forming a valid name.)
Note from this:
The command must be recognized as an assignment at the very beginning of the parsing sequence, before any expansions (or quote removal!) have taken place.
The name must be a valid name. Literal quotes are not part of a valid variable name.
The equals sign must be unquoted. In your second example, the entire string was quoted.
Assignments are recognized before tilde expansion, parameter expansion, command substitution, etc.
Why $var1=foo fails to act as an assignment
As given in the grammar, all characters before the = in an assignment must be valid characters within a variable name for an assignment to be recognized. $ is not a valid character in a name. Because assignments are recognized in step 1 of simple command processing, before expansion takes place, the literal text $var1, not the value of that variable, is used for this matching.
Why 'var=foo' fails to act as an assignment
First, all characters before the = must be valid in variable names, and ' is not valid in a variable name.
Second, an assignment is only recognized if the = is not quoted.
1)
VAR1=VAR2
$VAR1=FOO
You want to use a variable name contained in a variable for the assignment. Bash syntax does not allow this. However, there is an easy workaround:
VAR1=VAR2
declare "$VAR1"=FOO
It works with local and export too.
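To see the indirection at work (continuing the example above):
VAR1=VAR2
declare "$VAR1"=FOO   # assigns FOO to the variable whose name is stored in VAR1
echo "$VAR2"          # prints FOO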
2)
By using single quotes (double quotes would yield the same result), you are telling Bash that what is inside is a string, to be treated as a single entity. Since it is the first item on the line, Bash tries to find an alias, shell builtin, or executable file in its PATH named VAR=FOO. Finding none, it tells you there is no such command.
An assignment is not a normal command. To perform an assignment contained in a quoted string, you would need to use eval, like so:
eval "$VAR1=FOO" # But please don't do that in real life
Most experienced bash programmers would probably tell you to avoid eval, as it has serious drawbacks; I show it here only to recommend against its use. While the example above involves no security risk or error potential, because the value of VAR1 is known and safe, there are many cases where an arbitrary (i.e. user-supplied) value could cause a crash or unexpected behavior. Quoting inside an eval statement is also more difficult and reduces readability.
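If you need indirect assignment without eval, a commonly suggested alternative (not mentioned in the answer above, but standard in bash) is printf -v, which assigns its output to the variable named by its argument:
VAR1=VAR2
printf -v "$VAR1" '%s' FOO   # assigns FOO to the variable named by $VAR1
echo "$VAR2"                 # prints FOO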
You declare VAR2 earlier in the program, right?
If you are trying to assign the value of VAR2 to VAR1, then you need to make sure and use $ in front of VAR2, like so:
VAR1=$VAR2
That assigns the value stored in VAR2 to VAR1: the $ says "use the value stored in this variable". Without it, Bash doesn't recognize the word as a variable reference.
Basically, a variable name without a $ in front of it (outside an assignment) will be interpreted as a command, like any other word. That's why we have the $: to clarify "hey, this is a variable".

How does : <<'END' work in bash to create a multi-line comment block?

I found a great answer for how to comment in a bash script (by @sunny256):
#!/bin/bash
echo before comment
: <<'END'
bla bla
blurfl
END
echo after comment
The single quotes around the END delimiter are important; without them, things inside the block like $(command) would be parsed and executed.
This may be ugly, but it works, and I'm keen to know what it means. Can anybody explain it simply? I did already find an explanation for :, namely that it is a no-op, or true. But it does not make sense to me to call no-op or true anyway...
I'm afraid this explanation is less "simple" and more "thorough", but here we go.
The goal of a comment is to be text that is not interpreted or executed as code.
Originally, the UNIX shell did not have a comment syntax per se. It did, however, have the null command : (once an actual binary program on disk, /bin/:), which ignores its arguments and does nothing but indicate successful execution to the calling shell. Effectively, it's a synonym for true that looks like punctuation instead of a word, so you could put a line like this in your script:
: This is a comment
It's not quite a traditional comment; it's still an actual command that the shell executes. But since the command doesn't do anything, surely it's close enough: mission accomplished! Right?
The problem is that the line is still treated as a command beyond simply being run as one. Most importantly, lexical analysis - parameter substitution, word splitting, and such - still takes place on those destined-to-be-ignored arguments. Such processing means you run the risk of a syntax error in a "comment" crashing your whole script:
: Now let's see what happens next
echo "Hello, world!"
#=> hello.sh: line 1: unexpected EOF while looking for matching `''
That problem led to the introduction of a genuine comment syntax: the now-familiar # (which was first introduced in the C shell created at BSD). Everything from # to the end of the line is completely ignored by the shell, so you can put anything you like there without worrying about syntactic validity:
# Now let's see what happens next
echo "Hello, world!"
#=> Hello, world!
And that's How The Shell Got Its Comment Syntax.
However, you were looking for a multi-line (block) comment, of the sort introduced by /* (and terminated by */) in C or Java. Unfortunately, the shell simply does not have such a syntax. The normal way to comment out a block of consecutive lines - and the one I recommend - is simply to put a # in front of each one. But that is admittedly not a particularly "multi-line" approach.
Since the shell supports multi-line string-literals, you could just use : with such a string as an argument:
: 'So
this is all
a "comment"
'
But that has all the same problems as single-line :. You could also use backslashes at the end of each line to build a long command line with multiple arguments instead of one long string, but that's even more annoying than putting a # at the front, and more fragile since trailing whitespace breaks the line-continuation.
The solution you found uses what is called a here-document. The syntax some-command <<whatever causes the following lines of text - from the line immediately after the command, up to but not including the next line containing only the text whatever - to be read and fed as standard input to some-command. Here's an alternate shell implementation of "Hello, world" which takes advantage of this feature:
cat <<EOF
Hello, world
EOF
If you replace cat with our old friend :, you'll find that it ignores not only its arguments but also its input: you can feed whatever you want to it, and it will still do nothing (and still indicate that it did that nothing successfully).
However, the contents of a here-document do undergo string processing. So just as with the single-line : comment, the here-document version runs the risk of syntax errors inside what is not meant to be executable code:
#!/bin/sh -e
: <<EOF
(This is a backtick: `)
EOF
echo 'In modern shells, $(...) is preferred over backticks.'
#=> ./demo.sh: line 2: bad substitution: no closing "`" in `
The solution, as seen in the code you found, is to quote the end-of-document "sentinel" (the EOF or END or whatever) on the line introducing the here document (e.g. <<'EOF'). Doing this causes the entire body of the here-document to be treated as literal text - no parameter expansion or other processing occurs. Instead, the text is fed to the command unchanged, just as if it were being read from a file. So, other than a line consisting of nothing but the sentinel, the here-document can contain any characters at all:
#!/bin/sh -e
: <<'EOF'
(This is a backtick: `)
EOF
echo 'In modern shells, $(...) is preferred over backticks.'
#=> In modern shells, $(...) is preferred over backticks.
(It is worth noting that the way you quote the sentinel doesn't matter - you can use <<'EOF', <<E"OF", or even <<EO\F; all have the same result. This is different from the way here-documents work in some other languages, such as Perl and Ruby, where the content is treated differently depending on the way the sentinel is quoted.)
Notwithstanding any of the above, I strongly recommend that you instead just put a # at the front of each line you want to comment out. Any decent code editor will make that operation easy - even plain old vi - and the benefit is that nobody reading your code will have to spend energy figuring out what's going on with something that is, after all, intended to be documentation for their benefit.
It is called a Here Document. It is a form of redirection that lets you feed a block of text to another command or program.
The string following the << is the marker determining the end of the block. If you send the text to the no-op command :, nothing happens, which is why you can use it as a comment block.
That's heredoc syntax. It's a way of defining multi-line string literals.
As the answer at your link explains, the single quotes around END disable interpolation, similar to the way single-quoted regular bash strings disable interpolation.
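A quick demonstration of that difference, using cat so the effect is visible:
cat <<END
unquoted sentinel: $(echo gets expanded)
END
# => unquoted sentinel: gets expanded
cat <<'END'
quoted sentinel: $(echo gets expanded)
END
# => quoted sentinel: $(echo gets expanded)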

Bash command line parsing containing whitespace

I have to parse a command line argument in a shell script as follows:
cmd --a=hello world good bye --b=this is bash script
I need to parse the arguments of "a", i.e. "hello world ...", which are separated by whitespace, into an array.
I.e. the a_input array should contain "hello", "world", "good" and "bye".
Similarly for "b" arguments as well.
I tried it as follows:
--a=*)
a_input={1:4}
a_input=$#
for var in $a_input
#keep parsing until next --b or other argument is seen
done
But the above method is crude. Any other work around. I cannot use getopts.
The simplest solution is to get your users to quote the arguments correctly in the first place.
Barring that, you can manually loop until you get to the end of the arguments or hit the next --argument (but that means you can't include a word that starts with -- in your argument value... unless you also test such words against the set of valid options, in which case you rule out slightly fewer -- words).
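A minimal sketch of that manual loop (the array names follow the question; treating every word that starts with -- as an option is an assumption):
a_input=() b_input=() current=
for word in "$@"; do
case $word in
--a=*) current=a; a_input+=("${word#--a=}");;   # start of the --a list
--b=*) current=b; b_input+=("${word#--b=}");;   # start of the --b list
--*) current=;;                                 # any other --option ends the current list
*) if [ "$current" = a ]; then a_input+=("$word"); elif [ "$current" = b ]; then b_input+=("$word"); fi;;
esac
done
printf 'a: %s\n' "${a_input[@]}"   # hello world good bye, one per line
printf 'b: %s\n' "${b_input[@]}"   # this is bash script, one per line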
Adding to Etan Reisner's answer, which is absolutely correct:
I personally find bash a bit cumbersome when array/string processing gets more complex, and if you really have the strange requirement that the caller should not be required to use quotes, I would write an intermediate script in, say, Ruby or Perl, which just collects the parameters in a proper way, wraps quoting around them, and passes them on to the script which was originally supposed to be called, even if this costs an additional process.
For example, a Ruby one-liner such as
system("your_bash_script_here.sh '" + ARGV.join(' ').split(' --').select { |s| s.size > 0 }.join("' '") + "'")
would do this sanitizing and then invoke your script.