I want to make sure my script will work when the user uses a syntax like this:
script.sh firstVariable < SecondVariable
For some reason I can't get this to work.
I want $1=firstVariable
And $2=SecondVariable
But for some reason my script thinks only firstVariable exists?
This is a classic X-Y problem. The goal is to write a utility in which
utility file1 file2
and
utility file1 < file2
have the same behaviour. It seems tempting to find a way to somehow translate the second invocation into the first one by (somehow) figuring out the "name" of stdin, and then using that name the same way as the second argument would be used. Unfortunately, that's not possible. The redirection happens before the utility is invoked, and there is no portable way to get the "name" of an open file descriptor. (Indeed, it might not even have a name, in the case of other_cmd | utility file1.)
So the solution is to focus on what is being asked for: make the two behaviours consistent. This is the case with most standard utilities (grep, cat, sort, etc.): if the input file is not specified, the utility uses stdin.
In many unix implementations, stdin does actually have a name: /dev/stdin. In such systems, the above can be achieved trivially:
utility() {
utility_implementation "$1" "${2:-/dev/stdin}"
}
where utility_implementation actually does whatever is required to be done. The syntax of the second argument is normal default parameter expansion; it represents the value of $2 if $2 is present and non-empty, and otherwise the string /dev/stdin. (If you leave out the : so that it is "${2-/dev/stdin}", then it won't do the substitution if $2 is present and empty, which might be better.)
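To see the difference between the two forms concretely, here is a minimal sketch. The function name show_defaults is made up for illustration; it only prints both expansions side by side:

```shell
# show_defaults is a hypothetical helper, not part of any real utility.
# It prints "${2:-...}" and "${2-...}" separated by a | character.
show_defaults() {
    printf '%s|%s\n' "${2:-/dev/stdin}" "${2-/dev/stdin}"
}

show_defaults file1          # $2 unset: /dev/stdin|/dev/stdin
show_defaults file1 ""       # $2 empty: /dev/stdin|
show_defaults file1 file2    # $2 set:   file2|file2
```

Which form you want depends on whether an explicitly empty argument should mean "use stdin" or should be passed through unchanged.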
Another way to solve the problem is to ensure that the first syntax becomes the same as the second syntax, so that the input is always coming from stdin even with a named file. The obvious simple approach:
utility() {
if (( $# < 2 )); then
utility_implementation "$1"
else
utility_implementation "$1" < "$2"
fi
}
Another way to do this uses the exec command with just a redirection to redirect the shell's own stdin. Note that we have to do this inside a subshell ((...) instead of {...}) so that the redirection does not apply to the shell which invokes the function:
utility() (
if (( $# > 1 )); then exec < "$2"; fi
# implementation goes here. $1 is file1 and stdin
# is now redirected to $2 if $2 was provided.
# ...
)
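As a small self-contained illustration of the subshell-plus-exec pattern, here is a sketch. The function first_line is hypothetical, and unlike the utility above it takes the optional input file as $1 rather than $2:

```shell
# first_line is a made-up example: print the first line of the file
# named by $1, or of stdin when no argument is given.
# The ( ... ) body runs in a subshell, so the exec redirection is
# discarded when the function returns.
first_line() (
    if [ "$#" -ge 1 ]; then
        exec < "$1"
    fi
    IFS= read -r line && printf '%s\n' "$line"
)
```

Because the redirection lives in the subshell, the caller can keep reading from its own stdin after calling first_line with a file argument.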
To append the contents of stdin to the script's arguments as the final argument (so if you pass one argument and then < file, the file's contents become the second argument), you can use the following:
#!/bin/bash
##read loop to read in stdin
while read -r line
do
## This just checks if the variable is empty, so a newline isn't appended on the front
[[ -z $Vars ]] && Vars="$line" && continue
## Appends every line read to variable
Vars="$Vars"$'\n'"$line"
## While read loop using stdin
done < /dev/stdin
## set replaces the script's arguments with the original arguments followed by the new argument we derived from stdin
set -- "$@" "$Vars"
## Echo the new arguments
echo "$@"
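If you don't need to handle the lines one at a time, a shorter variant is possible. This is only a sketch: the function name with_stdin_as_last_arg and the <...> output format are invented for the example, and it relies on command substitution stripping trailing newlines:

```shell
# Hypothetical helper: append everything read from stdin as one
# extra argument, then print each argument wrapped in <...>.
with_stdin_as_last_arg() {
    stdin_text=$(cat)            # slurp stdin; trailing newlines are stripped
    set -- "$@" "$stdin_text"    # keep the original arguments, add the new one
    printf '<%s>\n' "$@"
}
```

For example, printf 'l1\nl2\n' | with_stdin_as_last_arg a prints <a> followed by a second argument that spans two lines.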
(This question is a follow-up on this comment, in an answer about git hooks)
I'm far too unskilled in bash (so far) to understand fully the remark and how to act accordingly. More specifically, I've been advised to avoid using bash command cat this way :
echo "$current_branch" $(cat "$1") > "$1"
because the order of operations depends on the specific shell and it could end up destroying the contents of the passed argument, so the commit message itself if I got it right?
Also, how to "save the contents in a separate step"?
Would the following make any sense?
tmp = "$1"
echo "$current_branch" $(cat $tmp) > "$1"
The proposed issue is not about overwriting variables or arguments, but about the fact that both reading from and writing to a file at the same time is generally a bad idea.
For example, this command may look like it will just write a file to itself, but instead it truncates it:
cat myfile > myfile # Truncates the file to size 0
However, this is not a problem in your specific command. It is guaranteed to work in a POSIX-compliant shell because the order of operations specifies that redirections happen after expansions:
The words that are not variable assignments or redirections shall be expanded. If any fields remain following their expansion, the first field shall be considered the command name and remaining fields are the arguments for the command.
Redirections shall be performed as described in Redirection.
Double-however, it's still a bit fragile in the sense that seemingly harmless modifications may trigger the problem, such as if you wanted to run sed on the result. Since the redirection (> "$1") and command substitution $(cat "$1") are now in separate commands, the POSIX definition no longer saves you:
# Command may now randomly result in the original message being deleted
echo "$current_branch $(cat "$1")" | sed -e 's/(c)/©/g' > "$1"
Similarly, if you refactor it into a function, it will also suddenly stop working:
# Command will now always delete the original message
modify_message() {
echo "$current_branch $(cat "$1")"
}
modify_message "$1" > "$1"
You can avoid this by writing to a temporary file and then replacing your original:
tmp=$(mktemp) || exit
echo "$current_branch $(cat "$1")" > "$tmp"
mv "$tmp" "$1"
In my opinion, it's better to save to another file.
You may try something like
echo "$current_branch" > tmp
cat "$1" >> tmp # these two lines could be merged into
# echo "$current_branch" $(cat "$1") > tmp
# either form should work
mv tmp "$1"
However I am not sure if my understanding is right, or there are some better solutions.
This is what I consider the core of the question. It is hard to decide the "precedence" of the $() block and >. If > is executed "earlier", then echo "$current_branch" will overwrite the "$1" file and drop its original content, which is a disaster. If $() is executed "earlier", then everything works as expected. Either way there is a risk, and we should avoid it.
A command group would be far better than a command substitution here. Note the similarity to Geno Chen's answer.
{
echo "$current_branch"
cat "$1"
} > tmp && mv tmp "$1"
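Putting the command group together with mktemp from the earlier answer, a reusable version might look like the sketch below. The function name prepend_line and its argument order are my own invention, and it puts the prepended text on its own line, as the command group above does:

```shell
# Hypothetical helper: prepend_line TEXT FILE
# Writes TEXT followed by FILE's current contents to a temporary file,
# then replaces FILE only if that write succeeded.
prepend_line() {
    tmp=$(mktemp) || return
    {
        printf '%s\n' "$1"
        cat -- "$2"
    } > "$tmp" && mv -- "$tmp" "$2"
}
```

In a hook this would be called as prepend_line "$current_branch" "$1". Note that the replaced file takes on the temporary file's permissions, which may or may not matter for your use case.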
Tried to keep my code as simple as possible:
1: What are the rules for using echo within a while loop?
All my $a and some of my $word variables are echoed, but not my echo kk?
2: What is the scope of my count variable? Why is it not working within my while loop? Can I extend the variable to make it global?
3: When I use the grep in the final row, the $word variable only prints the first word in the passing rows, while if I remove the grep at the end, $word functions as intended and prints all the words.
count=1
while read a; do
((count=count+1))
if [ $count -le 2 ]
then
echo $a
echo kk
for word in $a; do
echo $word
done
fi
done < data.txt | grep Iteration
Use Process Substitution
In a comment, you say:
I thtought I was using grep on data.txt (sic)
No. Your current pipeline passes the loop's results through grep, not the source file. To do that, you need to rewrite your redirection to use process substitution. For example:
count=1
while read a; do
((count=count+1))
if [ $count -le 2 ]
then
echo $a
echo kk
for word in $a; do
echo $word
done
fi
done < <(fgrep Iteration data.txt)
@CodeGnome answered your question, but there are other problems with your script that will come back to bite you at some point (see https://unix.stackexchange.com/questions/169716/why-is-using-a-shell-loop-to-process-text-considered-bad-practice for discussions on some of them, and also google quoting shell variables). Just don't do it. Shell scripts are just for sequencing calls to tools, and the UNIX tool for manipulating text is awk. In this case all you'd need to do the job robustly, portably and efficiently would be:
awk '
/Iteration/ {
if (++count <= 2) {
print
print "kk"
for (i=1; i<=NF; i++) {
print $i
}
}
}' data.txt
and of course it'd be more efficient still if you just stop reading the input when count hits 2:
awk '
/Iteration/ {
print
print "kk"
for (i=1; i<=NF; i++) {
print $i
}
if (++count == 2) {
exit
}
}' data.txt
To complement CodeGnome's helpful answer with an explanation of how your command actually works and why it doesn't do what you want:
In Bash's grammar, an input redirection such as < data.txt is part of a single command, whereas |, the pipe symbol, chains multiple commands, from left to right, to form a pipeline.
Technically, while ... done ... < data.txt | grep Iteration is a single pipeline composed of 2 commands:
a single compound command (while ...; do ...; done) with an input redirection (< data.txt),
and a simple command (grep Iteration) that receives the stdout output from the compound command via its stdin, courtesy of the pipe.
In other words:
only the contents of data.txt is fed to the while loop as input (via stdin),
and whatever stdout output the while loop produces is then sent to the next pipeline segment, the grep command.
By contrast, it sounds like you want to apply grep to data.txt first, and only send the matching lines to the while loop.
You have the following options for sending a command's output to another command:
Note: The following solutions use a simplified while loop for brevity - whether a while command is single-line or spans multiple lines is irrelevant.
Also, instead of using input redirection (< data.txt) to pass the file content to grep, data.txt is passed as a filename argument.
Option 1: Place the command whose output to send to your while loop first in the pipeline:
grep 'Iteration' data.txt | while read -r a; do echo "$a"; done
The down-side of this approach is that your while loop then runs in a subshell (as all segments of a pipeline do by default), which means that variables defined or modified in your while command won't be visible to the current shell.
In Bash v4.2+, you can fix this by running shopt -s lastpipe, which tells Bash to run the last pipeline segment - the while command in this case - in the current shell instead.
Note that lastpipe is a nonstandard bash extension to the POSIX standard.
(To try this in an interactive shell, you must first turn off job control with set +m.)
Option 2: Use a process substitution:
Loosely speaking, a process substitution <(...) allows you to present command output as the content of a temporary file that cleans up after itself.
Since <(...) expands to the temporary file's (FIFO's) path, and read in the while loop only accepts stdin input, input redirection must be applied as well: < <(...):
while read -r a; do echo "$a"; done < <(grep 'Iteration' data.txt)
The advantage of this approach is that the while loop runs in the current shell, and any variable definitions or modifications therefore remain in scope after the command completes.
The potential down-side of this approach is that process substitutions are a nonstandard bash extension to the POSIX standard (although ksh and zsh support them too).
Option 3: Use a command substitution inside a here-document:
Using the command first in the pipeline (option 1) is a POSIX-compliant approach, but doesn't allow you to modify variables in the current shell (and Bash's lastpipe option is not POSIX-compliant).
The only POSIX-compliant way to send command output to a command that runs in the current shell is to use a command substitution ($(...)) inside a double-quoted here-document:
while read -r a; do echo "$a"; done <<EOF
$(grep 'Iteration' data.txt)
EOF
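To make the subshell issue from option 1 visible, here is a small sketch that counts lines both ways. It uses printf as a stand-in for the grep command, and the variable names are made up; the first result assumes bash without lastpipe (dash behaves the same way, while ksh and zsh run the last pipeline segment in the current shell):

```shell
# Option 1 style: the loop is the last segment of a pipeline.
count=0
printf 'x\ny\n' | while read -r line; do count=$((count + 1)); done
pipe_count=$count        # still 0: the loop incremented a subshell's copy

# Option 3 style: the loop reads from a here-document.
count=0
while read -r line; do count=$((count + 1)); done <<EOF
$(printf 'x\ny\n')
EOF
heredoc_count=$count     # 2: the loop ran in the current shell

echo "pipeline: $pipe_count, here-document: $heredoc_count"
```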
Streamlining your code and making it more robust:
The rest of your code has some non-obvious pitfalls that are worth addressing:
Double-quote your variable references (e.g., echo "$a" instead of echo $a), unless you specifically want word-splitting and globbing (filename expansion) applied to the values; word splitting and globbing are two kinds of shell expansions.
Similarly, don't use for to iterate over an (of necessity unquoted) variable reference (don't use for word in $a, in your case), unless you want globbing applied to the individual words - see what happens when you run a='one *'; for word in $a; do echo "$word"; done
You could turn globbing off beforehand (set -f) and back on after (set +f), but it's better to use read -ra words ... to read the words into an array first, and then safely iterate over the array elements with for word in "${words[@]}"; ... - note the "..." around the array variable reference.
Always use -r with read; without it, rarely used \-preprocessing is applied, which will "eat" embedded \ chars.
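As a quick illustration of the -r point, this sketch reads the same line with and without -r. The sample string and variable names are arbitrary:

```shell
# The input line contains a literal backslash: C:\temp
with_r=$(printf '%s\n' 'C:\temp' | { IFS= read -r line; printf '%s' "$line"; })
without_r=$(printf '%s\n' 'C:\temp' | { IFS= read line; printf '%s' "$line"; })

printf '%s\n' "$with_r"      # C:\temp
printf '%s\n' "$without_r"   # C:temp  (the backslash was consumed)
```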
If we heed the advice above, apply a few additional tweaks, and use a process substitution to feed grep's output to the while loop, we get:
count=1
while read -r a; do # Note the -r
if (( ++count <= 2 )); then
echo "$a"
# Split $a safely into words and store the words in
# array variable ${words[@]}.
read -ra words <<<"$a" # Note the -a to read into an *array*.
# Loop over the words (elements of the array).
# Note: To simply print the words, you could use
# `printf '%s\n' "${words[@]}"` instead of the loop.
for word in "${words[@]}"; do
echo "$word"
done
fi
done < <(grep 'Iteration' data.txt)
Note: As written, you don't need a loop at all, because you always exit after the 1st iteration.
Finally, as a general alternative for larger input sets, consider Ed Morton's helpful answer, which is much faster due to using awk to process your input file, whereas looping in shell code is generally slow.
I need to read command line arguments. First arg is script name. second one is redirection operator i.e. "<" and third one is input filename. When I tried to use "$#", I got 0. When I used "$*", it gave me nothing. I have to use "<" this operator. My input file consists of all user input data. If I don't use the operator, It asks user for the input. Can someone please help me? Thank you !
Command Line :
./script_name < input_file
Script:
echo "$*" # gave nothing
echo "$#" # gave me 0
I need to read input filename and store it to some variable. Then I have to change the extension of it. Any help/suggestions should be appreciated.
When a user runs:
./script_name <input_file
...that's exactly equivalent to if they did the following:
(exec <input_file; exec ./script_name)
...first redirecting stdin from input_file, then invoking the script named ./script_name without any arguments.
There are operating-system-specific interfaces you can use to get the filename associated with a handle (when it has one), but to use one of these would make your script only able to run on an operating system providing that interface; it's not worth it.
# very, very linux-specific, won't work for "cat foo | ./yourscript", generally evil
if filename=$(readlink /proc/self/fd/0) && [[ -e $filename ]]; then
set -- "$@" "$filename" # append filename to the end of the argument list
fi
If you want to avoid prompting for input when an argument is given, and to have the filename of that argument, then don't take it on stdin but as an argument, and do the redirection yourself within the script:
#!/bin/bash
if [[ $1 ]]; then
exec <"$1" # this redirects your stdin to come from the file
fi
# ...put other logic here...
...and have users invoke your script as:
./script_name input_file
Just as ./yourscript <filename runs yourscript with the contents of filename on its standard input, a script invoked with ./yourscript filename which invokes exec <"$1" will have the contents of filename on its stdin after executing that command.
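Once the filename arrives as $1, the other half of the question (changing its extension) can be sketched with parameter expansion. The function name change_ext and the extension bak are only examples, and the sketch assumes the name contains a dot in its last path component:

```shell
# Hypothetical helper: change_ext FILENAME NEW_EXT
# ${1%.*} removes the shortest trailing ".something" from the name.
change_ext() {
    printf '%s\n' "${1%.*}.$2"
}

change_ext input_file.txt bak    # input_file.bak
change_ext archive.tar.gz bak    # archive.tar.bak
```

Inside the script this could be used as new_name=$(change_ext "$1" bak) before or after the exec <"$1" line.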
< is used for input redirection. And whatever is at the right side of < is NOT a command line argument.
So, when you do ./script_name < input_file , there will be zero (0) command line arguments passed to the script, hence $# will be zero.
For your purpose you need to call your script as:
./script_name input_file
And in your script you can change the extension with something like:
mv -- "$1" "${1}_new_extension"
Edit: This was not what OP wanted to do.
Although there is already another spot-on answer, I will write this for the sake of completeness. If you have to use the '<' redirection you can do something like this in your script.
while read filename; do
mv -- "$filename" "${filename}_bak"
done
And call the script as, ./script < input_file. However, note that you will not be able to take inputs from stdin in this case.
Unfortunately, if you're hoping to take redirection operators as arguments to your script, you're not going to be able to do that without surrounding your command line arguments in quotes:
./script_name "<input_file"
The reason for this is that the shell (at least bash or zsh) processes the command before ever invoking your script. When the shell interprets your command, it reads:
[shell command (./script_name)][shell input redirection (<input_file)]
invoking your script with quotes effectively results in:
[shell command (./script_name)][script argument ("<input_file")]
Sorry this is a few years late; hopefully someone will find this useful.
I am trying to write a function in a bash script that gets lines from stdin and picks out the first line which is not contained in a file.
Here is my approach:
doubles=file.txt
firstnotdouble(){
while read input_line; do
found=0;
cat $doubles |
while read double_line; do
if [ "$input_line" = "$double_line" ]
then
found=1;
break
fi
done
if [ $found -eq 0 ] # no double found, echo and break!
then
echo $input_line
break
fi
done
}
After some debugging attempts I realized that when found is set to 1 in the first if block, it does not keep its value until the next if block. That's why it's not working. Why does the script act as if there were two found variables in different "scopes"?
The second question would be if the approach as a whole could be optimized.
As indicated in the comments, the issue with the found variable is that the commands in a pipeline (that is, a series of commands separated by |) run in subshells, and each subshell has its own copy of the shell's variables. You could have avoided the problem by avoiding the UUOC (useless use of cat), writing:
while read ...; do ... done < "$doubles"
instead of the pipeline.
A (much) faster way than using a while read loop repeatedly through the doubles file is to use grep:
# Specify the file to be scanned as the first argument
firstnotdouble() {
while IFS= read -r double_line; do
if ! grep -qxF "$double_line" "$1"; then
echo "$double_line"
return
fi
done
return 1
}
In the grep:
-q suppresses output, and stops on the first match
-x pattern must match the entire line
-F pattern is a simple string instead of a regular expression.
In the read:
IFS= avoids spaces being trimmed
-r avoids backslashes being deleted
With GNU grep, you could use -xF -m1 (or even -xFm1 if you like being cryptic) instead of -qxF, and then leave out the echo. The grep extension -m N limits the number of matches found to N.
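If starting one grep per input line is still too slow, another option (my own variation, not part of the function above) is to let a single grep filter all of stdin against the file and keep only the first survivor. The name firstnotdouble_batch is made up, and unlike the loop version this sketch does not return 1 when every line is a double:

```shell
# Hypothetical variant: firstnotdouble_batch DOUBLES_FILE
# -f FILE reads the patterns from FILE, -F treats them as fixed strings,
# -x requires whole-line matches, -v keeps the lines that do NOT match.
firstnotdouble_batch() {
    grep -vxFf "$1" | head -n 1
}
```

Usage mirrors the loop version: some_command | firstnotdouble_batch file.txt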
Script doesn't work when I want to use standard input when there are no arguments (files) passed. Is there any way how to use stdin instead of a file in this code?
I tried this:
if [ ! -n $1 ] # check if argument exists
then
$1=$(</dev/stdin) # if not use stdin as an argument
fi
var="$1"
while read line
do
... # find the longest line
done <"$var"
For a general case of wanting to read a value from stdin when a parameter is missing, this will work.
$ echo param | script.sh
$ script.sh param
script.sh
#!/bin/bash
set -- "${1:-$(</dev/stdin)}" "${@:2}"
echo "$1"
Just substitute bash's specially interpreted /dev/stdin as the filename:
VAR=$1
while read blah; do
...
done < "${VAR:-/dev/stdin}"
(Note that bash will actually use that special file /dev/stdin if built for an OS that offers it, but since bash 2.04 will work around that file's absence on systems that do not support it.)
pilcrow's answer provides an elegant solution; this is an explanation of why the OP's approach didn't work.
The main problem with the OP's approach was the attempt to assign to positional parameter $1 with $1=..., which won't work.
The LHS is expanded by the shell to the value of $1, and the result is interpreted as the name of the variable to assign to - clearly, not the intent.
The only way to assign to $1 in bash is via the set builtin.
The caveat is that set invariably sets all positional parameters, so you have to include the other ones as well, if any.
set -- "${1:-/dev/stdin}" "${@:2}" # "${@:2}" expands to all remaining parameters
(If you expect only at most 1 argument, set -- "${1:-/dev/stdin}" will do.)
The above also corrects a secondary problem with the OP's approach: the attempt to store the contents rather than the filename of stdin in $1, since < is used.
${1:-/dev/stdin} is an application of bash parameter expansion that says: return the value of $1, unless $1 is undefined (no argument was passed) or its value is the empty string ("" or '' was passed), in which case return /dev/stdin. The variation ${1-/dev/stdin} (no :) would only return /dev/stdin if $1 is undefined (if it contains any value, even the empty string, it would be returned).
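The slice syntax for the remaining parameters is a bash feature. If the script also has to run under a plain POSIX sh, the same effect can be sketched with shift. The function name default_first_arg is hypothetical and exists only to make the example testable:

```shell
# Hypothetical helper: replace a missing or empty first argument with
# /dev/stdin while keeping all remaining arguments, using only POSIX sh.
default_first_arg() {
    first=${1:-/dev/stdin}
    if [ "$#" -gt 0 ]; then
        shift                 # drop the original $1, if there was one
    fi
    set -- "$first" "$@"      # rebuild the list with the defaulted value first
    printf '%s\n' "$@"
}

default_first_arg             # prints /dev/stdin
default_first_arg '' b c      # prints /dev/stdin, b and c on separate lines
```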
If we put it all together:
# Default to filename '/dev/stdin' (stdin), if none was specified.
set -- "${1:-/dev/stdin}" "${@:2}"
while read -r line; do
... # find the longest line
done < "$1"
But, of course, the much simpler approach would be to use ${1:-/dev/stdin} as the filename directly:
while read -r line; do
... # find the longest line
done < "${1:-/dev/stdin}"
or, via an intermediate variable:
filename=${1:-/dev/stdin}
while read -r line; do
... # find the longest line
done < "$filename"
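For completeness, here is one way the elided loop body could be filled in. This is only a guess at what "find the longest line" means in the question (the first line with the greatest length in characters), and the function name longest_line is made up:

```shell
# Hypothetical helper: print the longest line of the file named by $1,
# or of stdin when $1 is missing or empty.
longest_line() {
    longest=
    while IFS= read -r line; do
        if [ "${#line}" -gt "${#longest}" ]; then
            longest=$line
        fi
    done < "${1:-/dev/stdin}"
    printf '%s\n' "$longest"
}
```

It can be called either as longest_line some_file or as some_command | longest_line.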
Variables are assigned a value by Var=Value and that variable is used by e.g. echo $Var. In your case, that would amount to
1=$(</dev/stdin)
when assigning the standard input. However, I do not think that variable names are allowed to start with a digit character. See the question bash read from file or stdin for ways to solve this.
Here is my version of script:
#!/bin/bash
file=${1--} # POSIX-compliant; ${1:--} can be used as well.
while IFS= read -r line; do
printf '%s\n' "$line"
done < <(cat -- "$file")
If no file is given as an argument, the script reads from standard input.
See more examples: How to read from file or stdin in bash? at stackoverflow SE