Argument in bash script

I have the following bash script called countscript.sh
#!/bin/bash
echo "Running" $0
tr -cs A-Za-z '\n' | tr A-Z a-z | sort | uniq -c | sort -rn | sed $1 q
But I don't understand how to pass the argument correctly (3 should become the $1 argument of sed):
$ echo " one two two three three three" | ./countscript.sh 3
Running ./countscript.sh
sed: -e expression #1, char 1: missing command
This works fine:
$ echo "one two three four one one four" | tr -cs A-Za-z '\n' | tr A-Z a-z | sort | uniq -c | sort -rn | sed 3q
3 one
2 four
1 two
Thanks.
PS: Has anybody else noticed the bug in this script on page 10 of https://www.cs.tufts.edu/~nr/cs257/archive/don-knuth/pearls-2.pdf?

In the quoted paper, I think you are misreading
sed ${1}q
as
sed ${1} q
and sed does not consider 3 by itself a valid command. The separate argument q is treated as an input file name. If the value of $1 did result in a single valid sed script, you would have likely gotten an error for the missing input file q.
Proper shell programming would dictate this be written as
sed "${1}q"
or
sed "${1} q"
instead; with the space as part of the script, sed correctly outputs the first $1 lines of input and exits.
It's somewhat curious that the authors used sed instead of head -n "$1" to output the first few lines, as one of them (McIlroy) essentially invented the idea of the Unix pipeline as a series of special-purpose, narrowly focused tools. Not having read the full paper, I don't know what Knuth's and McIlroy's contributions to it were; perhaps Bentley just likes sed. :)
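A minimal sketch of the corrected pipeline, wrapped in shell functions so it can be tried without creating countscript.sh. The function names top_words and top_words_head are hypothetical; the second variant swaps sed for head -n:

```shell
#!/bin/bash
# top_words N: print the N most frequent words read from stdin (hypothetical name).
top_words() {
    # "${1}q" expands to ONE sed script such as "3q": print up to line 3, then quit.
    tr -cs 'A-Za-z' '\n' | tr 'A-Z' 'a-z' | sort | uniq -c | sort -rn | sed "${1}q"
}

# top_words_head N: same idea, using head -n instead of sed.
top_words_head() {
    tr -cs 'A-Za-z' '\n' | tr 'A-Z' 'a-z' | sort | uniq -c | sort -rn | head -n "$1"
}

echo "one two three four one one four" | top_words 3
echo "one two three four one one four" | top_words_head 3
```

Both calls should print the same three lines, starting with the count 3 for one. The exact order of words that share a count can depend on your locale.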

When running the following command:
$ echo " one two two three three three" | ./countscript.sh 3
the special variable $1 will be replaced by 3, your first argument. Hence, the script runs:
tr -cs A-Za-z '\n' | tr A-Z a-z | sort | uniq -c | sort -rn | sed 3 q
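Notice the space between the 3 and the q: sed receives 3 as its whole script and q as a separate argument (an input file name). A bare 3 is only an address with no command after it, hence the "missing command" error.
Remove the space (keeping the expansion quoted), and you should be fine: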
tr -cs A-Za-z '\n' | tr A-Z a-z | sort | uniq -c | sort -rn | sed "${1}q"

Related

Check for duplicate word in comma-separated string in bash

I need to check that a variable does not contain a duplicate entry in a comma-separated string.
For example, inside of $animals, if I have:
,dog,cat,bird,goat,fish,
That would be considered valid since every word is unique.
The string:
,dog,cat,dog,bird,fish,
would be invalid since dog is entered twice.
,dog,cat,dogs,bird,fish,
Would be valid since there is only one instance of dog (dogs is there but allowed since it's not the same exact word)
The string:
,dog,cat,DOG,bird,fish
Would also be invalid since dog is the same as DOG only in uppercase.
Is there any way I can do this? I would put some code I've tried but I don't know what to use to even experiment.
Using bash 3.2.57(1)-release on 10.11.6 El Capitan
Case sensitive:
echo ",dog,cat,dog,bird,fish," | tr ',' '\n' | grep -v '^$' | sort | uniq -c | sort -k 1,1nr
Case insensitive:
echo ",dog,DOG,cat,dog,bird,fish," | tr ',' '\n' | grep -v '^$' | sort -rf | uniq -ci | sort -k 1,1nr
Perform a reverse (-r), case-insensitive (-f) sort so that the lowercase spellings come after the uppercase ones, then collapse the duplicates with uniq -i. (You might have to make sure that the collation setting LC_COLLATE, and possibly LANG and LC_ALL, aren't affecting sort's behavior.)
Then check whether the count in the first row is greater than 1.
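A minimal sketch of that final check as a function, assuming bash 3.2 as in the question (so no ${var,,} lowercasing). The name has_duplicates is hypothetical. It lowercases with tr rather than relying on sort -f and uniq -i, which sidesteps the collation caveat, and then tests whether the highest count exceeds 1:

```shell
#!/bin/bash
# has_duplicates LIST: exit status 0 if some word appears more than once in the
# comma-separated LIST (ignoring case), 1 otherwise. Hypothetical helper.
has_duplicates() {
    local max
    # One word per line, drop empty fields, lowercase, count, largest count first.
    max=$(printf '%s\n' "$1" | tr ',' '\n' | grep -v '^$' |
          tr '[:upper:]' '[:lower:]' | sort | uniq -c | sort -k 1,1nr |
          awk 'NR == 1 { print $1 }')
    # An empty list leaves max empty; treat that as a count of 0.
    [ "${max:-0}" -gt 1 ]
}

for list in ',dog,cat,bird,goat,fish,' ',dog,cat,dog,bird,fish,' \
            ',dog,cat,dogs,bird,fish,' ',dog,cat,DOG,bird,fish'; do
    if has_duplicates "$list"; then
        echo "$list invalid"
    else
        echo "$list valid"
    fi
done
```

For the four example strings from the question, this reports valid, invalid, valid, invalid.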
Simple script-based solution
Usage
$ ./script.sh ,dog,dog,cat,
Actual Script
#!/bin/bash
num_duplicated() {
printf '%s\n' "$1" |
tr ',' '\n' | # Split each item onto its own line
tr '[:upper:]' '[:lower:]' | # Convert everything to lowercase
sort | # Sort the lines (required for the call to `uniq`)
uniq -d | # The `-d` flag shows only duplicated lines
grep -v '^$' | # `-v` with the pattern `^$` removes empty lines
wc -l # Count the number of duplicated lines
}
main() {
num_duplicates=$(num_duplicated "$1")
if [[ $num_duplicates -eq 0 ]]
then
echo "No duplicates"
else
echo "Contains duplicate(s)"
fi
}
main "$1"

Why does my bash script flag this awk substring command as a syntactic error when it works in the terminal?

I'm trying to extract a list of dates from a series of links using lynx's dump function and piping the output through grep and awk. This operation works successfully in the terminal and outputs dates accurately. However, when it is placed into a shell script, bash claims a syntax error:
Scripts/ETC/PreD.sh: line 18: syntax error near unexpected token `('
Scripts/ETC/PreD.sh: line 18: ` lynx --dump "$link" | grep -m 1 Date | awk '{print substr($0,10)}' >> dates.txt'
For context, this is part of a while-read loop in which $link is being read from a file. Operations undertaken inside this while-loop when the awk command is removed are all successful, as are similar while-loops that include other awk commands.
I know that either I'm misunderstanding how bash handles variable substitution, or how bash handles awk commands, or some combination of the two. Any help would be immensely appreciated.
EDIT: ShellCheck is divided on this: the website version finds no error, but my downloaded version reports error SC1083, which says:
This { is literal. Check expression (missing ;/\n?) or quote it.
A check on the Shellcheck GitHub page provides this:
This error is harmless when the curly brackets are supposed to be literal, in e.g. awk {'print $1'}.
However, it's cleaner and less error prone to simply include them inside the quotes: awk '{print $1}'.
Script follows:
#!/bin/bash
while read -u 4 link
do
IFS=/ read a b c d e <<< "$link"
echo "$e" >> 1.txt
lynx --dump "$link" | grep -A 1 -e With: | tr -d [:cntrl:][:digit:][] | sed 's/\With//g' | awk '{print substr($0,10)}' | sed 's/\(.*\),/\1'\ and'/' | tr -s ' ' >> 2.txt
lynx --dump "$link" | grep -m 1 Date | awk '{print substr($0,10)}' >> dates.txt
done 4< links.txt
In the sed command you have an unmatched ', due to an unquoted '.
In the awk script, the third argument to substr() is not needed: without it, substr() already returns the rest of the line.
From the gawk manual:
substr(string, start [, length ])
Return a length-character-long substring of string, starting at character number start. The first character of a string is character number one. For example, substr("washington", 5, 3) returns "ing".
If length is not present, substr() returns the whole suffix of string that begins at character number start. For example, substr("washington", 5) returns "ington". The whole suffix is also returned if length is greater than the number of characters remaining in the string, counting from character start.
If start is less than one, substr() treats it as if it was one. (POSIX doesn’t specify what to do in this case: BWK awk acts this way, and therefore gawk does too.) If start is greater than the number of characters in the string, substr() returns the null string. Similarly, if length is present but less than or equal to zero, the null string is returned.
I also suggest combining the grep|awk|sed|tr pipeline into a single awk script, and debugging that awk script with print statements.
From:
lynx --dump "$link" | grep -A 1 -e With: | tr -d [:cntrl:][:digit:][] | sed 's/\With//g' | awk '{print substr($0,10,length)}' | sed 's/\(.*\),/\1'\ and'/' | tr -s ' ' >> 2.txt
To:
lynx --dump "$link" | awk '/With:/{found=1;next} found{found=0; s=substr($0,10); gsub(/ +/," ",s); if (match(s,/,[^,]*$/)) s=substr(s,1,RSTART-1) " and" substr(s,RSTART+1); print s}' >> 2.txt
From:
lynx --dump "$link" | grep -m 1 Date | awk '{print substr($0,10,length)}' >> dates.txt
To:
lynx --dump "$link" | awk '/Date/{print substr($0,10); exit}' >> dates.txt
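To try the two awk programs without lynx, here is a sketch that wraps them in functions reading stdin; cast_line, first_date, and the sample text are hypothetical. The match()/RSTART step replaces the last comma with " and", which is what the sed command s/\(.*\),/\1 and/ does, and exit stops after the first Date line the way grep -m 1 does:

```shell
#!/bin/bash
# cast_line: for each "With:" line, print the following line from column 10 on,
# with runs of spaces squeezed and the last comma replaced by " and".
cast_line() {
    awk '/With:/ { found = 1; next }
         found {
             found = 0
             s = substr($0, 10)
             gsub(/ +/, " ", s)
             # RSTART is where match() found the last comma (0 if none).
             if (match(s, /,[^,]*$/))
                 s = substr(s, 1, RSTART - 1) " and" substr(s, RSTART + 1)
             print s
         }'
}

# first_date: print the first line containing "Date" from column 10 on, then stop.
first_date() {
    awk '/Date/ { print substr($0, 10); exit }'
}

printf '%s\n' 'Title' 'With:' '         Ann  Lee, Bob Roe, Cy Poe' | cast_line
# Ann Lee, Bob Roe and Cy Poe
printf '%s\n' 'Date:    2020-01-02' 'Date:    2021-03-04' | first_date
# 2020-01-02
```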

BASH Finding palindromes in a .txt file

I have been given a .txt file in which we have to find all the palindromes (they must have at least 3 letters and can't consist of a single repeated letter, e.g. AAA).
The result should be displayed with the first column being the number of times the word appears and the second being the word, e.g.
123 kayak
3 bob
1 dad
#!/bin/bash
tmp=$(mktemp)
awk '{for(x=1;x<=NF;++x)print $x}' "${1}" | tr -d '[:punct:]' | tr -s '[:space:]' | sed -e 's/#//g' -e 's/[0-9]*//g' | sed -r '/^.{,2}$/d' | sort | uniq -c -i > "$tmp"
This outputs the file as it should, ignoring case, words shorter than 3 letters, punctuation and digits.
However, I am now stumped on how to pull out the palindromes from this. I thought a temp file might be the way, but I don't know where to take it.
Any help or guidance is much appreciated.
# modify this to your needs; it should take your input on stdin, and return one word per
# line on stdout, in the same order if called more than once with the same input.
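preprocess() {
# split into one lowercase word per line first, then drop words made of
# a single repeated letter (e.g. aaa), which the question excludes
tr -d '[:punct:][:digit:]' \
| tr '[:upper:]' '[:lower:]' \
| tr -s '[:space:]' '\n' \
| sed -E -e '/^(.)\1+$/d'
}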
paste <(preprocess <"$1") <(preprocess <"$1" | rev) \
| awk '$1 == $2 && (length($1) >= 3) { print $1 }' \
| sort | uniq -c
The critical thing here is to paste together your input file with a stream that has each line from that input file reversed. This gives you two separate columns you can compare.
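To see what the two pasted columns look like, here is a small illustration using three hypothetical sample words instead of the input file; paste joins the columns with a tab:

```shell
#!/bin/bash
words=$(printf '%s\n' kayak bash level)

# Left column: each word; right column: the same word reversed by rev.
paste <(printf '%s\n' "$words") <(printf '%s\n' "$words" | rev)
# kayak<TAB>kayak
# bash<TAB>hsab
# level<TAB>level

# Rows whose two columns are equal are the palindromes.
paste <(printf '%s\n' "$words") <(printf '%s\n' "$words" | rev) |
    awk '$1 == $2 { print $1 }'
# kayak
# level
```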

Extracting minimum and maximum from line number grep

Currently, I have a command in a bash script that greps for a given string in a text file and prints the line numbers only using sed ...
grep -n "<string>" file.txt | sed -n 's/^\([0-9]*\).*/\1/p'
The grep could find multiple matches, and thus, print multiple line numbers. From this command's output, I would like to extract the minimum and maximum values, and assign those to respective bash variables. How could I best modify my existing command or add new commands to accomplish this? If using awk or sed will be necessary, I have a preference of using sed. Thanks!
You can get the minimum and maximum with this:
grep -n "<string>" input | sed -n -e 's/^\([0-9]*\).*/\1/' -e '1p;$p'
You can also read them into an array:
F=($(grep -n "<string>" input | sed -n -e 's/^\([0-9]*\).*/\1/' -e '1p;$p'))
echo ${F[0]} # min
echo ${F[1]} # max
grep -n "<string>" file.txt | sed -n -e '1s/^\([0-9]*\).*/\1/p' -e '$s/^\([0-9]*\).*/\1/p'
grep .... |awk -F: '!f{print $1;f=1} END{print $1}'
Here's how I'd do it, since grep -n 'pattern' file prints output in the format line number:line contents ...
minval=$(grep -n '<string>' input | cut -d':' -f1 | sort -n | head -1)
maxval=$(grep -n '<string>' input | cut -d':' -f1 | sort -n | tail -1)
The cut -d':' -f1 command splits the grep output at the colon and keeps only the first field (the line numbers); sort -n sorts those line numbers numerically in ascending order (they already are, but it's good practice to make sure); then head -1 and tail -1 select the first and last values of the sorted list, i.e. the minimum and maximum, which are assigned to $minval and $maxval respectively.
Hope this helps!
Edit: It turns out you can't do it the way I had it originally: echoing an unquoted variable that holds newline-separated values joins them onto one line, because the shell word-splits the expansion.
It can be done with one process, like this:
awk '/expression/{if(!n)print NR;n=NR} END {print n}' file.txt
Then you can assign the output to an array (as perreal suggested), or you can modify the script to assign to variables using eval:
eval "$(awk '/expression/{if(!n)print "A="NR;n=NR} END {print "B="n}' file.txt)"
echo $A
echo $B
Output (when file.txt consists of three lines that all match expression):
1
3
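A sketch that avoids eval: a single awk pass prints both numbers on one line, and read assigns them to two variables via process substitution (a bash feature). The function name match_range is hypothetical, and grep -F is an added choice that makes grep treat the search string literally:

```shell
#!/bin/bash
# match_range STRING FILE: print "MIN MAX", the first and last line numbers of
# lines containing STRING (hypothetical helper). Prints nothing if nothing matches.
match_range() {
    grep -n -F -- "$1" "$2" |
        awk -F: 'NR == 1 { min = $1 } { max = $1 } END { if (NR) print min, max }'
}

# Usage with the question's placeholders:
#   read -r minval maxval < <(match_range '<string>' file.txt)
#   echo "min=$minval max=$maxval"
```

Since grep -n already emits matches in file order, the first and last records are the minimum and maximum, so no sort is needed.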

How to split a string in shell and get the last field

Suppose I have the string 1:2:3:4:5 and I want to get its last field (5 in this case). How do I do that using Bash? I tried cut, but I don't know how to specify the last field with -f.
You can use string operators:
$ foo=1:2:3:4:5
$ echo ${foo##*:}
5
This trims everything from the front until a ':', greedily.
${foo <-- from variable foo
## <-- greedy front trim
* <-- matches anything
: <-- until the last ':'
}
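A minimal sketch wrapping this expansion in a function, with a couple of edge cases; last_field and second_last_field are hypothetical names. The second function combines % (shortest suffix trim) with ## to step back one field:

```shell
#!/bin/bash
# last_field STR: print the text after the last ':' in STR (hypothetical helper).
# ${1##*:} deletes the longest prefix matching '*:', i.e. up to the last ':'.
last_field() {
    printf '%s\n' "${1##*:}"
}

# second_last_field STR: ${1%:*} deletes the shortest suffix matching ':*'
# (the last field and its ':'), then the same ## trim is applied.
second_last_field() {
    local rest="${1%:*}"
    printf '%s\n' "${rest##*:}"
}

last_field 1:2:3:4:5          # 5
last_field abc                # abc (no ':', so nothing is trimmed)
last_field a:b:               # empty line (the string ends with ':')
second_last_field 1:2:3:4:5   # 4
```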
Another way is to reverse before and after cut:
$ echo ab:cd:ef | rev | cut -d: -f1 | rev
ef
This makes it very easy to get the second-to-last field, or any range of fields counted from the end.
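For example, a sketch using the same rev/cut trick on the question's string: field numbers given to cut now count from the end, and the second rev restores each field's original character order:

```shell
#!/bin/bash
s=1:2:3:4:5
# Second-to-last field: reverse, take field 2, reverse back.
printf '%s\n' "$s" | rev | cut -d: -f2 | rev      # 4
# Last two fields as a range.
printf '%s\n' "$s" | rev | cut -d: -f1-2 | rev    # 4:5
```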
It's difficult to get the last field using cut, but here are some solutions in awk and perl
echo 1:2:3:4:5 | awk -F: '{print $NF}'
echo 1:2:3:4:5 | perl -F: -wane 'print $F[-1]'
Assuming fairly simple usage (no escaping of the delimiter, for example), you can use grep:
$ echo "1:2:3:4:5" | grep -oE "[^:]+$"
5
Breakdown - find all the characters not the delimiter ([^:]) at the end of the line ($). -o only prints the matching part.
If you want to use cut and you know how many fields there are (five here), you can select the last one by its number:
echo "1:2:3:4:5" | cut -d ":" -f5
You can also use grep, like this:
echo " 1:2:3:4:5" | grep -o '[^:]*$'
One way:
var1="1:2:3:4:5"
var2=${var1##*:}
Another, using an array:
var1="1:2:3:4:5"
saveIFS=$IFS
IFS=":"
var2=($var1)
IFS=$saveIFS
var2=${var2[@]: -1}
Yet another with an array:
var1="1:2:3:4:5"
saveIFS=$IFS
IFS=":"
var2=($var1)
IFS=$saveIFS
count=${#var2[@]}
var2=${var2[$count-1]}
Using Bash (version >= 3.2) regular expressions:
var1="1:2:3:4:5"
[[ $var1 =~ :([^:]*)$ ]]
var2=${BASH_REMATCH[1]}
$ echo "a b c d e" | tr ' ' '\n' | tail -1
e
Simply translate the delimiter into a newline and choose the last entry with tail -1.
Using sed:
$ echo '1:2:3:4:5' | sed 's/.*://' # => 5
$ echo '' | sed 's/.*://' # => (empty)
$ echo ':' | sed 's/.*://' # => (empty)
$ echo ':b' | sed 's/.*://' # => b
$ echo '::c' | sed 's/.*://' # => c
$ echo 'a' | sed 's/.*://' # => a
$ echo 'a:' | sed 's/.*://' # => (empty)
$ echo 'a:b' | sed 's/.*://' # => b
$ echo 'a::c' | sed 's/.*://' # => c
There are many good answers here, but I still want to share this one using basename:
basename "$(echo "a:b:c:d:e" | tr ':' '/')"
However, it will fail if the string already contains a '/'.
If slash / is your delimiter, then you just have to (and should) use basename.
It's not the best answer, but it shows how creative you can be with bash commands.
If your last field is a single character, you could do this:
a="1:2:3:4:5"
echo ${a: -1}
echo ${a:(-1)}
Check string manipulation in bash.
Using Bash.
$ var1="1:2:3:4:0"
$ IFS=":"
$ set -- $var1
$ eval echo \$${#}
0
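printf '%s' "a:b:c:d:e" | xargs -d : -n1 | tail -1
First, xargs splits the input at ":" (-d is a GNU xargs option), and -n1 runs echo once per item, so each part ends up on its own line; then tail -1 prints the last part. printf '%s' is used instead of echo so that no trailing newline ends up inside the last item.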
Regex matching in sed is greedy (always goes to the last occurrence), which you can use to your advantage here:
$ foo=1:2:3:4:5
$ echo ${foo} | sed "s/.*://"
5
A solution using the read builtin:
IFS=':' read -a fields <<< "1:2:3:4:5"
echo "${fields[4]}"
Or, to make it more generic:
echo "${fields[-1]}" # prints the last item
for x in `echo $str | tr ";" "\n"`; do echo $x; done
Improving on @mateusz-piotrowski's and @user3133260's answers:
echo "a:b:c:d::e:: ::" | tr ':' ' ' | xargs | tr ' ' '\n' | tail -1
First, tr ':' ' ' -> replace each ':' with a space;
then xargs -> trim the surrounding and repeated whitespace;
after that, tr ' ' '\n' -> replace the remaining spaces with newlines;
lastly, tail -1 -> get the last string.
For those comfortable with Python, https://github.com/Russell91/pythonpy is a nice choice to solve this problem.
$ echo "a:b:c:d:e" | py -x 'x.split(":")[-1]'
From the pythonpy help: -x treat each row of stdin as x.
With that tool, it is easy to write python code that gets applied to the input.
Edit (Dec 2020):
Pythonpy is no longer online.
Here is an alternative:
$ echo "a:b:c:d:e" | python -c 'import sys; sys.stdout.write(sys.stdin.read().split(":")[-1])'
It contains more boilerplate code (i.e. sys.stdin.read and sys.stdout.write) but requires only Python's standard library.
