Why does my bash script flag this awk substring command as a syntactic error when it works in the terminal? - bash

I'm trying to extract a list of dates from a series of links using lynx's dump function and piping the output through grep and awk. This operation works successfully in the terminal and outputs dates accurately. However, when it is placed into a shell script, bash claims a syntax error:
Scripts/ETC/PreD.sh: line 18: syntax error near unexpected token `('
Scripts/ETC/PreD.sh: line 18: ` lynx --dump "$link" | grep -m 1 Date | awk '{print substr($0,10)}' >> dates.txt'
For context, this is part of a while-read loop in which $link is being read from a file. Operations undertaken inside this while-loop when the awk command is removed are all successful, as are similar while-loops that include other awk commands.
I know that either I'm misunderstanding how bash handles variable substitution, or how bash handles awk commands, or some combination of the two. Any help would be immensely appreciated.
EDIT: Shellcheck is divided on this, the website version finds no error, but my downloaded version provides error SC1083, which says:
This { is literal. Check expression (missing ;/\n?) or quote it.
A check on the Shellcheck GitHub page provides this:
This error is harmless when the curly brackets are supposed to be literal, in e.g. awk {'print $1'}.
However, it's cleaner and less error prone to simply include them inside the quotes: awk '{print $1}'.
Script follows:
#!/bin/bash
while read -u 4 link
do
IFS=/ read a b c d e <<< "$link"
echo "$e" >> 1.txt
lynx --dump "$link" | grep -A 1 -e With: | tr -d [:cntrl:][:digit:][] | sed 's/\With//g' | awk '{print substr($0,10)}' | sed 's/\(.*\),/\1'\ and'/' | tr -s ' ' >> 2.txt
lynx --dump "$link" | grep -m 1 Date | awk '{print substr($0,10)}' >> dates.txt
done 4< links.txt

In sed command you have unmatched ', due to unquoted '.
In awk script your have constant zero length variable.
From gawk manual:
substr(string, start [, length ])
Return a length-character-long substring of string, starting at character number start. The first character of a string is character
number one.48 For example, substr("washington", 5, 3) returns "ing".
If length is not present, substr() returns the whole suffix of string that begins at character number start. For example,
substr("washington", 5) returns "ington". The whole suffix is also
returned if length is greater than the number of characters remaining
in the string, counting from character start.
If start is less than one, substr() treats it as if it was one. (POSIX doesn’t specify what to do in this case: BWK awk acts this way,
and therefore gawk does too.) If start is greater than the number of
characters in the string, substr() returns the null string. Similarly,
if length is present but less than or equal to zero, the null string
is returned.
Also I suggest you combine grep|awk|sed|tr into single awk script. And debug the awk script with printouts.
From:
lynx --dump "$link" | grep -A 1 -e With: | tr -d [:cntrl:][:digit:][] | sed 's/\With//g' | awk '{print substr($0,10,length)}' | sed 's/\(.*\),/\1'\ and'/' | tr -s ' ' >> 2.txt
To:
lynx --dump "$link" | awk '/With/{found=1;next}found{found=0;print sub(/\(.*\),/,"& and",gsub(/ +/," ",substr($0,10)))}' >> 2.txt
From:
lynx --dump "$link" | grep -m 1 Date | awk '{print substr($0,10,length)}' >> dates.txt
To:
lynx --dump "$link" | awk '/Date/{print substr($0,10)}' >> dates.txt

Related

Replace one character by the other (and vice-versa) in shell

Say I have strings that look like this:
$ a='/o\\'
$ echo $a
/o\
$ b='\//\\\\/'
$ echo $b
\//\\/
I'd like a shell script (ideally a one-liner) to replace / occurrences by \ and vice-versa.
Suppose the command is called invert, it would yield (in a shell prompt):
$ invert $a
\o/
$ invert $b
/\\//\
For example using sed, it seems unavoidable to use a temporary character, which is not great, like so:
$ echo $a | sed 's#/#%#g' | sed 's#\\#/#g' | sed 's#%#\\#g'
\o/
$ echo $b | sed 's#/#%#g' | sed 's#\\#/#g' | sed 's#%#\\#g'
/\\//\
For some context, this is useful for proper printing of git log --graph --all | tac (I like to see newer commits at the bottom).
tr is your friend:
% echo 'abc' | tr ab ba
bac
% echo '/o\' | tr '\\/' '/\\'
\o/
(escaping the backslashes in the output might require a separate step)
I think this can be done with (g)awk:
$ echo a/\\b\\/c | gawk -F "/" 'BEGIN{ OFS="\\" } { for(i=1;i<=NF;i++) gsub(/\\/,"/",$i); print $0; }'
a\/b/\c
$ echo a\\/b/\\c | gawk -F "/" 'BEGIN{ OFS="\\" } { for(i=1;i<=NF;i++) gsub(/\\/,"/",$i); print $0; }'
a/\b\/c
$
-F "/" This defines the separator, The input will be split in "/", and should no longer contain a "/" character.
for(i=1;i<=NF;i++) gsub(/\\/,"/",$i);. This will replace, in all items in the input, the backslash (\) for a slash (/).
If you want to replace every instance of / with \, you can uses the y command of sed, which is quite similar to what tr does:
$ a='/o\'
$ echo "$a"
/o\
$ echo "$a" | sed 'y|/\\|\\/|'
\o/
$ b='\//\\/'
$ echo "$b"
\//\\/
$ echo "$b" | sed 'y|/\\|\\/|'
/\\//\
If you are strictly limited to GNU AWK you might get desired result following way, let file.txt content be
\//\\\\/
then
awk 'BEGIN{FPAT=".";OFS="";arr["/"]="\\";arr["\\"]="/"}{for(i=1;i<=NF;i+=1){if($i in arr){$i=arr[$i]}};print}' file.txt
gives output
/\\////\
Explanation: I inform GNU AWK that field is any single character using FPAT built-in variable and that output field separator (OFS) is empty string and create array where key-value pair represent charactertobereplace-replacement, \ needs to be escaped hence \\ denote literal \. Then for each line I iterate overall all fields using for loop and if given field hold character present in array arr keys I do exchange it for corresponding value, after loop I print line.
(tested in gawk 4.2.1)

Using grep to get the line number of first occurrence of a string in a file

I am using bash script for testing purpose.During my testing I have to find the line number of first occurrence of a string in a file. I have tried "awk" and "grep" both, but non of them return the value.
Awk example
#/!bin/bash
....
VAR=searchstring
...
cpLines=$(awk '/$VAR/{print NR}' $MYDIR/Configuration.xml
this does not expand $VAR. If I use the value of VAR it works, but I want to use VAR
Grep example
#/!bin/bash
...
VAR=searchstring
...
cpLines=grep -n -m 1 $VAR $MYDIR/Configuration.xml |cut -f1 -d:
this gives error line 20: -n: command not found
grep -n -m 1 SEARCH_TERM FILE_PATH |sed 's/\([0-9]*\).*/\1/'
grep switches
-n = include line number
-m 1 = match one
sed options (stream editor):
's/X/Y/' - replace X with Y
\([0-9]*\) - regular expression to match digits zero or multiple times occurred, escaped parentheses, the string matched with regex in parentheses will be the \1 argument in the Y (replacement string)
\([0-9]*\).* - .* will match any character occurring zero or multiple times.
You need $() for variable substitution in grep
cpLines=$(grep -n -m 1 $VAR $MYDIR/Configuration.xml |cut -f1 -d: )
Try something like:
awk -v search="$var" '$0~search{print NR; exit}' inputFile
In awk, / / will interpret awk variable literally. You need to use match (~) operator. What we are doing here is looking for the variable against your input line. If it matches, we print the line number stored in NR and exit.
-v allows you to create an awk variable (search) in above example. You then assign it your bash variable ($var).
grep -n -m 1 SEARCH_TERM FILE_PATH | grep -Po '^[0-9]+'
explanation:
-Po = -P -o
-P use perl regex
-o only print matched string (not the whole line)
Try pipping;
grep -P 'SEARCH TERM' fileName.txt | wc -l

Assigning deciles using bash

I'm learning bash, and here's a short script to assign deciles to the second column of file $1.
The complicating bit is the use of awk within the script, leading to ambiguous redirects when I run the script.
I would have gotten this done in SAS by now, but like the idea of two lines of code doing the job.
How can I communicate the total number of rows (${N}) to awk within the script? Thanks.
N=$(wc -l < $1)
cat $1 | sort -t' ' -k2gr,2 | awk '{$3=int((((NR-1)*10.0)/"${N}")+1);print $0}'
You can set an awk variable from the command line using -v.
N=$(wc -l < "$1" | tr -d ' ')
sort -t' ' -k2gr,2 "$1" | awk -v n=$N '{$3=int((((NR-1)*10.0)/n)+1);print $0}'
I added tr -d to get rid of the leading spaces that wc -l puts in its result.

results of wc as variables

I would like to use the lines coming from 'wc' as variables. For example:
echo 'foo bar' > file.txt
echo 'blah blah blah' >> file.txt
wc file.txt
2 5 23 file.txt
I would like to have something like $lines, $words and $characters associated to the values 2, 5, and 23. How can I do that in bash?
In pure bash: (no awk)
a=($(wc file.txt))
lines=${a[0]}
words=${a[1]}
chars=${a[2]}
This works by using bash's arrays. a=(1 2 3) creates an array with elements 1, 2 and 3. We can then access separate elements with the ${a[indice]} syntax.
Alternative: (based on gonvaled solution)
read lines words chars <<< $(wc x)
Or in sh:
a=$(wc file.txt)
lines=$(echo $a|cut -d' ' -f1)
words=$(echo $a|cut -d' ' -f2)
chars=$(echo $a|cut -d' ' -f3)
There are other solutions but a simple one which I usually use is to put the output of wc in a temporary file, and then read from there:
wc file.txt > xxx
read lines words characters filename < xxx
echo "lines=$lines words=$words characters=$characters filename=$filename"
lines=2 words=5 characters=23 filename=file.txt
The advantage of this method is that you do not need to create several awk processes, one for each variable. The disadvantage is that you need a temporary file, which you should delete afterwards.
Be careful: this does not work:
wc file.txt | read lines words characters filename
The problem is that piping to read creates another process, and the variables are updated there, so they are not accessible in the calling shell.
Edit: adding solution by arnaud576875:
read lines words chars filename <<< $(wc x)
Works without writing to a file (and do not have pipe problem). It is bash specific.
From the bash manual:
Here Strings
A variant of here documents, the format is:
<<<word
The word is expanded and supplied to the command on its standard input.
The key is the "word is expanded" bit.
lines=`wc file.txt | awk '{print $1}'`
words=`wc file.txt | awk '{print $2}'`
...
you can also store the wc result somewhere first.. and then parse it.. if you're picky about performance :)
Just to add another variant --
set -- `wc file.txt`
chars=$1
words=$2
lines=$3
This obviously clobbers $* and related variables. Unlike some of the other solutions here, it is portable to other Bourne shells.
I wanted to store the number of csv file in a variable. The following worked for me:
CSV_COUNT=$(ls ./pathToSubdirectory | grep ".csv" | wc -l | xargs)
xargs removes the whitespace from the wc command
I ran this bash script not in the same folder as the csv files. Thus, the pathToSubdirectory
You can assign output to a variable by opening a sub shell:
$ x=$(wc some-file)
$ echo $x
1 6 60 some-file
Now, in order to get the separate variables, the simplest option is to use awk:
$ x=$(wc some-file | awk '{print $1}')
$ echo $x
1
declare -a result
result=( $(wc < file.txt) )
lines=${result[0]}
words=${result[1]}
characters=${result[2]}
echo "Lines: $lines, Words: $words, Characters: $characters"

How to split a string in shell and get the last field

Suppose I have the string 1:2:3:4:5 and I want to get its last field (5 in this case). How do I do that using Bash? I tried cut, but I don't know how to specify the last field with -f.
You can use string operators:
$ foo=1:2:3:4:5
$ echo ${foo##*:}
5
This trims everything from the front until a ':', greedily.
${foo <-- from variable foo
## <-- greedy front trim
* <-- matches anything
: <-- until the last ':'
}
Another way is to reverse before and after cut:
$ echo ab:cd:ef | rev | cut -d: -f1 | rev
ef
This makes it very easy to get the last but one field, or any range of fields numbered from the end.
It's difficult to get the last field using cut, but here are some solutions in awk and perl
echo 1:2:3:4:5 | awk -F: '{print $NF}'
echo 1:2:3:4:5 | perl -F: -wane 'print $F[-1]'
Assuming fairly simple usage (no escaping of the delimiter, for example), you can use grep:
$ echo "1:2:3:4:5" | grep -oE "[^:]+$"
5
Breakdown - find all the characters not the delimiter ([^:]) at the end of the line ($). -o only prints the matching part.
You could try something like this if you want to use cut:
echo "1:2:3:4:5" | cut -d ":" -f5
You can also use grep try like this :
echo " 1:2:3:4:5" | grep -o '[^:]*$'
One way:
var1="1:2:3:4:5"
var2=${var1##*:}
Another, using an array:
var1="1:2:3:4:5"
saveIFS=$IFS
IFS=":"
var2=($var1)
IFS=$saveIFS
var2=${var2[#]: -1}
Yet another with an array:
var1="1:2:3:4:5"
saveIFS=$IFS
IFS=":"
var2=($var1)
IFS=$saveIFS
count=${#var2[#]}
var2=${var2[$count-1]}
Using Bash (version >= 3.2) regular expressions:
var1="1:2:3:4:5"
[[ $var1 =~ :([^:]*)$ ]]
var2=${BASH_REMATCH[1]}
$ echo "a b c d e" | tr ' ' '\n' | tail -1
e
Simply translate the delimiter into a newline and choose the last entry with tail -1.
Using sed:
$ echo '1:2:3:4:5' | sed 's/.*://' # => 5
$ echo '' | sed 's/.*://' # => (empty)
$ echo ':' | sed 's/.*://' # => (empty)
$ echo ':b' | sed 's/.*://' # => b
$ echo '::c' | sed 's/.*://' # => c
$ echo 'a' | sed 's/.*://' # => a
$ echo 'a:' | sed 's/.*://' # => (empty)
$ echo 'a:b' | sed 's/.*://' # => b
$ echo 'a::c' | sed 's/.*://' # => c
There are many good answers here, but still I want to share this one using basename :
basename $(echo "a:b:c:d:e" | tr ':' '/')
However it will fail if there are already some '/' in your string.
If slash / is your delimiter then you just have to (and should) use basename.
It's not the best answer but it just shows how you can be creative using bash commands.
If your last field is a single character, you could do this:
a="1:2:3:4:5"
echo ${a: -1}
echo ${a:(-1)}
Check string manipulation in bash.
Using Bash.
$ var1="1:2:3:4:0"
$ IFS=":"
$ set -- $var1
$ eval echo \$${#}
0
echo "a:b:c:d:e"|xargs -d : -n1|tail -1
First use xargs split it using ":",-n1 means every line only have one part.Then,pring the last part.
Regex matching in sed is greedy (always goes to the last occurrence), which you can use to your advantage here:
$ foo=1:2:3:4:5
$ echo ${foo} | sed "s/.*://"
5
A solution using the read builtin:
IFS=':' read -a fields <<< "1:2:3:4:5"
echo "${fields[4]}"
Or, to make it more generic:
echo "${fields[-1]}" # prints the last item
for x in `echo $str | tr ";" "\n"`; do echo $x; done
improving from #mateusz-piotrowski and #user3133260 answer,
echo "a:b:c:d::e:: ::" | tr ':' ' ' | xargs | tr ' ' '\n' | tail -1
first, tr ':' ' ' -> replace ':' with whitespace
then, trim with xargs
after that, tr ' ' '\n' -> replace remained whitespace to newline
lastly, tail -1 -> get the last string
For those that comfortable with Python, https://github.com/Russell91/pythonpy is a nice choice to solve this problem.
$ echo "a:b:c:d:e" | py -x 'x.split(":")[-1]'
From the pythonpy help: -x treat each row of stdin as x.
With that tool, it is easy to write python code that gets applied to the input.
Edit (Dec 2020):
Pythonpy is no longer online.
Here is an alternative:
$ echo "a:b:c:d:e" | python -c 'import sys; sys.stdout.write(sys.stdin.read().split(":")[-1])'
it contains more boilerplate code (i.e. sys.stdout.read/write) but requires only std libraries from python.

Resources