I'm trying to extract a list of dates from a series of links using lynx's dump function and piping the output through grep and awk. This operation works successfully in the terminal and outputs dates accurately. However, when it is placed into a shell script, bash claims a syntax error:
Scripts/ETC/PreD.sh: line 18: syntax error near unexpected token `('
Scripts/ETC/PreD.sh: line 18: ` lynx --dump "$link" | grep -m 1 Date | awk '{print substr($0,10)}' >> dates.txt'
For context, this is part of a while-read loop in which $link is being read from a file. Operations undertaken inside this while-loop when the awk command is removed are all successful, as are similar while-loops that include other awk commands.
I know that either I'm misunderstanding how bash handles variable substitution, or how bash handles awk commands, or some combination of the two. Any help would be immensely appreciated.
EDIT: ShellCheck is divided on this: the website version finds no error, but my downloaded version reports SC1083, which says:
This { is literal. Check expression (missing ;/\n?) or quote it.
A check on the Shellcheck GitHub page provides this:
This error is harmless when the curly brackets are supposed to be literal, in e.g. awk {'print $1'}.
However, it's cleaner and less error prone to simply include them inside the quotes: awk '{print $1}'.
Script follows:
#!/bin/bash
while read -u 4 link
do
IFS=/ read a b c d e <<< "$link"
echo "$e" >> 1.txt
lynx --dump "$link" | grep -A 1 -e With: | tr -d [:cntrl:][:digit:][] | sed 's/\With//g' | awk '{print substr($0,10)}' | sed 's/\(.*\),/\1'\ and'/' | tr -s ' ' >> 2.txt
lynx --dump "$link" | grep -m 1 Date | awk '{print substr($0,10)}' >> dates.txt
done 4< links.txt
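In the sed command, the single quotes are closed and reopened in the middle of the expression ('\ and'). That is fragile and easy to break; keep the whole expression inside one pair of quotes: sed 's/\(.*\),/\1 and/'.
In the awk script, the third argument passed to substr() (a bare length, as in the From lines below) is not needed: when it is omitted, substr() returns the rest of the line starting at character 10.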
From the gawk manual:
substr(string, start [, length ])
Return a length-character-long substring of string, starting at character number start. The first character of a string is character number one. For example, substr("washington", 5, 3) returns "ing".
If length is not present, substr() returns the whole suffix of string that begins at character number start. For example, substr("washington", 5) returns "ington". The whole suffix is also returned if length is greater than the number of characters remaining in the string, counting from character start.
If start is less than one, substr() treats it as if it was one. (POSIX doesn’t specify what to do in this case: BWK awk acts this way, and therefore gawk does too.) If start is greater than the number of characters in the string, substr() returns the null string. Similarly, if length is present but less than or equal to zero, the null string is returned.
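The three cases described in the manual excerpt can be checked directly from the shell. This is a minimal sketch; the variable names mid, tail and empty are only illustrative.

```shell
#!/bin/bash
# awk runs the BEGIN block once without reading any input.
mid=$(awk 'BEGIN { print substr("washington", 5, 3) }')    # start 5, length 3
tail=$(awk 'BEGIN { print substr("washington", 5) }')      # no length: rest of the string
empty=$(awk 'BEGIN { print substr("washington", 5, 0) }')  # length 0: null string

printf 'mid=%s tail=%s empty=[%s]\n' "$mid" "$tail" "$empty"
# prints: mid=ing tail=ington empty=[]
```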
I also suggest combining the grep | awk | sed | tr stages into a single awk script, and debugging that script with printouts.
From:
lynx --dump "$link" | grep -A 1 -e With: | tr -d [:cntrl:][:digit:][] | sed 's/\With//g' | awk '{print substr($0,10,length)}' | sed 's/\(.*\),/\1'\ and'/' | tr -s ' ' >> 2.txt
To:
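lynx --dump "$link" | awk '/With/{found=1;next} found{found=0; s=substr($0,10); if (match(s,/.*,/)) s=substr(s,1,RLENGTH-1) " and" substr(s,RLENGTH+1); gsub(/ +/," ",s); print s}' >> 2.txt
(sub() and gsub() edit a variable in place and return the number of replacements, so the line is built up in s and then printed.)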
From:
lynx --dump "$link" | grep -m 1 Date | awk '{print substr($0,10,length)}' >> dates.txt
To:
lynx --dump "$link" | awk '/Date/{print substr($0,10); exit}' >> dates.txt
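To try the single-awk approach and the "debug with printouts" advice without fetching a page, here is a small sketch. The fake_dump function is a hypothetical stand-in for lynx --dump "$link", and its output layout is invented for illustration; the real page layout may differ, so the column offset 10 is an assumption carried over from the question.

```shell
#!/bin/bash
# Hypothetical stand-in for `lynx --dump "$link"`; the layout is invented.
fake_dump() {
  printf '%s\n' '   Title: Example' '   Date: 12 March 2019' '   Date: 13 March 2019'
}

# On the first line containing "Date": write a debug message to stderr,
# print everything from character 10 onward, then stop (like grep -m 1).
first_date=$(fake_dump | awk '/Date/ {
  print "debug: matched line " NR > "/dev/stderr"
  print substr($0, 10)
  exit
}')

echo "$first_date"
```

Sending the debug line to stderr keeps it out of the captured result, so it can stay in place while the output is redirected to dates.txt.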
My file looks like this:
//
[297]((((21:0.125204,20:0.125204):0.00994299,(28:0.0790047,(7:0.0146105,5:0.0146105):0.0643943):0.0561423):0.0578754,(((23:0.0386924,((((26:0.0160606,22:0.0160606):0.00378,(19:0.0160596,16:0.0160596):0.00378096):0.00242531,12:0.0222659):0.0146336,((29:0.0160393,(17:0.00712055,14:0.00712055):0.00891871):0.0195068,11:0.0355461):0.00135346):0.00179282):0.0468499,4:0.0855423):0.0451632,((25:0.059669,(30:0.0155625,13:0.0155625):0.0441064):0.0223692,(3:0.0288957,1:0.0288957):0.0531425):0.0486673):0.062317):0.60861,((((((62:0.00660739,58:0.00660739):0.011345,(70:0.00496959,54:0.00496959):0.0129828):0.0065665,((68:0.00291155,53:0.00291155):0.0178013,(66:0.0163583,((65:0.0045002,(69:0.00305355,59:0.00305355):0.00144664):0.000757378,(61:0.00311373,52:0.00311373):0.00214385):0.0111007):0.00435459):0.003806):0.123648,(76:0.0395418,(40:0.00641035,34:0.00641035):0.0331314):0.108625):0.0327298,((((46:0.00103749,42:0.00103749):0.0373456,(48:0.0259862,41:0.0259862):0.0123969):0.00173179,(47:0.0275497,39:0.0275497):0.0125652):0.106275,((((44:0.00708562,36:0.00708562):0.0773928,(37:0.025,27:0.025):0.0594785):0.00501024,18:0.0894887):0.0248315,(15:0.0649576,6:0.0649576):0.0493626):0.0320701):0.0345064):0.0680223,((((80:0.0173948,73:0.0173948):0.0162433,(67:0.0129751,((63:0.00435012,57:0.00435012):0.00727273,(60:0.00848091,(64:0.00386096,((56:0.00203231,55:0.00203231):0.00103,51:0.0030623):0.000798654):0.00461996):0.00314194):0.00135223):0.0206631):0.0296773,(33:0.0415374,((75:0.0372575,(45:0.0371022,38:0.0371022):0.000155282):0.0029007,((43:0.0101608,32:0.0101608):0.0242563,31:0.0344171):0.00574108):0.00137926):0.021778):0.147776,((((74:0.0336172,((79:0.0258073,(77:0.0203659,(78:0.00390563,72:0.00390563):0.0164602):0.00544144):0.00767555,49:0.0334829):0.000134364):0.0132633,(35:0.0137148,24:0.0137148):0.0331656):0.0721567,(10:0.0147938,8:0.0147938):0.104243):0.0343567,((71:0.0427659,50:0.0427659):0.0221428,(9:0.0467372,2:0.0467372):0.0181715):0.0884852):0.0576977):0.0378275):0.552713);
[2271]((((21:0.125204,20:0.125204):0.00994299,(28:0.0790047,(7:0.0146105,5:0.0146105):0.0643943):0.0561423):0.0578754,(((23:0.0386924,((((26:0.0160606,22:0.0160606):0.00378,(19:0.0160596,16:0.0160596):0.00378096):0.00242531,12:0.0222659):0.0146336,((29:0.0160393,(17:0.00712055,14:0.00712055):0.00891871):0.0195068,11:0.0355461):0.00135346):0.00179282):0.0468499,4:0.0855423):0.0451632,((25:0.059669,(30:0.0155625,13:0.0155625):0.0441064):0.0223692,(3:0.0288957,1:0.0288957):0.0531425):0.0486673):0.062317):0.60861,((((47:0.0363305,(((62:0.00660739,58:0.00660739):0.011345,(70:0.00496959,54:0.00496959):0.0129828):0.0065665,((68:0.00291155,53:0.00291155):0.0178013,(66:0.0163583,((65:0.0045002,(69:0.00305355,59:0.00305355):0.00144664):0.000757378,(61:0.00311373,52:0.00311373):0.00214385):0.0111007):0.00435459):0.003806):0.0118116):0.111837,(76:0.0395418,(40:0.00641035,34:0.00641035):0.0331314):0.108625):0.0327298,((((46:0.00103749,42:0.00103749):0.0373456,(48:0.0259862,41:0.0259862):0.0123969):0.00173179,39:0.0401149):0.106275,((((44:0.00708562,36:0.00708562):0.0773928,(37:0.025,27:0.025):0.0594785):0.00501024,18:0.0894887):0.0248315,(15:0.0649576,6:0.0649576):0.0493626):0.0320701):0.0345064):0.0680223,((((80:0.0173948,73:0.0173948):0.0162433,(67:0.0129751,((63:0.00435012,57:0.00435012):0.00727273,(60:0.00848091,(64:0.00386096,((56:0.00203231,55:0.00203231):0.00103,51:0.0030623):0.000798654):0.00461996):0.00314194):0.00135223):0.0206631):0.0296773,(33:0.0415374,((75:0.0372575,(45:0.0371022,38:0.0371022):0.000155282):0.0029007,((43:0.0101608,32:0.0101608):0.0242563,31:0.0344171):0.00574108):0.00137926):0.021778):0.147776,((((74:0.0336172,((79:0.0258073,(77:0.0203659,(78:0.00390563,72:0.00390563):0.0164602):0.00544144):0.00767555,49:0.0334829):0.000134364):0.0132633,(35:0.0137148,24:0.0137148):0.0331656):0.0721567,(10:0.0147938,8:0.0147938):0.104243):0.0343567,((71:0.0427659,50:0.0427659):0.0221428,(9:0.0467372,2:0.0467372):0.0181715):0.0884852):0.0576977):0.0378275):0.552713)
;
[687]((((21:0.125204,20:0.125204):0.00994299,(28:0.0790047,(7:0.0146105,5:0.0146105):0.0643943):0.0561423):0.0578754,((4:0.128716,(23:0.0386924,((((26:0.0160606,22:0.0160606):0.00378,(19:0.0160596,16:0.0160596):0.00378096):0.00242531,12:0.0222659):0.0146336,((29:0.0160393,(17:0.00712055,14:0.00712055):0.00891871):0.0195068,11:0.0355461):0.00135346):0.00179282):0.0900232):0.0019898,((25:0.059669,(30:0.0155625,13:0.0155625):0.0441064):0.0223692,(3:0.0288957,1:0.0288957):0.0531425):0.0486673):0.062317):0.60861,((((47:0.0363305,(((62:0.00660739,58:0.00660739):0.011345,(70:0.00496959,54:0.00496959):0.0129828):0.0065665,((68:0.00291155,53:0.00291155):0.0178013,(66:0.0163583,((65:0.0045002,(69:0.00305355,59:0.00305355):0.00144664):0.000757378,(61:0.00311373,52:0.00311373):0.00214385):0.0111007):0.00435459):0.003806):0.0118116):0.111837,(76:0.0395418,(40:0.00641035,34:0.00641035):0.0331314):0.108625):0.0327298,((((46:0.00103749,42:0.00103749):0.0373456,(48:0.0259862,41:0.0259862):0.0123969):0.00173179,39:0.0401149):0.106275,((((44:0.00708562,36:0.00708562):0.0773928,(37:0.025,27:0.025):0.0594785):0.00501024,18:0.0894887):0.0248315,(15:0.0649576,6:0.0649576):0.0493626):0.0320701):0.0345064):0.0680223,((((80:0.0173948,73:0.0173948):0.0162433,(67:0.0129751,((63:0.00435012,57:0.00435012):0.00727273,(60:0.00848091,(64:0.00386096,((56:0.00203231,55:0.00203231):0.00103,51:0.0030623):0.000798654):0.00461996):0.00314194):0.00135223):0.0206631):0.0296773,(33:0.0415374,((75:0.0372575,(45:0.0371022,38:0.0371022):0.000155282):0.0029007,((43:0.0101608,32:0.0101608):0.0242563,31:0.0344171):0.00574108):0.00137926):0.021778):0.147776,((((74:0.0336172,((79:0.0258073,(77:0.0203659,(78:0.00390563,72:0.00390563):0.0164602):0.00544144):0.00767555,49:0.0334829):0.000134364):0.0132633,(35:0.0137148,24:0.0137148):0.0331656):0.0721567,(10:0.0147938,8:0.0147938):0.104243):0.0343567,((71:0.0427659,50:0.0427659):0.0221428,(9:0.0467372,2:0.0467372):0.0181715):0.0884852):0.0576977):0.0378275):0.552713);
[186]((((21:0.125204,20:0.125204):0.00994299,(28:0.0790047,(7:0.0146105,5:0.0146105):0.0643943):0.0561423):0.0578754,((4:0.128716,(23:0.0386924,((((26:0.0160606,22:0.0160606):0.00378,(19:0.0160596,16:0.0160596):0.00378096):0.00242531,12:0.0222659):0.0146336,((29:0.0160393,(17:0.00712055,14:0.00712055):0.00891871):0.0195068,11:0.0355461):0.00135346):0.00179282):0.0900232):0.0019898,((25:0.059669,(30:0.0155625,13:0.0155625):0.0441064):0.0223692,(3:0.0288957,1:0.0288957):0.0531425):0.0486673):0.062317):0.60861,((((47:0.0363305,(((62:0.00660739,58:0.00660739):0.011345,(70:0.00496959,54:0.00496959):0.0129828):0.0065665,((68:0.00291155,53:0.00291155):0.0178013,(66:0.0163583,((65:0.0045002,(69:0.00305355,59:0.00305355):0.00144664):0.000757378,(61:0.00311373,52:0.00311373):0.00214385):0.0111007):0.00435459):0.003806):0.0118116):0.111837,(76:0.0395418,(40:0.00641035,34:0.00641035):0.0331314):0.108625):0.0327298,((((44:0.00708562,36:0.00708562):0.0773928,(37:0.025,27:0.025):0.0594785):0.00501024,18:0.0894887):0.0248315,(15:0.0649576,6:0.0649576):0.0493626):0.0665766):0.0680223,((((80:0.0173948,73:0.0173948):0.0162433,(67:0.0129751,((63:0.00435012,57:0.00435012):0.00727273,(60:0.00848091,(64:0.00386096,((56:0.00203231,55:0.00203231):0.00103,51:0.0030623):0.000798654):0.00461996):0.00314194):0.00135223):0.0206631):0.0296773,(33:0.0415374,((75:0.0372575,(45:0.0371022,38:0.0371022):0.000155282):0.0029007,((43:0.0101608,32:0.0101608):0.0242563,31:0.0344171):0.00574108):0.00137926):0.021778):0.147776,((((74:0.0336172,((79:0.0258073,(77:0.0203659,(78:0.00390563,72:0.00390563):0.0164602):0.00544144):0.00767555,49:0.0334829):0.000134364):0.0132633,(35:0.0137148,24:0.0137148):0.0331656):0.0721567,(10:0.0147938,8:0.0147938):0.104243):0.0343567,((((46:0.00103749,42:0.00103749):0.0373456,(48:0.0259862,41:0.0259862):0.0123969):0.00173179,39:0.0401149):0.0339623,((71:0.0427659,50:0.0427659):0.0221428,(9:0.0467372,2:0.0467372):0.0181715):0.00916857):0.0793167):0.0576977):0.0378275):0.552713)
;
So after the first line, every line starts with a number in brackets. I need to extract the number in brackets and write it to a new file (without the brackets). How would that be done?
grep -Po '(?<=\[)\d+(?=\])' file > new_file
-P enables Perl-compatible regular expressions, which make it possible to use:
\d for a digit
a positive lookbehind, (?<=\[), and a positive lookahead, (?=\])
-o prints only the matching part of each line
Another possibility if your grep doesn't support the -P option but awk is available could be this:
awk -F '[][]' '{ if ($2 != "") print $2 }' file > new_file
-F '[][]' tells awk to treat both ] and [ as field delimiters; $2 then contains the number you want and is printed.
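A quick way to see how the fields split is to run the command on a small sample. The file contents below are a shortened, invented version of the data in the question, and the variable names are only illustrative.

```shell
#!/bin/bash
# Shortened, made-up sample in the same shape as the file in the question.
sample=$(mktemp)
cat > "$sample" <<'EOF'
//
[297]((2:0.1,1:0.1):0.5);
[2271]((2:0.2,1:0.2):0.4)
;
EOF

# With [ and ] as separators, "[297](...)" splits into "", "297", "(...)".
# Lines without brackets have an empty $2 and are skipped.
ids=$(awk -F '[][]' '{ if ($2 != "") print $2 }' "$sample")

echo "$ids"
rm -f "$sample"
```

Because the line starts with [, the first field is empty and the number lands in $2.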
In three steps using simple commands (the grep keeps only lines that start with [, so // and a bare ; are skipped):
grep '^\[' inputfile | cut -d"[" -f2 | cut -d"]" -f1
With sed you can remove everything outside the []:
grep '^\[' inputfile | sed 's/.*\[\(.*\)].*/\1/'
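The cut and sed approaches can be compared on a small sample. This is a sketch under the assumption that the file contains lines without brackets (// and a bare ;), as in the question; cut and a plain sed substitution pass such lines through unchanged, so the sketch filters on a leading [ first, and the sed variant uses -n with the p flag to print only lines where the substitution succeeded. The sample data is invented.

```shell
#!/bin/bash
# Shortened, made-up sample in the same shape as the file in the question.
sample=$(mktemp)
cat > "$sample" <<'EOF'
//
[297]((2:0.1,1:0.1):0.5);
[2271]((2:0.2,1:0.2):0.4)
;
EOF

# cut: keep bracketed lines, take the text after "[", then the text before "]".
with_cut=$(grep '^\[' "$sample" | cut -d'[' -f2 | cut -d']' -f1)

# sed: -n suppresses default output; p prints only lines that were rewritten.
with_sed=$(sed -n 's/^\[\([^]]*\)].*/\1/p' "$sample")

echo "$with_cut"
echo "$with_sed"
rm -f "$sample"
```

Both variables should hold the same two numbers, one per line.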