An error with asterisk in if statement - bash

I'm having a problem with the following code:
nawk -F "," '{if($2<=2)&&($9!=45)&&($11==2348*)) print $2}' abc12* | wc -l
The error is in ($11==2348*). I tried putting this number in a variable x and using ($11==$x*) instead.

If you mean a regex match, change it to
$ awk -F, '$2<=2 && $9!=45 && $11~/^2348/ {c++; print $2} END{print c}' abc12*
Note that you can incorporate the line count in the script as well.
If you want an equality check, $11=="2348*" would do; it checks that the field is literally 2348*, without any special meaning for *.
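A quick way to see the difference, using a few throwaway sample values (these lines are just illustrations):
printf '2348\n23481\n2348*\n' | awk '$1 ~ /^2348/'    # regex match: prints all three lines
printf '2348\n23481\n2348*\n' | awk '$1 == "2348*"'   # string equality: prints only the literal 2348*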

Looks like you intend to use regexp?
$11==2348*
should give you a syntax error as
2348*
is an incomplete multiplication.
For a regular expression match you would have to use
$11 ~ /2348*/
if you intend to have zero or more "8"s, or
$11 ~ /2348.*/ or maybe $11 ~ /2348[0-9]*/
if the initial intent is to allow any characters, or only digits, after "2348"
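A minimal illustration of the difference between those patterns (the test values are made up):
printf '234\n2348\n234899\n2348xy\n' | awk '/2348*/'        # the * applies only to the 8, so even 234 matches
printf '234\n2348\n234899\n2348xy\n' | awk '/2348[0-9]*/'   # needs a literal 2348, optionally followed by digits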

I think your code would work just fine if you hadn't added one more ")" than expected. If you count them you have 7... so this ($11==2348*)) should actually be ($11==2348*)

Parameter expansion not working when used inside Awk on one of the column entries

System: Linux. Bash 4.
I have the following file, which will be read into a script as a variable:
/path/sample_A.bam A 1
/path/sample_B.bam B 1
/path/sample_C1.bam C 1
/path/sample_C2.bam C 2
I want to append "_string" at the end of the filename in the first column, but before the extension (.bam). It's a bit trickier because the filename includes the path at the beginning.
Desired output:
/path/sample_A_string.bam A 1
/path/sample_B_string.bam B 1
/path/sample_C1_string.bam C 1
/path/sample_C2_string.bam C 2
My attempt:
I wrote the following script (run with: bash script.sh):
List=${1};
awk -F'\t' -vOFS='\t' '{ $1 = "${1%.bam}" "_string.bam" }1' < ${List} ;
And its output was:
${1%.bam}_string.bam
${1%.bam}_string.bam
${1%.bam}_string.bam
${1%.bam}_string.bam
Problem:
I followed the idea of using awk for this substitution, as in this thread https://unix.stackexchange.com/questions/148114/how-to-add-words-to-an-existing-column , but the parameter expansion ${1%.bam} is clearly not being recognised by awk as I intend. Does someone know the correct syntax for that part of the code? It was meant to mean "the whole first column, except the trailing .bam". I used ${1%.bam} because it works in Bash, but awk is another language and this probably differs. Thank you!
Note that the parameter expansion you applied to $1 won't happen inside awk, because the entire body of the awk command is passed in single quotes ('..'), which sends the content literally, without any shell parsing. Hence the string "${1%.bam}" is assigned as-is to the first column.
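A shell-only illustration of that quoting point (x is just a throwaway variable):
x=sample.bam
echo '${x%.bam}'    # single quotes: prints the literal text ${x%.bam}, no expansion
echo "${x%.bam}"    # double quotes: the shell expands it and prints sample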
You can do this completely in Awk
awk -F'\t' 'BEGIN { OFS = FS }{ n=split($1, arr, "."); $1 = arr[1]"_string."arr[2] }1' file
The code splits the content of $1 on the delimiter . into an array arr inside awk. The part of the string up to the first . is stored in arr[1] and the subsequent split fields are stored in the following array indices. We then reconstruct the filename by concatenating the array entries, inserting _string into the filename part before the extension.
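Run on the sample file from the question (assuming it is tab-separated and saved as file), this reproduces the desired output:
$ awk -F'\t' 'BEGIN { OFS = FS }{ n=split($1, arr, "."); $1 = arr[1]"_string."arr[2] }1' file
/path/sample_A_string.bam A 1
/path/sample_B_string.bam B 1
/path/sample_C1_string.bam C 1
/path/sample_C2_string.bam C 2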
If I understood your requirement correctly, could you please try the following.
val="_string"
awk -v value="$val" '{sub(".bam",value"&")} 1' Input_file
Brief explanation: -v value="$val" passes the shell variable named val into the awk variable named value. Then the sub function of awk substitutes the string .bam with value followed by the matched .bam itself, which is denoted by &. The trailing 1 means print the line, edited or not.
Why OP's attempt didn't work: in awk we can't use shell variables directly without passing them into awk explicitly. So what you are trying is NOT taken as an awk variable; it is taken as a literal string and printed as-is. The explanation above shows how to hand shell variables to awk.
NOTE: In case you have multiple occurrences of .bam, change sub to gsub in the above code. Also, in case your Input_file is TAB-delimited, use awk -F'\t' in the above code.
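A tiny check of how & works in the replacement, using the first sample line from the question:
$ echo '/path/sample_A.bam A 1' | awk -v value="_string" '{sub(".bam",value"&")} 1'
/path/sample_A_string.bam A 1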
sed -i 's/\.bam/_string\.bam/g' myfile.txt
It's a single line with sed. Just replace the .bam with _string.bam
You can try this way with awk :
awk -v a='_string' 'BEGIN{FS=OFS="."}{$1=$1 a}1' infile

printing lines based on pattern matching in multiple fields using awk

Suppose I have a html input like
<li>this is a html input line</li>
I want to filter all such input lines from a file which begins with <li> and ends with </li>. Now my idea was to search for pattern <li> in the first field and pattern </li> in the last field using the below awk command
awk '$1 ~ /\<li\>/ ; $NF ~ /\</li\>/ {print $0}'
but it looks like there is no provision to match two fields at a time, or I am making some syntax mistake. Could you please help me here?
PS: I am working on a Solaris SunOS machine.
There's a lot going wrong in your script on Solaris:
awk '$1 ~ /\<li\>/ ; $NF ~ /\</li\>/ {print $0}'
The default awk on Solaris (and so the one we have to assume you are using, since you didn't state otherwise) is old, broken awk, which must never be used. On Solaris use /usr/xpg4/bin/awk. There's also nawk, but it has fewer POSIX features (e.g., no support for character classes).
\<...\> are gawk-specific word boundaries. There is no awk on Solaris that would recognize those. If you were just trying to get literal characters then there's no need to escape them as they are not regexp metacharacters.
If you want to test for condition 1 and condition 2 you put && between them, not ; which is just the statement terminator in lieu of a newline.
The default action given a true condition is {print $0} so you don't need to explicitly write that code.
/ is the awk regexp delimiter so you do need to escape that in mid-regexp.
The default field separator is white space so in your posted sample input $1 and $NF will be <li>this and line</li>, not <li> and </li>.
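A quick check of that last point on the sample line:
$ echo '<li>this is a html input line</li>' | awk '{print $1; print $NF}'
<li>this
line</li>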
So if you DID for some reason compare multiple fields you could do:
awk '($1 ~ /^<li>.*/) && ($NF ~ /.*<\/li>$/)'
but this is probably what you really want:
awk '/^<li>.*<\/li>/'
in which case you could just use grep:
grep '^<li>.*</li>'
Why not just use a regex to match the start and end of the line like
awk '/^[[:space:]]*<li>.*<\/li>[[:space:]]*$/ {print}'
though in general, if you're trying to process HTML, you'll be better off using a tool that's really designed to handle that.

awk output is acting weird

cat TEXT | awk -v var=$i -v varB=$j '$1~var , $1~varB {print $1}' > PROBLEM HERE
I am passing two variables from an array to parse a very large text file by range. And it works, kind of.
If I use ">", the output to the file will ONLY be the last three lines, as verified by cat and a text editor.
If I use ">>", the output to the file will include one complete read of TEXT and then it will divide the second read into the ranges I want.
If I let the output go through to the shell, I get the same problem as above.
Question:
It appears awk is reading every line and printing it. Then it goes back and selects the ranges from the TEXT file. It does not do this if I use constants in the range pattern search.
I understand awk must read all lines to find the ranges I request.
why is it printing the entire document?
How can I get it to ONLY print the ranges selected?
This is the last hurdle in a big project and I am beating my head against the table.
Thanks!
Give this a try; you didn't assign varB the right way:
yours: awk -v var="$i" -varB="$j" ...
mine : awk -v var="$i" -v varB="$j" ...
^^
Aside from the typo, you can't use variables inside //; instead you have to use a regular ~ match. Also, quote your shell variables (not strictly needed here, but to set an example). For example,
seq 1 10 | awk -v b="3" -v e="5" '$0 ~ b, $0 ~ e'
should print 3..5 as expected
It sounds like this is what you want:
awk -v var="foo" -v varB="bar" '$1~var{f=1} f{print $1} $1~varB{f=0}' file
e.g.
$ cat file
1
2
foo
3
4
bar
5
foo
6
bar
7
$ awk -v var="foo" -v varB="bar" '$1~var{f=1} f{print $1} $1~varB{f=0}' file
foo
3
4
bar
foo
6
bar
but without sample input and expected output it's just a guess, and this would not address the shell behavior you are seeing with respect to use of > vs >>.
Here's what happened. I used an array to input into my variables. I set the counter for what I thought was the total length of the array. When the final iteration of the array was reached, there was a null value returned to awk for the variable. This caused it to print EVERYTHING. Once I correctly had a counter with the correct number of array elements the printing oddity ended.
As far as the > vs >> goes, I don't know. It did stop, but I wasn't as careful in documenting it. I think what happened is that I used $1 in the print command to save time, and with each line it printed at the end it erased the whole file and left the last three identical matches. Something to ponder. Thanks Ed for the honest work. And no thank you to Robo responses.
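For reference, the everything-prints symptom follows from how awk treats an empty pattern; a minimal check, reusing the seq style from above:
seq 1 5 | awk -v b="" '$0 ~ b'    # prints 1 through 5: an empty regex matches every line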

How not to get expanded variables in AWK

Good day,
I was wondering how not to get expanded variables in AWK.
Variable to pass: achi
But, when I try with:
awk -F, -v var1="achi" '$(NF-1)~var1' file
It just does not work. It prints all lines that match achi.
I'd appreciate some insight into how to do it properly.
Input
achi, francia
nachi, peru
universidad achi, japon
achito, suecia
Expected Output
achi, francia
You seem to be trying to test equivalence with the pattern matching operator ~. The proper operator to test equivalence is ==.
awk -F, -v var1="achi" '$(NF-1)==var1' file
If you are expecting more fields, you should take into account that your values are separated by a comma and a space; this can be handled by using ", " as the field separator.
awk -F", " -v var1="achi" '$(NF-1)==var1'
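Run against the sample input (saved here as file), the equality test keeps only the exact match:
$ awk -F, -v var1="achi" '$(NF-1)==var1' file
achi, francia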

shell: write integer division result to a variable and print floating number

I'm trying to write a shell script and plan to calculate a simple division using two variables inside the script. I couldn't get it to work. It's some kind of syntax error.
Here is part of my code, named test.sh
awk '{a+=$5} END {print a}' $variable1 > casenum
awk '{a+=$5} END {print a}' $variable2 > controlnum
score=$(echo "scale=4; $casenum/$controlnum" | bc)
printf "%s\t%s\t%.4f\n", $variable3 $variable4 $score
It's just the $score that doesn't work.
I tried to use either
sh test.sh
or
bash test.sh
but neither worked. The error message is:
(standard_in) 1: syntax error
Does anyone know how to make it work? Thanks so much!
You are outputting to files, not to variables. For this, you need var=$(command). Hence, this should do it:
casenum=$(awk '{a+=$5} END {print a}' $variable1)
controlnum=$(awk '{a+=$5} END {print a}' $variable2)
score=$(echo "scale=4; $casenum/$controlnum" | bc)
printf "%s\t%s\t%.4f\n" $variable3 $variable4 $score
Note $variable1 and $variable2 should be file names. Otherwise, indicate it.
First, $variable1 and $variable2 must expand to the names of existing files; that's not a syntax error, it's just a fact that makes your code wrong, unless you really do mean to work with files and accumulate the sum of the fifth field into a file. Since casenum and controlnum are never assigned (in fact you write the awk result to a file, not into a variable), your score computation expands to
score=$(echo "scale=4; /" | bc)
which is invalid (this is where the syntax error comes from).
Then, the same problem applies to $variable3 and $variable4. Do they hold a value? Have you assigned them with something like
variable=...
? Otherwise they expand to "". Fixing all of this (including assigning casenum and controlnum) will fix everything, since basically the only syntax error is bc trying to interpret the command / without operands. (And the comma after the printf is not needed.)
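You can reproduce that error on its own; this is exactly what the empty variables hand to bc:
$ echo "scale=4; /" | bc
(standard_in) 1: syntax error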
The way you assign the output of a command to a variable is
var=$(command)
or
var=`command`
If I understand your commands properly, you could combine the calculation of score into a single awk statement as follows
score=$(awk 'NR==FNR {a+=$5; next} {b+=$5} END {printf "%.4f", a/b}' $variable1 $variable2)
This is with the assumption that $variable1 and $variable2 are valid file names.
Refer to #fedorqui's solution if you want to stick to your approach of two awk calls and one bc.
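As a quick sanity check of the combined awk, here are two tiny hypothetical files where only the fifth column matters:
$ printf 'a b c d 2\na b c d 3\n' > cases.txt
$ printf 'a b c d 4\na b c d 6\n' > controls.txt
$ awk 'NR==FNR {a+=$5; next} {b+=$5} END {printf "%.4f", a/b}' cases.txt controls.txt
0.5000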
