Awk flag to remove unwanted data - bash

Another awk question.
I have a large text file whose sections are separated by lines containing two numerical values:
43 47
abc
efg
hig
21 122
hijk
lmnop
39 41
somemore
texthere
What I would like to do is print the text only if a condition is satisfied.
Here's what I have tried, with no luck:
awk '{a=$1; b=$2; if (a < 43 && a > 37 && b < 52 && b > 41) {f=1} elif (a > 43 && a < 37 && b > 52 && b < 41) {print; f=0} } f' file
I'd like to print all of the text if the condition is satisfied, and skip the text if it isn't.
Desired output from the above:
43 47
abc
efg
hig
39 41
somemore
texthere

awk '
# on a line with 2 numbers:
NF == 2 && $1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+$/ {
    # set a flag if the numbers fall in the given ranges
    f = (37 <= $1 && $1 <= 43 && 41 <= $2 && $2 <= 52)
}
f
' file
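As a quick sanity check, the script above can be run against the question's sample input via a here-document (a sketch; the temporary directory and file name are arbitrary):

```shell
cd "$(mktemp -d)"

# Sample input from the question
cat > file <<'EOF'
43 47
abc
efg
hig
21 122
hijk
lmnop
39 41
somemore
texthere
EOF

# Each two-number header line sets the flag f; the bare pattern `f` then
# prints the header and every following line while the flag stays true.
awk '
NF == 2 && $1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+$/ {
    f = (37 <= $1 && $1 <= 43 && 41 <= $2 && $2 <= 52)
}
f
' file
```

This prints the 43 47 and 39 41 blocks and skips the 21 122 block, matching the desired output.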

Self-explaining solution:
awk '
function inrange(x, a, b) { return a <= x && x <= b }
/^[0-9]+[\t ]+[0-9]/ {
    f = inrange($1, 37, 43) && inrange($2, 41, 52)
}
f
'

Related

Print the entire row which has difference in value while compare the columns

I want to print any entire row whose values don't match.
E.g.:
Symbol Qty Symbol Qty Symbol qty
a 10 a 10 a 11
b 11 b 11 b 11
c 12 c 12 f 13
f 12 f 12 g 13
OUTPUT :
a 10 a 10 a 11
c 12 c 12 (empty Space)
f 12 f 12 f 13
empty space {ES} g 13
awk 'FNR==NR{a[$0];next}!($0 in a ) ' output1.csv output2.csv >> finn1.csv
awk 'FNR==NR{a[$0];next}!($0 in a ) ' finn1.csv output4.csv >> finn.csv
but this prints only the missing values in a single column, like a 11, whereas I require the whole line.
Assuming that you only want to test for mismatched Qty fields, try this:
#!/bin/bash
declare input_file="/path/to/input_file"
declare -i header_flag=0 a b c
while read -r line; do
    [ ${header_flag} -eq 0 ] && header_flag=1 && continue  # Ignore first line.
    [ ${#line} -eq 0 ] && continue                         # Ignore blank lines.
    read x a x b x c x <<< "${line}"  # Reuse ${x} because it is not used.
    [ ${a} -ne ${b} -o ${a} -ne ${c} -o ${b} -ne ${c} ] && echo "${line}"
done < "${input_file}"
The awk one-liner
awk '!($1 == $3 && $2 == $4 && $3 == $5 && $4 == $6)' file
will output
Symbol Qty Symbol Qty Symbol qty
a 10 a 10 a 11
c 12 c 12 f 13
f 12 f 12 g 13
You're going about this the wrong way: you can't mash all the files up into one and then try to find which rows have different or missing values. You need to process the individual files:
$ cat file1
Symbol Qty
a 10
b 11
c 12
f 12
$ cat file2
Symbol Qty
a 10
b 11
c 12
f 12
$ cat file3
Symbol qty
a 11
b 11
f 13
g 13
Then
assuming you have GNU awk
gawk '
FNR > 1 { qty[$1][FILENAME] = $1 " " $2 }
END {
    OFS = "\t"
    for (sym in qty) {
        missing = !((ARGV[1] in qty[sym]) && (ARGV[2] in qty[sym]) && (ARGV[3] in qty[sym]))
        unequal = !(qty[sym][ARGV[1]] == qty[sym][ARGV[2]] && qty[sym][ARGV[1]] == qty[sym][ARGV[3]])
        if (missing || unequal) {
            print qty[sym][ARGV[1]], qty[sym][ARGV[2]], qty[sym][ARGV[3]]
        }
    }
}
' file{1,2,3}
outputs
a 10 a 10 a 11
c 12 c 12
f 12 f 12 f 13
g 13

How to use awk to search for min and max values of column in certain files

I know that awk is helpful for finding certain things in columns in files, but I'm not sure how to use it to find the min and max values of a column in a group of files. Any advice? To be specific, I have four files in a directory that I want to run through awk.
If you're looking for the absolute maximum and minimum of column N over all the files, then you might use:
N=6
awk -v N=$N 'NR == 1 { min = max = $N }
{ if ($N > max) max = $N; else if ($N < min) min = $N }
END { print min, max }' "$@"
You can change the column number using a command line option or by editing the script (crude but effective; go with option handling), or any other method that takes your fancy.
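For instance, the option handling could be wrapped in a small shell function (a sketch; the name minmax and the default column 6 are my own choices, not from the original answer):

```shell
# Hypothetical wrapper: minmax [-c column] file...
minmax() {
    col=6                     # default column
    OPTIND=1                  # reset so the function can be called repeatedly
    while getopts c: opt; do
        case $opt in
            c) col=$OPTARG ;;
            *) return 2 ;;
        esac
    done
    shift $((OPTIND - 1))
    # Same scan as above: seed min/max from the first record, then update.
    awk -v N="$col" '
        NR == 1 { min = max = $N }
        { if ($N > max) max = $N; else if ($N < min) min = $N }
        END { print min, max }' "$@"
}
```

Usage would then be, e.g., minmax -c 2 file1 file2 file3 file4 to get the minimum and maximum of column 2 across all four files.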
If you want the maximum and minimum of column N for each file, then you have to detect new files, and you probably want to identify the files, too:
awk -v N=$N 'FNR == 1 { if (NR != 1) print file, min, max; min = max = $N; file = FILENAME }
{ if ($N > max) max = $N; else if ($N < min) min = $N }
END { print file, min, max }' "$@"
Try this; it will give the min and max of a column in the file.
Simple:
awk 'BEGIN {max = 0} {if ($6>max) max=$6} END {print max}' yourfile.txt
or
awk 'BEGIN {min=1000000; max=0;}; { if($2<min && $2 != "") min = $2; if($2>max && $2 != "") max = $2; } END {print min, max}' file
or more awkish way:
awk 'NR==1 { max=$1 ; min=$1 }
FNR==NR { if ($1>=max) max=$1 ; $1<=min?min=$1:0 ; next}
{ $2=($1-min)/(max-min) ; print }' file file
sort can do the sorting, and you can pick out the first and last values by any means; for example, with awk:
sort -nk2 file{1..4} | awk 'NR==1{print "min:"$2} END{print "max:"$2}'
sorts numerically by the second field of files file1,file2,file3,file4 and print the min and max values.
Since you didn't provide any input files, here is a worked example for the files
==> file_0 <==
23 29 84
15 58 19
81 17 48
15 36 49
91 26 89
==> file_1 <==
22 63 57
33 10 50
56 85 4
10 63 1
72 10 48
==> file_2 <==
25 67 89
75 72 90
92 37 89
77 32 19
99 16 70
==> file_3 <==
50 93 71
10 20 55
70 7 51
19 27 63
44 3 46
If you run the script, now with a variable column number n,
n=1; sort -k${n}n file_{0..3} |
awk -v n=$n 'NR==1{print "min ("n"):",$n} END{print "max ("n"):",$n}'
you'll get
min (1): 10
max (1): 99
and for the other values of n
n=2; sort ...
min (2): 3
max (2): 93
n=3; sort ...
min (3): 1
max (3): 90

How does a loop work in awk? And how do we get matched data from two files?

I am trying to extract data from two files with a common column but I am unable to fetch the required data.
File1
A B C D E F G
Dec 3 abc 10 2B 21 OK
Dec 1 %xyZ 09 3F 09 NOK
Dec 5 mnp 89 R5 11 OK
File2
H I J K
abc 10 6.3 A9
xyz 00 0.2 2F
pqr 45 6.9 3c
I am able to output columns A B C D E F G, but unable to insert File2's columns between File1's columns.
Trial:
awk 'FNR==1{next}
NR==FNR{a[$1]=$2; next}
{k=$3; sub(/^\%/,"",k)} k in a{print $1,$2,$3,$a[2,3,4],$4,$5,$6,$7; delete a[k]}
END{for(k in a) print k,a[k] > "unmatched"}' File2 File1 > matched
Required output:
matched:
A B I C J K D E F G
Dec 3 10 abc 6.3 A9 10 2B 21 OK
Dec 1 00 %syz 0.2 2F 09 3F 09 NOK
unmatched :
H I J K
pqr 45 6.9 3c
Could you please help me get this output? Thank you.
awk '
FNR == 1 { next }
FNR==NR {
As[ $3] = $0
S3 = $3
gsub( /%/, "", S3)
ALs[ tolower( S3)] = $3
next
}
{
Bs[ tolower( $1)] = $0
}
END {
print "matched:"
print "A B I C J K D E F G"
for ( B in Bs){
if ( B in ALs){
split( As[ ALs[B]] " " Bs[B], Fs)
printf( "%s %s %s %s %s %s %s %s %s %s\n", Fs[1], Fs[2], Fs[9], Fs[3], Fs[10], Fs[11], Fs[4], Fs[5], Fs[6], Fs[7])
}
}
print "unmatched :"
print "H I J K"
for ( B in Bs) if ( ! ( B in ALs)) print Bs[ B]
}
' File1 File2
I added the unstated constraint of ignoring the case of the reference (%xyZ vs xyz).
Both files need to be kept in memory (arrays) to be treated at the END; the matching could also be done at reading time. For clarity, I keep the output at the END level.
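As noted, the matching could instead be done while reading File2. A minimal sketch of that read-time variant (same data as the question; unmatched File2 rows go to a file named unmatched, and the header printing is omitted for brevity):

```shell
cd "$(mktemp -d)"
cat > File1 <<'EOF'
A B C D E F G
Dec 3 abc 10 2B 21 OK
Dec 1 %xyZ 09 3F 09 NOK
Dec 5 mnp 89 R5 11 OK
EOF
cat > File2 <<'EOF'
H I J K
abc 10 6.3 A9
xyz 00 0.2 2F
pqr 45 6.9 3c
EOF

awk '
FNR == 1 { next }                 # skip both header lines
NR == FNR {                       # File1: key on column C, case-folded, % stripped
    k = tolower($3); sub(/^%/, "", k)
    a[k] = $0
    next
}
{                                 # File2: match as each row is read
    k = tolower($1)
    if (k in a) {
        split(a[k], f)            # interleave File1 and File2 fields
        print f[1], f[2], $2, f[3], $3, $4, f[4], f[5], f[6], f[7]
        delete a[k]
    } else {
        print > "unmatched"
    }
}
' File1 File2
```

This prints the two matched rows in the question's A B I C J K D E F G order and writes the pqr row to unmatched.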
Your problem:
you mainly refer to the wrong file in your code (k=$3 is used when reading File2, with a field position taken from File1, ...)

Print only '+' or '-' if string matches (with two conditions)

I would like to add two additional conditions to the code I have: print '+' only if, in File2, field 5 is greater than 35 and field 7 is also greater than 90.
Code:
while read -r line
do
    grep -q "$line" File2.txt && echo "$line +" || echo "$line -"
done < File1.txt
Input file 1:
HAPS_0001
HAPS_0002
HAPS_0005
HAPS_0006
HAPS_0007
HAPS_0008
HAPS_0009
HAPS_0010
Input file 2 (tab-delimited):
Query DEG_ID E-value Score %Identity %Positive %Matching_Len
HAPS_0001 protein:plasmid:149679 3.00E-67 645 45 59 91
HAPS_0002 protein:plasmid:139928 4.00E-99 924 34 50 85
HAPS_0005 protein:plasmid:134646 3.00E-98 915 38 55 91
HAPS_0006 protein:plasmid:111988 1.00E-32 345 33 54 86
HAPS_0007 - - 0 0 0 0
HAPS_0008 - - 0 0 0 0
HAPS_0009 - - 0 0 0 0
HAPS_0010 - - 0 0 0 0
Desired output (tab-delimited):
HAPS_0001 +
HAPS_0002 -
HAPS_0005 +
HAPS_0006 -
HAPS_0007 -
HAPS_0008 -
HAPS_0009 -
HAPS_0010 -
Thanks!
This should work:
$ awk '
BEGIN {FS = OFS = "\t"}
NR==FNR {if($5>35 && $7>90) a[$1]++; next}
{print (($1 in a) ? $0 FS "+" : $0 FS "-")}' f2 f1
HAPS_0001 +
HAPS_0002 -
HAPS_0005 +
HAPS_0006 -
HAPS_0007 -
HAPS_0008 -
HAPS_0009 -
HAPS_0010 -
join file1.txt <( tail -n +2 file2.txt) | awk '
$2 = ($5 > 35 && $7 > 90)?"+":"-" { print $1, $2 }'
You don't care about the second field in the output, so overwrite it with the appropriate sign for the output.
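A quick demonstration with a trimmed version of the sample data (a sketch; it pipes into join file1.txt - instead of using process substitution, so it also runs in plain sh):

```shell
cd "$(mktemp -d)"
printf 'HAPS_0001\nHAPS_0002\nHAPS_0005\n' > file1.txt
printf '%s\n' \
    'Query DEG_ID E-value Score %Identity %Positive %Matching_Len' \
    'HAPS_0001 protein:plasmid:149679 3.00E-67 645 45 59 91' \
    'HAPS_0002 protein:plasmid:139928 4.00E-99 924 34 50 85' \
    'HAPS_0005 protein:plasmid:134646 3.00E-98 915 38 55 91' > file2.txt

# tail strips file2's header; join pairs rows on the first field.
# The awk pattern assigns "+" or "-" to $2, and since the assigned string
# is always non-empty, every joined line is printed.
tail -n +2 file2.txt | join file1.txt - | awk '
    $2 = ($5 > 35 && $7 > 90) ? "+" : "-" { print $1, $2 }'
```

Here only HAPS_0001 and HAPS_0005 satisfy both conditions and get a +.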

Shell script to find common values and write in particular pattern with subtraction math to range pattern

Shell script to find the common values in two files and write them in a range pattern to a new file, AND also have the first value of each range reduced by 1.
$ cat file1
2
3
4
6
7
8
10
12
13
16
20
21
22
23
27
30
$ cat file2
2
3
4
8
10
12
13
16
20
21
22
23
27
Script that works:
awk 'NR==FNR{x[$1]=1} NR!=FNR && x[$1]' file1 file2 | sort | awk 'NR==1 {s=l=$1; next} $1!=l+1 {if(l == s) print l; else print s ":" l; s=$1} {l=$1} END {if(l == s) print l; else print s ":" l; s=$1}'
Script out:
2:4
8
10
12:13
16
20:23
27
Desired output:
1:4
8
10
11:13
16
19:23
27
Similar to sputnick's, except using comm to find the intersection of the file contents.
comm -12 <(sort file1) <(sort file2) |
sort -n |
awk '
function print_range() {
if (start != prev)
printf "%d:", start-1
print prev
}
FNR==1 {start=prev=$1; next}
$1 > prev+1 {print_range(); start=$1}
{prev=$1}
END {print_range()}
'
1:4
8
10
11:13
16
19:23
27
Try doing this:
awk 'NR==FNR{x[$1]=1} NR!=FNR && x[$1]' file1 file2 |
sort |
awk 'NR==1 {s=l=$1; next}
$1!=l+1 {if(l == s) print l; else print s -1 ":" l; s=$1}
{l=$1}
END {if(l == s) print l; else print s -1 ":" l; s=$1}'
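An end-to-end check with the question's sample files (a sketch; I use sort -n here so the merged values are ordered numerically before the range pass, and drop the dead trailing s=$1 from the END block):

```shell
cd "$(mktemp -d)"
printf '%s\n' 2 3 4 6 7 8 10 12 13 16 20 21 22 23 27 30 > file1
printf '%s\n' 2 3 4 8 10 12 13 16 20 21 22 23 27 > file2

# First awk keeps file2 lines also present in file1; second awk collapses
# consecutive runs into start-1:end ranges, leaving singletons untouched.
awk 'NR==FNR{x[$1]=1} NR!=FNR && x[$1]' file1 file2 |
sort -n |
awk 'NR==1 {s=l=$1; next}
     $1!=l+1 {if (l == s) print l; else print s-1 ":" l; s=$1}
     {l=$1}
     END {if (l == s) print l; else print s-1 ":" l}'
```

This reproduces the desired output, including 1:4 and 19:23.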
