AWK print group of lines twice if conditions met - bash

I am having difficulty writing an AWK statement that would print out a group of lines twice under specified conditions, with the option to change values in the lines being repeated. For example, if the first field of a row is 11 ($1==11), then I would like to print that row and the row that follows twice, adjusting the value in the second column ($2).
So far this is what I have, but it does not duplicate the lines whose first field equals 11, nor the lines that follow them.
awk '{if(NF<3) print $0; if(NF==3 && $1==11) print $0, 1, 20; if(NF==3 && $1 != 11) print $0, 0, 0; if(NF>3) print $0;}'
Example Input
1 3
6 0.1 99
0.100 0.110 0.111
7 0.4 88
0.200 0.220 0.222
11 0.5 77
0.300 0.330 0.333
2 2
7 0.3 66
0.400 0.440 0.444
11 0.7 55
0.500 0.550 0.555
This is a simplified version of what I would like to do, so for simplicity let's say I would like the printed row where $1==11, and the following row (NR+1), to have the value in the second column ($2) be half of the original value. For example, in the group of rows under the "1 3" section, the value after 11 is 0.5; ideally, the duplicated rows would have 0.25 after the 11.
Ideal Output
1 3
6 0.1 99 0 0
0.100 0.110 0.111
7 0.4 88 0 0
0.200 0.220 0.222
11 0.25 77 1 20
0.300 0.330 0.333
11 0.25 77 1 20
0.300 0.330 0.333
2 2
7 0.3 66 0 0
0.400 0.440 0.444
11 0.35 55 1 20
0.500 0.550 0.555
11 0.35 55 1 20
0.500 0.550 0.555

With GNU awk for gensub() and \s/\S:
$ awk '$1==11{$0=gensub(/^(\s+\S+\s+)\S+/,"\\1"$2/2,1); c=2; s=$0} {print} c&&!--c{print s ORS $0}' file
1 3
6 0.1 99
0.100 0.110 0.111
7 0.4 88
0.200 0.220 0.222
11 0.25 77
0.300 0.330 0.333
11 0.25 77
0.300 0.330 0.333
2 2
7 0.3 66
0.400 0.440 0.444
11 0.35 55
0.500 0.550 0.555
11 0.35 55
0.500 0.550 0.555
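The `c&&!--c` countdown idiom in the one-liner is worth unpacking: setting `c=2` on the trigger line makes the pattern fire exactly when the following record is read. A minimal standalone illustration (the sample input here is made up):

```shell
# c is set to 2 on the trigger line; c&&!--c decrements on each record
# and becomes true exactly one line after the trigger.
printf 'a\nTRIG\nb\nc\n' |
awk '/TRIG/{c=2} c&&!--c{print NR ": fired on " $0}'
# prints: 3: fired on b
```

On the trigger line itself, `!--c` sees c go from 2 to 1, so the action does not fire; on the next line c drops to 0 and `!--c` is true.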

You can use the following awk script. (P.S. There are leading and trailing spaces in your input file; that's why I had to use NF>2 && NF<=5 rather than NF==3, and why the first visible field is $2 rather than $1.)
BEGIN {
    c = 0; FS = "[ \t]+"; OFS = " "; x = ""; y = ""
}
# a saved pair is pending: repeat it before handling the current record
c == 2 {
    print x, 1, 20
    print y
    c = 0
}
NF == 2 {
    print $0
}
NF > 2 && NF <= 5 {
    if (c == 1) {               # this is the line following the "11" line
        print $0
        y = $0; c = 2; next
    }
    if ($2 == 11) {             # $2, not $1, because of the leading spaces
        print $0, 1, 20
        x = $0; c = 1
    }
    else print $0
}
NF > 5 {
    print $0, "hello"           # marker for unexpectedly wide records
}
END {
    if (c == 2) {               # file ended right after a saved pair
        print x, 1, 20
        print y
    }
}
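If GNU awk is not available, the halving, the flag columns, and the duplicated pair can all be sketched in portable awk. One assumption here (taken from the sample, not stated in the question): the 3-field header rows have an integer first field, while the detail rows such as `0.100 0.110 0.111` do not. With the default FS, leading blanks are ignored, so $1 is the first visible field even if the file is indented.

```shell
# Portable sketch: halve $2 on the "11" header rows, append the flag
# columns, and reprint the matched pair one record later.
awk 'NF==3 && $1==int($1) {                 # header row (integer first field)
         if ($1==11) { $2 = $2/2; $0 = $0 OFS 1 OFS 20; c=2; s=$0 }
         else        { $0 = $0 OFS 0 OFS 0 }
     }
     { print }
     c && !--c { print s ORS $0 }' file
```

Note that assigning to `$2` rebuilds `$0` with single spaces (the default OFS), which happens to match the ideal output shown in the question.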

how to find a sequence of numbers

I have a data file formatted like this:
0.00 0.00 0.00
1 10 1.0
2 12 1.0
3 15 1.0
4 20 0.0
5 23 0.0
0.20 0.15 0.6
1 12 1.0
2 15 1.0
3 20 0.0
4 18 0.0
5 20 0.0
0.001 0.33 0.15
1 8 1.0
2 14 1.0
3 17 0.0
4 25 0.0
5 15 0.0
I need to remove some data and reorder line like this:
1 10
1 12
1 8
2 12
2 15
2 14
3 15
3 20
3 17
4 20
4 18
4 25
5 23
5 20
5 15
My code does not show anything. The problem might be in the grep command. Could you please help me out?
touch extract_file.txt
for (( i=1; i<=band; i++))
do
sed -e '1, 7d' data_file | grep -w " '$(echo $i)' " | awk '{print $2}' > extract(echo $i).txt
paste -s extract_file.txt extract$(echo $i).txt > data
done
#rm eigen*.txt
The following code with comments:
cat <<EOF |
0.00 0.00 0.00
1 10 1.0
2 12 1.0
3 15 1.0
4 20 0.0
5 23 0.0
0.20 0.15 0.6
1 12 1.0
2 15 1.0
3 20 0.0
4 18 0.0
5 20 0.0
0.001 0.33 0.15
1 8 1.0
2 14 1.0
3 17 0.0
4 25 0.0
5 15 0.0
EOF
# remove lines not starting with a space
grep -v '^[^ ]' |
# remove leading space
sed 's/^[[:space:]]*//' |
# remove third arg
sed 's/[[:space:]]*[^[:space:]]*$//' |
# stable sort on first number
sort -s -n -k1 |
# each time first number changes, print additional newline
awk '{ if(length(last) != 0 && last != $1) printf "\n"; print; last=$1}'
outputs:
1 10
1 12
1 8
2 12
2 15
2 14
3 15
3 20
3 17
4 20
4 18
4 25
5 23
5 20
5 15
Tested on repl.
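The five-stage pipeline can also be collapsed into a single awk invocation. A hedged sketch (it assumes, as in the sample, that the data lines are the ones whose first field is a bare integer index, so it does not rely on leading whitespace):

```shell
# Group "index value" pairs by the integer index, then print the groups
# in ascending index order, separated by blank lines.
awk '$1 ~ /^[0-9]+$/ {
         out[$1] = out[$1] $1 " " $2 "\n"
         if ($1 + 0 > max) max = $1 + 0
     }
     END {
         for (i = 1; i <= max; i++)
             printf "%s%s", out[i], (i < max ? "\n" : "")
     }' file
```

This trades the external sort for an array keyed by the index, which also preserves the input order within each group.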
perl one-liner:
$ perl -lane 'push @{$nums{$F[0]}}, "@F[0,1]" if /^ /;
END { for $n (sort { $a <=> $b } keys %nums) {
print for @{$nums{$n}};
print "" }}' input.txt
1 10
1 12
1 8
2 12
2 15
2 14
3 15
3 20
3 17
4 20
4 18
4 25
5 23
5 20
5 15
Basically, for each line starting with a space, use the first number as a key to a hash table that stores lists of the first two numbers, and print them out sorted by first number.

Organize the file with AWK

Well, I have the following file:
week ID Father Mother Group C_Id Weight Age System Gender
9 107001 728 7110 922 107001 1287 56 2 2
10 107001 728 7110 1022 107001 1319 63 2 2
11 107001 728 7110 1122 107001 1491 70 2 2
1 107002 702 7006 111 107002 43 1 1 1
2 107002 702 7006 211 107002 103 7 1 1
4 107002 702 7006 411 107002 372 21 1 1
1 107003 729 7112 111 107003 40 1 1 1
2 107003 729 7112 211 107003 90 7 1 1
5 107003 729 7112 511 107003 567 28 1 1
7 107003 729 7112 711 107003 1036 42 1 1
I need to transpose the Age ($8) and Weight ($7) columns, where column $8 will provide the new labels (1, 7, 21, 28, 42, 56, 63, 70). Additionally, the age labels should be in ascending order. Not all animals have measurements at every age; animals missing a measurement should be given the "NS" symbol. The ID, Father, Mother, System, and Gender columns will be kept, but with the transposition of the Age and Weight columns it will not be necessary to repeat these variables as in the first table. The Week, Group, and C_Id columns are not required. Visually, I need the file to look this way:
ID Father Mother System Gender 1 7 21 28 42 56 63 70
107001 728 7110 2 2 NS NS NS NS NS 1287 1319 1491
107002 702 7006 1 1 43 103 372 NS NS NS NS NS
107003 729 7112 1 1 40 90 NS 567 1036 NS NS NS
I tried this program:
#!/bin/bash
awk 'NR==1{h=$2 OFS $3 OFS $4 OFS $9 OFS $10; next}
{a[$2]=(($1 in a)?(a[$1] OFS $NF):(OFS $3 OFS $4 OFS $9 OFS $10));
if(!($8 in b)) {h=h OFS $8; b[$8]}}
END{print h; for(k in a) print k,a[k]}' banco.txt | column -t > a
But I got this:
ID Father Mother System Gender
56 63 70 1 7 21 28 42
107001 728 7110 2 2
107002 702 7006 1 1
107003 729 7112 1 1
And I'm stuck at that point. Any suggestion, please? Thanks.
With GNU awk for "sorted_in":
$ cat tst.awk
{
    id = $2
    weight = $7
    age = $8
    idAge2weight[id,age] = weight
    id2common[id] = $2 OFS $3 OFS $4 OFS $9 OFS $10
    ages[age]
}
END {
    PROCINFO["sorted_in"] = "@ind_num_asc"
    printf "%s", id2common["ID"]
    for (age in ages) {
        printf "%s%s", OFS, age
    }
    print ""
    delete id2common["ID"]
    for (id in id2common) {
        printf "%s", id2common[id]
        for (age in ages) {
            weight = ((id,age) in idAge2weight ? idAge2weight[id,age] : "NS")
            printf "%s%s", OFS, weight
        }
        print ""
    }
}
$ awk -f tst.awk file | column -t
ID Father Mother System Gender Age 1 7 21 28 42 56 63 70
107001 728 7110 2 2 NS NS NS NS NS NS 1287 1319 1491
107002 702 7006 1 1 NS 43 103 372 NS NS NS NS NS
107003 729 7112 1 1 NS 40 90 NS 567 1036 NS NS NS
I added the pipe to column -t just so you could see the field alignment.
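`PROCINFO["sorted_in"]` is gawk-specific: it controls the traversal order of `for (i in arr)`. A tiny demonstration of `"@ind_num_asc"` (indices sorted numerically, ascending):

```shell
# gawk only: indices created as 70, 1, 28, 7 come out in numeric order.
gawk 'BEGIN {
    seen[70]; seen[1]; seen[28]; seen[7]    # create indices in scrambled order
    PROCINFO["sorted_in"] = "@ind_num_asc"
    s = ""
    for (k in seen) s = s (s == "" ? "" : " ") k
    print s                                 # 1 7 28 70
}'
```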

subtracting data from columns in bash csv

I have several columns in a file, and I want to subtract two of them...
They have this form, and I want the result without decimals...
1.000 900
1.012 1.010
1.015 1.005
1.020 1.010
I need another column in the same file with the subtraction:
100
2
10
10
I have tried
awk -F"," '{$16=$4-$2; print $1","$2","$3","$4","$5","$6}'
but it gives me...
0.100
0.002
0.010
0.010
Any indication?
Using this awk:
awk -v OFS='\t' '{p=$1;q=$2;sub(/\./, "", p); sub(/\./, "", q); print $0, (p-q)}' file
1.000 900 100
1.012 1.010 2
1.015 1.005 10
1.020 1.010 10
Using perl:
perl -lanE '$,="\t",($x,$y)=map{s/\.//r}@F;say@F,$x-$y' file
prints:
1.000 900 100
1.012 1.010 2
1.015 1.005 10
1.020 1.010 10

Specific sorting into categories with awk and bash

Dear all, I have one question.
I have input like this (the second column is only an index):
chr1 1 30
chr1 2 40.5
chr1 3 30.5
chr1 4 41
chr2 10 60
chr2 15 40.1
And I want to get this:
chr1 chr2
30 - 31 2 0
31 - 32 0 0
...
40 - 41 1 1 etc..
I need to categorize the data into groups from 30 to 60, in steps of 1. From the input data I count, for each chromosome, the rows whose $3 falls into the 30-31 category, and so on for the other categories. I have this code, but I do not understand where the problem is (some problem with the loop):
samtools view /home/filip/Desktop/AMrtin\ Hynek/54321Odfiltrovany.bam | awk '{ n=length($10); print $3,"\t",NR,"\t", gsub(/[GCCgcs]/,"",$10)/n;}' | awk '($3 <= 0.6 && $3 >= 0.3)' | awk '{print $1,"\t",$2,"\t",($3*100)}' > data.txt
for j in chr1 chr2 chr3 chr4 chr5 chr6 chr7 chr8 chr9 chr10 chr11 chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19 chr20 chr21 chr22
do
export $j
awk -v sop=$j '{if($1 == $sop) print $0}' data.txt |
awk '{d=int($3)
a[d]++
if (NR==1) {min=d}
min=(min>=d?d:min)
max=(max>d?max:d)}
END{for (i=min; i<=max; i++) print i, "-", i+1, a[i]+0}' ;
done
Part of the code I made with help from "fedorqui".
Using awk:
awk '
!($1 in chrs) { chr[++c] = $1; chrs[$1]++ }
{
    val = int($3)
    map[$1,val]++
    min = (NR==1 ? val : min>=val ? val : min)
    max = (max>val ? max : val)
}
END {
    printf "\t\t"
    for (j=1; j<=c; j++) {
        printf "%s%s", sep, chr[j]
        sep = "\t"
    }
    print ""
    for (i=min; i<=max; i++) {
        printf "%d - %d\t", i, i+1
        for (j=1; j<=c; j++) {
            printf "\t%s", map[chr[j],i] + 0
        }
        print ""
    }
}' file
chr1 chr2
30 - 31 2 0
31 - 32 0 0
32 - 33 0 0
...
38 - 39 0 0
39 - 40 0 0
40 - 41 1 1
41 - 42 1 0
42 - 43 0 0
...
59 - 60 0 0
60 - 61 0 1
The chr array records chromosomes in the order they are first seen.
The rest of the main block is pretty much your code, except that we also build a map array, indexed by chromosome and range, with counts as its values.
In the END block we first iterate over the chr array and print the chromosome names.
Then, using the min and max variables, we loop and print the values from the map array, again indexed by chromosome and range.
I have truncated some lines from the output. As you can see, it prints every bin starting at min and ending at max.
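The empty bins print as 0 rather than blank because of the `+ 0` coercion: referencing a missing array element yields the empty string, and adding 0 turns it into a number. A minimal illustration:

```shell
# a[31] and a[32] were never set; "+ 0" coerces their empty values to 0.
awk 'BEGIN { a[30] = 2; for (i = 30; i <= 32; i++) print i, a[i] + 0 }'
# prints:
# 30 2
# 31 0
# 32 0
```

Without the `+ 0`, the unset bins would print as empty strings and the columns would collapse.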
First, you could use:
for j in {1..22}; do
chrj="chr$j"
# now you could use $chrj instead of $j in this loop
done
Instead of:
for j in chr1 chr2 chr3 chr4 chr5 chr6 chr7 chr8 chr9 chr10 chr11 chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19 chr20 chr21 chr22
do
# ...
done
Then, you don't need to multiply awk calls and pipes; a single awk should be enough.
For example:
... | awk '($3 <= 0.6 && $3 >= 0.3)' | awk '{print $1,"\t",$2,"\t",($3*100)}'
should be:
awk '($3 <= 0.6 && $3 >= 0.3){print $1,"\t",$2,"\t",($3*100)}'
# or
awk '{if ($3 <= 0.6 && $3 >= 0.3){print $1,"\t",$2,"\t",($3*100)}}'
Otherwise:
export $j
What is the purpose of this export?
I haven't read all of your code, but at this point there are many optimizations to be made!
If you are using gawk, this should work. There's a filter on $1 that should handle everything you were doing with $j (unless you truly need only chr1..chr22, in which case it should still be possible to develop a regex for it).
BEGIN {
    for (i = 30; i <= 60; i++) {
        rstring = i " - " i + 1;
        rows[rstring] = 0;
    }
}
$1 ~ /^chr[0-9][0-9]?$/ {
    row = int($3) " - " int($3) + 1;
    columns[$1] = 0;
    rows[row] = 0;
    data[row][$1] += 1;
    rowwidth = length(row) > rowwidth ? length(row) : rowwidth;
    colwidth = length($1) > colwidth ? length($1) : colwidth;
}
END {
    rowheader = "%-" (rowwidth * 2) "s";
    colheader = "%" colwidth "s\t";
    dataformat = "%" int(colwidth / 2) "d\t";
    asorti(columns, sortedcolumns);
    asorti(rows, sortedrows);
    printf rowheader, "";
    for (c in sortedcolumns) printf "%s\t", sortedcolumns[c];
    print "";
    for (r in sortedrows) {
        printf rowheader, sortedrows[r];
        for (c in sortedcolumns)
            printf dataformat, data[sortedrows[r]][sortedcolumns[c]];
        print ""
    }
}
Running it with gawk -f [scriptfile from above] < data.txt should produce something like:
chr1 chr2
30 - 31 2 0
31 - 32 0 0
. . .
39 - 40 0 0
40 - 41 1 1
41 - 42 1 0
42 - 43 0 0
. . .
59 - 60 0 0
60 - 61 0 1
The following can be used if you want to use Perl:
perl -ane '
    $h{$F[0]}{int $F[2]}++;
    push @range, int $F[2];
    }{
    @range = sort @range;
    print "\t\t", join "\t", sort { $a cmp $b } keys %h; print "\n";
    for $i ($range[0] .. $range[-1]) {
        print "$i - ", $i + 1, "\t\t";
        print $h{$_}{$i} + 0, "\t" for sort { $a cmp $b } keys %h; print "\n"
    }' file
The output should look like this:
chr1 chr2
30 - 31 2 0
31 - 32 0 0
32 - 33 0 0
33 - 34 0 0
34 - 35 0 0
35 - 36 0 0
36 - 37 0 0
37 - 38 0 0
38 - 39 0 0
39 - 40 0 0
40 - 41 1 1
41 - 42 1 0
42 - 43 0 0
43 - 44 0 0
44 - 45 0 0
45 - 46 0 0
46 - 47 0 0
47 - 48 0 0
48 - 49 0 0
49 - 50 0 0
50 - 51 0 0
51 - 52 0 0
52 - 53 0 0
53 - 54 0 0
54 - 55 0 0
55 - 56 0 0
56 - 57 0 0
57 - 58 0 0
58 - 59 0 0
59 - 60 0 0

Pretty-print with awk?

I have code which is intended to output numbers stored in a file (in one column) to another TXT file. The part of the code which does this is:
awk -F"\n" 'NR==1{a=$1" ";next}{a=a$1" "}END{print a}' col_trim.txt >> row.txt
the output is something like this:
1.31 2.3 3.35 2.59 1.63
2.03 2.21 1.99 1.5 1.12
1 0.6 -0.71 -2.1 0.01
But I want it to be like this:
1.31 2.30 3.35 2.59 1.63
2.03 2.21 1.99 1.50 1.12
1.00 0.60 -0.71 -2.10 0.01
As you can see, all numbers in the second sample have two digits after the decimal point, and for negative numbers the sign is placed right before the number so it doesn't disturb the alignment.
Any idea?
P.S.:
The input file is a text file with a column of numbers (for each row):
1.31
2.3
3.35
2.59
1.63
The whole code is like this:
#!/bin/sh
rm *.txt
for time in 00 03 06 09 12 15 18 21 24 27 30 33 36 39 42 45 48 51 54 57 60 63 66 69 72 75 78 81 84 87 90 93 96; do
filename=gfs.t00z.master.grbf$time.10m.uv.grib2
wgrib2 $filename -spread $time.txt
sed 's:lon,lat,[A-Z]* 10 m above ground d=\([0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]\).*:\1 '$time'0000:' $time.txt > temp.txt
for (( j = 1; j <= 2; j++ )); do
if [ j == 1 ]; then
sed -n '/lon,lat,UGRD/,/lon,lat,VGRD/p' $time.txt > vel_sep.txt
else
sed -n '/lon,lat,VGRD/,/359.500000,90.000000/p' $time.txt > vel_sep.txt
fi
line=174305
sed -n 1p temp.txt >> row.txt
for (( i = 1; i <= 48; i++ )); do
sed -n "$line","$(($line+93))"p vel_sep.txt > col.txt
sed 's:[0-9]*.[0-9]*,[0-9]*.[0-9]*,::' col.txt > col_trim.txt
awk -F"\n" 'NR==1{a=$1" ";next}{a=a$1" "}END{print a}' col_trim.txt >> row.txt
line=$(($line-720))
done
done
done
exit 0
Replace your awk with this:
awk -F"\n" 'NR==1{a=sprintf("%10.2f", $1); next}
{a=sprintf("%s%10.2f", a,$1);}END{print a}' col_trim.txt >> row.txt
EDIT: For left alignment:
awk -F"\n" 'NR==1{a=sprintf("%-8.2f", $1); next}
{a=sprintf("%s%-8.2f", a,$1);}END{print a}' col_trim.txt >> row.txt
You can use the column command:
awk -F"\n" 'NR==1{a=$1" ";next}{a=a$1" "}END{print a}' col_trim.txt | \
column -t >> row.txt
This gives:
1.31 2.3 3.35 2.59 1.63
2.03 2.21 1.99 1.5 1.12
1 0.6 -0.71 -2.1 0.01
This can be solved using printf with awk.
Example:
echo -e "1 -2.5 10\n-3.4 2 12" | awk '{printf "%8.2f %8.2f %8.2f\n",$1,$2,$3}'
    1.00    -2.50    10.00
   -3.40     2.00    12.00
Additionally, this script has a lot of room for improvement.
Here is the first one:
change from:
for time in 00 03 06 09 12 15 18 21 24 27 30 33 36 39 42 45 48 51 54 57 60 63 66 69 72 75 78 81 84 87 90 93 96; do
to
for time in $(seq 0 3 96); do
time=$(printf "%02d" $time)
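The seq + printf replacement generates the same 33 zero-padded timestamps as the literal list; for instance (assuming a seq that accepts an increment, as GNU and BSD seq do):

```shell
# Build the 3-hourly forecast hours and zero-pad each value to two digits.
for t in $(seq 0 3 96); do
    printf '%02d\n' "$t"
done
# prints 00, 03, 06, ..., 93, 96 (one per line)
```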
If you can show us the sample output of wgrib2 $filename -spread $time.txt, we can give more suggestions.