what's wrong in this awk print statement? - bash

I have a file test.txt, shown below. Each line contains one value, and the values come in sets of six in the order current1, voltage1, current2, voltage2, current3, voltage3.
11
12
13
14
15
16
21
22
23
24
25
26
31
32
33
34
35
36
41
42
43
44
45
46
Using awk, I want to print it in the format below (one set per line).
11 12 13 14 15 16
21 22 23 24 25 26
31 32 33 34 35 36
41 42 43 44 45 46
So I wrote a simple awk script, shown below. It runs a modular counter from 1 to 6 and, according to the cnt value, stores the input value in i1, v1, i2, v2, i3, v3 respectively. When cnt is 6 (when all the values in a set have been collected), it prints the values.
BEGIN{cnt=1}
cnt == 1{i1 = $0}
cnt == 2{v1 = $0}
cnt == 3{i2 = $0}
cnt == 4{v2 = $0}
cnt == 5{i3 = $0}
cnt == 6{v3 = $0}
{if (cnt==6) {cnt = 1; print i1 v1 i2 v2 i3 v3} else cnt = cnt + 1}
The result, shown below, is weird. It has been a while since I used awk, so I can't easily figure out what is wrong with the script.
awk -f div.awk test.txt
16
26
36
46
What is the problem?

Use the modulo operator. It should be:
awk 'NR%6{printf "%s ",$0}!(NR%6){print}' file
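Btw, it looks like your file is using Windows line endings, which leads to the output you reported: every value ends in a carriage return, so the terminal moves back to the start of the line and each value overwrites the previous one, leaving only the last value visible. Convert the line endings to UNIX before using awk, for example: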
sed 's/\r//' file | awk 'NR%6{printf "%s ",$0}!(NR%6){print}'
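As a minimal sketch of the same idea, the carriage return can also be stripped inside awk itself. This assumes the file really has CRLF line endings; the sample data and the temporary file are made up for illustration.

```shell
# Hypothetical sample: two sets of six values with Windows (CRLF) line endings.
tmp=$(mktemp)
printf '%s\r\n' 11 12 13 14 15 16 21 22 23 24 25 26 > "$tmp"

# First rule: drop a trailing carriage return from every record.
# Second rule: print the value followed by a space, or by a newline on every 6th line.
result=$(awk '{ sub(/\r$/, "") }
              { printf "%s%s", $0, (NR % 6 ? " " : "\n") }' "$tmp")
printf '%s\n' "$result"
rm -f "$tmp"
```

Note that in the original script `print i1 v1 i2 v2 i3 v3` concatenates the values with no separator; `print i1, v1, i2, v2, i3, v3` would be needed to get spaces between them.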

Related

How to check whether one number range from one file is the subset of other number range from other file?

I'm trying to find out whether the ranges in range1 [columns a and b] are subsets of, i.e. lie within, the ranges in range2 [columns b and c].
range1
a b
15 20
8 10
37 44
32 37
range2
a b c
chr1 6 12
chr2 13 21
chr3 31 35
chr4 36 45
output:
a b c
chr1 6 12 8 10
chr2 13 21 15 20
chr4 36 45 37 44
I want to compare range1[a] with range2[b] and range1[b] with range2[c], as a one-to-all comparison.
For example, in the first run the first row of range1 is compared with every row of range2. But range1[a] should be compared only with range2[b] and, similarly, range1[b] only with range2[c]. Based on this I wrote the following criterion:
lbsf1[j] >= lbs[i] && lbsf1[j] <= ubs[i] && ubsf1[j] >= lbs[i] && ubsf1[j] <= ubs[i]
r1[a] r2[b] r1[b] r2[c]
15 > 6 20 < 12 False
15 > 13 20 < 21 True
15 > 31 20 < 35 False
15 > 36 20 < 45 False
I tried to learn from this code [which works if we want to check whether a single number lies in a specific range] and modified it to handle both numbers. It did not work; I suspect I'm not reading the second file properly.
Code: [reference but little modified]
#!/bin/bash
awk -F'\t' '
# 1st pass (fileB): read the lower and upper range bounds
FNR==NR { lbs[++count] = $2+0; ubs[count] = $3+0; next }
# 2nd pass (fileA): check each line against all ranges.
{ lbsf1[++countf1] = $1+0; ubsf1[countf1] = $2+0;
for(i=1;i<=count;++i)
{
for(j=1;j<=countf1;++j)
{
if (lbsf1[j] >= lbs[i] && lbsf1[j] <= ubs[i] && ubsf1[j] >= lbs[i] && ubsf1[j] <= ubs[i])
{ print lbs[i]"\t"ubs[i]"\t"lbsf1[j]"\t"ubsf1[j] ; next }
}
}
}
' range2 range1
This code gave me output:
6 12 8 10
6 12 8 10
6 12 8 10
Thank you.
Assumptions:
input files do not have a b nor a b c as the first line (we can modify the proposed code if these lines really do exist in the data)
lines in range2 do not have leading white space (as shown in the provided sample)
while not demonstrated by the small sample provided, going to assume that a row from range1 may 'match' with multiple rows from range2 and that we want to print all matches (we can modify the proposed code if we need to stop processing a range1 row once we find the first 'match')
Sample data:
$ cat range1
15 20
8 10
37 44
32 37
$ cat range2
chr1 6 12
chr2 13 21
chr3 31 35
chr4 36 45
chr15 36 67 # added to demonstrate multi-match for range1 [ 37 , 44 ]
Issues with current code:
loads the range1 data into an array and then loops over this (ever growing array) for each line read from range1; this array is unnecessary as we just need to process the current row from range1
the dual loop logic is aborted (; next) upon printing the first matching set of records; this premature cancellation means we only see the first match ... over and over; the ; next can be removed
the range2[a] column is not captured during range2 input processing so we're unable to display this column in the final output
Updating OP's current code to address these issues:
awk '
BEGIN { FS=OFS="\t" }
FNR==NR { chromo[++count]=$1
lbs[count]=$2
ubs[count]=$3
next
}
{ lb=$1
ub=$2
for (i=1;i<=count;++i)
if ( lb >= lbs[i] && lb <= ubs[i] && ub >= lbs[i] && ub <= ubs[i] )
print chromo[i],lbs[i],ubs[i],lb,ub
}
' range2 range1
This generates:
chr2 13 21 15 20
chr1 6 12 8 10
chr4 36 45 37 44
chr15 36 67 37 44
If the output needs to be sorted we could modify the awk code to store the results in another array and then sort and print the array during END {...} processing. But for simplicity's sake we'll just pipe the output to sort, e.g.:
$ awk ' BEGIN { FS=OFS="\t" } FNR==NR ....' range2 range1 | sort -V
chr1 6 12 8 10
chr2 13 21 15 20
chr4 36 45 37 44
chr15 36 67 37 44
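If only the first match per range1 row is wanted (the alternative mentioned in the assumptions), a possible sketch is to leave the loop with `break` after printing. The temporary directory is hypothetical, and the default whitespace field separator is used here because the sample files are built with spaces rather than tabs.

```shell
# Recreate the sample data in a scratch directory (hypothetical location).
dir=$(mktemp -d)
printf '%s\n' '15 20' '8 10' '37 44' '32 37' > "$dir/range1"
printf '%s\n' 'chr1 6 12' 'chr2 13 21' 'chr3 31 35' 'chr4 36 45' 'chr15 36 67' > "$dir/range2"

first_matches=$(awk '
# 1st file (range2): remember the name and both bounds of every range
FNR==NR { chromo[++count]=$1; lbs[count]=$2+0; ubs[count]=$3+0; next }
# 2nd file (range1): test the current row against the stored ranges
{
    lb=$1+0; ub=$2+0
    for (i=1; i<=count; ++i)
        if (lb >= lbs[i] && lb <= ubs[i] && ub >= lbs[i] && ub <= ubs[i]) {
            print chromo[i], lbs[i], ubs[i], lb, ub
            break    # stop after the first matching range for this row
        }
}' "$dir/range2" "$dir/range1")
printf '%s\n' "$first_matches"
rm -rf "$dir"
```

With `break`, the row 37 44 is reported only against chr4 and no longer against chr15.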

Moving average with successive elements using awk

I am trying to write a script in which each row element will give the average of next N rows (including itself). I know how to do it with preceding rows like the Nth row will give the average of the preceding N rows. Here is the script for that
awk '
BEGIN{
N = 5;
}
{
x = $2;
i = NR % N;
aveg += (x - X[i]) / N;
X[i] = x;
print $1, $2, aveg;
}' < file > aveg.txt
where file looks like this
1 1
2 2
3 3
4 4
5 5
6 6
7 7
8 8
9 9
10 10
11 11
12 12
13 13
14 14
15 15
16 16
17 17
18 18
19 19
20 20
21 21
22 22
23 23
24 24
25 25
26 26
27 27
28 28
29 29
30 30
31 31
32 32
33 33
34 34
35 35
36 36
37 37
38 38
39 39
40 40
I want the first row to have the average of the next 5 elements, i.e.
(1+2+3+4+5)/5=3
second row (2+3+4+5+6)/5=4
third row (3+4+5+6+7)/5=5
and so on. The rows should look like
1 1 3
2 2 4
3 3 5
4 4 6 ...
Can it be done as simply as the script shown above? I was thinking of assigning the row value as the value of nth row below and then proceeding with the above script. But, unfortunately I am unable to assign the row value to some value down the file. Can someone help me to write this script and find the moving average. I am open to other commands in shell as well.
$ cat test.awk
BEGIN {
N=5 # the window size
}
{
n[NR]=$1 # store the value in an array
}
NR>=N { # for records where NR >= N
x=0 # reset the sum variable
delete n[NR-N] # delete the one out the window of N
for(i in n) # all array elements
x+=n[i] # ... must be summed
print n[NR-(N-1)],x/N # print the row from the beginning of window
} # and the related window average
Try it:
$ for i in {1..36}; do echo $i $i >> test.in ; done
$ awk -f test.awk test.in
1 3
2 4
3 5
...
30 32
31 33
32 34
It can also be done with a running sum: add the current value and subtract n[NR-N], like this:
BEGIN {
N=5
}
{
n[NR]=$1
x+=$1-n[NR-N]
}
NR>=N {
delete n[NR-N]
print n[NR-(N-1)],x/N
}
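The running-sum script above averages and prints the first column only. A possible adaptation to the two-column file from the question, averaging column 2 and printing the whole row, might look like the sketch below. The array names `row` and `val`, the window size 3, and the eight-line sample input are assumptions chosen to keep the example short.

```shell
# Hypothetical input: eight rows "k k", like a shortened version of the question's file.
forward_avg=$(seq 1 8 | awk '{ print $1, $1 }' | awk -v N=3 '
{
    row[NR] = $0                  # keep the full row so it can be printed later
    val[NR] = $2                  # keep the value that is being averaged
    sum += $2 - val[NR-N]         # val[NR-N] is empty (treated as 0) for the first N rows
}
NR >= N {
    print row[NR-N+1], sum / N    # first row of the current window plus its average
    delete row[NR-N]              # drop entries that have left the window
    delete val[NR-N]
}')
printf '%s\n' "$forward_avg"
```

As in the answers above, the last N-1 rows produce no output because a full window of following rows does not exist for them.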
Using an N-sized array:
BEGIN { N=5 }
{
s+=array[i++]=$1
if (i>=N) i=0
}
NR>=N {
print array[i], s/N
s-=array[i]
}
$ cat tst.awk
BEGIN { OFS="\t"; range=5 }
{ recs[NR%range] = $0 }
NR >= range {
sum = 0
for (i in recs) {
split(recs[i],flds)
sum += flds[2]
}
print recs[(NR+1-range)%range], sum / range
}
$ awk -f tst.awk file
1 1 3
2 2 4
3 3 5
4 4 6
5 5 7
6 6 8
7 7 9
8 8 10
9 9 11
10 10 12
11 11 13
12 12 14
13 13 15
14 14 16
15 15 17
16 16 18
17 17 19
18 18 20
19 19 21
20 20 22
21 21 23
22 22 24
23 23 25
24 24 26
25 25 27
26 26 28
27 27 29
28 28 30
29 29 31
30 30 32
31 31 33
32 32 34
33 33 35
34 34 36
35 35 37
36 36 38

Count occurrences in a text line

Is there any way to count how often a value occurs in a line? My input is a tab-delimited .txt file. It looks something like this (but with thousands of lines):
#N/A 14 13 #N/A 15 13 #N/A 14 13 13 15 14 13 15 14 14 15
24 26 #N/A 24 22 #N/A 24 26 #N/A 24 26 24 22 24 22 24 26
45 43 45 43 #N/A #N/A #N/A 43 45 45 43 #N/A 47 45 45 43
I would like an output like this or similar.
#N/A(3) 14 13(3) 15 13(1) 13 15(1) 15 14(1) 14 15 (1)
24 26(4) #N/A(3) 24 22(3)
45 45(4) #N/A(4) 43 45(1) 47 45(1)
Perl solution:
perl -laF'/\t/' -ne '
chomp; my %h;
$h{$_}++ for @F;
print join "\t", map "$_ ($h{$_})", keys %h
' < input
-a splits each line on -F (\t means tab) into the @F array
-l adds newlines to prints
-n reads the input line by line
chomp removes the final newline
%h is a hash table, the keys are the members of @F, the values are the counts
awk to the rescue!
$ awk -F' +' -v OFS=' ' '{for(i=1;i<=NF;i++) if($i!="")a[$i]++;
for(k in a) printf "%s", k"("a[k]")" OFS; delete a; print ""}' file
#N/A(3) 14 13(3) 13 15(1) 15 13(1) 14 15(1) 15 14(1)
#N/A(3) 24 22(3) 24 26(4)
#N/A(4) 43 45(1) 45 43(4) 47 45(1)
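Both solutions above print the counts in whatever order the hash or array iteration yields. If the order of first appearance matters, one possible sketch is to record that order in a second array. It assumes tab-separated fields as described in the question; the array names `seen` and `order` and the one-line sample are made up for illustration.

```shell
# Hypothetical one-line sample with tab-separated fields.
counts=$(printf '#N/A\t14 13\t#N/A\t15 13\t14 13\n' | awk -F'\t' '
{
    n = 0
    for (i = 1; i <= NF; i++) {
        if ($i == "") continue                   # skip empty fields
        if (!($i in seen)) order[++n] = $i       # remember first-seen order
        seen[$i]++
    }
    line = ""
    for (k = 1; k <= n; k++) {
        line = line (k > 1 ? "\t" : "") order[k] "(" seen[order[k]] ")"
        delete seen[order[k]]                    # reset the count for the next line
    }
    print line
}')
printf '%s\n' "$counts"
```

Deleting the entries one by one avoids relying on `delete seen` for a whole array, which not every awk implementation accepts.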

printing selected rows from a file using awk

I have a text file with data in the following format.
1 0 0
2 512 6
3 992 12
4 1536 18
5 2016 24
6 2560 29
7 3040 35
8 3552 41
9 4064 47
10 4576 53
11 5088 59
12 5600 65
13 6080 71
14 6592 77
15 7104 83
I want to print all the lines where $1 > 1000.
awk 'BEGIN {$1 > 1000} {print " " $1 " "$2 " "$3}' graph_data_tmp.txt
This doesn't seem to give the output that I am expecting. What am I doing wrong?
You can do this :
awk '$1>1000 {print $0}' graph_data_tmp.txt
print $0 will print all the content of the line
If you want to print the lines after the 1000th line/row instead, replace $1 with NR. NR holds the number of the current record (row).
awk 'NR>1000 {print $0}' graph_data_tmp.txt
All you need is:
awk '$1>1000' file
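The key point in these answers is that the condition has to be a pattern in front of the main action, not something inside BEGIN, which runs once before any input is read. A small sketch on a few rows of the sample data follows. Since the first column of the sample only goes up to 15, it assumes the second column is the one meant to be compared; that is a guess, not something the question states.

```shell
# A few rows copied from the question; the choice of column 2 is an assumption.
rows=$(printf '%s\n' '1 0 0' '2 512 6' '3 992 12' '4 1536 18' '5 2016 24' |
    awk '$2 > 1000 { print $1, $2, $3 }')   # the pattern is evaluated for every input line
printf '%s\n' "$rows"
```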

Using bash to read elements on a diagonal on a matrix and redirecting it to another file

Currently I have the code shown below. It works and does what it is supposed to do when I echo the variables:
a=`awk 'NR==2 {print $1}' $coor`
b=`awk 'NR==3 {print $2}' $coor`
c=`awk 'NR==4 {print $3}' $coor`
...but I have to do this for many more lines and I want a more general expression. So I attempted to create the loop shown below. Syntax-wise I don't think anything is wrong with the code, but it is not outputting anything to the file "Cmain".
I was wondering if anyone could help me; I'm fairly new to scripting.
If it helps, I can also post what I am trying to read.
for (( i=1; i <= 4 ; i++ )); do
for (( j=0; j <= 3 ; j++ )); do
B="`grep -n "cell" "$coor" | awk 'NR=="$i" {print $j}'`"
done
done
echo "$B" >> Cmain
You can replace your lines of awk with this one:
awk '{ for (i=1; i<=NF; i++) if (NR >= 2 && NR == i) print $(i - 1) }' file.txt
Tested input:
1 2 3 4 5 6 7 8 9 10
11 12 13 14 15 16 17 18 19 20
21 22 23 24 25 26 27 28 29 30
31 32 33 34 35 36 37 38 39 40
41 42 43 44 45 46 47 48 49 50
51 52 53 54 55 56 57 58 59 60
61 62 63 64 65 66 67 68 69 70
71 72 73 74 75 76 77 78 79 80
Output:
11
22
33
44
55
66
77
awk 'BEGIN {f=1} {print $f; f=f+1}' infile > outfile
An alternative using sed and coreutils, assuming space separated input is in infile:
n=$(wc -l infile | cut -d' ' -f1)
for i in $(seq 1 $n); do
sed -n "${i} {p; q}" infile | cut -d' ' -f$i
done
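If the shell loop from the question is to be kept, one likely reason it prints nothing is that `$i` and `$j` sit inside single quotes, so the shell never expands them and awk sees its own (unset) variables. A minimal sketch of passing shell values into awk with `-v` is shown below; the 3x3 matrix, the variable names `row`, `col`, `cell`, `diag` and the temporary file are hypothetical.

```shell
# Hypothetical 3x3 matrix in a scratch file.
matrix=$(mktemp)
printf '%s\n' '1 2 3' '4 5 6' '7 8 9' > "$matrix"

diag=""
for i in 1 2 3; do
    # -v copies the shell value into awk variables before the program runs
    cell=$(awk -v row="$i" -v col="$i" 'NR == row { print $col }' "$matrix")
    diag="${diag:+$diag }$cell"      # append, with a space once diag is non-empty
done
printf '%s\n' "$diag"
rm -f "$matrix"
```

This starts one awk process per element, so the single-pass awk answers above remain the better choice for large inputs.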
