Find the durations and their maximum between the dataset in an interval in shell script

This is related to my older question Find the durations and their maximum between the dataset in shell script
I have a dataset as:
ifile.txt
2
3
2
3
2
20
2
0
2
0
0
2
1
2
5
6
7
0
3
0
3
4
5
I would like to find the different durations and their maxima between the 0 values, within 6-value intervals.
My desired output is:
ofile.txt
6 20
1 2
1 2
1 2
5 7
1 3
3 5
Where:
6 is the number of counts until the next 0 within the first 6 values (i.e. 2,3,2,3,2,20), and 20 is the maximum value among them;
1 is the number of counts until the next 0 within the next 6 values (i.e. 2,0,2,0,0,2), and 2 is the maximum;
the next two "1 2" lines fall within the same 6 values;
5 is the number of counts until the next 0 within the next 6 values (i.e. 1,2,5,6,7,0), and 7 is the maximum among them;
and so on.
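For reference, the 6-value windows can be made visible by folding the input six numbers per row (a quick sanity check using paste; ifile.txt is the file shown above):

```shell
# fold ifile.txt into rows of 6 values; each row is one interval
paste -d' ' - - - - - - < ifile.txt
```

The first row comes out as 2 3 2 3 2 20 and the second as 2 0 2 0 0 2; the last row is shorter because the file holds 23 values.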
As per the answer in my previous question, I was trying with this:
awk '(NR%6)==0
$0!=0{
count++
max=max>$0?max:$0
}
$0==0{
if(count){
print count,max
}
count=max=""
}
END{
if(count){
print count,max
}
}
' ifile.txt

A format command added to the EDIT2 solution given by RavinderSingh13, which prints the exact desired output:
awk '
$0!=0{
count++
max=max>$0?max:$0
found=""
}
$0==0{
print count,max
count=max=0
next
}
FNR%6==0{
print count,max
count=max=0
found=1
}
END{
if(!found){
print count,max
}
}
' Input_file | awk '!/^ /' | awk '$1 != 0'
Output will be as follows.
6 20
1 2
1 2
1 2
5 7
1 3
3 5
EDIT2: Adding another solution which will print values in every 6 elements along with zeros coming in between.
awk '
$0!=0{
count++
max=max>$0?max:$0
found=""
}
$0==0{
print count,max
count=max=0
next
}
FNR%6==0{
print count,max
count=max=0
found=1
}
END{
if(!found){
print count,max
}
}
' Input_file
Output will be as follows.
6 20
1 2
1 2
0 0
1 2
5 7
1 3
3 5
EDIT: As per OP's comment, OP doesn't want the count of non-zeros to be reset when a zero value comes; in that case, try the following.
awk '
$0!=0{
count++
max=max>$0?max:$0
found=""
}
FNR%6==0{
print count,max
count=max=0
found=1
}
END{
if(!found){
print count,max
}
}
' Input_file
Output will be as follows.
6 20
3 2
5 7
.......
Could you please try the following (written and tested with the posted samples only).
awk '
$0!=0{
count++
max=max>$0?max:$0
found=""
}
$0==0{
count=FNR%6==0?count:0
found=""
}
FNR%6==0{
print count,max
count=max=0
found=1
}
END{
if(!found){
print count,max
}
}
' Input_file

Related

Find the probability in 2nd column for a selection in 1st column

I have two columns as follows
ifile.dat
1 10
3 34
1 4
3 32
5 3
2 2
4 20
3 13
4 50
1 40
2 20
5 2
I would like to calculate the probability in 2nd column for some selection in 1st column.
ofile.dat
1-2 0.417 #Here 1-2 means all values in 1st column ranging from 1 to 2;
#0.417 is the probability of corresponding values in 2nd column
# i.e. count(10,4,2,40,20)/total = 5/12
3-4 0.417 #count(34,32,20,13,50)/total = 5/12
5-6 0.167 #count(3,2)/total = 2/12
Similarly if I choose the range of selection with 3 number, then the desire output will be
ofile.dat
1-3 0.667
4-6 0.333
RavinderSingh13 and James Brown have given nice scripts (see answers), but these do not work for values larger than 10 in the 1st column.
ifile2.txt
10 10
30 34
10 4
30 32
50 3
20 2
40 20
30 13
40 50
10 40
20 20
50 2
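Since the mapping from a 1st-column value to its range is plain integer division, a compact sketch covers both samples; the width variable w and the %.3f format here are illustrative choices, not taken from the answers below:

```shell
# count rows per width-w bucket of the 1st column, then print each
# bucket as "lo-hi probability"; sort -n fixes the bucket order
awk -v w=2 '{ cnt[int(($1-1)/w)]++; n++ }
END { for (b in cnt) { lo = b*w+1; printf "%d-%d %.3f\n", lo, lo+w-1, cnt[b]/n } }' ifile.dat |
sort -n
```

With w=2 this prints 1-2 0.417, 3-4 0.417, 5-6 0.167 for ifile.dat; w=10 handles the larger values of ifile2.txt the same way.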
EDIT2: Considering OP's edited samples, could you please try the following. I have tested it successfully with OP's 1st and latest edit samples and it worked fine with both of them.
One more thing: this solution handles a corner case where the range would otherwise leave the last elements unprinted when the final lines do not cross a range boundary. In OP's 1st sample, for example, range=2 but the maximum value is 5, and 5 is not left out here.
sort -n Input_file |
awk -v range="2" '
!b[$1]++{
c[++count]=$1
}
{
d[$1]=(d[$1]?d[$1] OFS:"")$2
tot_element++
till=$1
}
END{
for(i=1;i<=till;i++){
num+=split(d[i],array," ")
if(++j==range){
start=start?start:1
printf("%s-%s %.02f\n",start,i,num/tot_element)
start=i+1
j=num=""
delete array
}
if(j!="" && i==till){
printf("%s-%s %.02f\n",start,i,num/tot_element)
}
}
}
'
Output will be as follows.
1-10 0.25
11-20 0.17
21-30 0.25
31-40 0.17
41-50 0.17
EDIT: In case your Input_file doesn't have a 2nd column, try the following.
sort -k1 Input_file |
awk -v range="1" '
!b[$1]++{
c[++count]=$1
}
{
d[$1]=(d[$1]?d[$1] OFS:"")$0
tot_element++
till=$1
}
END{
for(i=1;i<=till;i+=(range+1)){
for(j=i;j<=i+range;j++){
num=split(d[c[j]],array," ")
total+=num
}
print i"-"i+range,tot_element?total/tot_element:0
total=num=""
}
}
'
Could you please try the following, written and tested with the shown samples.
sort -k1 Input_file |
awk -v range="1" '
!b[$1]++{
c[++count]=$1
}
{
d[$1]=(d[$1]?d[$1] OFS:"")$2
tot_element++
till=$1
}
END{
for(i=1;i<=till;i+=(range+1)){
for(j=i;j<=i+range;j++){
num=split(d[c[j]],array," ")
total+=num
}
print i"-"i+range,tot_element?total/tot_element:0
total=num=""
}
}
'
In case you don't want to include any 0 values, try the following.
sort -k1 Input_file |
awk -v range="1" '
!b[$1]++{
c[++count]=$1
}
{
d[$1]=(d[$1]!=0?d[$1] OFS:"")$2
tot_element++
till=$1
}
END{
for(i=1;i<=till;i+=(range+1)){
for(j=i;j<=i+range;j++){
num=split(d[c[j]],array," ")
total+=num
}
print i"-"i+range,tot_element?total/tot_element:0
total=num=""
}
}
'
Another:
$ awk '
BEGIN {
a[1]=a[2]=1 # define the groups here
a[3]=a[4]=2 # others will go to an overflow group 3
}
{
b[(($1 in a)?a[$1]:3)]++ # group 3 defined here
}
END { # in the end
for(i in b) # loop all groups in no particular order
print i,b[i]/NR # and output
}' file
Output
1 0.416667
2 0.416667
3 0.166667
Update. Yet another awk, this time with a range configuration file. $1 is the start of the range, $2 the end, and $3 is the group name:
1 3 1-3
4 9 4-9
10 30 10-30
40 100 40-100
Awk program:
$ awk '
BEGIN {
OFS="\t"
}
NR==FNR {
for(i=$1;i<=$2;i++)
a[i]=$3
next
}
{
b[(($1 in a)?a[$1]:"others")]++ # the overflow group is now called "others"
}
END {
for(i in b)
print i,b[i]/NR
}' rangefile datafile
Output with both your datasets concatenated (and the awk output piped to sort -n):
1-3 0.285714
4-9 0.142857
10-30 0.285714
40-100 0.142857

How to edit few lines in a column using awk?

I have an ASCII data file, e.g.:
ifile.txt
2
3
2
3
4
5
6
4
I would like to multiply all the numbers after the 6th line by 3. So the output file will be:
ofile.txt
2
3
2
3
4
15
18
12
My algorithm/script is:
awk '{if ($1<line 6); printf "%10.5f\n", $1}' ifile.txt > ofile.txt
awk '{if ($1>=line 6); printf "%10.5f\n", $1*3}' ifile.txt >> ofile.txt
The simplest way to do this is:
awk 'NR > 6 { $1 *= 3 } 1' ifile.txt
Multiply the first field by 3 when the record (line) number NR is greater than 6.
The structure of an awk program is condition { action }, where the default condition is true and the default action is { print }, so the 1 at the end is the shortest way of always printing every line.
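The shortcut can be seen in isolation; a minimal sketch with arbitrary sample numbers:

```shell
# "1" is an always-true condition, so awk applies the default
# action { print } to every line:
printf '2\n3\n4\n' | awk '1'
# equivalent long form:
printf '2\n3\n4\n' | awk '{ print }'
```

Both commands echo the three input lines unchanged.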

Sum of all rows of all columns - Bash

I have a file like this
1 4 7 ...
2 5 8
3 6 9
And I would like to have as output
6 15 24 ...
That is, the sum of all the rows for each column. I know that to sum all the rows of a certain column (say column 1) you can do it like this:
awk '{sum+=$1;}END{print $1}' infile > outfile
But I can't do it automatically for all the columns.
One more awk
awk '{for(i=1;i<=NF;i++)$i=(a[i]+=$i)}END{print}' file
Output
6 15 24
Explanation
{for (i=1;i<=NF;i++) Loop over every field on the line
$i=(a[i]+=$i) Add the field's value to its running sum and overwrite the field with that sum
END{print} Print the last line, which now contains the sums
As with the other answers this will retain the order of the fields regardless of the number of them.
You want to sum every column differently. Hence, you need an array, not a scalar:
$ awk '{for (i=1;i<=NF;i++) sum[i]+=$i} END{for (i in sum) print sum[i]}' file
6
15
24
This stores sum[column] and finally prints it.
To have the output in the same line, use:
$ awk '{for (i=1;i<=NF;i++) sum[i]+=$i} END{for (i in sum) printf "%d%s", sum[i], (i==NF?"\n":" ")}' file
6 15 24
This uses the trick printf "%d%s", sum[i], (i==NF?"\n":" "): print the number plus a separator character. If we are in the last field, that character is a newline; otherwise, just a space.
There is a very simple command called numsum to do this:
numsum -c FileName
-c --- Print out the sum of each column.
For example:
cat FileName
1 4 7
2 5 8
3 6 9
Output :
numsum -c FileName
6 15 24
Note:
If the command is not installed on your system, you can install it with:
apt-get install num-utils
echo "1 4 7
2 5 8
3 6 9 " \
| awk '{for (i=1;i<=NF;i++){
sums[i]+=$i;maxi=i}
}
END{
for(i=1;i<=maxi;i++){
printf("%s ", sums[i])
}
print}'
output
6 15 24
My recollection is that you can't rely on for (i in sums) to produce the keys in any particular order, but maybe this is "fixed" in newer versions of gawk.
In case you're using an old-line Unix awk, this solution will keep your output in the same column order, regardless of how "wide" your file is.
IHTH
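For what it's worth, GNU awk (4.0+) can pin the traversal order of for (i in ...) through PROCINFO["sorted_in"]; this is a gawk-only extension, not POSIX:

```shell
# iterate array keys in ascending numeric order (gawk extension)
gawk 'BEGIN {
    PROCINFO["sorted_in"] = "@ind_num_asc"
    s[3] = 30; s[1] = 10; s[2] = 20
    for (i in s) print i, s[i]   # visits keys 1, 2, 3 in order
}'
```

Portable awk has no such knob; looping an integer counter up to a stored maximum index, as the program below does, is the POSIX-safe way.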
AWK Program
#!/usr/bin/awk -f
{
print($0);
len=split($0,a);
if (maxlen < len) {
maxlen=len;
}
for (i=1;i<=len;i++) {
b[i]+=a[i];
}
}
END {
for (i=1;i<=maxlen;i++) {
printf("%s ", b[i]);
}
print ""
}
Output
1 2 3 4 5
1 2 3 4 5
1 2 3 4 5
3 6 9 12 15
Your attempt is nearly correct; it just prints $1 instead of sum in the END block. Try this:
awk '{sum+=$1;} END{print sum;}' infile > outfile

Bash - printing diagonal line from matrix [duplicate]

This question already has answers here:
Bash printing diagonal of any size array
(3 answers)
Closed 8 years ago.
I have a file like this
1 3 4 5
2 5 0 9
3 4 6 6
0 1 0 1
I want it to print out a diagonal line from right to left, so:
5 0 4 0
I have this so far, but it's only printing out the last column.
#!/bin/bash
awk '{c=NF}{printf "%d ", $c}{c-=1}'
You were close:
echo "1 3 4 5
2 5 0 9
3 4 6 6
0 1 0 1" | awk 'BEGIN{c=4}{printf("%d ", $c);c-=1}'
You had to set the starting column once, in the BEGIN "section"; c=NF wouldn't work there because NF is not defined until the first record is read.
So if your size is not fixed, what you could do is use NR:
echo "1 3 4 5
2 5 0 9
3 4 6 6
0 1 0 1" | awk '{printf("%d ", $(NF-NR+1))}'
See the link in jaypal's comment too.
Printing top-left to bottom right
awk '{printf(" %d", $NR)} END{print ""}'
Print the column corresponding to the record number; at the end, print a newline. If you don't want the leading space:
awk '{printf("%s%d", pad, $NR); pad=" "} END{print ""}'
Printing top-right to bottom left
awk '{if (NF >= NR) printf(" %d", $(NF - NR + 1))} END{print ""}'
awk '{if (NF >= NR) printf("%s%d", pad, $(NF - NR + 1)); pad=" "} END{print ""}'
Note that none of the programs handles malformed arrays particularly cleverly.
GIGO — garbage in, garbage out.

Aggregate rows with specified granularity

Input:
11 1
12 2
13 3
21 1
24 2
33 1
50 1
Let's say 1st column specify index. I'd like to reduce size of my data as follows:
I sum values from the second column, with a granularity of 10, according to the indices. An example:
First I consider the range 0-9 of indices. There aren't any indices in that range, so the sum equals 0. Next I go to the next range, 10-19. There are 3 indices (11,12,13) that fall in the range. I sum the values from the 2nd column for them: 1+2+3=6. And so on...
Desirable output:
0 0
10 6
20 3
30 1
40 0
50 1
This is what I came up with:
M=0;
awk 'FNR==NR
{
if ($1 < 10)
{ A[$1]+=$2;next }
else if($1 < $M+10)
{
A[$M]+=$2;
next
}
else
{ $M=$M+10;
A[$M]+=2;
next
}
}END{for(i in A){print i" "A[i]}}' input_file
Sorry, but I'm not very good at AWK.
After some changes:
awk 'FNR==NR {
M=10;
if ($1 < 10){
A[$1]+=$2;next
} else if($1 < M+10) {
A[M]+=$2;
next
} else {
M=sprintf("%d",$1/10);
M=M*10;
A[M]+=$2;
next
}
}END{for(i in A){print i" "A[i]}}' input
This is plain awk: bucket each index with int($1/10)*10 and sum the 2nd column per bucket (run as awk -f script.awk input_file):
{
    ind = int($1/10)*10        # bucket start: 0, 10, 20, ...
    if (mxi < ind) mxi = ind
    a[ind] += $2               # sum the 2nd-column values per bucket
}
END {
    for (i=0; i<=mxi; i+=10) {
        s = a[i] + 0           # empty buckets print as 0
        print i " " s
    }
}
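A quick check of the bucketing logic against the question's sample, with the data inlined (a one-liner variant of the same idea: per-bucket sums of the 2nd column, empty buckets printed as 0):

```shell
printf '11 1\n12 2\n13 3\n21 1\n24 2\n33 1\n50 1\n' |
awk '{ ind = int($1/10)*10; if (mxi < ind) mxi = ind; sum[ind] += $2 }
     END { for (i = 0; i <= mxi; i += 10) print i, sum[i]+0 }'
```

This reproduces the desired output, from 0 0 up to 50 1.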
