strange behaviour in awk sub replacement - bash

I've got a string a
[root@rh-1 ~]# echo $a
4 11 10 7 11
This set of values needs to be checked against the id field in the file table:
[root@rh-1 ~]# cat table
id name inuse rules
1 Critical data 0 ...
2 Important data 0 ...
3 Normal data 0 ...
4 nc-test 0 ...
5 schedule one 0 ...
7 foo sc2 0 ...
8 foo sc3 0 ...
9 foo-sc4 0 ...
10 foo_sc5 0 ...
11 foosc6 0 ...
Wherever the id is one of the numbers in $a, I need to replace the value of the inuse column with 1. The rules column can be discarded:
[root@rh-1 ~]# cat table | awk -v a="$a" ' {
split(a,sid," "); NF=NF-1;}
{ $NF=0;n=length(sid);
for (i=1;i<n;i++)
{ if($1 == sid[i]){
sub($NF,1);
break;
}
}
if($1=="id")
print("mid name InUse" );
else
printf("%-25s\n",$0);
} '
The approach above mostly works, but for the line
10 foo_sc5 0
instead of replacing the 0 at the end to give
10 foo_sc5 1
it produces
11 foo_sc5 0
while all the other lines are replaced correctly.
mid name InUse
1 Critical data 0
2 Important data 0
3 Normal data 0
4 nc-test 1
5 schedule one 0
7 foo sc2 1
8 foo sc3 0
9 foo-sc4 0
11 foo_sc5 0
11 foosc6 1
[root@rh-1 ~]#
Somehow the replacement is faulty only for that particular line. Can someone help?

Replace sub($NF,1) with $NF=1.
sub($NF,1) uses $NF as a regular expression and replaces its first occurrence in the line with 1. Since $NF starts out as 0, the first zero anywhere in the line is replaced with a one; for the 10 foo_sc5 0 line, that first zero is the one inside 10, which is how it becomes 11.
By contrast, $NF=1 merely sets the last column to one.
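For reference, a minimal sketch of the script with that fix applied (it also swaps the lookup loop for an associative array, since the original for (i=1;i<n;i++) bound never checks the last id in $a, and, like the original, it relies on awk rebuilding $0 when NF is decremented):
awk -v a="$a" '
BEGIN { n = split(a, sid, " "); for (i = 1; i <= n; i++) inuse[sid[i]] = 1 }
$1 == "id" { print "mid name InUse"; next }   # header line
{
    NF = NF - 1                  # drop the rules column; $NF is now the inuse column
    if ($1 in inuse) $NF = 1     # mark ids listed in $a as in use
    printf "%-25s\n", $0
}' table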

Related

Add up every 5 rows in a column of integers BASH

I am writing a parser, and have to do some fancy stuff. I am trying not to use Python, but I might have to at this point.
Given an STDOUT that looks like this:
1
0
2
3
0
0
1
0
0
2
0
3
0
4
0
5
0
2
.
.
.
This continues for about 100,000 lines. What I need to do is add up every 5 rows, like so:
1 - start
0 |
2 | - 6
3 |
0 - end
0 - start
1 |
0 | - 3
0 |
2 - end
0 - start
3 |
0 | - 7
4 |
0 - end
5
0
2
.
.
.
The -, |, start, and end are all just for visual representation; I only need the sums as a column list:
6
3
7
.
.
.
I currently do this with head -n $i (incrementing $i) and tail -n 5 to cut 5 rows out of the list, then paste -sd+ - | bc to add up the values. But this is way too slow because there are 100,000 lines.
If anyone has anything to add I would appreciate it. Let me know if more info is needed.
Thank you
It looks like awk is a natural tool to use:
awk '{ sum += $1 } NR % 5 == 0 { print sum; sum = 0 }'
Add values in column 1 to sum. If the record number modulo 5 is 0, print the sum and reset it to 0. Note that if the last group of records is short (1-4 elements in the group), their sum is not printed. If you want the sum for the short group printed, add END { if (NR % 5 != 0) print sum } to the script.
Since this makes a single pass over the data file using a single command, it will be hard to beat. Using Perl might be a little faster. I don't know how Python would fare against either Awk or Perl.
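Putting it together with the optional END block (the input file name here is just a placeholder):
awk '{ sum += $1 }
     NR % 5 == 0 { print sum; sum = 0 }
     END { if (NR % 5 != 0) print sum }' data.txt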
You can use awk for it.
Say a file named file1 contains:
1
0
2
3
0
0
1
0
0
2
0
3
0
4
0
5
0
.
.
.
So the awk command goes like this:
awk 'BEGIN{sum=0} {sum=sum+$1; if(NR%5==0){print sum; sum=0}}' file1

How to find sum of elements in column inside of a text file (Bash)

I have a log file with lots of unnecessary information. The only important part of that file is a table which describes some statistics. My goal is to have a script which will accept a column name as an argument and return the sum of all the elements in the specified column.
Example log file:
.........
Skipped....
........
WARNING: [AA[409]: Some bad thing happened.
--- TOOL_A: READING COMPLETED. CPU TIME = 0 REAL TIME = 2
--------------------------------------------------------------------------------
----- TOOL_A statistics -----
--------------------------------------------------------------------------------
NAME Attr1 Attr2 Attr3 Attr4 Attr5
--------------------------------------------------------------------------------
AAA 885 0 0 0 0
AAAA2 1 0 2 0 0
AAAA4 0 0 2 0 0
AAAA8 0 0 2 0 0
AAAA16 0 0 2 0 0
AAAA1 0 0 2 0 0
AAAA8 0 0 23 0 0
AAAAAAA4 0 0 18 0 0
AAAA2 0 0 14 0 0
AAAAAA2 0 0 21 0 0
AAAAA4 0 0 23 0 0
AAAAA1 0 0 47 0 0
AAAAAA1 2 0 26 0
NOTE: Some notes
......
Skipped ......
The expected usage: script.sh Attr1
Expected output:
888
I've tried to find something with sed/awk but failed to figure out a solution.
tldr;
$ cat myscript.sh
#!/bin/sh
logfile=${1}
attribute=${2}
field=$(grep -o "NAME.\+${attribute}" ${logfile} | wc -w)
sed -nre '/NAME/,/NOTE/{/NAME/d;/NOTE/d;s/\s+/\t/gp;}' ${logfile} | \
cut -f${field} | \
paste -sd+ | \
bc
$ ./myscript.sh mylog.log Attr3
182
Explanation:
assign command-line arguments ${1} and ${2} to the logfile and attribute variables, respectively.
with wc -w, count the number of words in the portion of the header line matched from NAME through ${attribute}; that word count is the field index, which is assigned to field
with sed
suppress automatic printing (-n) and enable extended regular expressions (-r)
find lines between the NAME and NOTE lines, inclusive
delete the lines that match NAME and NOTE
translate each contiguous run of whitespace to a single tab and print the result
cut using the field index
paste all numbers as an infix summation
evaluate the infix summation via bc
Quick and dirty (without any other spec)
awk -v CountCol=2 '/^[^[:blank:]]/ && NF == 6 { S += $( CountCol) } END{ print S + 0 }' YourFile
With a column name:
awk -v ColName='Attr1' '/^[^[:blank:]]/ && NF == 6 { for(i=1;i<=NF;i++){ if ($i == ColName) CountCol = i } } /^[^[:blank:]]/ && NF == 6 && CountCol { S += $(CountCol) } END{ print S + 0 }' YourFile
You should add a header/trailer filter to avoid noisy lines (a flag would suit this perfectly), but since there is not enough information about the structure to set such a flag, I use a simple field count (assuming text fields evaluate to 0, so they don't change the sum when included).
$ awk -v col='Attr3' '/NAME/{for (i=1;i<=NF;i++) f[$i]=i} col in f{sum+=$(f[col]); if (!NF) {print sum+0; exit} }' file
182
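For readability, here is the same one-liner written out with comments (behaviour unchanged; like the original, it relies on an empty line marking the end of the table, which is what the !NF test checks):
awk -v col='Attr3' '
/NAME/ {                          # header line: record each column name\047s position
    for (i = 1; i <= NF; i++) f[$i] = i
}
col in f {                        # every line once the header has been seen
    sum += $(f[col])              # text and separator lines contribute 0
    if (!NF) { print sum + 0; exit }   # empty line: table is over, print and quit
}' file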

bash search output for similar text and perform calculation between the 2

I am working on a script that will run a pm2 list and assign it to a variable, wait X seconds, and run it again, assigning it to a different variable. Then I run those through comm <(echo "$pm2_1") <(echo "$pm2_2") -3, which gives me only the output that differs between the two, in a nice format:
name ID restart count
prog-name 0 1
prog-name 0 2
prog-name-live 10 1
prog-name-live 10 8
prog-name-live 3 1
prog-name-live 3 4
prog-name-live 6 1
prog-name-live 6 6
What I need is a way to compare the restart counts on the two lines with the same ID. For example:
name ID restart count
prog-name 0 1
prog-name 0 2
prog-name-worker 10 1
prog-name-worker 10 8
Any ideas would be very helpful!
Thanks
awk supports hashes (associative arrays); hope that helps.
awk '{k=$1" "$2; a[k]=$3; print k, a[k]}'
Here is an example of using it to find the difference; you can plug in whatever logic you need:
awk '{k=$1" "$2; if (a[k]==0)a[k]=$3; else {a[k]-=$3; q=a[k]>0?a[k]:a[k]*-1;print k,q}}'

How do I filter tab-separated input by the count of fields with a given value?

My data (tab-separated):
1 0 0 1 0 1 1 0 1
1 1 0 1 0 1 0 1 1
1 1 1 1 1 1 1 1 1
0 0 0 0 0 0 0 0 0
...
How can I grep the lines with exactly, for example, five '1's?
Ideal output:
1 0 0 1 0 1 1 0 1
Also, how can I grep lines with five or more (>= 5) '1's?
Ideal output:
1 0 0 1 0 1 1 0 1
1 1 0 1 0 1 0 1 1
1 1 1 1 1 1 1 1 1
I tried:
grep 1$'\t'1$'\t'1$'\t'1$'\t'1
However, this only matches consecutive '1's, which is not all I want.
I wonder if there is a simple method to achieve this. Thank you!
John Bollinger's helpful answer and anishane's answer show that it can be done with grep, but, as has been noted, that is quite cumbersome, given that regular expressions aren't designed for counting.
awk, by contrast, is built for field-based parsing and counting (often combined with regular expressions to identify field separators, or, as below, the fields themselves).
Assuming you have GNU awk, you can use the following:
Exactly 5 1s:
awk -v FPAT='\\<1\\>' 'NF==5' file
5 or more 1s:
awk -v FPAT='\\<1\\>' 'NF>=5' file
Special variable FPAT is a GNU awk extension that allows you to identify fields via a regex that describes the fields themselves, in contrast with the standard approach of using a regex to define the separators between fields (via special variable FS or option -F):
'\\<1\\>' identifies any "isolated" 1 (surrounded by non-word characters) as a field, based on word-boundary assertions \< and \>; the \ must be doubled here so that the initial string parsing performed by awk doesn't "eat" single \s.
Standard variable NF contains the count of input fields in the line at hand, which allows easy numerical comparison. If the conditional evaluates to true, the input line at hand is implicitly printed (in other words: NF==5 is implicitly the same as NF==5 { print } and, more verbosely, NF==5 { print $0 }).
A POSIX-compliant awk solution is a little more complicated:
Exactly 5 1s:
awk '{ l=$0; gsub("[\t0]", "") }; length($0)==5 { print l }' file
5 or more 1s:
awk '{ l=$0; gsub("[\t0]", "") }; length($0)>=5 { print l }' file
l=$0 saves the input line ($0) in its original form in variable l.
gsub("[\t0]", "") replaces all \t and 0 chars. in the input line with the empty string, i.e., effectively removes them, and only leaves (directly concatenated) 1 instances (if any).
length($0)==5 { print l } then prints the original input line (l) only if the resulting string of 1s (i.e., the count of 1s now stored in the modified input line ($0)) matches the specified count.
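Another POSIX-portable variant (not from the original answer, just a sketch) is to count matching fields explicitly, which avoids modifying $0:
# exactly five 1s; change c == 5 to c >= 5 for "five or more"
awk '{ c = 0; for (i = 1; i <= NF; i++) if ($i == 1) c++ } c == 5' file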
You can use grep. But that would be an abuse of regex.
$ cat countme
1 0 0 1 0 1 1 0 1
1 1 0 1 0 1 0 1 1
1 1 1 1 1 1 1 1 1
0 0 0 0 0 0 0 0 0
$ grep -P '^[0\t]*(1[0\t]*){5}[0\t]*$' countme # Match exactly 5
1 0 0 1 0 1 1 0 1
$ grep -P '^[0\t]*(1[0\t]*){5,}[0\t]*$' countme # Match >=5
1 0 0 1 0 1 1 0 1
1 1 0 1 0 1 0 1 1
1 1 1 1 1 1 1 1 1
You can do this to get lines with exactly five '1's:
grep '^[^1]*\(1[^1]*\)\{5,5\}[^1]*$'
You can simplify that to this for at least five '1's:
grep '\(1[^1]*\)\{5,\}'
The enumerated quantifier (\{n,m\}) enables you to conveniently specify a particular number or range of numbers of consecutive matches to a sub-pattern. To avoid matching lines with extra matches to such a pattern, however, you must also anchor it to the beginning and end of the line.
The other trick is to make sure the gaps before the first 1, between the 1s, and after the last 1 are matched. In your case, all of those gaps can be represented pretty simply as ranges of zero or more characters other than 1: [^1]*. Putting those pieces together gives you the above regular expressions.
Do
sed -nE '/^([^1]*1[^1]*){5}$/p' your_file
for exactly 5 matches and
sed -nE '/^([^1]*1[^1]*){5,}$/p' your_file
for 5 or more matches.
Note: In GNU sed you may not see the -E option in the manpage, but it is supported. Using -E is for portability to, say, Mac OSX.
with perl
$ perl -ane 'print if (grep {$_==1} @F) == 5' ip.txt
1 0 0 1 0 1 1 0 1
$ perl -ane 'print if (grep {$_==1} @F) >= 5' ip.txt
1 0 0 1 0 1 1 0 1
1 1 0 1 0 1 0 1 1
1 1 1 1 1 1 1 1 1
-a automatically splits the input line on whitespace and saves the fields to the @F array
grep {$_==1} @F returns the elements of @F that are exactly equal to 1
(grep {$_==1} @F) == 5 evaluates the grep in scalar context, so the comparison is done on the number of elements returned
See http://perldoc.perl.org/perlrun.html#Command-Switches for details on -ane options

Another split file in bash - based on difference between rows of column x

Hello stackoverflow users!
Generally I would like to tune up a script I am using, to make it more robust against missing data.
My example data looks like this (tab delimited csv file with headers):
ColA ColB ColC
6 0 0
3 5.16551 12.1099
1 10.2288 19.4769
6 20.0249 30.6543
3 30.0499 40.382
1 59.9363 53.2281
2 74.9415 57.1477
2 89.9462 61.3308
6 119.855 64.0319
4 0 0
8 5.06819 46.8086
6 10.0511 60.1357
9 20.0363 71.679
6 30.0228 82.1852
6 59.8738 98.4446
3 74.871 100.648
1 89.9973 102.111
6 119.866 104.148
3 0 0
1 5.07248 51.9168
2 9.92203 77.3546
2 19.9233 93.0228
6 29.9373 98.7797
6 59.8709 100.518
6 74.7751 100.056
3 89.9363 99.5933
1 119.872 100
I use an awk script found elsewhere, as follows:
awk 'BEGIN { fn=0 }
NR==1 { next }
NR==2 { delim=$2 }
$2 == delim {
f=sprintf("file_no%02d.txt",fn++);
print "Creating " f
}
{ print $0 > f }'
This gives me the output I want: omit the 1st line, take the 2nd column of the first data row as the delimiter (in this example it will be '0'), and split on it:
file_no00.txt
6 0 0
3 5.16551 12.1099
1 10.2288 19.4769
6 20.0249 30.6543
3 30.0499 40.382
1 59.9363 53.2281
2 74.9415 57.1477
2 89.9462 61.3308
6 119.855 64.0319
file_no01.txt
4 0 0
8 5.06819 46.8086
6 10.0511 60.1357
9 20.0363 71.679
6 30.0228 82.1852
6 59.8738 98.4446
3 74.871 100.648
1 89.9973 102.111
6 119.866 104.148
file_no02.txt
3 0 0
1 5.07248 51.9168
2 9.92203 77.3546
2 19.9233 93.0228
6 29.9373 98.7797
6 59.8709 100.518
6 74.7751 100.056
3 89.9363 99.5933
1 119.872 100
To make the script more robust (imagine that the rows with 0's are deleted), I would need to split the file according to the difference between the column 2 values of rows n+1 and n: if (value_row_n+1) - (value_row_n) < 0, start a new file. Of course I would also need to maintain the file naming. The preferred way is bash with awk. Any advice? Thanks in advance!
Cheers!
Here is an awk command that you can use:
cat file
ColA ColB ColC
3 5.16551 12.1099
1 10.2288 19.4769
6 20.0249 30.6543
3 30.0499 40.382
1 59.9363 53.2281
2 74.9415 57.1477
2 89.9462 61.3308
6 119.855 64.0319
8 5.06819 46.8086
6 10.0511 60.1357
9 20.0363 71.679
6 30.0228 82.1852
6 59.8738 98.4446
3 74.871 100.648
1 89.9973 102.111
6 119.866 104.148
1 5.07248 51.9168
2 9.92203 77.3546
2 19.9233 93.0228
6 29.9373 98.7797
6 59.8709 100.518
6 74.7751 100.056
3 89.9363 99.5933
1 119.872 100
awk 'NR == 1 {                 # skip the header line
    next
}
!p || $2 < p {                 # first data row, or column 2 dropped below the previous value
    f = sprintf("file_no%02d.txt", fn++);  # start the next output file
    print "Creating " f
}
{
    p = $2;                    # remember column 2 for the next comparison
    print $0 > f
}' file
I suggest small modifications to your current script:
awk 'BEGIN { fn=0; f=sprintf("file_no%02d.txt",fn++); print "Creating " f }
NR==1 { next }
NR==2 { delim=$2 }
$2 - delim < 0 {
f=sprintf("file_no%02d.txt",fn++);
print "Creating " f
}
{ print $0 > f; delim = $2 }' infile
First, create the first file name in the BEGIN block, just before processing starts.
Second, in the last action, save the value of the current line's second column to compare with the next line.
Third, instead of comparing with zero, subtract the previous value from the current one and check whether the result is less than zero.
It yields:
==> file_no00.txt <==
6 0 0
3 5.16551 12.1099
1 10.2288 19.4769
6 20.0249 30.6543
3 30.0499 40.382
1 59.9363 53.2281
2 74.9415 57.1477
2 89.9462 61.3308
6 119.855 64.0319
==> file_no01.txt <==
4 0 0
8 5.06819 46.8086
6 10.0511 60.1357
9 20.0363 71.679
6 30.0228 82.1852
6 59.8738 98.4446
3 74.871 100.648
1 89.9973 102.111
6 119.866 104.148
==> file_no02.txt <==
3 0 0
1 5.07248 51.9168
2 9.92203 77.3546
2 19.9233 93.0228
6 29.9373 98.7797
6 59.8709 100.518
6 74.7751 100.056
3 89.9363 99.5933
1 119.872 100
