count number of a specific string on consecutive lines - bash

I'm rather new to Linux and would like to use a shell script to count the number of times a specific string appears in concurrent lines of text.
For example, I have a log with data similar to this:
active node table^M
-------------------^M
pkey vlan master-s/n gateway-s/n gateway-prio if-name gateway name advertised ip
0x7fff 0 N/A 0xa0600 100 if0 DA2WIBL1-25-io 10.126.144.49
0x7fff 0 N/A 0xa0580 100 if0 DA2WIBL1-31-io 10.126.144.51
0x7fff 0 N/A 0xa0400 100 if0 DA2WIBL1-28-io 10.126.144.50
active node table
-------------------
I want to simply count the number of concurrent lines containing the string: '0x7fff', and write that to a file.
Does anyone have an idea of how to wisely approach this? I know some shell, expect and similar scripting languages.

If by "concurrent" you mean "consecutive", then you would normally use the uniq command to group consecutive lines, and uniq -c to group and count them simultaneously:
cat logfile | awk '{print $1}' | uniq -c | grep '0x7fff' | awk '{print $1}'
Given content of logfile as:
0x7fff 0 N/A 0xa0600 100 if0 DA2WIBL1-25-io 10.126.144.49
0x7fff 0 N/A 0xa0580 100 if0 DA2WIBL1-31-io 10.126.144.51
0x7fff 0 N/A 0xa0400 100 if0 DA2WIBL1-28-io 10.126.144.50
0x8b5f 0 N/A 0xa0600 100 if0 DA2WIBL1-25-io 10.126.144.49
0x7fff 0 N/A 0xa0400 100 if0 DA2WIBL1-28-io 10.126.144.50
the above command will produce the following output:
3
1
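Since you also want to write the count to a file, redirecting the end of the pipeline is all that's needed (counts.txt is just an example name):
awk '{print $1}' logfile | uniq -c | grep '0x7fff' | awk '{print $1}' > counts.txt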

grep "0x7fff" logfilename.log | wc -l
The above command counts the number of lines in which the pattern or string value "0x7fff" appears (regardless of whether those lines are consecutive).
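grep can also count matching lines by itself, so the same number is available without wc:
grep -c "0x7fff" logfilename.log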

$ awk -v key='0x7fff' '{if($1==key) c++; else {print c; c=0}}
END {print c}' file
Testing with @gudok's sample input gives:
3
1
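Note that the else branch also fires once per non-matching line, so a run of non-matching lines prints an empty line or 0 for each of them. If only the lengths of actual runs are wanted, a guarded variant (my sketch, not part of the original answer) would be:
awk -v key='0x7fff' '
  $1 == key { c++; next }         # still inside a run of matching lines
  c         { print c; c = 0 }    # a run just ended: print its length
  END       { if (c) print c }    # flush a run that ends at end of file
' file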

Related

Replacing the value of specific field in a table-like string stored as bash variable

I am looking for a way to replace (with 0) a specific value (1043252782) in a "table-like" string stored as a bash variable. The output of echo "$var" looks like this:
5 Reallocated_Sector_Ct 0x0033 100 100 010 Pre-fail Always - 0
7 Seek_Error_Rate 0x000f 090 060 045 Pre-fail Always - 1043252782
10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0
187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0
After the replacement echo "$var" should look like this:
5 Reallocated_Sector_Ct 0x0033 100 100 010 Pre-fail Always - 0
7 Seek_Error_Rate 0x000f 090 060 045 Pre-fail Always - 0
10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0
187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0
Is there a way to do this without saving the content of $var to a file and directly manipulating it within the bash (shell script)?
Maybe with awk? I can select the value in the 10th field of the second record with awk and pattern matching ("7 Seek_Error_Rate ...") like this:
echo "$var" | awk '/^ 7/{print $10}'
Maybe there is some way of doing it with awk (or another CLI tool) to replace it and store it back into $var? Also, the value changes over time, but the structure remains the same (the same record at the 10th field).
You can change a specific string directly in the shell:
var=${var/1043252782/0}
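For illustration, bash's pattern substitution replaces the first occurrence of the pattern in the variable's value (a toy example):
$ var='value: 1043252782'
$ var=${var/1043252782/0}
$ echo "$var"
value: 0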
To replace the final number of the second line, you could use awk or sed:
var=$(awk 'NR==2 { sub(/[0-9]+$/,0) }1' <<<"$var")
var=$(sed '2s/[0-9][0-9]*$/0/' <<<"$var")
If you don't know which line it will be, you can match a known string:
var=$(awk '/Seek_Error_Rate/{ sub(/[0-9]+$/,0) }1' <<<"$var")
var=$(sed '/Seek_Error_Rate/s/[0-9][0-9]*$/0/' <<<"$var")
You can use a here-string to feed the variable as input to awk.
Use sub() to perform a regular expression replacement.
var=$(awk '{sub(/1043252782$/, "0")}1' <<<"$var")
Using sed
$ var=$(sed '/1043252782$/s//0/' <<< "$var")
$ echo "$var"
5 Reallocated_Sector_Ct 0x0033 100 100 010 Pre-fail Always - 0
7 Seek_Error_Rate 0x000f 090 060 045 Pre-fail Always - 0
10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0
187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0
If you don't want to ruin the formatting of tabs and spaces:
{m,g}wk NF=NF FS=' 1043252782$' OFS=' 0'
Output:
5 Reallocated_Sector_Ct 0x0033 100 100 010 Pre-fail Always - 0
7 Seek_Error_Rate 0x000f 090 060 045 Pre-fail Always - 0
10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0
187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0
Or, doing the whole file in one single shot:
awk NF=NF FS=' 1043252782\n' OFS=' 0\n' RS='^$' ORS=
awk NF=NF FS=' 1043252782\n' OFS=' 0\n' RS=
(This might work too, but I'm not sure about the side effects of an empty RS, which puts awk into paragraph mode.)
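For readers not used to that compact style, a roughly equivalent spelled-out form (my sketch, assuming GNU awk for the RS="^$" whole-file slurp) is:
var=$(gawk '
  BEGIN { RS = "^$"; ORS = "" }         # slurp the whole input as one record (GNU awk idiom)
  { gsub(/ 1043252782\n/, " 0\n") } 1   # replace the value, leaving all other spacing intact
' <<<"$var")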

How to find sum of elements in column inside of a text file (Bash)

I have a log file with lots of unnecessary information. The only important part of that file is a table which describes some statistics. My goal is to have a script which will accept a column name as argument and return the sum of all the elements in the specified column.
Example log file:
.........
Skipped....
........
WARNING: [AA[409]: Some bad thing happened.
--- TOOL_A: READING COMPLETED. CPU TIME = 0 REAL TIME = 2
--------------------------------------------------------------------------------
----- TOOL_A statistics -----
--------------------------------------------------------------------------------
NAME Attr1 Attr2 Attr3 Attr4 Attr5
--------------------------------------------------------------------------------
AAA 885 0 0 0 0
AAAA2 1 0 2 0 0
AAAA4 0 0 2 0 0
AAAA8 0 0 2 0 0
AAAA16 0 0 2 0 0
AAAA1 0 0 2 0 0
AAAA8 0 0 23 0 0
AAAAAAA4 0 0 18 0 0
AAAA2 0 0 14 0 0
AAAAAA2 0 0 21 0 0
AAAAA4 0 0 23 0 0
AAAAA1 0 0 47 0 0
AAAAAA1 2 0 26 0
NOTE: Some notes
......
Skipped ......
The expected usage: script.sh Attr1
Expected output:
888
I've tried to find something with sed/awk but failed to figure out a solution.
tldr;
$ cat myscript.sh
#!/bin/sh
logfile=${1}
attribute=${2}
field=$(grep -o "NAME.\+${attribute}" ${logfile} | wc -w)
sed -nre '/NAME/,/NOTE/{/NAME/d;/NOTE/d;s/\s+/\t/gp;}' ${logfile} | \
cut -f${field} | \
paste -sd+ | \
bc
$ ./myscript.sh mylog.log Attr3
182
Explanation:
assign command-line arguments ${1} and ${2} to the logfile and attribute variables, respectively.
with wc -w, count the number of words in the line from NAME up to ${attribute}; that word count is the field index of the requested column and is assigned to field
with sed
suppress automatic printing (-n) and enable extended regular expressions (-r)
find lines between the NAME and NOTE lines, inclusive
delete the lines that match NAME and NOTE
translate each contiguous run of whitespace to a single tab and print the result
cut using the field index
paste all numbers into one infix summation (e.g. 1+2+3)
evaluate the infix summation via bc (a standalone example of this idiom follows below)
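For instance, the paste/bc idiom on its own turns a column of numbers into one infix expression and evaluates it:
$ printf '1\n2\n3\n' | paste -sd+ | bc
6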
Quick and dirty (without any other spec)
awk -v CountCol=2 '/^[^[:blank:]]/ && NF == 6 { S += $( CountCol) } END{ print S + 0 }' YourFile
with column name
awk -v ColName='Attr1' '/^[^[:blank:]]/ && NF == 6 && !CountCol { for(i=1;i<=NF;i++) if ($i == ColName) CountCol = i }
    /^[^[:blank:]]/ && NF == 6 && CountCol { S += $(CountCol) }
    END { print S + 0 }' YourFile
You should add a header/trailer filter to avoid noisy lines (a flag would suit this perfectly), but there is not enough information about the structure to set such a flag, so I use a simple field count (assuming text fields have a numeric value of 0, so they don't change the sum when included).
$ awk -v col='Attr3' '/NAME/{for (i=1;i<=NF;i++) f[$i]=i} col in f{sum+=$(f[col]); if (!NF) {print sum+0; exit} }' file
182

How can a shell script merge three files by lines and calculate some value, meeting some condition?

while read line1
do
    while read line2
    do
        while read line3
        do
            echo "$line1, $line2, $line3" | awk -F , '$1==$5 && $6==$11 && $10==$12 {print $1,",",$2,",",$3,",",$4,",",$6,",",$7,",",$8,",",$9,",",$10,",",$13,",",$14,",",$15}' >> out.txt
        done < grades.csv
    done < subjects.csv
done < students.csv
In this code I am merging the three files line by line (cross product), and if a merged line meets the condition "$1==$5 && $6==$11 && $10==$12", I print it to the output file.
Now my problem is that I want to keep adding the $13 field values for each iteration that meets the condition.
How can I do this? Please help.
Here are the sample files.
grades.csv contains these lines:
1,ARCH,1,90,very good,80
1,ARCH,2,70,good,85
1,PLNG,1,89,very good,85
subjects.csv contains these lines:
1,ARCH,Computer Architecture,A,K.Gose
1,PLNG,Programming Languages,A,P.Yang
1,OS,Operating System,B,K.Gopalan
2,ARCH,Computer Architecture,A,K.Gose
students.csv contains lines:
1,pankaj,vestal,986-654-32
2,satadisha,binghamton,879-876-54
5,pankaj,vestal,986-654-32
6,pankaj,vestal,986-654-31
This is the expected output:
ARCH 1 pankaj vestal 986-654-32 Computer Architecture A K.Gose 1 1 90 very good 80
ARCH 1 pankaj vestal 986-654-32 Computer Architecture A K.Gose 1 2 70 good 85
ARCH 2 satadisha binghamton 879-876-54 Computer Architecture A K.Gose 1 1 90 very good 80
ARCH 2 satadisha binghamton 879-876-54 Computer Architecture A K.Gose 1 2 70 good 85
PLNG 1 pankaj vestal 986-654-32 Programming Languages A P.Yang 1 1 89 very good 85
Also I need the sum of (90+70+90+70+89) in another shell variable which can be written to a file.
Assuming you have joined the columns to form a TSV (tab-separated values) file or stream, and that columns $k1, $k2, and $k3 (in that file or stream) form the key, and that you want to sum column $s in the join, here is the awk command you can use to form a TSV listing of the keys and sum:
awk -F'\t' -v k1="$k1" -v k2="$k2" -v k3="$k3" -v s="$s" '
BEGIN { t = OFS = "\t" }
      { key = $k1 t $k2 t $k3; sum[key] += $s }
END   { for (key in sum) print key, sum[key] }'
(Using awk to process CSV files that might contain commas is asking for trouble, so I've illustrated how to use awk with tabs.)
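For example, if the key were columns 1, 2 and 5 of the joined TSV and the value to sum were column 11 (hypothetical positions; adjust them to your actual join output), the call could look like this:
k1=1; k2=2; k3=5; s=11          # hypothetical column positions
awk -F'\t' -v k1="$k1" -v k2="$k2" -v k3="$k3" -v s="$s" '
  BEGIN { t = OFS = "\t" }
        { key = $k1 t $k2 t $k3; sum[key] += $s }
  END   { for (key in sum) print key, sum[key] }
' joined.tsv                    # joined.tsv: whatever file your join step produced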
You can use join to create your expanded data and then operate on it with awk.
$ join -t, -1 5 -2 2 <(join -t, -j 1 file3 file2 | sort -t, -k5,5) file1 | column -s, -t
ARCH 1 pankaj vestal 986-654-32 Computer Architecture A K.Gose 1 1 90 very good 80
ARCH 1 pankaj vestal 986-654-32 Computer Architecture A K.Gose 1 2 70 good 85
ARCH 2 satadisha binghamton 879-876-54 Computer Architecture A K.Gose 1 1 90 very good 80
ARCH 2 satadisha binghamton 879-876-54 Computer Architecture A K.Gose 1 2 70 good 85
PLNG 1 pankaj vestal 986-654-32 Programming Languages A P.Yang 1 1 89 very good 85
Alternatively, you can do the join in awk as well, eliminating the while loops. For example:
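The lookup-table pattern behind an awk join looks like this (a simplified sketch joining just students.csv and subjects.csv on their first field, not the full three-way merge):
awk -F, '
  FNR == NR    { stud[$1] = $0; next }      # first file: remember each student row by its id
  ($1 in stud) { print stud[$1] "," $0 }    # second file: print the joined record
' students.csv subjects.csv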
If you want to add the values in $11:
$ join -t, -1 5 -2 2 <(join -t, -j 1 file3 file2 | sort -t, -k5,5) file1 | awk -F, '{sum+=$11} END{print sum}'
To assign the result to a shell variable
$ sum=$(join ... )
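Putting it together (a sketch assuming, as the join above implies, that file1, file2 and file3 stand for grades.csv, subjects.csv and students.csv; sum.txt is just an example output name):
sum=$(join -t, -1 5 -2 2 <(join -t, -j 1 students.csv subjects.csv | sort -t, -k5,5) grades.csv |
      awk -F, '{ s += $11 } END { print s }')
echo "$sum" > sum.txt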

Divide column values of different files by a constant then output one minus the other

I have two files of the form
file1:
#fileheader1
0 123
1 456
2 789
3 999
4 112
5 131
6 415
etc.
file2:
#fileheader2
0 442
1 232
2 542
3 559
4 888
5 231
6 322
etc.
How can I take the second column of each, divide it by a value, then subtract one from the other, and output a new third file with the new values?
I want the output file to have the form
#outputheader
0 123/c-422/k
1 456/c-232/k
2 789/c-542/k
etc.
where c and k are numbers I can plug into the script.
I have seen this question: subtract columns from different files with awk
But I don't know how to use awk well enough to do this by myself. Does anyone know how to do this, or could someone explain what is going on in the linked question so I can try to modify it?
I'd write:
awk -v c=10 -v k=20 ' ;# pass values to awk variables
/^#/ {next} ;# skip headers
FNR==NR {val[$1]=$2; next} ;# store values from file1
$1 in val {print $1, (val[$1]/c - $2/k)} ;# perform the calc and print
' file1 file2
Output:
0 -9.8
1 34
2 51.8
3 71.95
4 -33.2
5 1.55
6 25.4
etc. 0
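To get the third file with its own header, as the question asks, you could print the header in a BEGIN block and redirect the output (the header text and the name file3 are placeholders):
awk -v c=10 -v k=20 '
  BEGIN     { print "#outputheader" }          # write the new header first
  /^#/      { next }                           # skip the input headers
  FNR == NR { val[$1] = $2; next }             # remember file1 values by key
  $1 in val { print $1, val[$1]/c - $2/k }     # compute and print
' file1 file2 > file3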

Calculate numbers in file by bash script

I have a file with the following content, e.g. 2 images with FRAG lines giving fragment sizes. I want to calculate the total fragment size per image in a bash script.
IMAGE admindb1 8 admindb1_1514997916 bus 4 Default-Application-Backup 2 3 1 1517676316 0 0
FRAG 1 1 10784 0 2 6 2 HSBRQ2 fuj 65536 329579 1514995208 60 0 *NULL* 1517676316 0 3 1 *NULL*
IMAGE admindb1 8 admindb1_1514995211 bus 4 Default-Application-Backup 2 3 1 1517673611 0 0
FRAG 1 1 13168256 0 2 6 12 HSBQ8I fuj 65536 173783 1514316708 52 0 *NULL* 1517673611 0 3 1 *NULL*
FRAG 1 2 24288384 0 2 6 1 HSBRJ7 fuj 65536 2 1514995211 65 0 *NULL* 0 0 3 1 *NULL*
FRAG 1 3 24288384 0 2 6 1 HSBRON fuj 65536 2 1514995211 71 0 *NULL* 0 0 3 1 *NULL*
FRAG 1 4 13806752 0 2 6 1 HSBRRK fuj 65536 2 1514995211 49 0 *NULL* 0 0 3 1 *NULL*
Output should be like this:
For Image admindb1_1514997916 total size is 10784
For Image admindb1_1514995211 total size is 75551776
The 4th column of lines beginning with FRAG should be summed.
My script is not working:
#!/bin/bash
file1=/home/turgun/Desktop/IBteck/script-last/frags
imagelist=/home/turgun/Desktop/IBteck/script-last/imagelist
counter=1
for counter in `cat $imagelist`
do
    n=`awk '/'$counter'/{ print NR; exit }' $file1`
    for n in `cat $file1`
    do
        if [[ ! $n = 'IMAGE' ]]; then
            echo "For Image $counter total size is " \
                `grep FRAG frags | awk '{print total+=$4}'`
        fi
    done
done
awk 'function summary() {print "For Image",image,"total size is",sum}
$1=="IMAGE" {image=$4}
$1=="FRAG" {sum+=$4}
$1=="" {summary(); sum=0}
END{summary()}' file
Output:
For Image admindb1_1514997916 total size is 10784
For Image admindb1_1514995211 total size is 75551776
I assume that the last line is not empty.
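If the real file has no blank separator lines, a variant that flushes the running total whenever a new IMAGE line starts (my sketch of the same idea) would be:
awk '
  function summary() { if (image != "") print "For Image", image, "total size is", sum }
  $1 == "IMAGE" { summary(); image = $4; sum = 0 }    # a new image begins: report the previous one
  $1 == "FRAG"  { sum += $4 }                         # accumulate fragment sizes
  END           { summary() }                         # report the last image
' file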
cat -s file.txt | sed -n '/^IMAGE /{s:^[^ ]* *[^ ]* *[^ ]* *\([^ ]*\).*$:echo -n "For image \1 total size is "; echo `echo ":;s:\*::g;p};/^$/{s:^:0"`|bc:;p};/^FRAG /{s:^[^ ]* *[^ ]* *[^ ]* *\([^ ]*\).*$:\1\+:;s:\*::g;p};' | bash
Output:
For image admindb1_1514997916 total size is 10784
For image admindb1_1514995211 total size is 75551776
Awk solution:
awk '/^IMAGE/{
        if (t) { printf "For image %s total size is %d\n",img,t; t=0 }
        img=$4
     }
     /^FRAG/{ t+=$4 }
     END{ if (t) printf "For image %s total size is %d\n",img,t }' file
The output:
For image admindb1_1514997916 total size is 10784
For image admindb1_1514995211 total size is 75551776
Gnarly combo of cut, GNU sed (with a careless use of evaluate), and datamash:
cut -d' ' -f4 file | datamash --no-strict transpose |
sed 's#\tN/A\t#\n#g;y#\t# #' |
sed 's/ \(.*\)/ $((\1))/;y/ /+/;s/+/ /;
s/\(.*\) \(.*\)/echo For image \1 total size is \2/e'
Output:
For image admindb1_1514997916 total size is 10784
For image admindb1_1514995211 total size is 75551776
Cyrus's answer is just better. This answer shows some limits of sed. Also, if the data file is huge (say millions of numbers to sum), the evaluate used to farm out the addition to the shell would probably exceed its command-line length limit.

Resources