Getting the sum of the output in unix [duplicate] - bash

This question already has answers here:
Summing values of a column using awk command
(2 answers)
Closed 5 years ago.
I am trying to get the sum of my output in the bash shell, and the constraint is that I have to do it using only awk.
This is the code I am using for getting the output:
awk '{print substr($7, 9, 4)}' emp.txt
This is the output I am getting (truncated):
7606
6498
7947
4044
1657
3872
4834
8463
9280
2789
9104
This is how I am trying to do the sum of the numbers: awk '(s = s + substr($7, 9, 4)) {print s}' emp.txt
The problem is that instead of printing just the final total (which should be 9942686), it prints a running total for every line, as shown below (truncated):
9890696
9898643
9902687
9904344
9908216
9913050
9921513
9930793
9933582
9942686
Am I using the code the wrong way, or is there a better way to do this with awk?
Here is the sample file I am working on:
Brynlee Watkins F 55 Married 2016 778-555-6498 62861
Malcolm Curry M 24 Married 2016 604-555-7947 54647
Aylin Blake F 45 Married 2015 236-555-4044 80817
Mckinley Hodges F 50 Married 2015 604-555-1657 46316
Rylan Dorsey F 51 Married 2017 778-555-3872 77160
Taylor Clarke M 23 Married 2015 604-555-4834 46624
Vivaan Hooper M 26 Married 2016 778-555-8463 80010
Gibson Rowland M 42 Married 2017 236-555-9280 59874
Alyson Mahoney F 51 Single 2017 778-555-2789 71394
Catalina Frazier F 53 Married 2016 604-555-9104 79364
EDIT: I also want to get the sum of the numbers that repeat in the output. Say the repeating numbers are 4826 and 0028 and each of them appears twice. I only want the sum of those numbers, with every occurrence counted individually (so four numbers in total). The desired output for these four numbers is therefore 4826 + 4826 + 0028 + 0028 = 9708. Here is the sample file for this case:
Will Duffy M 33 Single 2017 236-555-4826 47394
Nolan Reed M 27 Single 2015 604-555-0028 46622
Anya Horn F 54 Married 2017 236-555-4826 73270
Cynthia Davenport F 29 Married 2015 778-555-0028 59687
Oscar Medina M 43 Married 2016 778-555-7864 73688
Angelina Herrera F 37 Married 2017 604-555-7910 82061
Peyton Reyes F 35 Married 2017 236-555-8046 51920

Since you only need the total sum printed once, do the summing on every line and the printing under the END pattern (END { print s }):
awk '{s = s + substr($7, 9, 4)} END {print s}' emp.txt

Could you please try the following awk and let me know if this helps you. It always picks up the last group of digits after the -:
awk -F' |-' '{sum+=$(NF-1)} END{print sum}' Input_file
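To see what that field splitting does, here is a quick illustrative check using one sample line from the question's emp.txt:
echo 'Brynlee Watkins F 55 Married 2016 778-555-6498 62861' | awk -F' |-' '{print NF, $(NF-1)}'
This prints 10 6498: with the separator set to a space or a dash, the line splits into ten fields, so $(NF-1) is always the last 4-digit block of the phone number.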
EDIT:
awk -F' |-' '
{
  ++a[$(NF-1)]                                              # count occurrences of each 4-digit value
  b[$(NF-1)] = b[$(NF-1)] ? b[$(NF-1)] + $(NF-1) : $(NF-1)  # running sum per value
}
END{
  for(i in a){
    if(a[i] > 1){                                           # only values seen more than once
      print i, b[i]
    }
  }
}
' Input_file
Output will be as follows:
4826 9652
0028 56
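If instead you want one grand total of only the values that repeat, with every occurrence counted individually (9708 for the second sample in the question), a small variation of the same idea should work. This is a sketch, assuming the same ' |-' field splitting and that Input_file holds the sample lines:
awk -F' |-' '
{
  ++count[$(NF-1)]       # how many times each 4-digit value occurs
  val[NR] = $(NF-1)      # the value seen on each input line
}
END{
  for(n=1; n<=NR; n++)
    if(count[val[n]] > 1)   # keep only the values that occur more than once
      total += val[n]
  print total
}
' Input_file
For the lines containing 4826, 4826, 0028 and 0028 this prints 9708.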

Related

How can we sum values grouped by a column from a file using a shell script

I have a file with student Roll No, Name, Subject, Obtained Marks and Total Marks data:
10 William English 80 100
10 William Math 50 100
10 William IT 60 100
11 John English 90 100
11 John Math 75 100
11 John IT 85 100
How can I get a group-by sum (total obtained marks) for every student in shell? I want this output:
William 190
John 250
I have tried this:
cat student.txt | awk '{sum += $14}END{print sum" "$1}' | sort | uniq -c | sort -nr | head -n 10
This is not working like a group-by sum.
With one awk command:
awk '{a[$2]+=$4} END {for (i in a) print i,a[i]}' file
Output
William 190
John 250
If you want to sort the output, you can pipe to sort, e.g. descending by numerical second field:
awk '{a[$2]+=$4} END {for (i in a) print i,a[i]}' file | sort -rnk2
or ascending by student name:
awk '{a[$2]+=$4} END {for (i in a) print i,a[i]}' file | sort
You need to use an associative array in awk.
Try
awk '{ a[$2]=a[$2]+$4 } END {for (i in a) print i, a[i]}'
a[$2]=a[$2]+$4 <-- Create an associative array indexed by $2, accumulating the sum of $4 as the value
END <-- Runs after all records have been processed
for (i in a) print i, a[i] <-- Print each index and its value from the array
Demo :
$awk '{ a[$2]=a[$2]+$4 } END {for (i in a) print i, a[i]}' temp.txt
William 190
John 250
$cat temp.txt
10 William English 80 100
10 William Math 50 100
10 William IT 60 100
11 John English 90 100
11 John Math 75 100
11 John IT 85 100
$

AWK or SED: replace the space between words in a particular column [closed]

I have an infile as below:
infile:
INM00042170 28.2500 74.9167 290.0 CHURU 2015 2019 2273
INM00042182 28.5833 77.2000 211.0 NEW DELHI/SAFDARJUNG 1930 2019 67874
INXUAE05462 28.6300 77.2000 216.0 NEW DELHI 1938 1942 2068
INXUAE05822 25.7700 87.5200 40.0 PURNEA 1933 1933 179
INXUAE05832 31.0800 77.1800 2130.0 SHIMLA 1926 1928 728
PKM00041640 31.5500 74.3333 214.0 LAHORE CITY 1960 2019 22915
I want to replace the space between two words in column 5 with an underscore (for example, NEW DELHI becomes NEW_DELHI). I want the output below.
outfile:
INM00042170 28.2500 74.9167 290.0 CHURU 2015 2019 2273
INM00042182 28.5833 77.2000 211.0 NEW_DELHI/SAFDARJUNG 1930 2019 67874
INXUAE05462 28.6300 77.2000 216.0 NEW_DELHI 1938 1942 2068
INXUAE05822 25.7700 87.5200 40.0 PURNEA 1933 1933 179
INXUAE05832 31.0800 77.1800 2130.0 SHIMLA 1926 1928 728
PKM00041640 31.5500 74.3333 214.0 LAHORE_CITY 1960 2019 22915
Thank you
#!/bin/bash
# Join fields 5 and 6 with an underscore and drop the combinations where
# field 6 is a number; what remains is the list of new (underscored)
# names for all the cities that need to be replaced.
NEW_NAMES=($(awk '{print $5 "_" $6}' infile | grep -vE "_[0-9]"))
# Iterate over all new names
for NEW_NAME in "${NEW_NAMES[@]}"; do
    OLD_NAME=$(echo "$NEW_NAME" | tr '_' ' ')
    # Replace the spaced name with the underscored one in the file
    sed -i "s/${OLD_NAME}/${NEW_NAME}/g" infile
done
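For comparison, since the city names in the sample are all upper-case and at most two words long, the replacement can also be sketched as a single sed substitution; treat it as an illustration under those assumptions rather than a drop-in replacement for the loop above:
# Join two consecutive all-uppercase words with an underscore (assumes
# upper-case, at most two-word names, as in the sample).
sed -E 's/([A-Z]+) ([A-Z]+)/\1_\2/' infile > outfile
On the sample this turns NEW DELHI into NEW_DELHI and LAHORE CITY into LAHORE_CITY while leaving single-word names like CHURU untouched.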

Sum values based on word in line in bash [duplicate]

This question already has answers here:
Calculate sum of column using reference of other column in awk
(2 answers)
Closed 4 years ago.
I have a script which outputs a number followed by a space and then a letter, e.g.:
2685 R
2435 M
984 D
924 A
353 R
291 A
593 D
577 A
476 R
769 M
629 R
179 D
Is there an easy way to sum the numbers based on the letter/word after the number? I would like this to be the output:
1792 A
1756 D
3204 M
4143 R
I have tried sort, awk and sed combinations but can't find a nice solution to sum.
Thanks.
The following awk may help you here.
awk '{a[$2]+=$1} END{for(i in a){print a[i],i}}' Input_file
If you want to sort the result by the 2nd field, append | sort -k2 to the command above, as shown below.
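For instance, run against the sample data above and piped through sort, it reproduces the order the question asks for:
awk '{a[$2]+=$1} END{for(i in a){print a[i],i}}' Input_file | sort -k2
which prints 1792 A, 1756 D, 3204 M and 4143 R.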

Convert multi-line file into TSV using awk

Am using Windows 7 & gawk 3.1.3 (via UnxUtils).
I'd like to turn this input (Liverpool FC's fixtures):
Sunday, 27 November 2011
Barclays Premier League
Liverpool v Man City, 16:00
Tuesday, 29 November 2011
Carling Cup
Chelsea v Liverpool, QF, 19:45
...
into a tab-separated file, such as:
Sunday, 27 November 2011<tab>Barclays Premier League<tab>Liverpool v Man City, 16:00
Tuesday, 29 November 2011<tab>Carling Cup<tab>Chelsea v Liverpool, QF, 19:45
...
I've tried doing this with awk, but failed thus far. Identifying every first and second line is easy enough:
if (NR % 3 == 1 || NR % 3 == 2) print;
but despite many attempts (usually ending in syntax errors) I can't work out how to strip the (Windows) line endings and concatenate those lines with every third line.
I'm now wondering if awk is actually the right tool for the job.
Thanks for any pointers.
awk '(NR % 3) > 0  {printf("%s\t", $0)}
     (NR % 3) == 0 {printf("%s\n", $0)}'
This should work. For every line where NR (the number of records) modulo 3 is not 0, it prints the line followed by a tab character; otherwise it prints the line followed by a newline character.
HTH
see the test below:
kent$ echo "Sunday, 27 November 2011
Barclays Premier League
Liverpool v Man City, 16:00
Tuesday, 29 November 2011
Carling Cup
Chelsea v Liverpool, QF, 19:45
"|awk '{printf $0"\t";if(!(NR%3))print""}'
output:
Sunday, 27 November 2011 Barclays Premier League Liverpool v Man City, 16:00
Tuesday, 29 November 2011 Carling Cup Chelsea v Liverpool, QF, 19:45
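Since the input comes from Windows, each joined field may still carry a carriage return before the tab; a small variation that strips it first is sketched below (fixtures.txt is just an assumed name for the input file):
awk '{ sub(/\r$/, "") }                              # drop the Windows carriage return, if any
     { printf "%s%s", $0, (NR % 3 ? "\t" : "\n") }   # tab after lines 1-2 of each block, newline after line 3
' fixtures.txt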

using awk to do exact match in a file

I'm just wondering how we can use awk to do exact matches.
For example:
$ cal 09 09 2009
September 2009
Su Mo Tu We Th Fr Sa
1 2 3 4 5
6 7 8 9 10 11 12
13 14 15 16 17 18 19
20 21 22 23 24 25 26
27 28 29 30
$ cal 09 09 2009 | awk '{day="9"; col=index($0,day); print col }'
17
0
0
11
20
0
8
0
As you can see, the above command outputs the index for every line that contains the string/number "9". Is there a way to make awk output the index for only the 4th line of the cal output above? Or maybe there is an even more elegant solution?
I'm using awk to get the day name from the cal command. Here's the whole line of code:
$ dayOfWeek=$(cal $day $month $year | awk '{day='$day'; split("Sunday Monday Tuesday Wednesday Thursday Friday Saturday", array); column=index($0,day); dow=int((column+2)/3); print array[dow]}')
The problem with the above code is that if multiple matches are found I get multiple results, whereas I want it to output only one result.
Thanks!
Limit the call to index() to only those lines which have your "day" surrounded by spaces:
awk -v day=$day 'BEGIN{split("Sunday Monday Tuesday Wednesday Thursday Friday Saturday", array)} $0 ~ "\\<"day"\\>"{for(i=1;i<=NF;i++)if($i == day){print array[i]}}'
Proof of Concept
$ cal 02 1956
February 1956
Su Mo Tu We Th Fr Sa
1 2 3 4
5 6 7 8 9 10 11
12 13 14 15 16 17 18
19 20 21 22 23 24 25
26 27 28 29
$ day=18; cal 02 1956 | awk -v day=$day 'BEGIN{split("Sunday Monday Tuesday Wednesday Thursday Friday Saturday", array)} $0 ~ "\\<"day"\\>"{for(i=1;i<=NF;i++)if($i == day){print array[i]}}'
Saturday
Update
If all you are looking for is to get the day of the week from a certain date, you should really be using the date command like so:
$ day=9;month=9;year=2009;
$ dayOfWeek=$(date +%A -d "$day/$month/$year")
$ echo $dayOfWeek
Wednesday
You wrote:
cal 09 09 2009
I'm not aware of a version of cal that accepts a day of month as an input, only
cal ${mon} (optional) ${year} (optional)
But that doesn't affect your main issue.
You wrote:
is there a way to make awk output the index for only the 4th line of the cal output above?
NR (Num Rec) is your friend
and there are numerous ways to use it.
cal 09 09 2009 | awk 'NR==4{day="9"; col=index($0,day); print col }'
OR
cal 09 09 2009 | awk '{day="9"; if (NR==4) {col=index($0,day); print col } }'
ALSO
In awk, if you have variable assignments that should be used throughout your whole program, it is better to put them in the BEGIN section so that the assignment is only performed once. Not a big deal in your example, but why set bad habits ;-)?
HENCE
cal 09 2009 | awk 'BEGIN{day="9"}; NR==4 {col=index($0,day); print col }'
FINALLY
It is not completely clear what problem you are trying to solve. Are you sure you always want to grab line 4? If not, then how do you propose to solve that?
Problems stated as " 1. I am trying to do X. 2. Here is my input. 3. Here is my output. 4. Here is the code that generated that output" are much easier to respond to.
It looks like you're trying to do date calculations. You can get much more robust and general solutions by using the GNU date command. I have seen numerous useful discussions of this tagged as bash, shell, (date?).
I hope this helps.
This is so much easier to do in a language that has time functionality built-in. Tcl is great for that, but many other languages are too:
$ echo 'puts [clock format [clock scan 9/9/2009] -format %a]' | tclsh
Wed
If you want awk to only output for line 4, restrict the rule to line 4:
$ awk 'NR == 4 { ... }'
