How to sum a column for rows with matching characters at a specific location? - bash

I would like to add up the du output for all subfolders that share the same final subfolder name.
I have tried (for example):
du -s /aa/bb/*/*/ | sort -k2.11,2.14
which gave me this sorted output:
2000 /aa/bb/cc/1234/
1000 /aa/bb/dd/1234/
2000 /aa/bb/ff/1234/
2000 /aa/bb/cc/5678/
2000 /aa/bb/dd/5678/
3000 /aa/bb/ee/5678/
1000 /aa/bb/gg/5678/
Now I would like to add up all the entries for 1234 and all the entries for 5678.
Expected result:
5000 -- 1234
8000 -- 5678

You can use awk to accumulate the first field into an array a, keyed by the second-to-last /-separated field. (The sort step is not needed for the summation; awk adds the values up regardless of input order.)
du -s /aa/bb/*/*/ | sort -k2.11,2.14 | awk -F'/' '{a[$(NF-1)]+=$1} END{for(i in a) print a[i],i}'
8000 5678
5000 1234
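For readability, the same logic can live in a standalone awk script. A minimal sketch; the file name sums.awk and the "--" separator (taken from the expected result above) are my additions:
# sums.awk: sum du sizes per final directory component
BEGIN { FS = "/" }
{
    # $1 is the size reported by du; $(NF-1) is the last directory name
    sum[$(NF-1)] += $1
}
END {
    for (key in sum) print sum[key], "--", key
}
Run it as: du -s /aa/bb/*/*/ | awk -f sums.awk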

Related

A UNIX Command to Find the Name of the Student who has the Second Highest Score

I am new to Unix programming. Could you please help me solve this question?
For example, If the input file has the below content
RollNo Name Score
234 ABC 70
567 QWE 12
457 RTE 56
234 XYZ 80
456 ERT 45
The output will be
ABC
I tried something like this:
sort -k3,3 -rn -t" " | head -n2 | awk '{print $2}'
Using awk (GNU awk, since asorti is a GNU extension):
awk 'NR>1{arr[$3]=$2} END {n=asorti(arr,arr_sorted); print arr[arr_sorted[n-1]]}'
Demo:
$ cat file.txt
RollNo Name Score
234 ABC 70
567 QWE 12
457 RTE 56
234 XYZ 80
456 ERT 45
$ awk 'NR>1{arr[$3]=$2} END {n=asorti(arr,arr_sorted); print arr[arr_sorted[n-1]]}' file.txt
ABC
$
Explanation:
NR>1 --> skip the first record (the header line)
{arr[$3]=$2} --> build an associative array with the score as index and the name as value
END --> runs after the last record has been read
n=asorti(arr,arr_sorted) --> sort the indices of arr (the scores) into arr_sorted; n is the number of elements
print arr[arr_sorted[n-1]] --> arr_sorted[n-1] is the second-highest score, so this prints the corresponding name
(Note: asorti compares the indices as strings, which works here because every score has two digits; mixed-width scores would need a numeric comparison.)
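If gawk is not available, a portable alternative is to track the top two scores in a single pass. A minimal sketch, assuming the same file layout as above:
awk 'NR>1 {
    if ($3+0 > max)         { second = max; sname = mname; max = $3+0; mname = $2 }
    else if ($3+0 > second) { second = $3+0; sname = $2 }
} END { print sname }' file.txt
This compares the scores numerically, so it also avoids the string-ordering caveat mentioned above.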
Your attempt is 90% correct; only a single change is needed.
Try this... it will work:
sort -k3,3 -rn -t" " file.txt | head -n2 | tail -n1 | awk '{print $2}'
head -n2 keeps the two highest-scoring lines, and tail -n1 then selects the second of them. (Replacing head -n2 with head -n1 would print the highest scorer, not the second highest.)
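With the demo file from above:
$ sort -k3,3 -rn -t" " file.txt | head -n2 | tail -n1 | awk '{print $2}'
ABC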

How to build a pivot table using a shell script

I have data in a CSV file as below...
Emailid Storeid
a#gmail.com 2000
b#gmail.com 2001
c#gmail.com 2000
d#gmail.com 2000
e#gmail.com 2001
I am expecting the output below, basically finding out how many email ids there are for each store.
StoreID Emailcount
2000 3
2001 2
So far I have tried this to solve my issue:
IFS=","
while read f1 f2
do
awk -F, '{ A[$1]+=$2 } END { OFS=","; for (x in A) print x,A[x]; }' > /home/ec2-user/storewiseemials.csv
done < temp4.csv
With the above shell script I am not getting the desired output. Can you please help?
Using miller (https://github.com/johnkerl/miller) and starting from this input (I have used a CSV, because I do not know whether you use a tab or a space as the separator):
Emailid,Storeid
a#gmail.com,2000
b#gmail.com,2001
c#gmail.com,2000
d#gmail.com,2000
e#gmail.com,2001
and running
mlr --icsv --opprint --barred count-distinct -f Storeid -o Emailcount input >output
you will have
+---------+------------+
| Storeid | Emailcount |
+---------+------------+
| 2000    | 3          |
| 2001    | 2          |
+---------+------------+
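If miller is not installed, plain awk can produce the same counts. A minimal sketch, assuming the comma-separated input above is saved as temp4.csv (the file name from the question):
awk -F, 'NR>1 { count[$2]++ }
END {
    print "StoreID Emailcount"
    for (s in count) print s, count[s]
}' temp4.csv
Note that for (s in count) does not guarantee any particular order of the store IDs; pipe the output through sort if the order matters.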

Sort command strange behaviour

Input file: salary.txt
1 rob hr 10000
2 charls it 20000
4 kk Fin 30000
5 km it 30000
6 kl it 30000
7 mark hr 10000
8 kc it 30000
9 dc fin 40000
10 mn hr 40000
3 abi it 20000
Objective: find all records with the second-highest salary, where the 4th column is the salary (space-separated records).
I ran two similar commands but the outputs are entirely different. What am I missing?
Command 1:
sort -nr -k4,4 salary.txt | awk '!a[$4]{a[$4]=$4;t++}t==2'
output:
8 kc it 30000
6 kl it 30000
5 km it 30000
4 kk Fin 30000
Command 2:
cat salary.txt | sort -nr -k4,4 | awk '!a[$4]{a[$4]=$4;t++}t==2' salary.txt
output:
2 charls it 20000
The only difference between the two commands is the way salary.txt is read, so why are the outputs entirely different?
Because in the second form awk will read directly from salary.txt - which you are passing as the name of the input file - ignoring the output from sort that you are passing to stdin. Leave out the final salary.txt in command2 and you'll see that the output matches that of command1. In fact, sort behaves the same way and the forms:
cat salary.txt | sort
echo "string that will be ignored" | sort salary.txt
will both yield the exact same output.
In your second command, awk does not read from stdin: passing salary.txt as an argument makes it read that file directly. If you change it to
cat salary.txt | sort -nr -k4,4 | awk '!a[$4]{a[$4]=$4;t++}t==2'
you get the same result as command 1.
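A quick way to observe this rule (a hypothetical demo, not from the original post):
printf 'from stdin\n' | awk '{print}' salary.txt   # file argument given: salary.txt is printed, stdin is ignored
printf 'from stdin\n' | awk '{print}'              # no file argument: stdin is read
Like sort, awk only falls back to stdin when no file operands are given.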

Assign the output of awk to a specific column in a file

I have the following example output from a log file, where I'm trying to get the reverse pointer (PTR) records for the IP addresses in column 7 below:
2017-01-09 11:25:22.421 0.306 TCP 192.168.1.2:50599 -> 192.0.2.25:443 500 20000 1
2017-01-09 11:30:11.210 0.000 TCP 192.168.1.2:50503 -> 192.0.2.25:443 100 4000 1
2017-01-09 09:01:22.546 0.000 TCP 192.169.1.2:50307 -> 192.0.2.25:443 100 4000 1
If I run this awk command I can extract the reverse records for column 7:
cat test.txt | awk '{print $7}' | grep -oE '[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+' | xargs -I % bash -c 'echo "$(dig -x % +short)"'
How do I get the output from the above command to replace what's in column 7, updating it so it will read, for example:
2017-01-09 11:25:22.421 0.306 TCP google.com -> 192.0.2.25:443 500 20000 1
2017-01-09 11:30:11.210 0.000 TCP -> 192.0.2.25:443 100 4000 1
2017-01-09 09:01:22.546 0.000 TCP yahoo.com -> 192.0.2.25:443 100 4000 1
Using awk only:
$ awk '{split($7,a,":"); r=""; c="dig -x " a[1] " +short"; c|getline r; $7=r} 1' file
split $7 on : to get the IP into a[1]
build the dig command for the shell in variable c
execute it and store the result in r
replace $7 with r and print the line via the trailing 1
Not showing any example output, as the test file didn't contain IPs that would resolve to any reverse records.
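One caveat with the getline approach: awk keeps the command pipe open, so when the same IP occurs on several lines the second getline from the identical command string returns nothing. Closing the pipe after each read avoids this; a sketch of the same one-liner with close() added:
awk '{
    split($7, a, ":")              # strip the :port suffix
    c = "dig -x " a[1] " +short"   # build the shell command
    r = ""                         # reset in case dig returns no answer
    c | getline r                  # read the first line of dig output
    close(c)                       # close the pipe so a repeated IP re-runs dig
    $7 = r
} 1' file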

Count query in shell

I have a file with many entries like
asd 13
dsa 14
ert 10
ghj 78
... and many entries like this
We can consider each line to be a key/count pair. The keys are distinct.
I need top 6 Keys and their count.
WHAT HAVE I DONE: I don't know how to sort it on the basis of the count. If I can get to that, I can print the top 6.
sort -nrk2 | head -6
-n: numeric sort
-r: reverse sort
-k2: sort by field 2
head -6: get the top 6
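Applied to the sample data (assuming it is saved as file.txt, a name I am choosing here), this gives:
$ sort -nrk2 file.txt | head -6
ghj 78
dsa 14
asd 13
ert 10
Only four lines appear because the sample has only four entries; head -6 simply prints everything when fewer than six lines are available.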
cat c.txt | awk '{print $2" "$1}' | sort -nr | head -6
Assuming the file name is c.txt; this swaps the columns so the count comes first, then sorts numerically on it.
