Finding second highest salary using awk - shell

I have a file as follows
1 rob hr 10000
2 charls it 20000
4 kk Fin 30000
5 km it 30000
6 kl it 30000
7 mark hr 10000
8 kc it 30000
9 dc fin 40000
10 mn hr 40000
3 abi it 20000
where the 4th column contains the salary of the individual named in column 2. I want to get all the records with the 2nd highest salary (or, more generally, the nth highest salary).
Sample output :
4 kk Fin 30000
5 km it 30000
8 kc it 30000
6 kl it 30000
I have tried this :
sort -k4,4 employee.txt | awk 'NR==1{a=$4;next}{if($4>a){print $0 ;exit} else next;}' | a=`awk '{ print $4}'` | awk -v b=$a '$4==b' < cat employee.txt
but this is not giving any output. Any smart suggestions, please?

awk to the rescue!
sort -k4nr file |
awk '!($4 in a){c++; a[$4]} c==2'
4 kk Fin 30000
5 km it 30000
6 kl it 30000
8 kc it 30000
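For the general case the question asks about (nth highest), the same idea can be parameterized. This is a minimal sketch, not part of the original answer: `nth_highest` is a hypothetical helper name, and the sample data is written to a temporary file so the snippet runs on its own. The order of rows that share a salary depends on how sort breaks ties.

```shell
# Hypothetical helper: print every record whose salary (column 4)
# is the n-th highest distinct value.
nth_highest() {
    # $1 = n, $2 = input file
    sort -k4,4nr "$2" |
    awk -v n="$1" '
        !($4 in seen) {rank++; seen[$4]}   # first time this salary appears: bump the rank
        rank == n                          # print rows that sit at the requested rank
    '
}

# Sample data from the question
data=$(mktemp)
cat > "$data" <<'EOF'
1 rob hr 10000
2 charls it 20000
4 kk Fin 30000
5 km it 30000
6 kl it 30000
7 mark hr 10000
8 kc it 30000
9 dc fin 40000
10 mn hr 40000
3 abi it 20000
EOF

second=$(nth_highest 2 "$data")
third=$(nth_highest 3 "$data")
printf '%s\n' "$second"
rm -f "$data"
```

Passing 1 instead of 2 gives the records with the highest salary.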

For the highest salary you can simply use
sort -k4nr text.txt | awk '!($4 in a){c++; a[$4]} c==1'

Related

How can we sum the values group by from file using shell script

I have a file where I have student Roll no, Name, Subject, Obtain Marks and Total Marks data:
10 William English 80 100
10 William Math 50 100
10 William IT 60 100
11 John English 90 100
11 John Math 75 100
11 John IT 85 100
How can I get a group-by sum (total obtained marks) for every student in shell? I want this output:
William 190
John 250
I have tried this:
cat student.txt | awk '{sum += $14}END{print sum" "$1}' | sort | uniq -c | sort -nr | head -n 10
This does not work like a group-by sum.
With one awk command:
awk '{a[$2]+=$4} END {for (i in a) print i,a[i]}' file
Output
William 190
John 250
If you want to sort the output, you can pipe to sort, e.g. descending by numerical second field:
awk '{a[$2]+=$4} END {for (i in a) print i,a[i]}' file | sort -rnk2
or ascending by student name:
awk '{a[$2]+=$4} END {for (i in a) print i,a[i]}' file | sort
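One caveat: awk does not guarantee the iteration order of `for (i in a)`, so the unsorted output may not follow the input order. A minimal sketch that keeps the first-seen order of the names, assuming the sample data from the question (the array names `order` and `sum` are illustrative choices, not from the original answer):

```shell
data=$(mktemp)
cat > "$data" <<'EOF'
10 William English 80 100
10 William Math 50 100
10 William IT 60 100
11 John English 90 100
11 John Math 75 100
11 John IT 85 100
EOF

# Sum column 4 per student name (column 2), keeping first-seen order.
totals=$(awk '
    !($2 in sum) {order[++n] = $2}   # remember each name once, in input order
    {sum[$2] += $4}                  # accumulate the obtained marks
    END {for (i = 1; i <= n; i++) print order[i], sum[order[i]]}
' "$data")
printf '%s\n' "$totals"
rm -f "$data"
```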
You need to use an associative array in awk.
Try
awk '{ a[$2]=a[$2]+$4 } END {for (i in a) print i, a[i]}'
a[$2]=a[$2]+$4 <-- Create an associative array with $2 as the index and the running sum of $4 as the value
END <-- Runs once, after all records have been processed
for (i in a) print i, a[i] <-- Print each index and its value
Demo :
$ awk '{ a[$2]=a[$2]+$4 } END {for (i in a) print i, a[i]}' temp.txt
William 190
John 250
$ cat temp.txt
10 William English 80 100
10 William Math 50 100
10 William IT 60 100
11 John English 90 100
11 John Math 75 100
11 John IT 85 100
$

process second column if first column matches

I just want the second column to be multiplied by exp(3) if the first column matches the parameter I define.
cat inputfile.i
100 2
200 3
300 1
100 5
200 2
300 3
I want the output to be:
100 2
200 60.25
300 1
100 5
200 40.17
300 3
I tried this code:
awk ' $1 == "200" {print $2*exp(3)}' inputfile
but nothing actually shows
You are not printing the unmatched lines. Also, you don't need to quote numbers.
$ awk '$1==200{$2*=exp(3)}1' file
100 2
200 60.2566
300 1
100 5
200 40.1711
300 3
Is there a difference between inputfile.i and inputfile?
Anyway, here is my solution for you:
awk '$1 == 200 {printf "%s %.2f\n",$1,$2*exp(3)};$1 != 200 {print $0}' inputfile.i
100 2
200 60.26
300 1
100 5
200 40.17
300 3
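Since the question mentions a parameter, the value to match can also be passed in with `-v` rather than hard-coded. A minimal sketch, assuming the sample input from the question; the shell variable `key` is a hypothetical name:

```shell
data=$(mktemp)
printf '%s\n' '100 2' '200 3' '300 1' '100 5' '200 2' '300 3' > "$data"

key=200   # hypothetical shell variable holding the value to match

result=$(awk -v key="$key" '
    $1 == key {$2 = sprintf("%.2f", $2 * exp(3))}   # scale column 2 on matching rows only
    {print}                                         # print every row, changed or not
' "$data")
printf '%s\n' "$result"
rm -f "$data"
```

Assigning to `$2` makes awk rebuild the line with the output field separator, which is a single space by default.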

Sort command strange behaviour

Input file: salary.txt
1 rob hr 10000
2 charls it 20000
4 kk Fin 30000
5 km it 30000
6 kl it 30000
7 mark hr 10000
8 kc it 30000
9 dc fin 40000
10 mn hr 40000
3 abi it 20000
Objective: find all records with the second highest salary, where the 4th column is the salary (space-separated records).
I ran two similar commands but the output is entirely different. What am I missing?
Command1 :
sort -nr -k4,4 salary.txt | awk '!a[$4]{a[$4]=$4;t++}t==2'
output:
8 kc it 30000
6 kl it 30000
5 km it 30000
4 kk Fin 30000
command2:
cat salary.txt | sort -nr -k4,4 | awk '!a[$4]{a[$4]=$4;t++}t==2' salary.txt
output:
2 charls it 20000
The only difference between the two commands is the way salary.txt is read, so why is the output entirely different?
Because in the second form awk will read directly from salary.txt - which you are passing as the name of the input file - ignoring the output from sort that you are passing to stdin. Leave out the final salary.txt in command2 and you'll see that the output matches that of command1. In fact, sort behaves the same way and the forms:
cat salary.txt | sort
echo "string that will be ignored" | sort salary.txt
will both yield the exact same output.
In your second command, awk does not read from stdin because it is given a file name. If you change it to
cat salary.txt | sort -nr -k4,4 | awk '!a[$4]{a[$4]=$4;t++}t==2'
you get the same result.
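The behaviour can be checked in isolation. A minimal sketch using a throwaway temporary file (the variable names are illustrative): with a file operand awk ignores the pipe, with no operand it reads standard input, and the operand `-` names standard input explicitly.

```shell
f=$(mktemp)
echo "from file" > "$f"

# A file operand is given, so awk never reads the pipe.
with_file=$(echo "from stdin" | awk '{print}' "$f")

# No file operand: awk falls back to standard input.
no_file=$(echo "from stdin" | awk '{print}')

# "-" names standard input explicitly, so both sources are read, in order.
both=$(echo "from stdin" | awk '{print}' - "$f")

printf '%s\n' "$with_file" "$no_file" "$both"
rm -f "$f"
```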

Print first word of first line and last word

I'm trying to parse a log file in shell and want to print the first word of the first line and the last word of each line under it.
For instance:
$ grep -A3 "2015-01-22T07" Test.log | grep -A3 "Messages from Summary report is"
2015-01-22T07:36:30 | 9316 | 461 | 50 | Messages from Summary report is :[ Number of C is 1500
Total distance 10 km
Total number of A is 2
Number of B is 2
]
--
2015-01-22T07:37:30 | 9316 | 461 | 50 | Messages from Summary report is :[ Number of C is 1600
Total distance 11 km
Total number of A is 3
Number of B is 3
]
--
2015-01-22T07:38:30 | 9316 | 461 | 50 | Messages from Summary report is :[ Number of C is 1700
Total distance 12 km
Total number of A is 4
Number of B is 4
]
Expected output:
2015-01-22T07:36:30,1500,10 km,2,2
2015-01-22T07:37:30,1600,11 km,3,3
2015-01-22T07:38:30,1700,12 km,4,4
Sorry, I'm new to this site.
cat test1.log
2015-01-22T07:36:30 | 9316 | 461 | 50 | Messages from Summary report is :[
Number of C is 1500
Total distance 10 km
Total number of A is 2
Number of B is 2
]
--
2015-01-22T07:37:30 | 9316 | 461 | 50 | Messages from Summary report is :[
Number of C is 1600
Total distance 11 km
Total number of A is 3
Number of B is 3
]
--
2015-01-22T07:38:30 | 9316 | 461 | 50 | Messages from Summary report is :[
Number of C is 1700
Total distance 12 km
Total number of A is 4
Number of B is 4
]
Re-attempt:
# awk -v RS='\n' -v OFS=, '$1~/^[0-9]{4}-[0-9]{2}-[0-9]{2}T/ {if (s) print s; s=$1; next}
/Total distance/{s = s OFS $(NF-1) " " $NF;next}
NF>2{s = s OFS $NF}
END{print s
}' test1.log
Output
,:[,1500,10 km,2,2,:[,1600,11 km,3,3,:[,1700,12 km,4,4
Check:
# head -1 test.log|cat -vte
2015-01-22T07:36:30 | 9316 | 461 | 50 | Messages from Summary report is :[ $
You can use this awk on your given output:
awk -v RS='\r' -v OFS=, '$1~/^[0-9]{4}-[0-9]{2}-[0-9]{2}T/ {if (s) print s; s=$1; next}
/Total distance/{s = s OFS $(NF-1) " " $NF;next}
NF>2{s = s OFS $NF}
END{print s
}' file
2015-01-22T07:36:30,1500,10 km,2,2
2015-01-22T07:37:30,1600,11 km,3,3
2015-01-22T07:38:30,1700,12 km,4,4
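The re-attempt's single-line output is what you would get if the date pattern never matched. One possible cause (an assumption, since the awk version is not stated) is an awk build that does not support `{n}` interval expressions. A minimal sketch that avoids them by keying on the literal header text, assuming the test1.log layout shown in the question, where "Number of C" is on its own line:

```shell
log=$(mktemp)
cat > "$log" <<'EOF'
2015-01-22T07:36:30 | 9316 | 461 | 50 | Messages from Summary report is :[
Number of C is 1500
Total distance 10 km
Total number of A is 2
Number of B is 2
]
--
2015-01-22T07:37:30 | 9316 | 461 | 50 | Messages from Summary report is :[
Number of C is 1600
Total distance 11 km
Total number of A is 3
Number of B is 3
]
EOF

summary=$(awk -v OFS=, '
    /Messages from Summary report is/ {   # header line starts a new output row
        if (s != "") print s              # flush the previous row, if any
        s = $1                            # first word: the timestamp
        next
    }
    /Total distance/ {s = s OFS $(NF-1) " " $NF; next}   # keep the unit with the number
    NF > 2 {s = s OFS $NF}                               # other data lines: last word only
    END {if (s != "") print s}
' "$log")
printf '%s\n' "$summary"
rm -f "$log"
```

Lines such as `]` and `--` have fewer than three fields, so the `NF > 2` rule skips them.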

Print awk for empty row

I have one problem. I am using this code in bash and awk:
#!/bin/bash
awk 'BEGIN {print "CHR\tSTART\tSTOP\tPOCET_READU\tGCcontent"}'
for z in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
do
export $z
for i in {0..249480000..60000}
do
u=$i
let "u +=60000"
export $i
export $u
samtools view /home/filip/Desktop/AMrtin\ Hynek/highThan89MapQ.bam chr$z:$i-$u | awk '{ n=length($10); print gsub(/[GCCgcs]/,"",$10)/n;}'| awk -v chr="chr"$z -v min=$i -v max=$u '{s+=$1}END{print chr,"\t",min,"\t",max,"\t",NR,"\t",s/NR}}'
done
done
From this I am getting the result like this:
chr1 60000 120000 30 0.333
chr3 540000 600000 10 0.555
The step of the loop is 60000, but when I divide s/NR, NR is sometimes 0 and that row is missing from the output. When NR is 0 and s/NR cannot be computed (because we cannot divide by 0), I want to get:
chr1 0 60000 N/A N/A
chr1 60000 120000 30 0.333
chr3 480000 540000 N/A N/A
chr3 540000 600000 10 0.555
I tried to use a condition like
{s+=$1}END{print chr,"\t",min,"\t",max,"\t",NR,"\t",s/NR; if (S/NR == "") print chr,"\t",min,"\t",max,"\t","N/A","\t","N/A"}'
But it doesn't work.
Could you help me, please?
The problem is you're dividing by zero, which is an error. You need to test NR before doing the division.
awk -v chr="chr"$z -v min=$i -v max=$u '
{s+=$1}
END {print chr, "\t", min, "\t", max, "\t", (NR ? NR : "N/A"), "\t", (NR ? s/NR : "N/A")}'
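The guard can be tried without samtools by feeding awk an empty stream and a small non-empty one. This is a minimal sketch: `summarize` is a hypothetical wrapper name, the input values are made up for illustration, and `OFS` is set to a tab here (an assumption that differs slightly from the explicit "\t" strings above, which also emit the default space separators).

```shell
# Hypothetical wrapper around the END-block guard.
summarize() {
    # $1 = chromosome label, $2 = window start, $3 = window end; values come from stdin
    awk -v chr="$1" -v min="$2" -v max="$3" -v OFS='\t' '
        {s += $1}
        END {print chr, min, max, (NR ? NR : "N/A"), (NR ? s / NR : "N/A")}'
}

# No input lines: NR is 0 in END, so both computed columns fall back to N/A.
empty_row=$(printf '' | summarize chr1 0 60000)

# Two illustrative values: NR is 2 and the mean is printed.
full_row=$(printf '0.25\n0.75\n' | summarize chr1 60000 120000)

printf '%s\n' "$empty_row" "$full_row"
```

The END block still runs when there is no input, which is why the row for an empty window can be printed there.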

Resources