A UNIX Command to Find the Name of the Student who has the Second Highest Score - sorting

I am new to Unix programming. Could you please help me solve this question?
For example, if the input file has the content below:
RollNo Name Score
234 ABC 70
567 QWE 12
457 RTE 56
234 XYZ 80
456 ERT 45
The output will be
ABC
I tried something like this
sort -k3,3 -rn -t" " | head -n2 | awk '{print $2}'

Using awk
awk 'NR>1{arr[$3]=$2} END {n=asorti(arr,arr_sorted); print arr[arr_sorted[n-1]]}'
Demo:
$cat file.txt
RollNo Name Score
234 ABC 70
567 QWE 12
457 RTE 56
234 XYZ 80
456 ERT 45
$awk 'NR>1{arr[$3]=$2} END {n=asorti(arr,arr_sorted); print arr[arr_sorted[n-1]]}' file.txt
ABC
$
Explanation:
NR>1 --> skip the first (header) record
{arr[$3]=$2} --> build an associative array with the score as index and the name as value
END --> runs after the whole file has been read
n=asorti(arr,arr_sorted) --> sort arr by index value (i.e. score) into arr_sorted; n = number of elements in the array
print arr[arr_sorted[n-1]] --> arr_sorted[n-1] is the second-highest score; print the corresponding name from arr
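Note that asorti is specific to GNU awk (gawk) and, without a third argument, compares indices as strings. The two-digit scores here happen to sort correctly either way, but a safer sketch (gawk 4.0+) forces numeric index order:
awk 'NR>1 {arr[$3]=$2}
     END {n = asorti(arr, arr_sorted, "@ind_num_asc")
          print arr[arr_sorted[n-1]]}' file.txt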

Your attempt is 90% correct; it needs just a single change.
Try this, it will work:
sort -k3,3 -rn -t" " file.txt | head -n2 | tail -n1 | awk '{print $2}'
head -n2 keeps the two highest-scoring lines, and tail -n1 keeps only the second of them, i.e. the second highest. (The header line sorts to the bottom because its non-numeric Score field compares as 0.)
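Equivalently, awk can pick the second line of the sorted output directly; a minimal sketch on the same file.txt:
$ sort -k3,3 -rn file.txt | awk 'NR==2 {print $2}'
ABC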

Related

Count duplicated couple of lines

I have a configuration file with this format:
cod 11
loc1 23
pto1 33
loc2 55
pto2 66
cod 12
loc1 55
pto1 66
loc2 88
pto2 77
...
I want to count how many times a pair of numbers appears in sequence loc/pto (independently of the loc/pto number). In the example, the couple 55/66 appears 2 times (once as loc1/pto1 and once as loc2/pto2).
I have googled around and tried some combinations of grep, uniq and awk, but I only managed to count single duplicated lines or numbers. I read the man documentation of those commands without finding any clue relevant to my problem.
You could use the following:
$ sort file | uniq -f1 -dc
2 loc1 55
2 pto1 66
-f1 skips the first field when comparing lines
-dc prints each duplicated line with its associated count
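To see why this works, here is the sorted stream that uniq actually compares (duplicates must be on adjacent lines to be found):
$ sort file
cod 11
cod 12
loc1 23
loc1 55
loc2 55
loc2 88
pto1 33
pto1 66
pto2 66
pto2 77
After skipping the first field, loc1 55/loc2 55 and pto1 66/pto2 66 compare equal, which is exactly what -dc reports.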
Despite no visible effort on the part of the OP, this was an interesting question to work out.
awk '{array[NR]=$2} END {for (i=1; i<NR; i++) print array[i] "," array[i+1]}' file | sort | uniq -c
Output-
1 11,23
1 12,55
1 23,33
1 33,55
2 55,66
1 66,12
1 66,88
1 88,77
The output tells you that 55 is followed by 66 twice. Other pairs only occur once.
Explanation-
I store the second number of each line in an awk array indexed by line number. The part after END concatenates the ith and (i+1)th elements to form each adjacent pair. Then sort | uniq -c counts how often each pair occurs.
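A streaming alternative (a minimal sketch) avoids buffering the file at all: remember only the previous value and print each adjacent pair as it arrives:
awk 'NR>1 {print prev "," $2} {prev=$2}' file | sort | uniq -c
This produces the same pair counts in a single pass over the input.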
If you want to know how many times a duplicate number appeared in the file:
awk '{print $2}' <filename> | sort | uniq -dc
Output:
2 55
2 66
If you want to know how many times a number appeared in the file regardless of being duplicate or not:
awk '{print $2}' <filename> | sort | uniq -c
Output:
1 11
1 12
1 23
1 33
2 55
2 66
1 77
1 88
If you want to print the full line on duplicate match based on second column:
awk '{print $2}' <filename> | sort | uniq -d | grep -F -f - <filename>
Output:
loc2 55
pto2 66
loc1 55
pto1 66
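One caveat: grep -F matches substrings, so a duplicate value such as 5 would also match lines containing 55. A hedged refinement anchors the patterns to whole words:
awk '{print $2}' <filename> | sort | uniq -d | grep -wF -f - <filename>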

get subset of table based on unique column values

Hi, I am looking for a bash/awk/sed solution to get subsets of a table based on unique column values. For example, if I have:
chrom1 333
chrom1 343
chrom2 380
chrom2 501
chrom1 342
chrom3 102
I want to be able to split this table into 3:
chrom1 333
chrom1 343
chrom1 342
chrom2 380
chrom2 501
chrom3 102
I know how to do this in R using the split command, but I am specifically looking for a bash/awk/sed solution.
Thanks
I don’t know if this awk is of any use, but it will create three separate files based on the unique column values:
awk '{print >> $1; close($1)}' file
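If you prefer the output files to carry an extension, a sketch that builds the file name in awk (the .txt suffix is my own choice, not from the question):
awk '{print > ($1 ".txt")}' file
Unlike the >> form above, > truncates each file on the first write and keeps the handles open, which is fine for a handful of distinct keys.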
An alternative awk which keeps the original order of records within each block:
$ awk '{a[$1]=a[$1]?a[$1] ORS $0:$0}
END{for(k in a) print a[k] ORS ORS}' file
generates
chrom1 333
chrom1 343
chrom1 342
chrom2 380
chrom2 501
chrom3 102
There are two trailing empty lines after each block that are not displayed in the formatted output above.
Using sort and awk:
sort -k1,1 file | awk 'NR>1 && p != $1{print ORS} {p=$1} 1'
EDIT: If you want to keep the original order of records from the input file, then use:
awk -v ORS='\n\n' '!($1 in a){a[$1]=$0; ind[++i]=$1; next}
{a[$1]=a[$1] RS $0}
END{for(k=1; k<=i; k++) print a[ind[k]]}' file
Create the input file file.txt:
(
cat << EOF
chrom1 333
chrom1 343
chrom2 380
chrom2 501
chrom1 342
chrom3 102
EOF
) > file.txt
Transformation:
cat file.txt | cut -d" " -f1 | sort -u | while read c
do
    cat file.txt | grep "^$c" | sort
    echo
done
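Note that grep "^$c" would also match longer keys such as chrom10. A tighter sketch compares the first field exactly (and, as a side effect, keeps the original line order shown in the desired output):
cut -d" " -f1 file.txt | sort -u | while read c
do
    awk -v key="$c" '$1 == key' file.txt
    echo
done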

How to sort columns using bash script? [duplicate]

I have a file full of data in columns
sarah mark john
10 20 5
x y z
I want to sort the data so the columns stay intact but the second row is in increasing order so it looks like this:
john sarah mark
5 10 20
z x y
I've been looking at the sort command but have only been able to find vertical sorting, not horizontal. I'm happy to use any tool, any help is appreciated.
Thank you!
Let's create a function to transpose a file (make rows become columns, and columns become rows):
transpose () {
    awk '{for (i=1; i<=NF; i++) a[i,NR]=$i; max=(max<NF?NF:max)}
         END {for (i=1; i<=max; i++)
                  {for (j=1; j<=NR; j++)
                       printf "%s%s", a[i,j], (j<NR?OFS:ORS)
                  }
         }'
}
This just loads all the data into a two-dimensional array a[line,column] and then prints it back as a[column,line], so that it transposes the given input. The wrapper transpose () { } stores it as a bash function. You just need to copy-paste it into your shell (or into ~/.bashrc if you want it to be a permanent function, available any time you open a session).
Then, using it, we can easily solve the problem with sort -n -k2 (sort numerically based on column 2), and transpose back.
$ cat a | transpose | sort -n -k2 | transpose
john sarah mark
5 10 20
z x y
In case you want a nice format as the final output, just pipe to column like this:
$ cat a | transpose | sort -n -k2 | transpose | column -t
john sarah mark
5 10 20
z x y
Step by step:
$ cat a | transpose
sarah 10 x
mark 20 y
john 5 z
$ cat a | transpose | sort -n -k2
john 5 z
sarah 10 x
mark 20 y
$ cat a | transpose | sort -n -k2 | transpose
john sarah mark
5 10 20
z x y
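If GNU datamash is available, its built-in transpose can stand in for the hand-rolled function. A sketch: -W splits fields on whitespace, and since datamash emits tab-separated output, column -t restores the spacing at the end.
$ datamash -W transpose < a | sort -n -k2 | datamash transpose | column -t
john sarah mark
5 10 20
z x y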
Coming from a duplicate question, this would sort the columns by the first row:
#!/bin/bash
input="$1"
order=$( (for i in $(head -1 $input); do echo $i; done) | nl | sort -k2 | cut -f1)
grep ^ $input | (while read line
do
    read -a columns <<< "${line%"${line##*[![:space:]]}"}"
    orderedline=()
    for i in ${order[@]}
    do
        orderedline+=("${columns[$i - 1]}")
    done
    line=$(printf "\t%s" "${orderedline[@]}")
    echo ${line:1}
done)
To sort by the second row, replace head -1 $input with head -2 $input | tail -1. If the sort should be numeric, use sort -n -k2 instead of sort -k2.
This one-liner gets the job done:
perl -ane '$,=" "; print sort @F; print "\n";' file
I found it here: http://www.unix.com/unix-for-advanced-and-expert-users/36039-horizontal-sorting-lines-file-sed-implementation.html

bash uniq, how to show count number at back

Normally when I do cat number.txt | sort -n | uniq -c, I get numbers like this:
3 43
4 66
2 96
1 97
But what I need is the number of occurrences shown at the back, like this:
43 3
66 4
96 2
97 1
Please give advice on how to change this. Thanks.
Use awk to change the order of columns:
cat number.txt | sort -n | uniq -c | awk '{ print $2, $1 }'
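If you'd rather skip the swap step entirely, a single awk pass can count and print value-first (a minimal sketch; the trailing sort is needed because awk's array iteration order is unspecified):
awk '{count[$1]++} END {for (v in count) print v, count[v]}' number.txt | sort -n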
Perl version:
perl -lne '$occ{0+$_}++; END {print "$_ $occ{$_}" for sort {$a <=> $b} keys %occ}' < number.txt
Through GNU sed, allowing for the leading spaces that uniq -c produces:
cat number.txt | sort -n | uniq -c | sed -r 's/^ *([0-9]+) ([0-9]+)$/\2 \1/'

retrieve and add two numbers from a file

In my file I have the following structure:
A | 12 | 10
B | 90 | 112
C | 54 | 34
I have to add column 2 and column 3 and print the result with column 1.
Output:
A | 22
B | 202
C | 88
I can retrieve the two columns but don't know how to add them.
What I did is:
cut -d ' | ' -f3,5 myfile.txt
How do I add those columns and display the result?
A Bash solution:
#!/bin/bash
# Split each line on "|"; arithmetic expansion strips the
# surrounding spaces from f2 and f3 before adding them.
while IFS="|" read f1 f2 f3
do
    echo $f1 "|" $((f2+f3))
done < file
You can do this easily with awk.
awk '{print $1, "|", ($3+$5)}' myfile.txt
With the default whitespace field splitting, the literal | characters count as fields $2 and $4, so the numbers are in $3 and $5.
You can do this with awk:
awk 'BEGIN{FS="|"; OFS="| "} {print $1 OFS $2+$3}' input_filename
Input:
A | 12 | 10
B | 90 | 112
C | 54 | 34
Output:
A | 22
B | 202
C | 88
Explanation:
awk: invoke the awk tool
BEGIN{...}: do things before starting to read lines from the file
FS="|": FS stands for Field Separator. Think of it as the delimiter that separates each line of your file into fields
OFS="| ": OFS stands for Output Field Separator. Same idea as above, but for output. FS =/= OFS in this case due to formatting
{print $1 OFS $2+$3}: For each line that awk reads, print the first field (the letter), followed by a delimiter specified by OFS, then the sum of field 2 and field 3.
input_filename: awk accepts the input file name as an argument here.
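A printf variant of the same idea (a sketch, assuming the sums are integers) gives explicit control over the output format:
awk -F'|' '{printf "%s| %d\n", $1, $2+$3}' input_filename
Because $1 keeps its trailing space with FS="|", the output lines up as A | 22 and so on.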
