Print out the value with the highest number of occurrences in a file - bash

In a bash shell script, I want to go through a list of numbers and then print out the number that occurs most often. If there are several different numbers appearing an equal amount of times, I want to print the highest number. For example, in a file like this:
10
10
10
15
15
20
20
20
20
I want to print the value 20.
How can I achieve this?

If the numbers are in a file, one per line:
sort < myfile | uniq -c | sort -r | head -1
without the count:
A=$(sort < myfile | uniq -c | sort -r | head -1)
set $A
echo $2

You can use this command -
echo 10 10 10 15 15 20 20 20 20 | sed 's/ /\n/g' | sort | uniq -c | sort -V | tail -n 1 | awk '{print $2}'
It will print the number you want.

Related

How to get the number of different lines in bash [duplicate]

I have a command (cmd1) that greps through a log file to filter out a set of numbers. The numbers are
in random order, so I use sort -gr to get a reverse sorted list of numbers. There may be duplicates within
this sorted list. I need to find the count for each unique number in that list.
For e.g. if the output of cmd1 is:
100
100
100
99
99
26
25
24
24
I need another command that I can pipe the above output to, so that, I get:
100 3
99 2
26 1
25 1
24 2
how about;
$ echo "100 100 100 99 99 26 25 24 24" \
| tr " " "\n" \
| sort \
| uniq -c \
| sort -k2nr \
| awk '{printf("%s\t%s\n",$2,$1)}END{print}'
The result is :
100 3
99 2
26 1
25 1
24 2
uniq -c works for GNU uniq 8.23 at least, and does exactly what you want (assuming sorted input).
if order is not important
# echo "100 100 100 99 99 26 25 24 24" | awk '{for(i=1;i<=NF;i++)a[$i]++}END{for(o in a) printf "%s %s ",o,a[o]}'
26 1 100 3 99 2 24 2 25 1
Numerically sort the numbers in reverse, then count the duplicates, then swap the left and the right words. Align into columns.
printf '%d\n' 100 99 26 25 100 24 100 24 99 \
| sort -nr | uniq -c | awk '{printf "%-8s%s\n", $2, $1}'
100 3
99 2
26 1
25 1
24 2
In Bash, we can use an associative array to count instances of each input value. Assuming we have the command $cmd1, e.g.
#!/bin/bash
cmd1='printf %d\n 100 99 26 25 100 24 100 24 99'
Then we can count values in the array variable a using the ++ mathematical operator on the relevant array entries:
while read i
do
((++a["$i"]))
done < <($cmd1)
We can print the resulting values:
for i in "${!a[#]}"
do
echo "$i ${a[$i]}"
done
If the order of output is important, we might need an external sort of the keys:
for i in $(printf '%s\n' "${!a[#]}" | sort -nr)
do
echo "$i ${a[$i]}"
done
In case you have input stored in my_file you can do:
sort -nr my_file | uniq -c | awk ' { t = $1; $1 = $2; $2 = t; print; } '
Otherwise just pipe the input to be processed to the same cmd.
Explanation:
sort -nr sorts the input numerically (-n) in reverse order (-r)
uniq -c count duplicates and shows the count side-by-side
awk '{ t = $1; $1 = $2; $2 = t; print; }' swaps the two columns

Inconsistency in output field separator

We have to find the difference(d) Between last 2 nos and display rows with the highest value of d in ascending order
INPUT
1 | Latha | Third | Vikas | 90 | 91
2 | Neethu | Second | Meridian | 92 | 94
3 | Sethu | First | DAV | 86 | 98
4 | Theekshana | Second | DAV | 97 | 100
5 | Teju | First | Sangamithra | 89 | 100
6 | Theekshitha | Second | Sangamithra | 99 |100
Required OUTPUT
4$Theekshana$Second$DAV$97$100$3
5$Teju$First$Sangamithra$89$100$11
3$Sethu$First$DAV$86$98$12
awk 'BEGIN{FS="|";OFS="$";}{
avg=sqrt(($5-$6)^2)
print $1,$2,$3,$4,$5,$6,avg
}'|sort -nk7 -t "$"| tail -3
Output:
4 $ Theekshana $ Second $ DAV $ 97 $ 100$3
5 $ Teju $ First $ Sangamithra $ 89 $ 100$11
3 $ Sethu $ First $ DAV $ 86 $ 98$12
As you can see there is space before and after $ sign but for the last column (avg) there is no space, please explain why its happening
2)
awk 'BEGIN{FS=" | ";OFS="$";}{
avg=sqrt(($5-$6)^2)
print $1,$2,$3,$4,$5,$6,avg
}'|sort -nk7 -t "$"| tail -3
OUTPUT
4$|$Theekshana$|$Second$|$0
5$|$Teju$|$First$|$0
6$|$Theekshitha$|$Second$|$0
I have not mentiond | as the output field separator but still it appears, why is this happening and the difference is zero too
I am just 6 days old in unix,please answer even if its easy
your field separator is only the pipe symbol, so surrounding whitespace is part of the field definitions and that's what you see in the output. In combined uses pipe has the regex special meaning and need to be escaped. In your second case it means space or space is the field separator.
$ awk 'BEGIN {FS=" *\\| *"; OFS="$"}
{d=sqrt(($NF-$(NF-1))^2); $1=$1;
print d "\t" $0,d}' file | sort -n | tail -3 | cut -f2-
4$Theekshana$Second$DAV$97$100$3
5$Teju$First$Sangamithra$89$100$11
3$Sethu$First$DAV$86$98$12
a slight rewrite will eliminate the number of fields dependency and fixes the format.

bash uniq, how to show count number at back

Normally when I do cat number.txt | sort -n | uniq -c , I get numbers like this:
3 43
4 66
2 96
1 97
But what I need is the number shows of occurrences at the back, like this:
43 3
66 4
96 2
97 1
Please give advice on how to change this. Thanks.
Use awk to change the order of columns:
cat number.txt | sort -n | uniq -c | awk '{ print $2, $1 }'
Perl version:
perl -lne '$occ{0+$_}++; END {print "$_ $occ{$_}" for sort {$a <=> $b} keys %occ}' < numbers.txt
Through GNU sed,
cat number.txt | sort -n | uniq -c | sed -r 's/^([0-9]+) ([0-9]+)$/\2 \1/g'

Find most frequent line in file in bash

Suppose I have a file similar to as follows:
Abigail 85
Kaylee 25
Kaylee 25
kaylee
Brooklyn
Kaylee 25
kaylee 25
I would like to find the most repeated line, the output must be just the line.
I've tried
sort list | uniq -c
but I need clean output, just the most repeated line (in this example Kaylee 25).
Kaizen ~
$ sort zlist | uniq -c | sort -r | head -1| xargs | cut -d" " -f2-
Kaylee 25
does this help ?
IMHO, none of these answers will sort the results correctly. The reason is that sort, without the -n, option will sort like this "1 10 11 2 3 4", etc., instead of "1 2 3 4 10 11 12". So, add -n like so:
sort zlist | uniq -c | sort -n -r | head -1
You can then, of course, pipe that to either xargs or sed as described earlier.
awk -
awk '{a[$0]++; if(m<a[$0]){ m=a[$0];s[m]=$0}} END{print s[m]}' t.lis
$ uniq -c list | sort -r | head -1 | awk '{$1=""}1'
Kaylee 25
Is this what you're looking for?

counting duplicates in a sorted sequence using command line tools

I have a command (cmd1) that greps through a log file to filter out a set of numbers. The numbers are
in random order, so I use sort -gr to get a reverse sorted list of numbers. There may be duplicates within
this sorted list. I need to find the count for each unique number in that list.
For e.g. if the output of cmd1 is:
100
100
100
99
99
26
25
24
24
I need another command that I can pipe the above output to, so that, I get:
100 3
99 2
26 1
25 1
24 2
how about;
$ echo "100 100 100 99 99 26 25 24 24" \
| tr " " "\n" \
| sort \
| uniq -c \
| sort -k2nr \
| awk '{printf("%s\t%s\n",$2,$1)}END{print}'
The result is :
100 3
99 2
26 1
25 1
24 2
uniq -c works for GNU uniq 8.23 at least, and does exactly what you want (assuming sorted input).
if order is not important
# echo "100 100 100 99 99 26 25 24 24" | awk '{for(i=1;i<=NF;i++)a[$i]++}END{for(o in a) printf "%s %s ",o,a[o]}'
26 1 100 3 99 2 24 2 25 1
Numerically sort the numbers in reverse, then count the duplicates, then swap the left and the right words. Align into columns.
printf '%d\n' 100 99 26 25 100 24 100 24 99 \
| sort -nr | uniq -c | awk '{printf "%-8s%s\n", $2, $1}'
100 3
99 2
26 1
25 1
24 2
In Bash, we can use an associative array to count instances of each input value. Assuming we have the command $cmd1, e.g.
#!/bin/bash
cmd1='printf %d\n 100 99 26 25 100 24 100 24 99'
Then we can count values in the array variable a using the ++ mathematical operator on the relevant array entries:
while read i
do
((++a["$i"]))
done < <($cmd1)
We can print the resulting values:
for i in "${!a[#]}"
do
echo "$i ${a[$i]}"
done
If the order of output is important, we might need an external sort of the keys:
for i in $(printf '%s\n' "${!a[#]}" | sort -nr)
do
echo "$i ${a[$i]}"
done
In case you have input stored in my_file you can do:
sort -nr my_file | uniq -c | awk ' { t = $1; $1 = $2; $2 = t; print; } '
Otherwise just pipe the input to be processed to the same cmd.
Explanation:
sort -nr sorts the input numerically (-n) in reverse order (-r)
uniq -c count duplicates and shows the count side-by-side
awk '{ t = $1; $1 = $2; $2 = t; print; }' swaps the two columns

Resources