Find most frequent line in file in bash

Find most frequent line in file in bash - bash

Suppose I have a file similar to as follows:
Abigail 85
Kaylee 25
Kaylee 25
kaylee
Brooklyn
Kaylee 25
kaylee 25
I would like to find the most repeated line, the output must be just the line.
I've tried
sort list | uniq -c
but I need clean output, just the most repeated line (in this example Kaylee 25).

Kaizen ~
$ sort zlist | uniq -c | sort -r | head -1| xargs | cut -d" " -f2-
Kaylee 25
does this help ?

IMHO, none of these answers will sort the results correctly. The reason is that sort, without the -n, option will sort like this "1 10 11 2 3 4", etc., instead of "1 2 3 4 10 11 12". So, add -n like so:
sort zlist | uniq -c | sort -n -r | head -1
You can then, of course, pipe that to either xargs or sed as described earlier.

awk -
awk '{a[$0]++; if(m<a[$0]){ m=a[$0];s[m]=$0}} END{print s[m]}' t.lis

$ uniq -c list | sort -r | head -1 | awk '{$1=""}1'
Kaylee 25
Is this what you're looking for?

Related

Inconsistency in output field separator

We have to find the difference(d) Between last 2 nos and display rows with the highest value of d in ascending order
INPUT
1 | Latha | Third | Vikas | 90 | 91
2 | Neethu | Second | Meridian | 92 | 94
3 | Sethu | First | DAV | 86 | 98
4 | Theekshana | Second | DAV | 97 | 100
5 | Teju | First | Sangamithra | 89 | 100
6 | Theekshitha | Second | Sangamithra | 99 |100
Required OUTPUT
4$Theekshana$Second$DAV$97$100$3
5$Teju$First$Sangamithra$89$100$11
3$Sethu$First$DAV$86$98$12
awk 'BEGIN{FS="|";OFS="$";}{
avg=sqrt(($5-$6)^2)
print $1,$2,$3,$4,$5,$6,avg
}'|sort -nk7 -t "$"| tail -3
Output:
4 $ Theekshana $ Second $ DAV $ 97 $ 100$3
5 $ Teju $ First $ Sangamithra $ 89 $ 100$11
3 $ Sethu $ First $ DAV $ 86 $ 98$12
As you can see there is space before and after $ sign but for the last column (avg) there is no space, please explain why its happening
2)
awk 'BEGIN{FS=" | ";OFS="$";}{
avg=sqrt(($5-$6)^2)
print $1,$2,$3,$4,$5,$6,avg
}'|sort -nk7 -t "$"| tail -3
OUTPUT
4$|$Theekshana$|$Second$|$0
5$|$Teju$|$First$|$0
6$|$Theekshitha$|$Second$|$0
I have not mentiond | as the output field separator but still it appears, why is this happening and the difference is zero too
I am just 6 days old in unix,please answer even if its easy

your field separator is only the pipe symbol, so surrounding whitespace is part of the field definitions and that's what you see in the output. In combined uses pipe has the regex special meaning and need to be escaped. In your second case it means space or space is the field separator.
$ awk 'BEGIN {FS=" *\\| *"; OFS="$"}
{d=sqrt(($NF-$(NF-1))^2); $1=$1;
print d "\t" $0,d}' file | sort -n | tail -3 | cut -f2-
4$Theekshana$Second$DAV$97$100$3
5$Teju$First$Sangamithra$89$100$11
3$Sethu$First$DAV$86$98$12
a slight rewrite will eliminate the number of fields dependency and fixes the format.

From awk output, how to cut or trim characters in columns

At the moment
I want to trim .fmbi1a5nn9sp5o4qy3eyazeq5.eddvrl9sa8t448pb38vibj8ef: and .ilwio0k43fgqt4jqzyfadx19v: so the output take less space :)
First step:
docker ps --format "{{.Names}}: {{.Status}}" | sort -k1 | column -t
mon_node-exporter.fmbi1a5nn9sp5o4qy3eyazeq5.eddvrl9sa8t448pb38vibj8ef: Up 7 days
mon_prometheus.1.ilwio0k43fgqt4jqzyfadx19v: Up 7 days
I know
I can do something like:
docker ps --format "{{.Names}}: {{.Status}}" | sort -k1 | rev | cut -d"." -f2- | rev
mon_node-exporter.fmbi1a5nn9sp5o4qy3eyazeq5
mon_prometheus.1
The issue
is that I'm losing the other columns :-/
Idea
It would sound logical to do something like this (with awk) but it does not work. Any ideas?
docker ps --format "{{.Names}} : {{.Status}}" | sort -k1 | awk '{(print $1 | rev | cut -d"." -f2- | rev),$2,$3,$4,$5,$6}' | column -t
Thank you in advance!
P

to cut the last dot extension
$ docker ... | sort | awk '{sub(/\.[^.]*$/,"",$1)}1' file | column -t
mon_node-exporter.fmbi1a5nn9sp5o4qy3eyazeq5 Up 7 days
mon_prometheus.1 Up 7 days
or, delete anything longer than 20 chars after a dot.
$ ... | sed -e 's/\(\.[a-z0-9:]\{20,\}\)* / /' | column -t
mon_node-exporter Up 7 days
mon_prometheus.1 Up 7 days

Works! This trick will make my life so much easier.
(I removed file)
docker ps --format "{{.Names}}: {{.Status}}" | sort -k1 | awk '{sub(/\.[^.]*$/,"",$1)}1' | column -t;
mon_grafana.1 Up 24 hours
mon_node-exporter.fmbi1a5nn9sp5o4qy3eyazeq5 Up 23 hours
Question #2:
Now how would you proceed to cut the characters after the first dot?
Cheers!

awk length is counting +1

I'm trying, as an exercise, to output how many words exist in the dictionary for each possible length.
Here is my code:
$ awk '{print length}' dico.txt | sort -nr | uniq -c
Here is the output:
...
1799 5
427 4
81 3
1 2
My problem is that awk length count one more letter for each word in my file. The right output should have been:
1799 4
427 3
81 2
1 1
I checked my file and it does not contain any space after the word:
ABAISSA
ABAISSABLE
ABAISSABLES
ABAISSAI
...
So I guess awk is counting the newline as a character, despite the fact it is not supposed to.
Is there any solution? Or something I'm doing wrong?

I'm gonna venture a guess. Isn't your awk expecting "U*X" style newlines (LF), but your dico.txt has Windows style (CR+LF). That easily give you the +1 on all lengths.
I took your four words:
$ cat dico.txt
ABAISSA
ABAISSABLE
ABAISSABLES
ABAISSAI
And ran your line:
$ awk '{print length}' dico.txt | sort -nr | uniq -c
1 11
1 10
1 8
1 7
So far so good. Now the same, but dico.txt with windows newlines:
$ cat dico.txt | todos > dico_win.txt
$ awk '{print length}' dico_win.txt | sort -nr | uniq -c
1 12
1 11
1 9
1 8

Print out the value with the highest number of occurrences in a file

In a bash shell script, I want to go through a list of numbers and then print out the number that occurs most often. If there are several different numbers appearing an equal amount of times, I want to print the highest number. For example, in a file like this:
10
10
10
15
15
20
20
20
20
I want to print the value 20.
How can I achieve this?

If the numbers are in a file, one per line:
sort < myfile | uniq -c | sort -r | head -1
without the count:
A=$(sort < myfile | uniq -c | sort -r | head -1)
set $A
echo $2

You can use this command -
echo 10 10 10 15 15 20 20 20 20 | sed 's/ /\n/g' | sort | uniq -c | sort -V | tail -n 1 | awk '{print $2}'
It will print the number you want.

bash uniq, how to show count number at back

Normally when I do cat number.txt | sort -n | uniq -c , I get numbers like this:
3 43
4 66
2 96
1 97
But what I need is the number shows of occurrences at the back, like this:
43 3
66 4
96 2
97 1
Please give advice on how to change this. Thanks.

Use awk to change the order of columns:
cat number.txt | sort -n | uniq -c | awk '{ print $2, $1 }'

Perl version:
perl -lne '$occ{0+$_}++; END {print "$_ $occ{$_}" for sort {$a <=> $b} keys %occ}' < numbers.txt

Through GNU sed,
cat number.txt | sort -n | uniq -c | sed -r 's/^([0-9]+) ([0-9]+)$/\2 \1/g'

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio

Find most frequent line in file in bash - bash

Kaizen ~ $ sort zlist | uniq -c | sort -r | head -1| xargs | cut -d" " -f2- Kaylee 25 does this help ?

awk - awk '{a[$0]++; if(m<a[$0]){ m=a[$0];s[m]=$0}} END{print s[m]}' t.lis

$ uniq -c list | sort -r | head -1 | awk '{$1=""}1' Kaylee 25 Is this what you're looking for?

Related

Inconsistency in output field separator

From awk output, how to cut or trim characters in columns

awk length is counting +1

Print out the value with the highest number of occurrences in a file

bash uniq, how to show count number at back

Categories

Resources