check top 100 domains in email list

I have a large email list, and I want to know which are the top 100 domains in it. Example list:
cristiano.ofidiani#libero.it
cristianocurzi70#libero.it
cristianogiustetto#libero.it
cristianopaolieri#fercart.com
cristianoristori#tiscali.it
cristianorollo#tiscali.it
cristianoscavi#alice.it
cristianotradigo#virgilio.it
cristianpassarelli#virgilio.it
cristianprisco#libero.it
cristianriparip#riparifranco.it
cristiansrl.pec#legalmail.it
cristina.arese#vestisolidale.it
cristina.armillotta#coldiretti.it
cristina.bazzi#bazzicroup.it
cristina.bedocchi#tin.it
cristina.benassi#terminalrubiero.com
I need the most frequent domains in this list together with their counts, for example:
libero.it 100
tiscali.it 77
legalmail.it 44
How can I do this in Linux bash?

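cut -d# -f2 | sort | uniq -c | sort -nr | head -n 100 should do the trick, with the list fed in on standard input. cut extracts the domain by using # to separate the fields, as in your sample. sort groups identical domains together, which is needed because uniq only collapses adjacent duplicates. uniq -c prefixes each domain with its count, sort -nr orders the counts numerically in decreasing order, and head -n 100 keeps the top one hundred.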
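The pipeline above prints the count before the domain, while the question asks for the domain first. A minimal sketch of one way to swap the columns, assuming the # separator from the sample list; the function name top_domains and the file name emails.txt are hypothetical:

```shell
# top_domains [N]: read addresses on stdin and print "domain count"
# for the N most frequent domains (N defaults to 100).
# Assumption: '#' separates user and domain, as in the sample list;
# use cut -d'@' for addresses written with '@'.
top_domains() {
    cut -d'#' -f2 |
        sort |
        uniq -c |
        sort -k1,1nr -k2,2 |      # highest count first, ties by domain name
        head -n "${1:-100}" |
        awk '{ print $2, $1 }'    # swap "count domain" into "domain count"
}

# Usage (emails.txt is a hypothetical file name):
#   top_domains 100 < emails.txt
```

Sorting ties by domain name is only there to make the output order predictable; it does not change which domains have the highest counts.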

Related

How can I count and display only the words that are repeated more than once using unix commands?

I am trying to count and display only the words that are repeated more than once in a file. The basic idea is:
You are given a file with names and characters like commas, colons, slashes, etc.
Use the cut command to display only the first names in the file (other commands are also allowed).
Count and then display only the names repeated more than once.
I got to the point of counting and displaying all the names. However, I haven't found a way to display and to count only those names repeated more than once.
Here is a section of the file:
user1:x:80:200:Mia,Spurs:/home/user1:/bin/bash
user2:x:80:200:Martha,Dalton:/home/user2:/bin/bash
user3:x:80:200:Lucy,Carlson:/home/user3:/bin/bash
user4:x:80:200:Carl,Bingo:/home/user4:/bin/bash
Here is what I have been able to do:
Daniel@Daniel-MacBook-Pro Files % cut -d ":" -f 5-5 file1 | cut -d "," -f 1-1 | sort -n | uniq -c
1 Mia
3 Martha
1 Lucy
1 Carl
1 Jessi
1 Joke
1 Jim
2 Race
1 Sem
1 Shirly
1 Susan
1 Tim

You can filter out the rows with count 1 with grep.
cut -d ":" -f 5 file1 | cut -d "," -f 1 | sort | uniq -c | grep -v '^ *1 '
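The grep filter relies on the exact layout of the uniq -c output. As an alternative sketch, awk can compare the count numerically instead. The function name repeated_first_names is hypothetical, and the sketch assumes first names contain no spaces:

```shell
# repeated_first_names: read passwd-style lines on stdin and print
# "count name" for every first name that occurs more than once.
# Assumption: field 5 looks like "First,Last" and the first name has no spaces.
repeated_first_names() {
    cut -d ':' -f 5 |
        cut -d ',' -f 1 |
        sort |
        uniq -c |
        awk '$1 > 1 { print $1, $2 }'   # keep only counts greater than one
}

# Usage, with file1 as in the question:
#   repeated_first_names < file1
```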

Sort output in bash script by number of occurrences

I have text output with HTTP status codes in one column and an IP address in the other. I want to sort it by number of occurrences so that
1 2 1 3 4 5 4 4
Looks like
4 4 4 1 1 2 3 5
This is for the second column of status codes; the IP addresses don't need to be sorted in any particular order.
Since 4 is the most common value it should come first, then 1, and so forth.
However, all I can find is how to use uniq to count the occurrences, which removes the duplicates and prefixes a count to each row.
As far as I can tell, the regular sort command does not support this either.
Any help would be appreciated.

You can still use sort | uniq -c, then expand each count back out by printing the status that many times in a loop:
tr ' ' '\n' < file \
  | sort | uniq -c | sort -k1,1nr -k2n \
  | while read -r times status; do
      # print the status code once per counted occurrence
      for i in $(seq 1 "$times"); do
        printf '%s ' "$status"
      done
    done
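The same idea can be sketched with awk doing the expansion instead of a nested shell loop. This is an illustration under the same simplification as the answer (one line of space-separated values on stdin); the function name expand_by_frequency is hypothetical:

```shell
# expand_by_frequency: read space-separated values on stdin and print them
# grouped by how often they occur, most frequent first.
expand_by_frequency() {
    tr -s ' ' '\n' |
        sort -n |
        uniq -c |
        sort -k1,1nr -k2,2n |     # count descending, then value ascending
        awk '{
            # $1 is the count, $2 the value: repeat the value $1 times
            for (i = 0; i < $1; i++) { printf "%s%s", sep, $2; sep = " " }
        }
        END { print "" }'
}

# Usage:
#   echo '1 2 1 3 4 5 4 4' | expand_by_frequency
```

With the input from the question this should print 4 4 4 1 1 2 3 5, since values with equal counts are ordered numerically by the second sort key.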

Minimal two column numeric input data for `sort` example, with distinct permutations

What's the least number of rows of two-column numeric input needed to produce four unique sort outputs for the following four options?
1. -sn -k1
2. -sn -k2
3. -sn -k1 -k2
4. -sn -k2 -k1
Here's a 6 row example, (with 4 unique outputs):
6 5
3 7
6 3
2 7
4 4
5 2
As a convenience, a function to count those four outputs given 2 columns of numbers, (requires the moreutils pee command), which prints the number of unique outputs:
# Usage: foo c1_1 c2_1 c1_2 c2_2 ...
foo() { echo "$@" | tr -s '[:space:]' '\n' | paste - - | \
pee "sort -sn -k1 | md5sum" \
"sort -sn -k2 | md5sum" \
"sort -sn -k1 -k2 | md5sum" \
"sort -sn -k2 -k1 | md5sum" | \
sort -u | wc -l ; }
So to count the unique permutations of this input:
8 5
3 1
8 3
Run this:
foo 8 5 3 1 8 3
Output:
2
(Only two unique outputs. Not enough...)
Note: This question was inspired by the obscurity of the current version of the sort manual, specifically COLUMNS=65 man sort | grep -A 17 KEYDEF | sed 3,18d. The info sort page's treatment of KEYDEFs is much better.
KEYDEFs are more useful than they might first seem. The -u or --unique switch works nicely with the KEYDEFs, and in effect allows sort to delete unwanted redundant lines, and therefore can furnish a more concise substitute for certain sed or awk scripts and similar pipelines.

I can do it in 3 by varying the whitespace:
1 1
2 1
1 2
Your foo function doesn't produce this kind of output, but since it was only a "convenience" and not a part of the question proper, I declare this answer correct and minimal!
Sneakier version:
2 1
11 1
2 2
(The last line contains a tab; the others don't.)
With the -s option, I can't exploit non-numeric comparisons, but then I can exploit the stability of the sort:
1 2
2 1
1 1
The 1 1 line goes above both of the others if both fields are compared numerically, regardless of which comparison is done first. The ordering of the two comparisons determines the ordering of the other two lines.
On the other hand, if one of the fields isn't used for comparison, the 1 1 line stays below one of the other lines (and which one that is depends on which field is used for comparison).
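The stable-sort claim can be checked without the moreutils pee command. The following is a rough sketch, not the asker's foo function: count_sort_outputs is a hypothetical name, and it reads the rows from stdin instead of taking them as arguments:

```shell
# count_sort_outputs: read two-column numeric rows on stdin and print how many
# distinct orderings the four key combinations from the question produce.
count_sort_outputs() {
    local input keys
    input=$(cat)
    for keys in '-k1' '-k2' '-k1 -k2' '-k2 -k1'; do
        # $keys is deliberately unquoted so "-k1 -k2" splits into two options.
        # Each sorted result is flattened onto one line so results can be compared.
        printf '%s\n' "$input" | sort -sn $keys | tr '\n' '|'
        echo
    done | LC_ALL=C sort -u | wc -l
}

# Usage:
#   printf '%s\n' '1 2' '2 1' '1 1' | count_sort_outputs
```

For the three rows 1 2, 2 1, 1 1 this should report 4 distinct outputs, and for the question's rows 8 5, 3 1, 8 3 it should report 2.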

Print out the value with the highest number of occurrences in a file

In a bash shell script, I want to go through a list of numbers and print the number that occurs most often. If several different numbers appear an equal number of times, I want to print the highest of them. For example, in a file like this:
10
10
10
15
15
20
20
20
20
I want to print the value 20.
How can I achieve this?

If the numbers are in a file, one per line:
sort < myfile | uniq -c | sort -rn | head -n 1
Without the count:
A=$(sort < myfile | uniq -c | sort -rn | head -n 1)
set -- $A
echo "$2"

You can use this command:
echo 10 10 10 15 15 20 20 20 20 | sed 's/ /\n/g' | sort | uniq -c | sort -V | tail -n 1 | awk '{print $2}'
It will print the number you want.
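The question also asks for the highest number when several numbers tie on frequency. A minimal sketch that makes that tie-break explicit with a second numeric sort key; the function name most_common_number is hypothetical:

```shell
# most_common_number: read one number per line on stdin and print the number
# that occurs most often; among equally frequent numbers, print the largest.
most_common_number() {
    sort -n |
        uniq -c |
        sort -k1,1nr -k2,2nr |    # count descending, then value descending
        head -n 1 |
        awk '{ print $2 }'        # drop the count, keep the value
}

# Usage (myfile as in the answer above):
#   most_common_number < myfile
```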

Find most frequent line in file in bash

Suppose I have a file like the following:
Abigail 85
Kaylee 25
Kaylee 25
kaylee
Brooklyn
Kaylee 25
kaylee 25
I would like to find the most repeated line, the output must be just the line.
I've tried
sort list | uniq -c
but I need clean output, just the most repeated line (in this example Kaylee 25).

Kaizen ~
$ sort zlist | uniq -c | sort -r | head -1| xargs | cut -d" " -f2-
Kaylee 25
Does this help?

IMHO, none of these answers will sort the results correctly. The reason is that sort, without the -n option, compares the counts as text and orders them like "1 10 11 2 3 4" instead of "1 2 3 4 10 11". So, add -n like so:
sort zlist | uniq -c | sort -n -r | head -1
You can then, of course, pipe that to either xargs or sed as described earlier.

awk -
awk '{a[$0]++; if(m<a[$0]){ m=a[$0];s[m]=$0}} END{print s[m]}' t.lis

$ sort list | uniq -c | sort -rn | head -n 1 | awk '{$1=""; sub(/^ /, "")}1'
Kaylee 25
Is this what you're looking for?
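Putting the pieces from these answers together, here is a hedged sketch that sorts first, counts, orders numerically, and strips the count with sed. The function name most_frequent_line is hypothetical; if several lines tie for the top count, which one is printed depends on sort's final tie-breaking:

```shell
# most_frequent_line: read lines on stdin and print the line that occurs most often.
most_frequent_line() {
    sort |                              # group identical lines so uniq can count them
        uniq -c |
        sort -k1,1nr |                  # compare the counts numerically, highest first
        head -n 1 |
        sed 's/^ *[0-9][0-9]* //'       # remove the padded count and the space after it
}

# Usage, with the file name from the question:
#   most_frequent_line < list
```

Comparison is case-sensitive, so kaylee 25 and Kaylee 25 are counted separately, as in the question's expected output.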
