Count query in shell - bash

I have a file with many entries like
asd 13
dsa 14
ert 10
ghj 78
... and many entries like this
Each line can be treated as a key and count pair. The keys are distinct.
I need the top 6 keys and their counts.
WHAT HAVE I DONE: I don't know how to sort the file by count. If I can get that far, I can print the top 6.

sort -nrk2 | head -6
numeric sort
reverse sort
sort by field 2
get top 6
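As a quick check of the options listed above, here is a minimal sketch run against made-up sample data (the rows past ghj 78 are hypothetical, added only so there are more than six entries). It writes the key as -k2,2nr so that only field 2 is compared.

```bash
# Hypothetical sample data in the question's "key count" format.
data='asd 13
dsa 14
ert 10
ghj 78
qwe 5
zxc 42
rty 7'

# -k2,2nr: compare field 2 only, numerically, in descending order.
# head -n 6 then keeps the six largest counts.
top6=$(printf '%s\n' "$data" | sort -k2,2nr | head -n 6)
printf '%s\n' "$top6"
```

With a real file, the same pipeline would read sort -k2,2nr c.txt | head -n 6.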

cat c.txt|awk '{print $2" "$1}'|sort -nr|head -6
This assumes the file is named c.txt. Note that the output has the count first and the key second.

Related

Sort output in bash script by number of occurrences

So I have text being output that has HTTP status codes in one column and an IP address in the other. I want to sort this by number of occurrences so that
1 2 1 3 4 5 4 4
Looks like
4 4 4 1 1 2 3 5
This is for the second column of status codes; the IP addresses don't need to be sorted in any particular order.
Since 4 is the most common one it should be first, then 1, and so forth.
However, all that I can find is how to use uniq, for example, to count the occurrences, thereby removing duplicates and prefixing a number to each row.
The regular sort command does not support this either, as far as I can tell.
Any help would be appreciated.
You can still use sort | uniq -c, then expand each count back into repeated values with a loop:
tr ' ' '\n' < file \
| sort | uniq -c | sort -k1,1nr -k2n \
| while read -r times status ; do
    for i in $(seq 1 "$times"); do
      printf '%s ' "$status"
    done
  done
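To see what the pipeline does, here is a small sketch that feeds it the sample sequence from the question. It is only an illustration: the variable names are made up, and the inner loop uses plain arithmetic instead of seq so that it does not depend on seq being installed.

```bash
# The sample sequence from the question.
input='1 2 1 3 4 5 4 4'

# uniq -c prefixes each distinct value with its count; the second sort
# orders by count (descending), then by value (ascending) to break ties.
result=$(printf '%s\n' "$input" | tr ' ' '\n' | sort | uniq -c |
  sort -k1,1nr -k2,2n |
  while read -r times status; do
    i=0
    # Print the value once per occurrence.
    while [ "$i" -lt "$times" ]; do
      printf '%s ' "$status"
      i=$((i + 1))
    done
  done)
printf '%s\n' "$result"
```

The result keeps a trailing space after the last value, because every value is printed with '%s '.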

Bash Game Scorefile

I'm working on a simple number guessing game (to boost my bash skills) which at the end appends score and name to a file and then displays it to the player, like so:
10 Hana
10 lilka
10 nogba
12 nogba
13 Hana
13 ugaea
1 Lilka
5 lilka
7 borja
7 Hana
8 frina
8 molaa
9 Hana
9 lanma
9 lilka
Before displaying the high score file I'd like to remove all duplicate lines but leave the ones with the lowest score. Like so:
10 nogba
13 ugaea
1 Lilka
5 lilka
7 borja
7 Hana
8 frina
8 molaa
9 lanma
I'm thinking sed could be my answer, but I'm not sure.
Maybe something like this?
echo $highscorevalue >> $scorefile
sed -i '$!N; /^\(.*\)\n\1$/!P; D' $scorefile
cat $scorefile | sort
You can try with awk as well:
awk '{if($1 < a[$2] || !a[$2]) a[$2]=$1} END{for(i in a) print a[i], i}' file
This fills an array a with the minimum value of the first column for each name in the second column. The array is printed at the end.
Note that the output is not sorted. If you want to sort it by name, add | sort -k2 to the command.
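Here is a sketch of the same idea run on the score list from the question. It is a variant, not the exact command above: it tests ($2 in min) instead of !a[$2], on the assumption that a score of 0 might occur and should not be treated as "not seen yet". The array name min and the LC_ALL=C setting (used only to make the name ordering predictable) are choices made for this example.

```bash
scores='10 Hana
10 lilka
10 nogba
12 nogba
13 Hana
13 ugaea
1 Lilka
5 lilka
7 borja
7 Hana
8 frina
8 molaa
9 Hana
9 lanma
9 lilka'

# Keep the lowest score per name; $1 + 0 forces a numeric comparison.
# The for-in order in END is unspecified, so sort by name afterwards.
best=$(printf '%s\n' "$scores" |
  awk '!($2 in min) || $1 + 0 < min[$2] { min[$2] = $1 + 0 }
       END { for (name in min) print min[name], name }' |
  LC_ALL=C sort -k2,2)
printf '%s\n' "$best"
```

In the C locale uppercase letters sort before lowercase ones, so Hana and Lilka come first here; another locale may order the names differently.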
$ sort -k2,2 -k1,1n score.txt | awk '!seen[$2]++' | sort
10 nogba
13 ugaea
1 Lilka
5 lilka
7 borja
7 Hana
8 frina
8 molaa
9 lanma
The first sort command orders the lines so that, for each name, the entry with the lowest score comes first.
The awk command discards duplicates based on the name in the second column, keeping the first entry for each name.
The second sort command is used only to match the output given in the question.
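The deduplication step can be tried in isolation. In this small sketch (the five input lines are a hand-picked subset of the scores, already ordered so the best score per name comes first), seen[$2]++ evaluates to 0 the first time a name appears, so !seen[$2]++ is true only for that first line.

```bash
lines='7 Hana
9 Hana
5 lilka
10 lilka
1 Lilka'

# A pattern with no action prints the line when the pattern is true.
# The post-increment returns the old count, which is 0 on first sight.
first=$(printf '%s\n' "$lines" | awk '!seen[$2]++')
printf '%s\n' "$first"
```

Note that lilka and Lilka are different keys, since awk array subscripts are case-sensitive.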

Bash sort by number AND word length AND alphabetically

I have these strings in my array:
3 rere 33.33%
2 ena 22.22%
1 something 11.11%
1 som 11.11%
1 ok 11.11%
1 evo 11.11%
Expected results are:
3 rere 33.33%
2 ena 22.22%
1 something 11.11%
1 evo 11.11%
1 som 11.11%
1 ok 11.11%
They are ordered by number descending.
And I want to order them also by length of word in middle, but if words are same length, order them alphabetically.
These are not columns.
I wanted to split it in two arrays and sort them afterwards, but how to join them together?
Anyone got an idea?
You can't sort by length with sort. Let's try a Schwartzian transform:
awk '{print length($2), $0}' file | sort -k2,2nr -k1,1nr -k3,3 | cut -d" " -f2-
The awk command takes 1 something 11.11% and outputs 9 1 something 11.11%.
Then sort sorts first by the 2nd field numerically, then by the 1st field numerically, then by the 3rd field lexically.
Then cut removes the first field.
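The three steps can be checked on the data from the question. This is a sketch with the intermediate "decorated" form kept in a variable so it can be inspected; the variable names are made up, and LC_ALL=C is an assumption added so the alphabetical tie-break does not depend on the local collation rules.

```bash
rows='3 rere 33.33%
2 ena 22.22%
1 something 11.11%
1 som 11.11%
1 ok 11.11%
1 evo 11.11%'

# Decorate: prepend the length of the word in field 2.
decorated=$(printf '%s\n' "$rows" | awk '{ print length($2), $0 }')

# Sort by count (now field 2) descending, then length (field 1)
# descending, then the word (field 3) alphabetically; undecorate with cut.
sorted=$(printf '%s\n' "$decorated" |
  LC_ALL=C sort -k2,2nr -k1,1nr -k3,3 | cut -d' ' -f2-)
printf '%s\n' "$sorted"
```

The first decorated line is 4 3 rere 33.33%, and the final output matches the expected results in the question.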
The idea behind this is very similar to the Schwartzian transform used in choroba's answer: we add a sort field (in this case the length of the second column), use it to sort, then remove it again:
while read -r col1 word rest; do
printf "%d\t%s %s %s\n" "${#word}" "$col1" "$word" "$rest"
done < infile | sort -k 2,2nr -k 1,1nr -k 3,3 | cut -f 2
This results in
3 rere 33.33%
2 ena 22.22%
1 something 11.11%
1 evo 11.11%
1 som 11.11%
1 ok 11.11%
After the while loop, the output looks like this:
4 3 rere 33.33%
3 2 ena 22.22%
9 1 something 11.11%
3 1 som 11.11%
2 1 ok 11.11%
3 1 evo 11.11%
There is a new column with the length of the string in the second column. It's tab separated for easier cutting afterwards.
For sort, we specify what to use for sorting with the -k arguments (sort doesn't care if the fields are tab or space separated): 2,2nr uses just the second field, numerically and in descending order; the same goes for 1,1nr, and 3,3 is just your standard lexical sort.
The output now looks like this:
4 3 rere 33.33%
3 2 ena 22.22%
9 1 something 11.11%
3 1 evo 11.11%
3 1 som 11.11%
2 1 ok 11.11%
Now we only have to get rid of the first column, for which we use cut and take advantage of the tab separation introduced with printf.
The Bash while loop is slow on large inputs; the Perl solution is likely much faster.
Perl to the rescue!
perl -l -0777 -aF'\n' -ne '
print for map join(" ", @$_),
sort { $b->[0] <=> $a->[0]
|| length($b->[1]) <=> length($a->[1])
|| $a->[1] cmp $b->[1] }
map [ split ],
@F;
' input-file
-n reads the input record by record
-0777 sets the whole file as one record
-l adds newlines to prints
-a splits the input
-F'\n' tells -a to split on newlines
each line is then split on whitespace by split and sorted numerically (<=>) by the 0th column; ties are broken by the length of the 1st column, and remaining ties alphabetically (cmp) by the 1st column

Sort integer by absolute value

I have a list of integers and I want to sort it with sort, but on the absolute value of the integers. For example, 7 0 5 10 -2 should give 0 -2 5 7 10 (the integers are on separate lines in my file).
I don't think there is an option in sort to do that, but I can't find another command to sort lines. The -n option sorts in natural numeric order and -g is not what I want.
I tried to look at awk, but I don't know if it can help me.
Use
cat numbers.txt | sed -r 's/-([0-9]+)/\1-/g;' | sort -n | sed -r 's/([0-9]+)-/-\1/g;'
the first sed moves the minus sign behind the digits
sort -n sorts numerically
the second sed puts the minus sign back in front of the digits
I can't find this documented anywhere, but when you run sort -Vd it sorts by absolute value. It combines the "version sort" option (-V) with -d, which is "dictionary order" rather than numeric sort: only blanks and alphanumeric characters are compared, so the minus sign is ignored. With 1 5 3 7 -2 -4 -9, version sort on its own does something like this:
1
3
5
7
-2
-4
-9
And numerical sort (sort -n) on its own sorts like this:
-9
-4
-2
1
3
5
7
And with both options (sort -Vd), it sorts like this:
1
-2
3
-4
5
7
-9
I don't know if this is by design or by accident, and I've only tested it in GNU sort. I have found this trick to be very useful for certain code golfing situations.
A one line perl solution. Works more generally on floating point values as well. For example:
$ cat numbers.txt
1 -100 5 -4 7 -9 12 25.3 1.8 -1 33.5
$ perl -lane 'print(join " ", sort {abs($a) <=> abs($b)} @F);' numbers.txt
1 -1 1.8 -4 5 7 -9 12 25.3 33.5 -100
If you want the order to be descending, just reverse the $a and $b variables.
If your file is named fname then the following should work:
paste <(sed 's/-//' fname) fname | sort -n | cut -f 2
The sed strips the - to produce the absolute value, paste puts that absolute value in front of the original number as a first column, and sort -n sorts by it. The added column is then cut away.
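The same decorate, sort, undecorate pattern can also be written with awk computing the absolute value, which avoids process substitution. This is a sketch, not one of the answers above; the variable name abs is just a label (awk has no built-in abs function), and the input is the example from the question.

```bash
numbers='7
0
5
10
-2'

# Prepend the absolute value and a tab, sort numerically on it,
# then keep only the original number (the second tab-separated field).
sorted=$(printf '%s\n' "$numbers" |
  awk '{ abs = ($1 < 0) ? -$1 : $1; printf "%s\t%s\n", abs, $0 }' |
  sort -n | cut -f2)
printf '%s\n' "$sorted"
```

If two numbers have the same absolute value, sort falls back to comparing the whole line, so their relative order is decided by that comparison.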

Using awk to get the maximum value of a column, for each unique value of another column

So I have a file such as:
10 1 abc
10 2 def
10 3 ghi
20 4 elm
20 5 nop
20 6 qrs
30 3 tuv
I would like to get the maximum value of the second column for each value of the first column, i.e.:
10 3 ghi
20 6 qrs
30 3 tuv
How can I do this using awk or similar Unix commands?
You can use awk:
awk '$2>max[$1]{max[$1]=$2; row[$1]=$0} END{for (i in row) print row[i]}' file
Output:
10 3 ghi
20 6 qrs
30 3 tuv
Explanation:
The awk command uses an associative array max with $1 as the key and $2 as the value. Every time $2 is greater than the value currently stored in max for that key (an unset entry compares as 0), we update the entry and store the whole row in another associative array, row, under the same key. Finally, in the END section, we iterate over row and print each stored line. The iteration order of for (i in row) is unspecified, so pipe the output through sort if the order matters.
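Here is a sketch of the approach on the sample file, with two adjustments that are assumptions of this example rather than part of the answer: the ($1 in max) test lets the first row of each group be stored even when its value is 0 or negative, and a final sort on the first column makes the output order predictable.

```bash
data='10 1 abc
10 2 def
10 3 ghi
20 4 elm
20 5 nop
20 6 qrs
30 3 tuv'

# Store a row when its key is new or its second column beats the
# stored maximum; $2 + 0 forces a numeric comparison.
result=$(printf '%s\n' "$data" |
  awk '!($1 in max) || $2 + 0 > max[$1] { max[$1] = $2 + 0; row[$1] = $0 }
       END { for (k in row) print row[k] }' |
  sort -k1,1n)
printf '%s\n' "$result"
```

When two rows of a group share the maximum, the first one seen is kept, because the comparison is strict.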
A shorter alternative with sort:
$ sort -k1,1 -k2,2nr file | sort -u -k1,1
10 3 ghi
20 6 qrs
30 3 tuv
Sort by field one, then by field two (numeric, reverse), so that the maximum for each key is at the top of its group; the second sort, with -u, then keeps the first line for each key.
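The second step relies on sort -u keeping the first line of each run of equal keys, which GNU sort does. As a hedged alternative for cases where that is not guaranteed, the sketch below replaces the second sort with an awk filter that prints a line only the first time its key appears.

```bash
data='10 1 abc
10 2 def
10 3 ghi
20 4 elm
20 5 nop
20 6 qrs
30 3 tuv'

# After the sort, the maximum of each group is its first line;
# seen[$1]++ is 0 (false) only on the first occurrence of a key.
result=$(printf '%s\n' "$data" | sort -k1,1 -k2,2nr | awk '!seen[$1]++')
printf '%s\n' "$result"
```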
