Linux sort by column and in reverse order - shell

I'm trying to sort a file by second column, but in reverse order.
I tried:
sort -k2n -r file.txt
The output is not in reverse order, so it seems -r is being ignored.
I'm in CentOS.

Try adding a space after -k and before the column position, for example:
sort -k 2n -r file.txt

I just needed to remove the "n" next to column number:
sort -k 2 -r file.txt

Say we have this.txt
one 1
two 2
three 3
four 4
five 5
Now simply do
$ sort -k2,2nr this.txt
five 5
four 4
three 3
two 2
one 1
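A likely explanation for the ignored -r: in GNU sort (and in the POSIX description of sort), global ordering options such as -r apply only to keys that carry no ordering options of their own. In -k2n the key already has its own n, so the global -r is not applied to it. The sketch below reuses the sample data above and assumes GNU sort; the helper name sample is made up for the illustration.

```shell
# Sample data from the answer above, produced by a helper so nothing is written to disk.
sample() {
    printf '%s\n' 'one 1' 'two 2' 'three 3' 'four 4' 'five 5'
}

# -k2n carries its own ordering option (n), so the global -r does not apply
# to that key and the numbers still come out ascending.
ignored=$(sample | LC_ALL=C sort -k2n -r)

# Putting r on the key itself reverses the numeric comparison.
reversed=$(sample | LC_ALL=C sort -k2,2nr)

printf 'with -k2n -r:\n%s\n\nwith -k2,2nr:\n%s\n' "$ignored" "$reversed"
```

Writing the options globally with no modifier on the key (sort -k2,2 -n -r) should behave like -k2,2nr, because a key without modifiers inherits the global options.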

Related

Why does the shell sort return weird characters?

I use the following command in order to sort the numerical key-value pairs in the input file. Moreover, I need only a single value for each key. In case there are more values for the same key, I intend to select the minimal one.
Input:
2 20
1 10
2 19
Output:
1 10
2 19
I use this shell command:
sort -n -k1 -k2 $MYFILE | sort -n -u -k1
Everything works fine for small inputs (hundreds of pairs). I tried generating a ~3GB file to measure the time required for the sorting, but I was disappointed to find that the end of the output looked like this:
%T3�����P����
�6">�<�_!r�=_G�A������O<Ce۱��؉l6���3�$a8�����(_ē����7*���&���x���q&�n�PK����h�>�o�a��t�����,o�^��m��l�192�,����N)�$�)� *i�7�-������k�i���P�W�G
W��㛼�C��E���Ә3�)L
�i�����Q�X����/-S�9�
!�Y��EJ<�.�Q�SwMj��"�rÍI�f�y-P�ؚ;Yz
Where is the problem? Is the input too large for the sort command to process? Or maybe the pipe is the problem?

Frequency count of particular field appended to line without deleting duplicates

Trying to work out how to get a frequency appended or prepended to each line in a file WITHOUT deleting duplicate occurrences (which uniq can do for me).
So, if input file is:
mango
mango
banana
apple
watermelon
banana
I need output:
mango 2
mango 2
banana 2
apple 1
watermelon 1
banana 2
All the solutions I have seen delete the duplicates. In other words, what I DON'T want is:
mango 2
banana 2
apple 1
watermelon 1
Basically you cannot do it in one pass without keeping everything in memory. If this is what you want to do, then use python/perl/awk/whatever. The algorithm is quite simple.
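One way to sketch that algorithm in awk is to read the file twice: the first pass counts each distinct line, the second prints every line with its count. Only the distinct lines and their counters are held in memory. The function name append_counts is hypothetical, and the approach assumes the input is a regular file rather than a pipe.

```shell
# append_counts FILE: print every line of FILE followed by how often that line occurs.
# (Hypothetical helper; FILE is read twice, so it must be a regular file.)
append_counts() {
    awk '
        NR == FNR { count[$0]++; next }   # first pass: tally each distinct line
        { print $0, count[$0] }           # second pass: emit lines in original order
    ' "$1" "$1"
}

# Try it on the sample from the question.
tmp=$(mktemp)
printf '%s\n' mango mango banana apple watermelon banana > "$tmp"
result=$(append_counts "$tmp")
rm -f "$tmp"
printf '%s\n' "$result"
```

Because the original order is never disturbed, no line numbers or re-sorting are needed with this variant.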
Let's do it with standard Unix tools. This is a bit cumbersome and can be improved but should do the work:
$ sort input | uniq -c > input.count
$ nl input | sort -k 2 > input.line
$ join -1 2 -2 2 input.line input.count | sort -n -k 2,2 | awk '{print $1 " " $3}'
The first step counts the number of occurrences of each word.
As you said, uniq -c neither keeps the duplicate lines nor preserves the line order, so we have to restore both. The second step prepends a line number to each line, which we use later to fix the ordering.
In the last step, we join the two temporary files on the original word. The second column of the joined output contains the original line number, so we sort numerically on this key and strip it from the final output.

bash: different sort output on files with identical first column

Sorry for the vague title, I couldn't think of a better one...
I have 2 tab-delimited files with identical first columns (different numbers of total columns). I would like to sort both files by their first column.
I think I could do this either with the -t\t option or the -k1,12 option (since the first column is never longer than 12 characters). Both options produce the same (wrong) output.
Even though both files have the same first column, they are sorted differently. Notice that for file1 I get 23, 29, 2, while for file2 I get 2, 23, 29.
$ head file1 | sort -t\t | cut -f1
rs1000000
rs10000010
rs10000012
rs10000013
rs10000017
rs10000023
rs10000029
rs1000002
rs10000030
$ head file2 | sort -t\t | cut -f1
rs1000000
rs10000010
rs10000012
rs10000013
rs10000017
rs1000002
rs10000023
rs10000029
rs10000030
How can I sort both files such that the first column is in the same order in each?
Thank you!
sort -t $'\t' -k 1,1
Use $'\t' to have the shell interpret \t as a tab since sort doesn't parse escape sequences. Use -k to tell it to only sort on the first field rather than the entire line.
You might also want the -V flag if you want 2 to sort in between 0 and 10.

How to sort primary column with alphabet order then secondary column with numeric order?

Assuming there is a text file:
10 A QAZ
5 A EDC
14 B RFV
3 A WSX
7 B TGB
I want to sort it with the second column as the main column with alphabet order and the first column as the secondary column with numeric order. The following is the expected result:
3 A WSX
5 A EDC
10 A QAZ
7 B TGB
14 B RFV
I tried sort -k 2,2 -k 1,1 a.txt -n and sort -k 2,2 -k 1,1 a.txt but both give the wrong results. How should I solve this problem? Thanks.
This should work:
sort -b -k2,2 -k1,1n
The -b matters when columns are padded with a varying number of blanks: without it, the leading blanks count as part of the second field, so sort compares the wrong characters. See man sort for details.
Also check your locale, since it can influence how sort orders characters.
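A small check of the command above against the question's data, assuming GNU sort. The n modifier is attached to the first-column key only, so the letters are compared as text and the numbers as numbers. For contrast, the sketch also runs the asker's first attempt, where the global -n applies to both keys and the letters all compare as zero. The helper name rows is made up.

```shell
# The rows from the question.
rows() {
    printf '%s\n' '10 A QAZ' '5 A EDC' '14 B RFV' '3 A WSX' '7 B TGB'
}

# Text comparison on column 2, then numeric comparison on column 1.
sorted=$(rows | LC_ALL=C sort -b -k2,2 -k1,1n)

# The first attempt from the question: a global -n makes both keys numeric,
# so column 2 never separates A from B.
attempt=$(rows | LC_ALL=C sort -k 2,2 -k 1,1 -n)

printf 'per-key n:\n%s\n\nglobal -n:\n%s\n' "$sorted" "$attempt"
```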
This might work for you:
sort -k1.5,1.8 -k1.1,1.4n file

Sorting data based on second column of a file

I have a file of 2 columns and n number of rows.
column1 contains names and column2 age.
I want to sort the content of this file in ascending order based on the age (in second column).
The result should display the name of the youngest person along with their age, then the second youngest person, and so on.
Any suggestions for a one-liner shell or bash script?
You can use the key option of the sort command, which takes a "field number", so if you wanted the second column:
sort -k2 -n yourfile
-n, --numeric-sort compare according to string numerical value
For example:
$ cat ages.txt
Bob 12
Jane 48
Mark 3
Tashi 54
$ sort -k2 -n ages.txt
Mark 3
Bob 12
Jane 48
Tashi 54
Solution:
sort -k 2 -n filename
more verbosely written as:
sort --key 2 --numeric-sort filename
Example:
$ cat filename
A 12
B 48
C 3
$ sort --key 2 --numeric-sort filename
C 3
A 12
B 48
Explanation:
-k # - this argument specifies the first column that will be used to sort. (note that column here is defined as a whitespace delimited field; the argument -k5 will sort starting with the fifth field in each line, not the fifth character in each line)
-n - this option specifies a "numeric sort", meaning that the column is compared as numbers instead of text.
More:
Other common options include:
-r - this option reverses the sorting order. It can also be written as --reverse.
-i - This option ignores non-printable characters. It can also be written as --ignore-nonprinting.
-b - This option ignores leading blank spaces, which is handy as white spaces are used to determine where each field (column) begins. It can also be written as --ignore-leading-blanks.
-f - This option ignores letter case. "A"=="a". It can also be written as --ignore-case.
-t [new separator] - This option makes sort split fields on a separator other than whitespace. It can also be written as --field-separator.
There are other options, but these are the most common and helpful ones, that I use often.
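The note on -k above says the key starts at the given field. It also runs to the end of the line unless an end field is given, which is why several answers write -k2,2. A minimal sketch of the difference, using two made-up lines and assuming GNU sort in the C locale:

```shell
# Two lines that tie on field 2 but differ in fields 1 and 3.
lines() { printf '%s\n' 'a 1 z' 'b 1 y'; }

# -k2: the key is field 2 through end of line (" 1 z" vs " 1 y"),
# so the third column decides and "b 1 y" comes first.
open_key=$(lines | LC_ALL=C sort -k2)

# -k2,2: the key is field 2 only, the keys tie, and sort falls back to
# comparing whole lines, so "a 1 z" comes first.
closed_key=$(lines | LC_ALL=C sort -k2,2)

printf 'sort -k2:\n%s\n\nsort -k2,2:\n%s\n' "$open_key" "$closed_key"
```

With -n the two spellings usually give the same result, since only the leading number of the key is read, but stating the end field makes the intent explicit.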
For tab separated values the code below can be used
sort -t$'\t' -k2 -n
-r can be used for getting data in descending order.
-n for numerical sort
-k, --key=POS1[,POS2] where k is column in file
For descending order below is the code
sort -t$'\t' -k2 -rn
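To see why the tab separator matters, here is a small sketch with invented tab-separated data in which one name contains a space. It assumes GNU sort; the helper name people and the sample rows are hypothetical.

```shell
tab=$(printf '\t')   # same character that bash spells $'\t'

# Name<TAB>age; one name contains a space.
people() { printf 'Bob\t12\nMary Ann\t48\nMark\t3\n'; }

# With -t set to a tab, field 2 is always the age.
oldest_first=$(people | LC_ALL=C sort -t "$tab" -k2,2nr)

# Without -t, fields are split on any blank, so "Ann" becomes field 2
# of the second row and compares as 0.
default_split=$(people | LC_ALL=C sort -k2 -rn)

printf 'tab separator:\n%s\n\ndefault separator:\n%s\n' "$oldest_first" "$default_split"
```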
Use sort.
sort ... -k 2,2 ...
Simply
$ sort -k2,2n <<<"$data"
