sort command (MacOS terminal) gives inconsistent results [duplicate] - bash

I have a very simple text file with three space-separated fields, like the following:
123 15 0
123 14 0
345 12 0
345 11 0
I then issued a sort command to sort by the first column: sort -k 1 myfile. However, it does not sort by the first column alone; it sorts by the whole line, and I get the following result:
123 14 0
123 15 0
345 11 0
345 12 0
Is there anything wrong with my command or file?

You need to use:
sort -k 1,1 -s myfile
if you want to sort only on the first field. This syntax specifies both the start and the end field of the sort key, whereas sort -k 1 means a key that starts at the first field and runs through to the end of the line. To keep lines with equal keys in their input order, you also need a stable sort with the -s flag (GNU); otherwise sort breaks ties by comparing the whole line.
See this from the sort(1) man page:
KEYDEF is F[.C][OPTS][,F[.C][OPTS]] for start and stop position, where
F is a field number and C a character position in the field; both are
origin 1, and the stop position defaults to the line's end.
and the info page:
The --stable (-s) option disables this last-resort comparison so that
lines in which all fields compare equal are left in their original relative
order.
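The difference between the two key definitions can be checked with a short sketch. It feeds the sample data from the question through a pipe instead of reading myfile, and it sets LC_ALL=C so that comparisons are byte-wise and reproducible; both choices are assumptions made for this illustration, not part of the original answer.

```shell
# Sample data from the question, one record per line.
data=$(printf '%s\n' '123 15 0' '123 14 0' '345 12 0' '345 11 0')

# -k 1: the key runs from field 1 to the end of the line,
# so records with the same first field are ordered by the rest of the line.
whole_line=$(printf '%s\n' "$data" | LC_ALL=C sort -k 1)

# -k 1,1 -s: the key is field 1 only, and -s keeps the input order
# of records whose keys compare equal.
first_field=$(printf '%s\n' "$data" | LC_ALL=C sort -k 1,1 -s)

printf 'sort -k 1:\n%s\n\nsort -k 1,1 -s:\n%s\n' "$whole_line" "$first_field"
```

With -k 1,1 -s the output is identical to the input here, because the input is already ordered by its first field.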

Related

Trying to find a maximum from a file in shellscript

In a shell script, I'm trying to find the maximum value across several lines. Each line has five fields, and the fifth is the value I need to compare with the others. Once I have found the maximum, I also need to print the rest of that line.
Any advice on how I could do this?
Sort numerically, by field 5, then print only the line containing the highest value:
sort -nk5,5 data.txt | tail -n 1
Try
< MYFILE sort -k5nr | head -1
< MYFILE redirects the file to the standard input of sort. -k5 says to sort on a key starting at the fifth field, n selects numeric order, and r reverses the order so the largest number comes first. Then head -1 outputs only the first line. The end result is
69.4206662, 12.3216747, 2021.08.21., 14:44, 20
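Both answers can be compared side by side in a small sketch. Only the first record below appears in the answer above; the other two records are invented for illustration, and LC_ALL=C is an added assumption to keep the numeric comparison independent of the locale.

```shell
# Hypothetical input: only the first record comes from the answer above;
# the other two are made up for this example.
data=$(printf '%s\n' \
  '69.4206662, 12.3216747, 2021.08.21., 14:44, 20' \
  '69.4206001, 12.3216002, 2021.08.21., 14:45, 7' \
  '69.4206003, 12.3216004, 2021.08.21., 14:46, 13')

# Ascending numeric sort on field 5, then keep the last line.
max_tail=$(printf '%s\n' "$data" | LC_ALL=C sort -n -k 5,5 | tail -n 1)

# Descending numeric sort on field 5, then keep the first line.
max_head=$(printf '%s\n' "$data" | LC_ALL=C sort -k 5,5nr | head -n 1)

printf '%s\n%s\n' "$max_tail" "$max_head"
```

Both commands print the record whose fifth field is 20. If several records share the maximum, the two variants may pick different ones among them.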

Bash sort using general numeric value on alphanumeric string not returning rows sorted properly

I have a file containing three TAB-separated columns. The first column is a number, the second is a sequence of 8 characters followed by 1-3 digits, and the third has the same format as the second. Here's a minimal reproducible example:
1 abceefgh10 abceefgh22
1 abceefgh10 abceefgh9
1 abceefgh11 abceefgh10
1 abceefgh13 abceefgh11
1 abceefgh14 abceefgh13
1 abceefgh15 abceefgh14
1 abceefgh17 abceefgh16
-1 abceefgh18 abceefgh17
1 abceefgh19 abceefgh18
-1 abceefgh1 abceefgh2
-1 abceefgh20 abceefgh12
1 abceefgh21 abceefgh19
1 abceefgh22 abceefgh20
-1 abceefgh23 abceefgh21
1 abceefgh24 abceefgh24
1 abceefgh2 abceefgh1
1 abceefgh3 abceefgh3
1 abceefgh5 abceefgh5
1 abceefgh6 abceefgh25
1 abceefgh6 abceefgh6
1 abceefgh7 abceefgh7
-1 abceefgh8 abceefgh3
1 abceefgh9 abceefgh8
This example is what I get when I try to sort the columns with sort -gk2.9.
To the best of my knowledge I should expect to see the second column sorted from 1 to 24, and with increasing numerical value (i.e. 1,2,3,4,... and not 1,10,2,20,..., which would result if using -n).
If I cut the 2nd column and sort it with the same command (cut -f 2 ${file} | sort -gk1.9), I actually get the sorting that I want. Am I getting something obvious wrong?
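Using the --debug option (GNU sort), you can see that the key selection does not work as expected (tabs are shown as >):
1>abceefgh10>abceefgh9
^ no match for key
Without -t, fields are split at the transition from non-blank to blank characters, so the leading tab counts as part of field 2 and character 9 of that field is the letter h rather than the first digit. Specifying the separator, in accordance with Nahuel's comment, works (sort -t $'\t' --debug -gk2.9):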
1>abceefgh10>abceefgh9
__
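The effect of the explicit separator can be reproduced with a reduced sketch. It uses three records in the shape of the question's data, builds the tab with printf so that it does not rely on bash's $'\t' quoting, and swaps in -n for -g because the suffixes are plain integers. It also ends the key at field 2 with ,2, which the original command does not do. These are illustrative choices, not the asker's exact command.

```shell
# A literal tab, built with printf so the sketch also runs in plain sh.
tab=$(printf '\t')

# Three tab-separated records in the shape of the question's data.
data=$(printf '%s\t%s\t%s\n' \
  1 abceefgh10 abceefgh22 \
  1 abceefgh2 abceefgh1 \
  -1 abceefgh1 abceefgh2)

# With -t, field 2 no longer includes the separator, so character 9 of the
# field is the first digit. The ,2 stops the key at the end of field 2.
sorted=$(printf '%s\n' "$data" | LC_ALL=C sort -t "$tab" -k 2.9,2n)

# Show only the second column of the sorted result.
col2=$(printf '%s\n' "$sorted" | cut -f 2)
printf '%s\n' "$col2"
# prints:
# abceefgh1
# abceefgh2
# abceefgh10
```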

sort multiple columns numerically

I always wondered how sort works when ordering multiple columns according to their numerical values. For example:
echo -e " 2 3 \n 1 2 \n 2 10" | sort -n
produces:
1 2
2 10
2 3
and so does sort -g. If I want to order numerically the second column as well, the only solution I came up with is:
echo -e " 2 3 \n 1 2 \n 2 10" | sort -k1n -k2n
which produces the desired output:
1 2
2 3
2 10
Could someone please explain this behavior and tell me whether a simpler solution exists?
The POSIX specification for sort says:
-n
Restrict the sort key to an initial numeric string, consisting of optional <blank> characters, optional minus-sign, and zero or more digits with an optional radix character and thousands separators (as defined in the current locale), which shall be sorted by arithmetic value. An empty digit string shall be treated as zero. Leading zeros and signs on zeros shall not affect ordering.
This is essentially the same as saying -k1n,1. If you want to sort by multiple columns numerically, you must say so:
sort -k1n,1 -k2n,2 …
Be cautious about omitting the 'field end' after the commas.
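Slightly simpler for this input, since the leading -k1n can be dropped. Note that this sorts numerically on the second column only, with ties broken by a byte-wise comparison of the whole line, so it matches the desired output here only because the order of the second column happens to agree with the order of the first: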
echo -e " 2 3 \n 1 2 \n 2 10" | sort -k2n
Output:
1 2
2 3
2 10
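The three variants discussed above can be told apart with one extra record. The sketch below uses the question's three records without their leading and trailing blanks, plus a made-up record "3 1", and sets LC_ALL=C for reproducible tie-breaking; the extra record and the locale setting are assumptions added for illustration.

```shell
# The three records from the question plus one extra record ("3 1"),
# added here to distinguish the key definitions.
data=$(printf '%s\n' '2 3' '1 2' '2 10' '3 1')

# No key: lines are compared by their leading number only; ties fall back
# to a byte-wise comparison of the whole line, so "2 10" precedes "2 3".
plain=$(printf '%s\n' "$data" | LC_ALL=C sort -n)

# Two bounded numeric keys: column 1 first, then column 2.
both=$(printf '%s\n' "$data" | LC_ALL=C sort -k 1,1n -k 2,2n)

# Column 2 only: "3 1" now sorts first, ahead of "1 2".
second=$(printf '%s\n' "$data" | LC_ALL=C sort -k 2n)

printf 'sort -n:\n%s\n\nsort -k 1,1n -k 2,2n:\n%s\n\nsort -k 2n:\n%s\n' \
  "$plain" "$both" "$second"
```

Only the form with two bounded keys orders both columns numerically for arbitrary input.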
