I have a file in the format:
key1 1 <value>
key1 2 <value>
key1 3 <value>
key2 1 <value>
key2 2 <value>
key3 1 <value>
key3 2 <value>
I would like to shuffle this file by the key, but I would like the ordering of the blocks with the same key to stay the same. So an acceptable ordering would be:
key3 1 <value>
key3 2 <value>
key2 1 <value>
key2 2 <value>
key1 1 <value>
key1 2 <value>
key1 3 <value>
Is there any way to do this with sort -R?
Without knowing exactly what data is in the file or exactly how you want it sorted, it is hard to give the exact answer you are looking for. Sort has many "gotchas" that aren't always intuitive, so remember to visit the man page every now and then. However, here are some examples that may help:
sort -R -k1,1 -k2,2n -b -s
Sort by a random hash of the keys; sort column 2 numerically, overriding the random order for that key; ignore leading blanks; and disable the last-resort comparison.
sort -k1,1R -k2,2n
Sort column 1 by a random hash of the keys, then sort column 2 numerically.
sort -k1,1R -k2,2rn
Sort column 1 by a random hash of the keys, then reverse-sort column 2 numerically.
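If the lines inside each block should stay in their input order rather than be re-sorted by column 2, one option is to combine the random key with a stable sort. This is a minimal sketch, assuming GNU sort (both -R and -s are GNU extensions); the function name shuffle_blocks and the sample values are made up for illustration.

```shell
# Hypothetical helper: shuffle blocks of lines by their first field.
# -k1,1R hashes only field 1, so equal keys hash alike and stay together;
# -s disables the last-resort comparison, so lines within a block keep
# their input order. Assumes GNU sort.
shuffle_blocks() {
    sort -s -k1,1R "$@"
}

printf '%s\n' 'key1 1 a' 'key1 2 b' 'key1 3 c' \
              'key2 1 d' 'key2 2 e' \
              'key3 1 f' 'key3 2 g' | shuffle_blocks
```

The block order changes from run to run, but each block should come out contiguous and in its original internal order.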
I have a very simple text file of 3 fields, each separated by a space, like the following:
123 15 0
123 14 0
345 12 0
345 11 0
And I issued a sort command to sort by the first column: sort -k 1 myfile. But it does not sort just by the first column. It sorts by the whole line and I get the following result:
123 14 0
123 15 0
345 11 0
345 12 0
Is there anything wrong with my command or file?
You need to use:
sort -k 1,1 -s myfile
if you want to sort only on the first field. This syntax specifies the start and end field for sorting. sort -k 1 means to sort starting with the first field through to the end of the line. To ensure the lines are kept in the same order with respect to the input where the sort key is the same, you need to use a stable sort with the -s flag (GNU).
See this from the sort(1) man page:
KEYDEF is F[.C][OPTS][,F[.C][OPTS]] for start and stop position, where
F is a field number and C a character position in the field; both are
origin 1, and the stop position defaults to the line's end.
and the info page:
The --stable (-s) option disables this last-resort comparison so that
lines in which all fields compare equal are left in their original relative
order.
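To see the difference on the data from the question, the two invocations can be run side by side. This is a small sketch that assumes GNU sort for -s; the variable names are only for illustration, and LC_ALL=C is set so the comparison is plain byte order.

```shell
# Sample data from the question.
data='123 15 0
123 14 0
345 12 0
345 11 0'

# Key runs from field 1 to the end of the line, so the second field
# takes part in the comparison and 14 moves ahead of 15.
whole_line=$(printf '%s\n' "$data" | LC_ALL=C sort -k 1)

# Key is field 1 only; -s (GNU) leaves lines with equal keys in input order.
first_field=$(printf '%s\n' "$data" | LC_ALL=C sort -s -k 1,1)

printf '%s\n' "$whole_line"
echo ---
printf '%s\n' "$first_field"
```

With this input the second command prints the lines in their original order, because the first field is already sorted and ties are left alone.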
I have a file containing three TAB-separated columns. The 1st column is a number, the second is a sequence of 8 characters followed by 1-3 digits, the 3rd is the same as the 2nd column. Here's a minimum reproducible example:
1 abceefgh10 abceefgh22
1 abceefgh10 abceefgh9
1 abceefgh11 abceefgh10
1 abceefgh13 abceefgh11
1 abceefgh14 abceefgh13
1 abceefgh15 abceefgh14
1 abceefgh17 abceefgh16
-1 abceefgh18 abceefgh17
1 abceefgh19 abceefgh18
-1 abceefgh1 abceefgh2
-1 abceefgh20 abceefgh12
1 abceefgh21 abceefgh19
1 abceefgh22 abceefgh20
-1 abceefgh23 abceefgh21
1 abceefgh24 abceefgh24
1 abceefgh2 abceefgh1
1 abceefgh3 abceefgh3
1 abceefgh5 abceefgh5
1 abceefgh6 abceefgh25
1 abceefgh6 abceefgh6
1 abceefgh7 abceefgh7
-1 abceefgh8 abceefgh3
1 abceefgh9 abceefgh8
This example is what I get when I try to sort the columns with sort -gk2.9.
To the best of my knowledge I should expect to see the second column sorted from 1 to 24, and with increasing numerical value (i.e. 1,2,3,4,... and not 1,10,2,20,..., which would result if using -n).
If I cut the 2nd column and sort it with the same command (cut -f 2 ${file} | sort -gk1.9), I actually get the sorting that I want. Am I getting something obvious wrong?
Using the --debug option, you can see that the column selection does not work as expected:
1>abceefgh10>abceefgh9
^ no match for key
Specifying the separator, in accordance with Nahuel's comment, works (sort -t $'\t' --debug -gk2.9):
1>abceefgh10>abceefgh9
__
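The likely reason is that, without -t, GNU sort counts the blanks preceding a field as part of that field, so character 9 of field 2 lands on the final letter of the prefix rather than on the first digit. A minimal sketch with three rows taken from the question follows; it assumes GNU sort for -g, and it builds the tab with printf so that it also runs in shells without the $'\t' syntax.

```shell
# Portable stand-in for bash's $'\t'.
tab=$(printf '\t')

# Three rows from the question's data, tab-separated.
sample=$(printf '1\tabceefgh10\tabceefgh9\n1\tabceefgh9\tabceefgh8\n-1\tabceefgh1\tabceefgh2\n')

# With an explicit separator, field 2 starts at the "a", so position 9
# is the first digit. The ",2" stops the key at the end of field 2.
sorted=$(printf '%s\n' "$sample" | LC_ALL=C sort -t "$tab" -k 2.9,2g)

printf '%s\n' "$sorted" | cut -f 2
```

The second column should come out as abceefgh1, abceefgh9, abceefgh10.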
I always wondered how sort works when ordering multiple columns according to their numerical values. For example:
echo -e " 2 3 \n 1 2 \n 2 10" | sort -n
produces:
1 2
2 10
2 3
and so does sort -g. If I want to order numerically the second column as well, the only solution I came up with is:
echo -e " 2 3 \n 1 2 \n 2 10" | sort -k1n -k2n
which produces the desired output:
1 2
2 3
2 10
Could someone please explain this behavior and tell me whether a simpler solution exists?
The POSIX specification for sort says:
-n
Restrict the sort key to an initial numeric string, consisting of optional <blank> characters, optional minus-sign, and zero or more digits with an optional radix character and thousands separators (as defined in the current locale), which shall be sorted by arithmetic value. An empty digit string shall be treated as zero. Leading zeros and signs on zeros shall not affect ordering.
This is essentially the same as saying -k1n,1. If you want to sort by multiple columns numerically, you must say so:
sort -k1n,1 -k2n,2 …
Be cautious about omitting the 'field end' after the commas.
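Simpler, but not by much, and only valid for this particular input: sort -k2n keys on the second column alone, which here happens to give the same order as sorting on both columns: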
echo -e " 2 3 \n 1 2 \n 2 10" | sort -k2n
Output:
1 2
2 3
2 10
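The contrast between a bare -n and explicit per-column keys can be checked directly. This is a sketch only: it uses the question's input without the trailing spaces, and LC_ALL=C so that the tie-break is plain byte order; the variable names are illustrative.

```shell
# The question's rows, without the trailing spaces.
input=$(printf ' 2 3\n 1 2\n 2 10\n')

# Bare -n: only the leading number of each line is compared numerically;
# the tie between the two "2" lines falls to the last-resort byte
# comparison, which puts " 2 10" before " 2 3".
plain=$(printf '%s\n' "$input" | LC_ALL=C sort -n)

# One bounded numeric key per column.
per_col=$(printf '%s\n' "$input" | LC_ALL=C sort -k 1,1n -k 2,2n)

printf '%s\n' "$plain"
echo ---
printf '%s\n' "$per_col"
```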
List of files:
sysbench-size-256M-mode-rndrd-threads-1
sysbench-size-256M-mode-rndrd-threads-16
sysbench-size-256M-mode-rndrd-threads-4
sysbench-size-256M-mode-rndrd-threads-8
sysbench-size-256M-mode-rndrw-threads-1
sysbench-size-256M-mode-rndrw-threads-16
sysbench-size-256M-mode-rndrw-threads-4
sysbench-size-256M-mode-rndrw-threads-8
sysbench-size-256M-mode-rndwr-threads-1
sysbench-size-256M-mode-rndwr-threads-16
sysbench-size-256M-mode-rndwr-threads-4
sysbench-size-256M-mode-rndwr-threads-8
sysbench-size-256M-mode-seqrd-threads-1
sysbench-size-256M-mode-seqrd-threads-16
sysbench-size-256M-mode-seqrd-threads-4
sysbench-size-256M-mode-seqrd-threads-8
sysbench-size-256M-mode-seqwr-threads-1
sysbench-size-256M-mode-seqwr-threads-16
sysbench-size-256M-mode-seqwr-threads-4
sysbench-size-256M-mode-seqwr-threads-8
I would like to sort them by mode (rndrd, rndwr etc.) and then number:
sysbench-size-256M-mode-rndrd-threads-1
sysbench-size-256M-mode-rndrd-threads-4
sysbench-size-256M-mode-rndrd-threads-8
sysbench-size-256M-mode-rndrd-threads-16
sysbench-size-256M-mode-rndrw-threads-1
sysbench-size-256M-mode-rndrw-threads-4
sysbench-size-256M-mode-rndrw-threads-8
sysbench-size-256M-mode-rndrw-threads-16
....
I've tried the following loop, but it sorts by number alone, whereas I need a sequence like 1,4,8,16 within each mode:
$ for f in $(ls -1A); do echo $f; done | sort -t '-' -k 7n
EDIT:
Please note that a numeric sort (-n) sorts by number alone (1,1,1,1,4,4,4,4...), but I need a sequence like 1,4,8,16,1,4,8,16...
Sort by more columns:
sort -t- -k5,5 -k7n
The primary sort is by the 5th column (and not the rest of the line, hence 5,5); the secondary sort is numeric, by the 7th column.
The for loop is completely unnecessary as is the -1 argument to ls when piping its output. This yields
ls -A | sort -t- -k 5,5 -k 7,7n
where the first key begins and ends at column 5 and the second key begins and ends at column 7 and is numeric.
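As a quick check without touching the file system, the same keys can be applied to a handful of the names from the question. This is a sketch under that assumption; the names are fed in via printf instead of ls, and LC_ALL=C pins the collation of the mode field.

```shell
# A shuffled subset of the file names from the question.
names='sysbench-size-256M-mode-rndrw-threads-16
sysbench-size-256M-mode-rndrd-threads-4
sysbench-size-256M-mode-rndrd-threads-16
sysbench-size-256M-mode-rndrw-threads-1
sysbench-size-256M-mode-rndrd-threads-1'

# Field 5 (the mode) is the primary key, compared as text;
# field 7 (the thread count) is the secondary key, compared numerically.
sorted=$(printf '%s\n' "$names" | LC_ALL=C sort -t- -k 5,5 -k 7,7n)

printf '%s\n' "$sorted"
```

This should list the rndrd names with thread counts 1, 4, 16, followed by the rndrw names with 1, 16.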