Sort lines by group and column - bash

I have a CSV file with semicolon-separated fields, containing lines like those shown below. I need to sort it by the first and third columns, keeping together the groups of lines defined by the value of the first column.
booke;book;2
booke;booke;1
booke;bookede;6
booke;bookedes;8
booke;booker;4
booke;bookes;7
booke;booket;3
booking;booking;1
booking;bookingen;2
booking;bookingens;3
booking;bookinger;7
booking;bookingerne;5
booking;bookingernes;6
booking;bookingers;8
booking;bookings;4
Expected output:
booke;booke;1
booke;book;2
booke;booket;3
booke;booker;4
booke;bookede;6
booke;bookes;7
booke;bookedes;8
booking;booking;1
booking;bookingen;2
booking;bookingens;3
booking;bookings;4
booking;bookingerne;5
booking;bookingernes;6
booking;bookinger;7
booking;bookingers;8
I tried it with sort -t; -k3,3n -k1,1 but the output is sorted by the entire third column, ignoring the groups.

What about using two sorts in a pipeline fashion:
sort -t ';' -k 3,3n | sort -t ';' -k 1,1 -s
The -s in the second sort is necessary to enable a stable sort. Otherwise it could destroy the ordering by the third column produced by the first sort.
EDIT: however, as @BenjaminW. points out in his comment, you can use multiple -k flags; you only specified them in the wrong order. By performing a sort:
sort -t ';' -k 1,1 -k 3,3n
It takes the first column as the primary sort key and the third as the secondary key.
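A minimal sketch of the single-command form, applied to a few rows taken from the question. The function name sort_by_group_then_number is hypothetical, and LC_ALL=C is added here only to make the text comparison predictable.

```shell
# Hypothetical helper: group by field 1 (text), then order by field 3 (numeric).
sort_by_group_then_number() {
  # -t ';'  : fields are separated by semicolons
  # -k1,1   : primary key is the first field only
  # -k3,3n  : secondary key is the third field only, compared as a number
  LC_ALL=C sort -t ';' -k1,1 -k3,3n
}

# A subset of the rows from the question, deliberately shuffled.
result=$(printf '%s\n' \
  'booking;bookings;4' \
  'booke;book;2' \
  'booking;booking;1' \
  'booke;booke;1' \
  'booke;bookedes;8' | sort_by_group_then_number)

printf '%s\n' "$result"
```

Each key is limited to a single field (1,1 and 3,3), so the third column is only consulted when two lines share the same first column.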

Related

Sort by multiple conditions ascending and descending in bash

I have the following issue. I have a file containing name,surname,age,mood. I need to sort this file by age (descending). If the age is the same, then sort by surname (ascending).
I use this: cat $FILE | sort -r -n -t"," -k3,3 -k2,2 > "$HOME"/people.txt, but -r sorts both keys in descending order. How can I sort by surname ascending, please?
By default, sort orders its output ascending, and the -r flag reverses that to descending. The -r flag can also be applied to individual -k keys when you need a mix of ascending and descending order, e.g.:
$ cat raw.dat
1,2,4,5
1,2,7,5
1,2,9,5
1,2,3,5
1,3,7,5
1,1,7,5
Sort by column #3 (descending) and then column #2 (ascending):
$ sort -t"," -k3,3nr -k2,2n raw.dat
1,2,9,5
1,1,7,5
1,2,7,5
1,3,7,5
1,2,4,5
1,2,3,5
NOTES:
thanks to Ted Lyngmo for adding the n flag to properly handle numerics
if data could contain a mix of characters and numerics the n may need to be replaced depending on desired sort method (eg, V)
key takeaway is that quite a few of the sort flags can be applied at the individual -k (key) level
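Applied to the name,surname,age,mood layout from the question, the per-key flags could look like the sketch below. The sample rows and the function name sort_people are made up for illustration.

```shell
# Hypothetical helper: age descending, then surname ascending.
sort_people() {
  # -k3,3nr : third field (age), numeric, reversed for this key only
  # -k2,2   : second field (surname), plain text, ascending
  LC_ALL=C sort -t ',' -k3,3nr -k2,2
}

# Made-up sample rows in name,surname,age,mood format.
people=$(printf '%s\n' \
  'Ann,Smith,30,happy' \
  'Bob,Jones,30,sad' \
  'Cid,Adams,25,calm' \
  'Dee,Brown,41,tired' | sort_people)

printf '%s\n' "$people"
```

Because r is attached to the first key rather than given as a global -r, the surname key keeps its default ascending order.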

Sorting multiple columns in ascending order

Source:
10,10,7.17,1.077383,0.00428382
10,12,7.45,1.177068,0.00390197
10,4,6.86,1.184806,0.00489828
10,6,6.98,1.106846,0.00463645
10,8,7.09,1.106254,0.00451672
12,10,6.71,1.224453,0.00506310
12,12,6.96,1.141856,0.00446641
12,4,6.41,1.510563,0.00590838
12,6,6.51,1.187841,0.00548915
12,8,6.62,1.217152,0.00532222
Desired result
10,4,6.86,1.184806,0.00489828
10,6,6.98,1.106846,0.00463645
10,8,7.09,1.106254,0.00451672
10,10,7.17,1.077383,0.00428382
10,12,7.45,1.177068,0.00390197
12,4,6.41,1.510563,0.00590838
12,6,6.51,1.187841,0.00548915
12,8,6.62,1.217152,0.00532222
12,10,6.71,1.224453,0.00506310
12,12,6.96,1.141856,0.00446641
How do I sort the CSV by the first two columns, in ascending order, so that I get the desired result?
10,4
10,6
10,8
10,12
sort -k1,2 -n -t, didn't work as expected
10,4,6.86,1.184806,0.00489828
10,6,6.98,1.106846,0.00463645
10,8,7.09,1.106254,0.00451672
12,4,6.41,1.510563,0.00590838
12,6,6.51,1.187841,0.00548915
12,8,6.62,1.217152,0.00532222
You can see that 10,10,7.17,1.077383,0.00428382 is missing
sort -k1,1 -k2,2 -n -t, worked fine
More info : https://unix.stackexchange.com/questions/78925/how-to-sort-by-multiple-columns
To answer your question you should use this:
sort -t, -k1,1n -k2,2n yourFile.csv
The problem with your command is that -k1,2 defines a single key spanning fields 1 and 2, so the two columns are not compared as separate numbers. Writing -k1,2n does not solve this either: it still considers both fields together (e.g. 10,10 and 10,12), and the result may also depend on your locale.
If you try
LC_ALL=C sort -t, -k1,2n yourFile.csv
you will get something like:
10,10,7.17,1.077383,0.00428382
10,12,7.45,1.177068,0.00390197
10,4,6.86,1.184806,0.00489828
10,6,6.98,1.106846,0.00463645
10,8,7.09,1.106254,0.00451672
12,10,6.71,1.224453,0.00506310
12,12,6.96,1.141856,0.00446641
12,4,6.41,1.510563,0.00590838
12,6,6.51,1.187841,0.00548915
12,8,6.62,1.217152,0.00532222
(ordered by first two fields 'concatenated').
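For comparison, a small sketch of the two-key form on a few shortened rows. The function name sort_two_numeric is hypothetical, and the rows are cut down to three columns to keep the example short.

```shell
# Hypothetical helper: two separate numeric keys, one per column.
sort_two_numeric() {
  # -k1,1n : first field only, numeric
  # -k2,2n : second field only, numeric (used when the first fields tie)
  LC_ALL=C sort -t ',' -k1,1n -k2,2n
}

# Shortened rows based on the question's data.
sorted=$(printf '%s\n' \
  '10,10,7.17' \
  '10,4,6.86' \
  '12,6,6.51' \
  '10,8,7.09' \
  '12,10,6.71' | sort_two_numeric)

printf '%s\n' "$sorted"
```

With each key restricted to one field, 10,4 and 10,8 sort before 10,10, as the question wants.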

Sorting file content using String value in a certain sequence

I had a jumbled up file content as follows:
13,13,GAME_FINISH,
1,1,GAME_START,
1,1,GROUP_FINISH,
17,17,WAGER,200.00
2,2,GAME_FINISH,
2,2,GAME_START,
22,22,GAME_WIN,290.00
2,2,GROUP_FINISH,
32,32,WAGER,200.00
3,3,GAME_FINISH,
3,3,GAME_START,
.... more lines
I sorted it and currently hold the file content in following format:
1,1,GAME_FINISH,
1,1,GAME_START,
1,1,GROUP_FINISH,
1,1,WAGER,200.00
2,2,GAME_FINISH,
2,2,GAME_START,
2,2,GAME_WIN,290.00
2,2,GROUP_FINISH,
2,2,WAGER,200.00
3,3,GAME_FINISH,
3,3,GAME_START,
3,3,GROUP_FINISH,
3,3,WAGER,200.00
... more lines
But how can I sort it further to obtain the following format? The 3rd and 4th lines may not always exist.
1,1,WAGER,200.00
1,1,GAME_START,
1,1,GAME_WIN,500.00
1,1,BONUS_WIN_1,1100.00
1,1,GAME_FINISH,
1,1,GROUP_FINISH,
2,2, more lines...
For the initial sort, I used
sort -t, -g -k2 nameofunsortedfile.csv >> sortedfile.csv
Added Information:
I want to sort it in this order: wager, game start, game win, bonus win, game finish, group finish. My current sorted file is not in this order. Game win and bonus win may not always be present.
The order I am expecting is not alphabetical, but it is not random either. Every number always has a wager, game_start, game_finish, group_finish sequence; game_win and bonus_win are optional. I am looking for a way to, for example, take 1,1, sort it in the expected sequence, then move on to 2,2, do the same, and so on.
The most straightforward way to do this with standard UNIX utilities is probably to add an additional field to each line, which encodes the type of record in a way that sorts into the order you want.
declare -A mapping=( ["WAGER"]=1 ["GAME_START"]=2 ["GAME_WIN"]=3 ["BONUS_WIN"]=4 ["GAME_FINISH"]=5 ["GROUP_FINISH"]=6 )
cut -d, -f3 filename.txt | while read; do echo ${mapping["$REPLY"]}; done | paste -d, - filename.txt | sort | sort -s -t, -n -k 2,3 | cut -d, -f 2-
The declare statement declares a mapping that allows you to look up the ordering of each record type. The specific values (1, 2, etc.) don't matter as long as they sort into the order you want; you could use letters or words if you prefer.
Then the next line consists of the following commands:
cut -d, -f3 filename.txt extracts the thing you want to sort by (WAGER or whatever)
while read; do echo ${mapping["$REPLY"]}; done takes each value (WAGER etc.) and replaces it with its corresponding sortable value from the associative array mapping
paste -d, - filename.txt sticks those values back on to the start of each line from filename.txt
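sort | sort -s -t, -n -k 2,3 has the effect of sorting by field 2, then field 3, then field 1 (the one we added). The first sort orders the lines by the added field, and the second, stable sort then groups them by fields 2 and 3 without disturbing that order. Since sort accepts several -k options, the same result can also be had from a single command such as sort -t, -k2,2n -k3,3n -k1,1n.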
cut -d, -f 2- strips off the added field, leaving you with your original records, but in sorted order
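The same decorate, sort, undecorate idea can be sketched as a function that uses one sort call with three -k keys. The names rank_of and sort_events are hypothetical. A case statement stands in for the associative array, and matching BONUS_WIN* (to cover values such as BONUS_WIN_1) and ranking unknown types last are assumptions, not part of the answer above.

```shell
# Hypothetical helper: map a record type to a sortable rank.
rank_of() {
  case $1 in
    WAGER)        echo 1 ;;
    GAME_START)   echo 2 ;;
    GAME_WIN)     echo 3 ;;
    BONUS_WIN*)   echo 4 ;;  # assumption: also matches BONUS_WIN_1 and similar
    GAME_FINISH)  echo 5 ;;
    GROUP_FINISH) echo 6 ;;
    *)            echo 9 ;;  # assumption: unknown types go last
  esac
}

# Hypothetical helper: reads records on stdin, writes them in the custom order.
sort_events() {
  while IFS= read -r line; do
    type=$(printf '%s\n' "$line" | cut -d, -f3)
    # Decorate: prepend the rank as a new first field.
    printf '%s,%s\n' "$(rank_of "$type")" "$line"
  done |
    LC_ALL=C sort -t ',' -k2,2n -k3,3n -k1,1n |  # original columns 1 and 2, then rank
    cut -d, -f2-                                 # undecorate: drop the rank again
}

events=$(printf '%s\n' \
  '1,1,GROUP_FINISH,' \
  '2,2,GAME_START,' \
  '1,1,GAME_START,' \
  '1,1,WAGER,200.00' \
  '2,2,WAGER,200.00' \
  '1,1,GAME_WIN,500.00' \
  '1,1,GAME_FINISH,' | sort_events)

printf '%s\n' "$events"
```

Running cut once per line is slow on large files; it is used here only to keep the sketch close to the tools in the answer.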
Perl to the rescue:
#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };
my $i = 1;
my %order = map { $_ => $i++ }
qw( WAGER GAME_START GAME_WIN BONUS_WIN GAME_FINISH GROUP_FINISH );
chomp( my @lines = <> );
say join ',', @$_ for sort {
$a->[0] <=> $b->[0]
|| $order{ $a->[2] } <=> $order{ $b->[2] }
} map [ split /,/ ], @lines;
The sort block tells Perl to first sort by the first column, and if the values are the same, use the "order" corresponding to the third one.

Sort by an ID column and date column (MM/DD/YYYY)

I'm trying to sort a .txt file by both an ID column and a date column, but the date sort part is not working as I need it to.
Data:
|855986|03/01/1980|100|
|855986|06/01/1979|120|
|868566|01/01/1999|560|
|855986|05/01/2015|856|
|868566|09/01/2000|560|
What I need output to look like:
|855986|06/01/1979|120|
|855986|03/01/1980|100|
|855986|05/01/2015|856|
|868566|01/01/1999|560|
|868566|09/01/2000|560|
Here's my current code, which sorts the ID and month correctly, but seems to ignore the year portion of the date:
sort -t '|' -k 1 -b -k 2.7,2.10 -k 2.1,2.2 file.txt
You are pretty close. However, the date is actually field #3, because | is the first character of every line (which makes field #1 empty).
You can use:
sort -b -t '|' -k 3.7,3.10 -k 3.4,3.5 -k 3.1,3.2 file
|855986|06/01/1979|120|
|855986|03/01/1980|100|
|868566|01/01/1999|560|
|868566|09/01/2000|560|
|855986|05/01/2015|856|
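The output above is ordered by date alone. If the lines should also be grouped by ID first, as in the output the question asks for, one possible sketch adds the ID as the leading key. The function name sort_by_id_then_date is hypothetical, and the key positions assume the MM/DD/YYYY layout given in the title.

```shell
# Hypothetical helper: ID first, then date as year, month, day.
sort_by_id_then_date() {
  # Every line starts with '|', so field 1 is empty, the ID is field 2,
  # and the date is field 3.
  # Within MM/DD/YYYY: chars 7-10 = year, chars 1-2 = month, chars 4-5 = day.
  LC_ALL=C sort -t '|' -k2,2n -k3.7,3.10n -k3.1,3.2n -k3.4,3.5n
}

by_id=$(printf '%s\n' \
  '|855986|03/01/1980|100|' \
  '|855986|06/01/1979|120|' \
  '|868566|01/01/1999|560|' \
  '|855986|05/01/2015|856|' \
  '|868566|09/01/2000|560|' | sort_by_id_then_date)

printf '%s\n' "$by_id"
```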

bash sort on column but do not sort same columns

My file contains:
9827259163,0,D
9827961481,0,D
9827202228,0,A
9827529897,5,D
9827529897,0#1#5#8,A
9827700249,0#1,A
9827700249,1#2,D
9883219029,0,A
9861065312,0,A
I want to sort it by the first column only: if two records have the same first column, they should keep their original relative order and not be sorted any further.
$ sort -t, -k1,1 test
9827202228,0,A
9827259163,0,D
9827529897,0#1#5#8,A
9827529897,5,D
9827700249,0#1,A
9827700249,1#2,D
9827961481,0,D
9861065312,0,A
9883219029,0,A
but what I expect is:
9827202228,0,A
9827259163,0,D
9827529897,5,D
9827529897,0#1#5#8,A
9827700249,0#1,A
9827700249,1#2,D
9827961481,0,D
9861065312,0,A
9883219029,0,A
because there are two records each for 9827529897 and 9827700249, and those records should not be sorted any further.
Please suggest a command for the bash shell.
Add the -s option (stable sort), which keeps lines with equal keys in their input order:
sort -st, -k1,1 test
output:
9827202228,0,A
9827259163,0,D
9827529897,5,D
9827529897,0#1#5#8,A
9827700249,0#1,A
9827700249,1#2,D
9827961481,0,D
9861065312,0,A
9883219029,0,A
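A small sketch of the effect of -s, using three of the rows from the question. The function name stable_by_first is hypothetical.

```shell
# Hypothetical helper: sort on field 1 only, keeping input order for ties.
stable_by_first() {
  # -s disables the last-resort whole-line comparison that sort otherwise
  # uses to break ties between lines whose keys compare equal.
  LC_ALL=C sort -s -t ',' -k1,1
}

stable=$(printf '%s\n' \
  '9827529897,5,D' \
  '9827202228,0,A' \
  '9827529897,0#1#5#8,A' | stable_by_first)

printf '%s\n' "$stable"
```

Without -s, the two 9827529897 lines would be compared as whole lines, which puts 0#1#5#8 before 5.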
