How to sort multiple columns in a CSV file - bash

I have a file in the following format. I need to sort it by largest code first, then by largest value.
colour,letter,code,value
red,r,016,949.8
red,r,015,603.9
red,r,014,348.4
blue,b,016,362.29
blue,b,015,460.2
blue,b,014,9850.9
Expected output:
red,r,016,949.8
blue,b,016,362.29
red,r,015,603.9
blue,b,015,460.2
blue,b,014,9850.9
red,r,014,348.4
My implementation:
sort -k3,3n -r -k4,4n -t \t data.csv
When I try this, the file gets sorted, but not in the order I expect for those columns.

It's not clear whether the file is TSV (tab-separated) or CSV (comma-separated). The question indicates CSV, but the attempt uses a tab delimiter (-t \t). Try -t, for CSV. Also, the reverse order needs to be applied to each key (an 'r' suffix on each key):
sort -k3,3nr -k4,4nr -t, data.csv
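A minimal end-to-end check, using the file name data.csv from the question. Note the sample data includes a header line, which a plain sort would mix into the data; splitting it off with head/tail is an assumption about the desired handling, not something the question specifies:

```shell
# Recreate the sample file from the question.
cat > data.csv <<'EOF'
colour,letter,code,value
red,r,016,949.8
red,r,015,603.9
red,r,014,348.4
blue,b,016,362.29
blue,b,015,460.2
blue,b,014,9850.9
EOF

# Print the header unchanged, then sort the body:
# field 3 (code) descending, ties broken by field 4 (value) descending.
head -n 1 data.csv
tail -n +2 data.csv | sort -t, -k3,3nr -k4,4nr
```

This reproduces the expected output from the question, with the header on top.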


how to check if a file is sorted on nth column in unix?

Let's say that I have a file as below (comma-separated):
cat test.csv
Rohit,India
Rahul,India
Surya Kumar,India
Shreyas Iyer,India
Ravindra Jadeja India
Rishabh Pant India
zzabc,abc
Now I want to check whether the above file is sorted on the 2nd column.
I tried the command sort -ct"," -k2,2 test.csv
I'm expecting it to report disorder on the last line, but it reports disorder on the 2nd line.
Could anybody tell me what is wrong here, and how to get the expected output?
sort is not guaranteed to be stable, but some implementations support an option that forces it. Without it, lines whose keys compare equal are disambiguated by a last-resort comparison of the whole line, which is why line 2 is reported. Try adding -s:
sort -sc -t, -k2,2 test.csv
Note, though, that I would expect the first out-of-order line to be Ravindra Jadeja India: that line contains no comma, so its 2nd field is the empty string, which sorts before "India".
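To see both behaviours side by side, here is a sketch using the exact file from the question; the line numbers in the comments assume GNU sort's disorder reporting:

```shell
# Recreate the file from the question.
cat > test.csv <<'EOF'
Rohit,India
Rahul,India
Surya Kumar,India
Shreyas Iyer,India
Ravindra Jadeja India
Rishabh Pant India
zzabc,abc
EOF

# Without -s, equal keys fall back to a whole-line comparison, so
# "Rahul,India" already counts as disorder against "Rohit,India" (line 2).
sort -c -t, -k2,2 test.csv || echo "disorder reported without -s"

# With -s, only field 2 is compared. The first real disorder is line 5:
# it has no comma, so its 2nd field is empty and sorts before "India".
sort -sc -t, -k2,2 test.csv || echo "disorder reported with -s"
```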

Sorting file contents numerically by field

I am trying to write a BASH script to sort the contents of a file numerically according to a specific field in the file.
The file is under /etc/group. All of the fields are colon-separated :. I have to sort the contents of /etc/group numerically based on the 3rd field.
Example field: daemon:*:1:root
What I'm trying so far:
#!/bin/bash
sort /etc/group -n | cut -f 3-3 -d ":" /etc/group
This is getting me really close, but it only prints out a sorted list of 3rd field values (since cut literally cuts out the rest of the line). I'm trying to keep the rest of the line but still have it sorted by the 3rd field contents.
You can use sort -t like this:
sort -t : -nk3 /etc/group
-t : tells sort to use : as the field delimiter
-nk3 tells sort to sort the data numerically on field 3
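For a self-contained check, here is the same command run against a few made-up lines in /etc/group format (group.sample and its contents are illustrative, not real system data):

```shell
# Fabricated sample in /etc/group format: name:password:gid:members
cat > group.sample <<'EOF'
daemon:*:1:root
staff:*:20:root
wheel:*:0:root
sys:*:3:root
EOF

# Sort numerically on the 3rd colon-separated field, keeping whole lines.
sort -t : -n -k3,3 group.sample
```

The lines come out ordered by GID (0, 1, 3, 20) with the rest of each line intact.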

Sort CSV file based on first column

Is there a way to sort a csv file based on the 1st column using some shell command?
I have a huge file with more than 150k lines, so I can't do it in Excel :( Is there an alternative?
sort -k1 -n -t, filename should do the trick.
-k1 sorts by column 1 (strictly, from field 1 through the end of the line; use -k1,1 to restrict the key to the first field only).
-n sorts numerically instead of lexicographically (so "11" will not come before "2,3...").
-t, sets the delimiter (what separates values in your file) to , since your file is comma-separated.
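The difference -n makes can be seen on a tiny file (nums.csv is a made-up example; LC_ALL=C pins the collation so the lexicographic result is reproducible):

```shell
printf '11,a\n2,b\n100,c\n' > nums.csv

# Lexicographic: compared character by character, so "100" < "11" < "2".
LC_ALL=C sort -t, -k1 nums.csv

# Numeric: 2 < 11 < 100.
LC_ALL=C sort -n -t, -k1 nums.csv
```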
Using csvsort.
Install csvkit if not already installed.
brew install csvkit
Sort CSV by first column.
csvsort -c 1 original.csv > sorted.csv
I don't know why the above solution was not working in my case, with data like this:
15,5
17,2
18,6
19,4
8,25
8,90
9,47
9,49
10,67
10,90
13,96
159,9
However, this command solved my problem:
sort -t"," -k1n,1 fileName
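Replaying that command on a subset of the data above (fileName as in the answer):

```shell
cat > fileName <<'EOF'
15,5
17,2
8,25
159,9
10,67
EOF

# -k1n,1 restricts the numeric sort key to the first field only.
sort -t"," -k1n,1 fileName
```

The first fields come out in the order 8, 10, 15, 17, 159.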

Sort and remove duplicates based on column

I have a text file:
$ cat text
542,8,1,418,1
542,9,1,418,1
301,34,1,689070,1
542,9,1,418,1
199,7,1,419,10
I'd like to sort the file based on the first column and remove duplicates using sort, but things are not going as expected.
Approach 1
$ sort -t, -u -b -k1n text
542,8,1,418,1
542,9,1,418,1
199,7,1,419,10
301,34,1,689070,1
It is not sorting based on the first column.
Approach 2
$ sort -t, -u -b -k1n,1n text
199,7,1,419,10
301,34,1,689070,1
542,8,1,418,1
It removes the 542,9,1,418,1 line but I'd like to keep one copy.
It seems that the first approach removes duplicates but doesn't sort correctly, whereas the second sorts correctly but removes more than I want. How do I get the correct result?
The problem is that when you provide a key, sort looks for unique occurrences of that particular field only. Since the line 542,8,1,418,1 comes first, sort treats the next two lines starting with 542 as duplicates of it and filters them out.
Your best bet would be to either sort all columns:
sort -t, -nk1,1 -nk2,2 -nk3,3 -nk4,4 -nk5,5 -u text
or
use awk to filter duplicate lines and pipe it to sort.
awk '!_[$0]++' text | sort -t, -nk1,1
When sorting on a key, you must provide where the key ends as well; otherwise sort extends the key to the end of the line. The following sorts correctly on the first field:
sort -t, -k1,1n text
Note, however, that combining -u with a key restricted to field 1 still deduplicates on that key alone, so it would again drop 542,9,1,418,1; to keep distinct lines that share a first field, use the awk approach above.
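A quick check of the awk-then-sort approach on the question's data (the file name text is taken from the question):

```shell
cat > text <<'EOF'
542,8,1,418,1
542,9,1,418,1
301,34,1,689070,1
542,9,1,418,1
199,7,1,419,10
EOF

# awk drops exact duplicate lines (first occurrence wins),
# then sort orders the survivors numerically on field 1 only.
awk '!seen[$0]++' text | sort -t, -nk1,1
```

Four lines survive: the repeated 542,9,1,418,1 is collapsed to one copy, while both distinct 542 lines are kept.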

Sorting a CSV file from greatest to least based on a number appearing in a column

I have a CSV file like this:
bear,1
fish,20
tiger,4
I need to sort it from greatest to least number, based on what is found in the second column, e.g.:
fish,20
tiger,4
bear,1
How can the file be sorted in this way?
sort -t, -k2,2 -n -r filename
will do what you want.
-t, specifies the field separator to be a comma
-k2,2 restricts the sort key to field 2 (-k+2 is obsolete syntax for roughly the same thing)
-r specifies a reverse sort
-n specifies a numeric sort
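On the question's data this looks like the following (using the modern -k2,2 key syntax; animals.csv is an assumed file name):

```shell
cat > animals.csv <<'EOF'
bear,1
fish,20
tiger,4
EOF

# Numeric, reverse (descending) sort on the second comma-separated field.
sort -t, -k2,2 -n -r animals.csv
```

This yields fish,20 then tiger,4 then bear,1, as requested.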
