Cygwin CSV sort data by column - shell

UPDATE 2:
OK, I can't get this to work properly; it was working on one file but not another:
C:\cygwin\bin\sort -t"," -k5,2 c:\tmp\CI-tosplit.csv > c:\tmp\CI-DEN-sorted.csv
This seems to sort the data, but it's ignoring the header. I thought the ,2 meant start at line 2, which it does on one file but not another.
All I am trying to do is sort a csv file by column 5 and keep the header.
Thanks again for all the input.
UPDATED:
OK I've now switched to cygwin for this and I'm using the following command:
C:\cygwin\bin>sort -t"," -k8 c:\tmp\test.csv > c:\tmp\test-sorted.csv
-t to set the delimiter
-k for the column number
This works, but I cannot get the header to stay in place.
Any input would be great, thanks guys.
I am trying to sort a CSV by a specified column using awk, but I cannot find anything that works.
sort -t, -k2 - u test.csv
Input file specified two times.
Please help, I am using Windows BTW.

It looks like you are using sort rather than awk. sort has a -u option (no space) for unique, and the -t option should be given a value (the separator). Try:
sort -t, -k2 -u test.csv
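As for keeping the header row in place (mentioned in the updates above): as far as I know sort itself has no header option, so the usual workaround is to copy the first line untouched and sort only the rest. A minimal sketch, using the same test.csv name from the question and writing to test-sorted.csv:
head -n 1 test.csv > test-sorted.csv
tail -n +2 test.csv | sort -t, -k2 -u >> test-sorted.csv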

If anybody else finds this useful: I found the easiest way to sort CSV files by column was using Cygwin. The following works for me:
sort -t"," -k5,2 c:\tmp\test.csv > c:\tmp\test-sorted.csv
-t sets the delimiter
-k column, 2 for line 2 if you have a header
input file
output file
Thanks for all the input guys.
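For what it's worth (hedged, going by the sort man page): in -kM,N the second number is the field where the key ends, not a line number, so a key restricted to column 5 alone would be written as:
sort -t"," -k5,5 c:\tmp\test.csv > c:\tmp\test-sorted.csv
To keep a header line in place reliably, the head/tail split shown above is the usual trick.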

Related

How to check if a file is sorted on the nth column in Unix?

Let's say that I have a file as below (comma separated):
cat test.csv
Rohit,India
Rahul,India
Surya Kumar,India
Shreyas Iyer,India
Ravindra Jadeja India
Rishabh Pant India
zzabc,abc
Now I want to check if the above file is sorted on the 2nd column.
I tried the command sort -ct"," -k2,2 test.csv
I'm expecting it to say disorder in the last line, but it is giving me disorder in the 2nd line.
Could anybody tell me what is wrong here, and how to get the expected output?
sort is not guaranteed to be stable, but some implementations support an option that forces stability. Try adding -s:
sort -sc -t, -k2,2 test.csv
but note that I would expect the first out-of-order line to be Ravindra Jadeja India, since the 2nd field of that line is the empty string, which should sort before "India".
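If you want to script around this, the exit status of -c is the easy hook. A small sketch, assuming GNU sort (exit 0 means the file is ordered on the given key, non-zero means disorder was found):
sort -s -c -t, -k2,2 test.csv && echo "sorted on column 2" || echo "not sorted on column 2"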

Unix custom sorting

Hi, I want to sort my file output using a custom sort. Please help me.
Input file:
USA|123|Pin
UK|1243|Pin
Australia|555|Pin
Germany|1|Pin
Singapore|65|Pin
Germany|10|Pin
Here I want to show the rows containing Germany in the first position, and the rest of the rows in the same order as in the file.
Output >>
Germany|1|Pin
Germany|10|Pin
USA|123|Pin
UK|1243|Pin
Australia|555|Pin
Singapore|65|Pin
Easy enough with two passes through the file.
grep '^Germany|' originalfile >newfile
grep -v '^Germany|' originalfile >>newfile
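If a single pass is preferred, the same result can be had with awk instead of grep (a sketch, using the same placeholder file names as above):
awk -F'|' '$1 == "Germany" {print; next} {rest[++n] = $0} END {for (i = 1; i <= n; i++) print rest[i]}' originalfile > newfile
Germany lines are printed as they are seen; every other line is buffered and printed afterwards in its original order.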

Sort CSV file based on first column

Is there a way to sort a csv file based on the 1st column using some shell command?
I have this huge file with more than 150k lines, hence I can't do it in Excel :( Is there an alternate way?
sort -k1 -n -t, filename should do the trick.
-k1 sorts by column 1.
-n sorts numerically instead of lexicographically (so "11" will not come before "2", "3", etc.).
-t, sets the delimiter (what separates values in your file) to , since your file is comma-separated.
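To see concretely what -n changes, here is a tiny made-up example (the values are invented for illustration):
$ printf '11,a\n2,b\n' | sort -t, -k1,1
11,a
2,b
$ printf '11,a\n2,b\n' | sort -t, -k1,1n
2,b
11,a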
Using csvsort.
Install csvkit if not already installed.
brew install csvkit
Sort CSV by first column.
csvsort -c 1 original.csv > sorted.csv
I don't know why the above solution was not working in my case:
15,5
17,2
18,6
19,4
8,25
8,90
9,47
9,49
10,67
10,90
13,96
159,9
However, this command solved my problem:
sort -t"," -k1n,1 fileName

Remove column from a csv file directly from mac terminal?

How would one remove the 3rd column, for example, from a CSV file directly from the command line of the Mac terminal? I understand that
cut -d',' -f3 data.csv
extracts the column info out directly into the terminal, but I want the 3rd column to be entirely removed from the dataset. How can I do this via the terminal?
Try
cut -d',' -f1-2,4- data.csv
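Note that cut only writes to standard output, so actually removing the column from the dataset means writing to a new file. A minimal sketch (data-trimmed.csv is just a placeholder name, and this assumes no quoted fields containing commas):
cut -d',' -f1-2,4- data.csv > data-trimmed.csv
mv data-trimmed.csv data.csv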
All the examples seem a bit tricky when trying to delete multiple fields or when you only want to see a few columns, so I simply show only the columns I want. If I only want columns 1, 2, and 5, I'd do this:
cut -d, -f 1,2,5 hugeData.csv
NB: -d sets whatever separator is used in the file. In the example above it is a comma (,).
My grep/regex is rusty, but if the number of columns is fixed, then you can simply make a grep statement for each quote pair (and its contents), and then replace with all but the third pair. That's a clunky way of doing it, but it should nonetheless get the job done.
man grep
and page down to the REGULAR EXPRESSIONS section for help with how to specify the pattern.

Sort and remove duplicates based on column

I have a text file:
$ cat text
542,8,1,418,1
542,9,1,418,1
301,34,1,689070,1
542,9,1,418,1
199,7,1,419,10
I'd like to sort the file based on the first column and remove duplicates using sort, but things are not going as expected.
Approach 1
$ sort -t, -u -b -k1n text
542,8,1,418,1
542,9,1,418,1
199,7,1,419,10
301,34,1,689070,1
It is not sorting based on the first column.
Approach 2
$ sort -t, -u -b -k1n,1n text
199,7,1,419,10
301,34,1,689070,1
542,8,1,418,1
It removes the 542,9,1,418,1 line but I'd like to keep one copy.
It seems that the first approach removes duplicates but does not sort correctly, whereas the second one sorts correctly but removes more than I want. How should I get the correct result?
The problem is that when you provide a key to sort, unique occurrences are looked for in that particular field only. Since the line 542,8,1,418,1 is displayed, sort sees the next two lines starting with 542 as duplicates and filters them out.
Your best bet would be to either sort all columns:
sort -t, -nk1,1 -nk2,2 -nk3,3 -nk4,4 -nk5,5 -u text
or
use awk to filter duplicate lines and pipe it to sort.
awk '!_[$0]++' text | sort -t, -nk1,1
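For the sample file above, that pipeline should produce (sketching the expected output):
$ awk '!_[$0]++' text | sort -t, -nk1,1
199,7,1,419,10
301,34,1,689070,1
542,8,1,418,1
542,9,1,418,1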
When sorting on a key, you must provide the end of the key as well; otherwise sort extends the key to the end of the line.
The following should work:
sort -t, -u -k1,1n text
