Sort text file using bash sort - bash

I'm trying to sort the following file by date with earliest to latest:
$NAME DIA
# Date,Open,High,Low,Close,Volume,Adj Close
01-10-2014,169.91,169.98,167.42,167.68,11019000,167.68
29-04-2014,164.62,165.27,164.49,165.00,4581400,163.40
17-10-2013,152.11,153.59,152.05,153.48,9916600,150.26
06-09-2013,149.70,149.97,147.77,149.09,9001900,145.68
02-11-2012,132.56,132.61,130.47,130.67,5141300,125.01
01-11-2012,131.02,132.44,130.97,131.98,3807400,126.27
sort -t- -k3 -k2 -k1 DIA.txt gets the year right but scrambles the month and day.
Any help would be greatly appreciated.

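This seems to produce correct output for the sample, although with -t- the third field is the year plus the rest of the line, so -k3,3 compares more than just the year:
sort -s -t- -k3,3 -k2,2 -k1,1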
output:
$ sort -s -t- -k3,3 -k2,2 -k1,1 dia.txt
# Date,Open,High,Low,Close,Volume,Adj Close
01-11-2012,131.02,132.44,130.97,131.98,3807400,126.27
02-11-2012,132.56,132.61,130.47,130.67,5141300,125.01
06-09-2013,149.70,149.97,147.77,149.09,9001900,145.68
17-10-2013,152.11,153.59,152.05,153.48,9916600,150.26
29-04-2014,164.62,165.27,164.49,165.00,4581400,163.40
01-10-2014,169.91,169.98,167.42,167.68,11019000,167.68
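A minimal sketch of limiting the first key to the four year characters, so that month and day decide the order within a year. The rows are made up for illustration (not taken from the question): the two 2013 rows have prices in the opposite order to their dates, which is the case where a plain -k3,3 key would go wrong. The function name sort_by_date is hypothetical.

```shell
# Hypothetical sample rows: day-month-year, then a price column.
rows='17-10-2013,140.00,x
06-09-2013,150.00,x
02-11-2012,132.56,x'

sort_by_date() {
  # Key 1: characters 1-4 of field 3 (the year only).
  # Key 2: field 2 (month). Key 3: field 1 (day).
  # LC_ALL=C makes the comparison plain byte order.
  LC_ALL=C sort -t- -k3.1,3.4 -k2,2 -k1,1
}

by_date=$(printf '%s\n' "$rows" | sort_by_date)
printf '%s\n' "$by_date"
```

Because day, month and year are zero-padded, comparing them as text gives the same order as comparing them as numbers.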

I would try changing the date format first.
sed -r "s/(..)-(..)-(....)/\\3-\\2-\\1/" DIA.txt | sort
You can also change it back after sorting the lines.
sed -r "s/(..)-(..)-(....)/\\3-\\2-\\1/" DIA.txt | sort | sed -r "s/(....)-(..)-(..)/\\3-\\2-\\1/"

sort's -k flag takes a start and an end position (-kSTART,END) for one key; without an end position the key runs to the end of the line. To involve more columns, repeat -k: each later key is only used to resolve ties left by the earlier ones. Here the year is the first four characters of field 3, then comes the month (field 2), then the day (field 1):
sort -t'-' -k3.1,3.4 -k2,2 -k1,1 d

Related

Sort point character first

I would like to sort a list of file names using sort.
For instance:
file.ext
file1.ext
z_file2.ext
Using sort, I get
file1.ext
file.ext
z_file2.ext
How can I make file. sort before fileXXXX. ?
As suggested in a comment, your problem is that your locale produces an odd sort order. Setting the locale to C for the sort should fix the problem:
LC_ALL=C sort
For a more precise fix, assuming you want to use locale-aware collation order but still separate the sort key at the extension, specify . as the field delimiter and use two sort keys:
sort -t. -k1,1 -k2
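As a quick check of the LC_ALL=C suggestion, here is a small sketch using the three names from the question. In the C locale sort compares raw bytes, and '.' (0x2E) comes before '1' (0x31). The variable names are only for illustration.

```shell
names='file1.ext
file.ext
z_file2.ext'

# Byte-order comparison: "file." sorts before "file1".
c_sorted=$(printf '%s\n' "$names" | LC_ALL=C sort)
printf '%s\n' "$c_sorted"
```

Setting LC_ALL only on the sort command leaves the locale of the rest of the shell session untouched.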
You have to separate the filenames from the digits, sort them accordingly and merge back
$ sed -r 's/([0-9]*)\./ &/' file | sort -k1,1 -k2n | sed 's/ //'
file.ext
file1.ext
z_file2.ext
z_file11.ext
You can use -d option
From manpage:
-d, --dictionary-order consider only blanks and alphanumeric characters
$ cat toto
file.ext
file1.ext
z_file2.ext
$ sort -d toto
file1.ext
file.ext
z_file2.ext

Sort text file with cat and sort concatenation

I got a txt file with some content looking like
stuff,stuff,2012-12-12
morestuff,morestuff,2012-09-09
evenmorestuff,yeah,2012-08-02
and I want to use cat and sort to get them reverse ordered by the date as an output on my command-line by concatenation.
Not sure why you think you need to cat a file into sort, but here are 2 options:
cat yourFile | sort -t, -k3r
sort -t, -k3r yourFile
To test this I did
echo "stuff,stuff,2012-12-12
morestuff,morestuff,2012-09-09
evenmorestuff,yeah,2012-08-02" \
| sort -t, -k3r
output
stuff,stuff,2012-12-12
morestuff,morestuff,2012-09-09
evenmorestuff,yeah,2012-08-02
And finally, you can overwrite your existing file using the -o option like
sort -t, -o yourFile -k3r yourFile
Thanks to @karakfa for reminding me of your requirement for a reverse-order sort. This is accomplished by adding an r to the key specification, hence -k3r.
IHTH
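A small sketch of the in-place variant, run against a temporary file so nothing real is overwritten. The input lines are the ones from the question, shuffled so that the effect is visible. It uses -k3,3r rather than -k3r so the key stops at the end of the third field; here the third field is the last one, so both behave the same.

```shell
tmp=$(mktemp)
printf '%s\n' \
  'evenmorestuff,yeah,2012-08-02' \
  'stuff,stuff,2012-12-12' \
  'morestuff,morestuff,2012-09-09' > "$tmp"

# -o may name the input file: sort reads all input before writing.
LC_ALL=C sort -t, -k3,3r -o "$tmp" "$tmp"

newest_first=$(cat "$tmp")
rm -f "$tmp"
printf '%s\n' "$newest_first"
```

Redirecting with > to the same file would truncate it before sort reads it, which is why -o is used here.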

Unix shell script to sort files depending on the 'date string' present in their file name

I am trying to sort files in a directory, depending on the 'date string' attached in the file name, for example files looks as below
SSA_F12_05122013.request.done
SSA_F13_12142012.request.done
SSA_F14_01062013.request.done
Here 05122013, 12142012 and 01062013 represent the dates (MMDDYYYY, judging by 12142012).
Please help me in providing a unix shell script to sort these files on the date string present in their file name(in descending and ascending order).
Thanks in advance.
Hmmm... why call on heavyweights like awk and Perl when sort itself has the capability to define what exactly to sort by?
ls SSA_F*.request.done | sort -k 1.13,1.16 -k 1.9,1.10 -k 1.11,1.12
Each -k option defines a "sort key":
-k 1.13,1.16
This defines a sort key ranging from field 1, column 13 to field 1, column 16. (A field is by default delimited by whitespace, which your filenames don't have.)
If your filenames are varying in length, defining the underscore as field separator (using the -t option) and then addressing columns in the third field would be the way to go.
Refer to man sort for details. Use the -r option to sort in descending order.
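The underscore-separator variant mentioned above could look roughly like this. It is a sketch that assumes the date is always the start of the third underscore-separated field and is in MMDDYYYY form; the name SSA_F1300_... is made up to show a file name of a different length, and sort_by_name_date is a hypothetical helper.

```shell
files='SSA_F12_05122013.request.done
SSA_F1300_12142012.request.done
SSA_F14_01062013.request.done'

sort_by_name_date() {
  # With -t_, field 3 begins at the date:
  # characters 5-8 are YYYY, characters 1-4 are MMDD.
  # Extra options such as -r are passed through.
  LC_ALL=C sort -t_ -k3.5,3.8 -k3.1,3.4 "$@"
}

ascending=$(printf '%s\n' "$files" | sort_by_name_date)
descending=$(printf '%s\n' "$files" | sort_by_name_date -r)
printf '%s\n' "$ascending"
```

In a real directory the list could come from printf '%s\n' SSA_F*.request.done instead of the literal string.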
one way with awk and sort:
ls -1|awk -F'[_.]' '{s=gensub(/^([0-9]{4})(.*)/,"\\2\\1","g",$3);print s,$0}'|sort|awk '$0=$NF'
if we break it down:
ls -1|
awk -F'[_.]' '{s=gensub(/^([0-9]{4})(.*)/,"\\2\\1","g",$3);print s,$0}'|
sort|
awk '$0=$NF'
The ls -1 is just an example; I assume you have your own way to get the file list, one name per line. Note that gensub is specific to GNU awk.
test a little bit:
kent$ echo "SSA_F13_12142012.request.done
SSA_F12_05122013.request.done
SSA_F14_01062013.request.done"|awk -F'[_.]' '{s=gensub(/^([0-9]{4})(.*)/,"\\2\\1","g",$3);print s,$0}'|
sort|
awk '$0=$NF'
SSA_F13_12142012.request.done
SSA_F14_01062013.request.done
SSA_F12_05122013.request.done
ls -lrt *.done | perl -lane '@a=split /_|\./,$F[scalar(@F)-1];$a[2]=~s/(..)(..)(....)/$3$1$2/g;print $a[2]." ".$_' | sort -rn | awk '{$1=""}1'
ls *.done | perl -pe 's/^.*_(..)(..)(....)/$3$1$2$&/' | sort -rn | cut -b9-
This would do it.

Sort command not working as expected?

I have got a dataset like this
tack2@domain.com,2009-11-27
overflow@domain2.com,2009-11-27
overflow@domain2.com,2009-11-27
When I run this command to remove the lines with duplicate entries in column 2:
sort -t ',' -k2 stars.txt -u
it deletes the entries of column 1 instead, and in order to delete the duplicate entries of column 2 I have to use the -k3 flag:
sort -t ',' -k3 stars.txt -u
Can anyone explain why this is happening? Why do I have to add 1 to the column number to remove duplicates on that column?
In my system all works correctly:
$ sort -t, -k1 -u 1.txt
overflow@domain2.com,2009-11-27
tack2@domain.com,2009-11-27
$ sort -t, -k2 -u 1.txt
tack2@domain.com,2009-11-27
It may be due to your locale.
Can you please repeat the command, but with LANG=C?
$ LANG=C sort -t, -k1 -u 1.txt
$ LANG=C sort -t, -k2 -u 1.txt
This is a typical awk job; no sorting is needed. Here is a short one-liner in case you want to give it a try:
awk -F, '!a[$2]++' file
will do the job.
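To make the two approaches concrete, here is a sketch on made-up rows modeled on the question, with one row given a different date so the result is not a single line. It uses -k2,2 so the sort key is column 2 only, and the awk array is named seen instead of a purely for readability.

```shell
data='tack2@domain.com,2009-11-27
overflow@domain2.com,2009-11-27
overflow@domain2.com,2009-11-28'

# sort: -k2,2 limits the key to column 2; -u keeps one line per distinct key.
by_sort=$(printf '%s\n' "$data" | LC_ALL=C sort -t, -k2,2 -u)

# awk: print a line only the first time its column-2 value is seen,
# which preserves the input order.
by_awk=$(printf '%s\n' "$data" | awk -F, '!seen[$2]++')

printf '%s\n' "$by_sort"
printf '%s\n' "$by_awk"
```

Both keep one line per date. Which of the equal lines sort -u keeps is best not relied on, whereas the awk version always keeps the first one it reads.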

Sorting with unix tools and multiple columns

I am looking for the easiest way to solve this problem. I have a huge data set, which I cannot load into Excel, in this format:
This is a sentence|10
This is another sentence|5
This is the last sentence|20
What I want to do is sort this from least to greatest based on the number.
cat MyDataSet.txt | tr "|" "\t" | ???
Not sure what the best way is to do this. I was thinking about using awk to switch the columns and then do a sort, but I was having trouble doing it.
Help me out please
sort -t'|' -k2,2n dataset.txt
That should do it: -t sets the field separator and -k2,2n selects the second field as a numeric key.
You usually don't need cat to send the file to a filter. That said, you can use the sort filter.
sort -t "|" -k 2 -n MyDataSet.txt
This sorts the MyDataSet.txt file using the | character as field separator and sorting numerically according to the second field (the number).
have you tried sort -n
$ sort -n inputFile
This is another sentence|5
This is a sentence|10
This is the last sentence|20
you could switch the columns with awk too
$ awk -F"|" '{print $2"|"$1}' inputFile
10|This is a sentence
5|This is another sentence
20|This is the last sentence
combining awk and sort:
$ awk -F"|" '{print $2"|"$1}' inputFile | sort -n
5|This is another sentence
10|This is a sentence
20|This is the last sentence
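Per the comments: plain sort -n compares from the start of each line, so it is not reliable here, and certainly not if the sentences themselves contain numbers. Set the separator and the key explicitly: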
$ sort -n -t"|" -k2 inputFile
This is another sentence|5
This is a sentence|10
This is the last sentence|20
this is a sentence with a number in it 2|22
and of course you could redirect it to a new file:
$ awk -F"|" '{print $2"|"$1}' inputFile | sort -n > outFile
Try this sort command:
sort -n -t '|' -k2 file.txt
Sort by number, change the separator and grab the second group using sort.
sort -n -t'|' -k2 dataset.txt
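For completeness, a small sketch of the same idea with the key limited to the second field (-k2,2n) and the n attached to the key. The third row is made up to show that digits inside the sentence do not affect the order.

```shell
data='This is a sentence|10
This is another sentence|5
Sentence 99 with digits|20'

# -t'|' splits on the pipe; -k2,2n compares only field 2, numerically.
by_number=$(printf '%s\n' "$data" | LC_ALL=C sort -t'|' -k2,2n)
printf '%s\n' "$by_number"
```

Quoting the | keeps the shell from treating it as a pipeline operator.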
