Unix - smallest value in column - sorting

I have a txt file which is just three columns of numbers separated by space. I need to use "sort" to display the smallest value of column 3 and only that value.
I tried
sort -k3 file.txt|head -1
but it shows the first value of all three columns.

This is what's expected. sort -k3 file.txt | head -1 says "show me the first line of output", and that line contains all three columns.
Use plain sort -k3 file.txt | head to get the first 10 lines.
What were you expecting or wanting?
In response to the comment: No worries! We're all beginners at the beginning :-)
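sort -r file.txt will sort in reverse order, and as @shellter says, sort -r -k3 file.txt | awk 'NR==1{print $3}' will print the third value on the first line. For the smallest value, drop -r and add -n so that column 3 is compared numerically: sort -n -k3 file.txt | awk 'NR==1{print $3}'.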
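A minimal sketch of the numeric variant, assuming a hypothetical three-column sample written to a temporary file (the values are made up for illustration). Without -n, sort compares the key as text, so "100" would sort before "7".

```shell
# Hypothetical sample data: three space-separated numeric columns.
data=$(mktemp)
printf '%s\n' '1 2 30' '4 5 7' '8 9 100' > "$data"

# -n compares numerically; -k3,3 limits the sort key to column 3.
# awk then prints only the third field of the first (smallest) line.
smallest=$(sort -n -k3,3 "$data" | awk 'NR==1 {print $3}')
echo "$smallest"

rm -f "$data"
```

Restricting the key with -k3,3 rather than -k3 keeps the comparison from running on to the end of the line if more columns are added later.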

Related

sort lines by the last "element" [non-csv text file]

for lines with same number of columns separated by a dot delimiter, like
aa.bb
cc.dd
...
it's easy to sort by last column
sort -t. -k2,2 file
If the lines have different numbers of "columns", like
aa.b.xb
cc.dd
xx.cc.aa
a.b.c.d.e
...
then how can I sort the lines by the last "column", to get
xx.cc.aa
cc.dd
a.b.c.d.e
aa.b.xb
...
You can make use of the Schwartzian transform in bash.
awk -F. '{print $NF "\t" $0}' file | sort -k1,1 | cut -f2-
First, extract the last column and prepend it to the line, delimited by a tab character.
Then sort the lines by the first (prepended) column.
Finally, remove the first column with the cut command.
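The three steps can be tried on the sample lines from the question. This is only a sketch: LC_ALL=C and the explicit tab separator passed to sort are additions made here so that the ordering is the same on every system.

```shell
# Decorate: prepend the last dot-separated field plus a tab.
# Sort:     order by that prepended key only (tab is the field separator).
# Undecorate: drop the key again with cut (tab is cut's default delimiter).
sorted=$(printf '%s\n' 'aa.b.xb' 'cc.dd' 'xx.cc.aa' 'a.b.c.d.e' |
  awk -F. '{print $NF "\t" $0}' |
  LC_ALL=C sort -t "$(printf '\t')" -k1,1 |
  cut -f2-)
printf '%s\n' "$sorted"
```

The keys here are aa, dd, e and xb, so the lines come out in the order shown in the question.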

Sort by length of column

Need help sorting by length of the 4th column with a Unix command.
Example data (all data is made up, and not actual).
5032:Stack:overflows#business.com:123:JamesPeterson
3200:Admin:admin#me.com:12ej3dij23i2j32:AdminAdmin
1024:GregoryJames:greg#admin.com:12329232:GregJames
Preferred output (the line with the longest 4th column comes first).
3200:Admin:admin#me.com:12ej3dij23i2j32:AdminAdmin
1024:GregoryJames:greg#admin.com:12329232:GregJames
5032:Stack:overflows#business.com:123:JamesPeterson
Use awk to add a column containing the length of the 4th column, sort by that, then remove it.
awk -F: '{printf("%d %s\n", length($4), $0)}' input.txt | sort -nr | cut -d' ' -f2- > output.txt
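A variation on the same idea, sketched on the sample lines from the question: here the length is prepended with the colon that the data already uses, so sort and cut can share one delimiter. That choice of delimiter is made for this illustration and is not part of the answer above.

```shell
# Prepend the length of field 4, sort on it numerically in descending
# order, then strip the helper field again.
sorted=$(printf '%s\n' \
  '5032:Stack:overflows#business.com:123:JamesPeterson' \
  '3200:Admin:admin#me.com:12ej3dij23i2j32:AdminAdmin' \
  '1024:GregoryJames:greg#admin.com:12329232:GregJames' |
  awk -F: -v OFS=: '{print length($4), $0}' |
  sort -t: -k1,1nr |
  cut -d: -f2-)
printf '%s\n' "$sorted"
```

The fourth fields have lengths 3, 15 and 8, so the line starting with 3200 comes first and the one starting with 5032 last.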

Sort a file in unix by the absolute value of a field

I want to sort this file by the absolute value of the Linear regression (p) column in descending order. My attempt didn't quite work, and I'm not sure why it fails. I found this code at http://www.unix.com/shell-programming-and-scripting/168144-sort-absolute-value.html.
awk -F',' '{print ($2>=0)?$2:-$2, $0}' OFS=',' mycsv1.csv | sort -n -k8,8 | cut -d ',' -f2-
X var,Y var,MIC (strength),MIC-p^2 (nonlinearity),MAS (non-monotonicity),MEV (functionality),MCN (complexity),Linear regression (p)
AT1G01030,AT1G32310,0.67958,0.4832027,0.32644996,0.63247,4.0,-0.44314474
AT1G01030,AT3G06520,0.61732,0.17639545,0.23569,0.58557,4.0,0.6640215
AT1G01030,AT5G42580,0.61579,0.5019064,0.30105,0.58143,4.0,0.33746648
AT1G01030,AT1G55280,0.57287,0.20705527,0.19536,0.52857,4.0,0.6048262
AT1G01030,AT5G30490,0.56509,0.37536618,0.16172999,0.51847,4.0,-0.43557298
AT1G01030,AT1G80040,0.56268,0.22935495,0.18583998,0.52728,4.0,-0.5773431
...
Please help me to understand the awk script to sort this file.
You could use sed and sort for this and follow @hek2mgl's very smart logic of adding and removing a field at the end to retain the original number:
sed -E 's/,([-]?)([0-9.]+)$/,\1\2,\2/' file | sort -t, -k9,9 -nr | cut -f1-8 -d,
sed -E 's/,([-]?)([0-9.]+)$/,\1\2,\2/' => creates field 9 as the absolute value of field 8
sort -t, -k9,9 -nr => sorts by the newly created field, numeric and descending order
cut -f1-8 -d, => removes the 9th field, restoring the output to its original format, with the desired sorting order
Here is the output:
AT1G01030,AT3G06520,0.61732,0.17639545,0.23569,0.58557,4.0,0.6640215
AT1G01030,AT1G55280,0.57287,0.20705527,0.19536,0.52857,4.0,0.6048262
AT1G01030,AT1G80040,0.56268,0.22935495,0.18583998,0.52728,4.0,-0.5773431
AT1G01030,AT1G32310,0.67958,0.4832027,0.32644996,0.63247,4.0,-0.44314474
AT1G01030,AT5G30490,0.56509,0.37536618,0.16172999,0.51847,4.0,-0.43557298
AT1G01030,AT5G42580,0.61579,0.5019064,0.30105,0.58143,4.0,0.33746648
Take three steps:
(1) Temporarily create a 9th field which contains the abs value of field 8:
LC_COLLATE=C awk -F, 'NR>1{v=$NF;sub(/-/,"",v);printf "%s%s%s%s",$0,FS,v,RS}' file
^ ------ make sure this is set, since sorting, especially of the decimal point, depends on the locale.
(2) Sort that output based on the 9th field:
command_1 | sort -t, -k9r
(3) Pipe that to cut to remove the last field: -d, sets the comma as the delimiter and -f1-8 keeps only the first eight fields:
command_2 | cut -d, -f1-8
Output:
AT1G01030,AT3G06520,0.61732,0.17639545,0.23569,0.58557,4.0,0.6640215
AT1G01030,AT1G55280,0.57287,0.20705527,0.19536,0.52857,4.0,0.6048262
AT1G01030,AT1G80040,0.56268,0.22935495,0.18583998,0.52728,4.0,-0.5773431
AT1G01030,AT1G32310,0.67958,0.4832027,0.32644996,0.63247,4.0,-0.44314474
AT1G01030,AT5G30490,0.56509,0.37536618,0.16172999,0.51847,4.0,-0.43557298
AT1G01030,AT5G42580,0.61579,0.5019064,0.30105,0.58143,4.0,0.33746648
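The three steps can be chained into one pipeline. The sketch below uses the header and the first three data rows from the question, written to a temporary file. It differs from the steps above in a few ways that are choices made here: the absolute value is prepended rather than appended, the header line is passed through untouched, and the sort is numeric with LC_ALL=C set on sort itself.

```shell
csv=$(mktemp)
cat > "$csv" <<'EOF'
X var,Y var,MIC (strength),MIC-p^2 (nonlinearity),MAS (non-monotonicity),MEV (functionality),MCN (complexity),Linear regression (p)
AT1G01030,AT1G32310,0.67958,0.4832027,0.32644996,0.63247,4.0,-0.44314474
AT1G01030,AT3G06520,0.61732,0.17639545,0.23569,0.58557,4.0,0.6640215
AT1G01030,AT5G42580,0.61579,0.5019064,0.30105,0.58143,4.0,0.33746648
EOF

result=$(
  # Header first, unchanged.
  head -n 1 "$csv"
  # Data rows: prepend |last field| as text (strip a leading minus),
  # sort on it numerically in descending order, then drop it again.
  tail -n +2 "$csv" |
    awk -F, -v OFS=, '{ v = $NF; sub(/^-/, "", v); print v, $0 }' |
    LC_ALL=C sort -t, -k1,1nr |
    cut -d, -f2-
)
printf '%s\n' "$result"

rm -f "$csv"
```

Stripping the minus sign as text, rather than negating the number, avoids awk reformatting the value when it prints it.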
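Could get awk to do it all (this needs GNU awk for asorti, and rows that share the same absolute value overwrite each other). The indices are sorted in ascending order, so walk them backwards to get descending order:
awk -F, 'NR>1{n[substr($NF,1,1)=="-"?substr($NF,2):$NF]=$0}NR==1;END{c=asorti(n,out);for(i=c;i>=1;i--)print n[out[i]]}' file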

Unix - Sorting file name with a key but not knowing its position

I would like to sort those files using Unix commands:
MyFile_fdfdsf_20140326.txt
MyFile_4fg5d6_20100301.csv
MyFile_dfgfdklm_19990101.tar.gz
The result I am expecting here is MyFile_fdfdsf_20140326.txt
So I'd like to get the file with the newest date.
I can't use 'sort -k', as the position of the key (the date) may vary.
But in my file names there are always two "_" delimiters and a dot '.' for the file extension.
Any help would be appreciated :)
Then use -t to indicate the field separator and set it to _:
sort -t'_' -k3
See an example of sorting the file names if they are in a file. I used -n for numeric sort and -r for reverse order:
$ sort -t'_' -nk3 file
MyFile_dfgfdklm_19990101.tar.gz
MyFile_4fg5d6_20100301.csv
MyFile_fdfdsf_20140326.txt
$ sort -t'_' -rnk3 file
MyFile_fdfdsf_20140326.txt
MyFile_4fg5d6_20100301.csv
MyFile_dfgfdklm_19990101.tar.gz
From man sort:
-t, --field-separator=SEP
use SEP instead of non-blank to blank transition
-n, --numeric-sort
compare according to string numerical value
-r, --reverse
reverse the result of comparisons
Update
Thank you for your answer. It's perfect. But out of curiosity, what if
I had an unknown number of delimiters, but the date was always after
the last "_" delimiter. MyFile_abc_def_...20140326.txt sort -t''
-nk??? file – user3464809
You can trick it a little bit: print the last field, sort and then remove it.
awk -F_ '{print $NF, $0}' a | sort | cut -d' ' -f2-
Note that cut must split on the space that awk inserted, not on "_", otherwise the "MyFile_" prefix is cut off as well.
See an example:
$ cat a
MyFile_fdfdsf_20140326.txt
MyFile_4fg5d6_20100301.csv
MyFile_dfgfdklm_19990101.tar.gz
MyFile_dfgfdklm_asdf_asdfsadfas_19940101.tar.gz
MyFile_dfgfdklm_asdf_asdfsadfas_29990101.tar.gz
$ awk -F_ '{print $NF, $0}' a | sort | cut -d' ' -f2-
MyFile_dfgfdklm_asdf_asdfsadfas_19940101.tar.gz
MyFile_dfgfdklm_19990101.tar.gz
MyFile_4fg5d6_20100301.csv
MyFile_fdfdsf_20140326.txt
MyFile_dfgfdklm_asdf_asdfsadfas_29990101.tar.gz
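Since the original goal was to get only the file with the newest date, the same trick can be combined with a reverse sort and head. A minimal sketch, assuming the names arrive one per line; the fourth name, with extra "_" delimiters, is made up for illustration.

```shell
# Prepend the text after the last "_" (date plus extension) and a tab,
# sort on that key in reverse, keep the first line, then drop the key.
newest=$(printf '%s\n' \
  'MyFile_fdfdsf_20140326.txt' \
  'MyFile_4fg5d6_20100301.csv' \
  'MyFile_dfgfdklm_19990101.tar.gz' \
  'MyFile_a_b_c_19940101.tar.gz' |
  awk -F_ '{print $NF "\t" $0}' |
  LC_ALL=C sort -t "$(printf '\t')" -k1,1r |
  head -n 1 |
  cut -f2-)
echo "$newest"
```

Using a tab as the helper delimiter keeps cut from being confused by spaces or underscores inside the file names themselves.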

bash sort on multiple fields and deduplicating

I want to sort data like the below content on the first field first and then on the date in the third field. Then keep only the latest for each ID(field 1) - irrespective of the second field.
id1,description1,2013/11/20
id2,description2,2013/06/11
id2,description3,2012/10/28
id2,description4,2011/12/04
id3,description5,2014/02/09
id3,description6,2013/12/05
id4,description7,2013/12/05
id5,description8,2013/08/14
So the expected output will be
id1,description1,2013/11/20
id2,description2,2013/06/11
id3,description5,2014/02/09
id4,description7,2013/12/05
id5,description8,2013/08/14
Thanks
Jomon
You can use sort together with this awk:
> cat file
id1,description1,2013/11/20
id1,description1,2013/11/19
id2,description2,2013/06/11
id2,description3,2012/10/28
id2,description4,2011/12/04
id3,description5,2014/02/09
id3,description6,2013/12/05
id4,description7,2013/12/05
id5,description8,2013/08/14
> sort -t, -k1,1 -k3,3r file | awk -F, '!a[$1]++'
id1,description1,2013/11/20
id2,description2,2013/06/11
id3,description5,2014/02/09
id4,description7,2013/12/05
id5,description8,2013/08/14
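The awk condition '!a[$1]++' is what does the deduplication. A small sketch with made-up rows shows it in isolation; the array is named seen here only for readability.

```shell
# seen[$1]++ evaluates to 0 (false) the first time an ID appears and to a
# positive count afterwards; negating it prints only the first line per ID.
first_per_id=$(printf '%s\n' \
  'id1,a,2013/11/20' \
  'id1,b,2013/11/19' \
  'id2,c,2013/06/11' |
  awk -F, '!seen[$1]++')
printf '%s\n' "$first_per_id"
```

Because only the first line per ID survives, the preceding sort has to put the newest date first within each ID, which is what -k3,3r does.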
Call sort twice; the first time, sort by the date. On the second call, sort uniquely on the first field, but do so stably so that items with the same id remain sorted by date.
sort -t, -k3,3r data.txt | sort -t, -su -k1,1
Try this:
sort -t, -k1,1 -k3,3r file | awk -F, '{if(map[$1] == ""){print $0; map[$1]="printed"}}'
Explanation:
I use sort to order the lines by ID and, within each ID, by date with the newest first (a plain sort -u would order by the description before the date).
And I use awk to store in a map whether the first column item was already printed.
If not (map[$1] == ""), I print the line and store "printed" in map[$1] (so next time it won't be equal to "" for the current value of $1).
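A different technique, not taken from the answers above, is to let awk track the newest date per ID in a single pass, so the input does not have to be pre-sorted. This is a sketch under the assumption that dates are always in YYYY/MM/DD form, which makes plain string comparison equivalent to date comparison. The rows are a deliberately shuffled subset of the sample data.

```shell
# best[id] holds the newest line seen so far for each ID, date[id] its date.
# awk's "for (id in best)" order is unspecified, so sort by ID at the end.
latest=$(printf '%s\n' \
  'id1,description1,2013/11/19' \
  'id1,description1,2013/11/20' \
  'id2,description3,2012/10/28' \
  'id2,description2,2013/06/11' \
  'id3,description5,2014/02/09' |
  awk -F, '
    !($1 in best) || $3 > date[$1] { best[$1] = $0; date[$1] = $3 }
    END { for (id in best) print best[id] }
  ' |
  LC_ALL=C sort -t, -k1,1)
printf '%s\n' "$latest"
```

The trade-off is that awk keeps one line per ID in memory, whereas the sort-based answers stream through the data.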
