Finding the biggest number in a text file containing two columns and several rows - bash

For example, I have got a text file containing 2 columns:
0.000000e+00 0.000000e+00
1.958870e-02 1.566242e-02
3.923750e-02 6.509739e-03
4.394830e-01 3.216723e-03
4.594830e-01 2.508868e-03
4.794890e-01 3.813512e-04
4.995070e-01 8.846235e-04
5.997070e-01 1.671057e-03
I want to find the maximum value in column 2 and print the corresponding column 1 value in the output.

This awk one-liner will do it without sorting:
awk '$2>m{f=$1;m=$2}END{print f}' file
It outputs:
1.958870e-02
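Spelled out with comments, the same logic reads as follows (note it assumes the column 2 values are not all negative, since the running maximum m starts out empty/zero):
awk '
  $2 > m { f = $1; m = $2 }   # new maximum in column 2: remember its column-1 partner
  END    { print f }          # after the last line, print the remembered value
' file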

I was just testing the exact same solution that @jeanrjc just posted as a comment - I think, if I understand your question, it is the correct answer (to get the MAX row):
sort -n -k2 file.dat | tail -1

If you have 1e-2 scientific notation, you need to sort with the -g option:
Max:
sort -k2g file.dat | tail -1
Min:
sort -k2gr file.dat | tail -1
-k2 stands for column 2
-r (or -k2r) for reverse order
If you have a header, you can remove it with awk:
awk 'NR>1' file.dat | sort -k2g | tail -1
You can alternatively use head instead of tail to get the opposite result, e.g.:
sort -k2g file.dat | head -1
will give you the min
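Since the question asked for just the column-1 value of the winning row, you can also pipe the result through awk (a minimal sketch, assuming whitespace-separated columns):
sort -g -k2,2 file.dat | tail -1 | awk '{print $1}'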
Hope this helps.


Unix - smallest value in column

I have a txt file which is just three columns of numbers separated by space. I need to use "sort" to display the smallest value of column 3 and only that value.
I tried
sort -k3 file.txt|head -1
but it shows the first value of all three columns.
This is what's expected. sort -k3 file.txt | head -1 says "show me the first line of output"
Use just plain sort -k3 file.txt | head to get the first 10 lines.
What were you expecting or wanting?
In response to the comment: No worries! We're all beginners at the beginning :-)
sort -r file.txt will sort in reverse order, and as @shellter says, sort -r -k3 file.txt | awk 'NR==1{print $3}' will print the third value on the first line.
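To print only the smallest column-3 value itself, a minimal sketch (assuming plain decimal numbers, hence -n) is:
sort -n -k3,3 file.txt | awk 'NR==1{print $3}'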

Sort a file in unix by the absolute value of a field

I want to sort this file by the absolute value of the Linear regression (p) column in descending order. My attempt to do this didn't quite work, and I'm not sure why it fails. I found this code at http://www.unix.com/shell-programming-and-scripting/168144-sort-absolute-value.html.
awk -F',' '{print ($2>=0)?$2:-$2, $0}' OFS=',' mycsv1.csv | sort -n -k8,8 | cut -d ',' -f2-
X var,Y var,MIC (strength),MIC-p^2 (nonlinearity),MAS (non-monotonicity),MEV (functionality),MCN (complexity),Linear regression (p)
AT1G01030,AT1G32310,0.67958,0.4832027,0.32644996,0.63247,4.0,-0.44314474
AT1G01030,AT3G06520,0.61732,0.17639545,0.23569,0.58557,4.0,0.6640215
AT1G01030,AT5G42580,0.61579,0.5019064,0.30105,0.58143,4.0,0.33746648
AT1G01030,AT1G55280,0.57287,0.20705527,0.19536,0.52857,4.0,0.6048262
AT1G01030,AT5G30490,0.56509,0.37536618,0.16172999,0.51847,4.0,-0.43557298
AT1G01030,AT1G80040,0.56268,0.22935495,0.18583998,0.52728,4.0,-0.5773431
...
Please help me to understand the awk script to sort this file.
You could use sed and sort for this, following @hek2mgl's very smart logic of adding and then removing a field at the end to retain the original format:
sed -E 's/,([-]?)([0-9.]+)$/,\1\2,\2/' file | sort -t, -k9,9 -nr | cut -f1-8 -d,
sed -E 's/,([-]?)([0-9.]+)$/,\1\2,\2/' => creates field 9 as the absolute value of field 8
sort -t, -k9,9 -nr => sorts by the newly created field, numeric and descending order
cut -f1-8 -d, => removes the 9th field, restoring the output to its original format, with the desired sorting order
Here is the output:
AT1G01030,AT3G06520,0.61732,0.17639545,0.23569,0.58557,4.0,0.6640215
AT1G01030,AT1G55280,0.57287,0.20705527,0.19536,0.52857,4.0,0.6048262
AT1G01030,AT1G80040,0.56268,0.22935495,0.18583998,0.52728,4.0,-0.5773431
AT1G01030,AT1G32310,0.67958,0.4832027,0.32644996,0.63247,4.0,-0.44314474
AT1G01030,AT5G30490,0.56509,0.37536618,0.16172999,0.51847,4.0,-0.43557298
AT1G01030,AT5G42580,0.61579,0.5019064,0.30105,0.58143,4.0,0.33746648
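The output above omits the header row; if you want to keep the header pinned to the top, a hedged variant (assuming the data is in file) is:
{ head -n 1 file; tail -n +2 file | sed -E 's/,(-?)([0-9.]+)$/,\1\2,\2/' | sort -t, -k9,9 -nr | cut -f1-8 -d,; }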
Take three steps:
(1) Temporarily create a 9th field which contains the abs value of field 8:
LC_COLLATE=C awk -F, 'NR>1{v=$NF;sub(/-/,"",v);printf "%s%s%s%s",$0,FS,v,RS}' file
Make sure LC_COLLATE=C is set, since sorting (especially around the decimal point) depends on the locale.
(2) Sort that output based on the 9th field:
command_1 | sort -t, -k9r
(3) Pipe that back to awk to remove the last field. NF-- decreases the number of fields, which effectively removes the last field, and the pattern 1 is always true, which makes awk print the line:
command_2 | awk -F, -v OFS=, '{NF--}1'
Output:
AT1G01030,AT3G06520,0.61732,0.17639545,0.23569,0.58557,4.0,0.6640215
AT1G01030,AT1G55280,0.57287,0.20705527,0.19536,0.52857,4.0,0.6048262
AT1G01030,AT1G80040,0.56268,0.22935495,0.18583998,0.52728,4.0,-0.5773431
AT1G01030,AT1G32310,0.67958,0.4832027,0.32644996,0.63247,4.0,-0.44314474
AT1G01030,AT5G30490,0.56509,0.37536618,0.16172999,0.51847,4.0,-0.43557298
AT1G01030,AT5G42580,0.61579,0.5019064,0.30105,0.58143,4.0,0.33746648
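Putting the three steps together as one pipeline (note that decrementing NF to drop a field is GNU awk behavior; POSIX leaves it undefined):
LC_COLLATE=C awk -F, 'NR>1{v=$NF;sub(/-/,"",v);printf "%s%s%s%s",$0,FS,v,RS}' file |
  sort -t, -k9r |
  awk -F, -v OFS=, '{NF--}1'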
You could get GNU awk (gawk) to do it all with asorti; since the absolute values here all share the 0.x format, a string sort is enough:
awk -F, 'NR>1{n[substr($NF,1,1)=="-"?substr($NF,2):$NF]=$0}NR==1;END{c=asorti(n,out);for(i=c;i>=1;i--)print n[out[i]]}' file

How to do specific sorting in unix

How can I sort the following two lines
ABCTz.T.3a.B Student 1 1.4345
ABCTz.T.3.B Student 1 1.5465
to print them like below?
ABCTz.T.3.B Student 1 1.5465
ABCTz.T.3a.B Student 1 1.4345
It can definitely be done using a mixture of the sed and sort commands, but that's not a generic solution. Here is the sample code:
cat 1 | sed "s/\./ ./g" | sort -k3,3 | sed "s/ \././g"
This solution requires customization if the length of the string changes or the number of characters between two dots changes, e.g.:
ABCTz.T.SC.D.3a.B Student 1 1.4345
ABCTz.T.SC.D.3.B Student 1 1.5465
Again, I would need to modify the sort expression to account for the length in this case. I'm looking for something very generic.
Regards, Divesh
You can use version sort (-V), available in GNU sort, on the first field:
sort -V -rk1 file
ABCTz.T.3.B Student 1 1.5465
ABCTz.T.3a.B Student 1 1.4345
If the format is based on tabs, it's easy.
cat 1|sort -t"[Control-V][TAB]" -n -r -k4
But if the number of spaces is variable, I sort with awk.
This puts the 4th field at the beginning, followed by |, sorts based on that field, and then strips it out:
cat 1|awk '{print $4 "|" $0}' |sort -t"|" -n -r -k1|cut -d"|" -f2-
Example:
[osboxes@osboxes Desktop]$ cat 1
asdfa safadf 1.2
asldfkañ sdlfsld 1.3
[osboxes@osboxes Desktop]$ cat 1 | awk '{print $3 "|" $0}'|sort -t"|" -n -r -k1|cut -d"|" -f2-
asldfkañ sdlfsld 1.3
asdfa safadf 1.2
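If the data could itself contain |, a hedged variant decorates with a tab instead (assuming the input contains no tabs; $'\t' is bash ANSI-C quoting):
awk '{print $NF "\t" $0}' file | sort -t$'\t' -k1,1nr | cut -f2-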
Enjoy!

How to remove duplicates by column (inverse ordering)

I've been looking for this on here, but did not find this exact case. Sorry if it is a duplicate, but I couldn't find it.
I have a huge file in Debian that contains 4 columns separated by "#", with the following format:
username#source#date#time
For example:
A222222#Windows#2014-08-18#10:47:16
A222222#Juniper#2014-08-07#14:31:40
A222222#Juniper#2014-08-08#09:15:34
A111111#Juniper#2014-08-10#14:32:55
A111111#Windows#2014-08-08#10:27:30
I want to print unique rows based on the first two columns, and if duplicates are found, it has to print the latest event based on date/time. With the list above, the result should be:
A222222#Windows#2014-08-18#10:47:16
A222222#Juniper#2014-08-08#09:15:34
A111111#Juniper#2014-08-10#14:32:55
A111111#Windows#2014-08-08#10:27:30
I have tested it using two commands:
cat file | sort -u -t# -k1,2
cat file | sort -r -u -t# -k1,2
But both of them print the following:
A222222#Windows#2014-08-18#10:47:16
A222222#Juniper#2014-08-07#14:31:40 --> Wrong line, it is older than the duplicate one
A111111#Juniper#2014-08-10#14:32:55
A111111#Windows#2014-08-08#10:27:30
Is there any way to do it?
Thanks!
This should work:
tac file | awk -F# '!a[$1,$2]++' | tac
Output
A222222#Windows#2014-08-18#10:47:16
A222222#Juniper#2014-08-08#09:15:34
A111111#Juniper#2014-08-10#14:32:55
A111111#Windows#2014-08-08#10:27:30
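Step by step, that pipeline reads as follows (it relies on the file already being in chronological order within each user#source pair, as in the sample):
tac file |                 # print the file last line first
  awk -F# '!a[$1,$2]++' |  # keep only the first line seen per (user, source) key
  tac                      # restore the original line order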
First, you need to sort the input file to ensure the order of the lines, e.g. for duplicate username#source you will get ordered times. It's best to sort in reverse, so the latest event comes first. This can be done with a simple sort, like:
sort -r < yourfile
From your input, this will produce:
A222222#Windows#2014-08-18#10:47:16
A222222#Juniper#2014-08-08#09:15:34
A222222#Juniper#2014-08-07#14:31:40
A111111#Windows#2014-08-08#10:27:30
A111111#Juniper#2014-08-10#14:32:55
reverse-ordered lines, where for each username#source combination the latest event comes first.
Next, you need to filter the sorted lines to keep only the first event per combination. This can be done with several tools, like awk, uniq, or perl.
So, the solution:
sort -r <yourfile | uniq -w16
or
sort -r <yourfile | awk -F# '!seen[$1,$2]++'
or
sort -r yourfile | perl -F'#' -lanE 'say $_ unless $seen{"$F[0],$F[1]"}++'
All of the above will print:
A222222#Windows#2014-08-18#10:47:16
A222222#Juniper#2014-08-08#09:15:34
A111111#Windows#2014-08-08#10:27:30
A111111#Juniper#2014-08-10#14:32:55
Finally, you can re-sort the unique lines as needed.
This awk script keeps the last line seen per key while preserving the order in which each key first appears:
awk -F\# '
  { p = ($1 FS $2 in a)            # was this (user, source) key seen before?
    a[$1 FS $2] = $0 }             # always keep the most recent line for the key
  !p { keys[++k] = $1 FS $2 }      # record keys in order of first appearance
  END { for (k = 1; k in keys; ++k) print a[keys[k]] }
' file
Output:
A222222#Windows#2014-08-18#10:47:16
A222222#Juniper#2014-08-08#09:15:34
A111111#Juniper#2014-08-10#14:32:55
A111111#Windows#2014-08-08#10:27:30
If you know for a fact that the first column is always 7 chars long, and the second column is also 7 chars long, you can extract unique lines considering only the first 16 characters with:
uniq -w 16 file
Since you want the later duplicate, you can reverse the data using tac prior to uniq and then reverse the output again:
tac file | uniq -w 16 | tac
Update: as commented below, uniq needs duplicate lines to be adjacent (i.e. sorted input). In that case this starts to become contrived, and the awk-based suggestions are better. Something like this would still work, though:
sort -s -t"#" -k1,2 file | tac | uniq -w 16 | tac
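If the input isn't guaranteed to be in chronological order, a hedged alternative is to sort by key and then by date/time descending, keeping the first line per key (the output is then grouped by key rather than kept in the original order):
sort -t'#' -k1,2 -k3,4r file | awk -F'#' '!seen[$1,$2]++'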

Bash sort: finding the most repeated line, choosing the alphabetically first on ties

I have a question; I couldn't find an answer, and I don't know how to implement a good sort in my script. I want to sort some input strings to show redundancy (repetitions) and return the line with the most repetitions; if several lines are repeated the same number of times, I want the alphabetically first one. For example,
input:
qwe1
qwe1
wer2
wer2
wer4
output: // What I want
2 qwe1
input:
asd1
asd1
asd1
asd2
asd2
asd2
asd3
asd3
asd3
output: // What I want
3 asd1 // If several lines have the same count, return the alphabetically first one
#!/bin/bash
sort -n|uniq -c -i | sort -dr | head -n1
I tried some other arguments of sort, but I didn't find a solution. Sorry for my English; can someone please help me with this?
This might work for you:
sort | uniq -c | sort -nrs | head -1
sort | uniq -c | sort -k1nr -k2 | head -1
where -k1nr means sort on the first column numerically and reverse (high to low)
and -k2 means, where the first keys are equal, sort by column 2 (alphabetically)
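For example, with the second sample input from the question saved as input.txt (a hypothetical file name):
$ sort input.txt | uniq -c | sort -k1nr -k2 | head -1
      3 asd1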
Another variation:
sort | uniq -c | sort -nr | awk '{if(a&&a!=$1){print a,b;exit;}a=$1;b=$2}'
I think this can all be done in a single awk command. Consider this:
awk '{ freq[$0]++ }
     END {
       for (var in freq)
         if (freq[var] > max || (freq[var] == max && var < item)) {
           max = freq[var]; item = var
         }
       print max, item
     }' file.txt
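For example, if you save the program above as maxfreq.awk (a hypothetical file name) and run it on the first sample input, it prints the expected result:
$ awk -f maxfreq.awk file.txt
2 qwe1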
