Bash: sort numbers with exponents - bash

I was trying to sort one file with numeric values like this:
414e-05
435e-05
0.5361
0.7278
0.1341
0.9592
0.2664
With sort, all the numbers get sorted except the ones with an exponent. Is there some way for sort to evaluate these expressions?

If your version of the sort command is new enough, it should support the -g option (or --general-numeric-sort, if you like your options long). It is described like this in the info manual:
Sort numerically, using the standard C function strtod to
convert a prefix of each line to a double-precision floating point
number. This allows floating point numbers to be specified in scientific
notation, like '1.0e-34' and '10e100'.
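As a quick check, the sample values from the question can be run through sort -g. This is only a minimal sketch, assuming GNU sort; LC_ALL=C is set here so that the period is treated as the decimal separator whatever the user's locale is.

```shell
# Sample values from the question, one per line.
numbers='414e-05
435e-05
0.5361
0.7278
0.1341
0.9592
0.2664'

# -g parses each line with strtod, so exponent notation is understood.
sorted=$(printf '%s\n' "$numbers" | LC_ALL=C sort -g)
printf '%s\n' "$sorted"
```

With this input, the two exponent values (0.00414 and 0.00435) should come out first, followed by the plain decimals in ascending order.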

I don't have enough rep. to comment, so I am writing this to complement the accepted answer:
For those whose locale uses a comma instead of a period as the decimal separator, the sorting of period-delimited decimals will not work properly, as pointed out by HongboZhu.
Solution: lists of period-delimited numbers sort properly with something like the following command (the important part is LC_ALL=C):
ls yourFolder|LC_ALL=C sort -g
This solution comes from the following post:
https://unix.stackexchange.com/questions/506965/bash-sort-g-does-not-work-properly

If you don't have sort -g, an alternative you can get is scisort.

perl -e 'print sort { $a<=>$b } <>' < input-file

Related

Terminal: SORT command; how to sort correctly?

I have written a shell script that gets all the file names from a folder, and all its sub-folders, and copies them to the clipboard after sorting (removing all paths; I just need a simple file list of the thousands of randomly named files within).
What I can’t figure out is how to get the SORT command to sort properly. Meaning, the way a spreadsheet would sort things. Or the way your Mac finder sorts things.
Underscores > numbers > letters (regardless of case)
Anyone know how to do this? Sort -n only works for files starting with numbers, sort -f was close but separated the lower case and capitals in a weird way, and anything starting with a number was all over the place. Sort -V was the closest, but anything started with an underscore went to the bottom instead of the top… I’m about to lose my mind. 🤣
I’ve been trying to figure this out for a week, and no combination of anything I have tried gets the sort command to actually, ya know, sort properly.
Help?
If I understand the problem correctly, you want the "natural sort order" as described in Natural sort order - Wikipedia, Sorting for Humans : Natural Sort Order, and macos - How does finder sort folders when they contain digits and characters?.
Using Linux sort(1) you need the -V (--version-sort) option for "natural" sort. You also need the -f (--ignore-case) option to disregard the case of letters. So, assuming that the file names are stored one-per-line in a file called files.txt you can produce a list (mostly) sorted in the way that you want with:
sort -Vf files.txt
However, sort -Vf sorts underscores after digits and letters on my system. I've tried using different locales (see How to set locale in the current terminal's session?), but with no success. I can't see a way to change this with sort options (but I may be missing something).
The characters . and ~ seem to consistently sort before numbers and letters with sort -V. A possible hack to work around the problem is to swap underscore with one of them, sort, and then swap again. For example:
tr '_~' '~_' <files.txt | LC_ALL=C sort -Vf | tr '_~' '~_'
seems to do what you want on my system. I've explicitly set the locale for the sort command with LC_ALL=C ... so it should behave the same on other systems. (See Why doesn't sort sort the same on every machine?.)
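A small sketch of the swap trick on a few made-up file names (the names are hypothetical, chosen only to cover the underscore, number and letter cases). It assumes GNU sort with -V support and that no file name contains a literal ~.

```shell
# Hypothetical file names, used only for illustration.
files='apple.txt
10.txt
_notes.txt
Beta.txt
2.txt'

# Swap "_" and "~", version-sort ignoring case, then swap back.
natural=$(printf '%s\n' "$files" | tr '_~' '~_' | LC_ALL=C sort -Vf | tr '_~' '~_')
printf '%s\n' "$natural"
```

On a GNU system this should list the underscore name first, then 2.txt before 10.txt, then the letters with case ignored.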
It appears you want to sort in dictionary order and fold case, so it would be sort -df.

How to sort datafiles with some numbers of Fortran-style D+01?

I have several Fortran datafiles that contain numbers in a format like this:
-0.53D+02
I want to combine these with simple floating point data like
-0.53
and then sort them, like with Unix sort.
Unfortunately sort can't recognize this format, so I am looking for a simple converter, but couldn't find anything online. I thought about a Fortran script and converting it from double precision to float, but I am not quite sure about the number of digits and this is always a bit tedious with Fortran.
So does anyone know a script that can do this, a sorting program that can read that format or maybe even just a short sed command that might help? I am not that good with sed, so it would cost me quite a while to figure out how...
I think you can use this:
sed 's/D/E/g' YourNumbersFile | sort -g
The sed command changes all Ds to Es - read it like this... "substitute all Ds with Es, globally".
The sort command needs the -g option to compare the values as general floating-point numbers.
If your sort doesn't accept the -g switch, I guess another option might be to use this awk to reformat your numbers:
awk '{sub(/D/,"e");printf "%8.3f\n", $1}' YourNumbersFile | sort -n
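To illustrate the sed approach on mixed input, here is a minimal sketch with made-up values (only -0.53D+02 and -0.53 come from the question; the rest are hypothetical). It assumes GNU sort and that the letter D appears nowhere in the data except as the exponent marker.

```shell
# Mixed input: Fortran D-exponents and plain decimals.
data='0.25
-0.53D+02
0.1D+01
-0.53'

# Rewrite the D exponent marker as E, then sort as floating point.
converted=$(printf '%s\n' "$data" | sed 's/D/E/g' | LC_ALL=C sort -g)
printf '%s\n' "$converted"
```

Note that the output keeps the E spelling, so the result is no longer in the original Fortran notation.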

Sorting on multiple columns and ignoring fields in vim

Is it possible to sort on multiple columns and ignore certain lines starting with #?
I have a text like this:
#Comments
#More comments
foo;1;1
foo;3;2
bar;2;1
I'd like to sort on the first number and if those are equal on the last number.
I tried this:
:%!sort -t';' -k2n -k3n
but this will affect the comments section.
I know I can make vim ignore the comments like this:
:sort /^#/
but how do I select the fields now?
Does the shell sort have a field ignorer? Or can the VIM sort use fields?
BTW the comments section's length can increase so head/tail won't work.
I do not think that
:sort /^#/
does what you want. It will sort the comments, putting them at the end of the buffer, and leave the other lines in the original order. A lot closer to what you want is
:sort /;/
This will leave all comments at the top of the buffer, in the original order, and sort on the part of the line after the first ;. Probably lexicographic sort is not what you want. Instead, you could use
:sort /;/ n
This will do numeric sort, but ignore the part of the line after the first number.
In order to avoid sorting comment lines that happen to contain ; characters, you could use a more complicated pattern:
:sort /^\(\s*#\)\@!.\{-};/ n
or (using a feature that I may never have tried before)
:sort /^\s*[^#]\&.\{-\};/ n
I am old-school, and use vim's default settings, but a lot of people prefer the \v (very magic) setting. That makes these a little simpler:
:sort /\v^(\s*#)@!.{-};/ n
:sort /\v^\s*[^#]&.{-};/ n
OTOH, the version you suggested using the external sort seems to work perfectly.
$ sort --version
sort (GNU coreutils) 5.93
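If the external sort is preferred, one possible way to keep the comment lines out of it is to split the input with grep. This is only a sketch, not something from the answers above: it assumes the comments belong at the top, and it writes the question's sample to a temporary file for illustration.

```shell
# Sample buffer contents from the question, written to a temporary file.
tmpfile=$(mktemp)
printf '%s\n' '#Comments' '#More comments' 'foo;1;1' 'foo;3;2' 'bar;2;1' > "$tmpfile"

# Emit comment lines first (original order), then the data lines
# sorted numerically on field 2, with field 3 as the tie-breaker.
result=$(
  grep '^#' "$tmpfile"
  grep -v '^#' "$tmpfile" | LC_ALL=C sort -t';' -k2,2n -k3,3n
)
rm -f "$tmpfile"
printf '%s\n' "$result"
```

Writing the keys as -k2,2n and -k3,3n limits each key to a single field, which is a little stricter than -k2n.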

Sort ignores an apostrophe - sometimes (except when it is the only column used); WHY?

This happens to me both on Linux and on cygwin, so I suspect it is not a bug. Still, I don't understand it. Can anyone explain?
Consider the following file (tab-delimited, and that's a regular apostrophe)
(I create it with cat to ensure that it wasn't non-printing characters that were the source of the problem)
$cat > temp
cat 1389
cat' 1747
ca't 3175
cat 46848484
ca't 720
$sort temp
<gives the exact same output as cat temp>
$sort -k1,1 temp
cat 1389
cat 46848484
cat' 1747
ca't 3175
ca't 720
Why do I have to ignore the second column in order to sort correctly?
I pulled up the manual for sort and noticed the following:
* WARNING * The locale specified by the environment affects sort
order. Set LC_ALL=C to get the traditional sort order that uses native
byte values.
As it turns out, the locale determines how lexicographic ordering works. This makes a lot of sense, but for some reason it trips over multi-field files...
(see also:)
Unusual behaviour of linux's sort command
Why does the sort command sort differently if there are trailing fields?
There are a couple of things you can do:
You can sort naively by byte value using
LC_ALL="C" sort temp
This will give a more logical result, but it might not be the one you actually want.
You could try to get sort to do a more basic lexicographical ordering by setting the locale to C and telling it you want dictionary ordering:
LC_ALL="C" sort -d temp
To have sort output your locale information and highlight the sort key, you can use
sort --debug temp
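For reference, the byte-value ordering can be reproduced with the question's data. This is a small sketch that rebuilds the tab-delimited sample with printf; only the C-locale result is shown, since the output under other locales depends on what is installed.

```shell
# Rebuild the tab-delimited sample from the question.
sample=$(printf '%s\t%s\n' 'cat' 1389 "cat'" 1747 "ca't" 3175 'cat' 46848484 "ca't" 720)

# Byte-value ordering: tab (0x09) < apostrophe (0x27) < letters.
bytewise=$(printf '%s\n' "$sample" | LC_ALL=C sort)
printf '%s\n' "$bytewise"
```

Here the two ca't lines come first, then the two cat lines, then cat', because the comparison is purely on byte values across the whole line.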
Personally I'm really curious to know what rule is being specified that makes sort behave unintuitively across multiple fields.
They're supposed to specify correct lexicographic order in the given language and dialect. Do the locales' functions simply not handle the multiple field case at all, or are they taking some kind of different interpretation on the "meaning" of the line?

Case Sensitive Sort Unix Bash

Here is a screenshot of an issue I'm having with sort:
http://i.imgur.com/cIvAF.png
The objective I want out of this, is to put all equal strings on consecutive lines. It works for 99% of the list I'm sorting, but there's a few hitches such as those in the screen shot.
So all the yahoo.coms should be next to each other, and then all the Yahoo.coms then the YAHOO.coms yahoo.cmos yhoo.c etc. (The typos even getting their own group of lines)
Not entirely sure how to handle this with sort, but I'm certainly trying.
I print all the domains unsorted to a file and then sort it with just vanilla sort filename
Would love some advice/input.
You probably need to override the locale; most Linux systems default to a UTF-8 locale, which specifies both case-insensitive sorting and ignoring punctuation.
LANG=C sort filename
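A minimal sketch of the effect on a made-up domain list (the entries are hypothetical). It uses LC_ALL=C rather than LANG=C, which is my own substitution: LC_ALL takes precedence over the other locale variables, so it still works if LC_COLLATE or LC_ALL happens to be set in the environment.

```shell
# Hypothetical domain list with mixed capitalization.
domains='yahoo.com
Yahoo.com
YAHOO.com
yahoo.com
Yahoo.com'

# Byte-wise sort: each distinct spelling ends up on consecutive lines.
grouped=$(printf '%s\n' "$domains" | LC_ALL=C sort)
printf '%s\n' "$grouped"
```

In the C locale uppercase letters sort before lowercase ones, so every spelling forms its own contiguous group.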
Normalize your input a bit:
tr 'A-Z' 'a-z'
Try reading "Unix for poets"
