How to sort datafiles with some numbers of Fortran-style D+01? - bash

I have several Fortran datafiles that contain numbers in a format like this:
-0.53D+02
I want to combine these with simple floating point data like
-0.53
and then sort them, like with Unix sort.
Unfortunately sort can't recognize this format, so I am looking for a simple converter, but couldn't find anything online. I thought about a Fortran script and converting it from double precision to float, but I am not quite sure about the number of digits and this is always a bit tedious with Fortran.
So does anyone know a script that can do this, a sorting program that can read that format or maybe even just a short sed command that might help? I am not that good with sed, so it would cost me quite a while to figure out how...

I think you can use this:
sed 's/D/E/g' YourNumbersFile | sort -g
The sed command changes all Ds to Es - read it like this... "substitute all Ds with Es, globally".
The sort command needs the -g option (general numeric sort), which understands exponent notation such as E+02.
If your sort doesn't accept the -g switch, I guess another option might be to use this awk to reformat your numbers:
awk '{sub(/D/,"e");printf "%8.3f\n", $1}' YourNumbersFile | sort -n
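A slightly more defensive variant of the sed approach, as a minimal sketch: it only rewrites a D (or d) that sits between a digit and an exponent, so a stray D elsewhere on the line is left alone. The function name fortran_sort is made up for illustration, and it assumes GNU sort (for -g) with the number at the start of each line.

```shell
# Hypothetical helper: convert Fortran D exponents to E, then sort numerically.
# Assumes GNU sort (for -g); reads the named files, or stdin if none are given.
fortran_sort() {
    # Rewrite D/d only when it follows a digit or dot and precedes an
    # optionally signed digit, i.e. when it really is an exponent marker.
    sed 's/\([0-9.]\)[dD]\([-+]\{0,1\}[0-9]\)/\1E\2/g' "$@" | LC_ALL=C sort -g
}

# Mixed Fortran-style and plain floating point input.
printf '%s\n' '-0.53D+02' '-0.53' '0.1D+01' '0.25' | fortran_sort
```

LC_ALL=C is there so that the period is always treated as the decimal point, whatever the locale of the machine.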

Related

Terminal: SORT command; how to sort correctly?

I have written a shell script that gets all the file names from a folder, and all its sub-folders, and copies them to the clipboard after sorting (removing all paths; I just need a simple file list of the thousands of randomly named files within).
What I can’t figure out is how to get the SORT command to sort properly. Meaning, the way a spreadsheet would sort things. Or the way your Mac finder sorts things.
Underscores > numbers > letters (regardless of case)
Anyone know how to do this? sort -n only works for files starting with numbers; sort -f was close but separated the lower case and capitals in a weird way, and anything starting with a number was all over the place. sort -V was the closest, but anything starting with an underscore went to the bottom instead of the top… I’m about to lose my mind. 🤣
I’ve been trying to figure this out for a week, and no combination of anything I have tried gets the sort command to actually, ya know, sort properly.
Help?
If I understand the problem correctly, you want the "natural sort order" as described in Natural sort order - Wikipedia, Sorting for Humans : Natural Sort Order, and macos - How does finder sort folders when they contain digits and characters?.
Using Linux sort(1) you need the -V (--version-sort) option for "natural" sort. You also need the -f (--ignore-case) option to disregard the case of letters. So, assuming that the file names are stored one-per-line in a file called files.txt you can produce a list (mostly) sorted in the way that you want with:
sort -Vf files.txt
However, sort -Vf sorts underscores after digits and letters on my system. I've tried using different locales (see How to set locale in the current terminal's session?), but with no success. I can't see a way to change this with sort options (but I may be missing something).
The characters . and ~ seem to consistently sort before numbers and letters with sort -V. A possible hack to work around the problem is to swap underscore with one of them, sort, and then swap again. For example:
tr '_~' '~_' <files.txt | LC_ALL=C sort -Vf | tr '_~' '~_'
seems to do what you want on my system. I've explicitly set the locale for the sort command with LC_ALL=C ... so it should behave the same on other systems. (See Why doesn't sort sort the same on every machine?.)
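To make the swap trick easier to reuse, here is a small sketch that wraps it in a function. The name natural_sort_underscore_first is hypothetical, and it assumes GNU sort (for -V). File names containing a literal ~ come back intact, but they end up sorted where underscores normally would.

```shell
# Hypothetical wrapper around the tr / sort / tr pipeline shown above.
# Assumes GNU sort, since -V (version sort) is a GNU option.
natural_sort_underscore_first() {
    # Swap "_" and "~", sort (version order, case folded), then swap back.
    tr '_~' '~_' | LC_ALL=C sort -Vf | tr '_~' '~_'
}

# One file name per line on stdin.
printf '%s\n' 'b2' 'a10' '_a' 'a2' | natural_sort_underscore_first
```

On a GNU system I would expect _a to come out first, followed by a2, a10 and b2.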
It appears you want to sort in dictionary order and fold case, so it would be sort -df.

Sort ignores an apostrophe - sometimes (except when it is the only column used); WHY?

This happens to me both on Linux and on cygwin, so I suspect it is not a bug. Still, I don't understand it. Can anyone explain?
Consider the following file (tab-delimited, and that's a regular apostrophe)
(I create it with cat to ensure that it wasn't non-printing characters that were the source of the problem)
$ cat > temp
cat 1389
cat' 1747
ca't 3175
cat 46848484
ca't 720
$ sort temp
<gives the exact same output as cat temp>
$ sort -k1,1 temp
cat 1389
cat 46848484
cat' 1747
ca't 3175
ca't 720
Why do I have to ignore the second column in order to sort correctly?
I pulled up the manual for sort and noticed the following:
* WARNING * The locale specified by the environment affects sort
order. Set LC_ALL=C to get the traditional sort order that uses native
byte values.
As it turns out, the locale specifies how lexicographic ordering (collation) works for a given language. This makes a lot of sense, but for some reason it trips over multi-field files...
(see also:)
Unusual behaviour of linux's sort command
Why does the sort command sort differently if there are trailing fields?
There are a couple of things you can do:
You can sort naively by byte value using
LC_ALL="C" sort temp
This will give a more logical result, but it might not be the one you actually want.
You could try to get sort to do a more basic lexicographical ordering by setting the locale to C and telling it you want dictionary ordering:
LC_ALL="C" sort -d temp
To have sort output your locale information and highlight the sort key, you can use
sort --debug temp
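If the goal is a predictable order on the first column, one option is to make both the delimiter and the keys explicit. This is only a sketch: the function names sample and sort_by_bytes are made up, and the secondary numeric key on column 2 is my own assumption about what is wanted.

```shell
# Rebuild the sample file from the question (tab-delimited).
sample() {
    printf 'cat\t1389\n'
    printf "cat'\t1747\n"
    printf "ca't\t3175\n"
    printf 'cat\t46848484\n'
    printf "ca't\t720\n"
}

# Byte-order sort on column 1, then numeric sort on column 2.
sort_by_bytes() {
    tab=$(printf '\t')
    LC_ALL=C sort -t "$tab" -k1,1 -k2,2n
}

sample | sort_by_bytes
```

In the C locale the apostrophe is compared like any other byte, so both ca't lines come first, then the two cat lines, then cat'.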
Personally I'm really curious to know what rule is being specified that makes sort behave unintuitively across multiple fields.
They're supposed to specify correct lexicographic order in the given language and dialect. Do the locales' functions simply not handle the multiple field case at all, or are they taking some kind of different interpretation on the "meaning" of the line?

Swap characters in specific positions in strings of varying lengths

I've been trying to learn sed and the examples I've found here are for swapping dates from 05082012 to 20120805 and I'm having trouble adapting them to my current need.
I need to convert an IP address 10.4.13.22 to a reverse lookup of 22.13.4.10 for a nsupdate script. My biggest problem is the fact that sometimes each octet can change lengths e.g. 10.4.13.2 and 10.19.8.126
Thanks for any help!
echo 10.0.2.99 | sed 's/\(....\)\(....\)/\2\1/'
This is currently what I've tried, just based off another question here, but since the examples don't provide much explanation as to what .... means, I'm having trouble understanding what it does.
The output of that command is .2.910.09, and I am expecting 99.2.0.10.
Directly, I want to rearrange each "section" that is separated by a "."
A "bruteforce" method to "reverse" an IPv4 address would be:
sed 's/\([0-9]\+\)\.\([0-9]\+\)\.\([0-9]\+\)\.\([0-9]\+\)/\4.\3.\2.\1/g'
or, for GNU sed,
sed -r 's/([0-9]+)\.([0-9]+)\.([0-9]+)\.([0-9]+)/\4.\3.\2.\1/g'
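Since the octets are just dot-separated fields, another way to look at it is field splitting rather than pattern matching. A minimal sketch with awk, where reverse_ip is a hypothetical helper name:

```shell
# Hypothetical helper: split each line on "." and print the four fields
# in reverse order. Lines that do not have exactly four fields are
# passed through unchanged.
reverse_ip() {
    awk -F. -v OFS=. 'NF == 4 { print $4, $3, $2, $1; next } { print }'
}

echo 10.0.2.99 | reverse_ip      # 99.2.0.10
echo 10.19.8.126 | reverse_ip    # 126.8.19.10
```

The octet length does not matter here because awk splits on the dots instead of counting characters.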

Case Sensitive Sort Unix Bash

Here is a screenshot of an issue I'm having with sort:
http://i.imgur.com/cIvAF.png
The objective I want out of this, is to put all equal strings on consecutive lines. It works for 99% of the list I'm sorting, but there's a few hitches such as those in the screen shot.
So all the yahoo.coms should be next to each other, and then all the Yahoo.coms then the YAHOO.coms yahoo.cmos yhoo.c etc. (The typos even getting their own group of lines)
Not entirely sure how to handle this with sort, but I'm certainly trying.
I print all the domains unsorted to a file and then sort it with just vanilla sort filename
Would love some advice/input.
You probably need to override the locale; most Linux systems default to a UTF-8 locale, which specifies both case-insensitive sorting and ignoring punctuation.
LANG=C sort filename
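One caveat: if LC_ALL is already set in your environment it takes precedence over LANG, so setting LC_ALL=C is the safer override. A small sketch, with group_identical as a made-up function name and made-up sample domains:

```shell
# Hypothetical helper: byte-order sort, so that strings differing only in
# case or punctuation are never interleaved.
group_identical() {
    LC_ALL=C sort "$@"
}

# uniq -c then shows each distinct spelling once, with its count.
printf '%s\n' 'yahoo.com' 'Yahoo.com' 'yahoo.com' 'YAHOO.com' 'Yahoo.com' |
    group_identical | uniq -c
```

In byte order the uppercase letters sort before the lowercase ones, so YAHOO.com comes first, then the Yahoo.com lines, then the yahoo.com lines.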
Normalize your input a bit:
tr 'A-Z' 'a-z'
Try reading "Unix for poets"

Bash: sort numbers with exponents

I was trying to sort one file with numeric values like this:
414e-05
435e-05
0.5361
0.7278
0.1341
0.9592
0.2664
With sort, all the numbers get sorted except for the ones with the exponent. Is there some way for sort to evaluate these expressions?
If your version of the sort command is new enough, it should support the -g option (or --general-numeric-sort, if you like your options long). It is described like this in the info manual:
Sort numerically, using the standard C function strtod to
convert a prefix of each line to a double-precision floating point
number. This allows floating point numbers to be specified in scientific
notation, like '1.0e-34' and '10e100'.
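Applied to the values from the question, that could look like the sketch below. The function name sort_general_numeric is made up, and the LC_ALL=C prefix is my own addition so that the period is always read as the decimal point.

```shell
# Hypothetical helper: general numeric sort (GNU sort -g) with a fixed locale.
sort_general_numeric() {
    LC_ALL=C sort -g "$@"
}

# The values from the question, one per line.
printf '%s\n' 414e-05 435e-05 0.5361 0.7278 0.1341 0.9592 0.2664 |
    sort_general_numeric
```

Here 414e-05 and 435e-05 are read as 0.00414 and 0.00435, so they end up at the top of the list.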
I don't have enough rep. to comment, so I am writing this to complement the accepted answer:
For those who have locales which use a comma instead of a period to indicate decimals, the sorting of decimals will not work properly, as pointed out by HongboZhu.
Solution: the sorting of lists with period-delimited numbers will work properly when using something like the following command (important is the LC_ALL=C):
ls yourFolder|LC_ALL=C sort -g
This solution comes from the following post:
https://unix.stackexchange.com/questions/506965/bash-sort-g-does-not-work-properly
If you don't have sort -g, an alternative you can get is scisort.
perl -e 'print sort { $a<=>$b } <>' < input-file
