Numerical sort of file names - bash

I have a folder with scripts with a name pattern of UPDATE[x.y.z] where x.y.z is the script's version.
What I need is a bash script that runs the scripts ordered by their version, hence alphabetic sort is not good.
For example UPDATE1.11.0 should be executed after UPDATE1.2.3.
Is there a comparator I can use in order to dictate that sorting order? If not, how else can it be done?

And if you don't have GNU sort (i.e. you're in OSX or FreeBSD or NetBSD), you may be able to fake it by sorting different fields.
[ghoti ~]$ printf "foo1.2.3\nfoo1.11.0\nfoo1.4.1\nfoo1.4.0\n" | sort -nt. -k2,3
foo1.2.3
foo1.4.0
foo1.4.1
foo1.11.0
[ghoti ~]$
This misses out on the major version number because it's not delimited by a dot. But it may work for you anyway.
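If the major version matters too, one possible workaround is to strip the fixed prefix, sort each dot-separated field numerically, and put the prefix back. This is only a sketch: the function name `portable_version_sort` is made up for illustration, and it assumes every line has the form prefix + x.y.z and that the prefix contains no characters that are special to sed.

```shell
# Sort lines of the form <prefix>x.y.z by major, then minor, then patch number.
# The prefix (default: UPDATE) is stripped before sorting and restored afterwards.
# Hypothetical helper; assumes the prefix has no sed metacharacters.
portable_version_sort() {
    prefix=${1:-UPDATE}
    sed "s/^$prefix//" |
        sort -t. -k1,1n -k2,2n -k3,3n |
        sed "s/^/$prefix/"
}

printf 'UPDATE1.2.3\nUPDATE1.11.0\nUPDATE10.0.0\nUPDATE2.0.1\n' | portable_version_sort
```

Giving each field its own numeric key also avoids treating "4.10" as the decimal number 4.10, which a single key spanning fields 2 to 3 would do.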

If you have GNU sort or another version that supports it, you can use version sort.
sort -V
or
sort --version-sort
Also, in my answer here, I posted a script which compares versions.
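To tie this back to the question, here is a rough sketch of running the scripts in version order. The function names `list_updates` and `run_updates` are invented for this example; it assumes a sort that supports -V and script names without whitespace.

```shell
# Print the names matching UPDATE* in the given directory, in version order.
# Assumes a sort with -V (e.g. GNU sort).
list_updates() {
    ( cd "$1" && printf '%s\n' UPDATE* | sort -V )
}

# Run each script in that order and stop at the first failure.
# Relies on word splitting, so names must not contain whitespace.
run_updates() {
    for script in $(list_updates "$1"); do
        sh "$1/$script" || return 1
    done
}
```

Usage would be something like `run_updates /path/to/scripts`.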

Related

How to sort characters after a period

Need help on how to sort characters or numbers after a period(.)
test2.rod1
test1.rod1
test3.rod1
test1.mor2
test2.mor2
test3.mor2
zbcd1.abc1
abcd2.abc1
dbcd3.abc1
I would like to sort by whatever comes after the period (.). The result should look something like this:
abcd2.abc1
dbcd3.abc1
zbcd1.abc1
test1.mor2
test2.mor2
test3.mor2
test2.rod1
test1.rod1
test3.rod1
If you're using a system with Unix-like utilities, such as MacOS, Linux, BSD, etc., then you can use the system sort command. The secret is to specify the field delimiter, which in your case is a period. The option is either -t or --field-separator (the long form is a GNU extension). So the following should work:
sort -t. -k 2 test.dat
Assuming that your data is in a file called test.dat
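A slightly stricter variant, as a sketch: `-k 2` on its own makes the key run from field 2 to the end of the line, so `-k2,2` limits it to the part after the period, and a second key on field 1 breaks ties by name. The function name `sort_by_suffix` is hypothetical, and LC_ALL=C is set here only to make the ordering byte-wise and predictable.

```shell
# Sort "name.suffix" lines by suffix first, then by name within each suffix.
# Hypothetical helper; LC_ALL=C forces plain byte order.
sort_by_suffix() {
    LC_ALL=C sort -t. -k2,2 -k1,1
}

printf 'test2.rod1\ntest1.rod1\ntest1.mor2\nzbcd1.abc1\nabcd2.abc1\n' | sort_by_suffix
```

Note that this orders names inside each group as well, so test1.rod1 comes before test2.rod1.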

List (ls) filenames with numbers in Mac OSX [duplicate]

I'm working on some build scripts that I'd like to depend on only standardized features.
I need to sort some files by version. Say the files are bar-1_{0,2,3} bar-11_{0,2,3}.
By default, ls gives me:
bar-1_0
bar-11_0
bar-11_2
bar-11_3
bar-1_2
bar-1_3
Getting what I want is easy using 'ls -v':
bar-1_0
bar-1_2
bar-1_3
bar-11_0
bar-11_2
bar-11_3
The problem is that 'ls -v' is not standard. Standard sort also seems to lack the option I want, though I could be looking at old versions of the specs.
Can anyone suggest a portable way to achieve this effect short of writing my own sort routine?
Thanks,
Rhys
sort -n -t- -k2 seems to do what you want. -n gives you numeric (i.e. not alphabetic) sort; -t- sets the field separator to -, and -k2 selects the second field, i.e. the version number.
My sort, on Ubuntu, even does the part with the underscore correctly, but I'm not sure if that is standard. To be sure, you could sort by the minor version first, then by major version.
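One way to make both parts explicit without relying on how a particular sort breaks ties is to pull the major and minor numbers out with awk, sort on them as two numeric keys, and then drop them again. This is a sketch only: `sort_major_minor` is a made-up name, and it assumes names of the form name-MAJOR_MINOR with no spaces and no other hyphens or underscores.

```shell
# Decorate each name with its major and minor number, sort numerically on
# both, then strip the decoration again.
# Hypothetical helper; assumes names like bar-11_2 without spaces.
sort_major_minor() {
    awk -F'[-_]' '{ print $2, $3, $0 }' |
        sort -k1,1n -k2,2n |
        cut -d' ' -f3-
}

printf 'bar-1_0\nbar-11_0\nbar-11_2\nbar-1_2\n' | sort_major_minor
```

It only uses awk, sort and cut options that POSIX describes, so it should not depend on ls -v or GNU extensions.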

unix sort -n -t"," gives unexpected result

unix numeric sort gives strange results, even when I specify the delimiter.
$ cat example.csv # here's a small example
58,1.49270399401
59,0.000192136419373
59,0.00182092924724
59,1.49270399401
60,0.00182092924724
60,1.49270399401
12,13.080339685
12,14.1531049905
12,26.7613447051
12,50.4592437035
$ cat example.csv | sort -n --field-separator=,
58,1.49270399401
59,0.000192136419373
59,0.00182092924724
59,1.49270399401
60,0.00182092924724
60,1.49270399401
12,13.080339685
12,14.1531049905
12,26.7613447051
12,50.4592437035
For this example, sort gives the same result regardless of whether you specify the delimiter. I know that if I set LC_ALL=C then sort starts to give the expected behavior again. But I do not understand why the default environment settings, as shown below, would make this happen.
$ locale
LANG="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_CTYPE="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_ALL=
I've read from many other questions (e.g. here, here, and here) how to avoid this behavior in sort, but still, this behavior is incredibly weird and unpredictable and has caused me a week of heartache. Can someone explain why sort with default environment settings on Mac OS X (10.8.5) would behave this way? In other words: what is sort doing (with locale variables set to en_US.UTF-8) to get that result?
I'm using
sort 5.93 November 2005
$ type sort
sort is /usr/bin/sort
UPDATE
I've discussed this on the GNU coreutils list and now understand why sort gave the output it did under the default English UTF-8 locale. In that locale the comma "," counts as part of a number (so that it can serve as a thousands separator), and sort is greedy when it reads a numeric key, so it read the example numbers as approximately
581.491...
590.000...
590.001...
591.492...
600.001...
601.492...
1213.08...
1214.15...
1226.76...
1250.45...
This was not what I had intended, and chepner is right: to get the result I want, I need to tell sort to key on the first field only. By default, the sort key extends to the end of the line rather than stopping at the end of the first field.
This behavior of sort is discussed in the GNU coreutils FAQ and is further specified in the POSIX description of sort.
As Eric Blake put it on the coreutils list, if the field separator is also a numeric character (which a comma is), then "Without -k to stop things, [the field-separator] serves as BOTH a separator AND a numeric character - you are sorting on numbers that span multiple fields."
I'm not sure this is entirely correct, but it's close.
sort -n -t, will try to sort numerically by the given key(s). In this case, the key is a tuple consisting of an integer and a float. Such tuples cannot be sorted numerically.
If you explicitly specify which single keys to sort on with
sort -k1,1n -k2,2n -t,
it should work. Now you are explicitly telling sort to first sort on the first field (numerically), then on the second field (also numerically).
I suspect that -n is useful as a global option only if each line of the input consists of a single numerical value. Otherwise, you need to use the -n option in conjunction with the -k option to specify exactly which fields are numbers.
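As a small illustration of the per-field keys, here is a sketch on a few lines from the question's data. The function name `sort_csv_numeric` is made up, and LC_ALL=C is an extra assumption added here so that the comma cannot be read as a thousands separator.

```shell
# Sort two-column CSV numerically by column 1, then by column 2.
# Each key is limited to a single field, so the comma is only a separator.
# Hypothetical helper; LC_ALL=C is set to keep the result locale-independent.
sort_csv_numeric() {
    LC_ALL=C sort -t, -k1,1n -k2,2n
}

printf '58,1.49270399401\n59,0.00182092924724\n59,0.000192136419373\n12,50.4592437035\n12,13.080339685\n' | sort_csv_numeric
```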
Use sort --debug to find out what's going on.
I've used that to explain in detail your issue at:
http://lists.gnu.org/archive/html/coreutils/2013-10/msg00004.html
If you use
cat example.csv | sort
instead of
cat example.csv | sort -n --field-separator=,
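then it would give the expected output for this sample. Keep in mind that this compares the lines as text rather than as numbers, so it only works here because every first field has two digits. Hope this is helpful to you.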
Note: I tested with "sort (GNU coreutils) 7.4"

Opposite of Linux Split

I have a huge file that I split into several small chunks to divide and conquer. Now I have a folder that contains a list of files like the ones below:
output_aa #(the output file done: cat input_aa | python parse.py > output_aa)
output_ab
output_ac
output_ad
...
I am wondering whether there is a way to merge those files back together FOLLOWING THE INDEX ORDER.
I know I could do it by using
cat * > output.all
but I am curious whether a magical command for this already exists and comes with split.
The magic command would be:
cat output_* > output.all
There is no need to sort the file names as the shell already does it (*).
As its name suggests, cat's original design was precisely to conCATenate files, which is basically the opposite of split.
(*) Edit:
Should you use a (hypothetical?) locale whose collating order for a-z is not abcdefghijklmnopqrstuvwxyz, here is one way to overcome the issue:
LC_ALL=C sh -c 'cat output_* > output.all'
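To convince yourself that cat really undoes split, a quick round trip can be sketched like this. Everything here is illustrative: the function name `roundtrip_check`, the file names and the 1 KiB chunk size are assumptions, and mktemp is assumed to be available.

```shell
# Split a generated sample file into 1 KiB chunks, glue the chunks back
# together with cat, and compare the result with the original.
# Hypothetical helper; returns 0 when the rebuilt file is identical.
roundtrip_check() {
    workdir=$(mktemp -d) || return 1
    # Sample input: 5000 numbered lines.
    awk 'BEGIN { for (i = 1; i <= 5000; i++) print "line", i }' > "$workdir/input.txt"
    # Produces chunk_aa, chunk_ab, ... next to the input file.
    split -b 1k "$workdir/input.txt" "$workdir/chunk_"
    # LC_ALL=C makes the inner shell expand the glob in byte order.
    LC_ALL=C sh -c 'cat "$1"/chunk_* > "$1/rebuilt.txt"' sh "$workdir"
    cmp -s "$workdir/input.txt" "$workdir/rebuilt.txt"
    status=$?
    rm -rf "$workdir"
    return "$status"
}

roundtrip_check && echo "identical"
```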
There are other ways to concatenate files, but there is no magical "opposite of split" in "linux".
Of course, talking about "linux" in general is a bit of a stretch, as distributions ship different tools (including different default shells such as sh, bash, csh, zsh or ksh), but at least for Debian-based distributions, I don't know of any that provides such a tool.
For sorting, you can use the "sort" command.
Also be aware that redirecting stdout with ">" overwrites any existing contents of the target file, while ">>" appends to it.
I don't want to copy, but to make this answer complete, what jlliagre said about the cat command applies here too: "cat" was made to con-"cat"-enate files, which makes it possible to reverse the split command. That only holds if the files are read back in the same order, so it's not exactly the "opposite of split", but it will work that way in close to 100% of cases (see the comments under jlliagre's answer for specifics).

strip version from package name using Bash

I'm trying to strip the version out of a package name using only Bash. I have one solution, but I don't think it's the best one available, so I'd like to know if there's a better way to do it. By better I mean cleaner and easier to understand.
Suppose I have the string "my-program-1.0" and I want only "my-program". My current solution is:
#!/bin/bash
PROGRAM_FULL="my-program-1.0"
INDEX_OF_LAST_CHARACTER=`awk '{print match($0, "[A-Za-z0-9]-[0-9]")}' <<< "$PROGRAM_FULL"`
PROGRAM_NAME=`cut -c -"$INDEX_OF_LAST_CHARACTER" <<< "$PROGRAM_FULL"`
actually, the "package name" syntax is an RPM file name, if it matters.
thanks!
Pretty well-suited to sed:
# Using your matching criterion (first hyphen with a number after it)
PROGRAM_NAME=$(echo "$PROGRAM_FULL" | sed 's/-[0-9].*//')
# Using a stronger match
PROGRAM_NAME=$(echo "$PROGRAM_FULL" | sed 's/-[0-9]\+\(\.[0-9]\+\)*$//')
The second match ensures that the version number is a sequence of numbers separated by dots (e.g. X, X.X, X.X.X, ...).
Edit: So there are comments all over based on the fact that the notion of version number isn't very well-defined. You'll have to write a regex for the input you expect. Hopefully you won't have anything as awful as "program-name-1.2.3-a". Absent any additional request from the OP though, I think all the answers here are good enough.
Bash:
program_full="my-program-1.0"
program_name=${program_full%-*} # remove the last hyphen and everything after
Produces "my-program"
Or
program_full="alsa-lib-1.0.17-1.el5.i386.rpm"
program_name=${program_full%%-[0-9]*} # remove the first hyphen followed by a digit and everything after
Produces "alsa-lib"
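The two expansions differ only in how much they remove, which a small wrapper makes easy to try out. This is a sketch; the function name `strip_version` is made up, and it assumes the version starts at the first hyphen that is followed by a digit.

```shell
# Print the package name without its version.
# %% removes the longest suffix matching "-<digit><anything>", so everything
# from the first hyphen-plus-digit onwards is dropped.
# Hypothetical helper name.
strip_version() {
    printf '%s\n' "${1%%-[0-9]*}"
}

strip_version "my-program-1.0"
strip_version "alsa-lib-1.0.17-1.el5.i386.rpm"
```

A name that itself contains a hyphen followed by a digit would be cut too early, so the pattern has to match the naming scheme you expect.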
How about:
$ echo my-program-1.0 | perl -pe 's/-[0-9]+(\.[0-9]+)+$//'
my-program

Resources