How can I sort file names by version numbers? - bash

In the directory "data" are these files:
command-1.9a-setup
command-2.0a-setup
command-2.0c-setup
command-2.0-setup
I would like to sort the files to get this result:
command-1.9a-setup
command-2.0-setup
command-2.0a-setup
command-2.0c-setup
I tried this
find /data/ -name 'command-*-setup' | sort --version-sort --field-separator=- -k2
but the output was
command-1.9a-setup
command-2.0a-setup
command-2.0c-setup
command-2.0-setup
The only way I found that gave me my desired output was
tree -v /data
How could I get with sort the output in the wanted order?

Edit: It turns out that Benoit was sort of on the right track, and Roland tipped the balance.
You simply need to tell sort to consider only field 2 (add ",2"):
find ... | sort --version-sort --field-separator=- --key=2,2
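A minimal sketch of the corrected command, assuming GNU sort (needed for --version-sort) and using printf in place of find. The 10.1 entry is a hypothetical extra name, added only to show that 10 sorts after 2:

```shell
# Sample names from the question, plus one hypothetical 10.1 release.
names='command-2.0c-setup
command-10.1-setup
command-1.9a-setup
command-2.0-setup
command-2.0a-setup'

# --key=2,2 limits the key to field 2 alone.
# A bare --key=2 would extend the key to the end of the line.
sorted=$(printf '%s\n' "$names" |
  LC_ALL=C sort --version-sort --field-separator=- --key=2,2)
printf '%s\n' "$sorted"
```

LC_ALL=C is set only to keep the result independent of the local collation rules.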
Original answer (can be ignored):
If none of your filenames contain spaces between the hyphens, you can try this:
find ... | sed 's/.*-\([^-]*\)-.*/\1 \0/;s/[^0-9] /.&/' | sort --version-sort --field-separator=- --key=2 | sed 's/[^ ]* //'
The first sed command makes the lines look like this (I added "10" to show that the sort is numeric):
1.9.a command-1.9a-setup
2.0.c command-2.0c-setup
2.0.a command-2.0a-setup
2.0 command-2.0-setup
10 command-10-setup
The extra dot makes the letter suffixed version number sort after the version number without the suffix. The second sed command removes the prefixed version number from each line.
There are lots of ways this can fail.

Note that -k2 does not restrict sort to the second field: the key runs from field 2 to the end of the line, so the third field is compared as well.
In your case, try running sort --version-sort without any other argument; maybe that will suit you better.

Looks like this works:
find /data/ -name 'command-*-setup' | sort -t - -V -k 2,2
Not with sort, but this works:
tree -ivL 1 /data/ | perl -nlE 'say if /\Acommand-[0-9][0-9a-z.]*-setup\z/'
-v: sort the output by version
-i: makes tree not print the indentation lines
-L level: max display depth of the directory tree

Another way to do this is to pad your numbers.
This example pads all numbers to 8 digits.
Then, it does a plain alphanumeric sort.
Then, it removes the pad.
$ pad() { perl -pe 's/(\d+)/0000000\1/g' | perl -pe 's/0*(\d{8})/\1/g'; }
$ unpad() { perl -pe 's/0*([1-9]\d*|0)/\1/g'; }
$ cat files | pad | sort | unpad
command-1.9a-setup
command-2.0-setup
command-2.0a-setup
command-2.0c-setup
command-10.1-setup
To get some insight into how this works, let's look at the padded sorted result:
$ cat files | pad | sort
command-00000001.00000009a-setup
command-00000002.00000000-setup
command-00000002.00000000a-setup
command-00000002.00000000c-setup
command-00000010.00000001-setup
You'll see that with all the numbers nicely padded to 8 digits, the alphanumeric sort puts the filenames into their desired order.

Old post, but... ls -l --sort=version may be of assistance (although for OP's example the sort is the same as done by ls -l in a RHEL 7.2):
command-1.9a-setup
command-2.0a-setup
command-2.0c-setup
command-2.0-setup
YMMV, I guess.

$ cat files
command-1.9a-setup
command-2.0c-setup
command-10.1-setup
command-2.0a-setup
command-2.0-setup
$ cat files | sort -t- -k2,2 -n
command-1.9a-setup
command-2.0-setup
command-2.0a-setup
command-2.0c-setup
command-10.1-setup
$ tac files | sort -t- -k2,2 -n
command-1.9a-setup
command-2.0-setup
command-2.0a-setup
command-2.0c-setup
command-10.1-setup

I have files in a folder and need to list their names in sorted order, based on the number in each name. E.g.:
abc_dr-1.txt
hg_io-5.txt
kls_er_we-3.txt
sd-4.txt
sl_rt_we_yh-2.txt
I need to sort them by that number, so I used this:
ls -1 | sort -t '-' -nk2
It gave me the files sorted by number.
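A small sketch of why the numeric flag matters here, using printf in place of ls. The xy-10.txt name is a hypothetical addition, and the key is written as -k2,2n so that it stops at the end of field 2:

```shell
names='abc_dr-1.txt
hg_io-5.txt
xy-10.txt
kls_er_we-3.txt'

# Numeric key: 10 sorts after 5.
numeric=$(printf '%s\n' "$names" | LC_ALL=C sort -t '-' -k2,2n)

# Same key compared as text: "10.txt" sorts before "3.txt".
textual=$(printf '%s\n' "$names" | LC_ALL=C sort -t '-' -k2,2)

printf '%s\n' "$numeric"
```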

Related

Sort point character first

I would like to sort a list of file names using sort.
For instance:
file.ext
file1.ext
z_file2.ext
Using sort, I get
file1.ext
file.ext
z_file2.ext
How can I make file. sort before fileXXXX. ?
As suggested in a comment, your problem is that your locale produces an odd sort order. Setting the locale to C for the sort should fix the problem:
LC_ALL=C sort
For a more precise fix, assuming you want to use locale-aware collation order but still separate the sort key at the extension, specify . as the field delimiter and use two sort keys:
sort -t. -k1,1 -k2
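Both suggestions can be tried on the names from the question. This is only a sketch, with printf standing in for the real file list:

```shell
names='file1.ext
z_file2.ext
file.ext'

# Byte-wise comparison: "." (0x2e) sorts before "1" (0x31).
c_sorted=$(printf '%s\n' "$names" | LC_ALL=C sort)

# Split at the dot: the first key "file" is a prefix of "file1",
# so it sorts first whatever the locale does with punctuation.
key_sorted=$(printf '%s\n' "$names" | sort -t. -k1,1 -k2)

printf '%s\n' "$c_sorted"
printf '%s\n' "$key_sorted"
```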
You have to separate the filenames from the digits, sort them accordingly and merge back
$ sed -r 's/([0-9]*)\./ &/' file | sort -k1,1 -k2n | sed 's/ //'
file.ext
file1.ext
z_file2.ext
z_file11.ext
You can use -d option
From manpage:
-d, --dictionary-order consider only blanks and alphanumeric characters
$ cat toto
file.ext
file1.ext
z_file2.ext
$ sort -d toto
file1.ext
file.ext
z_file2.ext

Why doesn't uniq give non-duplicated results?

find ./2012 -type f | cut -d '/' -f 5 | uniq
The usual filenames look like
./2012/NY/F/Zoe
./2012/NJ/M/Zoe
I expected the command above to list each file name, such as Zoe, only once, but it does not.
Why, and how should I write it to get the desired result?
uniq only detects duplicates if they're on consecutive lines. The usual idiom is sort | uniq, which ensures that any duplicates appear together.
uniq requires the duplicates to be adjacent, which means you need to sort the input, which means you might as well use sort -u:
find ./2012 -type f | cut -d/ -f5 | sort -u
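The adjacency rule is easy to see on a tiny hypothetical list in which the two Zoe entries are separated by another name (Amy is made up for the illustration):

```shell
names='Zoe
Amy
Zoe'

# uniq alone: the two "Zoe" lines are not adjacent, so both survive.
adjacent_only=$(printf '%s\n' "$names" | uniq)

# Sorting first brings the duplicates together; -u then drops them.
deduplicated=$(printf '%s\n' "$names" | LC_ALL=C sort -u)

printf '%s\n' "$deduplicated"
```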

Unix shell script to sort files depending on the 'date string' present in their file name

I am trying to sort the files in a directory by the 'date string' embedded in each file name. For example, the files look like this:
SSA_F12_05122013.request.done
SSA_F13_12142012.request.done
SSA_F14_01062013.request.done
Here 05122013, 12142012 and 01062013 represent dates in MMDDYYYY format.
Please help me with a Unix shell script that sorts these files by the date string in their file names (in both ascending and descending order).
Thanks in advance.
Hmmm... why call on heavyweights like awk and Perl when sort itself has the capability to define what exactly to sort by?
ls SSA_F*.request.done | sort -k 1.13,1.16 -k 1.9,1.10 -k 1.11,1.12
Each -k option defines a "sort key":
-k 1.13,1.16
This defines a sort key ranging from field 1, column 13 to field 1, column 16. (A field is by default delimited by whitespace, which your filenames don't have.)
If your filenames vary in length, defining the underscore as the field separator (using the -t option) and then addressing columns in the third field would be the way to go.
Refer to man sort for details. Use the -r option to sort in descending order.
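A sketch of the field-based variant, assuming the date strings are MMDDYYYY (as the column positions in the command above imply) and that exactly two underscores precede the date:

```shell
names='SSA_F12_05122013.request.done
SSA_F13_12142012.request.done
SSA_F14_01062013.request.done'

# Field 3 (split on "_") starts with MMDDYYYY:
# year = characters 5-8, month = characters 1-2, day = characters 3-4.
ascending=$(printf '%s\n' "$names" |
  LC_ALL=C sort -t _ -k 3.5,3.8 -k 3.1,3.2 -k 3.3,3.4)

# -r reverses the comparison for descending order.
descending=$(printf '%s\n' "$names" |
  LC_ALL=C sort -r -t _ -k 3.5,3.8 -k 3.1,3.2 -k 3.3,3.4)

printf '%s\n' "$ascending"
```

Because the keys are addressed relative to the third field, the length of the first two fields no longer matters.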
One way with awk and sort:
ls -1|awk -F'[_.]' '{s=gensub(/^([0-9]{4})(.*)/,"\\2\\1","g",$3);print s,$0}'|sort|awk '$0=$NF'
If we break it down:
ls -1|
awk -F'[_.]' '{s=gensub(/^([0-9]{4})(.*)/,"\\2\\1","g",$3);print s,$0}'|
sort|
awk '$0=$NF'
The ls -1 is just an example; I assume you have your own way to get the file list, one per line.
A quick test:
kent$ echo "SSA_F13_12142012.request.done
SSA_F12_05122013.request.done
SSA_F14_01062013.request.done"|awk -F'[_.]' '{s=gensub(/^([0-9]{4})(.*)/,"\\2\\1","g",$3);print s,$0}'|
sort|
awk '$0=$NF'
SSA_F13_12142012.request.done
SSA_F14_01062013.request.done
SSA_F12_05122013.request.done
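gensub is a gawk extension, so the pipeline above may not run with other awk implementations. A rough sketch of the same decorate, sort, undecorate idea using only sed, sort and cut, assuming every line matches the name_MMDDYYYY.rest pattern:

```shell
names='SSA_F13_12142012.request.done
SSA_F12_05122013.request.done
SSA_F14_01062013.request.done'

# 1. sed prefixes each line with the date rewritten as YYYYMMDD.
# 2. sort orders the lines by that prefix.
# 3. cut removes the temporary prefix again.
sorted=$(printf '%s\n' "$names" |
  sed 's/^.*_\(....\)\(....\)\..*$/\2\1 &/' |
  LC_ALL=C sort |
  cut -d ' ' -f 2-)

printf '%s\n' "$sorted"
```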
ls -lrt *.done | perl -lane '@a=split /_|\./,$F[scalar(@F)-1];$a[2]=~s/(..)(..)(....)/$3$1$2/g;print $a[2]." ".$_' | sort -rn | awk '{$1=""}1'
ls *.done | perl -pe 's/^.*_(..)(..)(....)/$3$1$2$&/' | sort -rn | cut -b9-
This would do.

Get the newest file based on timestamp

I am new to shell scripting, so I need some help with how to go about this problem.
I have a directory which contains files in the following format. The files are in a directory called /incoming/external/data
AA_20100806.dat
AA_20100807.dat
AA_20100808.dat
AA_20100809.dat
AA_20100810.dat
AA_20100811.dat
AA_20100812.dat
As you can see, the filename includes a timestamp, i.e. [RANGE]_[YYYYMMDD].dat
What I need to do is find out which of these files has the newest date, using the timestamp in the filename rather than the system timestamp, store that filename in a variable, move it to one directory, and move the rest to a different directory.
For those who just want an answer, here it is:
ls | sort -n -t _ -k 2 | tail -1
Here's the thought process that led me here.
I'm going to assume the [RANGE] portion could be anything.
Start with what we know.
Working Directory: /incoming/external/data
Format of the Files: [RANGE]_[YYYYMMDD].dat
We need to find the most recent [YYYYMMDD] file in the directory, and we need to store that filename.
Available tools (I'm only listing the relevant tools for this problem ... identifying them becomes easier with practice):
ls
sed
awk (or nawk)
sort
tail
I guess we don't need sed, since we can work with the entire output of ls command. Using ls, awk, sort, and tail we can get the correct file like so (bear in mind that you'll have to check the syntax against what your OS will accept):
NEWESTFILE=`ls | awk -F_ '{print $1 $2}' | sort -n -k 2,2 | tail -1`
Then it's just a matter of putting the underscore back in, which shouldn't be too hard.
EDIT: I had a little time, so I got around to fixing the command, at least for use in Solaris.
Here's the convoluted first pass (this assumes that ALL files in the directory are in the same format: [RANGE]_[yyyymmdd].dat). I'm betting there are better ways to do this, but this works with my own test data (in fact, I found a better way just now; see below):
ls | awk -F_ '{print $1 " " $2}' | sort -n -k 2 | tail -1 | sed 's/ /_/'
... while writing this out, I discovered that you can just do this:
ls | sort -n -t _ -k 2 | tail -1
I'll break it down into parts.
ls
Simple enough ... gets the directory listing, just filenames. Now I can pipe that into the next command.
awk -F_ '{print $1 " " $2}'
This is the awk command. It lets you take an input line and modify it in a specific way. Here, all I'm doing is specifying that awk should break the input wherever there is an underscore (_). I do this with the -F option. This gives me two halves of each filename. I then tell awk to output the first half ($1), followed by a space (" "), followed by the second half ($2). Note that the space was the part that was missing from my initial suggestion. Also, this is unnecessary, since you can specify a separator in the sort command below.
Now the output is split into [RANGE] [yyyymmdd].dat on each line. Now we can sort this:
sort -n -k 2
This takes the input and sorts it based on the 2nd field. The sort command uses whitespace as a separator by default. While writing this update, I found the documentation for sort, which allows you to specify the separator, so AWK and SED are unnecessary. Take the ls and pipe it through the following sort:
sort -n -t _ -k 2
This achieves the same result. Now you only want the last file, so:
tail -1
If you used awk to separate the file (which just adds extra complexity, so don't do it), you can replace the space with an underscore again with sed:
sed 's/ /_/'
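A quick sketch of the final pipeline, with printf standing in for ls. The ZZ and MM prefixes are hypothetical, chosen so that the newest date does not belong to the alphabetically last name:

```shell
names='ZZ_20100806.dat
AA_20100812.dat
MM_20100810.dat'

# -t _ splits at the underscore, -k 2 sorts on the date part,
# and -n compares its leading digits as a number.
newest=$(printf '%s\n' "$names" | LC_ALL=C sort -n -t _ -k 2 | tail -n 1)

printf '%s\n' "$newest"
```

The result is chosen by the date field alone, so the [RANGE] part can indeed be anything that contains no underscore.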
Some good info here, but I'm sure most people aren't going to read down to the bottom like this.
This should work:
newest=$(ls | sort -t _ -k 2,2 | tail -n 1)
others=($(ls | sort -t _ -k 2,2 | head -n -1))
mv "$newest" newdir
mv "${others[@]}" otherdir
It won't work if there are spaces in the filenames although you could modify the IFS variable to affect that.
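One possible way around the whitespace problem, sketched here in a scratch directory with hypothetical sample files: move the newest file first, then let a glob pick up everything that is left, so the older names never pass through word splitting. It still assumes that no filename contains a newline.

```shell
# Scratch directory with hypothetical sample files, one containing a space.
workdir=$(mktemp -d)
mkdir "$workdir/incoming" "$workdir/newdir" "$workdir/otherdir"
touch "$workdir/incoming/AA_20100806.dat" \
      "$workdir/incoming/A A_20100810.dat" \
      "$workdir/incoming/AA_20100812.dat"

(
  cd "$workdir/incoming" || exit 1
  # Pick the name whose second underscore-separated field sorts last.
  newest=$(ls | LC_ALL=C sort -t _ -k 2,2 | tail -n 1)
  mv -- "$newest" ../newdir/
  # Whatever is left is older; the glob keeps names with spaces intact.
  mv -- * ../otherdir/
)
```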
Try:
$ ls -lr
Hope it helps.
Use:
ls -r -1 AA_*.dat | head -n 1
(assuming there are no other files matching AA_*.dat)
ls -1 AA* | sort -r | head -1
Due to the naming convention of the files, alphabetical order is the same as date order. I'm pretty sure that in bash '*' expands out alphabetically (but can not find any evidence in the manual page), ls certainly does, so the file with the newest date, would be the last one alphabetically.
Therefore, in bash
mv $(ls | tail -1) first-directory
mv * second-directory
Should do the trick.
If you want to be more specific about the choice of file, then replace * with something else - for example AA_*.dat
My solution to this is similar to others, but a little simpler.
ls -tr | tail -1
What it actually does is rely on ls to sort the output, then use tail to get the last listed file name.
This solution will not work if the filename you require has a leading dot (e.g. .profile).
This solution does work if the file name contains a space.

Shell script to sort debian version numbers (line_5.4.3-2) [duplicate]

I have a text file with entries representing build tags with version and build number in the same format as debian packages like this:
nimbox-apexer_1.0.0-12
nimbox-apexer_1.1.0-2
nimbox-apexer_1.1.0-1
nimbox-apexer_1.0.0-13
Using a shell script I need to sort the above list by 'version-build' and get the last line, which in the above example is nimbox-apexer_1.1.0-2.
Get the latest build with:
cat file.txt | sort -V | tail -n1
Now, to catch it into a variable:
BUILD=$(cat file.txt | sort -V | tail -n1)
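A short sketch of what -V buys over a plain sort, assuming GNU sort and using a few of the tags from this thread fed in with printf instead of a file:

```shell
tags='nimbox-apexer_1.0.0-12
nimbox-apexer_1.10.0-9
nimbox-apexer_1.1.0-2
nimbox-apexer_1.9.0-1
nimbox-apexer_1.0.0-13'

# Version sort compares each run of digits as a number, so 1.10.0 > 1.9.0.
latest=$(printf '%s\n' "$tags" | LC_ALL=C sort -V | tail -n 1)

# A plain text sort compares character by character and ranks 1.9.0 last.
latest_textual=$(printf '%s\n' "$tags" | LC_ALL=C sort | tail -n 1)

printf '%s\n' "$latest"
```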
sort -n -t "_" -k2.3 file | tail -1
cat file.txt | cut -d_ -f 2 | sed "s/-/./g" | sort -t . -k 1,1n -k 2,2n -k 3,3n -k 4,4n
The n suffix on each key makes sort compare that field numerically, so each of the four fields is ordered as a number regardless of how many digits it has.
With GNU sort:
sort --version-sort file | tail -n -1
GNU tail doesn't like tail -1.
I haven't been able to find a simple way to do this. I've been looking at code that sorts IP addresses, which is similar to my problem, and trying to convert my situation into that one. This is what I have come up with. Please tell me there is a simpler, better way!
sed 's/^[^0-9]*\([0-9]*\)\.\([0-9]*\)\.\([0-9]*\)-\([0-9]*\)/\1.\2.\3.\4 &/' list.txt | \
sort -t . -n -k 1,1 -k 2,2 -k 3,3 -k 4,4 | \
sed 's/^[^ ]* \(.*\)/\1/' | \
tail -n 1
So starting with this data:
nimbox-apexer_11.9.0-2
nimbox-apexer_1.10.0-9
nimbox-apexer_1.9.0-1
nimbox-apexer_1.0.0-12
nimbox-apexer_1.1.0-2
nimbox-apexer_1.1.0-1
nimbox-apexer_1.0.0-13
The first sed converts my problem into a sorting IPs problem keeping the original line to reverse the change at the end:
11.9.0.2 nimbox-apexer_11.9.0-2
1.10.0.9 nimbox-apexer_1.10.0-9
1.9.0.1 nimbox-apexer_1.9.0-1
1.0.0.12 nimbox-apexer_1.0.0-12
1.1.0.2 nimbox-apexer_1.1.0-2
1.1.0.1 nimbox-apexer_1.1.0-1
1.0.0.13 nimbox-apexer_1.0.0-13
The sort orders the lines using the first four numbers, which in my case represent major.minor.release.build:
1.0.0.12 nimbox-apexer_1.0.0-12
1.0.0.13 nimbox-apexer_1.0.0-13
1.1.0.1 nimbox-apexer_1.1.0-1
1.1.0.2 nimbox-apexer_1.1.0-2
1.9.0.1 nimbox-apexer_1.9.0-1
1.10.0.9 nimbox-apexer_1.10.0-9
11.9.0.2 nimbox-apexer_11.9.0-2
The last sed eliminates the data used to sort
nimbox-apexer_1.0.0-12
nimbox-apexer_1.0.0-13
nimbox-apexer_1.1.0-1
nimbox-apexer_1.1.0-2
nimbox-apexer_1.9.0-1
nimbox-apexer_1.10.0-9
nimbox-apexer_11.9.0-2
Finally tail gets the last line which is the one I need.

Resources