Sort point character first - bash

I would like to sort a list of file names using sort.
For instance:
file.ext
file1.ext
z_file2.ext
Using sort, I get
file1.ext
file.ext
z_file2.ext
How can I make file. sort before fileXXXX.?

As suggested in a comment, your problem is that your locale produces an odd sort order. Setting the locale to C for the sort should fix the problem:
LC_ALL=C sort
For a more precise fix, assuming you want to use locale-aware collation order but still separate the sort key at the extension, specify . as the field delimiter and use two sort keys:
sort -t. -k1,1 -k2
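To compare the two suggestions on the sample names from the question, a small sketch follows. The function names are made up for illustration and are not part of the answer above.

```shell
# Byte-order sort: in the C locale '.' (0x2E) sorts before digits and letters.
sort_c_locale() {
  LC_ALL=C sort
}

# Locale-aware sort that compares the part before the first '.' on its own,
# then breaks ties on everything after it.
sort_by_stem() {
  sort -t. -k1,1 -k2
}

printf '%s\n' file1.ext file.ext z_file2.ext | sort_c_locale
printf '%s\n' file1.ext file.ext z_file2.ext | sort_by_stem
```

Both calls should list file.ext before file1.ext. Note that the C locale also changes how uppercase and non-ASCII characters are ordered, which may or may not be what you want.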

You have to separate the filenames from the digits, sort on the two parts, and then merge them back:
$ sed -r 's/([0-9]*)\./ &/' file | sort -k1,1 -k2n | sed 's/ //'
file.ext
file1.ext
z_file2.ext
z_file11.ext

You can use -d option
From manpage:
-d, --dictionary-order consider only blanks and alphanumeric characters
$ cat toto
file.ext
file1.ext
z_file2.ext
$ sort -d toto
file1.ext
file.ext
z_file2.ext

Related

How can I deduplicate filenames across directories?

I run the following gsutil command:
gsutil ls -d gs://mybucket/v${version}/folder1/*/*.whl |
sort -V |
grep -e "/*.whl"
I get:
gs://mybucket/v1.0.0/folder1/1560924028/file1-cp27-cp27mu-linux_x86_64.whl
gs://mybucket/v1.0.0/folder1/1560926922/file1-cp36-cp36m-linux_x86_64.whl
gs://mybucket/v1.0.0/folder1/1560930522/file1-cp35-cp35m-linux_x86_64.whl
gs://mybucket/v1.0.0/folder1/1561568612/file1-cp37-cp37m-linux_x86_64.whl
gs://mybucket/v1.0.0/folder1/1561595893/file1-cp37-cp37m-linux_x86_64.whl
gs://mybucket/v1.0.0/folder1/1561654308/file1-cp37-cp37m-linux_x86_64.whl
gs://mybucket/v1.0.0/folder1/1563319372/file1-cp27-cp27mu-linux_x86_64.whl
gs://mybucket/v1.0.0/folder1/1563319400/file1-cp36-cp36m-linux_x86_64.whl
gs://mybucket/v1.0.0/folder1/1563329633/file1-cp27-cp27mu-linux_x86_64.whl
gs://mybucket/v1.0.0/folder1/1563411368/file1-cp35-cp35m-linux_x86_64.whl
gs://mybucket/v1.0.0/folder1/1565916833/file1-cp27-cp27mu-linux_x86_64.whl
gs://mybucket/v1.0.0/folder1/1565921265/file1-cp35-cp35m-linux_x86_64.whl
gs://mybucket/v1.0.0/folder1/1566258114/file1-cp27-cp27mu-linux_x86_64.whl
Since some files in different folders have the same names, how can I retrieve unique filenames ignoring the path?
I would do it like this:
blabla_your_command | rev | sort -t'/' -u -k1,1 | rev
rev reverses lines. Then I unique sort using / as a separator on the first field. After the line is reversed, the first field will be the filename, so sorting -u on it would return only unique filenames. Then the line needs to be reversed back.
The following command:
cat <<EOF |
gs://mybucket/v1.0.0/folder1/1560924028/file1-cp27-cp27mu-linux_x86_64.whl
gs://mybucket/v1.0.0/folder1/1560926922/file1-cp36-cp36m-linux_x86_64.whl
gs://mybucket/v1.0.0/folder1/1560930522/file1-cp35-cp35m-linux_x86_64.whl
gs://mybucket/v1.0.0/folder1/1561568612/file1-cp37-cp37m-linux_x86_64.whl
gs://mybucket/v1.0.0/folder1/1561595893/file1-cp37-cp37m-linux_x86_64.whl
gs://mybucket/v1.0.0/folder1/1561654308/file1-cp37-cp37m-linux_x86_64.whl
gs://mybucket/v1.0.0/folder1/1563319372/file1-cp27-cp27mu-linux_x86_64.whl
gs://mybucket/v1.0.0/folder1/1563319400/file1-cp36-cp36m-linux_x86_64.whl
gs://mybucket/v1.0.0/folder1/1563329633/file1-cp27-cp27mu-linux_x86_64.whl
gs://mybucket/v1.0.0/folder1/1563411368/file1-cp35-cp35m-linux_x86_64.whl
gs://mybucket/v1.0.0/folder1/1565916833/file1-cp27-cp27mu-linux_x86_64.whl
gs://mybucket/v1.0.0/folder1/1565921265/file1-cp35-cp35m-linux_x86_64.whl
gs://mybucket/v1.0.0/folder1/1566258114/file1-cp27-cp27mu-linux_x86_64.whl
EOF
rev | sort -t'/' -u -k1,1 | rev
outputs:
gs://mybucket/v1.0.0/folder1/1560930522/file1-cp35-cp35m-linux_x86_64.whl
gs://mybucket/v1.0.0/folder1/1560926922/file1-cp36-cp36m-linux_x86_64.whl
gs://mybucket/v1.0.0/folder1/1561568612/file1-cp37-cp37m-linux_x86_64.whl
gs://mybucket/v1.0.0/folder1/1560924028/file1-cp27-cp27mu-linux_x86_64.whl
Try the awk option below. It splits each line on '/' and prints the last field, which is the file name. It worked for me.
Example:
gsutil ls gs://mybucket/v1.0.0/folder1/1560930522 | awk -F/ '{print $(NF)}'
This prints all the file names under '1560930522'.
your_command|awk -F/ '!($NF in a){a[$NF]; print}'
gs://mybucket/v1.0.0/folder1/1560924028/file1-cp27-cp27mu-linux_x86_64.whl
gs://mybucket/v1.0.0/folder1/1560926922/file1-cp36-cp36m-linux_x86_64.whl
gs://mybucket/v1.0.0/folder1/1560930522/file1-cp35-cp35m-linux_x86_64.whl
gs://mybucket/v1.0.0/folder1/1561568612/file1-cp37-cp37m-linux_x86_64.whl
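The awk one-liner above comes without explanation, so here is the same idea spelled out with comments. This is only a sketch; the function name and the array name `seen` are illustrative choices.

```shell
# Keep the first line seen for each distinct file name (the text after the last '/').
unique_by_basename() {
  awk -F/ '
    !($NF in seen) {   # $NF is the last "/"-separated field, i.e. the file name
      seen[$NF]        # remember this name (referencing it creates the array entry)
      print            # emit the full path of the first occurrence
    }'
}

printf '%s\n' \
  gs://mybucket/v1.0.0/folder1/1560924028/file1-cp27-cp27mu-linux_x86_64.whl \
  gs://mybucket/v1.0.0/folder1/1560926922/file1-cp36-cp36m-linux_x86_64.whl \
  gs://mybucket/v1.0.0/folder1/1563319372/file1-cp27-cp27mu-linux_x86_64.whl |
  unique_by_basename
```

Unlike the rev/sort approach, this keeps the input order and always retains the first path in which each name appears.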
4 different ways of saying the same thing
nawk -F'^.+/' '++_[$NF]<NF'
gawk -F'/' '__[$NF]++<!_'
mawk -F/ '_^__[$NF]++'
mawk2 -F/ '!_[$NF]--'
gs://mybucket/v1.0.0/folder1/1560924028/file1-cp27-cp27mu-linux_x86_64.whl
gs://mybucket/v1.0.0/folder1/1560926922/file1-cp36-cp36m-linux_x86_64.whl
gs://mybucket/v1.0.0/folder1/1560930522/file1-cp35-cp35m-linux_x86_64.whl
gs://mybucket/v1.0.0/folder1/1561568612/file1-cp37-cp37m-linux_x86_64.whl
Here's a simple, straightforward solution:
$ your_gsutil_command | xargs -L 1 basename | sort -u
The easiest way to remove paths is with basename. Unfortunately it accepts only a single filename, which must be on the command line (not from stdin), so we need to take the following steps:
Create the list of files.
We do this with your_gsutil_command, but you can use any command that generates a list of files.
Send each one to basename to remove its path.
The xargs command does this for us by reading its stdin and invoking basename repeatedly, passing the data as command-line arguments. But xargs efficiently tries to reduce the number of invocations by passing multiple filenames on each command line, and that breaks basename. We prevent that with -L 1, limiting it to only one line (that is, one filename) at a time.
Remove duplicates.
The sort -u command does this.
Using your example data:
$ gsutil ls -d gs://mybucket/v${version}/folder1/*/*.whl |
xargs -L 1 basename | sort -u
file1-cp27-cp27mu-linux_x86_64.whl
file1-cp35-cp35m-linux_x86_64.whl
file1-cp36-cp36m-linux_x86_64.whl
file1-cp37-cp37m-linux_x86_64.whl
Caveat: Spaces break everything. 😡
So far we've assumed the filenames and folders do not contain spaces. Spaces break basename because it needs exactly one filename, and the spaces would be interpreted as separators between multiple filenames. We can get around this in two ways:
ls -Q: If you're deduplicating local filenames, you can use the (non-gsutil) ls command with the -Q flag to put the filenames in quotes, so basename will interpret spaces as part of the filenames rather than separators.
gsutil: The -Q flag is unfortunately not supported, so we'll need to escape the spaces manually:
$ your_gsutil_command | sed 's/ /\\ /g' | xargs -L 1 basename | sort -u
Here we use the sed command to escape each space by inserting a backslash before it. (That is, we replace each space with a backslash followed by a space. Note that we also need to escape the backslash in the sed command, which is why we use \\ and not just \.)
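Another way to sidestep the quoting problem, offered as an alternative sketch and not as part of the answer above, is to strip the directory part with sed. It works line by line and never splits on spaces. The function name is made up for the example.

```shell
# Delete everything up to and including the last '/' on each line, then deduplicate.
unique_basenames() {
  sed 's|.*/||' | sort -u
}

printf '%s\n' \
  'gs://mybucket/a/my file.whl' \
  'gs://mybucket/b/my file.whl' \
  'gs://mybucket/b/other.whl' |
  unique_basenames
```

This assumes one path per line, so names containing newlines are still not handled.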

Sort text file with cat and sort concatenation

I got a txt file with some content looking like
stuff,stuff,2012-12-12
morestuff,morestuff,2012-09-09
evenmorestuff,yeah,2012-08-02
and I want to use cat and sort to get them reverse ordered by the date as an output on my command-line by concatenation.
Not sure why you think you need to cat a file into sort, but here are two options:
cat yourFile | sort -t, -k3r
sort -t, -k3r yourFile
To test this I did
echo "stuff,stuff,2012-12-12
morestuff,morestuff,2012-09-09
evenmorestuff,yeah,2012-08-02" \
| sort -t, -k3r
output
stuff,stuff,2012-12-12
morestuff,morestuff,2012-09-09
evenmorestuff,yeah,2012-08-02
And finally, you can overwrite your existing file using the -o option like
sort -t, -o yourFile -k3r yourFile
Thanks to @karakfa for reminding me of your requirement for a reverse-order sort. This is accomplished by adding an r to the key specification, hence -k3r.
IHTH
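A slightly stricter variant, sketched here under the assumption that the date is always the third comma-separated field, ends the key at field 3 with -k3,3r. For the sample data the result is the same, because the date is the last field, but the key no longer runs to the end of the line if more columns are added. The function name is illustrative.

```shell
# Sort comma-separated records by the third field only, newest date first.
# ISO dates (YYYY-MM-DD) order correctly as plain text, so no -n is needed.
sort_by_date_desc() {
  sort -t, -k3,3r "$@"
}

printf '%s\n' \
  'evenmorestuff,yeah,2012-08-02' \
  'stuff,stuff,2012-12-12' \
  'morestuff,morestuff,2012-09-09' |
  sort_by_date_desc
```

Passing a file name instead of piping works as well, for example sort_by_date_desc yourFile.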

Sort text file using bash sort

I'm trying to sort the following file by date with earliest to latest:
$NAME DIA
# Date,Open,High,Low,Close,Volume,Adj Close
01-10-2014,169.91,169.98,167.42,167.68,11019000,167.68
29-04-2014,164.62,165.27,164.49,165.00,4581400,163.40
17-10-2013,152.11,153.59,152.05,153.48,9916600,150.26
06-09-2013,149.70,149.97,147.77,149.09,9001900,145.68
02-11-2012,132.56,132.61,130.47,130.67,5141300,125.01
01-11-2012,131.02,132.44,130.97,131.98,3807400,126.27
sort -t- -k3 -k2 -k1 DIA.txt gets the year right but scrambles the month and day.
any help would be greatly appreciated.
This seems to produce correct output
sort -s -t- -k3,3 -k2,2 -k1,1
output:
$ sort -s -t- -k3,3 -k2,2 -k1,1 dia.txt
# Date,Open,High,Low,Close,Volume,Adj Close
01-11-2012,131.02,132.44,130.97,131.98,3807400,126.27
02-11-2012,132.56,132.61,130.47,130.67,5141300,125.01
06-09-2013,149.70,149.97,147.77,149.09,9001900,145.68
17-10-2013,152.11,153.59,152.05,153.48,9916600,150.26
29-04-2014,164.62,165.27,164.49,165.00,4581400,163.40
01-10-2014,169.91,169.98,167.42,167.68,11019000,167.68
I would try changing the date format first.
sed -r "s/(..)-(..)-(....)/\\3-\\2-\\1/" DIA.txt | sort
You can also change it back after sorting the lines.
sed -r "s/(..)-(..)-(....)/\\3-\\2-\\1/" DIA.txt | sort | sed -r "s/(....)-(..)-(..)/\\3-\\2-\\1/"
sort's -k flag takes a start and an end position (-kPOS1,POS2) that together delimit a single key. To involve further columns, repeat -k: each later key breaks ties left by the earlier ones. With - as the delimiter the third field runs to the end of the line, so restrict the year key to its first four characters:
sort -t'-' -k3.1,3.4 -k2,2 -k1,1 d
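A sketch that keeps the header lines in place and sorts only the data rows by year, month and day. It assumes the file has exactly two header lines, as in the question, and the function name is invented for the example. The year key is limited to four characters because, with - as the delimiter, the third field otherwise extends to the end of the line and the prices would take part in the comparison.

```shell
# Print the two header lines unchanged, then sort the remaining rows by
# year (characters 1-4 of field 3), month (field 2) and day (field 1).
sort_ddmmyyyy() {
  sed -n '1,2p' "$1"
  tail -n +3 "$1" | sort -t- -k3.1,3.4 -k2,2 -k1,1
}

# Usage with a small sample file (the data rows come from the question).
sample=$(mktemp)
cat > "$sample" <<'EOF'
$NAME DIA
# Date,Open,High,Low,Close,Volume,Adj Close
01-10-2014,169.91,169.98,167.42,167.68,11019000,167.68
29-04-2014,164.62,165.27,164.49,165.00,4581400,163.40
02-11-2012,132.56,132.61,130.47,130.67,5141300,125.01
01-11-2012,131.02,132.44,130.97,131.98,3807400,126.27
EOF
sort_ddmmyyyy "$sample"
rm -f "$sample"
```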

Unix shell script to sort files depending on the 'date string' present in their file name

I am trying to sort files in a directory, depending on the 'date string' attached in the file name, for example files looks as below
SSA_F12_05122013.request.done
SSA_F13_12142012.request.done
SSA_F14_01062013.request.done
Here 05122013, 12142012 and 01062013 represent dates in MMDDYYYY format.
Please help me in providing a unix shell script to sort these files on the date string present in their file name(in descending and ascending order).
Thanks in advance.
Hmmm... why call on heavyweights like awk and Perl when sort itself has the capability to define what exactly to sort by?
ls SSA_F*.request.done | sort -k 1.13,1.16 -k 1.9,1.10 -k 1.11,1.12
Each -k option defines a "sort key":
-k 1.13,1.16
This defines a sort key ranging from field 1, column 13 to field 1, column 16. (A field is by default delimited by whitespace, which your filenames don't have.)
If your filenames are varying in length, defining the underscore as field separator (using the -t option) and then addressing columns in the third field would be the way to go.
Refer to man sort for details. Use the -r option to sort in descending order.
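The field-separator variant mentioned above might look like the following sketch. It assumes the date is MMDDYYYY (inferred from the sample names) and that it always starts the third underscore-separated field; the function name is illustrative.

```shell
# Sort names like SSA_F12_05122013.request.done by the MMDDYYYY date in the
# third '_'-separated field: year (chars 5-8), then month (1-2), then day (3-4).
sort_by_name_date() {
  sort -t_ -k3.5,3.8 -k3.1,3.2 -k3.3,3.4 "$@"
}

printf '%s\n' \
  SSA_F12_05122013.request.done \
  SSA_F13_12142012.request.done \
  SSA_F14_01062013.request.done |
  sort_by_name_date        # ascending; pass -r for descending
```

Because the keys are addressed relative to the third field, the prefix before the date can vary in length, as long as it contains exactly two underscores.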
one way with awk and sort:
ls -1|awk -F'[_.]' '{s=gensub(/^([0-9]{4})(.*)/,"\\2\\1","g",$3);print s,$0}'|sort|awk '$0=$NF'
if we break it down:
ls -1|
awk -F'[_.]' '{s=gensub(/^([0-9]{4})(.*)/,"\\2\\1","g",$3);print s,$0}'|
sort|
awk '$0=$NF'
The ls -1 is just an example; I assume you have your own way to get the file list, one name per line.
A quick test:
kent$ echo "SSA_F13_12142012.request.done
SSA_F12_05122013.request.done
SSA_F14_01062013.request.done"|awk -F'[_.]' '{s=gensub(/^([0-9]{4})(.*)/,"\\2\\1","g",$3);print s,$0}'|
sort|
awk '$0=$NF'
SSA_F13_12142012.request.done
SSA_F14_01062013.request.done
SSA_F12_05122013.request.done
ls -lrt *.done | perl -lane '@a=split /_|\./,$F[scalar(@F)-1];$a[2]=~s/(..)(..)(....)/$3$1$2/g;print $a[2]." ".$_' | sort -rn | awk '{$1=""}1'
ls *.done | perl -pe 's/^.*_(..)(..)(....)/$3$1$2$&/' | sort -rn | cut -b9-
This would do it.

How can I sort file names by version numbers?

In the directory "data" are these files:
command-1.9a-setup
command-2.0a-setup
command-2.0c-setup
command-2.0-setup
I would like to sort the files to get this result:
command-1.9a-setup
command-2.0-setup
command-2.0a-setup
command-2.0c-setup
I tried this
find /data/ -name 'command-*-setup' | sort --version-sort --field-separator=- -k2
but the output was
command-1.9a-setup
command-2.0a-setup
command-2.0c-setup
command-2.0-setup
The only way I found that gave me my desired output was
tree -v /data
How could I get with sort the output in the wanted order?
Edit: It turns out that Benoit was sort of on the right track and Roland tipped the balance
You simply need to tell sort to consider only field 2 (add ",2"):
find ... | sort --version-sort --field-separator=- --key=2,2
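A small sketch of the difference, assuming GNU sort (--version-sort is not part of POSIX). With --key=2 the key runs from field 2 to the end of the line, so the text after the version ("-setup") is compared too. With --key=2,2 only the version string itself is compared. The function name is made up for the example.

```shell
# Version-sort on the second '-'-separated field only (GNU sort; -V is not in POSIX).
sort_by_version_field() {
  sort --version-sort --field-separator=- --key=2,2 "$@"
}

printf '%s\n' \
  command-2.0a-setup \
  command-2.0-setup \
  command-10.1-setup \
  command-1.9a-setup |
  sort_by_version_field
```

In practice the input would come from the find command in the question, not from printf.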
Original Answer: ignore
If none of your filenames contain spaces between the hyphens, you can try this:
find ... | sed 's/.*-\([^-]*\)-.*/\1 \0/;s/[^0-9] /.&/' | sort --version-sort --field-separator=- --key=2 | sed 's/[^ ]* //'
The first sed command makes the lines look like this (I added "10" to show that the sort is numeric):
1.9.a command-1.9a-setup
2.0.c command-2.0c-setup
2.0.a command-2.0a-setup
2.0 command-2.0-setup
10 command-10-setup
The extra dot makes the letter suffixed version number sort after the version number without the suffix. The second sed command removes the prefixed version number from each line.
There are lots of ways this can fail.
If you specify to sort that you only want to consider the second field (-k2) don't complain that it does not consider the third one.
In your case, run sort --version-sort without any other argument, maybe this will suit better.
Looks like this works:
find /data/ -name 'command-*-setup' | sort -t - -V -k 2,2
not with sort but it works:
tree -ivL 1 /data/ | perl -nlE 'say if /\Acommand-[0-9][0-9a-z.]*-setup\z/'
-v: sort the output by version
-i: makes tree not print the indentation lines
-L level: max display depth of the directory tree
Another way to do this is to pad your numbers.
This example pads all numbers to 8 digits.
Then, it does a plain alphanumeric sort.
Then, it removes the pad.
$ pad() { perl -pe 's/(\d+)/0000000$1/g' | perl -pe 's/0*(\d{8})/$1/g'; }
$ unpad() { perl -pe 's/0*([1-9]\d*|0)/$1/g'; }
$ cat files | pad | sort | unpad
command-1.9a-setup
command-2.0-setup
command-2.0a-setup
command-2.0c-setup
command-10.1-setup
To get some insight into how this works, let's look at the padded sorted result:
$ cat files | pad | sort
command-00000001.00000009a-setup
command-00000002.00000000-setup
command-00000002.00000000a-setup
command-00000002.00000000c-setup
command-00000010.00000001-setup
You'll see that with all the numbers nicely padded to 8 digits, the alphanumeric sort puts the filenames into their desired order.
Old post, but... ls -l --sort=version may be of assistance (although for OP's example the sort is the same as done by ls -l in a RHEL 7.2):
command-1.9a-setup
command-2.0a-setup
command-2.0c-setup
command-2.0-setup
YMMV i guess.
$ cat files
command-1.9a-setup
command-2.0c-setup
command-10.1-setup
command-2.0a-setup
command-2.0-setup
$ cat files | sort -t- -k2,2 -n
command-1.9a-setup
command-2.0-setup
command-2.0a-setup
command-2.0c-setup
command-10.1-setup
$ tac files | sort -t- -k2,2 -n
command-1.9a-setup
command-2.0-setup
command-2.0a-setup
command-2.0c-setup
command-10.1-setup
I have files in a folder and need to list their names sorted by the number they contain, e.g.:
abc_dr-1.txt
hg_io-5.txt
kls_er_we-3.txt
sd-4.txt
sl_rt_we_yh-2.txt
I need to sort them based on number.
So I used this to sort.
ls -1 | sort -t '-' -nk2
It gave me files in sort order based on number.

Resources