Sort by name in bash the same as the graphical view on Windows

I have this folder on Windows.
If I do a simple ls or find, either in bash (Cygwin) or in the Windows command prompt, it shows me this:
$ ls -1
su-01-01.jpg
su-01-02-03.jpg
su-01-12-13.jpg
su-01-14.jpg
su-01-15.jpg
su-01-16.jpg
su-01-18.jpg
su-01-19.jpg
su-01-20.jpg
su-01-21.jpg
su-01-31.jpg
su-01-34.jpg
su-01-35.jpg
su-01-38.jpg
su-01-39.jpg
su-01-42-43.jpg
su-01-44.jpg
su-01-45.jpg
su-01-47.jpg
su-01-48.jpg
su01-00.jpg
su01-04.jpg
su01-05.jpg
su01-06.jpg
su01-07.jpg
su01-08.jpg
I have tried sorting, but it does not take the 0 / 00 / 1 prefixes into account:
$ ls -1 | sort -V
su01-00.jpg
su01-04.jpg
su01-05.jpg
su01-06.jpg
su01-07.jpg
su01-08.jpg
su01-09.jpg
su01-10.jpg
su01-11.jpg
su01-22-23.jpg
su01-24.jpg
su01-25.jpg
su01-26.jpg
su01-27.jpg
su01-28-29.jpg
su01-30.jpg
su01-32.jpg
su01-33.jpg
su01-40-41.jpg
su-01-01.jpg
su-01-02-03.jpg
su-01-12-13.jpg
su-01-14.jpg
su-01-15.jpg
But how do I make it ignore the hyphen (-)?
Thank you very much for your help.

find doesn't guarantee alphabetical ordering; ls and sort do, but the character - has ASCII value 45 while 0 has value 48, so su- comes ahead of su0 in an alphabetical (byte-order) sort.
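You can confirm the character values with printf, which prints the code of a character given a leading quote, and reproduce the ordering with a byte-wise (LANG=C) sort:
$ printf '%d %d\n' "'-" "'0"
45 48
$ printf '%s\n' su01-00.jpg su-01-01.jpg | LANG=C sort
su-01-01.jpg
su01-00.jpg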
While printf '%s\n' su* | LANG=en_US.utf8 sort -n seems to display the files the way you want, the best thing to do to make your life easier would be to rename some of the files:
#!/bin/bash
for f in su0*
do
    mv "$f" "su-0${f#su0}"
done
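The ${f#su0} expansion strips the shortest prefix matching su0, so prepending su-0 to the remainder rebuilds the hyphenated name; a quick illustration:
$ f=su01-04.jpg
$ echo "su-0${f#su0}"
su-01-04.jpg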
Update
Renaming the files to 001.jpg, 002.jpg, ...:
#!/bin/bash
shopt -s nullglob
n=1
while IFS='' read -r file
do
    printf -v newname '%03d.%s' "$((n++))" "${file##*.}"
    printf '%q %q %q\n' mv "$file" "$newname"
done < <(
    printf '%s\n' su* |
        sed -nE 's,su-?([^/]*)$,\1/&,p' |
        LANG=C sort -nt '-' |
        sed 's,[^/]*/,,'
)
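Note that this script only prints the quoted mv commands so you can review them first; once the output looks right it can be piped to a shell (assuming you saved it as renumber.sh, a name used here just for illustration):
$ ./renumber.sh        # review the generated mv commands
$ ./renumber.sh | bash # actually perform the renames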

The simplest way to control the sort order in Bash, both for ls and sort, is to set your LANG variable to the locale you want.
In your .bashrc or .profile, add
export LANG=en_US.utf8
and then
ls -1
or
ls -1 | sort
will output the order you're looking for.
If you want to test with different locales and see their effect, you can set LANG one command at a time. For example, compare the output of these commands:
LANG=en_US.utf8 ls -1 # what you're looking for
LANG=C ls -1 # "ASCIIbetic" order
LANG=fr_FR.utf8 ls -1 # would consider é as between e and f
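One caveat if LANG seems to have no effect: LC_ALL and LC_COLLATE, when set, take precedence over LANG for sorting, so check them and, if necessary, override them instead:
locale                    # show the effective locale settings
LC_ALL=en_US.utf8 ls -1   # LC_ALL overrides both LC_COLLATE and LANG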

Related

Correct way of quoting command substitution

I have a simple bash script which only outputs the filenames that are given to the script as positional arguments:
#!/usr/bin/env bash
for file; do
    echo "$file"
done
Say I have files with spaces (say "f 1" and "f 2"). I can call the script with a wildcard and get the expected output:
$ ./script f*
> f 1
> f 2
But if I use command substitution it doesn't work:
$ ./script $(echo f*)
> f
> 1
> f
> 2
How can I get the quoting right when my command substitution outputs multiple filenames with spaces?
Edit: What I ultimately want is to pass filenames to a script (that is slightly more elaborate than just echoing their names) in a random order, e.g. something like this:
./script $(ls f* | shuf)
With GNU shuf and Bash 4.3+:
readarray -d '' files < <(shuf --zero-terminated --echo f*)
./script "${files[@]}"
where --zero-terminated can handle any filenames, and readarray likewise uses the NUL byte as the delimiter.
With older Bash where readarray doesn't support the -d option:
while IFS= read -r -d '' f; do
    files+=("$f")
done < <(shuf --zero-terminated --echo f*)
./script "${files[@]}"
In extreme cases with many files, this might run into command line length limitations; in that case,
shuf --zero-terminated --echo f*
could be replaced by
printf '%s\0' f* | shuf --zero-terminated
Hat tip to Socowi for pointing out --echo.
It's very difficult to get this completely correct. A simple attempt would be to use the %q specifier of printf, but I believe that is a bashism. You still need to use eval, though, e.g.:
$ cat a.sh
#!/bin/sh
for x; do echo $((i++)): "$x"; done
$ ./a.sh *
0: a.sh
1: name
with
newlines
2: name with spaces
$ eval ./a.sh $(printf "%q " *)
0: a.sh
1: name
with
newlines
2: name with spaces
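To see exactly what eval receives, you can print the quoted string first; with the same three files the output looks roughly like this (the exact quoting style varies between bash versions):
$ printf '%q\n' *
a.sh
$'name\nwith\nnewlines'
name\ with\ spaces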
This feels like an XY Problem. Maybe you should explain the real problem; someone might have a much better solution.
Nonetheless, working with what you posted, I'd say read this page on why you shouldn't try to parse ls, as it has relevant points; then I suggest an array.
lst=(f*)
./script "${lst[@]}"
This will still fail if you reparse it as the output of a subshell, though:
./script $( echo "${lst[@]}" ) # still fails the same way
./script "$( echo "${lst[@]}" )" # *still* fails the same way
Thinking about how we could make it work...
You can use xargs:
$ ls -l
total 4
-rw-r--r-- 1 root root 0 2021-08-13 00:23 ' file 1'
-rw-r--r-- 1 root root 0 2021-08-13 00:23 ' file 2'
-rw-r--r-- 1 root root 0 2021-08-13 00:23 ' file 3'
-rw-r--r-- 1 root root 0 2021-08-13 00:23 ' file 4'
-rwxr-xr-x 1 root root 35 2021-08-13 00:25 script
$ ./script *file*
file 1
file 2
file 3
file 4
$ ls *file* | shuf | xargs -d '\n' ./script
file 4
file 2
file 1
file 3
If your xargs does not support -d:
$ ls *file* | shuf | tr '\n' '\0' | xargs -0 ./script
file 3
file 1
file 4
file 2
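Both pipelines still parse ls with a newline delimiter, so filenames that themselves contain newlines would break them; if that matters, the glob can feed shuf directly (a sketch using the same GNU tools):
printf '%s\0' *file* | shuf -z | xargs -0 ./script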

Bash: Subshell behaviour of ls

I am wondering why I do not get the same output from:
ls -1 -tF | head -n 1
and
echo $(ls -1 -tF | head -n 1)
I am trying to get the last modified file, but when I use the command inside a subshell I sometimes get more than one file as a result.
Why is that, and how can I avoid it?
The problem arises because you are using an unquoted subshell, and the -F flag makes ls append shell special characters to filenames.
-F, --classify
append indicator (one of */=>@|) to entries
Executable files are appended with *.
When you run
echo $(ls -1 -tF | head -n 1)
then
$(ls -1 -tF | head -n 1)
will return a filename, and if that file happens to be executable and its name is also a prefix of another filename, then the expansion will return both.
For example if you have
test.sh
test.sh.backup
then it will return
test.sh*
which when echoed expands to
test.sh test.sh.backup
Quoting the subshell prevents this expansion
echo "$(ls -1 -tF | head -n 1)"
returns
test.sh*
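If the goal is only the newest file's name, dropping -F sidesteps the problem entirely (a sketch, with the usual caveat that ls output breaks on filenames containing newlines):
newest=$(ls -t | head -n 1)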
I just found the error:
If you use echo $(ls -1 -tF | head -n 1), the file globbing mechanism may result in additional matches, because if the result is an executable it contains a * at the end.
So echo "$(ls -1 -tF | head -n 1)" avoids this.
I tried to explain the why of -F in a comment, but decided to put it here instead:
I added the following lines to my .bashrc, to have a shortcut to get last modified files or directories listed:
function L {
    myvar=$1; h=${myvar:="1"}
    echo "last ${h} modified file(s):"
    export L=$(ls -1 -tF | fgrep -v / | head -n ${h} | sed 's/\(\*\|=\|@\)$//g')
    ls -l $L
}
function LD {
    myvar=$1; h=${myvar:="1"}
    echo "last ${h} modified directories:"
    export LD=$(ls -1 -tF | fgrep / | head -n $h | sed 's/\(\*\|=\|@\)$//g')
    ls -ld $LD
}
alias ol='L; xdg-open $L'
alias cdl='LD; cd $LD'
So now I can use L (or L 5) to list the last (or last 5) modified files, but not directories.
And with L; jmacs $L I can open my editor to edit that file. Traditionally I used my alias lt='ls -lrt', but then I had to retype the name...
Now, after mkdir ..., I use cdl to change to that directory.

How to store NUL output of a program in bash script?

Suppose there is a directory 'foo' which contains several files:
ls foo:
1.aa 2.bb 3.aa 4.cc
Now, in a bash script, I want to count the number of files with a specific suffix in 'foo' and display them, e.g.:
SUFF='aa'
FILES=`ls -1 *."$SUFF" foo`
COUNT=`echo $FILES | wc -l`
echo "$COUNT files have suffix $SUFF, they are: $FILES"
The problem is: if SUFF='dd', $COUNT is also equal to 1. After googling, the reason I found is that when SUFF='dd', $FILES is an empty string, not really the null output of a program, and wc considers an echoed empty string (with its trailing newline) to have one line. Null output can only be passed through pipes. So one solution is:
COUNT=`ls -1 *."$SUFF" foo | wc -l`
but this will lead to the ls command being executed twice. So my question is: is there any more elegant way to achieve this?
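The empty-string behaviour described above is easy to confirm: echo of an empty string still emits a newline, while printf with an empty format emits nothing:
$ echo "" | wc -l
1
$ printf "" | wc -l
0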
$ shopt -s nullglob
$ FILES=(*)
$ echo "${#FILES[@]}"
4
$ FILES=(*aa)
$ echo "${#FILES[@]}"
2
$ FILES=(*dd)
$ echo "${#FILES[@]}"
0
$ SUFFIX=aa
$ FILES=(*"$SUFFIX")
$ echo "${#FILES[@]}"
2
$ SUFFIX=dd
$ FILES=(*"$SUFFIX")
$ echo "${#FILES[@]}"
0
You can also try this:
#!/bin/bash
SUFF='aa'
FILES=`ls -1 *."$SUFF" foo`
FILENAMES=`echo $FILES | awk -F ':' '{print $2}'`
COUNT=`echo $FILENAMES | wc -w`
echo "$COUNT files have suffix $SUFF, they are: $FILENAMES"
If you insert echo $FILES in your script, the output is foo: 1.aa 2.aa 3.aa, so
awk -F ':' '{print $2}' gets 1.aa 2.aa 3.aa from the $FILES variable, and
wc -w prints the word count.
If you only need the file count, I would actually use find for that:
find '/path/to/directory' -mindepth 1 -maxdepth 1 -name '*.aa' -printf '\n' | wc -l
This is more reliable, as it correctly handles filenames with line breaks. It works because find outputs one empty line for each matching file.
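Note that -printf is specific to GNU find; on BSD or macOS a near-equivalent (a sketch, trading one printf process per file for portability) would be:
find '/path/to/directory' -mindepth 1 -maxdepth 1 -name '*.aa' -exec printf '\n' \; | wc -l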
Edit: If you want to keep the file list in an array, you can use a glob:
GLOBIGNORE=".:.."
shopt -s nullglob
FILES=(*aa)
COUNT=${#FILES[@]}
echo "$COUNT"
The reason is that the option nullglob is unset by default in bash:
If no matching file names are found, and the shell option nullglob is not enabled, the word is left unchanged. If the nullglob option is set, and no matches are found, the word is removed.
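A quick illustration of the difference, assuming no file matches *.dd as in the question:
$ echo *.dd
*.dd
$ shopt -s nullglob
$ echo *.dd          # the glob now expands to nothing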
So, just set the nullglob option, and run your code again:
shopt -s nullglob
SUFF='aa'
FILES="$(printf '%s\n' foo/*."$SUFF")"
COUNT="$(printf '%.0s\n' foo/*."$SUFF" | wc -l)"
echo "$COUNT files have suffix $SUFF, they are: $FILES"
Or better yet:
shopt -s nullglob
suff='aa'
files=( foo/*."$suff" )
count=${#files[@]}
echo "$count files have suffix $suff, they are: ${files[@]}"

Separate a list of files by keyword group, then sort each group by reverse date

I use this:
for f in $( ls -tr ${repSource}*.txt );
...to loop on a list of files, sorted oldest to newest.
I want to add another sort "filter": filenames that don't start with "abc" come first, whatever their timestamp.
So if the files are "abc.txt", "def.txt" and "ghi.txt", then "abc.txt" must be last, and the other two come before in the list (sorted by reverse date).
You don't want to parse the output of the ls command. The construct in your question is a classic bash pitfall and parsing ls is well known to be problematic.
You can instead use stat to get the time of each file that you find in a for loop on a pattern expansion.
#!/usr/bin/env bash
uname_s=$(uname -s)
stamp() {
    case "$uname_s" in
        Linux) stat -c '%Y' "$1" ;;
        Darwin|*BSD) stat -f '%m' "$1" ;;
    esac
}
inputfiles=( "$@" )
for file in "${inputfiles[@]}"; do
    n=$(stamp "$file")
    while [ -n "${forward[$n]}" ]; do
        ((n++)) # introduce fudge to avoid timestamp collisions
    done
    forward[$n]="$file"
done
declare -p inputfiles
for n in "${!forward[@]}"; do
    reverse[$(( 2**32 - $n ))]="${forward[$n]}"
done
declare -p forward reverse
This example script takes a list of files as command-line arguments (which can come from a glob), then uses declare -p to show you the original list, the forward-sorted list, and the reverse-sorted list.
The case statement in the stamp function makes it portable between Linux, OS X (Darwin), FreeBSD, NetBSD, etc., since I don't know what operating system you're using. (If it's something less common like Solaris or HP-UX, the stat command may not be available or useful, and this solution might not work.)
Once you have a sorted (non-associative) array in bash, you can process the files one by one with constructs like:
for file in "${forward[@]}"; do
    # do something with $file
done
or
for file in "${reverse[@]}"; do
    # do something with $file
done
And you can trust that since non-associative bash arrays are always numerically ordered, you'll get the files in their date order.
And of course, you've got the dates themselves as indexes, if you want. :)
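Invocation would look something like this (the script name here is hypothetical):
$ ./sortbydate.sh "${repSource}"*.txt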
The files:
ls -log
total 12
-rw-rw-r-- 1 4 Apr 8 11:15 abc.txt
-rw-rw-r-- 1 4 Apr 8 11:15 def.txt
-rw-rw-r-- 1 4 Apr 8 11:16 ghi.txt
This seems to work, but there must be a better way:
ls -log --time-style +%s | tail -n +2 | \
pee \
'grep -v abc.txt | sort -k4 -rn' \
'grep abc.txt | sort -k4 -rn' | \
cut -d " " -f 5
Output:
ghi.txt
def.txt
abc.txt
Note: the unseemly-named utility pee is from Debian's moreutils package; it's like tee for pipes.
Now to make a loop of it:
# usage: foo filename # filter above sort code by filename
foo() { ls -log --time-style +%s | tail -n +2 | \
    pee 'grep -v '"$1"' | sort -k4 -rn' \
        'grep '"$1"' | sort -k4 -rn' | \
    cut -d " " -f 5 ; }
for f in $( foo abc.txt )
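If moreutils isn't installed, the same two-way split can be done without pee by running the listing twice (a sketch with the same parsing-ls caveats as above):
{ ls -log --time-style +%s | tail -n +2 | grep -v abc.txt | sort -k4 -rn
  ls -log --time-style +%s | tail -n +2 | grep abc.txt | sort -k4 -rn
} | cut -d " " -f 5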

Randomizing arg order for a bash for statement

I have a bash script that processes all of the files in a directory using a loop like
for i in *.txt
do
ops.....
done
There are thousands of files and they are always processed in alphanumerical order because of '*.txt' expansion.
Is there a simple way to randomize the order and still ensure that I process each file exactly once?
Assuming the filenames do not have spaces, just substitute the output of List::Util::shuffle.
for i in `perl -MList::Util=shuffle -e'$,=$";print shuffle<*.txt>'`; do
....
done
If filenames do have spaces but don't have embedded newlines or backslashes, read a line at a time.
perl -MList::Util=shuffle -le'$,=$\;print shuffle<*.txt>' | while read i; do
....
done
To be completely safe in Bash, use NUL-terminated strings.
perl -MList::Util=shuffle -0 -le'$,=$\;print shuffle<*.txt>' |
while read -r -d '' i; do
....
done
Not very efficient, but it is possible to do this in pure Bash if desired. sort -R does something like this, internally.
declare -a a # create an integer-indexed array
for i in *.txt; do
    j=$RANDOM # find an unused slot
    while [[ -n ${a[$j]} ]]; do
        j=$RANDOM
    done
    a[$j]=$i # fill that slot
done
for i in "${a[@]}"; do # iterate in index order (which is random)
....
done
Or use a traditional Fisher-Yates shuffle.
a=(*.txt)
for ((i=${#a[*]}; i>1; i--)); do
    j=$[RANDOM%i]
    tmp=${a[$j]}
    a[$j]=${a[$[i-1]]}
    a[$[i-1]]=$tmp
done
for i in "${a[@]}"; do
....
done
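The $[...] arithmetic form is obsolete; for reference, the same shuffle with the modern $((...)) syntax (behaviour unchanged, though note that RANDOM only yields 0..32767):
a=(*.txt)
for ((i=${#a[@]}; i>1; i--)); do
    j=$((RANDOM % i))
    tmp=${a[j]}
    a[j]=${a[i-1]}
    a[i-1]=$tmp
done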
You could pipe your filenames through the sort command:
ls | sort --random-sort | xargs ....
Here's an answer that relies on very basic awk functions, so it should be portable between Unices.
ls -1 | awk '{print rand()*100, $0}' | sort -n | awk '{print $2}'
EDIT:
ephemient makes a good point that the above is not space-safe. Here's a version that is:
ls -1 | awk '{print rand()*100, $0}' | sort -n | sed 's/[0-9\.]* //'
If you have GNU coreutils, you can use shuf:
while read -d '' f
do
    # some stuff with $f
done < <(shuf -ze *)
This will work with files with spaces or newlines in their names.
Off-topic Edit:
To illustrate SiegeX's point in the comment:
$ a=42; echo "Don't Panic" | while read line; do echo $line; echo $a; a=0; echo $a; done; echo $a
Don't Panic
42
0
42
$ a=42; while read line; do echo $line; echo $a; a=0; echo $a; done < <(echo "Don't Panic"); echo $a
Don't Panic
42
0
0
The pipe causes the while to be executed in a subshell and so changes to variables in the child don't flow back to the parent.
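In bash 4.2+ there is a third option: shopt -s lastpipe runs the last element of a pipeline in the current shell (it requires job control to be off, which is the default in scripts), so the assignment survives:
#!/bin/bash
shopt -s lastpipe
a=42
echo "Don't Panic" | while read -r line; do echo "$line"; a=0; done
echo "$a"   # prints 0, not 42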
Here's a solution with standard unix commands:
for i in $(ls); do echo $RANDOM-$i; done | sort | cut -d- -f 2-
Here's a Python solution, if it's available on your system:
import glob
import random

files = glob.glob("*.txt")
if files:
    random.shuffle(files)  # shuffles the list in place (returns None)
    for file in files:
        print(file)
