Separate a list of files by keyword group, then sort each group by reverse date - bash

I use this:
for f in $( ls -tr ${repSource}*.txt );
...to loop on a list of files, sorted oldest to newest.
I want to add another sort "filter": filenames that don't start with "abc" come first, whatever their timestamp.
So if the files are "abc.txt", "def.txt" and "ghi.txt", then "abc.txt" must be last, and the other two come before in the list (sorted by reverse date).

You don't want to parse the output of the ls command. The construct in your question is a classic bash pitfall and parsing ls is well known to be problematic.
You can instead use stat to get the time of each file that you find in a for loop on a pattern expansion.
#!/usr/bin/env bash

uname_s=$(uname -s)

stamp() {
    case "$uname_s" in
        Linux) stat -c '%Y' "$1" ;;
        Darwin|*BSD) stat -f '%m' "$1" ;;
    esac
}

inputfiles=( "$@" )

for file in "${inputfiles[@]}"; do
    n=$(stamp "$file")
    while [ -n "${forward[$n]}" ]; do
        ((n++))  # introduce fudge to avoid timestamp collisions
    done
    forward[$n]="$file"
done

declare -p inputfiles

for n in "${!forward[@]}"; do
    reverse[$(( 2**32 - n ))]="${forward[$n]}"
done

declare -p forward reverse
This example script takes a list of files as command line options (which can be a glob), then uses declare -p to show you the original list, the forward-sorted list, and a reverse-sorted list.
The case statement in the stamp function makes it portable between Linux, OS X (Darwin), FreeBSD, NetBSD, etc. since I don't know what operating system you're using. (If it's something less common like Solaris, HP/UX, etc, then the stat command may not be available or useful and this solution might not work.)
Once you have a sorted (non-associative) array in bash, you can process the files one by one with constructs like:
for file in "${forward[@]}"; do
    # do something with "$file"
done
or
for file in "${reverse[@]}"; do
    # do something with "$file"
done
And you can trust that since non-associative bash arrays are always numerically ordered, you'll get the files in their date order.
And of course, you've got the dates themselves as indexes, if you want. :)
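To also get the grouping the question asked for (names starting with abc pushed to the end, each group still in date order), here is one hedged sketch building on the same stat idea. The function name group_sort and the hard-coded abc prefix are illustrative only, and GNU stat -c '%Y' is assumed (swap in stat -f '%m' on BSD/macOS):

```shell
# Print the given files oldest-to-newest, except names starting with "abc",
# which go last (each group keeps its own date order).
group_sort() {
    local ordered=() line f
    # Build "mtime filename" lines, sort numerically, then strip the mtime.
    while IFS= read -r line; do
        ordered+=( "${line#* }" )
    done < <(for f in "$@"; do
                 printf '%s %s\n' "$(stat -c '%Y' "$f")" "$f"
             done | sort -n)
    for f in "${ordered[@]}"; do [[ ${f##*/} == abc* ]] || printf '%s\n' "$f"; done
    for f in "${ordered[@]}"; do [[ ${f##*/} == abc* ]] && printf '%s\n' "$f"; done
    return 0
}
```

A call like group_sort "${repSource}"*.txt would then replace the ls -tr loop from the question. (Filenames containing newlines would still confuse the read loop; this is a sketch, not a hardened tool.)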

The files:
ls -log
total 12
-rw-rw-r-- 1 4 Apr 8 11:15 abc.txt
-rw-rw-r-- 1 4 Apr 8 11:15 def.txt
-rw-rw-r-- 1 4 Apr 8 11:16 ghi.txt
This seems to work, but there must be a better way:
ls -log --time-style +%s | tail -n +2 | \
pee \
'grep -v abc.txt | sort -k4 -rn' \
'grep abc.txt | sort -k4 -rn' | \
cut -d " " -f 5
Output:
ghi.txt
def.txt
abc.txt
Note: the unfortunately named utility pee is from Debian's moreutils package; it is "like tee for pipes".
Now to make a loop of it:
# usage: foo filename # filter above sort code by filename
foo() { ls -log --time-style +%s | tail -n +2 | \
pee 'grep -v '"$1"' | sort -k4 -rn' \
'grep '"$1"' | sort -k4 -rn' | \
cut -d " " -f 5 ; }
for f in $( foo abc.txt ); do
    echo "$f"   # per-file processing goes here
done

Related

sort by name on bash same as graphical on windows

I have this folder in Windows.
If I do a simple ls or find, either in bash (Cygwin) or MS-DOS, it shows me this:
$ ls -1
su-01-01.jpg
su-01-02-03.jpg
su-01-12-13.jpg
su-01-14.jpg
su-01-15.jpg
su-01-16.jpg
su-01-18.jpg
su-01-19.jpg
su-01-20.jpg
su-01-21.jpg
su-01-31.jpg
su-01-34.jpg
su-01-35.jpg
su-01-38.jpg
su-01-39.jpg
su-01-42-43.jpg
su-01-44.jpg
su-01-45.jpg
su-01-47.jpg
su-01-48.jpg
su01-00.jpg
su01-04.jpg
su01-05.jpg
su01-06.jpg
su01-07.jpg
su01-08.jpg
I have tried sorting, but it does not take the 0, 00, 1 numbering into account:
$ ls -1 |sort -V
su01-00.jpg
su01-04.jpg
su01-05.jpg
su01-06.jpg
su01-07.jpg
su01-08.jpg
su01-09.jpg
su01-10.jpg
su01-11.jpg
su01-22-23.jpg
su01-24.jpg
su01-25.jpg
su01-26.jpg
su01-27.jpg
su01-28-29.jpg
su01-30.jpg
su01-32.jpg
su01-33.jpg
su01-40-41.jpg
su-01-01.jpg
su-01-02-03.jpg
su-01-12-13.jpg
su-01-14.jpg
su-01-15.jpg
But how do I make it ignore the "-"?
Thank you very much for your help.
find doesn't guarantee alphabetical ordering; ls and sort do, but the character - has ASCII value 45 while the character 0 has value 48, so su- will come ahead of su0 in an alphabetical sort.
While a printf '%s\n' su* | LANG=en_US.utf8 sort -n seems to display the files the way you want, the best thing to do for making your life easier would be to rename some of the files:
#!/bin/bash
for f in su0*
do
mv "$f" "su-0${f#su0}"
done
Update
renaming the files to 001.jpg 002.jpg ...
#!/bin/bash
shopt -s nullglob
n=1
while IFS='' read -r file
do
printf -v newname '%03d.%s' "$((n++))" "${file##*.}"
printf '%q %q %q\n' mv "$file" "$newname"
done < <(
printf '%s\n' su* |
sed -nE 's,su-?([^/]*)$,\1/&,p' |
LANG=C sort -nt '-' |
sed 's,[^/]*/,,'
)
The simplest way to control the sort order in Bash, both for ls and for sort, is to set your LANG variable to the locale you want.
In your .bashrc or .profile, add
export LANG=en_US.utf8
and then
ls -1
or
ls -1 | sort
will output the order you're looking for.
If you want to test with different locales and see their effect, you can set LANG one command at a time. For example, compare the output of these commands:
LANG=en_US.utf8 ls -1 # what you're looking for
LANG=C ls -1 # "ASCIIbetic" order
LANG=fr_FR.utf8 ls -1 # would consider é as between e and f
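The locale effect can be seen with sort alone; a minimal sketch using two of the sample names from the listing above (it assumes the en_US.utf8 locale is installed):

```shell
# ASCIIbetic: '-' (45) sorts before '0' (48), so the su- name comes first.
printf '%s\n' su-01-01.jpg su01-00.jpg | LANG=C sort

# Locale-aware: punctuation is typically ignored at the first comparison
# level, so su01-00.jpg usually sorts before su-01-01.jpg here.
printf '%s\n' su-01-01.jpg su01-00.jpg | LANG=en_US.utf8 sort
```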

Correct way of quoting command substitution

I have simple bash script which only outputs the filenames that are given to the script as positional arguments:
#!/usr/bin/env bash
for file; do
echo "$file"
done
Say I have files with spaces (say "f 1" and "f 2"). I can call the script with a wildcard and get the expected output:
$ ./script f*
> f 1
> f 2
But if I use command substitution it doesn't work:
$ ./script $(echo f*)
> f
> 1
> f
> 2
How can I get the quoting right when my command substitution outputs multiple filenames with spaces?
Edit: What I ultimately want is to pass filenames to a script (that is slightly more elaborate than just echoing their names) in a random order, e.g. something like this:
./script $(ls f* | shuf)
With GNU shuf and Bash 4.3+:
readarray -d '' files < <(shuf --zero-terminated --echo f*)
./script "${files[@]}"
where the --zero-terminated can handle any filenames, and readarray also uses the null byte as the delimiter.
With older Bash where readarray doesn't support the -d option:
while IFS= read -r -d '' f; do
files+=("$f")
done < <(shuf --zero-terminated --echo f*)
./script "${files[@]}"
In extreme cases with many files, this might run into command line length limitations; in that case,
shuf --zero-terminated --echo f*
could be replaced by
printf '%s\0' f* | shuf --zero-terminated
Hat tip to Socowi for pointing out --echo.
It's very difficult to get this completely correct. A simple attempt would be to use the %q format specifier of printf, but I believe that is a bashism. You still need to use eval, though. E.g.:
$ cat a.sh
#!/bin/sh
for x; do echo $((i++)): "$x"; done
$ ./a.sh *
0: a.sh
1: name
with
newlines
2: name with spaces
$ eval ./a.sh $(printf "%q " *)
0: a.sh
1: name
with
newlines
2: name with spaces
This feels like an XY Problem. Maybe you should explain the real problem, someone might have a much better solution.
Nonetheless, working with what you posted, I'd say read this page on why you shouldn't try to parse ls as it has relevant points; then I suggest an array.
lst=(f*)
./script "${lst[@]}"
This will still fail if you reparse it as the output of a subshell, though -
./script $( echo "${lst[@]}" ) # still fails the same way
./script "$( echo "${lst[@]}" )" # *still* fails the same way
Thinking about how we could make it work...
You can use xargs:
$ ls -l
total 4
-rw-r--r-- 1 root root 0 2021-08-13 00:23 ' file 1'
-rw-r--r-- 1 root root 0 2021-08-13 00:23 ' file 2'
-rw-r--r-- 1 root root 0 2021-08-13 00:23 ' file 3'
-rw-r--r-- 1 root root 0 2021-08-13 00:23 ' file 4'
-rwxr-xr-x 1 root root 35 2021-08-13 00:25 script
$ ./script *file*
file 1
file 2
file 3
file 4
$ ls *file* | shuf | xargs -d '\n' ./script
file 4
file 2
file 1
file 3
If your xargs does not support -d:
$ ls *file* | shuf | tr '\n' '\0' | xargs -0 ./script
file 3
file 1
file 4
file 2
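If you'd rather not involve ls at all, the same NUL-delimiter idea works end to end with find. A sketch assuming GNU find, shuf, and xargs; the throwaway directory and the stand-in script are hypothetical, created here only so the snippet can run as-is:

```shell
# Build a scratch directory with a trivial echo-args script and two files
# whose names contain spaces.
cd "$(mktemp -d)"
printf '#!/bin/sh\nfor f; do printf "%%s\\n" "$f"; done\n' > script
chmod +x script
touch 'f 1' 'f 2'

# NUL delimiters (-print0 / -z / -0) survive spaces and newlines in names.
find . -maxdepth 1 -name 'f*' -print0 | shuf -z | xargs -0 ./script
```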

append output of each iteration of a loop to the same file in bash

I have 44 files (2 for each chromosome) divided in two types: .vcf and .filtered.vcf.
I would like to make a wc -l for each of them in a loop and append the output always to the same file. However, I would like to have 3 columns in this file: chr[1-22], wc -l of .vcf and wc -l of .filtered.vcf.
I've been trying to do independent wc -l for each file and paste together columnwise the 2 outputs for each of the chromosomes, but this is obviously not very efficient, because I'm generating a lot of unnecessary files. I'm trying this code for the 22 pairs of files:
wc -l file1.vcf | cut -f 1 > out1.vcf
wc -l file1.filtered.vcf | cut -f 1 > out1.filtered.vcf
paste -d "\t" out1.vcf out1.filtered.vcf
I would like to have just one output file containing three columns:
Chromosome VCFCount FilteredVCFCount
chr1 out1 out1.filtered
chr2 out2 out2.filtered
Any help will be appreciated, thank you very much in advance :)
printf "%s\n" *.filtered.vcf |
cut -d. -f1 |
sort |
xargs -n1 sh -c 'printf "%s\t%s\t%s\n" "$1" "$(wc -l <"${1}.vcf")" "$(wc -l <"${1}.filtered.vcf")"' --
Output a newline-separated list of files in the directory.
Remove the extension with cut (something along the lines of xargs -i basename {} .filtered.vcf would probably be safer).
Sort it (for nicely sorted output!) (something along the lines of sort -tr -k2 -n would sort numerically and would be even better).
xargs -n1: for each file, execute the script given to sh -c.
printf "%s\t%s\t%s\n": output with a custom format string ...
"$1": the filename, and ...
"$(wc -l <"${1}.vcf")": the line count of the .vcf file, and ...
"$(wc -l <"${1}.filtered.vcf")": the line count of the .filtered.vcf file.
Example:
> touch chr{1..3}{,.filtered}.vcf
> echo > chr1.filtered.vcf ; echo > chr2.vcf ;
> printf "%s\n" *.filtered.vcf |
> cut -d. -f1 |
> sort |
> xargs -n1 sh -c 'printf "%s\t%s\t%s\n" "$1" "$(wc -l <"${1}.vcf")" "$(wc -l <"${1}.filtered.vcf")"' --
chr1 0 1
chr2 1 0
chr3 0 0
To have nice looking table with headers, use column:
> .... | column -N Chromosome,VCFCount,FilteredVCFCount -t -o ' '
Chromosome VCFCount FilteredVCFCount
chr1 0 1
chr2 1 0
chr3 0 0
Maybe try this.
printf 'Chromosome\tVCFCount\tFilteredVCFCount\n' >counts.txt  # remove this line if you don't want the header
for chr in chr*.vcf; do
    case $chr in *.filtered.vcf) continue ;; esac  # the glob matches these too
    base=${chr%.vcf}
    awk -v base="$base" '
    FNR==1 && n { p=n }
    { n=FNR }
    END { OFS="\t"; print base, p, n }' "$chr" "$base.filtered.vcf"
done >>counts.txt
The very simple Awk script just collects the highest line number for each file (so we basically reimplement wc -l) and prints the collected numbers in the desired format. FNR is the line number in the current input file; we simply save this, and copy the value to p to keep the saved value from the previous file in a separate variable when we switch to a new file (starting over at line number 1).
The shell parameter substitution ${variable%pattern} retrieves the value of variable with any suffix match on pattern removed. (There is also ${variable#pattern} to remove a prefix, and Bash has ## and %% to trim the longest pattern match instead of the shortest.)
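A quick illustration of the four variants, using a hypothetical variable:

```shell
path='chr7.filtered.vcf'
echo "${path%.vcf}"   # chr7.filtered   (shortest suffix match removed)
echo "${path%%.*}"    # chr7            (longest suffix match removed)
echo "${path#chr}"    # 7.filtered.vcf  (shortest prefix match removed)
echo "${path##*.}"    # vcf             (longest prefix match removed)
```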
If efficiency is important, you could probably refactor all of the script into a single Awk script, but this way, all the pieces are simple and hopefully understandable.
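For the curious, here is one hedged sketch of such a single-Awk refactor (not the code above: it counts lines per file and then walks the argument list in the END block, which stays within POSIX awk; the sample files are created only for the demo):

```shell
# Create a couple of sample files just for the demo (hypothetical data).
cd "$(mktemp -d)"
printf 'a\nb\nc\n' > chr1.vcf
printf 'a\n'       > chr1.filtered.vcf

# One awk pass: count lines per input file, then in END pair each
# chrN.vcf with its chrN.filtered.vcf sibling via the argument list.
awk 'BEGIN { OFS = "\t"; print "Chromosome", "VCFCount", "FilteredVCFCount" }
     { cnt[FILENAME]++ }
     END {
         for (i = 1; i < ARGC; i++) {
             f = ARGV[i]
             if (f ~ /\.filtered\.vcf$/) continue
             base = f; sub(/\.vcf$/, "", base)
             print base, cnt[f] + 0, cnt[base ".filtered.vcf"] + 0
         }
     }' chr*.vcf
```

The + 0 coerces missing counts (empty files) to 0 instead of printing an empty field.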

adding history reverse line number to a bash script

I have the following nice little bash function to make searches in my history (here for example, looking for ls commands):
history | grep --color=always ls | sort -k2 | uniq -f 1 | sort -n
I packaged it into a bash script, linked to an alias (histg) and it works great:
#!/bin/bash
if [ "$1" == "-h" ]; then
echo "A bash script to find patterns in history, avoiding duplicates (also non consecutive)"
echo "expands to: XX"
exit 0
fi
HISTSIZE=100000 # need this, because does not read .bashrc so does not know how big a HISTSIZE
HISTFILE=~/.bash_history # Or wherever you bash history file lives
set -o history # enable history
OUTPUT="$(history | grep --color=always "$1" | sort -k2 | uniq -f 1 | sort -n)"
echo "${OUTPUT}"
Typically, I get this kind of output:
$ histg SI
16424 git commit -m "working on SI"
16671 git commit -m "updated SI"
17782 cd SI/
However I want to do one more improvement, and I do not know how to proceed. I want to be able to quickly call those commands again, but as you see I have a big hist, so typing !17782 is a bit long. If the current size of my history is for example 17785 (I have a max history size 100000), I would like to see:
$ histg SI
16424 -1361 git commit -m "working on SI"
16671 -1114 git commit -m "updated SI"
17782 -3 cd ~/Desktop/crrt/wrk/SI/
so that I can type in -3
Any idea how I can adapt my bash command to add this column?
In a first try, my code was not working as expected because the negative numbers didn't match: the current session history was not taken into account. So I changed your script to a function (to add to .bashrc). The tricky part is handled by awk:
function histg() {
    history | grep --color=always "$1" | sort -k2 | uniq -f 1 | sort -n \
    | awk '
    BEGIN { hist_size = '"$(history | wc -l)"' }
    {
        n = $1; $1 = ""
        printf("%-7i %-7i %s\n", n, n - hist_size, $0)
    }'
    history -d "$(history 1 | awk '{print $1}')"
}
The last line deletes the call to histg itself from the history, so the negative numbers keep making sense.

Bash: Subshell behaviour of ls

I am wondering why I do not get the same output from:
ls -1 -tF | head -n 1
and
echo $(ls -1 -tF | head -n 1)
I am trying to get the last modified file, but when I use the command inside a subshell I sometimes get more than one file as the result.
Why is that, and how can I avoid it?
The problem arises because you are using an unquoted command substitution, and the -F flag for ls appends shell-special characters to filenames.
-F, --classify
append indicator (one of */=>@|) to entries
Executable files are appended with *.
When you run
echo $(ls -1 -tF | head -n 1)
then
$(ls -1 -tF | head -n 1)
will return a filename, and if that filename happens to be an executable and also a prefix of another filename, then the unquoted expansion will return both.
For example if you have
test.sh
test.sh.backup
then it will return
test.sh*
which when echoed expands to
test.sh test.sh.backup
Quoting the command substitution prevents this expansion:
echo "$(ls -1 -tF | head -n 1)"
returns
test.sh*
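The pitfall is easy to reproduce in a scratch directory (a sketch; touch -d assumes GNU coreutils):

```shell
cd "$(mktemp -d)"
touch -d '1 hour ago' test.sh.backup   # the older, plain file
touch test.sh && chmod +x test.sh      # the newest file, and executable

echo $(ls -1 -tF | head -n 1)     # unquoted: test.sh* re-globs to both names
echo "$(ls -1 -tF | head -n 1)"   # quoted: prints test.sh* literally
```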
I just found the error myself:
If you use echo $(ls -1 -tF | head -n 1), the filename globbing mechanism may result in additional matches, because if the result is an executable it has a * appended at the end.
So echo "$(ls -1 -tF | head -n 1)" avoids this.
I tried to put the reason for the -F in a comment, but decided to explain it here instead:
I added the following lines to my .bashrc, to have a shortcut to get last modified files or directories listed:
function L {
myvar=$1; h=${myvar:="1"};
echo "last ${h} modified file(s):";
export L=$(ls -1 -tF|fgrep -v / |head -n ${h}| sed 's/\(\*\|=\|@\)$//g' );
ls -l $L;
}
function LD {
myvar=$1;
h=${myvar:="1"};
echo "last ${h} modified directories:";
export LD=$(ls -1 -tF|fgrep / |head -n $h | sed 's/\(\*\|=\|@\)$//g'); ls -ld $LD;
}
alias ol='L; xdg-open $L'
alias cdl='LD; cd $LD'
So now I can use L (or L 5) to list the last (last 5) modified files. But not directories.
And with L; jmacs $L I can open my editor to edit it. Traditionally I used my alias lt='ls -lrt', but then I have to retype the name...
Now after mkdir ... I use cdl to change to that dir.
