Sort folders by name

I would like to sort a list of subfolders contained in a folder by name, in increasing order:
Ex: folder names:
TF_list_to_test10679
TF_list_to_test1062
TF_list_to_test1078
...
Desired output:
TF_list_to_test1062
TF_list_to_test1078
TF_list_to_test10679
...
How can this be done?

The following should work if you don't have file names with tabs:
find . -type d | sed 's/[[:digit:]]*$/&\t&/' | sort -nk 2 | cut -f 1
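Here, sed copies the trailing digits into a second, tab-separated field; sort -nk 2 sorts numerically on that copy; and cut -f 1 drops the helper field again. If your sort comes from GNU coreutils, version sort offers a shorter alternative (a sketch; -V compares embedded digit runs numerically):
find . -type d | sort -V
With the sample names, 1062 then sorts before 1078 and 10679, as desired.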

Related

delete all but the last match

I want to delete all but the last match of a set of files matching file* that are present in each folder within a directory.
For example:
Folder 1
file
file_1-1
file_1-2
file_2-1
stuff.txt
stuff
Folder 2
file_1-1
file_1-2
file_1-3
file_2-1
file_2-2
stuff.txt
Folder 3
...
and so on. Within every subfolder I want to keep only the last of the matched files, so for Folder 1 this would be file_2-1, in Folder 2 it would be file_2-2. The number of files is generally different within each subfolder.
Since I have a deeply nested folder structure, I thought about using the find command, something like this:
find . -type f -name "file*" -delete_all_but_last_match
I know how to delete all matches but not how to exclude the last match.
I also found the following piece of code:
https://askubuntu.com/questions/1139051/how-to-delete-all-but-x-last-items-from-find
but when I apply a modified version to a test folder
find . -type f -name "file*" -print0 | head -zn-1 | xargs -0 rm -rf
it deletes all the matches in most folders; only in some is the last file spared. So it does not work for me, presumably because head -zn -1 drops only the single last entry of the entire list rather than the last match in each folder.
Edit:
The folders contain no further subfolders, but they generally sit at the end of several subfolder levels. It would therefore be a benefit if the script could be executed from some levels above as well.
With globstar enabled, you can loop over every directory at any depth, collect the matches into an array, drop the last element, and delete the rest:
#!/bin/bash
shopt -s globstar nullglob
for dir in **/; do
    files=("$dir"file*)                  # matches in this directory, sorted by name
    (( ${#files[@]} > 1 )) || continue   # nothing to delete for 0 or 1 matches
    unset 'files[-1]'                    # drop the last match so it survives
    rm "${files[@]}"
done
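To preview what would be removed, you can replace rm with echo rm in the loop first:
echo rm "${files[@]}"
and only switch back to plain rm once the list looks right. Note that unset 'files[-1]' needs bash 4.3 or later for negative array subscripts.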
Try the following solution utilising awk and xargs (note that it relies on GNU awk's true multidimensional arrays, so it needs gawk 4.0 or later):
find . -type f -name "file*" | awk -F/ '{ map1[$(NF-1)]++;map[$(NF-1)][map1[$(NF-1)]]=$0 }END { for ( i in map ) { for (j=1;j<=(map1[i]-1);j++) { print "\""map[i][j]"\"" } } }' | xargs rm
Explanation:
find . -type f -name "file*" | awk -F/ '{ # Set the field delimiter to "/" in awk
map1[$(NF-1)]++; # Build an array map1 indexed by the sub-directory, with an incrementing counter as the value (the number of files seen in that sub-directory)
map[$(NF-1)][map1[$(NF-1)]]=$0 # Build a two-dimensional array map indexed by the sub-directory and the file count, with the whole line as the value
}
END {
for ( i in map ) {
for (j=1;j<=(map1[i]-1);j++) { # Stop one short of the count, so the last match in each sub-directory is skipped
print "\""map[i][j]"\"" # Print each path to delete, wrapped in double quotes
}
}
}' | xargs rm # Run the result through xargs rm
Remove the pipe to xargs to verify that the files are listed as expected before adding it back in to actually remove the files.
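Two caveats worth hedging. First, find does not guarantee any particular output order, so "last" above means last in traversal order; if the matches must be compared by name, as the glob-based script does, sort the list first:
find . -type f -name "file*" | sort | awk -F/ '{ map1[$(NF-1)]++;map[$(NF-1)][map1[$(NF-1)]]=$0 }END { for ( i in map ) { for (j=1;j<=(map1[i]-1);j++) { print "\""map[i][j]"\"" } } }' | xargs rm
Second, the key $(NF-1) is only the immediate parent directory's name, so two different paths ending in an identically named folder would be merged into one group.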

Filter folders that do not contain any audio files with bash

Given a root folder, how do I find the subfolders that do not contain any audio files (mp3, wav and flac)? Do I need to set a variable like
folders=$(find /parentfolder/ -type d)
and then pass some expression on ${folders} or is there a one-liner for this?
All the subdirectories of . (we write that into a file):
find . -type d | sort > all_dirs.txt
All subdirectories that do contain an mp3 file (goes into another file):
find . -name "*.mp3" | xargs dirname | sort | uniq > music_dirs.txt
And these are the lines contained in the first file but not in the second:
diff --new-line-format="" --unchanged-line-format="" all_dirs.txt music_dirs.txt
If you think one-liners are cool and you are working in bash, here it is a bit more condensed:
diff --new-line-format="" --unchanged-line-format="" <(find . -type d | sort) <(find . -name "*.mp3" | xargs dirname | sort | uniq)
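The commands above only look for mp3; to cover all three audio types you can OR the name tests together. A sketch assuming GNU find, whose -printf '%h\n' prints each match's parent directory and avoids the xargs dirname round-trip:
diff --new-line-format="" --unchanged-line-format="" <(find . -type d | sort) <(find . -type f \( -iname "*.mp3" -o -iname "*.wav" -o -iname "*.flac" \) -printf '%h\n' | sort -u)
-iname also makes the extension match case-insensitive, so files named .MP3 are caught too.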

bash script to list duplicate hash files [duplicate]

This question already has answers here:
Linux Command Line using for loop and formatting results
I want to create a bash script that searches a given directory for pictures to copy. The pictures must have the name format IMG_\d\d\d\d.JPG. If a picture has a duplicate filename, copy it to /images/archives and append .JPG to the end of its name, so the duplicates end in .JPG.JPG. There are also duplicate pictures, so I want to hash each picture and check whether it is a duplicate. If it is, do not copy it into /archives but store the duplicate's file path in a file called output.txt.
I am struggling with trying to get the duplicate hashes to display the filenames as well. This is what I had so far:
if [ -d "$1" ]; then
    echo "using directory $1 as source"
else
    echo "Sorry, not a valid drive"
    exit 1
fi
if [ -d "$2" ]; then
    echo "$2 target location already exists"
else
    mkdir -p "$2"
fi
cd "$1" || exit 1
myList=$(find . -mindepth 1 -type f -name "*MG_[0-9][0-9][0-9][0-9].JPG")
echo "$myList"
ImagesToCopy=$(find . -mindepth 1 -type f -name "*MG_[0-9][0-9][0-9][0-9].JPG" -exec md5sum {} \; | cut -f1 -d" " | sort | uniq)
echo "$ImagesToCopy"
This gives me a list of the files I need to copy and their hashes. If I type the following at the command line:
# find . -mindepth 1 -type f -name "*MG_[0-9][0-9][0-9][0-9].JPG" -exec md5sum {} \; | sort | cut -f1 -d" "| uniq -d
I receive the results:
266ab54fd8a6dbc7ba61a0ee526763e5
88761da2c2a0e57d8aab5327a1bb82a9
cc640e50f69020dd5d2d4600e20524ac
These are the hashes of the duplicate files that I do not want to copy, but I also want to display the file path and filename alongside each hash, like this:
# find . -mindepth 1 -type f -name "*MG_[0-9][0-9][0-9][0-9].JPG" -exec md5sum {} \; | sort -k1 | uniq -u
043007387f39f19b3418fcba67b8efda ./IMG_1597.JPG
05f0c10c49983f8cde37d65ee5790a9f ./images/IMG_2012/IMG_2102.JPG
077c22bed5e0d0fba9e666064105dc72 ./DCIM/IMG_0042.JPG
1a2764a21238aaa1e28ea6325cbf00c2 ./images/IMG_2012/IMG_1403.JPG
1e343279cd05e8dbf371331314e3a2f6 ./images/IMG_1959.JPG
2226e652bf5e3ca3fbc63f3ac169c58b ./images/IMG_0058.JPG
266ab54fd8a6dbc7ba61a0ee526763e5 ./images/IMG_0079.JPG
266ab54fd8a6dbc7ba61a0ee526763e5 ./images/IMG_2012/IMG_0079.JPG
2816dbcff1caf70aecdbeb934897fd6e ./images/IMG_1233.JPG
451110cc2aff1531e64f441d253b7fec ./DCIM/103canon/IMG_0039.JPG
45a00293c0837f10e9ec2bfd96edde9f ./DCIM/103canon/IMG_0097.JPG
486f9dd9ee20ba201f0fd9a23c8e7289 ./images/IMG_2013/IMG_0060.JPG
4c2054c57a2ca71d65f92caf49721b4e ./DCIM/IMG_1810.JPG
53313e144725be3993b1d208c7064ef6 ./IMG_2288.JPG
5ac56dcddd7e0fd464f9b243213770f5 ./images/IMG_2012/favs/IMG_0039.JPG
65b15ebd20655fae29f0d2cf98588fc3 ./DCIM/IMG_2564.JPG
88761da2c2a0e57d8aab5327a1bb82a9 ./images/IMG_2012/favs/IMG_1729.JPG
88761da2c2a0e57d8aab5327a1bb82a9 ./images/IMG_2013/IMG_1729.JPG
8fc75b0dd2806d5b4b2545aa89618eb6 ./DCIM/103canon/IMG_2317.JPG
971f0a4a064bb1a2517af6c058dc3eb3 ./images/IMG_2012/favs/IMG_2317.JPG
aad617065e46f97d97bd79d72708ec10 ./images/IMG_2013/IMG_1311.JPG
c937509b5deaaee62db0bf137bc77366 ./DCIM/IMG_1152.JPG
cc640e50f69020dd5d2d4600e20524ac ./images/IMG_2012/favs/IMG_2013.JPG
cc640e50f69020dd5d2d4600e20524ac ./images/IMG_2013/IMG_2013.JPG
d8edfcc3f9f322ae5193e14b5f645368 ./images/IMG_2012/favs/IMG_1060.JPG
dcc1da7daeb8507f798e4017149356c5 ./DCIM/103canon/IMG_1600.JPG
ded2f32c88796f40f080907d7402eb44 ./IMG_0085.JPG
Thanks in advance.
Let's suppose that you have the results of md5sum. For example:
$ cat file
266ab54fd8a6dbc7ba61a0ee526763e5 /path/to/file1a
88761da2c2a0e57d8aab5327a1bb82a9 /path/to/file2a
266ab54fd8a6dbc7ba61a0ee526763e5 /path/to/file1b
cc640e50f69020dd5d2d4600e20524ac /path/to/file3
88761da2c2a0e57d8aab5327a1bb82a9 /path/to/file2b
To remove duplicates from the list, use awk:
$ awk '!($1 in a){a[$1]; print}' file
266ab54fd8a6dbc7ba61a0ee526763e5 /path/to/file1a
88761da2c2a0e57d8aab5327a1bb82a9 /path/to/file2a
cc640e50f69020dd5d2d4600e20524ac /path/to/file3
This uses the array a to keep track of which md5 sums we have seen so far. For each line, if the md5 has not appeared before, !($1 in a), we mark that md5 as having been seen and print the line.
Alternative
A shorter version of the code is:
$ awk '!a[$1]++' file
266ab54fd8a6dbc7ba61a0ee526763e5 /path/to/file1a
88761da2c2a0e57d8aab5327a1bb82a9 /path/to/file2a
cc640e50f69020dd5d2d4600e20524ac /path/to/file3
This uses array a to count the number of times that md5sum $1 has appeared. If the count is initially zero, then the line is printed.
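Conversely, if what you want is the list of duplicate entries together with their paths (every occurrence after the first), invert the logic so a line only prints when its hash has been seen before:
$ awk 'a[$1]++' file
266ab54fd8a6dbc7ba61a0ee526763e5 /path/to/file1b
88761da2c2a0e57d8aab5327a1bb82a9 /path/to/file2b
These are exactly the paths you would log to output.txt instead of copying.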

find only the first file from many directories

I have a lot of directories:
13R
613
AB1
ACT
AMB
ANI
Each directory contains a lot of files:
20140828.13R.file.csv.gz
20140829.13R.file.csv.gz
20140830.13R.file.csv.gz
20140831.13R.file.csv.gz
20140901.13R.file.csv.gz
20131114.613.file.csv.gz
20131115.613.file.csv.gz
20131116.613.file.csv.gz
20131117.613.file.csv.gz
20141114.ab1.file.csv.gz
20141115.ab1.file.csv.gz
20141116.ab1.file.csv.gz
20141117.ab1.file.csv.gz
etc..
The goal is to get the first file from each directory.
The result I expect is:
13R|20140828
613|20131114
AB1|20141114
That is, the directory name, a pipe, then the date from the filename.
I guess I need find and head plus awk, but I can't make it work; I need your help.
Here is what I have tested:
for f in $(ls -1);do ls -1 $f/ | head -1;done
But the folder name is missing.
By "the first file" I mean the first file returned in alphabetical order within the folder.
Thanks.
You can do this with a Bash loop.
Given:
/tmp/test
/tmp/test/dir_1
/tmp/test/dir_1/file_1
/tmp/test/dir_1/file_2
/tmp/test/dir_1/file_3
/tmp/test/dir_2
/tmp/test/dir_2/file_1
/tmp/test/dir_2/file_2
/tmp/test/dir_2/file_3
/tmp/test/dir_3
/tmp/test/dir_3/file_1
/tmp/test/dir_3/file_2
/tmp/test/dir_3/file_3
/tmp/test/file_1
/tmp/test/file_2
/tmp/test/file_3
Just loop through the directories, form an array from a glob, and grab the first element:
prefix="/tmp/test"
cd "$prefix"
for fn in dir_*; do
    cd "$prefix/$fn"
    arr=(*)
    echo "$fn|${arr[0]}"
done
Prints:
dir_1|file_1
dir_2|file_1
dir_3|file_1
If your definition of 'first' is different from Bash's, just sort the array arr according to your definition before taking the first element.
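For example, to sort explicitly and print the asker's directory|date format, you could feed the glob through sort and trim with parameter expansion (a sketch; mapfile needs bash 4+):
prefix="/tmp/test"
cd "$prefix"
for fn in */; do
    mapfile -t arr < <(printf '%s\n' "$fn"* | sort)   # file paths, sorted by name
    first=${arr[0]##*/}                               # strip the leading "dir/"
    echo "${fn%/}|${first%%.*}"                       # directory|date
done
With the directories from the question this would print lines like 13R|20140828.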
You can also do this with find and awk:
$ find /tmp/test -mindepth 2 -print0 | awk -v RS="\0" '{s=$0; sub(/[^/]+$/,"",s); if (s in paths) next; paths[s]; print $0}'
/tmp/test/dir_1/file_1
/tmp/test/dir_2/file_1
/tmp/test/dir_3/file_1
And insert a sort into the pipeline (or use gawk's built-in sorting) to order the output as desired.
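For instance, GNU sort's -z option sorts the NUL-delimited stream before awk sees it:
find /tmp/test -mindepth 2 -print0 | sort -z | awk -v RS="\0" '{s=$0; sub(/[^/]+$/,"",s); if (s in paths) next; paths[s]; print $0}'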
sort has a unique option (-u). Only the directory part should be unique, so sort on the first /-separated field alone with -k1,1. This solution works when the list of files is already sorted.
printf "%s\n" */* | sort -k1,1 -t/ -u | sed 's#\(.*\)/\([0-9]*\).*#\1|\2#'
The sed command turns, for example, 13R/20140828.13R.file.csv.gz into 13R|20140828; you will need to change it if the date field may be followed by another number.
This works for me:
for dir in $(find "$FOLDER" -type d); do
    # ls -p marks directories with a trailing /, so grep -v / keeps only plain files
    FILE=$(ls -1 -p "$dir" | grep -v / | head -n1)
    if [ -n "$FILE" ]; then
        echo "$dir/$FILE"
    fi
done
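A variant of the same loop that survives whitespace in directory names, assuming GNU find for -print0:
find "$FOLDER" -type d -print0 | while IFS= read -r -d '' dir; do
    FILE=$(ls -1 -p "$dir" | grep -v / | head -n1)
    [ -n "$FILE" ] && echo "$dir/$FILE"
done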

How can I get Unix to list the number of lines in each file, and then output just the file names?

I am new to Unix and am kind of stuck with the below; could you let me know how I can do this?
List the number of lines in the files in
/courses/projweek/unix/commands/quotes, sorted by
the number of lines they contain, so it looks like this:
2 deadlines.txt
2 live.txt
3 airports.txt
3 universe.txt
6 universe2.txt
How would you make that list contain just the file names? e.g. like this:
deadlines.txt
live.txt
airports.txt
universe.txt
universe2.txt
You can use a combination of find, wc and sort like below to achieve the first part:
find /courses/projweek/unix/commands/quotes -type f -exec wc -l {} + | sort -n
and the second part you can achieve by stripping everything up to the last slash (a cut on / would only work for paths relative to the current directory):
find /courses/projweek/unix/commands/quotes -type f -exec wc -l {} + | sort -n | sed 's#.*/##'
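One caveat: when -exec ... + passes wc several files at once, wc appends a total line, which sorts to the bottom of the numeric output. Running wc once per file with \; avoids it, at the cost of one process per file:
find /courses/projweek/unix/commands/quotes -type f -exec wc -l {} \; | sort -n | sed 's#.*/##'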
