How can I copy all my disorganized files into a single directory? (on linux) - bash

I have thousands of mp3s inside a complex folder structure which resides within a single folder. I would like to move all the mp3s into a single directory with no subfolders. I can think of a variety of ways of doing this using the find command but one problem will be duplicate file names. I don't want to replace files since I often have multiple versions of a same song. Auto-rename would be best. I don't really care how the files are renamed.
Does anyone know a simple and safe way of doing this?

You could change an a/b/c.mp3 path into a - b - c.mp3 after copying. Here's a solution in Bash:
find srcdir -name '*.mp3' -printf '%P\n' |
while IFS= read -r i; do
    j="${i//\// - }"
    cp -v "srcdir/$i" "dstdir/$j"
done
And in a shell without ${//} substitution:
find srcdir -name '*.mp3' -printf '%P\n' |
sed -e 'p;s:/: - :g' |
while IFS= read -r i; do
    IFS= read -r j
    cp -v "srcdir/$i" "dstdir/$j"
done
For a different scheme, GNU's cp and mv can make numbered backups instead of overwriting -- see -b/--backup[=CONTROL] in the man pages.
find srcdir -name '*.mp3' -exec cp -v --backup=numbered {} dstdir/ \;

In Bash, roughly:
find . -name '*.mp3' -print0 | while IFS= read -r -d '' i; do
    new_name=$(basename "$i")
    x=0
    while [ -e "move_to_dir/$new_name" ]; do
        x=$((x + 1))
        new_name="$(basename "$i").$x"
    done
    mv "$i" "move_to_dir/$new_name"
done

#!/bin/bash
NEW_DIR=/tmp/new/
IFS="
"; for a in `find . -type f `
do
    echo "$a"
    new_name="`basename $a`"
    while test -e "$NEW_DIR/$new_name"
    do
        new_name="${new_name}_"
    done
    cp "$a" "$NEW_DIR/$new_name"
done

I'd tend to do this in a simple script rather than try to fit it into a single command line.
For instance, in Python, it would be relatively trivial to do a walk() through the directory, copying each mp3 file found to a different directory with an automatically incremented number.
If you want to get fancier, you could keep a dictionary of file names seen so far and simply append a number to the duplicates (the key being the file name, and the value being the number of files found so far, which becomes your suffix).
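The same bookkeeping can be sketched in bash, using an associative array as the counter. This is a minimal sketch, assuming bash 4+ and GNU find; srcdir, dstdir, and the _N suffix scheme are placeholders, not names from the thread:
declare -A count                        # how many times each basename has been seen
while IFS= read -r -d '' f; do
    name=$(basename "$f")
    n=${count[$name]:-0}
    count[$name]=$((n + 1))
    if (( n == 0 )); then
        cp "$f" "dstdir/$name"
    else
        cp "$f" "dstdir/${name%.mp3}_$n.mp3"    # song.mp3 -> song_1.mp3, song_2.mp3, ...
    fi
done < <(find srcdir -type f -name '*.mp3' -print0)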

find /path/to/mp3s -name '*.mp3' -exec mv {} /path/to/target/dir \;

At the risk of many downvotes, a Perl script could be written quickly to accomplish this.
Pseudocode:
while (-e $filename) {
    $filename .= "1";
}

In Python (to actually move the files, set debug = False):
import os, re

from_dir = "/from/dir"
to_dir = "/target/dir"
re_ext = r"\.mp3"
debug = True

for d, subdirs, names in os.walk(from_dir):
    # keep only the file names whose extension matches
    names = [fn for fn in names if re.match(".*(%s)$" % re_ext, fn, re.I)]
    for fn in names:
        from_fn = os.path.join(d, fn)
        target_fn = os.path.join(to_dir, fn)
        if not debug:
            if not os.path.exists(target_fn):
                os.rename(from_fn, target_fn)
            else:
                print("DO NOT MOVE - FILE EXISTS", from_fn)
        else:
            print("MOVE", from_fn, "TO", target_fn)

Since you don't care how the duplicate files are named, utilize the 'backup' option on move:
find /path/to/mp3s -name '*.mp3' -exec mv --backup=numbered {} /path/to/target/dir \;
Will get you:
song.mp3
song.mp3.~1~
song.mp3.~2~

Related

Bash find: exec in reverse order

I am iterating over files like so:
find $directory -type f -exec codesign {} \;
Now the problem here is that files higher up in the hierarchy are signed first.
Is there a way to iterate over a directory tree and handle the deepest files first?
So that
/My/path/to/app/bin
is handled before
/My/path/mainbin
Yes, just use -depth:
-depth
The primary shall always evaluate as true; it shall cause descent of the directory hierarchy to be done so that all entries in a directory are acted on before the directory itself. If a -depth primary is not specified, all entries in a directory shall be acted on after the directory itself. If any -depth primary is specified, it shall apply to the entire expression even if the -depth primary would not normally be evaluated.
For example:
$ mkdir -p top/a/b/c/d/e/f/g/h
$ find top -print
top
top/a
top/a/b
top/a/b/c
top/a/b/c/d
top/a/b/c/d/e
top/a/b/c/d/e/f
top/a/b/c/d/e/f/g
top/a/b/c/d/e/f/g/h
$ find top -depth -print
top/a/b/c/d/e/f/g/h
top/a/b/c/d/e/f/g
top/a/b/c/d/e/f
top/a/b/c/d/e
top/a/b/c/d
top/a/b/c
top/a/b
top/a
top
Note that at a particular level, ordering is still arbitrary.
Using GNU utilities and the decorate-sort-undecorate pattern (aka the Schwartzian transform):
find . -type f -printf '%d %p\0' |
sort -znr |
sed -z 's/[0-9]* //' |
xargs -0 -I# echo codesign #
Drop the echo if the output looks ok.
Using find's -depth option (as in my other answer), or a naive sort (as in some other answers), only ensures that sub-directories of a directory are processed before the directory itself, not that the deepest level overall is processed first.
For example:
$ mkdir -p top/a/b/d/f/h top/a/c/e/g
$ find top -depth -print
top/a/c/e/g
top/a/c/e
top/a/c
top/a/b/d/f/h
top/a/b/d/f
top/a/b/d
top/a/b
top/a
top
For overall deepest level to be processed first, the ordering should be something like:
top/a/b/d/f/h
top/a/c/e/g
top/a/b/d/f
top/a/c/e
top/a/b/d
top/a/c
top/a/b
top/a
top
To determine this ordering, the entire list must be known, and then the number of levels (i.e. / separators) in each path counted to enable ranking.
A simple-ish Perl script (assigned to a shell function for this example) to do this ordering is:
$ dsort(){
perl -ne '
BEGIN { $/ = "\0" } # null-delimited i/o
$fname[$.] = $_;
$depth[$.] = tr|/||;
END {
print
map { $fname[$_] }
sort { $depth[$b] <=> $depth[$a] }
keys @fname
}
'
}
Then:
$ find top -print0 | dsort | xargs -0 -I# echo #
top/a/b/d/f/h
top/a/c/e/g
top/a/b/d/f
top/a/c/e
top/a/b/d
top/a/c
top/a/b
top/a
top
How about sorting the output of find in descending order:
while IFS= read -d "" -r f; do
codesign "$f"
done < <(find "$directory" -type f -print0 | sort -zr)
<(command ...) is a process substitution, which feeds the output
of the command to the read command in the while loop via the redirection.
The -print0, sort -z and read -d "" combination uses the null character
as the filename delimiter, which protects filenames that include
special characters such as whitespace or newlines.
I don't know of a native way in find, but you can pipe its output into a loop and process it line by line as you wish, this way:
find . | while read -r file; do echo filename: "$file"; done
In your case, if you are happy just reversing the output of find, you could go with something like:
find "$directory" -type f | tac | while read -r file; do codesign "$file"; done

Want to add the headers in text file in shell script

I want to add a header at the start of each file. I use the following code, but it does not add the header. Can you please help me see where I am going wrong?
start_MERGE_JobRec()
{
FindBatchNumber
export TEMP_SP_FORMAT="Temp_${file_indicator}_SP_[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]_INSTANCE[0-9][0-9].txt"
export TEMP_SS_FORMAT="Temp_${file_indicator}_SS_[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]_INSTANCE[0-9][0-9].txt"
export TEMP_SG_FORMAT="Temp_${file_indicator}_SG_[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]_INSTANCE[0-9][0-9].txt"
export TEMP_GS_FORMAT="Temp_${file_indicator}_GS_[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]_INSTANCE[0-9][0-9].txt"
export SP_OUTPUT_FILE="RTBCON_${file_indicator}_SP_${ONLINE_DATE}${TIME}_${BATCH_NUMBER}.txt"
export SS_OUTPUT_FILE="RTBCON_${file_indicator}_SS_${ONLINE_DATE}${TIME}_${BATCH_NUMBER}.txt"
export SG_OUTPUT_FILE="RTBCON_${file_indicator}_SG_${ONLINE_DATE}${TIME}_${BATCH_NUMBER}.txt"
export GS_OUTPUT_FILE="RTBCON_${file_indicator}_GS_${ONLINE_DATE}${TIME}_${BATCH_NUMBER}.txt"
#---------------------------------------------------
# Add header at the start for each file
#---------------------------------------------------
awk '{print "recordType|lifetimeId|MSISDN|status|effectiveDate|expiryDate|oldMSISDN|accountType|billingAccountNumber|usageTypeBen|IMEI|IMSI|cycleCode|cycleMonth|firstBillExperience|recordStatus|failureReason"$0}' >> $SP_OUTPUT_FILE
find . -maxdepth 1 -type f -name "${TEMP_SP_FORMAT}" -exec cat {} + >> $SP_OUTPUT_FILE
find . -maxdepth 1 -type f -name "${TEMP_SS_FORMAT}" -exec cat {} + >> $SS_OUTPUT_FILE
find . -maxdepth 1 -type f -name "${TEMP_SG_FORMAT}" -exec cat {} + >> $SG_OUTPUT_FILE
find . -maxdepth 1 -type f -name "${TEMP_GS_FORMAT}" -exec cat {} + >> $GS_OUTPUT_FILE
}
I use awk to add the header but it's not working.
Awk requires input before it will print anything.
A common way to force Awk to print something even when there is no input is to put the print statement in a BEGIN block:
awk 'BEGIN { print "something" }' /dev/null
but if you want to prepend a header to all the output files, I don't see why you are using Awk here at all, let alone printing the header in front of every output line. Are you looking for this, instead?
echo 'recordType|lifetimeId|MSISDN|status|effectiveDate|expiryDate|oldMSISDN|accountType|billingAccountNumber|usageTypeBen|IMEI|IMSI|cycleCode|cycleMonth|firstBillExperience|recordStatus|failureReason' |
tee "$SS_OUTPUT_FILE" "$SG_OUTPUT_FILE" "$GS_OUTPUT_FILE" >"$SP_OUTPUT_FILE"
Notice also how we always quote shell variables unless we specifically want the shell to perform word splitting and wildcard expansion on their values, and how we avoid upper case for private variables.
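A quick illustration of the quoting point, using a hypothetical file name containing a space:
f='my song.mp3'
ls $f      # unquoted: the shell word-splits this into two arguments, "my" and "song.mp3"
ls "$f"    # quoted: ls receives the single argument "my song.mp3"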
There also does not seem to be any reason to export your variables -- neither Awk nor find pays any attention to them, and there are no other processes here. The purpose of export is to make a variable visible to the environment of subprocesses. You might want to declare them as local, though.
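To illustrate the distinction with a trivial example (not part of the original script):
foo=bar            # shell variable: visible to this shell only
export baz=qux     # environment variable: also visible to child processes
sh -c 'echo "foo=$foo baz=$baz"'    # prints "foo= baz=qux"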
Perhaps break out a second function to avoid all this code repetition, anyway?
merge_individual_job() {
echo 'recordType|lifetimeId|MSISDN|status|effectiveDate|expiryDate|oldMSISDN|accountType|billingAccountNumber|usageTypeBen|IMEI|IMSI|cycleCode|cycleMonth|firstBillExperience|recordStatus|failureReason'
find . -maxdepth 1 -type f -name "$1" -exec cat {} +
}
start_MERGE_JobRec()
{
FindBatchNumber
local id
for id in SP SS SG GS; do
merge_individual_job \
"Temp_${file_indicator}_${id}_[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]_INSTANCE[0-9][0-9].txt" \
>"RTBCON_${file_indicator}_${id}_${ONLINE_DATE}${TIME}_${BATCH_NUMBER}.txt"
done
}
If FindBatchNumber sets the variable file_indicator, a more idiomatic and less error-prone approach is to have it just echo it, and have the caller assign it:
file_indicator=$(FindBatchNumber)
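A minimal sketch of that pattern; the body of FindBatchNumber is hypothetical here, only the print-and-capture shape matters:
FindBatchNumber() {
    # ... whatever currently computes the batch number / indicator ...
    printf '%s\n' "$computed_value"    # print the result instead of assigning a global
}
file_indicator=$(FindBatchNumber)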

delete all but the last match

I want to delete all but the last match of a set of files matching file* that are present in each folder within a directory.
For example:
Folder 1
file
file_1-1
file_1-2
file_2-1
stuff.txt
stuff
Folder 2
file_1-1
file_1-2
file_1-3
file_2-1
file_2-2
stuff.txt
Folder 3
...
and so on. Within every subfolder I want to keep only the last of the matched files, so for Folder 1 this would be file_2-1, in Folder 2 it would be file_2-2. The number of files is generally different within each subfolder.
Since I have a very nested folder structure, I thought about using the find command, somehow like this:
find . -type f -name "file*" -delete_all_but_last_match
I know how to delete all matches but not how to exclude the last match.
I also found the following piece of code:
https://askubuntu.com/questions/1139051/how-to-delete-all-but-x-last-items-from-find
but when I apply a modified version to a test folder
find . -type f -name "file*" -print0 | head -zn-1 | xargs -0 rm -rf
in most cases it deletes all the matches; only in some is the last file spared. So it does not work for me, presumably because of the different number of files in each folder.
Edit:
The folders contain no further subfolders, but they are generally at the end of several subfolder levels. It would therefore be a benefit if the script could be executed from some levels further up as well.
#!/bin/bash
shopt -s globstar nullglob
for dir in **/; do
    files=("$dir"file*)
    # skip directories with fewer than two matches
    (( ${#files[@]} > 1 )) || continue
    unset 'files[-1]'
    rm "${files[@]}"
done
Try the following solution utilising awk and xargs (the map[i][j] multidimensional array syntax requires GNU awk 4 or later):
find . -type f -name "file*" | awk -F/ '{ map1[$(NF-1)]++;map[$(NF-1)][map1[$(NF-1)]]=$0 }END { for ( i in map ) { for (j=1;j<=(map1[i]-1);j++) { print "\""map[i][j]"\"" } } }' | xargs rm
Explanation:
find . -type f -name "file*" | awk -F/ '{ # Set the field delimiter to "/" in awk
map1[$(NF-1)]++; # Create an array map1 with the sub-directory as the index and an incrementing counter the value (number of files in each sub-directory)
map[$(NF-1)][map1[$(NF-1)]]=$0 # Create a two dimentional array with the sub directory index one and the file count the second. The line the value
}
END {
for ( i in map ) {
for (j=1;j<=(map1[i]-1);j++) {
print "\""map[i][j]"\"" # Loop through the map array utilising map1 to get the last but one file and printing the results
}
}
}' | xargs rm # Run the result through xargs rm
Remove the pipe to xargs to verify that the files are listing as expected before adding back in to actually remove the files.

Efficient way to find paths from a list of filenames

From a list of file names stored in a file f, what's the best way to find the relative path of each file name under dir, outputting this new list to file p? I'm currently using the following:
while read name
do
find dir -type f -name "$name" >> p
done < f
which is too slow for a large list, or a large directory tree.
EDIT: A few numbers:
Number of directories under dir: 1870
Number of files under dir: 80622
Number of filenames in f: 73487
All files listed in f do exist under dir.
The following piece of Python code does the trick. The key is to run find once and store the output in a hash map, giving an O(1) way to get from a file name to the list of paths for that name.
#!/usr/bin/env python
import os

file_names = [line.rstrip("\n") for line in open("f")]
file_paths = [line.rstrip("\n") for line in os.popen("find . -type f")]

file_names_to_paths = {}
for file_path in file_paths:
    file_name = os.path.basename(file_path)
    if file_name not in file_names_to_paths:
        file_names_to_paths[file_name] = [file_path]
    else:
        file_names_to_paths[file_name].append(file_path)  # duplicate file name

out_file = open("p", "w")
for file_name in file_names:
    for path in file_names_to_paths.get(file_name, []):
        out_file.write(path + "\n")
out_file.close()
Try this Perl one-liner:
perl -e '%H=map{chomp;$_=>1}<>;sub R{my($p)=@_;map R($_),<$p/*> if -d$p;($b=$p)=~s|.*/||;print"$p\n" if$H{$b}}R"."' f
1- create a hashmap whose keys are filenames : %H=map{chomp;$_=>1}<>
2- define a recursive subroutine to traverse directories : sub R{}
2.1- recursive call for directories : map R($_),<$p/*> if -d$p
2.2- extract the filename from the path : ($b=$p)=~s|.*/||
2.3- print if the hashmap contains the filename : print"$p\n" if$H{$b}
3- call R with the current directory as the path : R"."
EDIT : to traverse hidden directories (.*)
perl -e '%H=map{chomp;$_=>1}<>;sub R{my($p)=@_;map R($_),grep !m|/\.\.?$|,<$p/.* $p/*> if -d$p;($b=$p)=~s|.*/||;print"$p\n" if$H{$b}}R"."' f
I think this should do the trick:
xargs locate -b < f | grep ^dir > p
Edit: I can't think of an easy way to prefix dir/*/ to the list of file names, otherwise you could just pass that directly to xargs locate.
Depending on what percentage of the directory tree is considered a match, it might be faster to find every file, then grep out the matching ones:
find "$dir" -type f | grep -f <( sed 's+\(.*\)+/\1$+' "$f" )
The sed command pre-processes your list of file names into regular expressions that will only match full names at the end of a path.
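As a quick illustration with made-up names, the sed step turns each plain file name into a pattern anchored to the end of a path:
printf '%s\n' 'song.mp3' 'notes.txt' | sed 's+\(.*\)+/\1$+'
# /song.mp3$
# /notes.txt$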
Here is an alternative using bash and grep:
#!/bin/bash
flist(){
    for x in "$1"/*; do
        [ -d "$x" ] && flist "$x" || echo "$x"
    done
}
dir=/etc              #the directory you are searching
list=$(< myfiles)     #the file with the file names
#format the list for grep: /name1$\|/name2$\| ... \|/nameN$
list="/${list//
/\$\|/}\$"
flist "$dir" | grep "$list"
...if you need full POSIX shell compliance (busybox ash, hush, etc.) replace the $list substring manipulation with a variant of chepner's sed, and replace $(< file) with $(cat file), as sketched below.
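A rough sketch of that POSIX-friendly variant, reusing the flist function above (the patterns file name is arbitrary):
dir=/etc
sed 's|.*|/&$|' myfiles > patterns    # one anchored pattern per line, e.g. /passwd$
flist "$dir" | grep -f patterns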

Recursively check length of directory name

I need to determine if there are any directory names > 31 characters in a given directory (i.e. look underneath that root).
I know I can use something like find /path/to/root/dir -type d >> dirnames.txt
This will give me a text file of complete paths.
What I need is to get the actual number of characters in each directory name. Not sure if parsing the above results w/sed or awk makes sense. Looking for ideas/thoughts/suggestions/tips on how to accomplish this. Thanks!
This short script does it all in one go, i.e. finds all directory names and then outputs any which are greater than 31 characters in length (along with their length in characters):
find /path/to/root/dir -type d -exec basename {} \; | while IFS= read -r d; do
    len=${#d}
    if [ "$len" -gt 31 ] ; then
        echo "$d = $len characters"
    fi
done
Using the dirnames.txt file created by your find command, you can then sort the data by the length of each pathname, i.e.
awk '{print length($0) "\t" $0}' dirnames.txt | sort -k1,1nr > dirNamesWithSize.txt
This will present the longest path names (based on the length value) at the top of the file.
I hope this helps.
Try this
find . -type d -exec bash -c '[ $(wc -c <<<"${1##*/}") -gt 32 ] && echo "${1}"' -- {} \; 2>/dev/null
The one bug, which I consider minor, is that it will over-count directory name length by 1 every time.
If what you wanted was the whole path rather than the last path component, then use this:
find . -type d | sed -e '/.\{32,\}/!d'
This version also has a bug, but only when file names have embedded newlines.
The output of both commands is a list of file names which match the criteria. Counting the length of each one is trivial from there.
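For example, to tack each matching path's length onto the output of the second command:
find . -type d | sed -e '/.\{32,\}/!d' | awk '{ print length($0), $0 }'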
