We have a very large file structure that has been very badly built. Paths contain lots of spaces, #, spaces around dashes.
It's all hosted on a Synology NAS, so I don't have access to the whole array of tools usually included (like rename).
I'm trying to rename files AND folders whose names have leading or trailing spaces.
# Global vars
tstamp=$(date +%Y-%m-%d_%H%M%S)
# Change for separator to newline
IFS=$'\n'
echo "$tstamp - Renaming files with leading space: \n"
for filename in $(find . -type f -name '[[:space:]]*')
do
newFilename=$(echo $filename |sed 's/\/[[:space:]]/\//g')
echo "original: $filename"
echo "new : $newFilename"
mv -i -v -n $filename $newFilename
echo "\n"
done
echo "$tstamp - Renaming files with trailing space: \n"
for filename in $(find . -type f -name '*[[:space:]]')
do
newFilename=$(echo $filename |sed 's/[[:space:]]$//g')
echo "original: $filename"
echo "new : $newFilename"
mv -i -v -n $filename $newFilename
echo "\n"
done
# A slash "/" in a filename is not possible thus it's not verified
echo "$tstamp - Renaming files with unsupported characters (\ / \" : < > ; | * ?):"
for filename in $(find . -type f -name '*\**' -o -name '*\\*' -o -name '*"*' -o -name '*:*' -o -name '*<*' -o -name '*>*' -o -name '*;*' -o -name '*|*' -o -name '*\?*')
do
newFilename=$(echo $filename |sed 's/\(\\\|"\|:\|<\|>\|;\||\|\*\|\?\)//g')
echo "original: $filename"
echo "new : $newFilename"
mv -i -v -n $filename $newFilename
echo "\n"
done
echo "Done."
#EOF
Renaming files with unsupported characters works well, but removing the leading and trailing spaces does not.
Here's an actual output where I replaced some names for security purposes:
original:
./ABC- Financing/2018 - ABC Capital Bl Fund 2018 (VCCI)/0 - Dataroom/8 - Vérification diligente/3. Governance/ 2017Q1/ Documents de Julie/#eaDir/ PPP#SynoResource
new:
./ABC - Financing/2018 - ABC Capital Innovation Fund 2018 (GGGG)/0 - Dataroom/8 - Vérification diligente/3. Governance/2017Q1/Documents de Julie/#eaDir/PPP#SynoResource
./ABC - Financing/2018 - ABC Capital Innovation Fund 2018 (GGGG)/0 - Dataroom/8 - Vérification diligente/3. Governance/ 2017Q1/ Documents de Julie/#eaDir/ CDP#SynoResource → ./ABC - Financing/2018 - ABC Capital Innovation Fund 2018 (GGGG)/0 - Dataroom/8 - Vérification diligente/3. Governance/2017Q1/Documents de Julie/#eaDir/PPP#SynoResource
mv: cannot move "./ABC - Financing/2018 - ABC Capital Innovation Fund 2018 (GGGG)/0 - Dataroom/8 - Vérification diligente/3. Governance/ 2017Q1/ Documents de Julie/#eaDir/ PPP#SynoResource" to "./ABC - Financing/2018 - ABC Capital Innovation Fund 2018 (GGGG)/0 - Dataroom/8 - Vérification diligente/3. Governance/2017Q1/Documents de Julie/#eaDir/PPP#SynoResource": No such file or directory
I don't understand why the file isn't found by the mv command.
Start with this (uses GNU versions of find and sed):
#!/usr/bin/env bash
readarray -d '' paths < <(find . -depth -print0)
for old in "${paths[@]}"; do
printf 'Working on path %q\n' "$old" >&2
new=$(
printf '%s' "$old" |
sed -z '
s#[\\":<>;|*?]##g
s#[[:space:]]*/[[:space:]]*#/#g
s#[[:space:]]*$##
'
)
if [[ "$new" != "$old" ]]; then
printf 'old: %q\n' "$old" >&2
printf 'new: %q\n' "$new" >&2
[[ -f "$new" ]] && printf 'Warning: %q already exists.\n' "$new" >&2
mv -i -v -n -- "$old" "$new"
printf '\n'
fi
done
You can probably replace the printf | sed pipeline with bash builtins for a performance improvement, but life's too short for me to figure that out, and the above should be clear and simple enough for any other changes you need to make.
The above is untested so make sure you take a backup of your files and test it thoroughly on a temp dir before running on your real files.
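One thing to check while testing: the sed expression rewrites every component of the path at once, but with -depth the parent directories still have their old names when a child is moved, so mv can fail with "No such file or directory" (the same symptom as in the question). A minimal sketch of an alternative, assuming every path comes from find . and therefore contains at least one slash: clean only the last component and leave the parents for a later iteration. The function name clean_last_component is made up for illustration.

```shell
#!/usr/bin/env bash

# Hypothetical helper: strip leading/trailing whitespace from the LAST
# component of a path only. Assumes the path contains at least one "/"
# (true for anything printed by "find .").
clean_last_component() {
    local path=$1 dir base
    dir=${path%/*}       # everything before the last slash, left untouched
    base=${path##*/}     # the final component
    # Remove leading whitespace: cut off the longest all-whitespace prefix.
    base="${base#"${base%%[![:space:]]*}"}"
    # Remove trailing whitespace: cut off the longest all-whitespace suffix.
    base="${base%"${base##*[![:space:]]}"}"
    printf '%s/%s\n' "$dir" "$base"
}

# Usage inside the loop above (instead of the sed pipeline):
#   new=$(clean_last_component "$old")
```

A name made up entirely of whitespace would be reduced to an empty component, so that case still needs its own check.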
Let's try to do it safely and correctly this way instead:
#!/usr/bin/env bash
shopt -s extglob # set up extended globbing so a pattern can match a group multiple times
# Find all files or directories names that:
# either starts with spaces,
# or ends with spaces,
# or contains any of the \ " : < > ; | * ? prohibited characters
find . \
-depth \
\( -type f -or -type d \) \
-regextype posix-extended \
-regex '.*/([[:space:]].*|.*[[:space:]]|.*[\\":<>;|*?].*)' \
-print0 \
| while IFS= read -r -d '' filename; do
# Isolates the file name from its directory path
base="$(basename -- "${filename}")"
# ExtGlob strips-out all instances of prohibited characters class using //
# [\\\":<>;|*?]
base="${base//[\\\":<>;|*?]/}"
# ExtGlob strips-out leading spaces
# *([[:space:]]):
# * 0 or any times the following (group)
# [[:space:]] any space character
base="${base/*([[:space:]])/}"
# ExtGlob strips-out trailing spaces using %%
base="${base%%*([[:space:]])}"
# Compose a new file name from the new base
newFilename="$(dirname -- "${filename}")/${base}"
# Prevent the new file name from colliding with existing files
# by adding a versioned suffix
suffix=''
count=1
while [[ -e "${newFilename}${suffix}" ]]; do
suffix=".$((count++))"
done
newFilename="${newFilename}${suffix}"
printf \
"original: '%s'\\nnew : '%s'\\n\\n" \
"${filename}" \
"${newFilename}"
mv -- "${filename}" "${newFilename}"
done
echo 'Done.'
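The collision handling in the script above can be pulled out into a small function and tried on its own in a scratch directory. This is only a sketch of that one step; unique_name is a hypothetical name, not part of the script.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print CANDIDATE if nothing exists at that path,
# otherwise the first free name of the form CANDIDATE.1, CANDIDATE.2, ...
unique_name() {
    local candidate=$1 suffix='' count=1
    while [[ -e "${candidate}${suffix}" ]]; do
        suffix=".$((count++))"
    done
    printf '%s\n' "${candidate}${suffix}"
}

# Usage:
#   mv -- "$filename" "$(unique_name "$newFilename")"
```

Note that the suffix lands after the extension (report.pdf becomes report.pdf.1), which is the behaviour of the loop above as written.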
I wrote a cleanup script to delete certain files. The files are stored in subfolders. I use find to collect them into an array, and because find is recursive, an array entry can look like this:
(path to File)
./2021_11_08_17_28_45_1733556/2021_11_12_04_15_51_1733556_0.jfr
As you can see, the file names are timestamps. find orders the results by folder name only (./2021_11_08_17_28_45_1733556), but I need to sort all files, which can be in different folders, by the timestamp in the file name alone (the folder names can be ignored completely), so that I can delete the oldest files first. Below is my script in its current, not properly working state; I need to add some sorting to fix my problem.
Any Ideas?
#!/bin/bash
# handle -h (help)
if [[ "$1" == "-h" || "$1" == "" ]]; then
echo -e '-p [path to the target folder] \n-f [number of files that should remain in the folder] \n-d [false to disable dryRun]'
exit 0
fi
# handle parameters
while getopts p:f:d: flag
do
case "${flag}" in
p) pathToFolder=${OPTARG};;
f) maxFiles=${OPTARG};;
d) dryRun=${OPTARG};;
*) echo -e '-p [path to the target folder] \n-f [number of files that should remain in the folder] \n-d [false to disable dryRun]'
esac
done
if [[ -z $dryRun ]]; then
dryRun=true
fi
# fill the array with .jfr files, sorted so that the oldest files get deleted first
fillarray() {
files=($(find -name "*.jfr" -type f))
totalFiles=${#files[@]}
}
# Return size of file
getfilesize() {
filesize=$(du -k "$1" | cut -f1)
}
count=0
checkfiles() {
# Check if File matches the maxFiles parameter
if [[ ${#files[@]} -gt $maxFiles ]]; then
# Check if dryRun is enabled
if [[ $dryRun == "false" ]]; then
echo "msg=\"Removal result\", result=true, file=$(realpath $1) filesize=$(getfilesize $1), reason=\"outside max file boundary\""
((count++))
rm $1
else
((count++))
echo msg="\"Removal result\", result=true, file=$(realpath $1 ) filesize=$(getfilesize $1), reason=\"outside max file boundary\""
fi
# Remove the file from the files array
files=(${files[@]/$1})
else
echo msg="\"Removal result\", result=false, file=$( realpath $1), reason=\"within max file boundary\""
fi
}
# Scan for empty files
scanfornullfiles() {
for file in "${files[@]}"
do
filesize=$(! getfilesize $file)
if [[ $filesize == 0 ]]; then
files=(${files[@]/$file})
echo msg="\"Removal result\", result=false, file=$(realpath $file), reason=\"empty file\""
fi
done
}
echo msg="jfrcleanup.sh started", maxFiles=$maxFiles, dryRun=$dryRun, directory=$pathToFolder
{
cd $pathToFolder > /dev/null 2>&1
} || {
echo msg="no permission in directory"
echo msg="jfrcleanup.sh stopped"
exit 0
}
fillarray #> /dev/null 2>&1
scanfornullfiles
for file in "${files[@]}"
do
checkfiles $file
done
echo msg="\"jfrcleanup.sh finished\", totalFileCount=$totalFiles filesRemoved=$count"
Assuming the file paths do not contain newline characters, would you please try
the following Schwartzian transform method:
#!/bin/bash
pat="/([0-9]{4}(_[0-9]{2}){5})[^/]*\.jfr$"
while IFS= read -r -d "" path; do
if [[ $path =~ $pat ]]; then
printf "%s\t%s\n" "${BASH_REMATCH[1]}" "$path"
fi
done < <(find . -type f -name "*.jfr" -print0) | sort -k1,1 | head -n 1 | cut -f2- | tr "\n" "\0" | xargs -0 echo rm
The string pat is a regex pattern to extract the timestamp from the
filename such as 2021_11_12_04_15_51.
Then the timestamp is prepended to the filename delimited by a tab
character.
The output lines are sorted by the timestamp in ascending order
(oldest first).
head -n 1 picks the oldest line. To remove a different number of files,
change the number passed to the -n option.
cut -f2- drops the timestamp to retrieve the filename.
tr "\n" "\0" protects the filenames which contain whitespaces or
tab characters.
xargs -0 echo rm just outputs the command lines as a dry run.
If the output looks good, drop echo.
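To tie this back to the -f (maxFiles) option in the question, the same decorate/sort/undecorate steps can be wrapped in functions that list only the files exceeding the limit. This is a sketch under assumptions: bash 4 or later (for mapfile), no newlines in paths, and the function names list_by_file_timestamp and oldest_beyond_limit are invented here.

```shell
#!/usr/bin/env bash

# Print all *.jfr paths under DIR, oldest first, ordered by the timestamp
# embedded in the file name (directory names are ignored).
list_by_file_timestamp() {
    local dir=$1 path
    local pat='/([0-9]{4}(_[0-9]{2}){5})[^/]*\.jfr$'
    while IFS= read -r -d '' path; do
        if [[ $path =~ $pat ]]; then
            # Decorate: timestamp, tab, path
            printf '%s\t%s\n' "${BASH_REMATCH[1]}" "$path"
        fi
    done < <(find "$dir" -type f -name '*.jfr' -print0) |
        LC_ALL=C sort -k1,1 |   # sort on the timestamp column
        cut -f2-                # undecorate: keep only the path
}

# Print the oldest files that exceed KEEP, i.e. the deletion candidates.
oldest_beyond_limit() {
    local dir=$1 keep=$2 excess
    local -a sorted
    mapfile -t sorted < <(list_by_file_timestamp "$dir")
    excess=$(( ${#sorted[@]} - keep ))
    if (( excess > 0 )); then
        printf '%s\n' "${sorted[@]:0:excess}"
    fi
}

# Dry run:  oldest_beyond_limit "$pathToFolder" "$maxFiles"
```

The output is one path per line, so it can be read into the files array with mapfile -t instead of relying on word splitting.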
If you have GNU find, and pathnames don't contain new-line ('\n') and tab ('\t') characters, the output of this command will be ordered by basenames:
find path/to/dir -type f -printf '%f\t%p\n' | sort | cut -f2-
TL;DR, but since you're using find: if it supports the -printf option, something like this should work:
find . -type f -name '*.jfr' -printf '%f/%h/%f\n' | sort -k1 -n | cut -d '/' -f2-
Alternatively, use a while read loop with a different -printf format:
#!/usr/bin/env bash
while IFS='/' read -rd '' time file; do
printf '%s\n' "$file"
done < <(find . -type f -name '*.jfr' -printf '%T@/%p\0' | sort -zn)
Note that -printf from find and the -z flag from sort are GNU extensions.
To save the file names, you could change
printf '%s\n' "$file"
to something like this, which appends to an array named files:
files+=("$file")
Then "${files[@]}" has the file names as elements.
The last example, with the while read loop, does not depend on the file names; it sorts by the modification timestamp reported by GNU find.
I solved the problem! I sort the array with the following so the oldest files will be deleted first:
files=($(printf '%s\n' "${files[@]}" | sort -t/ -k3))
I have an audio sample library with thousands of files. I would like to shuffle/randomize the order of these files. Can someone provide me with a bash script/line that would prepend a single random character to all files in a folder (including files in sub-folders). I do not want to prepend a random character to any of the folder names though.
Example:
Kickdrum73.wav
Kickdrum SUB.wav
Kick808.mp3
Renamed to:
f_Kickdrum73.wav
!_Kickdrum SUB.wav
4_Kick808.mp3
If possible, I would like to be able to run this script more than once, but on subsequent runs, it just changes the randomly prepended character instead of prepending a new one.
Some of my attempts:
find ~/Desktop/test -type f -print0 | xargs -0 -n1 bash -c 'mv "$0" "a${0}"'
find ~/Desktop/test/ -type f -exec mv -v {} $(cat a {}) \;
find ~/Desktop/test/ -type f -exec echo -e "Z\n$(cat !)" > !Hat 15.wav
for file in *; do
mv -v "$file" $RANDOM_"$file"
done
Note: I am running on macOS.
Latest attempt using code from mr. fixit:
find . -type f -maxdepth 999 -not -name ".*" |
cut -c 3- - |
while read F; do
randomCharacter="${F:2:1}"
if [ $randomCharacter == '_' ]; then
new="${F:1}"
else
new="_$F"
fi
fileName="`basename $new`"
newFilename="`jot -r -c $fileName 1 A Z`"
filePath="`dirname $new`"
newFilePath="$filePath$newFilename"
mv -v "$F" "$newFilePath"
done
Here's my first answer, enhanced to do sub-directories.
Put the following in file randomize
if [[ $# != 1 || ! -d "$1" ]]; then
echo "usage: $0 <path>"
else
find "$1" -type f -not -name ".*" |
while IFS= read -r F; do
FDIR=`dirname "$F"`
FNAME=`basename "$F"`
char2="${FNAME:1:1}"
if [ "$char2" == '_' ]; then
new="${FNAME:1}"
else
new="_$FNAME"
fi
new=`jot -r -w "%c$new" 1 A Z`
echo mv "$F" "${FDIR}/${new}"
done
fi
Set the permissions with chmod a+x randomize.
Then call it with randomize your/path.
It'll echo the commands required to rename everything, so you can examine them to ensure they'll work for you. If they look right, you can remove the echo from the 3rd to last line and rerun the script.
cd ~/Desktop/test, then
find . -type f -maxdepth 1 -not -name ".*" |
cut -c 3- - |
while read F; do
char2="${F:1:1}"
if [ "$char2" == '_' ]; then
new="${F:1}"
else
new="_$F"
fi
new=`jot -r -w "%c$new" 1 A Z`
mv "$F" "$new"
done
find . -type f -maxdepth 1 -not -name ".*" will get all the files in the current directory, but not the hidden files (names starting with '.')
cut -c 3- - will strip the first 2 chars from the name. find outputs paths, and the ./ gets in the way of processing prefixes.
while read VAR; do <stuff>; done is a way to deal with one line at a time
char2="${VAR:1:1}" sets a variable char2 to the 2nd character of the variable VAR (offsets start at 0).
if - then - else sets new to the filename, either preceded by _ or with the previous random character stripped off.
jot -r -w "%c$new" 1 A Z tacks 1 random character from A-Z onto the beginning of new
mv old new renames the file
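The replace-or-prepend decision is the part that makes reruns safe, so it can help to test it separately from the renaming. Here is a minimal sketch in plain bash (no jot, and nothing newer than the bash 3.2 that ships with macOS); swap_prefix and random_letter are hypothetical names.

```shell
#!/bin/bash

# Print NAME with "PREFIX_" in front. If NAME already looks like
# "<one character>_rest", that old prefix is replaced instead of stacked.
swap_prefix() {
    local name=$1 prefix=$2
    if [[ ${name:1:1} == '_' ]]; then
        name=${name:2}          # drop the old character and the underscore
    fi
    printf '%s_%s\n' "$prefix" "$name"
}

# Print one random uppercase letter.
random_letter() {
    local letters=ABCDEFGHIJKLMNOPQRSTUVWXYZ
    printf '%s\n' "${letters:RANDOM % ${#letters}:1}"
}

# Usage:
#   mv -- "$dir/$name" "$dir/$(swap_prefix "$name" "$(random_letter)")"
```

A file whose original name already has an underscore in second position (for example a_sample.wav) is indistinguishable from a prefixed one and would lose its first character, so this only works if no such names exist beforehand.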
You can also do it all in bash and there are several ways to approach it. The first is simply creating an array of letters containing whatever letters you want to use as a prefix and then generating a random number to use to choose the element of the array, e.g.
#!/bin/bash
letters=({0..9} {A..Z} {a..z}) ## array with [0-9] [A-Z] [a-z]
for i in *; do
num=$(($RANDOM % ${#letters[@]})) ## generate index 0-61 (array has 62 elements)
## remove echo to actually move file
echo "mv \"$i\" \"${letters[num]}_$i\"" ## move file
done
Example Use/Output
Currently the script only prints the changes it would make. To apply them, remove the echo "..." wrapper around the mv command and unescape the quotes:
$ bash ../randprefix.sh
mv "Kick808.mp3" "4_Kick808.mp3"
mv "Kickdrum SUB.wav" "h_Kickdrum SUB.wav"
mv "Kickdrum73.wav" "l_Kickdrum73.wav"
You can also do it by generating a random number for an ASCII character from 48 (character '0') through 126 (character '~'), excluding the backtick, then converting that number to its character and prefixing the filename with it, e.g.
#!/bin/bash
for i in *; do
num=$((($RANDOM % 79) + 48)) ## generate number 48-126 for '0' - '~'
letter=$(printf "\\$(printf '%03o' "$num")") ## letter from number
while [ "$letter" = '`' ]; do ## exclude '`'
num=$((($RANDOM % 79) + 48)) ## generate number
letter=$(printf "\\$(printf '%03o' "$num")")
done
## remove echo to actually move file
echo "mv \"$i\" \"${letter}_$i\"" ## move file
done
(similar output, all punctuation other than backtick is possible)
In each case you will want to place the script in your path or call it from within the directory containing the files you want to move. (You could split each path with dirname and basename and join the parts back together, so that the script can be called with the directory to search as an argument; that is left to you.)
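For the part left as an exercise, a possible sketch: walk the tree with find, split each path into directory and file name, and only print the mv commands. The function name plan_prefixes is invented, and the sketch assumes a find that supports -print0 (both the GNU and the macOS versions do).

```shell
#!/bin/bash

# Print (not run) one mv command per regular file under ROOT, giving each
# file a random one-character prefix. Directory names are left untouched
# and hidden files are skipped.
plan_prefixes() {
    local root=$1 path dir base
    local -a letters
    letters=({0..9} {A..Z} {a..z})
    while IFS= read -r -d '' path; do
        dir=${path%/*}      # directory part, unchanged
        base=${path##*/}    # file name to be prefixed
        printf 'mv -- %q %q\n' "$path" \
            "$dir/${letters[RANDOM % ${#letters[@]}]}_$base"
    done < <(find "$root" -type f ! -name '.*' -print0)
}

# Usage (dry run):  plan_prefixes ~/Desktop/test
```

Because the output is quoted with %q, it can be reviewed first and then piped to bash once it looks right.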
How do I rename a group of files in a directory in bash?
For example:
I have a group of files:
> 0001.txt
> 0002.txt
> 0003.txt
> 0004.txt
...
I need 0001.txt to become 0002.txt, 0002.txt to become 0003.txt, and so on.
The result should be:
0002.txt
0003.txt
0004.txt
0005.txt
...
If your filenames follow the given pattern, you can do something like this :
for file in `ls | egrep '^[[:digit:]]+\.txt$' | sort -r`
do
mv $file `printf %04d $(expr ${file%.*} + 1)`.txt
done
Edit
For filenames with the prefix tet you can modify the script above like this :
for file in `ls | egrep '^tet[[:digit:]]+\.txt$' | sort -r`
do
filename=${file%.*}
mv $file tet`printf %04d $(expr ${filename:3} + 1)`.txt
done
Out of curiosity, I would appreciate it if a bash expert knows a way to avoid the temporary variable filename.
You can use the simple script below:
#!/bin/bash
while IFS= read -r -d '' file; do
filename=$(basename "$file") # Strip the directory part: 'tet0002.txt', 'tet0001.txt'
filename=${filename%.*} # Drop the extension: 'tet0002', 'tet0001'
filename=${filename:3} # Keep the numeric part: '0002', '0001'
# Force base 10 with the '10#' prefix so the leading zeros are not read as
# octal, then restore the zero padding with printf. '-v' only makes mv
# verbose (can be removed)
mv -v "$file" tet"$(printf %04d "$((10#$filename + 1))")".txt
done < <(find . -maxdepth 1 -mindepth 1 -name "tet*.txt" -type f -print0)
See this in action
$ ls tet*
tet0003.txt tet0005.txt tet0008.txt
$ ./script.sh
`./tet0005.txt' -> `tet0006.txt'
`./tet0008.txt' -> `tet0009.txt'
`./tet0003.txt' -> `tet0004.txt'
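Note that the sample run works because no target name is already taken; with consecutive numbers (tet0003.txt and tet0004.txt both present) the order matters, which is why the first answer sorts in reverse. A sketch that separates the name arithmetic from the renaming, assuming every name ends in digits followed by an extension; next_name is a hypothetical helper.

```shell
#!/bin/bash

# Print the file name with its trailing number increased by one, keeping
# the prefix, the zero padding and the extension.
next_name() {
    local file=$1 ext stem digits prefix
    ext=${file##*.}               # 'txt'
    stem=${file%.*}               # 'tet0008'
    digits=${stem##*[!0-9]}       # '0008' (trailing run of digits)
    prefix=${stem%"$digits"}      # 'tet'
    # 10# forces base 10 so that 0008 is not rejected as invalid octal.
    printf '%s%0*d.%s\n' "$prefix" "${#digits}" "$((10#$digits + 1))" "$ext"
}

# Usage, highest number first so nothing is overwritten:
#   printf '%s\n' tet[0-9]*.txt | sort -r |
#   while IFS= read -r f; do mv -- "$f" "$(next_name "$f")"; done
```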
I am trying to upload some files to S3 and have this bash script:
#!/bin/bash
s3upload() {
echo $1
for f in $(find $d \( ! -regex '.*/\..*' \) -type f)
do
extension=$(file $f | cut -d ' ' -f2 | awk '{print tolower($0)}')
mimetype=$(file --mime-type $f | cut -d ' ' -f2)
echo $mimetype
fullpath=$(readlink -f $f)
#response=$(s3cmd put -v setacl --acl-public \
# --add-header="Expires: $(date -u +"%a, %d %b %Y %H:%M:%S GMT" --date "+1 years")" \
# --add-header="Cache-Control: max-age=1296000, public" \
# --mime-type=$mimetype \
# $fullpath \
# s3://ccc-public/catalog/)
#echo $response
done
}
BASE='./nas/cdn/catalog'
echo $BASE
for d in $(find . -type d -regex '\{$BASE}/[^.]*')
do
echo "Uploading $d"
s3upload $d
done
The issue is that I can't pass $BASE to the regex.
Basically, I want to append the directory path after catalog/ to the S3 path s3://ccc-public/catalog/
./nas/cdn/catalog/swatches
./nas/cdn/catalog/product_shots
./nas/cdn/catalog/product_shots/high_res
./nas/cdn/catalog/product_shots/high_res/back
./nas/cdn/catalog/product_shots/high_res/front
./nas/cdn/catalog/product_shots/low_res
./nas/cdn/catalog/product_shots/low_res/back
./nas/cdn/catalog/product_shots/low_res/front
./nas/cdn/catalog/product_shots/thumbs
./nas/cdn/catalog/full_length
./nas/cdn/catalog/full_length/high_res
./nas/cdn/catalog/full_length/low_res
./nas/cdn/catalog/cropped
./nas/cdn/catalog/drawings
to s3://ccc-public/catalog/
any advice much appreciated
Variables inside 'single quotes' are never expanded. You need "double quotes" for $BASE to be replaced by its value.
See http://mywiki.wooledge.org/Quotes, http://mywiki.wooledge.org/Arguments and http://wiki.bash-hackers.org/syntax/words.
Moreover, instead of using for loops, you should use while IFS= read -r to treat files with special characters like spaces and other surprises.
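Also, find can drive the whole loop on its own. Because s3upload is a shell function rather than a program, it has to be exported and called through bash -c (and it should then read its directory from "$1" instead of the global $d):
BASE='./nas/cdn/catalog'
export -f s3upload
find . -type d -regex "${BASE}/[^.]*" -exec bash -c 's3upload "$1"' _ {} \;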
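For the second half of the question (appending the part of the path after catalog/ to the S3 prefix), parameter expansion is enough. A small sketch; the bucket prefix comes from the question, while the function name s3_target is made up.

```shell
#!/bin/bash

# Print the S3 destination for DIR by stripping the local BASE prefix and
# appending the remainder to the bucket path from the question.
s3_target() {
    local base=$1 dir=$2
    printf 's3://ccc-public/catalog/%s\n' "${dir#"$base"/}"
}

# Usage inside s3upload (s3cmd options omitted here):
#   s3cmd put "$fullpath" "$(s3_target "$BASE" "$d")/"
```

The quotes around $base inside the expansion make the prefix match literally, so the leading dot and the slashes in BASE are not treated as pattern characters.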
I created a script to compare files in a folder (one with the .jpg extension and one without it, but with the same name). The problem is that the script only searches ONE directory, not its subdirectories. How can I fix it?
for f in *
do
for n in *.jpg
do
tempfile="${n##*/}"
echo "Processing"
echo "${tempfile%.*}"
echo "$f"
if [[ "${tempfile%.*}" = $f ]]
then
echo "These files have the same name!"
# do something here
else
echo "No files"
fi
done
done
This requires bash version 4 for associative arrays.
shopt -s globstar nullglob extglob
declare -A jpgs
for jpg in **/*.jpg; do
name=$(basename "${jpg%.jpg}")
jpgs["$name"]=$jpg
done
for f in **/!(*.jpg); do
name=$(basename "$f")
if [[ -n ${jpgs["$name"]} ]]; then
echo "$f has the same name as ${jpgs["$name"]}"
fi
done
You can also try using find
find . -type f -name "*.jpg" -printf "%f\n" | cut -f1 -d '.' > jpg.txt
while read line
do
find . -name "$line.*" -print
done < jpg.txt
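If the goal is an exact match (photo.jpg paired with a file called just photo), the temporary file and the "$line.*" wildcard can be avoided. A sketch that also works on bash versions without associative arrays, assuming the base names contain no glob characters such as * or [; find_jpg_twins is a hypothetical name.

```shell
#!/bin/bash

# For every *.jpg under ROOT, print the paths of regular files anywhere
# under ROOT whose name equals the jpg's name without the extension.
find_jpg_twins() {
    local root=$1 jpg stem
    while IFS= read -r -d '' jpg; do
        stem=$(basename "$jpg" .jpg)
        find "$root" -type f -name "$stem" -print
    done < <(find "$root" -type f -name '*.jpg' -print0)
}

# Usage:  find_jpg_twins .
```

This runs one find per jpg, so for very large trees the associative-array approach above is likely to be faster.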