split directory paths and use with regex, bash - bash

I am trying to upload some files to S3 and have this bash script:
#!/bin/bash
s3upload() {
echo $1
for f in $(find $d \( ! -regex '.*/\..*' \) -type f)
do
extension=$(file $f | cut -d ' ' -f2 | awk '{print tolower($0)}')
mimetype=$(file --mime-type $f | cut -d ' ' -f2)
echo $mimetype
fullpath=$(readlink -f $f)
#response=$(s3cmd put -v setacl --acl-public \
# --add-header="Expires: $(date -u +"%a, %d %b %Y %H:%M:%S GMT" --date "+1 years")" \
# --add-header="Cache-Control: max-age=1296000, public" \
# --mime-type=$mimetype \
# $fullpath \
# s3://ccc-public/catalog/)
#echo $response
done
}
BASE='./nas/cdn/catalog'
echo $BASE
for d in $(find . -type d -regex '\{$BASE}/[^.]*')
do
echo "Uploading $d"
s3upload $d
done
The issue is that I can't pass $BASE to the regex.
Basically, I want to append the directory path after catalog/ to the S3 path s3://ccc-public/catalog/, so that these directories:
./nas/cdn/catalog/swatches
./nas/cdn/catalog/product_shots
./nas/cdn/catalog/product_shots/high_res
./nas/cdn/catalog/product_shots/high_res/back
./nas/cdn/catalog/product_shots/high_res/front
./nas/cdn/catalog/product_shots/low_res
./nas/cdn/catalog/product_shots/low_res/back
./nas/cdn/catalog/product_shots/low_res/front
./nas/cdn/catalog/product_shots/thumbs
./nas/cdn/catalog/full_length
./nas/cdn/catalog/full_length/high_res
./nas/cdn/catalog/full_length/low_res
./nas/cdn/catalog/cropped
./nas/cdn/catalog/drawings
to s3://ccc-public/catalog/
Any advice is much appreciated.

Variables inside 'single quotes' are never expanded. You need "double quotes" around $BASE, and the expansion should be written ${BASE}, not \{$BASE}.
See http://mywiki.wooledge.org/Quotes, http://mywiki.wooledge.org/Arguments and http://wiki.bash-hackers.org/syntax/words.
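Moreover, instead of a for loop over $(find ...), you should use while IFS= read -r so that file names with spaces and other surprising characters are handled correctly.
Also, find can do the whole job on its own, provided s3upload is an executable script rather than a shell function (-exec cannot call a shell function directly):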
BASE='./nas/cdn/catalog'
find . -type d -regex "${BASE}/[^.]*" -exec s3upload {} \;
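The remaining part of the question, appending the path after catalog/ to the S3 prefix, can be handled with the prefix-stripping expansion `${d#"$BASE"/}`. The sketch below is only an illustration under stated assumptions: `s3_dest`, `bucket`, and the temporary demo tree are made-up names, and it prints destinations instead of calling s3cmd.

```bash
#!/usr/bin/env bash
BASE='./nas/cdn/catalog'
bucket='s3://ccc-public/catalog'   # destination prefix taken from the question

# s3_dest DIR: hypothetical helper mapping a directory under $BASE
# to the matching S3 prefix by stripping the leading "$BASE/".
s3_dest() {
  printf '%s/%s/\n' "$bucket" "${1#"$BASE"/}"
}

# Throwaway tree that mimics two of the directories listed in the question.
demo_root=$(mktemp -d)
mkdir -p "$demo_root/nas/cdn/catalog/swatches" \
         "$demo_root/nas/cdn/catalog/product_shots/high_res/back"

dests=()
# Double quotes let ${BASE} expand inside the regex; -print0 with
# read -d '' keeps each path intact whatever characters it contains.
while IFS= read -r -d '' d; do
  dests+=("$(s3_dest "$d")")
done < <(cd "$demo_root" && find . -type d -regex "${BASE}/[^.]*" -print0)

printf '%s\n' "${dests[@]}"
rm -rf "$demo_root"
```

The loop reads from a process substitution rather than a pipe so that the `dests` array is still set after the loop ends.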

Related

How to mv using sed to rename files with trailing space?

We have a very large file structure that has been very badly built. Paths contain lots of spaces, #, spaces around dashes.
It's all hosted on a Synology NAS, so I don't have access to the whole array of tools usually included (like rename).
I'm trying to rename files AND folders whose names have leading or trailing spaces.
# Global vars
tstamp=$(date +%Y-%m-%d_%H%M%S)
# Change for separator to newline
IFS=$'\n'
echo "$tstamp - Renaming files with leading space: \n"
for filename in $(find . -type f -name '[[:space:]]*')
do
newFilename=$(echo $filename |sed 's/\/[[:space:]]/\//g')
echo "original: $filename"
echo "new : $newFilename"
mv -i -v -n $filename $newFilename
echo "\n"
done
echo "$tstamp - Renaming files with trailing space: \n"
for filename in $(find . -type f -name '*[[:space:]]')
do
newFilename=$(echo $filename |sed 's/[[:space:]]$//g')
echo "original: $filename"
echo "new : $newFilename"
mv -i -v -n $filename $newFilename
echo "\n"
done
# A slash "/" in a filename is not possible thus it's not verified
echo "$tstamp - Renaming files with unsupported characters (\ / \" : < > ; | * ?):"
for filename in $(find . -type f -name '*\**' -o -name '*\\*' -o -name '*"*' -o -name '*:*' -o -name '*<*' -o -name '*>*' -o -name '*;*' -o -name '*|*' -o -name '*\?*')
do
newFilename=$(echo $filename |sed 's/\(\\\|"\|:\|<\|>\|;\||\|\*\|\?\)//g')
echo "original: $filename"
echo "new : $newFilename"
mv -i -v -n $filename $newFilename
echo "\n"
done
echo "Done."
#EOF
Renaming files with unsupported characters works well, but not the leading and trailing spaces.
Here's an actual output where I replaced some names for security purposes:
original:
./ABC- Financing/2018 - ABC Capital Bl Fund 2018 (VCCI)/0 - Dataroom/8 - Vérification diligente/3. Governance/ 2017Q1/ Documents de Julie/#eaDir/ PPP#SynoResource
new:
./ABC - Financing/2018 - ABC Capital Innovation Fund 2018 (GGGG)/0 - Dataroom/8 - Vérification diligente/3. Governance/2017Q1/Documents de Julie/#eaDir/PPP#SynoResource
./ABC - Financing/2018 - ABC Capital Innovation Fund 2018 (GGGG)/0 - Dataroom/8 - Vérification diligente/3. Governance/ 2017Q1/ Documents de Julie/#eaDir/ CDP#SynoResource → ./ABC - Financing/2018 - ABC Capital Innovation Fund 2018 (GGGG)/0 - Dataroom/8 - Vérification diligente/3. Governance/2017Q1/Documents de Julie/#eaDir/PPP#SynoResource
mv: cannot move "./ABC - Financing/2018 - ABC Capital Innovation Fund 2018 (GGGG)/0 - Dataroom/8 - Vérification diligente/3. Governance/ 2017Q1/ Documents de Julie/#eaDir/ PPP#SynoResource" to "./ABC - Financing/2018 - ABC Capital Innovation Fund 2018 (GGGG)/0 - Dataroom/8 - Vérification diligente/3. Governance/2017Q1/Documents de Julie/#eaDir/PPP#SynoResource": No such file or directory
I don't understand why the file isn't found by the mv command.
Start with this (uses GNU versions of find and sed):
#!/usr/bin/env bash
readarray -d '' paths < <(find . -depth -print0)
for old in "${paths[@]}"; do
printf 'Working on path %q\n' "$old" >&2
new=$(
printf '%s' "$old" |
sed -z '
s#[\\":<>;|*?]##g
s#[[:space:]]*/[[:space:]]*#/#g
s#[[:space:]]*$##
'
)
if [[ "$new" != "$old" ]]; then
printf 'old: %q\n' "$old" >&2
printf 'new: %q\n' "$new" >&2
[[ -f "$new" ]] && printf 'Warning: %q already exists.\n' "$new" >&2
mv -i -v -n -- "$old" "$new"
printf '\n'
fi
done
You can probably replace the printf | sed with some bash builtins for a performance improvement but life's too short for me to try to figure that out and the above should be clear and simple enough for any other changes you need to make.
The above is untested so make sure you take a backup of your files and test it thoroughly on a temp dir before running on your real files.
Let's try to do it safely and correctly this way instead:
#!/usr/bin/env bash
shopt -s extglob # set up extended globbing so a group can match multiple times
# Find all files or directories names that:
# either starts with spaces,
# or ends with spaces,
# or contains any of the \ " : < > ; | * ? prohibited characters
find . \
-depth \
\( -type f -or -type d \) \
-regextype posix-extended \
-regex '.*/([[:space:]][^/]*|[^/]*[[:space:]]|[^/]*[\\":<>;|*?][^/]*)' \
-print0 \
| while IFS= read -r -d '' filename; do
# Isolates the file name from its directory path
base="$(basename -- "${filename}")"
# ExtGlob strips-out all instances of prohibited characters class using //
# [\\\":<>;|*?]
base="${base//[\\\":<>;|*?]/}"
# ExtGlob strips-out leading spaces
# *([[:space:]]):
# * 0 or any times the following (group)
# [[:space:]] any space character
base="${base/*([[:space:]])/}"
# ExtGlob strips-out trailing spaces using %%
base="${base%%*([[:space:]])}"
# Compose a new file name from the new base
newFilename="$(dirname -- "${filename}")/${base}"
# Prevent the new file name from colliding with existing files
# by adding a versioned suffix
suffix=''
count=1
while [[ -e "${newFilename}${suffix}" ]]; do
suffix=".$((count++))"
done
newFilename="${newFilename}${suffix}"
printf \
"original: '%s'\\nnew : '%s'\\n\\n" \
"${filename}" \
"${newFilename}"
mv -- "${filename}" "${newFilename}"
done
echo 'Done.'
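As for why the original mv fails: the sed expressions in the question rewrite every component of the path, so the destination's parent directories (for example `2017Q1` without its leading space) do not exist yet. A minimal sketch of cleaning only the last path component is shown below; `clean_name` is a hypothetical helper and the path is a shortened stand-in for the one in the question.

```bash
# clean_name NAME: hypothetical helper that sanitizes ONE path component:
# drops the prohibited characters, then leading and trailing whitespace.
clean_name() {
  printf '%s' "$1" | sed \
    -e 's/[\\":<>;|*?]//g' \
    -e 's/^[[:space:]]*//' \
    -e 's/[[:space:]]*$//'
}

# Shortened stand-in for the path shown in the question.
old='./3. Governance/ 2017Q1/ PPP#SynoResource'

# Only the last component changes, so the parent directory of the
# destination is the one that exists right now.
new="$(dirname -- "$old")/$(clean_name "$(basename -- "$old")")"

printf 'old: %s\nnew: %s\n' "$old" "$new"
```

Walking the tree with `find -depth` and renaming one component at a time, as both answers do, means a directory is only renamed after everything inside it has been handled.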

read printf format from a bash var

I have a bash script I'm happy with:
$ printf ' Number of xml files: %s\n' `find . -name '*.xml' | wc -l`
42
$
then the message became longer:
$ printf ' Very long message here about number of xml files: %s\n' `find . -name '*.xml' | wc -l`
42
$
So I tried to put it in a MSG var to stay within 80 columns:
$ MSG=' Number of xml files after zip-zip extraction: %s\n'
$ printf $MSG `find xml_out -name '*.xml' | wc -l`
with no success:
$ printf $MSG `find xml_out -name '*.xml' | wc -l`
Number$
$
You need to put it inside double quotes:
printf "$MSG" `ls | wc -l`
You can do it this way:
msg=' Number of xml files after zip-zip extraction: %s\n'
printf "$msg" "$(find xml_out -name '*.xml' -exec printf '.' \; | wc -c)"
msg should be quoted in the printf command.
Counting one printed character per file with wc -c, rather than piping the names to wc -l, avoids miscounts when file names contain newlines or other unusual characters.
Avoid all-uppercase variable names in shell scripts.
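The failure comes from word splitting: unquoted, the format is split on whitespace, printf takes only the first word (`Number`) as its format, and the rest become arguments with nothing to print them. A small self-contained sketch follows; `count_xml` and the temporary directory are assumptions made for the demo.

```bash
msg=' Number of xml files after zip-zip extraction: %s\n'

# count_xml DIR: hypothetical helper that prints one dot per match and
# counts the dots, so a newline in a file name cannot inflate the total.
count_xml() {
  find "$1" -name '*.xml' -exec printf '.' \; | wc -c
}

# Demo directory with two files, one of which has a newline in its name.
demo_dir=$(mktemp -d)
touch "$demo_dir/a.xml" "$demo_dir/$(printf 'b\nc').xml"

xml_count=$(count_xml "$demo_dir")

# Quoted: the whole string is the format and the count fills in %s.
printf "$msg" "$xml_count"

rm -rf "$demo_dir"
```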

Bash new line feed in results [duplicate]

Trying to create a mysql backup script.
However, I am finding that I am getting line feeds in the results:
#!/bin/bash
cd /home
for i in $(find $PWD -type f -name "wp-config.php" );
do echo "'$i'";
done
And the results show:
'/home/site1/public_html/folders/wp-config.php'
\'/home/site2/public_html/New'
'Website/wp-config.php'
'/home/site3/public_html/wp-config.php'
'/home/site4/public_html/old'
'website/wp-config.php'
'/home/site5/public_html/wp-config.php'
Doing an ls from the command line, we see for the folders in question:
New\ website
old\ website
and it is treating the '\' as a newline character.
OK.. Doing some research:
https://stackoverflow.com/a/5928254/175063
${foo/ /.}
Updating for what we may want:
${i/\ /}
The code now becomes:
#!/bin/bash
cd /home
for i in $(find $PWD -type f -name "wp-config.php" |${i/\ /});
do echo "'$i'";
done
Ref. https://tomjn.com/2014/03/01/wordpress-bash-magic/
Ultimately, I really want something like this:
!/bin/bash
# delete files older than 7 days
## find /home/dummmyacount/backups/ -type f -name '*.7z' -mtime +7 -exec rm {} \;
# set a date variable
DT=$(date +"%m-%d-%Y")
cd /home
for i in $(find $PWD -type f -name "wp-config.php" );
WPDBNAME=`cat $i | grep DB_NAME | cut -d \' -f 4`
WPDBUSER=`cat $i | grep DB_USER | cut -d \' -f 4`
WPDBPASS=`cat $i | grep DB_PASSWORD | cut -d \' -f 4`
do echo "$i";
#do echo $File;
#mysqldump...
done
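You can do this:
find . -type f -name "wp-config.php" -print0 | while IFS= read -r -d '' f
do
printf '[%s]\n' "$f"
done
This uses the NUL character as the delimiter, so paths containing spaces or other special characters are read intact. The line breaks in the original output came from word splitting on the space in the folder names, not from the backslash that ls displays.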
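Applied to the backup script the question is aiming for, the same NUL-delimited loop might look like the sketch below. It is only an illustration: the demo tree, the `shop_db` value, and the `db_names` array are invented for the example, and the grep | cut extraction is the one from the question.

```bash
#!/usr/bin/env bash
# Demo tree with a space in a folder name, as in the question.
demo_root=$(mktemp -d)
mkdir -p "$demo_root/site2/public_html/New Website"
printf "define( 'DB_NAME', 'shop_db' );\n" \
  > "$demo_root/site2/public_html/New Website/wp-config.php"

db_names=()
while IFS= read -r -d '' config; do
  # Field 4 when the line is split on single quotes is the value.
  db_names+=("$(grep DB_NAME "$config" | cut -d "'" -f 4)")
done < <(find "$demo_root" -type f -name 'wp-config.php' -print0)

printf 'database: %s\n' "${db_names[@]}"
rm -rf "$demo_root"
```

Feeding the loop through `< <(find ...)` instead of a pipe keeps it in the current shell, so variables set inside the loop remain available afterwards.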

Unexpected Termination of While Loop in Bash

The below code snippet is for searching files recursively and iterating them.
find . -type f -not -name '*.ini' -print0 | while IFS= read -r -d '' filename; do
echo "$filename"
done
It gives this result:
1.jpg
2.jpg
3.jpg
But if I want to process the file somehow like this
find . -type f -not -name '*.ini' -print0 | while IFS= read -r -d '' filename; do
echo "$filename"
echo "$(${ExternalApp} -someparams $filename 2> /dev/null| cut -f 2- -d: | cut -f 2- -d ' ' )"
done
The loop terminates after the first iteration and the result becomes:
1.jpg
I have recently updated bash (I'm on windows with MSYS). What is the problem here?
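The external command reads from standard input, which inside the loop is find's output, so it swallows the remaining file names. This is an especially common problem when using ssh, ffmpeg or mplayer.
If the command doesn't need input at all, you can redirect its stdin from /dev/null: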
find . -type f -not -name '*.ini' -print0 | while IFS= read -r -d '' filename; do
echo "$filename"
# "< /dev/null" keeps the command from reading find's output
echo "$(${ExternalApp} -someparams "$filename" < /dev/null 2> /dev/null |
cut -f 2- -d: | cut -f 2- -d ' ' )"
done
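The effect is easy to reproduce without the real tool. In this sketch `greedy` is a hypothetical stand-in for an external program that reads stdin, and the three file names are fed in with printf instead of find.

```bash
# greedy: hypothetical stand-in for a tool that reads all of stdin.
greedy() { cat > /dev/null; }

# Without the redirect, greedy consumes the remaining lines of the pipe.
without_redirect=$(
  printf '%s\n' 1.jpg 2.jpg 3.jpg | {
    n=0
    while IFS= read -r filename; do
      greedy
      n=$((n + 1))
    done
    echo "$n"
  }
)

# With stdin taken from /dev/null, the loop sees every line.
with_redirect=$(
  printf '%s\n' 1.jpg 2.jpg 3.jpg | {
    n=0
    while IFS= read -r filename; do
      greedy < /dev/null
      n=$((n + 1))
    done
    echo "$n"
  }
)

echo "iterations without redirect: $without_redirect"
echo "iterations with redirect: $with_redirect"
```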

Find all files that contain both string1 and string2

The following script finds and prints the names of all files that contain either string1 or string2.
However, I could not figure out how to change the code so that it prints only the files that contain both string1 and string2. Kindly suggest the required change.
number=0
for file in `find -name "*.txt"`
do
if [ "`grep "string2\|string1" $file`" != "" ] // change has to be done here
then
echo "`basename $file`"
number=$((number + 1))
fi
done
echo "$number"
Using grep and cut:
grep -H string1 input | grep -E '[^:]*:.*string2' | cut -d: -f1
You can use this with the find command:
find -name '*.txt' -exec grep -H string1 {} \; | grep -E '[^:]*:.*string2'
And if the patterns are not necessarily on the same line:
find -name '*.txt' -exec grep -l string1 {} \; | \
xargs -n 1 -I{} grep -l string2 {}
This solution can handle files with spaces in their names:
number=0
oldIFS=$IFS
IFS=$'\n'
for file in `find -name "*.txt"`
do
if grep -l "string1" "$file" >/dev/null; then
if grep -l "string2" "$file" >/dev/null; then
basename "$file"
number=$((number + 1))
fi
fi
done
echo $number
IFS=$oldIFS
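Another way to require both strings, wherever they appear in the file, is to chain two `grep -q` tests inside find itself; find joins its tests with an implicit AND, so -print only fires when both succeed. A minimal sketch with made-up demo files (the file names and the temporary directory are assumptions for illustration):

```bash
demo_dir=$(mktemp -d)
printf 'string1\nstring2\n' > "$demo_dir/both.txt"
printf 'string1\n'          > "$demo_dir/only one.txt"
printf 'string2 string1\n'  > "$demo_dir/same line.txt"

# A path is printed only if both grep -q commands exit successfully for it.
matches=$(
  find "$demo_dir" -name '*.txt' \
    -exec grep -q 'string1' {} \; \
    -exec grep -q 'string2' {} \; \
    -print
)

# Count non-empty lines; this assumes no file name contains a newline.
number=$(printf '%s\n' "$matches" | grep -c .)

printf '%s\n' "$matches"
echo "$number"
rm -rf "$demo_dir"
```

Because find passes each path to grep as a single argument, names with spaces need no IFS changes.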

Resources