How to write a script for incremental backup in Ubuntu (bash)?

I want to implement incremental backup in Ubuntu. My idea is to compute the md5sum of every file in the source and target: if two files have the same md5sum, keep the file in the destination; if they differ, copy the file from the source into the destination directory.
I am thinking of doing this in bash.
Can anyone help me with the commands to check the md5sum of two files in different directories?
Thanks in advance!
#!/bin/bash
#
SOURCE="/home/pallavi/backup1"
DEST="/home/pallavi/BK"
count=1
TODAY=$(date +%F_%H%M%S)
cd "${DEST}" || exit 1
mkdir "${TODAY}"
while [ "$count" -le 1 ]; do
    count=$(( count + 1 ))
    cp -R "${SOURCE}"/* "${DEST}/${TODAY}"
    mkdir "MD5"
    cd "${DEST}/${TODAY}" || exit 1
    for f in *; do
        md5sum "${f}" > "${TODAY}${f}.md5"
        echo "${f}"
    done
    if [ $? -ne 0 ] && [[ $IGNORE_ERR -eq 0 ]]; then
        # error or eof
        echo "end of source or error"
        break
    fi
done
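For the literal question of checking whether two files in different directories have the same md5sum, here is a minimal sketch (the paths are placeholders):
file1="/home/pallavi/backup1/somefile"
file2="/home/pallavi/BK/somefile"
# Hash via stdin so the differing paths don't appear in the output.
if [[ $(md5sum < "$file1") == "$(md5sum < "$file2")" ]]; then
    echo "same content"
else
    echo "different content"
fi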

This is a reinventing-the-wheel sort of thing. There are utilities already written for this kind of purpose; to name a few (see the example after the list):
For copying:
rsync
cp (GNU cp(1) has the -u flag)
For comparing files:
cmp
diff
For finding duplicates:
fdupes
rmlint
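For example, a minimal incremental copy with rsync, or with GNU cp -u (paths are placeholders):
# rsync only transfers files that are new or changed in the source.
rsync -av /home/pallavi/backup1/ /home/pallavi/BK/
# GNU cp -u copies only when the source is newer or the destination is missing.
cp -Ru /home/pallavi/backup1/. /home/pallavi/BK/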
Here is what I've come up with, in the reinventing-the-wheel spirit.
#!/usr/bin/env bash

shopt -s extglob

declare -A source_array

while IFS= read -r -d '' files; do
    read -r source_hash source_files < <(sha512sum "$files")
    source_array["$source_hash"]="$source_files"
done < <(find source/ -type f -print0)

# Build an extglob pattern of all source hashes: @(hash1|hash2|...)
source=$( IFS='|'; printf '%s' "@(${!source_array[*]})" )

while IFS= read -r -d '' files0; do
    read -r destination_hash destination_files < <(sha512sum "$files0")
    if [[ $destination_hash == $source ]]; then
        echo "$destination_files" FOUND from source/ directory
    else
        echo "$destination_files" NOT-FOUND from source/ directory
    fi
done < <(find destination/ -type f -print0)
It should be safe enough with file names containing spaces, tabs, and newlines, though I don't have files with newlines in their names, so I can't say for sure.
Change the action from the if-else statement depending on what you want to do.
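For example, if you want unmatched files copied somewhere instead of just reported, the else branch could become something like this (archive/ is a hypothetical destination):
else
    # Hypothetical action: collect files that have no match in source/.
    mkdir -p archive
    cp -- "$destination_files" archive/
fi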
OK, maybe sha512sum is a bit of an overkill; change it to md5sum if you prefer.
Add set -x after the shebang to see what's actually being executed. Good luck.

Related

glob operator with for loop is stuck

I am trying to traverse all files in the /home directory recursively and run some Linux command on each file, so I am using a for loop like this:
for i in /home/**/*
I have also put these statements at the start of the script:
shopt -s globstar
shopt -s nullglob
But it gets stuck in the for loop. It might be a problem with handling so many files; if I point the loop at another directory (with fewer files), it traverses properly.
What else can I try?
Complete code:
#!/bin/bash
shopt -s globstar
shopt -s nullglob
echo "ggg"
for i in /home/**/*
do
    NAME=${i}
    echo "It's there." "$NAME"
    if [ -f "$i" ]; then
        echo "It's there." "$NAME"
        printf "\n\n"
    fi
done
Your code isn't getting stuck. It will just be very, very slow since it needs to build up the list of all files before entering the for loop. The standard alternative is to use find, but you need to be careful about what exactly you want to do. If you want it to behave exactly like your for loop, which means i) ignore hidden files (those whose name starts with .) and ii) follow symlinks, you can do this (assuming GNU find since you are on Linux):
find -L . -type f -not -name '.*' -printf '.\n' | wc -l
That will print a . for each file found, so wc -l will give you the number of files. The -L makes find dereference symlinks and the -not -name '.*' will exclude hidden files.
If you want to iterate over the output and do something to each file, you would need to use this:
find -L . -type f -not -name '.*' -print0 |
while IFS= read -r -d '' file; do
printf -- "FILE: %s\n" "$file"
done
Perhaps this approach may help:
#!/bin/bash
shopt -s globstar
shopt -s nullglob
echo "ggg"
for homedir in /home/*/
do
    for i in "$homedir"**
    do
        NAME=${i}
        echo "It's there." "$NAME"
        if [ -f "$i" ]; then
            echo "It's there." "$NAME"
            printf "\n\n"
        fi
    done
done
Update: Another approach in pure bash might be
#!/bin/bash

shopt -s nullglob

walktree() {
    local file
    for file in *; do
        [[ -L $file ]] && continue   # skip symlinks
        if [[ -f $file ]]; then
            # Do something with the file "$PWD/$file"
            echo "$PWD/$file"
        elif [[ -d $file ]]; then
            cd "./$file" || exit
            walktree
            cd ..
        fi
    done
}

cd /home || exit
walktree

bash move 500 directories at a time to subdirectory from a total of 160,000 directories

I needed to move a large s3 bucket to a local file store for a variety of reasons, and the files were stored as 160,000 directories with subdirectories.
As this is just far too many folders to look at with something like a gui FTP interface, I'd like to move the 160,000 root directories into, say, 320 directories - 500 directories in each.
I'm a newbie at bash scripting, and I just wrote this up, but I'm scared I'm going to mangle the whole thing and have to redo the transfer. I tested with [[ "$i" -ge 3 ]] and some directories with subdirectories, and it looked like it worked okay, but I'm quite nervous. I do not want to retransfer all this data.
i=0
j=0
for file in *; do
    if [[ -d "$file" && ! -L "$file" ]]; then
        ((i++))
        echo "directory $file is being written to assets_$j"
        mkdir -p "./assets_$j"   # make sure the chunk directory exists
        mv "$file" "./assets_$j/"
        if [[ "$i" -ge 499 ]]; then
            ((j++))
            i=0
        fi
    fi
done
Thanks for the help!
Find all the directories in the current folder, read them in chunks of 500, and exec mkdir and mv for each chunk:
find . -mindepth 1 -maxdepth 1 -type d |
while readarray -n 500 -t files && ((${#files[@]})); do
    dest="./assets_$((j++))/"
    echo mkdir -v -p "$dest"
    echo mv -v "${files[@]}" "$dest"
done
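Here too, the mkdir and mv are prefixed with echo as a dry run; drop the echos once the output looks right.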
On the condition that assets_1, assets_2, etc. do not exist in the working directory yet:
dirs=(./*/)
for (( i=0, j=1; i<${#dirs[@]}; i+=500, j++ )); do
    echo mkdir "./assets_$j/"
    echo mv "${dirs[@]:i:500}" "./assets_$j/"
done
If you're happy with the output, remove the echos.
A possible way, though you have no control over the counter, is:
find . -mindepth 1 -maxdepth 1 -type d -print0 \
    | xargs -0 -n 500 sh -c 'echo mkdir -v ./assets_$$ && echo mv -v "$@" ./assets_$$' _
This derives the assets counter from the PID, which only recycles when the wrap-around value is reached (Linux PID recycling).
The order in which find returns entries is slightly different from the glob * (find's default sorting order).
If you want to have the sort order alphabetically, you can add a simple sort:
find . -mindepth 1 -maxdepth 1 -type d -print0 | sort -z \
    | xargs -0 -n 500 sh -c 'echo mkdir -v ./assets_$$ && echo mv -v "$@" ./assets_$$' _
Note: remove the echos if you are pleased with the output.

Can't move executables

I'm working on this script, but the -x option isn't working; it's supposed to move only the executable files.
This is the error I'm receiving:
$ sh wizlast.sh u555 -x
mv: target ‘./u555/ud’ is not a directory
It targets the right file (ud) but doesn't move it. I've tried different combinations.
#!/bin/bash
dir=$1
if [ $# -lt 1 ] ; then
    echo "ERROR: no argument"
    exit 1 # not 0
else
    case $2 in
        -d)
            mv $dir/* /tmp/*
            echo 'moving with -d'
            ;;
        -x)
            find -executable -type f | xargs mv -t "$dir"/* /tmp
            echo 'moving executables'
            ;;
        *)
            mv $dir/* /tmp/
            echo 'no flag passed so moving all'
            echo "mv $dir/* /tmp/"
            ;;
    esac
fi
man mv shows:
-t, --target-directory=DIRECTORY
You can't use $dir/* as a target directory, as the shell expands it and treats the first file in the list as the target (hence the error).
Use this format instead. For example, to move the found files into $dir:
find -executable -type f | xargs -I{} mv {} "$dir"/
The -I{} tells xargs to replace each occurrence of {} with a string read from the pipe, so each filename is substituted between mv and the directory "$dir"/, and the command works normally.
The reason yours wasn't working is that the strings from find were appended last and so were treated as the directory to move into.
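Alternatively, since -t takes a literal target directory, you can skip -I and batch many files per mv invocation. A sketch, assuming GNU find and coreutils:
# -print0/-0 keeps filenames with spaces intact; -t names /tmp as the target.
find . -executable -type f -print0 | xargs -0 mv -t /tmp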
As you're working with Bash, you should leverage its tools and syntax improvements.
Solution: for loop and globbing
So instead of using find you can use globbing and [[ -x ]] to test if the current file is executable:
for f in "$dir"/*; do
    if [[ -x $f ]]; then
        mv "$f" /tmp
    fi
done
It uses the conditional expression -x inside [[ … ]]:
-x file
True if file exists and is executable
As a one-liner
You can rewrite it like: for f in "$dir"/*; do [[ -x $f ]] && mv "$f" /tmp; done
Deeper search (depth > 1)
The current loop is limited to what is directly in "$dir"/. If you want to explore deeper levels, you will need:
to toggle the globstar shell option using the shopt built-in;
to update the glob in the for loop to use it: "$dir"/**
shopt -s globstar # enable/set
for f in "$dir"/**/*; do [[ -x $f ]] && mv "$f" /tmp; done
shopt -u globstar # disable/unset
Arithmetic context
Bash has syntactic sugar that lets you replace:
if [ $# -lt 1 ] ; then … fi
with
if (( $# < 1 )); then … fi
For more about arithmetic expressions, read the articles at:
1. Wooledge's wiki;
2. bash-hackers' wiki.
Don't use wildcards in the destination part of a mv command: the shell expands them there too, and mv ends up with unintended extra operands. So instead of
mv $dir/* /tmp/*
do
mv $dir/* /tmp/

why does the mv command in bash delete files?

Running the following script to rename all .jpg files in the current folder works well sometimes, but it often deletes one or more of the files it is renaming. How would I write a script to rename files without deleting them in the process? This is running on Mac OS X 10.8 using GNU bash, version 3.2.48.
this is an example file listing I would change for illustration:
original files
red.jpg
blue.jpg
green.jpg
renamed files if counter is set to 5
file_5.jpg
file_6.jpg
file_7.jpg
Instead, I usually lose one or more files:
#!/bin/bash
counter=5
for file in *.jpg; do
    echo renaming "$file" to "file_${counter}.jpg"
    mv "$file" "file_${counter}.jpg"
    let "counter+=1"
done
** UPDATE **
It no longer seems to be deleting files, but the output is still not as expected. For example:
file_3.jpg
file_4.jpg
turns into
file_3.jpg
file_5.jpg
when counter is set to 4; the expected output is
file_4.jpg
file_5.jpg
#!/bin/bash
counter=3
for file in *.jpg; do
    if [[ -e file_${counter}.jpg ]] ; then
        echo Skipping "$file", file exists.
    else
        echo renaming "$file" to "file_${counter}.jpg"
        mv "$file" "file_${counter}.jpg"
    fi
    let "counter+=1"
done
The problem is that some of the files already have names corresponding to the target names. For example, if there are files
file_1.jpg
file_7.jpg
and you start with counter=7, you overwrite file_7.jpg with file_1.jpg in the first step, and then rename it to file_8.jpg.
You can use mv -n to prevent clobbering (if supported), or test for existence before running the command
if [[ -e file_${counter}.jpg ]] ; then
    echo Skipping "$file", file exists.
else
    mv "$file" "file_${counter}.jpg"
fi
I think you are glossing over an obvious problem with the glob. If the glob matches file_2.jpg, it will try to create file_file_2.jpg (not literally; the point is that you will be reprocessing files you have already processed). To solve this, make sure your initial glob expression doesn't match the files you have already moved:
shopt -s extglob
i=0
for f in !(file_*).jpg ; do
    while [[ -e "file_${i}.jpg" ]] ; do
        (( i++ ))
    done
    mv -v "$f" "file_$i.jpg"
    (( i++ ))
done
What choroba said is correct. You can also use:
mv -n "$file" "file_${counter}.jpg"
to simply skip the move when the destination filename already exists, or
mv -i "$file" "file_${counter}.jpg"
to ask whether it should overwrite or not. (Keep the options before the operands; the BSD mv on OS X does not accept them afterwards.)
Instead of iterating over all of *.jpg, you should skip the already renamed files, i.e. file_[0-9]*.jpg, and run your loop like this:
counter=5
while IFS= read -r file; do
    echo renaming "$file" to "file_${counter}.jpg"
    mv -n "$file" "file_${counter}.jpg"
    let "counter+=1"
done < <(find . -maxdepth 1 -name "*.jpg" -not -name "file_[0-9]*.jpg")
Another way is to continue your counting until a file does not exist:
#!/bin/bash
counter=1
shopt -s extglob
for file in *.jpg; do
    [[ $file == ?(*/)file_+([[:digit:]]).jpg ]] && continue
    until
        newname=file_$(( counter++ )).jpg
        [[ ! -e $newname ]]
    do
        continue
    done
    echo "renaming $file to $newname."
    mv -i "$file" "$newname" ## Remove the -i option if you think it's safe already.
done
When doing things recursively:
#!/bin/bash
shopt -s extglob
counter=1
while IFS= read -r file; do
    dirprefix=${file%%+([^/])}   # strip the filename, keep the directory part with its trailing slash
    until
        newfile=${dirprefix}file_$(( counter++ )).jpg
        [[ ! -e $newfile ]]
    do
        continue
    done
    echo "renaming $file to $newfile."
    mv -i "$file" "$newfile" ## Remove the -i option if you think it's safe already.
done < <(find . -type f -name '*.jpg' -and -not -regex '.*/file_[0-9]+\.jpg')

Comparing large numbers of files in Bash quickly

I downloaded many files (~10,000) from a website, most of which are a bunch of useless HTML that all says the same thing. However, there are some files in this haystack that have useful information (and are thus fairly different files), and I need a quick way to separate those from the rest. I know I can go through all of the files one by one and use cmp to compare each to a template and see if they are the same, and then delete them. However, this is rather slow. Is there a faster way to do this? I don't mind if I only have a 99% recovery rate.
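One approach along those lines: hash the template once, and use the file size as a cheap pre-filter so most files are never read at all. A sketch, assuming the template lives at ../template and GNU stat/find:
#!/bin/bash
template_size=$(stat -c%s ../template)      # GNU stat; size in bytes
template_hash=$(md5sum < ../template)
# Only files with exactly the template's size can be identical to it.
find . -type f -size "${template_size}c" | while IFS= read -r filename; do
    if [[ $(md5sum < "$filename") == "$template_hash" ]]; then
        echo rm -- "$filename"              # remove the echo to actually delete
    fi
done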
This one lists the unique files in the tree passed as the argument:
#!/bin/bash
declare -A uniques
while IFS= read -r file; do
    [[ ! "${uniques[${file%% *}]}" ]] && uniques[${file%% *}]="${file##* }"
done < <(find "$1" -type f -exec md5sum -b "{}" \;)
for file in "${uniques[@]}"; do
    echo "$file"
done
Many thanks to triplee for the better approach using md5sum!
Previous version:
#!/bin/bash
declare -a files uniques
while IFS= read -r -d $'\0' file; do
    files[${#files[@]}]="$file"
done < <(find "$1" -type f -print0)
uniques=( "${files[@]}" )
for file in "${files[@]}"; do
    for unique in "${!uniques[@]}"; do
        [[ "$file" != "${uniques[$unique]}" ]] && cmp -s "$file" "${uniques[$unique]}" && unset -v 'uniques[$unique]'
    done
done
for unique in "${uniques[@]}"; do
    echo "$unique"
done
Assuming all the files are in or below the current directory, and the template is in the parent directory, and the filenames have no spaces:
find . -type f -print | while IFS= read -r filename; do
    if ! cmp --quiet "$filename" ../template; then
        echo rm "$filename"
    fi
done
Remove the echo if you're satisfied this works.
