Comparing directories with cksum to include spaces in file names - performance

I recently used rsync to transfer a large directory with many subdirectories, several GB in total. There are several thousand files to compare. I am trying to verify that everything went well and that every file is where it should be, i.e. that I have a clone of the original directory.
I want to feed this script (compare_directories) the two directories to compare: compare_directories dir1 dir2
#!/bin/bash
if [[ -z $2 ]];
then
# simple help text with no error checking
echo "compare_directories dir1 dir2"
echo " dir1 directory with files copied from"
echo " dir2 directory with files copied to"
else
export FROM=$1
export TO=$2
cnt=0
for fname in *
do
from=$(cksum $FROM/$fname)
from=${from%% *}
to=$(cksum $TO/${fname})
to=${to%% *}
cnt=$(($cnt + 1))
[ $(($cnt % 5)) -eq 0 ] && echo "$cnt files processed"
[ "$from" = "$to" ] && continue
echo "failure on file $fname"
done
fi
The line with the "x files processed" is just there to indicate that the comparison is still running by printing a counter after every 5 processed files.
I used this script to compare the checksums of each file, but it doesn't like spaces in file names, and I'm not entirely sure if it's running efficiently.
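One possible way to make the comparison tolerate spaces, sketched under a few assumptions: bash is available, find supports -print0, and cksum reads from standard input. The function name mirrors the script name above; the progress interval of 100 and the "missing" marker are illustrative choices, not part of the original script. Quoting every expansion and reading NUL-delimited names from find is what makes spaces safe, and walking the source directory (rather than `*` in the current directory) also covers subdirectories.

```shell
# compare_directories FROM TO
# Sketch: compare the cksum of every regular file under FROM with the
# file at the same relative path under TO. Requires bash.
compare_directories() {
    local from=$1 to=$2 cnt=0 failures=0 rel sum_from sum_to
    while IFS= read -r -d '' rel; do
        rel=${rel#./}
        # Reading from stdin keeps the file name out of cksum's output,
        # so only "checksum size" is compared.
        sum_from=$(cksum < "$from/$rel")
        if [[ -f $to/$rel ]]; then
            sum_to=$(cksum < "$to/$rel")
        else
            sum_to=missing
        fi
        cnt=$((cnt + 1))
        # Progress goes to stderr so it does not mix with the failure list.
        (( cnt % 100 == 0 )) && echo "$cnt files processed" >&2
        if [[ $sum_from != "$sum_to" ]]; then
            echo "failure on file $rel"
            failures=$((failures + 1))
        fi
    done < <(cd "$from" && find . -type f -print0)
    # Exit status 0 only when every file matched.
    (( failures == 0 ))
}
```

Called as `compare_directories dir1 dir2`. Each file is read once on each side, so the run time is dominated by disk I/O rather than by the loop itself.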

Related

How to access a target directory using bash scripting

I am relatively new to shell scripting. I am writing a script to compress all the files in the current directory and in a target directory. I have succeeded in compressing the files in the current directory, but I am unable to write a script that compresses files in a target directory. Can anyone guide me?
I want to do something like this
% myCompress -t /home/users/bigFoot/ pdf ppt jpg
Next time, try to spread your code over several lines (it will make it easier to answer):
#!/bin/bash
if [[ $# == 0 ]]; then
echo "This shell script compress files with a specific extensions"
echo "Call Syntax: compress <extension_list>"
exit
fi
for ext in $*; do
for file in ls *.$ext; do
gzip -k $file
done
done
Mistakes made
1) $* - all args coming after the command, so -t and the path are also treated as $ext values
2) ls *.$ext is read in the loop as 2 strings, "ls" and "*.$ext"; it should be written as $(ls *.$ext) to get the ls command executed (or simply as *.$ext, since the shell expands the glob itself)
My script for your request
#!/bin/bash
script_name=$(basename "$0")
if [[ $# == 0 ]]; then
    echo "This shell script compresses files with specific extensions"
    echo "Call Syntax: $script_name <directories_list> <extension_list>"
    exit
fi
# sort the arguments: directories go to $path,
# everything else is treated as a file extension
path=". "
file_type=""
for check_type in "$@" ; do
    if [[ -d $check_type ]]; then
        path=$path$check_type" "
    else
        file_type=$file_type"*."$check_type" "
    fi
done
echo paths to gzip $path
echo files type to check "$file_type"
# $path and $file_type are split on spaces on purpose,
# so directory names containing spaces are not supported
for x in $path; do
    cd "$x" || continue
    for file in $(ls $file_type); do
        gzip "$file"
    done
    cd -
done
Explanation
1) basename "$0" - gets the script's name; it is more generic, in case you change the script's name
2) path=". " - variable holding a string of all directories to be compressed; your request is to run it on the current directory as well, hence ". "
file_type="" - variable holding a string of all extensions to be compressed in the $path directories
3) loop over all input args, appending directory names to $path and all other arguments, as file types, to $file_type
4) for each of the directories passed to the script:
i. cd into $x - enter the directory
ii. gzip - compress all files with the given extensions
iii. cd - - go back to the base directory
Check gzip
I'm not familiar with the gzip command; check that your version has the -k flag
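As an alternative sketch, the same idea can be written without building space-separated strings, which also lets directory and file names contain spaces. This is not the script above: `compress_by_ext` is a hypothetical helper that takes a single directory followed by the extensions, and it assumes bash and gzip are available.

```shell
# compress_by_ext DIR EXT...
# Hypothetical helper: gzip every regular file in DIR whose name ends
# in one of the given extensions.
compress_by_ext() {
    local dir=$1 ext file
    shift
    for ext in "$@"; do
        # The shell expands the glob itself, so no ls is needed.
        for file in "$dir"/*."$ext"; do
            # An unmatched glob stays literal, so skip anything that
            # is not an existing regular file.
            [[ -f $file ]] || continue
            gzip -- "$file"
        done
    done
}
```

For the request in the question this would be called as `compress_by_ext /home/users/bigFoot pdf ppt jpg`, and once more with `.` as the directory to cover the current directory. If the originals should be kept and the installed gzip supports it, `-k` can be added to the gzip call.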

Delete lines from log files and update existing file without the lines

For some development work I need to remove some "noise" from a series of Log files all stored in a folder. (I have this on Linux, but can also do this in Windows.) A line that I want to remove would look like this:
Sep 5/2017 23:59:50:324 [MISC ]: ValueType:ST / SetId: / ObID:0002-d007^RhySta^MDIL / ObSubID:0 / Detail:Sinus Rhythm / Units: / AccessChecks: / ObxTimeStamp:
Any time I see [MISC ]: I want to remove the whole line and leave nothing in its place. Once the lines are deleted from a file I want to save the file under its existing name, and then move on to the next file in the folder.
I am not a scripter.. thus the request for assistance.
Here is one way to do it using for, find, sed and mv.
Sample dir:
[zee]$ find .
./test2/file1.tx
./test2
One-liner:
[zee]$ for file in $(find test1/ -type f) ; do echo "checking if file $file has a match" ; grep -q "MIS" "$file" ; if [ $? -eq 0 ]; then echo "$file has a match" ; sed -i '/MIS/d' "$file" && echo "deleted matches...moving $file" && mv "$file" test2/ ; fi ; done
Output of running the above one-liner:
checking if file test1/file1.tx has a match
test1/file1.tx has a match
deleted matches...moving test1/file1.tx
Here is what it does: it finds all files in directory "test1" and checks whether each file has a match. If a file has a match, it removes the matching line(s) and moves the file to directory "test2".
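If the files should stay where they are and keep their names, as the question asks, a variant along these lines may be closer. It is only a sketch: `strip_misc_lines` is a hypothetical helper name, it matches the exact string "[MISC ]:" from the sample line rather than "MIS", and it assumes GNU sed (as found on Linux) for the -i option.

```shell
# strip_misc_lines DIR
# Hypothetical helper: delete every line containing "[MISC ]:" from the
# regular files directly inside DIR, editing each file in place.
strip_misc_lines() {
    local dir=$1 file
    for file in "$dir"/*; do
        [[ -f $file ]] || continue
        # Only rewrite files that actually contain a matching line.
        if grep -qF '[MISC ]:' -- "$file"; then
            # \[ matches a literal "["; the "]" needs no escaping here.
            sed -i '/\[MISC ]:/d' -- "$file"
        fi
    done
}
```

Because the glob is expanded by the shell and every expansion is quoted, file names with spaces are handled as well.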

How to identify files which are not in list using bash?

Unfortunately my knowledge of bash is not very good, and I have a rather non-standard task.
I have a file containing a list of files.
Example: /tmp/my/file1.txt /tmp/my/file2.txt
How can I write a script that checks whether the files from the list exist in folder /tmp/my and prints two types of messages when it is done?
1 - Files exist and show files:
/tmp/my/file1.txt
/tmp/my/file2.txt
2 - The folder /tmp/my contains files and folders which are not in your list. The files and folders:
/tmp/my/test
/tmp/my/1.txt
You speak of files and folders, which seems unclear.
Anyways, I wanted to try it with arrays, so here we go :
unset valid_paths; declare -a valid_paths
unset invalid_paths; declare -a invalid_paths
while read -r line
do
    if [ -e "$line" ]
    then
        valid_paths=("${valid_paths[@]}" "$line")
    else
        invalid_paths=("${invalid_paths[@]}" "$line")
    fi
done < files.txt
echo "VALID PATHS:"; echo "${valid_paths[@]}"
echo "INVALID PATHS:"; echo "${invalid_paths[@]}"
You can check for the files' existence (assuming a list of files, one filename per line) and print the existing ones with a prefix using this
# Part 1 - check list contents for files
while read thefile; do
if [[ -n "$thefile" ]] && [[ -f "/tmp/my/$thefile" ]]; then
echo "Y: $thefile"
else
echo "N: $thefile"
fi
done < filelist.txt | sort
# Part 2 - check existing files against list
for filepath in /tmp/my/* ; do
filename="$(basename "$filepath")"
grep -qxF -- "$filename" filelist.txt || echo "U: $filename"
done
The files that exist are prefixed here with Y:, all others are prefixed with N:
In the second section, files in the tmp directory that are not in the file list are labelled with U: (unaccounted for/unexpected)
You can swap the -f test which checks that a path exists and is a regular file for -d (exists and is a directory) or -e (exists)
See
man test
for more options.
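For a list that holds full paths, as in the question's example, the two parts can be combined roughly as follows. This is a sketch under assumptions: `check_list` is a hypothetical helper, the list has one path per line, and the output headings are shortened. The `-x` and `-F` options make grep compare whole lines as fixed strings, so an unlisted /tmp/my/1.txt is not mistaken for the listed /tmp/my/file1.txt.

```shell
# check_list LIST DIR
# Hypothetical helper: print the listed paths that exist, then the
# entries of DIR that do not appear in LIST.
check_list() {
    local list=$1 dir=$2 line path
    echo "Files exist:"
    # The extra test keeps a last line that lacks a trailing newline.
    while IFS= read -r line || [[ -n $line ]]; do
        if [[ -n $line && -e $line ]]; then
            echo "$line"
        fi
    done < "$list"
    echo "Not in list:"
    for path in "$dir"/*; do
        [[ -e $path ]] || continue
        # -F: fixed string, -x: the whole line must match.
        grep -qxF -- "$path" "$list" || echo "$path"
    done
}
```

With the question's layout this would be called as `check_list filelist.txt /tmp/my`.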

shell backup script renaming

I was able to script the backup process, but I want to make another script for my storage server for basic file rotation.
What I want to make:
I want to store my files in my /home/user/backup folder. I only want to keep the 10 most recent backup files and name them like this:
site_foo_date_1.tar site_foo_date_2.tar ... site_foo_date_10.tar
site_foo_date_1.tar being the most recent backup file.
Past number 10, the file will be deleted.
My incoming files from the other server are simply named like this: site_foo_date.tar
How can I do this?
I tried:
DATE=`date "+%Y%m%d"`
cd /home/user/backup/com
if [ -f site_com_*_10.tar ]
then
rm site_com_*_10.tar
fi
FILES=$(ls)
for file in $FILES
do
echo "$file"
if [ "$file" != "site_com_${DATE}.tar" ]
then
str_new=${file:18:1}
new_str=$((str_new + 1))
to_rename=${file::18}
mv "${file}" "$to_rename$new_str.tar"
fi
done
file=$(ls | grep site_com_${DATE}.tar)
filename=`echo "$file" | cut -d'.' -f1`
mv "${file}" "${filename}_1.tar"
The main issue with your code is that looping through all files in the directory with ls * without some sort of filter is a dangerous thing to do.
Instead, I've used for i in $(seq 9 -1 1) to loop through files from *_9 to *_1 to move them. This ensures we only move backup files, and nothing else that may have accidentally got into the backup directory.
Additionally, relying on the sequence number to be the 18th character in the filename is also destined to break. What happens if you want more than 10 backups in the future? With this design, you can change 9 to be any number you like, even if it's more than 2 digits.
Finally, I added a check before moving site_com_${DATE}.tar in case it doesn't exist.
#!/bin/bash
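DATE=`date "+%Y%m%d"`
cd "/home/user/backup/com" || exit 1
# Delete the oldest backup, if there is one. The glob must stay
# unquoted so that the shell expands it; rm -f stays quiet when
# nothing matches.
rm -f site_com_*_10.tar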
# Instead of wildcarding all files in the directory
# this method picks out only the expected files so non-backup
# files are not changed. The renumbering is also made easier
# this way.
# Loop through from 9 to 1 in descending order otherwise
# the same file will be moved on each iteration
for i in $(seq 9 -1 1)
do
# Find and expand the requested file
file=$(find . -maxdepth 1 -name "site_com_*_${i}.tar")
if [ -f "$file" ]
then
echo "$file"
# Create new file name
new_str=$((i + 1))
to_rename=${file%_${i}.tar}
mv "${file}" "${to_rename}_${new_str}.tar"
fi
done
# Check for latest backup file
# and only move it if it exists.
file=site_com_${DATE}.tar
if [ -f "$file" ]
then
filename=${file%.tar}
mv "${file}" "${filename}_1.tar"
fi

Bash get list of zip files in dir and perform some operations on each of them

I have a bash script which goes through a list of directories. If a directory contains zip files, it binds the zip file name to a variable, performs some actions on it, and then moves on to the next zip file in that directory. Unfortunately, it only works when there is one zip file per directory. If there are more, it gives the error "Binary operator expected".
Script:
if [ -e $currdir/*.zip ]; then
for file in $currdir/*.zip; do
echo the zip is "${file##*/}"
done
Please help me to rework the script accordingly.
If you need an explicit check, you can use:
shopt -s nullglob # without this, an unmatched glob stays as the literal string
if [[ -n $(echo "$currdir"/*.zip) ]]; then
for f in "$currdir"/*.zip; do
echo "Processing $f file..";
done
fi
But I'd prefer just looping over the files with the .zip extension:
for f in "$currdir"/*.zip; do
echo "Processing $f file..";
done
Use
for file in "$currdir"/*.zip; do
[ -e "$file" ] || continue
echo the zip is "${file##*/}"
done
As pointed out in the comments, the glob is expanded by the shell first, and then [ is called with the result, i.e.:
[ -e * ]
will become:
[ -e Desktop Documents Downloads ... ]
Therefore expanding the glob in the for loop and checking each file inside the iteration will work.
Please see: http://mywiki.wooledge.org/WordSplitting and http://wiki.bash-hackers.org/syntax/expansion/globs
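I think the case construct is too often overlooked. One caveat: the word after case does not undergo filename expansion, so the glob has to be expanded first, for example into the positional parameters:
set -- *.jpg; case $1 in '*.jpg' ) echo no files found ;; * ) echo found files ;; esac
With default shell options an unmatched glob is left as the literal string *.jpg, which the quoted pattern catches; otherwise $1 holds the first matching file. This produces the correct message in a dir with 1000s+ jpgs ;-)
Change both references from jpg to zip and see if it works for you.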
IHTH
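Another common pattern, shown here only as a sketch, is to expand the glob into an array once and test the array length, which separates "are there any zip files?" from the per-file work. `list_zips` is a hypothetical helper name; the code assumes bash, since it relies on arrays and the nullglob option.

```shell
# list_zips DIR
# Hypothetical helper: report each zip file in DIR, or say that there
# are none.
list_zips() {
    local currdir=$1 file
    local -a zips
    # nullglob makes an unmatched glob expand to nothing instead of
    # to itself. Note that it is switched off again unconditionally.
    shopt -s nullglob
    zips=("$currdir"/*.zip)
    shopt -u nullglob
    if (( ${#zips[@]} == 0 )); then
        echo "no zip files in $currdir"
        return 0
    fi
    for file in "${zips[@]}"; do
        echo "the zip is ${file##*/}"
    done
}
```

Since the array keeps each file name as one element, names containing spaces and directories with many zip files are both handled.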

Resources