Bash script to compare files - bash

I have a folder with a ton of old photos and many duplicates. Sorting it by hand would take ages, so I wanted to take the opportunity to use bash.
Right now I have the code:
#!/bin/bash
directory="~/Desktop/Test/*"
for file in ${directory};
do
for filex in ${directory}:
do
if [ $( diff {$file} {$filex} ) == 0 ]
then
mv ${filex} ~/Desktop
break
fi
done
done
And I'm getting these errors:
diff: {~/Desktop/Test/*}: No such file or directory
diff: {~/Desktop/Test/*:}: No such file or directory
File_compare: line 8: [: ==: unary operator expected
I've tried modifying working code I've found online, but it always seems to spit out some error like this. I'm guessing it's a problem with the nested for loop?
Also, why does it seem there are different ways to call variables? I've seen examples that use ${file}, "$file", and "${file}".

You have the {} in the wrong places:
if [ $( diff {$file} {$filex} ) == 0 ]
They should be:
if [ $( diff ${file} ${filex} ) == 0 ]
(though the braces are optional now), but you should allow for spaces in the file names:
if [ $( diff "${file}" "${filex}" ) == 0 ]
Now it simply doesn't work properly, because when diff finds no differences, it generates no output (and you get errors because the == operator ends up with nothing on its left-hand side). You could sort of fix it by double-quoting the value from $(…) (if [ "$( diff … )" == "" ]), but you should simply and directly test the exit status of diff:
if diff "${file}" "${filex}"
then : no difference
else : there is a difference
fi
and maybe for comparing images you should be using cmp (in silent mode) rather than diff:
if cmp -s "$file" "$filex"
then : no difference
else : there is a difference
fi
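For instance, with a hypothetical pair of photos:
# exit status 0 means the files are byte-for-byte identical, 1 means they differ
if cmp -s photo1.jpg photo2.jpg
then echo "identical"
else echo "different"
fi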

In addition to the problems Jonathan Leffler pointed out:
directory="~/Desktop/Test/*"
for file in ${directory};
~ and * won't get expanded inside double quotes; the * will get expanded when you use the variable without quotes, but since the ~ won't, the loop looks for files under a directory literally named "~" (not your home directory) and won't find any matches. Also, as Jonathan pointed out, using variables (like ${directory}) without double quotes will get you into trouble with filenames that contain spaces or other metacharacters. The better way to do this is to not put the wildcard in the variable: use it when you reference the variable, with the variable in double quotes and the * outside them:
directory=~/"Desktop/Test"
for file in "${directory}"/*;
Oh, and another note: when using mv in a script it's a good idea to use mv -i to avoid accidentally overwriting another file with the same name.
And: use shellcheck.net to sanity-check your code and point out common mistakes.
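Putting both answers together, a sketch of the corrected loop might look like this (cmp -s is borrowed from the previous answer, and the self-comparison skip is an addition so a file is never treated as its own duplicate):
#!/bin/bash
directory=~/Desktop/Test                 # ~ left unquoted so it expands; no * in the variable
for file in "$directory"/*
do
    for filex in "$directory"/*
    do
        [ "$file" = "$filex" ] && continue   # don't compare a file with itself
        if cmp -s "$file" "$filex"
        then
            mv -i "$filex" ~/Desktop         # -i: ask before overwriting anything
            break
        fi
    done
done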

If you are simply interested in knowing if two files differ, cmp is the best option. Its advantages are:
It works for text as well as binary files, unlike diff which is for text files only
It stops after finding the first difference, and hence it is very efficient
So, your code could be written as:
if ! cmp -s "$file" "$filex"; then
# files differ...
mv "$filex" ~/Desktop
# any other logic here
fi
Hope this helps. I didn't understand what you are trying to do with your loops and hence didn't write the full code.

You can use diff "$file" "$filex" &>/dev/null and check the last command's exit status with $? :
#!/bin/bash
SEARCH_DIR="."
DEST_DIR="./result"
mkdir -p "$DEST_DIR"
ls "$SEARCH_DIR" | while read file;
do
    ls "$SEARCH_DIR" | while read filex;
    do
        if [ ! -d "$filex" ] && [ ! -d "$file" ] && [ "$filex" != "$file" ];
        then
            diff "$file" "$filex" &>/dev/null
            if [ "$?" == 0 ];
            then
                echo "$filex is a duplicate. Moving to $DEST_DIR"
                mv "$filex" "$DEST_DIR"
            fi
        fi
    done
done
Note that you can also use the fslint or fdupes utilities to find duplicates.
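For example, fdupes can do the whole job in one pass (the -r and -d flags are standard fdupes options; check your version's man page):
# list duplicate files, searching recursively
fdupes -r ~/Desktop/Test
# interactively choose which copy of each duplicate set to keep, deleting the rest
fdupes -r -d ~/Desktop/Test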

Related

Identifying folder with name as largest number in the directory

There is a directory which contains folders named with numbers, and I have to find the folder with the largest number in that directory.
This is the script I've written to find that folder:
files='ls path/'
var=0
for file in $files
do
echo $file
tmp=$((file-"0"))
if [ $tmp -gt $var ]
then
var=$tmp
fi
done
echo $var
But it's not working. It gives the error below when the script is invoked with sudo ./restore2.sh.
ls
path/
./restore2.sh: line 6: path/: syntax error: operand expected (error token is "/")
0
Try this:
#!/bin/bash
files=`ls path/`
var=0
for file in $files
do
    echo $file
    tmp=$((file-"0"))
    if [ $tmp -gt $var ]
    then
        var=$tmp
    fi
done
echo $var
Note the backticks around ls path/ instead of single or double quotes. That is the only statement I corrected, and it worked.
Also note the #!/bin/bash at the top of the script. This tells your system to run the script in a bash shell.
You're using single quotes instead of backticks in files='ls path/', so it's treated as a literal string instead of being evaluated.
Also, for that specific task, you can just do:
ls test | awk '{if($1 > largest){largest = $1}} END{print largest}'
That keeps it a bit simpler.
Use find instead:
find . -maxdepth 1 -type d -regextype "posix-extended" -regex "^.*[[:digit:]]+.*$" | sort -n | tail -1
Set the maxdepth to 1 to check for directories within this directory only and no deeper. Set the regular expression type to posix-extended and search for all directories that have one or more digits. Print the result and order through sort before taking the largest one with tail -1.
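If you want the bare directory name instead of ./name, and a strictly numeric match, one possible refinement (assuming GNU find, which provides -printf) is:
find . -maxdepth 1 -type d -regextype posix-extended -regex '\./[0-9]+' -printf '%f\n' | sort -n | tail -1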
Does path/ have any files in it? It looks like it's empty.
You should be getting a completely different complaint...
You don't want the path info in the filename. Rather than strip it with ${file##*/}, just go there and use non-path'd names.
An adaptation using your own logic as its base -
cd /whatever/path/ # go where the files are
var=-1 # initialize comparator
for file in [0-9]* # each entry that starts with a digit
do [[ "$file" =~ [^0-9] ]] && continue # skip any file with nondigit contents
[[ -f "$file" ]] || continue # only process plain files
(( file > var )) && var=$file # remember largest seen
done
echo $var # report largest
If you are sure there will be no negative numbered filenames, this should do it.
If there can be valid negatives, then your initialization needs to be appropriately lower, and the exclusion of nondigits should include the minus sign, as well as the list of files to select.
Note that this doesn't parse ls and doesn't require piping through a sort or spawning any other processes -- it's all handled in the bash interpreter and should be pretty efficient.
If you are sure of your data, and know there aren't any negatives or files named just 0 or non-plain-file entries in the directory that match the [0-9]* pattern, you can simplify it to just
cd /whatever/path/ # go where the files are
for file in [0-9]*; do (( file > var )) && var=$file; done
echo $var # report largest
As an aside, if you wanted to preserve the "make a list first" logic, you should still NOT use ls. Use an array.
cd /wherever/your/files/are/
files=( [0-9]* )
for file in "${files[@]}"
do : ...
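A fuller sketch of that array version, reusing the same filtering logic as the loop above:
cd /wherever/your/files/are/ || exit 1
files=( [0-9]* )                          # collect candidate names into an array
var=-1
for file in "${files[@]}"
do
    [[ "$file" =~ [^0-9] ]] && continue   # skip any name containing a nondigit
    [[ -f "$file" ]] || continue          # only process plain files
    (( file > var )) && var=$file         # remember largest seen
done
echo $var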

Bash script throws too many arguments error

I have this bash script that looks like this
#!/bin/bash
for f in src/styles/*.less src/styles/brand/*.less
do
filename=$(basename "$f")
if [ ${filename:0:1} != '_' ]; then
lessc --no-color -x "${f}" "$(sed 's|^src/styles/|httpdocs/css/|;s/.less$/.css/' <<< $f)"
fi
done
It is supposed to compile a series of .less files in the style and brand directories, I think. When I run it though, I get this:
./compile_less.sh: line 6: [: too many arguments
Any idea what is going wrong here? I'm a newb with bash scripts and am using this one from a former co-worker.
Thanks!
There's an existing answer that addresses best practices, but doesn't explain why it's a too-many-arguments error.
"${filename:0:1}" needs double quotes if you want to ensure that it will always be one word, and not zero words or several words -- if it comes out empty, it's zero words, so you run [ != _ ].
Leaving out the quotes is even worse if the first character is a * -- then it expands to one word per file in the current directory, and that's far too many operands.
[ != _ ] is too many arguments because there's nothing in a position to be a binary operator, so [ expects to see only a single operand.
[ all your filenames here != _ ] is too many arguments for a far less subtle reason. :)
The problem with using
for f in src/styles/*.less src/styles/brand/*.less
is that if no file matches, then f will be the literal pattern (src/styles/*.less or src/styles/brand/*.less) and filename will be *.less. This causes a problem in the if statement.
You can check if file exists and continue:
...
[[ ! -e "${f}" ]] && continue # or any other appropriate action
filename=$(basename "$f")
...
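Combining both points, a corrected sketch of the whole script could look like this (nullglob is a bash option that makes an unmatched glob expand to nothing, so the stray-pattern case never reaches the if; use the -e check above instead if you prefer):
#!/bin/bash
shopt -s nullglob                         # unmatched globs expand to nothing instead of themselves
for f in src/styles/*.less src/styles/brand/*.less
do
    filename=$(basename "$f")
    if [ "${filename:0:1}" != '_' ]; then
        lessc --no-color -x "$f" "$(sed 's|^src/styles/|httpdocs/css/|;s/\.less$/.css/' <<< "$f")"
    fi
done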

Looping through content of the current directory

What is the best way to look inside a directory and determine whether its contents are directories or files? It's a homework question.
My homework requires me to write a script that counts the number of directories, files, how many are executable, writeable, and readable.
Assuming you're talking about the bourne shell family, take a look at the -d, -x, -w... and I'm guessing -r tests. Look up how a for loop works in bash to see how to iterate over the files... the general idea is
for var in directory/*; do
#stuff with $var
done
But there are some particulars relating to spaces in filenames that can make this trickier.
Use the -X style operators:
[ -d "${item}" ] && echo "${item} is a directory"
See the bash man page (search for "CONDITIONAL EXPRESSIONS") to see the complete list.
Looping through contents of a directory and counting looks like this:
dirs=0
writeable=0
for item in /path/to/directory/*; do
    [ -d "${item}" ] && dirs=$(( dirs + 1 ))              # works in bash
    [ -w "${item}" ] && writeable=`expr ${writeable} + 1` # works in bourne shell
    # Other tests
done
echo "Found ${dirs} sub-directories"
echo "Found ${writeable} writeable files"
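For the actual homework (counting directories, files, and how many entries are executable, writable and readable), a sketch built from those pieces might be (the path is a placeholder):
#!/bin/bash
dirs=0 files=0 execs=0 writable=0 readable=0
for item in /path/to/directory/*; do
    if [ -d "$item" ]; then
        dirs=$(( dirs + 1 ))
    elif [ -f "$item" ]; then
        files=$(( files + 1 ))
    fi
    [ -x "$item" ] && execs=$(( execs + 1 ))
    [ -w "$item" ] && writable=$(( writable + 1 ))
    [ -r "$item" ] && readable=$(( readable + 1 ))
done
echo "Found ${dirs} directories and ${files} files"
echo "Executable: ${execs}, writable: ${writable}, readable: ${readable}"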

bash filename start matching

I've got a simple enough question, but no guidance yet through the forums or bash. The question is as follows:
I want to add a prefix string to each filename in a directory that matches *.h or *.cpp. HOWEVER, if the prefix has already been applied to the filename, do NOT apply it again.
Why the following doesn't work is something that has yet to be figured out:
for i in *.{h,cpp}
do
if [[ $i!="$pattern*" ]]
then mv $i $pattern$i
fi
done
You can try this:
for i in *.{h,cpp}
do
if ! ( echo $i | grep -q "^$pattern" )
# if the file does not begin with $pattern rename it.
then mv $i $pattern$i
fi
done
Others have shown replacement comparisons that work; I'll take a stab at why the original version didn't. There are two problems with the original prefix test: you need spaces between the comparison operator (!=) and its operands, and the asterisk was inside the quotes (meaning it gets matched literally, rather than as a wildcard). Fix these, and (at least in my tests) it works as expected:
if [[ $i != "$pattern"* ]]
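With that change (and with the mv arguments quoted, which the original also left unquoted), a minimal sketch of the loop is (prefix_ is just a placeholder value for $pattern):
pattern=prefix_
for i in *.{h,cpp}
do
    if [[ $i != "$pattern"* ]]
    then
        mv "$i" "$pattern$i"
    fi
done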
#!/bin/sh
pattern=testpattern_
for i in *.h *.cpp; do
case "$i" in
$pattern*)
continue;;
*)
mv "$i" "$pattern$i";;
esac
done
This script will run in any Posix shell, not just bash. (I wasn't sure if your question was "why isn't this working" or "how do I make this work" so I guessed it was the second.)
for i in *.{h,cpp}; do
[ "${i#prefix}" = "$i" ] && mv "$i" "prefix$i"
done
Not exactly conforming to your script, but it should work. The check returns true if there is no prefix (i.e. if $i, with the prefix "prefix" removed, equals $i).
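To see what that parameter expansion does (file names made up for illustration):
i=prefixfoo.h
echo "${i#prefix}"    # -> foo.h : prefix stripped, the test is false, no rename
i=foo.h
echo "${i#prefix}"    # -> foo.h : nothing to strip, equals $i, the test is true, the file is renamed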

Bash Compound Conditional, With Wildcards and File Existence Check

I've mastered the basics of Bash compound conditionals and have read a few different ways to check for file existence of a wildcard file, but this one is eluding me, so I figured I'd ask for help...
I need to:
1.) Check if some file matching a pattern exists
AND
2.) Check that text in a different file exists.
I know there's lots of ways to do this, but I don't really have the knowledge to prioritize them (if you have that knowledge I'd be interested in reading about that as well).
First thing that came to mind is to use find for #1 and grep for #2.
So something like
if [ `grep -q "OUTPUT FILE AT STEP 1000" ../log/minimize.log` ] \
&& [ `find -name "jobscript_minim\*cmd\*o\*"` ]; then
echo "Both passed! (1)"
fi
That fails, though curiously:
if `grep -q "OUTPUT FILE AT STEP 1000" ../log/minimize.log` ;then
echo "Text passed!"
fi
if `find -name "jobscript_minim\*cmd\*o\*"` ;then
echo "File passed!"
fi
both pass...
I've done a bit of reading and have seen people talking about the problem of multiple filenames matching wildcards within an if statement. What's the best solution to this? (In answering my question, I'd assume you'd take a crack at that question as well, in the process.)
Any ideas/solutions/suggestions?
Let's tackle why your attempt failed first:
if [ `grep -q …` ];
This runs the grep command between backticks, and interpolates the output inside the conditional command. Since grep -q doesn't produce any output, it's as if you wrote if [ ];
The conditional is supposed to test the return code of grep, not anything about its output. Therefore it should be simply written as
if grep -q …;
The find command returns 0 (i.e. true) even if it finds nothing, so this technique won't work. What will work is testing whether its output is empty, by collecting its output and comparing it to the empty string:
if [ "$(find …)" != "" ];
(An equivalent test is if [ -n "$(find …)" ].)
Notice two things here:
I used $(…) rather than backticks. They're equivalent, except that backticks require strange quoting inside them (especially if you try to nest them), whereas $(…) is simple and reliable. Just use $(…) and forget about backticks (except that you need to write \` inside double quotes).
There are double quotes around $(…). This is really important. Without the quotes, the shell would break the output of the find command into words. If find prints, say, two lines dir/file and dir/otherfile, we want if [ "dir/file dir/otherfile" = "" ]; to be executed, not if [ dir/file dir/otherfile = "" ]; which is a syntax error. This is a general rule of shell programming: always put double quotes around a variable or command substitution. (A variable substitution is $foo or ${foo}; a command substitution is $(command).)
Now let's see your requirements.
Check if some file matching a pattern exists
If you're looking for files in the current directory or in any directory below it recursively, then find -name "PATTERN" is right. However, if the directory tree can get large, it's inefficient, because it can spend a lot of time printing all the matches when we only care about one. An easy optimization is to only retain the first line by piping into head -n 1; find will stop searching once it realizes that head is no longer interested in what it has to say.
if [ "$(find -name "jobscript_minimcmdo" | head -n 1)" != "" ];
(Note that the double quotes already protect the wildcards from expansion.)
If you're only looking for files in the current directory, assuming you have GNU find (which is the case on Linux, Cygwin and Gnuwin32), a simple solution is to tell it not to recurse deeper than the current directory.
if [ "$(find -maxdepth 1 -name "jobscript_minim*cmd*o*")" != "" ];
There are other solutions that are more portable, but they're more complicated to write.
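One common portable idiom (a sketch, not from the original answer; note that set -- overwrites the script's positional parameters) is to let the shell expand the glob and test the first result:
set -- jobscript_minim*cmd*o*
if [ -e "$1" ]; then
    echo "at least one match exists"
fi
If nothing matches, $1 is the literal pattern, which normally doesn't exist, so the test fails.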
Check that text in a different file exists.
You've already got a correct grep command. Note that if you want to search for a literal string, you should use grep -F; if you're looking for a regexp, grep -E has a saner syntax than plain grep.
Putting it all together:
if grep -q -F "OUTPUT FILE AT STEP 1000" ../log/minimize.log &&
[ "$(find -name "jobscript_minim*cmd*o*")" != "" ]; then
echo "Both passed! (1)"
fi
bash 4
shopt -s globstar
files=$(echo **/jobscript_minim*cmd*o*)
if grep -q "pattern" file && [[ ! -z $files ]];then echo "passed"; fi
for i in filename*; do FOUND=$i; break; done
if [ "$FOUND" == 'filename*' ]; then
    echo "No files found matching wildcard."
else
    echo "Files found matching wildcard."
fi
