bash to print certain file names to text

I have spent a lot of time the past few weeks on this and posting on here. I finally think I am much closer to learning bash, but I have one problem with my code that I cannot for the life of me figure out. I can run each line in the terminal and it returns a result, but for some reason when I run the script, it does nothing. I get a syntax error: word unexpected (expecting "do").
#!/bin/bash
image="/Home/Desktop/epubs/images"
for f in $(ls "$image"*.jpg); do
  fsize=$(stat --printf='%s' "$f");
  if [ "$fsize" -eq "40318" ]; then
    echo "$(basename $f)" >> results.txt
  fi
done
What am I missing???

The problem might be in line endings. Make sure your script file has unix line endings, not the Windows ones.
Also, do not iterate over the output of ls. Use globbing right in the shell:
for f in "$image"/*.jpg ; do
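If the file did come from Windows, here is a quick way to confirm and strip the carriage returns (a sketch; dos2unix may not be installed everywhere, so a GNU sed alternative is shown):
cat -v script.sh | head      # CRLF endings show up as ^M at the end of lines
sed -i 's/\r$//' script.sh   # strip them in place (GNU sed)
# or: dos2unix script.sh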

Your for loop appears to be missing a list of values to iterate over:
image="/Home/Desktop/epubs/images"
for f in $(ls "$image"*.jpg); do
Because $image does not end with a /, your ls command expands to
for f in $(ls /Home/Desktop/epubs/images*.jpg); do
which probably results in
for f in ; do
causing the syntax error. The simplest fix is
for f in $(ls "$image"/*.jpg); do
but you should take the advice in the other answers and skip ls:
for f in "$image"/*.jpg; do
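One caveat with the glob approach (not the cause of your error, but worth knowing): if the directory contains no .jpg files, the pattern stays literal and the loop runs once with f set to the pattern itself. Enabling nullglob makes the loop simply not run instead:
shopt -s nullglob
for f in "$image"/*.jpg; do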

Here's how I would do that.
#!/bin/bash -e
image="/Home/Desktop/epubs/images"
(cd "$image"
for f in *.jpg; do
let fsize=$(stat -c %s "$f")
if (( fsize == 40318 )); then
echo "$f"
fi
done) >results.txt
The -e means the script will exit if anything goes wrong (can't cd into the directory, for instance). Saves a lot of error checking when you're happy with that behavior.
The parentheses mean that the cd command is in a subshell; the surrounding script (including the redirection into results.txt) is still in whatever directory you started in.
Now that we're in the directory, we can just look for *.jpg, no directory prefix, and no need to call basename on anything.
Using let and (( == )) treats the size value as a number instead of a string, so we won't get tripped up by any wonkiness in the way stat chooses to format the value.
We redirect the output of the entire loop into the results file instead of appending every time through; it's more efficient. If you have existing contents in results.txt that you want to keep, change the > back to >>, but putting it around the whole loop is still more efficient than opening the file and appending to it on every iteration.
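For completeness, GNU find can do the whole job in one line (a sketch; -size with the c suffix matches an exact byte count, and -printf '%f\n' prints just the filename without the directory part):
find "$image" -maxdepth 1 -name '*.jpg' -size 40318c -printf '%f\n' > results.txt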

Related

Rename file to a hidden file for loop

I'm writing a fairly basic shell script that loops through the files in a directory and renames each one by adding a dot (.) to the start of the filename. However, it does not work.
Any insight on what's going wrong?
for file in /tmp/test/*; do
  mv $file \\.$file;
done
There are two problems.
You're putting the dot before the whole pathname, not just the filename part.
You're prefixing the filename with \. instead of just a plain .; the backslashes in the mv command are unnecessary.
Corrected code:
for file in /tmp/test/*; do
  mv "$file" "${file%/*}/.${file##*/}";
done
${file%/*} returns the value of $file with everything starting from the last / removed, which is the directory part of the pathname. ${file##*/} returns the value of $file with everything up to and including the last / removed, which is the filename part. Then it puts them back together with /. between them, which adds the . prefix you want to the filename part. See the Bash parameter expansion documentation for details of these operators.
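For example, a quick sketch of what those two expansions produce:
file="/tmp/test/photo.jpg"
echo "${file%/*}"                # /tmp/test  (directory part)
echo "${file##*/}"               # photo.jpg  (filename part)
echo "${file%/*}/.${file##*/}"   # /tmp/test/.photo.jpg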
Also, remember to quote variables so you don't get errors when the variable contains whitespace.
This is a simple script that takes a directory argument:
hide_files.sh:
if [ $# -ne 1 ] || [ ! -d "$1" ]; then
  echo 'invalid dir arg.'
  exit 1
fi
for f in "$1"/*; do
  mv -v "$f" "${f%/*}/.${f##*/}"
done
output:
$ bash hide_files.sh mydir
mydir/a -> mydir/.a
mydir/c -> mydir/.c

bash finding files in directories recursively

I'm studying the bash shell, and I've lately realized I'm not getting recursive calls involving file searching right. I know find is made for this, but I've been asked to implement such a search one way or another.
I wrote the next script:
#!/bin/bash
function rec_search {
  for file in `ls $1`; do
    echo ${1}/${item}
    if[[ -d $item ]]; then
      rec ${1}/${item}
    fi
  done
}
rec $1
The script gets a file as its argument and looks for it recursively.
I find it a poor solution of mine, and I have a few improvement questions:
how to find files that contain spaces in their names
can I efficiently use the pwd command for printing out the absolute path (I tried, but unsuccessfully)
every other reasonable improvement of the code
Your script currently cannot work:
The function is defined as rec_search, but then it seems you mistakenly call rec
You need to put a space after the "if" in if[[
There are some other serious issues with it too:
for file in `ls $1` goes against the recommendation to "never parse the output of ls" and won't work for paths with spaces or other whitespace characters
You should indent the body of if and for to make it easier to read
The script could be fixed like this:
rec() {
  for path; do
    echo "$path"
    if [[ -d "$path" ]]; then
      rec "$path"/*
    fi
  done
}
But it's best to not reinvent the wheel and use the find command instead.
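For comparison, the find equivalent is a one-liner (a sketch; the argument is whatever directory you would have passed to rec):
find /path/to/dir -print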
If you are using bash 4 or later (which is likely, unless you are running this under Mac OS X), you can use the ** operator.
rec () {
  shopt -s globstar
  for file in "$1"/**/*; do
    echo "$file"
  done
}
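A quick usage sketch (hypothetical path):
rec /tmp/test   # prints every file and directory, at any depth, under /tmp/test
Note that on bash older than 4 the shopt call fails and ** behaves like a plain *, so check your bash version if the recursion stops one level down.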

In shell, how do I delete numbered duplicate files?

I've got a directory with a few thousand files in it, named things like:
filename.ext
filename (1).ext
filename (2).ext
otherfile.ext
otherfile (1).ext
etc.
Most of the files with bracketed numbers are duplicates of the original, but in some cases they're not.
How can I keep my original files, delete the duplicates, but not lose the files that are different?
I know that I could rm *\).ext, but that obviously doesn't make sure that files match the original.
I'm using OS X, so I have an md5 program that functions sort of like md5sum in Linux, though it puts the hash at the end of the line instead of the beginning. I was thinking I could use an awk script to take the output of md5 *.ext | awk 'some script', find duplicates by md5, and delete them, but the command line is too long (bash: /sbin/md5: Argument list too long).
And I don't know what to write in the script. I was thinking of storing things in an array with this:
awk '{a[$NF]++} a[$NF]>1{sub(/\).*/,""); sub(/.*\(/,""); system("rm " $0);}'
But that always seems to delete my original.
What am I doing wrong? How do I do it right?
Thanks.
Your awk script deletes original files because of sort order: . (period) sorts after ' ' (space), so the first file seen is a numbered one rather than the original, and subsequent checks (including the one against the original) compare files to that first numbered one.
Not only does rm *\).ext fail to match the original, it loses files that may not have an original in the first place.
I wouldn't do this quite this way. Rather than checking every numbered file and verifying whether it matches an original, you can go through your list of originals, then delete the numbered files that match them.
Instead:
$ for file in *[^\)].ext; do echo "-- Found: $file"; rm -v "$(basename "$file" .ext)"\ \(*\).ext; done
You can expand this to check MD5's along the way. But it's more code, so I'll break it into multiple lines, in a script:
#!/bin/bash
shopt -s nullglob # Show nothing if a fileglob matches no files
for file in *[^\)].ext; do
  md5=$(md5 -q "$file") # The -q option gives you only the message digest
  echo "-- Found: $file ($md5)"
  for duplicate in "$(basename "$file" .ext)"\ \(*\).ext; do
    if [[ "$md5" = "$(md5 -q "$duplicate")" ]]; then
      rm -v "$duplicate"
    fi
  done
done
As an alternative, you can probably get away with doing this a little more simply, with less CPU overhead than calculating MD5 digests. Unix and Linux have a shell tool called cmp, which is like diff except that it reports, via its exit status, whether two files are identical. So:
#!/bin/bash
shopt -s nullglob
for file in *[^\)].ext; do
  for duplicate in "$(basename "$file" .ext)"\ \(*\).ext; do
    if cmp "$file" "$duplicate"; then
      rm -v "$duplicate"
    fi
  done
done
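One refinement worth considering (an assumption about the output you want): cmp prints a message whenever two files differ, so the quiet flag keeps the loop silent except for the rm -v lines:
if cmp -s "$file" "$duplicate"; then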
If you don't need to use AWK, you could maybe do something simpler in bash:
for file in *\([0-9]*\)*; do
  [ -e "$(echo "$file" | sed -e 's/ ([0-9]\+)//')" ] && rm "$file"
done
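One portability note, since the question mentions OS X: \+ is a GNU extension to basic regular expressions, so BSD sed needs the extended syntax instead (where the parentheses become grouping operators and must be escaped to match literally):
[ -e "$(echo "$file" | sed -E 's/ \([0-9]+\)//')" ] && rm "$file"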
Hope this helps a little =)

Basename puts single quotes around variable

I am writing a simple shell script to make automated backups, and I am trying to use basename to create a list of directories and then parse this list to get the first and the last directory from the list.
The problem is: when I use basename in the terminal, all goes fine and it gives me the list exactly as I want it. For example:
basename -a /var/*/
gives me a list of all the directories inside /var without the / in the end of the name, one per line.
BUT, when I use it inside a script and pass a variable to basename, it puts single quotes around the variable:
while read line; do
  dir_name=$(echo $line)
  basename -a $dir_name/*/ > dir_list.tmp
done < file_with_list.txt
When running the script with bash -x:
+ basename -a '/Volumes/OUTROS/backup/test/*/'
and, therefore, the result is not what I need.
Now, I know there must be a thousand ways to get around the basename problem, but then I'd learn nothing, right? ;)
How to get rid of the single quotes?
And if my directory name has spaces in it?
If your directory name could include spaces, you need to quote the value of dir_name (which is a good idea for any variable expansion, whether you expect spaces or not).
while read line; do
  dir_name=$line
  basename -a "$dir_name"/*/ > dir_list.tmp
done < file_with_list.txt
(As jordanm points out, you don't need to quote the RHS of a variable assignment.)
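A further hardening step, in case paths contain backslashes or leading whitespace (plain read mangles both): the robust line-reading idiom is
while IFS= read -r line; do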
Assuming your goal is to populate dir_list.tmp with a list of directories found under each directory listed in file_with_list.txt, this might do.
#!/bin/bash
inputfile=file_with_list.txt
outputfile=dir_list.tmp
rm -f "$outputfile" # the -f makes rm fail silently if the file does not exist
while read line; do
  # basic syntax checking
  if [[ ! ${line} =~ ^/[a-z][a-z0-9/-]*$ ]]; then
    continue
  fi
  # collect targets using globbing
  for target in "$line"/*; do
    if [[ -d "$target" ]]; then
      printf "%s\n" "$target" >> "$outputfile"
    fi
  done
done < "$inputfile"
As you develop whatever tool will process your dir_list.tmp file, be careful of special characters (including spaces) in that file.
Note that I'm using printf instead of echo so that targets whose first character is a hyphen won't cause errors.
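A quick illustration of that echo pitfall (hypothetical value):
target="-n"
echo "$target"           # prints nothing: bash's echo consumes -n as an option
printf "%s\n" "$target"  # prints -n literally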
This might work
while read; do
  find "$REPLY" >> dir_list.tmp
done < file_with_list.txt

Performance with bash loop when renaming files

Sometimes I need to rename some amount of files, such as add a prefix or remove something.
At first I wrote a python script. It works well, and I want a shell version. Therefore I wrote something like that:
$1 - which directory to list,
$2 - what pattern to replace,
$3 - the replacement.
echo "usage: dir pattern replacement"
for fname in `ls $1`
do
newName=$(echo $fname | sed "s/^$2/$3/")
echo 'mv' "$1/$fname" "$1/$newName&&"
mv "$1/$fname" "$1/$newName"
done
It works, but very slowly, probably because it needs to create a process (here sed and mv), destroy it, and create the same process again just to run it with a different argument. Is that true? If so, how do I avoid it and get a faster version?
I thought of generating all the new names at once (using sed to process them in a single pass), but it still needs mv in the loop.
Please tell me, how do you do this? Thanks. If you find my question hard to understand, please be patient; my English is not very good, sorry.
--- update ---
I am sorry for my description. My core question is: "If we use some command inside a loop, will that lower performance?" For example, in for i in {1..100000}; do ls 1>/dev/null; done, creating and destroying a process takes most of the time. So what I want to know is: "Is there any way to reduce that cost?"
Thanks to kev and S.R.I for giving me a rename solution to rename files.
Every time you call an external binary (ls, sed, mv), bash has to fork itself to exec the command and that takes a big performance hit.
You can do everything you want in pure bash 4.x and only need to call mv:
pat_rename(){
  if [[ ! -d "$1" ]]; then
    echo "Error: '$1' is not a valid directory"
    return
  fi
  shopt -s globstar
  cd "$1"
  for file in **; do
    echo "mv $file ${file//$2/$3}"
  done
}
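A hypothetical invocation; note that as written the function only echoes the mv commands, so it's a dry run until you drop the echo:
pat_rename ./photos IMG_ vacation_   # prints lines like: mv IMG_001.jpg vacation_001.jpg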
Simplest first. What's wrong with rename?
mkdir tstbin
for i in `seq 1 20`
do
  touch tstbin/filename$i.txt
done
rename .txt .html tstbin/*.txt
Or are you using an older *nix machine?
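One caveat: two incompatible rename utilities are in circulation. The util-linux version takes the from/to pair shown above, while the Perl version that is the default on Debian and Ubuntu takes a substitution expression instead:
rename 's/\.txt$/.html/' tstbin/*.txt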
To avoid re-executing sed on each file, you could instead set up two name streams, one original and one transformed, then sip from the ends of both:
exec 3< <(ls)
exec 4< <(ls | sed 's/from/to/')
IFS=$(echo)   # sets IFS to the empty string, so read keeps each line intact
while read -u3 orig && read -u4 to; do
  mv "${orig}" "${to}"
done
I think you can store all of the file names in a file or string, and use awk and sed to do it once instead of one by one.
