bash uses only first entry from find

I'm trying to list all PDF files under a given directory $1 (and its subdirectories), get the number of pages in each file, and calculate two prices from the page count. My script used to work, but only on filenames without spaces, and only in a directory containing nothing but PDF files. I've already modified it a bit (putting quotes around variables and such), but now I'm stuck.
The problem I'm having is that, as it is now, the script only processes the first file found by find . -name '*.pdf'. How would I go about processing the rest?
#!/bin/bash
wd=`pwd`
pppl=0.03 #euro
pppnl=0.033 #euro
cd $1
for entry in "`find . -name '*.pdf'`"
do
filename="$(basename "$entry")"
pagecount=`pdfinfo "$filename" | grep Pages | sed 's/[^0-9]*//'`
pricel=`echo "$pagecount * $pppl" | bc`
pricenl=`echo "$pagecount * $pppnl" | bc`
echo -e "$filename\t\t$pagecount\t$pricel\t$pricenl"
done
cd "$wd"

The problem with using find in a for loop is that if you don't quote the command substitution, filenames with spaces will be split, and if you do quote it, the entire result is processed as a single iteration.
The workaround is to use a while loop instead, like this:
find . -name '*.pdf' -print0 | while IFS= read -r -d '' entry
do
....
done
Read this article for more discussion: http://mywiki.wooledge.org/ParsingLs
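Applied to the script in the question, a minimal reworked sketch could look like this (note that pdfinfo is given the full "$entry" path here, since "$filename" alone would not resolve files in subdirectories):
#!/bin/bash
pppl=0.03   #euro
pppnl=0.033 #euro
cd "$1" || exit 1
find . -name '*.pdf' -print0 | while IFS= read -r -d '' entry
do
    filename=$(basename "$entry")
    pagecount=$(pdfinfo "$entry" | grep Pages | sed 's/[^0-9]*//')
    pricel=$(echo "$pagecount * $pppl" | bc)
    pricenl=$(echo "$pagecount * $pppnl" | bc)
    echo -e "$filename\t\t$pagecount\t$pricel\t$pricenl"
done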

It's a bad idea to use word splitting. Use a while loop instead.
while read -r entry
do
filename=$(basename "$entry")
pagecount=$(pdfinfo "$filename" | grep Pages | sed 's/[^0-9]*//')
pricel=$(echo "$pagecount * $pppl" | bc)
pricenl=$(echo "$pagecount * $pppnl" | bc)
echo -e "$filename\t\t$pagecount\t$pricel\t$pricenl"
done < <(exec find . -name '*.pdf')
Also, prefer $() over backticks when possible. And you don't need quotes around variables or command substitutions when they are used on the right-hand side of an assignment.
filename=$(basename "$entry")
could be written more simply as
filename=${entry##*/}
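For instance, with a throwaway value:
entry='./subdir/some file.pdf'
echo "${entry##*/}"   # prints: some file.pdf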

Related

Handle files with space in filename and output file names

I need to write a Bash script that achieve the following goals:
1) move the newest n pdf files from folder 1 to folder 2;
2) correctly handles files that could have spaces in file names;
3) output each file name in a specific position in a text file. (In my actual usage, I will use sed to put the file names in a specific position of an existing file.)
I tried to make an array of filenames and then move them and do text output in a loop. However, the following array cannot handle files with spaces in filename:
pdfs=($(find -name "$DOWNLOADS/*.pdf" -print0 | xargs -0 ls -1 -t | head -n$NUM))
Suppose a file has name "Filename with Space". What I get from the above array will have "with" and "Space" in separate array entries.
I am not sure how to avoid these words in the same filename being treated separately.
Can someone help me out?
Thanks!
-------------Update------------
Sorry for being vague on the third point as I thought I might be able to figure that out after achieving the first and second goals.
Basically, it is a text file that has a line starting with "%comment" near the end, and I need to insert the filenames before that line in the format "file=PATH".
The PATH is the folder 2 that I have my pdfs moved to.
You can achieve this using mapfile in conjunction with the GNU versions of find | sort | cut | head, which have options to operate on NUL-terminated filenames:
mapfile -d '' -t pdfs < <(find "$DOWNLOADS" -name '*.pdf' -printf '%T@:%p\0' |
sort -z -t : -rnk1 | cut -z -d : -f2- | head -z -n "$NUM")
Commands used are:
mapfile -d '': To read array with NUL as delimiter
find: outputs each file's modification stamp in EPOCH + ":" + filename + NUL byte
sort: sorts reverse numerically on 1st field
cut: removes 1st field from output
head: outputs only first $NUM filenames
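Assuming $DOWNLOADS and $NUM are set, a quick sanity check of what ended up in the array:
printf '%s\n' "${pdfs[@]}"
echo "${#pdfs[@]} file(s) captured"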
find downloads -name "*.pdf" -printf "%T@ %p\0" |
sort -z -t' ' -k1 -n |
cut -z -d' ' -f2- |
tail -z -n 3
find all *.pdf files in downloads
For each file, print its modification date: %T with the format specifier @, meaning seconds since the epoch with a fractional part, then a space, then the filename, all terminated with \0.
Sort the NUL-separated stream numerically on the first field, using space as the field separator.
Remove the first field from the stream, i.e. the modification date, leaving only the filenames.
Take the desired number of newest files, in this example the 3 newest, by using tail. We could also sort in reverse and use head, no difference.
Don't use ls in scripts; ls is for nicely formatted interactive output, not for parsing. You could do xargs -0 stat --printf "%Y %n\0" instead, which would basically move your script forward. The only catch is that I couldn't make stat output the fractional part of the modification date.
As for the second part, we need to save the NUL-delimited list to a file:
find downloads ........ >"$tmp"
and then:
str='%comment'
{
grep -B$((2**32)) -x "$str" "$out" | grep -v "$str"
# I don't know what you expect to do with newlines in filenames, but I guess you don't have those
cat "$tmp" | sed -z 's/^/file=/' | sed 's/\x0/\n/g'
grep -A$((2**32)) -x "$str" "$out"
} | sponge "$out"
assuming the output file name is stored in the variable "$out". Step by step:
filter all lines before %comment, and remove the %comment line itself
output each filename with file= prepended to the beginning; the NUL separators are converted to newlines
then filter all lines after %comment, this time including the %comment line itself
write the result back to the output file; sponge (from moreutils) soaks up all its input before writing, so if you don't have it, write to a temporary file and move it into place
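As a quick illustration with made-up contents: if "$out" holds the lines "title", "%comment", "end", and "$tmp" holds a.pdf and b.pdf NUL-separated, the block rewrites "$out" to "title", "file=a.pdf", "file=b.pdf", "%comment", "end".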
Don't use pdfs=$(...) on NUL-separated input. Use mapfile to store it in an array instead, as other answers have shown.
Then to move the files, do something like
<"$tmp" xargs -0 -i mv {} "$outdir"
or faster, with a single move:
{ cat <"$tmp"; printf "%s\0" "$outdir"; } | xargs -0 mv
or alternatively:
<"$tmp" xargs -0 sh -c 'outdir="$1"; shift; mv "$#" "$outdir"' -- "$outdir"
Live example at tutorialspoint.
I suppose following code will be close to what you want:
IFS=$'\n' pdfs=($(find -name "$DOWNLOADS/*.pdf" -print0 | xargs -0 -I{} ls -lt "{}" | tail -n +1 | head -n$NUM))
Then you can access the output through ${pdfs[0]}, ${pdfs[1]}, ...
Explanations
IFS=$'\n' makes the command substitution's output split only on "\n".
The -I{} option tells xargs to substitute {} with each filename, so that it can be quoted as "{}".
tail -n +1 is a trick to suppress an error message saying "xargs: 'ls' terminated by signal 13".
Hope this helps.
Bash v4 has an option globstar, after enabling this option, we can use ** to match zero or more subdirectories.
mapfile is a built-in command, which is used for reading lines into an indexed array variable. -t option removes a trailing newline.
shopt -s globstar
mapfile -t pdffiles < <(ls -t1 **/*.pdf | head -n"$NUM")
typeset -p pdffiles
for f in "${pdffiles[@]}"; do
echo "==="
mv "${f}" /dest/path
sed "/^%comment/i${f}=/dest/path" a-text-file.txt
done
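Note that sed as written only prints the modified text to stdout. To update a-text-file.txt in place with GNU sed, and to match the file=PATH format asked for in the question, a variant might look like this (the exact line layout is an assumption):
sed -i "/^%comment/i file=/dest/path/${f}" a-text-file.txt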

echo prints too many spaces

I have code with two variables in an echo. I don't know why it prints extra spaces before $NEXT even though I have just one space in the code.
NEXT=$(find "${DIR}" -type f -name "*.$ext" | sed "s/.*\/\.//g" | sed "s/.*\///g" |
sed -n '/.*\..*/p' | wc -l)
echo "Files .$ext: $NEXT"
Files .tar: 1
Your find expression is not doing what you think it is:
NEXT=$(find "${DIR}" -type f -name "*.$ext" | sed "s/.*\/\.//g" | sed "s/.*\///g" |
sed -n '/.*\..*/p' | wc -l)
When you pipe to wc -l you are left with a number. The format of that number will depend on your distribution's default compile options for wc. Generally, when information is piped or redirected to wc, the value returned should be free of leading whitespace, but there is no guarantee that your install of wc works that way. All you can do is test and see what results, e.g.
ls "$HOME" | wc -l
If whitespace is returned before the value -- you have found your problem.
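If it is, a minimal fix, assuming the count is always a plain integer, is to let arithmetic expansion strip the surrounding whitespace:
NEXT=$((NEXT))
echo "Files .$ext: $NEXT"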
If the last line is your actual output, then it seems to be the output of something other than the displayed code. When your output looks weird, try putting single quotes around each variable:
echo " Average file size .'$ext': '$AEXT'"
That way, you will know, if the spaces (or tabs) are coming from the variables themselves or from the script.

Bash: losing quotation marks in expressions that contain variables after expansion [duplicate]

This question already has answers here:
Why does shell ignore quoting characters in arguments passed to it through variables? [duplicate]
(3 answers)
Closed 4 years ago.
I just started learning Bash scripting and wrote a little something:
#!bin/bash
for str in stra strb strc
do
find . -name "*${str}*" | sort | cut -c3- > "${str}.list"
done
As you can see, I'm trying to create three files ("stra.list", "strb.list" and "strc.list") which would list the names of the files containing "stra", "strb", or "strc" respectively in the current directory. The cut -c3- hack is just for getting rid of the path name ./ at the beginning of find results.
But all my script does right now is creating three empty files...
So when I run
for str in stra strb strc;
do
echo "find .-name "*${str}*" | sort | cut -c3- > "${str}.list"";
done
I only see
find .-name *stra* | sort | cut -c3- > stra.list
find .-name *strb* | sort | cut -c3- > strb.list
find .-name *strc* | sort | cut -c3- > strc.list
So how can I retain the quotes around the expressions containing the variables after the expansion? I tried putting an extra set of quotes as well as using eval, but neither seems to work.
Update:
What I'm asking is how I can write my find command in such a way that Bash would successfully produce the three lists with the target file names, instead of for whatever reason just creating three blank lists. Sorry about the confusion.
If your goal is to generate valid shell commands, then simply trying to preserve quotes is doing it wrong (in the sense that it's actually insecure; maliciously generated variable contents can perform shell injection attacks). Instead, tell the shell itself to generate valid quoting for the content you want to preserve; printf %q will do this in bash.
This particular variant requires a new enough bash to have printf -v.
#!/bin/bash
for str in stra strb strc; do
printf -v start '%q ' find . -name "*${str}*"
printf -v end '%q ' "${str}.list"
printf '%s\n' "$start | sort | cut -c3- >$end"
done
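On a reasonably current bash this prints something along these lines (the exact quoting style %q picks can vary between versions):
find . -name \*stra\* | sort | cut -c3- >stra.list
find . -name \*strb\* | sort | cut -c3- >strb.list
find . -name \*strc\* | sort | cut -c3- >strc.list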
By contrast, if you simply want to fix the bugs in your original script, assuming you have GNU find:
#!/bin/bash
for str in stra strb strc; do
find . -maxdepth 1 -name "*${str}*" -printf '%P\n' | sort > "${str}.list"
done
The find action -printf '%P\n' prints the name without the starting directory, meaning no ./ is present to need to be stripped.
Since you say you're only looking for files in the current directory, by the way, this whole mess is overkill. You don't need find for the job at all.
for str in stra strb strc; do
files=( *"$str"* )
[[ -e $files ]] && printf '%s\n' "${files[#]}" >"$str.list"
done
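(The [[ -e $files ]] test looks at the first array element: if the glob matched nothing, that element is still the literal pattern, the test fails, and no empty .list file gets created.)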
Note that output from this command can be misleading if any of your filenames contain literal newlines. (For this reason, storing filenames in newline-delimited files is a bad idea to start with).
There are two ways:
Enclose your echo output with single quotes instead of double quotes:
echo 'find .-name "*${str}*" | sort | cut -c3- > "${str}.list"';
or put a \ (backslash) in front of your inner quotes:
echo "find .-name \"*${str}*\" | sort | cut -c3- > \"${str}.list\"";

How to stop this script from moving renamed files out of source folder?

The script works as far as renaming the files but it moves the renamed files out of their respective folders.
I would like it to not move them but only rename them and I have failed after a few days of trying. I know this code is a mess and there is unneeded code in it but it nearly works.
Also the renamed file isn’t getting an extension of .txt but that isn't really an issue for me. I just want to see the "Dynamic Range Value" that is taken from inside the file as the file name so I don’t have to open every file (a couple thousand albums worth) to see what the DR is. Here is the code:
#!/bin/bash
cd /media/Storage/MusicWorks/Processing
find . -name 'dr14.txt' | while IFS=$'\n' read -r i
do mv -n "$i" `egrep -m1 -e 'Official DR value:' "$i" | sed -e 's/Official DR value://'`;
echo "Done"
done
I run this script from the terminal with a bash alias.
I have reservations about the egrep | sed part of your script, but if they work for you, so be it. You need to preserve the pathname of the file, for example like this:
find . -name 'dr14.txt' |
while IFS=$'\n' read -r i
do
newname="${i%/*}"/$(egrep -m1 -e 'Official DR value:' "$i" | sed -e 's/Official DR value://');
mv -n "$i" "$newname"
echo "Done $i ($newname)"
done
The ${i%/*} notation removes anything from the last slash to the end of the name in $i. Since all the names from find will start with ./, this is secure enough; it would not work well on absolute names such as / and /unix (the output would be the empty string, but /usr/bin/sh would be fine).
Under a little prompting by tripleee in a comment, it is possible to simplify the egrep | sed part of the code to:
newname="${i%/*}"/$(sed -n -e '/Official DR value:/{s///p;q;}' "$i");
The second semicolon is needed with BSD sed but not with GNU sed.
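For example, with a made-up input line:
printf 'Official DR value: DR12\n' | sed -n -e '/Official DR value:/{s///p;q;}'
This prints " DR12", keeping the space that followed the colon, so the new file name would start with a blank; including the space in the pattern (/Official DR value: /) avoids that.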

Rename Files to original extensions

Need help writing a bash script that will rename files that are being output as name.suffix.date. I need these files rewritten as name.date.suffix instead.
Edited:
Changed suffix from date to ~
Here's what I have so far:
find . -type f -name "*.~" -print0 | while read -d $'\0' f
do
new=`echo "$f" | sed -e "s/~//"`
mv "$f" "$new"
done
This changes the suffix back to the original, but I can't figure out how to get the date placed before the extension (fname??).
You can use regular expression matching to pull apart the original file name:
find . -type f -name "*.~" -print0 | while read -d $'\0' f
do
dir=${f%/*}
fname=${f##*/}
[[ $fname =~ (.+)\.([^.]+)\.([^.]+)\.~$ ]] || continue
name=${BASH_REMATCH[1]}
suffix=${BASH_REMATCH[2]}
d=${BASH_REMATCH[3]}
mv "$f" "$dir/$name.$d.$suffix"
done
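For instance, a hypothetical file ./music/album.txt.2013-05-01.~ is pulled apart into name=album, suffix=txt, d=2013-05-01, and renamed to ./music/album.2013-05-01.txt.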
Bash-only solution:
while IFS=. read -r -u 9 -d '' name suffix date tilde
do
mv "${name}.${suffix}.${date}.~" "${name}.${date}.${suffix}"
done 9< <(find . -type f -name "*.~" -print0)
Notes:
-d '' gives you the same result as -d $'\0'
Splits file names by the dots while reading them. Of course this means it would break if there are dots anywhere else.
Should otherwise work with pretty much any filenames, including those containing space, newlines and other funny business.
create a list of the files first and redirect to a file.
ls > fileList.txt
Open the file and read line by line in Perl. Use a regex to match the parts of the files and capture them like this
my ($fileName,$suffix,$date)=($WholeFileName=~/(.*)\.(.*)\.(.*)/);
This should capture the three separate variables for you. Now all you need to do is move the old file to the new file name. The new file name will be a concatenation of the three variables above: $newFileName=$fileName.".".$date.".".$suffix. If you have a sample fileName, post a comment and I can reply with a short script. Perl is not the only way; you could just use bash or awk and find alternate ways to do this.
cut each part of your filenames:
FIN=$(echo test.12345.ABCDEF | sed -e 's/[a-zA-Z0-9]*[\\.][a-zA-Z0-9]*[\\.]//')
DEBUT=$(echo test.12345.ABCDEF | sed -e 's/[\\.][a-zA-Z0-9]*[\\.][a-zA-Z0-9]*//')
MILIEU=$(echo test.12345.ABCDEF | sed -e 's/'${FIN}'//' -e 's/'${DEBUT}'//' -e 's/[\.]*//g')
paste each part as expected:
echo ${DEBUT}.${FIN}.${MILIEU}
rename --no-act 's/(name-regex)\.(suffix-regex)\.(date-regex)/$1.$3.$2/' *
Tweak the three regexes to fit your file names, and remove --no-act when you're happy with the result to actually rename the files.
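For example, for names that look like report.txt.2013-05-01 (a guessed pattern), that could be:
rename --no-act 's/^(.+)\.([^.]+)\.([0-9]{4}-[0-9]{2}-[0-9]{2})$/$1.$3.$2/' *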
