Run script on multiple files - bash

I would like to execute a script on a batch of files all of which have .xml extension.
Inspired by previous posts, I tried the following:
for file in *.xml; do ./script.sh <"$file"; done
And
for i in $(\ls -d *.xml)
do
./script.sh -i /path/*.xml -d /output_folder $i
done
Both of these run the script many times but only on the first .xml file in that folder. So I end up with a dozen output files but all of them are file1.txt, file1.txt_1, file1.txt_2 etc. The loop stops randomly, sometimes after 3 iterations, sometimes after 28.
Any help would be appreciated,
Thank you,
TP

The glob /path/*.xml hands the script every XML file on every run; pass the loop variable instead:
for f in /input_path/*.xml; do
./interproscan.sh -mode convert -f raw -i "$f" -d /output_path
done

A simpler and safer method is this:
while IFS= read -r -d $'\0'; do
./interproscan.sh -mode convert -f raw -i "$REPLY" -d /output_path
done < <(find . -iname "*.xml" -print0)
NOTE
1) Using -iname makes the search case-insensitive.
2) Quoting "$variable" protects filenames that contain spaces.

Instead of looping through the files you could use find's -exec option. It will execute the command on each file, replacing {} with the file's path. Note that you must end the command with an escaped semicolon (\;).
Something like this could work for you:
find . -name "*.xml" -exec ./script.sh -i /path/*.xml -d /output_folder {} \;
But you are limited in that you can only insert the {} once; alternatively, to do it with a loop, you could do this:
xmlFiles=( $(find . -name "*.xml") )
for i in "${xmlFiles[@]}"
do
./script.sh -i /path/*.xml -d /output_folder "$i"
done
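A third option, avoiding the loop entirely: a bash -c wrapper inside -exec lets you reuse the matched path as many times as you need (a sketch; script.sh and its options are the ones from the question):
find . -name "*.xml" -exec bash -c '
# "$1" is the path find matched; it can be repeated freely
./script.sh -i "$1" -d /output_folder "$1"
' bash {} \;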

Issues renaming files using bash script with input from .txt file with find -exec rename command

Update 01/12/2022
With triplee's helpful suggestions, I resolved it to take both files and directories by joining d and f with a comma in -type; the final code now looks like this:
while read -r old new;
do echo "replacing ${old} by ${new}" >&2
find '/path/to/dir' -depth -type d,f -name "$old" -exec rename "s/${old}/${new}/" {} ';'
done <input.txt
Thank you!
Original request:
I am trying to rename a list of files (from $old to $new), all present in $homedir or in subdirectories in $homedir.
In the command line this line works to rename files in the subfolders:
find ${homedir}/ -name ${old} -exec rename "s/${old}/${new}/" */${old} ';'
However, when I want to implement this line in a simple bash script getting the $old and $new filenames from input.txt, it doesn't work anymore...
input.txt looks like this:
name_old name_new
name_old2 name_new2
etc...
the script looks like this:
#!/bin/bash
homedir='/path/to/dir'
cat input.txt | while read old new;
do
echo 'replacing' ${old} 'by' ${new}
find ${homedir}/ -name ${old} -exec rename "s/${old}/${new}/" */${old} ';'
done
After running the script, the text line from echo with the $old and $new filenames being replaced is printed for the entire loop, but no files are renamed. No error is printed either. What am I missing? Your help would be greatly appreciated!
I checked whether the $old and $new variables were correctly passed to the find -exec rename command, but because they are printed by echo that doesn't seem to be the issue.
If you add an echo, like -exec echo rename ..., you'll see what actually gets executed. I'd say that the path to $old is wrong (you're not using the result of find in the -exec clause), and that */$old isn't quoted and might be expanded by the shell before find ever gets to see it.
Most of your other expansions are unquoted as well, which can lead to all sorts of trouble.
You could do it in pure Bash (drop echo when output looks good):
shopt -s globstar
for f in **/"$old"; do echo mv "$f" "${f%/*}/$new"; done
Or with rename directly, though this would run into trouble if too many files match (drop -n when output looks good):
rename -n "s/$old\$/$new/" **/"$old"
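If the glob should ever expand past the argument-list limit, printf (a shell builtin, so exempt from that limit) can stream the matches to rename through xargs instead (a sketch, assuming GNU xargs; keep -n until the output looks good):
# printf never execs, so the long expansion is safe; xargs -0 batches the names
printf '%s\0' **/"$old" | xargs -0 rename -n "s/$old\$/$new/"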
Or with GNU find, using -execdir to run in the same directory as the matching file (drop echo when output looks good):
find -type f -name "$old" -execdir echo mv "$old" "$new" \;
And finally, a version with find that spawns just a single subshell (drop echo when output looks right):
find -type f -name "$old" -exec bash -c '
new=$1
shift
for f; do
echo mv "$f" "${f%/*}/$new"
done
' bash "$new" {} +
The argument to rename should be the file itself, not */${old}. You also have a number of quoting errors, and a useless cat.
#!/bin/bash
while read -r old new;
do
echo "replacing ${old} by ${new}" >&2
find /path/to/dir -name "$old" -exec rename "s/${old}/${new}/" {} ';'
done <input.txt
Running find multiple times on the same directory is hugely inefficient, though. Probably a better solution is to find all files in one go, and abort if it's not one of the files on the list.
find /path/to/dir -type f -exec sh -c '
for f in "$@"; do
awk -v f="$f" "f==\$1 { print \"s/\" \$1 \"/\" \$2 \"/\" }" "$0" |
xargs -I _ -r rename _ "$f"
done' input.txt {} +
(Untested; probably try with echo before you run this live.)
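A pure-Bash variation on the same single-pass idea, assuming bash 4+ for the associative array (equally untested; keep the echo until the output looks right):
# load the old->new map once, then walk the tree a single time
declare -A map
while read -r old new; do map[$old]=$new; done < input.txt
find /path/to/dir -type f -print0 | while IFS= read -r -d '' f; do
base=${f##*/}
[[ ${map[$base]+set} ]] && echo mv "$f" "${f%/*}/${map[$base]}"
done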

Searching for hundreds of files on a server

I have a list of 577 image files that I need to search for on a large server. I am no expert when it comes to bash so the best I could do myself was 577 lines of this:
find /source/directory -type f -iname "alternate1_1052956.tif" -exec cp {} /dest/directory \;
...repeating this line for each file name. It works... but it's unbelievably slow because it searches the entire server for one file and then moves on to the next line, but each search could take 20 minutes. I left this overnight and it only found 29 of them by the morning which is just way too slow. It could take two weeks at that rate to find all of these.
I've tried separating each line with -o as an OR separator in the hopes that it would search once for 577 files but I can't get it to work.
Does anyone have any suggestions? I also tried using the .txt file I have of the file names as a basis for the search but couldn't get that to work either. Unfortunately I don't have the paths for these files, only the basenames.
If you want to copy all .tif files, use a wildcard in -name:
find /source/directory -type f -name "*.tif" -exec cp {} /dest/directory \;
On macOS, use the mdfind command, which looks up the filename in the Spotlight index. This is very fast as it is only an index lookup, just like the locate command on Linux:
cp $(mdfind alternate1_1052956.tif) /dest/directory
If you have all the filenames in a file (one line per file), use xargs. The mdfind call has to run inside a small shell wrapper so that it is evaluated once per file, not once when the whole command line is first parsed:
xargs -L 1 -I {} sh -c 'cp $(mdfind "$1") /dest/directory' sh {} < file_with_list
Create a file with all the filenames, then write a loop that runs through that file and executes the command in the background.
Note that this will take a lot of memory, as you will be running many searches simultaneously, so make sure you have enough memory for this.
while read -r line; do
find /source/directory -type f -iname "$line" -exec cp {} /dest/directory \; &
done < input.file
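Since each background job is a full scan of the server, it may be worth throttling them; a sketch that waits after every batch (the batch size of 10 is arbitrary):
n=0
while read -r line; do
find /source/directory -type f -iname "$line" -exec cp {} /dest/directory \; &
(( ++n % 10 == 0 )) && wait   # pause until the current batch of 10 finishes
done < input.file
wait   # let the final partial batch finish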
There are a few assumptions made in this answer: you have a list of all 577 file names (call it inputfile.list), and there are no whitespace characters in the file names. The following may work:
$ cat findcopy.sh
#!/bin/bash
cmd=$(
echo -n 'find /path/to/directory -type f '
readarray -t filearr < inputfile.list # Read the list to an array
n=0
for f in "${filearr[@]}" # Loop over the array and print -iname
do
(( n > 0 )) && echo "-o -iname ${f}" || echo "-iname ${f}"
((n++))
done
echo -n ' | xargs -I {} cp {} /path/to/destination/'
)
eval $cmd
execute: ./findcopy.sh
Note for macOS: it doesn't have readarray. Instead, use any other simple method to feed the list into the array, for example:
filearr=($(cat inputfile.list))
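An alternative that avoids eval altogether is to collect the -iname tests in an array; this also survives whitespace in the names (a sketch; cp -t is a GNU extension, and the list is assumed non-empty):
args=()
while IFS= read -r f; do
(( ${#args[@]} )) && args+=(-o)   # put -o between tests, not before the first
args+=(-iname "$f")
done < inputfile.list
find /path/to/directory -type f \( "${args[@]}" \) -exec cp -t /path/to/destination {} +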

Bash - Rename ".tmp" files recursively

A bunch of Word & Excel documents were being moved on the server when the process terminated before it was complete. As a result, we're left with several perfectly fine files that have a .tmp extension, and we need to rename these files back to the appropriate .xlsx or .docx extension.
Here's my current code to do this in Bash:
#!/bin/sh
for i in "$(find . -type f -name *.tmp)"; do
ft="$(file "$i")"
case "$(file "$i")" in
"$i: Microsoft Word 2007+")
mv "$i" "${i%.tmp}.docx"
;;
"$i: Microsoft Excel 2007+")
mv "$i" "${i%.tmp}.xlsx"
;;
esac
done
It seems that while this does search recursively, it only does 1 file. If it finds an initial match, it doesn't go on to rename the rest of the files. How can I get this to loop correctly through the directories recursively without it doing just 1 file at a time?
Try driving the loop with find like this:
while IFS= read -r -d '' i; do
ft="$(file "$i")"
case "$ft" in
"$i: Microsoft Word 2007+")
mv "$i" "${i%.tmp}.docx"
;;
"$i: Microsoft Excel 2007+")
mv "$i" "${i%.tmp}.xlsx"
;;
esac
done < <(find . -type f -name '*.tmp' -print0)
Using <(...) is called process substitution; it is what lets the loop read from the find command here
Quote the filename pattern in find
Use -print0 to get find's output delimited by null characters, which allows space/newline characters in file names
Use IFS= and -d '' to read the null-separated filenames
I too would recommend using find. I would do this in two passes of find:
find . -type f -name \*.tmp \
-exec sh -c 'file "{}" | grep -q "Microsoft Word 2007"' \; \
-exec sh -c 'f="{}"; echo mv "$f" "${f%.tmp}.docx"' \;
find . -type f -name \*.tmp \
-exec sh -c 'file "{}" | grep -q "Microsoft Excel 2007"' \; \
-exec sh -c 'f="{}"; echo mv "$f" "${f%.tmp}.xlsx"' \;
Lines are split for readability. The echo before each mv makes this a dry run; drop it once the output looks right.
Each instance of find will search for .tmp files, then use -exec to test the output of file. This is similar to what you're doing within the loop in your shell script, only it's launched from within find itself. We're using the pipe to grep instead of your case statement.
The second -exec only gets run if the first one returned "true" (i.e. grep -q ... found something), and executes the rename in a tiny shell instance.
I haven't profiled this to see whether it would be faster or slower than a loop in a shell script. Just another way to handle things.
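One caveat: embedding {} inside the sh -c string means a filename containing quotes or $( ) gets interpreted as shell code. Passing the path as a positional parameter avoids that (a sketch of the Word half; the Excel half is analogous, and the echo is again a dry run):
find . -type f -name '*.tmp' \
-exec sh -c 'file "$1" | grep -q "Microsoft Word 2007"' sh {} \; \
-exec sh -c 'echo mv "$1" "${1%.tmp}.docx"' sh {} \;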

Bash script to apply operation to each file found by find

I'm trying to execute an operation on each file found by find - with a specific file extension (wma). For example, in python, I would simply write the following script:
for file in os.listdir('.'):
if file.endswith('wma'):
name = file[:-4]
command = "ffmpeg -i '{0}.wma' '{0}.mp3'".format(name)
os.system(command)
I know I need to execute something similar to
find -type f -name "*.wma" \
exec ffmpeg -i {}.wma {}.mp3;
But obviously this isn't working or else I wouldn't be asking this question =]
Most of the time it's better to use read when parsing input than to do word splitting with for and depend on IFS, as there's a risk of unexpected pathname expansion. Reading on file descriptor 4 below also keeps ffmpeg, which reads from stdin, from swallowing the rest of the file list:
while IFS= read -u 4 -r LINE; do
ffmpeg -i "$LINE" "${LINE%.*}.mp3"
done 4< <(exec find -type f -name '*.wma')
Or use readarray (Bash 4.0+)
readarray -t FILES < <(exec find -type f -name '*.wma')
for FILE in "${FILES[#]}"; do
ffmpeg -i "$FILE" "${FILE%.*}.mp3"
done
Sticking to the basics always gets the job done (does not handle spaces in filenames):
for f in $(find "." -type f -name "*.wma"); do ffmpeg -i "$f" "${f//wma/mp3}"; done
Starting from konsolebox's suggestions below, I've come up with this complete version (-nostdin stops ffmpeg from consuming the rest of the piped file list):
find "." -type f -name "*.wma" | while read -d $'\n' f; do ffmpeg -nostdin -i "$f" "${f//wma/mp3}"; done

How do I check if all files inside directories are valid jpegs (Linux, sh script needed)?

Ok, I got a directory (for instance, named '/photos') in which there are different directories
(like '/photos/wedding', '/photos/birthday', '/photos/graduation', etc...) which have .jpg files in them. Unfortunately, some of the jpeg files are broken. I need to find a way to determine which files are broken.
I found out that there is a tool named ImageMagick which can help a lot. If you use it like this:
identify -format '%f' whatever.jpg
it prints the name of the file only if the file is valid; if it is not, it prints something like "identify: Not a JPEG file: starts with 0x69 0x75 `whatever.jpg' # jpeg.c/EmitMessage/232.".
So the correct solution should be: find all files ending with ".jpg", apply "identify" to them, and if the result is just the name of the file, do nothing; if the result is different from the name of the file, save the name of the file somewhere (like in a file "errors.txt").
Any ideas how I can probably do that?
The short-short version:
find . -iname "*.jpg" -exec jpeginfo -c {} \; | grep -E "WARNING|ERROR"
You might not need the same find options, but jpeginfo was the solution that worked for me:
find . -type f \( -iname "*.jpg" -o -iname "*.jpeg" \) | xargs jpeginfo -c | grep -E "WARNING|ERROR" | cut -d " " -f 1
(note the parentheses around the -o tests, so -type f applies to both patterns)
as a script (as requested in this question)
#!/bin/sh
find . -type f \
\( -iname "*.jpg" \
-o -iname "*.jpeg" \) \
-exec jpeginfo -c {} \; | \
grep -E "WARNING|ERROR" | \
cut -d " " -f 1
I was clued into jpeginfo for this by http://www.commandlinefu.com/commands/view/2352/find-corrupted-jpeg-image-files, and this explained mixing find's -o OR with -exec.
One problem with identify -format is that it doesn't actually verify that the file is not corrupt, it just makes sure that it's really a jpeg.
To actually test it you need something to convert it. But the convert that comes with ImageMagick seems to silently ignore non-fatal errors in the jpeg (such as being truncated.)
One thing that works is this:
djpeg -fast -grayscale -onepass file.jpg > /dev/null
If it returns an error code, the file has a problem. If not, it's good.
There are other programs that could be used as well.
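Wired into the errors.txt the question asks for, that could look like this (a sketch):
find /photos -type f -iname '*.jpg' -print0 |
while IFS= read -r -d '' f; do
# djpeg exits non-zero on a damaged file; record its name
djpeg -fast -grayscale -onepass "$f" > /dev/null 2>&1 || printf '%s\n' "$f" >> errors.txt
done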
You can put this into a bash script file or run it directly:
find -name "*.jpg" -type f | xargs --no-run-if-empty identify -format '%f' 1>ok.txt 2>errors.txt
In case identify is missing, here is how to install it in Ubuntu:
sudo apt install imagemagick --no-install-recommends
This script will print out the names of the bad files:
#!/bin/bash
find /photos -name '*.jpg' | while read -r FILE; do
if [[ $(identify -format '%f' "$FILE" 2>/dev/null) != "${FILE##*/}" ]]; then
echo "$FILE"
fi
done
You could run it as is or as ./badjpegs > errors.txt to save the output to a file.
To break it down, the find command finds *.jpg files in /photos or any of its subdirectories. These file names are piped to a while loop, which reads them in one at a time into the variable $FILE. Inside the loop, we grab the output of identify using the $(...) operator and check if it matches the file's base name (%f prints the file name without its directory). If not, the file is bad and we print the file name.
It may be possible to simplify this. Most UNIX commands indicate success or failure in their exit code. If the identify command does this, then you could simplify the script to:
#!/bin/bash
find /photos -name '*.jpg' | while read -r FILE; do
if ! identify "$FILE" &> /dev/null; then
echo "$FILE"
fi
done
Here the condition is simplified to if ! identify; then which means, "did identify fail?"
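If so, the whole loop can even collapse into find itself, using the exit status of -exec as the test (a sketch; identify's own output is discarded):
# -exec is true when the command succeeds; the leading ! selects the failures
find /photos -name '*.jpg' ! -exec sh -c 'identify "$1" > /dev/null 2>&1' sh {} \; -print > errors.txt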
