A script that iterates over all files in folder - shell

There is a script on a server that I need to run over all the files in a folder. To run this script over one file I use this shell script:
for input in /home/arashsa/duo-bokmaal/Bokmaal/DUO_BM_28042.txt ; do
name=$(basename "$input")
/corpora/bokm/tools/The-Oslo-Bergen-Tagger/./tag-lbk.sh "$input" > "/home/arashsa/duo-bokmaal-obt/$name"
done
I'm terrible at writing shell scripts and have not managed to find out how to iterate over files. I want the script to iterate over all files in a given folder that end with .txt, but not those that end with _metadata.txt. So I'm thinking I would give it the folder path as an argument, make it iterate over all the files in that folder, and run the script on files ending with .txt but not _metadata.txt.

Use find and the exec option.
$ find /path/to/dir -exec <command here> \;
Each matched file or directory is substituted for {}.
Example usage: $ find . -exec echo {} \; echoes the name of every file and directory under the current directory, recursively. Further options narrow down which files and directories are handled; I will briefly explain some of them. Note that the echo is redundant here, because find prints each match by default, but I'll leave it in to illustrate how -exec works. That said, the following commands yield the same result: $ find . -exec echo {} \; and $ find .
maxdepth and mindepth
-maxdepth and -mindepth control how far down the directory structure find goes. -maxdepth N limits the descent to N levels below the starting point (the starting point itself is depth 0), and -mindepth N skips everything at a depth of less than N.
Example usages:
(1) listing only elements from this dir, including . (= current dir).
(2) listing only elements from current dir excluding .
(3) listing elements of the root dir and of every dir directly inside it
(1)$ find . -maxdepth 1 -exec echo {} \;
(2)$ find . -mindepth 1 -maxdepth 1 -exec echo {} \;
# or, alternatively
(2)$ find . -maxdepth 1 ! -path . -exec echo {} \;
(3)$ find / -maxdepth 2 -exec echo {} \;
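The depth options are easiest to see on a throwaway tree. The sketch below builds one in a temporary directory (all file and directory names are made up for the demo) and collects what find reports at different depth limits.

```shell
# Build a small throwaway tree; the names are invented for this demo.
demo=$(mktemp -d)
mkdir -p "$demo/a/b"
touch "$demo/top.txt" "$demo/a/mid.txt" "$demo/a/b/deep.txt"

# Run find from inside the tree so the results are relative paths.
depth1=$(cd "$demo" && find . -maxdepth 1 | LC_ALL=C sort)
children=$(cd "$demo" && find . -mindepth 1 -maxdepth 1 | LC_ALL=C sort)
depth2=$(cd "$demo" && find . -mindepth 2 -maxdepth 2 | LC_ALL=C sort)

printf -- '-maxdepth 1:\n%s\n' "$depth1"
printf -- '-mindepth 1 -maxdepth 1:\n%s\n' "$children"
printf -- '-mindepth 2 -maxdepth 2:\n%s\n' "$depth2"

rm -rf "$demo"
```

The first listing includes the starting point `.` itself, the second drops it, and the third contains only the entries exactly two levels down.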
type
The -type option restricts the results to files (f) or directories (d) only. Example usage:
(1) list all files in this dir
(2) run the executable func on every directory directly under the root dir (-exec starts a program, so a shell function cannot be called this way directly).
(1)$ find . -maxdepth 1 -type f -exec echo {} \;
(2)$ find / -maxdepth 1 -type d -exec func {} \;
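Because -exec starts a new program, it cannot see functions defined in the calling shell. If func is a bash function, one possible workaround is to export it and let find start a bash that calls it. This is a sketch that assumes bash; the function body and the directory names are placeholders.

```shell
#!/usr/bin/env bash
# func is a stand-in shell function; replace its body with the real work.
func() {
    printf 'visiting %s\n' "$1"
}
export -f func   # bash-only: makes func visible to child bash processes

# Throwaway directory with two subdirectories and one plain file.
demo=$(mktemp -d)
mkdir "$demo/one" "$demo/two"
touch "$demo/file"

# Start a bash per directory and pass the path as $1 (the _ fills $0).
visited=$(find "$demo" -mindepth 1 -maxdepth 1 -type d \
    -exec bash -c 'func "$1"' _ {} \; | LC_ALL=C sort)
printf '%s\n' "$visited"

rm -rf "$demo"
```

Passing the path as a positional argument, rather than embedding {} in the command string, keeps names with spaces or quotes intact.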
name & regex
The -name option matches file names against a shell pattern; -regex matches the whole path against a regular expression.
Example usage: find all movies in a certain directory (-regextype is specific to GNU find)
$ find /path/to/dir -maxdepth 1 -regextype sed -regex ".*\.\(avi\|mp4\|mkv\)"
size
Another filter is the file size. -size n matches files whose size is exactly n units (512-byte blocks by default, rounded up); +n matches larger files and -n smaller ones. Example usage:
(1) find all empty files in current dir.
(2) find all non empty files in current dir.
(1)$ find . -maxdepth 1 -type f -size 0
(2)$ find . -maxdepth 1 -type f ! -size 0
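A quick way to check how the -size tests behave is to run them against files of known size. The sketch below creates three such files in a temporary directory (the file names are invented) and uses the `c` suffix, which counts bytes.

```shell
# Three files of known size: 0 bytes, 6 bytes and 4096 bytes.
demo=$(mktemp -d)
: > "$demo/empty.log"
printf 'hello\n' > "$demo/small.log"
dd if=/dev/zero of="$demo/big.log" bs=1024 count=4 2>/dev/null

empty=$(cd "$demo" && find . -type f -size 0)
nonempty=$(cd "$demo" && find . -type f ! -size 0 | LC_ALL=C sort)
# +2048c means "more than 2048 bytes".
over_2k=$(cd "$demo" && find . -type f -size +2048c)

printf 'empty:     %s\n' "$empty"
printf 'non-empty: %s\n' $nonempty
printf 'over 2 KB: %s\n' "$over_2k"

rm -rf "$demo"
```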
Further examples
Move all files in this dir to a directory tmp located in .
$ find . -maxdepth 1 -type f -exec mv {} tmp \;
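Convert all mkv files to mp4 files in a dir /path/to/dir and its direct child directories (ffmpeg takes the output file as a plain argument, so the results are named like video.mkv.mp4)
$ find /path/to/dir -maxdepth 2 -regextype sed -regex ".*\.mkv" -exec ffmpeg -i {} {}.mp4 \;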
Convert all your jpeg files to png (don't do this, it will take very long to both find them and convert them).
$ find ~ -maxdepth 420 -regextype sed -regex '.*\.jpeg' -exec mogrify -format png {} \;
Note
The find command is a powerful tool, and piping its output to xargs is often useful. Note that find is preferable to the following construction:
for file in $(ls)
do
some commands
done
because the latter breaks on file and directory names containing spaces.
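Applied to the question, the options above can be combined into one find call. The following is a minimal sketch, not the only way to do it: `run_on_txt_files` is a hypothetical helper name, and the third argument stands for the tagger script from the question.

```shell
# run_on_txt_files IN_DIR OUT_DIR COMMAND
# Hypothetical helper: run COMMAND on every *.txt file directly in IN_DIR,
# except *_metadata.txt, and write each result to OUT_DIR under the same name.
run_on_txt_files() {
    in_dir=$1
    out_dir=$2
    tagger=$3
    mkdir -p -- "$out_dir"
    find "$in_dir" -maxdepth 1 -type f -name '*.txt' ! -name '*_metadata.txt' \
        -exec sh -c '
            out_dir=$1
            tagger=$2
            shift 2
            # The remaining arguments are the file names supplied by find.
            for input in "$@"; do
                "$tagger" "$input" > "$out_dir/$(basename "$input")"
            done
        ' sh "$out_dir" "$tagger" {} +
}

# Example call with the paths from the question:
# run_on_txt_files /home/arashsa/duo-bokmaal/Bokmaal \
#     /home/arashsa/duo-bokmaal-obt \
#     /corpora/bokm/tools/The-Oslo-Bergen-Tagger/tag-lbk.sh
```

The `{} +` form hands many file names to a single sh, and the names arrive as separate arguments, so spaces in file names are not a problem.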

In bash:
shopt -s extglob
for input in /dir/goes/here/!(*_metadata).txt
do
...
done
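For completeness, here is a sketch of a whole script built around a glob loop. It uses a plain `*.txt` glob plus a `case` test instead of extglob, so it does not depend on bash options; `process_folder` and its argument order are assumptions made for this example.

```shell
# process_folder IN_DIR OUT_DIR COMMAND [ARGS...]
# Hypothetical helper: run COMMAND on each *.txt file in IN_DIR, skipping
# *_metadata.txt, and store the output in OUT_DIR under the same file name.
process_folder() {
    in_dir=$1
    out_dir=$2
    shift 2                                  # what is left is the command
    mkdir -p -- "$out_dir"
    for input in "$in_dir"/*.txt; do
        [ -e "$input" ] || continue          # the glob matched nothing
        case $input in
            *_metadata.txt) continue ;;      # skip metadata files
        esac
        name=$(basename "$input")
        "$@" "$input" > "$out_dir/$name"
    done
}

# Example call with the paths from the question:
# process_folder /home/arashsa/duo-bokmaal/Bokmaal \
#     /home/arashsa/duo-bokmaal-obt \
#     /corpora/bokm/tools/The-Oslo-Bergen-Tagger/tag-lbk.sh
```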

Related

Find Directory Name and Save as Variable

I'm trying to find a directory name and save it as a variable. In this case, this directory will always start with the character "2" and be the only such directory in its parent that starts with a "2". I'm trying to do the following but missing something:
#!/bin/bash
existing_dir=find $PARENT_DIR -maxdepth 1 -type d "2*"
rm -r $PARENT_DIR/$existing_dir
mkdir $PARENT_DIR/$((existing_dir+1))
#do stuff in new directory
In particular, I'm trying to grab that number that starts with the "2" (the directory name will always be only numerals), not the full path. Any help would be appreciated!
Use basename to get only the directory name rather than the full path, and combine it with find through -exec:
-exec basename {} \;
{} is the placeholder through which find passes a single file name to the command run by -exec, and \; terminates the -exec expression.
Your find invocation was wrong (the -name test was missing). The correct form is:
find "$PARENT_DIR" -maxdepth 1 -type d -name '2*' -exec basename {} \;
The whole command is roughly equivalent to
for f in $(find "$PARENT_DIR" -maxdepth 1 -type d -name '2*');
do basename "$f";
done
To sum up, capture the output with command substitution:
existing_dir=$(find "$PARENT_DIR" -maxdepth 1 -type d -name '2*' -exec basename {} \;)
Here is how to do it using bash-only features:
#!/usr/bin/env bash
parent_dir='/define/some/path/here'
# Capture directories matching the 2* pattern into array
matching_dirs=("$parent_dir/2"*/)
# Regex match captures the numerical leaf directory name
# from the first match in the matching_dirs array
[[ "${matching_dirs[0]}" =~ ([[:digit:]]+)/$ ]] || exit 1
existing_dir="${BASH_REMATCH[1]}"
new_dir="$((existing_dir + 1))"
# If the new directory could be created
if mkdir -p -- "${parent_dir:?}/${new_dir}"; then
# Delete the old directory recursively
# See: https://github.com/koalaman/shellcheck/wiki/SC2115
rm -r -- "${parent_dir:?}/${existing_dir}"
fi

How can I diff two directories in bash recursively for only 1 file name?

Currently I am trying this:
diff -r /develop /us-prod
which shows all the differences between the two, but all I really care about here is a file named schema.json, which is guaranteed to be there in all directories, but this file can be different.
I want to diff these two directories, but only if the file name is schema.json.
I see that you can use -x to exclude files, but it is difficult to say which other files could be in there.
Some files are guaranteed to be there, but others are not. Is there an "include" option rather than an exclude?
You can try this:
find /develop -type f -name schema.json -exec bash -c \
'diff "$1" "/us-prod${1#/develop}"' _ {} \;
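The key step in that command is `${1#/develop}`, which strips the left-hand root from each path so the same relative path can be looked up under the other root. A generalized sketch of the same idea follows; `compare_named_file` is a hypothetical helper and the directory names in the usage line are the ones from the question.

```shell
# compare_named_file LEFT RIGHT NAME
# Hypothetical helper: diff every file called NAME under LEFT against the
# file at the same relative path under RIGHT.
compare_named_file() {
    left=$1
    right=$2
    name=$3
    find "$left" -type f -name "$name" -exec sh -c '
        left=$1
        right=$2
        shift 2
        status=0
        for path in "$@"; do
            rel=${path#"$left"}        # for example /api/schema.json
            diff -u "$path" "$right$rel" || status=1
        done
        exit "$status"
    ' sh "$left" "$right" {} +
}

# Usage with the directories from the question:
# compare_named_file /develop /us-prod schema.json
```

Files with other names are never passed to diff, so no exclude list is needed.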
Assuming both directories contain exactly one schema.json file each, subdirectories included, you could try:
diff "$(find /develop -type f -name schema.json)" "$(find /us-prod -type f -name schema.json)"

"dir/*: No such file or directory" with find -exec ... "{}/*"

The current directory contains files and directories. The directories have no sub-directories, but may contain zero or more files, for example:
./file1
./file2
./directory1/file3
./directory2/file4
./directory2/file5
./directory3/
When I execute find . -type d -maxdepth 1 I get a listing of the directories:
./directory1
./directory2
If I execute mv ./directory1/* . all files in directory1 are moved to the current level, so I thought I could use find -exec to do everything in one go:
find . -type d -maxdepth 1 -exec mv "{}/*" . \;
But I get this response:
mv: rename ./directory1/* to ./*: No such file or directory
How can I move all the files in subdirectories to the current level?
Globbing (replacing foo/* with foo/dirA, foo/dirB, etc) is performed by the shell, not by mv. find -exec doesn't start a shell unless you do so manually; for example:
find . -type d -mindepth 1 -maxdepth 1 \
-exec sh -c 'for dir; do mv -- "$dir"/* .; done' _ {} +
There's no real need to use find. You can do it with a single mv to move the files and rmdir to remove the now-empty directories.
mv */* .
rmdir */
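If some subdirectories are empty (like directory3 in the question), an unmatched glob stays literal and mv complains about it. The sketch below guards against that case; `flatten_subdirs` is a hypothetical helper name, and like `mv */* .` it ignores hidden files and overwrites files that share a name.

```shell
# flatten_subdirs DIR
# Hypothetical helper: move the visible files of every direct subdirectory
# of DIR up into DIR, then remove the subdirectories that are left empty.
flatten_subdirs() {
    (
        cd -- "$1" || exit 1
        for dir in */; do
            [ -d "$dir" ] || continue           # no subdirectories at all
            for file in "$dir"*; do
                [ -e "$file" ] || continue      # empty dir: glob stayed literal
                mv -- "$file" .
            done
            # rmdir fails harmlessly if hidden files are still inside.
            rmdir -- "$dir" 2>/dev/null || true
        done
    )
}

# Usage: flatten_subdirs .
```

The body runs in a subshell, so the `cd` does not change the caller's working directory.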

Bash script for removing specific file from certain subdirectories

On a unix server, I'm trying to figure out how to remove a file, say "example.xls", from any subdirectories that start with v0 ("v0*").
I have tried something like:
find . -name "v0*" -type d -exec find . -name "example.xls" -type f
-exec rm {} \;
But I get errors. I have a solution, but it works too well: it deletes the file in every subdirectory, regardless of its name:
find . -type f -name "example.xls" -exec rm -f {} \;
Any ideas?
You will probably have to do it in two steps: first find the directories, then the files. You can use xargs to do it in a single line, like
find . -name "v0*" -type d | \
xargs -l -I[] \
find [] -name "example.xls" -type f -exec rm {} \;
This first generates a list of matching directory names, then lets xargs call the second find on each of them to locate the file within that directory.
Try:
find -path '*/v0*/example.xls' -delete
This matches only files named example.xls that have, somewhere in their path, a parent directory whose name starts with v0.
Note that since find offers -delete as an action, it is not necessary to invoke the external executable rm.
Example
Consider this directory structure:
$ find .
.
./a
./a/example.xls
./a/v0
./a/v0/b
./a/v0/b/example.xls
./a/v0/example.xls
We can identify the example.xls files that have a parent directory named v0*:
$ find -path '*/v0*/example.xls'
./a/v0/b/example.xls
./a/v0/example.xls
To delete those files:
find -path '*/v0*/example.xls' -delete
Alternative: find only those files directly under directory v0*
find -regex '.*/v0[^/]*/example.xls'
Using the above directory structure, this approach returns one file:
$ find -regex '.*/v0[^/]*/example.xls'
./a/v0/example.xls
To delete such files:
find -regex '.*/v0[^/]*/example.xls' -delete
Compatibility
Although my tests were performed with GNU find, -path is specified by POSIX, and -regex, while not part of POSIX, is also supported by the BSD find on OSX.
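Before deleting anything, it is worth checking both patterns on a disposable copy of the layout. This sketch recreates the example tree in a temporary directory, adds an assumed extra directory v01 to show that v0* matches more than v0, and records what each pattern selects.

```shell
# Recreate the example layout, plus a v01 directory added for this demo.
demo=$(mktemp -d)
mkdir -p "$demo/a/v0/b" "$demo/a/v01"
touch "$demo/a/example.xls" "$demo/a/v0/example.xls" \
      "$demo/a/v0/b/example.xls" "$demo/a/v01/example.xls"

# -path: a v0* directory anywhere above the file.
any_depth=$(cd "$demo" && find . -path '*/v0*/example.xls' | LC_ALL=C sort)
# -regex: the file must sit directly inside a v0* directory.
direct=$(cd "$demo" && find . -regex '.*/v0[^/]*/example\.xls' | LC_ALL=C sort)

printf 'any depth:\n%s\n' "$any_depth"
printf 'directly under v0*:\n%s\n' "$direct"

# Delete the -path matches and list what is left.
(cd "$demo" && find . -path '*/v0*/example.xls' -delete)
remaining=$(cd "$demo" && find . -name example.xls)
printf 'remaining:\n%s\n' "$remaining"

rm -rf "$demo"
```

The -regex variant leaves out a/v0/b/example.xls because `[^/]*` cannot cross a directory separator, and only a/example.xls survives the delete.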

How to copy files recursively, rename them but keep the same extension in Bash?

I have a folder with tens of thousands of files of different types. I'd like to copy them all to a new folder (Copy1) but also rename them all to $RANDOM while keeping the extension intact. I realize I can write a line specifying which extension to find and how to name it, but there has got to be a way to do it dynamically, because there are at least 100 file types and there may be more in the future.
I have the following so far:
find ./ -name '*.*' -type f -exec bash -c 'cp "$1" "${1/\/123_//_$RANDOM}"' -- {} \;
but that puts the random number after the extension, and it also puts them all in the same folder. I can't figure out how to do the following 2 things:
1 - Keep all paths intact, but in a new root folder (Copy1)
2 - How to have name be $RANDOM.extension, instead of .extension.$RANDOM
PS - by $RANDOM I mean an actual randomly generated number. I am interested in keeping the folder structure, so we are dealing with a few hundred files at most per directory, but all directories/files need to be renamed to $RANDOM. Another way to look at what I need to do: copy all contents of Folder1, with all subdirectories and files, to Folder2 (where Folder2 is a $RANDOM name), then rename all folders and files to random names but keep all extensions.
EDIT: OK, I figured out how to rename and keep the extension. But I have a problem: it's dumping all of the files into the root directory the script is run from. How do I keep them in their respective folders? The command I'm using is:
find ./ -name '*.*' -type f -exec bash -c 'mv "$1" $RANDOM.${1##*.}' -- {} \;
Thanks!
OK, I figured out how to rename and keep the extension. But I have a
problem: it's dumping all of the files into the root directory
the script is run from. How do I keep them in their respective
folders? The command I'm using is:
find ./ -name '*.*' -type f -exec bash -c 'mv "$1" $RANDOM.${1##*.}' -- {} \;
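Change your command to use -execdir, which runs the command from the directory containing each matched file (GNU find refuses -execdir when PATH contains the current directory, hence the explicit PATH):
PATH=/bin:/usr/bin find . -name '*.*' -type f -execdir bash -c 'mv "$1" "$RANDOM.${1##*.}"' -- {} \;
Or alternatively, using UUIDs instead of random numbers:
PATH=/bin:/usr/bin find . -name '*.*' -type f -execdir bash -c 'mv "$1" "$(uuidgen).${1##*.}"' -- {} \;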
Here's what I came up with:
i=1
random="whatever"
find . -name "*.*" -type f | while IFS= read -r f
do
newbase=${f/*./$random$i.} # added counter to filename
cp "$f" /Path/Name/"$newbase"
((i++))
done
I had to add a counter (i) to the name; otherwise files with the same extension would overwrite each other when copied.
In your new folder, your files should look like this:
whatever1.txt
whatever2.txt
etc etc
I hope this is what you were looking for.
Here is the command that worked for me:
find . -name '*.pdf' -type f -exec bash -c 'echo "$1" && cp "$1" "./$RANDOM.${1##*.}"' -- {} \;
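None of the commands above handles point 1 of the question, keeping the paths intact under a new root. A minimal bash sketch of that part follows; `copy_with_random_names` is a hypothetical helper, it keeps directory names unchanged (only files get random names), and it assumes the destination lies outside the source tree.

```shell
#!/usr/bin/env bash
# copy_with_random_names SRC DEST
# Hypothetical helper: copy every file under SRC into DEST, keeping the
# relative directory layout and the extension, but replacing the base name
# with a random number.
copy_with_random_names() {
    local src=${1%/} dest=${2%/} path rel dir ext target
    while IFS= read -r -d '' path; do
        rel=${path#"$src"/}                 # path relative to SRC
        dir=$(dirname -- "$rel")
        ext=${rel##*.}
        mkdir -p -- "$dest/$dir"
        # Draw again if that name is already taken in the target directory.
        target=$dest/$dir/$RANDOM.$ext
        while [ -e "$target" ]; do
            target=$dest/$dir/$RANDOM.$ext
        done
        cp -- "$path" "$target"
    done < <(find "$src" -type f -name '*.*' -print0)
}

# Usage with the folder names from the question:
# copy_with_random_names Folder1 Copy1
```

Reading NUL-delimited names from `find -print0` keeps file names with spaces or newlines intact, and the retry loop avoids the overwrite problem that $RANDOM collisions would otherwise cause.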
