Using find command in bash script - bash

I have just started to use bash scripting and I need to use the find command with more than one file type.
list=$(find /home/user/Desktop -name '*.pdf')
This code works for the pdf type, but I want to search for more than one file type, like .txt or .bmp, together. Any ideas?

Welcome to bash. It's an old, dark and mysterious thing, capable of great magic. :-)
The option you're asking about is for the find command though, not for bash. From your command line, you can man find to see the options.
The one you're looking for is -o for "or":
list="$(find /home/user/Desktop -name '*.bmp' -o -name '*.txt')"
That said ... Don't do this. Storing a list like this may work for simple filenames, but as soon as you have to deal with special characters, like spaces and newlines, all bets are off. See ParsingLs for details.
$ touch 'one.txt' 'two three.txt' 'foo.bmp'
$ list="$(find . -name \*.txt -o -name \*.bmp -type f)"
$ for file in $list; do if [ ! -f "$file" ]; then echo "MISSING: $file"; fi; done
MISSING: ./two
MISSING: three.txt
Pathname expansion (globbing) provides a much better/safer way to keep track of files. Then you can also use bash arrays:
$ a=( *.txt *.bmp )
$ declare -p a
declare -a a=([0]="one.txt" [1]="two three.txt" [2]="foo.bmp")
$ for file in "${a[@]}"; do ls -l "$file"; done
-rw-r--r-- 1 ghoti staff 0 24 May 16:27 one.txt
-rw-r--r-- 1 ghoti staff 0 24 May 16:27 two three.txt
-rw-r--r-- 1 ghoti staff 0 24 May 16:27 foo.bmp
The Bash FAQ has lots of other excellent tips about programming in bash.

If you want to loop over what you "find", you should use this:
find . -type f -name '*.*' -print0 | while IFS= read -r -d '' file; do
    printf '%s\n' "$file"
done
Source: https://askubuntu.com/questions/343727/filenames-with-spaces-breaking-for-loop-find-command

You can use this:
list=$(find /home/user/Desktop -name '*.pdf' -o -name '*.txt' -o -name '*.bmp')
Besides, you might want to use -iname instead of -name to catch files with ".PDF" (upper-case) extension as well.
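For instance, a case-insensitive variant of the command above (a sketch; same path as in the question):
list=$(find /home/user/Desktop -iname '*.pdf' -o -iname '*.txt' -o -iname '*.bmp')
This matches file.PDF, file.Txt, and so on.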

Related

How to get file count and names in directory on bash

I want to get the file count, file names and folder names in a directory:
mkdir -p /tmp/test/folder1
mkdir -p /tmp/test/folder2
touch /tmp/test/file1
touch /tmp/test/file2
file_names=$(find "/tmp/test" -mindepth 1 -maxdepth 1 -type f -print0 | xargs -0 -I {} basename "{}")
echo $file_names
here is the output:
file2 file1
For folders:
folder_names=$(find "/tmp/test" -mindepth 1 -maxdepth 1 -type d -print0 | xargs -0 -I {} basename "{}")
echo $folder_names
here is the output:
folder2 folder1
For count:
file_count=0 && $(find "/tmp/test" -mindepth 1 -maxdepth 1 -type f -print0 | let "file_count=file_count+1")
echo $file_count
folder_count=0 && $(find "/tmp/test" -mindepth 1 -maxdepth 1 -type d -print0 | let "folder_count=folder_count+1")
echo $folder_count
The file_count and folder_count do not work.
Question 1:
How to get the correct file_count and folder_count?
Question 2:
Is it possible for getting names into an array and check the count from array size?
The answer to the second question is really the answer to the first, too.
mapfile -d '' files < <( find /tmp/test -type f \
    -mindepth 1 -maxdepth 1 \
    -printf '%f\0')
echo "${#files[@]} files"
printf '%s\n' "${files[@]}"
The use of double quotes and @ in the array expansion is essential for printing file names with whitespace correctly. The use of a null byte terminator between file names ensures that even newlines in file names are disambiguated.
Notice also the use of -printf with a specific format string to avoid having to run basename separately. However, the -printf option and its various format strings, as well as the -print0 option you used, are a GNU find extension, and thus not portable. (Linux typically ships with GNU tools; on other platforms, they are obviously easy to install, but typically not installed out of the box.)
If you have an older version of Bash which doesn't support mapfile -d (the -d option requires Bash 4.4), try an explicit loop:
files=()
while IFS= read -r -d '' file; do
    files+=("$file")
done < <(find ...)
If you don't have GNU find, a common workaround is to print a fixed string for each found file, and then the line or character count reliably reflects the number of found files.
find /tmp/test -type f \
    -mindepth 1 -maxdepth 1 \
    -exec printf . \; |
wc -c
Though then, how do you collect the file names? If (as in your case) you don't require recursion into subdirectories, simply loop over all items in the directory.
In which case, again, the number of items in the collected array will also tell you how many there are.
files=()
dirs=()
for item in /tmp/test/*; do
    if [[ -f "$item" ]]; then
        files+=("$item")
    elif [[ -d "$item" ]]; then
        dirs+=("$item")
    fi
done
echo "${#dirs[@]} directories"
printf '- %s\n' "${dirs[@]}"
echo "${#files[@]} files"
printf '%s\n' "${files[@]}"
For a further discussion, see also https://mywiki.wooledge.org/BashFAQ/020
Needlessly collecting items into an array so you can loop over it once is a common beginner antipattern. If you just want to process each item in isolation and then forget it, there is no need to waste memory on remembering all the other items - just loop over them directly.
As an aside, each command in a pipeline runs in a subshell with its own copy of the shell's variables; thus in your attempt, the let at the end of the pipe would increment its own file_count from 0 to 1 and then throw it away, leaving the variable in your shell untouched (and piping into let does not do anything useful anyway, since let ignores its standard input).
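To make that concrete, here is a minimal sketch of the pitfall and the usual process-substitution fix:
count=0
printf 'a\nb\nc\n' | while read -r _; do count=$((count+1)); done
echo "$count"   # prints 0: the loop ran in a subshell
count=0
while read -r _; do count=$((count+1)); done < <(printf 'a\nb\nc\n')
echo "$count"   # prints 3: the loop ran in the current shell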

count number of lines for each file found

I think I don't understand very well how the find command in Unix works; I have this code for counting the number of files in each folder, but I want to count the number of lines of each file found and save the total in a variable.
find "$d_path" -type d -maxdepth 1 -name R -print0 | while IFS= read -r -d '' file; do
nb_fichier_R="$(find "$file" -type f -maxdepth 1 -iname '*.R' | wc -l)"
nb_ligne_fichier_R= "$(find "$file" -type f -maxdepth 1 -iname '*.R' -exec wc -l {} +)"
echo "$nb_ligne_fichier_R"
done
output:
43 .//system d exploi/r-repos/gbm/R/basehaz.gbm.R
90 .//system d exploi/r-repos/gbm/R/calibrate.plot.R
45 .//system d exploi/r-repos/gbm/R/checks.R
178 total: File name too long
Can I just save the total number of lines in my variable? Here in my example, that means saving just 178, and doing that for each folder found under "$d_path". Many thanks.
Maybe I'm missing something, but wouldn't this do what you want?
wc -l R/*.[Rr]
Solution:
find "$d_path" -type d -maxdepth 1 -name R | while IFS= read -r file; do
nb_fichier_R="$(find "$file" -type f -maxdepth 1 -iname '*.R' | wc -l)"
echo "$nb_fichier_R" #here is fine
find "$file" -type f -maxdepth 1 -iname '*.R' | while IFS= read -r fille; do
wc -l $fille #here is the problem nothing shown
done
done
Explanation:
With -print0, the first find produced no newlines, so you had to tell read, via -d '', not to look for one. Your subsequent finds output newlines, so you can use read without a delimiter override. I removed -print0 and -d '' from all calls so the code is consistent and idiomatic. Newlines are good in the unix world.
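In short, the delimiter you hand to read must match the terminator find emits; a quick sketch of the two valid pairings:
# NUL-terminated names need read -d '':
find . -iname '*.R' -print0 | while IFS= read -r -d '' f; do printf '%s\n' "$f"; done
# newline-terminated names work with plain read:
find . -iname '*.R' | while IFS= read -r f; do printf '%s\n' "$f"; done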
For the command:
find "$d_path" -type d -maxdepth 1 -name R -print0
there can be at most one directory that matches ("$d_path/R"). For that one directory, you want to print:
The number of files matching *.R
For each such file, the number of lines in it.
Allowing for spaces in $d_path and in the file names is most easily handled, I find, with an auxiliary shell script. The auxiliary script processes the directories named on its command line. You then invoke that script from the main find command.
counter.sh
shopt -s nullglob
for dir in "$@"
do
    count=0
    for file in "$dir"/*.R; do ((count++)); done
    echo "$count"
    wc -l "$dir"/*.R </dev/null
done
The shopt -s nullglob option means that if there are no .R files (with names that don't start with a .), then the glob expands to nothing rather than expanding to a string containing *.R at the end. It is convenient in this script. The I/O redirection on wc ensures that if there are no files, it reads from /dev/null, reporting 0 lines (rather than sitting around waiting for you to type something).
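A tiny sketch of what nullglob changes, run in a directory with no .R files:
shopt -u nullglob
for f in ./*.R; do echo "$f"; done   # prints the literal ./*.R
shopt -s nullglob
for f in ./*.R; do echo "$f"; done   # prints nothing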
On the other hand, the find command will find names that start with a . as well as those that do not, whereas the globbing notation will not. The easiest way around that is to use two globs:
for file in "$dir"/*.R "$dir"/.*.R; do ((count++)); done
or use find (rather carefully):
find . -type f -name '*.R' -exec sh -c 'echo $#' arg0 {} +
Using counter.sh
find "$d_path" -type d -maxdepth 1 -name R -exec sh ./counter.sh {} +
This script allows for the possibility of more than one sub-directory (if you remove -maxdepth 1) and invokes counter.sh with all the directories to be examined as arguments. The script itself carefully handles file names so that whether there are spaces, tabs or newlines (or any other character) in the names, it will work correctly. The sh ./counter.sh part of the find command assumes that the counter.sh script is in the current directory. If it can be found on $PATH, then you can drop the sh and the ./.
Discussion
The technique of having find execute a command with the list of file name arguments is powerful. It avoids issues with -print0 and using xargs -0, but gives you the same reliable handling of arbitrary file names, including names with spaces, tabs and newlines. If there isn't already a command that does what you need (but you could write one as a shell script), then do so and use it. If you might need to do the job more than once, you can keep the script. If you're sure you won't, you can delete it after you're done with it. It is generally much easier to handle files with awkward names like this than it is to fiddle with $IFS.
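As a minimal illustration of that technique: this counts lines in every .R file, batching as many names per wc invocation as the system allows, with no parsing of the name stream at all:
find . -type f -name '*.R' -exec wc -l {} +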
Consider this solution:
# If `"$dir"/*.R` doesn't match anything, yield nothing instead of giving the pattern.
shopt -s nullglob
# Allows matching both `*.r` and `*.R` in one expression. Using them separately would
# give double results.
shopt -s nocaseglob
while IFS= read -ru 4 -d '' dir; do
    files=("$dir"/*.R)
    echo "${#files[@]}"
    for file in "${files[@]}"; do
        wc -l "$file"
    done
# Use process substitution to prevent going to a subshell. This may not be
# necessary for now but it could be useful for future modifications.
# Let's also use a custom fd to keep troubles isolated.
# It works with `-u 4`.
done 4< <(exec find "$d_path" -type d -maxdepth 1 -name R -print0)
Another form is to use readarray, which collects all found directories at once. The only caveat is that it can only read normal newline-terminated paths.
shopt -s nullglob
shopt -s nocaseglob
readarray -t dirs < <(exec find "$d_path" -type d -maxdepth 1 -name R)
for dir in "${dirs[@]}"; do
    files=("$dir"/*.R)
    echo "${#files[@]}"
    for file in "${files[@]}"; do
        wc -l "$file"
    done
done
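If your Bash is 4.4 or newer, readarray also accepts -d, which lifts the newline caveat; a sketch:
# Assumes Bash >= 4.4: read NUL-delimited paths safely.
readarray -d '' -t dirs < <(exec find "$d_path" -maxdepth 1 -type d -name R -print0)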

How to copy the latest file modified with in a given date using unix shell commands?

I have to write a shell script that should copy the latest file into a target directory. I use the following shell command:
find . -type f -daystart -mtime -$dateoffset
It gives me the latest set of files, but I need to get the latest file from that list and copy it to a target directory.
Thanks.
I can't think of a way to do this in Bourne shell, since you need to use a tool that actually reads datestamps and sorts them, and Bourne shell doesn't do that.
But here's a solution in PHP:
<?php
$fdate = array();
foreach (glob("*") as $filename)
    $fdate[filemtime($filename)] = $filename;
krsort($fdate);
print "Newest item: " . reset($fdate) . "\n";
?>
And if you happen to be using bash instead of Bourne, here's a roundabout way of getting what you want using an associative array:
#!/usr/local/bin/bash
declare -A fdate
highest=0
for file in *; do
    timestamp=$(stat -f '%m' "$file")
    fdate[$timestamp]="$file"
    if [ "$timestamp" -gt "$highest" ]; then
        highest=$timestamp
    fi
done
printf "Newest file: %s\n" "${fdate[$highest]}"
Note that I'm using FreeBSD, so this solution will also work in OSX, but if you happen to be using Linux, you'll need to figure out how your implementation of the stat command differs from mine. (Hint: you may be able to use stat -c '%y', but man stat to be sure. Solaris, HP/UX, OSF/1, etc do not seem to include a stat binary that can be called from your shell.)
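For reference, a rough GNU stat equivalent of the loop above (a sketch; like most such one-liners it breaks if a file name contains a newline):
# GNU stat: %Y = mtime in seconds since the epoch, %n = file name
newest=$(stat -c '%Y %n' -- * | sort -n | tail -n 1 | cut -d' ' -f2-)
printf 'Newest file: %s\n' "$newest"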
Update: @ghoti's neat solution is recommended over this one. The following has proven non-robust. It is left here only because, as a partial answer, it might point the way toward a better one-line solution.
ls -1dt $(find . -type f -daystart -mtime -$dateoffset) | head -n1
To copy the file to $TARGET_DIR,
A=$(ls -1dt $(find . -type f -daystart -mtime -$dateoffset) | head -n1)
if [ -n "$A" ] cp -u "$A" "$TARGET_DIR/$(basename $A)"
find . -name "*" -type f -daystart -mtime -$dateoffset | xargs -i mv {} /where/to/put/files
or
mv `find . -name "*" -type f -daystart -mtime -$dateoffset` /where/to/put/files
If you use the ls command with the -lt option, it will list the newest file at the top, so you can easily extract the latest file name.
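A sketch of that idea (assuming $TARGET_DIR is set as in the answer above; like any ls parsing, it is fragile for unusual file names):
newest=$(ls -t | head -n 1)
cp -- "$newest" "$TARGET_DIR/"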
You can use something like this:
ls -lrt $(find . -type f -mtime +NUMBER_OF_DAYS) | awk '{print $NF}' | tail -1
This will give you the latest file till a given date.
As somebody suggested above, -daystart is only present in the GNU flavor of find, while -mtime is more widely supported.
P.S.: This again suffers from parsing problems if a file name has a space in it, but until we come up with something more creative, you can use this!

How to go to each directory and execute a command?

How do I write a bash script that goes through each directory inside a parent_directory and executes a command in each directory.
The directory structure is as follows:
parent_directory (name could be anything - doesn't follow a pattern)
    001 (directory names follow this pattern)
        0001.txt (filenames follow this pattern)
        0002.txt
        0003.txt
    002
        0001.txt
        0002.txt
        0003.txt
        0004.txt
    003
        0001.txt
The number of directories is unknown.
This answer posted by Todd helped me.
find . -maxdepth 1 -type d \( ! -name . \) -exec bash -c "cd '{}' && pwd" \;
The \( ! -name . \) avoids executing the command in current directory.
You can do the following, when your current directory is parent_directory:
for d in [0-9][0-9][0-9]
do
    ( cd "$d" && your-command-here )
done
The ( and ) create a subshell, so the current directory isn't changed in the main script.
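A quick sketch of that subshell behaviour, assuming a child directory named 001 exists:
pwd                 # /path/to/parent_directory
( cd 001 && pwd )   # /path/to/parent_directory/001
pwd                 # still /path/to/parent_directory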
You can achieve this by piping to xargs. The catch is that you need to use the -I flag, which replaces the given placeholder in your bash command with each argument that xargs passes in.
ls -d */ | xargs -I {} bash -c "cd '{}' && pwd"
You may want to replace pwd with whatever command you want to execute in each directory.
If you're using GNU find, you can try -execdir parameter, e.g.:
find . -type d -execdir realpath "{}" ';'
or (as per @gniourf_gniourf's comment):
find . -type d -execdir sh -c 'printf "%s/%s\n" "$PWD" "$0"' {} \;
Note: You can use ${0#./} instead of $0 to strip the leading ./.
or more practical example:
find . -name .git -type d -execdir git pull -v ';'
If you want to include the current directory, it's even simpler by using -exec:
find . -type d -exec sh -c 'cd -P -- "{}" && pwd -P' \;
or using xargs:
find . -type d -print0 | xargs -0 -L1 sh -c 'cd "$0" && pwd && echo Do stuff'
Or a similar example suggested by @gniourf_gniourf:
find . -type d -print0 | while IFS= read -r -d '' file; do
    # ...
done
The above examples support directories with spaces in their name.
Or by assigning into bash array:
dirs=($(find . -type d))
for dir in "${dirs[@]}"; do
    ( cd "$dir" && echo "$PWD" )
done
Change . to your specific folder name. If you don't need to run recursively, you can use dirs=(*) instead. The above find-based example doesn't support directories with spaces in the name (the glob version does).
So as @gniourf_gniourf suggested, the only proper way to put the output of find in an array without using an explicit loop will be available in Bash 4.4 with:
mapfile -t -d '' dirs < <(find . -type d -print0)
Or not a recommended way (which involves parsing of ls):
ls -d */ | awk '{print $NF}' | xargs -n1 sh -c 'cd $0 && pwd && echo Do stuff'
The above example ignores the current dir (as requested by the OP), but it'll break on names with spaces.
See also:
Bash: for each directory at SO
How to enter every directory in current path and execute script? at SE Ubuntu
If the toplevel folder is known you can just write something like this:
for dir in `ls $YOUR_TOP_LEVEL_FOLDER`;
do
    for subdir in `ls $YOUR_TOP_LEVEL_FOLDER/$dir`;
    do
        $(PLAY AS MUCH AS YOU WANT);
    done
done
In place of $(PLAY AS MUCH AS YOU WANT); you can put as much code as you want.
Note that I didn't "cd" on any directory.
Cheers,
for dir in PARENT/*
do
    test -d "$dir" || continue
    # Do something with $dir...
done
While one-liners are good for quick and dirty usage, I prefer the more verbose version below for writing scripts. This is the template I use; it takes care of many edge cases and allows you to write more complex code to execute on a folder. You can write your bash code in the function dir_command. Below, dir_command implements tagging each repository in git as an example. The rest of the script calls dir_command for each folder in the directory. An example of iterating through only a given set of folders is also included.
#!/bin/bash
# Use set -x if you want to echo each command while it gets executed
#set -x

# Save the current directory so we can restore it later
cur=$PWD
# Save command line arguments so functions can access them
args=("$@")

# Put your code in this function
# To access command line arguments use syntax ${args[0]}, ${args[1]}, etc
function dir_command {
    # This example tags and pushes the repository in the given folder
    cd "$1"
    echo "$(tput setaf 2)$1$(tput sgr 0)"
    git tag -a "${args[0]}" -m "${args[1]}"
    git push --tags
    cd ..
}

# This loop will go to each immediate child and execute dir_command
find . -maxdepth 1 -type d \( ! -name . \) | while read -r dir; do
    dir_command "$dir/"
done

# This example loop only loops through a given set of folders
declare -a dirs=("dir1" "dir2" "dir3")
for dir in "${dirs[@]}"; do
    dir_command "$dir/"
done

# Restore the folder
cd "$cur"
I don't get the point about the formatting of the files, since you only want to iterate through folders... Are you looking for something like this?
cd parent
find . -type d | while read -r d; do
    ls "$d/"
done
you can use
find .
to search all files/dirs in the current directory recursively.
Then you can pipe the output to the xargs command like so:
find . | xargs 'command here'
#!/bin/bash
# -mindepth/-maxdepth 1 means one folder deep;
# you can add a pattern instead of * to descend only into matching folders
for folder_to_go in $(find . -mindepth 1 -maxdepth 1 -type d -name "*"); do
    cd "$folder_to_go"
    echo "$folder_to_go ##########################################"
    # whatever you want to do goes here
    cd ../ # if maxdepth/mindepth = 2, cd ../../
done
# you can try adding many internal for loops with many patterns; this will sneak anywhere you want
You could run a sequence of commands in each folder on one line, like:
for d in PARENT_FOLDER/*; do (cd "$d" && tar -cvzf "../$(basename "$d").tar.gz" *.*); done
for p in [0-9][0-9][0-9]; do
    (
        cd "$p"
        for f in [0-9][0-9][0-9][0-9]*.txt; do
            ls "$f" # Your operands
        done
    )
done

Bash: recursively copy and rename files

I have a lot of files whose names end with '_100.jpg'. They are spread across nested folders/sub-folders. Now I want a trick to recursively copy and rename all of them to have a suffix of '_crop.jpg'. Unfortunately I'm not familiar with bash scripting, so I don't know the exact way to do this. I googled and tried the find command with the -exec parameter, but with no luck.
Please help me. Thanks.
find bar -iname "*_100.jpg" -printf 'mv %p %p\n' \
| sed 's/_100\.jpg$/_crop\.jpg/' \
| while read -r l; do eval "$l"; done
If you have bash 4:
shopt -s globstar
for file in **/*_100.jpg; do
    echo mv "$file" "${file/_100.jpg/_crop.jpg}"
done
(The echo makes this a dry run; remove it to actually rename.)
or using find
find . -type f -iname "*_100.jpg" | while read -r FILE
do
echo mv "${FILE}" "${FILE/_100.jpg/_crop.jpg}"
done
This uses a Perl script that you may have already on your system. It's sometimes called prename instead of rename:
find /dir/to/start -type f -iname "*_100.jpg" -exec rename 's/_100/_crop/' {} \;
You can make the regexes more robust if you need to protect filenames that have "_100" repeated or in parts of the name you don't want changed.
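For example, a more defensive sketch that anchors the match, so a "_100" earlier in the name is left alone:
find /dir/to/start -type f -iname "*_100.jpg" -exec rename 's/_100\.jpg$/_crop.jpg/' {} \;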
