I would like to get all the files from a directory which have a pattern and are not in a .ignore file.
I've tried this command:
find . -name '*.js' | grep -Fxv .ignore
but find's output is like ./directory/file.js, and the format in my .ignore is the following:
*.min.js
directory/directory2/*
directory/file_56.js
So grep does not match anything...
Does anyone have an idea/clue of how to do this?
Update
So I've found something, but it's not completely working:
find . -name '*.js' -type f $(printf "! -name %s " $(cat .ignore | sed 's/\//\\/g')) | # keeps the path
sed 's/^\.\///' | # deleting './'
grep -Fxvf .ignore
It works (the files are excluded) for *.min.js and directory/file_56.js, but not for directory/directory2/*
It looks like you're looking for a subset of the functionality supported by Git's .gitignore file:
args=()
while read -r pattern; do
[[ ${#args[@]} -gt 0 ]] && args+=( '-o' )
[[ $pattern == */* ]] && args+=( -path "./$pattern" ) || args+=( -name "$pattern" )
done < .ignore
find . -name '*.js' ! \( "${args[@]}" \)
The exclusion tests for find are built up in a Bash array first, which allows applying per-pattern logic. Note how either a -path or a -name test is used, depending on whether the pattern at hand from .ignore contains at least one /:
Patterns for -path tests are prefixed with ./ to match the paths output by find.
Patterns for -name are left as-is; a pattern such as *.min.js will therefore match anywhere in the subtree.
With your sample .ignore file, the above results in the following find command:
find . -name '*.js' ! \( \
-name '*.min.js' -o -path './directory/directory2/*' -o -path './directory/file_56.js' \
\)
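To sanity-check the whole approach, here is a self-contained run against a scratch directory (the file names are invented for the demo; requires bash):

```shell
# Demo of the array-building approach against a throwaway tree
tmp=$(mktemp -d)
cd "$tmp"
mkdir -p directory/directory2
touch keep.js app.min.js directory/directory2/skip.js directory/file_56.js
printf '%s\n' '*.min.js' 'directory/directory2/*' 'directory/file_56.js' > .ignore

# Build the find exclusion tests from .ignore, one pattern per line
args=()
while read -r pattern; do
  [[ ${#args[@]} -gt 0 ]] && args+=( '-o' )
  [[ $pattern == */* ]] && args+=( -path "./$pattern" ) || args+=( -name "$pattern" )
done < .ignore

result=$(find . -name '*.js' ! \( "${args[@]}" \))
echo "$result"    # -> ./keep.js
```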
For example I have following files in a directory
FILE_1_2021-01-01.csum
FILE_1_2021-01-01.csv
FILE_1_2021-01-02.csum
FILE_1_2021-01-02.csv
FILE_1_2021-01-03.csum
FILE_1_2021-01-03.csv
I want to keep FILE_1_2021-01-03.csum and FILE_1_2021-01-03.csv in current directory but zip and move rest of the older files to another directory.
So far I have tried this, but I'm stuck on how to correctly identify the pairs:
file_count=0
PATH=/path/to/dir
ARCH=/path/to/dir
for file in ${PATH}/*
do
if [[ ! -d $file ]]
then
file_count=$(($file_count+1))
fi
done
echo "file count $file_count"
if [ $file_count -gt 2 ]
then
echo "moving old files to $ARCH"
# How to do it?
fi
Since the timestamps are in a format that naturally sorts out with earliest first, newest last, an easy approach is to just use filename expansion to store the .csv and .csum filenames in a pair of arrays, and then do something with all but the last element of both:
declare -a csv=( FILE_*.csv ) csum=( FILE_*.csum )
mv "${csv[#]:0:${#csv[#]}-1}" "${csum[#]:0:${#csum[#]}-1}" new_directory/
(Or tar them up first, or whatever.)
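To see what the slice expansion is doing, a minimal sketch with made-up names:

```shell
# "${arr[@]:0:${#arr[@]}-1}" expands to every element except the last
files=( a.csv b.csv c.csv )
old=( "${files[@]:0:${#files[@]}-1}" )
echo "${old[@]}"    # -> a.csv b.csv
```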
First off ...
it's bad practice to use all uppercase variables as these can clash with OS-level variables (also all uppercase); case in point ...
PATH is an OS-level variable for keeping track of where to locate binaries, but in this case ...
OP has just wiped out the OS-level variable with the assignment PATH=/path/to/dir
As for the question, some assumptions:
each *.csv file has a matching *.csum file
the 2 files to 'keep' can be determined from the first 2 lines of output resulting from a reverse sort of the filenames
not sure what OP means by 'zip and move' (eg, zip? gzip? tar all old files into a single .tar and then (g)zip?) so for the sake of this answer I'm going to just gzip each file and move to a new directory (OP can adjust the code to fit the actual requirement)
Setup:
srcdir='/tmp/myfiles'
arcdir='/tmp/archive'
rm -rf "${srcdir}" "${arcdir}"
mkdir -p "${srcdir}" "${arcdir}"
cd "${srcdir}"
touch FILE_1_2021-01-0{1..3}.{csum,csv} abc XYZ
ls -1
FILE_1_2021-01-01.csum
FILE_1_2021-01-01.csv
FILE_1_2021-01-02.csum
FILE_1_2021-01-02.csv
FILE_1_2021-01-03.csum
FILE_1_2021-01-03.csv
XYZ
abc
Get list of *.csum/*.csv files and sort in reverse order:
$ find "${srcdir}" -maxdepth 1 -type f \( -name '*.csum' -o -name '*.csv' \) | sort -r
/tmp/myfiles/FILE_1_2021-01-03.csv
/tmp/myfiles/FILE_1_2021-01-03.csum
/tmp/myfiles/FILE_1_2021-01-02.csv
/tmp/myfiles/FILE_1_2021-01-02.csum
/tmp/myfiles/FILE_1_2021-01-01.csv
/tmp/myfiles/FILE_1_2021-01-01.csum
Eliminate first 2 files (ie, generate list of files to zip/move):
$ find "${srcdir}" -maxdepth 1 -type f \( -name '*.csum' -o -name '*.csv' \) | sort -r | tail -n +3
/tmp/myfiles/FILE_1_2021-01-02.csv
/tmp/myfiles/FILE_1_2021-01-02.csum
/tmp/myfiles/FILE_1_2021-01-01.csv
/tmp/myfiles/FILE_1_2021-01-01.csum
Process our list of files:
while read -r fname
do
gzip "${fname}"
mv "${fname}".gz "${arcdir}"
done < <(find "${srcdir}" -maxdepth 1 -type f \( -name '*.csum' -o -name '*.csv' \) | sort -r | tail -n +3)
NOTE: the find|sort|tail results could be piped to xargs (or parallel) to perform the gzip/mv operations, but without more details on what OP means by 'zip and move' I've opted for a simpler, albeit less performant, while loop
Results:
$ ls -1 "${srcdir}"
FILE_1_2021-01-03.csum
FILE_1_2021-01-03.csv
XYZ
abc
$ ls -1 "${arcdir}"
FILE_1_2021-01-01.csum.gz
FILE_1_2021-01-01.csv.gz
FILE_1_2021-01-02.csum.gz
FILE_1_2021-01-02.csv.gz
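For completeness, here is a sketch of what the xargs variant could look like; it assumes GNU find, sort, tail and xargs (the -z/-0 flags are GNU extensions) and re-creates a scratch copy of the setup so it is self-contained:

```shell
# Same gzip-and-move, driven by xargs over a NUL-delimited stream
srcdir=$(mktemp -d)
arcdir=$(mktemp -d)
touch "${srcdir}"/FILE_1_2021-01-0{1..3}.{csum,csv}

find "${srcdir}" -maxdepth 1 -type f \( -name '*.csum' -o -name '*.csv' \) -print0 |
  sort -rz |          # newest first, NUL-terminated records
  tail -zn +3 |       # drop the 2 files we keep
  xargs -0r sh -c 'for f; do gzip "$f" && mv "$f.gz" "$0"; done' "${arcdir}"

ls "${arcdir}"        # the four older files, now gzipped
```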
Your algorithm for counting files can be simplified using find. You seem to be looking for non-directories; the option -not -type d does exactly that. By default find recurses into subdirectories, so you need to pass -maxdepth 1 to limit the search to a depth of 1.
find "$PATH" -maxdepth 1 -not -type d
If you want to get the number of files, you may pipe the command to wc:
file_count=$(find "$PATH" -maxdepth 1 -not -type d | wc -l)
Now there are two ways of detecting which file is the more recent: by looking at the filename, or by looking at the date when the files were last created/modified/etc. Since your naming convention looks pretty solid, I would recommend the first option. Sorting by creation/modification date is more complex and there are numerous cases where this information is not reliable, such as copying files, zipping/unzipping them, touching files, etc.
You can sort with sort and then grab the last element with tail -1:
find "$PATH" -maxdepth 1 -not -type d | sort | tail -1
You can do the same thing by sorting in reverse order using sort -r and then grab the first element with head -1. From a functional point of view, it is strictly equivalent, but it is slightly faster because it stops at the first result instead of parsing all results. Plus it will be more relevant later on.
find "$PATH" -maxdepth 1 -not -type d | sort -r | head -1
Once you have the filename of the most recent file, you can extract the base name in order to create a pattern out of it.
most_recent_file=$(find "$PATH" -maxdepth 1 -not -type d | sort -r | head -1)
most_recent_file=${most_recent_file%.*}
most_recent_file=${most_recent_file##*/}
Let’s explain this:
first, we grab the filename into a variable called most_recent_file
then we remove the extension using ${most_recent_file%.*} ; the % symbol will cut at the end, and .* will cut everything after the last dot, including the dot itself
finally, we remove the folder using ${most_recent_file##*/} ; the ## symbol will cut at the beginning with a greedy catch, and */ will cut everything before the last slash, including the slash itself
The difference between # and ## is how greedy the pattern is. If your file is /path/to/file.csv then ${most_recent_file#*/} (single #) will cut the first slash only, i.e. it will output path/to/file.csv, while ${most_recent_file##*/} (double #) will cut all paths, i.e. it will output file.csv.
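A tiny self-contained check of these expansions:

```shell
# Suffix/prefix removal on a sample path
f=/path/to/file.csv
echo "${f%.*}"     # -> /path/to/file     (shortest ".suffix" cut from the end)
echo "${f#*/}"     # -> path/to/file.csv  (shortest "*/" cut from the front)
echo "${f##*/}"    # -> file.csv          (longest "*/" cut from the front)
```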
Once you have this string, you can make a pattern to include/exclude similar files using find.
find "$PATH" -maxdepth 1 -not -type d -name "$most_recent_file.*"
find "$PATH" -maxdepth 1 -not -type d -not -name "$most_recent_file.*"
The first line will list all files which match your pattern, and the second line will list all files which do not match the pattern.
Since you want to move your 'old' files to a folder, you may execute a mv command for the last list.
find "$PATH" -maxdepth 1 -not -type d -not -name "$most_recent_file.*" -exec mv {} "$ARCH" \;
If your version of find supports it, you may use + in order to batch the move operations.
find "$PATH" -maxdepth 1 -not -type d -not -name "$most_recent_file.*" -exec mv -t "$ARCH" {} +
Otherwise you can pipe to xargs.
find "$PATH" -maxdepth 1 -not -type d -not -name "$most_recent_file.*" | xargs mv -t "$ARCH"
If put altogether:
src=/path/to/dir     # lowercase names: assigning to PATH would break command lookup
arch=/path/to/dir
file_count=$(find "$src" -maxdepth 1 -not -type d | wc -l)
echo "file count $file_count"
if [ "$file_count" -gt 2 ]
then
echo "moving old files to $arch"
most_recent_file=$(find "$src" -maxdepth 1 -not -type d | sort -r | head -1)
most_recent_file=${most_recent_file%.*}
most_recent_file=${most_recent_file##*/}
find "$src" -maxdepth 1 -not -type d -not -name "$most_recent_file.*" | xargs mv -t "$arch"
fi
As a last note, if your file names contain newlines, this will not work. If you want to handle that case, you need a few modifications. Counting files would be done like this:
file_count=$(find "$PATH" -maxdepth 1 -not -type d -printf '.' | wc -c)
Getting the most recent file:
most_recent_file=$(find "$PATH" -maxdepth 1 -not -type d -print0 | sort -rz | head -zn 1 | tr -d '\0')
Moving files with xargs:
find "$PATH" -maxdepth 1 -not -type d -not -name "$most_recent_file.*" -print0 | xargs -0 mv -t "$ARCH"
(There is no such problem when moving files using -exec.)
I won’t go into details, but just know that the issue is known and these are the kind of solutions you can apply if need be.
I am trying to write a bash script that can give a list of identical files just by name between 2 or more directory locations
diff -srq Ear2.ear/ Ear1.ear/ | grep identical
but it seems this is comparing contents as well.
I already have a file that has a list of all the target directories I need to compare. However, I need to exclude certain subdirectories while comparing.
An array cross-section would be an interesting way to solve this.
$ mkdir tmp1 tmp2
$ touch tmp1/foo tmp1/bar tmp1/baz
$ touch tmp2/foo tmp2/bar tmp2/slurm
$ cd tmp1; a=( * ); cd -
$ cd tmp2; declare -A b; for f in *; do b[$f]=1; done; cd -
$ for x in "${a[@]}"; do [[ "${b[$x]}" ]] && echo "$x"; done
bar
foo
However, you mentioned that you "need to exclude certain subdirectories while comparing", and your diff includes -r, so you're looking to be selectively recursive.
To achieve this, I'd suggest using bash's globstar and then removing the parts you don't want. For example:
$ shopt -s globstar
$ a=( **/* )
$ for x in "${!a[@]}"; do [[ "${a[$x]}" = tmp3/* ]] && unset "a[$x]"; done
Note that globstar requires bash version 4.
This takes advantage of the find utility's -prune option to exclude directories:
comm -1 -2 <(cd "$1"; find . -name "*" -path "./folder1" -prune -o -print | sort) <(cd "$2"; find . -name "*" -path "./folder1" -prune -o -print | sort)
cd so that we don't include the parent directory in the output of find.
Run find with appropriate parameters to print all files, excluding given subfolders.
Pipe that into sort, so that we can
Use the comm utility via process substitution to show only the lines (aka file names) in common.
Basic Example:
I have the folder structure:
diffdir1/
file1.txt
file2.txt
uniqueTo1.txt
folder1/
file1.txt
folder2/
file1.txt
folderUniqueTo1/
file1.txt
diffdir2/
file1.txt
file2.txt
uniqueTo2.txt
folder1/
file1.txt
folder2/
file1.txt
(Contents do differ between the various file1.txts, although that's not checked here.) Using the above script, I get:
$ ./script.sh diffdir1 diffdir2
.
./file1.txt
./file2.txt
aka only the two files with the same names.
As a sanity check, if I remove the -path "./folder1" -prune -o -print part of the command, this should no longer exclude things under folder1:
$ ./script2.sh diffdir1 diffdir2
.
./file1.txt
./file2.txt
./folder1
./folder1/file1.txt
Using a file for the list of directories and such would just be a matter of modifying what goes into the different parameters of the find command.
Example: Exclude multiple subdirectories
This command will exclude the folders ./abc/xyz/obj64, ./abc/video, and ./sim:
comm -1 -2 <(cd "$1"; find . -name "*" \( -path "./abc/xyz/obj64" -o -path "./abc/video" -o -path "./sim" \) -prune -o -print | sort) <(cd "$2"; find . -name "*" \( -path "./abc/xyz/obj64" -o -path "./abc/video" -o -path "./sim" \) -prune -o -print | sort)
Note that the list of paths must be placed inside \( \) parentheses. -o means "or", so it now checks whether any of the paths match for pruning.
Example: Include only files matching a particular pattern
Expanding off of the previous example, now let's only return files matching a pattern. In this example I'll search for only files ending in .xml:
comm -1 -2 <(cd "$1"; find . \( -path "./abc/xyz/obj64" -o -path "./abc/video" -o -path "./sim" \) -prune -o -name "*.xml" -print | sort) <(cd "$2"; find . \( -path "./abc/xyz/obj64" -o -path "./abc/video" -o -path "./sim" \) -prune -o -name "*.xml" -print | sort)
The difference here is that the -name argument was moved to after the pruning. This doesn't make a difference if you are searching for all files ("*"), but it does matter when you have a pattern. So it's a good idea to put -name at the end anyway, in case you change it later.
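A quick scratch-tree demo (names invented) showing that -name placed after the prune branch only filters the non-pruned side:

```shell
# prunedir is skipped entirely; -name then filters what is left
tmp=$(mktemp -d)
cd "$tmp"
mkdir keepdir prunedir
touch keepdir/a.xml prunedir/b.xml top.xml top.txt
found=$(find . -path ./prunedir -prune -o -name '*.xml' -print | sort)
echo "$found"    # -> ./keepdir/a.xml and ./top.xml
```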
I think this is probably a pretty n00ber question but I just gotsta ask it.
When I run:
$ find . -maxdepth 1 -type f \( -name "*.mp3" -o -name "*.ogg" \)
and get:
./01.Adagio - Allegro Vivace.mp3
./03.Allegro Vivace.mp3
./02.Adagio.mp3
./04.Allegro Ma Non Troppo.mp3
why does find prepend a ./ to the file name? I am using this in a script:
fList=()
while read -r -d $'\0'; do
fList+=("$REPLY")
done < <(find . -type f \( -name "*.mp3" -o -name "*.ogg" \) -print0)
fConv "$fList" "$dBaseN"
and I have to use a bit of a hacky-sed-fix at the beginning of a for loop in function 'fConv', accessing the array elements, to remove the leading ./. Is there a find option that would simply omit the leading ./ in the first place?
The ./ at the beginning of the file is the path. The "." means current directory.
You can use "sed" to remove it.
find . -maxdepth 1 -type f \( -name "*.mp3" -o -name "*.ogg" \) | sed 's|^\./||'
I do not recommend doing this though, since find can search through multiple directories, how would you know if the file found is located in the current directory?
If you ask it to search under /tmp, the results will be of the form /tmp/file:
$ find /tmp
/tmp
/tmp/.X0-lock
/tmp/.com.google.Chrome.cUkZfY
If you ask it to search under . (like you do), the results will be of the form ./file:
$ find .
.
./Documents
./.xmodmap
If you ask it to search through foo.mp3 and bar.ogg, the results will be of the form foo.mp3 and bar.ogg:
$ find *.mp3 *.ogg
click.ogg
slide.ogg
splat.ogg
However, this is just the default. With GNU and other modern finds, you can modify how to print the result. To always print just the last element:
find /foo -printf '%f\0'
If the result is /foo/bar/baz.mp3, this will result in baz.mp3.
To print the path relative to the argument under which it's found, you can use:
find /foo -printf '%P\0'
For /foo/bar/baz.mp3, this will show bar/baz.mp3.
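A small self-contained check of %P (GNU find assumed):

```shell
# %P prints each path relative to the starting point given to find
tmp=$(mktemp -d)
mkdir -p "$tmp/bar"
touch "$tmp/bar/baz.mp3"
rel=$(find "$tmp" -name '*.mp3' -printf '%P\n')
echo "$rel"    # -> bar/baz.mp3
```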
However, you shouldn't be using find at all. This is a job for plain globs, as suggested by R Sahu.
shopt -s nullglob
files=(*.mp3 *.ogg)
echo "Converting ${files[*]}:"
fConv "${files[@]}"
find . -maxdepth 1 -type f \( -name "*.mp3" -o -name "*.ogg" \) -exec basename "{}" \;
Having said that, I think you can use a simpler approach:
for file in *.mp3 *.ogg
do
if [[ -f $file ]]; then
# Use the file
fi
done
If your -maxdepth is 1, you can simply use ls:
$ ls *.mp3 *.ogg
Of course, that will pick up any directory with a *.mp3 or *.ogg suffix, but you probably don't have such a directory anyway.
Another is to munge your results:
$ find . -maxdepth 1 -type f \( -name "*.mp3" -o -name "*.ogg" \) | sed 's#^\./##'
This will remove all ./ prefixes, but not touch other file names. Note the ^ anchor in the substitution command.
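The difference the anchor makes is easy to demonstrate on a contrived name containing an embedded ./:

```shell
# With the ^ anchor, only a leading "./" is stripped:
anchored=$(printf '%s\n' './01.mp3' 'x./y.mp3' | sed 's#^\./##')
echo "$anchored"      # -> 01.mp3 and x./y.mp3 (second name untouched)

# Without the anchor, the first "./" anywhere in the name is deleted:
unanchored=$(printf '%s\n' './01.mp3' 'x./y.mp3' | sed 's#\./##')
echo "$unanchored"    # -> 01.mp3 and xy.mp3 (second name mangled)
```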
Can't seem to crack this one.
I have a bash script to search a folder and exclude certain file types.
list=`find . -type f ! \( -name "*data.php" -o -name "*.log" -o -iname "._*" -o -path "*patch" \)`
I want to exclude files which start with dot-underscore ._ but the above just refuses to work.
Here's some more of the script, but I am still getting copies of files which start with ._
O/S is CentOS 5.3
list=`find . -type f ! \( -name "*data.php" -o -name "*.log" -o -iname "._*" -o -path "*patch" \)`
for a in $list; do
if [ ! -f "$OLDFOL$a" ]; then
cp --preserve=all --parents $a $UPGFOL
continue
fi
diff $a "$OLDFOL$a" > /dev/null
if [[ "$?" == "1" ]]; then
# exists & different so copy
cp --preserve=all --parents $a $UPGFOL
fi
done
First -- don't do it that way.
files="`find ...`"
splits names on whitespace when $files is later expanded unquoted, meaning that Some File becomes two words, Some and File. Even splitting on newlines is unsafe, as valid UNIX filenames can contain $'\n' (any character other than / and NUL is valid in a UNIX filename). Instead...
getfiles() {
find . -type f '!' '(' \
-name '*data.php' -o \
-name '*.log' -o \
-iname "._*" -o \
-path "*patch" ')' \
-print0
}
while IFS= read -r -d '' file; do
if [[ ! -e $orig_dir/$file ]] ; then
cp --preserve=all --parents "$file" "$dest_dir"
continue
fi
if ! cmp -s "$file" "$orig_dir/$file" ; then
cp --preserve=all --parents "$file" "$dest_dir"
fi
done < <(getfiles)
The above does a number of things right:
It is safe against filenames containing spaces or newlines.
It uses cmp -s, not diff. cmp exits as soon as the first difference is found, rather than computing the full delta between the two files, and is thus far faster.
Read BashFAQ #1, UsingFind, and BashPitfalls #1 to understand some of the differences between this and the original.
Also -- I've validated that this correctly excludes filenames which start with ._ -- but the original version did too. Perhaps what you really want is to exclude filenames matching *._* rather than ._*?
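As a quick check of cmp's exit status (the silent flag is -s/--silent in POSIX and GNU cmp):

```shell
# cmp exit status: 0 = identical, 1 = different (2 = trouble)
tmp=$(mktemp -d)
printf 'same\n' > "$tmp/a"
printf 'same\n' > "$tmp/b"
printf 'diff\n' > "$tmp/c"
cmp -s "$tmp/a" "$tmp/b"; same=$?
cmp -s "$tmp/a" "$tmp/c"; different=$?
echo "$same $different"    # -> 0 1
```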
I need to search all cpp/h files in svn working copy for "foo", excluding svn's special folders completely. What is the exact command for GNU grep?
I use ack for this purpose, it's like grep but automatically knows how to exclude source control directories (among other useful things).
grep -ir --exclude-dir=.svn foo *
In the working directory will do.
Omit the 'i' if you want the search to be case sensitive.
If you want to check only .cpp and .h files use
grep -ir --include='*.cpp' --include='*.h' --exclude-dir=.svn foo *
Going a little off-topic:
If you have a working copy with a lot of untracked files (i.e. not version-controlled) and you only want to search source controlled files, you can do
svn ls -R | xargs -d '\n' grep <string-to-search-for>
This is a RTFM. I typed 'man grep' and '/exclude' and got:
--exclude=GLOB
Skip files whose base name matches GLOB (using wildcard
matching). A file-name glob can use *, ?, and [...] as
wildcards, and \ to quote a wildcard or backslash character
literally.
--exclude-from=FILE
Skip files whose base name matches any of the file-name globs
read from FILE (using wildcard matching as described under
--exclude).
--exclude-dir=DIR
Exclude directories matching the pattern DIR from recursive
searches.
I wrote this script which I've added to my .bashrc. It automatically excludes SVN directories from grep, find and locate.
I use these bash aliases for grepping for content and files in svn trees... I find it faster and more pleasant to search from the commandline (and use vim for coding) rather than a GUI-based IDE:
s () {
local PATTERN=$1
local COLOR=$2
shift; shift;
local MOREFLAGS=$*
if ! test -n "$COLOR" ; then
# is stdout connected to terminal?
if test -t 1; then
COLOR=always
else
COLOR=none
fi
fi
find -L . \
-not \( -name .svn -a -prune \) \
-not \( -name templates_c -a -prune \) \
-not \( -name log -a -prune \) \
-not \( -name logs -a -prune \) \
-type f \
-not -name \*.swp \
-not -name \*.swo \
-not -name \*.obj \
-not -name \*.map \
-not -name access.log \
-not -name \*.gif \
-not -name \*.jpg \
-not -name \*.png \
-not -name \*.sql \
-not -name \*.js \
-exec grep -iIHn -E --color=${COLOR} ${MOREFLAGS} -e "${PATTERN}" \{\} \;
}
# s foo | less
sl () {
local PATTERN=$*
s "$PATTERN" always | less
}
# like s but only lists the files that match
smatch () {
local PATTERN=$1
s "$PATTERN" always -l
}
# recursive search (filenames) - find file
f () {
find -L . -not \( -name .svn -a -prune \) \( -type f -or -type d \) -name "$1"
}