find files and exclude some files and a directory - bash

I want to find in a directory all files with extension .hs but exclude all files in a sub-directory sub and some other files with names containing test.
I read and experimented with the use of find and prune but did not understand the complex logic and none of my attempts worked.
Neither the naive
find . -name "*.hs" -not -name '*sub*' -not -name "*test*"
nor
find . -name "*.hs" -not -path '/sub' -not -name "*test*"
works. I assume there should be a simple solution to this (relatively) simple issue.
A solution that seems to work is
find . -name "*.hs" -not -name "*test*" | grep -v "sub"
which is simpler than using prune, but can certainly be improved?

Your first attempt excludes all files whose name includes sub.
Your second attempt excludes all files whose path is exactly /sub.
Combine the two to exclude every file whose path contains sub:
-not -path "*sub*"
However, -prune is the better solution because it skips the directory rather than fruitlessly matching every single entry in it.
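A minimal sketch of the -prune form, run against a throwaway tree. The directory layout and file names are made up for illustration, and the variable names demo and result are not from the question.

```shell
# Build a throwaway tree (all names here are illustrative assumptions).
demo=$(mktemp -d)
mkdir -p "$demo/src" "$demo/sub"
touch "$demo/Lib.hs" "$demo/src/Main.hs" "$demo/src/Main_test.hs" "$demo/sub/Skipped.hs"

# -path ./sub -prune : do not descend into the directory ./sub
# -o                 : for everything else, apply the tests on the right
# -print             : explicit action, so the pruned directory itself is not printed
result=$(cd "$demo" && find . -path ./sub -prune -o -name '*.hs' -not -name '*test*' -print | LC_ALL=C sort)
printf '%s\n' "$result"
```

Only ./Lib.hs and ./src/Main.hs are listed: sub is never entered, and Main_test.hs is rejected by the -not -name test.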

Related

Exclude specific path from find's output

I searched inside my filesystem for files with a specific extension using find
find / -type f -name "*.click" 2>/dev/null
In my result I get a lot of files in a specific path which are not interesting for me like
/home/x/Dokumente/click/etc/samplepackage/test.click
/home/x/Dokumente/click/apps/csclient/test.click
/home/x/Dokumente/click/conf/test-device.click
/home/x/Dokumente/click/conf/ip6ndadvertiser.click
/home/x/Dokumente/click/conf/script-parabolawave.click
/home/x/Dokumente/click/conf/fake-iprouter.click
/home/x/Dokumente/click/conf/fromhost-tunnel.click
/home/x/Dokumente/click/conf/simple-dsdv-userlevel.click
/home/x/Dokumente/click/conf/sampler.click
/home/x/Dokumente/click/conf/script-trianglewave.click
/home/x/Dokumente/click/conf/script-squarewave.click
/home/x/Dokumente/click/conf/fastudpsrc.click
/home/x/Dokumente/click/conf/schedorder1.click
/home/x/Dokumente/click/conf/test-ping-userlevel.click
/home/x/Dokumente/click/conf/test-tcp.click
/home/x/Dokumente/click/conf/delay.click
/home/x/Dokumente/click/conf/icmp6error.click
/home/x/Dokumente/click/conf/webgen.click
/home/x/Dokumente/click/conf/test2.click
/home/x/Dokumente/click/conf/grid.click
/home/x/Dokumente/click/conf/thomer-nat.click
/home/x/Dokumente/click/conf/gnat02.click
/home/x/Dokumente/click/conf/test-clicky.click
/home/x/Dokumente/click/conf/ip64-nat3.click
/home/x/Dokumente/click/conf/mazu-nat.click
/home/x/Dokumente/click/conf/ip6print.click
/home/x/Dokumente/click/conf/gnat01.click
/home/x/Dokumente/click/conf/ip601.click
In this article I found an answer and tried like this below
find / -type f -name "*.click" -not -path "./home/ipg7/Dokumente/click/conf" 2>/dev/null
But it doesn't work I still get the output as before.
So how can I exclude a specific path and also write all the permission denied to /dev/null?
You've added -not -path "./home/ipg7/Dokumente/click/conf", which does not match anything in your output because it
starts with a dot, whereas find / prints paths that start with /,
lacks the pattern that matches the file part.
Try -not -path "/home/ipg7/Dokumente/click/conf/*" instead.
You have to use a glob in the path to ignore all matches under a given directory, and 2>/dev/null to discard the error messages. Because the search starts at /, the pattern has to start with / as well:
find / -type f -name "*.click" -not -path "/home/ipg7/Dokumente/click/conf/*" 2>/dev/null
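The point about the pattern's prefix can be checked on a small scale. This is a sketch under assumed names (demo, kept, unfiltered, and the sample files are all hypothetical): -path is matched against the path exactly as find prints it, so the pattern must begin the same way as the starting point.

```shell
# Throwaway tree with one file to exclude and one to keep (illustrative names).
demo=$(mktemp -d)
mkdir -p "$demo/click/conf" "$demo/other"
touch "$demo/click/conf/test.click" "$demo/other/keep.click"

# Starting point is absolute, so the excluded pattern is absolute too.
kept=$(find "$demo" -type f -name '*.click' -not -path "$demo/click/conf/*" 2>/dev/null)

# A pattern beginning with ./ never matches these paths, so nothing is excluded.
unfiltered=$(find "$demo" -type f -name '*.click' -not -path './click/conf/*' 2>/dev/null | LC_ALL=C sort)

printf 'kept:\n%s\nunfiltered:\n%s\n' "$kept" "$unfiltered"
```

The first command lists only keep.click; the second still lists both files.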

How to exclude subdirectories of a specific directory from find command?

I'm trying to get a list of files which I can pipe to wc -l to get a line count of all of them (not using wc directly so I can filter the file list before using the command).
My directory structure is something like this:
- folder
- file.php
- file2.html
- file3.php
- folder1
- folder2a
- folder3b
- folder4
- file.php
- file2.php
I'd like to exclude certain directories in my find, largely libraries and other stuff that I didn't make. I can do that manually like so:
find /var/www/html/ -type f -not -path "/var/www/html/folder/folder1" -not -path "/var/www/html/folder/folder2a" etc.
However, it's annoying to have to specify all the folders explicitly, and the list could change at any point, too. I've tried using /* and /** to pattern match, but that doesn't work either. Is there a way for one of these "not"s in my find command to exclude all the subdirectories of a particular directory, but not the directory itself (include its files, but not any of its subdirectories)?
Here's an intuitive guess:
find /var/www/html -not -path '/var/www/html/someotherbadfolder' -type f \( ! -path "/var/www/html/folder" -maxdepth 1 \)
But even find complains about this:
find: warning: you have specified the -maxdepth option after a non-option argument -not, but options are not positional (-maxdepth affects tests specified before it as well as those specified after it). Please specify options before other arguments.
So it seems maxdepth is incapable of being combined in an operation.
There's lots of Q&A about excluding specific subdirectories, but not generically any subdirectories in a particular subdirectory.
I was able to get it to work in a single directory with -maxdepth 1, but the problem is this is an exclusion part of a larger command, and that didn't work once I ran the full command. Potentially, I might need to exclude specific subdirectories as well as any subdirectories in several other specific subdirectories.
Assuming you're specifically looking for files (i.e. not directories):
find /var/www/html -type f -not -path "/var/www/html/folder/*/*"
That's because:
files directly under /var/www/html/folder have no further / after the folder name, so they don't match the -path pattern and are kept.
directories directly under /var/www/html/folder don't match -type f.
files under subdirectories of /var/www/html/folder have to have the extra / in the path, so they match the -path expression and are excluded.
Just with find:
find /var/www/html -type f -not -path '/var/www/html/folder/*/*'
Original answer:
One hack could be grep -v on the output of find:
find /var/www/html/ -type f | grep -v "/var/www/html/folder/.*/" | wc -l
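The -path form can be tried on a small stand-in for the tree in the question. This is only a sketch: the temporary directory replaces /var/www/html, and the names demo and files are assumptions.

```shell
# Stand-in tree: one file directly in folder, one in a subdirectory of folder,
# and one in an unrelated sibling directory.
demo=$(mktemp -d)
mkdir -p "$demo/folder/folder1" "$demo/folder4"
touch "$demo/folder/file.php" "$demo/folder/folder1/lib.php" "$demo/folder4/file2.php"

# Exclude anything at least two levels below folder, i.e. inside its subdirectories.
files=$(find "$demo" -type f -not -path "$demo/folder/*/*" | LC_ALL=C sort)
printf '%s\n' "$files"
```

folder/file.php and folder4/file2.php are listed, while folder/folder1/lib.php is not.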

In Bash, how do you delete all files with same name, except the one located in a specific folder?

I have a specific file which is found in several directories. Usually I delete all of them by using the syntax:
find . -name "<Filename>" -delete
However, I want to retain one file from a specific folder, say FOLDER1.
How do I do this using find? (I want to use find because I use -print before -delete to check which files I am deleting. I am apprehensive about using rm since there is a danger of deleting files I want to keep.)
Thanks in advance.
You can do it with
find . -name "filename" -and -not -path "./path/to/filename" -delete
You will want either to make sure that the path expression is a relative one, including the initial ./, so that it's matched by the expression, or else use wildcards. So if you know that it's in a folder named myfolder, but you don't know the full path to it, you can use
find . -name "filename" -and -not -path "*/myfolder/filename" -delete
If you don't want to delete anything under any directory named FOLDER1, you can tell find not to recurse into any directory so named, using -prune. Note that -delete implies -depth, and -prune has no effect when -depth is in force, so pair -prune with -exec rm rather than with -delete:
find . -name FOLDER1 -prune -o -name filename -exec rm {} +
This is more efficient than recursing into that directory and then filtering out its results later.
Side note: When testing this, be sure you use the explicit -print:
find . -name FOLDER1 -prune -o -name filename -print
...whereas an implicit one won't behave as you expect:
# not what you want: equivalent to the below, not the above:
find . -name FOLDER1 -prune -o -name filename
...will behave as:
find . '(' -name FOLDER1 -prune -o -name filename ')' -print
...which thus applies the action to matches on either side of the -o operator, so the pruned FOLDER1 directory itself is printed as well.
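The difference between the explicit and the implicit -print can be seen on a throwaway tree. This is a sketch with assumed names (demo, explicit, implicit); nothing is deleted here, the commands only list.

```shell
# Two copies of "filename": one inside FOLDER1 (to keep), one elsewhere.
demo=$(mktemp -d)
mkdir -p "$demo/FOLDER1" "$demo/other"
touch "$demo/FOLDER1/filename" "$demo/other/filename"

# Explicit -print binds only to the right-hand side of -o.
explicit=$(cd "$demo" && find . -name FOLDER1 -prune -o -name filename -print | LC_ALL=C sort)

# With no action given, find wraps the whole expression and prints both sides.
implicit=$(cd "$demo" && find . -name FOLDER1 -prune -o -name filename | LC_ALL=C sort)

printf 'explicit:\n%s\nimplicit:\n%s\n' "$explicit" "$implicit"
```

The explicit form lists only ./other/filename; the implicit form also lists ./FOLDER1.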

Having trouble with parentheses in unix find and correct syntax

This one liner works, the goal:
search a directory
find all files that are newer than a timestamp file
that are NOT named .DS_Store
otherwise, list all those other files.
I came up with this, which works, but I see examples online that use a lot of parentheses for which I am using none. I was thinking there may be a better way:
find /Users/$USER/Library/Messages/Attachments -not -name ".DS_Store" -not -name "timestamp" -name "*" -type f -newer /Users/$USER/Library/Messages/scripts/timestamp
And ultimately I want to take the results and copy them to a specific place. For that I was going to append this:
-exec cp {} archive_files/ \;
You could combine all the -not expressions into a parenthesized group by applying de Morgan's Law:
-not \( -name .DS_Store -o -name timestamp \)
I don't see the point in your simple case, but if you had lots of names to exclude it might be clearer.
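For illustration, here is the grouped form combined with -newer and the -exec cp from the question, run in a temporary directory. The paths, the sample file photo.jpg, and the variables demo and copied are assumptions; the timestamp file is deliberately dated in the past so that -newer gives a predictable result.

```shell
# Throwaway stand-in for the attachments directory and the archive target.
demo=$(mktemp -d)
mkdir -p "$demo/attachments" "$demo/archive_files"
touch -t 200001010000 "$demo/attachments/timestamp"   # reference file, dated 2000-01-01
touch "$demo/attachments/photo.jpg" "$demo/attachments/.DS_Store"

# The parentheses are escaped so the shell passes them through to find.
find "$demo/attachments" -type f \
    -not \( -name .DS_Store -o -name timestamp \) \
    -newer "$demo/attachments/timestamp" \
    -exec cp {} "$demo/archive_files/" \;

copied=$(ls -A "$demo/archive_files")
printf '%s\n' "$copied"
```

Only photo.jpg ends up in archive_files. Adding another excluded name is then a matter of appending one more -o -name inside the group.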

What does this bash script mean

I've found the following line of code in a script. Could someone explain to me what it means?
Basically, the purpose of this line is to find a set of files to archive. Since I am not familiar with bash scripts, it is difficult for me to understand this line of code.
_filelist=`cd ${_path}; find . -type f -mtime ${ARCHIVE_DELAY} -name "${_filename}" -not -name "${_ignore_filename}" -not -name "${_ignore_filename2}"`
Let's break it down:
cd ${_path} : changes to the directory stored in the ${_path} variable
find is used to find files based on the following criteria:
. : look in the current directory and recurse through all sub-directories
-type f: look for regular files only (not directories)
-mtime ${ARCHIVE_DELAY} : look for files last modified ${ARCHIVE_DELAY}*24 hours ago
-name "${_filename}": look for files which have name matching ${_filename}
-not -name "${_ignore_filename}" : do not find files which have name matching ${_ignore_filename}
-not -name "${_ignore_filename2}" : do not find files which have name matching ${_ignore_filename2}
All the files found are stored in a variable called _filelist.
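The backticks (`...`) perform command substitution: the enclosed commands are run and their output replaces the backticked expression, which the = then assigns to the variable.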
Your script is assigning to $_filelist what you get by:
Changing directory to $_path
Finding in the current directory (.) files (-type f) where
Name is $_filename (a pattern, I suppose)
Name is not $_ignore_filename or $_ignore_filename2
I think you could as well change that to find ${_path} ... without the cd, but please try it out.
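A small sketch of what changes when the cd is dropped. The sample files and patterns are made up, with_cd and without_cd are hypothetical names, and -mtime is left out so the result does not depend on file ages. It also uses $(...), the modern equivalent of backticks.

```shell
# Throwaway directory with one file to find and one to ignore.
demo=$(mktemp -d)
touch "$demo/app.log" "$demo/app.log.bak"

_path=$demo
_filename='*.log*'
_ignore_filename='*.bak'

# Original shape: cd first, then search "."; results are relative to $_path.
with_cd=$(cd "$_path" && find . -type f -name "$_filename" -not -name "$_ignore_filename")

# Without cd: search $_path directly; results carry the $_path prefix.
without_cd=$(find "$_path" -type f -name "$_filename" -not -name "$_ignore_filename")

printf '%s\n%s\n' "$with_cd" "$without_cd"
```

Both forms find the same file, but the first prints ./app.log and the second prints the full path, so whatever consumes _filelist later would have to cope with the different prefix.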
_filelist=`somecode`
makes the variable _filelist contain the output of the command somecode.
Somecode, in this case, is mostly a find command, which searches recursively for files.
find . -type f -mtime ${ARCHIVE_DELAY} -name "${_filename}" -not -name "${_ignore_filename}" -not -name "${_ignore_filename2}"
find .
searches the current directory, which the preceding cd has just changed to ${_path}.
-type f
matches only regular files (not directories, sockets, ...).
-mtime
restricts matches to files whose modification age in days equals ${ARCHIVE_DELAY}.
-name
explains itself: the file name has to match "${_filename}".
-not -name
explains itself too, I guess.
So the whole line sets the variable _filelist to the files found by some criteria: name, age, and type.
