List files not matching a pattern? - bash

Here's how one might list all files matching a pattern in bash:
ls *.jar
How to list the complement of a pattern? i.e. all files not matching *.jar?

Use egrep-style extended pattern matching.
ls !(*.jar)
This is available starting with bash-2.02-alpha1.
Must first be enabled with
shopt -s extglob
As of bash-4.1-alpha there is a config option to enable this by default.
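For use in a script, here is a minimal sketch of the same idea, assuming bash. The scratch directory and file names are made up for illustration. Note that shopt -s extglob sits on its own earlier line, since bash only recognizes !(...) on lines parsed after the option is on.

```bash
#!/usr/bin/env bash
# Enable extended globbing before any line that uses !(...).
shopt -s extglob

# Hypothetical scratch directory with a few sample files.
demo_dir=$(mktemp -d)
touch "$demo_dir/app.jar" "$demo_dir/lib.jar" "$demo_dir/notes.txt" "$demo_dir/README"

# Collect every entry that does not end in .jar into an array.
non_jars=( "$demo_dir"/!(*.jar) )

# Print just the base names, one per line.
printf '%s\n' "${non_jars[@]##*/}"

rm -rf "$demo_dir"
```

Keeping the matches in an array avoids parsing ls output and copes with spaces in names.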

ls | grep -v '\.jar$'
for instance.

Little known bash expansion rule:
ls !(*.jar)

With an appropriate version of find, you could do something like this, but it's a little overkill:
find . -maxdepth 1 ! -name '*.jar'
find finds files. The . argument specifies you want to start searching from ., i.e. the current directory. -maxdepth 1 tells it you only want to search one level deep, i.e. the current directory. ! -name '*.jar' looks for all files whose names don't match the glob pattern *.jar (-name takes a shell pattern, not a regex).
Like I said, it's a little overkill for this application, but if you remove the -maxdepth 1, you can then recursively search for all non-jar files or what have you easily.
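A small sketch of the difference -maxdepth makes, assuming a find that supports it (GNU and BSD find do). The directory layout is hypothetical and only there to make the example self-contained.

```bash
# Hypothetical scratch tree: two files at the top, two in a subdirectory.
demo_dir=$(mktemp -d)
mkdir "$demo_dir/sub"
touch "$demo_dir/app.jar" "$demo_dir/notes.txt" \
      "$demo_dir/sub/deep.jar" "$demo_dir/sub/deep.txt"

# Top level only: regular files whose name does not match the glob *.jar.
top_level=$(find "$demo_dir" -maxdepth 1 -type f ! -name '*.jar')

# Without -maxdepth, find descends into subdirectories as well.
recursive=$(find "$demo_dir" -type f ! -name '*.jar' | sort)

printf 'top level:\n%s\n' "$top_level"
printf 'recursive:\n%s\n' "$recursive"

rm -rf "$demo_dir"
```

Adding -type f keeps the starting directory itself and any subdirectories out of the result.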

POSIX defines non-matching bracket expressions, so we can let the shell expand the file names for us.
ls *[!j][!a][!r]
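This has some quirks, though: it also hides any name whose third-to-last character is j, whose second-to-last is a, or whose last is r (for example a.jpg or x.tar), as well as any name shorter than three characters. At least it is compatible with about any Unix shell.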
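To see what the bracket pattern really keeps, here is a sketch that needs no files at all. It relies on case using the same pattern syntax as pathname expansion; the sample names are invented.

```bash
# Check a few hypothetical names against the pattern *[!j][!a][!r].
kept=""
for name in app.jar notes.txt a.jpg ab; do
    case $name in
        # Each of the last three characters must differ from j, a, r
        # in its own position, so the name needs at least three characters.
        *[!j][!a][!r]) kept="$kept $name" ;;
    esac
done
echo "kept:$kept"
```

Only notes.txt survives: a.jpg is dropped because its third-to-last character is j, and ab is too short to match.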

If your ls supports it (man ls) use the --hide=<PATTERN> option. In your case:
$> ls --hide=*.jar
No need to parse the output of ls (because it's very bad) and it scales to not showing multiple types of files. At some point I needed to see what non-source, non-object, non-libtool generated files were in a (cluttered) directory:
$> ls src --hide=*.{lo,c,h,o}
Worked like a charm.

Another approach can be using ls -I flag (Ignore-pattern).
ls -I '*.jar'

And if you want to exclude more than one file extension, separate them with a pipe |, like ls test/!(*.jar|*.bar). Let's try it:
$ mkdir test
$ touch test/1.jar test/1.bar test/1.foo
$ ls test/!(*.jar|*.bar)
test/1.foo
Looking at the other answers you might need to shopt -s extglob first.

One solution would be ls -1|grep -v '\.jar$'

Some mentioned variants of this form:
ls -d *.[!j][!a][!r]
But this seems to be only working on bash, while this seems to work on both bash and zsh:
ls -d *.[^j][^a][^r]

ls -I "*.jar"
-I, --ignore=PATTERN
do not list implied entries matching shell PATTERN
It works without having to execute anything before
It works also inside watch quotes: watch -d 'ls -I "*.gz"', unlike watch 'ls !(*.jar)' which produces: sh: 1: Syntax error: "(" unexpected
Note: For some reason CentOS requires quoting the pattern after -I, while Ubuntu does not.

Related

Japanese file name listing in shell script, how to avoid unexpected enter escape line [duplicate]

Finding all commands excluding "."

So far I have this:
ls /usr/bin | grep "^[\.]"
The cmd still gets files with a "." in there.
I have looked at [[:punct:]], but it still returns the same thing.
There's grep -v to exclude things. So try
ls /usr/bin | grep -v \\.
man grep says
-v, --invert-match
Selected lines are those not matching any of the specified patterns.
It's generally considered a bad idea to parse ls.
If I understand you correctly, you want all files in /usr/bin that don't have a dot in the name. You can use find to do that:
find /usr/bin -not -name "*.*"
It is more portable (thanks @Adrian) to use a ! instead of -not:
find /usr/bin ! -name "*.*"
It's not really clear what you want.
Your command:
ls /usr/bin | grep "^[\.]"
means: filter the output of ls to show only files that start with a dot.
grep "^[\.]"
      ^ ^^ - escaped dot
      +- at the beginning of the line
If you want to exclude all files that contain a dot, use
ls /usr/bin | grep -v '\.' #or see HenrikN's answer and comments (grep -vF .)
If you want to exclude only entries that start with a dot, use
grep '^[^\.]'
which means: anything but a dot at the start.
PS: anyway, parsing the output of ls is usually a very bad idea. (http://mywiki.wooledge.org/ParsingLs)
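The difference between the two filters is easier to see on a fixed list than on live ls output. A minimal sketch with made-up names fed in through printf:

```bash
# Hypothetical command names, one per line.
names=$(printf '%s\n' gcc python3.11 .hidden x.sh)

# Drop every name that contains a dot anywhere.
no_dot=$(printf '%s\n' "$names" | grep -v '\.')

# Keep only names whose first character is not a dot.
no_leading_dot=$(printf '%s\n' "$names" | grep '^[^.]')

printf 'no dot at all:\n%s\n' "$no_dot"
printf 'no leading dot:\n%s\n' "$no_leading_dot"
```

The first filter leaves only gcc; the second removes just .hidden.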
You can change your regex to exclude files starting with ".":
ls -a /usr/bin | grep "^[^.]"
This regex selects only files which do not have "." at the start. By the way, only ls -a shows files that start with ".". How did you manage to get them without "-a"?
This can be achieved with pure bash, if the extglob shell option is enabled.
shopt -s extglob
echo /usr/bin/!(*.*)
# or alternatively:
echo /usr/bin/+([!.])
You may replace echo with ls -d if you want to pipe the list to another command line-wise.
I think you are referring to the current working directory and parent directory, and not a command with "a dot" in it.
Try this as you probably have ls aliased:
/bin/ls /usr/bin

Using ? wildcard with ls

I'm trying to use the ? wildcard to display only 1 character files, and ?.* to display 1 character files with extensions.
what works:
cd /mydir
ls ? ?.*
I'm trying to use this in a shell script, so I can't use "cd".
What I'm trying to get to work:
ls ? ?.* /mydir
and it gives me the output:
ls: cannot access ?.*: No such file or directory
I've also tried:
ls /mydir ? ?.*
which gives me the exact same output as before.
From a comment you wrote:
I'm in college for Linux administration, and one of my current classes is shell scripting. My teacher is just going over basic stuff. My current assignment is to get the number of files in the tmp directory of our class server, the number of files that end in .log, and the number of files that have only one-character names, store the data in a file, and then display the stored data to the user. I know it's stupid, but it's my assignment.
I only hope that they don't teach you to parse the output of ls in college... it's one of the most terrible things to do. Please refer to these links:
Why you shouldn't parse the output of ls(1)
Don't ever do these
The solution you chose
ls /mydir/? /mydir/?.* | wc -l
is broken in two cases:
If there are no matching files, you'll get an error. You can fix that in two ways: use shopt -s nullglob or just redirect stderr to /dev/null.
If there's a newline in a file name. Try it: touch $'a.lol\nlol\n\lol\nlol\nlol'. LOL.
The proper bash way is the following:
shopt -s nullglob
shopt -u failglob
files=( /mydir/? /mydir/?.* )
echo "There are ${#files[@]} files found."
When you write ls ? ?.* /mydir, you're trying to display the files matching three distinct patterns: ?, ?.*, and /mydir. You want to match only /mydir/? and /mydir/?.*, hence this command: ls /mydir/? /mydir/?.*.
Edit: while this is a correct answer to the initial question (listing /mydir/? and /mydir/?.*), OP wanted to do this to parse the output and get the file count. See @gniourf_gniourf's answer, which is a much better way to do this.
cd works perfectly within a shell script, use it. For minimal impact on the script, I would use a subshell:
( cd /mydir && ls ? ?.* )
That way, you don't change the current working directory of the script (and neither $OLDPWD, which would be clobbered with cd /mydir; ...; cd -;).
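A quick sketch of why the subshell is harmless, assuming bash and a made-up scratch directory. Command substitution also runs in a subshell, so it behaves like the parentheses above while letting us capture the listing.

```bash
# Hypothetical scratch directory with one-character names and a longer one.
demo_dir=$(mktemp -d)
touch "$demo_dir/a" "$demo_dir/b.c" "$demo_dir/long.txt"

before=$PWD
# cd and the glob expansion both happen inside the subshell only.
listing=$( cd "$demo_dir" && ls ? ?.* )
after=$PWD

printf '%s\n' "$listing"
[ "$before" = "$after" ] && echo "working directory unchanged"

rm -rf "$demo_dir"
```

The listing contains a and b.c, and the script's own working directory is the same before and after.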
While ls seems like an obvious choice, find is probably more suitable:
find /mydir \! -name "." -a \( -name "?" -o -name "?.*" \)

Wildcard expansion - searching for one in a set of possibilities

I could have sworn you could do the following:
ls *.{java, cpp}
but that does not seem to work. I know this answer is probably on the site somewhere but I couldn't find it via search.
For instance, if I want to be able to use the globbing with a find command, I would want to do something like
find . -name "*.{java,cpp}" | xargs grep -n 'TODO'
Is this possible without resorting to using the -o binary operator?
It is likely that you are seeing an error message such as this:
ls: cannot access *.{java: No such file or directory
ls: cannot access ,cpp}: No such file or directory
If that's the case, it's because of the space after the comma. Leave it out:
ls *.{java,cpp}
For future reference, it is more helpful to post error messages than to say "it's not working" (please don't take this personally. It's meant for everyone to see. I even do it too often).
ls *.{java,cpp} works just fine for me in bash...:
$ ls *.{java,cpp}
a.cpp ope.cpp sc.cpp weso.cpp
helo.java qtt.cpp srcs.cpp
Are you sure it's not working for you...?
find is different, but
find -E . -regex '.*\.(java|cpp)'
should do what you want (in some versions you may not need the -E or you may need a -regextype option there instead, "man find" on your specific system to find out).
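If the regex flags differ too much between systems, the portable fallback is the -o form the question hoped to avoid. A sketch with invented file names, grouping the alternatives in escaped parentheses:

```bash
# Hypothetical scratch directory with two source files and one other file.
demo_dir=$(mktemp -d)
touch "$demo_dir/Main.java" "$demo_dir/util.cpp" "$demo_dir/notes.txt"

# The parentheses keep -type f applied to both -name tests.
sources=$(find "$demo_dir" -type f \( -name '*.java' -o -name '*.cpp' \) | sort)

printf '%s\n' "$sources"

rm -rf "$demo_dir"
```

Without the parentheses, -type f would bind only to the first -name test because the implicit -a has higher precedence than -o.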
But this does work in Bash:
$ ls
a.h a.s main.cpp main.s
$ ls *.{cpp,h}
a.h main.cpp
Are you sure you're in Bash? If you are, maybe an alias is causing the issue: try /bin/ls *.{java,cpp} to make sure you don't call the aliased ls.
Or, just take out the spaces in your list inside the {} -- the space will cause an error because Bash will see *.{java, as one argument to ls, and it will see cpp} as a second argument.
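The effect of the stray space can be checked without any files, since brace expansion is purely textual. A small sketch in bash; the prefix name is arbitrary.

```bash
# No space: the braces expand into two separate words.
good=( prefix.{java,cpp} )

# With a space the shell splits first, so neither word holds a complete
# brace pair and both are passed through literally.
bad=( prefix.{java, cpp} )

printf 'good: <%s>\n' "${good[@]}"
printf 'bad:  <%s>\n' "${bad[@]}"
```

The second array ends up holding the two literal words prefix.{java, and cpp}, which matches the two error messages ls printed.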
For your particular example, this may also do what you want
grep -rn TODO . --include '*.java' --include '*.cpp'

How can I use inverse or negative wildcards when pattern matching in a unix/linux shell?

Say I want to copy the contents of a directory excluding files and folders whose names contain the word 'Music'.
cp [exclude-matches] *Music* /target_directory
What should go in place of [exclude-matches] to accomplish this?
In Bash you can do it by enabling the extglob option, like this (replace ls with cp and add the target directory, of course)
~/foobar> shopt extglob
extglob off
~/foobar> ls
abar afoo bbar bfoo
~/foobar> ls !(b*)
-bash: !: event not found
~/foobar> shopt -s extglob # Enables extglob
~/foobar> ls !(b*)
abar afoo
~/foobar> ls !(a*)
bbar bfoo
~/foobar> ls !(*foo)
abar bbar
You can later disable extglob with
shopt -u extglob
The extglob shell option gives you more powerful pattern matching in the command line.
You turn it on with shopt -s extglob, and turn it off with shopt -u extglob.
In your example, you would initially do:
$ shopt -s extglob
$ cp !(*Music*) /target_directory
The full available extended globbing operators are (excerpt from man bash):
If the extglob shell option is enabled using the shopt builtin, several extended pattern matching operators are recognized. A pattern-list is a list of one or more patterns separated by a |. Composite patterns may be formed using one or more of the following sub-patterns:
?(pattern-list)
Matches zero or one occurrence of the given patterns
*(pattern-list)
Matches zero or more occurrences of the given patterns
+(pattern-list)
Matches one or more occurrences of the given patterns
@(pattern-list)
Matches one of the given patterns
!(pattern-list)
Matches anything except one of the given patterns
So, for example, if you wanted to list all the files in the current directory that are not .c or .h files, you would do:
$ ls -d !(*@(.c|.h))
Of course, normal shell globbing works, so the last example could also be written as:
$ ls -d !(*.[ch])
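The same operators also work for matching strings inside [[ ... ]], which makes them easy to try out without touching the filesystem. A sketch assuming bash, with hypothetical names:

```bash
# Turn extglob on before the lines that use the extended operators.
shopt -s extglob

not_c=()
for name in main.c util.h README notes.txt; do
    # !(...) negates; @(.c|.h) matches exactly one of the two suffixes.
    if [[ $name == !(*@(.c|.h)) ]]; then
        not_c+=( "$name" )
    fi
done

printf '%s\n' "${not_c[@]}"
```

This leaves README and notes.txt in the array, the same set that !(*.[ch]) would select.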
Not in bash (that I know of), but:
cp `ls | grep -v Music` /target_directory
I know this is not exactly what you were looking for, but it will solve your example.
If you want to avoid the cost of spawning a new process for each file with -exec ... \;, I believe you can do better with xargs. I think the second command below is a more efficient alternative to the first:
find foo -type f ! -name '*Music*' -exec cp {} bar \; # new proc for each exec
find . -maxdepth 1 -name '*Music*' -prune -o -print0 | xargs -0 -i cp {} dest/
A trick I haven't seen on here yet that doesn't use extglob, find, or grep is to treat two file lists as sets and "diff" them using comm:
comm -23 <(ls) <(ls *Music*)
comm is preferable over diff because it doesn't have extra cruft.
This returns all elements of set 1, ls, that are not also in set 2, ls *Music*. This requires both sets to be in sorted order to work properly. No problem for ls and glob expansion, but if you're using something like find, be sure to invoke sort.
comm -23 <(find . | sort) <(find . | grep -i '.jpg' | sort)
Potentially useful.
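A self-contained sketch of the set-difference idea, assuming bash for the process substitution. The two lists are hard-coded, hypothetical names instead of real ls output, and both are sorted first because comm expects sorted input.

```bash
# Set 1: everything. Set 2: the entries we want to exclude.
all_names=$(printf '%s\n' photo.jpg Music-a.mp3 notes.txt | sort)
music=$(printf '%s\n' Music-a.mp3 | sort)

# -23 suppresses columns 2 and 3, leaving lines unique to the first input.
rest=$(comm -23 <(printf '%s\n' "$all_names") <(printf '%s\n' "$music"))

printf '%s\n' "$rest"
```

Only notes.txt and photo.jpg remain, since Music-a.mp3 appears in both inputs.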
You can also use a pretty simple for loop:
for f in `find . -not -name "*Music*"`
do
cp "$f" /target/dir
done
In bash, an alternative to shopt -s extglob is the GLOBIGNORE variable. It's not really better, but I find it easier to remember.
An example that may be what the original poster wanted:
GLOBIGNORE="*techno*"; cp *Music* /only_good_music/
When done, unset GLOBIGNORE to be able to rm *techno* in the source directory.
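A sketch of GLOBIGNORE in action, assuming bash. The directory and file names are invented, and the globs are run from inside the directory so the ignore pattern is compared against plain file names.

```bash
# Hypothetical scratch directory with two Music files and one other file.
demo_dir=$(mktemp -d)
touch "$demo_dir/Music-jazz.mp3" "$demo_dir/Music-techno.mp3" "$demo_dir/notes.txt"

start_dir=$PWD
cd "$demo_dir" || exit 1

# While GLOBIGNORE is set, matches of *techno* are removed from glob results.
GLOBIGNORE='*techno*'
kept=( *Music* )

# After unsetting it, the same glob sees every file again.
unset GLOBIGNORE
all=( *Music* )

cd "$start_dir" || exit 1
printf 'with GLOBIGNORE: %s\n' "${kept[*]}"
printf 'without:         %s\n' "${all[*]}"

rm -rf "$demo_dir"
```

Setting GLOBIGNORE to a non-empty value also turns on dotglob-like behaviour in bash, so it is worth unsetting it as soon as the filtered expansion is done.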
My personal preference is to use grep and the while command. This allows one to write powerful yet readable scripts ensuring that you end up doing exactly what you want. Plus by using an echo command you can perform a dry run before carrying out the actual operation. For example:
ls | grep -v "Music" | while read filename
do
echo "$filename"
done
will print out the files that you will end up copying. If the list is correct the next step is to simply replace the echo command with the copy command as follows:
ls | grep -v "Music" | while read filename
do
cp "$filename" /target_directory
done
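A variant of the same loop that does not read ls output, sketched under the assumption of bash and a find that supports -mindepth, -maxdepth and -print0 (GNU and BSD find do). The directories and names are hypothetical; NUL-delimited names keep spaces and newlines intact.

```bash
# Hypothetical source and target directories.
src=$(mktemp -d)
dst=$(mktemp -d)
touch "$src/My Music.mp3" "$src/two words.txt" "$src/plain.txt"

# List the direct children of src whose name lacks "Music", NUL-separated,
# and copy them one by one. Swap cp for echo to get a dry run.
find "$src" -mindepth 1 -maxdepth 1 ! -name '*Music*' -print0 |
while IFS= read -r -d '' path; do
    cp -R "$path" "$dst"/
done

copied=( "$dst"/* )
echo "copied ${#copied[@]} entries"

rm -rf "$src" "$dst"
```

IFS= and -r stop read from trimming whitespace or interpreting backslashes in file names.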
One solution for this can be found with find.
$ mkdir foo bar
$ touch foo/a.txt foo/Music.txt
$ find foo -type f ! -name '*Music*' -exec cp {} bar \;
$ ls bar
a.txt
Find has quite a few options, you can get pretty specific on what you include and exclude.
Edit: Adam in the comments noted that this is recursive. find options mindepth and maxdepth can be useful in controlling this.
The following lists all *.txt files in the given directory, except those that begin with a number.
This works in bash, dash, zsh and all other POSIX compatible shells.
for FILE in /some/dir/*.txt; do # for each *.txt file
case "${FILE##*/}" in # if file basename...
[0-9]*) continue ;; # starts with digit: skip
esac
## otherwise, do stuff with $FILE here
done
In line one the pattern /some/dir/*.txt will cause the for loop to iterate over all files in /some/dir whose name end with .txt.
In line two a case statement is used to weed out undesired files. – The ${FILE##*/} expression strips off any leading dir name component from the filename (here /some/dir/) so that patterns can match against only the basename of the file. (If you're only weeding out filenames based on suffixes, you can shorten this to $FILE instead.)
In line three, all files matching the case pattern [0-9]* will be skipped (the continue statement jumps to the next iteration of the for loop). – If you want, you can do something more interesting here, e.g. skip all files which do not start with a letter (a–z) using [!a-z]*, or use multiple patterns to skip several kinds of filenames, e.g. [0-9]*|*.bak to skip both .bak files and files which start with a digit.
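The multi-pattern variant mentioned above can be sketched like this. The names are hypothetical and looped over directly, so the example runs without any files on disk.

```bash
kept=""
skipped=""
for name in 1draft.txt notes.txt old.bak report.txt; do
    case $name in
        # Skip names starting with a digit, and anything ending in .bak.
        [0-9]*|*.bak) skipped="$skipped $name"; continue ;;
    esac
    kept="$kept $name"
done
echo "kept:$kept"
echo "skipped:$skipped"
```

Only POSIX features are used here, so the loop should behave the same in dash, bash and zsh.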
This would do it, excluding exactly 'Music' (the ^ negation is zsh syntax and needs setopt extendedglob):
cp -a ^'Music' /target
And these two are for excluding things like Music?* or *?Music:
cp -a ^\*?'complete' /target
cp -a ^'complete'?\* /target

Resources