bash function grep --exclude-dir not working - bash

I have the following function defined in my .bashrc, but for some reason the --exclude-dir option is not excluding the .git directory. Can anyone see what I've done wrong? I'm using Ubuntu 13.10 if that helps.
function fif # find in files
{
pattern=${1?" Usage: fif <word_pattern> [files pattern]"};
files=${2:+"-iname \"$2\""};
grep "$pattern" --color -n -H -s $(find . $files -type f) --exclude-dir=.git --exclude="*.min.*"
return 0;
}

Make sure not to include a trailing slash when you specify the directory to exclude. For example:
Do this:
$ grep -r --exclude-dir=node_modules firebase .
NOT this:
$ grep -r --exclude-dir=node_modules/ firebase .
(This answer not applicable to OP, but may be helpful for others who find --exclude-dir not to be working -- it worked for me.)

Do a man grep on your system, and see what version you have. Your version of grep may not be able to use --exclude-dirs.
You're really better off using find to find the files you want, then use grep to parse them:
$ find . -name '.git' -type d -prune \
-o -name "*.min.*" -prune \
-o -type f -exec grep --color -n -H {} "$pattern" \;
I'm not a fan of the recursive grep. Its syntax has become bloated, and it's really unnecessary. We have a perfectly good tool for finding files that match a particular criteria, thank you.
In the find program, the -o separate out the various clauses. If a file has not been filtered out by a previous -prune clause, it is passed to the next one. Once you've pruned out all of the .git directories and all of the *.min.* files, you pass the results to the -exec clause that executes your grep command on that one file.
Some people prefer it this way:
$ find . -name '.git' -type d -prune \
-o -name "*.min.*" -prune \
-o -type f -print0 | xargs -0 grep --color -n -H "$pattern"
The -print0 prints out all of the found files separated by the NULL character. The xargs -0 will read in that list of files and pass them to the grep command. The -0 tells xargs that the file names are NULL separated and not whitespace separated. Some xargs will take --null instead of the -0 parameter.

Related

how to grep for "/local" in all perl files in current directory

I am using the following command to grep for string "/local" in all .pl files,can anyone point what is wrong here?
find . *.pl| xargs grep '/local' -sl
Pass -name argument, and quote *.pl:
find . -name "*.pl" | xargs grep '/local' -sl
Why is everyone suggesting "find"? The shell can work out your ".pl" files for you:
grep "/local" *.pl
You could just as easily
find . -type f -name '*.pl' -exec grep '/local/' {} \;
Or a more optimal form if your find supports it, this passes multiple files to grep at a time
find . -type f -name '*.pl' -exec grep '/local/' {} +
-exec tends to be slow.
I'm partial to:
find . -name "*.pl" -print0 | xargs -0 grep -sl '/local'
...because it isn't confused by filenames with newlines in them.
Note however that some versions of GNU grep appear to have a memory leak that is triggered by grep commands with a very long list of filenames to search. Under such circumstances, -exec is more reliable.

Better way to limit the unix command find by filename

I'm getting results using find with filenames that have '~' and .swp, etc. So I did the following, but is there a better way to do this? The '.*.js' -iname '*.js' part feels "redundant".
$ find ./ '.*.js' -iname '*.js' -print0 | xargs -0 grep -n ".*loginError.*"
find: `.*.js': No such file or directory
./js/signin.js:252: foo.loginError();
./js/signin.js:339:foo.loginError = function() {
./js/signin.js:340: foo.log("ui.loginError");
Try using
find . -name \*.js -print0 | xargs -0 grep -n ".*loginError.*"
That will find only files with 'js' extension and not ending in ~ or .swp
EDIT: Added '0' -print0 (edit requires 6 characters so I'm adding this; ergh!)
To do it all in one command without the xargs you could do it like this
find . -name "*.js" -exec grep -n ".*loginError.*" /dev/null {} \;
the /dev/null piece is to make grep think it's searching multiple files and then it'll output the filename correctly, otherwise it'd just print out the line number without telling you which file it's in

How do I write a bash alias/function to grep all files in all subdirectories for a string?

I've been using the following command to grep for a string in all the python source files in and below my current directory:
find . -name '*.py' -exec grep -nHr <string> {} \;
I'd like to simplify things so that I can just type something like
findpy <string>
And get the exact same result. Aliases don't seem sufficient since they only do a string expansion, and the argument I need to specify is not the last argument. It sounds like functions are suitable for the task, so I have several questions:
How do I write it?
Where do I put it?
If you don't want to create an entire script for this, you can do it with just a shell function:
findpy() { find . -name '*.py' -exec grep -nHr "$1" {} \; ; }
...but then you may have to define it in both ~/.bashrc and ~/.bash_profile, so it gets defined for both login and interactive shells (see the INVOCATION section of bash's man page).
All the "find ... -exec" solutions above are OK in the sense that they work, but they are horribly inefficient and will be extremely slow for large trees. The reason is that they launch a new process for every single *.py file. Instead, use xargs(1), and run grep only on files (not directories):
#! /bin/sh
find . -name \*.py -type f | xargs grep -nHr "$1"
For example:
$ time sh -c 'find . -name \*.cpp -type f -exec grep foo {} \; >/dev/null'
real 0m3.747s
$ time sh -c 'find . -name \*.cpp -type f | xargs grep foo >/dev/null'
real 0m0.278s
On a side note, you should take a look at Ack for what you are doing. It is designed as a replacement for Grep written in Perl. Filtering files based on the target language or ignoring .svn directories and the like.
Example (snippet from Trac source):
$ ack --python foo ./mysource
ticket/tests/wikisyntax.py
139:milestone:foo
144:<a class="missing milestone" href="/milestone/foo" rel="nofollow">milestone:foo</a>
ticket/tests/conversion.py
34: ticket['foo'] = 'This is a custom field'
ticket/query.py
239: count_sql = 'SELECT COUNT(*) FROM (' + sql + ') AS foo'
I wanted something similar, and the answer by Idelic reminded of one of the nice features of xargs: that it puts the command at the end. You see, my problem was that I wanted to write a shell alias that would "accept parameters" (really, that it would expand in such a way to allow me to pass parameter so grep).
Here's what I added to my bash_aliases:
alias findpy="find . -type f -name '*.py' | xargs grep"
This way, I could write findpy WORD or findpy -e REGEX or findpy -il WORD - the point being that could use any grep command-line option.
Put the following three lines in a file named findpy
#!/bin/bash
find . -name '*.py' -exec grep -nHr $1 {} \;
Then say
chmod u+x findpy
I normally have a directory called bin in my home directory where I put little shell scripts like this. Make sure to add the directory to your PATH.
The script:
#!/bin/bash
find . -name '*.py' -exec grep -nHr "$1" {} ';'
is how I'd do it.
You write it with an editor like vim and put it somewhere on your path. My normal approach is to have a ~/bin directory and make sure my .profile file (or equivalent) contains:
PATH=$PATH:~/bin
Many versions of grep have options to do recursion, specify filename pattern, etc.
grep --perl-regexp --recursive --include='*.py' --regexp="$1" .
This recurses starting from the current directory (.), looks only at files ending in 'py', uses Perl-style regular expressions.
If your version of grep doesn't support --recursive and --include, then you can still use find and xargs, but be sure to allow for pathnames with embedded spaces by using the -print0 argument to find and the --null option to xargs to handle that.
find . -type f -name '*.py' -print0 | xargs --null grep "$1"
should work.
Add the following line to your ~/.bashrc or ~/.bash_profile or ~/.profile
alias findpy='find . -type f -name "*.py" -print0 | xargs -0 grep'
then you can use it like this
findpy def
or with grep options
findpy -i class
the following alias will ignore the version control meta-directory of git and svn
alias findpy='find . -type f -not -path "*/.git/*" -a -not -path "*/.svn/*" -name "*.py" -print0 | xargs -0 grep'
#######################################################################################
#
# Function to search all files (including sub-directories) that match a given file
# extension ($2) looking for an indicated string ($1) - in a case insensitive manner.
#
# For Example:
#
# -> findfile AllowNegativePayments cpp
#
#
#######################################################################################
findfile ()
{
find . -iname "*.$2*" -type f -print0 | xargs -0 grep -i "$1" {} \; 2> /dev/nul
}
alias _ff='findfile'

What's a more concise way of finding text in a set of files?

I currently use the following command, but it's a little unwieldy to type. What's a shorter alternative?
find . -name '*.txt' -exec grep 'sometext' '{}' \; -print
Here are my requirements:
limit to a file extension (I use SVN and don't want to be searching through all those .svn directories)
can default to the current directory, but it's nice to be able to specify a different directory
must be recursive
UPDATE: Here's my best solution so far:
grep -r 'sometext' * --include='*.txt'
UPDATE #2: After using grep for a bit, I realized that I like the output of my first method better. So, I followed the suggestions of several responders and simply made a shell script and now I call that with two parameters (extension and text to find).
grep has -r (recursive) and --include (to search only in files and directories matching a pattern).
If its too unweildy, write a script that does it and put it in your personal bin directory. I have a 'fif' script which searches source files for text, basically just doing a single find like you have here:
#!/bin/bash
set -f # disable pathname expansion
pattern="-iname *.[chsyl] -o -iname *.[ch]pp -o -iname *.hh -o -iname *.cc
-o -iname *.java -o -iname *.inl"
prune=""
moreargs=true
while $moreargs && [ $# -gt 0 ]; do
case $1 in
-h)
pattern="-iname *.h -o -iname *.hpp -o -iname *.hh"
shift
;;
-prune)
prune="-name $2 -prune -false -o $prune"
shift
shift
;;
*)
moreargs=false;
;;
esac
done
find . $prune $pattern | sed 's/ /\\ /g' | xargs grep "$#"
it started life as a single-line script and got features added over the years as I needed them.
This is much more efficient since it invokes grep many fewer times, though it's hard to say it's more succinct:
find . -name '*.txt' -print0 | xargs -0 grep 'sometext' /dev/null
Notes:
/find -print0 and xargs -0 makes pathnames with embedded blanks work correctly.
The /dev/null argument makes sure grep always prepends a filename.
Install ack and use
ack -aG'\.txt$' 'sometext'
I second ephemient's suggestion of ack. I'm writing this post to highlight a particular issue.
In response to jgormley (in the comments): ack is available as a single file which will work wherever the right Perl version is installed (which is everywhere).
Given that on non-Linux platforms grep regularly does not accept -R, arguably using ack is more portable.
I use zsh, which has recursive globbing. If you needed to look at specific filetypes, the following would be equivalent to your example:
grep 'sometext' **/*.txt
If you don't care about the filetype, the -r option will be better:
grep -r 'sometext' *
Although, A minor tweak to your original example will give you exactly what you want:
find . -name '*.txt' \! -wholename '*/.svn/*' -exec grep 'sometext' '{}' \; -print
If this is something you do frequently, make it a function (put this in your shell config):
function grep_no_svn {
find . -name "${2:-*}" \! -wholename '*/.svn/*' -exec grep "$1" '{}' \; -print
}
Where the first argument to the function is the text you're searching for. So:
$ grep_here_no_svn "sometext"
Or:
$ grep_here_no_svn "sometext" "*.txt"
You could write a script (in bash or whatever -- I have one in Groovy) and place it on the path. E.g.
$ myFind.sh txt targetString
where myFind.sh is:
find . -name "*.$1" -exec grep $2 {} \; -print
I usualy avoid the "man find" by using grep $(find . -name "*,txt")
You say that you like the output of your method (using find) better. The only difference I can see between them is that grepping multiple files will put the filename on the front.
You can always (in GNU grep, but you must be using that or -r and --include wouldn't work) turn the filename off by using -h (--no-filename). The opposite, for anyone who does want filenames but has to use find for some other reason, is -H (--with-filename).

How to remove trailing whitespace of all files recursively?

How can you remove all of the trailing whitespace of an entire project? Starting at a root directory, and removing the trailing whitespace from all files in all folders.
Also, I want to to be able to modify the file directly, and not just print everything to stdout.
Here is an OS X >= 10.6 Snow Leopard solution.
It Ignores .git and .svn folders and their contents. Also it won't leave a backup file.
(export LANG=C LC_CTYPE=C
find . -not \( -name .svn -prune -o -name .git -prune \) -type f -print0 | perl -0ne 'print if -T' | xargs -0 sed -Ei 's/[[:blank:]]+$//'
)
The enclosing parenthesis preserves the L* variables of current shell – executing in subshell.
Use:
find . -type f -print0 | xargs -0 perl -pi.bak -e 's/ +$//'
if you don't want the ".bak" files generated:
find . -type f -print0 | xargs -0 perl -pi -e 's/ +$//'
as a zsh user, you can omit the call to find, and instead use:
perl -pi -e 's/ +$//' **/*
Note: To prevent destroying .git directory, try adding: -not -iwholename '*.git*'.
Two alternative approaches which also work with DOS newlines (CR/LF) and do a pretty good job at avoiding binary files:
Generic solution which checks that the MIME type starts with text/:
while IFS= read -r -d '' -u 9
do
if [[ "$(file -bs --mime-type -- "$REPLY")" = text/* ]]
then
sed -i 's/[ \t]\+\(\r\?\)$/\1/' -- "$REPLY"
else
echo "Skipping $REPLY" >&2
fi
done 9< <(find . -type f -print0)
Git repository-specific solution by Mat which uses the -I option of git grep to skip files which Git considers to be binary:
git grep -I --name-only -z -e '' | xargs -0 sed -i 's/[ \t]\+\(\r\?\)$/\1/'
In Bash:
find dir -type f -exec sed -i 's/ *$//' '{}' ';'
Note: If you're using .git repository, try adding: -not -iwholename '.git'.
Ack was made for this kind of task.
It works just like grep, but knows not to descend into places like .svn, .git, .cvs, etc.
ack --print0 -l '[ \t]+$' | xargs -0 -n1 perl -pi -e 's/[ \t]+$//'
Much easier than jumping through hoops with find/grep.
Ack is available via most package managers (as either ack or ack-grep).
It's just a Perl program, so it's also available in a single-file version that you can just download and run. See: Ack Install
This worked for me in OSX 10.5 Leopard, which does not use GNU sed or xargs.
find dir -type f -print0 | xargs -0 sed -i.bak -E "s/[[:space:]]*$//"
Just be careful with this if you have files that need to be excluded (I did)!
You can use -prune to ignore certain directories or files. For Python files in a git repository, you could use something like:
find dir -not -path '.git' -iname '*.py'
ex
Try using Ex editor (part of Vim):
$ ex +'bufdo!%s/\s\+$//e' -cxa **/*.*
Note: For recursion (bash4 & zsh), we use a new globbing option (**/*.*). Enable by shopt -s globstar.
You may add the following function into your .bash_profile:
# Strip trailing whitespaces.
# Usage: trim *.*
# See: https://stackoverflow.com/q/10711051/55075
trim() {
ex +'bufdo!%s/\s\+$//e' -cxa $*
}
sed
For using sed, check: How to remove trailing whitespaces with sed?
find
Find the following script (e.g. remove_trail_spaces.sh) for removing trailing whitespaces from the files:
#!/bin/sh
# Script to remove trailing whitespace of all files recursively
# See: https://stackoverflow.com/questions/149057/how-to-remove-trailing-whitespace-of-all-files-recursively
case "$OSTYPE" in
darwin*) # OSX 10.5 Leopard, which does not use GNU sed or xargs.
find . -type f -not -iwholename '*.git*' -print0 | xargs -0 sed -i .bak -E "s/[[:space:]]*$//"
find . -type f -name \*.bak -print0 | xargs -0 rm -v
;;
*)
find . -type f -not -iwholename '*.git*' -print0 | xargs -0 perl -pi -e 's/ +$//'
esac
Run this script from the directory which you want to scan. On OSX at the end, it will remove all the files ending with .bak.
Or just:
find . -type f -name "*.java" -exec perl -p -i -e "s/[ \t]$//g" {} \;
which is recommended way by Spring Framework Code Style.
I ended up not using find and not creating backup files.
sed -i '' 's/[[:space:]]*$//g' **/*.*
Depending on the depth of the file tree, this (shorter version) may be sufficient for your needs.
NOTE this also takes binary files, for instance.
Instead of excluding files, here is a variation of the above the explicitly white lists the files, based on file extension, that you want to strip, feel free to season to taste:
find . \( -name *.rb -or -name *.html -or -name *.js -or -name *.coffee -or \
-name *.css -or -name *.scss -or -name *.erb -or -name *.yml -or -name *.ru \) \
-print0 | xargs -0 sed -i '' -E "s/[[:space:]]*$//"
I ended up running this, which is a mix between pojo and adams version.
It will clean both trailing whitespace, and also another form of trailing whitespace, the carriage return:
find . -not \( -name .svn -prune -o -name .git -prune \) -type f \
-exec sed -i 's/[:space:]+$//' \{} \; \
-exec sed -i 's/\r\n$/\n/' \{} \;
It won't touch the .git folder if there is one.
Edit: Made it a bit safer after the comment, not allowing to take files with ".git" or ".svn" in it. But beware, it will touch binary files if you've got some. Use -iname "*.py" -or -iname "*.php" after -type f if you only want it to touch e.g. .py and .php-files.
Update 2: It now replaces all kinds of spaces at end of line (which means tabs as well)
This works well.. add/remove --include for specific file types :
egrep -rl ' $' --include *.c * | xargs sed -i 's/\s\+$//g'
Ruby:
irb
Dir['lib/**/*.rb'].each{|f| x = File.read(f); File.write(f, x.gsub(/[ \t]+$/,"")) }
1) Many other answers use -E. I am not sure why, as that's undocumented BSD compatibility option. -r should be used instead.
2) Other answers use -i ''. That should be just -i (or -i'' if preffered), because -i has the suffix right after.
3) Git specific solution:
git config --global alias.check-whitespace \
'git diff-tree --check $(git hash-object -t tree /dev/null) HEAD'
git check-whitespace | grep trailing | cut -d: -f1 | uniq -u -z | xargs -0 sed --in-place -e 's/[ \t]+$//'
The first one registers a git alias check-whitespace which lists the files with trailing whitespaces.
The second one runs sed on them.
I only use \t rather than [:space:] as I don't typically see vertical tabs, form feeds and non-breakable spaces. Your measurement may vary.
I use regular expressions. 4 steps:
Open the root folder in your editor (I use Visual Studio Code).
Tap the Search icon on the left, and enable the regular expression mode.
Enter " +\n" in the Search bar and "\n" in the Replace bar.
Click "Replace All".
This removes all trailing spaces at the end of each line in all files. And you can exclude some files that don't fit with this need.
This is what works for me (Mac OS X 10.8, GNU sed installed by Homebrew):
find . -path ./vendor -prune -o \
\( -name '*.java' -o -name '*.xml' -o -name '*.css' \) \
-exec gsed -i -E 's/\t/ /' \{} \; \
-exec gsed -i -E 's/[[:space:]]*$//' \{} \; \
-exec gsed -i -E 's/\r\n/\n/' \{} \;
Removed trailing spaces, replaces tabs with spaces, replaces Windows CRLF with Unix \n.
What's interesting is that I have to run this 3-4 times before all files get fixed, by all cleaning gsed instructions.

Resources