Git diff without comments - ruby

Let's say I added two lines to the file hello.rb.
# this is a comment
puts "hello world"
If I do git diff, it will show that I added two lines.
I don't want git to show any line which is a Ruby comment. I tried using git diff -G <regular expression>, but it didn't work for me. How do I do git diff so that it won't show any Ruby comments?

One possibility would be (ab)using git's textconv filters which are applied before the diff is created, so you can even transform binary formats to get a human-readable diff.
You can use any script that reads from a file and writes to stdout, such as this one which strips all lines starting with #:
#!/bin/sh
grep -v "^\s*#" "$1" || test $? = 1
(test $? = 1 corrects the exit code of grep, see https://stackoverflow.com/a/49627999/4085967 for details.)
For C, there is a sed script to remove comments, which I will use in the following example. Download the script:
cd ~/install
wget http://sed.sourceforge.net/grabbag/scripts/remccoms3.sed
chmod +x remccoms3.sed
Add it to your ~/.gitconfig:
[diff "strip-comments"]
textconv=~/install/remccoms3.sed
Add it to the repository's .gitattributes to enable it for certain filetypes:
*.cpp diff=strip-comments
*.c diff=strip-comments
*.h diff=strip-comments
The main downside is that the filter is then applied by default to every diff of the matching files. You can disable it for a single invocation with --no-textconv.
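For the Ruby case in the question, the same textconv idea could also be sketched in Python instead of grep. This is a minimal sketch rather than a tested drop-in. The function name and the script itself are made up for illustration. It only drops whole-line comments, so a trailing # comment after code and =begin/=end blocks are left alone.

```python
#!/usr/bin/env python3
"""Hypothetical textconv filter: print a file without whole-line '#' comments."""
import re
import sys

# Same pattern as the grep example: optional whitespace, then '#'.
COMMENT_LINE = re.compile(r"^\s*#")


def strip_comment_lines(text):
    """Return text with every line that starts with optional whitespace and '#' removed."""
    kept = [line for line in text.splitlines(keepends=True)
            if not COMMENT_LINE.match(line)]
    return "".join(kept)


if __name__ == "__main__" and len(sys.argv) == 2:
    # Git calls the textconv program with the path of the file to convert.
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        sys.stdout.write(strip_comment_lines(handle.read()))
```

Saved as an executable file, it would be wired up the same way as the sed script: a textconv entry pointing at it in ~/.gitconfig and a *.rb line in .gitattributes.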

Git cares a lot about the data that you give it, and tries really hard not to lose any information. For Git, it doesn't make sense to treat lines as unchanged when they have in fact changed.
I think that the only reasonable way to do this is to post-process Git output, as Dogbert had already shown in his comment.
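The comment referred to is not reproduced here, so the following is only one guess at what such post-processing could look like: a small Python filter that drops added or removed whole-line # comments from unified diff text. The function name is hypothetical.

```python
import re

# A changed line ('+' or '-') whose content is nothing but a '#' comment.
# File headers ('+++ b/x', '--- a/x') never match, because the character
# after the first marker is another '+' or '-'.
COMMENT_CHANGE = re.compile(r"^[+-]\s*#")


def drop_comment_changes(diff_text):
    """Remove added/removed whole-line '#' comments from unified diff text."""
    kept = [line for line in diff_text.splitlines()
            if not COMMENT_CHANGE.match(line)]
    return "\n".join(kept)
```

The line counts in the hunk headers no longer match after filtering, so the result is only suitable for reading, not for git apply.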

Related

Check out git repository that contains invalid filenames in windows [duplicate]

I'm working a shared project using git for version control. I'm on windows while my partner is on Unix.
My partner has named some files with <file1>.txt. When I try to pull these files they are not accepted as the < and > are invalid characters for Windows. This is fine, I don't need to touch the files. However, they are added to my commit as deleted. So, if I push then I'll delete these files which I don't want to do.
I can't use git reset --hard as it finds an invalid path for each of these "deleted" files.
Is there a way to exclude these files from my commits? I've tried adding <file1> to my .git/info/exclude but that didn't work.
You would need to get your partner to change the names to be something that is also valid on Windows. After they have renamed them, what I'd do is this:
1. Back up any changes that you only have locally (both uncommitted AND committed but not pushed).
2. Run git reset --hard <commit> where <commit> is any commit from before the files were added.
3. Run git pull to get all the way to the latest revision (where the files are renamed).
4. Restore your backed-up changes from step 1.
This should get you to the newer revision, where the files no longer have names that are illegal on Windows, so the OS never refuses to create them and git never sees them as deleted :)
P.S. I know this is an old question, but I've been getting this issue recently, so hopefully the solution I've arrived at can help others as well.
EDIT:
To avoid this happening again, your partner can add a pre-commit hook that will stop them from committing files with names that would not be allowed on Windows. There's usually a sample in .git/hooks/pre-commit.sample. I've changed that bit a little in the past and ended up with something like:
# Cross platform projects tend to avoid non-ASCII filenames; prevent
# them from being added to the repository. We exploit the fact that the
# printable range starts at the space character and ends with tilde.
if [ "$allownonascii" != "true" ] &&
# Note that the use of brackets around a tr range is ok here, (it's
# even required, for portability to Solaris 10's /usr/bin/tr), since
# the square bracket bytes happen to fall in the designated range.
test $(git diff --cached --name-only --diff-filter=A -z $against |
LC_ALL=C tr -d '[ !#-)+-.0-9;=@-[]-{}~]\0' | wc -c) != 0
then
cat <<\EOF
Error: Attempt to add a non-ASCII file name.
This can cause problems if you want to work with people on other platforms.
To be portable it is advisable to rename the file.
If you know what you are doing you can disable this check using:
  git config hooks.allownonascii true
EOF
exit 1
fi
The '[ !#-)+-.0-9;=@-[]-{}~]\0' bit is the important part that I've changed a little. It defines all the allowed ranges of characters. The set in the sample only disallows non-ASCII characters (which is what the comment at the top says), but there are also ASCII characters that are not allowed in file names on Windows (such as ? and :), so those are left out of the allowed set here.
All the allowed characters are removed, and if there's anything left (wc -c != 0) it errors. It can be a bit difficult to read, as you can't see any of the disallowed characters. It helps if you have a list of the char ranges to look at when reading or editing it.
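Because the tr set is hard to read, the same check can be sketched the other way round in Python, as a deny-list of the characters Windows rejects in a name. This is a minimal sketch under stated assumptions: the function name is hypothetical, the paths are assumed to come from git diff --cached --name-only --diff-filter=A -z split on NUL, non-ASCII names are allowed here (unlike in the hook), and reserved names such as CON or NUL and trailing dots or spaces are not checked.

```python
# Characters Windows rejects in a file or directory name, plus ASCII control
# codes. '/' is deliberately absent: git uses it as the path separator.
WINDOWS_FORBIDDEN = set('<>:"\\|?*') | {chr(code) for code in range(32)}


def windows_unsafe_paths(paths):
    """Return the paths that contain at least one character Windows rejects."""
    return [path for path in paths
            if any(ch in WINDOWS_FORBIDDEN for ch in path)]
```

A hook built on this would exit with a non-zero status whenever the returned list is non-empty.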
Ignoring doesn't help if the files are tracked already.
Use Sparse checkout to skip those files.

Set and check for empty var inside a target

I created a make target which fetches a list of files from git ls-files and passes them to phpcs, a coding standards utility.
The problem is that I don't want to run it if no files are returned. (because it then tries to analyze the entire codebase)
phpcs:
	FS=$$(git ls-files -om --exclude-standard '*.php'); \
	vendor/bin/phpcs --encoding=utf-8 --extensions=php $$FS
How would I exit if $FS (which is set in the first line) is empty?
I'm running the default gnu-make which comes with OSX. (3.8.1)
This isn't really a makefile question. You are using shell commands to perform these operations so you have to come up with a shell solution for your problem: make doesn't enter into it.
You can do something like this:
phpcs:
	FS=$$(git ls-files -om --exclude-standard '*.php'); \
	test -z "$$FS" || vendor/bin/phpcs --encoding=utf-8 --extensions=php $$FS
PS. There's no such release as GNU make 3.8.1. You mean, I assume, GNU make 3.81.

list files with git status

Background
I am well aware of how git status works, and even about git ls-files. Usually git status is all I need and want, it perfectly answers the question: "What is my status, and what files need my attention?"
However, I have been unable to find a quick command that answers the following question: "What files do I have, and what is their respective status?" So, I need a full listing of the directory (like ls -la) with a column that shows the status of each file/directory.
What I have tried
git status -s --ignored comes quite close to the output format that I want, but it just won't list the files that are unchanged between HEAD, index, and working directory. Also, it will recurse into directories.
git ls-files seems to be able to provide all the required info in scriptable form, but I've been unable to stop it from recursive listing the contents of all directories.
Obviously, I could hack something together that takes the output of these two commands and provides the view I would like to have. However, I would hate to reinvent the wheel if there is already some usable command out there.
Question
Is there some way of listing all files in a directory with their respective git status?
I want a full listing showing exactly the same files that ls would show.
Notes
This other question does not answer mine, because I definitely want an ls equivalent. Including unmodified, ignored, and untracked files, but excluding directory contents.
To restrict the paths Git inspects to just the current directory, use its Unix glob pathspecs. Since git status does a lot of checking against the index and against HEAD, use that, and to fill in the rest of the files ls would show you, use ls, just munge its output to have the same format as git status's output and take only the ones git status didn't already list.
( git status -s -- ':(glob)*'; ls -A --file-type | awk '{print "   "$0}' ) \
| sort -t$'\n' -usk1.4
:(glob) tells Git the rest of the pathspec's a Unix glob, i.e. that * should match only one level, just like a (dotglob-enabled) shell wildcard¹.
The -t$'\n' tells sort that the field separator is a newline, i.e. each line is one big field. -usk1.4 combines three options: -u uniquifies, taking only the first of a run of equal keys; -s makes the sort stable, preserving input order where it doesn't violate sort key order (which is a little slower, so you have to ask for it specifically); and -k1.4 says the key starts at the fourth character of the first field and, with no end given, runs from there to the end. Since the git status lines come first in the input, they win over the ls lines for the same name.
¹ Why they decided to make pathspecs match neither like shell specs nor like gitignore specs by default, I might never bother learning, since I so much prefer ignorantly disapproving of their annoying choice.
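The merge that the pipeline above performs can also be sketched in Python, which may be easier to extend. This is only an illustration: the function name is hypothetical, the status lines are assumed to be plain "XY path" entries from git status -s (renames with "old -> new" are not handled), and the names are assumed to be spelled the way git prints them, with a trailing / on directories.

```python
def merge_listing(status_lines, names):
    """Combine `git status -s` lines with a directory listing.

    Every name gets a line; names git did not report get a blank
    three-character status column so the output stays aligned.
    """
    merged = {name: "   " + name for name in names}
    for line in status_lines:
        # Columns 1-2 are the status code, column 3 is a space.
        merged[line[3:]] = line
    return [merged[path] for path in sorted(merged)]
```

Entries that git reports but ls does not show, such as deleted files, are kept, which matches what the shell pipeline does.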
Since the output of git status -s is almost enough, we can wrap it in a small bash loop and, for the unchanged files that it doesn't report, echo a status column ourselves. Following the short-format specification, two spaces can serve as the status code for unchanged files, or any other symbol you prefer. Directories are not tracked by Git anyway, so '__' is chosen here as their status code:
for FILE in *
do
    if [[ -f $FILE ]]
    then
        if [[ -z $(git status -s -- "$FILE") ]]
        then
            # the first two symbols below are the two-letter status code
            echo "   $FILE"
        else
            git status -s -- "$FILE"
        fi
    fi
    if [[ -d $FILE ]]
    then
        # the first two symbols are the status code chosen for directories
        echo "__ $FILE"
    fi
done
The script works in the same manner as ls. It can be written in one line using ; as well.

Output from git log gets lost when piped to file - what am I missing?

I am trying to get some information about some git commits via the command line as part of a larger automated tool I am building. The information I want is available via this git log command:
git log --branches --graph --oneline --parents
which produces this output:
This is great, because this has the hashes and tags that I want, as well as the commit messages. However, when I pipe this to a file, the stuff in the brackets seems to somehow get lost. I'm not too interested in the colour, but I do want just the plain text as I would expect from any *nix-like program.
This is the output I seem to get instead, which omits some of the output I want (eg, the tag information):
I'm not sure how or why this information gets lost when being piped somewhere. I feel like this might be something incredibly simple and obvious.
I experience the same problem whether I do this in Bash on Arch Linux (with the latest version of git) or in the MINGW64 Bash environment in Windows.
Question: How can I completely capture git log's output without losing the information that is being lost when piping to a file?
You need to add the --decorate option to your log command. Set it either as --decorate=short or --decorate=full.
It appears in your config you've probably got log.decorate set to auto, which means that tags and such are displayed (in short form) when writing to the terminal, but not to a pipe or other file.
Similarly there are config values and command options that govern if (and when) color codes are output; so
git log --branches --graph --oneline --parents --decorate=short --color=always
would output the tags and colors even when redirected to a file.
Note that when scripting you should probably include these options on the command line rather than make assumptions about what config values are set. Depending on what you do with the output, log may or may not be the best command to use in scripting anyway, as git commands are somewhat divided into those meant for human consumption and those meant for scripting.
Your git command:
git log --branches --graph --oneline --parents
does not produce the same output for me that you show in your first example. It does, however, produce output similar to the second example. The ref names in the brackets (branches and tags) are included when you add the --decorate option. (Here's the documentation.)
Note that your git log format can be controlled in your ~/.gitconfig file. For example, the format you've shown in your question looks like it might be achieved with options like this:
git log --decorate --graph --all --format='%C(auto,yellow)%h%C(auto,reset) %C(auto,yellow)%p%C(auto,reset) %C(auto,red)%d%C(auto,reset) %s %C(auto,green)%an%C(auto,reset) (%C(auto,cyan)%ar%C(auto,reset))'
If you are trying to automate things, you can specify a format that is better tuned to your requirements. Check the git documentation for pretty-formats for details. In particular, look for the %C format, and the %C(auto,…​) notation, which causes colour controls to be used only if the command is run from a terminal.
If you always want to generate the colour information regardless of whether you're using git log interactively, you can remove each occurrence of auto, in the above line (or alias).
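For an automated tool, a format built only from pretty-format placeholders sidesteps both the decoration and the colour settings. The sketch below is one possible layout, not the asker's actual tool: the field order, the tab separator, and the names LOG_FORMAT, LOG_COMMAND and parse_log_line are all chosen for illustration. %H is the full hash, %P the parent hashes, %D the ref names without the surrounding brackets, %s the subject, and %x09 a tab.

```python
# Hypothetical field layout: hash, parents, ref names, subject, tab-separated.
LOG_FORMAT = "%H%x09%P%x09%D%x09%s"
LOG_COMMAND = ["git", "log", "--branches", "--format=" + LOG_FORMAT]


def parse_log_line(line):
    """Split one line produced with LOG_FORMAT into its four fields."""
    commit, parents, refs, subject = line.split("\t", 3)
    return {
        "commit": commit,
        "parents": parents.split(),               # space-separated hashes
        "refs": refs.split(", ") if refs else [],  # e.g. 'tag: v1.0'
        "subject": subject,
    }
```

LOG_COMMAND can be passed to subprocess.run, with each line of its standard output then fed to parse_log_line.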

git log --grep does not work in windows

Is there any equivalent of git log --grep="STRING" on Windows?
I've written a Python program for Linux that needs to read the commit logs containing a certain string from a git repository. This worked fine on Linux, but when I ran the same program on Windows, git log --grep="STRING" catches nothing.
Here's the code snippet. (fetcher.py)
import os
...
os.chdir(DIRECTORY) # where git obj is
command = "git log --all --grep='KEYWORD' > log.txt"
os.system(command) # run the command in the shell
...
It seems that git internally uses the linux grep for the "--grep" argument such that Windows cannot run this correctly as it misses grep.
Thanks for your help.
As I am not getting any answer for 24 hrs,
I suggest my own solution, which does not utilize grep.
Because git log itself runs without any problem,
I just ran the command without the --grep='STRING' option, then read the output from the shell (or a file) to filter the commit logs which contain 'STRING' by the use of regular expression.
import os
import re
command = "git log --all > log.txt"
os.system(command) # run the command in the shell and store output in log.txt
with open('log.txt', 'r') as fp:
    gitlogoutput = fp.read() # read the redirected output as one string
if gitlogoutput:
    # split the output into individual commits at every '\n' that is
    # followed by 'commit <40-character hash>\nAuthor: '
    commits = re.split(r'\n(?=commit\s\w{40}\nAuthor:\s)', gitlogoutput)
    for commit in commits:
        if 'KEYWORD' in commit:
            print(commit)
The approach requires you to add some code, but I believe it does the same thing as the original command does. For better results, you can change the last if statement, which is
if 'KEYWORD' in commit:
into something that can do a more sophisticated search, e.g. the re.search() method.
In my case, this produced exactly the same result as that of --grep="KEYWORD"
Still, I appreciate your help :)
"It seems that git internally uses the linux grep for the "--grep" argument such that Windows cannot run this correctly as it misses grep."
It certainly can, provided your %PATH% includes <git>/usr/bin (which has 200+ Linux commands compiled for Windows)
See this simplified path:
set G=c:\path\to\latest\git
set PATH=%G%\bin;%G%\usr\bin;%G%\mingw64\bin
set PATH=%PATH%;C:\windows\system32;C:\windows\System32\Wbem;C:\windows\System32\WindowsPowerShell\v1.0\
Add the PATH for python, and you can "Linux grep" without any issue.
In Windows, use the following format (replacing the = sign with a space):
git log --grep "STRING"
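One likely explanation, offered as an assumption rather than a diagnosis, is that os.system hands the command string to cmd.exe on Windows, and cmd.exe does not treat single quotes as quoting characters, so git receives the quotes as part of the pattern. Passing the arguments as a list avoids shell quoting on both platforms. The function names below are hypothetical.

```python
import subprocess


def grep_log_command(keyword):
    """Build the argument list; no shell is involved, so no quoting is needed."""
    return ["git", "log", "--all", "--grep=" + keyword]


def matching_log(keyword, repo_dir):
    """Return the text of `git log --all --grep=<keyword>` run inside repo_dir."""
    result = subprocess.run(
        grep_log_command(keyword),
        cwd=repo_dir,          # replaces the os.chdir() call
        capture_output=True,   # collect stdout instead of redirecting to a file
        text=True,
        check=True,            # raise CalledProcessError if git fails
    )
    return result.stdout
```

This also removes the need for the intermediate log.txt file, since the output comes back as a string.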
