When does Git show a file as modified, when it actually isn't? - git-diff

The output of git diff says that the complete contents of one of my files have been removed and then added back. Why is this?
diff --git a/templates/appengine/go/app.yaml b/templates/appengine/go/app.yaml
index 303af12..223b642 100644
--- a/templates/appengine/go/app.yaml
+++ b/templates/appengine/go/app.yaml
@@ -1,13 +1,13 @@
-# Right now, we direct all URL requests to the Go app we are about to
-# construct, but in the future we may add support for jQuery and Bootstrap
-# to automatically put a nice UI on the / url.
-
-application: {{ app_id.downcase }}
-version: 1
-runtime: go
-api_version: 3
-
-handlers:
-- url: /.*
-  script: _go_app
-
+# Right now, we direct all URL requests to the Go app we are about to
+# construct, but in the future we may add support for jQuery and Bootstrap
+# to automatically put a nice UI on the / url.
+
+application: {{ app_id.downcase }}
+version: 1
+runtime: go
+api_version: 3
+
+handlers:
+- url: /.*
+  script: _go_app
+

This kind of git diff output is expected if the line endings have been converted, e.g. by a badly behaving editor on Windows that rewrote a file which originally had Unix-style line endings, resulting in an LF -> CR LF conversion. This is a typical way to change every line of a file with nothing but whitespace changes, which you usually cannot make out in the raw diff output on a terminal.
The -w option makes git diff ignore whitespace changes. What does git diff -w look like in your case? If it shows no changes, whitespace is the reason for your output.
In such a case, you could run git diff | tr '\r' '~' to see the CR characters replaced by '~' characters in the git diff output (assuming Unix tools are available).
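A minimal sketch of this diagnosis, assuming git, awk and tr are on PATH. It builds a throwaway repository (the directory, file name and commit identity are made up for the demo), commits a small file with LF endings, rewrites it with CR LF endings the way a misbehaving editor might, and then runs the commands from the answer.

```shell
#!/bin/sh
# Sketch: reproduce a whole-file diff caused only by LF -> CR LF conversion.
# Assumes git, awk and tr are on PATH; repo, file and identity are throwaway demo values.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"   # hypothetical identity for the demo
git config user.name "Demo"
git config core.autocrlf false             # keep Git from normalizing line endings
printf 'version: 1\nruntime: go\n' > app.yaml
git add app.yaml
git commit -q -m "app.yaml with LF line endings"

# Simulate the misbehaving editor: rewrite every line ending as CR LF.
awk '{ printf "%s\r\n", $0 }' app.yaml > app.yaml.tmp
mv app.yaml.tmp app.yaml

git diff                  # every line shows as removed and added back
git diff -w               # prints nothing: the lines differ only in whitespace
git diff | tr '\r' '~'    # the CRs become visible as a trailing '~'
```

core.autocrlf is forced to false only so that a global autocrlf setting cannot normalize the CRs away before the comparison.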

Git shows it this way because the content was removed in one commit and then added back, so the difference between the current state and the last commit has changed.

Related

Colorizing custom gitmessage file in vim

I have a custom .gitmessage. It is defined in the git config. I've added a ~/.vim/after/ftplugin/gitcommit.vim to redefine a couple of things in the plugin (in support of Conventional Commits), e.g., textwidth.
My .gitmessage looks something like this:
#subject-|---------|---------|---------|---------|---------|---------|---------|---------|-------->|
# Commit messages should follow the Conventional Commits Specification
# <type>(<optional scope>): <imperative subject>
# empty separator line
# <optional body>
# empty separator line
# <optional footer>
#
# API relevant changes || perf commits improve performance
# feat commits add a new feature || revert commit, well, revert a previous
# fix commits fix a bug || commit
...
...
where I provide an ASCII "ruler" for 100 chars (though the Vim filetype plugin config autowraps to the textwidth per my customization), &tc.
However, I would like to colorize the conventional commit sample and the type keywords, e.g., feat, fix, perf, &tc.
I've tried most of the standard ANSI escape codes in the .gitmessage, e.g., \\e[1;30m, ^[1;30m (Ctrl-V, esc), and \033[1;30m. Alas, they all just get dumped into the Vim buffer as literal text (it makes sense that Vim would keep these codes editable), but gitcommit.vim does colorize the diff in git commit --all --verbose...
This appears to be, maybe, a function of shellescape (see :h shellescape in vim). I see this used in some of the filetype plugins, but I'm not a plugin hacker.
Any pointers?
Vim doesn't care about your escape codes.
A quick and dirty method would be to use :match or matchadd() (see :help :match and :help matchadd()) in your custom ftplugin.
But the real solution is to create your own syntax script, following the instructions under :help usr_44. See $VIMRUNTIME/syntax/gitcommit.vim for a start.

Find Path to Files containing specific word - How can I know which file in a directory contains the specific word that I want?

I managed to find the word that I want with the following command in a GitHub repository:
git log --all -p | grep 'abc'
abc is a word located in a specific file.
My question is: how can I find the path of the file that contains the string? How can I know which file contains the word I want, which is abc?
For example, doing the above command would get me
(this.b(),this.abc);
but I would like to know which exact folder and which exact file this piece of string/code comes from.
Any suggestion is appreciated.
git grep -l 'abc'
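Option -l lists the names of files in which the pattern is found. With no revision argument, git grep searches the tracked files in the working tree; add HEAD (git grep -l 'abc' HEAD) to search the current commit instead.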
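A small self-contained sketch of git grep -l in a throwaway repository (the repository, file names and contents are hypothetical). The last command goes beyond this answer: passing every commit from git rev-list --all as a revision searches the whole history and prints each hit as <commit>:<path>. On a large history that argument list can get very long.

```shell
#!/bin/sh
# Sketch: find which files contain a word, now and in history.
# Assumes git on PATH; repo, paths, identity and the word 'abc' are demo values.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"   # hypothetical identity
git config user.name "Demo"
mkdir -p src/widgets
printf '(this.b(),this.abc);\n' > src/widgets/panel.js
printf 'nothing to see\n' > README
git add . && git commit -q -m "add panel"

git grep -l 'abc'        # tracked files in the working tree
git grep -l 'abc' HEAD   # the current commit; hits look like HEAD:<path>

# Every commit reachable from any ref; hits look like <commit>:<path>.
git grep -l 'abc' $(git rev-list --all)
```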
Just run
git log --all -p
This shows the output in the pager, usually less. You can search, scroll forward and backward.
To search the string, type / a b c Enter (the abc here is a regular expression, not a literal string). Type n to find the next occurrence. When you have found one, you can scroll back with b and look at the patch text to see which file it is. (BTW, type Space to scroll forward; type q to exit the pager.)
The following (tested with GNU awk) should be an approximation of what you want:
git log --all -p |
awk '/^diff --git / {files = $0}
/^@@ /,/^(commit |diff --git )/ {if(index($0, "abc")) print files}'
We store the diff --git line in variable files. Then, if your string is found between the following line starting with @@ and the line starting with commit or diff --git, the files variable is printed.
It is an approximation only because string abc could also be found in the diff --git or commit lines and also because ^@@, ^diff --git or ^commit could be found in file contents.
More accurate solutions exist but they are more complicated and those I can think of cannot be 100% perfect.
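To see the awk approximation at work, here is a sketch that runs the same pipeline against a throwaway two-commit repository (all names are hypothetical). It searches for this.abc instead of bare abc, because a commit line, which falls inside the awk range, can contain the hex digits abc as part of a hash. sort -u is added to collapse repeated headers.

```shell
#!/bin/sh
# Sketch: print the diff --git header of every file whose patch text contains a string.
# Assumes git and awk on PATH; repo, files and identity are demo values.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"   # hypothetical identity
git config user.name "Demo"
mkdir -p lib
printf 'var x = 1;\n' > lib/other.js
printf '(this.b(),this.abc);\n' > lib/panel.js
git add . && git commit -q -m "first"
printf 'var x = 2;\n' > lib/other.js
git commit -q -am "second"

# files_containing is a hypothetical helper wrapping the answer's pipeline.
files_containing() {
  git log --all -p |
  awk '/^diff --git / {files = $0}
       /^@@ /,/^(commit |diff --git )/ {if (index($0, "this.abc")) print files}' |
  sort -u
}

files_containing   # expected: diff --git a/lib/panel.js b/lib/panel.js
```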

Filter response from "git diff" command to get only the difference in Shell - Dynamic Solution

I am trying to automate a repetitive deployment process in my project. To do that, I am trying to get the difference between two branches using "git diff" somehow, and I am able to achieve that using the following command:
git diff <BRANCH_NAME1> -- common_folder_name/ <BRANCH_NAME2> -- common_folder_name/ > toStoreResponse.txt
Now the response that I get, looks something like below:
diff --git a/cmc-database/common/readme.txt b/cmc-database/common/readme.txt
index 7820f3d..5a0e484 100644
--- a/cmc-database/common/readme.txt
+++ b/cmc-database/common/readme.txt
@@ -1 +1,5 @@
-This folder contains common database scripts.
\ No newline at end of file
+This folder contains common database scripts.
+TEST STTESA
\ No newline at end of file
So in the above response, the only new line, i.e. the only difference between the two branches, is TEST STTESA, and I want to store only that text in a separate text file, the shell / Git way.
i.e. a file named readme.txt that contains only TEST STTESA.
Workaround solution:
I have found a workaround to filter the response, but it is not 100% what I am looking for. The command looks like this:
git diff <Branch_Name1> -- common-directory/ <Branch_Name2> -- common-directory/ | grep -v common-directory | grep -v index | grep -v @ | grep -v '\\'
The above command returns below response:
-This folder contains common database scripts.
+This folder contains common database scripts.
+TEST STTESA
But I want to store only the difference, which is TEST STTESA.
As you can easily realize, your solution won't work every time. The grep -v parts make it unportable.
Here is a "step0" solution: you want to match lines that start with a "+" or a "-", followed by neither a "+" nor a "-". Use grep for that!
git diff ... | grep "^+[^+]\|^-[^-]"
Some explanation:
First, the \| part in the middle is an "or" statement.
Then, each side starts with a ^ which refers to the beginning of the line. And finally, after the first character, we want to reject some characters, using the [^...] syntax.
The line above translates to English as "Run the diff, and find all the lines that either start with a +, followed by something that is not a +, OR start with a -, followed by something that is not a -."
This will not work properly if you remove a line that started with a -, nor if you add a line that starts with a +.
For such scenarios, I would tinker with git diff --color and grep for some [32m (the green color code used for added lines) for fun.
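A sketch of the grep filter against a throwaway repository with two hypothetical branches, branch1 and branch2. It uses the usual two-revision form git diff branch1 branch2 -- common-directory/ (revisions first, pathspec after --), and grep -E with an unescaped | for alternation, because \| inside a basic regular expression is a GNU grep extension.

```shell
#!/bin/sh
# Sketch: keep only the changed lines of a diff between two branches.
# Assumes git, grep and cut on PATH; repo, branches and identity are demo values.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"   # hypothetical identity
git config user.name "Demo"
mkdir -p common-directory
printf 'This folder contains common database scripts.\n' > common-directory/readme.txt
git add . && git commit -q -m "readme"
git branch branch1
git checkout -q -b branch2
printf 'TEST STTESA\n' >> common-directory/readme.txt
git commit -q -am "add a line"

# Lines starting with a single "+" or "-" (the ---/+++ headers are skipped).
git diff branch1 branch2 -- common-directory/ | grep -E '^\+[^+]|^-[^-]'

# Added lines only, with the leading "+" stripped.
git diff branch1 branch2 -- common-directory/ | grep -E '^\+[^+]' | cut -c2-
```

Appending > readme.txt to the last command stores the result. In the question's output the first line would still show up as changed, because it gained a trailing newline between the branches ("\ No newline at end of file").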
--diff-filter=[ACDMRTUXB*]
Select only files that are
A Added
C Copied
D Deleted
M Modified
R Renamed
T have their type (mode) changed
U Unmerged
X Unknown
B have had their pairing Broken
and * All-or-none
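A sketch of --diff-filter in a throwaway repository (branch names, files and identity are hypothetical). Uppercase letters select the listed change types; the same letters in lowercase exclude them.

```shell
#!/bin/sh
# Sketch: list files changed between two branches, filtered by change type.
# Assumes git on PATH; repo, branches, files and identity are demo values.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"   # hypothetical identity
git config user.name "Demo"
printf 'a\n' > keep.txt
printf 'b\n' > gone.txt
git add . && git commit -q -m "initial"
git branch branch1
git checkout -q -b branch2
printf 'a2\n' > keep.txt    # modified
git rm -q gone.txt          # deleted
printf 'c\n' > added.txt    # added
git add . && git commit -q -m "changes"

git diff --name-status branch1 branch2                  # A, D and M entries
git diff --name-only --diff-filter=A branch1 branch2    # added files only
git diff --name-only --diff-filter=d branch1 branch2    # everything except deletions
```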

Apply Staged Changes to New Repo

What mistake am I making in the steps I'm following?
I've edited files in repo Alpha on my local box. I then realized I wanted those changes in a different repo Bravo that is also on my local box. I tried this:
c:/repos/alpha/>git diff --cached > mypatch.txt
I then copy the patch file to the other repo location and type this:
c:/repos/bravo/>git apply mypatch.txt
If the shell I used for the diff and apply was powershell or "Git CMD", then the second command produces the error:
fatal: unrecognized input
If instead I use the "Git Bash" shell to execute the two commands, then I get a different error:
5109e.patch:19: trailing whitespace.
IL.DataUsageGB,
warning: 1 line adds whitespace errors.
I then try to apply the changes more carefully with the following command:
$ git apply --reject --whitespace=fix mypatch.txt
From this I get a dump of numerous errors. Example:
error: while searching for:
);
GO
-- Anchor table ------------------------------------------------------------
-------------------------------------------
-- IL_InvoiceLine table (with 33 attributes)
----------------------------------------------------------------------------
-------------------------------------------
IF Object_ID('dbo.IL_InvoiceLine', 'U') IS NULL
CREATE TABLE [dbo].[IL_InvoiceLine] (
error: patch failed: scripts/bi/sql/Unified_ODS_Schema.sql:302
The branch in repo Alpha and the corresponding branch in repo Bravo both come from the same origin and both have a git status that report "up to date" with the upstream. In other words, the branches are identical except for the staged changes that exist on Alpha. I am expressly avoiding a push/pull with the origin.
Suggestions?
TL;DR
There's nothing wrong, and you can completely ignore the warning. You don't need --reject or --whitespace=fix, but if you do want to use the latter, use it without the former.
Longer
If the shell I used for the diff and apply was powershell ...
This winds up writing the output as Unicode (through some mechanism I cannot describe properly since I don't "do" Windows). You'd have to filter that back to UTF-8 or ASCII to get it to apply.
If instead I use the "Git Bash" shell to execute the two commands, then I get a different error:
5109e.patch:19: trailing whitespace.
IL.DataUsageGB,
warning: 1 line adds whitespace errors.
That's not really an error, that's a warning. It means that your original patch adds a blank before an end-of-line. By default, git apply calls this an "error" but it really means "warning". It's meant to alert you to the fact that there's an invisible character on the line(s) in question, which you may not have intended. (Or maybe you did! For instance, in some Markdown formats, ending a line with two blanks inserts a line break. See also Git ignore trailing whitespace in markdown files only.)
What constitutes a "whitespace error" (which really should be "whitespace annoyance" or "whitespace warning" or "whitespace glitch" everywhere, rather than "error") is configurable. By default git diff will highlight such whitespace glitches. While I cannot quite show it here, imagine the - line is in red and the + line is in green and that <space> represents a trailing blank:
- blah blah
+ blah foo blah<space>
This space would be highlighted in red, to make it stand out as a "whitespace error" (which I would call a whitespace glitch or annoyance or warning, but as long as we are using Git we should understand what the phrase "whitespace error" means here).
With --whitespace=fix, git apply will find the things it considers to be whitespace errors and determine whether it can fix them automatically by:
stripping trailing whitespace
removing some space-before-tab spaces
fussing with CRLF vs LF-only
If it can fix them, it will. This includes applying the patch even if the context does not quite match up but can be made to do so by this kind of fussing, so it's more than just "removing trailing whitespace in the added lines".
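A sketch of the Alpha/Bravo workflow, assuming a POSIX shell such as Git Bash. It uses three throwaway repositories (origin-repo, alpha and bravo are made-up names) and a staged change whose added line ends in a blank. The plain git apply succeeds and only warns; --whitespace=fix applies the same patch and strips the trailing blank.

```shell
#!/bin/sh
# Sketch: move staged changes from one clone to another with a patch file.
# Assumes git on PATH; all repository names, files and the identity are demo values.
set -e
base=$(mktemp -d)
cd "$base"
git init -q origin-repo
( cd origin-repo
  git config user.email "demo@example.com"   # hypothetical identity
  git config user.name "Demo"
  printf 'SELECT 1;\n' > schema.sql
  git add schema.sql && git commit -q -m "initial" )
git clone -q origin-repo alpha
git clone -q origin-repo bravo

# In alpha: stage a change whose added line ends with a blank.
cd "$base/alpha"
printf 'SELECT 1;\nSELECT 2; \n' > schema.sql
git add schema.sql
git diff --cached > "$base/mypatch.txt"

# In bravo: a plain apply works and just warns about the trailing blank.
cd "$base/bravo"
git apply "$base/mypatch.txt"
git checkout -- schema.sql                        # undo, to try the other mode
git apply --whitespace=fix "$base/mypatch.txt"    # applies and strips the blank
```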

Running git diff-tree with --numstat and --name-status

I'm writing a script to analyze changes that have been made to a git repo.
At some point I need to iterate over all the commits and obtain the following information about each of them:
Commit ID
Date
Commit Message
...
Files changed
File Name
Type of change (Added/Modified/Removed/Renamed)
New File Name (in case the change type is "Renamed")
Number of lines added
Number of lines removed
I get the commit messages and dates by git log. The issue I have is with the files.
If I didn't need the number of lines added/removed, I'd simply use
git diff-tree --no-commit-id --name-status -M -r abcd12345
The output would be something like
A Readme.md
M src/something.js
D src/somethingelse.js
R100 tests/a/file.js tests/b/file.js
Which I can parse and read programmatically.
To get information about lines added/removed, I could use this:
git diff-tree -M -r --numstat abcd12345
The output would be like:
abcd12345
82 0 Readme.md
41 98 src/something.js
0 64 src/somethingelse.js
0 0 tests/{a => b}/file.js
Which is not that machine readable for renamed files.
My question is: Is there any way to combine these two commands? It seems I can't use --numstat with --name-status.
I can also run two separate commands and merge the results in my script. In that case, are there any other switches I can use to make the output of the second command more machine readable?
Thanks.
I think your analysis (that you need two separate commands) is correct. Use -z to obtain machine-readable output with --numstat (this disables both fancy rename encoding and all special-character-quoting), but note that you will then have to break lines apart at ASCII NULs instead of newlines.
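A sketch of consuming the -z --numstat output, assuming a POSIX shell with awk and tr. For brevity it turns the NULs back into newlines before handing the stream to awk, so unlike a true NUL-splitting parser it assumes paths contain no newlines or tabs. With -z, a renamed file appears as the added and deleted counts followed by an empty path field, and then the old and new paths as separate entries. The helper name numstat_records and the demo repository are hypothetical.

```shell
#!/bin/sh
# Sketch: one tab-separated record per file from `git diff-tree -z --numstat`:
# added, deleted, path, new path (empty unless renamed).
# Assumes paths contain no newlines or tabs, since NULs are turned into newlines.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"   # hypothetical identity
git config user.name "Demo"
mkdir -p src tests/a
printf 'one\ntwo\n' > src/something.js
printf 'same\ncontent\n' > tests/a/file.js
git add . && git commit -q -m "initial"

printf 'one\nthree\n' > src/something.js    # modified: 1 added, 1 removed
mkdir -p tests/b
git mv tests/a/file.js tests/b/file.js      # exact rename
printf 'hello\n' > Readme.md                # new file
git add . && git commit -q -m "second"

numstat_records() {
  git diff-tree --no-commit-id -r -M -z --numstat "$1" |
    tr '\0' '\n' |
    awk -F'\t' '
      # Second path of a rename (new name): print the complete record.
      pending == 1 { printf "%s\t%s\t%s\t%s\n", added, deleted, old, $0; pending = 0; next }
      # First path of a rename (old name).
      pending == 2 { old = $0; pending = 1; next }
      # "added<TAB>deleted<TAB>" with an empty path field marks a rename.
      $3 == "" { added = $1; deleted = $2; pending = 2; next }
      # Ordinary entry: added, deleted, path, empty new path.
      { printf "%s\t%s\t%s\t\n", $1, $2, $3 }'
}

numstat_records HEAD
```

The A/M/D/R status still has to come from a separate --name-status run, as the answer says; with -z that output is likewise NUL-separated.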
